Translation (56) - Gradient Clipping in PyTorch
If you spot any problems with the translation, feel free to point them out in the comments. Thanks.
-
asked: What is the correct way to perform gradient clipping in PyTorch? I have an exploding gradients problem.

Answers:

- vote: 143

A more complete example from :

```python
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
```

- vote: 0

Well, I ran into the same error. I tried clipping the gradient norm, but it didn't work. (Translator's note: the answerer clarified in the comments that "doesn't work" means the loss still gives a NaN.) I didn't want to change the network or add regularizers, so I switched the optimizer to Adam, and it worked. I then used the Adam-pretrained model to initialize training and fine-tuned with SGD + momentum (see the sketch after the answers).

- vote: 3

And if you are using Automatic Mixed Precision (AMP), you need to do a bit more before clipping:

```python
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
# `scaler` is the torch.cuda.amp.GradScaler created before the training loop.
scaler.scale(loss).backward()

# Unscales the gradients of the optimizer's assigned params in-place
scaler.unscale_(optimizer)

# Since the gradients of the optimizer's assigned params are unscaled, clip as usual:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# The optimizer's gradients are already unscaled, so scaler.step does not unscale them,
# although it still skips optimizer.step() if the gradients contain infs or NaNs.
scaler.step(optimizer)

# Updates the scale for the next iteration.
scaler.update()
```

Reference: https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping
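Translator's note: the second answer describes pretraining with Adam and then fine-tuning with SGD + momentum, but gives no code. Below is a minimal sketch of that two-phase handoff; the toy model, learning rates, and the file name adam_pretrained.pt are illustrative assumptions, not from the answer.

```python
import torch
import torch.nn as nn

# Toy model used purely for illustration; substitute your own network.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# Phase 1: pretrain with Adam, which the answer found avoided the NaN issue.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... usual training loop: loss.backward(), clip_grad_norm_, optimizer.step() ...
torch.save(model.state_dict(), "adam_pretrained.pt")  # hypothetical file name

# Phase 2: initialize from the Adam-pretrained weights, fine-tune with SGD + momentum.
model.load_state_dict(torch.load("adam_pretrained.pt"))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# ... continue the same training loop; gradient clipping can still be applied as above ...
```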
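Translator's note: besides clip_grad_norm_ used in the answers, torch.nn.utils also provides clip_grad_value_, which clamps each gradient element to a fixed range instead of rescaling by the total norm. A minimal sketch; the toy model, dummy batch, and clip_value=0.5 are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 1)                           # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randn(8, 1)       # dummy batch

optimizer.zero_grad()
loss = F.mse_loss(model(x), y)
loss.backward()
# Clamp every gradient element to the range [-0.5, 0.5] in-place.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
optimizer.step()
```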