1. Note the definition of momentum
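The momentum smoothing in PyTorch's BN layers is the reverse of the usual momentum convention. Here `momentum` weights the new observation, not the old estimate, and its default value is momentum=0.1:
\[ \hat{x}_{\text{new}} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_{t} \]
where \(\hat{x}\) is the running estimate and \(x_t\) is the statistic observed on the current batch. The BN layer computes:
\[ y = \frac{x - \mathrm{E}[x]}{\sqrt{\operatorname{Var}[x] + \epsilon}} * \gamma + \beta \]
where γ and β are learnable parameters. In PyTorch, the BN class takes the following parameters: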
```python
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
```
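The following numerical check of the update rule assumes a recent PyTorch version. The input shape and seed are arbitrary. After one training-mode forward pass, running_mean and running_var should equal (1 − momentum) × old + momentum × batch statistic. PyTorch normalizes with the biased batch variance but stores the unbiased variance in running_var, so the sketch computes both.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(num_features=3)  # momentum=0.1 by default
bn.train()
x = torch.randn(8, 3, 4, 4)  # arbitrary (N, C, H, W) input

# Running estimates start at mean = 0 and var = 1.
old_mean = bn.running_mean.clone()
old_var = bn.running_var.clone()

y = bn(x)  # training-mode forward: normalizes with batch stats and updates the buffers

# Per-channel statistics over the (N, H, W) dimensions.
batch_mean = x.mean(dim=(0, 2, 3))
batch_var_biased = x.var(dim=(0, 2, 3), unbiased=False)   # used for normalization
batch_var_unbiased = x.var(dim=(0, 2, 3), unbiased=True)  # stored in running_var

# x_new = (1 - momentum) * x_hat + momentum * x_t
m = bn.momentum
expected_mean = (1 - m) * old_mean + m * batch_mean
expected_var = (1 - m) * old_var + m * batch_var_unbiased
print(torch.allclose(bn.running_mean, expected_mean, atol=1e-5))  # True
print(torch.allclose(bn.running_var, expected_var, atol=1e-5))    # True

# y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta, broadcast per channel
shape = (1, -1, 1, 1)
y_manual = (
    (x - batch_mean.view(shape))
    / torch.sqrt(batch_var_biased.view(shape) + bn.eps)
    * bn.weight.view(shape)
    + bn.bias.view(shape)
)
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```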
2. Note that a BN layer stores statistics, namely the running mean and variance
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
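At test time, after model.eval(), the behavior depends on this flag. If track_running_stats=True, the model uses the running statistics (running_mean and running_var), which are the values accumulated so far through the exponential-decay update rule. Otherwise it still uses statistics estimated from the current batch.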
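The sketch below contrasts the two settings on an arbitrary random input (the shape and scale are illustrative). With track_running_stats=True, eval-mode outputs are computed from running_mean/running_var. With False, those buffers are None and eval mode still normalizes with the current batch. γ and β are left at their initial values (1 and 0), so they are omitted from the manual formulas.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 2, 5, 5) * 3 + 1  # batch whose statistics differ from the defaults

# Case 1: track_running_stats=True (default).
bn_tracked = nn.BatchNorm2d(2)
bn_tracked.train()
bn_tracked(x)        # one training-mode forward updates running_mean / running_var
bn_tracked.eval()
y_eval = bn_tracked(x)

# In eval mode the layer normalizes with the accumulated running statistics.
rm = bn_tracked.running_mean.view(1, -1, 1, 1)
rv = bn_tracked.running_var.view(1, -1, 1, 1)
y_running = (x - rm) / torch.sqrt(rv + bn_tracked.eps)
print(torch.allclose(y_eval, y_running, atol=1e-5))  # True

# Case 2: track_running_stats=False -> no buffers, batch statistics even in eval mode.
bn_untracked = nn.BatchNorm2d(2, track_running_stats=False)
bn_untracked.eval()
y_untracked = bn_untracked(x)
print(bn_untracked.running_mean is None)  # True

mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_batch = (x - mean) / torch.sqrt(var + bn_untracked.eps)
print(torch.allclose(y_untracked, y_batch, atol=1e-5))  # True
```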
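3. BN statistics are updated automatically inside forward() whenever the model is in training mode (after model.train()). They are not updated during gradient computation or backpropagation, and they are not updated by optim.step().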
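The following sketch checks this behavior on a toy Conv2d + BatchNorm2d model, chosen only for illustration. It first runs a training-mode forward under torch.no_grad() with no backward() or step(). It then runs an eval-mode forward followed by backward() and step(). Only the first changes the running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4))
bn = model[1]
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 3, 8, 8)

# Training-mode forward only: no autograd graph, no backward(), no optimizer.step().
model.train()
before = bn.running_mean.clone()
with torch.no_grad():
    model(x)
changed_by_forward = not torch.equal(before, bn.running_mean)
print(changed_by_forward)             # True
print(int(bn.num_batches_tracked))    # 1

# Eval-mode forward + backward + step: parameters change, running statistics do not.
model.eval()
step_snapshot = bn.running_mean.clone()
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()  # updates weight/bias only; running_mean/var are buffers, not parameters
print(torch.equal(step_snapshot, bn.running_mean))  # True
```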
4. Freezing BN and its statistics
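It follows from the analysis above that the correct way to freeze BN during training is to single out the BN layers and switch them back to eval mode. This must happen after model.train(), so that it overrides the training state that call sets.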
Use apply() rather than iterating over the model's children. named_children() yields only the direct submodules and does not recurse into nested ones.
```python
def set_bn_eval(m):
```
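Only the first line of the snippet above survives. Below is a minimal completion that assumes the same class-name check used by fix_bn in section 5. It is paired with a hypothetical nested model to show why named_children() alone misses BN layers.

```python
import torch.nn as nn

def set_bn_eval(m):
    # Module.apply() calls this on every submodule, recursively, including nested ones.
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

# Hypothetical model with one BN layer nested inside an inner Sequential.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.Sequential(nn.BatchNorm2d(8), nn.ReLU()),
    nn.BatchNorm2d(8),
)

model.train()

# Direct children only: the nested BatchNorm2d is not visited.
direct = [name for name, m in model.named_children() if isinstance(m, nn.BatchNorm2d)]
print(direct)  # ['2']

# Must run after model.train(), which puts every module back into training mode.
model.apply(set_bn_eval)
print([m.training for m in model.modules() if isinstance(m, nn.BatchNorm2d)])  # [False, False]
```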
```python
def train(self, mode=True):
```
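The train() override above is also truncated. What follows is one possible completion, written as a sketch. FrozenBNNet, freeze_bn and freeze_bn_affine are hypothetical names, not part of PyTorch. Overriding train() means every later call to model.train() (for example at the start of each epoch) re-applies the BN freeze automatically, and model.eval() still works because it calls train(False). Switching BN to eval mode only freezes the statistics. γ and β still receive gradients unless requires_grad is turned off, which the optional flag handles.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

class FrozenBNNet(nn.Module):  # hypothetical model for illustration
    def __init__(self, freeze_bn=True, freeze_bn_affine=False):
        super().__init__()
        self.freeze_bn = freeze_bn
        self.freeze_bn_affine = freeze_bn_affine
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        x = self.features(x)
        return self.head(x.mean(dim=(2, 3)))

    def train(self, mode=True):
        # Let nn.Module set the training flag on every submodule first...
        super().train(mode)
        # ...then put BN layers back into eval mode so running_mean/var stay fixed.
        if mode and self.freeze_bn:
            for m in self.modules():
                if isinstance(m, BN_TYPES):
                    m.eval()
                    if self.freeze_bn_affine:
                        # Optionally also stop gradient updates to gamma/beta.
                        m.weight.requires_grad_(False)
                        m.bias.requires_grad_(False)
        return self

model = FrozenBNNet()
model.train()
bn = model.features[1]
before = bn.running_mean.clone()
model(torch.randn(4, 3, 16, 16))
print(bn.training, torch.equal(before, bn.running_mean))  # False True
print(model.head.training)                                 # True
```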
5. Fixing/freezing BatchNorm during training may lead to RuntimeError: expected scalar type Half but found Float
```python
import torch
```
Please do:

```python
def fix_bn(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval().half()
```

The reason is as follows. For regular training it is better (performance-wise) to use cuDNN batch norm, which requires its weights to be in fp32, so batch norm modules are kept in fp32 rather than being converted to half. However, cuDNN does not support batch norm backward in eval mode, which is what you are doing. To use the PyTorch implementation for this instead, the weights have to be of the same type as the inputs.
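The following CPU-only sketch shows what fix_bn changes. It emulates the mixed-precision setup that keeps BN in fp32 by converting the BN layers back with .float() after model.half(). This emulation is an assumption, because the text above does not name the tool that performed the conversion. No forward pass is run, since the cuDNN behavior described above requires a GPU.

```python
import torch
import torch.nn as nn

def fix_bn(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        # eval() freezes the statistics; half() makes the BN weights and buffers
        # fp16 so they match fp16 inputs in the non-cuDNN eval-mode path.
        m.eval().half()

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.half()

# Assumption: emulate a mixed-precision setup that keeps BN in fp32 for cuDNN.
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.float()

model.train()
model.apply(fix_bn)  # after model.train(), as in section 4
bn = model[1]
print(bn.training, bn.weight.dtype, bn.running_mean.dtype)  # False torch.float16 torch.float16
```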