PyTorch Learning Series (9): Parameter Initialization
So how is this done in PyTorch?
PyTorch provides a number of parameter initialization functions in the torch.nn.init module:
Example:
init.xavier_uniform_(self.conv1.weight)  # initialize the weight of a single layer, here self.conv1
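A self-contained sketch of initializing a single layer. The layer `conv1 = nn.Conv2d(3, 6, 5)` is a hypothetical example, and pairing Xavier-uniform for the weight with a zero bias is an assumption. The bound check relies on `xavier_uniform_` (with gain 1) sampling from U(-sqrt(6 / (fan_in + fan_out)), +sqrt(6 / (fan_in + fan_out))).

```python
import math

import torch.nn as nn
from torch.nn import init

# Hypothetical layer: 3 input channels, 6 output channels, 5x5 kernel.
conv1 = nn.Conv2d(3, 6, 5)

# Xavier-uniform for the 4-D weight; the 1-D bias is set to zero instead,
# because Xavier needs at least 2 dimensions to compute fan_in / fan_out.
init.xavier_uniform_(conv1.weight)
init.zeros_(conv1.bias)

# For a conv weight: fan_in = in_channels * kH * kW, fan_out = out_channels * kH * kW.
fan_in, fan_out = 3 * 5 * 5, 6 * 5 * 5
bound = math.sqrt(6.0 / (fan_in + fan_out))  # xavier_uniform_ draws from U(-bound, bound)
print(f"max |w| = {conv1.weight.abs().max().item():.4f}, bound = {bound:.4f}")
```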
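The statement above initializes the parameters of one layer of the network. How can we customize the initialization of all parameters in the whole network?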
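import torch.nn as nn
from torch.nn import init

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        init.xavier_uniform_(m.weight)
        if m.bias is not None:
            init.zeros_(m.bias)  # Xavier needs a tensor with >= 2 dims, so the 1-D bias is zeroed instead
net = Net()  # Net: your own nn.Module subclass
net.apply(weights_init)  # apply() recursively visits every submodule of the network and calls the given function on each one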
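`apply` visits every submodule recursively, children before their parent, and finally the module itself. A sketch with a hypothetical two-part `Net` (made up for this illustration) that records the visiting order while applying a class-name check:

```python
import torch.nn as nn
from torch.nn import init


class Net(nn.Module):
    """Hypothetical network used only for this illustration."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
        self.fc = nn.Linear(8, 10)


visited = []


def weights_init(m):
    classname = m.__class__.__name__
    visited.append(classname)  # record the order in which apply() calls this function
    if classname.find('Conv') != -1:
        init.xavier_uniform_(m.weight)
        init.zeros_(m.bias)


net = Net()
net.apply(weights_init)
print(visited)  # → ['Conv2d', 'ReLU', 'Sequential', 'Linear', 'Net']
```

Because the root module is visited last, a function passed to `apply` sees every leaf layer before any container that holds it.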
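Accessing members whose names start with an underscore (such as m.__class__.__name__) is discouraged: they are internal, and users are not notified if they change. A better approach is to check whether a module is of a given type: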
def weights_init(m):
    if isinstance(m, nn.Conv2d):
        init.xavier_uniform_(m.weight)
        if m.bias is not None:
            init.zeros_(m.bias)  # the 1-D bias cannot take Xavier init
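A sketch of the `isinstance` version extended to `nn.Linear`; the layer sizes and the Xavier-uniform plus zero-bias choice are illustrative assumptions. One practical difference from substring matching: `classname.find('Conv')` also matches `ConvTranspose2d`, whereas `isinstance(m, nn.Conv2d)` does not, since `nn.ConvTranspose2d` is not a subclass of `nn.Conv2d`.

```python
import torch
import torch.nn as nn
from torch.nn import init


def weights_init(m):
    # Type checks instead of class-name strings: they do not
    # accidentally match e.g. ConvTranspose2d.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        init.xavier_uniform_(m.weight)
        if m.bias is not None:  # layers built with bias=False have no bias
            init.zeros_(m.bias)


# Hypothetical model, sizes chosen only so the shapes line up.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 3, padding=1),  # skipped: not an nn.Conv2d
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 10, bias=False),
)
model.apply(weights_init)

x = torch.randn(2, 3, 8, 8)
print(model(x).shape)
```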
In practice it is usually written like this:
def weights_init(m):
    classname = m.__class__.__name__
    # print(classname)
    if classname.find('Conv3d') != -1:
        init.xavier_normal_(m.weight)
    elif classname.find('Linear') != -1:
        init.xavier_normal_(m.weight)
model = C3D()  # C3D: the user's own 3-D convolutional network
model.apply(weights_init)
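Usage examples for the torch.nn.init functions (each one fills the tensor in place):
import torch
import torch.nn as nn
gain = nn.init.calculate_gain('leaky_relu')  # recommended gain for the given nonlinearity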
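`calculate_gain` returns the recommended scaling factor for a nonlinearity. A short sketch printing the documented values; for `leaky_relu` the gain is sqrt(2 / (1 + negative_slope²)), with a default slope of 0.01 when no second argument is given:

```python
import math

import torch.nn as nn

for name in ['linear', 'sigmoid', 'tanh', 'relu']:
    print(name, nn.init.calculate_gain(name))  # 1, 1, 5/3, sqrt(2)

# leaky_relu: sqrt(2 / (1 + negative_slope ** 2)); default negative_slope is 0.01
print('leaky_relu', nn.init.calculate_gain('leaky_relu'))
print('leaky_relu(0.2)', nn.init.calculate_gain('leaky_relu', 0.2))
```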
w = torch.empty(3, 5)
nn.init.uniform_(w)  # U(a, b), default U(0, 1)
w = torch.empty(3, 5)
nn.init.normal_(w)  # N(mean, std^2), default N(0, 1)
nn.init.constant_(w, 0.3)  # every entry set to 0.3
# torch.nn.init.eye_(tensor): identity matrix, 2-D tensors only
w = torch.empty(3, 5)
nn.init.eye_(w)
w = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w)  # Dirac delta, for 3-D to 5-D tensors
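`dirac_` places a 1 at the kernel center of weight[i, i] and zeros elsewhere, so a convolution with equal input and output channel counts copies each input channel to the matching output channel (`eye_` is the 2-D analogue for linear layers). A minimal check, assuming a hypothetical 3x3 `Conv2d` with 4 channels, `padding=1` and no bias:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
nn.init.dirac_(conv.weight)  # weight[i, i, 1, 1] = 1, every other entry 0

x = torch.randn(1, 4, 6, 6)
with torch.no_grad():
    y = conv(x)
# The layer acts as an identity map (small tolerance for backend arithmetic).
print(torch.allclose(y, x, atol=1e-5))
```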
w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
w = torch.empty(3, 5)
nn.init.xavier_normal_(w)
w = torch.empty(3, 5)
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
w = torch.empty(3, 5)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
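For a 3 x 5 tensor, `orthogonal_` (default `gain=1`) makes the rows orthonormal, so W Wᵀ is the 3 x 3 identity; when rows outnumber columns, the columns are orthonormal instead. A quick check of both cases:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
# rows <= cols: the rows are orthonormal, so w @ w.T is the identity
print(torch.allclose(w @ w.T, torch.eye(3), atol=1e-5))

w_tall = nn.init.orthogonal_(torch.empty(5, 3))
# rows > cols: the columns are orthonormal, so w_tall.T @ w_tall is the identity
print(torch.allclose(w_tall.T @ w_tall, torch.eye(3), atol=1e-5))
```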
w = torch.empty(3, 5)
nn.init.sparse_(w, sparsity=0.1)
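`sparse_` accepts 2-D tensors only. It draws every entry from N(0, std²) (std defaults to 0.01) and then zeroes ceil(sparsity × rows) entries in each column. For the 3 x 5 example with `sparsity=0.1`, that works out to one zero per column:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
rows, cols, sparsity = 3, 5, 0.1
w = torch.empty(rows, cols)
nn.init.sparse_(w, sparsity=sparsity)

zeros_per_column = (w == 0).sum(dim=0)  # count of zeroed entries in each column
print(zeros_per_column)
print(math.ceil(sparsity * rows))       # expected zeros per column
```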
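Convolutional layers
Gaussian initialization
The initial weights are sampled from a Gaussian distribution with mean 0 and variance 1. The corresponding PyTorch function is:
torch.nn.init.normal_(tensor, mean=0.0, std=1.0)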
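Unscaled N(0, 1) weights ignore the layer width, so in a deep ReLU network the activations grow by roughly sqrt(n / 2) per layer. A small sketch compares this with the Kaiming scheme. It assumes 10 bias-free linear layers of width 256 and a random input batch, with sizes chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
width, depth = 256, 10          # hypothetical network size
x = torch.randn(512, width)     # hypothetical input batch


def final_std(init_fn):
    """Push x through `depth` bias-free Linear+ReLU layers initialized with init_fn."""
    h = x
    for _ in range(depth):
        w = torch.empty(width, width)   # (out_features, in_features)
        init_fn(w)
        h = torch.relu(h @ w.T)
    return h.std().item()


plain = final_std(lambda w: nn.init.normal_(w, mean=0.0, std=1.0))
kaiming = final_std(lambda w: nn.init.kaiming_normal_(w, mode='fan_in', nonlinearity='relu'))
print(f"N(0, 1): {plain:.3e}   kaiming_normal_: {kaiming:.3f}")
```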
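Kaiming Gaussian initialization
A weight initialization method for convolutional layers proposed by Kaiming He et al. [1]. It chooses the weight variance so that each convolutional layer preserves the variance of its signal (an output-to-input variance ratio of 1 from layer to layer); see [1] for the mathematical derivation. The weights are initialized as follows: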
$$W_l \sim \mathcal{N}\left(0,\ \sigma^2\right), \qquad \sigma = \sqrt{\frac{2}{(1+a^2)\,n_l}}$$
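where $a$ is the slope of the negative half-axis of the ReLU or Leaky ReLU ($a = 0$ for ReLU), and $n_l$ is the input dimensionality of layer $l$, i.e. $n_l = k^2 \times c$, with $k$ the kernel side length and $c$ the number of input channels.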
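In PyTorch, the corresponding function is:
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')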
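In these arguments, tensor is a torch.Tensor; a is the negative-half-axis slope of the ReLU; mode chooses whether the variance is preserved in the forward pass ('fan_in') or in the backward pass ('fan_out'); nonlinearity can be 'relu' or 'leaky_relu'.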
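A numerical check of the formula, assuming a hypothetical conv weight of shape (128, 64, 3, 3). The sample standard deviation produced by `kaiming_normal_` should match sqrt(2 / ((1 + a²) n)), where n = fan_in = 64 × 3 × 3 in 'fan_in' mode and n = fan_out = 128 × 3 × 3 in 'fan_out' mode:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
out_c, in_c, k = 128, 64, 3                               # hypothetical layer shape
fans = {'fan_in': k * k * in_c, 'fan_out': k * k * out_c}  # 576 and 1152

results = []
for mode, n in fans.items():
    for a in (0.0, 0.1):
        w = torch.empty(out_c, in_c, k, k)
        nn.init.kaiming_normal_(w, a=a, mode=mode, nonlinearity='leaky_relu')
        expected = math.sqrt(2.0 / ((1 + a ** 2) * n))
        sample = w.std().item()
        results.append((mode, a, sample, expected))
        print(f"{mode:7s} a={a}: sample std {sample:.5f}, formula {expected:.5f}")
```

With a = 0 the 'leaky_relu' setting gives the same gain as 'relu', which is why the default call works for plain ReLU networks.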
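Xavier Gaussian initialization
Glorot normal initialization, also called Xavier normal initialization, draws the parameters from a normal distribution with mean 0 and standard deviation sqrt(2 / (fan_in + fan_out)), where fan_in and fan_out are the numbers of input and output units of the weight tensor. This initialization likewise aims to keep the variance of a layer's inputs and outputs unchanged, but the original paper [2] derives it for linear activations; it works well with tanh but is not suited to ReLU.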
$$\mathrm{std} = \mathrm{gain} \times \sqrt{\frac{2}{\mathrm{fan\_in} + \mathrm{fan\_out}}}$$
In PyTorch, the corresponding function is:
torch.nn.init.xavier_normal_(tensor, gain=1.0)
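A quick check that `xavier_normal_` follows the formula, assuming a hypothetical `nn.Linear`-style weight of shape (300, 500), i.e. fan_out = 300 and fan_in = 500. It is run once with gain 1 ('linear') and once with the tanh gain of 5/3:

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
fan_out, fan_in = 300, 500
w = torch.empty(fan_out, fan_in)  # nn.Linear weight layout: (out_features, in_features)

results = {}
for name in ('linear', 'tanh'):
    gain = nn.init.calculate_gain(name)
    nn.init.xavier_normal_(w, gain=gain)
    expected = gain * math.sqrt(2.0 / (fan_in + fan_out))
    results[name] = (w.std().item(), expected)
    print(f"gain={gain:.3f}: sample std {results[name][0]:.4f}, formula {expected:.4f}")
```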
BatchNorm layers
Recap: BatchNorm
Initialization
The scale factor $\gamma$ is initialized to 1 and the shift factor $\beta$ to 0.
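In PyTorch the scale γ is stored as a BatchNorm layer's `weight` and the shift β as its `bias`, and a freshly constructed `nn.BatchNorm2d` already starts at γ = 1, β = 0. A sketch that resets them explicitly through `apply` (useful if another scheme has touched them); the small model is hypothetical:

```python
import torch
import torch.nn as nn
from torch.nn import init


def bn_init(m):
    if isinstance(m, nn.BatchNorm2d):
        init.ones_(m.weight)  # gamma (scale) = 1
        init.zeros_(m.bias)   # beta (shift) = 0


model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
init.normal_(model[1].weight)  # disturb gamma so the reset is visible
model.apply(bn_init)
print(model[1].weight.data, model[1].bias.data)
```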
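Fully connected layers
For fully connected layers, besides the Gaussian-based initializations used for convolutional layers, the weights can also be drawn from a uniform distribution or simply set to a constant.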
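A sketch of both options for `nn.Linear` layers. The range U(-0.1, 0.1) for the weights and the constant 0 for the biases are illustrative assumptions, and the small MLP is hypothetical:

```python
import torch
import torch.nn as nn
from torch.nn import init


def fc_init(m):
    if isinstance(m, nn.Linear):
        init.uniform_(m.weight, a=-0.1, b=0.1)  # uniform distribution U(-0.1, 0.1)
        init.constant_(m.bias, 0.0)             # constant bias


mlp = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
mlp.apply(fc_init)

linears = [m for m in mlp if isinstance(m, nn.Linear)]
print([m.weight.abs().max().item() for m in linears])
```

For comparison, an `nn.Linear` left at its default initialization already draws its weights from U(-1/sqrt(in_features), 1/sqrt(in_features)) (via `kaiming_uniform_` with a = sqrt(5)).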
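Other initialization methods not covered in detail here include:
Orthogonal: initialize with a random orthogonal matrix (torch.nn.init.orthogonal_).
Sparse: initialize with a sparse matrix (torch.nn.init.sparse_).
TruncatedNormal: a truncated Gaussian distribution. It is similar to the Gaussian, but samples lying more than two standard deviations from the mean are discarded and redrawn, producing a truncated distribution. PyTorch seemed to lack an implementation when this was written; newer releases provide torch.nn.init.trunc_normal_.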
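A sketch of truncated-normal initialization with `torch.nn.init.trunc_normal_`. Its bounds `a` and `b` are absolute values rather than multiples of `std`, so "two standard deviations" has to be written out explicitly. The std of 0.02 and the 256 x 256 shape are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
std = 0.02                    # hypothetical standard deviation
w = torch.empty(256, 256)
# a and b are absolute bounds, so +/- 2 std is passed explicitly
nn.init.trunc_normal_(w, mean=0.0, std=std, a=-2 * std, b=2 * std)
print(w.min().item(), w.max().item())
```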
References
[1] Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification — He, K. et al. (2015)
[2] Understanding the difficulty of training deep feedforward neural networks — Glorot, X. & Bengio, Y. (2010)
# -*- coding: utf-8 -*-
from __future__ import division

"""
Creates a ResNeXt Model as defined in:
Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2016).
Aggregated residual transformations for deep neural networks.
arXiv preprint arXiv:1611.05431.
"""

__author__ = "Pau Rodríguez López, ISELAB, CVC-UAB"
__email__ = "..."

import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init
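class ResNeXtBottleneck(nn.Module):
    def __init__(self, in_channels, out_channels, stride, cardinality, base_width, widen_factor):
        """ Constructor
        Args:
            in_channels: input channel dimensionality
            out_channels: output channel dimensionality
            stride: conv stride. Replaces pooling layer.
            cardinality: num of convolution groups.
            base_width: base number of channels in each group.
            widen_factor: factor to reduce the input dimensionality before convolution.
        """
        super(ResNeXtBottleneck, self).__init__()
        width_ratio = out_channels / (widen_factor * 64.)
        D = cardinality * int(base_width * width_ratio)
        # 1x1 reduce -> 3x3 grouped conv -> 1x1 expand (the standard ResNeXt bottleneck)
        self.conv_reduce = nn.Conv2d(in_channels, D, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn_reduce = nn.BatchNorm2d(D)
        self.conv_conv = nn.Conv2d(D, D, kernel_size=3, stride=stride, padding=1, groups=cardinality, bias=False)
        self.bn = nn.BatchNorm2d(D)
        self.conv_expand = nn.Conv2d(D, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn_expand = nn.BatchNorm2d(out_channels)

        self.shortcut = nn.Sequential()
        if in_channels != out_channels:
            self.shortcut.add_module('shortcut_conv',
                                     nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0,
                                               bias=False))
            self.shortcut.add_module('shortcut_bn', nn.BatchNorm2d(out_channels))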
    def forward(self, x):
        bottleneck = self.conv_reduce(x)
        bottleneck = F.relu(self.bn_reduce(bottleneck), inplace=True)
        bottleneck = self.conv_conv(bottleneck)
        bottleneck = F.relu(self.bn(bottleneck), inplace=True)
        bottleneck = self.conv_expand(bottleneck)
        bottleneck = self.bn_expand(bottleneck)
        residual = self.shortcut(x)
        return F.relu(residual + bottleneck, inplace=True)
class CifarResNeXt(nn.Module):
    def __init__(self, cardinality, depth, nlabels, base_width, widen_factor=4):
        """ Constructor
        Args:
            cardinality: number of convolution groups.
            depth: number of layers.
            nlabels: number of classes
            base_width: base number of channels in each group.
            widen_factor: factor to adjust the channel dimensionality
        """
        super(CifarResNeXt, self).__init__()
        self.cardinality = cardinality
        self.depth = depth
        self.block_depth = (self.depth - 2) // 9
        self.base_width = base_width
        self.widen_factor = widen_factor
        self.nlabels = nlabels
        self.output_size = 64
        self.stages = [64, 64 * self.widen_factor, 128 * self.widen_factor, 256 * self.widen_factor]

        self.bn_1 = nn.BatchNorm2d(64)
        self.stage_1 = self.block('stage_1', self.stages[0], self.stages[1], 1)
        self.stage_2 = self.block('stage_2', self.stages[1], self.stages[2], 2)
        self.stage_3 = self.block('stage_3', self.stages[2], self.stages[3], 2)
        self.classifier = nn.Linear(self.stages[3], nlabels)
        init.kaiming_normal_(self.classifier.weight)

        for key in self.state_dict():