Initialization of convolution operations in PyTorch (kaiming_uniform_ explained)


Abstract:
I recently submitted a paper, and one of the reviewers asked: are the initialization conditions of the networks the same across the different configurations, and how exactly are they initialized?
I had never paid attention to this before. By default, torch initializes the convolution kernel parameters on its own; this post walks through the initialization process of torch's convolution operations in detail.
1. The convolution classes in PyTorch
In the PyCharm IDE, hold Ctrl and click on Conv2d to jump into the source of torch's convolution implementations (conv.py).
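If you are not using PyCharm, the same file can be located straight from Python, since every imported module exposes its source path:

import torch.nn.modules.conv as conv_module
print(conv_module.__file__)  # prints the path to conv.py inside your torch installation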
The modules most frequently used when building networks are the following:
class _ConvNd(Module):
class Conv1d(_ConvNd):
class Conv2d(_ConvNd):
class Conv3d(_ConvNd):
class _ConvTransposeNd(_ConvNd):
class ConvTranspose1d(_ConvTransposeNd):
class ConvTranspose2d(_ConvTransposeNd):
class ConvTranspose3d(_ConvTransposeNd):
As you can see, the commonly used convolution classes all share the same parent class, _ConvNd(Module). Opening class Conv2d(_ConvNd) itself reveals no concrete parameter-initialization method, so the initialization logic presumably lives in the parent class _ConvNd(Module).
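Before reading the parent class, this guess can be confirmed programmatically: the method resolution order shows that Conv2d inherits from _ConvNd, and reset_parameters is defined only on the parent. A minimal check (the exact MRO printout may vary slightly between torch versions):

import torch.nn as nn
from torch.nn.modules.conv import _ConvNd

print(nn.Conv2d.__mro__)                         # Conv2d -> _ConvNd -> Module -> ...
print('reset_parameters' in nn.Conv2d.__dict__)  # False: not defined on Conv2d itself
print('reset_parameters' in _ConvNd.__dict__)    # True: inherited from the parent class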
2. The parent class of PyTorch's convolution operations
Below is the source of the parent class _ConvNd; the method that initializes the parameters is def reset_parameters(self) -> None:
class _ConvNd(Module):

    __constants__ = ['stride', 'padding', 'dilation', 'groups',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']
    __annotations__ = {'bias': Optional[torch.Tensor]}

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        ...

    _in_channels: int
    out_channels: int
    kernel_size: Tuple[int, ...]
    stride: Tuple[int, ...]
    padding: Tuple[int, ...]
    dilation: Tuple[int, ...]
    transposed: bool
    output_padding: Tuple[int, ...]
    groups: int
    padding_mode: str
    weight: Tensor
    bias: Optional[Tensor]

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Tuple[int, ...],
                 stride: Tuple[int, ...],
                 padding: Tuple[int, ...],
                 dilation: Tuple[int, ...],
                 transposed: bool,
                 output_padding: Tuple[int, ...],
                 groups: int,
                 bias: bool,
                 padding_mode: str) -> None:
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        valid_padding_modes = {'zeros', 'reflect', 'replicate', 'circular'}
        if padding_mode not in valid_padding_modes:
            raise ValueError("padding_mode must be one of {}, but got padding_mode='{}'".format(
                valid_padding_modes, padding_mode))
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        # `_reversed_padding_repeated_twice` is the padding to be passed to
        # `F.pad` if needed (e.g., for non-zero padding types that are
        # implemented as two ops: padding + conv). `F.pad` accepts paddings in
        # reverse order than the dimension.
        self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def extra_repr(self):
        s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        if self.padding_mode != 'zeros':
            s += ', padding_mode={padding_mode}'
        return s.format(**self.__dict__)

    def __setstate__(self, state):
        super(_ConvNd, self).__setstate__(state)
        if not hasattr(self, 'padding_mode'):
            self.padding_mode = 'zeros'
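Note that __init__ above ends with self.reset_parameters(), so simply constructing a layer already initializes it; there is no separate initialization call the user has to make. A quick sanity check, using an arbitrary layer shape (3 input channels, 16 output channels, 3x3 kernel) as an example:

import math
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)   # reset_parameters() has already run here
fan_in = 3 * 3 * 3                       # in_channels/groups * kernel_h * kernel_w = 27
bound = 1 / math.sqrt(fan_in)            # the bias bound computed in reset_parameters
print(conv.bias.min().item() >= -bound)  # True
print(conv.bias.max().item() <= bound)   # True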
3. def reset_parameters(self) -> None:
The default initialization for the convolution operations:
def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
The parameters in this class are initialized with Kaiming initialization, a method designed for ReLU-family activations and proposed by Kaiming He. By default, PyTorch initializes convolution-layer weights with the Kaiming uniform distribution. Quoting the official documentation:
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from U(-bound, bound) where bound = gain × sqrt(3 / fan_mode).
Also known as He initialization.
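It is worth unpacking what a=math.sqrt(5) actually yields. For 'leaky_relu', calculate_gain returns gain = sqrt(2 / (1 + a^2)), so here gain = sqrt(2 / (1 + 5)) = sqrt(1/3). Then std = gain / sqrt(fan_in) = 1 / sqrt(3 * fan_in), and bound = sqrt(3) * std = 1 / sqrt(fan_in). In other words, the default weight initialization samples from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which matches the bias bound computed in reset_parameters.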
3.1 Parameter initialization for the convolution kernel:
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
The source of this init.kaiming_uniform_ function is as follows:
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    r"""Fills the input `Tensor` with values according to the method
    described in `Delving deep into rectifiers: Surpassing human-level
    performance on ImageNet classification` - He, K. et al. (2015), using a
    uniform distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-\text{bound}, \text{bound})` where

    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

    Also known as He initialization.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function (`nn.functional` name),
            recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    """
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
The full set of arguments behind torch's default kernel initialization is therefore:
init.kaiming_uniform_(self.weight, a=math.sqrt(5), mode='fan_in', nonlinearity='leaky_relu')
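Putting the pieces together, the internal computation can be reproduced by hand and checked against what kaiming_uniform_ actually does. A small sketch (the weight shape 16x3x3x3 is just an example; _calculate_fan_in_and_fan_out is the same private helper used in reset_parameters):

import math
import torch
import torch.nn.init as init

w = torch.empty(16, 3, 3, 3)                            # (out_channels, in_channels, kH, kW)
fan_in, _ = init._calculate_fan_in_and_fan_out(w)       # 3 * 3 * 3 = 27
gain = init.calculate_gain('leaky_relu', math.sqrt(5))  # sqrt(2 / (1 + 5)) = sqrt(1/3)
bound = math.sqrt(3.0) * gain / math.sqrt(fan_in)       # = 1 / sqrt(fan_in) ≈ 0.1925

init.kaiming_uniform_(w, a=math.sqrt(5))
print(w.min().item() >= -bound, w.max().item() <= bound)  # True True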
The remaining helper functions used inside init.kaiming_uniform_ are not analyzed in depth here, but briefly:
_calculate_correct_fan(tensor, mode): computes either fan_in (the number of input neurons) or fan_out (the number of output neurons) of the current layer, depending on whether mode is 'fan_in' or 'fan_out'. Here mode='fan_in', so it computes the layer's fan_in.
calculate_gain(nonlinearity, a): for a given non-linear function, returns the recommended gain value, which is just a constant taken from a fixed lookup table (see the examples below).
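The gain values recommended for common non-linearities can be queried directly from calculate_gain, for example:

import math
from torch.nn import init

print(init.calculate_gain('linear'))                    # 1.0
print(init.calculate_gain('tanh'))                      # 5/3 ≈ 1.667
print(init.calculate_gain('relu'))                      # sqrt(2) ≈ 1.414
print(init.calculate_gain('leaky_relu', 0.01))          # sqrt(2 / (1 + 0.01**2)) ≈ 1.414
print(init.calculate_gain('leaky_relu', math.sqrt(5)))  # sqrt(1/3) ≈ 0.577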
