Initialization of convolution operations in PyTorch (kaiming_uniform_ explained)


Abstract:
I recently submitted a paper, and one of the reviewers asked: are the initialization conditions of the networks the same across the different configurations, and how exactly are they initialized?
I had never paid attention to this before. By default, torch initializes the convolution kernel parameters on its own; this post walks through the initialization process of torch's convolution operations in detail.
1. The convolution classes in PyTorch
In the PyCharm IDE, hold Ctrl and click on Conv2d to jump into the source of torch's convolution implementations (conv.py).
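If you are not using PyCharm, the same file can be located straight from Python, since every imported module exposes its source path:

import torch.nn.modules.conv as conv_module
print(conv_module.__file__)  # prints the path to conv.py inside your torch installation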
The modules most frequently used when building networks are the following:
class _ConvNd(Module):
class Conv1d(_ConvNd):
class Conv2d(_ConvNd):
class Conv3d(_ConvNd):
class _ConvTransposeNd(_ConvNd):
class ConvTranspose1d(_ConvTransposeNd):
class ConvTranspose2d(_ConvTransposeNd):
class ConvTranspose3d(_ConvTransposeNd):
As you can see, the commonly used convolution classes all share the same parent class, _ConvNd(Module). Opening class Conv2d(_ConvNd) itself reveals no concrete parameter-initialization method, so the initialization logic presumably lives in the parent class _ConvNd(Module).
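Before reading the parent class, this guess can be confirmed programmatically: the method resolution order shows that Conv2d inherits from _ConvNd, and reset_parameters is defined only on the parent. A minimal check (the exact MRO printout may vary slightly between torch versions):

import torch.nn as nn
from torch.nn.modules.conv import _ConvNd

print(nn.Conv2d.__mro__)                         # Conv2d -> _ConvNd -> Module -> ...
print('reset_parameters' in nn.Conv2d.__dict__)  # False: not defined on Conv2d itself
print('reset_parameters' in _ConvNd.__dict__)    # True: inherited from the parent class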
2. The parent class of PyTorch's convolution operations
Below is the source of the parent class _ConvNd; the method that initializes the parameters is def reset_parameters(self) -> None:
class _ConvNd(Module):

    __constants__ = ['stride', 'padding', 'dilation', 'groups',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']
    __annotations__ = {'bias': Optional[torch.Tensor]}

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        ...

    _in_channels: int
    out_channels: int
    kernel_size: Tuple[int, ...]
    stride: Tuple[int, ...]
    padding: Tuple[int, ...]
    dilation: Tuple[int, ...]
    transposed: bool
    output_padding: Tuple[int, ...]
    groups: int
    padding_mode: str
    weight: Tensor
    bias: Optional[Tensor]

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Tuple[int, ...],
                 stride: Tuple[int, ...],
                 padding: Tuple[int, ...],
                 dilation: Tuple[int, ...],
                 transposed: bool,
                 output_padding: Tuple[int, ...],
                 groups: int,
                 bias: bool,
                 padding_mode: str) -> None:
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        valid_padding_modes = {'zeros', 'reflect', 'replicate', 'circular'}
        if padding_mode not in valid_padding_modes:
            raise ValueError("padding_mode must be one of {}, but got padding_mode='{}'".format(
                valid_padding_modes, padding_mode))
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        # `_reversed_padding_repeated_twice` is the padding to be passed to
        # `F.pad` if needed (e.g., for non-zero padding types that are
        # implemented as two ops: padding + conv). `F.pad` accepts paddings in
        # reverse order than the dimension.
        self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def extra_repr(self):
        s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        if self.padding_mode != 'zeros':
            s += ', padding_mode={padding_mode}'
        return s.format(**self.__dict__)

    def __setstate__(self, state):
        super(_ConvNd, self).__setstate__(state)
        if not hasattr(self, 'padding_mode'):
            self.padding_mode = 'zeros'
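Note that __init__ above ends with self.reset_parameters(), so simply constructing a layer already initializes it; there is no separate initialization call the user has to make. A quick sanity check, using an arbitrary layer shape (3 input channels, 16 output channels, 3x3 kernel) as an example:

import math
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)   # reset_parameters() has already run here
fan_in = 3 * 3 * 3                       # in_channels/groups * kernel_h * kernel_w = 27
bound = 1 / math.sqrt(fan_in)            # the bias bound computed in reset_parameters
print(conv.bias.min().item() >= -bound)  # True
print(conv.bias.max().item() <= bound)   # True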
3. def reset_parameters(self) -> None:
The default initialization for the convolution operations:
def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
The parameters in this class are initialized with Kaiming initialization, a method designed for ReLU-family activations and proposed by Kaiming He. By default, PyTorch initializes convolution-layer weights with the Kaiming uniform distribution. Quoting the official documentation:
Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from U(-bound, bound) where bound = gain × sqrt(3 / fan_mode).
Also known as He initialization.
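It is worth unpacking what a=math.sqrt(5) actually yields. For 'leaky_relu', calculate_gain returns gain = sqrt(2 / (1 + a^2)), so here gain = sqrt(2 / (1 + 5)) = sqrt(1/3). Then std = gain / sqrt(fan_in) = 1 / sqrt(3 * fan_in), and bound = sqrt(3) * std = 1 / sqrt(fan_in). In other words, the default weight initialization samples from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which matches the bias bound computed in reset_parameters.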
3.1 Parameter initialization for the convolution kernel:
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
The source of this init.kaiming_uniform_ function is as follows:
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    r"""Fills the input `Tensor` with values according to the method
    described in `Delving deep into rectifiers: Surpassing human-level
    performance on ImageNet classification` - He, K. et al. (2015), using a
    uniform distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-\text{bound}, \text{bound})` where

    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

    Also known as He initialization.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function (`nn.functional` name),
            recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    """
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
The full set of arguments behind torch's default kernel initialization is therefore:
init.kaiming_uniform_(self.weight, a=math.sqrt(5), mode='fan_in', nonlinearity='leaky_relu')
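Putting the pieces together, the internal computation can be reproduced by hand and checked against what kaiming_uniform_ actually does. A small sketch (the weight shape 16x3x3x3 is just an example; _calculate_fan_in_and_fan_out is the same private helper used in reset_parameters):

import math
import torch
import torch.nn.init as init

w = torch.empty(16, 3, 3, 3)                            # (out_channels, in_channels, kH, kW)
fan_in, _ = init._calculate_fan_in_and_fan_out(w)       # 3 * 3 * 3 = 27
gain = init.calculate_gain('leaky_relu', math.sqrt(5))  # sqrt(2 / (1 + 5)) = sqrt(1/3)
bound = math.sqrt(3.0) * gain / math.sqrt(fan_in)       # = 1 / sqrt(fan_in) ≈ 0.1925

init.kaiming_uniform_(w, a=math.sqrt(5))
print(w.min().item() >= -bound, w.max().item() <= bound)  # True True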
The remaining helper functions used inside init.kaiming_uniform_ are not analyzed in depth here, but briefly:
_calculate_correct_fan(tensor, mode): computes either fan_in (the number of input neurons) or fan_out (the number of output neurons) of the current layer, depending on whether mode is 'fan_in' or 'fan_out'. Here mode='fan_in', so it computes the layer's fan_in.
calculate_gain(nonlinearity, a): for a given non-linear function, returns the recommended gain value, which is just a constant taken from a fixed lookup table (see the examples below).
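The gain values recommended for common non-linearities can be queried directly from calculate_gain, for example:

import math
from torch.nn import init

print(init.calculate_gain('linear'))                    # 1.0
print(init.calculate_gain('tanh'))                      # 5/3 ≈ 1.667
print(init.calculate_gain('relu'))                      # sqrt(2) ≈ 1.414
print(init.calculate_gain('leaky_relu', 0.01))          # sqrt(2 / (1 + 0.01**2)) ≈ 1.414
print(init.calculate_gain('leaky_relu', math.sqrt(5)))  # sqrt(1/3) ≈ 0.577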
