Deep Learning Network Layers: Pooling
The most common pooling operations are max pooling and average pooling:

Max pooling
Forward pass: the maximum value within each image region is taken as the pooled output for that region.
Backward pass: the gradient flows back only to the position of the maximum; every other position receives zero gradient.

Average pooling
Forward pass: the mean of each image region is taken as the pooled output for that region.
Backward pass: the gradient is divided by the region size and distributed equally to every position.
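The forward and backward rules above can be sketched in a few lines of plain Python; the helper names below are mine, and each function operates on a single flattened pooling window rather than a full feature map:

```python
# Minimal sketch (not the caffe implementation) of max and average pooling
# on one flattened pooling window, together with their backward rules.

def max_pool_region(region):
    """Forward: output the max; remember its index for the backward pass."""
    idx = max(range(len(region)), key=lambda i: region[i])
    return region[idx], idx

def max_pool_backward(grad_out, idx, size):
    """Backward: route the whole gradient to the max position, zeros elsewhere."""
    grad = [0.0] * size
    grad[idx] = grad_out
    return grad

def avg_pool_region(region):
    """Forward: output the mean of the window."""
    return sum(region) / len(region)

def avg_pool_backward(grad_out, size):
    """Backward: split the gradient evenly, 1/n to each position."""
    return [grad_out / size] * size

region = [1.0, 3.0, 2.0, 0.0]           # a flattened 2x2 window
y, idx = max_pool_region(region)         # y = 3.0, idx = 1
print(max_pool_backward(1.0, idx, 4))    # [0.0, 1.0, 0.0, 0.0]
print(avg_pool_region(region))           # 1.5
print(avg_pool_backward(1.0, 4))         # [0.25, 0.25, 0.25, 0.25]
```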
For average pooling with input $X = (x_1, x_2, \ldots, x_n)$, the output is $f(X) = \frac{1}{n}\sum_{i=1}^{n} x_i$, so

$$\frac{\partial f}{\partial x_j}(X) = \frac{\partial}{\partial x_j}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial x_i}{\partial x_j} = \frac{1}{n}\sum_{i=1}^{n}\delta(i-j) = \frac{1}{n},$$

where $\delta(i-j) = 1$ when $i = j$ and $0$ otherwise.

The Stochastic Pooling paper proposes a simple and effective way to regularize CNNs: it reduces the overfitting seen with max pooling and improves generalization. For each pooling region, one value is randomly selected as the output according to a multinomial distribution built from the region's inputs. The procedure differs slightly between training and testing.
Training phase
1. Forward pass
(1) Normalize the inputs to the pooling region, giving each activation a probability $p_i = \frac{a_i}{\sum_{k \in R_j} a_k}$.
(2) Sample one location from the multinomial distribution defined by $p$ and output the activation at that location.
2. Backward pass
As with max pooling, the gradient propagates only through the selected location; all other locations receive zero.
Test phase
Using stochastic pooling at test time would inject noise into the predictions and hurt performance. Instead, the activations are averaged, weighted by their normalized probabilities. This performs somewhat better than plain average pooling: on average it approximates average pooling, while locally it follows the max-pooling principle.
Interpretation and analysis
Types of pooling
Max Pooling
Average Pooling (also called mean pooling)
Stochastic Pooling
Probability-weighted pooling can be viewed as a form of model averaging: each way of choosing within the pooling regions corresponds to a different model. Because training introduces randomness, the network's connection pattern changes from step to step, effectively producing new models; at test time all of these models are used at once via the weighted average. If the network has $d$ pooling layers and each pooling kernel covers $n$ positions, there are $n^d$ possible models. This is far more model diversity than dropout provides (dropout with rate 0.5 corresponds to $n = 2$).
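The model count above can be checked directly; the layer count and kernel size here are illustrative:

```python
# Each of d pooling layers independently picks one of n positions per region,
# so stochastic pooling sees n**d distinct "models" during training.
d = 3   # illustrative: three pooling layers
n = 4   # illustrative: 2x2 pooling regions -> 4 choices per region
stochastic_models = n ** d   # 4^3 = 64
dropout_models = 2 ** d      # dropout at rate 0.5 behaves like n = 2
print(stochastic_models, dropout_models)  # 64 8
```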
The original post compares the error rates of the three pooling methods on CIFAR-10.
Choosing a pooling method in practice
We usually use Max Pooling, because it can learn the edge and texture structure of an image, which Average Pooling cannot. Max Pooling is typically used to reduce the variance of the estimate; where variance matters little, either Max Pooling or Average Pooling can be chosen freely. Average Pooling is used to reduce the shift in the estimated mean, and in some cases it can perform slightly better than Max Pooling.
Average pooling weakens strong activations, while max pooling keeps only the strongest activation and is prone to overfitting.
Although Stochastic Pooling may achieve good results in theory, it requires repeated experimentation in practice; used carelessly it can hurt performance, so it is not a routine choice.
Depending on whether pooling is applied to non-overlapping regions of the image (this differs from convolution), pooling is divided into general pooling (General Pooling) and overlapping pooling (Overlapping Pooling). Common settings are filter size F = 2 with stride S = 2, or F = 3 with S = 2 (overlapping pooling); pooling layers usually do not need padding.
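The standard output-size formula makes the two settings concrete: with input width W, filter F, stride S, and no padding, the output width is floor((W − F) / S) + 1. A quick sketch (helper name is mine):

```python
def pooled_size(W, F, S):
    """Output width of a pooling layer with no padding."""
    return (W - F) // S + 1

# Non-overlapping: F=2, S=2 exactly halves the spatial size.
print(pooled_size(8, 2, 2))  # 4
# Overlapping: F=3, S=2 -- adjacent windows share a one-pixel border.
print(pooled_size(8, 3, 2))  # 3
```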
Code implementation
caffe's CPU pooling layer implementation:
template <typename Dtype>
void PoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  ...
  switch (this->layer_param_.pooling_param().pool()) {
  case PoolingParameter_PoolMethod_MAX:
    ...
    const int pool_index = ph * pooled_width_ + pw;
    for (int h = hstart; h < hend; ++h) {
      for (int w = wstart; w < wend; ++w) {
        const int index = h * width_ + w;
        if (bottom_data[index] > top_data[pool_index]) {
          top_data[pool_index] = bottom_data[index];
          if (use_top_mask) {
            top_mask[pool_index] = static_cast<Dtype>(index);
          } else {
            mask[pool_index] = index;
          }
        }
      }
    }
    ...
    break;
  case PoolingParameter_PoolMethod_AVE:
    ...
    for (int i = 0; i < top_count; ++i) {
      top_data[i] = 0;
    }
    ...
    for (int h = hstart; h < hend; ++h) {
      for (int w = wstart; w < wend; ++w) {
        top_data[ph * pooled_width_ + pw] +=
            bottom_data[h * width_ + w];
      }
    }
    top_data[ph * pooled_width_ + pw] /= pool_size;
    ...
    break;
  case PoolingParameter_PoolMethod_STOCHASTIC:
    NOT_IMPLEMENTED;
    break;
  }
}
template <typename Dtype>
void PoolingLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) {
    return;
  }
  ...
  switch (this->layer_param_.pooling_param().pool()) {
  case PoolingParameter_PoolMethod_MAX:
    // The main loop
    if (use_top_mask) {
      top_mask = top[1]->cpu_data();
    } else {
      mask = max_idx_.cpu_data();
    }
    for (int n = 0; n < top[0]->num(); ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            const int index = ph * pooled_width_ + pw;
            const int bottom_index =
                use_top_mask ? top_mask[index] : mask[index];
            bottom_diff[bottom_index] += top_diff[index];
          }
        }
        bottom_diff += bottom[0]->offset(0, 1);
        top_diff += top[0]->offset(0, 1);
        if (use_top_mask) {
          top_mask += top[0]->offset(0, 1);
        } else {
          mask += top[0]->offset(0, 1);
        }
      }
    }
    break;
  case PoolingParameter_PoolMethod_AVE:
    // The main loop
    for (int n = 0; n < top[0]->num(); ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            int hstart = ph * stride_h_ - pad_h_;
            int wstart = pw * stride_w_ - pad_w_;
            int hend = min(hstart + kernel_h_, height_ + pad_h_);
            int wend = min(wstart + kernel_w_, width_ + pad_w_);
            int pool_size = (hend - hstart) * (wend - wstart);
            hstart = max(hstart, 0);
            wstart = max(wstart, 0);
            hend = min(hend, height_);
            wend = min(wend, width_);
            for (int h = hstart; h < hend; ++h) {
              for (int w = wstart; w < wend; ++w) {
                bottom_diff[h * width_ + w] +=
                    top_diff[ph * pooled_width_ + pw] / pool_size;
              }
            }
          }
        }
        // offset to the next channel
        bottom_diff += bottom[0]->offset(0, 1);
        top_diff += top[0]->offset(0, 1);
      }
    }
    break;
  case PoolingParameter_PoolMethod_STOCHASTIC:
    NOT_IMPLEMENTED;
    break;
  ...
  }
}
Example theano code for the Stochastic Pooling forward pass:
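The same procedure can be sketched in plain Python; the helper names and the optional `u` parameter (a fixed uniform sample, for deterministic testing) are my own, not from the original snippet:

```python
import random

def stochastic_pool_train(region, u=None):
    """Train time: sample one activation with probability proportional to
    its (non-negative) value -- roulette-wheel selection."""
    total = sum(region)
    if total == 0:
        return 0.0
    u = random.random() if u is None else u   # uniform sample in [0, 1)
    threshold = u * total
    cumsum = 0.0
    for a in region:
        cumsum += a
        if cumsum >= threshold:
            return a
    return region[-1]

def stochastic_pool_test(region):
    """Test time: probability-weighted average, sum_i p_i * a_i,
    which simplifies to sum(a_i^2) / sum(a_i)."""
    total = sum(region)
    return sum(a * a for a in region) / total if total > 0 else 0.0

region = [1.0, 2.0, 1.0]
print(stochastic_pool_train(region, u=0.5))  # threshold 2.0 -> picks 2.0
print(stochastic_pool_test(region))          # (1 + 4 + 1) / 4 = 1.5
```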
caffe's Stochastic Pooling implementation:
Only a GPU implementation is provided, and it must be used with the CAFFE engine: set the pool type to STOCHASTIC in pooling_param, and set engine: CAFFE in pooling_param as well (when running on GPU the default engine is cuDNN).
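A minimal prototxt sketch of such a layer (the layer and blob names, kernel size, and stride here are illustrative):

```protobuf
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: STOCHASTIC
    kernel_size: 3
    stride: 2
    engine: CAFFE  # required: the cuDNN engine does not implement STOCHASTIC
  }
}
```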
Stochastic Pooling implementation code:
void StoPoolForwardTrain(.., Dtype* const rand_idx, ..) {
  /*
  rand_idx holds, for each output, a random fraction of the pooling region's
  total mass; it is currently generated from a uniform distribution:
    caffe_gpu_rng_uniform(count, Dtype(0), Dtype(1),
        rand_idx_.mutable_gpu_data());
  */
  ...
  Dtype cumsum = 0.;
  const Dtype* const bottom_slice =
      bottom_data + (n * channels + c) * height * width;
  // First pass: get sum
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];
    }
  }
  const float thres = rand_idx[index] * cumsum;
  // Second pass: get value, and set index.
  cumsum = 0;
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];
      if (cumsum >= thres) {  // roulette-wheel selection driven by a uniform sample
        rand_idx[index] = ((n * channels + c) * height + h) * width + w;
        top_data[index] = bottom_slice[h * width + w];
        return;
      }
    }
  }
  ...
}
void StoPoolForwardTest(...) {
  ...
  Dtype cumsum = 0.;
  Dtype cumvalues = 0.;
  const Dtype* const bottom_slice =
      bottom_data + (n * channels + c) * height * width;
  // First pass: get sum
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];  // sum of activations
      cumvalues += bottom_slice[h * width + w]
          * bottom_slice[h * width + w];      // sum of squared activations
    }
  }
  // sum(a_i^2) / sum(a_i) = sum_i p_i * a_i: the probability-weighted average
  top_data[index] = (cumsum > 0.) ? cumvalues / cumsum : 0.;
  ...
}

LeCun's "Learning Mid-Level Features For Recognition" gives a fairly detailed analysis and comparison of the first two pooling methods.
Further reading