Deep Learning Network Layers: Pooling
The most common pooling operations are max pooling and average pooling:

Max pooling
Forward pass: the maximum value within each image region is taken as the pooled output for that region.
Backward pass: the gradient flows back only to the position of the maximum; every other position receives zero gradient.

Average pooling
Forward pass: the mean of each image region is taken as the pooled output for that region.
Backward pass: the gradient is divided by the region size and distributed equally to every position.
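The forward and backward rules above can be sketched in a few lines of plain Python; the helper names below are mine, and each function operates on a single flattened pooling window rather than a full feature map:

```python
# Minimal sketch (not the caffe implementation) of max and average pooling
# on one flattened pooling window, together with their backward rules.

def max_pool_region(region):
    """Forward: output the max; remember its index for the backward pass."""
    idx = max(range(len(region)), key=lambda i: region[i])
    return region[idx], idx

def max_pool_backward(grad_out, idx, size):
    """Backward: route the whole gradient to the max position, zeros elsewhere."""
    grad = [0.0] * size
    grad[idx] = grad_out
    return grad

def avg_pool_region(region):
    """Forward: output the mean of the window."""
    return sum(region) / len(region)

def avg_pool_backward(grad_out, size):
    """Backward: split the gradient evenly, 1/n to each position."""
    return [grad_out / size] * size

region = [1.0, 3.0, 2.0, 0.0]           # a flattened 2x2 window
y, idx = max_pool_region(region)         # y = 3.0, idx = 1
print(max_pool_backward(1.0, idx, 4))    # [0.0, 1.0, 0.0, 0.0]
print(avg_pool_region(region))           # 1.5
print(avg_pool_backward(1.0, 4))         # [0.25, 0.25, 0.25, 0.25]
```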
For average pooling with input $X = (x_1, x_2, \ldots, x_n)$, the output is $f(X) = \frac{1}{n}\sum_{i=1}^{n} x_i$, so

$$\frac{\partial f}{\partial x_j}(X) = \frac{\partial}{\partial x_j}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial x_i}{\partial x_j} = \frac{1}{n}\sum_{i=1}^{n}\delta(i-j) = \frac{1}{n},$$

where $\delta(i-j) = 1$ when $i = j$ and $0$ otherwise.

The Stochastic Pooling paper proposes a simple and effective way to regularize CNNs: it reduces the overfitting seen with max pooling and improves generalization. For each pooling region, one value is randomly selected as the output according to a multinomial distribution built from the region's inputs. The procedure differs slightly between training and testing.
Training phase
1. Forward pass
(1) Normalize the inputs to the pooling region, giving each activation a probability $p_i = \frac{a_i}{\sum_{k \in R_j} a_k}$.
(2) Sample one location from the multinomial distribution defined by $p$ and output the activation at that location.
2. Backward pass
As with max pooling, the gradient propagates only through the selected location; all other locations receive zero.
Test phase
Using stochastic pooling at test time would inject noise into the predictions and hurt performance. Instead, the activations are averaged, weighted by their normalized probabilities. This performs somewhat better than plain average pooling: on average it approximates average pooling, while locally it follows the max-pooling principle.
Interpretation and analysis
Types of pooling
Max Pooling
Average Pooling (also called mean pooling)
Stochastic Pooling
Probability-weighted pooling can be viewed as a form of model averaging: each way of choosing within the pooling regions corresponds to a different model. Because training introduces randomness, the network's connection pattern changes from step to step, effectively producing new models; at test time all of these models are used at once via the weighted average. If the network has $d$ pooling layers and each pooling kernel covers $n$ positions, there are $n^d$ possible models. This is far more model diversity than dropout provides (dropout with rate 0.5 corresponds to $n = 2$).
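The model count above can be checked directly; the layer count and kernel size here are illustrative:

```python
# Each of d pooling layers independently picks one of n positions per region,
# so stochastic pooling sees n**d distinct "models" during training.
d = 3   # illustrative: three pooling layers
n = 4   # illustrative: 2x2 pooling regions -> 4 choices per region
stochastic_models = n ** d   # 4^3 = 64
dropout_models = 2 ** d      # dropout at rate 0.5 behaves like n = 2
print(stochastic_models, dropout_models)  # 64 8
```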
The original post compares the error rates of the three pooling methods on CIFAR-10.
Choosing a pooling method in practice
We usually use Max Pooling, because it can learn the edge and texture structure of an image, which Average Pooling cannot. Max Pooling is typically used to reduce the variance of the estimate; where variance matters little, either Max Pooling or Average Pooling can be chosen freely. Average Pooling is used to reduce the shift in the estimated mean, and in some cases it can perform slightly better than Max Pooling.
Average pooling weakens strong activations, while max pooling keeps only the strongest activation and is prone to overfitting.
Although Stochastic Pooling may achieve good results in theory, it requires repeated experimentation in practice; used carelessly it can hurt performance, so it is not a routine choice.
Depending on whether pooling is applied to non-overlapping regions of the image (this differs from convolution), pooling is divided into general pooling (General Pooling) and overlapping pooling (Overlapping Pooling). Common settings are filter size F = 2 with stride S = 2, or F = 3 with S = 2 (overlapping pooling); pooling layers usually do not need padding.
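The standard output-size formula makes the two settings concrete: with input width W, filter F, stride S, and no padding, the output width is floor((W − F) / S) + 1. A quick sketch (helper name is mine):

```python
def pooled_size(W, F, S):
    """Output width of a pooling layer with no padding."""
    return (W - F) // S + 1

# Non-overlapping: F=2, S=2 exactly halves the spatial size.
print(pooled_size(8, 2, 2))  # 4
# Overlapping: F=3, S=2 -- adjacent windows share a one-pixel border.
print(pooled_size(8, 3, 2))  # 3
```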
Code implementation
caffe's CPU pooling layer implementation:
template <typename Dtype>
void PoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  ...
  switch (this->layer_param_.pooling_param().pool()) {
  case PoolingParameter_PoolMethod_MAX:
    ...
    const int pool_index = ph * pooled_width_ + pw;
    for (int h = hstart; h < hend; ++h) {
      for (int w = wstart; w < wend; ++w) {
        const int index = h * width_ + w;
        if (bottom_data[index] > top_data[pool_index]) {
          top_data[pool_index] = bottom_data[index];
          if (use_top_mask) {
            top_mask[pool_index] = static_cast<Dtype>(index);
          } else {
            mask[pool_index] = index;
          }
        }
      }
    }
    ...
    break;
  case PoolingParameter_PoolMethod_AVE:
    ...
    for (int i = 0; i < top_count; ++i) {
      top_data[i] = 0;
    }
    ...
    for (int h = hstart; h < hend; ++h) {
      for (int w = wstart; w < wend; ++w) {
        top_data[ph * pooled_width_ + pw] +=
            bottom_data[h * width_ + w];
      }
    }
    top_data[ph * pooled_width_ + pw] /= pool_size;
    ...
    break;
  case PoolingParameter_PoolMethod_STOCHASTIC:
    NOT_IMPLEMENTED;
    break;
  }
}
template <typename Dtype>
void PoolingLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) {
    return;
  }
  ...
  switch (this->layer_param_.pooling_param().pool()) {
  case PoolingParameter_PoolMethod_MAX:
    // The main loop
    if (use_top_mask) {
      top_mask = top[1]->cpu_data();
    } else {
      mask = max_idx_.cpu_data();
    }
    for (int n = 0; n < top[0]->num(); ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            const int index = ph * pooled_width_ + pw;
            const int bottom_index =
                use_top_mask ? top_mask[index] : mask[index];
            bottom_diff[bottom_index] += top_diff[index];
          }
        }
        bottom_diff += bottom[0]->offset(0, 1);
        top_diff += top[0]->offset(0, 1);
        if (use_top_mask) {
          top_mask += top[0]->offset(0, 1);
        } else {
          mask += top[0]->offset(0, 1);
        }
      }
    }
    break;
  case PoolingParameter_PoolMethod_AVE:
    // The main loop
    for (int n = 0; n < top[0]->num(); ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            int hstart = ph * stride_h_ - pad_h_;
            int wstart = pw * stride_w_ - pad_w_;
            int hend = min(hstart + kernel_h_, height_ + pad_h_);
            int wend = min(wstart + kernel_w_, width_ + pad_w_);
            int pool_size = (hend - hstart) * (wend - wstart);
            hstart = max(hstart, 0);
            wstart = max(wstart, 0);
            hend = min(hend, height_);
            wend = min(wend, width_);
            for (int h = hstart; h < hend; ++h) {
              for (int w = wstart; w < wend; ++w) {
                bottom_diff[h * width_ + w] +=
                    top_diff[ph * pooled_width_ + pw] / pool_size;
              }
            }
          }
        }
        // offset to the next channel
        bottom_diff += bottom[0]->offset(0, 1);
        top_diff += top[0]->offset(0, 1);
      }
    }
    break;
  case PoolingParameter_PoolMethod_STOCHASTIC:
    NOT_IMPLEMENTED;
    break;
  ...
  }
}
Example theano code for the Stochastic Pooling forward pass:
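The same procedure can be sketched in plain Python; the helper names and the optional `u` parameter (a fixed uniform sample, for deterministic testing) are my own, not from the original snippet:

```python
import random

def stochastic_pool_train(region, u=None):
    """Train time: sample one activation with probability proportional to
    its (non-negative) value -- roulette-wheel selection."""
    total = sum(region)
    if total == 0:
        return 0.0
    u = random.random() if u is None else u   # uniform sample in [0, 1)
    threshold = u * total
    cumsum = 0.0
    for a in region:
        cumsum += a
        if cumsum >= threshold:
            return a
    return region[-1]

def stochastic_pool_test(region):
    """Test time: probability-weighted average, sum_i p_i * a_i,
    which simplifies to sum(a_i^2) / sum(a_i)."""
    total = sum(region)
    return sum(a * a for a in region) / total if total > 0 else 0.0

region = [1.0, 2.0, 1.0]
print(stochastic_pool_train(region, u=0.5))  # threshold 2.0 -> picks 2.0
print(stochastic_pool_test(region))          # (1 + 4 + 1) / 4 = 1.5
```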
caffe's Stochastic Pooling implementation:
Only a GPU implementation is provided, and it must be used with the CAFFE engine: set the pool type to STOCHASTIC in pooling_param, and set engine: CAFFE in pooling_param as well (when running on GPU the default engine is cuDNN).
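A minimal prototxt sketch of such a layer (the layer and blob names, kernel size, and stride here are illustrative):

```protobuf
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: STOCHASTIC
    kernel_size: 3
    stride: 2
    engine: CAFFE  # required: the cuDNN engine does not implement STOCHASTIC
  }
}
```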
Stochastic Pooling implementation code:
void StoPoolForwardTrain(.., Dtype* const rand_idx, ..) {
  /*
  rand_idx holds, for each output, a random fraction of the pooling region's
  total mass; it is currently generated from a uniform distribution:
    caffe_gpu_rng_uniform(count, Dtype(0), Dtype(1),
        rand_idx_.mutable_gpu_data());
  */
  ...
  Dtype cumsum = 0.;
  const Dtype* const bottom_slice =
      bottom_data + (n * channels + c) * height * width;
  // First pass: get sum
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];
    }
  }
  const float thres = rand_idx[index] * cumsum;
  // Second pass: get value, and set index.
  cumsum = 0;
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];
      if (cumsum >= thres) {  // roulette-wheel selection driven by a uniform sample
        rand_idx[index] = ((n * channels + c) * height + h) * width + w;
        top_data[index] = bottom_slice[h * width + w];
        return;
      }
    }
  }
  ...
}
void StoPoolForwardTest(...) {
  ...
  Dtype cumsum = 0.;
  Dtype cumvalues = 0.;
  const Dtype* const bottom_slice =
      bottom_data + (n * channels + c) * height * width;
  // First pass: get sum
  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      cumsum += bottom_slice[h * width + w];  // sum of activations
      cumvalues += bottom_slice[h * width + w]
          * bottom_slice[h * width + w];      // sum of squared activations
    }
  }
  // sum(a_i^2) / sum(a_i) = sum_i p_i * a_i: the probability-weighted average
  top_data[index] = (cumsum > 0.) ? cumvalues / cumsum : 0.;
  ...
}

LeCun's "Learning Mid-Level Features For Recognition" gives a fairly detailed analysis and comparison of the first two pooling methods.
Further reading