Mask R-CNN: A Detailed Code Walkthrough (Part 1)

Updated: 2023-05-15 08:30:42
This series walks through the Mask R-CNN code in detail. It assumes you already understand the overall structure of Mask R-CNN, so the original paper is not analyzed here. Ideally, keep the complete mrcnn code at hand (snippets are pasted where needed) and read the code alongside this post.
This post uses the code to clear up the fuzzier parts of the network structure.
1 Code architecture
As shown in the figure below, mrcnn consists of four main Python files:
config.py: the hyperparameters used throughout the code
model.py: the code that builds the deep network
utils.py: assorted utility functions
visualize.py: after detection, draws the predicted bounding boxes and masks on the original image
There is also parallel_model.py, which supports training the model on multiple GPUs.
This post follows model.py; wherever it calls into the other files, the called code is explained as well.
2 The structure of model.py
model.py is very long; it can be divided into the following blocks:
Utility Functions: logging and other conventions
Resnet Graph: the backbone network that extracts features
Region Proposal Network (RPN): generates anchors on the feature maps and outputs a rough foreground/background call plus rough bounding-box coordinates
Proposal Layer: takes the boxes produced by the RPN and filters out background to get proposals
ROIAlign Layer: takes the backbone features and the ROI coordinates, crops the matching regions out of the features, and reshapes them to a common size
Detection Target Layer: during training, matches proposals against ground truth to produce ROIs (regions of interest)
Detection Layer: during inference, produces the final bounding boxes from the proposals and the bbox deltas
Feature Pyramid Network Heads: takes the pooled features and predicts object class and mask
MaskRCNN Class: wires all of the above together and returns a model object
Loss Functions: everything loss-related
Data Generator: manages the training data
Data Formatting: preprocessing of the input images
Miscellaneous Graph Functions: other helper functions
The train-path code is covered in this order: Resnet Graph → RPN → Proposal Layer → ROIAlign Layer → Detection Target Layer → Feature Pyramid Network Heads → MaskRCNN Class.
After that come the parts that differ at inference time, then the loss functions and the data-related operations. Throughout the walkthrough, I will:
paste the code with added comments
turn the code into easy-to-follow flowcharts
add extra explanation wherever the code is hard to understand or (in my view) awkward
3 Walking through the train code
3.1 Resnet Graph
This part of the code is fairly simple and the network structure is clear. It contains three functions:
def identity_block
def conv_block
def resnet_graph
identity_block and conv_block define the two kinds of convolution blocks, i.e. two different convolution patterns.
The code is pasted below:
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    # 1x1 conv -> kxk conv (kernel_size) -> 1x1 conv -> add the result to the
    # input (elementwise)
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with
    subsample=(2,2) and the shortcut should have subsample=(2,2) as well
    """
    # 1x1 conv -> kxk conv (kernel_size) -> 1x1 conv -> add the result to a
    # 1x1-convolved projection of the input (elementwise)
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
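As a sanity check on the difference between the two blocks: identity_block adds the raw input to the main path, so its last 1x1 conv must emit exactly as many channels as the input has, while conv_block projects the shortcut through its own 1x1 conv and is therefore free to change both the channel count and the spatial stride. A minimal pure-Python sketch of that channel bookkeeping (the helper names here are made up for illustration, not part of mrcnn):

```python
def identity_block_channels(in_channels, filters):
    # identity_block: KL.Add()([x, input_tensor]) requires matching shapes,
    # so the last filter count must equal the input's channel count.
    nb_filter1, nb_filter2, nb_filter3 = filters
    assert nb_filter3 == in_channels, "elementwise add needs matching channels"
    return nb_filter3

def conv_block_channels(in_channels, filters):
    # conv_block: the shortcut is itself a 1x1 conv to nb_filter3 channels,
    # so the output channel count may differ from the input's.
    nb_filter1, nb_filter2, nb_filter3 = filters
    return nb_filter3

# Stage 2 of resnet_graph: conv_block turns 64 input channels into 256,
# then the identity_blocks keep 256.
c = conv_block_channels(64, [64, 64, 256])
c = identity_block_channels(c, [64, 64, 256])
print(c)  # 256
```

This is why every resnet stage below starts with one conv_block and continues with identity_blocks.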
resnet_graph calls these two block types to produce the feature maps.
First:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.
    architecture: Can be resnet50 or resnet101
    stage5: Boolean. If False, stage5 of the network is not created
    train_bn: Boolean. Train or freeze Batch Norm layers
    """
    assert architecture in ["resnet50", "resnet101"]
assert is an assertion: if the architecture argument is neither resnet50 nor resnet101, an error is raised. The function then calls the two block types above to extract features:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    ...
    # Stage 1
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
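A small aside on the stage-4 loop: block names are generated with chr(98 + i), and since chr(98) is 'b', resnet50 gets blocks 'b' through 'f' and resnet101 gets 'b' through 'w' (the conv_block before the loop is block 'a'). A quick check:

```python
# Reproduce the block-name generation from the stage-4 loop above.
for arch, count in {"resnet50": 5, "resnet101": 22}.items():
    names = [chr(98 + i) for i in range(count)]
    print(arch, names[0], names[-1])
```

The 5 vs 22 identity_blocks in stage 4 (plus the fixed blocks in the other stages) are exactly what makes the two architectures 50 and 101 layers deep.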
This code is also straightforward, essentially a linear sequence; the network structure it builds is summarized in the figure below:
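For intuition about the sizes involved: stage 1 downsamples twice (the stride-2 7x7 conv and the stride-2 max pool), stage 2 keeps the size (strides=(1,1)), and stages 3 through 5 each halve it once, so C1..C5 sit at strides 4, 4, 8, 16, 32 relative to the input. A back-of-envelope sketch, assuming a square input whose side is divisible by 32:

```python
def resnet_feature_sizes(image_size):
    # Strides of C1..C5 relative to the input image, per the code above.
    strides = {"C1": 4, "C2": 4, "C3": 8, "C4": 16, "C5": 32}
    return {name: image_size // s for name, s in strides.items()}

print(resnet_feature_sizes(1024))
# {'C1': 256, 'C2': 256, 'C3': 128, 'C4': 64, 'C5': 32}
```

These are the five feature maps of different sizes that the RPN section below refers to.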
3.2 Region Proposal Network (RPN)
This part contains two functions:
def rpn_graph(feature_map, anchors_per_location, anchor_stride)
def build_rpn_model(anchor_stride, anchors_per_location, depth)
build_rpn_model calls rpn_graph, takes its outputs, and returns a model object. (Define the network structure in a *_graph function, then call it from a build_* function that returns a model object: this is a common pattern in CNN code, and it will not be pointed out again.)
The input parameters are:
feature_map: shape = [batch, height, width, depth], the output of the resnet above.
anchors_per_location: the number of anchors generated per pixel of the feature map
anchor_stride: the stride at which anchors are generated, usually 1 (anchors at every pixel) or 2 (every other pixel)
Note that the resnet above produced five feature maps of different sizes, while the RPN's inputs do not reflect this. The reason: when Mask RCNN later invokes these layers, it puts the resnet feature maps into a list and loops over it, feeding each level's feature map into the RPN separately.
Returns:
rpn_class_logits: shape = [batch, H * W * anchors_per_location, 2], the anchor classification tensor before activation.
rpn_probs: shape = [batch, H * W * anchors_per_location, 2], the logits above after softmax activation, i.e. the anchor classification scores (or probabilities; probs is short for probabilities). The trailing dimension of 2 holds the preliminary foreground (object) / background call.
rpn_bbox: shape = [batch, H * W * anchors_per_location, 4], where the 4 is (dy, dx, log(dh), log(dw)): the regression deltas for the anchors.
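Putting the shapes together: with anchor_stride = 1, every feature-map pixel contributes anchors_per_location anchors, so one level yields H * W * anchors_per_location of them, and MaskRCNN concatenates the per-level outputs along that axis. A small sketch of the bookkeeping (the value 3 and the level sizes for a 1024x1024 pyramid P2..P6 are illustrative assumptions, not read from the code):

```python
anchors_per_location = 3  # illustrative; set by the anchor-ratio config in practice

def rpn_output_shapes(h, w, batch=1):
    # Shapes returned by rpn_graph for one pyramid level of size h x w.
    n = h * w * anchors_per_location
    return {"rpn_class_logits": (batch, n, 2),
            "rpn_probs": (batch, n, 2),
            "rpn_bbox": (batch, n, 4)}

levels = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]  # P2..P6
total_anchors = sum(h * w * anchors_per_location for h, w in levels)
print(total_anchors)  # 261888
```

The quarter-million anchors are why the Proposal Layer's background filtering in the next section matters.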
The code is as follows:

Source: https://www.wtabcd.cn/fanwen/fan/82/638586.html