Mask R-CNN: A Detailed Code Walkthrough (Part 1)

Updated: 2023-05-15 08:30:42
This series walks through the Mask R-CNN code in detail. It assumes you already understand the overall structure of Mask R-CNN, so the original paper is not analyzed here. Ideally, keep the complete mrcnn code at hand (snippets are pasted where needed) and read the code alongside this post.
This post uses the code to clear up the fuzzier parts of the network structure.
1 Code architecture
As shown in the figure below, mrcnn consists of four main Python files:
config.py: the hyperparameters used throughout the code
model.py: the code that builds the deep network
utils.py: assorted utility functions
visualize.py: after detection, draws the predicted bounding boxes and masks on the original image
There is also parallel_model.py, which supports training the model on multiple GPUs.
This post follows model.py; wherever it calls into the other files, the called code is explained as well.
2 The structure of model.py
model.py is very long; it can be divided into the following blocks:
Utility Functions: logging and other conventions
Resnet Graph: the backbone network that extracts features
Region Proposal Network (RPN): generates anchors on the feature maps and outputs a rough foreground/background call plus rough bounding-box coordinates
Proposal Layer: takes the boxes produced by the RPN and filters out background to get proposals
ROIAlign Layer: takes the backbone features and the ROI coordinates, crops the matching regions out of the features, and reshapes them to a common size
Detection Target Layer: during training, matches proposals against ground truth to produce ROIs (regions of interest)
Detection Layer: during inference, produces the final bounding boxes from the proposals and the bbox deltas
Feature Pyramid Network Heads: takes the pooled features and predicts object class and mask
MaskRCNN Class: wires all of the above together and returns a model object
Loss Functions: everything loss-related
Data Generator: manages the training data
Data Formatting: preprocessing of the input images
Miscellaneous Graph Functions: other helper functions
The train-path code is covered in this order: Resnet Graph → RPN → Proposal Layer → ROIAlign Layer → Detection Target Layer → Feature Pyramid Network Heads → MaskRCNN Class.
After that come the parts that differ at inference time, then the loss functions and the data-related operations. Throughout the walkthrough, I will:
paste the code with added comments
turn the code into easy-to-follow flowcharts
add extra explanation wherever the code is hard to understand or (in my view) awkward
3 Walking through the train code
3.1 Resnet Graph
This part of the code is fairly simple and the network structure is clear. It contains three functions:
def identity_block
def conv_block
def resnet_graph
identity_block and conv_block define the two kinds of convolution blocks, i.e. two different convolution patterns.
The code is pasted below:
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    # 1x1 conv -> kxk conv (kernel_size) -> 1x1 conv -> add the result to the
    # input (elementwise)
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with
    subsample=(2,2) and the shortcut should have subsample=(2,2) as well
    """
    # 1x1 conv -> kxk conv (kernel_size) -> 1x1 conv -> add the result to a
    # 1x1-convolved projection of the input (elementwise)
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
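As a sanity check on the difference between the two blocks: identity_block adds the raw input to the main path, so its last 1x1 conv must emit exactly as many channels as the input has, while conv_block projects the shortcut through its own 1x1 conv and is therefore free to change both the channel count and the spatial stride. A minimal pure-Python sketch of that channel bookkeeping (the helper names here are made up for illustration, not part of mrcnn):

```python
def identity_block_channels(in_channels, filters):
    # identity_block: KL.Add()([x, input_tensor]) requires matching shapes,
    # so the last filter count must equal the input's channel count.
    nb_filter1, nb_filter2, nb_filter3 = filters
    assert nb_filter3 == in_channels, "elementwise add needs matching channels"
    return nb_filter3

def conv_block_channels(in_channels, filters):
    # conv_block: the shortcut is itself a 1x1 conv to nb_filter3 channels,
    # so the output channel count may differ from the input's.
    nb_filter1, nb_filter2, nb_filter3 = filters
    return nb_filter3

# Stage 2 of resnet_graph: conv_block turns 64 input channels into 256,
# then the identity_blocks keep 256.
c = conv_block_channels(64, [64, 64, 256])
c = identity_block_channels(c, [64, 64, 256])
print(c)  # 256
```

This is why every resnet stage below starts with one conv_block and continues with identity_blocks.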
resnet_graph calls these two block types to produce the feature maps.
First:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.
    architecture: Can be resnet50 or resnet101
    stage5: Boolean. If False, stage5 of the network is not created
    train_bn: Boolean. Train or freeze Batch Norm layers
    """
    assert architecture in ["resnet50", "resnet101"]
assert is an assertion: if the architecture argument is neither resnet50 nor resnet101, an error is raised. The function then calls the two block types above to extract features:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    ...
    # Stage 1
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
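A small aside on the stage-4 loop: block names are generated with chr(98 + i), and since chr(98) is 'b', resnet50 gets blocks 'b' through 'f' and resnet101 gets 'b' through 'w' (the conv_block before the loop is block 'a'). A quick check:

```python
# Reproduce the block-name generation from the stage-4 loop above.
for arch, count in {"resnet50": 5, "resnet101": 22}.items():
    names = [chr(98 + i) for i in range(count)]
    print(arch, names[0], names[-1])
```

The 5 vs 22 identity_blocks in stage 4 (plus the fixed blocks in the other stages) are exactly what makes the two architectures 50 and 101 layers deep.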
This code is also straightforward, essentially a linear sequence; the network structure it builds is summarized in the figure below:
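For intuition about the sizes involved: stage 1 downsamples twice (the stride-2 7x7 conv and the stride-2 max pool), stage 2 keeps the size (strides=(1,1)), and stages 3 through 5 each halve it once, so C1..C5 sit at strides 4, 4, 8, 16, 32 relative to the input. A back-of-envelope sketch, assuming a square input whose side is divisible by 32:

```python
def resnet_feature_sizes(image_size):
    # Strides of C1..C5 relative to the input image, per the code above.
    strides = {"C1": 4, "C2": 4, "C3": 8, "C4": 16, "C5": 32}
    return {name: image_size // s for name, s in strides.items()}

print(resnet_feature_sizes(1024))
# {'C1': 256, 'C2': 256, 'C3': 128, 'C4': 64, 'C5': 32}
```

These are the five feature maps of different sizes that the RPN section below refers to.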
3.2 Region Proposal Network (RPN)
This part contains two functions:
def rpn_graph(feature_map, anchors_per_location, anchor_stride)
def build_rpn_model(anchor_stride, anchors_per_location, depth)
build_rpn_model calls rpn_graph, takes its outputs, and returns a model object. (Define the network structure in a *_graph function, then call it from a build_* function that returns a model object: this is a common pattern in CNN code, and it will not be pointed out again.)
The input parameters are:
feature_map: shape = [batch, height, width, depth], the output of the resnet above.
anchors_per_location: the number of anchors generated per pixel of the feature map
anchor_stride: the stride at which anchors are generated, usually 1 (anchors at every pixel) or 2 (every other pixel)
Note that the resnet above produced five feature maps of different sizes, while the RPN's inputs do not reflect this. The reason: when Mask RCNN later invokes these layers, it puts the resnet feature maps into a list and loops over it, feeding each level's feature map into the RPN separately.
Returns:
rpn_class_logits: shape = [batch, H * W * anchors_per_location, 2], the anchor classification tensor before activation.
rpn_probs: shape = [batch, H * W * anchors_per_location, 2], the logits above after softmax activation, i.e. the anchor classification scores (or probabilities; probs is short for probabilities). The trailing dimension of 2 holds the preliminary foreground (object) / background call.
rpn_bbox: shape = [batch, H * W * anchors_per_location, 4], where the 4 is (dy, dx, log(dh), log(dw)): the regression deltas for the anchors.
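Putting the shapes together: with anchor_stride = 1, every feature-map pixel contributes anchors_per_location anchors, so one level yields H * W * anchors_per_location of them, and MaskRCNN concatenates the per-level outputs along that axis. A small sketch of the bookkeeping (the value 3 and the level sizes for a 1024x1024 pyramid P2..P6 are illustrative assumptions, not read from the code):

```python
anchors_per_location = 3  # illustrative; set by the anchor-ratio config in practice

def rpn_output_shapes(h, w, batch=1):
    # Shapes returned by rpn_graph for one pyramid level of size h x w.
    n = h * w * anchors_per_location
    return {"rpn_class_logits": (batch, n, 2),
            "rpn_probs": (batch, n, 2),
            "rpn_bbox": (batch, n, 4)}

levels = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]  # P2..P6
total_anchors = sum(h * w * anchors_per_location for h, w in levels)
print(total_anchors)  # 261888
```

The quarter-million anchors are why the Proposal Layer's background filtering in the next section matters.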
The code is as follows:

Source: https://www.wtabcd.cn/fanwen/fan/82/638586.html