YOLACT++ Code Analysis 2: The Yolact Model
The Yolact Model
In YOLACT++ Code Analysis 1: Data Augmentation we covered the data augmentation pipeline; this part focuses on the Yolact model itself.
First, look at train.py:
# image_path: the folder of training images
# info_file: the annotation (JSON) file
dataset = COCODetection(image_path=cfg.dataset.train_images,
                        info_file=cfg.dataset.train_info,
                        transform=SSDAugmentation(MEANS))
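For context, train.py then feeds this dataset to a PyTorch DataLoader together with the repo's detection_collate helper; a minimal sketch (the batch size and worker count here are illustrative, not the repo's defaults):

from torch.utils import data

# detection_collate keeps the variable-length ground-truth lists per image
# instead of trying to stack them into a single tensor.
data_loader = data.DataLoader(dataset, batch_size=8,
                              num_workers=4, shuffle=True,
                              collate_fn=detection_collate, pin_memory=True)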
1. Now look at the COCODetection constructor:
def __init__(self, image_path, info_file, transform=None,
             target_transform=None,
             dataset_name='MS COCO', has_gt=True):
    # Do this here because we have too many things named COCO
    from pycocotools.coco import COCO

    if target_transform is None:
        target_transform = COCOAnnotationTransform()

    self.root = image_path
    self.coco = COCO(info_file)  # load the annotation file into the COCO API
    # self.coco.imgToAnns holds the bbox, category_id, image_id and segmentation
    # info for every annotated image, so this collects the ids of all annotated
    # training images: len(self.ids) = 159 in this example
    self.ids = list(self.coco.imgToAnns.keys())
    if len(self.ids) == 0 or not has_gt:
        self.ids = list(self.coco.imgs.keys())

    # transform is the SSDAugmentation instance;
    # COCOAnnotationTransform converts COCO annotations into a tensor of
    # bbox coords and label index
    self.transform = transform
    self.target_transform = COCOAnnotationTransform()
    self.name = dataset_name
    self.has_gt = has_gt
A word on COCOAnnotationTransform(): its __call__ method collects each COCO annotation's bbox and category_id into a list of the form [xmin, ymin, xmax, ymax, category_id]. Note that COCO stores bboxes as [xmin, ymin, w, h], so the width and height must be converted to corner coordinates.
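The heart of that conversion is a one-liner per object; below is a minimal sketch of what __call__ does for each annotation (the real class also remaps category_id through a label_map):

import numpy as np

def to_corner_box(obj, width, height):
    # COCO stores [xmin, ymin, w, h]; convert to normalized corner form
    x, y, w, h = obj['bbox']
    scale = np.array([width, height, width, height])
    box = np.array([x, y, x + w, y + h]) / scale
    return list(box) + [obj['category_id']]  # [xmin, ymin, xmax, ymax, label]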
2. Back in train.py, the model is created with: yolact_net = Yolact()
class Yolact(nn.Module):
    def __init__(self):
        super().__init__()
1. First, define the ResNet101 backbone:
        self.backbone = construct_backbone(cfg.backbone)  # ResNet101 by default
1) Yolact uses ResNet101 as its backbone by default.
2) BN layers are kept out of gradient propagation.
2. Next, the layers of the backbone defined above, other than the Conv layers, are frozen so they do not take part in gradient computation (learning):
        if cfg.freeze_bn:
            self.freeze_bn()
def freeze_bn(self, enable=False):
    """ Adapted from https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/8 """
    for module in self.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.train() if enable else module.eval()

            module.weight.requires_grad = enable
            module.bias.requires_grad = enable
'''
Pass 1: module is the Yolact instance itself, so the if fails.
Pass 2: module is ResNetBackbone, the network defined above; the if fails.
Pass 3: recursion enters ResNetBackbone and visits its layers in turn; module is a ModuleList.
Pass 4: module is the first Sequential inside that ModuleList.
Pass 5: module is a Bottleneck; the if fails.
Pass 6: module is conv1; the if fails.
Pass 7: module is bn1; the if holds:
    module.weight.requires_grad = enable  # enable = False
    module.bias.requires_grad = enable
'''
The traversal order above follows the concrete model below (a portion of the ResNet101 backbone):
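If the traversal order seems opaque, a tiny standalone example makes it concrete: nn.Module.modules() yields the module itself first, then recurses into every submodule depth-first, which is exactly how freeze_bn eventually reaches each BatchNorm2d:

import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
for module in net.modules():
    print(type(module).__name__)
# Sequential    <- the container itself is yielded first
# Conv2d        <- then its children, depth-first
# BatchNorm2d   <- the only module matching the isinstance check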
3) Now look directly at this line of code, which builds the Protonet architecture:

self.proto_net, cfg.mask_dim = make_net(in_channels,
                                        cfg.mask_proto_net, include_last_relu=False)
Its input arguments: in_channels is 256, and cfg.mask_proto_net is the layer-spec list from the config; in the default config this is [(256, 3, {'padding': 1})] * 3 + [(None, -2, {}), (256, 3, {'padding': 1})] + [(32, 1, {})].
I won't paste the whole make_net function; the key line inside it is:

# conf is cfg.mask_proto_net above
net = sum([make_layer(x) for x in conf], [])
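make_layer essentially dispatches on each tuple: a positive kernel size becomes a Conv2d, a negative one becomes a bilinear-upsampling InterpolateModule (scale factor -kernel_size, shown after the printout below), and each layer is followed by a ReLU. A simplified sketch of that dispatch (the real function also handles strings, deconvolutions, and channel bookkeeping):

def make_layer_sketch(in_channels, layer_cfg):
    # layer_cfg is (num_channels, kernel_size, kwargs), e.g. (256, 3, {'padding': 1})
    num_channels, kernel_size, kwargs = layer_cfg
    if kernel_size > 0:
        layer = nn.Conv2d(in_channels, num_channels, kernel_size, **kwargs)
    else:
        # e.g. (None, -2, {}) -> 2x bilinear upsampling
        layer = InterpolateModule(scale_factor=-kernel_size, mode='bilinear',
                                  align_corners=False, **kwargs)
    return [layer, nn.ReLU(inplace=True)]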
First iteration: x is (256, 3, {'padding': 1}), and make_layer runs straight through to line 193. The remaining iterations follow the same pattern; the resulting net is the list of layers behind the Sequential printed below.
Back in yolact.py:

self.proto_net, cfg.mask_dim = make_net(in_channels, cfg.mask_proto_net, include_last_relu=False)

Inspecting self.proto_net shows exactly the Protonet architecture from the paper:
Sequential(
  (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): InterpolateModule()
  (7): ReLU(inplace=True)
  (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU(inplace=True)
  (10): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
)
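The InterpolateModule at index (6) is the 2x bilinear upsampling produced by the (None, -2, {}) entry; it is just a thin wrapper that defers F.interpolate to the forward pass so it can sit inside an nn.Sequential. Roughly:

import torch.nn.functional as F

class InterpolateModule(nn.Module):
    """Stores the interpolation arguments and applies them at forward time."""
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.args = args
        self.kwargs = kwargs

    def forward(self, x):
        return F.interpolate(x, *self.args, **self.kwargs)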
What the paper says: the prototype-generation branch predicts k prototype masks for the whole image. The authors take an FCN whose last layer has k channels (one per prototype) and attach it to the backbone, with the P3 feature map (Fig. 2) as input.
The paper also explains why P3 is fed into Protonet: building the protonet on deeper backbone features produces more robust masks, and higher-resolution prototypes yield higher-quality masks on small objects and better performance. Hence the FPN is used, because its largest feature map (P3 in this case; see Fig. 2) is also its deepest. P3 is then upsampled to one quarter of the input image's dimensions to improve performance on small objects.
Finally, the authors follow the Protonet with a ReLU activation.
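In Yolact.forward this corresponds roughly to the following (a sketch; the real code selects the activation from the config, and the exact spatial size depends on the input resolution):

proto_out = self.proto_net(proto_x)        # proto_x is the P3 feature map
proto_out = F.relu(proto_out)              # keep prototype activations non-negative
proto_out = proto_out.permute(0, 2, 3, 1)  # (n, 32, h, w) -> (n, h, w, mask_dim)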
4) Now move to line 492 of yolact.py, where the FPN is built:

if cfg.fpn is not None:
    # Some hacky rewiring to accomodate the FPN
    self.fpn = FPN([src_channels[i] for i in self.selected_layers])
    self.selected_layers = list(range(len(self.selected_layers) + cfg.fpn.num_downsample))
    src_channels = [cfg.fpn.num_features] * len(self.selected_layers)
'''
FPN input arguments:
src_channels: [256, 512, 1024, 2048]
self.selected_layers: [1, 2, 3]
'''
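Concretely, with the default num_downsample = 2 and num_features = 256, the rewiring works out as follows (a worked example using the values above):

selected_layers = [1, 2, 3]             # indices of C3, C4, C5 in the backbone
src_channels = [256, 512, 1024, 2048]   # channel counts of the backbone stages

fpn_in = [src_channels[i] for i in selected_layers]  # [512, 1024, 2048] -> FPN input

# Afterwards the "layers" are the FPN outputs P3..P7, all with 256 channels:
selected_layers = list(range(len(selected_layers) + 2))  # [0, 1, 2, 3, 4]
src_channels = [256] * len(selected_layers)              # [256, 256, 256, 256, 256]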
Now on to the FPN class:
class FPN(ScriptModuleWrapper):
    """
    Implements a general version of the FPN introduced in
    https://arxiv.org/pdf/1612.03144.pdf

    Parameters (in cfg.fpn):
        - num_features (int): The number of output features in the fpn layers.
        - interpolation_mode (str): The mode to pass to F.interpolate.
        - num_downsample (int): The number of downsampled layers to add onto the selected layers.
                                The extra layers are downsampled from the last selected layer.

    Args:
        - in_channels (list): For each conv layer you supply in the forward pass,
                              how many features will it have?
    """
    __constants__ = ['interpolation_mode', 'num_downsample', 'u_conv_downsample', 'relu_pred_layers',
                     'lat_layers', 'pred_layers', 'downsample_layers', 'relu_downsample_layers']

    def __init__(self, in_channels):
        super().__init__()

        # Lateral 1x1 convs map each selected backbone stage to num_features channels
        self.lat_layers = nn.ModuleList([
            nn.Conv2d(x, cfg.fpn.num_features, kernel_size=1)
            for x in reversed(in_channels)
        ])

        # This is here for backwards compatability
        padding = 1 if cfg.fpn.pad else 0
        self.pred_layers = nn.ModuleList([
            nn.Conv2d(cfg.fpn.num_features, cfg.fpn.num_features, kernel_size=3, padding=padding)
            for _ in in_channels
        ])

        if cfg.fpn.use_conv_downsample:
            self.downsample_layers = nn.ModuleList([
                nn.Conv2d(cfg.fpn.num_features, cfg.fpn.num_features, kernel_size=3, padding=1, stride=2)
                for _ in range(cfg.fpn.num_downsample)
            ])

        self.interpolation_mode = cfg.fpn.interpolation_mode
        self.num_downsample = cfg.fpn.num_downsample
        self.use_conv_downsample = cfg.fpn.use_conv_downsample
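To round out the picture, FPN.forward (not quoted above) runs the standard top-down pathway: starting from the deepest selected feature, each lateral 1x1 conv output is added to the upsampled running sum, each merged map then passes through a 3x3 pred layer, and finally num_downsample extra maps (P6, P7) are appended. A simplified sketch under those assumptions, omitting the downsample step:

def fpn_forward_sketch(self, convouts):
    # convouts: backbone features [C3, C4, C5], ordered shallow to deep
    out = [None] * len(convouts)
    x = None
    # lat_layers was built over reversed(in_channels), so iterate deep -> shallow
    for j, lat_layer in zip(reversed(range(len(convouts))), self.lat_layers):
        lat = lat_layer(convouts[j])
        if x is not None:
            x = F.interpolate(x, size=lat.shape[-2:],
                              mode=self.interpolation_mode, align_corners=False)
            lat = lat + x
        x = lat
        out[j] = x
    # 3x3 convs smooth each merged map into P3..P5
    return [F.relu(pred(o)) for pred, o in zip(self.pred_layers, out)]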