YOLACT++ Code Analysis 2: The Yolact Model
The Yolact Model
In YOLACT++ Code Analysis 1: Data Augmentation we covered the data augmentation pipeline; this part focuses on the Yolact model itself.
First, look at train.py:
# image_path: the folder of training images
# info_file: the annotation (JSON) file
dataset = COCODetection(image_path=cfg.dataset.train_images,
                        info_file=cfg.dataset.train_info,
                        transform=SSDAugmentation(MEANS))
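For context, train.py then feeds this dataset to a PyTorch DataLoader together with the repo's detection_collate helper; a minimal sketch (the batch size and worker count here are illustrative, not the repo's defaults):

from torch.utils import data

# detection_collate keeps the variable-length ground-truth lists per image
# instead of trying to stack them into a single tensor.
data_loader = data.DataLoader(dataset, batch_size=8,
                              num_workers=4, shuffle=True,
                              collate_fn=detection_collate, pin_memory=True)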
1. Now look at the COCODetection constructor:
def __init__(self, image_path, info_file, transform=None,
             target_transform=None,
             dataset_name='MS COCO', has_gt=True):
    # Do this here because we have too many things named COCO
    from pycocotools.coco import COCO

    if target_transform is None:
        target_transform = COCOAnnotationTransform()

    self.root = image_path
    self.coco = COCO(info_file)  # load the annotation file into the COCO API
    # self.coco.imgToAnns holds the bbox, category_id, image_id and segmentation
    # info for every annotated image, so this collects the ids of all annotated
    # training images: len(self.ids) = 159 in this example
    self.ids = list(self.coco.imgToAnns.keys())
    if len(self.ids) == 0 or not has_gt:
        self.ids = list(self.coco.imgs.keys())

    # transform is the SSDAugmentation instance;
    # COCOAnnotationTransform converts COCO annotations into a tensor of
    # bbox coords and label index
    self.transform = transform
    self.target_transform = COCOAnnotationTransform()
    self.name = dataset_name
    self.has_gt = has_gt
A word on COCOAnnotationTransform(): its __call__ method collects each COCO annotation's bbox and category_id into a list of the form [xmin, ymin, xmax, ymax, category_id]. Note that COCO stores bboxes as [xmin, ymin, w, h], so the width and height must be converted to corner coordinates.
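The heart of that conversion is a one-liner per object; below is a minimal sketch of what __call__ does for each annotation (the real class also remaps category_id through a label_map):

import numpy as np

def to_corner_box(obj, width, height):
    # COCO stores [xmin, ymin, w, h]; convert to normalized corner form
    x, y, w, h = obj['bbox']
    scale = np.array([width, height, width, height])
    box = np.array([x, y, x + w, y + h]) / scale
    return list(box) + [obj['category_id']]  # [xmin, ymin, xmax, ymax, label]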
2. Back in train.py, the model is created with: yolact_net = Yolact()
class Yolact(nn.Module):
    def __init__(self):
        super().__init__()
1. First, define the ResNet101 backbone:
        self.backbone = construct_backbone(cfg.backbone)  # ResNet101 by default
1) Yolact uses ResNet101 as its backbone by default.
2) BN layers are kept out of gradient propagation.
2. Next, the layers of the backbone defined above, other than the Conv layers, are frozen so they do not take part in gradient computation (learning):
        if cfg.freeze_bn:
            self.freeze_bn()
def freeze_bn(self, enable=False):
    """ Adapted from https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/8 """
    for module in self.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.train() if enable else module.eval()

            module.weight.requires_grad = enable
            module.bias.requires_grad = enable
'''
Pass 1: module is the Yolact instance itself, so the if fails.
Pass 2: module is ResNetBackbone, the network defined above; the if fails.
Pass 3: recursion enters ResNetBackbone and visits its layers in turn; module is a ModuleList.
Pass 4: module is the first Sequential inside that ModuleList.
Pass 5: module is a Bottleneck; the if fails.
Pass 6: module is conv1; the if fails.
Pass 7: module is bn1; the if holds:
    module.weight.requires_grad = enable  # enable = False
    module.bias.requires_grad = enable
'''
The traversal order above follows the concrete model below (a portion of the ResNet101 backbone):
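If the traversal order seems opaque, a tiny standalone example makes it concrete: nn.Module.modules() yields the module itself first, then recurses into every submodule depth-first, which is exactly how freeze_bn eventually reaches each BatchNorm2d:

import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
for module in net.modules():
    print(type(module).__name__)
# Sequential    <- the container itself is yielded first
# Conv2d        <- then its children, depth-first
# BatchNorm2d   <- the only module matching the isinstance check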
3) Now look directly at this line of code, which builds the Protonet architecture:

self.proto_net, cfg.mask_dim = make_net(in_channels,
                                        cfg.mask_proto_net, include_last_relu=False)
Its input arguments: in_channels is 256, and cfg.mask_proto_net is the layer-spec list from the config; in the default config this is [(256, 3, {'padding': 1})] * 3 + [(None, -2, {}), (256, 3, {'padding': 1})] + [(32, 1, {})].
I won't paste the whole make_net function; the key line inside it is:

# conf is cfg.mask_proto_net above
net = sum([make_layer(x) for x in conf], [])
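make_layer essentially dispatches on each tuple: a positive kernel size becomes a Conv2d, a negative one becomes a bilinear-upsampling InterpolateModule (scale factor -kernel_size, shown after the printout below), and each layer is followed by a ReLU. A simplified sketch of that dispatch (the real function also handles strings, deconvolutions, and channel bookkeeping):

def make_layer_sketch(in_channels, layer_cfg):
    # layer_cfg is (num_channels, kernel_size, kwargs), e.g. (256, 3, {'padding': 1})
    num_channels, kernel_size, kwargs = layer_cfg
    if kernel_size > 0:
        layer = nn.Conv2d(in_channels, num_channels, kernel_size, **kwargs)
    else:
        # e.g. (None, -2, {}) -> 2x bilinear upsampling
        layer = InterpolateModule(scale_factor=-kernel_size, mode='bilinear',
                                  align_corners=False, **kwargs)
    return [layer, nn.ReLU(inplace=True)]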
First iteration: x is (256, 3, {'padding': 1}), and make_layer runs straight through to line 193. The remaining iterations follow the same pattern; the resulting net is the list of layers behind the Sequential printed below.
Back in yolact.py:

self.proto_net, cfg.mask_dim = make_net(in_channels, cfg.mask_proto_net, include_last_relu=False)

Inspecting self.proto_net shows exactly the Protonet architecture from the paper:
Sequential(
  (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU(inplace=True)
  (6): InterpolateModule()
  (7): ReLU(inplace=True)
  (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): ReLU(inplace=True)
  (10): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
)
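The InterpolateModule at index (6) is the 2x bilinear upsampling produced by the (None, -2, {}) entry; it is just a thin wrapper that defers F.interpolate to the forward pass so it can sit inside an nn.Sequential. Roughly:

import torch.nn.functional as F

class InterpolateModule(nn.Module):
    """Stores the interpolation arguments and applies them at forward time."""
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.args = args
        self.kwargs = kwargs

    def forward(self, x):
        return F.interpolate(x, *self.args, **self.kwargs)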
What the paper says: the prototype-generation branch predicts k prototype masks for the whole image. The authors take an FCN whose last layer has k channels (one per prototype) and attach it to the backbone, with the P3 feature map (Fig. 2) as input.
The paper also explains why P3 is fed into Protonet: building the protonet on deeper backbone features produces more robust masks, and higher-resolution prototypes yield higher-quality masks on small objects and better performance. Hence the FPN is used, because its largest feature map (P3 in this case; see Fig. 2) is also its deepest. P3 is then upsampled to one quarter of the input image's dimensions to improve performance on small objects.
Finally, the authors follow the Protonet with a ReLU activation.
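In Yolact.forward this corresponds roughly to the following (a sketch; the real code selects the activation from the config, and the exact spatial size depends on the input resolution):

proto_out = self.proto_net(proto_x)        # proto_x is the P3 feature map
proto_out = F.relu(proto_out)              # keep prototype activations non-negative
proto_out = proto_out.permute(0, 2, 3, 1)  # (n, 32, h, w) -> (n, h, w, mask_dim)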
4) Now move to line 492 of yolact.py, where the FPN is built:

if cfg.fpn is not None:
    # Some hacky rewiring to accomodate the FPN
    self.fpn = FPN([src_channels[i] for i in self.selected_layers])
    self.selected_layers = list(range(len(self.selected_layers) + cfg.fpn.num_downsample))
    src_channels = [cfg.fpn.num_features] * len(self.selected_layers)
'''
FPN input arguments:
src_channels: [256, 512, 1024, 2048]
self.selected_layers: [1, 2, 3]
'''
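Concretely, with the default num_downsample = 2 and num_features = 256, the rewiring works out as follows (a worked example using the values above):

selected_layers = [1, 2, 3]             # indices of C3, C4, C5 in the backbone
src_channels = [256, 512, 1024, 2048]   # channel counts of the backbone stages

fpn_in = [src_channels[i] for i in selected_layers]  # [512, 1024, 2048] -> FPN input

# Afterwards the "layers" are the FPN outputs P3..P7, all with 256 channels:
selected_layers = list(range(len(selected_layers) + 2))  # [0, 1, 2, 3, 4]
src_channels = [256] * len(selected_layers)              # [256, 256, 256, 256, 256]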
Now on to the FPN class:
class FPN(ScriptModuleWrapper):
    """
    Implements a general version of the FPN introduced in
    https://arxiv.org/pdf/1612.03144.pdf

    Parameters (in cfg.fpn):
        - num_features (int): The number of output features in the fpn layers.
        - interpolation_mode (str): The mode to pass to F.interpolate.
        - num_downsample (int): The number of downsampled layers to add onto the selected layers.
                                The extra layers are downsampled from the last selected layer.

    Args:
        - in_channels (list): For each conv layer you supply in the forward pass,
                              how many features will it have?
    """
    __constants__ = ['interpolation_mode', 'num_downsample', 'u_conv_downsample', 'relu_pred_layers',
                     'lat_layers', 'pred_layers', 'downsample_layers', 'relu_downsample_layers']

    def __init__(self, in_channels):
        super().__init__()

        # Lateral 1x1 convs map each selected backbone stage to num_features channels
        self.lat_layers = nn.ModuleList([
            nn.Conv2d(x, cfg.fpn.num_features, kernel_size=1)
            for x in reversed(in_channels)
        ])

        # This is here for backwards compatability
        padding = 1 if cfg.fpn.pad else 0
        self.pred_layers = nn.ModuleList([
            nn.Conv2d(cfg.fpn.num_features, cfg.fpn.num_features, kernel_size=3, padding=padding)
            for _ in in_channels
        ])

        if cfg.fpn.use_conv_downsample:
            self.downsample_layers = nn.ModuleList([
                nn.Conv2d(cfg.fpn.num_features, cfg.fpn.num_features, kernel_size=3, padding=1, stride=2)
                for _ in range(cfg.fpn.num_downsample)
            ])

        self.interpolation_mode = cfg.fpn.interpolation_mode
        self.num_downsample = cfg.fpn.num_downsample
        self.use_conv_downsample = cfg.fpn.use_conv_downsample
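To round out the picture, FPN.forward (not quoted above) runs the standard top-down pathway: starting from the deepest selected feature, each lateral 1x1 conv output is added to the upsampled running sum, each merged map then passes through a 3x3 pred layer, and finally num_downsample extra maps (P6, P7) are appended. A simplified sketch under those assumptions, omitting the downsample step:

def fpn_forward_sketch(self, convouts):
    # convouts: backbone features [C3, C4, C5], ordered shallow to deep
    out = [None] * len(convouts)
    x = None
    # lat_layers was built over reversed(in_channels), so iterate deep -> shallow
    for j, lat_layer in zip(reversed(range(len(convouts))), self.lat_layers):
        lat = lat_layer(convouts[j])
        if x is not None:
            x = F.interpolate(x, size=lat.shape[-2:],
                              mode=self.interpolation_mode, align_corners=False)
            lat = lat + x
        x = lat
        out[j] = x
    # 3x3 convs smooth each merged map into P3..P5
    return [F.relu(pred(o)) for pred, o in zip(self.pred_layers, out)]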