Semantic Segmentation and Instance Segmentation: Semantic Segmentation and the Dataset

Updated: 2023-07-13 20:00:32


Semantic Segmentation and the Dataset

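In object detection, we used only rectangular bounding boxes to label and predict objects in images. In this section, we turn to semantic segmentation, which divides an image into regions belonging to different semantic categories. These semantic regions label and predict objects at the pixel level. Fig. 1 shows a semantically segmented image, with areas labeled “dog”, “cat”, and “background”. As you can see, compared with object detection, semantic segmentation labels regions with pixel-level borders and is therefore considerably more precise.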

Fig. 1. Semantically segmented image, with areas labeled “dog”, “cat”, and “background”.

1. Image Segmentation and Instance Segmentation

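In computer vision, there are two important methods related to semantic segmentation: image segmentation and instance segmentation. Here, we distinguish these concepts from semantic segmentation as follows: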

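Image segmentation divides an image into several constituent regions. This method generally makes use of the correlations between pixels in an image. During training, labels are not needed for image pixels. However, during prediction, this method cannot ensure that the segmented regions have the semantics we want. Given the image in Fig. 1 as input, image segmentation might divide the dog into two regions: one covering the dog's mouth and eyes, where black is the prominent color, and the other covering the rest of the dog, where yellow is the prominent color.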
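To make the "no labels needed" point concrete, here is a minimal sketch that groups pixels purely by intensity using a tiny one-dimensional 2-means clustering. It is only an illustration of the idea, not a method used elsewhere in this section; the function name `two_means` and the pixel values are hypothetical.

```python
def two_means(pixels, iters=10):
    """Split grayscale values into two clusters with a tiny 1-D k-means."""
    lo, hi = float(min(pixels)), float(max(pixels))  # initial centroids
    assign = [0] * len(pixels)
    for _ in range(iters):
        # Assign each pixel to the nearer centroid (0 = dark, 1 = bright).
        assign = [0 if abs(p - lo) <= abs(p - hi) else 1 for p in pixels]
        dark = [p for p, a in zip(pixels, assign) if a == 0]
        bright = [p for p, a in zip(pixels, assign) if a == 1]
        # Recompute centroids; keep the old one if a cluster is empty.
        lo = sum(dark) / len(dark) if dark else lo
        hi = sum(bright) / len(bright) if bright else hi
    return assign

# Hypothetical strip of pixels from one dog: dark muzzle, bright coat.
strip = [20, 25, 30, 200, 210, 205, 22, 198]
regions = two_means(strip)
print(regions)  # [0, 0, 0, 1, 1, 1, 0, 1]
```

The clustering splits a single dog into a "dark" and a "bright" region, which is exactly the failure mode described above: the regions follow color, not semantics.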

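Instance segmentation is also called simultaneous detection and segmentation. This method attempts to identify the pixel-level region of each object instance in an image. In contrast to semantic segmentation, instance segmentation distinguishes not only semantics but also different object instances. If an image contains two dogs, instance segmentation distinguishes which pixels belong to which dog.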
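The difference between the two kinds of output can be sketched with a toy row of pixels. The data below is hypothetical and only illustrates how the label maps differ: a semantic map stores one category per pixel, while an instance map additionally tells the two dogs apart.

```python
# Hypothetical 1x6 image row: background, dog A, dog A, background, dog B, dog B.
semantic = ['background', 'dog', 'dog', 'background', 'dog', 'dog']
# Instance IDs for the same pixels (0 means "no object instance").
instance = [0, 1, 1, 0, 2, 2]

# Semantic segmentation only knows which categories are present.
semantic_classes = set(semantic)

# Instance segmentation also knows how many dogs there are and
# which pixels belong to each of them.
instance_ids = sorted(set(instance) - {0})
pixels_per_instance = {i: instance.count(i) for i in instance_ids}

print(semantic_classes)     # contains 'background' and 'dog'
print(pixels_per_instance)  # {1: 2, 2: 2}
```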

2. The Pascal VOC2012 Semantic Segmentation Dataset

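In the field of semantic segmentation, one important dataset is Pascal VOC2012. To better understand this dataset, we first import the packages and modules needed for the experiment.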

%matplotlib inline
from d2l import mxnet as d2l
from mxnet import gluon, image, np, npx
import os

npx.set_np()

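The original site might be unstable, so we download the data from a mirror site. The archive is about 2 GB, so downloading takes some time. After the archive is extracted, the dataset is located in the ../data/VOCdevkit/VOC2012 path.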

#@save
d2l.DATA_HUB['voc2012'] = (d2l.DATA_URL + 'VOCtrainval_11-May-2012.tar',
                           '4e443f8a2eca6b1dac8a6c57641b67dd40621a49')

voc_dir = d2l.download_extract('voc2012', 'VOCdevkit/VOC2012')

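Go to ../data/VOCdevkit/VOC2012 to see the different parts of the dataset. The ImageSets/Segmentation path contains text files that specify the training and testing examples. The JPEGImages and SegmentationClass paths contain the example input images and labels, respectively. The labels are also stored as images, with the same dimensions as the input images they correspond to. In the labels, pixels with the same color belong to the same semantic category. The read_voc_images function defined below reads all input images and labels into memory.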

#@save
def read_voc_images(voc_dir, is_train=True):
    """Read all VOC feature and label images."""
    # Split file listing the example names; assumed to be the standard
    # VOC2012 train.txt / val.txt lists
    txt_fname = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
                             'train.txt' if is_train else 'val.txt')
    with open(txt_fname, 'r') as f:
        images = f.read().split()
    features, labels = [], []
    for i, fname in enumerate(images):
        features.append(image.imread(os.path.join(
            voc_dir, 'JPEGImages', '%s.jpg' % fname)))
        labels.append(image.imread(os.path.join(
            voc_dir, 'SegmentationClass', '%s.png' % fname)))
    return features, labels

train_features, train_labels = read_voc_images(voc_dir, True)

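We draw the first five input images and their labels. In the label images, white represents borders and black represents the background. Other colors correspond to different categories.

n = 5
imgs = train_features[0:n] + train_labels[0:n]
d2l.show_images(imgs, 2, n);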

Next, we list each RGB color value in the labels and the category it labels.

#@save
VOC_COLORMAP = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]

#@save
VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
               'diningtable', 'dog', 'horse', 'motorbike', 'person',
               'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']

With the two constants above defined, we can easily find the category index of each pixel in the labels.

#@save
def build_colormap2label():
    """Build an RGB color to label mapping for segmentation."""
    # One entry for every possible 24-bit RGB value
    colormap2label = np.zeros(256 ** 3)
    for i, colormap in enumerate(VOC_COLORMAP):
        colormap2label[(colormap[0]*256 + colormap[1])*256 + colormap[2]] = i
    return colormap2label

#@save
def voc_label_indices(colormap, colormap2label):
    """Map an RGB color to a label."""
    colormap = colormap.astype(np.int32)
    # Pack the three channels of every pixel into one integer key
    idx = ((colormap[:, :, 0] * 256 + colormap[:, :, 1]) * 256
           + colormap[:, :, 2])
    return colormap2label[idx]
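The key used by build_colormap2label and voc_label_indices packs the three 8-bit channels into a single integer, (R·256 + G)·256 + B. The following sketch works through one pixel in plain Python. The helper name `rgb_key` is hypothetical, and a dictionary stands in for the 256³-entry lookup array, so this illustrates the arithmetic rather than the implementation above.

```python
# First three colors and categories from the two constants listed above.
VOC_COLORMAP_HEAD = [[0, 0, 0], [128, 0, 0], [0, 128, 0]]
VOC_CLASSES_HEAD = ['background', 'aeroplane', 'bicycle']

def rgb_key(rgb):
    """Pack an (R, G, B) triple into one integer in [0, 256**3)."""
    r, g, b = rgb
    return (r * 256 + g) * 256 + b

# A dict stands in for the dense lookup array (assumption for brevity).
key2label = {rgb_key(c): i for i, c in enumerate(VOC_COLORMAP_HEAD)}

pixel = (128, 0, 0)
key = rgb_key(pixel)    # 128 * 256 * 256
label = key2label[key]
print(key, label, VOC_CLASSES_HEAD[label])  # 8388608 1 aeroplane
```

Because every RGB triple maps to a distinct key, looking up a whole label image reduces to a single array-indexing operation.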

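For example, in the first example image, the category index for the front part of the airplane is 1 and the index for the background is 0.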

y = voc_label_indices(train_labels[0], build_colormap2label())
y[105:115, 130:140], VOC_CLASSES[1]

(array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
        [0., 0., 0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0., 1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0., 1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]]),
 'aeroplane')

2.1. Data Preprocessing

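In the preceding chapters, we scaled images to make them fit the input shape of the model. In semantic segmentation, this approach would require us to map the predicted pixel categories back onto the original-size input image. Doing this precisely is very difficult, especially in segmented regions with different semantics. To avoid this problem, we crop the images to a fixed size instead of scaling them. Specifically, we use the random cropping method from image augmentation to crop the same region from the input image and its label.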

#@save
def voc_rand_crop(feature, label, height, width):
    """Randomly crop for both feature and label images."""
    # rect records where the feature was cropped so that the label
    # can be cropped at exactly the same position
    feature, rect = image.random_crop(feature, (width, height))
    label = image.fixed_crop(label, *rect)
    return feature, label

imgs = []
for _ in range(n):
    imgs += voc_rand_crop(train_features[0], train_labels[0], 200, 300)
d2l.show_images(imgs[::2] + imgs[1::2], 2, n);
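The essential point of the crop is that the input image and its label share one crop window, so each pixel stays aligned with its category. The following library-free sketch shows this on nested lists. The function `rand_crop_pair` and the toy data are hypothetical and do not reproduce the MXNet calls above.

```python
import random

def rand_crop_pair(feature, label, height, width, rng=random):
    """Crop the same randomly placed height x width window from both inputs."""
    h, w = len(feature), len(feature[0])
    top = rng.randint(0, h - height)   # randint bounds are inclusive
    left = rng.randint(0, w - width)

    def crop(img):
        return [row[left:left + width] for row in img[top:top + height]]

    return crop(feature), crop(label)

# Hypothetical 4x5 "image" whose values encode pixel positions, and a
# label map that assigns category 1 to pixels with an even value.
feature = [[10 * r + c for c in range(5)] for r in range(4)]
label = [[1 if v % 2 == 0 else 0 for v in row] for row in feature]

f_crop, l_crop = rand_crop_pair(feature, label, 2, 3, random.Random(0))

# The cropped label still matches the cropped image pixel for pixel.
aligned = l_crop == [[1 if v % 2 == 0 else 0 for v in row] for row in f_crop]
print(aligned)  # True
```

Cropping the two images independently would break this alignment, which is why the window coordinates are drawn once and reused.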

2.2. Dataset Class for Custom Semantic Segmentation

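We inherit from the Dataset class provided by Gluon to define a custom semantic segmentation dataset class, VOCSegDataset. By implementing the __getitem__ function, we can access the input image with index idx, together with the category index of each of its pixels, at any position in the dataset. Because some images in the dataset may be smaller than the output size specified for random cropping, we must remove these examples with a custom filter function. In addition, we define the normalize_image function to normalize each of the three RGB channels of the input images.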

#@save
class VOCSegDataset(gluon.data.Dataset):
    """A customized dataset to load VOC dataset."""
    def __init__(self, is_train, crop_size, voc_dir):
        # Per-channel mean and standard deviation; the commonly used
        # ImageNet statistics are assumed here
        self.rgb_mean = np.array([0.485, 0.456, 0.406])
        self.rgb_std = np.array([0.229, 0.224, 0.225])
        self.crop_size = crop_size
        features, labels = read_voc_images(voc_dir, is_train=is_train)
        self.features = [self.normalize_image(feature)
                         for feature in self.filter(features)]
        self.labels = self.filter(labels)
        self.colormap2label = build_colormap2label()
        print('read ' + str(len(self.features)) + ' examples')

    def normalize_image(self, img):
        return (img.astype('float32') / 255 - self.rgb_mean) / self.rgb_std

    def filter(self, imgs):
        # Drop images that are smaller than the crop window
        return [img for img in imgs if (
            img.shape[0] >= self.crop_size[0] and
            img.shape[1] >= self.crop_size[1])]

    def __getitem__(self, idx):
        feature, label = voc_rand_crop(self.features[idx], self.labels[idx],
                                       *self.crop_size)
        # Channels first for the feature; per-pixel class indices for the label
        return (feature.transpose(2, 0, 1),
                voc_label_indices(label, self.colormap2label))

    def __len__(self):
        return len(self.features)
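The two helper steps of the class, per-channel normalization and size filtering, can be checked by hand on small inputs. The sketch below uses plain Python with hypothetical statistics and image shapes, chosen so the arithmetic is easy to follow. It is not the class's actual data.

```python
def normalize_pixel(rgb, mean, std):
    """Scale 0-255 channel values to [0, 1], then standardize per channel."""
    return [(v / 255 - m) / s for v, m, s in zip(rgb, mean, std)]

def keep_large_enough(shapes, crop_size):
    """Keep only (height, width) shapes that can contain the crop window."""
    return [s for s in shapes
            if s[0] >= crop_size[0] and s[1] >= crop_size[1]]

# Hypothetical per-channel statistics, for illustration only.
mean, std = [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]
normalized = normalize_pixel([255, 0, 255], mean, std)
print(normalized)  # [2.0, -2.0, 2.0]

# Hypothetical image shapes; only two can hold a 320x480 crop.
shapes = [(281, 500), (375, 500), (500, 334), (320, 480)]
kept = keep_large_enough(shapes, (320, 480))
print(kept)  # [(375, 500), (320, 480)]
```

An image is discarded when either its height or its width is too small, which is why filtering reduces the number of usable examples.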

2.3. Reading the Dataset

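Using the custom VOCSegDataset class, we create the training set and testing set instances. We specify that the random crop operation outputs images of shape 320×480. Below, we can see the number of examples retained in the training and testing sets.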

crop_size = (320, 480)
voc_train = VOCSegDataset(True, crop_size, voc_dir)
voc_test = VOCSegDataset(False, crop_size, voc_dir)

read 1114 examples
read 1078 examples

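We set the batch size to 64 and define an iterator over the training set; an iterator for the testing set can be defined in the same way. We then print the shape of the first minibatch. In contrast to image classification and object recognition, the labels here are three-dimensional arrays.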

batch_size = 64
train_iter = gluon.data.DataLoader(voc_train, batch_size, shuffle=True,
                                   last_batch='discard',
                                   num_workers=d2l.get_dataloader_workers())
for X, Y in train_iter:
    print(X.shape)
    print(Y.shape)
    break

(64, 3, 320, 480)
(64, 320, 480)
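The shapes printed above follow from how each example is laid out: the image is moved from (height, width, channel) to (channel, height, width) order, while the label keeps one category index per pixel and so has no channel axis. The following sketch reproduces this on a tiny nested list. The helper names and the 2×3 toy image are hypothetical.

```python
def transpose_hwc_to_chw(img):
    """Rearrange a nested list from (height, width, channel) order
    to (channel, height, width) order."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][k] for x in range(w)] for y in range(h)]
            for k in range(c)]

def shape(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Hypothetical 2x3 RGB image and its 2x3 map of category indices.
img = [[[r, g, 0] for g in range(3)] for r in range(2)]
label = [[0, 1, 1], [0, 0, 1]]

chw = transpose_hwc_to_chw(img)
print(shape(img), shape(chw), shape(label))  # (2, 3, 3) (3, 2, 3) (2, 3)
```

Stacking 64 such examples adds a leading batch axis, giving a four-dimensional image batch and a three-dimensional label batch.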

2.4. Putting All Things Together

Finally, we define the load_data_voc function, which downloads the dataset and sets up the data loaders.

#@save
def load_data_voc(batch_size, crop_size):
    """Download and load the VOC2012 semantic dataset."""
    voc_dir = d2l.download_extract('voc2012', os.path.join(
        'VOCdevkit', 'VOC2012'))
    num_workers = d2l.get_dataloader_workers()
