[Code Reading] RandLA-Net


RandLA-Net is a recently proposed method for semantic segmentation of large-scale scenes, and its results are excellent. My interpretation of the paper can be found in my other post, and the authors have released their code. Below we walk through what that code actually does, using SemanticKITTI as the example.
Since the authors' code is written for TensorFlow, I have also ported it to PyTorch; the port implements training on the SemanticKITTI dataset.
Data Preprocessing
# utils/data_prepare_semantickitti.py
# line 42-50
points = DP.load_pc_kitti(join(pc_path, scan_id))
labels = DP.load_label_kitti(join(label_path, str(scan_id[:-4]) + '.label'), remap_lut)
sub_points, sub_labels = DP.grid_sub_sampling(points, labels=labels, grid_size=grid_size)
search_tree = KDTree(sub_points)
KDTree_save = join(KDTree_path_out, str(scan_id[:-4]) + '.pkl')
np.save(join(pc_path_out, scan_id)[:-4], sub_points)
np.save(join(label_path_out, scan_id)[:-4], sub_labels)
with open(KDTree_save, 'wb') as f:
    pickle.dump(search_tree, f)
As you can see, the preprocessing grid-subsamples the points and labels, builds a KDTree on the subsampled points, and saves both the subsampled data and the KDTree to disk.
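As a side note, here is a minimal NumPy/scikit-learn sketch of what this preprocessing amounts to. DP.grid_sub_sampling is actually a compiled C++ routine that averages points and majority-votes labels inside each voxel; simple_grid_subsample below is a hypothetical stand-in that just keeps one point per voxel, the file names are made up, and the 0.06 grid size is only what I believe the SemanticKITTI default to be.
import pickle
import numpy as np
from sklearn.neighbors import KDTree

def simple_grid_subsample(points, labels, grid_size=0.06):
    # Assign every point to a voxel of edge length grid_size and keep one
    # representative per voxel (the real routine averages inside each voxel).
    voxel = np.floor(points / grid_size).astype(np.int64)
    _, keep = np.unique(voxel, axis=0, return_index=True)
    return points[keep], labels[keep]

points = np.random.rand(100000, 3).astype(np.float32)   # stand-in scan
labels = np.random.randint(0, 19, size=(100000,))       # stand-in labels
sub_points, sub_labels = simple_grid_subsample(points, labels)
search_tree = KDTree(sub_points)                         # same role as the saved KDTree
with open('000000.pkl', 'wb') as f:                      # hypothetical output paths
    pickle.dump(search_tree, f)
np.save('000000_points.npy', sub_points)
np.save('000000_labels.npy', sub_labels)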
Dataset
The dataset is implemented by the SemanticKITTI class in main_SemanticKITTI.py. Let's look at what this part does.
class SemanticKITTI:
    def __init__(self, test_id):
        ...

    # Generate the input data flow
    def get_batch_gen(self, split):
        ...
        def spatially_regular_gen():
            # Generator loop
            # line 72-79
            for i in range(num_per_epoch):
                if split != 'test':
                    cloud_ind = i
                    pc_path = path_list[cloud_ind]
                    pc, tree, labels = self.get_data(pc_path)
                    # crop a small point cloud
                    pick_idx = np.random.choice(len(pc), 1)
                    selected_pc, selected_labels, selected_idx = self.crop_pc(pc, labels, tree, pick_idx)
                ...
        ...
        return gen_func, gen_types, gen_shapes

    def get_data(self, file_path):  # read the preprocessed points, KDTree and labels from the file at file_path
        ...
        return points, search_tree, labels

    @staticmethod
    def crop_pc(points, labels, search_tree, pick_idx):
        # crop a fixed size point cloud for training
        center_point = points[pick_idx, :].reshape(1, -1)
        select_idx = search_tree.query(center_point, k=cfg.num_points)[1][0]
        select_idx = DP.shuffle_idx(select_idx)
        select_points = points[select_idx]
        select_labels = labels[select_idx]
        return select_points, select_labels, select_idx

    @staticmethod
    def get_tf_mapping2():
        def tf_map(batch_pc, batch_label, batch_pc_idx, batch_cloud_idx):
            features = batch_pc
            input_points = []
            input_neighbors = []
            input_pools = []
            input_up_samples = []
            for i in range(cfg.num_layers):
                neighbour_idx = tf.py_func(DP.knn_search, [batch_pc, batch_pc, cfg.k_n], tf.int32)
                sub_points = batch_pc[:, :tf.shape(batch_pc)[1] // cfg.sub_sampling_ratio[i], :]
                pool_i = neighbour_idx[:, :tf.shape(batch_pc)[1] // cfg.sub_sampling_ratio[i], :]
                up_i = tf.py_func(DP.knn_search, [sub_points, batch_pc, 1], tf.int32)
                input_points.append(batch_pc)
                input_neighbors.append(neighbour_idx)
                input_pools.append(pool_i)
                input_up_samples.append(up_i)
                batch_pc = sub_points

            input_list = input_points + input_neighbors + input_pools + input_up_samples
            input_list += [features, batch_label, batch_pc_idx, batch_cloud_idx]

            return input_list

        return tf_map

    def init_input_pipeline(self):
        ...
From the code above we can see how a training sample is produced from the preprocessed data:
1. Line 72: get_data reads the preprocessed points, KDTree and labels from file.
2. Line 79: crop_pc picks a random seed point, takes its cfg.num_points nearest neighbours as the cropped cloud (the network input), and then shuffles the order of these points.
3. Lines 141-150: compute the k nearest neighbours of every input point; compute the sub-sampled points (since the point order is already shuffled, simply taking the first points is enough); compute pool_idx, the neighbour indices of the sub-sampled points in the previous layer; and compute up_idx, the indices used later for upsampling (see the sketch after this list).
4. Lines 152-153: concatenate all of the above into the network's input list.
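To make step 3 concrete, here is a rough single-sample NumPy/scikit-learn equivalent of one iteration of the loop in tf_map. DP.knn_search is really a compiled nearest-neighbour routine operating on batches; I substitute sklearn's KDTree, drop the batch dimension, and assume k_n = 16 and a sub-sampling ratio of 4, so treat this purely as an illustration.
import numpy as np
from sklearn.neighbors import KDTree

def one_layer_indices(pc, k_n=16, sub_sampling_ratio=4):
    # pc: [N, 3] points of one sample, already shuffled by crop_pc
    n_sub = pc.shape[0] // sub_sampling_ratio
    # neighbour_idx: the k nearest neighbours of every point in the same cloud
    neighbour_idx = KDTree(pc).query(pc, k=k_n)[1]          # [N, k_n]
    # sub_points: the order is random, so slicing == random sub-sampling
    sub_points = pc[:n_sub]                                 # [N/r, 3]
    # pool_idx: for each kept point, its neighbours in the previous layer
    pool_idx = neighbour_idx[:n_sub]                        # [N/r, k_n]
    # up_idx: for every original point, its single nearest kept point
    up_idx = KDTree(sub_points).query(pc, k=1)[1]           # [N, 1]
    return sub_points, neighbour_idx, pool_idx, up_idx

pc = np.random.rand(4096, 3)
sub_points, neighbour_idx, pool_idx, up_idx = one_layer_indices(pc)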
RandLA-Net
Since RandLA-Net's structure is fairly simple, I won't draw a diagram; let's go straight to the code.
Forward Pass
def inference(self, inputs, is_training):
    d_out = self.config.d_out
    feature = inputs['features']
    feature = tf.layers.dense(feature, 8, activation=None, name='fc0')
    feature = tf.nn.leaky_relu(tf.layers.batch_normalization(feature, -1, 0.99, 1e-6, training=is_training))
    feature = tf.expand_dims(feature, axis=2)

    # ########## Encoder ##########
    f_encoder_list = []
    for i in range(self.config.num_layers):
        f_encoder_i = self.dilated_res_block(feature, inputs['xyz'][i], inputs['neigh_idx'][i], d_out[i],
                                             'Encoder_layer_' + str(i), is_training)
        f_sampled_i = self.random_sample(f_encoder_i, inputs['sub_idx'][i])
        feature = f_sampled_i
        if i == 0:
            f_encoder_list.append(f_encoder_i)
        f_encoder_list.append(f_sampled_i)
    # ########## Encoder ##########

    feature = helper_tf_util.conv2d(f_encoder_list[-1], f_encoder_list[-1].get_shape()[3].value, [1, 1],
                                    'decoder_0',
                                    [1, 1], 'VALID', True, is_training)

    # ########## Decoder ##########
    f_decoder_list = []
    for j in range(self.config.num_layers):
        f_interp_i = self.nearest_interpolation(feature, inputs['interp_idx'][-j - 1])
        f_decoder_i = helper_tf_util.conv2d_transpose(tf.concat([f_encoder_list[-j - 2], f_interp_i], axis=3),
                                                      f_encoder_list[-j - 2].get_shape()[-1].value, [1, 1],
                                                      'Decoder_layer_' + str(j), [1, 1], 'VALID', bn=True,
                                                      is_training=is_training)
        feature = f_decoder_i
        f_decoder_list.append(f_decoder_i)
    # ########## Decoder ##########

    f_layer_fc1 = helper_tf_util.conv2d(f_decoder_list[-1], 64, [1, 1], 'fc1', [1, 1], 'VALID', True, is_training)
    f_layer_fc2 = helper_tf_util.conv2d(f_layer_fc1, 32, [1, 1], 'fc2', [1, 1], 'VALID', True, is_training)
    f_layer_drop = helper_tf_util.dropout(f_layer_fc2, keep_prob=0.5, is_training=is_training, scope='dp1')
    f_layer_fc3 = helper_tf_util.conv2d(f_layer_drop, self.config.num_classes, [1, 1], 'fc', [1, 1], 'VALID', False,
                                        is_training, activation_fn=None)
    f_out = tf.squeeze(f_layer_fc3, [2])
    return f_out
The inference function implements the forward pass. From it we can see that RandLA-Net has the following structure:
1. Lift the input features to 8 dimensions with a fully connected layer.
2. Encoder: four (dilated_res_block + random_sample) stages, which form a feature pyramid (a rough shape trace is sketched after this list).
3. Apply one more convolution to the features at the tip of the pyramid.
4. Decoder: four (nearest_interpolation + conv2d_transpose) stages, which recover point-wise features.
5. The point-wise features pass through a few MLPs to produce f_out.
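To see how the pyramid shrinks and grows, here is a small runnable shape trace. The numbers passed in (num_points = 4096 * 11, sub-sampling ratios of 4, d_out = [16, 64, 128, 256]) are what I recall the SemanticKITTI config using; treat them as assumptions and check the real config for the actual values.
def trace_shapes(num_points, num_layers=4, sub_sampling_ratio=(4, 4, 4, 4), d_out=(16, 64, 128, 256)):
    n, c = num_points, 8                              # after the initial fc0 layer
    print('input :  N = %7d, C = %d' % (n, c))
    for i in range(num_layers):                       # encoder
        c = 2 * d_out[i]                              # dilated_res_block outputs d_out * 2 channels
        n = n // sub_sampling_ratio[i]                # random_sample keeps the first N / r points
        print('enc %d :  N = %7d, C = %d' % (i, n, c))
    for j in range(num_layers):                       # decoder mirrors the encoder
        n = n * sub_sampling_ratio[num_layers - 1 - j]
        c = 2 * d_out[max(num_layers - 2 - j, 0)]     # conv2d_transpose outputs the skip connection's width
        print('dec %d :  N = %7d, C = %d' % (j, n, c))
    print('output:  N = %7d, C = num_classes' % n)

trace_shapes(num_points=4096 * 11)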
Encoder
dilated_res_block
def dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training):
    f_pc = helper_tf_util.conv2d(feature, d_out // 2, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_pc = self.building_block(xyz, f_pc, neigh_idx, d_out, name + 'LFA', is_training)
    f_pc = helper_tf_util.conv2d(f_pc, d_out * 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training,
                                 activation_fn=None)
    shortcut = helper_tf_util.conv2d(feature, d_out * 2, [1, 1], name + 'shortcut', [1, 1], 'VALID',
                                     activation_fn=None, bn=True, is_training=is_training)
    return tf.nn.leaky_relu(f_pc + shortcut)
dilated_res_block has the following structure:
1. Reduce the feature dimension.
2. Use building_block to aggregate the features of the neighbouring points.
3. Raise the feature dimension again.
4. Compute the shortcut features.
5. Add the two together, giving a residual structure (summarized as a formula below).
Next, the structure of building_block:
# line 279-291
def building_block(self, xyz, feature, neigh_idx, d_out, name, is_training):
    d_in = feature.get_shape()[-1].value
    f_xyz = self.relative_pos_encoding(xyz, neigh_idx)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_in, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(feature, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_out // 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(f_pc_agg, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out, name + 'att_pooling_2', is_training)
    return f_pc_agg
building_block has the following structure:
1. relative_pos_encoding extracts, for each of the k nearest neighbours, features relative to the reference point, corresponding to Eq. (1) in the paper.
2. These relative-position features are lifted to a higher dimension.
3. The relative-position features are concatenated with the neighbours' own features.
4. att_pooling aggregates them into the reference point's feature, corresponding to Eqs. (2) and (3) in the paper (the equations are reproduced below).
5. The neighbours' features (now taken from the aggregated features) are concatenated with the relative-position features once more.
6. att_pooling is applied again to obtain the reference point's feature.
By updating the reference point's feature twice, the information of its k nearest neighbours is aggregated onto the reference point.
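For reference, these are the three equations the att_pooling passes implement, paraphrased in my own notation (the paper also applies a shared MLP after the weighted sum, which in the code is the final conv2d inside att_pooling; check the paper for the exact form). Here $p_i$ is the reference point, $p_i^k$ its k-th neighbour, $f_i^k$ the neighbour's feature, and $\hat{f}_i^k$ the concatenation of $r_i^k$ with $f_i^k$:
$$ r_i^k = \mathrm{MLP}\big( p_i \oplus p_i^k \oplus (p_i - p_i^k) \oplus \lVert p_i - p_i^k \rVert \big) \quad (1) $$
$$ s_i^k = g(\hat{f}_i^k, W) = \mathrm{softmax}\big( W \hat{f}_i^k \big) \quad (2) $$
$$ \tilde{f}_i = \sum_{k=1}^{K} \hat{f}_i^k \odot s_i^k \quad (3) $$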
random_sample
The points selected by this sampling step were already determined when the dataset was built: the point order is shuffled and a fixed number of points is taken from the front.
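random_sample itself is not shown in the excerpt above. Roughly, and this is a sketch reconstructed from my reading rather than the verbatim source, it gathers each kept point's neighbour features via pool_idx (i.e. inputs['sub_idx'][i]) and max-pools over those neighbours:
def random_sample(feature, pool_idx):
    """
    Sketch only (not verbatim).
    :param feature:  [B, N, 1, d] features of the current layer
    :param pool_idx: [B, N//r, k] neighbour indices of the kept (first N//r) points
    :return:         [B, N//r, 1, d] max-pooled features of the kept points
    """
    feature = tf.squeeze(feature, axis=2)                       # [B, N, d]
    num_neigh = tf.shape(pool_idx)[-1]
    d = feature.get_shape()[-1]
    batch_size = tf.shape(pool_idx)[0]
    pool_idx = tf.reshape(pool_idx, [batch_size, -1])           # [B, N//r * k]
    pool_features = tf.batch_gather(feature, pool_idx)          # [B, N//r * k, d]
    pool_features = tf.reshape(pool_features, [batch_size, -1, num_neigh, d])
    pool_features = tf.reduce_max(pool_features, axis=2, keepdims=True)
    return pool_features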
Decoder
The main thing to look at is how nearest_interpolation works.
def nearest_interpolation(feature, interp_idx):
    """
    :param feature: [B, N, d] input features matrix
    :param interp_idx: [B, up_num_points, 1] nearest neighbour index
    :return: [B, up_num_points, d] interpolated features matrix
    """
    feature = tf.squeeze(feature, axis=2)
    batch_size = tf.shape(interp_idx)[0]
    up_num_points = tf.shape(interp_idx)[1]
    interp_idx = tf.reshape(interp_idx, [batch_size, up_num_points])
    interpolated_features = tf.batch_gather(feature, interp_idx)
    interpolated_features = tf.expand_dims(interpolated_features, axis=2)
    return interpolated_features
As you can see, the features are upsampled according to interp_idx.
# main_SemanticKITTI.py
# line 145
up_i = tf.py_func(DP.knn_search, [sub_points, batch_pc, 1], tf.int32)
So up_i (which is exactly interp_idx) is determined by querying the single nearest neighbour.
In other words, nearest_interpolation assigns the feature of the nearest point to the new point. This differs somewhat from PointNet++'s interpolation, which averages several neighbours weighted by inverse distance.
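A tiny NumPy comparison of the two upsampling styles, purely for illustration (random data, sklearn's KDTree standing in for the real neighbour search):
import numpy as np
from sklearn.neighbors import KDTree

coarse_xyz = np.random.rand(64, 3)                   # points that already have features
coarse_feat = np.random.rand(64, 32)
dense_xyz = np.random.rand(256, 3)                   # points we want features for
tree = KDTree(coarse_xyz)

# RandLA-Net style: copy the feature of the single nearest coarse point
idx1 = tree.query(dense_xyz, k=1)[1][:, 0]
feat_nearest = coarse_feat[idx1]                                      # [256, 32]

# PointNet++ style: inverse-distance weighted average of 3 neighbours
dist, idx3 = tree.query(dense_xyz, k=3)
w = 1.0 / (dist + 1e-8)
w = w / w.sum(axis=1, keepdims=True)
feat_weighted = (coarse_feat[idx3] * w[:, :, None]).sum(axis=1)       # [256, 32]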
Loss
# RandLANet.py
# line 57-76
with tf.variable_scope('loss'):
    self.logits = tf.reshape(self.logits, [-1, config.num_classes])
    self.labels = tf.reshape(self.labels, [-1])

    # Boolean mask of points that should be ignored
    ignored_bool = tf.zeros_like(self.labels, dtype=tf.bool)
    for ign_label in self.config.ignored_label_inds:
        ignored_bool = tf.logical_or(ignored_bool, tf.equal(self.labels, ign_label))

    # Collect logits and labels that are not ignored
    valid_idx = tf.squeeze(tf.where(tf.logical_not(ignored_bool)))
    valid_logits = tf.gather(self.logits, valid_idx, axis=0)
    valid_labels_init = tf.gather(self.labels, valid_idx, axis=0)

    # Reduce label values in the range of logit shape
    reducing_list = tf.range(self.config.num_classes, dtype=tf.int32)
    inserted_value = tf.zeros((1,), dtype=tf.int32)
    for ign_label in self.config.ignored_label_inds:
        reducing_list = tf.concat([reducing_list[:ign_label], inserted_value, reducing_list[ign_label:]], 0)
    valid_labels = tf.gather(reducing_list, valid_labels_init)
The loss computation has the following structure:
1. Flatten the logits from the forward pass and the labels.
2. Remove the points whose labels are to be ignored.
3. Remap the labels: since some labels are ignored, their indices are dropped, and the labels that come after each ignored index are shifted forward.
4. Load the weight of each class and compute the loss with a weighted cross-entropy (a sketch of this step follows the list).
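Step 4 itself is not in the excerpt above; a rough sketch of what it looks like (reconstructed from my reading of the loss helper, not guaranteed verbatim; pre_cal_weights stands for the precomputed inverse-frequency class weights and is an assumption here):
# continues from the valid_logits / valid_labels computed above
class_weights = tf.convert_to_tensor(pre_cal_weights, dtype=tf.float32)      # [num_classes]
one_hot_labels = tf.one_hot(valid_labels, depth=config.num_classes)          # [n_valid, num_classes]
weights = tf.reduce_sum(class_weights * one_hot_labels, axis=1)              # per-point class weight
unweighted_losses = tf.nn.softmax_cross_entropy_with_logits_v2(logits=valid_logits,
                                                               labels=one_hot_labels)
loss = tf.reduce_mean(unweighted_losses * weights)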
