[Code Reading] RandLA-Net


RandLA-Net is a recently proposed method for semantic segmentation of large-scale scenes, and its results are excellent. My interpretation of the paper can be found in my other post, and the authors have released their code. Below we walk through what that code actually does, using SemanticKITTI as the example.
Since the authors' code is written for TensorFlow, I have also ported it to PyTorch; the port implements training on the SemanticKITTI dataset.
Data Preprocessing
# utils/data_prepare_semantickitti.py
# line 42-50
points = DP.load_pc_kitti(join(pc_path, scan_id))
labels = DP.load_label_kitti(join(label_path, str(scan_id[:-4]) + '.label'), remap_lut)
sub_points, sub_labels = DP.grid_sub_sampling(points, labels=labels, grid_size=grid_size)
search_tree = KDTree(sub_points)
KDTree_save = join(KDTree_path_out, str(scan_id[:-4]) + '.pkl')
np.save(join(pc_path_out, scan_id)[:-4], sub_points)
np.save(join(label_path_out, scan_id)[:-4], sub_labels)
with open(KDTree_save, 'wb') as f:
    pickle.dump(search_tree, f)
As you can see, the preprocessing grid-subsamples the points and labels, builds a KDTree on the subsampled points, and saves both the subsampled data and the KDTree to disk.
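As a side note, here is a minimal NumPy/scikit-learn sketch of what this preprocessing amounts to. DP.grid_sub_sampling is actually a compiled C++ routine that averages points and majority-votes labels inside each voxel; simple_grid_subsample below is a hypothetical stand-in that just keeps one point per voxel, the file names are made up, and the 0.06 grid size is only what I believe the SemanticKITTI default to be.
import pickle
import numpy as np
from sklearn.neighbors import KDTree

def simple_grid_subsample(points, labels, grid_size=0.06):
    # Assign every point to a voxel of edge length grid_size and keep one
    # representative per voxel (the real routine averages inside each voxel).
    voxel = np.floor(points / grid_size).astype(np.int64)
    _, keep = np.unique(voxel, axis=0, return_index=True)
    return points[keep], labels[keep]

points = np.random.rand(100000, 3).astype(np.float32)   # stand-in scan
labels = np.random.randint(0, 19, size=(100000,))       # stand-in labels
sub_points, sub_labels = simple_grid_subsample(points, labels)
search_tree = KDTree(sub_points)                         # same role as the saved KDTree
with open('000000.pkl', 'wb') as f:                      # hypothetical output paths
    pickle.dump(search_tree, f)
np.save('000000_points.npy', sub_points)
np.save('000000_labels.npy', sub_labels)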
Dataset
The dataset is implemented by the SemanticKITTI class in main_SemanticKITTI.py. Let's look at what this part does.
class SemanticKITTI:
    def __init__(self, test_id):
        ...

    # Generate the input data flow
    def get_batch_gen(self, split):
        ...
        def spatially_regular_gen():
            # Generator loop
            # line 72-79
            for i in range(num_per_epoch):
                if split != 'test':
                    cloud_ind = i
                    pc_path = path_list[cloud_ind]
                    pc, tree, labels = self.get_data(pc_path)
                    # crop a small point cloud
                    pick_idx = np.random.choice(len(pc), 1)
                    selected_pc, selected_labels, selected_idx = self.crop_pc(pc, labels, tree, pick_idx)
                ...
        ...
        return gen_func, gen_types, gen_shapes

    def get_data(self, file_path):  # read the preprocessed points, KDTree and labels from the file at file_path
        ...
        return points, search_tree, labels

    @staticmethod
    def crop_pc(points, labels, search_tree, pick_idx):
        # crop a fixed size point cloud for training
        center_point = points[pick_idx, :].reshape(1, -1)
        select_idx = search_tree.query(center_point, k=cfg.num_points)[1][0]
        select_idx = DP.shuffle_idx(select_idx)
        select_points = points[select_idx]
        select_labels = labels[select_idx]
        return select_points, select_labels, select_idx

    @staticmethod
    def get_tf_mapping2():
        def tf_map(batch_pc, batch_label, batch_pc_idx, batch_cloud_idx):
            features = batch_pc
            input_points = []
            input_neighbors = []
            input_pools = []
            input_up_samples = []
            for i in range(cfg.num_layers):
                neighbour_idx = tf.py_func(DP.knn_search, [batch_pc, batch_pc, cfg.k_n], tf.int32)
                sub_points = batch_pc[:, :tf.shape(batch_pc)[1] // cfg.sub_sampling_ratio[i], :]
                pool_i = neighbour_idx[:, :tf.shape(batch_pc)[1] // cfg.sub_sampling_ratio[i], :]
                up_i = tf.py_func(DP.knn_search, [sub_points, batch_pc, 1], tf.int32)
                input_points.append(batch_pc)
                input_neighbors.append(neighbour_idx)
                input_pools.append(pool_i)
                input_up_samples.append(up_i)
                batch_pc = sub_points

            input_list = input_points + input_neighbors + input_pools + input_up_samples
            input_list += [features, batch_label, batch_pc_idx, batch_cloud_idx]

            return input_list

        return tf_map

    def init_input_pipeline(self):
        ...
From the code above we can see how a training sample is produced from the preprocessed data:
1. Line 72: get_data reads the preprocessed points, KDTree and labels from file.
2. Line 79: crop_pc picks a random seed point, takes its cfg.num_points nearest neighbours as the cropped cloud (the network input), and then shuffles the order of these points.
3. Lines 141-150: compute the k nearest neighbours of every input point; compute the sub-sampled points (since the point order is already shuffled, simply taking the first points is enough); compute pool_idx, the neighbour indices of the sub-sampled points in the previous layer; and compute up_idx, the indices used later for upsampling (see the sketch after this list).
4. Lines 152-153: concatenate all of the above into the network's input list.
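To make step 3 concrete, here is a rough single-sample NumPy/scikit-learn equivalent of one iteration of the loop in tf_map. DP.knn_search is really a compiled nearest-neighbour routine operating on batches; I substitute sklearn's KDTree, drop the batch dimension, and assume k_n = 16 and a sub-sampling ratio of 4, so treat this purely as an illustration.
import numpy as np
from sklearn.neighbors import KDTree

def one_layer_indices(pc, k_n=16, sub_sampling_ratio=4):
    # pc: [N, 3] points of one sample, already shuffled by crop_pc
    n_sub = pc.shape[0] // sub_sampling_ratio
    # neighbour_idx: the k nearest neighbours of every point in the same cloud
    neighbour_idx = KDTree(pc).query(pc, k=k_n)[1]          # [N, k_n]
    # sub_points: the order is random, so slicing == random sub-sampling
    sub_points = pc[:n_sub]                                 # [N/r, 3]
    # pool_idx: for each kept point, its neighbours in the previous layer
    pool_idx = neighbour_idx[:n_sub]                        # [N/r, k_n]
    # up_idx: for every original point, its single nearest kept point
    up_idx = KDTree(sub_points).query(pc, k=1)[1]           # [N, 1]
    return sub_points, neighbour_idx, pool_idx, up_idx

pc = np.random.rand(4096, 3)
sub_points, neighbour_idx, pool_idx, up_idx = one_layer_indices(pc)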
RandLA-Net
Since RandLA-Net's structure is fairly simple, I won't draw a diagram; let's go straight to the code.
Forward Pass
def inference(self, inputs, is_training):
    d_out = self.config.d_out
    feature = inputs['features']
    feature = tf.layers.dense(feature, 8, activation=None, name='fc0')
    feature = tf.nn.leaky_relu(tf.layers.batch_normalization(feature, -1, 0.99, 1e-6, training=is_training))
    feature = tf.expand_dims(feature, axis=2)

    # ########## Encoder ##########
    f_encoder_list = []
    for i in range(self.config.num_layers):
        f_encoder_i = self.dilated_res_block(feature, inputs['xyz'][i], inputs['neigh_idx'][i], d_out[i],
                                             'Encoder_layer_' + str(i), is_training)
        f_sampled_i = self.random_sample(f_encoder_i, inputs['sub_idx'][i])
        feature = f_sampled_i
        if i == 0:
            f_encoder_list.append(f_encoder_i)
        f_encoder_list.append(f_sampled_i)
    # ########## Encoder ##########

    feature = helper_tf_util.conv2d(f_encoder_list[-1], f_encoder_list[-1].get_shape()[3].value, [1, 1],
                                    'decoder_0',
                                    [1, 1], 'VALID', True, is_training)

    # ########## Decoder ##########
    f_decoder_list = []
    for j in range(self.config.num_layers):
        f_interp_i = self.nearest_interpolation(feature, inputs['interp_idx'][-j - 1])
        f_decoder_i = helper_tf_util.conv2d_transpose(tf.concat([f_encoder_list[-j - 2], f_interp_i], axis=3),
                                                      f_encoder_list[-j - 2].get_shape()[-1].value, [1, 1],
                                                      'Decoder_layer_' + str(j), [1, 1], 'VALID', bn=True,
                                                      is_training=is_training)
        feature = f_decoder_i
        f_decoder_list.append(f_decoder_i)
    # ########## Decoder ##########

    f_layer_fc1 = helper_tf_util.conv2d(f_decoder_list[-1], 64, [1, 1], 'fc1', [1, 1], 'VALID', True, is_training)
    f_layer_fc2 = helper_tf_util.conv2d(f_layer_fc1, 32, [1, 1], 'fc2', [1, 1], 'VALID', True, is_training)
    f_layer_drop = helper_tf_util.dropout(f_layer_fc2, keep_prob=0.5, is_training=is_training, scope='dp1')
    f_layer_fc3 = helper_tf_util.conv2d(f_layer_drop, self.config.num_classes, [1, 1], 'fc', [1, 1], 'VALID', False,
                                        is_training, activation_fn=None)
    f_out = tf.squeeze(f_layer_fc3, [2])
    return f_out
The inference function implements the forward pass. From it we can see that RandLA-Net has the following structure:
1. Lift the input features to 8 dimensions with a fully connected layer.
2. Encoder: four (dilated_res_block + random_sample) stages, which form a feature pyramid (a rough shape trace is sketched after this list).
3. Apply one more convolution to the features at the tip of the pyramid.
4. Decoder: four (nearest_interpolation + conv2d_transpose) stages, which recover point-wise features.
5. The point-wise features pass through a few MLPs to produce f_out.
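To see how the pyramid shrinks and grows, here is a small runnable shape trace. The numbers passed in (num_points = 4096 * 11, sub-sampling ratios of 4, d_out = [16, 64, 128, 256]) are what I recall the SemanticKITTI config using; treat them as assumptions and check the real config for the actual values.
def trace_shapes(num_points, num_layers=4, sub_sampling_ratio=(4, 4, 4, 4), d_out=(16, 64, 128, 256)):
    n, c = num_points, 8                              # after the initial fc0 layer
    print('input :  N = %7d, C = %d' % (n, c))
    for i in range(num_layers):                       # encoder
        c = 2 * d_out[i]                              # dilated_res_block outputs d_out * 2 channels
        n = n // sub_sampling_ratio[i]                # random_sample keeps the first N / r points
        print('enc %d :  N = %7d, C = %d' % (i, n, c))
    for j in range(num_layers):                       # decoder mirrors the encoder
        n = n * sub_sampling_ratio[num_layers - 1 - j]
        c = 2 * d_out[max(num_layers - 2 - j, 0)]     # conv2d_transpose outputs the skip connection's width
        print('dec %d :  N = %7d, C = %d' % (j, n, c))
    print('output:  N = %7d, C = num_classes' % n)

trace_shapes(num_points=4096 * 11)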
Encoder
dilated_res_block
def dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training):
    f_pc = helper_tf_util.conv2d(feature, d_out // 2, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_pc = self.building_block(xyz, f_pc, neigh_idx, d_out, name + 'LFA', is_training)
    f_pc = helper_tf_util.conv2d(f_pc, d_out * 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training,
                                 activation_fn=None)
    shortcut = helper_tf_util.conv2d(feature, d_out * 2, [1, 1], name + 'shortcut', [1, 1], 'VALID',
                                     activation_fn=None, bn=True, is_training=is_training)
    return tf.nn.leaky_relu(f_pc + shortcut)
dilated_res_block has the following structure:
1. Reduce the feature dimension.
2. Use building_block to aggregate the features of the neighbouring points.
3. Raise the feature dimension again.
4. Compute the shortcut features.
5. Add the two together, giving a residual structure (summarized as a formula below).
Next, the structure of building_block:
# line 279-291
def building_block(self, xyz, feature, neigh_idx, d_out, name, is_training):
    d_in = feature.get_shape()[-1].value
    f_xyz = self.relative_pos_encoding(xyz, neigh_idx)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_in, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(feature, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_out // 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(f_pc_agg, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out, name + 'att_pooling_2', is_training)
    return f_pc_agg
building_block has the following structure:
1. relative_pos_encoding extracts, for each of the k nearest neighbours, features relative to the reference point, corresponding to Eq. (1) in the paper.
2. These relative-position features are lifted to a higher dimension.
3. The relative-position features are concatenated with the neighbours' own features.
4. att_pooling aggregates them into the reference point's feature, corresponding to Eqs. (2) and (3) in the paper (the equations are reproduced below).
5. The neighbours' features (now taken from the aggregated features) are concatenated with the relative-position features once more.
6. att_pooling is applied again to obtain the reference point's feature.
By updating the reference point's feature twice, the information of its k nearest neighbours is aggregated onto the reference point.
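For reference, these are the three equations the att_pooling passes implement, paraphrased in my own notation (the paper also applies a shared MLP after the weighted sum, which in the code is the final conv2d inside att_pooling; check the paper for the exact form). Here $p_i$ is the reference point, $p_i^k$ its k-th neighbour, $f_i^k$ the neighbour's feature, and $\hat{f}_i^k$ the concatenation of $r_i^k$ with $f_i^k$:
$$ r_i^k = \mathrm{MLP}\big( p_i \oplus p_i^k \oplus (p_i - p_i^k) \oplus \lVert p_i - p_i^k \rVert \big) \quad (1) $$
$$ s_i^k = g(\hat{f}_i^k, W) = \mathrm{softmax}\big( W \hat{f}_i^k \big) \quad (2) $$
$$ \tilde{f}_i = \sum_{k=1}^{K} \hat{f}_i^k \odot s_i^k \quad (3) $$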
random_sample
The points selected by this sampling step were already determined when the dataset was built: the point order is shuffled and a fixed number of points is taken from the front.
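random_sample itself is not shown in the excerpt above. Roughly, and this is a sketch reconstructed from my reading rather than the verbatim source, it gathers each kept point's neighbour features via pool_idx (i.e. inputs['sub_idx'][i]) and max-pools over those neighbours:
def random_sample(feature, pool_idx):
    """
    Sketch only (not verbatim).
    :param feature:  [B, N, 1, d] features of the current layer
    :param pool_idx: [B, N//r, k] neighbour indices of the kept (first N//r) points
    :return:         [B, N//r, 1, d] max-pooled features of the kept points
    """
    feature = tf.squeeze(feature, axis=2)                       # [B, N, d]
    num_neigh = tf.shape(pool_idx)[-1]
    d = feature.get_shape()[-1]
    batch_size = tf.shape(pool_idx)[0]
    pool_idx = tf.reshape(pool_idx, [batch_size, -1])           # [B, N//r * k]
    pool_features = tf.batch_gather(feature, pool_idx)          # [B, N//r * k, d]
    pool_features = tf.reshape(pool_features, [batch_size, -1, num_neigh, d])
    pool_features = tf.reduce_max(pool_features, axis=2, keepdims=True)
    return pool_features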
Decoder
The main thing to look at is how nearest_interpolation works.
def nearest_interpolation(feature, interp_idx):
    """
    :param feature: [B, N, d] input features matrix
    :param interp_idx: [B, up_num_points, 1] nearest neighbour index
    :return: [B, up_num_points, d] interpolated features matrix
    """
    feature = tf.squeeze(feature, axis=2)
    batch_size = tf.shape(interp_idx)[0]
    up_num_points = tf.shape(interp_idx)[1]
    interp_idx = tf.reshape(interp_idx, [batch_size, up_num_points])
    interpolated_features = tf.batch_gather(feature, interp_idx)
    interpolated_features = tf.expand_dims(interpolated_features, axis=2)
    return interpolated_features
As you can see, the features are upsampled according to interp_idx.
# main_SemanticKITTI.py
# line 145
up_i = tf.py_func(DP.knn_search, [sub_points, batch_pc, 1], tf.int32)
So up_i (which is exactly interp_idx) is determined by querying the single nearest neighbour.
In other words, nearest_interpolation assigns the feature of the nearest point to the new point. This differs somewhat from PointNet++'s interpolation, which averages several neighbours weighted by inverse distance.
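A tiny NumPy comparison of the two upsampling styles, purely for illustration (random data, sklearn's KDTree standing in for the real neighbour search):
import numpy as np
from sklearn.neighbors import KDTree

coarse_xyz = np.random.rand(64, 3)                   # points that already have features
coarse_feat = np.random.rand(64, 32)
dense_xyz = np.random.rand(256, 3)                   # points we want features for
tree = KDTree(coarse_xyz)

# RandLA-Net style: copy the feature of the single nearest coarse point
idx1 = tree.query(dense_xyz, k=1)[1][:, 0]
feat_nearest = coarse_feat[idx1]                                      # [256, 32]

# PointNet++ style: inverse-distance weighted average of 3 neighbours
dist, idx3 = tree.query(dense_xyz, k=3)
w = 1.0 / (dist + 1e-8)
w = w / w.sum(axis=1, keepdims=True)
feat_weighted = (coarse_feat[idx3] * w[:, :, None]).sum(axis=1)       # [256, 32]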
Loss
# RandLANet.py
# line 57-76
with tf.variable_scope('loss'):
    self.logits = tf.reshape(self.logits, [-1, config.num_classes])
    self.labels = tf.reshape(self.labels, [-1])

    # Boolean mask of points that should be ignored
    ignored_bool = tf.zeros_like(self.labels, dtype=tf.bool)
    for ign_label in self.config.ignored_label_inds:
        ignored_bool = tf.logical_or(ignored_bool, tf.equal(self.labels, ign_label))

    # Collect logits and labels that are not ignored
    valid_idx = tf.squeeze(tf.where(tf.logical_not(ignored_bool)))
    valid_logits = tf.gather(self.logits, valid_idx, axis=0)
    valid_labels_init = tf.gather(self.labels, valid_idx, axis=0)

    # Reduce label values in the range of logit shape
    reducing_list = tf.range(self.config.num_classes, dtype=tf.int32)
    inserted_value = tf.zeros((1,), dtype=tf.int32)
    for ign_label in self.config.ignored_label_inds:
        reducing_list = tf.concat([reducing_list[:ign_label], inserted_value, reducing_list[ign_label:]], 0)
    valid_labels = tf.gather(reducing_list, valid_labels_init)
The loss computation has the following structure:
1. Flatten the logits from the forward pass and the labels.
2. Remove the points whose labels are to be ignored.
3. Remap the labels: since some labels are ignored, their indices are dropped, and the labels that come after each ignored index are shifted forward.
4. Load the weight of each class and compute the loss with a weighted cross-entropy (a sketch of this step follows the list).
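Step 4 itself is not in the excerpt above; a rough sketch of what it looks like (reconstructed from my reading of the loss helper, not guaranteed verbatim; pre_cal_weights stands for the precomputed inverse-frequency class weights and is an assumption here):
# continues from the valid_logits / valid_labels computed above
class_weights = tf.convert_to_tensor(pre_cal_weights, dtype=tf.float32)      # [num_classes]
one_hot_labels = tf.one_hot(valid_labels, depth=config.num_classes)          # [n_valid, num_classes]
weights = tf.reduce_sum(class_weights * one_hot_labels, axis=1)              # per-point class weight
unweighted_losses = tf.nn.softmax_cross_entropy_with_logits_v2(logits=valid_logits,
                                                               labels=one_hot_labels)
loss = tf.reduce_mean(unweighted_losses * weights)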
