AVOD Source Code Notes
AVOD Code Framework
The codebase is mainly divided into the following parts:
- Pre-generated data
- Train
- Evaluate + Infer
代碼細(xì)節(jié)
預(yù)生成數(shù)據(jù)
用于生成rpn網(wǎng)絡(luò)的輸入數(shù)據(jù):包含類(lèi)聚類(lèi)的anchor大小信息以及具體每個(gè)sample的anchor的生成的anchor信息
調(diào)用鏈
base_dir = avod/
config = avod/avod/configs/mb_preprocessing/rpn_cars(cyclists,pedestrians,people).config
主要的相關(guān)模塊調(diào)用:
scripts/preprocessing/gen_min_batches.py->avod/builders/dataset_builder.py(build_kitti_dataset)->avod/datasets/kitti/kitti_dataset.py(KittiDataset)->avod/datasets/kitti/kitti_utils.py(KittiUtils)->avod/core/mini_batch_utils.py(MiniBatchUtils.preprocess_rpn_mini_batches)->avod/core/mini_batch_preprocessor.py(MiniBatchPreprocessor.preprocess->avod/core/anchor_generator/grid_anchor_3d_generator.py(GridAnchor3dGenerator.generate)
Core Parts
Data preprocessing: mini-batch anchor generation
The AVOD gen_min_batches preprocessing consists of two parts: clustering the box sizes of each class, and using the clustering results to generate the per-class anchor information that serves as the RPN input.
Each anchor entry has the form [max_gt_2d_iou, max_gt_3d_iou, (6 x offsets), class_index]: the anchor's best IoU with the ground truth (in 2D and 3D), the anchor's regression offsets, and the index of the matched class.
The concrete steps are:
- Generate 3D anchors at the configured anchor_stride (0.5 by default)
- Build a 2D voxel grid and filter out empty anchors
- Compute the IoU between the anchors and the ground truth, determine the ground-truth box with the highest IoU for each anchor, and update the offsets and class_index accordingly
The core code is as follows:
```python
# mini_batch_preprocessor.py:49
def preprocess(self, indices):
    """Preprocesses anchor info and saves info to files

    Args:
        indices (int array): sample indices to process.
            If None, processes all samples
    """
    # Get anchor stride for class (default 0.5)
    anchor_strides = self._anchor_strides
    dataset = self._dataset
    dataset_utils = self._dataset.kitti_utils
    classes_name = dataset.classes_name

    # Make folder if it doesn't exist yet
    output_dir = self.mini_batch_utils.get_file_path(classes_name,
                                                     anchor_strides,
                                                     sample_name=None)
    os.makedirs(output_dir, exist_ok=True)

    # Get clusters for class
    # The cluster sizes are used to generate the anchor sizes
    all_clusters_sizes, _ = dataset.get_cluster_info()

    # Initialize the 3D anchor generator
    anchor_generator = grid_anchor_3d_generator.GridAnchor3dGenerator()

    # Load indices of data_split
    all_samples = dataset.sample_list

    if indices is None:
        indices = np.arange(len(all_samples))
    num_samples = len(indices)

    # For each image in the dataset, save info on the anchors
    for sample_idx in indices:
        # Get image name for given cluster
        sample_name = all_samples[sample_idx].name
        img_idx = int(sample_name)

        # Check for existing files and skip to the next
        if self._check_for_existing(classes_name, anchor_strides,
                                    sample_name):
            print("{} / {}: Sample already preprocessed".format(
                sample_idx + 1, num_samples, sample_name))
            continue

        # Get ground truth and filter based on difficulty
        ground_truth_list = obj_utils.read_labels(dataset.label_dir,
                                                  img_idx)

        # Filter objects to dataset classes
        filtered_gt_list = dataset_utils.filter_labels(ground_truth_list)
        filtered_gt_list = np.asarray(filtered_gt_list)

        # Filtering by class has no valid ground truth, skip this image
        if len(filtered_gt_list) == 0:
            print("{} / {} No {}s for sample {} "
                  "(Ground Truth Filter)".format(
                      sample_idx + 1, num_samples,
                      classes_name, sample_name))

            # Output an empty file and move on to the next image.
            self._save_to_file(classes_name, anchor_strides, sample_name)
            continue

        # Get ground plane
        ground_plane = obj_utils.get_road_plane(img_idx,
                                                dataset.planes_dir)

        image = Image.open(dataset.get_rgb_image_path(sample_name))
        image_shape = [image.size[1], image.size[0]]

        # Generate sliced 2D voxel grid for filtering
        # Only the BEV information inside the image view is kept here
        vx_grid_2d = dataset_utils.create_sliced_voxel_grid_2d(
            sample_name,
            source=dataset.bev_source,
            image_shape=image_shape)

        # List for merging all anchors
        all_anchor_boxes_3d = []

        # Create anchors for each class
        for class_idx in range(len(dataset.classes)):
            # Generate anchors for all classes
            # 3D anchors are generated from each class's anchor sizes,
            # its stride and the ground plane
            grid_anchor_boxes_3d = anchor_generator.generate(
                area_3d=self._area_extents,
                anchor_3d_sizes=all_clusters_sizes[class_idx],
                anchor_stride=self._anchor_strides[class_idx],
                ground_plane=ground_plane)

            all_anchor_boxes_3d.extend(grid_anchor_boxes_3d)

        # Filter empty anchors
        all_anchor_boxes_3d = np.asarray(all_anchor_boxes_3d)
        anchors = box_3d_encoder.box_3d_to_anchor(all_anchor_boxes_3d)
        empty_anchor_filter = anchor_filter.get_empty_anchor_filter_2d(
            anchors, vx_grid_2d, self._density_threshold)

        # Calculate anchor info
        # The IoU between every anchor and the ground truth is computed
        # here to find the target matched to each anchor
        anchors_info = self._calculate_anchors_info(
            all_anchor_boxes_3d, empty_anchor_filter, filtered_gt_list)

        anchor_ious = anchors_info[:, self.mini_batch_utils.col_ious]

        valid_iou_indices = np.where(anchor_ious > 0.0)[0]

        print("{} / {}:"
              "{:>6} anchors, "
              "{:>6} iou > 0.0, "
              "for {:>3} {}(s) for sample {}".format(
                  sample_idx + 1, num_samples,
                  len(anchors_info),
                  len(valid_iou_indices),
                  len(filtered_gt_list), classes_name, sample_name))

        # Save anchors info
        self._save_to_file(classes_name, anchor_strides,
                           sample_name, anchors_info)
```
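The helper `_calculate_anchors_info` is not reproduced above. Below is a rough numpy sketch of what it produces: one [max_gt_2d_iou, max_gt_3d_iou, 6 offsets, class_index] row per non-empty anchor, as described earlier. The `iou_2d_fn`, `iou_3d_fn` and `offset_fn` helpers are hypothetical placeholders; the real implementation in mini_batch_preprocessor.py differs in detail.

```python
import numpy as np

def calculate_anchors_info_sketch(anchor_boxes_3d, empty_anchor_filter,
                                  gt_boxes, gt_class_indices,
                                  iou_2d_fn, iou_3d_fn, offset_fn):
    """Illustrative only: build [iou_2d, iou_3d, 6 offsets, class_index]
    rows for the non-empty anchors, matching each anchor to its best
    ground-truth box. The *_fn helpers are assumptions, not AVOD code."""
    anchors = anchor_boxes_3d[empty_anchor_filter]

    ious_2d = iou_2d_fn(anchors, gt_boxes)   # (num_anchors, num_gt)
    ious_3d = iou_3d_fn(anchors, gt_boxes)   # (num_anchors, num_gt)

    # Match every anchor to the ground-truth box with the highest IoU
    best_gt = np.argmax(ious_2d, axis=1)
    rows = np.arange(len(anchors))
    max_iou_2d = ious_2d[rows, best_gt]
    max_iou_3d = ious_3d[rows, best_gt]

    # Regression offsets from each anchor to its matched ground truth
    # (6 values in the anchor encoding)
    offsets = offset_fn(anchors, gt_boxes[best_gt])

    class_indices = gt_class_indices[best_gt]
    return np.column_stack([max_iou_2d, max_iou_3d, offsets, class_indices])
```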
The 3D anchor generation itself (tile_anchors_3d) consists of three steps:
- Determine the anchor generation range (area_extents)
- Generate the distribution of anchor center points according to the stride
- Generate the size and rotation distributions, then combine them into the anchor matrix
```python
def tile_anchors_3d(area_extents,
                    anchor_3d_sizes,
                    anchor_stride,
                    ground_plane):
    """
    Tiles anchors over the area extents by using meshgrids to
    generate combinations of (x, y, z), (l, w, h) and ry.

    Args:
        area_extents: [[min_x, max_x], [min_y, max_y], [min_z, max_z]]
        anchor_3d_sizes: list of 3d anchor sizes N x (l, w, h)
        anchor_stride: stride lengths (x_stride, z_stride)
        ground_plane: coefficients of the ground plane e.g. [0, -1, 0, 0]

    Returns:
        boxes: list of 3D anchors in box_3d format N x [x, y, z, l, w, h, ry]
    """
    # Convert sizes to ndarray
    # In the KITTI camera frame the x and z axes span the ground plane,
    # while y is the height axis
    anchor_3d_sizes = np.asarray(anchor_3d_sizes)

    anchor_stride_x = anchor_stride[0]
    anchor_stride_z = anchor_stride[1]
    anchor_rotations = np.asarray([0, np.pi / 2.0])

    x_start = area_extents[0][0] + anchor_stride[0] / 2.0
    x_end = area_extents[0][1]
    x_centers = np.array(np.arange(x_start, x_end, step=anchor_stride_x),
                         dtype=np.float32)

    z_start = area_extents[2][1] - anchor_stride[1] / 2.0
    z_end = area_extents[2][0]
    z_centers = np.array(np.arange(z_start, z_end, step=-anchor_stride_z),
                         dtype=np.float32)

    # Use ranges for substitution
    size_indices = np.arange(0, len(anchor_3d_sizes))
    rotation_indices = np.arange(0, len(anchor_rotations))

    # Generate matrix for substitution
    # e.g. for two sizes and two rotations
    # [[x0, z0, 0, 0], [x0, z0, 0, 1], [x0, z0, 1, 0], [x0, z0, 1, 1],
    #  [x1, z0, 0, 0], [x1, z0, 0, 1], [x1, z0, 1, 0], [x1, z0, 1, 1], ...]
    before_sub = np.stack(np.meshgrid(x_centers,
                                      z_centers,
                                      size_indices,
                                      rotation_indices),
                          axis=4).reshape(-1, 4)

    # Place anchors on the ground plane
    # The (x, z) centers come from the meshgrid above; y is solved from
    # the ground plane equation
    a, b, c, d = ground_plane
    all_x = before_sub[:, 0]
    all_z = before_sub[:, 1]
    all_y = -(a * all_x + c * all_z + d) / b

    # Create empty matrix to return
    num_anchors = len(before_sub)
    all_anchor_boxes_3d = np.zeros((num_anchors, 7))

    # Fill in x, y, z
    all_anchor_boxes_3d[:, 0:3] = np.stack((all_x, all_y, all_z), axis=1)

    # Fill in shapes
    sizes = anchor_3d_sizes[np.asarray(before_sub[:, 2], np.int32)]
    all_anchor_boxes_3d[:, 3:6] = sizes

    # Fill in rotations
    rotations = anchor_rotations[np.asarray(before_sub[:, 3], np.int32)]
    all_anchor_boxes_3d[:, 6] = rotations

    return all_anchor_boxes_3d
```
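A small usage sketch of tile_anchors_3d with toy values (the real extents, cluster sizes and stride come from the dataset config and the clustering step):

```python
import numpy as np

# Toy values for illustration only
area_extents = np.array([[-4.0, 4.0], [-1.0, 3.0], [0.0, 8.0]])
anchor_3d_sizes = np.array([[3.5, 1.6, 1.5]])   # one (l, w, h) cluster
anchor_stride = [0.5, 0.5]
ground_plane = [0.0, -1.0, 0.0, 1.65]           # flat plane, camera 1.65 m up

anchors = tile_anchors_3d(area_extents, anchor_3d_sizes,
                          anchor_stride, ground_plane)

# 16 x-centers * 16 z-centers * 1 size * 2 rotations = 512 anchors
print(anchors.shape)   # (512, 7) -> [x, y, z, l, w, h, ry]
```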
模型訓(xùn)練
avod模型的整體結(jié)構(gòu)包括backbone+RPN+avod網(wǎng)絡(luò)三個(gè)部分,詳情參照avod_paperreading
backbone采用的是VGG+FPN的結(jié)構(gòu),但是添加了bev feature的設(shè)計(jì)(lidar三維數(shù)據(jù)轉(zhuǎn)化為二維的bev特征),后與image feature進(jìn)行融合,RPN網(wǎng)絡(luò)用于生成region proposal,avod用于最后物體的分類(lèi)和檢測(cè)框的回歸
調(diào)用鏈
base_dir = avod/
主要的相關(guān)模塊調(diào)用:
config = avod/config/pyramid_cars_with_aug_example.config
scripts/run_training.py->avod/avod/core/trainer.py(這里會(huì)完成model,input_data,loss,op等模塊的構(gòu)建)->avod/avod/core/models/avod_model.py->avod/avod/core/models/rpn_model.py
Core Parts
Data preprocessing
Unlike the pre-generated data described above, the preprocessing during training operates on the raw input data as it is loaded. It consists of the following parts:
Reading and filtering the 3D point cloud:
After the point cloud is read in, points outside the image view have to be removed. Two filters are involved: a ground_plane_filter and an image_filter. The former is mainly used when generating the BEV features (voxel spaces are created for the different height slices and point features are encoded per voxel, see the BEV generation below); the latter removes points that fall outside the camera view.
BEV Map Generation
The BEV map is built from the filtered point cloud. Within the height range [height_lo, height_hi] (relative to the ground plane), num_slices slices are taken along the y axis. Each slice is divided into cells of voxel_size, and the maximum point height inside each voxel is used as its feature. The result is a feature map of size (bev_width / voxel_size) x (bev_height / voxel_size) x (num_slices + 1), where the extra channel stores the density information. The code is as follows:
```python
# avod/avod/datasets/kitti/kitti_utils.py:109
def generate_bev(self,
                 source,
                 point_cloud,
                 ground_plane,
                 area_extents,
                 voxel_size):
    """Generates the BEV maps dictionary. One height map is created for
    each slice of the point cloud. One density map is created for
    the whole point cloud.

    Args:
        source: point cloud source
        point_cloud: point cloud (3, N)
        ground_plane: ground plane coefficients
        area_extents: 3D area extents
            [[min_x, max_x], [min_y, max_y], [min_z, max_z]]
        voxel_size: voxel size in m

    Returns:
        BEV maps dictionary
            height_maps: list of height maps
            density_map: density map
    """
    # Get the point cloud (transposed to N x 3)
    all_points = np.transpose(point_cloud)

    height_maps = []

    for slice_idx in range(self.num_slices):
        height_lo = self.height_lo + slice_idx * self.height_per_division
        height_hi = height_lo + self.height_per_division

        # The slice filter keeps each slice's points based on their
        # height relative to the ground plane
        slice_filter = self.kitti_utils.create_slice_filter(
            point_cloud,
            area_extents,
            ground_plane,
            height_lo,
            height_hi)

        # Apply slice filter
        slice_points = all_points[slice_filter]

        if len(slice_points) > 1:
            # Create Voxel Grid 2D
            voxel_grid_2d = VoxelGrid2D()
            voxel_grid_2d.voxelize_2d(
                slice_points, voxel_size,
                extents=area_extents,
                ground_plane=ground_plane,
                create_leaf_layout=False)

            # Remove y values (all 0)
            voxel_indices = voxel_grid_2d.voxel_indices[:, [0, 2]]

        # Create empty BEV images
        height_map = np.zeros((voxel_grid_2d.num_divisions[0],
                               voxel_grid_2d.num_divisions[2]))

        # Only update pixels where voxels have max height values,
        # and normalize by height of slices
        # This produces a height_map holding the maximum height per voxel
        voxel_grid_2d.heights = voxel_grid_2d.heights - height_lo
        height_map[voxel_indices[:, 0], voxel_indices[:, 1]] = \
            np.asarray(voxel_grid_2d.heights) / self.height_per_division

        height_maps.append(height_map)

    # Rotate height maps 90 degrees
    # (transpose and flip) is faster than np.rot90
    # Presumably due to the differing coordinate conventions of the
    # image and the BEV map
    height_maps_out = [np.flip(height_maps[map_idx].transpose(), axis=0)
                       for map_idx in range(len(height_maps))]

    # Density filter: the density is computed over the full height range
    density_slice_filter = self.kitti_utils.create_slice_filter(
        point_cloud,
        area_extents,
        ground_plane,
        self.height_lo,
        self.height_hi)

    density_points = all_points[density_slice_filter]

    # Create Voxel Grid 2D
    density_voxel_grid_2d = VoxelGrid2D()
    density_voxel_grid_2d.voxelize_2d(
        density_points,
        voxel_size,
        extents=area_extents,
        ground_plane=ground_plane,
        create_leaf_layout=False)

    # Generate density map
    density_voxel_indices_2d = \
        density_voxel_grid_2d.voxel_indices[:, [0, 2]]

    density_map = self._create_density_map(
        num_divisions=density_voxel_grid_2d.num_divisions,
        voxel_indices_2d=density_voxel_indices_2d,
        num_pts_per_voxel=density_voxel_grid_2d.num_pts_in_voxel,
        norm_value=self.NORM_VALUES[source])

    bev_maps = dict()
    bev_maps['height_maps'] = height_maps_out
    bev_maps['density_map'] = density_map

    return bev_maps
```
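To make the max-height encoding concrete, here is a tiny standalone sketch of one slice with toy values (the real code goes through VoxelGrid2D and normalizes by the slice height):

```python
import numpy as np

# Toy points (x, y, z) in camera coordinates; y increases downwards
points = np.array([[1.2, 1.0, 5.3],
                   [1.3, 0.6, 5.4],
                   [4.0, 0.9, 9.7]])

voxel_size = 0.5
x_extent, z_extent = (0.0, 5.0), (0.0, 10.0)

# Voxel index of every point in the (x, z) ground plane
ix = ((points[:, 0] - x_extent[0]) / voxel_size).astype(int)
iz = ((points[:, 2] - z_extent[0]) / voxel_size).astype(int)

# Height above the ground plane; here the ground is assumed at y = 1.65
heights = 1.65 - points[:, 1]

# One BEV channel: maximum height per occupied voxel
height_map = np.zeros((int((x_extent[1] - x_extent[0]) / voxel_size),
                       int((z_extent[1] - z_extent[0]) / voxel_size)))
np.maximum.at(height_map, (ix, iz), heights)

print(height_map[2, 10])   # 1.05 -> max of the two points in that voxel
```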
Data Augmentation
During data loading, data augmentation is applied. The default augmentations for the car class are flipping and pca_jitter.
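A hedged sketch of the two default augmentations. Flipping must also mirror the labels, calibration and BEV input (only the image part is shown), and pca_jitter refers to AlexNet-style colour jitter along the principal components of the RGB values; this sketch computes the eigen decomposition from the image itself for illustration and is not the AVOD implementation.

```python
import numpy as np

def flip_image(image):
    """Horizontal flip; boxes / point clouds must be mirrored accordingly."""
    return image[:, ::-1, :]

def pca_jitter(image, magnitude=0.1, rng=None):
    """AlexNet-style colour jitter along the principal components of the
    RGB values. Illustrative sketch only."""
    if rng is None:
        rng = np.random.default_rng()
    img = image.astype(np.float32) / 255.0

    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)
    eig_vals, eig_vecs = np.linalg.eigh(cov)

    # Random scale per principal component
    alphas = rng.normal(0.0, magnitude, size=3)
    noise = eig_vecs @ (alphas * eig_vals)

    jittered = np.clip(img + noise, 0.0, 1.0)
    return (jittered * 255.0).astype(image.dtype)
```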
Backbone
The backbone (feature extractor) has two branches, BEV and image, with essentially the same structure (see the code below). The structure can be summarized as conv1*2 -> pool1 -> conv2*2 -> pool2 -> conv3*2 -> pool3 -> conv4 -> (upconv3 + concat3 + fusion3) -> (upconv2 + concat2 + fusion2) -> (upconv1 + concat1 + fusion1).
```python
# avod/core/feature_extractors/bev_vgg_pyramid.py:30
def build(self,
          inputs,
          input_pixel_size,
          is_training,
          scope='bev_vgg_pyr'):
    """ Modified VGG for BEV feature extraction with pyramid features

    Args:
        inputs: a tensor of size [batch_size, height, width, channels].
        input_pixel_size: size of the input (H x W)
        is_training: True for training, False for validation/testing.
        scope: Optional scope for the variables.

    Returns:
        The last op containing the log predictions and end_points dict.
    """
    vgg_config = self.config

    with slim.arg_scope(self.vgg_arg_scope(
            weight_decay=vgg_config.l2_weight_decay)):
        with tf.variable_scope(scope, 'bev_vgg_pyr', [inputs]) as sc:
            end_points_collection = sc.name + '_end_points'

            # Collect outputs for conv2d, fully_connected and max_pool2d.
            with slim.arg_scope([slim.conv2d, slim.max_pool2d],
                                outputs_collections=end_points_collection):

                # Pad 700 to 704 to allow even divisions for max pooling
                padded = tf.pad(inputs, [[0, 0], [4, 0], [0, 0], [0, 0]])

                # Encoder
                conv1 = slim.repeat(padded,
                                    vgg_config.vgg_conv1[0],
                                    slim.conv2d,
                                    vgg_config.vgg_conv1[1],
                                    [3, 3],
                                    normalizer_fn=slim.batch_norm,
                                    normalizer_params={
                                        'is_training': is_training},
                                    scope='conv1')
                pool1 = slim.max_pool2d(conv1, [2, 2], scope='pool1')

                conv2 = slim.repeat(pool1,
                                    vgg_config.vgg_conv2[0],
                                    slim.conv2d,
                                    vgg_config.vgg_conv2[1],
                                    [3, 3],
                                    normalizer_fn=slim.batch_norm,
                                    normalizer_params={
                                        'is_training': is_training},
                                    scope='conv2')
                pool2 = slim.max_pool2d(conv2, [2, 2], scope='pool2')

                conv3 = slim.repeat(pool2,
                                    vgg_config.vgg_conv3[0],
                                    slim.conv2d,
                                    vgg_config.vgg_conv3[1],
                                    [3, 3],
                                    normalizer_fn=slim.batch_norm,
                                    normalizer_params={
                                        'is_training': is_training},
                                    scope='conv3')
                pool3 = slim.max_pool2d(conv3, [2, 2], scope='pool3')

                conv4 = slim.repeat(pool3,
                                    vgg_config.vgg_conv4[0],
                                    slim.conv2d,
                                    vgg_config.vgg_conv4[1],
                                    [3, 3],
                                    normalizer_fn=slim.batch_norm,
                                    normalizer_params={
                                        'is_training': is_training},
                                    scope='conv4')

                # Decoder (upsample and fuse features)
                upconv3 = slim.conv2d_transpose(
                    conv4,
                    vgg_config.vgg_conv3[1],
                    [3, 3],
                    stride=2,
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='upconv3')

                concat3 = tf.concat((conv3, upconv3), axis=3, name='concat3')
                pyramid_fusion3 = slim.conv2d(
                    concat3,
                    vgg_config.vgg_conv2[1],
                    [3, 3],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='pyramid_fusion3')

                upconv2 = slim.conv2d_transpose(
                    pyramid_fusion3,
                    vgg_config.vgg_conv2[1],
                    [3, 3],
                    stride=2,
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='upconv2')

                concat2 = tf.concat((conv2, upconv2), axis=3, name='concat2')
                pyramid_fusion_2 = slim.conv2d(
                    concat2,
                    vgg_config.vgg_conv1[1],
                    [3, 3],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='pyramid_fusion2')

                upconv1 = slim.conv2d_transpose(
                    pyramid_fusion_2,
                    vgg_config.vgg_conv1[1],
                    [3, 3],
                    stride=2,
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='upconv1')

                concat1 = tf.concat((conv1, upconv1), axis=3, name='concat1')
                pyramid_fusion1 = slim.conv2d(
                    concat1,
                    vgg_config.vgg_conv1[1],
                    [3, 3],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training},
                    scope='pyramid_fusion1')

                # Slice off padded area
                sliced = pyramid_fusion1[:, 4:]

            feature_maps_out = sliced

            # Convert end_points_collection into a end_point dict.
            end_points = slim.utils.convert_collection_to_dict(
                end_points_collection)

            return feature_maps_out, end_points
```
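A quick sanity check on the padding at the top of the encoder: the comment notes a BEV input height of 700, and the three 2x2 max-poolings require the padded dimension to be divisible by 2^3 = 8, hence the pad to 704 (and the [:, 4:] slice at the end to remove it again).

```python
# 700 is not divisible by 8 (three max-pool stages), 704 is
assert 700 % 8 != 0 and 704 % 8 == 0
print(704 // 2, 704 // 4, 704 // 8)   # 352, 176, 88
```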
RPN Model
The features coming out of the backbone each pass through a 1x1 convolution (bottleneck) to produce the input features of the proposal network. The default configuration enables path_drop: with some probability the image path or the BEV path receives no input, similar to dropout (see create_path_drop_masks in avod/avod/core/models/rpn_model.py). The 3D anchors are then projected onto the BEV map and the image: the former by projecting directly onto the ground plane, the latter via the mapping between lidar and image coordinates (taking the enclosing 2D box). The resulting proposal features are cropped and resized (crop_and_resize) to a common size given by roi_crop_size in the config, and then fused (mean fusion by default). The fused features go through two branches, each made of three convolutions (fully connected layers in the paper, conv2d in the actual code), predicting objectness and offsets; this yields the first-stage proposals. The proposals are used in two ways: a top-k NMS (note that this NMS is performed jointly over all classes) produces the input of the second stage, while gen_mini_batch samples a mini-batch (512 samples by default, half positive and half negative) for the objectness and regression (smooth L1) losses. Note that the mini-batch is sampled by random shuffling: half of the budget (256) is filled with shuffled positives and, if there are not enough, topped up with negatives. Class imbalance is not taken into account, so classes with few samples may converge slowly or not at all. A sketch of this sampling idea is shown below, followed by the build() code of the network:
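A minimal numpy sketch of the sampling idea described above (the actual sample_rpn_mini_batch in mini_batch_utils.py works on TF tensors and uses the configured IoU ranges; the threshold and budget below are placeholders taken from the description):

```python
import numpy as np

def sample_rpn_mini_batch_sketch(anchor_ious, mini_batch_size=512,
                                 pos_iou_threshold=0.5, rng=None):
    """Illustrative sampling: shuffle positives first, top up with negatives.

    anchor_ious: (N,) best IoU of each anchor with any ground-truth box.
    Returns the indices of the sampled anchors.
    """
    if rng is None:
        rng = np.random.default_rng()

    pos_indices = np.where(anchor_ious >= pos_iou_threshold)[0]
    neg_indices = np.where(anchor_ious < pos_iou_threshold)[0]

    # Half of the budget is reserved for positives
    num_pos = min(len(pos_indices), mini_batch_size // 2)
    sampled_pos = rng.permutation(pos_indices)[:num_pos]

    # Whatever is left is filled with negatives; note that no per-class
    # balancing is done, which is the imbalance issue mentioned above
    num_neg = mini_batch_size - num_pos
    sampled_neg = rng.permutation(neg_indices)[:num_neg]

    return np.concatenate([sampled_pos, sampled_neg])
```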
#rpn_model.py:280, deteled some code for summary def build(self):# Setup input placeholdersself._set_up_input_pls()# Setup feature extractorsself._set_up_feature_extractors()bev_proposal_input = self.bev_bottleneckimg_proposal_input = self.img_bottleneckfusion_mean_div_factor = 2.0# If both img and bev probabilites are set to 1.0, don't do# path drop.if not (self._path_drop_probabilities[0] ==self._path_drop_probabilities[1] == 1.0):with tf.variable_scope('rpn_path_drop'):random_values = tf.random_uniform(shape=[3],minval=0.0,maxval=1.0)img_mask, bev_mask = self.create_path_drop_masks(self._path_drop_probabilities[0],self._path_drop_probabilities[1],random_values)img_proposal_input = tf.multiply(img_proposal_input,img_mask)bev_proposal_input = tf.multiply(bev_proposal_input,bev_mask)self.img_path_drop_mask = img_maskself.bev_path_drop_mask = bev_mask# Overwrite the division factorfusion_mean_div_factor = img_mask + bev_maskwith tf.variable_scope('proposal_roi_pooling'):with tf.variable_scope('box_indices'):def get_box_indices(boxes):proposals_shape = boxes.get_shape().as_list()if any(dim is None for dim in proposals_shape):proposals_shape = tf.shape(boxes)ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)multiplier = tf.expand_dims(tf.range(start=0, limit=proposals_shape[0]), 1)return tf.reshape(ones_mat * multiplier, [-1])bev_boxes_norm_batches = tf.expand_dims(self._bev_anchors_norm_pl, axis=0)# These should be all 0's since there is only 1 imagetf_box_indices = get_box_indices(bev_boxes_norm_batches)# Do ROI Pooling on BEVbev_proposal_rois = tf.image.crop_and_resize(bev_proposal_input,self._bev_anchors_norm_pl,tf_box_indices,self._proposal_roi_crop_size)# Do ROI Pooling on imageimg_proposal_rois = tf.image.crop_and_resize(img_proposal_input,self._img_anchors_norm_pl,tf_box_indices,self._proposal_roi_crop_size)with tf.variable_scope('proposal_roi_fusion'):rpn_fusion_out = Noneif self._fusion_method == 'mean':tf_features_sum = tf.add(bev_proposal_rois, img_proposal_rois)rpn_fusion_out = tf.divide(tf_features_sum,fusion_mean_div_factor)elif self._fusion_method == 'concat':rpn_fusion_out = tf.concat([bev_proposal_rois, img_proposal_rois], axis=3)else:raise ValueError('Invalid fusion method', self._fusion_method)# TODO: move this section into an separate AnchorPredictor classwith tf.variable_scope('anchor_predictor', 'ap', [rpn_fusion_out]):tensor_in = rpn_fusion_out# Parse rpn layers configlayers_config = self._config.layers_config.rpn_configl2_weight_decay = layers_config.l2_weight_decayif l2_weight_decay > 0:weights_regularizer = slim.l2_regularizer(l2_weight_decay)else:weights_regularizer = Nonewith slim.arg_scope([slim.conv2d],weights_regularizer=weights_regularizer):# Use conv2d instead of fully_connected layers.cls_fc6 = slim.conv2d(tensor_in,layers_config.cls_fc6,self._proposal_roi_crop_size,padding='VALID',scope='cls_fc6')cls_fc6_drop = slim.dropout(cls_fc6,layers_config.keep_prob,is_training=self._is_training,scope='cls_fc6_drop')cls_fc7 = slim.conv2d(cls_fc6_drop,layers_config.cls_fc7,[1, 1],scope='cls_fc7')cls_fc7_drop = slim.dropout(cls_fc7,layers_config.keep_prob,is_training=self._is_training,scope='cls_fc7_drop')cls_fc8 = slim.conv2d(cls_fc7_drop,2,[1, 1],activation_fn=None,scope='cls_fc8')objectness = tf.squeeze(cls_fc8, [1, 2],name='cls_fc8/squeezed')# Use conv2d instead of fully_connected layers.reg_fc6 = slim.conv2d(tensor_in,layers_config.reg_fc6,self._proposal_roi_crop_size,padding='VALID',scope='reg_fc6')reg_fc6_drop = 
slim.dropout(reg_fc6,layers_config.keep_prob,is_training=self._is_training,scope='reg_fc6_drop')reg_fc7 = slim.conv2d(reg_fc6_drop,layers_config.reg_fc7,[1, 1],scope='reg_fc7')reg_fc7_drop = slim.dropout(reg_fc7,layers_config.keep_prob,is_training=self._is_training,scope='reg_fc7_drop')reg_fc8 = slim.conv2d(reg_fc7_drop,6,[1, 1],activation_fn=None,scope='reg_fc8')offsets = tf.squeeze(reg_fc8, [1, 2],name='reg_fc8/squeezed')# Return the proposalswith tf.variable_scope('proposals'):anchors = self.placeholders[self.PL_ANCHORS]# Decode anchor regression offsetswith tf.variable_scope('decoding'):regressed_anchors = anchor_encoder.offset_to_anchor(anchors, offsets)with tf.variable_scope('bev_projection'):_, bev_proposal_boxes_norm = anchor_projector.project_to_bev(regressed_anchors, self._bev_extents)with tf.variable_scope('softmax'):objectness_softmax = tf.nn.softmax(objectness)with tf.variable_scope('nms'):objectness_scores = objectness_softmax[:, 1]# Do NMS on regressed anchorstop_indices = tf.image.non_max_suppression(bev_proposal_boxes_norm, objectness_scores,max_output_size=self._nms_size,iou_threshold=self._nms_iou_thresh)top_anchors = tf.gather(regressed_anchors, top_indices)top_objectness_softmax = tf.gather(objectness_scores,top_indices)# top_offsets = tf.gather(offsets, top_indices)# top_objectness = tf.gather(objectness, top_indices)# Get mini batchall_ious_gt = self.placeholders[self.PL_ANCHOR_IOUS]all_offsets_gt = self.placeholders[self.PL_ANCHOR_OFFSETS]all_classes_gt = self.placeholders[self.PL_ANCHOR_CLASSES]with tf.variable_scope('mini_batch'):mini_batch_utils = self.dataset.kitti_utils.mini_batch_utilsmini_batch_mask, _ = \mini_batch_utils.sample_rpn_mini_batch(all_ious_gt)# Ground Truth Tensorswith tf.variable_scope('one_hot_classes'):# Anchor classification ground truth# Object / Not Objectmin_pos_iou = \self.dataset.kitti_utils.mini_batch_utils.rpn_pos_iou_range[0]objectness_classes_gt = tf.cast(tf.greater_equal(all_ious_gt, min_pos_iou),dtype=tf.int32)objectness_gt = tf.one_hot(objectness_classes_gt, depth=2,on_value=1.0 - self._config.label_smoothing_epsilon,off_value=self._config.label_smoothing_epsilon)# Mask predictions for mini batchwith tf.variable_scope('prediction_mini_batch'):objectness_masked = tf.boolean_mask(objectness, mini_batch_mask)offsets_masked = tf.boolean_mask(offsets, mini_batch_mask)with tf.variable_scope('ground_truth_mini_batch'):objectness_gt_masked = tf.boolean_mask(objectness_gt, mini_batch_mask)offsets_gt_masked = tf.boolean_mask(all_offsets_gt,mini_batch_mask)# Specify the tensors to evaluatepredictions = dict()# Temporary predictions for debugging# predictions['anchor_ious'] = anchor_ious# predictions['anchor_offsets'] = all_offsets_gtif self._train_val_test in ['train', 'val']:# All anchorspredictions[self.PRED_ANCHORS] = anchors# Mini-batch maskspredictions[self.PRED_MB_MASK] = mini_batch_mask# Mini-batch predictionspredictions[self.PRED_MB_OBJECTNESS] = objectness_maskedpredictions[self.PRED_MB_OFFSETS] = offsets_masked# Mini batch ground truthpredictions[self.PRED_MB_OFFSETS_GT] = offsets_gt_maskedpredictions[self.PRED_MB_OBJECTNESS_GT] = objectness_gt_masked# Proposals after nmspredictions[self.PRED_TOP_INDICES] = top_indicespredictions[self.PRED_TOP_ANCHORS] = top_anchorspredictions[self.PRED_TOP_OBJECTNESS_SOFTMAX] = top_objectness_softmaxelse:# self._train_val_test == 'test'predictions[self.PRED_TOP_ANCHORS] = top_anchorspredictions[self.PRED_TOP_OBJECTNESS_SOFTMAX] = top_objectness_softmaxreturn predictionsAVOD Model
AVOD網(wǎng)絡(luò)部分會(huì)得到first stage得到的top-k anchor proposals,得到對(duì)應(yīng)bev和img的anchor projection,進(jìn)行相同的crop_and_resize操作,之后再進(jìn)行fusion+n*(fc+fc_drop)進(jìn)行cls,offsets以及angle vector的預(yù)測(cè)(fusion默認(rèn)采用early-fusion:即先進(jìn)行fusion再進(jìn)入之后網(wǎng)絡(luò)層)。生成prediction之后,會(huì)解碼gt投影到bev圖上,然后采用同樣的策略生成mini-batch和top-anchor(bev上進(jìn)行的nms),并且生成對(duì)應(yīng)的objecness,offset,angle的loss。mini-batch的loss作為train過(guò)程中進(jìn)行模型訓(xùn)練,后者生成最終的預(yù)測(cè),但是loss好像并沒(méi)有使用。其中,offset的loss需要轉(zhuǎn)化到3d box上去計(jì)算(論文提出的box_4c計(jì)算方式)。相關(guān)代碼如下:
#avod_model.py:123 deleted code for summary def build(self):rpn_model = self._rpn_model# Share the same prediction dict as RPNprediction_dict = rpn_model.build()top_anchors = prediction_dict[RpnModel.PRED_TOP_ANCHORS]ground_plane = rpn_model.placeholders[RpnModel.PL_GROUND_PLANE]class_labels = rpn_model.placeholders[RpnModel.PL_LABEL_CLASSES]with tf.variable_scope('avod_projection'):if self._config.expand_proposals_xz > 0.0:expand_length = self._config.expand_proposals_xz# Expand anchors along x and zwith tf.variable_scope('expand_xz'):expanded_dim_x = top_anchors[:, 3] + expand_lengthexpanded_dim_z = top_anchors[:, 5] + expand_lengthexpanded_anchors = tf.stack([top_anchors[:, 0],top_anchors[:, 1],top_anchors[:, 2],expanded_dim_x,top_anchors[:, 4],expanded_dim_z], axis=1)avod_projection_in = expanded_anchorselse:avod_projection_in = top_anchorswith tf.variable_scope('bev'):# Project top anchors into bev and image spacesbev_proposal_boxes, bev_proposal_boxes_norm = \anchor_projector.project_to_bev(avod_projection_in,self.dataset.kitti_utils.bev_extents)# Reorder projected boxes into [y1, x1, y2, x2]bev_proposal_boxes_tf_order = \anchor_projector.reorder_projected_boxes(bev_proposal_boxes)bev_proposal_boxes_norm_tf_order = \anchor_projector.reorder_projected_boxes(bev_proposal_boxes_norm)with tf.variable_scope('img'):image_shape = tf.cast(tf.shape(rpn_model.placeholders[RpnModel.PL_IMG_INPUT])[0:2],tf.float32)img_proposal_boxes, img_proposal_boxes_norm = \anchor_projector.tf_project_to_image_space(avod_projection_in,rpn_model.placeholders[RpnModel.PL_CALIB_P2],image_shape)# Only reorder the normalized imgimg_proposal_boxes_norm_tf_order = \anchor_projector.reorder_projected_boxes(img_proposal_boxes_norm)bev_feature_maps = rpn_model.bev_feature_mapsimg_feature_maps = rpn_model.img_feature_mapsif not (self._path_drop_probabilities[0] ==self._path_drop_probabilities[1] == 1.0):with tf.variable_scope('avod_path_drop'):img_mask = rpn_model.img_path_drop_maskbev_mask = rpn_model.bev_path_drop_maskimg_feature_maps = tf.multiply(img_feature_maps,img_mask)bev_feature_maps = tf.multiply(bev_feature_maps,bev_mask)else:bev_mask = tf.constant(1.0)img_mask = tf.constant(1.0)# ROI Poolingwith tf.variable_scope('avod_roi_pooling'):def get_box_indices(boxes):proposals_shape = boxes.get_shape().as_list()if any(dim is None for dim in proposals_shape):proposals_shape = tf.shape(boxes)ones_mat = tf.ones(proposals_shape[:2], dtype=tf.int32)multiplier = tf.expand_dims(tf.range(start=0, limit=proposals_shape[0]), 1)return tf.reshape(ones_mat * multiplier, [-1])bev_boxes_norm_batches = tf.expand_dims(bev_proposal_boxes_norm, axis=0)# These should be all 0's since there is only 1 imagetf_box_indices = get_box_indices(bev_boxes_norm_batches)# Do ROI Pooling on BEVbev_rois = tf.image.crop_and_resize(bev_feature_maps,bev_proposal_boxes_norm_tf_order,tf_box_indices,self._proposal_roi_crop_size,name='bev_rois')# Do ROI Pooling on imageimg_rois = tf.image.crop_and_resize(img_feature_maps,img_proposal_boxes_norm_tf_order,tf_box_indices,self._proposal_roi_crop_size,name='img_rois')# Fully connected layers (Box Predictor)avod_layers_config = self.model_config.layers_config.avod_configfc_output_layers = \avod_fc_layers_builder.build(layers_config=avod_layers_config,input_rois=[bev_rois, img_rois],input_weights=[bev_mask, img_mask],num_final_classes=self._num_final_classes,box_rep=self._box_rep,top_anchors=top_anchors,ground_plane=ground_plane,is_training=self._is_training)all_cls_logits = 
\fc_output_layers[avod_fc_layers_builder.KEY_CLS_LOGITS]all_offsets = fc_output_layers[avod_fc_layers_builder.KEY_OFFSETS]# This may be Noneall_angle_vectors = \fc_output_layers.get(avod_fc_layers_builder.KEY_ANGLE_VECTORS)with tf.variable_scope('softmax'):all_cls_softmax = tf.nn.softmax(all_cls_logits)####################################################### Subsample mini_batch for the loss function####################################################### Get the ground truth tensorsanchors_gt = rpn_model.placeholders[RpnModel.PL_LABEL_ANCHORS]if self._box_rep in ['box_3d', 'box_4ca']:boxes_3d_gt = rpn_model.placeholders[RpnModel.PL_LABEL_BOXES_3D]orientations_gt = boxes_3d_gt[:, 6]elif self._box_rep in ['box_8c', 'box_8co', 'box_4c']:boxes_3d_gt = rpn_model.placeholders[RpnModel.PL_LABEL_BOXES_3D]else:raise NotImplementedError('Ground truth tensors not implemented')# Project anchor_gts to 2D bevwith tf.variable_scope('avod_gt_projection'):bev_anchor_boxes_gt, _ = anchor_projector.project_to_bev(anchors_gt, self.dataset.kitti_utils.bev_extents)bev_anchor_boxes_gt_tf_order = \anchor_projector.reorder_projected_boxes(bev_anchor_boxes_gt)with tf.variable_scope('avod_box_list'):# Convert to box_list formatanchor_box_list_gt = box_list.BoxList(bev_anchor_boxes_gt_tf_order)anchor_box_list = box_list.BoxList(bev_proposal_boxes_tf_order)#得到minibatch的mask,label index和對(duì)應(yīng)的匹配到的gt indexmb_mask, mb_class_label_indices, mb_gt_indices = \self.sample_mini_batch(anchor_box_list_gt=anchor_box_list_gt,anchor_box_list=anchor_box_list,class_labels=class_labels)# Create classification one_hot vectorwith tf.variable_scope('avod_one_hot_classes'):mb_classification_gt = tf.one_hot(mb_class_label_indices,depth=self._num_final_classes,on_value=1.0 - self._config.label_smoothing_epsilon,off_value=(self._config.label_smoothing_epsilon /self.dataset.num_classes))# TODO: Don't create a mini batch in test mode# Mask predictionswith tf.variable_scope('avod_apply_mb_mask'):# Classificationmb_classifications_logits = tf.boolean_mask(all_cls_logits, mb_mask)mb_classifications_softmax = tf.boolean_mask(all_cls_softmax, mb_mask)# Offsetsmb_offsets = tf.boolean_mask(all_offsets, mb_mask)# Angle Vectorsif all_angle_vectors is not None:mb_angle_vectors = tf.boolean_mask(all_angle_vectors, mb_mask)else:mb_angle_vectors = None# Encode anchor offsetswith tf.variable_scope('avod_encode_mb_anchors'):mb_anchors = tf.boolean_mask(top_anchors, mb_mask)if self._box_rep == 'box_3d':# Gather corresponding ground truth anchors for each mb samplemb_anchors_gt = tf.gather(anchors_gt, mb_gt_indices)mb_offsets_gt = anchor_encoder.tf_anchor_to_offset(mb_anchors, mb_anchors_gt)# Gather corresponding ground truth orientation for each# mb samplemb_orientations_gt = tf.gather(orientations_gt,mb_gt_indices)elif self._box_rep in ['box_8c', 'box_8co']:# Get boxes_3d ground truth mini-batch and convert to box_8cmb_boxes_3d_gt = tf.gather(boxes_3d_gt, mb_gt_indices)if self._box_rep == 'box_8c':mb_boxes_8c_gt = \box_8c_encoder.tf_box_3d_to_box_8c(mb_boxes_3d_gt)elif self._box_rep == 'box_8co':mb_boxes_8c_gt = \box_8c_encoder.tf_box_3d_to_box_8co(mb_boxes_3d_gt)# Convert proposals: anchors -> box_3d -> box8cproposal_boxes_3d = \box_3d_encoder.anchors_to_box_3d(top_anchors, fix_lw=True)proposal_boxes_8c = \box_8c_encoder.tf_box_3d_to_box_8c(proposal_boxes_3d)# Get mini batch offsetsmb_boxes_8c = tf.boolean_mask(proposal_boxes_8c, mb_mask)mb_offsets_gt = box_8c_encoder.tf_box_8c_to_offsets(mb_boxes_8c, mb_boxes_8c_gt)# Flatten the offsets to a (N x 
24) vectormb_offsets_gt = tf.reshape(mb_offsets_gt, [-1, 24])elif self._box_rep in ['box_4c', 'box_4ca']:# Get ground plane for box_4c conversionground_plane = self._rpn_model.placeholders[self._rpn_model.PL_GROUND_PLANE]# Convert gt boxes_3d -> box_4cmb_boxes_3d_gt = tf.gather(boxes_3d_gt, mb_gt_indices)mb_boxes_4c_gt = box_4c_encoder.tf_box_3d_to_box_4c(mb_boxes_3d_gt, ground_plane)# Convert proposals: anchors -> box_3d -> box_4cproposal_boxes_3d = \box_3d_encoder.anchors_to_box_3d(top_anchors, fix_lw=True)proposal_boxes_4c = \box_4c_encoder.tf_box_3d_to_box_4c(proposal_boxes_3d,ground_plane)# Get mini batchmb_boxes_4c = tf.boolean_mask(proposal_boxes_4c, mb_mask)mb_offsets_gt = box_4c_encoder.tf_box_4c_to_offsets(mb_boxes_4c, mb_boxes_4c_gt)if self._box_rep == 'box_4ca':# Gather corresponding ground truth orientation for each# mb samplemb_orientations_gt = tf.gather(orientations_gt,mb_gt_indices)else:raise NotImplementedError('Anchor encoding not implemented for', self._box_rep)####################################################### Final Predictions####################################################### Get orientations from angle vectorsif all_angle_vectors is not None:with tf.variable_scope('avod_orientation'):all_orientations = \orientation_encoder.tf_angle_vector_to_orientation(all_angle_vectors)# Apply offsets to regress proposalswith tf.variable_scope('avod_regression'):if self._box_rep == 'box_3d':prediction_anchors = \anchor_encoder.offset_to_anchor(top_anchors,all_offsets)elif self._box_rep in ['box_8c', 'box_8co']:# Reshape the 24-dim regressed offsets to (N x 3 x 8)reshaped_offsets = tf.reshape(all_offsets,[-1, 3, 8])# Given the offsets, get the boxes_8cprediction_boxes_8c = \box_8c_encoder.tf_offsets_to_box_8c(proposal_boxes_8c,reshaped_offsets)# Convert corners back to box3Dprediction_boxes_3d = \box_8c_encoder.box_8c_to_box_3d(prediction_boxes_8c)# Convert the box_3d to anchor format for nmsprediction_anchors = \box_3d_encoder.tf_box_3d_to_anchor(prediction_boxes_3d)elif self._box_rep in ['box_4c', 'box_4ca']:# Convert predictions box_4c -> box_3dprediction_boxes_4c = \box_4c_encoder.tf_offsets_to_box_4c(proposal_boxes_4c,all_offsets)prediction_boxes_3d = \box_4c_encoder.tf_box_4c_to_box_3d(prediction_boxes_4c,ground_plane)# Convert to anchor format for nmsprediction_anchors = \box_3d_encoder.tf_box_3d_to_anchor(prediction_boxes_3d)else:raise NotImplementedError('Regression not implemented for',self._box_rep)# Apply Non-oriented NMS in BEVwith tf.variable_scope('avod_nms'):bev_extents = self.dataset.kitti_utils.bev_extentswith tf.variable_scope('bev_projection'):# Project predictions into BEVavod_bev_boxes, _ = anchor_projector.project_to_bev(prediction_anchors, bev_extents)avod_bev_boxes_tf_order = \anchor_projector.reorder_projected_boxes(avod_bev_boxes)# Get top score from second column onwardall_top_scores = tf.reduce_max(all_cls_logits[:, 1:], axis=1)# Apply NMS in BEVnms_indices = tf.image.non_max_suppression(avod_bev_boxes_tf_order,all_top_scores,max_output_size=self._nms_size,iou_threshold=self._nms_iou_threshold)# Gather predictions from NMS indicestop_classification_logits = tf.gather(all_cls_logits,nms_indices)top_classification_softmax = tf.gather(all_cls_softmax,nms_indices)top_prediction_anchors = tf.gather(prediction_anchors,nms_indices)if self._box_rep == 'box_3d':top_orientations = tf.gather(all_orientations, nms_indices)elif self._box_rep in ['box_8c', 'box_8co']:top_prediction_boxes_3d = tf.gather(prediction_boxes_3d, nms_indices)top_prediction_boxes_8c 
= tf.gather(prediction_boxes_8c, nms_indices)elif self._box_rep == 'box_4c':top_prediction_boxes_3d = tf.gather(prediction_boxes_3d, nms_indices)top_prediction_boxes_4c = tf.gather(prediction_boxes_4c, nms_indices)elif self._box_rep == 'box_4ca':top_prediction_boxes_3d = tf.gather(prediction_boxes_3d, nms_indices)top_prediction_boxes_4c = tf.gather(prediction_boxes_4c, nms_indices)top_orientations = tf.gather(all_orientations, nms_indices)else:raise NotImplementedError('NMS gather not implemented for',self._box_rep)if self._train_val_test in ['train', 'val']:# Additional entries are added to the shared prediction_dict# Mini batch predictionsprediction_dict[self.PRED_MB_CLASSIFICATION_LOGITS] = \mb_classifications_logitsprediction_dict[self.PRED_MB_CLASSIFICATION_SOFTMAX] = \mb_classifications_softmaxprediction_dict[self.PRED_MB_OFFSETS] = mb_offsets# Mini batch ground truthprediction_dict[self.PRED_MB_CLASSIFICATIONS_GT] = \mb_classification_gtprediction_dict[self.PRED_MB_OFFSETS_GT] = mb_offsets_gt# Top NMS predictionsprediction_dict[self.PRED_TOP_CLASSIFICATION_LOGITS] = \top_classification_logitsprediction_dict[self.PRED_TOP_CLASSIFICATION_SOFTMAX] = \top_classification_softmaxprediction_dict[self.PRED_TOP_PREDICTION_ANCHORS] = \top_prediction_anchors# Mini batch predictions (for debugging)prediction_dict[self.PRED_MB_MASK] = mb_mask# prediction_dict[self.PRED_MB_POS_MASK] = mb_pos_maskprediction_dict[self.PRED_MB_CLASS_INDICES_GT] = \mb_class_label_indices# All predictions (for debugging)prediction_dict[self.PRED_ALL_CLASSIFICATIONS] = \all_cls_logitsprediction_dict[self.PRED_ALL_OFFSETS] = all_offsets# Path drop masks (for debugging)prediction_dict['bev_mask'] = bev_maskprediction_dict['img_mask'] = img_maskelse:# self._train_val_test == 'test'prediction_dict[self.PRED_TOP_CLASSIFICATION_SOFTMAX] = \top_classification_softmaxprediction_dict[self.PRED_TOP_PREDICTION_ANCHORS] = \top_prediction_anchorsif self._box_rep == 'box_3d':prediction_dict[self.PRED_MB_ANCHORS_GT] = mb_anchors_gtprediction_dict[self.PRED_MB_ORIENTATIONS_GT] = mb_orientations_gtprediction_dict[self.PRED_MB_ANGLE_VECTORS] = mb_angle_vectorsprediction_dict[self.PRED_TOP_ORIENTATIONS] = top_orientations# For debuggingprediction_dict[self.PRED_ALL_ANGLE_VECTORS] = all_angle_vectorselif self._box_rep in ['box_8c', 'box_8co']:prediction_dict[self.PRED_TOP_PREDICTION_BOXES_3D] = \top_prediction_boxes_3d# Store the corners before converting for visualization purposesprediction_dict[self.PRED_TOP_BOXES_8C] = top_prediction_boxes_8celif self._box_rep == 'box_4c':prediction_dict[self.PRED_TOP_PREDICTION_BOXES_3D] = \top_prediction_boxes_3dprediction_dict[self.PRED_TOP_BOXES_4C] = top_prediction_boxes_4celif self._box_rep == 'box_4ca':if self._train_val_test in ['train', 'val']:prediction_dict[self.PRED_MB_ORIENTATIONS_GT] = \mb_orientations_gtprediction_dict[self.PRED_MB_ANGLE_VECTORS] = mb_angle_vectorsprediction_dict[self.PRED_TOP_PREDICTION_BOXES_3D] = \top_prediction_boxes_3dprediction_dict[self.PRED_TOP_BOXES_4C] = top_prediction_boxes_4cprediction_dict[self.PRED_TOP_ORIENTATIONS] = top_orientationselse:raise NotImplementedError('Prediction dict not implemented for',self._box_rep)# prediction_dict[self.PRED_MAX_IOUS] = max_ious# prediction_dict[self.PRED_ALL_IOUS] = all_iousreturn prediction_dict總結(jié)