SSD (Single Shot MultiBox Detector): Key Source Code Analysis
1. Multi-scale feature-map detection network structure;
2. Anchor box generation;
3. Ground truth preprocessing;
4. Objective function;
5. Summary
<img src="https://pic2.zhimg.com/v2-d0252b7d1408105470b88ceb45054725_b.png" data-rawwidth="1031" data-rawheight="686" class="origin_image zh-lightbox-thumb" width="1031" data-original="https://pic2.zhimg.com/v2-d0252b7d1408105470b88ceb45054725_r.png">圖0-1 SSD與MultiBox,Faster R-CNN,YOLO原理(此圖來(lái)源于作者在eccv2016的PPT)
<img src="https://pic2.zhimg.com/v2-0213e22e8b0d96f8854e82d796c83a71_b.png" class="content_image">圖0-2 SSD檢測(cè)速度與精確度。(此圖來(lái)源于作者在eccv2016的PPT)
1 Multi-scale feature-map detection network structure
The SSD network model is shown in Figure 1-1.
Figure 1-1: SSD model structure (from the original paper)
The model-building source code lives in ssd_vgg_300.py. Multi-scale feature-map detection is illustrated in Figure 1-2. The feature maps the model uses are: 38×38 (block4), 19×19 (block7), 10×10 (block8), 5×5 (block9), 3×3 (block10), and 1×1 (block11). On each feature map, a 3×3 convolution predicts, for every default box, four location offsets and 21 class confidences. For example, block7 has 6 default boxes per location, each with 4 offsets and 21 class scores (4+21), so its final output has shape 19×19×6×(4+21).
<img src="https://pic1.zhimg.com/v2-5964f6dff6dbbd435336cde9e5dfc988_b.png" class="content_image">圖1-2 多尺度特征采樣(此圖來(lái)源:知乎專欄)
The default initialization parameters are as follows:
""" Implementation of the SSD VGG-based 300 network. The default features layers with 300x300 image input are: conv4 ==> 38 x 38 conv7 ==> 19 x 19 conv8 ==> 10 x 10 conv9 ==> 5 x 5 conv10 ==> 3 x 3 conv11 ==> 1 x 1 The default image size used to train this network is 300x300. """default_params = SSDParams(img_shape=(300, 300),#輸入尺寸num_classes=21,#預(yù)測(cè)類別20+1=21(20類加背景)#獲取feature map層feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],anchor_size_bounds=[0.15, 0.90],#anchor boxes的大小anchor_sizes=[(21., 45.),(45., 99.),(99., 153.),(153., 207.),(207., 261.),(261., 315.)],#anchor boxes的aspect ratiosanchor_ratios=[[2, .5],[2, .5, 3, 1./3],[2, .5, 3, 1./3],[2, .5, 3, 1./3],[2, .5],[2, .5]],anchor_steps=[8, 16, 32, 64, 100, 300],#anchor的層anchor_offset=0.5,#補(bǔ)償閥值0.5normalizations=[20, -1, -1, -1, -1, -1],#該特征層是否正則,大于零即正則;小于零則否prior_scaling=[0.1, 0.1, 0.2, 0.2])建立模型代碼如下,作者采用了TensorFlow-Slim(類似于keras的高層庫(kù))來(lái)建立網(wǎng)絡(luò)模型,詳細(xì)內(nèi)容可以參考TensorFlow-Slim網(wǎng)頁(yè)。
```python
# Build the SSD network.
def ssd_net(inputs,
            num_classes=21,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    """SSD net definition."""
    # end_points collects the output of every block for external use.
    end_points = {}
    # Build the VGG backbone with slim; see the structure diagram above.
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # Original VGG-16 blocks.
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')   # max pool

        # Additional SSD blocks.
        # Block 6: dilated 3x3 convolution; output shape 19x19x1024.
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        # Block 7: 1x1 convolution.
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        end_points['block7'] = net

        # Blocks 8/9/10/11: 1x1 and 3x3 convolutions, stride 2 (except the last ones).
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3')
        end_points[end_point] = net
        end_point = 'block9'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3')
        end_points[end_point] = net
        end_point = 'block10'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block11'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net

        # Prediction and localisation layers.
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                # Take the output of a feature layer and produce class and
                # location predictions.
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          anchor_ratios[i],
                                          normalizations[i])
            predictions.append(prediction_fn(p))   # prediction_fn is softmax: class probabilities
            logits.append(p)                       # raw class logits
            localisations.append(l)                # predicted box offsets
        return predictions, localisations, logits, end_points
```
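ssd_net calls ssd_multibox_layer, which is not reproduced here. Below is a minimal sketch of what such a layer does, based on the description in Section 1 (a 3×3 convolution predicting 4 offsets and num_classes scores for each default box). The function name and exact reshape are illustrative assumptions; the real implementation in ssd_vgg_300.py additionally applies optional L2 normalization and its own reshape helper.

```python
def ssd_multibox_layer_sketch(net, num_classes, sizes, ratios, normalization=-1):
    """Illustrative only: class scores and box offsets for one feature layer."""
    num_anchors = len(sizes) + len(ratios)   # default boxes per location
    # Class prediction: num_anchors * num_classes channels, 3x3 conv, no activation.
    cls_pred = slim.conv2d(net, num_anchors * num_classes, [3, 3],
                           activation_fn=None, scope='conv_cls')
    # Location prediction: num_anchors * 4 offset channels.
    loc_pred = slim.conv2d(net, num_anchors * 4, [3, 3],
                           activation_fn=None, scope='conv_loc')
    # Reshape to (batch, H, W, num_anchors, num_classes) and (batch, H, W, num_anchors, 4).
    batch, h, w = tf.shape(net)[0], tf.shape(net)[1], tf.shape(net)[2]
    cls_pred = tf.reshape(cls_pred, [batch, h, w, num_anchors, num_classes])
    loc_pred = tf.reshape(loc_pred, [batch, h, w, num_anchors, 4])
    return cls_pred, loc_pred
```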
2 Anchor box generation
For each feature map, k default boxes are generated at different scales and aspect ratios, as illustrated in Figure 2-1 (in that figure k = 6 and the 5×5 grid of red points represents the feature map, giving 5×5×6 = 150 boxes).
The scale of the default boxes for the k-th feature map is

$$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps, $s_{\min}$ is the scale for the lowest feature map (0.2 in the original paper, 0.15 in the code), and $s_{\max}$ is the scale for the topmost feature map (0.9 in both the paper and the code).
The width and height of each default box are computed from a set of aspect ratios; in the paper the ratios are $a_r \in \{1, 2, 3, 1/2, 1/3\}$, so each default box has width $w_k^a = s_k\sqrt{a_r}$ and height $h_k^a = s_k/\sqrt{a_r}$. For aspect ratio 1, an extra default box with scale $s_k' = \sqrt{s_k s_{k+1}}$ is added. In total, each location of a feature map generates up to 6 default boxes (some layers use only 4, as set by anchor_ratios in the code). The center of each default box is $\left(\frac{i+0.5}{|f_k|}, \frac{j+0.5}{|f_k|}\right)$, where $|f_k|$ is the size of the k-th feature map.
<img src="https://pic4.zhimg.com/v2-e128c01e26456fa24502e2c05bf46e1b_b.png" class="content_image"> <img src="https://pic3.zhimg.com/v2-e6f0dd799661fff724853435b976a82e_b.png" class="content_image"> <img src="https://pic3.zhimg.com/v2-64a521f37e62fe79c9b5d11746eb6686_b.png" class="content_image">
圖2-1 anchor box生成示意圖(此圖來(lái)源于知乎專欄)
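As a concrete check of these formulas, the following small sketch computes the relative box sizes for block4, mirroring the loop in the anchor code below; the numbers assume the code's anchor_sizes of (21, 45) and ratios [2, 0.5] on a 300×300 input:

```python
import math

img_size = 300.
sizes = (21., 45.)   # anchor_sizes for block4 in the code
ratios = [2, .5]     # extra aspect ratios for block4

# Ratio-1 box.
h = [sizes[0] / img_size]
w = [sizes[0] / img_size]
# Extra ratio-1 box with scale sqrt(s_k * s_{k+1}).
h.append(math.sqrt(sizes[0] * sizes[1]) / img_size)
w.append(math.sqrt(sizes[0] * sizes[1]) / img_size)
# Remaining aspect ratios: w = s_k * sqrt(r), h = s_k / sqrt(r).
for r in ratios:
    h.append(sizes[0] / img_size / math.sqrt(r))
    w.append(sizes[0] / img_size * math.sqrt(r))

for hh, ww in zip(h, w):
    print("h = %.3f, w = %.3f" % (hh, ww))   # relative to the 300x300 input
# 4 default boxes per location on block4, i.e. num_anchors = len(sizes) + len(ratios).
```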
In the source code, the default-box generation function for one layer is ssd_anchor_one_layer(); the code is as follows:
```python
# Generate the anchor boxes for one feature layer.
def ssd_anchor_one_layer(img_shape,    # original image shape
                         feat_shape,   # feature map shape
                         sizes,        # reference box sizes
                         ratios,       # aspect ratios
                         step,         # stride of this anchor layer
                         offset=0.5,
                         dtype=np.float32):
    """Compute SSD default anchor boxes for one feature layer.

    Determine the relative position grid of the centers, and the relative
    width and height.

    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the former;
      offset: Grid offset.
    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    """
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    #
    # Example parameters used in the walkthrough below:
    #   feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
    #   anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
    #                   (153., 207.), (207., 261.), (261., 315.)]
    #   anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
    #                    [2, .5, 3, 1./3], [2, .5], [2, .5]]
    #   anchor_steps = [8, 16, 32, 64, 100, 300]
    #   offset = 0.5, dtype = np.float32
    #   feat_shape = feat_shapes[0], step = anchor_steps[0]
    #
    # With these values, y and x both have shape (38, 38); y is
    #   array([[ 0,  0,  0, ...,  0,  0,  0],
    #          [ 1,  1,  1, ...,  1,  1,  1],
    #          ...,
    #          [37, 37, 37, ..., 37, 37, 37]])
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # With the test values: y = (y + 0.5) * 8 / 300, x = (x + 0.5) * 8 / 300.
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]
    # Expand dims to support easy broadcasting: shape becomes (38, 38, 1).
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)

    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    num_anchors = len(sizes) + len(ratios)   # 2 + 2 for block4
    h = np.zeros((num_anchors, ), dtype=dtype)   # shape (4,)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add the first anchor box with ratio = 1.
    h[0] = sizes[0] / img_shape[0]   # 21 / 300 in the test
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        # Extra ratio-1 box: h[1] = sqrt(21 * 45) / 300.
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    # y and x have shape (38, 38, 1); h and w have shape (4,).
    return y, x, h, w
```
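A small usage sketch, calling the function above with the block4 parameters (assuming ssd_anchor_one_layer is importable from the module shown above):

```python
import numpy as np

y, x, h, w = ssd_anchor_one_layer(img_shape=(300, 300),
                                  feat_shape=(38, 38),
                                  sizes=(21., 45.),
                                  ratios=[2, .5],
                                  step=8,
                                  offset=0.5)
print(y.shape, x.shape, h.shape, w.shape)   # (38, 38, 1) (38, 38, 1) (4,) (4,)
print(y[0, 0, 0], x[0, 0, 0])               # (0 + 0.5) * 8 / 300 ~= 0.0133 for both
print(h, w)                                  # 4 relative heights/widths per location
```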
3 Ground truth preprocessing
During training, the label information (ground truth boxes and ground truth categories) must first be preprocessed and matched to the corresponding default boxes. Matching is based on the Jaccard overlap between a default box and a ground truth box: default boxes whose Jaccard overlap exceeds 0.5 are taken as positive samples, the rest as negatives. A tiny numeric illustration of this matching rule is given below.
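A minimal NumPy sketch of the Jaccard-overlap matching rule; the boxes are made-up coordinates in the [ymin, xmin, ymax, xmax] format used by the encoder below:

```python
import numpy as np

def jaccard(box_a, box_b):
    """Jaccard overlap (IoU) of two boxes in [ymin, xmin, ymax, xmax] format."""
    int_ymin = max(box_a[0], box_b[0])
    int_xmin = max(box_a[1], box_b[1])
    int_ymax = min(box_a[2], box_b[2])
    int_xmax = min(box_a[3], box_b[3])
    inter = max(int_ymax - int_ymin, 0.) * max(int_xmax - int_xmin, 0.)
    vol_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    vol_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (vol_a + vol_b - inter)

gt_box = np.array([0.2, 0.2, 0.6, 0.6])       # hypothetical ground truth box
anchor = np.array([0.25, 0.25, 0.65, 0.65])   # hypothetical default box
print(jaccard(anchor, gt_box))   # ~0.62 > 0.5, so this default box is a positive sample
```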
The ground-truth preprocessing code is located in ssd_common.py; the key code is as follows:
```python
# Encode labels and bounding boxes for one anchor layer.
def tf_ssd_bboxes_encode_layer(labels,           # ground truth labels, 1D tensor
                               bboxes,           # Nx4 tensor (float)
                               anchors_layer,    # anchors for this layer
                               matching_threshold=0.5,               # matching threshold
                               prior_scaling=[0.1, 0.1, 0.2, 0.2],   # coordinate scaling
                               dtype=tf.float32):
    """Encode groundtruth labels and bounding boxes using SSD anchors from one layer.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.
    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    # For block4, (yref, xref, href, wref) have shapes
    # ((38, 38, 1), (38, 38, 1), (4,), (4,)), so xmax broadcasts to (38, 38, 4).
    vol_anchors = (xmax - xmin) * (ymax - ymin)   # anchor areas

    # Initialize tensors... shape is (38, 38, 4) for block4.
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = tf.zeros(shape, dtype=tf.int64)
    feat_scores = tf.zeros(shape, dtype=dtype)
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    # Jaccard overlap between one bbox and all anchors.
    def jaccard_with_anchors(bbox):
        """Compute jaccard score between a box and the anchors."""
        # Intersection bbox and volume.
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        jaccard = tf.div(inter_vol, union_vol)
        return jaccard

    # Loop condition: check the label index.
    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index."""
        # tf.less returns the truth value of (x < y) element-wise.
        r = tf.less(i, tf.shape(labels))
        return r[0]

    # Loop body.
    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]
        bbox = bboxes[i]
        scores = jaccard_with_anchors(bbox)   # jaccard overlap with every anchor
        # 'Boolean' mask: overlap above the threshold AND better than the best
        # score recorded so far (tf.greater is element-wise >).
        mask = tf.logical_and(tf.greater(scores, matching_threshold),
                              tf.greater(scores, feat_scores))
        imask = tf.cast(mask, tf.int64)
        fmask = tf.cast(mask, dtype)
        # Update values using the mask.
        feat_labels = imask * label + (1 - imask) * feat_labels
        feat_scores = tf.select(mask, scores, feat_scores)
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]

    # Main loop definition.
    i = 0
    [i, feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features relative to the anchors.
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    return feat_labels, feat_localizations, feat_scores


# Encode ground truth for all feature layers.
def tf_ssd_bboxes_encode(labels,            # ground truth labels, 1D tensor
                         bboxes,            # Nx4 tensor (float)
                         anchors,           # list of layer anchors
                         matching_threshold=0.5,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    """Encode groundtruth labels and bounding boxes using SSD net anchors.
    Encoding boxes for all feature layers.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors: List of Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.
    Return:
      (target_labels, target_localizations, target_scores):
        Each element is a list of target Tensors.
    """
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                # Encode labels and boxes against this layer's anchors.
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               matching_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores


# SSDNet method: encode the ground-truth labels and boxes.
def bboxes_encode(self, labels, bboxes, anchors,
                  scope='ssd_bboxes_encode'):
    """Encode labels and bounding boxes."""
    return ssd_common.tf_ssd_bboxes_encode(
        labels, bboxes, anchors,
        matching_threshold=0.5,
        prior_scaling=self.params.prior_scaling,
        scope=scope)
```
4 Objective function
The SSD objective has two parts: a localization loss (loc) over the matched default boxes and a confidence loss (conf) over the class predictions. Let $x_{ij}^p \in \{0, 1\}$ indicate that the i-th default box is matched to the j-th ground truth box of category p. The objective is defined as:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of matched default boxes (if N = 0, the loss is set to 0), $L_{loc}$ is a Smooth L1 loss between the predicted boxes and the ground truth boxes, and the weight $\alpha$ is set to 1 by cross validation. $L_{loc}$ is defined as:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

where l is the predicted box, g is the ground truth box, and $\hat{g}$ is the ground truth regressed to offsets relative to the default box d with center $(d^{cx}, d^{cy})$, width $d^w$, and height $d^h$. $L_{conf}$ is the multi-class softmax loss:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log(\hat{c}_i^p) - \sum_{i \in Neg} \log(\hat{c}_i^0), \quad \text{where } \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$$
The loss function is defined in ssd_vgg_300.py, annotated below:

```python
# =========================================================================== #
# SSD loss function.
# =========================================================================== #
def ssd_losses(logits,           # predicted class logits
               localisations,    # predicted locations
               gclasses,         # ground truth classes
               glocalisations,   # ground truth locations
               gscores,          # ground truth scores
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               scope='ssd_losses'):
    """Loss functions for training the SSD 300 VGG network.

    This function defines the different loss components of the SSD, and
    adds them to the TF loss collection.

    Arguments:
      logits: (list of) predictions logits Tensors;
      localisations: (list of) localisations Tensors;
      gclasses: (list of) groundtruth labels Tensors;
      glocalisations: (list of) groundtruth localisations Tensors;
      gscores: (list of) groundtruth score Tensors;
    """
    # Some debugging...
    # for i in range(len(gclasses)):
    #     print(localisations[i].get_shape())
    #     print(logits[i].get_shape())
    #     print(gclasses[i].get_shape())
    #     print(glocalisations[i].get_shape())
    #     print()
    with tf.name_scope(scope):
        l_cross = []
        l_loc = []
        for i in range(len(logits)):
            with tf.name_scope('block_%i' % i):
                # Determine the weights Tensor.
                pmask = tf.cast(gclasses[i] > 0, logits[i].dtype)
                n_positives = tf.reduce_sum(pmask)   # number of positive samples
                # np.prod returns the product of array elements over a given axis.
                n_entries = np.prod(gclasses[i].get_shape().as_list())
                # r_positive = n_positives / n_entries
                # Select some random negative entries.
                r_negative = negative_ratio * n_positives / (n_entries - n_positives)
                nmask = tf.random_uniform(gclasses[i].get_shape(),
                                          dtype=logits[i].dtype)
                nmask = nmask * (1. - pmask)
                nmask = tf.cast(nmask > 1. - r_negative, logits[i].dtype)

                # Add cross-entropy loss.
                with tf.name_scope('cross_entropy'):
                    # Weights Tensor: positive mask + random negatives.
                    weights = pmask + nmask
                    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits[i],
                                                                          gclasses[i])
                    loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                    l_cross.append(loss)

                # Add localization loss: smooth L1, L2, ...
                with tf.name_scope('localization'):
                    # Weights Tensor: alpha * positive mask.
                    weights = alpha * pmask
                    loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
                    loss = tf.contrib.losses.compute_weighted_loss(loss, weights)
                    l_loc.append(loss)

        # Total losses in summaries...
        with tf.name_scope('total'):
            tf.summary.scalar('cross_entropy', tf.add_n(l_cross))
            tf.summary.scalar('localization', tf.add_n(l_loc))
```
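Note that this version of ssd_losses picks negatives at random so that roughly negative_ratio times as many negatives as positives contribute to the cross-entropy term, whereas the paper uses hard negative mining (keeping the highest-confidence-loss negatives at a ratio of at most 3:1). A minimal NumPy sketch of the random 3:1 selection performed above, with made-up class labels:

```python
import numpy as np

np.random.seed(0)
negative_ratio = 3.

# Hypothetical flattened ground-truth classes for one layer: 0 = background.
gclasses = np.array([0, 7, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0])
pmask = (gclasses > 0).astype(np.float32)
n_positives = pmask.sum()        # 2 positives
n_entries = gclasses.size

# Keep each negative with probability r_negative, so about 3 negatives survive per positive.
r_negative = negative_ratio * n_positives / (n_entries - n_positives)
nmask = np.random.uniform(size=gclasses.shape)
nmask = nmask * (1. - pmask)
nmask = (nmask > 1. - r_negative).astype(np.float32)

weights = pmask + nmask          # samples that contribute to the cross-entropy loss
print(int(n_positives), int(nmask.sum()), weights)
```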
5 Summary
This article has walked through the key TensorFlow source code of SSD: Single Shot MultiBox Detector, using the implementation from balancap/SSD-Tensorflow. The author's code is very thorough and covers much more (image preprocessing, multi-GPU training, and so on), so only the key parts are analyzed here. After reading the paper, going through this key code makes the overall structure quite clear. The essential pieces of the SSD implementation are: (1) the multi-scale feature-map detection network structure; (2) anchor box generation; (3) ground truth preprocessing; (4) the objective function. Like YOLOv2, SSD achieves real-time object detection with high accuracy and is well worth studying and improving.