04. Convolutional Neural Networks W3. Object Detection (Assignment: Autonomous Driving - Car Detection)
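Table of Contents
- 1. Problem Background
- 2. YOLO Model
- 2.1 Model Details
- 2.2 Filtering with a Threshold on Class Scores
- 2.3 Non-max Suppression
- 2.4 Wrapping Up the Filtering
- 3. Testing the Pre-trained YOLO Model on Images
- 3.1 Defining Classes, Anchors and Image Shape
- 3.2 Loading the Pre-trained Model
- 3.3 Converting the Model Output into Usable Bounding-Box Tensors
- 3.4 Filtering Boxes
- 3.5 Running on Images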
Quiz: see the reference blog post
Notes: 04. Convolutional Neural Networks W3. Object Detection
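Reference papers:
Redmon et al., 2016, "You Only Look Once: Unified, Real-Time Object Detection" (https://arxiv.org/abs/1506.02640)
Redmon and Farhadi, 2016, "YOLO9000: Better, Faster, Stronger" (https://arxiv.org/abs/1612.08242)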
Import the required packages:
```python
import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline
```
- With `from keras import backend as K`, Keras backend functions can be called as `K.function(...)`.
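1. Problem Background
A camera mounted on a car collected photos while driving on the road. Every photo has been labeled: each car in the photo has a bounding box drawn around it.
Because training a YOLO model is very expensive, we will load pre-trained weights.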
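2. YOLO Model
YOLO (You Only Look Once) is a popular algorithm because it achieves high accuracy while still running in real time.
The algorithm "only looks once" at the image in the sense that it needs just one forward pass through the network to make its predictions.
After non-max suppression, it outputs the recognized objects together with their bounding boxes.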
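2.1 Model Details
- Input: a batch of images of shape (m, 608, 608, 3)
- Output: a list of bounding boxes together with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$. If $c$ is expanded into an 80-dimensional vector (one entry per class, since we recognize 80 classes), each bounding box is represented by 85 numbers.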
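We will use 5 anchor boxes. The encoding works as follows:
If the midpoint of an object falls inside a grid cell, that grid cell is responsible for detecting the object.
On the 19x19 grid, each cell outputs 5 anchor boxes, and each anchor box carries the 85 numbers describing its prediction.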
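The "responsible cell" rule comes down to integer division by the stride. Below is a minimal sketch, assuming a 608x608 input and a 19x19 grid (so each cell covers 608 / 19 = 32 pixels); the function names are hypothetical and not part of the assignment code.

```python
# Which grid cell is responsible for an object, given its midpoint in pixels?
# Assumes a 608x608 input and a 19x19 output grid (stride 608 / 19 = 32 px).
IMAGE_SIZE = 608
GRID_SIZE = 19
STRIDE = IMAGE_SIZE // GRID_SIZE  # 32 pixels per cell


def responsible_cell(x_center, y_center):
    """Return (row, col) of the grid cell that contains the box midpoint."""
    col = min(int(x_center // STRIDE), GRID_SIZE - 1)
    row = min(int(y_center // STRIDE), GRID_SIZE - 1)
    return row, col


def cell_relative_center(x_center, y_center):
    """Midpoint expressed relative to its own cell, each coordinate in [0, 1)."""
    row, col = responsible_cell(x_center, y_center)
    return x_center / STRIDE - col, y_center / STRIDE - row


print(responsible_cell(300.0, 100.0))  # (3, 9)
```

All 5 anchor boxes of that one cell then compete to describe the object; the other 360 cells ignore it.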
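Visualizing the predictions:
- For each of the 19x19 grid cells, find the class with the highest probability score across its 5 boxes.
- Color each grid cell according to that most likely class.
Note that this visualization is not a core part of how the YOLO algorithm itself makes predictions;
it is just a nice way to visualize an intermediate result of the algorithm.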
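The coloring step can be sketched in NumPy. This assumes the per-box score is $p_c \times c_i$ (the same product used in the filtering step later) and uses random arrays in place of real model output; `cell_colors` is a hypothetical helper.

```python
import numpy as np

# Hypothetical model output for one image, already split into parts:
# box_confidence: (19, 19, 5, 1), box_class_probs: (19, 19, 5, 80).
rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))
box_class_probs = rng.random((19, 19, 5, 80))
box_class_probs /= box_class_probs.sum(axis=-1, keepdims=True)


def cell_colors(box_confidence, box_class_probs):
    """Class index used to color each cell: max score over 5 anchors and 80 classes."""
    scores = box_confidence * box_class_probs                      # (19, 19, 5, 80)
    flat = scores.reshape(scores.shape[0], scores.shape[1], -1)    # (19, 19, 400)
    best = flat.argmax(axis=-1)        # flat index = anchor * 80 + class
    return best % scores.shape[-1]     # keep only the class part, shape (19, 19)


classes_per_cell = cell_colors(box_confidence, box_class_probs)
print(classes_per_cell.shape)  # (19, 19)
```

Mapping each class index to a color (for example with `plt.imshow`) gives the grid picture; nothing here affects the boxes the model finally reports.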
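Another visualization:
- Draw the bounding boxes.
There are far too many boxes, so we reduce them to a small number of detections with non-max suppression:
- Discard boxes with low scores.
- When several boxes overlap each other and detect the same object, keep only one of them.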
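2.2 Filtering with a Threshold on Class Scores
Build a filter that discards every box whose "score" is below the chosen threshold.
The model gives you 19x19x5x85 numbers, 85 per box. Split them up to make the following steps easier:
- box_confidence: tensor of shape (19, 19, 5, 1), the confidence $p_c$ that each of the 5 boxes in each cell contains an object
- boxes: tensor of shape (19, 19, 5, 4), the position $(b_x, b_y, b_h, b_w)$ of each of the 5 boxes in each cell
- box_class_probs: tensor of shape (19, 19, 5, 80), the detection probabilities $(c_1, c_2, \dots, c_{80})$ of the 80 classes for each of the 5 boxes in each cell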
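The split is plain slicing along the last axis. A minimal sketch, assuming the channel order $(p_c, b_x, b_y, b_h, b_w, c_1, \dots, c_{80})$ listed in 2.1; the real yad2k output tensor uses its own layout, which `yolo_head` decodes for you, so `split_yolo_output` is only illustrative.

```python
import numpy as np


def split_yolo_output(output, num_classes=80):
    """Split a (19, 19, 5, 5 + num_classes) array into the three filter inputs.

    Assumes the channel order (p_c, b_x, b_y, b_h, b_w, c_1..c_80) from section 2.1.
    """
    box_confidence = output[..., 0:1]                  # (19, 19, 5, 1), slice keeps the axis
    boxes = output[..., 1:5]                           # (19, 19, 5, 4)
    box_class_probs = output[..., 5:5 + num_classes]   # (19, 19, 5, 80)
    return box_confidence, boxes, box_class_probs


output = np.zeros((19, 19, 5, 85))
conf, boxes, probs = split_yolo_output(output)
print(conf.shape, boxes.shape, probs.shape)  # (19, 19, 5, 1) (19, 19, 5, 4) (19, 19, 5, 80)
```

Keeping the trailing axis of size 1 on `box_confidence` is what lets `box_confidence * box_class_probs` broadcast over the 80 classes.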
tf.boolean_mask reference: https://www.tensorflow.org/api_docs/python/tf/boolean_mask
tf.boolean_mask(tensor, mask, axis=None, name='boolean_mask')
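To see what the graded function computes without a TensorFlow session, here is a NumPy sketch of the same four steps. It relies on NumPy boolean indexing `x[mask]` behaving like `tf.boolean_mask(x, mask)` when the mask covers the leading dimensions; `filter_boxes_np` is a hypothetical name, not part of the assignment.

```python
import numpy as np


def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy sketch of yolo_filter_boxes: keep boxes whose best class score >= threshold."""
    box_scores = box_confidence * box_class_probs   # (19, 19, 5, 80), broadcast over classes
    box_classes = box_scores.argmax(axis=-1)        # (19, 19, 5) best class per box
    box_class_scores = box_scores.max(axis=-1)      # (19, 19, 5) score of that class
    mask = box_class_scores >= threshold            # (19, 19, 5) booleans
    # x[mask] flattens the masked leading dims, like tf.boolean_mask(x, mask)
    return box_class_scores[mask], boxes[mask], box_classes[mask]


# One box (cell (0, 0), anchor 1) with p_c = 0.9 and class 3 at 0.8 -> score 0.72
conf = np.zeros((19, 19, 5, 1)); conf[0, 0, 1, 0] = 0.9
probs = np.zeros((19, 19, 5, 80)); probs[0, 0, 1, 3] = 0.8
bx = np.zeros((19, 19, 5, 4)); bx[0, 0, 1] = [0.5, 0.5, 0.2, 0.1]
scores, kept_boxes, classes = filter_boxes_np(conf, bx, probs)
print(scores.shape, kept_boxes.shape, classes)  # (1,) (1, 4) [3]
```

The output length is data dependent, which is why the docstring below describes the shapes as `(None,)` and `(None, 4)`.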
```python
# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes
```
2.3 Non-max Suppression
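Even after filtering, many overlapping boxes remain, so we apply non-max suppression (NMS).
NMS keeps the box with the highest score and uses intersection over union (IoU) to discard the other boxes that overlap it too much.
Non-max suppression steps:
- Step 1: select the box with the highest score.
- Step 2: compute its IoU with every other box and remove the boxes whose IoU with it exceeds iou_threshold.
- Step 3: go back to step 1 and repeat until no candidate boxes remain.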
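The three steps translate into a short greedy loop. This NumPy sketch assumes boxes are given as corners `(y1, x1, y2, x2)`, the order `tf.image.non_max_suppression` expects, and removes a box when its IoU with the selected box is above `iou_threshold`. It is an illustration, not TensorFlow's actual implementation.

```python
import numpy as np


def iou(box1, box2):
    """Intersection over union of two boxes given as (y1, x1, y2, x2) corners."""
    yi1, xi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    yi2, xi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(yi2 - yi1, 0) * max(xi2 - xi1, 0)   # 0 when the boxes do not overlap
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression_np(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Greedy NMS; returns indices of the kept boxes, highest score first."""
    order = list(np.argsort(scores)[::-1])   # candidates sorted by descending score
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)                  # step 1: highest remaining score
        keep.append(best)
        # step 2: drop the candidates that overlap the selected box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return np.array(keep, dtype=int)         # step 3 is the while loop itself


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression_np(boxes, scores))  # [0 2]
```

Box 1 overlaps box 0 with IoU 81 / 119 ≈ 0.68, so it is suppressed; box 2 does not overlap anything and survives.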
TensorFlow's built-in NMS: https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression
tf.gather: https://www.tensorflow.org/api_docs/python/tf/gather
```python
# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')            # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))  # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_output_size=max_boxes_tensor, iou_threshold=iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes
```
2.4 Wrapping Up the Filtering
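Two helper functions:
- boxes = yolo_boxes_to_corners(box_xy, box_wh) converts boxes from center/size form into the two-corner representation
- boxes = scale_boxes(boxes, image_shape) rescales the boxes to the size of the original image, so they can be drawn on images of sizes other than 608x608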
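Both helpers are simple arithmetic. The NumPy sketch below assumes, as in yad2k's `yolo_boxes_to_corners`, that corners come out in `(y_min, x_min, y_max, x_max)` order (the order `tf.image.non_max_suppression` expects) and that `image_shape` is `(height, width)`; the function names are hypothetical stand-ins, not the library code.

```python
import numpy as np


def boxes_to_corners(box_xy, box_wh):
    """(x, y) centers and (w, h) sizes -> (y_min, x_min, y_max, x_max) corners."""
    box_mins = box_xy - box_wh / 2.0
    box_maxes = box_xy + box_wh / 2.0
    return np.concatenate([box_mins[..., 1:2], box_mins[..., 0:1],
                           box_maxes[..., 1:2], box_maxes[..., 0:1]], axis=-1)


def scale_to_image(boxes, image_shape):
    """Scale normalized corners (values in [0, 1]) to pixels; image_shape = (height, width)."""
    height, width = image_shape
    return boxes * np.array([height, width, height, width])


xy = np.array([[0.5, 0.25]])   # center at x = 0.5, y = 0.25
wh = np.array([[0.2, 0.1]])    # width 0.2, height 0.1
corners = boxes_to_corners(xy, wh)
print(corners)                                  # [[0.2 0.4 0.3 0.6]]
print(scale_to_image(corners, (720., 1280.)))   # [[144. 512. 216. 768.]]
```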
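YOLO model summary:
- The input is a 608×608×3 image; a CNN turns it into a 19×19×5×85 output.
- Flattening the last two dimensions gives 19×19×425: each cell of the 19×19 grid holds 425 numbers.
- The 5 comes from choosing 5 anchor boxes; 85 = 80 classes + 5 parameters $(p_c, b_x, b_y, b_h, b_w)$.
- Finally, only a few boxes are kept (score thresholding, then non-max suppression).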
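The 425 = 5 × 85 bookkeeping can be checked with a reshape. This sketch assumes the network groups each anchor's 85 numbers contiguously along the channel axis, which is the layout a plain reshape produces; the variable names are illustrative.

```python
import numpy as np

NUM_ANCHORS, NUM_CLASSES = 5, 80
values_per_box = 5 + NUM_CLASSES          # p_c, b_x, b_y, b_h, b_w + 80 class probs = 85
channels = NUM_ANCHORS * values_per_box   # 5 * 85 = 425

raw = np.zeros((1, 19, 19, channels))     # what the conv net emits for m = 1
per_anchor = raw.reshape(1, 19, 19, NUM_ANCHORS, values_per_box)
print(values_per_box, channels, per_anchor.shape)  # 85 425 (1, 19, 19, 5, 85)
```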
3. Testing the Pre-trained YOLO Model on Images
- Create a session
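3.1 Defining Classes, Anchors and Image Shape
```python
class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)
```
The coco_classes file defines the names of the 80 object classes, and image_shape is the (height, width) of the test images.
The yolo_anchors file contains 10 floating-point numbers, i.e. 5 pairs, that define the shapes of the 5 anchor boxes.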
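A sketch of what the two readers plausibly do, assuming one class name per line in the classes file and comma-separated floats on one line in the anchors file. The parsers work on strings so they run without the data files, and the anchor values below are made up for illustration, not the real contents of yolo_anchors.txt.

```python
import numpy as np


def parse_classes(text):
    """One class name per line -> list of names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_anchors(text):
    """Comma-separated floats (w1, h1, w2, h2, ...) -> array of shape (num_anchors, 2)."""
    values = [float(v) for v in text.strip().split(",")]
    return np.array(values).reshape(-1, 2)


# Made-up values, only to show the 10 numbers -> 5 x 2 reshape
anchors = parse_anchors("1.0,1.5, 2.0,2.5, 3.0,3.5, 4.0,4.5, 5.0,5.5")
print(anchors.shape)  # (5, 2)
```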
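3.2 Loading the Pre-trained Model
Error: module 'tensorflow' has no attribute 'space_to_depth'
This error shows up with TensorFlow 2.x, where the function is only available as tf.nn.space_to_depth. Version issues are a real pain; installing the following versions works without errors (in a Python 3.7 environment):
```shell
pip uninstall tensorflow
pip uninstall keras
pip install tensorflow==1.14.0
pip install keras==2.3.1
```
```python
yolo_model = load_model("model_data/yolo.h5")
```
Model summary: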
```python
yolo_model.summary()
```
Model: "model_1"

| Layer (type) | Output Shape | Param # | Connected to |
|---|---|---|---|
| input_1 (InputLayer) | (None, 608, 608, 3) | 0 | |
| conv2d_1 (Conv2D) | (None, 608, 608, 32) | 864 | input_1[0][0] |
| batch_normalization_1 (BatchNormalization) | (None, 608, 608, 32) | 128 | conv2d_1[0][0] |
| leaky_re_lu_1 (LeakyReLU) | (None, 608, 608, 32) | 0 | batch_normalization_1[0][0] |
| max_pooling2d_1 (MaxPooling2D) | (None, 304, 304, 32) | 0 | leaky_re_lu_1[0][0] |
| conv2d_2 (Conv2D) | (None, 304, 304, 64) | 18432 | max_pooling2d_1[0][0] |
| batch_normalization_2 (BatchNormalization) | (None, 304, 304, 64) | 256 | conv2d_2[0][0] |
| leaky_re_lu_2 (LeakyReLU) | (None, 304, 304, 64) | 0 | batch_normalization_2[0][0] |
| max_pooling2d_2 (MaxPooling2D) | (None, 152, 152, 64) | 0 | leaky_re_lu_2[0][0] |
| conv2d_3 (Conv2D) | (None, 152, 152, 128) | 73728 | max_pooling2d_2[0][0] |
| batch_normalization_3 (BatchNormalization) | (None, 152, 152, 128) | 512 | conv2d_3[0][0] |
| leaky_re_lu_3 (LeakyReLU) | (None, 152, 152, 128) | 0 | batch_normalization_3[0][0] |
| conv2d_4 (Conv2D) | (None, 152, 152, 64) | 8192 | leaky_re_lu_3[0][0] |
| batch_normalization_4 (BatchNormalization) | (None, 152, 152, 64) | 256 | conv2d_4[0][0] |
| leaky_re_lu_4 (LeakyReLU) | (None, 152, 152, 64) | 0 | batch_normalization_4[0][0] |
| conv2d_5 (Conv2D) | (None, 152, 152, 128) | 73728 | leaky_re_lu_4[0][0] |
| batch_normalization_5 (BatchNormalization) | (None, 152, 152, 128) | 512 | conv2d_5[0][0] |
| leaky_re_lu_5 (LeakyReLU) | (None, 152, 152, 128) | 0 | batch_normalization_5[0][0] |
| max_pooling2d_3 (MaxPooling2D) | (None, 76, 76, 128) | 0 | leaky_re_lu_5[0][0] |
| conv2d_6 (Conv2D) | (None, 76, 76, 256) | 294912 | max_pooling2d_3[0][0] |
| batch_normalization_6 (BatchNormalization) | (None, 76, 76, 256) | 1024 | conv2d_6[0][0] |
| leaky_re_lu_6 (LeakyReLU) | (None, 76, 76, 256) | 0 | batch_normalization_6[0][0] |
| conv2d_7 (Conv2D) | (None, 76, 76, 128) | 32768 | leaky_re_lu_6[0][0] |
| batch_normalization_7 (BatchNormalization) | (None, 76, 76, 128) | 512 | conv2d_7[0][0] |
| leaky_re_lu_7 (LeakyReLU) | (None, 76, 76, 128) | 0 | batch_normalization_7[0][0] |
| conv2d_8 (Conv2D) | (None, 76, 76, 256) | 294912 | leaky_re_lu_7[0][0] |
| batch_normalization_8 (BatchNormalization) | (None, 76, 76, 256) | 1024 | conv2d_8[0][0] |
| leaky_re_lu_8 (LeakyReLU) | (None, 76, 76, 256) | 0 | batch_normalization_8[0][0] |
| max_pooling2d_4 (MaxPooling2D) | (None, 38, 38, 256) | 0 | leaky_re_lu_8[0][0] |
| conv2d_9 (Conv2D) | (None, 38, 38, 512) | 1179648 | max_pooling2d_4[0][0] |
| batch_normalization_9 (BatchNormalization) | (None, 38, 38, 512) | 2048 | conv2d_9[0][0] |
| leaky_re_lu_9 (LeakyReLU) | (None, 38, 38, 512) | 0 | batch_normalization_9[0][0] |
| conv2d_10 (Conv2D) | (None, 38, 38, 256) | 131072 | leaky_re_lu_9[0][0] |
| batch_normalization_10 (BatchNormalization) | (None, 38, 38, 256) | 1024 | conv2d_10[0][0] |
| leaky_re_lu_10 (LeakyReLU) | (None, 38, 38, 256) | 0 | batch_normalization_10[0][0] |
| conv2d_11 (Conv2D) | (None, 38, 38, 512) | 1179648 | leaky_re_lu_10[0][0] |
| batch_normalization_11 (BatchNormalization) | (None, 38, 38, 512) | 2048 | conv2d_11[0][0] |
| leaky_re_lu_11 (LeakyReLU) | (None, 38, 38, 512) | 0 | batch_normalization_11[0][0] |
| conv2d_12 (Conv2D) | (None, 38, 38, 256) | 131072 | leaky_re_lu_11[0][0] |
| batch_normalization_12 (BatchNormalization) | (None, 38, 38, 256) | 1024 | conv2d_12[0][0] |
| leaky_re_lu_12 (LeakyReLU) | (None, 38, 38, 256) | 0 | batch_normalization_12[0][0] |
| conv2d_13 (Conv2D) | (None, 38, 38, 512) | 1179648 | leaky_re_lu_12[0][0] |
| batch_normalization_13 (BatchNormalization) | (None, 38, 38, 512) | 2048 | conv2d_13[0][0] |
| leaky_re_lu_13 (LeakyReLU) | (None, 38, 38, 512) | 0 | batch_normalization_13[0][0] |
| max_pooling2d_5 (MaxPooling2D) | (None, 19, 19, 512) | 0 | leaky_re_lu_13[0][0] |
| conv2d_14 (Conv2D) | (None, 19, 19, 1024) | 4718592 | max_pooling2d_5[0][0] |
| batch_normalization_14 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_14[0][0] |
| leaky_re_lu_14 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_14[0][0] |
| conv2d_15 (Conv2D) | (None, 19, 19, 512) | 524288 | leaky_re_lu_14[0][0] |
| batch_normalization_15 (BatchNormalization) | (None, 19, 19, 512) | 2048 | conv2d_15[0][0] |
| leaky_re_lu_15 (LeakyReLU) | (None, 19, 19, 512) | 0 | batch_normalization_15[0][0] |
| conv2d_16 (Conv2D) | (None, 19, 19, 1024) | 4718592 | leaky_re_lu_15[0][0] |
| batch_normalization_16 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_16[0][0] |
| leaky_re_lu_16 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_16[0][0] |
| conv2d_17 (Conv2D) | (None, 19, 19, 512) | 524288 | leaky_re_lu_16[0][0] |
| batch_normalization_17 (BatchNormalization) | (None, 19, 19, 512) | 2048 | conv2d_17[0][0] |
| leaky_re_lu_17 (LeakyReLU) | (None, 19, 19, 512) | 0 | batch_normalization_17[0][0] |
| conv2d_18 (Conv2D) | (None, 19, 19, 1024) | 4718592 | leaky_re_lu_17[0][0] |
| batch_normalization_18 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_18[0][0] |
| leaky_re_lu_18 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_18[0][0] |
| conv2d_19 (Conv2D) | (None, 19, 19, 1024) | 9437184 | leaky_re_lu_18[0][0] |
| batch_normalization_19 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_19[0][0] |
| conv2d_21 (Conv2D) | (None, 38, 38, 64) | 32768 | leaky_re_lu_13[0][0] |
| leaky_re_lu_19 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_19[0][0] |
| batch_normalization_21 (BatchNormalization) | (None, 38, 38, 64) | 256 | conv2d_21[0][0] |
| conv2d_20 (Conv2D) | (None, 19, 19, 1024) | 9437184 | leaky_re_lu_19[0][0] |
| leaky_re_lu_21 (LeakyReLU) | (None, 38, 38, 64) | 0 | batch_normalization_21[0][0] |
| batch_normalization_20 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_20[0][0] |
| space_to_depth_x2 (Lambda) | (None, 19, 19, 256) | 0 | leaky_re_lu_21[0][0] |
| leaky_re_lu_20 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_20[0][0] |
| concatenate_1 (Concatenate) | (None, 19, 19, 1280) | 0 | space_to_depth_x2[0][0], leaky_re_lu_20[0][0] |
| conv2d_22 (Conv2D) | (None, 19, 19, 1024) | 11796480 | concatenate_1[0][0] |
| batch_normalization_22 (BatchNormalization) | (None, 19, 19, 1024) | 4096 | conv2d_22[0][0] |
| leaky_re_lu_22 (LeakyReLU) | (None, 19, 19, 1024) | 0 | batch_normalization_22[0][0] |
| conv2d_23 (Conv2D) | (None, 19, 19, 425) | 435625 | leaky_re_lu_22[0][0] |

Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672

The model turns a batch of images of shape m × 608 × 608 × 3 into a tensor of shape m × 19 × 19 × 425, which is read as m × 19 × 19 × 5 × 85 (425 = 5 × 85).
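The `space_to_depth_x2` layer in the summary turns the (38, 38, 64) feature map into (19, 19, 256), so it can be concatenated with the (19, 19, 1024) branch into 1280 channels. A NumPy sketch of that rearrangement for NHWC input, written to match the channel ordering in the TensorFlow `space_to_depth` documentation (the in-block position becomes the high-order part of the channel index):

```python
import numpy as np


def space_to_depth(x, block_size=2):
    """NHWC space_to_depth: move each block_size x block_size patch into channels.

    out[n, i, j, (di * block_size + dj) * C + c] = x[n, i * block_size + di, j * block_size + dj, c]
    """
    n, h, w, c = x.shape
    b = block_size
    x = x.reshape(n, h // b, b, w // b, b, c)   # split H and W into (block, offset)
    x = x.transpose(0, 1, 3, 2, 4, 5)           # -> (n, H/b, W/b, di, dj, c)
    return x.reshape(n, h // b, w // b, b * b * c)


feature_map = np.random.rand(1, 38, 38, 64)     # shape of leaky_re_lu_21 in the summary
print(space_to_depth(feature_map).shape)        # (1, 19, 19, 256)
```

Unlike max pooling, no values are discarded: the finer 38x38 detail is carried into the 19x19 grid as extra channels.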
3.3 Converting the Model Output into Usable Bounding-Box Tensors
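`yolo_head` turns the raw (19, 19, 425) conv output into box tensors. The sketch below loosely follows the decoding described in the YOLO9000 paper (sigmoid offsets added to the cell index, exponentials scaled by the anchor priors, softmax over classes). The channel order `(t_x, t_y, t_w, t_h, t_o, class logits)` and anchors given in grid-cell units are assumptions; the real `yolo_head` in yad2k may differ in such details, and `decode_yolo_output` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))   # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)


def decode_yolo_output(feats, anchors, num_classes):
    """Decode a (grid_h, grid_w, num_anchors * (5 + num_classes)) array, YOLOv2 style.

    Assumes per-anchor channels (t_x, t_y, t_w, t_h, t_o, class logits) and
    anchors of shape (num_anchors, 2) as (w, h) in grid-cell units.
    """
    grid_h, grid_w = feats.shape[:2]
    feats = feats.reshape(grid_h, grid_w, len(anchors), 5 + num_classes)

    # Cell offsets c_x (column) and c_y (row) for every grid position
    cy, cx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    cell_xy = np.stack([cx, cy], axis=-1)[:, :, None, :]   # (grid_h, grid_w, 1, 2)
    grid_wh = np.array([grid_w, grid_h], dtype=float)

    box_xy = (sigmoid(feats[..., 0:2]) + cell_xy) / grid_wh   # centers, normalized to [0, 1]
    box_wh = np.exp(feats[..., 2:4]) * anchors / grid_wh      # sizes, relative to the image
    box_confidence = sigmoid(feats[..., 4:5])                 # p_c
    box_class_probs = softmax(feats[..., 5:])                 # class probabilities
    return box_confidence, box_xy, box_wh, box_class_probs


outputs = decode_yolo_output(np.zeros((19, 19, 425)), np.ones((5, 2)), 80)
print([o.shape for o in outputs])
```

The sigmoid keeps each predicted center inside its responsible cell, which is why the cell index is added afterwards.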
```python
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
```
3.4 Filtering Boxes
- Keep only a subset of the boxes as the final result.
3.5 Running on Images
Note: when the model uses BatchNorm (as YOLO does), you need to pass one extra placeholder in feed_dict: {K.learning_phase(): 0}.
```python
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
```
```
Found 7 boxes for test.jpg
car 0.60 (925, 285) (1045, 374)
bus 0.67 (5, 267) (220, 407)
car 0.68 (705, 279) (786, 351)
car 0.70 (947, 324) (1280, 704)
car 0.75 (159, 303) (346, 440)
car 0.80 (762, 282) (942, 412)
car 0.89 (366, 299) (745, 648)
Found 2 boxes for 1.jpg
car 0.61 (253, 466) (367, 513)
car 0.73 (179, 473) (284, 522)
```
- Predict a batch of images and generate a GIF animation
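Each printed line reads as `class score (left, top) (right, bottom)` in pixels of the 1280x720 image. My understanding is that the course's `draw_boxes` rounds the scaled `(top, left, bottom, right)` floats and clips them to the image, which would explain a box ending exactly at x = 1280. The helper below is a hypothetical sketch of that rounding and clipping, not the actual yolo_utils code.

```python
import math


def to_pixel_corners(box, image_width, image_height):
    """Round and clip a (top, left, bottom, right) box given in pixels.

    Hypothetical helper: returns ((left, top), (right, bottom)) as integers.
    """
    top, left, bottom, right = box
    top = max(0, math.floor(top + 0.5))
    left = max(0, math.floor(left + 0.5))
    bottom = min(image_height, math.floor(bottom + 0.5))
    right = min(image_width, math.floor(right + 0.5))
    return (left, top), (right, bottom)


print(to_pixel_corners((-3.2, 10.4, 700.6, 1300.0), 1280, 720))  # ((10, 0), (1280, 701))
```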
My CSDN blog: https://michael.blog.csdn.net/