
YOLOv3: Training on Your Own Data (Definitive Edition 1)

Published: 2025/4/16


Windows version: see https://github.com/AlexeyAB/darknet

Linux version: see this article and https://pjreddie.com/darknet/yolo

Part 1: Paper and Code

Part 2: How to Train on Your Own Data

Part 3: Common Questions Explained

Part 4: Testing

Part 1: Paper and Code

Paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf

Translation: https://zhuanlan.zhihu.com/p/34945787

Code: https://github.com/pjreddie/darknet

Official site: https://pjreddie.com/darknet/yolo

Older versions:

https://pjreddie.com/darknet/yolov2/

https://pjreddie.com/darknet/yolov1/

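Part 2: How to Train on Your Own Data

Notes:

(1) Platform: Linux + the author's official code [for training commands, see the official tutorial: https://pjreddie.com/darknet/yolo]

Environment: CentOS 6.5

GPU: 1080Ti

Time: fast; I don't remember the exact duration
Iterations: 900
Strength: small-object detection
Speed: slightly slower than v2
Testing: remember to change the cfg file (the # Testing settings in [net])

Purpose: to give readers a reference, so the number of samples and iterations is small. For learning only!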

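(2) To make learning easier, 182 training images, their annotations, and the matching config files are provided. The data has 4 classes [person, car front, car rear, car side].

The training data, config files, model, training log, and annotation tool are in QQ group 371315462. Please finish reading this article before joining! The download address is in YOLOv3.txt in the group files.

Friendly reminder: anyone who joins and then asks questions already answered in this post will be removed!

[training_list.txt and the labels files have been added to the group files.] Anyone working on object detection or semantic segmentation is welcome to join and discuss.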

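Training on your own data takes the following steps:

(0) Building the dataset:

A. Create XML annotation files in VOC format

Tool: LabelImg [the group files include a portable exe build and usage instructions]

B. Convert the VOC-format XML files to YOLO-format txt files

Script: voc_label.py; just adapt it to your own dataset.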
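As a rough illustration of step B, the sketch below converts one VOC XML annotation into YOLO label lines (class index followed by the normalized box center, width, and height). It is not the author's voc_label.py; the class names in CLASSES and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list modeled on the article's 4-class example; use your own.
CLASSES = ["person", "car_front", "car_rear", "car_side"]


def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a VOC pixel box to YOLO's normalized (x_center, y_center, w, h)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def voc_xml_to_yolo_lines(xml_text, classes=CLASSES):
    """Return one 'class x y w h' line per labeled object in a VOC XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip classes that are not being trained
        bndbox = obj.find("bndbox")
        box = [float(bndbox.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        x, y, w, h = voc_box_to_yolo(img_w, img_h, *box)
        lines.append("%d %.6f %.6f %.6f %.6f" % (classes.index(name), x, y, w, h))
    return lines
```

Each returned line would go into a txt file named after the image, one file per image.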

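(1) Editing the files:

(A) Editing the .data and .names files is very simple; follow the official site or the files linked from YOLOv3.txt in the group files.

(B) For the cfg changes, taking 6-class detection as an example, the main adjustments are listed below; each changed line is marked with ### followed by the original value. You can also refer to my uploaded file, which is set up for 4 classes.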

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8

......

[convolutional]
size=1
stride=1
pad=1
filters=33###75

activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6###20
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=0###1

......

[convolutional]
size=1
stride=1
pad=1
filters=33###75
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6###20
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=0###1

......

[convolutional]
size=1
stride=1
pad=1
filters=33###75
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6###20
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=0###1
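The filters value in the convolutional layer before each [yolo] layer can be checked against the class count and that layer's mask. A minimal sketch follows; the function name is made up for this example.

```python
def yolo_filters(num_classes, mask):
    """filters = (classes + 4 box coordinates + 1 objectness) * anchors in this layer's mask."""
    return (num_classes + 5) * len(mask)


# The three [yolo] layers in the cfg above, for the 6-class example.
for mask in ([6, 7, 8], [3, 4, 5], [0, 1, 2]):
    print(mask, yolo_filters(6, mask))
```

With 6 classes each layer gets 33 filters, and with the original 20 classes it gets 75, matching the values in the listing.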

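A. How filters is computed: 3 x (number of classes + 5). It depends on how the anchor clusters are distributed across the layers; the paper explains this.

B. To change the default anchors values, use k-means.

C. If GPU memory is small, set random to 0 to turn off multi-scale training.

D. How to tune the other parameters: to be added when I have time.

E. The loss is large during the first 100 iterations, then converges quickly.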

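Region xx: index of the yolo layer in the cfg file;

Avg IOU: average IoU between the predicted boxes and the labeled boxes in the current iteration; higher is better, ideal value 1;

Class: classification accuracy on labeled objects; higher is better, ideal value 1;

Obj: average objectness at locations that contain an object; higher is better, ideal value 1;

No Obj: average objectness over all locations; lower is better;

.5R: recall at an IoU threshold of 0.5; recall = detected positives / actual positives;

.75R: recall at an IoU threshold of 0.75;

count: number of positive samples.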

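F. Model testing:

Test results for 6 classes [the model and config file will be added to the group files later; the data is not provided for now]

Test results for 4 classes: detections after 900 iterations on 182 images. [YOLOv3.txt in the group files has the Baidu cloud download link]

Relevant parameters from the config file: the full version is in the group files.

H. If you still have training problems or other questions, see Part 3 or search online.

I. For how to test and problems during testing, see Part 4 or search online.

Training command; multi-GPU training command; resume-training command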
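The three commands named above follow the `./darknet detector train` form shown in the official tutorial. The sketch below only assembles the argument lists; the .data, .cfg, and backup file names are placeholders chosen for this example, not files shipped with the article.

```python
def train_command(data_file, cfg_file, weights, gpus=None):
    """Build the argument list for `./darknet detector train`."""
    cmd = ["./darknet", "detector", "train", data_file, cfg_file, weights]
    if gpus:
        # Multi-GPU training passes a comma-separated device list after -gpus.
        cmd += ["-gpus", ",".join(str(g) for g in gpus)]
    return cmd


# Placeholder file names (assumptions for this example).
DATA, CFG = "cfg/my.data", "cfg/my-yolov3.cfg"

# Fresh training from the pretrained convolutional weights.
print(" ".join(train_command(DATA, CFG, "darknet53.conv.74")))
# Multi-GPU training.
print(" ".join(train_command(DATA, CFG, "darknet53.conv.74", gpus=[0, 1, 2, 3])))
# Resuming: pass a previously saved checkpoint as the weights file.
print(" ".join(train_command(DATA, CFG, "backup/my-yolov3.backup", gpus=[0, 1, 2, 3])))
```

The resulting list can be handed to `subprocess.run` from the darknet directory.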

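Part 3: Training Problems Explained

Images are from the group files; contact me for removal in case of infringement.

Tips0: Dataset

If you are learning how to train, it is best not to start with VOC or COCO. Both datasets are complex and have many classes, reproducing the author's results takes some skill, and it takes roughly 50,000 iterations before initial results show up. It is better to pick a simple dataset or hand-label a few hundred images and practice on that.

Tips1: CUDA: out of memory and resizing problems

GPU memory is insufficient: reduce batch and turn off multi-scale training with random = 0.

Tips2: Is a large loss in the early iterations normal?

In tests on several datasets, a large loss early on is normal; it converges quickly afterwards.

Tips3: What does mask do in YOLOv3?

See #558 #567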

Every layer has to know about all of the anchor boxes but is only predicting some subset of them. This could probably be named something better but the mask tells the layer which of the bounding boxes it is responsible for predicting. The first yolo layer predicts 6,7,8 because those are the largest boxes and it's at the coarsest scale. The 2nd yolo layer predicts some smaller ones, etc.

The layer assumes if it isn't passed a mask that it is responsible for all the bounding boxes, hence the if statement thing.
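The selection described in the quote can be sketched as a simple index lookup. The anchor values are the defaults from the cfg in Part 2; the function name is an assumption for this example.

```python
# Default anchors from the cfg, as (width, height) pairs.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def anchors_for_layer(anchors, mask=None):
    """Return the anchors a [yolo] layer predicts; without a mask it takes all of them."""
    if mask is None:
        mask = range(len(anchors))
    return [anchors[i] for i in mask]


print(anchors_for_layer(ANCHORS, [6, 7, 8]))  # the three largest boxes
print(anchors_for_layer(ANCHORS, [0, 1, 2]))  # the three smallest boxes
```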

Tips3: What does num do in YOLOv3?

See #567

num is 9 but each yolo layer is only actually looking at 3 (that's what the mask thing does). so it's (20+1+4)*3 = 75. If you use a different number of anchors you have to figure out which layer you want to predict which anchors and the number of filters will depend on that distribution.

According to the paper, each yolo (detection) layer gets 3 anchors associated with its scale; mask holds the indices of the selected anchors.

Tips4: Why does nan appear during YOLOv3 training?

See #566

You must be training on a lot of small objects! nan's appear when there are no objects in a batch of images since i definitely divide by zero. For example, Avg IOU is the sum of IOUs for all objects at that level / # of objects, if that is zero you get nan. I could probably change this so it just does a check for zero 1st, just wasn't a priority.

So, when GPU memory allows, increasing batch somewhat reduces how often nan appears.
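The division described in the quote can be mimicked in a few lines. The guarded variant is hypothetical, shown only to illustrate the "check for zero" the quote mentions; it is not what darknet does.

```python
import math


def logged_average(total, count):
    """Mimic the logged ratio: nan when no object was assigned to this layer."""
    return total / count if count else float("nan")


def guarded_average(total, count, default=0.0):
    """Hypothetical guard: fall back to a default instead of producing nan."""
    return total / count if count else default


print(logged_average(0.0, 0))   # nan, as seen in the training log
print(guarded_average(0.0, 0))  # 0.0
```

A nan in these printed averages comes from an empty count at that layer, which is why larger batches make it less frequent.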

Tips5: What are anchor boxes for?

See #568

Here's a quick explanation based on what I understand (which might be wrong but hopefully gets the gist of it). After doing some clustering studies on ground truth labels, it turns out that most bounding boxes have certain height-width ratios. So instead of directly predicting a bounding box, YOLOv2 (and v3) predict off-sets from a predetermined set of boxes with particular height-width ratios - those predetermined set of boxes are the anchor boxes.

Anchors are initial sizes (width, height) some of which (the closest to the object size) will be resized to the object size - using some outputs from the neural network (final feature map):

darknet/src/yolo_layer.c

Lines 88 to 89 in 6f6e475

    b.w = exp(x[index + 2*stride]) * biases[2*n]   / w;
    b.h = exp(x[index + 3*stride]) * biases[2*n+1] / h;

x[...] - outputs of the neural network
biases[...] - anchors
b.w and b.h - resulting width and height of the bounding box that will be shown on the result image

Thus, the network should not predict the final size of the object, but should only adjust the size of the nearest anchor to the size of the object.
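The two lines from yolo_layer.c can be restated as a small function. This is a sketch of the width and height terms only; the parameter names are chosen for this example, with net_w and net_h standing for the width= and height= values in the cfg.

```python
import math


def decode_size(t_w, t_h, anchor, net_w, net_h):
    """Scale an anchor by exp(network output) and normalize by the network input size."""
    anchor_w, anchor_h = anchor
    b_w = math.exp(t_w) * anchor_w / net_w
    b_h = math.exp(t_h) * anchor_h / net_h
    return b_w, b_h


# Raw outputs of 0 leave the anchor unchanged: exp(0) == 1.
print(decode_size(0.0, 0.0, (116, 90), 416, 416))
# A positive output enlarges the anchor, a negative one shrinks it.
print(decode_size(0.5, -0.5, (116, 90), 416, 416))
```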

In Yolo v3 anchors (width, height) - are sizes of objects on the image that resized to the network size (width= and height= in the cfg-file).

In Yolo v2 anchors (width, height) - are sizes of objects relative to the final feature map (32 times smaller than in Yolo v3 for default cfg-files).
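Part 2 mentions k-means for choosing custom anchors without giving details. The sketch below is one common way to do it, assuming a distance of 1 - IoU between (width, height) pairs and box sizes already scaled to the network input size, as the YOLOv3 definition above requires. The function names, the fixed seed, and the distance choice are assumptions, not the author's procedure.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), assuming they share a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs by highest IoU; return k anchors sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            best = max(range(k), key=lambda i: wh_iou(s, centers[i]))
            clusters[best].append(s)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy data: two small and two large boxes, in network-input pixels.
print(kmeans_anchors([(10, 10), (12, 12), (100, 100), (110, 110)], 2))
```

For a real cfg, k would be 9 and the sorted result would be written to every anchors= line.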

Tips6: Why are the anchor boxes so different between YOLOv2 and YOLOv3?

See #562 #555

Now anchors depend on the size of the network input rather than the size of the network output (final feature map): #555 (comment)
So the values of the anchors are 32 times larger.

Now filters=(classes+1+coords)*anchors_num where anchors_num is the number of masks for this layer.
If mask is absent then anchors_num = num for this layer:

darknet/src/yolo_layer.c

Lines 31 to 37 in e4acba6

    if(mask) l.mask = mask;
    else{
        l.mask = calloc(n, sizeof(int));
        for(i = 0; i < n; ++i){
            l.mask[i] = i;
        }
    }

Each [yolo] layer uses only those anchors whose indices are specified in the mask=

So YOLOv2 I made some design choice errors, I made the anchor box size be relative to the feature size in the last layer. Since the network was downsampling by 32 this means it was relative to 32 pixels so an anchor of 9x9 was actually 288px x 288px.

In YOLOv3 anchor sizes are actual pixel values. this simplifies a lot of stuff and was only a little bit harder to implement
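The arithmetic in the quote (a 9x9 YOLOv2 anchor corresponding to 288 x 288 pixels) is a multiplication by the downsampling factor. A minimal sketch, with a function name chosen for this example:

```python
STRIDE = 32  # total downsampling factor of the YOLOv2 network, per the quote above


def v2_anchor_to_pixels(anchor, stride=STRIDE):
    """Convert a YOLOv2 anchor (feature-map cells) to YOLOv3-style pixel units."""
    return (anchor[0] * stride, anchor[1] * stride)


print(v2_anchor_to_pixels((9, 9)))  # (288, 288)
```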

Tips7: What do the values printed by YOLOv3 mean?

See the forward_yolo_layer function in yolo_layer.c.

    printf("Region %d Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, .5R: %f, .75R: %f,  count: %d\n", net.index, avg_iou/count, avg_cat/class_count, avg_obj/count, avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, recall75/count, count);

Early in training the network has not yet predicted the corresponding objects, so recall [.5R, .75R] is low and many values are 0; this is normal.
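To make the Avg IOU, .5R, and .75R columns concrete, the sketch below computes them from a list of per-object IoU values. It illustrates the definitions only and is not darknet's code; the function names are assumptions, and boxes are taken as (x_center, y_center, w, h).

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def summarize(ious):
    """Average IoU and recall at the 0.5 and 0.75 thresholds over labeled objects."""
    count = len(ious)
    if count == 0:
        return None  # no objects: the ratios are undefined (logged as nan)
    return {
        "avg_iou": sum(ious) / count,
        "recall50": sum(1 for v in ious if v > 0.5) / count,
        "recall75": sum(1 for v in ious if v > 0.75) / count,
        "count": count,
    }


print(summarize([0.9, 0.6, 0.3, 0.8]))
```

Early in training most IoU values fall below 0.5, which is why the recall columns start near 0.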

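Part 4: Testing Problems

I am fairly busy, so comments will not get prompt replies. Please read the questions below and the comment section carefully, or join the group to discuss.

ps1. Many comments are duplicates, apparently left without reading the article or other comments. Repeated questions will not be answered, sorry!

ps2. Optimization and productionization are things you need to work out yourself; I don't have the time or energy to solve them for you, sorry!

ps3. Make good use of web search: search first, and ask only if you can't find an answer!

(0) Some readers get a segmentation fault (core dumped) when running the forward pass on CPU

This is a problem in the code itself. I haven't had time to track it down yet; for now, use the GPU for the forward pass where possible.

(1) *** Error in `./darknet': free(): invalid next size (fast): 0x000055d39b90cbb0 *** Aborted (core dumped)

Please use the following test command!

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

For training and testing commands, see the official site: https://pjreddie.com/darknet/yolo/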

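(2) How do I evaluate the model (valid)?

For the evaluation command, see the official site: https://pjreddie.com/darknet/yolo/

(3) Bounding boxes are correct but the labels are wrong; here are two options.

A. No source change, no recompilation

   Edit the contents of coco.names to be your own class names

B. Change the source and recompile [make clean first, then make]

The cause is a default the author set in the code. Edit examples/darknet.c and replace the file indicated by the arrow with your own, then recompile.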

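(4) Saving the model

A: Error when saving the model?

Usually the folder specified in the .data file cannot be created, so saving fails. Create it manually.

B: When is the model saved, and how can that be changed?

Below 1000 iterations, the model is saved every 100 iterations; above 1000, it is saved every 10000 iterations.

You can change this to suit your needs and then recompile [make clean first, then make].

Code location: examples/detector.c line 138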
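The save schedule described above can be written as a predicate. This sketch mirrors the rule as stated in this article; the function name is an assumption, and the condition in your copy of examples/detector.c should be checked before relying on it.

```python
def should_save_weights(iteration):
    """Every 100 iterations below 1000, otherwise every 10000 iterations."""
    return iteration % 10000 == 0 or (iteration < 1000 and iteration % 100 == 0)


# Iterations between 1 and 20000 at which a weights file would be written.
print([i for i in range(1, 20001) if should_save_weights(i)])
```

Under this rule nothing is saved between iteration 900 and iteration 10000, which is worth knowing before stopping a run early.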

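C: Training saves and exits immediately when starting from pretrained weights

With darknet53.conv.74 as the pretrained weights file, only the convolutional layers are included, so training starts from the beginning.

With xxx.weights as the pretrained weights file, all layers are included, so it is equivalent to resuming from a snapshot: training continues from the saved iteration count. If the iteration limit in the cfg (max_batches) has not been raised, training does not continue and the model is saved and the run ends immediately.

(5) Chinese labels

There are many solutions; here is mine in brief [the labels folder in the group files has reference code].

A: First generate the corresponding Chinese label images.

Change the font in the code to a Chinese font. If a message says some module is missing, just install it.

B: Add your own functions for reading the labels and drawing the boxes.

(6) Adding confidence values to the image

If you are comfortable with the code, add it with OpenCV inside the box-drawing function.

(7) Name of the saved image

At test time the default saved name is predictions. Change it to the corresponding image name yourself.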

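Part 5: Reading the Paper

Strengths: fast, improved accuracy, better small-object detection;

Weaknesses: somewhat weaker on medium and large objects, misses occluded objects, slightly slower than v2.

v2: anchors [k-means] + multi-scale + cross-scale feature fusion
v3: anchors [k-means] + multi-scale + cross-scale feature fusion

v2 and v3 share the features above; simple multi-scale training is not the key to better small-object detection.

v2: 32x downsampling, then anchor-based regression to predict boxes
Problem: a large downsampling factor usually gives a large receptive field, which helps classification but hurts detection and localization [small objects vanish during downsampling, and the boundaries of large objects are localized imprecisely]

v3: adjusted to address this problem by predicting boxes at 3 different scales of the network. [Put plainly, this is the FPN idea.] Predicting objects at earlier stages of downsampling improves small-object detection and localization.
If this is unclear, a quick look at FPN will make it clear. This is the real key to v3's improvement on small objects.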

This article is from Ma Weifei's CSDN blog. Full text: https://blog.csdn.net/maweifei/article/details/81137563?utm_source=copy

Reposted from: https://my.oschina.net/farces/blog/2209444
