PP-YOLO Surpasses YOLOv4 — Advances in Object Detection
PP-YOLO evaluation metrics show improved performance over YOLOv4, the incumbent state of the art object detection model. Yet, the Baidu authors write:
This paper is not intended to introduce a novel object detector. It is more like a recipe, which tell you how to build a better detector step by step.
Let’s unpack that.
YOLO Development History
YOLO was originally authored by Joseph Redmon to detect objects. Object detection is a computer vision technique that localizes and tags objects by drawing a bounding box around them and identifying the class label that a given box belongs to. Unlike massive NLP transformers, YOLO is designed to be tiny, enabling realtime inference speeds for deployment on device.
YOLO-9000, dubbed "YOLOv2," was the second object detector published by Joseph Redmon, improving the detector and emphasizing the detector's ability to generalize to any object in the world.
Figure: PP-YOLO being trained to identify different fruit flies in this photo.

YOLOv3 made further improvements to the detection network and began to mainstream the object detection process. We began to publish tutorials on how to train YOLOv3 in PyTorch, how to train YOLOv3 in Keras, and compared YOLOv3 performance to EfficientDet (another state of the art detector).
Then Joseph Redmon stepped out of the object detection game due to ethical concerns.
Naturally, the open source community picked up the baton and continues to move YOLO technology forward.
YOLOv4 was published this past spring by Alexey AB in his fork of the YOLO Darknet repository. YOLOv4 was primarily an ensemble of other known computer vision technologies, combined and validated through the research process. See here for a deep dive on YOLOv4. The YOLOv4 paper reads similarly to the PP-YOLO paper, as we will see below. We put together some great training tutorials on how to train YOLOv4 in Darknet.
Then, just a few months ago YOLOv5 was released. YOLOv5 took the Darknet (C based) training environment and converted the network to PyTorch. Improved training techniques pushed performance of the model even further and created a great, easy to use, out of the box object detection model. Ever since, we have been encouraging developers using Roboflow to direct their attention to YOLOv5 for the formation of their custom object detectors via this YOLOv5 training tutorial.
Enter PP-YOLO.
What Does PP Stand For?
PP is short for PaddlePaddle, a deep learning framework written by Baidu.
Figure: PaddlePaddle distributions provided on their website.

If PaddlePaddle is new to you, then we are in the same boat. Primarily written in Python, PaddlePaddle seems akin to PyTorch and TensorFlow. A deep dive into the PaddlePaddle framework is intriguing, but beyond the scope of this article.
PP-YOLO Contributions
The PP-YOLO paper reads much like the YOLOv4 paper in that it is a compilation of techniques that are known to work in computer vision. The novel contribution is to prove that the ensemble of these technologies improves performance, and to provide an ablation study of how much each step helps the model along the way.
Before we dive into the contributions of PP-YOLO, it will be useful to review the YOLO detector architecture.
Anatomy of the YOLO Detector
Figure: A graphical depiction of the PP-YOLO object detection network.

The YOLO detector is broken into three main pieces.
YOLO Backbone — The YOLO backbone is a convolutional neural network that pools image pixels to form features at different granularities. The backbone is typically pretrained on a classification dataset such as ImageNet.
YOLO Neck — The YOLO neck (FPN is chosen above) combines and mixes the ConvNet layer representations before passing on to the prediction head.
YOLO Head — This is the part of the network that makes the bounding box and class prediction. It is guided by the three YOLO loss functions for class, box, and objectness.
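The three stages compose like a pipeline from pixels to predictions. A toy sketch of that composition (shapes are illustrative stand-ins, not PP-YOLO's real layer sizes):

```python
import numpy as np

def backbone(image):
    """Pool pixels into feature maps at three granularities (strides 8, 16, 32)."""
    h, w, _ = image.shape
    return [np.zeros((h // s, w // s, c)) for s, c in [(8, 256), (16, 512), (32, 1024)]]

def neck(features):
    """FPN-style mixing; here simply project every scale to a common channel depth."""
    return [f[..., :256] for f in features]

def head(features, num_anchors=3, num_classes=80):
    """Per grid cell and anchor: 4 box coords + 1 objectness + class scores."""
    out_channels = num_anchors * (4 + 1 + num_classes)
    return [np.zeros(f.shape[:2] + (out_channels,)) for f in features]

preds = head(neck(backbone(np.zeros((416, 416, 3)))))
print([p.shape for p in preds])  # one prediction tensor per scale
```

For a 416x416 input, this yields prediction grids of 52x52, 26x26, and 13x13, each with 255 channels (3 anchors x 85 values), mirroring the YOLOv3-style output layout.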
Now let's dive into the PP-YOLO contributions.
Figure: Each technique in PP-YOLO yields a marginal mAP accuracy improvement.

Replace Backbone
The first PP YOLO technique is to replace the YOLOv3 Darknet53 backbone with the Resnet50-vd-dcn ConvNet backbone. Resnet is a more popular backbone, more frameworks are optimized for its execution, and it has fewer parameters than Darknet53. Seeing a mAP improvement by swapping this backbone is a huge win for PP YOLO.
Figure: A graphical depiction of ResNet.

EMA of Model Parameters
PP YOLO tracks the Exponential Moving Average of network parameters to maintain a shadow of the model's weights for prediction time. This has been shown to improve inference accuracy.
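The EMA idea is simple to sketch: after each optimizer step, blend the new weights into a shadow copy. A minimal sketch (the decay value here is illustrative, not PP-YOLO's exact schedule):

```python
import numpy as np

class EMA:
    """Keep a shadow copy of parameters, updated as a moving average."""

    def __init__(self, params, decay=0.9998):
        self.decay = decay
        self.shadow = {k: v.copy() for k, v in params.items()}

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for k, v in params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1 - self.decay) * v

weights = {"w": np.array([1.0])}
ema = EMA(weights, decay=0.9)
weights["w"] = np.array([2.0])   # pretend one optimizer step moved the weight
ema.update(weights)
print(ema.shadow["w"])           # 0.9 * 1.0 + 0.1 * 2.0 = [1.1]
```

At inference time the shadow weights are used in place of the raw trained weights, smoothing out noise from the final training steps.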
Larger Batch Size
PP-YOLO bumps the batch size up from 64 to 192. Of course, this is hard to implement if you have GPU memory constraints.
DropBlock Regularization
PP YOLO implements DropBlock regularization in the FPN neck (in the past, this has usually occurred in the backbone). DropBlock randomly removes a block of the training features at a given step in the network to teach the model to not rely on key features for detection.
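The blockwise masking idea can be sketched in a few lines. This is a rough illustration only: the full DropBlock algorithm also rescales the surviving activations and derives the seed probability gamma from the target drop rate.

```python
import numpy as np

def dropblock(x, block_size=3, drop_prob=0.1, seed=0):
    """Zero out contiguous block_size x block_size squares of a 2D feature map."""
    rng = np.random.default_rng(seed)
    h, w = x.shape
    mask = np.ones_like(x)
    # sample block top-left corners so whole blocks fit inside the map
    seeds = rng.random((h - block_size + 1, w - block_size + 1)) < drop_prob
    for i, j in zip(*np.nonzero(seeds)):
        mask[i:i + block_size, j:j + block_size] = 0.0
    return x * mask

features = np.ones((8, 8))
dropped = dropblock(features)
print(int((dropped == 0).sum()))  # features removed in contiguous blocks
```

Dropping contiguous regions, rather than isolated pixels, forces the network to use spatially distributed evidence instead of a single discriminative patch.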
Figure: The DropBlock regularization technique — features are hidden in blocks (b), not randomly.

IoU Loss
The YOLO loss function does not translate well to the mAP metric, which relies heavily on Intersection over Union in its calculation. Therefore, it is useful to edit the training loss function with this final metric in mind. This edit was also present in YOLOv4.
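For concreteness, here is the IoU computation that an IoU-based loss (for example, 1 - IoU) is built on, with boxes in [x1, y1, x2, y2] form:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, target = [0, 0, 2, 2], [1, 1, 3, 3]
print(iou(pred, target))         # 1 / 7 ≈ 0.1429
print(1.0 - iou(pred, target))   # a simple IoU loss for this pair
```

Because the loss is now a direct function of box overlap, reducing it pushes predictions toward exactly what mAP rewards.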
IoU Aware
The PP-YOLO network adds a prediction branch to predict the model's estimated IoU with a given object. Including this IoU awareness when making the decision to predict an object or not improves performance.
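One way such a predicted IoU could be used is to re-rank detections by folding it into the confidence score, so a confidently classified but poorly localized box loses to a well-localized one. The formula and `alpha` below are a hypothetical illustration, not PP-YOLO's exact scoring rule:

```python
def iou_aware_score(cls_conf, iou_pred, alpha=0.5):
    """Hypothetical re-ranking: down-weight class confidence by predicted IoU."""
    return cls_conf * (iou_pred ** alpha)

# A well-classified but badly localized box now ranks below a slightly
# less confident, well-localized one:
print(iou_aware_score(0.9, 0.25))  # ≈ 0.45
print(iou_aware_score(0.8, 0.81))  # ≈ 0.72
```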
Grid Sensitivity
The old YOLO models do not do a good job of making predictions right around the boundaries of anchor box regions. It is useful to define box coordinates slightly differently to avoid this problem. This technique is also present in YOLOv4.
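The root of the problem is that the box center is predicted as a sigmoid offset within a grid cell, and a sigmoid never quite reaches 0 or 1, so centers can never sit exactly on a cell boundary. The fix scales the sigmoid output by a factor slightly above 1 (1.05 is the value commonly reported for YOLOv4/PP-YOLO-style models):

```python
import math

def box_center(t, cell, alpha=1.05):
    """Grid-sensitive center: cell index plus a scaled sigmoid offset."""
    s = 1.0 / (1.0 + math.exp(-t))            # sigmoid of the raw logit
    return cell + alpha * s - (alpha - 1) / 2  # alpha=1.0 recovers plain YOLOv3

# With a very large logit, the plain sigmoid offset saturates just below
# the cell edge, while the scaled version can actually cross it:
print(box_center(10.0, cell=0, alpha=1.0))
print(box_center(10.0, cell=0, alpha=1.05))
```

The `-(alpha - 1) / 2` term recenters the offset so the expansion is symmetric, leaving a zero logit mapped to the middle of the cell as before.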
Matrix NMS
Non-Maximum Suppression is a technique to remove redundant proposals of candidate objects before classification. Matrix NMS is a technique to sort through these candidate predictions in parallel, speeding up the calculation.
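The core trick Matrix NMS builds on is computing all pairwise IoUs as one vectorized matrix operation instead of a sequential greedy loop. A sketch of just that matrix step (boxes as [x1, y1, x2, y2], assumed sorted by descending score):

```python
import numpy as np

def pairwise_iou(boxes):
    """All-pairs IoU in one vectorized pass; boxes is an (N, 4) array."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [10, 10, 12, 12]], float)
overlap = pairwise_iou(boxes)
print(np.round(overlap, 3))  # diagonal is 1; the overlapping pair shows 1/7
```

Matrix NMS then applies a decay to each box's score based on this matrix (rather than hard-deleting boxes one at a time), which is what makes it parallelizable.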
CoordConv
CoordConv was motivated by the problems ConvNets were having with simply mapping (x,y) coordinates to a one-hot pixel space. The CoordConv solution gives the convolution network access to its own input coordinates. CoordConv interventions are marked with yellow diamonds above. More details are available in the CoordConv paper.
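The mechanism itself is tiny: before a convolution, append two extra channels holding the normalized x and y coordinate of each pixel. A minimal sketch of that input augmentation:

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized x and y coordinate channels to an (H, W, C) feature map."""
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.concatenate([feat, xs[..., None], ys[..., None]], axis=-1)

feat = np.zeros((4, 4, 8))
out = add_coord_channels(feat)
print(out.shape)  # (4, 4, 10): two coordinate channels appended
```

Any convolution applied after this can condition on position, which is what lets the network solve coordinate-mapping tasks that plain translation-invariant convolutions struggle with.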
SPP
Spatial Pyramid Pooling is an extra block after the backbone layer to mix and pool spatial features. Also implemented in YOLOv4 and YOLOv5.
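In the YOLO family, the SPP block max-pools the same feature map with several kernel sizes (stride 1, same-size padding) and concatenates the results along the channel axis. A naive sketch, using the 5/9/13 kernel sizes familiar from YOLOv3-SPP:

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pool with 'same' padding on an (H, W, C) map; k must be odd."""
    p = k // 2
    padded = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x, kernels=(5, 9, 13)):
    """Concatenate the identity map with pools at several receptive-field sizes."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in kernels], axis=-1)

x = np.random.default_rng(0).random((13, 13, 4))
print(spp(x).shape)  # (13, 13, 16): channels quadrupled by 3 pools + identity
```

The effect is to mix context from several receptive-field sizes into one tensor before the detection head, at modest extra cost.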
Better Pretrained Backbone
The PP YOLO authors distilled a larger ResNet model down to serve as the backbone. A better pretrained model has been shown to improve downstream transfer learning as well.
Is PP-YOLO State of the Art?
PP-YOLO outperforms the results YOLOv4 published on April 23, 2020.
In fairness, the authors note this may be the wrong question to be asking. The authors' intent appears to be not simply to "introduce a new novel detector," but rather to show the process of carefully tuning an object detector to maximize performance. Quoting the paper's introduction here:
The focus of this paper is how to stack some effective tricks that hardly affect efficiency to get better performance… This paper is not intended to introduce a novel object detector. It is more like a recipe, which tell you how to build a better detector step by step. We have found some tricks that are effective for the YOLOv3 detector, which can save developers’ time of trial and error. The final PP-YOLO model improves the mAP on COCO from 43.5% to 45.2% at a speed faster than YOLOv4
(emphasis ours)
The PP-YOLO contributions referenced above took the YOLOv3 model from 38.9 to 44.6 mAP on the COCO object detection task and increased inference FPS from 58 to 73. These metrics are shown in the paper to beat the currently published results for YOLOv4 and EfficientDet.
In benchmarking PP-YOLO against YOLOv5, it appears YOLOv5 still has the fastest accuracy-versus-speed (AP vs FPS) tradeoff on a V100. However, a YOLOv5 paper still remains to be released. Furthermore, it has been shown that training the YOLOv4 architecture on the YOLOv5 Ultralytics repository outperforms YOLOv5; transitively, YOLOv4 trained using YOLOv5 contributions would outperform the PP-YOLO results posted here. These results are still to be formally published but can be traced to this GitHub discussion.
Figure: PP-YOLO evaluation on the COCO dataset on a V100 GPU (note the AP_50 column).

Figure: YOLOv5 evaluation on the COCO dataset on a V100 GPU (note the AP_50 column).

It is worth noting that many of the techniques (such as architecture search and data augmentation) that were used in YOLOv4 were not used in PP-YOLO. This means that there is still room for the state of the art in object detection to grow as more of these techniques are combined and integrated together.
Needless to say, it is an exciting time to be implementing computer vision technologies.
Should I Switch from YOLOv4 or YOLOv5 to PP-YOLO?
The PP-YOLO model shows the promise of state of the art object detection, but the improvements are incremental over other object detectors and it is written in a new framework. At this stage, the best thing to do is to develop your own empirical result by training PP-YOLO on your own dataset. (To be notified when you can easily use PP-YOLO on your dataset, subscribe to our newsletter.)
In the meantime, I recommend checking out the following YOLO tutorials to get your object detector off the ground:
How to Train YOLOv4 in Darknet
How to Train YOLOv5 in PyTorch
As always — happy training!
Translated from: https://towardsdatascience.com/pp-yolo-surpasses-yolov4-object-detection-advances-1efc2692aa62