A Survey of Label Assignment: Raising the Upper Bound of Object Detection

Published: 2025/3/8

Original link: https://bbs.cvmart.net/topics/2960

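Prompted by the recent AutoAssign paper, and because I have long been interested in the label assignment problem in object detection, I read several related papers (FreeAnchor, ATSS, AutoAssign). This post sorts out how they relate to one another, as a record for myself.

I used a single diagram to roughly map out how these label assignment papers relate to one another.

FreeAnchor, ATSS, and AutoAssign all improve label assignment. ATSS argues that the gap between RetinaNet and FCOS comes mainly from their different sampling strategies, and proposes a better label assignment scheme to close that gap. FreeAnchor proposes a better label assignment on top of RetinaNet, and AutoAssign does so on top of FCOS.

RetinaNet is a classic anchor-based detector and FCOS is a classic anchor-free detector. Starting from RetinaNet, FCOS drops the anchor prior in favor of a point prior and adds a center-ness branch to suppress low-quality point samples. For the algorithmic details, see my earlier notes:

陀飛輪: Object Detection: The Anchor-Free Era

陀飛輪: Soft Sampling: Exploring More Effective Sampling Strategies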

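RetinaNet and FCOS differ in three main ways:

1. The number of priors per location. RetinaNet places several anchor priors at each location, whereas FCOS has a single point prior per location.

2. How positive and negative samples are selected. RetinaNet selects them by IoU, whereas FCOS selects them with spatial and scale constraints.

3. The starting point of regression. RetinaNet regresses from the selected anchor boxes, whereas FCOS regresses from the selected points.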
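The second difference, the sampling rule, is the one the rest of this post keeps returning to. Below is a minimal sketch of the two rules for a single ground-truth box. The function names, the IoU thresholds (0.5 / 0.4), and the per-level scale range are illustrative assumptions for this sketch, not values taken from this post.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_by_iou(anchor: Box, gt: Box,
                  pos_thr: float = 0.5, neg_thr: float = 0.4) -> int:
    """RetinaNet-style rule: 1 = positive, 0 = negative, -1 = ignored.

    The thresholds are assumed defaults for this sketch.
    """
    v = iou(anchor, gt)
    if v >= pos_thr:
        return 1
    if v < neg_thr:
        return 0
    return -1


def assign_by_point(point: Tuple[float, float], gt: Box,
                    scale_range: Tuple[float, float]) -> int:
    """FCOS-style rule: 1 = positive, 0 = negative.

    Spatial constraint: the point must lie strictly inside the gt box.
    Scale constraint: the largest of the four regression distances must
    fall inside the range handled by this feature level (assumed range).
    """
    x, y = point
    left, top, right, bottom = x - gt[0], y - gt[1], gt[2] - x, gt[3] - y
    if min(left, top, right, bottom) <= 0:
        return 0
    lo, hi = scale_range
    return 1 if lo <= max(left, top, right, bottom) <= hi else 0
```

The IoU rule compares whole boxes and needs one decision per anchor, while the point rule only asks where the location sits and which feature level it belongs to.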

Inconsistency Removal

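Compared with a RetinaNet that uses only one anchor prior per location, FCOS scores almost 5 points higher mAP on COCO. To compare the two fairly and find the cause of the accuracy gap, the ATSS paper runs RetinaNet and FCOS with the same set of tricks. In that experiment, under identical tricks, RetinaNet is 0.8 points below FCOS, and at that point the only remaining differences between them are the sampling strategy and the regression starting point.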

Essential Difference

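To dig further into how the sampling strategy and the regression starting point affect accuracy, ATSS crosses the two factors on RetinaNet and FCOS, giving four experiments. The results show that the regression starting point has a negligible effect on accuracy; the sampling strategy is the essential difference between RetinaNet and FCOS.

These observations lead to a central difficulty in object detection: how to define positive and negative training samples? This is the label assignment problem this post is about. A good label assignment scheme can raise the upper bound of a detector.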

ATSS

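The ATSS paper then proposes a more suitable label assignment scheme, called ATSS (Adaptive Training Sample Selection).

ATSS combines the strengths of the RetinaNet and FCOS sampling strategies. It works as follows:

1. Select candidate positives by the distance between the anchor box center and the ground-truth center.

2. Use the sum of the mean and the standard deviation of the candidates' IoUs as an adaptive IoU threshold to filter the candidates.

3. Keep as final positives only those candidates whose center falls inside the ground-truth box.

The figure below shows that ATSS adapts the threshold to the sample distribution and picks positives at the appropriate scale.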

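The adaptive threshold is the following, where $m_g$ and $v_g$ are the mean and the standard deviation of the candidates' IoUs with ground truth $g$:
$t_g = m_g + v_g$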
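The three steps and the threshold above can be sketched in plain Python as follows. This is a minimal illustration for a single ground-truth box, not the authors' implementation: the function names, the default `k = 9`, and the use of the population standard deviation are assumptions made for this example.

```python
import math
import statistics
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def atss_assign(anchors_per_level: List[List[Box]], gt: Box,
                k: int = 9) -> Tuple[List[Box], float]:
    """Return (positive anchors, adaptive threshold t_g) for one gt box."""
    gx, gy = center(gt)

    # Step 1: on every pyramid level keep the k anchors whose centers
    # are closest to the gt center.
    candidates: List[Box] = []
    for level in anchors_per_level:
        by_distance = sorted(
            level,
            key=lambda a: math.hypot(center(a)[0] - gx, center(a)[1] - gy),
        )
        candidates.extend(by_distance[:k])

    # Step 2: t_g = m_g + v_g over the candidates' IoUs with the gt.
    # Population std is an assumption of this sketch.
    ious = [iou(a, gt) for a in candidates]
    t_g = statistics.mean(ious) + statistics.pstdev(ious)

    # Step 3: keep candidates above the threshold whose center lies
    # inside the gt box.
    positives = []
    for anchor, value in zip(candidates, ious):
        cx, cy = center(anchor)
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if value >= t_g and inside:
            positives.append(anchor)
    return positives, t_g
```

When the candidates' IoUs are spread out, the standard deviation is large and the threshold rises, so only the best-matching level contributes positives; when they are uniformly mediocre, the threshold stays near the mean.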

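The ATSS sampling strategy can be applied to both RetinaNet and FCOS. The experiments show that with ATSS sampling the two reach roughly the same mAP, which narrows the accuracy gap between anchor-based and anchor-free detectors and confirms that a good sampling strategy raises the upper bound of detection accuracy.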

FreeAnchor

Treat detector training as a maximum likelihood estimation problem:
$\mathcal{L}(\theta) = \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}_{ij}^{cls}(\theta) + \beta \sum_{a_j \in A_+} \sum_{b_i \in B} C_{ij} \mathcal{L}_{ij}^{loc}(\theta) + \sum_{a_j \in A_-} \mathcal{L}_j^{bg}(\theta)$

$\begin{aligned} \mathcal{P}(\theta) & = e^{-\mathcal{L}(\theta)} \\ & = \prod_{a_j \in A_+} \Big( \sum_{b_i \in B} C_{ij} e^{-\mathcal{L}_{ij}^{cls}(\theta)} \Big) \prod_{a_j \in A_+} \Big( \sum_{b_i \in B} C_{ij} e^{-\beta \mathcal{L}_{ij}^{loc}(\theta)} \Big) \prod_{a_j \in A_-} e^{-\mathcal{L}_j^{bg}(\theta)} \\ & = \prod_{a_j \in A_+} \Big( \sum_{b_i \in B} C_{ij} \mathcal{P}_{ij}^{cls}(\theta) \Big) \prod_{a_j \in A_+} \Big( \sum_{b_i \in B} C_{ij} \mathcal{P}_{ij}^{loc}(\theta) \Big) \prod_{a_j \in A_-} \mathcal{P}_j^{bg}(\theta) \end{aligned}$
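The step from the loss to the product form relies on each positive anchor being matched to exactly one object, so that every column of $C$ contains a single 1 and the sum over objects picks out one term. A small numeric check of that identity is sketched below; the function names and the toy numbers in the usage note are made up for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # indexed as [object i][positive anchor j]


def detection_loss(C: Matrix, L_cls: Matrix, L_loc: Matrix,
                   L_bg: List[float], beta: float) -> float:
    """L(theta): matched cls and loc losses plus background losses."""
    n_obj, n_pos = len(C), len(C[0])
    total = 0.0
    for j in range(n_pos):
        for i in range(n_obj):
            total += C[i][j] * (L_cls[i][j] + beta * L_loc[i][j])
    return total + sum(L_bg)


def detection_likelihood(C: Matrix, L_cls: Matrix, L_loc: Matrix,
                         L_bg: List[float], beta: float) -> float:
    """P(theta) written in the product form of the second equation."""
    n_obj, n_pos = len(C), len(C[0])
    p = 1.0
    for j in range(n_pos):
        p *= sum(C[i][j] * math.exp(-L_cls[i][j]) for i in range(n_obj))
        p *= sum(C[i][j] * math.exp(-beta * L_loc[i][j]) for i in range(n_obj))
    for loss_bg in L_bg:
        p *= math.exp(-loss_bg)
    return p
```

For example, with two objects, three positive anchors, and `C = [[1, 0, 1], [0, 1, 0]]`, `detection_likelihood(...)` agrees with `math.exp(-detection_loss(...))` up to floating-point error. If a column of `C` held more than one 1, the two would no longer match, which is why the hand-crafted one-to-one matching is built into this formulation.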

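Construct the recall and precision likelihood functions of the maximum likelihood estimation problem.

Convert them into a loss function.

Building on RetinaNet, FreeAnchor formulates detector training as a maximum likelihood estimation problem. By optimizing a loss built from the recall and precision likelihoods, it adaptively groups the matched anchors into a bag of anchors.

The benefit of viewing detection as maximum likelihood estimation is that the classification and localization branches no longer need to be balanced against each other: a single loss supervises training, and the matched anchors are adjusted adaptively.

As the figure below shows, the detector picks out the matching anchors as training proceeds.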

AutoAssign

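However, FreeAnchor and ATSS still fundamentally rely on center priors, IoU, and spatial and scale constraints for label assignment. They cannot avoid tuning many hyperparameters, so they are not fully adaptive label assignment.

The recent AutoAssign builds on FCOS and introduces three branches: ImpObj, Center Weighting, and Confidence Weighting. It removes both FCOS's spatial- and scale-based definition of positives and negatives and its center-ness branch, taking label assignment further so that the assignment is learned adaptively by the CNN.

This section draws on the author's own explanation: https://zhuanlan.zhihu.com/p/158907507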

From VanillaDet to AutoAssign

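VanillaDet means: for a gt box, every location inside it (across all FPN levels) is a positive sample for that gt; conversely, every location that does not fall inside a gt box is a negative. It can be seen as the lower bound of label assignment.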
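A minimal sketch of the VanillaDet rule for one gt box follows. The helper names, the image size, and the FPN strides are hypothetical; locations are taken to be the centers of feature-map cells, which is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def fpn_points(img_w: int, img_h: int, strides: Sequence[int]) -> List[List[Point]]:
    """Centers of every feature-map cell, one list per FPN level."""
    levels = []
    for s in strides:
        pts = [(x + s / 2, y + s / 2)
               for y in range(0, img_h, s)
               for x in range(0, img_w, s)]
        levels.append(pts)
    return levels


def vanilla_assign(levels: List[List[Point]], gt: Box) -> List[List[int]]:
    """Label 1 for every location inside the gt box on every level, else 0."""
    x1, y1, x2, y2 = gt
    return [[1 if x1 <= px <= x2 and y1 <= py <= y2 else 0 for px, py in pts]
            for pts in levels]
```

For a 64x64 image with strides 8, 16, and 32 and a gt box covering the top-left 32x32 region, this marks 16, 4, and 1 locations as positive on the three levels. No level is preferred and no location inside the box is down-weighted, which is what makes this the baseline that later weighting schemes improve on.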

The experimental results show that a better label assignment scheme can substantially improve detector accuracy.

Center Weighting

Introduce a Gaussian center prior: from the distance to the gt center, learn a center prior that adapts to each category.
$G(\vec{d} \mid \vec{\mu}, \vec{\sigma}) = e^{\frac{-(\vec{d} - \vec{\mu})^2}{2\vec{\sigma}^2}}$
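The center prior can be evaluated as in the sketch below. In AutoAssign $\vec{\mu}$ and $\vec{\sigma}$ are learned per category; here they are plain numbers passed in by the caller. The function name and the choice to combine the x and y terms by adding them in the exponent are assumptions of this sketch.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def center_prior(d: Vec2, mu: Vec2, sigma: Vec2) -> float:
    """G(d | mu, sigma) for a 2-D offset d from the gt center.

    The squared, scaled offsets along x and y are summed inside the
    exponent, i.e. the per-axis Gaussians are multiplied (assumption).
    """
    ex = (d[0] - mu[0]) ** 2 / (2 * sigma[0] ** 2)
    ey = (d[1] - mu[1]) ** 2 / (2 * sigma[1] ** 2)
    return math.exp(-(ex + ey))
```

With `mu = (0, 0)` the prior peaks at 1 on the box center and decays with distance; a non-zero `mu` shifts the peak, which lets a category whose visible mass is off-center prefer locations away from the geometric center.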
Confidence Weighting

The ImpObj branch keeps a large number of background locations from being brought in:

$\mathcal{P}_i(cls \mid \theta) = \mathcal{P}_i(cls \mid obj, \theta)\, \mathcal{P}_i(obj \mid \theta)$
As in FreeAnchor, classification and localization are treated jointly as a maximum likelihood estimation problem, from which the confidence of each sample is learned:
$\begin{aligned} \mathcal{L}_i(\theta) & = \mathcal{L}_i^{cls}(\theta) + \lambda \mathcal{L}_i^{loc}(\theta) \\ & = -\log\left( \mathcal{P}_i(cls \mid \theta) \right) + \lambda \mathcal{L}_i^{loc}(\theta) \\ & = -\log\left( \mathcal{P}_i(cls \mid \theta)\, e^{-\lambda \mathcal{L}_i^{loc}(\theta)} \right) \\ & = -\log\left( \mathcal{P}_i(cls \mid \theta)\, \mathcal{P}_i(loc \mid \theta) \right) \\ & = -\log\left( \mathcal{P}_i(\theta) \right) \end{aligned}$

$C(\mathcal{P}_i) = e^{\frac{\mathcal{P}_i(\theta)}{\tau}}$

positive weights

Center Weighting and Confidence Weighting together give the positive weights, normalized over the locations $S_n$ inside gt box $n$:
$w_i^+ = \frac{C(\mathcal{P}_i)\, G(\vec{d}_i)}{\sum_{j \in S_n} C(\mathcal{P}_j)\, G(\vec{d}_j)}$
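The positive weights can be computed as sketched below for the locations inside one gt box. The function names are hypothetical, the center prior values are assumed to be precomputed, and the values of `lam` and `tau` in the usage note are illustrative only.

```python
import math
from typing import List


def joint_confidence(p_cls: float, loss_loc: float, lam: float) -> float:
    """P_i(theta) = P_i(cls | theta) * exp(-lam * L_i^loc(theta))."""
    return p_cls * math.exp(-lam * loss_loc)


def positive_weights(conf: List[float], prior: List[float], tau: float) -> List[float]:
    """w_i^+ for every location i in S_n.

    conf[i]  : joint confidence P_i(theta)
    prior[i] : center prior G(d_i), precomputed by the caller
    tau      : temperature of the confidence weighting C(P) = exp(P / tau)
    """
    scores = [math.exp(p / tau) * g for p, g in zip(conf, prior)]
    total = sum(scores)
    return [s / total for s in scores]
```

For instance, `positive_weights([0.4, 0.4], [1.0, 0.5], tau=0.5)` gives weights in the ratio 2:1, because equal confidences cancel and only the center prior differs. A smaller `tau` sharpens the weights toward the most confident location.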
negative weights

The negative weights are obtained from the maximum IoU:
$w_i^- = 1 - f\left(\frac{1}{1 - \mathrm{iou}_i}\right)$
The foreground and background weighting functions share one property: both are monotonically increasing. That is, the more confidently a location is predicted as pos / neg, the larger its weight as foreground / background.
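The negative weight formula can be sketched as below. The post does not define $f$; this sketch assumes it is a min-max normalization to $[0, 1]$ over the locations being weighted. The function name, the clamping with `eps`, and the handling of the case where all values are equal are choices of this sketch.

```python
from typing import List


def negative_weights(max_ious: List[float], eps: float = 1e-12) -> List[float]:
    """w_i^- = 1 - f(1 / (1 - iou_i)), with f assumed to be min-max scaling.

    max_ious[i] is the largest IoU between the prediction at location i
    and any gt box. The denominator is clamped so that iou = 1 does not
    divide by zero.
    """
    raw = [1.0 / max(1.0 - v, eps) for v in max_ious]
    lo, hi = min(raw), max(raw)
    if hi - lo < eps:
        # All locations look the same: f is undefined, so treat every
        # location as full background (a choice made for this sketch).
        return [1.0] * len(raw)
    return [1.0 - (r - lo) / (hi - lo) for r in raw]
```

For IoUs of 0.0, 0.5, and 0.9 the raw values are roughly 1, 2, and 10, so the weights come out close to 1, 8/9, and 0: the location whose prediction overlaps a gt box best gets the smallest background weight.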

loss function

During training the positive and negative weights adjust dynamically until they reach a balance, as if the model were learning a decision boundary between positives and negatives, whereas a boundary defined by an IoU threshold is set by hand.
$\mathcal{L}(\theta) = -\sum_{n=1}^{N} \log\Big( \sum_{i \in S_n} w_i^+ \mathcal{P}_i^+ \Big) - \sum_{j \in S} \log\left( w_j^- \mathcal{P}_j^- \right)$
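The loss above can be evaluated from precomputed weights and probabilities as in this sketch. The function name, the input layout, and the `eps` clamp that guards against `log(0)` are assumptions of this example, not part of the formula.

```python
import math
from typing import List, Tuple

Term = Tuple[float, float]  # (weight, probability)


def autoassign_loss(pos_groups: List[List[Term]], neg_terms: List[Term],
                    eps: float = 1e-12) -> float:
    """Evaluate the weighted positive and negative log-likelihood loss.

    pos_groups[n] : (w_i^+, P_i^+) for every location i in S_n of gt n
    neg_terms     : (w_j^-, P_j^-) for every location j in S
    """
    pos = 0.0
    for group in pos_groups:
        # One log per gt box, taken over the weighted sum of its locations.
        pos -= math.log(max(sum(w * p for w, p in group), eps))
    neg = 0.0
    for w, p in neg_terms:
        # One log per location for the background term.
        neg -= math.log(max(w * p, eps))
    return pos + neg
```

The positive term takes the log of a sum, so a gt box is satisfied as soon as some well-weighted location inside it predicts it confidently; the negative term takes a log per location, so every location contributes its own background penalty.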

Ablation Studies

The ablation studies show that each of the three introduced branches improves the detector.

Visualization

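AutoAssign does not explicitly distinguish locations on different FPN levels; it treats them all equally. A single weighting strategy is therefore enough to solve the spatial and the scale assignment problems at once.

Another way to read AutoAssign: for keypoints, classifying on a heatmap is already localizing. The positive/negative decision boundary that AutoAssign learns serves to pick out the points that are easier to localize from, so that localization after classification is easy. AutoAssign is dynamically learning which foreground is truly effective.

Some thoughts:

Earlier work also explored label assignment, for example RPN, FPN, Cascade R-CNN, and IoU-Net, but these papers mostly stay within the anchor-prior framework and set spatial and scale rules with hand-crafted sampling. More recently, many methods let the CNN learn a sampling strategy suited to the samples, for example GuidedAnchor, MetaAnchor, FSAF, and PISA, but none of them solves label assignment well, and they still have sensitive parameters to set. AutoAssign's overall design still feels somewhat complex, but it avoids setting many hyperparameters, which makes the detector more robust.

Object detection sits between classification and segmentation: compared with classification it also localizes the object, and compared with segmentation its annotation is simpler, but it unavoidably brings in irrelevant background. This makes detectors especially sensitive to how positives and negatives are sampled. Anchors were introduced to localize objects better, but they unavoidably brought the label assignment problem with them, so how positives and negatives are defined becomes especially important. Separating them purely by an IoU threshold is too hard a rule. **Intuitively, label assignment in object detection should be a continuous problem with no true split between positives and negatives. Defining them by a simple IoU threshold turns a continuous label assignment problem into a discrete one, which cannot fundamentally solve it.** The latest label assignment methods in essence design label assignment to be continuous and adaptive. **How to better learn the decision boundary between positives and negatives is the key.** I look forward to a label assignment method simpler than AutoAssign!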

References:

poodar.chu：From VanillaDet to AutoAssign

RetinaNet

FCOS

FreeAnchor

ATSS

AutoAssign
