

Arcface v1 Paper Translation and Commentary

Published: 2025/3/21, posted by 豆豆


Original post by 神羅Noctis, 2019-10-13 16:14:39
Paper: http://arxiv.org/pdf/1801.07698v1.pdf
Translation of the latest version (v3): Arcface v3 Paper Translation and Commentary

The Arcface v1 paper is fairly long; translating and annotating it took me three days. I hope it helps readers understand the paper better.

ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Contents
Abstract

1. Introduction

2. From Softmax to ArcFace

2.1. Softmax

2.2. Weights Normalisation

2.3. Multiplicative Angular Margin

2.4. Feature Normalisation

2.5. Additive Cosine Margin

2.6. Additive Angular Margin

2.7. Comparison under Binary Case

2.8. Target Logit Analysis

3. Experiments

3.1. Data

3.1.1 Training data

3.1.2 Validation data

3.1.3 Test data

3.2. Network Settings

3.2.1 Input setting

3.2.2 Output setting

3.2.3 Block Setting

3.2.4 Backbones

3.2.5 Network Setting Conclusions

3.3. Loss Setting

3.4. MegaFace Challenge1 on FaceScrub

3.5. Further Improvement by Triplet Loss

4. Conclusions


Abstract
Convolutional neural networks have significantly boosted the performance of face recognition in recent years due to their high capacity for learning discriminative features. To enhance the discriminative power of the Softmax loss, multiplicative angular margin [23] and additive cosine margin [44, 43] incorporate an angular margin and a cosine margin into the loss function, respectively. In this paper, we propose a novel supervision signal, additive angular margin (ArcFace), which has a better geometrical interpretation than the supervision signals proposed so far. Specifically, the proposed ArcFace, cos(θ + m), directly maximises the decision boundary in angular (arc) space based on L2-normalised weights and features. Compared to multiplicative angular margin cos(mθ) and additive cosine margin cos θ - m, ArcFace obtains more discriminative deep features. We also emphasise the importance of network settings and data refinement in deep face recognition. Extensive experiments on several relevant face recognition benchmarks, LFW, CFP and AgeDB, prove the effectiveness of the proposed ArcFace. Most importantly, we obtain state-of-the-art performance in the MegaFace Challenge in a totally reproducible way. We make the data, models and training/test code public.

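Abstract
In recent years, convolutional neural networks have significantly improved face recognition performance thanks to their strong ability to learn discriminative features. To improve the discriminative power of the Softmax loss, multiplicative angular margin [23] and additive cosine margin [44, 43] add an angular margin and a cosine margin, respectively, to the loss function. In this paper, we propose a novel supervision signal, additive angular margin (ArcFace), which has a better geometric interpretation than the supervision signals proposed so far. Specifically, the proposed ArcFace, cos(θ + m), directly maximises the decision boundary in angular space based on L2-normalised weights and features. Compared with multiplicative angular margin cos(mθ) and additive cosine margin cos θ - m, ArcFace obtains more discriminative deep features. We also stress the importance of network settings and data refinement in deep face recognition. Extensive experiments on relevant face recognition benchmarks (LFW, CFP and AgeDB) demonstrate the effectiveness of ArcFace. Most importantly, we obtain state-of-the-art performance in the MegaFace Challenge in a fully reproducible way. We release the data, models and training/test code.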

Figure 1. Geometrical interpretation of ArcFace. (a) Blue and green points represent embedding features from two different classes. ArcFace directly imposes an angular (arc) margin between classes. (b) We show an intuitive correspondence between angle and arc margin: the angular margin of ArcFace corresponds to the arc margin (geodesic distance) on the hypersphere surface.

1. Introduction
Face representation through the deep convolutional network embedding is considered the state-of-the-art method for face verification, face clustering, and face recognition [42, 35, 31]. The deep convolutional network is responsible for mapping the face image, typically after a pose normalisation step, into an embedding feature vector such that features of the same person have a small distance while features of different individuals have a considerable distance. The various face recognition approaches by deep convolutional network embedding differ along three primary attributes.

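1. Introduction
Face representation via deep convolutional network embedding is regarded as the state-of-the-art method for face verification, face clustering and face recognition [42, 35, 31]. The deep convolutional network maps a face image, typically after a pose normalisation step, to an embedding feature vector such that features of the same person are close and features of different people are far apart. Face recognition methods based on deep convolutional network embedding differ in three main attributes.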

The first attribute is the training data used to train the model. The number of identities in publicly available training data, such as VGG-Face [31], VGG2-Face [7], CASIA-WebFace [48], UMDFaces [6], MS-Celeb-1M [11] and MegaFace [21], ranges from several thousand to half a million. Although MS-Celeb-1M and MegaFace have a significant number of identities, they suffer from annotation noise [47] and long-tail distributions [50]. By comparison, the private training data of Google [35] has several million identities. As the latest performance report of the Face Recognition Vendor Test (FRVT) [4] shows, Yitu, a start-up company from China, ranks first based on its private set of 1.8 billion face images [5]. Because of the orders-of-magnitude difference in training data scale, face recognition models from industry perform much better than models from academia. The difference in training data also makes some deep face recognition results [2] not fully reproducible.

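The first attribute is the data used to train the model. Public face training datasets such as VGG-Face [31], VGG2-Face [7], CASIA-WebFace [48], UMDFaces [6], MS-Celeb-1M [11] and MegaFace [21] contain from several thousand to five hundred thousand identities. Although MS-Celeb-1M and MegaFace have a considerable number of identities, they are affected by annotation noise [47] and long-tail distributions [50]. By contrast, Google's private training set [35] has several million identities. The latest performance report of the Face Recognition Vendor Test (FRVT) [4] shows that Yitu, a Chinese start-up, ranks first with 1.8 billion private face images [5]. Because of the orders-of-magnitude gap in training data scale, face recognition models from industry perform much better than those from academia. The difference in training data also makes some deep face recognition results [2] not fully reproducible.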

The second attribute is the network architecture and its settings. High-capacity deep convolutional networks, such as ResNet [14, 15, 46, 50, 23] and Inception-ResNet [40, 3], obtain better performance than the VGG network [37, 31] and the Google Inception V1 network [41, 35]. Different applications of deep face recognition prefer different trade-offs between speed and accuracy [16, 51]. For face verification on mobile devices, real-time speed and a compact model size are essential for a smooth user experience. For billion-level security systems, high accuracy is as important as efficiency.

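The second attribute is the network architecture and its settings. High-capacity deep convolutional networks such as ResNet [14, 15, 46, 50, 23] and Inception-ResNet [40, 3] achieve better performance than the VGG network [37, 31] and the Google Inception V1 network [41, 35]. Different deep face recognition applications make different trade-offs between speed and accuracy [16, 51]. For face verification on mobile devices, real-time speed and a compact model size are essential for a smooth user experience. For billion-level security systems, accuracy and efficiency are equally important.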

The third attribute is the design of the loss functions.

The third attribute is the design of the loss function.

(1) Euclidean margin based loss.

In [42] and [31], a Softmax classification layer is trained over a set of known identities. The feature vector is then taken from an intermediate layer of the network and used to generalise recognition beyond the set of identities used in training. Centre loss [46], Range loss [50] and Marginal loss [10] add an extra penalty to compress intra-class variance or enlarge inter-class distance and thus improve the recognition rate, but all of them still combine Softmax to train recognition models. However, the classification-based methods [42, 31] suffer from massive GPU memory consumption in the classification layer when the number of identities grows to the million level, and they prefer balanced and sufficient training data for each identity.

The contrastive loss [39] and the Triplet loss [35] utilise a pair training strategy. The contrastive loss function consists of positive pairs and negative pairs. The gradients of the loss function pull together positive pairs and push apart negative pairs. Triplet loss minimises the distance between an anchor and a positive sample and maximises the distance between the anchor and a negative sample from a different identity. However, the training procedure of the contrastive loss [39] and the Triplet loss [35] is tricky due to the selection of effective training samples.

(1) Euclidean margin based loss.

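In [42] and [31], a Softmax classification layer is trained on a set of known identities. A feature vector is then extracted from an intermediate layer of the network and used for recognition that generalises beyond the set of identities seen in training. Centre loss [46], Range loss [50] and Marginal loss [10] add extra penalties to reduce intra-class variance or enlarge inter-class distance, which improves the recognition rate, but they still rely on Softmax to train the recognition model. However, when the number of identities grows to the million level, classification-based methods [42, 31] consume a large amount of GPU memory in the classification layer, and they need balanced and sufficient training data for every identity.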

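Contrastive loss [39] and triplet loss [35] adopt a pair-based training strategy. The contrastive loss is built from positive pairs and negative pairs: its gradients pull positive pairs together and push negative pairs apart. Triplet loss minimises the distance between an anchor and a positive sample and maximises the distance between the anchor and a negative sample of a different identity. However, effective training samples are hard to select, which makes training with contrastive loss [39] and triplet loss [35] complicated.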
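To make the pair-based losses above concrete, here is a minimal pure-Python sketch. It assumes the common textbook forms, a squared-distance contrastive loss and a FaceNet-style triplet hinge on squared distances; the margins 1.0 and 0.2 and all function names are illustrative choices, not values taken from [39] or [35]. It deliberately leaves out the part the paragraph identifies as the real difficulty: selecting effective pairs and triplets.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def contrastive_loss(a, b, same_identity, margin=1.0):
    """Positive pair: squared distance (pulls the pair together).
    Negative pair: squared hinge on margin - distance (pushes apart until `margin`)."""
    d = euclidean(a, b)
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(a, p)^2 - d(a, n)^2 + margin: zero once the negative is
    at least `margin` farther (in squared distance) than the positive."""
    d_ap = euclidean(anchor, positive) ** 2
    d_an = euclidean(anchor, negative) ** 2
    return max(0.0, d_ap - d_an + margin)

# A far negative gives zero triplet loss; a close ("hard") negative does not.
a, p = [1.0, 0.0], [0.9, 0.1]
print(triplet_loss(a, p, [0.0, 1.0]), triplet_loss(a, p, [0.8, 0.2]))
```

The zero-loss case shows why sample selection matters: easy triplets contribute no gradient, so training must find the few triplets that still violate the margin.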

(2) Angular and cosine margin based loss.

Liu et al. [24] proposed a large margin Softmax (L-Softmax) by adding multiplicative angular constraints to each identity to improve feature discrimination. SphereFace cos(mθ) [23] applies L-Softmax to deep face recognition with weights normalisation. Due to the non-monotonicity of the cosine function, a piece-wise function is applied in SphereFace to guarantee the monotonicity. During training of SphereFace, Softmax loss is combined to facilitate and ensure the convergence. To overcome the optimisation difficulty of SphereFace, additive cosine margin [44, 43] cos(θ) - m moves the angular margin into cosine space. The implementation and optimisation of additive cosine margin are much easier than SphereFace. Additive cosine margin is easily reproducible and achieves state-of-the-art performance on MegaFace (TencentAILab_FaceCNN_v1) [2].
Compared to Euclidean margin based loss, angular and cosine margin based loss explicitly adds discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that human faces lie on a manifold.

It is well known that the three attributes above, data, network and loss, have a high-to-low influence on the performance of face recognition models. In this paper, we contribute to improving deep face recognition from all three attributes.

(2) Angular and cosine margin based loss

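Liu et al. [24] proposed large margin Softmax (L-Softmax), which adds multiplicative angular constraints to each identity to improve the discriminative power of features. SphereFace cos(mθ) applies L-Softmax to deep face recognition with weight normalisation. During SphereFace training, a Softmax loss is combined to facilitate and ensure convergence. To overcome the optimisation difficulty of SphereFace, additive cosine margin [44, 43] cos(θ) - m moves the angular margin into cosine space. Additive cosine margin is much easier to implement and optimise than SphereFace; it is easy to reproduce and achieves state-of-the-art performance on MegaFace (TencentAILab_FaceCNN_v1) [2]. Compared with Euclidean margin based losses, angular and cosine margin based losses explicitly add discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces lie on a manifold.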

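It is well known that the three attributes above, data, network and loss, influence the performance of face recognition models from high to low. In this paper, we improve deep face recognition from all three attributes.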

Data. We refined the largest publicly available training dataset, MS-Celeb-1M [11], both automatically and manually. We checked the quality of the refined MS1M dataset with the Resnet-27 [14, 50, 10] network and the marginal loss [10] on the NIST Face Recognition Prize Challenge. We also found hundreds of overlapping face images between the MegaFace one million distractors and the FaceScrub dataset, which significantly affect the evaluation results. We manually found these overlapping face images in the MegaFace distractors. Both the refined training data and the refined test data will be made publicly available.

Data. We cleaned MS-Celeb-1M [11], the largest publicly available face training dataset in the world, both automatically and manually. We checked the quality of the refined MS1M dataset using the Resnet-27 [14, 50, 10] network and the marginal loss [10] on the NIST Face Recognition Prize Challenge. We also found hundreds of duplicate face images between the one million distractors in MegaFace and the FaceScrub dataset, which significantly affect the evaluation results. (The MegaFace challenge uses one million face images selected from the Flickr dataset as distractors at test time, while the probe set comes from the FaceScrub dataset.) We manually located these duplicate face images in the MegaFace distractor set. Both the refined training data and the refined test data will be made public.

Network. Taking VGG2 [7] as the training data, we conduct extensive contrast experiments regarding the convolutional network settings and report the verification accuracy on LFW, CFP and AgeDB. The proposed network settings have been confirmed robust under large pose and age variations. We also explore the trade-off between the speed and accuracy based on the most recent network structures.

Network. Using VGG2 [7] as the training set, we ran extensive comparative experiments on convolutional network settings and report the verification accuracy on LFW, CFP and AgeDB. The proposed network settings have been confirmed to be robust under large pose and age variations. Based on the latest network structures, we also explore the trade-off between speed and accuracy.

Loss. We propose a new loss function, additive angular margin (ArcFace), to learn highly discriminative features for robust face recognition. As shown in Figure 1, the proposed loss function cos(θ + m) directly maximises the decision boundary in angular (arc) space based on the L2-normalised weights and features. We show that ArcFace not only has a clearer geometrical interpretation but also outperforms the baseline methods, e.g. multiplicative angular margin [23] and additive cosine margin [44, 43]. We also give a new explanation of why ArcFace is better than Softmax, SphereFace [23] and CosineFace [44, 43] from the view of semi-hard sample distributions.

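Loss. We propose a new loss function, additive angular margin (ArcFace), which learns highly discriminative features for robust face recognition. As shown in Figure 1, the proposed loss function cos(θ + m) directly maximises the decision boundary in angular space based on L2-normalised weights and features. Our analysis shows that ArcFace not only has a clearer geometric interpretation but also outperforms baseline methods such as multiplicative angular margin [23] and additive cosine margin [44, 43].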

Performance. The proposed ArcFace achieves state-of-the-art results on the MegaFace Challenge [21], which is the largest public face benchmark with one million faces for recognition. We make these results totally reproducible by making the data, trained models and training/test code publicly available.

Performance. The proposed ArcFace achieves excellent results on the MegaFace Challenge [21], currently the world's largest public million-scale benchmark for face recognition algorithms. We release the data, trained models and training/test code together with these results.

2. From Softmax to ArcFace
2.1. Softmax
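The most widely used classification loss function, the Softmax loss, is presented as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}}$$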

where $x_i \in \mathbb{R}^d$ denotes the deep feature of the $i$-th sample, belonging to the $y_i$-th class. Following [46, 50, 23, 43], the feature dimension $d$ is set to 512 in this paper. $W_j \in \mathbb{R}^d$ denotes the $j$-th column of the weights $W \in \mathbb{R}^{d \times n}$ in the last fully connected layer, and $b \in \mathbb{R}^n$ is the bias term. The batch size and the number of classes are $N$ and $n$, respectively. The traditional Softmax loss is widely used in deep face recognition [31, 7]. However, it does not explicitly optimise the features to have higher similarity scores for positive pairs and lower similarity scores for negative pairs, which leads to a performance gap.

2. From Softmax to ArcFace
2.1. Softmax
The most widely used classification loss function, the Softmax loss, is as follows:

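where $x_i$ denotes the deep feature of the $i$-th sample, which belongs to class $y_i$. Following [46, 50, 23, 43], this paper sets the feature dimension to 512. $W_j$ denotes the $j$-th column of the weights $W$ in the last fully connected layer, and $b$ is the bias term. The batch size and the number of classes are $N$ and $n$, respectively. The traditional Softmax loss is widely used in deep face recognition [31, 7]. However, the Softmax loss does not explicitly optimise the features so that positive pairs get higher similarity scores and negative pairs get lower similarity scores, which leads to a performance gap.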
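A minimal pure-Python sketch of the Softmax loss described in Section 2.1, assuming the standard form: each sample's logits are $W_j^T x_i + b_j$, the per-sample loss is the negative log-probability of the true class $y_i$, and the batch loss is the mean over samples. The function name and the list-of-lists layout (one inner list per weight column $W_j$) are illustrative; real training would use a deep learning framework.

```python
import math

def softmax_loss(features, labels, W, b):
    """Mean Softmax cross-entropy over a batch.

    features: list of N feature vectors x_i (each of length d)
    labels:   list of N class indices y_i
    W:        list of n weight columns W_j (each of length d)
    b:        list of n biases b_j
    """
    total = 0.0
    for x, y in zip(features, labels):
        # logits W_j^T x_i + b_j for every class j
        logits = [sum(wk * xk for wk, xk in zip(w, x)) + bj for w, bj in zip(W, b)]
        # log-sum-exp, shifted by the max logit for numerical stability
        mx = max(logits)
        log_norm = mx + math.log(sum(math.exp(z - mx) for z in logits))
        total += log_norm - logits[y]          # -log p(y_i | x_i)
    return total / len(features)

W = [[1.0, 0.0], [0.0, 1.0]]                   # two classes, d = 2
print(softmax_loss([[1.0, 0.0]], [0], W, [0.0, 0.0]))
```

Shifting by the maximum logit does not change the result; it only keeps `math.exp` from overflowing when logits are large, which matters once they are scaled by a large factor later in this section.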

2.2. Weights Normalisation
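For simplicity, we fix the bias $b_j = 0$ as in [23]. Then, we transform the target logit [32] as follows:

$$W_j^{T}x_i = \|W_j\|\,\|x_i\|\cos\theta_j$$

where $\theta_j$ is the angle between the weight $W_j$ and the feature $x_i$.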

Following [23, 43, 45], we fix $\|W_j\| = 1$ by L2 normalisation, which makes the predictions depend only on the angle between the feature vector and the weight.

In the SphereFace experiments, L2 weight normalisation improves performance only slightly.

2.2. Weights Normalisation
For simplicity, we fix the bias $b_j = 0$ as in [23]. Then, we transform the target logit [32] as follows:

Following [23, 43, 45], we fix $\|W_j\| = 1$ by L2 normalisation, so that the prediction depends only on the angle between the feature vector and the weight.

In the SphereFace experiments, L2 weight normalisation improved performance only marginally.
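The decomposition of the target logit in Section 2.2 can be checked numerically. This sketch (helper names are illustrative) verifies that $W_j^T x_i = \|W_j\|\,\|x_i\|\cos\theta_j$ and shows that, with $b_j = 0$ and $W_j$ L2-normalised, the logit reduces to $\|x_i\|\cos\theta_j$, so the weight norm no longer affects the prediction.

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def angle(w, x):
    """theta_j: angle between weight column W_j and feature x_i, in radians."""
    c = dot(w, x) / (norm(w) * norm(x))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error

def normalised_logit(w, x):
    """Target logit with b_j = 0 and W_j L2-normalised: ||x_i|| * cos(theta_j)."""
    n = norm(w)
    return dot([t / n for t in w], x)

w, x = [3.0, 4.0], [2.0, 0.0]                   # ||W_j|| = 5, ||x_i|| = 2
raw = dot(w, x)                                 # ||W_j|| * ||x_i|| * cos(theta_j)
assert math.isclose(raw, norm(w) * norm(x) * math.cos(angle(w, x)))
print(raw, normalised_logit(w, x))              # the weight norm drops out
```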

2.3. Multiplicative Angular Margin
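In SphereFace [23, 24], the angular margin $m$ is introduced by multiplication on the angle:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\|x_i\|\cos(m\theta_{y_i})}}{e^{\|x_i\|\cos(m\theta_{y_i})} + \sum_{j=1, j\neq y_i}^{n}e^{\|x_i\|\cos\theta_j}}$$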

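where $\theta_{y_i} \in [0, \pi/m]$. In order to remove this restriction, $\cos(m\theta_{y_i})$ is substituted by a piece-wise monotonic function $\psi(\theta_{y_i})$. SphereFace is formulated as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\|x_i\|\psi(\theta_{y_i})}}{e^{\|x_i\|\psi(\theta_{y_i})} + \sum_{j=1, j\neq y_i}^{n}e^{\|x_i\|\cos\theta_j}}$$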

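where $\psi(\theta_{y_i}) = (-1)^k\cos(m\theta_{y_i}) - 2k$, $\theta_{y_i} \in \left[\frac{k\pi}{m}, \frac{(k+1)\pi}{m}\right]$, $k \in [0, m-1]$, and $m \geq 1$ is the integer that controls the size of the angular margin. However, in the implementation of SphereFace, Softmax supervision is incorporated to guarantee the convergence of training, and its weight is controlled by a dynamic hyper-parameter λ. With the additional Softmax loss, $\psi(\theta_{y_i})$ in fact is:

$$\psi(\theta_{y_i}) = \frac{(-1)^k\cos(m\theta_{y_i}) - 2k + \lambda\cos\theta_{y_i}}{1 + \lambda}$$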

where λ is an additional hyper-parameter that facilitates the training of SphereFace. λ is set to 1,000 at the beginning and decreases to 5 to make the angular space of each class more compact [23]. This additional dynamic hyper-parameter λ makes the training of SphereFace relatively tricky.

2.3. Multiplicative Angular Margin
In SphereFace [23, 24], the angular margin $m$ is introduced into the angle by multiplication.

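where $\theta_{y_i} \in [0, \pi/m]$. To remove this restriction, $\cos(m\theta_{y_i})$ is replaced by a piece-wise monotonic function $\psi(\theta_{y_i})$. SphereFace is written as: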

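where $\psi(\theta_{y_i}) = (-1)^k\cos(m\theta_{y_i}) - 2k$, $\theta_{y_i} \in [k\pi/m, (k+1)\pi/m]$, $k \in [0, m-1]$, and $m \geq 1$ is an integer that controls the size of the angular margin. However, the SphereFace implementation adds a Softmax supervision term to ensure that training converges, and the weight of this term is controlled by a dynamic hyper-parameter λ. With the additional Softmax loss, $\psi(\theta_{y_i})$ actually becomes: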

where λ is an additional hyper-parameter that facilitates SphereFace training. λ is set to 1,000 at the start and gradually decreased to 5, making the angular space of each class more compact [23]. This additional dynamic hyper-parameter λ makes SphereFace harder to train.
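The piece-wise function ψ and its blending with Softmax through λ can be sketched as follows. This is a minimal illustration, not the SphereFace reference code: the function name is hypothetical, and clamping k to m - 1 (so that θ = π falls in the last piece) is an implementation choice of this sketch.

```python
import math

def sphereface_psi(theta, m=4, lam=0.0):
    """SphereFace target-logit factor for theta in [0, pi] (radians).

    psi(theta) = ((-1)^k * cos(m*theta) - 2k + lam*cos(theta)) / (1 + lam),
    where k is chosen so that theta lies in [k*pi/m, (k+1)*pi/m].
    lam = 0 gives the pure multiplicative margin; lam > 0 blends in plain Softmax.
    """
    k = min(int(theta * m / math.pi), m - 1)   # clamp so theta = pi uses the last piece
    psi = (-1) ** k * math.cos(m * theta) - 2 * k
    return (psi + lam * math.cos(theta)) / (1 + lam)

# Annealing lambda from 1,000 towards 5 moves the target logit from
# (almost) plain cos(theta) towards the full multiplicative margin.
theta = math.radians(45)
for lam in (1000.0, 100.0, 5.0, 0.0):
    print(lam, round(sphereface_psi(theta, m=4, lam=lam), 4))
```

Each piece alternates the sign of cos(mθ) and subtracts 2k, which stitches the pieces into one continuous, monotonically decreasing curve from 1 at θ = 0 down to -(2m - 1) at θ = π.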

2.4. Feature Normalisation
Feature normalisation is widely used for face verification, e.g. L2-normalised Euclidean distance and cosine distance [29]. Parde et al. [30] observe that the L2-norm of features learned using the Softmax loss is informative of the quality of the face. Features of good-quality frontal faces have a high L2-norm, while blurry faces with extreme poses have a low L2-norm. Ranjan et al. [33] add an L2 constraint to the feature descriptors and restrict features to lie on a hypersphere of a fixed radius. L2 normalisation of features can be easily implemented in existing deep learning frameworks and significantly boosts face verification performance. Wang et al. [44] point out that the gradient norm may be extremely large when the feature norm of a low-quality face image is very small, which potentially increases the risk of gradient explosion. The advantages of feature normalisation are also revealed in [25, 26, 43, 45], where feature normalisation is explained from analytic, geometric and experimental perspectives.

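2.4. Feature Normalisation
Feature normalisation is widely used for face verification, e.g. L2-normalised Euclidean distance and cosine distance [29]. Parde et al. [30] observe that the L2-norm of features learned with the Softmax loss carries information about face quality: features of high-quality frontal faces have a high L2-norm, while blurry faces with extreme poses have a low L2-norm. Ranjan et al. [33] add an L2 constraint to the feature descriptors and restrict features to lie on a hypersphere of fixed radius. Feature L2 normalisation is easy to implement in existing deep learning frameworks and significantly improves face verification performance. Wang et al. [44] point out that when the feature norm of a low-quality face image is very small, the gradient norm may become very large, which can increase the risk of gradient explosion. The advantages of feature normalisation are also shown in [25, 26, 43, 45], which explain feature normalisation from analytic, geometric and experimental perspectives.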

As we can see from the above works, L2 normalisation on features and weights is an important step for hypersphere metric learning. The intuitive insight behind feature and weight normalisation is to remove the radial variation and push every feature to distribute on a hypersphere manifold.

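The works above show that L2 normalisation of features and weights is an important step in hypersphere metric learning. The intuition behind feature and weight normalisation is to remove the radial variation and push every feature to lie on a hypersphere manifold.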

Following [33, 43, 45, 44], we fix $\|x_i\|$ by L2 normalisation and re-scale $\|x_i\|$ to $s$, which is the hypersphere radius; a lower bound for $s$ is given in [33]. In this paper, we use $s = 64$ for face recognition experiments [33, 43]. Based on feature and weight normalisation, we get $W_j^{T}x_i = \cos\theta_j$.

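Following [33, 43, 45, 44], we fix $\|x_i\|$ by L2 normalisation and re-scale $\|x_i\|$ to $s$, the radius of the hypersphere, whose lower bound is given in [33]. This paper uses $s = 64$ for face recognition experiments [33, 43]. With feature and weight normalisation, we obtain $W_j^{T}x_i = \cos\theta_j$.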
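With both the feature and every weight column L2-normalised and the feature re-scaled to radius s, each logit becomes $s\cos\theta_j$. A minimal sketch, assuming $b_j = 0$ as in Section 2.2 and s = 64 as stated above; the function names are illustrative.

```python
import math

def l2_normalise(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def scaled_cosine_logits(x, W, s=64.0):
    """Logits s * cos(theta_j) after L2-normalising the feature x_i and every
    weight column W_j (bias fixed to 0). The result no longer depends on ||x_i||."""
    x_hat = l2_normalise(x)
    return [s * sum(a * b for a, b in zip(l2_normalise(w), x_hat)) for w in W]

W = [[1.0, 0.0], [0.0, 10.0]]
print(scaled_cosine_logits([3.0, 4.0], W))      # cos(theta_j) = 0.6 and 0.8, times 64
print(scaled_cosine_logits([30.0, 40.0], W))    # same logits: the radial variation is gone
```

The second call illustrates the "remove the radial variation" point: scaling the feature by 10 leaves the logits unchanged, because only the angle to each class weight survives normalisation.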

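If feature normalisation is applied to SphereFace, we get the feature-normalised SphereFace, denoted SphereFace-FNorm:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\,\psi(\theta_{y_i})}}{e^{s\,\psi(\theta_{y_i})} + \sum_{j=1, j\neq y_i}^{n}e^{s\cos\theta_j}}$$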

If feature normalisation is applied to SphereFace, we obtain the feature-normalised SphereFace, denoted SphereFace-FNorm.

2.5. Additive Cosine Margin
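In [44, 43], the angular margin $m$ is moved to the outside of cos θ, and the resulting cosine margin loss function is:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos\theta_{y_i} - m)}}{e^{s(\cos\theta_{y_i} - m)} + \sum_{j=1, j\neq y_i}^{n}e^{s\cos\theta_j}}$$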

In this paper, we set the cosine margin $m$ to 0.35 [44, 43]. Compared to SphereFace, additive cosine margin (CosineFace) has three advantages: (1) it is extremely easy to implement without tricky hyper-parameters; (2) it is clearer and converges without Softmax supervision; (3) it gives an obvious performance improvement.

2.5. Additive Cosine Margin
In [44, 43], the angular margin $m$ is moved outside of cos θ, which yields the cosine margin loss function:

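In this paper, we set the cosine margin to 0.35 [44, 43]. Compared with SphereFace, additive cosine margin (CosineFace) has three advantages: (1) it is easy to implement without complicated hyper-parameters; (2) it is clearer and converges without Softmax supervision; (3) it clearly improves performance.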
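The additive cosine margin only touches the target logit: $s\cos\theta_{y_i}$ becomes $s(\cos\theta_{y_i} - m)$, while the other classes keep $s\cos\theta_j$. A per-sample sketch, assuming the cosines $\cos\theta_j$ have already been computed from normalised features and weights; s = 64 and m = 0.35 follow the text, the function name is illustrative.

```python
import math

def cosface_loss(cos_thetas, y, s=64.0, m=0.35):
    """CosineFace loss for one sample, given cos(theta_j) for every class j.

    Only the target logit changes: s*cos(theta_y) becomes s*(cos(theta_y) - m).
    """
    logits = [s * c for c in cos_thetas]
    logits[y] = s * (cos_thetas[y] - m)
    mx = max(logits)                                   # stable log-sum-exp
    log_norm = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_norm - logits[y]

# Equal cosines: without a margin the loss is log(2); with m = 0.35 the target
# logit is pushed down by s*m = 22.4, so the sample is heavily penalised.
print(cosface_loss([0.8, 0.8], 0, m=0.0), cosface_loss([0.8, 0.8], 0))
```

No annealed λ or piece-wise function is needed here, which is the "extremely easy to implement" advantage listed above.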

2.6. Additive Angular Margin
Although the cosine margin in [44, 43] has a one-to-one mapping from the cosine space to the angular space, there is still a difference between these two margins. In fact, the angular margin has a more clear geometric interpretation compared to cosine margin, and the margin in angular space corresponds to the arc distance on the hypersphere manifold.

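We add an angular margin $m$ within cos θ. Since cos(θ + m) is lower than cos(θ) when $\theta \in [0, \pi - m]$, the constraint is more stringent for classification. We define the proposed ArcFace as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i} + m)}}{e^{s\cos(\theta_{y_i} + m)} + \sum_{j=1, j\neq y_i}^{n}e^{s\cos\theta_j}}$$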

If we expand the proposed additive angular margin cos(θ + m), we get cos(θ + m) = cos θ cos m - sin θ sin m. Compared to the additive cosine margin cos(θ) - m proposed in [44, 43], the proposed ArcFace is similar, but the margin is dynamic because of sin θ. In Figure 2, we illustrate the proposed ArcFace, where the angular margin corresponds to the arc margin. Compared to SphereFace and CosineFace, our method has the best geometric interpretation.


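2.6. Additive Angular Margin
Although the cosine margin in [44, 43] is a one-to-one mapping from cosine space to angular space, there is still a difference between the two margins. In fact, the angular margin has a clearer geometric interpretation than the cosine margin, and a margin in angular space corresponds to the arc distance on the hypersphere manifold.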

We add an angular margin inside cos θ. Since cos(θ + m) is smaller than cos(θ) when $\theta \in [0, \pi - m]$, the constraint is stricter for classification. We define the proposed ArcFace as:

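Expanding the proposed additive angular margin cos(θ + m) gives cos(θ + m) = cos θ cos m - sin θ sin m. Compared with the additive cosine margin cos(θ) - m proposed in [44, 43], ArcFace is similar, but because of the sin θ term its margin is dynamic.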
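The ArcFace target logit and its expansion can be sketched per sample as below, assuming precomputed cosines from normalised features and weights, s = 64 and m = 0.5 (the margin used in Section 2.8). Function names are illustrative, and this sketch does not add any special handling for θ + m > π; the clamp only guards `math.acos` against rounding.

```python
import math

def arcface_loss(cos_thetas, y, s=64.0, m=0.5):
    """ArcFace loss for one sample, given cos(theta_j) for every class j.

    The target logit s*cos(theta_y) becomes s*cos(theta_y + m).
    """
    c = max(-1.0, min(1.0, cos_thetas[y]))     # clamp before acos
    theta_y = math.acos(c)
    logits = [s * t for t in cos_thetas]
    logits[y] = s * math.cos(theta_y + m)
    mx = max(logits)
    log_norm = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_norm - logits[y]

def arcface_target_expanded(cos_t, m=0.5):
    """Same target term via cos(t + m) = cos t cos m - sin t sin m,
    using sin t = sqrt(1 - cos^2 t), valid for t in [0, pi]."""
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    return cos_t * math.cos(m) - sin_t * math.sin(m)

# The effective cosine-space margin cos t - cos(t + m) depends on t,
# unlike CosineFace's constant m: this is the "dynamic" margin.
for c in (0.9, 0.5):
    print(c, round(c - arcface_target_expanded(c), 3))
```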

In Figure 2, we illustrate the proposed ArcFace; the angular margin corresponds to the arc margin. Compared with SphereFace and CosineFace, our method has the best geometric interpretation.

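Figure 2. Geometrical interpretation of ArcFace. Different colour areas represent the feature spaces of distinct classes. ArcFace not only compresses the feature regions but also corresponds to the geodesic distance on the hypersphere surface.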

2.7. Comparison under Binary Case
To better understand the process from Softmax to the proposed ArcFace, we give the decision boundaries under the binary classification case in Table 1 and Figure 3. Based on the weights and features normalisation, the main difference among these methods is where we put the margin.

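2.7. Comparison under Binary Case
To better understand the progression from Softmax to the proposed ArcFace, Table 1 and Figure 3 give the decision boundaries in the binary classification case. With weights and features normalised, the main difference among these methods is where the margin is placed.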

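Table 1. Decision boundaries for class 1 under the binary classification case. Note that $\theta_i$ is the angle between $W_i$ and $x$, $s$ is the hypersphere radius, and $m$ is the margin.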

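Figure 3. Decision margins of different loss functions under the binary classification case. The dashed line represents the decision boundary, and the grey areas are the decision margins.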
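In the binary case with normalised weights and features, a feature is assigned to class 1 when its (margin-adjusted) target logit exceeds the class-2 logit, and the positive scale s cancels. This sketch derives the resulting boundary for class 1 from the target logits of Sections 2.3 to 2.6: Softmax θ1 = θ2, SphereFace θ1 = θ2 / m, CosineFace cos θ1 = cos θ2 + m, ArcFace θ1 = θ2 - m. The function name and string keys are illustrative, and the SphereFace case uses the plain cos(mθ) form rather than ψ.

```python
import math

def boundary_theta1(theta2, method, m):
    """Largest theta_1 (angle to W_1, radians) that is still classified as class 1,
    given the feature's angle theta_2 to W_2. Returns None if no theta_1 >= 0 qualifies."""
    if method == "softmax":        # cos(theta1) = cos(theta2)
        return theta2
    if method == "sphereface":     # cos(m * theta1) = cos(theta2), integer m >= 1
        return theta2 / m
    if method == "cosface":        # cos(theta1) - m = cos(theta2)
        c = math.cos(theta2) + m
        return math.acos(c) if c <= 1.0 else None
    if method == "arcface":        # cos(theta1 + m) = cos(theta2)
        return theta2 - m if theta2 >= m else None
    raise ValueError(method)

# Angular gap theta2 - theta1 at the boundary: constant m for ArcFace only.
for deg in (60, 90, 120):
    t2 = math.radians(deg)
    print(deg,
          round(t2 - boundary_theta1(t2, "cosface", 0.35), 3),
          round(t2 - boundary_theta1(t2, "arcface", 0.5), 3))
```

The constant gap for ArcFace is the sense in which its margin lives directly in angular (arc) space, whereas the CosineFace gap measured in angle changes with θ2.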

2.8. Target Logit Analysis
To investigate why the face recognition performance can be improved by SphereFace, CosineFace and ArcFace, we analyse the target logit curves and the θ distributions during training. Here, we use the LResNet34E-IR network (refer to Sec. 3.2) and the refined MS1M dataset (refer to Sec. 3.1).

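2.8. Target Logit Analysis
A side note: translated literally, "target logit" becomes 目标逻辑 ("target logic"), which does not match what the paper means. The target logit is the entry of the fully connected layer's output that corresponds to the ground-truth class, so 目标分数 ("target score") is a better rendering.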

To study why SphereFace, CosineFace and ArcFace improve face recognition performance, we analyse the target logit curves and the θ distribution during training. Here, we use the LResNet34E-IR network (see Section 3.2) and the refined MS1M dataset (see Section 3.1).

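Figure 4. Target logit analysis. (a) Target logit curves for Softmax, SphereFace, CosineFace and ArcFace. (b) Target logit convergence curves estimated on training batches for Softmax, CosineFace and ArcFace. (c) During training, the θ distribution shifts from large angles to small angles (start, middle and end). Best viewed zoomed in.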

In Figure 4(a), we plot the target logit curves for Softmax, SphereFace, CosineFace and ArcFace. For SphereFace, the best setting is m = 4 and λ = 5, which is similar to the curve with m = 1.5 and λ = 0. However, the implementation of SphereFace requires m to be an integer. When we try the minimum multiplicative margin, m = 2 and λ = 0, the training does not converge. Therefore, decreasing the target logit curve slightly from Softmax increases the training difficulty and improves the performance, but decreasing it too much may cause training divergence.

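In Figure 4(a), we plot the target logit curves of Softmax, SphereFace, CosineFace and ArcFace. For SphereFace, the best setting is m = 4 and λ = 5, which is similar to the curve with m = 1.5 and λ = 0. However, the SphereFace implementation requires m to be an integer. When we try the smallest multiplicative margin, m = 2 and λ = 0, training does not converge. Therefore, lowering the target logit curve slightly relative to Softmax increases training difficulty and improves results, but lowering it too much causes training to diverge.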
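The curves of Figure 4(a) can be tabulated directly from the target logits defined earlier. This sketch assumes SphereFace with m = 4 and λ = 5 (the best setting quoted above), CosineFace with m = 0.35 and ArcFace with m = 0.5, and omits the positive scale s, which does not change the shapes. It is an illustration of the formulas, not a reproduction of the paper's plot.

```python
import math

def target_logit(theta, method):
    """Target logit before scaling by s, for theta in radians within [0, pi]."""
    if method == "softmax":
        return math.cos(theta)
    if method == "sphereface":                       # m = 4, lambda = 5
        m, lam = 4, 5.0
        k = min(int(theta * m / math.pi), m - 1)
        psi = (-1) ** k * math.cos(m * theta) - 2 * k
        return (psi + lam * math.cos(theta)) / (1 + lam)
    if method == "cosface":                          # m = 0.35
        return math.cos(theta) - 0.35
    if method == "arcface":                          # m = 0.5
        return math.cos(theta + 0.5)
    raise ValueError(method)

METHODS = ("softmax", "sphereface", "cosface", "arcface")
for deg in (0, 30, 45, 60, 90):
    t = math.radians(deg)
    print(f"{deg:>3} deg", [round(target_logit(t, m), 3) for m in METHODS])
```

The code also makes the next paragraph's observation explicit: the CosineFace curve is the Softmax curve shifted down by 0.35 (negative y direction), while the ArcFace curve is the Softmax curve evaluated at θ + 0.5, i.e. shifted left (negative x direction).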

Both CosineFace and ArcFace follow this insight. As we can see from Figure 4(a), CosineFace moves the target logit curve along the negative direction of the y-axis, while ArcFace moves the target logit curve along the negative direction of the x-axis. Now, we can easily understand the performance improvement from Softmax to CosineFace and ArcFace.

Both CosineFace and ArcFace follow this insight. As Figure 4(a) shows, CosineFace shifts the target logit curve along the negative y-axis, while ArcFace shifts it along the negative x-axis. The performance improvement from Softmax to CosineFace and ArcFace is now easy to understand.

For ArcFace with the margin m = 0.5, the target logit curve is not monotonically decreasing when θ ∈ [0°, 180°]. In fact, the target logit curve increases when θ > 151.35°. However, as shown in Figure 4(c), when training starts from a randomly initialised network, θ has a Gaussian distribution centred at 90° with the largest angle below 105°. The increasing interval of ArcFace is almost never reached during training. Therefore, we do not need to deal with it explicitly.

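For ArcFace with margin m = 0.5, the target logit curve is not monotonically decreasing when θ ∈ [0°, 180°]. In fact, the target logit curve increases when θ > 151.35°. However, as shown in Figure 4(c), when training starts from a randomly initialised network, θ follows a Gaussian distribution centred at 90° with the largest angle below 105°. The increasing interval of ArcFace is almost never reached during training, so we do not need to handle it explicitly.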
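The turning point of the ArcFace target logit follows from the fact that cos is decreasing on [0, π]: cos(θ + m) decreases until θ + m = π and increases after that. A small check for m = 0.5, with an illustrative helper name:

```python
import math

def arcface_increasing_from(m):
    """Angle in degrees beyond which cos(theta + m) increases again for theta in [0, pi]:
    cos decreases on [0, pi], so the curve turns upward once theta + m > pi."""
    return math.degrees(math.pi - m)

m = 0.5
start = arcface_increasing_from(m)              # about 151.35 degrees
# grid check: strictly decreasing up to 151 degrees, strictly increasing from 152 on
before = [math.cos(math.radians(d) + m) for d in range(100, 152)]
after = [math.cos(math.radians(d) + m) for d in range(152, 181)]
print(round(start, 2))
```

Since the initial θ distribution described above stays below roughly 105°, a sample would need to be pushed about 46° further out before this region matters.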

In Figure 4(c), we show the θ distributions of CosineFace and ArcFace in three phases of training: start, middle and end. The distribution centres gradually move from 90° to between 35° and 40°. In Figure 4(a), we find that the target logit curve of ArcFace is lower than that of CosineFace between 30° and 90°. Therefore, the proposed ArcFace puts a stricter margin penalty than CosineFace in this interval. In Figure 4(b), we show the target logit convergence curves estimated on training batches for Softmax, CosineFace and ArcFace. We can also see that the margin penalty of ArcFace is heavier than that of CosineFace at the beginning, as the red dotted line is lower than the blue dotted line. At the end of training, ArcFace converges better than CosineFace, as its histogram of θ lies further to the left (Figure 4(c)) and its target logit convergence curve is higher (Figure 4(b)). From Figure 4(c), we can see that almost all θs are smaller than 60° at the end of training. The samples beyond this range are the hardest samples as well as the noisy samples of the training dataset. Even though CosineFace puts a stricter margin penalty when θ < 30° (Figure 4(a)), this range is seldom reached even at the end of training (Figure 4(c)). Therefore, we can also understand why SphereFace obtains very good performance even with a relatively small margin in this range.

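In Figure 4(c), we show the θ distributions of CosineFace and ArcFace in three training phases: start, middle and end. The centre of the θ distribution gradually moves from 90° to between 35° and 40°. In Figure 4(a), we find that between 30° and 90° the target logit curve of ArcFace is below that of CosineFace. Therefore, compared with CosineFace, the proposed ArcFace imposes a stricter margin penalty in this interval. In Figure 4(b), we show the target logit convergence curves estimated on training batches for Softmax, CosineFace and ArcFace. We can also see that at the start, the margin penalty of ArcFace is heavier than that of CosineFace, because the red dotted line is below the blue dotted line. At the end of training, ArcFace converges better than CosineFace, because its θ histogram lies further to the left (Figure 4(c)) and its target logit convergence curve is higher (Figure 4(b)). From Figure 4(c), almost all θs are smaller than 60° at the end of training. Samples beyond this range are the hardest samples and the noisy samples of the training dataset. Although CosineFace imposes a stricter margin penalty when θ < 30° (Figure 4(a)), this range is rarely reached even at the end of training (Figure 4(c)). Therefore, we can also understand why SphereFace achieves very good performance even with a relatively small margin in this range.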

In conclusion, adding too much margin penalty when θ ∈ [60°, 90°] may cause training divergence, e.g. SphereFace (m = 2 and λ = 0). Adding margin when θ ∈ [30°, 60°] can potentially improve performance, because this range corresponds to the most effective semi-hard negative samples [35]. Adding margin when θ < 30° cannot obviously improve performance, because this range corresponds to the easiest samples. When we go back to Figure 4(a) and rank the curves between [30°, 60°], we can understand why performance improves from Softmax, SphereFace and CosineFace to ArcFace under their best parameter settings. Note that 30° and 60° here are roughly estimated thresholds for easy and hard training samples.

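In summary, adding too large a margin penalty when θ ∈ [60°, 90°] may cause training to diverge, e.g. SphereFace (m = 2 and λ = 0). Adding a margin when θ ∈ [30°, 60°] may improve performance, because this range corresponds to the most effective semi-hard negative samples [35]. Adding a margin when θ < 30° does not noticeably improve performance, because this range corresponds to the easiest samples. When we return to Figure 4(a) and rank the curves between [30°, 60°], we can understand why performance improves across Softmax, SphereFace, CosineFace and ArcFace under their best parameter settings. Note that 30° and 60° here are rough threshold estimates for easy and hard training samples.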
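The conclusion above can be phrased as a comparison of margin penalties per angle region. This sketch uses the paper's rough 30° and 60° thresholds; putting exactly 30° and 60° in the semi-hard bucket, the function names, and measuring the penalty as the drop of the target logit below cos θ (scale s omitted) are choices made here for illustration.

```python
import math

def region(theta_deg):
    """Rough buckets from Section 2.8; 30 and 60 degrees are approximate thresholds."""
    if theta_deg < 30:
        return "easy"
    if theta_deg <= 60:
        return "semi-hard"
    return "hard"

def margin_penalty(theta_deg, method):
    """How far the margin lowers the target logit below plain cos(theta)."""
    t = math.radians(theta_deg)
    if method == "cosface":                         # m = 0.35: constant in cosine space
        return 0.35
    if method == "arcface":                         # m = 0.5: depends on theta
        return math.cos(t) - math.cos(t + 0.5)
    raise ValueError(method)

for deg in (10, 45, 75):
    print(deg, region(deg),
          round(margin_penalty(deg, "cosface"), 3),
          round(margin_penalty(deg, "arcface"), 3))
```

In the easy region CosineFace penalises more, while in the semi-hard region ArcFace penalises more, which matches the paper's argument for where the extra margin is useful.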

3. Experiments
In this paper, we aim to obtain state-of-the-art performance on the MegaFace Challenge [21], the largest face identification and verification benchmark, in a totally reproducible way. We take Labelled Faces in the Wild (LFW) [19], Celebrities in Frontal Profile (CFP) [36] and Age Database (AgeDB) [27] as the validation datasets, and conduct extensive experiments on network settings and loss function designs. The proposed ArcFace achieves state-of-the-art performance on all four datasets.

3. Experiments
In this paper, our goal is to obtain state-of-the-art performance in a fully reproducible way on the MegaFace Challenge [21], currently the world's largest face identification and verification benchmark. We use Labelled Faces in the Wild (LFW) [19], Celebrities in Frontal Profile (CFP) [36] and Age Database (AgeDB) [27] as validation datasets, and conduct extensive experiments on network settings and loss function design. The proposed ArcFace achieves state-of-the-art performance on all four datasets.

3.1. Data
3.1.1 Training data
We use two datasets, VGG2 [7] and MS-Celeb-1M [11], as our training data.

VGG2. VGG2 dataset contains a training set with 8,631 identities (3,141,890 images) and a test set with 500 identities (169,396 images). VGG2 has large variations in pose, age, illumination, ethnicity and profession. Since VGG2 is a high-quality dataset, we use it directly without data refinement.

MS-Celeb-1M. The original MS-Celeb-1M dataset contains about 100k identities with 10 million images. To decrease the noise of MS-Celeb-1M and obtain high-quality training data, we rank all face images of each identity by their distance to the identity centre. For a particular identity, a face image whose feature vector is too far from the identity's feature centre is automatically removed [10]. For each identity, we further manually check the face images around the threshold of this first automatic step. Finally, we obtain a dataset containing 3.8M images of 85k unique identities. To help other researchers reproduce all experiments in this paper, we make the refined MS1M dataset publicly available as a binary file, but please cite the original paper [11] and follow the original license [11] when using this dataset. Our contribution here is only the refinement of the training data, not its release.

3.1. Data
3.1.1 Training data
We use two datasets, VGG2 [7] and MS-Celeb-1M [11], as our training data.

VGG2. The VGG2 dataset contains a training set with 8,631 identities (3,141,890 images) and a test set with 500 identities (169,396 images). VGG2 has large variations in pose, age, illumination, ethnicity and profession. Because VGG2 is a high-quality dataset, we use it directly without data cleaning.

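MS-Celeb-1M. The original MS-Celeb-1M dataset contains about 100k identities and 10 million images. To reduce the noise in MS-Celeb-1M and obtain high-quality training data, we rank all face images of each identity by their distance to the identity centre. For a given identity, a face image whose feature vector is too far from the identity's feature centre is automatically removed [10]. For each identity, we further manually check the face images near the threshold of the first, automatic step. In the end, we obtain a dataset of 3.8M images of 85k unique identities. To make it easier for other researchers to reproduce all experiments in this paper, we release the cleaned MS1M dataset as a binary file; when using it, please cite the original paper [11] and follow the original license [11]. Our contribution here is only the refinement of the training data, not its release.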
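The two-step cleaning described above (automatic removal by distance to the identity centre, then manual checks near the threshold) could look roughly like this for a single identity. This is a hedged sketch only: the mean as "identity centre", Euclidean distance, and the parameters `remove_thresh` and `review_band` are hypothetical choices, since the text does not give the centre definition or threshold values; in practice the features would come from a trained face recognition network.

```python
import math

def clean_identity(features, remove_thresh, review_band):
    """Split one identity's face features by distance to the identity's feature centre.

    Hypothetical parameters: `remove_thresh` is the automatic cut-off distance and
    `review_band` is the half-width of the band around it that goes to manual review.
    Returns (keep, review, removed) as lists of indices, each ordered by distance.
    """
    d = len(features[0])
    centre = [sum(f[k] for f in features) / len(features) for k in range(d)]
    dist = [math.dist(f, centre) for f in features]
    order = sorted(range(len(features)), key=lambda i: dist[i])   # rank by distance
    keep, review, removed = [], [], []
    for i in order:
        if abs(dist[i] - remove_thresh) <= review_band:
            review.append(i)       # close to the threshold: check manually
        elif dist[i] > remove_thresh:
            removed.append(i)      # too far from the centre: removed automatically
        else:
            keep.append(i)
    return keep, review, removed

faces = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [5.0, 5.0]]   # the last one is an outlier
print(clean_identity(faces, remove_thresh=3.0, review_band=0.5))
```

Note that an outlier also drags the mean centre towards itself; a more robust centre (for example a median) is a reasonable alternative, but which one [10] uses is not stated here.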

3.1.2 Validation data
We employ Labelled Faces in the Wild (LFW) [19], Celebrities in Frontal Profile (CFP) [36] and Age Database (AgeDB) [27] as the validation datasets.

LFW [19]. The LFW dataset contains 13,233 web-collected images of 5,749 different identities, with large variations in pose, expression and illumination. Following the standard protocol of unrestricted with labelled outside data, we report the verification accuracy on 6,000 face pairs.

CFP [36]. The CFP dataset consists of 500 subjects, each with 10 frontal and 4 profile images. The evaluation protocol includes frontal-frontal (FF) and frontal-profile (FP) face verification, each split into 10 folds with 350 same-person pairs and 350 different-person pairs per fold. In this paper, we only use the most challenging subset, CFP-FP, to report performance.

AgeDB [27, 10]. The AgeDB dataset is an in-the-wild dataset with large variations in pose, expression, illumination and age. AgeDB contains 12,240 images of 440 distinct subjects, such as actors, actresses, writers, scientists and politicians. Each image is annotated with identity, age and gender attributes. The minimum and maximum ages are 3 and 101, respectively, and the average age range per subject is 49 years. There are four groups of test data with different year gaps (5, 10, 20 and 30 years, respectively) [10]. Each group has ten splits of face images, and each split contains 300 positive and 300 negative examples. The face verification evaluation metric is the same as for LFW. In this paper, we only use the most challenging subset, AgeDB-30, to report performance.
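Verification accuracy on LFW, CFP-FP and AgeDB-30 is commonly computed with a 10-fold protocol: for each fold, a threshold is chosen on the other nine folds and accuracy is measured on the held-out fold. The NumPy sketch below follows that common practice; the threshold grid and the assumption that pairs are stored fold by fold are illustrative choices, not details taken from the paper.

```python
import numpy as np


def accuracy_at(scores, same, threshold):
    """Fraction of pairs classified correctly when 'same' means score >= threshold."""
    return float(np.mean((scores >= threshold) == same))


def kfold_verification_accuracy(scores, same, n_folds=10, thresholds=None):
    """LFW-style k-fold verification accuracy.

    scores: cosine similarity of each pair; same: True for same-identity pairs.
    Pairs are assumed to be stored fold by fold, so folds are contiguous chunks.
    Returns (mean accuracy, standard deviation) over the folds.
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same, dtype=bool)
    if thresholds is None:
        thresholds = np.linspace(-1.0, 1.0, 401)   # hypothetical search grid
    folds = np.array_split(np.arange(len(scores)), n_folds)
    accs = []
    for test_idx in folds:
        train = np.ones(len(scores), dtype=bool)
        train[test_idx] = False
        # Pick the threshold that works best on the training folds ...
        train_accs = [accuracy_at(scores[train], same[train], t) for t in thresholds]
        best = thresholds[int(np.argmax(train_accs))]
        # ... and evaluate it on the held-out fold.
        accs.append(accuracy_at(scores[test_idx], same[test_idx], best))
    return float(np.mean(accs)), float(np.std(accs))
```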

Supplementary note: face images that are collected and annotated under unconstrained conditions are commonly called “in-the-wild” images.

3.1.3 Test data
MegaFace. The MegaFace datasets [21] were released as the largest publicly available testing benchmark, aiming to evaluate the performance of face recognition algorithms at the million scale of distractors. The MegaFace datasets include a gallery set and a probe set. The gallery set, a subset of Flickr photos from Yahoo, consists of more than one million images of 690k different individuals. The probe sets are two existing databases: FaceScrub [28] and FGNet [1]. FaceScrub is a publicly available dataset containing 100k photos of 530 unique individuals, of which 55,742 images are of males and 52,076 images are of females. FGNet is a face-ageing dataset with 1,002 images of 82 identities. Each identity has multiple face images at different ages (ranging from 1 to 69).

Understandably, collecting the MegaFace data was very arduous and time-consuming, so data noise is inevitable. In the FaceScrub dataset, all of the face images of a particular identity should indeed show that identity, and the one million distractors should not overlap with the FaceScrub identities at all. However, we find that noisy face images exist not only in the FaceScrub dataset but also among the one million distractors, which significantly affects the performance.

In Figure 5, we give examples of noisy face images from the FaceScrub dataset. As shown in Figure 5(a), we rank all of the faces of an identity according to their cosine distance to the identity centre. In fact, face images 221 and 136 are not Aaron Eckhart. We manually clean the FaceScrub dataset and find 605 noisy face images in total. During testing, we replace each noisy face with a correct face of the same identity, which increases the identification accuracy by about 1%. In Figure 6(b), we give examples of noisy face images from the MegaFace distractors: all four of these distractor images are actually Alec Baldwin. We manually clean the MegaFace distractors and find 707 noisy face images in total. During testing, we add one extra feature dimension to distinguish these noisy faces, which increases the identification accuracy by about 15%.
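The paper does not spell out how the extra feature dimension is used. One plausible reading, sketched below purely as an assumption, is to append a 0 to every probe feature and a large constant to every distractor flagged as noisy before re-normalising, so that the flagged distractors become nearly orthogonal to all probes and can no longer be returned as false matches. The constant `big` and the function name are hypothetical.

```python
import numpy as np


def l2_normalise(x, eps=1e-12):
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def add_noise_dimension(probe_feats, distractor_feats, noisy_mask, big=100.0):
    """Append one extra dimension to suppress distractors flagged as noisy.

    probe_feats: (P, d); distractor_feats: (G, d); noisy_mask: (G,) bool.
    Probes and clean distractors get 0 in the new dimension and noisy
    distractors get `big` (hypothetical), so after re-normalisation their
    cosine similarity to any probe is at most about 1 / big.
    """
    probes = l2_normalise(probe_feats)
    dists = l2_normalise(distractor_feats)
    probe_ext = np.hstack([probes, np.zeros((len(probes), 1))])
    extra = np.where(np.asarray(noisy_mask), big, 0.0)[:, None]
    dist_ext = np.hstack([dists, extra])
    return l2_normalise(probe_ext), l2_normalise(dist_ext)
```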

Even though the noisy face images were double-checked by seven annotators who are very familiar with these celebrities, we still cannot guarantee that all of them are truly noisy. We have put the noise lists of the FaceScrub dataset and the MegaFace distractors online. We believe the community has sharp eyes, and we will update these lists based on feedback from other researchers.

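Figure 5. Examples of noisy face images from the FaceScrub dataset. In (a), the image id is shown in the top-left corner and the cosine distance to the identity centre in the bottom-left corner.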

Figure 6. (a) is used by the annotators to learn the identity from the FaceScrub dataset. (b) shows the duplicated faces picked from the MegaFace distractors.

3.2. Network Settings
We first evaluate face verification performance under different network settings, using VGG2 as the training data and Softmax as the loss function. All experiments in this paper are implemented in MxNet [8]. We set the batch size to 512 and train models on four or eight NVIDIA Tesla P40 (24GB) GPUs. The learning rate starts at 0.1 and is divided by 10 at 100k, 140k and 160k iterations. The total number of iterations is set to 200k. We set the momentum to 0.9 and the weight decay to 5e-4 (Table 5).
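The optimisation settings above can be written as a piecewise-constant learning-rate schedule plus an SGD-with-momentum step that adds the weight decay to the gradient. This is a minimal NumPy sketch of the usual heavy-ball formulation, not the authors' training code; whether each drop happens exactly at step 100,000 (and so on) or one step later is an assumption.

```python
import numpy as np


def learning_rate(step, base_lr=0.1, milestones=(100_000, 140_000, 160_000), factor=0.1):
    """Piecewise-constant schedule: base_lr divided by 10 at each milestone."""
    n_drops = sum(step >= m for m in milestones)
    return base_lr * factor ** n_drops


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, wd=5e-4):
    """One SGD step with momentum; L2 weight decay is added to the gradient."""
    velocity = momentum * velocity - lr * (grad + wd * w)
    return w + velocity, velocity
```

With a batch size of 512, the 200k iterations correspond to 200,000 × 512 ≈ 102.4M samples, i.e. roughly 32.6 passes over the 3,141,890 VGG2 training images.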

3.2.1 Input setting
Following [46, 23], we use five facial landmarks (eye centres, nose tip and mouth corners) [49] to compute a similarity transformation that normalises the face images. The faces are cropped and resized to 112 × 112, and each pixel of the RGB images (in the range [0, 255]) is normalised by subtracting 127.5 and then dividing by 128.
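The alignment and pixel normalisation described above can be sketched as follows. This minimal NumPy sketch estimates the least-squares similarity transform (Umeyama's method) between five landmarks and a fixed 112 × 112 template, and applies the (x - 127.5) / 128 scaling. The template coordinates are the ones commonly used in the InsightFace code base and should be treated as an assumption; the paper does not list them here.

```python
import numpy as np

# Assumed 5-point template for 112x112 crops (left eye, right eye, nose tip,
# left mouth corner, right mouth corner), as commonly used in InsightFace code.
TEMPLATE_112 = np.array([[38.2946, 51.6963],
                         [73.5318, 51.5014],
                         [56.0252, 71.7366],
                         [41.5493, 92.3655],
                         [70.7299, 92.2041]])


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama, 1991) mapping src onto dst.

    src, dst: (n, 2) arrays of corresponding points.
    Returns a 2x3 matrix M with dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                 # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = 1.0 if np.linalg.det(U) * np.linalg.det(Vt) >= 0 else -1.0  # no reflection
    D = np.diag([1.0, d])
    R = U @ D @ Vt                                   # rotation
    var_s = (src_c ** 2).sum() / len(src)            # variance of source points
    scale = (S * np.diag(D)).sum() / var_s
    t = mu_d - scale * R @ mu_s                      # translation
    return np.hstack([scale * R, t[:, None]])


def normalise_pixels(img_uint8):
    """Map pixel values in [0, 255] to roughly [-1, 1]: (x - 127.5) / 128."""
    return (img_uint8.astype(np.float32) - 127.5) / 128.0
```

In practice one would compute `M = estimate_similarity(detected_landmarks, TEMPLATE_112)` (where `detected_landmarks` is a hypothetical (5, 2) array from a landmark detector), warp the image, e.g. with OpenCV's `cv2.warpAffine(image, M, (112, 112))` if OpenCV is available, and then call `normalise_pixels` on the crop.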

As most convolutional networks are designed for the Image-Net [34] classification task, the input image size is usually set to 224 × 224 or larger. However, our face crops are only 112 × 112. To preserve a higher feature map resolution, we use conv3 × 3 with stride = 1 in the first convolutional layer instead of conv7 × 7 with stride = 2. For these two settings, the output size of the convolutional network is 7 × 7 (denoted by “L” in front of the network name) and 3 × 3, respectively.
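The 7 × 7 versus 3 × 3 output sizes follow from simple stride arithmetic. The sketch below assumes a ResNet-style body with four stride-2 stages, no max-pooling after the first layer, and each stride-2 operation halving the map with rounding down; these architectural details are assumptions made for illustration.

```python
def output_side(input_side=112, stem_stride=1, n_stride2_stages=4):
    """Side length of the final feature map under the stated assumptions."""
    side = input_side // stem_stride          # first convolutional layer
    for _ in range(n_stride2_stages):         # each stage halves the map
        side //= 2
    return side


print(output_side(stem_stride=1))  # "L" setting (conv3x3, stride 1): 112 -> 7
print(output_side(stem_stride=2))  # conv7x7, stride 2: 112 -> 3
```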

3.2.2 Output setting
In the last several layers, we investigate several options to check how the embedding settings affect model performance. The feature embedding dimension is set to 512 for all options except Option-A, whose embedding size is determined by the channel size of the last convolutional layer.

• Option-A: Use a global pooling layer (GP).

• Option-B: Use one fully connected (FC) layer after GP.

• Option-C: Use FC-Batch Normalisation (BN) [20] after GP.

• Option-D: Use FC-BN-Parametric Rectified Linear Unit (PReLu) [13] after GP.

• Option-E: Use BN-Dropout [38]-FC-BN after the last convolutional layer.
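Option-E differs from the others in that it skips global pooling and feeds the whole flattened feature map to the FC layer. The NumPy sketch below shows only the forward pass of such a BN-Dropout-FC-BN head, under simplifying assumptions: both BN layers use batch statistics, the learned BN scale and shift are omitted, and the small BN epsilon is an assumed value. For the 7 × 7 "L" output with, say, 512 channels, the flattened vector has 512 × 7 × 7 = 25,088 entries.

```python
import numpy as np


def batch_norm(x, axis, eps=2e-5):
    """Training-mode BN using batch statistics over every axis except `axis`.

    Learned scale/shift and running averages are omitted for brevity.
    """
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p and rescale the rest."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def option_e_embedding(feature_map, fc_weight, fc_bias, p=0.4, rng=None, training=True):
    """Option-E head: BN -> Dropout -> FC -> BN on the last conv feature map.

    feature_map: (N, C, H, W); fc_weight: (C*H*W, D); fc_bias: (D,).
    There is no global pooling: the whole C x H x W map is flattened.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = batch_norm(feature_map, axis=1)          # BN over channels
    x = dropout(x, p, rng, training)             # dropout, p = 0.4 as in the paper
    x = x.reshape(x.shape[0], -1)                # flatten to (N, C*H*W)
    x = x @ fc_weight + fc_bias                  # FC to the embedding size D
    return batch_norm(x, axis=1)                 # BN on the embedding
```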

During testing, the score is computed as the cosine distance between two feature vectors. Nearest-neighbour search and threshold comparison are used for the face identification and verification tasks, respectively.
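The scoring rule can be sketched in a few lines of NumPy: verification compares the cosine similarity of two embeddings with a threshold, and identification returns the gallery entry with the highest similarity. The threshold value is a hypothetical input (in practice it is tuned on validation data), and the function names are illustrative.

```python
import numpy as np


def l2_normalise(x, eps=1e-12):
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def cosine_similarity(a, b):
    """Cosine similarity; the cosine distance is 1 minus this value."""
    return float(l2_normalise(a) @ l2_normalise(b))


def verify(a, b, threshold):
    """Verification: same identity iff the similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


def identify(probe, gallery, gallery_ids):
    """Identification: id of the nearest gallery feature by cosine similarity."""
    sims = l2_normalise(gallery) @ l2_normalise(probe)
    return gallery_ids[int(np.argmax(sims))]
```

Thresholding the similarity s is equivalent to thresholding the cosine distance 1 - s, so either convention yields the same decisions.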

3.2.3 Block Setting
Besides the original ResNet [14] unit, we also investigate a more advanced residual unit setting [12] for training face recognition models. In Figure 7, we show the improved residual unit (denoted by “IR” at the end of model names), which has a BN-Conv-BN-PReLu-Conv-BN structure. Compared to the residual unit proposed in [12], we set stride = 2 for the second convolutional layer instead of the first one. In addition, PReLu [13] is used in place of the original ReLu.

3.2.4 Backbones
Based on recent advances in model structure design, we also explore MobileNet [16], Inception-Resnet-V2 [40], Densely connected convolutional networks (DenseNet) [18], Squeeze and excitation networks (SE) [17] and Dual path Network (DPN) [9] for deep face recognition. In this paper, we compare these networks in terms of accuracy, speed and model size.

3.2.5 Network Setting Conclusions
Input selects L. In Table 2, we compare two networks with and without the “L” setting. When conv3 × 3 with stride = 1 is used as the first convolutional layer, the network output is 7 × 7. By contrast, if conv7 × 7 with stride = 2 is used as the first convolutional layer, the network output is only 3 × 3. Table 2 clearly shows that using larger feature maps during training yields higher verification accuracy.

Table 2. Verification accuracy (%) under different input settings (Softmax@VGG2).

Output selects E. In Table 3, we give a detailed comparison of the different output settings. Option E (BN-Dropout-FC-BN) obtains the best performance. In this paper, the dropout parameter is set to 0.4. Dropout acts as an effective regularisation term that avoids over-fitting and gives better generalisation for deep face recognition.

Table 3. Verification accuracy (%) under different output settings (Softmax@VGG2).

Block selects IR. In Table 4, we compare the original residual unit with the improved residual unit. As the results show, the proposed BN-Conv(stride=1)-BN-PReLu-Conv(stride=2)-BN unit clearly improves verification performance.

Table 4. Verification accuracy (%) of the original residual unit versus the improved residual unit (Softmax@VGG2).

Backbones Comparisons. In Table 8, we give the verification accuracy, test speed and model size of different backbones. The running time is measured on a P40 GPU. As the performance on LFW is almost saturated, we focus on the more challenging test sets, CFP-FP and AgeDB-30, to compare these backbones. The Inception-Resnet-V2 network obtains the best performance, at the cost of a long running time (53.6ms) and the largest model size (642MB). By contrast, MobileNet can compute a face embedding within 4.2ms with a 112MB model, and its performance drops only slightly. As Table 8 shows, the performance gaps between the large networks, e.g. ResNet-100, Inception-Resnet-V2, DenseNet, DPN and SE-Resnet-100, are relatively small. Based on the trade-off between accuracy, speed and model size, we choose LResNet100E-IR for the experiments on the MegaFace challenge.

Table 8. Comparison of accuracy (%), speed (ms) and model size (MB) across different backbones (Softmax@VGG2).

Weight decay. Based on the SE-LResNet50E-IR network, we also explore how the weight decay (WD) value affects verification performance. As Table 5 shows, the verification accuracy is highest when the weight decay is set to 5e-4. Therefore, we fix the weight decay at 5e-4 in all other experiments.

Table 5. Verification performance (%) with different weight decay (WD) values (SE-LResNet50E-IR, Softmax@VGG2).

3.3. Loss Setting
Since the margin parameter m plays an important role in the proposed ArcFace, we first conduct experiments to search for the best angular margin. Varying m from 0.2 to 0.8, we use the LMobileNetE network and the ArcFace loss to train models on the refined MS1M dataset. As illustrated in Table 6, performance improves consistently from m = 0.2 on all datasets and saturates at m = 0.5; beyond m = 0.5, verification accuracy starts to decrease. In this paper, we fix the additive angular margin m at 0.5.
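To make the role of m concrete, the sketch below computes ArcFace logits: features and class weights are L2-normalised, the target logit cos θ_y is replaced by cos(θ_y + m), and all logits are multiplied by a scale s before a standard Softmax cross-entropy. This is a minimal NumPy sketch, not the authors' MxNet code; the default s = 64 is an assumption (the scale is not stated in this section), and the case θ_y + m > π, where cos(θ_y + m) stops decreasing, is not handled.

```python
import numpy as np


def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """ArcFace logits.

    features: (N, d) embeddings; weights: (d, K) class weight vectors;
    labels: (N,) integer class ids. Returns (N, K) scaled logits.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(x @ w, -1.0, 1.0)                 # cos(theta_j) for every class
    rows = np.arange(len(labels))
    theta_y = np.arccos(cos[rows, labels])          # angle to the target class
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta_y + m)      # additive angular margin
    return s * logits


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy computed with a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())
```

Setting m = 0 reduces the logits to plain normalised-Softmax logits, which is a convenient sanity check; with m > 0 the target class must be separated by a larger angle to reach the same loss.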

Table 6. ArcFace verification performance (%) for different angular margins m (LMobileNetE, ArcFace@MS1M).

Based on the LResNet100E-IR network and the refined MS1M dataset, we compare the performance of different loss functions, e.g. Softmax, SphereFace [23], CosineFace [44, 43] and ArcFace. In Table 7, we give the detailed verification accuracy on the LFW, CFP-FP and AgeDB-30 datasets. As LFW is almost saturated, the improvement there is not obvious. We find that (1) compared to Softmax, SphereFace, CosineFace and ArcFace clearly improve performance, especially under large pose and age variations; (2) CosineFace and ArcFace clearly outperform SphereFace and are much easier to implement: both converge easily without additional supervision from Softmax, whereas additional supervision from Softmax is indispensable for SphereFace to avoid divergence during training; (3) ArcFace is slightly better than CosineFace. Moreover, ArcFace is more intuitive and has a clearer geometric interpretation on the hypersphere manifold, as shown in Figure 1.
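The three margin-based losses differ only in how they modify the target logit. The sketch below evaluates cos(mθ) (SphereFace), cos θ - m (CosineFace) and cos(θ + m) (ArcFace) at a few angles. The margins 4 and 0.35 for SphereFace and CosineFace are illustrative assumptions, and the plain cos(mθ) form is a simplification: SphereFace actually uses a piecewise, monotonic extension of it and, as noted above, needs extra Softmax supervision.

```python
import numpy as np


def target_logit(theta, loss, m):
    """Unscaled target-class logit as a function of the angle theta (radians)."""
    if loss == "sphereface":      # multiplicative angular margin: cos(m * theta)
        return np.cos(m * theta)
    if loss == "cosineface":      # additive cosine margin: cos(theta) - m
        return np.cos(theta) - m
    if loss == "arcface":         # additive angular margin: cos(theta + m)
        return np.cos(theta + m)
    raise ValueError(f"unknown loss: {loss}")


MARGINS = {"sphereface": 4, "cosineface": 0.35, "arcface": 0.5}  # illustrative values

for deg in (0, 30, 60):
    theta = np.deg2rad(deg)
    row = {name: round(float(target_logit(theta, name, m)), 3) for name, m in MARGINS.items()}
    print(deg, round(float(np.cos(theta)), 3), row)
```

Each margin lowers the target logit relative to the plain cos θ, so the network must shrink θ further to achieve the same loss; ArcFace does so by a constant amount in angle rather than in cosine.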

Table 7. Verification performance (%) of different loss functions (LResNet100E-IR@MS1M).

3.4. MegaFace Challenge1 on FaceScrub
For the experiments on the MegaFace challenge, we use the LResNet100E-IR network with the refined MS1M dataset as the training data. In Tables 9 and 10, we give the identification and verification results on the original MegaFace dataset and on the refined MegaFace dataset.

In Table 9, we use the whole refined MS1M dataset to train models. We compare the proposed ArcFace with related baseline methods, e.g. Softmax, Triplet, SphereFace and CosineFace. ArcFace obtains the best performance both before and after the distractors are refined. After the overlapping face images are removed from the one million distractors, identification performance improves significantly. We believe that the results on the manually refined MegaFace dataset are more reliable, and that face identification performance under a million distractors is better than previously thought [2].

To strictly follow the MegaFace evaluation instructions, we need to remove all identities that appear in the FaceScrub dataset from our training data. We compute the feature centre of each identity in the refined MS1M dataset and in the FaceScrub dataset, and find that 578 identities from the refined MS1M dataset are close (cosine similarity higher than 0.45) to identities in the FaceScrub dataset. We remove these 578 identities from the refined MS1M dataset and compare ArcFace with the other baseline methods in Table 10. ArcFace still outperforms CosineFace, with a slight performance drop compared to Table 9. For Softmax, however, the identification rate drops markedly from 78.89% to 73.66% after the suspected overlapping identities are removed from the training data. On the refined MegaFace test set, the verification result of CosineFace is slightly higher than that of ArcFace. This is because we read the verification result closest to FAR=1e-6 from the devkit outputs. As Figure 8 shows, ArcFace always outperforms CosineFace under both the identification and verification metrics.
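The overlap check described above can be sketched as follows: compute an L2-normalised feature centre per identity in both datasets and flag every training identity whose centre has a cosine similarity above 0.45 with any FaceScrub identity centre. This is a minimal NumPy sketch under the assumption that embeddings for both datasets were extracted with the same model; the function names are hypothetical.

```python
import numpy as np


def identity_centres(features, ids):
    """Per-identity mean of L2-normalised features, re-normalised.

    Returns (unique_ids, centres) with centres of shape (n_ids, d).
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    uniq = np.unique(ids)
    centres = np.stack([f[ids == u].mean(axis=0) for u in uniq])
    centres /= np.linalg.norm(centres, axis=1, keepdims=True)
    return uniq, centres


def overlapping_ids(train_feats, train_ids, probe_feats, probe_ids, thr=0.45):
    """Training identities whose centre is closer than `thr` (cosine) to any probe identity."""
    t_ids, t_c = identity_centres(train_feats, train_ids)
    _, p_c = identity_centres(probe_feats, probe_ids)
    max_sim = (t_c @ p_c.T).max(axis=1)      # best match among probe identities
    return t_ids[max_sim > thr]
```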

Two supplementary concepts:

rank-1: https://blog.csdn.net/sinat_42239797/article/details/93651594

TAR and FAR: https://blog.csdn.net/liuweiyuxiang/article/details/81259492
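The two measures can be sketched as follows: rank-1 identification accuracy from a probe-by-gallery similarity matrix, and TAR at a fixed FAR from lists of genuine and impostor scores. This is a generic NumPy illustration, not the MegaFace devkit's own implementation; the convention of accepting a pair when its score is strictly greater than the threshold is an assumption.

```python
import numpy as np


def rank1_accuracy(sim, probe_labels, gallery_labels):
    """Fraction of probes whose most similar gallery entry has the correct label.

    sim: (P, G) similarity matrix between probes and gallery entries.
    """
    best = np.argmax(sim, axis=1)
    return float(np.mean(np.asarray(gallery_labels)[best] == np.asarray(probe_labels)))


def tar_at_far(genuine, impostor, far):
    """TAR at the threshold where at most a fraction `far` of impostors is accepted.

    Returns (TAR, threshold); a pair is accepted if score > threshold.
    """
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]   # descending
    k = int(np.floor(far * len(impostor)))       # impostors allowed above threshold
    thr = impostor[k] if k < len(impostor) else -np.inf
    tar = float(np.mean(np.asarray(genuine, dtype=float) > thr))
    return tar, float(thr)
```

Note that FAR = 1e-6 only becomes meaningful with at least on the order of a million impostor pairs, which is one reason large distractor sets such as MegaFace are needed.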

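Table 9. Identification and verification results of different methods on MegaFace Challenge1 (LResNet100E-IR@MS1M). “Rank 1” refers to rank-1 face identification accuracy, and “VR” refers to face verification TAR (True Accepted Rate) at a FAR (False Accepted Rate) of 1e-6. (R) denotes the refined version of the MegaFace dataset.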

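Table 10. Identification and verification results of different methods on MegaFace Challenge1 (Methods@MS1M - FaceScrub). “Rank 1” refers to rank-1 face identification accuracy, and “VR” refers to face verification TAR (True Accepted Rate) at a FAR (False Accepted Rate) of 1e-6. (R) denotes the refined version of the MegaFace dataset.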

Figure 8. (a) and (c) report the CMC curves of different methods on MegaFace with 1M distractors; (b) and (d) report the ROC curves of different methods on MegaFace with 1M distractors. (a) and (b) are evaluated on the original MegaFace dataset, while (c) and (d) are evaluated on the refined MegaFace dataset.

3.5. Further Improvement by Triplet Loss
Due to limited GPU memory, it is hard to train Softmax-based methods, e.g. SphereFace, CosineFace and ArcFace, with millions of identities. One practical solution is to employ metric learning methods, the most widely used of which is the Triplet loss [35, 22]. However, Triplet loss converges relatively slowly. To this end, we explore using Triplet loss to fine-tune existing face recognition models that were trained with Softmax-based methods.

For Triplet loss fine-tuning, we use the LResNet100E-IR network and set the learning rate to 0.005, momentum to 0 and weight decay to 5e-4. In Table 11, we give the verification accuracy on the AgeDB-30 dataset after Triplet loss fine-tuning. We find that (1) a Softmax model trained on a dataset with fewer identities (e.g. VGG2 with 8,631 identities) can be clearly improved by Triplet loss fine-tuning on a dataset with more identities (e.g. MS1M with 85k identities). This improvement confirms the effectiveness of the two-step training strategy, which can significantly accelerate overall model training compared to training with Triplet loss from scratch. (2) A Softmax model can be further improved by Triplet loss fine-tuning on the same dataset, which shows that local refinement can improve the global model. (3) The advantages of margin-improved Softmax methods, e.g. SphereFace, CosineFace and ArcFace, are kept and further improved by Triplet loss fine-tuning, which also verifies that local metric learning methods, e.g. Triplet loss, are complementary to methods based on global hypersphere metric learning.

As the margin used in Triplet loss is defined on the Euclidean distance, we will investigate Triplet loss with an angular margin in future work.
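For reference, the standard Triplet loss penalises a triplet whose anchor-positive squared distance is not at least a margin alpha smaller than its anchor-negative squared distance. The NumPy sketch below assumes L2-normalised embeddings and an illustrative alpha = 0.2 (a hypothetical value; none is given here). For unit vectors, ||a - b||² = 2 - 2cos(a, b), so a Euclidean margin alpha corresponds to a cosine gap of alpha / 2 rather than to a margin on the angle itself.

```python
import numpy as np


def l2_normalise(x, eps=1e-12):
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Mean over the batch of max(0, ||a-p||^2 - ||a-n||^2 + alpha).

    anchor, positive, negative: (N, d) embeddings; alpha is hypothetical.
    """
    a, p, n = l2_normalise(anchor), l2_normalise(positive), l2_normalise(negative)
    d_ap = np.sum((a - p) ** 2, axis=1)      # anchor-positive squared distance
    d_an = np.sum((a - n) ** 2, axis=1)      # anchor-negative squared distance
    return float(np.mean(np.maximum(0.0, d_ap - d_an + alpha)))
```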

Table 11. Improving verification accuracy by Triplet loss fine-tuning (LResNet100E-IR).

4. Conclusions
In this paper, we contribute to improving deep face recognition through data refinement, network settings and loss function design. We have (1) refined the largest publicly available training dataset (MS1M) and test dataset (MegaFace); (2) explored different network settings and analysed the trade-off between accuracy and speed; (3) proposed a geometrically interpretable loss function called ArcFace and explained, from the viewpoint of semi-hard sample distributions, why ArcFace is better than Softmax, SphereFace and CosineFace; (4) obtained state-of-the-art performance on the MegaFace dataset in a totally reproducible way.

————————————————
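Copyright notice: This article is an original post by CSDN blogger 「神羅Noctis」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_39937396/article/details/102523945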
