[Paper Reading] The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification
- Abstract
- Implementation details
- Discriminality component
- Diversity component
- Code
Abstract
Key for solving fine-grained image categorization is finding discriminate and local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminate feature representations. In this paper, we show it is possible to cultivate subtle details without the need for overly complicated network designs or training mechanisms – a single loss is all it takes. The main trick lies with how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed as mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constraints channels so that they become mutually exclusive on spatial-wise. The end result is therefore a set of feature channels that each reflects different locally discriminative regions for a specific class. The MC-Loss can be trained end-to-end, without the need for any bounding-box/part annotations, and yields highly discriminative regions during inference. Experimental results show our MC-Loss when implemented on top of common base networks can achieve state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford-Cars). Ablative studies further demonstrate the superiority of MC-Loss when compared with other recently proposed general-purpose losses for visual classification, on two different base networks. Code available at https://github.com/dongliangchang/Mutual-Channel-Loss
Implementation details
[Figure: overview of the MC-Loss framework, including panel (a) with the discriminality and diversity components; image not preserved]
An image is fed into the base network to extract a feature map $\mathcal{F} \in R^{N \times W \times H}$. The number of channels $N$ must be set to $c \times \xi$, where $c$ is the number of classes and $\xi$ is the number of feature channels allocated to each class.
The features of the $i$-th class can then be written as:
$$\mathbf{F}_{i}=\left\{\mathcal{F}_{i \times \xi+1}, \mathcal{F}_{i \times \xi+2}, \cdots, \mathcal{F}_{i \times \xi+\xi}\right\}$$
and the extracted feature map as a whole as:
$$\mathbf{F}=\left\{\mathbf{F}_{0}, \mathbf{F}_{1}, \cdots, \mathbf{F}_{c-1}\right\}$$
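The grouping above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `split_into_class_groups` and the toy sizes are assumptions made for the example, and the code uses 0-based channel indices, so group $i$ covers channels $i\xi$ through $i\xi+\xi-1$.

```python
import numpy as np


def split_into_class_groups(feature_map, num_classes, xi):
    """Reshape an (N, W, H) feature map into (c, xi, W, H) class groups.

    Hypothetical helper for illustration; it requires N == num_classes * xi.
    """
    n_channels, width, height = feature_map.shape
    if n_channels != num_classes * xi:
        raise ValueError("channel count must equal num_classes * xi")
    # Consecutive blocks of xi channels belong to the same class.
    return feature_map.reshape(num_classes, xi, width, height)


# Toy sizes (assumed): 4 classes, 3 channels per class, a 2 x 2 spatial grid.
c, xi, W, H = 4, 3, 2, 2
feature_map = np.arange(c * xi * W * H, dtype=np.float32).reshape(c * xi, W, H)
groups = split_into_class_groups(feature_map, c, xi)
```

Here `groups[1]` holds the same values as channels 3, 4, and 5 of `feature_map`, which is the group $\mathbf{F}_1$ in the notation above.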
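$\mathbf{F}$ then enters two streams of the network, each with a loss tailored to a different goal. In the figure above, the $L_{CE}$ stream feeds $\mathbf{F}$ into a fully connected layer and applies a cross-entropy loss, which encourages the network to extract informative features that focus mainly on globally discriminative regions. The $L_{MC}$ stream supervises the network to attend to different locally discriminative regions. Weighting $L_{MC}$ by $\mu$ gives the total loss:
$$\operatorname{Loss}(\mathbf{F})=L_{CE}(\mathbf{F})+\mu \times L_{MC}(\mathbf{F})$$
Now for the $L_{MC}$ stream in detail: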
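These notes do not spell out how the two components of the $L_{MC}$ stream are combined. As far as I can tell from the paper, the discriminality term is minimized while the diversity term is maximized, with a weight $\lambda$ that is not defined elsewhere in these notes:

```latex
L_{MC}(\mathbf{F}) = L_{dis}(\mathbf{F}) - \lambda \times L_{div}(\mathbf{F})
```

The minus sign is consistent with the code later in this post, where the diversity term enters the total loss as one minus a scaled spatial sum, so lowering the loss raises the diversity score.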
Discriminality component
As shown on the left of panel (a) in the figure above, given the grouping $\mathbf{F}=\left\{\mathbf{F}_{0}, \mathbf{F}_{1}, \cdots, \mathbf{F}_{c-1}\right\}$, the discriminality component requires feature channels to be aligned with classes: every feature channel assigned to a particular class should be sufficiently discriminative.
$$L_{dis}(\mathbf{F})=L_{CE}\left(\boldsymbol{y}, \underbrace{\frac{\left[e^{g\left(\mathbf{F}_{0}\right)}, e^{g\left(\mathbf{F}_{1}\right)}, \cdots, e^{g\left(\mathbf{F}_{c-1}\right)}\right]^{\mathrm{T}}}{\sum_{i=0}^{c-1} e^{g\left(\mathbf{F}_{i}\right)}}}_{\text{Softmax}}\right)$$
where
$$g\left(\mathbf{F}_{i}\right)=\underbrace{\frac{1}{W H} \sum_{k=1}^{W H}}_{\text{GAP}} \underbrace{\max _{j=1,2, \cdots, \xi}}_{\text{CCMP}} \underbrace{\left[M_{i} \cdot \mathbf{F}_{i, j, k}\right]}_{\text{CWA}}$$
Here $M_{i}=\operatorname{diag}\left(\text{Mask}_{i}\right)$, and $\text{Mask}_{i} \in R^{\xi}$ is a 0-1 mask with $\left\lfloor\frac{\xi}{2}\right\rfloor$ randomly placed zeros. GAP, CCMP, and CWA stand for Global Average Pooling, Cross-Channel Max Pooling, and Channel-Wise Attention, respectively.
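To make the order of operations in $g(\mathbf{F}_i)$ concrete, here is a rough NumPy sketch for a single image. It is an illustration under assumptions, not the authors' implementation: the function name `discriminality_loss`, the per-group mask sampling with a NumPy `Generator`, and the toy input are all made up for the example.

```python
import numpy as np


def discriminality_loss(groups, label, rng):
    """Sketch of L_dis for one image.

    groups: array of shape (c, xi, W, H); label: integer class index.
    Returns the cross-entropy loss and the per-class scores g.
    """
    c, xi, W, H = groups.shape
    # CWA: per group, a 0-1 mask over the xi channels with floor(xi / 2) zeros.
    mask = np.ones((c, xi), dtype=groups.dtype)
    for i in range(c):
        zero_idx = rng.choice(xi, size=xi // 2, replace=False)
        mask[i, zero_idx] = 0.0
    masked = groups * mask[:, :, None, None]
    # CCMP: maximum over the xi channels of a group at every spatial position.
    ccmp = masked.max(axis=1)            # shape (c, W, H)
    # GAP: average over the W * H spatial positions.
    g = ccmp.mean(axis=(1, 2))           # shape (c,)
    # Softmax followed by cross-entropy, computed in a numerically stable way.
    shifted = g - g.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label], g


# Toy input (assumed): 3 classes, 2 channels each; only class 1 responds.
rng = np.random.default_rng(0)
groups = np.zeros((3, 2, 2, 2))
groups[1] = 5.0
loss_correct, g = discriminality_loss(groups, 1, rng)
loss_wrong, _ = discriminality_loss(groups, 0, rng)
```

Because half of each group's channels are dropped at random, every surviving channel has to carry enough evidence for its class on its own. In this toy case the loss for the true label 1 is much smaller than for label 0.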
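Diversity component
The diversity component is an approximate distance measure over feature channels, used to compute the total similarity of all channels. With constant complexity, it is cheaper to compute than common measures such as Euclidean distance and Kullback-Leibler divergence, which have quadratic complexity. As illustrated in the right-hand block of panel (a) in the figure above, the diversity component drives the feature channels within a group $\mathbf{F}_{i}$ to become different from one another during training. In other words, the different feature channels of a class should focus on different regions of the image, rather than all focusing on the single most discriminative region. It therefore reduces redundancy by diversifying the channels of each group, and helps discover distinct discriminative regions for each class in an image. The operation can be interpreted as cross-channel decorrelation that captures details from different salient regions of the image. Supervision is imposed directly on the convolutional filters: after a softmax, CCMP is applied, followed by a summation over the spatial dimension that measures the degree of intersection. $L_{div}$ is defined as:
$$L_{div}(\mathbf{F})=\frac{1}{c} \sum_{i=0}^{c-1} h\left(\mathbf{F}_{i}\right)$$
where
$$h\left(\mathbf{F}_{i}\right)=\sum_{k=1}^{W H} \underbrace{\max _{j=1,2, \cdots, \xi}}_{\text{CCMP}} \underbrace{\left[\frac{e^{\mathbf{F}_{i, j, k}}}{\sum_{k^{\prime}=1}^{W H} e^{\mathbf{F}_{i, j, k^{\prime}}}}\right]}_{\text{Softmax}}$$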
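A small NumPy sketch may help show what $h(\mathbf{F}_i)$ measures. It is only an illustration under assumptions (the function name `diversity` and the toy activations are made up): the spatial softmax turns each channel into a distribution over the $WH$ positions, so $h$ is close to 1 when all channels of a group peak at the same place and approaches $\xi$ when they peak at different places.

```python
import numpy as np


def diversity(groups):
    """Sketch of L_div for one image.

    groups: array of shape (c, xi, W, H).
    Returns (L_div, h), where h has shape (c,).
    """
    c, xi, W, H = groups.shape
    flat = groups.reshape(c, xi, W * H)
    # Spatial softmax: each channel becomes a distribution over W * H positions.
    flat = flat - flat.max(axis=2, keepdims=True)
    soft = np.exp(flat)
    soft = soft / soft.sum(axis=2, keepdims=True)
    # CCMP over the xi channels, then a sum over the spatial positions.
    h = soft.max(axis=1).sum(axis=1)
    return h.mean(), h


# Toy input (assumed): 2 classes, 2 channels each, a 2 x 2 spatial grid.
groups = np.zeros((2, 2, 2, 2))
groups[0, :, 0, 0] = 20.0   # class 0: both channels peak at the same position
groups[1, 0, 0, 0] = 20.0   # class 1: channel 0 peaks at the top-left corner
groups[1, 1, 1, 1] = 20.0   # class 1: channel 1 peaks at the bottom-right corner
l_div, h = diversity(groups)
```

In this toy case the redundant group scores about 1 and the diverse group scores just under $\xi = 2$. The `supervisor` function in the code section computes a related quantity, one minus the batch mean of this sum divided by `cnum`.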
Code
import random

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Not defined in this excerpt: my_MaxPool2d, criterion, net (the backbone) and args.
# They come from the repository linked in the abstract.


def Mask(nb_batch, channels):
    # Per class: a group of 3 channels with two kept (1) and one dropped (0).
    foo = [1] * 2 + [0] * 1
    bar = []
    for i in range(200):  # 200 classes, matching classes_num below
        random.shuffle(foo)
        bar += foo
    bar = [bar for i in range(nb_batch)]
    bar = np.array(bar).astype("float32")
    bar = bar.reshape(nb_batch, 200 * channels, 1, 1)
    bar = torch.from_numpy(bar)
    bar = bar.cuda()
    return bar


def supervisor(x, targets, height, cnum):
    mask = Mask(x.size(0), cnum)

    # Diversity branch: spatial softmax, cross-channel max pooling, spatial sum.
    branch = x
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    branch = F.softmax(branch, 2)
    branch = branch.reshape(branch.size(0), branch.size(1), x.size(2), x.size(3))
    branch = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch)
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    loss_2 = 1.0 - 1.0 * torch.mean(torch.sum(branch, 2)) / cnum  # set margin = 3.0

    # Discriminality branch: channel mask, cross-channel max pooling, GAP, cross-entropy.
    branch_1 = x * mask
    branch_1 = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch_1)
    branch_1 = nn.AvgPool2d(kernel_size=(height, height))(branch_1)
    branch_1 = branch_1.view(branch_1.size(0), -1)
    loss_1 = criterion(branch_1, targets)

    return [loss_1, loss_2]


class model_bn(nn.Module):
    def __init__(self, feature_size=512, classes_num=200):
        super(model_bn, self).__init__()
        # Backbone without its last two layers, so the output is a feature map.
        self.features = nn.Sequential(*list(net.children())[:-2])
        self.max = nn.MaxPool2d(kernel_size=14, stride=14)
        self.num_ftrs = 600 * 1 * 1  # 200 classes x 3 channels per class
        self.classifier = nn.Sequential(
            nn.BatchNorm1d(self.num_ftrs),
            # nn.Dropout(0.5),
            nn.Linear(self.num_ftrs, feature_size),
            nn.BatchNorm1d(feature_size),
            nn.ELU(inplace=True),
            # nn.Dropout(0.5),
            nn.Linear(feature_size, classes_num),
        )

    def forward(self, x, targets):
        x = self.features(x)
        if self.training:
            # The MC-Loss is only computed during training.
            MC_loss = supervisor(x, targets, height=14, cnum=3)
        x = self.max(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        loss = criterion(x, targets)
        if self.training:
            return x, loss, MC_loss
        else:
            return x, loss


# Training step (here net is the model_bn instance, not the backbone)
out, ce_loss, MC_loss = net(inputs, targets)
loss = ce_loss + args["alpha_1"] * MC_loss[0] + args["beta_1"] * MC_loss[1]

# Validation step
out, ce_loss = net(inputs, targets)
test_loss += ce_loss.item()