DL / BN-Inception: Introduction to the BN-Inception Algorithm (Paper Overview), Architecture Details, and Example Applications: A Detailed Illustrated Guide
Contents
Introduction to the BN-Inception Algorithm (Paper Overview)
BN-Inception Architecture Details
1. BN-Inception Network: Core Components
5. Experimental Comparison
Example Applications of BN-Inception
Related Articles
DL / InceptionV2/V3: Introduction to InceptionV2 & InceptionV3 (Paper Overview), Architecture Details, and Example Applications: A Detailed Illustrated Guide
DL / BN-Inception: Introduction to the BN-Inception Algorithm (Paper Overview), Architecture Details, and Example Applications: A Detailed Illustrated Guide
DL / BN-Inception: BN-Inception Architecture Details
DL / InceptionV4/ResNet: Introduction to InceptionV4/Inception-ResNet (Paper Overview), Architecture Details, and Example Applications: A Detailed Illustrated Guide
Introduction to the BN-Inception Algorithm (Paper Overview)
BN-Inception is an improved version of Inception developed by researchers at Google.
Abstract
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
Conclusion
We have presented a novel mechanism for dramatically accelerating the training of deep networks. It is based on the premise that covariate shift, which is known to complicate the training of machine learning systems, also applies to sub-networks and layers, and removing it from internal activations of the network may aid in training. Our proposed method draws its power from normalizing activations, and from incorporating this normalization in the network architecture itself. This ensures that the normalization is appropriately handled by any optimization method that is being used to train the network. To enable stochastic optimization methods commonly used in deep network training, we perform the normalization for each mini-batch, and backpropagate the gradients through the normalization parameters. Batch Normalization adds only two extra parameters per activation, and in doing so preserves the representation ability of the network. We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and often do not require Dropout for regularization.
Merely adding Batch Normalization to a state-of-the-art image classification model yields a substantial speedup in training. By further increasing the learning rates, removing Dropout, and applying other modifications afforded by Batch Normalization, we reach the previous state of the art with only a small fraction of training steps – and then beat the state of the art in single-network image classification. Furthermore, by combining multiple models trained with Batch Normalization, we perform better than the best known system on ImageNet, by a significant margin.
Interestingly, our method bears similarity to the standardization layer of (Gülçehre & Bengio, 2013), though the two methods stem from very different goals, and perform different tasks. The goal of Batch Normalization is to achieve a stable distribution of activation values throughout training, and in our experiments we apply it before the nonlinearity since that is where matching the first and second moments is more likely to result in a stable distribution. On the contrary, (Gülçehre & Bengio, 2013) apply the standardization layer to the output of the nonlinearity, which results in sparser activations. In our large-scale image classification experiments, we have not observed the nonlinearity inputs to be sparse, neither with nor without Batch Normalization. Other notable differentiating characteristics of Batch Normalization include the learned scale and shift that allow the BN transform to represent identity (the standardization layer did not require this since it was followed by the learned linear transform that, conceptually, absorbs the necessary scale and shift), handling of convolutional layers, deterministic inference that does not depend on the mini-batch, and batch-normalizing each convolutional layer in the network.
In this work, we have not explored the full range of possibilities that Batch Normalization potentially enables. Our future work includes applications of our method to Recurrent Neural Networks (Pascanu et al., 2013), where the internal covariate shift and the vanishing or exploding gradients may be especially severe, and which would allow us to more thoroughly test the hypothesis that normalization improves gradient propagation (Sec. 3.3). We plan to investigate whether Batch Normalization can help with domain adaptation, in its traditional sense – i.e. whether the normalization performed by the network would allow it to more easily generalize to new data distributions, perhaps with just a recomputation of the population means and variances (Alg. 2). Finally, we believe that further theoretical analysis of the algorithm would allow still more improvements and applications.
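The BN transform summarized in the conclusion — normalize each activation over the mini-batch, apply a learned per-activation scale and shift (the "two extra parameters"), and at inference switch to deterministic population statistics — can be sketched in NumPy. The function names and toy data below are illustrative, not from the paper:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN: normalize each feature over the mini-batch,
    then apply the learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized activations
    return gamma * x_hat + beta

def batchnorm_inference(x, gamma, beta, pop_mean, pop_var, eps=1e-5):
    """Inference-mode BN: deterministic, uses population statistics
    accumulated during training instead of mini-batch statistics."""
    x_hat = (x - pop_mean) / np.sqrt(pop_var + eps)
    return gamma * x_hat + beta

# Toy example: a mini-batch of 4 samples with 3 features each
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(4, 3))
gamma, beta = np.ones(3), np.zeros(3)
y = batchnorm_forward(x, gamma, beta)
# With gamma=1, beta=0 the output has near-zero mean and near-unit
# variance per feature; learned gamma/beta let BN undo this when useful,
# preserving the representation ability of the network.
```

Note that with `gamma` and `beta` learnable, the transform can represent the identity, which is one of the differentiating characteristics the paper highlights.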
Paper
Sergey Ioffe, Christian Szegedy.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,
https://arxiv.org/abs/1502.03167
BN-Inception Architecture Details
DL / BN-Inception: BN-Inception Architecture Details
0. How does the BN algorithm speed up training and convergence?
Batch Normalization serves two purposes: it speeds up training and convergence, and it helps prevent overfitting.
In practice, the BN algorithm forcibly normalizes features to a distribution with zero mean and unit variance. When training a deep network, if each layer's input distribution keeps shifting, the network becomes very hard to train and converge. If instead each layer's data is transformed to zero mean and unit variance, then, first, the distributions are consistent across layers, so training converges more easily; second, inputs with zero mean and unit variance produce larger gradients, which speeds up parameter updates. More intuitively, BN pulls the data out of the saturated region of the nonlinearity into the non-saturated region. This also helps control exploding and vanishing gradients, since both phenomena are gradient-related.
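The "pulling data out of the saturated region" argument can be illustrated numerically: the derivative of a sigmoid at large-magnitude inputs is nearly zero, while after normalization the same inputs sit where the gradient is large. The values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

# Pre-activations that have drifted into the saturated region
saturated = np.array([-8.0, -6.0, 6.0, 8.0])
# The same values normalized to zero mean and unit variance
normalized = (saturated - saturated.mean()) / saturated.std()

g_sat = sigmoid_grad(saturated)    # vanishingly small gradients
g_norm = sigmoid_grad(normalized)  # orders of magnitude larger
```

Every gradient after normalization is more than ten times larger than the largest gradient in the saturated region, which is why parameter updates proceed much faster.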
1. BN-Inception Network: Core Components
- Batch Normalization. Significance: BN has since become a standard technique in almost all convolutional neural networks.
- Replacing each 5x5 convolution with two stacked 3x3 convolutions, which give the same receptive field.
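The equivalence claimed in the second bullet can be checked with simple arithmetic: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution while using fewer weights. A small sketch (the channel count `c` is an arbitrary example):

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked convolutions with the same
    square kernel and stride 1: each layer adds (kernel - 1) to the span."""
    return 1 + layers * (kernel - 1)

def conv_params(kernel, c_in, c_out):
    """Weight count of one convolution, ignoring bias/BN parameters."""
    return kernel * kernel * c_in * c_out

c = 64
rf_5x5 = stacked_receptive_field(5, 1)      # one 5x5 layer
rf_two_3x3 = stacked_receptive_field(3, 2)  # two 3x3 layers: same span
p_5x5 = conv_params(5, c, c)                # 25 * c * c weights
p_two_3x3 = 2 * conv_params(3, c, c)        # 18 * c * c weights
```

The two-layer version needs 18/25 of the weights (28% fewer) and inserts an extra nonlinearity between the layers.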
5. Experimental Comparison
On the provided validation set of 50,000 images, Batch-Normalized Inception is compared with the previous state of the art. *According to the test server, the BN-Inception ensemble reached 4.82% top-5 error on the 100,000 images of the ImageNet test set.
BN-Inception Ensemble denotes the result obtained by ensembling multiple network models.
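At its simplest, the kind of ensembling behind such a result averages the class-probability outputs of several independently trained models. A toy sketch — the probabilities below are invented for illustration, not results from the paper:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average the class-probability outputs of several models and
    rank classes by the averaged distribution, best first."""
    avg = np.mean(prob_list, axis=0)
    return avg, np.argsort(avg)[::-1]

# Three hypothetical "models" predicting over 4 classes
p1 = np.array([0.6, 0.2, 0.1, 0.1])
p2 = np.array([0.5, 0.3, 0.1, 0.1])
p3 = np.array([0.4, 0.4, 0.1, 0.1])
avg, ranking = ensemble_predict([p1, p2, p3])
# avg is still a valid distribution, and the top-5 error is computed
# from the highest-ranked entries of `ranking`.
```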
Example Applications of BN-Inception
TF / DD: Printing the shape of a given convolutional layer (or all convolutional layers) inside the Inception model
TF / DD: Generating the original Deep Dream image with the Inception model + GD algorithm
TF / DD: Generating larger Deep Dream images with the Inception model + GD algorithm
TF / DD: Generating higher-quality Deep Dream images with the Inception model + GD algorithm
TF / DD: Inception model + GD algorithm: five architectural design ideas
TF / DD: Generating large, high-quality Deep Dream images with a background, using the Inception model + GD algorithm