Using Dropout with Neural Networks: Not a Magic Bullet
Overfitting occurs when a model shows high accuracy on the training data (the data used to build the model) but low accuracy on the test data (unseen data the model has not encountered before).
This can be a particular problem when building a neural network on a small dataset. The network may be of such a size that it "overtrains" on the training data and therefore performs poorly when predicting new data.
Role of Dropout in Regularizing Neural Networks
At its most basic, Dropout literally "drops out" certain neurons from the neural network. This prevents excessive "noise" in the network that artificially inflates training accuracy without transferring any meaningful information to the output layer; in other words, any increase in training accuracy comes from excessive training rather than from useful information in the model features themselves.
Dropout renders certain nodes in the network inactive as illustrated in the image at the beginning of this article — thus forcing the network to look for more meaningful patterns that influence the output layer.
While Dropout can technically be used in both the input and hidden layers — it is most common to use Dropout across the hidden layers, as using it on the input layer still risks discarding important information.
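To make this concrete, here is a minimal sketch (not the author's exact code) of where a Dropout layer typically sits in a Keras model; the layer sizes here are purely illustrative:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='elu', input_shape=(8,)),  # hidden layer
    Dropout(0.2),  # randomly zeroes 20% of hidden activations on each training step
    Dense(1, activation='linear')                   # linear output for regression
])
```

Note that Keras disables Dropout automatically at inference time, so predictions always use the full network.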
Predicting Average Daily Rates for Hotels: A Regression-Based Neural Network
To investigate the effectiveness of Dropout in predicting the output layer, let’s use a regression-based neural network to predict ADR (average daily rates) for customers at a hotel.
The original research by Antonio, Almeida, and Nunes (2016) is available in the References section below.
The following features are used to predict ADR:
Datasets
Let’s consider two training datasets.
Dataset 1 is the original dataset with 40,060 observations. Dataset 2 is a smaller version of the original with 100 observations.
A regression-based neural network model is built on each dataset in order to predict ADR values across the test set (a separate dataset). The datasets and code for this example are available in the References section below.
Neural Networks without Dropout
Dataset 1 — Model Configuration
- 8 input neurons are used in the network (one for each feature).
- ELU is used as the activation function.
- A linear output layer is used.
- 1,669 hidden nodes are used in the hidden layer.
The number of hidden nodes in the layer is determined as follows:
N_h = N_s / (α × (N_i + N_o))

where N_s is the number of samples in the training set, N_i the number of input neurons, N_o the number of output neurons, and α a chosen scaling factor (formula based on an answer from Cross Validated). With 30,045 samples in our training set (after partitioning Dataset 1 into training and validation portions), a chosen factor of 2, as well as 8 input neurons and 1 output neuron, this gives 1,669 hidden nodes.
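As a quick sanity check, the heuristic can be computed directly (a small sketch; the variable names are mine):

```python
# Rule-of-thumb hidden-node count described above
n_samples = 30045           # training samples after the train/validation split
alpha = 2                   # chosen scaling factor
n_inputs, n_outputs = 8, 1  # input and output neurons

n_hidden = int(n_samples / (alpha * (n_inputs + n_outputs)))
print(n_hidden)  # 1669
```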
Here is the structure of the neural network:
Source: Jupyter Notebook output

The model is trained using 30 epochs, a batch size of 150, and a validation split of 20%.
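A hedged sketch of what that training call might look like in Keras (assuming model, X_train, and y_train are already defined):

```python
history = model.fit(
    X_train, y_train,
    epochs=30,            # as stated above
    batch_size=150,
    validation_split=0.2  # 20% of the training data held out for validation
)
```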
Here is the training and validation loss:
Source: Jupyter Notebook output

When the predictions are compared to the test set, the following errors are obtained:
Mean Absolute Error: 29.89
Root Mean Squared Error: 43.91
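For reference, these two metrics can be computed with scikit-learn as follows (an illustrative sketch; y_test and predictions are assumed to hold the test targets and model output):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```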
Dataset 2 — Model Configuration
Using the condensed dataset with only 100 observations, let us now see what the errors look like when using a much smaller dataset.
The overall configuration of the network remains the same — but this time a hidden layer with 5 nodes is used — as a dense layer of 1,669 nodes would almost certainly lead to overfitting with such a small training set.
The network configuration is as follows:
Source: Jupyter Notebook output

The errors obtained on the test set are as follows:
Mean Absolute Error: 39.08
Root Mean Squared Error: 53.59
Clearly, there has been an increase in errors when training on a smaller dataset, which indicates that the model is not performing as well on unseen data. Let’s see what happens when Dropout is introduced.
Neural Networks with Dropout
The same neural network as above is run, but this time using 20% Dropout. In other words, each node in the hidden layer has a 20% probability of being dropped on any given training update, in order to prevent overfitting.
Source: Jupyter Notebook output

The results obtained are as follows:
Mean Absolute Error: 39.96
Root Mean Squared Error: 54.83
We see that this has not had the desired effect of improving accuracy on the test set, and the errors have in fact risen slightly.
Let’s try 40% Dropout.
Mean Absolute Error: 41.97
Root Mean Squared Error: 57.23
Again, the errors have increased substantially. This indicates that rather than reducing overfitting, Dropout is eliminating valuable information from the neural network, resulting in lower prediction accuracy.
Increasing Hidden Layers
What if, instead of using Dropout, two hidden layers (5 nodes each) are used in place of the single hidden layer?
Here is the updated model configuration:
Source: Jupyter Notebook output

Under this configuration, the reported errors have decreased considerably, on par with those seen when the larger dataset was used:
Mean Absolute Error: 29.06
Root Mean Squared Error: 43.42
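For reference, a minimal sketch of what this two-hidden-layer configuration might look like in Keras, assuming the same 8 input features, ELU activations, and a linear output (the Adam optimizer and MSE loss are my assumptions, not confirmed by the article):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(5, activation='elu', input_shape=(8,)),  # first hidden layer
    Dense(5, activation='elu'),                    # second hidden layer
    Dense(1, activation='linear')                  # regression output
])
model.compile(optimizer='adam', loss='mse')  # assumed optimizer and loss
```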
With Proper Feature Selection, Dropout Can Become Redundant
Why has Dropout not worked as we intended in this case?
One important thing to remember about this neural network is that the features for the input layer were selected before fitting the neural network.
This was done using feature selection tools such as ExtraTreesClassifier and forward and backward feature selection, as well as by manually determining whether the included features make theoretical sense for predicting ADR values.
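A sketch of the tree-based ranking step might look as follows (not the author's exact code; X, y, and feature_names are assumed to be defined, with y a categorical target — for a continuous target such as ADR, ExtraTreesRegressor exposes the same feature_importances_ attribute):

```python
from sklearn.ensemble import ExtraTreesClassifier

# X is the feature matrix, y the target, and feature_names the
# corresponding column names -- all assumed to exist already.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank features by importance, highest first
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```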
In this regard, one can make the argument that with proper feature selection, Dropout serves little purpose and may instead simply eliminate valuable information from the network.
In this case, adding another hidden layer to the smaller network appears to have been sufficient in accounting for the additional variation in the output layer.
While Dropout can be of use if there are many irrelevant features in the input layer — proper feature selection in the first instance would mean that inducing Dropout in a neural network becomes unnecessary.
Conclusion
As we have seen, Dropout did not have the desired effect in improving test accuracy — even in the case of a smaller dataset.
From this standpoint, proper feature selection prior to building a neural network will in most cases prove superior to arbitrarily applying Dropout in order to reduce overfitting. As with any model, ensuring that the variables make theoretical sense will often produce better results.
Many thanks for your time, and grateful for any comments or feedback. The code and datasets for this example is available in the MGCodesandStats GitHub repository as referenced below.
Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.
Translated from: https://towardsdatascience.com/using-dropout-with-neural-networks-not-a-magic-bullet-2fc3e4b17898