Neural networks: the difference between L1 and L2 regularization, how to choose between them, and verifying with code
The so-called regularization effect is this:
mathematically, the penalty term gives the solution certain properties.
In plain language, what is regularization?
It makes the solutions obtained by the Lagrange-multiplier method we learned as undergraduates exhibit certain characteristics.
L1 (the exponential term of a Laplace distribution):
the result tends to be sparse (weights are either close to 0 or fairly large);
the upside is faster feature selection, since many weights W are driven toward 0,
but the shrinkage effect may be less pronounced;
L2 (the exponential term of a Gaussian distribution):
L2 shrinks W for unimportant features, but never makes them exactly 0.
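This contrast is easy to reproduce with scikit-learn's linear models before touching neural networks. A minimal sketch; the synthetic dataset and the penalty strengths (alpha=0.1 for L1, alpha=1.0 for L2) are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression: 20 features, only the first 3 actually matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 zeroes out the irrelevant coefficients exactly;
# L2 only shrinks them, leaving none exactly at 0.
n_zero_l1 = int(np.sum(lasso.coef_ == 0.0))
n_zero_l2 = int(np.sum(ridge.coef_ == 0.0))
print(n_zero_l1, n_zero_l2)
```

With this setup the L1 model sets most of the 17 irrelevant coefficients to exactly zero, while the L2 model keeps all 20 coefficients nonzero, only smaller.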
How should we choose between L1 and L2?
Generally, pick the regularizer that matches your prior on the weights (in fairness, the Gaussian and Laplace distributions look quite similar).
Google's take is:
L1 regularization can’t help with multicollinearity.
L2 regularization can’t help with feature selection.
In plain language:
when you want to extract rules (select features), prefer L1;
when you want features to be combined linearly, prefer L2.
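The two quoted claims can be illustrated with an exactly duplicated feature, the extreme case of multicollinearity. A sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Two exactly duplicated columns: the worst case of multicollinearity.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
X = np.hstack([x, x])                     # column 0 == column 1
y = 3.0 * x[:, 0] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)        # L1
ridge = Ridge(alpha=1.0).fit(X, y)        # L2

print(lasso.coef_)   # L1 puts the weight on one copy and zeroes the other
print(ridge.coef_)   # L2 splits the weight evenly between the two copies
```

L1 picks one of the duplicates essentially arbitrarily (so it can't "explain" correlated features), while L2's unique solution spreads the weight across them as a linear combination, which is exactly the behavior the two quotes describe.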
Why does L1 produce sparsity?
Almost no blog post online explains this clearly.
Remember the Lagrange multipliers from undergrad?
Here I use a figure from
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models
to illustrate:
The ellipses in the figure are the contours of the original, unregularized loss function;
the green region is the constraint, and the final solution lands on the boundary of the green region.
One statement above was imprecise:
what is actually used here is the "generalized Lagrangian" (which handles inequality constraints, via the KKT conditions),
whereas what we learned as undergraduates was the Lagrangian for equality constraints.
So can L2 also produce sparse solutions? It can, but with low probability, because the L2 constraint region is a circle: the loss contours rarely touch it exactly on a coordinate axis.
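That geometric argument can be checked numerically. For an isotropic quadratic loss, the constrained minimizer is just the Euclidean projection of the unconstrained optimum onto the constraint ball, so we can count how often that projection has an exact zero coordinate. A sketch; the ball radius and the sampling distribution are arbitrary choices:

```python
import numpy as np

def project_l1(a, r=1.0):
    """Euclidean projection of a 2-D point a onto the L1 ball of radius r."""
    if np.abs(a).sum() <= r:
        return a.copy()
    u = np.sort(np.abs(a))[::-1]              # magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u - (css - r) / np.arange(1, 3) > 0)[0][-1]
    theta = (css[k] - r) / (k + 1.0)
    return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)

def project_l2(a, r=1.0):
    """Euclidean projection onto the L2 ball: just rescale."""
    n = np.linalg.norm(a)
    return a if n <= r else a * (r / n)

rng = np.random.default_rng(0)
hits_l1 = hits_l2 = 0
trials = 2000
for _ in range(trials):
    a = rng.normal(0.0, 2.0, size=2)          # random unconstrained optimum
    hits_l1 += np.any(project_l1(a) == 0.0)   # landed on an L1 "corner"?
    hits_l2 += np.any(project_l2(a) == 0.0)   # L2: rescaling never zeroes a coord

print(hits_l1 / trials, hits_l2 / trials)
```

A sizable fraction of the L1 projections land on a corner of the diamond (one coordinate exactly 0), while the L2 projections, which merely rescale the point, produce an exact zero with probability zero.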
Enough theory; what about code?
You can use the third experiment in Chapter 4 of Deep Learning with Python.
The network architecture is 10000 × 16 × 16 × 1.
To get results quickly, set epochs=1.
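For reference, here is a sketch of what that setup looks like in Keras. The penalty strength 0.001 is an assumption borrowed from the book's L2 example; the rest follows the 10000 × 16 × 16 × 1 architecture above:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(reg):
    """IMDB-style binary classifier, 10000 -> 16 -> 16 -> 1."""
    model = keras.Sequential([
        keras.Input(shape=(10000,)),
        layers.Dense(16, activation="relu", kernel_regularizer=reg),
        layers.Dense(16, activation="relu", kernel_regularizer=reg),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Same architecture, two different penalties (0.001 is an assumed strength).
model_l1 = build_model(regularizers.l1(0.001))
model_l2 = build_model(regularizers.l2(0.001))

# After model.fit(x_train, y_train, epochs=1, ...), inspect the weights with:
# weights = model_l1.get_weights()
```

After one epoch of training, `get_weights()` returns the kernel and bias arrays whose printouts are discussed below.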
The weight output under L1 regularization is as follows:
[L1 weight printout omitted]
We can see that many of the weights are on the order of 1e-4, i.e., essentially 0.
So what does L1's sparsity actually mean?
Not, as many posts online claim, that many weights are exactly 0,
but that many weights are close to 0.
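A plausible explanation for "close to 0 but not exactly 0": Keras simply adds the L1 penalty to the loss and takes (sub)gradient steps, which generically oscillate around 0 instead of landing on it; exact zeros come from proximal (soft-thresholding) updates, which SGD-style optimizers do not perform. A 1-D toy sketch, with the loss, penalty weight, and learning rate all made up:

```python
import numpy as np

# Toy objective: f(w) = 0.5*(w - 1)^2 + lam*|w|.
# Its true minimizer is exactly 0 whenever lam >= 1.
lam, lr = 2.0, 0.01

# Plain subgradient descent, i.e. what happens when the L1 penalty
# is just added to the loss (the Keras approach).
w = 1.0
for _ in range(1000):
    w -= lr * ((w - 1.0) + lam * np.sign(w))
print(w)   # small and oscillating around 0, but not exactly 0

# A proximal (soft-thresholding) step lands on 0 exactly.
def soft_threshold(x, t):
    return np.sign(x) * max(abs(x) - t, 0.0)

w_prox = soft_threshold(1.0, lam)
print(w_prox)  # 0.0
```

So the e-4 values in the printout are consistent with gradient-based training: the penalty keeps pushing the weights toward zero but never pins them there.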
Then run the same code with L2 regularization and look at the output weights:
Output weights (excerpt):
[array([[ 5.4510499e-19, -1.1375406e-14,  1.2810104e-09, ...,  6.6694220e-13,  1.7195195e-21,  1.1844387e-18],
        [ 2.2681307e-02,  2.8639721e-02, -5.0795679e-03, ...,  3.2248314e-02,  3.5097659e-02,  1.9943751e-02],
        ...], dtype=float32),
 array([0.01308027, 0.02083343, 0.01824512, ..., 0.04340347], dtype=float32),
 array([[ 2.93407321e-01, -1.98400989e-02, -4.40114766e-01, ...,  4.10211772e-01],
        ...], dtype=float32),
 array([ 0.01596233, -0.00421972, -0.0345025 ,  0.03105383,  0.00776252, ...
We can see that many of the weights are on the order of 1e-2, but under L2 regularization you basically never see values on the order of 1e-4. In other words,
L1 is "more likely" than L2 to produce sparse weights.
Note:
L1 is not the only thing that can produce sparsity.