Hands-On Introduction to Neural Networks Series (2): Handwritten Digit Recognition in 74 Lines of Code
Authors: 龍心塵 && 寒小陽
Date: December 2015
Source: http://blog.csdn.net/longxinchen_ml/article/details/50281247
Notice: All rights reserved. Please contact the authors and credit the source before reposting. Thank you.
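1. Introduction: Don't Learn to Swim Standing on the Shore
Machine learning is a very hands-on discipline. It is like learning to swim: rehearsing a set of prescribed strokes on the shore teaches you less than jumping into the water and getting a feel for it. In our experience of learning machine learning, it does not matter if many lofty concepts are unclear at first. Write something, run it, get a feel for it, and the concepts and theory come much faster afterwards. If someone has already built the wheel, using it directly is faster still. This article therefore uses Michael Nielsen's code (GitHub repository, zip archive) as its example to walk through the general process of neural network analysis: importing data, training a model, optimizing the model, and building a heuristic understanding.
This article assumes you already know basic Python syntax and have run simple Python scripts on your own machine.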
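2. The Problem We Want to Solve: Handwritten Digit Recognition
Handwritten digit recognition is a classic problem in machine learning: it looks trivial for humans yet is very hard for a program. Many early CAPTCHAs exploited exactly this gap to tell humans from programs. (We will not get into 12306's almost inhumanly bizarre CAPTCHAs here.)
Back to handwritten digit recognition. To recognize a handwritten "9", a human might simply notice "a loop in the upper half with a vertical stroke running down from its lower right". Expressing that in a program is much harder: you have to consider a great many ways of describing the shape and a great many special cases, and you end up with a program that is very complicated and still performs poorly.
The (machine learning) neural network approach offers a different line of attack: collect a large number of handwritten digit images whose true digits are known, use them as the training set, and automatically generate a model (for example, the program corresponding to a neural network) that is then used to recognize new handwritten digits.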
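The data set used in this article is the well-known MNIST data set. One of its collectors is Yann LeCun, a renowned scientist in artificial intelligence. The data set contains 60,000 training samples and 10,000 test samples. The single-hidden-layer neural network shown in this article can reach 96% accuracy on it.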
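3. In Pictures: How We Approach the Problem
The figure below shows the rough idea described above.
But how do we "generate a model" from the "training set"?
Here we use the often-recommended technique of working backwards: assume the model has already been generated, ask what properties it should have, then use those properties as conditions to solve for the model.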
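It stands to reason that the generated model should separate the training set very well, that is, its training error should be very low. Take a neural network whose weights and biases are still unknown: training the network is the process of progressively pinning down these unknown parameters so that the model they define reaches the minimum error on the training set. We will design a numeric metric to measure this error; as long as the training error has not reached its minimum, we keep adjusting the parameters until it does. Even so, we cannot guarantee that a model trained this way will recognize new data equally well, so we need to examine it on a test set and use the test result as the evaluation of the model. The figure above can therefore be refined into the next figure.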
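The loop described above, adjusting unknown parameters until a numeric error measure stops improving, can be sketched on a toy problem. This is only an illustration and not Nielsen's code: the one-weight, one-bias linear model, the mean squared error measure, the synthetic data, and the names `training_error` and `train` are all assumptions chosen to keep the example short.

```python
# Toy sketch (assumed model y = w*x + b, not the network from this article):
# repeatedly adjust the parameters w and b to lower the training error.

def training_error(w, b, data):
    """Mean squared error of the model on the training set."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, eta=0.05, steps=2000):
    """Plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step against the gradient; eta is the step size.
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical training set generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(5)]
w, b = train(data)
print(round(w, 3), round(b, 3))      # the parameters approach 2 and 1
print(training_error(w, b, data))    # the training error approaches 0
```

The neural network in this article follows the same pattern, only with far more parameters and with the gradients supplied by backpropagation.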
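But if we have already generated several models, how do we pick the best one? A natural idea is to compare the errors of the different models on the test set and choose the one with the smallest error. This seems unproblematic, but as the number of models you test grows, the model selected by the test set becomes less trustworthy. For example, adding a node to a hidden layer creates new weights and hence a new model. We do not know how many nodes are appropriate, so the thorough approach is to try different node counts x ∈ {1, 2, 3, 4, …, 100}, observe the test error of each resulting model, and pick the one with the smallest error. At this point our model has in effect gained an extra parameter x, and choosing the model amounts to determining the optimal value of x. This is exactly the same line of reasoning as training the parameters above! The only difference is that it is carried out on one and the same test set rather than on the training set. Is the number of layers of the network then also a new parameter y ∈ {1, 2, 3, 4, …, 100} that has to be "trained" through the same process?
We find that many things we held fixed while generating the model can in fact be varied, and these are new parameters too: the number of training iterations, the step size of gradient descent, the regularization parameter, the number of learning epochs, the mini-batch size, and so on. We call them hyperparameters. Hyperparameters are parameters that affect the final values of the parameters being solved for; they are the framework-level parameters of a machine learning model and can be thought of as parameters of the parameters. They are usually set by hand and adjusted by trial and error, or determined by enumerating an exhaustive list of combinations (grid search). Either way, the result comes from repeatedly validating and optimizing on one and the same data set, and what works best on that data set will not necessarily keep working on new data. So to evaluate how well the model really recognizes digits, we need to examine it on a fresh test set and use that test result as the evaluation of the model. We simply call this new set the "test set", and we call the earlier set used to select hyperparameters the "cross-validation set". Selecting a model is in fact a process of cross-validation.
The standard practice is therefore to split the data set into three sets, a training set, a cross-validation set, and a test set, then train the parameters and tune the hyperparameters in turn to arrive at the best model.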
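A three-way split of this kind can be sketched in a few lines. The helper name `split_dataset`, the set sizes, and the integer stand-ins for samples are assumptions made for illustration; they are not how `mnist_loader` is implemented.

```python
import random

def split_dataset(samples, n_validation, n_test, seed=0):
    """Shuffle a copy of the samples and cut it into three disjoint sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test

# Hypothetical stand-in for 1000 labelled samples.
samples = list(range(1000))
training, validation, test = split_dataset(samples, n_validation=100, n_test=100)
print(len(training), len(validation), len(test))  # 800 100 100
```

Shuffling before cutting keeps each set representative of the whole, and because the three slices do not overlap, a sample used for tuning never leaks into the final evaluation.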
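The figure above can therefore be refined further into the following figure.
Or into this one.
Machine learning is thus a process of repeated iteration and continual optimization, and a large share of the work goes into adjusting parameters and hyperparameters.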
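To make the division of labour concrete, here is a minimal grid-search sketch: parameters are fitted on the training set, hyperparameters are chosen on the cross-validation set, and the test set is touched only once at the end. Everything in it is an assumption for illustration: the toy linear model, the synthetic noisy data, the grid values, and the helper names `fit` and `error`.

```python
import itertools
import random

def fit(train_set, eta, epochs):
    """Train the toy model y = w*x + b with gradient descent."""
    w, b = 0.0, 0.0
    n = len(train_set)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_set) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_set) / n
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b

def error(model, data_set):
    """Mean squared error of a fitted model on a data set."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data_set) / len(data_set)

# Synthetic data (assumed): y = 3x - 2 plus a little Gaussian noise.
rng = random.Random(0)
points = []
for _ in range(300):
    x = rng.random()
    points.append((x, 3 * x - 2 + rng.gauss(0, 0.1)))
train_set, validation_set, test_set = points[:200], points[200:250], points[250:]

# Hyperparameter grid: every combination is trained and scored on validation data.
grid = {"eta": [0.01, 0.1, 0.5], "epochs": [10, 100, 1000]}
results = {}
for eta, epochs in itertools.product(grid["eta"], grid["epochs"]):
    model = fit(train_set, eta, epochs)
    results[(eta, epochs)] = error(model, validation_set)

# Pick the combination with the lowest validation error, then test it once.
best_eta, best_epochs = min(results, key=results.get)
final_model = fit(train_set, best_eta, best_epochs)
print("best hyperparameters:", best_eta, best_epochs)
print("test error:", error(final_model, test_set))
```

The test error printed at the end is a fair estimate only because the test set played no part in choosing `eta` or `epochs`.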
4. Run It First: A First Pass Through the Code
Michael Nielsen's code is very well encapsulated: the following 5 lines are enough to build a neural network, test it, and reach 94.76% accuracy!
```python
import mnist_loader
import network

# Split the data set into three sets: training set, cross-validation set, test set
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
# Create the neural network object: three layers with (784, 30, 10) nodes
net = network.Network([784, 30, 10])
# Train the network (weights and biases) with mini-batch gradient descent and report test results.
# Training epochs = 30, mini-batch size for stochastic gradient descent = 10, learning rate = 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```

- The first command splits the data set into three sets: the training set, the cross-validation set, and the test set.
- The second command creates the neural network object, a three-layer network whose layers have (784, 30, 10) nodes.
- The third command trains the network (its weights and biases) with mini-batch gradient descent and produces the test results.
- This command sets three hyperparameters: number of training epochs = 30, mini-batch size used for stochastic gradient descent = 10, and step size = 3.0.
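This article does not go into the details of stochastic gradient descent. Interested readers can refer to the earlier article "Deep Learning and Computer Vision Series (4): Optimization and Stochastic Gradient Descent".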
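As a rough sense of scale for the commands above, the following arithmetic sketch counts the parameters being trained and the number of parameter updates performed. It assumes that `training_data` holds 50,000 samples, which is the size of the training split Nielsen's loader produces after setting 10,000 samples aside for validation; the variable names are illustrative only.

```python
sizes = [784, 30, 10]            # layer sizes passed to network.Network
epochs, mini_batch_size = 30, 10
n_training = 50000               # assumed number of samples in training_data

# One weight per connection between adjacent layers, one bias per non-input node.
n_weights = sum(x * y for x, y in zip(sizes[:-1], sizes[1:]))
n_biases = sum(sizes[1:])

# Each mini-batch triggers one update of all weights and biases.
updates_per_epoch = n_training // mini_batch_size

print(n_weights, n_biases)                            # 23820 40
print(updates_per_epoch, updates_per_epoch * epochs)  # 5000 150000
```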
The complete output is as follows:
```
Epoch 0: 9045 / 10000
Epoch 1: 9207 / 10000
Epoch 2: 9273 / 10000
Epoch 3: 9302 / 10000
Epoch 4: 9320 / 10000
Epoch 5: 9320 / 10000
Epoch 6: 9366 / 10000
Epoch 7: 9387 / 10000
Epoch 8: 9427 / 10000
Epoch 9: 9402 / 10000
Epoch 10: 9400 / 10000
Epoch 11: 9442 / 10000
Epoch 12: 9448 / 10000
Epoch 13: 9441 / 10000
Epoch 14: 9443 / 10000
Epoch 15: 9479 / 10000
Epoch 16: 9459 / 10000
Epoch 17: 9446 / 10000
Epoch 18: 9467 / 10000
Epoch 19: 9470 / 10000
Epoch 20: 9459 / 10000
Epoch 21: 9484 / 10000
Epoch 22: 9479 / 10000
Epoch 23: 9475 / 10000
Epoch 24: 9482 / 10000
Epoch 25: 9489 / 10000
Epoch 26: 9489 / 10000
Epoch 27: 9478 / 10000
Epoch 28: 9480 / 10000
Epoch 29: 9476 / 10000
```

5. How a Neural Network Recognizes Handwritten Digits: A Heuristic Understanding
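First, let us explain what each layer of the network does.
The first layer is the input layer. Each handwritten digit sample in the MNIST data set is a 28×28-pixel image, so the input for each sample is the grayscale value of every pixel. There are 28×28 = 784 pixels in total, hence this layer has 784 nodes.
The third layer is the output layer. There are 10 Arabic digits, so we need to sort samples into 10 classes, and the output layer therefore has 10 nodes. When a sample belongs to a class (a digit), the node for that class is 1 and the remaining 9 nodes are 0, for example [0,0,0,1,0,0,0,0,0,0].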
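The output encoding just described can be sketched as follows. The helper names `vectorized_label` and `decode` are hypothetical, and plain Python lists are used for brevity; this is an illustration of the idea rather than the loader's actual code.

```python
def vectorized_label(digit):
    """Return a 10-element list with 1.0 at position `digit` and 0.0 elsewhere."""
    label = [0.0] * 10
    label[digit] = 1.0
    return label

def decode(output):
    """Pick the index of the largest activation as the predicted digit."""
    return max(range(len(output)), key=lambda i: output[i])

print(vectorized_label(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode([0.1, 0.0, 0.2, 0.9, 0.0, 0.1, 0.0, 0.3, 0.0, 0.1]))  # 3
```

A trained network rarely outputs exact zeros and ones, which is why decoding takes the position of the largest activation instead of looking for a node that equals 1.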
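Each sample (an image of a handwritten digit) can thus be described by a very long 784-dimensional feature vector, together with a 10-dimensional vector giving the class it belongs to (the true digit it represents), also called its label.
This is exactly how the MNIST data is represented. To see the 784-dimensional feature vector of the n-th sample in the training set, look at training_data[n][0]; to see its label, look at training_data[n][1].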
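The layout of one sample can be imitated with a synthetic image. The image below is made up for illustration, and plain lists are used as a simplifying assumption; the real loader stores the feature vector and the label as NumPy arrays.

```python
# Hypothetical 28x28 grayscale image: a bright vertical stroke on a dark background.
image = [[1.0 if col == 14 else 0.0 for col in range(28)] for row in range(28)]

# Flatten row by row into the 784-dimensional feature vector.
features = [pixel for row in image for pixel in row]

# 10-dimensional label marking the digit 1.
label = [1.0 if i == 1 else 0.0 for i in range(10)]

# A data set in the (features, label) layout described above.
training_data = [(features, label)]

n = 0
x, y = training_data[n][0], training_data[n][1]
print(len(x), len(y))  # 784 10
print(sum(x))          # 28.0
```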
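How, then, should we interpret the second layer? This is genuinely hard. But we can form a heuristic understanding: let a node in the middle layer stand for a particular pattern in a small region of the image. We can then suppose that the first 4 nodes of the middle layer detect whether such patterns are present in the upper-left, upper-right, lower-left, and lower-right regions of the image, respectively.
If all four nodes have high values, all four regions match their patterns at once. Piecing the four parts together, we find that the input sample is very likely a handwritten "0"!
So several neurons in the same layer being activated together means the input sample is very likely a particular digit.
Of course, this is only a heuristic understanding of how a neural network works; the real process is not necessarily like this. Still, the heuristic gives us a more intuitive picture of the mechanism.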
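The heuristic can be turned into a tiny numeric sketch: one output neuron for "0" that looks only at the four region-detector nodes. The weights and bias below are hand-picked assumptions, not values learned by the network, and the function name `zero_neuron` is hypothetical.

```python
import math

def sigmoid(z):
    """The sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def zero_neuron(hidden, weights=(5.0, 5.0, 5.0, 5.0), bias=-17.5):
    """Output neuron for "0": fires only when all four region detectors are high.

    `hidden` holds the activations of the upper-left, upper-right,
    lower-left, and lower-right detectors (hand-picked weights, assumed).
    """
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(z)

print(zero_neuron([1.0, 1.0, 1.0, 1.0]) > 0.9)  # True: all four regions match
print(zero_neuron([1.0, 1.0, 1.0, 0.0]) < 0.1)  # True: one region is missing
```

The negative bias is what makes the neuron demand all four detectors at once: three active inputs are not enough to push the weighted sum above zero.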
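As we can see, the key to a neural network's ability to recognize handwritten digits is that specific images excite specific nodes. And the reason the network can excite those nodes selectively is that it has weights and biases adapted to the problem at hand. So how are these weights and biases trained?
6. How a Neural Network Is Trained: Reading the Code More Closely
The sections above used diagrams to present the general machine learning approach to solving a problem, but how exactly is a neural network trained?
The quickest way is to read the code directly. The figure below lays out the structure of the code, using its built-in function names to label the basic steps, and it turns out to match the approach analyzed above exactly:
A brief explanation. In the neural network model: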
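- The key parameters to be solved for are the network's weights (self.weights) and biases (self.biases).
- The hyperparameters are: number of hidden-layer nodes = 30, number of training epochs (epochs) = 30, mini-batch size used for stochastic gradient descent (mini_batch_size) = 10, and step size (eta) = 3.0.
- Stochastic gradient descent is used to adjust the parameters:
- Backpropagation computes the gradients (partial derivatives) that stochastic gradient descent needs: backprop()
- The training error is measured by subtracting the label vector from the output vector: cost_derivative() = output_activations-y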
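The last item can be checked numerically. Assuming the cost for a single sample is the quadratic cost C = ½‖a − y‖² (an assumption consistent with the derivative `output_activations-y`, though the cost itself is not written out in this article), its partial derivative with respect to each output activation is exactly a − y. The sketch below compares that analytic form against a finite-difference estimate; the helper names `quadratic_cost` and `numerical_derivative` and the sample vectors are illustrative.

```python
def quadratic_cost(a, y):
    """Assumed per-sample cost: half the squared distance between output and label."""
    return 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y))

def cost_derivative(a, y):
    """Analytic derivative of the quadratic cost: output minus label."""
    return [ai - yi for ai, yi in zip(a, y)]

def numerical_derivative(a, y, h=1e-6):
    """Central finite-difference estimate of dC/da, one component at a time."""
    grads = []
    for i in range(len(a)):
        up, down = list(a), list(a)
        up[i] += h
        down[i] -= h
        grads.append((quadratic_cost(up, y) - quadratic_cost(down, y)) / (2 * h))
    return grads

a = [0.8, 0.1, 0.3]   # hypothetical output activations
y = [1.0, 0.0, 0.0]   # hypothetical label vector
analytic = cost_derivative(a, y)
numeric = numerical_derivative(a, y)
print(all(abs(p - q) < 1e-6 for p, q in zip(analytic, numeric)))  # True
```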
The full code is as follows (only 74 lines once the comments are removed):
```python
"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
```

7. How to Optimize the Neural Network: Training Hyperparameters and Comparing Multiple Models
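As the analysis above shows, a neural network can be programmed in just 74 lines of code. The truly hard part of machine learning is therefore not the programming; it is understanding the mathematics itself and how it maps onto the real-world problem. Only with that understanding can you write a program like this and fine-tune it.
Our initial result is already 94.76% accuracy. But what if we want to push the accuracy higher?
This is an open-ended question with many approaches worth trying. What follows is only a starting point.
First, the hidden layer has only 30 nodes. From our earlier heuristic view of the hidden layer, we can guess that the network's recognition ability is positively correlated with the hidden layer's ability to pick out details. With more hidden nodes, recognition should improve. So let's try 100 hidden nodes.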
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; background-color: transparent; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background-position: initial initial; background-repeat: initial initial;"># Layers: 784 input pixels, 100 hidden nodes, 10 output digits
net = network.Network([784, 100, 10])
# 30 epochs, mini-batch size 10, step size 3.0; test_data is scored after each epoch
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)</code>
The results are as follows:
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; background-color: transparent; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background-position: initial initial; background-repeat: initial initial;">Epoch 0: 6669 / 10000
Epoch 1: 6755 / 10000
Epoch 2: 6844 / 10000
Epoch 3: 6833 / 10000
Epoch 4: 6887 / 10000
Epoch 5: 7744 / 10000
Epoch 6: 7778 / 10000
Epoch 7: 7876 / 10000
Epoch 8: 8601 / 10000
Epoch 9: 8643 / 10000
Epoch 10: 8659 / 10000
Epoch 11: 8665 / 10000
Epoch 12: 8683 / 10000
Epoch 13: 8700 / 10000
Epoch 14: 8694 / 10000
Epoch 15: 8699 / 10000
Epoch 16: 8715 / 10000
Epoch 17: 8770 / 10000
Epoch 18: 9611 / 10000
Epoch 19: 9632 / 10000
Epoch 20: 9625 / 10000
Epoch 21: 9632 / 10000
Epoch 22: 9651 / 10000
Epoch 23: 9655 / 10000
Epoch 24: 9653 / 10000
Epoch 25: 9658 / 10000
Epoch 26: 9653 / 10000
Epoch 27: 9664 / 10000
Epoch 28: 9655 / 10000
Epoch 29: 9672 / 10000</code>
By changing just one hyperparameter, we raised the accuracy from 94.76% to 96.72%!
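To get a feel for what that one change means, here is a rough count of how many numbers the network has to learn at each size. It is only a sketch: the helper name `count_parameters` is made up for illustration, and it assumes fully connected layers with one bias per non-input node, which is how the [784, 30, 10] and [784, 100, 10] networks above are laid out.

```python
def count_parameters(sizes):
    """Count weights and biases of a fully connected network.

    `sizes` lists the number of nodes per layer, e.g. [784, 30, 10].
    (Hypothetical helper, not part of the network module used above.)
    """
    # One weight per connection between adjacent layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    # One bias per node in every layer except the input layer.
    biases = sum(sizes[1:])
    return weights + biases


small = count_parameters([784, 30, 10])
large = count_parameters([784, 100, 10])
print(small, large)  # 23860 79510
```

Going from 30 to 100 hidden nodes roughly triples the number of trainable parameters, which is one way to see where the extra capacity comes from.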
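One point needs emphasis: the more rigorous way to tune models is to compare the candidate models on a cross-validation set, pick the best one, and only then evaluate it on a fresh test set. This article uses the test set rather than a cross-validation set only so that the numbers can be compared with the earlier results. Please do not copy this practice, because it can easily overfit to the test set. It is a mistake that data-mining practitioners often make in engineering work, and we will devote a separate post to it.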
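The workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the article's code: `three_way_split` and `select_by_validation` are hypothetical helpers, the samples are just the integers 0 to 99, and the two validation accuracies are made-up numbers.

```python
import random


def three_way_split(samples, n_validation, n_test, seed=0):
    """Shuffle `samples` and split them into (training, validation, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test


def select_by_validation(validation_scores):
    """Return the candidate name with the highest validation accuracy."""
    return max(validation_scores, key=validation_scores.get)


training, validation, test = three_way_split(range(100), n_validation=20, n_test=20)

# Made-up validation accuracies for two candidate hidden-layer sizes.
validation_scores = {"hidden=30": 0.91, "hidden=100": 0.94}
best = select_by_validation(validation_scores)
# Only `best` would then be scored, once, on `test` for the final report.
```

The point of the design is that the test set plays no part in choosing between candidates, so the final score is not biased by the selection.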
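Now back to tuning our model. Is some other number of hidden nodes more suitable? We do not know either. A common approach is to try values from a geometrically growing sequence (such as 10, 100, 1000, ...), narrow down the promising interval step by step, and finally settle on a relatively good value.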
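A minimal sketch of that coarse-to-fine search follows. Everything here is assumed for illustration: `geometric_grid`, `coarse_to_fine`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for "train a network with this many hidden nodes and return its validation accuracy", which in practice is the expensive step.

```python
def geometric_grid(start, ratio, count):
    """Return `count` values: start, start*ratio, start*ratio**2, ..."""
    return [start * ratio ** i for i in range(count)]


def coarse_to_fine(score, start=10, ratio=10, count=3, fine_points=5):
    """Pick the best coarse value, then search linearly between its neighbors."""
    coarse = geometric_grid(start, ratio, count)
    best = max(coarse, key=score)
    i = coarse.index(best)
    low = coarse[max(i - 1, 0)]
    high = coarse[min(i + 1, len(coarse) - 1)]
    # Evenly spaced candidates across the interval around the coarse winner.
    step = max((high - low) // (fine_points - 1), 1)
    fine = list(range(low, high + 1, step))
    return max(fine + [best], key=score)


def toy_score(hidden_nodes):
    # Hypothetical stand-in for validation accuracy; peaks at 300 nodes.
    return -abs(hidden_nodes - 300)


print(geometric_grid(10, 10, 3))  # [10, 100, 1000]
print(coarse_to_fine(toy_score))
```

One refinement pass is shown; repeating it on the narrowed interval would close in further, at the cost of training more models.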
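Even so, we have tried only one hyperparameter; the others remain untuned. So let's try another one: the step size (learning rate). The step size so far was 3.0, and we might feel that learning is too slow. How about a larger step size, say 100?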
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; background-color: transparent; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background-position: initial initial; background-repeat: initial initial;"># Layers: 784 input pixels, 30 hidden nodes, 10 output digits
net = network.Network([784, 30, 10])
# 30 epochs, mini-batch size 10, step size 100.0
net.SGD(training_data, 30, 10, 100.0, test_data=test_data)</code>
The results are as follows:
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; background-color: transparent; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background-position: initial initial; background-repeat: initial initial;">Epoch 0: 1002 / 10000
Epoch 1: 1002 / 10000
Epoch 2: 1002 / 10000
Epoch 3: 1002 / 10000
Epoch 4: 1002 / 10000
Epoch 5: 1002 / 10000
Epoch 6: 1002 / 10000
Epoch 7: 1002 / 10000
Epoch 8: 1002 / 10000
Epoch 9: 1002 / 10000
Epoch 10: 1002 / 10000
Epoch 11: 1002 / 10000
Epoch 12: 1001 / 10000
Epoch 13: 1001 / 10000
Epoch 14: 1001 / 10000
Epoch 15: 1001 / 10000
Epoch 16: 1001 / 10000
Epoch 17: 1001 / 10000
Epoch 18: 1001 / 10000
Epoch 19: 1001 / 10000
Epoch 20: 1000 / 10000
Epoch 21: 1000 / 10000
Epoch 22: 999 / 10000
Epoch 23: 999 / 10000
Epoch 24: 999 / 10000
Epoch 25: 999 / 10000
Epoch 26: 999 / 10000
Epoch 27: 999 / 10000
Epoch 28: 999 / 10000
Epoch 29: 999 / 10000</code>
The accuracy is painfully low: about 10%, no better than guessing among ten digits. The step size is evidently too large, so the search never gets down to the minimum. Let's try a small step size instead, say 0.01.
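Why the step size matters in both directions can be seen on a one-dimensional toy problem. The sketch below is not the article's network: it minimizes f(w) = w**2 with plain gradient descent, and the function name `gradient_descent` and the three step sizes are chosen purely for illustration. The real network's error surface is far more complicated, so this shows only the mechanism.

```python
def gradient_descent(eta, steps=30, w=1.0):
    """Minimize f(w) = w**2 with a fixed step size `eta`; return the final w."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - eta * grad  # each step multiplies w by (1 - 2*eta)
    return w


for eta in (1.5, 0.1, 0.001):
    # Distance from the minimum at w = 0 after 30 steps.
    print(eta, abs(gradient_descent(eta)))
```

With the oversized step, each update overshoots the minimum and lands farther away than it started; with the moderate step, w shrinks quickly toward 0; with the tiny step, w has barely moved after 30 steps. Back on the real network, the small-step run is: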
<code class="language-python hljs has-numbering" style="display: block; padding: 0px; background-color: transparent; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; word-wrap: normal; background-position: initial initial; background-repeat: initial initial;"># Layers: 784 input pixels, 100 hidden nodes, 10 output digits
net = network.Network([784, 100, 10])
# 30 epochs, mini-batch size 10, step size 0.001
net.SGD(training_data, 30, 10, 0.001, test_data=test_data)</code>
The results are as follows:
<code class="language-python">Epoch 0: 790 / 10000
Epoch 1: 846 / 10000
Epoch 2: 854 / 10000
Epoch 3: 904 / 10000
Epoch 4: 944 / 10000
Epoch 5: 975 / 10000
Epoch 6: 975 / 10000
Epoch 7: 975 / 10000
Epoch 8: 975 / 10000
Epoch 9: 974 / 10000
Epoch 10: 974 / 10000
Epoch 11: 974 / 10000
Epoch 12: 974 / 10000
Epoch 13: 974 / 10000
Epoch 14: 974 / 10000
Epoch 15: 974 / 10000
Epoch 16: 974 / 10000
Epoch 17: 974 / 10000
Epoch 18: 974 / 10000
Epoch 19: 976 / 10000
Epoch 20: 979 / 10000
Epoch 21: 981 / 10000
Epoch 22: 1004 / 10000
Epoch 23: 1157 / 10000
Epoch 24: 1275 / 10000
Epoch 25: 1323 / 10000
Epoch 26: 1369 / 10000
Epoch 27: 1403 / 10000
Epoch 28: 1429 / 10000
Epoch 29: 1451 / 10000</code>
Well, the accuracy is again painfully low. There is one upside, though: apart from a long plateau in the middle, it climbs steadily, which suggests the model is at least heading in the right direction. If you overlook this detail while debugging a model, you may really never find suitable parameters.
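To get a feel for why a step size that is too large and one that is too small can both disappoint, here is a minimal sketch on a toy problem, not on Nielsen's network: plain gradient descent on the one-dimensional function f(w) = w². The function name `gradient_descent`, the starting point and the three step sizes are assumptions chosen purely for illustration.

```python
def gradient_descent(eta, steps=30, w0=1.0):
    """Minimise f(w) = w**2 with a fixed step size eta; return the final loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w -= eta * grad      # plain gradient-descent update
    return w * w


# Hypothetical step sizes: too large, too small, and reasonable for this toy.
for eta in (1.5, 0.001, 0.1):
    print(f"eta={eta}: loss after 30 steps = {gradient_descent(eta):.3e}")
```

With the largest step the loss blows up, because every update overshoots the minimum. With the smallest step the loss moves in the right direction but is still close to its starting value after 30 steps. The middle value gets very close to the minimum. The real network is far more complicated, but the same qualitative picture is a useful first guess when reading training logs.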
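Evidently, the hyperparameters we chose for our very first network were rather good. In real applications you rarely get that lucky: the first results may well look like bizarre creatures that defy common sense, as if they came from another dimension. Data mining engineers run into this all the time. The best thing to do at that point is to hold back the urge to scream, calmly study how the test results are distributed, look for possible causes, and keep tuning the model, while being mentally prepared to climb out of one pit only to fall into the next. Fortunately, machine learning engineers who have filled in pit after pit have distilled some rules of thumb for tuning. We will analyze them in a dedicated post.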
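"Calmly studying the test results" can start with something as simple as reading the training log programmatically. The sketch below parses a few lines sampled from the run above and reports whether accuracy is rising and how fast. The helper names `parse_log` and `summarize` are hypothetical, and the regular expression assumes the `Epoch N: a / b` format printed by the training run.

```python
import re

# A few lines sampled from the learning-rate-0.001 run shown earlier.
LOG = """\
Epoch 0: 790 / 10000
Epoch 19: 976 / 10000
Epoch 22: 1004 / 10000
Epoch 29: 1451 / 10000
"""


def parse_log(text):
    """Return a list of (epoch, accuracy) pairs from 'Epoch N: a / b' lines."""
    pattern = re.compile(r"Epoch (\d+): (\d+) / (\d+)")
    return [(int(e), int(a) / int(b)) for e, a, b in pattern.findall(text)]


def summarize(points):
    """Return (is_non_decreasing, average accuracy gain per epoch)."""
    rising = all(b[1] >= a[1] for a, b in zip(points, points[1:]))
    first, last = points[0], points[-1]
    gain_per_epoch = (last[1] - first[1]) / (last[0] - first[0])
    return rising, gain_per_epoch


points = parse_log(LOG)
rising, gain = summarize(points)
print(f"rising: {rising}, average gain per epoch: {gain:.4%}")
```

A curve that rises slowly but consistently hints that the direction is right and the step size may simply be too small, whereas a curve that jumps around at a low level points the other way. This is only a heuristic for a first look, not a substitute for proper diagnostics.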
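Of course, all of the tuning above stays within the neural network model itself. Could a different model do better? What about the legendary support vector machine (SVM)? Explaining SVMs is beyond the scope of this article, and we are considering a dedicated post for that as well. Here we simply use the ready-made interface in scikit-learn, which wraps the well-known LIBSVM library, implemented in C for better performance.
The relevant code is also included in Michael Nielsen's files. Just import it and call one function.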
<code class="language-python">import mnist_svm

# Train an SVM with default parameters and report its test accuracy
mnist_svm.svm_baseline()</code>
Let's look at the result:
<code class="language-python">Baseline classifier using an SVM.
9435 of 10000 values correct.</code>
94.35%, which looks a little lower than our neural network. So is our neural network simply the better model?
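Not quite. We only used the default parameters that scikit-learn sets for its support vector machine. An SVM also has a whole pile of tunable hyperparameters that can improve its performance. According to this blog post by Andreas Mueller, an SVM with well-tuned hyperparameters can reach 98.5% accuracy! That is 1.8 percentage points higher than our best neural network so far!
The story does not end there, however. In 2013, researchers using deep neural networks reached 99.79% accuracy! And they did not rely on many exotic techniques. Many of the techniques involved will be introduced in our upcoming posts.
So, ranked by the accuracy figures above:
simple SVM < shallow neural network < tuned SVM < deep neural network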
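The ordering can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section. The value for the shallow network is not stated directly here, so it is derived from the stated 1.8-point gap to the tuned SVM, which is an inference rather than a measured result. The dictionary keys are just descriptive labels.

```python
# Accuracies in percent, taken from the figures quoted in this section.
accuracy = {
    "simple SVM (default parameters)": 100 * 9435 / 10000,  # 9435 of 10000
    "shallow neural network": 98.5 - 1.8,  # inferred from the 1.8-point gap
    "tuned SVM": 98.5,
    "deep neural network": 99.79,
}

# Sort model names from lowest to highest accuracy.
ranking = sorted(accuracy, key=accuracy.get)

for name in ranking:
    acc = accuracy[name]
    print(f"{name}: {acc:.2f}% correct, {100 - acc:.2f}% error")
```

Looking at error rates rather than accuracies makes the gaps easier to appreciate: going from the default SVM to the deep network cuts the error from a few percent to a fraction of a percent.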
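One reminder, though: fancy algorithms matter, but a good dataset is sometimes more important than the algorithm. Michael Nielsen wrote a formula specifically to express this relationship: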
sophisticated algorithm ≤ simple learning algorithm + good training data.
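So tuning a model often means tracing the problem back to the data itself. Good data really does lead to good results.
8. Summary and Preview of the Next Post
What we have shown above is only a rough outline of the basic process of analyzing a problem with a neural network; many deeper topics were left untouched. We will explore them in depth in upcoming posts.
In the next post of this series, we will try to look directly at the representational power of deep neural networks and offer a heuristic way of understanding it. Stay tuned.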