DeepLearning tutorial (3): MLP (Multilayer Perceptron) — Principles and a Detailed Code Walkthrough
FROM: http://blog.csdn.net/u012162613/article/details/43221829
@author:wepon
@blog:http://blog.csdn.net/u012162613/article/details/43221829
This article introduces the multilayer perceptron (MLP) algorithm, focusing on a detailed walkthrough of its implementation in Python + Theano. The code comes from the tutorial chapter Multilayer Perceptron. If you want a deeper treatment of the algorithm itself, see the UFLDL tutorial, or the brief introduction in Part 1 of this article.
A fully annotated version of the code is available for download from my GitHub.
Part 1: A Brief Introduction to the Multilayer Perceptron (MLP)
A multilayer perceptron (MLP) is also called an artificial neural network (ANN). Besides the input and output layers, it can have any number of hidden layers in between; the simplest MLP contains a single hidden layer, giving a three-layer structure.
The layers of an MLP are fully connected, meaning that every neuron in one layer is connected to every neuron in the next. The bottom layer is the input layer, the middle one the hidden layer, and the final one the output layer.
There is not much to say about the input layer: whatever you feed in is what it holds. If the input is an n-dimensional vector, the input layer has n neurons.
How are the hidden layer's values computed? The hidden layer is fully connected to the input layer, so if the input is a vector X, the hidden layer's output is f(W1·X + b1), where W1 is the weight matrix (also called the connection coefficients), b1 is the bias vector, and the activation function f is commonly the sigmoid or tanh function:
sigmoid(x) = 1 / (1 + e^(-x)),    tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
最后就是輸出層,輸出層與隱藏層是什么關(guān)系?其實(shí)隱藏層到輸出層可以看成是一個(gè)多類別的邏輯回歸,也即softmax回歸,所以輸出層的輸出就是softmax(W2X1+b2),X1表示隱藏層的輸出f(W1X+b1)。
That is the whole MLP model. Summarizing the three-layer MLP above as a single formula, with G denoting softmax:

f(X) = G(W2 · f(W1 · X + b1) + b2)
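To make the formula concrete, here is a minimal numpy sketch of this forward pass (a sketch only: the layer sizes, the 0.01 weight scale, and the use of tanh for f are illustrative assumptions, not part of the original code):

[python]
import numpy as np

def forward(X, W1, b1, W2, b2):
    # hidden layer: f(W1*X + b1), with f = tanh here
    X1 = np.tanh(X.dot(W1) + b1)
    # output layer: softmax(W2*X1 + b2), computed row by row
    scores = X1.dot(W2) + b2
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stabilized
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
X = rng.rand(5, 784)                     # 5 illustrative input vectors
W1 = rng.randn(784, 500) * 0.01
b1 = np.zeros(500)
W2 = rng.randn(500, 10) * 0.01
b2 = np.zeros(10)
print(forward(X, W1, b1, W2, b2).shape)  # (5, 10): one distribution per row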
Thus the parameters of the MLP are the connection weights and biases between the layers: W1, b1, W2, and b2. How do we determine them for a concrete problem? Finding the best parameters is an optimization problem, and the simplest way to solve it is stochastic gradient descent (SGD): randomly initialize all parameters, then train iteratively, repeatedly computing gradients and updating the parameters until some stopping condition is met (for example, the error is small enough or enough iterations have run). This process involves the cost function, regularization, the learning rate, gradient computation, and so on; this article does not discuss them in detail, see the two links given at the top.
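As a toy illustration of the update rule only (the Theano code below derives the gradients symbolically; the parameter values here are made up):

[python]
import numpy as np

def sgd_step(params, grads, learning_rate=0.01):
    # one gradient-descent update: param := param - learning_rate * gradient
    return [p - learning_rate * g for p, g in zip(params, grads)]

W = np.ones((3, 2))        # hypothetical parameter matrix
gW = np.full((3, 2), 0.5)  # hypothetical gradient
W, = sgd_step([W], [gW])   # W is now 1 - 0.01 * 0.5 everywhere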
With the basic MLP model in hand, let's move on to the implementation.
Part 2: A Detailed Walkthrough of the MLP Code (Python + Theano)
To repeat: the code comes from Multilayer Perceptron; this article only provides a detailed reading of it. If you spot any mistakes, please don't hesitate to point them out.
The code implements a three-layer perceptron, but once you understand it, implementing an n-layer perceptron is no trouble, so it is enough to understand this three-layer MLP well. Broadly speaking, the MLP's input layer X is simply our training data, so the input layer needs no implementation of its own; what remains are the two parts "input layer to hidden layer" and "hidden layer to output layer". As explained in Part 1, "input layer to hidden layer" is a fully connected layer, which the code below defines as HiddenLayer. "Hidden layer to output layer" is a softmax-regression classifier (which some people call logistic regression), which the code below defines as LogisticRegression.
On to the code.
(1) Import the necessary Python modules
Mainly numpy and theano, plus Python's built-in os, sys, and time modules (the load_data() function below also needs gzip and cPickle); you will see how they are used in the program below.
[python]
import os
import sys
import time
import gzip     # needed by load_data() below to open mnist.pkl.gz
import cPickle  # needed by load_data() below to unpickle the dataset

import numpy

import theano
import theano.tensor as T
(2) Define the MLP model (HiddenLayer + LogisticRegression)
This part defines the MLP's basic building blocks, the HiddenLayer and LogisticRegression classes mentioned above.
For the hidden layer we need to define the connection weights W, the bias b, and the input and output; the code and commentary follow:
[python]
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        self.input = input

        # if no W is given, sample it uniformly from
        # [-sqrt(6/(n_in+n_out)), sqrt(6/(n_in+n_out))]
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            # for sigmoid the recommended range is four times larger
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        # if no b is given, initialize it with zeros
        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        # output = activation(W*input + b), or the raw linear output
        # if no activation function is given
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

        # the parameters of this layer
        self.params = [self.W, self.b]
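The uniform initialization range ±sqrt(6/(n_in + n_out)) above (scaled by 4 when the activation is sigmoid) is the Glorot & Bengio heuristic for keeping early activations inside the near-linear regime of tanh/sigmoid. A quick usage sketch (the layer sizes are illustrative, matching the MNIST setup used later):

[python]
rng = numpy.random.RandomState(1234)
x = T.matrix('x')  # symbolic input: one example per row

# a 784-in, 500-out tanh hidden layer
hidden = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500,
                     activation=T.tanh)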
Next, logistic regression (softmax regression); the code and commentary follow. Note that the negative_log_likelihood() and errors() methods are included here because the MLP class below relies on them.
(If you want to learn more about softmax regression, see: DeepLearning tutorial(1)Softmax回归原理简介+代码详解.)
[python]
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        # initialize W (n_in x n_out) and b (n_out) with zeros
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # class-membership probabilities: softmax over the linear scores
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # predicted class: the index with the highest probability
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        self.params = [self.W, self.b]

    # the following two methods come from the full tutorial code;
    # the MLP class below relies on them
    def negative_log_likelihood(self, y):
        # mean negative log-probability of the correct labels
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        # fraction of examples in the minibatch that are misclassified
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
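A minimal sketch of using this classifier on its own (the sizes are illustrative):

[python]
x = T.matrix('x')
clf = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

# compile a function that maps inputs to predicted class labels
predict = theano.function(inputs=[x], outputs=clf.y_pred)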
OK! With these two building blocks done, we can now assemble them.
For a three-layer MLP, HiddenLayer + LogisticRegression is all we need; for a four-layer MLP, HiddenLayer + HiddenLayer + LogisticRegression, and so on.
Here is the three-layer MLP:
[python]
class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        # input layer to hidden layer: a fully connected tanh layer
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # hidden layer to output layer: a softmax classifier whose
        # input is the hidden layer's output
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )

        # L1 regularization term: sum of absolute values of the weights
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # L2 regularization term: sum of squares of the weights
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # the loss and the error measure are those of the output layer
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        self.errors = self.logRegressionLayer.errors

        # all parameters of the model
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
Besides the hidden layer and the classification layer, the MLP class also defines the loss function and the regularization terms, which are used when solving the optimization problem.
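Concretely, the cost minimized during training combines the negative log-likelihood with the two weighted penalty terms; test_mlp() below assembles it like this:

[python]
# regularized training cost (as assembled in test_mlp() below)
cost = (classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr)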
(3) Apply the MLP to MNIST (handwritten digit recognition)
With the three-layer MLP defined above, we can now use it to classify the MNIST dataset, a collection of handwritten digits 0-9.
First, define the function load_data() that loads the dataset mnist.pkl.gz:
[python]
def load_data(dataset):
    # if the dataset is given as a bare filename, look for it in
    # ../data relative to this script
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    # download MNIST if it cannot be found locally
    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print 'Downloading data from %s' % origin
        urllib.urlretrieve(origin, dataset)

    print '... loading data'

    # the pickle holds three (x, y) pairs: train, validation, test
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()

    def shared_dataset(data_xy, borrow=True):
        # store the data in shared variables so Theano can copy them to
        # the GPU in one transfer; y is stored as floatX but cast back
        # to int32 because the labels are used as indices
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval
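A quick sketch of how load_data() is consumed (the shapes are those of the standard mnist.pkl.gz split: 50,000 training, 10,000 validation, and 10,000 test images):

[python]
datasets = load_data('mnist.pkl.gz')
train_set_x, train_set_y = datasets[0]

# each image is rasterized into a 784-dimensional float vector
print(train_set_x.get_value(borrow=True).shape)  # (50000, 784)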
With the data loaded, we can start training the model. The main function test_mlp() below applies the MLP to MNIST:
[python]
def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=10,
             dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):
    # load the train/validation/test sets
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # number of minibatches in each set
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    print '... building the model'

    index = T.lscalar()  # minibatch index
    x = T.matrix('x')    # rasterized images
    y = T.ivector('y')   # labels, a 1D vector of ints

    rng = numpy.random.RandomState(1234)

    # a three-layer MLP: 28*28 inputs, n_hidden hidden units, 10 classes
    classifier = MLP(
        rng=rng,
        input=x,
        n_in=28 * 28,
        n_hidden=n_hidden,
        n_out=10
    )

    # cost = negative log-likelihood + L1 and L2 regularization terms
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )

    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size:(index + 1) * batch_size],
            y: test_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    # gradient of the cost with respect to every parameter
    gparams = [T.grad(cost, param) for param in classifier.params]

    # SGD update rule: param := param - learning_rate * gradient
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(classifier.params, gparams)
    ]

    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    print '... training'

    # early-stopping parameters
    patience = 10000               # look at this many minibatches regardless
    patience_increase = 2          # wait this much longer after an improvement
    improvement_threshold = 0.995  # a relative improvement this big counts
    validation_frequency = min(n_train_batches, patience / 2)

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number, counted over all epochs
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute the zero-one loss on the validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                if this_validation_loss < best_validation_loss:
                    # raise patience if the improvement is large enough
                    if (
                        this_validation_loss < best_validation_loss *
                        improvement_threshold
                    ):
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # evaluate the best model so far on the test set
                    test_losses = [test_model(i) for i
                                   in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))
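To actually run the experiment, the tutorial's script ends with the usual entry point:

[python]
if __name__ == '__main__':
    test_mlp()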
That's the end of the article. Once again, a fully annotated version of the code is available for download from my GitHub.
If you find any mistakes, or anything explained unclearly, feel free to leave a comment.