FROM: http://blog.csdn.net/u012162613/article/details/43157801
DeepLearning tutorial (1): Softmax Regression, a Brief Introduction and a Detailed Code Walkthrough
@author:wepon
@blog:http://blog.csdn.net/u012162613/article/details/43157801
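This post introduces the Softmax regression algorithm and, in particular, walks through its code implementation in detail. The implementation is based on Python and Theano; the code comes from "Classifying MNIST digits using Logistic Regression", with UFLDL as a reference.
I. Introduction to Softmax Regression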
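There is no need to repeat a full tutorial on the algorithm here; see UFLDL for that. Below is only a brief summary to make the code easier to follow. Softmax regression is essentially logistic regression for the multi-class case. The comparison is as follows.
Hypothesis function of logistic regression:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$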
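The only parameter of the logistic regression model is theta. h(*) is the sigmoid function, whose output lies between 0 and 1, so the model is generally used for binary classification. For a concrete problem, finding the best theta is the key step. This is an optimization problem, usually solved by defining a cost function and then minimizing it. The cost function of logistic regression, over m training samples, is:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
J(theta) is usually minimized with gradient descent, which iteratively computes the gradient and updates theta.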
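As a minimal sketch of the summary above, the following plain-Python code evaluates the sigmoid hypothesis, the cross-entropy cost J(theta), and repeated gradient-descent updates. The toy data, the learning rate, and the function names are assumptions made for illustration; they are not part of the tutorial code.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta . x); x[0] is assumed to be the constant 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def cost(theta, xs, ys):
    """Average cross-entropy J(theta) over all samples."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = hypothesis(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(xs)

def gradient_step(theta, xs, ys, learning_rate):
    """One gradient-descent update: theta_j -= learning_rate * dJ/dtheta_j."""
    m = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = hypothesis(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / m
    return [t - learning_rate * g for t, g in zip(theta, grad)]

# Toy data (made up): the label is 1 when the single feature is positive.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 1]

theta = [0.0, 0.0]
initial_cost = cost(theta, xs, ys)
for _ in range(100):
    theta = gradient_step(theta, xs, ys, learning_rate=0.5)
final_cost = cost(theta, xs, ys)
```

With theta initialized to zero every prediction is 0.5, so the initial cost equals log 2; the updates then lower it.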
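Hypothesis function of Softmax:
$$h_\theta(x) = \begin{bmatrix} p(y = 1 \mid x; \theta) \\ \vdots \\ p(y = k \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x}} \begin{bmatrix} e^{\theta_1^T x} \\ \vdots \\ e^{\theta_k^T x} \end{bmatrix}$$
In logistic regression, theta*x is fed into the sigmoid function, and the resulting value between 0 and 1 is thresholded into one of two classes, 0 or 1. Softmax has k classes and uses theta_j*x as an exponent, which gives k terms, e^(theta_1*x) through e^(theta_k*x). Dividing each term by their sum normalizes the output so that the k numbers add up to 1, and each number is the probability of the corresponding class. In short, the Softmax hypothesis outputs a k-dimensional column vector whose entries are the class probabilities.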
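The normalization described above can be sketched in a few lines of plain Python. The parameter values and the names `thetas`, `probs`, and `predicted_class` are made up for illustration; subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn k raw scores into k probabilities that sum to 1."""
    top = max(scores)  # shift to avoid overflow in math.exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hypothesis(thetas, x):
    """thetas holds one parameter vector per class; returns P(y = j | x) for every j."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in thetas]
    return softmax(scores)

# Three classes, two features (values made up for illustration).
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
probs = hypothesis(thetas, [2.0, 1.0])

# The predicted class is the one with the largest probability.
predicted_class = probs.index(max(probs))
```

Here the raw scores are 2, 1, and -3, so the first class receives the largest probability.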
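Cost function of Softmax:
It is essentially the same as in logistic regression: the negative log-likelihood (NLL). With a weight decay (regularization) term added, it becomes:
$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i=1}^{k} \sum_{j=0}^{n} \theta_{ij}^2$$
The cost function can again be minimized with simple and effective gradient descent. In practice, implementations usually use minibatch stochastic gradient descent (MSGD): the gradient is computed and the parameters are updated once per batch of samples, where a batch typically holds tens to a few hundred samples. PS: plain stochastic gradient descent updates the parameters once per single sample.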
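To make the NLL cost and the minibatch update concrete, here is a small plain-Python sketch. It is not the Theano implementation discussed later; the toy dataset, the hyperparameters, and the helper names (`predict_proba`, `nll`, `msgd_epoch`) are assumptions chosen for illustration, and `W[j]` stores the parameter vector of class j.

```python
import math

def softmax(scores):
    top = max(scores)  # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, x):
    """Class probabilities for one sample; W[j] is the weight vector of class j."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def nll(W, xs, ys, weight_decay=0.0):
    """Mean negative log-likelihood plus (weight_decay / 2) * sum of squared weights."""
    data_term = -sum(math.log(predict_proba(W, x)[y])
                     for x, y in zip(xs, ys)) / len(xs)
    penalty = 0.5 * weight_decay * sum(w * w for row in W for w in row)
    return data_term + penalty

def msgd_epoch(W, xs, ys, batch_size, learning_rate, weight_decay=0.0):
    """One pass over the data; W is updated once per minibatch."""
    for start in range(0, len(xs), batch_size):
        batch = list(zip(xs[start:start + batch_size],
                         ys[start:start + batch_size]))
        # Gradient of the weight decay term is weight_decay * w.
        grad = [[weight_decay * w for w in row] for row in W]
        for x, y in batch:
            p = predict_proba(W, x)
            for j in range(len(W)):
                # Per-sample gradient w.r.t. W[j] is (p_j - 1{y == j}) * x.
                err = p[j] - (1.0 if y == j else 0.0)
                for d, xd in enumerate(x):
                    grad[j][d] += err * xd / len(batch)
        W = [[w - learning_rate * g for w, g in zip(row, grow)]
             for row, grow in zip(W, grad)]
    return W

# Toy data (made up): a constant 1 followed by two features, three classes.
xs = [[1.0, 2.0, 0.0], [1.0, -2.0, 0.0], [1.0, 0.0, 3.0],
      [1.0, 2.5, 0.5], [1.0, -2.5, -0.5], [1.0, 0.5, 3.5]]
ys = [0, 1, 2, 0, 1, 2]

W = [[0.0] * 3 for _ in range(3)]
initial_cost = nll(W, xs, ys)
for _ in range(50):
    W = msgd_epoch(W, xs, ys, batch_size=3, learning_rate=0.1)
final_cost = nll(W, xs, ys)
predictions = [predict_proba(W, x).index(max(predict_proba(W, x))) for x in xs]
```

Setting `batch_size=1` turns the same loop into plain stochastic gradient descent, and setting it to `len(xs)` gives full-batch gradient descent.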
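II. Detailed Walkthrough of the Softmax Code
Note first that the program below uses the MSGD algorithm, that its cost function has no weight decay term, and that the whole program uses Softmax regression to classify the MNIST dataset (recognizing the handwritten digits 0~9). The walkthrough reflects my own understanding and is for reference only; it may not be entirely correct, so please point out any mistakes.
The original code and my annotated version are available on GitHub.
A note on parameters: in Part I the parameters are called theta, while the program below uses W, the weights. The two are the same thing. One more point: the hypothesis functions above use theta*x, whereas the program uses W*X+b. These are also equivalent, because b can be treated as W0 by adding a constant input x0=1, so that W*X+b collapses into a single product WX = theta*x.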
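The equivalence between W*X+b and a single product with an extra constant input can be checked numerically. The sketch below uses made-up numbers and hypothetical helper names; note that each row of `W` here holds the weights of one class, which is the transpose of the `(n_in, n_out)` layout used by the Theano code.

```python
def affine(W, b, x):
    """Compute W*x + b, one output per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def linear_augmented(theta, x):
    """Compute theta*x after prepending the constant input x0 = 1."""
    x_aug = [1.0] + list(x)
    return [sum(t * xi for t, xi in zip(row, x_aug)) for row in theta]

# Made-up weights and biases for two classes and two features.
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.3]

# Fold each bias into its weight row as the coefficient of x0.
theta = [[bj] + row for row, bj in zip(W, b)]

x = [3.0, 4.0]
with_bias = affine(W, b, x)
augmented = linear_augmented(theta, x)
```

Both lists contain the same two scores up to floating-point rounding.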
(1) Import the required modules
```python
import cPickle
import gzip
import os
import sys
import time

import numpy

import theano
import theano.tensor as T
```
(2) Define the Softmax regression model
The deeplearning tutorial treats LogisticRegression directly as Softmax; the familiar two-class logistic regression is simply LogisticRegression with n_out=2. The LogisticRegression class defined below is therefore Softmax.
See the comments for the walkthrough:
```python
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        # W: weight matrix of shape (n_in, n_out), initialized to zeros.
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # b: bias vector with one entry per output class, initialized to zeros.
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # Each row of input is one sample, so p_y_given_x has one row of
        # class probabilities per sample in the batch.
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # Predicted class: index of the largest probability in each row.
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # Parameters of the model.
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        # For every sample i, pick the log-probability of its true class y[i],
        # then return the negated mean over the batch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        # y must have the same number of dimensions as y_pred.
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # Labels must be integers; the error rate is the fraction of
        # samples whose prediction differs from the label.
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
```
The softmax model is now fully defined: the input batch `input`, the size of each sample `n_in`, the number of output classes `n_out`, the model parameters `W` and `b`, the predicted output `y_pred`, the cost function NLL, and the error rate `errors`.
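The indexing expression in `negative_log_likelihood` and the computation in `errors` can be mimicked with plain Python lists. This is only an illustrative sketch: the probability rows and labels are made up, and list comprehensions stand in for the Theano operations `T.arange`, `T.argmax`, and `T.neq`.

```python
import math

# One row of class probabilities per sample (made-up values, 3 classes).
p_y_given_x = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
y = [0, 1, 2, 1]  # true labels

# Equivalent of T.log(p_y_given_x)[T.arange(y.shape[0]), y]:
# for each row i, take the log-probability of the true class y[i].
picked = [math.log(p_y_given_x[i][y[i]]) for i in range(len(y))]
nll = -sum(picked) / len(picked)

# Equivalent of T.argmax(p_y_given_x, axis=1).
y_pred = [row.index(max(row)) for row in p_y_given_x]

# Equivalent of T.mean(T.neq(y_pred, y)): fraction of wrong predictions.
error_rate = sum(1 for p, t in zip(y_pred, y) if p != t) / len(y)
```

Only the last sample is misclassified (predicted 0, labeled 1), so the error rate on this toy batch is 0.25.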
(3) Load the MNIST dataset
```python
def load_data(dataset):
    # If only a file name is given and the file is not in the current
    # directory, look for it in ../data relative to this script.
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    # Download MNIST if the file is still missing.
    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print 'Downloading data from %s' % origin
        urllib.urlretrieve(origin, dataset)

    print '... loading data'

    # The pickle holds three (inputs, labels) pairs.
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()

    def shared_dataset(data_xy, borrow=True):
        # Store the data in shared variables; the labels are stored as
        # floatX and then cast to int32 for use as indices.
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval
```
(4) Apply the model to the MNIST dataset
```python
def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,
                           dataset='mnist.pkl.gz',
                           batch_size=600):
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # Number of minibatches in each set (Python 2 integer division).
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    print '... building the model'

    index = T.lscalar()  # index of a minibatch
    x = T.matrix('x')    # one flattened 28 * 28 image per row
    y = T.ivector('y')   # integer class labels

    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    # The cost to minimize is the NLL, without weight decay.
    cost = classifier.negative_log_likelihood(y)

    # Error rate of minibatch `index` of the test set.
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # Error rate of minibatch `index` of the validation set.
    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # Gradients of the cost with respect to W and b.
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)

    # Gradient-descent update rules.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # Returns the cost of minibatch `index` and applies the updates.
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    print '... training the model'

    # Early-stopping parameters.
    patience = 5000                # minimum number of minibatch iterations
    patience_increase = 2          # factor applied to iter on a good improvement
    improvement_threshold = 0.995  # relative improvement counted as significant
    # Validate every validation_frequency minibatches.
    validation_frequency = min(n_train_batches, patience / 2)

    best_validation_loss = numpy.inf
    test_score = 0.
    start_time = time.clock()

    done_looping = False
    epoch = 0
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
            # Total number of minibatches processed so far, minus one.
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # Mean error rate over the whole validation set.
                validation_losses = [validate_model(i)
                                     for i in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                if this_validation_loss < best_validation_loss:
                    # Extend patience when the improvement is large enough.
                    if this_validation_loss < best_validation_loss *  \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss

                    # Evaluate the new best model on the test set.
                    test_losses = [test_model(i)
                                   for i in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(
                        (
                            '     epoch %i, minibatch %i/%i, test error of'
                            ' best model %f %%'
                        ) %
                        (
                            epoch,
                            minibatch_index + 1,
                            n_train_batches,
                            test_score * 100.
                        )
                    )

            # Stop once the iteration count has used up the patience.
            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print(
        (
            'Optimization complete with best validation score of %f %%,'
            'with test performance %f %%'
        )
        % (best_validation_loss * 100., test_score * 100.)
    )
    print 'The code run for %d epochs, with %f epochs/sec' % (
        epoch, 1. * epoch / (end_time - start_time))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.1fs' % ((end_time - start_time)))
```
The end.