
DeepLearning tutorial (1): Softmax Regression, Principles and Detailed Code Walkthrough

Published: 2025/7/25


FROM: http://blog.csdn.net/u012162613/article/details/43157801


@author: wepon

@blog: http://blog.csdn.net/u012162613/article/details/43157801

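This article introduces the Softmax regression algorithm and, above all, walks through its code implementation in detail. It is based on Python Theano; the code comes from Classifying MNIST digits using Logistic Regression, and the explanation draws on UFLDL.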

I. Introduction to Softmax Regression

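There is no need to repeat a detailed tutorial on the algorithm here; see UFLDL for that. Below is only a brief summary, to make the code easier to follow. Softmax regression is essentially logistic regression generalized to multiple classes. For comparison, the hypothesis function of logistic regression is:

$$h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)}$$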

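The only parameter of the whole logistic regression model is θ; h_θ(x) applies the sigmoid function to θ^T x, so its output lies between 0 and 1, and logistic regression is generally used as a binary classifier. For a given problem, the key step is finding the best θ. This is an optimization problem, usually solved by defining a cost function and then minimizing it. Over m training samples, the cost function of logistic regression is:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$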
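To make the two formulas concrete, here is a minimal pure-Python sketch. It assumes `x` and `theta` are plain lists of equal length with the bias already folded in as `x[0] = 1`; the names `sigmoid`, `hypothesis` and `logistic_cost` and the toy data are illustrative only, not from the tutorial.

```python
import math

def sigmoid(z):
    # logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = sigmoid(theta^T x)
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def logistic_cost(theta, X, Y):
    # J(theta): average cross-entropy over the m samples, labels y in {0, 1}
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        h = hypothesis(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m

# toy data: x[0] = 1 is the constant bias input, x[1] a single feature
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 1, 1]
print(hypothesis([0.0, 0.0], X[0]))     # 0.5: with theta = 0 every sample gets 0.5
print(logistic_cost([0.0, 0.0], X, Y))  # log(2), about 0.6931
print(logistic_cost([0.0, 2.0], X, Y))  # smaller: this theta separates the classes
```

With θ = 0 the model is maximally uncertain, so the cost is log 2 per sample; a θ that orders the samples correctly lowers it.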

J(θ) is usually minimized with gradient descent, which iteratively computes the gradient and updates θ.
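As a sketch of that loop, assuming the same toy setup as before (bias folded into `x[0] = 1`) and an arbitrary learning rate `alpha = 0.5`: the gradient of the cross-entropy cost is ∂J/∂θ_j = (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x_j^(i), and every θ_j is updated simultaneously. The helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, Y):
    # cross-entropy cost J(theta), averaged over the m samples
    m = len(X)
    s = 0.0
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        s += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -s / m

def gradient(theta, X, Y):
    # dJ/dtheta_j = (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij
    m = len(X)
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / m
    return grad

def gradient_descent(X, Y, alpha=0.5, n_iters=100):
    theta = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = gradient(theta, X, Y)
        # theta_j <- theta_j - alpha * dJ/dtheta_j, all j updated together
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta

X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 1, 1]
theta = gradient_descent(X, Y)
print(cost([0.0, 0.0], X, Y), cost(theta, X, Y))  # the cost falls from log(2)
```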
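The hypothesis function of softmax regression:

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)};\theta) \\ p(y^{(i)}=2 \mid x^{(i)};\theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix}$$

Logistic regression passes $\theta^T x$ through the sigmoid function and outputs a single number between 0 and 1, the probability of one of two classes. Softmax regression has k classes, each with its own parameter vector $\theta_j$, and uses each $\theta_j^T x$ as an exponent, giving k terms $e^{\theta_1^T x}$ through $e^{\theta_k^T x}$. Dividing each term by their sum normalizes them, so the k outputs add up to 1 and each output can be read as the probability of the corresponding class. The softmax hypothesis therefore outputs a k-dimensional column vector whose j-th entry is the estimated probability of class j.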
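A minimal pure-Python version of this normalization, where `z[j]` stands in for $\theta_j^T x$. Subtracting `max(z)` before exponentiating is a common numerical-stability trick added here, not something the text above prescribes; it cancels in the ratio, so the probabilities are unchanged.

```python
import math

def softmax(z):
    # z[j] plays the role of theta_j^T x for class j.
    # Shifting by max(z) leaves the ratios unchanged but keeps
    # math.exp from overflowing on large scores.
    m = max(z)
    exps = [math.exp(zj - m) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([2.0, 1.0, 0.1])
print(p)       # three probabilities, largest for the first class
print(sum(p))  # 1.0 up to rounding
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; math.exp(1000.0) alone would overflow
```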
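The cost function of softmax regression:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right]$$

This is essentially the same as in logistic regression: the negative log-likelihood (NLL), where the indicator $1\{y^{(i)}=j\}$ keeps only the log probability of each sample's true class. Adding a weight decay (regularization) term, with strength λ and input dimension n, gives:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^2$$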
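The cost above can be evaluated directly in plain Python. This sketch assumes `theta` is a list of k parameter rows, each sample `x` already includes the constant `x[0] = 1`, labels run from 0 to k-1 (rather than 1 to k), and `lam` is λ; `softmax_cost` is a hypothetical name and the numbers are made up.

```python
import math

def softmax_cost(theta, X, Y, lam=0.0):
    # theta: k rows, one parameter vector theta_j per class
    # X: m samples (lists), Y: m integer labels in 0..k-1
    # lam: weight-decay coefficient lambda (0 gives the plain NLL)
    m = len(X)
    nll = 0.0
    for x, y in zip(X, Y):
        scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
        mx = max(scores)
        log_norm = mx + math.log(sum(math.exp(s - mx) for s in scores))
        # the indicator 1{y = j} keeps only the true class: log p(y | x)
        nll -= scores[y] - log_norm
    decay = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return nll / m + decay

X = [[1.0, 0.0], [1.0, 1.0]]
Y = [0, 2]
zero = [[0.0, 0.0] for _ in range(3)]
print(softmax_cost(zero, X, Y))  # log(3), about 1.0986: every class equally likely
theta = [[0.5, -1.0], [0.0, 0.0], [-0.5, 1.0]]
# the penalty adds 0.5 * 0.1 * 2.5 = 0.125 (sum of squared entries is 2.5)
print(softmax_cost(theta, X, Y, lam=0.1) - softmax_cost(theta, X, Y))
```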

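The cost function can likewise be minimized with simple and effective gradient descent. In practice, implementations usually use minibatch stochastic gradient descent (MSGD): the gradient is computed and the parameters are updated once after each batch of samples, where a batch typically holds tens to a few hundred samples. Plain stochastic gradient descent, by contrast, updates the parameters after every single sample.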
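The whole procedure fits in a short pure-Python sketch: softmax regression without weight decay, trained with MSGD. Everything here is an illustrative assumption: the tiny made-up 2-D dataset, the hyperparameters (`lr=0.1`, `batch_size=4`, `n_epochs=50`) and the helper names. It uses the standard gradient of the NLL with respect to each class score, $p_j - 1\{y=j\}$, and layouts W as `n_in` rows by `n_out` columns like the Theano code.

```python
import math
import random

def softmax(z):
    m = max(z)  # shift for numerical stability; the result is unchanged
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_proba(W, b, x):
    # p(y = j | x) = softmax(x W + b)_j; W has len(x) rows and len(b) columns
    scores = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(scores)

def nll(W, b, X, Y):
    # mean negative log-likelihood over the whole dataset
    return -sum(math.log(predict_proba(W, b, x)[y]) for x, y in zip(X, Y)) / len(X)

def train_msgd(X, Y, n_out, lr=0.1, batch_size=4, n_epochs=50, seed=0):
    n_in = len(X[0])
    W = [[0.0] * n_out for _ in range(n_in)]
    b = [0.0] * n_out
    order = list(range(len(X)))
    rng = random.Random(seed)
    for _ in range(n_epochs):
        rng.shuffle(order)  # a new random sample order every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            gW = [[0.0] * n_out for _ in range(n_in)]
            gb = [0.0] * n_out
            for idx in batch:
                g = predict_proba(W, b, X[idx])
                g[Y[idx]] -= 1.0  # d(NLL)/d(score_j) = p_j - 1{y = j}
                for j in range(n_out):
                    gb[j] += g[j] / len(batch)
                    for i in range(n_in):
                        gW[i][j] += X[idx][i] * g[j] / len(batch)
            # one parameter update per minibatch, using the averaged gradient
            for i in range(n_in):
                for j in range(n_out):
                    W[i][j] -= lr * gW[i][j]
            for j in range(n_out):
                b[j] -= lr * gb[j]
    return W, b

def predict(W, b, x):
    p = predict_proba(W, b, x)
    return p.index(max(p))

# made-up, linearly separable 2-D data: three clusters of four points each
X = [[2.0, 0.0], [3.0, 0.5], [2.5, -0.5], [3.0, -0.5],
     [0.0, 2.0], [0.5, 3.0], [-0.5, 2.5], [0.5, 2.0],
     [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0], [-2.0, -3.0]]
Y = [0] * 4 + [1] * 4 + [2] * 4

W, b = train_msgd(X, Y, n_out=3)
W0, b0 = [[0.0] * 3 for _ in range(2)], [0.0] * 3
print(nll(W0, b0, X, Y), nll(W, b, X, Y))  # log(3) at the start, lower after training
print(sum(predict(W, b, x) == y for x, y in zip(X, Y)) / len(X))  # training accuracy
```

Setting `batch_size=1` turns the same loop into plain SGD, and `batch_size=len(X)` into full-batch gradient descent.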

II. Detailed Walkthrough of the Softmax Code

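First, note that the program below uses MSGD and a cost function without the weight decay term. The whole program applies softmax regression to classify the MNIST dataset (recognizing the handwritten digits 0-9). The walkthrough reflects my personal understanding; it is for reference only and may contain mistakes, so please point out any errors you find.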

The original code and my annotated version are on GitHub.

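A note on parameters: Part I calls the parameters θ, while the program below calls them W (weights); the two are the same thing. Also, where the formulas above combine θ and x directly, the program computes xW + b. This is again equivalent: treat b as an extra weight row W_0 and add a constant input x_0 = 1, and xW + b becomes a single matrix product.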
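A quick numeric check of that bias trick in plain Python, with made-up numbers; `affine` and `augmented` are hypothetical helper names. W is stored as `len(x)` rows by `n_out` columns, matching `T.dot(input, self.W)`.

```python
def affine(x, W, b):
    # x W + b, with W stored as len(x) rows of n_out columns
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def augmented(x, W, b):
    # prepend a constant input x0 = 1 and put b on top of W as row W0
    x_aug = [1.0] + x
    W_aug = [b] + W
    return [sum(x_aug[i] * W_aug[i][j] for i in range(len(x_aug)))
            for j in range(len(b))]

x = [0.5, -1.0, 2.0]
W = [[1.0, 0.0], [2.0, -1.0], [0.5, 0.5]]
b = [0.25, -0.75]
print(affine(x, W, b))     # [-0.25, 1.25]
print(augmented(x, W, b))  # [-0.25, 1.25], the same result
```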

(1) Import the required modules

```python
import gzip
import os
import pickle   # Python 3 replacement for cPickle
import sys
import timeit   # replaces time.clock, which was removed in Python 3.8

import numpy

import theano
import theano.tensor as T
```

(2) Define the softmax regression model

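The Deep Learning Tutorials treat LogisticRegression directly as softmax regression: the familiar two-class logistic regression is simply LogisticRegression with n_out=2. The LogisticRegression class defined below is therefore softmax regression.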
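One way to check the n_out=2 claim numerically: with two classes, the softmax probability of class 1 equals the sigmoid of the score difference z1 − z0, so a two-column W behaves like logistic regression with θ = W[:, 1] − W[:, 0] (and likewise for b). A pure-Python sketch with arbitrary example scores:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# scores z0, z1 of class 0 and class 1 for some input (arbitrary values)
for z0, z1 in [(0.3, 1.7), (-2.0, 0.5), (4.0, -1.0)]:
    # e^z1 / (e^z0 + e^z1) = 1 / (1 + e^-(z1 - z0)) = sigmoid(z1 - z0)
    print(softmax([z0, z1])[1], sigmoid(z1 - z0))  # equal up to rounding
```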
The code is explained in the comments:
```python
# Parameters:
# input: one minibatch. If a batch holds n_example samples, input has shape (n_example, n_in).
# n_in: size of one sample. Each MNIST sample is a 28*28 image, so n_in = 784.
# n_out: number of output classes. MNIST has the 10 classes 0-9, so n_out = 10.
class LogisticRegression(object):

    def __init__(self, input, n_in, n_out):
        # W is an (n_in, n_out) matrix and b an n_out-dimensional vector: each
        # output class owns one column of W and one element of b (xW + b).
        # Both are theano.shared variables so that the program can run on a GPU.
        self.W = theano.shared(
            value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
            name='W',
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros((n_out,), dtype=theano.config.floatX),
            name='b',
            borrow=True
        )
        # input is (n_example, n_in) and W is (n_in, n_out), so their product
        # is (n_example, n_out). Adding the bias b and feeding the result to
        # T.nnet.softmax gives p_y_given_x, whose row i holds the estimated
        # probability of each class for sample i.
        # Note: b (n_out,) is broadcast over the (n_example, n_out) matrix,
        # i.e. conceptually b is copied n_example times and added to every row.
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        # argmax returns the index of the largest value in each row (axis=1).
        # For MNIST that index is exactly the predicted digit.
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # the model parameters
        self.params = [self.W, self.b]

    # Cost function: NLL averaged over the minibatch.
    # With MSGD each call sees one batch of n_example samples, so y has shape
    # (n_example,) and y.shape[0] is the number of samples. Writing
    # LP = T.log(self.p_y_given_x), LP[T.arange(y.shape[0]), y] selects
    # [LP[0, y[0]], LP[1, y[1]], LP[2, y[2]], ..., LP[n-1, y[n-1]]],
    # the log probability of each sample's true class. Taking the mean makes
    # the cost of a minibatch the average NLL of all samples in it.
    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    # Error rate on a minibatch
    def errors(self, y):
        # First check that y and y_pred have the same number of dimensions.
        # (This compares ndim, not the number of samples.)
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # Then check that y holds integers; if so, the error rate is the mean
        # of T.neq(self.y_pred, y). For example, if self.y_pred = [3,2,3,2,3,2]
        # and y = [3,4,3,4,3,4], then T.neq(self.y_pred, y) = [0,1,0,1,0,1]
        # (1 means different, 0 means equal), and
        # T.mean([0,1,0,1,0,1]) = 0.5, an error rate of 50%.
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
```

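The softmax model is now fully defined: the input batch (input), the size of each sample (n_in), the number of output classes (n_out), the model parameters W and b, the predicted output y_pred, the NLL cost function, and the error rate errors.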
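To see what the two Theano expressions compute, here is a plain-Python mirror of them, a sketch rather than the tutorial's code: `p_y_given_x` is a list of probability rows, `y` a list of integer labels, and a simple length check stands in for the ndim check. The probabilities are made up; the error-rate example is the one from the comments.

```python
import math

def negative_log_likelihood(p_y_given_x, y):
    # Picking p_y_given_x[i][y[i]] for every i is what the Theano expression
    # T.log(p)[T.arange(n), y] does with fancy indexing.
    picked = [math.log(row[label]) for row, label in zip(p_y_given_x, y)]
    return -sum(picked) / len(picked)

def errors(y_pred, y):
    # mean of the elementwise "not equal" indicator, i.e. the error rate
    if len(y_pred) != len(y):
        raise ValueError('y and y_pred must have the same length')
    return sum(1 for a, b in zip(y_pred, y) if a != b) / len(y)

probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
print(negative_log_likelihood(probs, [0, 1]))          # -(log 0.7 + log 0.8) / 2, about 0.2899
print(errors([3, 2, 3, 2, 3, 2], [3, 4, 3, 4, 3, 4]))  # 0.5
```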

(3) Load the MNIST dataset

```python
def load_data(dataset):
    # dataset is the path to the data file. The function first checks whether
    # MNIST exists at that path and downloads it if not. This part has nothing
    # to do with softmax regression, so it is not explained further.
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(
            os.path.split(__file__)[0],
            "..",
            "data",
            dataset
        )
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib.request
        origin = (
            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        )
        print('Downloading data from %s' % origin)
        urllib.request.urlretrieve(origin, dataset)

    print('... loading data')
    # Everything above only checks for (and downloads) mnist.pkl.gz, which is
    # not the focus here. The actual loading starts below.

    # Load train_set, valid_set and test_set from mnist.pkl.gz; each includes
    # its labels. This uses gzip.open() and pickle.load(); 'rb' opens the file
    # for reading in binary mode. The file was pickled under Python 2, so
    # encoding='latin1' is needed to unpickle its NumPy arrays in Python 3.
    with gzip.open(dataset, 'rb') as f:
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

    # Store the data in shared variables, mainly for GPU acceleration: only
    # shared variables can be kept in GPU memory. Data stored on the GPU has
    # to be float, so the labels data_y are stored as floatX as well, and then
    # cast back to int32 (they are used as indices) before being returned.
    def shared_dataset(data_xy, borrow=True):
        data_x, data_y = data_xy
        shared_x = theano.shared(numpy.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(numpy.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)
    valid_set_x, valid_set_y = shared_dataset(valid_set)
    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval
```
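load_data expects mnist.pkl.gz to unpickle into three (data_x, data_y) pairs, which is what the unpackings `train_set, valid_set, test_set = ...` and `data_x, data_y = data_xy` imply. The sketch below assumes only that layout: it writes a tiny made-up file in the same format with gzip and pickle and reads it back the way the loading step does, which can be handy for trying the loading logic offline. The helper names and sizes are hypothetical, and the real file holds MNIST images rather than zeros.

```python
import gzip
import os
import pickle
import tempfile

def write_toy_dataset(path):
    # same layout as load_data expects: (train_set, valid_set, test_set),
    # each a (data_x, data_y) pair; all values here are made up
    def split(n, n_in):
        data_x = [[0.0] * n_in for _ in range(n)]
        data_y = [i % 10 for i in range(n)]
        return data_x, data_y
    with gzip.open(path, 'wb') as f:
        pickle.dump((split(6, 4), split(3, 4), split(3, 4)), f)

def read_dataset(path):
    # mirrors the gzip.open / pickle.load step (without the Theano conversion)
    with gzip.open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'toy.pkl.gz')
    write_toy_dataset(path)
    train_set, valid_set, test_set = read_dataset(path)

train_x, train_y = train_set
print(len(train_x), len(train_x[0]), train_y)  # 6 4 [0, 1, 2, 3, 4, 5]
```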

(4) Apply the model to the MNIST dataset

```python
def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,
                           dataset='mnist.pkl.gz',
                           batch_size=600):
    # Load the data
    datasets = load_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # Number of minibatches in each set, since MSGD computes the cost one batch
    # at a time. Integer division (//) keeps the counts as ints.
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size

    ######################
    # BUILD THE MODEL    #
    ######################
    print('... building the model')

    # Symbolic variables: index is the minibatch index, x holds the training
    # samples and y the corresponding labels.
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    # The classifier, built with x as its input.
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    # The cost is written in terms of y, but it also depends implicitly on x
    # through classifier. That is the right way to read it: the cost comes
    # from both x and y, and y alone cannot produce it.
    cost = classifier.negative_log_likelihood(y)

    # A note on theano.function: givens is a dict whose keys are x and y and
    # whose values are what they get replaced with. When the compiled function
    # is called, x and y are substituted by these values, and the index inside
    # them is the argument passed through inputs=[index].
    # Example: test_model(1) sets index = 1, so x becomes
    # test_set_x[1 * batch_size: (1 + 1) * batch_size] and y becomes
    # test_set_y[1 * batch_size: (1 + 1) * batch_size]. The function then
    # evaluates outputs=classifier.errors(y), which depends on y and
    # implicitly on x, using these concrete slices.
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # Gradients of the cost with respect to each parameter
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)

    # Update rule from gradient descent: param <- param - learning_rate * gradient
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # train_model works like test_model, except that every call also applies
    # the updates list defined above.
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    ###############
    # TRAIN MODEL #
    ###############
    print('... training the model')
    # early-stopping parameters
    patience = 5000  # look at this many minibatches regardless
    patience_increase = 2  # on a big enough improvement, extend patience to iter * 2
    # Relative improvement threshold: patience is extended only when the
    # validation error drops below 0.995 times the previous best.
    improvement_threshold = 0.995
    # This choice of validation_frequency guarantees at least one check on the
    # validation set per epoch.
    validation_frequency = min(n_train_batches, patience // 2)

    best_validation_loss = numpy.inf  # best (i.e. lowest) validation loss so far
    test_score = 0.
    start_time = timeit.default_timer()

    done_looping = False
    epoch = 0

    # The training loop. The while loop counts epochs; one epoch visits every
    # batch, i.e. every training image. The for loop trains on one batch at a
    # time with train_model(minibatch_index), whose updates change the parameters.
    # iter counts the minibatches seen so far; whenever iter + 1 is a multiple
    # of validation_frequency, the model is evaluated on the validation set.
    # If this_validation_loss beats best_validation_loss, best_validation_loss
    # is updated and the model is evaluated on the test set; if it is also below
    # best_validation_loss * improvement_threshold, patience is extended.
    # Training stops after n_epochs epochs or as soon as patience <= iter.
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i)
                                     for i in range(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)
                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (epoch, minibatch_index + 1, n_train_batches,
                     this_validation_loss * 100.)
                )

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                       improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    # test it on the test set
                    test_losses = [test_model(i)
                                   for i in range(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(
                        ('     epoch %i, minibatch %i/%i, test error of'
                         ' best model %f %%') %
                        (epoch, minibatch_index + 1, n_train_batches,
                         test_score * 100.)
                    )

            if patience <= iter:
                done_looping = True
                break

    # end of the while loop
    end_time = timeit.default_timer()
    print(
        ('Optimization complete with best validation score of %f %%, '
         'with test performance %f %%')
        % (best_validation_loss * 100., test_score * 100.)
    )
    print('The code ran for %d epochs, with %f epochs/sec' % (
        epoch, 1. * epoch / (end_time - start_time)))
    print('The code for file ' + os.path.split(__file__)[1] +
          ' ran for %.1fs' % (end_time - start_time), file=sys.stderr)


if __name__ == '__main__':
    sgd_optimization_mnist()
```
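The control flow of sgd_optimization_mnist can be sketched in plain Python, without Theano, to see how the givens slicing and the patience rule interact. This is a simplified sketch: train_model, validate_model and test_model are left out, the validation errors are made-up numbers, and patience is lowered to 500 so the example stops quickly. The names `minibatch` and `early_stopping_schedule` are hypothetical.

```python
def minibatch(data, index, batch_size):
    # the slice that givens substitutes for x (or y) when a compiled
    # function is called with this minibatch index
    return data[index * batch_size:(index + 1) * batch_size]


def early_stopping_schedule(val_losses, n_train_batches, n_epochs,
                            patience=5000, patience_increase=2,
                            improvement_threshold=0.995):
    # val_losses: made-up validation errors, one consumed per validation check
    validation_frequency = min(n_train_batches, patience // 2)
    best = float('inf')
    losses = iter(val_losses)
    it = 0
    for epoch in range(1, n_epochs + 1):
        for minibatch_index in range(n_train_batches):
            # train_model(minibatch_index) would run here
            it = (epoch - 1) * n_train_batches + minibatch_index
            if (it + 1) % validation_frequency == 0:
                loss = next(losses)
                if loss < best:
                    # extend patience only for a large enough improvement
                    if loss < best * improvement_threshold:
                        patience = max(patience, it * patience_increase)
                    best = loss
            if patience <= it:
                return epoch, it, best  # patience used up: stop early
    return n_epochs, it, best


data = list(range(10))
print(len(data) // 3, minibatch(data, 1, 3))  # 3 [3, 4, 5]; element 9 is never used
print(early_stopping_schedule([0.5, 0.4, 0.3] + [0.3] * 100,
                              n_train_batches=100, n_epochs=1000, patience=500))
# (6, 598, 0.3): the last improvement at iteration 299 raised patience to 598
```

With 100 minibatches per epoch the validation set is checked once per epoch; after the error stops improving, training halts as soon as the iteration count reaches the extended patience, long before n_epochs.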

The End
