
Building a Neural Network from Scratch: Translation and Extensions

Published: 2025/3/15 · Programming Q&A · 13 · 豆豆

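This article, collected and compiled by 生活随笔, covers building a neural network from scratch (a translation plus extensions). The editor found it quite good and is sharing it here for reference.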

Translation
- Building a Neural Network from Scratch: Introduction
- Imports and Configuration
- Generating a Dataset
- A Helper Function to Plot Decision Boundaries
- Logistic Regression
- Training a Neural Network
  - How Our Network Makes Predictions
  - Learning the Parameters
  - Implementing the Network
- Training a Network with a 3-Unit Hidden Layer
- Varying the Hidden Layer Size
- Exercises
Exercise Solutions
- 1. Minibatch gradient
- 2. Annealing learning rate
- 3. Other activation functions
  - Sigmoid Activation
  - ReLU Activation
- 4. Three Classes
- 5. Extend the network to 4 layers

Translation

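I found this article in the reference materials while working on the assignments for Andrew Ng's deep learning course, and I thought it was well written. I translated it to deepen my own understanding. At the end I also worked through some of the exercises the author left, to strengthen my grasp of neural networks. For the original text of the translated part, see the original article.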

Building a Neural Network from Scratch: Introduction

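In this post we will implement a very simple 3-layer neural network from scratch. I won't derive all the math that's required. Instead, I'll try to give an intuitive explanation of what we are doing, and I'll point you to resources where you can read up on the details.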

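I assume you're familiar with basic calculus and machine learning concepts. For example, you should know the difference between a classification problem and a regression problem. Ideally you also know a bit about optimization techniques such as gradient descent. Even if none of this is familiar, you may still find the post interesting.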

Why implement a neural network from scratch? You may plan to use a framework like PyBrain later. Even so, implementing a network from scratch at least once teaches you how neural networks actually work, and that understanding is essential for designing effective models.

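Note that the code here is not particularly efficient, because I've favoured readability. In a later post I'll implement an efficient neural network with Theano. (That post was already out when I read this one; see the link in the original. "Efficient" there mainly means running on the GPU.)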

Imports and Configuration

```python
# Package imports
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model

# Display plots inline and change default figure size
# (%matplotlib inline is a Jupyter/IPython magic; drop it in a plain script)
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
```

Generating a Dataset

To train a model we first need a dataset. Fortunately, scikit-learn has some useful dataset generators, so we use its make_moons function directly to generate ours.

```python
# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)
```

The generated dataset looks like this:

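Our dataset has two classes, plotted as red and blue points. You can think of it like this: the blue dots are male patients, the red dots are female patients, and the x and y values are two medical measurements.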

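Our goal is to train a classifier that predicts the correct class (male or female) from the x and y values. Note that the data is not linearly separable: no straight line can separate the two classes. A linear classifier such as logistic regression therefore can't fit the data, unless you hand-engineer nonlinear features (such as polynomials) that suit this dataset.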

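This is in fact one of the major advantages of neural networks: you don't need to worry about feature engineering, because the hidden layer learns useful features for you.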

A Helper Function to Plot Decision Boundaries

```python
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry,
# it just generates the contour plot below.
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
```

Logistic Regression

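To demonstrate the point, let's train a logistic regression classifier. It takes the x and y values as input and outputs the predicted class (0 or 1). To keep things simple, we use the logistic regression implementation from scikit-learn.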

```python
# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV(cv=5)
clf.fit(X, y)

# Plot the decision boundary
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")
```

Output:

The plot shows the decision boundary learned by our logistic regression classifier. It separates the data as well as a straight line can, but it cannot capture the "moon" shape of the data.

Training a Neural Network

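Let's now build a 3-layer neural network with one input layer, one hidden layer and one output layer. The dimensionality of the data sets the number of input nodes, here 2 (x and y). The number of classes sets the number of output nodes, also 2. (With only two classes we could get away with a single output node that predicts 0 or 1. Using two nodes makes it easier to extend the network to more classes later.) The network takes x and y as input and outputs two probabilities: one for class 0 ("female") and one for class 1 ("male"). The network looks like this: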

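We choose the number of nodes in the hidden layer ourselves. The more nodes we put into the hidden layer, the more complex the functions we can fit. But a larger hidden layer comes at a cost. First, making predictions and training the network both require more computation. Second, more parameters make the model more prone to overfitting.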

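How do you choose the size of the hidden layer? There is no general guideline; the right size depends on your problem and is partly an art (the art of hyperparameter tuning!). Later we will vary the hidden layer size and see how it affects the output.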

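We also need to pick an activation function for the hidden layer. The activation function transforms the linear combination of the layer's inputs into the layer's output, and a nonlinear activation is what lets us make nonlinear predictions. Common choices are tanh, the sigmoid function and ReLUs. We use tanh here because it performs well in many scenarios. A nice property of these functions is that their derivative can be computed from the original function value. For example, the derivative of tanh(x) is 1 - tanh²(x). This is useful because we can compute tanh(x) once and reuse that value to get the derivative, which saves computation.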
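Here is a quick numerical check of this property. It is a sketch, and the helper names are made up for illustration. A central finite difference of tanh is compared with 1 - tanh²(z), computed from the output the forward pass already has. The same check is done for the sigmoid, whose derivative σ(z)(1 - σ(z)) can also be written in terms of its output.

```python
import numpy as np

def tanh_grad_from_output(a):
    # Derivative of tanh written in terms of its output a = tanh(z): 1 - a^2
    return 1.0 - a ** 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad_from_output(s):
    # Derivative of sigmoid written in terms of its output s = sigmoid(z): s * (1 - s)
    return s * (1.0 - s)

z = np.linspace(-3.0, 3.0, 13)
h = 1e-6

# Central finite differences serve as the reference derivatives
num_tanh = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)
num_sig = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

# Outputs a forward pass would already have computed; no extra tanh/exp calls needed
a = np.tanh(z)
s = sigmoid(z)

print("max tanh error:", np.max(np.abs(num_tanh - tanh_grad_from_output(a))))
print("max sigmoid error:", np.max(np.abs(num_sig - sigmoid_grad_from_output(s))))
```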

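We want the network to output probabilities, so the output layer uses the softmax activation, which converts raw scores into probabilities. If you are familiar with the logistic function, you can think of softmax as its generalization to multiple classes.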
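Below is a minimal softmax sketch. It assumes the scores are stored row-wise with shape (N, C), as in the code later in the post. It subtracts each row's maximum before exponentiating. That leaves the probabilities unchanged but avoids overflow for large scores. The document's own code exponentiates directly, which is fine for the small scores in this example. The sketch also checks that with two classes, softmax reduces to the logistic function of the score difference.

```python
import numpy as np

def softmax(z):
    # Softmax is shift-invariant, so subtracting the row max changes nothing
    # mathematically but keeps np.exp from overflowing.
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

scores = np.array([[2.0, 0.5],
                   [-1.0, 3.0],
                   [1000.0, 999.0]])  # a naive np.exp(1000.0) would overflow

probs = softmax(scores)
print(probs.sum(axis=1))  # each row sums to 1

# With two classes: p(class 1) = sigmoid(z_1 - z_0)
print(np.allclose(probs[:, 1], sigmoid(scores[:, 1] - scores[:, 0])))
```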

How Our Network Makes Predictions

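Our network makes predictions using forward propagation, which is just a series of matrix multiplications plus the activation functions defined above. If x is the 2-dimensional input to the network, we compute the prediction $\hat{y}$ as follows:
$z_1 = xW_1 + b_1$
$a_1 = \tanh(z_1)$
$z_2 = a_1W_2 + b_2$
$a_2 = \hat{y} = \mathrm{softmax}(z_2)$
$z_i$ is the input of layer $i$, and $a_i$ is the output of layer $i$ after the activation function is applied. $W_1, b_1, W_2, b_2$ are the network's parameters, which we must learn from the training data. You can think of them as matrices that transform data between the layers of the network. The matrix multiplications above determine the dimensions of these matrices. If the hidden layer has 500 nodes, then $W_1 \in \mathbb{R}^{2\times 500}$, $b_1 \in \mathbb{R}^{500}$, $W_2 \in \mathbb{R}^{500\times 2}$ and $b_2 \in \mathbb{R}^{2}$. This shows why a larger hidden layer means more parameters.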
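To make the dimensions concrete, here is a shape-check sketch of the forward pass above. It uses random, untrained parameters, N = 4 inputs and a 500-unit hidden layer. Each bias is stored as a (1, H) row so that it broadcasts over the batch, which is how the later code stores b1 and b2. The count_params helper is illustrative. It counts D·H + H + H·C + C parameters, which is 5H + 2 when D = C = 2.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    z1 = x.dot(W1) + b1          # (N, H)
    a1 = np.tanh(z1)             # (N, H)
    z2 = a1.dot(W2) + b2         # (N, C)
    exp_scores = np.exp(z2 - z2.max(axis=1, keepdims=True))
    y_hat = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # (N, C), rows sum to 1
    return z1, a1, z2, y_hat

def count_params(input_dim, hidden_dim, output_dim):
    # Sizes of W1 + b1 + W2 + b2
    return input_dim * hidden_dim + hidden_dim + hidden_dim * output_dim + output_dim

rng = np.random.default_rng(0)
N, D, H, C = 4, 2, 500, 2
x = rng.standard_normal((N, D))
W1 = rng.standard_normal((D, H)) / np.sqrt(D)
b1 = np.zeros((1, H))
W2 = rng.standard_normal((H, C)) / np.sqrt(H)
b2 = np.zeros((1, C))

z1, a1, z2, y_hat = forward(x, W1, b1, W2, b2)
print(z1.shape, a1.shape, z2.shape, y_hat.shape)  # (4, 500) (4, 500) (4, 2) (4, 2)
print(count_params(D, H, C))  # 2502
print(count_params(D, 3, C))  # 17
```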

Learning the Parameters

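Learning the parameters means finding the $(W_1, b_1, W_2, b_2)$ that minimize the error on the training data. How do we define that error? We use a loss function. With a softmax output, the usual choice is the cross-entropy loss (also called negative log likelihood). Given $N$ training examples and $C$ classes, the loss of our prediction $\hat{y}$ with respect to the true labels $y$ is:
$L(y,\hat{y}) = -\frac{1}{N}\sum_{n\in N}\sum_{i\in C} y_{n,i}\log\hat{y}_{n,i}$

The formula looks complicated, but all it does is sum over the training examples and add to the loss whenever we assign low probability to the correct class. The further apart $y$ (the true labels) and $\hat{y}$ (our predictions) are, the larger the loss. Minimizing the loss is equivalent to maximizing the likelihood of the training data.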
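The double sum can be computed literally with one-hot labels. Alternatively, because only the correct class has $y_{n,i} = 1$, you can pick out the correct-class probability in each row, which is what calculate_loss does later. The sketch below uses made-up probabilities to show that both forms agree. The function names are illustrative.

```python
import numpy as np

def cross_entropy_onehot(probs, y_onehot):
    # Literal form: -1/N * sum_n sum_i y_{n,i} * log(yhat_{n,i})
    N = probs.shape[0]
    return -np.sum(y_onehot * np.log(probs)) / N

def cross_entropy_index(probs, y):
    # Equivalent form: only the correct class's probability contributes
    N = probs.shape[0]
    return -np.sum(np.log(probs[np.arange(N), y])) / N

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
y = np.array([0, 1, 1])          # the third example is misclassified
y_onehot = np.eye(2)[y]

print(cross_entropy_onehot(probs, y_onehot))  # both ≈ 0.415
print(cross_entropy_index(probs, y))
```

The misclassified third example (probability 0.4 for its true class) contributes -log 0.4 ≈ 0.92 on its own, far more than the other two examples combined.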

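We minimize the loss with gradient descent. I'll implement the most basic version, batch gradient descent with a fixed learning rate. Variants such as SGD (stochastic gradient descent) and minibatch gradient descent usually perform better in practice. If you're serious, pick one of those, and ideally also decay the learning rate over time.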

As input, gradient descent needs the gradients (vectors of derivatives) of the loss function with respect to the parameters: $\frac{\partial L}{\partial W_1}$, $\frac{\partial L}{\partial b_1}$, $\frac{\partial L}{\partial W_2}$, $\frac{\partial L}{\partial b_2}$. We compute these gradients with the well-known backpropagation algorithm. I won't explain how backpropagation works here; the original article links to two widely shared explanations.

Applying backpropagation gives the following formulas (trust me, I checked my calculations):
$\delta_3 = \hat{y} - y$
$\delta_2 = (1 - \tanh^2 z_1) \circ \delta_3 W_2^T$
$\frac{\partial L}{\partial W_2} = a_1^T \delta_3$
$\frac{\partial L}{\partial b_2} = \delta_3$
$\frac{\partial L}{\partial W_1} = x^T \delta_2$
$\frac{\partial L}{\partial b_1} = \delta_2$
Here $\circ$ denotes element-wise multiplication and $y$ is the one-hot encoding of the label.
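A finite-difference gradient check is a cheap way to test these formulas. The sketch below is self-contained and uses small random data and parameters. It differentiates the summed cross-entropy without regularization (N·L in the notation above). That is the right target because build_model sums the per-example formulas over the rows of the batch, which gives the gradient of the sum. The sizes, seed, step h and tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, H, C = 5, 2, 3, 2
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
params = {
    "W1": rng.standard_normal((D, H)),
    "b1": rng.standard_normal((1, H)),
    "W2": rng.standard_normal((H, C)),
    "b2": rng.standard_normal((1, C)),
}

def forward(p):
    a1 = np.tanh(X.dot(p["W1"]) + p["b1"])
    z2 = a1.dot(p["W2"]) + p["b2"]
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))
    return a1, e / e.sum(axis=1, keepdims=True)

def summed_loss(p):
    # Summed (not averaged) cross-entropy, no regularization
    _, probs = forward(p)
    return -np.sum(np.log(probs[np.arange(N), y]))

def backprop(p):
    a1, probs = forward(p)
    delta3 = probs.copy()
    delta3[np.arange(N), y] -= 1                       # delta3 = yhat - y
    delta2 = (1 - a1 ** 2) * delta3.dot(p["W2"].T)     # (1 - tanh^2 z1) ∘ delta3 W2^T
    return {
        "W2": a1.T.dot(delta3),
        "b2": delta3.sum(axis=0, keepdims=True),       # per-example delta3, summed over rows
        "W1": X.T.dot(delta2),
        "b1": delta2.sum(axis=0, keepdims=True),
    }

def numerical_grad(p, name, h=1e-5):
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(*p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + h
        plus = summed_loss(p)
        p[name][idx] = old - h
        minus = summed_loss(p)
        p[name][idx] = old                             # restore the original value
        grad[idx] = (plus - minus) / (2 * h)
    return grad

analytic = backprop(params)
for name in ["W1", "b1", "W2", "b2"]:
    err = np.max(np.abs(analytic[name] - numerical_grad(params, name)))
    print(name, "max abs difference:", err)
```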

Implementing the Network

Now we are ready to implement the network. We start by defining some useful variables and parameters.

```python
num_examples = len(X)  # training set size
nn_input_dim = 2  # input layer dimensionality
nn_output_dim = 2  # output layer dimensionality

# Gradient descent parameters (I picked these by hand)
epsilon = 0.01  # learning rate for gradient descent
reg_lambda = 0.01  # regularization strength
```

First, we implement the loss function defined above. We use it to evaluate how well the model is doing.

```python
# Helper function to evaluate the total loss on the dataset
def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss
```

We also need a helper function for making predictions. It runs forward propagation as defined above and returns the class with the highest probability.

```python
# Helper function to predict an output (0 or 1)
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)
```

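Finally, we define the function that trains the network. It implements batch gradient descent with a fixed learning rate, using the backpropagation formulas above.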

```python
# This function learns parameters for the neural network and returns the model.
# - nn_hdim: Number of nodes in the hidden layer
# - num_passes: Number of passes through the training data for gradient descent
# - print_loss: If True, print the loss every 1000 iterations
def build_model(nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset,
        # so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model)))

    return model
```

Training a Network with a 3-Unit Hidden Layer

```python
# Build a model with a 3-dimensional hidden layer
model = build_model(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
```

Result:

Yay! This looks pretty good. Our neural network found a decision boundary that successfully separates the two classes.

Varying the Hidden Layer Size

In the example above we used a hidden layer of size 3. Let's see how the hidden layer size affects the result.

```python
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i + 1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(nn_hdim)
    plot_decision_boundary(lambda x: predict(model, x))
plt.show()
```

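A low-dimensional hidden layer captures the general trend of the data nicely. Higher dimensionalities tend to overfit: they "memorize" the training data instead of fitting its overall shape, which hurts generalization. You should evaluate the model on a separate test set. If you do, the network with the smaller hidden layer will likely perform better because it generalizes better. Stronger regularization could counteract the overfitting, but picking the right hidden layer size is a more "economical" solution.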
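One way to follow the advice to use a separate test set is sketched below. It is self-contained, so it replaces sklearn's make_moons with a hand-rolled two-moons generator. The generator uses the same half-circle geometry; this is an assumption, and the samples are not identical to sklearn's. The sketch holds out half the points and trains the same tanh network as build_model, with fewer passes (5000) so it runs quickly. Which hidden size wins depends on the noise level, split and seed, so treat the printed numbers as illustrative only.

```python
import numpy as np

def make_two_moons(n, noise, rng):
    # Hand-rolled stand-in for sklearn.datasets.make_moons (same geometry,
    # not bit-identical output)
    n_out = n // 2
    n_in = n - n_out
    t_out = np.linspace(0, np.pi, n_out)
    t_in = np.linspace(0, np.pi, n_in)
    outer = np.c_[np.cos(t_out), np.sin(t_out)]
    inner = np.c_[1 - np.cos(t_in), 1 - np.sin(t_in) - 0.5]
    X = np.vstack([outer, inner]) + rng.normal(scale=noise, size=(n, 2))
    y = np.r_[np.zeros(n_out, dtype=int), np.ones(n_in, dtype=int)]
    return X, y

def train(X, y, hdim, num_passes=5000, epsilon=0.01, reg_lambda=0.01, seed=0):
    # Same full-batch gradient descent as build_model, but with the data passed in
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((2, hdim)) / np.sqrt(2)
    b1 = np.zeros((1, hdim))
    W2 = rng.standard_normal((hdim, 2)) / np.sqrt(hdim)
    b2 = np.zeros((1, 2))
    n = len(X)
    for _ in range(num_passes):
        a1 = np.tanh(X.dot(W1) + b1)
        z2 = a1.dot(W2) + b2
        e = np.exp(z2 - z2.max(axis=1, keepdims=True))
        delta3 = e / e.sum(axis=1, keepdims=True)
        delta3[np.arange(n), y] -= 1
        dW2 = a1.T.dot(delta3) + reg_lambda * W2
        db2 = delta3.sum(axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - a1 ** 2)
        dW1 = X.T.dot(delta2) + reg_lambda * W1
        db1 = delta2.sum(axis=0, keepdims=True)
        W1 -= epsilon * dW1
        b1 -= epsilon * db1
        W2 -= epsilon * dW2
        b2 -= epsilon * db2
    return W1, b1, W2, b2

def accuracy(params, X, y):
    W1, b1, W2, b2 = params
    scores = np.tanh(X.dot(W1) + b1).dot(W2) + b2  # argmax of scores == argmax of softmax
    return np.mean(np.argmax(scores, axis=1) == y)

rng = np.random.default_rng(0)
X, y = make_two_moons(400, noise=0.20, rng=rng)
perm = rng.permutation(len(X))
train_idx, test_idx = perm[:200], perm[200:]  # 50/50 split, chosen arbitrarily

results = {}
for hdim in [1, 3, 5, 20, 50]:
    params = train(X[train_idx], y[train_idx], hdim)
    results[hdim] = (accuracy(params, X[train_idx], y[train_idx]),
                     accuracy(params, X[test_idx], y[test_idx]))
    print(f"hidden={hdim:2d}  train acc={results[hdim][0]:.3f}  test acc={results[hdim][1]:.3f}")
```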

Exercises

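1. Train the network with minibatch gradient descent instead of batch gradient descent. Minibatch gradient descent usually performs better in practice.
2. We used a fixed learning rate $\epsilon$ for gradient descent. Implement an annealing (decay) schedule for the learning rate.
3. We used tanh as the activation function for the hidden layer. Experiment with other activation functions.
4. Extend the network from two to three classes. You will need to generate a suitable dataset for this.
5. Extend the network to four layers, and experiment with different hidden layer sizes.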

Exercise Solutions

1. Minibatch gradient

Only the training function needs to change:

```python
import random

def build_model_batch(nn_hdim, num_passes=50000, print_loss=False, batch_size=50):
    # batch_size is the size of each minibatch
    # List of training-set indices to sample from
    indexes = list(range(num_examples))

    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Randomly draw batch_size examples from the training set
        train_indexes = random.sample(indexes, batch_size)
        X_train = X[train_indexes, :]
        y_train = y[train_indexes]

        # Forward propagation
        z1 = X_train.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(batch_size), y_train] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X_train.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset,
        # so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model)))

    return model
```

Train the model this way:

```python
# Build a model with a 3-dimensional hidden layer with MiniBatch
model = build_model_batch(3, print_loss=False)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3 with MiniBatch")
```

Result:
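One detail of the solution above: minibatches are drawn with random.sample from Python's random module. np.random.seed(0) does not seed that module, so runs differ unless random.seed is also called. Sampling independently at every step can also leave some examples unused for a while. A common alternative is to shuffle once per epoch and then slice, so that every example is used exactly once per epoch. The sketch below does this with a hypothetical iterate_minibatches helper. The last batch can be smaller, so a training loop using it should index with len(batch_idx) rather than batch_size.

```python
import numpy as np

def iterate_minibatches(num_examples, batch_size, rng):
    """Yield index arrays that cover every example once, in shuffled order."""
    order = rng.permutation(num_examples)
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(200, 50, rng))
print(len(batches), [len(b) for b in batches])  # 4 [50, 50, 50, 50]

# One epoch inside a training loop would look like:
# for batch_idx in iterate_minibatches(num_examples, batch_size, rng):
#     X_batch, y_batch = X[batch_idx], y[batch_idx]
#     ... forward pass, backprop using range(len(batch_idx)), parameter update ...
```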

2. Annealing learning rate

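Annealing the learning rate means starting with large steps and shrinking them over time. The large early steps make fast progress and can carry the parameters past poor regions, and the small later steps let training settle into a minimum. Here I implement the simplest possible annealing scheme.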

```python
# Initial learning rate
max_epsilon = 0.01
# Final learning rate
min_epsilon = 0.001

def build_model_annealing(nn_hdim, num_passes=80000, print_loss=False, explore=100):
    # explore is the annealing period: the learning rate is lowered once every explore iterations
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}
    # Start from the maximum learning rate
    tem_epsilon = max_epsilon

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Anneal: step the learning rate down, never below min_epsilon
        if tem_epsilon > min_epsilon and i % explore == 0:
            tem_epsilon -= (max_epsilon - min_epsilon) / explore
            tem_epsilon = max(tem_epsilon, min_epsilon)

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update with the current learning rate
        W1 += -tem_epsilon * dW1
        b1 += -tem_epsilon * db1
        W2 += -tem_epsilon * dW2
        b2 += -tem_epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f, current learning rate: %s"
                  % (i, calculate_loss(model), tem_epsilon))

    return model

# Build a model with a 3-dimensional hidden layer with Annealing
model = build_model_annealing(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3 with Annealing")
```

Output:
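For reference, the schedule above is a stepwise linear decay. Every explore iterations, starting at i = 0, the rate drops by (max_epsilon - min_epsilon) / explore. With explore = 100 it reaches min_epsilon after about explore² = 10,000 of the 80,000 iterations. The sketch below writes that schedule in closed form, alongside a few other common decay schedules, as plain functions of the iteration number i. The function names and default constants are illustrative choices, not taken from the original post.

```python
import math

def step_linear(i, max_lr=0.01, min_lr=0.001, explore=100):
    # Closed form of the schedule in build_model_annealing: one decrement of
    # (max_lr - min_lr) / explore every `explore` iterations, counting the one
    # at i = 0, floored at min_lr
    steps = i // explore + 1
    return max(max_lr - steps * (max_lr - min_lr) / explore, min_lr)

def exponential_decay(i, lr0=0.01, decay_rate=0.96, decay_steps=1000):
    # Smooth multiplicative decay: lr0 * decay_rate^(i / decay_steps)
    return lr0 * decay_rate ** (i / decay_steps)

def inverse_time_decay(i, lr0=0.01, k=1e-4):
    # 1/t-style decay: lr0 / (1 + k * i)
    return lr0 / (1.0 + k * i)

def cosine_decay(i, total, max_lr=0.01, min_lr=0.001):
    # Half a cosine wave from max_lr at i = 0 down to min_lr at i = total
    progress = min(i, total) / total
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for i in [0, 5000, 10000, 40000, 80000]:
    print(i, step_linear(i), exponential_decay(i), inverse_time_decay(i),
          cosine_decay(i, total=80000))
```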

3. Other activation functions

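Changing the activation function means modifying the loss, prediction and training functions. The cleanest approach would be to add an activation-function parameter; I took the lazy route and copied the functions.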
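Here is a sketch of that parameterized approach, under three assumptions. First, each activation is a (function, derivative) pair, and the derivative receives both the pre-activation z1 and the output a1 (tanh and sigmoid need only a1, ReLU needs z1). Second, the data is passed in rather than read from globals. Third, labels are integers 0..C-1, so the output size can be inferred from y. The ACTIVATIONS table and the function signatures are hypothetical. Only the forward activation and the line computing delta2 depend on the choice of activation.

```python
import numpy as np

# Each entry: (forward function, derivative given pre-activation z and output a)
ACTIVATIONS = {
    "tanh": (np.tanh, lambda z, a: 1.0 - a ** 2),
    "sigmoid": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda z, a: a * (1.0 - a)),
    "relu": (lambda z: np.maximum(z, 0.0), lambda z, a: (z > 0).astype(z.dtype)),
}

def forward(model, x, activation="tanh"):
    act, _ = ACTIVATIONS[activation]
    z1 = x.dot(model["W1"]) + model["b1"]
    a1 = act(z1)
    z2 = a1.dot(model["W2"]) + model["b2"]
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))
    return z1, a1, e / e.sum(axis=1, keepdims=True)

def predict(model, x, activation="tanh"):
    return np.argmax(forward(model, x, activation)[2], axis=1)

def build_model(X, y, nn_hdim, activation="tanh", num_passes=20000,
                epsilon=0.01, reg_lambda=0.01, seed=0):
    _, act_grad = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)
    n, input_dim = X.shape
    output_dim = int(y.max()) + 1  # assumes labels 0..C-1
    model = {
        "W1": rng.standard_normal((input_dim, nn_hdim)) / np.sqrt(input_dim),
        "b1": np.zeros((1, nn_hdim)),
        "W2": rng.standard_normal((nn_hdim, output_dim)) / np.sqrt(nn_hdim),
        "b2": np.zeros((1, output_dim)),
    }
    for _ in range(num_passes):
        z1, a1, probs = forward(model, X, activation)
        delta3 = probs
        delta3[np.arange(n), y] -= 1
        # The only backprop line that depends on the activation
        delta2 = delta3.dot(model["W2"].T) * act_grad(z1, a1)
        grads = {
            "W2": a1.T.dot(delta3) + reg_lambda * model["W2"],
            "b2": delta3.sum(axis=0, keepdims=True),
            "W1": X.T.dot(delta2) + reg_lambda * model["W1"],
            "b1": delta2.sum(axis=0, keepdims=True),
        }
        for k in model:
            model[k] -= epsilon * grads[k]
    return model
```

For example, model = build_model(X, y, 3, activation="relu") followed by plot_decision_boundary(lambda x: predict(model, x, "relu")) should play the role of the ReLU experiment below. Because the output size is inferred from y, the same function also covers the three-class case.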

Sigmoid Activation

```python
# Define the sigmoid function
def sigmoid(z):
    s = 1.0 / (1 + np.exp(-z))
    return s

# Plot the sigmoid curve
def draw_sigmoid():
    fig = plt.figure(figsize=(6, 4))
    ax = fig.add_subplot(111)
    x = np.linspace(-6, 6, 1000)  # 1000 x values between -6 and 6
    y = sigmoid(x)  # apply sigmoid element-wise to all 1000 values
    plt.xlim((-6, 6))
    plt.ylim((0.00, 1.00))
    plt.yticks([0, 0.5, 1.0], [0, 0.5, 1.0])  # y-axis tick positions and labels
    plt.plot(x, y, color='darkblue')  # draw the curve through the 1000 (x, y) points
    ax.spines['right'].set_color('none')  # hide the right border
    ax.spines['top'].set_color('none')  # hide the top border
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))  # move the x-axis to y = 0
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))  # move the y-axis to x = 0
    plt.xlabel("sigmoid")
    plt.show()

draw_sigmoid()
```

Output:

Model implementation with a sigmoid activation:

```python
# Helper function to evaluate the total loss on the dataset (sigmoid edition)
def calculate_loss_sigmoid(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = sigmoid(z1)  # activation changed to sigmoid
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss

# Helper function to predict an output (0 or 1) (sigmoid edition)
def predict_sigmoid(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = sigmoid(z1)  # activation changed to sigmoid
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

# Sigmoid edition of build_model
def build_model_sigmoid(nn_hdim, num_passes=50000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))
    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = sigmoid(z1)  # activation changed to sigmoid
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        # The derivative is now the sigmoid's: a1 * (1 - a1)
        delta2 = delta3.dot(W2.T) * a1 * (1 - a1)
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Use the sigmoid version of the loss
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss_sigmoid(model)))

    return model

# Build a model with a 3-dimensional hidden layer with sigmoid
model = build_model_sigmoid(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict_sigmoid(model, x))
plt.title("Decision Boundary for hidden layer size 3 with sigmoid")
```

Output:

ReLU Activation

Plotting ReLU:

```python
def draw_ReLU():
    fig = plt.figure(figsize=(6, 4))
    ax = fig.add_subplot(111)
    x = np.arange(-10, 10)
    y = np.where(x > 0, x, 0)
    plt.xlim(-11, 11)
    plt.ylim(-11, 11)
    ax.spines['top'].set_color('none')
    ax.spines['right'].set_color('none')
    ax.xaxis.set_ticks_position('bottom')
    ax.spines['bottom'].set_position(('data', 0))
    ax.set_xticks([-10, -5, 0, 5, 10])
    ax.yaxis.set_ticks_position('left')
    ax.spines['left'].set_position(('data', 0))
    ax.set_yticks([-10, -5, 5, 10])
    plt.plot(x, y, label="ReLU", color="blue")
    plt.legend()
    plt.show()

draw_ReLU()
```

Output:

Model implementation with a ReLU activation:

```python
# Helper function to evaluate the total loss on the dataset (ReLU edition)
def calculate_loss_ReLU(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.where(z1 > 0, z1, 0)  # activation changed to ReLU
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss

# Helper function to predict an output (0 or 1) (ReLU edition)
def predict_ReLU(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.where(z1 > 0, z1, 0)  # activation changed to ReLU
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

# ReLU edition of build_model
def build_model_ReLU(nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))
    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.where(z1 > 0, z1, 0)  # activation changed to ReLU
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T)
        # ReLU derivative: pass the gradient through only where z1 > 0
        delta2 = np.where(z1 > 0, delta2, 0)
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Use the ReLU version of the loss
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss_ReLU(model)))

    return model

# Build a model with a 3-dimensional hidden layer with ReLU
model = build_model_ReLU(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict_ReLU(model, x))
plt.title("Decision Boundary for hidden layer size 3 with ReLU")
```

Output:

4. Three Classes

First we need a dataset. This one is also generated with scikit-learn:

```python
# Generate a dataset and plot it
X1, y1 = sklearn.datasets.make_classification(
    n_samples=300, n_features=2, n_redundant=0, n_informative=2,
    n_clusters_per_class=1, n_classes=3, random_state=29)
plt.scatter(X1[:, 0], X1[:, 1], s=40, c=y1, cmap=plt.cm.Spectral)
```

Output:

Set up the parameters for the three-class problem:

```python
num_examples1 = len(X1)  # training set size
nn_input_dim1 = 2  # input layer dimensionality
nn_output_dim1 = 3  # output layer dimensionality
```

Implementing the three-class network
Because the output layer uses softmax, almost nothing needs to change. Only the dataset and the number of output neurons are different.
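Here is why only the output size changes. With softmax plus cross-entropy, the output-layer delta is ŷ - y for any number of classes C. The line delta3[range(num_examples), y] -= 1 is just an in-place way of subtracting the one-hot label matrix. The check below uses C = 3 and made-up probabilities.

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.3, 0.4, 0.3]])
y = np.array([0, 2, 1])
num_classes = probs.shape[1]

# Explicit form: delta3 = yhat - onehot(y)
onehot = np.eye(num_classes)[y]
delta_explicit = probs - onehot

# In-place form used in build_model / build_model1 (copied here so probs stays intact)
delta_inplace = probs.copy()
delta_inplace[np.arange(len(y)), y] -= 1

print(np.allclose(delta_explicit, delta_inplace))  # True
print(delta_inplace.sum(axis=1))  # each row sums to ~0, because each probs row sums to 1
```

The original code writes delta3 = probs and modifies it in place. That is safe there because probs is not used again afterwards; the copy above exists only so the comparison can use the untouched probs.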

```python
# Helper function to evaluate the total loss on the dataset
# (same as calculate_loss, with the dataset switched to X1/y1)
def calculate_loss1(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X1.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples1), y1])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples1 * data_loss

# Helper function to predict an output (0, 1 or 2)
# (identical to predict apart from the name)
def predict1(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

def build_model1(nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim1, nn_hdim) / np.sqrt(nn_input_dim1)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim1) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim1))
    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X1.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples1), y1] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X1.T, delta2)
        db1 = np.sum(delta2, axis=0)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss1(model)))

    return model
```

The dataset has changed, and the author's decision-boundary function hard-codes the dataset (X and y) inside the function. Being lazy, I copied it and adapted the copy:

```python
def plot_decision_boundary1(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X1[:, 0].min() - .5, X1[:, 0].max() + .5
    y_min, y_max = X1[:, 1].min() - .5, X1[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X1[:, 0], X1[:, 1], c=y1, cmap=plt.cm.Spectral)

# Build a model with a 5-dimensional hidden layer
model = build_model1(5, print_loss=True)

# Plot the decision boundary
plot_decision_boundary1(lambda x: predict1(model, x))
plt.title("Decision Boundary for hidden layer size 5")
```

Output:
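Instead of copying the plotting helper for every dataset, you can pass the data in as arguments. The sketch below does that; the names decision_grid and plot_decision_boundary_xy are hypothetical. It separates grid evaluation from drawing, so the grid part can be used or tested without matplotlib.

```python
import numpy as np

def decision_grid(pred_func, X, h=0.01, pad=0.5):
    """Evaluate pred_func on a grid covering X plus padding; returns xx, yy, Z."""
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, Z

def plot_decision_boundary_xy(pred_func, X, y, h=0.01, ax=None):
    """Same plot as plot_decision_boundary, but with the data passed in."""
    import matplotlib.pyplot as plt  # local import so decision_grid works without matplotlib
    ax = ax if ax is not None else plt.gca()
    xx, yy, Z = decision_grid(pred_func, X, h)
    ax.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
    return ax
```

With this helper, the call above would become plot_decision_boundary_xy(lambda x: predict1(model, x), X1, y1), and the original two-moons plot would be plot_decision_boundary_xy(lambda x: predict(model, x), X, y).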

5. Extend the network to 4 layers

Each of the three functions simply gains one more hidden layer:

```python
def calculate_loss2(model):
    W1, b1, W2, b2, W3, b3 = (model['W1'], model['b1'], model['W2'],
                              model['b2'], model['W3'], model['b3'])
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = np.tanh(z2)
    z3 = a2.dot(W3) + b3
    exp_scores = np.exp(z3)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional); includes W3, which training also regularizes
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2))
                                   + np.sum(np.square(W3)))
    return 1. / num_examples * data_loss

def predict2(model, x):
    W1, b1, W2, b2, W3, b3 = (model['W1'], model['b1'], model['W2'],
                              model['b2'], model['W3'], model['b3'])
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    a2 = np.tanh(z2)
    z3 = a2.dot(W3) + b3
    exp_scores = np.exp(z3)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

def build_model2(nn_hdim, nn_hdim1, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_hdim1) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_hdim1))
    W3 = np.random.randn(nn_hdim1, nn_output_dim) / np.sqrt(nn_hdim1)
    b3 = np.zeros((1, nn_output_dim))
    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        a2 = np.tanh(z2)
        z3 = a2.dot(W3) + b3
        exp_scores = np.exp(z3)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta4 = probs
        delta4[range(num_examples), y] -= 1
        dW3 = (a2.T).dot(delta4)
        db3 = np.sum(delta4, axis=0, keepdims=True)
        delta3 = delta4.dot(W3.T) * (1 - np.power(a2, 2))
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0, keepdims=True)
        dW3 += reg_lambda * W3
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2
        W3 += -epsilon * dW3
        b3 += -epsilon * db3

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2, 'W3': W3, 'b3': b3}
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss2(model)))

    return model

# Build a model with two hidden layers of sizes 3 and 4
model = build_model2(3, 4, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict2(model, x))
plt.title("Decision Boundary for hidden layer size 3-4")
```

Output:
I tried a few sizes; hidden layers of 3 and 4 units seem to work well.
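Each extra layer meant hand-editing all three functions again. As a sketch of how this generalizes, you can describe the layers with a list of sizes, e.g. [2, 3, 4, 2] for the 3-4 network above, and loop over the layers in the forward and backward passes. The sketch keeps the document's choices: tanh hidden layers, a softmax output, an L2 penalty on the weights only, and full-batch gradient descent. The function names, the defaults and the seeding through np.random.default_rng are assumptions, so it will not reproduce build_model2's exact numbers.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    Ws, bs = [], []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        Ws.append(rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in))
        bs.append(np.zeros((1, fan_out)))
    return Ws, bs

def forward(Ws, bs, X):
    activations = [X]  # activations[l] is the input to layer l
    for W, b in zip(Ws[:-1], bs[:-1]):
        activations.append(np.tanh(activations[-1].dot(W) + b))
    z = activations[-1].dot(Ws[-1]) + bs[-1]
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return activations, e / e.sum(axis=1, keepdims=True)

def predict(Ws, bs, X):
    return np.argmax(forward(Ws, bs, X)[1], axis=1)

def train(X, y, layer_sizes, num_passes=20000, epsilon=0.01, reg_lambda=0.01):
    Ws, bs = init_params(layer_sizes)
    n = len(X)
    for _ in range(num_passes):
        activations, probs = forward(Ws, bs, X)
        delta = probs
        delta[np.arange(n), y] -= 1                   # output delta: yhat - y
        for l in range(len(Ws) - 1, -1, -1):          # walk the layers backwards
            dW = activations[l].T.dot(delta) + reg_lambda * Ws[l]
            db = delta.sum(axis=0, keepdims=True)
            if l > 0:
                # Propagate through tanh using this layer's weights before updating them
                delta = delta.dot(Ws[l].T) * (1 - activations[l] ** 2)
            Ws[l] -= epsilon * dW
            bs[l] -= epsilon * db
    return Ws, bs
```

For example, Ws, bs = train(X, y, [2, 3, 4, 2]) followed by plot_decision_boundary(lambda x: predict(Ws, bs, x)) should give the analogue of the 3-4 experiment. Trying another architecture only means changing the list.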
