01. Neural Networks and Deep Learning W2. Neural Network Basics (Assignment: Logistic Regression for Image Recognition)
Table of Contents
- Programming Exercise 1
- 1. Basic numpy functions
- 1.1 Writing the sigmoid function
- 1.2 Writing the derivative of the sigmoid function
- 1.3 The reshape operation
- 1.4 Normalization
- 1.5 Broadcasting
- 2. Vectorization
- 2.1 L1 / L2 loss functions
- Programming Exercise 2. Cat 🐱 image recognition
- 1. Import packages
- 2. Data preview
- 3. General structure of the algorithm
- 4. Building the algorithm
- 4.1 Helper functions
- 4.2 Initializing parameters
- 4.3 Forward and backward propagation
- 4.4 Updating parameters with gradient descent
- 4.5 Combining everything into a model
- 4.6 Analysis
- 4.7 Testing the model with your own image
- 5. Summary
For the multiple-choice quiz, please refer to the linked blog post.
Programming Exercise 1
1. Basic numpy functions
1.1 Writing the sigmoid function
```python
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- A scalar

    Return:
    s -- sigmoid(x)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+math.pow(math.e, -x))  # or s = 1/(1+math.exp(-x))
    ### END CODE HERE ###
    return s
```

- Using the math package is not recommended: in deep learning most quantities are vectors or matrices, and math functions cannot operate on them.
- Write the sigmoid function with numpy instead; a sketch follows below.
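A minimal sketch of the numpy-based sigmoid used in the rest of the assignment:

```python
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x.

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    s = 1 / (1 + np.exp(-x))
    return s

x = np.array([1, 2, 3])
print("sigmoid(x) = " + str(sigmoid(x)))
# sigmoid(x) = [0.73105858 0.88079708 0.95257413]
```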
1.2 Writing the derivative of the sigmoid function
```python
# GRADED FUNCTION: sigmoid_derivative

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function
    with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it
    to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    s = sigmoid(x)
    ds = s*(1-s)
    ### END CODE HERE ###
    return ds

x = np.array([1, 2, 3])
print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
```

1.3 The reshape operation
Flatten the image data. For a dimension you do not want to compute by hand, pass -1 and numpy will infer it automatically.
```python
# GRADED FUNCTION: image2vector

def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    v = image.reshape(-1, 1)
    ### END CODE HERE ###
    return v

# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y, 3)
# where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
                   [ 0.90714982,  0.52835647],
                   [ 0.4215251 ,  0.45017551]],
                  [[ 0.92814219,  0.96677647],
                   [ 0.85304703,  0.52351845],
                   [ 0.19981397,  0.27417313]],
                  [[ 0.60659855,  0.00533165],
                   [ 0.10820313,  0.49978937],
                   [ 0.34144279,  0.94630077]]])

print ("image2vector(image) = " + str(image2vector(image)))
# image2vector(image) = [[0.67826139]
#  [0.29380381]
#  [0.90714982]
#  ...
#  [0.34144279]
#  [0.94630077]]   (18 rows in total, one per value of the input array)
```

1.4 Normalization
Normalization usually makes gradient descent converge faster.
For example, with

$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix}$$

then

$$\|x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix}$$

and

$$x\_normalized = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix}$$
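A minimal sketch of a normalizeRows function implementing the row normalization above:

```python
def normalizeRows(x):
    """
    Normalize each row of the matrix x to have unit (L2) norm.

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The row-normalized numpy matrix
    """
    # Norm of each row; keepdims=True keeps shape (n, 1) so broadcasting works
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / x_norm
    return x

x = np.array([[0, 3, 4],
              [2, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0.         0.6        0.8       ]
#  [0.26726124 0.80178373 0.53452248]]
```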
1.5 Broadcasting
Official documentation
For a row vector $x \in \mathbb{R}^{1 \times n}$:

$$softmax(x) = softmax\left(\begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}\right) = \begin{bmatrix} \frac{e^{x_1}}{\sum_j e^{x_j}} & \frac{e^{x_2}}{\sum_j e^{x_j}} & \dots & \frac{e^{x_n}}{\sum_j e^{x_j}} \end{bmatrix}$$
For a matrix $x \in \mathbb{R}^{m \times n}$, where $x_{ij}$ is the element in the $i^{th}$ row and $j^{th}$ column of $x$, we have

$$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax\text{(first row of x)} \\ softmax\text{(second row of x)} \\ \vdots \\ softmax\text{(last row of x)} \end{pmatrix}$$
```python
# GRADED FUNCTION: softmax

def softmax(x):
    """
    Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp/x_sum
    ### END CODE HERE ###
    return s

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
# softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
#  [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]
```

2. Vectorization
Vectorized computation is both more concise and more efficient; see the timing sketch below.
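As a quick illustration (a minimal sketch, not part of the graded code), the same dot product computed with a plain Python loop and with np.dot:

```python
import time
import numpy as np

x1 = np.random.rand(1000000)
x2 = np.random.rand(1000000)

# Classic (non-vectorized) dot product
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("for-loop dot = {:.4f}, time = {:.2f} ms".format(dot, 1000 * (toc - tic)))

# Vectorized dot product
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("np.dot dot   = {:.4f}, time = {:.2f} ms".format(dot, 1000 * (toc - tic)))
```

On typical hardware the vectorized version is orders of magnitude faster.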
2.1 L1 / L2 loss functions
$$L_1(\hat{y}, y) = \sum_{i=0}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|$$
```python
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(abs(yhat-y))
    ### END CODE HERE ###
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))   # L1 = 1.1
```

$$L_2(\hat{y}, y) = \sum_{i=0}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$$
One convenient way to implement L2 is np.dot, since the dot product of a vector with itself is the sum of its squares:

```python
import numpy as np

a = np.array([1, 2, 3])
np.dot(a, a)   # 14
```
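A sketch of the corresponding L2 function, using the np.dot hint above:

```python
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    diff = y - yhat
    loss = np.dot(diff, diff)   # sum of squared differences
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat, y)))   # L2 = 0.43
```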
Programming Exercise 2. Cat 🐱 image recognition
Use a neural-network mindset to build a model that recognizes cats.
1. Import packages
```python
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline
```

2. Data preview
- Figure out the dimensions of the data
- Reshape the data
- Standardize the data

There is a training set with labels: y = 1 means cat, y = 0 means non-cat.
There is a test set, also labeled.
Each image has 3 (RGB) channels.
- Load the data
- Preview an image
- Check the data dimensions
- Flatten each sample image matrix
- The pixel values range from 0 to 255, so standardize the data (a sketch of these preprocessing steps follows below)
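A sketch of these preprocessing steps, assuming load_dataset returns the raw image arrays, the label vectors, and the class names (variable names follow the notebook's convention):

```python
# Load the data (cat / non-cat); load_dataset comes from lr_utils
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

# Dimensions
m_train = train_set_x_orig.shape[0]          # number of training examples
m_test = test_set_x_orig.shape[0]            # number of test examples
num_px = train_set_x_orig.shape[1]           # each image is (num_px, num_px, 3)

# Flatten each (num_px, num_px, 3) image into a (num_px*num_px*3, 1) column
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

# Standardize: pixel values lie in [0, 255]
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
```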
3. General structure of the algorithm
Build a logistic regression model with a neural-network mindset; the per-example equations are written out below.
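Concretely, for one example $x^{(i)}$, logistic regression computes:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1 - y^{(i)})\log(1 - a^{(i)})$$

The cost is the average loss over all $m$ training examples:

$$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)})$$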
4. Building the algorithm
- Define the model structure (e.g., the number of input features)
- Initialize the model parameters
- Loop: compute the current loss (forward propagation), compute the current gradients (backward propagation), and update the parameters (gradient descent)
4.1 Helper functions
- The sigmoid function (the numpy version from Section 1.1)
4.2 Initializing parameters
For logistic regression the parameters can all be initialized to 0 (this does not work for a neural network).
```python
# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    return w, b
```

4.3 Forward and backward propagation
Forward propagation:
- Given the features $X$
- Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m-1)}, a^{(m)})$
- Compute the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$

The gradient equations:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$

$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$
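The optimize() function in the next subsection calls a propagate() helper that is not reproduced in this post; a minimal sketch consistent with the equations above (it assumes the numpy sigmoid from Section 1.1):

```python
def propagate(w, b, X, Y):
    """
    Compute the cost and its gradients for the current parameters.

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector of shape (1, number of examples)

    Returns:
    grads -- dictionary containing the gradients dw and db
    cost -- negative log-likelihood cost for logistic regression
    """
    m = X.shape[1]

    # Forward propagation: activations and cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: gradients
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)

    cost = np.squeeze(cost)
    grads = {"dw": dw, "db": db}
    return grads, cost
```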
4.4 Updating parameters with gradient descent
```python
# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}

    return params, grads, costs

# w, b, X, Y here are the small test arrays used earlier in the notebook (not the cat data)
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
```

```
w = [[0.1124579 ]
 [0.23106775]]
b = [1.55930492]
dw = [[0.90158428]
 [1.76250842]]
db = [0.43046207]
```

- The learned parameters can then be used to make predictions.
Compute the predicted values $\hat{Y} = A = \sigma(w^T X + b)$.
Convert each prediction into a class label: an entry <= 0.5 is labeled 0, otherwise 1. A sketch of this predict() helper follows.
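A minimal sketch of such a predict() helper (it is called by model() in the next subsection):

```python
def predict(w, b, X):
    """
    Predict labels (0 or 1) using learned logistic regression parameters (w, b).

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) of shape (1, m) containing the predictions
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Probability that each example is a cat
    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        # Threshold the probability at 0.5
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0

    assert(Y_prediction.shape == (1, m))
    return Y_prediction
```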
4.5 Combining everything into a model
```python
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
```

```
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
```

- The model performs very well on the training set but only moderately on the test set: it is overfitting.
Change index to inspect the prediction and the true label of individual test-set examples.
- Plot the cost function and the gradients (see the plotting sketch below)
- Increase the number of training iterations to 3000 (2000 above)

With 3000 iterations, training accuracy rises but test accuracy drops: that is overfitting.
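A minimal sketch of plotting the learning curve from the costs list returned by model():

```python
# Plot the costs recorded every 100 iterations during training
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
```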
4.6 Analysis
- Comparison of different learning rates (a sketch of the comparison code follows below)
- If the learning rate is too large, the cost may oscillate and fail to converge (0.01 in this example is not too bad; it converges in the end)
- A low cost does not necessarily mean a good model; check for overfitting (very good on the training set, poor on the test set)
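A sketch of the learning-rate comparison, assuming the preprocessed train_set_x / test_set_x arrays from Section 2:

```python
# Train the same model with several learning rates and compare the learning curves
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for lr in learning_rates:
    print("learning rate is: " + str(lr))
    models[str(lr)] = model(train_set_x, train_set_y, test_set_x, test_set_y,
                            num_iterations=1500, learning_rate=lr, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')

for lr in learning_rates:
    plt.plot(np.squeeze(models[str(lr)]["costs"]), label=str(models[str(lr)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')
plt.legend(loc='upper center', shadow=True)
plt.show()
```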
4.7 Testing the model with your own image
```python
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "cat1.jpg"   # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
image = Image.open(fname)
my_image = np.array(image.resize((num_px, num_px))).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" +
      classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
```
5. Summary
- Preparing the data matters: check the dimensions and standardize the values
- Write each building block as its own function: initialization, forward/backward propagation, gradient-descent parameter updates
- Combine them into a model
- Tune hyperparameters such as the learning rate