Morvan (莫烦) TensorFlow Tutorial Notes (Parts 15~22)

Published: 2023/12/15


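15. Convolutional Neural Networks

CNNs give outstanding results on images and language.

A neural network is a cascade of layers, each containing many neurons.

Convolution: instead of processing every pixel on its own, the network processes one small region at a time. This preserves the continuity of the image information, so the network sees a picture rather than isolated points, and it deepens the network's understanding of the image. A convolutional network has a bank of filters that repeatedly collect information from small patches of pixels. The collected information is first organized into edge information; the edges are then summarized into higher-level structure such as partial contours, and finally into features of the complete image. These features are fed into fully connected layers, which produce the classification result.

In detail:

An image of a cat has a length, a width and a height (the height carries the color information: 1 for grayscale, 3 for color).

Convolution:

After a convolution the image becomes deeper (greater height) but smaller in length and width. Repeating the convolution several times yields deep features.

1) A 256*256 input (the RGB channels are the image depth).

2) Convolutions repeatedly extract features, compressing the length and width and increasing the depth, i.e. capturing more deep information.

3) Classification.

Pooling:

Improves robustness.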
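To make the convolution and pooling steps concrete, here is a minimal pure-Python sketch (no TensorFlow). It assumes stride 1 with no padding for the convolution and non-overlapping 2x2 windows for the pooling; the names `conv2d_valid`, `max_pool_2x2` and the hand-written `edge_kernel` are illustrative only and are not part of the tutorial code.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-written vertical-edge detector (hypothetical, for illustration only).
edge_kernel = [[-1, 1], [-1, 1]]

features = conv2d_valid(image, edge_kernel)  # 4x4 feature map, non-zero only at the edge
pooled = max_pool_2x2(features)              # 2x2 map: the edge response survives pooling
```

The feature map responds only where the patch contains the edge, and pooling halves the width and height while keeping that response, which is one way to read the robustness claim above.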

Overall structure:

TensorFlow implementation

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# TensorFlow 1.x API (placeholders and sessions)
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)


def compute_accuracy(v_xs, v_ys):
    global prediction
    # predicted probability of each digit for every sample
    y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
    # a sample is correct when the arg-max of the prediction matches that of the label
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    # tf.cast(x, dtype) converts x to dtype; the mean of the 0/1 values is the accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    # strides = [1, x_movement, y_movement, 1]; stride 1 in both x and y
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pooling_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


# define placeholders for the network inputs
keep_prob = tf.placeholder(tf.float32)
xs = tf.placeholder(tf.float32, [None, 784])
ys = tf.placeholder(tf.float32, [None, 10])
# -1: the number of images is not fixed; 1: grayscale, so one channel
# reshape xs to [n_samples, 28, 28, 1]
x_image = tf.reshape(xs, [-1, 28, 28, 1])

# conv1 layer
# patch/kernel = [5, 5], in size = 1 (image depth), out size = 32 (number of kernels)
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # output size = 28*28*32
h_pool1 = max_pooling_2x2(h_conv1)                        # output size = 14*14*32

# conv2 layer
W_conv2 = weight_variable([5, 5, 32, 64])  # patch 5x5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # output size = 14*14*64
h_pool2 = max_pooling_2x2(h_conv2)                        # output size = 7*7*64

# fc1 layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
# [n_samples, 7, 7, 64] -> [n_samples, 7*7*64]
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# fc2 layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# the error between prediction and real data
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))
# Adam with a small learning rate; Adadelta at 1e-4 learns far too slowly
# to reach the accuracies listed below
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        if i % 50 == 0:
            print(compute_accuracy(mnist.test.images, mnist.test.labels))

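Result:

0.103 0.7985 0.8934 0.9172 0.9296 0.9409 0.9411 0.951 0.9495 0.9536 0.9611 0.9586 0.9643

Stochastic gradient descent training:

Training on a small random part of the data is called stochastic training, or more precisely here, stochastic gradient descent. Ideally we would use all of the data at every training step, because that gives better results, but it is clearly very expensive. Instead, each step uses a different subset of the data, which reduces the computational cost while still learning the overall characteristics of the data set.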
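As a small illustration of the idea, the sketch below fits a line with mini-batch gradient descent in plain Python. The data (`y = 2x + 1`, noise-free), the batch size and the learning rate are assumptions chosen for the example; they are not taken from the tutorial.

```python
import random

random.seed(0)

# Synthetic, noise-free data from y = 2x + 1 (hypothetical example).
xs = [i / 100 for i in range(200)]
ys = [2 * x + 1 for x in xs]
data = list(zip(xs, ys))

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 20

for step in range(2000):
    # Each step sees only a random subset of the data, not the full set.
    batch = random.sample(data, batch_size)
    # Gradients of the mean squared error on this batch.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```

Each update costs 20 samples instead of 200, yet over many steps the parameters still approach the values that fit the whole data set.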

16. Saver: Saving and Restoring

TensorFlow can currently save only the Variables, not the network structure, so the structure has to be defined again and the saved Variables are then loaded back into it.

import tensorflow as tf
import numpy as np

# # save to file
# W = tf.Variable([[1, 2, 3], [3, 4, 5]], dtype=tf.float32, name='weight')  # 2 rows, 3 columns
# b = tf.Variable([[1, 2, 3]], dtype=tf.float32, name='biases')             # 1 row, 3 columns
#
# init = tf.global_variables_initializer()
#
# # the saver stores the variables
# saver = tf.train.Saver()
#
# with tf.Session() as sess:
#     sess.run(init)
#     # save everything in sess; the returned path is kept in save_path
#     save_path = saver.save(sess, "my_net/save_net.ckpt")
#     print("Save to path:", save_path)

# restore variables
# this is only an empty frame (same shape, dtype and name);
# the values saved above are restored into it
W = tf.Variable(np.arange(6).reshape((2, 3)), dtype=tf.float32, name="weight")
b = tf.Variable(np.arange(3).reshape((1, 3)), dtype=tf.float32, name="biases")

# no need for an init step
saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "my_net/save_net.ckpt")
    print("weight:", sess.run(W))
    print("biases:", sess.run(b))

Result:

weight: [[ 1. 2. 3.]
 [ 3. 4. 5.]]
biases: [[ 1. 2. 3.]]

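17. RNN

The order of the data matters for prediction.

With sequence data, Result0 is predicted from Data0. If the data are ordered and the network can pick up the relationships between the items, the results improve considerably.

How can the network learn those relationships? By remembering what happened before.

After processing Data0, the network stores its analysis in memory. When it analyzes Data1 it produces a new memory, but on its own this is unrelated to the first one, so the memory of Data0 is recalled as well. The network accumulates all previous memories and keeps accumulating as the analysis continues.

Mathematical view:

Each time the RNN runs, it produces a state S(t) describing its current analysis.

When it analyzes X(t+1), it produces S(t+1); Y(t+1) is created jointly by S(t) and S(t+1).

Forms of RNN:

RNNs come in many variants, which is why they keep getting more powerful.

How it works:

RNNs are effective on ordered data and are used to predict sequential data.

RNN:

When predicting ordered data, x1 is used to predict y1, and the memory of this step is kept in the cell. The same cell is then used to predict y2 from the input x2: it first recalls the stored memory, combines it with the new input x2, and then outputs y2. So y2 reflects not only x2 but also the memory of x1, i.e. an ordered summary of x1 and x2.

The overall process:

Every w is the same w. Each pass through the same cell keeps the memory of the earlier inputs and adds the new input to be predicted, so every prediction contains all previous memories plus the current input.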
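The recurrence described above can be sketched with a scalar RNN in plain Python. This is an assumption-laden toy, not the tutorial's model: the update rule `state = tanh(w_x * x + w_s * state)` and the weight names `w_x`, `w_s`, `w_y` are chosen here only for illustration.

```python
import math


def rnn_forward(inputs, w_x, w_s, w_y, s0=0.0):
    """Scalar RNN: the same three weights are reused at every time step."""
    state = s0
    states, outputs = [], []
    for x in inputs:
        # New state = summary of the current input plus the remembered state.
        state = math.tanh(w_x * x + w_s * state)
        states.append(state)
        outputs.append(w_y * state)
    return states, outputs


# Two sequences that differ only in their first element.
states_a, outputs_a = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_s=0.9, w_y=1.0)
states_b, outputs_b = rnn_forward([0.0, 0.0, 0.0], w_x=0.5, w_s=0.9, w_y=1.0)
```

Although the last two inputs are identical, the final states differ, because the first input is still carried in the state of sequence `a`.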

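With a plain RNN, if the sequence to predict is very long, backpropagation suffers from vanishing and exploding gradients.

The LSTM RNN was proposed to solve this problem.

Long Short-Term Memory (LSTM) RNN

An RNN learns from ordered data and builds up a memory of earlier events, but the plain form of RNN is somewhat "forgetful".

Take the keyword "braised pork ribs" (紅燒排骨) as an example of why a plain RNN remembers distant information poorly:

1) The keyword "braised pork ribs" has to pass through many steps before it reaches the output, where the error is computed.

2) When the error is propagated backwards, it is multiplied by a coefficient w at every step.

If w<1, the error that reaches the early steps becomes extremely small: this is the vanishing gradient.

If w>1, the error that reaches the early steps becomes extremely large, beyond what can be handled: this is the exploding gradient.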
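The arithmetic behind this is easy to check. The sketch below simply multiplies an error by the same coefficient `w` at each of 50 steps; the values 0.9, 1.1 and 50 are arbitrary choices for the illustration.

```python
def backpropagated_error(error, w, steps):
    """Multiply the error by the same coefficient w once per time step."""
    for _ in range(steps):
        error *= w
    return error


small = backpropagated_error(1.0, 0.9, 50)  # w < 1: shrinks towards 0
large = backpropagated_error(1.0, 1.1, 50)  # w > 1: grows rapidly
```

Even coefficients this close to 1 change the error by orders of magnitude over 50 steps, which is why long sequences are hard for a plain RNN.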
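LSTM's improvement: it adds three controllers (gates), namely input control, output control and forget control.

In detail:

Input: decides whether a side plot should be added to the main plot. If a side plot is important, it is written into the main plot in proportion to its importance and then analyzed.

Forget: if a side plot changes our understanding of the main plot, the forget control discards part of the earlier plot and replaces it, proportionally, with the new one.

So the update of the main plot depends on the input control and the forget control.

Output: based on the current main plot and side plot, decides what to output.

With these control mechanisms, the LSTM slows down the decay of memory.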
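A scalar version of one LSTM step may help map the three controllers onto code. This is a sketch of the standard LSTM equations under simplifying assumptions (one unit, scalar weights); the parameter dictionary `p` and its key names are hypothetical and do not correspond to TensorFlow's internal variables.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps a gate name to (w_x, w_h, bias)."""
    def gate(name, squash):
        w_x, w_h, bias = p[name]
        return squash(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the main plot to keep
    i = gate("input", sigmoid)         # how much of the side plot to write
    o = gate("output", sigmoid)        # how much of the memory to expose
    g = gate("candidate", math.tanh)   # the side plot itself
    c = f * c_prev + i * g             # updated main plot (cell state)
    h = o * math.tanh(c)               # output of this step
    return h, c


# Forget gate almost fully open, input gate almost closed: the memory is preserved.
keep_memory = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=keep_memory)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than being squashed at every step, a forget gate near 1 lets the memory pass through almost unchanged.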
  
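TensorFlow

For an image, what does "order" mean?

It means starting from the first row of pixels: the first row is processed first, then the following rows, down to the last row.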
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# hyperparameters
lr = 0.1                 # learning rate
training_iters = 100000  # number of training samples to process
batch_size = 128

n_inputs = 28         # MNIST data input (28*28): one row of 28 pixels per step
n_steps = 28          # 28 rows in total, i.e. 28 steps of 28 pixels
n_hidden_unis = 128   # neurons in the hidden layer
n_classes = 10        # 10 classes

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_classes])

# Define weights
# weights: input weights + output weights
# the data pass through a hidden layer before entering the RNN cell,
# and the cell's result goes through an output hidden layer;
# the weights and biases of these two layers are defined below
weights = {
    # (28, 128)
    'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_unis])),
    # (128, 10)
    'out': tf.Variable(tf.random_normal([n_hidden_unis, n_classes]))
}
biases = {
    # (128,)
    'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_unis, ])),
    # (10,)
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes, ]))
}


def RNN(X, weights, biases):
    # hidden layer for input to cell
    # X (128 batch, 28 steps, 28 inputs) -> (128*28, 28 inputs) for the matrix product
    X = tf.reshape(X, [-1, n_inputs])
    # X_in: (128 batch * 28 steps, 128 hidden)
    X_in = tf.matmul(X, weights['in']) + biases['in']
    # back to 3 dimensions: (128 batch, 28 steps, 128 hidden)
    X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_unis])

    # cell
    # n_hidden_unis: number of units; forget_bias=1.0: initially nothing is forgotten
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden_unis, forget_bias=1.0, state_is_tuple=True)
    # a plain RNN keeps one state per step; an LSTM keeps two:
    # lstm cell is divided into two parts (c_state, m_state),
    # the main-line state (c_state) and the side-line state (m_state), held in a tuple;
    # state_is_tuple=True makes the state a tuple
    # the initial state is all zeros; memory accumulates step by step
    _init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    # outputs holds the result of every step
    # time_major: whether the time axis is the first dimension;
    # here the 28 steps are the second dimension, so time_major=False
    outputs, states = tf.nn.dynamic_rnn(lstm_cell, X_in, initial_state=_init_state, time_major=False)

    # hidden layer for outputs and final results
    results = tf.matmul(states[1], weights['out']) + biases['out']
    return results


pred = RNN(x, weights, biases)

# the error between prediction and real data
# labels: the target output; logits: the actual output of the network
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
train_op = tf.train.AdamOptimizer(lr).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    step = 0
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        batch_xs = batch_xs.reshape([batch_size, n_steps, n_inputs])
        sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys})
        if step % 20 == 0:
            print(sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys}))
        step += 1
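18. Autoencoder

Unsupervised learning with neural networks.

The network receives an image → pixelates (compresses) it → restores it.

In detail:

The original image is compressed, and the stored feature information is then decompressed to recover the original image.

Learning directly from high-resolution images is hard work for the network. Feature extraction therefore picks out the main information that is sufficient to reconstruct the original image, and the network learns from this reduced information, which is much easier.

Input: the white X

Output: the black X

The error between the two is computed and propagated backwards, gradually improving the accuracy of the autoencoder. The hidden layer in the middle holds the neurons that extract the most important features of the original data.

Why this is unsupervised learning: the process uses only X and never its labels.

In practice usually only the first half is used.

The first half has already learned the essence of the data, so we only need to build a neural network that learns from this essence. This achieves the same results as an ordinary neural network and is very efficient.

Encoder: the first half

Decoder: the second half

An autoencoder is similar to PCA: it can extract features and reduce their dimensionality, and it can outperform PCA.

Code 1: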
I had typed in the wrong gradient descent method (optimizer), so the cost stayed very large, and it took a long time to find the problem.

Result:
Epoch: 0001 cost= 0.089918643
Epoch: 0002 cost= 0.082782879
Epoch: 0003 cost= 0.073581800
Epoch: 0004 cost= 0.069128580
Epoch: 0005 cost= 0.066503450
Epoch: 0006 cost= 0.066125013
Epoch: 0007 cost= 0.062507540
Epoch: 0008 cost= 0.059653457
Epoch: 0009 cost= 0.060695820
Epoch: 0010 cost= 0.059536964
Optimization Finished

Code 2:
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
n_input = 784
X = tf.placeholder("float", [None, n_input])

# compression: down to 2 elements
n_hidden_1 = 128
n_hidden_2 = 64
n_hidden_3 = 10
n_hidden_4 = 2

weights = {
    'encoder_h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1],)),
    'encoder_h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2],)),
    'encoder_h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3],)),
    'encoder_h4': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_4],)),
    'decoder_h1': tf.Variable(tf.truncated_normal([n_hidden_4, n_hidden_3],)),
    'decoder_h2': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_2],)),
    'decoder_h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_1],)),
    'decoder_h4': tf.Variable(tf.truncated_normal([n_hidden_1, n_input],)),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'encoder_b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'encoder_b4': tf.Variable(tf.random_normal([n_hidden_4])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_3])),
    'decoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b3': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b4': tf.Variable(tf.random_normal([n_input])),
}


def encoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['encoder_h3']),
                                   biases['encoder_b3']))
    # the last encoder layer uses no activation function,
    # so the encoded output is not limited to a fixed range
    layer_4 = tf.add(tf.matmul(layer_3, weights['encoder_h4']),
                     biases['encoder_b4'])
    return layer_4


def decoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['decoder_h3']),
                                   biases['decoder_b3']))
    layer_4 = tf.nn.sigmoid(tf.add(tf.matmul(layer_3, weights['decoder_h4']),
                                   biases['decoder_b4']))
    return layer_4


encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

y_pred = decoder_op
y_true = X

cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    total_batch = int(mnist.train.num_examples / batch_size)
    for epoch in range(training_epochs):
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # max(x) = 1, min(x) = 0
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")

    # show the encoded result (before decoding)
    encoder_result = sess.run(encoder_op, feed_dict={X: mnist.test.images})
    plt.scatter(encoder_result[:, 0], encoder_result[:, 1], c=mnist.test.labels)
    # plt.colorbar()
    plt.show()

Result:
  
19. tf.name_scope / tf.variable_scope

1) tf.name_scope

# the __future__ module imports features of the next version into the current one,
# so they can be tried out in the current version
from __future__ import print_function
import tensorflow as tf

tf.set_random_seed(1)

with tf.name_scope("a_name_scope"):  # the name_scope is called "a_name_scope"
    initializer = tf.constant_initializer(value=1)
    # two ways to create a variable
    # tf.get_variable needs an initializer
    # name_scope has no effect on tf.get_variable
    var1 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32, initializer=initializer)
    var2 = tf.Variable(name='var2', initial_value=[2], dtype=tf.float32)
    var21 = tf.Variable(name='var2', initial_value=[2.1], dtype=tf.float32)
    var22 = tf.Variable(name='var2', initial_value=[2.2], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # print the name and value of each variable
    print(var1.name)
    print(sess.run(var1))
    print(var2.name)
    print(sess.run(var2))
    print(var21.name)
    print(sess.run(var21))
    print(var22.name)
    print(sess.run(var22))

Result:
var1:0
[ 1.]
a_name_scope/var2:0
[ 2.]
a_name_scope/var2_1:0
[ 2.0999999]
a_name_scope/var2_2:0
[ 2.20000005]

2) tf.variable_scope

from __future__ import print_function
import tensorflow as tf

tf.set_random_seed(1)

with tf.variable_scope("a_variable_scope") as scope:
    initializer = tf.constant_initializer(value=3)
    var3 = tf.get_variable(name="var3", shape=[1], dtype=tf.float32, initializer=initializer)
    var4 = tf.Variable(name='var4', initial_value=[4], dtype=tf.float32)
    # tf.Variable cannot reuse an existing variable; it always creates a new one:
    # a_variable_scope/var4:0
    # [ 4.]
    # a_variable_scope/var4_1:0
    # [ 4.]
    # var4_reuse = tf.Variable(name='var4', initial_value=[4], dtype=tf.float32)

    # to reuse var3 with tf.get_variable, first declare that variables are to be reused;
    # after scope.reuse_variables(), get_variable looks up the existing variable,
    # so var3 and var3_reuse are the same variable
    scope.reuse_variables()
    var3_reuse = tf.get_variable(name='var3')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # print the name and value of each variable
    print(var3.name)
    print(sess.run(var3))
    print(var4.name)
    print(sess.run(var4))
    print(var3_reuse.name)
    print(sess.run(var3_reuse))

Result:
a_variable_scope/var3:0
[ 3.]
a_variable_scope/var4:0
[ 4.]
a_variable_scope/var3:0
[ 3.]

Why use tf.variable_scope to declare reuse?

It is needed frequently in RNNs.
  
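20. Batch Normalization

Normalizing scattered data makes it easier for the model to learn.

The distribution of the data affects how a neural network learns:

1) Input X1 = 1 with weight W = 0.1: the second layer receives W*X1 = 0.1 * 1 = 0.1

2) Input X2 = 20 with weight W = 0.1: the second layer receives W*X2 = 0.1 * 20 = 2

3) Add a tanh activation: tanh(0.1) ≈ 0.1 and tanh(2) ≈ 0.96. The second value is already close to saturation: however much x grows from here, the output of tanh hardly changes, so from the very beginning the network is insensitive to large x. The inputs therefore need preprocessing that normalizes their range and concentrates them in the sensitive region of the activation function.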
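The saturation can be checked numerically with the standard library. The helper name `tanh_and_slope` is made up for this sketch; the slope uses the identity d/dz tanh(z) = 1 - tanh(z)^2.

```python
import math


def tanh_and_slope(z):
    """Return tanh(z) and its derivative 1 - tanh(z)**2."""
    a = math.tanh(z)
    return a, 1 - a * a


small_out, small_slope = tanh_and_slope(0.1 * 1)    # input 1, weight 0.1
large_out, large_slope = tanh_and_slope(0.1 * 20)   # input 20, weight 0.1
huge_out, huge_slope = tanh_and_slope(0.1 * 100)    # input 100, weight 0.1
```

Going from an input of 20 to an input of 100 moves the output by less than 0.04, and the slope (which scales the gradient during training) is practically zero in that region.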
  
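This does not happen only at the input layer; it happens in the hidden layers too. Can the inputs of the hidden layers be normalized as well?

Yes, and this is what batch normalization does.

The data are split into batches for stochastic gradient descent, and during the forward pass of each batch a normalization is applied at every layer.

Forward pass of x through the network:

x -> fully connected layer -> activation function -> fully connected layer

With Batch Normalization added:

x -> fully connected layer -> Batch Normalization -> activation function -> fully connected layer

The values fed into the activation function matter a great deal for the result, so the data should be normalized into the sensitive region of the activation function in order to propagate forward effectively.

The figure below shows the distribution of the data without and with BN:

The distribution after activation:

Without BN, most activated values fall in the saturated region, i.e. they are mostly -1 or 1. With BN, the activated values are spread roughly evenly, which is more useful for the network's learning.

Batch Normalization consists of a forward step and a reverse step.

Reverse step: the normalized data are scaled and shifted. The network learns the scale parameter γ and the shift parameter β by itself, so it can find out whether the normalization actually helps; if it does not, these two parameters can cancel part of what BN did.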
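The normalize-then-scale-and-shift computation can be written out for a single feature in plain Python. This is a sketch of the usual formula `gamma * (x - mean) / sqrt(var + epsilon) + beta`; the helper names and the sample values are assumptions for illustration, with `epsilon = 1e-3` as an arbitrary small constant.

```python
import math


def batch_stats(values):
    """Mean and (population) variance over the batch dimension."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def batch_norm(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    mean, var = batch_stats(values)
    return [gamma * (v - mean) / math.sqrt(var + epsilon) + beta for v in values]


pre_activations = [0.1, 2.0, 5.0, 9.0]   # one feature across a batch of four
normalized = batch_norm(pre_activations)

# With gamma = sqrt(var + epsilon) and beta = mean, the normalization is undone.
mean, var = batch_stats(pre_activations)
undone = batch_norm(pre_activations, gamma=math.sqrt(var + 1e-3), beta=mean)
```

The last two lines show why γ and β give the network a way out: with suitable values they reproduce the original pre-activations, so BN never forces the normalized distribution on a layer that does not benefit from it.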
  
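Distribution of the data at the end of training:

With BN: the values of every layer are passed on within an effective range.

Without BN: the network loses its sensitivity to the data and cannot pass on the information of each layer effectively.

Detailed explanation:

The upper row shows the process without BN:

The lower row shows the process with BN applied to every layer:

TensorFlow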
"""
Build two networks.
1. Without batch normalization
2. With batch normalization

Run tests on these two networks.
"""

# 23 Batch Normalization

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

ACTIVATION = tf.nn.tanh
N_LAYERS = 7
N_HIDDEN_UNITS = 30


def fix_seed(seed=1):
    # reproducible
    np.random.seed(seed)
    tf.set_random_seed(seed)


def plot_his(inputs, inputs_norm):
    # plot histogram for the inputs of every layer
    for j, all_inputs in enumerate([inputs, inputs_norm]):
        for i, input in enumerate(all_inputs):
            plt.subplot(2, len(all_inputs), j*len(all_inputs)+(i+1))
            plt.cla()
            if i == 0:
                the_range = (-7, 10)
            else:
                the_range = (-1, 1)
            plt.hist(input.ravel(), bins=15, range=the_range, color='#FF5733')
            plt.yticks(())
            if j == 1:
                plt.xticks(the_range)
            else:
                plt.xticks(())
            ax = plt.gca()
            ax.spines['right'].set_color('none')
            ax.spines['top'].set_color('none')
        plt.title("%s normalizing" % ("Without" if j == 0 else "With"))
    plt.draw()
    plt.pause(0.01)


def built_net(xs, ys, norm):
    def add_layer(inputs, in_size, out_size, activation_function=None, norm=False):
        # weights and biases (bad initialization for this case)
        Weights = tf.Variable(tf.random_normal([in_size, out_size], mean=0., stddev=1.))
        biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)

        # fully connected product
        Wx_plus_b = tf.matmul(inputs, Weights) + biases

        # normalize fully connected product
        if norm:
            # Batch Normalize
            # first get the mean and variance of the whole batch, along the batch dimension
            # note: at test time fc_mean and fc_var must be fixed instead of calling
            # tf.nn.moments, because testing is not done batch by batch
            fc_mean, fc_var = tf.nn.moments(
                Wx_plus_b,
                axes=[0],   # the dimension you wanna normalize, here [0] for batch
                            # for image, you wanna do [0, 1, 2] for [batch, height, width] but not channel
            )
            scale = tf.Variable(tf.ones([out_size]))
            shift = tf.Variable(tf.zeros([out_size]))
            epsilon = 0.001

            # apply moving average for mean and var when train on batch
            ema = tf.train.ExponentialMovingAverage(decay=0.5)

            def mean_var_with_update():
                ema_apply_op = ema.apply([fc_mean, fc_var])
                with tf.control_dependencies([ema_apply_op]):
                    return tf.identity(fc_mean), tf.identity(fc_var)
            mean, var = mean_var_with_update()

            Wx_plus_b = tf.nn.batch_normalization(Wx_plus_b, mean, var, shift, scale, epsilon)
            # similar with this two steps:
            # Wx_plus_b = (Wx_plus_b - fc_mean) / tf.sqrt(fc_var + 0.001)
            # Wx_plus_b = Wx_plus_b * scale + shift

        # activation
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)

        return outputs

    fix_seed(1)

    # normalize the input layer as well
    if norm:
        # BN for the first input
        fc_mean, fc_var = tf.nn.moments(
            xs,
            axes=[0],
        )
        scale = tf.Variable(tf.ones([1]))
        shift = tf.Variable(tf.zeros([1]))
        epsilon = 0.001

        # apply moving average for mean and var when train on batch
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([fc_mean, fc_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(fc_mean), tf.identity(fc_var)
        mean, var = mean_var_with_update()
        xs = tf.nn.batch_normalization(xs, mean, var, shift, scale, epsilon)

    # record inputs for every layer
    layers_inputs = [xs]

    # build hidden layers
    for l_n in range(N_LAYERS):
        layer_input = layers_inputs[l_n]
        in_size = layers_inputs[l_n].get_shape()[1].value

        output = add_layer(
            layer_input,     # input
            in_size,         # input size
            N_HIDDEN_UNITS,  # output size
            ACTIVATION,      # activation function
            norm,            # normalize before activation
        )
        layers_inputs.append(output)    # add output for next run

    # build output layer
    prediction = add_layer(layers_inputs[-1], 30, 1, activation_function=None)

    cost = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction), reduction_indices=[1]))
    train_op = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
    return [train_op, cost, layers_inputs]


# make up data
fix_seed(1)
x_data = np.linspace(-7, 10, 2500)[:, np.newaxis]
np.random.shuffle(x_data)
noise = np.random.normal(0, 8, x_data.shape)
y_data = np.square(x_data) - 5 + noise

# plot input data
plt.scatter(x_data, y_data)
plt.show()

xs = tf.placeholder(tf.float32, [None, 1])  # [num_samples, num_features]
ys = tf.placeholder(tf.float32, [None, 1])

train_op, cost, layers_inputs = built_net(xs, ys, norm=False)   # without BN
train_op_norm, cost_norm, layers_inputs_norm = built_net(xs, ys, norm=True)  # with BN

sess = tf.Session()
if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
    init = tf.initialize_all_variables()
else:
    init = tf.global_variables_initializer()
sess.run(init)

# record cost
cost_his = []
cost_his_norm = []
record_step = 5

plt.ion()
plt.figure(figsize=(7, 3))
for i in range(250):
    if i % 50 == 0:
        # plot histogram
        all_inputs, all_inputs_norm = sess.run([layers_inputs, layers_inputs_norm], feed_dict={xs: x_data, ys: y_data})
        plot_his(all_inputs, all_inputs_norm)

    # train on batch
    sess.run([train_op, train_op_norm], feed_dict={xs: x_data[i*10:i*10+10], ys: y_data[i*10:i*10+10]})

    if i % record_step == 0:
        # record cost
        cost_his.append(sess.run(cost, feed_dict={xs: x_data, ys: y_data}))
        cost_his_norm.append(sess.run(cost_norm, feed_dict={xs: x_data, ys: y_data}))

plt.ioff()
plt.figure()
plt.plot(np.arange(len(cost_his))*record_step, np.array(cost_his), label='no BN')     # no norm
plt.plot(np.arange(len(cost_his))*record_step, np.array(cost_his_norm), label='BN')   # norm
plt.legend()
plt.show()
  
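The distributions, plotted every 50 steps:

Without BN: the first layer still shows a spread, but the later layers collapse to almost all zeros.
Input range: (-7, 10)

With BN: the values stay well spread over the region above 0.
Activation function used: relu

Error curves:

Without BN there is no error curve at all: by the end of training none of the neurons responds any more, i.e. with relu they have all stopped working.

tanh

Data distribution:

Without BN: the values are mostly saturated.

With BN: most values lie in the unsaturated region, i.e. the active range.

Error curves:

no BN: an error curve now appears.

BN: the error keeps decreasing and training works better.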
  
21. Visualizing Gradient Descent
import tensorflow as tf
import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D

LR = 0.1
# the model has two parameters
REAL_PARAMS = [1.2, 2.5]    # true parameters used to generate the data
INIT_PARAMS = [[5, 4],      # candidate initial values
               [5, 1],
               [2, 4.5]][2]

x = np.linspace(-1, 1, 200, dtype=np.float32)

# test 1
y_fun = lambda a, b: a * x + b      # generates the real data
tf_y_fun = lambda a, b: a * x + b   # TensorFlow fits the two parameters a and b

noise = np.random.rand(200) / 10
y = y_fun(*REAL_PARAMS) + noise     # uses REAL_PARAMS
# plt.scatter(x, y)
# plt.show()

a, b = [tf.Variable(initial_value=p, dtype=tf.float32) for p in INIT_PARAMS]
pred = tf_y_fun(a, b)
mse = tf.reduce_mean(tf.square(y - pred))
train_op = tf.train.GradientDescentOptimizer(LR).minimize(mse)

a_list, b_list, cost_list = [], [], []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(400):
        a_, b_, mse_ = sess.run([a, b, mse])
        # record parameters
        a_list.append(a_)
        b_list.append(b_)
        cost_list.append(mse_)
        # training
        result, _ = sess.run([pred, train_op])

# visualization codes:
print('a=', a_, 'b=', b_)
plt.figure(1)
plt.scatter(x, y, c='b')            # plot data
plt.plot(x, result, 'r-', lw=2)     # plot line fitting
# 3D cost figure
fig = plt.figure(2)
ax = Axes3D(fig)
a3D, b3D = np.meshgrid(np.linspace(-2, 7, 30), np.linspace(-2, 7, 30))  # parameter space
cost3D = np.array([np.mean(np.square(y_fun(a_, b_) - y))
                   for a_, b_ in zip(a3D.flatten(), b3D.flatten())]).reshape(a3D.shape)
ax.plot_surface(a3D, b3D, cost3D, rstride=1, cstride=1, cmap=plt.get_cmap('rainbow'), alpha=0.5)
ax.scatter(a_list[0], b_list[0], zs=cost_list[0], s=300, c='r')  # initial parameter place
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.plot(a_list, b_list, zs=cost_list, zdir='z', c='r', lw=3)     # plot 3D gradient descent
plt.show()
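Result:
a= 1.19776 b= 2.54675

The parameters descend from the initial point towards lower error.

If the step is too large, the path fluctuates too much and oscillates.

In that case the model also cannot fit the original data well.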
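The effect of the step size can be reproduced on a one-parameter problem. The sketch below minimizes f(a) = a**2 (gradient 2*a) in plain Python; the function, the start value and the three learning rates are assumptions chosen for the illustration and are unrelated to the `LR` used above.

```python
def gradient_descent(learning_rate, start=5.0, steps=20):
    """Minimize f(a) = a**2, whose gradient is 2*a; return every visited point."""
    a = start
    path = [a]
    for _ in range(steps):
        a -= learning_rate * 2 * a
        path.append(a)
    return path


smooth = gradient_descent(0.1)    # a is multiplied by 0.8 each step
zigzag = gradient_descent(0.9)    # multiplied by -0.8: overshoots, sign flips every step
diverge = gradient_descent(1.1)   # multiplied by -1.2: the oscillation grows
```

With a small rate the path slides straight towards the minimum; with a large rate every step jumps over it, and past a threshold the jumps grow instead of shrinking.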
  
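Tuning parameters with TensorFlow:

Local optima: the result depends strongly on the initial values, and the parameters may slide into a local minimum.

Initial point 1:
LR = 0.1
# the model has two parameters
REAL_PARAMS = [1.2, 2.5]    # true parameters used to generate the data
INIT_PARAMS = [[5, 4],      # candidate initial values
               [5, 1],
               [2, 4.5]][2]

Initial point 2: change the initial values
# the model has two parameters
REAL_PARAMS = [1.2, 2.5]    # true parameters used to generate the data
INIT_PARAMS = [[5, 4],      # candidate initial values
               [5, 1],
               [2, 4.5]][1]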
  
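22. Transfer Learning

Stand on the shoulders of giants: borrow an existing model.

The earlier parameters are no longer trained, i.e. the model's ability to understand its input is frozen, and the output layer is replaced with one that provides the required function.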
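A toy version of this idea in plain Python: a fixed feature function stands in for the pretrained layers, and only a small linear head is trained on a new regression task. Everything here (`frozen_features`, the target function, the learning rate) is a hypothetical stand-in chosen for illustration, not VGG.

```python
def frozen_features(x):
    """Stand-in for pretrained layers: fixed, never updated (hypothetical)."""
    return [x, x * x]


# New task: regress y = 3*x^2 - x from a handful of samples.
samples = [(k / 10, 3 * (k / 10) ** 2 - k / 10) for k in range(-10, 11)]

head = [0.0, 0.0]        # the only trainable parameters (the new output layer)
learning_rate = 0.1

for _ in range(2000):
    grads = [0.0, 0.0]
    for x, y in samples:
        feats = frozen_features(x)
        error = sum(w * f for w, f in zip(head, feats)) - y
        # Gradient of the mean squared error with respect to the head only.
        for k, f in enumerate(feats):
            grads[k] += 2 * error * f / len(samples)
    head = [w - learning_rate * g for w, g in zip(head, grads)]
```

No gradient ever flows into `frozen_features`, so training cost is limited to the two head weights; this is the same division of labor as keeping the pretrained conv layers fixed and training only the new fully connected layers.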
  
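This saves computing resources.

TensorFlow implementation:

Take the 16-layer VGGNet, remove its classification part and attach a regression part instead.

The task involves cats and tigers.

The original task is classification; here it is turned into regression by making up some fake data, namely the body lengths of the cats and tigers.

We transfer an image classification CNN (VGG). This VGG has been trained on 1000 classes. We take the Conv layers at the front of the VGG and rebuild the fully connected layers at the back, so that the network does something completely unrelated to classification. We download the cat and tiger pictures from those 1000 classes and fabricate some length data for cats and tigers. In the end the transferred network estimates the length of a cat or tiger (a regressor).

Because we are no longer predicting a class, some body-length data were fabricated. Tigers are usually longer than cats, so the distribution looks roughly like the structure below (unit: cm).

Data download:

VGG16.npy

What changes for transfer learning:

To do transfer learning, the TensorFlow VGG16 code was rewritten. All Conv and pooling layers are kept, all the fc layers at the back are removed and replaced by two trainable layers that output a single number, which represents the length of the cat or tiger.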
"""
This is a simple example of transfer learning using VGG.
Fine tune a CNN from a classifier to regressor.
Generate some fake data for describing cat and tiger length.

Fake length setting:
Cat - Normal distribution (40, 8)
Tiger - Normal distribution (100, 30)

The VGG model and parameters are adopted from:
https://github.com/machrisaa/tensorflow-vgg

Learn more, visit my tutorial site: [莫煩Python](https://morvanzhou.github.io)
"""

from urllib.request import urlretrieve
import os
import numpy as np
import tensorflow as tf
import skimage.io
import skimage.transform
import matplotlib.pyplot as plt


def download():     # download tiger and kittycat image
    categories = ['tiger', 'kittycat']
    for category in categories:
        os.makedirs('transfer_learning/data/%s' % category, exist_ok=True)
        with open('transfer_learning/model/imagenet_%s.txt' % category, 'r') as file:
            urls = file.readlines()
            n_urls = len(urls)
            for i, url in enumerate(urls):
                try:
                    urlretrieve(url.strip(), 'transfer_learning/data/%s/%s' % (category, url.strip().split('/')[-1]))
                    print('%s %i/%i' % (category, i, n_urls))
                except Exception:
                    print('%s %i/%i' % (category, i, n_urls), 'no image')


def load_img(path):
    img = skimage.io.imread(path)
    img = img / 255.0
    # print "Original Image Shape: ", img.shape
    # we crop image from center
    short_edge = min(img.shape[:2])
    yy = int((img.shape[0] - short_edge) / 2)
    xx = int((img.shape[1] - short_edge) / 2)
    crop_img = img[yy: yy + short_edge, xx: xx + short_edge]
    # resize to 224, 224
    resized_img = skimage.transform.resize(crop_img, (224, 224))[None, :, :, :]   # shape [1, 224, 224, 3]
    return resized_img


def load_data():
    imgs = {'tiger': [], 'kittycat': []}
    for k in imgs.keys():
        dir = 'transfer_learning/data/' + k
        for file in os.listdir(dir):
            if not file.lower().endswith('.jpg'):
                continue
            try:
                resized_img = load_img(os.path.join(dir, file))
            except OSError:
                continue
            imgs[k].append(resized_img)    # [1, height, width, depth] * n
            if len(imgs[k]) == 400:        # only use 400 imgs to reduce my memory load
                break
    # fake length data for tiger and cat
    tigers_y = np.maximum(20, np.random.randn(len(imgs['tiger']), 1) * 30 + 100)
    cat_y = np.maximum(10, np.random.randn(len(imgs['kittycat']), 1) * 8 + 40)
    return imgs['tiger'], imgs['kittycat'], tigers_y, cat_y


class Vgg16:
    vgg_mean = [103.939, 116.779, 123.68]

    def __init__(self, vgg16_npy_path=None, restore_from=None):
        # pre-trained parameters
        try:
            # allow_pickle=True is required by newer NumPy versions to load the pickled dict
            self.data_dict = np.load(vgg16_npy_path, encoding='latin1', allow_pickle=True).item()
        except FileNotFoundError:
            print('Please download VGG16 parameters at here https://mega.nz/#!YU1FWJrA!O1ywiCS2IiOlUCtCpI6HTJOMrneN-Qdv3ywQP5poecM')

        self.tfx = tf.placeholder(tf.float32, [None, 224, 224, 3])
        self.tfy = tf.placeholder(tf.float32, [None, 1])

        # Convert RGB to BGR
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=self.tfx * 255.0)
        bgr = tf.concat(axis=3, values=[
            blue - self.vgg_mean[0],
            green - self.vgg_mean[1],
            red - self.vgg_mean[2],
        ])

        # pre-trained VGG layers are fixed in fine-tune
        conv1_1 = self.conv_layer(bgr, "conv1_1")
        conv1_2 = self.conv_layer(conv1_1, "conv1_2")
        pool1 = self.max_pool(conv1_2, 'pool1')

        conv2_1 = self.conv_layer(pool1, "conv2_1")
        conv2_2 = self.conv_layer(conv2_1, "conv2_2")
        pool2 = self.max_pool(conv2_2, 'pool2')

        conv3_1 = self.conv_layer(pool2, "conv3_1")
        conv3_2 = self.conv_layer(conv3_1, "conv3_2")
        conv3_3 = self.conv_layer(conv3_2, "conv3_3")
        pool3 = self.max_pool(conv3_3, 'pool3')

        conv4_1 = self.conv_layer(pool3, "conv4_1")
        conv4_2 = self.conv_layer(conv4_1, "conv4_2")
        conv4_3 = self.conv_layer(conv4_2, "conv4_3")
        pool4 = self.max_pool(conv4_3, 'pool4')

        conv5_1 = self.conv_layer(pool4, "conv5_1")
        conv5_2 = self.conv_layer(conv5_1, "conv5_2")
        conv5_3 = self.conv_layer(conv5_2, "conv5_3")
        pool5 = self.max_pool(conv5_3, 'pool5')

        # detach original VGG fc layers and
        # reconstruct your own fc layers serve for your own purpose
        self.flatten = tf.reshape(pool5, [-1, 7*7*512])
        self.fc6 = tf.layers.dense(self.flatten, 256, tf.nn.relu, name='fc6')
        self.out = tf.layers.dense(self.fc6, 1, name='out')

        self.sess = tf.Session()
        if restore_from:
            saver = tf.train.Saver()
            saver.restore(self.sess, restore_from)
        else:   # training graph
            self.loss = tf.losses.mean_squared_error(labels=self.tfy, predictions=self.out)
            self.train_op = tf.train.RMSPropOptimizer(0.001).minimize(self.loss)
            self.sess.run(tf.global_variables_initializer())

    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def conv_layer(self, bottom, name):
        with tf.variable_scope(name):   # CNN's filter is constant, NOT Variable that can be trained
            conv = tf.nn.conv2d(bottom, self.data_dict[name][0], [1, 1, 1, 1], padding='SAME')
            lout = tf.nn.relu(tf.nn.bias_add(conv, self.data_dict[name][1]))
            return lout

    def train(self, x, y):
        loss, _ = self.sess.run([self.loss, self.train_op], {self.tfx: x, self.tfy: y})
        return loss

    def predict(self, paths):
        fig, axs = plt.subplots(1, 2)
        for i, path in enumerate(paths):
            x = load_img(path)
            length = self.sess.run(self.out, {self.tfx: x})
            axs[i].imshow(x[0])
            axs[i].set_title('Len: %.1f cm' % length)
            axs[i].set_xticks(())
            axs[i].set_yticks(())
        plt.show()

    def save(self, path='transfer_learning/model/transfer_learn'):
        saver = tf.train.Saver()
        saver.save(self.sess, path, write_meta_graph=False)


def train():
    tigers_x, cats_x, tigers_y, cats_y = load_data()

    # plot fake length distribution
    plt.hist(tigers_y, bins=20, label='Tigers')
    plt.hist(cats_y, bins=10, label='Cats')
    plt.legend()
    plt.xlabel('length')
    plt.show()

    xs = np.concatenate(tigers_x + cats_x, axis=0)
    ys = np.concatenate((tigers_y, cats_y), axis=0)

    vgg = Vgg16(vgg16_npy_path='transfer_learning/vgg16.npy')
    print('Net built')
    for i in range(100):
        b_idx = np.random.randint(0, len(xs), 6)
        train_loss = vgg.train(xs[b_idx], ys[b_idx])
        print(i, 'train loss: ', train_loss)

    vgg.save('transfer_learning/model/transfer_learn')      # save learned fc layers


def eval():
    vgg = Vgg16(vgg16_npy_path='transfer_learning/vgg16.npy',
                restore_from='transfer_learning/model/transfer_learn')
    vgg.predict(
        ['transfer_learning/data/kittycat/000129037.jpg', 'transfer_learning/data/tiger/391412.jpg'])


if __name__ == '__main__':
    download()
    train()
    eval()