Python 3: Implementing a Deep Belief Network (DBN) in TensorFlow for MNIST Handwritten Digit Recognition
For any program errors or technical questions, scan the QR code to add the author on WeChat (VX): 1755337994
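Recognizing handwritten digits with a DBN
A known problem with traditional multilayer perceptrons and other neural networks is that backpropagation can get stuck in a local minimum.
When the error surface contains several valleys, gradient descent may settle in one that is not the deepest. Below you will see how a DBN addresses this problem.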
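To make the local-minimum problem concrete, here is a minimal sketch in plain Python. The function f(x) = x^4 - 3x^2 + x, the learning rate, and the two starting points are illustrative assumptions, not part of the original tutorial. The function has two valleys, and which one gradient descent reaches depends only on where it starts.

```python
def f(x):
    """Toy 1-D error surface with two valleys (illustrative assumption)."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_from_right = gradient_descent(2.0)   # settles in the shallower valley
x_from_left = gradient_descent(-2.0)   # settles in the deeper valley

print("start  2.0 -> x = %.3f, f(x) = %.3f" % (x_from_right, f(x_from_right)))
print("start -2.0 -> x = %.3f, f(x) = %.3f" % (x_from_left, f(x_from_left)))
```

Starting from 2.0 the descent stops in the shallower valley even though a lower point exists on the other side. Pre-training can be read as a way of choosing a better starting point before this descent begins.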
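Deep belief networks
A deep belief network can mitigate the local-minimum problem through an additional pre-training procedure. Pre-training is carried out before backpropagation, so that the network starts out not far from the optimal solution, that is, in its neighborhood. Backpropagation then gradually reduces the error rate.
A deep belief network consists of two main parts. The first is a stack of restricted Boltzmann machines (RBMs), used to pre-train the network. The second is a feed-forward backpropagation network, which fine-tunes the network formed by the stacked RBMs.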
1. Load the libraries required for the deep belief network
# urllib is used to download the utils file from deeplearning.net
import urllib.request
response = urllib.request.urlopen('http://deeplearning.net/tutorial/code/utils.py')
content = response.read().decode('utf-8')
with open('utils.py', 'w') as target:
    target.write(content)
# Import the math function for calculations
import math
# Tensorflow library. Used to implement machine learning models
import tensorflow as tf
# Numpy contains helpful functions for efficient mathematical calculations
import numpy as np
# Image library for image manipulation
from PIL import Image
# import Image
# Utils file
from utils import tile_raster_images
2. Build the RBM layer
For details on RBMs, see https://blog.csdn.net/sinat_28371057/article/details/115795086
To use a DBN in TensorFlow, we first create an RBM class (its definition is included in the complete code at the end of this article).
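As a rough, dependency-free sketch of the update that such an RBM class performs (one step of contrastive divergence, CD-1, on a single training vector), the same arithmetic can be written with plain Python lists. The helper names, the tiny layout of 4 visible and 3 hidden units, and the learning rate of 0.1 are assumptions chosen for illustration. The article's class works on mini-batches in TensorFlow with a learning rate of 1.0.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Bernoulli sample: 1.0 with probability p, otherwise 0.0."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def prob_h_given_v(v, w, hb):
    # w[i][j] connects visible unit i to hidden unit j
    return [sigmoid(sum(v[i] * w[i][j] for i in range(len(v))) + hb[j])
            for j in range(len(hb))]


def prob_v_given_h(h, w, vb):
    return [sigmoid(sum(h[j] * w[i][j] for j in range(len(h))) + vb[i])
            for i in range(len(vb))]


def cd1_step(v0, w, hb, vb, lr, rng):
    """One CD-1 update for a single vector v0; parameters change in place."""
    h0 = sample(prob_h_given_v(v0, w, hb), rng)   # positive phase
    v1 = sample(prob_v_given_h(h0, w, vb), rng)   # reconstruction
    h1 = prob_h_given_v(v1, w, hb)                # negative phase (probabilities)
    for i in range(len(vb)):
        for j in range(len(hb)):
            w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        vb[i] += lr * (v0[i] - v1[i])
    for j in range(len(hb)):
        hb[j] += lr * (h0[j] - h1[j])
    # Mean squared reconstruction error for this vector
    return sum((a - b) ** 2 for a, b in zip(v0, v1)) / len(v0)


rng = random.Random(0)            # fixed seed so the run is repeatable
n_visible, n_hidden = 4, 3        # toy sizes (assumption)
w = [[0.0] * n_hidden for _ in range(n_visible)]
hb = [0.0] * n_hidden
vb = [0.0] * n_visible
pattern = [1.0, 1.0, 0.0, 0.0]    # made-up training vector

for step in range(200):
    err = cd1_step(pattern, w, hb, vb, lr=0.1, rng=rng)
print("reconstruction error on the last step:", err)
```

The positive term uses the data and the hidden sample it produces, and the negative term uses the reconstruction. Their difference drives the weights toward configurations that reproduce the training vector.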
3. Import the MNIST data
Load the MNIST image data with one-hot encoded labels.
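For readers unfamiliar with one-hot encoding, the following minimal sketch shows what a single encoded label looks like. The helper names `one_hot` and `decode` are hypothetical and are not part of TensorFlow or of the tutorial code.

```python
def one_hot(label, num_classes=10):
    """Return a list of length num_classes with 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(vec)), key=vec.__getitem__)


encoded = one_hot(3)
print(encoded)          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode(encoded))  # 3
```

With this encoding each MNIST label becomes a vector of length 10, which matches the 10 output units of the classifier trained later.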
# Getting the MNIST data provided by Tensorflow
from tensorflow.examples.tutorials.mnist import input_data

# Loading in the mnist data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images,\
    mnist.test.labels

Output:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
4. Build the DBN
RBM_hidden_sizes = [500, 200, 50]  # create 3 RBM layers with sizes 784-500-200-50

# Since we are training, set input as training data
inpX = trX

# Create list to hold our RBMs
rbm_list = []

# Size of inputs is the number of features (pixels) per training sample
input_size = inpX.shape[1]

# For each RBM we want to generate
for i, size in enumerate(RBM_hidden_sizes):
    print('RBM: ', i, ' ', input_size, '->', size)
    rbm_list.append(RBM(input_size, size))
    input_size = size

Output:
RBM: 0 784 -> 500
RBM: 1 500 -> 200
RBM: 2 200 -> 50

With the RBM class defined and the data loaded, we can build the DBN. This example stacks three RBMs: the first has 500 hidden units, the second 200, and the last 50. The goal is to learn a deep representation of the training data.
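The chaining of layer sizes can be checked without TensorFlow. The sketch below uses two hypothetical helpers, `rbm_layer_shapes` and `parameter_count`, to list the (visible, hidden) shape of each RBM in the 784-500-200-50 stack and to count its parameters, assuming one weight matrix plus one hidden and one visible bias vector per RBM.

```python
def rbm_layer_shapes(input_size, hidden_sizes):
    """Pair each hidden size with the size of the layer feeding it."""
    shapes = []
    for size in hidden_sizes:
        shapes.append((input_size, size))
        input_size = size  # this layer's output is the next layer's input
    return shapes


def parameter_count(shapes):
    """Weights plus hidden and visible biases for every RBM."""
    return sum(v * h + h + v for v, h in shapes)


shapes = rbm_layer_shapes(784, [500, 200, 50])
print(shapes)                   # [(784, 500), (500, 200), (200, 50)]
print(parameter_count(shapes))  # 504234
```

Almost all of the parameters sit in the first RBM, which maps the 784 pixels to 500 hidden units.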
5. Train the RBMs
We start the pre-training step with rbm.train(), training each RBM in the stack separately and feeding the output of the current RBM to the next one as its input.
# For each RBM in our list
for rbm in rbm_list:
    print('New RBM:')
    # Train a new one
    rbm.train(inpX)
    # Return the output layer
    inpX = rbm.rbm_outpt(inpX)

Output:
New RBM:
Epoch: 0 reconstruction error: 0.061174
Epoch: 1 reconstruction error: 0.052962
Epoch: 2 reconstruction error: 0.049679
Epoch: 3 reconstruction error: 0.047683
Epoch: 4 reconstruction error: 0.045691
New RBM:
Epoch: 0 reconstruction error: 0.035260
Epoch: 1 reconstruction error: 0.030811
Epoch: 2 reconstruction error: 0.028873
Epoch: 3 reconstruction error: 0.027428
Epoch: 4 reconstruction error: 0.026980
New RBM:
Epoch: 0 reconstruction error: 0.059593
Epoch: 1 reconstruction error: 0.056837
Epoch: 2 reconstruction error: 0.055571
Epoch: 3 reconstruction error: 0.053817
Epoch: 4 reconstruction error: 0.054142

We can now turn the learned representation of the input data into a supervised predictor, such as a linear classifier. Specifically, we use the output of the last layer of this shallow neural network to classify the digits.
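Classifying from the last layer's output comes down to picking the output unit with the largest activation and comparing it with the position of the 1 in the one-hot label. The sketch below shows that step in plain Python. The helper names and the three-class activations are made up for illustration, whereas the tutorial's network has 10 output units and uses tf.argmax and np.argmax for the same purpose.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(outputs, one_hot_labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(argmax(o) == argmax(y)
                  for o, y in zip(outputs, one_hot_labels))
    return correct / len(outputs)


# Made-up activations for three samples and three classes (illustration only)
outputs = [[0.1, 0.7, 0.2],
           [0.8, 0.1, 0.1],
           [0.3, 0.3, 0.4]]
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0]]

print(accuracy(outputs, labels))  # two of three predictions are correct
```

This is the quantity the training loop reports after every epoch as the "Accuracy rating".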
6. Neural network
The class below implements a neural network that uses the RBMs pre-trained above.
import numpy as np
import math
import tensorflow as tf


class NN(object):

    def __init__(self, sizes, X, Y):
        # Initialize hyperparameters
        self._sizes = sizes
        self._X = X
        self._Y = Y
        self.w_list = []
        self.b_list = []
        self._learning_rate = 1.0
        self._momentum = 0.0
        self._epoches = 10
        self._batchsize = 100
        input_size = X.shape[1]

        # initialization loop
        for size in self._sizes + [Y.shape[1]]:
            # Define upper limit for the uniform distribution range
            max_range = 4 * math.sqrt(6. / (input_size + size))

            # Initialize weights through a random uniform distribution
            self.w_list.append(
                np.random.uniform(-max_range, max_range, [input_size, size]).astype(np.float32))

            # Initialize bias as zeroes
            self.b_list.append(np.zeros([size], np.float32))
            input_size = size

    # load data from rbm
    def load_from_rbms(self, dbn_sizes, rbm_list):
        # Check if expected sizes are correct
        assert len(dbn_sizes) == len(self._sizes)

        for i in range(len(self._sizes)):
            # Check if for each RBM the expected sizes are correct
            assert dbn_sizes[i] == self._sizes[i]

        # If everything is correct, bring over the weights and biases
        for i in range(len(self._sizes)):
            self.w_list[i] = rbm_list[i].w
            self.b_list[i] = rbm_list[i].hb

    # Training method
    def train(self):
        # Create placeholders for input, weights, biases, output
        _a = [None] * (len(self._sizes) + 2)
        _w = [None] * (len(self._sizes) + 1)
        _b = [None] * (len(self._sizes) + 1)
        _a[0] = tf.placeholder("float", [None, self._X.shape[1]])
        y = tf.placeholder("float", [None, self._Y.shape[1]])

        # Define variables and activation function
        for i in range(len(self._sizes) + 1):
            _w[i] = tf.Variable(self.w_list[i])
            _b[i] = tf.Variable(self.b_list[i])
        for i in range(1, len(self._sizes) + 2):
            _a[i] = tf.nn.sigmoid(tf.matmul(_a[i - 1], _w[i - 1]) + _b[i - 1])

        # Define the cost function
        cost = tf.reduce_mean(tf.square(_a[-1] - y))

        # Define the training operation (Momentum Optimizer minimizing the Cost function)
        train_op = tf.train.MomentumOptimizer(
            self._learning_rate, self._momentum).minimize(cost)

        # Prediction operation
        predict_op = tf.argmax(_a[-1], 1)

        # Training Loop
        with tf.Session() as sess:
            # Initialize Variables
            sess.run(tf.global_variables_initializer())

            # For each epoch
            for i in range(self._epoches):

                # For each step
                for start, end in zip(range(0, len(self._X), self._batchsize),
                                      range(self._batchsize, len(self._X), self._batchsize)):
                    # Run the training operation on the input data
                    sess.run(train_op, feed_dict={
                        _a[0]: self._X[start:end], y: self._Y[start:end]})

                for j in range(len(self._sizes) + 1):
                    # Retrieve weights and biases
                    self.w_list[j] = sess.run(_w[j])
                    self.b_list[j] = sess.run(_b[j])

                print("Accuracy rating for epoch " + str(i) + ": " + str(np.mean(np.argmax(self._Y, axis=1) ==
                      sess.run(predict_op, feed_dict={_a[0]: self._X, y: self._Y}))))
7. Run
nNet = NN(RBM_hidden_sizes, trX, trY)
nNet.load_from_rbms(RBM_hidden_sizes, rbm_list)
nNet.train()

Output:
Accuracy rating for epoch 0: 0.46683636363636366
Accuracy rating for epoch 1: 0.6561272727272728
Accuracy rating for epoch 2: 0.7678363636363637
Accuracy rating for epoch 3: 0.8370727272727273
Accuracy rating for epoch 4: 0.8684181818181819
Accuracy rating for epoch 5: 0.885
Accuracy rating for epoch 6: 0.8947636363636363
Accuracy rating for epoch 7: 0.9024909090909091
Accuracy rating for epoch 8: 0.9080363636363636
Accuracy rating for epoch 9: 0.9124181818181818
Complete code
pip install tensorflow==1.13.1
# Import the math function for calculations
import math
# Tensorflow library. Used to implement machine learning models
import tensorflow as tf
# Numpy contains helpful functions for efficient mathematical calculations
import numpy as np
# Getting the MNIST data provided by Tensorflow
from tensorflow.examples.tutorials.mnist import input_data

""" This file contains different utility functions that are not connected
in any way to the networks presented in the tutorials, but rather help in
processing the outputs into a more understandable way.

For example ``tile_raster_images`` helps in generating an easy to grasp
image from a set of samples or weights.
"""

import numpy


def scale_to_unit_interval(ndar, eps=1e-8):
    """ Scales all values in the ndarray ndar to be between 0 and 1 """
    ndar = ndar.copy()
    ndar -= ndar.min()
    ndar *= 1.0 / (ndar.max() + eps)
    return ndar


def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
                       scale_rows_to_unit_interval=True,
                       output_pixel_vals=True):
    """
    Transform an array with one flattened image per row, into an array in
    which images are reshaped and laid out like tiles on a floor.

    This function is useful for visualizing datasets whose rows are images,
    and also columns of matrices for transforming those rows
    (such as the first layer of a neural net).

    :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
    be 2-D ndarrays or None;
    :param X: a 2-D array in which every row is a flattened image.

    :type img_shape: tuple; (height, width)
    :param img_shape: the original shape of each image

    :type tile_shape: tuple; (rows, cols)
    :param tile_shape: the number of images to tile (rows, cols)

    :param output_pixel_vals: if output should be pixel values (i.e. int8
    values) or floats

    :param scale_rows_to_unit_interval: if the values need to be scaled before
    being plotted to [0,1] or not

    :returns: array suitable for viewing as an image.
    (See:`Image.fromarray`.)
    :rtype: a 2-d array with same dtype as X.
    """

    assert len(img_shape) == 2
    assert len(tile_shape) == 2
    assert len(tile_spacing) == 2

    # The expression below can be re-written in a more C style as
    # follows :
    #
    # out_shape = [0,0]
    # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] -
    #                tile_spacing[0]
    # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] -
    #                tile_spacing[1]
    out_shape = [(ishp + tsp) * tshp - tsp
                 for ishp, tshp, tsp in zip(img_shape, tile_shape, tile_spacing)]

    if isinstance(X, tuple):
        assert len(X) == 4
        # Create an output numpy ndarray to store the image
        if output_pixel_vals:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype='uint8')
        else:
            out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
                                    dtype=X.dtype)

        # colors default to 0, alpha defaults to 1 (opaque)
        if output_pixel_vals:
            channel_defaults = [0, 0, 0, 255]
        else:
            channel_defaults = [0., 0., 0., 1.]

        for i in range(4):
            if X[i] is None:
                # if channel is None, fill it with zeros of the correct
                # dtype
                dt = out_array.dtype
                if output_pixel_vals:
                    dt = 'uint8'
                out_array[:, :, i] = numpy.zeros(
                    out_shape, dtype=dt) + channel_defaults[i]
            else:
                # use a recurrent call to compute the channel and store it
                # in the output
                out_array[:, :, i] = tile_raster_images(
                    X[i], img_shape, tile_shape, tile_spacing,
                    scale_rows_to_unit_interval, output_pixel_vals)
        return out_array

    else:
        # if we are dealing with only one channel
        H, W = img_shape
        Hs, Ws = tile_spacing

        # generate a matrix to store the output
        dt = X.dtype
        if output_pixel_vals:
            dt = 'uint8'
        out_array = numpy.zeros(out_shape, dtype=dt)

        for tile_row in range(tile_shape[0]):
            for tile_col in range(tile_shape[1]):
                if tile_row * tile_shape[1] + tile_col < X.shape[0]:
                    this_x = X[tile_row * tile_shape[1] + tile_col]
                    if scale_rows_to_unit_interval:
                        # if we should scale values to be between 0 and 1
                        # do this by calling the `scale_to_unit_interval`
                        # function
                        this_img = scale_to_unit_interval(
                            this_x.reshape(img_shape))
                    else:
                        this_img = this_x.reshape(img_shape)
                    # add the slice to the corresponding position in the
                    # output array
                    c = 1
                    if output_pixel_vals:
                        c = 255
                    out_array[
                        tile_row * (H + Hs): tile_row * (H + Hs) + H,
                        tile_col * (W + Ws): tile_col * (W + Ws) + W
                    ] = this_img * c
        return out_array


# Class that defines the behavior of the RBM
class RBM(object):

    def __init__(self, input_size, output_size):
        # Defining the hyperparameters
        self._input_size = input_size  # Size of input
        self._output_size = output_size  # Size of output
        self.epochs = 5  # Amount of training iterations
        self.learning_rate = 1.0  # The step used in gradient descent
        self.batchsize = 100  # The size of how much data will be used for training per sub iteration

        # Initializing weights and biases as matrices full of zeroes
        self.w = np.zeros([input_size, output_size], np.float32)  # Creates and initializes the weights with 0
        self.hb = np.zeros([output_size], np.float32)  # Creates and initializes the hidden biases with 0
        self.vb = np.zeros([input_size], np.float32)  # Creates and initializes the visible biases with 0

    # Fits the result from the weighted visible layer plus the bias into a sigmoid curve
    def prob_h_given_v(self, visible, w, hb):
        # Sigmoid
        return tf.nn.sigmoid(tf.matmul(visible, w) + hb)

    # Fits the result from the weighted hidden layer plus the bias into a sigmoid curve
    def prob_v_given_h(self, hidden, w, vb):
        return tf.nn.sigmoid(tf.matmul(hidden, tf.transpose(w)) + vb)

    # Generate the sample probability
    def sample_prob(self, probs):
        return tf.nn.relu(tf.sign(probs - tf.random_uniform(tf.shape(probs))))

    # Training method for the model
    def train(self, X):
        # Create the placeholders for our parameters
        _w = tf.placeholder("float", [self._input_size, self._output_size])
        _hb = tf.placeholder("float", [self._output_size])
        _vb = tf.placeholder("float", [self._input_size])

        prv_w = np.zeros([self._input_size, self._output_size],
                         np.float32)  # Creates and initializes the weights with 0
        prv_hb = np.zeros([self._output_size], np.float32)  # Creates and initializes the hidden biases with 0
        prv_vb = np.zeros([self._input_size], np.float32)  # Creates and initializes the visible biases with 0

        cur_w = np.zeros([self._input_size, self._output_size], np.float32)
        cur_hb = np.zeros([self._output_size], np.float32)
        cur_vb = np.zeros([self._input_size], np.float32)
        v0 = tf.placeholder("float", [None, self._input_size])

        # Initialize with sample probabilities
        h0 = self.sample_prob(self.prob_h_given_v(v0, _w, _hb))
        v1 = self.sample_prob(self.prob_v_given_h(h0, _w, _vb))
        h1 = self.prob_h_given_v(v1, _w, _hb)

        # Create the Gradients
        positive_grad = tf.matmul(tf.transpose(v0), h0)
        negative_grad = tf.matmul(tf.transpose(v1), h1)

        # Contrastive divergence (CD-1) updates for the weights and biases
        update_w = _w + self.learning_rate * \
            (positive_grad - negative_grad) / tf.to_float(tf.shape(v0)[0])
        update_vb = _vb + self.learning_rate * tf.reduce_mean(v0 - v1, 0)
        update_hb = _hb + self.learning_rate * tf.reduce_mean(h0 - h1, 0)

        # Find the error rate
        err = tf.reduce_mean(tf.square(v0 - v1))

        # Training loop
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            # For each epoch
            for epoch in range(self.epochs):
                # For each step/batch
                for start, end in zip(range(0, len(X), self.batchsize),
                                      range(self.batchsize, len(X), self.batchsize)):
                    batch = X[start:end]
                    # Update the weights and biases
                    cur_w = sess.run(update_w, feed_dict={
                        v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_hb = sess.run(update_hb, feed_dict={
                        v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    cur_vb = sess.run(update_vb, feed_dict={
                        v0: batch, _w: prv_w, _hb: prv_hb, _vb: prv_vb})
                    prv_w = cur_w
                    prv_hb = cur_hb
                    prv_vb = cur_vb
                error = sess.run(err, feed_dict={
                    v0: X, _w: cur_w, _vb: cur_vb, _hb: cur_hb})
                print('Epoch: %d' % epoch, 'reconstruction error: %f' % error)
            self.w = prv_w
            self.hb = prv_hb
            self.vb = prv_vb

    # Create expected output for our DBN
    def rbm_outpt(self, X):
        input_X = tf.constant(X)
        _w = tf.constant(self.w)
        _hb = tf.constant(self.hb)
        out = tf.nn.sigmoid(tf.matmul(input_X, _w) + _hb)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            return sess.run(out)


class NN(object):

    def __init__(self, sizes, X, Y):
        # Initialize hyperparameters
        self._sizes = sizes
        self._X = X
        self._Y = Y
        self.w_list = []
        self.b_list = []
        self._learning_rate = 1.0
        self._momentum = 0.0
        self._epoches = 10
        self._batchsize = 100
        input_size = X.shape[1]

        # initialization loop
        for size in self._sizes + [Y.shape[1]]:
            # Define upper limit for the uniform distribution range
            max_range = 4 * math.sqrt(6. / (input_size + size))

            # Initialize weights through a random uniform distribution
            self.w_list.append(
                np.random.uniform(-max_range, max_range, [input_size, size]).astype(np.float32))

            # Initialize bias as zeroes
            self.b_list.append(np.zeros([size], np.float32))
            input_size = size

    # load data from rbm
    def load_from_rbms(self, dbn_sizes, rbm_list):
        # Check if expected sizes are correct
        assert len(dbn_sizes) == len(self._sizes)

        for i in range(len(self._sizes)):
            # Check if for each RBM the expected sizes are correct
            assert dbn_sizes[i] == self._sizes[i]

        # If everything is correct, bring over the weights and biases
        for i in range(len(self._sizes)):
            self.w_list[i] = rbm_list[i].w
            self.b_list[i] = rbm_list[i].hb

    # Training method
    def train(self):
        # Create placeholders for input, weights, biases, output
        _a = [None] * (len(self._sizes) + 2)
        _w = [None] * (len(self._sizes) + 1)
        _b = [None] * (len(self._sizes) + 1)
        _a[0] = tf.placeholder("float", [None, self._X.shape[1]])
        y = tf.placeholder("float", [None, self._Y.shape[1]])

        # Define variables and activation function
        for i in range(len(self._sizes) + 1):
            _w[i] = tf.Variable(self.w_list[i])
            _b[i] = tf.Variable(self.b_list[i])
        for i in range(1, len(self._sizes) + 2):
            _a[i] = tf.nn.sigmoid(tf.matmul(_a[i - 1], _w[i - 1]) + _b[i - 1])

        # Define the cost function
        cost = tf.reduce_mean(tf.square(_a[-1] - y))

        # Define the training operation (Momentum Optimizer minimizing the Cost function)
        train_op = tf.train.MomentumOptimizer(
            self._learning_rate, self._momentum).minimize(cost)

        # Prediction operation
        predict_op = tf.argmax(_a[-1], 1)

        # Training Loop
        with tf.Session() as sess:
            # Initialize Variables
            sess.run(tf.global_variables_initializer())

            # For each epoch
            for i in range(self._epoches):

                # For each step
                for start, end in zip(range(0, len(self._X), self._batchsize),
                                      range(self._batchsize, len(self._X), self._batchsize)):
                    # Run the training operation on the input data
                    sess.run(train_op, feed_dict={
                        _a[0]: self._X[start:end], y: self._Y[start:end]})

                for j in range(len(self._sizes) + 1):
                    # Retrieve weights and biases
                    self.w_list[j] = sess.run(_w[j])
                    self.b_list[j] = sess.run(_b[j])

                print("Accuracy rating for epoch " + str(i) + ": " + str(np.mean(np.argmax(self._Y, axis=1) ==
                      sess.run(predict_op, feed_dict={_a[0]: self._X, y: self._Y}))))


if __name__ == '__main__':
    # Loading in the mnist data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images,\
        mnist.test.labels

    RBM_hidden_sizes = [500, 200, 50]  # create 3 RBM layers with sizes 784-500-200-50

    # Since we are training, set input as training data
    inpX = trX

    # Create list to hold our RBMs
    rbm_list = []

    # Size of inputs is the number of features (pixels) per training sample
    input_size = inpX.shape[1]

    # For each RBM we want to generate
    for i, size in enumerate(RBM_hidden_sizes):
        print('RBM: ', i, ' ', input_size, '->', size)
        rbm_list.append(RBM(input_size, size))
        input_size = size

    # For each RBM in our list
    for rbm in rbm_list:
        print('New RBM:')
        # Train a new one
        rbm.train(inpX)
        # Return the output layer
        inpX = rbm.rbm_outpt(inpX)

    nNet = NN(RBM_hidden_sizes, trX, trY)
    nNet.load_from_rbms(RBM_hidden_sizes, rbm_list)
    nNet.train()