Deep Learning Case Study: MNIST Handwritten Digit Recognition with a CNN

Published: 2023/12/20 · pytorch · 豆豆

1. Model Structure

This article covers only the TensorFlow implementation of CNN-based handwritten digit recognition. For background on CNNs themselves, see: Convolutional Neural Networks (CNN)

For the format of the MNIST dataset and a walkthrough of the preprocessing code input_data.py, see: Tutorial (2)
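As a rough cross-check of the structure, the sketch below traces feature-map sizes and parameter counts through the layers used in this article's code listing: two 5x5 convolutions with 20 and 50 filters, each followed by 2x2 max pooling, then fully connected layers of 500 and 10 units. It assumes TensorFlow's 'SAME' padding rule, under which the spatial output size is ceil(input / stride). The helper `same_out` and the `layers` table are illustrative names and are not part of the original code.

```python
import math


def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


# (scope name, kind, window size, stride, output channels), read off the listing
layers = [
    ("conv2", "conv", 5, 1, 20),
    ("pool3", "pool", 2, 2, None),
    ("conv4", "conv", 5, 1, 50),
    ("pool5", "pool", 2, 2, None),
]

size, channels, params = 28, 1, 0  # MNIST input: 28 x 28 x 1
for name, kind, k, stride, out_ch in layers:
    size = same_out(size, stride)
    if kind == "conv":
        # weights (k * k * in_channels * out_channels) plus one bias per filter
        params += k * k * channels * out_ch + out_ch
        channels = out_ch
    print("%s: %d x %d x %d" % (name, size, size, channels))

flat = size * size * channels      # input width of the first dense layer
params += flat * 500 + 500         # fc6
params += 500 * 10 + 10            # fc7
print("flattened features:", flat)
print("trainable parameters:", params)
```

The trace gives 28 -> 28 -> 14 -> 14 -> 7, which is why the first fully connected layer reshapes to 7 * 7 * 50. Almost all of the parameters sit in that layer.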

2. Experiment Code

# -*- coding:utf-8 -*-
"""
@Time  :
@Author: Feng Lepeng
@File  : mnist_cnn_tf_demo.py
@Desc  : LeNet-style CNN for handwritten digit recognition (TensorFlow 1.x API)

Note: in most cases the network structure is translated directly into code,
with at most minor changes to its parameters (hyperparameters, window size,
stride, and so on).
https://deeplearnjs.org/demos/model-builder/
https://js.tensorflow.org/#getting-started
"""
import os

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the data
mnist = input_data.read_data_sets('data/mnist', one_hot=True)

# The dataset has three parts: training set (55k, mnist.train),
# test set (10k, mnist.test) and validation set (5k, mnist.validation).
# Each image is 28*28*1 pixels (grayscale), i.e. 784 features per image.
train_img = mnist.train.images
train_label = mnist.train.labels
test_img = mnist.test.images
test_label = mnist.test.labels
train_sample_number = mnist.train.num_examples

# Parameters and hyperparameters
# Base learning rate; it is normally kept small (Adam diverges with large values)
learn_rate_base = 0.001
# Number of training samples per iteration
batch_size = 64
# Interval (in epochs) between progress reports
display_step = 1

# Input dimensionality (784)
input_dim = train_img.shape[1]
# Output dimensionality (10 classes)
n_classes = train_label.shape[1]

# Model construction
# 1. Placeholders for the input data
x = tf.placeholder(tf.float32, shape=[None, input_dim], name='x')
y = tf.placeholder(tf.float32, shape=[None, n_classes], name='y')
learn_rate = tf.placeholder(tf.float32, name='learn_rate')


def learn_rate_func(epoch):
    """
    Return the learning rate for the given epoch: the base rate is
    multiplied by 0.9 once every 10 epochs.
    :param epoch: zero-based epoch index
    :return: learning rate to feed into the optimizer
    """
    return learn_rate_base * (0.9 ** (epoch // 10))


def get_variable(name, shape=None, dtype=tf.float32,
                 initializer=tf.random_normal_initializer(mean=0, stddev=0.1)):
    """
    Create (or fetch) a variable in the current variable scope.
    :param name: variable name
    :param shape: variable shape
    :param dtype: variable data type
    :param initializer: initializer used when the variable is created
    :return: the variable
    """
    return tf.get_variable(name, shape=shape, dtype=dtype, initializer=initializer)


# 2. Build the network
def le_net(x):
    # 1. Input layer
    with tf.variable_scope('input1'):
        # Reshape the input: [None, input_dim] -> [None, height, width, channels]
        net = tf.reshape(x, shape=[-1, 28, 28, 1])

    # 2. Convolution layer
    with tf.variable_scope('conv2'):
        # conv2d(input, filter, strides, padding, use_cudnn_on_gpu=True, data_format="NHWC", name=None)
        # data_format: layout of the input, NHWC or NCHW
        #   (N = samples, H = height, W = width, C = channels).
        # input: 4-D image data laid out according to data_format:
        #   NHWC -> [batch_size, height, width, channels]
        #   NCHW -> [batch_size, channels, height, width]
        # filter: 4-D kernel of shape [height, width, in_channels, out_channels],
        #   i.e. window height, window width, depth of the previous layer,
        #   number of kernels.
        # strides: 4-D, one step size per dimension of data_format:
        #   NHWC -> [batch, in_height, in_width, in_channels]
        #   NCHW -> [batch, in_channels, in_height, in_width]
        #   The batch and channel strides must be 1.
        # padding: "SAME" or "VALID". "SAME" pads the input, so with stride 1
        #   the output has the same height and width as the input; "VALID"
        #   does not pad and drops positions the window cannot fully cover.
        net = tf.nn.conv2d(input=net, filter=get_variable('w', [5, 5, 1, 20]),
                           strides=[1, 1, 1, 1], padding='SAME')
        net = tf.nn.bias_add(net, get_variable('b', [20]))
        # Activation: ReLU
        # tf.nn.relu  => max(features, 0)
        # tf.nn.relu6 => min(max(features, 0), 6)
        net = tf.nn.relu(net)

    # 3. Pooling
    with tf.variable_scope('pool3'):
        # As with conv2, a window size and a stride are required.
        # max_pool(value, ksize, strides, padding, data_format="NHWC", name=None)
        # avg_pool(value, ksize, strides, padding, data_format="NHWC", name=None)
        # With the default NHWC layout:
        #   value:   input of shape [batch_size, height, width, channels]
        #   ksize:   window size [batch, in_height, in_width, in_channels],
        #            where batch and in_channels must be 1
        #   strides: step size [batch, in_height, in_width, in_channels],
        #            where batch and in_channels must be 1
        #   padding: "SAME" (pad) or "VALID" (no padding), as for conv2d
        net = tf.nn.max_pool(value=net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # 4. Convolution
    with tf.variable_scope('conv4'):
        net = tf.nn.conv2d(input=net, filter=get_variable('w', [5, 5, 20, 50]),
                           strides=[1, 1, 1, 1], padding='SAME')
        net = tf.nn.bias_add(net, get_variable('b', [50]))
        net = tf.nn.relu(net)

    # 5. Pooling
    with tf.variable_scope('pool5'):
        net = tf.nn.max_pool(value=net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # 6. Fully connected
    with tf.variable_scope('fc6'):
        # 28 -> 14 -> 7 (the convolutions here do not change the image size)
        net = tf.reshape(net, shape=[-1, 7 * 7 * 50])
        net = tf.add(tf.matmul(net, get_variable('w', [7 * 7 * 50, 500])), get_variable('b', [500]))
        net = tf.nn.relu(net)

    # 7. Fully connected (output layer)
    with tf.variable_scope('fc7'):
        # Return the raw scores (logits); softmax is applied outside the network
        logits = tf.add(tf.matmul(net, get_variable('w', [500, n_classes])), get_variable('b', [n_classes]))

    return logits


# Build the network
logits = le_net(x)
act = tf.nn.softmax(logits)

# Loss function
# softmax_cross_entropy_with_logits applies softmax internally and computes the
# cross entropy of each sample, so `logits` must be the unnormalized scores,
# not the softmax output; `labels` holds the true (one-hot) values.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

# Adam is a commonly used optimizer.
# learning_rate: too large and training may not converge; too small and it converges slowly.
train = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(cost)

# Predicted class versus true class
# tf.argmax: index of the maximum along an axis, as in numpy
# tf.equal: element-wise comparison; True where equal, False otherwise,
#           with the same shape as the inputs
pred = tf.equal(tf.argmax(act, axis=1), tf.argmax(y, axis=1))
# Accuracy (True is cast to 1, False to 0)
acc = tf.reduce_mean(tf.cast(pred, tf.float32))

# Initialization op
init = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialize the variables
    sess.run(init)

    # Model saving / persistence
    saver = tf.train.Saver()
    os.makedirs('./mnist', exist_ok=True)

    epoch = 0
    while True:
        avg_cost = 0
        # Number of batches per epoch
        total_batch = int(train_sample_number / batch_size)

        # Iterate over the batches
        for i in range(total_batch):
            # Fetch x and y
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            feeds = {x: batch_xs, y: batch_ys, learn_rate: learn_rate_func(epoch)}
            # Run one training step and fetch the loss in the same call
            _, batch_cost = sess.run([train, cost], feed_dict=feeds)
            avg_cost += batch_cost

        # Average loss per batch over the epoch
        avg_cost = avg_cost / total_batch

        # Report the loss and the training and test accuracy
        if (epoch + 1) % display_step == 0:
            print("Epoch: %03d  loss: %.9f" % (epoch, avg_cost))
            # Only the last batch is used here, because feeding all of
            # train_img ran out of memory and the process exited.
            train_acc = sess.run(acc, feed_dict={x: batch_xs, y: batch_ys})
            print("Training accuracy: %.3f" % train_acc)
            test_acc = sess.run(acc, feed_dict={x: test_img, y: test_label})
            print("Test accuracy: %.3f" % test_acc)

            # Stop and save once both accuracies exceed 0.9
            if train_acc > 0.9 and test_acc > 0.9:
                saver.save(sess, './mnist/model')
                break
        epoch += 1

    # Write the graph for visualization (e.g. in TensorBoard)
    writer = tf.summary.FileWriter('./mnist/graph', tf.get_default_graph())
    writer.close()
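A note on the loss: tf.nn.softmax_cross_entropy_with_logits applies softmax internally, so its logits argument is meant to receive the raw output of the last layer rather than values that have already gone through softmax. The sketch below illustrates why this matters. It is written in plain Python rather than TensorFlow, and the helpers `softmax` and `cross_entropy_from_logits` are hypothetical stand-ins for the per-sample formula, not library functions.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, one_hot):
    """Per-sample loss: -sum(label * log(softmax(logits)))."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# A confident, correct prediction over 10 classes (as in MNIST)
logits = [10.0] + [0.0] * 9
label = [1.0] + [0.0] * 9

# Softmax applied once, inside the loss: the loss is close to zero
correct = cross_entropy_from_logits(logits, label)

# Softmax applied twice (probabilities passed in as if they were logits)
double = cross_entropy_from_logits(softmax(logits), label)

# With inputs confined to [0, 1], the loss cannot drop below log(1 + 9 / e)
floor = math.log(1 + 9 / math.e)

print("loss from logits:        %.6f" % correct)
print("loss from probabilities: %.6f" % double)
print("lower bound when softmax is applied twice: %.6f" % floor)
```

When probabilities are fed in place of logits, the inputs are squeezed into [0, 1], so even a perfect prediction leaves a sizeable loss and the gradients are weaker than intended.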
