Deep Learning with Convolutional Neural Networks (10): CIFAR10 and VGG13 in Practice

MNIST is one of the most widely used datasets in machine learning, but its handwritten-digit images are very simple and store only grayscale information, which makes it a poor fit for network models designed around three-channel RGB input. This section introduces another classic image-classification dataset: CIFAR10.
The CIFAR10 dataset, released by the Canadian Institute For Advanced Research, contains color images of 10 object categories such as airplanes, automobiles, birds, and cats. Each category has 6,000 images of size 32×32, for 60,000 images in total: 50,000 for training and 10,000 for testing. Sample images from each category are shown in the figure below.
The CIFAR10 dataset
In TensorFlow, there is likewise no need to download, parse, and load CIFAR10 by hand: the datasets.cifar10.load_data() function directly returns the pre-split training and test sets. For example:
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential, losses, optimizers, datasets
from Chapter10.CIFAR10 import load_data

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


def preprocess(x, y):
    # scale pixels to [0, 1] and cast labels to int
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


# Load the CIFAR10 dataset (downloaded online, or from a local copy)
(x, y), (x_test, y_test) = load_data('/Users/xuruihang/.keras/datasets/cifar-10-batches-py')
# Remove the extra dimension of y: [b, 1] => [b]
y = tf.squeeze(y, axis=1)
y_test = tf.squeeze(y_test, axis=1)
# Print the shapes of the training and test sets
print(x.shape, y.shape, x_test.shape, y_test.shape)
# Build the training set object: shuffle, preprocess, batch
train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(1000).map(preprocess).batch(128)
# Build the test set object: preprocess, batch
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(preprocess).batch(128)
# Sample one batch from the training set and inspect it
sample = next(iter(train_db))
print('sample: ', sample[0].shape, sample[1].shape,
      tf.reduce_min(sample[0]), tf.reduce_max(sample[0]))
The output is shown below.
Note: the load_data() used here is a snippet I wrote myself, because downloading directly raises an error:
import os
import pickle

import numpy as np


def load_batch(file):
    with open(file, 'rb') as fo:
        d = pickle.load(fo, encoding='bytes')
    d_decoded = {}
    for k, v in d.items():
        d_decoded[k.decode('utf8')] = v
    d = d_decoded
    data = d['data']
    labels = d['labels']
    data = data.reshape(data.shape[0], 3, 32, 32)
    return data, labels


def load_data(path='data/cifar-10-batches-py'):
    """Loads CIFAR10 dataset.

    # Returns
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
    """
    from tensorflow.python.keras import backend as K
    num_train_samples = 50000
    x_train = np.empty((num_train_samples, 3, 32, 32), dtype='uint8')
    y_train = np.empty((num_train_samples,), dtype='uint8')
    for i in range(1, 6):
        fpath = os.path.join(path, 'data_batch_' + str(i))
        (x_train[(i - 1) * 10000: i * 10000, :, :, :],
         y_train[(i - 1) * 10000: i * 10000]) = load_batch(fpath)
    fpath = os.path.join(path, 'test_batch')
    x_test, y_test = load_batch(fpath)
    y_train = np.reshape(y_train, (len(y_train), 1))
    y_test = np.reshape(y_test, (len(y_test), 1))
    if K.image_data_format() == 'channels_last':
        x_train = x_train.transpose(0, 2, 3, 1)
        x_test = x_test.transpose(0, 2, 3, 1)
    return (x_train, y_train), (x_test, y_test)


(x_train, y_train), (x_test, y_test) = load_data('/Users/xuruihang/.keras/datasets/cifar-10-batches-py')
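The reshape-then-transpose step in load_batch and load_data can be illustrated with hypothetical dummy data standing in for one CIFAR10 batch file, where each image is stored as a flat row of 3072 values (1024 red, then 1024 green, then 1024 blue):

```python
import numpy as np

# Dummy stand-in for one batch of 2 images, each a flat 3072-value row
data = np.arange(2 * 3072).reshape(2, 3072)
# Split the flat row into channels: (N, 3072) -> (N, 3, 32, 32), channels first
data = data.reshape(data.shape[0], 3, 32, 32)
# Move the channel axis last for TensorFlow: (N, 3, 32, 32) -> (N, 32, 32, 3)
data = data.transpose(0, 2, 3, 1)
print(data.shape)  # (2, 32, 32, 3)
```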
See "Keras CIFAR10 offline loading" for details.

As the output shows, the training set X and y have shapes (50000, 32, 32, 3) and (50000,), and the test set X and y have shapes (10000, 32, 32, 3) and (10000,): color images of size 32×32, with 50,000 training samples and 10,000 test samples.

CIFAR10 image recognition is not an easy task. The image content needs a fair amount of detail to be recognizable, yet the stored resolution is only 32×32, so the subject is quite blurry — sometimes even the human eye struggles to tell the classes apart. Shallow neural networks have limited expressive power and are hard to train to good performance, so this section builds on the more expressive VGG13 network, with parts of the structure modified to fit our dataset, to carry out CIFAR10 classification. The modifications are:
- The network input is adjusted to 32×32. The original input of 224×224 makes the input feature dimension of the fully connected layers, and hence the network parameter count, far too large.
- The three fully connected layers are resized to [256, 128, 10] to match the 10-class setting.
The figure below shows the adjusted VGG13 structure, which we will still refer to as the VGG13 network model.
The adjusted VGG13 model structure
We implement the network as two subnetworks: a convolutional subnetwork and a fully connected subnetwork. The convolutional subnetwork consists of 5 submodules, each containing a Conv-Conv-MaxPooling unit. The code is as follows:
conv_layers = [  # first create a list holding the network layers
    # Conv-Conv-Pooling unit 1
    # 64 3x3 kernels; input and output have the same spatial size
    layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    # halve the height and width
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

    # Conv-Conv-Pooling unit 2: output channels up to 128, height and width halved
    layers.Conv2D(128, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(128, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

    # Conv-Conv-Pooling unit 3: output channels up to 256, height and width halved
    layers.Conv2D(256, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(256, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

    # Conv-Conv-Pooling unit 4: output channels up to 512, height and width halved
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),

    # Conv-Conv-Pooling unit 5: output channels stay at 512, height and width halved
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
]
# Build the network container from the layer list created above
conv_net = Sequential(conv_layers)
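As a quick sanity check of why the flattened output of conv_net has 512 features: every Conv2D above uses padding="same" with stride 1, so only the five MaxPool2D layers (stride 2) shrink the feature map — 32 is halved five times:

```python
# Every Conv2D uses padding="same" with stride 1, so only the five
# MaxPool2D layers (stride 2) change the spatial size of a 32x32 input.
size = 32
for _ in range(5):
    size //= 2  # each Conv-Conv-MaxPooling unit halves height and width
print(size)  # 1  => conv output is [b, 1, 1, 512], flattened to [b, 512]
```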
The fully connected subnetwork contains 3 fully connected layers, each followed by a ReLU activation except the last. The code is as follows:
# Create the 3-layer fully connected subnetwork
fc_net = Sequential([
    layers.Dense(256, activation=tf.nn.relu),
    layers.Dense(128, activation=tf.nn.relu),
    layers.Dense(10, activation=None)
])
After the subnetworks are created, the parameter counts can be inspected as follows:
conv_net.build(input_shape=[None, 32, 32, 3])
fc_net.build(input_shape=[None, 512])
conv_net.summary()
fc_net.summary()
The convolutional subnetwork has about 9.40 million parameters and the fully connected subnetwork about 166 thousand, for roughly 9.57 million in total — far fewer than the original VGG13.

Because we implement the network as two subnetworks, the trainable parameter lists of both must be merged before applying gradient updates. The code is as follows:
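These totals can be double-checked by hand: each 3×3 Conv2D has 3·3·c_in·c_out weights plus c_out biases, and each Dense layer has d_in·d_out weights plus d_out biases. A minimal sketch for the layer structure above:

```python
# (in_channels, out_channels) for the ten 3x3 conv layers of the modified VGG13
conv_channels = [(3, 64), (64, 64), (64, 128), (128, 128), (128, 256),
                 (256, 256), (256, 512), (512, 512), (512, 512), (512, 512)]
conv_params = sum(3 * 3 * cin * cout + cout for cin, cout in conv_channels)

# (in_features, out_features) for the three dense layers
fc_dims = [(512, 256), (256, 128), (128, 10)]
fc_params = sum(din * dout + dout for din, dout in fc_dims)

print(conv_params, fc_params, conv_params + fc_params)  # 9404992 165514 9570506
```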
# Merge the lists: combine the parameters of the two subnetworks
variables = conv_net.trainable_variables + fc_net.trainable_variables
# Compute gradients with respect to all parameters
grads = tape.gradient(loss, variables)
# Apply the updates automatically
optimizer.apply_gradients(zip(grads, variables))
Running the code starts training. After 50 epochs, the network reaches a test accuracy of 77.5%. Full code:
import os

import tensorflow as tf
from tensorflow.keras import layers, optimizers, datasets, Sequential
from Chapter10.CIFAR10 import load_data

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.random.set_seed(2345)

conv_layers = [  # 5 units of conv + max pooling
    # unit 1
    layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # unit 2
    layers.Conv2D(128, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(128, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # unit 3
    layers.Conv2D(256, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(256, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # unit 4
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
    # unit 5
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.Conv2D(512, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same')
]


def preprocess(x, y):
    # scale pixels to [-1, 1]
    x = 2 * tf.cast(x, dtype=tf.float32) / 255. - 1
    y = tf.cast(y, dtype=tf.int32)
    return x, y


# Load the CIFAR10 dataset
(x, y), (x_test, y_test) = load_data('/Users/xuruihang/.keras/datasets/cifar-10-batches-py')
y = tf.squeeze(y, axis=1)
y_test = tf.squeeze(y_test, axis=1)
print(x.shape, y.shape, x_test.shape, y_test.shape)

train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(1000).map(preprocess).batch(128)
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(preprocess).batch(64)

sample = next(iter(train_db))
print('sample:', sample[0].shape, sample[1].shape,
      tf.reduce_min(sample[0]), tf.reduce_max(sample[0]))


def main():
    # [b, 32, 32, 3] => [b, 1, 1, 512]
    conv_net = Sequential(conv_layers)
    fc_net = Sequential([
        layers.Dense(256, activation=tf.nn.relu),
        layers.Dense(128, activation=tf.nn.relu),
        layers.Dense(10, activation=None),
    ])
    conv_net.build(input_shape=[None, 32, 32, 3])
    fc_net.build(input_shape=[None, 512])
    conv_net.summary()
    fc_net.summary()
    optimizer = optimizers.Adam(lr=1e-4)

    # [1, 2] + [3, 4] => [1, 2, 3, 4]
    # merge the lists: combine the parameters of the two subnetworks
    variables = conv_net.trainable_variables + fc_net.trainable_variables

    for epoch in range(50):
        for step, (x, y) in enumerate(train_db):
            with tf.GradientTape() as tape:
                # [b, 32, 32, 3] => [b, 1, 1, 512]
                out = conv_net(x)
                # flatten => [b, 512]
                out = tf.reshape(out, [-1, 512])
                # [b, 512] => [b, 10]
                logits = fc_net(out)
                # [b] => [b, 10]
                y_onehot = tf.one_hot(y, depth=10)
                # compute loss
                loss = tf.losses.categorical_crossentropy(y_onehot, logits,
                                                          from_logits=True)
                loss = tf.reduce_mean(loss)

            grads = tape.gradient(loss, variables)
            optimizer.apply_gradients(zip(grads, variables))

            if step % 100 == 0:
                print(epoch, step, 'loss:', float(loss))

        total_num = 0
        total_correct = 0
        for x, y in test_db:
            out = conv_net(x)
            out = tf.reshape(out, [-1, 512])
            logits = fc_net(out)
            prob = tf.nn.softmax(logits, axis=1)
            pred = tf.argmax(prob, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            correct = tf.cast(tf.equal(pred, y), dtype=tf.int32)
            correct = tf.reduce_sum(correct)
            total_num += x.shape[0]
            total_correct += int(correct)

        acc = total_correct / total_num
        print(epoch, 'acc:', acc)


if __name__ == '__main__':
    main()
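The loss above passes raw logits with from_logits=True, which is equivalent to a numerically stable log-softmax followed by the negative log-likelihood of the true class. A minimal NumPy sketch of that equivalence, independent of TensorFlow:

```python
import numpy as np

def cross_entropy_from_logits(logits, onehot):
    # numerically stable log-softmax: subtract the row max before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # negative log-likelihood of the true class
    return -(onehot * log_prob).sum(axis=1)

logits = np.array([[2.0, 0.5, -1.0]])
onehot = np.array([[1.0, 0.0, 0.0]])
loss = cross_entropy_from_logits(logits, onehot)
```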
The output is shown below.
As you can see, the accuracy reaches 77.41%. (The poor program ran for an entire night and my laptop's fans nearly took off.)