Udacity "Deep Learning" 2-1: Convolutional Neural Networks
This part consists of three sections:
- Visualizing convolutional neural networks.
- Designing and training a CNN to classify MNIST handwritten digits.
- Designing and training a CNN to classify images from the CIFAR10 dataset.
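Core deep learning concepts encountered in this part:
- SGD optimizer: GD is gradient descent; SGD is stochastic gradient descent. Compared with GD, SGD has two advantages: (1) it does not compute the gradient over every training image; it updates the network once per mini-batch of images, which greatly speeds up training. (2) Its noisy, "zigzag" path makes it naturally better at escaping local optima, so the final accuracy is often higher than with full-batch GD.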
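- Sobel operator: a discrete differentiation operator that combines Gaussian smoothing with differentiation. It is mainly used to compute the approximate gradient of an image at a point in the horizontal/vertical direction. If the gradient magnitude exceeds a threshold, the point is treated as an edge point (a place where pixel values change significantly).
  The approximate gradients of an image $I$ are obtained by filtering it with the two kernels:
  $$G_x = K_x * I, \qquad G_y = K_y * I, \qquad G = \sqrt{G_x^2 + G_y^2}$$
  The sobel x and sobel y kernels usually take the following values:
  $$K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad K_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$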
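- Cross-entropy loss:
  Binary cross-entropy loss ($y$ is the label, $\hat{y}$ is the predicted probability of the positive class):
  $$L(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$$
  During training, the cost function sums the loss over $m$ samples and divides by $m$:
  $$J = \frac{1}{m} \sum_{j=1}^{m} L\left(\hat{y}^{(j)}, y^{(j)}\right)$$
  Multi-class cross-entropy loss:
  $$L = -\sum_{i=1}^{K} y_i \log p_i$$
  - $K$ is the number of classes.
  - $y$ is the label: $y_i = 1$ if the class is $i$, otherwise $y_i = 0$.
  - $p$ is the output of the neural network, i.e. $p_i$ is the probability of class $i$. This output is computed with softmax.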
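The multi-class loss above can be checked numerically. The following is a minimal NumPy sketch, not code from the course: the function names `softmax` and `cross_entropy` and the example scores are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy over m samples.

    logits: (m, K) raw network outputs; labels: (m,) integer class indices.
    """
    p = softmax(logits)
    m = logits.shape[0]
    # y is one-hot, so -sum_i y_i * log(p_i) reduces to -log(p[true class])
    per_sample = -np.log(p[np.arange(m), labels])
    return per_sample.mean()

# Hypothetical scores for m = 2 samples and K = 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
loss = cross_entropy(logits, labels)
print(loss)
```

A sample whose scores are all equal gets probability 1/K for every class, so its loss is log K regardless of the label, which is a handy sanity check for an untrained classifier.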
Contents
1 Visualizing convolutional neural networks
1.1 Custom filters
1.2 Visualizing a convolutional layer
1.3 Visualizing a pooling layer
1.3.1 Import the image
1.3.2 Define and visualize the filters
1.3.3 Define convolutional and pooling layers
1.3.4 Visualize the output of each filter
1.3.5 Visualize the output of the pooling layer
2 Designing and training a CNN to classify MNIST handwritten digits
2.1 Load and visualize the data
2.1.1 Visualize a batch of training images
2.1.2 View a single image in more detail
2.2 Define the network architecture
2.3 Specify the loss function and optimizer
2.4 Train the network
2.5 Test the trained network
2.6 Visualize predictions on the test set
3 Designing and training a CNN to classify images from the CIFAR10 dataset
3.1 Test for CUDA
3.2 Load the data
3.3 Visualize a batch of training data
3.4 View an image in more detail
3.5 Define the network architecture
3.6 Specify the loss function and optimizer
3.7 Train the network
3.8 Load the model
3.9 Test the trained model
3.10 Question: what are your model's weaknesses and how could it be improved?
3.11 Visualize predictions on the test set
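1 Visualizing convolutional neural networks
1.1 Custom filters
Import resources and display the image:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import numpy as np
%matplotlib inline

# Read in the image
image = mpimg.imread('data/curved_lane.jpg')
plt.imshow(image)
Convert the image to grayscale:
# Convert to grayscale for filtering
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
plt.imshow(gray, cmap='gray')
TODO: Create a custom kernel
Below, you are given a common edge-detection filter: the Sobel operator.
The Sobel filter is commonly used for edge detection and for finding intensity patterns in an image. Applying a Sobel filter to an image is a way of taking an approximation of the image's derivative in the x or y direction separately. The two operators are the sobel_x and sobel_y kernels defined in the code that follows.
It is up to you to create a sobel x operator and apply it to the given image.
As a challenge, see whether you can apply a sequence of filters to the image: first blur it (by averaging pixels), then detect edges.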
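One way to approach the blur-then-detect challenge is sketched below in plain NumPy. This is an illustration, not the course solution: the helper `filter2d`, the synthetic test image, and the threshold of 100 are all assumptions. Unlike `cv2.filter2D`, which reflects the image at its borders by default, this helper pads with zeros, so values near the border differ.

```python
import numpy as np

def filter2d(image, kernel):
    """Correlate a 2-D image with a kernel (no kernel flip), zero-padded
    so that the output has the same shape as the input."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=np.float64)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

# 3x3 averaging (box blur) kernel and the Sobel x kernel
blur = np.ones((3, 3)) / 9.0
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Hypothetical test image: dark left half, bright right half
image = np.zeros((8, 8))
image[:, 4:] = 255.0

blurred = filter2d(image, blur)        # step 1: average neighbouring pixels
edges_x = filter2d(blurred, sobel_x)   # step 2: horizontal gradient
edge_mask = np.abs(edges_x) > 100      # threshold chosen for illustration
print(edge_mask.astype(int))
```

Working in floating point keeps negative gradients, which an 8-bit output would clip to zero; taking the absolute value before thresholding treats dark-to-bright and bright-to-dark edges alike.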
# Create a custom kernel
# 3x3 array for edge detection
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

## TODO: Create and apply a Sobel x operator
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Filter the image using filter2D, which has inputs: (grayscale image, bit-depth, kernel)
filtered_image_x = cv2.filter2D(gray, -1, sobel_x)
filtered_image_y = cv2.filter2D(gray, -1, sobel_y)

# figure size is given in inches (1400 x 1400 pixels at 100 dpi)
plt.figure(figsize=(14, 14))
# one row, two columns; the arguments are plt.subplot(rows, columns, index)
plt.subplot(1, 2, 1)
plt.title('sobel x')
plt.imshow(filtered_image_x, cmap='gray')
plt.subplot(1, 2, 2)
plt.title('sobel y')
plt.imshow(filtered_image_y, cmap='gray')
plt.show()
Result:
Test out other filters!
You are encouraged to create other kinds of filters and apply them to see what happens! As an optional exercise, try the following:
- Create a filter with decimal-valued weights.
- Create a 5x5 filter.
- Apply your filters to the other images in the images directory.
image = mpimg.imread('data/bridge_trees_example.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])
# same pattern with decimal-valued weights
sobel_y_2 = np.array([[-1.5, -2.5, -1.5],
                      [ 0,    0,    0],
                      [ 1.5,  2.5,  1.5]])
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
# 5x5 variant of the x-direction filter
sobel_x_5x5 = np.array([[-1, 0, 0, 0, 1],
                        [-1, 0, 0, 0, 1],
                        [-2, 0, 0, 0, 2],
                        [-1, 0, 0, 0, 1],
                        [-1, 0, 0, 0, 1]])

# Filter the image using filter2D, which has inputs: (grayscale image, bit-depth, kernel)
filtered_image_y = cv2.filter2D(gray, -1, sobel_y)
filtered_image_y_2 = cv2.filter2D(gray, -1, sobel_y_2)
filtered_image_x = cv2.filter2D(gray, -1, sobel_x)
filtered_image_x_5x5 = cv2.filter2D(gray, -1, sobel_x_5x5)

# figure size is given in inches (1400 x 1400 pixels at 100 dpi)
plt.figure(figsize=(14, 14))
plt.subplot(3, 2, 1)
plt.title('image')
plt.imshow(image)
plt.subplot(3, 2, 2)
plt.title('gray')
plt.imshow(gray, cmap='gray')
plt.subplot(3, 2, 3)
plt.title('sobel y')
plt.imshow(filtered_image_y, cmap='gray')
plt.subplot(3, 2, 4)
plt.title('sobel y decimal')
plt.imshow(filtered_image_y_2, cmap='gray')
plt.subplot(3, 2, 5)
plt.title('sobel x')
plt.imshow(filtered_image_x, cmap='gray')
plt.subplot(3, 2, 6)
plt.title('sobel x 5*5')
plt.imshow(filtered_image_x_5x5, cmap='gray')
plt.show()
Result:
1.2 Visualizing a convolutional layer
In this notebook, we visualize four filtered outputs (also known as activation maps) of a convolutional layer.
In this example, we define four filters and apply them to an input image by initializing the weights of a convolutional layer with them; a trained CNN would learn the values of these weights instead.
Import the image:
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# TODO: Feel free to try out your own images here by changing img_path
# to a file path to another image on your computer!
img_path = 'data/udacity_sdc.png'

# load color image
bgr_img = cv2.imread(img_path)
# convert to grayscale
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)

# normalize, rescale entries to lie in [0,1]
gray_img = gray_img.astype("float32")/255

# plot image
plt.imshow(gray_img, cmap='gray')
plt.show()
Define and visualize the filters:
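The visualization code in this section reads from an array named `filters`, but its definition is not shown here. The following is a minimal sketch of one plausible definition, assuming four 4x4 filters derived from a single vertical-edge pattern. The specific values and the names `filter_vals` and `filter_1` to `filter_4` are assumptions, not taken from this text.

```python
import numpy as np

# Assumed base pattern: a 4x4 vertical-edge detector (dark left, bright right)
filter_vals = np.array([[-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1],
                        [-1, -1, 1, 1]])

# Derive three more filters by negating and transposing the base pattern
filter_1 = filter_vals        # responds to dark-to-bright vertical edges
filter_2 = -filter_1          # responds to bright-to-dark vertical edges
filter_3 = filter_1.T         # responds to dark-to-bright horizontal edges
filter_4 = -filter_3          # responds to bright-to-dark horizontal edges
filters = np.array([filter_1, filter_2, filter_3, filter_4])

print('Filters shape:', filters.shape)
```

Stacking the filters into one (4, 4, 4) array matches how the later code uses them: `filters[i]` is a single 4x4 filter, and `torch.from_numpy(filters).unsqueeze(1)` yields the (out_channels, in_channels, height, width) layout that a convolutional layer expects.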
# visualize all four filters
fig = plt.figure(figsize=(10, 5))
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))
    width, height = filters[i].shape
    # write each weight on top of its cell
    for x in range(width):
        for y in range(height):
            ax.annotate(str(filters[i][x][y]), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if filters[i][x][y]<0 else 'black')
Define a convolutional layer
Initialize a single convolutional layer so that it contains all the filters you created. Note that you are not training this network; you are initializing the weights in a convolutional layer so that you can see what happens after a forward pass through this network!
Below, I define a class named Net, which has one convolutional layer that can hold four 4x4 grayscale filters.
import torch
import torch.nn as nn
import torch.nn.functional as F

# define a neural network with a single convolutional layer with four filters
class Net(nn.Module):

    def __init__(self, weight):
        super(Net, self).__init__()
        # initializes the weights of the convolutional layer to be the weights of the 4 defined filters
        k_height, k_width = weight.shape[2:]
        # assumes there are 4 grayscale filters
        self.conv = nn.Conv2d(1, 4, kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = torch.nn.Parameter(weight)

    def forward(self, x):
        # calculates the output of a convolutional layer
        # pre- and post-activation
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)

        # returns both layers
        return conv_x, activated_x

# instantiate the model and set the weights
weight = torch.from_numpy(filters).unsqueeze(1).type(torch.FloatTensor)
model = Net(weight)

# print out the layer in the network
print(model)
Visualize the output of each filter
First, we define a helper function, viz_layer, that takes a specific layer and a number of filters (optional argument) and displays the output of that layer once an image has been passed through.
# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        ax = fig.add_subplot(1, n_filters, i+1, xticks=[], yticks=[])
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))
Let's look at the output of the convolutional layer before and after the ReLU activation function is applied.
# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get the convolutional layer (pre and post activation)
conv_layer, activated_layer = model(gray_img_tensor)

# visualize the output of a conv layer
viz_layer(conv_layer)
Result:
ReLU activation function
In this model, we use an activation function to scale the output of the convolutional layer. We chose a ReLU function for this, which simply turns all negative pixel values into 0 (black). For an input pixel value x, the function is f(x) = max(0, x).
# after a ReLu is applied
# visualize the output of an activated conv layer
viz_layer(activated_layer)
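Result:
1.3 Visualizing a pooling layer
In this notebook, we add a maxpooling layer to a CNN and visualize its output.
Convolutional layers with activation functions, pooling layers, and linear layers (which produce the desired output size) make up the basic layers of a CNN.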
1.3.1 Import the image
1.3.2 Define and visualize the filters
1.3.3 Define convolutional and pooling layers
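In the next cell, we initialize a convolutional layer so that it contains all the created filters. Then we add a maxpooling layer with a kernel size of (2x2), so you can see that the image resolution has been reduced after this step!
A maxpooling layer reduces the size of the input and keeps only the most active pixel values. For example, a 2x2 pooling kernel with a stride of 2, applied to a small patch of grayscale pixel values, halves the width and height of the patch: only the maximum pixel value in each 2x2 window is kept in the new, pooled output.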
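The later cells in this section unpack three outputs from the model (`conv_layer, activated_layer, pooled_layer`), but the class definition for this step is not shown. A minimal sketch of what such a network might look like follows, assuming it extends the earlier single-layer `Net` with an `nn.MaxPool2d(2, 2)` layer. The random weights and the 11x11 input are hypothetical stand-ins for the real filters and image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """One convolutional layer with fixed weights, followed by 2x2 max pooling."""

    def __init__(self, weight):
        super().__init__()
        k_height, k_width = weight.shape[2:]
        # one input channel (grayscale); one output channel per supplied filter
        self.conv = nn.Conv2d(1, weight.shape[0],
                              kernel_size=(k_height, k_width), bias=False)
        self.conv.weight = nn.Parameter(weight)
        # 2x2 window with stride 2 halves the height and width
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        conv_x = self.conv(x)
        activated_x = F.relu(conv_x)
        pooled_x = self.pool(activated_x)
        # return all three stages so each can be visualized
        return conv_x, activated_x, pooled_x

# Hypothetical weights: four random 4x4 single-channel filters
weight = torch.randn(4, 1, 4, 4)
model = Net(weight)

# Hypothetical input: one 11x11 grayscale image (conv gives 8x8, pooling gives 4x4)
x = torch.rand(1, 1, 11, 11)
conv_x, activated_x, pooled_x = model(x)
print(conv_x.shape, activated_x.shape, pooled_x.shape)
```

Returning the intermediate tensors from `forward` is a convenience for visualization only; a network meant for training would normally return just its final output.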
1.3.4 Visualize the output of each filter
First, we define a helper function, viz_layer, that takes a specific layer and a number of filters (optional argument) and displays the output of that layer once an image has been passed through.
# helper function for visualizing the output of a given layer
# default number of filters is 4
def viz_layer(layer, n_filters=4):
    fig = plt.figure(figsize=(20, 20))

    for i in range(n_filters):
        # axis ticks are kept here so the image size can be read off the plot
        ax = fig.add_subplot(1, n_filters, i+1)
        # grab layer outputs
        ax.imshow(np.squeeze(layer[0,i].data.numpy()), cmap='gray')
        ax.set_title('Output %s' % str(i+1))
Let's look at the output of the convolutional layer after the ReLU activation function is applied:
# plot original image
plt.imshow(gray_img, cmap='gray')

# visualize all filters
fig = plt.figure(figsize=(12, 6))
fig.subplots_adjust(left=0, right=1.5, bottom=0.8, top=1, hspace=0.05, wspace=0.05)
for i in range(4):
    ax = fig.add_subplot(1, 4, i+1, xticks=[], yticks=[])
    ax.imshow(filters[i], cmap='gray')
    ax.set_title('Filter %s' % str(i+1))

# convert the image into an input Tensor
gray_img_tensor = torch.from_numpy(gray_img).unsqueeze(0).unsqueeze(1)

# get all the layers
conv_layer, activated_layer, pooled_layer = model(gray_img_tensor)

# visualize the output of the activated conv layer
viz_layer(activated_layer)
Result:
1.3.5 Visualize the output of the pooling layer
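Next, look at the output of the pooling layer. The pooling layer takes the feature maps shown above as input and reduces their dimensions by some pooling factor: it builds a new, smaller image from only the maximum (brightest) values in each kernel area.
Look closely at the values on the x and y axes to see how the image size changes.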
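The size change can also be seen on a small array. This is a minimal NumPy sketch of 2x2 max pooling with stride 2; the function name `max_pool_2x2` and the pixel values are made up for illustration.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # group pixels into (h/2, 2, w/2, 2) blocks, then take the max of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 patch of grayscale values
patch = np.array([[1, 9, 2, 4],
                  [3, 5, 8, 0],
                  [7, 2, 6, 1],
                  [0, 4, 3, 5]])

pooled = max_pool_2x2(patch)
print(patch.shape, '->', pooled.shape)
print(pooled)
```

Each output value is the largest of the four pixels in its window, so the 4x4 patch shrinks to 2x2 while the strongest responses survive.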
2 Designing and training a CNN to classify MNIST handwritten digits
In this notebook, we train an MLP (Multi-Layer Perceptron) to classify images from the MNIST database of handwritten digits.
The process is broken down into the following steps:
- Load and visualize the data
- Define a neural network
- Train the model
- Evaluate the performance of our trained model on a test dataset!
Before we begin, we have to import the libraries needed for working with data and PyTorch.
# import libraries
import torch
import numpy as np
2.1 Load and visualize the data
Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the batch size if you want to load more data at a time.
This cell creates a DataLoader for each dataset.
# The MNIST datasets are hosted on yann.lecun.com that has moved under CloudFlare protection
# Run this script to enable the datasets download
# Reference: https://github.com/pytorch/vision/issues/1938
from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)
2.1.1 Visualize a batch of training images
The first step in a classification task is to take a look at the data, make sure it is loaded correctly, and then make any initial observations about patterns in that data.
2.1.2 View a single image in more detail
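2.2 Define the network architecture
The architecture takes a 784-dimensional tensor as input and outputs a tensor of length 10 (our number of classes) that indicates the class scores for an input image. This particular example uses 2 hidden layers and dropout to avoid overfitting.
import torch.nn as nn
import torch.nn.functional as F

## TODO: Define the NN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # linear layer (784 -> 256 hidden nodes)
        self.fc1 = nn.Linear(28 * 28, 256)
        # linear layer (256 -> 64 hidden nodes)
        self.fc2 = nn.Linear(256, 64)
        # linear layer (64 -> 10 class scores)
        self.fc3 = nn.Linear(64, 10)
        # dropout layer (p=0.2)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # add hidden layers, with relu activation function and dropout
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        # output layer: log-probabilities over the 10 classes
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

# initialize the NN
model = Net()
print(model)
2.3 Specify the loss function and optimizer
Cross-entropy loss is recommended for classification. If you look at the documentation, you can see that PyTorch's cross-entropy function applies a softmax function to the output layer and then calculates the log loss.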
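That two-step behaviour can be checked directly. The sketch below is an illustration with made-up scores and labels: it compares `nn.CrossEntropyLoss` on raw scores against an explicit `F.log_softmax` followed by `F.nll_loss`, and then against `nn.CrossEntropyLoss` applied to log-probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])  # hypothetical labels

criterion = nn.CrossEntropyLoss()
loss_from_logits = criterion(logits, target)

# Same value, computed in two explicit steps
log_probs = F.log_softmax(logits, dim=1)
loss_two_step = F.nll_loss(log_probs, target)

# Feeding log-probabilities into CrossEntropyLoss gives the same value again,
# because log_softmax leaves already-normalized log-probabilities unchanged
loss_from_log_probs = criterion(log_probs, target)

print(loss_from_logits.item(), loss_two_step.item(), loss_from_log_probs.item())
```

This matters for the network defined in section 2.2, whose forward pass already ends in `log_softmax`: pairing it with `nn.CrossEntropyLoss` repeats the normalization but does not change the loss value. Pairing that output with `nn.NLLLoss`, or returning raw scores and keeping `nn.CrossEntropyLoss`, would avoid the redundant step.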
## TODO: Specify loss and optimization functions
from torch import nn, optim

# specify loss function
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)
2.4 Train the network
The steps for training/learning from a batch of data are described in the comments below:
- 1. Clear the gradients of all optimized variables
- 2. Forward pass: compute predicted outputs by passing inputs to the model
- 3. Calculate the loss
- 4. Backward pass: compute the gradient of the loss with respect to the model parameters
- 5. Perform a single optimization step (parameter update)
- 6. Update the average training loss
The following loop trains for 30 epochs; feel free to change this number. For now, we suggest somewhere between 20 and 50 epochs. As you train, watch how the training loss decreases over time. We want it to decrease while also avoiding overfitting the training data.
# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model.train()  # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*data.size(0)

    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.dataset)

    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch+1, train_loss))
Training results:
- Epoch: 1    Training Loss: 0.950629
- Epoch: 2    Training Loss: 0.378016
- Epoch: 3    Training Loss: 0.292131
- Epoch: 4    Training Loss: 0.237494
- Epoch: 5    Training Loss: 0.203416
- Epoch: 6    Training Loss: 0.178869
- Epoch: 7    Training Loss: 0.157555
- Epoch: 8    Training Loss: 0.143985
- Epoch: 9    Training Loss: 0.132015
- Epoch: 10    Training Loss: 0.122434
- Epoch: 11    Training Loss: 0.113976
- Epoch: 12    Training Loss: 0.105239
- Epoch: 13    Training Loss: 0.098839
- Epoch: 14    Training Loss: 0.093791
- Epoch: 15    Training Loss: 0.088727
- Epoch: 16    Training Loss: 0.081909
- Epoch: 17    Training Loss: 0.079282
- Epoch: 18    Training Loss: 0.074924
- Epoch: 19    Training Loss: 0.071149
- Epoch: 20    Training Loss: 0.068345
- Epoch: 21    Training Loss: 0.065399
- Epoch: 22    Training Loss: 0.062431
- Epoch: 23    Training Loss: 0.060230
- Epoch: 24    Training Loss: 0.056332
- Epoch: 25    Training Loss: 0.055859
- Epoch: 26    Training Loss: 0.053873
- Epoch: 27    Training Loss: 0.050490
- Epoch: 28    Training Loss: 0.049184
- Epoch: 29    Training Loss: 0.046799
- Epoch: 30    Training Loss: 0.047051
2.5 Test the trained network
Finally, we test our best model on previously unseen test data and evaluate its performance. Testing on unseen data is a good way to check that our model generalizes well. It may also be useful to be granular in this analysis and look at how the model performs on each class, as well as at its overall loss and accuracy.
model.eval() sets all the layers in the model to evaluation mode. This affects layers like dropout, which turn off nodes with some probability during training but are disabled during evaluation.
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()  # prep model for *evaluation*

for data, target in test_loader:
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item()*data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct = np.squeeze(pred.eq(target.data.view_as(pred)))
    # calculate test accuracy for each object class
    # (loop over the actual batch length so a smaller final batch is handled)
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            str(i), 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
    else:
        # no `classes` list is defined for MNIST, so print the digit itself
        print('Test Accuracy of %5s: N/A (no training examples)' % (str(i)))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
2.6 Visualize predictions on the test set
This cell displays test images and their labels in the format: predicted (ground-truth). The text is green for correctly classified examples and red for incorrect predictions.
import matplotlib.pyplot as plt

# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)  # DataLoader iterators no longer provide .next()

# get sample outputs
output = model(images)
# convert output probabilities to predicted class
_, preds = torch.max(output, 1)
# prep images for display
images = images.numpy()

# plot the images in the batch, along with predicted and true labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # integer division: add_subplot expects an integer number of columns
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title("{} ({})".format(str(preds[idx].item()), str(labels[idx].item())),
                 color=("green" if preds[idx]==labels[idx] else "red"))
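3 Designing and training a CNN to classify images from the CIFAR10 dataset
In this notebook, we train a CNN to classify images from the CIFAR-10 database.
The images in this database are small color images that fall into one of ten classes.
3.1 Test for CUDA
Since these are larger (32x32x3) images, it may prove useful to speed up training by using a GPU. CUDA is a parallel computing platform, and CUDA tensors are the same as typical tensors except that they use the GPU for computation.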
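The later cells rely on a flag named `train_on_gpu`, whose definition is not shown in this section. A minimal sketch of how such a flag can be set with `torch.cuda.is_available()` follows; the `device` variable and the small tensor `x` are added here for illustration only.

```python
import torch

# True when PyTorch can see a CUDA-capable GPU
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

# A tensor lives on the CPU unless it is moved explicitly
device = torch.device('cuda' if train_on_gpu else 'cpu')
x = torch.ones(2, 3).to(device)
print(x.device)
```

The model and every batch of data must end up on the same device, which is why the training and test loops below call `.cuda()` on both when the flag is set.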
3.2 Load the data
Downloading may take a minute. We load the training and test data, split the training data into a training set and a validation set, and then create DataLoaders for each of these sets.
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# percentage of training set to use as validation
valid_size = 0.2

# convert data to a normalized torch.FloatTensor
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

# choose the training and test datasets
train_data = datasets.CIFAR10('data', train=True,
                              download=True, transform=transform)
test_data = datasets.CIFAR10('data', train=False,
                             download=True, transform=transform)

# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

# specify the image classes
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
3.3 Visualize a batch of training data
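The code for this step is not shown here, and section 3.11 calls an `imshow` helper that is never defined in this text. Because the loading code normalizes each channel with mean 0.5 and standard deviation 0.5, an image has to be mapped back to the [0, 1] range before it is displayed. The following is a sketch of that inverse step, assuming a NumPy array in (channels, height, width) order; the function name `unnormalize` and the random test image are hypothetical.

```python
import numpy as np

def unnormalize(img):
    """Map a (3, H, W) image from [-1, 1] back to [0, 1] and move channels last.

    Normalize with mean 0.5 and std 0.5 computes (x - 0.5) / 0.5,
    so the inverse is x * 0.5 + 0.5.
    """
    img = np.asarray(img) / 2 + 0.5
    return np.transpose(img, (1, 2, 0))  # (C, H, W) -> (H, W, C) for plotting

# Hypothetical normalized image standing in for one CIFAR-10 sample
fake = np.random.uniform(-1.0, 1.0, size=(3, 32, 32)).astype(np.float32)
restored = unnormalize(fake)
print(restored.shape, restored.min(), restored.max())
```

With matplotlib imported as `plt`, the restored array can be passed straight to `plt.imshow`, which expects color images with the channel axis last.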
3.4 View an image in more detail
Here, we look at the normalized red, green, and blue (RGB) color channels as three separate grayscale intensity images.
rgb_img = np.squeeze(images[6])  # the red bird at index 6 of the batch shown above
channels = ['red channel', 'green channel', 'blue channel']

fig = plt.figure(figsize = (36, 36))
for idx in np.arange(rgb_img.shape[0]):
    ax = fig.add_subplot(1, 3, idx + 1)
    img = rgb_img[idx]
    ax.imshow(img, cmap='gray')
    ax.set_title(channels[idx])
    width, height = img.shape
    thresh = img.max()/2.5
    # write each pixel value on top of its cell
    for x in range(width):
        for y in range(height):
            val = round(img[x][y],2) if img[x][y] !=0 else 0
            ax.annotate(str(val), xy=(y,x),
                        horizontalalignment='center',
                        verticalalignment='center', size=8,
                        color='white' if img[x][y]<thresh else 'black')
Result (the image can be zoomed in for a closer look):
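3.5 Define the network architecture
This time, you will define a CNN architecture made up of:
- Convolutional layers, which can be thought of as a stack of filtered images.
- Maxpooling layers, which reduce the x-y size of an input and keep only the most active pixels from the previous layer.
- The usual linear + dropout layers, to avoid overfitting and to produce a 10-dimensional output.
The original notebook illustrates a network with two convolutional layers and provides starter code with one convolutional layer and one maxpooling layer; the model defined in the code below uses three convolutional layers.
TODO: Define a model with multiple convolutional layers, and define the feedforward network behavior.
The more convolutional layers you include, the more complex the patterns in color and shape a model can detect. It is suggested that your final model include 2 or 3 convolutional layers as well as linear layers + dropout to avoid overfitting.
It is good practice to look at existing research and implementations of related models as a starting point for defining your own model. You may find it useful to look at this PyTorch classification example or this more complex Keras example to help decide on a final structure.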
https://github.com/pytorch/tutorials/blob/master/beginner_source/blitz/cifar10_tutorial.py
https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py
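Output size of a convolutional layer:
To compute the output size of a given convolutional layer, we can perform the following calculation (taken from Stanford's cs231n course):
- We can compute the spatial size of the output volume as a function of the input volume size (W), the kernel/filter size (F), the stride with which the filters are applied (S), and the amount of zero padding used on the border (P). The formula for the output size is (W−F+2P)/S + 1.
For example, for a 7x7 input and a 3x3 filter with stride 1 and pad 0, we would get a 5x5 output. With stride 2, we would get a 3x3 output.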
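The formula is easy to turn into a small helper for checking layer sizes before writing a network. This is a sketch under one assumption: it follows the cs231n convention of rejecting settings where the filter does not tile the input evenly, whereas frameworks such as PyTorch round the division down instead. The function name `conv_output_size` is made up for illustration.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = w - f + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("filter, padding and stride do not tile the input evenly")
    return span // s + 1

# The two examples from the text: 7x7 input, 3x3 filter, no padding
size_stride_1 = conv_output_size(7, 3, p=0, s=1)
size_stride_2 = conv_output_size(7, 3, p=0, s=2)

# A 32x32 input with a 3x3 filter and padding 1 keeps its size
size_same = conv_output_size(32, 3, p=1, s=1)

print(size_stride_1, size_stride_2, size_same)
```

The last case is the setting used by the network in this section: each 3x3 convolution with `padding=1` preserves the spatial size, so only the 2x2 pooling layers shrink the image, from 32 to 16 to 8 to 4.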
import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (3 -> 16 channels)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (16 -> 32 channels)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (32 -> 64 channels)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 200)
        self.fc1 = nn.Linear(64 * 4 * 4, 200)
        # linear layer (200 -> 10)
        self.fc2 = nn.Linear(200, 10)
        # dropout layer (p=0.2)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))  # output shape: 16 x 16 x 16
        x = self.pool(F.relu(self.conv2(x)))  # output shape: 32 x 8 x 8
        x = self.pool(F.relu(self.conv3(x)))  # output shape: 64 x 4 x 4
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))  # output size: 200
        # add dropout layer
        x = self.dropout(x)
        x = self.fc2(x)  # output size: 10
        return x

# create a complete CNN
model = Net()
print(model)

# move tensors to GPU if CUDA is available
if train_on_gpu:
    model.cuda()
3.6 Specify the loss function and optimizer
import torch.optim as optim

# specify loss function
criterion = nn.CrossEntropyLoss()

# specify optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
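To make the `momentum=0.9` setting concrete, the sketch below applies the same update by hand and compares it with `torch.optim.SGD` on a toy problem. The parameter values and the sum-of-squares loss are made up for illustration; the rule shown (velocity = momentum * velocity + gradient, then parameter = parameter - lr * velocity) is PyTorch's form with default dampening and without Nesterov momentum.

```python
import torch

lr, momentum = 0.001, 0.9

# One hypothetical parameter vector, updated by the library optimizer
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)

# The same parameter, updated by hand
w_manual = w.detach().clone()
velocity = torch.zeros_like(w_manual)

for _ in range(3):
    # toy loss: sum of squares, whose gradient is 2 * w
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

    grad = 2 * w_manual
    velocity = momentum * velocity + grad   # accumulate a running direction
    w_manual = w_manual - lr * velocity     # step along it

print(w.detach(), w_manual)
```

Because the velocity carries over part of the previous steps, consecutive gradients that point the same way build up speed, while gradients that keep changing sign partly cancel.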
3.7 Train the network
Remember to look at how the training and validation loss decrease over time; if the validation loss ever increases, it indicates possible overfitting.
# number of epochs to train the model
n_epochs = 8  # you may increase this number to train a final model

valid_loss_min = np.inf  # track change in validation loss

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()
    for data, target in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()
    for data, target in valid_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update average validation loss
        valid_loss += loss.item()*data.size(0)

    # calculate average losses
    # (divide by the sampler sizes: both loaders wrap the full train_data,
    # so len(loader.dataset) would count all 50,000 images for each of them)
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)

    # print training/validation statistics
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch, train_loss, valid_loss))

    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min, valid_loss))
        torch.save(model.state_dict(), 'model_cifar.pt')
        valid_loss_min = valid_loss
Result:
3.8 Load the model
model.load_state_dict(torch.load('model_cifar.pt'))
3.9 Test the trained model
Test your trained model on previously unseen data! A "good" result is a CNN that reaches around 70% accuracy on these test images (or more, try your best!).
# track test loss
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model.eval()
# iterate over test data
for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item()*data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    # calculate test accuracy for each object class
    # (loop over the actual batch length so a smaller final batch is handled)
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# average test loss
test_loss = test_loss/len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
            classes[i], 100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]), np.sum(class_total[i])))
    else:
        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))
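Result:
3.10 Question: what are your model's weaknesses and how could it be improved?
Answer:
- At the end of training the loss was still dropping quickly, so the number of training epochs was far from sufficient.
- Test results differ considerably between classes. Classes with complex, highly variable appearance (such as dogs, automobiles, and birds) are generally predicted worse. These classes have a larger intra-class distance than the others, which suggests either that the model has not trained long enough to learn them or that the architecture is not complex enough to represent them.
3.11 Visualize predictions on the test set
# obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)  # DataLoader iterators no longer provide .next()

# move model inputs to cuda, if GPU available
if train_on_gpu:
    images = images.cuda()

# get sample outputs
output = model(images)
# convert output probabilities to predicted class
_, preds_tensor = torch.max(output, 1)
preds = np.squeeze(preds_tensor.numpy()) if not train_on_gpu else np.squeeze(preds_tensor.cpu().numpy())

# bring the images back to the CPU for plotting
if train_on_gpu:
    images = images.cpu()

# plot the images in the batch, along with predicted and true labels
# (imshow is the un-normalizing display helper from the original notebook;
# its definition is not shown in this text)
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    # integer division: add_subplot expects an integer number of columns
    ax = fig.add_subplot(2, 20//2, idx+1, xticks=[], yticks=[])
    imshow(images[idx] if not train_on_gpu else images[idx].cpu())
    ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                 color=("green" if preds[idx]==labels[idx].item() else "red"))
Result: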