(PyTorch Deep Learning Series) Avoiding Overfitting in PyTorch: Implementing Dropout - Study Notes

Implementing dropout in PyTorch to avoid overfitting
Consider a multilayer perceptron with a single hidden layer, 4 inputs, and 5 hidden units, where hidden unit $h_i$ ($i = 1, \ldots, 5$) is computed as

$$h_i = \phi\left(x_1 w_{1i} + x_2 w_{2i} + x_3 w_{3i} + x_4 w_{4i} + b_i\right)$$

Here $\phi$ is the activation function, $x_1, \ldots, x_4$ are the inputs, and hidden unit $i$ has weight parameters $w_{1i}, \ldots, w_{4i}$ and bias parameter $b_i$. When dropout is applied to this hidden layer, each of its hidden units may be dropped with a certain probability. Let the drop probability be $p$: with probability $p$ the unit $h_i$ is set to zero, and with probability $1-p$ it is divided by $1-p$ (stretched). The drop probability is a hyperparameter of dropout. Concretely, let the random variable $\xi_i$ be 0 with probability $p$ and 1 with probability $1-p$. With dropout, we compute a new hidden unit $h_i'$:

$$h_i' = \frac{\xi_i}{1-p} h_i$$

(In other words, $h_i'$ equals $\frac{h_i}{1-p}$ with probability $1-p$ and is zero with probability $p$.)

Since $E(\xi_i) = 1-p$, it follows that

$$E(h_i') = \frac{E(\xi_i)}{1-p} h_i = h_i$$

That is, dropout does not change the expected value of its input. Suppose we apply dropout to the hidden layer and, in one possible outcome, $h_2$ and $h_5$ are zeroed out. The output values then no longer depend on $h_2$ and $h_5$, and during backpropagation the gradients of the weights attached to these two hidden units are all zero. Because hidden units are dropped at random during training, i.e., any of $h_1, \ldots, h_5$ may be zeroed out, the output layer's computation cannot rely too heavily on any single one of $h_1, \ldots, h_5$. This acts as regularization during training and can be used to combat overfitting. At test time, we generally do not use dropout, so that the results are more deterministic.
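As a quick sanity check (this snippet is my own addition, not part of the original notes), the expectation property can be verified numerically using the same $\xi_i / (1-p)$ scaling:

import torch

torch.manual_seed(0)
p = 0.5                                  # drop probability
h = torch.full((1_000_000,), 2.0)        # pretend every hidden activation equals 2.0
xi = (torch.rand_like(h) >= p).float()   # xi = 1 with probability 1 - p, else 0
h_prime = xi / (1 - p) * h               # inverted-dropout scaling
print(h_prime.mean())                    # close to 2.0, matching E(h') = h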
Now let's implement dropout to avoid overfitting.

Define the dropout function:
%matplotlib inline
import sys          # used later when loading Fashion-MNIST
import torch
import torch.nn as nn
import torchvision  # used later when loading Fashion-MNIST
import numpy as np

def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # in this case every element is dropped
    if keep_prob == 0:
        return torch.zeros_like(X)
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob
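To see how the function behaves (a small check of my own, not from the original notes), run it on a small tensor with different drop probabilities:

X = torch.arange(16).view(2, 8)
print(dropout(X, 0))    # nothing is dropped; X is returned unchanged (as floats)
print(dropout(X, 0.5))  # about half the elements are zeroed, survivors are doubled
print(dropout(X, 1.0))  # every element is dropped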
Define the model parameters:

num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256

W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)
b1 = torch.zeros(num_hiddens1, requires_grad=True)
W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)
b2 = torch.zeros(num_hiddens2, requires_grad=True)
W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)
b3 = torch.zeros(num_outputs, requires_grad=True)

params = [W1, b1, W2, b2, W3, b3]

Define the model: chain the fully connected layers and ReLU activations together, and apply dropout to the output of each activation. Each layer gets its own drop probability; the usual advice is to use a smaller drop probability for layers closer to the input. In this experiment, the drop probability of the first hidden layer is set to 0.2 and that of the second hidden layer to 0.5. The is_training argument tells the model whether it is running in training or evaluation mode, and dropout is applied only in training mode.
drop_prob1, drop_prob2 = 0.2, 0.5

def net(X, is_training=True):
    X = X.view(-1, num_inputs)
    H1 = (torch.matmul(X, W1) + b1).relu()
    if is_training:  # only apply dropout while training the model
        H1 = dropout(H1, drop_prob1)  # dropout layer after the first fully connected layer
    H2 = (torch.matmul(H1, W2) + b2).relu()
    if is_training:
        H2 = dropout(H2, drop_prob2)  # dropout layer after the second fully connected layer
    return torch.matmul(H2, W3) + b3

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        if isinstance(net, torch.nn.Module):
            net.eval()  # evaluation mode, which disables dropout
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            net.train()  # switch back to training mode
        else:  # a custom model (a plain function)
            if 'is_training' in net.__code__.co_varnames:  # if it has an is_training argument
                # set is_training to False
                acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
            else:
                acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
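A quick way to confirm that the is_training flag behaves as intended (my own check, not in the original notes): with dropout disabled the forward pass is deterministic, while in training mode repeated calls give different results because the dropout masks are resampled.

X = torch.randn(2, num_inputs)              # a hypothetical random batch
out1 = net(X, is_training=False)
out2 = net(X, is_training=False)
print(torch.equal(out1, out2))              # True: evaluation mode has no randomness
print(torch.equal(net(X), net(X)))          # almost certainly False: different dropout masks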
Train and test the model:

num_epochs, lr, batch_size = 5, 100.0, 256
loss = torch.nn.CrossEntropyLoss()

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the Fashion-MNIST dataset and load it into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to speed up data loading
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "concise implementation of softmax regression" section
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_iter, test_iter = load_data_fashion_mnist(batch_size)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
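Note that train_ch3 calls an sgd helper that this post never defines. Below is a minimal sketch, assuming the plain minibatch SGD update used in earlier posts of this series; the gradient is divided by the batch size, which helps explain why a learning rate as large as 100.0 is used above.

def sgd(params, lr, batch_size):
    # minibatch stochastic gradient descent: update each parameter in place
    for param in params:
        param.data -= lr * param.grad / batch_size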
Summary

During training, dropout zeroes each hidden unit with probability p and divides the surviving units by 1-p, so the expected value of every activation is unchanged. Because the output layer cannot rely on any particular hidden unit, this acts as regularization and helps combat overfitting; at test time dropout is turned off to obtain deterministic results.