(PyTorch Deep Learning Series) Avoiding Overfitting in PyTorch: Implementing Dropout - Study Notes

Implementing dropout in PyTorch to avoid overfitting
Consider a multilayer perceptron with a single hidden layer, 4 inputs, and 5 hidden units, where hidden unit $h_i$ ($i = 1, \ldots, 5$) is computed as

$$h_i = \phi\left(x_1 w_{1i} + x_2 w_{2i} + x_3 w_{3i} + x_4 w_{4i} + b_i\right)$$

Here $\phi$ is the activation function, $x_1, \ldots, x_4$ are the inputs, and hidden unit $i$ has weight parameters $w_{1i}, \ldots, w_{4i}$ and bias parameter $b_i$. When dropout is applied to this hidden layer, each of its hidden units may be dropped with a certain probability. Let the drop probability be $p$: with probability $p$ the unit $h_i$ is set to zero, and with probability $1-p$ it is divided by $1-p$ (stretched). The drop probability is a hyperparameter of dropout. Concretely, let the random variable $\xi_i$ be 0 with probability $p$ and 1 with probability $1-p$. With dropout, we compute a new hidden unit $h_i'$:

$$h_i' = \frac{\xi_i}{1-p} h_i$$

(In other words, $h_i'$ equals $\frac{h_i}{1-p}$ with probability $1-p$ and is zero with probability $p$.)

Since $E(\xi_i) = 1-p$, it follows that

$$E(h_i') = \frac{E(\xi_i)}{1-p} h_i = h_i$$

That is, dropout does not change the expected value of its input. Suppose we apply dropout to the hidden layer and, in one possible outcome, $h_2$ and $h_5$ are zeroed out. The output values then no longer depend on $h_2$ and $h_5$, and during backpropagation the gradients of the weights attached to these two hidden units are all zero. Because hidden units are dropped at random during training, i.e., any of $h_1, \ldots, h_5$ may be zeroed out, the output layer's computation cannot rely too heavily on any single one of $h_1, \ldots, h_5$. This acts as regularization during training and can be used to combat overfitting. At test time, we generally do not use dropout, so that the results are more deterministic.
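As a quick sanity check (this snippet is my own addition, not part of the original notes), the expectation property can be verified numerically using the same $\xi_i / (1-p)$ scaling:

import torch

torch.manual_seed(0)
p = 0.5                                  # drop probability
h = torch.full((1_000_000,), 2.0)        # pretend every hidden activation equals 2.0
xi = (torch.rand_like(h) >= p).float()   # xi = 1 with probability 1 - p, else 0
h_prime = xi / (1 - p) * h               # inverted-dropout scaling
print(h_prime.mean())                    # close to 2.0, matching E(h') = h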
Now let's implement dropout to avoid overfitting.

Define the dropout function:
%matplotlib inline
import sys          # used later when loading Fashion-MNIST
import torch
import torch.nn as nn
import torchvision  # used later when loading Fashion-MNIST
import numpy as np

def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # in this case every element is dropped
    if keep_prob == 0:
        return torch.zeros_like(X)
    mask = (torch.rand(X.shape) < keep_prob).float()
    return mask * X / keep_prob
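To see how the function behaves (a small check of my own, not from the original notes), run it on a small tensor with different drop probabilities:

X = torch.arange(16).view(2, 8)
print(dropout(X, 0))    # nothing is dropped; X is returned unchanged (as floats)
print(dropout(X, 0.5))  # about half the elements are zeroed, survivors are doubled
print(dropout(X, 1.0))  # every element is dropped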
Define the model parameters:

num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256

W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)
b1 = torch.zeros(num_hiddens1, requires_grad=True)
W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)
b2 = torch.zeros(num_hiddens2, requires_grad=True)
W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)
b3 = torch.zeros(num_outputs, requires_grad=True)

params = [W1, b1, W2, b2, W3, b3]

Define the model: chain the fully connected layers and ReLU activations together, and apply dropout to the output of each activation. Each layer gets its own drop probability; the usual advice is to use a smaller drop probability for layers closer to the input. In this experiment, the drop probability of the first hidden layer is set to 0.2 and that of the second hidden layer to 0.5. The is_training argument tells the model whether it is running in training or evaluation mode, and dropout is applied only in training mode.
drop_prob1, drop_prob2 = 0.2, 0.5

def net(X, is_training=True):
    X = X.view(-1, num_inputs)
    H1 = (torch.matmul(X, W1) + b1).relu()
    if is_training:  # only apply dropout while training the model
        H1 = dropout(H1, drop_prob1)  # dropout layer after the first fully connected layer
    H2 = (torch.matmul(H1, W2) + b2).relu()
    if is_training:
        H2 = dropout(H2, drop_prob2)  # dropout layer after the second fully connected layer
    return torch.matmul(H2, W3) + b3

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        if isinstance(net, torch.nn.Module):
            net.eval()  # evaluation mode, which disables dropout
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            net.train()  # switch back to training mode
        else:  # a custom model (a plain function)
            if 'is_training' in net.__code__.co_varnames:  # if it has an is_training argument
                # set is_training to False
                acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
            else:
                acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
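A quick way to confirm that the is_training flag behaves as intended (my own check, not in the original notes): with dropout disabled the forward pass is deterministic, while in training mode repeated calls give different results because the dropout masks are resampled.

X = torch.randn(2, num_inputs)              # a hypothetical random batch
out1 = net(X, is_training=False)
out2 = net(X, is_training=False)
print(torch.equal(out1, out2))              # True: evaluation mode has no randomness
print(torch.equal(net(X), net(X)))          # almost certainly False: different dropout masks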
Train and test the model:

num_epochs, lr, batch_size = 5, 100.0, 256
loss = torch.nn.CrossEntropyLoss()

def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the Fashion-MNIST dataset and load it into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to speed up data loading
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "concise implementation of softmax regression" section
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_iter, test_iter = load_data_fashion_mnist(batch_size)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)
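Note that train_ch3 calls an sgd helper that this post never defines. Below is a minimal sketch, assuming the plain minibatch SGD update used in earlier posts of this series; the gradient is divided by the batch size, which helps explain why a learning rate as large as 100.0 is used above.

def sgd(params, lr, batch_size):
    # minibatch stochastic gradient descent: update each parameter in place
    for param in params:
        param.data -= lr * param.grad / batch_size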
Summary

During training, dropout zeroes each hidden unit with probability p and divides the surviving units by 1-p, so the expected value of every activation is unchanged. Because the output layer cannot rely on any particular hidden unit, this acts as regularization and helps combat overfitting; at test time dropout is turned off to obtain deterministic results.