Neural-Transfer with PyTorch

Published: 2023/11/28

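1. Introduction
This article explains how to implement the Neural-Style algorithm proposed by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge. Neural-Style, also called Neural-Transfer, reconstructs a given image in a new artistic style. The algorithm takes three images: an input image, a content image and a style image. It changes the input image so that it resembles the content of the content image and the artistic style of the style image.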
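2. Underlying Principle
We define two distances, one for content, $D_C$, and one for style, $D_S$. $D_C$ measures how different the content of two images is, while $D_S$ measures how different their style is. We then take a third image, the input, and transform it to minimize both its content distance to the content image and its style distance to the style image. With that in place, we can import the necessary packages and begin the style transfer.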
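Written as a single objective, the idea above might look as follows. The symbols $w_C$ and $w_S$ are notation assumed here for illustration; they play the role of the content_weight and style_weight arguments that appear in the code later in this article.

```latex
\mathcal{L}(X) = w_C \, D_C(X, C) + w_S \, D_S(X, S),
\qquad
X^{*} = \arg\min_{X} \mathcal{L}(X)
```

Here $X$ is the input image being optimized, $C$ the content image and $S$ the style image. Only $X$ changes during optimization; the network weights stay fixed.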
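3. Importing Packages and Selecting a Device
Below is a summary of the packages needed to implement style transfer.
* torch, torch.nn, numpy: indispensable packages for style transfer with PyTorch
* torch.optim: efficient gradient descent
* PIL, PIL.Image, matplotlib.pyplot: load and display images
* torchvision.transforms: convert PIL images into tensors
* torchvision.models: train or load pre-trained models
* copy: deep-copy the models; a system package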
from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
import matplotlib.pyplot as plt

import torchvision.transforms as transforms
import torchvision.models as models

import copy
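Next, we choose which device to run the network on and import the content and style images. Running the style transfer algorithm on large images takes a long time, and running on a GPU speeds it up considerably. torch.cuda.is_available() reports whether a GPU is available. We then create a torch.device for use throughout the article; the .to(device) method moves tensors or modules to that device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")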
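As a minimal sketch of how .to(device) is typically used, the snippet below moves a tensor and a small module to the selected device. The tensor x and the nn.Linear layer are placeholders chosen for illustration and are not part of the style transfer code.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor.to returns a tensor on the target device
# (the same tensor if it already lives there).
x = torch.zeros(2, 3)
x_dev = x.to(device)

# Module.to moves the module's parameters and returns the module itself.
layer = torch.nn.Linear(3, 1).to(device)
y = layer(x_dev)

print(x_dev.device, y.shape)
```

Inputs and model must live on the same device, which is why both the images and the network are moved with .to(device) in this article.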
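4. Loading the Images
Now we import the style and content images. The original PIL images have values between 0 and 255, but when converted to torch tensors their values are scaled to lie between 0 and 1. The images also need to be resized to the same dimensions. An important detail: the neural networks in the torch library are trained with tensor values ranging from 0 to 1. If you feed a network tensors with values from 0 to 255, the activated feature maps will be unable to sense the intended content and style. The pre-trained networks from the Caffe library, by contrast, are trained with tensor values from 0 to 255.
Note: download the two images needed for this article, picasso.jpg and dancing.jpg, and add them to a directory named images in your current working directory. The code below reads them from ./data/images/neural-style/, so adjust either the paths or the directory so that they match.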

# desired size of the output image
imsize = 512 if torch.cuda.is_available() else 128  # use small size if no gpu

loader = transforms.Compose([
    transforms.Resize(imsize),  # scale imported image
    transforms.ToTensor()])  # transform it into a torch tensor

def image_loader(image_name):
    image = Image.open(image_name)
    # fake batch dimension required to fit network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

style_img = image_loader("./data/images/neural-style/picasso.jpg")
content_img = image_loader("./data/images/neural-style/dancing.jpg")

assert style_img.size() == content_img.size(), \
    "we need to import style and content images of the same size"
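Now let's create a function that displays an image by converting a copy of it back to PIL format and showing that copy with plt.imshow. We display the content and style images to make sure they were imported correctly.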

unloader = transforms.ToPILImage()  # reconvert into PIL image

plt.ion()

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # we clone the tensor to not do changes on it
    image = image.squeeze(0)  # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
imshow(style_img, title='Style Image')

plt.figure()
imshow(content_img, title='Content Image')

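5. Loss Functions
5.1 Content Loss
The content loss is a function that represents a weighted version of the content distance for an individual layer. The function takes the feature maps $F_{XL}$ of a layer $L$ in a network processing input $X$ and returns the weighted content distance $w_{CL} \cdot D_C^L(X, C)$ between the image $X$ and the content image $C$. The feature maps of the content image, $F_{CL}$, must be known in order to compute the content distance. We implement this function as a torch module whose constructor takes $F_{CL}$ as input. The distance $\|F_{XL} - F_{CL}\|^2$ is the mean squared error between the two sets of feature maps and can be computed with nn.MSELoss (the code uses its functional form, F.mse_loss).
We add this content loss module directly after the convolution layer(s) used to compute the content distance. This way, each time the network is fed an input image, the content losses are computed at the desired layers, and thanks to autograd all the gradients are computed as well. To make the content loss layer transparent, we define a forward method that computes the content loss and then returns the layer's input unchanged. The computed loss is saved as an attribute of the module (self.loss).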
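class ContentLoss(nn.Module):

    def __init__(self, target,):
        super(ContentLoss, self).__init__()
        # we 'detach' the target content from the tree used
        # to dynamically compute the gradient: this is a stated value,
        # not a variable. Otherwise the forward method of the criterion
        # will throw an error.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input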
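The following sketch shows what "transparent" means in practice. It places a condensed copy of the module above inside a tiny nn.Sequential. The untrained Conv2d layer and the random tensors are stand-ins assumed for illustration, not the VGG network or real images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Condensed copy of the module above, repeated so this sketch runs on its own."""

    def __init__(self, target):
        super().__init__()
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input  # pass the activations through unchanged

torch.manual_seed(0)
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)  # untrained stand-in layer
content = torch.rand(1, 3, 8, 8)                  # stand-in "content image"

loss_layer = ContentLoss(conv(content))
model = nn.Sequential(conv, loss_layer, nn.ReLU())

x = torch.rand(1, 3, 8, 8, requires_grad=True)    # stand-in "input image"
out = model(x)

# The layer is transparent: the output matches what conv + ReLU alone produce.
same_output = torch.allclose(out, torch.relu(conv(x)))

# The stored loss is connected to x through autograd.
loss_layer.loss.backward()
print(same_output, x.grad.shape)
```

Because the loss is stored on the module rather than returned, the surrounding network keeps working as before while the loss remains available for backpropagation.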

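Note:
Important detail: although this module is named ContentLoss, it is not a true PyTorch loss function. If you want to define your content loss as a PyTorch loss function, you have to create a PyTorch autograd function and recompute/implement the gradient manually in its backward method.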
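For readers curious what such an autograd function could look like, here is a minimal sketch of a mean squared error with a hand-written backward pass. The class name ManualMSE is hypothetical. The article itself does not use this approach, and the sketch only checks that the manual gradient agrees with the one autograd derives for F.mse_loss.

```python
import torch
import torch.nn.functional as F

class ManualMSE(torch.autograd.Function):
    """Mean squared error with a hand-written gradient (illustrative name)."""

    @staticmethod
    def forward(ctx, input, target):
        ctx.save_for_backward(input, target)
        return ((input - target) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        input, target = ctx.saved_tensors
        # d/d(input) of mean((input - target)^2) = 2 * (input - target) / N
        grad_input = grad_output * 2.0 * (input - target) / input.numel()
        return grad_input, None  # no gradient for the fixed target

torch.manual_seed(0)
x = torch.rand(1, 2, 3, 3, requires_grad=True)
target = torch.rand(1, 2, 3, 3)

ManualMSE.apply(x, target).backward()
manual_grad = x.grad.clone()

x.grad = None  # reset before the reference computation
F.mse_loss(x, target).backward()
print(torch.allclose(manual_grad, x.grad))
```

The transparent-layer design used in this article avoids this extra work by letting autograd differentiate F.mse_loss directly.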
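5.2 Style Loss
The style loss module is implemented similarly to the content loss module. It acts as a transparent layer in the network that computes the style loss of that layer. To compute the style loss, we need the Gram matrix $G_{XL}$. A Gram matrix is the result of multiplying a given matrix by its transpose. Here the given matrix is a reshaped version of the feature maps $F_{XL}$ of layer $L$. $F_{XL}$ is reshaped to form $\hat{F}_{XL}$, a $K \times N$ matrix, where $K$ is the number of feature maps at layer $L$ and $N$ is the length of any vectorized feature map $F_{XL}^k$. For example, the first row of $\hat{F}_{XL}$ corresponds to the first vectorized feature map $F_{XL}^1$.
Finally, the Gram matrix is normalized by dividing each element by the total number of elements in the feature maps ($K \cdot N$ for a batch of one). This normalization counteracts the fact that $\hat{F}_{XL}$ matrices with a large $N$ dimension yield larger values in the Gram matrix. Those large values would cause the first layers (before the pooling layers) to have a larger impact during gradient descent. Style features tend to be in the deeper layers of the network, so this normalization step is important.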
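The definition above can be written compactly as follows, assuming a batch size of one so that the normalizing constant is $K \cdot N$ (which equals a * b * c * d in the gram_matrix code that follows in this article).

```latex
G_{XL} = \frac{1}{K N} \, \hat{F}_{XL} \, \hat{F}_{XL}^{\top},
\qquad
\hat{F}_{XL} \in \mathbb{R}^{K \times N},
\quad
G_{XL} \in \mathbb{R}^{K \times K}
```

Entry $(i, j)$ of $G_{XL}$ is the normalized inner product of feature maps $i$ and $j$, so it records how strongly two feature maps fire together, regardless of where in the image they fire.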
def gram_matrix(input):
    a, b, c, d = input.size()  # a=batch size(=1)
    # b=number of feature maps
    # (c,d)=dimensions of a f. map (N=c*d)

    features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

    G = torch.mm(features, features.t())  # compute the gram product

    # we 'normalize' the values of the gram matrix
    # by dividing by the total number of elements in the feature maps.
    return G.div(a * b * c * d)
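A small worked example may help check the arithmetic. The function is repeated so the snippet runs on its own, and the 1x2x1x2 input tensor is made up for illustration.

```python
import torch

def gram_matrix(input):
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    G = torch.mm(features, features.t())
    return G.div(a * b * c * d)

# One image (a=1), two feature maps (b=2), each of size 1x2 (c=1, d=2).
# Flattened, the feature maps are the rows [1, 2] and [3, 4].
x = torch.tensor([[[[1.0, 2.0]], [[3.0, 4.0]]]])

G = gram_matrix(x)
print(G)
# Unnormalized product: [[5, 11], [11, 25]]; divided by 1*2*1*2 = 4:
# tensor([[1.2500, 2.7500],
#         [2.7500, 6.2500]])
```

The result is symmetric, as any Gram matrix must be, and its size depends only on the number of feature maps, not on the spatial size of the image.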

The style loss module looks almost exactly like the content loss module. The style distance is likewise computed as the mean squared error between $G_{XL}$ and $G_{SL}$.
class StyleLoss(nn.Module):

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input
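One way to build intuition for why the Gram matrix captures style rather than layout: rearranging the spatial positions of a feature map (the same rearrangement in every channel) leaves the Gram matrix unchanged up to floating-point rounding, while the element-wise content distance changes. The sketch below demonstrates this on random stand-in features; the tensor sizes and the reversal permutation are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

def gram_matrix(input):
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    G = torch.mm(features, features.t())
    return G.div(a * b * c * d)

torch.manual_seed(0)
feat = torch.rand(1, 3, 4, 4)  # stand-in feature maps: 3 channels of 4x4

# Reverse the order of the 16 spatial positions, identically in every channel.
perm = torch.arange(15, -1, -1)
shuffled = feat.reshape(1, 3, 16)[:, :, perm].reshape(1, 3, 4, 4)

style_gap = F.mse_loss(gram_matrix(shuffled), gram_matrix(feat))
content_gap = F.mse_loss(shuffled, feat)
print(style_gap.item(), content_gap.item())
```

The style distance is essentially zero while the content distance is clearly positive, which is the separation the two losses rely on.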

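6. Importing the Model
Now we need to import a pre-trained neural network. We will use a 19-layer VGG network, like the one used in the paper.
PyTorch's implementation of VGG is a module divided into two child Sequential modules: features (containing the convolution and pooling layers) and classifier (containing the fully connected layers). We will use the features module because we need the output of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode with .eval().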
cnn = models.vgg19(pretrained=True).features.to(device).eval()
Additionally, VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. We use these values to normalize the image before sending it into the network.
cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

# create a module to normalize the input image
# so we can easily put it in a nn.Sequential
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # .view the mean and std to make them [C x 1 x 1] so that they can
        # directly work with image Tensor of shape [B x C x H x W].
        # B is batch size. C is number of channels. H is height and W is width.
        # mean and std are already tensors here, so copy them with
        # clone().detach() instead of wrapping them in torch.tensor(...).
        self.mean = mean.clone().detach().view(-1, 1, 1)
        self.std = std.clone().detach().view(-1, 1, 1)

    def forward(self, img):
        # normalize img
        return (img - self.mean) / self.std
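The reshaping to [C x 1 x 1] is what lets one mean and one standard deviation per channel apply to every pixel. A minimal sketch of that broadcasting, using a made-up 2x2 image whose pixels all equal the channel mean:

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1)  # [C, 1, 1]
std = torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1)   # [C, 1, 1]

# A batch of one 2x2 image whose every pixel equals its channel mean
# (an artificial input chosen so the expected result is obvious).
img = mean.expand(3, 2, 2).unsqueeze(0)   # [1, 3, 2, 2]

normalized = (img - mean) / std           # [C,1,1] broadcasts over [B,C,H,W]
print(normalized.shape)
print(normalized.abs().max().item())
```

Every pixel maps to zero because it sits exactly at its channel mean; a real image would map to values roughly centered around zero.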

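A Sequential module contains an ordered list of child modules. For instance, vgg19.features contains a sequence (Conv2d, ReLU, MaxPool2d, Conv2d, ReLU...) aligned in the right order of depth. We need to add our content loss and style loss layers immediately after the convolution layer they are detecting. To do this we must create a new Sequential module that has the content loss and style loss modules correctly inserted.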

# desired depth layers to compute style/content losses:
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    # normalization module
    normalization = Normalization(normalization_mean, normalization_std).to(device)

    # just in order to have iterable access to the lists of
    # content/style losses
    content_losses = []
    style_losses = []

    # assuming that cnn is a nn.Sequential, we make a new nn.Sequential
    # to put in modules that are supposed to be activated sequentially
    model = nn.Sequential(normalization)

    i = 0  # increment every time we see a conv
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # The in-place version doesn't play very nicely with the
            # ContentLoss and StyleLoss we insert below, so we replace it
            # with an out-of-place one here.
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # add content loss:
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # add style loss:
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)

    # now we trim off the layers after the last content and style losses
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break

    model = model[:(i + 1)]

    return model, style_losses, content_losses
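The layer-naming scheme used in the function above can be seen in isolation on a tiny, untrained stack. The tiny_cnn below is a made-up stand-in for vgg19.features (no pre-trained weights are downloaded), and only the naming loop is reproduced.

```python
import torch.nn as nn

# A tiny, untrained stand-in for vgg19.features (assumed layout for illustration).
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(inplace=True),
)

names = []
i = 0  # incremented every time a conv layer is seen
for layer in tiny_cnn.children():
    if isinstance(layer, nn.Conv2d):
        i += 1
        names.append('conv_{}'.format(i))
    elif isinstance(layer, nn.ReLU):
        names.append('relu_{}'.format(i))
    elif isinstance(layer, nn.MaxPool2d):
        names.append('pool_{}'.format(i))

print(names)
# ['conv_1', 'relu_1', 'conv_2', 'relu_2', 'pool_2', 'conv_3', 'relu_3']
```

Because the counter only advances on convolutions, a name such as 'conv_4' always refers to the fourth convolution in depth order, which is what content_layers_default and style_layers_default rely on.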

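Next, we select the input image. You can use a copy of the content image or white noise.
input_img = content_img.clone()

# if you want to use white noise instead, uncomment the line below:
# input_img = torch.randn(content_img.data.size(), device=device)

# add the original input image to the figure:
plt.figure()
imshow(input_img, title='Input Image')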

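7. Gradient Descent
We use the L-BFGS algorithm to run the gradient descent. Unlike training a network, here we train the input image in order to minimize the content/style losses. We create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass the image to it as the tensor to optimize.
def get_input_optimizer(input_img):
    # this line shows that the input is a parameter that requires a gradient
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer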
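Finally, we must define a function that performs the style transfer. For each iteration of the network, it is fed an updated input and computes new losses. We call backward on the weighted sum of the losses, which computes their gradients with respect to the input image. The optimizer requires a "closure" function, which re-evaluates the model and returns the loss.
There is one last constraint to address. The network may try to optimize the input with values that fall outside the 0 to 1 range of a tensor image. We can address this by clamping the input values to between 0 and 1 each time the network is run.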
def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    """Run the style transfer."""
    print('Building the style transfer model..')
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)

    print('Optimizing..')
    run = [0]
    while run[0] <= num_steps:

        def closure():
            # correct the values of the updated input image
            input_img.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)
            style_score = 0
            content_score = 0

            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss

            style_score *= style_weight
            content_score *= content_weight

            loss = style_score + content_score
            loss.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # a last correction...
    input_img.data.clamp_(0, 1)

    return input_img
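The closure pattern is easier to follow on a toy problem. The sketch below uses optim.LBFGS to minimize (x - 3)^2 for a single-element tensor. The objective, the number of outer steps and the calls counter are illustrative assumptions and have nothing to do with images.

```python
import torch
import torch.optim as optim

# Toy problem: find x that minimizes (x - 3)^2, starting from 0.
x = torch.zeros(1)
optimizer = optim.LBFGS([x.requires_grad_()])

calls = [0]  # mutable counter, mirroring the run = [0] idiom above

def closure():
    # L-BFGS may evaluate the objective several times per step,
    # so all of the forward and backward work lives in the closure.
    optimizer.zero_grad()
    loss = ((x - 3.0) ** 2).sum()
    loss.backward()
    calls[0] += 1
    return loss

for _ in range(5):
    optimizer.step(closure)

print(x.item(), calls[0])
```

This is also why run_style_transfer counts iterations inside the closure rather than in the outer while loop: one call to optimizer.step can trigger more than one evaluation.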

Finally, we can run the algorithm.
output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()

Output:
Building the style transfer model..
Optimizing..
run [50]:
Style Loss : 4.169304 Content Loss: 4.235329

run [100]:
Style Loss : 1.145476 Content Loss: 3.039176

run [150]:
Style Loss : 0.716769 Content Loss: 2.663749

run [200]:
Style Loss : 0.476047 Content Loss: 2.500893

run [250]:
Style Loss : 0.347092 Content Loss: 2.410895

run [300]:
Style Loss : 0.263698 Content Loss: 2.358449
