深度有趣 | 30 Fast Image Style Transfer
Introduction
Implementing fast image style transfer (Fast Neural Style Transfer) with TensorFlow.
Principle
In the image style transfer introduced earlier, the input image itself is optimized against a content image and a style image so that the content loss and the style loss become as small as possible.
As with DeepDream, the network parameters stay fixed and only the input data is adjusted according to the loss functions, so generating each image amounts to training a model from scratch and takes a long time.
Training a model takes a long time, but inference with a trained model is fast.
Fast image style transfer greatly shortens the time needed to generate one stylized image. The model consists of two parts: a transformation network and a loss network.
The style image is fixed while the content image is the variable input, so the model can quickly convert an arbitrary image into an image of the specified style.
- Transformation network: its parameters are trained; it converts the content image into the transferred (stylized) image
- Loss network: computes the style loss between the transferred image and the style image, and the content loss between the transferred image and the original content image
After training, the transferred image produced by the transformation network is similar in content to the input content image and similar in style to the specified style image.
At inference time only the transformation network is used: feed in a content image and the corresponding transferred image comes out.
If there are several style images, simply train a separate model for each style; one training step is sketched below.
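Schematically, a single training step combines the two networks as in the following sketch. The names transform_net, vgg_features, gram and the loss helpers are placeholders for the concrete TensorFlow code in the Implementation section, not a real API.

```python
# Conceptual sketch of one training step (placeholder names only; see the
# actual implementation below).
#
# transferred = transform_net(content_batch)      # trainable transformation network
# t_feat = vgg_features(transferred)               # loss network: VGG19 with frozen weights
# c_feat = vgg_features(content_batch)
# s_gram = gram(vgg_features(style_image))         # precomputed once, the style image is fixed
#
# loss = content_weight * content_loss(t_feat, c_feat) \
#      + style_weight * style_loss(gram(t_feat), s_gram) \
#      + tv_weight * total_variation(transferred)
#
# Only transform_net's parameters are updated to minimize loss.
# At inference time, transferred = transform_net(any_content_image) is all that runs.
```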
Implementation
The implementation is adapted from the following two projects: github.com/lengstrom/f… and github.com/hzy46/fast-…
As before, imagenet-vgg-verydeep-19.mat is used to compute the content loss and the style loss.
Some images are needed as content inputs. There is no requirement on what they depict and no annotations are needed; here the train2014 split of the MSCOCO dataset is used, cocodataset.org/#download, 82,612 images in total.
Load the libraries:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
import cv2
from imageio import imread, imsave
import scipy.io
import os
import glob
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
```

List the style images, 10 in total:
```python
style_images = glob.glob('styles/*.jpg')
print(style_images)
```

Load the content images, skip grayscale ones, and crop and resize them to the target size. No normalization is done yet, so pixel values stay in the 0–255 range:
```python
def resize_and_crop(image, image_size):
    h = image.shape[0]
    w = image.shape[1]
    if h > w:
        image = image[h // 2 - w // 2: h // 2 + w // 2, :, :]
    else:
        image = image[:, w // 2 - h // 2: w // 2 + h // 2, :]
    image = cv2.resize(image, (image_size, image_size))
    return image

X_data = []
image_size = 256
paths = glob.glob('train2014/*.jpg')
for i in tqdm(range(len(paths))):
    path = paths[i]
    image = imread(path)
    if len(image.shape) < 3:
        continue
    X_data.append(resize_and_crop(image, image_size))
X_data = np.array(X_data)
print(X_data.shape)
```
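Note that holding all 82,612 crops in memory as uint8 takes roughly 82612 × 256 × 256 × 3 bytes ≈ 16 GB. If that is too much for your machine, a lazy variant could yield batches on the fly instead; the following is a minimal sketch (not part of the original code) that reuses resize_and_crop, paths and the imports from above:

```python
def batch_generator(paths, image_size, batch_size):
    # Yield batches of cropped color images without keeping the whole dataset in RAM.
    batch = []
    for path in paths:
        image = imread(path)
        if len(image.shape) < 3:  # skip grayscale images, as above
            continue
        batch.append(resize_and_crop(image, image_size))
        if len(batch) == batch_size:
            yield np.array(batch)
            batch = []
```

The training loop further down would then iterate over this generator instead of slicing X_data.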
Load the vgg19 model and define a function that, for a given input, returns the output of every vgg19 layer. As in the GAN chapters, variable_scope with reuse is used to share the network:

```python
vgg = scipy.io.loadmat('imagenet-vgg-verydeep-19.mat')
vgg_layers = vgg['layers']

def vgg_endpoints(inputs, reuse=None):
    with tf.variable_scope('endpoints', reuse=reuse):
        def _weights(layer, expected_layer_name):
            W = vgg_layers[0][layer][0][0][2][0][0]
            b = vgg_layers[0][layer][0][0][2][0][1]
            layer_name = vgg_layers[0][layer][0][0][0][0]
            assert layer_name == expected_layer_name
            return W, b

        def _conv2d_relu(prev_layer, layer, layer_name):
            W, b = _weights(layer, layer_name)
            W = tf.constant(W)
            b = tf.constant(np.reshape(b, (b.size)))
            return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

        def _avgpool(prev_layer):
            return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

        graph = {}
        graph['conv1_1'] = _conv2d_relu(inputs, 0, 'conv1_1')
        graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
        graph['avgpool1'] = _avgpool(graph['conv1_2'])
        graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
        graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
        graph['avgpool2'] = _avgpool(graph['conv2_2'])
        graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
        graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
        graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
        graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
        graph['avgpool3'] = _avgpool(graph['conv3_4'])
        graph['conv4_1'] = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
        graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
        graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
        graph['conv4_4'] = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
        graph['avgpool4'] = _avgpool(graph['conv4_4'])
        graph['conv5_1'] = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
        graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
        graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
        graph['conv5_4'] = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
        graph['avgpool5'] = _avgpool(graph['conv5_4'])
        return graph
```

Pick one style image, subtract the per-channel color means, run it through vgg19, and compute the Gram matrices of the four style layers:
```python
style_index = 1
X_style_data = resize_and_crop(imread(style_images[style_index]), image_size)
X_style_data = np.expand_dims(X_style_data, 0)
print(X_style_data.shape)

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

X_style = tf.placeholder(dtype=tf.float32, shape=X_style_data.shape, name='X_style')
style_endpoints = vgg_endpoints(X_style - MEAN_VALUES)
STYLE_LAYERS = ['conv1_2', 'conv2_2', 'conv3_3', 'conv4_3']
style_features = {}

sess = tf.Session()
for layer_name in STYLE_LAYERS:
    features = sess.run(style_endpoints[layer_name], feed_dict={X_style: X_style_data})
    features = np.reshape(features, (-1, features.shape[3]))
    gram = np.matmul(features.T, features) / features.size
    style_features[layer_name] = gram
```
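In the notation of the code above, each style layer's feature map is reshaped into a matrix $F$ of shape $(HW, C)$, and the stored style target is the normalized Gram matrix

$$G = \frac{F^{\top}F}{H \cdot W \cdot C},$$

a $C \times C$ matrix of channel correlations that the training graph will later compare against.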
Define the transformation network, a typical convolution / residual / upsampling-convolution structure. The content image also has the per-channel color means subtracted before it is fed in:

```python
batch_size = 4
X = tf.placeholder(dtype=tf.float32, shape=[None, None, None, 3], name='X')
k_initializer = tf.truncated_normal_initializer(0, 0.1)

def relu(x):
    return tf.nn.relu(x)

def conv2d(inputs, filters, kernel_size, strides):
    p = int(kernel_size / 2)
    h0 = tf.pad(inputs, [[0, 0], [p, p], [p, p], [0, 0]], mode='reflect')
    return tf.layers.conv2d(inputs=h0, filters=filters, kernel_size=kernel_size, strides=strides,
                            padding='valid', kernel_initializer=k_initializer)

def deconv2d(inputs, filters, kernel_size, strides):
    shape = tf.shape(inputs)
    height, width = shape[1], shape[2]
    h0 = tf.image.resize_images(inputs, [height * strides * 2, width * strides * 2],
                                tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return conv2d(h0, filters, kernel_size, strides)

def instance_norm(inputs):
    return tf.contrib.layers.instance_norm(inputs)

def residual(inputs, filters, kernel_size):
    h0 = relu(conv2d(inputs, filters, kernel_size, 1))
    h0 = conv2d(h0, filters, kernel_size, 1)
    return tf.add(inputs, h0)

with tf.variable_scope('transformer', reuse=None):
    h0 = tf.pad(X - MEAN_VALUES, [[0, 0], [10, 10], [10, 10], [0, 0]], mode='reflect')
    h0 = relu(instance_norm(conv2d(h0, 32, 9, 1)))
    h0 = relu(instance_norm(conv2d(h0, 64, 3, 2)))
    h0 = relu(instance_norm(conv2d(h0, 128, 3, 2)))
    for i in range(5):
        h0 = residual(h0, 128, 3)
    h0 = relu(instance_norm(deconv2d(h0, 64, 3, 2)))
    h0 = relu(instance_norm(deconv2d(h0, 32, 3, 2)))
    h0 = tf.nn.tanh(instance_norm(conv2d(h0, 3, 9, 1)))
    h0 = (h0 + 1) / 2 * 255.
    shape = tf.shape(h0)
    g = tf.slice(h0, [0, 10, 10, 0], [-1, shape[1] - 20, shape[2] - 20, -1], name='g')
```

Feed the transformation network's output (the transferred image) as well as the original content image into vgg19, take the outputs of the corresponding layer, and compute the content loss:
```python
CONTENT_LAYER = 'conv3_3'
content_endpoints = vgg_endpoints(X - MEAN_VALUES, True)
g_endpoints = vgg_endpoints(g - MEAN_VALUES, True)

def get_content_loss(endpoints_x, endpoints_y, layer_name):
    x = endpoints_x[layer_name]
    y = endpoints_y[layer_name]
    return 2 * tf.nn.l2_loss(x - y) / tf.to_float(tf.size(x))

content_loss = get_content_loss(content_endpoints, g_endpoints, CONTENT_LAYER)
```
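Since tf.nn.l2_loss(t) is $\tfrac{1}{2}\sum t^2$, the expression above is simply the mean squared error between the conv3_3 features $x$ of the content image and $y$ of the transferred image:

$$\mathcal{L}_{content} = \frac{1}{|x|}\sum_i (x_i - y_i)^2,$$

where $|x|$ is the number of elements in the feature map.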
From the outputs of the transferred image at the chosen style layers and the precomputed style targets, compute the style loss:

```python
style_loss = []
for layer_name in STYLE_LAYERS:
    layer = g_endpoints[layer_name]
    shape = tf.shape(layer)
    bs, height, width, channel = shape[0], shape[1], shape[2], shape[3]
    features = tf.reshape(layer, (bs, height * width, channel))
    gram = tf.matmul(tf.transpose(features, (0, 2, 1)), features) / tf.to_float(height * width * channel)
    style_gram = style_features[layer_name]
    style_loss.append(2 * tf.nn.l2_loss(gram - style_gram) / tf.to_float(tf.size(layer)))
style_loss = tf.reduce_sum(style_loss)
```
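Each layer contributes the squared difference between the Gram matrix $G^l$ of the transferred image and the precomputed style target $\hat{G}^l$, summed over the batch and normalized by the number of elements in that layer's activation, tf.size(layer) $= bs \cdot H \cdot W \cdot C$:

$$\mathcal{L}_{style} = \sum_{l} \frac{1}{|x^l|}\sum_{i,j}\big(G^l_{ij} - \hat{G}^l_{ij}\big)^2$$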
Compute the total variation regularizer and combine everything into the total loss:

```python
def get_total_variation_loss(inputs):
    h = inputs[:, :-1, :, :] - inputs[:, 1:, :, :]
    w = inputs[:, :, :-1, :] - inputs[:, :, 1:, :]
    return tf.nn.l2_loss(h) / tf.to_float(tf.size(h)) + tf.nn.l2_loss(w) / tf.to_float(tf.size(w))

total_variation_loss = get_total_variation_loss(g)

content_weight = 1
style_weight = 250
total_variation_weight = 0.01

loss = content_weight * content_loss + style_weight * style_loss + total_variation_weight * total_variation_loss
```
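The total variation term penalizes differences between neighboring pixels of the generated image g, which encourages spatially smooth output. With $h$ and $w$ denoting the vertical and horizontal pixel differences, the code computes

$$\mathcal{L}_{tv} = \frac{1}{2|h|}\sum h^2 + \frac{1}{2|w|}\sum w^2,$$

and the total loss is the weighted sum $1 \cdot \mathcal{L}_{content} + 250 \cdot \mathcal{L}_{style} + 0.01 \cdot \mathcal{L}_{tv}$.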
Define the optimizer, which lowers the total loss by adjusting only the transformation network's parameters:

```python
vars_t = [var for var in tf.trainable_variables() if var.name.startswith('transformer')]
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=vars_t)
```

Train the model. After each epoch, run a test image through the network, and write some tensor values to the events file so they can be inspected with TensorBoard:
```python
style_name = os.path.splitext(os.path.basename(style_images[style_index]))[0]  # e.g. 'styles/wave.jpg' -> 'wave'
OUTPUT_DIR = 'samples_%s' % style_name
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

tf.summary.scalar('losses/content_loss', content_loss)
tf.summary.scalar('losses/style_loss', style_loss)
tf.summary.scalar('losses/total_variation_loss', total_variation_loss)
tf.summary.scalar('losses/loss', loss)
tf.summary.scalar('weighted_losses/weighted_content_loss', content_weight * content_loss)
tf.summary.scalar('weighted_losses/weighted_style_loss', style_weight * style_loss)
tf.summary.scalar('weighted_losses/weighted_total_variation_loss', total_variation_weight * total_variation_loss)
tf.summary.image('transformed', g)
tf.summary.image('origin', X)
summary = tf.summary.merge_all()
writer = tf.summary.FileWriter(OUTPUT_DIR)

sess.run(tf.global_variables_initializer())
losses = []
epochs = 2

X_sample = imread('sjtu.jpg')
h_sample = X_sample.shape[0]
w_sample = X_sample.shape[1]

for e in range(epochs):
    data_index = np.arange(X_data.shape[0])
    np.random.shuffle(data_index)
    X_data = X_data[data_index]

    for i in tqdm(range(X_data.shape[0] // batch_size)):
        X_batch = X_data[i * batch_size: i * batch_size + batch_size]
        ls_, _ = sess.run([loss, optimizer], feed_dict={X: X_batch})
        losses.append(ls_)

        if i > 0 and i % 20 == 0:
            writer.add_summary(sess.run(summary, feed_dict={X: X_batch}), e * X_data.shape[0] // batch_size + i)
            writer.flush()

    print('Epoch %d Loss %f' % (e, np.mean(losses)))
    losses = []

    gen_img = sess.run(g, feed_dict={X: [X_sample]})[0]
    gen_img = np.clip(gen_img, 0, 255)
    result = np.zeros((h_sample, w_sample * 2, 3))
    result[:, :w_sample, :] = X_sample / 255.
    result[:, w_sample:, :] = gen_img[:h_sample, :w_sample, :] / 255.
    plt.axis('off')
    plt.imshow(result)
    plt.show()
    imsave(os.path.join(OUTPUT_DIR, 'sample_%d.jpg' % e), result)
```

Save the model:
```python
saver = tf.train.Saver()
saver.save(sess, os.path.join(OUTPUT_DIR, 'fast_style_transfer'))
```

The test image is still the Shanghai Jiao Tong University gate (sjtu.jpg) used in earlier chapters.
Style transfer results:
During training, TensorBoard can be used to monitor progress:
```
tensorboard --logdir=samples_starry
```

On a single machine, the following code performs a style transfer quickly; even on a CPU it only takes about 10 seconds:
```python
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
from imageio import imread, imsave
import os
import time

def the_current_time():
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(time.time()))))

style = 'wave'
model = 'samples_%s' % style
content_image = 'sjtu.jpg'
result_image = 'sjtu_%s.jpg' % style
X_image = imread(content_image)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver = tf.train.import_meta_graph(os.path.join(model, 'fast_style_transfer.meta'))
saver.restore(sess, tf.train.latest_checkpoint(model))

graph = tf.get_default_graph()
X = graph.get_tensor_by_name('X:0')
g = graph.get_tensor_by_name('transformer/g:0')

the_current_time()

gen_img = sess.run(g, feed_dict={X: [X_image]})[0]
gen_img = np.clip(gen_img, 0, 255) / 255.
imsave(result_image, gen_img)

the_current_time()
```

For other style images, train a corresponding model with the same procedure, for example with a small driver like the sketch below.
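A minimal sketch of such a driver, assuming the training code above has been wrapped into a function train_one_style (a hypothetical helper, not defined in this article) that builds the graph for one style image, trains the transformer, and saves its checkpoint into its own samples_<style> directory:

```python
import glob
import tensorflow as tf

for style_image_path in glob.glob('styles/*.jpg'):
    tf.reset_default_graph()           # start each style from a fresh graph
    train_one_style(style_image_path)  # hypothetical wrapper around the training code above
```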
References
- Perceptual Losses for Real-Time Style Transfer and Super-Resolution: arxiv.org/abs/1603.08…
- Fast Style Transfer in TensorFlow: github.com/lengstrom/f…
- A Tensorflow Implementation for Fast Neural Style: github.com/hzy46/fast-…
Video course
深度有趣 (Part 1)