Write Your Own PRISMA: Blending Two Images Together
Original post: http://blog.askfermi.me/2016/09/27/diy-prisma/
About two months ago, an app called PRISMA suddenly took over everyone's social feeds, soon followed by richer services such as Ostagram; these apps can render a photo in the style of a famous painting. Honestly, I was surprised the app blew up, because I had just happened to read the related paper and found an open-source implementation, and a classmate had even implemented it in the June "Internet Programming" course. Today we will build an advanced version of PRISMA together: not just famous paintings, but any two images can be blended into one.
Since this is not a strictly academic article, I will introduce the relevant paper and some open-source projects, then use one of those projects to build a simple PRISMA-like application. Consider it a record of the pitfalls along the way. This article covers only the backend part.
How It Works
PRISMA is built on convolutional neural networks; the underlying paper is A Neural Algorithm of Artistic Style. Our project is based on a Torch implementation of that paper, which its author has open-sourced on GitHub as Neural Style. Once it is installed on our system, a few simple steps are all it takes to reproduce PRISMA-like results.
Hardware Requirements
The convolutional neural networks (CNNs) behind PRISMA place heavy demands on a machine. In research and industry settings, a fairly high-end GPU running CUDA-based computation is usually needed to finish in a reasonable time. The author of Neural Style provides CUDA support as well, so a good graphics card is the recommended setup.
Based on my tests, roughly 6 to 8 GB of RAM is needed to run Neural Style acceptably in CPU mode. Running it on a low-memory machine is strongly discouraged.
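As a rough sanity check before attempting CPU mode, you can compare the machine's total RAM against the 6 GB figure mentioned above. A minimal sketch (the cutoff is this article's observation, not an official requirement):

```python
def enough_ram_for_cpu_mode(mem_total_kb, min_gb=6):
    """Return True if total memory (in kB, as reported by
    /proc/meminfo) meets the rough 6 GB minimum observed above."""
    return mem_total_kb >= min_gb * 1024 * 1024

def read_mem_total_kb(path="/proc/meminfo"):
    """Read the MemTotal line from /proc/meminfo (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])  # value is in kB
    raise RuntimeError("MemTotal not found")
```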
Installation
The author of Neural Style provides installation documentation, but you will still run into problems regularly. The recommended installation procedure is as follows (using Ubuntu as an example):
Upgrading GCC
GCC 5 is a required component. I initially failed with both gcc 4.8 and gcc 4.9, which was a particularly nasty pitfall: only gcc 5 or newer compiles everything correctly.

```
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
```

Afterwards, `gcc -v` shows the current version; if it reports 5, you can proceed to the next step.
Installing Torch and Its Dependencies
```
cd ~/
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
./install.sh
```

The last command starts the automatic Torch installation. When it finishes, the environment variables are appended to your bashrc; run `source ~/.bashrc` to make them take effect. Then type `th` at the command line: if the Torch banner appears, the installation succeeded.
Installing loadcaffe
loadcaffe lets Torch load Caffe networks and is a commonly used library. It depends on Google's Protocol Buffer library, which must be installed first:

```
sudo apt-get install libprotobuf-dev protobuf-compiler
```

We can then install loadcaffe through luarocks (a Lua package manager):

```
luarocks install loadcaffe
```
Installing Neural-Style
First clone the repository from GitHub:

```
cd ~/
git clone https://github.com/jcjohnson/neural-style.git
cd neural-style
```

Then download the pre-trained network weights, which are fairly large:

```
sh models/download_models.sh
```

Once the download finishes, you are basically ready to go.
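Since the download is large and occasionally fails partway, a quick check that the expected files actually landed in models/ can save a confusing error later. A sketch; the filenames are what the download script fetched at the time of writing, so verify them against your own models/ directory:

```python
import os

# Assumed filenames from models/download_models.sh; check your
# models/ directory if either is reported missing.
EXPECTED_MODELS = [
    "models/VGG_ILSVRC_19_layers.caffemodel",
    "models/VGG_ILSVRC_19_layers_deploy.prototxt",
]

def missing_models(paths, exists=os.path.exists):
    """Return the subset of paths that do not exist on disk.
    The exists predicate is injectable so the check is testable."""
    return [p for p in paths if not exists(p)]
```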
Usage
The most basic invocation is:

```
th neural_style.lua -style_image <image.jpg> -content_image <image.jpg>
```

This outputs a blended image with the default parameters. We can also pass options to change the behavior, covering essentially everything PRISMA does:
Options:
- image_size: Maximum side length (in pixels) of the generated image. Default is 512.
- style_blend_weights: The weight for blending the style of multiple style images, as a comma-separated list, such as -style_blend_weights 3,7. By default all style images are equally weighted.
- gpu: Zero-indexed ID of the GPU to use; for CPU mode set -gpu to -1.
Optimization options:
- content_weight: How much to weight the content reconstruction term. Default is 5e0.
- style_weight: How much to weight the style reconstruction term. Default is 1e2.
- tv_weight: Weight of total-variation (TV) regularization; this helps to smooth the image. Default is 1e-3. Set to 0 to disable TV regularization.
- num_iterations: Number of optimization iterations to run. Default is 1000.
- init: Method for initializing the generated image; one of random or image. Default is random, which uses a noise initialization as in the paper; image initializes with the content image.
- optimizer: The optimization algorithm to use; either lbfgs or adam; default is lbfgs. L-BFGS tends to give better results, but uses more memory. Switching to ADAM will reduce memory usage; when using ADAM you will probably need to play with other parameters to get good results, especially the style weight, content weight, and learning rate; you may also want to normalize gradients when using ADAM.
- learning_rate: Learning rate to use with the ADAM optimizer. Default is 1e1.
- normalize_gradients: If this flag is present, style and content gradients from each layer will be L1 normalized. Idea from andersbll/neural_artistic_style.
Output options:
- output_image: Name of the output image. Default is out.png.
- print_iter: Print progress every print_iter iterations. Set to 0 to disable printing.
- save_iter: Save the image every save_iter iterations. Set to 0 to disable saving intermediate results.
Layer options:
- content_layers: Comma-separated list of layer names to use for content reconstruction. Default is relu4_2.
- style_layers: Comma-separated list of layer names to use for style reconstruction. Default is relu1_1,relu2_1,relu3_1,relu4_1,relu5_1.
Other options:
- style_scale: Scale at which to extract features from the style image. Default is 1.0.
- original_colors: If you set this to 1, then the output image will keep the colors of the content image.
- proto_file: Path to the deploy.txt file for the VGG Caffe model.
- model_file: Path to the .caffemodel file for the VGG Caffe model. Default is the original VGG-19 model; you can also try the normalized VGG-19 model used in the paper.
- pooling: The type of pooling layers to use; one of max or avg. Default is max. The VGG-19 model uses max pooling layers, but the paper mentions that replacing these layers with average pooling layers can improve the results. I haven't been able to get good results using average pooling, but the option is here.
- backend: nn, cudnn, or clnn. Default is nn. cudnn requires cudnn.torch and may reduce memory usage; clnn requires cltorch and clnn.
- cudnn_autotune: When using the cuDNN backend, pass this flag to use the built-in cuDNN autotuner to select the best convolution algorithms for your architecture. This will make the first iteration a bit slower and can take a bit more memory, but may significantly speed up the cuDNN backend.
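With this many flags, invocations get unwieldy fast. A small sketch of a helper that assembles the `th neural_style.lua` command line from keyword arguments (the image filenames in the example are placeholders):

```python
def build_command(content_image, style_image, **options):
    """Assemble a th neural_style.lua command line.
    Keyword arguments map directly to the flags listed above,
    e.g. style_weight=200 becomes -style_weight 200."""
    cmd = ["th", "neural_style.lua",
           "-content_image", content_image,
           "-style_image", style_image]
    for flag, value in sorted(options.items()):
        cmd += ["-" + flag, str(value)]
    return cmd

# Example: smaller output, stronger style, CPU mode.
cmd = build_command("photo.jpg", "painting.jpg",
                    image_size=384, style_weight=200, gpu=-1)
```

Passing the resulting list to `subprocess.run(cmd)` launches the job; sorting the flags keeps the command line deterministic, which makes runs easy to log and reproduce.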
Results
I tested with the two sample images, running 50, 100, and 200 iterations to get different levels of stylization. The results are shown below:
Content Image:
Style Image:
50 iterations:
100 iterations:
200 iterations:
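Runs like the three above can be scripted by varying -num_iterations and writing each result to its own file. A sketch, again with placeholder image names:

```python
def sweep_iterations(content, style, iteration_counts):
    """Yield one neural_style.lua command per iteration count,
    each writing to a distinct output file (out_50.png, ...)."""
    for n in iteration_counts:
        yield ["th", "neural_style.lua",
               "-content_image", content,
               "-style_image", style,
               "-num_iterations", str(n),
               "-output_image", "out_%d.png" % n]

commands = list(sweep_iterations("photo.jpg", "painting.jpg", [50, 100, 200]))
```

Each command can then be run in turn (e.g. via `subprocess.run`), and the intermediate snapshots from -save_iter make it easy to compare how the blend evolves.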
Summary
That is the whole backend workflow: with GCC 5, Torch, loadcaffe, and Neural Style in place, any two images can be blended with a single command. I hope this walkthrough helps you get past the problems you run into.