ICCV 2021: Diverse Image Style Transfer via Invertible Cross-Space Mapping
Diverse Image Style Transfer via Invertible Cross-Space Mapping
Haibo Chen, Lei Zhao*, Huiming Zhang, Zhizhong Wang, Zhiwen Zuo, Ailin Li, Wei Xing*, Dongming Lu
College of Computer Science and Technology, Zhejiang University
[paper]
Contents
Abstract
1. Introduction
3. Approach
3.1. Stylization Branch
3.2. Disentanglement Branch
3.3. Inverse Branch
3.4. Final Objective and Network Architectures
4. Experiments
5. Conclusion
Abstract
Image style transfer aims to transfer the styles of artworks onto arbitrary photographs to create novel artistic images.
Although style transfer is inherently an underdetermined problem, existing approaches usually assume a deterministic solution, thus failing to capture the full distribution of possible outputs.
To address this limitation, we propose a Diverse Image Style Transfer (DIST) framework which achieves significant diversity by enforcing an invertible cross-space mapping.
Specifically, the framework consists of three branches: disentanglement branch, inverse branch, and stylization branch. Among them, the disentanglement branch factorizes artworks into content space and style space; the inverse branch encourages the invertible mapping between the latent space of input noise vectors and the style space of generated artistic images; the stylization branch renders the input content image with the style of an artist. Armed with these three branches, our approach is able to synthesize significantly diverse stylized images without loss of quality.
We conduct extensive experiments and comparisons to evaluate our approach qualitatively and quantitatively. The experimental results demonstrate the effectiveness of our method.
Research focus: transferring the styles of artworks onto arbitrary photographs to create novel artistic images.
Core problem addressed: style transfer is inherently underdetermined, yet existing methods usually assume a deterministic solution and therefore miss the full distribution of possible outputs.
Main approach: the proposed Diverse Image Style Transfer (DIST) framework achieves significant diversity by enforcing an invertible cross-space mapping.
Method in brief: three branches; the disentanglement branch factorizes artworks into content and style spaces, the inverse branch enforces an invertible mapping between the input-noise latent space and the style space of generated images, and the stylization branch renders the content image in an artist's style, together yielding clearly diverse results without quality loss.
Experimental conclusion: extensive qualitative and quantitative experiments and comparisons demonstrate the effectiveness of the method.
1. Introduction
An exquisite artwork can take a diligent artist days or even months to create, which is labor-intensive and time-consuming. Motivated by this, a series of recent approaches have studied the problem of repainting an existing photograph in the style of an artist, using either a single artwork or a collection of artworks. These approaches are known as style transfer. Armed with style transfer techniques, anyone can create artistic images.
Topic: artistic image generation via style transfer. Creating an exquisite artwork by hand can take days or months of labor; style transfer instead repaints an existing photograph in an artist's style, learned from a single artwork or a collection, so that anyone can create artistic images.
How to represent the content and style of an image is the key challenge of style transfer. The seminal work of Gatys et al. [7] first proposed extracting content and style features from an image using pre-trained Deep Convolutional Neural Networks (DCNNs); by separating and recombining the contents and styles of arbitrary images, novel artworks can be created. This work showed the enormous potential of CNNs for style transfer and created a surge of interest in the field. Building on it, a series of subsequent methods have been proposed to achieve better performance in many aspects, including efficiency [13, 21, 34], quality [20, 35, 40, 43, 39, 4], and generalization [6, 5, 10, 24, 30, 27, 22]. However, diversity, another important aspect, has received relatively little attention.
Background and problem statement: representing the content and style of an image is the key challenge; Gatys et al. [7] first extracted content and style features with pre-trained DCNNs, which triggered a wave of follow-up work on efficiency, quality, and generalization, while diversity has received comparatively little attention.
As the saying goes, “There are a thousand Hamlets in a thousand people’s eyes”. Similarly, different people have different understanding and interpretation of the style of an artwork. There is no uniform and quantitative definition of the artistic style of an image. Therefore, the stylization results should be diverse rather than unique, so that the preferences of different people can be satisfied. To put it another way, style transfer is an underdetermined problem, where a large number of solutions can be found. Unfortunately, existing style transfer methods usually assume a deterministic solution. As a result, they fail to capture the full distribution of possible outputs.
Motivation (argued with solid, natural reasons): as the saying goes, "there are a thousand Hamlets in a thousand people's eyes". People understand and interpret an artwork's style differently, and there is no unified, quantitative definition of artistic style, so stylization results should be diverse rather than unique. In other words, style transfer is an underdetermined problem with many valid solutions, yet existing methods usually assume a deterministic one and fail to capture the full output distribution.
A straightforward approach to handle diversity in style transfer is to take random noise vectors along with content images as inputs, i.e., utilizing the variability of the input noise vectors to produce diverse stylization results. However, the network tends to pay more attention to the high-dimensional and structured content images and ignores the noise vectors, leading to deterministic output. To ensure that the variability in the latent space can be passed into the image space, Ulyanov et al. [35] enforced the dissimilarity among generated images by enlarging their distance in the pixel space. Similarly, Li et al. [23] introduced a diversity loss that penalized the feature similarities of different samples in a mini-batch. Although these methods can achieve diversity to some extent, they have obvious limitations.
First, forcibly enlarging the distance among outputs may cause the results to deviate from the local optimum, resulting in the degradation of image quality.
Second, to avoid introducing too many artifacts to the generated images, the weight of the diversity loss is generally set to a small value. Consequently, the diversity of the stylization results is relatively limited.
Third, diversity is more than the pixel distance or feature distance among generated images; it has a richer and more complex connotation. Most recently, Wang et al. [37] achieved better diversity by using an orthogonal noise matrix to perturb the image feature maps while keeping the original style information unchanged. However, this approach is apt to generate distorted results, providing insufficient visual quality. Therefore, the problem of diverse style transfer remains an open challenge.
Technical difficulties (a closer, deeper look at the problem): simply feeding random noise vectors together with the content image fails, because the network attends to the high-dimensional, structured content image and ignores the noise, so the output stays deterministic. Ulyanov et al. [35] and Li et al. [23] force diversity by enlarging pixel-space or feature-space distances among outputs, but (1) forcibly enlarging these distances pushes results away from the local optimum and degrades quality, (2) the diversity-loss weight must be kept small to avoid artifacts, so the achieved diversity is limited, and (3) diversity is a richer notion than mere pixel or feature distance. DFP (Wang et al. [37]) perturbs feature maps with an orthogonal noise matrix and obtains better diversity, but tends to produce distorted results. Diverse style transfer therefore remains an open challenge.
In this paper, we propose a Diverse Image Style Transfer (DIST) framework which achieves significant diversity without loss of quality by enforcing an invertible cross-space mapping. Specifically, the framework takes random noise vectors along with everyday photographs as its inputs, where the former are responsible for style variations and the latter determine the main contents. However, according to the above analyses, the noise vectors are prone to being ignored by the network. Our proposed DIST framework tackles this problem through three branches: a disentanglement branch, an inverse branch, and a stylization branch.
The disentanglement branch factorizes artworks into content space and style space. The inverse branch encourages the invertible mapping between the latent space of input noise vectors and the style space of generated artistic images, which is inspired by [32]. But different from [32], we invert the style information rather than the whole generated image to the input noise vector, since the input noise vector mainly influences the style of the generated image. The stylization branch renders the input content image with the style of an artist. Equipped with these three branches, DIST is able to synthesize significantly diverse stylized images without loss of quality, as shown in Figure 1.
Notes: DIST takes random noise vectors and everyday photographs as inputs; the noise drives style variations and the photograph determines the content. Because the network easily ignores the noise vector, DIST introduces three branches: the disentanglement branch factorizes artworks into content and style spaces; the inverse branch (inspired by [32]) enforces an invertible mapping between the noise latent space and the style space of the generated images, inverting only the style information rather than the whole image since the noise mainly affects style; the stylization branch renders the content image in the artist's style. Together they synthesize clearly different stylizations without degrading quality, as shown in Figure 1.
Overall, the contributions can be summarized as follows:
- We propose a novel style transfer framework which achieves significant diversity by learning a one-to-one mapping between the latent space and the style space.
- Different from existing diverse style transfer methods [35, 23, 37], which obtain diversity at the cost of serious quality degradation, our approach produces stylization results that are both high-quality and diverse.
- Our approach provides a new way to disentangle the style and content of an image.
- We demonstrate the effectiveness and superiority of our approach through extensive comparisons with several state-of-the-art style transfer methods.
In short, the contributions are: a style transfer framework that gains significant diversity from a one-to-one latent-to-style mapping; stylizations that are both high-quality and diverse, unlike [35, 23, 37]; a new way to disentangle the style and content of an image; and extensive comparisons with state-of-the-art methods. [Note: the contribution summary reads somewhat brief.]
3. Approach
Inspired by [29, 17, 18, 33], we learn artistic style not from a single artwork but from a collection of related artworks. Formally, our task can be described as follows: given a collection of photos x ~ X and a collection of artworks y ~ Y (the contents of X and Y can be totally different), we aim to learn a style transformation G : X → Y with significant diversity. To achieve this goal, we propose a DIST framework consisting of three branches: the stylization branch, the disentanglement branch, and the inverse branch. In this section, we introduce the three branches in detail.
3.1. Stylization Branch
The stylization branch aims to repaint x ~ X with the style of y ~ Y. To this end, we enable G to approximate the distribution of Y by employing a discriminator D to train against G: G tries to generate images that resemble the images in Y, while D tries to distinguish the stylized images from the real ones. Joint training of these two networks leads to a generator that is able to produce the desired stylizations. This process can be formulated as follows (note that for G, we adopt an encoder-decoder architecture consisting of an encoder Ec and a decoder):

L_adv = E_{y~Y}[log D(y)] + E_{x~X, z~p(z)}[log(1 - D(G(x, z)))]    (1)

where z ∈ R^{d_z} is a random noise vector, p(z) is the standard normal distribution N(0, I), and G(x, z) denotes the stylized image produced by the decoder from Ec(x) and z. We leverage the variability of z to encourage diversity in the generated images.
Notes (stylization branch): the content photo x is repainted in the style of the artwork collection Y through adversarial training between the generator G (encoder Ec plus decoder) and the discriminator D, as in Eq. (1); the noise vector z ~ N(0, I) is the intended source of variability for producing diverse outputs.
Using only the above adversarial loss cannot preserve the content information of x in the generated image, which does not meet the requirements of style transfer. The simplest remedy is a pixel-wise loss between the content image x ~ X and the stylized image D(Ec(x), z). However, this constraint is too strict and harms the quality of the stylized image. Therefore, we soften it: instead of directly computing the distance between the original images, we first pass both through an average pooling layer P and then compute the distance between the pooled results. We express this content structure loss as:

L_p = E_{x~X, z~p(z)} || P(x) - P(D(Ec(x), z)) ||    (2)
Compared with the pixel-wise loss, which requires the content image and the stylized image to be exactly the same, L_p measures their difference in a more coarse-grained manner and only requires them to be similar in overall content structure, which is more consistent with the goal of style transfer.
Although the stylization branch is sufficient to obtain remarkable stylized images, it can only produce a deterministic stylized image without diversity, because the network tends to ignore the random noise vector z.
Notes: a raw pixel-wise content loss is too strict and hurts quality, so the content structure loss L_p of Eq. (2) compares average-pooled versions of the content image and the stylized image, demanding only coarse structural agreement. Even so, the stylization branch alone tends to ignore the random noise vector z and therefore produces a single deterministic stylization.
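To make the two stylization-branch objectives concrete, here is a minimal PyTorch-style sketch (not the authors' code): it assumes a standard non-saturating BCE formulation of the adversarial loss in Eq. (1), a 16×16 average-pooling window, and an MSE distance for Eq. (2); the actual choices in the paper may differ.

```python
import torch
import torch.nn.functional as F

def gan_losses(discriminator, real_art, stylized):
    # Adversarial objective of the stylization branch (Eq. 1): the discriminator tries to
    # separate real artworks y ~ Y from stylized images, while the generator tries to fool it.
    real_logits = discriminator(real_art)
    fake_logits = discriminator(stylized.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    gen_logits = discriminator(stylized)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss

def content_structure_loss(content_img, stylized, pool_size=16):
    # Content structure loss (Eq. 2): compare average-pooled images rather than raw pixels,
    # so the stylized image only has to match the coarse layout of the content image.
    pooled_content = F.avg_pool2d(content_img, kernel_size=pool_size)
    pooled_stylized = F.avg_pool2d(stylized, kernel_size=pool_size)
    return F.mse_loss(pooled_content, pooled_stylized)
```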
3.2. Disentanglement Branch
[32] alleviated the mode collapse issue in GANs by enforcing a bijective mapping between the input noise vectors and the generated images. Different from [32], which only takes noise vectors as inputs, our model takes noise vectors along with content images as inputs, where the former are responsible for style variations and the latter determine the main contents. Therefore, in the inverse process, instead of inverting the whole generated image to the input noise vector as [32] does, we invert the style information of the stylized image to the input noise vector (details in Section 3.3). Specifically, we utilize a style encoder to extract the style information from the stylized image, and enforce consistency between the style encoder's output and the input noise vector. The main problem now is how to obtain such a style encoder; we resolve it through the disentanglement branch.
Why a disentanglement branch is needed: inverting only the style of the stylized image back to the noise vector (Section 3.3) requires a style encoder that can pull style information out of a stylized image, and the disentanglement branch is how that style encoder is obtained.
First, the disentanglement branch employs an encoder E'c which takes the stylized image D(Ec(x), z) as input. Given that the content image and the stylized image share the same content but differ greatly in style, if we encourage similarity between the output of Ec (whose input is the content image) and that of E'c (whose input is the stylized image), then Ec and E'c will extract the shared content information and neglect the style-specific information. Notice that Ec and E'c are two independent networks and do not share weights, because extracting the contents of photographs differs somewhat from extracting the contents of artworks. We define the corresponding content feature loss as
L_FP = E_{x~X, z~p(z)} || Ec(x) - E'c(D(Ec(x), z)) ||    (3)
Notes: E'c encodes the stylized image; encouraging its output to match Ec(x) (Eq. (3)) drives both encoders to keep the shared content and drop the style. Ec and E'c do not share weights, since extracting content from photographs and from artworks differs slightly.
However, L_FP may encourage Ec and E'c to output feature maps in which the value of each element is pretty small (i.e., ||Ec(x)|| → 0, ||E'c(D(Ec(x), z))|| → 0). In such a circumstance, although L_FP is minimized, the similarity between Ec(x) and E'c(D(Ec(x), z)) is not increased. To alleviate this problem, we employ a feature discriminator Df and introduce a content feature adversarial loss,
L_cadv = E_{x~X}[log Df(Ec(x))] + E_{x~X, z~p(z)}[log(1 - Df(E'c(D(Ec(x), z))))]    (4)
L_cadv measures a deviation between distributions and is less sensitive to the magnitude of its inputs than L_FP. In addition, L_cadv together with L_FP promotes similarity along two complementary dimensions, further improving performance.
Notes: L_FP alone could be minimized trivially by shrinking both feature maps toward zero without making them genuinely similar; the feature discriminator Df and the adversarial loss L_cadv of Eq. (4) instead match the two feature distributions, and combined with L_FP they enforce similarity from two complementary angles.
Then the disentanglement branch adopts another encoder Es which, together with the content encoder E'c and the decoder D, reconstructs the artistic image. Since E'c is constrained to extract the content information, Es has to extract the style information to reconstruct the artistic image; we thereby obtain the desired style encoder Es. We formulate the reconstruction loss as
L_recon = E_{y~Y} || D(E'c(y), Es(y)) - y ||    (5)
Notes: reconstructing the artwork y from E'c(y) and Es(y) (Eq. (5)) forces Es to carry exactly the style information that the content encoder discards, which yields the style encoder needed by the inverse branch.
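The disentanglement-branch objectives can be sketched as follows (again a hedged illustration rather than the released implementation): the L1 distances in Eqs. (3) and (5), the BCE form of Eq. (4), and the choice of treating Ec(x) features as the "real" side for the feature discriminator Df are all assumptions.

```python
import torch
import torch.nn.functional as F

def content_feature_loss(Ec, Ec_prime, content_img, stylized):
    # L_FP (Eq. 3): the two content encoders should agree on the shared content of the
    # content image and its stylization, so their feature maps are pulled together.
    return F.l1_loss(Ec_prime(stylized), Ec(content_img))

def content_feature_adv_losses(Df, content_feat, stylized_feat):
    # L_cadv (Eq. 4): a feature discriminator Df aligns the distributions of the two
    # encoders' outputs, which prevents L_FP from being satisfied by near-zero features.
    real_logits = Df(content_feat.detach())
    fake_logits = Df(stylized_feat.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    gen_logits = Df(stylized_feat)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss

def reconstruction_loss(decoder, Ec_prime, Es, artwork):
    # L_recon (Eq. 5): decoding E'c(y) together with Es(y) should reproduce the artwork y,
    # which forces Es to carry exactly the style information that E'c discards.
    return F.l1_loss(decoder(Ec_prime(artwork), Es(artwork)), artwork)
```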
3.3. Inverse Branch
Armed with the style encoder Es, we can access the style space of artistic images. To achieve diversity, the inverse branch enforces the one-to-one mapping between latent space and style space by employing the inverse loss,
L_inv = E_{x~X, z~p(z)} || Es(D(Ec(x), z)) - z ||    (6)
The inverse loss ensures that the style information of the generated image D(Ec(x), z) can be inverted to the corresponding noise vector z, which implies that D(Ec(x), z) retains the influence and variability of z. In this way, we can obtain diverse stylization results by randomly sampling different z from the standard normal distribution N(0, I).
Notes: with the style encoder Es we gain access to the style space of artistic images; the inverse loss of Eq. (6) maps the style of the generated image back to the noise vector z that produced it, so the generator cannot ignore z, and sampling different z ~ N(0, I) yields genuinely different stylizations.
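A minimal sketch of the inverse loss and of inference-time sampling, under the same caveats (the L1 distance and the latent dimension are assumptions; the excerpt does not state d_z):

```python
import torch
import torch.nn.functional as F

def inverse_loss(Es, stylized, z):
    # L_inv (Eq. 6): the style encoder must recover the noise vector that produced the
    # stylization, enforcing a one-to-one mapping between latent codes and style codes.
    return F.l1_loss(Es(stylized), z)

@torch.no_grad()
def sample_diverse_stylizations(Ec, decoder, content_img, num_samples=5, z_dim=64):
    # Inference-time diversity: re-sample z ~ N(0, I) for a fixed content image.
    # z_dim is a placeholder; the excerpt does not specify d_z.
    content_feat = Ec(content_img)
    return [decoder(content_feat,
                    torch.randn(content_img.size(0), z_dim, device=content_img.device))
            for _ in range(num_samples)]
```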
3.4. Final Objective and Network Architectures
Figure 2 illustrates the full pipeline of our approach. We summarize all the aforementioned losses and obtain the compound loss

L_total = λ_adv L_adv + λ_p L_p + λ_fp L_FP + λ_cadv L_cadv + λ_recon L_recon + λ_inv L_inv,

where the hyper-parameters λ_adv, λ_p, λ_fp, λ_cadv, λ_recon, and λ_inv control the importance of each term. We use the compound loss as the final objective to train our model.
Notes: Figure 2 shows the full pipeline; the final training objective is the weighted sum of the six losses, with the λ weights balancing the individual terms.
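As a tiny illustration (a hypothetical helper, not from the paper), the compound objective is just a weighted sum of the six terms, using the weights reported later in this section:

```python
def total_loss(losses):
    # 'losses' maps each term name to its scalar loss tensor; weights follow Sec. 3.4.
    weights = {"adv": 2.0, "p": 150.0, "fp": 100.0, "cadv": 10.0, "recon": 200.0, "inv": 600.0}
    return sum(weights[name] * losses[name] for name in weights)
```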
Network Architectures
We build on the recent AST backbone [29] and extend it with our proposed changes to produce diverse stylization results. Specifically, the content encoders Ec and E'c have the same architecture and are composed of five convolution layers. The style encoder Es includes five convolution layers, a global average pooling layer, and a fully connected (FC) layer. Similar to [15], our decoder D has two branches. One branch takes the content image x as input and contains nine residual blocks [9], four upsampling blocks, and one convolution layer. The other branch takes the noise vector z as input (notice that at inference time, we can feed either z or the style code Es(y) extracted from a reference image y) and contains one FC layer that produces a set of affine parameters γ, β. The two branches are then combined through AdaIN [10]:

AdaIN(a, γ, β) = γ * (a - μ(a)) / σ(a) + β,

where a is the activation of the previous convolutional layer in the first branch, and μ and σ are the channel-wise mean and standard deviation, respectively. The image discriminator D is a fully convolutional network with seven convolution layers. The feature discriminator Df consists of three convolution layers and one FC layer. P is an average pooling layer. The loss weights are set to λ_adv = 2, λ_p = 150, λ_fp = 100, λ_cadv = 10, λ_recon = 200, and λ_inv = 600. We use the Adam optimizer with a learning rate of 0.0002.
Notes: the network builds on the AST backbone [29] and extends it with the proposed changes to produce diverse stylization results.
[29] A style-aware content loss for real-time hd style transfer. ECCV 2018.
Notes: Ec and E'c share the same architecture (five convolution layers); Es has five convolution layers, a global average pooling layer, and an FC layer. The decoder has two branches: one takes the content image x (nine residual blocks, four upsampling blocks, one convolution layer), the other takes the noise vector z, or at inference time a style code Es(y) extracted from a reference image, through an FC layer that outputs the AdaIN parameters γ and β; the two branches are merged with AdaIN as above, where a is the activation of the previous convolutional layer in the first branch and μ, σ are the channel-wise mean and standard deviation. The image discriminator D is a seven-layer fully convolutional network, the feature discriminator Df has three convolution layers plus an FC layer, and P is an average pooling layer. Loss weights: λ_adv = 2, λ_p = 150, λ_fp = 100, λ_cadv = 10, λ_recon = 200, λ_inv = 600; training uses the Adam optimizer with a learning rate of 0.0002.
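For reference, here is a compact AdaIN module in PyTorch consistent with the formula above; the StyleAffine helper that produces γ and β from z is a hypothetical illustration of the FC branch, not the authors' exact layer.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization: normalize the content activation channel-wise,
    then re-scale and re-shift it with affine parameters predicted from z (or Es(y))."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, a, gamma, beta):
        # a: (B, C, H, W) activation from the content branch
        # gamma, beta: (B, C, 1, 1) affine parameters from the FC layer on z
        mu = a.mean(dim=(2, 3), keepdim=True)
        sigma = a.std(dim=(2, 3), keepdim=True) + self.eps
        return gamma * (a - mu) / sigma + beta

class StyleAffine(nn.Module):
    """Hypothetical FC branch: maps the noise vector z to per-channel gamma and beta."""
    def __init__(self, z_dim, num_channels):
        super().__init__()
        self.fc = nn.Linear(z_dim, 2 * num_channels)

    def forward(self, z):
        gamma, beta = self.fc(z).chunk(2, dim=1)
        return gamma[..., None, None], beta[..., None, None]
```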
4. Experiments
Dataset
Like [29, 17, 18, 33], we take Places365 [45] as the content dataset and WikiArt [14] as the style dataset (concretely, we collect hundreds of artworks for each artist from WikiArt and train a separate model per artist). Training images are randomly cropped and resized to 768×768 resolution.
Notes: the content set is Places365 [45] and the style set is WikiArt [14]; a separate model is trained per artist on several hundred of that artist's works, with training images randomly cropped and resized to 768×768.
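A possible torchvision preprocessing pipeline matching this description (the crop size before resizing is not specified in the excerpt, so the value below is only a placeholder):

```python
from torchvision import transforms

# Random crop followed by resizing to 768x768, as described for the training images.
train_transform = transforms.Compose([
    transforms.RandomCrop(1024, pad_if_needed=True),  # crop size is a placeholder assumption
    transforms.Resize((768, 768)),
    transforms.ToTensor(),
])
```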
Baselines
We take the following methods that can produce diversity as our baselines: Gatys et al. [7], Li et al. [23], Ulyanov et al. [35], DFP [37], and MUNIT [11]. Apart from above methods, we also compare with AST [29] and Svoboda et al. [33] to make the experiments more sufficient. Note that we use their officially released codes and default settings of hyper-parameters for experiments.
Notes: the diversity-capable baselines are Gatys et al. [7], Li et al. [23], Ulyanov et al. [35], DFP [37], and MUNIT [11]; AST [29] and Svoboda et al. [33] are also compared to make the experiments more complete, all using their officially released code and default hyper-parameter settings.
[7] Image style transfer using convolutional neural networks. CVPR 2016.
[11] Multimodal unsupervised image-to-image translation. ECCV 2018.
[23] Diversified texture synthesis with feed-forward networks. CVPR 2017.
[29] A style-aware content loss for real-time hd style transfer. ECCV 2018.
[33] Two-stage peer-regularized feature recombination for arbitrary image style transfer. CVPR 2020. GitHub
[35] Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. CVPR 2017.
[37] Diversified arbitrary style transfer via deep feature perturbation. CVPR 2020. GitHub
Experimental results: (the qualitative and quantitative comparison figures from the paper are not reproduced here.)
5. Conclusion
In this paper, we propose a Diverse Image Style Transfer (DIST) framework which achieves significant diversity without loss of quality by encouraging a one-to-one mapping between the latent space of input noise vectors and the style space of generated artistic images. The framework consists of three branches: the stylization branch is responsible for stylizing the content image, while the other two branches (the disentanglement branch and the inverse branch) are responsible for diversity. Our extensive experimental results demonstrate the effectiveness and superiority of our method. In future work, we would like to extend our method to other tasks, such as text-to-image synthesis and image inpainting.
Notes: DIST achieves significant diversity without quality loss by enforcing a one-to-one mapping between the input-noise latent space and the style space of generated artworks; the stylization branch handles stylization, the disentanglement and inverse branches supply the diversity, and future work may extend the method to text-to-image synthesis and image inpainting.
- 下一篇: VS2005 .vs. Orcas