2021-06-03 [Paper Notes] Cross-domain Correspondence Learning for Exemplar-based Image Translation

Published: 2023/12/8

Paper title: Cross-domain Correspondence Learning for Exemplar-based Image Translation

Project page: https://panzhang0212.github.io/CoCosNet/

Paper link: https://arxiv.org/abs/2004.05571

Code: https://github.com/microsoft/CoCosNet

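Abstract

This paper presents a general framework for image translation that synthesizes a photo-realistic image from an input semantic image. Unlike conventional approaches, the framework also takes an exemplar image and renders the output in the style of that exemplar. The exemplar places additional constraints on the output and also supplies more information.
The framework consists of two parts: a cross-domain correspondence network, which establishes semantic correspondence across domains, and a translation network, which synthesizes the output image. Traditional correspondence methods only handle relations between natural images and cannot deal with cross-domain images, whereas this framework can.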

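Cross-domain correspondence network:
The network first establishes the correspondence between the input and the exemplar image, which lie in different domains, and warps the exemplar accordingly so that its semantics align with the input. Concretely, images from both domains are mapped into a shared intermediate domain, the correspondence is found there, and the exemplar image is warped based on it.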

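The input image $x_A$ belongs to domain A and the exemplar image $y_B$ belongs to domain B. The authors pass $x_A$ and $y_B$ through a feature pyramid network (FPN) to extract features and map them to $x_S$ and $y_S$ in the intermediate domain S, i.e. $x_{S}=\mathcal{F}_{A \rightarrow S}\left(x_{A} ; \theta_{\mathcal{F}}\right)$ and $y_{S}=\mathcal{F}_{B \rightarrow S}\left(y_{B} ; \theta_{\mathcal{F}}\right)$,
where $\theta_{\mathcal{F}}$ denotes the parameters to be learned.
The loss for this step is: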

$\mathcal{L}_{\text {domain }}^{\ell_{1}}=\left\|\mathcal{F}_{A \rightarrow S}\left(x_{A}\right)-\mathcal{F}_{B \rightarrow S}\left(x_{B}\right)\right\|_{1}$

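Here $x_B$ is the counterpart of $x_A$ in domain B. The two images lie in different domains but carry the same semantics, so their representations should be aligned as closely as possible once mapped into S. The loss is therefore the $\ell_1$ distance between the two mappings in S, and training minimizes it.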
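The alignment loss can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature extractors are taken as already applied, the function name `domain_l1_loss` and the toy arrays are hypothetical, and averaging the $\ell_1$ norm over elements is a choice made here rather than something the notes specify.

```python
import numpy as np

def domain_l1_loss(feat_a, feat_b):
    """Mean absolute difference between two S-domain feature maps.

    feat_a stands in for F_{A->S}(x_A), feat_b for F_{B->S}(x_B).
    Averaging (instead of summing) the L1 norm is an assumption made here.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    return float(np.mean(np.abs(feat_a - feat_b)))

rng = np.random.default_rng(0)
x_s = rng.normal(size=(8, 4, 4))    # hypothetical (channels, height, width) features
x_b_s = x_s + 0.1                   # counterpart features, slightly misaligned
print(domain_l1_loss(x_s, x_s))     # perfectly aligned features give zero loss
print(domain_l1_loss(x_s, x_b_s))   # a constant offset shows up directly in the loss
```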

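Once $x_A$ and $y_B$ have both been mapped into domain S, a correlation matrix between them is computed in S, and the most relevant pixels of $y_B$ are then selected through softmax weighting. Here $\hat{x}_{S}(u)$ and $\hat{y}_{S}(v)$ denote the channel-wise centered features at positions $u$ and $v$, and $\alpha$ controls the sharpness of the softmax.
$\mathcal{M}(u, v)=\frac{\hat{x}_{S}(u)^{T} \hat{y}_{S}(v)}{\left\|\hat{x}_{S}(u)\right\|\left\|\hat{y}_{S}(v)\right\|}$

$r_{y \rightarrow x}(u)=\sum_{v} \operatorname{softmax}_{v}(\alpha \mathcal{M}(u, v)) \cdot y_{B}(v)$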
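The two formulas above can be tried out on toy data. The sketch below is a minimal NumPy version, assuming features are flattened to one row per spatial position and that the hat means subtracting the per-position mean over channels; the function name `warp_exemplar`, the default `alpha`, and the one-hot toy features are all illustrative and not taken from the CoCosNet code.

```python
import numpy as np

def warp_exemplar(x_s, y_s, y_b, alpha=100.0):
    """Warp exemplar values y_b towards the layout of the input.

    x_s: (N, C) S-domain features of the input, one row per position u.
    y_s: (N, C) S-domain features of the exemplar, one row per position v.
    y_b: (N, D) exemplar values (e.g. colors) at the same positions v.
    alpha: softmax sharpness; the default is an illustrative assumption.
    """
    # Center over channels, then L2-normalize, so the dot product is a cosine.
    x_hat = x_s - x_s.mean(axis=1, keepdims=True)
    y_hat = y_s - y_s.mean(axis=1, keepdims=True)
    x_hat = x_hat / (np.linalg.norm(x_hat, axis=1, keepdims=True) + 1e-8)
    y_hat = y_hat / (np.linalg.norm(y_hat, axis=1, keepdims=True) + 1e-8)
    corr = x_hat @ y_hat.T                       # M(u, v), shape (N, N)

    # Numerically stable softmax over v for every input position u.
    logits = alpha * corr
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    return weights @ y_b, corr                   # r_{y->x} and M

y_s = np.eye(4)                  # four exemplar positions with one-hot toy features
perm = [2, 0, 3, 1]
x_s = y_s[perm]                  # input position u matches exemplar position perm[u]
y_b = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [128, 128, 128]], dtype=float)
warped, corr = warp_exemplar(x_s, y_s, y_b)
print(np.round(warped))          # rows of y_b reordered to follow the input layout
```

With a large `alpha` the softmax is nearly one-hot, so each input position copies its best-matching exemplar pixel; a smaller `alpha` would blend several candidates.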

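The warp is regularized with a cycle-consistency loss, which requires that warping the warped exemplar back recovers the original exemplar: $\mathcal{L}_{r e g}=\left\|r_{y \rightarrow x \rightarrow y}-y_{B}\right\|_{1}$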
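A rough sketch of this regularizer follows. The notes do not spell out how $r_{y \rightarrow x \rightarrow y}$ is built, so the code assumes the backward warp reuses the same correlation matrix with the softmax taken over $u$ instead of $v$; the names `softmax` and `cycle_reg_loss`, the value of `alpha`, and the mean reduction are illustrative choices.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cycle_reg_loss(corr, y_b, alpha=100.0):
    """Mean L1 distance between the exemplar and its round-trip warp.

    corr: (N, N) correlation matrix M(u, v); y_b: (N, D) exemplar values.
    """
    forward = softmax(alpha * corr, axis=1) @ y_b         # r_{y->x}(u), softmax over v
    backward = softmax(alpha * corr, axis=0).T @ forward  # r_{y->x->y}(v), softmax over u
    return float(np.mean(np.abs(backward - y_b)))

perm = [2, 0, 3, 1]
sharp = np.eye(4)[perm]                  # input position u matches exemplar position perm[u]
flat = np.zeros((4, 4))                  # no position is preferred over any other
y_b = np.arange(12, dtype=float).reshape(4, 3)
print(cycle_reg_loss(sharp, y_b))        # close to zero: the round trip recovers y_b
print(cycle_reg_loss(flat, y_b))         # clearly positive: the round trip blurs y_b
```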

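Translation network:
This network synthesizes the output image from the warped exemplar image. Starting from a fixed constant $z$, it progressively injects the style information of the warped exemplar through a sequence of convolutional layers.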

$\alpha_{h, w}^{i}\left(r_{y \rightarrow x}\right) \times \frac{F_{c, h, w}^{i}-\mu_{h, w}^{i}}{\sigma_{h, w}^{i}}+\beta_{h, w}^{i}\left(r_{y \rightarrow x}\right)$

$\alpha^{i}, \beta^{i}=\mathcal{T}_{i}\left(r_{y \rightarrow x} ; \theta_{\mathcal{T}}\right)$
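The normalization formula can be sketched as follows. The code assumes that $\mu$ and $\sigma$ are computed across channels at each spatial position (which is what the $h, w$ subscripts suggest) and that the `alpha` and `beta` maps are already given; in the method they are predicted by $\mathcal{T}_i$ from the warped exemplar, which this sketch does not model. The function name and the `eps` term are illustrative.

```python
import numpy as np

def spatially_adaptive_denorm(feat, alpha, beta, eps=1e-5):
    """Normalize each position over channels, then modulate per position.

    feat: (C, H, W) activations F^i of one layer.
    alpha, beta: (H, W) scale and shift maps (assumed given here).
    """
    mu = feat.mean(axis=0, keepdims=True)       # mean over channels, per position
    sigma = feat.std(axis=0, keepdims=True)     # std over channels, per position
    normalized = (feat - mu) / (sigma + eps)    # eps avoids division by zero
    return alpha[None] * normalized + beta[None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(6, 3, 3))
alpha = np.full((3, 3), 2.0)
beta = np.full((3, 3), 0.5)
out = spatially_adaptive_denorm(feat, alpha, beta)
print(out.shape)
print(out.mean(axis=0))    # the per-position mean over channels equals beta
```

Because the normalized activations have zero mean at every position, `beta` alone sets the local mean and `alpha` sets the local spread, which is how the style of the warped exemplar can be injected position by position.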

The final generated image is:

$\hat{x}_{B}=\mathcal{G}\left(z, \mathcal{T}_{i}\left(r_{y \rightarrow x} ; \theta_{\mathcal{T}}\right) ; \theta_{\mathcal{G}}\right)$

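The final network has seven such layers and produces the output image.
Some additional loss functions:
The first is the loss for pseudo exemplar pairs. $x_B$ serves as the ground truth, and $x_B'$ is a deformed version of $x_B$ that keeps the image content unchanged (for example, a flip). If $x_B'$ is used as the exemplar image and $x_A$ as the input, the generated image should be close to $x_B$. The loss is therefore: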

$\mathcal{L}_{\text {feat }}=\sum_{l} \lambda_{l}\left\|\phi_{l}\left(\mathcal{G}\left(x_{A}, x_{B}^{\prime}\right)\right)-\phi_{l}\left(x_{B}\right)\right\|_{1}$
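As a small illustration of this weighted sum, the sketch below treats $\phi_l$ as the layer-$l$ activations of some fixed feature network and assumes those activations have already been computed. The function name, the toy activations, the layer weights, and the mean reduction are all hypothetical.

```python
import numpy as np

def feature_matching_loss(feats_generated, feats_target, layer_weights):
    """Weighted sum over layers of the mean absolute feature difference.

    feats_generated[l] stands in for phi_l(G(x_A, x_B')),
    feats_target[l] for phi_l(x_B), and layer_weights[l] for lambda_l.
    """
    total = 0.0
    for f_gen, f_tgt, weight in zip(feats_generated, feats_target, layer_weights):
        total += weight * float(np.mean(np.abs(f_gen - f_tgt)))
    return total

# Two hypothetical layers with different shapes.
gen = [np.zeros((2, 2)), np.zeros(3)]
tgt = [np.ones((2, 2)), 2.0 * np.ones(3)]
print(feature_matching_loss(gen, tgt, [0.5, 0.25]))   # 0.5 * 1 + 0.25 * 2
print(feature_matching_loss(tgt, tgt, [0.5, 0.25]))   # identical features give zero
```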

The second is the exemplar translation loss, which has two terms: a perceptual loss and a contextual loss.
Perceptual loss:

$\mathcal{L}_{\text {perc }}=\left\|\phi_{l}\left(\hat{x}_{B}\right)-\phi_{l}\left(x_{B}\right)\right\|_{1}$

Contextual loss:

$\mathcal{L}_{\text {context }}=\sum_{l} \omega_{l}\left[-\log \left(\frac{1}{n_{l}} \sum_{i} \max _{j} A^{l}\left(\phi_{i}^{l}\left(\hat{x}_{B}\right), \phi_{j}^{l}\left(y_{B}\right)\right)\right)\right]$
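The structure of the contextual loss (best match per feature, average, negative log, weighted sum over layers) can be sketched without committing to a particular affinity. The notes do not define $A^l$, so the code below takes a precomputed affinity matrix per layer as input; the function names and the toy values are hypothetical.

```python
import numpy as np

def contextual_term(affinity):
    """-log of the average best-match affinity for one layer.

    affinity[i, j] stands in for A^l(phi_i^l(x_hat_B), phi_j^l(y_B)),
    assumed to lie in (0, 1]; how it is computed is left open here.
    """
    best_per_feature = affinity.max(axis=1)     # max over j for each feature i
    return float(-np.log(best_per_feature.mean()))

def contextual_loss(affinities, layer_weights):
    """Weighted sum of the per-layer terms; layer_weights[l] stands in for omega_l."""
    return sum(w * contextual_term(a) for w, a in zip(layer_weights, affinities))

perfect = np.ones((3, 3))                            # every feature has a perfect match
partial = np.array([[0.5, 0.25], [0.125, 0.5]])      # best matches are only 0.5
print(contextual_term(perfect))                      # zero loss for perfect matches
print(contextual_term(partial))                      # equals log(2)
print(contextual_loss([partial, perfect], [2.0, 1.0]))
```

Because only the best match of each generated feature counts, the loss rewards the output for reusing exemplar statistics without requiring pixel-aligned correspondence.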

Finally, the adversarial loss, where $h(t)=\min (0,-1+t)$ is the hinge function:

$\mathcal{L}_{a d v}^{\mathcal{D}}=-\mathbb{E}\left[h\left(\mathcal{D}\left(y_{B}\right)\right)\right]-\mathbb{E}\left[h\left(-\mathcal{D}\left(\mathcal{G}\left(x_{A}, y_{B}\right)\right)\right)\right]$

$\mathcal{L}_{a d v}^{\mathcal{G}}=-\mathbb{E}\left[\mathcal{D}\left(\mathcal{G}\left(x_{A}, y_{B}\right)\right)\right]$
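A minimal sketch of the two adversarial terms is given below. It assumes the hinge $h(t)=\min(0,-1+t)$ and, following the standard hinge GAN formulation, negates the discriminator score of generated images inside $h$. The discriminator outputs are plain arrays of hypothetical scores, and the expectation is replaced by a batch mean.

```python
import numpy as np

def h(t):
    """Hinge function h(t) = min(0, -1 + t)."""
    return np.minimum(0.0, -1.0 + t)

def discriminator_loss(d_real, d_fake):
    """Penalizes real scores below +1 and generated scores above -1."""
    return float(-np.mean(h(d_real)) - np.mean(h(-d_fake)))

def generator_loss(d_fake):
    """The generator tries to raise the discriminator score of its outputs."""
    return float(-np.mean(d_fake))

confident_real = np.array([2.0, 3.0])     # hypothetical scores D(y_B)
confident_fake = np.array([-2.0, -3.0])   # hypothetical scores D(G(x_A, y_B))
undecided = np.array([0.0])

print(discriminator_loss(confident_real, confident_fake))  # margins satisfied, zero loss
print(discriminator_loss(undecided, undecided))            # both margins violated by 1
print(generator_loss(np.array([0.5, 1.5])))
```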

The overall objective is:

$\begin{aligned} \mathcal{L}_{\theta}=\min _{\mathcal{F}, \mathcal{T}, \mathcal{G}} & \max _{\mathcal{D}} \psi_{1} \mathcal{L}_{\text {feat }}+\psi_{2} \mathcal{L}_{\text {perc }}+\psi_{3} \mathcal{L}_{\text {context }} \\ &+\psi_{4} \mathcal{L}_{a d v}^{\mathcal{G}}+\psi_{5} \mathcal{L}_{\text {domain }}^{\ell_{1}}+\psi_{6} \mathcal{L}_{\text {reg }}\end{aligned}$

Experiments
Comparison of generated images:

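Cross-domain correlation
The correlation matrix can be used to compute the correspondence between points in the input semantic image and points in the exemplar (style reference) image.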
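One simple way to read point correspondences off a correlation matrix is to take, for each input position, the exemplar position with the highest correlation. The sketch below does this for a toy matrix; the row-major flattening of exemplar positions, the function name `best_matches`, and the numbers are assumptions made for illustration.

```python
import numpy as np

def best_matches(corr, exemplar_width):
    """For each input position u, return (row, col) of its best exemplar match.

    corr: (N_input, N_exemplar) correlation matrix, exemplar positions
    flattened in row-major order (an assumption of this sketch).
    """
    flat_index = corr.argmax(axis=1)                    # best v for each u
    rows, cols = np.divmod(flat_index, exemplar_width)  # back to 2D coordinates
    return np.stack((rows, cols), axis=1)

# Two input positions compared against a 2x2 exemplar (4 flattened positions).
corr = np.array([[0.1, 0.9, 0.2, 0.0],
                 [0.0, 0.3, 0.1, 0.8]])
matches = best_matches(corr, exemplar_width=2)
print(matches)   # first input point maps to (0, 1), second to (1, 1)
```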

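Image editing
Given an image and its corresponding mask, the semantic mask is edited, and the original image is then used as the exemplar (style reference) image, so that the output follows the edited mask while keeping the appearance of the original.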

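Limitations

When two cars of different colors in the exemplar image both correspond to the car in the input, the method may produce mixed-color artifacts that do not look realistic. In addition, in the many-to-one mapping case (second row), multiple instances (the pillows in the figure) may all receive the same style.

Furthermore, computing the correlation matrix and related quantities is very GPU-memory intensive, which makes the method hard to apply to high-resolution images.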
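A back-of-the-envelope calculation shows why memory grows so quickly: a dense correlation matrix between two feature maps of $H \times W$ positions each has $(HW)^2$ entries. The sketch below assumes 32-bit floats and square feature maps of a few hypothetical sizes; it counts only the matrix itself, not gradients or other activations.

```python
def correlation_matrix_bytes(height, width, bytes_per_value=4):
    """Memory for a dense (H*W) x (H*W) correlation matrix, float32 by default."""
    positions = height * width
    return positions * positions * bytes_per_value

for side in (64, 128, 256):   # hypothetical feature-map resolutions
    gib = correlation_matrix_bytes(side, side) / 2**30
    print(f"{side}x{side} feature map: {gib:.4g} GiB")
```

Doubling the side length multiplies the matrix size by 16, so the cost rises with the fourth power of the resolution.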
