Summary of a Super-Resolution Survey Paper
Table of Contents
- Paper: A Deep Journey into Super-resolution: A Survey
- Paper Overview
- Background
- SISR Taxonomy
- Experimental Evaluation
- Future Directions
Paper: A Deep Journey into Super-resolution: A Survey
Authors: Saeed Anwar, Salman Khan, and Nick Barnes
Paper Overview
- Paper summary:
The survey compares nearly 30 state-of-the-art super-resolution convolutional networks on six datasets (three classical, three recently proposed) to set benchmarks for SISR, and groups the methods into nine categories. It also provides comparisons in terms of network complexity, memory footprint, model inputs and outputs, training details, types of network losses, and important architectural differences.
- SISR application areas:
- large computer displays
- HD television sets
- hand-held devices (mobile phones, tablets, cameras, etc.)
- object detection in scenes (particularly small objects)
- face recognition in surveillance videos
- medical imaging
- improving interpretation of images in remote sensing
- astronomical images
- forensics
- Super-resolution is a classic problem, yet for several reasons it remains a challenging and open research topic in computer vision.
Reasons:
- SR is an ill-posed inverse problem (multiple solutions exist for the same low-resolution image; to constrain the solution space, reliable prior information is typically required)
- the complexity of the problem increases as the up-scaling factor increases ($\times2$, $\times4$, $\times8$ become progressively harder)
- assessment of the quality of the output is not straightforward (quality metrics such as PSNR and SSIM are only loosely related to human perception)
- Applications of DL in other AI areas:
- object classification and detection
- natural language processing
- image processing
- audio signal processing
- Contributions of this paper:
- a comprehensive review of the latest techniques for super-resolution
- a new taxonomy based on structural differences among the various super-resolution algorithms
- a comprehensive analysis based on the number of parameters, algorithmic settings, training details, and important architectural innovations
- a systematic evaluation of the algorithms on six SISR datasets
- a discussion of the current challenges in super-resolution and of future research directions
Background
- Degradation Process:
$$y = \Phi(x; \theta_\eta) \tag{1}$$
$x$: HR image
$y$: LR image
$\Phi$: degradation function
$\theta_\eta$: degradation parameters (scaling factor, noise). In practice, only $y$ is available; neither the degradation process nor the degradation parameters are known. Super-resolution tries to undo the degradation effects and obtain an estimate $\hat{x}$ that approximates the true HR image $x$:
$$\hat{x} = \Phi^{-1}(y, \theta_\varsigma) \tag{2}$$
$\theta_\varsigma$: parameters of $\Phi^{-1}$
The degradation process is unknown and highly complex; it is affected by many factors, e.g., noise (sensor and speckle), compression, blur (defocus and motion), and other artifacts. Therefore, most research works prefer the following degradation model over (1):
$$y = (x \otimes k) \downarrow_s + n \tag{3}$$
$k$: blurring kernel
$x \otimes k$: convolution operation
$\downarrow_s$: downsampling operation with a scaling factor $s$
$n$: additive white Gaussian noise (AWGN) with a standard deviation of $\sigma$ (noise level). The goal of image super-resolution is to minimize the data fidelity term associated with the model $y = x \otimes k + n$, as follows:
$$J(\hat{x}, \theta_\varsigma, k) = \underbrace{\|x \otimes k - y\|}_{\text{data fidelity term}} + \underbrace{\alpha \Psi(x, \theta_\varsigma)}_{\text{regularizer}}$$
$\alpha$: coefficient balancing the data fidelity term and the image prior $\Psi(\cdot)$
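As a concrete illustration of the degradation model in Eq. (3), here is a minimal PyTorch sketch (the Gaussian kernel, scale factor, and noise level are illustrative assumptions, not values from the survey) that blurs an HR tensor, downsamples it with stride $s$, and adds AWGN.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=7, sigma=1.5):
    # 2D Gaussian kernel built from the outer product of a 1D Gaussian.
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

def degrade(hr, scale=4, noise_sigma=0.01, ksize=7, ksigma=1.5):
    """Eq. (3): y = (x convolved with k), downsampled by s, plus AWGN n."""
    c = hr.shape[1]
    k = gaussian_kernel(ksize, ksigma).to(hr).repeat(c, 1, 1, 1)   # one kernel per channel
    blurred = F.conv2d(hr, k, padding=ksize // 2, groups=c)        # x (*) k
    lr = blurred[..., ::scale, ::scale]                            # downsampling by s
    return lr + noise_sigma * torch.randn_like(lr)                 # + n

hr = torch.rand(1, 3, 128, 128)      # toy HR image in [0, 1]
lr = degrade(hr, scale=4)
print(lr.shape)                      # torch.Size([1, 3, 32, 32])
```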
Natural image priors:
In natural image processing, many problems (e.g., image denoising, deblurring, inpainting, and reconstruction) are inverse problems, i.e., their solutions are not unique. To shrink the solution space, or to better approximate the true solution, constraints have to be added. These constraints come from the intrinsic properties of natural images, i.e., natural image priors. If natural image priors are exploited well, a high-quality image can be recovered from a low-quality one, which makes studying natural image priors very worthwhile.
Commonly used natural image priors include local smoothness, non-local self-similarity, non-Gaussianity, statistical properties, and sparsity.
(Author: showaichuan, link: https://www.jianshu.com/p/ed8a5b05c3a4, source: Jianshu)
Based on the image prior used, super-resolution methods can be roughly divided into the following categories:
- prediction methods
- edge-based methods
- statistical methods
- patch-based methods
- deep learning methods
SISR Taxonomy
Linear networks
Only a single path for signal flow, without any skip connections or multiple branches. Note: some linear networks learn to reproduce the residual image (the difference between the LR and HR images).
Based on the up-sampling operation, linear networks can be divided into two categories:
- Early upsampling
The LR input is first upsampled to match the desired HR output size; hierarchical feature representations are then learned to generate the output.
Commonly used upsampling method: bicubic interpolation.
- SRCNN (using only convolutional layers for super-resolution)
- Datasets:
training data set:
HR images: synthesized by extracting non-overlapping dense patches of size $32\times32$ from the HR images
LR images: the LR input patches are first downsampled and then upsampled using bicubic interpolation to the same size as the high-resolution output image
- Layers: three convolutional and two ReLU layers (a code sketch follows this list)
The first convolutional layer is termed patch extraction or feature extraction (creates feature maps from the input image)
The second convolutional layer is called non-linear mapping (transforms the feature maps into high-dimensional feature vectors)
The third convolutional layer aggregates the feature maps to output the final high-resolution image
- Loss function: Mean Squared Error (MSE)
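To make the three-layer pipeline above concrete, here is a minimal PyTorch sketch of an SRCNN-style network (the 9-5-5 kernel sizes and 64/32 channel widths follow the commonly cited configuration and are illustrative assumptions):

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction -> non-linear mapping -> reconstruction (early upsampling)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),        # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # x is the bicubic-upsampled LR image (already at HR size).
        return self.net(x)
```

Training would then minimize the MSE between the output and the ground-truth HR patch.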
- VDSR
- Layers: deep CNN architecture (based on VGG-net; fixed-size $3\times3$ convolutions are used in all network layers)
To avoid slow convergence in deep networks (specifically with 20 weight layers), two effective strategies are proposed (see the sketch after this list):
- learn a residual mapping that generates the difference between the HR and LR image (this simplifies the target, so the network focuses on high-frequency information)
- gradients are clipped to the range $[-\theta, +\theta]$ (so that a high learning rate can be used to speed up training)
- Observation: deeper networks can provide better contextualization and learn generalizable representations that can be used for multi-scale super-resolution
Comparison: VDSR vs. ResNet
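As referenced above, a hedged sketch of VDSR-style training with residual learning and gradient clipping (the depth, learning rate, and clipping threshold are illustrative):

```python
import torch
import torch.nn as nn

class VDSR(nn.Module):
    def __init__(self, channels=1, depth=20, features=64):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: predict only the high-frequency residual,
        # then add it back to the bicubic-upsampled input.
        return x + self.body(x)

model = VDSR()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # large learning rate
loss = nn.MSELoss()(model(torch.rand(1, 1, 41, 41)), torch.rand(1, 1, 41, 41))
loss.backward()
# Gradient clipping to [-theta, +theta] keeps training stable at this rate.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.4)
optimizer.step()
```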
- DnCNN
- learns to predict a high-frequency residual directly instead of the latent super-resolved image
- Layers: similar to SRCNN
- depends heavily on the accuracy of noise estimation without knowing the underlying structures and textures present in the image
- computationally expensive (batch normalization operations after every convolutional layer)
- IRCNN (Image Restoration CNN)
- proposes a set of CNN-based denoisers that can be used jointly for several low-level vision tasks such as image denoising, deblurring, and super-resolution
- Specifically, Half Quadratic Splitting (HQS) is used to decouple the regularization and fidelity terms of the observation model; the denoiser prior is then learned discriminatively, exploiting the strong modeling capacity and test-time efficiency of CNNs
- Layers: the CNN denoiser is a stack of seven dilated convolution layers interleaved with batch normalization and ReLU non-linearities. Dilation helps model larger context by enclosing a larger receptive field.
- residual image learning is performed in a similar manner to previous architectures (VDSR, DRCN and DRRN)
- small training samples and zero padding are used to avoid boundary artifacts caused by the convolution operations
- Late upsampling
Post-upsampling networks learn on the low-resolution input and upsample the features only near the network output (low memory footprint).
- FSRCNN
- improves speed and quality over SRCNN
- Datasets: the 91-image dataset; data augmentation such as rotation, flipping, and scaling is also employed to increase the number of images by 19 times
- Layers: consists of four convolution layers (feature extraction, shrinking, non-linear mapping, and expansion layers) and one deconvolution layer (see the sketch after this list)
- feature extraction step is similar to SRCNN (the difference lies in the input size and the filter size; the input to FSRCNN is the original patch without upsampling it)
- shrinking layer: reduces the feature dimensions (number of parameters) by adopting a smaller filter size (i.e. f = 1)
- non-linear mapping (critical step): the filter size in the non-linear mapping layer is set to three, while the number of channels is kept the same as in the previous layer
- expansion layer: an inverse operation of the shrinking step to increase the number of dimensions
- upsampling and aggregating deconvolution layer: the stride acts as an upscaling factor
- Parametric ReLU (PReLU) is used instead of the rectified linear unit (ReLU) after each convolutional layer
- Loss Function: mean-square error
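As referenced above, a hedged PyTorch sketch of an FSRCNN-style late-upsampling network (the d/s/m hyper-parameters and kernel sizes are illustrative assumptions):

```python
import torch.nn as nn

class FSRCNN(nn.Module):
    """All convolutions run at LR size; a final deconvolution (stride = scale)
    produces the HR output. d = feature width, s = shrunk width, m = mapping depth."""
    def __init__(self, scale=4, channels=1, d=56, s=12, m=4):
        super().__init__()
        layers = [nn.Conv2d(channels, d, 5, padding=2), nn.PReLU(d)]   # feature extraction
        layers += [nn.Conv2d(d, s, 1), nn.PReLU(s)]                    # shrinking (f = 1)
        for _ in range(m):                                             # non-linear mapping
            layers += [nn.Conv2d(s, s, 3, padding=1), nn.PReLU(s)]
        layers += [nn.Conv2d(s, d, 1), nn.PReLU(d)]                    # expansion
        self.body = nn.Sequential(*layers)
        self.upsample = nn.ConvTranspose2d(d, channels, kernel_size=9,
                                           stride=scale, padding=4,
                                           output_padding=scale - 1)   # stride = upscaling factor

    def forward(self, lr):
        return self.upsample(self.body(lr))
```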
- ESPCN (Efficient Sub-Pixel Convolutional Neural Network)
a fast SR approach that can operate in real-time, both for images and videos
- performs feature extraction in the LR space
- only at the very end aggregates the LR feature maps and simultaneously performs projection to a high-dimensional space to reconstruct the HR image
- the sub-pixel convolution operation used in this work is essentially similar to a convolution transpose or deconvolution operation (a fractional kernel stride is used to increase the spatial resolution of the input feature maps); see the sketch after this list
- Loss Function: $l_1$ loss
A separate upscaling kernel is used to map each feature map
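The sub-pixel step can be illustrated with PyTorch's `nn.PixelShuffle`: a convolution produces $c \cdot s^2$ feature maps at LR size, which are then rearranged into the HR grid (a minimal sketch of the idea, not the full ESPCN):

```python
import torch
import torch.nn as nn

scale, channels = 3, 1
espcn_tail = nn.Sequential(
    nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),  # project to c*s^2 maps
    nn.PixelShuffle(scale),                                          # rearrange into the HR grid
)
lr_features = torch.rand(1, 32, 24, 24)   # feature maps computed in LR space
hr = espcn_tail(lr_features)
print(hr.shape)                           # torch.Size([1, 1, 72, 72])
```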
Residual Networks
use skip connections in the network design (avoiding vanishing gradients and making training of deep models more feasible)
These algorithms learn the residue, i.e., the high frequencies between the input and the ground truth
Based on the number of stages used in such networks, they can be divided into two categories:
- Single-stage Residual Nets
- EDSR (Enhanced Deep Super-Resolution)
modifies the ResNet architecture to work with the SR task
- Removing Batch Normalization layers (from each residual block) and ReLU activations (outside the residual blocks) brings a substantial improvement (see the sketch after this list)
- Similar to VDSR, the single-scale approach is also extended to work on multiple scales.
- proposes the Multi-scale Deep SR (MDSR) architecture (reduces the number of parameters by sharing the majority of parameters across scales)
- Scale-specific layers are applied in parallel only near the input and output blocks to learn scale-dependent representations.
- Data augmentation (rotations and flips) is used to create a 'self-ensemble' (transformed inputs are passed through the network, reverse-transformed, and averaged together to create a single output)
- Better performance compared to SRCNN, VDSR, SRGAN
- Loss Function: $l_1$ loss
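As referenced above, a minimal sketch of the BN-free residual block used in EDSR-style models (the residual scaling factor and width are illustrative, following common open implementations):

```python
import torch.nn as nn

class EDSRBlock(nn.Module):
    """Residual block with no batch normalization and no ReLU outside the block."""
    def __init__(self, features=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Scale the residual branch and add the identity (skip) path.
        return x + self.res_scale * self.body(x)
```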
- CARN (Cascading Residual Network)
- differs from other models in the presence of local and global cascading modules
- features from intermediate layers are cascaded and aggregated onto a $1\times1$ convolutional layer
- the local cascading connections are identical to the global ones, except that the blocks are simple residual blocks
- Datasets: $64\times64$ patches from BSD, Yang et al., and the DIV2K dataset, with data augmentation
- Loss Function: $l_1$ loss
- Adam is used for optimization with an initial learning rate of $10^{-4}$, which is halved after every $4\times10^5$ steps
- Multi-stage Residual Nets
composed of multiple subnets that are generally trained in succession (the first subnet usually predicts coarse features, while the other subnets refine the initial prediction)
encoder-decoder designs (first downsample the input using an encoder and then perform upsampling via a decoder), hence two distinct stages
- FormResNet
composed of two networks, both of which are similar to DnCNN; the difference lies in the loss layers
Loss = Euclidean loss + perceptual loss
Classical algorithms such as BM3D can also replace this formatting layer
The input to the second network is taken from the output of the first network
DiffResNet learns the structured regions
- BTSRN (Balanced Two-Stage Residual Networks)
composed of a low-resolution stage and a high-resolution stage
In the LR stage the feature maps have a smaller size, the same as the input patch
(the feature maps are upsampled by deconvolution and nearest-neighbor upsampling)
The upsampled feature maps are then fed into the high-resolution stage
Each residual block consists of a $1\times1$ convolutional layer acting as a feature-map projection to decrease the input size of the $3\times3$ convolutional features
The LR stage has six residual blocks; the HR stage consists of four residual blocks
During training, the images are cropped to $108\times108$ patches and augmented using flipping and rotation operations
- REDNet (Residual Encoder-Decoder Network)
composed of convolutional and symmetric deconvolutional layers
(ReLU is added after each convolutional and deconvolutional layer)
(feature maps are extracted while preserving object structure and removing degradations)
reconstructs the missing details of the images
The feature maps of a convolutional layer are added to the output of its mirrored deconvolutional layer, followed by a non-linear rectification
(outcome) high-resolution image
The network is end-to-end trainable and converges by minimizing the $l_2$-norm between the system output and the ground truth
The best-performing architecture has 30 weight layers, each with 64 feature maps
Ground truth: patches of size $50\times50$
Input patches: obtained by downsampling the patches and then restoring them to their original size via bicubic interpolation
($5\times5$, respectively)
The patches are normalized by their mean and variance, which are subsequently added back to the corresponding restored final high-resolution output
the kernel has a size of $5\times5$ with 128 feature channels
Recursive Networks
employ recursively connected convolutional layers or recursively linked units
The main motivation behind these designs is to progressively break the harder SR problem down into a set of simpler SR problems
- DRCN (Deep Recursive Convolutional Network)
One advantage of this technique is that the number of parameters stays the same regardless of the number of recursions (see the sketch after this list)
composed of three smaller networks:
analyzes image regions by recursively applying a single layer (consisting of convolution and ReLU)
The size of the receptive field is increased after each recursion.
The output of the inference net is a set of high-resolution feature maps
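As referenced above, a minimal sketch of the recursion idea: the same convolutional layer is applied repeatedly, so the receptive field grows while the parameter count stays fixed (the recursion depth is illustrative):

```python
import torch.nn as nn

class RecursiveInference(nn.Module):
    """One conv + ReLU layer applied T times with shared weights (DRCN-style)."""
    def __init__(self, features=64, recursions=9):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(features, features, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.recursions = recursions

    def forward(self, x):
        for _ in range(self.recursions):   # receptive field grows, parameters do not
            x = self.shared(x)
        return x
```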
- DRRN (Deep Recursive Residual Network)
a deep CNN model, but with conservative parametric complexity
This is achieved by combining residual image learning with local identity connections between the layers of small blocks in the network
This parallel information flow enables stable training of deeper architectures
Since parameters are shared across the replications, memory cost and computational complexity are significantly reduced
- MemNet (Memory Network)
MemNet can be broken down into three parts, similar to SRCNN:
the first part extracts features from the input image
the second part consists of a series of memory blocks
memory block = a recursive unit + a gate unit
the recursive unit is composed of two convolutional layers with a pre-activation mechanism and dense connections to the gate unit
Progressive Reconstruction Designs
To deal with large factors, predict the output in multiple steps, i.e., $\times2$ followed by $\times4$
(CNN algorithms usually predict the output in a single step; however, this may not be feasible for large scaling factors)
- SCN (Sparse Coding-based Network)
combines the advantages of sparse coding with the domain knowledge of deep neural networks to obtain a compact model and improve performance
mimics a Learned Iterative Shrinkage and Thresholding Algorithm (LISTA) network to build a multi-layer neural network
The LISTA stage is composed of two linear layers and one non-linear layer whose activation function has a threshold that is learned/updated during training.
To simplify training, the non-linear neuron is decomposed into two linear scaling layers and a unit-threshold neuron
The two scaling layers are diagonal matrices that are inverses of each other; e.g., if there is a multiplicative scaling layer, a division follows after the threshold unit
- LapSRN (Deep Laplacian Pyramid Super-Resolution Network)
consists of three sub-networks that progressively predict the residual images up to a factor of $\times8$ (see the sketch after this list)
The residual image of each sub-network is added to the input LR image (upsampled to the corresponding scale) to obtain the SR image
(first sub-network) a residue at $\times2$
(second sub-network) a residue at $\times4$
(last sub-network) a residue at $\times8$
These residual images are added to the upsampled images of the corresponding scale to obtain the final super-resolved images.
The addition of the bicubic images and the residue is called the image reconstruction branch
A loss is employed at every sub-network, resembling a multi-loss structure
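As referenced above, a hedged sketch of the progressive (Laplacian-pyramid-style) idea: each stage upsamples by $\times2$, predicts a residual, and adds it to an upsampled copy of the image, and every scale can receive its own loss. Layer counts and widths here are illustrative, not LapSRN's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveStage(nn.Module):
    """One x2 stage: feature upsampling + residual prediction."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(features, features, 4, stride=2, padding=1),
        )
        self.to_residual = nn.Conv2d(features, channels, 3, padding=1)

    def forward(self, feats, img):
        feats = self.features(feats)
        img_up = F.interpolate(img, scale_factor=2, mode='bicubic', align_corners=False)
        return feats, img_up + self.to_residual(feats)   # image reconstruction branch

head = nn.Conv2d(1, 64, 3, padding=1)
stages = nn.ModuleList([ProgressiveStage() for _ in range(3)])   # x2, x4, x8
lr = torch.rand(1, 1, 16, 16)
feats, img = head(lr), lr
outputs = []
for stage in stages:
    feats, img = stage(feats, img)
    outputs.append(img)                   # supervise each scale (multi-loss structure)
print([o.shape[-1] for o in outputs])     # [32, 64, 128]
```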
Densely Connected Networks
based on the DenseNet architecture
The main motivation of this design is to combine the hierarchical cues available along the network depth to achieve higher flexibility and richer feature representations.
- SR-DenseNet
based on DenseNet, which uses dense connections between the layers (a layer directly operates on the output of all previous layers)
In this way, only high-level features are used to reconstruct the final SR image
Skip connections are used to combine low-level and high-level features
Since complementary features are encoded at multiple stages in the network, the combination of all feature maps gives the best performance
- RDN (Residual Dense Network)
combines residual skip connections (inspired by SR-ResNet) with dense connections (inspired by SR-DenseNet)
The main motivation is to fully exploit hierarchical feature representations to learn local patterns
Since dense connections quickly produce high-dimensional outputs, each RDB uses local feature fusion with a $1\times1$ convolution to reduce the dimensionality (see the sketch after this list)
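As referenced above, a minimal sketch of a dense block with $1\times1$ local feature fusion (an RDB-style unit; the growth rate and number of layers are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlockWithFusion(nn.Module):
    """Dense connections followed by 1x1 local feature fusion and a local residual."""
    def __init__(self, features=64, growth=32, layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(features + i * growth, growth, 3, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(layers)
        )
        # 1x1 convolution brings the concatenated maps back down to `features` channels.
        self.fuse = nn.Conv2d(features + layers * growth, features, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))      # local fusion + residual
```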
- D-DBPN (Dense Deep Back-Projection Network)
takes inspiration from traditional SR methods (iteratively performing back-projection to learn the feedback error signal between LR and HR images)
The motivation is that a purely feed-forward approach is not the best way to model the LR-to-HR mapping, and a feedback mechanism can greatly help achieve better results
HR images from multiple depths of the network are combined to obtain the final output
Adding a residual signal to the upsampled feature maps provides error feedback and forces the network to focus on fine details
Multi-branch Designs
The goal of multi-branch networks is to obtain a diverse set of features at multiple context scales and then fuse this complementary information for a better HR reconstruction.
This design also enables multi-path signal flow, leading to better information exchange in both the forward and backward passes during training
- CNF (Context-wise Network Fusion)
fuses multiple convolutional neural networks for image super-resolution
Each SRCNN is built with a different number of layers; the output of each SRCNN is then passed through a separate convolutional layer, and finally all of them are fused using sum-pooling
The size of each patch is $33\times33$ pixels of the luminance channel only
(then) the fused network is trained (epochs = 10, learning rate = 1e-4)
- CMSC (Cascaded Multi-Scale Cross-network)
composed of a feature extraction layer, cascaded subnets, and a reconstruction network
Each MR block consists of two parallel branches, each with two convolutional layers; the residual connections of both branches are accumulated together and then added to the outputs of the two branches
Each CMSC subnet consists of four MR blocks with different receptive fields of $3\times3$, $5\times5$ and $7\times7$ to capture contextual information at multiple scales
Each convolutional layer in an MR block is followed by batch normalization and Leaky-ReLU
- IDN (Information Distillation Network)
consists of three blocks: a feature extraction block, multiple stacked information distillation blocks and a reconstruction block
(feature extraction block) composed of two convolutional layers to extract features
(distillation block) made up of two other blocks, an enhancement unit and a compression unit
enhancement unit: six convolutional layers followed by leaky ReLU
The output of the third convolutional layer is sliced: one half is concatenated with the input of the block, and the remaining half serves as the input to the fourth convolutional layer
The output of the concatenated component is added to the output of the enhancement block. In total, four enhancement blocks are utilized.
compression unit: realized using a $1\times1$ convolutional layer after each enhancement block
(reconstruction block) a deconvolution layer with a kernel size of $17\times17$
Loss Function: the network is first trained with the absolute mean error loss and then fine-tuned with the mean square error loss
Input: the input patch size is $26\times26$
The initial learning rate is set to $10^{-4}$ for a total of $10^5$ iterations
Adam is used as the optimizer
Attention-based Networks
In the network designs discussed so far, all spatial locations and channels have uniform importance for super-resolution; in some cases it helps to selectively attend to only a few features within a given layer.
Attention-based models allow this flexibility, acknowledging that not all features are equally necessary for super-resolution and that their importance varies. Combined with deep networks, recent attention-based models have shown notable improvements for SR.
- SelNet
introduces a novel selection unit for the image super-resolution network
The selection unit consists of an identity mapping and a cascade of a ReLU, a $1\times1$ convolutional layer and a sigmoid layer
- RCAN (Residual Channel Attention Network)
(a) a recursive residual design, where residual connections exist within each block of the global residual network
(b) each local residual block has a channel attention mechanism: the filter activations are collapsed from $h\times w\times c$ to a vector with $1\times1\times c$ dimensions (after passing through a bottleneck) that acts as selective attention over the channel maps (see the sketch after this list)
The second contribution allows the network to focus on the selective feature maps that are more important for the final task and to effectively model the relationships between feature maps
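As referenced above, a minimal sketch of channel attention: global pooling collapses $h\times w\times c$ activations to $1\times1\times c$, a bottleneck produces per-channel weights, and the feature maps are rescaled (the reduction ratio is illustrative):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze h x w x c to 1 x 1 x c, pass through a bottleneck, rescale the channels."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                         # h x w x c -> 1 x 1 x c
            nn.Conv2d(channels, channels // reduction, 1),   # bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                    # per-channel weights in [0, 1]
        )

    def forward(self, x):
        return x * self.attend(x)   # selective attention over channel maps
```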
- SRRAM (Residual Attention Module for SR)
The SRRAM structure is similar to RCAN; both methods are inspired by EDSR
The SRRAM can be divided into three parts:
The basic unit of SRRAM consists of residual blocks, spatial attention and channel attention, used to learn inter-channel and intra-channel dependencies
Multiple-degradation Handling Networks
In reality, multiple degradations can occur simultaneously
- ZSSR (Zero-Shot Super-Resolution)
Building on classical methods, this approach exploits internal image statistics and super-resolves the image with a deep neural network
The aim here is to predict the test image from LR images generated from the test image itself
Once the network has learned the relationship between the LR test image and the test image, the same network is used, with the test image as input, to predict the SR image
Therefore, it does not require training images for a specific degradation and can learn an image-specific network on the fly during inference
- SRMD (Super-Resolution network for Multiple Degradations)
takes as input a concatenation of the low-resolution image and its degradation maps
(First) a cascade of convolutional layers with $3\times3$ filters is applied to extract features, followed by a sequence of Conv, ReLU and Batch Normalization layers
(Furthermore) similar to ESPCN, convolution operations are used to extract HR sub-images
(Finally) the HR sub-images are transformed into the final single HR output
the connections from the first noise-level maps in the convolutional layers are removed
the rest of the architecture is similar to SRMD
The criterion for lowering the learning rate is based on the error change between successive epochs
However, its ability to jointly handle multiple degradations offers a unique capability
GAN Models
adopt a game-theoretic approach in which the model consists of two components, a generator and a discriminator. The generator produces SR images that the discriminator cannot distinguish from real HR images (as opposed to artificially super-resolved outputs)
This yields HR images with better perceptual quality, although the corresponding PSNR values usually drop (a lower PSNR indicates higher distortion), which highlights that the quantitative measures popular in the SR literature do not adequately capture the perceptual quality of the generated HR images
- SRGAN
SRGAN proposes an adversarial objective function that encourages the super-resolved outputs to lie close to natural images. It combines three loss terms (sketched after this list):
(1) an MSE loss that encodes pixel-wise similarity
(2) a perceptual similarity metric in the form of a distance metric (defined over high-level image representations, e.g., deep network features)
(3) an adversarial loss
balancing the min-max game between the generator and the discriminator (the standard GAN objective)
competitors instead optimize direct data-dependent measures (such as pixel errors)
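As referenced above, a hedged sketch of how the three generator loss terms can be combined; `feature_extractor` stands for a frozen pretrained network (e.g., VGG features), and the loss weights are illustrative assumptions rather than the paper's exact values.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def generator_loss(sr, hr, disc_logits_sr, feature_extractor,
                   w_pixel=1.0, w_percep=0.006, w_adv=1e-3):
    """Pixel (MSE) + perceptual (feature-space) + adversarial loss, SRGAN-style."""
    pixel = mse(sr, hr)                                          # (1) pixel-wise similarity
    percep = mse(feature_extractor(sr), feature_extractor(hr))   # (2) perceptual similarity
    adv = bce(disc_logits_sr, torch.ones_like(disc_logits_sr))   # (3) fool the discriminator
    return w_pixel * pixel + w_percep * percep + w_adv * adv
```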
- EnhanceNet
The focus of this network design is to create faithful texture details in the super-resolved HR image.
(the perceptual loss function) defined on the intermediate feature representation of a pretrained network, in the form of an $l_1$ distance
(the texture matching loss) used to match the textures of the low- and high-resolution images, quantified as the $l_1$ loss between Gram matrices computed from deep features
- SRFeat
another GAN-based Super-Resolution algorithm with Feature Discrimination
This work focuses on the photorealism of the super-resolved image; an additional discriminator is used to help the generator produce high-frequency structural features rather than noisy artifacts (achieved by discriminating between the features of machine-generated and real images)
followed by fine-tuning on the augmented DIV2K dataset using learning rates from $10^{-4}$ to $10^{-6}$
- ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks)
builds on SRGAN, removing batch normalization and incorporating dense blocks
Experimental Evaluation
- Datasets
Set5
Set14
BSD100
Urban100
DIV2K
Manga109
- Quantitative Measures
PSNR (peak signal-to-noise ratio)
SSIM (structural similarity index)
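A minimal sketch of how PSNR is typically computed for SR evaluation (assuming images scaled to [0, 1]; SSIM is more involved and usually taken from a library such as scikit-image):

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

print(psnr(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```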
- Number of parameters
- Choice of network loss
Convolutional neural networks:
- mean absolute error ($l_1$)
- mean squared error (MSE, $l_2$)
- perceptual loss (adversarial loss)
- pixel-wise loss (MSE)
- Network Depth
The current crop of CNNs keeps adding more convolutional layers to build ever deeper networks and improve image quality, both qualitatively and quantitatively; this has been the dominant trend in deep SR since SRCNN.
- Skip Connections
These connections can be divided into four main types: global, local, recursive, and dense connections
Generative Adversarial Networks (GANs):
Future Directions
- Incorporation of Priors
- Objective Functions and Metrics
- Need for Unified Solutions
- Unsupervised Image SR
- Higher SR rates
- Arbitrary SR rates
- Real vs Artificial Degradation
Summary