Summary of a Super-Resolution Survey Paper
Table of Contents
- Paper: A Deep Journey into Super-resolution: A Survey
- Paper Overview
- Background
- SISR Taxonomy
- Experimental Evaluation
- Future Directions
Paper: A Deep Journey into Super-resolution: A Survey
Authors: Saeed Anwar, Salman Khan, and Nick Barnes
Paper Overview
- Paper summary:
The survey compares nearly 30 state-of-the-art super-resolution convolutional networks on six datasets (three classical, three recently proposed) to set benchmarks for SISR, and groups the methods into nine categories. It also provides comparisons in terms of network complexity, memory footprint, model inputs and outputs, training details, types of network losses, and important architectural differences.
- SISR application areas:
- large computer displays
- HD television sets
- hand-held devices (mobile phones, tablets, cameras, etc.)
- object detection in scenes (particularly small objects)
- face recognition in surveillance videos
- medical imaging
- improving interpretation of images in remote sensing
- astronomical images
- forensics
- Super-resolution is a classic problem, yet for several reasons it remains a challenging and open research topic in computer vision.
Reasons:
- SR is an ill-posed inverse problem (multiple solutions exist for the same low-resolution image; to constrain the solution space, reliable prior information is typically required)
- the complexity of the problem increases as the up-scaling factor increases ($\times2$, $\times4$, $\times8$ become progressively harder)
- assessment of the quality of the output is not straightforward (quality metrics such as PSNR and SSIM are only loosely related to human perception)
- Applications of DL in other AI areas:
- object classification and detection
- natural language processing
- image processing
- audio signal processing
- Contributions of this paper:
- a comprehensive review of the latest techniques for super-resolution
- a new taxonomy based on structural differences among the various super-resolution algorithms
- a comprehensive analysis based on the number of parameters, algorithmic settings, training details, and important architectural innovations
- a systematic evaluation of the algorithms on six SISR datasets
- a discussion of the current challenges in super-resolution and of future research directions
Background
- Degradation Process:
$$y = \Phi(x; \theta_\eta) \tag{1}$$
$x$: HR image
$y$: LR image
$\Phi$: degradation function
$\theta_\eta$: degradation parameters (scaling factor, noise). In practice, only $y$ is available; neither the degradation process nor the degradation parameters are known. Super-resolution tries to undo the degradation effects and obtain an estimate $\hat{x}$ that approximates the true HR image $x$:
$$\hat{x} = \Phi^{-1}(y, \theta_\varsigma) \tag{2}$$
$\theta_\varsigma$: parameters of $\Phi^{-1}$
The degradation process is unknown and highly complex; it is affected by many factors, e.g., noise (sensor and speckle), compression, blur (defocus and motion), and other artifacts. Therefore, most research works prefer the following degradation model over (1):
$$y = (x \otimes k) \downarrow_s + n \tag{3}$$
$k$: blurring kernel
$x \otimes k$: convolution operation
$\downarrow_s$: downsampling operation with a scaling factor $s$
$n$: additive white Gaussian noise (AWGN) with a standard deviation of $\sigma$ (noise level). The goal of image super-resolution is to minimize the data fidelity term associated with the model $y = x \otimes k + n$, as follows:
$$J(\hat{x}, \theta_\varsigma, k) = \underbrace{\|x \otimes k - y\|}_{\text{data fidelity term}} + \underbrace{\alpha \Psi(x, \theta_\varsigma)}_{\text{regularizer}}$$
$\alpha$: coefficient balancing the data fidelity term and the image prior $\Psi(\cdot)$
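As a concrete illustration of the degradation model in Eq. (3), here is a minimal PyTorch sketch (the Gaussian kernel, scale factor, and noise level are illustrative assumptions, not values from the survey) that blurs an HR tensor, downsamples it with stride $s$, and adds AWGN.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=7, sigma=1.5):
    # 2D Gaussian kernel built from the outer product of a 1D Gaussian.
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

def degrade(hr, scale=4, noise_sigma=0.01, ksize=7, ksigma=1.5):
    """Eq. (3): y = (x convolved with k), downsampled by s, plus AWGN n."""
    c = hr.shape[1]
    k = gaussian_kernel(ksize, ksigma).to(hr).repeat(c, 1, 1, 1)   # one kernel per channel
    blurred = F.conv2d(hr, k, padding=ksize // 2, groups=c)        # x (*) k
    lr = blurred[..., ::scale, ::scale]                            # downsampling by s
    return lr + noise_sigma * torch.randn_like(lr)                 # + n

hr = torch.rand(1, 3, 128, 128)      # toy HR image in [0, 1]
lr = degrade(hr, scale=4)
print(lr.shape)                      # torch.Size([1, 3, 32, 32])
```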
Natural image priors:
In natural image processing, many problems (e.g., image denoising, deblurring, inpainting, and reconstruction) are inverse problems, i.e., their solutions are not unique. To shrink the solution space, or to better approximate the true solution, constraints have to be added. These constraints come from the intrinsic properties of natural images, i.e., natural image priors. If natural image priors are exploited well, a high-quality image can be recovered from a low-quality one, which makes studying natural image priors very worthwhile.
Commonly used natural image priors include local smoothness, non-local self-similarity, non-Gaussianity, statistical properties, and sparsity.
(Author: showaichuan, link: https://www.jianshu.com/p/ed8a5b05c3a4, source: Jianshu)
Based on the image prior used, super-resolution methods can be roughly divided into the following categories:
- prediction methods
- edge-based methods
- statistical methods
- patch-based methods
- deep learning methods
SISR Taxonomy
Linear networks
Only a single path for signal flow, without any skip connections or multiple branches. Note: some linear networks learn to reproduce the residual image (the difference between the LR and HR images).
Based on the up-sampling operation, linear networks can be divided into two categories:
- Early upsampling
The LR input is first upsampled to match the desired HR output size; hierarchical feature representations are then learned to generate the output.
Commonly used upsampling method: bicubic interpolation.
- SRCNN (using only convolutional layers for super-resolution)
- Datasets:
training data set:
HR images: synthesized by extracting non-overlapping dense patches of size $32\times32$ from the HR images
LR images: the LR input patches are first downsampled and then upsampled using bicubic interpolation to the same size as the high-resolution output image
- Layers: three convolutional and two ReLU layers (a code sketch follows this list)
The first convolutional layer is termed patch extraction or feature extraction (creates feature maps from the input image)
The second convolutional layer is called non-linear mapping (transforms the feature maps into high-dimensional feature vectors)
The third convolutional layer aggregates the feature maps to output the final high-resolution image
- Loss function: Mean Squared Error (MSE)
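To make the three-layer pipeline above concrete, here is a minimal PyTorch sketch of an SRCNN-style network (the 9-5-5 kernel sizes and 64/32 channel widths follow the commonly cited configuration and are illustrative assumptions):

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction -> non-linear mapping -> reconstruction (early upsampling)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),        # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # x is the bicubic-upsampled LR image (already at HR size).
        return self.net(x)
```

Training would then minimize the MSE between the output and the ground-truth HR patch.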
- VDSR
- Layers: deep CNN architecture (based on VGG-net; fixed-size $3\times3$ convolutions are used in all network layers)
To avoid slow convergence in deep networks (specifically with 20 weight layers), two effective strategies are proposed (see the sketch after this list):
- learn a residual mapping that generates the difference between the HR and LR image (this simplifies the target, so the network focuses on high-frequency information)
- gradients are clipped to the range $[-\theta, +\theta]$ (so that a high learning rate can be used to speed up training)
- Observation: deeper networks can provide better contextualization and learn generalizable representations that can be used for multi-scale super-resolution
Comparison: VDSR vs. ResNet
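As referenced above, a hedged sketch of VDSR-style training with residual learning and gradient clipping (the depth, learning rate, and clipping threshold are illustrative):

```python
import torch
import torch.nn as nn

class VDSR(nn.Module):
    def __init__(self, channels=1, depth=20, features=64):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: predict only the high-frequency residual,
        # then add it back to the bicubic-upsampled input.
        return x + self.body(x)

model = VDSR()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # large learning rate
loss = nn.MSELoss()(model(torch.rand(1, 1, 41, 41)), torch.rand(1, 1, 41, 41))
loss.backward()
# Gradient clipping to [-theta, +theta] keeps training stable at this rate.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.4)
optimizer.step()
```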
- DnCNN
- learns to predict a high-frequency residual directly instead of the latent super-resolved image
- Layers: similar to SRCNN
- depends heavily on the accuracy of noise estimation without knowing the underlying structures and textures present in the image
- computationally expensive (batch normalization operations after every convolutional layer)
- IRCNN (Image Restoration CNN)
- proposes a set of CNN-based denoisers that can be used jointly for several low-level vision tasks such as image denoising, deblurring, and super-resolution
- Specifically, Half Quadratic Splitting (HQS) is used to decouple the regularization and fidelity terms of the observation model; the denoiser prior is then learned discriminatively, exploiting the strong modeling capacity and test-time efficiency of CNNs
- Layers: the CNN denoiser is a stack of seven dilated convolution layers interleaved with batch normalization and ReLU non-linearities. Dilation helps model larger context by enclosing a larger receptive field.
- residual image learning is performed in a similar manner to previous architectures (VDSR, DRCN and DRRN)
- small training samples and zero padding are used to avoid boundary artifacts caused by the convolution operations
- Late upsampling
Post-upsampling networks learn on the low-resolution input and upsample the features only near the network output (low memory footprint).
- FSRCNN
- improves speed and quality over SRCNN
- Datasets: the 91-image dataset; data augmentation such as rotation, flipping, and scaling is also employed to increase the number of images by 19 times
- Layers: consists of four convolution layers (feature extraction, shrinking, non-linear mapping, and expansion layers) and one deconvolution layer (see the sketch after this list)
- feature extraction step is similar to SRCNN (the difference lies in the input size and the filter size; the input to FSRCNN is the original patch without upsampling it)
- shrinking layer: reduces the feature dimensions (number of parameters) by adopting a smaller filter size (i.e. f = 1)
- non-linear mapping (critical step): the filter size in the non-linear mapping layer is set to three, while the number of channels is kept the same as in the previous layer
- expansion layer: an inverse operation of the shrinking step to increase the number of dimensions
- upsampling and aggregating deconvolution layer: the stride acts as an upscaling factor
- Parametric ReLU (PReLU) is used instead of the rectified linear unit (ReLU) after each convolutional layer
- Loss Function: mean-square error
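As referenced above, a hedged PyTorch sketch of an FSRCNN-style late-upsampling network (the d/s/m hyper-parameters and kernel sizes are illustrative assumptions):

```python
import torch.nn as nn

class FSRCNN(nn.Module):
    """All convolutions run at LR size; a final deconvolution (stride = scale)
    produces the HR output. d = feature width, s = shrunk width, m = mapping depth."""
    def __init__(self, scale=4, channels=1, d=56, s=12, m=4):
        super().__init__()
        layers = [nn.Conv2d(channels, d, 5, padding=2), nn.PReLU(d)]   # feature extraction
        layers += [nn.Conv2d(d, s, 1), nn.PReLU(s)]                    # shrinking (f = 1)
        for _ in range(m):                                             # non-linear mapping
            layers += [nn.Conv2d(s, s, 3, padding=1), nn.PReLU(s)]
        layers += [nn.Conv2d(s, d, 1), nn.PReLU(d)]                    # expansion
        self.body = nn.Sequential(*layers)
        self.upsample = nn.ConvTranspose2d(d, channels, kernel_size=9,
                                           stride=scale, padding=4,
                                           output_padding=scale - 1)   # stride = upscaling factor

    def forward(self, lr):
        return self.upsample(self.body(lr))
```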
- ESPCN (Efficient Sub-Pixel Convolutional Neural Network)
a fast SR approach that can operate in real-time, both for images and videos
- performs feature extraction in the LR space
- only at the very end aggregates the LR feature maps and simultaneously performs projection to a high-dimensional space to reconstruct the HR image
- the sub-pixel convolution operation used in this work is essentially similar to a convolution transpose or deconvolution operation (a fractional kernel stride is used to increase the spatial resolution of the input feature maps); see the sketch after this list
- Loss Function: $l_1$ loss
A separate upscaling kernel is used to map each feature map
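The sub-pixel step can be illustrated with PyTorch's `nn.PixelShuffle`: a convolution produces $c \cdot s^2$ feature maps at LR size, which are then rearranged into the HR grid (a minimal sketch of the idea, not the full ESPCN):

```python
import torch
import torch.nn as nn

scale, channels = 3, 1
espcn_tail = nn.Sequential(
    nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),  # project to c*s^2 maps
    nn.PixelShuffle(scale),                                          # rearrange into the HR grid
)
lr_features = torch.rand(1, 32, 24, 24)   # feature maps computed in LR space
hr = espcn_tail(lr_features)
print(hr.shape)                           # torch.Size([1, 1, 72, 72])
```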
Residual Networks
use skip connections in the network design (avoiding vanishing gradients and making training of deep models more feasible)
These algorithms learn the residue, i.e., the high frequencies between the input and the ground truth
Based on the number of stages used in such networks, they can be divided into two categories:
- Single-stage Residual Nets
- EDSR (Enhanced Deep Super-Resolution)
modifies the ResNet architecture to work with the SR task
- Removing Batch Normalization layers (from each residual block) and ReLU activations (outside the residual blocks) brings a substantial improvement (see the sketch after this list)
- Similar to VDSR, the single-scale approach is also extended to work on multiple scales.
- proposes the Multi-scale Deep SR (MDSR) architecture (reduces the number of parameters by sharing the majority of parameters across scales)
- Scale-specific layers are applied in parallel only near the input and output blocks to learn scale-dependent representations.
- Data augmentation (rotations and flips) is used to create a 'self-ensemble' (transformed inputs are passed through the network, reverse-transformed, and averaged together to create a single output)
- Better performance compared to SRCNN, VDSR, SRGAN
- Loss Function: $l_1$ loss
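As referenced above, a minimal sketch of the BN-free residual block used in EDSR-style models (the residual scaling factor and width are illustrative, following common open implementations):

```python
import torch.nn as nn

class EDSRBlock(nn.Module):
    """Residual block with no batch normalization and no ReLU outside the block."""
    def __init__(self, features=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Scale the residual branch and add the identity (skip) path.
        return x + self.res_scale * self.body(x)
```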
- CARN (Cascading Residual Network)
- differs from other models in the presence of local and global cascading modules
- features from intermediate layers are cascaded and aggregated onto a $1\times1$ convolutional layer
- the local cascading connections are identical to the global ones, except that the blocks are simple residual blocks
- Datasets: $64\times64$ patches from BSD, Yang et al., and the DIV2K dataset, with data augmentation
- Loss Function: $l_1$ loss
- Adam is used for optimization with an initial learning rate of $10^{-4}$, which is halved after every $4\times10^5$ steps
- Multi-stage Residual Nets
composed of multiple subnets that are generally trained in succession (the first subnet usually predicts coarse features, while the other subnets refine the initial prediction)
encoder-decoder designs (first downsample the input using an encoder and then perform upsampling via a decoder), hence two distinct stages
- FormResNet
composed of two networks, both of which are similar to DnCNN; the difference lies in the loss layers
Loss = Euclidean loss + perceptual loss
Classical algorithms such as BM3D can also replace this formatting layer
The input to the second network is taken from the output of the first network
DiffResNet learns the structured regions
- BTSRN (Balanced Two-Stage Residual Networks)
composed of a low-resolution stage and a high-resolution stage
In the LR stage the feature maps have a smaller size, the same as the input patch
(the feature maps are upsampled by deconvolution and nearest-neighbor upsampling)
The upsampled feature maps are then fed into the high-resolution stage
Each residual block consists of a $1\times1$ convolutional layer acting as a feature-map projection to decrease the input size of the $3\times3$ convolutional features
The LR stage has six residual blocks; the HR stage consists of four residual blocks
During training, the images are cropped to $108\times108$ patches and augmented using flipping and rotation operations
- REDNet (Residual Encoder-Decoder Network)
composed of convolutional and symmetric deconvolutional layers
(ReLU is added after each convolutional and deconvolutional layer)
(feature maps are extracted while preserving object structure and removing degradations)
reconstructs the missing details of the images
The feature maps of a convolutional layer are added to the output of its mirrored deconvolutional layer, followed by a non-linear rectification
(outcome) high-resolution image
The network is end-to-end trainable and converges by minimizing the $l_2$-norm between the system output and the ground truth
The best-performing architecture has 30 weight layers, each with 64 feature maps
Ground truth: patches of size $50\times50$
Input patches: obtained by downsampling the patches and then restoring them to their original size via bicubic interpolation
($5\times5$, respectively)
The patches are normalized by their mean and variance, which are subsequently added back to the corresponding restored final high-resolution output
the kernel has a size of $5\times5$ with 128 feature channels
Recursive Networks
employ recursively connected convolutional layers or recursively linked units
The main motivation behind these designs is to progressively break the harder SR problem down into a set of simpler SR problems
- DRCN (Deep Recursive Convolutional Network)
One advantage of this technique is that the number of parameters stays the same regardless of the number of recursions (see the sketch after this list)
composed of three smaller networks:
analyzes image regions by recursively applying a single layer (consisting of convolution and ReLU)
The size of the receptive field is increased after each recursion.
The output of the inference net is a set of high-resolution feature maps
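As referenced above, a minimal sketch of the recursion idea: the same convolutional layer is applied repeatedly, so the receptive field grows while the parameter count stays fixed (the recursion depth is illustrative):

```python
import torch.nn as nn

class RecursiveInference(nn.Module):
    """One conv + ReLU layer applied T times with shared weights (DRCN-style)."""
    def __init__(self, features=64, recursions=9):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(features, features, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.recursions = recursions

    def forward(self, x):
        for _ in range(self.recursions):   # receptive field grows, parameters do not
            x = self.shared(x)
        return x
```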
- DRRN (Deep Recursive Residual Network)
a deep CNN model, but with conservative parametric complexity
This is achieved by combining residual image learning with local identity connections between the layers of small blocks in the network
This parallel information flow enables stable training of deeper architectures
Since parameters are shared across the replications, memory cost and computational complexity are significantly reduced
- MemNet (Memory Network)
MemNet can be broken down into three parts, similar to SRCNN:
the first part extracts features from the input image
the second part consists of a series of memory blocks
memory block = a recursive unit + a gate unit
the recursive unit is composed of two convolutional layers with a pre-activation mechanism and dense connections to the gate unit
Progressive Reconstruction Designs
To deal with large factors, predict the output in multiple steps, i.e., $\times2$ followed by $\times4$
(CNN algorithms usually predict the output in a single step; however, this may not be feasible for large scaling factors)
- SCN (Sparse Coding-based Network)
combines the advantages of sparse coding with the domain knowledge of deep neural networks to obtain a compact model and improve performance
mimics a Learned Iterative Shrinkage and Thresholding Algorithm (LISTA) network to build a multi-layer neural network
The LISTA stage is composed of two linear layers and one non-linear layer whose activation function has a threshold that is learned/updated during training.
To simplify training, the non-linear neuron is decomposed into two linear scaling layers and a unit-threshold neuron
The two scaling layers are diagonal matrices that are inverses of each other; e.g., if there is a multiplicative scaling layer, a division follows after the threshold unit
- LapSRN (Deep Laplacian Pyramid Super-Resolution Network)
consists of three sub-networks that progressively predict the residual images up to a factor of $\times8$ (see the sketch after this list)
The residual image of each sub-network is added to the input LR image (upsampled to the corresponding scale) to obtain the SR image
(first sub-network) a residue at $\times2$
(second sub-network) a residue at $\times4$
(last sub-network) a residue at $\times8$
These residual images are added to the upsampled images of the corresponding scale to obtain the final super-resolved images.
The addition of the bicubic images and the residue is called the image reconstruction branch
A loss is employed at every sub-network, resembling a multi-loss structure
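As referenced above, a hedged sketch of the progressive (Laplacian-pyramid-style) idea: each stage upsamples by $\times2$, predicts a residual, and adds it to an upsampled copy of the image, and every scale can receive its own loss. Layer counts and widths here are illustrative, not LapSRN's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveStage(nn.Module):
    """One x2 stage: feature upsampling + residual prediction."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(features, features, 4, stride=2, padding=1),
        )
        self.to_residual = nn.Conv2d(features, channels, 3, padding=1)

    def forward(self, feats, img):
        feats = self.features(feats)
        img_up = F.interpolate(img, scale_factor=2, mode='bicubic', align_corners=False)
        return feats, img_up + self.to_residual(feats)   # image reconstruction branch

head = nn.Conv2d(1, 64, 3, padding=1)
stages = nn.ModuleList([ProgressiveStage() for _ in range(3)])   # x2, x4, x8
lr = torch.rand(1, 1, 16, 16)
feats, img = head(lr), lr
outputs = []
for stage in stages:
    feats, img = stage(feats, img)
    outputs.append(img)                   # supervise each scale (multi-loss structure)
print([o.shape[-1] for o in outputs])     # [32, 64, 128]
```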
Densely Connected Networks
based on the DenseNet architecture
The main motivation of this design is to combine the hierarchical cues available along the network depth to achieve higher flexibility and richer feature representations.
- SR-DenseNet
based on DenseNet, which uses dense connections between the layers (a layer directly operates on the output of all previous layers)
In this way, only high-level features are used to reconstruct the final SR image
Skip connections are used to combine low-level and high-level features
Since complementary features are encoded at multiple stages in the network, the combination of all feature maps gives the best performance
- RDN (Residual Dense Network)
combines residual skip connections (inspired by SR-ResNet) with dense connections (inspired by SR-DenseNet)
The main motivation is to fully exploit hierarchical feature representations to learn local patterns
Since dense connections quickly produce high-dimensional outputs, each RDB uses local feature fusion with a $1\times1$ convolution to reduce the dimensionality (see the sketch after this list)
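As referenced above, a minimal sketch of a dense block with $1\times1$ local feature fusion (an RDB-style unit; the growth rate and number of layers are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlockWithFusion(nn.Module):
    """Dense connections followed by 1x1 local feature fusion and a local residual."""
    def __init__(self, features=64, growth=32, layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(features + i * growth, growth, 3, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(layers)
        )
        # 1x1 convolution brings the concatenated maps back down to `features` channels.
        self.fuse = nn.Conv2d(features + layers * growth, features, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))      # local fusion + residual
```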
- D-DBPN (Dense Deep Back-Projection Network)
takes inspiration from traditional SR methods (iteratively performing back-projection to learn the feedback error signal between LR and HR images)
The motivation is that a purely feed-forward approach is not the best way to model the LR-to-HR mapping, and a feedback mechanism can greatly help achieve better results
HR images from multiple depths of the network are combined to obtain the final output
Adding a residual signal to the upsampled feature maps provides error feedback and forces the network to focus on fine details
Multi-branch Designs
The goal of multi-branch networks is to obtain a diverse set of features at multiple context scales and then fuse this complementary information for a better HR reconstruction.
This design also enables multi-path signal flow, leading to better information exchange in both the forward and backward passes during training
- CNF (Context-wise Network Fusion)
fuses multiple convolutional neural networks for image super-resolution
Each SRCNN is built with a different number of layers; the output of each SRCNN is then passed through a separate convolutional layer, and finally all of them are fused using sum-pooling
The size of each patch is $33\times33$ pixels of the luminance channel only
(then) the fused network is trained (epochs = 10, learning rate = 1e-4)
- CMSC (Cascaded Multi-Scale Cross-network)
composed of a feature extraction layer, cascaded subnets, and a reconstruction network
Each MR block consists of two parallel branches, each with two convolutional layers; the residual connections of both branches are accumulated together and then added to the outputs of the two branches
Each CMSC subnet consists of four MR blocks with different receptive fields of $3\times3$, $5\times5$ and $7\times7$ to capture contextual information at multiple scales
Each convolutional layer in an MR block is followed by batch normalization and Leaky-ReLU
- IDN (Information Distillation Network)
consists of three blocks: a feature extraction block, multiple stacked information distillation blocks and a reconstruction block
(feature extraction block) composed of two convolutional layers to extract features
(distillation block) made up of two other blocks, an enhancement unit and a compression unit
enhancement unit: six convolutional layers followed by leaky ReLU
The output of the third convolutional layer is sliced: one half is concatenated with the input of the block, and the remaining half serves as the input to the fourth convolutional layer
The output of the concatenated component is added to the output of the enhancement block. In total, four enhancement blocks are utilized.
compression unit: realized using a $1\times1$ convolutional layer after each enhancement block
(reconstruction block) a deconvolution layer with a kernel size of $17\times17$
Loss Function: the network is first trained with the absolute mean error loss and then fine-tuned with the mean square error loss
Input: the input patch size is $26\times26$
The initial learning rate is set to $10^{-4}$ for a total of $10^5$ iterations
Adam is used as the optimizer
Attention-based Networks
In the network designs discussed so far, all spatial locations and channels have uniform importance for super-resolution; in some cases it helps to selectively attend to only a few features within a given layer.
Attention-based models allow this flexibility, acknowledging that not all features are equally necessary for super-resolution and that their importance varies. Combined with deep networks, recent attention-based models have shown notable improvements for SR.
- SelNet
introduces a novel selection unit for the image super-resolution network
The selection unit consists of an identity mapping and a cascade of a ReLU, a $1\times1$ convolutional layer and a sigmoid layer
- RCAN (Residual Channel Attention Network)
(a) a recursive residual design, where residual connections exist within each block of the global residual network
(b) each local residual block has a channel attention mechanism: the filter activations are collapsed from $h\times w\times c$ to a vector with $1\times1\times c$ dimensions (after passing through a bottleneck) that acts as selective attention over the channel maps (see the sketch after this list)
The second contribution allows the network to focus on the selective feature maps that are more important for the final task and to effectively model the relationships between feature maps
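As referenced above, a minimal sketch of channel attention: global pooling collapses $h\times w\times c$ activations to $1\times1\times c$, a bottleneck produces per-channel weights, and the feature maps are rescaled (the reduction ratio is illustrative):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze h x w x c to 1 x 1 x c, pass through a bottleneck, rescale the channels."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                         # h x w x c -> 1 x 1 x c
            nn.Conv2d(channels, channels // reduction, 1),   # bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                    # per-channel weights in [0, 1]
        )

    def forward(self, x):
        return x * self.attend(x)   # selective attention over channel maps
```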
- SRRAM (Residual Attention Module for SR)
The SRRAM structure is similar to RCAN; both methods are inspired by EDSR
The SRRAM can be divided into three parts:
The basic unit of SRRAM consists of residual blocks, spatial attention and channel attention, used to learn inter-channel and intra-channel dependencies
Multiple-degradation Handling Networks
In reality, multiple degradations can occur simultaneously
- ZSSR (Zero-Shot Super-Resolution)
Building on classical methods, this approach exploits internal image statistics and super-resolves the image with a deep neural network
The aim here is to predict the test image from LR images generated from the test image itself
Once the network has learned the relationship between the LR test image and the test image, the same network is used, with the test image as input, to predict the SR image
Therefore, it does not require training images for a specific degradation and can learn an image-specific network on the fly during inference
- SRMD (Super-Resolution network for Multiple Degradations)
takes as input a concatenation of the low-resolution image and its degradation maps
(First) a cascade of convolutional layers with $3\times3$ filters is applied to extract features, followed by a sequence of Conv, ReLU and Batch Normalization layers
(Furthermore) similar to ESPCN, convolution operations are used to extract HR sub-images
(Finally) the HR sub-images are transformed into the final single HR output
the connections from the first noise-level maps in the convolutional layers are removed
the rest of the architecture is similar to SRMD
The criterion for lowering the learning rate is based on the error change between successive epochs
However, its ability to jointly handle multiple degradations offers a unique capability
GAN Models
adopt a game-theoretic approach in which the model consists of two components, a generator and a discriminator. The generator produces SR images that the discriminator cannot distinguish from real HR images (as opposed to artificially super-resolved outputs)
This yields HR images with better perceptual quality, although the corresponding PSNR values usually drop (a lower PSNR indicates higher distortion), which highlights that the quantitative measures popular in the SR literature do not adequately capture the perceptual quality of the generated HR images
- SRGAN
SRGAN proposes an adversarial objective function that encourages the super-resolved outputs to lie close to natural images. It combines three loss terms (sketched after this list):
(1) an MSE loss that encodes pixel-wise similarity
(2) a perceptual similarity metric in the form of a distance metric (defined over high-level image representations, e.g., deep network features)
(3) an adversarial loss
balancing the min-max game between the generator and the discriminator (the standard GAN objective)
competitors instead optimize direct data-dependent measures (such as pixel errors)
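As referenced above, a hedged sketch of how the three generator loss terms can be combined; `feature_extractor` stands for a frozen pretrained network (e.g., VGG features), and the loss weights are illustrative assumptions rather than the paper's exact values.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def generator_loss(sr, hr, disc_logits_sr, feature_extractor,
                   w_pixel=1.0, w_percep=0.006, w_adv=1e-3):
    """Pixel (MSE) + perceptual (feature-space) + adversarial loss, SRGAN-style."""
    pixel = mse(sr, hr)                                          # (1) pixel-wise similarity
    percep = mse(feature_extractor(sr), feature_extractor(hr))   # (2) perceptual similarity
    adv = bce(disc_logits_sr, torch.ones_like(disc_logits_sr))   # (3) fool the discriminator
    return w_pixel * pixel + w_percep * percep + w_adv * adv
```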
- EnhanceNet
The focus of this network design is to create faithful texture details in the super-resolved HR image.
(the perceptual loss function) defined on the intermediate feature representation of a pretrained network, in the form of an $l_1$ distance
(the texture matching loss) used to match the textures of the low- and high-resolution images, quantified as the $l_1$ loss between Gram matrices computed from deep features
- SRFeat
another GAN-based Super-Resolution algorithm with Feature Discrimination
This work focuses on the photorealism of the super-resolved image; an additional discriminator is used to help the generator produce high-frequency structural features rather than noisy artifacts (achieved by discriminating between the features of machine-generated and real images)
followed by fine-tuning on the augmented DIV2K dataset using learning rates from $10^{-4}$ to $10^{-6}$
- ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks)
builds on SRGAN, removing batch normalization and incorporating dense blocks
Experimental Evaluation
- Datasets
Set5
Set14
BSD100
Urban100
DIV2K
Manga109
- Quantitative Measures
PSNR (peak signal-to-noise ratio)
SSIM (structural similarity index)
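A minimal sketch of how PSNR is typically computed for SR evaluation (assuming images scaled to [0, 1]; SSIM is more involved and usually taken from a library such as scikit-image):

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

print(psnr(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```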
- Number of parameters
- Choice of network loss
Convolutional neural networks:
- mean absolute error ($l_1$)
- mean squared error (MSE, $l_2$)
- perceptual loss (adversarial loss)
- pixel-wise loss (MSE)
- Network Depth
The current crop of CNNs keeps adding more convolutional layers to build ever deeper networks and improve image quality, both qualitatively and quantitatively; this has been the dominant trend in deep SR since SRCNN.
- Skip Connections
These connections can be divided into four main types: global, local, recursive, and dense connections
Generative Adversarial Networks (GANs):
Future Directions
- Incorporation of Priors
- Objective Functions and Metrics
- Need for Unified Solutions
- Unsupervised Image SR
- Higher SR rates
- Arbitrary SR rates
- Real vs Artificial Degradation
Summary