【RepVGG】《RepVGG: Making VGG-style ConvNets Great Again》
CVPR-2021
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Building RepVGG via Structural Re-param
- 4.1 Simple is Fast, Memory-economical, Flexible
- 4.2 Training-time Multi-branch Architecture
- 4.3 Re-param for Plain Inference-time Model
- 4.4 Architectural Specification
- 5 Experiments
- 5.1 Datasets
- 5.2 RepVGG for ImageNet Classification
- 5.3 Structural Reparameterization is the Key
- 5.4 Semantic Segmentation
- 6 Conclusion (own) / Future work
1 Background and Motivation
MVGA: Making VGG-style ConvNets Great Again
Drawbacks of multi-branch CNN structures (ResNet, Inception):
- slow down the inference and reduce the memory utilization
- increase the memory access cost and lack support on various devices

Single-branch (plain) structures such as VGG avoid these drawbacks, but their accuracy lags behind multi-branch models.
In this paper, the authors use re-parameterization (the identity and 1x1 branches are equivalently converted into 3x3 convs) to propose a VGG-style plain network: RepVGG.

Multi-branch at training time, single-branch (VGG-style) at inference time:

decouple the training-time multi-branch and inference-time plain architecture via structural re-parameterization

Accuracy stays competitive while inference speed improves.
2 Related Work
- From Single-path to Multi-branch
- Effective Training of Single-path Models
- Model Re-parameterization
- Winograd Convolution
(TFLOPS: Tera Floating-point Operations Per Second)
Winograd [20] is a classic algorithm for accelerating 3x3 conv (only if the stride is 1), which has been well supported (and enabled by default) by libraries like cuDNN and MKL.
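As a concrete count: with the standard F(2×2, 3×3) Winograd transform, each 2×2 output tile needs 4×4 = 16 multiplications instead of the 2×2×3×3 = 36 of direct convolution, so the multiplications drop to 4/9 of the original. This is why the experiments later report Wino MULs alongside raw FLOPs.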
3 Advantages / Contributions
Proposes the RepVGG network:
- without any branches
- uses only 3x3 conv and ReLU
- no other heavy designs (e.g., automatic search or compound scaling)
- runs 83% faster than ResNet-50 and 101% faster than ResNet-101, with higher accuracy
- shows a favorable accuracy-speed trade-off compared with state-of-the-art models like EfficientNet and RegNet
4 Building RepVGG via Structural Re-param
an identity branch can be regarded as a degraded 1x1 conv, and the latter can be further regarded as a degraded 3x3 conv
4.1 Simple is Fast, Memory-economical, Flexible
First, the benefits of a simple (plain / single-branch) structure.

(1) Fast

VGG-16 has 8.4x the FLOPs of EfficientNet-B3 but runs 1.8x faster on a 1080Ti, which means the computational density of the former is about 15x that of the latter.
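(Computational density here is FLOPs divided by measured run time, so the ratio works out to roughly 8.4 × 1.8 ≈ 15.)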
Two important factors that the FLOPs metric ignores:
- the memory access cost (MAC): MAC constitutes a large portion of the time usage, e.g., in groupwise convolution
- the degree of parallelism: under the same FLOPs, a model with a high degree of parallelism can be much faster than one with a low degree
(2) Memory-economical

With a skip connection, the addition cannot happen until both branch results are ready, so the inputs of every branch have to be kept in memory until the add; a plain topology can release a layer's input memory as soon as the operation finishes.
(3) Flexible

- the last conv layers of every residual block have to produce tensors of the same shape, which constrains the architecture
- the multi-branch topology limits the application of channel pruning
4.2 Training-time Multi-branch Architecture
a multi-branch architecture makes the model an implicit ensemble of numerous shallower models

For example, with $n$ blocks the model can be interpreted as an ensemble of $2^n$ models, since each residual block offers two paths (go through the branch or skip it).
Now, the multi-branch structure the authors design for training time (a sketch follows the list below):

we use ResNet-like identity (only if the dimensions match) and 1x1 branches, so that the training-time information flow of a building block is $y = x + g(x) + f(x)$.
- $f(x)$: the 3x3 conv branch
- $g(x)$: the 1x1 conv branch
- $x$: the identity branch
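A minimal PyTorch sketch of this training-time block (my own illustration, not the official implementation; it assumes stride 1 and matching input/output channels so the identity branch is valid):

```python
import torch.nn as nn

class RepVGGBlockTrain(nn.Module):
    """Training-time RepVGG block: y = ReLU(BN(conv3x3(x)) + BN(conv1x1(x)) + BN(x))."""
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn3x3 = nn.BatchNorm2d(channels)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn1x1 = nn.BatchNorm2d(channels)
        self.bn_id = nn.BatchNorm2d(channels)  # the identity branch is BN only
        self.relu = nn.ReLU()

    def forward(self, x):
        # Three parallel branches are summed, then pass through one ReLU.
        return self.relu(self.bn3x3(self.conv3x3(x))
                         + self.bn1x1(self.conv1x1(x))
                         + self.bn_id(x))
```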
4.3 Re-param for Plain Inference-time Model

First, the structure: the 1x1, 3x3, and identity branches are each followed by a BN layer (see the figure in the paper).
Now the notation:

- $M^{(1)} \in \mathbb{R}^{N \times C_1 \times H_1 \times W_1}$: input feature map
- $M^{(2)} \in \mathbb{R}^{N \times C_2 \times H_2 \times W_2}$: output feature map
- $\mu$, $\sigma$, $\gamma$, $\beta$: the accumulated mean, standard deviation, learned scaling factor, and bias of each BN; superscript $(0)$ marks the identity branch's BN parameters, $(1)$ the 1x1 conv's, and $(3)$ the 3x3 conv's (following the paper's notation)
BN can be folded into the preceding conv. Writing the conv weight and bias before the merge as $W$, $b$ (here $b = 0$, since the training-time convs are bias-free) and after as $W'$, $b'$:

$$W' = \frac{\gamma}{\sigma} W, \qquad b' = \beta - \frac{\mu \gamma}{\sigma}$$
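A small sketch of this fusion in PyTorch (illustrative only; `fuse_conv_bn` is a hypothetical helper name, and the conv is assumed bias-free as above):

```python
import torch

def fuse_conv_bn(conv_weight, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # BN(W * x) = gamma * (W * x - mu) / sigma + beta
    #           = (gamma / sigma) * W * x + (beta - gamma * mu / sigma)
    sigma = torch.sqrt(bn_var + eps)          # per-channel std deviation
    scale = bn_gamma / sigma                  # shape: (out_channels,)
    fused_weight = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_bias = bn_beta - bn_mean * scale
    return fused_weight, fused_bias
```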
an identity can be viewed as a 1x1 conv with an identity matrix as the kernel.
This way, after fusing each BN, the multi-branch structure consists of one 3x3 kernel, two 1x1 kernels, and three bias vectors (the identity branch having become a 1x1 conv per the observation above).

Each 1x1 kernel can then be zero-padded into a 3x3 kernel (zeros filling the 8-neighborhood around the center value).

So the final structure is three 3x3 kernels plus three biases, which are added together.

By the distributivity of convolution (a linear operation): $W_1 * x + b_1 + W_2 * x + b_2 + W_3 * x + b_3 = (W_1 + W_2 + W_3) * x + (b_1 + b_2 + b_3)$.

Hence the three convs and three biases merge into a single conv with a single bias, and the multi-branch block becomes a plain one (a sketch of the full conversion follows).
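Putting it together, a hedged end-to-end sketch (again PyTorch; the random tensors stand in for branch weights whose BN has already been fused, and the helper names are my own):

```python
import torch
import torch.nn.functional as F

def identity_as_3x3(channels):
    # The identity branch as a 3x3 kernel: a single 1 at the center of
    # each output channel's own input-channel slice.
    w = torch.zeros(channels, channels, 3, 3)
    idx = torch.arange(channels)
    w[idx, idx, 1, 1] = 1.0
    return w

def merge_into_single_3x3(w3, b3, w1, b1, wid, bid):
    # Zero-pad the 1x1 kernel into a 3x3 kernel, then add kernels and
    # biases elementwise (distributivity of convolution over addition).
    w1_padded = F.pad(w1, [1, 1, 1, 1])
    return w3 + w1_padded + wid, b3 + b1 + bid

C = 8
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)
wid, bid = identity_as_3x3(C), torch.randn(C)

# Sanity check: the merged single conv matches the sum of the branches.
x = torch.randn(1, C, 16, 16)
y_branches = (F.conv2d(x, w3, b3, padding=1)
              + F.conv2d(x, w1, b1)
              + F.conv2d(x, wid, bid, padding=1))
w, b = merge_into_single_3x3(w3, b3, w1, b1, wid, bid)
y_merged = F.conv2d(x, w, b, padding=1)
print(torch.allclose(y_branches, y_merged, atol=1e-4))  # True
```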
4.4 Architectural Specification
Unlike VGG, it does not use max pooling; the body is simply a stack of 3x3 convs.

A multiplier $a$ scales the width of the first four stages and $b$ the last stage; usually $b > a$, because the last layer should have richer features for classification or other downstream tasks (a small width sketch follows the list below).

To further reduce the parameters and FLOPs, groupwise 3x3 conv layers may optionally be interleaved with dense ones, trading accuracy for efficiency:
- the 3rd, 5th, 7th, ..., 21st layers for RepVGG-A
- the 3rd, 5th, 7th, ..., 27th layers for RepVGG-B
- the number of groups $g$ is set to 1, 2, or 4
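A tiny sketch of the resulting stage widths (base widths 64/128/256/512; the min(64, 64a) cap on stage 1 follows the paper, since stage 1 runs at full resolution):

```python
def repvgg_stage_widths(a, b):
    # Five stages: the first four scaled by a, the last by b;
    # stage 1 is capped at 64 channels to keep the high-resolution stage cheap.
    return [min(64, int(64 * a)), int(64 * a), int(128 * a),
            int(256 * a), int(512 * b)]

print(repvgg_stage_widths(0.75, 2.5))  # RepVGG-A0 -> [48, 48, 96, 192, 1280]
```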
5 Experiments
5.1 Datasets
- ImageNet
- COCO
5.2 RepVGG for ImageNet Classification
Wino MULs (the multiplication count under Winograd acceleration) is a better proxy for GPU speed than FLOPs.

VGG-16 has more FLOPs than ResNet-152 but fewer Wino MULs (and multiplications are far more time-consuming than additions), so VGG-16 ends up faster at inference. (A 16-layer model out-FLOPing a 152-layer one is quite something.)

Of course, the gold standard is still the actual measured speed.

The suffix g4 means groups = 4.

Compared to RegNetX-12GF, RepVGG-B3 runs 31% faster (though with somewhat lower accuracy), which is the first time plain models catch up with the state of the art.
5.3 Structural Reparameterization is the Key
- Identity w/o BN
- Post-addition BN: the position of BN is changed from pre-addition to post-addition
- +ReLU in branches: inserts ReLU into each branch (after BN and before addition)
5.4 Semantic Segmentation
6 Conclusion (own) / Future work
- multiplications are much more time-consuming than additions
- RepVGG-style structural re-parameterization is not a generic over-parameterization technique, but a methodology critical for training powerful plain ConvNets
- RepVGG models are more parameter-efficient than ResNets, but may be less favored than mobile-regime models like MobileNets and ShuffleNets for low-power devices