【RepVGG】《RepVGG: Making VGG-style ConvNets Great Again》
CVPR-2021
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Building RepVGG via Structural Re-param
- 4.1 Simple is Fast, Memory-economical, Flexible
- 4.2 Training-time Multi-branch Architecture
- 4.3 Re-param for Plain Inference-time Model
- 4.4 Architectural Specification
- 5 Experiments
- 5.1 Datasets
- 5.2 RepVGG for ImageNet Classification
- 5.3 Structural Reparameterization is the Key
- 5.4 Semantic Segmentation
- 6 Conclusion (own) / Future work
1 Background and Motivation
MVGA: Making VGG-style ConvNets Great Again
Drawbacks of multi-branch CNN structures (ResNet, Inception):
- slow down the inference and reduce the memory utilization
- increase the memory access cost and lack support on various devices

Single-branch (plain) structures such as VGG avoid these drawbacks, but their accuracy lags behind multi-branch models.
In this paper, the authors use re-parameterization (the identity and 1x1 branches are equivalently converted into 3x3 convs) to propose a VGG-style plain network: RepVGG.

Multi-branch at training time, single-branch (VGG-style) at inference time:

decouple the training-time multi-branch and inference-time plain architecture via structural re-parameterization

Accuracy stays competitive while inference speed improves.
2 Related Work
- From Single-path to Multi-branch
- Effective Training of Single-path Models
- Model Re-parameterization
- Winograd Convolution
(TFLOPS: Tera Floating-point Operations Per Second)
Winograd [20] is a classic algorithm for accelerating 3x3 conv (only if the stride is 1), which has been well supported (and enabled by default) by libraries like cuDNN and MKL.
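As a concrete count: with the standard F(2×2, 3×3) Winograd transform, each 2×2 output tile needs 4×4 = 16 multiplications instead of the 2×2×3×3 = 36 of direct convolution, so the multiplications drop to 4/9 of the original. This is why the experiments later report Wino MULs alongside raw FLOPs.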
3 Advantages / Contributions
Proposes the RepVGG network:
- without any branches
- uses only 3x3 conv and ReLU
- no other heavy designs (e.g., automatic search or compound scaling)
- runs 83% faster than ResNet-50 and 101% faster than ResNet-101, with higher accuracy
- shows a favorable accuracy-speed trade-off compared with state-of-the-art models like EfficientNet and RegNet
4 Building RepVGG via Structural Re-param
an identity branch can be regarded as a degraded 1x1 conv, and the latter can be further regarded as a degraded 3x3 conv
4.1 Simple is Fast, Memory-economical, Flexible
First, the benefits of a simple (plain / single-branch) structure.

(1) Fast

VGG-16 has 8.4x the FLOPs of EfficientNet-B3 but runs 1.8x faster on a 1080Ti, which means the computational density of the former is about 15x that of the latter.
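(Computational density here is FLOPs divided by measured run time, so the ratio works out to roughly 8.4 × 1.8 ≈ 15.)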
Two important factors that the FLOPs metric ignores:
- the memory access cost (MAC): MAC constitutes a large portion of the time usage, e.g., in groupwise convolution
- the degree of parallelism: under the same FLOPs, a model with a high degree of parallelism can be much faster than one with a low degree
(2) Memory-economical

With a skip connection, the addition cannot happen until both branch results are ready, so the inputs of every branch have to be kept in memory until the add; a plain topology can release a layer's input memory as soon as the operation finishes.
(3) Flexible

- the last conv layers of every residual block have to produce tensors of the same shape, which constrains the architecture
- the multi-branch topology limits the application of channel pruning
4.2 Training-time Multi-branch Architecture
a multi-branch architecture makes the model an implicit ensemble of numerous shallower models

For example, with $n$ blocks the model can be interpreted as an ensemble of $2^n$ models, since each residual block offers two paths (go through the branch or skip it).
Now, the multi-branch structure the authors design for training time (a sketch follows the list below):

we use ResNet-like identity (only if the dimensions match) and 1x1 branches, so that the training-time information flow of a building block is $y = x + g(x) + f(x)$.
- $f(x)$: the 3x3 conv branch
- $g(x)$: the 1x1 conv branch
- $x$: the identity branch
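A minimal PyTorch sketch of this training-time block (my own illustration, not the official implementation; it assumes stride 1 and matching input/output channels so the identity branch is valid):

```python
import torch.nn as nn

class RepVGGBlockTrain(nn.Module):
    """Training-time RepVGG block: y = ReLU(BN(conv3x3(x)) + BN(conv1x1(x)) + BN(x))."""
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn3x3 = nn.BatchNorm2d(channels)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn1x1 = nn.BatchNorm2d(channels)
        self.bn_id = nn.BatchNorm2d(channels)  # the identity branch is BN only
        self.relu = nn.ReLU()

    def forward(self, x):
        # Three parallel branches are summed, then pass through one ReLU.
        return self.relu(self.bn3x3(self.conv3x3(x))
                         + self.bn1x1(self.conv1x1(x))
                         + self.bn_id(x))
```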
4.3 Re-param for Plain Inference-time Model

First, the structure: the 1x1, 3x3, and identity branches are each followed by a BN layer (see the figure in the paper).
Now the notation:

- $M^{(1)} \in \mathbb{R}^{N \times C_1 \times H_1 \times W_1}$: input feature map
- $M^{(2)} \in \mathbb{R}^{N \times C_2 \times H_2 \times W_2}$: output feature map
- $\mu$, $\sigma$, $\gamma$, $\beta$: the accumulated mean, standard deviation, learned scaling factor, and bias of each BN; superscript $(0)$ marks the identity branch's BN parameters, $(1)$ the 1x1 conv's, and $(3)$ the 3x3 conv's (following the paper's notation)
BN can be folded into the preceding conv. Writing the conv weight and bias before the merge as $W$, $b$ (here $b = 0$, since the training-time convs are bias-free) and after as $W'$, $b'$:

$$W' = \frac{\gamma}{\sigma} W, \qquad b' = \beta - \frac{\mu \gamma}{\sigma}$$
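A small sketch of this fusion in PyTorch (illustrative only; `fuse_conv_bn` is a hypothetical helper name, and the conv is assumed bias-free as above):

```python
import torch

def fuse_conv_bn(conv_weight, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # BN(W * x) = gamma * (W * x - mu) / sigma + beta
    #           = (gamma / sigma) * W * x + (beta - gamma * mu / sigma)
    sigma = torch.sqrt(bn_var + eps)          # per-channel std deviation
    scale = bn_gamma / sigma                  # shape: (out_channels,)
    fused_weight = conv_weight * scale.reshape(-1, 1, 1, 1)
    fused_bias = bn_beta - bn_mean * scale
    return fused_weight, fused_bias
```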
an identity can be viewed as a 1x1 conv with an identity matrix as the kernel.
This way, after fusing each BN, the multi-branch structure consists of one 3x3 kernel, two 1x1 kernels, and three bias vectors (the identity branch having become a 1x1 conv per the observation above).

Each 1x1 kernel can then be zero-padded into a 3x3 kernel (zeros filling the 8-neighborhood around the center value).

So the final structure is three 3x3 kernels plus three biases, which are added together.

By the distributivity of convolution (a linear operation): $W_1 * x + b_1 + W_2 * x + b_2 + W_3 * x + b_3 = (W_1 + W_2 + W_3) * x + (b_1 + b_2 + b_3)$.

Hence the three convs and three biases merge into a single conv with a single bias, and the multi-branch block becomes a plain one (a sketch of the full conversion follows).
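Putting it together, a hedged end-to-end sketch (again PyTorch; the random tensors stand in for branch weights whose BN has already been fused, and the helper names are my own):

```python
import torch
import torch.nn.functional as F

def identity_as_3x3(channels):
    # The identity branch as a 3x3 kernel: a single 1 at the center of
    # each output channel's own input-channel slice.
    w = torch.zeros(channels, channels, 3, 3)
    idx = torch.arange(channels)
    w[idx, idx, 1, 1] = 1.0
    return w

def merge_into_single_3x3(w3, b3, w1, b1, wid, bid):
    # Zero-pad the 1x1 kernel into a 3x3 kernel, then add kernels and
    # biases elementwise (distributivity of convolution over addition).
    w1_padded = F.pad(w1, [1, 1, 1, 1])
    return w3 + w1_padded + wid, b3 + b1 + bid

C = 8
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)
wid, bid = identity_as_3x3(C), torch.randn(C)

# Sanity check: the merged single conv matches the sum of the branches.
x = torch.randn(1, C, 16, 16)
y_branches = (F.conv2d(x, w3, b3, padding=1)
              + F.conv2d(x, w1, b1)
              + F.conv2d(x, wid, bid, padding=1))
w, b = merge_into_single_3x3(w3, b3, w1, b1, wid, bid)
y_merged = F.conv2d(x, w, b, padding=1)
print(torch.allclose(y_branches, y_merged, atol=1e-4))  # True
```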
4.4 Architectural Specification
Unlike VGG, it does not use max pooling; the body is simply a stack of 3x3 convs.

A multiplier $a$ scales the width of the first four stages and $b$ the last stage; usually $b > a$, because the last layer should have richer features for classification or other downstream tasks (a small width sketch follows the list below).

To further reduce the parameters and FLOPs, groupwise 3x3 conv layers may optionally be interleaved with dense ones, trading accuracy for efficiency:
- the 3rd, 5th, 7th, ..., 21st layers for RepVGG-A
- the 3rd, 5th, 7th, ..., 27th layers for RepVGG-B
- the number of groups $g$ is set to 1, 2, or 4
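A tiny sketch of the resulting stage widths (base widths 64/128/256/512; the min(64, 64a) cap on stage 1 follows the paper, since stage 1 runs at full resolution):

```python
def repvgg_stage_widths(a, b):
    # Five stages: the first four scaled by a, the last by b;
    # stage 1 is capped at 64 channels to keep the high-resolution stage cheap.
    return [min(64, int(64 * a)), int(64 * a), int(128 * a),
            int(256 * a), int(512 * b)]

print(repvgg_stage_widths(0.75, 2.5))  # RepVGG-A0 -> [48, 48, 96, 192, 1280]
```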
5 Experiments
5.1 Datasets
- ImageNet
- COCO
5.2 RepVGG for ImageNet Classification
Wino MULs (the multiplication count under Winograd acceleration) is a better proxy for GPU speed than FLOPs.

VGG-16 has more FLOPs than ResNet-152 but fewer Wino MULs (and multiplications are far more time-consuming than additions), so VGG-16 ends up faster at inference. (A 16-layer model out-FLOPing a 152-layer one is quite something.)

Of course, the gold standard is still the actual measured speed.

The suffix g4 means groups = 4.

Compared to RegNetX-12GF, RepVGG-B3 runs 31% faster (though with somewhat lower accuracy), which is the first time plain models catch up with the state of the art.
5.3 Structural Reparameterization is the Key
- Identity w/o BN
- Post-addition BN: the position of BN is changed from pre-addition to post-addition
- +ReLU in branches: inserts ReLU into each branch (after BN and before addition)
5.4 Semantic Segmentation
6 Conclusion (own) / Future work
- multiplications are much more time-consuming than additions
- RepVGG-style structural re-parameterization is not a generic over-parameterization technique, but a methodology critical for training powerful plain ConvNets
- RepVGG models are more parameter-efficient than ResNets, but may be less favored than mobile-regime models like MobileNets and ShuffleNets for low-power devices