Paper: RegNet, the Latest Algorithm from Kaiming He's Team (March 30, 2020) — A Translation and Interpretation of "Designing Network Design Spaces" from Facebook AI Research
Editor's note:
Whoa, whoa, whoa! The same familiar team, the same familiar bylines: the familiar Ross Girshick, the familiar Kaiming He.
As Zhihu users have pointed out, a sober look shows that although Kaiming He is on the paper, he is not the first author. A closer reading reveals that this belongs to Ilija Radosavovic's line of work, with He and Ross presumably advising. Radosavovic has published two papers in this series: the first evaluated design spaces; this second one builds on it, progressively shrinking the design space to extract insights that guide network design.
With plenty of GPUs and money to spare, you really can do whatever you want. Let me dig into this new RegNet.
Contents

Designing Network Design Spaces
Abstract
1. Introduction
2. Related Work
3. Design Space Design
3.1. Tools for Design Space Design
3.2. The AnyNet Design Space
3.3. The RegNet Design Space
3.4. Design Space Generalization
4. Analyzing the RegNetX Design Space
5. Comparison to Existing Networks
5.1. State-of-the-Art Comparison: Mobile Regime
5.2. Standard Baselines Comparison: ResNe(X)t
5.3. State-of-the-Art Comparison: Full Regime
6. Conclusion
Paper link: https://arxiv.org/pdf/2003.13678.pdf
Related article: How should we evaluate the RegNet newly released by the FAIR team?
Designing Network Design Spaces
Abstract
In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5× faster on GPUs.
1. Introduction
Deep convolutional neural networks are the engine of visual recognition. Over the past several years better architectures have resulted in considerable progress in a wide range of visual recognition tasks. Examples include LeNet [15], AlexNet [13], VGG [26], and ResNet [8]. This body of work advanced both the effectiveness of neural networks as well as our understanding of network design. In particular, the above sequence of works demonstrated the importance of convolution, network and data size, depth, and residuals, respectively. The outcome of these works is not just particular network instantiations, but also design principles that can be generalized and applied to numerous settings.
Figure 1. Design space design. We propose to design network design spaces, where a design space is a parametrized set of possible model architectures. Design space design is akin to manual network design, but elevated to the population level. In each step of our process the input is an initial design space and the output is a refined design space of simpler or better models. Following [21], we characterize the quality of a design space by sampling models and inspecting their error distribution. For example, in the figure above we start with an initial design space A and apply two refinement steps to yield design spaces B then C. In this case C ⊆ B ⊆ A (left), and the error distributions are strictly improving from A to B to C (right). The hope is that design principles that apply to model populations are more likely to be robust and generalize.

While manual network design has led to large advances, finding well-optimized networks manually can be challenging, especially as the number of design choices increases. A popular approach to address this limitation is neural architecture search (NAS). Given a fixed search space of possible networks, NAS automatically finds a good model within the search space. Recently, NAS has received a lot of attention and shown excellent results [34, 18, 29].

Despite the effectiveness of NAS, the paradigm has limitations. The outcome of the search is a single network instance tuned to a specific setting (e.g., hardware platform). This is sufficient in some cases; however, it does not enable discovery of network design principles that deepen our understanding and allow us to generalize to new settings. In particular, our aim is to find simple models that are easy to understand, build upon, and generalize.
In this work, we present a new network design paradigm that combines the advantages of manual design and NAS. Instead of focusing on designing individual network instances, we design design spaces that parametrize populations of networks. Like in manual design, we aim for interpretability and to discover general design principles that describe networks that are simple, work well, and generalize across settings. Like in NAS, we aim to take advantage of semi-automated procedures to help achieve these goals.

The general strategy we adopt is to progressively design simplified versions of an initial, relatively unconstrained, design space while maintaining or improving its quality (Figure 1). The overall process is analogous to manual design, elevated to the population level and guided via distribution estimates of network design spaces [21].

As a testbed for this paradigm, our focus is on exploring network structure (e.g., width, depth, groups, etc.) assuming standard model families including VGG [26], ResNet [8], and ResNeXt [31]. We start with a relatively unconstrained design space we call AnyNet (e.g., widths and depths vary freely across stages) and apply our human-in-the-loop methodology to arrive at a low-dimensional design space consisting of simple "regular" networks, that we call RegNet. The core of the RegNet design space is simple: stage widths and depths are determined by a quantized linear function. Compared to AnyNet, the RegNet design space has simpler models, is easier to interpret, and has a higher concentration of good models.
We design the RegNet design space in a low-compute, low-epoch regime using a single network block type on ImageNet [3]. We then show that the RegNet design space generalizes to larger compute regimes, schedule lengths, and network block types. Furthermore, an important property of the design space design is that it is more interpretable and can lead to insights that we can learn from. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. For example, we find that the depth of the best models is stable across compute regimes (~20 blocks) and that the best models do not use either a bottleneck or inverted bottleneck.

We compare top REGNET models to existing networks in various settings. First, REGNET models are surprisingly effective in the mobile regime. We hope that these simple models can serve as strong baselines for future work. Next, REGNET models lead to considerable improvements over standard RESNE(X)T [8, 31] models in all metrics. We highlight the improvements for fixed activations, which is of high practical interest as the number of activations can strongly influence the runtime on accelerators such as GPUs. Next, we compare to the state-of-the-art EFFICIENTNET [29] models across compute regimes. Under comparable training settings and flops, REGNET models outperform EFFICIENTNET models while being up to 5× faster on GPUs. We further test generalization on ImageNetV2 [24].

We note that network structure is arguably the simplest form of a design space design one can consider. Focusing on designing richer design spaces (e.g., including operators) may lead to better networks. Nevertheless, the structure will likely remain a core component of such design spaces.

In order to facilitate future research we will release all code and pretrained models introduced in this work.
2. Related Work
Manual network design. The introduction of AlexNet [13] catapulted network design into a thriving research area. In the following years, improved network designs were proposed; examples include VGG [26], Inception [27, 28], ResNet [8], ResNeXt [31], DenseNet [11], and MobileNet [9, 25]. The design process behind these networks was largely manual and focused on discovering new design choices that improve accuracy, e.g., the use of deeper models or residuals. We likewise share the goal of discovering new design principles. In fact, our methodology is analogous to manual design but performed at the design space level.

Automated network design. Recently, the network design process has shifted from a manual exploration to more automated network design, popularized by NAS. NAS has proven to be an effective tool for finding good models, e.g., [35, 23, 17, 20, 18, 29]. The majority of work in NAS focuses on the search algorithm, i.e., efficiently finding the best network instances within a fixed, manually designed search space (which we call a design space). Instead, our focus is on a paradigm for designing novel design spaces. The two are complementary: better design spaces can improve the efficiency of NAS search algorithms and also lead to existence of better models by enriching the design space.
Network scaling. Both manual and semi-automated network design typically focus on finding best-performing network instances for a specific regime (e.g., number of flops comparable to ResNet-50). Since the result of this procedure is a single network instance, it is not clear how to adapt the instance to a different regime (e.g., fewer flops). A common practice is to apply network scaling rules, such as varying network depth [8], width [32], resolution [9], or all three jointly [29]. Instead, our goal is to discover general design principles that hold across regimes and allow for efficient tuning for the optimal network in any target regime.

Comparing networks. Given the vast number of possible network design spaces, it is essential to use a reliable comparison metric to guide our design process. Recently, the authors of [21] proposed a methodology for comparing and analyzing populations of networks sampled from a design space. This distribution-level view is fully-aligned with our goal of finding general design principles. Thus, we adopt this methodology and demonstrate that it can serve as a useful tool for the design space design process.
Parameterization. Our final quantized linear parameterization shares similarity with previous work, e.g. how stage widths are set [26, 7, 32, 11, 9]. However, there are two key differences. First, we provide an empirical study justifying the design choices we make. Second, we give insights into structural design choices that were not previously understood (e.g., how to set the number of blocks in each stage).
3. Design Space Design
Our goal is to design better networks for visual recognition. Rather than designing or searching for a single best model under specific settings, we study the behavior of populations of models. We aim to discover general design principles that can apply to and improve an entire model population. Such design principles can provide insights into network design and are more likely to generalize to new settings (unlike a single model tuned for a specific scenario).

We rely on the concept of network design spaces introduced by Radosavovic et al. [21]. A design space is a large, possibly infinite, population of model architectures. The core insight from [21] is that we can sample models from a design space, giving rise to a model distribution, and turn to tools from classical statistics to analyze the design space. We note that this differs from architecture search, where the goal is to find the single best model from the space.
In this work, we propose to design progressively simplified versions of an initial, unconstrained design space. We refer to this process as design space design. Design space design is akin to sequential manual network design, but elevated to the population level. Specifically, in each step of our design process the input is an initial design space and the output is a refined design space, where the aim of each design step is to discover design principles that yield populations of simpler or better performing models.

We begin by describing the basic tools we use for design space design in §3.1. Next, in §3.2 we apply our methodology to a design space, called AnyNet, that allows unconstrained network structures. In §3.3, after a sequence of design steps, we obtain a simplified design space consisting of only regular network structures that we name RegNet. Finally, as our goal is not to design a design space for a single setting, but rather to discover general principles of network design that generalize to new settings, in §3.4 we test the generalization of the RegNet design space to new settings.

Relative to the AnyNet design space, the RegNet design space is: (1) simplified both in terms of its dimension and type of network configurations it permits, (2) contains a higher concentration of top-performing models, and (3) is more amenable to analysis and interpretation.
3.1. Tools for Design Space Design
We begin with an overview of tools for design space design. To evaluate and compare design spaces, we use the tools introduced by Radosavovic et al. [21], who propose to quantify the quality of a design space by sampling a set of models from that design space and characterizing the resulting model error distribution. The key intuition behind this approach is that comparing distributions is more robust and informative than using search (manual or automated) and comparing the best found models from two design spaces.
Figure 2. Statistics of the AnyNetX design space computed with n = 500 sampled models. Left: The error empirical distribution function (EDF) serves as our foundational tool for visualizing the quality of the design space. In the legend we report the min error and mean error (which corresponds to the area under the curve). Middle: Distribution of network depth d (number of blocks) versus error. Right: Distribution of block widths in the fourth stage (w4) versus error. The blue shaded regions are ranges containing the best models with 95% confidence (obtained using an empirical bootstrap), and the black vertical line the most likely best value.
To obtain a distribution of models, we sample and train n models from a design space. For efficiency, we primarily do so in a low-compute, low-epoch training regime. In particular, in this section we use the 400 million flop (400MF) regime and train each sampled model for 10 epochs on the ImageNet dataset [3]. We note that while we train many models, each training run is fast: training 100 models at 400MF for 10 epochs is roughly equivalent in flops to training a single ResNet-50 [8] model at 4GF for 100 epochs.

As in [21], our primary tool for analyzing design space quality is the error empirical distribution function (EDF). The error EDF of n models with errors ei is given by:

F(e) = (1/n) · Σi 1[ei < e]

F(e) gives the fraction of models with error less than e. We show the error EDF for n = 500 sampled models from the AnyNetX design space (described in §3.2) in Figure 2 (left).
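To make the EDF concrete, here is a minimal Python sketch of how one might compute and summarize it. The Gaussian `errors` array is a synthetic placeholder standing in for the errors of n = 500 trained models, not data from the paper.

```python
import numpy as np

def error_edf(errors):
    """Error empirical distribution function (EDF).

    F(e) is the fraction of sampled models whose error is below e;
    returns the sorted errors and cumulative fractions for plotting.
    """
    e = np.sort(np.asarray(errors, dtype=float))
    F = np.arange(1, len(e) + 1) / len(e)
    return e, F

# Placeholder errors standing in for n = 500 trained models (top-1 %).
rng = np.random.default_rng(0)
errors = rng.normal(loc=49.0, scale=3.0, size=500)
e, F = error_edf(errors)
# min and mean error, the two summary numbers reported in Figure 2's legend.
print(f"min error = {e[0]:.1f}, mean error = {e.mean():.1f}")
```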
Given a population of trained models, we can plot and analyze various network properties versus network error; see Figure 2 (middle) and (right) for two examples taken from the AnyNetX design space. Such visualizations show 1D projections of a complex, high-dimensional space, and can help obtain insights into the design space. For these plots, we employ an empirical bootstrap [5] to estimate the likely range in which the best models fall.

To summarize: (1) we generate distributions of models obtained by sampling and training n models from a design space, (2) we compute and plot error EDFs to summarize design space quality, (3) we visualize various properties of a design space and use an empirical bootstrap to gain insight, and (4) we use these insights to refine the design space.
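The empirical bootstrap behind the shaded regions in Figure 2 can be sketched along the following lines. The subsample fraction, number of resamples, and seed are illustrative assumptions; the paper's text does not fix these details.

```python
import numpy as np

def best_param_range(params, errors, n_boot=5000, frac=0.25, conf=0.95, seed=0):
    """Empirical bootstrap for the likely range of the best parameter value.

    Repeatedly subsample the (param, error) pairs, record the parameter of the
    lowest-error model in each subsample, and return the central `conf`
    interval plus the median (the "most likely best value" in Figure 2).
    """
    params = np.asarray(params)
    errors = np.asarray(errors)
    rng = np.random.default_rng(seed)
    k = max(1, int(frac * len(params)))
    best = np.empty(n_boot)
    for t in range(n_boot):
        idx = rng.integers(0, len(params), size=k)   # resample with replacement
        best[t] = params[idx][np.argmin(errors[idx])]
    lo, hi = np.quantile(best, [(1 - conf) / 2, (1 + conf) / 2])
    return lo, hi, np.median(best)
```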
Figure 3. General network structure for models in our design spaces. (a) Each network consists of a stem (stride-two 3×3 conv with w0 = 32 output channels), followed by the network body that performs the bulk of the computation, and then a head (average pooling followed by a fully connected layer) that predicts n output classes. (b) The network body is composed of a sequence of stages that operate at progressively reduced resolution ri. (c) Each stage consists of a sequence of identical blocks, except the first block which uses stride-two conv. While the general structure is simple, the total number of possible network configurations is vast.
3.2. The AnyNet Design Space
We next introduce our initial AnyNet design space. Our focus is on exploring the structure of neural networks assuming standard, fixed network blocks (e.g., residual bottleneck blocks). In our terminology the structure of the network includes elements such as the number of blocks (i.e. network depth), block widths (i.e. number of channels), and other block parameters such as bottleneck ratios or group widths. The structure of the network determines the distribution of compute, parameters, and memory throughout the computational graph of the network and is key in determining its accuracy and efficiency.

The basic design of networks in our AnyNet design space is straightforward. Given an input image, a network consists of a simple stem, followed by the network body that performs the bulk of the computation, and a final network head that predicts the output classes, see Figure 3a. We keep the stem and head fixed and as simple as possible, and instead focus on the structure of the network body that is central in determining network compute and accuracy.
Figure 4. The X block is based on the standard residual bottleneck block with group convolution [31]. (a) Each X block consists of a 1×1 conv, a 3×3 group conv, and a final 1×1 conv, where the 1×1 convs alter the channel width. BatchNorm [12] and ReLU follow each conv. The block has 3 parameters: the width wi, bottleneck ratio bi, and group width gi. (b) The stride-two (s = 2) version.
The network body consists of 4 stages operating at progressively reduced resolution, see Figure 3b (we explore varying the number of stages in §3.4). Each stage consists of a sequence of identical blocks, see Figure 3c. In total, for each stage i the degrees of freedom include the number of blocks di, block width wi, and any other block parameters. While the general structure is simple, the total number of possible networks in the AnyNet design space is vast.

Most of our experiments use the standard residual bottleneck block with group convolution [31], shown in Figure 4. We refer to this as the X block, and the AnyNet design space built on it as AnyNetX (we explore other blocks in §3.4). While the X block is quite rudimentary, we show it can be surprisingly effective when network structure is optimized.

The AnyNetX design space has 16 degrees of freedom as each network consists of 4 stages and each stage i has 4 parameters: the number of blocks di, block width wi, bottleneck ratio bi, and group width gi. We fix the input resolution r = 224 unless otherwise noted. To obtain valid models, we perform log-uniform sampling of di ≤ 16, wi ≤ 1024 and divisible by 8, bi ∈ {1, 2, 4}, and gi ∈ {1, 2, . . . , 32} (we test these ranges later). We repeat the sampling until we obtain n = 500 models in our target complexity regime (360MF to 400MF), and train each model for 10 epochs. Basic statistics for AnyNetX are shown in Figure 2.
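A rough sketch of this rejection-sampling loop is shown below. The `count_flops` helper is hypothetical (it would build the model and count its flops), group widths are drawn from powers of two up to 32 for simplicity (the paper's text allows any value in {1, ..., 32}), and details such as adjusting widths to be divisible by the group width are left to the model builder.

```python
import numpy as np

def log_uniform(rng, lo, hi):
    """Sample log-uniformly from [lo, hi]."""
    return float(np.exp(rng.uniform(np.log(lo), np.log(hi))))

def sample_anynetx(rng):
    """Draw one AnyNetX configuration: (d_i, w_i, b_i, g_i) for 4 stages."""
    cfg = []
    for _ in range(4):
        d = int(round(log_uniform(rng, 1, 16)))            # blocks per stage, d_i <= 16
        w = 8 * int(round(log_uniform(rng, 8, 1024) / 8))  # width <= 1024, multiple of 8
        b = int(rng.choice([1, 2, 4]))                     # bottleneck ratio
        g = int(rng.choice([1, 2, 4, 8, 16, 32]))          # group width
        cfg.append(dict(d=d, w=w, b=b, g=g))
    return cfg

def sample_in_regime(rng, count_flops, lo=360e6, hi=400e6):
    """Rejection-sample until the model lands in the 360MF-400MF target regime."""
    while True:
        cfg = sample_anynetx(rng)
        if lo <= count_flops(cfg) <= hi:   # count_flops is a hypothetical helper
            return cfg
```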
There are (16·128·3·6)^4 ≈ 10^18 possible model configurations in the AnyNetX design space. Rather than searching for the single best model out of these ~10^18 configurations, we explore whether there are general design principles that can help us understand and refine this design space. To do so, we apply our approach of designing design spaces. In each step of this approach, our aims are: (1) to simplify the structure of the design space, (2) to improve the interpretability of the design space, (3) to improve or maintain the design space quality, and (4) to maintain model diversity in the design space.
Figure 5. AnyNetXB (left) and AnyNetXC (middle) introduce a shared bottleneck ratio bi = b and shared group width gi = g, respectively. This simplifies the design spaces while resulting in virtually no change in the error EDFs. Moreover, AnyNetXB and AnyNetXC are more amenable to analysis. Applying an empirical bootstrap to b and g we see trends emerge, e.g., with 95% confidence b ≤ 2 is best in this regime (right). No such trends are evident in the individual bi and gi in AnyNetXA (not shown).

Figure 6. Example good and bad AnyNetXC networks, shown in the top and bottom rows, respectively. For each network, we plot the width wj of every block j up to the network depth d. These per-block widths wj are computed from the per-stage block depths di and block widths wi (listed in the legends for reference).

Figure 7. AnyNetXD (left) and AnyNetXE (right). We show various constraints on the per-stage widths wi and depths di. In both cases, having increasing wi and di is beneficial, while using constant or decreasing values is much worse. Note that AnyNetXD = AnyNetXC + wi+1 ≥ wi, and AnyNetXE = AnyNetXD + di+1 ≥ di. We explore stronger constraints on wi and di shortly.

Figure 8. Linear fits. Top networks from the AnyNetX design space can be well modeled by a quantized linear parameterization, and conversely, networks for which this parameterization has a higher fitting error efit tend to perform poorly. See text for details.
We now apply this approach to the AnyNetX design space.

AnyNetXA. For clarity, going forward we refer to the initial, unconstrained AnyNetX design space as AnyNetXA.

AnyNetXB. We first test a shared bottleneck ratio bi = b for all stages i for the AnyNetXA design space, and refer to the resulting design space as AnyNetXB. As before, we sample and train 500 models from AnyNetXB in the same settings. The EDFs of AnyNetXA and AnyNetXB, shown in Figure 5 (left), are virtually identical both in the average and best case. This indicates no loss in accuracy when coupling the bi. In addition to being simpler, AnyNetXB is more amenable to analysis, see for example Figure 5 (right).

AnyNetXC. Our second refinement step closely follows the first. Starting with AnyNetXB, we additionally use a shared group width gi = g for all stages to obtain AnyNetXC. As before, the EDFs are nearly unchanged, see Figure 5 (middle). Overall, AnyNetXC has 6 fewer degrees of freedom than AnyNetXA, and reduces the design space size nearly four orders of magnitude. Interestingly, we find g > 1 is best (not shown); we analyze this in more detail in §4.

AnyNetXD. Next, we examine typical network structures of both good and bad networks from AnyNetXC in Figure 6. A pattern emerges: good networks have increasing widths. We test the design principle of wi+1 ≥ wi, and refer to the design space with this constraint as AnyNetXD. In Figure 7 (left) we see this improves the EDF substantially. We return to examining other options for controlling width shortly.

AnyNetXE. Upon further inspection of many models (not shown), we observed another interesting trend. In addition to stage widths wi increasing with i, the stage depths di likewise tend to increase for the best models, although not necessarily in the last stage. Nevertheless, we test a design space variant AnyNetXE with di+1 ≥ di in Figure 7 (right), and see it also improves results. Finally, we note that the constraints on wi and di each reduce the design space by 4!, with a cumulative reduction of O(10^7) from AnyNetXA.
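As a compact illustration, the cumulative constraints of AnyNetXB through AnyNetXE can be expressed as a simple predicate over a sampled configuration (using the `cfg` format from the sampler sketch above). This is for exposition only; in practice one would sample directly in the restricted space rather than filter.

```python
def satisfies(cfg, variant="E"):
    """Check the cumulative AnyNetX_B..E constraints on a sampled config.

    B: shared bottleneck ratio b_i = b      C: shared group width g_i = g
    D: non-decreasing widths w_i+1 >= w_i   E: non-decreasing depths d_i+1 >= d_i
    """
    B = len({s["b"] for s in cfg}) == 1
    C = B and len({s["g"] for s in cfg}) == 1
    D = C and all(cfg[i]["w"] <= cfg[i + 1]["w"] for i in range(len(cfg) - 1))
    E = D and all(cfg[i]["d"] <= cfg[i + 1]["d"] for i in range(len(cfg) - 1))
    return {"B": B, "C": C, "D": D, "E": E}[variant]
```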
3.3. The RegNet Design Space
To gain further insight into the model structure, we show the best 20 models from AnyNetXE in a single plot, see Figure 8 (top-left). For each model, we plot the per-block width wj of every block j up to the network depth d (we use i and j to index over stages and blocks, respectively). See Figure 6 for reference of our model visualization.

While there is significant variance in the individual models (gray curves), in the aggregate a pattern emerges. In particular, in the same plot we show the line wj = 48·(j+1) for 0 ≤ j ≤ 20 (solid black curve, please note that the y-axis is logarithmic). Remarkably, this trivial linear fit seems to explain the population trend of the growth of network widths for top models. Note, however, that this linear fit assigns a different width wj to each block, whereas individual models have quantized widths (piecewise constant functions).

To see if a similar pattern applies to individual models, we need a strategy to quantize a line to a piecewise constant function. Inspired by our observations from AnyNetXD and AnyNetXE, we propose the following approach. First, we introduce a linear parameterization for block widths:

uj = w0 + wa·j    for 0 ≤ j < d    (2)

This parameterization has three parameters: depth d, initial width w0 > 0, and slope wa > 0, and generates a different block width uj for each block j < d. To quantize uj, we introduce an additional parameter wm > 0 that controls quantization. Given uj from Eqn. (2), we compute sj for each block j such that:

uj = w0·wm^sj    (3)

Then, to quantize uj, we simply round sj (denoted by ⌊sj⌉) and compute the quantized per-block widths wj via:

wj = w0·wm^⌊sj⌉    (4)

We can convert the per-block wj to our per-stage format by simply counting the number of blocks with constant width, that is, each stage i has block width wi = w0·wm^i and number of blocks di = Σj 1[⌊sj⌉ = i]. When only considering four stage networks, we ignore the parameter combinations that give rise to a different number of stages.
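The following sketch implements Eqns. (2)–(4) for illustration. The official pycls implementation differs in rounding and width-compatibility details, and the example parameters are illustrative rather than taken from the paper's tables.

```python
import numpy as np

def generate_widths(d, w0, wa, wm, q=8):
    """Per-stage widths/depths from the quantized linear rule, Eqns. (2)-(4)."""
    j = np.arange(d)
    u = w0 + wa * j                               # Eqn. (2): continuous widths
    s = np.round(np.log(u / w0) / np.log(wm))     # Eqn. (3): solve for s_j, then round
    w = w0 * np.power(wm, s)                      # Eqn. (4): quantized per-block widths
    w = (q * np.round(w / q)).astype(int)         # snap to multiples of q = 8
    ws, ds = np.unique(w, return_counts=True)     # runs of equal width -> stages
    return ws.tolist(), ds.tolist()

# Illustrative parameters (not a model from the paper's tables):
print(generate_widths(d=13, w0=24, wa=36, wm=2.5))
# -> per-stage widths and the number of blocks in each stage
```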
Figure 9. RegNetX design space. See text for details.

Figure 10. RegNetX generalization. We compare RegNetX to AnyNetX at higher flops (top-left), higher epochs (top-middle), with 5-stage networks (top-right), and with various block types (bottom). In all cases the ordering of the design spaces is consistent and we see no signs of design space overfitting.

Table 1. Design space summary. See text for details.
We test this parameterization by fitting to models from AnyNetX. In particular, given a model, we compute the fit by setting d to the network depth and performing a grid search over w0, wa and wm to minimize the mean log-ratio (denoted by efit) of predicted to observed per-block widths. Results for two top networks from AnyNetXE are shown in Figure 8 (top-right). The quantized linear fits (dashed curves) are good fits of these best models (solid curves).

Next, we plot the fitting error efit versus network error for every network in AnyNetXC through AnyNetXE in Figure 8 (bottom). First, we note that the best models in each design space all have good linear fits. Indeed, an empirical bootstrap gives a narrow band of efit near 0 that likely contains the best models in each design space. Second, we note that on average, efit improves going from AnyNetXC to AnyNetXE, showing that the linear parametrization naturally enforces related constraints to wi and di increasing.

To further test the linear parameterization, we design a design space that only contains models with such linear structure. In particular, we specify a network structure via 6 parameters: d, w0, wa, wm (and also b, g). Given these, we generate block widths and depths via Eqn. (2)-(4). We refer to the resulting design space as RegNet, as it contains only simple, regular models. We sample d < 64, w0, wa < 256, 1.5 ≤ wm ≤ 3 and b and g as before (ranges set based on efit on AnyNetXE).
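A possible implementation of this fit is sketched below. The grid resolution and the use of the mean absolute log-ratio are assumptions: the paper only states that the mean log-ratio is minimized over w0, wa, and wm with d fixed.

```python
import itertools
import numpy as np

def efit(w_obs, w0, wa, wm):
    """Mean absolute log-ratio of predicted to observed per-block widths."""
    j = np.arange(len(w_obs))
    u = w0 + wa * j                               # Eqn. (2)
    s = np.round(np.log(u / w0) / np.log(wm))     # Eqns. (3)-(4)
    w_pred = w0 * np.power(wm, s)
    return float(np.mean(np.abs(np.log(w_pred / np.asarray(w_obs, dtype=float)))))

def fit_linear(w_obs):
    """Grid-search (w0, wa, wm) minimizing e_fit; d is fixed to len(w_obs)."""
    grid = itertools.product(range(8, 257, 8),              # w0 candidates
                             range(8, 257, 8),              # wa candidates
                             np.arange(1.5, 3.0001, 0.05))  # wm candidates
    return min(grid, key=lambda p: efit(w_obs, *p))
```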
The error EDF of RegNetX is shown in Figure 9 (left). Models in RegNetX have better average error than AnyNetX while maintaining the best models. In Figure 9 (middle) we test two further simplifications. First, using wm = 2 (doubling width between stages) slightly improves the EDF, but we note that using wm ≥ 2 performs better (shown later). Second, we test setting w0 = wa, further simplifying the linear parameterization to uj = wa·(j + 1). Interestingly, this performs even better. However, to maintain the diversity of models, we do not impose either restriction. Finally, in Figure 9 (right) we show that random search efficiency is much higher for RegNetX; searching over just ~32 random models is likely to yield good models.

Table 1 shows a summary of the design space sizes (for RegNet we estimate the size by quantizing its continuous parameters). In designing RegNetX, we reduced the dimension of the original AnyNetX design space from 16 to 6 dimensions, and the size nearly 10 orders of magnitude. We note, however, that RegNet still contains a good diversity of models that can be tuned for a variety of settings.
3.4. Design Space Generalization
We designed the RegNet design space in a low-compute, low-epoch training regime with only a single block type. However, our goal is not to design a design space for a single setting, but rather to discover general principles of network design that can generalize to new settings.

In Figure 10, we compare the RegNetX design space to AnyNetXA and AnyNetXE at higher flops, higher epochs, with 5-stage networks, and with various block types (described in the appendix). In all cases the ordering of the design spaces is consistent, with RegNetX > AnyNetXE > AnyNetXA. In other words, we see no signs of overfitting. These results are promising because they show RegNet can generalize to new settings. The 5-stage results show the regular structure of RegNet can generalize to more stages, where AnyNetXA has even more degrees of freedom.
Figure 11. RegNetX parameter trends. For each parameter and each flop regime we apply an empirical bootstrap to obtain the range that contains best models with 95% confidence (shown with blue shading) and the likely best model (black line), see also Figure 2. We observe that for best models the depths d are remarkably stable across flop regimes, and b = 1 and wm ≈ 2.5 are best. Block and group widths (wa, w0, g) tend to increase with flops.

Figure 12. Complexity metrics. Top: Activations can have a stronger correlation to runtime on hardware accelerators than flops (we measure inference time for 64 images on an NVIDIA V100 GPU). Bottom: Trend analysis of complexity vs. flops and best fit curves (shown in blue) of the trends for best models (black curves).

Figure 13. We refine RegNetX using various constraints (see text). The constrained variant (C) is best across all flop regimes while being more efficient in terms of parameters and activations.

Figure 14. We evaluate RegNetX with alternate design choices. Left: Inverted bottleneck (1/8 ≤ b ≤ 1) degrades results and depthwise conv (g = 1) is even worse. Middle: Varying resolution r harms results. Right: RegNetY (Y=X+SE) improves the EDF.
4. Analyzing the RegNetX Design Space
We next further analyze the RegNetX design space and revisit common deep network design choices. Our analysis yields surprising insights that don't match popular practice, which allows us to achieve good results with simple models.

As the RegNetX design space has a high concentration of good models, for the following results we switch to sampling fewer models (100) but training them for longer (25 epochs) with a learning rate of 0.1 (see appendix). We do so to observe more fine-grained trends in network behavior.

RegNet trends. We show trends in the RegNetX parameters across flop regimes in Figure 11. Remarkably, the depth of best models is stable across regimes (top-left), with an optimal depth of ~20 blocks (60 layers). This is in contrast to the common practice of using deeper models for higher flop regimes. We also observe that the best models use a bottleneck ratio b of 1.0 (top-middle), which effectively removes the bottleneck (commonly used in practice). Next, we observe that the width multiplier wm of good models is ~2.5 (top-right), similar but not identical to the popular recipe of doubling widths across stages. The remaining parameters (g, wa, w0) increase with complexity (bottom).
Complexity analysis. In addition to flops and parameters, we analyze network activations, which we define as the size of the output tensors of all conv layers (we list complexity measures of common conv operators in Figure 12, top-left). While not a common measure of network complexity, activations can heavily affect runtime on memory-bound hardware accelerators (e.g., GPUs, TPUs), for example, see Figure 12 (top). In Figure 12 (bottom), we observe that for the best models in the population, activations increase with the square-root of flops, parameters increase linearly, and runtime is best modeled using both a linear and a square-root term due to its dependence on both flops and activations.

RegNetX constrained. Using these findings, we refine the RegNetX design space. First, based on Figure 11 (top), we set b = 1, d ≤ 40, and wm ≥ 2. Second, we limit parameters and activations, following Figure 12 (bottom). This yields fast, low-parameter, low-memory models without affecting accuracy. In Figure 13, we test RegNetX with these constraints and observe that the constrained version is superior across all flop regimes. We use this version in §5, and further limit depth to 12 ≤ d ≤ 28 (see also Appendix D).
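The per-layer bookkeeping behind these metrics can be sketched as follows, assuming a stride-1 convolution with "flops" counted as multiply-adds and ignoring biases and BatchNorm; the example call is illustrative.

```python
def conv_complexity(r, w_in, w_out, k=3, groups=1):
    """Flops (multiply-adds), parameters, and activations of a stride-1
    k x k conv on an r x r feature map, per the conventions of Figure 12."""
    flops = r * r * k * k * w_in * w_out // groups
    params = k * k * w_in * w_out // groups
    acts = r * r * w_out   # activations = size of the output tensor
    return flops, params, acts

# e.g. the 3x3 group conv of an X block at 56x56, width 64, group width 16:
print(conv_complexity(r=56, w_in=64, w_out=64, k=3, groups=64 // 16))
```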
Alternate design choices. Modern mobile networks often employ the inverted bottleneck (b < 1) proposed in [25] along with depthwise conv [1] (g = 1). In Figure 14 (left), we observe that the inverted bottleneck degrades the EDF slightly and depthwise conv performs even worse relative to b = 1 and g ≥ 1 (see appendix for further analysis). Next, motivated by [29] who found that scaling the input image resolution can be helpful, we test varying resolution in Figure 14 (middle). Contrary to [29], we find that for RegNetX a fixed resolution of 224×224 is best, even at higher flops.

SE. Finally, we evaluate RegNetX with the popular Squeeze-and-Excitation (SE) op [10] (we abbreviate X+SE as Y and refer to the resulting design space as RegNetY). In Figure 14 (right), we see that RegNetY yields good gains.
Figure 15. Top REGNETX models. We measure inference time for 64 images on an NVIDIA V100 GPU; train time is for 100 epochs on 8 GPUs with the batch size listed. Network diagram legends contain all information required to implement the models.

Figure 16. Top REGNETY models (Y=X+SE). The benchmarking setup and the figure format is the same as in Figure 15.
5. Comparison to Existing Networks
We now compare top models from the RegNetX and RegNetY design spaces at various complexities to the state-of-the-art on ImageNet [3]. We denote individual models using small caps, e.g. REGNETX. We also suffix the models with the flop regime, e.g. 400MF. For each flop regime, we pick the best model from 25 random settings of the RegNet parameters (d, g, wm, wa, w0), and re-train the top model 5 times at 100 epochs to obtain robust error estimates.

Resulting top REGNETX and REGNETY models for each flop regime are shown in Figures 15 and 16, respectively. In addition to the simple linear structure and the trends we analyzed in §4, we observe an interesting pattern. Namely, the higher flop models have a large number of blocks in the third stage and a small number of blocks in the last stage. This is similar to the design of standard RESNET models. Moreover, we observe that the group width g increases with complexity, but depth d saturates for large models.
Our goal is to perform fair comparisons and provide simple and easy-to-reproduce baselines. We note that along with better architectures, much of the recently reported gains in network performance are based on enhancements to the training setup and regularization scheme (see Table 7). As our focus is on evaluating network architectures, we perform carefully controlled experiments under the same training setup. In particular, to provide fair comparisons to classic work, we do not use any training-time enhancements.
5.1. State-of-the-Art Comparison: Mobile Regime
Much of the recent work on network design has focused on the mobile regime (~600MF). In Table 2, we compare REGNET models at 600MF to existing mobile networks. We observe that REGNETS are surprisingly effective in this regime considering the substantial body of work on finding better mobile networks via both manual design [9, 25, 19] and NAS [35, 23, 17, 18].

We emphasize that REGNET models use our basic 100 epoch schedule with no regularization except weight decay, while most mobile networks use longer schedules with various enhancements, such as deep supervision [16], Cutout [4], DropPath [14], AutoAugment [2], and so on. As such, we hope our strong results obtained with a short training schedule without enhancements can serve as a simple baseline for future work.
Table 2. Mobile regime. We compare existing models using originally reported errors to RegNet models trained in a basic setup. Our simple RegNet models achieve surprisingly good results given the effort focused on this regime in the past few years.

Table 3. RESNE(X)T comparisons. (a) Grouped by activations, REGNETX show considerable gains (note that for each group GPU inference and training times are similar). (b) REGNETX models outperform RESNE(X)T models under fixed flops as well.

Table 4. EFFICIENTNET comparisons using our standard training schedule. Under comparable training settings, REGNETY outperforms EFFICIENTNET for most flop regimes. Moreover, REGNET models are considerably faster, e.g., REGNETX-8000 is about 5× faster than EFFICIENTNET-B5. Note that originally reported errors for EFFICIENTNET (shown grayed out) are much lower but use longer and enhanced training schedules, see Table 7.

Figure 17. ResNe(X)t comparisons. REGNETX models versus RESNE(X)T-(50,101,152) under various complexity metrics. As all models use the identical components and training settings, all observed gains are from the design of the RegNetX design space.

Figure 18. EFFICIENTNET comparisons. REGNETs outperform the state of the art, especially when considering activations.
5.2. Standard Baselines Comparison: ResNe(X)t
Next, we compare REGNETX to standard RESNET [8] and RESNEXT [31] models. All of the models in this experiment come from the exact same design space, the former being manually designed, the latter being obtained through design space design. For fair comparisons, we compare REGNET and RESNE(X)T models under the same training setup (our standard REGNET training setup). We note that this results in improved RESNE(X)T baselines and highlights the importance of carefully controlling the training setup.

Comparisons are shown in Figure 17 and Table 3. Overall, we see that REGNETX models, by optimizing the network structure alone, provide considerable improvements under all complexity metrics. We emphasize that good REGNET models are available across a wide range of compute regimes, including in low-compute regimes where good RESNE(X)T models are not available.
Table 3a shows comparisons grouped by activations (which can strongly influence runtime on accelerators such as GPUs). This setting is of particular interest to the research community where model training time is a bottleneck and will likely have more real-world use cases in the future, especially as accelerators gain more use at inference time (e.g., in self-driving cars). REGNETX models are quite effective given a fixed inference or training time budget.
5.3. State-of-the-Art Comparison: Full Regime
We focus our comparison on EFFICIENTNET [29], which is representative of the state of the art and has reported impressive gains using a combination of NAS and an interesting model scaling rule across complexity regimes.

To enable direct comparisons, and to isolate gains due to improvements solely of the network architecture, we opt to reproduce the exact EFFICIENTNET models but using our standard training setup, with a 100 epoch schedule and no regularization except weight decay (the effect of a longer schedule and stronger regularization is shown in Table 7). We optimize only lr and wd, see Figure 22 in appendix. This is the same setup as REGNET and enables fair comparisons.

Results are shown in Figure 18 and Table 4. At low flops, EFFICIENTNET outperforms REGNETY. At intermediate flops, REGNETY outperforms EFFICIENTNET, and at higher flops both REGNETX and REGNETY perform better. We also observe that for EFFICIENTNET, activations scale linearly with flops (due to the scaling of both resolution and depth), compared to activations scaling with the square-root of flops for REGNETs. This leads to slow GPU training and inference times for EFFICIENTNET. E.g., REGNETX-8000 is 5× faster than EFFICIENTNET-B5, while having lower error.
6. Conclusion
In this work, we present a new network design paradigm. Our results suggest that designing network design spaces is a promising avenue for future research.
Table 5. RESNE(X)T comparisons on ImageNetV2.

Table 6. EFFICIENTNET comparisons on ImageNetV2.
Figure 19. Additional ablations. Left: Fixed depth networks (d = 20) are effective across flop regimes. Middle: Three stage networks perform poorly at high flops. Right: Inverted bottleneck (b < 1) is also ineffective at high flops. See text for more context.

Figure 20. Swish vs. ReLU. Left: RegNetY performs better with Swish than ReLU at 400MF but worse at 6.4GF. Middle: Results across wider flop regimes show similar trends. Right: If, however, g is restricted to be 1 (depthwise conv), Swish is much better.