
A ConvNet for the 2020s

Published: 2023/12/20

Authors: Zhuang Liu1,2*, Hanzi Mao1, Chao-Yuan Wu1, Christoph Feichtenhofer1, Trevor Darrell2, Saining Xie1†
Affiliations: 1 Facebook AI Research (FAIR), 2 UC Berkeley

*Work done during an internship at Facebook AI Research.
†Corresponding author.


Abstract

The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually “modernize” a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.


Note: "ConvNets" here refers to CNN-based networks.

1. Introduction

Looking back at the 2010s, the decade was marked by the monumental progress and impact of deep learning. The primary driver was the renaissance of neural networks, particularly convolutional neural networks (ConvNets). Through the decade, the field of visual recognition successfully shifted from engineering features to designing (ConvNet) architectures. Although the invention of back-propagation-trained ConvNets dates all the way back to the 1980s [42], it was not until late 2012 that we saw its true potential for visual feature learning. The introduction of AlexNet [40] precipitated the “ImageNet moment” [59], ushering in a new era of computer vision. The field has since evolved at a rapid speed. Representative ConvNets like VGGNet [64], Inceptions [68], ResNe(X)t [28, 87], DenseNet [36], MobileNet [34], EfficientNet [71] and RegNet [54] focused on different aspects of accuracy, efficiency and scalability, and popularized many useful design principles.


The full dominance of ConvNets in computer vision was not a coincidence: in many application scenarios, a “sliding window” strategy is intrinsic to visual processing, particularly when working with high-resolution images. ConvNets have several built-in inductive biases that make them well-suited to a wide variety of computer vision applications. The most important one is translation equivariance, which is a desirable property for tasks like object detection. ConvNets are also inherently efficient due to the fact that when used in a sliding-window manner, the computations are shared [62]. For many decades, this has been the default use of ConvNets, generally on limited object categories such as digits [43], faces [58, 76] and pedestrians [19, 63]. Entering the 2010s, the region-based detectors [23, 24, 27, 57] further elevated ConvNets to the position of being the fundamental building block in a visual recognition system.

Note: the shared computation mentioned above is what is commonly called weight sharing, the second characteristic of convolution.

Note on translation equivariance: the convolution operation is translation equivariant, meaning that it preserves translations (shifting the input shifts the output by the same amount). A CNN as a whole can additionally achieve translation invariance, which comes from appropriate (i.e. spatially aware) dimensionality reduction.
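The difference between equivariance and invariance can be checked on a toy 1D signal. The sketch below is only an illustration: the helper names `conv1d_valid` and `shift_right`, the signal, and the kernel are all made up for this example, and a global max stands in for the dimensionality-reduction step.

```python
def conv1d_valid(signal, kernel):
    """Sliding-window cross-correlation without padding, as in a conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift_right(signal, steps):
    """Translate a signal by `steps` positions, zero-filling on the left."""
    return [0] * steps + signal[: len(signal) - steps]


# Hypothetical toy data: a small "object" surrounded by zeros.
signal = [0, 0, 1, 3, 2, 0, 0, 0, 0]
kernel = [1, 0, -1]

# Equivariance: convolving then shifting equals shifting then convolving.
conv_then_shift = shift_right(conv1d_valid(signal, kernel), 2)
shift_then_conv = conv1d_valid(shift_right(signal, 2), kernel)
print(conv_then_shift == shift_then_conv)

# Invariance: a global max over the feature map ignores the shift entirely.
pooled_original = max(conv1d_valid(signal, kernel))
pooled_shifted = max(shift_then_conv)
print(pooled_original == pooled_shifted)
```

The equality only holds here because the object stays away from the borders; near the edges, zero-filling breaks exact equivariance.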

Around the same time, the odyssey of neural network design for natural language processing (NLP) took a very different path, as the Transformers replaced recurrent neural networks to become the dominant backbone architecture. Despite the disparity in the task of interest between language and vision domains, the two streams surprisingly converged in the year 2020, as the introduction of Vision Transformers (ViT) completely altered the landscape of network architecture design. Except for the initial “patchify” layer, which splits an image into a sequence of patches, ViT introduces no image-specific inductive bias and makes minimal changes to the original NLP Transformers. One primary focus of ViT is on the scaling behavior: with the help of larger model and dataset sizes, Transformers can outperform standard ResNets by a significant margin. Those results on image classification tasks are inspiring, but computer vision is not limited to image classification. As discussed previously, solutions to numerous computer vision tasks in the past decade depended significantly on a sliding-window, fully convolutional paradigm. Without the ConvNet inductive biases, a vanilla ViT model faces many challenges in being adopted as a generic vision backbone. The biggest challenge is ViT’s global attention design, which has a quadratic complexity with respect to the input size. This might be acceptable for ImageNet classification, but quickly becomes intractable with higher-resolution inputs.
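To make the quadratic growth concrete, here is a rough count of tokens and query-key pairs for a single global self-attention layer. It is a sketch under stated assumptions: square inputs, a patch size of 16, and hypothetical helper names; it counts pairs only and ignores channel width and heads.

```python
def num_tokens(image_size, patch_size=16):
    """Tokens produced by a non-overlapping patchify layer on a square image."""
    return (image_size // patch_size) ** 2


def attention_pairs(image_size, patch_size=16):
    """Query-key pairs scored by one global self-attention layer."""
    n = num_tokens(image_size, patch_size)
    return n * n


# Doubling the side length quadruples the tokens and multiplies the pairs by 16.
for size in (224, 448, 896):
    print(size, num_tokens(size), attention_pairs(size))
```

Going from a 224-pixel to an 896-pixel side multiplies the number of pairs by 256, which is why higher-resolution inputs quickly become impractical for global attention.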


Hierarchical Transformers employ a hybrid approach to bridge this gap. For example, the “sliding window” strategy (e.g. attention within local windows) was reintroduced to Transformers, allowing them to behave more similarly to ConvNets. Swin Transformer [45] is a milestone work in this direction, demonstrating for the first time that Transformers can be adopted as a generic vision backbone and achieve state-of-the-art performance across a range of computer vision tasks beyond image classification. Swin Transformer’s success and rapid adoption also revealed one thing: the essence of convolution is not becoming irrelevant; rather, it remains much desired and has never faded.


Under this perspective, many of the advancements of Transformers for computer vision have been aimed at bringing back convolutions. These attempts, however, come at a cost: a naive implementation of sliding window self-attention can be expensive [55]; with advanced approaches such as cyclic shifting [45], the speed can be optimized but the system becomes more sophisticated in design. On the other hand, it is almost ironic that a ConvNet already satisfies many of those desired properties, albeit in a straightforward, no-frills way. The only reason ConvNets appear to be losing steam is that (hierarchical) Transformers surpass them in many vision tasks, and the performance difference is usually attributed to the superior scaling behavior of Transformers, with multi-head self-attention being the key component.


Unlike ConvNets, which have progressively improved over the last decade, the adoption of Vision Transformers was a step change. In recent literature, system-level comparisons (e.g. a Swin Transformer vs. a ResNet) are usually adopted when comparing the two. ConvNets and hierarchical vision Transformers become different and similar at the same time: they are both equipped with similar inductive biases, but differ significantly in the training procedure and macro/micro-level architecture design. In this work, we investigate the architectural distinctions between ConvNets and Transformers and try to identify the confounding variables when comparing the network performance. Our research is intended to bridge the gap between the pre-ViT and post-ViT eras for ConvNets, as well as to test the limits of what a pure ConvNet can achieve.


To do this, we start with a standard ResNet (e.g. ResNet50) trained with an improved procedure. We gradually “modernize” the architecture to the construction of a hierarchical vision Transformer (e.g. Swin-T). Our exploration is directed by a key question: How do design decisions in Transformers impact ConvNets’ performance? We discover several key components that contribute to the performance difference along the way. As a result, we propose a family of pure ConvNets dubbed ConvNeXt. We evaluate ConvNeXts on a variety of vision tasks such as ImageNet classification [17], object detection/segmentation on COCO [44], and semantic segmentation on ADE20K [92]. Surprisingly, ConvNeXts, constructed entirely from standard ConvNet modules, compete favorably with Transformers in terms of accuracy, scalability and robustness across all major benchmarks. ConvNeXt maintains the efficiency of standard ConvNets, and the fully-convolutional nature for both training and testing makes it extremely simple to implement.


We hope the new observations and discussions can challenge some common beliefs and encourage people to rethink the importance of convolutions in computer vision.


2. Modernizing a ConvNet: a Roadmap

In this section, we provide a trajectory going from a ResNet to a ConvNet that bears a resemblance to Transformers. We consider two model sizes in terms of FLOPs: one is the ResNet-50 / Swin-T regime, with FLOPs around 4.5 × 10^9, and the other is the ResNet-200 / Swin-B regime, with FLOPs around 15.0 × 10^9. For simplicity, we will present the results with the ResNet-50 / Swin-T complexity models. The conclusions for higher capacity models are consistent and results can be found in Appendix C.


At a high level, our explorations are directed to investigate and follow different levels of designs from a Swin Transformer while maintaining the network’s simplicity as a standard ConvNet. The roadmap of our exploration is as follows. Our starting point is a ResNet-50 model. We first train it with similar training techniques used to train vision Transformers and obtain much improved results compared to the original ResNet-50. This will be our baseline. We then study a series of design decisions which we summarized as 1) macro design, 2) ResNeXt, 3) inverted bottleneck, 4) large kernel size, and 5) various layer-wise micro designs. In Figure 2, we show the procedure and the results we are able to achieve with each step of the “network modernization”. Since network complexity is closely correlated with the final performance, the FLOPs are roughly controlled over the course of the exploration, though at intermediate steps the FLOPs might be higher or lower than the reference models. All models are trained and evaluated on ImageNet-1K.


Figure 2. We modernize a standard ConvNet (ResNet) towards the design of a hierarchical vision Transformer (Swin), without introducing any attention-based modules. The foreground bars are model accuracies in the ResNet-50/Swin-T FLOP regime; results for the ResNet-200/Swin-B regime are shown with the gray bars. A hatched bar means the modification is not adopted. Detailed results for both regimes are in the appendix. Many Transformer architectural choices can be incorporated in a ConvNet, and they lead to increasingly better performance. In the end, our pure ConvNet model, named ConvNeXt, can outperform the Swin Transformer.

    2.1 Training Techniques

    Apart from the design of the network architecture, the training procedure also affects the ultimate performance. Not only did vision Transformers bring a new set of modules and architectural design decisions, but they also introduced different training techniques (e.g. AdamW optimizer) to vision. This pertains mostly to the optimization strategy and associated hyper-parameter settings. Thus, the first step of our exploration is to train a baseline model with the vision Transformer training procedure, in this case, ResNet50/200. Recent studies [7, 81] demonstrate that a set of modern training techniques can significantly enhance the performance of a simple ResNet-50 model. In our study, we use a training recipe that is close to DeiT’s [73] and Swin Transformer’s [45]. The training is extended to 300 epochs from the original 90 epochs for ResNets. We use the AdamW optimizer [46], data augmentation techniques such as Mixup [90], Cutmix [89], RandAugment [14], Random Erasing [91], and regularization schemes including Stochastic Depth [36] and Label Smoothing [69]. The complete set of hyper-parameters we use can be found in Appendix A.1. By itself, this enhanced training recipe increased the performance of the ResNet-50 model from 76.1% [1] to 78.8% (+2.7%), implying that a significant portion of the performance difference between traditional ConvNets and vision Transformers may be due to the training techniques. We will use this fixed training recipe with the same hyperparameters throughout the “modernization” process. Each reported accuracy on the ResNet-50 regime is an average obtained from training with three different random seeds.
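Two of the listed techniques, Label Smoothing and Mixup, only change how training targets are built, so they are easy to sketch in isolation. The example below is an illustration, not the recipe itself: the values `EPSILON = 0.1` and `LAM = 0.7` are placeholders chosen for this example (the actual hyper-parameters are in Appendix A.1, which is not reproduced here), and in practice the Mixup coefficient is sampled randomly per batch rather than fixed.

```python
def smooth_labels(one_hot, epsilon):
    """Label smoothing: move `epsilon` of the probability mass to a uniform prior."""
    k = len(one_hot)
    return [(1.0 - epsilon) * p + epsilon / k for p in one_hot]


def mixup(x_a, y_a, x_b, y_b, lam):
    """Mixup: the same convex combination applied to two inputs and their labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


EPSILON = 0.1  # placeholder value, not taken from the text
LAM = 0.7      # placeholder value; normally drawn at random for each batch

cat = [1.0, 0.0, 0.0, 0.0]  # one-hot target for class 0
dog = [0.0, 1.0, 0.0, 0.0]  # one-hot target for class 1

smoothed = smooth_labels(cat, EPSILON)
mixed_x, mixed_y = mixup([0.2, 0.8], cat, [0.6, 0.4], dog, LAM)
print(smoothed)
print(mixed_x, mixed_y)
```

Both transformations keep the target a valid probability distribution, which is why they can be combined freely with the other augmentations in the recipe.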


    2.2 Macro Design

    We now analyze Swin Transformers’ macro network design. Swin Transformers follow ConvNets [28, 65] to use a multi-stage design, where each stage has a different feature map resolution. There are two interesting design considerations: the stage compute ratio, and the “stem cell” structure.


    Changing stage compute ratio. The original design of the computation distribution across stages in ResNet was largely empirical. The heavy “res4” stage was meant to be compatible with downstream tasks like object detection, where a detector head operates on the 14×14 feature plane. Swin-T, on the other hand, followed the same principle but with a slightly different stage compute ratio of 1:1:3:1. For larger Swin Transformers, the ratio is 1:1:9:1. Following the design, we adjust the number of blocks in each stage from (3, 4, 6, 3) in ResNet-50 to (3, 3, 9, 3), which also aligns the FLOPs with Swin-T. This improves the model accuracy from 78.8% to 79.4%. Notably, researchers have thoroughly investigated the distribution of computation [53, 54], and a more optimal design is likely to exist.
    From now on, we will use this stage compute ratio.
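The block counts quoted above can be reduced to a ratio with a few lines of arithmetic. This is a small sketch with hypothetical names; it uses the number of blocks per stage as a stand-in for per-stage compute, which is only an approximation because block cost also depends on resolution and width.

```python
from functools import reduce
from math import gcd


def stage_ratio(blocks):
    """Reduce per-stage block counts to the smallest integer ratio."""
    divisor = reduce(gcd, blocks)
    return tuple(b // divisor for b in blocks)


def third_stage_share(blocks):
    """Fraction of all blocks that sit in the third stage."""
    return blocks[2] / sum(blocks)


RESNET50_BLOCKS = (3, 4, 6, 3)    # from the text
MODERNIZED_BLOCKS = (3, 3, 9, 3)  # from the text

print(stage_ratio(RESNET50_BLOCKS), third_stage_share(RESNET50_BLOCKS))
print(stage_ratio(MODERNIZED_BLOCKS), third_stage_share(MODERNIZED_BLOCKS))
```

The adjusted counts reduce to 1:1:3:1, matching the Swin-T ratio, and the third stage grows from 6 of 16 blocks to 9 of 18.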


    Changing stem to “Patchify”. Typically, the stem cell design is concerned with how the input images will be processed at the network’s beginning. Due to the redundancy inherent in natural images, a common stem cell will aggressively downsample the input images to an appropriate feature map size in both standard ConvNets and vision Transformers. The stem cell in standard ResNet contains a 7×7 convolution layer with stride 2, followed by a max pool, which results in a 4× downsampling of the input images. In vision Transformers, a more aggressive “patchify” strategy is used as the stem cell, which corresponds to a large kernel size (e.g. kernel size = 14 or 16) and non-overlapping convolution. Swin Transformer uses a similar “patchify” layer, but with a smaller patch size of 4 to accommodate the architecture’s multi-stage design. We replace the ResNet-style stem cell with a patchify layer implemented using a 4×4, stride 4 convolutional layer. The accuracy has changed from 79.4% to 79.5%. This suggests that the stem cell in a ResNet may be substituted with a simpler “patchify” layer à la ViT which will result in similar performance.
    We will use the “patchify stem” (4×4 non-overlapping convolution) in the network.
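Both stems reach the same 4× downsampling, which can be verified with the usual output-size formula for convolution and pooling layers. The sketch below makes some assumptions that the text does not state: a padding of 3 for the 7×7 convolution, and a 3×3 max pool with stride 2 and padding 1 (the common ResNet configuration). Function names are illustrative.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_stem(size):
    # 7x7 conv with stride 2, then a max pool with stride 2.
    # The paddings and the 3x3 pool window are assumptions, not from the text.
    size = conv_out(size, kernel=7, stride=2, padding=3)
    return conv_out(size, kernel=3, stride=2, padding=1)


def patchify_stem(size, patch=4):
    # Non-overlapping convolution: the kernel size equals the stride.
    return conv_out(size, kernel=patch, stride=patch)


print(resnet_stem(224))              # ResNet-style stem
print(patchify_stem(224))            # 4x4, stride-4 patchify stem
print(patchify_stem(224, patch=16))  # ViT-style patchify with patch size 16
```

For a 224×224 input, both the ResNet-style stem and the 4×4 patchify stem produce a 56×56 feature map, while a patch size of 16 gives 14×14.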


    Note: a non-overlapping convolution is one whose kernel size is no larger than its stride (kernel size ≤ stride).

    2.3. ResNeXt-ify

    In this part, we attempt to adopt the idea of ResNeXt [87], which has a better FLOPs/accuracy trade-off than a vanilla ResNet. The core component is grouped convolution, where the convolutional filters are separated into different groups. At a high level, ResNeXt’s guiding principle is to “use more groups, expand width”. More precisely, ResNeXt employs grouped convolution for the 3×3 conv layer in a bottleneck block. As this significantly reduces the FLOPs, the network width is expanded to compensate for the capacity loss.


    In our case we use depthwise convolution, a special case of grouped convolution where the number of groups equals the number of channels. Depthwise conv has been popularized by MobileNet [34] and Xception [11]. We note that depthwise convolution is similar to the weighted sum operation in self-attention, which operates on a per-channel basis, i.e., only mixing information in the spatial dimension. The combination of depthwise conv and 1 × 1 convs leads to a separation of spatial and channel mixing, a property shared by vision Transformers, where each operation either mixes information across spatial or channel dimension, but not both. The use of depthwise convolution effectively reduces the network FLOPs and, as expected, the accuracy. Following the strategy proposed in ResNeXt, we increase the network width to the same number of channels as Swin-T’s (from 64 to 96). This brings the network performance to 80.5% with increased FLOPs (5.3G). We will now employ the ResNeXt design.
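The effect of grouping on model size follows directly from the weight-count formula for a convolution: each output channel only sees `c_in / groups` input channels. The sketch below uses a width of 96 channels as in the text; the group count of 32 is a hypothetical value included only to show an intermediate case, and `conv_weights` is an illustrative helper.

```python
def conv_weights(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution, ignoring the bias."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


C = 96  # network width after the widening step described above

dense_3x3 = conv_weights(C, C, kernel=3)                # mixes space and channels
grouped_3x3 = conv_weights(C, C, kernel=3, groups=32)   # hypothetical group count
depthwise_3x3 = conv_weights(C, C, kernel=3, groups=C)  # spatial mixing only
pointwise_1x1 = conv_weights(C, C, kernel=1)            # channel mixing only

print(dense_3x3, grouped_3x3, depthwise_3x3, pointwise_1x1)
```

A depthwise 3×3 followed by a 1×1 convolution needs 864 + 9216 weights, compared with 82944 for a dense 3×3, which is the saving that the extra width then spends.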


    2.4. Inverted Bottleneck

    One important design in every Transformer block is that it creates an inverted bottleneck, i.e., the hidden dimension of the MLP block is four times wider than the input dimension (see Figure 4). Interestingly, this Transformer design is connected to the inverted bottleneck design with an expansion ratio of 4 used in ConvNets. The idea was popularized by MobileNetV2 [61], and has subsequently gained traction in several advanced ConvNet architectures [70, 71].

    Note: the architectures cited as [70, 71] above are MnasNet and EfficientNet.

    Figure 3. Block modifications and resulting specifications. (a) is a ResNeXt block; in (b) we create an inverted bottleneck block; and in (c) the position of the spatial depthwise conv layer is moved up.

    Here we explore the inverted bottleneck design. Figure 3 (a) to (b) illustrate the configurations. Despite the increased FLOPs for the depthwise convolution layer, this change reduces the whole network FLOPs to 4.6G, due to the significant FLOPs reduction in the downsampling residual blocks’ shortcut 1×1 conv layer. Interestingly, this results in slightly improved performance (80.5% to 80.6%). In the ResNet-200 / Swin-B regime, this step brings even more gain (81.9% to 82.6%) also with reduced FLOPs.
    We will now use inverted bottlenecks.
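The saving in the shortcut 1×1 convolution can be estimated with a per-position count of multiply-accumulates. This is a rough sketch under assumptions that are not spelled out in the text: a first-stage width of 96, a regular bottleneck whose input and output use the wide dimension (4 × 96), and channel counts that double from one stage to the next. The names are illustrative.

```python
def pointwise_macs(c_in, c_out):
    """Multiply-accumulates per output position of a 1x1 convolution."""
    return c_in * c_out


WIDTH = 96     # first-stage width (from the text)
EXPANSION = 4  # hidden dimension is four times the narrow dimension

# Regular bottleneck: the residual path carries the wide dimension,
# so the downsampling shortcut maps 4*WIDTH to 8*WIDTH channels (assumed).
bottleneck_shortcut = pointwise_macs(EXPANSION * WIDTH, 2 * EXPANSION * WIDTH)

# Inverted bottleneck: the residual path carries the narrow dimension,
# so the shortcut only maps WIDTH to 2*WIDTH channels (assumed).
inverted_shortcut = pointwise_macs(WIDTH, 2 * WIDTH)

print(bottleneck_shortcut, inverted_shortcut)
print(bottleneck_shortcut // inverted_shortcut)
```

Under these assumptions the shortcut projection becomes 16 times cheaper, consistent with the drop in total FLOPs even though the depthwise layer itself now runs on more channels.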


    2.5. Large Kernel Sizes

    In this part of the exploration, we focus on the behavior of large convolutional kernels. One of the most distinguishing aspects of vision Transformers is their non-local self-attention, which enables each layer to have a global receptive field. While large kernel sizes have been used in the past with ConvNets [40, 68], the gold standard (popularized by VGGNet [65]) is to stack small kernel-sized (3×3) conv layers, which have efficient hardware implementations on modern GPUs [41]. Although Swin Transformers reintroduced the local window to the self-attention block, the window size is at least 7×7, significantly larger than the ResNe(X)t kernel size of 3×3. Here we revisit the use of large kernel-sized convolutions for ConvNets.

    Note: the large-kernel ConvNets cited as [40, 68] above are AlexNet and Inception v1.

    2.5.1 Moving up depthwise conv layer

    To explore large kernels, one prerequisite is to move up the position of the depthwise conv layer (Figure 3 (b) to (c)). That is a design decision also evident in Transformers: the MSA block is placed prior to the MLP layers. As we have an inverted bottleneck block, this is a natural design choice: the complex/inefficient modules (MSA, large-kernel conv) will have fewer channels, while the efficient, dense 1×1 layers will do the heavy lifting. This intermediate step reduces the FLOPs to 4.1G, resulting in a temporary performance degradation to 79.9%.


    2.5.2 Increasing the kernel size

    With all of these preparations, the benefit of adopting larger kernel-sized convolutions is significant. We experimented with several kernel sizes, including 3, 5, 7, 9, and 11. The network’s performance increases from 79.9% (3×3) to 80.6% (7×7), while the network’s FLOPs stay roughly the same. Additionally, we observe that the benefit of larger kernel sizes reaches a saturation point at 7×7. We verified this behavior in the large capacity model too: a ResNet-200 regime model does not exhibit further gain when we increase the kernel size beyond 7×7.
    We will use 7×7 depthwise conv in each block.
    At this point, we have concluded our examination of network architectures on a macro scale. Intriguingly, a significant portion of the design choices taken in a vision Transformer may be mapped to ConvNet instantiations.
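A per-position count shows why a larger depthwise kernel is cheap once the depthwise layer sits on the narrow dimension. The sketch below assumes a block made of one depthwise convolution and two 1×1 convolutions with an expansion ratio of 4, at a width of 96; `block_macs` is a hypothetical helper, and normalization and activation costs are ignored.

```python
def block_macs(channels, kernel, expansion=4, depthwise_on_hidden=False):
    """Per-position multiply-accumulates: one depthwise conv plus two 1x1 convs."""
    hidden = expansion * channels
    dw_channels = hidden if depthwise_on_hidden else channels
    depthwise = kernel * kernel * dw_channels
    pointwise = 2 * channels * hidden
    return depthwise + pointwise


C = 96

# Depthwise conv moved up, so it runs on the narrow dimension.
moved_up_3 = block_macs(C, kernel=3)
moved_up_7 = block_macs(C, kernel=7)

# Depthwise conv left between the 1x1 layers, on the 4x wider hidden dimension.
hidden_7 = block_macs(C, kernel=7, depthwise_on_hidden=True)

print(moved_up_3, moved_up_7, hidden_7)
print(round(moved_up_7 / moved_up_3, 3))
```

In this count, going from 3×3 to 7×7 on the narrow dimension adds about 5% per block, while the same 7×7 kernel on the hidden dimension would cost noticeably more, which is in line with moving the layer up first.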


    2.6. Micro Design

    In this section, we investigate several other architectural differences at a micro scale — most of the explorations here are done at the layer level, focusing on specific choices of activation functions and normalization layers.


    2.6.1 Replacing ReLU with GELU

    One discrepancy between NLP and vision architectures is the specifics of which activation functions to use. Numerous activation functions have been developed over time, but the Rectified Linear Unit (ReLU) [49] is still extensively used in ConvNets due to its simplicity and efficiency. ReLU is also used as an activation function in the original Transformer paper [77]. The Gaussian Error Linear Unit, or GELU [32], which can be thought of as a smoother variant of ReLU, is utilized in the most advanced Transformers, including Google’s BERT [18] and OpenAI’s GPT-2 [52], and, most recently, ViTs. We find that ReLU can be substituted with GELU in our ConvNet too, although the accuracy stays unchanged (80.6%).
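For reference, the two activations can be compared directly. The sketch below uses the exact form of GELU, x · Φ(x) with Φ the standard normal CDF, written with `math.erf`; many frameworks also offer a tanh-based approximation, which is not shown here.

```python
import math


def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, relu(x), round(gelu(x), 4))
```

Unlike ReLU, GELU lets small negative inputs through with a small negative output and is smooth around zero; for large positive inputs the two are nearly identical.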


    2.6.2 Fewer activation functions

    One minor distinction between a Transformer and a ResNet block is that Transformers have fewer activation functions. Consider a Transformer block with key/query/value linear embedding layers, the projection layer, and two linear layers in an MLP block. There is only one activation function present in the MLP block. In comparison, it is common practice to append an activation function to each convolutional layer, including the 1 × 1 convs. Here we examine how performance changes when we stick to the same strategy. As depicted in Figure 4, we eliminate all GELU layers from the residual block except for one between two 1 × 1 layers, replicating the style of a Transformer block. This process improves the result by 0.7% to 81.3%, practically matching the performance of Swin-T.
    We will now use a single GELU activation in each block.


    Figure 4. Block designs for a ResNet, a Swin Transformer, and a ConvNeXt. Swin Transformer’s block is more sophisticated due to the presence of multiple specialized modules and two residual connections. For simplicity, we note the linear layers in Transformer MLP blocks also as “1×1 convs” since they are equivalent.

    2.6.3 Fewer normalization layers

    Transformer blocks usually have fewer normalization layers as well. Here we remove two BatchNorm (BN) layers, leaving only one BN layer before the conv 1 × 1 layers. This further boosts the performance to 81.4%, already surpassing Swin-T’s result. Note that we have even fewer normalization layers per block than Transformers, as empirically we find that adding one additional BN layer at the beginning of the block does not improve the performance.


    2.6.4 Substituting BN with LN —— 使用LN替換BN

    BatchNorm [38] is an essential component in ConvNets as it improves the convergence and reduces overfitting. However, BN also has many intricacies that can have a detrimental effect on the model’s performance [84]. There have been numerous attempts at developing alternative normalization [60, 75, 83] techniques, but BN has remained the preferred option in most vision tasks. On the other hand, the simpler Layer Normalization [5] (LN) has been used in Transformers, resulting in good performance across different application scenarios.

    BatchNorm是ConvNets中的一個重要組成部分,因為它可以提高收斂性并減少過擬合。然而,BN也有許多錯綜復雜的問題,會對模型的性能產生不利的影響[84]。已經有很多人嘗試開發替代的歸一化技術[60, 75, 83],但在大多數視覺任務中,BN仍然是首選。另一方面,更簡單的層歸一化(LN)已被用于Transformer,在不同的應用場景中產生了良好的性能。

    Directly substituting LN for BN in the original ResNet will result in suboptimal performance [83]. With all the modifications in network architecture and training techniques, here we revisit the impact of using LN in place of BN. We observe that our ConvNet model does not have any difficulties training with LN; in fact, the performance is slightly better, obtaining an accuracy of 81.5%.
    From now on, we will use one LayerNorm as our choice of normalization in each residual block.

    在原ResNet中直接用LN代替BN會導致次優的性能[83]隨著網絡結構和訓練技術的所有修改,這里我們重新審視了使用LN來代替BN的影響。我們觀察到,我們的ConvNet模型在使用LN訓練時沒有任何困難;事實上,性能略好,獲得了81.5%的準確性。

    從現在開始,我們將使用一個LayerNorm作為我們在每個殘差塊中的標準化選擇。
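
To make the difference between the two normalizations concrete, the sketch below applies both to a tiny batch of per-pixel feature vectors. It is a simplified illustration, not the authors' implementation: it omits the learnable scale and shift, ignores spatial dimensions and BN's running statistics, and assumes LN normalizes each sample over its channels.

```python
from statistics import fmean, pvariance


def normalize(values, eps=1e-6):
    """Shift to zero mean and scale to unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def layer_norm(batch):
    """batch[n][c]: normalize every sample over its own channels."""
    return [normalize(sample) for sample in batch]


def batch_norm(batch):
    """batch[n][c]: normalize every channel over the batch (training-mode statistics)."""
    columns = [normalize(list(column)) for column in zip(*batch)]
    return [list(row) for row in zip(*columns)]


# Two samples with three channels each; the second is a scaled copy of the first.
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
ln_out = layer_norm(batch)   # statistics depend only on the sample itself
bn_out = batch_norm(batch)   # statistics depend on the other samples in the batch
```

LN gives the two samples nearly identical outputs because each is normalized independently, whereas the BN output for a sample changes whenever the rest of the batch changes.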

    2.6.5 Separate downsampling layers —— 獨立的下采樣層

    In ResNet, the spatial downsampling is achieved by the residual block at the start of each stage, using 3×3 conv with stride 2 (and 1×1 conv with stride 2 at the shortcut connection). In Swin Transformers, a separate downsampling layer is added between stages. We explore a similar strategy in which we use 2×2 conv layers with stride 2 for spatial downsampling. This modification surprisingly leads to diverged training. Further investigation shows that adding normalization layers wherever spatial resolution is changed can help stabilize training. These include several LN layers also used in Swin Transformers: one before each downsampling layer, one after the stem, and one after the final global average pooling. With these changes, the accuracy improves to 82.0%, significantly exceeding Swin-T’s 81.3%.

    在ResNet中,空間下采樣是由每個階段開始時的residual block實現的,使用3×3 conv with stride 2(在捷徑連接處使用1×1 conv with stride 2)。在Swin Transformers中,在各階段之間增加了一個單獨的下采樣層。我們探索了一種類似的策略,即使用步長為2的2×2 conv層進行空間下采樣。這種修改出人意料地導致了訓練發散。進一步的調查顯示,在空間分辨率改變的地方添加歸一化層,有助于穩定訓練。這些包括同樣用于Swin Transformers的幾個LN層:一個在每個下采樣層之前,一個在stem之后,一個在最后的全局平均池化之后。我們可以將精度提高到82.0%,大大超過Swin-T的81.3%。
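
The effect of the separate 2×2, stride-2 downsampling layers on feature-map size can be worked out with the standard convolution output formula. The sketch below assumes a 4×4, stride-4 stem, which is not stated in this section, and four stages; the function names are illustrative.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_resolutions(input_size=224, stem=(4, 4), downsample=(2, 2), num_stages=4):
    """Feature-map side length at each stage.

    Assumes a (kernel, stride) = (4, 4) stem followed by a (2, 2)
    downsampling layer between consecutive stages.
    """
    size = conv_output_size(input_size, *stem)
    sizes = [size]
    for _ in range(num_stages - 1):
        size = conv_output_size(size, *downsample)
        sizes.append(size)
    return sizes
```

With a 224×224 input this gives side lengths of 56, 28, 14, and 7. Because kernel size equals stride, each downsampling layer reads non-overlapping 2×2 patches.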

    We will use separate downsampling layers. This brings us to our final model, which we have dubbed ConvNeXt. A comparison of ResNet, Swin, and ConvNeXt block structures can be found in Figure 4. A comparison of ResNet-50, Swin-T and ConvNeXt-T’s detailed architecture specifications can be found in Table 9.

    我們將使用單獨的下采樣層。這給我們帶來了最終的模型,我們將其稱為ConvNeXt。圖4是ResNet、Swin和ConvNeXt塊結構的比較。ResNet-50、Swin-T和ConvNeXt-T的詳細架構規格的比較可以在表9中找到。

    2.6.6 Closing remarks —— 閉幕詞(總結)

    We have finished our first “playthrough” and discovered ConvNeXt, a pure ConvNet, that can outperform the Swin Transformer for ImageNet-1K classification in this compute regime. It is worth noting that all design choices discussed so far are adapted from vision Transformers. In addition, these designs are not novel even in the ConvNet literature — they have all been researched separately, but not collectively, over the last decade. Our ConvNeXt model has approximately the same FLOPs, #params., throughput, and memory use as the Swin Transformer, but does not require specialized modules such as shifted window attention or relative position biases.

    我們已經完成了我們的第一周目,發現ConvNeXt,一個純粹的ConvNet,在這個計算量級下可以在ImageNet-1K分類上超過Swin Transformer。值得注意的是,到目前為止討論的所有設計選擇都是從視覺Transformer中改編而來。此外,這些設計即使在ConvNet文獻中也并不新穎:它們在過去十年中都被單獨研究過,但沒有被集體研究過。我們的ConvNeXt模型的FLOPs、#params.、吞吐量和內存使用量與Swin Transformer大致相同,但不需要專門的模塊,如移位窗口注意力或相對位置偏置。

    These findings are encouraging but not yet completely convincing — our exploration thus far has been limited to a small scale, but vision Transformers’ scaling behavior is what truly distinguishes them. Additionally, the question of whether a ConvNet can compete with Swin Transformers on downstream tasks such as object detection and semantic segmentation is a central concern for computer vision practitioners. In the next section, we will scale up our ConvNeXt models both in terms of data and model size, and evaluate them on a diverse set of visual recognition tasks.

    這些發現令人鼓舞,但還不能完全令人信服——迄今為止,我們的探索僅限于小規模,但視覺Transformer的擴展行為才是它們真正的區別所在。此外,ConvNet能否在下游任務(如物體檢測和語義分割)上與Swin Transformers競爭的問題是計算機視覺從業者的核心關注點。在下一節中,我們將在數據和模型大小方面擴大我們的ConvNeXt模型,并在一組不同的視覺識別任務上對它們進行評估。

    3. Empirical Evaluations on ImageNet —— 在ImageNet上的經驗評估

    We construct different ConvNeXt variants, ConvNeXt-T/S/B/L, to be of similar complexities to Swin-T/S/B/L [45]. ConvNeXt-T and ConvNeXt-B are the end products of the “modernizing” procedure in the ResNet-50 and ResNet-200 regimes, respectively. In addition, we build a larger ConvNeXt-XL to further test the scalability of ConvNeXt. The variants differ only in the number of channels C and the number of blocks B in each stage. Following both ResNets and Swin Transformers, the number of channels doubles at each new stage. We summarize the configurations below:

    我們構建了不同的ConvNeXt變體,ConvNeXt-T/S/B/L,其復雜程度與Swin-T/S/B/L[45]相似。ConvNeXt-T/B分別是在ResNet-50/200量級上進行“現代化”流程的最終產物。此外,我們建立了一個更大的ConvNeXt-XL來進一步測試ConvNeXt的可擴展性。這些變體只在通道數C和每個階段的塊數B上有所不同。按照ResNets和Swin Transformers,通道的數量在每個新階段都會增加一倍。我們把這些配置總結如下:

    • ConvNeXt-T: C = (96, 192, 384, 768), B = (3, 3, 9, 3)
    • ConvNeXt-S: C = (96, 192, 384, 768), B = (3, 3, 27, 3)
    • ConvNeXt-B: C = (128, 256, 512, 1024), B = (3, 3, 27, 3)
    • ConvNeXt-L: C = (192, 384, 768, 1536), B = (3, 3, 27, 3)
    • ConvNeXt-XL: C = (256, 512, 1024, 2048), B = (3, 3, 27, 3)
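
The configurations listed above can be written down as data and used for a rough parameter count. The sketch below is an estimate under stated assumptions, not the authors' code: it assumes a 7×7 depthwise conv, a 4× expansion between the two 1×1 layers, a 4×4 stride-4 stem, one Layer Scale vector per block, biases on every conv and linear layer, and a 1000-class head. Only the C and B tuples come from the text.

```python
CONFIGS = {
    "T": {"C": (96, 192, 384, 768), "B": (3, 3, 9, 3)},
    "S": {"C": (96, 192, 384, 768), "B": (3, 3, 27, 3)},
    "B": {"C": (128, 256, 512, 1024), "B": (3, 3, 27, 3)},
    "L": {"C": (192, 384, 768, 1536), "B": (3, 3, 27, 3)},
    "XL": {"C": (256, 512, 1024, 2048), "B": (3, 3, 27, 3)},
}


def block_params(c, expansion=4, kernel=7):
    """Parameters of one block with c channels (assumed layout)."""
    depthwise = c * kernel * kernel + c          # depthwise conv weights + bias
    norm = 2 * c                                 # LayerNorm scale + shift
    expand = c * expansion * c + expansion * c   # 1x1 conv, C -> 4C
    project = expansion * c * c + c              # 1x1 conv, 4C -> C
    layer_scale = c                              # Layer Scale vector
    return depthwise + norm + expand + project + layer_scale


def count_params(channels, blocks, num_classes=1000, in_channels=3):
    """Total parameter count under the assumptions listed in the lead-in."""
    first = channels[0]
    total = in_channels * 4 * 4 * first + first + 2 * first   # stem conv + LN
    for c_in, c_out in zip(channels, channels[1:]):
        total += 2 * c_in + c_in * c_out * 2 * 2 + c_out      # LN + 2x2 stride-2 conv
    for c, b in zip(channels, blocks):
        total += b * block_params(c)
    total += 2 * channels[-1]                                 # LN after global pooling
    total += channels[-1] * num_classes + num_classes         # classifier
    return total


param_counts = {name: count_params(cfg["C"], cfg["B"]) for name, cfg in CONFIGS.items()}
```

Under these assumptions, ConvNeXt-T comes to 28,589,128 parameters (about 28.6M), and the counts grow monotonically from T to XL.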

    3.1. Settings

    The ImageNet-1K dataset consists of 1000 object classes with 1.2M training images. We report ImageNet-1K top-1 accuracy on the validation set. We also conduct pre-training on ImageNet-22K, a larger dataset of 21841 classes (a superset of the 1000 ImageNet-1K classes) with ~14M images for pre-training, and then fine-tune the pre-trained model on ImageNet-1K for evaluation. We summarize our training setups below. More details can be found in Appendix A.

    ImageNet-1K數據集由1000個物體類別和120萬張訓練圖像組成。我們報告了ImageNet-1K在驗證集上的最高準確性。我們還在ImageNet-22K上進行了預訓練,這是一個由21841個類組成的更大的數據集(1000個ImageNet-1K類的超集),有1400萬張圖像用于預訓練,然后在ImageNet-1K上對預訓練模型進行微調以進行評估。我們在下面總結了我們的訓練設置。更多的細節可以在附錄A中找到。

    Training on ImageNet-1K

    We train ConvNeXts for 300 epochs using AdamW [46] with a learning rate of 4e-3. There is a 20-epoch linear warmup and a cosine decaying schedule afterward. We use a batch size of 4096 and a weight decay of 0.05. For data augmentations, we adopt common schemes including Mixup [90], Cutmix [89], RandAugment [14], and Random Erasing [91]. We regularize the networks with Stochastic Depth [37] and Label Smoothing [69]. Layer Scale [74] of initial value 1e-6 is applied. We use Exponential Moving Average (EMA) [51] as we find it alleviates larger models’ overfitting.

    我們使用AdamW對ConvNeXts進行了300個epochs的訓練,學習率為$4\times 10^{-3}$。有一個20個epoch的線性預熱,之后是余弦衰減調度。我們使用了4096的批次大小和0.05的權重衰減。對于數據增強,我們采用常見的方案,包括Mixup、Cutmix、RandAugment和Random Erasing。我們用隨機深度和標簽平滑對網絡進行正則化。采用了初始值為$1\times 10^{-6}$的Layer Scale。我們使用指數移動平均法(EMA),因為我們發現它可以減輕較大的模型的過擬合。
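
The schedule described above (20-epoch linear warmup, then cosine decay over the remaining epochs) can be sketched as a function of the epoch. The peak rate and epoch counts come from the text; starting the warmup at zero and decaying to a minimum rate of zero are assumptions, since this section does not specify either value.

```python
import math


def learning_rate(epoch, base_lr=4e-3, warmup_epochs=20, total_epochs=300, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch.

    Assumes the warmup starts at 0 and the cosine phase ends at min_lr.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, `learning_rate(10)` is half the peak rate during warmup, `learning_rate(20)` is the peak of 4e-3, and `learning_rate(160)` is back to half the peak, at the midpoint of the cosine phase.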

    Pre-training on ImageNet-22K

    We pre-train ConvNeXts on ImageNet-22K for 90 epochs with a warmup of 5 epochs. We do not use EMA. Other settings follow ImageNet-1K.

    我們在ImageNet-22K上對ConvNeXts進行了90個epochs的預訓練,并進行了5個epochs的預熱。我們不使用EMA。其他設置遵循ImageNet-1K。

    Fine-tuning on ImageNet-1K

    We fine-tune ImageNet-22K pre-trained models on ImageNet-1K for 30 epochs. We use AdamW, a learning rate of 5e-5, cosine learning rate schedule, layer-wise learning rate decay [6, 12], no warmup, a batch size of 512, and weight decay of 1e-8. The default pre-training, fine-tuning, and testing resolution is $224^2$. Additionally, we fine-tune at a larger resolution of $384^2$, for both ImageNet-22K and ImageNet-1K pre-trained models.

    我們在ImageNet-1K上對ImageNet-22K的預訓練模型進行了30個epochs的微調。我們使用AdamW,學習率為$5\times 10^{-5}$,余弦學習率計劃,層級學習率衰減[6, 12],無預熱,批次大小為512,權重衰減為$1\times 10^{-8}$。默認的預訓練、微調和測試分辨率為$224^2$。此外,我們對ImageNet-22K和ImageNet-1K的預訓練模型在更大的分辨率下進行微調,即$384^2$。

    Compared with ViTs/Swin Transformers, ConvNeXts are simpler to fine-tune at different resolutions, as the network is fully-convolutional and there is no need to adjust the input patch size or interpolate absolute/relative position biases.

    與ViTs/Swin Transformers相比,ConvNeXts在不同分辨率下的微調更簡單,因為網絡是完全卷積的,不需要調整輸入補丁大小或插值絕對/相對位置偏差

    3.2. Results

    ImageNet-1K

    Table 1 (upper) shows the result comparison with two recent Transformer variants, DeiT [73] and Swin Transformers [45], as well as two ConvNets from architecture search - RegNets [54], EfficientNets [71] and EfficientNetsV2 [72]. ConvNeXt competes favorably with two strong ConvNet baselines (RegNet [54] and EfficientNet [71]) in terms of the accuracy-computation trade-off, as well as the inference throughputs. ConvNeXt also outperforms Swin Transformer of similar complexities across the board, sometimes with a substantial margin (e.g. 0.8% for ConvNeXt-T). Without specialized modules such as shifted windows or relative position bias, ConvNeXts also enjoy improved throughput compared to Swin Transformers.

    表1(上)顯示了與最近的兩個Transformer變體DeiT和Swin Transformers,以及兩個來自架構搜索的ConvNets–RegNets、EfficientNets和EfficientNetsV2的結果比較。

    Table 1. Classification accuracy on ImageNet-1K. Similar to Transformers, ConvNeXt also shows promising scaling behavior with higher-capacity models and a larger (pre-training) dataset. Inference throughput is measured on a V100 GPU, following [45]. On an A100 GPU, ConvNeXt can have a much higher throughput than Swin Transformer. See Appendix E. (?)ViT results with 90-epoch AugReg [67] training, provided through personal communication with the authors.
    表1. ImageNet-1K的分類精度。與Transformers類似,ConvNeXt也顯示了在更高容量的模型和更大的(預訓練)數據集下有希望的擴展行為。推理吞吐量是在V100 GPU上測量的,遵循[Swin-Transformer]。在A100 GPU上,ConvNeXt的吞吐量可以比Swin Transformer高得多。見附錄E。(?)ViT在90個周期的AugReg[67]訓練下的結果,通過與作者的個人交流提供。

    ConvNeXt與兩個強大的ConvNet基線(RegNet和EfficientNet)在準確性-計算權衡以及推理吞吐量方面進行了良好的競爭。ConvNeXt也全面超越了復雜程度相似的Swin Transformer,有時還有很大的差距(例如ConvNeXt-T的0.8%)。如果沒有專門的模塊,如移位窗口(shifted windows)或相對位置偏差(relative position bias),ConvNeXt也享有比Swin Transformer更好的吞吐量(throughput )。

    A highlight from the results is ConvNeXt-B at $384^2$: it outperforms Swin-B by 0.6% (85.1% vs. 84.5%), with 12.5% higher inference throughput (95.7 vs. 85.1 image/s). We note that the FLOPs/throughput advantage of ConvNeXt-B over Swin-B becomes larger when the resolution increases from $224^2$ to $384^2$. Additionally, we observe an improved result of 85.5% when further scaling to ConvNeXt-L.

    結果中的一個亮點是$384^2$的ConvNeXt-B:它比Swin-B高出0.6%(85.1%對84.5%),同時推理吞吐量高出12.5%(95.7對85.1圖像/秒)。我們注意到,當分辨率從$224^2$增加到$384^2$時,ConvNeXt-B相對于Swin-B的FLOPs/吞吐量優勢變得更大。此外,當進一步擴展到ConvNeXt-L時,我們觀察到85.5%的改進結果。
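
The two figures quoted above are different kinds of differences, which a short calculation makes explicit: the accuracy gap is an absolute difference in percentage points, while the throughput gain is relative to the Swin-B baseline. The numbers are the ones from the text; the helper name is illustrative.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# ConvNeXt-B vs. Swin-B at 384x384 resolution, values quoted in the text.
throughput_gain = relative_gain(95.7, 85.1)   # images/s, relative gain in percent
accuracy_gap = 85.1 - 84.5                    # top-1 accuracy, percentage points
```

Rounded to one decimal place, `throughput_gain` is 12.5 and `accuracy_gap` is 0.6, matching the quoted values.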

    ImageNet-22K

    We present results with models fine-tuned from ImageNet-22K pre-training in Table 1 (lower). These experiments are important since a widely held view is that vision Transformers have fewer inductive biases and thus can perform better than ConvNets when pre-trained on a larger scale. Our results demonstrate that properly designed ConvNets are not inferior to vision Transformers when pre-trained with large datasets: ConvNeXts still perform on par with or better than similarly-sized Swin Transformers, with slightly higher throughput. Additionally, our ConvNeXt-XL model achieves an accuracy of 87.8%, a decent improvement over ConvNeXt-L at $384^2$, demonstrating that ConvNeXts are scalable architectures.

    我們在表1(下)展示了從ImageNet-22K預訓練中微調的模型結果。這些實驗是很重要的,因為有一種廣泛的觀點認為,視覺Transformer的歸納偏置較少,因此在進行大規模的預訓練時可以比ConvNets的表現更好。我們的結果表明,當用大型數據集進行預訓練時,適當設計的ConvNets并不遜于視覺Transformer:ConvNeXts的性能仍然與類似規模的Swin Transformers相當或更好,而且吞吐量略高。此外,我們的ConvNeXt-XL模型達到了87.8%的準確率,相比$384^2$分辨率下的ConvNeXt-L有不錯的提升,這表明ConvNeXts是可擴展的架構。

    On ImageNet-1K, EfficientNetV2-L, a searched architecture equipped with advanced modules (such as Squeeze-and-Excitation [35]) and a progressive training procedure, achieves top performance. However, with ImageNet-22K pre-training, ConvNeXt is able to outperform EfficientNetV2, further demonstrating the importance of large-scale training.
    In Appendix B, we discuss robustness and out-of-domain generalization results for ConvNeXt.

    在ImageNet-1K上,EfficientNetV2-L,一個配備了高級模塊(如Squeeze-and-Excitation[35])和漸進式訓練程序的搜索架構取得了頂級性能。然而,在ImageNet-22K的預訓練下,ConvNeXt能夠超越EfficientNetV2,進一步證明了大規模訓練的重要性。

    在附錄B中,我們討論了ConvNeXt的魯棒性(robustness)和域外泛化結果(out-of-domain generalization results)。

    3.3. Isotropic ConvNeXt vs. ViT —— 各向同性研究

    In this ablation, we examine if our ConvNeXt block design is generalizable to ViT-style [20] isotropic architectures which have no downsampling layers and keep the same feature resolutions (e.g. 14×14) at all depths. We construct isotropic ConvNeXt-S/B/L using the same feature dimensions as ViT-S/B/L (384/768/1024). Depths are set at 18/18/36 to match the number of parameters and FLOPs. The block structure remains the same (Fig. 4). We use the supervised training results from DeiT [73] for ViT-S/B and MAE [26] for ViT-L, as they employ improved training procedures over the original ViTs [20]. ConvNeXt models are trained with the same settings as before, but with longer warmup epochs. Results for ImageNet-1K at $224^2$ resolution are in Table 2. We observe ConvNeXt can perform generally on par with ViT, showing that our ConvNeXt block design is competitive when used in non-hierarchical models.

    在這個消融中,我們研究了我們的ConvNeXt塊設計是否可以推廣到ViT式(ViT-style)的各向同性架構,這種架構沒有下采樣層,在所有深度都保持相同的特征分辨率(如14×14)。我們使用與ViT-S/B/L相同的特征尺寸(384/768/1024)構建各向同性的ConvNeXt-S/B/L。深度設置為18/18/36,以匹配參數和FLOPs的數量。塊狀結構保持不變(圖4)。

    我們對ViT-S/B使用DeiT[73]的監督訓練結果,對ViT-L使用MAE[26]的監督訓練結果,因為它們采用了比原始ViTs[20]更好的訓練程序。ConvNeXt模型的訓練設置與之前相同,但有更長的預熱周期。表2列出了$224^2$分辨率的ImageNet-1K的結果。我們觀察到ConvNeXt的表現基本與ViT持平,這表明我們的ConvNeXt塊設計在用于非層次模型(non-hierarchical models)時具有競爭力。

    Table 2. Comparing isotropic ConvNeXt and ViT. Training memory is measured on V100 GPUs with 32 per-GPU batch size.
    表2. 比較各向同性的ConvNeXt和ViT。訓練內存是在V100 GPU上測量的,每個GPU的批量大小為32。

    4. Empirical Evaluation on Downstream Tasks —— 下游任務的實證評估

    Object detection and segmentation on COCO

    We finetune Mask R-CNN [27] and Cascade Mask R-CNN [9] on the COCO dataset with ConvNeXt backbones. Following Swin Transformer [45], we use multi-scale training, AdamW optimizer, and a 3× schedule. Further details and hyperparameter settings can be found in Appendix A.3.

    我們在COCO數據集上用ConvNeXt骨干網絡對Mask R-CNN和Cascade Mask R-CNN進行微調。在Swin Transformer之后,我們使用了多尺度訓練、AdamW優化器和3×時間表。進一步的細節和超參數設置可以在附錄A.3中找到。

    Table 3 shows object detection and instance segmentation results comparing Swin Transformer, ConvNeXt, and traditional ConvNet such as ResNeXt. Across different model complexities, ConvNeXt achieves on-par or better performance than Swin Transformer. When scaled up to bigger models (ConvNeXt-B/L/XL) pre-trained on ImageNet-22K, in many cases ConvNeXt is significantly better (e.g. +1.0 AP) than Swin Transformers in terms of box and mask AP.

    表3顯示了Swin Transformer、ConvNeXt和ResNeXt等傳統ConvNet的物體檢測和實例分割結果的比較。

    Table 3. COCO object detection and segmentation results using Mask-RCNN and Cascade Mask-RCNN. ‡ indicates that the model is pre-trained on ImageNet-22K. ImageNet-1K pre-trained Swin results are from their GitHub repository [3]. AP numbers of the ResNet-50 and X101 models are from [45]. We measure FPS on an A100 GPU. FLOPs are calculated with image size (1280, 800).
    表3. 使用Mask-RCNN和Cascade Mask-RCNN進行COCO物體檢測和分割的結果。‡表示該模型是在ImageNet-22K上預訓練的。ImageNet-1K的預訓練Swin結果來自其GitHub資源庫[3]。ResNet-50和X101模型的AP數值來自[45]。我們在A100 GPU上測量FPS。FLOPs是以圖像尺寸(1280,800)計算的。

    在不同的模型復雜性中,ConvNeXt取得了與Swin Transformer相當或更好的性能。當擴大到在ImageNet-22K上預訓練的更大的模型(ConvNeXt-B/L/XL)時,在許多情況下,ConvNeXt在box 和mask AP方面明顯優于Swin Transformer(例如+1.0AP)。

    Semantic segmentation on ADE20K

    We also evaluate ConvNeXt backbones on the ADE20K semantic segmentation task with UperNet [85]. All model variants are trained for 160K iterations with a batch size of 16. Other experimental settings follow [6] (see Appendix A.3 for more details). In Table 4, we report validation mIoU with multi-scale testing. ConvNeXt models can achieve competitive performance across different model capacities, further validating the effectiveness of our architecture design.

    我們還在ADE20K語義分割任務中評估了ConvNeXt骨架與UperNet的關系。所有的模型變體都訓練了16萬次迭代,批次大小為16。其他實驗設置遵循[6](更多細節見附錄A.3)。在表4中,我們報告了多尺度測試的驗證mIoU。

    Table 4. ADE20K validation results using UperNet [85]. ‡ indicates IN-22K pre-training. Swin’s results are from its GitHub repository [2]. Following Swin, we report mIoU results with multi-scale testing. FLOPs are based on input sizes of (2048, 512) and (2560, 640) for IN-1K and IN-22K pre-trained models, respectively.
    表4. 使用UperNet[85]的ADE20K驗證結果。‡表示IN-22K預訓練。Swin的結果來自其GitHub倉庫[2]。繼Swin之后,我們報告了多尺度測試的mIoU結果。FLOPs是基于IN-1K和IN-22K預訓練模型的輸入尺寸(2048,512)和(2560,640)。

    ConvNeXt模型可以在不同的模型容量下取得有競爭力的性能,進一步驗證了我們架構設計的有效性。

    Remarks on model efficiency

    Under similar FLOPs, models with depthwise convolutions are known to be slower and consume more memory than ConvNets with only dense convolutions. It is natural to ask whether the design of ConvNeXt will render it practically inefficient. As demonstrated throughout the paper, the inference throughputs of ConvNeXts are comparable to or exceed those of Swin Transformers. This is true for both classification and other tasks requiring higher-resolution inputs (see Tables 1 and 3 for comparisons of throughput/FPS). Furthermore, we notice that training ConvNeXts requires less memory than training Swin Transformers. For example, training Cascade Mask-RCNN using a ConvNeXt-B backbone consumes 17.4GB of peak memory with a per-GPU batch size of 2, while the reference number for Swin-B is 18.5GB. In comparison to vanilla ViT, both ConvNeXt and Swin Transformer exhibit a more favorable accuracy-FLOPs trade-off due to the local computations. It is worth noting that this improved efficiency is a result of the ConvNet inductive bias, and is not directly related to the self-attention mechanism in vision Transformers.

    在類似的FLOPs下,已知具有深度卷積的模型比只有密集卷積的ConvNets更慢,消耗更多的內存。我們很自然地會問,ConvNeXt的設計是否會使其實際效率降低。正如本文所展示的那樣,ConvNeXt的推理吞吐量與Swin Transformers相當,甚至超過了Swin Transformers。這對于分類和其他需要高分辨率輸入的任務來說都是如此(吞吐量/FPS的比較見表1,3)。此外,我們注意到,訓練ConvNeXts需要的內存比訓練Swin Transformers少。例如,使用ConvNeXt-B骨干訓練Cascade Mask-RCNN,在每個GPU批次大小為2的情況下,消耗了17.4GB的峰值內存,而Swin-B的參考數字是18.5GB。與vanilla ViT相比,由于本地計算,ConvNeXt和Swin Transformer都表現出更有利的精度-FLOPs權衡。值得注意的是,這種效率的提高是ConvNet歸納偏置的結果,而與視覺Transformer中的自注意機制沒有直接關系。

    5. Related Work

    5.1 Hybrid models

    In both the pre- and post-ViT eras, the hybrid model combining convolutions and self-attentions has been actively studied. Prior to ViT, the focus was on augmenting a ConvNet with self-attention/non-local modules [8, 55, 66, 79] to capture long-range dependencies. The original ViT [20] first studied a hybrid configuration, and a large body of follow-up works focused on reintroducing convolutional priors to ViT, either in an explicit [15, 16, 21, 82, 86, 88] or implicit [45] fashion.

    在ViT之前和之后的時代,結合卷積和自注意力的混合模型一直被積極研究。在ViT之前,重點是用自注意力/非局部模塊來增強ConvNet[8, 55, 66, 79],以捕捉長距離的依賴關系。最初的ViT[20]首次研究了一種混合配置,大量的后續工作集中在將卷積先驗重新引入ViT,無論是以顯式[15, 16, 21, 82, 86, 88]還是隱式[45]方式。

    5.2 Recent convolution-based approaches

    Han et al. [25] show that local Transformer attention is equivalent to inhomogeneous dynamic depthwise conv. The MSA block in Swin is then replaced with a dynamic or regular depthwise convolution, achieving comparable performance to Swin. A concurrent work ConvMixer [4] demonstrates that, in small-scale settings, depthwise convolution can be used as a promising mixing strategy. ConvMixer uses a smaller patch size to achieve the best results, making the throughput much lower than other baselines. GFNet [56] adopts Fast Fourier Transform (FFT) for token mixing. FFT is also a form of convolution, but with a global kernel size and circular padding. Unlike many recent Transformer or ConvNet designs, one primary goal of our study is to provide an in-depth look at the process of modernizing a standard ResNet and achieving state-of-the-art performance.

    Han等人[25]表明,局部Transformer注意力等同于不均勻的動態深度卷積,然后用動態或常規深度卷積取代Swin中的MSA塊,取得與Swin相當的性能。同時進行的一項工作ConvMixer[4]表明,在小范圍內,深度卷積可以作為一種有前途的混合策略。ConvMixer使用較小的補丁尺寸來達到最佳效果,使得吞吐量比其他基線低很多。GFNet[56]采用快速傅里葉變換(FFT)進行標記混合。FFT也是卷積的一種形式,但有一個全局內核大小和循環填充。與許多最近的Transformer或ConvNet設計不同,我們研究的一個主要目標是深入研究標準ResNet的現代化過程并實現最先進的性能。

    6. Conclusions

    In the 2020s, vision Transformers, particularly hierarchical ones such as Swin Transformers, began to overtake ConvNets as the favored choice for generic vision backbones. The widely held belief is that vision Transformers are more accurate, efficient, and scalable than ConvNets. We propose ConvNeXts, a pure ConvNet model that can compete favorably with state-of-the-art hierarchical vision Transformers across multiple computer vision benchmarks, while retaining the simplicity and efficiency of standard ConvNets. In some ways, our observations are surprising while our ConvNeXt model itself is not completely new — many design choices have all been examined separately over the last decade, but not collectively. We hope that the new results reported in this study will challenge several widely held views and prompt people to rethink the importance of convolution in computer vision.

    在2020年代,視覺Transformer,特別是層次化的Transformer,如Swin Transformers,開始超越ConvNets,成為通用視覺骨干的首選。人們普遍認為,視覺Transformer比ConvNets更準確、更高效、更可擴展。我們提出了ConvNeXts,一個純ConvNet模型,它可以在多個計算機視覺基準中與最先進的分層視覺Transformer競爭,同時保留了標準ConvNets的簡單性和效率。在某些方面,我們的觀察結果令人驚訝,而我們的ConvNeXt模型本身并不是全新的——許多設計選擇都在過去十年中被單獨研究過,但沒有集體研究過。我們希望本研究報告的新結果將挑戰幾個廣泛持有的觀點,并促使人們重新思考計算機視覺中卷積的重要性。

    Acknowledgments

    We thank Kaiming He, Eric Mintun, Xingyi Zhou, Ross Girshick, and Yann LeCun for valuable discussions and feedback.

    我們感謝Kaiming He、Eric Mintun、Xingyi Zhou、Ross Girshick和Yann LeCun的寶貴討論和反饋。

    Appendix —— 附錄

    In this Appendix, we provide further experimental details (§A), robustness evaluation results (§B), more modernization experiment results (§C), and a detailed network specification (§D). We further benchmark model throughput on A100 GPUs (§E). Finally, we discuss the limitations (§F) and societal impact (§G) of our work.

    在這個附錄中,我們提供了進一步的實驗細節(§A),魯棒性評估結果(§B),更多的現代化實驗結果(§C),以及詳細的網絡規范(§D)。我們進一步對A100 GPU上的模型吞吐量進行了基準測試(§E)。最后,我們討論了我們工作的局限性(§F)和社會影響(§G)。

    A. Experimental Settings

    A.1. ImageNet (Pre-)training

    We provide ConvNeXts’ ImageNet-1K training and ImageNet-22K pre-training settings in Table 5. The settings are used for our main results in Table 1 (Section 3.2). All ConvNeXt variants use the same setting, except the stochastic depth rate is customized for model variants.

    我們在表5中提供了ConvNeXts的ImageNet-1K訓練和ImageNet-22K預訓練設置。這些設置用于我們在表1(第3.2節)的主要結果。所有的ConvNeXt變體都使用相同的設置,只是隨機深度率是為模型變體定制的。

    Table 5. ImageNet-1K/22K (pre-)training settings. Multiple stochastic depth rates (e.g., 0.1/0.4/0.5/0.5) are for each model (e.g., ConvNeXt-T/S/B/L) respectively.
    表5. ImageNet-1K/22K(預)訓練設置。多個隨機深度率(如0.1/0.4/0.5/0.5)分別為每個模型(如ConvNeXt-T/S/B/L)。

    For experiments in “modernizing a ConvNet” (Section 2), we also use Table 5’s setting for ImageNet-1K, except EMA is disabled, as we find using EMA severely hurts models with BatchNorm layers. For isotropic ConvNeXts (Section 3.3), the setting for ImageNet-1K in Table 5 is also adopted, but warmup is extended to 50 epochs, and layer scale is disabled for isotropic ConvNeXt-S/B. The stochastic depth rates are 0.1/0.2/0.5 for isotropic ConvNeXt-S/B/L.

    在“ConvNet現代化”的實驗中(第2節),我們也使用了表5對ImageNet-1K的設置,只是EMA被禁用,因為我們發現使用EMA會嚴重傷害帶有BatchNorm層的模型。對于各向同性的ConvNeXts(第3.3節),我們也采用了表5中對ImageNet-1K的設置,但預熱時間延長到50個epoch,并且對于各向同性的ConvNeXt-S/B來說,Layer Scale是禁用的。各向同性的ConvNeXt-S/B/L的隨機深度率為0.1/0.2/0.5。

    A.2. ImageNet Fine-tuning

    We list the settings for fine-tuning on ImageNet-1K in Table 6. The fine-tuning starts from the final model weights obtained in pre-training, without using the EMA weights, even if in pre-training EMA is used and EMA accuracy is reported. This is because we do not observe improvement if we fine-tune with the EMA weights (consistent with observations in [73]). The only exception is ConvNeXt-L pre-trained on ImageNet-1K, where the model accuracy is significantly lower than the EMA accuracy due to overfitting, and we select its best EMA model during pre-training as the starting point for fine-tuning.

    我們在表6中列出了ImageNet-1K的微調設置。微調是從預訓練中得到的最終模型權重開始的,沒有使用EMA權重,即使在預訓練中使用了EMA,并且報告了EMA精度。這是因為如果使用EMA權重進行微調,我們并沒有觀察到改進(與[73]中的觀察一致)。唯一的例外是在ImageNet-1K上預訓練的ConvNeXt-L,由于過擬合,其模型精度明顯低于EMA精度,我們在預訓練中選擇其最佳EMA模型作為微調的起點。
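
The distinction between the final model weights and the EMA weights can be made concrete with the update rule itself. The sketch below keeps a second copy of the weights that is nudged toward the live weights after every step. The decay value of 0.9999 and the flat list-of-floats representation are illustrative assumptions, not values taken from this section.

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * model."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


# Toy run: the live weight jumps between 0.0 and 2.0, the EMA copy stays near 1.0.
ema = [1.0]
for step in range(1000):
    live = [0.0] if step % 2 == 0 else [2.0]
    ema = ema_update(ema, live, decay=0.99)
```

The EMA copy is what gets evaluated when EMA accuracy is reported, while fine-tuning as described above starts from the live weights instead.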

    In fine-tuning, we use layer-wise learning rate decay [6, 12] with every 3 consecutive blocks forming a group. When the model is fine-tuned at $384^2$ resolution, we use a crop ratio of 1.0 (i.e., no cropping) during testing following [2, 74, 80], instead of 0.875 at $224^2$.

    在微調中,我們使用層間學習率衰減[6, 12],每3個連續的塊形成一個組。當模型在$384^2$分辨率下進行微調時,我們按照[2, 74, 80]在測試過程中使用1.0的裁剪率(即不裁剪),而不是$224^2$時的0.875。
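
Layer-wise learning rate decay with groups of three blocks can be sketched as a table of per-group multipliers on the base learning rate. The grouping rule comes from the text; the exponent convention (the head keeps the full rate, and each earlier group is scaled by one more factor of the decay), the decay value 0.8, and the omission of the downsampling layers are assumptions for illustration.

```python
def layerwise_lr_scales(blocks=(3, 3, 27, 3), decay=0.8, group_size=3):
    """Learning-rate multiplier per block, with consecutive blocks grouped.

    Assumed convention: the stem is group 0, block groups are 1..G, and the
    classifier head is group G + 1 with multiplier 1.0.
    """
    total_blocks = sum(blocks)
    num_groups = -(-total_blocks // group_size)   # ceiling division
    head_id = num_groups + 1
    scales = {"stem": decay ** head_id}
    for index in range(total_blocks):
        group_id = 1 + index // group_size
        scales[f"block{index}"] = decay ** (head_id - group_id)
    scales["head"] = 1.0
    return scales


scales = layerwise_lr_scales()
```

For the (3, 3, 27, 3) layout this yields 12 block groups: the last three blocks train at 0.8 times the base rate, and the first three at 0.8 to the power 12.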

    A.3. Downstream Tasks

    For ADE20K and COCO experiments, we follow the training settings used in BEiT [6] and Swin [45]. We also use MMDetection [10] and MMSegmentation [13] toolboxes. We use the final model weights (instead of EMA weights) from ImageNet pre-training as network initializations.

    對于ADE20K和COCO的實驗,我們遵循BEiT[6]和Swin[45]中使用的訓練設置。我們還使用了MMDetection[10]和MMSegmentation[13]工具箱。我們使用ImageNet預訓練的最終模型權重(而不是EMA權重)作為網絡初始化。

    We conduct a lightweight sweep for COCO experiments including learning rate {1e-4, 2e-4}, layer-wise learning rate decay [6] {0.7, 0.8, 0.9, 0.95}, and stochastic depth rate {0.3, 0.4, 0.5, 0.6, 0.7, 0.8}. We fine-tune the ImageNet-22K pre-trained Swin-B/L on COCO using the same sweep. We use the official code and pre-trained model weights [3].

    我們對COCO實驗進行了輕量級掃描,包括學習率{$1 \times 10^{-4}$, $2 \times 10^{-4}$},層間學習率衰減[6] {0.7, 0.8, 0.9, 0.95},以及隨機深度率{0.3, 0.4, 0.5, 0.6, 0.7, 0.8}。我們在COCO上使用同樣的掃描對ImageNet-22K預訓練的Swin-B/L進行微調。我們使用官方代碼和預訓練的模型權重[3]。

    The hyperparameters we sweep for ADE20K experiments include learning rate {8e-5, 1e-4}, layer-wise learning rate decay {0.8, 0.9}, and stochastic depth rate {0.3, 0.4, 0.5}. We report validation mIoU results using multi-scale testing. Additional single-scale testing results are in Table 7.

    我們為ADE20K實驗掃描的超參數包括學習率{$8 \times 10^{-5}$, $1 \times 10^{-4}$},層間學習率衰減{0.8, 0.9},以及隨機深度率{0.3, 0.4, 0.5}。我們報告了使用多尺度測試的驗證性mIoU結果。其他單尺度測試結果見表7。
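
The two search spaces above are small enough to enumerate. The sketch below builds every combination, assuming a full grid; the text only calls the sweep "lightweight" and does not say whether every combination was run. The dictionary keys are illustrative names.

```python
from itertools import product

COCO_SWEEP = {
    "lr": (1e-4, 2e-4),
    "layer_decay": (0.7, 0.8, 0.9, 0.95),
    "drop_path": (0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
}

ADE20K_SWEEP = {
    "lr": (8e-5, 1e-4),
    "layer_decay": (0.8, 0.9),
    "drop_path": (0.3, 0.4, 0.5),
}


def grid(space):
    """All combinations of the listed values, one dict per trial."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]


coco_trials = grid(COCO_SWEEP)
ade20k_trials = grid(ADE20K_SWEEP)
```

A full grid would mean 48 trials per backbone for COCO and 12 for ADE20K.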

    B. Robustness Evaluation

    Additional robustness evaluation results for ConvNeXt models are presented in Table 8. We directly test our ImageNet-1K trained/fine-tuned classification models on several robustness benchmark datasets such as ImageNet-A [33], ImageNet-R [30], ImageNet-Sketch [78] and ImageNet-C/$\bar{\mathrm{C}}$ [31, 48] datasets. We report mean corruption error (mCE) for ImageNet-C, corruption error for ImageNet-$\bar{\mathrm{C}}$, and top-1 accuracy for all other datasets.

    表8中列出了ConvNeXt模型的其他魯棒性評估結果。我們直接在幾個魯棒性基準數據集上測試我們的ImageNet-1K訓練/微調分類模型,如ImageNet-A [33], ImageNet-R [30], ImageNet-Sketch [78] 和ImageNet-C/$\bar{\mathrm{C}}$ [31, 48] 數據集。我們報告了ImageNet-C的平均腐蝕誤差(mCE),ImageNet-$\bar{\mathrm{C}}$的腐蝕誤差,以及所有其他數據集的top-1準確率。

    Table 8. Robustness evaluation of ConvNeXt. We do not make use of any specialized modules or additional fine-tuning procedures.
    表8. ConvNeXt的魯棒性評估。我們沒有使用任何專門的模塊或額外的微調程序。

    ConvNeXt (in particular the large-scale model variants) exhibits promising robustness behaviors, outperforming state-of-the-art robust transformer models [47] on several benchmarks. With extra ImageNet-22K data, ConvNeXt-XL demonstrates strong domain generalization capabilities (e.g. achieving 69.3%/68.2%/55.0% accuracy on ImageNet-A/R/Sketch benchmarks, respectively). We note that these robustness evaluation results were acquired without using any specialized modules or additional fine-tuning procedures.

    ConvNeXt(尤其是大規模模型的變體)表現出了很好的魯棒性行為,在一些基準測試上超過了最先進的魯棒性Transformer模型[47]。利用額外的ImageNet-22K數據,ConvNeXt-XL展示了強大的領域泛化能力(例如,在ImageNet-A/R/Sketch基準上分別達到69.3%/68.2%/55.0%的精度)。我們注意到,這些魯棒性評估結果是在沒有使用任何專門模塊或額外微調程序的情況下獲得的。

    C. Modernizing ResNets: detailed results

    Here we provide detailed tabulated results for the modernization experiments, in both the ResNet-50 / Swin-T and ResNet-200 / Swin-B regimes. The ImageNet-1K top-1 accuracies and FLOPs for each step are shown in Tables 10 and 11. ResNet-50 regime experiments are run with 3 random seeds.

    這里我們提供了在ResNet-50 / Swin-T和ResNet-200 / Swin-B兩個制度下的現代化實驗的詳細表格結果。表10和11顯示了ImageNet-1K每一步的最高準確率和FLOPs。ResNet-50制度的實驗是用3個隨機種子運行的。

    For ResNet-200, the initial number of blocks at each stage is (3, 24, 36, 3). We change it to Swin-B’s (3, 3, 27, 3) at the step of changing stage ratio. This drastically reduces the FLOPs, so at the same time, we also increase the width from 64 to 84 to keep the FLOPs at a similar level. After the step of adopting depthwise convolutions, we further increase the width to 128 (same as Swin-B’s) as a separate step.

    對于ResNet-200,每個階段的初始塊數是(3, 24, 36, 3)。在改變階段比例的步驟中,我們將其改為Swin-B的(3, 3, 27, 3)。這大大減少了FLOPs,所以同時我們也將寬度從64增加到84,以保持FLOPs在一個類似的水平。在采用深度卷積的步驟后,我們進一步將寬度增加到128(與Swin-B的相同),作為一個單獨的步驟。

    The observations on the ResNet-200 regime are mostly consistent with those on ResNet-50 as described in the main paper. One interesting difference is that inverting dimensions brings a larger improvement at ResNet-200 regime than at ResNet-50 regime (+0.79% vs. +0.14%). The performance gained by increasing kernel size also seems to saturate at kernel size 5 instead of 7. Using fewer normalization layers also has a bigger gain compared with the ResNet-50 regime (+0.46% vs. +0.14%).

    對ResNet-200量級的觀察與主論文中描述的ResNet-50量級的觀察基本一致。一個有趣的區別是,與ResNet-50量級相比,倒置維度在ResNet-200量級下帶來了更大的改進(+0.79% vs. +0.14%)。增大卷積核帶來的性能提升似乎也在核大小為5而不是7時就飽和了。與ResNet-50量級相比,使用較少的歸一化層也有更大的收益(+0.46% vs. +0.14%)。

    D. Detailed Architectures

    We present a detailed architecture comparison between ResNet-50, ConvNeXt-T and Swin-T in Table 9. For differently sized ConvNeXts, only the number of blocks and the number of channels at each stage differ from ConvNeXt-T (see Section 3 for details). ConvNeXts enjoy the simplicity of standard ConvNets, but compete favorably with Swin Transformers in visual recognition.

    E. Benchmarking on A100 GPUs

    Following Swin Transformer [45], the ImageNet models’ inference throughputs in Table 1 are benchmarked using a V100 GPU, where ConvNeXt is slightly faster in inference than Swin Transformer with a similar number of parameters. We now benchmark them on the more advanced A100 GPUs, which support the TensorFloat32 (TF32) tensor cores. We employ PyTorch [50] version 1.10 to use the latest “Channel Last” memory layout [22] for further speedup.
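
A throughput benchmark of this kind usually amounts to a warm-up phase followed by a timed loop. The sketch below shows only the timing logic in plain Python and is not the script used for Table 12: `measure_throughput`, `dummy_forward`, and the `sync` hook are hypothetical names, and the workload is a stand-in so the example runs anywhere.

```python
import time


def measure_throughput(forward, batch_size, warmup=10, iters=50, sync=None):
    """Return images per second for repeated calls to forward().

    sync is an optional callable that blocks until queued work has finished.
    On a GPU this matters because kernels are launched asynchronously.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # warm-up iterations are excluded from timing
        forward()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        forward()
    sync()  # wait for outstanding work before stopping the clock
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def dummy_forward():
    """Stand-in workload; replace with a real model forward pass."""
    sum(i * i for i in range(10_000))


# With PyTorch on an A100, one would typically (assumed setup, not shown here):
#   - set torch.backends.cuda.matmul.allow_tf32 = True
#     and torch.backends.cudnn.allow_tf32 = True to use TF32 tensor cores,
#   - convert model and input with .to(memory_format=torch.channels_last),
#   - run forward() under torch.no_grad() and pass sync=torch.cuda.synchronize.
print(f"{measure_throughput(dummy_forward, batch_size=64):.1f} images/s")
```

Passing the synchronization step in as a hook keeps the timing loop independent of any particular framework.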

    We present the results in Table 12. Swin Transformers and ConvNeXts both achieve higher inference throughput on A100 GPUs than on V100 GPUs, but ConvNeXts' advantage is now significantly greater, with throughput sometimes as much as 49% higher. This preliminary study shows promising signals that ConvNeXt, built from standard ConvNet modules and simple in design, could be a practically more efficient model on modern hardware.
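
For clarity, a figure such as "49% faster" is a relative throughput gain over the baseline model. The helper below is a minimal sketch of that calculation; `relative_gain_percent` is a hypothetical name, and the two throughput values are made-up placeholders rather than numbers from Table 12.

```python
def relative_gain_percent(candidate, baseline):
    """Relative throughput gain of candidate over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate / baseline - 1.0) * 100.0


# Placeholder throughputs in images/s (hypothetical, not measured results).
print(round(relative_gain_percent(149.0, 100.0), 1))  # 49.0
```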

    Table 12. Inference throughput comparisons on an A100 GPU. Using TF32 data format and “channel last” memory layout, ConvNeXt enjoys up to ~49% higher throughput compared with a Swin Transformer with similar FLOPs.

    F. Limitations

    We demonstrate that ConvNeXt, a pure ConvNet model, can perform as well as a hierarchical vision Transformer on image classification, object detection, and instance and semantic segmentation tasks. While our goal is to offer a broad range of evaluation tasks, we recognize that computer vision applications are even more diverse. ConvNeXt may be better suited to certain tasks, while Transformers may be more flexible for others. A case in point is multi-modal learning, in which a cross-attention module may be preferable for modeling feature interactions across many modalities. Additionally, Transformers may be more flexible when used for tasks requiring discretized, sparse, or structured outputs. We believe the architecture choice should meet the needs of the task at hand while striving for simplicity.

    G. Societal Impact

    In the 2020s, research on visual representation learning began to place enormous demands on computing resources. While larger models and datasets improve performance across the board, they also introduce a slew of challenges. ViT, Swin, and ConvNeXt all perform best with their huge model variants. Investigating those model designs inevitably results in an increase in carbon emissions. One important direction, and a motivation for our paper, is to strive for simplicity — with more sophisticated modules, the network’s design space expands enormously, obscuring critical components that contribute to the performance difference. Additionally, large models and datasets present issues in terms of model robustness and fairness. Further investigation on the robustness behavior of ConvNeXt vs. Transformer will be an interesting research direction. In terms of data, our findings indicate that ConvNeXt models benefit from pre-training on large-scale datasets. While our method makes use of the publicly available ImageNet-22K dataset, individuals may wish to acquire their own data for pre-training. A more circumspect and responsible approach to data selection is required to avoid potential concerns with data biases.

    References

    [1] PyTorch Vision Models. https://pytorch.org/vision/stable/models.html. Accessed: 2021-10-01.
    [2] GitHub repository: Swin transformer. https://github.com/microsoft/Swin-Transformer, 2021.
    [3] GitHub repository: Swin transformer for object detection. https://github.com/SwinTransformer/Swin-Transformer-Object-Detection, 2021.
    [4] Anonymous. Patches are all you need? Openreview, 2021.
    [5] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016.
    [6] Hangbo Bao, Li Dong, and Furu Wei. BEiT: BERT pre-training of image transformers. arXiv:2106.08254, 2021.
    [7] Irwan Bello, William Fedus, Xianzhi Du, Ekin Dogus Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, and Barret Zoph. Revisiting resnets: Improved training and scaling strategies. NeurIPS, 2021.
    [8] Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, and Quoc V Le. Attention augmented convolutional networks. In ICCV, 2019.
    [9] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In CVPR, 2018.
    [10] Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. MMDetection: Open mmlab detection toolbox and benchmark. arXiv:1906.07155, 2019.
    [11] François Chollet. Xception: Deep learning with depthwise separable convolutions. In CVPR, 2017.
    [12] Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. ELECTRA: Pre-training text encoders as discriminators rather than generators. In ICLR, 2020.
    [13] MMSegmentation contributors. MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark. https://github.com/open-mmlab/mmsegmentation, 2020.
    [14] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In CVPR Workshops, 2020.
    [15] Zihang Dai, Hanxiao Liu, Quoc V Le, and Mingxing Tan. Coatnet: Marrying convolution and attention for all data sizes. NeurIPS, 2021.
    [16] Stéphane d’Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, and Levent Sagun. ConViT: Improving vision transformers with soft convolutional inductive biases. ICML, 2021.
    [17] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
    [18] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2019.
    [19] Piotr Dollár, Serge Belongie, and Pietro Perona. The fastest pedestrian detector in the west. In BMVC, 2010.
    [20] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
    [21] Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, and Christoph Feichtenhofer. Multiscale vision transformers. ICCV, 2021.
    [22] Vitaly Fedyunin. Tutorial: Channel last memory format in PyTorch. https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html, 2021. Accessed: 2021-10-01.
    [23] Ross Girshick. Fast R-CNN. In ICCV, 2015.
    [24] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
    [25] Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, and Jingdong Wang. Demystifying local vision transformer: Sparse connectivity, weight sharing, and dynamic weight. arXiv:2106.04263, 2021.
    [26] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. arXiv:2111.06377, 2021.
    [27] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
    [28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
    [29] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, 2016.
    [30] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. In ICCV, 2021.
    [31] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In ICLR, 2018.
    [32] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv:1606.08415, 2016.
    [33] Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In CVPR, 2021.
    [34] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.
    [35] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In CVPR, 2018.
    [36] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, 2017.
    [37] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. In ECCV, 2016.
    [38] Sergey Ioffe. Batch renormalization: Towards reducing minibatch dependence in batch-normalized models. In NeurIPS, 2017.
    [39] Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, and Neil Houlsby. Big Transfer (BiT): General visual representation learning. In ECCV, 2020.
    [40] Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, 2012.
    [41] Andrew Lavin and Scott Gray. Fast algorithms for convolutional neural networks. In CVPR, 2016.
    [42] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1989.
    [43] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
    [44] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
    [45] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. 2021.
    [46] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In ICLR, 2019.
    [47] Xiaofeng Mao, Gege Qi, Yuefeng Chen, Xiaodan Li, Ranjie Duan, Shaokai Ye, Yuan He, and Hui Xue. Towards robust vision transformer. arXiv preprint arXiv:2105.07926, 2021.
    [48] Eric Mintun, Alexander Kirillov, and Saining Xie. On interaction between augmentations and corruptions in natural corruption robustness. NeurIPS, 2021.
    [49] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
    [50] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
    [51] Boris T Polyak and Anatoli B Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 1992.
    [52] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
    [53] Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, and Piotr Dollár. On network design spaces for visual recognition. In ICCV, 2019.
    [54] Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, and Piotr Dollár. Designing network design spaces. In CVPR, 2020.
    [55] Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, and Jonathon Shlens. Stand-alone self-attention in vision models. NeurIPS, 2019.
    [56] Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. Global filter networks for image classification. NeurIPS, 2021.
    [57] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
    [58] Henry A Rowley, Shumeet Baluja, and Takeo Kanade. Neural network-based face detection. TPAMI, 1998.
    [59] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
    [60] Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In NeurIPS, 2016.
    [61] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
    [62] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
    [63] Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, and Yann LeCun. Pedestrian detection with unsupervised multistage feature learning. In CVPR, 2013.
    [64] Karen Simonyan and Andrew Zisserman. Two-stream convolutional networks for action recognition in videos. In NeurIPS, 2014.
    [65] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
    [66] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition. In CVPR, 2021.
    [67] Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your vit? data, augmentation, and regularization in vision transformers. arXiv preprint arXiv:2106.10270, 2021.
    [68] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
    [69] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
    [70] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. In CVPR, 2019.
    [71] Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.
    [72] Mingxing Tan and Quoc Le. Efficientnetv2: Smaller models and faster training. In ICML, 2021.
    [73] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. arXiv:2012.12877, 2020.
    [74] Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, and Hervé Jégou. Going deeper with image transformers. ICCV, 2021.
    [75] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022, 2016.
    [76] Régis Vaillant, Christophe Monrocq, and Yann Le Cun. Original approach for the localisation of objects in images. Vision, Image and Signal Processing, 1994.
    [77] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
    [78] Haohan Wang, Songwei Ge, Eric P Xing, and Zachary C Lipton. Learning robust global representations by penalizing local predictive power. NeurIPS, 2019.
    [79] Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In CVPR, 2018.
    [80] Ross Wightman. GitHub repository: Pytorch image models. https://github.com/rwightman/pytorch-image-models, 2019.
    [81] Ross Wightman, Hugo Touvron, and Hervé Jégou. Resnet strikes back: An improved training procedure in timm. arXiv:2110.00476, 2021.
    [82] Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. Cvt: Introducing convolutions to vision transformers. ICCV, 2021.
    [83] Yuxin Wu and Kaiming He. Group normalization. In ECCV, 2018.
    [84] Yuxin Wu and Justin Johnson. Rethinking “batch” in batchnorm. arXiv:2105.07576, 2021.
    [85] Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. Unified perceptual parsing for scene understanding. In ECCV, 2018.
    [86] Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, and Ross Girshick. Early convolutions help transformers see better. In NeurIPS, 2021.
    [87] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017.
    [88] Weijian Xu, Yifan Xu, Tyler Chang, and Zhuowen Tu. Coscale conv-attentional image transformers. ICCV, 2021.
    [89] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In ICCV, 2019.
    [90] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In ICLR, 2018.
    [91] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In AAAI, 2020.
    [92] Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Semantic understanding of scenes through the ADE20K dataset. IJCV, 2019.

久久午夜夜伦鲁鲁片无码免费 | 国产一区二区三区精品视频 | 成人欧美一区二区三区 | 亚洲成av人影院在线观看 | 欧美午夜特黄aaaaaa片 | 久久综合给久久狠狠97色 | 日韩av无码一区二区三区不卡 | 国产午夜福利100集发布 | а天堂中文在线官网 | 日日摸日日碰夜夜爽av | 日韩精品久久久肉伦网站 | 国产乱人伦app精品久久 国产在线无码精品电影网 国产国产精品人在线视 | av香港经典三级级 在线 | 亚洲性无码av中文字幕 | 国产精品资源一区二区 | 久久亚洲中文字幕无码 | 久久精品国产亚洲精品 | 最新国产乱人伦偷精品免费网站 | 色偷偷av老熟女 久久精品人妻少妇一区二区三区 | 亚无码乱人伦一区二区 | 天堂亚洲免费视频 | 福利一区二区三区视频在线观看 | 亚洲欧美中文字幕5发布 | 国产99久久精品一区二区 | 国产人成高清在线视频99最全资源 | 国内揄拍国内精品人妻 | 国产人妻人伦精品1国产丝袜 | 亚洲中文字幕无码一久久区 | 亚洲自偷自拍另类第1页 | 亚洲欧美国产精品久久 | 中文字幕无码免费久久99 | 黑人玩弄人妻中文在线 | 秋霞成人午夜鲁丝一区二区三区 | 少妇邻居内射在线 | 欧美丰满熟妇xxxx性ppx人交 | 日韩在线不卡免费视频一区 | 国产午夜无码视频在线观看 | 久久久国产精品无码免费专区 | 熟女少妇在线视频播放 | 色欲久久久天天天综合网精品 | 玩弄少妇高潮ⅹxxxyw | 久久天天躁夜夜躁狠狠 | 国产精品久久久久久亚洲影视内衣 | 东京热一精品无码av | 中文字幕无码av激情不卡 | 51国偷自产一区二区三区 | 国产精品久久久午夜夜伦鲁鲁 | 亚洲男人av天堂午夜在 | 老太婆性杂交欧美肥老太 | 熟妇人妻无码xxx视频 | 精品成人av一区二区三区 | 在线观看欧美一区二区三区 | 久热国产vs视频在线观看 | 久久久久亚洲精品中文字幕 | 波多野结衣av一区二区全免费观看 | 日本乱偷人妻中文字幕 | 国产香蕉97碰碰久久人人 | 亚洲欧美精品aaaaaa片 | 亚洲乱码国产乱码精品精 | 国产黄在线观看免费观看不卡 | 亚洲成a人片在线观看无码 | 性欧美熟妇videofreesex | 窝窝午夜理论片影院 | 国产成人一区二区三区别 | 免费国产成人高清在线观看网站 | 色婷婷综合中文久久一本 | 荫蒂添的好舒服视频囗交 | 亚洲欧美日韩综合久久久 | 成人女人看片免费视频放人 | 亚洲欧洲中文日韩av乱码 | 沈阳熟女露脸对白视频 | 日韩人妻系列无码专区 | 国产成人无码一二三区视频 | 中文字幕av日韩精品一区二区 | 99久久婷婷国产综合精品青草免费 | 中文字幕中文有码在线 | 麻豆精品国产精华精华液好用吗 | 波多野结衣av一区二区全免费观看 | 奇米影视888欧美在线观看 | 午夜无码区在线观看 | 一本一道久久综合久久 | 亚洲日韩av一区二区三区四区 | 人妻天天爽夜夜爽一区二区 | 亚洲 日韩 欧美 成人 在线观看 | 亚洲国精产品一二二线 | 又粗又大又硬又长又爽 | 无码纯肉视频在线观看 | 好屌草这里只有精品 | 国产亚洲欧美在线专区 | 色综合久久网 | 国产精品久久久久久亚洲毛片 | 狂野欧美性猛xxxx乱大交 | 好男人www社区 | av无码不卡在线观看免费 | 欧美日韩精品 | 久久久久久国产精品无码下载 | 免费国产黄网站在线观看 | 精品人妻人人做人人爽夜夜爽 | 国产在线精品一区二区高清不卡 | 97精品人妻一区二区三区香蕉 | 正在播放老肥熟妇露脸 | 国产一区二区三区精品视频 | 国产午夜亚洲精品不卡 | 好屌草这里只有精品 | 国产激情艳情在线看视频 | 亚洲国产精品成人久久蜜臀 | 无码人中文字幕 | 国产成人一区二区三区在线观看 | 中文字幕av无码一区二区三区电影 | 国产精品人人妻人人爽 | 在线成人www免费观看视频 | 国产精品亚洲专区无码不卡 | 99久久久无码国产精品免费 | 久久国产精品萌白酱免费 | 综合人妻久久一区二区精品 | 国产精品手机免费 | 亚洲gv猛男gv无码男同 | 人人妻人人澡人人爽人人精品 | 99re在线播放 | 99久久精品国产一区二区蜜芽 | 色老头在线一区二区三区 | 国产一区二区三区四区五区加勒比 | 中文字幕无码av波多野吉衣 | 天天拍夜夜添久久精品 | 性做久久久久久久免费看 | 精品无人区无码乱码毛片国产 | 鲁大师影院在线观看 | 国产特级毛片aaaaaa高潮流水 | 日本大乳高潮视频在线观看 | 大地资源中文第3页 | 国产情侣作爱视频免费观看 | 国产精品久久国产三级国 | 成人av无码一区二区三区 | 
久久久久亚洲精品男人的天堂 | 中文字幕av伊人av无码av | 婷婷丁香五月天综合东京热 | 久久久亚洲欧洲日产国码αv | 欧美高清在线精品一区 | 牛和人交xxxx欧美 | 久久99热只有频精品8 | 国产精品99久久精品爆乳 | 国产精品亚洲五月天高清 | 久久 国产 尿 小便 嘘嘘 | 欧美乱妇无乱码大黄a片 | 午夜免费福利小电影 | 日产精品高潮呻吟av久久 | 成人欧美一区二区三区黑人 | 精品偷自拍另类在线观看 | 熟妇人妻无乱码中文字幕 | 鲁大师影院在线观看 | 最近的中文字幕在线看视频 | 国产成人午夜福利在线播放 | 国产av无码专区亚洲a∨毛片 | 伊人久久大香线蕉亚洲 | 国产无套粉嫩白浆在线 | 亚洲一区二区三区偷拍女厕 | 亚洲娇小与黑人巨大交 | 天干天干啦夜天干天2017 | 国产亚洲精品久久久久久国模美 | 亚洲大尺度无码无码专区 | 午夜福利一区二区三区在线观看 | 国产成人精品三级麻豆 | 欧美日韩人成综合在线播放 | 成人精品视频一区二区 | 欧美熟妇另类久久久久久多毛 | 日韩人妻无码中文字幕视频 | 免费视频欧美无人区码 | 人妻少妇精品视频专区 | 少妇久久久久久人妻无码 | 色妞www精品免费视频 | 精品无码一区二区三区的天堂 | 免费看男女做好爽好硬视频 | 狠狠色欧美亚洲狠狠色www | yw尤物av无码国产在线观看 | 国产极品视觉盛宴 | 在线精品国产一区二区三区 | 国产亚洲日韩欧美另类第八页 | 国产av一区二区三区最新精品 | 午夜成人1000部免费视频 | 少妇厨房愉情理9仑片视频 | 人人妻人人澡人人爽人人精品浪潮 | 亚洲精品一区二区三区婷婷月 | 中文字幕人妻丝袜二区 | 黑人玩弄人妻中文在线 | 国产手机在线αⅴ片无码观看 | 性生交大片免费看l | 久久午夜无码鲁丝片秋霞 | 久久视频在线观看精品 | 亚洲精品国产精品乱码不卡 | 青青草原综合久久大伊人精品 | 在线天堂新版最新版在线8 | 骚片av蜜桃精品一区 | 狂野欧美激情性xxxx | 无码av岛国片在线播放 | www国产亚洲精品久久久日本 | 人人超人人超碰超国产 | 狠狠色噜噜狠狠狠狠7777米奇 | 在线天堂新版最新版在线8 | 国产精品久久久久9999小说 | √天堂资源地址中文在线 | 亚洲中文字幕av在天堂 | 88国产精品欧美一区二区三区 | 欧美一区二区三区视频在线观看 | 精品一区二区三区波多野结衣 | 日产精品99久久久久久 | 国产亚洲精品久久久久久大师 | 国产乱码精品一品二品 | 欧美freesex黑人又粗又大 | 天堂а√在线地址中文在线 | 久久久中文字幕日本无吗 | 欧美 日韩 人妻 高清 中文 | 国产电影无码午夜在线播放 | 国产精品内射视频免费 | 综合人妻久久一区二区精品 | 人人妻人人澡人人爽人人精品 | 性色欲情网站iwww九文堂 | 九一九色国产 | 自拍偷自拍亚洲精品10p | 亚洲色欲色欲欲www在线 | 亚洲国产精品一区二区第一页 | 欧美高清在线精品一区 | 荫蒂被男人添的好舒服爽免费视频 | 亚洲码国产精品高潮在线 | 国产又爽又黄又刺激的视频 | 国内老熟妇对白xxxxhd | 亚洲日韩av片在线观看 | 撕开奶罩揉吮奶头视频 | 国产情侣作爱视频免费观看 | 人妻互换免费中文字幕 | 欧美午夜特黄aaaaaa片 | 蜜臀av在线观看 在线欧美精品一区二区三区 | 亚洲自偷自拍另类第1页 | 精品无码av一区二区三区 | 国内少妇偷人精品视频免费 | 台湾无码一区二区 | 亚洲欧洲日本无在线码 | 丝袜足控一区二区三区 | 日韩欧美群交p片內射中文 | 欧美丰满熟妇xxxx | 中文字幕av伊人av无码av | 娇妻被黑人粗大高潮白浆 | 精品日本一区二区三区在线观看 | 色综合久久久无码网中文 | 精品水蜜桃久久久久久久 | 久久久www成人免费毛片 | 久久99精品久久久久久 | 欧美大屁股xxxxhd黑色 | 伊人久久大香线焦av综合影院 | 国产午夜亚洲精品不卡 | 久久精品国产精品国产精品污 | 亚洲码国产精品高潮在线 | 亚洲精品欧美二区三区中文字幕 | 无码人妻av免费一区二区三区 | 亚洲成av人影院在线观看 | 国产一精品一av一免费 | 国产人妻精品午夜福利免费 | 天堂亚洲2017在线观看 | 性色欲情网站iwww九文堂 | 国内老熟妇对白xxxxhd | 色综合久久久无码网中文 | 天干天干啦夜天干天2017 | 老熟女重囗味hdxx69 | 国产人成高清在线视频99最全资源 | 黄网在线观看免费网站 | 天下第一社区视频www日本 | 亚洲精品国产品国语在线观看 | 精品成在人线av无码免费看 | 
少妇无码av无码专区在线观看 | 国产亚洲精品精品国产亚洲综合 | 又大又硬又爽免费视频 | 色婷婷综合中文久久一本 | 水蜜桃色314在线观看 | av小次郎收藏 | 亚洲一区二区三区 | 久久久久久亚洲精品a片成人 | 无码av免费一区二区三区试看 | 网友自拍区视频精品 | 欧洲极品少妇 | 色婷婷香蕉在线一区二区 | 水蜜桃色314在线观看 | 精品国产一区av天美传媒 | 熟女少妇人妻中文字幕 | 国产人妻人伦精品1国产丝袜 | 亚洲国产精品美女久久久久 | 麻豆md0077饥渴少妇 | 久久亚洲日韩精品一区二区三区 | 色婷婷香蕉在线一区二区 | 精品夜夜澡人妻无码av蜜桃 | 国产av一区二区精品久久凹凸 | 中文字幕av伊人av无码av | 亚洲人成无码网www | 熟女体下毛毛黑森林 | 中文字幕人妻无码一夲道 | 久久久久免费精品国产 | 精品国产麻豆免费人成网站 | 国产人妻精品一区二区三区不卡 | 国产乱人伦app精品久久 国产在线无码精品电影网 国产国产精品人在线视 | 老子影院午夜伦不卡 | 中文字幕无码免费久久9一区9 | 中文字幕无码日韩欧毛 | 国产sm调教视频在线观看 | 亚洲中文字幕乱码av波多ji | 乱人伦中文视频在线观看 | 精品无码一区二区三区的天堂 | 免费无码的av片在线观看 | 亚洲精品国产品国语在线观看 | 丝袜 中出 制服 人妻 美腿 | 国产成人无码区免费内射一片色欲 | 曰韩少妇内射免费播放 | 久久国内精品自在自线 | 久久国产精品_国产精品 | 自拍偷自拍亚洲精品被多人伦好爽 | 欧美精品无码一区二区三区 | 免费网站看v片在线18禁无码 | 图片区 小说区 区 亚洲五月 | 呦交小u女精品视频 | 领导边摸边吃奶边做爽在线观看 | 国产免费久久精品国产传媒 | 成在人线av无码免费 | 狠狠色欧美亚洲狠狠色www | 欧美放荡的少妇 | 日本熟妇人妻xxxxx人hd | 亚洲aⅴ无码成人网站国产app | 午夜精品久久久内射近拍高清 | 婷婷丁香五月天综合东京热 | 国内少妇偷人精品视频免费 | 强开小婷嫩苞又嫩又紧视频 | 日日天干夜夜狠狠爱 | 亚洲综合无码一区二区三区 | 国产精品高潮呻吟av久久 | 中文字幕人妻丝袜二区 | 欧美激情内射喷水高潮 | 成人无码视频在线观看网站 | 亚拍精品一区二区三区探花 | 国产suv精品一区二区五 | 色情久久久av熟女人妻网站 | 蜜桃臀无码内射一区二区三区 | 老子影院午夜精品无码 | 中文字幕无码人妻少妇免费 | 在线播放免费人成毛片乱码 | 日韩欧美中文字幕在线三区 | 久久人人爽人人爽人人片av高清 | 东京热无码av男人的天堂 | 国产激情无码一区二区 | 亚洲精品一区二区三区四区五区 | 熟妇人妻无乱码中文字幕 | 巨爆乳无码视频在线观看 | 国产激情无码一区二区app | 国产色视频一区二区三区 | 国产亚av手机在线观看 | 久久久久久久久888 | 秋霞成人午夜鲁丝一区二区三区 | 国产色精品久久人妻 | 国产精品亚洲专区无码不卡 | 樱花草在线播放免费中文 | 久久精品国产大片免费观看 | 乱中年女人伦av三区 | 日本一卡二卡不卡视频查询 | 亚洲成av人片在线观看无码不卡 | 又大又硬又爽免费视频 | 色诱久久久久综合网ywww | 国产亚洲tv在线观看 | 狠狠噜狠狠狠狠丁香五月 | 国产农村妇女aaaaa视频 撕开奶罩揉吮奶头视频 | 欧美国产日韩亚洲中文 | 久久久久人妻一区精品色欧美 | 国产国语老龄妇女a片 | 国产无遮挡又黄又爽免费视频 | 亚洲一区二区三区在线观看网站 | 久久久久成人精品免费播放动漫 | 亚洲日韩中文字幕在线播放 | 亚洲精品国产第一综合99久久 | 国产激情精品一区二区三区 | 日本一卡二卡不卡视频查询 | 亚洲一区二区三区无码久久 | 自拍偷自拍亚洲精品被多人伦好爽 | 日本精品久久久久中文字幕 | 国产 浪潮av性色四虎 | 99精品无人区乱码1区2区3区 | 少妇厨房愉情理9仑片视频 | 少妇人妻大乳在线视频 | 中文字幕无码免费久久99 | 波多野结衣av在线观看 | 在线 国产 欧美 亚洲 天堂 | 国产激情无码一区二区 | 日本乱偷人妻中文字幕 | 熟女少妇人妻中文字幕 | 国产一区二区不卡老阿姨 | 久久www免费人成人片 | 在线天堂新版最新版在线8 | 国产福利视频一区二区 | 亚洲一区二区三区在线观看网站 | 亚洲精品一区二区三区在线观看 | 亚洲精品久久久久久一区二区 | 色窝窝无码一区二区三区色欲 | 久久亚洲精品中文字幕无男同 | 久久精品人人做人人综合 | 性色欲网站人妻丰满中文久久不卡 | 国产农村妇女高潮大叫 | 
男人扒开女人内裤强吻桶进去 | 奇米影视7777久久精品人人爽 | 天天躁日日躁狠狠躁免费麻豆 | 亚洲综合在线一区二区三区 | 狂野欧美性猛xxxx乱大交 | 欧美老熟妇乱xxxxx | 丁香花在线影院观看在线播放 | 妺妺窝人体色www在线小说 | 国产熟女一区二区三区四区五区 | 亚洲国产精品一区二区第一页 | 亚洲国产综合无码一区 | 欧美丰满老熟妇xxxxx性 | 亚洲精品鲁一鲁一区二区三区 | 国产超碰人人爽人人做人人添 | 日日噜噜噜噜夜夜爽亚洲精品 | 国产成人亚洲综合无码 | 国产在线aaa片一区二区99 | 色一情一乱一伦 | 黑人粗大猛烈进出高潮视频 | 亚洲综合另类小说色区 | 久久午夜无码鲁丝片午夜精品 | 中文毛片无遮挡高清免费 | 人人妻人人藻人人爽欧美一区 | 熟女少妇人妻中文字幕 | 日韩精品久久久肉伦网站 | 丰满人妻一区二区三区免费视频 | 久久久国产精品无码免费专区 | 成人无码精品一区二区三区 | 国产精品99爱免费视频 | 国产精品高潮呻吟av久久4虎 | 亚洲性无码av中文字幕 | 久久综合九色综合欧美狠狠 | 久久久无码中文字幕久... | 真人与拘做受免费视频 | 亚洲国精产品一二二线 | 玩弄人妻少妇500系列视频 | 久久久无码中文字幕久... | 中文字幕人成乱码熟女app | 国产在线一区二区三区四区五区 | 欧美真人作爱免费视频 | 欧美性生交活xxxxxdddd | 男女作爱免费网站 | 国产精品欧美成人 | 高清国产亚洲精品自在久久 | 国产精品高潮呻吟av久久4虎 | 漂亮人妻洗澡被公强 日日躁 | 香港三级日本三级妇三级 | 久久精品国产99精品亚洲 | 亚洲国产精品一区二区美利坚 | 天堂亚洲2017在线观看 | 久久精品女人的天堂av | 无码中文字幕色专区 | 国产三级精品三级男人的天堂 | 久久精品人妻少妇一区二区三区 | 日本丰满熟妇videos | 又黄又爽又色的视频 | 午夜精品一区二区三区在线观看 | 999久久久国产精品消防器材 | 欧美色就是色 | 日本一区二区三区免费播放 | 又紧又大又爽精品一区二区 | 亚洲天堂2017无码 | 久久久久久久女国产乱让韩 | 色情久久久av熟女人妻网站 | 国产成人精品优优av | 在线播放无码字幕亚洲 | 亚洲成色在线综合网站 | 久久综合九色综合欧美狠狠 | 欧美精品一区二区精品久久 | 国内少妇偷人精品视频免费 | 婷婷综合久久中文字幕蜜桃三电影 | 午夜肉伦伦影院 | 青青青手机频在线观看 |