A record of papers that modify Swin Transformer (focusing only on how Swin is used)
A Novel Transformer based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images (semantic segmentation task)
Self-Supervised Learning with Swin Transformers (model abbreviated MoBY; uses contrastive learning)
Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (medical image semantic segmentation)
Rethinking Training from Scratch for Object Detection (could not follow this one)
Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight
DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation (medical image semantic segmentation)
Long-Short Temporal Contrastive Learning of Video Transformers
Video Swin Transformer
PVTv2: Improved Baselines with Pyramid Vision Transformer
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
What Makes for Hierarchical Vision Transformer?
CycleMLP: A MLP-like Architecture for Dense Prediction
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
ConvNets vs. Transformers: Whose Visual Representations are More Transferable?
Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets. However, although Transformer-based backbones have achieved much progress on ImageNet classification, it is still unclear whether the learned representations are as transferable as or even more transferable than ConvNets’ features. To address this point, we systematically investigate the transfer learning ability of ConvNets and vision transformers in 15 single-task and multi-task performance evaluations. Given the strong correlation between the performance of pretrained models and transfer learning, we include 2 residual ConvNets (i.e., R-101×3 and R-152×4) and 3 Transformer based visual backbones (i.e., ViT-B, ViT-L and Swin-B), which have close error rates on ImageNet, that indicate similar transfer learning performance on downstream datasets. We observe consistent advantages of Transformer-based backbones on 13 downstream tasks (out of 15), including but not limited to fine-grained classification, scene recognition (classification, segmentation and depth estimation), open-domain classification, face recognition, etc. More specifically, we find that two ViT models heavily rely on whole network fine-tuning to achieve performance gains while Swin Transformer does not have such a requirement. Moreover, vision transformers behave more robustly in multi-task learning, i.e., bringing more improvements when managing mutually beneficial tasks and reducing performance losses when tackling irrelevant tasks. We hope our discoveries can facilitate the exploration and exploitation of vision transformers in the future.
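A point worth unpacking from this abstract: the two ViT models only transfer well with whole-network fine-tuning, while Swin-B transfers well even without it. Below is a minimal sketch of the two transfer regimes being compared (linear probing vs. full fine-tuning), using a Swin-B checkpoint from the timm library; the model name, learning rates, and 10-class head are illustrative assumptions, not the paper's actual protocol:

```python
import torch
import timm

# Hypothetical downstream task with 10 classes; the checkpoint name is a
# standard timm Swin-B, not necessarily the exact one used in the paper.
model = timm.create_model('swin_base_patch4_window7_224',
                          pretrained=True, num_classes=10)

# Regime 1: linear probing -- freeze the backbone, train only the new head.
# The abstract reports Swin transfers well even under this cheap regime.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('head')
probe_params = [p for p in model.parameters() if p.requires_grad]
probe_opt = torch.optim.AdamW(probe_params, lr=1e-3)

# Regime 2: full fine-tuning -- update every weight, typically at a smaller
# learning rate. The abstract says ViT-B/ViT-L rely on this to gain.
for param in model.parameters():
    param.requires_grad = True
ft_opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One illustrative training step; random tensors stand in for real data.
x = torch.randn(2, 3, 224, 224)
y = torch.randint(0, 10, (2,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
ft_opt.step()
```

The practical upshot the paper claims is that the cheap first regime is already competitive for Swin, whereas the ViT models need the far more expensive second one.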
SwinIR: Image Restoration Using Swin Transformer (the key point is the residual connections)
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer
3rd Place Scheme on Instance Segmentation Track of ICCV 2021 VIPriors Challenges
ViDT: An Efficient and Effective Fully Transformer-based Object Detector
Satellite Image Semantic Segmentation (manuscript)
COVID-19 Detection in Chest X-ray Images Using Swin Transformer and Transformer in Transformer
HRFormer: High-Resolution Transformer for Dense Prediction
Vis-TOP: Visual Transformer Overlay Processor
Hepatic vessel segmentation based on 3D swin-transformer with inductive biased multi-head self-attention
Transformer-based Image Compression
Swin Transformer V2: Scaling Up Capacity and Resolution
Vision Transformer with Deformable Attention
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
Swin-Pose: Swin Transformer Based Human Pose Estimation
Summary
Across the papers above, Swin Transformer is reused mainly as a hierarchical backbone, with task-specific modifications layered on top of its shifted-window attention: semantic segmentation (remote sensing, medical, satellite), object detection, image restoration and compression, video understanding, pose estimation, and self-supervised pretraining.