Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks.
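The "images as sequences of patches" idea above can be sketched in plain NumPy: an image is cut into non-overlapping patches, and each patch is flattened into one token vector that a transformer then attends over. The function name `patchify` is an illustrative choice, and the 16-pixel patch size follows the common ViT-Base configuration rather than any particular paper listed below.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an image of shape (H, W, C) into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C):
    the token sequence a Vision Transformer consumes (before the learned
    linear projection and position embeddings are applied).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    ph, pw = h // patch_size, w // patch_size
    # Carve the grid of patches out of the image, then flatten each patch.
    patches = image.reshape(ph, patch_size, pw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (ph, pw, patch, patch, C)
    return patches.reshape(ph * pw, patch_size * patch_size * c)

# A 224x224 RGB image yields 196 tokens of dimension 768 -- the standard
# ViT-Base input sequence.
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
tokens = patchify(img)
print(tokens.shape)  # (196, 768)
```

Token-pruning methods such as the dynamic token idling paper below operate on exactly this sequence, dropping or idling uninformative tokens to cut the quadratic attention cost.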
Papers
Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation
Wangyu Wu, Tianhong Dai, Xiaowei Huang, Fei Ma, Jimin Xiao
MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection
David C. Jeong, Tianma Shen, Hongji Liu, Raghav Kapoor, Casey Nguyen, Song Liu, Christopher A. Kitts
Tackling Heterogeneity in Medical Federated learning via Vision Transformers
Erfan Darzi, Yiqing Shen, Yangming Ou, Nanna M. Sijtsema, P. M. A van Ooijen
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut
Vision Transformers increase efficiency of 3D cardiac CT multi-label segmentation
Lee Jollans, Mariana Bustamante, Lilian Henriksson, Anders Persson, Tino Ebbers
3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers
Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou
Accelerating Vision Transformers Based on Heterogeneous Attention Patterns
Deli Yu, Teng Xi, Jianwei Li, Baopu Li, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain
Md Sohag Mia, Abu Bakor Hayat Arnob, Abdu Naim, Abdullah Al Bary Voban, Md Shariful Islam
No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling
Xuwei Xu, Changlin Li, Yudong Chen, Xiaojun Chang, Jiajun Liu, Sen Wang
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu, Sen Wang, Yudong Chen, Jiajun Liu
A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers
Matteo Bastico, David Ryckelynck, Laurent Corté, Yannick Tillier, Etienne Decencière
RetSeg: Retention-based Colorectal Polyps Segmentation Network
Khaled ELKarazle, Valliappan Raman, Caslon Chua, Patrick Then
Hierarchical Side-Tuning for Vision Transformers
Weifeng Lin, Ziheng Wu, Wentao Yang, Mingxin Huang, Jun Huang, Lianwen Jin