Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally developed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks on certain tasks.
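To illustrate the "images as sequences of patches" idea, here is a minimal sketch of the ViT patch-embedding step using NumPy. The patch size (16) and flattened patch dimension (768) follow the standard ViT-Base configuration for 224x224 RGB images; the 384-dimensional projection is an illustrative choice, not part of any specific paper above.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into non-overlapping flattened patches."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    n_h, n_w = H // patch_size, W // patch_size
    # (n_h, P, n_w, P, C) -> (n_h, n_w, P, P, C) -> (n_h * n_w, P * P * C)
    patches = image.reshape(n_h, patch_size, n_w, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(n_h * n_w, -1)
    return patches

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))

# A 224x224 RGB image with 16x16 patches yields 196 tokens of dim 768.
tokens = patchify(img, 16)
print(tokens.shape)  # (196, 768)

# Linear projection to an embedding dimension (here 384, an arbitrary
# example) gives the token sequence fed to the transformer encoder.
W_embed = rng.random((768, 384))
embeddings = tokens @ W_embed
print(embeddings.shape)  # (196, 384)
```

The resulting sequence of embedded patches, plus a class token and positional embeddings in the full architecture, is what the transformer's self-attention layers operate on.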
Papers
Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection
Reza Azad, Amirhossein Kazerouni, Babak Azad, Ehsan Khodapanah Aghdam, Yury Velichko, Ulas Bagci, Dorit Merhof
Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation
Ramtin Mojtahedi, Mohammad Hamghalam, Richard K. G. Do, Amber L. Simpson
Learning Diverse Features in Vision Transformers for Improved Generalization
Armand Mihai Nicolicioiu, Andrei Liviu Nicolicioiu, Bogdan Alexe, Damien Teney
Emergence of Segmentation with Minimalistic White-Box Transformers
Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma
Fixating on Attention: Integrating Human Eye Tracking into Vision Transformers
Sharath Koorathota, Nikolas Papadopoulos, Jia Li Ma, Shruti Kumar, Xiaoxiao Sun, Arunesh Mittal, Patrick Adelman, Paul Sajda
Unified Single-Stage Transformer Network for Efficient RGB-T Tracking
Jianqiang Xia, DianXi Shi, Ke Song, Linna Song, XiaoLei Wang, Songchang Jin, Li Zhou, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao
ACC-UNet: A Completely Convolutional UNet model for the 2020s
Nabil Ibtehaz, Daisuke Kihara
Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers
Mohammad Javad Rajabi, Morteza Mirzai, Ahmad Nickabadi
Linear Oscillation: A Novel Activation Function for Vision Transformer
Juyoung Yun
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson, Yin Li, Mohit Gupta
A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation
Jan-Aike Termöhlen, Timo Bartels, Tim Fingscheidt
FrFT based estimation of linear and nonlinear impairments using Vision Transformer
Ting Jiang, Zheng Gao, Yizhao Chen, Zihe Hu, Ming Tang
SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation
Lixiong Qin, Mei Wang, Chao Deng, Ke Wang, Xi Chen, Jiani Hu, Weihong Deng
Exemplar-Free Continual Transformer with Convolutions
Anurag Roy, Vinay Kumar Verma, Sravan Voonna, Kripabandhu Ghosh, Saptarshi Ghosh, Abir Das