Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViT-based models can offer better accuracy and efficiency than traditional convolutional neural networks on specific tasks.
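The "images as sequences of patches" step can be sketched as follows: the image is split into non-overlapping patches, each patch is flattened, and a linear projection maps it to a token embedding. This is a minimal NumPy sketch of that tokenization, not any specific paper's implementation; the function name, random projection weights, and dimensions are illustrative assumptions (a real ViT learns the projection and adds positional embeddings and a class token).

```python
import numpy as np

def image_to_patch_tokens(image, patch_size, embed_dim, rng=None):
    """Split an image (H, W, C) into non-overlapping patches and linearly
    project each flattened patch to an embedding, as in ViT tokenization.
    Returns an array of shape (num_patches, embed_dim).
    Hypothetical helper for illustration; weights are random, not learned."""
    H, W, C = image.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "image dims must be divisible by patch size"
    # Rearrange (H, W, C) -> (H/P, W/P, P, P, C) -> (num_patches, P*P*C)
    patches = image.reshape(H // P, P, W // P, P, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)
    rng = rng or np.random.default_rng(0)
    # A real model learns this projection; random weights here show the shapes
    W_embed = rng.standard_normal((P * P * C, embed_dim)) * 0.02
    return patches @ W_embed

# A 224x224 RGB image with 16x16 patches yields 14*14 = 196 tokens
tokens = image_to_patch_tokens(np.zeros((224, 224, 3)), 16, 768)
print(tokens.shape)  # (196, 768)
```

The resulting token sequence is then fed to a standard transformer encoder; techniques like token pruning (several papers below) operate on exactly this sequence, dropping or merging tokens to reduce the quadratic attention cost.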
Papers
S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks
Neha A S, Vivek Chaturvedi, Muhammad Shafique
Exploring The Neural Burden In Pruned Models: An Insight Inspired By Neuroscience
Zeyu Wang, Weichen Dai, Xiangyu Zhou, Ji Qi, Yi Zhou
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Prateek Jain, Volker Tresp, Sujoy Paul
CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference
Mohammad Erfan Sadeghi, Arash Fayyazi, Suhas Somashekar, Massoud Pedram
Global-Local Similarity for Efficient Fine-Grained Image Recognition with Vision Transformers
Edwin Arkel Rios, Min-Chun Hu, Bo-Cheng Lai
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng, Zeyi Xu, Fanghui Xue, Biao Yang, Jiancheng Lyu, Shuai Zhang, Yingyong Qi, Jack Xin
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
Nir Barel, Ron Shapira Weber, Nir Mualem, Shahaf E. Finder, Oren Freifeld
Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification
Naif Alkhunaizi, Faris Almalik, Rouqaiah Al-Refai, Muzammal Naseer, Karthik Nandakumar
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh, Jan Kautz
Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification
Mei Qiu, Lauren Christopher, Lingxi Li
HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification
Omar S. EL-Assiouti, Ghada Hamed, Dina Khattab, Hala M. Ebied
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation
Szymon Płotka, Maciej Chrabaszcz, Przemyslaw Biecek