Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, offering better accuracy and efficiency than purely convolutional approaches on certain tasks.
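The patchification step described above can be sketched in a few lines: the image is split into non-overlapping patches, and each patch is flattened into a vector that becomes one token of the input sequence (in a real ViT, each vector would then be linearly projected and given a positional embedding). This is a minimal illustration, not any specific paper's implementation; the function name and pure-Python representation are choices made here for clarity.

```python
def image_to_patches(image, patch_size):
    """Split an H x W single-channel image (a list of rows) into
    flattened, non-overlapping patch vectors -- the token sequence
    a ViT consumes before linear projection."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # flatten the patch row-by-row into a single vector
            patch = [image[top + i][left + j]
                     for i in range(patch_size)
                     for j in range(patch_size)]
            patches.append(patch)
    return patches

# A 4x4 "image" split into 2x2 patches yields 4 tokens of length 4.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(img, 2)
```

For a 224x224 image with 16x16 patches, the same procedure produces the familiar sequence of 196 tokens; token-pruning methods then drop the least informative of these tokens to reduce the quadratic cost of attention.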
Papers
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco Locatello
ViT-LCA: A Neuromorphic Approach for Vision Transformers
Sanaz Mahmoodi Takaghaj
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang, Baoxin Li, Jae-sun Seo, Yu Cao
NMformer: A Transformer for Noisy Modulation Classification in Wireless Communication
Atik Faysal, Mohammad Rostami, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET
Yitong Li, Morteza Ghahremani, Youssef Wally, Christian Wachinger
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
Wei Dong, Yuan Sun, Yiting Yang, Xing Zhang, Zhijun Lin, Qingsen Yan, Haokui Zhang, Peng Wang, Yang Yang, Hengtao Shen
FilterViT and DropoutViT: Lightweight Vision Transformer Models for Efficient Attention Mechanisms
Bohang Sun (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China)
Backdoor Attack Against Vision Transformers via Attention Gradient-Based Image Erosion
Ji Guo, Hongwei Li, Wenbo Jiang, Guoming Lu
MAPUNetR: A Hybrid Vision Transformer and U-Net Architecture for Efficient and Interpretable Medical Image Segmentation
Ovais Iqbal Shah, Danish Raza Rizvi, Aqib Nazir Mir
FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection
Dat Nguyen, Marcella Astrid, Enjie Ghorbel, Djamila Aouada