Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks on specific tasks.
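To make the "images as sequences of patches" idea concrete, here is a minimal sketch in Python with NumPy. It only shows the patch-splitting step that turns an image into a token sequence. The function name `image_to_patches`, the 224x224 RGB input, and the 16-pixel patch size are illustrative assumptions, not details taken from any of the papers listed here; a full ViT would also apply a learned linear projection and add position embeddings to these tokens.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Expose the patch grid: (rows, P, cols, P, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two grid axes together: (rows, cols, P, P, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one token vector: (rows * cols, P * P * C)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: a 224x224 RGB image filled with placeholder values.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (196, 768): a 14 x 14 grid of patches, each 16 * 16 * 3 values
```

Under these assumptions the transformer sees 196 tokens of 768 values each. Because self-attention cost grows with the square of the token count, this is also why techniques such as token pruning aim to reduce the number of tokens.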
Papers
Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance
Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Nikolai Matni, Vijay Kumar
Quantum Vision Transformers for Quark-Gluon Classification
Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu
A Comprehensive Evaluation of Histopathology Foundation Models for Ovarian Cancer Subtype Classification
Jack Breen, Katie Allen, Kieran Zucker, Lucy Godson, Nicolas M. Orsi, Nishant Ravikumar
Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study
Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan
A Timely Survey on Vision Transformer for Deepfake Detection
Zhikan Wang, Zhongyao Cheng, Jiajie Xiong, Xun Xu, Tianrui Li, Bharadwaj Veeravalli, Xulei Yang
Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer
Whenty Ariyanti, Kai-Chun Liu, Kuan-Yu Chen, Yu Tsao
UnSegGNet: Unsupervised Image Segmentation using Graph Neural Networks
Kovvuri Sai Gopal Reddy, Bodduluri Saran, A. Mudit Adityaja, Saurabh J. Shigwan, Nitin Kumar
TransAnaNet: Transformer-based Anatomy Change Prediction Network for Head and Neck Cancer Patient Radiotherapy
Meixu Chen, Kai Wang, Michael Dohopolski, Howard Morgan, David Sher, Jing Wang
LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets
Ojasw Upadhyay
Brighteye: Glaucoma Screening with Color Fundus Photographs based on Vision Transformer
Hui Lin, Charilaos Apostolidis, Aggelos K. Katsaggelos
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du, Gu Gong, Xiaowen Chu