Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally developed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than convolutional baselines on specific tasks.
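To make the patch-sequence idea concrete, below is a minimal sketch of ViT-style patch embedding in PyTorch. It is not taken from any of the papers listed here; the hyperparameters (224x224 input, 16x16 patches, 768-dimensional embeddings) follow the common ViT-Base convention, and the `PatchEmbedding` class name is illustrative.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches, linearly embed them, prepend a class
    token, and add learned positional embeddings (a sketch, assuming the
    standard ViT-Base configuration)."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution performs patch extraction and linear
        # projection in one step, equivalent to flattening each patch
        # and applying a shared linear layer.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):
        B = x.shape[0]
        x = self.proj(x)                  # (B, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1)    # prepend the class token
        return x + self.pos_embed         # add positional information

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768]): 14*14 patches + 1 class token
```

The resulting token sequence is what the transformer encoder consumes, and it is also the object that efficiency techniques mentioned above, such as token pruning, operate on by dropping uninformative patch tokens.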
Papers
UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement
Yingtie Lei, Jia Yu, Yihang Dong, Changwei Gong, Ziyang Zhou, Chi-Man Pun
Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers
Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
A Novel Approach to Classify Power Quality Signals Using Vision Transformers
Ahmad Mohammad Saber, Alaa Selim, Mohamed M. Hammad, Amr Youssef, Deepa Kundur, Ehab El-Saadany
Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase
Yicong Li, Xing Guo, Haohua Du
Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning
Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, Simone Scardapane
Privacy-Preserving Vision Transformer Using Images Encrypted with Restricted Random Permutation Matrices
Kouki Horio, Kiyoshi Nishikawa, Hitoshi Kiya
Advanced Vision Transformers and Open-Set Learning for Robust Mosquito Classification: A Novel Approach to Entomological Studies
Ahmed Akib Jawad Karim, Muhammad Zawad Mahmud, Riasat Khan
Optimizing Vision Transformers with Data-Free Knowledge Transfer
Gousia Habib, Damandeep Singh, Ishfaq Ahmad Malik, Brejesh Lall