Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, initially designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques like token pruning, attention engineering, and hybrid models combining ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advancements are driving progress in various applications, including medical image analysis, object detection, and spatiotemporal prediction, by offering improved accuracy and efficiency compared to traditional convolutional neural networks in specific tasks.
Papers
Automatic Cardiac Pathology Recognition in Echocardiography Images Using Higher Order Dynamic Mode Decomposition and a Vision Transformer for Small Datasets
Andrés Bell-Navas, Nourelhouda Groun, María Villalba-Orero, Enrique Lara-Pezzi, Jesús Garicano-Mena, Soledad Le Clainche
CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation
Weiquan Huang, Yifei Shen, Yifan Yang
C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks
Sairam VC Rebbapragada, Pranoy Panda, Vineeth N Balasubramanian
Deep Learning for Melt Pool Depth Contour Prediction From Surface Thermal Images via Vision Transformers
Francis Ogoke, Peter Myung-Won Pak, Alexander Myers, Guadalupe Quirarte, Jack Beuth, Jonathan Malen, Amir Barati Farimani
Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting
Reza Akbarian Bafghi, Nidhin Harilal, Claire Monteleoni, Maziar Raissi
Binarizing Documents by Leveraging both Space and Frequency
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara
Efficient Transformer Encoders for Mask2Former-style models
Manyi Yao, Abhishek Aich, Yumin Suh, Amit Roy-Chowdhury, Christian Shelton, Manmohan Chandraker
Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification
Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
Kyunghwan Shim, Jaewoong Yun, Shinkook Choi