Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications such as medical image analysis, object detection, and spatiotemporal prediction, where ViT-based models can offer better accuracy or efficiency than traditional convolutional neural networks on specific tasks.
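To make the "image as a sequence of patches" idea concrete, here is a minimal NumPy sketch of the patch-splitting step only. The function name `image_to_patches`, the 224x224 RGB input, and the patch size of 16 are illustrative assumptions, not details taken from any paper listed here. A real ViT would then apply a learned linear projection and add position embeddings, which this sketch omits.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: a 224x224 RGB image split into 16x16 patches.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values
```

Each row of `tokens` plays the role that a word embedding input plays in a language model. Token pruning methods, in general terms, reduce the number of such rows that later layers have to process.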
Papers
AMBER -- Advanced SegFormer for Multi-Band Image Segmentation: an application to Hyperspectral Imaging
Andrea Dosi, Massimo Brescia, Stefano Cavuoti, Mariarca D'Aniello, Michele Delli Veneri, Carlo Donadio, Adriano Ettari, Giuseppe Longo, Alvi Rownok, Luca Sannino, Maria Zampella
SEA-ViT: Sea Surface Currents Forecasting Using Vision Transformer and GRU-Based Spatio-Temporal Covariance Modeling
Teerapong Panboonyuen
Investigation of Hierarchical Spectral Vision Transformer Architecture for Classification of Hyperspectral Imagery
Wei Liu, Saurabh Prasad, Melba Crawford
HTR-VT: Handwritten Text Recognition with Vision Transformer
Yuting Li, Dexiong Chen, Tinglong Tang, Xi Shen
VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
Hanning Chen, Yang Ni, Wenjun Huang, Yezi Liu, SungHeon Jeong, Fei Wen, Nathaniel Bastian, Hugo Latapie, Mohsen Imani
VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation
Ezra MacDonald, Derek Jacoby, Yvonne Coady
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning
Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang
Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers
Gorka Abad, Stjepan Picek, Lorenzo Cavallaro, Aitor Urbieta
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu
Deep Transfer Learning for Breast Cancer Classification
Prudence Djagba, J. K. Buwa Mbouobda
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
Ibtissam Saadi, Douglas W. Cunningham, Taleb-ahmed Abdelmalik, Abdenour Hadid, Yassin El Hillali
TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation
Shahzaib Iqbal, Tariq M. Khan, Syed S. Naqvi, Asim Naveed, Erik Meijering