Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, initially designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques like token pruning, attention engineering, and hybrid models combining ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advancements are driving progress in various applications, including medical image analysis, object detection, and spatiotemporal prediction, by offering improved accuracy and efficiency compared to traditional convolutional neural networks in specific tasks.
Papers
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu, Kaikai Pan, Yanjiao Chen, Jiangyi Deng, Shengyuan Pang, Wenyuan Xu
UNetVL: Enhancing 3D Medical Image Segmentation with Chebyshev KAN Powered Vision-LSTM
Xuhui Guo, Tanmoy Dam, Rohan Dhamdhere, Gourav Modanwal, Anant Madabhushi
A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation
Toomas Tahves, Junyi Gu, Mauro Bellone, Raivo Sell
GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation
Niloufar Eghbali, Hassan Bagher-Ebadian, Tuka Alhanai, Mohammad M. Ghassemi
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models
Andrés Villa, Juan León Alcázar, Motasem Alfarra, Vladimir Araujo, Alvaro Soto, Bernard Ghanem
SAFER: Sharpness Aware layer-selective Finetuning for Enhanced Robustness in vision transformers
Bhavna Gopal, Huanrui Yang, Mark Horton, Yiran Chen
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer
Ziyang Chen, Yongjun Zhang, Wenting Li, Bingshu Wang, Yabo Wu, Yong Zhao, C.L. Philip Chen
MATEY: multiscale adaptive foundation models for spatiotemporal physical systems
Pei Zhang, M. Paul Laiu, Matthew Norman, Doug Stefanski, John Gounley
Segmentation of Muscularis Propria in Colon Histopathology Images Using Vision Transformers for Hirschsprung's Disease
Youssef Megahed, Anthony Fuller, Saleh Abou-Alwan, Dina El Demellawy, Adrian D. C. Chan