Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally developed for natural language processing, to image analysis by treating an image as a sequence of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks on certain tasks.
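To make the patch-as-token idea concrete, the following minimal PyTorch sketch shows a ViT-style classifier: a strided convolution splits the image into non-overlapping patches and embeds each as a token, a learnable [CLS] token and positional embeddings are added, and a standard transformer encoder processes the sequence. All names and hyperparameters here (TinyViT, patch_size=16, embed_dim=192, depth=4) are illustrative assumptions for this sketch, not drawn from any paper listed below.

import torch
import torch.nn as nn

class TinyViT(nn.Module):
    # Illustrative ViT sketch: patchify -> embed -> transformer -> classify.
    def __init__(self, image_size=224, patch_size=16, in_chans=3,
                 embed_dim=192, depth=4, num_heads=3, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch extraction and linear projection in one step: a convolution
        # whose kernel and stride both equal the patch size is equivalent to
        # flattening non-overlapping patches and applying a shared linear map.
        self.patch_embed = nn.Conv2d(in_chans, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                            # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                 # (B, D, H/P, W/P)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, D) patch sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])               # classify from [CLS]

logits = TinyViT()(torch.randn(2, 3, 224, 224))      # -> shape (2, 10)

This sequence view is also what makes efficiency techniques such as token pruning straightforward to express: since patches are just tokens, low-importance ones can be dropped mid-network without changing the architecture.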
Papers
Trading through Earnings Seasons using Self-Supervised Contrastive Representation Learning
Zhengxin Joseph Ye, Bjoern Schuller
Classification of Gleason Grading in Prostate Cancer Histopathology Images Using Deep Learning Techniques: YOLO, Vision Transformers, and Vision Mamba
Amin Malekmohammadi, Ali Badiezadeh, Seyed Mostafa Mirhassani, Parisa Gifani, Majid Vafaeezadeh
Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation
Richard D. Paul, Alessio Quercia, Vincent Fortuin, Katharina Nöh, Hanno Scharr
Going Beyond U-Net: Assessing Vision Transformers for Semantic Segmentation in Microscopy Image Analysis
Illia Tsiporenko, Pavel Chizhov, Dmytro Fishman
HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space
Jacob Fein-Ashley, Ethan Feng, Minh Pham
A Novel Framework for the Automated Characterization of Gram-Stained Blood Culture Slides Using a Large-Scale Vision Transformer
Jack McMahon, Naofumi Tomita, Elizabeth S. Tatishev, Adrienne A. Workman, Cristina R. Costales, Niaz Banaei, Isabella W. Martin, Saeed Hassanpour
HydroVision: LiDAR-Guided Hydrometric Prediction with Vision Transformers and Hybrid Graph Learning
Naghmeh Shafiee Roudbari, Ursula Eicker, Charalambos Poullis, Zachary Patterson
ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
Shihua Sun, Kenechukwu Nwodo, Shridatt Sugrim, Angelos Stavrou, Haining Wang
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, Vardan Papyan
Boosting Federated Domain Generalization: The Role of Advanced Pre-Trained Architectures
Avi Deb Raha, Apurba Adhikary, Mrityunjoy Gain, Yu Qiao, Choong Seon Hong
DS2TA: Denoising Spiking Transformer with Attenuated Spatiotemporal Attention
Boxun Xu, Hejia Geng, Yuxuan Yin, Peng Li
Unsupervised Feature Orthogonalization for Learning Distortion-Invariant Representations
Sebastian Doerrich, Francesco Di Salvo, Christian Ledig
On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery
BW Sheffield, Jeffrey Ellen, Ben Whitmore
NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis
Romeo Lanzino, Federico Fontana, Luigi Cinque, Francesco Scarcello, Atsuto Maki