Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally designed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications such as medical image analysis, object detection, and spatiotemporal prediction, where ViT-based models can offer better accuracy and efficiency than purely convolutional networks on specific tasks.
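To make the "images as sequences of patches" idea concrete, the sketch below shows a minimal ViT-style classifier in PyTorch. It is illustrative only: the module name TinyViT and its hyperparameters are hypothetical, and real ViTs differ in details such as normalization placement and training recipe.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal sketch of a Vision Transformer (hypothetical, for illustration)."""

    def __init__(self, image_size=224, patch_size=16, dim=192, depth=4, heads=3, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: a strided convolution cuts the image into
        # non-overlapping patches and projects each patch to `dim` features.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.patch_embed(x)              # (B, dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)     # (B, num_patches, dim): the sequence of patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)                  # standard transformer encoder over the patch sequence
        return self.head(x[:, 0])            # classify from the [CLS] token

if __name__ == "__main__":
    logits = TinyViT()(torch.randn(2, 3, 224, 224))
    print(logits.shape)                      # torch.Size([2, 1000])
```

The key contrast with convolutional networks is that, after the patch-embedding step, all interactions between image regions happen through self-attention over the token sequence rather than through local convolutions.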
1,550 papers
Papers
September 20, 2024
ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Boosting Federated Domain Generalization: The Role of Advanced Pre-Trained Architectures
DS2TA: Denoising Spiking Transformer with Attenuated Spatiotemporal Attention