Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally developed for natural language processing, to image analysis by treating an image as a sequence of fixed-size patches. Current research focuses on improving ViT efficiency and robustness through techniques such as token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures (e.g., Mamba). These advances are driving progress in applications such as medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks on specific tasks.
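The patch-as-token idea above can be sketched in a few lines: split the image into non-overlapping patches, flatten each, and project it with a learned linear map to form the token sequence the transformer consumes. This is a minimal NumPy illustration, not any specific paper's implementation; the patch size (16), image size (224), and embedding width (384) are assumed values chosen for the example.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an image of shape (H, W, C) into non-overlapping flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    n_h, n_w = H // patch_size, W // patch_size
    return (image
            .reshape(n_h, patch_size, n_w, patch_size, C)
            .transpose(0, 2, 1, 3, 4)               # group pixels by patch
            .reshape(n_h * n_w, patch_size * patch_size * C))

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))          # a 224x224 RGB "image"
patches = patchify(image, patch_size=16)            # shape (196, 768)

# Linear patch embedding: in a real ViT this matrix is learned; here it is
# random, just to show the shape transformation into a token sequence.
W_embed = rng.standard_normal((16 * 16 * 3, 384)) * 0.02
tokens = patches @ W_embed                          # shape (196, 384)
```

A full ViT would then prepend a class token, add positional embeddings, and feed `tokens` through standard transformer encoder blocks; the sketch stops at tokenization, which is the step that distinguishes ViTs from convolutional pipelines.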
Papers
PriViT: Vision Transformers for Fast Private Inference
Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning
Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou
TiC: Exploring Vision Transformer in Convolution
Song Zhang, Qingzhong Wang, Jiang Bian, Haoyi Xiong
Sub-token ViT Embedding via Stochastic Resonance Transformers
Dong Lao, Yangchao Wu, Tian Yu Liu, Alex Wong, Stefano Soatto
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
Victor Vadakechirayath George
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Hamid Mohammadi, Ehsan Nazerfard, Tahereh Firoozi
ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer
Seok-Yong Byun, Wonju Lee
Improving Drumming Robot Via Attention Transformer Network
Yang Yi, Zonghan Li
SlowFormer: Universal Adversarial Patch for Attack on Compute and Energy Efficiency of Inference Efficient Vision Transformers
KL Navaneet, Soroush Abbasi Koohpayegani, Essam Sleiman, Hamed Pirsiavash
MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images
Huyen Tran, Duc Thanh Nguyen, John Yearwood
Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression
Gousia Habib, Tausifa Jan Saleem, Brejesh Lall
Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation
Jingliang Deng, Zonghan Li