Vision Transformer
Vision Transformers (ViTs) adapt the transformer architecture, originally developed for natural language processing, to image analysis by treating images as sequences of patches. Current research focuses on improving ViT efficiency and robustness through token pruning, attention engineering, and hybrid models that combine ViTs with convolutional neural networks or other architectures such as state-space models (e.g., Mamba). These advances are driving progress in applications including medical image analysis, object detection, and spatiotemporal prediction, where ViTs can offer better accuracy and efficiency than traditional convolutional neural networks.
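To make the patch-sequence idea concrete, here is a minimal PyTorch sketch of the patch-embedding step, assuming ViT-Base-style dimensions (16×16 patches, 768-dim embeddings). It is an illustration only; real implementations additionally prepend a class token, add positional embeddings, and stack transformer encoder blocks on top.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to an embedding.

    Minimal sketch of the ViT input stage, not a full model.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is the standard trick: one kernel application
        # per non-overlapping patch performs the flatten-and-project step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (batch, channels, height, width)
        x = self.proj(x)                  # (batch, embed_dim, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)
        return x

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

Token pruning, one of the efficiency techniques mentioned above, then drops low-importance tokens from this sequence to cut attention cost. The importance score used below (token norm) is a toy stand-in for the learned or attention-derived criteria used in the literature.

```python
# Toy token pruning: keep the k tokens with the highest "importance" scores.
scores = tokens.norm(dim=-1)             # (1, 196) proxy importance per token
keep = scores.topk(k=98, dim=1).indices  # keep half the tokens
pruned = tokens.gather(1, keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
print(pruned.shape)  # torch.Size([1, 98, 768])
```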
Papers
RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone
Mustafa Munir, Md Mostafijur Rahman, Radu Marculescu
Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification
Yucong Meng, Zhiwei Yang, Yonghong Shi, Zhijian Song
One Pixel is All I Need
Deng Siqin, Zhou Xiaoyi
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Dong Hoon Lee, Seunghoon Hong
ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
Taewhan Kim, Hojin Bae, Zeming Li, Xiaoqi Li, Iaroslav Ponomarenko, Ruihai Wu, Hao Dong
Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction
Edvard Ghukasyan, Hrant Khachatrian, Rafayel Mkrtchyan, Theofanis P. Raptis
Selective Visual Prompting in Vision Mamba
Yifeng Yao, Zichen Liu, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou
Sensing for Space Safety and Sustainability: A Deep Learning Approach with Vision Transformers
Wenxuan Zhang, Peng Hu
Static Key Attention in Vision
Zizhao Hu, Xiaolin Zhou, Mohammad Rostami
Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers
Johanna Vielhaben, Dilyara Bareeva, Jim Berend, Wojciech Samek, Nils Strodthoff
An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers
Xueluan Gong, Bowei Tian, Meng Xue, Yuan Wu, Yanjiao Chen, Qian Wang
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Hyesu Lim, Jinho Choi, Jaegul Choo, Steffen Schneider
Power Plant Detection for Energy Estimation using GIS with Remote Sensing, CNN & Vision Transformers
Blessing Austin-Gabriel, Cristian Noriega Monsalve, Aparna S. Varde
Slicing Vision Transformer for Flexible Inference
Yitian Zhang, Huseyin Coskun, Xu Ma, Huan Wang, Ke Ma, Xi (Stephen) Chen, Derek Hao Hu, Yun Fu
Megatron: Evasive Clean-Label Backdoor Attacks against Vision Transformer
Xueluan Gong, Bowei Tian, Meng Xue, Shuike Li, Yanjiao Chen, Qian Wang