Attention Map
Attention maps, visualizations of the weighting process within neural networks, are crucial for understanding model decision-making and improving model performance. Current research focuses on refining attention mechanisms within transformer architectures such as Vision Transformers (ViTs) and BERT, addressing issues such as visual artifacts, computational cost (e.g., through token selection), and misalignment with input data (e.g., text prompts or object features). By enhancing model interpretability, accuracy, and efficiency, this work has implications for diverse fields, including image processing, natural language processing, and medical diagnostics. The development of robust and interpretable attention maps remains a key area of ongoing investigation.
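For readers unfamiliar with how these maps arise, the following is a minimal sketch (not taken from any of the listed papers) of the quantity that attention-map visualizations typically display: the per-head weight matrix softmax(QKᵀ/√d). The function and array names are illustrative assumptions.

```python
# Minimal sketch: a single-head attention map as softmax(Q K^T / sqrt(d)).
# Row i of the result shows how strongly token i attends to every other token;
# this matrix is what attention-map visualizations render as a heatmap.
import numpy as np

def attention_map(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Return the (n_queries x n_keys) attention-weight matrix."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys

# Example: 4 tokens with 8-dimensional query/key embeddings.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
A = attention_map(Q, K)
print(A.round(3))   # each row sums to 1
```

In a trained ViT or BERT model, Q and K come from learned projections of the token embeddings at a given layer and head; visualizing A over image patches or text tokens yields the attention maps discussed above.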
Papers
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding
Joao Pereira, Vasco Lopes, David Semedo, Joao Neves
Progressive Focused Transformer for Single Image Super-Resolution
Wei Long, Xingyu Zhou, Leheng Zhang, Shuhang Gu
University of Electronic Science and Technology of China
Dynamic Accumulated Attention Map for Interpreting Evolution of Decision-Making in Vision Transformer
Yi Liao, Yongsheng Gao, Weichuan Zhang
Griffith University ● Shaanxi University of Science and Technology
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu
Hangzhou Dianzi University ● Li Auto Inc. ● Harbin Institute of Technology (Shenzhen)