Attention Mechanism
Attention mechanisms let a model weight the parts of its input that are most relevant to a given prediction, improving both efficiency and accuracy across a wide range of machine learning models. Current research emphasizes reducing attention's computational cost (e.g., lowering its quadratic complexity in sequence length to linear), enhancing its expressiveness (e.g., by applying convolutional operations to attention scores), and improving its robustness (e.g., mitigating hallucination in vision-language models and curbing overfitting). These advances are shaping natural language processing, computer vision, and time series analysis, yielding more efficient and accurate models for diverse applications.
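As background for the papers below, here is a minimal sketch of standard scaled dot-product attention alongside a kernelized variant that illustrates the quadratic-to-linear cost reduction mentioned above. It assumes single-head, unbatched inputs; the names (softmax_attention, linear_attention, feature_map) and the simple positive feature map are illustrative choices, not the method of any paper listed here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(Q, K, V):
    """Standard attention: materializing the (n x n) score matrix makes
    time and memory quadratic in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n) pairwise scores
    return softmax(scores, axis=-1) @ V  # (n, d_v)

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized variant: replacing softmax with a positive feature map
    lets K^T V be accumulated once, so cost is linear in sequence length
    and no (n x n) matrix is ever formed."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d) feature-mapped queries/keys
    KV = Kf.T @ V                             # (d, d_v) summary of keys and values
    Z = Qf @ Kf.sum(axis=0, keepdims=True).T  # (n, 1) per-query normalizer
    return (Qf @ KV) / Z                      # (n, d_v)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

The two functions trade exactness for cost: the softmax version computes the full attention distribution, while the linear version approximates it through the chosen feature map, which is what lets its work grow linearly rather than quadratically with sequence length.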
Papers
Stick-breaking Attention
Shawn Tan, Yikang Shen, Songlin Yang, Aaron Courville, Rameswar Panda
Emotion Recognition with Facial Attention and Objective Activation Functions
Andrzej Miskow, Abdulrahman Altahhan
Feature Learning in Attention Mechanisms Is More Compact and Stable Than in Convolution
Baiyuan Chen
Chain and Causal Attention for Efficient Entity Tracking
Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
Timothy Chase Jr, Karthik Dantu
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting
Ignacio Aguilera-Martos, Andrés Herrera-Poyatos, Julián Luengo, Francisco Herrera
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen