Query Attention
Query attention mechanisms aim to improve the efficiency and effectiveness of attention-based models, particularly large language models (LLMs), by optimizing how queries interact with keys and values. Current research focuses on more efficient query attention architectures, such as grouped query attention and multi-query attention, which reduce computational cost and memory requirements by sharing key-value heads across multiple query heads. These advances are crucial for deploying LLMs on resource-constrained devices and for processing longer sequences, with impact on applications ranging from question answering to image and video analysis.
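As a rough illustration of the key-value sharing idea, the sketch below (plain NumPy; all shapes, names, and the toy sizes are chosen for the example, not taken from any specific paper) computes grouped query attention. Setting the number of key-value heads to 1 recovers multi-query attention, while setting it equal to the number of query heads recovers standard multi-head attention.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Grouped query attention sketch.

    q: (num_query_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim)
    Each contiguous group of query heads shares one key-value head,
    so only num_kv_heads key/value tensors need to be stored or cached.
    """
    num_query_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_query_heads % num_kv_heads == 0
    group_size = num_query_heads // num_kv_heads

    outputs = []
    for h in range(num_query_heads):
        kv = h // group_size  # index of the shared key-value head for this group
        scores = q[h] @ k[kv].T / np.sqrt(head_dim)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v[kv])
    return np.stack(outputs)  # (num_query_heads, seq_len, head_dim)

# Toy usage: 8 query heads share 2 key-value heads
# (1 key-value head would correspond to multi-query attention).
q = np.random.randn(8, 16, 64)
k = np.random.randn(2, 16, 64)
v = np.random.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```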
Papers
Twelve papers, dated from March 2022 to March 2024.