Memory Bottleneck
Memory bottlenecks severely limit the capabilities of large language models (LLMs) and other machine learning systems, hindering their scalability and efficiency. Current research focuses on reducing memory usage through key-value (KV) cache compression, parameter-efficient fine-tuning methods such as LoRA, sparsely activated Mixture-of-Experts (MoE) architectures, and hardware-software co-design approaches such as learning-in-memory (LIM). Overcoming these limitations is crucial for advancing AI capabilities, enabling larger and more powerful models to be trained and deployed for diverse applications while reducing energy consumption.
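To make the KV cache bottleneck concrete, the sketch below is a back-of-envelope estimate of cache size for a decoder-only transformer and of the savings from storing the cache at lower precision (one simple form of cache compression). The model dimensions and the 4-bit compression factor are illustrative assumptions, roughly in the range of a 7B-parameter model, and are not drawn from any specific paper above.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float) -> float:
    """Total bytes for keys and values across all layers (factor of 2 = K and V)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical 7B-class configuration at a long context length.
    cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128,
               seq_len=32_768, batch_size=8)

    fp16_cache = kv_cache_bytes(**cfg, bytes_per_elem=2.0)   # 16-bit cache
    int4_cache = kv_cache_bytes(**cfg, bytes_per_elem=0.5)   # 4-bit quantized cache

    gib = 1024 ** 3
    print(f"FP16 KV cache : {fp16_cache / gib:.1f} GiB")
    print(f"4-bit KV cache: {int4_cache / gib:.1f} GiB (about 4x smaller)")
```

At these assumed settings the 16-bit cache alone reaches roughly 128 GiB, exceeding the memory of a single accelerator, which is why cache compression and related techniques are an active research focus.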