Memory Bottleneck

Memory bottlenecks severely limit the scalability and efficiency of large language models (LLMs) and other machine learning systems. Current research focuses on reducing memory usage through key-value cache compression, parameter-efficient fine-tuning methods such as LoRA, sparsely activated Mixture-of-Experts (MoE) architectures, and hardware-software co-design approaches including learning-in-memory (LIM). Overcoming these limitations is crucial for training and deploying larger, more capable models across diverse applications while reducing energy consumption.
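
As a concrete illustration of where inference memory goes, the minimal sketch below estimates the key-value cache footprint of a decoder-only transformer and the savings from quantizing that cache. All model dimensions (32 layers, 32 key-value heads, head dimension 128, a 32k-token context) are assumed, 7B-class values chosen for illustration; they are not taken from any particular paper listed here.

# Illustrative sketch: KV-cache memory for a decoder-only transformer.
# Model dimensions below are assumed 7B-class values, not from a specific paper.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float) -> float:
    """Bytes needed to cache keys and values during autoregressive decoding."""
    # Factor of 2 covers the separate key and value tensors stored per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

if __name__ == "__main__":
    cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128,
               seq_len=32_768, batch_size=1)
    fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)    # 16-bit cache
    int4 = kv_cache_bytes(**cfg, bytes_per_elem=0.5)  # 4-bit quantized cache
    print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")   # ~16 GiB
    print(f"int4 KV cache: {int4 / 2**30:.1f} GiB")   # ~4 GiB

Under these assumptions the cache alone occupies roughly 16 GiB per sequence in fp16, which is why cache compression and quantization feature so prominently in the work collected below.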

Papers