Memory Management

Memory management in computing concerns how memory is allocated, used, and freed, with the goal of making software more efficient and faster across diverse applications. Current research concentrates on reducing the memory consumed by large language models (LLMs) and deep neural networks (DNNs). It explores techniques such as adaptive memory allocation, mixed-precision training, and memory-efficient attention and key-value (KV) cache management (e.g., PagedAttention) to shrink memory footprints and speed up training and inference. These advances are crucial for deploying increasingly complex models in resource-constrained environments and for reducing the environmental impact of computationally intensive workloads, with effects in fields ranging from artificial intelligence to scientific computing.
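
PagedAttention, introduced with the vLLM serving system, borrows the idea of paging from operating-system virtual memory. It stores each sequence's KV cache in fixed-size blocks that need not be contiguous and tracks them with a per-sequence block table. The sketch below illustrates only that bookkeeping idea under simplifying assumptions. A plain Python list of block ids stands in for GPU memory, no attention is computed, and the class and method names (`BlockAllocator`, `PagedSequence`, `append_token`, `slot`) are hypothetical, not vLLM's API.

```python
class BlockAllocator:
    """Hypothetical pool of fixed-size physical blocks (stand-in for GPU KV-cache memory)."""

    def __init__(self, num_blocks: int):
        # Stack of free physical block ids; pop() hands out 0, 1, 2, ... in order.
        self.free_ids = list(range(num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free_ids:
            raise MemoryError("no free KV-cache blocks")
        return self.free_ids.pop()

    def release(self, block_id: int) -> None:
        self.free_ids.append(block_id)


class PagedSequence:
    """Maps a sequence's logical token positions to physical blocks, allocated on demand."""

    def __init__(self, allocator: BlockAllocator, block_size: int):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # Claim a new block only when the current last block is full, so each
        # sequence wastes at most block_size - 1 slots instead of a large
        # contiguous region reserved for its maximum possible length.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot(self, pos: int) -> tuple[int, int]:
        """Return (physical block id, offset within block) for token position pos."""
        if not 0 <= pos < self.num_tokens:
            raise IndexError(pos)
        return self.block_table[pos // self.block_size], pos % self.block_size

    def release_all(self) -> None:
        # Return every block to the shared pool when the sequence finishes.
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


pool = BlockAllocator(num_blocks=8)
seq = PagedSequence(pool, block_size=4)
for _ in range(6):
    seq.append_token()
print(seq.block_table)  # [0, 1]
print(seq.slot(5))      # (1, 1)
```

Because blocks are claimed one at a time and returned to a shared pool, many sequences of unpredictable length can share the same memory with little fragmentation. That is the property the paging analogy is meant to capture.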

Papers