Memory-Aware Scheduling

Memory-aware scheduling aims to optimize resource allocation, particularly memory, for computationally intensive workloads such as large language model (LLM) inference and deep neural network training. Current research focuses on efficient algorithms and data structures, including novel tensor representations and iterative graph-optimization techniques, that minimize memory footprint and improve performance in settings ranging from distributed systems to network-constrained environments. These advances matter because memory is often the binding constraint: reducing it enables the deployment of increasingly complex models and applications, and ultimately improves the speed and efficiency of the surrounding computation.
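To make the idea concrete, here is a toy sketch of one simple memory-aware scheduling policy: greedily packing tasks into execution "waves" so that the combined peak memory of concurrently running tasks never exceeds a fixed budget. The `Task` fields, task names, and budget are invented for illustration; real systems (and the papers below) use far more sophisticated cost models and graph-level optimizations.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    mem_mb: int    # peak memory the task needs while running (illustrative)
    duration: int  # abstract time units, used only to order the greedy pass

def schedule_within_budget(tasks, budget_mb):
    """Greedy memory-aware scheduler (illustrative only).

    Repeatedly starts the longest-running task that still fits in the
    remaining memory budget; tasks that do not fit wait for a later wave.
    Returns a list of waves, each a list of task names that may run
    concurrently without exceeding budget_mb.
    """
    for t in tasks:
        if t.mem_mb > budget_mb:
            raise ValueError(f"{t.name} alone exceeds the memory budget")
    # Prefer long tasks first so short ones can fill leftover memory.
    pending = sorted(tasks, key=lambda t: t.duration, reverse=True)
    waves = []
    while pending:
        free = budget_mb
        wave, rest = [], []
        for t in pending:
            if t.mem_mb <= free:
                wave.append(t.name)
                free -= t.mem_mb
            else:
                rest.append(t)
        waves.append(wave)
        pending = rest
    return waves

tasks = [
    Task("train_A", 4000, 10),
    Task("train_B", 3000, 8),
    Task("eval", 2000, 3),
    Task("log", 500, 1),
]
print(schedule_within_budget(tasks, budget_mb=6000))
# → [['train_A', 'eval'], ['train_B', 'log']]
```

Here `train_A` and `eval` share the first wave (4000 + 2000 MB fills the 6000 MB budget), while `train_B` and `log` are deferred to a second wave. The greedy first-fit policy is only a baseline; the research summarized above replaces it with richer models of tensor lifetimes and computation graphs.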

Papers