Memory-Aware Scheduling
Memory-aware scheduling aims to optimize resource allocation, particularly memory, for computationally intensive workloads such as large language model (LLM) inference and deep neural network training. Current research focuses on efficient algorithms and data structures, such as novel tensor representations and iterative graph-optimization techniques, that minimize memory footprint and improve performance across diverse settings, including distributed systems and network-constrained environments. These advances matter because memory capacity is often the binding constraint on deploying increasingly complex models: better scheduling lets larger models run on the same hardware and makes inference and training faster and more efficient.
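To make the core idea concrete, here is a minimal sketch, not drawn from any specific paper listed below, of a greedy memory-aware scheduler for a task graph. Each task produces an output of a known size that must stay resident until all of its consumers have run; at each step the scheduler picks the ready task whose execution causes the smallest net growth in live memory. The function name, the byte-sized toy tasks, and the greedy heuristic itself are illustrative assumptions, not a method from the surveyed work.

```python
from collections import defaultdict

def memory_aware_schedule(tasks, deps):
    """Greedily order DAG tasks to keep peak live memory low.

    tasks: dict mapping task -> output size (bytes); a task's output
           stays resident until every consumer of it has run.
    deps:  dict mapping task -> set of prerequisite tasks.
    """
    consumers = defaultdict(set)
    for t, prereqs in deps.items():
        for p in prereqs:
            consumers[p].add(t)

    ready = [t for t in tasks if not deps.get(t)]
    done, order = set(), []
    live = {}          # task -> bytes of its output currently resident
    peak = cur = 0

    while ready:
        # Net memory growth if task t ran next: its own output size,
        # minus any inputs whose last consumer would then be done.
        def growth(t):
            freed = sum(live[p] for p in deps.get(t, set())
                        if consumers[p] <= done | {t})
            return tasks[t] - freed

        t = min(ready, key=growth)
        ready.remove(t)

        cur += tasks[t]            # t's output materialises while it runs
        live[t] = tasks[t]
        done.add(t)
        peak = max(peak, cur)      # t's inputs are still resident here

        # Free inputs whose consumers have now all run.
        for p in deps.get(t, set()):
            if p in live and consumers[p] <= done:
                cur -= live.pop(p)

        order.append(t)
        for c in consumers[t]:
            if c not in done and c not in ready and deps.get(c, set()) <= done:
                ready.append(c)

    return order, peak
```

On a toy graph with tasks {'a': 4, 'b': 2, 'c': 1, 'd': 1} where c depends on a and d depends on b, this greedy order reaches a peak of 6 bytes, whereas the naive order a, b, c, d peaks at 7, because it keeps both large outputs resident at once. Real systems layer the same idea over far richer cost models (rematerialization, offloading, network constraints).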
Papers
July 22, 2024
August 26, 2023
December 17, 2022