Auto Scaling

Auto-scaling dynamically adjusts computing resources to match fluctuating demand, with the goal of using resources efficiently while still meeting service level objectives. Current research focuses on more sophisticated prediction models, including recurrent neural networks, graph neural networks, and meta-reinforcement learning, that anticipate workload changes so that scaling decisions become more accurate and efficient. These advances matter for managing increasingly complex cloud-based systems and large-scale machine learning training, where better scaling decisions reduce cost and improve performance in applications ranging from serverless functions to large language model training.
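To make the idea of predictive scaling concrete, here is a minimal sketch in Python. It is not drawn from any particular paper or platform: simple exponential smoothing stands in for the learned predictors mentioned above, and the names `forecast_next`, `desired_replicas`, `capacity_per_replica`, and the replica bounds are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def forecast_next(history: Sequence[float], alpha: float = 0.5) -> float:
    """Forecast the next interval's load with simple exponential smoothing.

    Assumption: this is a stand-in for a learned predictor (RNN, GNN, ...).
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    level = history[0]
    for observed in history[1:]:
        # Blend the newest observation with the running estimate.
        level = alpha * observed + (1 - alpha) * level
    return level


def desired_replicas(
    history: Sequence[float],
    capacity_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
    alpha: float = 0.5,
) -> int:
    """Return how many replicas are needed to serve the forecast load.

    capacity_per_replica is a hypothetical per-replica throughput
    (for example, requests per second one replica can sustain).
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    predicted = forecast_next(history, alpha)
    needed = math.ceil(predicted / capacity_per_replica)
    # Clamp to the configured bounds so a bad forecast cannot
    # scale to zero or grow without limit.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Load rising from 100 to 300 requests/s, 50 requests/s per replica.
    print(desired_replicas([100, 200, 300], capacity_per_replica=50))
```

With the sample history the smoothed forecast is 225 requests per second, so the sketch asks for 5 replicas. Swapping `forecast_next` for a more accurate model changes only the prediction step; the capacity arithmetic and the clamping stay the same.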

Papers