Layer Model Parallelism
Layer model parallelism accelerates deep learning training and inference by partitioning a model's computation across multiple devices, overcoming the memory and compute limits of a single device. Current research focuses on optimizing parallelism both within and between the layers of models such as Transformers and CNNs, using techniques like mixed-integer programming for automated parallelization-strategy selection and sparse communication to reduce inter-device data transfer. These advances substantially improve training throughput, inference speed, and resource efficiency for large-scale models, benefiting both research workflows and the deployment of AI applications in resource-constrained environments.
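As a concrete illustration (not drawn from any specific paper above), the following PyTorch sketch shows the simplest form of inter-layer model parallelism: a toy model is split into two stages placed on two hypothetical devices, cuda:0 and cuda:1, and the activation tensor is copied between them in the forward pass. The device names, layer sizes, and class name are assumptions for the example; the inter-device transfer it highlights is exactly the communication cost that methods such as sparse communication aim to reduce.

```python
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """Toy inter-layer model parallelism: stage 0 on one device,
    stage 1 on another, with an explicit activation transfer between them."""

    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First group of layers lives on device 0.
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        # Second group of layers lives on device 1.
        self.stage1 = nn.Linear(4096, 10).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(self.dev0))
        # Inter-device transfer: the communication step that sparse or
        # compressed activation exchange tries to shrink.
        x = x.to(self.dev1)
        return self.stage1(x)

if __name__ == "__main__":
    model = TwoStageModel()
    batch = torch.randn(32, 1024)
    logits = model(batch)  # output resides on cuda:1
    print(logits.shape, logits.device)
```

Pipeline-parallel systems build on this idea by also splitting each batch into micro-batches so the two stages work concurrently instead of idling while waiting for each other.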