Layer Model Parallelism

Layer model parallelism accelerates deep learning training and inference by partitioning a model's layers across multiple devices, so that each device stores and computes only part of the network, overcoming the memory and throughput limits of a single device. Current research focuses on optimizing this parallelism both within and between the layers of models such as Transformers and CNNs, employing techniques like mixed-integer programming for automated strategy selection and sparse communication to reduce inter-device data transfer. These advances significantly improve training throughput, inference speed, and resource efficiency for large-scale models, benefiting both research productivity and the deployment of AI applications in resource-constrained environments.
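
At its core, the technique assigns blocks of layers to different devices and passes activations between them. Below is a minimal sketch of a two-stage layer split in PyTorch; the toy model, dimensions, and device choices are illustrative assumptions rather than any specific paper's implementation (the sketch falls back to CPU when two GPUs are unavailable, so it runs anywhere).

```python
# Minimal sketch of layer (inter-layer) model parallelism in PyTorch.
# Assumes up to two CUDA devices; falls back to CPU so the example runs anywhere.
import torch
import torch.nn as nn

dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First block of layers lives on device 0 ...
        self.stage0 = nn.Sequential(
            nn.Linear(1024, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
        ).to(dev0)
        # ... second block lives on device 1, so neither device has to
        # hold the full parameter set.
        self.stage1 = nn.Sequential(
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, 10),
        ).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(dev0))
        # The activation transfer below is the inter-device traffic that
        # sparse-communication methods aim to shrink.
        x = x.to(dev1)
        return self.stage1(x)

model = TwoStageModel()
out = model(torch.randn(32, 1024))  # output tensor resides on dev1
```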

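The automated strategy selection mentioned above can be framed as a small mixed-integer program: choose a layer-to-device assignment that balances compute load subject to per-device memory limits. The sketch below uses the open-source PuLP modeling library; the cost and memory numbers, and the formulation itself, are illustrative assumptions rather than a published formulation.

```python
# Hedged sketch: layer-to-device placement as a tiny mixed-integer program
# using PuLP. All costs, memory figures, and capacities are made-up numbers.
import pulp

layer_cost = [4, 8, 8, 2]   # per-layer compute cost (arbitrary units)
layer_mem = [1, 3, 3, 1]    # per-layer memory footprint (arbitrary units)
layers = range(len(layer_cost))
devices = range(2)
mem_cap = 5                 # memory capacity of each device

prob = pulp.LpProblem("layer_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, devices), cat="Binary")
peak = pulp.LpVariable("peak_load", lowBound=0)

prob += peak  # objective: minimize the load on the busiest device
for l in layers:
    # Every layer is assigned to exactly one device.
    prob += pulp.lpSum(x[l][d] for d in devices) == 1
for d in devices:
    # Respect each device's memory capacity.
    prob += pulp.lpSum(layer_mem[l] * x[l][d] for l in layers) <= mem_cap
    # The busiest device bounds the peak load being minimized.
    prob += pulp.lpSum(layer_cost[l] * x[l][d] for l in layers) <= peak

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    d = next(d for d in devices if pulp.value(x[l][d]) > 0.5)
    print(f"layer {l} -> device {d}")
```

Real placement problems typically add constraints such as contiguity of each device's layer range and terms for inter-device communication cost; they are omitted here to keep the example short.
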
Papers