Model Partitioning
Model partitioning divides large machine learning models into smaller, manageable parts so they can run efficiently and scale across diverse hardware platforms. Current research focuses on optimizing partitioning strategies for specific architectures, including heterogeneous GPU clusters and multi-chip modules, often combining techniques such as adaptive quantization and reinforcement learning to improve performance. This work is crucial for deploying increasingly complex models, such as large language models, on resource-constrained devices and for accelerating training and inference in federated learning, where partition quality directly affects both computational cost and application performance.
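As a concrete illustration, the sketch below shows one common strategy: contiguous, pipeline-style partitioning that assigns each device a block of consecutive layers in proportion to its compute capacity. The layer costs, capacity numbers, and helper names (Layer, partition_layers) are illustrative assumptions for this sketch, not an implementation from any of the papers listed here.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    cost: float  # hypothetical per-layer compute cost (e.g., GFLOPs)

def partition_layers(layers, device_capacities):
    """Greedily assign contiguous blocks of layers to devices, giving
    each device a cost budget proportional to its share of total
    capacity. Returns one list of layer names per device."""
    total_cost = sum(layer.cost for layer in layers)
    total_cap = sum(device_capacities)
    budgets = [total_cost * cap / total_cap for cap in device_capacities]
    partitions = [[] for _ in device_capacities]
    dev, spent = 0, 0.0
    for layer in layers:
        # Advance to the next device once this one's budget is used up,
        # keeping assignments contiguous (pipeline-style partitioning).
        if spent >= budgets[dev] and dev < len(device_capacities) - 1:
            dev, spent = dev + 1, 0.0
        partitions[dev].append(layer.name)
        spent += layer.cost
    return partitions

if __name__ == "__main__":
    # Hypothetical 8-layer model split across a fast and a slow device
    # with a 3:1 capacity ratio.
    model = [Layer(f"block{i}", cost=1.0 + 0.5 * (i % 3)) for i in range(8)]
    print(partition_layers(model, device_capacities=[3.0, 1.0]))

Keeping each partition contiguous minimizes cross-device activation transfers, which is why pipeline-style schemes favor it; more sophisticated approaches from the literature (e.g., reinforcement-learning-based placement) relax this constraint to better balance memory and communication.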
Papers
November 11, 2024
March 2, 2024
October 27, 2023
October 24, 2023
June 20, 2023
December 7, 2021