Model Parallel

Model parallelism addresses the challenge of training neural networks too large to fit in the memory of a single device by distributing the model itself across multiple devices. Current research focuses on improving communication efficiency between these devices, exploring techniques such as data and model partitioning, compression of activations and gradients, and algorithms such as SWARM parallelism and MGRIT for long sequences. These advances enable the training of massive models for diverse applications, including large language models, recommender systems, and the solution of complex partial differential equations, accelerating scientific discovery and industrial processes.
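
As a concrete illustration, the sketch below splits a small feed-forward network across two devices, moving activations between stages during the forward pass. It is a minimal, hypothetical PyTorch example (the class name TwoStagePipeline and the two-GPU layout are illustrative assumptions, not taken from any paper listed here).

```python
import torch
import torch.nn as nn


class TwoStagePipeline(nn.Module):
    """Toy layer-wise model parallelism: the first half of the network lives on
    one device, the second half on another, and activations are transferred
    between them during the forward pass."""

    def __init__(self, hidden=1024, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # Stage 1 is placed entirely on the first device.
        self.stage1 = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        ).to(dev0)
        # Stage 2 is placed entirely on the second device.
        self.stage2 = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 10),
        ).to(dev1)

    def forward(self, x):
        x = self.stage1(x.to(self.dev0))
        # Inter-device communication: ship the activations to the next stage.
        return self.stage2(x.to(self.dev1))


if __name__ == "__main__":
    # Fall back to CPU-only placement when two GPUs are not available,
    # so the sketch stays runnable anywhere.
    if torch.cuda.device_count() >= 2:
        model = TwoStagePipeline()
    else:
        model = TwoStagePipeline(dev0="cpu", dev1="cpu")
    out = model(torch.randn(8, 1024))
    print(out.shape)  # torch.Size([8, 10])
```

Autograd reverses the same device-to-device transfers during the backward pass; pipeline-parallel schedules (e.g., GPipe-style micro-batching) build on this basic layout to keep both devices busy instead of idling while the other stage computes.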

Papers