Large Scale Training
Large-scale training focuses on efficiently training massive machine learning models, often with billions of parameters, across distributed computing systems. Current research emphasizes techniques that reduce memory consumption (e.g., layerwise importance sampling), improve communication efficiency (e.g., communication-computation overlap, 0/1 Adam), and speed up training (e.g., active learning for data selection, model parallelism) across architectures including transformers, graph neural networks, and GANs. These advances are crucial for building powerful models in fields such as natural language processing, medical imaging, and recommender systems, and they directly affect both the performance and the accessibility of AI applications.
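To make the memory-reduction idea concrete, the sketch below illustrates layerwise sampling in the simplest possible form: most blocks of a model are frozen and optimizer state is kept only for a small, periodically resampled subset of layers. This is a minimal PyTorch sketch under stated assumptions, not any paper's method: layers are sampled uniformly rather than by importance, and the names `LayerwiseSampledTrainer`, `blocks`, `active_layers`, and `resample` are illustrative, not an existing API.

```python
import random
import torch


class LayerwiseSampledTrainer:
    """Minimal sketch: update only a sampled subset of layers per period."""

    def __init__(self, blocks, lr=1e-4, active_layers=2):
        self.blocks = list(blocks)          # e.g. a model's transformer blocks
        self.lr = lr
        self.active_layers = active_layers  # blocks updated until the next resample
        self.optimizer = None

    def resample(self):
        # Freeze every block, then unfreeze a small random subset.
        # (A real importance-sampling scheme would weight this choice.)
        for block in self.blocks:
            for p in block.parameters():
                p.requires_grad_(False)
        chosen = random.sample(self.blocks, self.active_layers)
        params = []
        for block in chosen:
            for p in block.parameters():
                p.requires_grad_(True)
                params.append(p)
        # Optimizer state (Adam moment buffers) exists only for the active
        # subset, which is where the memory saving comes from.
        self.optimizer = torch.optim.AdamW(params, lr=self.lr)

    def step(self, loss):
        self.optimizer.zero_grad()
        loss.backward()   # gradients are stored only for unfrozen parameters
        self.optimizer.step()
```

In use, one would call `resample()` every few hundred steps and `step(loss)` on each batch; the trade-off is a much smaller optimizer-state footprint in exchange for updating only part of the network at a time.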
Papers
Large-scale Training of Foundation Models for Wearable Biosignals
Salar Abbaspourazad, Oussama Elachqar, Andrew C. Miller, Saba Emrani, Udhyakumar Nallasamy, Ian Shapiro
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou