Training Compute
Training compute, the computational resources required to train machine learning models, is a critical factor determining model performance and capabilities. Current research focuses on improving training efficiency through techniques such as improved training schedules, sparse model architectures (e.g., Mixture-of-Experts), and self-training methods that reduce reliance on large labeled datasets. These advances aim to lower the substantial computational cost of training large language models and other complex AI systems, affecting both the economic feasibility and the environmental sustainability of AI development. Training compute also matters for regulatory oversight, since the amount of compute used to train a model correlates with its capabilities and potential risks.
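To make the notion of training compute concrete, the sketch below uses the common rule-of-thumb estimate C ≈ 6·N·D FLOPs, where N is the number of trainable parameters and D is the number of training tokens, and shows why sparse Mixture-of-Experts models can be cheaper to train: only the parameters active per token contribute to the forward and backward cost. This is a minimal illustration under that approximation, not a method from the listed papers, and all model sizes and token counts in it are hypothetical.

```python
# Minimal sketch of the 6*N*D rule-of-thumb for training compute.
# All figures are illustrative assumptions, not measurements.

def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute in FLOPs via the 6 * N * D heuristic."""
    return 6.0 * num_params * num_tokens


def moe_training_flops(active_params: float, num_tokens: float) -> float:
    """Same heuristic, but counting only the parameters active per token
    (the experts actually routed to), which is why Mixture-of-Experts
    models can match dense-model quality at lower training compute."""
    return 6.0 * active_params * num_tokens


if __name__ == "__main__":
    # Hypothetical dense model: 7e9 parameters trained on 2e12 tokens.
    dense = training_flops(7e9, 2e12)
    # Hypothetical MoE model: large total parameter count, but only
    # 4e9 parameters active per token, trained on the same 2e12 tokens.
    sparse = moe_training_flops(4e9, 2e12)
    print(f"Dense model: ~{dense:.2e} FLOPs")
    print(f"MoE model:   ~{sparse:.2e} FLOPs")
```

Under these assumed sizes, the MoE model needs roughly 40% of the dense model's training FLOPs for the same number of tokens, which is the kind of efficiency gain the sparse-architecture work above targets.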
Papers
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, Martin Jaggi
Yuan 2.0-M32: Mixture of Experts with Attention Router
Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen