Scaling Law
Scaling laws in machine learning quantify the relationship between a model's performance and factors such as its size, training data volume, and computational budget. Current research focuses on refining these laws across diverse architectures, including encoder-decoder and decoder-only transformers, and across optimizers such as SGD and AdamW, and on testing their applicability to tasks such as language modeling, translation, and image classification. Understanding these laws is crucial for allocating resources in model development, improving training efficiency, and guiding the design of future, more capable AI systems. The same principles are also being extended to questions of economic productivity and the impact of data quality.
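Concretely, such laws are often expressed as parametric power laws in model size N and training tokens D, for example L(N, D) = E + A / N^alpha + B / D^beta. The sketch below fits that form to synthetic measurements with SciPy; the functional form is a commonly used assumption, and the function names and numbers are illustrative rather than taken from any paper listed here.

```python
# Minimal sketch: fitting a power-law scaling form
#   L(N, D) = E + A / N**alpha + B / D**beta
# to (model size N, token count D, loss L) observations.
# The synthetic coefficients below are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, E, A, B, alpha, beta):
    N, D = X
    return E + A / N**alpha + B / D**beta

# Synthetic grid of runs: parameter counts N and training tokens D.
Ns = np.array([1e7, 1e8, 1e9])
Ds = np.array([1e9, 1e10, 1e11])
N, D = (a.ravel() for a in np.meshgrid(Ns, Ds))

# Pretend these losses were measured at the end of each run.
L = 1.7 + 400.0 / N**0.34 + 4e3 / D**0.28

# Fit the five parameters from a rough initial guess.
p0 = [1.5, 100.0, 1000.0, 0.3, 0.3]
params, _ = curve_fit(scaling_law, (N, D), L, p0=p0, maxfev=20000)
E, A, B, alpha, beta = params
print(f"E={E:.3f}, A={A:.1f}, B={B:.1f}, alpha={alpha:.3f}, beta={beta:.3f}")

# Extrapolate the fitted law to a larger model/data budget.
print("predicted loss at N=1e10, D=1e11:", scaling_law((1e10, 1e11), *params))
```

Once fitted, such a law can be used to compare candidate (N, D) allocations under a fixed compute budget, which is the typical use case motivating the estimation methods studied in the papers below.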
Papers
Scaling Laws for Pre-training Agents and World Models
Tim Pearce, Tabish Rashid, Dave Bignell, Raluca Georgescu, Sam Devlin, Katja Hofmann
Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
Scaling Laws for Multilingual Language Models
Yifei He, Alon Benhaim, Barun Patra, Praneetha Vaddamanu, Sanchit Ahuja, Parul Chopra, Vishrav Chaudhary, Han Zhao, Xia Song
A Hitchhiker's Guide to Scaling Law Estimation
Leshem Choshen, Yang Zhang, Jacob Andreas
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter