Chinchilla Scaling
Chinchilla scaling investigates the optimal trade-off between model size (parameters) and training data (tokens) in large language models (LLMs), specifically aiming to minimize loss for a fixed computational budget. Current research focuses on refining the scaling laws themselves, resolving inconsistencies in earlier coefficient estimates, and extending the laws to account for the computational cost of inference. This work is significant because it improves the efficiency of LLM training, enabling researchers to get more capable models out of limited compute and supporting more sustainable development of AI.
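The quantitative core of this line of work is a parametric loss L(N, D) = E + A/N^α + B/D^β over parameters N and training tokens D, combined with the training-cost approximation C ≈ 6·N·D FLOPs; minimizing the loss under that constraint gives the compute-optimal split. The sketch below illustrates the calculation. The coefficient values are the approximate published "Approach 3" fits from Hoffmann et al. (2022) and the function name is ours, so the numbers should be read as illustrative rather than definitive: re-fitting the same data, as recent replication work does, shifts the coefficients and the implied tokens-per-parameter ratio.

```python
# Minimal sketch of compute-optimal allocation under a Chinchilla-style
# parametric loss L(N, D) = E + A / N**alpha + B / D**beta.
# Coefficients are the approximate published Approach 3 fits; treat them as
# illustrative -- re-analyses of the same data recover different values.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28


def compute_optimal(C: float) -> tuple[float, float]:
    """Minimise L(N, D) subject to the usual cost model C ≈ 6 * N * D FLOPs.

    Substituting D = C / (6N) and setting dL/dN = 0 gives
        N_opt = G * (C / 6) ** (beta / (alpha + beta)),
        D_opt = (1 / G) * (C / 6) ** (alpha / (alpha + beta)),
    with G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    a = beta / (alpha + beta)   # growth exponent for parameters
    b = alpha / (alpha + beta)  # growth exponent for tokens
    N_opt = G * (C / 6.0) ** a
    D_opt = (1.0 / G) * (C / 6.0) ** b
    return N_opt, D_opt


if __name__ == "__main__":
    for C in (1e21, 1e23, 1e25):
        N, D = compute_optimal(C)
        loss = E + A / N**alpha + B / D**beta
        print(f"C = {C:.0e} FLOPs -> N ≈ {N:.2e} params, D ≈ {D:.2e} tokens "
              f"({D / N:.0f} tokens/param), predicted loss ≈ {loss:.2f}")
```

Two caveats worth noting. The widely quoted rule of thumb of roughly 20 training tokens per parameter comes from the paper's other two estimation approaches; the parametric fit above implies a noticeably higher ratio, which is one of the inconsistencies later work addresses. And extending the cost model to deployment adds an inference term of roughly 2·N FLOPs per token served, which pushes the optimum toward smaller models trained on more data.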