Model Collapse

Model collapse refers to the progressive performance degradation of machine learning models, particularly large language models (LLMs) and other generative models, that occurs when successive generations are trained on data produced by earlier versions of the same or similar models. Current research focuses on understanding the causes of this phenomenon, including the proportion of synthetic data in the training mix, model architecture (e.g., transformers, diffusion models), and training methods (e.g., self-supervised learning, reinforcement learning from human feedback). Addressing model collapse is crucial for the reliability and safety of increasingly prevalent AI systems, as it degrades both the accuracy and fairness of model outputs across applications.

Papers