Representation Collapse

Representation collapse is a phenomenon in which a neural network's learned representations lose diversity and become nearly identical, degrading performance and limiting the scalability of many machine learning models. Current research focuses on mitigating collapse in large language models (LLMs), particularly those built on sparse mixture-of-experts (SMoE) and transformer architectures, as well as in reinforcement learning and federated learning settings. Common strategies include improving routing mechanisms in SMoEs, applying regularization that encourages feature diversity, and designing objective functions that explicitly penalize collapse. Addressing representation collapse is crucial for advancing the capabilities and reliability of these models across diverse applications.
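To make the idea of an objective that penalizes collapse concrete, here is a minimal NumPy sketch (not from any specific paper in this collection) of a diversity penalty: it measures the mean squared off-diagonal cosine similarity between representations, which is 0 when representations are mutually orthogonal and approaches 1 as they collapse onto a single direction. The function name and setup are illustrative assumptions.

```python
import numpy as np

def diversity_penalty(H):
    """Quantify representation collapse for a batch of representations.

    H: array of shape (n, d), one representation per row.
    Returns the mean squared off-diagonal cosine similarity:
    0.0 for mutually orthogonal rows, up to 1.0 for fully
    collapsed (identical up to sign) rows.
    """
    H = np.asarray(H, dtype=float)
    # L2-normalize each row so the Gram matrix holds cosine similarities
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-12, None)
    sim = Hn @ Hn.T
    n = sim.shape[0]
    # Keep only off-diagonal entries (similarity of a row with itself is 1)
    off_diag = sim[~np.eye(n, dtype=bool)]
    return float(np.mean(off_diag ** 2))

# Collapsed batch: every representation points the same way
collapsed = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
# Diverse batch: mutually orthogonal representations
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
print(diversity_penalty(collapsed))  # 1.0
print(diversity_penalty(diverse))    # 0.0
```

In a training loop, a term like this would be added (with a weight) to the task loss so that gradients push representations apart; published methods differ in the exact similarity measure and where in the network the penalty is applied.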

Papers