Representation Collapse
Representation collapse, a phenomenon in which a neural network's representations lose diversity and become nearly identical, hinders the performance and scalability of many machine learning models. Current research focuses on mitigating the issue in large language models (LLMs), particularly those using sparse mixture-of-experts (SMoE) architectures and transformers, as well as in reinforcement learning and federated learning settings. Strategies include improving SMoE routing mechanisms so that experts remain specialized rather than redundant, adding regularization that encourages feature diversity, and designing objective functions that explicitly penalize collapsed representations. Addressing representation collapse is crucial for advancing the capabilities and reliability of these models across diverse applications.
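As a concrete illustration of the regularization strategy, the sketch below computes a variance-covariance penalty on a batch of hidden representations: it pushes each dimension's standard deviation above a target (so dimensions cannot collapse to constants) and penalizes off-diagonal covariance (so dimensions cannot all become copies of one another). This is a minimal sketch in the spirit of variance-covariance regularization; the function name, hyperparameters, and loss weights are illustrative assumptions, not the implementation of any paper listed below.

```python
import torch

def anti_collapse_penalty(h: torch.Tensor, var_target: float = 1.0, eps: float = 1e-4):
    """Variance-covariance penalty on representations h of shape (batch, dim).

    Illustrative sketch: names and default values are assumptions, not taken
    from the papers listed below.
    """
    h = h - h.mean(dim=0)                           # center each dimension
    std = torch.sqrt(h.var(dim=0) + eps)            # per-dimension std
    var_loss = torch.relu(var_target - std).mean()  # hinge: keep std above target

    n, d = h.shape
    cov = (h.T @ h) / (n - 1)                       # (dim, dim) covariance matrix
    off_diag = cov - torch.diag(torch.diag(cov))    # zero out the diagonal
    cov_loss = (off_diag ** 2).sum() / d            # penalize correlated dimensions

    return var_loss, cov_loss

# Usage: add to the task loss with small, tuned weights (illustrative values).
# var_loss, cov_loss = anti_collapse_penalty(hidden_states)
# loss = task_loss + 0.1 * var_loss + 0.01 * cov_loss
```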
Papers
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Md Rifat Arefin, Gopeshh Subbaraj, Nicolas Gontier, Yann LeCun, Irina Rish, Ravid Shwartz-Ziv, Christopher Pal
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu, Bocheng Li, Yifei Xin, Linli Xu