Sparse MoEs

Sparse Mixture-of-Experts (MoE) models improve the efficiency and scalability of large neural networks by activating only a subset of their parameters (experts) for each input. Current research focuses on optimizing routing mechanisms, the algorithms that decide which experts to activate, on comparing routing architectures such as "Expert Choice" and "Token Choice", and on compression techniques that shrink the memory footprint, such as sub-1-bit quantization. Together these advances make it possible to train and deploy much larger models at lower computational cost and memory use, benefiting fields such as natural language processing and computer vision and bringing more capable models to resource-constrained devices.
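
To make the routing idea concrete, below is a minimal sketch of Token Choice top-k routing in PyTorch: each token's router logits select its top-k experts, and only those experts run. The class name SparseMoE, the layer sizes, and the per-expert loop are illustrative assumptions, not the implementation from any particular paper; Expert Choice routing transposes the selection so that each expert picks its top tokens instead, which balances per-expert load by construction.

```python
# Minimal sketch of Token Choice top-k routing for a sparse MoE layer.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                            # (num_tokens, num_experts)
        weights, experts = torch.topk(logits, self.top_k)  # each token picks its top-k experts
        weights = F.softmax(weights, dim=-1)               # normalize over the selected experts only

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(experts == e)    # tokens routed to expert e
            if token_idx.numel() == 0:
                continue                                   # expert inactive for this batch: no compute spent
            out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out


tokens = torch.randn(16, 64)
moe = SparseMoE(d_model=64, d_hidden=256)
print(moe(tokens).shape)  # torch.Size([16, 64]); only 2 of the 8 experts run per token
```

The sparsity comes from the top-k selection: parameter count grows with the number of experts, while per-token compute stays roughly constant, which is the trade-off the papers below build on.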

Papers