Sparse Mixture of Experts

Sparse Mixture-of-Experts (SMoE) models aim to improve the efficiency and scalability of large neural networks by activating only a subset of their parameters (experts) for each input. Current research focuses on developing more efficient routing mechanisms (e.g., cosine routers, product key techniques), exploring novel architectures like Block Tensor-Train MoE (BTT-MoE), and optimizing expert pruning strategies (e.g., evolutionary algorithms) to reduce computational costs and improve performance. This approach holds significant promise for deploying extremely large models on resource-constrained devices and improving the performance-compute trade-off in various applications, including natural language processing and computer vision.
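
To make the core idea concrete, below is a minimal PyTorch sketch of a sparse MoE layer with learned top-k token routing. The class name `SparseMoE` and the parameters `d_model`, `d_hidden`, `num_experts`, and `top_k` are illustrative choices, not taken from any specific paper above; production systems typically add load-balancing losses, capacity limits, and batched expert dispatch that this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k gated sparse Mixture-of-Experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces one gating score per expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); only top_k experts are evaluated per token.
        logits = self.router(x)                              # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # pick the k best experts per token
        weights = F.softmax(weights, dim=-1)                 # renormalize over the chosen k

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                     # this expert received no tokens
            # Weighted contribution of expert e to the tokens routed to it.
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

# Usage: route 16 tokens of width 64 through 8 experts, 2 active per token.
layer = SparseMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

Because each token touches only `top_k` of the `num_experts` expert networks, total parameter count grows with the number of experts while per-token compute stays roughly constant, which is the performance-compute trade-off described above.
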

Papers