Sparse Gate

Sparse gating mechanisms are central to efficiently scaling large neural networks, particularly Mixture-of-Experts (MoE) models, because they activate only a small subset of the network (for example, the k highest-scoring experts) for each input. Current research focuses on improving the training and behavior of these gates, exploring novel architectures such as tree-based routing and dense-to-sparse training strategies to address convergence issues and encourage expert specialization. These advances aim to reduce the computational cost of large language models and other deep learning systems while maintaining or improving accuracy, and the resulting gains in training stability and model quality are significant for deploying large-scale models in resource-constrained environments.
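
To make the basic mechanism concrete, the sketch below shows a plain top-k sparse gate that routes each token to its k highest-scoring experts and weights their outputs. It is a minimal, generic illustration under common MoE conventions; the class and parameter names (TopKSparseGate, MoELayer, d_model, num_experts, k) are chosen for this example and do not correspond to any specific paper listed here.

    # Minimal sketch of top-k sparse gating for a Mixture-of-Experts layer (PyTorch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKSparseGate(nn.Module):
        """Scores experts per token and keeps only the top-k of them."""

        def __init__(self, d_model: int, num_experts: int, k: int = 2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, num_experts, bias=False)

        def forward(self, x: torch.Tensor):
            # x: (num_tokens, d_model)
            logits = self.router(x)                          # (num_tokens, num_experts)
            topk_vals, topk_idx = logits.topk(self.k, dim=-1)
            # Softmax only over the selected experts, so every
            # non-selected expert receives exactly zero gate weight.
            topk_weights = F.softmax(topk_vals, dim=-1)
            gates = torch.zeros_like(logits).scatter(-1, topk_idx, topk_weights)
            return gates, topk_idx

    class MoELayer(nn.Module):
        """Dispatches each token only to the experts chosen by the sparse gate."""

        def __init__(self, d_model: int, num_experts: int, k: int = 2):
            super().__init__()
            self.gate = TopKSparseGate(d_model, num_experts, k)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(num_experts)
            ])

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            gates, _ = self.gate(x)                          # (num_tokens, num_experts)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = gates[:, e] > 0                       # tokens routed to expert e
                if mask.any():
                    out[mask] += gates[mask, e].unsqueeze(-1) * expert(x[mask])
            return out

    # Example: 16 tokens of width 64 routed across 8 experts, 2 active per token.
    tokens = torch.randn(16, 64)
    layer = MoELayer(d_model=64, num_experts=8, k=2)
    print(layer(tokens).shape)  # torch.Size([16, 64])

Because only k of the num_experts expert networks run for any given token, compute grows with k rather than with the total expert count, which is what lets MoE models scale parameter counts without a proportional increase in cost. Production systems add refinements this sketch omits, such as load-balancing losses and capacity limits per expert.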

Papers