Mixture of Experts
Mixture-of-Experts (MoE) models aim to improve the efficiency and scalability of large language models and other architectures by using multiple specialized "expert" networks, each handling a subset of the input data. Current research focuses on improving routing algorithms that efficiently assign inputs to experts, developing heterogeneous MoE architectures with experts of varying sizes and capabilities, and optimizing training methods to address challenges such as load imbalance and gradient conflicts. This approach holds significant promise for building larger, more powerful models at reduced computational cost, with impact across fields from natural language processing and computer vision to robotics and scientific discovery.
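To make the routing idea concrete, below is a minimal sketch of a sparsely gated MoE layer with top-k routing, written in PyTorch. The names (MoELayer, num_experts, top_k) and the per-expert feed-forward design are illustrative assumptions for exposition, not taken from any of the papers listed here; production implementations add load-balancing losses and capacity limits to counter the imbalance issues mentioned above.

```python
# Minimal sketch of a Mixture-of-Experts layer with top-k token routing.
# Illustrative only: names and sizes are assumptions, not from a specific paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); flatten tokens so routing is per token.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                        # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the selected experts

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = MoELayer(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Because each token activates only top_k of the num_experts networks, the parameter count can grow with the number of experts while the per-token compute stays roughly constant, which is the efficiency argument behind MoE scaling.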
Papers
A Survey on Inference Optimization Techniques for Mixture of Experts Models
Jiacheng Liu, Peng Tang, Wenfeng Wang, Yuhang Ren, Xiaofeng Hou, Pheng-Ann Heng, Minyi Guo, Chao Li
GraphLoRA: Empowering LLMs Fine-Tuning via Graph Collaboration of MoE
Ting Bai, Yue Yu, Le Huang, Zenan Xu, Zhe Zhao, Chuan Shi
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan
Llama 3 Meets MoE: Efficient Upcycling
Aditya Vavre, Ethan He, Dennis Liu, Zijie Yan, June Yang, Nima Tajbakhsh, Ashwath Aithal