Softmax Gating

Softmax gating is a crucial component of Mixture of Experts (MoE) models, which combine multiple specialized "expert" networks to improve the accuracy and efficiency of machine learning tasks; the softmax gate assigns each input a probability distribution over the experts, determining which experts process it and how their outputs are weighted. Current research focuses on improving the performance and sample efficiency of softmax gating, exploring alternative gating functions such as the sigmoid, and investigating the impact of different architectures, such as hierarchical MoEs and dense-to-sparse gating, on model convergence and parameter estimation. These advancements aim to address limitations of softmax gating, such as representation collapse and slow convergence rates, leading to more robust and efficient large-scale models for applications ranging from image classification to recommendation systems.
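
To make the mechanism concrete, below is a minimal sketch of a softmax-gated MoE layer with top-k routing in PyTorch. It is illustrative only: the class name `SoftmaxGatedMoE`, the expert architecture, and the dimensions are assumptions chosen for clarity, not taken from any particular paper.

```python
# Minimal sketch of softmax gating over experts (illustrative, not from a specific paper).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftmaxGatedMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.ReLU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )
        # The gate maps each input to one score per expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Softmax turns gate scores into mixture weights.
        gate_probs = F.softmax(self.gate(x), dim=-1)          # (batch, num_experts)
        # Sparse routing: keep only the top-k experts per input and renormalize.
        topk_probs, topk_idx = gate_probs.topk(self.top_k, dim=-1)
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                            # chosen expert per input
            w = topk_probs[:, slot].unsqueeze(-1)              # its gating weight
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


# Usage: route a batch of 8 embeddings through 4 experts, 2 active per input.
layer = SoftmaxGatedMoE(d_model=16, num_experts=4, top_k=2)
y = layer(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 16])
```

Setting `top_k` equal to `num_experts` recovers dense softmax gating, while small `top_k` gives the sparse routing used in large-scale MoE models; dense-to-sparse gating schemes interpolate between these two regimes during training.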

Papers