Sparse Expert
Sparse expert models aim to improve the efficiency and performance of large language models (LLMs) and other deep learning architectures by employing a network of specialized "expert" modules, each handling a subset of the input data. Current research focuses on developing efficient routing mechanisms (like those used in Mixture-of-Experts models) to select appropriate experts, optimizing the training process (e.g., through upcycling pre-trained models or novel regularization techniques), and enhancing the stability and transferability of these models across diverse tasks. This approach offers significant potential for creating larger, more powerful models with reduced computational costs, impacting various fields from natural language processing and computer vision to medical image analysis and robotics.
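The routing idea mentioned above can be sketched concretely. Below is a minimal, illustrative top-k gating layer in the style of Mixture-of-Experts models: a learned gate scores each expert per token, only the k best experts are evaluated, and their outputs are combined with softmax-normalized weights. All names, shapes, and the choice of linear experts are assumptions for illustration, not any specific paper's implementation.

```python
import numpy as np

def top_k_routing(x, gate_weights, k=2):
    """Route a batch of token embeddings to their top-k experts.

    x: (batch, d_model) token representations.
    gate_weights: (d_model, n_experts) learned gating matrix (illustrative).
    Returns per-token expert indices and softmax-normalized routing weights.
    """
    logits = x @ gate_weights                      # (batch, n_experts) gate scores
    top_idx = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k largest scores
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over only the selected experts, as in standard top-k gating
    exp = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return top_idx, weights

def moe_forward(x, gate_weights, experts, k=2):
    """Evaluate only the selected experts per token and mix their outputs."""
    idx, w = top_k_routing(x, gate_weights, k)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):        # per token
        for j in range(k):             # per selected expert
            out[i] += w[i, j] * experts[idx[i, j]](x[i])
    return out

# Toy usage: 4 experts, each a random linear map; only k=2 run per token,
# which is the source of the compute savings the summary describes.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)) / d: W @ v
           for _ in range(n_experts)]
gate_W = rng.normal(size=(d, n_experts))
tokens = rng.normal(size=(3, d))
y = moe_forward(tokens, gate_W, experts)
print(y.shape)
```

The key property is that per-token compute scales with k, not with the total number of experts, so capacity (more experts) can grow while the cost of a forward pass stays roughly constant.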
Papers
FOCIL: Finetune-and-Freeze for Online Class Incremental Learning by Training Randomly Pruned Sparse Experts
Murat Onur Yildirim, Elif Ceren Gok Yildirim, Decebal Constantin Mocanu, Joaquin Vanschoren
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei