Sparse Activation
Sparse activation, the phenomenon in which only a small subset of a network's neurons is active for any given input, is a key research area aimed at improving the efficiency and speed of large-scale models such as Transformers and Mixture-of-Experts (MoE) architectures. Current research focuses on optimizing training methods for sparse models, developing novel activation functions that encourage sparsity, and exploring how sparse activation interacts with other efficiency techniques such as weight pruning and quantization. This work is significant because it offers substantial reductions in the computational cost and energy consumption of large language models and other deep learning applications, making them more accessible and sustainable.
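To make the phenomenon concrete, here is a minimal sketch (not drawn from any of the papers listed below) that measures the natural sparsity a ReLU feed-forward layer produces and then enforces a fixed sparsity level with a top-k mask; all names and sizes (hidden_dim, k, and so on) are illustrative assumptions.

```python
# Minimal sketch of sparse activation: measure how many activations are zero,
# and explicitly enforce sparsity by keeping only the top-k activations per row.
# Dimensions and variable names are illustrative, not taken from any cited paper.
import torch
import torch.nn.functional as F

def activation_sparsity(x: torch.Tensor) -> float:
    """Fraction of exactly-zero entries in an activation tensor."""
    return (x == 0).float().mean().item()

def topk_sparse_activation(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest activations in each row; zero out the rest."""
    _, indices = torch.topk(x, k, dim=-1)
    mask = torch.zeros_like(x).scatter_(-1, indices, 1.0)
    return x * mask

# Toy forward pass through one feed-forward block.
tokens, d_model, hidden_dim, k = 8, 64, 256, 32
W_in = torch.randn(d_model, hidden_dim) / d_model ** 0.5
h = F.relu(torch.randn(tokens, d_model) @ W_in)  # ReLU already zeroes many units
h_sparse = topk_sparse_activation(h, k)          # enforced: only k of hidden_dim active per token

print(f"ReLU sparsity:  {activation_sparsity(h):.2f}")
print(f"top-k sparsity: {activation_sparsity(h_sparse):.2f}")  # roughly 1 - k/hidden_dim
```

The efficiency argument follows directly: if only k of hidden_dim units are nonzero per token, the corresponding rows of the next weight matrix need not be read or multiplied, which is the same principle MoE models exploit by routing each token to a few experts.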
Papers
FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation
Ziwei Zhan, Wenkuan Zhao, Yuanqing Li, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Chuan Wu, Deke Guo, Xu Chen
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Yuxin Xiao, Chaoqun Wan, Yonggang Zhang, Wenxiao Wang, Binbin Lin, Xiaofei He, Xu Shen, Jieping Ye
QBI: Quantile-Based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning
Micha V. Nowak, Tim P. Bott, David Khachaturov, Frank Puppe, Adrian Krenzer, Amar Hekalo
Learning Neural Networks with Sparse Activations
Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka