Activation Sparsity
Activation sparsity is the phenomenon in which only a small fraction of a neural network's neurons are active for any given input. It has become a key research area for improving the efficiency of deep learning models, particularly large language models (LLMs). Current research focuses on methods to induce and leverage this sparsity during both training and inference, exploring techniques such as activation thresholding, specialized activation functions (e.g., ReLU variants), and Mixture-of-Experts (MoE) architectures. This work matters because it promises to reduce computational cost and memory requirements, enabling faster and more energy-efficient deployment of large models on resource-constrained hardware, including edge devices and mobile phones.
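To make the thresholding idea concrete, here is a minimal sketch in PyTorch, not taken from any of the papers listed below; the module, layer sizes, and threshold value are illustrative assumptions. It measures how many hidden activations a ReLU feed-forward block already zeroes out and applies an extra magnitude threshold to increase that sparsity at inference time.

import torch
import torch.nn as nn


class ThresholdedReLUFFN(nn.Module):
    """Feed-forward block whose hidden activations are sparsified by thresholding (illustrative)."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048, threshold: float = 0.1):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.up(x))
        # Zero out small activations; rows of `down.weight` matching zeroed
        # positions never need to be read, which is where inference savings come from.
        h = torch.where(h.abs() > self.threshold, h, torch.zeros_like(h))
        return self.down(h)


if __name__ == "__main__":
    torch.manual_seed(0)
    ffn = ThresholdedReLUFFN()
    x = torch.randn(4, 512)  # batch of 4 token embeddings
    h = torch.relu(ffn.up(x))
    print(f"ReLU sparsity (pre-threshold): {(h == 0).float().mean().item():.2%}")
    print("output shape:", tuple(ffn(x).shape))

In practice, the savings depend on kernels that skip the zeroed rows and columns of the weight matrices; the sketch only shows how the sparsity is induced and measured.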
Papers
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Filip Szatkowski, Bartosz Wójcik, Mikołaj Piórczyński, Simone Scardapane