Sparse Activation

Sparse activation, the phenomenon where only a small subset of neurons are active during neural network processing, is a key research area aiming to improve the efficiency and speed of large-scale models like Transformers and Mixture-of-Experts (MoE) architectures. Current research focuses on optimizing training methods for sparse models, developing novel activation functions to enhance sparsity, and exploring the interplay between sparse activation and other efficiency techniques like weight pruning and quantization. This research is significant because it offers the potential for substantial reductions in computational cost and energy consumption for large language models and other deep learning applications, making them more accessible and sustainable.

Papers

February 6, 2024

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun
Large Language Model Activation Function Sparse Activation ReLU Neural Network Sparse Computation

February 5, 2024

Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Takashi Furuya, Marc T. Law
Deep Learning Model Expert Knowledge Mixture of Expert ReLU Layer Sparse Activation Traditional Deep Learning Approximation Rate

February 2, 2024

Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection
Lennard Bodden, Franziska Schwaiger, Duc Bach Ha, Lars Kreuzberg, Sven Behnke
Sparse Activation Spiking Element Wise Distillation Learning

December 16, 2023

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen
Large Language Model Large Language Model Inference Sparse Activation Inference System Hybrid Inference Consumer Level GPUs

November 13, 2023

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
Neuromorphic Computing Activation Sparsity Sparse Activation RNN Inference

November 7, 2023

Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Jan Finkbeiner, Thomas Gmeinder, Mark Pupilli, Alexander Titterton, Emre Neftci
Many Sparse Faster Training Activation Sparsity Sparse Activation Tensor Processing Unit AI Hardware Sparse Tensor Shared Memory

October 11, 2023

Measuring Feature Sparsity in Language Models
Mingyang Deng, Lucas Tao, Joe Benton
Language Model Sparse Activation Feature Sparsity Sparse Linear

June 19, 2023

Sparse Modular Activation for Efficient Sequence Modeling
Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai
Long Sequence Attention Module Sequence Model Sparse Activation Sequence Modeling Task

May 20, 2023

Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song
Language Model Knowledge Distillation Pretrained Language Model Reversal Curse Sparse Activation Capacity Gap

May 18, 2023

Learning Activation Functions for Sparse Neural Networks
Mohammad Loni, Aditya Mohan, Mehdi Asadi, Marius Lindauer
Activation Function DNN Model Sparse Network Sparse Neural Network Sparse Activation

May 17, 2023

Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks
Chanwoo Chun, Daniel D. Lee
Sparse Network Wide Neural Network Sparse Activation Neural Network Gaussian Process Deep Bayesian

May 3, 2023

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami
Transformer Based Mixture of Expert Language Pair Sparse Activation Translation Benchmark Hierarchical Bias Driven Stratification Capacity Gap

April 11, 2023

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang
Inference Efficiency Fast Inference Parameter Efficient Transfer Learning Sparse Activation Inference Speedup Conditional Adapter

February 3, 2023

SPARLING: Learning Latent Representations with Extremely Sparse Activations
Kavi Gupta, Osbert Bastani, Armando Solar-Lezama
Latent Representation Activation Sparsity Sparse Activation High Sparsity Sparse Tensor

January 1, 2023

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK
Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Yingbin Liang, Zhangyang Wang
Strong Generalization Early Stage Convergence Neural Tangent Kernel ReLU Network Sparse Network Wide Neural Network Sparse Activation Relative Stance Bias

December 10, 2022

SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing
Chaoyang He, Shuai Zheng, Aston Zhang, George Karypis, Trishul Chilimbi, Mahdi Soltanolkotabi, Salman Avestimehr
Mixture of Expert Potential Scalability Dynamic Routing Sparse Activation Heterogeneous Network Happy Image

November 16, 2022

Learning unfolded networks with a cyclic group structure
Emmanouil Theodosis, Demba Ba
LeArning Abstract Deep Neural Network Network Programming Equivariant Neural Network Sparse Activation Interpretable Network Intensional Group

October 12, 2022

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
Transformer Megatron Decepticons Multi Layer Many Sparse Sparsity Increase Path Breaking Emergence Multi Layer Perceptron Activation Sparsity Sparse Activation Dead Neuron

August 16, 2022

Universal Solutions of Feedforward ReLU Networks for Interpolations
Changcun Huang
Deep Neural Network ReLU Network Interpolation Regime Sparse Matrix Sparse Activation

August 4, 2022

Towards Understanding Mixture of Experts in Deep Learning
Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li
Deep Learning Mixture Component Expert Knowledge Mixture of Expert Sparse Activation Two Layer