Dynamic Sparsity
Dynamic sparsity in neural networks reduces computational cost by activating only a subset of network parameters during training or inference. Current research explores dynamic sparsity across architectures including large language models, vision transformers, and convolutional neural networks, using techniques such as structured and unstructured pruning, dynamic layer routing, and sample-aware fine-tuning. By improving the efficiency and scalability of deep learning models, this approach enables deployment on resource-constrained devices and accelerates training while maintaining or even improving performance; the resulting smaller, faster models are particularly impactful in edge computing and other low-power, resource-limited settings.
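To make the core mechanism concrete, the sketch below shows one common way to realize dynamic sparsity: a linear layer that recomputes a magnitude-based top-k mask over its weights on every forward pass, so the active subset of parameters can change as training proceeds. This is a minimal sketch assuming PyTorch; the DynamicSparseLinear class, the 90% sparsity level, and the top-k magnitude criterion are illustrative assumptions and are not taken from any of the papers listed below.

```python
# Minimal sketch of dynamic weight sparsity: at every forward pass, only the
# top-k weights by magnitude stay active. The class name, sparsity level, and
# selection criterion are illustrative assumptions, not a specific published method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSparseLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.sparsity = sparsity  # fraction of weights deactivated at each step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recompute the mask on every call, so the active parameter subset
        # changes as the weights evolve during training ("dynamic" sparsity).
        k = max(1, int(self.weight.numel() * (1.0 - self.sparsity)))
        threshold = torch.topk(self.weight.abs().flatten(), k).values[-1]
        mask = (self.weight.abs() >= threshold).to(self.weight.dtype)
        return F.linear(x, self.weight * mask, self.bias)


layer = DynamicSparseLinear(128, 64, sparsity=0.9)
out = layer(torch.randn(32, 128))  # only ~10% of the weights participate
```

Because the mask is rebuilt each step rather than fixed after pruning, previously inactive weights can re-enter the active set if their magnitudes grow, which is the key difference from static one-shot pruning.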
Papers
Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao