Structured Sparsity
Research on structured sparsity in neural networks focuses on strategically removing groups of parameters to reduce computational cost and memory footprint without significantly sacrificing accuracy. Current work emphasizes efficient algorithms for inducing and exploiting this sparsity in large language models (LLMs) and convolutional neural networks (CNNs), often employing techniques such as N:M sparsity, block sparsity, and various pruning methods combined with quantization. This area is crucial for deploying large models on resource-constrained devices and for improving the efficiency of training and inference, affecting both the scalability of AI and its energy consumption.
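To make the N:M pattern mentioned above concrete, here is a minimal sketch (not taken from any of the listed papers) of magnitude-based 2:4 sparsity: within every contiguous group of 4 weights along a row, only the 2 with the largest magnitude are kept. The function and variable names are illustrative assumptions, not an API from the cited work.

```python
import torch


def apply_nm_sparsity(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Return a copy of `weight` with N:M sparsity applied along the last dimension.

    Illustrative sketch: keep the n largest-magnitude entries in every group of m.
    """
    out_features, in_features = weight.shape
    assert in_features % m == 0, "last dimension must be divisible by the group size m"

    # Split each row into groups of m consecutive weights.
    groups = weight.reshape(out_features, in_features // m, m)

    # Indices of the n largest-magnitude weights in each group.
    _, keep_idx = groups.abs().topk(n, dim=-1)

    # Build a 0/1 mask that keeps only those weights.
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep_idx, 1.0)

    return (groups * mask).reshape(out_features, in_features)


# Example: prune a small dense layer to 2:4 sparsity (50% of the weights removed,
# in a regular pattern that sparsity-aware hardware can exploit).
w = torch.randn(8, 16)
w_sparse = apply_nm_sparsity(w)
assert (w_sparse.reshape(8, -1, 4) != 0).sum(dim=-1).max() <= 2
```

The regular group structure is what distinguishes this from unstructured pruning: because every group of 4 contains exactly 2 nonzeros, the pattern can be stored compactly and accelerated directly, rather than relying on irregular sparse formats.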
Papers
(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork
Tianjin Huang, Fang Meng, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Shiwei Liu, Tianlong Chen
Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization
Derek Fox, Samuel Hernandez, Qianqian Tong