Sparse Double Descent

Sparse double descent describes a non-monotonic relationship between model sparsity and generalization in neural networks: as sparsity increases, test performance first worsens, then recovers, and finally collapses at extreme sparsity. Current research focuses on characterizing this phenomenon across architectures, including convolutional neural networks and vision transformers, and on exploring mitigation strategies such as L2 regularization and knowledge distillation. This work matters because it challenges the conventional assumption that pruning degrades accuracy monotonically, and it highlights the complex interplay between model size, sparsity, and generalization, with practical consequences for efficient model design and for deciding how aggressively a network can be pruned.
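
The curve is typically traced empirically by pruning a trained network to a range of sparsity levels and measuring test accuracy after fine-tuning at each level. The sketch below shows one way this sweep might look using one-shot global magnitude pruning in PyTorch; it is an illustrative sketch, not the procedure of any specific paper, and the `train_fn`, `eval_fn` helpers and the sparsity grid are assumed placeholders.

```python
# Sketch: sweep sparsity levels with global magnitude pruning and record test
# accuracy, the kind of measurement that exposes a sparse double descent curve.
# `base_model`, `train_fn` (fine-tuning loop), and `eval_fn` (returns test
# accuracy) are assumed to be provided by the caller.
import copy
import torch
import torch.nn.utils.prune as prune

def sparsity_sweep(base_model, train_fn, eval_fn,
                   sparsities=(0.0, 0.5, 0.8, 0.9, 0.95, 0.99)):
    """Prune-and-retrain at each sparsity level; return (sparsity, accuracy) pairs."""
    results = []
    for s in sparsities:
        model = copy.deepcopy(base_model)
        # Collect the weight tensors of all prunable layers.
        params = [(m, "weight") for m in model.modules()
                  if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
        if s > 0:
            # One-shot global magnitude pruning to the target sparsity.
            prune.global_unstructured(params,
                                      pruning_method=prune.L1Unstructured,
                                      amount=s)
        train_fn(model)                  # fine-tune the pruned network
        results.append((s, eval_fn(model)))
    return results
```

Plotting the resulting (sparsity, accuracy) pairs on a dense enough grid would show the dip-then-recovery shape described above; the exact location of the dip depends on the architecture, dataset, and training budget.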

Papers