Sparse Double Descent
Sparse double descent describes a non-monotonic relationship between model sparsity and generalization in neural networks: as sparsity increases, test performance first degrades, then recovers, and finally collapses at extreme sparsity. Current research focuses on characterizing this phenomenon across architectures, including convolutional neural networks and vision transformers, and on mitigation strategies such as L2 regularization and knowledge distillation. This work matters because it challenges conventional wisdom about pruning and highlights the complex interplay between model size, sparsity, and generalization, with implications for efficient model design and resource allocation in machine learning.
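A minimal sketch of how the sparsity-versus-error curve is typically traced: sweep a range of sparsity levels, apply global magnitude pruning at each level, retrain briefly, and record test error. This assumes PyTorch and user-supplied `dense_model`, `finetune`, and `test_loader` objects; it illustrates the measurement setup, not the exact protocol of any cited paper.

```python
# Hypothetical sketch: trace test error as a function of sparsity.
# A sparse double descent curve shows error rising, then dipping,
# then rising again as sparsity grows toward 100%.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def magnitude_prune(model: nn.Module, sparsity: float) -> nn.Module:
    """Globally prune the smallest-magnitude weights to the target sparsity."""
    pruned = copy.deepcopy(model)
    params = [(m, "weight") for m in pruned.modules()
              if isinstance(m, (nn.Linear, nn.Conv2d))]
    prune.global_unstructured(params,
                              pruning_method=prune.L1Unstructured,
                              amount=sparsity)
    return pruned


@torch.no_grad()
def test_error(model: nn.Module, loader) -> float:
    """Fraction of misclassified test examples."""
    model.eval()
    wrong, total = 0, 0
    for x, y in loader:
        wrong += (model(x).argmax(dim=1) != y).sum().item()
        total += y.numel()
    return wrong / total


def sparsity_sweep(dense_model, finetune, test_loader,
                   levels=(0.0, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99)):
    """Return (sparsity, test error) pairs across the sweep."""
    curve = []
    for s in levels:
        model = magnitude_prune(dense_model, s)
        finetune(model)  # user-supplied retraining after pruning (assumption)
        curve.append((s, test_error(model, test_loader)))
    return curve
```

Plotting the returned pairs makes the non-monotonic shape visible; stronger L2 regularization during `finetune` is one of the mitigation strategies studied for flattening the intermediate bump.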