Hierarchical Pruning
Hierarchical pruning is a network compression technique that aims to reduce the computational cost and memory footprint of large neural networks, such as large language models and diffusion models, without significant performance degradation. Current research focuses on efficient algorithms that prune networks at multiple levels of granularity (e.g., channels, attention heads, whole layers), often employing optimization-based methods or metrics such as focal diversity to guide the pruning process. These advances matter because they enable the deployment of complex models on resource-constrained devices and improve training and inference efficiency, with impact on applications ranging from image generation to natural language processing.
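To make the "multiple levels" idea concrete, below is a minimal PyTorch sketch of two-level pruning: channels within each linear layer are masked by L1-norm importance, and then whole layers with the least remaining weight mass are dropped. The function names, pruning ratios, and the L1 criterion are illustrative assumptions, not the method of any specific paper listed here; published approaches typically replace these heuristics with optimization-based or diversity-based importance scores.

import torch
import torch.nn as nn

def channel_scores(linear: nn.Linear) -> torch.Tensor:
    # Per-output-channel importance: L1 norm of each weight row (illustrative criterion).
    return linear.weight.detach().abs().sum(dim=1)

def prune_channels(linear: nn.Linear, ratio: float) -> None:
    # Level 1 (fine-grained): zero out the lowest-scoring output channels (mask-style pruning).
    scores = channel_scores(linear)
    k = int(ratio * scores.numel())
    if k == 0:
        return
    _, idx = torch.topk(scores, k, largest=False)
    with torch.no_grad():
        linear.weight[idx, :] = 0.0
        if linear.bias is not None:
            linear.bias[idx] = 0.0

def prune_hierarchically(model: nn.Sequential,
                         channel_ratio: float = 0.3,
                         layer_ratio: float = 0.5) -> nn.Sequential:
    linears = [m for m in model if isinstance(m, nn.Linear)]

    # Level 1: prune channels inside every linear layer.
    for lin in linears:
        prune_channels(lin, channel_ratio)

    # Level 2 (coarse): rank whole layers by remaining weight mass and drop the weakest.
    # Only square (in_features == out_features) layers are candidates, so removing one
    # keeps the surrounding tensor shapes compatible.
    candidates = [lin for lin in linears if lin.in_features == lin.out_features]
    n_drop = int(layer_ratio * len(candidates))
    ranked = sorted(candidates, key=lambda lin: lin.weight.detach().abs().sum().item())
    dropped = {id(lin) for lin in ranked[:n_drop]}

    return nn.Sequential(*[m for m in model if id(m) not in dropped])

# Toy usage: a small MLP with equal-width hidden layers (hypothetical example model).
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
pruned = prune_hierarchically(model, channel_ratio=0.3, layer_ratio=0.5)
print(pruned)
print(pruned(torch.randn(2, 128)).shape)  # output shape is preserved: (2, 10)

In practice the two levels interact: scores computed after fine-grained pruning inform which coarse units (layers or heads) can be removed, which is why hierarchical methods typically alternate or jointly optimize the levels rather than pruning each one in isolation.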
Papers
June 17, 2024
June 15, 2024
April 5, 2024
March 19, 2024
December 23, 2023
November 17, 2023