LLM Compression
LLM compression aims to reduce the substantial computational and memory demands of large language models (LLMs) while preserving their performance. Current research focuses on techniques such as pruning, quantization, and low-rank decomposition, often applied to models like LLaMA, and examines the trade-off between compression ratio and accuracy across downstream tasks, as well as the impact of compression on model safety and fairness. This work is crucial for deploying LLMs on resource-constrained devices and for improving their accessibility and efficiency in real-world applications; a small sketch of the three core techniques follows below.
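As a rough illustration of what these techniques do to a weight matrix, the NumPy sketch below applies magnitude pruning, per-tensor symmetric int8 quantization, and SVD-based low-rank decomposition to a single toy matrix. The helper names, sparsity, and rank values are illustrative assumptions, not the method of any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)  # toy weight matrix

# 1) Magnitude pruning: zero out the fraction of weights with smallest |value|.
def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

# 2) Symmetric int8 quantization: map floats to int8 with one per-tensor scale.
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(np.float32) * scale

# 3) Low-rank decomposition: approximate W by a product of two thin matrices.
def low_rank(w: np.ndarray, rank: int):
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (out_features, rank)
    b = vt[:rank, :]             # shape (rank, in_features)
    return a, b                  # W is approximated by a @ b

W_pruned = magnitude_prune(W, sparsity=0.5)
q, scale = quantize_int8(W)
A, B = low_rank(W, rank=64)

print("fraction pruned:", np.mean(W_pruned == 0))
print("mean quantization error:", np.abs(W - q.astype(np.float32) * scale).mean())
print("mean low-rank error:", np.abs(W - A @ B).mean())
```

Each of these trades storage or compute for approximation error; in practice the compression ratio (sparsity level, bit width, or rank) is tuned against accuracy on downstream tasks.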