LLM Compression
LLM compression aims to reduce the substantial computational and memory demands of large language models (LLMs) while preserving their performance. Current research focuses on techniques such as pruning, quantization, and low-rank decomposition, often applied to models such as LLaMA; it explores the trade-offs between compression ratio and accuracy across downstream tasks and evaluates the impact of compression on model safety and fairness. This field is crucial for enabling the deployment of LLMs on resource-constrained devices and for improving their accessibility and efficiency in real-world applications.
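To make the quantization idea concrete, the following is a minimal sketch of symmetric per-channel INT8 round-to-nearest weight quantization, one of the simplest post-training compression schemes. It is an illustrative example, not the method of any particular paper cited here; the function names and shapes are assumptions chosen for clarity.

```python
import numpy as np

def quantize_per_channel_int8(weight: np.ndarray):
    """Symmetric per-output-channel INT8 round-to-nearest quantization.

    weight: 2-D array of shape (out_features, in_features).
    Returns the INT8 weights and the per-channel scales needed to
    approximately reconstruct the original floating-point values.
    """
    # One scale per output channel, chosen so the largest magnitude maps to 127.
    max_abs = np.max(np.abs(weight), axis=1, keepdims=True)
    scale = np.where(max_abs == 0, 1.0, max_abs / 127.0)

    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct an approximation of the original weights.
    return q.astype(np.float32) * scale

# Toy example: quantize a small "layer" and measure reconstruction error.
w = np.random.randn(8, 16).astype(np.float32)
q, s = quantize_per_channel_int8(w)
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w - w_hat).mean())
```

Real LLM quantization methods go further (e.g., calibrating on activations or using lower bit widths), but they trade off reconstruction error against memory savings in the same basic way.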