Structured Compression
Structured compression aims to reduce the size and computational cost of large language models (LLMs) and other deep neural networks without significantly sacrificing performance. Current research focuses on developing efficient algorithms, such as low-rank matrix approximations and structured pruning, often applied to specific Transformer sub-layers or tailored to different model architectures (e.g., BERT, GPT). These techniques are crucial for deploying large models on resource-constrained devices and improving training efficiency, impacting both the scalability of AI research and the accessibility of powerful AI applications.
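One common structured-compression technique mentioned above is low-rank matrix approximation, where a large weight matrix is replaced by the product of two much smaller matrices. Below is a minimal sketch of this idea in PyTorch, assuming a standard nn.Linear layer; the rank value and layer sizes are illustrative, not taken from any specific paper.

```python
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense linear layer with two smaller ones via truncated SVD.

    The weight W (out_features x in_features) is approximated as U_r S_r V_r^T,
    so y = W x + b becomes y = U_r (S_r V_r^T x) + b, reducing parameters from
    out*in to roughly rank*(out + in) when the rank is small.
    """
    W = linear.weight.data                     # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # fold singular values into the left factor
    V_r = Vh[:rank, :]

    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return nn.Sequential(first, second)

# Example: compress a hypothetical feed-forward projection from a Transformer block.
layer = nn.Linear(4096, 1024)
compressed = low_rank_factorize(layer, rank=64)
x = torch.randn(2, 4096)
print(compressed(x).shape)  # same output shape, far fewer parameters
```

In practice, the rank is chosen per layer to balance compression ratio against accuracy loss, and the factored model is usually fine-tuned briefly to recover performance.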