Model Compression

Model compression aims to reduce the size and computational cost of large deep learning models, particularly large language models (LLMs) and vision transformers, without significant loss of performance. Current research focuses on techniques such as pruning (structured and unstructured), quantization, knowledge distillation, and novel architecture search methods, often applied to models like BERT, Llama, and ViT. These efforts are crucial for deploying advanced AI models on resource-constrained devices and for making them more energy-efficient and accessible, with impact on both scientific research and real-world applications. The field also emphasizes the need for evaluation metrics beyond raw accuracy, including safety and robustness under compression.
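To make one of the named techniques concrete, the sketch below shows symmetric per-tensor int8 post-training quantization, the simplest form of quantization mentioned above: weights are rescaled to the int8 range and stored as 8-bit integers plus one float scale, roughly quartering memory versus float32. This is an illustrative NumPy sketch, not the implementation used by any particular paper or library; function names are my own.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = float(np.abs(w).max()) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure round-trip error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2 (rounding)
```

Per-channel scales and activation quantization follow the same idea with finer-grained scale factors; pruning, by contrast, zeroes out low-magnitude weights rather than reducing their precision.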

Papers