Model Compression
Model compression aims to reduce the size and computational cost of large deep learning models, particularly large language models (LLMs) and vision transformers, without significant loss of performance. Current research focuses on techniques such as pruning (structured and unstructured), quantization, knowledge distillation, and architecture search, often applied to models like BERT, Llama, and ViT. These efforts are essential for deploying advanced models on resource-constrained devices and for making them more energy-efficient and accessible, in both scientific research and real-world applications. The field also emphasizes evaluation beyond simple accuracy, taking factors such as safety and robustness into account. A minimal code sketch of two of these techniques follows below.
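To make two of the terms above concrete, here is a minimal illustrative sketch, assuming PyTorch. The function names `distillation_loss` and `magnitude_prune_`, and all hyperparameters, are hypothetical choices for illustration and are not taken from any of the papers listed below: one function implements a standard soft-target knowledge-distillation loss, the other unstructured magnitude pruning.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not drawn from the papers listed in this digest.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target knowledge distillation combined with the usual hard-label loss."""
    # Soft-target term: student matches the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def magnitude_prune_(weight, sparsity=0.5):
    """Unstructured magnitude pruning: zero the smallest-|w| entries in place."""
    k = int(weight.numel() * sparsity)
    if k > 0:
        # k-th smallest absolute value serves as the pruning threshold.
        threshold = weight.abs().flatten().kthvalue(k).values
        weight.mul_((weight.abs() > threshold).to(weight.dtype))


# Toy usage on random tensors.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)

linear = torch.nn.Linear(128, 64)
with torch.no_grad():
    magnitude_prune_(linear.weight, sparsity=0.5)
```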
Papers
Automated Inference of Graph Transformation Rules
Jakob L. Andersen, Akbar Davoodi, Rolf Fagerberg, Christoph Flamm, Walter Fontana, Juri Kolčák, Christophe V. F. P. Laurent, Daniel Merkle, Nikolai Nøjgaard
Improve Knowledge Distillation via Label Revision and Data Selection
Weichao Lan, Yiu-ming Cheung, Qing Xu, Buhua Liu, Zhikai Hu, Mengke Li, Zhenghua Chen