Compression Techniques
Model compression techniques aim to reduce the size and computational cost of large machine learning models, such as Large Language Models (LLMs) and Convolutional Neural Networks (CNNs), without significant performance degradation. Current research focuses on pruning, quantization, and low-rank approximation, often combined with knowledge distillation and optimized data encoding, and evaluated across diverse architectures including LLMs, CNNs, and point cloud transformers. These advances are crucial for deploying sophisticated AI models on resource-constrained devices, improving efficiency in federated learning, and mitigating the environmental impact of large-scale AI training and inference.
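As a rough illustration of the three compression families named above, the sketch below applies magnitude pruning, dynamic int8 quantization, and a rank-r SVD factorization to a toy PyTorch model. The layer sizes, sparsity ratio, and target rank are illustrative assumptions, not settings taken from any of the listed papers.

# Minimal sketch: pruning, quantization, and low-rank approximation in PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; sizes are arbitrary placeholders.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# 1. Pruning: zero out the 30% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the sparsity into the weight tensor

# 2. Quantization: post-training dynamic quantization of Linear layers to int8.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 3. Low-rank approximation: replace one weight matrix with a rank-64 SVD factorization.
W = model[0].weight.data                       # shape (512, 512)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
r = 64                                         # target rank (assumed)
W_lowrank = (U[:, :r] * S[:r]) @ Vh[:r, :]     # rank-r reconstruction
print("rank-64 reconstruction error:", torch.norm(W - W_lowrank).item())

In practice these steps are tuned and combined per architecture (for example, pruning followed by quantization-aware fine-tuning), and the acceptable sparsity or rank depends on the accuracy budget of the target deployment.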
Papers
Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling
Xihaier Luo, Samuel Lurvey, Yi Huang, Yihui Ren, Jin Huang, Byung-Jun Yoon
Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior
Yi Yu, Yufei Wang, Wenhan Yang, Lanqing Guo, Shijian Lu, Ling-Yu Duan, Yap-Peng Tan, Alex C. Kot
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Shuai Wang, Wei Gong
Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks
Samer Francy, Raghubir Singh