Adaptive Compression
Adaptive compression techniques aim to reduce the size and computational cost of machine learning models, particularly deep neural networks, while minimizing performance degradation. Current research develops adaptive compression methods for a range of architectures, including transformers, variational autoencoders, and convolutional neural networks, often employing strategies such as structured pruning, quantization, and low-rank approximation tailored to specific model characteristics or to dynamic bandwidth conditions. These advances are crucial for deploying large models on resource-constrained devices (e.g., edge computing) and for improving the efficiency of distributed training in federated learning settings. The resulting gains in computational efficiency and bandwidth usage have significant implications for applications such as speech recognition, image processing, and anomaly detection.
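To make the ideas above concrete, the sketch below combines two of the building blocks the paragraph mentions: top-k sparsification whose budget adapts to available bandwidth (as in gradient compression for federated or geo-distributed training), followed by uniform 8-bit quantization of the surviving values. This is a minimal, hypothetical illustration, not the method of any paper listed below; all function names are ours.

```python
def quantize_uint8(values):
    """Uniform 8-bit quantization: map each float to an integer code in [0, 255].

    Returns the codes plus the (scale, offset) needed to dequantize.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Invert quantize_uint8; per-element error is at most scale / 2."""
    return [c * scale + lo for c in codes]


def adaptive_topk(grad, budget_ratio):
    """Keep only the largest-magnitude entries of a gradient vector.

    budget_ratio (0, 1] models the current bandwidth budget: a congested
    link gets a smaller ratio, so fewer coordinates are transmitted.
    """
    k = max(1, int(len(grad) * budget_ratio))
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx = sorted(top)
    return idx, [grad[i] for i in idx]
```

In a federated setting, each client would send only the selected indices and their quantized values; the server dequantizes and scatters them back into a dense update. Halving `budget_ratio` roughly halves the payload at the cost of a coarser update.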
Papers
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu
Test-time adaptation for image compression with distribution regularization
Kecheng Chen, Pingping Zhang, Tiexin Qin, Shiqi Wang, Hong Yan, Haoliang Li