Functional Compression
Functional compression aims to reduce the size and computational cost of machine learning models, particularly large language models and deep neural networks, without significant loss of accuracy. Current research focuses on post-training techniques such as pruning, quantization, and knowledge distillation, alongside low-rank decomposition and compression-aware optimization, applied to architectures including transformers and convolutional neural networks. These advances are crucial for deploying sophisticated models on resource-constrained hardware such as mobile phones and embedded systems, broadening where and how AI can be applied. Research is also exploring compression methods tailored to specific data types, such as 3D scene graphs and user-item interaction data in recommendation systems.
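As a minimal sketch of what post-training compression can look like in practice, the snippet below applies three of the techniques named above to a single dense weight matrix: magnitude pruning, symmetric int8 quantization, and low-rank decomposition via truncated SVD. The function names and the 512x512 layer are illustrative assumptions, not drawn from any particular system surveyed here.

```python
import numpy as np

def magnitude_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured pruning: zero the smallest-magnitude fraction of weights."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(W: np.ndarray):
    """Symmetric per-tensor int8 quantization: W ~= scale * q."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def low_rank_decompose(W: np.ndarray, rank: int):
    """Truncated SVD: W ~= A @ B, shrinking m*n parameters to rank*(m+n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # shape (m, rank)
    B = Vt[:rank, :]             # shape (rank, n)
    return A, B

# Illustrative 512x512 dense layer (hypothetical, randomly initialized).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)

W_sparse = magnitude_prune(W, sparsity=0.9)   # keep ~10% of the weights
q, scale = quantize_int8(W)                   # 4 bytes/weight -> 1 byte/weight
A, B = low_rank_decompose(W, rank=64)         # 262,144 -> 65,536 parameters

print("nonzeros after pruning:", np.count_nonzero(W_sparse))
print("quantization error:", np.linalg.norm(W - scale * q) / np.linalg.norm(W))
print("rank-64 error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

Each transform trades a controllable amount of reconstruction error for a cheaper representation: pruning yields sparsity that sparse kernels can exploit, quantization shrinks every weight to one byte, and a rank-r factorization of an m-by-n matrix cuts its parameter count to r(m+n).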