Dataset Quantization

Dataset quantization aims to reduce the size of training datasets for deep learning models, mitigating computational costs and memory limitations without significant performance loss. Current research focuses on developing efficient quantization techniques, including adaptive sampling strategies guided by active learning and novel vector quantization methods like low-rank representations, often incorporating knowledge distillation from dense embeddings to improve retrieval performance. These advancements are crucial for enabling the training of large models on resource-constrained hardware and expanding the accessibility of deep learning to a wider range of applications.

Papers