Data Reduction

Data reduction decreases the size of a dataset while preserving the information needed for downstream tasks such as training machine learning models. Current research focuses on efficient algorithms that either select a representative subset of the original data or synthesize a smaller, high-fidelity dataset; approaches include attention-based methods and methods that leverage privileged information or loss-curvature matching. These techniques address the growing data volumes in fields ranging from climate science and high-energy physics to embedded systems and IoT applications, enabling resource-efficient model training and deployment. Because smaller datasets need less compute, they also lower the barrier to using AI technologies and reduce the environmental footprint of computationally intensive tasks.
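
As a rough illustration of what "selecting a representative subset" can mean in practice, the sketch below uses greedy k-center (farthest-point) selection. This is a generic textbook heuristic chosen for brevity, not one of the attention-based, privileged-information, or loss-curvature methods mentioned above; the function name `k_center_greedy` and the sample points are assumptions made for the example.

```python
import math


def k_center_greedy(points, k):
    """Return indices of k points chosen so every point lies near a chosen one.

    Illustrative sketch only: greedy farthest-point selection, starting
    from index 0 so the result is deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    selected = [0]
    # Distance from each point to its nearest selected point so far.
    nearest = [math.dist(p, points[0]) for p in points]

    while len(selected) < k:
        # Pick the point that is currently covered worst.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        # Update coverage distances with the newly selected point.
        nearest = [
            min(d, math.dist(p, points[idx]))
            for d, p in zip(nearest, points)
        ]
    return selected


# Hypothetical toy data: two tight clusters and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0),
    (10.0, 0.0),
]
subset = k_center_greedy(points, 3)
print(subset)
```

On this toy input the selection is `[0, 5, 3]`: one point from each cluster plus the isolated point, so three of the six points stand in for the whole set. Real data-reduction methods typically score points by their effect on model training rather than by raw distance alone.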

Papers