Data Recycling

Data recycling, the reuse of existing data to improve model training or efficiency, is an emerging research area that aims to raise machine learning performance while reducing computational cost. Current work reuses previously collected data for several purposes: improving large language model (LLM) controllability and instruction following, strengthening synthetic data generation for training classifiers, and speeding up algorithms such as Word2Vec. These techniques show promise for improving model accuracy, privacy, and efficiency in applications ranging from software development and medical image analysis to electronic waste recycling and online advertising.

Papers