Unsupervised Data Selection
Unsupervised data selection identifies the most informative subset of an unlabeled data pool for training machine learning models, which matters most in low-resource scenarios where labeled data is scarce. Current research focuses on selection criteria, commonly perplexity, contrastive loss ratios, or divergence measures between data distributions, and on algorithms that use these scores to guide the selection process. By making readily available unlabeled data usable, these methods improve the efficiency and performance of applications such as speech recognition, machine translation, and text-to-speech systems.
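To make the idea of a contrastive selection criterion concrete, the sketch below scores each candidate sentence by the difference between its per-token cross-entropy under a target-domain model and under a general model, then keeps the lowest-scoring candidates. It is an illustration under stated assumptions, not the method of any particular paper: the add-one-smoothed unigram models stand in for real language or acoustic models, a difference of cross-entropies stands in for a loss ratio, and the function names (`train_unigram`, `cross_entropy`, `select_by_cross_entropy_difference`) are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return (token counts, total tokens) for whitespace-tokenized text."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model, vocab_size):
    """Per-token cross-entropy (nats) under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )
    return -log_prob / len(tokens)


def select_by_cross_entropy_difference(pool, target_sample, general_sample, k):
    """Keep the k pool sentences that look most like the target sample.

    Hypothetical helper: a low score means the sentence is better explained
    by the target-domain model than by the general model.
    """
    target = train_unigram(target_sample)
    general = train_unigram(general_sample)

    # Shared vocabulary so both models smooth over the same event space.
    vocab = set(target[0]) | set(general[0])
    for sentence in pool:
        vocab.update(sentence.lower().split())
    vocab_size = len(vocab)

    def score(sentence):
        tokens = sentence.lower().split()
        if not tokens:
            return float("inf")  # empty candidates are ranked last
        return (cross_entropy(tokens, target, vocab_size)
                - cross_entropy(tokens, general, vocab_size))

    return sorted(pool, key=score)[:k]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    target_sample = ["turn on the lights", "play some music", "set an alarm"]
    general_sample = [
        "the stock market fell today",
        "the team won the match",
        "rain is expected tomorrow",
    ]
    pool = [
        "play the music",
        "the market fell",
        "set the alarm please",
        "the match was today",
    ]
    for sentence in select_by_cross_entropy_difference(
        pool, target_sample, general_sample, k=2
    ):
        print(sentence)
```

No labels are used at any point: the only supervision is a small sample of target-domain text. In practice the unigram models would be replaced by stronger scorers, and the cutoff `k` would be tuned on held-out data.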