Labeling Source

Research on labeling sources studies how to generate training data for machine learning models efficiently by using "weak" labels (inexpensive, noisy, or incomplete annotations) rather than relying solely on expensive manual labeling. Current work explores how to combine multiple weak sources, including rule-based systems, pre-trained models, and crowd-sourced data, often by using generative models (e.g., normalizing flows) or by adapting existing models to handle weak supervision. This work matters because data annotation is a bottleneck in many machine learning applications, and weak supervision makes it possible to train accurate models even when little high-quality labeled data is available.
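To make the idea of combining weak sources concrete, the following is a minimal sketch, not a method taken from any particular paper. It aggregates votes with a simple majority rule, which is a much simpler baseline than the generative approaches mentioned above. The spam/ham task, the three labeling functions, and all names (`ABSTAIN`, `lf_keyword`, `majority_vote`, `weak_label`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical label scheme for an illustrative spam-detection task.
ABSTAIN = -1
HAM, SPAM = 0, 1


def lf_keyword(text):
    """Rule-based source: flag a promotional phrase, otherwise abstain."""
    return SPAM if "free money" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Heuristic source: treat very short messages as benign."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


def lf_many_exclamations(text):
    """Heuristic source: heavy punctuation is a noisy spam signal."""
    return SPAM if text.count("!") >= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_keyword, lf_short_message, lf_many_exclamations]


def majority_vote(votes):
    """Combine noisy votes; abstain when no source votes or the top vote is tied."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(texts):
    """Apply every labeling function to each text and aggregate the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]


if __name__ == "__main__":
    examples = [
        "Claim your FREE MONEY now!!!",
        "See you soon",
        "The meeting has moved to Thursday afternoon",
    ]
    for text, label in zip(examples, weak_label(examples)):
        print(label, text)
```

Examples on which every source abstains, or on which the sources tie, stay unlabeled. In a real pipeline those examples would typically be dropped or passed to a model that estimates how reliable each source is, rather than weighting all sources equally as this sketch does.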

Papers