Paper ID: 2407.12793

Data Collection and Labeling Techniques for Machine Learning

Qianyu Huang, Tongfang Zhao

Data collection and labeling are critical bottlenecks in the deployment of machine learning applications. As applications grow more complex and diverse, the need for efficient and scalable data collection and labeling techniques has become paramount. This paper reviews state-of-the-art methods for data collection, data labeling, and the improvement of existing data and models. By integrating perspectives from both the machine learning and data management communities, we aim to provide a holistic view of the current landscape and identify future research directions.

Submitted: Jun 19, 2024