Data Annotation
Data annotation, the process of labeling data for training machine learning models, is a crucial yet resource-intensive task. Current research aims to improve annotation efficiency and quality through techniques such as using large language models (LLMs) for automated labeling, building interactive tools for exploring and annotating unstructured data, and applying active learning strategies to prioritize which examples to annotate. These advances matter for the performance and reliability of machine learning models across diverse applications, from healthcare and finance to social media analysis and autonomous vehicles, and they also help address annotator bias and annotation cost.
Papers
How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments
Angela Schöpke-Gonzalez, Siqi Wu, Sagar Kumar, Paul J. Resnick, Libby Hemphill
Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection
Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury