Human-Generated Data
Human-generated data is crucial for training machine learning models, particularly large language models (LLMs). However, its limitations (bias, scarcity in specific domains, and high annotation costs) drive much of the current research. Active areas include developing methods to generate high-quality synthetic data, assessing and mitigating biases in existing datasets and models, and exploring alternative data sources, such as robot-collected or game-generated data, to supplement or replace human-annotated data. This research is vital for improving the accuracy, fairness, and scalability of AI systems in applications ranging from medical diagnosis to content generation.