Human-Generated Data
Human-generated data is crucial for training machine learning models, particularly large language models (LLMs), but its limitations, including bias, scarcity in specific domains, and high annotation costs, drive much current research. Active research areas include generating high-quality synthetic data, assessing and mitigating biases in existing datasets and models, and exploring alternative sources such as robot-collected or game-generated data to supplement or replace human-annotated data. This research is vital for improving the accuracy, fairness, and scalability of AI systems across diverse applications, from medical diagnosis to content generation.
Papers
November 4, 2024
August 28, 2024
June 24, 2024
May 7, 2024
March 28, 2024
March 25, 2024
January 25, 2024
January 15, 2024
December 11, 2023
December 3, 2023
September 25, 2023
July 14, 2023
June 2, 2023
May 15, 2023
April 14, 2023
October 26, 2022
September 16, 2022
July 12, 2022
June 11, 2022
April 11, 2022