Training Datasets
Training datasets are central to developing effective machine learning models, particularly large language and vision models, and their size and quality directly affect model performance, cost, and security. Current research focuses on optimizing dataset size and composition through techniques such as dataset distillation, pruning, and automated data generation, and on mitigating the memorization of biased or sensitive information in existing datasets through methods such as machine unlearning. These advances matter for model efficiency, robustness, and ethical use across diverse applications, from medical image analysis to natural language processing.
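To make the idea of dataset pruning concrete, here is a minimal sketch of one simple variant: keep only the highest-scoring fraction of examples within each class. It is an illustration under stated assumptions, not the method of any particular paper. The function name `prune_dataset`, the `keep_fraction` parameter, and the per-example scores (which might come from, say, a proxy model's loss) are all hypothetical.

```python
import math
from collections import defaultdict


def prune_dataset(examples, scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of examples within each class.

    examples: list of (features, label) pairs.
    scores: one float per example; higher means "more worth keeping".
            How scores are computed is an assumption left to the caller.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")

    # Group example indices by label so pruning preserves class balance.
    by_label = defaultdict(list)
    for idx, (_, label) in enumerate(examples):
        by_label[label].append(idx)

    kept = []
    for idxs in by_label.values():
        # Keep at least one example per class.
        n_keep = max(1, math.ceil(keep_fraction * len(idxs)))
        ranked = sorted(idxs, key=lambda i: scores[i], reverse=True)
        kept.extend(ranked[:n_keep])

    # Restore the original ordering of the surviving examples.
    kept.sort()
    return [examples[i] for i in kept]


examples = [("a", "cat"), ("b", "cat"), ("c", "dog"),
            ("d", "dog"), ("e", "dog"), ("f", "cat")]
scores = [0.9, 0.1, 0.4, 0.8, 0.2, 0.5]
pruned = prune_dataset(examples, scores, keep_fraction=0.5)
print(pruned)
```

Pruning per class rather than globally is a design choice made here so that a class with uniformly low scores is not removed entirely; other selection rules are equally plausible.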