Automatic Evaluation
Automatic evaluation of generated text and other outputs from AI models, particularly large language models (LLMs), aims to provide objective and efficient alternatives to costly, time-consuming human assessment. Current research focuses on developing new metrics and frameworks that correlate more closely with human judgment, often leveraging LLMs themselves as "judges" or incorporating techniques such as instruction tuning and preference optimization. These advances matter because reliable, scalable evaluation accelerates the development and deployment of AI systems across diverse fields, from scientific protocol generation to medical diagnosis and education.
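To make the "LLM-as-a-judge" idea concrete, the sketch below shows the basic pattern: a judge model is prompted to rate each (instruction, response) pair on a fixed scale, and the parsed scores are aggregated. This is a minimal illustration, not any particular paper's method; the `call_judge` wrapper, the `JUDGE_PROMPT` template, and the helper names are hypothetical stand-ins for whatever LLM client and rubric are actually used.

```python
# Minimal LLM-as-a-judge sketch. `call_judge` is a hypothetical stand-in for
# whatever chat-completion API is available; plug in your provider's client.
import re
from statistics import mean

JUDGE_PROMPT = """You are an impartial evaluator.
Rate the RESPONSE to the INSTRUCTION on a 1-5 scale for helpfulness and accuracy.
Reply with a single integer only.

INSTRUCTION:
{instruction}

RESPONSE:
{response}
"""


def call_judge(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API call; returns the model's text reply."""
    raise NotImplementedError("plug in your LLM client here")


def score_response(instruction: str, response: str) -> int | None:
    """Ask the judge model for a 1-5 rating and parse the first digit it returns."""
    reply = call_judge(JUDGE_PROMPT.format(instruction=instruction, response=response))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None


def evaluate(pairs: list[tuple[str, str]]) -> float:
    """Average judge score over (instruction, response) pairs, skipping unparsable replies."""
    scores = [s for inst, resp in pairs if (s := score_response(inst, resp)) is not None]
    return mean(scores) if scores else float("nan")
```

In practice, such judge scores are typically validated against human ratings (e.g., via correlation on a held-out set) before being used as a substitute for human assessment.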
Papers
Entries dated September 17, 2024 through December 31, 2024.