Automatic Evaluation
Automatic evaluation of generated text and other outputs from AI models, particularly large language models (LLMs), aims to create objective and efficient alternatives to expensive and time-consuming human assessment. Current research focuses on developing new metrics and frameworks that better correlate with human judgment, often leveraging LLMs themselves as "judges" or incorporating techniques like instruction tuning and preference optimization. These advancements are crucial for accelerating the development and deployment of AI systems across diverse fields, from scientific protocol generation to medical diagnosis and education, by providing reliable and scalable evaluation methods.
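To make the "LLM-as-a-judge" idea mentioned above concrete, the sketch below scores a candidate answer against a reference on a 1-5 scale by prompting a judge model. It is a minimal illustration under stated assumptions: the rubric prompt, the `llm_as_judge` function, and the `call_judge` placeholder are hypothetical and do not reproduce any specific paper's evaluation protocol.

```python
import re
from typing import Callable

# Hypothetical rubric prompt; real systems tune this wording carefully.
JUDGE_PROMPT = """You are an impartial evaluator. Rate the candidate answer
against the reference on a 1-5 scale for factual accuracy and completeness.
Reply with a single line of the form "Score: <1-5>".

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
"""

def llm_as_judge(question: str, reference: str, candidate: str,
                 call_judge: Callable[[str], str]) -> int:
    """Score a candidate answer with a judge LLM.

    `call_judge` is any function that sends a prompt string to a language
    model and returns its text reply (a placeholder assumption here).
    """
    prompt = JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )
    reply = call_judge(prompt)
    match = re.search(r"Score:\s*([1-5])", reply)
    if match is None:
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    return int(match.group(1))

if __name__ == "__main__":
    # Stub judge for demonstration only; swap in a real model call in practice.
    def fake_judge(prompt: str) -> str:
        return "Score: 4"

    print(llm_as_judge("What is 2 + 2?", "4", "The answer is 4.", fake_judge))
```

In practice the stub would be replaced by a call to a hosted or local model, and the judge's scores would be checked for agreement with a sample of human ratings (for example via rank correlation) before being relied on at scale.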