Automatic Evaluation Metric
Automatic evaluation metrics aim to objectively assess the quality of generated text, or of other outputs such as images or radiology reports, typically by quantifying their similarity to human-written references. Current research focuses on developing metrics that are robust to common generation flaws such as hallucinations, that correlate better with human judgments, and that adapt across diverse tasks and languages, often leveraging large language models (LLMs) to improve performance. These advances are crucial for accelerating the development and deployment of natural language generation and other AI systems: they provide efficient, reliable evaluation and reduce the reliance on expensive, time-consuming human evaluation.
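As a minimal sketch of two ideas from the summary, scoring a candidate against a human-written reference and checking how well a metric agrees with human judgments, the Python below computes a ROUGE-1-style unigram F1 and the Pearson correlation between metric scores and human ratings. It assumes plain whitespace tokenization, lowercase matching and a single reference. It is not the method of any particular paper, and real ROUGE implementations add stemming and other normalization. The names `unigram_f1` and `pearson`, and the ratings in the usage example, are hypothetical and for illustration only.

```python
from collections import Counter
import math


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: clipped unigram overlap between candidate and reference.

    Assumes lowercase, whitespace-tokenized text (no stemming or punctuation handling).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps each shared token at its minimum count in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation, e.g. between metric scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidates = ["the cat sat on the mat", "a cat sat on a mat", "dogs bark loudly"]
    # Hypothetical human quality ratings (1-5), one per candidate.
    human_ratings = [5.0, 4.0, 1.0]
    scores = [unigram_f1(c, reference) for c in candidates]
    print("metric scores:", scores)
    print("correlation with humans:", pearson(scores, human_ratings))
```

Overlap-based scores like this are cheap but ignore paraphrase and factual correctness, which is one reason the research summarized above explores LLM-based metrics and judges new metrics by their correlation with human ratings.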
Papers
December 18, 2024
October 22, 2024
October 13, 2024
September 28, 2024
September 15, 2024
May 6, 2024
April 30, 2024
April 27, 2024
April 6, 2024
March 26, 2024
February 28, 2024
December 6, 2023
November 7, 2023
October 20, 2023
September 22, 2023
August 31, 2023
June 22, 2023
May 24, 2023
April 5, 2023
March 15, 2023