Automatic Evaluation Metric
Automatic evaluation metrics aim to objectively assess the quality of generated outputs, such as text, images, or radiology reports, by quantifying their similarity to human-created references. Current research focuses on metrics that are robust to common generation flaws such as hallucination, correlate more closely with human judgments, and adapt across diverse tasks and languages, often leveraging large language models (LLMs) to improve performance. Such metrics accelerate the development and deployment of natural language generation and other AI systems by providing efficient, reliable evaluation and reducing reliance on expensive, time-consuming human studies.
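As a concrete illustration of the reference-based idea described above, the sketch below scores a generated sentence by its token overlap with a human-written reference. It is a minimal toy metric in Python; the `token_f1` helper and the example sentences are hypothetical and not drawn from any paper listed here. Practical metrics such as BLEU, ROUGE, or LLM-based scorers refine the same principle with n-grams, embeddings, or learned judgments.

```python
# Minimal sketch of a reference-based automatic metric: token-level F1
# between a generated output and a human-created reference. Illustrative
# only; not one of the metrics proposed in the papers on this page.
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Count shared tokens, respecting multiplicity: a repeated word only
    # matches as many times as it occurs in the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the scan shows no evidence of pneumonia"
    candidate = "no evidence of pneumonia is seen on the scan"
    print(f"token F1 = {token_f1(candidate, reference):.3f}")
```

A metric like this is cheap and deterministic, but it rewards surface overlap rather than meaning, which is exactly the weakness that embedding-based and LLM-based metrics aim to address.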