LLM-Based Metric
Large language model (LLM)-based metrics are automated evaluation methods that use an LLM to judge the quality of text generated by other LLMs, addressing limitations of traditional n-gram overlap metrics such as ROUGE and BLEU. Current research aims to make these metrics more robust and reliable: it explores different prompting strategies, addresses biases (such as a judge favoring text generated by the same LLM), and adds fine-grained analysis beyond a single summary-level score. More nuanced, human-aligned evaluation of this kind supports the development and deployment of LLMs across applications ranging from summarization and machine translation to medical tasks and cybersecurity.
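To make the idea concrete, here is a minimal sketch of an LLM-based metric that scores a summary sentence by sentence instead of returning only one summary-level number. It is an illustration under stated assumptions, not a method taken from any of the listed papers: the prompt wording, the 1 to 5 scale, and the names `PROMPT`, `parse_score`, `split_sentences`, and `llm_metric` are all hypothetical, and `judge` stands for any callable that sends a prompt to an LLM and returns its text reply.

```python
import re
from statistics import mean
from typing import Callable, Optional

# Hypothetical prompt template; real metrics vary widely in wording and scale.
PROMPT = (
    "Rate how faithful the sentence is to the source on a scale of 1 to 5.\n"
    "Source: {source}\n"
    "Sentence: {sentence}\n"
    "Score:"
)


def parse_score(reply: str, lo: int = 1, hi: int = 5) -> Optional[int]:
    """Return the first integer in the reply if it lies in [lo, hi], else None."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    value = int(match.group())
    return value if lo <= value <= hi else None


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def llm_metric(source: str, summary: str, judge: Callable[[str], str]) -> dict:
    """Score each summary sentence with the judge, then average the valid scores."""
    per_sentence = []
    for sentence in split_sentences(summary):
        reply = judge(PROMPT.format(source=source, sentence=sentence))
        per_sentence.append((sentence, parse_score(reply)))
    valid = [score for _, score in per_sentence if score is not None]
    overall = mean(valid) if valid else None
    return {"overall": overall, "sentences": per_sentence}


if __name__ == "__main__":
    # Stand-in judge for demonstration only; a real one would call an LLM.
    def stub_judge(prompt: str) -> str:
        return "2" if "Sentence: It found" in prompt else "5"

    result = llm_metric(
        "The rover landed on the Moon.",
        "The rover landed. It found water on Mars.",
        stub_judge,
    )
    print(result)
```

Keeping the judge as a plain callable makes it easy to swap in a different model, which is one simple way to probe the self-preference bias mentioned above: score the same outputs with a judge from a different model family and compare the results.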
Papers
December 20, 2024
December 10, 2024
October 18, 2024
October 1, 2024
August 22, 2024
August 16, 2024
July 1, 2024
June 26, 2024
June 3, 2024
March 5, 2024
February 9, 2024
November 16, 2023
November 7, 2023
October 21, 2023
August 14, 2023
June 15, 2023
June 5, 2023
May 24, 2023
November 16, 2022