Factuality Metric
Factuality metrics aim to automatically assess the accuracy of information in text generated by large language models (LLMs), particularly in summarization tasks. Current research focuses on metrics that are more robust than simple n-gram overlap, such as sentence-embedding similarity (e.g., Sentence-BERT Score) and scores derived from question-answering or natural language inference models. These advances matter because applications ranging from scientific literature summarization to medical report generation depend on a more accurate and nuanced evaluation of factual consistency than surface overlap can provide.
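As a rough illustration of the embedding-similarity idea, the sketch below scores each summary sentence by its best match against the source and averages the results. It is a minimal, hypothetical example: the `embed` function here is a bag-of-words stand-in for a real sentence encoder such as Sentence-BERT, and the sentence splitting is deliberately naive.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    # Hypothetical stand-in for a Sentence-BERT encoder: a sparse
    # bag-of-words vector. A real metric would use dense embeddings.
    return Counter(sentence.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def factual_consistency(summary: str, source: str) -> float:
    # Score each summary sentence by its best-matching source sentence,
    # then average; low scores flag content unsupported by the source.
    src_vecs = [embed(s) for s in source.split(". ") if s]
    scores = [
        max(cosine(embed(sent), v) for v in src_vecs)
        for sent in summary.split(". ") if sent
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In this scheme a summary sentence copied from the source scores near 1.0, while a sentence with little lexical overlap scores near 0.0; swapping in dense sentence embeddings lets the score also reward paraphrases, which is the main advantage over n-gram overlap.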
Papers
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Yue Zhang, Jingxuan Zuo, Liqiang Jing