Summarization Quality
Evaluating automatically generated text summaries is a crucial but challenging area of research whose goal is robust, reliable metrics that align with human judgment. Current efforts focus on building more comprehensive evaluation benchmarks that cover diverse scenarios and fine-grained aspects of quality (e.g., faithfulness, consistency, perspective preservation), often using large language models (LLMs) and transformer-based architectures for both summarization and evaluation. Better evaluation of summarization quality is vital for advancing natural language processing, particularly in applications that demand high accuracy and reliability, such as medical reporting and scientific literature review.
Papers
DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely
Forrest Sheng Bao, Ruixuan Tu, Ge Luo, Yinfei Yang, Hebi Li, Minghui Qiu, Youbiao He, Cen Chen
On Improving Summarization Factual Consistency from Natural Language Feedback
Yixin Liu, Budhaditya Deb, Milagro Teruel, Aaron Halfaker, Dragomir Radev, Ahmed H. Awadallah