Summarization Benchmark
Summarization benchmark research aims to create robust and reliable methods for evaluating the quality of automatically generated summaries, addressing the limitations of existing metrics such as ROUGE, which often correlate poorly with human judgment. Current efforts focus on developing fine-grained evaluation techniques that use large language models (LLMs), exploring instruction-controllable summarization, and addressing issues such as data contamination and temporal generalization. These advances are crucial for improving the accuracy and reliability of summarization models, ultimately benefiting applications that require concise and faithful extraction of information from large text corpora.
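To make the metric-quality problem concrete, the following is a minimal sketch of how benchmark studies typically quantify how well an automatic metric tracks human judgment: score candidate summaries with ROUGE-L and correlate those scores with human ratings. It assumes the `rouge_score` and `scipy` packages; the summaries and ratings are illustrative placeholders, not data from any of the papers below.

```python
# Minimal sketch: score candidate summaries with ROUGE-L and check how well the
# metric correlates with human quality judgments. Assumes the `rouge_score` and
# `scipy` packages; all texts and ratings are illustrative placeholders.
from rouge_score import rouge_scorer
from scipy.stats import kendalltau

references = [
    "The committee approved the budget after a two-hour debate.",
    "Researchers found the drug reduced symptoms in most patients.",
    "The storm forced the city to close schools for two days.",
]
candidates = [
    "The budget was approved by the committee following a debate.",
    "The drug was found to cause side effects in some patients.",
    "Schools in the city were closed for two days because of the storm.",
]
human_ratings = [4.5, 2.0, 4.0]  # hypothetical 1-5 quality judgments

# ROUGE-L F1 for each (reference, candidate) pair.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_f1 = [
    scorer.score(ref, cand)["rougeL"].fmeasure
    for ref, cand in zip(references, candidates)
]

# Rank correlation between metric scores and human judgments; benchmark papers
# usually report this at the summary or system level over many more examples.
tau, _ = kendalltau(rouge_f1, human_ratings)
print(f"ROUGE-L F1 scores: {rouge_f1}")
print(f"Kendall's tau vs. human ratings: {tau:.2f}")
```

Low or unstable correlations from this kind of analysis are what motivate the finer-grained, LLM-based evaluation methods studied in this area.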
Papers
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
Daniel Deutsch, Dan Roth
Adding more data does not always help: A study in medical conversation summarization with PEGASUS
Varun Nair, Namit Katariya, Xavier Amatriain, Ilya Valmianski, Anitha Kannan