Paper ID: 2404.03428

Edisum: Summarizing and Explaining Wikipedia Edits at Scale

Marija Šakota, Isaac Johnson, Guosheng Feng, Robert West

An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and they help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language model trained to produce good edit summaries given the representation of an edit diff. To overcome the challenges of mixed-quality training data and efficiency requirements imposed by the scale of Wikipedia, we fine-tune a small generative language model on a curated mix of human and synthetic data. Our model performs on par with human editors. Commercial large language models are able to solve this task better than human editors, but are not well suited for Wikipedia, while open-source ones fail on this task. More broadly, we showcase how language modeling technology can be used to support humans in maintaining one of the largest and most visible projects on the Web.

Submitted: Apr 4, 2024