Optical Character Recognition Quality
Optical Character Recognition (OCR) quality directly affects the usability and accuracy of digitized text, particularly for historical documents with complex layouts or degraded images. Current research aims to improve OCR accuracy in two main ways: using pre-trained language models to correct OCR output in a post-processing step, and using contextual information to filter out noise and better detect textual elements. These advances matter for access to historical archives and for information retrieval systems that rely on digitized text; user studies report a direct correlation between OCR quality and the perceived usefulness of retrieved documents.
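To make "OCR quality" concrete, here is a minimal sketch of one widely used measure, the character error rate (CER): the edit distance between a reference transcription and the OCR output, divided by the reference length. The summary above does not name a specific metric, so CER is an assumption made for illustration, and the function names and sample strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """CER = edit distance / reference length (illustrative metric, lower is better)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: three characters misread by the OCR engine.
    reference = "The quick brown fox"
    ocr_output = "Tbe qu1ck brown f0x"
    print(f"CER: {character_error_rate(reference, ocr_output):.3f}")
```

A score like this could be computed before and after a post-correction step to check whether the correction actually reduced errors. Note that CER can exceed 1.0 when the OCR output is much longer than the reference.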
Papers
August 30, 2024
June 1, 2022
May 17, 2022
March 4, 2022