Optical Character Recognition Quality

Optical Character Recognition (OCR) quality strongly affects the usability and accuracy of digitized text, particularly for historical documents with complex layouts or degraded images. Current research aims to improve OCR accuracy by applying pre-trained language models to post-processing correction and by using contextual information to filter out noise and better detect textual elements. These advances matter both for access to historical archives and for information retrieval systems that rely on digitized text: user studies show a direct correlation between OCR quality and the perceived usefulness of retrieved documents.
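
To make "OCR quality" concrete, one common way to quantify it is the character error rate (CER): the edit distance between the OCR output and a reference transcription, divided by the length of the reference. The summary above does not name a specific metric, so treating CER as the measure is an assumption here, and the function names below are illustrative rather than taken from any particular library. A minimal sketch:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length (assumed quality metric)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: the OCR engine misread "o" as "0".
reference = "historical"
ocr_output = "hist0rical"
print(character_error_rate(reference, ocr_output))  # 0.1
```

A lower CER means the OCR output is closer to the reference. Comparing CER before and after a post-processing correction step is one simple way to check whether the correction helped, provided a ground-truth transcription is available.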

Papers