Optical Character Recognition
Optical Character Recognition (OCR) converts images of text into machine-readable text, enabling efficient document processing and information extraction. Current research focuses on improving OCR accuracy in challenging scenarios such as historical documents, low-resolution images, and complex layouts, often using transformer-based language models and convolutional neural networks for both character recognition and post-processing error correction. These advances are crucial for digitizing historical archives, improving access to information, and automating tasks in fields ranging from document management to scientific literature analysis.
Papers
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, Sricharan Kumar
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova