Character Recognition
Character recognition, the automated extraction of text from images, aims to digitize large volumes of textual material, including historical documents and scene text, and make it accessible. Current research relies heavily on deep learning models, particularly transformer-based architectures and convolutional neural networks, and often incorporates techniques such as contrastive learning and multi-modal approaches to improve accuracy and efficiency across diverse languages and document types. The field supports applications ranging from document digitization and information retrieval to cultural preservation and intelligent traffic systems, and it drives advances in both computer vision and natural language processing.
Papers
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, Sricharan Kumar
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic
DISGO: Automatic End-to-End Evaluation for Scene Text OCR
Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu