Scene Text Recognition
Scene text recognition (STR) aims to automatically extract and interpret text from images, a task with applications ranging from autonomous driving to accessibility tools. Current research focuses on improving accuracy and efficiency, particularly for low-resolution images and low-resource languages. Common approaches include transformer and diffusion-model architectures, combined with self-supervised and semi-supervised learning to address data scarcity. More robust and reliable text extraction from diverse visual sources in turn supports document processing, image understanding, and assistive technologies.
Papers
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura