Scene Text Recognition
Scene text recognition (STR) aims to automatically extract and interpret text from images of natural scenes, a task with applications ranging from autonomous driving to accessibility tools. Current research focuses on improving accuracy and efficiency, particularly for low-resolution images and low-resource languages. Common approaches include transformer and diffusion-model architectures, along with self-supervised and semi-supervised learning to address the scarcity of labeled data. More robust and reliable text extraction from diverse visual sources in turn supports document processing, image understanding, and assistive technologies.
Papers
Portmanteauing Features for Scene Text Recognition
Yew Lee Tan, Ernest Yu Kai Chew, Adams Wai-Kin Kong, Jung-Jae Kim, Joo Hwee Lim
Pure Transformer with Integrated Experts for Scene Text Recognition
Yew Lee Tan, Adams Wai-kin Kong, Jung-Jae Kim
Masked Vision-Language Transformers for Scene Text Recognition
Jie Wu, Ying Peng, Shengming Zhang, Weigang Qi, Jian Zhang