Urdu Text

Research on Urdu text processing aims to narrow the gap between the rich linguistic resources available for high-resource languages and the comparatively limited resources available for Urdu. Current efforts concentrate on automatic speech recognition (ASR), optical character recognition (OCR), and scene text detection and recognition, using transformer-based architectures and hybrid CNN-RNN models to handle the complexities of the Urdu script, including its cursive nature and variation in writing styles. These advances matter for making Urdu digital content more accessible, improving information retrieval, and supporting downstream applications such as machine translation and visual question answering. Building large, high-quality datasets is a key part of this progress, because the scarcity of such resources has historically hindered research in this area.

Papers