Text Block
Research on text blocks aims to improve the detection, understanding, and use of coherent text units in complex visual and textual contexts. Current work develops models such as transformers and diffusion models that localize and recognize text in images precisely, even without fine-grained detection, and draws on large language models for better contextual understanding and generation. Applications range from automatic speech recognition and image captioning to text mining and the analysis of web pages and documents. Robust text block processing is therefore important to any field that depends on efficient, accurate extraction of text information.
Papers
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran