Text-Image
Text-image research focuses on understanding and generating images that contain text, aiming to improve the accuracy, legibility, and diversity of such images. Current work relies heavily on diffusion models, often enhanced with techniques such as glyph-aware training and dual translation learning, to address challenges including legible text rendering, multi-concept synthesis, and cross-lingual generation. The field matters for applications such as combating misinformation (by detecting text-image inconsistencies), improving scene text recognition, and enabling novel image editing and generation tasks, ultimately advancing both computer vision and natural language processing.
Papers
TIPS: Text-Image Pretraining with Spatial Awareness
Kevis-Kokitsi Maninis, Kaifeng Chen, Soham Ghosh, Arjun Karpur, Koert Chen, Ye Xia, Bingyi Cao, Daniel Salz, Guangxing Han, Jan Dlabal, Dan Gnanapragasam, Mojtaba Seyedhosseini, Howard Zhou, Andre Araujo
Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation
Ruting Chi, Zhiyi Huang, Yuexing Han
Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance
Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo