Text-to-Image Synthesis
Text-to-image synthesis aims to generate realistic, stylistically consistent images from textual descriptions by leveraging advances in deep learning. Current research emphasizes improving model scalability (e.g., with Mixture-of-Experts architectures), enhancing controllability through techniques such as frequency band substitution and layout-conditional generation, and developing more robust evaluation metrics that assess both image quality and semantic alignment with the input text. The field matters for creative content generation, digital art, and scientific domains that require synthesizing visual data from textual information, motivating ongoing efforts to improve both the efficiency and fidelity of these models.
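One of the evaluation ideas mentioned above, semantic alignment between a prompt and a generated image, is commonly scored as the cosine similarity between their embeddings in a joint text-image space (e.g., from a CLIP-style model). The sketch below is illustrative only and does not reproduce any specific paper's metric; the embedding vectors are hypothetical stand-ins for what a real encoder would produce.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; in practice these would come from a
# pretrained joint text-image encoder applied to the prompt and image.
text_embedding = [0.2, 0.8, 0.1, 0.55]
image_embedding = [0.25, 0.7, 0.05, 0.6]

alignment = cosine_similarity(text_embedding, image_embedding)
print(f"semantic alignment score: {alignment:.3f}")
```

A score near 1 indicates the image embedding points in nearly the same direction as the text embedding, i.e., strong semantic agreement; orthogonal embeddings score near 0.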
Papers
Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment
Qi Chen, Chaorui Deng, Zixiong Huang, Bowen Zhang, Mingkui Tan, Qi Wu
Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
Minho Park, Jooyeol Yun, Seunghwan Choi, Jaegul Choo