Text-to-Image Synthesis
Text-to-image synthesis aims to generate realistic, stylistically consistent images from textual descriptions, leveraging advances in deep learning. Current research emphasizes improving model scalability (e.g., via Mixture-of-Experts architectures), enhancing controllability through techniques such as frequency band substitution and layout-conditional generation, and developing more robust evaluation metrics that assess both image quality and semantic alignment with the input text. The field's potential applications in creative content generation, digital art, and scientific domains that require synthesizing visual data from text drive ongoing efforts to improve both the efficiency and fidelity of these models.
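To illustrate the semantic-alignment side of evaluation mentioned above, the sketch below scores a generated image against its prompt using cosine similarity between CLIP embeddings (a CLIPScore-style measure). It assumes the Hugging Face `transformers` library and the public "openai/clip-vit-base-patch32" checkpoint; the image path and prompt are placeholders, and none of this is drawn from the papers listed here.

```python
# Minimal sketch: CLIP-based semantic-alignment score for a generated image.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_alignment_score(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its prompt."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # CLIPModel returns projected embeddings; normalize and take the dot product.
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((image_emb * text_emb).sum(dim=-1))

# Example usage with a hypothetical generated image:
# score = clip_alignment_score(Image.open("generated.png"), "a watercolor fox in a forest")
# print(f"CLIP alignment: {score:.3f}")
```

Higher scores indicate closer text-image agreement; in practice such a metric is typically paired with an image-quality measure, since alignment alone does not capture visual fidelity.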
Papers
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Duan Li, Jian Gao, Li Zhang, Hao Yang, Boxing Chen, Osamu Yoshie
Abstract Art Interpretation Using ControlNet
Rishabh Srivastava, Addrish Roy