Text to Image Synthesis
Text-to-image synthesis aims to generate realistic and stylistically consistent images from textual descriptions, leveraging advances in deep learning. Current research emphasizes improving model scalability (e.g., with Mixture-of-Experts architectures), enhancing controllability through techniques such as frequency band substitution and layout-conditional generation, and developing more robust evaluation metrics that assess both image quality and semantic alignment with the input text. The field is significant for its potential applications in creative content generation, digital art, and scientific domains that require synthesizing visual data from textual information, motivating ongoing efforts to improve both the efficiency and the fidelity of these models.
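As a concrete illustration of the semantic-alignment metrics mentioned above, a common recipe (e.g., CLIPScore) rescales the non-negative cosine similarity between an image embedding and a text embedding. The sketch below shows only that scoring step on toy embedding vectors; in practice the embeddings would come from a pretrained vision-language model such as CLIP, and the rescaling weight of 2.5 follows the CLIPScore convention.

```python
from math import sqrt

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def clipscore_style(image_emb, text_emb, w=2.5):
    # CLIPScore-style alignment: rescale the cosine similarity,
    # clipped at zero so unrelated pairs score 0 rather than negative.
    return w * max(cosine(image_emb, text_emb), 0.0)

# Toy embeddings standing in for real CLIP features.
aligned = clipscore_style([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])    # identical -> 2.5
unrelated = clipscore_style([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal -> 0.0
print(aligned, unrelated)
```

Such reference-free scores complement image-quality metrics like FID, since a sample can be photorealistic while still ignoring the prompt.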
Papers
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision
Shengguang Wu, Zhenglun Chen, Qi Su
Semantic-aware Data Augmentation for Text-to-image Synthesis
Zhaorui Tan, Xi Yang, Kaizhu Huang
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang