Text-Guided Diffusion Models
Text-guided diffusion models are generative AI systems that synthesize images, videos, audio, and 3D structures (including molecules) from textual descriptions. Current research focuses on improving the fidelity and controllability of these models through techniques such as prompt engineering, manipulation of text embeddings inside the diffusion backbone (e.g., the U-Net's conditioning pathways), and ensembling of multiple models, with the aim of raising generation quality and addressing limitations such as inaccurate object counts and poor preservation of spatial layout. These advances have significant implications for fields including medical imaging, drug discovery, surgical training, and creative content generation, because they enable the synthesis of realistic, diverse data where real-world data is scarce or difficult to obtain.
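To make the text-conditioning mechanism concrete, the sketch below shows a minimal text-to-image call using the Hugging Face diffusers library; this is an illustrative assumption, not the method of any paper listed here, and the checkpoint name and prompt are hypothetical placeholders. The guidance_scale parameter is the classifier-free guidance weight that trades prompt adherence against sample diversity.

```python
# Minimal sketch of text-guided diffusion sampling with Hugging Face `diffusers`.
# Assumptions: a CUDA GPU is available and the checkpoint below can be downloaded.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # hypothetical choice of checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The prompt is encoded by a text encoder and injected into the U-Net via
# cross-attention; guidance_scale controls how strongly sampling follows
# the text condition (classifier-free guidance).
image = pipe(
    "three red apples on a wooden table",  # object counts are a known failure mode
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("apples.png")
```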
Papers
Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search
Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham