Text-Guided Diffusion Models
Text-guided diffusion models are generative AI systems that create images, videos, 3D models (including molecules), and audio from textual descriptions. Current research focuses on improving the fidelity and controllability of these models through techniques such as prompt engineering, embedding manipulation within diffusion model architectures (e.g., U-Nets), and ensembling multiple models to improve generation quality and address known limitations in object-count accuracy and spatial-layout preservation. These advances have significant implications for fields including medical imaging, drug discovery, surgical training, and creative content generation, because they enable the synthesis of realistic, diverse data where real-world data is scarce or difficult to obtain.
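To make the text-conditioning idea concrete, the following is a minimal sketch of text-guided image generation with a latent diffusion model via the Hugging Face diffusers library. The checkpoint ID, prompt, and output filename are illustrative assumptions, not details drawn from the papers listed below; the text prompt conditions the U-Net's cross-attention layers at each denoising step, and guidance_scale controls how strongly the result follows the prompt.

```python
# Minimal text-to-image sketch using Hugging Face diffusers.
# Assumptions: a CUDA GPU is available, and the checkpoint below
# (any Stable Diffusion model would do) is an illustrative choice.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint, not from the papers
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The prompt embedding steers the U-Net's denoising at every step;
# a higher guidance_scale trades diversity for prompt adherence.
image = pipe(
    "a photograph of an astronaut riding a horse",  # example prompt
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("astronaut.png")
```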
Papers
Prompt-guided Precise Audio Editing with Diffusion Models
Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su, Wei Liang, Dong Yu
Non-confusing Generation of Customized Concepts in Diffusion Models
Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang