Latent Diffusion Model
Latent diffusion models (LDMs) are generative AI models that create high-quality images by reversing a diffusion process in a compressed latent space, offering efficiency advantages over pixel-space methods. Current research focuses on improving controllability (e.g., through text or other modalities), enhancing efficiency (e.g., via parameter-efficient architectures or faster inference), and addressing challenges like model robustness and ethical concerns (e.g., watermarking and mitigating adversarial attacks). LDMs are significantly impacting various fields, including medical imaging (synthesis and restoration), speech enhancement, and even physics simulation, by enabling the generation of realistic and diverse data for training and analysis where real data is scarce or difficult to obtain.
Papers
Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification
Jussi Leinonen, Ulrich Hamann, Daniele Nerini, Urs Germann, Gabriele Franch
Exploring Compositional Visual Generation with Latent Classifier Guidance
Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis