Latent Diffusion Model
Latent diffusion models (LDMs) are generative models that create high-quality images by reversing a diffusion (noising) process in a compressed latent space learned by an autoencoder, which makes training and sampling cheaper than in pixel-space methods. Current research focuses on improving controllability (e.g., through text or other conditioning modalities), enhancing efficiency (e.g., via parameter-efficient architectures or faster inference), and addressing challenges such as model robustness and ethical concerns (e.g., watermarking and mitigating adversarial attacks). LDMs are being applied in fields including medical imaging (synthesis and restoration), speech enhancement, and physics simulation, where they generate realistic and diverse data for training and analysis when real data is scarce or difficult to obtain.
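To make the "diffusion in a compressed latent space" idea concrete, here is a minimal sketch of the standard forward-noising formula applied to a latent vector, and of how a noise estimate maps back to a clean latent. It is an illustration under stated assumptions, not the method of any paper listed here: the linear noise schedule, the tensor sizes (a 512x512x3 image and a 64x64x4 latent), and the function names `linear_alpha_bar`, `add_noise`, and `predict_z0` are all chosen for this example, and the true noise stands in for the output of a trained denoising network.

```python
import math
import random


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(z0, t, alpha_bar, rng):
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    zt = [a * z + s * e for z, e in zip(z0, eps)]
    return zt, eps


def predict_z0(zt, eps_hat, t, alpha_bar):
    """Invert the forward formula given a noise estimate eps_hat."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [(z - s * e) / a for z, e in zip(zt, eps_hat)]


# Illustrative sizes (assumptions): the latent has far fewer values than the image.
pixel_count = 512 * 512 * 3
latent_count = 64 * 64 * 4
reduction = pixel_count / latent_count

rng = random.Random(0)
z0 = [rng.uniform(-1.0, 1.0) for _ in range(latent_count)]  # stand-in for an encoded image
alpha_bar = linear_alpha_bar()
zt, eps = add_noise(z0, 500, alpha_bar, rng)

# A trained network would estimate eps from (zt, t); here the true noise is
# passed in as an oracle, so the clean latent is recovered up to rounding.
z0_rec = predict_z0(zt, eps, 500, alpha_bar)
max_err = max(abs(a - b) for a, b in zip(z0, z0_rec))
```

With these assumed sizes the denoiser operates on 48 times fewer values than it would in pixel space, which is where the efficiency advantage comes from. In a full LDM a separately trained decoder would then map the recovered latent back to an image.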
Papers
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano
Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression
Junhui Li, Jutao Li, Xingsong Hou, Huake Wang
Learning Discrete Concepts in Latent Hierarchical Models
Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang
Improving Text Generation on Images with Synthetic Captions
Jun Young Koh, Sang Hyun Park, Joy Song
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao
Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models
Kun Huang, Xiao Ma, Yuhan Zhang, Na Su, Songtao Yuan, Yong Liu, Qiang Chen, Huazhu Fu
Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks
João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes
MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis
Joseph Cho, Cyril Zakka, Dhamanpreet Kaur, Rohan Shad, Ross Wightman, Akshay Chaudhari, William Hiesinger