Text to Image Diffusion Model
Text-to-image diffusion models generate images from textual descriptions, aiming for high-fidelity and precise alignment. Current research focuses on improving controllability, addressing safety concerns (e.g., preventing generation of inappropriate content), and enhancing personalization capabilities through techniques like continual learning and latent space manipulation. These advancements are significant for various applications, including medical imaging, artistic creation, and data augmentation, while also raising important ethical considerations regarding model safety and bias.
Papers
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis
Peiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou