Text to Image Diffusion Model
Text-to-image diffusion models generate images from textual descriptions, aiming for high-fidelity and precise alignment. Current research focuses on improving controllability, addressing safety concerns (e.g., preventing generation of inappropriate content), and enhancing personalization capabilities through techniques like continual learning and latent space manipulation. These advancements are significant for various applications, including medical imaging, artistic creation, and data augmentation, while also raising important ethical considerations regarding model safety and bias.
Papers
Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model
Kangyang Xie, Binbin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, Jiliang Tang
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang, Zhenhong Sun, Zhiyu Tan, Xuanbai Chen, Weihua Chen, Hao Li, Cheng Zhang, Yang Song
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara, Toshihiko Yamasaki
AFreeCA: Annotation-Free Counting for All
Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas
PixArt-\Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Pu Cao, Feng Zhou, Qing Song, Lu Yang