Text-to-Image Generation
Text-to-image generation aims to create realistic and diverse images from textual descriptions, with current work focused on improving controllability, efficiency, and faithfulness to the prompt. Research emphasizes enhancing model architectures such as diffusion models and leveraging large language models for prompt understanding and control, including methods for fine-grained manipulation of image components and styles. The field is significant for applications ranging from creative content generation to scientific visualization and medical imaging, while also raising important questions about bias mitigation and factual accuracy in AI-generated content.
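As a concrete illustration of the basic workflow described above, the sketch below generates an image from a text prompt with a pretrained latent diffusion model. It is a minimal example assuming the Hugging Face `diffusers` library and the `runwayml/stable-diffusion-v1-5` checkpoint, neither of which is tied to the specific papers listed here.

```python
# Minimal text-to-image sketch using a pretrained latent diffusion model.
# Assumes the Hugging Face `diffusers` library and a CUDA-capable GPU;
# the checkpoint name is illustrative only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a watercolor painting of a lighthouse at dusk"
# guidance_scale trades prompt faithfulness against sample diversity;
# num_inference_steps trades image quality against generation time.
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=30).images[0]
image.save("lighthouse.png")
```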
Papers
StyleDrop: Text-to-Image Generation in Any Style
Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation
Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
Jialu Wang, Xinyue Gabby Liu, Zonglin Di, Yang Liu, Xin Eric Wang
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz
The CLIP Model is Secretly an Image-to-Prompt Converter
Yuxuan Ding, Chunna Tian, Haoxuan Ding, Lingqiao Liu