Text-to-Image Generation
Text-to-image generation aims to create realistic and diverse images from textual descriptions, with current work focused on improving controllability, efficiency, and factual accuracy. Research emphasizes stronger model architectures, particularly diffusion models, and leverages large language models for prompt understanding and control, including fine-grained manipulation of image components and styles. The field matters for applications ranging from creative content generation to scientific visualization and medical imaging, while also raising important questions about bias mitigation and factuality in AI-generated content.
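To make the controllability levers mentioned above concrete, here is a minimal sketch of a typical text-to-image diffusion pipeline. It assumes the Hugging Face diffusers library, the publicly hosted "runwayml/stable-diffusion-v1-5" checkpoint, and a CUDA GPU; none of this is tied to any specific paper below, and the prompts and parameter values are illustrative assumptions.

import torch
from diffusers import StableDiffusionPipeline

# Assumed checkpoint; any Stable Diffusion model ID on the Hub would work.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

result = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    # Negative prompts steer sampling away from undesired concepts
    # (the subject of Ogezi & Shi's paper in the list below).
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=7.5,      # classifier-free guidance strength: higher follows the prompt more closely
    num_inference_steps=50,  # denoising steps: fewer is faster, more is usually finer
)
result.images[0].save("lighthouse.png")

The guidance scale and negative prompt are the two most common inference-time controls: both reshape the denoising trajectory without retraining the model, which is why much of the research listed below targets them.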
Papers
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong
Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation
Michael Ogezi, Ning Shi
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation
Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Jingyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang