Text to Image Generation
Text-to-image generation aims to create realistic and diverse images from textual descriptions, focusing on improving controllability, efficiency, and factual accuracy. Current research emphasizes enhancing model architectures like diffusion models and leveraging large language models for prompt understanding and control, including methods for fine-grained manipulation of image components and styles. This field is significant for its potential impact on various applications, from creative content generation to assisting in scientific visualization and medical imaging, while also raising important questions about bias mitigation and factual accuracy in AI-generated content.
Papers
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Rinon Gal, Adi Haviv, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Gal Chechik
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu