Text-to-Image Generation
Text-to-image generation aims to synthesize realistic, diverse images from textual descriptions, with current work concentrating on controllability, efficiency, and factual accuracy. Research emphasizes improved model architectures, particularly diffusion models, and leverages large language models for prompt understanding and control, including fine-grained manipulation of image components and styles. The field matters for applications ranging from creative content generation to scientific visualization and medical imaging, while also raising important questions about bias mitigation and factual accuracy in AI-generated content.
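For concreteness, a typical diffusion-based text-to-image workflow looks like the following minimal sketch, here using the Hugging Face diffusers library (an assumed tool choice; the checkpoint, prompt, and parameters are illustrative):

```python
# Minimal text-to-image sketch with Hugging Face diffusers (assumes
# `pip install diffusers transformers torch`). Checkpoint and settings
# are illustrative choices, not prescribed by the papers below.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any compatible checkpoint works
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

image = pipe(
    "a watercolor painting of a lighthouse at dawn",  # the text prompt
    num_inference_steps=30,  # fewer denoising steps trade quality for speed
    guidance_scale=7.5,      # classifier-free guidance strength
).images[0]
image.save("lighthouse.png")
```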
Papers
Object-level Visual Prompts for Compositional Image Generation
Gaurav Parmar, Or Patashnik, Kuan-Chieh Wang, Daniil Ostashev, Srinivasa Narasimhan, Jun-Yan Zhu, Daniel Cohen-Or, Kfir Aberman
EliGen: Entity-Level Controlled Image Generation with Regional Attention
Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, Yu Zhang
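EliGen's title points to regional attention for entity-level control. A common way to realize regional control, sketched generically below (not necessarily EliGen's exact formulation; all names and shapes are hypothetical), is to mask cross-attention so that each entity's prompt tokens influence only the image tokens inside that entity's spatial region:

```python
# Generic regionally masked cross-attention sketch (illustrative only).
import torch

def regional_cross_attention(q, k, v, region_mask):
    """
    q:           [batch, img_tokens, dim]  queries from image latents
    k, v:        [batch, txt_tokens, dim]  keys/values from text tokens
    region_mask: [batch, img_tokens, txt_tokens] bool; True where a text
                 token may attend to an image token. Base prompt tokens
                 should stay unmasked everywhere so every image token has
                 at least one valid key (avoids an all -inf softmax row).
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bid,btd->bit", q, k) * scale
    scores = scores.masked_fill(~region_mask, float("-inf"))  # block out-of-region pairs
    attn = scores.softmax(dim=-1)
    return torch.einsum("bit,btd->bid", attn, v)
```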
Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free
Evelyn Zhang, Bang Xiao, Jiayi Tang, Qianli Ma, Chang Zou, Xuefei Ning, Xuming Hu, Linfeng Zhang
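This title suggests accelerating Stable Diffusion by pruning tokens and caching features across denoising steps. As a generic illustration of the caching half of that idea (the refresh schedule and names are assumptions, not the paper's algorithm), expensive block outputs can be recomputed only every few steps and reused in between:

```python
# Generic cross-step feature cache for diffusion sampling (illustrative only).
class FeatureCache:
    """Reuse expensive block outputs across denoising steps, recomputing
    them only every `refresh_every` steps (hypothetical schedule)."""

    def __init__(self, refresh_every: int = 3):
        self.refresh_every = refresh_every
        self.store = {}

    def run(self, name, step, compute_fn):
        if step % self.refresh_every == 0 or name not in self.store:
            self.store[name] = compute_fn()  # full computation, then cache
        return self.store[name]              # cheap reuse on other steps

# Usage inside a sampling loop (hypothetical block and inputs):
#   out = cache.run("mid_block", step, lambda: mid_block(h, t_emb))
```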
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
Hao Li, Shamit Lal, Zhiheng Li, Yusheng Xie, Ying Wang, Yang Zou, Orchid Majumder, R. Manmatha, Zhuowen Tu, Stefano Ermon, Stefano Soatto, Ashwin Swaminathan
A LoRA is Worth a Thousand Pictures
Chenxi Liu, Towaki Takikawa, Alec Jacobson
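The last entry's title references LoRA (Low-Rank Adaptation), which adapts a frozen pretrained weight matrix W by learning a low-rank update W + (alpha/r)·BA. A minimal sketch of a LoRA-augmented linear layer follows (the general technique, not this paper's implementation):

```python
# Minimal LoRA-augmented linear layer (generic technique, illustrative only).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init => no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + (alpha/r) * B A x ; only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```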