Text-to-Image Generation
Text-to-image generation aims to create realistic and diverse images from textual descriptions, with current work focused on improving controllability, efficiency, and factual accuracy. Research emphasizes enhancing model architectures such as diffusion models and leveraging large language models for prompt understanding and control, including fine-grained manipulation of image components and styles. The field's potential impact spans applications from creative content generation to scientific visualization and medical imaging, while also raising important questions about bias mitigation and factual accuracy in AI-generated content.
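The diffusion models mentioned above all share the same core mechanism: a forward process that gradually noises an image, and a learned reverse process that denoises it, conditioned on the text prompt. The following is a minimal sketch of that mechanism in NumPy; it is an illustration, not any listed paper's method. The 1-D signal stands in for an image, and the true noise `eps` serves as an oracle in place of a trained, text-conditioned denoising network.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)    # cumulative signal retention

def q_sample(x0, t, eps):
    """Forward process: noise a clean sample x0 to step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))   # stand-in for a clean image
eps = rng.standard_normal(x0.shape)
x = q_sample(x0, T - 1, eps)                 # fully noised sample

# Deterministic DDIM-style reverse pass: at each step, estimate the clean
# sample from the predicted noise, then re-noise to the previous level.
# A real model would predict pred_eps from (x, t, text prompt).
for t in reversed(range(T)):
    pred_eps = eps                           # oracle noise prediction
    x0_pred = (x - np.sqrt(1.0 - alpha_bars[t]) * pred_eps) / np.sqrt(alpha_bars[t])
    if t > 0:
        x = np.sqrt(alpha_bars[t - 1]) * x0_pred \
            + np.sqrt(1.0 - alpha_bars[t - 1]) * pred_eps
    else:
        x = x0_pred

recovered = x   # with an oracle predictor this matches x0 exactly
```

In practice the quality of generation rests entirely on how well the learned network predicts `pred_eps` given the prompt, which is where the architectural and efficiency work collected below comes in.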
Papers
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan
Approximate Caching for Efficiently Serving Diffusion Models
Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, Shiv Saini
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo
HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation
Yifan Zhang, Bryan Hooi
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev
ET3D: Efficient Text-to-3D Generation via Multi-View Distillation
Yiming Chen, Zhiqi Li, Peidong Liu