Text to Image Synthesis
Text-to-image synthesis aims to generate realistic and stylistically consistent images from textual descriptions, leveraging advances in deep learning. Current research emphasizes improving model scalability (e.g., with Mixture-of-Experts architectures), enhancing controllability through techniques such as frequency band substitution and layout-conditional generation, and developing more robust evaluation metrics that assess both image quality and semantic alignment with the input text. The field is significant for its potential applications in creative content generation, digital art, and scientific domains that require synthesizing visual data from textual information, motivating ongoing efforts to improve both the efficiency and the fidelity of these models.
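As a concrete illustration of the semantic-alignment metrics mentioned above, a common recipe (e.g., CLIPScore) rescales the non-negative cosine similarity between an image embedding and a text embedding. The sketch below shows only that scoring step on toy embedding vectors; in practice the embeddings would come from a pretrained vision-language model such as CLIP, and the rescaling weight of 2.5 follows the CLIPScore convention.

```python
from math import sqrt

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def clipscore_style(image_emb, text_emb, w=2.5):
    # CLIPScore-style alignment: rescale the cosine similarity,
    # clipped at zero so unrelated pairs score 0 rather than negative.
    return w * max(cosine(image_emb, text_emb), 0.0)

# Toy embeddings standing in for real CLIP features.
aligned = clipscore_style([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])    # identical -> 2.5
unrelated = clipscore_style([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal -> 0.0
print(aligned, unrelated)
```

Such reference-free scores complement image-quality metrics like FID, since a sample can be photorealistic while still ignoring the prompt.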
Papers
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision
Shengguang Wu, Zhenglun Chen, Qi Su
Semantic-aware Data Augmentation for Text-to-image Synthesis
Zhaorui Tan, Xi Yang, Kaizhu Huang
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei, Yue Wu, Qidong Huang