Text to Image Synthesis

Text-to-image synthesis aims to generate realistic and stylistically consistent images from textual descriptions using deep learning. Current research emphasizes improving model scalability (e.g., using Mixture-of-Experts architectures), enhancing controllability through techniques like frequency band substitution and layout-conditional generation, and developing more robust evaluation metrics that assess both image quality and semantic alignment with the input text. The field matters for creative content generation, digital art, and scientific domains that need visual data synthesized from text, which drives ongoing efforts to improve both the efficiency and fidelity of these models.

Papers

June 15, 2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li
Generative Model Text to Image Model Text to Image Text to Image Synthesis Human Preference Benchmark Study

June 14, 2023

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
Zhiyu Jin, Xuli Shen, Bin Li, Xiangyang Xue
Diffusion Model Text to Image Diffusion Model Text to Image Synthesis Training Free Diffusion Attention Entropy

June 12, 2023

Fill-Up: Balancing Long-Tailed Data with Generative Models
Joonghyuk Shin, Minguk Kang, Jaesik Park
Generative Model Synthetic Image Text to Image Synthesis Long Tailed Recognition Long Tailed Data Controllable Image Synthesis

June 8, 2023

Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung, Songwei Ge, Jia-Bin Huang
Cross Attention Text to Image Synthesis Attention Map Selective Focus Large Scale Diffusion Model

June 5, 2023

Cheap-fake Detection with LLM using Prompt Engineering
Guangyang Wu, Weijie Wu, Xiaohong Liu, Kele Xu, Tianjiao Wan, Wenyi Wang
Large Language Model Image Captioning Prompt Engineering Text to Image Synthesis Context Image Fake Detection

June 1, 2023

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville
Text to Image Diffusion Model Latent Representation Text to Image Synthesis Efficient Architecture Text Conditioned Image Generation Compact Latent

May 22, 2023

Design a Delicious Lunchbox in Style
Yutong Zhou
Generative Adversarial Network Product Design Text to Image Synthesis Style Consistency Image Composition

May 19, 2023

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots
Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun
Cross Lingual Transfer Image Text Pair Stable Diffusion Text to Image Synthesis Text Encoder Semantic Space Pivot Element Recognition

May 18, 2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang
Large Language Model Real Power Text to Image Model Text to Image Synthesis Text Image Chinese LLM

May 7, 2023

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su
Text to Image Diffusion Model Text to Image Synthesis Multimodal Backdoor Multimodal Attack

April 26, 2023

Training-Free Location-Aware Text-to-Image Synthesis
Jiafeng Mao, Xueting Wang
Text to Image Synthesis Training Free Generation Task Generative Learning

April 13, 2023

April 12, 2023

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong
Learning Abstract Text to Image Generation Text to Image Model Preference Feedback Text to Image Synthesis Preference Reward

April 7, 2023

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang
Diffusion Model Text to Image Cross Attention Text to Image Synthesis Temporal Attention Spatial Attention

March 29, 2023

WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Konstantina Nikolaidou, George Retsinas, Vincent Christlein, Mathias Seuret, Giorgos Sfikas, Elisa Barney Smith, Hamam Mokayed, Marcus Liwicki
Diffusion Model Generative Adversarial Network Latent Diffusion Model Diffusion Probabilistic Model Word List Text to Image Synthesis Handwritten Text Generation

March 25, 2023

Indonesian Text-to-Image Synthesis with Sentence-BERT and FastGAN
Made Raharja Surya Mahadi, Nugraha Priya Utama
Text to Image Synthesis Text Encoder Text to Image Generation Model Image Generator Sentence BERT

March 24, 2023

Factor Decomposed Generative Adversarial Networks for Text-to-Image Synthesis
Jiguo Li, Xiaobin Liu, Lirong Zheng
Text to Image Synthesis Text Feature

March 23, 2023

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
Zero Shot Text to Image Diffusion Model Text to Image Synthesis Text to Video Generation Text to Video Synthesis

March 21, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A Smith
Question Answering Text to Image Model Text to Image Text to Image Synthesis Text to Image Generation Model

Expressive Text-to-Image Generation with Rich Text

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
