Text-to-Image Diffusion
Text-to-image diffusion models generate images from textual descriptions by iteratively refining noise into a coherent image, leveraging large pretrained models and the diffusion process. Current research focuses on improving controllability (e.g., viewpoint, object attributes, style) and efficiency (e.g., one-step generation, reduced parameter counts), and on addressing issues such as overfitting and bias in personalized models, often through parameter-efficient fine-tuning, reinforcement learning, and multi-modal conditioning on additional data sources (e.g., audio, depth maps). These advances enable more efficient and controllable image synthesis, with significant impact on computer graphics, digital art, and content creation.
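To make the phrase "iteratively refining noise into a coherent image" concrete, below is a minimal, schematic sketch of text-conditioned reverse diffusion. It assumes a hypothetical learned `denoiser(x_t, t, text_embedding)` that predicts the noise present at step t; real systems such as Stable Diffusion additionally operate in a learned latent space and apply classifier-free guidance, which are omitted here for brevity.

```python
# Schematic sketch of text-conditioned reverse diffusion (DDIM-style, deterministic).
# `denoiser` is a hypothetical placeholder for a trained noise-prediction network.
import numpy as np

def cosine_alpha_bar(num_steps: int) -> np.ndarray:
    """Cumulative signal-retention schedule (alpha-bar), cosine-shaped."""
    t = np.linspace(0.0, 1.0, num_steps + 1)
    f = np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    return f[1:] / f[0]

def sample(denoiser, text_embedding, shape, num_steps=50, seed=0):
    """Iteratively refine Gaussian noise into an image, guided by the text embedding."""
    rng = np.random.default_rng(seed)
    alpha_bar = cosine_alpha_bar(num_steps)
    x = rng.standard_normal(shape)              # start from pure noise
    for t in reversed(range(num_steps)):
        a_bar = alpha_bar[t]
        a_bar_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = denoiser(x, t, text_embedding)     # predicted noise (learned, text-conditioned)
        # Estimate the clean image implied by the current noisy sample.
        x0_hat = (x - np.sqrt(1 - a_bar) * eps) / np.sqrt(a_bar)
        # Step one level less noisy, re-mixing the estimate with the predicted noise.
        x = np.sqrt(a_bar_prev) * x0_hat + np.sqrt(1 - a_bar_prev) * eps
    return x
```

Many of the efficiency directions mentioned above (e.g., one-step generation) amount to collapsing or distilling this iterative loop, while controllability work changes what the denoiser is conditioned on at each step.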
Papers
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang, Bingtao Fu, Qingnan Fan, Qi Zhang, Runxing Liu, Hong Gu, Huaqi Zhang, Xinguo Liu
A Simple and Efficient Baseline for Zero-Shot Generative Classification
Zipeng Qi, Buhua Liu, Shiyan Zhang, Bao Li, Zhiqiang Xu, Haoyi Xiong, Zeke Xie
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin
Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory
Eric Hanchen Jiang, Yasi Zhang, Zhi Zhang, Yixin Wan, Andrew Lizarraga, Shufan Li, Ying Nian Wu
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre
Novel Object Synthesis via Adaptive Text-Image Harmony
Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li