Text-to-Image Diffusion
Text-to-image diffusion models generate images from textual descriptions by iteratively refining noise into a coherent image, building on large pretrained models and the diffusion process. Current research focuses on improving controllability (e.g., viewpoint, object attributes, style) and efficiency (e.g., one-step generation, reduced parameter counts), and on addressing overfitting and bias in personalized models; common techniques include parameter-efficient fine-tuning, reinforcement learning, and multi-modal conditioning on additional data sources (e.g., audio, depth maps). These advances enable more efficient and controllable image synthesis, with significant impact on computer graphics, digital art, and content creation.
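To make the "iteratively refining noise" idea concrete, the sketch below shows a bare-bones DDPM-style sampling loop with classifier-free guidance. It is an illustrative toy, not the method of any paper listed here: the `denoiser`, `text_emb`, and schedule constants are hypothetical stand-ins for what would be a large pretrained text-conditioned U-Net or transformer.

```python
# Minimal sketch of the reverse diffusion loop behind text-to-image generation.
# Assumes a hypothetical denoiser(x_t, t, text_emb) that predicts the added noise;
# illustrative only, not any specific model's implementation.
import torch

T = 50                                   # number of denoising steps
betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(denoiser, text_emb, null_emb, shape=(1, 4, 64, 64), guidance=7.5):
    x = torch.randn(shape)               # start from pure Gaussian noise
    for t in reversed(range(T)):
        # Classifier-free guidance: blend conditional and unconditional noise predictions.
        eps_cond = denoiser(x, t, text_emb)
        eps_uncond = denoiser(x, t, null_emb)
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)

        # DDPM posterior mean: remove the predicted noise component.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])

        # Add fresh noise at every step except the last.
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                             # in latent-diffusion models, this latent is decoded by a VAE

# Toy usage with a stand-in denoiser (a real model would be a text-conditioned U-Net).
dummy = lambda x, t, emb: torch.zeros_like(x)
latent = sample(dummy, text_emb=torch.zeros(1, 77, 768), null_emb=torch.zeros(1, 77, 768))
```

Few-step models such as SDXL Turbo distill this many-step loop into one or a few denoising steps, which is the efficiency direction mentioned above.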
Papers
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre
Novel Object Synthesis via Adaptive Text-Image Harmony
Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li
UVMap-ID: A Controllable and Personalized UV Map Generative Model
Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting
Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang