Diffusion-Based Text-to-Image
Diffusion-based text-to-image models aim to generate high-quality, realistic images from textual descriptions, with a focus on improving image fidelity, controllability, and safety. Current research emphasizes rendering text accurately within generated images, mitigating biases and safety risks (such as unsafe content elicited through prompt manipulation), and improving compositional generation of complex scenes with multiple objects. These advances matter both to the scientific community, where they push the boundaries of multimodal generation and AI safety, and to practical applications in creative content generation, design, and other fields.
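As a concrete illustration of the generation workflow these models expose, below is a minimal sketch of text-to-image inference using the Hugging Face diffusers library; the model ID, prompt, and sampler settings are illustrative assumptions and are not taken from the papers listed in this section.

```python
# Minimal text-to-image sketch with a pretrained latent diffusion pipeline.
# Assumptions: the "runwayml/stable-diffusion-v1-5" checkpoint and the
# prompt below are placeholders chosen for illustration only.
import torch
from diffusers import StableDiffusionPipeline

# Load pretrained weights (downloaded on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # use "cpu" with torch_dtype=torch.float32 if no GPU

# The text prompt conditions each step of the iterative denoising process;
# guidance_scale controls how strongly the image follows the prompt.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("astronaut.png")
```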
Papers
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
Sungnyun Kim, Junsoo Lee, Kibeom Hong, Daesik Kim, Namhyuk Ahn
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors
Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan