Compositional Text-to-Image
Compositional text-to-image generation aims to create images that accurately reflect complex textual descriptions involving multiple objects, attributes, and relationships. Current research focuses on improving the controllability and accuracy of these models, often employing diffusion models and large language models (LLMs) to guide the generation process. Common failure modes, such as attribute misalignment (e.g., a color from one object bleeding onto another) and object omission, are addressed through techniques such as attention-map manipulation and scene decomposition. These advances are significant both for deepening the fundamental understanding of multimodal generation and for enabling applications that require precise visual representations of nuanced textual input.
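The attention-map manipulation mentioned above is concrete enough to sketch. Below is a minimal, illustrative PyTorch toy in the spirit of Attend-and-Excite-style methods: a loss rewards each object token for claiming a strong spatial peak in the cross-attention maps, and the noisy latent is nudged along the loss gradient at each denoising step so neglected objects are not dropped. The attention maps, projection, and `object_token_ids` here are toy stand-ins for a real diffusion pipeline's internals, not an actual implementation.

```python
import torch
import torch.nn.functional as F

def attend_and_excite_loss(attn_maps: torch.Tensor,
                           object_token_ids: list[int]) -> torch.Tensor:
    """Encourage every object token to claim at least one strong spatial
    attention peak, so no mentioned object is omitted from the image.

    attn_maps: cross-attention weights of shape (num_tokens, H, W),
               softmaxed over tokens at each spatial location.
    """
    losses = []
    for tok in object_token_ids:
        m = attn_maps[tok]
        # Smooth before taking the max so a single noisy pixel cannot
        # satisfy the objective (the original method uses a Gaussian blur;
        # average pooling is a simplification here).
        m = F.avg_pool2d(m[None, None], kernel_size=3, stride=1, padding=1)[0, 0]
        losses.append(1.0 - m.max())
    # Optimize the worst-attended object token.
    return torch.stack(losses).max()

# Toy usage: nudge the latent so neglected object tokens gain attention.
latent = torch.randn(1, 4, 64, 64, requires_grad=True)
# In a real pipeline these maps come from the UNet's cross-attention layers;
# here a random projection fakes a differentiable dependence on the latent.
fake_proj = torch.randn(77, 4)
attn = torch.einsum('tc,bchw->thw', fake_proj, latent).softmax(dim=0)
loss = attend_and_excite_loss(attn, object_token_ids=[2, 5])
loss.backward()
with torch.no_grad():
    latent -= 20.0 * latent.grad  # one gradient step on the latent per denoising step
```

In an actual pipeline this update is interleaved with the sampler: at each (early) denoising step the cross-attention maps are read out, the loss is computed over the prompt's object tokens, and the latent is shifted before the next UNet call.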