Multimodal Image Generation
Multimodal image generation aims to create realistic images from diverse inputs such as text, sketches, and reference images, offering finer control and more creative flexibility than unimodal methods. Current research focuses on combining the strengths of autoregressive and diffusion models, often incorporating large language models for better context understanding and leveraging pre-trained models to reduce training cost. The field is significant for its potential applications in creative industries such as fashion design and art, as well as for advancing computer vision tasks like person re-identification and image fusion.
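The core idea above — fusing embeddings from several input modalities into a single conditioning signal that drives a generator — can be illustrated with a toy sketch. Everything here is hypothetical: the encoders are stood in for by fixed random projections, and the "generator" is a single linear map, where a real system would use learned encoders (e.g. a language model for text, a CNN for sketches) feeding a diffusion or autoregressive decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_DIM, SKETCH_DIM, COND_DIM, IMG_SIZE = 32, 64, 48, 8

# Frozen "pre-trained" encoders, stood in for by fixed random projections.
W_text = rng.standard_normal((TEXT_DIM, COND_DIM))
W_sketch = rng.standard_normal((SKETCH_DIM, COND_DIM))
W_gen = rng.standard_normal((COND_DIM, IMG_SIZE * IMG_SIZE))

def fuse(text_emb: np.ndarray, sketch_emb: np.ndarray) -> np.ndarray:
    """Fuse two modality embeddings into one conditioning vector.

    Here each modality is projected into a shared space and summed;
    concatenation or cross-attention are common alternatives.
    """
    return text_emb @ W_text + sketch_emb @ W_sketch

def generate(cond: np.ndarray) -> np.ndarray:
    """Map the conditioning vector to a toy 'image' with values in [0, 1]."""
    logits = cond @ W_gen
    img = 1.0 / (1.0 + np.exp(-logits))  # sigmoid squash to [0, 1]
    return img.reshape(IMG_SIZE, IMG_SIZE)

text_emb = rng.standard_normal(TEXT_DIM)      # stand-in for a text encoder output
sketch_emb = rng.standard_normal(SKETCH_DIM)  # stand-in for a sketch encoder output

image = generate(fuse(text_emb, sketch_emb))
print(image.shape)  # (8, 8)
```

The sketch only shows the data flow (two modalities in, one conditioned image out); the research summarized above is about making each box learned and powerful rather than random.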
Papers

The associated paper list spans publication dates from May 2022 through October 2024.