Multimodal Generation
Multimodal generation focuses on producing coherent outputs across data types such as text, images, audio, and video, with the aim of building AI systems that understand and generate information in a more human-like way. Current research emphasizes combining autoregressive models, which capture global context, with diffusion models, which produce high-quality local detail, often using large language models to manage complex interactions between modalities. The field matters for creative content generation, personalized experiences, and complex tasks such as robotic control and medical image analysis, and it drives progress in both fundamental AI research and practical applications.