Multimodal Generation
Multimodal generation focuses on producing coherent outputs across multiple data types, such as text, images, audio, and video, with the goal of building AI systems that understand and generate information more like humans do. Current research emphasizes hybrid architectures that combine autoregressive models, which capture global context, with diffusion models, which produce high-quality local detail, often using large language models to coordinate interactions between modalities. The field matters for creative content generation, personalized experiences, and downstream tasks such as robotic control and medical image analysis, driving progress in both fundamental AI research and practical applications.
Papers
Twenty papers on this topic, dated from November 30, 2023 through November 13, 2024.