Multimodal Information
Multimodal information processing focuses on integrating data from multiple sources, such as text, images, audio, and sensor data, to achieve a more comprehensive understanding than any single modality allows. Current research emphasizes developing robust model architectures, including large language models (LLMs), transformers, and autoencoders, to effectively fuse and interpret this diverse information, often addressing challenges like missing data and noise. This field is significant for advancing numerous applications, from improving medical diagnoses and e-commerce search to enhancing robotic perception and understanding human-computer interactions.
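The fusion step described above can be sketched minimally: encode each modality separately, then combine the embeddings into a joint representation. The snippet below is an illustrative late-fusion example with hypothetical embedding sizes (768-d text, 512-d image, 256-d joint space), not any particular paper's architecture; it also shows one simple way to handle a missing modality, by substituting a zero vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(text_emb: np.ndarray, image_emb: np.ndarray,
         w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Late fusion: concatenate per-modality embeddings, then apply a
    learned linear projection with a tanh nonlinearity to get a joint
    representation. Weights here are random stand-ins for trained ones."""
    joint = np.concatenate([text_emb, image_emb])
    return np.tanh(w @ joint + b)

# Hypothetical per-modality embeddings (e.g., from a text encoder and
# an image encoder; dimensions chosen for illustration only).
text_emb = rng.standard_normal(768)
image_emb = rng.standard_normal(512)

# Projection into a 256-d joint space.
w = rng.standard_normal((256, 768 + 512)) * 0.01
b = np.zeros(256)

fused = fuse(text_emb, image_emb, w, b)
print(fused.shape)  # (256,)

# A common simple strategy for a missing modality: impute a zero vector
# so the projection still receives a fixed-size input.
fused_missing_image = fuse(text_emb, np.zeros(512), w, b)
print(fused_missing_image.shape)  # (256,)
```

Real systems typically learn the encoders and the fusion weights jointly (e.g., with cross-attention in a transformer rather than plain concatenation), but the concatenate-and-project pattern is a common baseline.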