Multimodal Information
Multimodal information processing focuses on integrating data from multiple sources, such as text, images, audio, and sensor data, to achieve a more comprehensive understanding than any single modality allows. Current research emphasizes developing robust model architectures, including large language models (LLMs), transformers, and autoencoders, to effectively fuse and interpret this diverse information, often addressing challenges like missing data and noise. This field is significant for advancing numerous applications, from improving medical diagnoses and e-commerce search to enhancing robotic perception and understanding human-computer interactions.
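To make the fusion idea concrete, here is a minimal sketch of late fusion, one common way to integrate modalities: each modality is encoded into a fixed-size vector by its own (toy) encoder, and the vectors are concatenated into a joint representation for a downstream model. The encoder functions and dimensions here are illustrative placeholders, not from any specific paper.

```python
import numpy as np

def encode_text(tokens, dim=8):
    # Toy text encoder: hash each token into a fixed-size bag-of-words vector.
    vec = np.zeros(dim)
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec

def encode_image(pixels, dim=8):
    # Toy image encoder: pool pixel intensities (in [0, 1]) into a histogram.
    vec = np.zeros(dim)
    for p in pixels:
        vec[int(p * (dim - 1))] += 1.0
    return vec

def late_fusion(text_vec, image_vec, image_weight=0.5):
    # Late fusion: concatenate per-modality embeddings into one joint vector.
    # A missing modality can be handled by substituting a zero vector.
    return np.concatenate([text_vec, image_weight * image_vec])

fused = late_fusion(encode_text(["cat", "on", "mat"]),
                    encode_image([0.1, 0.9, 0.5]))
print(fused.shape)  # joint 16-dimensional representation
```

In practice the toy encoders would be replaced by pretrained networks (e.g. a transformer for text, a vision backbone for images), but the fusion step — mapping each modality into a shared vector space before combining — is the same.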