Multimodal Corpus
Multimodal corpora are collections of data that integrate several modalities, such as text, audio, images, and video, with the aim of better understanding and modeling human communication. Current research focuses on building large-scale, multilingual multimodal corpora and using them to train and evaluate multimodal large language models (mLLMs), often employing techniques such as image-text interleaving and schema-based approaches to improve in-context learning and task performance. These corpora are central to advancing natural language processing, particularly in areas such as emotion recognition, video editing, and cross-lingual understanding, and they enable the development of more robust, human-like AI systems.
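To make the image-text interleaving idea concrete, here is a minimal sketch of how a single corpus document might be flattened into a model-ready token stream, with each image replaced by a placeholder token at its original position in the reading order. The segment classes, the `<image>` placeholder string, and the `interleave` function are illustrative assumptions, not the API of any particular corpus or model.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical placeholder token standing in for an image embedding slot.
IMAGE_TOKEN = "<image>"

@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    path: str  # reference to the image file on disk

Segment = Union[TextSegment, ImageSegment]

def interleave(segments: List[Segment]) -> str:
    """Flatten a document into one string, replacing each image with a
    placeholder token while preserving the original reading order."""
    parts = []
    for seg in segments:
        if isinstance(seg, TextSegment):
            parts.append(seg.text)
        else:
            parts.append(IMAGE_TOKEN)
    return " ".join(parts)

# An interleaved document: text and images alternate in reading order.
doc = [
    TextSegment("A cat sits on a mat."),
    ImageSegment("img/cat.jpg"),
    TextSegment("The same cat, later, asleep."),
]
print(interleave(doc))
# → A cat sits on a mat. <image> The same cat, later, asleep.
```

At training time, a real pipeline would swap each placeholder for the image's visual embedding; keeping text and image slots in document order is what lets the model learn from their natural co-occurrence.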