Multimodal Systems
Multimodal systems integrate data from multiple sources (e.g., audio, video, text) to accomplish tasks beyond the reach of single-modality approaches. Current research focuses on improving architectures such as two-tower networks and large language models (LLMs) for tasks like action recognition, emotion detection, and design generation, often employing multimodal fusion and attention mechanisms. This field is significant for its potential to enable more robust, accurate, and human-centered applications across diverse domains, from healthcare and assistive technologies to urban planning and online safety.
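To make the two-tower and fusion ideas concrete, here is a minimal sketch in PyTorch of a two-tower model that encodes two modalities separately and fuses them with cross-modal attention. The class name, feature dimensions, and label count are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class TwoTowerFusion(nn.Module):
    """Minimal two-tower model: one encoder ("tower") per modality,
    fused with cross-modal attention before a shared classifier."""

    def __init__(self, text_dim=768, audio_dim=128, hidden=256, num_classes=7):
        super().__init__()
        # Separate towers project each modality into a shared space.
        self.text_tower = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.audio_tower = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Cross-modal attention: text tokens attend over audio frames.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_feats, audio_feats):
        # text_feats:  (batch, text_len, text_dim)
        # audio_feats: (batch, audio_len, audio_dim)
        t = self.text_tower(text_feats)
        a = self.audio_tower(audio_feats)
        # Fusion step: each text token queries the audio sequence.
        fused, _ = self.cross_attn(query=t, key=a, value=a)
        # Pool over the sequence and classify (e.g., emotion labels).
        return self.classifier(fused.mean(dim=1))

model = TwoTowerFusion()
text = torch.randn(2, 16, 768)   # e.g., transformer token embeddings
audio = torch.randn(2, 50, 128)  # e.g., mel-spectrogram frame features
logits = model(text, audio)      # shape: (2, 7)
```

Keeping the towers separate lets each modality use an encoder suited to it, while the attention layer performs the fusion; simpler alternatives such as concatenating pooled features follow the same two-tower pattern.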