Multimodal Interaction
Multimodal interaction research focuses on building systems that integrate and interpret information from multiple sensory modalities (e.g., text, audio, vision) to enable more natural and effective human-computer interaction. Current work emphasizes robust model architectures (e.g., transformers) and training objectives such as contrastive learning to fuse multimodal data and accurately infer user intent or emotion, often leveraging large language models for higher-level reasoning. This field is significant for advancing human-robot interaction, improving assistive technologies, and creating more intuitive interfaces for applications ranging from autonomous driving to healthcare.
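As a concrete illustration of the contrastive-fusion idea mentioned above, the sketch below aligns pre-extracted text and image features in a shared embedding space with a symmetric contrastive loss and returns a simple fused representation. It is a minimal sketch under assumed settings (the ContrastiveFusion class, feature dimensions, and temperature are illustrative choices), not the method of any paper listed here.

```python
# Minimal contrastive multimodal fusion sketch (assumed architecture, for illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveFusion(nn.Module):
    def __init__(self, text_dim=300, image_dim=512, embed_dim=128, temperature=0.07):
        super().__init__()
        # Small projection heads map each modality into a shared embedding space.
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.image_proj = nn.Sequential(
            nn.Linear(image_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.temperature = temperature

    def forward(self, text_feats, image_feats):
        # L2-normalize so the dot product is cosine similarity.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)
        logits = t @ v.T / self.temperature              # (batch, batch) similarity matrix
        targets = torch.arange(t.size(0), device=t.device)
        # Symmetric cross-entropy: matched text/image pairs lie on the diagonal.
        loss = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
        fused = torch.cat([t, v], dim=-1)                # simple late-fusion representation
        return fused, loss


# Toy usage with randomly generated stand-ins for unimodal features.
model = ContrastiveFusion()
text_batch = torch.randn(8, 300)
image_batch = torch.randn(8, 512)
fused, loss = model(text_batch, image_batch)
loss.backward()
```

In practice the projected, aligned embeddings would feed a downstream head (e.g., intent or emotion classification); the contrastive term here only encourages the two modalities to agree, which is one common way to learn multimodal representations without paired labels.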
Papers
Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications
Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency
Evaluating Multimodal Interaction of Robots Assisting Older Adults
Afagh Mehri Shervedani, Ki-Hwan Oh, Bahareh Abbasi, Natawut Monaikul, Zhanibek Rysbek, Barbara Di Eugenio, Milos Zefran
InterMulti: Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis
Feng Qiu, Wanzeng Kong, Yu Ding