Multimodal
Multimodal research focuses on integrating and analyzing data from multiple sources (e.g., text, images, audio, sensor data) to achieve a more comprehensive understanding than any single modality allows. Current research emphasizes robust models, often built on large language models (LLMs) and graph neural networks (GNNs), that can handle the complexity of multimodal data and address challenges such as error detection in mathematical reasoning, long-horizon inference, and efficient data fusion. The field is significant for advancing AI capabilities across diverse applications, including improved recommendation systems, assistive robotics, medical diagnosis, and autonomous driving, by enabling more nuanced and accurate interpretations of complex real-world scenarios.
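To make the idea of data fusion concrete, here is a minimal sketch of a late-fusion classifier, assuming PyTorch and pre-extracted per-modality features; the class name, feature dimensions, and two-modality setup are illustrative assumptions, not the method of any paper listed below.

```python
# Minimal late-fusion sketch (illustrative only, not from the listed papers):
# each modality is encoded separately, and the resulting embeddings are
# concatenated before a shared classification head.
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=512, text_dim=768, hidden_dim=256, num_classes=10):
        super().__init__()
        # Per-modality encoders project heterogeneous features into a shared space.
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        # The fusion head operates on the concatenated per-modality embeddings.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, image_features, text_features):
        fused = torch.cat(
            [self.image_encoder(image_features), self.text_encoder(text_features)],
            dim=-1,
        )
        return self.classifier(fused)


if __name__ == "__main__":
    model = LateFusionClassifier()
    # Toy batch of 4 samples with hypothetical image (512-d) and text (768-d) features.
    logits = model(torch.randn(4, 512), torch.randn(4, 768))
    print(logits.shape)  # torch.Size([4, 10])
```

Late fusion is only one design point; several of the papers below study alternatives such as deep fusion, information-bottleneck weighting of modalities, and contrastive cross-modal guidance.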
Papers
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen, Hao Zheng, Yuemeng Li, Yuncong Ma, Liang Ma, Hongming Li, Yong Fan
Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference
Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner
Informative Priors Improve the Reliability of Multimodal Clinical Data Classification
L. Julian Lechuga Lopez, Tim G. J. Rudner, Farah E. Shamout
Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration
Yifan Xie, Jihua Zhu, Shiqi Li, Pengcheng Shi
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Yingying Fang, Shuang Wu, Sheng Zhang, Chaoyan Huang, Tieyong Zeng, Xiaodan Xing, Simon Walsh, Guang Yang
Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis
Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard