Multimodal
Multimodal research focuses on integrating and analyzing data from multiple sources (e.g., text, images, audio, sensor data) to achieve a more comprehensive understanding than any single modality allows. Current research emphasizes developing robust models, often employing large language models (LLMs) and graph neural networks (GNNs), to handle the complexity of multimodal data and address challenges like error detection in mathematical reasoning, long-horizon inference, and efficient data fusion. This field is significant for advancing AI capabilities in diverse applications, including improved recommendation systems, assistive robotics, medical diagnosis, and autonomous driving, by enabling more nuanced and accurate interpretations of complex real-world scenarios.
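To make the idea of multimodal data fusion concrete, below is a minimal, hypothetical sketch of late fusion in PyTorch: features from two modalities (e.g., text and image) are projected into a shared space, concatenated, and passed to a common prediction head. The class name, feature dimensions, and encoders are illustrative assumptions, not the method of any paper listed here.

```python
# Minimal late-fusion sketch (illustrative only): each modality is encoded
# separately, the embeddings are concatenated, and a shared head predicts.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        # Modality-specific projections into a shared embedding space
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        self.image_proj = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        # Fusion head operates on the concatenated embeddings
        self.head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feats, image_feats):
        fused = torch.cat(
            [self.text_proj(text_feats), self.image_proj(image_feats)], dim=-1
        )
        return self.head(fused)

# Usage with random stand-in features; in practice these would come from
# pretrained encoders (e.g., a language model and a vision backbone).
model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 10])
```

Late fusion like this keeps each modality's encoder independent, which is simple and robust; many of the papers below instead explore tighter coupling, such as contrastive pretraining or graph-based reasoning over multimodal inputs.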
Papers
MM-GEF: Multi-modal representation meet collaborative filtering
Hao Wu, Alejandro Ariza-Casabona, Bartłomiej Twardowski, Tri Kurniawan Wijaya
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
Hongguang Zhu, Yunchao Wei, Xiaodan Liang, Chunjie Zhang, Yao Zhao
It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos
Enfa George, Mihai Surdeanu
Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning
Ke Liang, Sihang Zhou, Yue Liu, Lingyuan Meng, Meng Liu, Xinwang Liu
Multi-modal multi-class Parkinson disease classification using CNN and decision level fusion
Sushanta Kumar Sahu, Ananda S. Chowdhury
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil, Yonatan Bitton, Roy Schwartz