Multimodal Learning
Multimodal learning aims to improve machine learning performance by integrating data from multiple sources, such as text, images, and audio, to build richer and more robust representations. Current research addresses challenges such as missing modalities (building models that remain reliable when some inputs are absent) and modality imbalance (ensuring each modality contributes fairly), and develops efficient fusion techniques (e.g., dynamic anchor methods, single-branch networks, and attention mechanisms). The field matters because it enables more accurate and contextually aware systems across diverse applications, including healthcare diagnostics, recommendation systems, and video understanding.
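To make the fusion idea concrete, below is a minimal, illustrative sketch of attention-based fusion between two modalities, written in PyTorch. It is not taken from any of the listed papers; the class name, dimensions, and pooling choice are assumptions chosen for clarity, and real systems would pair this with pretrained per-modality encoders.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Fuse two modality embeddings (e.g., text tokens and image patches)
    with cross-attention. Shapes and dimensions are illustrative only."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Text tokens attend to image tokens; batch_first keeps (B, seq, dim) layout.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, text_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # Query = text, key/value = image: each text token gathers visual context.
        attended, _ = self.cross_attn(text_tokens, image_tokens, image_tokens)
        fused = self.norm(text_tokens + attended)   # residual connection + layer norm
        return self.proj(fused).mean(dim=1)         # pooled joint representation

# Usage with random features standing in for real encoder outputs.
text = torch.randn(8, 32, 256)    # (batch, text tokens, dim)
image = torch.randn(8, 49, 256)   # (batch, image patches, dim)
fusion = CrossModalAttentionFusion()
joint = fusion(text, image)       # -> (8, 256) fused representation
```

Because the fused representation depends on cross-attention weights rather than fixed concatenation, a design like this can, in principle, down-weight a missing or noisy modality, which is one reason attention-based fusion appears frequently in the work surveyed here.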
Papers
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report
Jielin Qiu, Jiacheng Zhu, Shiqi Liu, William Han, Jingqi Zhang, Chaojing Duan, Michael Rosenberg, Emerson Liu, Douglas Weber, Ding Zhao
Noisy Correspondence Learning with Meta Similarity Correction
Haochen Han, Kaiyao Miao, Qinghua Zheng, Minnan Luo
Identifiability Results for Multimodal Contrastive Learning
Imant Daunhawer, Alice Bizeul, Emanuele Palumbo, Alexander Marx, Julia E. Vogt
Preoperative Prognosis Assessment of Lumbar Spinal Surgery for Low Back Pain and Sciatica Patients based on Multimodalities and Multimodal Learning
Li-Chin Chen, Jung-Nien Lai, Hung-En Lin, Hsien-Te Chen, Kuo-Hsuan Hung, Yu Tsao