Multimodal Learning
Multimodal learning aims to improve machine learning performance by integrating data from multiple sources, such as text, images, and audio, to build richer, more robust representations. Current research addresses challenges such as missing modalities (building models resilient to incomplete inputs) and modality imbalance (ensuring each modality contributes fairly), as well as efficient fusion techniques (e.g., dynamic anchor methods, single-branch networks, and attention mechanisms). The field matters because it enables more accurate and contextually aware systems across diverse applications, including healthcare diagnostics, recommendation systems, and video understanding.
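As a concrete illustration of attention-based fusion, one of the techniques mentioned above, the sketch below fuses pre-extracted text and image embeddings with cross-attention. It is a minimal example, not taken from any of the listed papers; the embedding dimension, pooling, and classification head are illustrative assumptions.

```python
# Minimal sketch of attention-based multimodal fusion (illustrative only),
# assuming text and image features have already been produced by separate encoders.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse text and image embeddings with multi-head cross-attention."""
    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)  # hypothetical task head

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq_len, dim); image_emb: (batch, num_patches, dim)
        # Text tokens attend over image patches; a missing image modality could be
        # handled by skipping attention and pooling text_emb alone.
        fused, _ = self.attn(query=text_emb, key=image_emb, value=image_emb)
        pooled = fused.mean(dim=1)  # simple mean pooling over fused tokens
        return self.classifier(pooled)

# Usage with random tensors standing in for real encoder outputs.
model = AttentionFusion()
text = torch.randn(8, 16, 256)   # e.g., token embeddings from a text encoder
image = torch.randn(8, 49, 256)  # e.g., patch embeddings from a vision encoder
logits = model(text, image)      # shape: (8, 2)
```

Mean pooling and a single linear head keep the example minimal; real systems typically use task-specific heads, modality-specific encoders, and more elaborate fusion or balancing strategies such as those studied in the papers below.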
Papers
MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report
Samrajya Thapa, Koushik Howlader, Subhankar Bhattacharjee, Wei Le
Multimodal Learning for Embryo Viability Prediction in Clinical IVF
Junsik Kim, Zhiyi Shi, Davin Jeong, Johannes Knittel, Helen Y. Yang, Yonghyun Song, Wanhua Li, Yicong Li, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao, Mahyar Najibi, Sachin Mehta
Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers
Davide Celestini, Amirhossein Afsharrad, Daniele Gammelli, Tommaso Guffanti, Gioele Zardini, Sanjay Lall, Elisa Capello, Simone D'Amico, Marco Pavone
On-the-fly Modulation for Balanced Multimodal Learning
Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Christian Ganhör, Marta Moscati, Anna Hausberger, Shah Nawaz, Markus Schedl
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar, Raghav Singhal, Pranamya Kulkarni, Deval Mehta, Kshitij Jadhav