Multimodal Learning
Multimodal learning aims to improve machine learning performance by integrating data from multiple sources, such as text, images, and audio, to create richer, more robust representations. Current research addresses challenges such as missing modalities (building models resilient to incomplete data) and modality imbalance (ensuring all modalities contribute fairly), alongside efficient fusion techniques (e.g., dynamic anchor methods, single-branch networks, and various attention mechanisms). This field is significant because it enables more accurate and contextually aware systems across diverse applications, including healthcare diagnostics, recommendation systems, and video understanding.
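As a rough illustration of the attention-based fusion techniques mentioned above, the sketch below combines two modality embeddings (for example, text tokens and image patches) with multi-head cross-attention in PyTorch. The module name CrossAttentionFusion, the choice of text-as-query, and all dimensions are illustrative assumptions and are not taken from any of the papers listed below.

# Illustrative sketch only: minimal attention-based fusion of two modality
# embeddings. Names and dimensions are hypothetical, not from the listed papers.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse two modality embeddings with multi-head cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Text tokens attend to image tokens (queries come from text).
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len, dim)
        # image_feats: (batch, image_len, dim)
        attended, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        fused = self.norm(text_feats + attended)       # residual connection + layer norm
        return self.proj(fused.mean(dim=1))            # pooled joint representation

if __name__ == "__main__":
    fusion = CrossAttentionFusion(dim=256, num_heads=4)
    text = torch.randn(2, 16, 256)    # toy text token embeddings
    image = torch.randn(2, 49, 256)   # toy image patch embeddings
    print(fusion(text, image).shape)  # torch.Size([2, 256])

The same pattern extends to more than two modalities by stacking additional cross-attention blocks or by concatenating pooled representations before a classifier head; the single-branch and dynamic anchor methods surveyed here are alternatives to this pairwise design.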
Papers
A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data
Marco D'Alessandro, Enrique Calabrés, Mikel Elkano
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai
Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule
Yi Xiao, Xiangxin Zhou, Qiang Liu, Liang Wang
Triple Disentangled Representation Learning for Multimodal Affective Analysis
Ying Zhou, Xuefeng Liang, Han Chen, Yin Zhao, Xin Chen, Lida Yu
MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
Viktor Moskvoretskii, Dmitry Osin, Egor Shvetsov, Igor Udovichenko, Maxim Zhelnin, Andrey Dukhovny, Anna Zhimerikina, Evgeny Burnaev