Modality Invariant
Modality-invariant learning aims to create representations of data that are consistent across different data types (modalities), such as text, audio, and images, enabling robust analysis even when some modalities are missing or incomplete. Current research focuses on models that disentangle modality-specific from modality-invariant features, often employing techniques such as contrastive learning, adversarial networks, and attention mechanisms within transformer-based architectures or single-branch networks. This field is crucial for advancing multimodal applications in domains including medical diagnosis, sentiment analysis, and recommendation systems, because it improves the reliability and robustness of models that handle heterogeneous data.
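One common way to obtain modality-invariant representations, mentioned above, is contrastive learning: embeddings of the same sample from two modalities are pulled together while mismatched pairs are pushed apart. Below is a minimal NumPy sketch of a CLIP-style symmetric InfoNCE objective; the function names, batch layout, and temperature value are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def symmetric_info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of emb_a and row i of emb_b encode the same sample in two
    different modalities; these positive pairs sit on the diagonal of
    the similarity matrix, and all off-diagonal pairs act as negatives.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature        # (N, N) cosine similarities
    labels = np.arange(len(a))            # positives on the diagonal

    def xent(lg):
        # Cross-entropy of each row against its diagonal positive,
        # with max-subtraction for numerical stability.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the a->b and b->a directions, as in CLIP-style training.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss drives the two encoders toward a shared embedding space in which the representation of a sample no longer depends on which modality produced it; as a sanity check, embeddings that are noisy copies of each other should yield a much lower loss than unrelated random embeddings.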
Papers
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
Jinhe Bi, Yujun Wang, Haokun Chen, Xun Xiao, Artur Hecker, Volker Tresp, Yunpu Ma
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, Danilo Comminiello