Multimodal Loss
Multimodal loss functions are crucial for effectively integrating information from diverse data sources (e.g., images, text, audio) in machine learning models. Current research focuses on improving the robustness and efficiency of multimodal learning by addressing challenges such as noisy labels, gradient conflicts between unimodal and multimodal objectives, and imbalanced data. This involves developing novel architectures and algorithms, such as Pareto-based optimization and quality-aware fusion methods, to enhance representation learning and downstream task performance. By enabling more accurate and reliable models, these advances carry significant implications for applications including medical image analysis, robotics, and cross-modal retrieval.
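As a concrete illustration of how unimodal and multimodal objectives are typically combined (and why their gradients can conflict through shared encoders), the PyTorch sketch below pairs a symmetric InfoNCE-style cross-modal contrastive term with unimodal classification terms under a single trade-off weight. All names and values here (`infonce`, `multimodal_loss`, `lam`, the 0.07 temperature) are illustrative assumptions, not any specific paper's recipe.

```python
import torch
import torch.nn.functional as F

def infonce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: row i of `a` and row i of `b` form a positive pair."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multimodal_loss(img_emb, txt_emb, img_logits, txt_logits, labels,
                    lam: float = 0.5) -> torch.Tensor:
    """Cross-modal contrastive term plus unimodal classification terms.
    `lam` trades the two objectives off; both backpropagate through the
    same encoders, which is where gradient conflicts arise."""
    cross = infonce(img_emb, txt_emb)
    uni = F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)
    return cross + lam * uni

# Usage with random tensors standing in for encoder outputs:
N, D, C = 8, 256, 10
loss = multimodal_loss(torch.randn(N, D), torch.randn(N, D),
                       torch.randn(N, C), torch.randn(N, C),
                       torch.randint(0, C, (N,)))
```

A fixed `lam` is the simplest choice; the Pareto-based optimization methods mentioned above instead balance the two gradient directions adaptively when they conflict.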
Papers
Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training
Vedant Dave, Fotios Lygerakis, Elmar Rueckert
Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss
Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes
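For the Zoom-shot entry above, the general setting is transferring an arbitrary vision encoder into CLIP's shared image-text embedding space. The sketch below is one plausible, heavily simplified reading of that setting, assuming a linear map trained with a multimodal loss that combines an InfoNCE alignment term against paired CLIP text embeddings with a cycle-consistency penalty. The class and function names, the inverse map, and the fixed weights are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearMap(nn.Module):
    """Linear projection from a source vision encoder's space into CLIP space,
    with an inverse map used only for the cycle-consistency term."""
    def __init__(self, src_dim: int, clip_dim: int):
        super().__init__()
        self.fwd = nn.Linear(src_dim, clip_dim, bias=False)
        self.inv = nn.Linear(clip_dim, src_dim, bias=False)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.fwd(v)

def transfer_loss(mapper: LinearMap, vision_feats: torch.Tensor,
                  clip_text_feats: torch.Tensor,
                  cycle_weight: float = 1.0) -> torch.Tensor:
    """Symmetric InfoNCE alignment between mapped vision features and paired
    CLIP text embeddings, plus a cycle term asking inv(fwd(v)) ≈ v."""
    mapped = F.normalize(mapper(vision_feats), dim=-1)
    text = F.normalize(clip_text_feats, dim=-1)
    logits = mapped @ text.t() / 0.07
    targets = torch.arange(mapped.size(0), device=mapped.device)
    align = 0.5 * (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets))
    cycle = F.mse_loss(mapper.inv(mapper.fwd(vision_feats)), vision_feats)
    return align + cycle_weight * cycle
```

Only the small linear map is trained here; both encoders stay frozen, which is what makes this style of transfer fast relative to retraining a multimodal model end to end.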