Unimodal Models
Unimodal models, which process a single data modality (e.g., text or images), are increasingly used as building blocks for multimodal models that integrate information from multiple sources. Current research emphasizes efficient methods for aligning unimodal representations into a shared space, typically via contrastive learning, lightweight projection layers, or Mixture of Experts (MoE) architectures. This work matters because it lets researchers assemble powerful multimodal systems from existing, well-trained unimodal encoders, reducing computational cost and data requirements while improving performance on tasks such as sentiment analysis, activity recognition, and image retrieval.
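To make the alignment idea concrete, here is a minimal PyTorch sketch of one common recipe mentioned above: keeping two pretrained unimodal encoders frozen and training only small projection layers with a CLIP-style symmetric contrastive loss. The class name, dimensions, and toy inputs are illustrative assumptions, not drawn from any specific paper on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveAligner(nn.Module):
    """Projects frozen unimodal embeddings into a shared space and
    trains the projections with a symmetric contrastive (InfoNCE) loss.
    Dimensions and names here are illustrative assumptions."""

    def __init__(self, text_dim: int, image_dim: int, shared_dim: int = 256):
        super().__init__()
        # Lightweight projection heads; the unimodal encoders themselves
        # stay frozen, which is what keeps training cheap.
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.image_proj = nn.Linear(image_dim, shared_dim)
        # Learnable temperature, initialized to 0.07 as in CLIP.
        self.log_temp = nn.Parameter(torch.tensor(0.07).log())

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product is cosine similarity.
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        v = F.normalize(self.image_proj(image_emb), dim=-1)
        logits = t @ v.T / self.log_temp.exp()  # (batch, batch) similarity matrix
        # Matched text/image pairs sit on the diagonal.
        targets = torch.arange(logits.size(0), device=logits.device)
        # Symmetric loss: text-to-image and image-to-text directions.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

# Toy usage: random tensors stand in for frozen unimodal encoder outputs.
aligner = ContrastiveAligner(text_dim=768, image_dim=1024)
loss = aligner(torch.randn(8, 768), torch.randn(8, 1024))
loss.backward()
```

Because gradients only flow through the two projection layers and the temperature, this setup trains orders of magnitude fewer parameters than the underlying encoders, which is the efficiency argument the summary paragraph makes.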