Unimodal Model
Unimodal models, which process a single data modality (e.g., text or images), are increasingly used to build and improve multimodal models that integrate information from multiple sources. Current research emphasizes efficient methods for aligning unimodal representations, often using contrastive learning, projection layers, or Mixture of Experts (MoE) architectures, to create effective multimodal systems. This approach is significant because it lets researchers build powerful multimodal models on top of existing, well-trained unimodal architectures, reducing computational costs and data requirements while improving performance on tasks such as sentiment analysis, activity recognition, and image retrieval.
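To make the alignment idea concrete, the following is a minimal sketch of one common recipe: embeddings from two unimodal encoders are mapped into a shared space by linear projection layers and scored with a symmetric contrastive (InfoNCE-style, as popularized by CLIP) loss, where the i-th text and i-th image form a positive pair. It assumes the encoders are frozen and their embeddings are given as plain lists; the function names (`project`, `contrastive_loss`), the shared-space size, and the temperature of 0.07 are illustrative choices, not taken from any specific paper, and no training loop or gradient computation is shown.

```python
import math
import random


def project(vec, weights):
    """Linear projection layer: maps a unimodal embedding into the shared space.

    `weights` has one row per output dimension; each row has the input dimension.
    """
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Negative log-softmax of `logits` at index `target` (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]


def contrastive_loss(text_embs, image_embs, w_text, w_image, temperature=0.07):
    """Symmetric contrastive loss; text_embs[i] and image_embs[i] are a positive pair."""
    # Project each modality into the shared space, then normalize.
    t = [l2_normalize(project(e, w_text)) for e in text_embs]
    v = [l2_normalize(project(e, w_image)) for e in image_embs]
    n = len(t)
    # Pairwise cosine-similarity matrix, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(t[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text-to-image: each row should pick out its matching image.
    text_to_image = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Image-to-text: each column should pick out its matching text.
    image_to_text = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (text_to_image + image_to_text) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical frozen embeddings: 4 pairs, 8-d text and 6-d image features.
    text = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    image = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
    # Hypothetical projection weights into a 4-d shared space.
    w_text = [[rng.gauss(0, 0.1) for _ in range(8)] for _ in range(4)]
    w_image = [[rng.gauss(0, 0.1) for _ in range(6)] for _ in range(4)]
    print(contrastive_loss(text, image, w_text, w_image))
```

In this setup only the small projection matrices would be trained, which is one reason reusing pretrained unimodal encoders can reduce compute and data requirements: the loss is low when matching pairs are closer in the shared space than mismatched ones.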