Unimodal Model
Unimodal models, which process a single data modality (e.g., text or images), are increasingly being leveraged to build and improve multimodal models that integrate information from multiple sources. Current research emphasizes efficient methods for aligning unimodal representations, often via contrastive learning, projection layers, or Mixture-of-Experts (MoE) architectures, to create effective multimodal systems. This line of work matters because it lets researchers build powerful multimodal models from existing, well-trained unimodal components, reducing computational cost and data requirements while improving performance on tasks such as sentiment analysis, activity recognition, and image retrieval.
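To make the alignment idea concrete, here is a minimal sketch of the common contrastive recipe (as popularized by CLIP-style training): frozen unimodal encoders produce features for each modality, learned projection layers map them into a shared embedding space, and a symmetric InfoNCE loss pulls matched pairs together. All function names, shapes, and the temperature value below are illustrative assumptions, not taken from any specific paper on this page; the projections are plain NumPy matrices standing in for trainable layers.

```python
import numpy as np

def project(features, W):
    """Map unimodal features into the shared space and L2-normalize.

    `W` stands in for a trainable projection layer on top of a
    frozen unimodal encoder (an assumption for this sketch).
    """
    z = features @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_alignment_loss(feats_a, feats_b, W_a, W_b, temperature=0.07):
    """Symmetric InfoNCE loss between two unimodal feature batches.

    Row i of `feats_a` and row i of `feats_b` are assumed to be a
    matched pair (e.g., an image and its caption).
    """
    z_a = project(feats_a, W_a)
    z_b = project(feats_b, W_b)
    # Cosine similarities of every cross-modal pair; matched pairs
    # sit on the diagonal.
    logits = (z_a @ z_b.T) / temperature
    n = len(logits)
    idx = np.arange(n)

    def cross_entropy(l):
        # Numerically stable log-softmax over each row, then take
        # the negative log-probability of the matched (diagonal) pair.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the modality-a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In practice only the projection matrices (and optionally the temperature) would be trained, which is what keeps this approach cheap relative to training a multimodal model from scratch: correctly paired batches should yield a lower loss than mismatched ones, and gradient descent on `W_a` and `W_b` exploits exactly that gap.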
Papers