Unimodal Encoders
Unimodal encoders process a single data modality, such as text or images, and serve as building blocks for efficient multimodal models. Current research combines pre-trained unimodal encoders into multimodal systems using techniques such as projection layers, modular fusion frameworks, and conditional prompting, often with the aim of minimizing fine-tuning and computational cost. Reusing existing, well-understood unimodal components in this way makes multimodal systems more data-efficient and computationally tractable for applications including image-text retrieval and biomedical analysis.
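To make the projection-layer idea concrete, here is a minimal sketch in Python with NumPy. It is illustrative only and not taken from any particular paper: the two "frozen encoders" are random linear maps standing in for real pre-trained models, and all names and sizes (encode_image, encode_text, P_img, P_txt, the feature dimensions) are assumptions. It shows how two small projection matrices, the only parameters that would be trained, map each encoder's features into a shared space where image-text retrieval becomes a cosine-similarity lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes, chosen only for illustration.
IMG_IN, TXT_IN = 3072, 300          # raw input sizes
IMG_FEAT, TXT_FEAT = 512, 768       # unimodal feature sizes
SHARED = 128                        # shared embedding size

# Hypothetical stand-ins for frozen pre-trained unimodal encoders.
# A real system would load e.g. a vision model and a language model here.
W_img_frozen = rng.standard_normal((IMG_IN, IMG_FEAT)) / np.sqrt(IMG_IN)
W_txt_frozen = rng.standard_normal((TXT_IN, TXT_FEAT)) / np.sqrt(TXT_IN)

def encode_image(x):
    """Frozen image encoder (placeholder): raw input -> image features."""
    return np.tanh(x @ W_img_frozen)

def encode_text(x):
    """Frozen text encoder (placeholder): raw input -> text features."""
    return np.tanh(x @ W_txt_frozen)

# Projection layers: the only parameters that would be updated in training.
P_img = rng.standard_normal((IMG_FEAT, SHARED)) / np.sqrt(IMG_FEAT)
P_txt = rng.standard_normal((TXT_FEAT, SHARED)) / np.sqrt(TXT_FEAT)

def project(features, P):
    """Map unimodal features into the shared space and L2-normalize them."""
    z = features @ P
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def similarity(images, texts):
    """Cosine similarity between every image and every text."""
    z_img = project(encode_image(images), P_img)
    z_txt = project(encode_text(texts), P_txt)
    return z_img @ z_txt.T

# Toy batch: 4 "images" and 6 "captions" as random vectors.
images = rng.standard_normal((4, IMG_IN))
texts = rng.standard_normal((6, TXT_IN))

scores = similarity(images, texts)            # shape (4, 6)
best_text_per_image = scores.argmax(axis=1)   # retrieval: top caption per image
n_trainable = P_img.size + P_txt.size         # projections only; encoders stay frozen
```

With untrained projections the retrieved captions are arbitrary. In practice the projections would be fitted with a contrastive or similar objective while the encoders stay fixed, which is what keeps the trainable parameter count small relative to the encoders.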