Pre-Trained Unimodal Models
Pre-trained unimodal models, already powerful in their respective domains (e.g., image, text, audio), are increasingly leveraged to build more capable multimodal systems. Current research focuses on efficient methods for integrating these pre-trained models, often employing architectures such as Mixture of Experts (MoE) or novel fusion strategies to overcome challenges like modality-specific biases and computational constraints. This approach enables robust multimodal systems with reduced training-data requirements and improved performance on downstream tasks in natural language processing, computer vision, and audio analysis.
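To make the integration pattern concrete, the sketch below illustrates one common recipe under stated assumptions: a frozen pre-trained image encoder (here, ResNet-18 from torchvision) is combined with pre-extracted text embeddings through a small trainable projection-and-gating head, a lightweight stand-in for MoE-style routing. The specific encoders, feature dimensions, and gating scheme are illustrative choices, not drawn from any particular paper in this collection.

```python
# Minimal sketch (assumptions: PyTorch + torchvision; encoder and head choices
# are illustrative, not from a specific paper) of fusing frozen pre-trained
# unimodal encoders with a small trainable gated-fusion head.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights


class GatedLateFusion(nn.Module):
    """Fuse a frozen image backbone with pre-extracted text embeddings.

    Only the projections, the gate, and the classifier are trained, which
    keeps paired-data and compute requirements low.
    """

    def __init__(self, text_dim: int = 300, hidden: int = 256, num_classes: int = 10):
        super().__init__()
        # Frozen pre-trained image backbone (classification head removed).
        self.image_encoder = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.image_encoder.fc = nn.Identity()
        for p in self.image_encoder.parameters():
            p.requires_grad = False

        # Project each modality into a shared space.
        self.img_proj = nn.Linear(512, hidden)
        self.txt_proj = nn.Linear(text_dim, hidden)

        # Per-example scalar gate decides how much to weight each modality,
        # a lightweight stand-in for MoE-style routing.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, image: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        img = self.img_proj(self.image_encoder(image))
        txt = self.txt_proj(text_feat)
        g = self.gate(torch.cat([img, txt], dim=-1))
        fused = g * img + (1 - g) * txt
        return self.classifier(fused)


if __name__ == "__main__":
    model = GatedLateFusion()
    images = torch.randn(2, 3, 224, 224)    # batch of images
    text_feats = torch.randn(2, 300)        # pre-extracted text embeddings (e.g., averaged word vectors)
    print(model(images, text_feats).shape)  # torch.Size([2, 10])
```

Because only the small fusion head is optimized while the unimodal backbone stays frozen, a setup along these lines can be trained with comparatively little paired multimodal data, which is the main appeal of reusing strong unimodal models.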