Multimodal Self-Supervised Learning
Multimodal self-supervised learning aims to learn robust representations from unlabeled data spanning multiple modalities (e.g., images, text, audio). Current research focuses on effective strategies for aligning and fusing information across modalities, using objectives such as contrastive learning and masked autoencoding, typically built on transformer architectures, to capture both shared and modality-specific information. The approach is significant because it reduces the reliance on expensive labeled datasets that limits supervised learning, enabling more powerful and generalizable models for applications such as healthcare, remote sensing, and robotics.
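As a concrete illustration of the contrastive-alignment idea, the sketch below shows a CLIP-style symmetric InfoNCE objective between two modality encoders in PyTorch. It is a minimal sketch under stated assumptions: the class name ContrastiveAligner, the toy encoder stand-ins, the embedding dimension, and the temperature value are illustrative choices, not any particular paper's implementation.

```python
# Minimal sketch: CLIP-style symmetric contrastive alignment of two modalities
# (e.g., image and text). All names, dimensions, and the temperature are
# illustrative assumptions, not a reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveAligner(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 embed_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        self.image_encoder = image_encoder   # maps images -> (B, d_img) features
        self.text_encoder = text_encoder     # maps text   -> (B, d_txt) features
        # Projection heads map both modalities into a shared embedding space.
        self.image_proj = nn.LazyLinear(embed_dim)
        self.text_proj = nn.LazyLinear(embed_dim)
        self.temperature = temperature

    def forward(self, images, texts):
        # Encode and project each modality, then L2-normalize the embeddings.
        img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt = F.normalize(self.text_proj(self.text_encoder(texts)), dim=-1)

        # Pairwise cosine similarities, scaled by the temperature.
        logits = img @ txt.t() / self.temperature            # (B, B)

        # Matched image/text pairs lie on the diagonal; all other pairs in the
        # batch act as negatives (symmetric InfoNCE loss in both directions).
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Toy stand-ins for real vision/text backbones (hypothetical shapes).
    image_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(512), nn.ReLU())
    text_encoder = nn.Sequential(nn.LazyLinear(512), nn.ReLU())
    model = ContrastiveAligner(image_encoder, text_encoder)
    images = torch.randn(8, 3, 32, 32)   # batch of 8 paired samples
    texts = torch.randn(8, 64)           # pretend pre-embedded text features
    print(model(images, texts))          # scalar contrastive loss
```

The same shared-embedding idea generalizes to other modality pairs (audio, video, sensor data); masked-autoencoding objectives instead reconstruct masked portions of one or more modalities and are often combined with contrastive alignment.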
Papers
Recent papers on this topic span November 2021 through November 2024.