Cross-Modal Representation Learning
Cross-modal representation learning aims to build unified representations of data from different modalities (e.g., text, images, audio) so that diverse data types can be integrated and analyzed jointly. Current research focuses on architectures such as masked autoencoders, transformers, and contrastive learning methods that align and fuse information from disparate sources, often by leveraging pre-trained large language models. The field underpins applications in domains such as medical diagnostics, spatio-temporal forecasting, speech processing, and multimedia understanding, where models that exploit the complementary strengths of multiple data sources tend to be more robust and accurate.
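To make the contrastive-alignment idea mentioned above concrete, the following is a minimal sketch of a symmetric, CLIP-style contrastive loss between paired image and text embeddings. It is illustrative only and not taken from any specific paper listed here; the function name, embedding dimension, and temperature value are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: pulls matched image/text pairs together
    and pushes mismatched pairs apart (CLIP-style cross-modal alignment)."""
    # Normalize embeddings so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # Matched pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image-to-text and text-to-image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Example: a batch of 8 paired image/text embeddings of dimension 512.
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(contrastive_alignment_loss(img, txt))
```

In practice the two embedding sets would come from modality-specific encoders (e.g., a vision transformer and a text encoder) trained jointly with this objective so that paired inputs map to nearby points in a shared representation space.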