Cross-Modal Representation
Cross-modal representation learning aims to create unified representations of information from different modalities (e.g., text, images, audio) to enable more comprehensive understanding and facilitate tasks like image captioning, video question answering, and cross-modal retrieval. Current research focuses on developing robust models, often leveraging transformer architectures and contrastive learning, to handle data heterogeneity, missing modalities, and noisy data, while improving efficiency and reducing computational costs. These advancements are significant for various applications, including medical image analysis, drug discovery, and improving human-computer interaction through more natural and intuitive interfaces.
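The contrastive-learning approach mentioned above can be illustrated with a minimal sketch of a CLIP-style symmetric contrastive loss: matched image–text pairs sit on the diagonal of a similarity matrix, and a softmax cross-entropy pulls matched pairs together while pushing mismatched pairs apart. The function name, embedding shapes, and temperature value below are illustrative assumptions, not from any specific paper.

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays where row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); matched pairs on diagonal
    n = logits.shape[0]

    def xent(l):
        # softmax cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # symmetric: image-to-text retrieval plus text-to-image retrieval
    return 0.5 * (xent(logits) + xent(logits.T))

# toy usage: identical (perfectly aligned) pairs score lower than random pairs
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_aligned = clip_style_loss(emb, emb)
loss_random = clip_style_loss(emb, rng.normal(size=(4, 8)))
```

In practice the embeddings would come from modality-specific encoders (e.g. transformer backbones for text and images) trained end-to-end, but the loss itself is modality-agnostic, which is what makes it a common backbone for cross-modal retrieval.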