Cross-Modal Alignment
Cross-modal alignment integrates information from different data modalities (e.g., text, images, audio) into unified representations and uncovers the correlations between them. Current research emphasizes efficient and robust alignment methods, often using parameter-efficient fine-tuning, lightweight encoders (such as OneEncoder), and novel loss functions to handle challenges such as noisy data and modality imbalance. Better alignment improves applications including visual question answering, image retrieval, and speech recognition, because models can interpret multimodal data more accurately and more completely.
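To make the idea of an alignment loss concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective, a common choice for pulling paired embeddings from two modalities together. It is an illustration only and is not taken from any specific paper listed here. The function names, the toy 3-dimensional embeddings, and the temperature of 0.07 are all assumptions.

```python
import math

def normalize(vec):
    # Scale a vector to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def log_softmax(row):
    # Numerically stable log-softmax over one list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(a_emb, b_emb, temperature=0.07):
    """Hypothetical helper: item i of a_emb is assumed to be paired with item i of b_emb."""
    a = [normalize(v) for v in a_emb]
    b = [normalize(v) for v in b_emb]
    # Cosine similarity between every cross-modal pair, sharpened by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    n = len(logits)
    # Direction A -> B: each row should put its mass on the diagonal entry.
    a_to_b = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Direction B -> A: same idea over the columns.
    cols = [list(col) for col in zip(*logits)]
    b_to_a = -sum(log_softmax(cols[i])[i] for i in range(n)) / n
    return 0.5 * (a_to_b + b_to_a)

# Toy "text" and "image" embeddings (made-up values for illustration).
text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Rotating the image list breaks every pairing, so the loss should rise sharply.
image_shuffled = [image_aligned[1], image_aligned[2], image_aligned[0]]

aligned_loss = symmetric_contrastive_loss(text, image_aligned)
shuffled_loss = symmetric_contrastive_loss(text, image_shuffled)
print(f"aligned: {aligned_loss:.6f}  shuffled: {shuffled_loss:.6f}")
```

Correctly paired embeddings yield a loss near zero, while mismatched pairs are penalized heavily. Real systems would compute this over learned encoder outputs in a framework with automatic differentiation, and may replace or augment it with losses designed for noisy pairs or imbalanced modalities.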
Papers
May 28, 2024
May 23, 2024
May 15, 2024
April 28, 2024
April 21, 2024
April 20, 2024
April 5, 2024
March 21, 2024
March 13, 2024
March 8, 2024
February 15, 2024
January 25, 2024
January 22, 2024
January 16, 2024
January 2, 2024
December 28, 2023
December 25, 2023