Cross-Modal Alignment
Cross-modal alignment integrates information from different data modalities (e.g., text, images, audio) into unified representations and uncovers correlations between them. Current research emphasizes efficient and robust alignment methods, often using parameter-efficient fine-tuning, lightweight encoders (such as OneEncoder), and novel loss functions to address challenges such as noisy data and modality imbalance. These methods improve applications such as visual question answering, image retrieval, and speech recognition by enabling a more accurate and comprehensive understanding of multimodal data.
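To make the idea of a unified representation concrete, the sketch below shows one widely used alignment objective: a symmetric contrastive loss over paired text and image embeddings. This is an illustration only. The summary above does not name a specific loss, and the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all assumptions chosen for the example, not the method of OneEncoder or of any listed paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity_matrix(text_embs, image_embs):
    """Entry [i][j] is the cosine similarity of text i and image j."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, im)) for im in images] for t in texts]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row k's correct class is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text contrastive losses.

    Pair i of text_embs is assumed to match pair i of image_embs.
    The temperature default is an illustrative assumption.
    """
    sims = cosine_similarity_matrix(text_embs, image_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_transposed)
    )


# Toy data: each text embedding points in nearly the same direction
# as its matching image embedding.
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
aligned_image = [[0.9, 0.1], [0.1, 0.9]]
# Swapping the images breaks the pairing, so the loss should rise.
shuffled_image = [aligned_image[1], aligned_image[0]]

loss_aligned = symmetric_contrastive_loss(aligned_text, aligned_image)
loss_shuffled = symmetric_contrastive_loss(aligned_text, shuffled_image)
print(loss_aligned < loss_shuffled)
```

The loss is low when matching pairs are more similar to each other than to the other items in the batch, and it grows when the pairing is broken. In practice the embeddings would come from trained encoders rather than hand-written vectors.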