Cross-Modal Fusion
Cross-modal fusion integrates information from different data modalities (e.g., images, text, audio) into richer, more robust representations for downstream tasks. Current research emphasizes efficient and effective fusion strategies, often employing transformer-based architectures and attention mechanisms to capture complex inter-modal relationships, and explores different fusion points (early, mid, late) depending on the task and data characteristics. The field is significant because improved cross-modal understanding enhances performance across broad application areas such as image segmentation, video understanding, recommendation systems, and emotion recognition.
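As a concrete illustration of the attention-based mid-fusion strategy described above, the sketch below lets each modality attend to the other via cross-attention before the two streams are merged. This is a minimal, hypothetical PyTorch example; the module name, feature dimensions, and mean-pooling choice are illustrative assumptions, not any specific paper's design.

```python
# Minimal sketch of mid-level cross-modal fusion with cross-attention.
# Assumes PyTorch >= 1.9 (for batch_first in nn.MultiheadAttention);
# names and dimensions are hypothetical, for illustration only.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuses two modality streams (e.g., image and text tokens) by letting
    each modality attend to the other, then merging the attended results."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Queries come from one modality; keys/values from the other.
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)  # merge the two attended streams

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # img_tokens: (batch, n_img, dim); txt_tokens: (batch, n_txt, dim)
        img_attended, _ = self.img_to_txt(img_tokens, txt_tokens, txt_tokens)
        txt_attended, _ = self.txt_to_img(txt_tokens, img_tokens, img_tokens)
        # Pool each stream and concatenate into a joint representation.
        fused = torch.cat([img_attended.mean(dim=1),
                           txt_attended.mean(dim=1)], dim=-1)
        return self.proj(fused)  # (batch, dim)

# Usage: fuse 196 image patch tokens with 32 text tokens.
fusion = CrossModalFusion(dim=256, num_heads=4)
img = torch.randn(2, 196, 256)
txt = torch.randn(2, 32, 256)
joint = fusion(img, txt)  # shape: (2, 256)
```

Attending in both directions (image-to-text and text-to-image) is one common mid-fusion pattern; early fusion would instead concatenate raw token sequences before a shared encoder, while late fusion would combine per-modality predictions.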