Cross-Modal Fusion
Cross-modal fusion integrates information from different data modalities (e.g., images, text, audio) to build richer, more robust representations for downstream tasks. Current research emphasizes efficient and effective fusion strategies, often using transformer-based architectures and attention mechanisms to capture complex inter-modal relationships, and explores where in the pipeline fusion should occur (early, mid, or late) depending on the task and data characteristics. The field matters because improved cross-modal understanding enhances performance across broad application areas such as image segmentation, video understanding, recommendation systems, and emotion recognition.
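To make the attention-based mid-fusion idea concrete, below is a minimal sketch of a bidirectional cross-attention fusion module in PyTorch. The class name, modality labels (image/text), and dimensions are illustrative assumptions for this summary, not the design of any specific paper.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative mid-fusion module: two modality streams exchange
    information via bidirectional cross-attention, then are pooled and
    concatenated into one joint representation."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Each modality attends to the other; both blocks share the embedding size.
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_txt = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # Image tokens query the text tokens, and vice versa.
        img_ctx, _ = self.img_to_txt(query=img_tokens, key=txt_tokens, value=txt_tokens)
        txt_ctx, _ = self.txt_to_img(query=txt_tokens, key=img_tokens, value=img_tokens)
        # Residual connections preserve each stream's original information.
        img_fused = self.norm_img(img_tokens + img_ctx)
        txt_fused = self.norm_txt(txt_tokens + txt_ctx)
        # Mean-pool each stream over its sequence, then concatenate.
        return torch.cat([img_fused.mean(dim=1), txt_fused.mean(dim=1)], dim=-1)

# Example: batch of 8 samples, 196 image patches and 32 text tokens, 256-dim embeddings.
fusion = CrossAttentionFusion(dim=256)
img = torch.randn(8, 196, 256)
txt = torch.randn(8, 32, 256)
joint = fusion(img, txt)  # shape: (8, 512)
```

By contrast, an early-fusion design would concatenate raw or shallow features from both modalities before any joint processing, while a late-fusion design would run each modality through its own full encoder and combine only the final predictions; the cross-attention approach above sits between the two, which is why it is often described as mid fusion.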