Multi-Modal Transformer
Multi-modal transformers are deep learning models designed to integrate and process information from multiple data sources (e.g., images, text, audio) simultaneously, aiming to improve the accuracy and robustness of various tasks compared to single-modality approaches. Current research focuses on developing efficient architectures, such as encoder-decoder transformers and modality-specific fusion strategies, to handle diverse data types and address challenges like data heterogeneity and missing modalities. These models are proving valuable across numerous fields, including medical image analysis, speech recognition, and autonomous driving, by enabling more comprehensive and accurate analyses than previously possible.
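The fusion idea described above can be illustrated with a minimal sketch: project each modality's features into a shared embedding space, concatenate the token sequences, and let one self-attention layer mix information across modalities (simple early fusion). All weights here are random placeholders standing in for learned parameters, and the function name `fuse_modalities` is hypothetical, not from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(text_feats, image_feats, d_model=32, seed=0):
    """Early fusion: map both modalities into a shared d_model space,
    concatenate their token sequences, and mix with one self-attention
    layer. Random matrices stand in for learned projection weights."""
    rng = np.random.default_rng(seed)
    W_text = rng.standard_normal((text_feats.shape[1], d_model)) / np.sqrt(text_feats.shape[1])
    W_img = rng.standard_normal((image_feats.shape[1], d_model)) / np.sqrt(image_feats.shape[1])
    # Joint token sequence: text tokens followed by image patches.
    tokens = np.concatenate([text_feats @ W_text, image_feats @ W_img], axis=0)

    # Single-head self-attention over the joint sequence, so every
    # token can attend across modality boundaries.
    Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_model))
    fused = attn @ V
    return fused.mean(axis=0)  # mean-pooled joint representation

rng = np.random.default_rng(1)
text = rng.standard_normal((5, 16))   # 5 text tokens, 16-dim features
image = rng.standard_normal((9, 64))  # 9 image patches, 64-dim features
z = fuse_modalities(text, image)      # shared representation, shape (32,)
```

Real systems replace the random projections with learned encoders (e.g. a language model for text and a vision transformer for images) and stack many such attention layers; cross-attention variants, where one modality queries the other, are a common alternative to this concatenation-based fusion.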