Masked Multimodal Learning
Masked multimodal learning aims to improve the robustness and efficiency of models that process data from multiple sources (e.g., images, text, audio) by strategically masking portions of the input and training the model to predict the missing modalities. Current research focuses on novel architectures such as Mixture-of-Experts and transformers, combined with techniques like masked autoencoding and cross-modal matching, to enhance feature representation and cross-modal alignment. By enabling models to handle incomplete or noisy data, this approach improves generalization and efficiency across applications including autonomous driving, visual document understanding, and person re-identification.
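To make the core idea concrete, the sketch below shows a minimal masked multimodal autoencoding step in PyTorch: tokens from each modality are randomly masked, replaced with a learned per-modality mask embedding, jointly encoded by a shared transformer, and reconstructed with per-modality heads, so unmasked tokens in one modality help recover masked tokens in another. All module names, shapes, and hyperparameters here are illustrative assumptions, not drawn from any particular paper.

```python
# A minimal sketch of masked multimodal autoencoding, assuming inputs are
# already embedded (e.g., image patch embeddings and text token embeddings).
import torch
import torch.nn as nn


class MaskedMultimodalAutoencoder(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        # One learned [MASK] embedding per modality (hypothetical setup).
        self.mask_tokens = nn.ParameterDict({
            "image": nn.Parameter(torch.zeros(dim)),
            "text": nn.Parameter(torch.zeros(dim)),
        })
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        # Lightweight per-modality reconstruction heads.
        self.decoders = nn.ModuleDict({
            "image": nn.Linear(dim, dim),
            "text": nn.Linear(dim, dim),
        })

    def forward(self, inputs, mask_ratio=0.5):
        """inputs: dict of modality name -> (batch, seq, dim) embeddings."""
        masked, targets, masks = [], {}, {}
        for name, x in inputs.items():
            # Randomly choose positions to mask within each modality.
            keep = torch.rand(x.shape[:2], device=x.device) > mask_ratio
            masks[name] = ~keep
            targets[name] = x
            # Replace masked positions with the modality's mask token.
            x = torch.where(keep.unsqueeze(-1), x,
                            self.mask_tokens[name].expand_as(x))
            masked.append(x)
        # Jointly encode all modalities so visible tokens from one modality
        # can inform reconstruction of masked tokens in another.
        lengths = [t.shape[1] for t in masked]
        h = self.encoder(torch.cat(masked, dim=1))
        loss = 0.0
        for (name, _), hs in zip(inputs.items(), h.split(lengths, dim=1)):
            pred = self.decoders[name](hs)
            # Reconstruction loss is computed only on masked positions.
            m = masks[name].unsqueeze(-1)
            loss = loss + ((pred - targets[name]) ** 2 * m).sum() / m.sum().clamp(min=1)
        return loss


# Usage: one training step on dummy image-patch and text-token embeddings.
model = MaskedMultimodalAutoencoder()
batch = {"image": torch.randn(2, 16, 256), "text": torch.randn(2, 8, 256)}
loss = model(batch)
loss.backward()
```

A natural variant, relevant to the missing-modality robustness discussed above, is to mask entire modalities rather than individual tokens, which trains the model to tolerate a fully absent input stream at inference time.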