Multimodal Summarization
Multimodal summarization aims to generate concise summaries that integrate information from multiple modalities, such as text and images, often targeting both textual and visual outputs. Current research focuses on improving the accuracy and efficiency of these summaries, typically using transformer-based architectures combined with techniques such as cross-modal attention, knowledge distillation, and optimal transport to better align and fuse information across modalities. The field is significant because it addresses the growing need to process and understand complex multimedia data efficiently, with applications ranging from medical image analysis to news video summarization, as well as making information more accessible.
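To make "cross-modal attention" concrete, the following is a minimal sketch, not the method of any particular paper. It assumes text tokens and image regions have already been embedded into vectors of the same dimension, and it omits the learned query/key/value projections and multiple heads that real transformer models use. Each text token (query) attends over the image regions (keys/values) with scaled dot-product attention, producing one fused vector per token. The function names `softmax`, `dot`, and `cross_modal_attention` are hypothetical and chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(text_queries: List[Vector],
                          image_keys: List[Vector],
                          image_values: List[Vector]) -> List[Vector]:
    """Hypothetical sketch: each text query attends over image regions.

    Assumes inputs are pre-projected embeddings (no learned weights here).
    Returns, for every text token, a weighted average of the image-region
    values, weighted by scaled query-key similarity.
    """
    d = len(image_keys[0])
    scale = 1.0 / math.sqrt(d)  # standard 1/sqrt(d) scaling
    dim_v = len(image_values[0])
    fused = []
    for q in text_queries:
        # Attention weights of this text token over all image regions.
        weights = softmax([dot(q, k) * scale for k in image_keys])
        # Weighted sum of image-region value vectors, one component at a time.
        fused.append([sum(w * v[j] for w, v in zip(weights, image_values))
                      for j in range(dim_v)])
    return fused


if __name__ == "__main__":
    # Toy example: two text tokens, two image regions (made-up embeddings).
    text = [[1.0, 0.0], [0.0, 1.0]]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    print(cross_modal_attention(text, keys, values))
```

In the toy example, each text token places more weight on the image region whose key it most resembles, so its fused vector leans toward that region's value. Production models would learn the projections and stack several such layers, often in both directions (text-to-image and image-to-text).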
Papers
August 28, 2024
August 7, 2024
August 6, 2024
April 9, 2024
February 18, 2024
July 6, 2023
June 7, 2023
May 22, 2023
March 13, 2023
February 20, 2023
February 13, 2023
December 15, 2022
November 4, 2022
October 16, 2022