Video Description
Video description research aims to automatically generate natural-language summaries of video content, improving accessibility and enabling deeper video understanding. Current efforts center on large-scale video-language models, typically built on transformer architectures and incorporating techniques such as curriculum learning and multi-modal fusion (e.g., combining visual and audio streams) to improve the accuracy and detail of generated descriptions. These advances matter both for accessibility, particularly for blind and low-vision users, and for downstream applications in video indexing, retrieval, and analysis.
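As a rough sketch of the multi-modal fusion pattern mentioned above, the PyTorch snippet below projects pre-extracted visual and audio frame features into a shared space, fuses them with a transformer encoder, and decodes caption tokens with a causally masked transformer decoder. The module names, feature dimensions, and the omission of positional encodings and a real tokenizer are simplifying assumptions for illustration, not the design of any particular published model.

```python
import torch
import torch.nn as nn


class VideoCaptioner(nn.Module):
    """Toy video-description model: fuse visual + audio features, decode a caption."""

    def __init__(self, visual_dim=2048, audio_dim=128, d_model=512,
                 vocab_size=10000, num_layers=4, num_heads=8):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        # Transformer encoder fuses the concatenated visual/audio token sequence.
        enc_layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        # Transformer decoder attends to the fused memory and emits caption tokens.
        dec_layer = nn.TransformerDecoderLayer(d_model, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, visual_feats, audio_feats, caption_tokens):
        # visual_feats: (B, T_v, visual_dim); audio_feats: (B, T_a, audio_dim)
        # Positional encodings are omitted here for brevity.
        memory = torch.cat([self.visual_proj(visual_feats),
                            self.audio_proj(audio_feats)], dim=1)
        memory = self.encoder(memory)
        tgt = self.token_emb(caption_tokens)          # (B, T_txt, d_model)
        t = caption_tokens.size(1)
        # Standard causal mask so each position only attends to earlier caption tokens.
        causal_mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal_mask)
        return self.lm_head(out)                      # (B, T_txt, vocab_size) logits


if __name__ == "__main__":
    model = VideoCaptioner()
    visual = torch.randn(2, 16, 2048)            # e.g., 16 sampled frame features per clip
    audio = torch.randn(2, 32, 128)              # e.g., 32 audio feature frames per clip
    tokens = torch.randint(0, 10000, (2, 12))    # teacher-forced caption prefix
    print(model(visual, audio, tokens).shape)    # torch.Size([2, 12, 10000])
```

In practice the visual and audio features would come from pretrained backbones, and training would use teacher forcing with a cross-entropy loss over the caption vocabulary; the example above only shows the fusion and decoding structure.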