Video Paragraph Captioning
Video paragraph captioning (VPC) aims to automatically generate coherent multi-sentence descriptions of long, untrimmed videos that capture the narrative flow of events. Current research emphasizes models that remain robust when inputs from one or more modalities (e.g., video, speech, event boundaries) are missing or incomplete, often using transformer-based architectures and contrastive learning to improve coherence and accuracy. The field advances multimodal understanding and has applications in video summarization, accessibility for visually impaired individuals, and human-computer interaction.
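As a rough illustration of the contrastive learning mentioned above, the sketch below computes an InfoNCE-style loss between video-segment embeddings and sentence embeddings. The overview does not say which contrastive objective any particular VPC model uses, so InfoNCE is an assumption here, and the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(video_embs, text_embs, temperature=0.1):
    """Mean InfoNCE loss in the video-to-text direction.

    video_embs[i] and text_embs[i] are treated as the positive pair;
    every other sentence in the batch serves as a negative.
    The temperature of 0.1 is an illustrative assumption.
    """
    assert len(video_embs) == len(text_embs)
    total = 0.0
    for i, v in enumerate(video_embs):
        logits = [cosine(v, t) / temperature for t in text_embs]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching sentence.
        total += log_denom - logits[i]
    return total / len(video_embs)


# Toy data: two video segments and their matching sentences.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [aligned[1], aligned[0]]  # deliberately mismatched pairs

loss_aligned = info_nce(video, aligned)
loss_shuffled = info_nce(video, shuffled)
```

With these toy inputs the correctly paired batch yields a much smaller loss than the mismatched one, which is the signal such an objective uses to pull a video segment toward its own sentence and away from the others.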
Papers
October 12, 2024
March 28, 2024
February 27, 2023
November 28, 2022
June 26, 2022
March 12, 2022