Caption Pair
A caption pair is a video paired with its textual description, and such pairs are central to advancing video understanding in artificial intelligence. Current research focuses on robust models that handle temporal reasoning and semantic binding within videos, often using transformer-based architectures trained on large-scale datasets. A key challenge is that existing models struggle to capture complex relationships between visual and textual information, particularly in longer videos or those with noisy or ambiguous content. Progress in this area supports applications such as video retrieval, question answering, and summarization.
Papers
October 3, 2024
June 16, 2024
February 20, 2024
January 30, 2024
December 4, 2023
October 22, 2023
September 4, 2023
August 28, 2023
August 25, 2023
December 31, 2022
December 11, 2022
November 22, 2022
October 10, 2022
August 13, 2022