Text to Video Retrieval
Text-to-video retrieval (TVR) aims to efficiently locate the videos that match a given textual description. Current research focuses heavily on improving the alignment between visual and textual representations. Common approaches employ transformer-based architectures, build on pre-trained models such as CLIP, explore multi-granularity features (e.g., sentence-level and word-level text, frame-level and segment-level video), and incorporate audio information to improve retrieval accuracy. Advances in TVR improve search over large video datasets and power applications such as video recommendation systems and content-based video indexing.
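To make the alignment idea concrete, here is a minimal sketch of the simplest retrieval setup: frame-level embeddings are mean-pooled into one video-level embedding, and videos are ranked by cosine similarity to the text embedding. It assumes the embeddings already come from some shared text-video encoder (for example a CLIP-style model); the toy vectors, the video names, and the helper functions `mean_pool`, `cosine`, and `rank_videos` are all hypothetical and chosen only for illustration. Real systems typically replace the mean pooling with finer-grained word-to-frame matching.

```python
import math


def mean_pool(frames):
    """Average per-frame embeddings into a single video-level embedding."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(text_embedding, videos):
    """Return (video_id, score) pairs sorted from best to worst match."""
    scored = [
        (video_id, cosine(text_embedding, mean_pool(frames)))
        for video_id, frames in videos.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumption): 3-dimensional embeddings, two frames per video.
videos = {
    "cooking": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "soccer": [[0.1, 0.9, 0.2], [0.0, 1.0, 0.1]],
    "surfing": [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]],
}

# Stand-in for the embedding of a query such as "players kicking a ball".
query = [0.1, 0.95, 0.1]

ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

With these toy vectors the "soccer" video ranks first, since its pooled embedding points in nearly the same direction as the query. Mean pooling is the coarsest choice here: it discards temporal order and per-frame detail, which is one motivation for the multi-granularity features mentioned above.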