Video Text Retrieval
Video text retrieval (VTR) aims to find the videos that best match a given text query, bridging the semantic gap between visual and textual data. Current research relies heavily on pre-trained vision-language models such as CLIP. One line of work improves efficiency through techniques such as prompt tuning and adapter modules; another improves accuracy through multi-scale feature learning, refined cross-modal alignment strategies (e.g., one-to-many alignment), and data-centric approaches such as query expansion. VTR underpins applications such as video search and recommendation, and ongoing research targets both the speed and the accuracy of these systems.
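The basic retrieval step that CLIP-style approaches build on can be illustrated with a minimal sketch. It assumes each video is already represented by a handful of per-frame embeddings and the query by a single text embedding in the same space. The toy 3-dimensional vectors, the video names, and the helper functions (`l2_normalize`, `mean_pool`, `cosine`, `rank_videos`) are all hypothetical stand-ins for real encoder outputs and are not taken from any particular paper or library. Mean pooling over frames is only one simple aggregation choice; the alignment strategies mentioned above replace it with finer-grained matching.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def mean_pool(frames):
    """Average per-frame embeddings into one video-level embedding."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_videos(text_emb, videos):
    """Return video ids sorted by similarity to the text embedding, best first."""
    scores = {
        video_id: cosine(text_emb, mean_pool(frames))
        for video_id, frames in videos.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumption): two frame embeddings per video, made up for illustration.
videos = {
    "cooking": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "soccer": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
    "news": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
}
query = [1.0, 0.0, 0.0]  # hypothetical text embedding for a cooking-related query

ranking = rank_videos(query, videos)
print(ranking)
```

In a real system the frame and text embeddings would come from a pre-trained vision-language encoder, and the ranking would typically run over a precomputed index of video embeddings rather than a Python dictionary.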