Video Text Retrieval
Video text retrieval (VTR) aims to find the videos that best match a given text query, bridging the semantic gap between visual and textual data. Current research builds heavily on pre-trained vision-language models such as CLIP. Efficiency is improved through techniques like prompt tuning and adapter modules, while accuracy is enhanced via multi-scale feature learning, refined cross-modal alignment strategies (e.g., one-to-many alignment), and data-centric approaches such as query expansion. VTR underpins applications like video search and recommendation, where both retrieval speed and accuracy matter.
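A common CLIP-based baseline encodes each sampled frame with the image tower, mean-pools the frame embeddings into a single video vector, and ranks videos by cosine similarity against the text embedding. The following is a minimal sketch of that pipeline, assuming frames have already been extracted as PIL images; the model checkpoint and helper names are illustrative and not tied to any specific paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Assumption: "openai/clip-vit-base-patch32" stands in for whatever
# vision-language backbone a given method fine-tunes.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def encode_video(frames):
    """Encode sampled frames with CLIP's image tower, mean-pool over time."""
    inputs = processor(images=frames, return_tensors="pt")
    frame_feats = model.get_image_features(**inputs)        # (num_frames, dim)
    frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
    video_feat = frame_feats.mean(dim=0)                    # temporal mean pooling
    return video_feat / video_feat.norm()

@torch.no_grad()
def encode_text(queries):
    """Encode text queries with CLIP's text tower."""
    inputs = processor(text=queries, return_tensors="pt",
                       padding=True, truncation=True)
    text_feats = model.get_text_features(**inputs)          # (num_queries, dim)
    return text_feats / text_feats.norm(dim=-1, keepdim=True)

def rank_videos(text_feats, video_feats):
    """Cosine similarity between every query and every video, best first."""
    sims = text_feats @ video_feats.T                       # (queries, videos)
    return sims.argsort(dim=-1, descending=True)
```

Mean pooling is the simplest temporal aggregation; the finer-grained alignment strategies mentioned above (e.g., one-to-many alignment) replace this step with frame-to-word or multi-scale matching.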