Text to Video Retrieval
Text-to-video retrieval (TVR) aims to find the videos in a collection that best match a given textual description. Current research focuses heavily on improving the alignment between visual and textual representations. Common approaches employ transformer-based architectures and pre-trained models such as CLIP, explore multi-granularity features (e.g., sentence-level and word-level text, frame-level and segment-level video), and incorporate audio information to improve retrieval accuracy. Advances in TVR improve search over large video datasets and support applications such as video recommendation systems and content-based video indexing.
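To make the alignment idea concrete, here is a minimal sketch of the simplest retrieval setup: each video is represented by the mean of its frame-level embeddings, and videos are ranked by cosine similarity to a sentence-level text embedding. It is an illustration under stated assumptions, not any particular paper's method. The embeddings are hand-written toy vectors standing in for the outputs of a pre-trained encoder such as CLIP, and the names `mean_pool`, `rank_videos`, and the video IDs are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def mean_pool(frames):
    """Collapse frame-level embeddings into one video-level embedding."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_videos(text_emb, videos):
    """Return (video_id, score) pairs sorted from best to worst match."""
    scores = {
        vid: cosine(text_emb, mean_pool(frames)) for vid, frames in videos.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (assumption): 3-dimensional embeddings, two frames per video.
# In practice these would come from a pre-trained text/image encoder.
videos = {
    "cooking": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "soccer": [[0.0, 0.9, 0.2], [0.1, 0.8, 0.3]],
    "news": [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]],
}
query = [1.0, 0.1, 0.0]  # stand-in for the embedded text description

ranking = rank_videos(query, videos)
for vid, score in ranking:
    print(f"{vid}: {score:.3f}")
```

Mean pooling discards temporal order and treats every frame as equally relevant to the query. The multi-granularity approaches mentioned above can be read as refinements of this baseline, for example by also scoring word-level text features against individual frames or segments.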
Papers
April 26, 2022
April 15, 2022
April 6, 2022
March 24, 2022
January 23, 2022
January 13, 2022