Text Video Pair
Text-video pair research studies how to align textual descriptions with video content in order to improve applications such as video retrieval, question answering, and generation. Current work emphasizes robust models that handle diverse video styles and complex interactions, often using transformer-based architectures, contrastive learning, and diffusion models to improve cross-modal alignment and retrieval efficiency. The field matters because better alignment can improve video search, content creation, and video understanding, which benefits both the scientific study of multimodal learning and practical applications in media and information retrieval.
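As a rough illustration of the contrastive learning idea mentioned above, the sketch below computes a symmetric InfoNCE-style loss over a small batch of paired text and video embeddings, then ranks videos for a text query by cosine similarity. It is a minimal pure-Python sketch, not the method of any particular paper: the function names, the toy embeddings, and the temperature value of 0.07 are all assumptions chosen for illustration, and a real system would obtain the embeddings from learned text and video encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(text_embs, video_embs):
    """Cosine similarity between every text and every video embedding."""
    texts = [l2_normalize(t) for t in text_embs]
    videos = [l2_normalize(v) for v in video_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in videos] for t in texts]


def _diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[i], treating entry i of row i as the match."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, video_embs, temperature=0.07):
    """Average of the text-to-video and video-to-text contrastive losses.

    Assumes text_embs[i] and video_embs[i] form a matching pair; every other
    combination in the batch is treated as a negative. The temperature is an
    assumed, illustrative value.
    """
    sims = similarity_matrix(text_embs, video_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        _diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_transposed)
    )


def rank_videos(query_emb, video_embs):
    """Return video indices sorted from most to least similar to the query."""
    sims = similarity_matrix([query_emb], video_embs)[0]
    return sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)


# Toy data: two captions and two clips in a shared 2-D embedding space.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_videos = [[0.9, 0.1], [0.1, 0.9]]
shuffled_videos = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_contrastive_loss(text_embs, aligned_videos)
loss_shuffled = symmetric_contrastive_loss(text_embs, shuffled_videos)
ranking = rank_videos(text_embs[0], aligned_videos)
```

In this toy setup the loss is small when each caption sits close to its own clip and large when the pairs are swapped, which is the signal contrastive training uses to pull matching text-video pairs together. Retrieval then reduces to sorting videos by similarity to the query embedding.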