Video-Language Pre-Training
Video-language pre-training (VLP) aims to learn shared representations between video and text through self-supervised learning on paired data, improving performance on downstream tasks such as video retrieval and video question answering. Current research emphasizes efficient model architectures, focusing on techniques like hierarchical representations, fine-grained spatio-temporal alignment, and parameter-efficient adaptation to reduce computational cost and improve generalization. These advances matter because they enable more robust and efficient video understanding systems, with applications ranging from improved search to more capable AI assistants.
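The shared-representation objective at the core of many VLP models is a symmetric contrastive (InfoNCE) loss that pulls matched video-text pairs together and pushes mismatched pairs apart. Below is a minimal sketch of that objective in PyTorch; the function name, embedding dimensions, and temperature value are illustrative assumptions, not drawn from any specific paper.

```python
# Minimal sketch of a CLIP-style symmetric video-text contrastive loss.
# Names and hyperparameters are hypothetical; real VLP systems add
# temporal modeling, hierarchical features, and other objectives on top.
import torch
import torch.nn.functional as F

def video_text_contrastive_loss(video_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings.

    video_emb, text_emb: (batch, dim) pooled clip / sentence embeddings.
    Pairs sharing a batch index are positives; all others are negatives.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                   # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    loss_v2t = F.cross_entropy(logits, targets)      # video -> text direction
    loss_t2v = F.cross_entropy(logits.T, targets)    # text -> video direction
    return 0.5 * (loss_v2t + loss_t2v)

# Toy usage: random tensors stand in for video/text encoder outputs.
video_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(video_text_contrastive_loss(video_emb, text_emb).item())
```

Because every in-batch mismatch serves as a negative, the same loss directly supports retrieval in both directions, which is why video retrieval is a standard downstream evaluation for these models.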