Video Grounding
Video grounding aims to precisely locate the temporal segment in a video that corresponds to a given textual or spoken language query. Current research focuses on improving the scalability and accuracy of grounding models, particularly for long videos and complex queries, using techniques such as late fusion, efficient sampling, and novel transformer architectures with learnable tokens or dynamic moment queries. These advances are crucial for enhancing video understanding in applications such as video retrieval, summarization, and question answering, and are driving the development of more robust and efficient multimodal learning models.
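To make the "dynamic moment queries" idea concrete, the sketch below shows a minimal DETR-style grounding head: a set of learnable queries attends over query-conditioned video features and regresses normalized (center, width) temporal spans with confidence scores. This is an illustrative assumption of how such a head can be wired up in PyTorch, not the method of any specific paper; the class name MomentQueryDecoder and its hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class MomentQueryDecoder(nn.Module):
    """Illustrative DETR-style grounding head: learnable moment queries
    attend over fused video-text features and predict temporal spans."""

    def __init__(self, d_model=256, num_queries=10, num_layers=2, nhead=8):
        super().__init__()
        # Learnable moment queries, one per candidate temporal segment.
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        # Each query regresses a normalized (center, width) span plus a score.
        self.span_head = nn.Linear(d_model, 2)
        self.score_head = nn.Linear(d_model, 1)

    def forward(self, fused_feats):
        # fused_feats: (batch, num_clips, d_model) clip features already
        # conditioned on the language query (e.g., via cross-attention).
        b = fused_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(q, fused_feats)          # (b, num_queries, d_model)
        spans = torch.sigmoid(self.span_head(decoded))  # normalized (center, width)
        scores = self.score_head(decoded).squeeze(-1)   # matching confidence
        return spans, scores

# Usage: rank predicted spans by score and keep the top segment per video.
feats = torch.randn(2, 64, 256)        # 2 videos, 64 clips, 256-d fused features
spans, scores = MomentQueryDecoder()(feats)
best = spans[torch.arange(2), scores.argmax(dim=1)]    # best (center, width) per video
```

In practice such a head is trained with a set-matching loss between predicted and ground-truth spans, which is what lets the queries specialize to different moment positions and lengths.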