Video Language Model
Video Language Models (VLMs) aim to bridge the gap between visual and textual information in videos, enabling machines to understand and reason about video content in ways that approach human comprehension. Current research focuses on improving VLM performance through larger datasets, more efficient architectures (such as transformer-based models and models incorporating memory mechanisms), and training strategies such as contrastive learning and instruction tuning. These advances matter for applications ranging from automated video captioning and question answering to robotic control and unusual-activity detection, and they are driving significant progress in both computer vision and natural language processing.
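To make the contrastive-learning strategy mentioned above concrete, the sketch below shows a CLIP-style symmetric InfoNCE objective that aligns pooled video embeddings with text embeddings. It is an illustrative example rather than the method of any particular paper: the function name `contrastive_video_text_loss`, the tensor shapes, the temperature value, and the assumption that video features are mean-pooled frame features are all placeholder choices.

```python
# Minimal sketch of video-text contrastive learning (CLIP-style InfoNCE).
# Assumes paired batches: video_emb[i] and text_emb[i] describe the same clip.
import torch
import torch.nn.functional as F

def contrastive_video_text_loss(video_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings.

    video_emb: (B, D) pooled video features, e.g. mean of per-frame features.
    text_emb:  (B, D) sentence embeddings for the matching captions.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the true pairs.
    logits = video_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the video-to-text and text-to-video cross-entropy terms.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_v2t + loss_t2v)

if __name__ == "__main__":
    # Random features stand in for the outputs of a video and a text encoder.
    B, D = 8, 512
    loss = contrastive_video_text_loss(torch.randn(B, D), torch.randn(B, D))
    print(loss.item())
```

In practice the two embeddings would come from a video encoder (e.g. a transformer over sampled frames) and a text encoder, trained jointly so that matching clip-caption pairs score higher than all mismatched pairs in the batch.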