Video Understanding Task
Video understanding research aims to enable computers to interpret the content and context of videos, encompassing tasks like action recognition, video captioning, and question answering. Current efforts focus on developing robust and efficient models, often leveraging large language models (LLMs) and multimodal architectures, including transformers and graph neural networks, to process both visual and auditory information and handle long-term temporal dependencies. These advancements are crucial for applications ranging from automated video indexing and summarization to more complex tasks such as autonomous driving and medical diagnosis, driving significant progress in both computer vision and artificial intelligence.
Papers
December 26, 2024
December 6, 2024
December 5, 2024
December 4, 2024
December 1, 2024
November 29, 2024
November 19, 2024
November 17, 2024
November 8, 2024
October 15, 2024
October 4, 2024
September 27, 2024
September 16, 2024
September 5, 2024
August 29, 2024
July 6, 2024
June 17, 2024
June 10, 2024
May 28, 2024