Video Context
Video context analysis studies the temporal and multimodal information in videos to improve tasks such as video retrieval, action recognition, and question answering. Current research combines audio, visual, and text data with model architectures such as transformers and recurrent neural networks to capture spatio-temporal relationships and contextual dependencies within and across videos. These advances support applications such as improved video search, more accurate audio description generation for accessibility, and enhanced human-computer interaction in virtual and augmented reality environments.
Papers
Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method
Washington Ramos, Michel Silva, Edson Araujo, Victor Moura, Keller Oliveira, Leandro Soriano Marcolino, Erickson R. Nascimento
Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation
Yueming Jin, Yang Yu, Cheng Chen, Zixu Zhao, Pheng-Ann Heng, Danail Stoyanov