Video Temporal Grounding
Video temporal grounding (VTG) aims to pinpoint the moments in an untrimmed video that correspond to a given textual description, bridging visual and linguistic understanding. Current research emphasizes robustness and generalization, with work on leveraging large pre-trained vision-language models (VLMs) and large language models (LLMs), developing efficient transfer learning methods, and mitigating biases in training data. These advances matter for applications such as video summarization, highlight detection, and content-based video retrieval, ultimately improving human-computer interaction with video data.
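To make the task definition concrete, here is a minimal, hypothetical sketch of a training-free flavor of VTG: score each sampled frame against the text query using embeddings from a pre-trained VLM, then return the contiguous frame window with the highest mean similarity. The function name, window-search heuristic, and random placeholder embeddings are illustrative assumptions, not the method of either paper listed below.

```python
# Minimal sketch of zero-shot video temporal grounding via frame-text similarity.
# Embeddings are random placeholders; in practice they would come from a
# pre-trained VLM (e.g., CLIP) encoding the query text and each sampled frame.
import numpy as np

def ground_query(frame_embs: np.ndarray, text_emb: np.ndarray,
                 min_len: int = 2, max_len: int = 32) -> tuple[int, int, float]:
    """Return (start, end, score) of the frame window best matching the query.

    frame_embs: (T, D) L2-normalized embeddings of T sampled frames.
    text_emb:   (D,)   L2-normalized embedding of the textual query.
    The predicted moment is the window maximizing mean frame-text cosine similarity.
    """
    sims = frame_embs @ text_emb                       # (T,) cosine similarities
    cumsum = np.concatenate([[0.0], np.cumsum(sims)])  # prefix sums for fast window means
    T = len(sims)
    best = (0, min(min_len, T), -np.inf)
    for length in range(min_len, min(max_len, T) + 1):
        for start in range(0, T - length + 1):
            score = (cumsum[start + length] - cumsum[start]) / length
            if score > best[2]:
                best = (start, start + length, score)
    return best

# Toy usage: 64 sampled frames with 512-d placeholder embeddings.
rng = np.random.default_rng(0)
frames = rng.normal(size=(64, 512))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
query = rng.normal(size=512)
query /= np.linalg.norm(query)
start, end, score = ground_query(frames, query)
print(f"predicted moment: frames [{start}, {end}) with mean similarity {score:.3f}")
```

Mapping the predicted frame window back to timestamps (via the sampling rate) yields the grounded moment; real systems refine this with learned boundary regression or LLM-based reasoning over candidate segments.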
Papers
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma, Haojian Huang, Jin Chen, Haodong Chen, Pengliang Ji, Xianghao Zang, Han Fang, Chao Ban, Hao Sun, Mulin Chen, Xuelong Li
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu