Temporal Action Detection
Temporal action detection (TAD) aims to identify actions in untrimmed videos and localize each one in time by predicting its start and end, a core task in video understanding. Current research relies heavily on transformer-based architectures, such as DETR-style detectors, and often focuses on improving temporal modeling through refined feature extraction, changes to the attention mechanism that address issues such as attention collapse, and the incorporation of contextual information (e.g., audio, interactions). These advances support applications including video summarization, autonomous systems, and ecological monitoring by enabling more accurate and efficient analysis of complex video data.
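To make the task definition concrete, the following is a minimal sketch of what a TAD prediction might look like and how its overlap with an annotated action could be measured. The `ActionSegment` type, the `temporal_iou` helper, and the example values are hypothetical and chosen for illustration; they are not taken from any of the papers listed here. Temporal intersection-over-union is a common way to compare predicted and annotated segments, but individual benchmarks define their own matching thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """One detected (or annotated) action in an untrimmed video.

    Hypothetical illustration type: times are in seconds.
    """
    label: str
    start: float
    end: float
    score: float = 1.0  # detector confidence; 1.0 for ground truth


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Intersection-over-union of two segments along the time axis."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative values only: a prediction that starts two seconds late.
prediction = ActionSegment("high_jump", start=12.0, end=18.0, score=0.9)
ground_truth = ActionSegment("high_jump", start=10.0, end=16.0)

overlap = temporal_iou(prediction, ground_truth)
print(f"tIoU = {overlap:.2f}")
```

In this sketch a detection would count as correct only if its label matches the annotation and the overlap exceeds a chosen threshold, which is why both the "identify" and "localize" parts of the task matter.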
Papers
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos
Zsófia Katona, Seyed Sahand Mohammadi Ziabari, Fatemeh Karimi Nejadasl
Harnessing Temporal Causality for Advanced Temporal Action Detection
Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem