Temporal Action Detection
Temporal action detection (TAD) aims to identify and temporally localize actions within untrimmed videos, a crucial task for video understanding. Current research relies heavily on transformer-based architectures such as DETR, focusing on stronger temporal modeling through refined feature extraction, attention-mechanism enhancements that mitigate issues such as attention collapse, and the incorporation of contextual information (e.g., audio, interactions). These advances enable more accurate and efficient analysis of complex video data, driving progress in applications such as video summarization, autonomous systems, and ecological monitoring.
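The DETR-style formulation mentioned above can be summarized in a few lines: a set of learned action queries cross-attends to clip-level video features through a transformer decoder, and each query predicts an action class together with a normalized temporal segment. Below is a minimal sketch under assumed dimensions and layer choices; it is not the implementation of any of the papers listed here, and names such as DETRStyleTADHead are illustrative.

```python
# Minimal sketch of a DETR-style temporal action detection head (illustrative only).
# Learned action queries attend to clip-level video features; each query predicts
# a class (including a "no action" label) and a normalized (center, width) segment.
import torch
import torch.nn as nn


class DETRStyleTADHead(nn.Module):
    def __init__(self, feat_dim=512, num_queries=40, num_classes=20,
                 num_layers=4, num_heads=8):
        super().__init__()
        self.queries = nn.Embedding(num_queries, feat_dim)  # learned action queries
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        self.class_head = nn.Linear(feat_dim, num_classes + 1)  # +1 for "no action"
        self.segment_head = nn.Linear(feat_dim, 2)              # (center, width) in [0, 1]

    def forward(self, clip_features):
        """clip_features: (batch, num_clips, feat_dim) from a video backbone."""
        b = clip_features.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(q, clip_features)            # queries attend to the video
        class_logits = self.class_head(decoded)             # (b, num_queries, num_classes + 1)
        segments = self.segment_head(decoded).sigmoid()     # normalized (center, width)
        return class_logits, segments


if __name__ == "__main__":
    head = DETRStyleTADHead()
    feats = torch.randn(2, 128, 512)   # 2 untrimmed videos, 128 clip features each
    logits, segs = head(feats)
    print(logits.shape, segs.shape)    # (2, 40, 21) and (2, 40, 2)
```

In practice such a head would be trained with a set-based objective (e.g., Hungarian matching between predicted and ground-truth segments), which is omitted from this sketch.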
Papers
ReAct: Temporal Action Detection with Relational Queries
Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, Dacheng Tao
Semi-Supervised Temporal Action Detection with Proposal-Free Masking
Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning
Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang