Temporal Action Proposal
Temporal action proposal generation (TAPG) aims to automatically identify the start and end times of actions within untrimmed videos, a crucial step in video understanding. Recent research focuses on improving proposal accuracy and efficiency through novel architectures such as transformer-based and diffusion models, often incorporating multi-modal information (e.g., visual and textual features) and addressing challenges like zero-shot and low-shot learning. These advances improve downstream tasks such as temporal action detection and localization, with applications in video analysis, surveillance, and human-computer interaction. The field continues to explore more efficient and robust methods for generating complete and accurate action proposals.
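To make the task concrete, a temporal proposal is typically just a scored `(start, end)` segment, and proposal sets are evaluated and pruned with temporal Intersection-over-Union (tIoU). The sketch below is a minimal, illustrative Python implementation of tIoU and greedy non-maximum suppression over proposals; the tuple representation and the 0.5 threshold are common conventions, not the method of any specific paper listed here.

```python
from typing import List, Tuple

# Illustrative representation: (start_sec, end_sec, confidence)
Proposal = Tuple[float, float, float]

def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Temporal Intersection-over-Union between two [start, end] segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms(proposals: List[Proposal], iou_thresh: float = 0.5) -> List[Proposal]:
    """Greedy non-maximum suppression: keep high-confidence proposals,
    drop any later proposal that overlaps a kept one above the threshold."""
    kept: List[Proposal] = []
    for p in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou(p[:2], k[:2]) < iou_thresh for k in kept):
            kept.append(p)
    return kept

candidates = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
print(nms(candidates))  # the 0.8 proposal overlaps the 0.9 one and is suppressed
```

Downstream detectors then classify each surviving segment, which is why proposal completeness and ranking quality matter for the tasks above.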
Papers
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le
Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection
Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal