Temporal Action Proposal

Temporal action proposal generation (TAPG) identifies candidate start and end times of actions in untrimmed videos, a key step in video understanding. Recent research aims to improve proposal accuracy and efficiency with architectures such as transformer-based and diffusion models, often incorporating multi-modal information (e.g., visual and textual features) and addressing zero-shot and low-shot settings. Better proposals improve downstream tasks such as temporal action detection and localization, with applications in video analysis, surveillance, and human-computer interaction. Ongoing work targets more efficient and robust methods for generating complete and accurate action proposals.
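
To make "start and end times" and "accurate proposals" concrete, here is a minimal sketch of how a set of proposals might be scored against annotated segments using temporal intersection-over-union (tIoU), a measure commonly used for this task. The function names, the `(start, end)` tuple layout in seconds, and the 0.5 threshold are illustrative assumptions, not taken from any particular paper or benchmark toolkit.

```python
from typing import List, Tuple

# Assumed representation: a segment is (start_sec, end_sec) with start <= end.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two time segments divided by the length of their union."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_tiou(
    proposals: List[Segment],
    ground_truth: List[Segment],
    threshold: float = 0.5,  # hypothetical default, chosen for illustration
) -> float:
    """Fraction of ground-truth segments matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    proposals = [(1.0, 9.0), (20.0, 30.0)]
    ground_truth = [(0.0, 10.0), (40.0, 50.0)]
    print(temporal_iou(proposals[0], ground_truth[0]))
    print(recall_at_tiou(proposals, ground_truth))
```

In this sketch a proposal counts as covering an action only if its tIoU with the annotated segment reaches the threshold, so a proposal that is too short or too long is penalized even when it overlaps the action.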

Papers