Temporal Localization
Temporal localization identifies the precise time intervals (start and end times) in which events or actions occur within a video, often in response to a natural language query. Current research aims to improve accuracy and efficiency using transformer-based architectures, multimodal large language models (MLLMs), and methods that combine visual and textual information for more robust localization. The field is central to video understanding and enables applications such as automated video summarization, content moderation, and assistive technologies for visually impaired people.
Papers
Density-Guided Label Smoothing for Temporal Localization of Driving Actions
Tunc Alkanat, Erkut Akdag, Egor Bondarev, Peter H. N. De With
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
Erkut Akdag, Zeqi Zhu, Egor Bondarev, Peter H. N. De With