Ego4D Natural Language Query
Ego4D natural language query (NLQ) research focuses on localizing the temporal segment in a long, first-person (egocentric) video that answers a given natural-language question. Current approaches use transformer-based models that fuse visual and textual information, often via multi-modal and multi-scale features, and rely on techniques such as contrastive learning and efficient clip selection to manage the computational cost of long videos. The task matters for video understanding broadly, with potential applications in augmented reality and robotics, where it enables more natural interaction with video data.
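As a concrete illustration of the fusion-and-scoring step described above, the sketch below concatenates precomputed video clip features and query token features, runs joint self-attention over both modalities, and predicts start/end clips for the answer segment. In practice, systems often first narrow a long video to candidate windows (the efficient clip selection mentioned above) before fine-grained scoring; that step is omitted here. Everything in the sketch (the CrossModalGrounder module, the dimensions, the span head) is a hypothetical minimal example, not the architecture of any paper listed below.

```python
# Minimal sketch of transformer-based NLQ grounding, assuming precomputed
# clip features from a video backbone and token features from a text encoder.
# All names, dimensions, and the start/end span head are illustrative
# assumptions, not any particular challenge entry's method.
import torch
import torch.nn as nn


class CrossModalGrounder(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        # Learned embeddings marking which modality each position belongs to.
        self.type_embed = nn.Embedding(2, dim)  # 0 = video clip, 1 = text token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        # Per-clip logits for being the start / end of the answer segment.
        self.span_head = nn.Linear(dim, 2)

    def forward(self, video_feats: torch.Tensor, text_feats: torch.Tensor):
        # video_feats: (B, T, dim) clip features; text_feats: (B, L, dim).
        T = video_feats.size(1)
        v = video_feats + self.type_embed.weight[0]
        t = text_feats + self.type_embed.weight[1]
        fused = self.encoder(torch.cat([v, t], dim=1))  # joint self-attention
        logits = self.span_head(fused[:, :T])           # keep video positions
        return logits[..., 0], logits[..., 1]           # start, end: (B, T)


def best_span(start_logits, end_logits, max_len: int = 50):
    """Pick the highest-scoring (start <= end) clip pair within a length cap."""
    T = start_logits.size(1)
    # scores[b, s, e] = start_logits[b, s] + end_logits[b, e]
    scores = start_logits.unsqueeze(-1) + end_logits.unsqueeze(-2)  # (B, T, T)
    valid = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=0)
    valid &= ~torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=max_len)
    flat = scores.masked_fill(~valid, float("-inf")).flatten(1).argmax(dim=1)
    return flat // T, flat % T  # start and end clip indices per batch item


if __name__ == "__main__":
    model = CrossModalGrounder()
    video = torch.randn(1, 128, 256)  # 128 clips from a long egocentric video
    query = torch.randn(1, 12, 256)   # 12 tokens of the natural-language query
    start, end = best_span(*model(video, query))
    print(f"predicted answer segment: clips {start.item()}..{end.item()}")
```

Framing grounding as start/end classification over clips lets a single forward pass score every candidate segment; the papers below differ mainly in how features are fused and how candidate windows are proposed for long videos.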
Papers
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
Fangzhou Mu, Sicheng Mo, Gillian Wang, Yin Li
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022
Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan
A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge
Sicheng Mo, Fangzhou Mu, Yin Li