Video Moment Retrieval
Video moment retrieval (VMR) focuses on locating the temporal segments within untrimmed videos that correspond to natural language queries. Current research emphasizes improving cross-modal alignment between video and text features, often employing transformer-based detection architectures such as DETR, and explores techniques like query debiasing and context enhancement to address challenges in semantic understanding and modality imbalance. The field is significant for advancing video understanding and has practical applications in video search, summarization, and e-commerce, where efficient and accurate retrieval of relevant video moments is crucial. Recent work also leverages large language models to enhance contextual understanding and to generate more robust training data.
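At its core, VMR scores candidate temporal spans of a video against a query embedding and returns the best-matching span. The sketch below is a minimal, hypothetical illustration of that retrieval step using cosine similarity over mean-pooled clip features; it is not the method of any paper listed here, and `retrieve_moment` and its parameters are illustrative assumptions.

```python
import numpy as np

def retrieve_moment(clip_feats, query_feat, min_len=2, max_len=8):
    """Score every candidate span [start, end) by cosine similarity
    between its mean-pooled clip features and the query embedding,
    and return the best-scoring span. Names are illustrative."""
    q = query_feat / np.linalg.norm(query_feat)  # normalize query once
    best_score, best_span = -np.inf, None
    n = len(clip_feats)
    for start in range(n):
        for end in range(start + min_len, min(start + max_len, n) + 1):
            seg = clip_feats[start:end].mean(axis=0)   # pool the span
            seg = seg / np.linalg.norm(seg)
            score = float(seg @ q)                      # cosine similarity
            if score > best_score:
                best_score, best_span = score, (start, end)
    return best_span, best_score

# Toy example: 10 one-second clips with 4-d features; clips 3..5
# carry the "relevant" feature direction, the rest a distractor one.
feats = np.tile(np.array([0.0, 1.0, 0.0, 0.0]), (10, 1))
feats[3:6] = np.array([1.0, 0.0, 0.0, 0.0])
query = np.array([1.0, 0.0, 0.0, 0.0])
span, score = retrieve_moment(feats, query)
```

Real systems replace the exhaustive window scan with learned span prediction (e.g., DETR-style set prediction) and use jointly trained video and text encoders instead of fixed features, but the scoring principle is the same.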
Papers
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu, Minquan Wang, Ye Ma, Bo Wang, Aozhu Chen, Quan Chen, Peng Jiang, Xirong Li
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval
Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su