Multi-Paragraph Video Grounding

Multi-paragraph video grounding (MPVG) is the task of locating, in a long video, the temporal segment that corresponds to each of several semantically related sentences or paragraphs, often taken from a synopsis or script. Current research emphasizes models that can handle long videos and complex, interconnected textual descriptions. Techniques include Siamese networks for joint alignment and regression, and multi-resolution temporal modules that capture temporal consistency across different video granularities. The task matters for multimodal understanding, particularly in applications that analyze long-form video content, such as video summarization, content retrieval, and video editing. Large-scale datasets designed specifically for MPVG are another key area of progress.
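As a rough illustration of the task's input and output, the sketch below pairs each query paragraph with a predicted (start, end) span and scores it against a reference span using temporal intersection-over-union, a measure commonly used in video grounding. The names `Segment`, `temporal_iou`, and `mean_iou`, along with the example timestamps, are hypothetical and are not taken from any of the papers listed here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A temporal span in a video, in seconds (hypothetical helper type)."""
    start: float
    end: float


def temporal_iou(pred: Segment, gold: Segment) -> float:
    """Intersection-over-union of two time spans; 0.0 when they do not overlap."""
    inter = max(0.0, min(pred.end, gold.end) - max(pred.start, gold.start))
    union = (pred.end - pred.start) + (gold.end - gold.start) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: list[Segment], golds: list[Segment]) -> float:
    """Average temporal IoU over paragraphs; one prediction per paragraph."""
    if len(preds) != len(golds):
        raise ValueError("expected one predicted segment per reference segment")
    if not preds:
        return 0.0
    return sum(temporal_iou(p, g) for p, g in zip(preds, golds)) / len(preds)


# Assumed example: two synopsis paragraphs grounded in one long video.
predicted = [Segment(0.0, 10.0), Segment(20.0, 30.0)]
reference = [Segment(5.0, 15.0), Segment(20.0, 30.0)]
score = mean_iou(predicted, reference)
```

Here the first prediction overlaps its reference by 5 seconds out of a 15-second union and the second matches exactly, so the average reflects one partial and one perfect localization.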

Papers

November 26, 2024

Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding
Mengzhao Wang, Huafeng Li, Yafei Zhang, Jinxing Li, Minghong Xie, Dapeng Tao
Fine Grained Video Text Retrieval Text to Video Retrieval 2 Dimensional Label Multi Paragraph Video Grounding

August 3, 2024

SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu
Structured Summary Large Scale Dataset TV Show Video Grounding Multi Paragraph Video Grounding

July 18, 2024

Multi-sentence Video Grounding for Long Video Generation
Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu
Video Generation Video Generation Task Multi Paragraph Video Grounding

March 18, 2024

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng, Jian-Fang Hu
Novel Regression Video Language Understanding 2 Dimensional Label Joint Alignment Multi Paragraph Video Grounding Siamese Sleep Transformer

December 26, 2022

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Wei Ji, Long Chen, Yinwei Wei, Yiming Wu, Tat-Seng Chua
Multi Resolution High Temporal Resolution ATAC Net Multi Paragraph Video Grounding
