Reference Video

Reference video research studies the relationship between a target video and one or more reference videos, with the aim of improving tasks such as video question answering, instruction following, and video similarity detection. Current work relies on multimodal large language models and video-conditioned language models, often combined with spatiotemporal feature extraction and novel attention mechanisms that compare and reason across multiple video streams. Applications include personalized AR/VR assistance, automated video content moderation, and making scientific research more accessible and impactful through video communication.

Papers