Video Scene Graph Generation

Video scene graph generation (VidSGG) aims to automatically build structured representations of video content: graphs whose nodes are objects and whose edges are the relationships between them, tracked as they change over time. Current research focuses on making VidSGG models more accurate and robust. Two challenges stand out: long-tailed data distributions, where a few frequent relationship classes dominate the training data, and the need for precise object localization (e.g., via panoptic segmentation). To address them, researchers develop architectures such as transformers and graph neural networks, which often model spatial and temporal context through attention mechanisms and knowledge-embedding techniques. Advances in VidSGG strengthen video understanding in applications such as autonomous driving, video retrieval, and human-computer interaction.
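To make "structured representation" concrete, the following is a minimal sketch of one possible encoding. It assumes that object tracks are nodes and that each relationship is a (subject, predicate, object) edge valid over an inclusive frame interval. The names `VideoSceneGraph`, `Relation`, `add_relation`, and `triplets_at` are hypothetical and do not come from any particular VidSGG dataset or library. Real datasets may annotate relations per frame or per segment instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Relation:
    """Hypothetical edge: subject --predicate--> object over frames [start, end]."""
    subject: str
    predicate: str
    obj: str
    start: int
    end: int  # inclusive


@dataclass
class VideoSceneGraph:
    """Hypothetical video scene graph: object tracks as nodes, timed relations as edges."""
    objects: Dict[str, str] = field(default_factory=dict)  # track id -> category label
    relations: List[Relation] = field(default_factory=list)

    def add_object(self, track_id: str, category: str) -> None:
        self.objects[track_id] = category

    def add_relation(self, subject: str, predicate: str, obj: str,
                     start: int, end: int) -> None:
        # Both endpoints must already exist as object nodes.
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be registered objects")
        if start > end:
            raise ValueError("start frame must not exceed end frame")
        self.relations.append(Relation(subject, predicate, obj, start, end))

    def triplets_at(self, frame: int) -> List[Tuple[str, str, str]]:
        """Return (subject category, predicate, object category) triplets active at a frame."""
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
            if r.start <= frame <= r.end
        ]


if __name__ == "__main__":
    g = VideoSceneGraph()
    g.add_object("p1", "person")
    g.add_object("c1", "cup")
    g.add_relation("p1", "holding", "c1", 0, 30)
    g.add_relation("p1", "looking_at", "c1", 10, 20)
    print(g.triplets_at(5))   # [('person', 'holding', 'cup')]
    print(g.triplets_at(15))  # [('person', 'holding', 'cup'), ('person', 'looking_at', 'cup')]
```

Querying the graph at different frames shows how relationships appear and disappear over time. A VidSGG model's task is to predict the nodes and these timed edges directly from raw video.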

Papers