Video Scene Graph Generation
Video scene graph generation (VidSGG) aims to automatically build structured representations of video content: graphs whose nodes are objects and whose edges are the relationships between them as they change over time. Current research focuses on improving the accuracy and robustness of VidSGG models, in particular by addressing long-tailed data distributions and the need for precise object localization (e.g., using panoptic segmentation). This work relies on architectures such as transformers and graph neural networks, which often incorporate spatial and temporal context through attention mechanisms and knowledge-embedding techniques. Advances in VidSGG support video understanding in applications such as autonomous driving, video retrieval, and human-computer interaction.
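To make the target representation concrete, here is a minimal sketch of a video scene graph stored as a list of time-stamped (subject, predicate, object) relations. It covers only the output data structure, not any model that produces it. The class names, field names, object identifiers, and frame ranges are hypothetical, chosen for illustration and not taken from any of the papers listed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """One edge of the graph: a relationship that holds over a frame span."""
    subject: str       # hypothetical object track id, e.g. "person_1"
    predicate: str     # relationship label, e.g. "holding"
    obj: str           # hypothetical object track id, e.g. "cup_1"
    start_frame: int   # first frame where the relation holds (inclusive)
    end_frame: int     # last frame where the relation holds (inclusive)


@dataclass
class VideoSceneGraph:
    """Illustrative container for the relations found in one video."""
    relations: List[Relation] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str,
            start_frame: int, end_frame: int) -> None:
        if end_frame < start_frame:
            raise ValueError("end_frame must not precede start_frame")
        self.relations.append(
            Relation(subject, predicate, obj, start_frame, end_frame)
        )

    def at_frame(self, frame: int) -> List[Tuple[str, str, str]]:
        """Return the triplets that are active at the given frame."""
        return [
            (r.subject, r.predicate, r.obj)
            for r in self.relations
            if r.start_frame <= frame <= r.end_frame
        ]


# Made-up example: a person sits on a chair, holds a cup, then drinks from it.
graph = VideoSceneGraph()
graph.add("person_1", "holding", "cup_1", 0, 40)
graph.add("person_1", "sitting_on", "chair_1", 0, 120)
graph.add("person_1", "drinking_from", "cup_1", 41, 60)

print(graph.at_frame(50))
```

Querying a single frame returns an ordinary image-level scene graph, while the frame spans carry the temporal dimension that distinguishes the video setting.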
Papers
Panoptic Video Scene Graph Generation
Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu
HAtt-Flow: Hierarchical Attention-Flow Mechanism for Group Activity Scene Graph Generation in Videos
Naga VS Raviteja Chappa, Pha Nguyen, Thi Hoang Ngan Le, Khoa Luu