Video Instance Segmentation
Video instance segmentation (VIS) aims to detect, segment, and track each object instance across the frames of a video, a key step for applications such as autonomous driving and video analysis. Current research focuses on improving accuracy and efficiency in challenging scenarios involving occlusion, fast motion, and crowded scenes, often using transformer-based architectures with query-based detection and tracking. Other work aims to reduce annotation costs through weakly supervised or unsupervised learning, and to adapt models to open-vocabulary settings so they can handle unseen object categories. Together, these advances make video understanding more robust and versatile.
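To make the "track" part of the task concrete, here is a minimal sketch of linking instance masks between two consecutive frames by mask overlap. It is a simple tracking-by-detection baseline for illustration only, not the transformer or query-based methods mentioned above and not the method of any listed paper. The set-of-pixels mask representation, the 0.5 IoU threshold, and all function names (`mask_iou`, `associate`, `box`) are assumptions made for this example.

```python
# Illustrative sketch only: greedy mask-IoU association between two frames.
# Masks are sets of (row, col) pixel coordinates; all names are hypothetical.

def mask_iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate(prev_tracks, curr_masks, iou_threshold=0.5, next_id=0):
    """Assign a track ID to each mask in the current frame.

    prev_tracks: dict mapping track ID -> mask from the previous frame.
    curr_masks: list of masks detected in the current frame.
    Returns (dict track ID -> mask for the current frame, next unused ID).
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        ((mask_iou(mask, det), tid, j)
         for tid, mask in prev_tracks.items()
         for j, det in enumerate(curr_masks)),
        key=lambda p: p[0],
        reverse=True,
    )
    assigned = {}
    used_tracks, used_dets = set(), set()
    for iou, tid, j in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if tid in used_tracks or j in used_dets:
            continue  # each track and each detection is matched at most once
        assigned[tid] = curr_masks[j]
        used_tracks.add(tid)
        used_dets.add(j)
    # Unmatched detections start new tracks.
    for j, det in enumerate(curr_masks):
        if j not in used_dets:
            assigned[next_id] = det
            next_id += 1
    return assigned, next_id


def box(r0, c0, r1, c1):
    """Rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Two objects in frame 0; in frame 1 both shift by one pixel and a third appears.
frame0 = {0: box(0, 0, 4, 4), 1: box(10, 10, 14, 14)}
frame1_masks = [box(10, 11, 14, 15), box(0, 1, 4, 5), box(20, 20, 22, 22)]
tracks, next_id = associate(frame0, frame1_masks, next_id=2)
```

In this toy case the two shifted masks keep IDs 0 and 1, and the new object receives ID 2. Greedy matching is chosen here for brevity; an optimal assignment (for example, Hungarian matching) could be substituted, and a baseline of this kind tends to struggle in exactly the occlusion and fast-motion cases described above.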
Papers
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge
Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai
Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation
Anirudh S Chakravarthy, Won-Dong Jang, Zudi Lin, Donglai Wei, Song Bai, Hanspeter Pfister