Video Instance Segmentation
Video instance segmentation (VIS) aims to simultaneously detect, segment, and track individual object instances throughout a video, a crucial step for applications such as autonomous driving and video analysis. Current research emphasizes improving accuracy and efficiency, particularly in challenging scenarios involving occlusion, fast motion, and large numbers of objects, often using transformer-based architectures with query-based detection and tracking. Significant effort also goes into reducing annotation costs through weakly supervised or even unsupervised learning, and into adapting models to open-vocabulary settings so they can handle unseen object categories. Together, these advances make video understanding more robust and versatile.
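To make the "track" part of the task concrete, here is a minimal sketch of linking per-frame instance masks into tracks. It is not the method of any paper listed here: it assumes masks have already been predicted for each frame, represents each mask as a set of (row, col) pixels, and uses simple greedy matching on mask IoU between consecutive frames. The names `mask_iou`, `link_instances`, `box`, and the `iou_threshold` parameter are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def link_instances(frames: List[List[Mask]], iou_threshold: float = 0.5) -> List[List[int]]:
    """Assign a track ID to every mask in every frame (illustrative sketch).

    Greedy matching: each mask in frame t inherits the ID of the best-overlapping,
    still-unclaimed mask from frame t-1; unmatched masks start a new track.
    """
    track_ids: List[List[int]] = []
    next_id = 0
    prev: List[Tuple[int, Mask]] = []  # (track ID, mask) pairs from the previous frame
    for masks in frames:
        # Score every (current, previous) pair and visit the best overlaps first.
        pairs = sorted(
            ((mask_iou(m, pm), i, j)
             for i, m in enumerate(masks)
             for j, (_, pm) in enumerate(prev)),
            reverse=True,
        )
        ids = [-1] * len(masks)
        used_prev = set()
        for iou, i, j in pairs:
            if iou < iou_threshold:
                break  # all remaining pairs overlap even less
            if ids[i] == -1 and j not in used_prev:
                ids[i] = prev[j][0]
                used_prev.add(j)
        # Any mask without a match becomes a new instance.
        for i in range(len(masks)):
            if ids[i] == -1:
                ids[i] = next_id
                next_id += 1
        track_ids.append(ids)
        prev = list(zip(ids, masks))
    return track_ids


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a filled rectangle used as a toy mask."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


frames = [
    [box(0, 0, 4, 4), box(10, 10, 14, 14)],
    [box(10, 11, 14, 15), box(0, 1, 4, 5)],  # same two objects, shifted, listed in reverse order
    [box(0, 2, 4, 6)],                       # the second object is no longer visible
]
print(link_instances(frames))  # [[0, 1], [1, 0], [0]]
```

Because this sketch only compares consecutive frames, an object that is occluded for even one frame would come back with a new ID. Greedy matching also stands in for an optimal assignment step here, which is one reason occlusion and fast motion remain hard cases for simple association schemes.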
Papers
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell
1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation
Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan