Open Vocabulary Instance Segmentation

Open-vocabulary instance segmentation aims to identify and delineate individual objects in images and videos, including objects from categories not seen during training. This removes the main limitation of traditional closed-vocabulary methods, which can only predict labels from a fixed set of categories. Current research focuses on integrating 2D and 3D data streams and on leveraging vision-language models and diffusion techniques to improve accuracy and handle diverse object appearances, including challenging cases such as camouflaged objects. These advances matter for scene understanding, robotic perception, and augmented reality, because they reduce the reliance on extensive manual annotation whenever new object categories are introduced.

Papers