Mask Annotation

Mask annotation, the process of creating pixel-level labels for images, is crucial for training many computer vision models but is time-consuming and expensive. Current research focuses on reducing annotation burden through weakly-supervised or even unsupervised methods, often leveraging powerful pre-trained models like Segment Anything Model (SAM) and incorporating techniques like prompt engineering, pseudo-labeling, and consistency modeling to generate high-quality masks from limited data (e.g., bounding boxes, scribbles, or text descriptions). This work is significant because it enables the development of more accurate and robust computer vision systems across various applications, including medical image analysis, object detection, and video segmentation, while significantly reducing the need for manual labeling.

Papers