Paper ID: 2409.14874

Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images

Ahjol Senbi, Tianyu Huang, Fei Lyu, Qing Li, Yuhui Tao, Wei Shao, Qiang Chen, Chengyan Wang, Shuo Wang, Tao Zhou, Yizhe Zhang

We are interested in building a ground-truth-free evaluation model to assess the quality of segmentations produced by SAM (Segment Anything Model) and its variants in medical images. This model estimates segmentation quality scores by comparing input images with their corresponding segmentation maps. Building on prior research, we frame this as a regression problem within a supervised learning framework, using Dice scores (and optionally other metrics) to compute the training loss. The model is trained on a large collection of public medical image datasets paired with segmentation predictions from SAM and its variants. We name this model EvanySeg (Evaluation of Any Segmentation in Medical Images). Our exploration of convolution-based models (e.g., ResNet) and transformer-based models (e.g., ViT) revealed that ViT offers superior performance for EvanySeg. The model can be employed for various tasks, including: (1) identifying poorly segmented samples by detecting low-percentile segmentation quality scores; (2) benchmarking segmentation models without ground truth by averaging scores across test samples; (3) alerting human experts during human-AI collaboration by applying a threshold within the score space; and (4) selecting the best segmentation prediction for each test sample at test time, when multiple segmentation models are available, by choosing the prediction with the highest score. Models and code will be made available at this https URL.
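To make the setup concrete, the following is a minimal sketch (not the authors' released implementation) of how such a ViT-based quality regressor could be trained: the network receives an image concatenated with a candidate segmentation mask and regresses the Dice score of that mask, so the ground-truth mask is consulted only to compute the training target, never at evaluation time. The timm backbone choice, the `QualityRegressor` class, and all hyperparameters here are illustrative assumptions.

```python
# Sketch of an EvanySeg-style quality regressor (illustrative, not the
# paper's official code). Assumes PyTorch and timm are installed.
import torch
import torch.nn as nn
import timm


def dice_score(pred_mask, gt_mask, eps=1e-6):
    """Dice between a binary prediction and ground truth; used only as
    the regression target during training."""
    inter = (pred_mask * gt_mask).sum(dim=(1, 2, 3))
    total = pred_mask.sum(dim=(1, 2, 3)) + gt_mask.sum(dim=(1, 2, 3))
    return (2 * inter + eps) / (total + eps)


class QualityRegressor(nn.Module):
    """ViT over a 4-channel input: 3-channel image + 1-channel mask."""

    def __init__(self):
        super().__init__()
        # num_classes=1 turns the classification head into a single-output
        # regression head; in_chans=4 accommodates the appended mask.
        # (pretrained=True would normally be used; False keeps this offline.)
        self.backbone = timm.create_model(
            "vit_base_patch16_224", pretrained=False, in_chans=4, num_classes=1
        )

    def forward(self, image, mask):
        x = torch.cat([image, mask], dim=1)                 # (B, 4, 224, 224)
        return torch.sigmoid(self.backbone(x)).squeeze(1)   # score in [0, 1]


model = QualityRegressor()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

# Toy batch: image crops, a SAM prediction, and a training-only GT mask.
image = torch.rand(2, 3, 224, 224)
sam_mask = (torch.rand(2, 1, 224, 224) > 0.5).float()
gt_mask = (torch.rand(2, 1, 224, 224) > 0.5).float()

# One training step: regress the true Dice of the SAM prediction.
target = dice_score(sam_mask, gt_mask)   # supervision signal
pred = model(image, sam_mask)            # ground-truth-free estimate
loss = criterion(pred, target)
opt.zero_grad()
loss.backward()
opt.step()

# Use case (4) from the abstract: among predictions from several models,
# keep the one the regressor scores highest -- no GT needed at test time.
with torch.no_grad():
    other_mask = (torch.rand(2, 1, 224, 224) > 0.5).float()  # second model
    candidates = [sam_mask, other_mask]
    scores = torch.stack([model(image, m) for m in candidates])  # (n, B)
    best = scores.argmax(dim=0)  # per-sample index of the best prediction
```

At deployment the same forward pass yields a ground-truth-free score per sample, which also supports the other listed use cases, e.g., averaging scores over a test set to benchmark models, or thresholding scores to flag cases for expert review.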

Submitted: Sep 23, 2024