Paper ID: 2406.05285

VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography

Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

Medical image segmentation is a core component of precision medicine, and 3D computed tomography (CT) is one of the most important imaging techniques. A highly accurate and clinically applicable segmentation foundation model will greatly facilitate clinicians and researchers using CT images. Although existing foundation models have attracted great interest, none are adequate for 3D CT, either because they lack accurate automatic segmentation for large cohort analysis or the ability to segment novel classes. An ideal segmentation solution should possess two features: accurate out-of-the-box performance covering major organ classes, and effective adaptation or zero-shot ability to novel structures. To achieve this goal, we introduce Versatile Imaging SegmenTation and Annotation model (VISTA3D). VISTA3D is trained systematically on 11454 volumes and provides accurate out-of-the-box segmentation for 127 common types of human anatomical structures and various lesions. Additionally, VISTA3D supports 3D interactive segmentation, allowing convenient editing of automatic results and achieving state-of-the-art annotation results on unseen classes. The novel model design and training recipe represent a promising step toward developing a versatile medical image foundation model and will serve as a valuable foundation for CT image analysis. Code and model weights are available at https://github.com/Project-MONAI/VISTA

Submitted: Jun 7, 2024