Paper ID: 2407.21516

Expanding the Medical Decathlon dataset: segmentation of colon and colorectal cancer from computed tomography images

I. M. Chernenkiy, Y. A. Drach, S. R. Mustakimova, V. V. Kazantseva, N. A. Ushakov, S. K. Efetov, M. V. Feldsherov

Colorectal cancer is the third-most common cancer in the Western Hemisphere. The segmentation of colorectal and colorectal cancer by computed tomography is an urgent problem in medicine. Indeed, a system capable of solving this problem will enable the detection of colorectal cancer at early stages of the disease, facilitate the search for pathology by the radiologist, and significantly accelerate the process of diagnosing the disease. However, scientific publications on medical image processing mostly use closed, non-public data. This paper presents an extension of the Medical Decathlon dataset with colorectal markups in order to improve the quality of segmentation algorithms. An experienced radiologist validated the data, categorized it into subsets by quality, and published it in the public domain. Based on the obtained results, we trained neural network models of the UNet architecture with 5-part cross-validation and achieved a Dice metric quality of $0.6988 \pm 0.3$. The published markups will improve the quality of colorectal cancer detection and simplify the radiologist's job for study description.

Submitted: Jul 31, 2024