Paper ID: 2406.02618
Immunocto: a massive immune cell database auto-generated for histopathology
Mikaël Simard, Zhuoyan Shen, Maria A. Hawkins, Charles-Antoine Collins-Fekete
With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology present relatively poor performance. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, multi-million automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64 pixels H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.
Submitted: Jun 3, 2024