Paper ID: 2410.03333

Comparative Analysis and Ensemble Enhancement of Leading CNN Architectures for Breast Cancer Classification

Gary Murphy, Raghubir Singh

This study introduces a novel and accurate approach to breast cancer classification using histopathology images. It systematically compares leading Convolutional Neural Network (CNN) models across varying image datasets, identifies their optimal hyperparameters, and ranks them based on classification efficacy. To maximize classification accuracy for each model, we explore the effects of data augmentation, alternative fully-connected layers, and model training hyperparameter settings, as well as the advantages of retraining models versus using pre-trained weights. Our methodology introduces several original concepts, such as serializing generated datasets to ensure consistent data conditions across training runs and significantly reducing training duration. Combined with automated curation of results, this enabled the exploration of over 2,000 training permutations; such a comprehensive comparison is as yet unprecedented. Our findings establish the settings required to achieve exceptional classification accuracy for standalone CNN models and rank them by model efficacy. Based on these results, we propose ensemble architectures that stack three high-performing standalone CNN models together with diverse classifiers, resulting in improved classification accuracy. Systematically running so many model permutations gives rise to very high quality results, including 99.75% for BreakHis x40 and BreakHis x200 and 95.18% for the Bach datasets when split into train, validation and test datasets. On the Bach Online blind challenge, this approach yielded 89%. Whilst this study is based on breast cancer histopathology image datasets, the methodology is equally applicable to other medical image datasets.
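The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three "base models" are synthetic stand-ins for CNN softmax outputs, the meta-classifier is a plain logistic regression chosen for brevity, and every name (`stack_features`, `fit_meta_classifier`, `fake_model_probs`) is hypothetical.

```python
import numpy as np

def stack_features(prob_outputs):
    """Concatenate per-model probability arrays, each of shape (n_samples, n_classes)."""
    return np.concatenate(prob_outputs, axis=1)

def add_bias(X):
    """Append a constant column so the meta-classifier can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_meta_classifier(X, y, lr=0.5, epochs=500):
    """Fit a binary logistic-regression meta-classifier by batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probability of class 1
        w -= lr * Xb.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w

def predict(w, X):
    """Return hard 0/1 predictions from the fitted meta-classifier."""
    return (1.0 / (1.0 + np.exp(-add_bias(X) @ w)) >= 0.5).astype(int)

# Assumption: synthetic labels and noisy probabilities stand in for real CNN outputs.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)

def fake_model_probs(noise):
    """Simulate one base model: a noisy class-1 probability plus its complement."""
    p1 = np.clip(y + rng.normal(0.0, noise, size=y.shape), 0.0, 1.0)
    return np.column_stack([1.0 - p1, p1])

probs = [fake_model_probs(n) for n in (0.4, 0.5, 0.6)]  # three base models
X = stack_features(probs)                                # shape (200, 6)

w = fit_meta_classifier(X[:150], y[:150])                # train on the first 150 samples
accuracy = (predict(w, X[150:]) == y[150:]).mean()       # evaluate on the held-out 50
print(f"held-out accuracy of the stacked ensemble: {accuracy:.2f}")
```

In a real pipeline, the stacked inputs would come from trained CNNs evaluated on held-out data, and the meta-classifier could be any of several classifier types compared against one another.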

Submitted: Oct 4, 2024