Paper ID: 2407.18554

Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

Carolin Flosdorf, Justin Engelker, Igor Keller, Nicolas Mohr

Skin cancer detection still represents a major challenge in healthcare. Common detection methods can be lengthy and require human assistance which falls short in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through both automation and an accuracy that is comparable to the human level. However, despite the progress in previous decades, the precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT) that has been developed in recent years based on the idea of a self-attention mechanism, specifically two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions after comparing them to base models such as decision tree classifier and k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we attach greater importance to the performance of melanoma, which is the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.

Submitted: Jul 26, 2024