Paper ID: 2501.00464

Addressing Challenges in Data Quality and Model Generalization for Malaria Detection

Kiswendsida Kisito Kabore, Desire Guel

Malaria remains a significant global health burden, particularly in resource-limited regions where timely and accurate diagnosis is critical to effective treatment and control. Deep Learning (DL) has emerged as a transformative tool for automating malaria detection and it offers high accuracy and scalability. However, the effectiveness of these models is constrained by challenges in data quality and model generalization including imbalanced datasets, limited diversity and annotation variability. These issues reduce diagnostic reliability and hinder real-world applicability. This article provides a comprehensive analysis of these challenges and their implications for malaria detection performance. Key findings highlight the impact of data imbalances which can lead to a 20\% drop in F1-score and regional biases which significantly hinder model generalization. Proposed solutions, such as GAN-based augmentation, improved accuracy by 15-20\% by generating synthetic data to balance classes and enhance dataset diversity. Domain adaptation techniques, including transfer learning, further improved cross-domain robustness by up to 25\% in sensitivity. Additionally, the development of diverse global datasets and collaborative data-sharing frameworks is emphasized as a cornerstone for equitable and reliable malaria diagnostics. The role of explainable AI techniques in improving clinical adoption and trustworthiness is also underscored. By addressing these challenges, this work advances the field of AI-driven malaria detection and provides actionable insights for researchers and practitioners. The proposed solutions aim to support the development of accessible and accurate diagnostic tools, particularly for resource-constrained populations.

Submitted: Dec 31, 2024