Paper ID: 2407.00362

JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning

Peiliang Zhang, Yujia Tong, Chenghu Du, Chao Che, Yongjun Zhu

Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency without significantly compromising model performance. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We describe the core data selection criterion as the distribution of samples in different classes. JSCDS calculates the cluster centers by sample embedding representation in the caries classification network and utilizes Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information is calculated to fit the above distribution, serving as the criterion for constructing the core set for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full dataset model with only 50% of the core data, with its performance advantage becoming more pronounced in the 70% of core data.

Submitted: Jun 29, 2024