Paper ID: 2301.04844

SACDNet: Towards Early Type 2 Diabetes Prediction with Uncertainty for Electronic Health Records

Tayyab Nasir, Muhammad Kamran Malik

Type 2 diabetes mellitus (T2DM) is one of the most common diseases and a leading cause of death. The problem of early diagnosis of T2DM is challenging and necessary to prevent serious complications. This study proposes a novel neural network architecture for early T2DM prediction using multi-headed self-attention and dense layers to extract features from historic diagnoses, patient vitals, and demographics. The proposed technique is called the Self-Attention for Comorbid Disease Net (SACDNet), achieving an accuracy of 89.3% and an F1-Score of 89.1%, having a 1.6% increased accuracy and 1.3% increased f1-score compared to the baseline techniques. Monte Carlo (MC) Dropout is applied to the SACDNet to get a bayesian approximation. A T2DM prediction framework based on the MC Dropout SACDNet is proposed to quantize the uncertainty associated with the predictions. A T2DM prediction dataset is also built as part of this study which is based on real-world routine Electronic Health Record (EHR) data comprising 4,124 diabetic and 181,767 non-diabetic examples, collected from 295 different EHR systems running in different parts of the United States of America. This dataset is further used to evaluate 7 different machine learning and 3 deep learning-based models. Finally, a detailed analysis of the fairness of every technique against different patient demographic groups is performed to validate the unbiased generalization of the techniques and the diversity of the data.

Submitted: Jan 12, 2023