Paper ID: 2411.18759

Classification of Deceased Patients from Non-Deceased Patients using Random Forest and Support Vector Machine Classifiers

Dheeman Saha, Aaron Segura, Biraj Tiwari

Analyzing large datasets and summarizing it into useful information is the heart of the data mining process. In healthcare, information can be converted into knowledge about patient historical patterns and possible future trends. During the COVID-19 pandemic, data mining COVID-19 patient information poses an opportunity to discover patterns that may signal that the patient is at high risk for death. COVID-19 patients die from sepsis, a complex disease process involving multiple organ systems. We extracted the variables physicians are most concerned about regarding viral septic infections. With the aim of distinguishing COVID-19 patients who survive their hospital stay and those COVID-19 who do not, the authors of this study utilize the Support Vector Machine (SVM) and the Random Forest (RF) classification techniques to classify patients according to their demographics, laboratory test results, and preexisting health conditions. After conducting a 10-fold validation procedure, we assessed the performance of the classification through a Receiver Operating Characteristic (ROC) curve, and a Confusion Matrix was used to determine the accuracy of the classifiers. We also performed a cluster analysis on the binary factors, such as if the patient had a preexisting condition and if sepsis was identified, and the numeric values from patient demographics and laboratory test results as predictors.

Submitted: Nov 27, 2024