Paper ID: 2204.12169

Using Machine Learning to Fuse Verbal Autopsy Narratives and Binary Features in the Analysis of Deaths from Hyperglycaemia

Thokozile Manaka, Terence Van Zyl, Alisha N Wade, Deepak Kar

Lower-and-middle income countries are faced with challenges arising from a lack of data on cause of death (COD), which can limit decisions on population health and disease management. A verbal autopsy(VA) can provide information about a COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data as part of an open-ended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in the three settings of binary features, text features and a combination of binary and text features derived from VA reports from rural South Africa. The results obtained indicate narrative text features contain valuable information for determining COD and that a combination of binary and text features improves the automated COD classification task. Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine Learning, Natural Language Processing

Submitted: Apr 26, 2022