Paper ID: 2410.22452

Explainable convolutional neural network model provides an alternative genome-wide association perspective on mutations in SARS-CoV-2

Parisa Hatami, Richard Annan, Luis Urias Miranda, Jane Gorman, Mengjun Xie, Letu Qingge, Hong Qin

Identifying mutations of SARS-CoV-2 strains associated with their phenotypic changes is critical for pandemic prediction and prevention. We compared an explainable convolutional neural network (CNN) and a traditional genome-wide association study (GWAS) with respect to mutations associated with the WHO variant labels of SARS-CoV-2, which serve as a proxy for virulence phenotypes. We trained a CNN model that classifies genomic sequences into Variants of Concern (VOCs) and then applied SHapley Additive exPlanations (SHAP) to identify the mutations most important for correct predictions. For comparison, we performed a traditional GWAS to identify mutations associated with VOCs. Comparison of the two approaches shows that the explainable neural network more effectively reveals known nucleotide substitutions associated with VOCs, such as those in the spike gene region. Our results suggest that explainable neural networks for genomic sequences offer a promising alternative to traditional genome-wide analysis approaches.
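The abstract does not specify the network architecture or the SHAP variant used; the following is only a minimal illustrative sketch of the general CNN-plus-SHAP pipeline it describes. The sequence length, number of classes, layer sizes, and the use of shap.GradientExplainer are assumptions for illustration, not details from the paper, and the random toy data stands in for aligned SARS-CoV-2 genomes with VOC labels.

```python
# Illustrative sketch (not the authors' exact pipeline): a small 1D CNN that
# classifies one-hot-encoded genomic sequences into variant classes, followed
# by SHAP attribution to rank sequence positions (candidate mutations).
# SEQ_LEN, NUM_CLASSES, and the architecture are placeholder assumptions.
import numpy as np
import tensorflow as tf
import shap

NUCLEOTIDES = "ACGT"
SEQ_LEN = 2000          # placeholder; full SARS-CoV-2 genomes are ~30 kb
NUM_CLASSES = 5         # e.g. Alpha, Beta, Gamma, Delta, Omicron

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a nucleotide sequence; ambiguous bases stay all-zero."""
    idx = {b: i for i, b in enumerate(NUCLEOTIDES)}
    x = np.zeros((SEQ_LEN, len(NUCLEOTIDES)), dtype=np.float32)
    for pos, base in enumerate(seq[:SEQ_LEN]):
        if base in idx:
            x[pos, idx[base]] = 1.0
    return x

def build_model() -> tf.keras.Model:
    """A minimal convolutional classifier over one-hot sequences."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN, len(NUCLEOTIDES))),
        tf.keras.layers.Conv1D(64, kernel_size=15, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=4),
        tf.keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

# Toy random sequences and labels stand in for real aligned genomes.
rng = np.random.default_rng(0)
X = np.stack([one_hot("".join(rng.choice(list(NUCLEOTIDES), SEQ_LEN)))
              for _ in range(200)])
y = rng.integers(0, NUM_CLASSES, size=200)

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# Attribute predictions to input positions with SHAP; collapsing the
# nucleotide channel gives a per-position importance score that can be
# compared against GWAS hits.
explainer = shap.GradientExplainer(model, X[:50])
shap_values = explainer.shap_values(X[:10])
# Older SHAP versions return a list (one array per class); newer ones may
# return a single array with a trailing class axis. Take class 0 either way.
sv = shap_values[0] if isinstance(shap_values, list) else shap_values[..., 0]
position_importance = np.abs(sv).sum(axis=-1).mean(axis=0)   # shape (SEQ_LEN,)
top_positions = np.argsort(position_importance)[::-1][:10]
print("Top candidate positions:", top_positions)
```

In practice, the per-position SHAP scores would be mapped back to the genome coordinates of the alignment, so that high-scoring positions can be checked against known VOC-defining substitutions (e.g., in the spike gene) and against the GWAS results.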

Submitted: Oct 29, 2024