Paper ID: 2411.00054

eDOC: Explainable Decoding Out-of-domain Cell Types with Evidential Learning

Chaochen Wu, Meiyun Zuo, Lei Xie

Single-cell RNA-seq (scRNA-seq) technology is a powerful tool for unraveling the complexity of biological systems. One of the essential and fundamental tasks in scRNA-seq data analysis is Cell Type Annotation (CTA). Despite tremendous efforts to develop machine learning methods for this problem, several challenges remain. They include identifying Out-of-Domain (OOD) cell types, quantifying the uncertainty of unseen cell type annotations, and determining interpretable cell type-specific gene drivers for an OOD case. OOD cell types are often associated with therapeutic responses and disease origins, making them critical for precision medicine and early disease diagnosis. Additionally, scRNA-seq data contain expression values for tens of thousands of genes. Pinpointing the gene drivers underlying CTA can provide deep insight into gene regulatory mechanisms and yield candidate disease biomarkers. In this study, we develop a new method, eDOC, to address the aforementioned challenges. eDOC leverages a transformer architecture with evidential learning to annotate In-Domain (IND) and OOD cell types and to highlight the genes that contribute to the annotation of both IND and OOD cells at single-cell resolution. Rigorous experiments demonstrate that eDOC significantly improves the efficiency and effectiveness of OOD cell type and gene driver identification compared to other state-of-the-art methods. Our findings suggest that eDOC may provide new insights into single-cell biology.
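
The abstract does not spell out how evidential learning produces an uncertainty score, so the following is only an illustrative sketch of the commonly used Dirichlet-based formulation, not eDOC's actual implementation. It assumes that a classifier outputs one logit per known cell type, that non-negative evidence is obtained with a softplus, and that a cell is flagged as OOD when the resulting uncertainty exceeds a threshold. The function names, the example labels, and the threshold of 0.5 are all hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); maps any logit to non-negative evidence.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_summary(logits):
    """Return (expected class probabilities, uncertainty) for one cell.

    Assumed formulation: evidence e_k = softplus(logit_k),
    Dirichlet parameters alpha_k = e_k + 1, strength S = sum(alpha),
    expected probability alpha_k / S, uncertainty u = K / S.
    """
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(logits) / strength
    return probs, uncertainty


def annotate(logits, labels, threshold=0.5):
    # Hypothetical decision rule: too little total evidence means OOD.
    probs, u = evidential_summary(logits)
    if u > threshold:
        return "OOD", u
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], u


if __name__ == "__main__":
    labels = ["T cell", "B cell", "NK cell"]  # illustrative labels only
    print(annotate([10.0, -5.0, -5.0], labels))  # strong evidence for one type
    print(annotate([-5.0, -5.0, -5.0], labels))  # almost no evidence for any type
```

In this sketch the uncertainty approaches 1 when no class collects evidence and shrinks as total evidence grows, which is what allows a single forward pass to both assign an IND label and flag a possible OOD cell.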

Submitted: Oct 30, 2024