Paper ID: 2411.06635
Mixed Effects Deep Learning Autoencoder for interpretable analysis of single cell RNA Sequencing data
Aixa X. Andrade, Son Nguyen, Albert Montillo
Single-cell RNA sequencing (scRNA-seq) data are often confounded due to technical or biological batch effects. Existing deep learning models aim to mitigate these effects but may inadvertently discard batch-specific information. We propose a Mixed Effects Deep Learning (MEDL) Autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling fixed effects representing biological states from random effects capturing batch-specific variations, MEDL integrates both types of information into predictive models, minimizing information loss. This approach improves interpretability enabling 2D visualizations that show how the same cell would appear across different batches, facilitating exploration of batch-specific variations. We applied MEDL to three datasets: Healthy Heart, Autism Spectrum Disorder (ASDc), and Acute Myeloid Leukemia (AML). In Healthy Heart, MEDL managed 147 batches, assessing its capacity to handle high batch numbers. In ASDc, MEDL captured donor heterogeneity between autistic and healthy individuals, while in AML, it distinguished heterogeneity in a complex setting with variable cell-type presence and malignant cells in diseased donors. These applications demonstrate MEDL's potential to capture fixed and random effects, improve visualization, and enhance predictive accuracy, offering a robust framework for cellular heterogeneity analysis across diverse datasets.
Submitted: Nov 11, 2024