Paper ID: 2410.09280

Predicting Drug Effects from High-Dimensional, Asymmetric Drug Datasets by Using Graph Neural Networks: A Comprehensive Analysis of Multitarget Drug Effect Prediction

Avishek Bose, Guojing Cong

Graph neural networks (GNNs) have emerged as one of the most effective ML techniques for drug effect prediction from drug molecular graphs. Despite having immense potential, GNN models lack performance when using datasets that contain high-dimensional, asymmetrically co-occurrent drug effects as targets with complex correlations between them. Training individual learning models for each drug effect and incorporating every prediction result for a wide spectrum of drug effects are impractical. Therefore, an opportunity exists to address this challenge as multitarget prediction problems and predict all drug effects at a time. We developed standard and hybrid GNNs to perform two separate tasks: multiregression for continuous values and multilabel classification for categorical values contained in our datasets. Because multilabel classification makes the target data even more sparse and introduces asymmetric label co-occurrence, learning these models becomes difficult and heavily impacts the GNN's performance. To address these challenges, we propose a new data oversampling technique to improve multilabel classification performances on all the given imbalanced molecular graph datasets. Using the technique, we improve the data imbalance ratio of the drug effects while protecting the datasets' integrity. Finally, we evaluate the multilabel classification performance of the best-performing hybrid GNN model on all the oversampled datasets obtained from the proposed oversampling technique. In all the evaluation metrics (i.e., precision, recall, and F1 score), this model significantly outperforms other ML models, including GNN models when they are trained on the original datasets or oversampled datasets with MLSMOTE, which is a well-known oversampling technique.

Submitted: Oct 11, 2024