Paper ID: 2401.05145
Machine Learning to Promote Translational Research: Predicting Patent and Clinical Trial Inclusion in Dementia Research
Matilda Beinat, Julian Beinat, Mohammed Shoaib, Jorge Gomez Magenti
Projected to impact 1.6 million people in the UK by 2040 and costing {\pounds}25 billion annually, dementia presents a growing challenge to society. This study, a pioneering effort to predict the translational potential of dementia research using machine learning, hopes to address the slow translation of fundamental discoveries into practical applications despite dementia's significant societal and economic impact. We used the Dimensions database to extract data from 43,091 UK dementia research publications between the years 1990-2023, specifically metadata (authors, publication year etc.), concepts mentioned in the paper, and the paper abstract. To prepare the data for machine learning we applied methods such as one hot encoding and/or word embeddings. We trained a CatBoost Classifier to predict if a publication will be cited in a future patent or clinical trial. We trained several model variations. The model combining metadata, concept, and abstract embeddings yielded the highest performance: for patent predictions, an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.84 and 77.17% accuracy; for clinical trial predictions, an AUROC of 0.81 and 75.11% accuracy. The results demonstrate that integrating machine learning within current research methodologies can uncover overlooked publications, expediting the identification of promising research and potentially transforming dementia research by predicting real-world impact and guiding translational strategies.
Submitted: Jan 10, 2024