Paper ID: 2312.05585

Enhancing Medical Specialty Assignment to Patients using NLP Techniques

Chris Solomou

The introduction of Large Language Models (LLMs), and the vast volume of publicly available medical data, amplified the application of NLP to the medical domain. However, LLMs are pretrained on data that are not explicitly relevant to the domain that are applied to and are often biased towards the original data they were pretrained upon. Even when pretrained on domainspecific data, these models typically require time-consuming fine-tuning to achieve good performance for a specific task. To address these limitations, we propose an alternative approach that achieves superior performance while being computationally efficient. Specifically, we utilize keywords to train a deep learning architecture that outperforms a language model pretrained on a large corpus of text. Our proposal does not require pretraining nor fine-tuning and can be applied directly to a specific setting for performing multi-label classification. Our objective is to automatically assign a new patient to the specialty of the medical professional they require, using a dataset that contains medical transcriptions and relevant keywords. To this end, we fine-tune the PubMedBERT model on this dataset, which serves as the baseline for our experiments. We then twice train/fine-tune a DNN and the RoBERTa language model, using both the keywords and the full transcriptions as input. We compare the performance of these approaches using relevant metrics. Our results demonstrate that utilizing keywords for text classification significantly improves classification performance, for both a basic DL architecture and a large language model. Our approach represents a promising and efficient alternative to traditional methods for finetuning language models on domain-specific data and has potential applications in various medical domains

Submitted: Dec 9, 2023