Paper ID: 2409.13057

Natural Language Processing Methods for the Study of Protein-Ligand Interactions

James Michels, Ramya Bandarupalli, Amin Ahangar Akbari, Thai Le, Hong Xiao, Jing Li, Erik F. Y. Hom

Recent advances in Natural Language Processing (NLP) have ignited interest in developing effective methods for predicting protein-ligand interactions (PLIs) given their relevance to drug discovery and protein engineering efforts and the ever-growing volume of biochemical sequence and structural data available. The parallels between human languages and the "languages" used to represent proteins and ligands have enabled the use of NLP machine learning approaches to advance PLI studies. In this review, we explain where and how such approaches have been applied in the recent literature and discuss useful mechanisms such as long short-term memory, transformers, and attention. We conclude with a discussion of the current limitations of NLP methods for the study of PLIs as well as key challenges that need to be addressed in future work.

Submitted: Sep 19, 2024