Protein Language Model
Protein language models (PLMs) apply natural language processing techniques to analyze and generate protein sequences, with the goal of improving our understanding of protein structure, function, and evolution. Current research focuses on enhancing PLMs through instruction tuning, the incorporation of structural information (e.g., via graph neural networks or contact maps), and training strategies such as contrastive learning and reinforcement learning that optimize models for specific properties or tasks. These advances are shaping drug discovery, protein engineering, and the broader study of biological processes by enabling more accurate predictions and more efficient design of proteins with desired characteristics.
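Most of the techniques above build on a masked-token (BERT-style) pretraining objective over amino acid sequences. Below is a minimal, self-contained PyTorch sketch of that objective; the model (`ToyPLM`), the masking rate, and the example sequences are illustrative assumptions, not the setup of any paper listed here.

```python
# Minimal sketch of masked-token pretraining for a protein language model:
# a toy Transformer encoder over the 20-letter amino acid alphabet.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 0, 1                              # special token ids (assumed layout)
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB_SIZE = len(VOCAB) + 2
MASK_FRAC = 0.15                              # BERT-style masking rate

class ToyPLM(nn.Module):
    """Tiny encoder-only PLM producing per-residue vocabulary logits."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.embed(tokens) + self.pos(pos)
        h = self.encoder(h, src_key_padding_mask=tokens.eq(PAD))
        return self.head(h)

def mask_tokens(tokens):
    """Replace ~15% of residues with MASK; loss is computed only there."""
    is_target = (torch.rand(tokens.shape) < MASK_FRAC) & tokens.ne(PAD)
    if not is_target.any():                   # guarantee at least one target
        is_target[0, 0] = True
    corrupted = tokens.masked_fill(is_target, MASK)
    labels = tokens.masked_fill(~is_target, -100)  # -100: ignored by CE loss
    return corrupted, labels

# One optimization step on a toy batch (sequences are arbitrary examples).
seqs = ["MKTAYIAKQR", "GAVLIPFMWS"]
tokens = torch.tensor([[VOCAB[a] for a in s] for s in seqs])
model = ToyPLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inp, labels = mask_tokens(tokens)
logits = model(inp)
loss = nn.functional.cross_entropy(logits.view(-1, VOCAB_SIZE), labels.view(-1))
loss.backward()
opt.step()
opt.zero_grad()
print(f"masked-LM loss: {loss.item():.3f}")
```

Variants mentioned above change this recipe rather than replace it: contrastive objectives add a sequence-level loss on pooled embeddings, and structure-aware models feed contact maps or graph features into the encoder alongside the residue tokens.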
Papers
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Penić, Tin Vlašić, Roland G. Huber, Yue Wan, Mile Šikić
A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration
Yanlin Zhou, Kai Tan, Xinyu Shen, Zheng He, Haotian Zheng
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang