Protein Language Model
Protein language models (PLMs) apply natural language processing techniques to analyze and generate protein sequences, with the goal of improving our understanding of protein structure, function, and evolution. Current research focuses on enhancing PLMs through instruction tuning, the incorporation of structural information (e.g., via graph neural networks or contact maps), and training strategies such as contrastive learning and reinforcement learning that optimize models for specific properties or tasks. These advances are influencing drug discovery, protein engineering, and the broader study of biological processes by enabling more accurate predictions and the efficient design of proteins with desired characteristics.
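As a minimal illustration of how such models are used in practice, the sketch below extracts per-residue embeddings from a small pretrained PLM. It assumes the Hugging Face transformers library and the publicly released ESM-2 checkpoint "facebook/esm2_t6_8M_UR50D"; the checkpoint name, example sequence, and mean-pooling step are illustrative choices, not taken from the papers listed here.

# Minimal sketch: per-residue embeddings from a pretrained protein language
# model (ESM-2 via Hugging Face transformers; checkpoint choice is illustrative).
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 model (assumed choice)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# A short protein sequence in one-letter amino-acid code (example input).
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden state has shape (batch, tokens, hidden_dim). The tokenizer adds
# special start/end tokens, so the per-residue embeddings are the interior positions.
residue_embeddings = outputs.last_hidden_state[0, 1:-1]

# Mean-pool residues into a single sequence-level representation, a common
# starting point for downstream property-prediction tasks.
sequence_embedding = residue_embeddings.mean(dim=0)
print(residue_embeddings.shape, sequence_embedding.shape)

In practice, embeddings like these are fed to lightweight classifiers or regression heads for tasks such as function or property prediction, while the generative and structure-aware directions surveyed above build further machinery on top of the same backbone.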
Papers
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang
GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models
Seongwon Kim, Parisa Mollaei, Akshay Antony, Rishikesh Magar, Amir Barati Farimani
Exploring Post-Training Quantization of Protein Language Models
Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan