Protein Sequence
Protein sequence analysis studies the linear order of amino acids in a protein and how that order relates to the protein's structure, function, and evolution. Current research relies heavily on deep learning, including transformer-based architectures such as BERT and GPT variants as well as diffusion models, to analyze sequences, predict properties (e.g., binding sites and post-translational modifications), and design novel proteins. These advances enable faster and more accurate prediction and design of proteins with desired characteristics, with applications in drug discovery, disease diagnosis, and biotechnology. Ongoing work also addresses data leakage in benchmarks and more efficient methods for handling large datasets.
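To make the data-leakage concern concrete: a benchmark leaks when test sequences are identical or nearly identical to training sequences, so a model can score well by memorization. The sketch below is illustrative only and is not taken from any of the listed papers. It flags suspicious train/test pairs with k-mer Jaccard similarity, a simplified stand-in for the alignment-based clustering tools (such as MMseqs2 or CD-HIT) that are commonly used for this purpose. The function names, the choice of k = 3, the 0.5 threshold, and the toy sequences are all assumptions made for the example.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_leaks(train, test, k=3, threshold=0.5):
    """Return (test_index, train_index, similarity) for each pair whose
    k-mer Jaccard similarity meets the (assumed) threshold."""
    train_sets = [kmers(s, k) for s in train]
    leaks = []
    for ti, t in enumerate(test):
        t_set = kmers(t, k)
        for ri, r_set in enumerate(train_sets):
            sim = jaccard(t_set, r_set)
            if sim >= threshold:
                leaks.append((ti, ri, sim))
    return leaks


# Toy sequences, invented for illustration only.
train = ["MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVAGTRRLLQ"]
test = ["MKTAYIAKQRQISFVKSHFSRA", "PPPPGGGGWWWW"]

for test_idx, train_idx, sim in find_leaks(train, test):
    print(f"test[{test_idx}] resembles train[{train_idx}] (Jaccard {sim:.2f})")
```

Here the first test sequence differs from the first training sequence by a single residue, so it is flagged, while the unrelated second test sequence is not. The all-pairs comparison is quadratic in the number of sequences, which is one reason real benchmarks rely on dedicated clustering tools and then split by cluster rather than by individual sequence.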
Papers
Training Compute-Optimal Protein Language Models
Xingyi Cheng, Bo Chen, Pan Li, Jing Gong, Jie Tang, Le Song
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li
A Fusion-Driven Approach of Attention-Based CNN-BiLSTM for Protein Family Classification -- ProFamNet
Bahar Ali, Anwar Shah, Malik Niaz, Musadaq Mansoord, Sami Ullah, Muhammad Adnan
CPE-Pro: A Structure-Sensitive Deep Learning Model for Protein Representation and Origin Evaluation
Wenrui Gou, Wenhui Ge, Yang Tan, Guisheng Fan, Mingchen Li, Huiqun Yu