Protein Sequence Encoder
Protein sequence encoders are computational tools that represent protein sequences as numerical vectors, which downstream models then use to analyze proteins and predict their properties and functions. Current research focuses on encoder architectures such as transformers and graph neural networks. These are often trained with contrastive learning and integrate multiple data modalities (e.g., sequence combined with structural information) to improve accuracy and efficiency. Such encoders are being applied in drug discovery (e.g., virtual screening and de novo molecule design) and protein engineering (e.g., mutation-effect analysis and protein design), where they enable faster and more accurate predictions from protein sequence data.
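To make "representing a sequence as a numerical vector" concrete, here is a minimal sketch using two classical hand-crafted encodings: per-residue one-hot vectors and a fixed-length k-mer composition vector. This is an illustrative baseline, not one of the learned transformer or graph-neural-network encoders described above, which replace these hand-crafted features with trained embeddings. All function names here are hypothetical. The sketch assumes sequences use only the 20 standard one-letter amino-acid codes. Cosine similarity is included as one common way to compare the resulting vectors, for example when ranking candidates in a similarity search.

```python
from itertools import product
import math

# The 20 standard amino acids, as one-letter codes (assumption: no X, B, Z, U, O).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str) -> list[list[int]]:
    """Per-residue encoding: one 20-dimensional 0/1 vector per position."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence.upper():
        if aa not in index:
            raise ValueError(f"non-standard residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[index[aa]] = 1
        rows.append(row)
    return rows


def kmer_composition(sequence: str, k: int = 2) -> list[float]:
    """Fixed-length encoding: relative frequency of each of the 20**k k-mers."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    position = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = sequence.upper()
    total = len(seq) - k + 1
    if total <= 0:
        # Sequence shorter than k: no k-mers to count.
        return counts
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer not in position:
            raise ValueError(f"non-standard residue in k-mer: {kmer!r}")
        counts[position[kmer]] += 1
    return [c / total for c in counts]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = kmer_composition("MKTAYIAKQR", k=2)
    b = kmer_composition("MKTAYIAKQL", k=2)
    print(len(a), round(cosine_similarity(a, b), 3))
```

The one-hot form keeps per-position information but yields a variable-length matrix. The k-mer form always has length 20**k regardless of sequence length, which is why fixed-size pooled vectors, whether hand-crafted or learned, are convenient for search and classification.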
Papers
S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search
Gengmo Zhou, Zhen Wang, Feng Yu, Guolin Ke, Zhewei Wei, Zhifeng Gao
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering
Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, Yu Guang Wang