Protein Sequence Generation

Protein sequence generation is the computational design of novel amino acid sequences with desired properties, such as a target structure or function. Recent research centers on generative models, notably diffusion models and transformer-based language models trained on large protein sequence datasets. Many of these models also condition on structural information (e.g., secondary structure) or taxonomic data to improve control over the generated sequences and their accuracy. These advances make it possible to generate diverse, biologically relevant protein sequences, with applications in protein engineering and drug discovery, and they deepen the fundamental understanding of protein structure-function relationships.

Papers