The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe
Simple Summary
Abstract
1. Introduction
2. The Vast but Evolutionarily Constrained Protein Functional Universe
3. Beyond Evolutionary Boundaries: Exploring the Functional Universe
3.1. The AI-Driven Paradigm Shift in Protein Engineering
3.2. Main Paradigms of AI-Driven De Novo Protein Design
- Two-Stage Generative Design
- Sequence-Guided Language Methods
- Sequence–Structure Co-Guided Methods
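The first paradigm listed above, two-stage generative design, can be illustrated with a minimal control-flow sketch: a backbone generator proposes structures, a fixed-backbone sequence designer proposes sequences for each one, and a candidate is kept only if a structure predictor folds the sequence back onto the intended backbone. All function names, the `Design` record, the un-superposed RMSD metric, and the 2.0 Å cutoff are assumptions made for illustration. The toy stand-ins at the bottom are not the APIs of any real tool and would be replaced by wrappers around actual models. The re-prediction filter is likewise an assumed, commonly used extra step rather than something this section specifies.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Coord = Tuple[float, float, float]
Backbone = List[Coord]


@dataclass
class Design:
    backbone: Backbone
    sequence: str
    rmsd: float  # agreement between the designed and re-predicted backbone


def coordinate_rmsd(a: Backbone, b: Backbone) -> float:
    """RMSD over paired coordinates, without superposition (toy metric)."""
    if len(a) != len(b):
        raise ValueError("backbones must have the same length")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def two_stage_design(
    generate_backbone: Callable[[], Backbone],
    design_sequences: Callable[[Backbone, int], List[str]],
    predict_structure: Callable[[str], Backbone],
    n_backbones: int,
    seqs_per_backbone: int,
    rmsd_cutoff: float = 2.0,  # assumed threshold, in the coordinates' units
) -> List[Design]:
    """Stage 1: backbones. Stage 2: sequences. Then an assumed re-prediction filter."""
    accepted: List[Design] = []
    for _ in range(n_backbones):
        backbone = generate_backbone()  # stage 1: backbone generation
        for seq in design_sequences(backbone, seqs_per_backbone):  # stage 2
            predicted = predict_structure(seq)  # assumed filter step
            score = coordinate_rmsd(backbone, predicted)
            if score <= rmsd_cutoff:
                accepted.append(Design(backbone, seq, score))
    return sorted(accepted, key=lambda d: d.rmsd)


# --- Hypothetical stand-ins so the sketch runs; not real model APIs ---
def toy_backbone(length: int = 8) -> Backbone:
    # A straight chain with 3.8 units between consecutive positions.
    return [(3.8 * i, 0.0, 0.0) for i in range(length)]


def toy_designer(backbone: Backbone, n: int) -> List[str]:
    # One homopolymer per requested sequence, cycling through four letters.
    alphabet = "AGLV"
    return [alphabet[k % len(alphabet)] * len(backbone) for k in range(n)]


def toy_predictor(sequence: str) -> Backbone:
    # Sequences starting with "G" are "mispredicted" by a 5-unit offset,
    # purely to show the filter rejecting a candidate.
    offset = 5.0 if sequence.startswith("G") else 0.0
    return [(3.8 * i, offset, 0.0) for i in range(len(sequence))]


designs = two_stage_design(
    toy_backbone, toy_designer, toy_predictor,
    n_backbones=2, seqs_per_backbone=4,
)
```

Passing the three stages as callables keeps the loop independent of any particular model, so a different generator, designer, or predictor can be swapped in without changing the filtering logic.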
4. The AI Toolbox for De Novo Protein Design
4.1. Protein Structure Prediction
4.2. De Novo Backbone Generation
4.3. “Fixed-Backbone” Sequence Design
4.4. Sequence Generation
4.5. Sequence–Structure Co-Design
Model Name | Release Date | Experimental Validation | Model Description | Algorithms | Efficiency | Advantage | Limitation | Is the Code Publicly Available? | Ref. |
---|---|---|---|---|---|---|---|---|---|
Protein Structure Prediction | |||||||||
AlphaFold2 | 15 July 2021 | / | The first model to predict protein structures with atomic accuracy. | Evoformer | Low | Atomic accuracy | Resource-intensive; MSA-dependent | https://github.com/google-deepmind/alphafold (accessed on 5 September 2025) | [67] |
RoseTTAFold | 19 August 2021 | / | Accurately predicts protein structures and interactions. | 3D Transformer | Middle | Fast, flexible, accurate | Less accurate than AF2 on hard targets | https://github.com/RosettaCommons/RoseTTAFold (accessed on 5 September 2025) | [69] |
ColabFold | 30 May 2022 | / | Fast and easy for the prediction of protein structures and complexes. | Evoformer | High | Fast, accessible | Depends on the AF2 back end | https://github.com/sokrypton/ColabFold (accessed on 5 September 2025) | [70] |
OmegaFold | 2 July 2022 | / | Predicts structures of orphan proteins and rapidly evolving antibodies. | PLM | High | Fast, MSA-free | Slightly lower accuracy on some targets | https://github.com/HeliXonProtein/OmegaFold (accessed on 5 September 2025) | [73] |
ESMFold | 16 March 2023 | / | Predicts protein structure with atomic precision using language models. | PLM | High | Fast, MSA-free | Lower atomic precision vs. AF2 | https://github.com/mit-ll/ESMFold (accessed on 5 September 2025) | [29] |
OpenFold | 14 May 2024 | / | An open-source, trainable implementation of AF2. | Evoformer | Middle | Open, reproducible AF2 implementation | Similar computational needs to AF2 | https://github.com/aqlaboratory/openfold (accessed on 5 September 2025) | [71] |
SPIRED | 27 August 2024 | / | Increases prediction speed and reduces training cost. | Unit-based | Middle | Faster training/inference; efficient design | Accuracy below AF2 | https://github.com/Gonglab-THU/SPIRED-Fitness (accessed on 5 September 2025) | [72] |
AFsample2 | 5 March 2025 | / | Expands the structural diversity of AF2’s generative models. | Stochastic sampling | Low | Produces conformational ensembles | Much higher computation for sampling | https://github.com/iamysk/AFsample2 (accessed on 5 September 2025) | [74] |
D-I-TASSER | 23 May 2025 | / | Predicts large multidomain protein structures. | Deep and physics | Low | Good for multi-domain proteins | Slow; template/MSA-dependent | https://zhanggroup.org/D-I-TASSER/download/ (accessed on 5 September 2025) | [75] |
Multimer structure prediction | |||||||||
AlphaFold Multimer | 10 October 2021 | / | Predicts the structure of protein complexes. | Evoformer | Low | Improved multimer predictions | Higher computation requirements | https://github.com/jcheongs/alphafold-multimer (accessed on 5 September 2025) | [79] |
RoseTTAFoldNA | 23 November 2023 | / | Predicts the structures of protein-nucleic acid complexes. | 3D Transformer | Middle | Predicts protein–nucleic acid complexes | Limited NA training data | https://github.com/uw-ipd/RoseTTAFold2NA (accessed on 5 September 2025) | [80] |
RoseTTAFold All-Atom | 7 March 2024 | / | Predicts the structures of biomolecular assemblies containing proteins, nucleic acids, small molecules, metals, and chemical modifications. | 3D Transformer | Low | Models small molecules/ions | Computationally demanding, high memory | https://github.com/baker-laboratory/RoseTTAFold-All-Atom (accessed on 5 September 2025) | [60] |
AlphaFold3 | 8 May 2024 | / | Predicts biomolecular complexes such as proteins, nucleic acids, small molecules, ions, and modified residues. | Evoformer | Low | Multimolecule unified modeling | Large computational requirements | https://github.com/google-deepmind/alphafold3 (accessed on 5 September 2025) | [56] |
Chai-1 | 11 October 2024 | / | Predicts the structures of protein–ligand complexes and protein multimers. | PLM | High | Aims at MSA-free multimodal prediction | Public details limited | https://github.com/chaidiscovery/chai-lab (accessed on 5 September 2025) | [81] |
Boltz-1 | 20 November 2024 | / | Open-source AF3-level precision prediction model. | Evoformer | High | Optimized AF3 architecture | Public details limited | https://github.com/jwohlwend/boltz (accessed on 5 September 2025) | [82] |
Boltz-2 | 6 June 2025 | / | Simultaneous prediction of protein–small molecule complex structures and binding affinity. | Evoformer | High | Adds affinity estimation to the structure | Public details limited | https://github.com/jwohlwend/boltz (accessed on 5 September 2025) | [83] |
De novo protein backbone generation | |||||||||
RFdiffusion | 11 July 2023 | Yes | Unconditional/topology monomers; binders; symmetric oligomers; enzyme scaffolds; motif scaffolds. | Diffusion | Middle | Versatile backbone generation | Sampling is computationally intensive | https://github.com/RosettaCommons/RFdiffusion (accessed on 5 September 2025) | [88] |
FrameDiff | 22 May 2023 | No | Independent monomer generation with up to 500 amino acids without pretraining. | Diffusion | High | Efficient; no pretrained predictor needed | Needs more validation | https://github.com/jasonkyuyim/se3_diffusion (accessed on 5 September 2025) | [94] |
Chroma | 15 November 2023 | Yes | Programmable protein generation via symmetry, shape, class, or text inputs. | Diffusion | High | Efficient, scalable | Limited benchmarks | https://github.com/generatebio/chroma (accessed on 5 September 2025) | [95] |
FoldingDiff | 5 February 2024 | No | Unconditionally generates highly realistic protein structures. | Diffusion | High | Scales to long chains | Indirect side-chain generation | https://github.com/microsoft/foldingdiff (accessed on 5 September 2025) | [122] |
RFdiffusion All-Atom | 7 March 2024 | Yes | Ligand-guided de novo protein scaffold design. | Diffusion | Middle | Atomistic pockets and ligand design | Computationally costly | https://github.com/baker-laboratory/rf_diffusion_all_atom (accessed on 5 September 2025) | [60] |
RSO | 24 October 2024 | Yes | Relaxed-sequence optimization enabling large-scale protein design without retraining. | Hallucination-based | Low | Joint sequence/structure optimization | Local optimum risk | https://github.com/sokrypton/ColabDesign (accessed on 5 September 2025) | [96] |
SCUBA-D | 21 November 2024 | Yes | Unconditional generation; generation based on sketch input; motif scaffolding. | Diffusion | Middle | Sample novel folds | Experimental validation needed | https://github.com/liuyf020419/SCUBA-D (accessed on 5 September 2025) | [123] |
Proteina | 2 March 2025 | No | Unconditional/class-conditional generation; motif scaffolding. | Flow-matching | High | Generates long chains (up to 800 aa) | Needs side-chain step | https://github.com/NVIDIA-Digital-Bio/proteina (accessed on 5 September 2025) | [99] |
RFdiffusion2 | 10 April 2025 | Yes | Atom-level active site scaffolding without residue indexing or rotamer sampling. | Diffusion | Middle | Direct active-site scaffolding | Computationally costly | https://github.com/RosettaCommons/RFdiffusion2 (accessed on 5 September 2025) | [93] |
ProtComposer | 6 March 2025 | No | Ellipsoid-guided protein generation with customizable layouts. | Flow-matching | High | Conditional layout control | Complex implementation | https://github.com/NVlabs/protcomposer (accessed on 5 September 2025) | [124] |
TopoDiff | 18 June 2025 | Yes | Enabling both unconditional and controllable diffusion-based protein generation. | Diffusion | High | Topology-controlled design | Public details sparse | https://github.com/meneshail/TopoDiff/tree/main (accessed on 5 September 2025) | [125] |
‘Fixed-backbone’ sequence design | |||||||||
ESM-IF | 10 April 2022 | Yes | Inverse folding (protein complexes, partially masked structures, binding interfaces, and multiple states). | PLM | Middle | PLM priors improve novelty | Only backbone design | https://github.com/facebookresearch/esm (accessed on 5 September 2025) | [101] |
ProteinMPNN | 15 September 2022 | Yes | Inverse folding (monomers, cyclic oligomers, protein nanoparticles, and protein-protein interfaces). | MPNN | High | Fast; strong inverse-folding performance | Ignores the ligand context | https://github.com/dauparas/ProteinMPNN (accessed on 5 September 2025) | [102] |
ProRefiner | 16 November 2023 | Yes | Structure-guided residue sequence inpainting with entropy-based global noise filtering. | Transformer | Middle | Improves model outputs | Training complexity | https://github.com/veghen/ProRefiner (accessed on 5 September 2025) | [126] |
CarbonDesign | 23 May 2024 | No | Inverse folding, zero-shot prediction of mutational effects on protein function. | Transformer | Middle | Multimodal constraint integration | Needs broader benchmarking | https://github.com/carbon-design-system/carbon (accessed on 5 September 2025) | [106] |
CARBonAra | 25 July 2024 | Yes | Designs protein sequences under the constraints of specific molecular interaction environments. | Transformer | Middle | Handles ligand/metal contexts | Training complexity | https://github.com/LBM-EPFL/CARBonARa (accessed on 5 September 2025) | [107] |
LigandMPNN | 28 March 2025 | Yes | Simultaneously outputs ligand-binding sequences and sidechain conformations for interaction analysis. | MPNN | Middle | Designs ligand interfaces | Requires ligand coordinates | https://github.com/dauparas/LigandMPNN (accessed on 5 September 2025) | [105] |
FAMPNN | 17 February 2025 | No | Full-atom protein sequence design. | MPNN | High | Full-atom sequence and sidechain output | Limited public details | https://github.com/richardshuai/fampnn (accessed on 5 September 2025) | [127] |
Methods generating sequences | |||||||||
ProtGPT2 | 27 July 2022 | No | High-throughput de novo protein sequence generation. | Transformer | High | Fast generation | Limited control | https://github.com/TeletcheaLab/protGPT2 (accessed on 5 September 2025) | [108] |
ProGen | 26 January 2023 | Yes | Generates functional artificial proteins across families based on a conditional language model. | Transformer | Middle | Conditional generation possible | Computationally demanding | https://github.com/salesforce/progen (accessed on 5 September 2025) | [109] |
ESM2 | 16 May 2023 | Yes | Learns evolutionary patterns for accurate structure–function prediction. | PLM | High | Excellent embeddings; fast | Not primarily generative | https://github.com/facebookresearch/esm (accessed on 5 September 2025) | [29] |
ProGen2 | 15 November 2023 | No | Evolutionary modeling, de novo generation, and zero-shot fitness prediction. | Transformer | Low | Strong generative power | Resource heavy | https://github.com/anonymized-research/progen2 (accessed on 5 September 2025) | [110] |
xTrimoPGLM | 3 April 2025 | No | Large-scale language models for protein analysis and design. | Hybrid | Low | Scales to large tokens | Training/inference costly | https://github.com/ONERAI/xTrimoPGLM (accessed on 5 September 2025) | [115] |
ProGen3 | 16 April 2025 | Yes | Scaling enables generation of a broader range of viable proteins. | Transformer | Low | Super-scale generative model | Extremely resource-intensive | https://github.com/Profluent-AI/progen3 (accessed on 5 September 2025) | [27] |
Sequence–structure co-design | |||||||||
Multiflow | 7 February 2024 | No | DFMs and multiflow enable protein co-design. | Flow-matching | High | Accurate joint sequence–structure | No side-chain output | https://github.com/jasonkyuyim/multiflow (accessed on 5 September 2025) | [118] |
ProteinGenerator | 25 September 2024 | Yes | Generates diverse de novo proteins under customizable sequence constraints. | Diffusion | Middle | Property-guided seq–structure co-design | Weak on long proteins | https://github.com/RosettaCommons/protein_generator (accessed on 5 September 2025) | [119] |
ESM3 | 16 January 2025 | Yes | Supports multi-modal prompt control (sequences, structures, and functions) for generating proteins. | PLM | Low | Multimodal reasoning | Computationally demanding | https://github.com/Cogibra/esm3 (accessed on 5 September 2025) | [120] |
Pinal | 31 March 2025 | No | Protein structure and language co-constrained sequence design. | Transformer | Low | Text → structure → sequence pipeline | Limited by training distribution biases | https://github.com/westlake-repl/Denovo-Pinal (accessed on 5 September 2025) | [121] |
5. AI as an Engine for Protein Functional Universe Exploration
5.1. Exploring Novel Folds and Topologies
5.2. Designing Functional Sites De Novo
5.3. Exploring Sequence–Structure–Function Landscapes
5.4. AI-Driven De Novo Protein Design for Applications in Biotechnology and Synthetic Biology
Molecule Number | Molecule | Target and Activity | Method | Indications/Function | Ref. |
---|---|---|---|---|---|
Therapeutic Proteins | |||||
1–3 | SHRT | Short-chain α-neurotoxins (ScNtx); KD = 0.9 nM, Tm = 78 °C | RFdiffusion, ProteinMPNN | Snake venom toxins. | [134] |
| | LNG | Long-chain α-neurotoxin (P01391); KD = 1.9 nM, Tm > 95 °C | | | |
| | CYTX | Cytotoxins (Naja pallida); KD = 271 nM, Tm = 61 °C | | | |
4 | TNFR1_mb2_pd1 | Tumor necrosis factor receptor 1 (TNFR1); KD < 10 pM | RFdiffusion, ProteinMPNN | Inflammatory disease. | [133] |
5–7 | 23R-91 | Interleukin (IL)-23R; KD < 1 pM | Rosetta | Autoinflammatory diseases. | [48] |
| | 17–53 | IL-17; KD = 10 pM | | | |
8 | ELIXIR | NaV1.5 carboxy-terminal domain; KD = 0.89 ± 0.25 μM | AfDesign | Cardiac arrhythmias and epilepsy. | [148] |
Enzyme Engineering | |||||
9 | Serine hydrolases | kcat/Km = 2.2 × 10⁵ M⁻¹ s⁻¹ | RFdiffusion, LigandMPNN, PLACER | Catalyzes ester hydrolysis. | [136] |
10 | Kemp eliminase | kcat/Km = 1.27 × 10⁴ M⁻¹ s⁻¹ | Rosetta, PROSS, FuncLib, AlphaFold2 | Kemp elimination. | [149] |
11 | Metallohydrolases | kcat/Km = 2.3 × 10⁴ M⁻¹ s⁻¹ | RFam, ProteinMPNN, AlphaFold2 | Catalyzes some difficult hydrolysis reactions. | [151] |
12 | Retroaldolase | kcat/Km = 1.1 × 10⁴ M⁻¹ min⁻¹ | ChemNet, Rosetta, LigandMPNN | Catalyzes the reverse aldol reaction. | [152] |
13, 14 | Carbonic anhydrases; lactate dehydrogenases | NA | ZymCTRL | Carbonic anhydrases: among the fastest enzymes known in nature. Lactate dehydrogenases: primarily used in lactic acid production. | [150] |
Synthetic biological components | |||||
15 | Ras-LOCKR-S/PL | Ras-GTP | Rosetta, AlphaFold | Sensor for Ras activity. Ras activity-dependent Proximity Labeler. | [137] |
16 | THR | / | Rosetta, ProteinMPNN | Enables modular nanomaterial design. | [139] |
17 | Allosteric protein assemblies | / | Rosetta, ProteinMPNN, RFdiffusion, AlphaFold2 | Allosteric modulation. | [140] |
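The kinetic and binding values in the table above are reported in mixed units (the retroaldolase efficiency is per minute, the others per second), which makes direct comparison awkward. The sketch below takes the table's units at face value, converts kcat/Km to M⁻¹ s⁻¹, and converts dissociation constants to standard binding free energies via ΔG° = RT ln(KD / 1 M). The helper names, the 298.15 K temperature, and the 1 M standard state are assumptions chosen for illustration, not values stated in the table.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def per_min_to_per_s(value_per_min: float) -> float:
    """Convert a rate-like quantity from per-minute to per-second."""
    return value_per_min / 60.0


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol: RT * ln(KD / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# Catalytic efficiencies from the table, normalized to M^-1 s^-1.
efficiency_per_s = {
    "Serine hydrolase": 2.2e5,
    "Kemp eliminase": 1.27e4,
    "Metallohydrolase": 2.3e4,
    "Retroaldolase": per_min_to_per_s(1.1e4),  # table value is per minute
}

# Dissociation constants from the table, in mol/L.
kd_molar = {"SHRT": 0.9e-9, "LNG": 1.9e-9, "CYTX": 271e-9}
delta_g = {name: binding_free_energy(kd) for name, kd in kd_molar.items()}

# Rank enzymes by efficiency once they share the same units.
ranked = sorted(efficiency_per_s, key=efficiency_per_s.get, reverse=True)
```

Under these assumptions the retroaldolase efficiency is roughly 1.8 × 10² M⁻¹ s⁻¹, and the 0.9 nM binder corresponds to about −12.3 kcal/mol.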
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, J.; Wang, X.; Ye, X.; Chen, D. Improved health outcomes of nasopharyngeal carcinoma patients 3 years after treatment by the AI-assisted home enteral nutrition management. Front. Nutr. 2025, 11, 1481073.
- Yin, M.; Feng, C.; Yu, Z.; Zhang, Y.; Li, Y.; Wang, X.; Song, C.; Guo, M.; Li, C. sc2GWAS: A comprehensive platform linking single cell and GWAS traits of human. Nucleic Acids Res. 2025, 53, D1151–D1161.
- Ulmer, K.M. Protein engineering. Science 1983, 219, 666–671.
- Arnold, F.H. Design by directed evolution. Acc. Chem. Res. 1998, 31, 125–131.
- Jäckel, C.; Kast, P.; Hilvert, D. Protein design by directed evolution. Annu. Rev. Biophys. 2008, 37, 153–173.
- Koshland, D.E. Application of a theory of enzyme specificity to protein synthesis. Proc. Natl. Acad. Sci. USA 1958, 44, 98–104.
- Fersht, A.R.; Shi, J.-P.; Knill-Jones, J.; Lowe, D.M.; Wilkinson, A.J.; Blow, D.M.; Brick, P.; Carter, P.; Waye, M.M.Y.; Winter, G. Hydrogen bonding and biological specificity analysed by protein engineering. Nature 1985, 314, 235–238.
- Palczewski, K. G Protein–coupled receptor rhodopsin. Annu. Rev. Biochem. 2006, 75, 743–767.
- Hunter, T. Signaling—2000 and beyond. Cell 2000, 100, 113–127.
- Wittinghofer, A.; Vetter, I.R. Structure-function relationships of the G domain, a canonical switch motif. Annu. Rev. Biochem. 2011, 80, 943–971.
- Wilson, I.A.; Skehel, J.J.; Wiley, D.C. Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution. Nature 1981, 289, 366–373.
- Garcia, K.C.; Degano, M.; Stanfield, R.L.; Brunmark, A.; Jackson, M.R.; Peterson, P.A.; Teyton, L.; Wilson, I.A. An αβ T cell receptor structure at 2.5 Å and its orientation in the TCR-MHC complex. Science 1996, 274, 209–219.
- Mitchison, T.; Kirschner, M. Dynamic instability of microtubule growth. Nature 1984, 312, 237–242.
- Shoulders, M.D.; Raines, R.T. Collagen structure and stability. Annu. Rev. Biochem. 2009, 78, 929–958.
- Orgel, J.P.R.O.; Irving, T.C.; Miller, A.; Wess, T.J. Microfibrillar structure of type I collagen in situ. Proc. Natl. Acad. Sci. USA 2006, 103, 9001–9005.
- Walport, M.J. Complement. N. Engl. J. Med. 2001, 344, 1058–1066.
- Janeway, C.A., Jr. Approaching the asymptote? Evolution and revolution in immunology. Cold Spring Harb. Symp. Quant. Biol. 1989, 54 Pt 1, 1–13.
- Zhang, P.; Wei, L.; Li, J.; Wang, X. Artificial intelligence-guided strategies for next-generation biological sequence design. Natl. Sci. Rev. 2024, 11, nwae343.
- Saikia, B.; Baruah, A. Recent advances in de novo computational design and redesign of intrinsically disordered proteins and intrinsically disordered protein regions. Arch. Biochem. Biophys. 2024, 752, 109857.
- Anfinsen, C.B. Principles that govern the folding of protein chains. Science 1973, 181, 223–230.
- Faure, A.J.; Martí-Aranda, A.; Hidalgo-Carcedo, C.; Beltran, A.; Schmiedel, J.M.; Lehner, B. The genetic architecture of protein stability. Nature 2024, 634, 995–1003.
- Keefe, A.D.; Szostak, J.W. Functional proteins from a random-sequence library. Nature 2001, 410, 715–718.
- Copp, J.N.; Akiva, E.; Babbitt, P.C.; Tokuriki, N. Revealing unexplored sequence-function space using sequence similarity networks. Biochemistry 2018, 57, 4651–4662.
- Lemke, O.; Heineike, B.M.; Viknander, S.; Cohen, N.; Li, F.; Steenwyk, J.L.; Spranger, L.; Agostini, F.; Lee, C.T.; Aulakh, S.K.; et al. The role of metabolism in shaping enzyme structures over 400 million years. Nature 2025, 644, 280–289.
- Yeo, J.; Han, Y.; Bordin, N.; Lau, A.M.; Kandathil, S.M.; Kim, H.; Karin, E.L.; Mirdita, M.; Jones, D.T.; Orengo, C.; et al. Metagenomic-scale analysis of the predicted protein structure universe. bioRxiv 2025.
- Richardson, L.; Allen, B.; Baldi, G.; Beracochea, M.; Bileschi, M.L.; Burdett, T.; Burgin, J.; Caballero-Pérez, J.; Cochrane, G.; Colwell, L.J.; et al. MGnify: The microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 2023, 51, D753–D759.
- Bhatnagar, A.; Jain, S.; Beazer, J.; Curran, S.C.; Hoffnagle, A.M.; Ching, K.; Martyn, M.; Nayfach, S.; Ruffolo, J.A.; Madani, A. Scaling unlocks broader generation and deeper functional understanding of proteins. bioRxiv 2025.
- Varadi, M.; Bertoni, D.; Magana, P.; Paramval, U.; Pidruchna, I.; Radhakrishnan, M.; Tsenkov, M.; Nair, S.; Mirdita, M.; Yeo, J.; et al. AlphaFold Protein Structure Database in 2024: Providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 2024, 52, D368–D375.
- Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
- Linsky, T.W.; Noble, K.; Tobin, A.R.; Crow, R.; Carter, L.; Urbauer, J.L.; Baker, D.; Strauch, E.-M. Sampling of structure and sequence space of small protein folds. Nat. Commun. 2022, 13, 7151.
- Minami, S.; Kobayashi, N.; Sugiki, T.; Nagashima, T.; Fujiwara, T.; Tatsumi-Koga, R.; Chikenji, G.; Koga, N. Exploration of novel αβ-protein folds through de novo design. Nat. Struct. Mol. Biol. 2023, 30, 1132–1140.
- Huang, P.-S.; Boyken, S.E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320–327.
- Kortemme, T. De novo protein design—From new structures to programmable functions. Cell 2024, 187, 526–544.
- Sellés Vidal, L.; Isalan, M.; Heap, J.T.; Ledesma-Amaro, R. A primer to directed evolution: Current methodologies and future directions. RSC Chem. Biol. 2023, 4, 271–291.
- Regan, L.; DeGrado, W.F. Characterization of a helical protein designed from first principles. Science 1988, 241, 976–978.
- Kuhlman, B.; Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019, 20, 681–697.
- Jänes, J.; Beltrao, P. Deep learning for protein structure prediction and design—Progress and applications. Mol. Syst. Biol. 2024, 20, 162–169.
- Muir, D.F.; Asper, G.P.R.; Notin, P.; Posner, J.A.; Marks, D.S.; Keiser, M.J.; Pinney, M.M. Evolutionary-scale enzymology enables exploration of a rugged catalytic landscape. Science 2025, 388, eadu1058.
- Leaver-Fay, A.; Tyka, M.; Lewis, S.M.; Lange, O.F.; Thompson, J.; Jacak, R.; Kaufman, K.; Renfrew, P.D.; Smith, C.A.; Sheffler, W.; et al. ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011, 487, 545–574.
- Negron, C.; Keating, A.E. Multistate protein design using CLEVER and CLASSY. Methods Enzymol. 2013, 523, 171–190.
- Smadbeck, J.; Peterson, M.B.; Khoury, G.A.; Taylor, M.S.; Floudas, C.A. Protein WISDOM: A workbench for in silico de novo design of biomolecules. J. Vis. Exp. 2013, 77, e50476.
- Wood, C.W.; Bruning, M.; Ibarra, A.Á.; Bartlett, G.J.; Thomson, A.R.; Sessions, R.B.; Brady, R.L.; Woolfson, D.N. CCBuilder: An interactive web-based tool for building, designing and assessing coiled-coil protein assemblies. Bioinformatics 2014, 30, 3029–3035.
- Epstein, C.; Goldberger, R.; Anfinsen, C. The genetic control of tertiary protein structure: Studies with model systems. Cold Spring Harb. Symp. Quant. Biol. 1963, 28, 439–449.
- Kuhlman, B.; Dantas, G.; Ireton, G.C.; Varani, G.; Stoddard, B.L.; Baker, D. Design of a novel globular protein fold with atomic-level accuracy. Science 2003, 302, 1364–1368.
- Röthlisberger, D.; Khersonsky, O.; Wollacott, A.M.; Jiang, L.; DeChancie, J.; Betker, J.; Gallaher, J.L.; Althoff, E.A.; Zanghellini, A.; Dym, O.; et al. Kemp elimination catalysts by computational enzyme design. Nature 2008, 453, 190–195.
- Siegel, J.B.; Zanghellini, A.; Lovick, H.M.; Kiss, G.; Lambert, A.R.; St Clair, J.L.; Gallaher, J.L.; Hilvert, D.; Gelb, M.H.; Stoddard, B.L.; et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels-Alder reaction. Science 2010, 329, 309–313.
- Roy, A.; Shi, L.; Chang, A.; Dong, X.; Fernandez, A.; Kraft, J.C.; Li, J.; Le, V.Q.; Winegar, R.V.; Cherf, G.M.; et al. De novo design of highly selective miniprotein inhibitors of integrins αvβ6 and αvβ8. Nat. Commun. 2023, 14, 5660.
- Berger, S.; Seeger, F.; Yu, T.-Y.; Aydin, M.; Yang, H.; Rosenblum, D.; Guenin-Macé, L.; Glassman, C.; Arguinchona, L.; Sniezek, C.; et al. Preclinical proof of principle for orally delivered Th17 antagonist miniproteins. Cell 2024, 187, 4305–4317.e18.
- Huang, B.; Coventry, B.; Borowska, M.T.; Arhontoulis, D.C.; Exposit, M.; Abedi, M.; Jude, K.M.; Halabiya, S.F.; Allen, A.; Cordray, C.; et al. De novo design of miniprotein antagonists of cytokine storm inducers. Nat. Commun. 2024, 15, 7064.
- Notin, P.; Rollins, N.; Gal, Y.; Sander, C.; Marks, D. Machine learning for functional protein design. Nat. Biotechnol. 2024, 42, 216–228.
- Hsu, C.; Fannjiang, C.; Listgarten, J. Generative models for protein structures and sequences. Nat. Biotechnol. 2024, 42, 196–199.
- Strokach, A.; Kim, P.M. Deep generative modeling for protein design. Curr. Opin. Struct. Biol. 2022, 72, 226–236.
- Chandra, A.; Tünnermann, L.; Löfstedt, T.; Gratz, R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023, 12, e82819.
- Earl, L.A.; Falconieri, V.; Milne, J.L.; Subramaniam, S. Cryo-EM: Beyond the microscope. Curr. Opin. Struct. Biol. 2017, 46, 71–78.
- Araya, C.L.; Fowler, D.M. Deep mutational scanning: Assessing protein function on a massive scale. Trends Biotechnol. 2011, 29, 435–442.
- Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
- Yan, Y.; Tao, H.; He, J.; Huang, S.-Y. The HDOCK server for integrated protein-protein docking. Nat. Protoc. 2020, 15, 1829–1852.
- Reynisson, B.; Alvarez, B.; Paul, S.; Peters, B.; Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 2020, 48, W449–W454.
- Hon, J.; Marusiak, M.; Martinek, T.; Kunka, A.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of soluble protein expression in Escherichia coli. Bioinformatics 2021, 37, 23–28.
- Krishna, R.; Wang, J.; Ahern, W.; Sturmfels, P.; Venkatesh, P.; Kalvet, I.; Lee, G.R.; Morey-Burrows, F.S.; Anishchenko, I.; Humphreys, I.R.; et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528.
- Abriata, L.A. The Nobel Prize in Chemistry: Past, present, and future of AI in biology. Commun. Biol. 2024, 7, 1409.
- Marks, D.S.; Hopf, T.A.; Sander, C. Protein structure prediction from sequence variation. Nat. Biotechnol. 2012, 30, 1072–1080.
- Wüthrich, K. Protein structure determination in solution by NMR spectroscopy. J. Biol. Chem. 1990, 265, 22059–22062.
- Shi, Y. A glimpse of structural biology through X-ray crystallography. Cell 2014, 159, 995–1014.
- Yang, Z.; Zeng, X.; Zhao, Y.; Chen, R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct. Target. Ther. 2023, 8, 115.
- Pakhrin, S.C.; Shrestha, B.; Adhikari, B.; Kc, D.B. Deep learning-based advances in protein structure prediction. Int. J. Mol. Sci. 2021, 22, 5553.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 2021, 89, 1607–1617.
- Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876.
- Mirdita, M.; Schütze, K.; Moriwaki, Y.; Heo, L.; Ovchinnikov, S.; Steinegger, M. ColabFold: Making protein folding accessible to all. Nat. Methods 2022, 19, 679–682.
- Ahdritz, G.; Bouatta, N.; Floristean, C.; Kadyan, S.; Xia, Q.; Gerecke, W.; O’Donnell, T.J.; Berenberg, D.; Fisk, I.; Zanichelli, N.; et al. OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat. Methods 2024, 21, 1514–1524.
- Chen, Y.; Xu, Y.; Liu, D.; Xing, Y.; Gong, H. An end-to-end framework for the prediction of protein structure and fitness from single sequence. Nat. Commun. 2024, 15, 7400.
- Wu, R.; Ding, F.; Wang, R.; Shen, R.; Zhang, X.; Luo, S.; Su, C.; Wu, Z.; Xie, Q.; Berger, B.; et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022.
- Kalakoti, Y.; Wallner, B. AFsample2 predicts multiple conformations and ensembles with AlphaFold2. Commun. Biol. 2025, 8, 373.
- Zheng, W.; Wuyun, Q.; Li, Y.; Liu, Q.; Zhou, X.; Peng, C.; Zhu, Y.; Freddolino, L.; Zhang, Y. Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER. Nat. Biotechnol. 2025.
- Elofsson, A. Progress at protein structure prediction, as seen in CASP15. Curr. Opin. Struct. Biol. 2023, 80, 102594.
- Moussad, B.; Roche, R.; Bhattacharya, D. The transformative power of transformers in protein structure prediction. Proc. Natl. Acad. Sci. USA 2023, 120, e2303499120.
- Li, H.; Lei, Y.; Zeng, J. Revolutionizing biomolecular structure determination with artificial intelligence. Natl. Sci. Rev. 2024, 11, nwae339.
- Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.
- Baek, M.; McHugh, R.; Anishchenko, I.; Jiang, H.; Baker, D.; DiMaio, F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121.
- Chai Discovery; Boitreaud, J.; Dent, J.; McPartlon, M.; Meier, J.; Reis, V.; Rogozhnikov, A.; Wu, K. Chai-1: Decoding the molecular interactions of life. bioRxiv 2024.
- Wohlwend, J.; Corso, G.; Passaro, S.; Getz, N.; Reveiz, M.; Leidal, K.; Swiderski, W.; Atkinson, L.; Portnoi, T.; Chinn, I.; et al. Boltz-1 democratizing biomolecular interaction modeling. bioRxiv 2024. [Google Scholar] [CrossRef]
- Passaro, S.; Corso, G.; Wohlwend, J.; Reveiz, M.; Thaler, S.; Somnath, V.R.; Getz, N.; Portnoi, T.; Roy, J.; Stark, H.; et al. Boltz-2: Towards accurate and efficient binding affinity prediction. bioRxiv 2025. [Google Scholar] [CrossRef]
- Zheng, S.; He, J.; Liu, C.; Shi, Y.; Lu, Z.; Feng, W.; Ju, F.; Wang, J.; Zhu, J.; Min, Y.; et al. Predicting equilibrium distributions for molecular systems with deep learning. Nat. Mach. Intell. 2024, 6, 558–567. [Google Scholar] [CrossRef]
- Lewis, S.; Hempel, T.; Jiménez-Luna, J.; Gastegger, M.; Xie, Y.; Foong, A.Y.K.; Satorras, V.G.; Abdin, O.; Veeling, B.S.; Zaporozhets, I.; et al. Scalable emulation of protein equilibrium ensembles with generative deep learning. Science 2025, 389, eadv9817. [Google Scholar] [CrossRef] [PubMed]
- Van Kempen, M.; Kim, S.S.; Tumescheit, C.; Mirdita, M.; Lee, J.; Gilchrist, C.L.M.; Söding, J.; Steinegger, M. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 2024, 42, 243–246. [Google Scholar] [CrossRef] [PubMed]
- Yim, J.; Stärk, H.; Corso, G.; Jing, B.; Barzilay, R.; Jaakkola, T.S. Diffusion models in protein structure and docking. WIREs Comput. Mol. Sci. 2024, 14, e1711. [Google Scholar] [CrossRef]
- Watson, J.L.; Juergens, D.; Bennett, N.R.; Trippe, B.L.; Yim, J.; Eisenach, H.E.; Ahern, W.; Borst, A.J.; Ragotte, R.J.; Milles, L.F.; et al. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089–1100. [Google Scholar] [CrossRef]
- Bennett, N.R.; Watson, J.L.; Ragotte, R.J.; Borst, A.J.; See, D.L.; Weidle, C.; Biswas, R.; Yu, Y.; Shrock, E.L.; Ault, R.; et al. Atomically accurate de novo design of antibodies with RFdiffusion. bioRxiv 2024. [Google Scholar] [CrossRef] [PubMed]
- Wu, K.; Jiang, H.; Hicks, D.R.; Liu, C.; Muratspahić, E.; Ramelot, T.A.; Liu, Y.; McNally, K.; Gaur, A.; Coventry, B.; et al. Sequence-specific targeting of intrinsically disordered protein regions. bioRxiv 2024. [Google Scholar] [CrossRef]
- Sappington, I.; Toul, M.; Lee, D.S.; Robinson, S.A.; Goreshnik, I.; McCurdy, C.; Chan, T.C.; Buchholz, N.; Huang, B.; Vafeados, D.; et al. Improved protein binder design using beta-pairing targeted RFdiffusion. bioRxiv 2024. [Google Scholar] [CrossRef]
- Rettie, S.A.; Juergens, D.; Adebomi, V.; Bueso, Y.F.; Zhao, Q.; Leveille, A.N.; Liu, A.; Bera, A.K.; Wilms, J.A.; Üffing, A.; et al. Accurate de novo design of high-affinity protein-binding macrocycles using deep learning. Nat. Chem. Biol. 2025. [Google Scholar] [CrossRef] [PubMed]
- Ahern, W.; Yim, J.; Tischer, D.; Salike, S.; Woodbury, S.M.; Kim, D.; Kalvet, I.; Kipnis, Y.; Coventry, B.; Altae-Tran, H.R.; et al. Atom level enzyme active site scaffolding using RFdiffusion2. bioRxiv 2025. [Google Scholar] [CrossRef]
- Yim, J.; Trippe, B.L.; Bortoli, V.D.; Mathieu, E.; Doucet, A.; Barzilay, R.; Jaakkola, T. SE(3) diffusion model with application to protein backbone generation. In Proceedings of Machine Learning Research, Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; PMLR: Cambridge, MA, USA, 2023; Volume 202, pp. 40001–40039. [Google Scholar]
- Ingraham, J.B.; Baranov, M.; Costello, Z.; Barber, K.W.; Wang, W.; Ismail, A.; Frappier, V.; Lord, D.M.; Ng-Thow-Hing, C.; Van Vlack, E.R.; et al. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078. [Google Scholar] [CrossRef]
- Frank, C.; Khoshouei, A.; Fuβ, L.; Schiwietz, D.; Putz, D.; Weber, L.; Zhao, Z.; Hattori, M.; Feng, S.; De Stigter, Y.; et al. Scalable protein design using optimization in a relaxed sequence space. Science 2024, 386, 439–445. [Google Scholar] [CrossRef]
- Anishchenko, I.; Pellock, S.J.; Chidyausiku, T.M.; Ramelot, T.A.; Ovchinnikov, S.; Hao, J.; Bafna, K.; Norn, C.; Kang, A.; Bera, A.K.; et al. De novo protein design by deep network hallucination. Nature 2021, 600, 547–552. [Google Scholar] [CrossRef]
- Wicky, B.I.M.; Milles, L.F.; Courbet, A.; Ragotte, R.J.; Dauparas, J.; Kinfu, E.; Tipps, S.; Kibler, R.D.; Baek, M.; DiMaio, F.; et al. Hallucinating symmetric protein assemblies. Science 2022, 378, 56–61. [Google Scholar] [CrossRef]
- Geffner, T.; Didi, K.; Zhang, Z.; Reidenbach, D.; Cao, Z.; Yim, J.; Geiger, M.; Dallago, C.; Kucukbenli, E.; Vahdat, A.; et al. Proteina: Scaling Flow-based Protein Structure Generative Models. arXiv 2025. [Google Scholar] [CrossRef]
- Castorina, L.V.; Petrenas, R.; Subr, K.; Wood, C.W. PDBench: Evaluating computational methods for protein-sequence design. Bioinformatics 2023, 39, btad027. [Google Scholar] [CrossRef] [PubMed]
- Hsu, C.; Verkuil, R.; Liu, J.; Lin, Z.; Hie, B.; Sercu, T.; Lerer, A.; Rives, A. Learning inverse folding from millions of predicted structures. In Proceedings of Machine Learning Research, Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S., Eds.; PMLR: Cambridge, MA, USA, 2022; Volume 162, pp. 8946–8970. [Google Scholar]
- Dauparas, J.; Anishchenko, I.; Bennett, N.; Bai, H.; Ragotte, R.J.; Milles, L.F.; Wicky, B.I.M.; Courbet, A.; de Haas, R.J.; Bethel, N.; et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 2022, 378, 49–56. [Google Scholar] [CrossRef]
- Sumida, K.H.; Núñez-Franco, R.; Kalvet, I.; Pellock, S.J.; Wicky, B.I.M.; Milles, L.F.; Dauparas, J.; Wang, J.; Kipnis, Y.; Jameson, N.; et al. Improving protein expression, stability, and function with ProteinMPNN. J. Am. Chem. Soc. 2024, 146, 2054–2061. [Google Scholar] [CrossRef] [PubMed]
- Wang, T.; Jin, X.; Lu, X.; Min, X.; Ge, S.; Li, S. Empirical validation of ProteinMPNN’s efficiency in enhancing protein fitness. Front. Genet. 2024, 14, 1347667. [Google Scholar] [CrossRef]
- Dauparas, J.; Lee, G.R.; Pecoraro, R.; An, L.; Anishchenko, I.; Glasscock, C.; Baker, D. Atomic context-conditioned protein sequence design using LigandMPNN. Nat. Methods 2025, 22, 717–723. [Google Scholar] [CrossRef]
- Ren, M.; Yu, C.; Bu, D.; Zhang, H. Accurate and robust protein sequence design with CarbonDesign. Nat. Mach. Intell. 2024, 6, 536–547. [Google Scholar] [CrossRef]
- Krapp, L.F.; Meireles, F.A.; Abriata, L.A.; Devillard, J.; Vacle, S.; Marcaida, M.J.; Dal Peraro, M. Context-aware geometric deep learning for protein sequence design. Nat. Commun. 2024, 15, 6273. [Google Scholar] [CrossRef]
- Ferruz, N.; Schmidt, S.; Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348. [Google Scholar] [CrossRef]
- Madani, A.; Krause, B.; Greene, E.R.; Subramanian, S.; Mohr, B.P.; Holton, J.M.; Olmos, J.L.; Xiong, C.; Sun, Z.Z.; Socher, R.; et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 2023, 41, 1099–1106. [Google Scholar] [CrossRef]
- Nijkamp, E.; Ruffolo, J.A.; Weinstein, E.N.; Naik, N.; Madani, A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023, 14, 968–978.e3. [Google Scholar] [CrossRef] [PubMed]
- Shen, Y.; Chen, Z.; Mamalakis, M.; Liu, Y.; Li, T.; Su, Y.; He, J.; Liò, P.; Wang, Y.G. TourSynbio: A multi-modal large model and agent framework to bridge text and protein sequences for protein engineering. arXiv 2024. [Google Scholar] [CrossRef]
- Orlando, G.; Raimondi, D.; Vranken, W.F. Observation selection bias in contact prediction and its implications for structural bioinformatics. Sci. Rep. 2016, 6, 36679. [Google Scholar] [CrossRef] [PubMed]
- Derry, A.; Carpenter, K.A.; Altman, R.B. Training data composition affects performance of protein structure analysis algorithms. Pac. Symp. Biocomput. 2022, 27, 10–21. [Google Scholar] [CrossRef]
- Schmirler, R.; Heinzinger, M.; Rost, B. Fine-tuning protein language models boosts predictions across diverse tasks. Nat. Commun. 2024, 15, 7407. [Google Scholar] [CrossRef]
- Chen, B.; Cheng, X.; Li, P.; Geng, Y.; Gong, J.; Li, S.; Bei, Z.; Tan, X.; Wang, B.; Zeng, X.; et al. xTrimoPGLM: Unified 100-billion-parameter pretrained transformer for deciphering the language of proteins. Nat. Methods 2025, 22, 1028–1039. [Google Scholar] [CrossRef] [PubMed]
- Edsall, J.T. The molecular basis of evolution (Anfinsen, Christian B.). J. Chem. Educ. 1960, 37, 107. [Google Scholar] [CrossRef][Green Version]
- Zhou, J.; Panaitiu, A.E.; Grigoryan, G. A general-purpose protein design framework based on mining sequence–structure relationships in known protein structures. Proc. Natl. Acad. Sci. USA 2020, 117, 1059–1068. [Google Scholar] [CrossRef]
- Campbell, A.; Yim, J.; Barzilay, R.; Rainforth, T.; Jaakkola, T. Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design. arXiv 2024. [Google Scholar] [CrossRef]
- Lisanza, S.L.; Gershon, J.M.; Tipps, S.W.K.; Sims, J.N.; Arnoldt, L.; Hendel, S.J.; Simma, M.K.; Liu, G.; Yase, M.; Wu, H.; et al. Multistate and functional protein design using RoseTTAFold sequence space diffusion. Nat. Biotechnol. 2024, 43, 1288–1298. [Google Scholar] [CrossRef]
- Hayes, T.; Rao, R.; Akin, H.; Sofroniew, N.J.; Oktay, D.; Lin, Z.; Verkuil, R.; Tran, V.Q.; Deaton, J.; Wiggert, M.; et al. Simulating 500 million years of evolution with a language model. Science 2025, 387, 850–858. [Google Scholar] [CrossRef]
- Dai, F.; Fan, Y.; Su, J.; Wang, C.; Han, C.; Zhou, X.; Liu, J.; Qian, H.; Wang, S.; Zeng, A.; et al. Toward de novo protein design from natural language. bioRxiv 2024. [Google Scholar] [CrossRef]
- Wu, K.E.; Yang, K.K.; Van Den Berg, R.; Alamdari, S.; Zou, J.Y.; Lu, A.X.; Amini, A.P. Protein structure generation via folding diffusion. Nat. Commun. 2024, 15, 1059. [Google Scholar] [CrossRef]
- Liu, Y.; Wang, S.; Dong, J.; Chen, L.; Wang, X.; Wang, L.; Li, F.; Wang, C.; Zhang, J.; Wang, Y.; et al. De novo protein design with a denoising diffusion network independent of pretrained structure prediction models. Nat. Methods 2024, 21, 2107–2116. [Google Scholar] [CrossRef]
- Stark, H.; Jing, B.; Geffner, T.; Yim, J.; Jaakkola, T.; Vahdat, A.; Kreis, K. ProtComposer: Compositional protein structure generation with 3D ellipsoids. arXiv 2025. [Google Scholar] [CrossRef]
- Zhang, Y.; Liu, Y.; Ma, Z.; Li, M.; Xu, C.; Gong, H. Improving diffusion-based protein backbone generation with global-geometry-aware latent encoding. Nat. Mach. Intell. 2025, 7, 1104–1118. [Google Scholar] [CrossRef]
- Zhou, X.; Chen, G.; Ye, J.; Wang, E.; Zhang, J.; Mao, C.; Li, Z.; Hao, J.; Huang, X.; Tang, J.; et al. ProRefiner: An entropy-based refining strategy for inverse protein folding with global graph attention. Nat. Commun. 2023, 14, 7434. [Google Scholar] [CrossRef]
- Shuai, R.W.; Widatalla, T.; Huang, P.-S.; Hie, B.L. Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN. bioRxiv 2025. [Google Scholar] [CrossRef]
- Jiang, K.; Yan, Z.; Di Bernardo, M.; Sgrizzi, S.R.; Villiger, L.; Kayabolen, A.; Kim, B.J.; Carscadden, J.K.; Hiraizumi, M.; Nishimasu, H.; et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 2025, 387, eadr6006. [Google Scholar] [CrossRef] [PubMed]
- Gligorijević, V.; Renfrew, P.D.; Kosciolek, T.; Leman, J.K.; Berenberg, D.; Vatanen, T.; Chandler, C.; Taylor, B.C.; Fisk, I.M.; Vlamakis, H.; et al. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 2021, 12, 3168. [Google Scholar] [CrossRef] [PubMed]
- Bileschi, M.L.; Belanger, D.; Bryant, D.H.; Sanderson, T.; Carter, B.; Sculley, D.; Bateman, A.; DePristo, M.A.; Colwell, L.J. Using deep learning to annotate the protein universe. Nat. Biotechnol. 2022, 40, 932–937. [Google Scholar] [CrossRef]
- Bordin, N.; Dallago, C.; Heinzinger, M.; Kim, S.; Littmann, M.; Rauer, C.; Steinegger, M.; Rost, B.; Orengo, C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem. Sci. 2023, 48, 345–359. [Google Scholar] [CrossRef]
- Wang, W.; Shuai, Y.; Zeng, M.; Fan, W.; Li, M. DPFunc: Accurately predicting protein function via deep learning with domain-guided structure information. Nat. Commun. 2025, 16, 70. [Google Scholar] [CrossRef]
- Glögl, M.; Krishnakumar, A.; Ragotte, R.J.; Goreshnik, I.; Coventry, B.; Bera, A.K.; Kang, A.; Joyce, E.; Ahn, G.; Huang, B.; et al. Target-conditioned diffusion generates potent TNFR superfamily antagonists and agonists. Science 2024, 386, 1154–1161. [Google Scholar] [CrossRef] [PubMed]
- Vázquez Torres, S.; Benard Valle, M.; Mackessy, S.P.; Menzies, S.K.; Casewell, N.R.; Ahmadi, S.; Burlet, N.J.; Muratspahić, E.; Sappington, I.; Overath, M.D.; et al. De novo designed proteins neutralize lethal snake venom toxins. Nature 2025, 639, 225–231. [Google Scholar] [CrossRef] [PubMed]
- Ming, Y.; Wang, W.; Yin, R.; Zeng, M.; Tang, L.; Tang, S.; Li, M. A review of enzyme design in catalytic stability by artificial intelligence. Brief. Bioinform. 2023, 24, bbad065. [Google Scholar] [CrossRef]
- Lauko, A.; Pellock, S.J.; Sumida, K.H.; Anishchenko, I.; Juergens, D.; Ahern, W.; Jeung, J.; Shida, A.F.; Hunt, A.; Kalvet, I.; et al. Computational design of serine hydrolases. Science 2025, 388, eadu2454. [Google Scholar] [CrossRef]
- Zhang, J.Z.; Nguyen, W.H.; Greenwood, N.; Rose, J.C.; Ong, S.-E.; Maly, D.J.; Baker, D. Computationally designed sensors detect endogenous Ras activity and signaling effectors at subcellular resolution. Nat. Biotechnol. 2024, 42, 1888–1898. [Google Scholar] [CrossRef]
- Zhang, J.Z.; Ong, S.-E.; Baker, D.; Maly, D.J. Single-cell sensor analyses reveal signaling programs enabling Ras-G12C drug resistance. Nat. Chem. Biol. 2025, 21, 47–58. [Google Scholar] [CrossRef] [PubMed]
- Huddy, T.F.; Hsia, Y.; Kibler, R.D.; Xu, J.; Bethel, N.; Nagarajan, D.; Redler, R.; Leung, P.J.Y.; Weidle, C.; Courbet, A.; et al. Blueprinting extendable nanomaterials with standardized protein blocks. Nature 2024, 627, 898–904. [Google Scholar] [CrossRef]
- Pillai, A.; Idris, A.; Philomin, A.; Weidle, C.; Skotheim, R.; Leung, P.J.Y.; Broerman, A.; Demakis, C.; Borst, A.J.; Praetorius, F.; et al. De novo design of allosterically switchable protein assemblies. Nature 2024, 632, 911–920. [Google Scholar] [CrossRef]
- Hou, K.; Huang, W.; Qi, M.; Tugwell, T.H.; Alturaifi, T.M.; Chen, Y.; Zhang, X.; Lu, L.; Mann, S.I.; Liu, P.; et al. De novo design of porphyrin-containing proteins as efficient and stereoselective catalysts. Science 2025, 388, 665–670. [Google Scholar] [CrossRef]
- Poelwijk, F.J.; Kiviet, D.J.; Weinreich, D.M.; Tans, S.J. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 2007, 445, 383–386. [Google Scholar] [CrossRef]
- Starr, T.N.; Thornton, J.W. Epistasis in protein evolution. Protein Sci. 2016, 25, 1204–1218. [Google Scholar] [CrossRef] [PubMed]
- Smith, J.M. Natural selection and the concept of a protein space. Nature 1970, 225, 563–564. [Google Scholar] [CrossRef]
- Fei, H.; Li, Y.; Liu, Y.; Wei, J.; Chen, A.; Gao, C. Advancing protein evolution with inverse folding models integrating structural and evolutionary constraints. Cell 2025, 188, 4674–4692. [Google Scholar] [CrossRef]
- Biswas, S.; Khimulya, G.; Alley, E.C.; Esvelt, K.M.; Church, G.M. Low-N protein engineering with data-efficient deep learning. Nat. Methods 2021, 18, 389–396. [Google Scholar] [CrossRef]
- Wang, J.; Lisanza, S.; Juergens, D.; Tischer, D.; Watson, J.L.; Castro, K.M.; Ragotte, R.; Saragovi, A.; Milles, L.F.; Baek, M.; et al. Scaffolding protein functional sites using deep learning. Science 2022, 377, 387–394. [Google Scholar] [CrossRef] [PubMed]
- Mahling, R.; Hegyi, B.; Cullen, E.R.; Cho, T.M.; Rodriques, A.R.; Fossier, L.; Yehya, M.; Yang, L.; Chen, B.-X.; Katchman, A.N.; et al. De novo design of a peptide modulator to reverse sodium channel dysfunction linked to cardiac arrhythmias and epilepsy. Cell S0092-8674(25)00860-8. [CrossRef]
- Listov, D.; Vos, E.; Hoffka, G.; Hoch, S.Y.; Berg, A.; Hamer-Rogotner, S.; Dym, O.; Kamerlin, S.C.L.; Fleishman, S.J. Complete computational design of high-efficiency Kemp elimination enzymes. Nature 2025, 643, 1421–1427. [Google Scholar] [CrossRef]
- Munsamy, G.; Illanes-Vicioso, R.; Funcillo, S.; Nakou, I.T.; Lindner, S.; Ayres, G.; Sheehan, L.S.; Moss, S.; Eckhard, U.; Lorenz, P.; et al. Conditional language models enable the efficient design of proficient enzymes. bioRxiv 2024. [Google Scholar] [CrossRef]
- Kim, D.; Woodbury, S.M.; Ahern, W.; Tischer, D.; Hanikel, N.; Salike, S.; Yim, J.; Pellock, S.J.; Lauko, A.; Kalvet, I.; et al. Computational design of metallohydrolases. bioRxiv 2024. [Google Scholar] [CrossRef]
- Anishchenko, I.; Kipnis, Y.; Kalvet, I.; Zhou, G.; Krishna, R.; Pellock, S.J.; Lauko, A.; Lee, G.R.; An, L.; Dauparas, J.; et al. Modeling protein-small molecule conformational ensembles with ChemNet. bioRxiv 2024. [Google Scholar] [CrossRef]
- Liu, Z.; Zhao, Z.; Xie, L.; Xiao, Z.; Li, M.; Li, Y.; Luo, T. Proteomic analysis reveals chromatin remodeling as a potential therapeutical target in neuroblastoma. J. Transl. Med. 2025, 23, 234. [Google Scholar] [CrossRef]
- Zhang, G.; Song, C.; Yin, M.; Liu, L.; Zhang, Y.; Li, Y.; Zhang, J.; Guo, M.; Li, C. TRAPT: A multi-stage fused deep learning framework for predicting transcriptional regulators based on large-scale epigenomic data. Nat. Commun. 2025, 16, 3611. [Google Scholar] [CrossRef]
- Wang, M.; Zhang, Z.; Singh Bedi, A.; Guerra, S.; Lin-Gibson, S.; Cong, L.; Chakraborty, S.; Qu, Y.; Ma, J.; Xing, E.; et al. A call for built-in biosecurity safeguards for generative AI tools. Preprint 2025, 43, 845–847. [Google Scholar] [CrossRef] [PubMed]
- Irbäck, A.; Knuthson, L.; Mohanty, S.; Peterson, C. Using quantum annealing to design lattice proteins. Phys. Rev. Res. 2024, 6, 13162. [Google Scholar] [CrossRef]
- Pandey, M.; Fernandez, M.; Gentile, F.; Isayev, O.; Tropsha, A.; Stern, A.C.; Cherkasov, A. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 2022, 4, 211–221. [Google Scholar] [CrossRef]
- Lee, T.-S.; Cerutti, D.S.; Mermelstein, D.; Lin, C.; LeGrand, S.; Giese, T.J.; Roitberg, A.; Case, D.A.; Walker, R.C.; York, D.M. GPU-accelerated molecular dynamics and free energy methods in Amber18: Performance enhancements and new features. J. Chem. Inf. Model. 2018, 58, 2043–2050. [Google Scholar] [CrossRef]
Toolkit | Goal | Inputs | Outputs |
---|---|---|---|
Protein structure prediction (Section 4.1) | Produce 3D models and confidence estimates for single-chain proteins or complexes to assess foldability and guide design. | One or more amino-acid sequences, optional partner sequences, ligands, oligomeric state, templates. | Predicted coordinates (PDB); per-residue and global confidence scores (pLDDT, pTM, PAE); interface metrics for complexes. |
De novo backbone generation (Section 4.2) | Generate novel backbone geometries or scaffolds that satisfy specified geometric/functional constraints. | Design constraints (motif/active-site coordinates, desired topology/symmetry, pocket geometry, binder, anchor residues). | Ensemble of candidate backbone coordinates (atomic models). |
“Fixed-backbone” sequence design (Section 4.3) | Design sequences that fold to a given backbone and meet developability criteria. | Target backbone coordinates (PDB); optional side-chain/motif constraints. | Ranked sets of candidate sequences. |
Sequence generation (Section 4.4) | Produce diverse candidate sequences de novo (unconditionally or conditionally guided). | Conditioning information (family or functional labels, motif, structural constraints, or descriptor prompts) and sampling parameters. | Batches of candidate amino-acid sequences annotated with model scores, novelty metrics and basic developability annotations. |
Sequence–structure co-design (Section 4.5) | Jointly generate matched sequence–structure pairs that satisfy functional constraints. | Functional constraints (motif geometry, binding interface, text prompt). | Paired sequence–structure candidates. |
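As a rough illustration of how the confidence outputs listed for structure prediction (pLDDT, pTM) might be used to triage designed sequences, the following minimal Python sketch filters and ranks candidates. The `Prediction` class, the function names, and the cutoff values (mean pLDDT of at least 80, pTM of at least 0.7) are hypothetical choices made for this example; they are not taken from any specific tool in the table, and appropriate thresholds depend on the design task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Hypothetical container for one structure-prediction result."""
    sequence: str
    plddt: List[float]  # per-residue confidence on a 0-100 scale
    ptm: float          # global confidence on a 0-1 scale


def mean_plddt(pred: Prediction) -> float:
    """Average the per-residue pLDDT values into one score."""
    return sum(pred.plddt) / len(pred.plddt)


def filter_designs(preds: List[Prediction],
                   min_plddt: float = 80.0,  # assumed cutoff, not a standard
                   min_ptm: float = 0.7      # assumed cutoff, not a standard
                   ) -> List[Prediction]:
    """Keep designs passing both cutoffs, ranked by mean pLDDT (highest first)."""
    kept = [p for p in preds
            if mean_plddt(p) >= min_plddt and p.ptm >= min_ptm]
    return sorted(kept, key=mean_plddt, reverse=True)


if __name__ == "__main__":
    # Toy values for illustration only; not real model output.
    candidates = [
        Prediction("MKT", [90.0, 85.0, 95.0], 0.85),
        Prediction("GGS", [60.0, 55.0, 50.0], 0.40),
        Prediction("AEL", [82.0, 84.0, 80.0], 0.75),
    ]
    for p in filter_designs(candidates):
        print(p.sequence, round(mean_plddt(p), 1), p.ptm)
```

In practice, a filter of this kind would sit between the sequence-design and experimental-validation steps, and additional metrics from the table (for example, PAE or interface scores for complexes) could be added as further fields and criteria.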
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, G.; Liu, C.; Lu, J.; Zhang, S.; Zhu, L. The Role of AI-Driven De Novo Protein Design in the Exploration of the Protein Functional Universe. Biology 2025, 14, 1268. https://doi.org/10.3390/biology14091268