Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Searching the Sequence Space of the Enzyme Dihydrofolate Reductase
2.2. Searching the Sequence Space of the Receptor-Binding Domain of SARS-CoV-2
2.3. Additional Details About the Implementation of the AI Models
2.4. On the Possibility of Overfitting
2.5. The Impact of Missing Data, False Positives, and False Negatives on the AI-Iterative Approaches
3. Results
3.1. Searching the Sequence Space of the Enzyme Dihydrofolate Reductase
3.2. Searching the Sequence Space of the Receptor-Binding Domain of SARS-CoV-2
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Suárez-Martín, I.; Risso, V.A.; Romero-Zaliz, R.; Sanchez-Ruiz, J.M. Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning. Int. J. Mol. Sci. 2025, 26, 4741. https://doi.org/10.3390/ijms26104741