Review

Breaking Evolution’s Ceiling: AI-Powered Protein Engineering

College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
* Authors to whom correspondence should be addressed.
Catalysts 2025, 15(9), 842; https://doi.org/10.3390/catal15090842
Submission received: 14 July 2025 / Revised: 22 August 2025 / Accepted: 29 August 2025 / Published: 2 September 2025
(This article belongs to the Section Biocatalysis)

Abstract

Breakthrough advances in artificial intelligence (AI) are propelling de novo protein design past the boundaries of natural evolution, making it possible to engineer proteins with entirely novel structures and functions. Benefiting from iterative improvements in machine learning algorithms, AI-driven de novo strategies have overcome traditional reliance on natural templates. These approaches autonomously optimize catalytic sites and overall stability, significantly enhancing enzyme performance and applicability. Generative models, including large language models and diffusion models, can rapidly produce novel protein structures with specialized functions, offering innovative technological paths for biomolecule development. This review systematically discusses recent key developments and representative examples of AI applications in enzyme engineering and design. We highlight a fundamental shift from traditional “structure-based function analysis” to a new paradigm of “function-driven structural innovation.” Furthermore, we comprehensively evaluate current challenges in AI-driven protein engineering and suggest promising future directions.

1. Introduction

In recent years, enzyme-based biocatalysis has gained prominence as a sustainable and eco-friendly alternative to traditional chemical processes, providing greener options for industrial manufacturing [1]. As demands for larger-scale enzyme applications grow [2], research has focused on identifying novel enzymes and enhancing existing ones to boost their performance in terms of efficiency, substrate specificity, product selectivity, and operational stability.
Currently, enzyme design strategies can be broadly classified into three main approaches. (i) Semi-rational design leverages prior structural or mechanistic knowledge in combination with high-throughput screening to pinpoint beneficial mutations. (ii) Rational design employs detailed catalytic insights—often derived from structural data—to introduce targeted modifications. (iii) De novo design seeks to create new catalysts directly from amino acid sequences, bypassing natural templates. Progress in these areas has long been constrained by the limited availability of high-resolution structures. Traditional homology modeling is unreliable when suitable templates are scarce. Recent breakthroughs in deep-learning structure prediction (Figure 1), exemplified by AlphaFold2/3 and RoseTTAFold/All-Atom, now deliver near-experimental accuracy even for proteins lacking homologs [3]. These advancements have provided a robust structural foundation for enzyme design.
Complementary computational techniques, including molecular docking, long-timescale molecular dynamics (MD), and hybrid QM/MM calculations, have become indispensable for mapping enzyme–substrate interactions and energetic landscapes. However, the substantial computational cost and the need for experimental validation remain non-trivial hurdles. In this context, machine-learning (ML) algorithms trained on rapidly expanding sequence–function datasets are shifting the field toward function-driven design paradigms [4]. This transformation from structure-based to function-driven enzyme design accelerates the discovery and optimization of high-performance biocatalysts, greatly expanding potential applications in sustainable biocatalysis.

2. Machine Learning and Data Foundations

Machine learning, a subfield of AI, focuses on predicting outcomes and making decisions by identifying patterns in data, without relying on traditional rule-based modeling approaches. Unlike conventional methods such as quantum mechanical calculations [5], ML optimizes parameters in mathematical functions to learn and uncover the intrinsic relationships within the data. These algorithms discover underlying patterns based on large datasets, rather than depending on manually defined rules [6]. Therefore, the core advantage of machine learning lies in its adaptability [7], as it continuously updates and improves from the data it processes.

2.1. Basic Algorithms

The performance of machine learning is largely dependent on both the accuracy and volume of the data used [8], particularly in the field of biocatalysis. Issues such as bias, noise, and imbalanced datasets must be effectively addressed. To enable ML models to extract meaningful patterns from data, the data must first be converted into numerical forms that represent features [9]. These features serve as the input to the ML models and reflect the intrinsic structure and patterns of the data. For example, in protein engineering, features can include amino acid composition, secondary structure tendencies, and conservation scores [10].
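As a minimal illustration of such featurization (an illustrative sketch, not code from any cited study), the following converts a protein sequence into a 20-dimensional amino acid composition vector that can serve as input to an ML model:

```python
# Minimal sketch: encode a protein sequence as a fixed-length
# amino-acid-composition feature vector (one fraction per residue type).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet

def composition_features(sequence: str) -> list[float]:
    """Return the fraction of each of the 20 amino acids in `sequence`."""
    sequence = sequence.upper()
    n = len(sequence)
    return [sequence.count(aa) / n for aa in AMINO_ACIDS]

features = composition_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(len(features))  # 20-dimensional vector, ready for an ML model
```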
In protein engineering, machine learning is primarily implemented through four model types: supervised, unsupervised, self-supervised, and generative learning approaches [11]. Supervised learning uses labeled datasets to learn the mapping between input and output [12]. It is commonly applied in protein engineering to assess how mutations may influence enzyme stability or activity, such as training models to predict and analyze protein-peptide biophysical interactions. Unsupervised learning, on the other hand, identifies latent patterns or structures within data without labeled inputs [13]. It is widely used in protein sequence clustering and structural analysis, helping researchers classify enzymes or predict protein functional domains [14]. Self-supervised learning generates tasks from unlabeled data and excels in protein structure prediction and sequence docking [15], as it can automatically learn the correlation between protein sequences and their corresponding structures [16]. Generative learning models learn the underlying distribution of protein sequences and/or structures and sample new candidates (e.g., variational autoencoders, generative adversarial networks, normalizing flows, and diffusion models), enabling de novo sequence or structure design, conditional generation guided by target properties (such as stability, binding, or catalytic activity), and virtual screening to prioritize variants [17].
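To make the supervised setting concrete, the sketch below fits a regressor on a handful of labeled variants and scores an unseen mutant; the sequences, activity values, and model choice are illustrative assumptions only:

```python
# Minimal supervised-learning sketch: fit a regressor on labeled variants
# (composition features -> measured activity), then score an unseen mutant.
# Sequences, activity values, and the model choice are illustrative only.
from sklearn.ensemble import RandomForestRegressor

AAS = "ACDEFGHIKLMNPQRSTVWY"

def featurize(seq: str) -> list[float]:
    return [seq.count(aa) / len(seq) for aa in AAS]

train_seqs = ["MKTAYIAK", "MKTWYIAK", "MKTAYIGK", "MATAYIAK"]
train_activity = [1.0, 1.8, 0.4, 1.1]          # hypothetical assay values

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit([featurize(s) for s in train_seqs], train_activity)

candidate = "MKTWYIGK"                          # unseen variant to score
print(model.predict([featurize(candidate)])[0])
```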
Machine learning technologies have been widely applied in the field of biocatalysis, especially in enzyme design, protein sequence prediction, and functional optimization. Deep learning models, in particular convolutional neural networks (CNNs) [18] and recurrent neural networks (RNNs), enable ML to process complex protein data and predict enzyme stability [19], substrate specificity, and catalytic efficiency. More recently, deep generative approaches such as generative adversarial networks (GANs) and diffusion models have been applied to the design and improvement of proteins [20], further enhancing the predictive power of these models.

2.2. Data Foundation for ML

The quality of data used for ML directly impacts the effectiveness of model training. Therefore, effective data collection, cleaning, and standardization are essential for ensuring the reliability of ML models [21]. In protein engineering, data typically come from experimental measurements, public databases, and literature reports. Experimental data include protein sequences, structures, catalytic activities, and substrate specificities. Public databases such as UniProt [22], NCBI, and the PDB [23] provide rich data on protein functions, sequences, and structures, forming the backbone for training ML models (Table 1 lists common databases). However, experimental data may contain missing values, duplicates, and noise, making data cleaning a crucial step [24,25]. Cleaning involves removing invalid records, imputing missing values, and correcting errors to ensure data quality and consistency. During preprocessing, outliers must also be removed so that they do not distort subsequent model training. Typical problems include missing structures or incomplete sequences; sampling biases in activity measurements caused by experimental error or instrument inaccuracy; and statistical analyses that are biased or fail to control for multiple hypothesis testing. Additionally, because features differ widely in scale, data standardization is critical; this typically involves min–max normalization or z-score standardization to bring features onto a common scale [26]. For instance, protein molecular weight, amino acid distribution, and molecular structural features must be standardized to allow unified comparison in ML models. Effective data management strategies improve the training accuracy and robustness of ML models, and systematic use of databases gives researchers access to the extensive resources driving the rapid advancement of protein engineering.
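A minimal cleaning-and-standardization sketch along these lines (column names and values are hypothetical) might look as follows:

```python
# Minimal data-cleaning and standardization sketch (hypothetical columns):
# drop duplicates and missing rows, then z-score-normalize numeric features
# so that molecular weight and activity live on a common scale.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "sequence": ["MKTAYIAK", "MKTAYIAK", "MKTWYIAK", "MATAYIAK"],
    "mol_weight": [905.1, 905.1, 1021.2, None],   # daltons (made up)
    "activity": [1.0, 1.0, 1.8, 1.1],             # assay units (made up)
})

df = df.drop_duplicates().dropna()                # remove duplicates / gaps
df[["mol_weight", "activity"]] = StandardScaler().fit_transform(
    df[["mol_weight", "activity"]]                # z-score: mean 0, std 1
)
print(df)
```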
Despite the significant advancements of machine learning in protein engineering, there are still several unique challenges when handling protein data, especially related to high dimensionality, sparsity, and domain specificity. High-dimensionality of protein data is a prominent challenge; protein sequences can be hundreds or even thousands of amino acids long, with each amino acid representing a distinct feature [27]. This results in high-dimensional representations of proteins, increasing computational complexity and susceptibility to overfitting. To tackle this issue, widely used approaches involve selecting relevant features and applying dimensionality reduction techniques like PCA (Principal Component Analysis) and t-SNE [28]. Furthermore, experimental protein data are often limited, particularly in enzyme activity and mutant studies, leading to sparse datasets [29]. Sparse data can undermine the accuracy of ML models, as the models struggle to capture meaningful patterns when data are insufficient. Strategies to address this issue include data augmentation and transfer learning. Simulating data or using pre-trained models can increase data diversity and enhance the model’s generalization ability [30]. Additionally, different types of proteins have distinct structural and functional characteristics, leading to domain-specific issues in protein data. For certain protein types, general ML models may not be effective [31]. To tackle this, researchers typically design specialized models based on protein categories or use different features and training methods tailored to the specific functions of proteins [32,33].
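For example, a PCA projection of high-dimensional protein features can be sketched in a few lines (random data stands in for real descriptors here):

```python
# Minimal dimensionality-reduction sketch: project high-dimensional
# protein feature vectors onto their first two principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 400))        # e.g., 50 proteins x 400 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)           # 50 x 2 embedding for plotting/clustering
print(pca.explained_variance_ratio_)  # variance captured by each component
```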

3. Algorithm Architecture and Method Evaluation

Over the past few years, various architectures have been applied to enzyme modeling. The Transformer model (Figure 2A) leverages a self-attention mechanism to capture long-range interactions and co-evolutionary features within amino acid sequences, making it essential for comprehending enzymes’ overall properties; as a result, it has become one of the most influential methods [34]. Through self-supervised training paradigms, Transformers can also extract knowledge from large, unlabeled datasets, significantly improving data utilization efficiency [35], and they show enormous potential in gene expression prediction and protein analysis [36,37]. For instance, Rives et al. employed large-scale unsupervised learning with a pure sequence model to achieve high-precision predictions of mutation effects and secondary structure [38]. Similarly, Raad et al. designed miRe2e, a three-module architecture (structure prediction, MFE estimation, and a classifier) that outperforms deepMir in pre-miRNA prediction by roughly 14-fold [39]. It is worth noting, however, that Transformer models face challenges in processing structural data directly, since capturing spatial topology and geometric constraints demands a different approach than linear sequence modeling.
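The self-attention operation at the heart of these models can be sketched in a few lines; the toy NumPy version below, with random embeddings and weight matrices, is an illustrative sketch rather than any published implementation:

```python
# Minimal scaled dot-product self-attention sketch (NumPy), the core
# operation that lets Transformers relate every residue to every other.
import numpy as np

def self_attention(X: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """X: (L, d) residue embeddings; returns (L, d) attended features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # (L, L) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over positions
    return weights @ V                               # mix information across L

rng = np.random.default_rng(0)
L, d = 8, 16                                         # toy: 8 residues, dim 16
X = rng.normal(size=(L, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (8, 16)
```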
In parallel with the advancement of Transformer architectures, diffusion models have become a robust generative framework for enzyme design guided by structural information [40]. By simulating the dynamics of information propagation on graphs, diffusion processes have also laid the foundation for deep learning algorithms in graph-based contexts [41]. In these models, a “graph” is a collection of “nodes” (entities such as atoms, residues, or molecules) connected by “edges” (the interactions between them). This framework models complex relationships between components, which is particularly useful in protein design, where understanding how individual elements interact is crucial. Diffusion models simulate how information spreads through these networks, helping to optimize enzyme design by considering how structural changes affect enzyme function. The approach has shown great versatility in graph embedding representation, topological structure generation, and time-varying graph analysis (Figure 2B), significantly enhancing the scalability and computational efficiency of graph neural networks on large-scale graph data. Wu et al. [42] introduced the FoldingDiff model, which uses a bidirectional Transformer backbone and a periodic noise diffusion process, accurately reproducing α-helix and β-sheet regions through an innovative combination of angle parameterization and diffusion. Beyond these two model families, other neural architectures such as CNNs and graph neural networks (GNNs) also play significant roles in enzyme design (Figure 2C). Lu et al. [43] used a graph convolutional network to predict and design protease specificity; through a two-stage graph convolution and regularization strategy, their model achieved an overall accuracy of 84.2% in predicting variants, validated experimentally. Sikander et al. [44] proposed the DDE-CNN model, which extracts 400-dimensional sequence features using the dipeptide deviation from expected mean (DDE) and constructs a 20 × 20 matrix as input to a 2D-CNN. In cross-validation against models such as the GRU (Gated Recurrent Unit) [45], a recurrent neural network widely used for sequence data because it captures long-term dependencies, the model showed significant improvement, providing strong support for high-precision enzyme recognition.
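The forward (noising) half of such a diffusion process over backbone torsion angles, in the spirit of FoldingDiff, can be sketched as follows; the noise schedule, toy angles, and wrapping are illustrative assumptions, and the learned denoising network is omitted:

```python
# Toy sketch of the forward (noising) step of a diffusion model over
# backbone torsion angles: angles are progressively corrupted with
# Gaussian noise and wrapped to [-pi, pi); a trained network (omitted)
# would learn to reverse this process step by step.
import numpy as np

def wrap(angles):
    return (angles + np.pi) % (2 * np.pi) - np.pi    # keep angles periodic

rng = np.random.default_rng(0)
T = 100                                              # diffusion timesteps
betas = np.linspace(1e-4, 0.05, T)                   # simple noise schedule
alphas_bar = np.cumprod(1.0 - betas)

phi_psi = rng.uniform(-np.pi, np.pi, size=(64, 2))   # toy torsions, 64 residues

def noise_to_step(x0, t):
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps
    return wrap(xt)                                   # sample x_t ~ q(x_t | x_0)

print(noise_to_step(phi_psi, t=99)[:2])               # nearly pure noise
```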
Through multi-dimensional feature fusion, dynamic attention mechanisms, and automated architecture search, neural networks have achieved remarkable improvements in the accuracy and efficiency of enzyme function prediction [46]. Lu et al. [47] built a dynamic docking framework using SE(3)-equivariant graph neural networks, optimizing both protein conformations and ligand poses via a geometric diffusion process; their DynamicBind model improved on DiffDock by 74%. Once a model is constructed, it is crucial to evaluate its effectiveness quickly. A computational metric is needed for rapid comparison, but traditional NLP (Natural Language Processing) metrics (e.g., BLEU or perplexity) lack physical relevance and are disconnected from real biological function [48]. ProteinGym therefore evaluates enzyme design models with ranking correlations, clinical-judgment metrics, zero-shot evaluation, and supervised validation mechanisms to quantify model quality [49].
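For instance, the ranking-correlation component of ProteinGym-style evaluation reduces to a Spearman correlation between model scores and measured fitness; the values below are invented for illustration:

```python
# Minimal evaluation sketch: rank (Spearman) correlation between model
# scores and measured fitness, the headline metric in ProteinGym-style
# benchmarks. Values here are made up for illustration.
from scipy.stats import spearmanr

predicted_scores = [0.9, 0.1, 0.5, 0.7, 0.2]   # model output per variant
measured_fitness = [1.8, 0.3, 0.9, 1.2, 0.5]   # experimental assay values

rho, pval = spearmanr(predicted_scores, measured_fitness)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```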

4. Structural and Sequence Parallel Computational Strategies

4.1. Structure-Based Design Strategies

With the rapid development of artificial intelligence technologies, the application of deep learning and graph neural networks has become a major driving force in the field of protein design (Table 2). Traditional protein design methods often rely on experimental data and rule-based computational models, which, although effective, are limited by prediction accuracy and computational efficiency. The introduction of AI, especially deep learning and graph neural networks, has broken through these bottlenecks, allowing protein design to be optimized on a higher-dimensional and more refined level, particularly in addressing the mapping problem from structure to sequence.
Cutting-edge structure prediction tools like AlphaFold, trRosetta, and RoseTTAFold, powered by deep learning, have significantly enhanced the precision of protein 3D structure modeling, marking major advancements in structural biology (Figure 3A). AlphaFold [3] uses attention-based deep neural networks to transform amino acid sequences into corresponding 3D structure models; it achieved near-experimental accuracy in the CASP14 assessment, a landmark in protein structure prediction. The latest version, AlphaFold3, predicts protein–protein and protein–ligand (nucleic acid, small-molecule, ion) complex structures with higher accuracy than traditional methods [50]. trRosetta [51] and RoseTTAFold [52], by combining diverse data sources with advanced model training methods, predict residue–residue contacts and distances, further improving the reliability and speed of protein structure prediction, especially for complex proteins and polymorphic structures [53].
Table 2. Overview of AI tools for protein engineering and molecular design.

Category | AI Tool | Function | Ref.
Structure Prediction | AlphaFold3 | High-accuracy prediction of protein 3D structures for single chains and complexes | [54]
Structure Prediction | RoseTTAFold | Rapid structure prediction from sequence, suitable for general modeling | [55]
Structure Prediction | DeepFold | De novo protein structure prediction | [56]
Structure Prediction | trRosetta | Residue contact-based modeling for protein structures and multimer prediction | [51]
Structure Prediction | RosettaDesign | Protein structure prediction and design platform | [57]
Structure Design and Generation | RFdiffusion | De novo generation of protein backbones and structures via diffusion | [58]
Structure Design and Generation | ProGen | Controllable de novo protein sequence generation with a language model | [59]
Structure Design and Generation | EvoEF | Fast and accurate energy function for computational protein design | [60]
Structure Design and Generation | ESM | Multimodal language models for sequence, structure, and function generation | [61]
Structure Design and Generation | AlphaDesign | Design of novel protein sequences for enhanced function, stability, and affinity | [62]
Sequence Optimization and Mutation Prediction | AlphaMissense | Predicts pathogenicity of missense mutations using structure and evolutionary priors | [63]
Sequence Optimization and Mutation Prediction | ProtBert | BERT-based pre-trained language model for protein sequence analysis | [64]
Sequence Optimization and Mutation Prediction | µFormer | Models fitness landscapes; effective in predicting high-order mutational effects | [65]
Sequence Optimization and Mutation Prediction | AiCE | Universal inverse folding framework combining structural and evolutionary constraints | [66]
Sequence Optimization and Mutation Prediction | MutaGene | Prediction of mutation effects on protein structure and function | [67]
Binding Prediction and Affinity Estimation | DeepAffinity | Predicts protein–small-molecule binding affinities | [68]
Binding Prediction and Affinity Estimation | AtomNet | Deep learning-based docking for drug–target interaction modeling | [69]
Binding Prediction and Affinity Estimation | EquiBind | Equivariant GNN for predicting protein–ligand binding poses | [70]
Graph Neural Network Applications | ProteinMPNN | Graph neural network-based design of function-specific proteins | [71]
Graph Neural Network Applications | DynamicBind | Combines GNNs and diffusion for binding conformation prediction | [47]
Integrated Multitask Platforms | BioNeMo | NVIDIA’s unified AI platform for modeling structure, sequence, and function | [72]
Integrated Multitask Platforms | DeepSequence | Prediction of protein function and stability for mutant screening and optimization | [73]
These continuously optimized structural prediction tools provide unprecedented accuracy and efficiency for enzyme design. By accurately predicting enzyme 3D structures, researchers can gain clearer insights into enzyme active sites, substrate binding characteristics, and catalytic mechanisms, laying a solid foundation for directed evolution, optimization, and engineering design of enzymes. For instance, Abramson et al. [50] evaluated the performance of AlphaFold3 in predicting proteins, nucleic acids, and various ligands (small molecules, ions, and modified residues), all showing high accuracy.
Zhang et al. [74] employed AlphaFold2 structural modeling, Rosetta stability prediction, and molecular dynamics simulations to rationally design 3-ketosteroid Δ1-dehydrogenase (KsdD5), pinpointing spatial sites that influence substrate entry and exit efficiency and providing a structural foundation for subsequent channel-expansion designs. After the substrate channel of mutant M3 was expanded, its catalytic efficiency increased four-fold and its half-life was extended 3.9-fold. Machine learning has also demonstrated exceptional capabilities in predicting non-native protein structures. An et al., working from structural predictions, designed cyclic peptides with adjustable repeat units that modify the binding-site size, significantly enhancing binding affinity for downstream small molecules [75]. This highlights the leap from “natural structure simulation” to “artificial function creation,” offering support for the design of novel biomaterials and drug delivery systems. Structure-based prediction tools have also broken through long-standing technical bottlenecks in enzymology. Membrane protein structure analysis, for instance, has long been hampered by difficulties in expression and purification; Mahtarin et al. [76] built an effective SARS-CoV-2 membrane protein model using trRosetta, predicting and identifying critical residues and interaction sites in its C-terminal domain and providing an efficient alternative path for membrane protein research. Furthermore, combining AlphaFold-generated structures with ProteinMPNN [71] has shown promising results for protein expression and solubility: in a study designing 96 proteins in Escherichia coli, 73 expressed solubly, and most exhibited excellent thermal stability. This multidimensional structure–function design approach is driving a new wave in protein design.

4.2. Sequence-Based Design Strategies

The sequence-based engineering strategy improves protein function by analyzing and optimizing the protein sequence itself, without directly modifying the protein’s three-dimensional structure (Figure 3B). This approach uses bioinformatics tools and computational simulations to predict how sequence variations affect protein stability, catalytic efficiency, substrate specificity, and other functional characteristics. By computationally predicting and screening potential mutation sites, and integrating methods such as molecular dynamics and machine learning, researchers can efficiently design and optimize the performance of enzymes and other proteins. The core of this approach is sequence optimization with inverse folding models. A typical example is ProteinMPNN, which treats the protein backbone as a graph and predicts amino acid sequences that fold into designated structures. With a sequence recovery rate of up to 52.4%, ProteinMPNN demonstrates the potential to generate novel amino acid sequences and enhance enzyme functionality. Sumida et al. [77] integrated evolutionary conservation information with fixed catalytic and binding sites, using ProteinMPNN to redesign the protein backbone sequence; they enhanced the stability and function of myoglobin and TEV protease, overcoming limitations in expressing natural proteins in heterologous systems. For ligand-binding protein design, LigandMPNN [78] explicitly models the atomic interactions of non-protein components such as small molecules, nucleic acids, and metals; by predicting these interactions with deep learning, it designs proteins with stronger binding affinities for target ligands and shows significant advantages over Rosetta and ProteinMPNN in recovering native sequences around ligand sites. In another recent study, Fei et al. [66] introduced AiCE (AI-informed Constraints for protein Engineering), an innovative AI-based approach that incorporates both structural and evolutionary constraints into a generalized inverse folding framework. The method requires no retraining of proprietary models, reducing computational cost, and identified single and double mutations in the SpCas9 protein with just 1.15 CPU hours. Functional validation on eight structurally and functionally diverse proteins, including deaminases, nuclear localization sequences, nucleases, and reverse transcriptases, demonstrated AiCE’s simplicity, efficiency, and generality.
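The sequence recovery metric cited above is simply position-wise identity between a designed sequence and the native sequence of the target backbone; the short sketch below (with invented sequences) illustrates how such a rate is computed:

```python
# Minimal sketch of the sequence-recovery metric used to evaluate inverse
# folding models: the fraction of positions at which the designed sequence
# matches the native sequence of the target backbone. Toy sequences only.
def sequence_recovery(designed: str, native: str) -> float:
    assert len(designed) == len(native)
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

native   = "MKTAYIAKQRQISFVK"
designed = "MKTGYIAKERQISFLK"   # hypothetical inverse-folding output
print(f"recovery: {sequence_recovery(designed, native):.1%}")  # 81.2%
```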
The future trend points toward multi-scale models that combine language models, graph networks, and molecular dynamics, as well as high-throughput closed-loop iterative systems embedded in synthetic biology. These advances signal a paradigm shift in protein engineering, with sequence-based computational strategies driving innovations in synthetic biology and green manufacturing.

4.3. De Novo Design Strategy

This strategy generates entirely new amino acid sequences through computation, without relying on natural templates. It utilizes deep learning models to uncover the connections between protein sequences and their corresponding structures, enabling end-to-end creation from functional requirements to sequence design (Figure 3A).
RFdiffusion [58], developed by David Baker’s team, is a deep learning model for de novo protein structure design. Its core combines a diffusion model (a noising–denoising mechanism) with the RoseTTAFold structure prediction network, gradually generating functional protein backbones from random noise; it supports the design of monomers, oligomers, binding sites, antibodies, and other complex structures. One study demonstrated the de novo construction of single-chain variable fragments (scFvs), in which engineered CDRs from heavy and light chains formed complexes with TcdB and a Phox2b peptide–MHC. Cryo-EM analysis validated proper immunoglobulin folding and binding orientation, while high-resolution structures confirmed atomic-level accuracy across all six CDR loop conformations [79]. RFdiffusion has propelled protein design from structural copying to functional creation, providing an efficient tool for biomedicine and synthetic biology. Moreover, the combination of generative models and protein language models has advanced the precision of protein design. Models like ProGen [59] and ESM [61] are trained on vast numbers of functional protein sequences, enabling unsupervised generation of protein variants with specific functions. ProGen, by introducing control tags (e.g., protein family Pfam IDs and biological process labels), applies conditional generation techniques from natural language processing to protein design; after fine-tuning on 56,000 natural sequences from the lysozyme family, it generated one million artificial sequences, 66% of which expressed successfully. The latest systems, such as ESM-3, handle protein sequence, 3D structure, and function within a single generative model, and their massive databases and training scale have further advanced multimodal modeling in protein design. Using this model, a fluorescent protein, esmGFP, with only 58% sequence identity to its closest known natural fluorescent protein, was successfully generated [80], demonstrating the power of language models to explore untapped regions of protein sequence space. Jiang et al. [81] developed the EVOLVEpro protein language model, which rapidly enhances protein activity with minimal experimental data and has been successfully applied to antibody optimization, CRISPR nuclease evolution, and T7 RNA polymerase enhancement.
Although notable progress has been made, challenges remain in de novo protein design, including difficulties in forecasting the interplay of multiple mutations and limitations in simulating dynamic conformational changes. Future research may focus on further developments in multi-scale fusion models (e.g., combining language models with molecular dynamics) and synthetic biology feedback loops (computational-experimental high-throughput iterations).

5. AI-Driven Strategies for Predicting Enzyme Mutations and Optimizing Stability

5.1. Predicting Mutation Effects and Variant Performance

The primary challenge in the directed evolution of proteins is efficiently identifying functional mutants from an exponentially growing sequence space. Traditional methods, such as deep mutational scanning [82], site-directed or iterative saturation mutagenesis [83,84], and random mutagenesis libraries, rely heavily on limited sampling and cannot fully capture key sequence–function relationships. More advanced continuous evolution methods, exemplified by phage-assisted continuous evolution (PACE) [85,86], allow higher-throughput screening of adaptive mutants but still offer limited insight into global protein sequence features. To address this, deep learning-based and multimodal models have become key strategies, enabling accurate prediction of mutation effects and variant performance.
The AlphaMissense model [63] predicts the pathogenicity of missense mutations by comparing a mutated sequence to the wild-type sequence and evaluating its evolutionary compatibility with a “naturalness” score. It uses a novel unsupervised protein language model to learn amino acid distributions from the surrounding sequence context. AlphaMissense assessed around 216 million single-residue variants across 19,233 human proteins, showing strong agreement with ClinVar annotations and highlighting its broad potential for predicting the molecular impact of variants. ProMEP [87], trained on about 160 million protein structures from the AlphaFold database, uses a multimodal deep-learning approach with approximately 695 million parameters. For proteins of ~1000 amino acids, its predictions are nearly 296 times faster than AlphaMissense. ProMEP is also MSA-free (it requires no multiple sequence alignments), which simplifies mutant library construction; for example, it predicted beneficial mutations with 50–70% accuracy and deleterious mutations with 100% accuracy in the engineering of TnpB nucleases and TadA deaminases. However, ProMEP may underperform when predicting catalytic activity toward non-natural substrates. To overcome this limitation, the μFormer framework [65] was developed. It uses three separate scoring modules to capture mutation effects at the single-residue, motif, and sequence levels. Combined with pretrained protein language models, μFormer accurately models protein fitness landscapes, reducing the dependence on large experimental datasets. Applying μFormer, researchers engineered a β-lactamase to hydrolyze a novel substrate, ultimately identifying variants with up to 2000-fold enhanced activity.
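Conceptually, such a naturalness score can be expressed as a log-likelihood ratio between the mutant and wild-type residues under a sequence model. In the sketch below, `aa_probs_at` is a hypothetical stand-in for a pretrained protein language model’s context-dependent output:

```python
# Conceptual sketch of language-model mutation scoring: a variant is scored
# by the log-likelihood ratio of mutant vs. wild-type amino acid at the
# mutated position, as predicted from sequence context. `aa_probs_at` is a
# hypothetical placeholder for a pretrained protein language model.
import math

def aa_probs_at(sequence: str, pos: int) -> dict[str, float]:
    # Placeholder: a real model returns context-dependent probabilities.
    return {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}

def mutation_score(sequence: str, pos: int, wt: str, mut: str) -> float:
    """log P(mut | context) - log P(wt | context); < 0 suggests deleterious."""
    probs = aa_probs_at(sequence, pos)
    return math.log(probs[mut]) - math.log(probs[wt])

print(mutation_score("MKTAYIAK", pos=3, wt="A", mut="W"))  # 0.0 for the stub
```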
These developments mark a paradigm shift from empirical protein engineering toward an “AI-prediction–experimental-validation” iterative workflow, paving new paths for drug design, enzyme engineering, and genome editing technologies.

5.2. Computational Strategies and Models for Enhancing Enzyme Stability

Protein stability is critical for industrial and biomedical applications of enzymes and antibodies. AI-driven computational methods for stability prediction have evolved from static energy calculations toward dynamic conformational analysis (Figure 4).
Initially, due to limited computing resources and experimental data, researchers adopted the rigid-backbone approximation [88], where backbone structures remain unchanged and only side-chain conformations are optimized using physical energy functions. One example is the FOLDEF algorithm employing the FOLD-X energy function, which rapidly and quantitatively predicts protein and protein-complex stability [89]. However, these rigid methods neglect backbone flexibility, limiting their predictive accuracy.
Rosetta incorporates backbone flexibility into energy calculations, substantially improving stability predictions [90]. For instance, in engineering thermostable transketolase variants, Rosetta achieved a qualitative prediction accuracy of 65.3%, ultimately yielding mutants with a three-fold increase in half-life at 60 °C and a five-fold increase in specific activity at 65 °C [91]. Recent studies combining Rosetta’s energy calculations with AlphaFold’s structural predictions have further improved prediction accuracy [92]. In a comparative study on an acyltransferase (LovD) and its mutants, combining AlphaFold structures with Rosetta scoring yielded stability predictions more consistent with experimental results than predictions based solely on crystal structures.
The SRS2020 model [93] incorporates multiple machine learning techniques—such as kernel ridge regression (KRR), support vector regression (SVR), and gradient-boosted trees (GBT)—to adjust the weighting of Rosetta’s scoring components. Trained on the SKEMPI 2.0 database, which contains experimentally measured ΔΔG values for a wide range of protein–protein interfaces, the GBT variant showed significantly improved accuracy for predicting interface ΔΔG (correlation of 0.75), outperforming traditional methods such as FoldX (correlation of 0.34).
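The reweighting idea can be sketched as a regression of experimental ΔΔG values onto per-mutation energy-score components; everything below (the six score terms, weights, and labels) is synthetic, standing in for SKEMPI-style training data:

```python
# Sketch of the SRS2020-style idea: learn a data-driven reweighting of
# energy-score components by regressing experimental ddG values on them.
# Feature columns and numbers are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))   # e.g., 6 energy score terms per mutation
true_w = np.array([1.5, -0.8, 0.3, 0.0, 0.9, -0.2])
y = X @ true_w + rng.normal(scale=0.3, size=200)   # synthetic ddG labels

gbt = GradientBoostingRegressor(random_state=0).fit(X[:150], y[:150])
print("held-out R^2:", round(gbt.score(X[150:], y[150:]), 2))
```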
Equivariant models are also gaining traction in stability prediction. These models maintain geometric consistency under structural transformations, allowing accurate assessment of backbone changes and side-chain compatibility. The equivariant graph neural network ProtLGN [94] effectively extracts local microenvironment information around amino acids. Using ProtLGN, researchers computationally predicted beneficial mutations for VHH antibodies without experimental data input. Of the top ten predicted variants, nine exhibited higher melting temperatures than the wild-type, and three variants showed simultaneous improvements in binding affinity and thermal stability. This approach avoids conflicts inherent in traditional methods between backbone flexibility and mutation compatibility, enhancing prediction reliability and synergy.
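The key property of such equivariant networks, namely coordinate updates built from relative displacement vectors so that outputs rotate and translate with the inputs, can be illustrated with a toy update in which a fixed distance-based weight stands in for the learned message network:

```python
# Toy sketch of an E(n)-equivariant coordinate update (EGNN-style):
# each node moves along relative displacement vectors weighted by scalar
# messages, so the update rotates/translates with the input structure.
# A fixed distance-based weight replaces the learned message networks.
import numpy as np

def equivariant_update(coords: np.ndarray) -> np.ndarray:
    """coords: (N, 3) atom/residue positions; returns updated positions."""
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3): x_i - x_j
    dist2 = (diff ** 2).sum(-1)                      # rotation-invariant
    weights = np.exp(-dist2)                         # scalar message m_ij
    np.fill_diagonal(weights, 0.0)                   # no self-interaction
    return coords + 0.1 * (weights[..., None] * diff).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # random rotation
# Equivariance check: update(x R^T) == update(x) R^T (up to float error)
print(np.allclose(equivariant_update(x @ R.T), equivariant_update(x) @ R.T))
```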
These advancements demonstrate a paradigm shift from empirical trial-and-error approaches to data-driven rational protein engineering. By integrating physical principles with AI-driven methods, protein engineering is accelerating enzyme evolution, revealing the molecular basis of stability, and promoting more efficient and sustainable biocatalysis.

6. New Challenges of AI in Protein Engineering

AI technologies have significantly advanced protein design and optimization, particularly in structural prediction, functional optimization, and novel protein design. However, despite the progress, several challenges still hinder AI’s widespread application in protein engineering.

6.1. Limits of Structural Prediction on Complex and Dynamic Systems

One of the main issues lies in the accuracy of AI models for complex proteins, membrane proteins, and multi-body complexes. Although deep learning models like AlphaFold have improved structural predictions, limitations remain for large protein complexes (e.g., those with 10–30 chains), where models often struggle with complex subunit assembly and interactions [95]. Moreover, current models provide only static structures and cannot capture function-related conformational changes, such as those occurring during enzyme catalysis or allosteric regulation [96], and there is a lack of models that can handle the dynamic environmental changes affecting protein function [97]. Notably, several emerging approaches [98], such as ensemble/clustered AlphaFold-style predictors [99], flow-matching and diffusion-based generative models [100], MD-guided AF2 variants [101], and multi-state design frameworks [55], are beginning to capture conformational heterogeneity and dynamic states. Their accuracy and robustness, however, remain under active evaluation and will require broader, standardized benchmarks and independent validation [102,103]. While molecular dynamics simulations provide useful dynamic data, they are time-consuming and hard to integrate into AI training processes, and coupling dynamics across multiple scales remains a challenge [104].

6.2. Design–Experiment Gap, Data Limitations, and Model Interpretability

Another challenge is the disconnect between AI predictions and experimental validation. AI models can generate numerous protein sequences, but integrating these designs into experimental pipelines remains difficult, and many AI-generated proteins show suboptimal stability or activity in practice. High-throughput screening methods, moreover, often face experimental and cost limitations, making timely feedback on AI designs hard to obtain. This underscores the importance of acquiring more high-quality datasets for AI model training: at present, structural information comes primarily from cryo-EM and X-ray crystallography, yet these techniques provide limited time-resolved data on conformational dynamics, particularly for rare or poorly characterized proteins [105]. A further challenge lies in the limited interpretability of AI models. These systems often operate as “black boxes,” making it hard to decipher the rationale behind their outputs [106]. This limits researchers’ trust in AI-designed proteins [107] and hinders further refinement of the designs; for complex protein design, scientists prefer methods that offer more interpretability and validation [108,109]. Improving the interpretability of AI models and developing more transparent frameworks will therefore be crucial for future research [110].

6.3. Benchmarks and Fair Comparison

Despite many eye-catching successes, generalization remains fragile. Persistent problems include selective reporting and single-protein anecdotes, incomplete or outdated benchmarks, inconsistent evaluation metrics, and data leakage (e.g., homolog overlap, implicit template reuse, time-forward contamination) [111,112]. These are often compounded by unblinded, retrospective tuning, sparse and unrepresentative test coverage, weak or absent uncertainty calibration, and limited release of code, seeds, and filters that would enable reproduction [113]. To address these issues, evaluations should be anchored to complementary, standardized, and leakage-audited resources. In practice, researchers have assembled standardized benchmark suites—FLIP (fitness-landscape prediction) [114], ProteinGym with its Design leaderboard (variant effects and sequence design) [49], PDB-Struct (refoldability) [115], PDBench (fixed-backbone sequence recovery/perplexity) [116], and a CASP-style, blinded Protein Design Benchmarking Challenge (prospective structure and function) [117]. Fair use of these resources requires comparison to strong modern baselines under clustered sequence-identity thresholds and time-aware holdouts, with explicit exclusion rules to prevent train–test contamination and, where feasible, preregistered protocols to reduce hindsight bias. Looking ahead, priorities include expanding benchmarks beyond model organisms to membrane and multi-component systems; adding dynamic/stateful tasks and condition shifts; auditing leakage with standardized tools; and instituting prospective, blinded rounds with embargoed test sets. Together, these steps would shift the field from anecdotal wins toward robust, reproducible, and generalizable progress [111].
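One of these safeguards, clustered splitting to prevent homolog leakage, can be sketched as greedy clustering at a sequence-identity threshold with whole clusters assigned to either train or test; the naive identity function below is a stand-in for alignment tools such as MMseqs2 or CD-HIT:

```python
# Sketch of a leakage-aware split: greedily cluster sequences at an
# identity threshold (single linkage), then assign whole clusters to
# train or test so no near-duplicate pairs straddle the split. The naive
# identity function stands in for tools like MMseqs2 or CD-HIT.
def identity(a: str, b: str) -> float:
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n   # crude, alignment-free

def cluster(seqs: list[str], thresh: float = 0.3) -> list[list[str]]:
    clusters: list[list[str]] = []
    for s in seqs:
        for c in clusters:
            if any(identity(s, t) >= thresh for t in c):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

seqs = ["MKTAYIAK", "MKTAYIGK", "QQWLPNRE", "QQWLPNRD", "ACDEFGHI"]
groups = cluster(seqs)
train = [s for c in groups[:-1] for s in c]   # hold out the last cluster
test = groups[-1]
print(train, test)
```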

7. Conclusions

AI is rapidly transforming protein engineering. AlphaFold has brought protein structure prediction close to experimental accuracy. Generative models, such as language and diffusion frameworks, enable fast exploration of sequence space and have successfully designed enzymes with high activity and stability. However, current models still face limitations. They struggle to capture conformational changes during catalysis and cannot fully describe large, complex assemblies. Future improvements will require integrating spatiotemporal data from molecular dynamics, single-molecule spectroscopy, protein NMR, and cryo-EM to enable dynamic structure–function prediction. To enhance reliability, incorporating physical constraints and interpretable modules—such as attention visualization—into AI models can improve transparency and reduce black-box risks. Meanwhile, coupling automated cloning, expression, and screening platforms with active learning will create a closed-loop system for rapid design and validation [118,119]. This integration will shorten R&D cycles and provide real-time experimental feedback. Looking forward, AI is expected to move beyond single-protein design. It holds promise for reprogramming entire metabolic pathways and engineering synthetic organelles, offering new strategies for sustainable biomanufacturing, materials development, and carbon recycling.

Author Contributions

Writing—original draft preparation, S.J.; investigation and writing, Q.W., G.F., D.L. and F.W.; review and editing, L.D. and K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Buller, R.; Lutz, S.; Kazlauskas, R.; Snajdrova, R.; Moore, J.; Bornscheuer, U. From nature to industry: Harnessing enzymes for biocatalysis. Science 2023, 382, 8615. [Google Scholar] [CrossRef] [PubMed]
  2. De Santis, P.; Meyer, L.-E.; Kara, S. The rise of continuous flow biocatalysis–fundamentals, very recent developments and future perspectives. React. Chem. Eng. 2020, 5, 2155–2184. [Google Scholar] [CrossRef]
  3. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  4. Landwehr, G.M.; Bogart, J.W.; Magalhaes, C.; Hammarlund, E.G.; Karim, A.S.; Jewett, M.C. Accelerated enzyme engineering by machine-learning guided cell-free expression. Nat. Commun. 2025, 16, 865. [Google Scholar] [CrossRef] [PubMed]
  5. Shen, L.; Yang, W. Molecular dynamics simulations with quantum mechanics/molecular mechanics and adaptive neural networks. J. Chem. Theory Comput. 2018, 14, 1442–1455. [Google Scholar] [CrossRef]
  6. Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef]
  7. Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A survey of deep learning and its applications: A new paradigm to machine learning. Arch. Comput. Methods Eng. 2020, 27, 1071–1092. [Google Scholar] [CrossRef]
  8. Budach, L.; Feuerpfeil, M.; Ihde, N.; Nathansen, A.; Noack, N.; Patzlaff, H.; Naumann, F.; Harmouch, H. The effects of data quality on machine learning performance. arXiv 2022, arXiv:2207.14529. [Google Scholar] [CrossRef]
  9. Eraslan, G.; Avsec, Ž.; Gagneur, J.; Theis, F.J. Deep learning: New computational modelling techniques for genomics. Nat. Rev. Genet. 2019, 20, 389–403. [Google Scholar] [CrossRef]
  10. Kuhlman, B.; Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019, 20, 681–697. [Google Scholar] [CrossRef]
  11. Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386. [Google Scholar] [CrossRef]
  12. Pethe, M.A.; Rubenstein, A.B.; Khare, S.D. Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations. Proc. Natl. Acad. Sci. USA 2019, 116, 168–176. [Google Scholar] [CrossRef]
  13. Zhou, P.; Wen, L.; Lin, J.; Mei, L.; Liu, Q.; Shang, S.; Li, J.; Shu, J. Integrated unsupervised–supervised modeling and prediction of protein–peptide affinities at structural level. Brief. Bioinform. 2022, 23, bbac097. [Google Scholar] [CrossRef]
  14. Kim, H.R.; Ji, H.; Kim, G.B.; Lee, S.Y. Enzyme functional classification using artificial intelligence. Trends Biotechnol. 2025. [Google Scholar] [CrossRef] [PubMed]
  15. Yang, X.; Wang, Y.; Lin, Y.; Zhang, M.; Liu, O.; Shuai, J.; Zhao, Q. A Multi-Task Self-Supervised Strategy for Predicting Molecular Properties and FGFR1 Inhibitors. Adv. Sci. 2025, 12, 2412987. [Google Scholar] [CrossRef] [PubMed]
  16. Harshvardhan, G.; Gourisaria, M.K.; Pandey, M.; Rautaray, S.S. A comprehensive survey and analysis of generative models in machine learning. Comput. Sci. Rev. 2020, 38, 100285. [Google Scholar] [CrossRef]
  17. Ertelt, M.; Moretti, R.; Meiler, J.; Schoeder, C.T. Self-supervised machine learning methods for protein design improve sampling but not the identification of high-fitness variants. Sci. Adv. 2025, 11, eadr7338. [Google Scholar] [CrossRef]
  18. Fang, X.; Huang, J.; Zhang, R.; Wang, F.; Zhang, Q.; Li, G.; Yan, J.; Zhang, H.; Yan, Y.; Xu, L. Convolution neural network-based prediction of protein thermostability. J. Chem. Inf. Model. 2019, 59, 4833–4843. [Google Scholar] [CrossRef]
  19. Pfeiffenberger, E.; Bates, P.A. Predicting improved protein conformations with a temporal deep recurrent neural network. PLoS ONE 2018, 13, e0202652. [Google Scholar] [CrossRef] [PubMed]
  20. Lin, E.; Lin, C.-H.; Lane, H.-Y. De novo peptide and protein design using generative adversarial networks: An update. J. Chem. Inf. Model. 2022, 62, 761–774. [Google Scholar] [CrossRef]
  21. Gong, Y.; Liu, G.; Xue, Y.; Li, R.; Meng, L. A survey on dataset quality in machine learning. Inf. Softw. Technol. 2023, 162, 107268. [Google Scholar] [CrossRef]
  22. The UniProt Consortium. UniProt: The Universal protein knowledgebase in 2025. Nucleic Acids Res. 2025, 53, D609–D617. [CrossRef]
  23. Burley, S.K.; Berman, H.M.; Kleywegt, G.J.; Markley, J.L.; Nakamura, H.; Velankar, S. Protein Data Bank (PDB): The single global macromolecular structure archive. Protein Crystallogr. Methods Protoc. 2017, 1607, 627–641. [Google Scholar]
  24. Zhu, J.-J.; Yang, M.; Ren, Z.J. Machine learning in environmental research: Common pitfalls and best practices. Environ. Sci. Technol. 2023, 57, 17671–17689. [Google Scholar] [CrossRef]
  25. Whang, S.E.; Roh, Y.; Song, H.; Lee, J.-G. Data collection and quality challenges in deep learning: A data-centric ai perspective. VLDB J. 2023, 32, 791–813. [Google Scholar] [CrossRef]
  26. Aksu, G.; Güzeller, C.O.; Eser, M.T. The effect of the normalization method used in different sample sizes on the success of artificial neural network model. Int. J. Assess. Tools Educ. 2019, 6, 170–192. [Google Scholar] [CrossRef]
  27. Stock, M.; Van Criekinge, W.; Boeckaerts, D.; Taelman, S.; Van Haeverbeke, M.; Dewulf, P.; De Baets, B. Hyperdimensional computing: A fast, robust, and interpretable paradigm for biological data. PLoS Comput. Biol. 2024, 20, e1012426. [Google Scholar] [CrossRef]
  28. Anowar, F.; Sadaoui, S.; Selim, B. Conceptual and empirical comparison of dimensionality reduction algorithms (pca, kpca, lda, mds, svd, lle, isomap, le, ica, t-sne). Comput. Sci. Rev. 2021, 40, 100378. [Google Scholar] [CrossRef]
  29. Li, F.; Yuan, L.; Lu, H.; Li, G.; Chen, Y.; Engqvist, M.K.; Kerkhoven, E.J.; Nielsen, J. Deep learning-based k cat prediction enables improved enzyme-constrained model reconstruction. Nat. Catal. 2022, 5, 662–672. [Google Scholar] [CrossRef]
  30. Venanzi, N.A.E.; Basciu, A.; Vargiu, A.V.; Kiparissides, A.; Dalby, P.A.; Dikicioglu, D. Machine learning integrating protein structure, sequence, and dynamics to predict the enzyme activity of bovine enterokinase variants. J. Chem. Inf. Model. 2024, 64, 2681–2694. [Google Scholar] [CrossRef] [PubMed]
  31. Agarwal, V.; McShan, A.C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950–959. [Google Scholar] [CrossRef]
  32. AlQuraishi, M.; Sorger, P.K. Differentiable biology: Using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat. Methods 2021, 18, 1169–1180. [Google Scholar] [CrossRef]
  33. Gao, W.; Mahajan, S.P.; Sulam, J.; Gray, J.J. Deep learning in protein structural modeling and design. Patterns 2020, 1, 100142. [Google Scholar] [CrossRef]
  34. Le, N.Q.K. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023, 23, 2300011. [Google Scholar] [CrossRef]
  35. Chandra, A.; Tünnermann, L.; Löfstedt, T.; Gratz, R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023, 12, e82819. [Google Scholar] [CrossRef]
  36. Zhang, S.; Fan, R.; Liu, Y.; Chen, S.; Liu, Q.; Zeng, W. Applications of transformer-based language models in bioinformatics: A survey. Bioinform. Adv. 2023, 3, vbad001. [Google Scholar] [CrossRef] [PubMed]
  37. Ling, X.; Li, Z.; Wang, Y.; You, Z. Transformer in Protein: A Survey. arXiv 2025, arXiv:2505.20098. [Google Scholar] [CrossRef]
  38. Rives, A.; Meier, J.; Sercu, T.; Goyal, S.; Lin, Z.; Liu, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021, 118, e2016239118. [Google Scholar] [CrossRef] [PubMed]
  39. Raad, J.; Bugnon, L.A.; Milone, D.H.; Stegmayer, G. miRe2e: A full end-to-end deep model based on transformers for prediction of pre-miRNAs. Bioinformatics 2021, 38, 1191–1197. [Google Scholar] [CrossRef] [PubMed]
  40. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  41. Zhang, M.; Qamar, M.; Kang, T.; Jung, Y.; Zhang, C.; Bae, S.-H.; Zhang, C. A survey on graph diffusion models: Generative ai in science for molecule, protein and material. arXiv 2023, arXiv:2304.01565. [Google Scholar] [CrossRef]
  42. Wu, K.E.; Yang, K.K.; van den Berg, R.; Alamdari, S.; Zou, J.Y.; Lu, A.X.; Amini, A.P. Protein structure generation via folding diffusion. Nat. Commun. 2024, 15, 1059. [Google Scholar] [CrossRef] [PubMed]
  43. Lu, C.; Lubin, J.H.; Sarma, V.V.; Stentz, S.Z.; Wang, G.; Wang, S.; Khare, S.D. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc. Natl. Acad. Sci. USA 2023, 120, e2303590120. [Google Scholar] [CrossRef]
  44. Sikander, R.; Wang, Y.; Ghulam, A.; Wu, X. Identification of enzymes-specific protein domain based on DDE, and convolutional neural network. Front. Genet. 2021, 12, 759384. [Google Scholar] [CrossRef] [PubMed]
  45. Nosouhian, S.; Nosouhian, F.; Khoshouei, A.K. A Review of Recurrent Neural Network Architecture for Sequence Learning: Comparison Between LSTM and GRU. 2021. Available online: https://www.preprints.org/manuscript/202107.0252/v1 (accessed on 14 July 2025).
  46. Zhou, J.; Huang, M. Navigating the landscape of enzyme design: From molecular simulations to machine learning. Chem. Soc. Rev. 2024, 53, 8202–8239. [Google Scholar] [CrossRef] [PubMed]
  47. Lu, W.; Zhang, J.; Huang, W.; Zhang, Z.; Jia, X.; Wang, Z.; Shi, L.; Li, C.; Wolynes, P.G.; Zheng, S. DynamicBind: Predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat. Commun. 2024, 15, 1071. [Google Scholar] [CrossRef]
  48. Cui, X.-C.; Zheng, Y.; Liu, Y.; Yuchi, Z.; Yuan, Y.-J. AI-driven de novo enzyme design: Strategies, applications, and future prospects. Biotechnol. Adv. 2025, 82, 108603. [Google Scholar] [CrossRef]
  49. Notin, P.; Kollasch, A.; Ritter, D.; Van Niekerk, L.; Paul, S.; Spinner, H.; Rollins, N.; Shaw, A.; Orenbuch, R.; Weitzman, R. Proteingym: Large-scale benchmarks for protein fitness prediction and design. Adv. Neural Inf. Process. Syst. 2023, 36, 64331–64379. [Google Scholar]
  50. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
  51. Du, Z.; Su, H.; Wang, W.; Ye, L.; Wei, H.; Peng, Z.; Anishchenko, I.; Baker, D.; Yang, J. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 2021, 16, 5634–5651. [Google Scholar] [CrossRef]
  52. Baek, M.; McHugh, R.; Anishchenko, I.; Jiang, H.; Baker, D.; DiMaio, F. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 2024, 21, 117–121. [Google Scholar] [CrossRef] [PubMed]
  53. Goverde, C.A.; Pacesa, M.; Goldbach, N.; Dornfeld, L.J.; Balbi, P.E.; Georgeon, S.; Rosset, S.; Kapoor, S.; Choudhury, J.; Dauparas, J. Computational design of soluble and functional membrane protein analogues. Nature 2024, 631, 449–458. [Google Scholar] [CrossRef]
Figure 1. Advances of AI in protein engineering.
Figure 2. Algorithm architectures applied in protein structure modeling. (A) Transformer models capture interactions among amino acids via self-attention mechanisms; (B) diffusion models progressively optimize amino acid sequences or 3D structures through a noise-driven generative process; (C) convolutional neural networks (CNNs) efficiently extract and hierarchically model local correlations in protein sequences and structures.
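To make panel (A) concrete, the sketch below implements single-head scaled dot-product self-attention over a toy residue embedding. The amino acid alphabet, embedding width, and randomly initialized weights are illustrative assumptions, not parameters of any published model.

```python
# Minimal sketch of self-attention over amino acid embeddings (Figure 2A).
# All sizes and weights are illustrative; no trained model is implied.
import torch
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
D_MODEL = 32  # illustrative embedding width

def embed(seq: str) -> torch.Tensor:
    """Map each residue to an embedding vector (randomly initialized here)."""
    table = torch.nn.Embedding(len(AMINO_ACIDS), D_MODEL)
    idx = torch.tensor([AMINO_ACIDS.index(a) for a in seq])
    return table(idx)  # shape: (L, D_MODEL)

def self_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head scaled dot-product attention across residue positions."""
    Wq, Wk, Wv = (torch.nn.Linear(D_MODEL, D_MODEL) for _ in range(3))
    q, k, v = Wq(x), Wk(x), Wv(x)
    scores = q @ k.T / D_MODEL ** 0.5    # pairwise residue-residue affinities
    weights = F.softmax(scores, dim=-1)  # row i: how much position i attends to each j
    return weights @ v                   # context-mixed residue representations

x = embed("MKTAYIAKQR")
print(self_attention(x).shape)  # torch.Size([10, 32])
```

The attention-weight matrix is what lets such models represent long-range residue–residue couplings that purely local architectures such as CNNs (panel C) capture only through stacked receptive fields.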
Figure 3. AI-guided protein engineering. (A) Structure-based design: optimizes protein function by using AI-predicted structures as a foundation, applying a structure-to-function strategy to refine binding and active sites. (B) Sequence-based design: enhances protein function by optimizing the amino acid sequence directly. (C) De novo design: generates entirely new proteins from functional requirements, using deep learning models to propose both sequence and structure.
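As a minimal illustration of the sequence-based strategy in panel (B), the sketch below enumerates all single-point mutants of a sequence and ranks them with a plug-in scoring function. The stand-in scorer and example sequence are assumptions for demonstration; in practice the scorer would be a trained fitness or protein language model.

```python
# One in-silico round of sequence-based design (Figure 3B): enumerate
# single substitutions, score each variant, keep the best candidates.
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score_sequence(seq: str) -> float:
    # Stand-in scorer (hypothetical): rewards hydrophobic composition.
    # Replace with a trained model's likelihood or fitness prediction.
    return sum(seq.count(a) for a in "ILVFM") / len(seq)

def single_mutants(seq: str):
    """Yield (mutation_label, mutant_sequence) for every single substitution."""
    for pos, aa in product(range(len(seq)), AMINO_ACIDS):
        if seq[pos] != aa:
            yield f"{seq[pos]}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]

def top_variants(seq: str, n: int = 5):
    """Score all single mutants of `seq` and return the n best."""
    ranked = sorted(single_mutants(seq),
                    key=lambda mv: score_sequence(mv[1]), reverse=True)
    return ranked[:n]

print(top_variants("MKTAYIAKQR"))
```

Swapping the scorer is the only change needed to move from this toy ranking to model-guided variant prioritization, which is what makes the sequence-based route so readily automated.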
Figure 4. AI-driven enzyme mutation prediction and stability optimization strategies.
Table 1. Commonly used databases in protein engineering.

Database | Data Type | Data Size
UniProt | Protein function annotations, domain and evolutionary information | More than 240 million sequences
BioLiP2 | Protein–ligand interactions | More than 900,000 entries
BRENDA | EC numbers, kinetic parameters (kcat/Km), and mutation effect data | More than 8,600 enzymes
AlphaFold DB | Predicted protein structures | More than 210 million predicted structures
PDB | Experimentally determined protein structures | More than 230,000 protein structures
ProTherm | Protein thermal stability parameters | More than 7,000 mutation records
FireProt | Thermodynamic parameters of protein mutants | More than 13,000 mutation entries
M-CSA | Enzyme catalytic mechanisms and sites | More than 650 detailed mechanisms
SoluProtMut | Protein solubility data for mutants | More than 17,000 mutation records
PhosphoSitePlus | Relationships between phosphorylation/ubiquitination sites and function | More than 58,000 proteins
SCOP2 | Classification of protein structures | More than 860,000 entries
PROSITE | Protein domains, families, and functional signatures | More than 1,900 entries
D3DistalMutation | Effects of distal (non-active-site) mutations on enzyme activity | More than 7,000 mutations
STITCH | Protein–chemical interaction networks and metabolic pathways | More than 70,000 compounds
ProtaBank | Repository of protein engineering sequence–function data (mutational scans, kinetics, stability, DMS) | More than 1.8 million unique variants and 7.7 million data points
Protein Design Archive | Manually curated database of designed protein sequences and structures | More than 1,500 designed protein structures
SKEMPI 2.0 | Experimentally measured ΔΔG values for protein–protein interfaces | More than 20,000 variants
MaveDB | Multiplexed assays of variant effect (MAVE/DMS) with variant-effect measurements and metadata | Millions of measurements across thousands of datasets
FireProtDB | Manually curated protein stability data for single-point mutants (ΔΔG, Tm) with experimental context | Tens of thousands of mutations
ThermoMutDB | Thermodynamic parameters for missense mutations (ΔG, ΔΔG, Tm) with experimental details | More than 10,000 experimental entries
BindingDB | Protein–small-molecule binding affinities (Kd/Ki/IC50) with experimental conditions | Millions of binding measurements
PDBbind | Protein–ligand and protein–protein complexes annotated with measured binding affinities | Tens of thousands of complexes
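Most of these resources expose public programmatic interfaces, which is what allows them to feed model training and design pipelines at scale. The sketch below is a minimal example, assuming network access and the third-party requests package, of retrieving a UniProtKB entry and a PDB structure; the endpoint URLs follow the two services' public REST interfaces, and the accession codes are arbitrary illustrations.

```python
# Minimal sketch of programmatic access to two Table 1 resources.
# Assumes network access; accession codes are arbitrary examples.
import requests

def fetch_uniprot(accession: str) -> dict:
    """Retrieve a UniProtKB entry (sequence plus annotations) as JSON."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_pdb(pdb_id: str) -> str:
    """Retrieve an experimentally determined structure in PDB format."""
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.text

entry = fetch_uniprot("P00698")          # hen egg-white lysozyme
print(entry["sequence"]["length"])       # sequence length from the JSON record
structure = fetch_pdb("1LYZ")
print(structure.splitlines()[0])         # HEADER line of the PDB file
```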