Review

Advancing PROTAC Discovery Through Artificial Intelligence: Opportunities, Challenges, and Future Directions

1 College of Pharmacy, Keimyung University, Daegu 42601, Republic of Korea
2 Department of Biomedical Informatics, Korea University College of Medicine, Seoul 02708, Republic of Korea
* Author to whom correspondence should be addressed.
Pharmaceuticals 2025, 18(12), 1793; https://doi.org/10.3390/ph18121793
Submission received: 19 October 2025 / Revised: 7 November 2025 / Accepted: 10 November 2025 / Published: 25 November 2025
(This article belongs to the Special Issue Computer-Aided Drug Design and Drug Discovery, 2nd Edition)

Abstract

Proteolysis Targeting Chimeras (PROTACs) represent a transformative modality in drug discovery, enabling the selective degradation of disease-relevant proteins through the ubiquitin proteasome system. Despite their therapeutic promise, the rational design of PROTACs remains a complex and resource-intensive process, involving multiple parameters such as target and ligase compatibility, ternary complex formation, linker optimization, and degradation efficiency. Recent advances in artificial intelligence (AI) have provided new strategies to address these obstacles, ranging from structure-based modeling of ternary complexes to degradability prediction, generative linker design, and pharmacokinetic property estimation. This review aims to explore how AI can be leveraged directly or indirectly in the PROTAC development pipeline. First, we analyze existing applications of AI, such as ternary complex structure prediction, degradability prediction, linker design, and ADME prediction. We further discuss how other approaches from the related fields may be adapted to address the challenges of PROTAC discovery. Lastly, we discuss challenges that current AI models face, such as limited data, poor interpretability, and low generalizability. Taken together, overcoming these barriers will enable AI-driven strategies to accelerate PROTAC discovery and provide a more rational framework for targeted protein degrader development.


1. Introduction

Many disease-associated proteins, such as scaffolding proteins, transcription factors, or regulatory molecules, lack deep, well-defined binding pockets or catalytic active sites, rendering them inaccessible to traditional occupancy-based pharmacology. These proteins often feature broad, shallow surfaces or intrinsically disordered regions that are poorly suited for high-affinity binding by small molecules [1]. Despite their structural intractability, such proteins frequently play essential roles in cancer, neurodegeneration, and immune disorders, and therefore remain high-value therapeutic targets [2]. To overcome these challenges, targeted protein degradation (TPD) has emerged as a transformative therapeutic strategy that expands the druggable proteome by enabling the modulation of proteins previously considered intractable by conventional small-molecule inhibitors [3,4,5].
Among the most prominent TPD modalities, proteolysis targeting chimeras (PROTACs) represent a class of heterobifunctional small molecules that exploit the cell’s endogenous ubiquitin proteasome system (UPS) to selectively degrade intracellular proteins. PROTACs consist of three key components: a ligand for the protein of interest (POI), a ligand for an E3 ubiquitin ligase, and a linker that connects them. Upon simultaneous engagement of the POI and the E3 ligase, the PROTAC facilitates the formation of a ternary complex that brings the two proteins into close proximity, enabling the ligase to ubiquitinate the POI. This triggers proteasomal recognition and degradation of the target protein. Importantly, the PROTAC molecule is not consumed in this process and can engage in multiple rounds of catalysis, a hallmark of event-driven pharmacology that distinguishes PROTACs from classical inhibitors, which require sustained occupancy of the target at stoichiometric levels [6,7] (Figure 1).
Recent clinical advances have demonstrated the therapeutic potential of this modality. As of 2025, at least 30 PROTACs have entered human clinical trials, with most candidates in phase I or II. Notably, ARV-471 (vepdegestrant), targeting the estrogen receptor (ER), has advanced to phase III clinical trials and is currently under New Drug Application (NDA) review, making it the most clinically advanced PROTAC in development. These trials not only validate the pharmacological feasibility of targeted protein degradation in humans but also pave the way for the first potential regulatory approvals in this emerging drug class [6,8,9]. Despite their conceptual promise, the rational design of PROTACs remains an inherently complex and resource-intensive endeavor. Achieving efficient and selective protein degradation requires the careful optimization of multiple interdependent parameters. These include the selection of a degradable target protein, the choice of a compatible E3 ligase, the ability to form a structurally stable ternary complex, the positioning and flexibility of the linker, and favorable biophysical properties such as cell permeability, metabolic stability, and intracellular exposure [10]. Notably, many PROTAC candidates fail to induce degradation not because of insufficient binding affinity, but because of suboptimal ternary complex geometries that lack cooperativity or because of inadequate cellular pharmacokinetics [11,12]. These design bottlenecks make traditional trial-and-error approaches time-consuming and costly, ultimately hindering the pace of PROTAC discovery and development.
In recent years, artificial intelligence (AI) has revolutionized various stages of drug discovery, from de novo molecule generation and target prediction to ADME profiling and structure-based design [13]. However, its application to PROTAC discovery remains nascent. Early studies have demonstrated the potential of machine learning and deep learning in addressing specific subproblems for PROTAC discovery: predicting the likelihood of ternary complex formation, estimating degradability, optimizing linker properties, and modeling compound permeability [14]. As a result, a growing body of research is now exploring how AI-based tools can complement and accelerate PROTAC discovery.
Several excellent reviews have already provided comprehensive overviews of AI applications in general drug discovery and development [15,16,17,18]. Therefore, we do not reiterate those detailed methodologies here. Instead, this review is written from the perspective of how AI specialists can contribute their expertise to PROTAC discovery; as a result, some computational terms are not explained in detail and may be challenging for readers from other disciplines. The central aim of this review is to give an overview of the landscape of AI applications in PROTAC discovery and highlight opportunities for future innovation. We begin by reviewing AI models that have already been applied to PROTACs, including ternary complex modeling, degradation prediction, generative models for linker design, and ADME prediction. We then examine AI models from related fields, such as sequence-to-expression models and flow-based generative models, and identify transferable methodologies that could be adapted for PROTAC discovery pipelines. Finally, we discuss major challenges in PROTAC discovery, such as the scarcity of high-quality data, especially experimentally validated ternary complex structures and degradation profiles. In addition, model interpretability and generalizability remain significant concerns, particularly when applying AI tools across diverse target classes and E3 ligase systems. To overcome these barriers, integrated strategies combining physics-informed neural networks, biological validation, and agentic AI frameworks are likely to play a central role.

2. PROTAC Discovery Pipeline: Opportunities for AI Integration

The discovery of PROTACs follows a multi-step pipeline that includes target identification, PROTAC design, synthesis, and biological evaluation (Figure 2). This complex workflow presents numerous technical challenges, from selecting appropriate protein targets and E3 ligases to optimizing the ternary complex formation and pharmacokinetic properties of PROTACs. Recent advances in AI offer promising solutions to several bottlenecks throughout this pipeline, enabling more efficient and rational PROTAC discovery.

2.1. Overview of the PROTAC Discovery Workflow

The PROTAC discovery process begins with target selection, identifying disease-relevant proteins that are suitable for ubiquitin–proteasome system (UPS)–mediated degradation, as first demonstrated by Sakamoto et al. in their seminal work on PROTAC technology [20]. Following this, E3 ligase selection is a critical step, as the ligase must form a productive ternary complex with both the target protein and the PROTAC molecule. Burslem and Crews provide a comprehensive overview of PROTAC design principles, including iterative linker optimization to maximize ternary complex stability and degradation efficiency [21]. Despite advances in our mechanistic understanding, PROTAC discovery has traditionally relied heavily on a trial-and-error approach. The iterative cycle of synthesis, biological evaluation, and optimization remains resource- and time-intensive, reflecting the complexities inherent in modulating ternary complex formation, target engagement, and cellular permeability. As such, initial PROTAC development is challenged by limited predictive tools and incomplete structural or biophysical data, which often necessitate broad chemical exploration to identify active PROTACs. After synthesis, PROTAC candidates undergo rigorous in vitro and in vivo validation, assessing critical parameters including target degradation potency, selectivity, cellular permeability, and metabolic stability [22]. These sequential steps constitute a feedback-driven workflow, where experimental results inform subsequent design iterations in an ongoing optimization loop.

2.2. Critical Bottlenecks: From Target Selection to PROTAC Optimization

Several bottlenecks continue to slow down the PROTAC pipeline:
Target and E3 Ligase Pairing: Although more than 600 E3 ligases are encoded in the human genome, only a small number, most notably VHL and CRBN, are commonly used in PROTAC development. This is largely due to limited structural information, lack of biochemical characterization, and the absence of well-validated small molecule binders [23]. Identifying the most suitable combinations of target proteins and E3 ligases remains a major challenge. There have been efforts to explore alternative E3 ligases beyond VHL and CRBN, such as KEAP1, DCAF16, MDM2, and RNF114. However, these cases are still relatively rare and highly context-dependent. One of the main obstacles is the difficulty in finding small molecules that can bind to these ligases with sufficient potency and selectivity. In addition, E3 ligases show distinct expression patterns depending on the tissues and cell types, which could enable selective degradation in specific biological contexts but also increase the complexity of rational design. Therefore, expanding the usable set of E3 ligases will require continued efforts in ligand discovery, structural analysis, and data-driven profiling of ligase expression and function.
Ternary Complex Formation: The cooperativity and stability of the ternary complex critically influence degradation efficacy. Roy et al. demonstrated the importance of ternary complex structural characterization for rational PROTAC design [24]. Moreover, numerous studies have confirmed that stable ternary complexes substantially contribute to degradation efficiency [19]. Consequently, accurate prediction of ternary complex formation is becoming increasingly important for effective PROTAC development.
Linker and ADMET Optimization: The linker in a PROTAC molecule plays a central role not only in facilitating ternary complex formation between the target protein and E3 ligase but also in modulating critical pharmacokinetic and physicochemical properties. While early-stage PROTAC discovery often employs simple PEG (polyethylene glycol) or alkyl linkers to explore structure–activity relationships [25], it has been consistently shown that even subtle changes in linker length, composition, rigidity, or polarity can dramatically affect both degradation efficacy and drug-like behavior. Importantly, the linker is a major determinant of a PROTAC’s absorption, distribution, metabolism, and excretion (ADME) characteristics. PROTAC molecules typically violate several of Lipinski’s rules because of their large molecular weight and high flexibility, which frequently result in poor solubility and membrane permeability [4]. Recent work has highlighted that modifications to linker properties, such as reducing hydrogen bond donors, introducing conformational constraints, or minimizing polar surface area, can lead to significant improvements in passive permeability and oral bioavailability. Indeed, most PROTACs that have progressed to clinical trials, including ARV-110 and ARV-471, do not retain their initial screening linkers but instead utilize linkers that have been extensively optimized through medicinal chemistry to strike a balance between degradation potency and favorable pharmacokinetic profiles. This underscores that linker design is not a secondary consideration but a core aspect of PROTAC development strategy. Effective PROTAC optimization, therefore, requires a comprehensive approach that simultaneously addresses ternary complex geometry and ADME properties through rational linker engineering.
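The drug-likeness point above can be made concrete with a small check of Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The following is a minimal sketch that assumes the descriptors have already been computed elsewhere; the `Descriptors` class and both example molecules are hypothetical and carry illustrative values only, not measurements of any real compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors for one molecule."""
    mol_weight: float      # molecular weight in Da
    logp: float            # calculated octanol-water partition coefficient
    h_bond_donors: int     # number of hydrogen bond donors
    h_bond_acceptors: int  # number of hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria that the molecule violates."""
    checks = {
        "molecular weight > 500 Da": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Hypothetical descriptor values, chosen only to illustrate the check.
small_molecule = Descriptors(mol_weight=350.0, logp=2.5, h_bond_donors=2, h_bond_acceptors=5)
protac_like = Descriptors(mol_weight=950.0, logp=5.5, h_bond_donors=4, h_bond_acceptors=14)

print(lipinski_violations(small_molecule))
print(lipinski_violations(protac_like))
```

In practice the descriptors would come from a cheminformatics toolkit, and a count of violations is only a coarse flag; it does not replace the permeability and solubility measurements discussed in the text.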
Collectively, these challenges highlight that the PROTAC discovery process remains constrained by empirical trial-and-error and incomplete predictive frameworks. Each stage, including target–ligase pairing, ternary complex formation, linker design, and ADME optimization, presents unique computational bottlenecks that limit rational design. These limitations naturally align with areas where artificial intelligence can provide tangible value. In the following section, we therefore examine how emerging AI models directly intervene at each step of this workflow, transforming empirical screening into data-driven, predictive, and generative discovery cycles.

3. Current Applications of AI in PROTAC Discovery

Building upon the workflow and bottlenecks outlined in Section 2, AI has begun to play a transformative role in addressing the critical pain points of PROTAC discovery. Specifically, deep learning and generative models have been developed to (i) predict ternary complex formation with atomic precision, (ii) infer degradability from protein–ligase–compound context, (iii) rationally generate and optimize linkers, and (iv) model pharmacokinetic behavior of large heterobifunctional molecules. These tools collectively mark the transition from descriptive to predictive modeling of degrader design. In this section, we summarize the representative AI approaches specifically developed for PROTACs and highlight their distinctive contributions to the field (Table 1).

3.1. Ternary Complex Prediction

Accurate prediction of ternary complexes is a fundamental requirement in PROTAC research, as productive degradation depends on the formation of stable and cooperative assemblies between the target protein, the E3 ligase, and the PROTAC. However, it remains challenging due to the conformational flexibility of linkers and the cooperative nature of E3–target interactions. The earliest systematic efforts to model PROTAC-induced ternary complexes were demonstrated by Drummond and Williams [42]. Using Monte Carlo conformational sampling combined with docking, they showed that it was possible to recover a small number of near-native poses. However, their framework showed limited accuracy in ranking, highlighting the need for improved sampling and scoring strategies. PRosettaC [43] integrates global docking and Rosetta-based refinement under PROTAC-derived distance constraints, recovering near-native ternary structures for BRD4 and BTK and enabling rationalization of experimental structure–activity relationships. While effective for well-studied systems, its reliance on classical physics-based docking limits accuracy and generalizability across diverse targets.
Recently, deep learning has been leveraged to overcome the limitations of physics-based docking. For example, AlphaFold3 [27] has been applied to ligand-mediated ternary complex prediction, showing that it improves interface prediction accuracy relative to purely docking-based methods. However, AlphaFold3 is primarily a general-purpose protein–ligand predictor, and while it provides structural hypotheses, it was not specifically optimized to capture the cooperativity or degradation potency unique to PROTAC-mediated ternary complexes. By contrast, DeepTernary [26], an SE(3)-equivariant graph neural network developed for end-to-end prediction of ternary complex structures, was trained on over 20,000 experimentally resolved ternary complexes collected from the PDB [44]. Although these ternary complexes are not specific to PROTAC systems, they capture general principles of protein–protein association that are relevant to degrader design, because PROTACs act by inducing new and productive protein–protein interactions between the target and the E3 ligase rather than merely bringing them into spatial proximity. The model takes as input the unbound structures of the target protein and E3 ligase, along with the molecular graph of the degrader, and predicts the 3D ternary configuration through inter- and intra-graph attention–based message passing and an attention-based decoder (Figure 3A). Its SE(3)-equivariant design ensures that geometric relationships are preserved under rotation and translation, enabling physically consistent learning of spatial interactions. When benchmarked on curated ternary datasets, DeepTernary achieved substantially lower interface RMSD and higher DockQ scores than AlphaFold3, indicating superior recovery of native ternary geometries. More importantly, the model reveals a quantitative correlation between predicted buried surface area (BSA) and degradation potencies estimated from $K_{\mathrm{LPT}}$ constants (Figure 3B). 
This indicates that the geometric complementarity learned by the model reflects real degradability trends, supporting its potential use in high-throughput in silico screening and prioritization of degrader candidates. However, the generalizability of these predictions across diverse ligase–target pairs remains unproven, as most datasets are skewed toward well-studied systems like BRD4–VHL. Overfitting to structurally similar training complexes is also a potential risk, given the small number of high-resolution ternary complex structures available.
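To illustrate what "preserved under rotation and translation" means, the sketch below applies a rigid motion to a few random points and confirms that all pairwise distances are unchanged. It is not an implementation of DeepTernary or of any equivariant network; the point set is random, and the rotation is restricted to the z-axis, which is only a special case of a general SE(3) transformation.

```python
import math
import random


def pairwise_distances(coords):
    """Return all pairwise Euclidean distances as a flat list."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def rigid_transform(coords, angle, shift):
    """Rotate points about the z-axis by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


random.seed(0)
# Random placeholder "atoms"; not taken from any real structure.
atoms = [(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(6)]
moved = rigid_transform(atoms, angle=1.1, shift=(3.0, -2.0, 7.5))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(f"largest change in any pairwise distance: {max_diff:.2e}")
```

An equivariant model builds this property into its layers, so that moving the input complex rigidly moves the predicted coordinates in the same way instead of changing the prediction.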

3.2. Degradability Prediction

While ternary modeling captures structural feasibility, degradation prediction addresses a distinct question: whether a designed PROTAC will actually trigger efficient target ubiquitination and proteasomal clearance inside cells. This task goes beyond structural compatibility, requiring integration of molecular features, binding context, and cellular factors. It is worth noting that a target protein’s native homeostasis, including its intrinsic turnover rate, stability mechanisms, and proteostatic regulation, plays a decisive role in degradability. Proteins that are strongly buffered by cellular quality control systems or maintained through tightly regulated stability networks often resist PROTAC-induced degradation even when ternary complex formation is structurally favorable. To facilitate such studies, PROTAC-DB [45] provides a comprehensive resource of over 6000 PROTACs, some of which are annotated with $\mathrm{DC}_{50}$ and $D_{\max}$ values, enabling quantitative modeling of degradation outcomes. Using this resource, DeepPROTACs [28] introduced one of the first deep learning frameworks for degradation prediction. The model combines molecular graph embeddings of PROTACs with protein- and E3-specific sequence features to predict whether a compound induces high or low degradation (Figure 4A). When benchmarked on multiple targets, DeepPROTACs achieved a classification accuracy of 78%, and importantly, the authors validated predictions experimentally, correctly identifying 11 out of 16 PROTACs as active degraders (Figure 4B), which supports the model’s ability to prioritize active compounds. Subsequent studies by Ribes et al. extended this approach by incorporating pre-trained embeddings with semi-supervised learning to improve degradation activity prediction under limited labeled data, while also evaluating model generalization to unseen compounds and protein targets [29]. 
Complementing these molecule-centric models, the MAPD framework [30] predicts the intrinsic degradability of proteins independent of specific PROTACs. It models protein-intrinsic determinants such as ubiquitination motifs and solvent-exposed lysines, achieving an AUROC of around 0.78 for kinase degradability and showing potential applicability to non-kinase targets.
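Because several of the models in this section are compared by AUROC, a short sketch of how that metric is computed may help. AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores below are hypothetical and are not drawn from MAPD or any other cited model.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = degradable) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of roughly 0.78 and 0.88 should be read. The pairwise loop is quadratic in the number of examples and is meant for clarity, not for large datasets.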
More recent methods focus on incorporating 3D geometry and interpretability. DegradeMaster [31] employs an E(3)-equivariant graph neural network that integrates spatial molecular information with a semi-supervised learning scheme, allowing it to leverage unlabeled PROTAC data from PROTAC-DB (Figure 4C). The model not only reaches high predictive accuracy, with an AUROC of up to 0.88, but also provides attention-based explanations that highlight the atoms most strongly influencing degradation probability. Visualization of these attention maps reveals that atoms in the linker region connecting the warhead and the E3-binding ligand, as well as atoms in the binding regions of the POI and E3 ligase, show higher attention scores, which is consistent with experimentally observed structure–activity trends (Figure 4D). Separately, PrePROTAC [32] applied interpretable machine learning at a genome-wide scale, highlighting previously understudied proteins as potential PROTAC targets, thereby expanding the druggable space. Together, these efforts illustrate how AI can move beyond classification toward biologically meaningful prioritization of substrates. Despite promising results, most models suffer from limited generalizability to unseen targets, particularly non-kinase or non-CRBN substrates. Cell-line specificity and expression context are often underrepresented in training, leading to false positives or negatives in diverse biological settings. In addition to biological diversity, inconsistencies in experimental quantification further affect model reliability. Although standardized metrics such as $\mathrm{DC}_{50}$ and $D_{\max}$ are conceptually defined to measure degradation potency, in practice their determination varies widely across studies in terms of assay format, time point, and cell type. This lack of harmonization makes it difficult to establish uniform ground-truth labels and can inflate apparent model performance. 
As a result, even well-performing models should be interpreted within the context of underlying data uncertainty, emphasizing the need for standardized quantitative benchmarks for PROTAC degradation.
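The two potency metrics discussed above can be related through a simple dose-response curve. The sketch below assumes a Hill-type model in which the fraction of target degraded is Dmax * C^h / (DC50^h + C^h), so that DC50 is the concentration giving half of the maximal degradation Dmax. This is one common convention and a monotonic simplification, not the definition used by any particular study cited here, and the parameter values are hypothetical.

```python
def fraction_degraded(conc, dc50, dmax, hill=1.0):
    """Fraction of target degraded at concentration `conc` under a Hill-type model."""
    if conc < 0 or dc50 <= 0:
        raise ValueError("concentration must be non-negative and DC50 positive")
    return dmax * conc**hill / (dc50**hill + conc**hill)


# Hypothetical parameters: DC50 = 10 nM, Dmax = 0.9 (90% maximal degradation).
dc50_nm, dmax = 10.0, 0.9
for conc_nm in (1.0, 10.0, 100.0, 1000.0):
    print(conc_nm, round(fraction_degraded(conc_nm, dc50_nm, dmax), 3))
```

Under this model a single measurement at one time point and one concentration cannot separate DC50 from Dmax, which is one reason why labels derived from differently designed assays are hard to pool.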

3.3. Linker Design and Optimization

The linker is the chemical bridge that connects the warhead and the E3 ligase ligand in a PROTAC molecule. Its length, flexibility, and geometry critically determine whether the two proteins can form a productive ternary complex. Because experimental linker optimization often requires numerous synthetic iterations, AI–based generative models have become powerful alternatives for rational design.
Graph-based generative models such as DeLinker [33] build linkers atom-by-atom in three-dimensional space. DeLinker adopts a conditional graph variational autoencoder (VAE) architecture that simultaneously models both the chemical topology and the 3D spatial coordinates of atoms. In this framework, each molecular fragment is encoded into a latent vector that captures local chemical environments and geometric constraints, including the distance and angle between exit vectors. During decoding, the model sequentially generates new atoms and bonds conditioned on these geometric features, ensuring that the synthesized linker maintains the correct orientation and avoids steric clashes. This allows DeLinker to grow linkers directly in 3D space while preserving realistic bond lengths and angles (Figure 5A). As shown in the comparison with exhaustive database searches, DeLinker successfully recovered highly similar linkers with a 3D similarity greater than 0.85 to the reference linker, while the database baseline, which randomly sampled linkers from existing molecules, failed to generate comparable results (Figure 5B). This demonstrates the model’s ability to infer plausible 3D connections between fragments. In addition, 3DLinker [34] incorporates E(3)-equivariant latent spaces to have a precise atom-level description of molecule geometry and to ensure generated conformers remain consistent with structural constraints.
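The geometric conditioning described above, the distance and angle between exit vectors, can be computed from four atom positions. The sketch below assumes that each exit vector points from a fragment's anchor atom to its exit (attachment) atom; the function name, this definition, and the coordinates are illustrative assumptions and may differ in detail from the DeLinker implementation.

```python
import math


def exit_vector_geometry(anchor_a, exit_a, anchor_b, exit_b):
    """Return (distance between exit atoms, angle in degrees between the two exit vectors).

    Each exit vector points from a fragment's anchor atom to its exit (attachment) atom.
    """
    vec_a = [e - a for a, e in zip(anchor_a, exit_a)]
    vec_b = [e - a for a, e in zip(anchor_b, exit_b)]
    norm_a = math.hypot(*vec_a)
    norm_b = math.hypot(*vec_b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("anchor and exit atoms must not coincide")
    cos_angle = sum(x * y for x, y in zip(vec_a, vec_b)) / (norm_a * norm_b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.dist(exit_a, exit_b), math.degrees(math.acos(cos_angle))


# Hypothetical coordinates (in angstroms): two fragments whose exit vectors face each other.
distance, angle = exit_vector_geometry(
    anchor_a=(0.0, 0.0, 0.0), exit_a=(1.5, 0.0, 0.0),
    anchor_b=(10.0, 0.0, 0.0), exit_b=(8.5, 0.0, 0.0),
)
print(distance, angle)
```

These two numbers summarize how far apart the fragments sit and how they are oriented relative to each other, which is the information a generated linker must be compatible with.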
Diffusion-based approaches have also emerged, including DiffPROTACs [38], which adapts denoising diffusion to PROTAC generation, learning to generate entire degrader molecules from noise while maintaining structural coherence between the warhead and E3 ligand (Figure 5C). Its O(3)-equivariant graph transformer makes predictions equivariant to rotations and reflections, and fine-tuning on a curated PROTAC dataset achieved a 93.9% chemical validity rate. The spatial structure of the generated linkers closely resembles the crystal structures (Figure 5D). In addition, DAD-PROTAC [39] is a domain-adapted diffusion model that corrects for chemical space mismatch between small molecules and PROTACs via density ratio estimation, enabling efficient fine-tuning and high-quality linker generation under limited PROTAC data.
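As background for the diffusion-based methods above, the sketch below shows the forward (noising) half of a generic denoising diffusion model applied to 3D coordinates: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps drawn from a standard normal. The linear beta schedule, its endpoints, and the coordinates are assumptions chosen for illustration; the specific schedules and equivariant denoisers of DiffPROTACs and DAD-PROTAC are not reproduced here.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps with eps ~ N(0, I)."""
    keep, spread = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [tuple(keep * v + spread * rng.gauss(0.0, 1.0) for v in atom) for atom in coords]


rng = random.Random(0)
linker_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]  # hypothetical coordinates
alpha_bars = alpha_bar_schedule(num_steps=1000)
slightly_noised = noise_coordinates(linker_atoms, alpha_bars[0], rng)
heavily_noised = noise_coordinates(linker_atoms, alpha_bars[-1], rng)
print(alpha_bars[0], alpha_bars[-1])
```

At the first step the coordinates are almost untouched, while at the last step almost no signal remains; a generative model is trained to reverse this process step by step, starting from pure noise.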
Reinforcement learning (RL) has also been widely applied to linker optimization. Link-INVENT [35] combines SMILES-based generative modeling with RL to optimize multiple molecular properties simultaneously, such as solubility, flexibility, and hydrogen-bonding capacity. As training progresses, the agent learns to generate linkers that maximize a user-defined multi-objective score (Figure 5E). The results show a steady improvement in the average objective score across epochs as the model learns to connect two benzene rings under two objectives: minimizing hydrogen bond donors and maintaining a single-ring linker structure (Figure 5F). PROTAC-RL [37] integrates transformer-based linker generation with memory-assisted RL to design viable linkers while optimizing pharmacokinetic profiles even under data-scarce conditions by pre-training on a quasi-dataset followed by fine-tuning on real PROTAC data. Moreover, ShapeLinker [36], an RL method coupled with fast attention-based point cloud alignment, has been introduced to directly optimize linker geometry in three-dimensional space. These innovations collectively highlight a shift from empirical trial-and-error to rational, property-aware linker engineering that can simultaneously account for structural fit and drug-like properties.
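The two-objective example above (few hydrogen bond donors, exactly one ring) can be turned into a single reward that an RL agent maximizes. The sketch below uses a weighted geometric mean of per-objective scores; the component functions, their constants, and the aggregation are assumptions made for illustration and are not the scoring transformations used by Link-INVENT.

```python
import math


def hbd_score(num_donors, tolerated=0):
    """1.0 when the linker has at most `tolerated` H-bond donors, decaying for each extra donor."""
    return 1.0 / (1.0 + max(0, num_donors - tolerated))


def single_ring_score(num_rings):
    """1.0 for exactly one ring; a small non-zero score otherwise so the product never hits 0."""
    return 1.0 if num_rings == 1 else 0.1


def aggregate(scores, weights):
    """Weighted geometric mean of component scores, each assumed to lie in (0, 1]."""
    total = sum(weights)
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)) / total)


def linker_reward(num_donors, num_rings, weights=(1.0, 1.0)):
    """Combine both objectives into one reward in (0, 1]."""
    return aggregate([hbd_score(num_donors), single_ring_score(num_rings)], weights)


# Hypothetical linkers described only by donor and ring counts.
print(linker_reward(num_donors=0, num_rings=1))
print(linker_reward(num_donors=2, num_rings=1))
print(linker_reward(num_donors=2, num_rings=3))
```

A geometric mean penalizes a linker that fails one objective badly more strongly than an arithmetic mean would, which is one reason such aggregations are popular in multi-parameter optimization.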

3.4. ADME Properties Prediction

PROTACs frequently fall outside conventional drug-likeness rules, creating persistent challenges for ADME properties. Early modeling efforts relied on molecular descriptors and classical classifiers such as decision trees, random forests, and support vector machines [41]. Applied to CRBN- and VHL-based PROTACs, these approaches achieved predictive accuracies above 80% for cell permeability, identifying lipophilicity and molecular size as dominant determinants. Although these studies demonstrated promising accuracy, their scope was limited to relatively small and homogeneous datasets focused on permeability. In practice, comprehensive ADME modeling must extend beyond permeability to include solubility, metabolic stability, and enzyme interactions, and this is particularly challenging because PROTAC-specific datasets remain scarce. Addressing this data limitation, Peteani et al. systematically evaluated quantitative structure–property relationship (QSPR) models for predicting ADME-related properties of PROTACs and molecular glues, including permeability, metabolic clearance, cytochrome P450 inhibition, plasma protein binding, and lipophilicity [40]. Their key contribution was to pretrain multi-task models on large-scale small molecule datasets and then fine-tune them on degrader data, effectively leveraging transfer learning to overcome data scarcity. This approach markedly improved predictive performance, achieving misclassification errors below 4% for molecular glues and around 15% for PROTACs. These results indicate that while classical classifiers can achieve strong performance in narrow tasks, transfer learning is critical to scale predictive modeling across the broader and more data-sparse chemical space of PROTACs. Nevertheless, ADME and permeability data for degraders remain highly heterogeneous, as assay conditions, platforms, and cell lines differ substantially across studies. 
As a result, even models showing high predictive accuracy may be constrained by experimental variability, underscoring the need for community-wide efforts to unify and standardize such datasets. Also, it should be noted that the model reported in [40] was trained on proprietary, non-public datasets, which limits the possibility for external validation and comparison. Nevertheless, such industrial datasets have provided valuable internal insights, and continued efforts toward transparent and shareable degrader data will be crucial to ensure reproducibility and wider applicability.
While current AI applications have demonstrated clear value across specific stages of PROTAC design, most remain specialized and operate in isolation. The next frontier lies in integrating these task-specific models into unified, multi-modal frameworks capable of reasoning across sequence, structure, and cellular response. Section 4, therefore, explores emerging and transferable AI paradigms—from protein language models to diffusion and flow-based generators—that could bridge these silos and establish a cohesive foundation for end-to-end, AI-enabled PROTAC discovery.

4. Emerging and Transferable AI Models for PROTAC Discovery

Alongside the rapid progress of PROTAC-specific AI models, recent advances in AI models from related areas of biology and chemistry offer valuable opportunities for PROTAC design. These transferable technologies, ranging from sequence-based foundation models to chemical perturbation response prediction frameworks, could significantly accelerate PROTAC research if adapted appropriately. This section highlights promising methods from related fields and identifies areas where dedicated PROTAC-specific model development is still required (Figure 6).

4.1. Sequence- and Transcriptome-Based E3 Ligase Identification

One of the major bottlenecks in PROTAC discovery is the narrow reliance on a handful of E3 ligases such as CRBN and VHL, despite the human genome encoding more than 600 ligases with potential therapeutic utility. This bias largely reflects the scarcity of well-characterized ligands and the lack of structural and biochemical data for most ligases. Recent advances in AI now offer a means to systematically explore and prioritize E3 ligases by integrating diverse sequence- and transcriptome-based biological signals. At the protein level, protein language models (PLMs) such as ESM-2 [46] and ProtTrans [47], trained on massive sequence corpora, provide rich embeddings that capture structural and functional properties of proteins. When applied to the full human proteome, these embeddings enable unsupervised clustering of E3 ligases with shared ubiquitination domains and prediction of substrate-recognition motifs or binding pockets that have not been experimentally annotated. Such representations can also guide the de novo design of ligase-binding motifs or even entirely new E3 ligases with desired substrate specificity, by incorporating them into generative protein design frameworks. At the genomic level, sequence-to-expression models including ExPecto [48], Enformer [49], Borzoi [50], and AlphaGenome [51] can complement PLM-based predictions by estimating tissue- and cell-type–specific E3 expression directly from genomic DNA. These models learn regulatory sequence grammars that map promoter and enhancer sequences to quantitative expression patterns across many tissues and cell types. Incorporating their predictions allows ranking of E3 ligases not only by biochemical plausibility but also by their predicted availability in disease-relevant cellular contexts. 
Integrating these approaches with ligandability resources such as ELIOT [52], UbiHub [53], and E3Atlas [54] provides a rational pipeline to expand the E3 ligase repertoire, moving beyond empirical selection toward context-aware, sequence-driven discovery of novel ligases suitable for PROTAC design.
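As a minimal sketch of such a prioritization step, assume that each candidate ligase already has a PLM-derived embedding and a predicted expression level in the tissue of interest. The snippet below combines embedding similarity to a reference ligase with normalized expression into a single ranking. The function `rank_ligases`, the equal weighting, the ligase names, and all numbers are hypothetical placeholders and do not come from any of the cited models or databases.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_ligases(embeddings, reference, expression, weight=0.5):
    """Rank candidate ligases by a weighted sum of two toy signals:
    similarity to a reference embedding and normalized predicted expression."""
    max_expr = max(expression.values()) or 1.0
    scores = {}
    for name, emb in embeddings.items():
        similarity = cosine(emb, reference)
        availability = expression[name] / max_expr
        scores[name] = weight * similarity + (1.0 - weight) * availability
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical three-dimensional "embeddings" (real PLM embeddings have
# hundreds to thousands of dimensions) and made-up expression values.
embeddings = {
    "E3_A": [1.0, 0.0, 0.0],
    "E3_B": [0.8, 0.6, 0.0],
    "E3_C": [0.0, 1.0, 0.0],
}
reference = [1.0, 0.0, 0.0]
expression = {"E3_A": 1.0, "E3_B": 10.0, "E3_C": 10.0}

ranking = rank_ligases(embeddings, reference, expression)
print(ranking)
```

In this toy case, a ligase that is moderately similar to the reference but well expressed outranks one that is highly similar but barely expressed, which is the kind of context-aware trade-off described above. In practice the weighting would need to be calibrated against experimental data.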

4.2. Predicting Ligase–Substrate Specificity

In PROTAC design, it is important to predict which E3 ligases can productively engage specific substrates within a given cellular environment [55]. Recent advances in protein–protein interaction (PPI) prediction have provided transferable strategies to model E3–substrate recognition in a more quantitative and spatially resolved manner. For instance, geometric deep learning frameworks such as MaSIF [56] learn interaction fingerprints from patches of the protein molecular surface, exposing structural determinants that could guide ligase selection. Another model, MAPE-PPI [57], introduces a microenvironment-aware protein embedding framework that represents each residue together with its local three-dimensional and sequential neighborhood, enabling PPI prediction that balances accuracy with computational efficiency. Adapting these architectures to E3 ligase systems will require incorporating additional biological context, such as degron accessibility, ubiquitin-transfer geometry, and lysine spatial distribution, to predict productive ubiquitination.

4.3. Flow-Based Generative Modeling

Generative modeling frameworks, originally developed for molecular and protein design, can also be applied to PROTAC discovery. In particular, diffusion models have been shown to generate chemically valid and geometrically consistent molecules, including complex scaffolds and linkers [38,39]. Recently, flow matching models [58] have emerged as an efficient alternative to diffusion models, learning continuous transformations from a base distribution, such as Gaussian noise, to the complex data distribution. This approach can enable faster convergence during training and more stable sampling. These methods have already been adapted for molecule and protein design tasks, enabling efficient generation of property-constrained molecules and 3D conformations [59,60]. In the context of PROTACs, flow matching could be leveraged to condition generation explicitly on ternary pocket geometries or linker exit vectors, while simultaneously enforcing physicochemical constraints such as lipophilicity, polar surface area, or permeability. Beyond linker optimization, such frameworks also offer the possibility of designing entirely new warheads for emerging targets or ligands for E3 ligases, thereby broadening the chemical and biological space accessible for targeted protein degradation. Moreover, flow-matching–based generative surrogates [61,62], initially developed to accelerate molecular dynamics simulations, offer a scalable way to approximate the conformational ensembles underlying ligase–substrate and ternary complex interactions. Combined with the geometric PPI models discussed in Section 4.2, these approaches could enable large-scale in silico screening of ligase–substrate specificity by pairing high-resolution geometric recognition with efficient ensemble-level sampling.
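To make the flow matching objective concrete, the following sketch uses the simplest straight-line path between a noise sample and a data sample, omitting the small noise floor used in the original formulation [58]. For this path the regression target is the constant velocity from noise to data, and sampling amounts to integrating the learned velocity field from t = 0 to t = 1. All function names are illustrative assumptions, and the "oracle" velocity field stands in for a trained neural network, which in a PROTAC setting would additionally be conditioned on pocket geometry or exit vectors.

```python
def cfm_training_pair(x0, x1, t):
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1
    and the corresponding regression target (x1 - x0)."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    n = len(target_velocity)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target_velocity)) / n


def euler_sample(velocity_fn, x0, steps=10):
    """Generate a sample by Euler integration of dx/dt = v(x, t) on [0, 1]."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data: a single two-dimensional "data point" and one noise sample.
data_point = [2.0, 4.0]
noise = [0.0, 0.0]

x_t, target = cfm_training_pair(noise, data_point, t=0.25)


def oracle_velocity(x, t):
    """Exact velocity field when the data distribution is a single point;
    a stand-in for a trained network (assumption for illustration)."""
    return [(d - xi) / (1.0 - t) for d, xi in zip(data_point, x)]


sample = euler_sample(oracle_velocity, noise, steps=10)
print(x_t, target, sample)
```

Because the path is a straight line, relatively few integration steps are needed at sampling time, which is one reason these models are attractive for screening large numbers of candidate linkers.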

4.4. Physics-Informed Neural Networks

Purely data-driven neural networks, especially when trained on limited datasets, struggle to internalize fundamental physical relationships such as steric repulsion, bond connectivity, or force-field energetics, whereas fully atomistic simulations are too computationally expensive for large-scale applications. A promising intermediate solution is physics-informed neural networks that embed molecular mechanics directly into their architectures or loss functions [63,64]. By constraining representations with physically meaningful terms, these models can capture the energetic landscape of PROTAC assemblies while retaining scalability. Adapting such approaches to PROTAC systems could yield models that respect fundamental molecular mechanics while remaining data-efficient, providing energy-consistent ternary structure predictions without the need for exhaustive atomistic simulations.
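One simple way to embed a physical prior in the loss function is to add a penalty for steric clashes to an ordinary coordinate-regression term. The sketch below is an illustrative toy, not the formulation used in the cited works [63,64]: the 1.2 Å clash threshold, the quadratic penalty, and the function names are assumptions, and the exclusion of bonded atom pairs that a real force field would apply is ignored for brevity.

```python
import math


def steric_penalty(coords, min_distance=1.2):
    """Sum of squared violations for atom pairs closer than min_distance.
    The threshold (in angstroms) is an arbitrary placeholder."""
    penalty = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < min_distance:
                penalty += (min_distance - d) ** 2
    return penalty


def physics_informed_loss(pred_coords, ref_coords, weight=1.0, min_distance=1.2):
    """Data term (mean squared coordinate error) plus a weighted physics term."""
    n = len(pred_coords)
    data_term = sum(math.dist(p, r) ** 2 for p, r in zip(pred_coords, ref_coords)) / n
    return data_term + weight * steric_penalty(pred_coords, min_distance)


# Toy two-atom example: the second predicted atom sits too close to the first.
reference_pose = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
clashing_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

loss_reference = physics_informed_loss(reference_pose, reference_pose)
loss_clashing = physics_informed_loss(clashing_pose, reference_pose)
print(loss_reference, loss_clashing)
```

The physics term is computed from the prediction alone, so it can also regularize poses for which no experimental reference structure exists, which is relevant given how few ternary complex structures are available.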

4.5. Transcriptome-Based Modeling of Chemical Perturbation Responses

While ternary complex modeling captures the structural feasibility of PROTAC action, it does not provide insight into the broader cellular consequences of PROTAC treatment. A similar challenge has long existed in traditional inhibitor development, where structural docking or binding assays alone could not explain compound-specific transcriptional outcomes. To address this gap, models trained on large-scale perturbational transcriptome datasets such as the Connectivity Map [65] and Tahoe-100M [66], which are primarily derived from inhibitor-induced perturbations, have been widely used to predict how chemical treatments change gene expression levels. These resources, which couple large numbers of chemical perturbations to matched transcriptional readouts, enable the in silico prediction of chemical perturbation response [67,68]. For PROTACs, a similar modeling paradigm could be adopted once PROTAC-specific perturbational datasets become available, allowing prediction of how targeted protein degradation reshapes transcriptomic states. Such models could be used to (i) prioritize PROTAC molecules that yield the desired transcriptomic fingerprint and (ii) filter out designs predicted to trigger toxicity-associated signatures. In this way, perturbation-response modeling complements structural and ADME predictions by providing a system-level view of PROTAC efficacy and safety, accelerating the triage of PROTAC candidates before they enter costly experimental assays.
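The two uses listed above can be sketched as a simple triage over predicted expression signatures. The example assumes that a perturbation-response model has already produced a log-fold-change vector for each candidate; the cosine-similarity scoring, the 0.5 toxicity cutoff, the function `triage`, and all signatures are hypothetical choices made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triage(predicted, desired, toxicity, tox_cutoff=0.5):
    """Drop candidates whose predicted signature resembles the toxicity
    signature, then rank the rest by similarity to the desired signature."""
    kept = []
    for name, signature in predicted.items():
        if cosine(signature, toxicity) >= tox_cutoff:
            continue  # (ii) filter out toxicity-like designs
        kept.append((cosine(signature, desired), name))
    # (i) prioritize by agreement with the desired fingerprint
    return [name for _, name in sorted(kept, reverse=True)]


# Toy three-gene signatures (predicted log fold changes); all values made up.
desired = [1.0, -1.0, 0.0]
toxicity = [0.0, 0.0, 1.0]
predicted = {
    "PROTAC_1": [1.0, -1.0, 0.0],
    "PROTAC_2": [1.0, -1.0, 3.0],
    "PROTAC_3": [1.0, 0.0, 0.0],
}

shortlist = triage(predicted, desired, toxicity)
print(shortlist)
```

Here the second candidate matches the desired direction but is removed because its signature is dominated by the toxicity-associated gene, illustrating why efficacy and safety signatures are best scored separately.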
Taken together, these diverse AI paradigms, ranging from sequence-based protein language models and transcriptome predictors to diffusion and flow-based molecular generators, represent complementary layers of an emerging ecosystem for degrader design. Sequence and transcriptome models expand the accessible ligase and substrate space, while geometric and physics-informed networks enhance ternary modeling fidelity, and generative models accelerate the creation of linker and ligand candidates with desired pharmacological traits. Their convergence defines a unified vision in which multi-modal AI frameworks collectively address the fundamental bottlenecks of PROTAC discovery, paving the way toward integrated, adaptive, and experimentally grounded AI-driven PROTAC design.

5. Challenges and Limitations of Current AI Approaches

Despite rapid advances in AI-driven PROTAC discovery, several fundamental challenges limit the robustness, interpretability, and translational impact of current models. These limitations are not specific to PROTACs but are amplified by the structural complexity, pharmacological liabilities, and data sparsity of the PROTAC chemical space. Below, we discuss the main challenges that remain to be overcome for the effective integration of AI into PROTAC design.

5.1. Data Scarcity and Standardization in PROTAC Datasets

Public PROTAC resources remain limited in both scale and annotation quality. For example, PROTAC-DB [45] and TPDdb [69] contain several thousand to tens of thousands of entries, yet only a fraction are associated with standardized, quantitative degradation metrics such as DC50 or Dmax. Moreover, assay conditions differ substantially across studies in terms of cell lines, exposure times, ligase context, and detection methods [4,70]. Such heterogeneity reduces data comparability and introduces noise into model training [71]. In addition, experimentally resolved ternary complex structures remain scarce, with fewer than one thousand entries currently available, leaving most compounds without corresponding 3D interaction data. The limited size of curated datasets increases the risk of overfitting and reduces reproducibility [72]. Addressing these issues will require larger, standardized datasets developed through community-wide benchmarking efforts. Such standardization is not only essential for data comparability but also directly determines the reliability of the AI models discussed in Section 3. In addition, some recent AI models rely on proprietary datasets that are not publicly available. While such data have accelerated progress within individual organizations, their limited accessibility makes it difficult for the broader community to independently verify or benchmark model performance. Continued collaboration and gradual sharing of standardized degrader datasets will be important for improving reproducibility and community engagement.
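As a small illustration of what standardization can mean in practice, the sketch below converts DC50 values to a common logarithmic scale (pDC50, the negative base-10 logarithm of DC50 in molar units) and groups measurements by assay context so that only comparable values are pooled. The record fields (`cell_line`, `ligase`, `exposure_h`, `dc50_nm`) and all values are hypothetical and do not reflect the schema of PROTAC-DB or TPDdb.

```python
import math
from collections import defaultdict


def pdc50(dc50_nm):
    """Convert DC50 in nanomolar to pDC50 = -log10(DC50 in molar)."""
    return -math.log10(dc50_nm * 1e-9)


def group_by_assay_context(records):
    """Collect pDC50 values under a key describing the assay context,
    so that measurements from different conditions are not mixed."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["cell_line"], rec["ligase"], rec["exposure_h"])
        groups[key].append(pdc50(rec["dc50_nm"]))
    return dict(groups)


# Made-up records with an assumed schema.
records = [
    {"cell_line": "HeLa", "ligase": "VHL", "exposure_h": 24, "dc50_nm": 100.0},
    {"cell_line": "HeLa", "ligase": "VHL", "exposure_h": 24, "dc50_nm": 10.0},
    {"cell_line": "MV4-11", "ligase": "CRBN", "exposure_h": 6, "dc50_nm": 1.0},
]

groups = group_by_assay_context(records)
for context, values in groups.items():
    print(context, [round(v, 2) for v in values])
```

Grouping by context does not remove inter-laboratory variability, but it makes the heterogeneity explicit and prevents a model from treating measurements under different exposure times or cell lines as interchangeable labels.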

5.2. Limited Generalization Across Targets and Ligases

Most AI models trained on PROTAC data perform strongly in retrospective benchmarks but decline sharply when applied to new targets or less-studied ligases [71,72]. This reflects the inherent bias in available data, which is heavily skewed toward a few popular ligases, such as CRBN and VHL, and a limited number of protein targets [4,70]. Generalization to other E3 ligases, substrates, or therapeutic contexts remains a major unsolved problem [23]. Without addressing this bias, models risk reinforcing the current narrow scope of PROTAC discovery rather than enabling expansion into unexplored ligase–target combinations. Transfer learning, domain adaptation, and multi-task modeling across chemical modalities may provide paths forward, but systematic validation across diverse ligase–substrate combinations is still lacking.
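One way to expose this generalization gap during evaluation is to hold out an entire ligase (or target) rather than splitting records at random. The following sketch implements such a leave-one-group-out split in plain Python. The record layout and the helper name `leave_one_group_out` are assumptions made for illustration; an equivalent utility is available in scikit-learn as `LeaveOneGroupOut`.

```python
def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train, test) splits in which all records
    sharing one value of group_key are reserved for testing."""
    groups = sorted({rec[group_key] for rec in records})
    for held_out in groups:
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test


# Made-up records; only the grouping field matters for the split.
records = [
    {"protac": "A", "ligase": "CRBN", "target": "BRD4"},
    {"protac": "B", "ligase": "CRBN", "target": "BTK"},
    {"protac": "C", "ligase": "VHL", "target": "BRD4"},
    {"protac": "D", "ligase": "IAP", "target": "ER"},
]

splits = list(leave_one_group_out(records, "ligase"))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Performance on the held-out ligase is usually a more realistic estimate of how a model will behave on an unexplored ligase–target combination than a random split, in which near-duplicate PROTACs can appear on both sides.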

5.3. Lack of Interpretability

Many current approaches operate as black-box predictors, achieving high accuracy without offering mechanistic insight. Degradability classifiers rarely reveal which sequence or structural features drive predictions, while ternary complex models often generate plausible poses without clarifying determinants of cooperativity or ubiquitination efficiency. This opacity limits both scientific trust and translational utility. Moreover, relatively few AI-designed PROTACs have progressed to prospective wet-lab validation, meaning that much of the current evidence remains computational. Improving interpretability through attention mechanisms, feature attribution, or physics-informed modeling, combined with systematic biological validation, is necessary to bridge the gap between prediction and mechanism.

5.4. The Missing Experimental Feedback Loop

Finally, AI-driven PROTAC design has yet to be fully integrated into iterative experimental pipelines. Unlike traditional drug discovery, where design–synthesis–testing loops are standard, most PROTAC models operate in isolation from real-time experimental feedback. This prevents models from learning from their own failures and refining predictions dynamically. Embedding AI into closed-loop workflows, where predictions guide synthesis and results are immediately fed back into model training, will be critical to overcome data scarcity, validate predictions, and accelerate optimization. Early demonstrations of lab-in-the-loop drug design suggest that such integration could be transformative for PROTAC discovery [73].
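The closed-loop idea can be outlined as a short design, test, and learn cycle. The sketch below is a deliberately simplified toy: the nearest-neighbor "model", the linker-length feature, the simulated assay, and every name and number are hypothetical stand-ins for a real predictor, a real descriptor set, and wet-lab measurements.

```python
def closed_loop(candidates, model, run_assay, rounds=2, batch_size=2):
    """Repeat: rank untested candidates, 'test' the top batch, retrain."""
    measured = {}
    for _ in range(rounds):
        pool = [c for c in candidates if c not in measured]
        if not pool:
            break
        batch = sorted(pool, key=model.predict, reverse=True)[:batch_size]
        measured.update({c: run_assay(c) for c in batch})
        model.update(measured)  # feed results back into the model
    return measured


# Hypothetical single feature per candidate: linker length in atoms.
linker_length = {"P1": 2, "P2": 6, "P3": 4, "P4": 7, "P5": 12}


class NearestNeighborModel:
    """Toy predictor: returns the measured value of the most similar
    already-tested candidate, or 0.0 before any data exist."""

    def __init__(self):
        self.data = {}

    def predict(self, name):
        if not self.data:
            return 0.0
        nearest = min(
            self.data, key=lambda k: abs(linker_length[k] - linker_length[name])
        )
        return self.data[nearest]

    def update(self, measured):
        self.data = dict(measured)


def simulated_assay(name):
    """Stand-in for an experiment: activity peaks at a linker length of 6."""
    return 1.0 / (1.0 + abs(linker_length[name] - 6))


measured = closed_loop(list(linker_length), NearestNeighborModel(), simulated_assay)
print(measured)
```

Even in this toy setting, the second round is steered by the results of the first, which is the essential difference from a static prediction pipeline. A practical system would add uncertainty-aware selection so that the loop explores as well as exploits.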

6. Future Perspectives: Toward AI-Driven PROTAC Design

AI is transforming drug discovery, yet its tangible impact differs greatly across modalities. In the small-molecule domain, AI has already delivered practical success. For example, Insilico Medicine developed the first end-to-end AI-designed small-molecule drug, rentosertib, a TNIK inhibitor for idiopathic pulmonary fibrosis that advanced to phase II clinical trials in 2024 after being identified, optimized, and nominated entirely through AI-driven generative design and preclinical modeling [74]. Comparable AI frameworks are now routinely employed in major pharmaceutical pipelines for target identification, de novo molecule generation, and ADME optimization.
By contrast, the application of AI to targeted protein degradation remains at an earlier stage. To date, no PROTACs in clinical trials have originated directly from AI-based molecular design. Nonetheless, several industrial collaborations indicate that this transition is already underway. VantAI, in partnership with Bristol Myers Squibb, Janssen, and other global pharmaceutical companies, is integrating deep generative and reinforcement-learning models to accelerate the design of degraders and molecular glues. Such initiatives exemplify how AI-assisted PROTAC discovery is evolving from conceptual modeling to early industrial implementation, linking computational design with experimental validation.
For these approaches to mature, data accessibility and feedback integration are essential. Much of the existing degrader data remains fragmented or proprietary, limiting reproducibility and cross-benchmarking. Establishing a pre-competitive data-sharing initiative through expanded resources such as PROTAC-DB and TPDdb would provide the transparency required for rigorous evaluation of AI models and facilitate their translation into clinically actionable degraders. Collaborative data ecosystems between academia, startups, and large pharmaceutical organizations could therefore serve as the foundation for the next generation of AI-driven PROTAC platforms.
For AI to transition from predictive modeling to practical design, it also must be tightly integrated with experimental workflows. Current PROTAC pipelines rarely incorporate continuous feedback, preventing models from learning from both successes and failures. The next step is to embed AI into adaptive closed-loop systems in which predictions guide synthesis and high-throughput assays, and the resulting data are immediately fed back into model refinement. Building on this foundation, AI agents offer a way to coordinate such workflows. Agents are autonomous systems that connect multiple models, tools, and datasets, dynamically deciding which ligase–substrate pairs to explore, which linkers to propose, and which assays to run. By incorporating experimental feedback, they convert otherwise static pipelines into adaptive, self-improving design frameworks. The long-term vision is a fully automated PROTAC design platform where a therapeutic target is specified and an optimized PROTAC emerges as output. Such a system would unify target prioritization, ligase selection, ternary complex modeling, linker generation, and ADME optimization into a single computational workflow [75]. Coupled with iterative retraining and prospective validation, this paradigm would represent a decisive shift from heuristic, trial-and-error discovery to systematic, AI-driven PROTAC design.

Author Contributions

Conceptualization, K.-S.P. and M.J.; data curation, K.-S.P. and M.J.; writing—original draft preparation, K.-S.P. and M.J.; writing—review and editing, K.-S.P. and M.J.; visualization, K.-S.P.; supervision, M.J.; project administration, M.J.; funding acquisition, K.-S.P. and M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Bio&Medical Technology Development Program of the National Research Foundation [RS-2024-00441029 to M.J.; RS-2024-00406114 to K.P.]; ICAN program supervised by the IITP [IITP-2025-RS-2022-00156439 to M.J.; IITP-2025-RS-2024-00438263 to M.J.] funded by the Korean government.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zhang, C.; Liu, Y.; Li, G.; Yang, Z.; Han, C.; Sun, X.; Sheng, C.; Ding, K.; Rao, Y. Targeting the undruggables—The power of protein degraders. Sci. Bull. 2024, 69, 1776–1797.
  2. Xie, X.; Yu, T.; Li, X.; Zhang, N.; Foster, L.J.; Peng, C.; Huang, W.; He, G. Recent advances in targeting the “undruggable” proteins: From drug discovery to clinical trials. Signal Transduct. Target. Ther. 2023, 8, 335.
  3. Zhao, L.; Zhao, J.; Zhong, K.; Tong, A.; Jia, D. Targeted protein degradation: Mechanisms, strategies and application. Signal Transduct. Target. Ther. 2022, 7, 113.
  4. Békés, M.; Langley, D.R.; Crews, C.M. PROTAC targeted protein degraders: The past is prologue. Nat. Rev. Drug Discov. 2022, 21, 181–200.
  5. Dale, B.; Cheng, M.; Park, K.S.; Kaniskan, H.Ü.; Xiong, Y.; Jin, J. Advancing targeted protein degradation for cancer therapy. Nat. Rev. Cancer 2021, 21, 638–654.
  6. Zhong, G.; Chang, X.; Xie, W.; Zhou, X. Targeted protein degradation: Advances in drug discovery and clinical practice. Signal Transduct. Target. Ther. 2024, 9, 308.
  7. Liu, Z.; Hu, M.; Yang, Y.; Du, C.; Zhou, H.; Liu, C.; Chen, Y.; Fan, L.; Ma, H.; Gong, Y.; et al. An overview of PROTACs: A promising drug discovery paradigm. Mol. Biomed. 2022, 3, 46.
  8. Vetma, V.; O’Connor, S.; Ciulli, A. Development of PROTAC degrader drugs for cancer. Annu. Rev. Cancer Biol. 2024, 9, 119–140.
  9. Gough, S.M.; Flanagan, J.J.; Teh, J.; Andreoli, M.; Rousseau, E.; Pannone, M.; Bookbinder, M.; Willard, R.; Davenport, K.; Bortolon, E.; et al. Oral estrogen receptor PROTAC vepdegestrant (ARV-471) is highly efficacious as monotherapy and in combination with CDK4/6 or PI3K/mTOR pathway inhibitors in preclinical ER+ breast cancer models. Clin. Cancer Res. 2024, 30, 3549–3563.
  10. Tran, N.L.; Leconte, G.A.; Ferguson, F.M. Targeted protein degradation: Design considerations for PROTAC development. Curr. Protoc. 2022, 2, e611.
  11. Schwalm, M.P.; Krämer, A.; Dölle, A.; Weckesser, J.; Yu, X.; Jin, J.; Saxena, K.; Knapp, S. Tracking the PROTAC degradation pathway in living cells highlights the importance of ternary complex measurement for PROTAC optimization. Cell Chem. Biol. 2023, 30, 753–765.
  12. Crowe, C.; Nakasone, M.; Chandler, S.; Tatham, M.; Makukhin, N.; Hay, R.; Ciulli, A. Mechanism of Degrader-Targeted Protein Ubiquitinability. Sci. Adv. 2024, 10, eado6492.
  13. Hasselgren, C.; Oprea, T.I. Artificial intelligence for drug discovery: Are we there yet? Annu. Rev. Pharmacol. Toxicol. 2024, 64, 527–550.
  14. Ferreira, F.J.; Carneiro, A.S. AI-Driven Drug Discovery: A Comprehensive Review. ACS Omega 2025, 10, 23889–23903.
  15. Jiang, J.; Ke, L.; Chen, L.; Dou, B.; Zhu, Y.; Liu, J.; Zhang, B.; Zhou, T.; Wei, G.W. Transformer technology in molecular science. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2024, 14, e1725.
  16. Zhang, O.; Lin, H.; Zhang, X.; Wang, X.; Wu, Z.; Ye, Q.; Zhao, W.; Wang, J.; Ying, K.; Kang, Y.; et al. Graph Neural Networks in Modern AI-aided Drug Discovery. Chem. Rev. 2025, 125, 10001–10103.
  17. Alakhdar, A.; Poczos, B.; Washburn, N. Diffusion models in de novo drug design. J. Chem. Inf. Model. 2024, 64, 7238–7256.
  18. Tan, R.K.; Liu, Y.; Xie, L. Reinforcement learning for systems pharmacology-oriented and personalized drug design. Expert Opin. Drug Discov. 2022, 17, 849–863.
  19. Yu, X.; Li, D.; Kottur, J.; Shen, Y.; Kim, H.S.; Park, K.S.; Tsai, Y.H.; Gong, W.; Wang, J.; Suzuki, K.; et al. A selective WDR5 degrader inhibits acute myeloid leukemia in patient-derived mouse models. Sci. Transl. Med. 2021, 13, eabj1578.
  20. Sakamoto, K.M.; Kim, K.B.; Kumagai, A.; Mercurio, F.; Crews, C.M.; Deshaies, R.J. Protacs: Chimeric molecules that target proteins to the Skp1–Cullin–F box complex for ubiquitination and degradation. Proc. Natl. Acad. Sci. USA 2001, 98, 8554–8559.
  21. Burslem, G.M.; Crews, C.M. Proteolysis-targeting chimeras as therapeutics and tools for biological discovery. Cell 2020, 181, 102–114.
  22. Madan, J.; Ahuja, V.K.; Dua, K.; Samajdar, S.; Ramchandra, M.; Giri, S. PROTACs: Current trends in protein degradation by proteolysis-targeting chimeras. BioDrugs 2022, 36, 609–623.
  23. Nalawansha, D.A.; Crews, C.M. PROTACs: An emerging therapeutic modality in precision medicine. Cell Chem. Biol. 2020, 27, 998–1014.
  24. Roy, M.J.; Winkler, S.; Hughes, S.J.; Whitworth, C.; Galant, M.; Farnaby, W.; Rumpel, K.; Ciulli, A. SPR-measured dissociation kinetics of PROTAC ternary complexes influence target degradation rate. ACS Chem. Biol. 2019, 14, 361–368.
  25. Lai, A.C.; Crews, C.M. Induced protein degradation: An emerging drug discovery paradigm. Nat. Rev. Drug Discov. 2017, 16, 101–114.
  26. Xue, F.; Zhang, M.; Li, S.; Gao, X.; Wohlschlegel, J.A.; Huang, W.; Yang, Y.; Deng, W. SE(3)-equivariant ternary complex prediction towards target protein degradation. Nat. Commun. 2025, 16, 5514.
  27. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
  28. Li, F.; Hu, Q.; Zhang, X.; Sun, R.; Liu, Z.; Wu, S.; Tian, S.; Ma, X.; Dai, Z.; Yang, X.; et al. DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs. Nat. Commun. 2022, 13, 7133.
  29. Ribes, S.; Nittinger, E.; Tyrchan, C.; Mercado, R. Modeling PROTAC degradation activity with machine learning. Artif. Intell. Life Sci. 2024, 6, 100104.
  30. Zhang, W.; Roy Burman, S.S.; Chen, J.; Donovan, K.A.; Cao, Y.; Shu, C.; Zhang, B.; Zeng, Z.; Gu, S.; Zhang, Y.; et al. Machine learning modeling of protein-intrinsic features predicts tractability of targeted protein degradation. Genom. Proteom. Bioinform. 2022, 20, 882–898.
  31. Liu, J.; Roy, M.J.; Isbel, L.; Li, F. Accurate PROTAC-targeted degradation prediction with DegradeMaster. Bioinformatics 2025, 41, i342–i351.
  32. Xie, L.; Xie, L. Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning. PLoS Comput. Biol. 2023, 19, e1010974.
  33. Imrie, F.; Bradley, A.R.; van der Schaar, M.; Deane, C.M. Deep generative models for 3D linker design. J. Chem. Inf. Model. 2020, 60, 1983–1995.
  34. Huang, Y.; Peng, X.; Ma, J.; Zhang, M. 3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 9280–9294.
  35. Guo, J.; Knuth, F.; Margreitter, C.; Janet, J.P.; Papadopoulos, K.; Engkvist, O.; Patronov, A. Link-INVENT: Generative linker design with reinforcement learning. Digit. Discov. 2023, 2, 392–408.
  36. Neeser, R.M.; Akdel, M.; Kovtun, D.; Naef, L. Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment. In Proceedings of the ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, Honolulu, HI, USA, 23–29 July 2023.
  37. Zheng, S.; Tan, Y.; Wang, Z.; Li, C.; Zhang, Z.; Sang, X.; Chen, H.; Yang, Y. Accelerated rational PROTAC design via deep learning and molecular simulations. Nat. Mach. Intell. 2022, 4, 739–748.
  38. Li, F.; Hu, Q.; Zhou, Y.; Yang, H.; Bai, F. DiffPROTACs is a deep learning-based generator for proteolysis targeting chimeras. Briefings Bioinform. 2024, 25, bbae358.
  39. Song, Z.; Meng, Z.; Hernández-Lobato, J.M. Domain-Adapted Diffusion Model for PROTAC Linker Design Through the Lens of Density Ratio in Chemical Space. In Proceedings of the Forty-second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
  40. Peteani, G.; Huynh, M.T.D.; Gerebtzoff, G.; Rodríguez-Pérez, R. Application of machine learning models for property prediction to targeted protein degraders. Nat. Commun. 2024, 15, 5764.
  41. Poongavanam, V.; Kölling, F.; Giese, A.; Göller, A.H.; Lehmann, L.; Meibom, D.; Kihlberg, J. Predictive modeling of PROTAC cell permeability with machine learning. ACS Omega 2023, 8, 5901–5916.
  42. Drummond, M.L.; Williams, C.I. In silico modeling of PROTAC-mediated ternary complexes: Validation and application. J. Chem. Inf. Model. 2019, 59, 1634–1644.
  43. Zaidman, D.; Prilusky, J.; London, N. PRosettaC: Rosetta based modeling of PROTAC mediated ternary complexes. J. Chem. Inf. Model. 2020, 60, 4894–4903.
  44. Burley, S.K.; Berman, H.M.; Kleywegt, G.J.; Markley, J.L.; Nakamura, H.; Velankar, S. Protein Data Bank (PDB): The single global macromolecular structure archive. Protein Crystallogr. Methods Protoc. 2017, 1607, 627–641.
  45. Ge, J.; Li, S.; Weng, G.; Wang, H.; Fang, M.; Sun, H.; Deng, Y.; Hsieh, C.Y.; Li, D.; Hou, T. PROTAC-DB 3.0: An updated database of PROTACs with extended pharmacokinetic parameters. Nucleic Acids Res. 2025, 53, D1510–D1515.
  46. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
  47. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7112–7127.
  48. Zhou, J.; Theesfeld, C.L.; Yao, K.; Chen, K.M.; Wong, A.K.; Troyanskaya, O.G. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 2018, 50, 1171–1179.
  49. Avsec, Ž.; Agarwal, V.; Visentin, D.; Ledsam, J.R.; Grabska-Barwinska, A.; Taylor, K.R.; Assael, Y.; Jumper, J.; Kohli, P.; Kelley, D.R. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 2021, 18, 1196–1203.
  50. Linder, J.; Srivastava, D.; Yuan, H.; Agarwal, V.; Kelley, D.R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nat. Genet. 2025, 57, 949–961.
  51. Avsec, Ž.; Latysheva, N.; Cheng, J.; Novati, G.; Taylor, K.R.; Ward, T.; Bycroft, C.; Nicolaisen, L.; Arvaniti, E.; Pan, J.; et al. AlphaGenome: Advancing regulatory variant effect prediction with a unified DNA sequence model. bioRxiv 2025, 2025-06.
  52. Palomba, T.; Baroni, M.; Cross, S.; Cruciani, G.; Siragusa, L. ELIOT: A platform to navigate the E3 pocketome and aid the design of new PROTACs. Chem. Biol. Drug Des. 2023, 101, 69–86.
  53. Liu, L.; Damerell, D.R.; Koukouflis, L.; Tong, Y.; Marsden, B.D.; Schapira, M. UbiHub: A data hub for the explorers of ubiquitination pathways. Bioinformatics 2019, 35, 2882–2884.
  54. Liu, Y.; Yang, J.; Wang, T.; Luo, M.; Chen, Y.; Chen, C.; Ronai, Z.; Zhou, Y.; Ruppin, E.; Han, L. Expanding PROTACtable genome universe of E3 ligases. Nat. Commun. 2023, 14, 6509.
  55. Wurz, R.P.; Rui, H.; Dellamaggiore, K.; Ghimire-Rijal, S.; Choi, K.; Smither, K.; Amegadzie, A.; Chen, N.; Li, X.; Banerjee, A.; et al. Affinity and cooperativity modulate ternary complex formation to drive targeted protein degradation. Nat. Commun. 2023, 14, 4177.
  56. Gainza, P.; Sverrisson, F.; Monti, F.; Rodola, E.; Boscaini, D.; Bronstein, M.M.; Correia, B.E. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 2020, 17, 184–192.
  57. Wu, L.; Tian, Y.; Huang, Y.; Li, S.; Lin, H.; Chawla, N.V.; Li, S.Z. MAPE-PPI: Towards effective and efficient protein-protein interaction prediction via microenvironment-aware protein embedding. arXiv 2024, arXiv:2402.14391.
  58. Lipman, Y.; Chen, R.T.; Ben-Hamu, H.; Nickel, M.; Le, M. Flow Matching for Generative Modeling. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
  59. Li, J.; Cheng, C.; Wu, Z.; Guo, R.; Luo, S.; Ren, Z.; Peng, J.; Ma, J. Full-Atom Peptide Design based on Multi-modal Flow Matching. In Proceedings of the International Conference on Machine Learning, PMLR, Vienna, Austria, 21–27 July 2024; pp. 27615–27640.
  60. Zhang, Z.; Wang, M.; Liu, Q. FlexSBDD: Structure-based drug design with flexible protein modeling. Adv. Neural Inf. Process. Syst. 2024, 37, 53918–53944.
  61. Jing, B.; Stärk, H.; Jaakkola, T.; Berger, B. Generative modeling of molecular dynamics trajectories. Adv. Neural Inf. Process. Syst. 2024, 37, 40534–40564.
  62. Jin, Y.; Huang, Q.; Song, Z.; Zheng, M.; Teng, D.; Shi, Q. P2DFlow: A protein ensemble generative model with SE(3) flow matching. J. Chem. Theory Comput. 2025, 21, 3288–3296.
  63. Moon, S.; Zhung, W.; Yang, S.; Lim, J.; Kim, W.Y. PIGNet: A physics-informed deep learning model toward generalized drug–target interaction predictions. Chem. Sci. 2022, 13, 3661–3673.
  64. Orlando, G.; Serrano, L.; Schymkowitz, J.; Rousseau, F. Integrating physics in deep learning algorithms: A force field as a PyTorch module. Bioinformatics 2024, 40, btae160.
  65. Subramanian, A.; Narayan, R.; Corsello, S.M.; Peck, D.D.; Natoli, T.E.; Lu, X.; Gould, J.; Davis, J.F.; Tubelli, A.A.; Asiedu, J.K.; et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 2017, 171, 1437–1452.
  66. Zhang, J.; Ubas, A.A.; de Borja, R.; Svensson, V.; Thomas, N.; Thakar, N.; Lai, I.; Winters, A.; Khan, U.; Jones, M.G.; et al. Tahoe-100M: A giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv 2025, 2025-02.
  67. Qi, X.; Zhao, L.; Tian, C.; Li, Y.; Chen, Z.L.; Huo, P.; Chen, R.; Liu, X.; Wan, B.; Yang, S.; et al. Predicting transcriptional responses to novel chemical perturbations using deep generative model for drug discovery. Nat. Commun. 2024, 15, 9256.
  68. Lotfollahi, M.; Klimovskaia Susmelj, A.; De Donno, C.; Hetzel, L.; Ji, Y.; Ibarra, I.L.; Srivatsan, S.R.; Naghipourfar, M.; Daza, R.M.; Martin, B.; et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 2023, 19, e11517.
  69. Qin, X.; Zhang, Y.; Wang, Y.; Zhang, Y.; Jing, J.; Zhang, Y.; Xu, G.; Teng, H.; Wang, T.; Fu, L.; et al. TPDdb: The comprehensive database of targeted protein degrader. Nucleic Acids Res. 2025, gkaf996.
  70. Schapira, M.; Calabrese, M.F.; Bullock, A.N.; Crews, C.M. Targeted protein degradation: Expanding the toolbox. Nat. Rev. Drug Discov. 2019, 18, 949–963.
  71. Mostofian, B.; Martin, H.J.; Razavi, A.; Patel, S.; Allen, B.; Sherman, W.; Izaguirre, J.A. Targeted protein degradation: Advances, challenges, and prospects for computational methods. J. Chem. Inf. Model. 2023, 63, 5408–5432.
  72. Tan, S.; Chen, Z.; Lu, R.; Liu, H.; Yao, X. Rational Proteolysis Targeting Chimera Design Driven by Molecular Modeling and Machine Learning. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2025, 15, e70013.
  73. Nahal, Y.; Menke, J.; Martinelli, J.; Heinonen, M.; Kabeshov, M.; Janet, J.P.; Nittinger, E.; Engkvist, O.; Kaski, S. Human-in-the-loop active learning for goal-oriented molecule generation. J. Cheminform. 2024, 16, 138.
  74. Xu, Z.; Ren, F.; Wang, P.; Cao, J.; Tan, C.; Ma, D.; Zhao, L.; Dai, J.; Ding, Y.; Fang, H.; et al. A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: A randomized phase 2a trial. Nat. Med. 2025, 31, 2602–2610.
  75. Song, T.; Luo, M.; Zhang, X.; Chen, L.; Huang, Y.; Cao, J.; Zhu, Q.; Liu, D.; Zhang, B.; Zou, G.; et al. A Multiagent-Driven Robotic AI Chemist Enabling Autonomous Chemical Research On Demand. J. Am. Chem. Soc. 2025, 147, 12534–12545.
Figure 1. Schematic comparison between conventional inhibition and PROTAC-mediated targeted protein degradation. Unlike small-molecule inhibitors that transiently block target protein activity, PROTACs induce catalytic and event-driven degradation by recruiting an E3 ubiquitin ligase to the target protein, promoting ternary complex formation, ubiquitination, and subsequent proteasomal degradation. Red boxes highlight key considerations in PROTAC discovery, including permeability, target selection informed by native homeostasis–dependent degradability, and ternary complex formation.
Figure 2. Application points of artificial intelligence (AI) in PROTAC discovery and optimization. AI-based approaches have been applied across multiple stages of PROTAC discovery, including (1) ternary complex prediction, which models the interactions among the target protein, PROTAC, and E3 ligase; (2) degradability prediction, which estimates the degradation potential of target proteins; (3) linker design and generation, enabling de novo or optimized linker construction for improved ternary complex stability and degradation efficiency; and (4) ADME prediction, which supports pharmacokinetic and drug-likeness evaluation through AI-based models of permeability, metabolism, and toxicity. Adapted from Yu et al., Sci. Transl. Med. (2021) [19], licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Modified and combined under the terms of the respective Creative Commons licenses.
Figure 3. Overview of the DeepTernary model and its structure–property correlation analysis for BRD4–VHL ternary complexes. (A) Schematic representation of DeepTernary, an SE(3)-equivariant graph neural network architecture with attention blocks. * For PROTACs, the pocket points are derived from unbound structures and do not need to be predicted. ** For MG(D), the ligand (lig) and POI (p2) are simultaneously aligned to the E3 ligase (p1). (B) Correlation analysis between the predicted buried surface area (BSA) and degradation potency scores ($\ln(K_{LPT})$) for the modeled BRD4–VHL ternary complexes assembled with various PROTACs. Adapted from Xue et al., Nature Communications (2025) [26], licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Modified and combined under the terms of the respective Creative Commons licenses.
Figure 4. Overview of the DeepPROTACs and DegradeMaster frameworks and their results for PROTAC degradability prediction. (A) The network architecture of DeepPROTACs, a neural network model integrating graph- and sequence-based representations of targets, E3 ligases, ligands, and linkers to predict PROTAC degradation activity. (B) Chemical structures and physicochemical properties of 16 PROTACs in the experimental dataset, along with Western blotting analyses and densitometry quantifications of ER protein levels following treatment with the corresponding PROTACs. (C) The overall framework of DegradeMaster, a semi-supervised, E(3)-equivariant graph neural network that integrates 3D structural information and pseudo-labeling to predict PROTAC degradability. (D) Visualization of attention weights for the PROTAC molecule (PROTAC-DB ID: 194), illustrating the key molecular features contributing to BRD2 degradation via CRBN E3 ligase recruitment. Adapted from Li et al., Nature Communications (2022) [28], and Liu et al., Bioinformatics (2025) [31], licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). Modified and combined under the terms of the respective Creative Commons licenses.
Figure 5. Comprehensive overview of deep generative frameworks for linker/PROTAC generation in DeLinker, DiffPROTACs, and Link-INVENT models. (A) Illustration of training and generation procedures demonstrating the overall workflow of model optimization and molecular sampling of DeLinker. (B) Comparison of DeLinker with an exhaustive database search, highlighting improvements in linker similarity and diversity. (C) Overview of the DiffPROTACs diffusion-based architecture for generative modeling of PROTAC linkers and optimization-guided generation. (D) Case studies showing representative BRD4–VHL PROTACs and the corresponding linkers generated by DiffPROTACs. (E) Training and inference overview of Link-INVENT, describing the reinforcement learning–based fine-tuning strategy for linker generation. (F) Illustrative example of Link-INVENT demonstrating its improvement over epochs to design linkers for given objectives. Adapted from Guo et al., Digital Discovery (2023, CC BY 3.0) [35], Imrie et al., J. Chem. Inf. Model. (2020, CC BY) [33] and Li et al., Briefings in Bioinformatics (2024, CC BY-NC 4.0) [38]. Modified and combined under the terms of the respective Creative Commons licenses.
Figure 6. Next-generation AI frameworks for PROTAC discovery, including (1) adaptation of protein–protein interaction prediction models to E3 ligase–target systems for accurate estimation of interfacial residues and binding cooperativity, (2) prediction of PROTAC-induced transcriptomic perturbations by linking chemical structure with gene expression response profiles, (3) flow-based generative models for de novo PROTAC molecule design, (4) physics-informed neural networks for ternary complex prediction incorporating energy-based constraints or 3D equivariant learning for pose generation, and (5) amino acid or genomic information-based identification of tissue- or cell-specific E3 ligases. Adapted from Yu et al., Sci. Transl. Med. (2021) [19], licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Modified and combined under the terms of the respective Creative Commons licenses.
Table 1. AI models applied in PROTAC discovery. Abbreviations: ADME, absorption, distribution, metabolism, and excretion; BSA, buried surface area; DNN, deep neural network; GNN, graph neural network; kNN, k-nearest neighbors; MLP, multi-layer perceptron; MTL, multi-task learning; PK, pharmacokinetics; PLM, protein language model; POI, protein of interest; PPI, protein–protein interaction; PTM, post-translational modification; QSPR, quantitative structure–property relationship; RF, random forest; RL, reinforcement learning; RNN, recurrent neural network; SMILES, simplified molecular input line entry system; VAE, variational autoencoder.
| Application Area | Model Name | Architecture | Input | Output | Key Features | Ref. |
|---|---|---|---|---|---|---|
| Ternary Complex Prediction | DeepTernary | SE(3)-equivariant GNN | PROTAC molecular graph + POI/E3 pocket graph with 3D coordinates | 3D ternary complex | Predicted BSA indicates degradability | [26] |
| Ternary Complex Prediction | AlphaFold3 | Diffusion Transformer | Protein sequence + ligand SMILES | 3D ternary complex | Diffusion-based multimodal model predicting 3D complexes of proteins, ligands, and nucleic acids | [27] |
| Degradability Prediction | DeepPROTACs | GNN + RNN + MLP | PROTAC molecular graph + POI/E3 pocket graph | PROTAC degradability (high/low) | Joint molecule–protein modeling for degradation prediction | [28] |
| Degradability Prediction | Ribes et al. | Pretrained embedding models + linear classifier | PROTAC SMILES + POI/E3 sequence + cell-line metadata | PROTAC degradability (high/low) | Incorporated cell line context for degradation prediction | [29] |
| Degradability Prediction | MAPD | RF | Protein features (PTMs, PPI, length, etc.) | Protein degradability (tractable/non-tractable) | Protein-level degradability prediction from intrinsic features | [30] |
| Degradability Prediction | DegradeMaster | E(3)-equivariant GNN | 3D molecule graphs of PROTAC and POI/E3 | PROTAC degradability (high/low) | Mutual-attention pooling and pseudo-labeling | [31] |
| Degradability Prediction | PrePROTAC | RF | PLM embedding of protein sequence | CRBN-specific protein degradability | Protein-level degradability prediction and key residues identification by eSHAP | [32] |
| Linker Design & Generation | DeLinker | GNN | Anchor fragments (warhead + E3 ligand) in graph with 3D structural info | 3D linker structures | Distance/angle constrained fragment linking | [33] |
| Linker Design & Generation | 3DLinker | E(3)-equivariant graph VAE | Anchor fragments (warhead + E3 ligand) in graph with 3D coordinates | 3D linker structures | Generates physically consistent linkers with accurate spatial alignment | [34] |
| Linker Design & Generation | Link-INVENT | RNN + RL | Anchor fragments (warhead + E3 ligand) as SMILES | Optimized linker SMILES | Multi-parameter optimization | [35] |
| Linker Design & Generation | ShapeLinker | RNN + RL with 3D point cloud alignment | Anchor fragments (warhead + E3 ligand) as SMILES | Optimized linker SMILES | Geometry-conditioned method based on Link-INVENT | [36] |
| Linker Design & Generation | PROTAC-RL | Transformer + RL | Anchor fragments (warhead + E3 ligand) as SMILES | Full PROTAC molecules with generated linker | Pretrained on quasi-PROTACs; RL optimizes PK; prospective validation | [37] |
| Linker Design & Generation | DiffPROTACs | Diffusion + O(3)-equivariant graph Transformer | Anchor fragments (warhead + E3 ligand) as molecular graphs with 3D coordinates | Full PROTAC molecules with generated linker | Diffusion refines noisy linker atoms into valid 3D structures; high validity and structural realism | [38] |
| Linker Design & Generation | DAD-PROTAC | Domain-adapted diffusion | Anchor fragments (warhead + E3 ligand) as molecular graphs with 3D coordinates | Full PROTAC molecules with generated linker | Corrects distribution gap between small molecules and PROTACs via density-ratio guided score adjustment; efficient fine-tuning and reduced overfitting | [39] |
| ADME & Permeability | Peteani et al. | MTL GNN + DNN | PROTAC SMILES | ADME-related properties (solubility, permeability, stability, etc.) | Used transfer learning to adapt QSPR models to degraders | [40] |
| ADME & Permeability | Poongavanam et al. | kNN + RF | 17 physicochemical descriptors | Permeability classes | Size and lipophilicity dominate; strong blind accuracy on VHL set | [41] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Park, K.-S.; Jeon, M. Advancing PROTAC Discovery Through Artificial Intelligence: Opportunities, Challenges, and Future Directions. Pharmaceuticals 2025, 18, 1793. https://doi.org/10.3390/ph18121793