ProLinker–Generator: Design of a PROTAC Linker Base on a Generation Model Using Transfer and Reinforcement Learning

Luo, Yanlin; Song, Danyang; Zhang, Chengwei; Su, An

doi:10.3390/app15105616

¹ College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China

² Zhejiang Key Laboratory of Green Manufacturing Technology for Chemical Drugs, Key Laboratory of Pharmaceutical Engineering of Zhejiang Province, Key Laboratory for Green Pharmaceutical Technologies and Related Equipment of Ministry of Education, College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5616; https://doi.org/10.3390/app15105616

Submission received: 12 April 2025 / Revised: 7 May 2025 / Accepted: 14 May 2025 / Published: 17 May 2025

(This article belongs to the Special Issue Development and Application of Computational Chemistry Methods)

Abstract

In PROTAC molecules, the design of the linker directly affects the formation efficiency and stability of the target protein–PROTAC–E3 ligase ternary complex, making it a critical factor in determining degradation activity. However, current linker data are limited, and the accessible chemical space remains narrow. The length, conformation, and chemical composition of linkers play a decisive role in drug performance, highlighting the urgent need for innovative linker design. In this study, we propose ProLinker-Generator, a GPT-based model aimed at generating novel and effective linkers. By integrating transfer learning and reinforcement learning, the model expands the chemical space of linkers and optimizes their design. During the transfer learning phase, the model achieved high scores in validity (0.989) and novelty (0.968) for the generated molecules. In the reinforcement learning phase, it further guided the generation of molecules with ideal properties within our predefined range. ProLinker-Generator demonstrates the significant potential of AI in linker design.

Keywords: machine learning; transfer learning; PROTAC; linker design; reinforcement learning; molecule generation

1. Introduction

Proteolysis-Targeting Chimera (PROTAC) technology represents one of the most groundbreaking therapeutic strategies in the field of targeted protein degradation [1,2,3,4]. A PROTAC molecule uses a linker to connect a ligand for the target protein with a ligand for an E3 ubiquitin ligase, bringing the two proteins together in a ternary complex and thereby inducing ubiquitination of the target protein and its subsequent proteasomal degradation, a process referred to as “catalytic degradation”. It enables a shift from the traditional “occupancy-based inhibition” to the innovative “event-driven” therapeutic paradigm [5,6,7]. Compared to traditional small molecule inhibitors, PROTAC technology offers two significant advantages: its event-driven nature allows for durable effects at low doses, and it can target proteins that were previously considered “undruggable”, providing new therapeutic hope for refractory diseases such as cancer and neurodegenerative disorders [8,9,10].

However, the development of PROTAC technology still faces numerous challenges. These molecules typically have a high molecular weight, and when the molecular weight exceeds 1000 Da [11], their permeability significantly decreases, leading to poor oral bioavailability. Furthermore, the design of the linker in the PROTAC molecule directly affects the efficiency and stability of the target protein–PROTAC–E3 ligase ternary complex formation, making it a key factor in determining degradation activity [1,12,13]. As a critical “molecular bridge” in PROTAC, the length, conformation, and chemical composition of the linker play a decisive role in the stability of the ternary complex and degradation efficiency. An ideal linker must balance steric hindrance with synergistic effect loss [14]. In addressing these challenges, artificial intelligence (AI) has become a pivotal tool for accelerating the rational design of PROTAC linkers, showcasing substantial potential in optimizing linker properties.

In recent years, AI has achieved remarkable progress in accelerating de novo molecular design [15,16,17,18,19]. Gomez-Bombarelli et al. trained a variational autoencoder (VAE) model to encode part of the ZINC database into latent vectors and map them into a hidden space, enabling the generation of new molecules through sampling and decoding [20,21,22]. Weng et al. introduced the PROTAC-DB database [23]; Zheng et al. applied transformer models for de novo PROTAC design [16]; and Li et al. developed the DeepPROTACs model to predict PROTAC degradation capability, achieving an average prediction accuracy of 77.95% on test sets [24]. These breakthroughs demonstrate that AI technology holds promise for overcoming the limitations of traditional linker design, particularly in aspects such as property optimization.

Notably, the GPT (Generative Pre-trained Transformer) architecture has been increasingly applied in molecular design, with models such as MolGPT [25], 3DSMILES-GPT [26], and RM-GPT [27] demonstrating strong capabilities in generating diverse and chemically valid molecules. In parallel, reinforcement learning (RL) has become a key tool for controllable molecular generation, enabling models to optimize structures toward desired properties through reward-guided training [28,29,30]. Building upon these advancements, our study introduces ProLinker-Generator, which integrates GPT with RL strategies to enable targeted PROTAC linker design with enhanced structural diversity and improved property control.

In this study, we developed ProLinker-Generator, a GPT-based method designed to generate novel PROTAC linkers. The model was first pre-trained on large-scale public molecular datasets to build a comprehensive foundation of chemical knowledge. It was then fine-tuned on a specialized linker dataset to enhance its applicability in linker design. During the generation phase, the model is guided by predefined molecular scaffolds and employs RL strategies to optimize and control molecular properties. Through systematic investigation, this study also reveals how different pre-training datasets influence the quality and diversity of the generated linkers. Finally, the effects of different molecular representations (SMILES and SELFIES) and of data augmentation on linker design are discussed, providing theoretical support for the rational design of PROTAC linkers.

2. Method

2.1. Data

The dataset used in this study consists of four sources. First, the following three large-scale open-source datasets were used: ChEMBL [31,32], ZINC [33], and QM9 [34]. We performed several preprocessing steps on these datasets, including the removal of duplicates and null values; constraining molecular length to within 369 characters; retaining only molecules containing the elements H, B, C, N, O, F, Si, P, S, Cl, Se, Br, and I; and excluding charged molecules. After processing, we obtained a subset of 1.05 million molecules from ChEMBL, 1 million molecules from ZINC, and approximately 130,000 molecules from QM9.
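The preprocessing rules listed above can be sketched in a few lines of Python. This is only an illustrative approximation, not the authors' pipeline: the function names (`elements_in`, `is_charged`, `preprocess`) are hypothetical, elements are read from the SMILES text with regular expressions rather than a cheminformatics toolkit, and duplicates are detected by exact string match rather than by canonical structure.

```python
import re

# Elements retained in the pre-training sets, as listed in the text.
ALLOWED = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Se", "Br", "I"}
MAX_LEN = 369  # maximum SMILES length in characters, as stated in the text

# One atom token: a bracket atom, a two-letter halogen, or an organic-subset atom.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*(se|as|[bcnops]|[A-Z][a-z]?)")


def elements_in(smiles):
    """Return the set of element symbols in a SMILES string, or None if unreadable."""
    elements = set()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            if match is None:  # e.g. a wildcard atom such as [*]
                return None
            symbol = match.group(1)
        else:
            symbol = token
        elements.add(symbol.capitalize())  # aromatic "c" -> "C", "se" -> "Se"
    return elements


def is_charged(smiles):
    """Formal charges only appear inside bracket atoms, written with + or -."""
    return any("+" in t or "-" in t for t in re.findall(r"\[[^\]]*\]", smiles))


def preprocess(smiles_list):
    """Drop nulls, duplicates, over-long strings, charged molecules, and disallowed elements."""
    kept, seen = [], set()
    for smiles in smiles_list:
        if not smiles or not smiles.strip():
            continue  # null or empty entry
        smiles = smiles.strip()
        if smiles in seen or len(smiles) > MAX_LEN or is_charged(smiles):
            continue
        elements = elements_in(smiles)
        if elements is None or not elements <= ALLOWED:
            continue
        seen.add(smiles)
        kept.append(smiles)
    return kept
```

In a real pipeline, a toolkit such as RDKit would normally canonicalize each structure before deduplication, so that two different SMILES strings for the same molecule are recognized as duplicates.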

In the pre-training datasets ChEMBL, ZINC, and QM9, there are 131, 1, and 130 molecules, respectively, that overlap with PROTAC-DB. These overlapping molecules are structurally simple and broadly applicable, and together, they account for only 5.97 × 10⁻³%, 5.16 × 10⁻⁵%, and 9.71 × 10⁻²% of each pre-training dataset, respectively. Given their extremely low proportion relative to the total dataset size, their influence on the validity of generated molecules is considered negligible. To maintain the coverage of common chemical motifs and avoid selective bias, these molecules were retained in the pre-training sets.

The linker dataset used in this study was derived from the PROTAC-DB database introduced by Weng et al. We extracted 1501 linker molecules from this source for use in subsequent transfer learning. All molecular properties were calculated using RDKit (version: 2020.09.1.0). An introduction to several of these properties is provided below:

Molecular weight. Molecular weight is the sum of the atomic weights of all atoms in a molecule, reflecting its overall mass.

The Logarithm of the Partition coefficient (LogP). LogP quantifies a compound’s lipophilicity by measuring its distribution between octanol and water phases, serving as a key indicator of membrane permeability and drug-likeness [35].

Synthetic Accessibility Score (SAS). The SAS evaluates the ease of synthesizing a molecule based on its structural complexity and fragment contributions [36].

Quantitative Estimate of Drug-likeness (QED). The QED integrates multiple molecular properties into a single score to assess a compound’s overall drug-like potential [37].

2.2. SMILES Augmentation

SMILES augmentation is a data augmentation technique based on the text representation of molecular structures. Its core principle lies in leveraging the existence of multiple valid SMILES strings for the same molecule. By utilizing cheminformatics tools such as RDKit, the atomic order of molecules is randomly rearranged to generate a large number of non-canonical yet equivalent SMILES representations. This method disrupts the fixed atomic arrangement in molecules, enabling models to encounter diverse topological expressions of the same molecule during training, thereby enhancing the generalization capability for molecular structures. The technique does not require modifying the molecule itself but expands the dataset size through the diversity of the symbolic system, providing richer training samples for deep learning-based molecular property prediction (e.g., LSTM networks) and effectively addressing the challenge of limited molecular dataset sizes. This approach is employed in the present study to mitigate data scarcity issues.

2.3. Model

GPT (Generative Pre-trained Transformer) [38] is a pre-trained language model based on the Transformer architecture [39] that learns to understand and generate natural language by training on large-scale text data [40]; it has also been applied to the domains of chemistry and molecular science [41]. Our ProLinker-Generator model is built upon the MolGPT architecture proposed by Bagal et al. [25], as shown in Figure 1. We further integrated transfer learning and reinforcement learning into the framework. The model consists of 8 stacked transformer blocks and generates molecules through a masked self-attention mechanism. The core computational module employs “Scaled Dot Product Attention”, which operates through the interaction of three types of vectors: Query (Q), Key (K), and Value (V), where d_k denotes the dimensionality of the key vectors.

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(1)
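To make Equation (1) concrete, the following is a minimal pure-Python sketch of masked (causal) scaled dot-product attention for a single head. It is an illustration rather than the model's implementation: the function names are hypothetical, Q, K, and V are plain lists of vectors, and the mask is applied by letting each position attend only to itself and earlier positions, which is equivalent to setting the scores of later positions to negative infinity before the softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V):
    """Masked scaled dot-product attention.

    Q and K are lists of n vectors of length d_k; V is a list of n value vectors.
    Position i only attends to positions 0..i.
    """
    d_k = len(K[0])
    outputs = []
    for i, query in enumerate(Q):
        # Scaled dot products against the visible keys only (causal mask).
        scores = [
            sum(q * k for q, k in zip(query, K[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * V[j][c] for j, w in enumerate(weights))
            for c in range(len(V[0]))
        ])
    return outputs
```

Because the first position can only see itself, its output is exactly its own value vector; later positions return a weighted average of the value vectors seen so far.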

2.4. Benchmark Models

2.4.1. RNN

Recurrent Neural Networks [31] (RNNs) are deep learning models specifically designed for sequential data, making them well-suited for tasks such as molecular generation. By maintaining an internal memory of past inputs through their recurrent architecture, RNNs can effectively capture dependencies between elements in a sequence. In molecular generation, they construct complete structures by iteratively predicting the next character in a molecular representation.

2.4.2. LSTM

Long Short-Term Memory [42] (LSTM) networks are a specialized type of RNNs developed to overcome the vanishing and exploding gradient problems encountered in standard RNNs when processing long sequences. Due to their ability to model long-range dependencies in sequential data, LSTMs have been widely adopted in molecular generation and related fields.

2.4.3. AAE

The Adversarial Autoencoder [43,44] (AAE) is a generative model that integrates adversarial training with autoencoders. Inspired by Generative Adversarial Networks (GANs), it enhances generation capabilities by refining the latent space representation. The AAE first encodes input data into a latent space and then employs adversarial training to ensure the latent distribution matches a predefined prior. This approach enables the model to both reconstruct input data accurately and generate novel samples from the learned latent space.

2.4.4. VAE

The Variational Autoencoder [22] (VAE) is a probabilistic generative model that learns a continuous latent representation of input data. In molecular generation, VAEs encode molecular structures (e.g., graphs or sequences) into a latent space and then reconstruct them using a decoder. By maximizing the evidence lower bound (ELBO) through variational inference, VAEs effectively capture complex molecular features, allowing for the generation of diverse and valid molecular structures.

2.5. Reinforcement Learning

The RL method used in this study is described as follows: The prior model and the agent share the same architecture and vocabulary. Within the RL loop, high-performing compounds are tracked, and a loss function is computed from the feedback of the environment and backpropagated to update the model parameters. Through continuous iterations, the model is guided to generate molecules with the desired properties.

The reward function in RL training is defined as follows: the agent generates SMILES strings token by token, with each token generation treated as an action. Rewards are assigned based on the following predefined criteria:

If the generated SMILES is invalid (e.g., due to syntactic errors), the reward is set to 0 (Equation (4)).
If the SMILES is valid, additional property-based scoring is applied. For LogP, a score of 1 is assigned if the value falls within the target range (1 < LogP < 3); otherwise, the score is 0 (Equation (2)).
For molecular length, a score of 1 is given if the number of atoms is less than 20; otherwise, the score is 0 (Equation (3)).
The final reward is the sum of these individual scores (Equation (4)).

I_{\mathrm{logp}}(\mathrm{SMILES}) = \begin{cases} 1, & \text{if } 1 < \mathrm{LogP} < 3 \\ 0, & \text{otherwise} \end{cases}

(2)

I_{\mathrm{length}}(\mathrm{SMILES}) = \begin{cases} 1, & \text{if } \mathrm{Length} < 20 \\ 0, & \text{otherwise} \end{cases}

(3)

f(\mathrm{SMILES}) = \begin{cases} 0, & \text{if SMILES is invalid} \\ I_{\mathrm{logp}}(\mathrm{SMILES}) + I_{\mathrm{length}}(\mathrm{SMILES}), & \text{otherwise} \end{cases}

(4)

The resulting reward signal, along with the corresponding high-performing compound, is fed back to the agent to guide the learning process. It should be noted that the reinforcement learning approach employed in this study is consistent with the method used in the MCMG framework developed by Wang et al. [28,45].
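Equations (2) to (4) translate directly into a small scoring function. The sketch below is illustrative only: the function names are hypothetical, and it assumes that validity, LogP, and length have already been computed elsewhere (in the paper, properties are calculated with RDKit), so an invalid SMILES is represented here by `None`.

```python
from typing import Optional, Tuple


def logp_indicator(logp: float) -> int:
    """Equation (2): 1 if 1 < LogP < 3, otherwise 0."""
    return 1 if 1 < logp < 3 else 0


def length_indicator(length: int) -> int:
    """Equation (3): 1 if the length is below 20, otherwise 0."""
    return 1 if length < 20 else 0


def reward(props: Optional[Tuple[float, int]]) -> int:
    """Equation (4).

    props is None for a SMILES that could not be parsed; otherwise it is a
    (logp, length) pair computed for the valid molecule.
    """
    if props is None:
        return 0
    logp, length = props
    return logp_indicator(logp) + length_indicator(length)
```

The reward therefore takes one of three values: 0 (invalid, or neither criterion met), 1 (exactly one criterion met), or 2 (both met). Both bounds are strict, so a LogP of exactly 3 or a length of exactly 20 earns no credit for that criterion.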

2.6. Evaluation Metrics

This study uses the following metrics to evaluate the generated results.

Validity. A generated molecule is considered valid as long as it can be parsed by RDKit.

\mathrm{validity} = \frac{\text{the number of valid molecules}}{\text{the number of all generated molecules}}

(5)

Uniqueness. Uniqueness is the proportion of valid generated molecules that are distinct from one another. Molecules that are generated more than once are counted a single time, so the number of unique molecules is the number of different structures in the valid set.

\mathrm{uniqueness} = \frac{\text{the number of unique molecules}}{\text{the number of valid molecules}}

(6)

Novelty. Novelty is the proportion of valid generated molecules that are not present in the training set.

\mathrm{novelty} = \frac{\text{the number of valid molecules that are not in the dataset}}{\text{the number of valid molecules}}

(7)
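Equations (5) to (7) can be computed together in one pass. The following sketch is an assumption-laden illustration rather than the evaluation code used in the study: `generation_metrics` is a hypothetical name, `is_valid` is a caller-supplied predicate (in the paper, validity means the SMILES can be parsed by RDKit), and molecules are compared as strings, which presumes that they have been canonicalized beforehand.

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty as defined in Equations (5)-(7).

    generated    : list of generated SMILES strings
    training_set : set of SMILES strings seen during training
    is_valid     : callable returning True for a parsable SMILES
    """
    valid = [s for s in generated if is_valid(s)]
    n_generated, n_valid = len(generated), len(valid)

    n_unique = len(set(valid))  # distinct valid molecules
    n_novel = sum(1 for s in valid if s not in training_set)

    return {
        "validity": n_valid / n_generated if n_generated else 0.0,
        "uniqueness": n_unique / n_valid if n_valid else 0.0,
        "novelty": n_novel / n_valid if n_valid else 0.0,
    }
```

Note that, following Equation (7), novelty is normalized by the number of valid molecules, so a duplicated novel molecule is counted each time it appears.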

KL Divergence. KL divergence is an asymmetric measure of the difference between two probability distributions. Let P(x) and Q(x) be two probability distributions of a random variable x; for discrete and continuous random variables, the KL divergence is defined by Equations (8) and (9), respectively:

\mathrm{KL}(P, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

(8)

\mathrm{KL}(P, Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx

(9)
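For the discrete case in Equation (8), a minimal sketch is given below. The function name and the use of the natural logarithm are assumptions made for illustration; the two inputs are expected to be probability vectors over the same bins (for example, normalized histograms of a molecular property for generated and reference molecules).

```python
import math


def kl_divergence(p, q):
    """Discrete KL(P, Q) = sum_x P(x) * log(P(x) / Q(x)), using the natural log."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # a term with P(x) = 0 contributes nothing
        if qx == 0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log(px / qx)
    return total
```

Swapping the arguments generally changes the result, which is the asymmetry mentioned in the definition: for P = (0.5, 0.5) and Q = (0.9, 0.1), KL(P, Q) is about 0.511 while KL(Q, P) is about 0.368.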

Fréchet ChemNet Distance (FCD). The FCD compares the distribution of ChemNet features of the generated molecules with that of the molecules in the dataset. Lower FCD values indicate better alignment with the data distribution. Mathematically, the FCD between a generated distribution G and a training data distribution D is defined as follows, where μ and Σ denote the mean vector and covariance matrix of the features of each set:

\mathrm{FCD}(G, D) = \lVert \mu_{G} - \mu_{D} \rVert^{2} + \mathrm{Tr}\left(\Sigma_{G} + \Sigma_{D} - 2\left(\Sigma_{D} \Sigma_{G}\right)^{\frac{1}{2}}\right)

(10)
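The structure of Equation (10) is easiest to see in the special case where both covariance matrices are diagonal, because the matrix square root then reduces to an element-wise square root. The sketch below implements only that simplified case and is not a full FCD: the function name is hypothetical, the inputs are arbitrary feature vectors standing in for ChemNet activations, and the sample variance (n − 1 denominator) is an assumed convention.

```python
import math
from statistics import mean, variance


def frechet_distance_diagonal(features_g, features_d):
    """Equation (10) under the simplifying assumption of diagonal covariances.

    features_g, features_d: lists of equal-length feature vectors (at least
    two vectors each) for the generated set G and the reference set D.
    """
    total = 0.0
    for i in range(len(features_g[0])):
        column_g = [f[i] for f in features_g]
        column_d = [f[i] for f in features_d]
        var_g, var_d = variance(column_g), variance(column_d)
        # Squared difference of the means: contribution to ||mu_G - mu_D||^2.
        total += (mean(column_g) - mean(column_d)) ** 2
        # Diagonal entry of Sigma_G + Sigma_D - 2 (Sigma_D Sigma_G)^(1/2).
        total += var_g + var_d - 2.0 * math.sqrt(var_d * var_g)
    return total
```

Two sets with the same spread but shifted means differ only through the first term, while two sets with the same means but different spread differ only through the trace term. A full FCD would use the complete covariance matrices and a matrix square root.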

3. Results

Figure 2 presents the comprehensive workflow of the ProLinker-Generator framework, which integrates advanced deep learning techniques for the de novo design of PROTAC linker molecules. The framework comprises the following four core stages: pre-training, transfer learning (fine-tuning), RL optimization, and comprehensive evaluation. In the pre-training phase, a Transformer-based GPT model is trained on a large-scale public molecular dataset to capture general chemical patterns and structural features. This is followed by transfer learning, wherein the model is fine-tuned on PROTAC-DB—a domain-specific dataset focused on the PROTAC chemical space—to enhance its understanding of target-domain molecular construction rules. In the third stage, RL is introduced to guide the molecular generation process. Starting from the initial character “C”, the fine-tuned model sequentially generates a complete SMILES string. The generated molecules are evaluated based on the following two criteria: (1) the syntactic validity of the SMILES representation and (2) compliance with predefined chemical or functional requirements, i.e., 1 < LogP < 3 and SMILES length less than 20. These scores are used as feedback to iteratively refine the model’s generation strategy, enabling it to produce molecules that meet the desired design specifications. Finally, a comprehensive evaluation is performed on the model, the generated linker molecules, and the resulting PROTAC architectures, assessing the structural validity, chemical feasibility, functional potential, and quantitative performance metrics. During training, the model employs a self-supervised learning paradigm, predicting the next character in a SMILES sequence until an end-of-sequence token is generated.

By integrating fine-tuning-based generation with reinforcement learning-based policy optimization, the model effectively learns the key structural motifs and property distributions characteristic of effective PROTAC linkers. The final output consists of synthetically viable linkers within the learned chemical space, demonstrating optimized performance while being supported by rigorous quantitative evaluation.

3.1. Molecular Generation Based on Reinforcement Learning

Computational properties, such as LogP, QED, and SAS, can guide the generation of PROTAC linkers. Firstly, the linker length can affect the formation of PROTACs. In the study by Cyrus et al., PROTACs were tested for their ability to degrade endogenous ER-α in MCF7 breast cancer cells. They explored whether the position at which the linker was attached to the pentapeptide was significant [46]. Ultimately, it was found that PROTACs with linker atomic lengths of less than 16 atoms were preferred [47]. Moreover, according to the study by Bemis et al., the LogP value of the linker should typically be maintained within a reasonable range to balance hydrophilicity and lipophilicity, thereby avoiding issues such as molecular aggregation or reduced solubility due to excessive hydrophobicity [48]. High LogP may lead to aggregation or poor membrane permeability, while low LogP may reduce cellular uptake efficiency [46].

In this section, RL [28] was used to regulate linker generation by controlling linker length and LogP values (Figure 3). RL provides guidance to the model through its unique reward-and-penalty mechanism, directing it to generate linkers with desirable properties. Firstly, by constraining the linker’s LogP to the range of 1 < LogP < 3, a clear shift in the lipophilicity distribution of the RL-generated molecules can be observed (Figure 3A). Secondly, by limiting the linker length to fewer than 20 atoms, the length distribution of the generated molecules remains largely within the desired range (Figure 3B). When both strategies are applied simultaneously, similar improvements are observed (Figure 3C). These results collectively demonstrate that reinforcement learning can effectively guide the model to generate linkers with targeted and desirable properties.

We further evaluated the impact of RL on the evaluation metrics for molecular generation (Table 1). The results indicated that the application of the reward-and-punishment mechanism improved the validity of the generated linkers. Meanwhile, the novelty of the molecules remained largely unchanged, suggesting that the model’s ability to explore novel structures was not affected. However, due to the targeted constraints imposed by RL, the available chemical space was somewhat restricted, leading to a decrease in the uniqueness and KL divergence of the generated molecules. This phenomenon is reasonable and is expected when optimization mechanisms are introduced to refine specific properties.

This study employs a three-stage training strategy (pre-training, transfer learning, and reinforcement learning) to develop a GPT model capable of generating efficient and optimized PROTAC linkers. Initially, pre-training on a large molecular database ensures the consistency of SMILES representations. Although the pre-training dataset from ChEMBL contains drug-like molecules, it does not fully meet the specific structural and property requirements for PROTAC linkers. The molecules generated by the pre-trained model exhibit low SAS, indicating ease of synthesis; however, their low QED values and overly complex molecular structures render them unsuitable as linkers.

After fine-tuning the model via transfer learning, the generated molecules, as shown in Figure 4C, demonstrate higher structural similarity to those in PROTAC-DB (Figure 4A). Additionally, these molecules exhibit higher QED values and improved compatibility, although their LogP values remain suboptimal. Specifically, the LogP values of the generated molecules are predominantly distributed between 0 and 1, rarely satisfying the target range from 1 to 3. To address this limitation, reinforcement learning, guided by molecular properties, further refines the model’s performance. Figure 4D highlights a subset of the molecules generated through RL. Compared with the structures in PROTAC-DB, these molecules exhibit similar architectures and successfully fulfill the “linking” function. Moreover, they meet the target LogP requirements while maintaining low SAS values, ensuring ease of synthesis.

3.2. Molecular Generation Based on Transfer Learning

(1): Effect of Data Augmentation

Due to the relative lack of PROTAC Linker data, data augmentation, a technique that expands the training dataset by generating additional samples, was used to provide more data for ProLinker-Generator. In molecular science, data augmentation is typically achieved through transformations of the SMILES representation of molecules, such as atomic rearrangements, isomer introductions, or substructure replacements [49]. These transformations aim to mitigate the challenges posed by the high dimensionality and complexity of chemical space, thereby enabling models to better capture the underlying patterns and distribution characteristics of molecules. According to the results in Table 2, data augmentation significantly outperforms training directly on the small linker dataset. In terms of uniqueness (0.587) and FCD score (6.96), it even surpasses transfer learning. This indicates that data augmentation enables the model to better capture the diverse representations of SMILES, ultimately leading to improved generation performance.

(2): Effect of Different Pre-training Datasets

The source and feature distribution of pre-training data have been shown to be significantly associated with a model’s generalization ability and task-specific performance. To systematically investigate this phenomenon, this study utilized publicly available large-scale chemical datasets from different domains, including ChEMBL, which focuses on bioactivity screening; ZINC, which is oriented toward organic synthesis; and QM9, derived from quantum mechanical calculations [50]. As shown in Table 3, in the transfer learning tasks, pre-training on ChEMBL resulted in the best overall performance for linker generation. However, its performance in terms of novelty was slightly lower (0.968) than that of the other two datasets (0.998). This highlights the influence that different pre-training datasets can have on specific aspects of model behavior [51].

Figure 5 compares the chemical spaces of the original and generated linker datasets using the UMAP algorithm [52,53,54]. Each point in the plot represents a unique linker structure, and the spatial distribution illustrates the overall similarity among structures in high-dimensional chemical feature space. In Figure 5A, which displays the original linkers, the data points are tightly clustered, indicating high structural similarity and limited diversity among existing linkers. This pattern suggests that traditional linker design has primarily explored a narrow region of the available chemical space. In contrast, Figure 5B—showing the linkers generated by our method—reveals a much more dispersed distribution, with many points occupying regions not covered by the original linkers. This broader spread demonstrates that our approach generates structurally diverse and chemically novel linkers beyond conventional design principles. Overall, these results indicate that our method effectively expands the accessible chemical space for linkers, enabling the discovery of new structural motifs with potential functional advantages.

The experimental results demonstrate that the molecular representation significantly influences linker generation performance (Table 4). The SMILES representation (linker_SMILES) exhibits high validity (0.989) and novelty (0.968) but low uniqueness (0.304), indicating limited diversity in the generated linkers; its low KL divergence (0.87) suggests close alignment with the target molecular distribution, whereas its FCD (7.03) is the highest of the three settings. In contrast, the SELFIES representation (linker_SELFIES_1K) achieves optimal validity (1.0) and novelty (1.0), along with significantly improved uniqueness (0.874) and a lower FCD (3.606), albeit with slightly higher KL divergence (0.94). Expanding the dataset to 10K (linker_SELFIES_10K) reduces uniqueness (0.619) but slightly improves KL divergence (0.911) and FCD (3.576), suggesting that more training data improves distributional alignment. While SMILES remains the mainstream choice due to its simplicity and mature toolkits, SELFIES offers unique value in diversity-demanding scenarios.

3.3. Linker Generation for Molecular Docking Examples

To evaluate the practical applicability of ProLinker-Generator, the BTK target system (PDB ID: 6W8I) was selected as a case study to assess the effectiveness of the generated linkers [18]. As shown in Figure 6, the linker predicted by the model was combined with the E3 ligand and warhead to assemble a complete PROTAC molecule. Subsequently, molecular docking simulations between the PROTAC and the target protein were performed using CB-DOCK2 to assess its binding affinity and overall feasibility [55].

The results show that the generated linker fully satisfies the constraints defined during the reinforcement learning phase, including a LogP value between 1 and 3 and a molecular length of fewer than 20 atoms. In addition, the compound displays favorable drug-like properties, with a QED score of 0.434 and an SAS of 1.959. Notably, the final molecular docking score of −11.8 kcal/mol outperforms the reference value of −10.9 kcal/mol from the known BTK structure (PDB ID: 6W8I) [18,56], indicating enhanced binding potential between the PROTAC and target protein. Collectively, these findings support the rationality and effectiveness of the generated linker.

4. Discussion

Linkers play a critical role in determining the degradation efficacy of PROTACs. In this study, we employed both transfer learning and reinforcement learning to design linkers. In the transfer learning stage, we examined the effects of data augmentation and of different pre-training datasets, and reinforcement learning was then used to optimize linker properties through its iterative reward mechanism. A transformer-based GPT model was primarily used in this study. As shown in Figure 7, compared to a range of existing generative models, GPT achieved the best evaluation results, consistently ranking at the top of all curves. This suggests that our approach outperforms mainstream generative models, particularly in terms of the effectiveness and innovation of the generated linkers. However, RNN [57,58] and AAE have certain advantages in the diversity and structural similarity of the generated linkers. Overall, the GPT-based approach enabled us to design more effective, unique, and novel linkers, thereby enriching the chemical space of linkers and providing a more diverse candidate pool for novel linker design.

As shown in Figure 8, we present several examples of linkers generated using the method developed in this study. PROTAC linkers are primarily based on PEG chains, and the linkers displayed on the left side of the figure are all PEG-based designs generated by our model. The linkers on the right side contain maleimide groups, which are suitable for stable conjugation. Our approach provides structural diversity for PROTAC linker design.

5. Conclusions

In the development of PROTAC linkers, the stable formation of the ternary complex relies on the precise structural compatibility of the linker. This study proposes a generative AI-based strategy that systematically explores and expands the chemical space of linkers by integrating cross-domain chemical knowledge bases (such as ChEMBL, ZINC, and QM9) with transfer learning and reinforcement learning approaches. The model optimizes molecular properties through a reward-driven mechanism, breaking free from the constraints of traditional design based on empirical rules and significantly enhancing the structural novelty and functional diversity of the generated linkers. Compared to manual iterative design, this approach can reduce the linker development cycle by several-fold, while the integrated automatic evaluation modules (e.g., efficacy and LogP calculation) ensure the theoretical drug-likeness of the candidate molecules.

Although generative models demonstrate highly efficient design capabilities, their performance remains limited by the breadth and quality of the training data. To address this, future efforts should focus on building high-quality molecular databases that incorporate more experimentally validated PROTAC cases and dynamic structure–activity relationship (SAR) data, thereby enhancing the model’s adaptability to complex chemical spaces. In addition, the introduction of a multi-objective reinforcement learning framework can enable the simultaneous optimization of linker structure, properties, and stability. With the advancement of computational technologies, more data-driven molecular features can be developed, ultimately driving PROTAC linker design toward greater intelligence and precision.

Author Contributions

Conceptualization, D.S., C.Z. and A.S.; Methodology, Y.L. and D.S.; Software, Y.L.; Validation, Y.L. and D.S.; Investigation, A.S.; Resources, A.S.; Data curation, Y.L.; Writing—original draft, Y.L.; Writing—review & editing, D.S. and C.Z.; Visualization, Y.L. and C.Z.; Project administration, A.S.; Funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Zhejiang Province Science and Technology Plan Project under Grant No. 2022C01179, the National Natural Science Foundation of China under Grant No. 22108252, and the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China under Grant No. LHDMZ23B060001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The data can be found at https://github.com/su-group/ProLinker-Generator (accessed on 10 May 2025).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1. GPT model architecture.

Figure 2. Workflow of the molecular generator, including pre-training, fine-tuning, reinforcement learning, and evaluation.

Figure 3. Linker generation with RL controlling (A) 1 < LogP < 3; (B) SMILES_Length < 20; and (C) both 1 < LogP < 3 and SMILES_Length < 20 simultaneously.

Figure 4. (A) Linker in PROTAC-DB; (B) linker-like candidate molecules generated by the model after pre-training; (C) linker-like candidate molecules generated by the fine-tuned models; and (D) linker-like candidate molecules generated by models guided by reinforcement learning.

Figure 5. Chemical space map of the existing and generated linkers: (A) existing linkers only; (B) with the generated linkers included.

Figure 6. Docking results in the BTK system for PROTAC molecules containing linkers generated by ProLinker-Generator. The cyan, blue, and red stick models represent the listed PROTAC molecules. The gray region represents the protein (PDB ID: 6W8I), and the green region highlights residues within 5 Å of the ligand; the stick model extending outside the green region is the PROTAC molecule.

Figure 7. Performance of state-of-the-art molecular generation models and our models in the generation of PROTAC linkers.

Figure 8. Examples of generated linkers: PEG-based linkers (left) and maleimide-based linkers (right).

Table 1. Evaluation of the linkers from reinforcement learning-based generation.

RL setting	Validity	Uniqueness	Novelty	KL	FCD
Length-controlled	0.995	0.249	0.976	0.579	14.6
LogP-controlled	0.998	0.157	0.921	0.686	11.5
Two properties controlled	0.995	0.129	0.931	0.466	17.5
Without RL	0.989	0.404	0.968	0.879	7.48

Table 2. Effect of data augmentation on linker design.

Model	Validity	Uniqueness	Novelty	FCD	KL
Linker_Only	0.146	0.973	1.00	15.63	0.71
Linker_Aug	0.848	0.587	0.973	6.96	0.86
Linker_TL	0.989	0.404	0.968	7.48	0.88

Table 3. Effect of different pre-training datasets.

Pre-training dataset	Validity	Uniqueness	Novelty	KL	FCD
ChEMBL	0.989	0.404	0.968	0.88	7.48
ZINC	0.979	0.333	0.998	0.86	7.71
QM9	0.975	0.331	0.998	0.86	7.21

Table 4. Different molecular representations.

Representation	Validity	Uniqueness	Novelty	KL	FCD
linker_SMILES	0.989	0.304	0.968	0.87	7.03
linker_SELFIES_1K	1.0	0.874	1.0	0.94	3.606
linker_SELFIES_10K	1.0	0.619	1.0	0.911	3.576
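
As a rough guide to how the validity, uniqueness, and novelty scores reported in Tables 1–4 are typically computed, the sketch below follows the commonly used MOSES-style definitions. This is an assumption: the definitions used in this study (Section 2.6) may differ in detail. The function name generation_metrics and the injected is_valid checker are hypothetical, and all SMILES strings are assumed to be canonicalized beforehand so that string equality means molecular identity.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty for generated SMILES.

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    # Toy inputs; "bad" stands in for a string a real SMILES parser rejects.
    scores = generation_metrics(
        generated=["CCO", "CCO", "CCN", "bad", "CCC"],
        training=["CCC"],
        is_valid=lambda s: s != "bad",
    )
    print(scores)
```

Under these definitions, a low uniqueness combined with high validity and novelty, as seen in the reinforcement learning rows of Table 1, indicates that the model repeatedly produces a small set of valid, previously unseen molecules.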

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

