A Machine Learning-Enabled Venom Peptide Platform for Rapid Drug Discovery

Cai, Fei; Zhou, Lijuan; Delgado, Bryce; Chang, Wenping; Tom, Jeffrey; Hernandez, Evelyn; Joshi, Prajakta; Song, Aimin; Masureel, Matthieu; Maun, Henry R.; Chang, Andrew; Zhang, Yingnan

doi:10.3390/ph19020288

by Fei Cai ¹,†, Lijuan Zhou ¹,†, Bryce Delgado ², Wenping Chang ³, Jeffrey Tom ³, Evelyn Hernandez ⁴, Prajakta Joshi ⁴, Aimin Song ³, Matthieu Masureel ², Henry R. Maun ¹, Andrew Chang ⁵,* and Yingnan Zhang ¹,*

¹ Department of Biological Chemistry, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA

² Department of Structural Biology, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA

³ Department of Peptide Therapeutics, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA

⁴ Department of BioMolecular Research, Genentech, 1 DNA Way, South San Francisco, CA 94080, USA

⁵ DeepSeq.AI, 500 Lincoln Centre Drive, Building B, Suite 110, Foster City, CA 94404, USA

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Pharmaceuticals 2026, 19(2), 288; https://doi.org/10.3390/ph19020288

Submission received: 20 December 2025 / Revised: 2 February 2026 / Accepted: 6 February 2026 / Published: 9 February 2026

(This article belongs to the Special Issue Peptide Synthesis and Drug Development: Exploring Progress and Potential)

Abstract

Background/Objectives: Nature has evolved millions of venom-derived peptides with diverse biological functions, a substantial fraction of which target complex membrane proteins such as G-protein-coupled receptors and ion channels. Many of these peptides are stabilized by multiple disulfide bonds, endowing them with exceptional structural stability and favorable pharmacological properties. Methods: Leveraging this natural diversity, we developed a robust venom peptide therapeutics discovery system built on phage display technology and constructed a library using approximately 482 venom-derived scaffolds. The library design was guided by a machine learning (ML) model capable of predicting mutation-tolerant residues that preserve peptide foldability, maximizing structural integrity and sequence diversity. Results: The resulting VCX library was evaluated through screening against four diverse targets (CD47, DLL3, IL33, and P2X7R), yielding strong binders for all four, a success rate of 100%. Furthermore, by integrating high-throughput recombinant expression of thioredoxin–venom fusion proteins along with ML-assisted affinity maturation, we rapidly identified potential leads for DLL3 binders. Conclusions: This venom-based discovery platform offers significant advantages in both functionality and developability compared with conventional peptide discovery approaches. By combining natural structural diversity, ML-guided design, and recombinant expression, it enables efficient identification of “antibody-like” binders with molecular weights much smaller than those of antibodies. Consequently, it provides a powerful strategy for developing next-generation peptide therapeutics targeting challenging protein–protein interactions and complex membrane proteins.

Keywords: venom peptide library; phage display; machine learning

1. Introduction

Animal venoms represent one of the most chemically diverse and biologically validated natural sources for drug discovery. Over millions of years of evolution, venomous species, including snakes, scorpions, spiders, cone snails, and centipedes, have developed sophisticated molecular arsenals that are primarily composed of disulfide-rich peptides. These peptides have evolved to precisely and potently modulate ion channels, receptors, and enzymes in prey or predators, often with remarkable target selectivity and nanomolar binding affinities [1,2]. This evolutionary optimization makes venom peptides a unique reservoir of bioactive scaffolds for therapeutic development. Unlike traditional small molecules, which typically fail to engage large and flat protein–protein interaction (PPI) surfaces, venom peptides can mimic protein epitopes through defined three-dimensional folds stabilized by multiple disulfide bonds. As a result, they are especially suited for targeting “undruggable” proteins such as GPCRs, ion channels, and cytokine receptors. Several clinically approved drugs have originated from venom peptides, demonstrating their translational potential. Examples include Captopril, a snake venom-derived ACE inhibitor used to treat hypertension [3]; Exenatide, a GLP-1 receptor agonist derived from Gila monster venom [4]; and Ziconotide, a marine cone snail-derived peptide that targets N-type calcium channels and is used to manage chronic pain [5]. Although venom peptides exhibit remarkable potency and selectivity toward their native targets, such specificity limits their direct therapeutic applicability. To harness their unique structures, we must first decouple them from their natural binding roles and redesign the scaffolds to explore new target spaces. In this context, the next challenge is engineering venom-derived frameworks into diverse, stable peptide libraries that can be used to recognize specific therapeutic targets.

Utilizing natural venom scaffolds for library construction poses a challenge due to their structural complexity and sensitivity to mutation. Random diversification often disrupts proper folding, leading to nonfunctional sequences. We addressed this by applying AI-based foldability prediction, which identified mutation-tolerant positions on each scaffold [6]. This ensured high library diversity while maintaining correct folding, resulting in a library enriched for structurally stable and functionally viable peptides.

Most venom peptides contain multiple disulfide bonds that confer structural stability and functional specificity, classifying them as disulfide-constrained peptides (DCPs). A major challenge in using a DCP platform for drug discovery is its poor developability, primarily due to the reliance on chemical peptide synthesis during the hit-to-lead process [7]. Synthetic DCPs often exhibit low yields and high production costs due to the complexity of oxidative folding and purification. To overcome these limitations, we recombinantly expressed venom peptides as thioredoxin (Trx)-fusion proteins in E. coli, enabling high-throughput production, purification, and analysis of correctly folded peptides.

In this study, we combined AI-guided venom library design and recombinant expression of venoms as Trx-fusion proteins, allowing for efficient screening. This platform successfully yielded binders to four distinct targets—including an ion channel—demonstrating a 100% success rate in finding hits. Furthermore, a generalized ML-assisted affinity maturation strategy was employed, enabling rapid identification of leads from the hits. This integrated framework provides a scalable and efficient path for venom-based peptide drug discovery.

2. Results

2.1. Venom—Conotoxin Library Design

To fully leverage venom peptides as a drug modality, our goal was to design a library that maximizes the proportion of properly folded peptides while achieving substantially greater scaffold sequence diversity than previously reported DCP libraries [7].

To identify highly diverse scaffolds, we first mined two manually curated venom databases: VenomZone [8] and ConoServer [9,10], which contain 7794 and 3073 entries, respectively. From VenomZone, we selected peptides shorter than 45 amino acids that contain three disulfide bonds. Each sequence was analyzed using a previously developed machine learning (ML) model [6] capable of predicting the key residues that are critical for peptide foldability. Based on these predictions, we identified residues that could be mutated without compromising foldability and are therefore amenable to diversification (“amenable residues”), and we calculated a Diversity Score using the formula:

\text{Diversity Score} = \frac{\text{Number of amenable residues}}{\text{Total length of the peptide sequence}}

A higher Diversity Score indicates that the scaffold is more amenable to library diversification. The top 391 scaffolds from VenomZone were selected for library construction (Table S1).
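The scoring and ranking step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Scaffold` container, the function names, and the toy sequences are hypothetical, and the amenable positions are assumed to come from the foldability model [6], which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    name: str
    sequence: str
    # 0-based indices predicted to tolerate mutation (assumed to be supplied
    # by an external foldability model; not computed here).
    amenable_positions: frozenset


def diversity_score(scaffold):
    """Number of amenable residues divided by the peptide length."""
    return len(scaffold.amenable_positions) / len(scaffold.sequence)


def top_scaffolds(scaffolds, n):
    """Return the n scaffolds with the highest Diversity Score."""
    return sorted(scaffolds, key=diversity_score, reverse=True)[:n]


# Hypothetical toy scaffolds, for illustration only.
toy = [
    Scaffold("toy_A", "GCAKCSRCLG", frozenset({0, 2, 3, 5, 6})),
    Scaffold("toy_B", "ACDCECFCGH", frozenset({0, 2})),
]
best = top_scaffolds(toy, 1)
```

In this toy case `toy_A` has five amenable residues out of ten, so it ranks above `toy_B`, which has two out of ten.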

From the ConoServer dataset, peptides shorter than 45 amino acids and containing 2–4 disulfide bonds were selected, and 91 representative sequences served as scaffolds for library construction (Table S1).

For each scaffold, amenable positions were identified using the foldability ML model [6]. A simulated library was then generated by randomizing these positions at a 1:1 ratio—50% retained the original residue and in 50%, the residue was replaced with any of the other amino acids except cysteine. The same ML model was applied to predict foldability scores for each sequence [6]. The top-scoring sequences were included in the library (around 1000 per scaffold for VenomZone and around 2500 per scaffold for ConoServer), resulting in a final library of approximately 6 × 10⁵ unique members across 482 distinct peptide scaffolds, all predicted to possess high foldability.
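The randomize-then-filter procedure described above might look roughly like the following sketch. The function names, the number of draws, and the `score_fn` argument are assumptions made for illustration; `score_fn` stands in for the foldability model [6], which is not implemented here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NON_CYS = [aa for aa in AMINO_ACIDS if aa != "C"]


def randomize(scaffold, amenable_positions, rng, keep_prob=0.5):
    """Mutate each amenable position with probability 1 - keep_prob.

    A mutated position receives any residue other than the original one
    and other than cysteine.
    """
    residues = list(scaffold)
    for pos in amenable_positions:
        if rng.random() >= keep_prob:
            choices = [aa for aa in NON_CYS if aa != residues[pos]]
            residues[pos] = rng.choice(choices)
    return "".join(residues)


def simulate_library(scaffold, amenable_positions, score_fn,
                     n_draws, n_keep, seed=0):
    """Draw variants, deduplicate them, and keep the top-scoring ones."""
    rng = random.Random(seed)
    variants = {randomize(scaffold, amenable_positions, rng)
                for _ in range(n_draws)}
    return sorted(variants, key=score_fn, reverse=True)[:n_keep]


# Hypothetical scaffold and a dummy scorer; a real run would pass the
# foldability model's prediction function as score_fn.
library = simulate_library("GCAKCSRCLG", [0, 2, 3],
                           score_fn=lambda seq: 0.0,
                           n_draws=200, n_keep=50)
```

Cysteines and all non-amenable positions are left untouched, so every variant keeps the disulfide framework of its parent scaffold.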

This Venom–Conotoxin Library (designated as VCX library) was benchmarked against a previously developed DCP platform based on seven scaffolds [7]; each scaffold had a defined set of lengths and a defined subset of positions amenable to randomized amino acid substitution (excluding cysteine), as previously described. Our simulated DCP library used the same fixed scaffold and randomization strategy and was adjusted to be the same size (6 × 10⁵) for a fair comparison of foldability and diversity profiles. Figure 1a presents the normalized distribution of predicted foldability scores for the two designed libraries: VCX (top left) and DCP (top right). The VCX library exhibited a highly concentrated distribution with an overall higher predicted foldability score (25% quantile: 0.8083; 50% quantile: 0.8606 [95% CI: 0.8604, 0.8608]; 75% quantile: 0.9039). This suggests that the sequences in the VCX library are, on average, predicted to be well-folded. In contrast, the DCP library showed a broader distribution (25% quantile: 0.5537; 50% quantile: 0.7028 [95% CI: 0.7024, 0.7032]; 75% quantile: 0.8025). A Mann-Whitney U test further confirmed that the VCX library foldability scores are significantly higher than those of the DCP library (p << 0.001). This indicates that the VCX library is significantly enriched for high-foldability sequences.
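As an illustration of the statistics quoted above, the sketch below computes quartiles and a Mann-Whitney U test using only the Python standard library. It assumes a one-sided alternative and a normal approximation without tie or continuity correction; the source does not state which variant was used, and the score lists are hypothetical toy values rather than the published data.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """One-sided test that x tends to exceed y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u1, p_value


# Hypothetical foldability scores for two small libraries.
vcx_scores = [0.80, 0.85, 0.90, 0.95]
dcp_scores = [0.50, 0.60, 0.70, 0.75]

quartiles = statistics.quantiles(vcx_scores, n=4)  # 25%, 50%, 75% cut points
u_stat, p = mann_whitney_u(vcx_scores, dcp_scores)
```

With libraries of roughly 6 × 10⁵ members each, the normal approximation is well justified, although a library-scale analysis would more likely rely on an established statistics package.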

Figure 1b illustrates the distribution of pairwise sequence similarities within each library (VCX on the left, DCP on the right). This distribution was used to assess the inherent global sequence diversity of each library: a lower similarity score indicates greater sequence diversity, which is desirable for comprehensive screening and ML applications. The VCX library’s similarity scores displayed a relatively normal distribution (mean: 0.35; standard deviation: 0.09). This suggests a consistent and high level of sequence diversity across the entire library. In contrast, the DCP library’s scores exhibited a distinct bimodal distribution (mean: 0.45; standard deviation: 0.19). This bimodal pattern arises because the DCP library is composed of sequences derived from only seven scaffolds, among which, some scaffolds have limited sequence variation while others have extensive mutations. This results in an imbalanced representation of the sequence space and higher similarity between sequence subsets.

Uniform Manifold Approximation and Projection (UMAP) was applied to all sequences from both libraries to visualize their relative distributions in a shared latent space (Figure 1c). The VCX sequences formed a single, diffuse, and interconnected cluster, indicating that they broadly sample a contiguous region of sequence space. This diffuse topology aligns with the higher sequence diversity observed in Figure 1b, suggesting that the VCX library explores a wide and well-connected spectrum of sequence features. In contrast, the DCP sequences were more fragmented, forming several smaller and denser clusters corresponding to distinct scaffold families. This clustered structure likely reflects the library’s origin from a limited number of parent scaffolds, some of which are extensively diversified while others have low variation. The clear separations between clusters represent regions of sequence space that were not sampled by the DCP library, implying less continuous coverage. Overall, the topology of the joint embedding reveals that VCX sequences interconnect more smoothly across the manifold, whereas DCP sequences remain partitioned by scaffold lineage. In conclusion, this analysis demonstrates that the VCX library provides a more foldable, evenly distributed, and diverse representation of the sequence landscape.

2.2. High-Throughput Recombinant Expression of Venom Peptides

In our previous drug discovery efforts using the DCP platform, the main bottleneck arose at the peptide synthesis stage, which is required for hit validation and characterization. Each round of synthesis typically took several months, extending the overall hit-to-lead process to nearly two years. This limitation was compounded by the high cost of peptide synthesis, significantly restricting the number of hits that could be synthesized and evaluated. To reduce both time and expense in the hit-to-lead workflow, we aimed to develop a high-throughput recombinant expression system to efficiently triage peptide hits derived from the venom library.

Previously, Schwalen et al. reported that the venom peptide PtuI could be recombinantly expressed as a Trx-fusion protein in T7 Shuffle E. coli cells and obtained at high yields [11]. Therefore, we assessed the suitability of this expression system for potential integration into our VCX platform. We began by evaluating the method on a panel of PtuI variants representing diverse foldability profiles. The set included sixteen variants in total: six with low to moderate foldability scores (0.4–0.9) and ten with high scores (>1.0). Each variant was fused to the C-terminus of Trx–His₆-ENLYFQG (TEV cleavage site), expressed in T7 Shuffle E. coli, and purified using Ni–NTA affinity chromatography. Expression analysis of the Trx-fusion proteins showed that four of the six low-to-moderate foldability variants had low yields (<20 mg/L of culture) compared to two of the ten high-foldability variants. This trend was also evident in the Ni–NTA elution profiles analyzed using SDS–PAGE (Figure S1a). LC–MS confirmed the expected four-disulfide-bond (three from the PtuI variants and one from the Trx protein) products in 15 of the 16 Trx-fusion proteins (Table S2). The high-yield constructs showed sharp, symmetric LC–UV peaks, while low-yield constructs produced smaller, broader peaks (Figure S1a).

We then adapted this method for peptides from the VCX library and miniaturized the workflow into a high-throughput recombinant expression format. Expression was performed in duplicate 5 mL cultures in 24-well plates, followed by cell lysis using the BugBuster system [12] and purification with 96-well IMCSTips on a liquid-handling platform. Using this system, 689 Trx–VCX proteins with a median foldability score of 1.01 were expressed and purified: 77% produced a yield >20 mg/L (16% exceeded 100 mg/L) and ~23% were low-yield constructs (Table S3, Figure 1d). We observed that the low-yield variants frequently formed inclusion bodies, suggesting aggregation or limited solubility.

We next examined whether protein yield correlated with the hydrophobicity of the VCX part of the Trx–VCX fusion proteins. First, a sequence-only hydrophobicity index was calculated [13]:

\text{Hydrophobicity Index} = \frac{\sum (\text{Kyte-Doolittle scale for each AA})}{\text{Length of the sequence}}

Second, we determined a structure-based index that captures exposed hydrophobic patches (surface hydrophobicity). For each sequence, a 3D model was generated using ESMFold [14]; residues with appreciable side-chain exposure were designated as surface, nearby surface residues were grouped into patches, and exposure-weighted hydropathy was aggregated and length-normalized [15].

The protein yield showed stronger correlation with the structure-aware surface hydrophobicity (R = 0.52) compared to the sequence-only hydrophobicity index (R = 0.28) (Figure S1b). Because the surface hydrophobicity depends on the 3D arrangement of exposed side chains, this correlation, albeit moderate, is consistent with highly expressed proteins adopting well-folded conformations rather than existing as unfolded random peptides. Further correlation with ESMFold confidence metrics is shown in Figure S2.
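The sequence-only hydrophobicity index defined above is straightforward to compute. The sketch below uses the standard Kyte-Doolittle hydropathy values; the function name is an illustrative choice, and the structure-based surface index is not reproduced because it depends on ESMFold models and patch definitions not given here.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_index(sequence):
    """Mean Kyte-Doolittle hydropathy over all residues of the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Parent sequence of DLL3-g3n02 as given in Section 2.4.
index = hydrophobicity_index("QWPFQQWIPCTIHWNCDGNWCCFPITCYEQTGMCD")
```

Because this index averages over the whole chain, it cannot distinguish buried from exposed hydrophobic residues, which is one plausible reason it correlates less well with yield than the surface-based measure.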

2.3. Validation of the VCX Library by Panning Against Target Proteins

We evaluated the VCX library by panning against four target proteins of therapeutic interest that share minimal sequence homology: the extracellular domain of CD47; an N-terminal extracellular fragment of DLL3 comprising the C2 and DSL domains; IL33; and P2X7R. The native ligands for CD47, DLL3, and IL33 are SIRPα, Notch, and ST2, respectively [16,17,18], and their interactions occur primarily through protein–protein interaction (PPI) surfaces. In contrast, P2X7R is a multi-span membrane ion channel activated by ATP as its natural ligand; it is crucial for immune responses, inflammation, and neurotransmission [19].

Many therapeutic antibodies have been developed to target the PPIs of CD47, DLL3, and IL33 [16,17,18], but there are few small-molecule or peptide ligands for these proteins. Therefore, we selected these three proteins to assess the ability of our VCX library to identify binders of shallow, often considered relatively “undruggable”, PPI surfaces. P2X7R was included to evaluate the library’s potential for discovering ligands for complex, multi-span membrane proteins.

The designed VCX library was displayed on the M13 phage surface through both the minor coat protein p3 and the major coat protein p8. CD47 and DLL3 were screened against both p3- and p8-displayed libraries, while IL33 and P2X7R were screened using the p3-displayed library only. Phage selections were performed following standard phage panning protocols [20,21] and a modified approach to define positive hits. Instead of identifying binders through spot phage ELISA, we used next-generation sequencing (NGS) to rank clone abundance after the final panning round. The top 100 sequences were subcloned into Trx-fusion expression vectors, expressed, and purified using high-throughput methods. The binding affinities (K_d) of the purified Trx–VCX fusions were measured using surface plasmon resonance (SPR). Sequences showing measurable binding were defined as positive hits. This process is illustrated as a funnel scheme in Figure S3.
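The abundance-ranking step that replaces spot phage ELISA can be sketched as follows. This assumes the NGS reads have already been translated and filtered into peptide sequences; the function name and the toy reads are hypothetical.

```python
from collections import Counter


def top_clones(peptide_reads, n=100):
    """Rank peptide sequences by read count after the final panning round.

    Returns (sequence, count, fraction_of_reads) tuples for the n most
    abundant sequences.
    """
    counts = Counter(peptide_reads)
    total = sum(counts.values())
    return [(seq, count, count / total)
            for seq, count in counts.most_common(n)]


# Hypothetical translated reads, for illustration only.
reads = ["GCAKCSRCLG"] * 3 + ["ACDCECFCGH"] * 2 + ["WCTKCPRCLA"]
ranked = top_clones(reads, n=2)
```

Read abundance is only a proxy for binding, which is why the workflow above still confirms each selected clone by SPR.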

Positive binders were identified for all four targets, achieving a 100% success rate in the primary panning. These binders exhibited substantial sequence and scaffold diversity (Table 1, Table S4). When tested as Trx-fusion proteins, most primary hits showed K_d affinity values in the micromolar range for CD47, DLL3, and IL33, consistent with binding to shallow protein–protein interaction surfaces. In contrast, the hits for P2X7R displayed affinities in the nanomolar range, underscoring the strong potential of the VCX library for discovering ligands targeting multi-span membrane proteins.

We selected several representative binders for each target for solid-phase peptide synthesis (SPPS) and compared their binding affinities with those of the corresponding Trx–VCX proteins. The peptides were synthesized in linear form and subsequently subjected to in vitro oxidative folding to form the disulfide bonds. The final products were analyzed by LC–MS to confirm the presence of the expected number of disulfide bonds (Supplementary Materials s1, LC–MS). As shown for one example IL-33 ligand peptide in Figure S4a, the observed molecular mass was approximately 6 Da lower than that of the corresponding linear peptide, consistent with the formation of three disulfide bonds. In addition, each peptide displayed a single, well-resolved peak in the liquid chromatogram, indicating successful folding into a homogeneous species with a single dominant disulfide connectivity. The functional importance of these disulfide bonds is demonstrated in Figure S4b: peptide binding to IL-33 was almost abolished when the peptide was pre-treated with 5 mM TCEP at 60 °C overnight, conditions that reduce disulfide bonds, confirming that the disulfide-bonded structure is indispensable for IL-33 binding. Table 2 summarizes the K_d values measured for both forms. For most clones, SPR sensorgrams showed highly similar binding profiles between the two forms (Figure 2a). Although the absolute K_d values occasionally differed, they displayed a strong overall correlation (R = 0.711; Figure 2b). This finding indicates that K_d measurements obtained from Trx–VCX fusion proteins reliably predict peptide binding affinity, allowing for efficient ranking and triaging of clones prior to peptide synthesis. This strategy significantly reduces the need for chemical synthesis in early discovery stages, enabling broader screening of sequence diversity at the primary panning stage.
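The mass shift of about 6 Da follows directly from the chemistry of disulfide formation: each bond joins two cysteine thiols and releases two hydrogen atoms. As a short worked check, using the average atomic mass of hydrogen (about 1.008 Da):

```latex
\Delta m = n_{\mathrm{SS}} \times 2 \times m_{\mathrm{H}}
         = 3 \times 2 \times 1.008\ \mathrm{Da}
         \approx 6.05\ \mathrm{Da}
```

Here n_SS is the number of disulfide bonds, so a fully oxidized peptide with three disulfides is expected to be roughly 6 Da lighter than its reduced linear form.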

2.4. ML-Assisted Affinity Maturation Strategy for DLL3

To advance peptide binders into viable drug leads, their binding affinities must typically reach the low-nanomolar range. Since the primary hits targeting protein–protein interaction (PPI) surfaces generally exhibited affinities in the micromolar range, an affinity maturation process was necessary. Traditional affinity maturation relies on iterative directed evolution over multiple generations [7]; therefore, it can take years to improve primary hits’ affinity from the micromolar to sub-nanomolar range. To shorten this timeline, we developed an ML-assisted affinity maturation strategy that requires fewer evolutionary cycles and can be adapted as a general protocol. Unlike the target-agnostic foldability model, this affinity model operates as a supervised learner, trained specifically on the experimental enrichment data to learn the precise peptide-target binding relationships (model details described in Section 4.8).

We used DLL3 as an example target to demonstrate this ML-guided hit-to-lead workflow. From the primary panning, 21 binders were identified with K_d values ranging from 6.6 to 185 µM (measured as Trx-VCX proteins). Three representative sequences, DLL3-g3n02, DLL3-23, and DLL3-g3n19 (Table S4), were selected for optimization. These peptides expressed well as Trx-VCX proteins (with yields of 29.7, 37.9, and 20.7 mg/L, respectively) and exhibited moderate affinities (K_d = 55.4, 16.4, and 17.5 µM, respectively).

We applied an NNK-block screening strategy, an extended version of site-saturation mutagenesis scanning. Each NNK block represents three consecutive randomized amino acids, and two blocks were scanned combinatorially across each sequence (excluding cysteines), as exemplified by peptide DLL3-g3n02 in Figure 3a. For this 35-residue peptide with six cysteines, ten blocks were scanned, resulting in 45 block-pair combinations (C(10,2) = 45) that can be encoded by 45 oligonucleotides (Table S5). The theoretical diversity of the resulting library (~3 × 10⁹) fits within the coverage of a standard phage display library [20]. The designed library exhibited a ~20% permutation rate, as illustrated by the sequence logo of one simulated NNK-block library (Figure 3b). The NNK-block screening libraries for the other two sequences were constructed following a similar scheme.
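The combinatorics behind these numbers can be checked with a short sketch. It takes the number of blocks (ten) as given and counts diversity at the amino acid level (20 residues per NNK codon); the function names are illustrative, and the actual placement of blocks along the peptide is not reproduced.

```python
from itertools import combinations
from math import comb


def block_pairs(n_blocks):
    """All unordered pairs of block indices scanned together."""
    return list(combinations(range(n_blocks), 2))


def theoretical_diversity(n_blocks, block_len=3, alphabet=20):
    """Block pairs multiplied by amino acid combinations per pair."""
    return comb(n_blocks, 2) * alphabet ** (2 * block_len)


pairs = block_pairs(10)
print(len(pairs))                 # 45 block pairs, one oligonucleotide each
print(theoretical_diversity(10))  # 2880000000, i.e. ~3 x 10^9
```

Each oligonucleotide therefore encodes one block pair with six randomized codons, and the 45 oligonucleotides together account for the stated diversity of about 3 × 10⁹.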

The NNK-block libraries derived from the three VCX peptides were constructed as p3-displayed M13 phage libraries and panned against a DLL3 fragment. The NGS data revealed robust sequence enrichment. The resulting enrichment score heatmaps [7], as exemplified by the heatmap of peptide DLL3-g3n02 in Figure 3c, highlight clear sequence–affinity relationship patterns in the NNK-block screen results, providing valuable information for subsequent ML model training.

To generate a focused library enriched with high-affinity binders, an ML model must learn to predict binding using selection data. Predictive power depends not only on dataset size, but on where and how densely the sequence space is sampled around relevant binders. As shown in Figure 4, the NNK-block screening strategy provides an additional advantage over broad library screening by producing locally dense and target-specific datasets, which carry richer sequence–activity signals, thereby improving model training and downstream predictions.

Figure 4a compares the sequence space coverage of three libraries for DLL3: the initial VCX library (left), the NNK-block library derived from three representative VCX hits (middle), and the ML-predicted library (right). The 35 experimentally characterized binders (teal dots) are composed of 21 primary binders and 14 clones derived from the NNK-block screen. Some of these binders, which had a range of affinities (K_d values ranging from 6 to 200 µM), lay near the periphery of the main VCX sequence cloud. This indicates that the initial VCX screen sampled a portion of the relevant binding regions. In contrast, the NNK-block library was constructed via combinatorial block randomization, which, despite originating from only three parent hits, resulted in dense but broad sampling. Critically, the sequence population of the NNK-block library effectively encompassed the majority of the 35 characterized binders, demonstrating the method’s ability to efficiently cover additional space surrounding the primary hits.

To assess how these data affect the binding affinity prediction, we trained two separate models: one on the VCX panning data and one on the NNK-block panning data. Their predictive accuracy was assessed by correlating the predicted binding score against the SPR-measured K_d values of the 35 clones (Figure 4b). The model trained on the initial VCX library (Figure 4b, left panel) showed a modest rank correlation (Spearman ρ = 0.33), consistent with the VCX library’s role as an initial explorer that provides fewer informative contrasts for learning fine-grained sequence–affinity relationships. Conversely, the model trained on the NNK-block library (Figure 4b, right panel) achieved a significantly stronger rank correlation (Spearman ρ = 0.57). This improvement is attributed to the NNK-block screen providing denser training examples that allowed the model to perform better interpolation within the region of the selected binders.
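For readers unfamiliar with the metric, the sketch below shows how a Spearman rank correlation between predicted scores and measured affinities can be computed with the standard library alone. The data are hypothetical, and converting K_d to pK_d = -log10(K_d) so that stronger binders have larger values is an assumed convention, not one stated in the text.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; inputs must not be constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical predicted scores and measured K_d values (molar).
predicted = [0.2, 0.9, 0.5, 0.7]
kd_molar = [1.5e-4, 8.0e-6, 2.0e-5, 6.0e-5]
pkd = [-math.log10(kd) for kd in kd_molar]
rho = spearman(predicted, pkd)
```

Because Spearman's ρ depends only on rank order, it rewards a model for ordering clones correctly even when the predicted scores are not on the same scale as K_d.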

Finally, we leveraged the combined insights from both models to extrapolate and propose a final, focused ML-predicted library of approximately 5 × 10⁴ sequences (see Methods and Figure S5). As shown in Figure 4a (right panel), the sequence population of this generated library occupied a unique region that extends outward from the cluster of 35 characterized binders. This 3-step approach (using VCX library, NNK screening, and then ML modeling) rapidly accesses experimentally unsampled sequence space, extrapolating the learned sequence patterns to propose de novo sequences predicted to possess higher affinities.

This ML-predicted library was displayed on p3 of M13 phages and panned against DLL3 under high-stringency conditions: using a low target concentration (2 nM) and extended washing steps (3 × 15 min). After three rounds of panning, we observed robust enrichment in the phage titer, which was confirmed by NGS analysis. The new NGS data were used to retrain the ML model, which subsequently predicted the top 100 strongest binders. These sequences were cloned as Trx-VCX proteins and screened by SPR, revealing several clones with marked affinity improvements. One representative example (Figure 5a) exhibited a 273-fold increase in affinity compared to the corresponding primary hit (measured as Trx-VCX fusion). Upon synthesis and in vitro folding of the selected nine peptides, we identified two leads with exceptionally slow dissociation rates (k_d = 1.2 × 10⁻⁴ and 1.7 × 10⁻⁴ s⁻¹) and nanomolar affinities (K_d = 6.7 and 12.6 nM, respectively; Figure 5b). Both were derived from the parent sequence of DLL3-g3n02 (QWPFQQWIPCTIHWNCDGNWCCFPITCYEQTGMCD) with a K_d of 3.5 μM (measured as an SPPS peptide (Table 2)) and showed 526- and 278-fold affinity improvements, respectively.
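To put the reported dissociation rates in perspective, they can be converted into complex half-lives. The following worked estimate assumes simple first-order dissociation, which is an idealization of the SPR data:

```latex
t_{1/2} = \frac{\ln 2}{k_d}, \qquad
\frac{0.693}{1.2 \times 10^{-4}\ \mathrm{s^{-1}}} \approx 5.8 \times 10^{3}\ \mathrm{s} \approx 96\ \mathrm{min}, \qquad
\frac{0.693}{1.7 \times 10^{-4}\ \mathrm{s^{-1}}} \approx 4.1 \times 10^{3}\ \mathrm{s} \approx 68\ \mathrm{min}
```

Under that assumption, both leads would remain bound to DLL3 for roughly an hour or longer on average.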

In conclusion, the ML-assisted affinity maturation approach achieved an affinity enhancement of up to ~500-fold within a single evolutionary cycle. Traditionally, such improvements require 2–3 iterative rounds of directed evolution [7]; this result demonstrates that ML integration can dramatically accelerate the hit-to-lead process.

3. Discussion

The ML-designed VCX library offers two major advantages over traditionally designed DCP libraries [7]: (a) it achieves higher overall foldability scores, maximizing the proportion of properly folded and functional members within the library, and (b) it provides greater sequence diversity as it originates from hundreds of distinct scaffolds. These features collectively enhance the likelihood of successful panning, as demonstrated by the 100% success rate for the four tested targets. Moreover, the high foldability of the hits derived from the VCX library enables efficient recombinant expression as Trx-fusion proteins, supporting high-throughput screening during the hit-to-lead process. This streamlined workflow could substantially accelerate discovery timelines and reduce the associated costs.

Hsiao M.-H. et al. reported the first attempt to harness animal venom—termed “Metavenom”—as a therapeutic modality using phage display [22]. The sequences in their library were derived from database mining followed by extensive bioinformatic curation. Using this approach, they successfully identified an agonist to the Mas-related G-protein-coupled receptor member X4, demonstrating the ability of venom libraries to yield functional ligands for complex membrane proteins. Compared with Metavenom, our VCX library exhibits several distinct advantages. First, the sequence length limit of VCX (<45 amino acids) categorizes it clearly as a peptide library, whereas Metavenom includes sequences up to 100 amino acids, meaning that the members are more protein-like than peptide-like. Thus, for therapeutic applications favoring smaller modalities, the VCX library offers a clear advantage. Second, the VCX library used in this study has a test diversity of 6 × 10⁵, but its design strategy allows for essentially unlimited scalability—either by incorporating more scaffolds or expanding randomization within each scaffold. In contrast, Metavenom relies solely on database mining without internal randomization, resulting in a limited diversity of ~41,000 unique sequences with no intrinsic mechanism for expansion. In general, greater sequence diversity correlates with higher discovery success rates. We have already demonstrated a 100% success rate for four independent targets using a relatively small VCX test library. Thus, we anticipate maintaining consistently high success rates as the VCX library is further expanded in both scale and diversity.

For our affinity maturation strategy, we initially used primary NGS data to train an ML model to predict a focused library enriched with strong binders. However, this approach only yielded modest affinity improvements (~10-fold), likely because the primary panning data lacked sufficient target-specific affinity information as discussed above and shown in Figure 5. To overcome this limitation, we developed the NNK-block screening strategy to generate a more comprehensive dataset for ML training. The method was designed to be cost-efficient and easily adaptable to standard phage display workflows, enabling simultaneous scanning of multiple scaffolds. Traditional site-saturation mutagenesis introduces all 20 amino acids at each individual position, resulting in a library diversity of only 20 × n (for n positions), which is insufficient for ML model training. In contrast, NNK-block screening expands this approach by randomizing six positions simultaneously, increasing the theoretical diversity to k × 20⁶ = 6.4 × 10⁷ k, where k is the number of position combinations, each of which randomizes six amino acid positions. Constructing a full library covering all C(n,6) combinations (e.g., ~5.9 × 10⁵ oligos for n = 30) would cost over USD 10,000. Therefore, we divided the six randomized positions into two blocks of three consecutive NNK codons and used a pool of fewer than 100 oligos to cover all C(⌈n/3⌉,2) combinations. This reduced the cost to less than USD 400 while maintaining a theoretical diversity on the order of 10⁹ or higher, which is sufficient for robust ML model training. Moreover, the combinatorial two-block design captures covariation information between residues, further enhancing model accuracy. Given the low cost and scalability, this strategy can be readily applied to scan multiple scaffolds in parallel.
However, relying on NGS enrichment as a proxy for affinity can introduce data bias, where high enrichment may result from non-specific binding or cell growth bias rather than target interaction. We mitigated this by training a parallel model on non-specific background binding data, allowing us to filter out promiscuous sequences and prioritize target-specific affinity.
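
The oligo counts and theoretical diversities quoted for NNK-block screening can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are ours, and n = 30 is the example length used in the text.

```python
from math import ceil, comb

AMINO_ACIDS = 20  # amino acids that one NNK codon can encode


def single_site_diversity(n: int) -> int:
    """Conventional site saturation: 20 variants at each of n positions."""
    return AMINO_ACIDS * n


def full_six_position_oligos(n: int) -> int:
    """Oligos needed to cover every choice of 6 positions out of n."""
    return comb(n, 6)


def two_block_oligos(n: int, block: int = 3) -> int:
    """Oligos needed when positions are grouped into blocks of 3
    and two blocks are randomized per oligo."""
    return comb(ceil(n / block), 2)


def two_block_diversity(n: int, block: int = 3) -> int:
    """Theoretical peptide diversity: block pairs x 20^(randomized positions)."""
    return two_block_oligos(n, block) * AMINO_ACIDS ** (2 * block)


n = 30
print(single_site_diversity(n))     # 600
print(full_six_position_oligos(n))  # 593775
print(two_block_oligos(n))          # 45
print(two_block_diversity(n))       # 2880000000
```

For n = 30 the two-block design needs 45 oligos rather than roughly 5.9 × 10⁵, while the theoretical diversity stays in the 10⁹ range.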

In summary, this venom peptide platform offers several key advantages. First, venom peptides are a promising drug modality with favorable pharmacokinetics: their small size allows for deeper tumor penetration than antibodies, while rapid systemic clearance makes them well suited to imaging and theranostic applications [23,24]. Second, unlike linear peptides, their disulfide-stabilized folds act like compact protein domains. This allows them to bind to “shallow” protein-protein interfaces (PPIs) typically reserved for antibodies. Third, many of these peptides can be produced via recombinant expression, bypassing the high costs and slow timelines of chemical synthesis. Finally, the platform demonstrates broad target versatility, successfully generating binders for “antibody-only” targets (CD47, IL33, DLL3) and for intrinsically challenging membrane proteins such as P2X7R, which are often inaccessible to other modalities.

While the platform succeeded in the rapid discovery of primary hits and showed preliminary progress toward fast, ML-assisted affinity maturation, several technical hurdles remain. First, affinity maturation is still nontrivial and is not guaranteed to yield strong binders for every target. Second, binders for complex targets like GPCRs or ion channels (e.g., P2X7R) may initially show only micromolar affinity, requiring further engineering to reach therapeutic levels. Third, we have not yet addressed the “promiscuity” of natural venom scaffolds to ensure that they engage specific membrane proteins without off-target effects; this remains a critical task for the future.

Looking forward, the future of this platform lies in its application in precision theranostics [25,26] and its integration with advanced delivery systems—such as lipid nanoparticles, extracellular vesicles, and oncolytic viruses [27,28]. By combining the natural evolutionary bias of venom for membrane proteins with computational optimization, this modality is poised to expand therapeutic options for previously “undruggable” targets such as GPCRs and ion channels [29,30,31,32,33].

4. Materials and Methods

4.1. Venom—Conotoxin Library Design

VenomZone database: Protein sequences and annotations for venom and nematocyst-related proteins were retrieved from the UniProt Knowledgebase using a custom search query (taxonomy_id:33208 AND (cc_tissue_specificity:venom OR cc_scl_term:nematocyst) AND reviewed:true) [8]. For this dataset, we computed a Diversity Score for each sequence (see Results for definition) and retained 391 of the most design-amenable scaffolds (Table S1).

Nucleic acid sequences encoding conotoxins were downloaded from the ConoServer in FASTA format (https://www.conoserver.org/?page=download, accessed on 20 November 2023) [9,10]; a total of 3073 entries were included. These sequences were translated into their corresponding protein sequences and filtered based on two criteria: (i) sequence length shorter than 45 amino acids, and (ii) the presence of 4–8 cysteine residues, corresponding to 2–4 disulfide bonds. After applying these filters, 91 unique sequences were identified, which were subsequently used as scaffolds to generate the library (Table S1).
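
The two ConoServer filtering criteria can be expressed as a short predicate. The sketch below is illustrative only: the function names and the example strings are ours, and translation of the nucleic acid entries is assumed to have been done already.

```python
def passes_scaffold_filter(seq: str, max_len: int = 45,
                           min_cys: int = 4, max_cys: int = 8) -> bool:
    """Apply the two stated criteria: length shorter than 45 residues
    and 4 to 8 cysteines (2 to 4 possible disulfide bonds)."""
    n_cys = seq.count("C")
    return len(seq) < max_len and min_cys <= n_cys <= max_cys


def unique_scaffolds(seqs):
    """Keep each passing sequence once, preserving first-seen order."""
    seen, kept = set(), []
    for seq in seqs:
        if seq not in seen and passes_scaffold_filter(seq):
            seen.add(seq)
            kept.append(seq)
    return kept


# Illustrative inputs only (not taken from Table S1).
examples = [
    "GCCSDPRCAWRC",   # 12 residues, 4 Cys: kept
    "GCCSDPRCAWRC",   # exact duplicate: dropped
    "GASDPRAAWRK",    # no Cys: dropped
    "C" * 10,         # 10 Cys: dropped (more than 8)
    "GCCSDPRCAWRC" + "A" * 40,  # 52 residues: dropped (too long)
]
print(unique_scaffolds(examples))
```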

For each scaffold, positions suitable for diversification (“amenable positions”) were identified using our previously described foldability model [6]. For every amenable position, variants were generated such that the parental residue was retained in 50% of the variants and the remaining 50% was divided equally among the other standard amino acids, excluding cysteine. Non-amenable positions were fixed to the scaffold residue. All variants were scored with the foldability model [6]. For each scaffold, the highest-scoring designs were retained to cap the library size while maintaining representation: ~1000 variants per VenomZone scaffold and ~2500 per ConoServer scaffold. The resulting library comprised ~6 × 10⁵ unique sequences across 482 scaffolds (391 from VenomZone and 91 from ConoServer), each predicted to have high foldability.
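
The per-position residue distribution described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the function names are hypothetical, the amenable positions are supplied by hand rather than by the foldability model, and sampling each amenable position independently is our assumption.

```python
import random

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def position_distribution(parent: str) -> dict:
    """Parental residue at 50%; the other 50% split equally among the
    remaining standard amino acids, excluding cysteine."""
    others = [aa for aa in STANDARD_AA if aa not in (parent, "C")]
    dist = {parent: 0.5}
    for aa in others:
        dist[aa] = 0.5 / len(others)
    return dist


def sample_variant(scaffold: str, amenable, rng: random.Random) -> str:
    """Draw one variant; non-amenable positions keep the scaffold residue.
    Independent sampling per position is an assumption of this sketch."""
    residues = list(scaffold)
    for i in amenable:
        dist = position_distribution(scaffold[i])
        residues[i] = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
    return "".join(residues)


rng = random.Random(0)
scaffold = "GCCSDPRCAWRC"   # illustrative sequence
amenable = [3, 4, 5]        # hypothetical amenable positions
print([sample_variant(scaffold, amenable, rng) for _ in range(3)])
```

In a full workflow each sampled variant would then be scored by the foldability model and only the top-scoring designs per scaffold retained.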

4.2. Venom Library Construction

The VCX libraries were constructed following the Kunkel mutagenesis method [34]. The oligos encoding the 6 × 10⁵ sequences designed by the AI/ML model were purchased as Oligo Pools from Twist Bioscience (South San Francisco, CA, USA). The stop templates were the single-stranded DNA of pS2202d and pS2202b for g3 and g8 display, respectively, as previously described [7]. The library has a diversity of 10⁸ and contains over 96% designed sequences, as confirmed by NGS.

4.3. High Throughput Cloning, Recombinant Expression, and Purification of Trx-VCX Fusion Proteins

Oligonucleotides (oligos) encoding the selected VCX sequences were purchased as an oPool from Integrated DNA Technologies (Coralville, IA, USA). This oligo pool was used to perform Kunkel mutagenesis on the single-stranded DNA of the pET32a plasmid, which contains sequences for thioredoxin, a 6×His tag, a TEV cleavage site, and the VCX sequence. The Kunkel reaction mixture was transformed into T7 Shuffle E. coli and plated to isolate single colonies. Individual colonies were then grown overnight at 30 °C in LB medium with 5 mg/mL carbenicillin (LB/Carb) in a 96-well loose-tube format. The same colonies were simultaneously screened by 96-well PCR using oligos flanking the VCX sequences. The PCR products were sequenced, and the colonies carrying the correct sequences were selected for protein expression.

The selected clones were expressed in duplicate in 24-well plates, each containing 5 mL of LB/Carb. Cultures were inoculated with 50 μL of the overnight culture and shaken in a 30 °C incubator for 5–6 h until the bacterial density reached an OD₆₀₀ ≈ 0.8. Protein expression was then induced with 0.4 mM IPTG and the culture was incubated at 16 °C with shaking for a minimum of 20 h.

Following expression, the bacteria were pelleted by centrifuging the 24-well plates at 3000 rpm using a plate rotor in a Beckman benchtop centrifuge. The bacterial pellet was lysed with the BugBuster 10× Protein Extraction Reagent (Millipore, Burlington, MA, USA, Cat. No. 70921; diluted to 1× with PBS buffer according to the manufacturer’s protocol). The lysate was clarified by centrifugation at 4000 rpm in the same centrifuge, and the supernatant was transferred to a 96-deep-well plate for purification. Protein purification was performed via nickel-immobilized metal affinity chromatography (Ni-IMAC) using 1 mL tips with 100 μL of resin (Ni-IMCSTips, IMCS (Integrated Micro-Chromatography System), Irmo, SC, USA, Cat. No. 04T-D1R72-1-100-96) and a Dynamic Devices Lynx LM900 liquid handler (Dynamic Devices, Wilmington, DE, USA), following the manufacturer’s instructions. The purified protein was eluted in 150 μL of elution buffer containing PBS and 250 mM imidazole.

4.4. Expression and Purification of Target Proteins

4.4.1. CD47 and DLL3

The extracellular domains of human CD47 (residues Q19-S135, with a C33S mutation) and DLL3 (residues A27-D215), both with a C-terminal Avitag followed by a 6xHis-tag, were cloned into a modified pAcGPA vector behind the polyhedrin promoter and the gp67 secretion signal. Recombinant baculoviruses were generated using the BaculoGold system (BD Biosciences) in Sf9 cells following standard protocols. Trichoplusia ni cells were infected with baculovirus for large-scale protein production and harvested 48 h post-infection. The harvested medium was supplemented with 1 mM NiCl₂, 5 mM CaCl₂ and 20 mM Tris pH 8, shaken for 30 min and then centrifuged for 20 min at 8500 × g to pellet the cells. The supernatant was removed and filtered through a 0.22 μm PES filter prior to loading onto a Ni-NTA affinity column.

Insect cell media from separate cultures containing either secreted 6xHis-tagged extracellular CD47 or DLL3 fragments were loaded onto a 10 mL Ni-NTA Superflow column (Qiagen, Venlo, The Netherlands) at a linear flow rate of 170 cm/h. The column was washed with 10 column volumes (CVs) of wash buffer (20 mM Tris pH 8, 10 mM imidazole, and 300 mM NaCl) and eluted with 8 CVs of elution buffer (20 mM Tris pH 8, 300 mM imidazole, and 300 mM NaCl). The fractions were analyzed using SDS-PAGE and those containing the protein of interest were pooled, concentrated, and loaded onto an S200 column (GE Healthcare Life Sciences, Pittsburgh, PA, USA) for further purification by size-exclusion chromatography (SEC) using SEC buffer (20 mM Tris pH 7.5 and 0.2 M NaCl) at the flow rates recommended by the manufacturer. The fractions containing the protein of interest were pooled and concentrated. Purified CD47 and DLL3 proteins were biotinylated via their C-terminal Avitags using the BirA500 biotin-protein ligase kit (Avidity LLC, Aurora, CO, USA) according to the manufacturer’s protocol.

4.4.2. IL33

The mature form of human IL-33 (S112-T270) was produced in-house (Genentech) in two forms, namely Nbtn-IL33 and Cbtn-IL33, which had an N- and C-terminal biotinylation site, respectively. Both forms were expressed with a His-tag in E. coli BL21(DE3) cells and purified using a HisTrap column followed by an SEC S75 column. Biotinylation was achieved by adding a biotin-LPETGG peptide to the TEV-cleaved IL33 using Sortase (Nbtn-IL33) or by adding biotin to the C-terminal Avi-tag using BirA (Cbtn-IL33). The resulting biotinylated IL33 was further purified using a reverse-affinity column followed by a Superdex S75 column.

4.4.3. P2X7R

Full-length human P2X7R or E. coli BirA genes were codon-optimized, synthesized, and cloned into the pRK5 vector under the control of a CMV promoter. P2X7R was cloned with C-terminal Avi and FLAG tags and BirA was cloned with a C-terminal His8 tag.

Protein expression was performed in Expi293F cells using the Expifectamine Kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. P2X7R and BirA were co-expressed so that P2X7R was biotinylated in vivo. Cells were harvested by centrifugation, washed once with PBS, flash-frozen in liquid nitrogen, and stored at −80 °C until the subsequent protein purification.

The protein purification procedure was adapted from a published protocol [35] with minor modifications: extraction with a 10:1 (w/w) solution of LMNG/CHS, affinity purification using M2 Flag Resin, and elution with a Flag peptide. The eluted protein was concentrated and applied to a Superose 6 Increase 10/300 GL size exclusion column. Peak fractions were evaluated for purity and complete biotinylation by SDS–polyacrylamide gel electrophoresis. The protein was pooled, aliquoted, and then snap-frozen in liquid nitrogen. The protein was then stored at −80 °C until downstream use.

4.5. Primary Screening of Venom Library

4.5.1. Selection of CD47, DLL3, and IL33

The VCX libraries were cycled through one (against Nbtn-IL33 and Cbtn-IL33) or two (against biotinylated CD47 and DLL3) rounds of solution panning using STA beads (Invitrogen cat. 65601) [21], followed by three rounds of plate sorting, alternating the use of NTA- and STA-coated MaxiSorp plates [20]. Limited enrichment (less than 10-fold) was observed in the last round in the cases of IL33 and CD47 panning, while obvious enrichment (approximately 50-fold) was observed in round 4 for DLL3. Selected binders from each round of all pannings were subjected to next-generation sequencing (NGS).

4.5.2. Selection of P2X7R

The venom g3-VCX library was cycled through five rounds of binding selection in solution against biotinylated P2X7R. In round one, 10 μg of biotinylated P2X7R was incubated with 1 mL of the phage library (~1 × 10¹³ pfu/mL) at 4 °C for 1 h in binding buffer (20 mM Tris/HCl pH 7.5, 150 mM NaCl, 0.004% LMNG, 0.00004% CHS, and 0.5% BSA). The phage–P2X7R complex was captured for 15 min at room temperature using 100 μL of MyOne Streptavidin Dynabeads (Cat# 65602, ThermoFisher, Waltham, MA, USA) that had been previously blocked with the binding buffer. The supernatant was discarded, and the beads were washed three times with wash buffer (20 mM Tris/HCl pH 7.5, 150 mM NaCl, 0.004% LMNG, and 0.00004% CHS). The bound phage was eluted with 400 μL of 0.1 M HCl for 7 min and immediately neutralized with 60 μL of 1 M Tris pH 11. The eluted phage was amplified following the standard protocol [20]. In rounds two and three, the same protocol as round one was used with the following modifications: 5 μg or 2 μg of biotinylated P2X7R was used; 30 μL of SpeedBeads Neutravidin-Coated Magnetic Particles (Cat# 78152104010150, Cytiva, Marlborough, MA, USA) or 20 μL of Dynabeads was used. In round four, 2 μg of biotinylated P2X7R was incubated with the amplified phage from the previous round, and the phage–P2X7R complex was captured using neutravidin-coated plates that had been previously blocked with binding buffer. The procedure in round five was identical to that of round four except that streptavidin-coated plates were used to capture the phage–P2X7R complex. The phage was propagated in XL1-Blue E. coli with the M13-KO7 helper phage at 37 °C following the standard protocol [20].

4.6. Next-Generation Sequencing (NGS)

The phage samples, representing a fraction of the libraries, eluates, and negative controls, were collected at each step. These samples were subjected to next-generation sequencing (NGS) analysis using the MiSeq system from Illumina (Foster City, CA, USA) with MiSeq Reagent Kits v2, following the manufacturer’s instructions for sample preparation and sequencing. Briefly, a two-round PCR approach was used: the first round added a unique index to each sample, and the second round added universal overhangs for sequencing. The secondary PCR products were pooled, column purified, and quantified by qPCR using the KAPA Library Quant kit (Illumina, Foster City, CA, USA. Cat. No. KK4835). Finally, the NGS sample was sequenced using the MiSeq system and the “Generate FASTQ” module.

4.7. NNK-Block Screening

In the peptide sequence (with length n and containing m Cys residues), the non-Cys amino acids were divided into k blocks, where k = ⌈(n − m)/3⌉ and each block contains three consecutive amino acids. Extended site-saturated mutagenesis screening was performed by combinatorially mutating two of these k blocks per variant. The total number of combinations screened was C(k,2).

The sequences for each combination were encoded by oligonucleotides containing two blocks of three consecutive degenerate NNK codons (“N” is A, C, G, or T; “K” is G or T), which collectively encode all 20 amino acids (“X”). These oligos were synthesized as an oPool (Integrated DNA Technologies, Coralville, IA, USA). The oligo pool, which contained the appropriate flanking sequences, was used to construct the NNK-block library via Kunkel mutagenesis on the same single-stranded DNA described in the “Venom Library Construction” section.
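
The block partitioning and pairing described in this section can be sketched as below. This is a hedged illustration rather than the design script used here: the function names and the fixed-residue codon table are hypothetical, flanking sequences are omitted, and we assume that blocks are formed from consecutive non-Cys residues in sequence order, so the last block may contain fewer than three residues.

```python
from itertools import combinations
from math import ceil, comb

# Hypothetical codon choice for fixed (non-randomized) residues.
CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def nnk_blocks(peptide: str, size: int = 3):
    """Group the non-Cys positions into k = ceil((n - m) / 3) blocks."""
    positions = [i for i, aa in enumerate(peptide) if aa != "C"]
    return [positions[i:i + size] for i in range(0, len(positions), size)]


def block_pair_templates(peptide: str):
    """One peptide template per pair of blocks, with 'X' at randomized positions."""
    templates = []
    for first, second in combinations(nnk_blocks(peptide), 2):
        randomized = set(first) | set(second)
        templates.append("".join("X" if i in randomized else aa
                                 for i, aa in enumerate(peptide)))
    return templates


def template_to_oligo(template: str) -> str:
    """Encode 'X' as the degenerate codon NNK and every other residue as a fixed codon."""
    return "".join("NNK" if aa == "X" else CODON[aa] for aa in template)


peptide = "GCCSDPRCAWRC"  # illustrative: n = 12, m = 4 Cys
k = ceil((len(peptide) - peptide.count("C")) / 3)
templates = block_pair_templates(peptide)
print(k, comb(k, 2), templates)
print(template_to_oligo(templates[0]))
```

Cysteines are never randomized, so every template retains the disulfide framework of the parental peptide.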

The resulting libraries were subjected to phage display screening against the corresponding target protein for up to four rounds following standard protocols [19]. Phages eluted from each round, along with the initial NNK library sample, were collected and prepared for next-generation sequencing (NGS) analysis. Enrichment scores (ESs) were calculated as previously described [7] and ES heatmaps were generated using a Python 3.9.6 script implemented using the matplotlib library.

4.8. Machine Learning Model Training and Prediction of Focused Library

The overall ML framework has been described previously [36]. Briefly, we used a proprietary transformer-based sequence model (DeepSeq.AI, San Francisco, CA, USA) to learn sequence–enrichment relationships from phage display selections. The core architecture is based on RoBERTa, a masked language model initially pretrained on curated protein sequence datasets, including UniProt and MGnify. To adapt the model for our regression task, we appended a regression head consisting of a pooling layer followed by a fully connected linear layer, which outputs continuous enrichment values.

NGS reads from the DLL3 hits from the VCX and NNK-block libraries were demultiplexed, quality-filtered based on Phred scores, and translated into peptide sequences. For each peptide, an enrichment value was computed as the fold-change in normalized counts across the selection rounds and used as the training label. A third dataset from VCX selections against neutravidin-coated plates was used to train a model that predicts non-specific binding.
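
As a rough illustration of the training label, the fold-change in normalized counts between two selection rounds can be computed as follows. The pseudocount and the function name are our assumptions, added so that sequences absent from one round do not cause a division by zero; the exact normalization used in this study may differ.

```python
def enrichment(pre: dict, post: dict, pseudocount: float = 1.0) -> dict:
    """Fold-change in normalized read frequency between two rounds.

    `pre` and `post` map peptide sequence -> read count. The pseudocount
    is an assumption of this sketch, not a documented parameter.
    """
    seqs = set(pre) | set(post)
    pre_total = sum(pre.values()) + pseudocount * len(seqs)
    post_total = sum(post.values()) + pseudocount * len(seqs)
    result = {}
    for seq in seqs:
        pre_freq = (pre.get(seq, 0) + pseudocount) / pre_total
        post_freq = (post.get(seq, 0) + pseudocount) / post_total
        result[seq] = post_freq / pre_freq
    return result


# Toy counts, for illustration only.
pre_counts = {"PEPTIDEA": 9, "PEPTIDEB": 89}
post_counts = {"PEPTIDEA": 49, "PEPTIDEB": 49}
scores = enrichment(pre_counts, post_counts)
print(sorted(scores.items()))
```

Values above 1 indicate that a sequence gained frequency during selection; in practice a log transform is often applied before regression, which is a further modeling choice not specified here.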

The models were fine-tuned using the VCX and NNK datasets; the non-specific model was trained independently. To assess generalization, we used scaffold-aware cross-validation in which sequences were clustered by similarity and split into five disjoint folds, ensuring that the training and validation sets represented distinct sequence families.
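
A scaffold-aware split of this kind can be sketched as a grouped five-fold assignment in which whole clusters, never individual sequences, are placed into folds. The greedy balancing rule and all names below are illustrative assumptions; the clustering step itself is taken as given.

```python
from collections import defaultdict


def cluster_folds(cluster_of: dict, n_folds: int = 5) -> dict:
    """Assign each sequence to a fold so that no cluster spans two folds.

    `cluster_of` maps sequence -> cluster label (for example a scaffold ID).
    Clusters are placed largest first into the currently smallest fold.
    """
    members = defaultdict(list)
    for seq, cluster in cluster_of.items():
        members[cluster].append(seq)

    fold_sizes = [0] * n_folds
    fold_of = {}
    ordered = sorted(members.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _cluster, seqs in ordered:
        target = fold_sizes.index(min(fold_sizes))
        for seq in seqs:
            fold_of[seq] = target
        fold_sizes[target] += len(seqs)
    return fold_of


# Toy data: 7 clusters of sizes 1..7 with placeholder sequence names.
toy = {f"seq_{c}_{i}": f"cluster_{c}" for c in range(1, 8) for i in range(c)}
folds = cluster_folds(toy)
print(sorted(set(folds.values())))
```

Each fold then serves once as the validation set, so validation sequences always come from families the model did not see during training.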

Five scaffolds with consistently high predicted binding scores (from the VCX- and NNK-trained models) were selected from the NGS data to form the basis of the focused library. For each scaffold, we generated ~10⁷ novel candidate sequences in silico by varying positions within the scaffold framework. Candidates were scored by (i) the DLL3 binding models and (ii) the non-specific model; sequences predicted to have strong DLL3 binding and low non-specific binding were retained. The top-ranked candidates were down-selected to generate a focused library of 5 × 10⁴ sequences distributed across the five scaffolds.
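
The two-model down-selection can be pictured as a filter followed by a ranking. In the sketch below the score thresholds, the per-scaffold cap, and all names are hypothetical; the text states only that candidates with strong predicted DLL3 binding and low predicted non-specific binding were retained and the top-ranked ones kept.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    scaffold: str
    sequence: str
    binding: float       # predicted target-binding score
    nonspecific: float   # predicted non-specific binding score


def select_focused_library(candidates, min_binding: float,
                           max_nonspecific: float, per_scaffold: int) -> dict:
    """Keep the best-scoring specific binders, up to a cap per scaffold."""
    kept = {}
    for cand in sorted(candidates, key=lambda c: c.binding, reverse=True):
        if cand.binding < min_binding or cand.nonspecific > max_nonspecific:
            continue
        bucket = kept.setdefault(cand.scaffold, [])
        if len(bucket) < per_scaffold:
            bucket.append(cand)
    return kept


# Toy candidates with made-up scores.
pool = [
    Candidate("S1", "AAA", 0.9, 0.1),
    Candidate("S1", "AAB", 0.8, 0.7),   # rejected: predicted promiscuous
    Candidate("S1", "AAC", 0.7, 0.2),
    Candidate("S1", "AAD", 0.6, 0.1),   # rejected by the per-scaffold cap
    Candidate("S2", "BBB", 0.3, 0.1),   # rejected: weak predicted binding
    Candidate("S2", "BBC", 0.95, 0.05),
]
library = select_focused_library(pool, min_binding=0.5,
                                 max_nonspecific=0.5, per_scaffold=2)
print({s: [c.sequence for c in cs] for s, cs in library.items()})
```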

4.9. Solid-Phase Peptide Synthesis

Linear precursor peptides were synthesized using standard solid-phase 9-fluorenylmethyloxycarbonyl (Fmoc) chemistry and purified with reverse-phase (RP) high-performance liquid chromatography (HPLC) on a C18 column (21 mm × 100 mm, 5 μm, 100 Å, 20 mL/min) using a 40 min gradient of 5 to 40% aqueous acetonitrile containing 0.05% trifluoroacetic acid (TFA). To induce disulfide bond formation, the purified linear peptides were dissolved in a folding buffer containing 0.2 M ammonium bicarbonate, 2 M urea, 2.5 mM reduced glutathione, 0.5 mM oxidized glutathione pH 8.0, and 0–50% dimethyl sulfoxide (DMSO; 0.2 mg/mL) depending on the solubility. The resulting solutions were stirred at room temperature for 24–48 h until liquid chromatography–mass spectrometry (LC-MS) analysis (C18 column, 4.6 mm × 50 mm, 5 μm, 100 Å, 1.0 mL/min, 2 min gradient of 5 to 95% aqueous acetonitrile containing 0.1% TFA, 220 nm) indicated that the reaction had finished. The folded peptides were purified using RP-HPLC and the quality was confirmed using LC-MS.

4.10. SPR Measurements

All SPR assays were conducted using a Biacore S200 instrument, and the data were evaluated using the Biacore S200 Evaluation v1.1.1 software (Cytiva).

For the CD47, DLL3, and IL33 analyses, each protein was biotinylated and captured on SA sensor chips (Cytiva) to 500 RU (for measuring Trx-VCX fusions) or 1600 RU (for measuring VCX peptides). Freshly purified Trx-VCX proteins in 250 mM imidazole were diluted 20-fold in 1× HBS-EP+ buffer (Cytiva); this solution served as the starting concentration for a three-fold dilution series prepared in 1× HBS-EP+ with 12.5 mM imidazole, and these dilutions were subjected to SPR measurements. Similarly, synthetic VCX peptides were first solubilized in 100% DMSO and then diluted 20-fold in 1× HBS-EP+; this solution was used to prepare a three-fold dilution series in 1× HBS-EP+ with 5% DMSO. Binding experiments were performed on a Biacore S200 instrument using a single-cycle kinetics protocol at a flow rate of 15 μL/min at 25 °C. The running buffer was 1× HBS-EP+ containing 12.5 mM imidazole for Trx-VCX fusion proteins and 1× HBS-EP+ containing 5% (v/v) DMSO for synthetic peptides. Binding kinetics parameters were calculated using the Biacore S200 Evaluation software (Cytiva, v1.1.1) with either a 1:1 binding model (for peptides) or a two-state reaction model (for Trx-VCX fusions). Binding affinities were also calculated using a steady-state equilibrium model provided by the software.

For the P2X7R analysis, biotinylated P2X7R was immobilized on a SA sensor chip as follows: for Trx-VCX, to 315 RU (Fc2), 430 RU (Fc3), and 2000 RU (Fc4); for SPPS peptides, to 1400 and 3700 RU using a capture buffer (20 mM Tris HCl pH 7.5, 150 mM NaCl, 0.004% LMNG, and 0.00004% CHS) at a 5 μL/min flow rate at 20 °C. The reference flow cell was left empty. All flow cells were capped with biotin. The Biacore S200 system was primed using a running buffer consisting of 20 mM Tris HCl pH 7.5, 150 mM NaCl, and 0.06% bOG for Trx-VCX or 1% DMSO for SPPS peptides at 10 °C. The Trx-VCX proteins and SPPS VCX peptides were injected at a flow rate of 30 μL/min for 60 s or 80 μL/min for 90 s, respectively. For Trx-VCX proteins, the starting concentration was 10 μM, from which three-fold dilutions were made to generate a total of 5 data points; the data were collected using a single-cycle kinetics protocol with a dissociation time of 300 s. For SPPS VCX peptides, the starting concentration was 20 μM, from which two-fold dilutions were made to generate a total of 6 data points; the data were collected using a single-cycle kinetics protocol with a dissociation time of 400 s. Sensorgrams were fit using the two-state reaction model or the one-site 1:1 binding model. The biophysical models are approximations of the “true” interaction mechanism, as a high binding complexity is expected for membrane protein (MP) interactions [37].
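
For reference, the analyte concentrations implied by the two P2X7R dilution schemes can be listed with a short helper. The function name is ours; concentrations are in μM.

```python
def dilution_series(start: float, fold: float, points: int) -> list:
    """Concentrations of a serial dilution, highest first."""
    return [start / fold ** i for i in range(points)]


# Trx-VCX fusions: 10 uM start, three-fold steps, 5 points.
trx_series = dilution_series(10.0, 3.0, 5)
# SPPS VCX peptides: 20 uM start, two-fold steps, 6 points.
spps_series = dilution_series(20.0, 2.0, 6)

print([round(c, 3) for c in trx_series])   # [10.0, 3.333, 1.111, 0.37, 0.123]
print(spps_series)                         # [20.0, 10.0, 5.0, 2.5, 1.25, 0.625]
```

In a single-cycle kinetics run these concentrations are typically injected from lowest to highest, so the lists would be reversed before use.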

5. Conclusions

In summary, the AI-designed venom peptide library proved highly effective across four diverse targets (CD47, DLL3, IL33, and P2X7R), consistently yielding binders constrained by multiple disulfide bonds. By incorporating ML-guided prediction of the residues critical for folding into the design of the disulfide-constrained peptide library, the platform minimizes misfolding and aggregation liabilities while preserving broad functional diversity.

Importantly, the seamless integration of high-throughput recombinant Trx-peptide expression, biophysical characterization, and ML-assisted affinity maturation enables a rapid transition from initial hit identification to optimized lead candidates. This workflow allows systematic improvement of binding affinity with significantly less time and experimental burden than is typically associated with peptide optimization.

Collectively, these advances establish a robust, scalable, and generalizable discovery platform for therapeutic peptide development based on natural venom peptide scaffolds, capable of supporting both early-stage target exploration and downstream lead optimization. The combination of ML-assisted library design, efficient production, and iterative ML-guided refinement positions this platform as a powerful engine for accelerating peptide drug discovery against challenging biological targets.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ph19020288/s1. s1: LC-MS results for SPPS peptides; Figure S1: Recombinant expression and characterization of venom peptides as thioredoxin-fusion proteins; Figure S2: Correlation plots analyzing the relationship between the measured Trx-VCX yield and various structural quality metrics derived from ESMFold 3D models for the VCX hits; Figure S3: Funnel schematic of VCX hit identification against four target proteins; Figure S4: Multiple disulfide bonds are indispensable for peptide binding activity; Figure S5: ML-predicted focused library for DLL3 AM; Table S1: All scaffolds in VCX library; Table S2: Sequence, foldability score, calculated MW based on sequence, MW measured by LC-MS, and delta MW for 16 representative PtuI variants; Table S3: Yield, foldability score, hydrophobicity index, and surface hydrophobicity of Trx-fusion clones; Table S4: All positive clones for 4 targets; Table S5: Oligo for NNK-block screening for DLL3 binders.

Author Contributions

Conceptualization, Y.Z., A.C., H.R.M. and M.M.; methodology, Y.Z., A.C., H.R.M., M.M. and A.S.; software, A.C.; validation, F.C. and L.Z.; formal analysis, Y.Z. and A.C.; investigation, F.C. and L.Z.; resources, H.R.M., B.D., W.C., J.T., E.H. and P.J.; writing—original draft preparation, Y.Z. and A.C.; writing—review and editing, H.R.M.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Authors Fei Cai, Lijuan Zhou, Bryce Delgado, Wenping Chang, Jeffrey Tom, Evelyn Hernandez, Prajakta Joshi, Aimin Song, Matthieu Masureel, Henry R. Maun and Yingnan Zhang were employed by the company Genentech. Author Andrew Chang was employed by the company DeepSeq.AI. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine Learning
UMAP	Uniform Manifold Approximation and Projection
VCX	Venom-ConotoXin
PPI	Protein–Protein Interaction
DCP	Disulfide-Constrained Peptide
SPPS	Solid-Phase Peptide Synthesis
Trx	Thioredoxin
GPCR	G-Protein-Coupled Receptor
TFA	Trifluoroacetic Acid
DMSO	Dimethyl Sulfoxide
LC-MS	Liquid Chromatography–Mass Spectrometry

References

1. Freuville, L.; Matthys, C.; Quinton, L.; Gillet, J.P. Venom-derived peptides for breaking through the glass ceiling of drug development. Front. Chem. 2024, 12, 1465459.
2. Kim, E.; Hwang, D.H.; Mohan Prakash, R.L.; Asirvatham, R.D.; Lee, H.; Heo, Y.; Munawir, A.; Seyedian, R.; Kang, C. Animal Venom in Modern Medicine: A Review of Therapeutic Applications. Toxins 2025, 17, 371.
3. Marte, F.; Sankar, P.; Patel, P.; Cassagnol, M. Captopril; StatPearls: Treasure Island, FL, USA, 2025.
4. Bridges, A.; Bistas, K.G.; Jacobs, T.F. Exenatide; StatPearls: Treasure Island, FL, USA, 2025.
5. Wie, C.S.; Derian, A. Ziconotide; StatPearls: Treasure Island, FL, USA, 2025.
6. Cai, F.; Wei, Y.; Kirchhofer, D.; Chang, A.; Zhang, Y. Rapid prediction of key residues for foldability by machine learning model enables the design of highly functional libraries with hyperstable constrained peptide scaffolds. PLoS Comput. Biol. 2024, 20, e1012609.
7. Zhou, L.; Cai, F.; Li, Y.; Gao, X.; Wei, Y.; Fedorova, A.; Kirchhofer, D.; Hannoush, R.N.; Zhang, Y. Disulfide-constrained peptide scaffolds enable a robust peptide-therapeutic discovery platform. PLoS ONE 2024, 19, e0300135.
8. The UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 2025, 53, D609–D617.
9. Kaas, Q.; Westermann, J.C.; Halai, R.; Wang, C.K.; Craik, D.J. ConoServer, a database for conopeptide sequences and structures. Bioinformatics 2008, 24, 445–446.
10. Kaas, Q.; Yu, R.; Jin, A.H.; Dutertre, S.; Craik, D.J. ConoServer: Updated content, knowledge, and discovery tools in the conopeptide database. Nucleic Acids Res. 2012, 40, D325–D330.
11. Schwalen, C.J.; Babu, C.; Phulera, S.; Hao, Q.; Wall, D.; Nettleton, D.O.; Pathak, T.P.; Siuti, P. Scalable Biosynthetic Production of Knotted Peptides Enables ADME and Thermodynamic Folding Studies. ACS Omega 2021, 6, 29555–29566.
12. Grabski, A.C. Advances in preparation of biological extracts for protein purification. Methods Enzymol. 2009, 463, 285–303.
13. Kyte, J.; Doolittle, R.F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982, 157, 105–132.
14. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
15. Chennamsetty, N.; Voynov, V.; Kayser, V.; Helk, B.; Trout, B.L. Prediction of aggregation prone regions of therapeutic proteins. J. Phys. Chem. B 2010, 114, 6614–6624.
16. Ahvati, H.; Roudi, R.; Sobhani, N.; Safari, F. CD47 as a potent target in cancer immunotherapy: A review. Biochim. Biophys. Acta Rev. Cancer 2025, 1880, 189294.
17. Calderon, A.A.; Dimond, C.; Choy, D.F.; Pappu, R.; Grimbaldeston, M.A.; Mohan, D.; Chung, K.F. Targeting interleukin-33 and thymic stromal lymphopoietin pathways for novel pulmonary therapeutics in asthma and COPD. Eur. Respir. Rev. 2023, 32, 220144.
18. Wang, H.; Zheng, T.; Xu, D.; Sun, C.; Huang, D.; Liu, X. Targeting DLL3: Innovative Strategies for Tumor Treatment. Pharmaceutics 2025, 17, 520.
19. Liu, X.; Li, Y.; Huang, L.; Kuang, Y.; Wu, X.; Ma, X.; Zhao, B.; Lan, J. Unlocking the therapeutic potential of P2X7 receptor: A comprehensive review of its role in neurodegenerative disorders. Front. Pharmacol. 2024, 15, 1450704.
20. Tonikian, R.; Zhang, Y.; Boone, C.; Sidhu, S.S. Identifying specificity profiles for peptide recognition modules from phage-displayed peptide libraries. Nat. Protoc. 2007, 2, 1368–1386.
21. Zhang, Y.; Ultsch, M.; Skelton, N.J.; Burdick, D.J.; Beresini, M.H.; Li, W.; Kong-Beltran, M.; Peterson, A.; Quinn, J.; Chiu, C.; et al. Discovery of a cryptic peptide-binding site on PCSK9 and design of antagonists. Nat. Struct. Mol. Biol. 2017, 24, 848–856.
22. Hsiao, M.H.; Miao, Y.; Liu, Z.; Schutze, K.; Limjunyawong, N.; Chien, D.C.; Monteiro, W.D.; Chu, L.S.; Morgenlander, W.; Jayaraman, S.; et al. Molecular Display of the Animal Meta-Venome for Discovery of Novel Therapeutic Peptides. Mol. Cell Proteom. 2025, 24, 100901.
23. Thurber, G.M.; Schmidt, M.M.; Wittrup, K.D. Antibody tumor penetration: Transport opposed by systemic and antigen-mediated clearance. Adv. Drug Deliv. Rev. 2008, 60, 1421–1434.
24. Vadevoo, S.M.P.; Gurung, S.; Lee, H.S.; Gunassekaran, G.R.; Lee, S.M.; Yoon, J.W.; Lee, Y.K.; Lee, B. Peptides as multifunctional players in cancer therapy. Exp. Mol. Med. 2023, 55, 1099–1109.
25. Armstrong, A.; Coburn, F.; Nsereko, Y.; Al Musaimi, O. Peptide-Drug Conjugates: A New Hope for Cancer. J. Pept. Sci. 2025, 31, e70040.
26. Yu, Y.; Lyu, J.; Muhadaisi, Y.; Shi, C.; Wang, D. Peptide based vesicles for cancer immunotherapy: Design, construction and applications. Front. Immunol. 2025, 16, 1609162.
27. Xiao, W.; Jiang, W.; Chen, Z.; Huang, Y.; Mao, J.; Zheng, W.; Hu, Y.; Shi, J. Advance in peptide-based drug development: Delivery platforms, therapeutics and vaccines. Signal Transduct. Target. Ther. 2025, 10, 74.
28. Sagar, B.; Gupta, S.; Verma, S.K.; Reddy, Y.V.M.; Shukla, S. Navigating cancer therapy: Harnessing the power of peptide-drug conjugates as precision delivery vehicles. Eur. J. Med. Chem. 2025, 283, 117131.
29. Daniel, J.T.; Clark, R.J. G-Protein Coupled Receptors Targeted by Analgesic Venom Peptides. Toxins 2017, 9, 372.
30. Mir, R.; Karim, S.; Kamal, M.A.; Wilson, C.M.; Mirza, Z. Conotoxins: Structure, Therapeutic Potential and Pharmacological Applications. Curr. Pharm. Des. 2016, 22, 582–589.
31. Nareoja, K.; Nasman, J. Selective targeting of G-protein-coupled receptor subtypes with venom peptides. Acta Physiol. 2012, 204, 186–201.
32. Saez, N.J.; Senff, S.; Jensen, J.E.; Er, S.Y.; Herzig, V.; Rash, L.D.; King, G.F. Spider-venom peptides as therapeutics. Toxins 2010, 2, 2851–2871.
33. Wulff, H.; Christophersen, P.; Colussi, P.; Chandy, K.G.; Yarov-Yarovoy, V. Antibodies and venom peptides: New modalities for ion channels. Nat. Rev. Drug Discov. 2019, 18, 339–357.
34. Kunkel, T.A.; Roberts, J.D.; Zakour, R.A. Rapid and efficient site-specific mutagenesis without phenotypic selection. Methods Enzymol. 1987, 154, 367–382.
35. Delgado, B.D.; Long, S.B. Mechanisms of ion selectivity and throughput in the mitochondrial calcium uniporter. Sci. Adv. 2022, 8, eade1516.
36. Tombling, B.J.; Cai, F.; Chang, A.; Balana, A.; Miller, S.; Walters, A.L.; Wei, Y.; Tang, W.; Wendorff, T.; Liu, P.; et al. Developing high affinity peptide ligands of LRRC15 using machine learning. In Journal of Peptide Science; Wiley: Hoboken, NJ, USA, 2025.
37. Ro, S.Y.; Jao, C.; Oh, A.; Kschonsak, M.; Li, T.; Austin, D.; Greiner, D.M.Z.; Zhou, L.; Zhang, Y.; Chen, J.; et al. Fab-Induced Stabilization of an Ion Channel Receptor Enables Mechanistic Characterization of Small-Molecule Therapeutics. Anal. Chem. 2025, 97, 5102–5108.

Figure 1. Benchmarking the VCX library against the DCP platform. The VCX library was characterized and benchmarked against the DCP platform based on sequence properties. (a) Comparison of normalized foldability score distribution of the VCX (left) and DCP (right) libraries. Scores were normalized relative to the maximum value from both datasets to facilitate direct visual comparison. Quantiles were determined via Monte Carlo simulation to account for prediction uncertainty. A Mann-Whitney U test confirmed a statistically significant difference between the two distributions (p << 0.001). (b) Comparison of pairwise sequence similarity distribution of the VCX (left) and DCP (right) libraries. (c) Uniform Manifold Approximation and Projection (UMAP) analysis comparing the sequence space occupied by the VCX (teal points) and DCP (gray points) libraries. The UMAP analysis was performed on high-dimensional sequence feature vectors derived from a protein language model as previously described [6]. (d) Pie chart summary illustrating the yield obtained from the high-throughput recombinant expression and purification of Trx-VCX constructs for 689 individual clones.
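
The joint normalization and rank-based comparison described for panel (a) can be illustrated with a minimal sketch. The scores below are made-up placeholders rather than values from the VCX or DCP libraries, the function names are hypothetical, and only the U statistic is computed (no p-value and no Monte Carlo quantiles).

```python
def normalize_jointly(a, b):
    """Scale both score lists by the single largest value found in either list."""
    top = max(max(a), max(b))
    return [x / top for x in a], [x / top for x in b]


def mann_whitney_u(a, b):
    """U statistic for sample a: count pairs where a beats b; ties count one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Placeholder scores (hypothetical, for illustration only)
lib_a = [8.0, 6.0, 7.0, 9.0]
lib_b = [4.0, 5.0, 6.0, 3.0]

norm_a, norm_b = normalize_jointly(lib_a, lib_b)
u_a = mann_whitney_u(norm_a, norm_b)
# Fraction of cross-library pairs in which lib_a scores higher
effect = u_a / (len(norm_a) * len(norm_b))
print(u_a, effect)
```

Dividing both sets by a shared maximum preserves their relative ordering, so the U statistic is the same before and after normalization.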

Figure 2. Surface plasmon resonance (SPR) characterization of VCX hits in two forms. Comparison of the binding kinetics of selected VCX peptides when expressed as a thioredoxin (Trx) fusion protein versus the corresponding peptide obtained from solid-phase peptide synthesis (SPPS). (a) Representative SPR sensorgrams for the binding of selected VCX hits to target proteins. Red: raw data; Black: fitted curves. The upper row shows binding to IL33, and the lower row shows binding to P2X7R. The left column displays the data for the Trx-VCX fusion and the right column displays the data for the SPPS peptide. (b) Correlation plot comparing the dissociation constants (K_d) measured by SPR for the Trx-VCX fusion and the corresponding SPPS peptide.
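
For readers less familiar with how a K_d is obtained from sensorgrams, the relationship can be sketched under the assumption of a simple 1:1 binding model. That model is an assumption made here for illustration and is not stated in this caption. The rate constants, R_max, and function names below are hypothetical.

```python
def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


def steady_state_response(conc, kd, r_max):
    """Equilibrium SPR response for a 1:1 interaction at analyte concentration conc (M)."""
    return r_max * conc / (kd + conc)


# Hypothetical rate constants, not taken from the fits shown in the figure
k_on = 1.0e5   # association rate constant, 1/(M*s)
k_off = 1.0e-2  # dissociation rate constant, 1/s

kd = kd_from_rates(k_on, k_off)
# When the analyte concentration equals K_d, the response is half of r_max
half_max = steady_state_response(kd, kd, r_max=100.0)
print(kd, half_max)
```

Kinetic fitting estimates both rate constants from the curve shapes, whereas steady-state fitting estimates K_d from the plateau responses across a concentration series.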

Figure 3. NNK-block mutagenesis and screening strategy. (a) Schematic illustration of the NNK-block screening strategy using DLL3-g3n02 as an example. The full peptide sequence was combinatorially mutated at two positions, with each position consisting of a block of three consecutive, fully randomized residues (NNK codons encoding XXX), which are underlined. For clarity, only a subset of sequences is displayed; omitted variants are indicated with bullets. All Cys residues are highlighted in red font. (b) Sequence logo representing the simulated NNK-block library based on the strategy described in (a), which contains 5 × 10⁵ members. (c) Heatmap displaying the enrichment scores of the NNK-block screening library shown in (a,b) after three rounds of phage display panning against the target protein DLL3. Grey indicates invariant positions in the NNK-block mutagenesis library.
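
As background on the NNK codon scheme named in this caption, the short sketch below enumerates the degenerate codon (N = any base, K = G or T) against the standard genetic code. It covers a single three-residue block only and does not attempt to reproduce the simulated library in panel (b). The variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# N = any of the four bases, K = G or T
nnk_codons = ["".join(codon) for codon in product(BASES, BASES, "GT")]
encoded = [CODON_TABLE[codon] for codon in nnk_codons]
residues = sorted(set(encoded) - {"*"})          # amino acids reachable via NNK
stops = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

block_len = 3  # one block of three consecutive randomized residues
dna_variants = len(nnk_codons) ** block_len
protein_variants = len(residues) ** block_len

print(len(nnk_codons), len(residues), stops)
print(dna_variants, protein_variants)
```

Restricting the third base to G or T keeps all 20 amino acids accessible while leaving only the amber codon (TAG) as a possible stop.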

Figure 4. Sequence-space coverage and prediction performance of VCX and NNK libraries. (a) Joint UMAPs for DLL3 sequence–space coverage using three libraries: the initial VCX library (left), an NNK-block library derived from three representative VCX hits (middle), and the ML-predicted library (right). Grey dots represent sequences in the libraries. Thirty-five SPR-characterized binders are overlaid (teal dots). (b) Models trained on VCX panning NGS data (left) and on NNK-block NGS data (right) were evaluated against the SPR K_d for the same 35 clones. The VCX-trained model showed a modest rank correlation (Spearman ρ = 0.33), whereas the NNK-trained model achieved a stronger correlation (ρ = 0.57).
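
The Spearman rank correlation reported in panel (b) can be illustrated with a small standard-library sketch. The model scores and affinities below are hypothetical toy values, not the 35 clones from this study, and expressing affinity as −log10(K_d) so that larger means tighter binding is an assumption made here for readability.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data
model_scores = [0.9, 0.4, 0.7, 0.1, 0.5]
neg_log10_kd = [8.0, 6.5, 6.0, 5.0, 7.0]
print(round(spearman_rho(model_scores, neg_log10_kd), 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of the model scores and to any monotonic transform of K_d.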

Figure 5. Affinity maturation of DLL3 hits illustrated by surface plasmon resonance (SPR). Affinity improvement achieved through ML-assisted affinity maturation of initial VCX hits targeting DLL3. (a) Comparison of SPR sensorgrams for binding of Trx-VCX fusion proteins to DLL3. The right panel shows the sensorgram for the primary hit, and the left panel shows the sensorgram for the final lead achieved after ML-assisted maturation, demonstrating a 273-fold improvement in affinity. (b) SPR sensorgrams for the top two DLL3 leads as SPPS peptides after ML-assisted affinity maturation. Red: raw data; Black: fitted curves.

Table 1. Summary of VCX hits against four target proteins, including the total number of unique hit sequences (Number of Hits), the total number of unique scaffolds represented (Number of Scaffolds), and the binding affinity (K_d) range determined using Trx-fusion peptides.

Target	Number of Hits	Number of Scaffolds	Trx-Fusion K_d Range
P2X7R	16	11	21 nM–2 µM
IL33	57	39	213 nM–34 µM
DLL3	22	14	6.6–185 µM
CD47	33	28	1–555 µM

Table 2. K_d measured by SPR for VCX ligands as Trx-VCX fusions and SPPS peptides. Most reported values were based on kinetic fitting. Steady-state fitting values are reported (italic font) when kinetic fitting was flagged as “parameters not uniquely determined” by the Biacore evaluation software.

Target	Sequence	Trx-Fusion K_d (M)	SPPS K_d (M)
CD47	QHRRNENQKAHDVNAQTYTWCCTQGPCRNTHRNGCS	2.89 × 10⁻⁶	1.76 × 10⁻⁵
CD47	NWCPPRISLCNSDKHCCKYVRCQRRDARMDKEECSQ	6.39 × 10⁻⁶	1.51 × 10⁻⁵
CD47	TRRRQCPPWCTYKICYESTC	5.49 × 10⁻⁶	1.63 × 10⁻⁵
CD47	KCAKYHEVCGDDSKCCHSFDCPGEVIIYCEKSN	4.27 × 10⁻⁴	3.91 × 10⁻⁵
DLL3	QWPFQQWIPCTIHWNCDGNWCCFPITCYEQTGMCD	5.54 × 10⁻⁵	3.50 × 10⁻⁶
DLL3	SWDWTWTSWNDNHETSYQIEDCCPNLQEFCCP	2.44 × 10⁻⁵	2.07 × 10⁻⁵
DLL3	YGKDHHEWVMYEWSQEEITCLDWGELCNLWFPTCCEYCIHPFCA	1.39 × 10⁻⁵	7.50 × 10⁻⁶
DLL3	WSEEWTWISCPMTWNCDGNWCCWHWDCGWQTWMCD	1.54 × 10⁻⁹	6.66 × 10⁻⁹
DLL3	WDPTWQWLPCPMHWNCDGNWCCWTWDCGESGWMCD	1.10 × 10⁻⁹	1.26 × 10⁻⁸
DLL3	WDWEQWWIPCAMHWNCDGNWCCSWWDCTDQGGMCD	1.36 × 10⁻⁸	8.77 × 10⁻⁸
DLL3	WTPECTWTCHWTTCNESWCSCWSWHECTWT	1.05 × 10⁻⁷	3.26 × 10⁻⁶
DLL3	WWETWTWIPCYTSWNCDGNWCCMHTDCTESWWMCD	1.51 × 10⁻⁸	6.53 × 10⁻⁹
DLL3	CGDTCYGLTCNTPFCTCKADRCWATWYWT	2.17 × 10⁻⁷	2.71 × 10⁻⁸
IL33	NPRHGTCYYVKFRCEHRWCWIHVKKCPQTDADDAFN	2.83 × 10⁻⁷	3.83 × 10⁻⁷
IL33	CTPNGGFCIMHYHCCKWTCFTITWNCN	8.00 × 10⁻⁷	1.60 × 10⁻⁶
IL33	DRPRPSKRCIAWKQPCEPHRNHNCCQEHCWNFVCE	2.96 × 10⁻⁶	2.93 × 10⁻⁶
IL33	NAAYWHPQPKWGFTQYHFICNASHCDWVWYCKFMKCYDCRNTRCT	3.04 × 10⁻⁷	1.37 × 10⁻⁵
IL33	RRNLQTEWNPLSLFMWRRWCWWHCRWHSHCASHCICTFRGCGAVNG	4.32 × 10⁻⁷	7.60 × 10⁻⁷
IL33	SKRKTTAHPWIEYECTYVHQTCHDQTPCCSGWCVFYCTGWR	6.32 × 10⁻⁷	1.70 × 10⁻⁵
IL33	CWYCYCPWWPCPQDQDCPGECICMAHGFCG	4.34 × 10⁻⁶	3.05 × 10⁻⁷
IL33	GLPVCGETCTLGKCYTSCCWCWWPWCYCR	3.11 × 10⁻⁷	3.90 × 10⁻⁶
IL33	SIPIWRCYPACILDTCDSYGCDCGEWMLCYMAN	8.99 × 10⁻⁷	1.79 × 10⁻⁷
IL33	GLPVCGEYCFTGKCYTWCCWCTPRRYCECR	8.57 × 10⁻⁷	5.72 × 10⁻⁷
IL33	GLPLCAEECSLGTCWTSCCWCWWPWCYCR	3.69 × 10⁻⁷	2.13 × 10⁻⁷
IL33	GLPLCGEECGSNSCFTTCCWCWWPWCYCR	3.75 × 10⁻⁷	1.10 × 10⁻⁷
IL33	CWYCYCPYRYCPSWTDCPRHCYCRFHGFCG	2.20 × 10⁻⁶	2.60 × 10⁻⁶
IL33	GIPVWRCYPACILDTCDSYGCECGEWMLCYMSD	3.08 × 10⁻⁷	1.55 × 10⁻⁷
IL33	SIPVWRCYPACILDTCDSYGCECGEWMLCYMTD	2.48 × 10⁻⁷	1.77 × 10⁻⁷
IL33	DSARKEVENPKASKWHWYWCQWRPRPCNSSVPCCGGSCGYFSCR	4.70 × 10⁻⁷	1.73 × 10⁻⁷
IL33	EDTRKEVENPKASKWHWYWCQWRPRLCNSSVPCCSGSCGYFSCR	5.44 × 10⁻⁷	3.09 × 10⁻⁷
IL33	CDCNYRCSPQYSPPCRCRWCWCHPLGLFVGFCIHPTG	4.11 × 10⁻⁸	9.51 × 10⁻⁸
IL33	CWYCYCPWRYCVSAQTCSAHCWCSYHGFCG	2.68 × 10⁻⁷	1.11 × 10⁻⁷
P2X7R	RRDCRWYQCEFQCCETINGQERCREINCH	1.94 × 10⁻⁶	5.17 × 10⁻⁶
P2X7R	KSDHVHKKWRWDKTARDHSNRPPPCCNNPACLSNRC	5.60 × 10⁻⁶	2.93 × 10⁻⁶
P2X7R	SHGRNAARKASDLIALTVRECCSQPPCRWKHPELCS	7.00 × 10⁻⁶	1.16 × 10⁻⁶
P2X7R	CCHTQCSQQYNCGQ	5.53 × 10⁻⁶	4.18 × 10⁻⁶
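
One simple way to compare the two formats in Table 2 is the ratio of SPPS K_d to Trx-fusion K_d on a log scale. The sketch below uses the first row listed for each target in the table; the function name and the choice of a log10 ratio are illustrative assumptions, not the analysis used for Figure 2b.

```python
import math

# (target, Trx-fusion K_d in M, SPPS K_d in M): first row per target in Table 2
rows = [
    ("CD47", 2.89e-6, 1.76e-5),
    ("DLL3", 5.54e-5, 3.50e-6),
    ("IL33", 2.83e-7, 3.83e-7),
    ("P2X7R", 1.94e-6, 5.17e-6),
]


def log10_ratio(kd_spps, kd_trx):
    """Positive when the SPPS peptide binds more weakly than the Trx fusion."""
    return math.log10(kd_spps / kd_trx)


for target, kd_trx, kd_spps in rows:
    fold = kd_spps / kd_trx
    print(f"{target}: SPPS/Trx = {fold:.2f}, log10 ratio = {log10_ratio(kd_spps, kd_trx):+.2f}")
```

A log10 ratio near zero means the fusion and the synthetic peptide gave similar affinities, and a value of ±1 corresponds to a tenfold difference.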

