In Silico Prediction of Anti-Infective and Cell-Penetrating Peptides from Thalassophryne nattereri Natterin Toxins

The therapeutic potential of venom-derived peptides, such as bioactive peptides (BAPs), is determined by specificity, stability, and pharmacokinetic properties. BAPs, including anti-infective or antimicrobial peptides (AMPs) and cell-penetrating peptides (CPPs), share several physicochemical characteristics and are potential alternatives to antibiotic-based therapies and drug delivery systems, respectively. This study used in silico methods to predict AMPs and CPPs derived from natterins from the venomous fish Thalassophryne nattereri. Fifty-seven BAPs (19 AMPs, 8 CPPs, and 30 AMPs/CPPs) were identified using the web servers CAMP, AMPA, AmpGram, C2Pred, and CellPPD. The physicochemical properties were analyzed using the ProtParam, PepCalc, and DispHred tools. The membrane-binding potential and cellular location of each peptide were analyzed using the Boman index, calculated with APD3, and the TMHMM web server. All CPPs and two AMPs showed high membrane-binding potential. Fifty-four peptides were located in the plasma membrane. Peptide immunogenicity, toxicity, allergenicity, and ADMET parameters were evaluated using several web servers. Sixteen antiviral peptides and 37 anticancer peptides were predicted using the web servers Meta-iAVP and ACPred. Secondary structures and helical wheel projections were predicted using the PEP-FOLD3 and Heliquest web servers. Fifteen peptides are potential lead compounds and were selected to be synthesized and tested experimentally in vitro to validate the in silico screening. The use of computer-aided design for predicting peptide structure and activity is fast and cost-effective and facilitates the design of potent therapeutic peptides. The results demonstrate that toxins form a natural biotechnological platform in drug discovery, and the presence of CPP and AMP sequences in toxin families opens new possibilities in toxin biochemistry research.


Introduction
Animal venoms contain a diverse and complex mixture of bioactive compounds that target various receptors to support the survival of venomous animals [1]. Several drugs derived from animal venoms have been approved by the FDA for human use, while other drugs are in clinical trials [2,3]. Recent advances in genomics and proteomics have improved the biochemical analysis of venoms [4]. The ability to rapidly screen venom compounds using high-throughput technologies and the prediction of new molecules encoded in toxins allows for harnessing the therapeutic potential of animal venoms.
Peptides are key molecules found in all organisms and play a crucial role in many biological processes [5][6][7][8]. The wide distribution and functional diversity of peptides increase their therapeutic potential [9][10][11]. Venom-derived peptides involved in defense and predation have long been exploited for medicinal, agricultural, and biotechnological applications [1,12]. Most of these peptides originate from a limited number of taxa of venomous terrestrial animals. However, several bioactive compounds from fish venoms have been isolated and characterized [13]. The aim of this study is the in silico prediction of new bioactive peptides (BAPs) derived from natterins from the venomous fish Thalassophryne nattereri.
T. nattereri is responsible for cases of envenomation of fishermen and bathers in the north and northeast of Brazil [14,15]. The most common sites of envenomation are the palms of the hands or soles of the feet [16]. The natterin family of toxins contains five orthologs: natterin 1-4 and -P [17]. Natterins are tissue-kallikrein-like enzymes and aerolysin-like pore-forming toxins responsible for the main toxic effects of T. nattereri venom: local edema, excruciating pain, and necrosis [18][19][20]. The degree of amino acid homology between natterin 1 and 2 is 84%, and these orthologs have 40% identity with natterin 3 and 4 (Figure S1). Natterin P is the shortest ortholog (71 amino acids) and shows 84% identity with the first 55 amino acid residues in the N-terminus of natterin 4 [17,20]. We hypothesize that natterins should be a source of BAPs with antimicrobial and cell-penetrating activity based on their pharmacological profile.
The therapeutic potential of venom-derived BAPs is determined by specificity, stability, and pharmacokinetic properties [21]. Two classes of BAPs, anti-infective or antimicrobial peptides (AMPs) and cell-penetrating peptides (CPPs), share several physicochemical characteristics and are potential alternatives to antibiotic-based therapies and drug delivery systems, respectively. Since the plasma membrane selectively controls the transport of bioactive substances across cells, there is increased interest in developing novel strategies to overcome this barrier and increase bioavailability. In this context, peptide-based transport systems, such as CPPs, have come into focus, and their efficiency has been demonstrated in multiple applications [22][23][24][25].
AMPs are a large class of naturally occurring peptides with antibacterial and/or antifungal activity and can help overcome microbial resistance to conventional antibiotics [26][27][28]. Fusion of CPPs and AMPs produces multifunctional peptides capable of treating infections, cancer, obesity, and other diseases [29][30][31][32]. Thus, concerted efforts are being made to design new AMPs or CPPs [33][34][35][36][37]. Nonetheless, these BAPs have failed clinical trials, underscoring the need to optimize these peptides. In this context, the computer-aided design of BAPs has generated crucial information on the physicochemical characteristics and biological activities of BAPs, allowing these properties and activities to be analyzed before peptide synthesis. Several methods have been developed to predict AMPs and CPPs and evaluate physicochemical properties [38,39].
AMPs and CPPs can be derived from known protein sequences. However, analyzing the physicochemical properties of proteins using experimental techniques is expensive and laborious. In silico approaches are faster, cheaper, and less laborious, enabling the large-scale screening and identification of BAPs with application in biomedicine and pharmacology [40].
Several BAP prediction tools have been developed using different data features and machine learning methods [34,40,41], and the performance of these tools varies depending on these features and the nature of the training technique. Most prediction methods use single classifier models such as support vector machine (SVM), discriminant analysis, fuzzy k-nearest neighbor, and deep learning. Other methods use decision tree classifiers such as ensemble models and random forests [42].
In silico approaches have facilitated the design of highly effective engineered peptides with cell-penetrating, antimicrobial, and anticancer activity [43][44][45][46][47][48]. However, as peptides gain ground over small molecule drugs [2,49], some disadvantages must be overcome, including chemical and physical instability [50], high susceptibility to proteolytic degradation [51], short half-life and high clearance [52], slow tissue penetration [53], and high cytotoxicity [53]. In this context, machine learning techniques have been used to screen peptide template libraries based on physicochemical properties and absorption, distribution, metabolism, excretion, and toxicity (ADMET) parameters. This study evaluated the physicochemical and ADMET profiles of newly predicted peptides derived from natterins.

Results and Discussion
There has been an increased interest in therapeutic peptides as potential drug candidates [54]. Several studies identified and characterized a wide range of therapeutic peptides, including tumor-homing peptides [55], CPPs [56], AMPs [57], and anticancer peptides (ACPs) [58][59][60], and used these peptides for treating cancer, diabetes, and cardiovascular diseases. As a result of these efforts, several peptides have entered clinical trials over the past two decades [54]. Nonetheless, only a few peptide-based drugs are used clinically. Therefore, many research groups have focused on computational design based on physicochemical and structural features to produce potent and broad-spectrum peptides [9]. Several computational tools have been used to design peptide-based drug candidates [41]. This study predicted and characterized novel and potent BAPs derived from T. nattereri natterins by in silico analysis.

Identification of Potential Natterin-Derived AMPs and CPPs
Fifty-seven natterin-derived BAPs were identified using the web servers AMPA, CAMP, AmpGram, C2Pred, and CellPPD. These peptides were named according to the original sequence (natterin 1, 2, 3, 4, or P) and the order in which they were identified. For instance, the first peptide derived from natterin 1 was named NATT1_1, the second was named NATT1_2, etc. Some peptide sequences were homologous to more than one natterin. In these cases, the numbering of the source natterin was added to the nomenclature. For instance, the peptide RTYRGGKKTQTTTKGVYRTTQV was the first to be identified as belonging to natterin 1 and 2 and thus was named NATT1.2_1.
All predicted AMPs and CPPs and their respective scores (SVM, RF, or artificial neural networks (ANNs)) and probability scores are listed in Table 1. Nineteen peptides were classified as AMPs, of which seven, three, six, and three belonged to natterin 1, 2, 3, and 4, respectively. Eight CPPs were found, of which one and seven belonged to natterin 2 and 4, respectively. Thirty sequences shared AMP and CPP characteristics, of which five, eleven, four, five, and five sequences belonged to natterin 1, 2, 3, 4, and P, respectively. In the C2Pred web server, peptides with scores of <0.5 and ≥0.5 are classified as non-CPPs and CPPs, respectively. Although some peptides were predicted to be CPPs by CellPPD, C2Pred classified them as non-CPPs. For instance, the natterin 3-derived peptide NATT3_10 was classified as CPP and non-CPP using CellPPD (SVM score of 0.1) and C2Pred (score of 0.48551), respectively.
The length of the predicted peptides varied from 10 to 23 amino acid residues. In the 1980s, most peptides entering clinical development were less than 10 amino acids long. However, the length of engineered peptides increased over the years due to improvements in chemical synthesis and manufacturing technologies [61,62]. In the current decade, candidate peptides have up to 40 amino acids, suggesting that length is no longer a limitation. Nonetheless, drug candidates with 10 amino acid residues are still the majority in peptide drug development. In the present study, 63.8% of the peptides presented 10 amino acids.
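The C2Pred cutoff described above can be expressed as a minimal sketch. The function names `classify_c2pred` and `tools_agree` are hypothetical helpers introduced only for illustration; the CellPPD result is passed in as a label because its decision rule is not stated in the text.

```python
# Sketch only: applies the 0.5 cutoff that the text gives for C2Pred scores.
C2PRED_CUTOFF = 0.5  # from the text: <0.5 is non-CPP, >=0.5 is CPP


def classify_c2pred(score: float) -> str:
    """Return the C2Pred class implied by a score."""
    return "CPP" if score >= C2PRED_CUTOFF else "non-CPP"


def tools_agree(cellppd_call: str, c2pred_score: float) -> bool:
    """True when a CellPPD call (given as a label) matches the C2Pred class."""
    return cellppd_call == classify_c2pred(c2pred_score)


# NATT3_10: CellPPD call and C2Pred score as reported in the text.
print(classify_c2pred(0.48551))
print(tools_agree("CPP", 0.48551))
```

Flagging discordant calls in this way makes it easy to list peptides, such as NATT3_10, on which the two servers disagree.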

Physicochemical Properties and Membrane-Binding Potential
The following physicochemical characteristics were analyzed: net charge, pI, molecular weight (MW), amphipathicity, water solubility, hydrophobicity, and hydrophobicity ratio. CPPs and AMPs are rich in particular amino acids, such as Arg, Trp, Pro, Gly, Cys, and His. The hallmark of these two classes of peptides is an abundance of basic (Arg and Lys) residues and/or Trp. The charge and pI values are shown in Figure 1. Forty-six (81%) peptides were cationic, nine (16%) were anionic, and two (3%) were neutral. The modes of action are determined by the physicochemical features of amino acid residues [63]. The net positive charge and amphipathicity significantly influence the bioactivity of AMPs and most CPPs [27]. The net positive charge affects initial electrostatic interactions with anionic phospholipids and lipopolysaccharides in the plasma membranes of certain pathogens [64]. In turn, the outer leaflet of mammalian cell membranes, such as those of red blood cells, is composed primarily of zwitterionic phospholipids and is more strongly affected by peptide hydrophobicity than by positive charges [65]. Highly hemolytic peptides interact with phosphatidylcholine, an abundant component of zwitterionic membranes [66]. In contrast, cholesterol inhibits peptide binding in mammalian cell membranes [67].
In vitro and in vivo studies require controlled and accurate peptide concentrations; hence, peptide solubilization is a critical step for successful assays. Poor peptide solubilization can introduce experimental errors and lead to experimental failure [21]. The solubility of bioactive peptides depends on the molecular length and the number of hydrophobic amino acids (Table 2) [68]. Peptides with a high percentage (≥50%) of hydrophobic amino acids are generally only partially soluble in aqueous solutions [20,69]. Our results showed that 22% of the peptides were stable (Table S1). The stability of drug candidates is critical for manufacturing the active pharmaceutical ingredient and for enabling formulation of a stable compound. Further, these properties enable producing peptides with different routes of administration, including topical, subcutaneous, and fast intravenous push preparations [70]. The hydrophobic properties of all peptides were calculated, and a plot of hydrophobicity vs. hydrophobic moment vs. GRAVY allowed us to visualize the differences in hydrophobicity among the peptides (Figure 2). The plot suggests that decreasing the hydrophobicity and amphipathicity of the natterin peptides decreases their cellular uptake, whereas a substantial increase in these parameters can increase their cytotoxicity. Carefully controlling these parameters can therefore enhance peptide internalization, although above a threshold value unwanted toxicity is expected to appear.
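As a rough illustration of two of the properties discussed above, the sketch below estimates net charge and the fraction of hydrophobic residues for a peptide sequence. It is not the procedure used by ProtParam or PepCalc. The charge model is a deliberate simplification (Lys/Arg count as +1, Asp/Glu as −1, His and the termini are ignored), and the hydrophobic residue set is an assumption borrowed from the helical-wheel definition given in this article.

```python
# Illustrative helpers; both residue sets are simplifying assumptions.
POSITIVE = set("KR")            # His and the N-terminus are ignored here
NEGATIVE = set("DE")            # the C-terminus is ignored here
HYDROPHOBIC = set("LIAVPMFWY")  # assumed set: Leu, Ile, Ala, Val, Pro, Met, Phe, Trp, Tyr


def net_charge(seq: str) -> int:
    """Approximate net charge near neutral pH from residue counts."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that belong to the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def likely_partially_soluble(seq: str, cutoff: float = 0.5) -> bool:
    """Flag peptides at or above the 50% hydrophobic-residue level cited in the text."""
    return hydrophobic_fraction(seq) >= cutoff


peptide = "RTYRGGKKTQTTTKGVYRTTQV"  # NATT1.2_1, sequence as given in the text
print(net_charge(peptide), round(hydrophobic_fraction(peptide), 3))
```

Under these assumptions the example peptide is strongly cationic and well below the 50% hydrophobic-residue level, so it would not be flagged for solubility problems.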
The nature, positioning, and aromaticity of hydrophobic residues strongly influence the fate of CPPs in terms of the reversibility of the membrane interaction and final membrane crossing. Studies with Trp-rich peptides revealed that less hydrophobic residues and more interfacial ones can help the peptides establish more transitory interactions with the membrane, in part due to a shallower membrane insertion. This type of flexible membrane interaction is important to prevent the peptide from being locked in the membrane interior and to trigger translocation into membranes [71].
The Boman index estimates protein-binding potential and is calculated on the basis of the cyclohexane-to-water partition coefficient of the respective amino acid side chains divided by the total number of amino acid residues within the peptide [72]. A high index (>2.48) indicates high binding potential (e.g., hormones), whereas a low index (≤1) indicates fewer side effects (e.g., lower toxicity to mammalian cells) [72]. Seven (12%) sequences had a Boman index below 1 (Figure 3). The sequences YVCSCGCSSG (NATT3_03) and LYVAKNKYGLGKL (NATT4_01) presented the best index (0.05 and 0.08, respectively) and will be chemically synthesized and assayed in vitro and in vivo. These Boman values were expected as AMPs typically do not bind to other proteins but penetrate and disrupt the plasma membrane.
Given the amphiphilic nature of CPPs, ACPs, and antiviral peptides (AVPs), strong interaction with and deep penetration into the anionic lipid bilayers are expected for BAPs, making the plasma membranes prone to disruption, endocytosis, and/or direct translocation [73,74]. The Boman index of our peptides ranged from 4.0 to 6.7, which is higher than the range reported previously (∼1.0−3.5) [73].
The cellular localization of each peptide was evaluated using the TMHMM server to estimate the probability of peptide translocation across lipid membranes. The results showed that 90% of the predicted peptides were located in the cell membrane. The membrane-binding potential and cellular localization of CPPs are shown in Figure 4 and Table S2.

Immunogenicity, Allergenicity, and Toxicity
Immunogenicity assessment is a crucial step in the drug development process. The complexity of the immune system demands the use of multiple approaches to predict the immunogenicity of biopharmaceuticals. Experimental approaches (in vitro, in vivo, and ex vivo) are straightforward but can be expensive and time-consuming, and their results need to be confirmed [75]. Immunogenicity was analyzed using the Immune Epitope Database (IEDB), a database of epitopes and immune receptors [76] (Table 3). Higher scores indicated a higher probability of triggering an immune response. The immunogenicity scores of all predicted peptides were lower than 0.7, suggesting that they are unlikely to trigger immune responses [77,78].
Given the risk of inducing an immediate type I (IgE-mediated) allergic response, the allergenic potential of druggable proteins and peptides should be determined before they are marketed. The allergenic potential was evaluated using the AllerTOP web server, which applies auto-cross covariance transformation to a dataset of known allergens to develop alignment-independent models for allergen recognition based on the physicochemical properties of proteins [79]. The tool uses five machine learning methods for protein classification, including partial least squares discriminant analysis, logistic regression, decision tree, naïve Bayes, and k-nearest neighbors. In addition, AllerTOP attempts to identify the most likely route of exposure. AllerTOP outperforms other allergen prediction models, with a sensitivity of 94% [79]. Of the 57 predicted sequences, 35 were classified as non-allergenic (Table 3).
Toxicity was assessed using ToxinPred software [80,81], which uses several datasets to train and test SVM models, including a main dataset (1805 toxin sequences from experimentally validated peptides/proteins as positive examples and 3593 non-toxin sequences) and an alternative independent dataset (303 toxin sequences from SwissProt and 1000 non-toxin sequences from TrEMBL). All identified peptide sequences were classified as non-toxic (data not shown).

Antiviral and Anticancer Potential
The control of viral diseases is challenging because of increased resistance to antiviral drugs and the emergence of new viral pathogens. AVPs, a subset of AMPs, are a potential source of therapeutics useful for preventing and treating viral infections [82]. The ability of AVPs to target various stages of the viral lifecycle, ranging from their attachment to host cells to their ability to impair viral replication within the cells, has been the subject of multiple studies [83][84][85]. Sixteen sequences were predicted to be AVPs, of which four had a score above 90%. NATT1.2_05 and NATT1.2_06 presented the highest scores (0.964 and 0.962, respectively). AVPpred predicts AVPs based on experimentally validated positive and negative datasets.
Cell membrane properties differ between cancer cells and healthy cells [86]. For instance, the membrane fluidity of cancer cells is higher than that of healthy cells [87]. In addition, the membrane of cancer cells has a higher negative charge and a larger surface area due to the higher number of microvilli. ACPs, a subset of AMPs, are toxic to cancer cells [86]. ACPs have 5-30 cationic amino acid residues that adopt an α-helical or β-sheet structure but can assume a linear structure [88,89]. In the present study, 37 peptides were predicted to be ACPs. The physicochemical properties of ACPs determine electrostatic interactions with the anionic cell membrane of cancer cells and thus allow the selective killing of these cells [90]. ACPs have several advantages over small molecule cancer drugs. For instance, the shorter half-life decreases the probability of resistance. Moreover, ACPs have low toxicity, high specificity, high solubility, and good tumor penetration ability, demonstrating their great potential in cancer therapy [88][89][90][91]. The half-life of the predicted ACPs in mammalian cells varied from 1 to 100 h (Table 3). Compared to biologics, peptides have a much shorter circulatory half-life (days vs. weeks), resulting in the need for frequent, sub-optimal drug administration [92].

Prediction of ADMET Properties
The analysis of biochemical processes from drug administration to elimination plays a crucial role in lead optimization. An ideal peptide drug should be quickly absorbed into the systemic circulation and eliminated without affecting pharmacological activity. Further, ideal candidates should be non-toxic. The analysis of ADMET parameters is therefore essential in drug discovery. ADMET properties were predicted using the web server ADMETlab version 2.0 [93]. The parameters analyzed were blood-brain barrier (BBB) penetration, Caco-2 permeability, volume of distribution (VD), plasma protein binding (PPB), human intestinal absorption (HIA), clearance (CL), half-life (T1/2), skin sensitization, AMES toxicity, carcinogenicity, and synthetic accessibility (SA) score (Table 4). All compounds had positive HIA, indicating a high ability to cross the intestinal barrier. Higher BBB penetration is associated with higher lipophilicity and higher uptake. All peptides were predicted to have a high likelihood of being BBB-negative. PPB is an important parameter in drug safety assessments since compounds with high PPB (>90%) have a narrow therapeutic index, whereas compounds with low PPB are considerably safer. All analyzed peptides had low PPB, indicating a good therapeutic index. Caco-2 cells, derived from human colon adenocarcinoma cells, have permeability functions similar to those of intestinal enterocytes and are used to predict intestinal drug absorption in vivo. All analyzed compounds had the best predicted Caco-2 permeability scores (greater than −6.47). Regarding carcinogenicity, none of the analyzed peptides was predicted to be carcinogenic. The AMES toxicity prediction indicated that none of the peptides were genotoxic. The analysis of other toxicity parameters, such as hERG inhibition, hepatotoxicity, and skin sensitization, indicated that all peptides were predicted to be safe. The SA score estimates the ease of synthesis (Table S3). Approximately 38.5% of the peptides had an SA score of up to 6.0, indicating the feasibility of synthesis. Overall, all compounds had good predicted ADMET properties.

Medicinal Chemistry Studies
Small molecules defined as "drug-like" need to satisfy Lipinski's rule of five (Ro5): MW <500 Da, ≤5 H-bond donors, ≤10 H-bond acceptors, and 1-octanol/water partition coefficient (LogP) <5. Molecules that satisfy these criteria are likely to be orally bioavailable. Several studies have demonstrated that the physicochemical and structural properties of peptides are outside the traditional chemical space of approved drugs [94][95][96] based on Ro5 criteria [97]. Medicinal chemistry parameters such as MW, topological polar surface area (tPSA), LogP, fraction of sp3-hybridized carbon atoms (Fsp3), number of rotatable bonds (NRB), number of hydrogen bond acceptors (HBAs), number of hydrogen bond donors (HBDs), and number of aromatic rings (NARs) were evaluated (Table 5). In a previous analysis [96], the peptides with the highest oral availability had an MW of 1200 Da and a LogP of 5-8. Furthermore, these peptides had five times more H-bond donors and acceptors than what was considered acceptable by Ro5 for small molecules [96]. High MW, tPSA, and NRB limit passive transport across cell membranes because of increased molecular size and complexation with water molecules [98,99]. According to Ro5, HBAs and HBDs are relevant factors for cell permeability [100]. Our results agreed with the number of HBAs and HBDs for linear and cyclic pentapeptides and two CPP libraries [44,95]. However, the number of HBAs and HBDs in the predicted peptides differed from those of clinically approved drugs [100]. The NRB and Fsp3 are used to assess molecular flexibility and complexity. The NRBs of the predicted peptides (37 to 117) exceed the maximum value for oral drugs and peptides [95,96]. The Fsp3 correlates with solubility in the aqueous phase and melting point [101]. The Fsp3 of the predicted peptides was 0.45-0.80, similar to values of orally available peptides (90th percentile = 0.79). Lipophilicity was investigated using LogP and NAR. LogP values are positively correlated with lipophilicity and thus membrane penetration. The LogP of the evaluated peptides varied from −7.387 to 0.562, consistent with values for approved peptide drugs and small molecule drugs [96,100]. The addition of an aromatic ring can significantly increase LogP [102]. Our study found that the NAR varied from 2 to 5.
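The Ro5 criteria listed above can be checked mechanically once the descriptors are known. The following is a minimal sketch; the `Descriptors` class and `ro5_violations` function are hypothetical names, and the descriptor values in the example are invented for illustration, not taken from Table 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    mw: float    # molecular weight in Da
    hbd: int     # number of hydrogen bond donors
    hba: int     # number of hydrogen bond acceptors
    logp: float  # 1-octanol/water partition coefficient


def ro5_violations(d: Descriptors) -> List[str]:
    """List the Ro5 criteria (as stated in the text) that a compound fails."""
    violations = []
    if not d.mw < 500:
        violations.append("MW")
    if not d.hbd <= 5:
        violations.append("HBD")
    if not d.hba <= 10:
        violations.append("HBA")
    if not d.logp < 5:
        violations.append("LogP")
    return violations


# Hypothetical peptide-like descriptor values, for illustration only.
example = Descriptors(mw=1200.0, hbd=20, hba=22, logp=-3.0)
print(ro5_violations(example))
```

A peptide-sized molecule typically fails the MW, HBD, and HBA criteria at once, which is why peptides are usually assessed against peptide-specific reference ranges rather than Ro5 alone.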

Prediction of Peptide Structures
After analyzing the physicochemical properties of the peptides, hydrophobicity, hydrophobic moment, GRAVY, Boman index, and ADMET parameters, fifteen BAP sequences of AMPs and CPPs with characteristics considered promising were selected for further studies. Among the 3D structures obtained, it was possible to observe the presence of a random coil, alpha helix, and a peptide sequence (NATT1.2_07) with a beta sheet structure.
The 3D structures were predicted using the PEP-FOLD3 web server. PEP-FOLD models for the peptides NATT1_04, NATT1.2_05, NATT1.2_06, NATT1.2_07, NATT2_06, NATT2_07, NATT2_13, NATT2_14, NATT3_03, NATT3_04, NATT4_01, NATT4_02, NATT4_06, NATT4_15, and NATTP_05 were recognized as the best with the lowest optimized potential for efficient structure prediction (sOPEP) energy (−15.1734 to −1.97158). The models with sOPEP energy of −15.1734 and −14.3622 were considered the best and are presented in Figure 5. Ramachandran plot analysis indicated that these two models had 77.8% and 87.5% of the residues in the most favorable region and 0% and 22.2% of the residues in the favorable region, respectively. In addition, the helical wheel projection of these short peptides was obtained using the Heliquest web server (Figure 5). A hydrophobic face on a helical wheel is characterized by at least five adjacent hydrophobic residues (Leu, Ile, Ala, Val, Pro, Met, Phe, Trp, or Tyr) [103].
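To make the quantities behind the helical wheel concrete, the sketch below computes GRAVY and a helical hydrophobic moment from sequence alone. It assumes an ideal α-helix with 100° between consecutive residues and uses the Kyte-Doolittle hydropathy scale for both quantities. Heliquest may use a different hydrophobicity scale, so the moment values from this sketch are not expected to match its output; the function names are illustrative.

```python
import math

# Kyte-Doolittle hydropathy values (the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment for residues spaced angle_deg apart on a wheel."""
    step = math.radians(angle_deg)
    x = sum(KYTE_DOOLITTLE[aa] * math.cos(i * step) for i, aa in enumerate(seq))
    y = sum(KYTE_DOOLITTLE[aa] * math.sin(i * step) for i, aa in enumerate(seq))
    return math.hypot(x, y) / len(seq)


peptide = "LYVAKNKYGLGKL"  # NATT4_01, sequence as given in the text
print(round(gravy(peptide), 3), round(hydrophobic_moment(peptide), 3))
```

A large moment means hydrophobic residues cluster on one side of the wheel, which is the amphipathic pattern that the arrows in a helical wheel projection summarize.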
The pH-dependent conformational equilibrium of the peptides was predicted using DispHred [104]. Khandogin [105] showed that increasing pH increased the length of the helical segments of C peptide from ribonuclease, where the difference in the relative populations of unfolded states gave rise to the pH-dependent total helix content. Our results showed that at pH 1.5 and 7.0, 75% of the peptides are in the unfolded state, with data indicating the presence of partial helices. The results provided information on the pH-dependent distribution of folded and unfolded states of the peptides. However, further in vitro studies are necessary to corroborate these data.

Figure 5. Selected PEP-FOLD predicted 3D structure homology models, Ramachandran validation plots, and helical wheel projections. (A) NATT4_15 motif, (B) Ramachandran plot for the NATT4_15 motif, (C) NATT4_15 helical wheel projection, (D) NATT4_02 motif, (E) Ramachandran plot for the NATT4_02 motif, and (F) NATT4_02 helical wheel projection. NATT4_15 had nine amino acid residues in the allowed region, whereas NATT4_02 had eight amino acid residues in the favorable region. Neither peptide had residues in the unfavorable region. The graphical representations were rendered using UCSF Chimera [106]. Arrows indicate the direction of the hydrophobic moment (µH).

Study Design
The current study used several in silico approaches to identify and design novel and potent AMPs, and CPPs for use as drug delivery systems. The flowchart of peptide prediction and analysis is illustrated in Figure 6.

Evaluation of the Membrane-Binding Ability of BAPs
The Boman index and protein-binding potential were evaluated using APD3 (http://aps.unmc.edu/AP/prediction/prediction_main.php, accessed on 5 February 2022). The Boman index is the sum of the solubility values of all amino acids in a peptide sequence, normalized by the number of residues, and indicates the ability to bind to the cell membrane or other proteins [72]. The cellular localization of BAPs was assessed using the TMHMM web server (http://www.cbs.dtu.dk/services/TMHMM, accessed on 8 February 2022). TMHMM analyzes the probability of a peptide binding to negatively charged bacterial cell membranes.
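The Boman index calculation can be sketched as follows. This is a minimal illustration, not the APD3 code: it assumes the commonly used form in which the summed side-chain solubility values are divided by the peptide length and the sign is reversed. The per-residue values (kcal/mol) are those commonly tabulated for this index and should be checked against the primary source before use; `boman_index` and `SOLUBILITY` are hypothetical names.

```python
# Assumed side-chain transfer free energies (kcal/mol), as commonly tabulated
# for the Boman index. Verify against the primary reference before relying on them.
SOLUBILITY = {
    "L": 4.92, "I": 4.92, "V": 4.04, "F": 2.98, "M": 2.35,
    "W": 2.33, "A": 1.81, "C": 1.28, "G": 0.94, "Y": -0.14,
    "T": -2.57, "S": -3.40, "H": -4.66, "Q": -5.54, "K": -5.55,
    "N": -6.64, "E": -6.81, "D": -8.72, "R": -14.92, "P": 0.0,
}


def boman_index(seq, scale=SOLUBILITY):
    """Sign-reversed mean of per-residue solubility values (kcal/mol)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(scale)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return -sum(scale[aa] for aa in seq) / len(seq)
```

Passing a different `scale` dictionary allows the same function to be reused if another solubility table is preferred.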

Prediction of ADMET and Medicinal Chemistry Parameters
The Simplified Molecular Input Line Entry System (SMILES) structural format of 58 peptides was obtained using PepSMI (https://www.novoprolabs.com/tools/convert-peptide-to-smiles-string, accessed on 2 April 2022). PepSMI converts raw sequences into a text string that unambiguously describes each atom and molecular bond in a manner amenable to machine processing. ADMET parameters, including human intestinal absorption (HIA), mutagenicity, carcinogenicity, central nervous system penetration, drug-induced liver injury (DILI), cytochrome P450 enzyme inhibition, clearance, half-life, and skin sensitization, were assessed using version 420 (released in July 2021) of the ADMETlab 2.0 platform (https://admetmesh.scbdd.com/, accessed on 4 April 2022), which is built on a comprehensive database of 0.25 million entries from PubChem, Online Chemical Modeling Environment (OCHEM), DrugBank, ChEMBL, Toxicity Estimation Software Tools (developed by the U.S. Environmental Protection Agency), and peer-reviewed literature [93]. Pan-assay interference compounds (PAINS) and undesirable reactive compounds were analyzed using the PAINS and Pfizer rules [113]. The ADMETlab 2.0 platform predicts the pharmacokinetic parameters based on basic information and experimental values of the respective entries.
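The sequence-to-SMILES conversion can be illustrated with a minimal sketch. It is not the PepSMI algorithm: it assumes a linear, unmodified peptide in neutral form, omits stereochemistry, and simply concatenates per-residue backbone fragments before closing the C-terminus. `RESIDUE_SMILES` and `peptide_to_smiles` are hypothetical names.

```python
# Per-residue fragments written as N-C(alpha)(side chain)-C(=O), without stereo marks.
# Concatenating fragments forms the amide bonds; a final "O" closes the C-terminal acid.
RESIDUE_SMILES = {
    "G": "NCC(=O)",
    "A": "NC(C)C(=O)",
    "V": "NC(C(C)C)C(=O)",
    "L": "NC(CC(C)C)C(=O)",
    "I": "NC(C(C)CC)C(=O)",
    "S": "NC(CO)C(=O)",
    "T": "NC(C(C)O)C(=O)",
    "C": "NC(CS)C(=O)",
    "M": "NC(CCSC)C(=O)",
    "F": "NC(Cc1ccccc1)C(=O)",
    "Y": "NC(Cc1ccc(O)cc1)C(=O)",
    "W": "NC(Cc1c[nH]c2ccccc12)C(=O)",
    "H": "NC(Cc1cnc[nH]1)C(=O)",
    "K": "NC(CCCCN)C(=O)",
    "R": "NC(CCCNC(=N)N)C(=O)",
    "D": "NC(CC(=O)O)C(=O)",
    "E": "NC(CCC(=O)O)C(=O)",
    "N": "NC(CC(N)=O)C(=O)",
    "Q": "NC(CCC(N)=O)C(=O)",
    "P": "N1CCCC1C(=O)",
}


def peptide_to_smiles(seq):
    """Build a non-stereo SMILES string for a linear peptide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        body = "".join(RESIDUE_SMILES[aa] for aa in seq)
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return body + "O"  # C-terminal hydroxyl
```

Because stereochemistry is dropped, the output describes connectivity only; a production tool would also encode the L-configuration of each residue.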

Prediction of Peptide Structure
The three-dimensional (3D) structures of predicted BAPs were analyzed using PEP-FOLD3 (https://bioserv.rpbs.univ-paris-diderot.fr/services/PEP-FOLD3/, accessed on 3 April 2022), which predicts peptide structures de novo based on primary amino acid sequences. Peptides are described as a series of fragments of four amino acids, overlapping by three, and each fragment is associated with a geometric descriptor [114]. The quality of the best models was assessed. Peptide structures were validated using PROCHECK to measure the stereochemical properties of the modeled peptide motifs [115]. Furthermore, the helical wheel diagram of peptides was defined by Schiffer-Edmundson wheel modeling using Heliquest (https://heliquest.ipmc.cnrs.fr/cgi-bin/ComputParams.py, accessed on 5 April 2022) [103]. pH-dependent folded and unfolded states were predicted using SVM-based DispHred (https://ppmclab.pythonanywhere.com/DispHred, accessed on 5 August 2022) [104].
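The fragment description used by PEP-FOLD3 (four residues, overlapping by three) amounts to a sliding window over the sequence. The sketch below shows only this decomposition step, under the assumption that fragments are taken left to right with a step of one residue; it does not assign the geometric descriptors, and `overlapping_fragments` is a hypothetical name.

```python
def overlapping_fragments(sequence, size=4, overlap=3):
    """Split a sequence into fixed-size fragments that overlap by `overlap` residues."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    # Sequences shorter than `size` yield no fragments.
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]
```

With the default settings, a peptide of n residues (n ≥ 4) yields n − 3 fragments, so each internal residue is covered by up to four fragments.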

Conclusions
Fifty-seven novel and potent AMPs and CPPs were predicted in silico from natterin toxins. Moreover, we predicted novel peptides with high membrane-binding indices and localization inside cells. These peptide sequences can be further evaluated for antimicrobial, cell penetration, and anticancer activity in vitro and in vivo. Generally, the predicted and engineered toxin-derived AMPs and CPPs with different properties can be applied to the delivery of different cargoes and to drug development. Overall, the present study showed that using machine learning tools in peptide research can streamline the development of targeted peptide therapies.