Considerations on the Rational Design of Covalently Conjugated Cell-Penetrating Peptides (CPPs) for Intracellular Delivery of Proteins: A Guide to CPP Selection Using Glucarpidase as the Model Cargo Molecule

Access of proteins to their intracellular targets is limited by a hydrophobic barrier, the cell membrane. Conjugation with cell-penetrating peptides (CPPs) has been shown to improve protein transduction into cells. This conjugation can be either covalent or non-covalent, each with its own advantages and disadvantages. Covalent CPP-protein conjugation may result in undesirable structural and functional alterations in the target protein. Therefore, we propose a systematic approach to evaluate different CPPs for covalent conjugation. This guide is presented using the carboxypeptidase G2 (CPG2) enzyme as the target protein. Seventy CPPs (out of 1155) with the highest probability of uptake efficiency were selected. These peptides were then conjugated to the N- or C-terminus of CPG2. The translation efficiency of the conjugates, the robustness and thermodynamic properties of the chimeras, aggregation propensity, folding rate, backbone flexibility, and aspects of in vivo administration such as protease susceptibility were predicted. The effect of the position of conjugation was evaluated using an unpaired t-test (p < 0.05). N-terminal conjugation was found to yield higher-quality constructs. Seventeen CPP-CPG2/CPG2-CPP constructs were identified as the most promising. The bioinformatics workflow presented here may be universally applied to any CPP-protein conjugate design.


Introduction
Biotherapeutics, such as protein-based therapeutics, are a fast-growing group of pharmaceuticals [1]. Although biotherapeutics have the advantage of target specificity, one of the major challenges to the further development of protein-based therapeutics is their intracellular delivery. Large hydrophilic protein macromolecules cannot cross the hydrophobic lipid bilayer surrounding a cell [2]. This obstacle, however, can be overcome with the help of cell-penetrating peptides (CPPs).
CPPs are short, relatively non-toxic peptide sequences, usually fewer than 30 amino acids long, that not only cross cellular membranes but can also co-transport a variety of biologically active molecules (cargoes) into cells [1,2]. Since their discovery in 1988, a wide range of CPPs has been identified and evaluated in clinical trials [3,4]. Chemically, CPPs are

Table 1. Results of CPPred-RF server analysis for 70 cell-penetrating peptides (CPPs) that had high uptake efficiency with prediction confidence of above 0.9.

Peptide No.  CPP Name  Amino Acid Sequence  Cell-Penetrating or Not  Prediction Confidence  Uptake Efficiency  Prediction Confidence
39           No.14-25  RLFMRFYSPTTRRYG      Cell-penetrating         0.95                   High               0.93
40           No.14-17  RLWMRWASPTTRRYG      Cell-penetrating         0.99                   High               0.96
41           No.14-18  RLWMRWYAPTTRRYG      Cell-penetrating         0.98                   High               0.98
42           No.14-20  RLWMRWYSPATRRYG      Cell-penetrating         0.99                   High               1.00
43           No.14-21  RLWMRWYSPTARRYG      Cell-penetrating         0.99                   High               1.00
44           No.14-35  RLWMRWYSPTTRRYA      Cell-penetrating         0.98                   High               0.98
45           No. 14

In this study, we describe the key factors affecting recombinant CPP-CPG2 conjugates and introduce a guideline for the design of CPP-protein conjugates. This guideline provides a workflow, based on available bioinformatics tools, for designing effective biotherapeutics with optimal pharmacological responses and minimal side effects. We expect it to be useful to scientists from different disciplines working on protein engineering and delivery.

Primary Dataset and Penetration Prediction of CPPs
Unique linear CPPs consisting of natural L-amino acids were retrieved from CPPsite 2.0. In general, CPPs with the highest uptake efficiency are preferred for conjugation. Hence, the CPP sequences were submitted to the CPPred-RF server to predict their degree of cellular uptake (Table S1). CPPred-RF is a two-layer prediction engine. The first layer uses features such as physicochemical properties and dipeptide composition to determine whether the submitted sequence is cell-penetrating. If it is, the second layer predicts whether its cellular uptake is high or low. A prediction confidence is reported for each layer. Finally, the 70 peptides predicted to have the highest uptake efficiency with a prediction confidence above 0.9 were subjected to further analyses (Table 1). We checked whether these 70 top-ranked CPPs had shown adequate uptake efficiency in experimental studies; these experimental data, together with the category of each CPP, are available in Table 2. Experimental uptake levels were available for 66 of the 70 CPPs, and, as presented in Table 2, all of the CPPs with available data displayed either high or medium uptake efficiency.
Table 2. Experimental data on the uptake efficiency of the top 70 CPPs.
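The two-layer selection described above can be sketched as a filter over parsed prediction records. This is a minimal sketch, assuming the CPPred-RF output has already been converted into records; the `Prediction` fields and the peptide names and scores in the example are hypothetical placeholders, not the server's actual output format or results. Applying the 0.9 cutoff to both layers' confidences is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    name: str
    is_cpp: bool        # layer 1: predicted cell-penetrating or not
    cpp_conf: float     # layer 1 prediction confidence
    uptake: str         # layer 2: "High" or "Low"
    uptake_conf: float  # layer 2 prediction confidence


def select_candidates(preds, cutoff=0.9):
    """Keep predicted CPPs with high uptake whose two confidences are >= cutoff."""
    selected = [
        p for p in preds
        if p.is_cpp and p.uptake == "High"
        and p.cpp_conf >= cutoff and p.uptake_conf >= cutoff
    ]
    # Rank by layer-2 confidence, then layer-1 confidence, highest first
    return sorted(selected, key=lambda p: (p.uptake_conf, p.cpp_conf), reverse=True)


if __name__ == "__main__":
    demo = [  # hypothetical records
        Prediction("pep_A", True, 0.99, "High", 1.00),
        Prediction("pep_B", True, 0.95, "High", 0.85),  # layer-2 confidence too low
        Prediction("pep_C", True, 0.97, "Low", 0.96),   # predicted low uptake
        Prediction("pep_D", False, 0.40, "Low", 0.00),  # not cell-penetrating
        Prediction("pep_E", True, 0.92, "High", 0.93),
    ]
    print([p.name for p in select_candidates(demo)])
```

Ranking after filtering makes it easy to take the top N candidates when more peptides pass the cutoff than can be analyzed downstream.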

mRNA Secondary Structure Prediction of cpp-cpg2/cpg2-cpp Conjugates
The stability of the mRNA secondary structure affects protein expression levels. A more positive (less negative) ∆G value of the mRNA secondary structure in the translation initiation region correlates with a higher probability that the mRNA will be translated into protein [49]. The ∆G values obtained from the mfold server indicated that 74% of the N-terminal conjugates had higher ∆G values than the unconjugated CPG2. The N-terminal conjugates of peptides 10, 18, 31, 36, and 61 displayed the highest ∆G values. Besides ∆G, the accessibility of the AUG start codon in the mRNA secondary structure regulates the rate of protein translation in E. coli. A start codon located in a loop is more accessible to the ribosomal subunit and results in a higher level of translation than a start codon located in a stem [50]. In this study, the start codon was located in a loop in 50% of the N-terminal conjugates, whereas in the remaining conjugates, as well as in CPG2 itself, it was located in a stem. The secondary structures of two conjugates and of CPG2 are shown in Figure 2. The N-terminal conjugate "R9-CPG2" has a high ∆G value (−1.6 kcal/mol) and an exposed start codon, whereas the N-terminal conjugate "Transportan 10-CPG2" has a low ∆G value (−13.2 kcal/mol) and an unexposed start codon. ∆G values and start codon positions for all conjugates are available in Table S2. Note that all seventy C-terminal conjugates have the same translation initiation region as the unconjugated control CPG2.
Figure 2. Secondary structures of the mRNA translation initiation region and location of the start codon (indicated by arrows) for two conjugates and CPG2. (a) "R9-CPG2 + pET-14b" has a higher ∆G value than CPG2, and the start codon is exposed. (b) "Transportan 10-CPG2 + pET-14b" has a lower ∆G value than CPG2, and the start codon is not exposed. (c) Control CPG2 + pET-14b: the start codon is not exposed.
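The loop-versus-stem criterion and the ∆G comparison can both be checked programmatically once a predicted structure is available. The sketch below assumes the structure has been exported in dot-bracket notation (for example, converted from an mfold .ct file) and treats the start codon as exposed only if all three of its bases are unpaired; that rule is an assumption, and partially paired codons would need a separate decision. The sequences, structures, and ∆G values in the example are hypothetical.

```python
def start_codon_exposed(seq: str, structure: str, start: int) -> bool:
    """True if the AUG at 0-based index `start` lies entirely in unpaired ('.') positions."""
    if len(seq) != len(structure):
        raise ValueError("sequence and dot-bracket structure must have equal length")
    codon = seq[start:start + 3].upper().replace("T", "U")
    if codon != "AUG":
        raise ValueError(f"expected AUG at index {start}, found {codon!r}")
    return all(ch == "." for ch in structure[start:start + 3])


def fraction_above_reference(dg_by_construct: dict, reference_dg: float) -> float:
    """Fraction of constructs whose folding dG (kcal/mol) is higher, i.e. less negative, than the reference."""
    higher = sum(1 for dg in dg_by_construct.values() if dg > reference_dg)
    return higher / len(dg_by_construct)


if __name__ == "__main__":
    # Hypothetical window: a hairpin followed by an unpaired AUG
    print(start_codon_exposed("GGGAAACCCAUGAAA", "(((...)))......", 9))
    # Hypothetical hairpin in which the AUG is base-paired with CAU
    print(start_codon_exposed("AUGAAACAU", "(((...)))", 0))
    print(fraction_above_reference({"c1": -1.6, "c2": -13.2, "c3": -5.0}, -9.0))
```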

Physicochemical Properties of CPPs and CPP-CPG2/CPG2-CPP Conjugates
Various physicochemical properties were calculated for CPG2 and the top 70 CPPs, both alone (Table S3) and conjugated to CPG2 at the N-terminus (Table 4) or the C-terminus (Table S4). The pI values calculated for most CPPs (Table S3) indicated an overall positive charge at blood pH (7.4). The pI of unconjugated CPG2 was calculated to be 6.22. Only the fusion proteins containing peptides 12, 66, 67, 68, and 69 had an acidic pI, owing to the anionic nature of these CPPs (Table 4); the remaining conjugates would carry an overall positive charge at pH 7.4. None of the conjugates had a pI near 7.4, indicating a low risk of pI-related protein aggregation after administration. The instability index of all conjugates was below 40, indicating that they are probably stable in vitro. However, only the conjugates containing peptides 1, 3, 12, 14, 55, and 67 did not have a higher instability index than the unconjugated CPG2. The GRAVY values of all conjugates were negative, meaning that all of them are hydrophilic. Compared with CPG2, the conjugates containing peptides 1, 12, 14, 28, 29, 30, 32, 33, 66, 67, 68, and 69 had higher GRAVY values, indicating greater hydrophobicity, which might in turn increase the chance of aggregation. All conjugates containing "CPP5s" and "pVEC mutants" showed an increase in GRAVY value compared with CPG2. There was no significant difference between the physicochemical properties of the N- and C-terminal conjugates.
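GRAVY and net charge depend only on amino acid composition, so on their own they cannot distinguish an N-terminal fusion from a C-terminal one. The sketch below computes both: GRAVY from the Kyte-Doolittle hydropathy scale, and the net charge at pH 7.4 from Henderson-Hasselbalch terms. The pKa set is an illustrative, EMBOSS-like assumption; ProtParam-style tools use other pKa values, so absolute charges will differ somewhat. This is not the code behind Table 4, and the protein fragment in the example is hypothetical.

```python
# Kyte-Doolittle hydropathy index
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative pKa values (assumption; other tools use different sets)
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float = 7.4) -> float:
    """Approximate net charge as the sum of fractional charges of ionizable groups."""
    seq = seq.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))  # free N-terminal amine
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # free C-terminal carboxyl
    for aa in seq:
        if aa in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def fuse(cpp: str, protein: str, terminus: str = "N") -> str:
    """Return the fused sequence with the CPP at the N- or C-terminus."""
    return cpp + protein if terminus == "N" else protein + cpp


if __name__ == "__main__":
    r9, fragment = "RRRRRRRRR", "MKTAYIAKQR"  # R9 and a hypothetical fragment
    for term in ("N", "C"):
        s = fuse(r9, fragment, term)
        print(term, round(gravy(s), 3), round(net_charge(s), 2))
```

Properties that depend on residue order, such as the dipeptide-based instability index, can differ slightly between N- and C-terminal fusions because the junction creates different dipeptides.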

The Solubility of CPP-CPG2 and CPG2-CPP Conjugates
The solubility of CPG2 conjugated to each of the top 70 CPPs, at either the N- or the C-terminus, was predicted using the ccSOL server (Table 5). Solubility is a crucial factor in the production, formulation, and delivery of protein-based therapeutics, and it is influenced by both extrinsic and intrinsic factors. Protein solubility can be increased by optimizing extrinsic factors such as the pH, ionic strength, and temperature of the solvent and the presence of various additives. The intrinsic factors are mostly related to the amino acids on the protein's surface [51]. CPPs have been used to increase the aqueous solubility of their cargo; for instance, conjugation of taxol to CPPs improved taxol's poor solubility [52,53]. ccSOL predicts protein solubility as a percentage. On average, conjugates containing "CPP5s" and some "Crot derivatives" had higher solubility scores. The higher solubility of the "CPP5" family might be due to the increased negative charge on the protein's surface associated with the anionic nature of these CPPs [51]. However, because different solubility prediction servers give differing results, experimental validation is required before a definite conclusion can be reached. The N- and C-terminal CPP-CPG2 and CPG2-CPP conjugates had average solubilities of 79.69% and 79.44%, respectively; the position of conjugation therefore had no significant effect on solubility.
Table 5. Solubility prediction of CPG2 and the top 70 CPPs conjugated to both the N- and C-terminus of CPG2 using the ccSOL server.


Three-Dimensional Modeling of CPP-CPG2 and CPG2-CPP Conjugates
CPPs might influence the structure and function of a covalently conjugated protein [54]. To deliver CPG2, the CPP segment of the conjugate should be exposed so that it can interact with functional groups on the plasma membrane. The CPP segment should not interact with the active site of the enzyme, so that the function of the target protein remains unchanged. The conjugate should also retain reasonable stereochemical characteristics and few steric clashes, which facilitate folding. The I-TASSER program was used to generate PDB models. For each conjugate, the model with the highest C-score was selected and further analyzed using a Ramachandran plot. In all models, the CPP domains were exposed, and there were no interactions between the CPPs and the active site of the CPG2 enzyme.
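Whether a CPP segment interacts with the active site can be screened with a simple distance criterion on the model coordinates. This minimal sketch assumes the atomic coordinates of the CPP residues and of the active-site residues have already been extracted from the I-TASSER PDB model (parsing is not shown), and uses a 4.0 Å contact cutoff, which is an assumed, commonly used value rather than one stated in the text. The coordinates in the example are hypothetical.

```python
import math
from typing import Iterable, Sequence

Coord = Sequence[float]


def min_distance(group_a: Iterable[Coord], group_b: Iterable[Coord]) -> float:
    """Smallest Euclidean distance (angstroms) between any atom of group_a and any atom of group_b."""
    group_b = list(group_b)
    return min(math.dist(a, b) for a in group_a for b in group_b)


def cpp_touches_active_site(cpp_atoms, site_atoms, cutoff: float = 4.0) -> bool:
    """True if any CPP atom lies within `cutoff` angstroms of any active-site atom."""
    return min_distance(cpp_atoms, site_atoms) < cutoff


if __name__ == "__main__":
    cpp_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # hypothetical CPP atoms
    site_atoms = [(10.0, 0.0, 0.0)]                 # hypothetical active-site atom
    print(min_distance(cpp_atoms, site_atoms), cpp_touches_active_site(cpp_atoms, site_atoms))
```

A brute-force pairwise search is adequate for a short CPP against a handful of active-site residues. For whole-protein contact maps, a spatial index would be the usual choice.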

Thermodynamic Characteristics of CPP-CPG2 and CPG2-CPP Conjugates
Thermodynamic properties play an important role in the development of stable biotherapeutics [57]. The SCOOP server calculates thermodynamic quantities associated with the folding transition from the unfolded to the native state, based on a protein's 3D structure and its host organism (Table 6, Table S6). The calculated parameters are related through the Gibbs-Helmholtz equation. According to Pucci et al. [58], three main strategies result in a more thermodynamically stable protein. First, a more negative folding enthalpy change (∆H_S) at the temperature of maximum stability (T_S) lowers ∆G at all temperatures. Second, a less negative heat capacity change upon folding (∆C_p) increases the melting temperature (T_m). Third, T_S, defined as the temperature at the minimum of the ∆G(T) curve, increases. Accordingly, the most stable conjugate at room temperature can be identified by comparing folding free energies at room temperature (∆G_r), while the stability of the conjugates at higher temperatures is reflected by their T_m. Based on the calculated values, the conjugates with ∆G_r and T_m values higher than or equal to those of CPG2 were the N-terminal conjugates of peptides 16, 21, 22, 36, 44, and 45 and the C-terminal conjugates of peptides 16 and 63 (Table 6 and Table S6). The position of conjugation did not significantly affect any of the thermodynamic values calculated by SCOOP.
The stability of the conjugates was also evaluated using the FoldX Suite server (Table 7 and Table S7). According to Rahmatabadi et al. [55], among the energy terms calculated by FoldX Suite, four (total free energy, side-chain hydrogen bonds, polar solvation, and van der Waals clashes) correlate significantly with the number of amino acids in the most favored region of the Ramachandran plot and therefore reflect the stability of a protein.
Among the conjugates, the N-terminal conjugates of peptides 1, 19, 20, 21, 37, and 64 and the C-terminal conjugates of peptides 9, 32, 44, 47, 55, 56, 61, and 69 had the most negative side-chain hydrogen bond energies. Only the N-terminal conjugate "Bip 6-CPG2" had more favorable energies than the unconjugated CPG2 in all four categories. Furthermore, the position of conjugation did not significantly affect the calculated energies of the CPP and CPG2 conjugates.
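The relationships between the SCOOP quantities described above can be made concrete. Assuming a temperature-independent ∆C_p and using the fact that the folding entropy is zero at T_S, the Gibbs-Helmholtz equation gives ∆G(T) = ∆H_S + ∆C_p[(T − T_S) − T ln(T/T_S)]. The sketch below evaluates this expression at room temperature (taken here as 298.15 K, an assumption) to obtain ∆G_r, and finds T_m as the temperature above T_S at which ∆G(T) = 0, using bisection. It illustrates the relationships rather than reproducing SCOOP's implementation, and the parameter values (kJ/mol, K) are hypothetical.

```python
import math


def delta_g(t: float, dh_s: float, t_s: float, dcp: float) -> float:
    """Folding free energy at temperature t (K) from the Gibbs-Helmholtz equation.

    dh_s: folding enthalpy at T_S (kJ/mol, negative for a stable fold)
    t_s:  temperature of maximum stability (K), where the folding entropy is zero
    dcp:  folding heat capacity change (kJ/(mol*K)), negative and assumed constant
    """
    return dh_s + dcp * ((t - t_s) - t * math.log(t / t_s))


def melting_temperature(dh_s: float, t_s: float, dcp: float,
                        t_max: float = None, tol: float = 1e-6) -> float:
    """Find T_m > T_S where delta_g changes sign from negative to positive, by bisection."""
    lo = t_s
    hi = t_max if t_max is not None else t_s + 300.0
    if delta_g(lo, dh_s, t_s, dcp) >= 0 or delta_g(hi, dh_s, t_s, dcp) <= 0:
        raise ValueError("no sign change in [T_S, t_max]; check the parameters")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delta_g(mid, dh_s, t_s, dcp) < 0:
            lo = mid  # still folded at mid, so T_m lies above
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    params = dict(dh_s=-50.0, t_s=290.0, dcp=-10.0)  # hypothetical values
    print(f"dG_r at 298.15 K: {delta_g(298.15, **params):.2f} kJ/mol")
    print(f"T_m: {melting_temperature(**params):.1f} K")
```

With these hypothetical parameters, making ∆H_S more negative or ∆C_p less negative both raise T_m, which matches the first two strategies described by Pucci et al.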

Prediction of the Aggregation Possibility of CPP-CPG2 and CPG2-CPP Conjugates
It has been demonstrated that, depending on their structure and surrounding conditions, proteins can form insoluble but stable assemblies such as amyloid fibrils or amorphous aggregates [59]. Protein aggregation is a common problem both in vitro and in vivo, and aggregation after delivery impairs protein release and activity [60,61]. Several computational strategies have therefore been developed to predict the propensity of proteins to form amyloids from their amino acid sequence [62][63][64]. Here, we used two servers, Aggrescan and PASTA 2.0, to investigate whether the addition of a CPP sequence to CPG2 affects the probability of protein aggregation. Aggrescan identifies aggregation hot spots in a protein based on its amino acid composition, whereas PASTA 2.0 evaluates the likelihood of amyloid formation by considering pairwise interactions within β-sheets. According to PASTA 2.0, the number of amyloid regions did not differ between the conjugates and unconjugated CPG2. However, according to Aggrescan, 19 N-terminal and 15 C-terminal conjugates had 17 aggregation hot spots, whereas CPG2 itself and the remaining conjugates had 16. These 19 N-terminal and 15 C-terminal conjugates might therefore have a higher risk of aggregation than non-fused CPG2 (Table S8).
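Comparing hot-spot counts between CPG2 and a fusion amounts to counting qualifying runs in a per-residue aggregation profile. The sketch below assumes that a hot spot is a run of at least five consecutive residues scoring above a threshold, with prolines breaking a run; this rule is an assumption modeled on Aggrescan's hot-spot definition and should be checked against the server documentation. The per-residue scores and the threshold would come from the server output, and the values in the example are hypothetical.

```python
from typing import Sequence


def count_hotspots(scores: Sequence[float], sequence: str,
                   threshold: float, min_len: int = 5) -> int:
    """Count runs of >= min_len consecutive non-proline residues scoring above threshold."""
    if len(scores) != len(sequence):
        raise ValueError("exactly one score per residue is required")
    hotspots = 0
    run = 0
    for score, residue in zip(scores, sequence.upper()):
        if score > threshold and residue != "P":
            run += 1
            continue
        if run >= min_len:  # a qualifying run has just ended
            hotspots += 1
        run = 0
    if run >= min_len:  # a qualifying run that reaches the C-terminus
        hotspots += 1
    return hotspots


if __name__ == "__main__":
    profile = [1.0] * 5 + [-1.0] + [1.0] * 4 + [-1.0] + [1.0] * 6  # hypothetical scores
    print(count_hotspots(profile, "A" * len(profile), threshold=0.0))
```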

Folding Rate and Backbone Flexibility of CPP-CPG2 and CPG2-CPP Conjugates
A protein chain must fold into its native conformation to be functional. Therefore, the conjugates should have folding rates close to that of unconjugated CPG2; that is, the attached CPP sequence should cause minimal disturbance to the folding of the chimera. Although the conjugates showed longer half-folding times than CPG2, the N-terminal conjugates of peptides 8, 10, 18, 34, 36, 59, 66, 68, and 69 and the C-terminal conjugates of peptides 12, 59, 60, 61, and 64 had half-folding times closest to that of CPG2 itself (Table 8). Furthermore, the N-terminal CPP-CPG2 conjugates had significantly shorter calculated half-folding times than the C-terminal conjugates (Figure 5).
Proteins are chains of amino acids that fold, through non-covalent interactions, into naturally flexible tertiary structures. The degree of flexibility is associated with a protein's function and is crucial in protein engineering and rational drug design [65]. Ligand-binding sites in enzymes usually contain both flexible and rigid residues. Rigid residues are associated with the specificity and tightness of ligand binding, whereas flexibility facilitates the entry of ligands into the binding site and can also be involved in communication between allosteric and orthosteric binding sites [66,67]. Because an enzyme typically undergoes a conformational change upon ligand binding, changes in its flexibility might disrupt its function. Using the DynaMine server, each conjugate was evaluated to assess whether the addition of a CPP sequence to CPG2 affects the flexibility of the zinc-binding residues (His89, Asp119, Glu154, Glu178, and His363). Such interference could impair the binding of CPG2 to the zinc ions that are essential for the detoxification of MTX. No changes in the flexibility scores of His89, Asp119, Glu154, Glu178, or His363 were observed in any of the conjugates.
It can be concluded that the addition of CPPs did not interfere with the CPG2 ligand-binding site.
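To compare per-residue flexibility between CPG2 and a fusion, residue numbers have to be mapped correctly. An N-terminal CPP shifts every CPG2 residue by the CPP length, whereas a C-terminal CPP leaves the numbering unchanged. The sketch below assumes that the zinc-binding residue numbers given above refer to 1-based positions in the CPG2 sequence that was fused; if they follow a different numbering, such as that of the UniProt precursor, an additional offset is needed. It also assumes one DynaMine score per residue and a hypothetical change tolerance of 0.001.

```python
ZINC_BINDING = [89, 119, 154, 178, 363]  # His89, Asp119, Glu154, Glu178, His363 (CPG2 numbering)


def to_fusion_position(pos: int, cpp_len: int, terminus: str) -> int:
    """Map a 1-based CPG2 residue number onto the numbering of the fused construct."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    return pos + cpp_len if terminus == "N" else pos


def flexibility_shifts(cpg2_scores, fusion_scores, positions, cpp_len, terminus, tol=1e-3):
    """Return {cpg2_position: score_change} for residues whose score changes by more than tol."""
    shifts = {}
    for pos in positions:
        fused_pos = to_fusion_position(pos, cpp_len, terminus)
        delta = fusion_scores[fused_pos - 1] - cpg2_scores[pos - 1]
        if abs(delta) > tol:
            shifts[pos] = delta
    return shifts


if __name__ == "__main__":
    cpg2 = [0.80] * 400                # hypothetical per-residue scores for CPG2
    fused = [0.30] * 9 + cpg2          # hypothetical N-terminal fusion with a 9-residue CPP
    print(flexibility_shifts(cpg2, fused, ZINC_BINDING, cpp_len=9, terminus="N"))
```

Forgetting the N-terminal offset would compare each zinc-binding residue with an unrelated residue of the fusion and could report spurious changes.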

Further Analyses for In Vivo Applications
In vivo administration of biotherapeutics comes with a new set of challenges. The objective of this section was to select the conjugates best suited for in vivo application (Table 9).

Analyses of CPP-CPG2 and CPG2-CPP Conjugates
Some proteins can trigger allergic responses ranging from mild reactions to fatal anaphylactic shock. Hence, the potential allergenicity of a biotherapeutic should be explored. Hypersensitivity reactions have been reported in fewer than 1% of patients receiving glucarpidase [27]. It should therefore be investigated whether the addition of a CPP sequence affects the probability of allergic reactions. The AllergenFP v1.0 server classified CPG2 and all 140 CPP-CPG2/CPG2-CPP conjugates as probable non-allergens; thus, the addition of these CPPs to the CPG2 sequence is not expected to increase the enzyme's allergenicity considerably.
Another important characteristic of a therapeutic is its immunogenicity. If the patient's immune system recognizes a biotherapeutic as a threat, the antibodies produced reduce the effectiveness of the medication upon repeated use. Although glucarpidase does not cause hypersensitivity in most patients, anti-glucarpidase antibodies have been reported to develop in 17% of patients receiving one or two doses [27]. In this regard, the fused CPP sequence should preferably not increase the antigenicity of CPG2 and, if possible, should reduce its immunogenicity. Assessment of CPG2 and of the top 70 CPPs conjugated at the N- and C-terminal positions showed that CPG2 and all conjugates had immunogenicity scores above 0.4 and might therefore act as antigens. The addition of a CPP decreased the immunogenicity score of most conjugates: about 66% of the N-terminal and 60% of the C-terminal conjugates had lower immunogenicity scores than CPG2. The N-terminal conjugates of peptides 1, 14, 47, 48, and 49 and the C-terminal conjugates of peptides 14, 19, 29, 48, and 70 had the lowest immunogenicity scores. Furthermore, the position of conjugation did not significantly affect the immunogenicity of the CPP and CPG2 conjugates.
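The comparisons above reduce to thresholding and ranking scores. In this sketch, VaxiJen scores are assumed to be available as a dictionary keyed by construct name. The 0.4 threshold is the one stated above, and a score strictly above it is treated as a probable antigen. The construct names and scores in the example are hypothetical.

```python
VAXIJEN_THRESHOLD = 0.4  # threshold stated in the text


def antigenicity_summary(scores: dict, reference: float,
                         threshold: float = VAXIJEN_THRESHOLD, k: int = 5) -> dict:
    """Summarize antigenicity scores against a threshold and a reference protein's score."""
    probable = sorted(name for name, s in scores.items() if s > threshold)
    below_reference = [name for name, s in scores.items() if s < reference]
    lowest = sorted(scores, key=scores.get)[:k]  # least immunogenic constructs first
    return {
        "probable_antigens": probable,
        "fraction_below_reference": len(below_reference) / len(scores),
        "lowest_k": lowest,
    }


if __name__ == "__main__":
    demo = {"n1": 0.45, "n2": 0.52, "n3": 0.41, "n4": 0.60}  # hypothetical scores
    print(antigenicity_summary(demo, reference=0.5, k=2))
```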
A further concern for protein therapeutics is their short biological half-life. Proteins are usually rapidly degraded and cleared, either by renal filtration or by hepatic metabolism. In the case of CPP-CPG2 conjugates, intracellular proteasomal degradation might further increase the required frequency of administration. This underlines the importance of selecting conjugates with optimal predicted half-lives. ProtLifePred predicted that all C-terminal conjugates have half-lives comparable to that of unconjugated CPG2, while 31% of the N-terminal conjugates had longer half-lives than the unconjugated control. The remaining conjugates had half-lives shorter than or equal to that of unconjugated CPG2 (Table S9).
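ProtLifePred's exact model is not reproduced here. The sketch below only illustrates why a prediction based on the N-end rule cannot distinguish a C-terminal fusion from the parent protein: it inspects only the mature N-terminal residue. It assumes a simplified bacterial rule (Phe, Leu, Trp, and Tyr as primary destabilizing residues; Arg and Lys as secondary destabilizing residues; all others stabilizing) and assumes that methionine aminopeptidase removes the initiator Met only when the second residue is small (Gly, Ala, Ser, Cys, Thr, Pro, or Val). Expression tags and other processing steps are ignored, and the protein fragment is hypothetical.

```python
SMALL_SECOND_RESIDUES = frozenset("GASCTPV")  # Met-1 removed before these (assumed rule)
PRIMARY_DESTABILIZING = frozenset("FLWY")     # simplified bacterial N-end rule
SECONDARY_DESTABILIZING = frozenset("RK")     # destabilizing after Leu/Phe is added


def mature_n_terminal_residue(seq: str) -> str:
    """N-terminal residue after the possible removal of the initiator methionine."""
    seq = seq.upper()
    if len(seq) > 1 and seq[0] == "M" and seq[1] in SMALL_SECOND_RESIDUES:
        return seq[1]
    return seq[0]


def n_end_class(seq: str) -> str:
    """Classify a sequence by its mature N-terminal residue under the simplified rule."""
    residue = mature_n_terminal_residue(seq)
    if residue in PRIMARY_DESTABILIZING:
        return "primary destabilizing"
    if residue in SECONDARY_DESTABILIZING:
        return "secondary destabilizing"
    return "stabilizing"


if __name__ == "__main__":
    protein = "MSKLWAAG"  # hypothetical protein fragment
    cpp = "RRRRRRRRR"     # R9
    print(n_end_class(protein))                  # parent protein
    print(n_end_class(protein + cpp))            # C-terminal fusion: N-terminus unchanged
    print(n_end_class("M" + cpp + protein[1:]))  # N-terminal fusion: new N-terminus
```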

Analyses of Top CPPs for In Vivo Application
Some proteins and peptides can be toxic to red blood cells (RBCs). Hemolysis is the premature destruction of RBCs before the end of their expected 120-day lifespan and can result in anemia. According to the HemoPI tool, the CPPs "Transportan 10 (TP10)" and "II", which had the highest PROB scores, might be hemolytic after in vivo administration. "pVEC" mutants and "Retro-Tat (57-49)" had the lowest likelihood of hemolysis. All other CPPs had relatively similar PROB scores.
Another aspect to consider in the design of biotherapeutics is the undesired proinflammatory effect of proteins and peptides. For vaccines or immunotherapeutics, the ultimate goal is to activate the immune system; for other proteins, however, proinflammatory effects such as T-cell or B-cell activation are undesirable. The ProInflam server screens peptide sequences for recognized proinflammatory epitopes. In this study, no proinflammatory potential was detected for any of the top 70 CPPs.
As the aim of this study was to find the best candidates among experimentally validated CPPs for conjugation with CPG2, the ToxinPred server was used to screen the top 70 CPPs for toxicity with an SVM-based approach. Twenty percent of the top 70 CPPs were predicted to be toxic. All of the toxic peptides were derivatives of Crot, which originates from a snake venom protein. Unfortunately, no tool is available for predicting the hemolytic activity or proinflammatory effects of complete protein sequences.

Effect of Position of Conjugations on CPP-CPG2 and CPG2-CPP Conjugates
Throughout this study, the effect of the position of conjugation on the characteristics of all CPP-CPG2 and CPG2-CPP conjugates was analyzed using an unpaired t-test. The energy of the mRNA secondary structure in the 5′ translation initiation region, the robustness of the 3D protein structures, the helix and sheet content of the CPP region in the secondary structure after conjugation, and the half-folding times were all significantly better in the N-terminal CPP-CPG2 conjugates than in their C-terminal CPG2-CPP homologs. For the other features, including physicochemical properties, solubility, thermodynamic properties, probability of aggregation, backbone flexibility, allergenicity, and immunogenicity, there was no significant difference between the N- and C-terminal conjugates.

Analysis of Susceptibility to Human Proteases
Proteases are enzymes that cleave the peptide backbone of their target proteins. Some proteases recognize and cleave only particular amino acid sequences, known as cleavage sites. Proteolytic degradation of peptide and protein-based drugs is a major obstacle to achieving optimal systemic administration of therapeutics [74]. Glucarpidase, like other peptide and protein-based biotherapeutics, is prone to proteolytic degradation. Although its stability in blood is sufficient to eliminate high concentrations of MTX, it might not be sufficient for successful application in antibody-directed enzyme prodrug therapy (ADEPT) [75]. Therefore, several strategies have been applied to increase the resistance of glucarpidase to proteolytic enzymes, such as PEGylation, fusion with human serum albumin [76,77], and circular permutation [75].
Using the Prosperous server, CPG2 and the most promising conjugates (the 17 final conjugates) were evaluated against human proteases (Tables S11 and S12). The protease susceptibility of the conjugates was similar to that of unconjugated CPG2, except for "Rev (34-50)-CPG2", which had a significant chance of being cleaved by kallikrein-related peptidase 5, whereas CPG2 and the other CPP-CPG2 conjugates are expected to be resistant to this peptidase. In some instances, the CPP-CPG2 conjugates had more cleavage sites than CPG2 because of the added CPP sequence. However, the marked increases in cleavage sites were observed for proteases that already target unconjugated CPG2. Furthermore, some enzymes, for example proprotein convertases 1 and 2, furin, and thrombin, were predicted to cleave within the CPP sequence conjugated to CPG2. Overall, conjugation of a CPP to CPG2 did not significantly decrease the protease susceptibility of CPG2.

Limitations of the Current Study
The mechanism of CPP translocation across the plasma membrane is highly dependent on the cell line, the CPP concentration, and the type of cargo [78]. Several CPPs can pass directly through the plasma membrane [79]; however, others, especially those conjugated to macromolecules, enter cells through the endocytic pathway. Endocytosis might result in entrapment of the conjugate in endocytic organelles before it reaches its cytoplasmic or nuclear target [80]. Hence, endosomal entrapment of CPPs and their conjugates is a limitation of current computational and experimental studies. Although no bioinformatics method is yet available for assessing the possible entrapment of CPP-protein conjugates, the use of multivalent CPPs or conjugation of the CPP to a pH-dependent membrane active peptide (PMAP) has been recommended to overcome inadequate endosomal release of CPP-protein conjugates [79]. Encouragingly, in our experimental study of the N-terminal TAT-CPG2 conjugate, endosomal release of the conjugate was sufficient to produce adequate pharmacological responses [28].

Primary Dataset Collection
Sequences of CPPs were retrieved from CPPsite 2.0 (http://crdd.osdd.net/raghava/cppsite/), which records experimentally validated CPPs [17]. After excluding cyclic peptides and CPPs containing unnatural or D-amino acids, 1155 unique CPPs remained. Although some of the excluded CPPs (cyclic CPPs and CPPs with unnatural or D-conformation residues) have shown promising results in experimental studies [81][82][83], current computational web servers cannot analyze these sequences. The mature CPG2 amino acid sequence from Pseudomonas sp. strain RS-16 was retrieved from UniProt (UniProt ID P06621) to build the protein conjugates. The nucleotide sequence of the cpg2 gene was then codon-optimized using the codon usage wrangler server (https://www.mrc-lmb.com.ac.uk/ms/methods/codon.html).
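The exclusion criteria above can be expressed as a simple filter. The sketch assumes that each CPPsite 2.0 entry has been exported as a record with a sequence string and a cyclic flag; this is a hypothetical layout, not the database's actual export format. It also assumes that D-amino acids or unnatural residues appear as lowercase letters or non-standard symbols, which should be verified against the database's notation.

```python
from typing import Dict, List

STANDARD_L_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def keep_linear_natural(records: List[Dict]) -> List[Dict]:
    """Keep unique, linear CPPs made only of the 20 standard L-amino acids."""
    seen = set()
    kept = []
    for rec in records:
        seq = rec.get("sequence", "").strip()
        if rec.get("cyclic", False):
            continue  # cyclic peptides cannot be handled by sequence-based servers
        if not seq or not set(seq) <= STANDARD_L_RESIDUES:
            continue  # lowercase or other symbols assumed to mark D- or unnatural residues
        if seq in seen:
            continue  # keep only the first record of each unique sequence
        seen.add(seq)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    demo = [  # hypothetical records
        {"id": 1, "sequence": "RRRRRRRRR"},
        {"id": 2, "sequence": "RRRRRRRRR"},                     # duplicate
        {"id": 3, "sequence": "rrrrrrrrr"},                     # assumed D-residues
        {"id": 4, "sequence": "GRKKRRQRRRPPQ", "cyclic": True},
        {"id": 5, "sequence": "GRKKRRQRRRPPQ"},
        {"id": 6, "sequence": "RRXRR"},                         # non-standard symbol
    ]
    print([r["id"] for r in keep_linear_natural(demo)])
```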

Penetration Prediction of CPPs
The uptake efficiency and corresponding prediction confidence of all 1155 CPPs were predicted using the CPPred-RF server (http://server.malab.cn/CPPred-RF/) [84]. Based on these results, the list was narrowed down to 70 CPP candidates with the highest uptake efficiency and a prediction confidence of 0.9 or above.

mRNA Secondary Structure Prediction of cpg2 and cpp-cpg2/cpg2-cpp Conjugates
The nucleotide sequences of cpg2 and of the top 70 cpps fused to either the N- or C-terminus of cpg2 in the pET-14b expression vector (cpp-cpg2-pET14b) were used to predict the secondary structure and minimum free energy of the 5′ translation initiation region of the mRNA using the mfold online server (http://unafold.rna.albany.edu/?q=DINAMelt/Quickfold) [85]. For the N-terminal conjugates, the 30 bases upstream of the start codon in pET-14b were joined to the first 30 bases of the cpp sequence fused to the codon-optimized cpg2 gene. For the C-terminal conjugates, the 30 bases upstream of the start codon were connected directly to the codon-optimized cpg2. pET-14b was selected because of our recent success in expressing cpg2 and tat-cpg2 fusion constructs [28].
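The construction of the translation initiation region can be sketched with string operations, which also shows why every C-terminal fusion shares the window of unconjugated cpg2. The sketch makes several assumptions: the vector sequence upstream of the start codon is supplied by the user (the pET-14b sequence is not reproduced here); the window extends 30 nt from the A of ATG; the cpg2 coding sequence starts with ATG and lacks a stop codon; and an N-terminal fusion is laid out as ATG + cpp + cpg2 without its own ATG. This layout is hypothetical, and all sequences in the example are placeholders.

```python
UPSTREAM_NT = 30
DOWNSTREAM_NT = 30  # assumption: window extends 30 nt from the A of ATG


def build_cds(cpg2_cds: str, cpp_nt: str = "", terminus: str = "N") -> str:
    """Hypothetical fusion layout: ATG + CPP + CPG2 (N-terminal) or CPG2 + CPP (C-terminal)."""
    if not cpg2_cds.upper().startswith("ATG"):
        raise ValueError("cpg2_cds must start with ATG")
    if not cpp_nt:
        return cpg2_cds
    if terminus == "N":
        return "ATG" + cpp_nt + cpg2_cds[3:]
    return cpg2_cds + cpp_nt


def initiation_region(vector_upstream: str, cds: str,
                      up: int = UPSTREAM_NT, down: int = DOWNSTREAM_NT) -> str:
    """Join the last `up` vector bases to the first `down` bases of the coding sequence."""
    if len(vector_upstream) < up or len(cds) < down:
        raise ValueError("sequences too short for the requested window")
    return vector_upstream[-up:] + cds[:down]


if __name__ == "__main__":
    upstream = "ACGT" * 10          # placeholder for the vector sequence
    cpg2 = "ATG" + "GCT" * 20       # placeholder cpg2 coding sequence
    cpp = "CGT" * 9                 # placeholder CPP coding sequence
    ref = initiation_region(upstream, cpg2)
    print(initiation_region(upstream, build_cds(cpg2, cpp, "C")) == ref)
    print(initiation_region(upstream, build_cds(cpg2, cpp, "N")) == ref)
```

The resulting 60-nt string, with the start codon at index 30, is what a folding server such as mfold would receive for each construct.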

Prediction of Aggregation Possibility of CPG2 and CPP-CPG2/CPG2-CPP Conjugates
FASTA sequences of CPG2 and of the top 70 CPPs linked to the N- and C-terminus of CPG2 were submitted to the PASTA 2.0 (http://protein.bio.unipd.it/pasta2/) [94] and Aggrescan (http://bioinf.uab.es/aggrescan/) [63] servers to compare the aggregation propensity of conjugated and unconjugated CPG2. PASTA 2.0 identifies segments that are likely to form fibrillar aggregates, and Aggrescan predicts the number of aggregation hot spots in a sequence.

Folding Rate of CPG2 and CPP-CPG2/CPG2-CPP Conjugates
Sequences of CPG2 and of the top 70 CPPs linked to the N- and C-terminus of CPG2 were submitted to the FoldRate server (http://www.csbio.sjtu.edu.cn/bioinf/FoldRate/) [95]. FoldRate estimates the time needed for a protein to fold into its tertiary structure by predicting the natural logarithm of the folding rate constant, ln(k_f), and the half-folding time.
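Assuming first-order (two-state) folding kinetics, the half-folding time follows directly from the predicted ln(k_f) as t_1/2 = ln 2 / k_f. The sketch below applies this relation and ranks constructs by how close their half-folding time is to a reference protein's. Whether FoldRate derives its half-time in exactly this way is an assumption, and the construct names and ln(k_f) values are hypothetical.

```python
import math


def half_folding_time(ln_kf: float) -> float:
    """t_1/2 = ln(2) / k_f with k_f = exp(ln k_f); units follow k_f (e.g. s^-1 gives s)."""
    return math.log(2) / math.exp(ln_kf)


def closest_to_reference(ln_kf_by_construct: dict, reference_ln_kf: float, k: int = 3) -> list:
    """Names of the k constructs whose half-folding time is closest to the reference's."""
    ref = half_folding_time(reference_ln_kf)
    return sorted(
        ln_kf_by_construct,
        key=lambda name: abs(half_folding_time(ln_kf_by_construct[name]) - ref),
    )[:k]


if __name__ == "__main__":
    demo = {"a": -1.0, "b": -3.0, "c": -1.2}  # hypothetical ln(k_f) values
    print(closest_to_reference(demo, reference_ln_kf=-1.1, k=2))
```

Because t_1/2 depends exponentially on ln(k_f), equal differences in ln(k_f) above and below the reference do not translate into equal differences in half-folding time. For that reason, the ranking is done on the times themselves.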

Backbone Flexibility of CPG2 and CPP-CPG2/CPG2-CPP Conjugates
Amino acid sequences of CPG2 and of the N- and C-terminal conjugates of the top 70 CPPs with CPG2 were analyzed using the DynaMine server (http://dynamine.ibsquare.be/) [96]. DynaMine predicts the flexibility of each residue in a protein sequence by assigning it a value between 0 and 1, where 0 corresponds to complete flexibility and 1 to complete rigidity.

Analyses of CPP-CPG2 and CPG2-CPP Conjugates
The sequences of the 70 top N-terminal CPP-CPG2 and C-terminal CPG2-CPP conjugates were analyzed using AllergenFP v1.0 (http://ddg-pharmfac.net/AllergenFP/) [97] to predict their potential allergenicity. The sequences of both the N- and C-terminal conjugates of the top 70 CPPs with CPG2 were analyzed for potential immunogenicity using the VaxiJen v2.0 server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) [98], with bacteria selected as the target organism. The half-life of CPG2 conjugated to the top 70 CPPs (both N- and C-terminal conjugates) was calculated using the ProtLifePred server (http://protein-n-end-rule.leadhoster.com/), with E. coli as the expression system. ProtLifePred estimates the half-life of protein sequences based on the N-end rule, taking ubiquitination into account [99,100].

Analyses of CPPs
The top 70 CPPs were analyzed for their possible RBC lysis effect after in vivo administration using the HemoPI server (http://crdd.osdd.net/raghava/hemopi/) [101]. HemoPI uses an SVM-based approach to predict the hemolytic potency of submitted sequences, assigning each query a PROB score between 0 and 1, where 0 indicates the lowest likelihood of being hemolytic. In this study, a PROB score threshold of 0.5 was used to determine the hemolytic toxicity of each CPP. The potential toxicity of the top 70 CPPs was evaluated using the ToxinPred server (http://crdd.osdd.net/raghava/toxinpred/multi_submit.php) [102] with the server's default threshold (zero). The top 70 CPPs were checked for pro-inflammatory effects using the ProInflam server (http://metagenomics.iiserb.ac.in/proinflam/index.html) [103].
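The thresholds described above can be combined into a single screening step. This minimal sketch assumes the outputs of the three servers have been collected per peptide under hypothetical field names (hemopi_prob, toxinpred_svm, proinflammatory). It treats a PROB score of 0.5 or higher as hemolytic and an SVM score above 0 as toxic; both are interpretations of the thresholds rather than documented server rules. The peptide names and scores in the example are hypothetical.

```python
HEMOPI_PROB_CUTOFF = 0.5    # PROB threshold used in this study
TOXINPRED_SVM_CUTOFF = 0.0  # ToxinPred default SVM threshold


def screen_cpps(results: dict) -> dict:
    """Return {peptide: [reasons]} listing why each peptide is flagged (empty list = passes)."""
    flags = {}
    for name, r in results.items():
        reasons = []
        if r["hemopi_prob"] >= HEMOPI_PROB_CUTOFF:
            reasons.append("hemolytic")
        if r["toxinpred_svm"] > TOXINPRED_SVM_CUTOFF:
            reasons.append("toxic")
        if r["proinflammatory"]:
            reasons.append("proinflammatory")
        flags[name] = reasons
    return flags


if __name__ == "__main__":
    demo = {  # hypothetical server outputs
        "p1": {"hemopi_prob": 0.62, "toxinpred_svm": -0.5, "proinflammatory": False},
        "p2": {"hemopi_prob": 0.20, "toxinpred_svm": 0.3, "proinflammatory": False},
        "p3": {"hemopi_prob": 0.10, "toxinpred_svm": -1.2, "proinflammatory": False},
    }
    print(screen_cpps(demo))
```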

Evaluation of CPP-CPG2/CPG2-CPP Conjugates Susceptibility to Human Proteases
The susceptibility of the most promising CPP-CPG2 and CPG2-CPP conjugates to human proteases was evaluated using the Prosperous web server (http://prosperous.erc.monash.edu/) [104,105]. Of the 90 proteases available on the Prosperous server, 51 are human proteases according to MEROPS: the peptidase database [106]. These human proteases fall into four families: aspartic proteases such as cathepsins D and E, cysteine proteases such as caspases 1 and 3, metalloproteases such as matrix metallopeptidases 1, 2, and 3, and serine proteases such as thrombin and plasmin.
Prosperous assigns each potential cleavage site a probability score between 0.0 and 1.0, where 0.0 indicates the lowest and 1.0 the highest likelihood that the position is recognized and cleaved by a protease. In this analysis, the FASTA sequences of CPG2 and the 17 most promising CPP-CPG2/CPG2-CPP conjugates were submitted to Prosperous, and a probability score of 0.700 or higher was considered significant for susceptibility to human proteases. An overview of the analyses used in this study is shown as a bioinformatics flowchart below (Scheme 1).

Scheme 1.
A summary of the proposed workflow to select the most promising CPPs in conjugation with a protein.

Statistical Analyses
An unpaired t-test was used to assess the significance of differences between N-terminal (CPP-CPG2) and C-terminal (CPG2-CPP) values using GraphPad Prism 8 software (GraphPad Software, Inc., San Diego, CA, USA). A p-value < 0.05 was considered statistically significant.
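For readers without Prism, the N- versus C-terminal comparison can be reproduced with a standard-library sketch. The code below assumes the classical Student's unpaired t-test with pooled variance (equal variances). Whether the Prism analysis used this form or Welch's correction is an assumption. The two-sided p-value comes from the identity p = I_{ν/(ν+t²)}(ν/2, 1/2), with ν = n₁ + n₂ − 2. The regularized incomplete beta function I is evaluated by the well-known continued-fraction (modified Lentz) method. If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` should give the same statistic and p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), using the symmetry relation to keep the continued fraction convergent."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t distribution with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def unpaired_t_test(n_term: Sequence[float], c_term: Sequence[float]) -> Tuple[float, int, float]:
    """Student's unpaired t-test (pooled variance); returns (t, df, two-sided p)."""
    n1, n2 = len(n_term), len(c_term)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(n_term)
                  + (n2 - 1) * statistics.variance(c_term)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (statistics.mean(n_term) - statistics.mean(c_term)) / se
    return t, df, two_sided_p(t, df)


# Illustrative values only (not data from this study).
t, df, p = unpaired_t_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}, significant = {p < 0.05}")
```

Each group needs at least two values, and the groups must not both have zero variance. Otherwise the standard error is undefined and the function raises an error.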

Conclusions
In this study, characterizing a large number of CPPs highlighted crucial factors for the rational design of a CPP-protein chimera. The bioinformatics workflow presented here offers a comprehensive approach for selecting CPP-cargo constructs for any therapeutic application. The addition of CPPs resulted in a higher probability of translation of mRNA into protein in about 74% of the N-terminal conjugates. Structurally, more than 70% of the CPPs adopted α-helix or β-sheet conformations after conjugation, which favors higher penetration compared with the non-fused cargo. The addition of CPPs did not cause any significant negative changes in the stability of the target protein (thermodynamic stability and resistance to human proteases). Except for a few CPPs that might increase aggregation of the chimera, analysis of the physicochemical characteristics showed that most CPPs linked to the cargo had no negative influence on its properties. Conjugation of CPPs did not compromise solubility and even improved it. Computational analyses indicated that allergenicity was not adversely affected and would not limit injectable formulations. Flexibility analysis showed that in most conjugates the addition of CPPs caused no major alterations in overall enzyme flexibility, and in all conjugates the flexibility of the ligand-binding site remained unchanged. However, more robust prediction servers are needed to evaluate the RBC hemolysis and pro-inflammatory effects of peptide-protein conjugates. Regarding the position of conjugation, N-terminal linkage (CPP-CPG2) is preferred over C-terminal linkage (CPG2-CPP). The approach presented in this investigation is not limited to the CPP-glucarpidase case study but is generally applicable to any CPP-protein conjugate design.
Hence, our study provides a platform for further in vitro and in vivo investigations and underlines the advantages of performing computational analyses before designing any covalently conjugated CPP-protein construct.
Supplementary Materials: The following are available online: Table S1: Results of the CPPred-RF server analysis for all 1155 CPPs, Table S2: Results of the mfold server analysis for the mRNA translation initiation region sequences of CPG2 and the top 70 CPPs conjugated to the N-terminus of CPG2 in the pET-14b expression vector (all C-terminal conjugates give the same result as CPG2 itself), Table S3: Physicochemical properties of the top 70 CPPs analyzed with the ProtParam tool, Table S4: Physicochemical properties of the top 70 CPG2-CPP C-terminal conjugates analyzed with the ProtParam tool, Table S5: Analysis of 3D modeled structures of CPG2 and the top 70 CPPs conjugated to CPG2 by the PDBsum server, Table S6: Thermodynamic properties of CPG2 and CPP-CPG2 conjugates calculated by the SCOOP server, Table S7: Thermodynamic energies of CPG2 and CPP-CPG2 conjugates calculated by FoldX, Table S8: Analysis of aggregation possibility for CPG2 and the top 70 CPPs conjugated to CPG2 by the PASTA 2.0 and AGGRESCAN servers, Table S9: Half-life prediction for the top 70 CPPs conjugated with CPG2, Table S10: Selection of top CPP candidates for conjugation to CPG2, Table S11: Prediction of whether the conjugates or CPG2 will be cleaved by the respective proteases (a probability score of 0.700 was considered significant), Table S12: Prediction of whether CPP addition increases the risk of cleavage of the conjugates by each protease compared with CPG2 (a probability score of 0.700 was considered significant).