Combination of Gemcitabine with Cell-Penetrating Peptides: A Pharmacokinetic Approach Using in Silico Tools

Gemcitabine is an anticancer drug used to treat a wide range of solid tumors and is a first line treatment for pancreatic cancer. Our group has previously developed novel conjugates of gemcitabine with cell-penetrating peptides (CPP), and here we report some preliminary data regarding the pharmacokinetics of gemcitabine, two gemcitabine-CPP conjugates and respective CPP gathered from GastroPlus™, and analyze these results considering our previous evaluation of gemcitabine release and conjugates’ bioactivity. Additionally, seeking to shed some light on the relation between the penetration ability of CPP and their physicochemical properties, chemical descriptors for the 20 natural amino acids were calculated, a new principal property scale (z-scale) was created and CPP prediction models were developed, establishing quantitative structure-activity relationships (QSAR). The z-scores of the peptides conjugated with gemcitabine are presented and analyzed with the aforementioned data.


Introduction
Gemcitabine (2 ,2 -difluoro-2 -deoxycytidine, dFdC, Gem, Figure 1) is a drug considered as 'first-line treatment' for various types of solid tumors and is clinically used in the treatment of various cancers including pancreatic cancer, non-small cell lung cancer (NSCLC), bladder, ovarian, and breast cancer, as well as some blood cancers, such as non-Hodgkin's lymphoma [1][2][3]. Like most anticancer drugs, gemcitabine is administered intravenously. Its cellular uptake is primarily facilitated and governed by the human equilibrative nucleoside transporter 1 (hENT1) and human concentrative nucleoside transporter 3 (hCNT3) [4,5]. Inside cells, gemcitabine acts as an antimetabolite, but first needs to be activated by phosphorylation to its triphosphate form (dFdCTP) by deoxycytidine kinase (dCK) and other intracellular kinases. dFdCTP is incorporated into DNA, leading to DNA strand termination after the incorporation of one more nucleotide, and also competes with deoxycytidine triphosphate (dCTP) as an inhibitor of DNA polymerase. The incorporation of this extra nucleotide into DNA appears to be resistant to the normal mechanisms of DNA. Moreover, gemcitabine diphosphate (dFdCdP) is a potent inhibitor of ribonucleotide reductase (RNR), resulting in depletion of deoxyribonucleotides necessary for DNA synthesis, further potentiating the effects of dFdCTP in causing cell death by apoptosis [6,7]. However, there are some factors hindering the full potential of gemcitabine: (a) gemcitabine may rapidly undergo deamination to its inactive uridine metabolite (2',2'-difluorodeoxyuridine, dFdU), by cytidine deaminase (CDA), which is present at high levels in both human plasma and liver; (b) gemcitabine monophosphate (dFdCMP) is deaminated into dFdUMP by deoxycytidylate deaminase (DCTD); (c) the phosphorylated metabolites of gemcitabine are inactivated via reduction by cellular 5'-nucleotidase (5'-NT). Enzymatic conversion of gemcitabine rapidly clears it from the body [7,8]. Additionally, some tumor cells can develop resistance to gemcitabine related to nucleoside transporter deficiency [5]. As the adverse effects associated with chemotherapeutic agents remain severe, many efforts have been made to maximize therapeutic efficacy and attenuate the nocuous side manifestations. Numerous gemcitabine prodrugs have been developed to alter some of the unfavorable physicochemical properties of the drug and ideally improve its oral bioavailability.
Recently, our group has synthesized two gemcitabine-CPP conjugates (Figure 1), in an effort to both retard or prevent deamination of gemcitabine (masking its aniline moiety) and facilitate its delivery into cancer cells [9], taking advantage of the fact that all CPP are able to efficiently pass through cell membranes while being non-cytotoxic and carrying a wide variety of cargos inside cells [9,10]. CPP Penetratin (Pen, RQIKIWFQNRRMKWKK-NH 2 ) and pVEC (LLIILRRRIRKQAHAHSK-NH 2 ) were selected for conjugation with gemcitabine [9]. These are two well-known CPP and have both been reported in numerous cancer studies over the last two decades [11,12]. An additional cysteine residue (Cys) was coupled to the N-terminus of both CPP, producing Cys-Pen and Cys-pVEC, to allow the subsequent binding to parent drug. The time-dependent kinetics of gemcitabine release from hydrolysis of these new conjugates was studied in phosphate-buffered saline (PBS) at pH 7.4, 37 • C, and their biological activity was evaluated against three human tumoral cell lines: MKN-28 (human gastric cancer), Caco-2 (heterogeneous human epithelial colorectal adenocarcinoma) and HT-29 (human colon adenocarcinoma). The results were promising, revealing an increase in the anti-proliferative activity of gemcitabine in vitro upon conjugation with the CPP [9].
In this work, we used computational tools to study the pharmacokinetics (PK) of drug gemcitabine, gemcitabine-CPP conjugates and respective CPP, and to establish a possible relation between penetration potency of CPP and their physicochemical properties. The PK data was acquired using GastroPlus™; amino acid properties were calculated in Schrödinger's Maestro software; principal component analysis (PCA), multivariate analysis (MVA) and partial least square discriminant analysis (PLS-DA) were used to build CPP prediction models in SIMCA by Umetrics. GastroPlus™ is a powerful mechanistically based simulation and modeling software for pharmaceutical research. With Advanced Compartmental Absorption and Transit (ACAT™) and Physiologically Based Pharmacokinetic (PBPK) models, it has features and capabilities to support model-based drug development in all phases of drug discovery, translational research, and clinical development. This software has been used in numerous academic studies and by pharmaceutical companies of excellence, along with the U.S. Food and Drug Administration (FDA), the Centers for Disease Control and Prevention (CDC), the National Institutes of Health (NIH), the National Cancer Institute (NCI) and the China Food and Drug Administration (CFDA).
Peptides and proteins have been the subject of considerable interest in medicine, research, and drug development due to some of their specific properties and a wide variety of applications [13,14]. In particular, CPP have the intrinsic property to efficiently deliver covalently or noncovalently bound therapeutic molecules (nucleic acids, proteins, drugs, imaging agents, etc.) into a variety of cell types and tissues in a nontoxic manner, via receptor-independent mechanisms (primarily endocytosis) [10,15,16]. Besides their ability to be uptaken by cells and act as an excellent therapeutic delivery vehicle, it has been established that CPP are generally relatively short peptides (less than 40 amino acids), have low cytotoxicity, dose-dependent efficiency, and no restriction with respect to the size or type of cargo. Additionally, CPP can enhance the water solubility of drugs [17].
The rational design and prediction of new CPP requires an understanding of the defining properties and similarities of these peptides. For example, almost every CPP sequence involves positively charged amino acids and it has also been shown that secondary structure, specifically helicity, is a key factor governing the interactions of a given CPP with cell membranes, and peptides with an α-helical region can enter cells more efficiently [18].
Theoretical and computational methods are powerful and very often useful tools to predict new CPP sequences, based on previously available experimental data and calculations of several amino acid and peptide properties. Initially, principal components analysis (PCA) and binary classification were explored for pattern recognition models [19,20]. With the determination of physicochemical properties of amino acids and peptides, quantitative structure activity relationship studies (QSAR), partial least squares (PLS) regression and multivariate analysis (MVA) can also be used as tools [21][22][23][24]. Hellberg et al. developed a tridimensional scale (z1-z3) for the 20 natural amino acids to perform quantitative structure-activity relationship (QSAR) of peptides, using 29 physicochemical properties [23]. This method and these scales have since been extended to include more amino acids and descriptive properties in the search for new CPP sequences.
In this project, we followed the same methodology and selected 12 physicochemical properties of the 20 natural amino acids to extract 3 z-scores. This tridimensional z-scale was used to build several CPP prediction models and to discuss the properties of amino acids and peptides that seem to play an important role in the penetration ability of these peptide sequences. Although it is possible to create models that allow for amino acid position-based optimization, the models created here were to predict a binary classification: CPP or non-CPP, using various calculated global peptide descriptors. This has been applied in multiple previous studies regarding peptide modeling with varying successful results [21,25].

Amino Acids-Structure, Physicochemical Properties and Creation of a Z-Scale
The structures of the 20 natural amino acids were drawn in Maestro (version 10.4, Schrödinger, LLC, New York, NY, USA). All amino acids were capped using C-terminal amidation and N-terminal acetylation to better simulate an amino acid as part of a peptide chain, linked through amide bonds. Maestro's LigPrep tool was used to simultaneously minimize the structures and generate possible charge states at pH 7.0 (histidine was only included in its charged, deprotonated state).
Physicochemical properties of the amino acids were calculated by Maestro's QikProp tool. This data was imported into SIMCA (version 13.0 ed, Umetrics AB, Umeå, Sweden) where it was scaled and centered. After principal component analysis (PCA) of the data, 3 principal components were extracted. These scores (designated z1, z2 and z3) constitute a z-scale used to quantitatively describe each amino acid and the peptide sequences. The 12 selected properties for PCA were: number of rotatable bonds (#rotor), molecular weight (mol MW), volume, solvent accessible surface area (SASA), number of hydrogen bond donors (donorHB), number of hydrogen bond acceptors (accptHB), globularity (glob), octanol/water partition coefficient (QPlogPo/w), polar surface area (PSA), net charge (Tot Q), and ratios FISA/SASA (FISA is the hydrophilic component of SASA) and FOSA/SASA (FOSA is the hydrophobic component of SASA).
In general, the relation of the PCs with the physicochemical properties suggests the first PC (z1) is mainly related to properties describing size and shape properties, such as volume, SASA, globularity and molecular weight; the second PC (z2) seems to be more related to the polarity of the amino acids and the descriptors QPlogPo/w, accptHB, donorHB, FISA/SASA and PSA; finally, z3 seems to be predominantly influenced by electronic properties (in this case described by charge).

Datasets
Peptide sequences were extracted from the different CellPPD (Designing of Cell Penetrating Peptides) databases, available from http://crdd.osdd.net/raghava/cellppd/dataset.php. This provided a main dataset of experimentally validated cell-penetrating (900 CPP) and both validated and randomly generated non-active peptides (1148 non-CPP) after removal of duplicates.
The lack of experimentally validated non-CPP is a known problem [26,27] and creating balanced datasets, which has been demonstrated to be very crucial in modeling [26,28,29], is therefore a major problem. To try to overcome this issue, the main dataset was reduced to contain only 900 non-CPP, the same number of CPP present. The deleted 248 peptides were selected randomly.
To study the influence of terminal chains, all peptides were truncated to originate 6 other datasets. First, the peptides were divided in half, generating an N-terminal and a C-terminal dataset. In the cases of peptides with an odd number of amino acids, the N-terminal was the longer chain. Then, five residues were taken from each terminal, originating the "first 5AA" and "last 5AA" datasets. Finally, the same process was used, but to create "first 10AA" and "last 10AA" datasets.

Peptide Descriptors
Every peptide is described as a sequence of the z-scores of their amino acids. The mean of the z-scores across the entire sequence was calculated (mean z1, mean z2 and mean z3). The absolute difference between terminals was calculated (|Nt − Ct|), as well as the absolute difference between the first and last 5 or 10 amino acids. Using an extension of the Eisenberg's equation where the hydrophilicity descriptor of the original equation was replaced with the generated z-scale values (Equation (1)), as established by Maccari et al. [30], the z-scale moment was calculated for each dataset.
Equation (1): Original Eisenberg's equation; N: number of amino acids in the peptide sequence; n: order number of the specific amino acid examined; H: experimental hydrophilicity of a specific amino acid; δ: angle between two adjacent amino acids, which in the case of an alpha helical structure is defined as 100 • .
Some properties of the peptides were calculated and applied as descriptors to the models. These properties included the peptides' steric bulk (calculated as the mean number of non-hydrogen atoms in the amino acid side chains), the mean net donating hydrogen bonds (calculated as the accepted hydrogen bonds subtracted from the donated hydrogen bonds), the total charge of the peptide sequences as well as the mean net charge, which takes into consideration the total number of amino acids in each sequence. Additionally, the total number of Arg, His, Lys, Asp and Glu residues, and the total number and ratio of positively and negatively charged amino acids were also considered when building the prediction models.

Prediction Model Generation and Optimization
Using SIMCA and Partial Least Square Discriminant Analysis (PLS-DA), numerous prediction models were generated by varying the included properties and descriptors.
To be able to perform external validation of the built models, a test set composed of peptides not included in the generation of the models is needed. So, 50% of the main dataset peptides were randomly extracted and selected as the test set.
Internal classification predictive value, Q 2 , and fit measurements were calculated and analyzed in the optimization process. Four performance measurements to access the predictability of the different models were calculated based on the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). These measurements were sensitivity, specificity, accuracy and Matthew's correlation coefficient (MCC), a quality measurement for binary classifications (Equations (2) to (5)).
Equations (2) to (5): Sensitivity, representing the percentage of correctly predicted positive sequences; specificity, representing the percentage of correctly predicted negative sequences; accuracy, representing the percentage of correctly predicted sequences overall; and Matthew's correlation coefficient (MCC), a quality measurement for binary classifications.

Pharmacokinetic Assessment of Gemcitabine, Cpp and Gemcitabine-Cpp Conjugates
The pharmacokinetic study of gemcitabine, CPP and conjugates was performed in GastroPlus™ (version 9.5, Simulations Plus, Inc., Lancaster, California, USA), a mechanistically based simulation and modeling software for pharmaceutical research. GastroPlus™ builds physiologically based pharmacokinetic (PBPK) models and can run simulations based on a drug's structure and collected data to predict the most important parameters in pharmacokinetics (PK), such as the maximum concentration reached in plasma and liver, time necessary to reach such concentrations, binding to plasma proteins, fraction absorbed and bioavailability. It also draws a graphical representation of plasmatic concentration over time and calculates the area under the curve (AUC). GastroPlus™ not only simulates human PK, but can also be used to study mice, rats, monkeys, beagles, cats, rabbits and minipigs, based on preinstalled human and animal physiological parameters. This software has been used to successfully and accurately predict PK profiles, an important tool in early on drug discovery [31].
All the simulations in the scope of this project were performed to predict the PK for 24 h after intravenous administration of 1250 mg (1 h perfusion), using the Compartmental model of GastroPlus™. The software did not provide an estimated clearance for any of the molecules studied here, thus, gemcitabine's clearance value of 168 L/h was input into the software, according to this drug's FDA label and information deposited on DrugBank [32][33][34].
The general workflow of PBPK modeling has been described in publications and tutorials [35][36][37]. The preliminary model in this case was based on the physicochemical data from ADMET Predictor™ module of GastroPlus™, using a standard compartmental PBPK model.

Results and Discussion
The choice of amino acids and their combination in a peptide sequence when designing new CPP are fundamental. Properties such as size, polarity and charge vary greatly within the 20 natural amino acids, and to better understand how these properties correlate with CPP penetration ability, 12 physicochemical properties were selected and PCA was performed to extract 3 principal components (PCs), forming a tridimensional z-scale, presented in Table 1. The relation of the PCs with the physicochemical properties can be seen in Figure 2.  Every peptide was described as a sequence of the z-scores of their amino acids and peptide descriptors were calculated for every peptide in the main dataset. PCA was performed to extract 3 PCs for each peptide. Several prediction models were generated by varying the included properties and descriptors, and Pen, Cys-Pen, pVEC and Cys-pVEC were predicted as CPP; their z-scores are presented in Table 2. All the models created in this project showed a decent ability to predict CPP, with an average of 79% sensitivity, 80% specificity, 81% accuracy, 61% MCC and 0.406 Q 2 .
These results show Pen and Cys-Pen have z1 scores 2-fold higher than the z1 scores calculated for pVEC and Cys-pVEC. The same difference was observed for the z2 scores. However, regarding the third PC, pVEC and Cys-pVEC z3 scores are higher than the ones calculated for Pen and Cys-Pen. Adding a Cys residue to the original CPP sequences seems to have had a bigger impact on the PC related to size and shape, z1, where it was possible to differentiate the original CPP from the modified Cys-CPP, whereas z2 and z3 scores are very similar for the CPP and Cys-CPP.
As previously mentioned, charged has long been appointed as one of the most important features/characteristics of CPP. In Table 3, the number of charged amino acids and ratio of hydrophilic residues to total number of residues are presented. The difference in the content of positively charged amino acids in Pen and pVEC can explain the higher z3 scores calculated for pVEC (and Cys-pVEC).
With respect to the in vitro results previously observed by our group, there was a significant improvement in the biological activity of gemcitabine upon conjugation of the drug with either CPP, with Gem-Cys-pVEC conjugate showing the best results in MKN-28 and HT-29 cells (Table 4).
In Table 5 are the input data used in GastroPlus TM to simulate plasma concentration. Concentration curves were then compared to that of parental drug (GEMZAR ® , gemcitabine for injection) and the approximation between values has been achieved.
Despite the promising in vitro bioactivity, favorable pharmacokinetic properties are required for the success of therapies in vivo. According to the simulations carried out in GastroPlus™, conjugates' bioavailability is ensured and plasma concentration should reach therapeutic levels ( Table 6).     The calculated AUC for the conjugates was comparable to the AUC calculated for gemcitabine, yet, estimated C max was higher for all peptides and conjugates analyzed compared to gemcitabine alone ( Figure 3). Gem-Cys-pVEC conjugate binds less extensively to plasma proteins (>F up , 42.89%). Considering this conjugate showed the best bioactivity in MKN-28 and HT-29 cells, and released gemcitabine in PBS faster than Gem-Cys-Pen conjugate (50% over 42 h, versus 9.6 days for Gem-Cys-Pen [8]), we believe Gem-Cys-pVEC conjugate has the best suitable profile for drug delivery. Binding to plasma proteins acts as a protection from quick biotransformation and degradation due to the action of plasma circulating enzymes [40] (such as proteases and CDA). This increases circulation time and can also be advantageous to biodistribution. However, it is important that there is a significant percentage/amount of the drug/compound free in circulation so that it can reach its target and exert its pharmacologic action. Differences in V c of gemcitabine and the conjugates can be explained by their different affinity to bind to plasma proteins.

Conclusions
The main goal of this study was to combine the in vitro and in silico approaches to highlight the potential clinical applications of CPP in drug delivery. The development of an amino acid z-scale and the calculation of peptide descriptors was important to understand some factors impacting penetration ability. We believe this method is of great value for pharmaceutical design using CPP for drug delivery. Given the results of this work, we intend to continue studying this approach and these conjugates, and to carry out in vivo experiments, considering Gem-Cys-pVEC our therapeutic lead as it showed the most promising results regarding in silico calculated properties, pharmacokinetic potential and in vitro bioactivity.