Application of ‘inductive’ QSAR descriptors for quantification of antibacterial activity of cationic polypeptides

On the basis of the inductive QSAR descriptors we have created a neural network-based solution enabling quantification of antibacterial activity in the series of 101 synthetic cationic polypeptides (CAMEL-s). The developed QSAR model allowed 80% correct categorical classification of antibacterial potencies of the CAMEL-s both in the training and the validation sets. The accuracy of the activity predictions demonstrates that a narrow set of 3D sensitive 'inductive' descriptors can adequately describe the aspects of intra- and intermolecular interactions that are relevant for antibacterial activity of the cationic polypeptides. The developed approach can be further expanded for the larger sets of biologically active peptides and can serve as a useful quantitative tool for rational antibiotic design and discovery.


QSAR models for antibiotic activity
QSAR studies of antibiotic activity represent an emerging and exceptionally important topic in the area of computer-aided drug design. Although the demand for 'in silico' discovery is clear in all areas of human therapeutics, the field of anti-infective drugs has a particular need for computational solutions enabling rapid identification of novel therapeutic leads. The urgent need for new antibiotics and antivirals is driven by critical situations such as the increased prevalence of multi-drug resistant bacteria and HIV/AIDS, and the emergence or re-emergence of deadly infectious diseases such as Lyme disease, West Nile virus, Hantavirus pulmonary syndrome, Norwalk-like virus, Avian influenza virus, SARS, and novel forms of Cryptococcal infection. At the same time, 'Big Pharma' has historically withdrawn from the field of antimicrobial drug development in favour of more profitable areas. Consequently, very few novel antibacterial therapeutics have emerged over the last decade. QSAR studies can help to address this problem by providing the means for rapid design and virtual screening of combinatorial anti-infective libraries, as well as for rational data mining for novel antibiotic candidates.
To date, few antibacterial QSAR studies have been reported that could either distinguish compounds possessing antibacterial activity from all other chemicals, or numerically reproduce antibacterial potencies in a series of closely related chemical analogues. These QSAR approaches process a variety of structure-dependent descriptors with machine learning and statistical techniques such as Artificial Neural Networks [1][2][3], Linear Discriminant Analysis [4][5][6], Binary Logistic Regression [5], Principal Component Analysis and the k-means cluster method [7]. In some cases the results allowed the authors to introduce novel anti-infective leads; however, all of the reported QSAR solutions have been built upon already well-studied classes of traditional antibiotics. In the current work, we apply the QSAR methodology to the newest class of antibacterial therapeutics, the cationic polypeptides, which represent the latest hope in the combat against multi-drug resistant pathogens.

Cationic polypeptides as a novel class of antibacterial therapeutics
A diverse population of antimicrobial peptides (AMP-s) can be found in nature, as they are an essential component of anti-infective defence mechanisms in mammals, amphibians, insects and plants [8][9][10][11]. The majority of AMP-s share several key structural features such as short length (typically 10 to 40 amino acids), amphipathicity (i.e., a molecule has distinct cationic and hydrophobic faces), and helical or cyclic structure. In recent years, AMP-s have drawn much attention as a potentially effective class of anti-infective therapeutics. Because bacterial resistance to antimicrobial peptides is infrequent [8,12,13] and the peptides are reported to be non-toxic and non-immunogenic (according to numerous reports, such as [14]), extensive research programs have been established with the aim of exploiting the AMP-s as a novel stand-alone class of antibiotics.
Substantial experimental efforts have been invested into the discovery and investigation of natural and synthetic cationic polypeptides possessing antibacterial, antiviral, antifungal and/or anti-tumour activities. Nevertheless, only a few very simple structure-activity studies have been reported in the literature, and their results have not led to validated QSAR models. In the current work, we have attempted to fill this gap by creating a QSAR model that quantifies the antibacterial activity of a broad range of rigorously investigated cationic peptides through the recently developed 'inductive' QSAR descriptors.

'Inductive' descriptors overview
The 'inductive' descriptors have been previously introduced, and are based on the models of inductive and steric effects, inductive electronegativity and molecular capacitance, developed in a series of papers by Cherkasov and co-authors [15][16][17][18][19]. These molecular parameters can be easily accessed from fundamental parameters of bound atoms, such as absolute electronegativities (χ), covalent radii (R) and intramolecular distances (r). The steric (Rs) and inductive (σ*) influence of an n-atomic group G on a single atom j is given by equations (1) and (2), respectively. In those cases when the inductive and steric interactions occur between a given atom j and the rest of an N-atomic molecule (as a sub-substituent), the summation in (1) and (2) is taken over N-1 terms. Thus, the group electronegativity of the (N-1)-atomic substituent around atom j is expressed by equation (3). Similarly, the steric and inductive effects of a single atom onto a group of atoms (the rest of the molecule) are defined by equations (4) and (5). In the work [18] an iterative procedure for calculating the partial charge on the j-th atom in a molecule was developed, equation (6), where Q_j reflects the formal charge of atom j.
Initially, the parameter χ in (6) corresponds to χ0, the absolute, unchanged electronegativity of an atom. As the iterative calculation progresses, the equalized electronegativity χ' is updated according to (7), where the local chemical hardness η0 reflects the "resistance" of electronegativity to a change of the atomic charge. The 'inductive' hardness η_i and softness s_i of a bound atom i are represented by equations (8) and (9), and the corresponding group parameters are expressed by equations (10) and (11). The interpretation of the physical meaning of the 'inductive' descriptors was developed by considering a neutral molecule as an electrical capacitor formed by charged atomic spheres [18]. This approximation relates the chemical softness-hardness of bound atom(s) to the areas of the facings of an electrical capacitor radially formed by the atom(s) in a molecule (Figure 1), and correlates electronic density with capacitor-accumulated electricity. The 'inductive' parameters developed to date have been rigorously validated on extensive experimental datasets [15][16][17][18][19][20][21][22][23][24]. Table 1 features 50 'inductive' QSAR descriptors that can be calculated in the framework of equations (1)-(11). It should be noted that in a previous study [25], these molecular parameters allowed creation of a QSAR model enabling 93% correct recognition of low-molecular weight antibacterial compounds.

EO_Equalized*: Iteratively equalized electronegativity of a molecule. Calculated iteratively by (7), where charges get updated according to (6); an atomic hardness in (7)
Softness_of_Most_Pos: Atomic softness of an atom with the most positive charge (9)
Softness_of_Most_Neg: Atomic softness of an atom with the most negative charge

Experimental data
In the current work, we have used the 'inductive' descriptors to investigate structure-activity relationships in a series of antibiotic peptides called CAMEL-s. These compounds are derivatives of the hybrid polypeptide CAMEL0, previously created by the respective fusion of the C- and N-terminus sequences of the natural peptides Cecropin and Melittin. Despite the rather limited variability in amino acid sequences among these leucine-rich peptides, their antibacterial activity ranges over several orders of magnitude. It has been experimentally demonstrated that the CAMEL-s exhibit high activity against various strains (including drug-resistant ones) of Gram-positive and Gram-negative bacteria, including Bacteroides, Bordetella, Campylobacter, Corynebacterium, Klebsiella, Listeria, Moraxella, Pasteurella, Taylorella, Yersinia, Rhodococcus, Staphylococcus and Streptococcus [26][27][28]. The minimal inhibitory concentrations for the series of 101 CAMEL-s against the listed microorganisms have been previously averaged to produce the mean antibiotic potency parameters [26][27][28]. These values, extracted from the SAPD database [29], have been collected into Table 2 and subjected to QSAR analysis with the 'inductive' descriptors.

Factors governing bioactivity of CAMEL-s
The common mode of action of antimicrobial peptides is disruption of bacterial cell membranes via electrostatic and hydrophobic interactions [1,3,4,12,14,29-36]. It is believed that amphipathic peptides can penetrate or form pores in the cell membranes through insertion into the lipid bilayer mediated by hydrophobic forces, while their electrostatic interaction with phospholipid headgroups leads to membrane disruption. Cationic peptides exhibit high affinity only toward the negatively charged surfaces of bacterial cells, while they do not tend to interact with eukaryotic cell surfaces composed mostly of zwitterionic phospholipids. It has been demonstrated that the antimicrobial activity of polypeptides can be influenced by their helicity, hydrophobicity and amphipathicity [5,37-44]. Nonetheless, the exact nature of this correlation is still unclear and the understanding of the factors influencing AMP activity is incomplete. We postulate that a set of the developed 'inductive' descriptors can adequately reflect those structure-dependent properties of CAMEL-s pertaining to their antibacterial activity. The reasoning for this stems from the fact that the parameters calculated within (1)-(11) cover a very broad range of properties of bound atoms and molecules related to their size, polarizability, electronegativity, compactness, mutual inductive and steric influence, and distribution of electronic density.

Descriptors calculation and selection
All 50 inductive QSAR descriptors (presented in detail in Table 1) have been calculated for all 101 CAMEL molecules under study. To compute the 'inductive' descriptors we have used custom SVL scripts implemented in the MOE package [45]. It should be noted that all of the produced parameters are 3D-sensitive and depend on the structure of the polypeptides. The CAMEL-s were initially built in alpha-helical conformations (the helicity was confirmed by a number of secondary structure predictors), which were further optimized by MMFF94 molecular mechanics simulations.
It should be mentioned here that some inductive descriptors may reflect related or similar molecular/atomic properties and can be correlated in certain cases (even though the analytical representation of those descriptors does not directly imply their co-linearity). Moreover, most of the CAMEL-s have very similar three-dimensional structures and, therefore, special precautions were taken in selecting the appropriate 'inductive' descriptors for the QSAR model. Hence, to eliminate the cross-correlation among the independent variables, we pre-computed pairwise regressions between all pairs of the 50 QSAR parameters for CAMEL-s. We subsequently removed those descriptors that were linearly correlated with R≥0.9. As a result of this procedure, only 20 parameters were selected for further simulation (for more information refer to the Legend of Table 1).
The averaged values of these 20 indices were also separately calculated for antibacterial CAMEL-s conventionally sub-divided into three activity groups: very active (with the mean potency above 4), mild (potency between 2 and 4) and moderate antibacterials (potency < 2). The 'inductive' descriptors averaged within these three groups are plotted in Figure 2. Although the curves for moderate and very active peptides are very close for the most part, all three categories of CAMEL-s can be clearly distinguished by the selected QSAR parameters. Therefore, it is reasonable to assume that the 'inductive' QSAR descriptors can effectively be used for the numerical quantification of the average potency of the CAMEL-s.
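The descriptor pre-selection described above can be illustrated with a minimal sketch. It assumes that the correlation threshold is applied to the absolute value of the Pearson coefficient and that, within a correlated pair, the descriptor encountered first is the one retained; neither detail is stated in the text. The function name `drop_correlated` and the toy matrix are hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedy filter: keep a descriptor only if its |R| with every
    already-kept descriptor is below the threshold (assumed rule)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are descriptors
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical): 5 molecules, 3 descriptors.
# d2 is nearly a linear copy of d1; d3 is only weakly related to both.
X = np.array([
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 1.0],
    [3.0, 6.2, 4.0],
    [4.0, 8.0, 2.0],
    [5.0, 9.9, 3.0],
])
selected = drop_correlated(X, ["d1", "d2", "d3"])
print(selected)
```

In this toy case d2 is discarded because it is almost collinear with d1. Applied to a 101 x 50 descriptor matrix, the same loop would return the reduced descriptor list, although the exact set depends on the column order under the assumed tie-breaking rule.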

Composition of the training and the testing (validation) sets
In order to relate the 'inductive' descriptors to the experimental mean potencies of the peptides, we have employed the method of Artificial Neural Networks (ANN). Machine-learning approaches, particularly ANN, represent one of the essential parts of modern QSAR, and a detailed description of the corresponding methodologies can be found elsewhere [e.g., in 46].
For this study, we chose the standard back-propagation configuration for the ANN and we used the Stuttgart Neural Network Simulator package [47] to implement the model. For effective training of the network (primarily to avoid overfitting), we used training sets of 91 compounds randomly selected as 90 percent of the available CAMEL-s. Such random sampling has been performed 20 times and, thus, 20 independent QSAR models have been created in order to evaluate the average predictive ability of the method. One of the training sets with 91 CAMEL peptides is presented in Table 2. The remaining 10 polypeptides, featured in Table 3, were used as the corresponding testing group to assess the method's predictive ability. For each polypeptide in the training and testing sets, we have transformed the 20 network input descriptors into normalized values varying from 0 to 1. Similarly, the output parameters from the ANN (mean antibacterial potencies) were normalized to the [0:1] range. To quantify the antibacterial potencies of the CAMEL-s from the training sets, for each of them we built an ANN consisting of 20 input, 8 hidden and 1 output nodes (as indicated in Figure 3).
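The data preparation described above (scaling to the 0 to 1 range and 20 random 91/10 partitions of the 101 peptides) can be sketched as follows. This is an illustration only: the function names, the random seed and the choice of simple min-max scaling are assumptions, and the text does not say whether the scaling limits were taken from the whole dataset or from each training set.

```python
import numpy as np

def min_max_scale(A):
    """Linearly map each column of A onto [0, 1]."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (A - lo) / span

def random_splits(n_items=101, n_train=91, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) index arrays for repeated random splits."""
    rng = np.random.default_rng(seed)  # seed is arbitrary (assumption)
    for _ in range(n_repeats):
        perm = rng.permutation(n_items)
        yield perm[:n_train], perm[n_train:]

splits = list(random_splits())
scaled = min_max_scale([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]])
print(len(splits), len(splits[0][0]), len(splits[0][1]))
print(scaled)
```

Each of the 20 index pairs would then define one training/testing pair for an independent model.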
The potencies predicted for the training set are collected in Table 2 and plotted against the experimental numbers in Figure 4. As can be seen from the data, the 'inductive' descriptors allowed the average antibacterial activity of the 91 CAMEL-s in the presented training set to be reproduced with rather good accuracy. Considering that the examined potencies represent averaged, not well standardized properties, the resulting QSAR predictions can be viewed as very adequate. To investigate the predictive power of the developed ANN-based solution, and to ensure that no overtraining occurred, we examined the network's performance on the testing compounds from Table 3. The normalized patterns of the independent variables for the 10 CAMEL-s not used in the learning phase were passed through the trained ANN. The pre-estimated node-associated weights of the networks were used to compute the theoretical potencies of the validation compounds. The resulting output parameters collected in Table 3 demonstrate that, with the exception of two peptides (CAMEL-s 9 and 140), the predicted CAMEL potencies accurately reproduce the experimental data, thereby validating the QSAR model generated here. It should be noted that similar ANN performance has been observed on all 20 training/testing random pairs of peptide datasets (the results for the compounds from Tables 2 and 3 are actually among the least accurate of those studied).
To assess the predictive ability of the developed approach in the categorical context (which is more appropriate for such non-standardized data with considerable uncertainty), we have also transformed the continuous outputs from the training and testing procedures into a discrete categorical format. Using the previously outlined conditional classification of the studied polypeptides, the performance of the neural network was assessed by comparing the categorical classification of the outputs from the neural network with the categories corresponding to the experimental potencies. When the experimental potency and the predicted potency corresponded to the same activity category, the prediction of activity was considered to be correct.
Based on this simple assessment, the developed QSAR model can be considered 79% accurate for the presented training set (72 out of 91 CAMEL-s were correctly assigned) and 80% accurate for the testing set in Table 3 (8 correct predictions out of 10). The significant deviations of the predicted potencies from the experimental potencies for two validation peptides, CAMEL 9 and CAMEL 140, indicate that the ANN-based approach underestimated the potency of these compounds.
Thus, the computed mean potency of CAMEL9 (2.917) is much lower than the corresponding experimental activity (6.292). This can, perhaps, be attributed to the non-exact character of the potency parameters and the model's discrepancies. Similarly, for CAMEL140, we predicted the potency to be 1.203 rather than 4.136. In this case, the error could be attributed to the very uncharacteristic composition of this peptide. It contains the negatively charged glutamic acid (E) residue, which could not be adequately captured by a neural network trained on a set of peptides not containing this structural feature (only one peptide from the training set has E in the sequence). Despite the occasional misclassification, the developed ANN predictor has not missed by more than one class of antibacterial activity or, in other words, it did not place any peptides with high activity in the class of mild therapeutics and vice versa.
The accuracy of the created QSAR model can, possibly, be further improved by pre-processing the data for the most adequate selection of training and testing sets [48] or by using more powerful machine learning techniques. To summarize the section, it is possible to conclude that the developed QSAR model, operating on the 'inductive' descriptors and utilizing the ANN algorithm, can accurately quantify the antibacterial potency of the studied synthetic cationic polypeptides and can effectively place them into groups of active, moderate and mild anti-infective compounds.
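Passing a normalized descriptor pattern through a trained 20-8-1 network amounts to a single forward pass. The sketch below assumes logistic (sigmoid) activations and bias terms on both layers, which is a common back-propagation setup but is not specified in the text; the weights are random placeholders rather than trained values, and the potency limits passed to `denormalize` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation (assumed; the text does not name the activation)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of a 20-8-1 feed-forward network."""
    hidden = sigmoid(W1 @ x + b1)      # 8 hidden-node activations
    return sigmoid(W2 @ hidden + b2)   # 1 normalized output in (0, 1)

def denormalize(y, y_min, y_max):
    """Map a [0, 1] network output back to the potency scale."""
    return y_min + y * (y_max - y_min)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 20)), np.zeros(8)   # placeholder weights
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
x = rng.uniform(size=20)                         # one normalized input pattern
y = forward(x, W1, b1, W2, b2)
print(y.shape, denormalize(0.5, 1.0, 7.0))
```

With trained weights in place of the placeholders, `denormalize` would convert the network output into a predicted mean potency comparable with the experimental value.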
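The categorical assessment can be written down in a few lines. In this sketch the class labels and cut-offs follow the conditional classification given earlier in the text (above 4, between 2 and 4, below 2); how values lying exactly on a boundary are assigned is an assumption, and the example potencies are hypothetical rather than taken from Tables 2 or 3.

```python
def activity_class(potency):
    """Assign an activity class using the cut-offs stated in the text."""
    if potency > 4:
        return "very active"
    if potency >= 2:          # boundary handling is an assumption
        return "mild"
    return "moderate"

def categorical_accuracy(experimental, predicted):
    """Fraction of compounds whose predicted class matches the experimental one."""
    hits = sum(activity_class(e) == activity_class(p)
               for e, p in zip(experimental, predicted))
    return hits / len(experimental)

# Hypothetical potencies for five peptides.
exp_vals = [1.5, 2.5, 3.0, 4.5, 6.0]
pred_vals = [1.8, 1.9, 3.5, 4.2, 3.9]
toy_accuracy = categorical_accuracy(exp_vals, pred_vals)

# Reported counts: 72 of 91 (training) and 8 of 10 (testing).
train_pct = round(100 * 72 / 91)
test_pct = round(100 * 8 / 10)
print(toy_accuracy, train_pct, test_pct)
```

The last two lines simply reproduce the reported percentages from the stated counts.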

Conclusions and Further Directions
The evolution of bacterial strains into multi-drug resistant organisms progresses at an alarming rate. For this reason, it is crucial to discover novel non-specific antibiotics (such as cationic polypeptides) that are active against a number of different strains of microbes, including the resistant ones. The role of QSAR models for the antimicrobial polypeptides cannot be overestimated, as such predictive solutions can significantly rationalize the selection, design and refinement efforts for these drugs. The developed QSAR approach utilizing the 'inductive' descriptors and based on the Artificial Neural Network algorithm can be used for these purposes and can be further expanded to cover a wider range of cationic peptides active against pathogens.
The approach can also be enhanced by utilizing purely statistical techniques in conjunction with the inductive QSAR descriptors, which would allow the contributions from individual structural factors to the potency of the AMP-s to be interpreted. Despite the fact that the developed ANN-based method does not currently allow us to evaluate exactly the contributions of the individual QSAR descriptors, it is clear that the employed 'inductive' parameters adequately reflect those aspects of intra- and intermolecular interactions which govern the antibacterial activity of the cationic polypeptides.
Hence, the developed methodology can further be applied to other important classes of cationic peptides, such as those active against viruses, fungi or tumours, and can provide excellent computational guidance for discovery of novel and potent therapeutic leads.

Figure 1. Radial shielding on spherical surface of atom A by the neighbouring atom B

where all the contributions ΔN_i are derived within (6)
Total_Charge_Formal*: Sum of charges on all atoms of a molecule (formal charge of a molecule). Sum of all contributions (6)

equation (2) with n = N-1, i.e. each atom j is considered against the rest of the molecule G
Total_Abs_Sigma_mol_i: Sum of absolute values of group inductive parameters σ*(molecule→atom) for all atoms
Largest positive group inductive parameter σ*(molecule→atom) for atoms in a molecule (2)
Most_Neg_Sigma_mol_i: Largest (by absolute value) negative group inductive parameter σ*(molecule→atom) for atoms in a molecule (2)
Most_Pos_Sigma_i_mol: Largest positive atomic inductive parameter σ*(atom→molecule) for atoms in a molecule (5)
Most_Neg_Sigma_i_mol*: Largest negative atomic inductive parameter σ*(atom→molecule) for atoms in a molecule (5)
Sum_Pos_Sigma_mol_i: Sum of all positive group inductive parameters σ*(molecule→atom), where σ* > 0 and n+ is the number of (N-1)-atomic substituents in a molecule with positive inductive effect (electron acceptors)
Sum_Neg_Sigma_mol_i*: Sum of all negative group inductive parameters σ*(molecule→atom) within a molecule, where σ* < 0 and n- is the number of (N-1)-atomic substituents in a molecule with negative inductive effect (electron donors)

Figure 2. Distribution of the averaged values of the 'inductive' indices among 'Very active', 'Moderately active' and 'Mild' CAMEL polypeptides under investigation.

Figure 3. Configuration of the Artificial Neural Network with 20 input, 8 hidden and 1 output nodes used in the study.

Figure 4. Predicted Potencies vs. Experimental Mean Potencies in the Training Set.
where n+ is the number of atoms i with positive partial charge.
*Sum of softnesses of atoms with negative partial charge: Obtained by summing up the contributions from atoms with negative charge computed by (9)
Average_Softness: Arithmetic mean of softnesses of all atoms of a molecule; (11) divided by the number of atoms in the molecule
Average_Pos_Softness: Arithmetic mean of softnesses of atoms with positive partial charge, $\frac{1}{n^{+}}\sum_{i}^{n^{+}} s_i$, where n+ is the number of atoms i with positive partial charge.
where n- is the number of atoms i with negative partial charge.
* - descriptors selected for building the antibiotic peptide QSAR model.
The descriptors are: Average Electronegativities of the Negatively/Positively Charged Atoms, Molecular (equalized) Electronegativity, Total Formal Charge, Average Atomic Hardness, Sum of Atomic Hardnesses, Average Negative/Positive Charges, Largest Positive Charge, Average Atomic Hardnesses of Negatively/Positively Charged Atoms, Hardness of the Most Positively Charged Atom, Largest Hardness among the Negatively/Positively Charged Atoms, Sum of Softnesses of Negatively Charged Atoms, Steric Effect on the Most Negatively Charged Atom, Most Negative Inductive Constant of an Atom in Molecule, Largest Positive Inductive Effect on an Atom in Molecule, The Smallest Steric Effect on an Atom in Molecule, Sum of all Negative Inductive Effects on Atoms in Molecule.

Table 2. Training Set of CAMEL-s in the 90/10 Split: the Mean Experimental Potencies vs. Predicted Potencies Using a Neural Network with Eight Hidden Nodes. The experimental potencies are average potencies against 24 Gram-positive and Gram-negative bacterial strains. The CAMEL-s are sorted according to ascending experimental potencies.

Table 3. Validation (testing) Set of CAMEL-s in the 90/10 Split: Experimental Mean Potencies vs. Predicted Potencies by a Neural Network with Eight Hidden Nodes.