Modeling the LPS Neutralization Activity of Anti-Endotoxins

Bacterial lipopolysaccharides (LPS), also known as endotoxins, are major structural components of the outer membrane of Gram-negative bacteria that serve as a barrier and protective shield between them and their surrounding environment. LPS is considered to be a major virulence factor as it strongly stimulates the secretion of pro-inflammatory cytokines which mediate the host immune response and culminating in septic shock. Quantitative structure-activity relationship studies of the LPS neutralization activities of anti-endotoxins were performed using charge and quantum chemical descriptors. Artificial neural network implementing the back-propagation algorithm was selected for the multivariate analysis. The predicted activities from leave-one-out cross-validation were well correlated with the experimental values as observed from the correlation coefficient and root mean square error of 0.930 and 0.162, respectively. Similarly, the external testing set also yielded good predictivity with correlation coefficient and root mean square error of 0.983 and 0.130. The model holds great potential for the rational design of novel and robust compounds with enhanced neutralization activity.


Introduction
Lipopolysaccharides (LPS) are major structural components of the outer membrane of Gramnegative bacteria. LPS confers a net negative charge to the membrane and thereby serves as a protective shield against antibacterial agents, as well as playing a crucial role in maintaining the integrity of the overall membrane structure. LPS are endotoxins which when present in the systemic circulation lead to the development of septic shock. The endotoxins initiate such events by first binding to CD14/TLR4/MD2 receptor complex on phagocytic cells causing the release of proinflammatory cytokines which consequently increase the releasing of other inflammatory mediators in many cell types (e.g. neutrophils, monocytes, vascular endothelial cell) as well as initiate the neuroendocrine response. Aside from this, LPS can also mediate the activation of plasma protein cascades (e.g. complement system) [1][2][3][4]. Among the mediators produced by activated macrophage, tumor necrosis factor (TNF) is the most important and the earliest to be released. TNF can amplify the response to endotoxins by further activating inflammatory cytokines (e.g. IL-1, IL-6, and IL-8) which consequently leads to the synthesis of other mediators such as arachidonic acid, platelet-activating factor (PAF), nitric oxide and reactive oxygen species [5,6]. Such elevated levels of mediator decrease blood flow to vital organs (e.g. kidney, heart, and brain) by increasing vascular permeability as well as causing microvacular occlusion and damage thereby leading to multiple organ dysfunctions and eventually culminating in death [7].
Among those susceptible to the development of sepsis are children, immunocompromised individuals, and the elderly. In fact, global incidences of septic shock have increased over the past decade as a result of the growing number of immunologically compromised patients. Furthermore, LPS is considered to be one of the leading causes of mortality in intensive care units worldwide [8][9][10]. Therapeutic strategies for increasing the chances of survival include: (i) modulating inflammatory mediator release via the use of anti-cytokine or anti-inflammatory agents, (ii) supporting major organ dysfunction by increasing blood flow and (iii) administrating early antibiotic or anti-endotoxin usage. Therefore, drugs or compounds able to combat LPS toxicity have been of great interest [11]. Potential therapeutic agents such as anti-endotoxin antibodies, short peptides and lipopolyamines, as well as other small molecules, have been found to display promising sequestering effect towards lipid A but the efficiency of those compounds needs to be improved [12][13][14][15][16][17][18].
The toxic compartment of LPS is their structurally conserved glycolipid component called lipid A. Lipid A is a fatty acid chain bound to two phosphorylated glucosamine residues that is linked to the oligosaccharide core and the distal O-antigen polysaccharide chain. The O-polysaccharide chain is made of oligosaccharide repetitive units (O-units) while the oligosaccharide core is made of an inner core (ketodeoxyoctonic acid and heptose) and an outer core. The diversity in the composition and length of the O-antigen vary among different bacterial species.
In order to investigate the molecular parameters of interaction between lipid A and anti-endotoxin agents, Guo et al. utilized molecular modeling analysis for elucidating the underlying mechanism governing such high binding affinity. Their results indicated that a correlation exists between the binding affinity and the electrostatic interaction of ligands with lipid A [19]. In parallel, our previous investigations have revealed that quantitative structure-activity/property relationship (QSAR/QSPR) are useful tools for establishing relationships between molecular structures with the respective activities or properties of a wide array of biological and chemical systems [20][21][22][23][24][25][26]. Elucidations on the feasibility of potential ligands prior to performing actual experimentations are useful on the economic and time-saving view points.
To the best of our knowledge, we present the first development of a quantitative structure-activity relationship model of LPS neutralization activity by anti-endotoxins as modeled by multivariate analysis. Molecular descriptors accounting for charge and electronic properties of anti-endotoxins were used as input variables for calculating the half-maximal effective displacement (ED 50 ).

Structural considerations
It has been reported that electrostatic and hydrophobic forces are necessary for LPS neutralization activity [27][28][29][30]. To eliminate the endotoxin, an agent would need to bind to lipid A. This is achieved by small molecules that are capable of engaging in strong hydrogen bond formation with the cationic and phosphate groups of lipid A. Furthermore, hydrophobic moieties are also necessary in stabilizing and enhancing the affinity of the agent in binding to the endotoxin [31]. Therefore to account for the electrostatic and hydrophobic forces, descriptors derived from RECON and Spartan'04 software packages were used for QSAR model development. Descriptors from RECON, which is based on the TAE methodology, are suitable for this study as it directly accounts for the molecular recognition in terms of the electronic charge properties. Additional descriptors from Spartan'04 were selected to account for the aforementioned interaction forces. Particularly, the hydrophobic interaction was approximated by the water accessible hydrophobic surface area of the molecule (CPK Area ). Molecular descriptors such as total energy (E Total ), atomic charge (Q A ) and total hydrogen atomic charge (Q TH+ ) are important auxiliary variables that could also account for the electrostatic properties of the molecules. It was observed that the combination of both sets of descriptors exerted a dramatic enhancement of predictive power than solely relying on TAE descriptors (data not shown).

Variable reduction
Prior to performing the calculation of LPS neutralization activity, the redundant and multi-collinear descriptors present in the dataset were initially reduced by UFS in order to achieve a better efficiency of prediction. The subset of descriptors left after variable reduction was in the range of 3 and 20. The optimal number of descriptor to use was determined by making a plot of the number of selected descriptors as a function of RMS ( Figure 1). It was observed that the optimal number was 10, which was selected for further investigations.

Parameter optimization
The network architecture was determined by a trial-and-error adjustment of various parameters for the purpose of obtaining an optimal configuration. The empirically determined parameters included the number of nodes in the hidden layer, the learning epoch size, and the learning rate and momentum. Parameter that exhibited the lowest RMS was chosen as optimum. The optimal number of hidden nodes was determined by varying the number of nodes from 1 to 25.
As represented in the plot of RMS versus hidden nodes (Figure 2), the optimal number of nodes was found to be seven. In order to ovoid overtraining of the network, the learning epoch size was subsequently optimized from 1 to 700 in increments of 50 and learning was stopped once a detectable rise in RMS for the leave-one-out cross-validated testing set was observed. The best learning time could be observed by making a plot of the RMS as a function of the learning epoch size (Figure 3). The optimal value was found to be 300. .50 .55 .60 .65 .70 optimal Subsequently, the optimal learning rate and momentum was selected by making a contour plot of the RMS as function of the learning rate and momentum ( Figure 4). The lines in the contour plot represented constant values of RMS, while shaded boxes designated RMS values that were obtained from the learning procedures and fitted onto the same surface model [32]. As shown on the contour plot, the best learning rate and momentum lies in the middle left region of the graph and was found to be 0.1 and 0.4, respectively.

Prediction of LPS neutralization activity using artificial neural network
The optimal configuration of the predictive model of LPS neutralization activity was identified to be 7, 300, 0.1 and 0.4 for the number of hidden nodes, learning epoch size, learning rate and momentum, respectively. The network was created by means of a leave-one-out cross validation [33][34][35], by which one sample of the dataset was withdrawn for use as the test set while the rest served as the training set. This process was repeated iteratively until all samples of the dataset were used as the test set [32,[36][37]. It was observed that the predicted and experimental neutralization activities were moderately correlated as can be observed from the correlation coefficient of 0.781. To further refine the model, identification of potential outliers present inherently in the predictive model was performed using standard statistical analysis where compounds with absolute standardized residuals exceeding the cut-off value of 2 were marked as outliers. This statistical analysis was reiterated to yield a final predictive model which was constructed using previously determined optimal set of network parameters. As indicated in Table 1, results yielded correlation coefficient and root mean square error of 0.980 and 0.124 for the training set, respectively, while 0.930 and 0.162 was observed for the testing set. A plot showing the experimental versus predicted LPS neutralization activity for model 6 is shown in Figure 5.  In order to evaluate the predictive power of the QSAR model discussed herein, an external testing set would be necessary. To achieve this, the data set from model 6 was further divided into two sets of data: (i) one portion for deriving the optimal network parameters by leave-one-out cross-validation and (ii) an external testing set for evaluating the extrapolation capability of the QSAR model. This was carried out by randomly selecting 10% of the data set as an external testing set which equates to 6 data samples while the remaining 52 data samples were used for performing leave-one-out cross-validation. Optimization of the network parameter was performed as previously discussed to give the optimal parameters as follows: 15 nodes in the hidden layer, 150 learning epochs, learning rate of 0.2 and momentum of 0.4. This set of parameter was used for deriving the predictive performance of the external test set. Results indicated that the proposed QSAR model could accurately predict the LPS neutralization activity as observed by the correlation coefficient of 0.983 and root mean square error of 0.130 as shown in Figure 6.

Conclusions
The LPS neutralization activities of anti-endotoxin agents were modeled using back-propagation neural network. The predicted neutralization activities were found to be in good agreement with the experimental values. Analysis of these results suggests that the use of charge-based descriptors and quantum chemical descriptors were useful and necessary in obtaining good prediction that is representative of the experimental values. Therefore, such methodology proposed in this study demonstrates a facile approach for the design of novel anti-endotoxin agents with robust properties.

Data collection
The half-maximal effective displacement (ED 50 ) of 80 anti-endotoxins were collected from the literature [27,31,40] (Table 1). Seven compounds were identified as inactive and were removed due to their high level of ED 50 >200.

Descriptor generation
The two-dimensional structure of each compound was drawn in ChemAxon's MarvinSketch [41] and exported as SMILES notation. The three-dimensional molecular structures were constructed using the molecular building module of Spartan'04 [42] and subsequently submitted for calculation of quantum chemical descriptors. Likewise, the SMILES notation served as input for the calculation of charge-based descriptors by RECON.
RECON (version 5.5) was used for the generation of 248 transferable atom equivalent (TAE) molecular descriptors. The TAE methodology, based on Bader's quantum theory of atoms in molecule, was developed by Brenemen and co-workers for the rapid reconstruction of molecular charge density and molecular electronic property via pre-computed ab initio atomic charge density fragment [42]. TAE descriptors were selected for the prediction of LPS neutralization activity as they could account for the electronic properties of molecules, which is crucial for modeling molecular interaction [25] of the anti-endotoxins with their respective LPS target.
Aside from electronic properties, the involvement of hydrophobic and electrophilic forces has also been demonstrated to play a crucial role in the biological activity of the anti-endotoxins [11,12]. Hence, Spartan'04 was employed for generation of the following quantum chemical descriptors: molecular weight (MW), water accessible hydrophobic surface area of the molecule (CPK Area ), total energy (E Total ), and atomic charge (Q A ). The three-dimensional structures of each anti-endotoxin compound were initially optimized using the Merck Molecular Force Field (MMFF) in conjunction with Monte Carlo simulation or systematic conformational search to identify the lowest energy geometry. These pre-optimized structures were then calculated at the semi-empirical level using the Parameterization Method 3 (PM3) [42].
An additional descriptor, the total hydrogen atomic charge (Q TH+ ) was calculated according to the following equation: where i Z , i q and n represent the atomic numbers, the atomic electron populations and the number of hydrogen atoms, respectively.

Descriptor reduction
The molecular descriptors were subjected to variable reduction in order to reduce computational time, minimize multi-collinearity and eliminate redundancy of the descriptors. This was performed using UFS, version 1.8, which is a computer program based on the Unsupervised Forward Selection (UFS) algorithm [43]. Briefly, UFS removes variables when the standard deviation is less than the predefined sdevmin and terminates when the squared multiple correlation coefficients of the remaining variables exhibit values greater than the r-squared-max ( 2 max R ). The standard deviation was left as default at 0.0005 while the 2 max R was varied between 0 and 0.99. Thus, 248 descriptors were reduced to a range of three to 20 descriptors.

Overview of artificial neural network
Artificial neural network (ANN) is a computational model that mimics the learning process of the human brain. This is performed in a supervised manner where the model attempts to find a relationship between the independent and dependent variables. ANN is comprised of multiple layers of artificial neurons that are interconnected in a feed-forward manner where signals are relayed from the input layer through the hidden layer and finally to the output layer. The connections among the network of neurons are assigned numerical values known as weights which alter the signal flow through the neural network and governing its predictive performance. Particularly, the weights are adjusted adaptively to reduce the error by back-propagating signals from the output layer back to the input layer through the hidden layer. Readers are referred to the excellent book by Zupan and Gasteiger [32] for further methodological information.

Prediction of LPS neutralization activity
The predictive model of LPS neutralization activity was developed using back-propagation neural network as calculated by Weka, version 3.4.3 [44]. The independent variables, comprising of TAE descriptors and quantum chemical descriptors, of each compound were normalized to a range of 0 and 1 according to the following equation: where norm x , i x , min x , max x represent the normalized data, the value of each instance, the minimum value, and the maximum value of the dataset, respectively. The optimal neural network parameters were obtained by an empirical trial-and-error search of the number of nodes in the hidden layer, the learning epoch size, the learning rate (η) and momentum (μ) constant. A learning epoch refers to a complete cycle of data propagated through the layers of the network in a feed-forward manner. η controls the speed of weight adjustment, while μ prevents sudden changes in attaining the solution. Under an incremental tuning of such parameter, root mean square error (RMS) was simultaneously measured as an indicator of predictive error, which is calculated according to the following equation: where i p , i a , and n represent the predicted output, the actual output and the number of compounds in the dataset, respectively. As each neural network calculation operates via random initialization of the weights, the averaged RMS of ten runs was used as a measure of predictive error by adjusting the random seed from 0 to 9.

Internal validation procedure
Generation of training and testing sets was made using leave-one-out cross-validation (LOO-CV). Briefly, LOO-CV involved the leaving out of one molecule as the testing set and using the remaining as training set. This is performed iteratively until all samples were given the chance to be left out as testing sets.

Statistical analysis
The adjusted R 2 takes into consideration such information as the number of independent variables that are present in the predictive model. This is calculated according to the following equation: where sin ij x represents the standardized residual, ij x represents the residual of each sample, j x represents the mean of the residual, and N represents the sample size of the data set. The cut-off value for the absolute standardized residual was set to 2.