Predicting Complexation Thermodynamic Parameters of β-Cyclodextrin with Chiral Guests by Using Swarm Intelligence and Support Vector Machines

The Particle Swarm Optimization (PSO) and Support Vector Machines (SVMs) approaches are used for predicting the thermodynamic parameters for the 1:1 inclusion complexation of chiral guests with β-cyclodextrin. A PSO is adopted for descriptor selection in the quantitative structure-property relationships (QSPR) of a dataset of 74 chiral guests due to its simplicity, speed, and consistency. The modified PSO is then combined with SVMs for its good approximating properties, to generate a QSPR model with the selected features. Linear, polynomial, and Gaussian radial basis functions are used as kernels in SVMs. All models have demonstrated an impressive performance with R2 higher than 0.8.


Introduction
-cyclodextrin (-CD) is a cyclic oligosaccharide that naturally contains seven glucose residues linked by (1-4)-glycosidic bonds, with a hydrophilic outer surface and a relative hydrophobic central cavity, which can form complexes with appropriate guest molecules. It has received increasing attention in the pharmaceutical field for modifying drug physicochemical properties, such as solubility, stability and bio-availability, reducing their toxicity and side effects, and suppressing unpleasant taste or smell [1,2].
The high interest in the stability constants of CD-host complexes has initiated the search for proper models for predicting these association constants or the related free energies of complexation. The aim is not only to select convenient CDs for the complexation of a particular compound, but also to get some insight into the physico-chemical parameters influencing the affinity between host and guest molecules. The availability of a large amount of experimental data led to several interesting predictive models. The inclusion reactions of a series of benzene derivatives were used for a correlation model [3]. Diverse experimental information was used to develop a prediction model for the free energy of complexation using several molecular descriptors [4][5][6][7]. A CoMFA approach was applied to the binding constants of some organic compounds and various CDs [8]. An analysis of the complexation of 30 carboxylic acids and their anions was performed using a two-parameter correlation model [9]. Artificial neural networks were also used to correlate some molecular descriptors with complexation constants [10,11]. The free energies of a larger dataset of compounds complexed with all natural CDs were considered using a molecular-size based model [12,13]. Energies obtained from molecular docking of the guest molecule to the CDs' cavities were used for systematic investigations of the interaction of organic substances with CDs [14][15][16]. A model correlating physico-chemical parameters directly with the solubility of the complexes was also presented [17]. A newer improved empirical model has been published just recently [18]. Most of these investigations considered the "natural" CDs, and generally, models were obtained with sufficient predictive power. The molecular descriptors are substantially different for various CDs, indicating differences in reaction mechanisms, or as a consequence of the varying flexibility of the ring system. Not much has been done up to now for the prediction of interaction energies with modified CDs, or considering different CD derivatives.
In this study, two novel approaches, Binary Particle Swarm Optimization (BPSO) [19] and Support Vector Machines (SVMs) [20] are used to predict the thermodynamic parameters for the 1:1 inclusion complexation of enantiomeric pairs of chiral guests with -CD. SVMs represent a relatively new type of learning machine. They were designed to minimize the structural risk by minimizing an upper bound of the generalization error rather than the training error. Therefore, the over-fitting problem in machine learning is solved successfully. Another outstanding property of SVMs is that a solution obtained is always unique and globally optimal. In this paper, SVMs are investigated for a quantitative structure-property relationship (QSPR) study, to reduce the complexity of QSPR modeling and utilize the attractive properties of: not requiring the gradient information, consistent results, and fast convergence. The PSO was used to choose a set of important features.
Several factors such as number of atoms, van der Waals surface area, ionization potential, molecular weight, molar refractivity, atomic connectivity index, molecular flexibility, and angle bend energy, etc., influence thermodynamic properties. Only some of these factors strongly affect these thermodynamic properties and are controlled or set up in advance. The selection of these parameters is traditionally conducted by multiple linear regressions, partial least squares, and principle component analysis methods. Consequently, their assumptions must be verified and validated before the developed model can accurately be used. This results in a predictive model which may compromise the quality of the obtained products and/or efficiency of the modeling process.
With the increasing need for more accurate and practical evaluation QSPR models, techniques in artificial intelligence, particularly Artificial Neural Networks (ANNs), are receiving more attention in industry and academia today because they can be used to learn relationships between thermodynamic properties and their parameters. However, a number of parameters such as network topology, learning rate, and training methods have to be fine-tuned before they are deployed successfully. Furthermore, drawbacks like local optima, overfitting, and long learning time tend to occur.
Theoretically, the aforementioned shortcomings of ANNs have been countered by the development of Support Vector Machines. Unlike ANNs which minimize empirical risk, SVMs are designed to minimize the structural risk, by minimizing an upper bound of the generalization error, rather than the training error. Therefore, the overfitting problem in machine learning is solved successfully. Another outstanding property of SVMs is that the task of training SVMs is mapped to a uniquely solvable linearly constrained quadratic programming problem. This produces a solution that is always unique and globally optimal. They have been extended to solve regression problems as well.
In this paper, Support Vector Regression (SVR), which is based on Support Vector Machines, is investigated as an alternative technique for QSPR prediction. It has shown very good results for function approximation of Quantitative Structure-Activity Relationships (QSAR) [21]. The SVR retains much of the elegance of the SVMs, such as good generalization and global optimal properties, and has no normal distribution assumption requirement. The linear approximation is a fundamental concept of SVR. Its extension to a nonlinear case is achieved by using the mechanism of inner-product kernel to avoid the problem of dimensionality. To speed up its regression, the use of a proper kernel is calculated in advance. Even though this kernel computation requires large memory space, various problem optimizations have already been proposed [22,23].
Since SVM can build a very reliable QSPR model based on the training data, it is incorporated in our feature selection process. The Pearson correlation coefficient (R) is then treated as the objective function for a formulated optimization problem. Our previous paper [21] presented a similar approach by attempting to optimize R for predictive QSPR model building with a combined Feed-Forward Neural Network and a Particle Swarm Optimization. The PSO also showed good performance and was suitable for use with the found model, where no explicit relation between inputs and outputs was available. With attractive properties of no requirements for gradient information, consistent results, fast convergence, and successful applications in [24][25][26][27][28], PSO is then selected as an optimizer in this work.
Therefore, the purpose of this study was to develop a procedure that can determine key features for predicting complexation thermodynamic parameters of -CD complexes with enantiomeric pairs of a chiral guest.

Methodology
The proposed methodology consists of two parts: feature selection and QSPR modeling. First, a machine learning technique called Support Vector Machines (SVMs) is used to capture characteristics of QSPR and their factors, because of the SVMs' superior properties of generalization and global optima. They are next incorporated in an optimization problem so that a relatively new, effective, and efficient optimization algorithm, Particle Swarm Optimization (PSO), is applied to find key parameters. The cooperation between both techniques can produce a very good predictive QSPR model.

Support Vector Machines Based QSPR Model
SVMs represent a relatively new type of learning machine. They are an approximate implementation of the method of structural risk minimization, which attempts to minimize the generalization error, which occurs when the machines are tested with unseen data. The generalization error rate is bounded by the sum of a pair of competing terms, the training error rate and the confidence interval, which depends on the Vapnik-Chervonenkis (VC) dimension. Hence, the VC dimension and the training error (empirical risk) are both minimized at the same time. To realize this in SVMs, a structure is imposed on the set of hyperplanes, by trying to obtain the weight vector w having the minimum Euclidean norm. Coupled with dual transformations, the optimization model yields a global optimum. These key properties really separate the SVM from other learning machine algorithms.
In regression problems, the problem of approximating the following set of data , and .,. represents dot product, is taken into consideration. The x i is the set of descriptors, and y i is the output, which is the thermodynamic value. The -insensitive loss function proposed by Vapnik [20] is commonly incorporated with SVMs (-SVR) to create sparseness in the support vectors and to embed the robustness of the Huber's loss function. This means that f(x) is allowed to vary at most  deviation from the target, and is as flat as possible, simultaneously. If the deviations are larger than the  specified, this implies a bad fit and this function is proportionally penalized with constant C. This constant C determines the tradeoff between the training errors and model complexity. The flatness test of f(x) is accomplished by searching for the smallest w. Hence, a formulation of -SVR is described by: min: Everything above  is captured in slack variables  i and everything below - is captured in slack variables  i  . This -insensitive loss function,   is defined as: (2).
Using the Lagrangian multipliers and the Karush-Kuhn-Tucker (KKT) conditions, the following dual problem is obtained: subject to: Transforming into dual form yields a quadratic programming problem with linear constraints and a positive definite Hessian matrix. This leads to a global optimum. A nonlinear form is usually required to adequately model data. Hence, a nonlinear mapping,  , is used to map data from an input space into a higher dimensional intermediate space, where linear regression is performed. Consequently, complications result in the complexity of  and the problem of dimensionality in Equation (3). To alleviate these difficulties, the inner-product kernel is then introduced as follows: The dimensionality of the intermediate space is thus hidden from the remaining computations. Some of the most widely used kernels, such as linear, polynomial, and Gaussian radial basis functions (RBF), were tested in this study. The kernel function is employed in the optimization models above by replacing .,. with   .,. K . This adds the capability to approximate both linear and nonlinear functions.
In summary, the main advantages of SVR are implicit mapping by using kernels in handling nonlinear data, convexity of quadratic optimization, and generalization properties. In addition, distribution of the data is not necessarily assumed in advance, which makes it very promising for realworld problems.

Particle Swarm Optimization
PSO was introduced by Kennedy and Eberhart [19] to imitate social behavior of animals such as birds flocking in searching for food. Each particle flies in hyperspace searching for the best solution by adjusting position and velocity based on its own flying experience (pbest) and its companions' experience (gbest). The inertia weight w was later introduced to improve the PSO optimizer. It is very attractive because the requirements of gradient information are not needed. Hence, it is unaffected by discontinuities of the objective function. The equations used consist of flexible and well-balanced mechanisms to enhance the global and local exploration abilities. These allow a thorough search and simultaneously avoid premature convergence. In addition, PSO uses probabilistic rules for a particle's movements. Therefore, it is quite robust for local optima. The standard PSO consists of the following steps [19,29] 6. Loop to step 2 until a stopping criterion, a sufficiently good evaluation function value, or a maximum number of iterations, is met. In feature selection, the input presented to the regression modeling is in the form of a table where the rows represent chemical compounds and the columns are the molecular descriptors. Each compound contains a value for each corresponding factor. How accurately a QSPR model can predict the biological activity of the compounds depends on their values in a subset of the selected features. Hence, the selection of each column or feature is treated as a binary number. A numerical value of zero is used to represent that the corresponding descriptor is not selected for QSPR modeling. Otherwise, a numerical value of one is assigned. This binary problem calls for some modification of the original PSO. Thus presentx[i][d], which represents the value stored by the i th particle in the d th dimension, can only take on a binary value, instead of a real valued number. This indicates whether the d th feature is selected or not. Note that the D dimensions above are equal to the total number of descriptors. After the update step (Equation (5)), presentx[i][d] is discretized to a binary value by using probabilistic selection or roulette wheel selection. The fractional values of presentx[i][d] are treated as probability thresholds to determine subset membership. Each dimension or feature of the particle is assigned a slice of a roulette wheel whose size is proportional to presentx[i] [d]. The subset is assembled by spinning the wheel and selecting the features to which the wheel's marker points. This process is repeated k times, which are the predefined number of selected features. The chosen descriptors are then set to 1 and the remaining parameters are set to 0. The actual probabilities, p id , are computed as follows: where x id is the fractional coordinates presentx[i][d] after the update step (Equation (5)), and a is a scaling factor or selection pressure and is set to 2. This binary PSO (BPSO) still presents the same advantages as the original PSO. The near-optimal solutions are found much faster, compared with the performance of a random search or an exhaustive search. This allows BPSO to perform feature selection efficiently in datasets with large numbers of descriptors. The objective function evaluated by the BPSO is the Pearson correlation coefficient that measures the quality of QSPR model with the selected features:  (7) where N is the number of training compounds for regression and i y and i ŷ are the measured and the predicted activities of the i th compound, respectively.

Chiral Guest Dataset and Descriptor Generation
The complex stability constant (ln K), the standard free energy (ΔG°), the enthalpy (ΔH°) and the entropy change (TΔS°) for the 1:1 inclusion complexation of enantiomer pairs of 74 selected chiral compounds with -CD were taken from the experiments of Rekharsky and Inoue [30]. The values are given in Table 1. The guest structures were constructed by using the HyperChem program and fully geometrically optimized at the HF/3-21G level by the Gaussian03 program [31]. Two hundred structural properties were calculated by the Molecular Operating Environment (MOE) program package [32]. Descriptors are categorized by class: 2D descriptors (2D) are calculated from purely atomic and connectivity properties, internal 3D descriptors (i3D) use 3D coordinate information about each molecule, and external 3D descriptors (x3D) use absolute 3D atomic coordinate information, but also require an absolute frame of reference (such as, the molecules docked into the same receptor). The chiral guest dataset consisted of 56 compounds for training the models and 18 compounds for testing the quality of models. Table 1. Experimental Thermodynamic parameters: ln K (M -1 ), ΔG° (kJ mol -1 ), ΔH° (kJ mol -1 ) and TΔS° (kJ mol -1 ) of 74 chiral compounds in 1:1 inclusion complexation with -CD taken from Ref. [30] and the values from the best prediction QSPR models with four features.

QSPR Models
The PSO was adopted for major descriptor selection in QSPR of the chiral guest dataset. Swarm parameters are 50 particles and 100 iterations. The iterative PSO attempts to select the key features that maximize the Pearson correlation coefficient (R), resulting in a QSPR model developed by SVMs. The linear approximation is a fundamental concept of SVMs. Some of the most widely used kernels, such as linear, polynomial, and Gaussian radial basis functions (RBF) were tested in this study. This adds the capability to approximate both linear and nonlinear functions. Both PSO and SVMs were implemented in MATLAB 7.0.4 running on a Pentium IV (2.4 GHz) computer. The correlation coefficient for all PSO-SVM models are the average values from 10 calculations. Table 2 shows that the PSO-SVMs with three different kernels give very good results. All models have demonstrated an impressive performance with R 2 for training set (R 2 Training ) higher than 0.8. The nonlinear kernels give better results than the linear function for the training chiral guest dataset. The polynomial kernel, with R 2 Training between 0.9991 and 0.9994, has better calibration correlation coefficients than the Gaussian RBF kernel, whereas the Gaussian RBF gives much better predictions than those obtained with the polynomial SVM. These agree with our previous work [33], in which K and ΔH° of the given enantiomer pair dataset were predicted best with a Gaussian RBF kernel. The numbers of descriptors in the QSPR models of these thermodynamic properties are further investigated by using the Gaussian RBF kernel, which gives the best outcome for the chiral guest dataset. The statistics for all PSO-SVM models are given in Table 3. Models with four features also provide satisfactory predictive abilities when compared to the eight feature models. The best prediction performances of the individual models are presented in Table 4. Table 5 presents the selected descriptors in the best prediction QSPR models. All models were developed with four 2D descriptors, which use the atoms and connection information of the molecule for the calculation. This illustrates that the molecular size and shape factors are important for the thermodynamic properties of 74 chiral compounds in 1:1 inclusion complex with -CD. The plots of the best QSPR models with 4 features for ln K, ΔG°, ΔH° and TΔS° by PSO-SVM integration against the experimental values are shown in Figure 1.    Even though PSO-SVM methods are not able to explain the values of descriptors in the models, the maximum outlier from the QSPR models can point out the error of the experimental data. In ΔH° and TΔS° predictions by four features, PSO-SVM models indicate that cmp. 32 is the maximum outlier. Considering the values in Table 1, experimental results show that ΔH° of cmp. 31 (1R, 3S-camphoric acid) and cmp. 32 (1S, 3R-camphoric acid) are -15.5 and -8.3 kJ mol -1 , and TΔS° are -8.3 and 0.4 kJ mol -1 , respectively. The experimental results have different values for this enantiomeric pair, whereas the PSO-SVM models have identical results: ΔH° = -13.02 kJ mol -1 and TΔS° = -6.63 kJ mol -1 .

Conclusions
This work demonstrated that the combination of PSO and SVMs can be applied to effectively and efficiently select major features in QSPR modeling of the thermodynamic parameters of 1:1 inclusion complexation of enantiomeric pairs of chiral guests with -CD. This responds to the needs of drug designers for prediction of the thermodynamic parameters of new compounds in complexation with -CD. The method was based on a discrete binary modification of PSO. The fitness function was the Pearson correlation which was curve fitted by SVMs. The modified PSO appeared to be an effective and efficient algorithm, which robustly finds near-optimal and consistent results with short computer code and simple mathematical operators, while converging rather quickly. The SVMs showed excellent performance in predicting ln K, ΔG°, ΔH° and TΔS°, by considering major selected features. The combination of the adopted methods showed satisfactory results with the large dataset.