Classification of 5-HT1A Receptor Ligands on the Basis of Their Binding Affinities by Using PSO-Adaboost-SVM

In the present work, the support vector machine (SVM) and Adaboost-SVM have been used to develop a classification model as a potential screening mechanism for a novel series of 5-HT1A selective ligands. Each compound is represented by calculated structural descriptors that encode topological features. Particle swarm optimization (PSO) and stepwise multiple linear regression (Stepwise-MLR) have been used to search the descriptor space and select the descriptors responsible for the inhibitory activity of these compounds. The seven-descriptor model built with Adaboost-SVM has shown better predictive capability than the other models. The total accuracy in prediction for the training and test sets is 100.0% and 95.0% for PSO-Adaboost-SVM, 99.1% and 92.5% for PSO-SVM, 99.1% and 82.5% for Stepwise-MLR-Adaboost-SVM, and 99.1% and 77.5% for Stepwise-MLR-SVM, respectively. The results indicate that Adaboost-SVM can be a useful modeling tool for QSAR studies.


Introduction
Serotonin (5-HT) is an important neurotransmitter that mediates various physiological and pathological processes in the peripheral and central nervous systems by interacting with several different receptors [1]. To date, 14 serotonin receptor subtypes (5-HTRs) in seven subfamilies (5-HT1-7) have been identified on the basis of molecular cloning, amino acid sequence, pharmacology, and signal transduction [2]. Among the 5-HTRs, the 5-HT1A subtype is a high-affinity serotonin receptor expressed on both neurons and non-neuronal cells throughout the brain and spinal cord [3]. In both rodents and humans, the 5-HT1A receptor [4] is highly concentrated in cortical and limbic brain areas associated with memory functions. Because it is located both pre- and postsynaptically, drugs acting at the 5-HT1A receptor can, depending on the dosage, inhibit or enhance 5-HT1A receptor function, resulting in varying modulatory effects on key neurotransmitters (glutamate, GABA, and acetylcholine (ACh)) involved in cognition [4]. For example, the 5-HT1A receptor partial agonist S15535 facilitated memory function in a number of behavioral models [5], because its actions involved stimulation of the somatodendritic 5-HT1A autoreceptors and blockade of postsynaptic 5-HT1A receptors in the frontal cortex and hippocampus. Furthermore, 5-HT1A receptor antagonists such as WAY-100635 can, in a dose-dependent manner, improve memory function, because they increase the basal (non-stimulated) ACh release in the cortical and hippocampal areas of the rat brain [5]. Recently, Bert et al. [6] reported that increasing the number of 5-HT1A receptors in the cortex and hippocampus does not induce mnemonic deficits in mice. The dosage of a drug can affect its performance: at higher doses, for example, the full 5-HT1A receptor agonist 8-OH-DPAT was found to impair learning, most likely due to activation of postsynaptic sites [6].
However, low doses of 8-OH-DPAT can improve learning and memory, because they reduce 5-HT release in the projection areas of the raphe nuclei [4] and hippocampal 5-HT release in anaesthetized rats [7]. Moreover, studies of different 5-HT1A receptor antagonists in the rat have reported facilitation [8], impairment [9], or no effect [10] on cognitive performance in various tasks. These contradictory findings may be explained by differences in behavioral procedures, routes of administration, and affinities for pre- and postsynaptic 5-HT1A receptors, as well as by the lack of receptor specificity of the 5-HT1A receptor antagonists used in the different studies. Notably, enhanced brain 5-HT activity improves memory in animals and humans, whereas decreasing brain 5-HT levels by acute 5-HT depletion has been shown to impair it [11]. Hence, strategies discriminating the above sources of serotonergic tone will also contribute to the in vivo assessment of inverse agonist [12], agonist, or antagonist effects, with 5-HT1A receptors being a good candidate considering their tone as well as their pre- and postsynaptic localization. Several potent 5-HT1A ligands belong to different chemical classes, such as arylpiperazine compounds [13], 4-halo-6-[2-(4-arylpiperazin-1-yl)ethyl]-1H-benzimidazoles [14], piperazine-pyridazinone derivatives [15], [[(arylpiperazinyl)alkyl]thio]thieno[2,3-d]pyrimidinone derivatives [16], 3-[(4-aryl)piperazin-1-yl]-1-arylpropane derivatives [17], 4-[2-(3-methoxyphenyl)ethyl]-1-(2-methoxyphenyl)piperazine [18], and arylpiperazinylalkylthiobenzimidazole, benzothiazole, or benzoxazole derivatives [1].
In previous studies, many groups were mainly interested in synthesizing novel compounds as selective 5-HT1A serotonin receptor ligands, but time and cost considerations make it infeasible to carry out binding bioassays on every molecule. Alternatively, an untested molecule can be evaluated using the information from bioassays already obtained, by building quantitative structure-activity relationship (QSAR) models. QSAR modeling seeks to discover and use mathematical relationships between chemical structure and biological activity. The approach does not depend on experimental data for the novel compounds; it needs only their molecular descriptors, which can be calculated from the molecular structure alone. Once the structure of a compound is known, any molecular descriptor can be calculated, whether or not the compound has been synthesized. Once a model is established, we can use it to predict the properties of compounds and to see which structural factors influence those properties. To establish a QSAR model, we should use appropriate molecular descriptors and select suitable modeling methods, which may be linear or nonlinear, such as LDA (Linear Discriminant Analysis) [13] and the Spectral-SAR algorithm [19,20].
In the past several years, many authors have studied 5-HT1A receptor ligands using different QSAR models. Chilmonczyk et al. proposed a 3D QSAR model for various classes of 5-HT receptor ligands by applying the molecular electrostatic potential [21]. Borosy et al. carried out a 3D QSAR analysis of a novel set of pyridazinothiazepines and pyridazinooxazepines with moderate-to-high affinity for 5-HT1A receptors, in which the model identified by DISCO (DIStance Comparison) served as a suitable mode of superposition for subsequent comparative molecular field analysis [22]. Later, Menziani et al. designed a QSAR model in which theoretical descriptors derived by means of the program CODESSA, together with ad hoc defined size and shape descriptors, were employed to decipher, on a quantitative basis, the molecular features responsible for affinity and selectivity in a series of potent N4-substituted arylpiperazine antagonists acting at postsynaptic 5-HT1A receptors and displaying a wide range of selectivity towards the α1-adrenoceptors [23]. Guccione et al. reported the 5-HT1A and α1-adrenergic receptor (α1-AR) binding properties of a series of 23 thienopyrimidinones using HASL 3D-QSAR models. The multiconformer 3D-QSAR model was shown to yield robust cross-validated models for the 23 thienopyrimidinones, which were more predictive than models based on single conformers. Furthermore, the model avoids the alignment problems typical of 3D-QSAR analyses, and it represents an advance over other alignment-based methods in avoiding artifactual edge effects and providing smooth interaction contours amenable to direct interpretation [24]. Recently, several groups have used different QSAR models to predict the 5-HT1A receptor binding affinity of series of structurally diverse compounds [13,25,26]. Because most QSAR models are limited in their applicability to compounds that share a common template structure, accurate prediction of the binding affinity of novel 5-HT1A receptor ligands remains difficult. However, classification of the binding affinity of diverse compounds may, to some extent, be feasible.
In this work, a new QSAR model for the binary classification of 153 5-HT1A selective ligands has been developed with Adaboost-SVM. The variables were calculated with the E-Dragon 1.0 software, and the descriptors were pre-selected by particle swarm optimization (PSO) and stepwise multiple linear regression (Stepwise-MLR).

Variable selection and model building
In the present work, stepwise multiple linear regression (Stepwise-MLR) and particle swarm optimization (PSO) have been used to select the most relevant descriptors from the pool of 77 topological descriptors, based on the compounds in the training set.

Variable selection by stepwise-MLR
Stepwise-MLR is a popular technique that has been used on the training data set to select the most appropriate descriptors [27]. This method has been used for variable selection or model development in different systems [28-30]. We select significant descriptors using the Stepwise-MLR procedure.
A pool of 119 topological descriptors has been calculated with the software E-Dragon 1.0 for each compound in the training set. The 42 topological descriptors with constant or near-constant values within each group are discarded, leaving 77 descriptors. The eight most significant descriptors in the training set selected by Stepwise-MLR are MSD, MAXDN, MAXDP, PW4, PW5, PJI2, BAC, and ICR, whose definitions are given in Table 1. These eight variables make up the variable subset of both the Stepwise-MLR-SVM and Stepwise-MLR-Adaboost-SVM models. The predicted results for the training set and test set are listed in Table 2. The samples misclassified by SVM and Adaboost-SVM (marked by superscript '**') are also listed.
The samples misclassified by both SVM and Adaboost-SVM are a9, a10, e8, m8, n5 and n6. For the training set, when SVM alone was used as the base classifier, the total accuracy was 0.991 (Table 3). When the AdaBoost algorithm was used to boost the SVM classifier, the total accuracy after 100 iterations was also 0.991 (Table 3). For the test set, the total accuracy was 0.775 for SVM alone and 0.825 for AdaBoost-SVM (Table 3). Compared with the SVM algorithm alone, Stepwise-MLR-Adaboost-SVM therefore gave better predictive accuracy: its test-set accuracy is higher by 5.0 percentage points. The result implies that the AdaBoost algorithm can boost the SVM classifier into a stronger classifier.

Variable selection by PSO
PSO is a population-based optimization tool. The system is initialized with a population of random solutions and searches for optima by updating generations. Unlike genetic algorithms (GA), PSO has no evolution operators, such as cross-over and mutation. Compared to GA, the advantages of PSO are that PSO is easy to implement and there are few parameters to adjust. In this paper, PSO has been used to find the optimized combinations of variables, from which one can extract the most related variables that capture maximally the information of the original variable blocks to establish the classification model.
The PSO algorithm maintains a population of 10 particles with $c_1 = c_2 = 1.8$, $V_{max} = 0.2$, $X_{max} = 1.0$, and $X_{min} = 0.0$; the inertia weight w was linearly decreased from 1.0 to 0.2 over the 100 iterations. PSO was used to select the most significant descriptors from the pool of 77 topological descriptors, based on the compounds in the training set. The descriptors selected in this way were used to construct models with the SVM and Adaboost-SVM techniques, denoted PSO-SVM and PSO-Adaboost-SVM. The selected variables are shown in Table 3 and their meaning in Table 2.
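The linearly decreasing inertia weight and the velocity limit quoted above can be illustrated with a minimal Python sketch. The exact schedule used by the authors is not given, so the assumption here is that w equals 1.0 at the first iteration and 0.2 at the last; the function names `inertia_weight` and `clamp` are hypothetical.

```python
def inertia_weight(t, n_iter=100, w_start=1.0, w_end=0.2):
    """Inertia weight at iteration t (0-based), interpolated linearly.

    Assumption: w_start is used at t = 0 and w_end at t = n_iter - 1.
    """
    if n_iter < 2:
        return w_end
    return w_start + (w_end - w_start) * t / (n_iter - 1)


def clamp(value, low, high):
    """Restrict a velocity or position component to the interval [low, high]."""
    return max(low, min(high, value))


# With V_max = 0.2, a proposed velocity component of 0.35 is cut back to 0.2.
limited_velocity = clamp(0.35, -0.2, 0.2)
```

A schedule of this kind favors wide exploration in early iterations and local refinement in later ones.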
These seven variables make up the variable subset of both the PSO-SVM and PSO-Adaboost-SVM models. The predicted results for the training set and test set are listed in Table 1. The samples misclassified by both SVM and Adaboost-SVM are m8 and n4.
When PSO was used as the variable selection method and SVM as the base classifier, the total accuracy was 0.991 for the training set and 0.925 for the test set (Table 3).
When PSO was used as the variable selection method, the AdaBoost algorithm was also used to boost the SVM classifier. In the SVM, the kernel function was the Gaussian radial basis function (RBF), chosen for its good generalization and small number of parameters. The RBF is defined as

$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right),$$

where γ is the width of the RBF. The optimized parameters of the SVM are the RBF width γ = 0.0001 and the capacity parameter C = 5000. The best total accuracy for the training and test sets was 1.000 and 0.950, respectively (Table 3). Relative to PSO-SVM, the total accuracy for the training and test sets was raised by 0.9 and 2.5 percentage points, respectively. Comparing the results of the two variable selection methods, PSO is the more suitable tool for optimizing the classification results of both the SVM and AdaBoost-SVM algorithms.

Comparison with different models
This model was selected as the best one obtained with the topological descriptors because it gives the best overall classification of the training set and test set among the models built with this family of descriptors. The classification results for the training set and test set are shown in Table 3.
As can be seen from the table, the predicted accuracy for high 5-HT1A affinity compounds, the predicted accuracy for low 5-HT1A affinity compounds, and the total accuracy for the training and test sets are higher than 0.82, 0.66 and 0.77, respectively. Therefore, all the models in the comparison give good predictions. The comparison of the four methods in Table 3 shows that PSO-AdaBoost-SVM performs better than Stepwise-MLR-SVM, Stepwise-MLR-AdaBoost-SVM and PSO-SVM. This implies that extracting the most relevant variables, those that capture as much as possible of the information in the original variable blocks, is very important for raising the predictive accuracy on the test set, and that the AdaBoost algorithm boosts the SVM classifier. In Table 3, we also observe an interesting phenomenon: the predicted accuracy for the low 5-HT1A affinity group is lower than that for the high 5-HT1A affinity group. The reasons for this are not clear; some of the compounds misclassified by every method may need further experimental testing.
In the PSO-AdaBoost-SVM model, the percentages of false negatives and false positives in the test set are 3.6% (1/28) and 8.3% (1/12), respectively. False positives are compounds without 5-HT1A binding affinity that are classified as active, and false negatives are compounds with 5-HT1A binding affinity that are classified as inactive (see Table 3). From a practical point of view, it is considered more important in developing a classification model to avoid false negatives, because those compounds will be rejected on the basis of their wrongly predicted property; they will never be evaluated experimentally, and their true 5-HT1A binding affinities will never be discovered. In contrast, false positives will eventually be detected.

Interpretation of the variables in the best model
The molecular structure can be represented using many theoretical descriptors from the literature, but we usually face the problem of selecting those that are most representative of the property under consideration. Topological indices have been widely used in correlating the physicochemical properties of organic compounds. Also known as graph-theoretical indices, they are descriptors that characterize molecular graphs and contain a large amount of information about the molecule, including the numbers of hydrogen and non-hydrogen atoms bonded to each non-hydrogen atom, the details of the electronic structure of each atom, and the molecular structural features.
Molecular branching and molecular cyclic structure are the two most visible structural elements that vary widely among molecules. Randić [31] constructed a new matrix, D/DD, which has finer discriminating power, particularly when one is interested in local molecular features such as atomic, bond, or ring descriptors. At the same time, the D/DD matrix is more sensitive to the immediate and global environment of vertices or larger graph fragments. Hence, the distance/detour index (D/D) descriptor can discriminate 5-HT1A selective ligands well.
The Balaban centric index (BAC) [32,33] is defined such that the sum of the BF vector gives the centric index, where $I_j$ is the intrinsic state of atom j, $V_j$ is the vertex degree of atom j, and $D_{ij}$ is the distance between atoms i and j. The BF method is based on both topological and electronic information and can be applied to molecules containing polycyclic structures, multiple bonds, and heteroatoms. When molecular skeletons are organized differently with respect to the graph center, molecular centricity becomes important [34]. The graph center and related parameters are useful for coding molecular structure as well as for QSAR modeling. Thus, 5-HT1A selective ligands can be distinguished well using the BAC descriptor.
Molecular electrotopological variation (DELS) [35] is simply the sum over all atoms of the intrinsic state differences and can be regarded as a measure of total charge transfer in the molecule. DELS is defined as follows:

$$\mathrm{DELS} = \sum_i \lvert \Delta I_i \rvert, \qquad \Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{(d_{ij} + 1)^2},$$

where I is the atomic intrinsic state and $d_{ij}$ denotes the topological distance between the two atoms considered. The intrinsic state of an atom is calculated as the ratio between the Kier-Hall atomic electronegativity and the vertex degree, i.e. the number of bonds of the atom, and so encodes information related to both the partial charges of atoms and their topological position relative to the whole molecule. Mean distance degree deviation (MDDD) [37] is defined on a graph G, where G is a finite connected graph without loops or multiple edges; V(G) is the set of vertices of G, with cardinality |V(G)|; W(G) is the Wiener index of G, in which the distance d(u,v) between vertices u and v is the length of a simple path that joins u and v and contains the minimal number of edges; and the largest distance from atom i to any other atom is called the atom eccentricity (ECC) of atom i. In addition, S1K [38] is the Kier-Hall α-modified shape index, which is a measure of the relative cyclicity of a compound. A decrease in its value indicates an increase in cyclicity, with multi-cyclic compounds having lower values than monocyclic ones. TI1 is the first Mohar index, which is also an important topological index and has been successfully applied to construct QSAR models [39]. For these reasons, the seven selected topological descriptors can capture well the different structural information of 5-HT1A selective ligands. Hence, our PSO-Adaboost-SVM model based on topological descriptors should be useful for classifying and predicting the inhibitory activities of newly synthesized 5-HT1A selective ligand derivatives.

Comparison with other approaches
As explained above, one of the objectives of the current work was to compare the reliability and applicability of the topological descriptors for describing the property under study with those of other descriptor families. Consequently, we have developed twelve other models using the same data set that was used for the topological-descriptor PSO-AdaBoost-SVM model. The results obtained with the Constitutional, Information indices, Randic molecular profiles, RDF, WHIM, Topological, 2D autocorrelation indices, Burden eigenvalues, Eigenvalue-based indices, Geometrical, 3D-MoRSE, and GETAWAY descriptors [40] are given in Table 4. These descriptors were calculated with the software E-Dragon 1.0. The comparisons are based on the classification results and the predictive capability of the generated models.
As can be seen from Table 4, the TA of the test set is lower than 90.1% for all approaches except the topological descriptors, which have a TA of 95.0%. This approach also yields the best TA for the training set and the lowest percentage of false positives in the test set among all the approaches. Additionally, the topological descriptors give the best percentage of false negatives in the test set, except in comparison with the information indices descriptors, which however have worse predictive capability for low 5-HT1A affinity compounds (only 50.0%). The other families of descriptors, namely Constitutional, RDF, Topological, 2D autocorrelation indices, Geometrical, 3D-MoRSE, GETAWAY, Randic molecular profiles, WHIM, Burden eigenvalues, and Eigenvalue-based indices, give similar percentages of correctly classified high 5-HT1A affinity compounds in the test set (92.9, 89.3, 96.4, 92.9, 89.3, 96.4, 85.7, 96.4, 92.9, 89.3, and 92.9%, respectively), but they classify low 5-HT1A affinity compounds in the test set less well. Thus, the topological-descriptor PSO-AdaBoost-SVM model not only surpasses the other models with respect to false negatives, but also gives the best total accuracy for the compounds in the test set. For all these reasons, we consider that the PSO-AdaBoost-SVM method with topological descriptors can be a useful tool for classifying 5-HT1A selective ligands on the basis of their binding affinities.

Data sets
The studied compounds are 153 5-HT1A selective ligands taken from the literature [1,11-16]; their generic structures are shown in Figure 1. The inhibition constant (Ki) is obtained from the IC50 value by the Cheng-Prusoff equation [39]. The tested compounds are selective 5-HT1A ligands with Ki values from 0.094 to 5000 nM. For analysis purposes, pKi values are used as the dependent variable and are given in Table 1. The compounds studied in our investigation are structurally diverse. It is difficult to build a QSAR model of their activity values because the complex structures have very low similarity. The compounds are therefore divided into two classes according to their 5-HT1A binding affinities: high 5-HT1A affinity and low 5-HT1A affinity. Compounds with pKi > 6.7 are taken as high 5-HT1A affinity compounds and those with pKi ≤ 6.7 as low 5-HT1A affinity compounds [13]; the two classes are represented by '1' and '-1', respectively. The whole data set of 153 compounds is randomly divided into a training set and a test set. The training set is used to adjust the parameters of the models. The test set is used to evaluate the performance of the models once they are built. The training set consists of 113 compounds (73 high 5-HT1A affinity compounds and 40 low 5-HT1A affinity compounds), and the test set contains 40 compounds (28 high 5-HT1A affinity compounds and 12 low 5-HT1A affinity compounds).
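The conversion from IC50 to Ki to pKi and the class assignment described above can be sketched in Python as follows. This is an illustration rather than the authors' code: the function names are hypothetical, the Cheng-Prusoff form used is the standard one for competitive binding, Ki = IC50 / (1 + [L]/Kd), and the radioligand concentration and Kd in the example are made-up values, since the source does not report them.

```python
import math


def cheng_prusoff(ic50_nm, ligand_conc_nm, kd_nm):
    """Ki (nM) from IC50 (nM), for competitive binding.

    ligand_conc_nm and kd_nm describe the radioligand; the source does not
    give them, so any values passed here are assumptions.
    """
    return ic50_nm / (1.0 + ligand_conc_nm / kd_nm)


def pki_from_ki_nm(ki_nm):
    """pKi = -log10(Ki in mol/L); 1 nM = 1e-9 mol/L, hence the constant 9."""
    return 9.0 - math.log10(ki_nm)


def affinity_class(ki_nm, threshold=6.7):
    """Return 1 (high affinity) if pKi > threshold, otherwise -1 (low affinity)."""
    return 1 if pki_from_ki_nm(ki_nm) > threshold else -1


# The two extremes of the reported Ki range fall into different classes.
most_potent = affinity_class(0.094)    # pKi is about 10.0
least_potent = affinity_class(5000.0)  # pKi is about 5.3
```

On this scale the threshold pKi = 6.7 corresponds to a Ki of roughly 200 nM.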

Descriptor calculation
To develop a QSAR model, molecular structures need to be represented by molecular descriptors, which encode structural information. The descriptors are calculated in the following steps: the structures of the compounds are drawn using the Molinspiration WebME Editor [42] and saved as .smi files. The .smi files are then loaded into the software E-Dragon 1.0 [43] to calculate zero-, one-, two- and three-dimensional structural descriptors. E-Dragon 1.0 can calculate Constitutional, Information indices, Randic molecular profiles, RDF, WHIM, Topological, 2D autocorrelation indices, Burden eigenvalues, Eigenvalue-based indices, Geometrical, 3D-MoRSE, and GETAWAY descriptors. These descriptors have been used successfully in various QSAR/QSPR studies [44-46]. In the pre-reduction step, descriptors that are constant across all molecules are detected and removed, and the remaining calculated descriptors are used as the original variable set.
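The pre-reduction step can be sketched as a small Python function that drops descriptor columns whose values do not vary across molecules. The function name, the `tol` parameter (a hypothetical way to also catch near-constant columns), and the numeric values in the example are illustrative assumptions; only the descriptor names MSD and BAC come from the text.

```python
def drop_constant_descriptors(names, rows, tol=0.0):
    """Remove descriptor columns whose range over all molecules is <= tol.

    names: list of descriptor names, one per column.
    rows:  list of per-molecule value lists, aligned with names.
    """
    keep = []
    for k in range(len(names)):
        column = [row[k] for row in rows]
        if max(column) - min(column) > tol:
            keep.append(k)
    kept_names = [names[k] for k in keep]
    kept_rows = [[row[k] for k in keep] for row in rows]
    return kept_names, kept_rows


# Made-up values for three molecules; "CONST" is a hypothetical constant column.
names = ["MSD", "CONST", "BAC"]
rows = [[1.2, 5.0, 10.0], [1.7, 5.0, 14.0], [1.4, 5.0, 10.0]]
kept_names, kept_rows = drop_constant_descriptors(names, rows)
```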

Theory of PSO
Particle swarm optimization (PSO) is an optimization algorithm, which simulates the movement and flocking of birds [47]. Similar to other population-based algorithms, such as evolutionary algorithms, PSO can solve a variety of difficult optimization problems but has shown a faster convergence rate than other evolutionary algorithms on some problems [48]. The other advantage of PSO is that it has very few parameters to adjust, which makes it particularly easy to implement.
PSO creates a population of particles in a multidimensional space, each of whose current coordinates determines the value of the cost function to be minimized. After each iteration, the new velocity, and hence the new position, of each particle is updated on the basis of the summed influence of the particle's present velocity, the distance of the particle from its own best position achieved so far during the search, and the distance of the particle from the leading particle, i.e. the particle that has so far produced the globally best performance (the minimum of the cost function achieved so far). Let x and v denote a particle's position and its corresponding velocity in the search space, respectively. The i-th particle in the d-dimensional search space can then be represented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ and $v_i = (v_{i1}, v_{i2}, \ldots, v_{id})$, respectively. Each particle has its own best position (pbest), $pb_i = (pb_{i1}, pb_{i2}, \ldots, pb_{id})$, corresponding to the personal best objective value obtained so far at time t. The best particle among all the particles in the group is denoted $pb_g$, which represents the best position found so far at time t. The new velocity of each particle is calculated as follows:

$$v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(pb_i - x_i(t)\right) + c_2 r_2 \left(pb_g - x_i(t)\right), \quad i = 1, 2, \ldots, m \qquad (1)$$

where m is the number of particles in the group; d is the number of members in a particle; $c_1$ and $c_2$ are constants that control how far a particle will move in a single iteration; $r_1$ and $r_2$ are two independent random numbers uniformly distributed in the range [0, 1]; and w is the inertia weight. A larger inertia weight facilitates global exploration, whereas a smaller inertia weight tends to facilitate local exploration to fine-tune the current search area [49,50]. The position of each particle is then updated at each iteration according to the following equation:

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (2)$$

where $v_i(t)$ is the velocity of particle i at iteration t, and $x_i(t)$ is the current position of particle i at iteration t.
Generally, the value of each component of $v_i$ given by Equation (1) is clamped to the range $[-v_{max}, v_{max}]$ to control excessive roaming of particles outside the search space. The particle then flies toward a new position according to Equation (2). This process is repeated until a user-defined termination criterion is reached, namely either the maximum number of iterations or a designated value of the fitness function.
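The velocity and position updates described in this section can be put together in a minimal continuous PSO sketch in Python. It is not the authors' MATLAB implementation: the default parameter values are the ones reported in the variable-selection section, but the cost function is a toy objective, the random numbers are drawn per component, and positions are simply clipped to the search bounds. All of these are assumptions, since the source does not specify the fitness function or the boundary handling used for descriptor selection.

```python
import random


def pso_minimize(cost, dim, n_particles=10, n_iter=100, c1=1.8, c2=1.8,
                 v_max=0.2, x_min=0.0, x_max=1.0, w_start=1.0, w_end=0.2,
                 seed=0):
    """Minimize cost(position) with a basic global-best PSO."""
    rng = random.Random(seed)
    x = [[rng.uniform(x_min, x_max) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
         for _ in range(n_particles)]
    pbest = [p[:] for p in x]                 # personal best positions
    pbest_cost = [cost(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for t in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start + (w_end - w_start) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to pbest + pull to gbest.
                vel = (w * v[i][k]
                       + c1 * r1 * (pbest[i][k] - x[i][k])
                       + c2 * r2 * (gbest[k] - x[i][k]))
                v[i][k] = max(-v_max, min(v_max, vel))   # clamp velocity
                # Position update, clipped to the search bounds.
                x[i][k] = max(x_min, min(x_max, x[i][k] + v[i][k]))
            c = cost(x[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = x[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = x[i][:], c
    return gbest, gbest_cost


def toy_cost(p):
    """Toy objective (assumption): squared distance from the point (0.3, 0.3, 0.3)."""
    return sum((pk - 0.3) ** 2 for pk in p)


best_position, best_cost = pso_minimize(toy_cost, dim=3)
```

For descriptor selection, one common adaptation is to threshold each position component to decide whether a descriptor is included and to use a classification error as the cost; whether the authors did this is not stated in the text.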

Theory of AdaBoost algorithm
Boosting is a learning process to make a strong classifier by combining multiple weak classifiers. The weak classifier just has slightly better performance than random classification [51]. And boosting demands prior knowledge of the accuracy of the weak learners. In 1997, Freund et al. proposed a method that does not have this requirement [52]. This method is called Adaptive boosting (AdaBoost). AdaBoost algorithm is the most frequently used Boosting method. Depending on the purpose and data structure, variants on AdaBoost algorithm have been developed, such as Discrete AdaBoost, Real AdaBoost and AdaBoost.MH [53]. Here the Discrete AdaBoost has been employed. The other two have been described in detail [48] and will not be repeated here.
Suppose there is a training data set with N samples in two classes. The two classes are defined by y ∈ {-1, 1}, with 1 and -1 corresponding to high and low affinity, respectively. A sequence of N training examples (labelled instances) $(x_1, y_1), \ldots, (x_N, y_N)$ is drawn randomly from X×Y according to a distribution ζ. We use boosting to find a hypothesis $h_f$ that is consistent with most of the sample (i.e., $h_f(x_i) = y_i$ for most 1 ≤ i ≤ N).
The Discrete AdaBoost algorithm can be implemented as follows [51-54]:
Step 1 Given a distribution D over the N training examples, initialize the weight vector: $w_i^1 = D(i) = 1/N$ for $i = 1, \ldots, N$.
Step 2 Do for t = 1, 2, …, T:
Step 2a Set $p^t = w^t / \sum_{i=1}^{N} w_i^t$. Select a data set with N samples from the original training set. The chance for a sample to be selected is related to the distribution of the weights $p^t$. A sample with a higher weight has a higher probability of being selected.
Step 2b Call WeakLearn $F_t(x)$, which is an SVM in our case, with the training set based on the current distribution $p^t$, and get back a hypothesis $h_t(x)$: $F_t(x) \rightarrow h_t(x)$.
Step 2c Calculate the sum of the weighted errors of all training samples according to hypothesis $h_t(x)$: $\varepsilon_t = \sum_{i=1}^{N} p_i^t \, [h_t(x_i) \neq y_i]$, where $[h_t(x_i) \neq y_i]$ equals 1 if sample i is misclassified and 0 otherwise.
Step 2d Update the weights of the correctly classified samples and leave the weights of the misclassified samples unchanged among all the original training samples:

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \qquad (6)$$

$$w_i^{t+1} = w_i^t \, \beta_t^{\,1 - [h_t(x_i) \neq y_i]} \qquad (7)$$

According to formulas (6) and (7), the weights of the samples that are correctly classified are decreased, while the weights of the misclassified samples are unchanged.
Step 2e The confidence index of hypothesis $h_t(x)$ is calculated as $\alpha_t = \log(1/\beta_t)$. The lower the weighted error made by hypothesis $h_t(x)$ on the training samples, the higher the confidence index of the hypothesis $h_t(x)$.
Step 2f If $\varepsilon_t < 0.5$ and t ≤ T, repeat Steps 2a to 2e; otherwise, stop and set T = t - 1. After T iterations of Step 2, there are T hypotheses $h_t(x)$, each associated with a base learner $F_t(x)$.
Step 3 The performance of Discrete AdaBoost is evaluated on a test set. For a sample j of the test set, the final prediction is the combined prediction obtained from the T learners. Each prediction is multiplied by the confidence index of the corresponding learner $h_t(x)$. The higher the confidence index of a learner $h_t(x)$, the greater its role in the final decision: $h_f(x_j) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x_j)\right)$.
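The steps above can be sketched in a few lines of Python. Two simplifications are assumptions of this sketch and not part of the described method: the weak learner is a one-feature threshold rule (a decision stump) standing in for the SVM, and the weights $p^t$ are passed directly to the weak learner without the resampling of Step 2a. The function names are hypothetical.

```python
import math


def train_stump(X, y, p):
    """Weak learner (stand-in for the SVM): best threshold rule under weights p."""
    best = None
    for k in range(len(X[0])):
        for thr in sorted({row[k] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[k] > thr else -sign for row in X]
                err = sum(pi for pi, ph, yi in zip(p, pred, y) if ph != yi)
                if best is None or err < best[0]:
                    best = (err, k, thr, sign)
    _, k, thr, sign = best
    return lambda row: sign if row[k] > thr else -sign


def adaboost_fit(X, y, T=10):
    """Discrete AdaBoost; returns a list of (confidence index, hypothesis)."""
    n = len(X)
    w = [1.0 / n] * n                                  # Step 1
    learners = []
    for _ in range(T):                                 # Step 2
        total = sum(w)
        p = [wi / total for wi in w]                   # Step 2a (no resampling)
        h = train_stump(X, y, p)                       # Step 2b
        miss = [h(row) != yi for row, yi in zip(X, y)]
        eps = sum(pi for pi, m in zip(p, miss) if m)   # Step 2c
        if eps >= 0.5:                                 # Step 2f: stop
            break
        eps = max(eps, 1e-10)    # guard against log(1/0) for a perfect learner
        beta = eps / (1.0 - eps)                       # Step 2d
        w = [wi * (1.0 if m else beta) for wi, m in zip(w, miss)]
        learners.append((math.log(1.0 / beta), h))     # Step 2e
        if not any(miss):        # nothing left to correct
            break
    return learners


def adaboost_predict(learners, row):
    """Step 3: sign of the confidence-weighted vote."""
    score = sum(alpha * h(row) for alpha, h in learners)
    return 1 if score > 0 else -1


# Made-up one-descriptor data set: low values are class -1, high values class 1.
X = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.8]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y, T=5)
predictions = [adaboost_predict(model, row) for row in X]
```

Replacing `train_stump` with a function that trains an SVM on a weighted or resampled training set would bring the sketch closer to the Adaboost-SVM setup described in the text.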

Methodology
After the descriptors are selected using the two variable selection methods (Stepwise-MLR and PSO), the next step is to build the classification models using the SVM (support vector machine) [55] and Adaboost-SVM methods, respectively. These models are denoted Stepwise-MLR-SVM, Stepwise-MLR-Adaboost-SVM, PSO-SVM and PSO-Adaboost-SVM. As the Adaboost algorithm has been described in the previous section, we give only a brief description of the theory of SVM.
The support vector machine (SVM) was introduced by Vapnik [56] as a novel type of learning machine and has gained popularity due to its many attractive features and promising empirical performance. Originally, SVM was developed for pattern recognition problems. With the introduction of an ε-insensitive loss function, SVM has since been extended to nonlinear regression estimation and time-series prediction, with excellent performance [57].
For the classification problem, in brief, this involves optimizing the Lagrange multipliers $\alpha_i$ under the constraints $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$ to yield the decision function

$$f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i K(x, x_i) + b\right),$$

where sign(u) is the sign function, which returns +1 when u > 0 and -1 when u ≤ 0; $y_i$ is the input class label, taking the value -1 or +1; $x_i$ is a set of descriptors; b is the bias term; and $K(x, x_i)$ is a kernel function, whose value is equal to the inner product of the two vectors x and $x_i$ in the feature space, Φ(x) and Φ($x_i$). That is, $K(x, x_i) = \Phi(x) \cdot \Phi(x_i)$. Any function that satisfies Mercer's condition can be used as the kernel function.
In SVM, we chose C-SVC [55] as the base classifier, with the Gaussian radial basis function (RBF) as the kernel function.
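As an illustration of how a trained C-SVC with an RBF kernel classifies a new compound, the following Python sketch evaluates the sign of a kernel-weighted sum over support vectors. It assumes the RBF form exp(-γ‖x - z‖²); the support vectors, multipliers, bias, and γ used in the example are made-up values and not parameters of the models in this work.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decide(x, support_vectors, alphas, labels, bias, gamma):
    """Return +1 if the kernel expansion plus bias is positive, otherwise -1."""
    u = bias + sum(a * y * rbf_kernel(x, sv, gamma)
                   for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if u > 0 else -1


# Hypothetical two-descriptor model with one support vector per class.
support_vectors = [[0.0, 0.0], [3.0, 3.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
near_first = svm_decide([0.1, 0.0], support_vectors, alphas, labels, 0.0, 1.0)
near_second = svm_decide([2.9, 3.0], support_vectors, alphas, labels, 0.0, 1.0)
```

In a fitted model the multipliers and the bias come from solving the constrained optimization problem; here they are fixed by hand only to show the evaluation step.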
All the algorithms were written in MATLAB and run on a personal computer (Intel(R) Pentium(R) 4 / 3.20 GHz, 1.00GB RAM).

The evaluation of prediction power
In this study, the quality of a model is assessed by several statistical measures, including false-negative (FN), false-negative rate (FNR), false-positive (FP), false-positive rate (FPR), and total accuracy (TA).
FP (false positives) is the number of chemicals predicted to be active but inactive in the assay, and FN (false negatives) is the number of chemicals predicted to be inactive but active in the assay. FPR, FNR and TA are defined as follows:

$$\mathrm{FPR} = \frac{\mathrm{FP}}{N_-}, \qquad \mathrm{FNR} = \frac{\mathrm{FN}}{N_+}, \qquad \mathrm{TA} = \frac{N - \mathrm{FP} - \mathrm{FN}}{N},$$

where $N_-$ denotes the total number of inactive chemicals in the data set, $N_+$ denotes the total number of active chemicals in the data set, and N denotes the total number of active and inactive chemicals in the data set.
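These measures can be computed from true and predicted class labels with a short Python function. The sketch assumes that FPR is taken over the inactive compounds, FNR over the active compounds, and TA over all compounds, and that both classes are present; the function name is hypothetical. The example reproduces the test-set counts reported for PSO-AdaBoost-SVM (28 high-affinity and 12 low-affinity compounds, with one false negative and one false positive).

```python
def classification_rates(y_true, y_pred):
    """Compute FP, FN, FPR, FNR and TA for labels coded as 1 (active) and -1."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    n_neg = sum(1 for t in y_true if t == -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n = len(y_true)
    return {
        "FP": fp,
        "FN": fn,
        "FPR": fp / n_neg,        # share of inactive compounds called active
        "FNR": fn / n_pos,        # share of active compounds called inactive
        "TA": (n - fp - fn) / n,  # share of all compounds classified correctly
    }


# Test set of 28 active and 12 inactive compounds with one error in each class.
y_true = [1] * 28 + [-1] * 12
y_pred = list(y_true)
y_pred[0] = -1    # one false negative
y_pred[28] = 1    # one false positive
rates = classification_rates(y_true, y_pred)
```

With these counts, FNR is 1/28 (about 3.6%), FPR is 1/12 (about 8.3%), and TA is 38/40 = 95.0%, in line with the values quoted for the test set.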

Conclusions
In this work, AdaBoost-SVM has been developed for the QSAR analysis of 5-HT 1A selective ligands. The binding affinities of 153 5-HT 1A selective ligands were classified. The variables for the molecular descriptors were determined by PSO. Compared with other descriptors in the Adaboost-SVM model and SVM model, the Topological descriptor composed of AdaBoost-SVM