3D-QSAR Studies on Thiazolidin-4-one S1P1 Receptor Agonists by CoMFA and CoMSIA

Selective S1P1 receptor agonists have therapeutic potential to treat a variety of immune-mediated diseases. A series of 2-imino-thiazolidin-4-one derivatives displaying potent S1P1 receptor agonistic activity were selected to establish 3D-QSAR models using CoMFA and CoMSIA methods. Internal and external cross-validation techniques were investigated as well as some measures including region focusing, progressive scrambling, bootstraping and leave-group-out. The satisfactory CoMFA model predicted a q2 value of 0.751 and an r2 value of 0.973, indicating that electrostatic and steric properties play a significant role in potency. The best CoMSIA model, based on a combination of steric, electrostatic, hydrophobic and H-bond donor descriptors, predicted a q2 value of 0.739 and an r2 value of 0.923. The models were graphically interpreted using contour plots which gave more insight into the structural requirements for increasing the activity of a compound, providing a solid basis for future rational design of more active S1P1 receptor agonists.


Introduction
Sphingosine-1-phosphate (S1P), a widespread lysophospholipid, binds five specific G-protein coupled receptors (S1P 1 -S1P 5 ) [1][2][3][4] and exerts a variety of biological activities such as vascular maturation, cell survival, proliferation, differentitation, migration and chemotaxis [5][6][7]. The S1P 1 receptor only couples to Gαi/o while the other S1P receptors are promiscuous with respect to G-protein coupling, activating G 12/13 in addition to Gαi/o and also Gαq in the case of S1P 2 and S1P 3 [8]. S1P 1 modulates egress of T-lymphocytes from thymus and peripheral lymphoid organs [9], and experiments [10,11] have demonstrated that targeting S1P 1 is sufficient to cause lymphocyte sequestration to thymus and lymphoid organs, without affecting the innate immune system and even cellular reactivity of lymphocytes to antigen challenge. On the other hand, activation of the S1P 3 does not relate to lymphocyte recirculation but links to some undesirable effects such as bronchoconstriction, heart rate reduction and pulmonary epithelial leakage [12][13][14][15]. Selective S1P 1 receptor agonists are promisingly developed as a novel immunomodulator. S1P 1 receptor has been considered a potential target for a variety of immune-mediated diseases, including rheumatoid arthritis, psoriasis and multiple sclerosis disease. Recently, a novel class of S1P 1 receptor agonists based on the 2-imino-thiazolidin-4-one scaffold has been reported [8] and ligand-based drug design techniques will be employed to guide the synthesis of future generations of S1P 1 receptor agonists since the crystal structure of S1P 1 receptor remained unclear.
Three-dimensional quantitative structure-activity relationship (3D-QSAR) techniques are useful methods of ligand-based drug design by correlating physicochemical descriptors from a set of related compounds to their known molecular activity or molecular property values [16,17]. Many researchers [18,19] have carried out quantitative structure activity relationship (QSAR) studies on thiazolidin-4-ones as anti-HIV agents, but the present work reports the first application of QSAR to study thiazolidin-4-ones as potent S1P1 receptor agonists [8]. We studied 61 2-imino-thiazolidin-4-one derivatives using comparative molecular field analysis (CoMFA) [20] and comparative molecular similarity indices analysis (CoMSIA) [21]. The satisfactory QSAR models obtained provide a solid basis for future rational design of more active and selective S1P 1 receptor agonists within the family of S1P receptors.

CoMFA Analysis
The most active molecule 60 was selected as the template for alignment ( Figure 1) and the CoMFA model provided a cross-validation q 2 value of 0.751 with 5 components and an r 2 value of 0.973 (Table 1). The activity values predicted for the tested compounds are in good agreement with the experimental values ( Figure 2) and the r 2 pred value of 0.904 further confirms that the model is reliable and accurate with higher predictive capacity. The steric and electrostatic field contributions to the final model were 54.5% and 45.5%, respectively. The results of the progressive scrambling showed that the value for the slope in the 5 component model is acceptable (Figure 3), and the optimum statistics are also seen for 5 components because cSDEP is at a minimum with Q 2 at a maximum (Table 2).

CoMSIA Analysis
Eight CoMSIA models were generated using combinations of two, three, four, and all five descriptors as shown in Table 3. Model 6, based on four descriptors of steric, electrostatic, hydrophobic and H-bond donor fields was found to be the most accurate yielding a q 2 value of 0.739 and an r 2 value of 0.923. The Group cross q 2 value of 0.740, bootstrapped value of 0.973 ± 0.007 and test set r 2 value of 0.730 further approve that it is the best CoMSIA model. The predicted values are consistent with the experimental data ( Figure 4). The steric field explains 12.5% of the variances, the electrostatic field explains 29.0%, the hydrophobic field for 30.3% and the hydrogen-bond donor for 28.2% of the variance.

CoMFA Contour Maps
The results of 3D-QSAR models are presented in the contour coefficient maps as shown in Figure 5. The CoMFA steric contour map of the most active compound 60 shows a large green polyhedron around 3, 4-positions of 5-benzylidene and a small green one locates near 2-position of N 3 -phenyl ring, suggesting that bulky substituents are preferred in these regions. Large yellow polyhedra at 2-position of 5-benzylidene and around 2-substituent at the iminothiazolidinone scaffold indicate that bulks are disfavored here. These may be the reason why compound 60 with a chlorine atom and a hydroxylethyloxyl substitution at 3-and 4-positions of 5-benzylidene respectively, a methyl group linking to 2-position of 3-phenyl ring substituted at iminothiazolidinone ring, and no substitutions at 2-position of 5-benzylidene ring, showed the most active. Similar to compound 60, compounds 58, 59, 61 displayed almost identical potency. On the other hand, compound 56, with the reverse of substitution at each corresponding position, showed very low activity. The contour maps also show that compounds 18, 25-27, 40 exhibited much higher potency than 19, 30-32, 46 correspondingly. At the same time, the steric contour illustrated a weak trend for a reducing chain length to steadily improve the compound's potency. For example, the activity of compounds 12, 11, 10, 9 was decreased in turn. For the electrostatic contour plots, two small blue polyhedra and a large red polyhedron are both located around the side chain linking to 4-position of 5-benzylidene, indicating that negatively charged groups and electron donating groups are both available in these regions. A large red polyhedron located at 3-position of 5-benzylidene and around 3, 4-positions of N 3 -phenyl ring, indicating that electron-withdrawing groups are preferred at these positions. That is why compounds 26-28 were shown to be more potent than compounds 29-32 with electron-donating groups, and compounds 50-54 without any substituents at 3-position of 5-benzylidene displayed more potency than the corresponding compound 56. Combined with its steric contour map, the chloro-substituent at 3-position of the benzylidene ring is essential group for compound's potency on S1P 1 .

CoMSIA Contour Maps
The best CoMSIA model contour maps of the most active analog (compound 60 in Table 4) are shown in Figure 6. Its steric and electrostatic contour plots ( Figure 6A,B) correlate well with the CoMFA contour maps described above. Figure 6C displays the hydrophobic contour map represented by yellow and white polyhedra. A large yellow polyhedron and a small polyhedron located around 2, 3, 4-positions of N 3 -phenyl ring respectively, indicating that hydrophobic substituents on ligands in these regions can be favored and the white ones disfavored such substituents. That is why compounds 58-61 with a methyl group at the region showed far more potent than others. Figure 6D shows the CoMSIA hydrogen-bond donor fields denoted by cyan and purple contours. Cyan contours represent regions where hydrogen-bond donor substituents are preferred and purple contours indicate unfavorable regions. One large cyan contour around the side chain linking to 4-position of 5-benzylidene indicate that the free OH at 4-position or on its side chain is necessary for potencies. A large purple contour at 2, 3-positions respectively shows where such substituents may be disfavored. That is why compound 40 showed more potent than 41 while compound 56 was less active than compounds 50-54 and 57.

Data Set
Sixty-one compounds in the present study were taken from the published works of Martin H. Bolli and co-workers [8], and some molecules whose activity was shown not to be very exact from the experimental study were excluded. The structures of the molecules and their biological data are given in Table 4. For convenience, the EC50 values of S1P 1 receptor agonists were often converted to their negative logarithm (pEC50) values. The pEC50 values of these compounds have a span of 3.0 log units from 5.06 to 8.04, providing a broad and homogenous data set for 3D-QSAR study [22]. 20 compounds were randomly selected as the test set based on the structural diversity while the remaining 41 compounds were taken as the training set.
Three-dimensional structure building and all modeling were performed using the SYBYL 8.1 program package of Tripos, Inc. 3D structures of all compounds were constructed using the Sketch Molecule module. Structural energy minimization was carried out using the standard Tripos molecular mechanics force field and Gasteiger-Hückle charge, with the convergence criterion set at 0.05 kcal/(Å mol) and the max iterations for the minimization set to 2000.
One method of 3D-QSAR optimization is known as region focusing [23]. However, application of region focusing in this study resulted in a little decrease from 0.751 to 0.739 for the internal validity and from 0.973 to 0.958 for the non-validated r 2 . Obviously, the region focusing is not essential in this CoMFA model.

Molecular Alignment
A good alignment is the most important element for CoMFA and CoMSIA analysis although probe atom type, lattice shifting step size and over all orientation of the aligned compounds may have a bearing on their results [24]. The quality and the predictive ability of the model are directly dependent on the alignment rules. Once the active conformation was determined by energy minimization using Powell method and Tripos standard force field [25] with a distance-dependent dielectric function, the alignment was performed based on some rules such as substructure overlap, pharmacophore overlap and docking [26]. In order to produce a more robust CoMFA and CoMSIA models with a good cross-validated r 2 value, the 5-benzylidenylthizaolindin-4-one with structural rigidity was selected as the common substructure to overlap and to align all of the molecules and the most active compound 65 was used as the alignment template. Alignment of all compounds was shown in Figure 1. It can be seen that all the compounds studied have similar active conformations.

Partial Least Squares (PLS) Analysis
PLS analysis [27], used to linearly correlate the 3D-QSAR fields to biological activity values, was first carried out by the leave-one-out (LOO) and leave-group-out (10 compound groups) cross-validation methods, respectively, to determine cross-validated r 2 (q 2 ) values and the optimal number of components, in which one compound was removed from the data set and its activity was predicted using the model from the rest of the data set. Non-cross-validation was then performed to establish the final 3D-QSAR model using the optimal number of components, in which conventional correlation coefficient (r 2 ), standard errors of estimate (SEE), and F ratio between the variances of calculated and observed activities were given.
The optimal number of components, usually corresponding to the highest cross-validated squared coefficient (q 2 ), was selected on the basis of the lowest standard error of prediction (SEP) and avoiding over-fitting the models. A higher component was accepted and used only when the q 2 differences between two components was larger than 10%. The q 2 has been a good indicator of the accuracy of actual predictions. In general, q 2 values can be separated into three categories: q 2 > 0.6 means a fairly good model, q 2 = 0.4 -0.6 means a questionable model, and q 2 < 0.4 a poor model [27]. To further assess the robustness of the derived models, bootstraping analysis (10 runs) was also utilized to calculate confidence intervals for the r 2 and SEE [28,29].

CoMFA Studies
Three-dimensional grid spacing was set at 2 Å in the x, y, and z directions and a 3D cubic lattice consisted its grid region that extended at least 4 Å beyond van der Waals volume of all aligned molecules in all directions. Steric energy (Lennard-Jones potential) and electrostatic energy (Coulomb potential) were calculated using the Tripos force field for each molecule, and the sp 3 -hybrized carbon atom with a +1 charge was taken as the probe atom to determine the magnitude of the field values. All energies that exceeded the cutoff value of 30 kcal/mol were ignored for the reduction of domination by large steric and electrostatic energies. With standard options for scaling of variables, the regression analysis was carried out using partial least squares (PLS) method [27]. To improve the signal to noise ratio, the column filtering was set to 2.0 kcal/mol, then those lattice points whose energy variation was below this threshold were automatically omitted [30]. The final model was developed with the optimum number of components to yield a non cross-validated r 2 value. Despite being unable to describe all of the binding forces, CoMFA is still a widely useful tool for QSAR analysis at 3D level.

CoMSIA Studies
CoMSIA is similar to CoMFA based on the same assumption that changes in binding affinities of ligands are related to changes in molecular properties represented by fields. Moreover, for CoMSIA, besides steric and electrostatic fields, three other different fields (hydrophobic, hydrogen bond donor, and hydrogen bond acceptor) are calculated [31]. A Gaussian function was introduced to determine the distance between the probe atom and the molecule atoms. Similarity indices inside and outside different molecular surfaces can be calculated at all grid points, however only outside indices were calculated in CoMFA. Equation used to calculate the similarity indices is as follows: Where, A is the similarity index at grid point q, summed over all atoms i of the molecule j under investigation. W probe, k is the probe atom with radius 1 Å, charge +1, hydrophobicity +1, hydrogen bond donating +1 and hydrogen bond accepting +1. W ik is the actual value of the physicochemical property k of atom i. r iq is the mutual distance between the probe atom at grid point q and atom i of the test molecule. α is the attenuation factor whose optimal value is normally between 0.2 and 0.4, with a default value of 0.3 [32,33].

Sensitivity of a PLS Model
Most members of the data set, especially for large data sets, may have "twins" which make a near twin of each left-out molecule likely remain in the training data and usually obtain good predictions, so the q 2 statistic obtained from cross-validation may give a false sense of confidence [34]. Progressive scrambling is used to test the model's stability by determining the sensitivity of a QSAR model to small systematic perturbations of the response variable [35,36]. The values of Q 2 , cSDEP, and dq 2 /dr 2 yy' are used to interpret the predictivity of the model without the potentially confusing redundancy, in which the Q 2 statistic is an estimate of the predictivity of the model after removing the effects of redundancy, and the cSDEP statistic is an estimated cross-validated standard error at a specific critical point for r 2 yy' , and dq 2 /dr 2 yy' means slope of q 2 evaluated at the specified critical point with respect to r 2 yy' . Here, y' indicates the perturbed (scrambled) responses and the statistic r 2 yy' expresses the degree of correlation between the perturbed responses and the original ones. The Q 2 statistic obtained in this way is very conservative, in that it is necessarily reduced by the level of noise introduced to remove redundancy, so a value as low as 0.35 will signify that the original, unperturbed model is robust [33]. The dq 2 /dr 2 yy' means what extent the model changes with small changes to the dependent variable. In a stable model, changing proportionally with small changes in underlying data, has a slope near unity while unstable models change greatly with small changes in underlying response values and its effective slope is generally greater than 1.2. This method was employed to verify the optimal number of components and test the cross-validation against the possibility of such a redundancy in our training set [37].

Predictive Correlation Coefficient
q 2 is a useful but not sufficient criterion for model validation, so an external test set (r 2 pred ) [38] was claimed for the estimation of predictive ability. Equation of predictive values r 2 pred is as follows: 2 1 ( / ) pred r PRESS SD   Therein, SD means the sum of squared differences between the measured activities of the test set and the average measured activity of the training set.

Conclusions
Although many researchers have carried out quantitative structure activity relationship (QSAR) studies on thiazolidin-4-one as anti-HIV agents, the present work reports the first application of QSAR to study thiazolidin-4-ones as potent S1P1 receptor agonists [8]. We studied 61 2-imino-thiazolidin-4-one derivatives using CoMFA and CoMSIA, and some predictive 3D-QSAR models have been developed. Both CoMFA and CoMSIA models provided good statistical results in terms of q 2 and r 2 values, suggesting the significant correlations of molecular structures with its biological activities. Compared with CoMFA, CoMSIA provided a slightly better statistical model. The final CoMSIA model was generated from steric, electrostatic, hydrophobic and hydrogen-bond donor fields and validated by a variety of methods including crossvalidation, non-crossvalidation and test set predictions. The model developed has high internal validity (q 2 above 0.6) and high predictive ability (test set r 2 above 0.7). Compared with SAR summarized by Martin H. Bolli, the results of 3D-QSAR models presented in the contour coefficient maps further revealed how steric, electrostatic, hydrophobic and hydrogen-bond donor modifications should significantly affect the bioactivities of these compounds. Thus, the results of the quantitative structure activity relationships (QSAR) studies gave an insight to design new S1P 1 receptor agonists which can effectively treat a variety of immune-mediated diseases.