Structural Investigation for Optimization of Anthranilic Acid Derivatives as Partial FXR Agonists by in Silico Approaches

In this paper, a three level in silico approach was applied to investigate some important structural and physicochemical aspects of a series of anthranilic acid derivatives (AAD) newly identified as potent partial farnesoid X receptor (FXR) agonists. Initially, both two and three-dimensional quantitative structure activity relationship (2D- and 3D-QSAR) studies were performed based on such AAD by a stepwise technology combined with multiple linear regression and comparative molecular field analysis. The obtained 2D-QSAR model gave a high predictive ability (R2train = 0.935, R2test = 0.902, Q2LOO = 0.899). It also uncovered that number of rotatable single bonds (b_rotN), relative negative partial charges (RPC−), oprea's lead-like (opr_leadlike), subdivided van der Waal’s surface area (SlogP_VSA2) and accessible surface area (ASA) were important features in defining activity. Additionally, the derived3D-QSAR model presented a higher predictive ability (R2train = 0.944, R2test = 0.892, Q2LOO = 0.802). Meanwhile, the derived contour maps from the 3D-QSAR model revealed the significant structural features (steric and electronic effects) required for improving FXR agonist activity. Finally, nine newly designed AAD with higher predicted EC50 values than the known template compound were docked into the FXR active site. The excellent molecular binding patterns of these molecules also suggested that they can be robust and potent partial FXR agonists in agreement with the QSAR results. Overall, these derived models may help to identify and design novel AAD with better FXR agonist activity.


Introduction
Farnesoid X receptor (FXR) is a nuclear receptor expressed in liver, gall bladder, intestine, kidney, and adrenal glands. It regulates important physiological roles in various metabolic pathways involved in bile acid, triglyceride, and glucose homeostasis. Now, FXR has become an attractive target for treating a wide range of metabolic diseases, including diabetes, cholestasis, liver fibrosis, and inflammatory bowel diseases [1][2][3]. Therefore, a number of synthetic steroidal and nonsteroidal FXR agonists have been developed so far. 6-ethyl-chenodeoxycholic acid20 (6-ECDCA) and GW4064 were the most important and widely used steroidal and nonsteroidal FXR agonists [4]. They both constitute full FXR agonists with low nanomolar EC 50 values of 85 nM and 0.9 M in a reporter gene assay [5], respectively. In addition, a recent study indicated that GW4064 was active on several off-targets [6]. Considering that metabolic diseases require a stable long-term therapy, well tolerated and low toxicity FXR agonists are predominantly required that can be applied over long time. Moreover, full activation of a ligand activated transcription factor may cause many side effects in long-term treatment [7]. Therefore, new potent partial FXR agonists aimed at providing a stable long-term therapy for metabolic diseases have attracted more and more attention nowadays.
Recently, a novel series of partial FXR agonists based on anthranilic acid skeleton have been reported by Merk et al. [8,9]. The continued interest in the development of more potent partial FXR agonists prompted us to explore the relationship between structures of AAD and FXR agonist activity. Here, quantitative structure activity relationship (QSAR) methods were introduced to guide lead optimization and study the action mechanism for partial FXR agonists. In this QSAR method, the bioactivity of compounds can be predicted by a mathematical model between physicochemical properties and bioactivity. The mathematical model can be achieved by many general algorithms such as linear and nonlinear algorithms or other new methods such as the spectral-structure activity relationship algorithm [10,11]. This QSAR method has become very useful and is widely applied in many fields for predicting compound properties [12,13], including physical property prediction, biological activity prediction, and toxicity prediction.
To the best of our knowledge, no QSAR study has yet been reported in AAD as FXR agonists so far. In this paper, we attempted to investigate the significant structural and physicochemical features required for improving biological activity and to obtain highly predictive 2D-and 3D-QSAR models so as to assist in the design of new potent partial FXR agonists. Firstly, a stepwise technology combined with multiple linear regression (MLR) was applied to develop predictive 2D-QSAR models for uncovering physicochemical features on FXR activity of AAD. Subsequently, a 3D-QSAR study was also performed to obtain more understanding with respect to chemical structures and biological activity using comparative molecular field analysis (CoMFA). The CoMFA model can provide identification of regions in space where the interactive fields may influence the biological activities in the form of contour maps, which would generate graphical visualization of crucial steric and electrostatic features in 3D Cartesian space [14]. Finally, some important observations were also made during the study concerning nine newly designed AAD with high predicted bioactivity and their interactions with the FXR active site by molecular docking. Molecular docking aims to predict the binding-conformation of ligands to the appropriate target binding site. The success of a docking program depends on two components: the search algorithm and the scoring function. A variety of conformational search strategies have been reported such as systematic or stochastic search or genetic algorithms or simplified molecular input line entry system conformation [15]. Most scoring functions are physics-based molecular mechanics force fields that estimate the energy of the pose within the binding site. where, N train and N test are the number of compounds in the training set and the test set, respectively. R 2 train and R 2 test are the squared correlation coefficient of training set and test set, respectively; Q 2 LOO is the leave-one-out (LOO) cross-validation squared correlation coefficient; F is the F-test value; RMSE is root mean standard error. The selected variables and their chemical meanings, standard coefficients are shown in Table 1. A variable inflation factor (VIF) (VIF = 1/(1´R j 2 ), R j 2 represents the multiple correlation coefficient of one descriptor's effect on the remaining molecular descriptors) was calculated to determine if multicollinearity existed among the descriptors in models. If VIF arrays from 1.0 to 5.0, the linked equation is suitable [16]. As shown in Table 1, the VIF of all descriptors were smaller than 4, indicating that the generated model possessed statistic significance and good stability. Table 2 shows the correlation matrix of the selected descriptors. From this table, it can be seen that the linear correlation coefficient value for each pair of descriptors was smaller than 0.85, suggesting that the selected descriptors were independent, meeting the important criterion for the model selections [17]. The predicted results of the MLR model are given in Table 3 and shown in Figure 1A. As described in Table 4, obviously, the MLR model was very successfully built with statistical significance and good prediction ability. The R 2 train value of this model reveals that it can explain 93.5% of the variance in activity. The Q 2 LOO value of 0.899 was much larger than 0.5, indicating that the developed model had very good stability and predictive ability. In addition, the value of R 2 test for the external prediction was 0.902, showing the good prediction and generalization ability. calculated to determine if multicollinearity existed among the descriptors in models. If VIF arrays from 1.0 to 5.0, the linked equation is suitable [16]. As shown in Table 1, the VIF of all descriptors were smaller than 4, indicating that the generated model possessed statistic significance and good stability. Table 2 shows the correlation matrix of the selected descriptors. From this table, it can be seen that the linear correlation coefficient value for each pair of descriptors was smaller than 0.85, suggesting that the selected descriptors were independent, meeting the important criterion for the model selections [17]. The predicted results of the MLR model are given in Table 3 and shown in Figure 1A. As described in Table 4, obviously, the MLR model was very successfully built with statistical significance and good prediction ability. The R 2 train value of this model reveals that it can explain 93.5% of the variance in activity. The Q 2 LOO value of 0.899 was much larger than 0.5, indicating that the developed model had very good stability and predictive ability. In addition, the value of R 2 test for the external prediction was 0.902, showing the good prediction and generalization ability.        Finally, to confirm the robustness of the model, the Y-randomization test was performed in this study. The dependent variable vector is randomly shuffled and a new model is constructed. If the new model gives significantly lower values for both R 2 train and Q 2 LOO statistics compared to the original model, the original generated model is not considered as resulting from a chance correlation [18]. The results of ten Y-randomization tests are summarized in Table 5. As can be seen, all new R 2 train and Q 2 LOO values were much lower than those of the original model. Thereby, the good results for the MLR model were not due to a chance correlation or structural dependency of the training set.

Model Applicability Domain Analysis for the MLR Model
Finally, to evaluate the generalization degree of the generated model, the applicability domain (AD) was defined by a Williams plot. In the Williams plot, leverage values versus standardized residuals were plotted to detect both the structurally influential chemicals (X outliers) and the response outliers (Y outliers) [19]. The leverage value h is defined as: where x i is the descriptor row vector of compound, X is the matrix of the descriptor values of the training set and n is the number of training sets. The superscript "T" refers to the transposed value of the matrix/vector [19,20]. When a leverage value h is higher than the threshold value h* (calculated as 3(m + 1)/n, where m is the number of model parameters and n is the number of the training set), it is considered as X outliers. In addition, a value of˘3.0 standard deviation units is widely used as a cut off value for defining Y outliers. In this study, the Williams plot for the MLR model is shown in Figure 2. From this Figure 2, it can be found that no Y outliers existed in the investigated data set. Nevertheless, there were five molecules (compound 18, 20, 32, 37 and 26) with a leverage value higher than the warning leverage limit (0.581), but their predicted values were very satisfactory, with standard residuals lower than 1.0 standard deviation units. Hence, these molecules were not influential in the fitting performance of the model. Conversely, it further showed the reliability of the predictions of the generated model as well as confirmed its good generalization ability [19]. Therefore, compounds with high value of leverage and good fitting in the developed model can stabilize the model, and not be considered as X outliers.
As can be seen from the above results, the MLR model was significantly highly predictive, reliable and robust. It can be used to predict the FXR agonist activity of new AAD.

Interpretation of the Descriptors
The MLR model encompassed five descriptors: b_rotN, RPC − , opr_leadlike, SlogP_VSA2 and ASA, indicating some vital physicochemical features of AAD to govern the FXR agonist activity. The relative importance of the descriptors in the model was varied in view of their standardized regression coefficients shown in Table 1 [19]. Therefore, the relative importance order is ASA > SlogP_VSA2 > b_rotN > RPC − > opr_leadlike. ASA is the water accessible surface area calculated using a radius of 1.4 A for the water molecule. This showed that the water accessible surface area of FXR agonists might influence their agonist activity. Its positive coefficient value indicated that high polar groups tend to increase the agonist activity. For instance, it can be observed from the agonist activity of compounds 9 (having 3-cyanophenyl with pEC50 = 6.638) and 10 (having 3-methoxyphenyl with pEC50 = 6.420) or 1 (having 3-carboxyphenyl with pEC50 = 6.553) and 8 (having acetylphenyl with pEC50 = 6.319) in Table 3. Slogp_VSA2 is the subdivided surface area descriptor, which is based on the sum of the approximate accessible van der Waal's surface area, calculated for each atom with contribution to the log of the partition coefficient (octanol/water) in the range of (−0.2,0). Its negative coefficient value indicated that high hydrophobicity tended to decrease the agonist activity. Obviously, the bioactivity of molecules with aliphatic chains at region A in Table 3, such as 34 (with pEC50 of 5.602) or 35, 36, 37 and 38 (with pEC50 of 5.066-5.357), was lower than those without aliphatic chains such as 1 (with pEC50 of 6.553) or 39 and 40 (with pEC50 of 5.824-6.000). B_rotN is the number of rotatable single bonds. Its positive coefficient illustrated that more b_rotN was favorable to the FXR agonist activity. For instance, the bioactivity of compounds 15 and 16 or 35, 36 and 38 are varied in order: 16 (having b_rotN of 10) > 15 (having b_rotN of 9) or 38 (having b_rotN of 11) > 36 (having b_rotN of 10) > 35 (having b_rotN of 9). RPC − is a relative negative partial charge descriptor that depends on the partial charge of each atom of a chemical structure. The positive sign of this descriptor illustrated that the relative negative partial charge of the molecule was favorable for the agonist activity. It can be quickly understood by comparing molecules 13 (having -CONH2 headgroup with pEC50 = 7.131) and 1 (having -COOH headgroup with pEC50 = 6.553). Opr_leadlike belongs to atom count and bond count descriptors that refer to the number of violations of the Oprea's lead-like test. The positive contribution of this descriptor indicated that the high value of opr_leadlike was beneficial to the bioactivity.
Therefore, high polar groups such as the acidic headgroup together with high values of RPC − and b_rotN are favorable for FXR agonist activity. Further, the aliphatic chain has a negative effect on it.

Interpretation of the Descriptors
The MLR model encompassed five descriptors: b_rotN, RPC´, opr_leadlike, SlogP_VSA2 and ASA, indicating some vital physicochemical features of AAD to govern the FXR agonist activity. The relative importance of the descriptors in the model was varied in view of their standardized regression coefficients shown in Table 1 [19]. Therefore, the relative importance order is ASA > SlogP_VSA2 > b_rotN > RPC´> opr_leadlike. ASA is the water accessible surface area calculated using a radius of 1.4 A for the water molecule. This showed that the water accessible surface area of FXR agonists might influence their agonist activity. Its positive coefficient value indicated that high polar groups tend to increase the agonist activity. For instance, it can be observed from the agonist activity of compounds 9 (having 3-cyanophenyl with pEC 50 = 6.638) and 10 (having 3-methoxyphenyl with pEC 50 = 6.420) or 1 (having 3-carboxyphenyl with pEC 50 = 6.553) and 8 (having acetylphenyl with pEC 50 = 6.319) in Table 3. Slogp_VSA2 is the subdivided surface area descriptor, which is based on the sum of the approximate accessible van der Waal's surface area, calculated for each atom with contribution to the log of the partition coefficient (octanol/water) in the range of (´0.2,0). Its negative coefficient value indicated that high hydrophobicity tended to decrease the agonist activity. Obviously, the bioactivity of molecules with aliphatic chains at region A in Table 3, such as 34 (with pEC 50 of 5.602) or 35, 36, 37 and 38 (with pEC 50 of 5.066-5.357), was lower than those without aliphatic chains such as 1 (with pEC 50 of 6.553) or 39 and 40 (with pEC 50 of 5.824-6.000). B_rotN is the number of rotatable single bonds. Its positive coefficient illustrated that more b_rotN was favorable to the FXR agonist activity. For instance, the bioactivity of compounds 15 and 16 or 35, 36 and 38 are varied in order: 16 (having b_rotN of 10) > 15 (having b_rotN of 9) or 38 (having b_rotN of 11) > 36 (having b_rotN of 10) > 35 (having b_rotN of 9). RPC´is a relative negative partial charge descriptor that depends on the partial charge of each atom of a chemical structure. The positive sign of this descriptor illustrated that the relative negative partial charge of the molecule was favorable for the agonist activity. It can be quickly understood by comparing molecules 13 (having -CONH 2 headgroup with pEC 50 = 7.131) and 1 (having -COOH headgroup with pEC 50 = 6.553). Opr_leadlike belongs to atom count and bond count descriptors that refer to the number of violations of the Oprea's lead-like test. The positive contribution of this descriptor indicated that the high value of opr_leadlike was beneficial to the bioactivity.
Therefore, high polar groups such as the acidic headgroup together with high values of RPC´and b_rotN are favorable for FXR agonist activity. Further, the aliphatic chain has a negative effect on it.

CoMFA Analysis
To graphically visualize the key chemical structural features that attributed to enhance the FXR agonist activity, CoMFA models were derived. The results of the CoMFA studies are listed in Table 4. The optimum number of components and filtering value for the CoMFA models were four and six, which were calculated by selecting the highest Q 2 LOO value. The generated CoMFA model illustrated a  Table 4). The pEC 50 values predicted by the CoMFA model are listed in Table 3. Figure 1B demonstrates the correlation between experimental and predicted pEC 50 values by the CoMFA model.

CoMFA Contour Maps
The steric and electrostatic contour maps derived by the CoMFA model based on the reference molecule (compound 30) are shown in Figure 3. The steric interactions are represented by green and yellow contours, while electrostatic interactions are represented by red and blue contours. In the green region of the steric contour plot, bulky substitutes enhance biological activity, while in the yellow regions, these are likely to decrease the activity. Blue contours represent regions where positive charge increases activity, whereas red-colored regions represent areas where negative charge enhances activity [17]. The three regions A, B, and C of compound 30 are depicted in Figure 4.

CoMFA Analysis
To graphically visualize the key chemical structural features that attributed to enhance the FXR agonist activity, CoMFA models were derived. The results of the CoMFA studies are listed in Table 4. The optimum number of components and filtering value for the CoMFA models were four and six, which were calculated by selecting the highest Q 2 LOO value. The generated CoMFA model illustrated a  Table 4). The pEC50 values predicted by the CoMFA model are listed in Table 3. Figure 1B demonstrates the correlation between experimental and predicted pEC50 values by the CoMFA model.

CoMFA Contour Maps
The steric and electrostatic contour maps derived by the CoMFA model based on the reference molecule (compound 30) are shown in Figure 3. The steric interactions are represented by green and yellow contours, while electrostatic interactions are represented by red and blue contours. In the green region of the steric contour plot, bulky substitutes enhance biological activity, while in the yellow regions, these are likely to decrease the activity. Blue contours represent regions where positive charge increases activity, whereas red-colored regions represent areas where negative charge enhances activity [17]. The three regions A, B, and C of compound 30 are depicted in Figure 4.

CoMFA Analysis
To graphically visualize the key chemical structural features that attributed to enhance the FXR agonist activity, CoMFA models were derived. The results of the CoMFA studies are listed in Table 4. The optimum number of components and filtering value for the CoMFA models were four and six, which were calculated by selecting the highest Q 2 LOO value. The generated CoMFA model illustrated a Q 2 LOO value of 0.802 (>0.5) by four components (RMSELOO = 0.383). The non-cross-validated PLS analysis with the four components resulted in R 2 train of 0.944, F value of 109.711 and RMSEtrain of 0.203 and R 2 test of 0.892. The contributions of steric and electrostatic fields calculated by the CoMFA model were 49.7% and 50.3% of the variance, respectively. The obtained high R 2 train, Q 2 train and F values along with the lower RMSEtrain indicated the satisfactory predictive ability of the derived model ( Table 4). The pEC50 values predicted by the CoMFA model are listed in Table 3. Figure 1B demonstrates the correlation between experimental and predicted pEC50 values by the CoMFA model.

CoMFA Contour Maps
The steric and electrostatic contour maps derived by the CoMFA model based on the reference molecule (compound 30) are shown in Figure 3. The steric interactions are represented by green and yellow contours, while electrostatic interactions are represented by red and blue contours. In the green region of the steric contour plot, bulky substitutes enhance biological activity, while in the yellow regions, these are likely to decrease the activity. Blue contours represent regions where positive charge increases activity, whereas red-colored regions represent areas where negative charge enhances activity [17]. The three regions A, B, and C of compound 30 are depicted in Figure 4.   As shown in Figure 3A, there are two large yellow contours near regions A and C, one medium yellow contour near region B and one small yellow contour near region A, indicating that the bioactivity of molecules would be influenced by the introduction of bulky groups near these regions. According to Table 3, this can be explained by a comparison between molecules 21 (having -CH 3 group at position 4 of region A with pEC 50 = 7.347) and 32 (having -OCH 3 at position 4 of region A with pEC 50 = 7.328). The small yellow contour near position 6 of region A also suggested that the agonist activity would be decreased by the introduction of bulky groups here, such as compounds (18,20,19,1) where the use of bulky groups (-OCH 3 > -Cl > -F > -H) resulted in lower pEC 50 values (5.328 < 5.959 < 6.319 < 6.553). A medium yellow contour near R 2 at region B indicated that bulky groups here would cause lower activity. This can be observed by the comparison of molecules 27 (substituted by -CH 3 with pEC 50 value of 7.367) and 32 (substituted by -OCH 3 with pEC 50 value of 7.060), where the volume of -CH 3 was smaller than -OCH 3 . This can also be observed by a comparison of compounds 25 (substituted by -Cl with pEC 50 value of 7.328) and 29 (substituted by -Br with pEC 50 value of 7.319). Another large yellow contour near region C showed that bulky groups at positions 3 and 5 of region C would lead to lower activity. For instance, the agonist activity of compounds 35 or 40 (substituted by naphthalen-2-yl) and 34 or 1 (substituted by 4-tert-butylphenyl) was varied in the order: 35 < 34 or 40 < 1. One small green contour near positions 2 and 3 of region A indicated that FXR agonist activity would be enhanced by introduction of bulky groups here. This can be observed by comparing molecules 17 (having -CH 3 group at position 2 of region A) and 1 (having -H group at position 2 of region A), where using a bulky group influenced the outcome of pEC 50 values (7.377 > 6.553). This can also be observed by comparing the bioactivity of molecules 16 and 15, where using bulky groups (-CH 2 CH 2 COOH > -CH 2 COOH) at position 3 of region A lead to higher pEC 50 values (7.194 > 6.377). Another two small green contours near R 3 substituent groups at region C showed the favorable effect of bulky groups here in increasing the biological activity of compounds. For instance, this can be explained by comparing the activity of compounds 1 (substituted by -C(CH 3 ) 3 with pEC 50 value of 6.553) and 2 (substituted by -CF 3 with pEC 50 value of 5.161) or compounds 34 (substituted by -C(CH 3 ) 3 with pEC 50 value of 5.602) and 33 (substituted by -CH 2 CH 3 group with pEC 50 value of 5.237).
As can be seen from Figure 3B, there was one medium blue contour near positions 2 and R 1 of region A, which showed the favorable effect of electro-donating groups in increasing the biological activity of compounds. For instance, it can be observed by molecules 17 (having -CH 3 group at position 2 of region A with pEC 50 value of 7.377) and 1 (having -H group at position 2 of region A with pEC 50 value of 6.553). This also can be explained by comparing the activity of compounds 13, 1 and 8, where using electro-donating substituents at R 1 (-NH 2 > -OH > -CH 3 ) would result in higher pEC 50 values (7.131 > 6.553 > 6.319).

Chemical Structure Design
According to the information derived from the contour maps generated by the 3D-QSAR models, some important information about the chemical structures requirement was presented to investigate the effect of each kind of group as the substituent for regions A, B and C on FXR agonist activity. The bulky groups with lower electronegativity at positions 2 and R 1 substituent of region A together with bulky groups at R 3 of region C were considered to enhance the FXR agonist activity. However, the presence of bulky groups at positions 4 and 6 of region A, R 2 of region B and 3 and 5 of region C would decrease the agonist activity. Therefore, some new compounds as potent FXR agonists were designed and are listed in Table 6. To investigate the results of each substituent on the activity results, CoMFA was the best modeling tool for use. The newly designed compounds showed that the bulky groups with lower electronegativity at R 1 of region A had positive effects. This can be observed by comparing compound N3 (having -N(CH 3 ) 2 at R 1 of region A with predicted pEC 50 value of 8.322) with template compound T30 (having -OH at R 1 of region A with predicted and actual pEC 50 values of 8.175 and 8.097, respectively). The next attempt was to improve the effects of functional groups at R 2 of region B where bulky effects were presented. It was observed that the addition of less bulky groups (such as -CH 3 ) at R 2 of region B (Table 6) can lead to better agonist activity (compound N2 with predicted pEC 50 value of 8.323). Then, to investigate the bulky effects at R 3 of region C, different bulky groups were tried. This can be observed by comparing compounds N1, N2, N4 (having -C(CF 3 ) 3 , -C(CH 3 ) 3 and -CI 3 at R 3 of region C with predicted pEC 50 values of 8.350, 8.323 and 8.304, respectively). Finally, to observe the effects of the addition of lower electronegativity groups at positions 2 of region A, the donor groups were investigated. It can be seen that using donor group substituent (-H < -CH 3 < -OH) at R 2 of region A would lead to an increase in the predicted pEC 50 values in the compounds: N7 (having -CH 3 with pEC 50 = 8.374), N9 (having -OH with pEC 50 = 8.388) and N1 (having -H with pEC 50 = 8.350). Among the designed compounds, N9 presented the highest activity with a pEC 50 value of 8.388. To understand the origin of this increase in activity, compounds T30 and N3 can be compared. groups at R 2 of region B where bulky effects were presented. It was observed that the addition of less bulky groups (such as -CH3) at R 2 of region B (Table 6) can lead to better agonist activity (compound N2 with predicted pEC50 value of 8.323). Then, to investigate the bulky effects at R 3 of region C, different bulky groups were tried. This can be observed by comparing compounds N1, N2, N4 (having -C(CF3)3, -C(CH3)3 and -CI3 at R 3 of region C with predicted pEC50 values of 8.350, 8.323 and 8.304, respectively). Finally, to observe the effects of the addition of lower electronegativity groups at positions 2 of region A, the donor groups were investigated. It can be seen that using donor group substituent (-H < -CH3 < -OH) at R 2 of region A would lead to an increase in the predicted pEC50 values in the compounds: N7 (having -CH3 with pEC50 = 8.374), N9 (having -OH with pEC50 = 8.388) and N1 (having -H with pEC50 = 8.350). Among the designed compounds, N9 presented the highest activity with a pEC50 value of 8.388. To understand the origin of this increase in activity, compounds T30 and N3 can be compared.

Molecular Docking Study
These molecules were ideally best based on their chemical structures, physicochemical properties and biological activity. Hence, molecular docking embedded in Molecular Operating Environment (MOE2008.10, Chemical Computing Group, Montreal, QC, Canada) was applied to study the binding modes and important interactions.
Prior to the docking, the crystal structure of FXR complexed with benzimidazole-based partial agonistic ligand was first downloaded from a protein data bank (PDB: 3OLF). The protein was protonated using the AMBER99 force field. A set of possible conformations of nine newly designed molecules was prepared by the conformational generation function of MOE. Then, molecular docking was carried out by following parameters: the binding site was defined by the ligand atom mode; triangle matcher was used as a placement method; two rescoring methods were computed, rescoring 1 was selected as London dG; rescoring 2 was selected as affinity; force field was used as a refinement [21].
A critical factor that determines the effectiveness of a docking program is its ability to reproduce ligand poses in the receptor as close to those found in X-ray deduced structures as possible [22]. In this docking study, the root-mean-square distance (RMSD) parameter measured between the complexed ligand and the redocked ligand was 0.5749 Å, suggesting that the docking results were very suitable and reliable. Docking results are listed in Table 6. Obviously, these newly designed

Molecular Docking Study
These molecules were ideally best based on their chemical structures, physicochemical properties and biological activity. Hence, molecular docking embedded in Molecular Operating Environment (MOE2008.10, Chemical Computing Group, Montreal, QC, Canada) was applied to study the binding modes and important interactions.
Prior to the docking, the crystal structure of FXR complexed with benzimidazole-based partial agonistic ligand was first downloaded from a protein data bank (PDB: 3OLF). The protein was protonated using the AMBER99 force field. A set of possible conformations of nine newly designed molecules was prepared by the conformational generation function of MOE. Then, molecular docking was carried out by following parameters: the binding site was defined by the ligand atom mode; triangle matcher was used as a placement method; two rescoring methods were computed, rescoring 1 was selected as London dG; rescoring 2 was selected as affinity; force field was used as a refinement [21].
A critical factor that determines the effectiveness of a docking program is its ability to reproduce ligand poses in the receptor as close to those found in X-ray deduced structures as possible [22]. In this docking study, the root-mean-square distance (RMSD) parameter measured between the complexed ligand and the redocked ligand was 0.5749 Å, suggesting that the docking results were very suitable and reliable. Docking results are listed in Table 6. Obviously, these newly designed compounds had higher docking scores for FXR than the known template compound T30, which was in agreement with the 2D-and 3D-QSAR results. The best docked orientation of compounds is shown in Figure 5, showing that newly designed compounds can be well docked into the ligand binding site of FXR. The best docked conformation of the most active compound N9, as shown in Figure 6A, revealed that the presence of perfluoroalkyl chain substituted groups at region C allowed for potentiation of strong hydrophobic interactions with Met369, Leu291, Trp458, Met454, Ile361, Leu455 and Phe333 in the active site of FXR and formed two H-bonds with Arg335 and Tyr373. Comparative molecular docking between compound N9 and the complexed ligand, shown in Figure 6, indicated that the former had a better binding score than the latter, suggesting that hydrophobic interactions between groups at region C with these amino acids played a dominant role in the FXR agonist activity. Thereby, the hydrophobic interaction of groups at region C seems to stabilize the compound within the binding site, thus contributing greater activity. compounds had higher docking scores for FXR than the known template compound T30, which was in agreement with the 2D-and 3D-QSAR results. The best docked orientation of compounds is shown in Figure 5, showing that newly designed compounds can be well docked into the ligand binding site of FXR. The best docked conformation of the most active compound N9, as shown in Figure 6A, revealed that the presence of perfluoroalkyl chain substituted groups at region C allowed for potentiation of strong hydrophobic interactions with Met369, Leu291, Trp458, Met454, Ile361, Leu455 and Phe333 in the active site of FXR and formed two H-bonds with Arg335 and Tyr373. Comparative molecular docking between compound N9 and the complexed ligand, shown in Figure 6, indicated that the former had a better binding score than the latter, suggesting that hydrophobic interactions between groups at region C with these amino acids played a dominant role in the FXR agonist activity. Thereby, the hydrophobic interaction of groups at region C seems to stabilize the compound within the binding site, thus contributing greater activity.

Data Set
The structures and biological activities of 41 AAD as FXR agonists used for the QSAR analyses were taken from Merk et al. [8,9] and are listed in Table 3. The agonist activity (EC50) value was converted to a logarithmic-scale pEC50 value, which was taken as the dependent parameter for the QSAR study. In order to establish a reliable model, the data set was randomly divided into two subsets, a training set of 31 compounds (approximately 75% of the data) that represented a wide compounds had higher docking scores for FXR than the known template compound T30, which was in agreement with the 2D-and 3D-QSAR results. The best docked orientation of compounds is shown in Figure 5, showing that newly designed compounds can be well docked into the ligand binding site of FXR. The best docked conformation of the most active compound N9, as shown in Figure 6A, revealed that the presence of perfluoroalkyl chain substituted groups at region C allowed for potentiation of strong hydrophobic interactions with Met369, Leu291, Trp458, Met454, Ile361, Leu455 and Phe333 in the active site of FXR and formed two H-bonds with Arg335 and Tyr373. Comparative molecular docking between compound N9 and the complexed ligand, shown in Figure 6, indicated that the former had a better binding score than the latter, suggesting that hydrophobic interactions between groups at region C with these amino acids played a dominant role in the FXR agonist activity. Thereby, the hydrophobic interaction of groups at region C seems to stabilize the compound within the binding site, thus contributing greater activity.

Data Set
The structures and biological activities of 41 AAD as FXR agonists used for the QSAR analyses were taken from Merk et al. [8,9] and are listed in Table 3. The agonist activity (EC50) value was converted to a logarithmic-scale pEC50 value, which was taken as the dependent parameter for the QSAR study. In order to establish a reliable model, the data set was randomly divided into two subsets, a training set of 31 compounds (approximately 75% of the data) that represented a wide

Data Set
The structures and biological activities of 41 AAD as FXR agonists used for the QSAR analyses were taken from Merk et al. [8,9] and are listed in Table 3. The agonist activity (EC 50 ) value was converted to a logarithmic-scale pEC 50 value, which was taken as the dependent parameter for the QSAR study. In order to establish a reliable model, the data set was randomly divided into two subsets, a training set of 31 compounds (approximately 75% of the data) that represented a wide range of varied structures and a test set of 10 compounds (approximately 25%) that followed the distribution of the activity values for the training set [14]. The training set is to build models, while the test set marked by the asterisk in Table 3 will be used to evaluate the prediction ability of the final training set model.

Descriptors Calculation
All 2D structures of the molecules in Table 3 were sketched and their 3D structures were subjected to energy minimization using the molecular mechanics force field (MMFF) method with a convergence criterion of 0.01 kcal/mol and partial atomic charges. The final geometry optimization of each energy-minimized structure was carried out by stochastic conformational search. Then, only the lowest energy conformer of each structure was used to calculate 327 descriptors by employing the QuaSAR module of MOE [23]. These calculated descriptors include three classes: 2D descriptors, which use the atoms and connection information of the molecules, internal 3D (i3D), which uses 3D coordinate information about each molecule and external 3D (x3D), which uses 3D coordinate information with an absolute frame of reference. All the above processes were performed using MOE2008.10 package.

Stepwise Multiple Linear Regression (SW-MLR)
In this study, a stepwise technology combined with MLR (SW-MLR) was employed to select a set of the most relevant descriptors for model constructions. The stepwise regression combines forward and backward selections. It selects statistically meaningful variables that can appreciably increase the residual sum of squares checked by the Fisher test [24]. Therefore, different MLR models will be derived in this procedure. The selection of a good MLR equation is made by statistical parameters such as the squared correlation coefficient (R 2 ), root mean standard error (RMSE), and Fisher statistic [25]. The best MLR model should have high R 2 and Fisher statistic, and low RMSE.

Molecular Alignment
The 3D-QSAR model was constructed by CoMFA embedded in SYBYL 6.9. Because the prediction accuracy of CoMFA is highly dependent on the structural alignment of the molecules with a reference compound, the selection of the template molecule plays an important role in performing CoMFA. Generally, the lowest energy conformer of the most active compound is selected as a template molecule for superimposition of all other compounds [26]. Therefore, compound 30 (as shown in Figure 4) was identified as a reference molecule in view of its highest activity, and all of the remaining compounds were aligned on it to derive CoMFA models. The structures of the aligned molecules are demonstrated in Figure 7. range of varied structures and a test set of 10 compounds (approximately 25%) that followed the distribution of the activity values for the training set [14]. The training set is to build models, while the test set marked by the asterisk in Table 3 will be used to evaluate the prediction ability of the final training set model.

Descriptors Calculation
All 2D structures of the molecules in Table 3 were sketched and their 3D structures were subjected to energy minimization using the molecular mechanics force field (MMFF) method with a convergence criterion of 0.01 kcal/mol and partial atomic charges. The final geometry optimization of each energy-minimized structure was carried out by stochastic conformational search. Then, only the lowest energy conformer of each structure was used to calculate 327 descriptors by employing the QuaSAR module of MOE [23]. These calculated descriptors include three classes: 2D descriptors, which use the atoms and connection information of the molecules, internal 3D (i3D), which uses 3D coordinate information about each molecule and external 3D (x3D), which uses 3D coordinate information with an absolute frame of reference. All the above processes were performed using MOE2008.10 package.

Stepwise Multiple Linear Regression (SW-MLR)
In this study, a stepwise technology combined with MLR (SW-MLR) was employed to select a set of the most relevant descriptors for model constructions. The stepwise regression combines forward and backward selections. It selects statistically meaningful variables that can appreciably increase the residual sum of squares checked by the Fisher test [24]. Therefore, different MLR models will be derived in this procedure. The selection of a good MLR equation is made by statistical parameters such as the squared correlation coefficient (R 2 ), root mean standard error (RMSE), and Fisher statistic [25]. The best MLR model should have high R 2 and Fisher statistic, and low RMSE.

Molecular Alignment
The 3D-QSAR model was constructed by CoMFA embedded in SYBYL 6.9. Because the prediction accuracy of CoMFA is highly dependent on the structural alignment of the molecules with a reference compound, the selection of the template molecule plays an important role in performing CoMFA. Generally, the lowest energy conformer of the most active compound is selected as a template molecule for superimposition of all other compounds [26]. Therefore, compound 30 (as shown in Figure 4) was identified as a reference molecule in view of its highest activity, and all of the remaining compounds were aligned on it to derive CoMFA models. The structures of the aligned molecules are demonstrated in Figure 7.

CoMFA Modeling
After aligning the molecules within the lattice that extended 4 Å units beyond the align molecules in all directions, an sp 3 hybridized carbon was utilized as a probe atom to generate the steric and electrostatic fields with a charge of +1.0 and a van der Waals radius of 1.52 Å. The steric and electrostatic contributions were set as a default cut-off energy value of 30 kcal/mol. A partial least-squares (PLS) method, an extension of multiple regression analysis, was applied to calculate the minimal set of grid points and then linearly correlate the CoMFA fields to the pEC 50 values in order to generate the CoMFA model [27].

Model Validation
The predictive ability and reliability of 2D-and 3D-QSAR models were evaluated by internal and external validations. The leave-one-out (LOO) cross-validation technology is often considered as the best way to internally validate the quality of derived models [28]. The LOO produces a number of models by deleting one object from the training set, which employs all the information available. Generally, when the value of LOO crossed validated correlation coefficient (Q 2 LOO ) goes over a threshold value of 0.5, the model is acceptable [29]. In addition, external validation is also essential and significant to evaluate the generalization performance of the proposed model [25,30]. The statistical parameters, such as the root mean square errors (RMSE test ) and the squared correlation coefficient (R 2 test ) of the external test set were calculated to assess the performance of the model [31].
All algorithms were written in MATLAB 8.0 and run on a computer [Intel(R) Pentium(R), 2.00-GB RA].

Conclusions
In this paper, a three level in silico approach was applied to investigate some important structural and physicochemical aspects of highly potent partial FXR agonists. Initially, 2D-QSAR using methods of both SW-MLR and 3D-QSAR CoMFA studies was performed based on forty-one AAD. Satisfactory results were obtained with the proposed methods. The best derived 2D-QSAR model by SW-MLR can explain 93.5% of the variance in activity with a low RMSE of 0.219. In addition, the 2D-QSAR study demonstrated that b_rotN, RPC´, opr_leadlike, SlogP_VSA2, ASA of molecules had high correlation with the FXR agonist activity. Meanwhile, the best 3D-QSAR model presented higher predictive ability (R 2 train = 0.944, RMSE train = 0.203, Q 2 LOO = 0.802, R 2 test = 0.892) compared with the 2D-QSAR models. The derived contour maps from the 3D-QSAR model suggested the significant structural features (steric and electronic effects) required for improving biological activity. Consequently, the bulky groups with lower electronegativity at positions 2 and R 1 substituent of region A together with bulky groups at R 3 of region C were considered to enhance the FXR agonist activity. However, the presence of bulky groups at positions 4 and 6 of region A, R 2 of region B and 3 and 5 of region C would decrease the agonist activity. Therefore, the obtained 2D-and 3D-QSAR models could provide valuable guidance for future design of new potent partial FXR agonists with an anthranilic acid skeleton in the drug discovery process. Finally, nine newly designed AAD with predicted pEC 50 values higher than the known template compound were docked to the ligand binding domain of FXR. The molecular binding patterns and docking scores of these nine molecules also suggested that they can be robust and potent partial FXR agonists in agreement with the QSAR results. This also revealed that the hydrophobic interaction of groups at region C with Met369, Leu291, Trp458, Met454, Ile361, Leu455 and Phe333 seemed to stabilize the compound within the binding site, thus contributing greater activity. To the best of our knowledge, this work constituted the first in silico study for AAD as partial FXR agonists.