Application of 4D-QSAR Studies to a Series of Raloxifene Analogs and Design of Potential Selective Estrogen Receptor Modulators

Four-dimensional quantitative structure-activity relationship (4D-QSAR) analysis was applied on a series of 54 2-arylbenzothiophene derivatives, synthesized by Grese and coworkers, based on raloxifene (an estrogen receptor-alpha antagonist), and evaluated as ERα ligands and as inhibitors of estrogen-stimulated proliferation of MCF-7 breast cancer cells. The conformations of each analogue, sampled from a molecular dynamics simulation, were placed in a grid cell lattice according to three trial alignments, considering two grid cell sizes (1.0 and 2.0 Å). The QSAR equations, generated by a combined scheme of genetic algorithms (GA) and partial least squares (PLS) regression, were evaluated by “leave-one-out” cross-validation, using a training set of 41 compounds. External validation was performed using a test set of 13 compounds. The obtained 4D-QSAR models are in agreement with the proposed mechanism of action for raloxifene. This study allowed a quantitative prediction of compounds’ potency and supported the design of new raloxifene analogs.

Estrogen replacement therapy, in postmenopausal women, prevents osteoporosis and has advantageous effects on the cardiovascular system [4][5][6]. However, this therapy can increase the risk of hormone-dependent breast and uterine cancer [7]. In this regard, alternative treatments have been studied and new drugs have been developed, such as those agents classified as Selective Estrogen Receptor Modulators (SERMs), because they act, according to the target tissue, as estrogen agonists and antagonists, e.g., inhibiting bone resorption and breast cancer growth [1,2,8]. Details about SERMs' mechanism of action are described by Dutertre and Smith [8]. Tamoxifen (II, Figure 1), the first-generation SERM approved by the U.S. Food and Drug Administration (FDA) agency and the most widely used drug for breast cancer treatment and prevention, displays estrogen agonist effects in bone tissue and cardiovascular system, and acts as an estrogen antagonist in breast but, on the other hand, it manifests partial estrogen agonist activity in the uterus [1,9,10]. Tamoxifen is mainly metabolized to 4-hydroxytamoxifen (III, Figure 1), an active metabolite, which has greater ER affinity than the parent drug [4]. Raloxifene (IV, Figure 1), a second-generation SERM approved by the FDA agency for prevention of osteoporosis in postmenopausal women, is distinguished from tamoxifen by its lack of proliferative effects in uterine tissue [11,12] and is being evaluated to treat and prevent breast cancer, osteoporosis, and cardiovascular diseases [13,14].
Grese and co-workers [11] described four important structural features of raloxifene when compared to tamoxifen and 4-hydroxytamoxifen ( Figure 1): (i) the two phenolic hydroxyl groups; (ii) different substituents on the basic aliphatic amine group; (iii) incorporation of the stilbene moiety into the 2-phenyl-1-benzothiophene framework; and (iv) the carbonyl "hinge" between the side chain containing the basic aliphatic amine and the benzothiophene aromatic ring. These structural elements give raloxifene a distinct molecular conformation, which may affect the conformation of the raloxifene-ER complex, probably being responsible for the unique tissue selectivity observed for this compound.
The crystal structures of ER in complex with 17-estradiol (the endogenous estrogen) and raloxifene [3] are available in the Protein Data Bank (PDB) [15]. These crystallographic structures show that raloxifene (PDB code: 1ERR) binds at the ligand binding domain (LBD), i.e., the same binding site of estradiol (PDB code: 1ERE) ( Figure 2) [3]. Raloxifene is anchored to the LBD of ER by a direct hydrogen-bonding network comprising ( Figure 2): the nitrogen atom (feature II, Figure 1) of the piperidine ring with Asp351; the phenolic hydroxyl group (feature Ia) with Glu353 and Arg394; and the phenolic hydroxyl group (feature Ib) with the imidazole nitrogen atom of His524 [3]. The side chain of raloxifene makes extensive hydrophobic interactions, but it is too long (over 11 Å) to be contained within the confines of the binding cavity. Therefore, the piperidine moiety displaces helix H12 and protrudes out of the pocket between helices H3 and H11 (Figure 2). Brzozowski and co-workers [3] argue that the antagonistic response of raloxifene is based on its ability to prevent the formation of an active conformation of the raloxifene-ER complex, which is dependent on H12.
Based on the structure of raloxifene, Grese and coworkers [16] synthesized and evaluated a series of 2-arylbenzothiophene derivatives as ligands of the ER and as inhibitors of the MCF-7 breast cancer cell proliferation in vitro. In the current work, we developed four-dimensional quantitative structure-activity relationship (4D-QSAR) [17][18][19][20][21] models for this series of 2-arylbenzothiophene derivatives, in order to better understand the mechanism of action of this class of compounds and to design new raloxifene analogs as potential selective estrogen receptor modulators.

Grid Cell Size and Alignment Evaluation
In order to identify the best grid cell size and alignment, we plotted the adjusted cross-validated R 2 (Q 2 adj ) values versus the number of terms included in the corresponding equation, according to the two grid cell sizes (2.0 and 1.0 Å) and the three alignments considered ( Figure 3). Besides, to define the number of descriptors that should be included in a good predictive model, we analyzed models with seven, eight, and no more than nine terms, avoiding possible data overfitting [22].
The best models generated by 1.0 Å grid cell are more predictive (higher Q 2 adj values) than the best models from 2.0 Å grid cell (Figure 3), irrespective to the alignment. Although alignment 3 had shown good performance, a preliminary analysis of those models demonstrated that the spatial localization of their selected descriptors (GCODs) (data not shown) is not consistent with the ER modulators action mechanism. Therefore, only alignments 1 and 2, obtained with a grid cell size of 1.0 Å, will be discussed from this point forward.

Best Models from Alignment 1
The best models 1B7 and 1B9 (1.0 Å grid cell) are described in Table 1. Model 1B8 was eliminated from the analysis because it presented a low Q 2 adj value (<0.5) (Figure 3). In order to determine if the information in models 1B7 and 1B9 is redundant, the correlation coefficient (R) of their residuals was calculated (i.e., pIC 50 experimental-pIC 50 calculated). Equivalent models typically have nearly the same distribution of residuals (i.e., R ≈ 1) and independent models will have nearly uncorrelated residuals (i.e., R ≈ 0) [23]. The results show that models 1B7 and 1B9 have some degree of correlation (R = 0.59), probably due to the presence of spatially identical grid cells with the same IPE type (Table 1), namely GCODs (2,11,4)(any), (−1,8,2)(any), and (1,13,1)(any). We also computed the cross-correlation matrix of the GCODs from models 1B7 and 1B9 (Table 2) to determine if two or more highly correlated GCODs appear in the same 4D-QSAR model. According to Table 2, except only for two pairs of cells, the other pairs of descriptors are poorly correlated (R < 0.5). This means that each of these descriptors contributes in different ways to the 4D-QSAR models [22]. The highest correlations occur between the pair of GCODs (−1,8,2)(any) and (1,13,1)(any) (R = 0.52) and also between the pair of GCODs (0,−2,−1)(any) and (0,1,−2)(any) (R = 0.47). The first pair of GCODs may be found in both models (1B7 and 1B9), while the second pair is found only in model 1B9.  The cross-correlation matrix of the experimental pIC 50 values and the frequency of grid cell occupancy of models 1B7 and 1B9 were calculated ( Table 2). It has been demonstrated that the highest individual correlation with activity, except only for GCOD (1,11,−2)(any), is shown by GCODs Outliers were defined as those compounds whose residuals are higher than twice the standard deviation of the residual of fit (SD res ). The standard deviations were computed for the residuals of all 41 compounds of the training set for models 1B7 (SD res = 0.53) and 1B9 (SD res = 0.38). The results show that model 1B9 presents lower SD res value and, consequently, only compound 9 was defined as an outlier. On the other hand, model 1B7 shows a higher SD res value and presents three outliers (namely, compounds 2, 6, and 13).

Graphical Analysis of the Representative Model of Alignment 1
Since Model 1B9, as described in Equation 1 (Figure 4), exhibits higher R 2 adj and Q 2 adj values, and fewer number of outliers than Model 1B7, it was selected as the representative model. According to the cross-correlation matrix of the Model 1B9 GCODs (Table 2), grid cells (1,13,1)(any) and (−1,8,2)(any) seem to partially supply (R = 0.52) the same type of structure-activity information, in spite of the fact that these cells are spatially distant (5.48 Å) and have opposite contributions. The GCOD (1,13,1)(any) occupation increases the compounds potency, while the GCOD (−1,8,2)(any) occupation decreases. Both descriptors are located close to the side chain of the arylbenzothiophenes, being related to the flexibility of this basic side chain. In fact, GCOD (1,13,1)(any) is close to the carbon atom (C4) of the piperidine ring, while GCOD (−1,8,2)(any) is close to the oxygen atom of the ethoxy-phenyl group ( Figure 3).
This correlation is probably due to an intramolecular hydrogen bonding (N···O distance = 3.02 Å), reinforced by a strong attractive electrostatic interaction between the corresponding amino and ethoxyl groups. This interaction causes a preferential synclinal conformation of this lateral chain, involving the four atoms of the N-C-C-O moiety. Therefore, since GCOD (1,13,1)(any) is the descriptor that most contributes to increase the potency (coefficient value = 15.96, Equation 1, Figure 4), this fact indicates the relevance of the synclinal conformation of the basic side chain for the structure-activity relationship (SAR) of this series of compounds. The cells (2,11,4)(any) and (0,11,5)(any) are close to each other and present negative coefficients, being also related to the basic side chain, located near the carbon (C2) and nitrogen atoms of the piperidine ring, respectively ( Figure 4). Altogether, those four GCODs indicate a preferential orientation of the piperidine side chain. This group is involved in an intermolecular hydrogen bond, intensified by an electrostatic interaction with Asp351, which is corroborated by Wang and co-workers [24]. The authors employed a series of compounds, structurally related to raloxifene, in construction of a 2D-QSAR model and proposed that hydrogen bonds are important, but not an unique feature to binding affinity. That orientation is essential to increase or decrease the potency of the raloxifene analogs. Additionally, the basic side chain of raloxifene also makes extensive hydrophobic contacts with the alpha helixes H3, H5/6, H11 and the loop between the alpha helixes H11 and H12 [3], reinforcing the importance of the orientation and conformation of this basic side chain.
The X-ray crystal structure of the raloxifene-ER complex shows the benzothiophene ring of raloxifene surrounded by hydrophobic residues, such as Leu349, Ala350, Leu387, Leu391 and Phe404 [3]. Therefore, the occupation by any atom types in grid cell (0,−2,−1) decreases the potency of the compounds due to steric factors. This can be explained by the fact that compounds with bulky substituents, such as methoxyl or acetyl groups (e.g., 21, 28, 37, 39, 43, 47, 49, and 52), have a greater occupation frequency than compounds with less bulky substituents (e.g., 19, 23, 30, and 38). In addition, these substituents are not able to perform hydrogen bonding interactions with Glu353 and Arg394 or they can sterically impair these interactions.
Finally, the GCOD (1,6,−2)(np) ( Figure 4) is located in a 3D-box area that corresponds to the Ala350 residue. The negative coefficient of this GCOD (Figure 4) indicates that the occupation of this cell by non-polar atoms reduces the compound potency, probably due to steric factors. Although Figure 4 does not show clearly any atom of the ligands around this cell, some conformations and orientations adopted by the compounds during the MDS (data not shown) enable the carbon and hydrogen atoms of the piperidine ring to occupy this grid cell. It is important to notice that Figure 4 shows only one conformation, selected as the "bioactive" one, among the 2,000 conformers from the MDS of each compound, which leads to maximum potency according to Model 1B9. In fact compounds with substituents at position 2' of the phenyl ring have low occupation frequency (e.g., 3, 7, 8, and 16), as well as compounds with substituents at positions 4 and 5 of the benzothiophene ring (e.g., 13, 29, 31, 35, 43, 44, 45, 47, and 49). This fact indicates that such substituents try to maintain a favorable conformation of the basic side chain of the compounds to the antagonism towards ER. This additional characteristic has not been revealed by the LIV-3D-QSAR Model developed by Cunha and co-workers using this series of compounds [18].
The absence of any descriptors around the phenyl ring, specifically related to the 4'-OH group, which is responsible for the hydrogen bond interaction with the backbone atoms of His524, corroborates what previous SAR studies already demonstrated [11,16], i.e., the 6-OH group of the benzothiophenyl ring is more important for the biological activity than the 4'-OH group. However, the absence of any descriptors related to the 4'-position of the phenyl ring suggests some limitation of the model. Additionally, unlike observed in the LIV-3D-QSAR Model [18], the postulated "bioactive" conformation obtained in the 4D-QSAR Model 1B9 (Figure 4) is very similar to the one adopted by raloxifene in the X-ray co-crystal structure [3]. Comparing the compounds conformations from Model 1B9 with the raloxifene-ER X-ray structure, the nitrogen atom of the piperidine ring is in a very close position to that one observed in the crystal, being the distance among them of 1.33Å for 1 (the most potent), and of 0.29Å for 54 (the least potent). These distances were calculated after RMS superposition of these compounds conformations over the X-ray structure of raloxifene bound in the LBD of ER, according Alignment 1.
The SDres value for the training set was 0.38, which indicated compound 9 as an outlier ( Figure 5 and Table 3). The only structural difference between this compound and raloxifene (1) is an additional substituent, 3'-Cl, at the phenyl ring of 9. According to Model 1B9, the predicted potency for compound 9 is lower than the experimental one, probably due to some limitation of the model, that does not show descriptors around the phenyl ring (as stated previously for the 4' position of the phenyl ring), or because of the presence of only few compounds with 3'-substituents, leading the model to underestimate the potency. Another possible explanation for the outlier behavior of compound 9 could be lipophilicity, which was not considered as a descriptor in this 4D-QSAR model. The calculated LogP value (cLogP) for the isomers containing one hydroxyl group in any position of the phenyl and benzothiophenyl rings is identical, e.g., the cLogP value for compounds 1, 16, 31, 35, and 41 is 5.96 [26]. However, the insertion of a chlorine atom in the phenyl ring increases the lipophilicity, since the cLogP value of compound 9 is 6.63.
Wang and co-workers [24] results indicated that LogP may not be essential to binding affinity on estrogen receptor . However, the compounds have to be able to penetrate through cellular membranes. A 2D-QSAR [26] was performed to the same data set of this study, including LogP as a potential descriptor into the analysis. The authors concluded that the benzothiophene moiety is lipophilic enough to pass through this stage, which corroborate this hypothesis, since there is a significant positive correlation between potency and lipophilicity [25,26].  As described in the previous section, the test data set were used to accomplish a "real" prediction using the 4D-QSAR Model 1B9. The value of SD res found for the test data set was 0.38, indicating compound 51 as an outlier ( Table 3). The only structural difference between this compound and raloxifene (1) is the substitution of the 4'-OH of 1 by the 4'-OMe in 51. The potency of compound 51 was overestimated by Model 1B9. Again, this fact can be due to a limitation of the model, which does not present descriptors around the phenyl ring. Thus, it would not distinguish some putative unfavorable interaction of the methoxyl group with the neighboring residues ( Figure 5). Besides, classic QSAR studies demonstrated that there is a negative steric effect for 4'-substituents in the phenyl ring [26].

Best Models from Alignment 2
The best models (2B7, 2B8, and 2B9) from Alignment 2 were obtained by 1.0 Å grid cells ( Table 4). The cross-correlation matrix of the residues of the training set compounds according to models 2B7, 2B8, and 2B9 was calculated in order to eliminate models with the same type of information. High correlation among the residues of the three models (R > 0.5) is probably due to many identical cells, in terms of Cartesian coordinates and atom type occupation. The grid cells (−1,−2,11)(any) and (0,2,2)(ar) are present in all models, while the cells (0,10,4)(any) and (0,5,−3)(any) are present in models 2B7 and 2B8. In other words, models 2B7/2B8 (R = 0.80) present four identical cells, while models 2B7/2B9 (R = 0.73) and 2B8/2B9 (R = 0.58) present two identical cells.
The cross-correlation matrix between experimental biological activity values and the grid cell occupancy of models 2B7, 2B8, and 2B9 (Table 5) indicates that the cells which present the highest correlation with the activity, are those found in Model 2B9. These cells are (0,10,−2)(any) and (0,2,2)(ar), where the last one is present in the three models. Besides, those GCODs show poor correlation between themselves (R = 0.06). Table 4. Best models of Alignment 2 obtained from 1.0 Å grid cells.    Model 2B7 presents a higher correlation with the other models, especially with model 2B8 (R = 0.80). In a first analysis, Model 2B7 incorporates the other models quantitatively, because it presents more identical cells as compared to the other models. However, models 2B8 and 2B9 are less correlated between themselves (R = 0.58) and, at the same time, highly correlated to model 2B7. Therefore, an additional criteria was taken into account in order to select the representative Model of Alignment 2, i.e., the SD res value and the number of outliers. We observed that model 2B9 presents a lower value of SD res , in spite of presenting the same number of outliers considering the training set compounds. Besides, this model also possesses the highest values of R 2 adj and Q 2 adj , being selected as the most representative model of Alignment 2.

Graphical Analysis of the Representative Model of Alignment 2
Model 2B9 (described in Equation 2) was selected as the most representative model of Alignment 2, as previous reported in the selection criteria outlined above. According to the cross-correlation matrix of grid cell occupancy of model 2B9 (Table 5), the descriptors are nearly orthogonal and contribute in a different way to the 4D-QSAR models, except for only two pairs of GCODs. The GCODs (0,10,−2)(any) and (0,12,−1)(any) shows high correlation (R = 0.73). Although the GCOD (0,10,−2)(any) occupation increases the compounds potency and the GCOD (0,12,−1)(any) occupation decreases, these cells are close in space (distance of 2.24 Å), what would justify the correlation between them. The GCOD (0,10,−2)(any) shows an ambiguity, because it is located in an area of the 3D grid cell close to Asp351 ( Figure 6). Therefore, it is not expected that this GCOD occupation increases the compounds potency. It demonstrates that model 2B9 is unable to "predict" the presence of Asp351 and the attractive electrostatic interaction between this residue and the piperidine group of raloxifene, as can be noticed by visual inspection of the 3D structure of the raloxifene-ER complex (PDB code 1ERR [3]). This data may be used to rationalize the underestimation of the potency of compound 1, the most potent compound of the series under study.
The second pair of GCODs that shows medium correlation (R = 0.44) corresponds to GCODs (0,2,2)(ar) and (1,3,6)(hba). Unlike the previous case, those descriptors are distant in space (distance of 4.24Å) and both contribute to increase the potency. The GCOD (0,2,2)(ar) is located close to the carbon and hydrogen atoms at position 4 of the benzothiophene ring, while the GCOD (1,3,6)(hba) is located close to the oxygen atom of the carbonyl group. Those GCODs are related to the dihedral angle formed by the carbonyl and benzothiophenyl planes, indicating the importance of the coplanar orientation of the side chain, as described by Grese and coworkers [11]. The molecules that occupy these cells most frequently are those that present substituents at position 2' of the phenyl ring, e.g., compounds 3, 7, 8, and 16. Substituents at these positions generate steric repulsion with the oxygen atom of the carbonyl group, leading the side chain to adopt a non-coplanar orientation.
GCODs (−1,12,6)(any) and (0,11,3)(p−) are located close to the piperidine ring, corroborating the importance of the side chain orientation. The occupation of GCOD (−1,12,6) by any atom type decreases the potency of the compounds, displacing the nitrogen atom of the piperidine group from the favorable position, related to the hydrogen bonding with residue Asp351. The GCOD (0,11,3)(p−) occupation is also related to the nitrogen atom of the side chain, since substituents at position 4 of the benzothiophene ring or at position 3' of the phenyl group have a high occupation frequency at this cell, e.g., compounds 5, 9, 29, 35, and 44. This indicates that these substituents are able to maintain a favorable conformation of the basic side chain of the compounds to the antagonism on ER.
GCOD (0,0,−2)(any), located close to Glu353, indicates the importance of hydrogen bonding of this residue with the substituent at position 6 of the benzothiophene ring. This grid cell has a low frequency of occupation when there are no substituents at this position (e.g., compounds 23, 30 and 38) or the substituents are unable to perform this type of interaction (e.g., compound 19). The GCOD (−1,−2,11)(any), located close to position 4' of the phenyl group, is related to the nonpolar interactions around this ring. As this area is wrapped up by hydrophobic residues (e.g., Ile424, Gly521, and Leu525), compounds with bulky substituents at this position (e. g., compounds 22,  33, 36, 39, 43 and 51) show steric hindrance and, consequently, lower potency.
Three compounds of the training set were identified as outliers: 2, 49 and 52 (Table 3). The predicted activity for compound 2 is lower than the experimental one. The structural difference between this compound and raloxifene (1) is an additional substituent, 3'-F, at the phenyl ring of 2. Like Model 1B9, this fact can be due to the existence of few compounds with substituents at position 3' of the phenyl ring. Therefore, the model does not reveal the importance of the substitution pattern at this position. Compounds 49 and 52 show higher predicted potencies than the experimental ones. The chemical difference between raloxifene (1) and 49 is the presence of two additional substituents, 5,7-Me, at the phenyl ring. Model 2B9 does not show any descriptors around those substituents, which turns the model unable to recognize how they may influence the conformation of the side chain, leading to an unfavorable orientation for potency (Figure 7). Compound 52 has an amide substituent at position 6 of the benzothiophene ring that has a high frequency of occupation of GCOD (0,0,−2)(any). However, this cell presents a low coefficient (Equation 2), which tends to have a small contribution to Model 2B9. Compound 51, from test data set, was identified as an outlier and its potency was overestimated by Model 2B9 (Table 3). The same behavior was observed in Model 1B9. In a different fashion as observed with the representative model of alignment 1, Model 2B9 has a close descriptor at position 4' of the phenyl ring, i.e., GCOD (−1,−2,11)(any). This fact supports 2D-QSAR data about the negative steric effect of substituents at this position [26]. This GCOD is also occupied by compound 51, but its contribution to the model is not significant, due to its small coefficient value when compared to other GCODs.
Although alignment 2 has considered atoms belonging to more rigid regions of the molecules and shows descriptors better distributed in space, alignment 1 has superior statistical indices and is more consistent with the considered raloxifene mechanism of action. Thus, based in the results of the 4D-QSAR Model 1B9 and in a previous LIV-3D-QSAR model from our group [18], the synthesis of new SERMs candidates has been suggested ( Figure 8).

New Compounds Based on 4D-QSAR Analysis
In medicinal chemistry, the optimization of lead compounds proceeds along two main methods [27]. The first one is based on chemical modifications of the molecular structure (changing the chemical properties of the molecule), which have been exhaustively explored by Grese and co-workers [16] in this series of raloxifene analogs. The second one is the application of conformational constraints that change the molecular flexibility. Therefore, based on the results obtained by Model 1B9 of the current 4D-QSAR analysis and in the previous LIV-3D-QSAR study [18], we suggest two modifications on the raloxifene structure in order to reduce side chain flexibility. (i) The piperidinyl-ethoxy moiety was replaced by a piperazine group (proposed compounds V and VI), which is able to maintain the electrostatic and the hydrogen bonding interactions with Asp351, since this group has a basic nitrogen atom at the same position in relation to raloxifene. (ii) The carbonyl "hinge" and the phenolic group were replaced by a naphthyl group (proposed compound VI).
According to the 4D-QSAR Model 1B9, compound V showed the poorest calculated potency (pIC 50 = 6.60), while compound VI has shown the highest calculated potency (pIC 50 = 10.48) (Figure 8). Replacing the basic side chain and maintaining the carbonyl "hinge" (V) may lead to side chain orientations, which are not favorable to biological activity, since the basic side chain follows the carbonyl "hinge" orientation. Therefore, GCODs that contributes negatively to the potency have a high frequency of occupation.
A SAR study from 1997 [11], indicated that nearly orthogonal orientations of the basic side chain in raloxifene might be responsible for its unique biological activity profile. However, it is interesting to note that the most potent proposed compound (VI) shows a coplanar orientation of the basic side chain. This result may be due to its similarity to the bioactive conformation of raloxifene, since the RMS deviation value (0.28 Å) between VI and the raloxifene X-ray crystallographic structure is low. Moreover, this result shows that the simultaneously replacement of both the basic side chain and the carbonyl "hinge" (VI) leads to a rigid conformation of the lateral chain, which does not allow unfavorable orientations. Therefore, GCODs that contributes positively to the potency have a high frequency of occupation. It is interesting to note that, benzothiophene [28] and tetrahydrolsoquinoline [29] derivatives containing a constrained piperazine side chain were reported as high-affinity ligands of the ER, being potent agonists in bone tissue.

Structures Building
The three-dimensional (3D) models of the 54 compounds (Table 6) were based on the structure of compound 1 (raloxifene) co-crystallized with ER, retrieved from the Protein Data Bank (PDB code: 1ERR) [15], corresponding to the putative "bioactive" conformation. The 3D model for each compound was built with the nitrogen atom of the piperidine group protonated, using HyperChem 7.0 software [30]. Each structure, including raloxifene (1), was geometry-optimized in vacuum, without any restriction, using the MM+ molecular mechanics force field (HyperChem) [31], and subsequently using the semi-empirical AM1 Hamiltonian (HyperChem) [32], in order to assign the partial atomic charges. Three alignments were used to define the lattice overlay of the CEP of each compound. In alignments 1 and 2, the atoms were selected based on previous SAR studies from the literature [3,11,16], while in alignment 3, we used non-pharmacophoric atoms to observe if the 4D-QSAR program is able to select the most important IPEs. The atom numbers and corresponding sequences for each alignment are listed in Table 7, using the raloxifene structure (compound 1) as a template. Table 7. Atom numbering of the tested alignments used in 4D-QSAR models' construction. Alignment 1st Atom a 2nd Atom a 3rd Atom a 1 C3 C12 a The atom numbers of raloxifene (compound 1) were automatically obtained from HyperChem numbering [17] and not by IUPAC rules.

Independent Variable Generation
The CEP for each compound was overlaid onto a cubic lattice of a selected grid cell size (2.0 and 1.0 Å), according to each alignment. The grid cell occupancy profile for each IPE was computed and used as the 4D-QSAR descriptors, which are named Grid Cell Occupancy Descriptors (GCODs). Thus, the normalized grid cell absolute occupancy [20,23], defined as the number of times a cell was occupied by an atom type over the MDS, divided by the size of the CEP (2,000 conformations), was used to define the GCODs.

Data Reduction
The 4D-QSAR analysis, like other 3D-QSAR methods, generates an great number of QSAR variables (GCODs) because of the large number of grid cells and the number of IPEs [20,34]. Thus, three serial levels of data reduction were considered, in order to eliminate spurious variables [34]. The first level eliminated GCODs that have an individual correlation coefficient, R, with the activity values less than 0.1; the second level eliminated GCODs whose variance (self-variance) over a set of analogs was less than a prechosen fraction; and the third level eliminated GCODs with a prechosen number of empty grid cells (Table 8). The 4D-QSAR models building and optimization process employed the Genetic Function Approximation (GFA) [35] coupled with Partial Least Squares (PLS) regression [36]. Improved models are constructed by performing crossover operations to recombine the descriptors of the better-scored models, according to the Friedman's "lack-of-fit" (LOF) measure, which penalizes the Least Square Error (LSE) measure [34,35]. The number of crossover operations was set from 6,000 to 20,000. In addition, mutation probability was set at 100% and a smoothing factor (the variable that controls the number of independent variables in the models) ranged from 0.5 to 3.0.

Internal and External 4D-QSAR Model Validation
The best-scored GA-PLS equations were submitted to internal validation by "leave-one-out" cross-validation (LOO-cv) technique in the 4D-QSAR program. LOO-cv correlation coefficient (Q 2 ), squared correlation coefficient (R 2 ), standard error (SE), and Fischer's test (F) were used as parameters to select the best models. The test data set (13 compounds) was used to test the best 4D-QSAR models for their ability to predict biological activity values of compounds not included in the training data set.

Bioactive Conformation Selection
The final step in the 4D-QSAR methodology is to hypothesize the bioactive conformation of each compound in the training set. The lowest-energy conformer state (up to 10.0 kcal/mol from the minimum energy conformation), which predicted the maximum potency, using the optimum 4D-QSAR model, was defined as the "bioactive" conformation.

Conclusions
A series of 54 raloxifene analogs, evaluated as estrogen receptor- ligands, was selected from the literature for a 4D-QSAR study, applying three tentative alignments and grid cells of 2.0 and 1.0 Å. The best models were obtained from alignments 1 and 2, using grid cell size of 1.0 Å, from a training set of 41 compounds. In addition, a test set of 13 compounds were used in the external validation process. The best models were also validated based on the biological system and mechanism of action of the compounds under study.
The models generated by 1.0 Å grid cell are more predictive, since they showed higher Q 2 adj values than the best models from 2.0 Å grid cell, irrespective to the alignment. The models from both alignments 1 and 2 were also consistent with the ER modulators action mechanism. A representative model was selected for each one of alignments 1 (Model 1B9) and 2 (Model 2B9), revealing the degree in which the lateral chain flexibility of the raloxifene analogs influences the potency.
Although there are any descriptors associated to the 4'-position of the phenyl ring, it is the most coherent with the X-ray crystallography data. The model 2B9 was incapable to preview the presence of Asp351, which has an important contribution to binding activity of raloxifeno derivatives on estrogen receptor . Both models do not consider cLogP as a descriptor and this limitation can explain the outlier compounds behavior.
In order to evaluate the influence of the reduction of the side chain flexibility on the potency and based on the results from the 4D-QSAR analysis, we proposed two new raloxifene analogs based on the model 1B9. The results indicated that the highest degree of rigidity imposed to the lateral side chain increases the calculated potency, since it does not allow unfavorable orientations, maintaining most of the time the favorable electrostatic and hydrogen bond interactions with Asp351. Therefore, the drastic reduction of the side chain flexibility and, consequently, the generation of more favorable conformations of compounds to achieve better interactions with the receptor may be a successful strategy.