A 3D QSAR Study of Betulinic Acid Derivatives as Anti-Tumor Agents Using Topomer CoMFA: Model Building Studies and Experimental Verification

Betulinic acid (BA) is a natural product that exerts its cytotoxicity against various malignant carcinomas without side effects by triggering the mitochondrial pathway to apoptosis. Betulin (BE), the 28-hydroxyl analog of BA, is present in large amounts (up to 30% dry weight) in the outer bark of birch trees, and shares the same pentacyclic triterpenoid core as BA, yet exhibits no significant cytotoxicity. Topomer CoMFA studies were performed on 37 BA and BE derivatives and their in vitro anti-cancer activity results (reported as IC50 values) against HT29 human colon cancer cells in the present study. All derivatives share a common pentacyclic triterpenoid core and the molecules were split into three pieces by cutting at the C-3 and C-28 sites with a consideration toward structural diversity. The analysis gave a leave-one-out cross-validation q2 value of 0.722 and a non-cross-validation r2 value of 0.974, which suggested that the model has good predictive ability (q2 > 0.2). The contour maps illustrated that bulky and electron-donating groups would be favorable for activity at the C-28 site, and a moderately bulky and electron-withdrawing group near the C-3 site would improve this activity. BE derivatives were designed and synthesized according to the modeling result, whereby bulky electronegative groups (maleyl, phthalyl, and hexahydrophthalyl groups) were directly introduced at the C-28 position of BE. The in vitro cytotoxicity values of the given analogs against HT29 cells were consistent with the predicted values, proving that the present topomer CoMFA model is successful and that it could potentially guide the synthesis of new betulinic acid derivatives with high anti-cancer activity. The IC50 values of these three new compounds were also assayed in five other tumor cell lines. 28-O-hexahydrophthalyl BE exhibited the greatest anti-cancer activities and its IC50 values were lower than those of BA in all cell lines, excluding DU145 cells.


Introduction
Betulinic acid (3β-hydroxy-lup-20(29)-en-28-oic acid, BA, Figure 1) is a natural product with notable anti-HIV activity and cytotoxicity against various malignant carcinomas. In particular, BA exerts its cytotoxicity by triggering the mitochondrial pathway to apoptosis [1] without side effects at doses up to 500 mg/kg [2]. Zuco et al. [3] reported that BA exhibited selective cytotoxicity for tumor cell lines over normal cells. BA has been suggested for the treatment of human melanoma tumors [4]. Betulin (lup-20(29)-en-3β,28-diol, BE, Figure 1), the 28-hydroxyl analog of BA, was one of the first natural products identified and isolated from plants in 1788 [5], and it is present in large amounts (up to 30% dry weight) in the outer bark of birch trees. In contrast to BA, BE has no significant cytotoxicity. Meanwhile, BE represents an abundant starting material for the production of BA and new anti-HIV and anti-tumor agents. BA derivatives semi-synthesized from BA and BE have been reported, and their anti-tumor bioactivities have been investigated [6][7][8][9][10][11][12][13][14][15][16][17][18]. HO OH QSAR methods describe structure-activity relationships in terms of physicochemical parameters and steric properties, or certain structural features (Free Wilson analysis) [19,20]. The method of comparative molecular field analysis (CoMFA) was the first real three-dimensional (3D) QSAR method. Although CoMFA is widely succesful in analyzing structure-activity relationships, it still has weaknesses. For example, in the absence of a reliable receptor site, the researcher must choose among a multitude of ligand alignment protocols; meanwhile, only one approach exists for comparing alignment protocols, namely the maximization of various q 2 (cross-validated r 2 ) statistics, which average the internal predictive accuracy for every structure/activity observation. Considering only the measurement uncertainties attached to any individual structure/activity observation, it is doubtful that moderate changes in q 2 values can provide dependable alignment guidance [21]. This difficulty is addressed with surprisingly good results via a completely objective and universal methodology termed topomer CoMFA, the combination of the universal "topomer" methodology and CoMFA technologies. A series of input structures are broken into two or more fragments at central acyclic single bonds, and any core fragment structurally common to the entire series is removed. Standard topomer 3D models are automatically constructed for each fragment, and a set of steric and electrostatic fields ("CoMFA column") is generated for each set of topomers [22].
In the present paper, a topomer CoMFA method was utilized to explore the quantitative relationships between some BA derivative structures and their anti-cancer activities. The bioactivities of BA and BE derivatives modified at the C-3 and C-28 sites were compared with the values predicted by topomer CoMFA. Based on this topomer CoMFA analysis, some new molecules were designed and synthesized, and their anti-tumor activities were assessed across five different cancer cell lines, which verified that the topomer CoMFA technology provided useful information for designing new derivatives with enhanced anti-tumor activities.

3D-QSAR Model using a Topomer CoMFA Method
The topomer CoMFA method provides a means for an alignment-independent 3D-QSAR approach, which is advantageous because it does not require alignment and it provides a means for automated activity searches in fragment libraries. BA derivatives in the literature have a common pentacyclic triterpenoid core, and most studies have focused on modifications at the C-3 and C-28 sites. In this topomer CoMFA study, we designated the groups at the two sites as R-groups. We split the molecules into three pieces by cutting at the C-3 and C-28 sites with a consideration toward structural diversity ( Figure 2). In this study, Gasteiger-Hückel charges were used for charge calculations.

R 2
A database alignment was used to establish topomer CoMFA 3D-QSAR models using the methods described in the Experimental Topomer CoMFA setup section. The predicted values of compounds in the training and test sets using topomer CoMFA are listed in Table 1. The experimental IC 50 value of compound 10 is >100 μM, and compound 10 was selected as a test molecule with the pIC 50 value set at 4.0 to expand the diversity of structures and the range of pIC 50 . Therefore a bigger residual was found between the predicted and experimental pIC 50 s for compound 10. The topomer CoMFA model was optimized. Cross-validation q 2 value of 0.722 and a non-cross-validation r 2 value of 0.974 with an optimized component of 5 were obtained, which suggested that the model has good predictive ability (q 2 > 0.5) (F values = 151.669). The predictive correlation coefficient (r 2 pred ) of 0.675, the R 2 m value of 0.726, and the root mean square error of prediction (RMSEP) value of 0.201 were determined to be externally validated. The pIC 50 values of the test set were predicted, and the test set r 2 pred value of 0.675 was obtained in the evaluation of the predictive ability of the model. The correlation plot of the experimental versus predicted pIC 50 values is shown in Figure 3, and a good correlation between the predicted and the experimental values of the compounds was observed. The predictive power of the model is high but only within the addressed range, i.e., truly high-affinity compounds (pIC 50 > 8) cannot be explained by the model.  The topomer CoMFA 3D contour maps around R1 and R2 (shown in Figures 4 and 5, respectively) were generated by plotting the coefficients from the model. It is better to choose the molecule with highest activity as the reference molecule which makes it is easier to explain the contour map. Compound 13 showed the best biological activity against HT 29 cells in all 37 compounds. Hence these maps are shown using compound 13 as a reference structure. In the steric field, the green contours located at the C-28 site indicate that a bulky substituent would be favorable, and the yellow contours denote where bulky substituents would not be tolerated (Figure 4a). In the electrostatic field, the blue contours located at the C-28 site indicate that electropositive groups would be favorable, and the red contours indicate that electronegative groups would be favorable (Figure 4b). Green contours dominated the R1 group in the steric field, blue contours occupied the middle of the electrostatic field, whereas red contours were located at the end of the substituent; this suggested that bulky groups with electronegative potential at the end of the side chain at the C-28 site would be favorable for activity. Regarding the contours of the R2 group, green contours ( Figure 5a) were located near the C-3 site, and yellow contours were located in farther away; the red contours were located in the middle of the electrostatic field, and blue contours were located far away from the C-3 site (Figure 5b); this demonstrated that a moderately bulky group with electronegative potential at the R2 site would improve the anti-tumor activity.

Molecular Design and Experimental Verification
To verify the topomer CoMFA model, molecules were designed. Since BE is the 28-hydroxyl analog of BA and shares the same pentacyclic triterpenoid core with BA, both BA and BE are good starting materials for further derivatization. BE shows better availability and much cheaper price because it is present in large amounts (up to 30% dry weight) in the outer bark of birch trees, so BE was selected as the starting material for the experimental verification. Because a moderately bulky group with electronegative potential was required at the C-3 site, we left this position unchanged, as the hydroxyl group at the C-3 site in BE perfectly matches the requirements. To introduce electronegative potential at the end of the side chain at the C-28 site, we employed dicarboxylic anhydride to form a carboxylic group at the end of side chain via esterification of the C-28 hydroxyl group of BE. To meet the needs of a sterically bulky group, the cyclic dicarboxylic anhydrides phthalic anhydride and hexahydrophthalic anhydride were selected, and maleic anhydride was chosen as a representative fatty acid. The chemical reaction and the substituent groups are illustrated in Scheme 1. The bulky electronegative groups (maleyl, phthalyl, and hexahydrophthalyl groups) were directly introduced at the C-28 position of BE. The anti-cancer activity of these three compounds against HT29 cells were predicted and tested ( Table 2). The experimental pIC 50 were compound 1 < compound 2 < compound 3, which were consistent with the tendency of predicted values. The residues between the experimental and predicted values were much bigger in Table 2 than that in Table 1, because the experimental IC50 results from our lab may be significantly different from Kommera's lab, which is also the reason we should select the data from the same research group to establish a reliable model. The consistent tendency further confirmed the established topomer CoMFA method as a predictive model for the structure-activity relationships of BA derivatives. The hexahydrophthalyl and phthalyl derivatives displayed better or equal bioactivity to BA, which is much more active than the parent molecule BE.   [11][12][13].

Scheme 1. Synthesis of compounds
To better understand the anti-tumor activities of these three derivatives, their IC 50 values were also assayed in five other tumor cell lines, including pancreatic cancer (MPC2), prostate cancer (DU145), lung carcinoma (NCI-H520), cervical cancer (HeLa), and ovarian cancer (2774) cells. The results are presented in Table 3. Among these three structures, compound 3 (containing the hexahydrophthalyl group) exhibited the greatest anti-cancer activities; its IC 50 values were lower than those of BA in all cell lines excluding DU145 cells. Following the instructions of the topomer CoMFA model, additional molecular design can be conducted, and more promising and potent agents can be synthesized.

Data Sets
The in vitro anti-cancer activities of 37 BA and BE derivatives (reported as IC 50 s) against HT29 human colon cancer cells were used for the present study. All compounds and associated inhibitory activity data were obtained from the literature [10][11][12][13] reported by the same research group within 2 years. These 37 molecules were divided into a training set of 26 compounds and a test set of 11 compounds. IC 50 values were converted to pIC 50 values by taking log (1/IC 50 ). The compounds in the test set were randomly selected with the goal of choosing a range of both biological activity data and structural diversity. The structures of the compounds in the training and test sets are listed in Table 1.

Topomer CoMFA Setup
The Topomer CoMFA program in sybyl-x1.1 software package was used to perform all calculations. Topomer CoMFA is an alignment-independent 3D-QSAR method that combines the topomer search method [23] (a fragment alignment approach) with the conventional CoMFA method. A 3D-QSAR model was generated by splitting the molecules into fragments, topomerically aligning each fragment, and calculating steric and electrostatic field descriptor values for the topomerically aligned fragments to create a CoMFA table with the field descriptor values. Besides the core of the molecule, we split side functional group into two R-groups that refer to the R1 and R2 groups of BA ( Figure 2).
To evaluate the predictive ability of the model, the pIC 50 values of the test set were predicted using the model developed using the training set. The optimization and splitting methods of the test set were the same as those described for the training set.
To evaluate the predictive ability of the model, structure optimization of the test set was performed as previously described for the training set. The pIC 50 values of the test set were predicted on the basis of the constructed model. r 2 pred , which measures the predictive performance of a PLS model, is defined by equation 1 as follows: where SD is the sum of the squared deviations between the predicted biological activities of the test set and the mean activities of the training molecules and PRESS is the sum of the squared deviations between the predicted and experimental activity values for every molecule in the test set. Furthermore, the r 2 value between the observed and predicted activity values of the compounds in the test set with (r 2 ) or without an intercept (r 2 0 ) and the RMSEP [defined by Equation (2)] [24] for the test set compounds were also calculated. The models were validated using an additional parameter, R 2 m [defined by Equation (3)], which is a good indicator of the external predictability of QSAR models [25]:

Materials and Instruments
BE was extracted from the bark of Betula platyphylla Suk., and its purity exceeded 95% as determined by HPLC. All other chemicals (reagent grade) were commercially available. Separation of compounds 1-3 was performed by column chromatography using silica gel 60 (70-230 mesh). Thin-layer chromatography was performed using silica gel-coated aluminum sheets (silica gel 60 GF254, E. Merck, Darmstadt, Germany). The melting point (m.p.) of each compound measured on an X-6 micro melting-point apparatus (Beijing Tech instrument Co., Beijing, China). The FTIR spectra were recorded using a Nicolet Mattson Infinity Gold FTIR spectrometer using KBr pellets. 1 H-NMR spectra (300 MHz) and 13 C-NMR spectra (75 MHz) were recorded in CDCl 3 at 25 °C with TMS and solvent signals used as internal standards using a Bruker AVANCE-300 MHz. Chemical shifts are reported in ppm (δ). Mass spectra were obtained using a Finnigan Mat 95 apparatus. Elemental analyses were performed on a Thermo Finnigan EA 1112 Series Flash Elemental Analyzer, and the obtained values were within 0.4% of the theoretical values. assays were performed according to a previously reported method [26,27]. Briefly, cells were seeded in 96-well culture plates in 0.2 mL of growth medium per well and allowed to attach for 24 h. The culture medium was replaced with either BA or the derivatives at different concentrations in 4 replicates followed by 72 h of incubation. After incubation, 20 μL of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT; Amresco, Solon, OH, USA) solution at a concentration of 5 mg/mL was added to each well followed by incubation for 4 h. The MTT solution was then aspirated, and 150 μL of DMSO was added to each well to dissolve the dark blue crystals thoroughly. The absorbance was measured at 490 nm using a microplate reader (Infinite 200 NanoQuant, Tecan, Grödig, Austria). The relative growth rate (%) was calculated as (mean absorbance of the sample/mean absorbance of the control) × 100%, considering the optical density of the control as 100%.

Conclusions
Topomer CoMFA studies were performed on 37 BA derivatives. The analysis gave a leave-one-out cross-validation q 2 value of 0.722 and a non-cross-validation r 2 value of 0.974, which suggested that the model has good predictive ability (q 2 > 0.2). The contour maps illustrated that bulky and electron-donating groups would be favorable for activity at the C-28 site and that a moderately bulky and electron-withdrawing group near the C-3 site would improve the activity for the R2 group. According to the modeling result, three betulin derivatives 1-3 were designed and synthesized. The in vitro cytotoxicity of the given analogs was consistent with the predicted values, which proves that the present topomer CoMFA model is successful and that it could guide the synthesis of new BA derivatives with high anti-cancer activity.