A Quantum Chemical and Statistical Study of Cytotoxic Activity of Compounds Isolated from Curcuma zedoaria

A series of 21 compounds isolated from Curcuma zedoaria was subjected to cytotoxicity test against MCF7; Ca Ski; PC3 and HT-29 cancer cell lines; and a normal HUVEC cell line. To rationalize the structure–activity relationships of the isolated compounds; a set of electronic; steric and hydrophobic descriptors were calculated using density functional theory (DFT) method. Statistical analyses were carried out using simple and multiple linear regressions (SLR; MLR); principal component analysis (PCA); and hierarchical cluster analysis (HCA). SLR analyses showed that the cytotoxicity of the isolated compounds against a given cell line depend on certain descriptors; and the corresponding correlation coefficients (R2) vary from 0%–55%. MLR results revealed that the best models can be achieved with a limited number of specific descriptors applicable for compounds having a similar basic skeleton. Based on PCA; HCA and MLR analyses; active compounds were classified into subgroups; which was in agreement with the cell based cytotoxicity assay.


Introduction
Curcuma zedoaria (Christm.) Rosc. (Zingiberaceae) is a medicinal herb largely found in tropical Asian countries, including Malaysia, Indonesia, India, Japan and Thailand [1]. Also known as temu putih in Malaysia and Indonesia, C. zedoaria is widely consumed as spice, a flavouring agent in native dishes and is frequently used in food preparations for women during post-partum confinement [2,3]. It has long been used as a folk medicine in different Asian countries for the treatment of menstrual disorders, dyspepsia, vomiting, cancer, stomachic, blood stagnation, hepato-protection and for promoting menstruation [1,4,5]. The rhizomes of C. zedoaria is considered as a rich source of terpenoids [6].
Quantaum chemical methods can be successfully applied to express molecular interactions between substrate and receptor in terms of molecular electronic properties of the substrates. Various qualitative and quantitative analyses and relationship studies can be found in the literature that used quantum chemical and statistical methods to achieve correlations between calculated variables and biological activities of natural and synthetic substrates [7][8][9][10][11][12][13]. Ishihara et al. employed semi-empirical PM5 method to delineate the relationship between the cytotoxic activity and 11 chemical descriptors of a series of tropolone compounds and were able to show that the observed cytotoxic activity correlated well with compounds of structural similarities and was governed mainly by dipole moment (µ), hydrophobicity (logP), hardness (η), electrophilicity (ω) and electronegativity (χ) [14]. In another study, Stanchev et al. showed that the cytotoxic activity of a series of 4-hydroxycoumarins was well correlated with logP, µ, volume (V) and molecular orbital energies (EHOMO and ELUMO) [15]. Yang et al. used a semi-empirical method AM1 to determine the molecular descriptors of a series of ganoderic acids with cytotoxicity against tumour cells; they showed that EHOMO, electronegativity, electronic energy, logP and molecular area (A) are the variables that best discriminate between highly and less active ganoderic acids [16].
The present study aimed at elucidating the structure-cytotoxic activity relationships of a series of 21 compounds isolated from C. zedoaria ( Figure 1) against four human cancer cells and a normal cell, namely as hormone-dependent breast carcinoma cells (MCF-7), cervical carcinoma cells (Ca Ski), human prostate cancer cells (PC-3), human colon adenocarcinoma cells , and human umbilical vein endothelial cells (HUVEC). Density functional theory was adopted at the level of B3LYP/6-31+G (d, p) in order to calculate electronic and steric molecular descriptors of the isolated compounds,

Simple Linear Regression (SLR) Analysis
The values of the electronic, steric and hydrophobic descriptors of compounds 1-21, as well as their cytotoxic activities (IC50 in µM) against MCF-7, Ca Ski, PC-3, HT-29 and HUVEC cells are presented in Table 1. As observed, the cytotoxic activity of compounds  varied with the cell type. Thus, a simple linear regression analyses was done to determine the effect of each of the descriptors separately on the cytotoxicity of the isolated compounds. Figure 2 displays simple linear regression curves obtained with each descriptor for the cytotoxicity of the test compounds against MCF-7 cells, while the statistical parameters (correlation coefficient (R 2 ), adjusted correlation coefficient (R 2 adj) and standard deviation (SD)) for SLR curves between each descriptor and each tested cell line is presented in Table 2.

Cytotoxicity against MCF-7 Cells and SLR Analysis
Based on the IC50 values (Table 2), the compounds were sorted into an inactive group (IC50 > 400 µM) and an active group (IC50 < 400 µM). To avoid the large discrepancies in the IC50 values, the active group was further subdivided into group A (200 < IC50 < 300 µM); group B (100 < IC50 < 200 µM) and group C (IC50 < 100 µM). The SLR analysis shows that the influence of a given descriptor on cytotoxic activity is dependent on the nature of the descriptor itself. For instance, the electronic descriptors IP, AE, χ, η, S, ω and μ have no significant influence (R 2 ≈ 0%-7%), while modest correlations were observed for descriptors α, A, V, LogP and M (R 2 ≈ 40%-55%) ( Figure 2). These results are consistent with those obtained by Ishihara et al., who showed that cytotoxic activity of 20 synthesized tropolones was poorly correlated with each of 11 chosen descriptors [14,17].

Cytotoxicity against Ca Ski Cells and SLR Analysis
Following the same pattern as that of Section 2.1.1, the compounds were divided into inactive and active groups, while the active group was further subdivided into group A, group B, and group C based on their IC50 values against Ca Ski cells. Table 2 represents the SLR parameters between each descriptor and log(IC50). As it can be seen, the effects of electronic descriptors IP, AE, χ, η, S and ω are negligible (R 2 ≤ 10%), while α, A, V and M descriptors showed a moderate effect (R 2 ≈ 28%-43%). Surprisingly, hydrophobicity played no role on the cytotoxicity of the compounds (R 2 ≈ 0) ( Table 2).

Cytotoxicity against PC-3 Cells and SLR Analysis
The compounds were classified into groups on the basis of their activity against PC-3 cells in the same fashion as discussed earlier. All chosen descriptors showed negligible effect on cytotoxic activity (R 2 ≈ 0%-8%) ( Table 2).

Cytotoxicity against HT-29 Cells and SLR Analysis
In case of cytotoxicity of the isolated compounds against HT-29 cells, moderate effects were obtained with the electronic descriptors namely IP, EA, χ, ω and μ with 14%, 19%, 23%, 14% and 13% correlation coefficients, respectively. In contrast to the results obtained for MCF-7 and Ca Ski cells, the steric descriptors did not show significant effects (R 2 ≤ 8%) ( Table 2).

Cytotoxicity against HUVEC Cells and SLR Analysis
For HUVEC cells, all descriptors showed no significant effect on the cytotoxic activity (R 2 ≤ 5%), except IP and χ, which showed moderate effects (30% and 13% correlation coefficients, respectively) ( Table 2).

Multiple Linear Regression (MLR) Analysis
In an attempt to further investigate the correlations between the calculated descriptors and the cytotoxic activity of the isolated compounds against each cell line, MLR analysis was performed. MLR analysis was conducted only for the compounds of the active group.

Cytotoxicity against MCF-7 Cells and MLR Analysis
Among the 17 compounds for which the IC50 values were observed against MCF-7 cells, compound 15 (gweicurculactone) was used as the model compound, and therefore excluded from MLR analysis. The MLR model as given in Equation (1) was obtained from the correlation observed between log(IC50) and the descriptors. The corresponding curve is presented in Figure 3a.
The predicted log(IC50)Pred. and residuals to experimental log(IC50)Obs. for the active compounds are given in Table 3. The correlation between all descriptors and cytotoxicity is relatively weak, with a standard deviation of SD = 0.41 and R 2 = 84%. The predicted log(IC50)Pred. for the model compound tested (compound 15) is relatively high (4.48) with a residual value of 2.34. While the predicted IC50 value suggested compound 15 is categorised in the inactive group, the observed IC50 dictates it to be an active compound. Consequently, this model (Equation (1)) was considered not suitable for cytotoxicity prediction. To obtain a better model, the first 11 compounds (1-11) with similar skeletons were chosen for MLR analysis. For better consistency in the analysis, they were further subdivided into labdane diterpenes (compounds 1-3) and germacrane sesquiterpenes (compounds 4-11). Only the active compounds were subjected to MLR as shown in Equation (2)   The model of Equation (2) was found to be superior to the previous model (Equation (1)) with a correlation coefficient of 99.37% and a SD of 0.09. For purpose of validation, this model (Equation (2)), was applied for compound 11. The predicted log(IC50)Pred. was found to be 2.10, with a difference of only 0.04 from the experimental value. The difference between the predicted and observed cytotoxicity is 13 µM. The MLR model of Equation (2) demonstrated the importance of IP, steric parameters (area and volume), hydrophobicity (logP) and molecular weight (M) on the cytotoxic activity of the test compounds. These results are in good agreement with previous studies where steric parameters and hydrophobicity were found to be the most important descriptors to classify compounds into high and low activities [14,16]. Thus, the MLR of Equation (2) subdivided the test compounds into high and low cytotoxicity (Figure 3b).

Cytotoxicity against Ca Ski Cells and MLR Analysis
Nine compounds showing cytotoxic activity against Ca Ski cells were selected for this study. Compound 4 (dehydrocurdione) was chosen as model compound and thus excluded from MLR analysis. The MLR model obtained between log(IC50) and six best descriptors is given in Equation (3) and the corresponding regression curve is shown in Figure 3c.
The predicted log(IC50) and residuals with respect to experimental values of active isolated compounds are presented in Table 3. The model was found to correlate the descriptors to the observed log(IC50) with good accuracy (R 2 99.94% and SD 0.02). For the model compound (4), the predicted log(IC50)pred. is 1.93, with a difference of 0.04 from the experimental value. The difference between the predicted and observed cytotoxicity is 8 µM. Equation (3) shows the function of steric parameters (area and volume), molecular weight (M), hardness, softness and the polarizability of the isolated compounds towards the cytotoxicity against Ca Ski cells.

Cytotoxicity against PC-3 Cells and MLR Analysis
Seventeen compounds showing cytotoxicity against PC-3 cells were included in MLR analysis while compound 4 (dehydrocurdione) was excluded as the model compound. The MLR model (Equation (4)  The predicted IC50 (>400 µM) of compound (4) suggested it is an inactive compound, which is contradictory to the observed IC50 (81.5 µM) against PC-3 cells. In an attempt to derive a better model, the number of descriptors was reduced and the analysis was confined to compounds (1-11) with a similar basic skeleton. Compound 4 was excluded from MLR analysis. The best correlation was obtained with the electronic descriptors of IP, EA, ω and µ (Equation (5) and Figure 3d). The predicted log(IC50) and residuals to experimental results are presented in Table 3.

Cytotoxicity against HT-29 Cells and MLR Analysis
Seventeen compounds showing cytotoxicity against HT-29 cells (Table 1) were chosen for this study while compound 4 (dehydrocurdione) was excluded from MLR analysis. The MLR model obtained between log(IC50) and all descriptors derived a correlation of 81% (SD 0.22). The predicted value for the test compound 4 (IC50 = 283 µM) suggested it as an active compound (experimental IC50 = 97 µM). In terms of activity, compound 4 falls in group A as per predicted IC50, which is quite different from its group (C) determined from the experimental IC50. In an attempt to obtain a better model, the number of descriptors was reduced and the analysis was performed for the first 11 compounds (1-11) with similar basic skeleton and compound 4 was excluded from MLR analysis. However, in every case, the difference between experimental and predicted IC50 was more than 100 µM, and therefore not presented herein.

Cytotoxicity against HUVEC Cells and MLR Analysis
Fifteen compounds that showed activity against HUVEC cells were included in this study and compound 4 (dehydrocurdione) was excluded from MLR analysis. The correlation between log(IC50) and all descriptors gave an R 2 of 96% (SD 0.16). The predicted IC50 for compound 4 from this model was 142 µM higher than the experimental IC50. The best correlation was obtained when three descriptors, namely IP, χ, S and V were taken into consideration (R 2 77%, SD 0.13) (Equation (6)). Predicted IC50 of compound 4 classified it as an active compound with a difference of 48 µM from the experimental IC50 value.

Principal Component Analysis (PCA)
Principal component analysis (PCA) allows the reduction of the number of variables used in a statistical analysis to create a new set of variables (PCs) expressed in a linear combination of the original data set [19]. The first new variable (PC1) contains the largest variance; while the second contains the second largest variance, and so on. Before applying the PCA method, each variable was standardized for ease of comparison between each other on the same scale. PCA analysis was performed only on MCF-7 cells. After several attempts to obtain a good classification of the isolated compounds, the best result was achieved with five variables, namely IP, A, V, logP and M. The first three components of PCA (PC1 = 90.50%, PC2 = 7.12%, and PC3 = 2.27%) conceded 99.89% of the overall variance of the data set (Table 4), while the sole combination of PC1 and PC2 described 97.62% of variance (Table 4). The loading vectors for PC1, PC2, and PC3 are given in Table 5 and the plot of the score vectors of the two principal components (PC1 × PC2) is shown in Figure 4.   As can be seen in Figure 4, the compounds under investigation are divided into two groups based on PCA analysis: compounds with high activity (1-3) and low activity (4-10). The principal component PC1 presented in Table 5 can be expressed through the following equation: Thus, a compound can be considered active if its IP, A, V, log P, and M values are similar to those described in the above Equation (7). When compared with published literature, the results of our present investigation followed the same trend with some agreement and disagreement in the involvement of descriptors for the activity of a series of compounds. For example, Yang et al. showed that cytotoxic ganoderic acids can be attained when higher values for the variables EHOMO, V, Eel, and logP are coupled with a smaller value for M [16]. Sauza et al. found that for a given flavone to be active against HIV, it must have smaller values for log P and V while EA must have a larger value [7]. In the present study, the results obtained from MLR or PCA are in coordination to show that the cytotoxicity of the compounds under investigation is dependent on the same descriptors (IP, A, V, logP and M) and afforded the same classification of the compounds (Figures 3b and 4).

Hierarchical Cluster Analysis (HCA)
In case of preliminary data analysis, HCA is a powerful tool for examining data sets for expected or unexpected clusters, including the presence of outliers. It examines the distances between the samples in a data set and represents them in a dendrogram which provides similar information as that of PCA results [20]. In HCA, each point forms only one cluster, and then the similarity matrix is analysed. The most similar points are assembled forming one cluster and the process is repeated until all the points belong to only one group [20]. The results obtained from MCF-7 cells are presented in the dendrogram ( Figure 5). Vertical lines in the dendrogram represent the compounds while the horizontal lines represent the distances between compounds within the same group or from compounds of other groups. According to the distances, the compounds are subdivided into highly and weakly active groups and this classification is similar to those obtained from PCA and MLR analysis.

Theoretical Calculations
Energy minimization and 3D structure optimization of the compounds were done by popular Becke three parameter Lee-Yang-Parr (B3LYP) exchange-correlation hybrid functional combined with a double-ζ Pople-type basis set 6-31+G (d,p), in which polarized and diffuse functions are taken into consideration [25]. B3LYP hybrid functional includes a mixture of Hartree-Fock exchange (20% of HF) with DFT exchange-correlation functional. The frequency analyses were carried out at the same level of theory. The absence of imaginary frequencies confirmed that the structures are true minima on the potential energy surface. The choice of the hybrid functional B3LYP is based on previous QSAR studies [26,27]. Recently, we successfully applied the hybrid functional B3P86 to calculate the electronic and structural descriptors for a series of phenolic Schiff bases [28].
The chemical descriptors selected to correlate with cytotoxic activity are: (i) electronic descriptors: frontier molecular orbital energies (EHOMO, ELUMO, which are well accepted as molecular descriptors in medicinal chemistry, since they are linked to the capacity of a molecule to form charge transfer complex with its biological receptor), ionization potential (IP), electronic affinity (EA), electronegativity (χ), hardness (η), softness (S), electrophilicity index (ω), dipole moment (μ), molecular polarizability (α); (ii) steric descriptors: surface area of a molecule (A), volume (V) and its molecular weight (M); and (iii) hydrophobicity descriptor: logP, where P stands for the octanol-water partition coefficient. The calculations of logP were carried out using Hyperchem Molecular package [29] by means of the atomic parameters derived by Ghose, Pritchett and Crippen and later extended by Ghose and co-workers [30,31]. The other descriptors were calculated using the DFT method and obtained in two different ways: (i) Orbital consideration, which is based on Koopman's theorem where IP = −EHOMO and EA = −ELUMO [32]; and (ii) energy consideration, which is based on the use of the classical finite difference approximation, where the change of one electron is usually involved ΔN = ±1 [33]. In this method, IP = E+1 − E0 and EA = E0 − E−1 where E0, E−1 and E+1 are the electronic energies of neutral molecule, when adding and removing an electron to the neutral molecule, respectively. In addition to methods (i) and (ii), the electronic descriptors (e.g., hardness) can be calculated using internally resolved hardness tensor (IRHT) approach [34][35][36], which deals with the fractional occupation numbers based on Janak's extension of DFT [37]. This approach is also based on orbital energies and takes into account the fractional occupation numbers based on Janak's extension of DFT [37]. De Luca et al. used the above approaches to investigate the solvent effects on the hardness values of a series of neutral and charged molecules, and found that these three methods can give similar results in the presence of solvent [38].
The solvent effects were taken into account implicitly by using the polarizable continuum model (PCM) as implemented in the Gaussian 09 package [39]. In PCM, the solute is embedded into a cavity surrounded by solvent described by its dielectric constant ε (e.g., for methanol ε = 32.6) [40]. The use of an explicit solvent has been investigated notably by Guerra et al., who obtained a better description of the electronic properties using PCM compared to the explicit solvent [41]. A hybrid model was tested by Trouillas et al. [42]. The authors showed that only slight differences can be observed as compared to PCM. All theoretical calculations including ground state geometry optimization and frequency analysis calculations were performed with Gaussian 09 package [39].
Simple and multiple linear regression (SLR and MLR, respectively) analyses were used to determine regression equations, correlation coefficients R 2 , adjusted R 2 and standard deviations (SD). PCA and HCA were employed to reduce dimensionality and investigate the subset of descriptors that could be more effective for classifying the isolated compounds according to their degree of cytotoxicity against tumour cells.
The regression models and statistical analyses of obtained results were carried out by using DataLab package [43].

Conclusions
In the present study, a set of electronic, steric and hydrophobicity descriptors were analysed using DFT quantum chemical calculations for a series of 21 compounds from C. zedoaria to determine the effect of the descriptors towards their cytotoxic activity against four different types of cancer cells (MCF-7, Ca Ski, PC-3 and HT-29), as well as a normal cell line (HUVEC). The statistical analyses showed that the influence of individual descriptor on the cytotoxicity of these compounds is not significant with an R 2 less than 50% and a standard deviation higher than 0.20. The results also showed that the cytotoxicity of the compounds towards a given cell line rather depends on a set of certain descriptors. MLR, PCA and HCA allowed us to define the cytotoxicity of the compounds as high, moderate, and low based on their cytotoxicity.
Khalijah Awang. The cytotoxicity tests were performed by Omer Abdalla Ahmed, Sri Nurestri A. Malek and Syarifah Nur Syed Abdul Rahman. The quantum chemical calculations and QSAR studies were performed by El Hassane Anouar, Zuhra Bashir Khalifa Al Trabolsy and Nur Shahidatul Shida Zakaria. The statistical analyses were performed by El Hassane Anouar and Sharifuddin Bin Md Zain. Mohd Zulkefeli, Jean-Frédéric F. Weber, Jamil A. Shilpi analyzed the data. The manuscript wrote by El Hassane Anouar, Omer Abdalla Ahmed Hamdi, Khalijah Awang and Jamil A. Shilpi. All authors read and approved the final manuscript.