A Combined 2D- and 3D-QSAR Study, Design and Synthesis of Some Monocarbonyl Curcumin Analogs as Potential Inhibitors of MDA-MB-231 Breast Cancer Cells

Breast cancer is the most frequently diagnosed life-threatening cancer in women. As a result, there is a critical need for the development of a safe and effective drug therapy for its treatment. Curcumin, a secondary metabolite isolated from Curcuma longa, has been shown to exhibit an impressively broad range of pharmacological effects. Despite its powerful interaction with a diverse set of cellular targets, curcumin has several drawbacks that limit its potential as a therapeutic agent. One of the most common approaches to overcoming these limitations is the design and synthesis of novel curcumin analogs. This study was conducted in support of the ongoing search for new molecules that are effective enough for the treatment of triple-negative breast cancer while causing only minor side effects. Hence, several symmetric monocarbonyl analogs of curcumin with cyclopentanone, cyclohexanone and 4-piperidone as the central core were synthesized via a Claisen–Schmidt condensation reaction. Their structures were identified by measuring the melting points, as well as by FTIR and UV/Vis spectroscopy. To assess the cytotoxic activities of the analogs against MDA-MB-231 breast cancer cells and to identify the significant structural features responsible for these molecules' potency, combined 2D- and 3D-quantitative structure–activity relationship (QSAR) models were developed. The generated QSAR models demonstrated acceptable internal validation, as well as good external predictive capacity, indicating that they can be used to design similar compounds. These results suggest that the synthesized candidate drugs have promising cytotoxic potential against MDA-MB-231 cancer cells and should be further investigated both in vitro and in vivo.


Introduction
According to the WHO, carcinoma of the breast is the world's most prevalent cancer in women [1]. Human breast cancer is a heterogeneous disease that encompasses a diverse group of malignancies with remarkably different biological characteristics, affecting multiple signaling pathways via highly complex molecular mechanisms. Triple-negative breast cancer (TNBC) represents the most aggressive form of breast cancer and is associated with a high risk of metastatic progression and tumor recurrence. It is concerning that TNBC is typically resistant to most of the standard therapeutic approaches and that, to date, it lacks effective targeted therapies, hence having the worst prognosis of all breast cancer types [2–5].
Considering the highly divergent nature of breast cancer, it is clear that a single-targeted agent would be inefficient in addressing the complexity of the issue. Hence, multi-targeted agents are expected, and have been shown, to be more effective in this role [6]. From this perspective, natural products represent a vital starting point for the discovery or design of potential lead compounds. Within this broad group, curcumin, a polyphenolic phytochemical from the rhizomes of Curcuma longa, is one of the most important members, as it was found to possess multiple bioactive properties, including enhanced anti-cancer activity, when compared to commonly used therapeutic drugs. However, the molecular structure of this regulator of numerous targets, which modulates various cancer hallmarks, is pleiotropic but is also responsible for curcumin's poor pharmacokinetic profile and consequent low bioavailability. The main reason for its physiological instability is considered to be the central β-diketone group, which is susceptible to keto-enol tautomerism and subsequent degradation [4,6–9]. Although there are several strategies to modify curcumin, it has been established that the most influential structural modifications, in terms of improving its stability and solubility, are those in which, on the one hand, the β-diketone "heart" is eliminated from the structure and, on the other hand, various alternative substituents are incorporated on the terminal phenyl rings. The resulting synthetic products are called monocarbonyl analogs of curcumin (MACs): compounds endowed with better chemical stability that show more potent bioactive effects [10].
Within this study, we synthesized several symmetrical cyclic C5 analogs of curcumin with cyclopentanone, cyclohexanone and 4-piperidone cores and different ortho- and para-substituents. For the purpose of guiding and optimizing drug-design efforts, a thorough understanding of the structural requirements for anti-cancer activity is crucial. Therefore, one of the primary goals of this research was to predict the anti-breast cancer activity of our synthetic compounds using previously screened molecules, as well as to understand the role of different substituents on the benzene rings and the influence of the structure of the 5-C linker on that activity. In order to find a statistically significant correlation between the structural features of the molecules and their anti-cancer potential, predictive 2D- and 3D-QSAR models were developed. These models were used to obtain predicted activities for our synthesized MACs, that is, to estimate their ability to inhibit the growth of cultured MDA-MB-231 human breast cancer cells and hence evaluate their anti-breast cancer properties. Since the generated QSAR models predicted favorable activity values for all of the analogs, initial predictions can be made about their therapeutic applicability. However, their evaluation in vitro as well as in vivo will be critical to assess therapeutic utility (Supplementary materials).
The melting points were measured with a Mel-Temp II capillary apparatus (Us Lab. devices) and are uncorrected. Infrared spectra were recorded on a Perkin Elmer 2000 FT-IR with previously prepared KBr pellets, as well as with the ATR (attenuated total reflection) technique using a Golden Gate sapphire/diamond system. UV spectra were recorded in acetonitrile, acetonitrile/water (80/20 v/v) and 1,2-dichloroethane on a Varian Cary 50 Scan UV-vis spectrophotometer.

Chemistry
The analogs were obtained by coupling 1 eq. of the appropriate ketone with 2 eq. of the substituted benzaldehyde via the Claisen–Schmidt condensation reaction. The synthetic routes of the base-catalyzed reactions are presented in Scheme 1.

Scheme 1. Synthetic routes of the base-catalyzed reactions.

General Procedure for the Synthesis of Analogs with Cyclopentanone and Cyclohexanone Cores
The analogs were prepared using a previously reported procedure by Liang et al. with minor modifications. A total of 7.5/10 mmol of cyclopentanone or cyclohexanone was mixed with 15/20 mmol of the appropriate benzaldehyde (2-bromobenzaldehyde, 2-fluorobenzaldehyde, 2-furaldehyde and 4-(dimethylamino)benzaldehyde) in a 1:2 ratio, in a round-bottom flask. After adding 10/15 mL of the reaction solvent (methanol or absolute ethanol), the reaction mixture was stirred for 5 min at room temperature with an electromagnetic stirrer, followed by the dropwise addition of catalytic amounts (1.5–3 mL) of 20% (w/v) aqueous NaOH solution over the next 5 min. In the case of the 4-(dimethylamino) analog, absolute ethanol was used as the reaction solvent. The reaction mixture was refluxed for 2 h at 80 °C, yielding an orange precipitate. While the base was being added, the mixture acquired a yellow color, and, after a few minutes, a yellow precipitate was obtained. The reaction mixture was then stirred well with an electromagnetic stirrer at ambient temperature for approximately one hour. After the time had elapsed, the reaction flask was lowered into an ice bath and kept there for about 10 min, after which the content was vacuum-filtered with a Büchner funnel. The yellow precipitates were washed with saturated aqueous NH4Cl solution, distilled H2O, ice-cold 96% ethanol and cold methanol. After drying, the obtained solids were purified by recrystallization from different solvents and solvent mixtures.

General Procedure for the Synthesis of Analogs with 4-Piperidone Cores
A total of 10 mmol of piperid-4-one hydrochloride monohydrate was first dissolved in distilled water (4 mL) in an Erlenmeyer flask, after which ethanol (96%, 20 mL) was added. That was followed by the addition of 20 mmol of the appropriate benzaldehyde (2-bromobenzaldehyde, 2-fluorobenzaldehyde, 2-trifluorobenzaldehyde).
The mixture was stirred with an electromagnetic stirrer, followed by the dropwise addition of 10% (w/v) aqueous NaOH solution (20 mL, 0.05 mol) over a 20 min period. While aliquots of base were added, the reaction mixture turned yellow, and, after a few minutes, a yellow precipitate was obtained. The content of the reaction flask was then stirred with an electromagnetic stirrer at room temperature for approximately 2 h. Afterwards, the reaction flask was lowered into an ice bath and kept there for about 10 min, and the content was filtered with a Büchner funnel. The yellow precipitates were washed with dH2O and ice-cold methanol. When dried, the yellow solids were purified by recrystallization from 96% ethanol or, for the 2-bromobenzylidene analog, from a 4:1 mixture of 96% ethanol and ethyl acetate.

Computational Studies
The dataset chosen for this study contains 36 monocarbonyl curcumin analogs with different central cores and different types of aryl substituents at various positions. Their biological activities were extracted from the literature in the form of the negative logarithm of IC50 (pIC50) [11,12]. The IC50 value represents the mean concentration that inhibited cell growth in MDA-MB-231 human breast cancer cells by 50%.
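The conversion from IC50 to pIC50 can be illustrated with a minimal sketch. It assumes that the IC50 values are reported in micromolar, which is not stated here; the function name `pic50_from_ic50_um` is likewise illustrative.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in micromolar.

    The micromolar input unit is an assumption made for this sketch.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

# Hypothetical values: lower IC50 (more potent) gives higher pIC50.
for ic50 in (10.0, 1.0, 0.1):
    print(ic50, round(pic50_from_ic50_um(ic50), 2))
```

Because the scale is logarithmic, a tenfold decrease in IC50 raises pIC50 by one unit, which is why pIC50 rather than IC50 is used as the dependent variable in the regression.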
To test the predictive ability and robustness of the QSAR models, a test set of 6 molecules was chosen randomly, and the remaining 29 compounds were used as a training set to derive the QSAR models.
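A random hold-out split of this kind can be sketched as follows. The compound identifiers, the seed and the function name `split_dataset` are hypothetical; only the set sizes (6 test and 29 training compounds) are taken from the text.

```python
import random

def split_dataset(compound_ids, n_test, seed=0):
    """Randomly hold out n_test compounds; the remainder is the training set."""
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    test = set(rng.sample(list(compound_ids), n_test))
    train = [c for c in compound_ids if c not in test]
    return train, sorted(test)

# Hypothetical compound numbering covering 29 training + 6 test molecules.
compounds = list(range(1, 36))
train_ids, test_ids = split_dataset(compounds, n_test=6, seed=42)
print(len(train_ids), len(test_ids))
```

Fixing the seed is a design choice that keeps the training and test sets identical across the 2D- and 3D-QSAR models, so that their validation statistics remain comparable.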

Geometry Optimization
First, all of the molecules' structures were drawn and converted to 3D using MarvinSketch. They were subjected to conformational analysis in order to find the lowest-energy conformations. For energy minimization, the MMFF94 method was used, and after generating 50 conformers for each molecule, the conformers with the lowest potential energy (i.e., global minimum) were further geometrically optimized using the software package HyperChem 7.01. The optimization of the geometry for each molecule was carried out using the semi-empirical PM3 method and the Polak–Ribiere algorithm, with an RMS energy gradient of 0.001 kcal/(Å·mol) in vacuum.

2D-QSAR Methodology
The optimized geometries of the molecules were used to calculate different types of descriptors, namely topological, electronic, geometrical and constitutional descriptors, which encode different aspects of the molecular structure. The ChemDes platform and HyperChem software were used for this purpose.
A common pretreatment process was applied to the calculated descriptors, including the removal of descriptors with constant or near-constant values (a variance cutoff of 0.001 was used) and the removal of highly intercorrelated descriptors (a correlation cutoff of 0.8 was used). This procedure was carried out by creating a correlation matrix in Excel. The descriptors with the highest correlation to the pIC50 were chosen for the multiple linear regression analysis to determine the best 2D-QSAR equation. This regression study was performed using the statistical program SPSS, which used a dataset made up of 29 training-set molecules.
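The pretreatment steps can be sketched in a few lines. This is an illustration under stated assumptions, not the Excel procedure used in the study: in particular, the rule for deciding which member of an intercorrelated pair is kept (the one more strongly correlated with pIC50) is an assumption, and the toy descriptor values are invented.

```python
from math import sqrt

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def pretreat(descriptors, activity, var_cutoff=0.001, corr_cutoff=0.8):
    """descriptors: dict mapping descriptor name -> list of values per molecule."""
    # 1. Remove constant or near-constant descriptors.
    kept = {n: v for n, v in descriptors.items() if variance(v) > var_cutoff}
    # 2. Rank the remaining descriptors by |correlation| with the activity.
    ranked = sorted(kept, key=lambda n: abs(pearson(kept[n], activity)), reverse=True)
    # 3. Keep a descriptor only if it is not highly intercorrelated with one
    #    already selected (assumed tie-breaking rule, see lead-in).
    selected = []
    for name in ranked:
        if all(abs(pearson(kept[name], kept[s])) < corr_cutoff for s in selected):
            selected.append(name)
    return selected

# Invented toy data: d2 is nearly collinear with d1, d3 is constant.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "d3": [0.5, 0.5, 0.5, 0.5, 0.5],
    "d4": [3.0, 1.0, 4.0, 1.0, 5.0],
}
pic50 = [5.1, 5.9, 7.2, 7.8, 9.1]
selected = pretreat(toy, pic50)
print(selected)
```

Only one of the two collinear descriptors survives, and the constant descriptor is removed before any correlation is computed, which also avoids a division by zero in the Pearson coefficient.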
The developed model was validated internally and externally using common validation procedures. Internal validation included the calculation of common parameters such as R, R², adjusted R², SE, F and Significance F. The test set was used for external validation to assess the predictive ability of the model.
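For reference, the internal validation parameters follow from the residual and total sums of squares. The sketch below uses the standard definitions for a least-squares model with an intercept; the observed and fitted values are hypothetical, and the function name `regression_statistics` is illustrative (in the study these values were produced by SPSS).

```python
from math import sqrt

def regression_statistics(y_obs, y_fit, n_descriptors):
    """Standard MLR fit statistics for n molecules and p descriptors."""
    n, p = len(y_obs), n_descriptors
    mean = sum(y_obs) / n
    ss_tot = sum((y - mean) ** 2 for y in y_obs)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_obs, y_fit))
    r2 = 1 - ss_res / ss_tot
    return {
        "R": sqrt(r2),                                   # valid for least-squares fits
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / (n - p - 1),  # penalizes extra descriptors
        "SE": sqrt(ss_res / (n - p - 1)),                # standard error of estimate
        "F": ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1)),
    }

# Hypothetical observed and fitted pIC50 values for a one-descriptor model.
stats = regression_statistics([5.0, 6.0, 7.0, 8.0], [5.1, 5.9, 7.1, 7.9], 1)
print({k: round(v, 3) for k, v in stats.items()})
```

The adjusted R² is the more informative figure when several descriptors are fitted to a training set of only 29 molecules, since plain R² can only increase as descriptors are added.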

• Quenched Molecular Dynamics and Alignment
In ligand-based 3D QSAR studies wherein the conformation eliciting the biological response is unknown, one should at least generate a conformational library of all studied structures and then examine which conformers provide the best QSAR model [13]. For this purpose, each previously optimized structure was subjected to a quenched molecular dynamics (QMD) procedure in the Open3DAlign program. The QMD search was accomplished by running a number of short molecular dynamics runs using the MMFF94 force field and TINKER as the molecular mechanics engine.
Before performing statistical analysis, the 3D structures of all compounds must be aligned in 3D space. The structural alignment rule has a direct impact on the accuracy of the QSAR model prediction and the reliability of the contour maps [14]. The best fit template conformer was chosen from the conformational pool, and the conformational flexibility for templates was also taken into consideration. Then the whole dataset was aligned to the template by using a mixed algorithm in Open3DAlign, combining both atom-based LAMBDA-like and Pharao pharmacophore-based approaches. Compound 2 with the highest alignment O3A_score was chosen as the template molecule, onto which the remaining compounds were superimposed and used to build the 3D-QSAR model.

• CoMFA Model
For building the 3D-QSAR model, the comparative molecular field analysis (CoMFA) method was chosen. The method mostly focuses on ligand properties, such as steric and electrostatic properties, and the resulting favorable and unfavorable receptor-ligand interactions [15,16].
The best-scored alignment molecular set superimposed on conformer 24 of compound 2 was subsequently analyzed in Open3DQSAR using classical Coulombic and van der Waals energy molecular interaction fields (MIFs) computed by the molecular mechanics method MMFF94.
In a more detailed view, the aligned ligand ensemble was first placed in a 2 Å step-size 3D cubic grid box with a 5 Å gap around the largest molecule in all directions. The steric (van der Waals) and electrostatic (Coulombic) interaction energies were calculated for each molecule at each grid point using an sp3 hybridized carbon atom probe and a volume-less probe with a +1 charge, respectively. These steric and electrostatic interaction energies were considered independent variables (CoMFA descriptors).
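The grid construction can be sketched as follows, using the 2 Å step size and 5 Å gap stated above. How Open3DQSAR places the grid origin and rounds the box extent is not described here, so the rounding rule in this sketch (extend the box upward until it covers the gap) is an assumption, as are the atom coordinates.

```python
import math

def build_grid(coords, step=2.0, gap=5.0):
    """Return the points of a regular grid enclosing coords plus a gap per side."""
    axes = []
    for axis in range(3):
        lo = min(c[axis] for c in coords) - gap
        hi = max(c[axis] for c in coords) + gap
        # Enough points that the last one reaches or passes the upper bound.
        n_points = math.ceil((hi - lo) / step) + 1
        axes.append([lo + i * step for i in range(n_points)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

# Hypothetical coordinates (in Å) of the extreme atoms of the largest molecule.
atoms = [(0.0, 0.0, 0.0), (6.0, 2.0, 1.0)]
grid = build_grid(atoms)
print(len(grid))
```

Each grid point yields one steric and one electrostatic energy per molecule, so the number of CoMFA descriptors is twice the number of grid points, which is why variable clustering and selection are applied before the PLS step.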
In order to reduce the noise hidden in the PLS matrix and thus reduce the computational time, the data was pretreated prior to the creation of the CoMFA model and then variable clustering (smart region definition (SRD) procedure) and selection (fractional factorial design (FFD)) procedures were applied. FFD selection aims to select the variables that have the largest effect on predictivity and can operate on both single variables or on groups identified by a previous SRD run.
Finally, PLS analysis was used to obtain a correlation between the descriptors derived by CoMFA (independent variables) and the pIC50 values (dependent variable). Open3DQSAR produces a PLS model through the non-linear iterative partial least squares (NIPALS) algorithm [17]. CoMFA color contour maps were then derived for the steric and electrostatic fields.
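To make the PLS step concrete, the following is a minimal sketch of single-response PLS (PLS1) using the NIPALS deflation scheme. It is a textbook illustration written for this purpose, not the Open3DQSAR implementation; the function names and the toy data are hypothetical.

```python
from math import sqrt

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by extracting components one at a time (NIPALS)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weights: direction in descriptor space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the response for one sample by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Toy data generated from y = 1 + 2*x1 - x2 (hypothetical, noise-free).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0, 4.0, 3.0, 6.0, 5.0]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [2.0, 2.0]), 6))
```

In a CoMFA setting the columns of X are the field energies at the retained grid points, which far outnumber the molecules; PLS is used precisely because it compresses such collinear descriptors into a few latent components.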
Without proper statistical validation, no model can be reliably used for the interpretation and prediction of biological activity. For internal validation, statistical parameters including the F-ratio test, R² and SDEP were computed. Cross-validation was performed by applying the leave-one-out (LOO) procedure, and it was expressed with the cross-validated coefficient of determination, Q². The predictive power of each PLS model was evaluated against the external test set and expressed both as R²pred and as SDEP.
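The cross-validation statistics can be illustrated with a short sketch. To keep it self-contained, a one-descriptor least-squares line stands in for the PLS model, which is an explicit simplification; the definitions used are Q² = 1 − PRESS/SS_tot and SDEP = √(PRESS/n), and R²pred is computed relative to the training-set mean. All data values and function names are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares line; a stand-in for the PLS model in this sketch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def loo_q2_sdep(x, y):
    """Leave-one-out cross-validation: refit without molecule i, then predict it."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1 - press / ss_tot, sqrt(press / len(y))

def r2_pred(y_test, y_test_pred, y_train_mean):
    """External predictivity, referenced to the mean of the training set."""
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_test_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1 - press / ss

# Hypothetical descriptor and pIC50 values.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]
q2, sdep = loo_q2_sdep(x_train, y_train)
print(round(q2, 3), round(sdep, 3))
```

Unlike R², Q² can become negative when the left-out predictions are worse than simply using the mean activity, so a large gap between R² and Q² is a common warning sign of overfitting.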
The first descriptor is based on the topological distance and the ionization potential. It is a 2D autocorrelation descriptor that encodes the presence of polarizable pairs of atoms at a specific topological distance [18]. Given its negative correlation with the activity, the higher the ionization potential of the atoms (less electronegative) that are separated by three covalent bonds, the lower the activity of the analogs.
The second descriptor is a 3D topological distance-based autocorrelation descriptor (also called 3D-TDB) that is related to the topological and geometric distances.
The third descriptor, which belongs to the RDF descriptors, highlights the significance of the distribution of atomic polarizabilities within a radius of 6.0 Å. Its positive correlation with the activity means that increasing the polarizability of atoms is favorable.
SCH-6 is a simple sixth-order topological descriptor based on interatomic distances calculated by the bonds between them, representing molecular connectivity as a chemical graph [19]. The order being six represents the number of edges in the graph, which indicates the branching. The negative value of its coefficient indicates its negative impact on biological activity.
The next descriptor is a topological descriptor, known as the bond information content index, which represents a measure of the number of bonds and their multiplicity in the chosen structural fragment. It can differentiate molecules according to their size, degree of branching and flexibility [20].
nHBint6 is a 2D E-state descriptor related to the electro-topological state of hydrogens that are capable of forming a hydrogen bond over a path length of six. Such hydrogens may be involved in intermolecular contacts and interactions and, in addition, contribute to the general values of biological and physicochemical properties [21].
The last descriptor is a 1D constitutional descriptor, known as Crippen's LogP. This descriptor is considered an informative parameter of the solubility tendency of a compound. The lower the CrippenLogP, the more hydrophilic the molecule is and the greater its tendency to dissolve in the aqueous phase. The negative CrippenLogP coefficient indicates that increased lipophilicity decreases the inhibitory activity.
The values of the statistical parameters of the internal and external validation for the MLR model are listed in Table 1. All of the parameters have acceptable values.
The steric fields are represented by green- and yellow-colored contours, in which green areas indicate regions where increased steric bulk would increase the activity, while the yellow areas suggest regions where bulky groups are not favored.
The electrostatic fields are represented by blue- and red-colored contours, in which blue areas define a region where a positively charged substituent increases activity, while the red areas define a region where a negatively charged substituent increases activity. These contour maps give us some general insight into the nature of the receptor-ligand binding region.
The generated QSAR model demonstrated acceptable internal validation, as well as good external predictive capacity (Table 2), indicating that it can be used to design similar groups of compounds. Finally, the predicted activities of our synthesized molecules are presented in Table 3.

Discussion
Based on the combination of the 2D- and 3D-QSAR models, we can make some assumptions about the general SAR trend. Both the groups present on the aromatic rings and the linker between the aromatic rings contribute to the anti-cancer activity of the analogs. For analogs with similar substituents, it is noticeable that a reduction in the core size implies lower anti-cancer activity. Conversely, replacement with a heterocyclic "core" results in a strong cytotoxic effect. On the other hand, for analogs with a similar "core" but a different ortho-substituent on the aromatic rings, there is a positive dependence between the inhibitory activity and the electron-accepting properties of the substituent, which indicates that the presence of more electronegative element(s) in the substituent results in more pronounced anti-cancer properties of the analog. Another observation that can be derived from the 3D steric fields is that the presence of bulky groups is unfavorable at the ortho-position.
In conclusion, the rational design of our MACs was supported by this study, and the QSAR models predict that they have a promising ability to combat breast cancer. Notwithstanding these findings, for any future application it is crucial to evaluate them in vitro, as well as in vivo.