Finding a Novel Chalcone–Cinnamic Acid Chimeric Compound with Antiproliferative Activity against MCF-7 Cell Line Using a Free-Wilson Type Approach

In this work, we designed and synthesized new chimeric compounds that combine the natural cytotoxic chalcone 2′,4′-dihydroxychalcone (2′,4′-DHC, A) with cinnamic acids. For this purpose, we developed a descriptive and predictive quantitative structure–activity relationship (QSAR) model of anti-cancer activity against the human breast cancer cell line MCF-7 that relies on the presence or absence of structural motifs in the chalcone structure, as in a Free-Wilson approach. The dataset comprised 207 chalcone derivatives with a wide variety of structural modifications on the α and β rings, such as halogens (F, Cl, and Br), heterocyclic rings (piperazine, piperidine, pyridine, etc.), and hydroxyl and methoxy groups. The multilinear equation was obtained with a genetic algorithm, using logIC50 as the dependent variable and molecular descriptors (constitutional, topological, functional group counts, atom-centered fragments, and molecular properties) as independent variables. The statistical parameters were acceptable (R² = 86.93, Q²LMO = 82.578, Q²BOOT = 80.436, and Q²EXT = 80.226), which supports the predictive ability of the model. Considering the aromatic and planar nature of the chalcone and cinnamic acid cores, we then developed a structure-specific QSAR model by incorporating geometrical descriptors into the general QSAR model, again with acceptable parameters (R² = 85.554, Q²LMO = 80.534, Q²BOOT = 78.186, and Q²EXT = 79.41). Applied to the natural parent chalcone 2′,4′-DHC (A) and the chimeric compound 2′-hydroxy-4′-cinnamate chalcone (B), this new QSAR model predicted IC50 values of 55.95 and 17.86 µM, respectively. To corroborate the predicted cytotoxic activity, compounds A and B were synthesized by two- and three-step reactions. The structures were confirmed by 1H and 13C NMR and ESI+ MS analysis, and the compounds were evaluated in vitro against HepG2, Hep3B (liver), A-549 (lung), MCF-7 (breast), and CaSki (cervical) human cancer cell lines. The chimeric cinnamate chalcone B showed IC50 values of 11.89, 10.27, 56.75, 14.86, and 29.72 µM, respectively. Finally, we employed B as a molecular scaffold for the generation of cinnamate candidates (C–K), which incorporate structural motifs that enhance the cytotoxic activity (pyridine ring, halogens, and methoxy groups) according to our QSAR model. In silico ADME/tox analysis showed that the synthesized compounds A and B, as well as the proposed chalcones C and G, are the best candidates with adequate drug-likeness properties. From all these results, we propose B (as a molecular scaffold) and our two QSAR models as reliable tools for the generation of anti-cancer compounds against the MCF-7 cell line.


Introduction
"Cancer" is a generic term that designates a broad group of diseases that can affect any part of the body [1]. It is a tissue growth produced by the continuous proliferation of abnormal cells with the ability to invade and destroy other tissues, and it is a consequence of mutations in genes that control cell proliferation, differentiation, and homeostasis [2]. The International Agency for Research on Cancer estimated 18.1 million new cancer cases and 10 million cancer deaths in 2020 [3,4]. In México, 190,667 new cases of cancer were reported in 2018, including breast, prostate, colorectal, thyroid, and cervical cancers [4]. Currently, the available treatments for cancer patients can be classified into those that are applied locally (such as surgery and radiotherapy) and those that are used systemically (chemotherapy, hormone therapy, and immunotherapy). The search for suitable natural products for treating human cancer has resulted in the discovery of powerful antiproliferative chemicals used in current practice. Molecular models (scaffolds) obtained from living organisms have undergone different modifications to obtain molecules that are more potent and less toxic than the original molecules.
Chalcone is a simple chemical scaffold found in a wide variety of natural and synthetic flavonoid-type compounds. Chalcones are widely distributed in the plant kingdom, being present in vegetables, fruits, teas, and other plants [5,6]. The name chalcone derives from the Greek word "chalcos", meaning "bronze", in reference to the color of most natural chalcones [7]. Structurally, chalcones share a common chemical scaffold of 1,3-diaryl-2-propen-1-one (chalconoid), which is formed by two aryl groups (α- and β-rings) connected by an enone; they exist as cis and trans isomers, with the trans isomer being thermodynamically more stable. In general, chalcones are synthesized under Claisen-Schmidt condensation conditions using acetophenone and substituted benzaldehydes [8][9][10]. In the case of polyhydroxy chalcones, protection/deprotection steps are employed to obtain the condensation products [11][12][13][14].
Other naturally occurring compounds with a wide variety of biological properties are phenolic acids, vital constituents found in plants, fruits, vegetables, cereals, and legumes [24]. As secondary metabolites produced by plants, phenolic compounds display many biological properties, notably antioxidant, anti-inflammatory, hepatoprotective, anxiolytic, insect-repellent, antidiabetic, anticholesterolemic, antimicrobial, and antiulcer activities [25]. In terms of chemical structure, they can be divided into derivatives of cinnamic and benzoic acids, varying in the number and position of hydroxyl and methoxy groups. Cinnamic, ferulic, caffeic, and p-coumaric acids are the most common cinnamic acid derivatives, and gallic, protocatechuic, syringic, salicylic, vanillic, and p-hydroxybenzoic acids are the most common benzoic acid derivatives [26]. Cinnamic acid is a key intermediate in the shikimate and phenylpropanoid pathways [27]. Cinnamic acid derivatives are reported to have strong antioxidant effects [28]. The p-hydroxy and methoxy groups in cinnamic acid derivatives also show good insulin-releasing activity [29]. Benzoic acid derivatives are naturally consumed as dietary phenolic compounds.
On the border between bioinspired design and rational design, we can prepare hybrid molecules with a dual mode of action to create new efficient drugs. Hybrid molecules are defined as chemical entities with two or more structural domains with different biological functions and double activity, indicating that a hybrid molecule acts as two distinct pharmacophores [30]. For these reasons, as well as their structural simplicity and the simplicity of their synthesis, chalcones and polyphenols continue to enjoy considerable attention from medicinal chemists who explore new molecular scaffolds for the design of new therapeutics [31].
The use of in silico techniques, such as quantitative structure-activity relationship (QSAR) studies, in pharmacology has improved the process of selection and design of several drugs. These advances gave rise to the discipline of Computer-Aided Drug Design (CADD). QSAR is a powerful tool to discover the underlying relationship between the molecular structure and the biological activity of a set of compounds, supported by rigorous statistical parameters [32][33][34][35]. Many QSAR studies have addressed the biological activities of chalcone and polyphenol compounds; in our own work, we studied the possible mechanism of action of a particular set of chalcones with antiproliferative activity [20]. In addition, QSAR is a valuable tool for predicting activity in cases where experimental data are not accessible. QSAR methods use mathematical functions to relate the molecular characteristics and properties of the compounds with their biological activity.
Free-Wilson analysis is a QSAR method that directly relates the structural features of molecules with their biological properties, offering a simple and efficient approach for the quantitative description of the biological activity of interest. The main advantage of this methodology is the use of simple descriptors that consider only the presence or absence of substituents at a specific position on the molecular scaffold. Its main drawback is that a large number of analogs need to be synthesized to make the model meaningful.
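The presence/absence encoding behind a Free-Wilson analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: the positions, substituents, and logIC50 values below are made-up toy data, and plain ordinary least squares stands in for the genetic-algorithm variable selection.

```python
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, str]  # (ring position, substituent)

# Hypothetical toy data: positions, groups, and logIC50 values are invented.
FEATURES: List[Feature] = [("R4", "OMe"), ("R2'", "Cl"), ("R4'", "Cl")]
MOLECULES: List[Dict[str, str]] = [
    {},
    {"R4": "OMe"},
    {"R2'": "Cl"},
    {"R4'": "Cl"},
    {"R4": "OMe", "R2'": "Cl"},
    {"R2'": "Cl", "R4'": "Cl"},
]
LOG_IC50 = [1.5, 1.3, 1.2, 1.4, 1.0, 1.1]


def encode(molecule: Dict[str, str], features: Sequence[Feature]) -> List[float]:
    """Return [1, x1, ..., xk]: an intercept plus one 0/1 indicator per feature."""
    return [1.0] + [1.0 if molecule.get(pos) == grp else 0.0 for pos, grp in features]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: indicator columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(x: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Coefficients: intercept first, then one group contribution per feature.
COEFFICIENTS = fit([encode(m, FEATURES) for m in MOLECULES], LOG_IC50)
```

Each fitted coefficient is read as the additive contribution of one substituent at one position to logIC50; a negative contribution lowers the predicted logIC50 and therefore corresponds to higher predicted potency.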
Based on all the properties mentioned above of the chalcone and phenolic compounds, in this work, we constructed a QSAR model to predict the cytotoxic activity of different 2′,4′-DHC derivatives substituted at position C-4′ with phenols from the cinnamic and benzoic acid families. Finally, we synthesized the theoretically most active compound and evaluated it in a panel of five human cancer cell lines.
The calculation of different descriptor families was based on their structural interpretation, to help us find the structural fragments that are key for increasing chalcone antiproliferative activity (see details in the Computational methods section). A Free-Wilson analysis was done by inspection of the substituents at rings α and β: if a specific functional group appeared at a particular position in the ring, a value of 1 was assigned; otherwise, a value of 0 was assigned. The mathematical models were generated with the MobyDigs software [54], using genetic algorithms (GA) to produce the regression models. The best mathematical model obtained, Equation (1), consists of fifteen molecular descriptors. Its reported validation parameters include a(Q²) = −0.2392 (±0.000); δK = 0.0308 (0.000); δQ = 0.0034 (−0.005); RP = 0.017 (0.100); RN = −0.033 (−0.054).

All statistical parameters were obtained as average values from five experiments. The values of all the molecular descriptors present in the model and the logIC50 of each molecule are shown in Table S1. This model was chosen according to its descriptive ability (associated with the molecular descriptors and their coefficients) and to statistical validation requirements, such as a standard deviation (s) of 0.1501 (±0.000) [55], a Fisher function test (F) value of 42.906 (±22.1114) [56], and a correlation coefficient (R²) of 86.93 (±1.3891) [57], indicating that our model is acceptable. In addition, the QUIK (δK), redundancy (RP), and overfitting (RN) rules were used for the model selection, with values of δK = 0.0308 (0.000), RP = 0.017 (0.100), and RN = −0.033 (−0.054) [58][59][60][61], where RP indicates redundancy among some descriptors.
To evaluate and validate the predictive ability of our model, we applied the asymptotic squared Q rule (Q²ASYM) and leave-many-out cross-validation (Q²LMO). The latter technique consists of leaving a percentage of the molecules out of the model generation; in our case, 30% of the molecules were randomly selected, and the experiment was carried out five times. Linear correlation plots of the predicted logIC50 against the experimental logIC50 are shown in Figure 2, with their upper and lower confidence intervals at the 95% level [61]. Figure 2 illustrates the correlation and predictive ability of our QSAR model through the number of molecules (colored blue and yellow) that are located within the applicability domain.

The applicability domain depicted by the Williams plot in Figure 3 allows the detection of molecules that our model cannot adequately predict. Outliers are molecules with distinctive structures (high leverage, h > h*, where h* = 3p/n, n is the number of molecules, and p is the number of descriptors in the model plus one) or molecules associated with the response (calculated residuals > 3 × SDEC); that is, molecules that fall outside the limits established by the leverage warning and by three times the standard deviation error in calculation.
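As a worked illustration of the two limits just described, the sketch below computes the warning leverage h* = 3p/n, flags a molecule against the Williams-plot limits, and evaluates a cross-validated Q² from predictions made for left-out molecules. The function names are hypothetical, Q² is written in its common 1 − PRESS/TSS form (an assumption, since the exact variant is not stated here), and the example assumes that all 207 molecules and 15 descriptors enter the fit.

```python
from typing import Sequence


def leverage_threshold(n_descriptors: int, n_molecules: int) -> float:
    """Warning leverage h* = 3p/n, with p = number of descriptors + 1."""
    return 3 * (n_descriptors + 1) / n_molecules


def is_outlier(leverage: float, residual: float, h_star: float, sdec: float) -> bool:
    """True if the molecule lies outside the limits: h > h* or |residual| > 3*SDEC."""
    return leverage > h_star or abs(residual) > 3 * sdec


def q2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Q2 = 1 - PRESS/TSS, using predictions obtained for left-out molecules."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1 - press / tss


# Assumption for illustration: 15 descriptors fitted on all 207 molecules.
H_STAR = leverage_threshold(15, 207)  # 48/207, roughly 0.23
```

Note that the Q² values quoted in the text are expressed as percentages, so a fraction returned by `q2` would be multiplied by 100 for comparison.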

QSAR Interpretation
Equation (1) presents eleven indicator variables: R6_OH, R2_OMe, R4_FA026, R4_FA029, R4_1PPD, R2_TMPhO, R2′_Cl, R4′_Cl, R2′_OMe, R4′_4MPZ, and R4′_FB035. These descriptors are associated with the presence/absence of the corresponding functional group placed at the specific carbon atom on rings "α" or "β". Functional groups are important features in chemical structures, as they provide the basis for the physicochemical properties unique to each compound, as well as its reactivity. They also allow the generation of key interactions with their biological targets and can modulate potency and affinity. Thus, the inclusion of molecular descriptors of functional groups at specific positions in the chalcone rings permits the analysis of their potency in terms of volume or properties. Nevertheless, for the study of this system, the consideration of common functional groups as molecular fragments was not enough. Therefore, bigger and more complex molecular fragments incorporated on rings "α" or "β" were used. Figure 4 shows the structures of the functional groups and structural fragments used in the model and the number of molecules from the dataset that contain them. For example, 16 of the 207 compounds contain the hydroxyl group at the α-ring, while 46 contain the methoxy group on the same ring. On the other hand, 17 of the 207 chalcones contain the chlorine atom at position 2 of the β-ring, while another 17 molecules contain the Cl atom at position 4 of the β-ring. Only a small number of chalcones contain complex molecular fragments; in most cases, these molecular descriptors are used to incorporate this diverse set of molecules into the QSAR model. For example, the chemical structure FA026 appears in only three of the 207 molecules, which correlates with their position as chemical outliers in the Williams plots. The same applies to molecules that contain the 4MPZ and 1PPD functional groups.

Three chalcones containing the 6-chloro-2-quinoxalinol motif are seen as outliers, with two containing chlorine atoms at positions 2 and 4 on the β-ring and one containing a p-methoxy group on the α-ring. Although the QSAR model suggests that chlorine atoms located at the ortho- and para-positions on the β-ring are good for the bioactivity, the combination with the 6-chloro-2-quinoxalinol group generates results that fall slightly outside the applicability domain of our model. The p-methoxy derivative is distinct from the rest of the molecules in terms of bioactivity: while similar species exhibit logIC50 values below 1.35 (or 22 µM), the methoxy derivative shows a logIC50 value of 1.725 (or 53.1 µM), thus being less potent than the other compounds. Hydroxamic acid derivatives are also observed as outliers, as only three such molecules are present within the data used to generate the model. Piperidine derivatives, alongside a piperazine chalcone, are also observed as outliers. Indeed, Equation (1) establishes that piperidine substituents at the α-ring decrease the chalcones' potency. Other compounds with unique structures, which the Williams plots place as potential outliers, are shown in Figure 3.

The next two variables, Qindex and CENT, belong to the topological descriptors. Qindex is the quadratic index calculated by the normalization of the first Zagreb index (ZM1) [62].
On the other hand, centralization (CENT) is defined as:

$$\mathrm{CENT} = 2W - \mathrm{nSK} \cdot \mathrm{UNIP}$$

where W is the Wiener index, nSK is the number of non-hydrogen atoms, and UNIP is the unipolarity of the molecule (defined as the minimum value of the vertex distance degrees) [63]. Both descriptors are mainly related to molecular branching. The scatterplot of the Qindex descriptor against logIC50 is shown in Figure 5. The plot shows that chalcone derivatives with low branching are located at the bottom of the graph, while compounds with more complex functional groups (high branching) at the α-ring are seen at the middle top. A similar trend is observed for the CENT descriptor against the observed bioactivity, although the dispersion of the data is more pronounced. Nonetheless, the chemical space can be divided into three regions, where simple chalcone derivatives are at the bottom.

Notably, in both plots, three compounds appear at the middle-left of the scatterplot, near the low-branching chalcones. These chemical species have high potency against the MCF-7 cancer line, although they bear complex functional groups and lie far from the rest of the high-branching compounds. In each scatterplot, structurally similar chemical species are highlighted with colored circles, whereas the more structurally diverse chalcones appear as light-blue dots dispersed across the plot. Figure 6 shows a correlation between CENT and Qindex: simple chalcones are located at the bottom-left of the plot, and the values gradually increase toward the more complex chalcones at the top-right of the scatterplot.
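To make the link between these topological descriptors and branching concrete, the following sketch computes the Wiener index, the unipolarity, the centralization, and the first Zagreb index for two four-carbon skeletons. It is an illustration under stated assumptions: the molecular graphs (n-butane and isobutane) are not part of the dataset, the function names are hypothetical, and centralization is taken in its usual form, CENT = 2W − nSK·UNIP.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # hydrogen-depleted molecular graph (adjacency lists)


def distance_degrees(graph: Graph) -> Dict[int, int]:
    """For each atom, sum the shortest-path distances to all other atoms (BFS)."""
    sums: Dict[int, int] = {}
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        sums[start] = sum(dist.values())
    return sums


def topological_descriptors(graph: Graph) -> Dict[str, int]:
    """Wiener index, unipolarity, centralization, and first Zagreb index."""
    sigma = distance_degrees(graph)
    wiener = sum(sigma.values()) // 2        # each atom pair is counted twice
    unip = min(sigma.values())               # minimum vertex distance degree
    n_sk = len(graph)                        # number of non-hydrogen atoms
    zm1 = sum(len(neigh) ** 2 for neigh in graph.values())
    return {"W": wiener, "UNIP": unip, "CENT": 2 * wiener - n_sk * unip, "ZM1": zm1}


# Illustrative skeletons (not taken from the dataset).
N_BUTANE: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ISOBUTANE: Graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

For these two isomers, the branched skeleton gives the larger ZM1 (12 versus 10) and the larger CENT (6 versus 4), which is the sense in which both descriptors track branching.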
Finally, two atom-centered fragment (ACF) descriptors appear in Equation (1): C029 and H052. C029 depicts a central sp²-carbon atom that is bonded to two electronegative atoms and one R group, with delocalized electron density between one X atom (X = N, O) and the R group. H052, in turn, is defined as a hydrogen atom bonded to an sp³-carbon atom next to a carbon atom bonded to an electronegative atom X. Figure 7 shows the distribution of molecules that contain the C029 and H052 descriptors. It is seen that 63 molecules from the original dataset have the H052 fragment (shown in light blue). On the other hand, only 28 molecules display the C029 descriptor; these belong to a specific family of chalcones that have the 6-chloroquinoxalin-2-ol fragment (green slice). The remaining 116 chalcone derivatives do not contain any ACF descriptor. It is noteworthy that complex molecules, with large functional groups or high branching, possess one of the ACF descriptors, while small (low-branched) chalcones do not have any.
Thus far, our QSAR model has shown the importance of the molecular shape and branching of chalcones, which inspired us to evaluate whether another class of molecular descriptors (geometrical) may fit better in our QSAR model; this decision was based on the rigid and aromatic-like character of most of the chalcones. From this, Equation (3) was obtained, a new mathematical model that includes four geometrical descriptors.

As for the previous QSAR model, all statistical parameters were obtained as average values, and all the validation criteria were satisfied; for this model, higher values of the statistical parameters were achieved. In Figure 8, selected scatterplots of the predicted logIC50 against the experimental logIC50 values and their corresponding Williams plots are shown. These graphics show that this new QSAR model possesses a better predictive ability and a more extended applicability domain. To achieve this, many of the chalcones initially included in this study were removed because of their structural differences according to the geometrical descriptors: AROM, J3D, SPH, and RGyr.
The 3D Balaban index (J3D) is obtained from the Balaban distance-connectivity index J (Equation (4)) using the geometric distance degrees in place of the topological distance degrees:

$$J = \frac{M}{\mu + 1} \sum_{\text{edges}} \left(d_i \cdot d_j\right)^{-1/2} \tag{4}$$

where M is the number of edges, µ is the cyclomatic number (number of rings), and $d_i$ and $d_j$ are the distance degrees of the two atoms joined by each edge [64]. J is a graph invariant. This descriptor is highly discriminant, and its value does not change substantially with molecule size or number of rings. According to our QSAR model, molecules with higher J3D values are expected, and Figure 9 displays the range of J3D values together with examples of molecules with higher J3D values.

The radius of gyration (RGyr) is a size descriptor for the distribution of atomic masses in a compound; it is calculated as follows:

$$\mathrm{RGyr} = \sqrt{\frac{\sum_{i=1}^{nAT} m_i \, r_i^{2}}{MW}}$$

where $r_i$ is the distance of the ith atom from the center of mass, $m_i$ is the corresponding atomic mass, nAT is the number of atoms, and MW is the molecular weight [65]. In our QSAR model, this descriptor is positively related to the potency of the chalcones; that is, as RGyr increases, the biological activity of the chalcone increases. In Figure 9, the range of RGyr values and some of the chalcone structures are displayed.
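The radius of gyration can be evaluated directly from atomic masses and 3D coordinates as the mass-weighted root-mean-square distance from the center of mass. The sketch below is a minimal illustration; the function name and the two toy "molecules" are hypothetical and are not taken from the dataset.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[float, float, float, float]  # (mass, x, y, z)


def radius_of_gyration(atoms: Sequence[Atom]) -> float:
    """RGyr = sqrt(sum_i m_i * r_i^2 / MW), with r_i measured from the center of mass."""
    mw = sum(m for m, _, _, _ in atoms)  # molecular weight as the sum of atomic masses
    cx = sum(m * x for m, x, _, _ in atoms) / mw
    cy = sum(m * y for m, _, y, _ in atoms) / mw
    cz = sum(m * z for m, _, _, z in atoms) / mw
    weighted = sum(
        m * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) for m, x, y, z in atoms
    )
    return math.sqrt(weighted / mw)


# Toy systems (hypothetical): two equal masses, then two unequal masses.
SYMMETRIC = [(1.0, -1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
ASYMMETRIC = [(12.0, 0.0, 0.0, 0.0), (4.0, 4.0, 0.0, 0.0)]
```

Because the distances are squared and mass-weighted, extended structures with heavy atoms far from the center of mass give larger RGyr values than compact ones of the same molecular weight.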
The spherosity (SPH) is an anisometry descriptor calculated from the eigenvalues of the covariance matrix obtained from the molecular matrix:

$$\mathrm{SPH} = \frac{3\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}$$

where $\lambda_1 \geq \lambda_2 \geq \lambda_3$ are the eigenvalues of the covariance matrix. The spherosity index goes from zero for flat compounds (benzene) to one for totally spherical molecules. SPH has a positive coefficient, meaning that as molecules tend to adopt a spherical form, the expected biological activity decreases (as the logIC50 increases) [66]. This characteristic is related to our previous QSAR analysis, where more rigid and cylinder-like structures are preferred (vide supra).

The aromaticity (AROM) is derived from the general aromaticity indices:

$$\mathrm{AROM} = 1 - \frac{225}{B_\pi} \sum_{b} \left(1 - \frac{r_\pi}{\bar{r}_\pi}\right)^{2}$$

where the sum runs over the bonds belonging to aromatic rings, $\bar{r}_\pi$ is the average π bond length, $r_\pi$ are the actual π bond lengths, and $B_\pi$ is the number of aromatic bonds. In the chalcone derivatives, the AROM descriptor has a negative coefficient, indicating that the greater the aromaticity, the better the molecule's activity. For most derivatives, the chalcone core is planar; hence, a better overlap of π orbitals is expected [67].

An in-depth analysis of the four new descriptors versus logIC50 showed that structures are arranged such that low-branching molecules are located at the top of their corresponding plots, while molecules with complex functional groups are at the bottom (Figure 9). This is especially seen for the J3D, SPH, and AROM descriptors, while for RGyr, more dispersion of the data is obtained.
When the J3D and AROM descriptors are plotted against each other, three regions are observed, with specific limits on each axis: from left to right, 1.444 and 2.901 are the limits for the J3D Balaban index (x-axis), while 0.959 and 0.996 are the corresponding limits for the AROM descriptor (y-axis). Chalcone derivatives bearing the tetraazatricyclo- and the ortho-aminophenyl- motifs are seen outside these limits, in agreement with the outliers observed in the Williams plots.
To test the predictive ability of the QSAR model, a set of molecules not included in the initial dataset of compounds was used for external validation: compounds reported by Sangpheak [68], Xiao [69], and Kumar [70]. As seen in Figure 10, there is a good correlation between the predicted values and the corresponding experimental data (Table S2), with an R² value of 75.62, allowing us to confirm the reliability of our model. This strategy has been used in other work to ensure predictive ability, with positive results [71].
Molecules 2023, 28, x FOR PEER REVIEW
Figure 10. Scatterplot for the molecular dataset of some chalcone derivatives. Selected molecules are displayed within the plot showing the change in substituents.

Design and Synthesis of 2′-Hydroxy-4′-cinnamate Chalcone
According to the QSAR model of Equation (3), we believe that new hybrid molecules containing 2′,4′-DHC and polyphenol derivatives may be a good choice in the search for cytotoxic agents. In this study, we synthesized the parent 2′,4′-dihydroxychalcone (A) and the hybrid cinnamate chalcone B. The modification was carried out at C-4′ of ring α, leaving ring β of the chalcone intact. We chose cinnamic acid because of its reported antioxidant activity. The decision to modify only position 4′ of 2′,4′-DHC was due to the fact that, in our previous work, we found that modification of the chalcone at this position improved the cytotoxic activity against the PC-3 (prostate) cancer cell line [20].

Biological Activity
The synthesized chalcones A and B, as well as the protected derivative A1, with purity values over 95%, were evaluated by the MTS assay for their capacity to inhibit the in vitro growth of breast (MCF-7), lung (A-549), cervical (CaSki), and liver (Hep3B, HepG2) human cancer cell lines. We also included the immortalized human hepatocyte cell line (IHH) as a control of non-cancerous cells. Experimental results and predicted activity are summarized in Table 1.

Because of the high potency of compound B, we decided to explore structural modifications of this chalcone derivative using specific functional groups according to the QSAR model described above (Equation (3)). Simple chemical substitutions can be performed on the aromatic ring of the cinnamoyl motif. First, because a high number of approved drugs contain nitrogen heterocycles, which can form hydrogen bonds with the receptor, we decided to evaluate pyridine rings as a starting choice, with replacement of the p-hydrogen atom of the cinnamoyl ring (Figure 11a, compound C). The predicted IC50 value decreases from 17.86 to 5.62 µM. As a second approach, different groups were evaluated to determine their potency compared to the parent compound. The trifluoromethyl group is estimated to enhance the potency of the B derivative, decreasing the expected value to 2.01 µM (Figure 11b, compound H). According to Equation (1), the presence of methoxy and hydroxyl functional groups at positions 2 and 6 on the α-ring of the chalcone core enhances the potency of the compounds by decreasing the expected IC50 value, as seen in compound J (Figure 11c). This also corresponds to an increase in the Qindex and CENT values, which is consistent with greater branching of the compound. Finally, the QSAR model points to two chlorine atoms on the β-ring as beneficial to the predicted cytotoxicity; accordingly, the expected value decreases to 0.45 µM in compound K (Figure 11d).
Calculated molecular descriptors and predicted logIC50 values are shown in Table S3.

Figure 11. Proposed chalcone-cinnamate derivatives (B-K), which, according to the QSAR model, are expected to be high in potency compared to parent compound B. In (a,b), the introduction of pyridine and trifluoromethyl groups decreases the IC50 of the compound as the values of Qindex and CENT increase, respectively; (c) the methoxy and hydroxyl functional groups at positions 2 and 6 on the α-ring of the chalcone core enhance potency in compound J; finally, in (d), chlorine atoms on the β-ring are beneficial for the expected IC50 value.
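The predicted gains over parent compound B can also be expressed as fold changes and on the logarithmic scale used by the model. The short sketch below uses only the predicted IC50 values quoted in the text for B, C, H, and K; the dictionary and function names are illustrative and do not come from the original work.

```python
import math

# Predicted IC50 values (µM) quoted in the text for B and three derivatives.
predicted_ic50_um = {"B": 17.86, "C": 5.62, "H": 2.01, "K": 0.45}


def fold_improvement(parent_ic50, derivative_ic50):
    """Return how many times lower the derivative's IC50 is than the parent's."""
    return parent_ic50 / derivative_ic50


parent = predicted_ic50_um["B"]
for name, ic50 in predicted_ic50_um.items():
    print(
        f"{name}: IC50 = {ic50:6.2f} µM, "
        f"logIC50 = {math.log10(ic50):+.2f}, "
        f"{fold_improvement(parent, ic50):.1f}-fold vs. B"
    )
```

On this scale, a one-unit decrease in logIC50 corresponds to a tenfold gain in predicted potency.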

In Silico ADME/tox Studies of A-K
ADME/tox data (absorption, distribution, metabolism, excretion, and toxicity) provide information on how the human body handles a drug. In silico pharmacokinetic parameters can help to select drug candidates and guide drug optimization [72]. We decided to analyze ADME/tox data of the synthetic chalcones A and B and of the chalcone-cinnamate derivatives proposed from the QSAR study (C-K) (Table 2). Drug-likeness analysis showed that only the synthesized chalcones A and B violate none of the Lipinski, Veber, and Ghose rules, suggesting that they would display well-behaved absorption or permeation [73]. In contrast, F, H, J, and K have one Lipinski violation (MW exceeded). Lipophilicity was assessed using the logarithm of the n-octanol/water partition coefficient (LogPoctanol/water). The chalcones presented MLogP values from 2.17 to 3.79, which are associated with high membrane permeability. On the other hand, only chalcones A and B comply with the Veber rules, with fewer than 10 rotatable bonds, which indicates molecular flexibility toward their target. Moreover, a TPSA range between 57.53 and 85.72 Å² (TPSA < 140 Å²) [73] indicates high oral bioavailability for all compounds, and most of them (except for F, H, J, and K) could have high gastrointestinal absorption. The blood-brain barrier (BBB) makes it challenging to develop new treatments for brain diseases, including radiopharmaceuticals for the brain; according to our results, only A and B could cross the BBB [74]. P-glycoprotein (P-gp) is a multi-specific efflux transporter that can modulate the pharmacokinetics of anti-cancer drugs. It is related to multidrug resistance in human multidrug-resistant (MDR) cancers [75]. Only chalcones H, J, and K could act as P-gp substrates.
Finally, none of the designed chalcones (except D, F, and K) is predicted to be hepatotoxic (liver damage). None of them is predicted to be mutagenic or carcinogenic, and their predicted LD50 values range from 1000 mg/kg to 3800 mg/kg. In general, according to the ADME predictions, the synthesized chalcones A and B are the best candidates and could have the required cell membrane permeability and bioavailability without toxicity, followed by C and G.
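The drug-likeness filters discussed above reduce to a few threshold checks. The following is a minimal sketch, assuming the commonly cited cut-offs (Lipinski: MW ≤ 500, H-bond donors ≤ 5, H-bond acceptors ≤ 10; Veber: rotatable bonds ≤ 10 and TPSA ≤ 140 Å²) and an MLogP limit of 4.15, which is an assumption about how the logP criterion is applied to Moriguchi values. The `Properties` class, the function names, and the example inputs are hypothetical; they are not the values of Table 2.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float       # g/mol
    mlogp: float            # Moriguchi logP
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # topological polar surface area, Å²


def lipinski_violations(props, mlogp_limit=4.15):
    """Count violated Lipinski criteria (MW, logP, H-bond donors/acceptors)."""
    return sum([
        props.mol_weight > 500.0,
        props.mlogp > mlogp_limit,  # assumed limit for MLogP
        props.h_bond_donors > 5,
        props.h_bond_acceptors > 10,
    ])


def passes_veber(props):
    """Veber filter: at most 10 rotatable bonds and TPSA of at most 140 Å²."""
    return props.rotatable_bonds <= 10 and props.tpsa <= 140.0


# Hypothetical inputs for illustration only (not taken from Table 2).
examples = {
    "small chalcone": Properties(240.3, 2.2, 2, 3, 3, 57.53),
    "heavy derivative": Properties(520.0, 3.5, 1, 6, 11, 85.72),
}
for name, props in examples.items():
    print(name, lipinski_violations(props), passes_veber(props))
```

Counting violations rather than returning a single pass/fail flag mirrors the way the text reports "one Lipinski violation" for some derivatives.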

Calculation of the Molecular Descriptors
The structures of the molecules of interest were drawn in Avogadro and MarvinSketch (ChemAxon, Budapest, Hungary). The molecules were analyzed in the program Dragon 05 [76], which provided the families of molecular descriptors corresponding to constitutional, topological, functional group, geometric, atom-centered fragment, molecular charge, and molecular properties (Table 3). Additionally, for rings α and β, indicator variables were used as descriptors: as stated above, if a specific functional group appeared at a particular position in the ring, a value of 1 was assigned; otherwise, a value of 0 was assigned.

Table 3. Molecular descriptors calculated with the Dragon 05 program.
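The indicator variables described in this section follow the Free-Wilson idea of coding the presence (1) or absence (0) of a group at a ring position. A minimal sketch of such an encoding is shown below; the column list is an illustrative subset using names of the form position_group, and the input format (a dictionary from position to group) is an assumption made for the example, not the authors' workflow.

```python
# Illustrative subset of position_group indicator columns (assumed for this sketch).
INDICATOR_COLUMNS = ["R6_OH", "R2_OMe", "R4_1PPD"]


def encode_indicators(substituents, columns=INDICATOR_COLUMNS):
    """Return 1 for each column whose group is present at that position, else 0.

    `substituents` maps a ring position to the group it carries,
    e.g. {"R6": "OH", "R2": "OMe"}.
    """
    present = {f"{position}_{group}" for position, group in substituents.items()}
    return [1 if column in present else 0 for column in columns]


print(encode_indicators({"R6": "OH", "R2": "OMe"}))  # [1, 1, 0]
```

Each row of 0/1 values can then be appended to the computed descriptors of the same molecule before model building.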

DataSet
An initial dataset of 207 compounds was obtained from literature published between 1995 and 2022; details can be found in the Supplementary Materials. All of these compounds were evaluated against the MCF-7 cancer cell line by the same method. To improve the reliability of the data, the compounds were curated to eliminate outliers, uncertainties, and potential errors. Model generation required IC50 values in µM; values reported in µg/mL were converted to µM, and all values were then transformed to logIC50. Curation left a final set of 160 molecules for the first model and 152 for the second model.
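The unit conversion described above follows directly from the definition of molarity: 1 µg/mL equals 1 mg/L, so dividing by the molecular weight (g/mol) and multiplying by 1000 gives µmol/L. The sketch below illustrates this step and the subsequent base-10 logarithm; the function names and the example input are hypothetical.

```python
import math


def ug_per_ml_to_um(ic50_ug_per_ml, mol_weight_g_per_mol):
    """Convert a concentration from µg/mL to µM.

    1 µg/mL = 1 mg/L; (mg/L) / (g/mol) = mmol/L; times 1000 gives µmol/L.
    """
    return ic50_ug_per_ml * 1000.0 / mol_weight_g_per_mol


def log_ic50(ic50_um):
    """Base-10 logarithm of an IC50 expressed in µM."""
    return math.log10(ic50_um)


# Hypothetical example: 10 µg/mL for a compound of 250 g/mol.
ic50_um = ug_per_ml_to_um(10.0, 250.0)
print(ic50_um, round(log_ic50(ic50_um), 3))
```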

Construction of QSAR Model
The QSAR model was constructed using the genetic algorithm (GA) technique in the MobyDigs program [61]. All molecular descriptors were used as independent variables (x), and the experimental cytotoxic activity was the dependent variable (y). The best model was selected on the basis of parameters such as the coefficient of determination (R2), the standard deviation (s), and the Fisher test (F). The Y-scrambling test was used to verify that the QSAR model did not arise from chance correlation. First, the logIC50 values of the dataset were randomly permuted; the permuted column was then used with the same variables to generate new models. The procedure was repeated 300 times, and the quality parameters of these new models were compared with those of the original QSAR model: if the original model is free of chance correlation, the R2 and Q2 values calculated for the models built on permuted logIC50 values differ significantly from the original values; otherwise, the model is rejected. Non-collinearity between descriptors was checked using the QUIK rule, for which typical δK threshold values lie between 0.01 and 0.05; models with negative values are not allowed. To detect models with an excess of "good" or "bad" descriptors, the redundancy (RP) and overfitting (RN) rules were applied.
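The Y-scrambling procedure can be sketched in a few lines. The example below is a simplified illustration, not the MobyDigs implementation: it uses a single descriptor, takes R2 as the squared Pearson correlation, and runs on synthetic data, whereas the actual models are multilinear. All names and values are assumptions made for the sketch.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between x and y (single-descriptor fit)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def y_scrambling(x, y, n_rounds=300, seed=0):
    """Return the original R2 and the R2 values obtained after permuting y."""
    rng = random.Random(seed)
    original = r_squared(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-activity pairing
        scrambled.append(r_squared(x, y_perm))
    return original, scrambled


# Synthetic data: activity depends linearly on the descriptor plus small noise.
x = [0.5 * i for i in range(1, 13)]
y = [0.8 * xi + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]

original, scrambled = y_scrambling(x, y)
print(f"original R2 = {original:.3f}")
print(f"mean scrambled R2 = {sum(scrambled) / len(scrambled):.3f}")
```

A model free of chance correlation shows an original R2 well above the distribution of scrambled values, which is the comparison described in the text.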

Statistical Validation
The coefficient of determination, $R^2$. The square of the correlation coefficient (also called Pearson's r) between the observed and predicted values in the regression is calculated as shown in Equation (8) [57]. Acceptable values of $R^2$ should be $R^2 \ge 0.6$ [77,78].
Fisher function, F. Among the most common statistical tests, it is defined as the ratio between the model sum of squares (MSS) and the residual sum of squares (RSS), each divided by its degrees of freedom:

$$F = \frac{MSS/d_M}{RSS/d_E} = \frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2/d_M}{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2/d_E} \tag{9}$$

where $d_M$ and $d_E$ refer to the degrees of freedom of the model (the molecular descriptors in the model) and of the error, respectively, while $\bar{y}$ is the average value of the experimental activity. The calculated value is compared with the critical value $F_{crit}$ for the corresponding degrees of freedom. It is a comparison between the model-explained variance and the residual variance: high values of the F test indicate reliable models [79]. The standard deviation, s, is a measure of the dispersion, or scatter, of the data with respect to a mean value, as shown in Equation (10) [55], where n represents the number of molecules in the study, $y_i$ represents the experimental activity, and $\hat{y}_i$ is the activity calculated from the QSAR model.
In addition, validating our statistical model required applying the QUIK, redundancy, and overfitting rules. The QUIK rule is based on the K multivariate correlation index, which measures the total correlation of a set of variables [59] and is defined as:

$$K = \frac{\sum_{j=1}^{p}\left|\dfrac{\lambda_j}{\sum_{j}\lambda_j}-\dfrac{1}{p}\right|}{2(p-1)/p}, \qquad j = 1,\ldots,p \ \text{ and } \ 0 \le K \le 1 \tag{11}$$

where $\lambda_j$ are the eigenvalues obtained from the correlation matrix of the dataset X(n, p), n being the number of objects and p the number of variables. The total correlation in the set given by the model predictors X plus the response Y ($K_{XY}$) should always be greater than that measured only in the set of predictors ($K_X$). Therefore, if $K_{XY} - K_X < \delta K$, the model is rejected, where $\delta K$ has values of 0.01 to 0.05; models with negative differences are unacceptable [58]. The redundancy rule ($R^P$) detects models with an excess of "good" predictors and establishes that if $R^P < t^P$, the model is rejected. Depending on the data, the $t^P$ values range from 0.01 to 0.1. $R^P$ is defined by:

$$R^P = \prod_{j=1}^{p^+}\left(1 - M_j\,\frac{p}{p-1}\right), \qquad M_j > 0 \ \text{ and } \ 0 \le R^P \le 1 \tag{12}$$

The overfitting rule ($R^N$) establishes that if $R^N < t^N(\varepsilon)$, the model is rejected; the $t^N(\varepsilon)$ values are calculated by Equation (13):

$$t^N(\varepsilon) = \frac{p\,\varepsilon - R}{p\,R} \tag{13}$$

where $\varepsilon$ values range from 0.01 to 0.1, p is the number of variables in the model, and R is the multiple correlation coefficient of the model. $R^N$ is defined by:

$$R^N = \sum_{j=1}^{p^-} M_j, \qquad M_j < 0 \ \text{ and } \ -1 < R^N \le 0 \tag{14}$$

where $M_j$ is defined by Equation (15):

$$M_j = \frac{R_{jY}}{R} - \frac{1}{p}, \qquad -\frac{1}{p} \le M_j \le \frac{p-1}{p} \tag{15}$$

$R_{jY}$ is the absolute value of the correlation coefficient between the jth descriptor and the response Y. If a descriptor has a high correlation with the response, $R^P$ takes a low value, and it is close to 0 when $R_{jY}$ is equal to R and p > 1. $R^N$ accounts for an excess of noisy or useless variables and can be thought of as a measure of overfitting. It takes its maximum value, zero, when only non-noisy variables are in the model [60].
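The redundancy and overfitting rules can be checked numerically once the correlations are known. The sketch below assumes the commonly published forms of these functions, namely M_j = R_jY/R − 1/p, R^P as the product of (1 − M_j·p/(p − 1)) over the positive M_j, R^N as the sum of the negative M_j, and t^N(ε) = (p·ε − R)/(p·R); these forms, the function names, and the example correlations are assumptions of this illustration rather than values from the study.

```python
def m_values(r_jy, r):
    """M_j = |R_jY| / R - 1/p for each descriptor (assumed form)."""
    p = len(r_jy)
    return [abs(rj) / r - 1.0 / p for rj in r_jy]


def redundancy_rp(r_jy, r):
    """R^P: product of (1 - M_j * p/(p-1)) over descriptors with M_j > 0."""
    p = len(r_jy)
    if p < 2:
        raise ValueError("the redundancy rule needs at least two descriptors")
    rp = 1.0
    for m in m_values(r_jy, r):
        if m > 0:
            rp *= 1.0 - m * p / (p - 1)
    return rp


def overfitting_rn(r_jy, r):
    """R^N: sum of the negative M_j values (0 when no descriptor is noisy)."""
    return sum(m for m in m_values(r_jy, r) if m < 0)


def tn_threshold(p, r, eps=0.01):
    """t^N(eps) = (p*eps - R) / (p*R), the rejection threshold for R^N."""
    return (p * eps - r) / (p * r)


# Hypothetical two-descriptor model: one descriptor carries all the correlation.
r_jy, r = [0.9, 0.1], 0.9
print("R^P =", redundancy_rp(r_jy, r))
print("R^N =", overfitting_rn(r_jy, r))
print("t^N(0.01) =", tn_threshold(len(r_jy), r, eps=0.01))
```

In this example R^P equals 0 because one descriptor alone reproduces R, which matches the limiting case described in the text.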

External Validation
The generated model was validated externally by predicting the activity of a set of molecules that were not included in the generation of the model. As stated above, data curation was performed on a total of 10 molecules, which were drawn in Avogadro; their molecular descriptors were obtained from the Dragon software package, and they were also analyzed using the Free-Wilson approach. A complete list of descriptors and references can be found in Supplementary Materials Table S1.

Chemistry
All commercial reagents, namely 2,4-dihydroxyacetophenone (99%), benzaldehyde (99%), cinnamic acid, benzyl bromide, dicyclohexylcarbodiimide (99%), dimethyl sulfoxide, potassium carbonate, potassium hydroxide, anhydrous dichloromethane (99%), methanol, and boron tribromide, were obtained from Sigma-Aldrich (St. Louis, MO, USA) and were used without further purification. Melting points were determined on a Prendo apparatus and are uncorrected. Nuclear magnetic resonance spectra were recorded at 600 MHz for 1H NMR and 150 MHz for 13C NMR in CDCl3, with tetramethylsilane (TMS) as an internal standard, on a Bruker AMX 600 instrument. Chemical shifts (δ) are expressed in parts per million (ppm) relative to TMS and coupling constants (J) in Hertz. Multiplicities are indicated as singlet (s), doublet (d), triplet (t), quartet (q), doublet of doublets (dd), multiplet (m), and broad singlet (bs). Open-column chromatography was carried out on silica gel 60 (70-230 mesh), and different solvent systems of n-hexane and EtOAc were used as mobile phases for the purification. Mass spectra were obtained on a JEOL M-station JMX-AX 505 HA mass spectrometer and on an Agilent Infinity 1260 HPLC coupled to a 6120 Quadrupole LC/MS.

Antiproliferative Activity
Compounds A and B, as the first-approach compounds, were subjected to antiproliferative assays in the human cancer cell lines Hep3B and HepG2 (hepatocellular carcinoma), MCF-7 (breast), A549 (lung), and CaSki (cervical), obtained from ATCC (American Type Culture Collection, Manassas, VA, USA). We also included an immortalized human hepatocyte cell line (IHH) as a non-cancerous control [80]. CaSki cells were grown in RPMI-1640 medium (Sigma-Aldrich, St. Louis, MO, USA), A549 cells in Kaighn's Modification of Ham's F-12 Medium (American Type Culture Collection, Manassas, VA, USA), and Hep3B, HepG2, IHH, and HeLa cells in DMEM medium (Invitrogen, Thermo Fisher Scientific, Inc., Waltham, MA, USA), supplemented with 10% fetal bovine serum (FBS, Invitrogen) and 2 mM glutamine. All cultures were incubated at 37 °C in an atmosphere of 5% CO2.
To start the evaluation, cells were seeded in 96-well plates at 5000 cells per well. The compounds were tested at 100, 10, 1, 0.1, and 0.001 µg/mL to build a dose-response curve, and the plates were incubated at 37 °C in a 5% CO2 atmosphere for 48 h. The number of viable cells was determined using the CellTiter 96® AQueous One Solution Cell Proliferation Assay kit (Promega) in accordance with the manufacturer's instructions. The absorbance was then measured at 490 nm in an automatic ELISA microplate reader (Promega, Madison, WI, USA). Each assay was run in triplicate in three independent experiments. The data were analyzed in the statistical program Prism 6.01, and the half-maximal inhibitory concentration (IC50) was determined by regression analysis.
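How an IC50 can be read from a dose-response curve by regression is sketched below. The text does not specify the regression model, so this example makes a deliberately simple assumption: a straight-line least-squares fit of percent viability against log10(concentration), solved for 50% viability. Dose-response software commonly fits a sigmoidal model instead. The function name and the data are hypothetical.

```python
import math


def estimate_ic50(concentrations, viability_percent):
    """Estimate IC50 from a linear fit of viability vs. log10(concentration).

    The result is in the same units as `concentrations`.
    """
    x = [math.log10(c) for c in concentrations]
    y = list(viability_percent)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    # Solve 50 = intercept + slope * log10(IC50) for IC50.
    return 10 ** ((50.0 - intercept) / slope)


# Hypothetical viability data (percent of untreated control), in µg/mL.
doses = [0.1, 1.0, 10.0, 100.0]
viability = [90.0, 70.0, 50.0, 30.0]
print(f"estimated IC50 = {estimate_ic50(doses, viability):.2f} µg/mL")
```

A linear fit is only reasonable over the steep central part of the curve; points on the upper or lower plateau would bias the estimate.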

In Silico ADME/tox
The freely accessible SwissADME web tool [81] (http://www.swissadme.ch/ (accessed on 20 April 2018)) was used for ADME data, and ProTox-II [82] (https://tox-new.charite.de/protox_II/ (accessed on 20 April 2018)) was used for toxicology parameters. The chemical structures of compounds A-K were entered into the SwissADME web tool to obtain their SMILES notation. We selected the most important ADME/tox properties to assess lipophilicity, pharmacokinetics, and toxicology.

Conclusions
Our QSAR model showed a good predictive capability supported by the statistical rules. The mathematical model consists of fifteen molecular descriptors: R6_OH, R2_OMe, R4_FA026, R4_FA029, R4_1PPD, R2_TMPhO, R2′_Cl, R4′_Cl, R2′_OMe, R4′_4MPPZ, R4′_FB035, Qindex, CENT, C029, and H052. The model shows how the presence or absence of different functional groups within the chalcone structure affects its cytotoxic activity against the MCF-7 cancer cell line. Aromaticity and molecular branching have a direct implication on the observed bioactivity. Our second QSAR model accounted for these properties of the chalcone and cinnamic acid cores by adding geometrical descriptors (J3D, RGyr, SPH, and AROM). The QSAR model predicted IC50 values of 55.95 and 17.86 µM for chalcones A and B, respectively, which were synthesized in yields of 34% and 24%, respectively. These compounds were evaluated by the MTS assay on human cancer cell lines from four tissues, liver (HepG2, Hep3B), breast (MCF-7), lung (A-549), and cervical (CaSki), and on a non-cancerous liver cell line (IHH). In the biological tests, compound B showed better activity in all cancer cell lines, as predicted by the QSAR model developed in this work. Finally, B was chosen as a molecular scaffold for the generation of new derivatives (C-K) with enhanced predicted cytotoxic activity, obtained by adding functional groups in accordance with the QSAR model. A drug-likeness analysis suggested that chalcones A and B, followed by C and G, are the best candidates to be synthesized.

Supplementary Materials:
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/molecules28145486/s1. Table S1: List of molecules used for the generation of the QSAR model; Table S2: List of molecules used for the validation test; Table S3: Molecular descriptors for derivatives of compound B and their predicted logIC50 values; Figure S1