Novel Arylsulfonylhydrazones as Breast Anticancer Agents Discovered by Quantitative Structure-Activity Relationships

Breast cancer (BC) is the second leading cause of cancer death in women, with more than 600,000 deaths annually. Despite the progress that has been made in early diagnosis and treatment of this disease, there is still a significant need for more effective drugs with fewer side effects. In the present study, we derive QSAR models with good predictive ability based on data from the literature and reveal the relationships between the chemical structures of a set of arylsulfonylhydrazones and their anticancer activity on human ER+ breast adenocarcinoma and triple-negative breast (TNBC) adenocarcinoma. Applying the derived knowledge, we design nine novel arylsulfonylhydrazones and screen them in silico for drug likeness. All nine molecules show suitable drug and lead properties. They are synthesized and tested in vitro for anticancer activity on MCF-7 and MDA-MB-231 cell lines. Most of the compounds are more active than predicted and show stronger activity on MCF-7 than on MDA-MB-231. Four of the compounds (1a, 1b, 1c, and 1e) show IC50 values below 1 μM on MCF-7 and one (1e) on MDA-MB-231. The presence of an indole ring bearing 5-Cl, 5-OCH3, or 1-COCH3 has the most pronounced positive effect on the cytotoxic activity of the arylsulfonylhydrazones designed in the present study.


Introduction
Breast cancer (BC) is the most common cancer among women worldwide, accounting for 25% of all cancers, and is the second leading cause of cancer death in women after lung cancer [1]. In 2020, an estimated 2.3 million new cases of breast cancer were diagnosed globally, and 627,000 women died from the disease [1]. The most recent data from the American Cancer Society estimate that about one in eight (12%) women in the United States will develop invasive breast cancer at some point in their lives [2].
BC is categorized into three major types based on its molecular characteristics: hormone-based BC (estrogen receptor (ER+) or progesterone receptor (PR+)), human epidermal receptor 2-expressing (HER2+) BC, and triple-negative (ER−, PR−, and HER2−) BC (TNBC) [3]. The type of BC determines the therapeutic approach. The treatment of hormone-based BC involves hormone therapy, which works by inhibiting the production or action of hormones that fuel the growth of cancer cells. Some common types of hormone therapy for BC include tamoxifen (a selective estrogen receptor modulator (SERM) that blocks the effects of estrogen on BC cells) [4] and aromatase inhibitors (a class of drugs that block the production of estrogen by inhibiting the enzyme aromatase) [5]. The CDK4/6 inhibitors (which block the activity of the cyclin-dependent kinases 4 and 6, which play a role in the regulation of the cell cycle) [6], HER2 inhibitors (which block the activity of the HER2 protein, which is overexpressed in some types of BC) [7], and luteinizing hormone-releasing hormone (LHRH) agonists (a class of drugs that lower estrogen levels by inhibiting the production of luteinizing hormone, which is needed for the ovaries to produce estrogen) [8] have also found a place in BC therapy. In addition to hormone therapy, other treatment options for hormone-based BC may include chemotherapy, radiation therapy, surgery, and targeted therapies that block the signaling pathways that promote cancer cell growth [9]. More problematic is the treatment of TNBC, which accounts for 10-15% of BC cases and is characterised by limited possibilities for targeted therapy, as TNBC cells do not overexpress estrogen, progesterone, or HER2/neu receptors. The standard treatment for TNBC typically involves a combination of conventional chemotherapy, surgery, and radiation therapy [10].
Molecules 2023, 28, 2058
Despite the advances in BC therapy, the need for new and more effective drugs with fewer side effects remains. Some promising areas of research in this field include targeted therapies with improved cancer selectivity, which aim to act specifically on cancer cells while minimizing harm to normal cells [9], and immunotherapies, which help to boost the body's own immune system to fight the cancer [11].
Recently, two research groups have independently developed novel arylsulfonylhydrazones as anticancer agents against human BC cells. Senkardes et al. have synthesized and tested a series of sulphonyl hydrazones with anticancer activity on human breast adenocarcinoma cell line MCF-7 and prostate cancer cell line PC-3 [12]. The anticancer activities were in the micromolar range and the selectivity index (SI = IC50 on non-cancer cells/IC50 on cancer cells) reached 432 for some of the compounds. Additionally, good cyclooxygenase-2 (COX-2) inhibitory activity has been found in vitro for some of the hydrazones. COX-2 is a proinflammatory enzyme and is overexpressed in solid tumours such as BC and prostate cancer. Gaur et al. have synthesized and tested a series of arylsulfonylhydrazones with indole and morpholine moieties [13]. The compounds have shown anticancer activity on MCF-7 in micromolar concentrations, with a SI up to 60. Furthermore, the compounds have been active on the TNBC cell line MDA-MB-468 with IC50 in the lower micromolar range and with a SI up to 37. In the present study, we analyse the available data for arylsulfonylhydrazones by Quantitative Structure-Activity Relationship (QSAR) modelling. QSAR modelling is a computational technique that has proven to be valuable in the field of anticancer research. QSAR models use mathematical algorithms to analyse and predict the biological activity of chemicals based on their molecular structure. This information can then be used to identify new, promising compounds for further study and development as potential anticancer drugs. Several studies have demonstrated the utility of QSAR in anticancer research by identifying new candidate compounds for specific cancer targets and by facilitating the design of more selective and effective drugs [14][15][16].
In addition, QSAR can provide insights into the molecular mechanisms underlying a compound's activity, which can help guide the optimization of its structure for improved efficacy and safety [17]. Overall, QSAR modelling represents a powerful tool in the discovery and development of novel anticancer drugs.
We utilize the most effective QSAR models derived in the present study to design a set of potential new anticancer agents. These compounds undergo in silico screening for drug likeness, and the most promising ones are subsequently synthesized and evaluated in vitro on breast cancer cell lines (Figure 1).
Figure 1. Flowchart of the present study. QSAR models were derived based on the literature data; the best models were used to design a series of potential new anticancer agents; the compounds were screened in silico for drug likeness and ADME properties; the most prospective ones were synthesized and tested in vitro on BC cell lines.

Quantitative Structure-Activity Relationship (QSAR) Models for Arylsulfonylhydrazones as Breast Anticancer Agents
Two sets of arylsulfonylhydrazones were collected from the literature [12,13] and used as a training set for the derivation of QSAR models. The compounds and their anticancer activities, expressed as ligand efficiency (LE), are given in Table 1. LE measures the ligand activity per non-hydrogen atom and is calculated according to:
$$\mathrm{LE} = \frac{\mathrm{pIC}_{50}}{N}$$
where pIC50 is the negative decimal logarithm of IC50 and N is the number of non-hydrogen atoms in the molecule. The LE values ranged from 0.105 to 0.207 and from 0.110 to 0.170 for the activities on MCF-7 and MDA-MB-468, respectively.
Table 1. Training set used in the study for the derivation of QSAR models. Compounds 3a-o are collected from Senkardes et al. [12], and compounds 5a-k from Gaur et al. [13]. LE stands for Ligand Efficiency. The anticancer activities of the compounds are measured on human breast adenocarcinoma cell line MCF-7 (n = 26) and on the TNBC cell line MDA-MB-468 (n = 11).
The structures were optimized and described by 70 molecular descriptors, as explained in Materials and Methods. The most relevant descriptors were selected by a genetic algorithm using the software tool MDL QSAR v.2.2 (MDL Information Systems Inc., 2004). All possible subset regressions among the selected descriptors were calculated and only models with r² ≥ 0.6 and q² ≥ 0.4 were considered.
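As an illustration of the ligand efficiency defined above, the following minimal sketch computes pIC50 and LE for a single compound. It assumes that IC50 is converted from µM to mol/L before taking the logarithm; the function names and the example compound (IC50 of 10 µM, 25 non-hydrogen atoms) are hypothetical and are not taken from Table 1.

```python
import math

def p_ic50(ic50_micromolar: float) -> float:
    """Negative decimal logarithm of IC50, with IC50 converted to mol/L (assumed unit)."""
    return -math.log10(ic50_micromolar * 1e-6)

def ligand_efficiency(ic50_micromolar: float, heavy_atoms: int) -> float:
    """LE = pIC50 / N, where N is the number of non-hydrogen atoms."""
    return p_ic50(ic50_micromolar) / heavy_atoms

# Hypothetical compound: IC50 = 10 µM and 25 non-hydrogen atoms.
example_le = ligand_efficiency(10.0, 25)
print(f"pIC50 = {p_ic50(10.0):.2f}, LE = {example_le:.3f}")
```

Because LE divides potency by molecular size, two compounds with the same IC50 receive different LE values if they differ in heavy-atom count, which is why the training set is compared on LE rather than on IC50 alone.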

The best model for anticancer activity on cell line MCF-7 is given below: where n = 26; r² = 0.796; SEE = 0.014; q² = 0.647; CVRSS = 0.007; r² random (mean) = 0.155. Here, morph is a user-defined indicator differentiating the two subsets in the training set; SaaaC_acnt accounts for the number of aromatic aaaC-atoms in the molecule; SaaN_acnt corresponds to the number of aromatic aaN-atoms; ka1 is the first-order kappa alpha shape index; r² is the goodness of fit; SEE is the standard error of estimation; q² is the leave-one-out cross-validation coefficient; CVRSS is the cross-validation residual sum of squares; and r² random (mean) is the mean of the r² values calculated for 100 randomizations of the dependent variable among the compounds.
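The validation statistics quoted above (r², leave-one-out q², and the mean r² over randomized responses) can be sketched for an ordinary multiple linear regression as follows. This is not the MDL QSAR implementation: the data are synthetic (26 rows and 4 descriptors, mirroring only the shape of the MCF-7 model), the coefficients are invented, and the q² definition used here (1 − PRESS / total sum of squares) is an assumption about how the software computes it.

```python
import random

def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(X, y):
    """Ordinary least squares with intercept, via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve_linear(ata, aty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def total_ss(y):
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def r_squared(X, y):
    """Goodness of fit on the full training set."""
    coef = fit_mlr(X, y)
    rss = sum((yi - predict(coef, row)) ** 2 for row, yi in zip(X, y))
    return 1.0 - rss / total_ss(y)

def q_squared_loo(X, y):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / total_ss(y)

def mean_r2_randomized(X, y, rounds=100, seed=0):
    """Mean r2 after shuffling the dependent variable among the compounds."""
    rng = random.Random(seed)
    values = []
    for _ in range(rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)
        values.append(r_squared(X, shuffled))
    return sum(values) / rounds

# Synthetic stand-in for a training set: 26 compounds, 4 descriptors.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(26)]
true_coef = [-0.5, 0.8, -0.3, -0.2]  # invented for illustration only
y = [sum(c * v for c, v in zip(true_coef, row)) + rng.gauss(0, 0.3) for row in X]

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
r2_rand = mean_r2_randomized(X, y)
print(f"r2 = {r2:.3f}, q2 = {q2:.3f}, mean r2 (randomized) = {r2_rand:.3f}")
```

A model is usually considered informative when q² stays close to r² and the randomized r² is much lower than the r² of the real model, which is the pattern the reported statistics show.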
The values of the descriptors relevant to the cytotoxic activity on MCF-7 are given in Table S1, Supplementary Material. The indicator morph takes the value 1 for the subset 5a-k and 0 for the subset 3a-o. The negative coefficient for morph means that the substituent 1-(4-morpholinylethyl)-1H-indol-3-yl in 5a-k is not favourable for LE on MCF-7. The descriptor SaaaC_acnt varies from 0 (for 3a-o) to 2 (for most of 5a-k) and 4 (for 5h and 5i, containing fused rings). Its coefficient in the model is positive, i.e., more aaaC-atoms in the molecule correspond to better anticancer activity. The descriptor SaaN_acnt takes the value 1 for 3h and 5i, containing pyrazolyl and quinolyl, respectively. For the rest of the compounds, SaaN_acnt takes the value 0. As its coefficient is negative, the presence of aromatic N-atoms of type aaN is unfavourable for the anticancer activity. The kappa shape indices account for the molecular shape [18]. A higher value for ka1 corresponds to more branched molecules (more paths). In the training set, the values for ka1 vary from 13.666 for 3n to 25.609 for 5h. The average ka1 is 16.701 for the subset 3a-o and 22.421 for the subset 5a-k. The negative coefficient for ka1 favours the less branched molecules.
The QSAR model for cytotoxic activity on cell line MDA-MB-468 was derived only on the compounds from the subset 5a-k. The best model is given below:  Table S1. The number of elements in the molecules 5a-k is five (C, O, N, S, and H); only 5d has an additional F and 5f has an additional Cl. As the coefficient for nelem is positive, obviously, the presence of F and Cl favours the cytotoxic activity. The range of nvx values is from 29 for 5c to 39 for 5h and 5i. The negative coefficient means that the bulky branched substituents are not favourable for the activity on MDA-MB-468.
The structure-activity relationships found in the derived QSAR models are used next in the design of novel arylsulfonylhydrazones with anticancer activity.

Design of Novel Arylsulfonylhydrazones Based on QSAR Models
The requirements obtained from the above QSAR models were implemented in the design of novel arylsulfonylhydrazones as anticancer agents, i.e.:
For Ar2: Aromatic rings containing aaaC and Cl but no aaN.
The structures of the designed molecules are given in Table 2. For Ar1, we selected phenyl or 4-methylphenyl substituents. The N-tosyl hydrazones (p-Me-Ph-SO2-NH-N=Ar(R)) are a special class of hydrazones with proven anticancer activity against TNBC cell lines [19].
For Ar2, we selected indole and phenyl substituents. The indole ring possesses anti-BC activity [20] due to several different signaling pathways [21]. The indole system contains aaaC and no aaN. The N-atoms in the selected indole substituents are aaNH or daaN, with slight NH-acidic (pKa 16.2) properties. Six of the nine new hydrazones contain a mono-substituted indole moiety (compounds 1a-e, 1i). For comparison, we included three compounds with bi-substituted phenyl rings (compounds 1f-h). Two of the compounds contain the favourable Cl atom (compounds 1e and 1h).
The LE values of the designed compounds were predicted by the derived models. All of them are close to or higher than the maximum LE of the compounds from the training set on both cell lines. At this stage of the study, all designed compounds appeared to be prospective anticancer agents.

In Silico Screening of the Designed Compounds for Drug Likeness
Prior to synthesis, the designed structures were screened in silico for drug likeness considering their physicochemical and ADME properties and pharmacokinetic (PK) parameters.

Physicochemical Properties
The main physicochemical properties calculated for the designed arylsulfonylhydrazones are given in Table 3. They are molecular weight, Mw; pKa value; fraction of the ionized molecules, fA; logP; distribution coefficient at pH 7.4, logD7.4; polar surface area, PSA; count of free rotatable bonds, FRB; hydrogen bond donors, HBD; hydrogen bond acceptors, HBA; and count of the violations of Lipinski's Rule of 5, R5. The molecular weights are around 300 g/mol (295-355 g/mol), which is in good agreement with the recommended Mw for lead compounds [22]. The compounds are weak acids with pKa values between 8.59 and 9.09. At pH 7.4, the neutral molecules dominate, as indicated by the negligible fraction of ionized molecules fA and the close values of logP and logD7.4. The logP values are around 3, which is, again, in good agreement with the requirements for lead compounds. PSAs range from 67 to 92 Å², suggesting good oral absorption and inability to cross the blood-brain barrier (BBB) [23]. The number of free rotatable bonds is between 3 and 5; however, the single bonds in the Ar1-S-N-N= fragment are quite rigid due to p-π conjugation. The number of hydrogen bond donors obeys the 'Rule of 3'; however, the number of hydrogen bond acceptors exceeds it. Regarding Lipinski's Rule of 5, all compounds meet the four criteria and there is no violation.
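The rule-based checks mentioned above can be sketched as simple threshold tests. The thresholds are the commonly cited ones (Rule of 5: Mw ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10; Rule of 3 hydrogen-bond limits: HBD ≤ 3, HBA ≤ 3); the class and function names are hypothetical, and the example property values are invented within the ranges reported in the text, not copied from Table 3.

```python
from dataclasses import dataclass

@dataclass
class Properties:
    mw: float    # molecular weight, g/mol
    logp: float  # octanol/water partition coefficient
    hbd: int     # hydrogen bond donors
    hba: int     # hydrogen bond acceptors

def lipinski_violations(p: Properties) -> int:
    """Count violations of the four Rule of 5 criteria."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])

def rule_of_three_hbond(p: Properties) -> dict:
    """Check only the hydrogen-bond limits of the Rule of 3.

    The full rule also limits Mw and logP; those are omitted here because
    the text applies the rule to donors and acceptors only.
    """
    return {"HBD": p.hbd <= 3, "HBA": p.hba <= 3}

# Hypothetical compound with values inside the reported ranges.
example = Properties(mw=320.0, logp=3.1, hbd=2, hba=4)
print("Rule of 5 violations:", lipinski_violations(example))
print("Rule of 3 (H-bond limits):", rule_of_three_hbond(example))
```

For this invented compound the Rule of 5 is satisfied, the donor count passes the Rule of 3 limit, and the acceptor count exceeds it, which matches the qualitative pattern described for the designed series.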

ADME Properties
The ADME properties calculated in the study are given in Table 4. The water solubility was calculated by three methods [24] and the average value in mol/L is presented as logS.
According to the logS scale [24], compounds with logS between −6 and −4 are considered moderately soluble, while those with logS between −4 and −2 are considered soluble. According to the BOILED-Egg diagram [23] (Figure 2), all compounds have good oral permeability, one of them (compound 1h) is able to cross the blood-brain barrier (BBB), and none of the compounds is a substrate of the P-glycoprotein (P-gp) transporter. The parameter oral BA summarizes six criteria which define the suitable physicochemical space for oral bioavailability [23]. These are lipophilicity (logP), size (Mw), polarity (PSA), solubility (logS), insaturation (fraction of Csp3 atoms), and flexibility (number of rotatable bonds). Each criterion has a certain range. The designed compounds violate the insaturation criterion, i.e., the fraction of Csp3 atoms is below the lower limit of 0.25. This violation was expected, as most of the C-atoms in the structures are sp2-hybridized. The BA score indicates the probability of bioavailability being higher than 10% in rats [25]. In our case, the probability is 55%. The CYP inhibition considers the five enzymes that most commonly take part in drug metabolism: 1A2, 2C19, 2C9, 2D6, and 3A4. The studied compounds are able to inhibit between 2 and 4 of the CYPs. Apart from following Lipinski's rule, all compounds demonstrate drug likeness according to the criteria of Ghose [26], Veber [27], Egan [28], and Muegge [29]. The lead likeness is defined by three criteria: Mw in the range 250-350 g/mol, logP up to 3.5, and up to 7 rotatable bonds in the molecule [30]. Here, again, our compounds fit well in the ranges. Finally, the synthetic feasibility of the designed compounds was assessed by the synthetic accessibility score, which ranges from 1 (very easy synthesis) to 10 (very difficult synthesis). A score between 2.56 and 2.80 points to relatively easy synthesis.
Table 4. ADME properties of the designed arylsulfonyl hydrazones: water solubility; GI abs, gastrointestinal absorption; oral BA, oral bioavailability; BA score, bioavailability score; BBB perm, blood-brain barrier permeability; CYP inh, inhibition of CYP enzymes; P-gp substr, substrate of P-gp; drug likeness; lead likeness; synth access, synthetic accessibility.
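Two of the filters described above, the logS solubility classes and the three lead-likeness criteria, can be written as short checks. The sketch below uses only the ranges quoted in the text; how the boundary value −4 is assigned and what is returned outside the two quoted classes are assumptions, and the function names and example values are hypothetical.

```python
def solubility_class(log_s: float) -> str:
    """Classify aqueous solubility from logS (mol/L) using the two ranges quoted in the text."""
    if -6 <= log_s < -4:
        return "moderately soluble"
    if -4 <= log_s <= -2:  # assigning -4 to this class is an assumption
        return "soluble"
    return "outside the quoted ranges"

def is_lead_like(mw: float, logp: float, rotatable_bonds: int) -> bool:
    """Lead likeness: Mw 250-350 g/mol, logP up to 3.5, up to 7 rotatable bonds."""
    return 250 <= mw <= 350 and logp <= 3.5 and rotatable_bonds <= 7

# Hypothetical values chosen for illustration only.
print(solubility_class(-4.5))       # falls in the -6 to -4 range
print(is_lead_like(320.0, 3.1, 4))  # all three criteria met
print(is_lead_like(400.0, 3.1, 4))  # Mw above the upper limit
```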

Pharmacokinetic Parameters
The main pharmacokinetic parameters of the designed arylsulfonylhydrazones, namely the fraction of molecules unbound to plasma proteins, fu; total clearance, CL; steady-state volume of distribution, VDss; and half-life, t1/2, were calculated by QSPkR models previously derived in our lab [31][32][33]. The predicted values are given in Table 5.
The fu values ranged from 0.010 to 0.074, suggesting high plasma protein binding of all compounds (>90%). It is generally accepted that neutral drugs bind with variable affinity to both human serum albumin and alpha-1-acid glycoprotein [34]. Lipoproteins also contribute to plasma protein binding, especially for highly lipophilic drugs [35].
Total CL values ranged between 0.017 and 0.647 L/h/kg. Most of the compounds can be classified as low CL drugs, while 1c and 1d have medium CL. Analysis of a data set of 754 drugs with different ionization states revealed that 78% of anionic and zwitterionic drugs have low CL (<0.24 L/h/kg) and only 1-2% have high CL (>0.96 L/h/kg). For neutral drugs, these percentages were as follows: 45% low CL, 39% moderate CL, and 16% high CL [36]. Considering the relatively high lipophilicity of the compounds and the negligible ionization at pH 7.4, clearance can be considered to be dominated by metabolism. Neutral drugs have a low renal clearance (CLR) unless their logD7.4 is negative. For drugs with logD7.4 > 0, the CLR decreases with lipophilicity due to tubular reabsorption [37].
Values for VDss vary between 0.587 and 0.953 L/kg, which is in the order of total body water volume. It is likely that the compounds are evenly distributed throughout the body without significant accumulation in certain tissues and organs.
Table 5. PK parameters of the designed arylsulfonyl hydrazones: fu, fraction of the compound unbound to plasma proteins; CL, total clearance in L/h/kg; VDss, steady-state volume of distribution in L/kg; t1/2, half-life in h.

Synthesis of the Novel Arylsulfonyl Hydrazones
The designed compounds showed strong drug and lead likeness in in silico screening procedures and we decided to synthesize and test all of them.
The arylsulfonylhydrazones were prepared by a condensation reaction (Scheme 1) between the corresponding aldehydes and benzenesulfonohydrazide or 4-methylbenzenesulfonohydrazide, at a molar ratio of 1:1, in absolute ethanol for 1-3 h, as described elsewhere [38].

Anticancer Activity of the Novel Arylsulfonyl Hydrazones
The anticancer activity of the novel arylsulfonylhydrazones was tested on two BC cell lines: MCF-7 and MDA-MB-231. The cell line MCF-7 originates from human breast adenocarcinoma and expresses estrogen receptor alpha (ER-α) [39], while the cell line MDA-MB-231 represents TNBC adenocarcinoma and lacks ER, PR, and HER2 [40]. To test the cytotoxicity of the compounds on healthy cells, they were incubated with Neuro-2a cells, which are mouse neuroblasts isolated from brain tissue [41,42]. The results from the in vitro tests are summarized in Table 6. The differences (errors) between the experimental and the predicted LE values are given in Table 1. Positive values correspond to underpredicted activity and negative values to overpredicted activity. The errors range between −0.047 and 0.063 for MCF-7 and from −0.027 to 0.081 for MDA-MB-231. Most of the compounds are more active than expected. Only compounds 1i and 1g are less active on MCF-7 and MDA-MB-231, respectively.
The experimental IC50 values of the novel compounds on MCF-7 range from 0.6 µM to 164.9 µM. The LEs are between 0.158 and 0.286, with an average value of 0.230. For comparison, the average LE of the training set on the same cell line is 0.156 (0.171 for the subset 3a-o and 0.135 for the subset 5a-k), with the highest value being 0.207. The selectivity index SI is defined as the ratio of IC50 on healthy cells to IC50 on cancer cells. A compound with a SI higher than 10 is considered selective [43]. The SIs of the novel compounds span from 0.747 to 46 on MCF-7. Four of the nine compounds show IC50 values on MCF-7 below 1 µM. These are compounds 1e, 1a, 1b, and 1c. The most efficient compounds on MCF-7 are 1e and 1a, while the most selective are compounds 1d and 1c.
The most active, efficient, and selective compound on MDA-MB-231 is 1e, with an IC50 of 0.9 µM, LE of 0.275, and SI of 7.222. Compounds 1a, 1b, and 1c have IC50 values in the lower micromolar range with LEs around and above 0.2; however, they have low SIs.
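The selectivity index used above is a simple ratio, and the threshold of 10 turns it into a yes/no flag. In the sketch below, the IC50 of 0.9 µM for 1e on MDA-MB-231 comes from the text, whereas the Neuro-2a IC50 of 6.5 µM is back-calculated from the reported SI of 7.222 and is therefore an assumption, not a value quoted from Table 6. The function names are hypothetical.

```python
def selectivity_index(ic50_healthy_um: float, ic50_cancer_um: float) -> float:
    """SI = IC50 on healthy cells / IC50 on cancer cells (same units for both)."""
    return ic50_healthy_um / ic50_cancer_um

def is_selective(si: float, threshold: float = 10.0) -> bool:
    """A SI above the threshold (10 in the text) marks a selective compound."""
    return si > threshold

# 1e on MDA-MB-231: 0.9 µM from the text; 6.5 µM on Neuro-2a is back-calculated (assumption).
si_1e_mda = selectivity_index(6.5, 0.9)
print(f"SI = {si_1e_mda:.3f}, selective: {is_selective(si_1e_mda)}")
```

With these inputs the ratio reproduces the reported SI of about 7.2, which falls below the selectivity threshold of 10.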

Discussion
Based on data from the literature, QSAR models were obtained in the present study to reveal the relationship between the structures of arylsulfonylhydrazones and their anticancer activity against BC. It was found that, for the activity against ER+ BC, measured on the MCF-7 cell line, less-branched aromatic substituents with more aaaC-atoms, Cl, and no aaN-atoms performed better. Less-branched aromatic moieties bearing F and Cl are required for activity against TNBC, as measured on the MDA-MB-468 cell line. These findings were implemented in the design of nine arylsulfonyl hydrazones. The structures contain mono- and/or bi-substituted phenyl and indolyl moieties. Cl atoms were included in two of them. The anticancer activities on both cell lines, expressed as LE, were predicted by the derived QSAR models. All compounds demonstrated LEs higher than or close to the maximal LEs of the compounds from the training set. Prior to synthesis, the structures were screened in silico for drug likeness by calculating their physicochemical and ADME properties and main PK parameters, such as the fraction of molecules unbound to plasma proteins, fu; total clearance, CL; steady-state volume of distribution, VDss; and half-life, t1/2. In terms of drug likeness, all nine of the designed compounds were suitable as leads. They were synthesized and tested. The in vitro tests confirmed the predicted activities. What is more, seven and eight of the compounds are more active on MCF-7 and MDA-MB-231, respectively, than predicted. Most of the designed compounds are more active on MCF-7 than on MDA-MB-231. The IC50 values for 1e, 1a, 1b, and 1c on MCF-7 are below 1 µM. On MDA-MB-231, only compound 1e shows activity below 1 µM.
The most active and most efficient compound on both cell lines is 1e, with a SI of 13 for MCF-7 and 7 for MDA-MB-231. It contains a phenyl ring as the Ar1 substituent and 5-chloroindole as the Ar2 substituent. Further, 1e obeys drug and lead likeness rules, has high GI absorption, and has no BBB permeability. In terms of PK behavior, 1e is predicted to be extensively bound to plasma proteins (only 1% free fraction), with a total clearance of 5 L/h and a VDss of 54 L for a 70-kg patient, as well as a half-life of 7 h.
The next-most active and efficient arylsulfonylhydrazone on both cell lines is 1a, with a SI of about 9 for MCF-7 and only 1.7 for MDA-MB-231. Further, 1a bears phenyl as Ar1 and 5-methoxyindole as Ar2. This compound is predicted to be a good drug candidate and lead compound in terms of physicochemical and ADME properties, with extensive plasma-protein binding, a total clearance of 13.5 L/h, a VDss of 41 L, and a half-life of 2 h.
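The whole-body clearance, volume of distribution, and half-life quoted for 1e and 1a can be related through the standard one-compartment expression t1/2 = ln 2 × VDss / CL. The text states only that the parameters were predicted by QSPkR models, so using this relation to link them is an assumption; the sketch below simply shows that the quoted values are mutually consistent under it. The function name is hypothetical.

```python
import math

def half_life_h(cl_l_per_h: float, vdss_l: float) -> float:
    """One-compartment estimate: t1/2 = ln(2) * VDss / CL (assumed relation)."""
    return math.log(2) * vdss_l / cl_l_per_h

# Values for a 70-kg patient as quoted in the text.
t_half_1e = half_life_h(cl_l_per_h=5.0, vdss_l=54.0)
t_half_1a = half_life_h(cl_l_per_h=13.5, vdss_l=41.0)
print(f"1e: {t_half_1e:.1f} h, 1a: {t_half_1a:.1f} h")
```

Under this assumption the estimates come out close to the half-lives of about 7 h and 2 h given for 1e and 1a.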
Next in activity and efficiency on MCF-7 line are compounds 1b, 1c, 1d, and 1f. Compounds 1c and 1d demonstrate the highest selectivity of 40 and 46, respectively, followed by 1f with a SI of 18. Compounds 1g, 1h, and 1i are less active, efficient, and selective.
For MDA-MB-231, compounds 1b and 1c show activities in the low micromolar range and efficiencies around 0.2; however, they show poor selectivities (up to 2). The remaining compounds are less active and non-selective.
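The selectivity values quoted above can be illustrated with a minimal sketch. It assumes that SI is defined in the usual way, as the ratio of the IC50 on the reference cell line (here Neuro-2a) to the IC50 on the cancer cell line; this definition is an assumption, since it is not restated in this section. The function name and all IC50 values below are hypothetical placeholders, not measured data.

```python
def selectivity_index(ic50_reference_um, ic50_cancer_um):
    """Return SI = IC50(reference cells) / IC50(cancer cells).

    Assumed definition: higher SI means the compound is more toxic to the
    cancer line than to the reference line. Both IC50 values are in µM.
    """
    if ic50_reference_um <= 0 or ic50_cancer_um <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_reference_um / ic50_cancer_um


# Hypothetical IC50 values in µM, chosen only to illustrate the arithmetic.
ic50_neuro2a = 6.0
ic50_cancer = {"MCF-7": 0.5, "MDA-MB-231": 2.0}

si = {line: selectivity_index(ic50_neuro2a, value)
      for line, value in ic50_cancer.items()}

for line, value in si.items():
    print(f"{line}: SI = {value:.1f}")
```

Under this definition, a compound with sub-micromolar activity on MCF-7 but only moderate activity on the reference line still reaches a double-digit SI, which is the pattern described for the most selective compounds.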
The analysis of substituents shows that the indole ring has the most pronounced positive effect on the cytotoxic activity of the arylsulfonylhydrazones designed in the present study. The substitution of indole by phenyl dramatically reduces the activity on both cell lines (from 10-fold to more than 300-fold on MCF-7 and from 70-fold to complete loss of activity on MDA-MB-231). Among the substituents on the indole ring, 5-Cl, 5-OCH3, and 1-COCH3 increase the activity between 92- and 330-fold on MCF-7 compared with the 1-CH3 substituent. The effects of these substituents on the activity on MDA-MB-231 are moderate. The Cl atom deserves special attention. Attached to an indole moiety, it increases activity 330-fold on MCF-7 and 70-fold on MDA-MB-231 compared to when it is attached to the phenyl ring.
In conclusion, the QSAR-guided strategy for the design of novel arylsulfonylhydrazones with anticancer activity, applied in the present study, generated several prospective leads with IC50 values below 1 µM and SI values up to 46. The newly designed compounds were more active than the compounds from the training set and represent a starting point for further lead optimization.

Materials and Reagents
The reagents for the synthesis were of analytical or chemically pure grade and were obtained from Sigma-Aldrich (Steinheim, Germany). The solvents used were of analytical grade. The structures of the new molecules were proven by 1H-NMR, 13C-NMR, and HRMS spectral data. Their purity was determined by TLC characteristics and melting points.
The in vitro antineoplastic activity of the newly synthesized compounds was evaluated against human BC cell lines of different molecular types: the triple-negative MDA-MB-231 cell line and the ER/PR/HER2 positive variant MCF-7, as well as against mouse neuroblast cells, Neuro-2a. All cell lines were purchased from the German Collection of Microorganisms and Cell Cultures (DSMZ GmbH, Braunschweig, Germany) and cultivated according to the supplier's instructions. Cells were cultured in an RPMI 1640 growth medium supplemented with 10% fetal bovine serum (FBS) and 5% L-glutamine, and incubated under standard conditions of 37 °C in a humidified atmosphere with 5% CO2.

QSAR Protocol
The training set for the development of QSAR models consisted of 26 compounds. Fifteen compounds were derivatives of 4-methylphenyl hydrazone [12]. The remaining 11 compounds were morpholinylethylindolyl derivatives [13]. The anticancer activities of both subsets were measured in vitro by MTT tests on the MCF-7 cell line. The second set was tested on the MDA-MB-468 cell line as well. The chemical structures were modeled and optimized with the MM+ force field, using the steepest descent algorithm and an RMS gradient of 0.1 kcal/(Å·mol), in HyperChem 7.52 (Hypercube Inc., Gainesville, FL, USA, 2005).
The chemical structures were described by 70 descriptors divided into eight groups: atom-type E-state indices, atom-type E-state counts, hydrogen E-state categories, internal H-bonds E-state indices, kappa shape indices, molecular properties (logP, molecular weight, number of elements, number of rings, number of hydrogen-bond donors and acceptors, etc.), 3D descriptors (dipole, polarizability, surface, volume, etc.), and a user-defined descriptor (morph). The descriptor morph accounts for the presence of an indole-morpholine fragment in the molecule: if an indole-morpholine fragment is present, morph takes the value 1; otherwise it takes 0. The relevant descriptors were selected by a genetic algorithm (GA) with the following settings: size of initial population 32, tournament selection, uniform crossover, one-point mutation, and Friedman's lack-of-fit scoring function with parameter 2. All possible subset regressions among the selected descriptors were calculated, and only models with r2 (goodness of fit) ≥ 0.6 and q2 (leave-one-out cross-validation coefficient) ≥ 0.4 were considered. To check the validity of the selected descriptor set, 100 randomizations of the dependent variable among the compounds were carried out and r2random values were calculated for each regression. If the mean value of r2random was lower than r2, the selected descriptor set was considered valid. QSAR models were derived with MDL QSAR v.2.2 (MDL Information Systems Inc., 2004).
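The acceptance criteria described above (r2 ≥ 0.6, q2 ≥ 0.4, and a mean r2random below r2 over 100 randomizations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MDL QSAR implementation: ordinary least squares with an intercept is assumed as the regression method, q2 is computed from the leave-one-out PRESS relative to the total sum of squares, the function names are hypothetical, and the descriptor matrix is synthetic placeholder data with the same number of compounds (26) as the training set.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Fit ordinary least squares with an intercept and predict X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def r2(X, y):
    """Goodness of fit on the full data set."""
    resid = y - fit_predict(X, y, X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out cross-validated coefficient, 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop compound i, refit, predict it
        pred = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def mean_r2_random(X, y, n_rounds=100, seed=0):
    """Mean r2 after shuffling the dependent variable among the compounds."""
    rng = np.random.default_rng(seed)
    return float(np.mean([r2(X, rng.permutation(y)) for _ in range(n_rounds)]))


# Synthetic placeholder data: 26 "compounds", 3 "descriptors" (not real data).
rng = np.random.default_rng(42)
X = rng.normal(size=(26, 3))
y = X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.2, size=26)

r2_fit = r2(X, y)
q2 = q2_loo(X, y)
r2_rand = mean_r2_random(X, y)

# Acceptance rule as stated in the text.
accepted = bool(r2_fit >= 0.6 and q2 >= 0.4 and r2_rand < r2_fit)
print(f"r2 = {r2_fit:.2f}, q2 = {q2:.2f}, "
      f"mean r2random = {r2_rand:.2f}, accepted = {accepted}")
```

For least-squares models q2 can never exceed r2, so a large gap between the two is a simple warning sign of overfitting, and a mean r2random close to r2 would indicate a chance correlation.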

In Silico Screening for Drug Likeness
The physicochemical properties of the designed compounds were calculated with the ACD/LogD tool v. 9.08 (ACD/Labs, Toronto, Canada). The ADME properties were calculated with the SwissADME tool [20]. The PK parameters were calculated by previously derived QSPkR models [31][32][33]. As the fraction of the ionized molecules of most of the designed arylsulfonylhydrazones was below 3%, the predictions were based on the QSPkR models derived for neutral molecules. Separate QSPkR models have been derived for the fraction of neutral molecules unbound to plasma proteins, fu; the unbound clearance of neutral drugs, CLu; and the steady-state volume of distribution of basic and neutral drugs, VDss. The datasets consisted of 117 neutral molecules or 407 basic and neutral drugs, respectively, extracted from Obach's database, the largest and best-curated source of data for the key pharmacokinetic parameters after iv administration [44]. The chemical structures of the compounds have been encoded by 113 to 138 molecular descriptors calculated with the ACD/LogD tool v. 9.08 and MDL QSAR version 2.2. A genetic algorithm and step-wise multiple linear regression have been applied for variable selection and model derivation. The QSPkRs have been evaluated by internal and external validation procedures.
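As a consistency check on the predicted PK parameters, the half-life can be related to the total clearance and the steady-state volume of distribution. The sketch below assumes the standard approximation t1/2 = ln 2 × VDss / CL; whether the half-lives reported here were obtained in exactly this way is an assumption. The function name is hypothetical, and the input values are the CL and VDss predictions quoted in the Conclusions for compounds 1e and 1a.

```python
import math


def half_life_h(vdss_l, cl_l_per_h):
    """Approximate half-life in hours from VDss (L) and total CL (L/h).

    Assumes t1/2 = ln(2) * VDss / CL.
    """
    if vdss_l <= 0 or cl_l_per_h <= 0:
        raise ValueError("VDss and CL must be positive")
    return math.log(2) * vdss_l / cl_l_per_h


# (VDss in L, CL in L/h) as quoted in the Conclusions for a 70-kg patient.
predicted = {"1e": (54.0, 5.0), "1a": (41.0, 13.5)}

for name, (vdss, cl) in predicted.items():
    print(f"{name}: t1/2 is about {half_life_h(vdss, cl):.1f} h")
```

With these inputs the approximation gives roughly 7.5 h for 1e and 2.1 h for 1a, in line with the reported half-lives of 7 h and 2 h.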

General Information
The nuclear magnetic resonance (NMR) experiments were carried out on a Bruker Avance spectrometer at 600 MHz at 20 °C in deuterated dimethyl sulfoxide (DMSO-d6) as a solvent, and tetramethylsilane (TMS) as an internal standard. The precise assignment of the 1H and 13C NMR spectra was accomplished by measurement of two-dimensional (2D) homonuclear correlation (correlation spectroscopy (COSY)), DEPT-135, and 2D inverse detected heteronuclear (C-H) correlations (heteronuclear single-quantum correlation spectroscopy (HMQC) and heteronuclear multiple bond correlation spectroscopy (HMBC)). Mass spectra were measured on a Q Exactive Plus mass spectrometer (ThermoFisher Scientific) equipped with a heated electrospray ionization (HESI-II) probe (Thermo Scientific, Bremen, Germany). The melting points were determined using a Buchi 535 apparatus and a melting point meter M5000 apparatus. IUPAC nomenclature was used for naming the newly synthesized compounds.

General Procedure for the Synthesis of the Compounds 1a-i
A solution of 20 mmol of the corresponding carbonyl compound in 10 mL of absolute ethanol was mixed with a hot (60 °C) solution of 20 mmol of benzenesulfonohydrazide or 4-methylbenzenesulfonohydrazide in 10 mL of absolute ethanol and stirred for 1-3 h. Upon cooling, the obtained crystalline precipitates were filtered, washed with ethanol-ether, recrystallized from ethanol, and dried. The new compounds were colorless, white, or light-yellow crystalline solids, stable under normal conditions, soluble in methanol, acetonitrile, and DMSO, and poorly soluble in water and ethanol.