Application of Ligand- and Structure-Based Prediction Models for the Design of Alkylhydrazide-Based HDAC3 Inhibitors as Novel Anti-Cancer Compounds

Histone deacetylases (HDAC) represent promising epigenetic targets for several diseases including different cancer types. The HDAC inhibitors approved to date are pan-HDAC inhibitors and most show a poor selectivity profile, side effects, and in particular hydroxamic-acid-based inhibitors lack good pharmacokinetic profiles. Therefore, the development of isoform-selective non-hydroxamic acid HDAC inhibitors is a highly regarded field in medicinal chemistry. In this study, we analyzed different ligand-based and structure-based drug design techniques to predict the binding mode and inhibitory activity of recently developed alkylhydrazide HDAC inhibitors. Alkylhydrazides have recently attracted more attention as they have shown promising effects in various cancer cell lines. In this work, pharmacophore models and atom-based quantitative structure–activity relationship (QSAR) models were generated and evaluated. The binding mode of the studied compounds was determined using molecular docking as well as molecular dynamics simulations and compared with known crystal structures. Calculated free energies of binding were also considered to generate QSAR models. The created models show a good explanation of in vitro data and were used to develop novel HDAC3 inhibitors.


Introduction
Epigenetics refers to reversible alterations in the gene expressions that do not modify the DNA sequence [1]. Post-translational modifications such as methylation, acetylation, and others introduce changes on the N-terminal tails of histones [2]. Histone acetylation and deacetylation are controlled by different classes of enzymes, namely histone acetyltransferases (HAT) and histone deacetylases (HDAC) [3,4]. Thus, chemical modification is reversible [5,6].

Diversity Analysis of the Datasets
A dataset containing 63 compounds with an N-monosubstituted hydrazide scaffold was collected from the literature [47,49,51]. The 2D structures and IC 50 values of all alkylhydrazides are shown in Table S1 (Supplement). The selected compounds cover a reasonable biological activity range (Table 1). All compounds were measured in vitro using recombinant human HDAC3 and the peptidic substrate (Boc-Lys(acetyl)-AMC). The fluorescence intensity was measured at excitation and emission wavelengths of 360 and 460 nm, respectively [47,49,51]. We first grouped the compounds into three activity classes according to their HDAC3 inhibitory data ( Table 1, Table S1 Supplement): 1.
Inactive inhibitors showing pIC 50 < 5.30 The compounds were randomly divided into a training set (70%; 39 compounds) and a test set (30%; 17 compounds) using the "RAND" function in the MOE program (MOE-Molecular database calculator-RAND) [64]. The compounds either having no exact IC 50 values or showing an IC 50 value higher than 5 µM were classified as inactive ( Table 1). The same training and test sets were used for the ligand-and structure-based model development studies. The training set was used to generate the models, while the independent test set was utilized to evaluate the predictive accuracy of the selected best models. The inactive set was only utilized for the validation of the pharmacophore models by calculating the inactive-survival score (detail in Section 2.2).
The applicability domains of the training and external test sets were analyzed by plotting the three most important principal components (PCA1, PCA2, and PCA3) [60,65] of the calculated descriptors (PEOE_VSA_HYD, GCUT_SLOGP_0, TPSA, b_single, lip_acc, lip_don, and vsa_hyd-explained in Table 2). The most important PCA of the molecular descriptors can explain about 100% of the original space. The PCA analysis showed that the training set and external test set were homogeneously distributed in the chemical space ( Figure 3). Table 2. List of selected molecular descriptors for the PCA analysis.

PEOE_VSA_HYD
The partial equalization of orbital electronegativity (PEOE). Total hydrophobic van der Waals surface area GCUT_SLOGP_0 The GCUT descriptors using atomic contribution to logP TPSA Polar surface area b_single Number of single bonds (including implicit hydrogens). Aromatic bonds are not considered to be single bonds lip_acc The number of O and N atoms lip_don The number of OH and NH atoms vsa_hyd Approximation of the sum of VDW surface areas of hydrophobic atoms. Approximation of the sum of VDW surface areas of hydrophobic atoms. The training set is colored black; the test set is colored orange.

Analysis of the Pharmacophore Model
An important step in establishing a 3D-QSAR model is the development of the correct alignment, usually based on a generated pharmacophore model. In the current work, the pharmacophore model was generated using the Phase module implemented in Schrödinger considering 30 active compounds (pIC50 > 7) [66]. Then, seven inactive compounds were used to analyze the ability of the generated models to discriminate between the active and inactive compounds.
The pharmacophore model shows the 3D (three-dimensional) structural features which might be essential for the biological activity [67,68]. Hence, the pharmacophore features that are common to the 30 active compounds showing a pIC50 more than 7 were investigated. In total, 38 pharmacophore hypotheses were generated and scored according The training set is colored black; the test set is colored orange.

Analysis of the Pharmacophore Model
An important step in establishing a 3D-QSAR model is the development of the correct alignment, usually based on a generated pharmacophore model. In the current work, the pharmacophore model was generated using the Phase module implemented in Schrödinger considering 30 active compounds (pIC 50 > 7) [66]. Then, seven inactive compounds were used to analyze the ability of the generated models to discriminate between the active and inactive compounds.
The pharmacophore model shows the 3D (three-dimensional) structural features which might be essential for the biological activity [67,68]. Hence, the pharmacophore features that are common to the 30 active compounds showing a pIC 50 more than 7 were investigated. In total, 38 pharmacophore hypotheses were generated and scored according to the survival score. The survival score was generated by evaluating how well the selected pharmacophore hypothesis fits to the most active inhibitors. Additionally, the Phase module penalizes the generated pharmacophore hypothesis that cannot discriminate the actives from inactives. Thus, the developed hypotheses were mapped onto the inactive compounds and scored to yield the inactive-survival score. Pharmacophore hypotheses which showed a Pharmaceuticals 2023, 16, 968 6 of 24 better inactive score than survival score was discarded since it cannot discriminate between active and inactive compounds. For the selected pharmacophore hypothesis, all inactive compounds should show low fitness to the pharmacophore hypothesis.
After scoring the generated pharmacophore hypotheses, the best-scored pharmacophore model consisting of seven pharmacophore features (ADDDHRR- Figure 4A) was selected. The survival score (6.923) and the inactive score (1.688) of the hypothesis are shown in Table 3. The pharmacophore features were specified as the hydrogen-bond acceptor (A), the hydrogen bond donor (D), the hydrophobic (H), the negative ionic (N), the positive ionic (P), and the aromatic ring (R). It is worth noting that the less feature-based pharmacophores show weak discrimination between actives and inactives. Most inactive compounds showed a high fitness to the established pharmacophore features which led to an increase in the inactive score as shown for the DDDHRR and DDHR hypotheses (Table 3). discriminate between active and inactive compounds. For the selected pharmacophore hypothesis, all inactive compounds should show low fitness to the pharmacophore hypothesis.
After scoring the generated pharmacophore hypotheses, the best-scored pharmacophore model consisting of seven pharmacophore features (ADDDHRR- Figure  4A) was selected. The survival score (6.923) and the inactive score (1.688) of the hypothesis are shown in Table 3. The pharmacophore features were specified as the hydrogen-bond acceptor (A), the hydrogen bond donor (D), the hydrophobic (H), the negative ionic (N), the positive ionic (P), and the aromatic ring (R). It is worth noting that the less featurebased pharmacophores show weak discrimination between actives and inactives. Most inactive compounds showed a high fitness to the established pharmacophore features which led to an increase in the inactive score as shown for the DDDHRR and DDHR hypotheses (Table 3).  The generated pharmacophore model (ADDDHRR) was mapped onto the most active compound 1. This pharmacophore model shows the importance of the hydrogen bond donor and acceptor functions of the hydrazide moiety ( Figure 4B). The carbonyl oxygen of the hydrazide serves as a hydrogen bond acceptor, while the two nitrogen atoms serve as hydrogen bond donor groups. The alkyl chain shows hydrophobic features while the two aromatic ring systems are assigned as aromatic features. The amide moiety  The generated pharmacophore model (ADDDHRR) was mapped onto the most active compound 1. This pharmacophore model shows the importance of the hydrogen bond donor and acceptor functions of the hydrazide moiety ( Figure 4B). The carbonyl oxygen of the hydrazide serves as a hydrogen bond acceptor, while the two nitrogen atoms serve as hydrogen bond donor groups. The alkyl chain shows hydrophobic features while the two aromatic ring systems are assigned as aromatic features. The amide moiety between the linker acts as a hydrogen bond donor via the amide-NH (details shown in the docking part). Accordingly, the selected ADDDHRR pharmacophore model shows the important structural features which can interact with the HDAC3 pocket.
In conclusion, the common pharmacophore features were determined using the active compounds in this step. The established pharmacophore model shows the required features for the binding to HDAC3. Since there is no reported X-ray structure of HDAC3 with an alkylhydrazide, the pharmacophore model gives an insight into the possible binding mode of alkylhydrazide derivatives.

Analysis of the Atom-Based 3D-QSAR Model
The atom-based 3D-QSAR model was built in Schrödinger19 using the 39 compounds in the training set [66][67][68]. Atom-based QSAR treats molecules as a set of overlapping van der Waals spheres. The spheres are divided into six categories: hydrogen bond donors; hydrophobic/non-polar; negative ionic; positive ionic; electron withdrawing; and miscellaneous [67,68]. The 3D-QSAR model enables us to consider all relevant structural features such as steric clashes as well as pharmacophores which play a role in the HDAC3 activity of the compounds. In this step, the previously selected seven-featured pharmacophore model (ADDDHRR) was used as an alignment rule for the generation of an atom-based QSAR model. First, 39 compounds were aligned to the pharmacophore model and then the atom-based 3D-QSAR models were generated and cross-validated. The best atombased 3D-QSAR model was obtained with a good correlation coefficient (R 2 : 0.95) and cross-validated correlation coefficient (Q 2 : 0.88) ( Table 4). The atom-based 3D-QSAR techniques visualize the compounds as 3D (three-dimensional) based on the non-covalent protein-ligand interactions such as the hydrogen bond acceptor and donor, hydrophobic, and positive and negative ionic interactions. The model marks the favorable structural features with green cubes and unfavorable structural features with red cubes. To understand the most favorable and less favorable interactions, we analyzed all compounds from the training set. As examples, three compounds with low activity (compounds 35, 36, and 38) and three compounds with good activity (compounds 1, 2, and 3) from the training set were chosen to analyze the atom-based QSAR model ( Figure S1, Supplement). According to the atom-based QSAR model, compound 35 exhibited poor activity due to its heptyl alkyl chain (Figure S1A, Supplement). As shown in Figure S1B (Supplement), the meta-substituent on the phenyl linker, as exemplified with compound 36, showed an unfavorable effect on the HDAC3 activity. In the case of compound 38, the thiophene ring showed unfavorable structural features, decreasing the HDAC3 activity ( Figure S1C, Supplement). On the other hand, the propyl alkyl chain attached to the hydrazide group is favored for three active compounds ( Figure S1D-F, Supplement). In addition to that, para-substituted cap groups are also favored and covered by green cubes. According to the model visualization, the least active compounds ( Figure S1A-C, Supplement) are mainly covered by red cubes, while the more active compounds, especially the cap groups ( Figure S1D-F, Supplement), are mostly covered by green cubes.
The external validation was performed using a test set which was not used for model generation, with the aim of evaluating the predictive accuracy and reliability of the generated atom-based QSAR model. The scatter plot of the training and test set is shown in Figure 5. The prediction results of the training, test, and inactive databases are shown in Table S2.
Analysis of the test set revealed that the atom-based QSAR model predicts the active inhibitors well, with differences less than 1 log unit. However, several of the moderately active inhibitors (compounds 51, 53, 55, and 56) in the external test set as well as the seven inactive compounds were predicted, with differences of more than 1 log unit ( Table 5,  Table S2 Supplement). The atom-based QSAR model classified most of the moderately active and inactive inhibitors into the active class. Due to the limited accuracy of the atombased models in correctly predicting the inactives/weakly actives, we tried to overcome this by generating structure-based prediction models. For this, we applied the docking and binding free energy calculation techniques discussed in the next section. generated atom-based QSAR model. The scatter plot of the training and test set is shown in Figure 5. The prediction results of the training, test, and inactive databases are shown in Table S2. Analysis of the test set revealed that the atom-based QSAR model predicts the active inhibitors well, with differences less than 1 log unit. However, several of the moderately active inhibitors (compounds 51, 53, 55, and 56) in the external test set as well as the seven inactive compounds were predicted, with differences of more than 1 log unit ( Table 5,  Table S2 Supplement). The atom-based QSAR model classified most of the moderately active and inactive inhibitors into the active class. Due to the limited accuracy of the atombased models in correctly predicting the inactives/weakly actives, we tried to overcome this by generating structure-based prediction models. For this, we applied the docking and binding free energy calculation techniques discussed in the next section.

Analyzing the Binding Mode of Alkylhydrazides in HDAC3
We started with docking all inhibitors to HDAC3 (PDB ID: 4A69 [69]) ( Figure S2, Supplement). We used a docking set-up in Glide which we previously validated for HDAC inhibitors from different chemical series [31,34,53]. To date, there is no crystal structure of an HDAC in complex with an alkylhydrazide, but we have recently shown that alkylhydrazides similar to inhibitors 1 and 47 (Table S1, Supplement) [47,51] are reversible and substrate competitive inhibitors of HDACs [53]. Thus, we docked the alkylhydrazides into the catalytic pocket of HDAC3 and analyzed whether they are able to chelate the catalytic zinc ion. The analysis of the docking results of the active inhibitors, as exemplified by compounds 1 and 2 from the training set ( Figure 6), showed that the hydrazide moiety chelates the zinc ion in a bidentate manner through its nitrogen and carbonyl oxygen and exhibits conserved hydrogen bond interactions with H134, H135, and Y298 at the bottom of the catalytic pocket. The aromatic linker group was placed into the hydrophobic tunnel consisting of F144, H172, F200, and L266, where it undergoes aromatic pi-pi interactions with F144 and F200. The cap group interacts with residues at the surface by forming hydrogen bond interactions with D93 as well hydrophobic interactions with H22 and P23 in HDAC3. A structural difference which influences the potency and selectivity on HDAC3 is observed in the foot pocket region. According to the docking results, the propyl and butyl chains of the alkylhydrazides fit well into the foot pocket of HDAC3. However, replacing the propyl or butyl chains by pentyl or longer side chains resulted in a dramatic decrease in HDAC3 activity due to the steric hindrance observed in HDAC3. The Y107 residue pushes L133, resulting in a narrower foot pocket region [31,34]. Hence, the pentyl and longer alkyl chains in the foot pocket region show steric clashing with M24 and L133, causing a decrease in or loss of HDAC3 activity.
that alkylhydrazides similar to inhibitors 1 and 47 (Table S1, Supplement) [47,51] are reversible and substrate competitive inhibitors of HDACs [53]. Thus, we docked the alkylhydrazides into the catalytic pocket of HDAC3 and analyzed whether they are able to chelate the catalytic zinc ion. The analysis of the docking results of the active inhibitors, as exemplified by compounds 1 and 2 from the training set ( Figure 6), showed that the hydrazide moiety chelates the zinc ion in a bidentate manner through its nitrogen and carbonyl oxygen and exhibits conserved hydrogen bond interactions with H134, H135, and Y298 at the bottom of the catalytic pocket. The aromatic linker group was placed into the hydrophobic tunnel consisting of F144, H172, F200, and L266, where it undergoes aromatic pi-pi interactions with F144 and F200. The cap group interacts with residues at the surface by forming hydrogen bond interactions with D93 as well hydrophobic interactions with H22 and P23 in HDAC3. A structural difference which influences the potency and selectivity on HDAC3 is observed in the foot pocket region. According to the docking results, the propyl and butyl chains of the alkylhydrazides fit well into the foot pocket of HDAC3. However, replacing the propyl or butyl chains by pentyl or longer side chains resulted in a dramatic decrease in HDAC3 activity due to the steric hindrance observed in HDAC3. The Y107 residue pushes L133, resulting in a narrower foot pocket region [31,34]. Hence, the pentyl and longer alkyl chains in the foot pocket region show steric clashing with M24 and L133, causing a decrease in or loss of HDAC3 activity.  Although the docking poses show reasonable binding modes in the HDAC3 active site, the correlation between the docking scores and pIC 50 values was poor (R 2 = 0.28 for HDAC3). Thus, we rescored the docking poses by calculating the binding free energies.
In addition to the docking results, we checked the stability of the predicted interactions of the potent inhibitors 1 and 2 with the binding site using 100 ns MD simulation (Figures 7, 8, S3 and S4, Supplement). MD simulation analysis of compounds 1 and 2 revealed that the n-propyl chain attached to the hydrazide fit into the foot pocket consisting of M24 and L133. Notably, M24 and L133 play a key role as a gate keeper in the foot pocket region of HDAC3. M24 and L133 closed the foot pocket and made the volume narrower where only a propyl or butyl side chain favorably fit. This conformational change of M24 and L133 might explain the decrease in or loss of HDAC3 activity of the compounds with longer alkyl chains than butyl and propyl. The zinc-binding group which is the common part of compounds 1 and 2 preserves its bidentate coordination and undergoes hydrogen bond interactions with H134, H135, and Y298 throughout the MD simulation. Furthermore, the linker groups of compounds 1 and 2 remain sandwiched between F144 and F200. Besides these similar protein-ligand interactions of compounds 1 and 2, the MD simulation analysis displayed some differences in the cap region of compounds 1 and 2.
predicted interactions were lost, namely the hydrogen bond interaction between the amide group and D93 as well as the interaction between the hydrazide-carbonyl-O and Y298, due to the flexibility of the latter residue ( Figure 7). The hydrogen bond distances of the HDAC3-inhibitor 1 complex throughout the 100 ns MD simulations were analyzed and plotted in Figure S4 (Supplement). No significant fluctuation was observed for the benzofuran cap group of compound 1, which remains embedded in a hydrophobic pocket and undergoes aromatic interaction with H22 at the surface of the protein. According to the MD simulation of compound 1, the selected docking pose was stable during the 100 ns MD simulation (Figures 7 and S3, Supplement). Throughout the MD simulation, the ligand maintained the predicted binding conformation, albeit two of the predicted interactions were lost, namely the hydrogen bond interaction between the amide group and D93 as well as the interaction between the hydrazide-carbonyl-O and Y298, due to the flexibility of the latter residue (Figure 7). The hydrogen bond distances of the HDAC3-inhibitor 1 complex throughout the 100 ns MD simulations were analyzed and plotted in Figure S4 (Supplement). No significant fluctuation was observed for the benzofuran cap group of compound 1, which remains embedded in a hydrophobic pocket and undergoes aromatic interaction with H22 at the surface of the protein.
In the case of compound 2 (Figure 8), the flexible 2-methylindole cap group showed conformational changes. Hence, the ligand RMSD of compound 2 showed higher fluctuations ( Figure S5, Supplement). During the MD simulation, the 2-methylindole group showed two different orientations: between 40-60 ns of the MD simulation, the cap group adopts an orientation where it undergoes edge-to-face interaction with F144. For the rest of the simulation time, the cap group showed the initially observed position and interacted with H22. In contrast to compound 1, Y298 showed less fluctuation and maintained its interaction with compound 2. Throughout the 100 ns MD simulation, compound 2 showed stable binding and maintained its bidentate chelation with the zinc ion. The hydrogen bond distances of the HDAC3-inhibitor 2 complex throughout the 100 ns MD simulations were analyzed and plotted in Figure S6 (Supplement).
In conclusion, since so far no X-ray structure has been released of an HDAC with an alkylhydrazide inhibitor complex, we docked the compounds to HDAC3 to examine the putative binding mode and to rationalize the observed SAR. Additionally, the observed proteinligand interactions were analyzed by MD simulations. The interaction at the catalytic pocket was found to be highly stable whereas some fluctuation was observed for the flexible capping groups that are located at the solvent-exposed part of the binding pocket. To provide further support for the predicted binding poses of the alkyl hydrazides, we previously examined the substrate competition of two alkyl hydrazides for the related class I member HDAC8 and confirmed that they reversibly inhibit and exhibit competitive substrate binding. However, cocrystal structures of HDACs with alkylhydrazide-based inhibitors have to be obtained to confirm the modeling results. Relevant residues are shown in stick representation with salmon carbon atoms in HDAC3. The ligand is shown in stick representation with green carbon atoms. The zinc ion is shown as a cyancolored sphere.
In the case of compound 2 (Figure 8), the flexible 2-methylindole cap group showed conformational changes. Hence, the ligand RMSD of compound 2 showed higher fluctuations ( Figure S5, Supplement). During the MD simulation, the 2-methylindole group showed two different orientations: between 40-60 ns of the MD simulation, the cap group adopts an orientation where it undergoes edge-to-face interaction with F144. For the rest of the simulation time, the cap group showed the initially observed position and interacted with H22. In contrast to compound 1, Y298 showed less fluctuation and maintained its interaction with compound 2. Throughout the 100 ns MD simulation, compound 2 showed stable binding and maintained its bidentate chelation with the zinc ion. The hydrogen bond distances of the HDAC3-inhibitor 2 complex throughout the 100 ns MD simulations were analyzed and plotted in Figure S6 (Supplement). In conclusion, since so far no X-ray structure has been released of an HDAC with an alkylhydrazide inhibitor complex, we docked the compounds to HDAC3 to examine the putative binding mode and to rationalize the observed SAR. Additionally, the observed proteinligand interactions were analyzed by MD simulations. The interaction at the

Binding Free Energy Calculation
Due to the low correlation between the docking scores and pIC 50 values, rescoring of the selected docking poses was performed using the MM/GBSA method in AMBER16 [70]. The total energies of HDAC3-inhibitor complexes were calculated using four different parameter settings (solvation models) and six different frame settings (see the Methods part for details). The same training set including the 39 compounds that was used for the atom-based 3D-QSAR model was also used for model generation based on the calculated binding free energies of the compounds. In total, 24 models were generated. The models were assessed based on the correlation coefficients (R 2 ) between the biological data and the calculated energy values, taking into account Tropsha's criteria for reliable QSAR mod-els [71] (Figure 9). The prediction results of the training, test set, and inactive compounds are shown in Table S3 (Supplement). parameter settings (solvation models) and six different frame settings (see the Methods part for details). The same training set including the 39 compounds that was used for the atom-based 3D-QSAR model was also used for model generation based on the calculated binding free energies of the compounds. In total, 24 models were generated. The models were assessed based on the correlation coefficients (R 2 ) between the biological data and the calculated energy values, taking into account Tropsha's criteria for reliable QSAR models [71] (Figure 9). The prediction results of the training, test set, and inactive compounds are shown in Table S3 (Supplement). According to the Tropsha criteria [71], a good QSAR model should abide by the following rules; R 2 > 0.6 and Q 2 > 0.5. Based on the mentioned rule, models showing a calculated R 2 value > 0.6 were considered for further statistical analysis (MODEL1, 7, 13, and 19). Interestingly, the four selected models are all based on the protein-ligand complex obtained with one minimization step (Emin1, Table 6). Further internal validation of the selected models was analyzed using the leave-one-out (LOO) method, 3fold-cross-validation (cv), and 10-fold cv. According to the Tropsha criteria [71], a good QSAR model should abide by the following rules; R 2 > 0.6 and Q 2 > 0.5. Based on the mentioned rule, models showing a calculated R 2 value > 0.6 were considered for further statistical analysis (MODEL1, 7, 13, and 19). Interestingly, the four selected models are all based on the protein-ligand complex obtained with one minimization step (Emin1, Table 6). Further internal validation of the selected models was analyzed using the leave-one-out (LOO) method, 3-fold-crossvalidation (cv), and 10-fold cv. Abbreviations: R 2 (correlation coefficient), RMSE (root mean square error), Q 2 LOO (leave one-out cross-validation), QMSE (crossed-root mean square error), and Emin1 (single frame after the first energy minimization step).
The four selected models showed R 2 > 0.6 and Q 2 > 0.5. Among the selected four models, the GB8 (GBNeck) implicit solvation model outperformed the other methods (GB HCT refers to GB1, and GB OBC refers to GB2-5 in the article). The reason might be that the GBNeck model (referred to GB8) was generated to correct the van der Waals surface that is inaccessible to water [72]. This improvement in GB8 might help to obtain better results for the compounds used in this article. Model 19 based on the GB8 implicit solvation model showed the highest R 2 and Q 2 values (LOO-method) with 0.81 and 0.78, respectively, and the lowest RMSE and QMSE values with 0.49 and 0.52, respectively ( Figure 10 and Table 6). In addition, we tested whether the inclusion of a 2D descriptor for the shape/electronic properties of the inhibitors could improve the models. Two-dimensional descriptors were computed for all compounds in MOE [64]. All available 2D descriptors were then assessed for their ability to improve the model. The total hydrophobic van der Waals surface area (PEOE_VSA_HYD) gave the best improvement. The combination of this 2D descriptor and energy term improved the R 2 from 0.81 to 0.87 and the Q 2 (LOO) from 0.78 to 0.84. In addition, this model (MODEL19_1) exhibited lower RMSE and QMSE values compared to MODEL19 (Table 6). solvation model showed the highest R 2 and Q 2 values (LOO-method) with 0.81 and 0.78, respectively, and the lowest RMSE and QMSE values with 0.49 and 0.52, respectively ( Figure 10 and Table 6). In addition, we tested whether the inclusion of a 2D descriptor for the shape/electronic properties of the inhibitors could improve the models. Twodimensional descriptors were computed for all compounds in MOE [64]. All available 2D descriptors were then assessed for their ability to improve the model. The total hydrophobic van der Waals surface area (PEOE_VSA_HYD) gave the best improvement. The combination of this 2D descriptor and energy term improved the R 2 from 0.81 to 0.87 and the Q 2 (LOO) from 0.78 to 0.84. In addition, this model (MODEL19_1) exhibited lower RMSE and QMSE values compared to MODEL19 (Table 6). The test set was used to evaluate the accuracy of the best generated model (Model19_1). In the test set (Table 7), all compounds were predicted with less than 1 log unit difference. Additionally, the prediction of the inactive compounds was more satisfying compared to the previously described atom-based QSAR models, with a difference of less than 1 log unit except for compound 61 (Table S3, Supplement). For compound 61, the docking poses could not explain the incorrect prediction. The scatter The test set was used to evaluate the accuracy of the best generated model (Model19_1). In the test set (Table 7), all compounds were predicted with less than 1 log unit difference. Additionally, the prediction of the inactive compounds was more satisfying compared to the previously described atom-based QSAR models, with a difference of less than 1 log unit except for compound 61 (Table S3, Supplement). For compound 61, the docking poses could not explain the incorrect prediction. The scatter plot and prediction results are shown in Figure 10 and Tables 7 and S3 (Supplement), respectively.

Evaluation of the Generated Models on Newly Designed Compounds
The created models, the atom-based 3D QSAR model and Model 19_1, and the generated docking poses were then used to predict alkyl hydrazides with other linkers and capping groups that were synthesized (chemistry and in vitro testing were published elsewhere [53]). These compounds were designed starting from compound 47 (Table S1, Supplement), where several structural modifications were introduced to extend the SAR on this series of HDAC3 inhibitors. In the first series of compounds, the effect of a different length for the alkyl side chain was evaluated. In the second series of compounds, different substitutions on position 3 or 4 of the phenyl ring were introduced. In the next series of compounds, the aminopyrimidine linker group with different N-arylmethyl, N-arylethyl, or N-ethylpiperazinyl moieties were tested, while in the last series of compounds, piperazinyl-piperidine linker groups were attached to the phenylalkylhydrazide core of the compounds. The general structures of the new inhibitors are summarized in Figure 11. The experimentally determined HDAC3 IC 50 values and the prediction results are shown in Tables 8, S5 and S6 (Supplement, all 2D structures are summarized in Table S4, Supplement). Analyzing the atom-based prediction results of the 26 new compounds revealed interesting results (Table 8). Based on the atom-based QSAR model prediction results, the absolute difference between the experimental and predicted pIC50 of the nine compounds, which are moderately active or inactive, was more than >1 log unit, i.e., these compounds were predicted to be more active than experimentally determined. The atom-based QSAR model makes predictions based on the effect of electron-withdrawing groups, electrondonating groups, and hydrophobic groups of compounds considering the created pharmacophore hypothesis. However, in the case of HDAC3, the shape of the foot pocket plays an important role in the inhibitor activity. Due to the smaller volume of the HDAC3 foot pocket, compounds with alkyl groups longer than butyl are moderately active or inactive as determined by the in vitro results. Atom-based QSAR models do not take the pocket volume into account, hence resulting in the observed weak prediction of these derivatives. In conclusion, the atom-based QSAR model showed a weak discriminatory power.
Then we evaluated the binding-free-energy-based prediction results (Table 8). Initially, all 26 compounds were docked to HDAC3 using the same protocol as for the training set. Similar docking poses in HDAC3 were obtained for all 26 new compounds as obtained for compounds 1 and 2 (exemplified in Figure 12). The docking studies showed that the hydrazide moiety as well as the aromatic linker groups of all 26 compounds exhibited similar interactions as observed for compounds 1 and 2. A bidentate chelation between the zinc ion and hydrazide moiety was observed for all compounds. In addition, the hydrazide moiety showed hydrogen bonds with H134, H135, and Y298 in the zincbinding region of the HDAC3 catalytic pocket. The aromatic linker groups were accommodated into the hydrophobic tunnel and interacted with F144 and F200, showing First, the ligand-based models; i.e., the pharmacophore models and 3D_atom-based QSAR models, were tested. The developed pharmacophore model was used to align the 26 new compounds to apply the atom-based QSAR model.
Analyzing the atom-based prediction results of the 26 new compounds revealed interesting results (Table 8). Based on the atom-based QSAR model prediction results, the absolute difference between the experimental and predicted pIC 50 of the nine compounds, which are moderately active or inactive, was more than >1 log unit, i.e., these compounds were predicted to be more active than experimentally determined. The atombased QSAR model makes predictions based on the effect of electron-withdrawing groups, electron-donating groups, and hydrophobic groups of compounds considering the created pharmacophore hypothesis. However, in the case of HDAC3, the shape of the foot pocket plays an important role in the inhibitor activity. Due to the smaller volume of the HDAC3 foot pocket, compounds with alkyl groups longer than butyl are moderately active or inactive as determined by the in vitro results. Atom-based QSAR models do not take the pocket volume into account, hence resulting in the observed weak prediction of these derivatives. In conclusion, the atom-based QSAR model showed a weak discriminatory power. Then we evaluated the binding-free-energy-based prediction results (Table 8). Initially, all 26 compounds were docked to HDAC3 using the same protocol as for the training set. Similar docking poses in HDAC3 were obtained for all 26 new compounds as obtained for compounds 1 and 2 (exemplified in Figure 12). The docking studies showed that the hydrazide moiety as well as the aromatic linker groups of all 26 compounds exhibited similar interactions as observed for compounds 1 and 2. A bidentate chelation between the zinc ion and hydrazide moiety was observed for all compounds. In addition, the hydrazide moiety showed hydrogen bonds with H134, H135, and Y298 in the zinc-binding region of the HDAC3 catalytic pocket. The aromatic linker groups were accommodated into the hydrophobic tunnel and interacted with F144 and F200, showing pi-pi interactions. Meanwhile, the cap groups and foot-pocket-targeting groups showed significant differences which has an impact on the HDAC3 activity.
In the first series, exemplified by compound 64 (Figure 12), the acetoamidomethyl cap group was placed at the entrance of the pocket and showed hydrogen bond interactions with D93. The different length of the hydrazide alkyl side chain resulted in a significant difference in HDAC3 activity. Compound 64 possessing a propyl side chain was predicted to be more active than 65, 66, and 67, which is in line with the experimentally determined data. The difference between the experimental and predicted values is indeed less than <1 log unit for this series. Moreover, compound 67 with a hexyl side chain was predicted to be inactive. This result confirms that the BFE model is sensible to the side chain effects on this dataset. In the first series, exemplified by compound 64 (Figure 12), the acetoamidomethyl cap group was placed at the entrance of the pocket and showed hydrogen bond interactions with D93. The different length of the hydrazide alkyl side chain resulted in a significant difference in HDAC3 activity. Compound 64 possessing a propyl side chain was predicted to be more active than 65, 66, and 67, which is in line with the experimentally determined data. The difference between the experimental and predicted values is indeed less than <1 log unit for this series. Moreover, compound 67 with a hexyl side chain was predicted to be inactive. This result confirms that the BFE model is sensible to the side chain effects on this dataset.
In the second series, only compound 68 ( Figure 12) bearing the acetamidomethyl cap group and propyl side chain in the foot-pocket-targeting region showed moderate activity on HDAC3. The other compounds 69, 70, and 71 with a hexyl side chain, did not show significant activity on HDAC3. Similar to the experimentally determined activity, the BFE model also predicted compound 68 as a moderate inhibitor, while 69, 70, and 71 were predicted as inactive compounds. In the second series, only compound 68 ( Figure 12) bearing the acetamidomethyl cap group and propyl side chain in the foot-pocket-targeting region showed moderate activity on HDAC3. The other compounds 69, 70, and 71 with a hexyl side chain, did not show significant activity on HDAC3. Similar to the experimentally determined activity, the BFE model also predicted compound 68 as a moderate inhibitor, while 69, 70, and 71 were predicted as inactive compounds.
In the third series of compounds bearing an aminopyrimidine moiety as a linker group, different N-arylmethyl, N-arylethyl, or N-methylpiperaziniyl moieties as capping groups were tested. This series is exemplified by compound 72 (Figure 12). All compounds bearing a propyl side chain except compound 74 were predicted with less than <1 log unit compared to the experimentally determined activities. Interestingly, compound 74 bearing an N-arylethyl cap group did not show significant inhibitory activity (41%@1µM) on HDAC3; however, it was predicted by the model as an active inhibitor. The flexible cap group could be the reason for the reduced activity on HDAC3. Finally, compounds 81-84 possessing a hexyl side chain were predicted as inactive inhibitors which is in line with the experimental findings.
The last series of compounds bearing indole or N-methylindole groups attached via a methyl or ethyl linker to the piperazinyl-pyrimidine scaffold were predicted close to the experimental activities. The difference between the experimental and predicted values is less than <1 log unit. The docking poses of this series are exemplified by compound 85 (Figure 12).
In conclusion we applied the validated BFE models to the further development of the alkylhydrazide-based class I HDAC inhibitors. The best inhibitors from this series were also tested for their immunmodulatory effects in Jurkat cells and showed promising cellular effects [53]. As we have recently demonstrated a potent T cell memory response by combined class I HDAC inhibition and immune-checkpoint blockade in hepatocellular carcinoma (HCC) therapy, the new alkylhydrazides represent an interesting class of inhibitors to explore their potential for cancer therapy.

Ligand Database Preparation
A ligand dataset of 63 compounds with hydrazide as the zinc-binding group (ZBG) was collected from the literature [47,49,51]. Only compounds having an N-monosubstituted hydrazide scaffold were considered. The IC 50 values of the selected compounds were retrieved from three publications and they all were determined against HDAC3 using the same fluorogenic substrate (Boc-Lys(acetyl)-AMC (amino methyl coumarin)). The same human recombinant HDAC3 enzyme was used for the in vitro studies [47,49,51]. The compounds were prepared in ligprep tool using the OPLS3e forcefield in Schrödinger suite [73]. Subsequently, the output of the ligprep step was submitted the Confgen to generate 64 conformers per ligand while minimizing the output conformers using the OPLS3e forcefield [73,74]. The compounds were automatically divided into a training (70%) and external test set (30%) using the "RAND" function in the MOE program (MOE-Molecular database calculator-RAND) [64]. The same training set and external test set were used for the model development studies. The compounds with no exact IC 50 values were considered as inactive. The QSAR models were built using the most active and moderate inhibitors for which exact IC 50 values were available.
The diversity analysis of the compounds was performed by analyzing the three most important principal components using the principal component analysis (PCA) implemented in MOE [60,64,65]. The 2D descriptors were computed in MOE [64]. Several 2D descriptors were selected using the Contingency tool in MOE. The three most important principal components (PCA1, PCA2, and PCA3) were calculated using the selected 2D descriptors. These principal components were used to check the diversity of the compounds.
The 26 compounds were collected from the article published by our group to evaluate the established models and check their reliability in different datasets [53]. The ligands were prepared using the same protocol as used for the validation set.

Pharmacophore Model
The pharmacophore model was established using 30 inhibitors with IC 50 values lower than 100 nM in the training set and the 7 inactive compounds in the Phase module of Schrödinger [66]. The compounds were prepared in ligprep using the OPLS3e forcefield in the previous step [73,74]. The conformational search was performed in the Phase module by adjusting 64 conformers per compound and minimizing the output conformers using the "Develop Pharmacophore model" module in Schrödinger [66]. The common pharmacophore hypotheses were developed, scored, and ranked. The selected pharmacophore model was used as an alignment rule for the atom-based 3D-QSAR model.

Atom-Based 3D-QSAR Model
The ligand-based 3D-QSAR model was generated using the training dataset in the Phase module of Schrödinger [66]. The 39 compounds in the training database were aligned using the selected pharmacophore hypothesis from the previous step. The QSAR models were built with four latent factors and 1.0 Å grid spacing as well as the leave-one-outcross-validation approach. The generated models were evaluated by means of standard deviation of the regression (SD), R 2 (correlation coefficient of regression), RMSE (root mean square error of test set prediction), and Q 2 (cross-training of test set prediction).

Docking Study
The hydroxamic acid scaffold and hydrazide scaffold are structurally similar groups. Therefore, the X-ray crystal structures of HDAC2 (PDB ID: 4LXZ [35]) and HDAC3 (PDB ID: 4A69 [69]) were retrieved from the Protein Data Bank (PDB, rcsb.org [(accessed on 20 May 2022) [75]) and analyzed in MOE [64]. SAHA with a hydroxamic acid scaffold in complex with the HDAC2 protein (PDB ID: 4LXZ) was defined as a pan-HDAC inhibitor and showed activity on HDAC3 [35]. First, HDAC2 (PDB ID: 4LXZ) and HDAC3 (PDB ID: 4A69) were superposed in MOE [64]. Then, SAHA was transferred from the HDAC2 protein (PDB ID: 4LXZ) to HDAC3 to mimic the induced fit effect of the zinc-binding group.
The HDAC3-SAHA complex was prepared in the protein preparation wizard of Schrödinger's suite by adding hydrogen bonds and missing side chains and assigning the bond orders [73]. The water molecules (except W2083) and ions (except Zn +2 ions) were deleted. The protonation states and tautomers were optimized at pH 7.4 using the PROPKA tool. The optimized complex was minimized using the OPLS3e force field to remove the steric clashes [74].
Molecular docking studies were carried out by applying the standard precision (SP) mode in Glide implemented in Schrödinger Suite [73]. The grid box including the information on the active site coordinates of the proteins was defined with a 10 Å radius around the ligand. Ten docking poses were employed for further post-docking minimization. The other settings were kept as the default. The docking results were visually analyzed in the MOE program [64].

Molecular Dynamics Simulation
The selected docking poses of compounds 1 and 2 in complex with HDAC3 (PDB ID: 4A69) were subjected to a 100 ns MD simulation in AMBER16 [70]. The Antechamber package was used to prepare the topologies, force field parameters, atom types, and bond types by applying the semi-empirical Austin Model1 with bond charge correction (AM1-BCC) [76,77]. Then, the tLEaP module was employed to prepare the protein-ligand complexes. General amber force field (GAFF), the Duan force field (ff03.r1), and 12-6-4LJ ionic model were used for the ligand, protein, and zinc, respectively [78][79][80][81]. The system was solvated by the TIP3P water model and a margin of 10 Å. Two minimization steps including the two sub-steps in each minimization were carried out. In the first step, 4000 iterations (2000 cycles of steepest descent and then 2000 of the conjugate gradient) were performed, while the protein residues, ligand, and zinc ion were restrained to their initial geometries (force constant of 10 kcal*mol −1 * Å −2 ) to relieve the bad contacts. In the second step, 4000 iterations (2000 cycles of steepest descent and then 2000 of the conjugate gradient) were performed to remove the steric clashes in the entire complex. The restraint on the protein, ligand, and zinc were removed during the second minimization. Then, the system was heated at 300 K through 100 ps of MD. The protein-ligand complex was restrained to prevent large structural deviations (force constant of 10 kcal*mol −1 * Å −2 ). The SHAKE algorithm was activated to constrain bonds involving hydrogens [82]. Finally, the system was equilibrated within a period of 200 ps. Langevin dynamics was applied to keep the temperature at 300 K with a collision frequency of 2 ps [83]. The pressure was kept at 1 bar using isotropic position scaling with a relaxation time of 2 ps. Afterwards, a 100 ns MD simulation was run with a time step of 2 fs using the same conditions as in the equilibration step. A non-bonded cut-off distance of 10 Å was used. The electrostatic interactions were calculated by applying the particle mesh Ewald (PME) method. After the MD simulation, CPPTRAJ module of AMBER was used to analyze the MD snapshots.

Conclusions
In the current study, we have evaluated several QSAR models including ligandbased and structure-based techniques to understand the structure-activity relationship of alkylhydrazides developed as HDAC3 inhibitors. Additionally, the binding modes of the two most potent HDAC3 inhibitors (compounds 1 and 2) were verified through 100 ns MD simulation since there is no X-ray structure crystallized with an alkylhydrazide derivative. With the aid of ligand-based and structure-based approaches, in-house computational models have been developed for the prediction of the HDAC3 inhibitory activity of the alkylhydrazide scaffold.
The ligand-based models enabled us to obtain a general overview of the binding of the compounds in the HDAC3 protein. The established pharmacophore model and atom-based 3D-QSAR model can be used to filter the big databases. Since the shape of the foot pocket of HDAC3 has a crucial impact on the HDAC3 inhibitory activity, the predictive power of the ligand-based models was not satisfactory. These models predicted moderate inhibitors and inactive compounds as active compounds, although they predicted the actives as actives. Thus, these methods should be used to reduce the number of compounds in the big database.
The weakness of the ligand-based methods directed us to generate the structurebased methods. The binding mode of the alkylhydrazide was predicted by docking. The selected binding modes from the validation set (compounds 1 and 2) were verified by the 100 ns MD simulation. The analysis of the MD simulation revealed that M24 and L133 are gatekeepers in the foot pocket of HDAC3. Additionally, the alkylhydrazides kept their bidentate chelation to the zinc ion during the 100 ns MD simulation. The compounds were rescored by means of BFE calculations. The binding free energies were correlated with the experimentally derived inhibitory activities. The established binding-free-energy-based QSAR model predicted all of the compounds in the test set with less than 1 log difference. Additionally, we tested the established model on a new dataset containing 26 molecules which were designed, synthesized, and tested taking knowledge of the developed BFE models [53]. The structure-based model was able to predict these novel compounds with less than 1 log unit error and showed its value for chemical optimization. For the different test sets, the structure-based model showed better accuracy than the ligand-based models. The combination of structure-based and ligand-based models resulted in predictive QSAR models in the current study. These provide useful tools for the further design and optimization of alkylhydrazide derivatives as HDAC inhibitors.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/ph16070968/s1. Figure S1: Visualization of the atom-based 3D-QSAR model in the training set; Figure S2: Docking poses of the training set, the external test set, and the inactive set; Figure S3: 100 ns MD results for the HDAC3-1 complex; Figure S4: 100 ns MD results for the HDAC3-2 complex; Figure S5: MD analysis results of HDAC3-1 complex. Figure S6: MD analysis results of HDAC3-2 complex. Table S1: Two-dimensional chemical structures and biological data of compounds in the training set, external test set, and inactive set; Table S2: Prediction values of the best atom-based QSAR model generated for the training set, test set, and inactive set in HDAC3; Table S3: Prediction values of the best BFE model generated for the training set, test set, and inactive set in HDAC3; Table S4: Twodimensional chemical structures and biological data of the test set; Table S5: Prediction values of the best atom-based QSAR model generated for the test set in HDAC3; Table S6: Prediction values of the best BFE model generated for the test set in HDAC3.