A Thoroughly Validated Virtual Screening Strategy for Discovery of Novel HDAC3 Inhibitors

Histone deacetylase 3 (HDAC3) has been recently identified as a potential target for the treatment of cancer and other diseases, such as chronic inflammation, neurodegenerative diseases, and diabetes. Virtual screening (VS) is currently a routine technique for hit identification, but its success depends on rational development of VS strategies. To facilitate this process, we applied our previously released benchmarking dataset, i.e., MUBD-HDAC3 to the evaluation of structure-based VS (SBVS) and ligand-based VS (LBVS) combinatorial approaches. We have identified FRED (Chemgauss4) docking against a structural model of HDAC3, i.e., SAHA-3 generated by a computationally inexpensive “flexible docking”, as the best SBVS approach and a common feature pharmacophore model, i.e., Hypo1 generated by Catalyst/HipHop as the optimal model for LBVS. We then developed a pipeline that was composed of Hypo1, FRED (Chemgauss4), and SAHA-3 sequentially, and demonstrated that it was superior to other combinations in terms of ligand enrichment. In summary, we present the first highly-validated, rationally-designed VS strategy specific to HDAC3 inhibitor discovery. The constructed pipeline is publicly accessible for the scientific community to identify novel HDAC3 inhibitors in a time-efficient and cost-effective way.

In the whole family of HDACs, HDAC3 has gained more and more attention in recent years and showing its potential as a target for modern drug discovery. HDAC3 maintains the primarily In the whole family of HDACs, HDAC3 has gained more and more attention in recent years and showing its potential as a target for modern drug discovery. HDAC3 maintains the primarily deacetylated state of nuclear factor κB (NF-κB), which is a positive regulator of inflammatory gene expression [7]. Due to this, HDAC3 plays an important role in inflammatory responses. Moreover, the inhibition of HDAC3 brings about anti-inflammatory effects and is, thus, beneficial for the treatment of many chronic inflammatory diseases, such as airway inflammation and inflammatory bowel diseases [6,8].
In recent years, the relationship between HDAC3 and learning/memory has been gradually uncovered. Wood's group identified HDAC3 as a negative regulator of long-term memory formation, and demonstrated HDAC3 inhibition was able to improve cognitive impairments [9]. They also discovered HDAC3 negatively regulated cocaine-context-associated memory formation [10], and the HDAC3-selective inhibitor RGFP966 (cf. Figure 1) was able to persistently enhance extinction of cocaine-seeking behavior [11]. Therefore, HDAC3 inhibitors (HDAC3Is) are potentially applicable to the treatment of neurodegenerative diseases and the control of drug addiction. In addition, growing evidence suggests a link between HDACs and the pathophysiology of diabetes. All HDACs, except for class III (SIRT1-SIRT7), are expressed in the pancreatic β-cells [12]. The inhibition of HDACs potentially prevents diabetes or reverts it to the normal state through a variety of mechanisms. For instance, it promotes β-cell development, proliferation, differentiation, and also prevents β-cell apoptosis [13][14][15][16][17]. In addition, it improves oxidative metabolism, ameliorates the state of insulin resistance, as well as regulates liver gluconeogenesis [18]. However, cytotoxicity has been observed, as well, in the use of HDAC inhibitors, e.g., SAHA [19,20]. Up to now, great efforts have been undertaken to discriminate the individual HDAC(s) for anti-diabetic benefits from others that cause side effects. Most importantly, Wagner's group demonstrated it was sufficient to suppress β-cell apoptosis by genetic knockdown of Hdac3 alone or by using the HDAC1/2/3-selective inhibitor, MS-275 (cf. Figure 1) [19]. They further developed a set of pharmacological tools that had different inhibitory profiles for HDACs, e.g., HDAC3-selective inhibitor, BRD3308 (cf. Figure 1) and HDAC1/2-selective inhibitor, BRD2492. Comparative analysis of their effects on β-cell survival and megakaryocyte formation (a surrogate measure of bone marrow toxicity) has identified HDAC3 instead of HDAC1/2 as a potential therapeutic target for β-cell protection [20]. Encouragingly, BRD3308 has also been shown to improve glycaemia and insulin secretion in vivo [21]. Currently, many ligands are able to inhibit HDAC3. However, most of them belong to anti-cancer pan-HDAC inhibitors and rarely show specific inhibition for HDAC3. To the best of our knowledge, RGFP966 and BRD3308 are the only HDAC3-specific inhibitors that show therapeutic effects on diseases other than cancers. Therefore, it remains an emerging area to discover novel HDAC3Is for the treatment of those diseases. We have been working on identification of novel-scaffold HDAC inhibitors by using computer-aided drug design (CADD) and cheminformatics [22,23]. Most recently, we developed an automated tool, i.e., MUBD-DecoyMaker for building benchmarking sets able to unbiasedly evaluate ligand enrichment of both ligand-based VS (LBVS) and structure-based VS (SBVS) approaches [24,25]. With that tool, we constructed maximal-unbiased benchmarking datasets (MUBD) for HDACs (including Sirtuins), i.e., MUBD-HDACs and released them in order to facilitate HDAC inhibitors discovery [5]. Up to now, the application of MUBD-HDACs has effectively assisted Huang et al., to identify a novel and In addition, growing evidence suggests a link between HDACs and the pathophysiology of diabetes. All HDACs, except for class III (SIRT1-SIRT7), are expressed in the pancreatic β-cells [12]. The inhibition of HDACs potentially prevents diabetes or reverts it to the normal state through a variety of mechanisms. For instance, it promotes β-cell development, proliferation, differentiation, and also prevents β-cell apoptosis [13][14][15][16][17]. In addition, it improves oxidative metabolism, ameliorates the state of insulin resistance, as well as regulates liver gluconeogenesis [18]. However, cytotoxicity has been observed, as well, in the use of HDAC inhibitors, e.g., SAHA [19,20]. Up to now, great efforts have been undertaken to discriminate the individual HDAC(s) for anti-diabetic benefits from others that cause side effects. Most importantly, Wagner's group demonstrated it was sufficient to suppress β-cell apoptosis by genetic knockdown of Hdac3 alone or by using the HDAC1/2/3-selective inhibitor, MS-275 (cf. Figure 1) [19]. They further developed a set of pharmacological tools that had different inhibitory profiles for HDACs, e.g., HDAC3-selective inhibitor, BRD3308 (cf. Figure 1) and HDAC1/2-selective inhibitor, BRD2492. Comparative analysis of their effects on β-cell survival and megakaryocyte formation (a surrogate measure of bone marrow toxicity) has identified HDAC3 instead of HDAC1/2 as a potential therapeutic target for β-cell protection [20]. Encouragingly, BRD3308 has also been shown to improve glycaemia and insulin secretion in vivo [21]. Currently, many ligands are able to inhibit HDAC3. However, most of them belong to anti-cancer pan-HDAC inhibitors and rarely show specific inhibition for HDAC3. To the best of our knowledge, RGFP966 and BRD3308 are the only HDAC3-specific inhibitors that show therapeutic effects on diseases other than cancers. Therefore, it remains an emerging area to discover novel HDAC3Is for the treatment of those diseases. We have been working on identification of novel-scaffold HDAC inhibitors by using computer-aided drug design (CADD) and cheminformatics [22,23]. Most recently, we developed an automated tool, i.e., MUBD-DecoyMaker for building benchmarking sets able to unbiasedly evaluate ligand enrichment of both ligand-based VS (LBVS) and structure-based VS (SBVS) approaches [24,25]. With that tool, we constructed maximal-unbiased benchmarking datasets (MUBD) for HDACs (including Sirtuins), i.e., MUBD-HDACs and released them in order to facilitate HDAC inhibitors discovery [5]. Up to now, the application of MUBD-HDACs has effectively assisted Huang et al., to identify a novel and potent HDAC inhibitor that showed anti-cancer activity [26]. In the extant paper, we present a versatile pipeline that is able to effectively enrich HDAC3-targeted active compounds from large-scale chemical libraries. To develop that pipeline, we use one dataset of MUBD-HDACs, i.e., MUBD-HDAC3, to exhaustively evaluate a variety of SBVS and LBVS approaches, including docking programs, scoring functions, and ligand-induced-fit protein models, as well as multiple pharmacophore/shape-based models. The constructed pipeline will be helpful for the scientific community to identify novel HDAC3Is in a time-efficient and cost-effective way.  Table 1 shows ligand enrichments of three docking programs, i.e., LibDock, GOLD, and FRED. GOLD (Chemscore) was the weakest docking program in terms of both early recognition and overall enrichment. Its values of receiver operating characteristic (ROC) enrichment at 0.5% (i.e., ROCE 0.5%), ROCE 1%, and ROC AUC (area under the curve) were 0, 0, and 0.63, respectively. LibDock (LibScore) ranked in second place. Though its value of ROC AUC was slightly higher than that of FRED (Chemgauss4), its ROCE 0.5% and ROCE 1% values were much lower, i.e., 10.89 vs. 30.90 and 5.42 vs. 25.66. Based on this outcome, FRED (Chemgauss4) was the optimal docking program to enrich for active ligands.

Results and Discussions
We explored the potentials of 10 scoring functions implemented in DS 2016 (Discovery Studio version 2016, San Diego, CA, USA) to improve ligand enrichment. As FRED performed the best among all programs, all docking poses from it became the input for the "score ligand poses" module in DS. Unfortunately, this rescoring approach did not further improve ligand enrichment of FRED (Chemgauss4), as all ROCE 0.5%, ROCE 1%, and ROC AUC values of those explored scoring functions were lower than those of FRED (Chemgauss4) (cf. Table 1). In addition to docking programs and scoring functions, we hypothesized that ligand-induced-fit protein models may enrich HDAC3Is better that the apo structure (i.e., a structure without a ligand). To test this hypothesis, we generated models by ligand-induced-fit docking of five HDAC3Is (cf. Figure 2) into the catalytic site. As a result, flexible docking generated five protein-ligand complexes for 15k, 26 for 8d, 34 for R306465, 40 for SAHA, and 32 for MS-275. By visual inspection, we excluded the complexes that apparently did not contain a coordination bond between zinc ion and the aminoanilide or hydroxamic acid groups. The numbers of protein-ligand complexes for 15k, 8d, R306465, SAHA, and MS-275 were reduced as a result to 5, 12, 8, 12, and 5, respectively. In addition to docking programs and scoring functions, we hypothesized that ligand-induced-fit protein models may enrich HDAC3Is better that the apo structure (i.e., a structure without a ligand). To test this hypothesis, we generated models by ligand-induced-fit docking of five HDAC3Is (cf. Figure 2) into the catalytic site. As a result, flexible docking generated five protein-ligand complexes for 15k, 26 for 8d, 34 for R306465, 40 for SAHA, and 32 for MS-275. By visual inspection, we excluded the complexes that apparently did not contain a coordination bond between zinc ion and the aminoanilide or hydroxamic acid groups. The numbers of protein-ligand complexes for 15k, 8d, R306465, SAHA, and MS-275 were reduced as a result to 5, 12, 8, 12, and 5, respectively. The total number of ligand-induced-fit protein models was 42. As FRED (Chemgauss4) was validated as the best docking/scoring program, it was used to dock and score all actives and decoys in MUBD-HDAC3 against every ligand-induced-fit model. Ligand enrichments of these models along with the use of FRED (Chemgauss4) are shown in Figure 3 and Table 2. Generally, ROCE 0.5% and ROCE 1% values of most protein models were higher than the apo structure, while ROC AUC values remained equivalent to that of the apo structure (cf. Figure 3). To be specific, the average ROCE 0.5% and ROCE 1% values of all models were 56.55 and 31.83, both of which were greater than those values of the apo crystal structure, i.e., 30.90 and 25.66. Meanwhile, the average of ROC AUC was equal to that of apo structure, i.e., 0.72 (cf. Table 2). Overall, these data indicate the induced-fit effect from flexible docking was able to improve early recognition and maintained overall enrichment of the apo structure.
We further analyzed the ligand-induced-fit protein models from the same ligand and selected the best model, i.e., model of highest enrichment for each ligand. As shown in Table 2, the best models induced by those ligands are 15k-5, 8d-7, R306465-7, SAHA-3, and MS-275-4. Among them, both 8d-7 and SAHA-3 presented the highest early enrichment (77.25 for ROCE 0.5% and 41.05 for ROCE 1%), but SAHA-3 showed slightly better performance in term of overall enrichment (0.79 vs. 0.77). Apart from ligand enrichment, we also inspected the binding mode between each ligand and its induced-fit protein structure (cf. Figure 4a-e). Though differences existed in their binding modes, they shared common features: (1) a zinc ion was located in a cavity composed of His172, Asp170, and Asp259, and it interacted with the aminoanilide or hydroxamic acid group of each ligand; (2) the hydrophobic group in the middle of each ligand was placed between Phe200 and Phe144. Such observations prompted us to generate common feature pharmacophore models for LBVS. The total number of ligand-induced-fit protein models was 42. As FRED (Chemgauss4) was validated as the best docking/scoring program, it was used to dock and score all actives and decoys in MUBD-HDAC3 against every ligand-induced-fit model. Ligand enrichments of these models along with the use of FRED (Chemgauss4) are shown in Figure 3 and Table 2. Generally, ROCE 0.5% and ROCE 1% values of most protein models were higher than the apo structure, while ROC AUC values remained equivalent to that of the apo structure (cf. Figure 3). To be specific, the average ROCE 0.5% and ROCE 1% values of all models were 56.55 and 31.83, both of which were greater than those values of the apo crystal structure, i.e., 30.90 and 25.66. Meanwhile, the average of ROC AUC was equal to that of apo structure, i.e., 0.72 (cf. Table 2). Overall, these data indicate the induced-fit effect from flexible docking was able to improve early recognition and maintained overall enrichment of the apo structure.
We further analyzed the ligand-induced-fit protein models from the same ligand and selected the best model, i.e., model of highest enrichment for each ligand. As shown in Table 2, the best models induced by those ligands are 15k-5, 8d-7, R306465-7, SAHA-3, and MS-275-4. Among them, both 8d-7 and SAHA-3 presented the highest early enrichment (77.25 for ROCE 0.5% and 41.05 for ROCE 1%), but SAHA-3 showed slightly better performance in term of overall enrichment (0.79 vs. 0.77). Apart from ligand enrichment, we also inspected the binding mode between each ligand and its induced-fit protein structure (cf. Figure 4a-e). Though differences existed in their binding modes, they shared common features: (1) a zinc ion was located in a cavity composed of His172, Asp170, and Asp259, and it interacted with the aminoanilide or hydroxamic acid group of each ligand; (2) the hydrophobic group in the middle of each ligand was placed between Phe200 and Phe144. Such observations prompted us to generate common feature pharmacophore models for LBVS.
The above data confirmed that inclusion of ligand-induced-fit effect by "flexible docking" in DS was able to generate protein-ligand complexes that (1) contained essential interactions with a zinc ion; and (2) more importantly, possessed better ligand enrichment than the apo structure. In addition, the use of a benchmarking calculation based on MUBD-HDAC3 was effective in selecting the optimal structural model.    The above data confirmed that inclusion of ligand-induced-fit effect by "flexible docking" in DS was able to generate protein-ligand complexes that (1) contained essential interactions with a zinc ion; and (2) more importantly, possessed better ligand enrichment than the apo structure. In addition, the use of a benchmarking calculation based on MUBD-HDAC3 was effective in selecting the optimal structural model.

Pharmacophore Models Generated by Catalyst/HipHop
Since five HDAC3Is in the ligand set for molecular modeling were observed to share common features in different protein-ligand complexes, we used Catalyst/HipHop to generate common feature pharmacophore models. According to Table 3, (1) all of the models were composed of four pharmacophore features, though there were two classes, i.e., RHDA and RHAA; (2) the rank scores of all models were equivalent, in a range from 42.530 to 46.740; (3) Hypo1 performed the best according to rank score, but not all ligands in the ligand set fully matched Hypo1 as its direct hit was 11101. By contrast, the scores of Hypo4 and Hypo9 ranked at the fourth and ninth places, but all ligands were able to fully match these two models. Therefore, we were unable to determine an optimal model from Hypo1and Hypo4/Hypo9 based on these preliminary data.

Pharmacophore Models Generated by Catalyst/HipHop
Since five HDAC3Is in the ligand set for molecular modeling were observed to share common features in different protein-ligand complexes, we used Catalyst/HipHop to generate common feature pharmacophore models. According to Table 3, (1) all of the models were composed of four pharmacophore features, though there were two classes, i.e., RHDA and RHAA; (2) the rank scores of all models were equivalent, in a range from 42.530 to 46.740; (3) Hypo1 performed the best according to rank score, but not all ligands in the ligand set fully matched Hypo1 as its direct hit was 11101. By contrast, the scores of Hypo4 and Hypo9 ranked at the fourth and ninth places, but all ligands were able to fully match these two models. Therefore, we were unable to determine an optimal model from Hypo1and Hypo4/Hypo9 based on these preliminary data.
Ligand enrichment is a preferred metric for predicting model performance in real-world screening. The evaluation outcome based on MUBD-HDAC3 is listed in Table 4. Hypo1 was superior to all other models, including Hypo4 and Hypo9, to which all five ligands were able to fully match. Based on both ligand enrichment and rank score, Hypo1 is undoubtedly the best among all models generated by Catalyst/HipHop. Figure 5 illustrates the details of Hypo1, including four pharmacophore features and their pairwise distances. As an example, the most potent HDAC3I in the ligand set, i.e., 15k, was mapped to Hypo1. According to both the pharmacophore model and the above predicted binding mode (cf. Figure 4a), compounds active for HDAC3 should contain (1) a group that includes a hydrogen bond donor on one side; (2) a hydrophobic group in the middle; (3) a group that contains an aromatic ring and a hydrogen-bond acceptor on the other side. As defined by others, these regions were the zinc binding group (ZBG) for (1); linker for (2); and a capping group for (3) [27,28].   Interestingly, we observed the inconsistency between rank and enrichment (cf . Tables 3 and 4) for a few models. For instance, ligand enrichment (ROCE1%) of Hypo2 was not as good as that of Hypo3, though its rank score is better. The same situation happened with Hypo4 and Hypo5. Such inconsistency indicates the better rank score from Catalyst/HipHop does not necessarily represent better enrichment, which highlights the necessity of a benchmarking calculation. Interestingly, we observed the inconsistency between rank and enrichment (cf . Tables 3 and 4) for a few models. For instance, ligand enrichment (ROCE1%) of Hypo2 was not as good as that of Hypo3, though its rank score is better. The same situation happened with Hypo4 and Hypo5. Such inconsistency indicates the better rank score from Catalyst/HipHop does not necessarily represent better enrichment, which highlights the necessity of a benchmarking calculation.

Shape-Based Models Generated by ROCS
In addition to common features, five HDAC3Is in the ligand set for molecular modeling had their own features in terms of both chemical structure and binding pose. ROCS program was applied to generate models to uncover these differences. Figure 6 shows shape-based models/queries (including pharmacophore features) based on five poses extracted from protein-ligand complexes, i.e., 15k-5L, 8d-7L, R306465-7L, SAHA-3L, and MS-275-4L. Notably, five models displayed unique shapes as well as unique pharmacophore features. We used these models to screen MUBD-HDAC3 and applied five different similarity metrics in ROCS, i.e., TanimotoCombo (TC), ShapeTanimoto (ST), ColorTanimoto (CT), ScaledColor (SC), and ComboScore (CS), to score actives/decoys in MUBD-HDAC3. The resultant ligand enrichment from each similarity metric is listed in Table 5.
As for ROC AUC, the values for all pairs of model/metric are all around 0.50 and their pairwise difference is within 0.10. This indicated in general that ligand enrichment of a shape-based similarity search was no better than in random enrichment. However, all pairs of models/metrics, except for SAHA-3L/ST (ROCE 0.5%: 0; ROCE 1%: 0), performed better in early recognition than random selection. The best metric varies for each model. The best metric for 15k-5L was TC, a combined score of ST and CT. With the use of CT, 8d-7L showed best enrichment. The metrics that result in optimal enrichments for R306465-7L, SAHA-3L and MS-275-4L are CT, TC, and CS (combined score of ST and SC), respectively. Among them, the best pair was SAHA-3L/TC, whose early enrichments in terms of ROCE 0.5% and ROCE 1% were 20.60 and 17.96. Nevertheless, it was still not as good as

The Optimal Strategy to Virtually Screen HDAC3Is
Via exhaustive evaluation, we have identified (1) the ligand-induced-fit protein model, i.e., SAHA-3 combined with FRED (Chemgauss4) led to best ligand enrichment, in particular early recognition; (2) the common feature pharmacophore model, i.e., Hypo1 generated by Catalyst/HipHop was superior to shape-based models generated by ROCS; and (3) The optimal SBVS approach performed better than pharmacophore filtering, i.e., Hypo1. Based on the outcome, we developed a pipeline to rapidly and effectively screen large-scale chemical libraries. This pipeline was composed of pharmacophore filtering by Hypo1, followed by use of FRED docking against SAHA-3. Notably, this combination showed better performance than the use of Hypo1 alone. Meanwhile, it did not significantly impair the good early recognition of molecular docking alone, as its ROCE 0.5% and ROCE 1% values remained close, i.e., 72.10 and 38.49 (cf. Table 6). In light of both computational cost and screening power, we deemed Hypo1_FRED_SAHA-3 was the optimal strategy to virtually screen for HDAC3Is. In addition, we evaluated three other combined pipelines, i.e., Hypo1_GOLD_apo, Hypo1_LibDock_apo, and Hypo1_FRED_apo. These pipelines were the most probably constructed and applied ones for HDAC3Is discovery, if ligand enrichment assessment (i.e., benchmarking calculation) was not performed. The reasons were: (1) Hypo1 had a high rank score in Catalyst/HipHop and, thus, would undoubtedly become the first component of the discovery pipelines (cf. Table 3); (2) any of the commonly used docking programs, i.e., GOLD, LibDock, and FRED, would be arbitrarily selected due to the lack of feasible metrics; and (3) the only available apo structure for HDAC3 would be used as the receptor for docking. According to Table 6, the three pipelines were all inferior to Hypo1_FRED_SAHA-3 in terms of ligand enrichment. The ROC curves clearly illustrate the same outcome (cf. Figure 7). These data validated the advantages of our methods, e.g., flexible docking and benchmarking calculation, in the construction of an optimal pipeline. In addition, we evaluated three other combined pipelines, i.e., Hypo1_GOLD_apo, Hypo1_LibDock_apo, and Hypo1_FRED_apo. These pipelines were the most probably constructed and applied ones for HDAC3Is discovery, if ligand enrichment assessment (i.e., benchmarking calculation) was not performed. The reasons were: (1) Hypo1 had a high rank score in Catalyst/HipHop and, thus, would undoubtedly become the first component of the discovery pipelines (cf. Table 3); (2) any of the commonly used docking programs, i.e., GOLD, LibDock, and FRED, would be arbitrarily selected due to the lack of feasible metrics; and (3) the only available apo structure for HDAC3 would be used as the receptor for docking. According to Table 6, the three pipelines were all inferior to Hypo1_FRED_SAHA-3 in terms of ligand enrichment. The ROC curves clearly illustrate the same outcome (cf. Figure 7). These data validated the advantages of our methods, e.g., flexible docking and benchmarking calculation, in the construction of an optimal pipeline.

The Ligand Set for Molecular Modeling
Five potent and diverse HDAC3Is were selected from the literature, i.e., 15k [29], 8d [30], R306465 [31], SAHA [32], and MS-275 [32] (cf. Figure 2). This set of HDAC3Is covered two major types of scaffolds, i.e., aminoanilide and hydroxamic acid. Prior to molecular modeling, these ligands were protonated at the pH range of 7.3-7.5 by using the "Prepare Ligands" module in DS 2016 (Dassault Systèmes BIOVIA, San Diego, CA, USA). In the present study, this ligand set was utilized to induce multiple protein models by flexible docking. Also, it was used to generate common feature pharmacophore models by Catalyst/HipHop.

The Benchmarking Set for Ligand Enrichment Assessment
As mentioned before, MUBD-HDACs can be applied to benchmark both LBVS and SBVS approaches in an unbiased manner [5]. Therefore, the dataset designed specific to HDAC3, i.e., MUBD-HDAC3 was used to evaluate ligand enrichment of all approaches in this study, including docking programs, scoring functions, ligand-induced-fit protein models, and pharmacophore/shape-based models. MUBD-HDAC3 was downloaded from Wang's lab (available on: http://www.xswlab.org/, accessed in January 2016) [33]. It included 39 diverse HDAC3Is, 1521 unbiased decoys, as well as a ready-to-use apo structure for HDAC3 (PDB Entry: 4A69).

Metrics to Assess Ligand Enrichment
Based on a list of ranked scores of ligands and decoys in MUBD-HDAC3, a receiver operating characteristic (ROC) curve was generated to evaluate ligand enrichment of different approaches. A ROC curve is a plot that describes the true positive rate against the false positive rate at various discrimination threshold settings. It is commonly used in data analysis to evaluate the performance of a binary classifier in discriminating the positive (active) from the negative (decoy). The more specific metrics from the ROC curve were ROC area under the curve (ROC AUC) and ROC enrichment (ROCE) values. The former represented overall enrichment, while the latter measured early recognition. Since early recognition has more practical significance than overall enrichment in real-world screening campaigns, ROCE 1% and ROCE 0.5% were given priority for determining the optimal approach. [34] The ROCE value was defined as the quotient of true positive rate divided by the false positive rate at a given percentage of binding decoys, i.e., 0.5% or 1%. In the present study, those actives or decoys that failed in pose generation or molecular docking were relisted with a maximum/minimum score. This ensured a fair comparison of different approaches in the benchmarking calculation.

Three Commonly Used Docking Programs
Three docking programs that we had access to, LibDock implemented in DS, GOLD (version 3.0.1, Cambridge Crystallographic Data Center, Cambridge, UK), and FRED (now OEDocking, version 3.0.1, OpenEye Scientific Software, Inc., Santa Fe, NM, USA) were applied to screen MUBD-HDAC3. Each program had its inherent scoring function, i.e., Libscore for LibDock, Chemscore for GOLD, and Chemgauss4 for FRED. The ready-to-use apo structure from MUBD-HDAC3 was directly used as a receptor for molecular docking. In molecular docking simulations using the three programs, zinc ion in the binding site was treated as rigid.
For LibDock, the binding site was defined as a sphere centered on zinc ion in the catalytic site with a radius of 12 Å. All compounds in MUBD-HDAC3 were converted to 3D conformers by the "FAST" method in DS, with 255 as the maximum conformations and 20 kcal/mol as an energy threshold. All conformers were docked against the binding site and scored by Libscore. A maximum of 30 poses were saved for each compound. For GOLD, it was not necessary to generate a multi-conformer database before molecular docking. This program used the initial low-energy conformation of each compound as an input, and applied a genetic algorithm to optimize the pose in the binding site. The binding site was defined as exactly the same as that for LibDock. The zinc ion was retained and predefined to form tetrahedral coordination geometries during molecular docking. The docking mode was set as "7-8 times speed-up" and each compound was docked 30 times. All poses were scored by ChemScore. For molecular docking by FRED, firstly OMEGA [35] (version 2.5.1.4, OpenEye Scientific Software, Inc.) was utilized to build a multi-conformer database for all the compounds in MUBD-HDAC3 with default parameters. Then, as GOLD was able to deal with coordination with the zinc ion, a representative HDAC3I, such as 15k, was "positioned" in the catalytic site by GOLD and its best scoring pose was used to define the binding site in a form of grid box. Subsequently, FRED docked all compounds from the multi-conformer database into the binding site. After that, Chemgauss4 scored all docking poses and 30 top scoring poses of each compound were kept.
For the benchmarking calculation, only the best docking score of every ligand/decoy was selected. Based on the list of scores, ROC AUC and ROCE 0.5%/1% values were calculated for each docking program.

Other Structure-Based Scoring Functions
As different scoring functions show inconsistent "screening power" [36,37], rescoring using other functions may have the potential to further improve ligand enrichment of the optimal docking program. Herein, 10 scoring functions available in DS were explored: LigScore1, LigScore2, PLP1, PLP2, Jain, PMF, PMF04, LUDI1, LUDI2, and LUDI3. Based on the evaluation outcome of three docking programs, all poses from the optimal docking program were submitted to the "Score Ligand Poses" module of DS. Again, the binding site was defined as a sphere centered on the zinc ion in the catalytic site with a radius of 12 Å. For each scoring function, the top scoring pose of each compound was retained for the benchmarking calculation.

Multiple Ligand-Induced-Fit Protein Models
Receptor flexibility is also a factor that may affect ligand enrichment. Therefore, the apo structure of HDAC3 was an "induced fit" with every HDAC3I in the ligand set (cf. Figure 2). Prior to the induced fit, GOLD was used to generate a best scoring pose that coordinated with zinc ion. With that as the starting pose, the ligand was docked against the apo structure of HDAC3 by using "flexible docking" module in DS [38]. In flexible docking, the binding site was defined as a sphere with a radius of 12 Å centered on the zinc ion. Those amino acid residues covered by the binding site were set as removable, which became the basis for the generation of receptor conformations and refinement of side chains. Other parameters were set as default. After flexible docking, the generated protein-ligand complexes were visually inspected. Those complexes that apparently did not contain interactions between the aminoanilide or hydroxamic acid group of the ligand and zinc ion were excluded. In the remaining complexes, all ligand poses were stripped and the corresponding receptor confirmations, i.e., ligand-induced-fit protein models were retained. Then, the optimal docking program and scoring function selected from the last step were applied to dock MUBD-HDAC3 against these models. Based on docking scores of actives and decoys in MUBD-HDAC3, the ligand enrichment of each model was calculated.

Pharmacophore Modeling: Catalyst/HipHop
The Catalyst/HipHop module implemented in DS was applied to generate common feature pharmacophore models based on the ligand set (cf. Figure 2). Prior to pharmacophore modeling, two attributes of the ligands, i.e., Principal and MaxOmitFeat, were defined. The former attribute indicates whether the ligand is active (2), moderately active (1), or inactive (0), while the latter represents the maximum number of omitted features. For 15k and 8d, Principal and MaxOmitFeat were set to 2 and 0, respectively. For the other three HDAC3Is, both Principal and MaxOmitFeat were set to 1. For each ligand, the FAST method was used to quickly generate a maximum of 255 diverse low-energy conformations, with 20 kcal/mol as an energy threshold. Hydrogen-bond acceptor (A), hydrogen-bond donor (D), hydrophobic group (H), and ring aromatic (R) were predefined as potential features in pharmacophore models. The minimum and maximum counts of each feature were set to 0 and 5. Based on the above parameters, 10 pharmacophore models were automatically generated. Then, the resultant pharmacophore models were scored (ranked) by Catalyst/HipHop. The rank is a measure of how well the ligands map onto the pharmacophore model.
As ligand enrichment is the most significant metric for VS, these models were further evaluated. Retrospective small-scale VS using each pharmacophore model was performed based on the benchmarking set, i.e., MUBD-HDAC3. Actives and decoys in MUBD-HDAC3 were converted to 3D conformers by using the DS module "Build 3D Database" and then screened by using the DS module of "Search 3D Database". Based on fit values of all compounds, ROC curves were generated and corresponding parameters, i.e., ROCE and ROC AUC, from the curves were obtained.
3.4.2. Shape-Based Models: ROCS ROCS (version 3.0.1, OpenEye Scientific Software Inc., Santa Fe, NM, USA) is a shape-based superposition method and well known for its accuracy and speed [39,40]. Based on every ligand pose whose complementary protein model possesses best enrichment of all models induced by that ligand, ROCS generated five shape-based models (including pharmacophore features) using the default Implicit-Mills force field. No additional editing was applied to these models. These models were then evaluated for ligand enrichment by the benchmarking calculation, i.e., retrospective small-scale VS based on MUBD-HDAC3. To be specific, all of the actives and decoys in MUBD-HDAC3 were converted to a multi-conformer database by OMEGA and screened/scored by five different similarity metrics that included TanimotoCombo (TC), ShapeTanimoto (ST), ColorTanimoto (CT), ScaledColor (SC), and ComboScore (CS). For each metric, the best scoring pose was retained for each compound. Based on the scores for each metric, ROC curves and parameters were calculated.

Development of Pipeline for Large-Scale VS
Due to low computational cost, LBVS approaches are usually set up ahead of SBVS approaches in a pipeline for large-scale VS. Accordingly, the best model obtained from the evaluation of LBVS approaches and the optimal SBVS approaches, e.g., docking program/scoring function and the ligand-induce-fit protein model, were integrated into a pipeline. This pipeline was then benchmarked based on the MUBD-HDAC3 dataset. To do this, the compounds in MUBD-HDAC3 were filtered by the best pharmacophore/shape-based model. Those compounds that passed the filter were further docked against the optimal structural model of HDAC3 by the selected docking program and scored by the best scoring function. For the purpose of comparison, those compounds that passed the filter were also docked against the apo structure using a common docking/scoring program. The outcomes from these combined LBVS and SBVS approaches were then analyzed.

Conclusions
In this paper, we used MUBD-HDAC3 to unbiasedly validate the screening power of various SBVS and LBVS approaches and select the optimal ones. Stepwise evaluation of docking programs and scoring functions identified FRED (Chemgauss4) as optimal for enriching HDAC3Is based on the apo structure. Docking against multiple ligand-induced-fit protein models by FRED (Chemgauss4) further uncovered SAHA-3 as a structural model of better screening power. Comparative analysis of common feature pharmacophore models by Catalyst/HipHop and shape-based models by ROCS demonstrated Hypo1 from the former performed the best in terms of early enrichment. Based on the outcome, we presented a versatile pipeline, i.e., Hypo1_FRED_SAHA-3, to rapidly and effectively enrich HDAC3-targeted active compounds from large-scale chemical libraries. We have demonstrated this rationally developed pipeline was superior to other combinations in terms of ligand enrichment, and this highlighted the importance of benchmarking calculations in decision-making.
To the best of our knowledge, this work is the first to attempt to develop VS pipelines for intentionally screening HDAC3-specific inhibitors. The models associated with this VS pipeline, i.e., Hypo1 and SAHA-3, and their usages are publicly accessible via the following link (available on: https://www.dropbox.com/sh/5w9d6y8m77hk9pb/AAA2QoItV7wEeFS4p1ZyQzMda?dl=0). Our immediate goal is to apply this pipeline to screen chemical libraries from different sources and test potential hits for their HDAC potency/selectivity profile as well as their anti-diabetic effect.