Identification of Novel Cathepsin B Inhibitors with Implications in Alzheimer’s Disease: Computational Refining and Biochemical Evaluation

Amyloid precursor protein (APP), upon proteolytic degradation, forms aggregates of amyloid β (Aβ) and plaques in the brain, which are pathological hallmarks of Alzheimer’s disease (AD). Cathepsin B is a cysteine protease enzyme that catalyzes the proteolytic degradation of APP in the brain. Thus, cathepsin B inhibition is a crucial therapeutic aspect for the discovery of new anti-Alzheimer’s drugs. In this study, we have employed mixed-feature ligand-based virtual screening (LBVS) by integrating pharmacophore mapping, docking, and molecular dynamics to detect small, potent molecules that act as cathepsin B inhibitors. The LBVS model was generated by using hydrophobic (HY), hydrogen bond acceptor (HBA), and hydrogen bond donor (HBD) features, using a dataset of 24 known cathepsin B inhibitors of both natural and synthetic origins. A validated eight-feature pharmacophore hypothesis (Hypo III) was utilized to screen the Maybridge chemical database. The docking score, MM-PBSA, and MM-GBSA methodology was applied to prioritize the lead compounds as virtual screening hits. These compounds share a common amide scaffold, and showed important interactions with Gln23, Cys29, His110, His111, Glu122, His199, and Trp221. The identified inhibitors were further evaluated for cathepsin-B-inhibitory activity. Our study suggests that pyridine, acetamide, and benzohydrazide compounds could be used as a starting point for the development of novel therapeutics.


Introduction
Alzheimer's disease (AD) is the most common type of dementia [1], associated with decline in memory and cognition [2,3]. AD is the sixth-leading cause of death, with

Molecular Docking and Interaction Studies
Molecular docking of 61 analog inhibitors was carried out using a Lamarckian genetic algorithm (LGA) with AutoDock v4.2 tools with default docking parameters [41]. The receptor file was first prepared by deleting water molecules and adding polar hydrogens only. Gasteiger charges and any other atoms present were assigned AD4 atom types.

Molecular Docking and Interaction Studies
Molecular docking of 61 analog inhibitors was carried out using a Lamarckian genetic algorithm (LGA) with AutoDock v4.2 tools with default docking parameters [41]. The receptor file was first prepared by deleting water molecules and adding polar hydrogens only. Gasteiger charges and any other atoms present were assigned AD4 atom types. Ligands with rotatable bonds were set to free, and the protonation state was maintained at physiological pH [35]. Grid parameter files were built, and atom-specific three-dimensional affinity maps were constructed using AutoGrid v4.2. The three docking steps (van der Waals interactions, hydrogen bonds, and the electrostatic potential) were considered for the calculation of binding energy [42,43]. Each docking experiment was derived from 20 different runs that were set to terminate after a maximum of 2,500,000 energy evaluations, or 27,000 generations, yielding 20 docked conformations, while the population size was set to 150. After the generation of multiple runs, cluster analysis was performed. The docking solutions with inhibitors that had atomic root-mean-square deviations (RMSDs) within 2.0 Å of one another were clustered together and ranked by their lowest energy. The solution with the lowest energy was accepted as the best docking conformation for the binding energy calculation. Estimated binding free energy (kcal/mol), inhibitory constant (Ki, µM), electrostatic energy (kcal/mol), van der Waals interactions (kcal/mol), hydrogen bonds (kcal/mol), desolvation energy (kcal/mol), total intermolecular energy (kcal/mol), torsional energy (kcal/mol), and Gibbs free energy of binding were calculated for 61 cathepsin B inhibitors.

Ligand-Based Three-Dimensional Pharmacophore Search and Virtual Screening
A ligand-based three-dimensional pharmacophore model based on the chemical features present in 61 active, moderately active, and inactive cathepsin B inhibitors was generated using LigandScout 3.1 software (Inte:Ligand GmbH, v3.1, Vienna, Austria) [44] from the docking of 61 cathepsin B inhibitors, so that these ligands (cathepsin B inhibitors) would be flexibly aligned into a rigid macromolecular (cathepsin B protein) environment, and later could be estimated for the fidelity of the interactions [45]. Pharmacophoric descriptors-including H-bond donors (HBD), H-bond acceptors (HBA), hydrophobic (HY), ring aromatic (RA), positive ionizable (PI), and negative ionizable (NI) featureswere mapped onto the chemical features of all training set molecules [42]. RA, PI, and NI features were less significantly mapped when compared to the other features. Therefore, HBD, HBA, and HY were selected to generate 3D pharmacophore hypothesis protocols. The docked pose of each compound in the training set was used to generate maximum numbers of conformations (500), a root-mean-square (rms) threshold of 0.4, energy window of 10, and maximum pool of 4000 as the default values using the ligandset module in LigandScout 3.1. Furthermore, clustering was performed using similarity measurement via pharmacophore alignment score. The cluster distance calculation method was kept at 0.4 and average in order to generate ligand-based pharmacophores using pharmacophorefit and atom overlap scoring functions. All of the other parameters were set as default. The IC50 values of individual training set compounds were selected as active properties. The test set analysis method was used to validate the pharmacophore model hypothesis. This method was validated by the enrichment factor (E) to discriminate between active and inactive molecules, and expressed as: False negative (A−Ha) False positive (Ht−Ha) Goodness of hit score (GH) The enrichment factor was calculated as the ratio of active hits (Ha) amongst all the molecules in database D and total hits (Ht) in database A (Equation (1)). The false negatives were calculated by subtracting active hits (Ha) from the total number in the database (A) (Equation (2)), and the difference in total hits (Ht) and active hits (Ha) was used to calculate the rate of false positives (Equation (3)). The goodness of hit score (GH) ranged from 0 to 1, differentiating null models from ideal models (Equation (4)).

Screening of Maybridge Database and Fast Docking Using AutoDock Vina
To identify potential lead molecules from the large Maybridge database containing 60,538 small molecules, virtual screening was performed by generating ligand-based threedimensional pharmacophores (as described in the previous section). At the start of the virtual screening, we presumed that possible hit molecules from the Maybridge library should fit with the majority of the possible features of the pharmacophore query. Duplicate Maybridge molecules were removed and not used for screening. Thereafter, the Maybridgescreened compound hits were docked into the binding site of cathepsin B using AutoDock Vina (ADt-Vina) [46]-a fast-docking program to remove compounds with non-compatible geometries and energetics toward the ligand-binding site. ADt-Vina used an "iterated local search global optimizer" algorithm to evaluate the interactions between a ligand and its receptor. The search space for docking was restricted to a cubic box of 40 Å × 40 Å × 40 Å centered on the binding site. ADt-Vina calculations were carried out using a default value of eight to ensure the exhaustiveness of the global search. Twenty poses were generated for each ligand, which were ranked by ADt-Vina's empirical scoring functions, and the pose with the lowest docking score value was selected as the best iteration. The predicted binding energy from the docking provided a ranking of the compounds, which was compared to the known actives using two measures. Virtual screening performance is commonly analyzed using a receiver operating characteristic (ROC) curve, which can easily be quantified by determining the area under the curve (AUC). The AUC and the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) metric were used to evaluate the robustness of the docking algorithm in order to select active compounds [47]. The identified hit molecules were further subjected to various screening filters, such as Lipinski's rule of five, which selects molecules with molecular weight less than 500 D, HBD less than 5, HBA less than 10, and a LogP (octanol/water partition coefficient) value of less than 5 [48] to achieve the possible set of potential cathepsin B inhibitor molecules.

Construction of Structural Interaction Fingerprints (SIFts), Similarity Analysis, and Hierarchical Clustering
Structural interaction fingerprints (SIFts) were used for one-dimensional binary representation of contact between ligands and macromolecule amino acids. The advantage of using the SIFt method is in generating protein-ligand interaction fingerprints to elucidate the interaction patterns [49][50][51]. Predefined binding site amino acid residues involved in ligand binding were used for SIFt calculations for both sets of ligands (inhibitors of cathepsin B and virtual screen hits), while hydrogen bond interactions between ligands and amino acids were identified by UCSF Chimera software with default settings. The next step was to classify protein-ligand interaction fingerprints. The implementation of SIFt uses seven bits for each amino acid residue-contacting ligand. The seven bits were switched on (binary representation: 1) and off (binary representation: 0) depending on several factors, such as (1) the subsequent interactions specifying whether a contact was involved at a position, whether the (2) main-chain (MC) or (3) side-chain (SC) atoms were involved, and the presence of a (4) hydrogen bond acceptor (HBA) or (5) donor (HBD); the (6) polar and (7) nonpolar nature of the interactions thus represent each residue by a seven-bit-long string. The complete interaction fingerprints of the virtual screen molecules that complexed with the proteins were finally constructed by successively concatenating the bit string of each binding site residue simultaneously, in accordance with the ascendant residue number. This results in each interaction fingerprint being similar in length, and enables easy comparison of protein-ligand binding interactions at a binding site position across a series of complexes. For the quantitative measure of bit string similarity, we used the Jaccard-Tanimoto coefficient (JT), which is widely used for binary data [52]. The RTc between the two strings A and B is calculated as: where |A∩B| represents the number of ON bits common in both A and B, and |A∪B| represents the number of ON bits present in either A or B. The SIFt signifies the proteinligand interaction pattern; however, similarity analysis of the fingerprints implies that the ligands carry out similar interactions with the protein. We then applied an algorithm that groups similar objects into groups of similar clusters-hierarchical clustering-in order to analyze the protein-ligand interaction fingerprints for each test case [53], using SYSTAT v13.2 software. The generated protein-ligand complex clusters were manually inspected based on their fingerprint interaction dendrograms.

Molecular Dynamics Simulation and Binding Free Energy Calculations
The molecular mechanics Poisson−Boltzmann surface area (MM-PBSA) [54,55] and molecular-mechanics-generalized Born surface area (MM-GBSA) [56] methods, as implemented in AMBER10 [57], were used to calculate ligand binding free energy. To generate a conformational ensemble for the binding free energy calculation, molecular dynamics simulation was carried out for compounds selected after docking. In each case, the proteins were immersed in the rectangular truncated octahedron filled with 8 Å TIP3P water molecules and neutralized by adding Na + or Cl − ions. At first, the protein system was minimized by 500 steps of steepest descent, followed by 2000 steps of conjugate gradient. After minimization, the system was gradually heated in the canonical ensemble from 0 to 300 K over 50 ps, and then equilibrated for 200 ps. Finally, a 20-ns MD simulation was performed under a constant temperature of 300 K [58]. The long-range electrostatic interactions were dealt with by employing the particle mesh Ewald (PME) method [59]. All hydrogen atoms were constrained using the SHAKE algorithm [60], and the time step was set to 2 fs. All simulations were performed using the SANDER module of AMBER10 [61] with the AMBER force field (ff03). Binding free energy was calculated using the following equation: The above equation can be conceptually summarized as: where ∆GMM is the molecular mechanics binding free energy between small molecules and cathepsin B. The ∆Evdw, ∆Eelec, and ∆Eint account for differences in van der Waals energy, electrostatic energy, and internal energy, respectively. ∆Gpolar solvation is the polar contribution to solvation-free energy, and is computed by solving the Poisson−Boltzmann equation in MM-PBSA, or by using Onufriev's generalized Born model [62] in MM-GBSA. ∆Gnonpolar solvation is the nonpolar contribution to solvation-free energy, and is estimated using the following equation: where SASA is the solvent's accessible surface area, as calculated using the LCPO method [63] for MM-PBSA and the ICOSA method [63] for MM-GBSA. The γ and b are empirical constants with default values of 0.0072 and 0, respectively. The T∆S represents the solute entropy that was not considered in this study, as we were mainly interested in the relative ranking of virtual screening hits, rather than absolute binding free energies.

Cathepsin B Inhibitory Assay
Cathepsin B inhibitory assay was performed using a fluorometric-based assay kit as per company instructions (ab185438). The assay kit utilizes the ability of cathepsin B to cleave synthetic AFC (7-Amino-4-trifluoromethyl coumarin) substrate to freely release AFC; moreover, this AFC can be quantified by a fluorometer. In the presence of cathepsin B inhibitors, the cleavage of the substrate was reduced, and there was a decrease in the total loss of the AFC fluorescence. The experiments were performed in triplicate. The test compounds (virtual screened hits) were prepared in different concentrations of 2.5, 5.0, 10.0, 15.0, and 20.0 µM. The inhibitor screening protocol was performed by first incubating 10 µL of test compound with cathepsin B enzyme in CTSB (cathepsin B) reaction buffer for 10-15 min at room temperature. one microliter of F-F-FMK (CTSB inhibitor) and 9 µL of CTSB reaction buffer were added to the cathepsin B enzyme well plate and treated as inhibitory controls (IC or positive control), whereas solvent with cathepsin B enzyme was used for enzyme control (EC or negative control). After incubation, 40 µL of cathepsin B substrate solution was added into each well and mixed. Next, fluorescence was measured in a kinetic mode for 30-60 min at 37 • C (Ex/Em = 400/505). Two timepoints were used-0 and 15 min (∆T = T2 − T1)-and corresponding values for the fluorescence (∆RFU = RFU2-RFU1) were used to calculate the slope for all test inhibitor samples and EC, by dividing the net ∆RFU values by the time ∆T. The percentage of relative inhibition was calculated using the following formula:

Results
The current study employed the power of a computational approach to generate a three-dimensional pharmacophore hypothesis, refine the pharmacophore model through structural interaction fingerprinting, and validate the novel hits from the Maybridge database using molecular dynamics. The general strategy for the ligand-based pharmacophore virtual screening is presented in Figure 3. In summary, the 61 known cathepsin B inhibitors were docked at cathepsin B protein (PDB ID: 1CSB) active sites. The lowest binding energy interactions were selected to create ligand-based three-dimensional pharmacophore models. The validated Hypo III was utilized for the virtual screening of the Maybridge database. The Maybridge database of small molecules with diverse drug-like structures-containing 60,538 compounds-was screened within the matching features of the pharmacophore models. To our satisfaction, 1728 Maybridge molecules were closely mapped on all pharmacophoric features present in Hypo III. These molecules were additionally examined by subjecting them to fast-track docking using the ADt-Vina program, which provided 176 molecules with cutoff values of docking scores ≥ −6.0 kcal/mol. Furthermore, Lipinski's rule of five was applied to 176 molecules as an additional parameter, which provided 18 molecules. These 18 hits were finally subjected to re-docking, and structural interaction fingerprints were generated to prioritize and refine the drug-like compounds. Three novel hit compounds were selected for molecular dynamics simulation studies and validated for further in vitro activity using enzyme-based assay methods. A schematic representation of the pharmacophore generation and virtual screening processes is shown in Figure 3. thermore, Lipinski's rule of five was applied to 176 molecules as an additional parameter, which provided 18 molecules. These 18 hits were finally subjected to re-docking, and structural interaction fingerprints were generated to prioritize and refine the drug-like compounds. Three novel hit compounds were selected for molecular dynamics simulation studies and validated for further in vitro activity using enzyme-based assay methods. A schematic representation of the pharmacophore generation and virtual screening processes is shown in Figure 3. Step 1, data collection: The dataset of 61 ligands was docked in cathepsin B enzyme (PDBID: 1CSB), and then used to generate a 3D pharmacophore, followed by screening of the Maybridge library.
Step 2, hypothesis, and screening: Maybridge hits were subjected to fast docking using AutoDock-Vina, screening using the structural interaction fingerprint (SIFt) approach, and re-docking of the Maybridge virtual screen hits. Step 3, inhibitory activity: Biological evaluation of virtual screening hits against cathepsin B inhibitory activity and molecular dynamics; MM-PBSA and MM-GBSA were used to predict binding free energy components.

Binding Modes of the 61 Active Cathepsin B Inhibitors Using the AutoDock v4.2 Docking Program
Molecular docking not only provides a visualization of potential binding orientations, but also the iteration by which the ligand interfaces with the relevant amino acid [64]. The hydrogen-bonding interaction was observed with critical residues such as Cys29 and Gly74. Traditional protein-ligand docking was performed with AutoDock v4.2 for the dataset of 61 cathepsin B inhibitors using LGA. The docking results predicted binding energies, inhibitory constants, and interactions, which have been calculated for each of them (Table 1 and Figure 4A). Gln23, Gly24, Gly27, Cys29, Asn72, Gly74, His110, Glu122, Met196, Gly198, His199, and Trp221 amino residues comprise the cathepsin B protein-binding site. One example each from both the natural and synthetic cathepsin B inhibitors with the lowest IC50 and binding energies has been selected here to explain the binding mode of these inhibitors into the cathepsin B protein. The natural molecule compound N14, containing an epoxide moiety, was oriented downwards, forming hydrogen bounds with Gln23, Gly74, and Gly198. As shown in Figure 4B, the oxygen atom of the C-5 carboxyl group and C-6 hydroxyl group of ligand-N14 is coordinated with Gln23 and Gly74m whereas the nitrogen atom of ligand-N14-the N-1 amine group-coordinated with Gly198 within the enzyme's catalytic site. However, the synthetic class of inhibitors-compound S1-with the same epoxide moiety analogue, showed a different pattern of hydrogen-binding scheme. For this class of compounds, amino acids His110, His111, and Met196 were the key residues, with distance constraints of 2.73 Å, 2.97 Å, and 2.81 Å, respectively. The pyrrolidine substituent at the side chain of the S1 inhibitor involved the oxygen atom; the C-14 carboxyl group coordinated with His110 and His111, whereas the nitrogen atom of ligand-S1-the N-2 amine group-was found to coordinate with Met196 within the enzyme catalytic site ( Figure 4C).  In total, 61 compounds were identified from various literature and, as described previously, 24 of them were selected manually, considering the structural diversity and wide

Generation and Validation of Mixed-Feature Ligand-Based Pharmacophore Models Using LigandScout 3.1
In total, 61 compounds were identified from various literature and, as described previously, 24 of them were selected manually, considering the structural diversity and wide range of activity, as the training set (marked with * in Figure 2), while the remaining compounds were used as the test set. Structures and biological activities of the training and test set compounds are indicated in Figure 2. The training and test set compounds were divided based on distribution of biological activities and chemical features. The dataset was classified into three categories, according to biological activity data: active (+++, IC50 ≤ 0.3 µM), moderately active (++, 0.3 µM < IC50 < 2.5 µM), and inactive (+, IC50 ≥ 2.5 µM). These molecules were distributed between the training set and test set. A maximum number of active and moderately active compounds, along with a few inactive compounds, were assigned to the training set compounds. The rest of the molecules were assigned to the test set for validation. The training set molecules included epoxides, pyridine, pyrrolidine, and potent acetamide compounds with high affinity to inhibit the cathepsin B domain. These available compounds provide ideal chemical structures for the development of cathepsin B inhibitors. Thus, these mixed-features pharmacophores were used to generate hypotheses based on the activity value of training set compounds.
The qualitative top 10 hypotheses were generated based on the training set molecules, which are tabulated in Table 2. The first three generated hypotheses were selected for the study because they had the highest pharmacophore-fit score in Hypo I (74.81, 7 matching features), Hypo II (75.24, 9 matching features), and Hypo III (75.98, 8 matching features). The matching features contained a combination of three chemical features, including hydrophobic (HY), hydrogen bond acceptor (HBA), and hydrogen bond donor (HBD). Hypothesis I (Hypo I) consisted of seven chemical features including one HY, five HBAs, and one HBD. A nine-feature hypothesis, Hypo II consisted of two HYs, five HBAs, and two HBDs. Hypo III contained two HYs, four HBAs, and two HBDs, constituting eight features overall ( Figure 5A). 6.09 6.07 − 0.3 70.09 ++ ++ a Positive value indicates that the estimated IC50 is higher than the experimental IC50; negative value indicates that the estimated IC50 is lower than the experimental IC50. b Fit value indicates how well the features in the pharmacophore map the chemical features in the compound. c Activity scale: active, +++, IC50 ≤ 0.3 μM; moderate active, ++, 0.3 μM > IC50 < 2.5 μM; less active, +, IC50 ≥ 2.5 μM). The three-dimensional pharmacophore-validated Hypo III model was utilized for  The pharmacophore fit score calculates the matching of number of pharmacophore features. This could predict the activity of compounds in the training set with the lowest deviation, while the RMSD represented pharmacophore alignment to estimate average activity, since the pharmacophore fit scores of the first three hypotheses were selected to validate the pharmacophore models. The validity of any pharmacophore model needs to be determined by applying that model to the test set to find out how correctly the model predicts the activity and, most importantly, whether it can correctly differentiate between active and inactive molecules. The pharmacophore model hypothesis was validated by assessing the predictive ability of the pharmacophore on the test set database consisting of 37 known inhibitors of cathepsin B, along with a subset of the World of Molecular Bioactivity (WOMBAT) database consisting of 741 molecules, active against different proteins than those used in present study, considered here as inactive. This validation gives confidence to select the best pharmacophore from amongst the three generated pharmacophore hypotheses. The results for pharmacophore validation are summarized in Table 3. A number of parameters such as hit list (Ht), active percentage yield (% Y), percentage ratio of active molecules in the hit list (% A), enrichment factor (E), false negatives, false positives, and goodness of hit score (GH) are presented (Table 3). Using the pharmacophore hypothesis Hypo I, 93 false positives were found, but only 28 active molecules were picked among the 37 in the test set. On the other hand, Hypo II showed 30 active hits, with 7 and 30 false negatives and false positives, respectively. The pharmacophore hypothesis Hypo III ( Figure 5A) showed minimal false positives and negatives, good enrichment factor and goodness of fit score, and was considered to be the best model for virtual screening among the three pharmacophore hypotheses. Overall, amongst the 45 hit molecules, 35 molecules were observed to be correct, thus showing only 10 false positives and 2 false negatives. In addition, Hypo I, Hypo II, and Hypo III showed enrichment factors (E) of 4.98, 10.78, and 16.74, and GH scores of 0.32, 0.56, and 0.81, respectively indicating that the quality of the pharmacophore models is within an acceptable range. These results indicate that Hypo III is accurate enough to discriminate the active inhibitors from inactive or low-activity compounds.
In addition to the pharmacophore validation, Hypo III was tested against the predictive power of the pharmacophore model to differentiate between the most, moderately, and least active compounds in the training datasets. The training set molecules were classified into three categories based on their activity values: highly active (IC50 ≤ 0.3 µM, +++), moderately active (0.3 µM > IC50 < 2.5 µM, ++), and inactive (IC50 ≥ 2.5 µM, +). The error value is the ratio between the estimated and experimental activities. The positive error value indicates that the estimated IC50 value is higher than the experimental activity, whereas the negative error value indicates that the estimated IC50 value is much lower than the experimental activity. An error value of <10 signifies the prediction of activity lesser than one order of magnitude. Among the 24 training set compounds, only 2 compounds had an error value of greater than 3. The estimated activity values of the training set compounds were predicted with the same activity scale as the experimental activity, and are represented in Table 4. Among the 24 training set compounds, 1 active compound (+++) in the training set was estimated as moderately active (++), 1 moderately active compound (++) was estimated as active (+++), and 2 moderately active compounds (++) were estimated as inactive compounds (+). The remaining compounds were estimated in their activity scale by Hypo III ( Table 4). The most active compounds of the training set (Compound S1, IC50: 8.65 µM) and their features flexibly aligned in Hypo III, and the least active compound (Compound N8, IC50: 125 µM) demonstrated poor alignment in the pharmacophore model ( Figure 5B,C). The validated pharmacophore model could be used for searching structurally diverse compounds from the Maybridge molecular library database

Maybridge Database Screen and Fast Docking of Maybridge Molecular Library Hits Using AutoDock-Vina
The three-dimensional pharmacophore-validated Hypo III model was utilized for virtual screening of the diverse compounds in the Maybridge molecule library. A total of 60,538 small molecules from the Maybridge database were added in the LigandScout database library to identify potential hit molecules against the cathepsin B protein. As a result, 1728 compounds (2.9% of the Maybridge database) were perfectly matched and aligned to all features in the Hypo III model. These molecules were subjected to fast docking using ADt-Vina. From the resulting data, the compounds were ranked based on their predicted binding energies. These rankings were used to evaluate the ability of ADt-Vina to preferentially select the active compounds. The binding pocket of the cathepsin B protein (PDB: 1CSB) was considered based on the bound ligand (CA030) in the crystal structure ( Figure 6A). In general, the default parameters were used for ADt-Vina. The docking program reported multiple conformations and associated binding energies. In case of ADt-Vina, the lowest energy conformation was selected. The compound rankings were determined and then compared against the active CA030 ligand. ADt-Vina reduced the Maybridge datasets to 176 molecule hits (0.3% of the Maybridge database) after applying a cutoff value of ≥−6.0 kcal/mol ( Figure 3). As shown in Figure 6B, ADt-Vina displayed an accurate ranking of active compounds in cathepsin B. Quantified by an AUC measure (Table 5) (Table 5). These molecules were further examined using Lipinski's rule of five and visual inspection [65].
Cells 2021, 10,1946 17 of 31 to preferentially select the active compounds. The binding pocket of the cathepsin B protein (PDB: 1CSB) was considered based on the bound ligand (CA030) in the crystal structure ( Figure 6A). In general, the default parameters were used for ADt-Vina. The docking program reported multiple conformations and associated binding energies. In case of ADt-Vina, the lowest energy conformation was selected. The compound rankings were determined and then compared against the active CA030 ligand. ADt-Vina reduced the Maybridge datasets to 176 molecule hits (0.3% of the Maybridge database) after applying a cutoff value of ≥−6.0 kcal/mol ( Figure 3). As shown in Figure 6B, ADt-Vina displayed an accurate ranking of active compounds in cathepsin B. Quantified by an AUC measure (  (Table 5). These molecules were further examined using Lipinski's rule of five and visual inspection [65]. This led to a reduction in the number of virtual screening hits to 18 molecules. These 18 small hit molecules were subjected to re-docking analysis using AutoDock v4.2. The docking results of the 18 virtually screened small hit molecules are tabulated in Table 6, and their binding pose in the active site showed ligand conformations (Video S1). Table 5. Virtual screen statistics. Area under the curve (AUC) and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) 20 values were calculated based on the data shown in Figure 6. p-values were estimated using a bootstrap procedure based on 100,000 random rankings of the active compounds.     Figure 6. p-values were estimated using a bootstrap procedure based on 100,000 random rankings of the active compounds. This led to a reduction in the number of virtual screening hits to 18 molecules. These 18 small hit molecules were subjected to re-docking analysis using AutoDock v4.2. The docking results of the 18 virtually screened small hit molecules are tabulated in Table 6, and their binding pose in the active site showed ligand conformations (Video S1).

Structure Interaction Fingerprint Based Clustering
The SIFts were generated for 61 inhibitors of cathepsin B and 18 virtually screened hit molecules obtained after docking and scoring as described above ( Figure 7A). The dendrogram was created by clustering SIFts of cathepsin B inhibitors and virtual screening of hit molecules and is shown in Figure 7B. The dendrogram showed two major clusters, each of which represents a distinct binding pattern in the ligand-protein complexes ( Figure 7B). Cluster 1 (green color) was composed of 32.88% inhibitor molecules of cathepsin B (mostly less active) and 89.27% virtual screening hit molecules interacting with the cathepsin B protein. Similarly, Cluster 2 (brown and pink color) was composed of 67.12% cathepsin B inhibitors, which were mostly highly and moderately active, and 10.73% virtual screening hit molecules. Interestingly, each of these clusters was comprised of poses with similar binding patterns to the receptor; Cluster 1 contained molecules that represented distinct binding patterns that resulted in dissimilar interactions within the active site pocket formed by Gln23, His199, and Trp221 amino acid residues. Cluster 2 molecules bound in a similar manner to the known inhibitor in the X-ray crystal structures of PDB ID: 1CSB ( Figure 7C). Most of the inhibitors of cathepsin B were located at the interface between the active molecule and the substrate-binding cleft (Supplementary Figure S1A). The moieties such as pyridine, sulfanyl and phenyl present in the hits belonging to Cluster 2 were predicted to occupy the protein's active cleft. These played a critical role in peptide bond cleavage by Cys29 and interacted with His199 (Supplementary Figure S1B). These hits were buried deep within the binding groove and reached the catalytic residue Cys29 and were involved in π-π interactions. These moieties are also predicted to be involved in strong hydrogen bonding interactions with the conserved residues Gln23, Gly27, and Gly74. In most cases, the virtual screening hit molecules protrude outside of the catalytic pocket and occupy the neighborhood position composed of Gln23, Gly74, and Gly198 amino acid residues, belonging to the less active Cluster 1 (Supplementary Figure S1C). Here, these hit molecules were involved in hydrophobic interactions with Tyr75 and Tyr177 residues (not shown here) and hydrogen bond interactions with residues His110 and His199. Some other interactions were also found to occur in most of the inhibitors of cathepsin B viz. a hydrogen bond between O3 of the pyridine base and the Trp221 side chain, a hydrogen bond between Glu122 and N1 of the pyridine ring, and O4 hydrogen bonds with the imidazole ring of His199. Step 1: identify the key binding residues of the receptor protein in the complex; step 2: represent each key residue by a bit string (contact, main chain (MC), side chain (SC), polar, non-polar, hydrogen bond acceptor (HBA), and hydrogen bond donor (HBD)) according to the kind of interaction at that residue; step 3: concatenate 7-bit strings of all key residues to form a unique fingerprint, called an SIFt. (B) Dendrogram derived from agglomerative hierarchical clustering of SIFts of known cathepsin B inhibitors and virtual screening hits. Tanimoto similarity coefficient was used to calculate the similarity between the SIFts. (C) The key residues of the cathepsin B protein involved in interaction with virtual screening hit molecules.

Prioritization and Binding Modes of Hit Molecules
In order to prioritize virtual screening hits, visual inspection of the clusters and their binding modes was carried out with the following considerations: (1) the compound should be from Cluster 2; (2) the degree of occupancy of the enzyme, in particular related to the achieved protein-ligand surface complementarities; (3) the distance and the orientation of the aromatic, aliphatic, or hydrophobic group in relation to His199 for π-stacking interaction; (4) the formation of a hydrogen bond with His110 and His111; and (5) the quality of the overall binding conformation. Overall, out of 18 compounds, 3 compounds-i.e., BTB03075, KM02922, and RF02795-showed acceptable binding poses and met all five of the mentioned criteria. Table 6 shows the binding scores of the 18 Maybridge hit compounds and, in comparison to the co-crystalized ligand (CA030), BTB03075, KM02922, and RF02795 demonstrated better binding scores. Figure 8 shows the binding pattern of the three representative compounds (BTB03075, KM02922, and RF02795) in the Here, these hit molecules were involved in hydrophobic interactions with Tyr75 and Tyr177 residues (not shown here) and hydrogen bond interactions with residues His110 and His199. Some other interactions were also found to occur in most of the inhibitors of cathepsin B viz. a hydrogen bond between O3 of the pyridine base and the Trp221 side chain, a hydrogen bond between Glu122 and N1 of the pyridine ring, and O4 hydrogen bonds with the imidazole ring of His199.

Prioritization and Binding Modes of Hit Molecules
In order to prioritize virtual screening hits, visual inspection of the clusters and their binding modes was carried out with the following considerations: (1) the compound should be from Cluster 2; (2) the degree of occupancy of the enzyme, in particular related to the achieved protein-ligand surface complementarities; (3) the distance and the orientation of the aromatic, aliphatic, or hydrophobic group in relation to His199 for π-stacking interaction; (4) the formation of a hydrogen bond with His110 and His111; and (5) the quality of the overall binding conformation. Overall, out of 18 compounds, 3 compounds-i.e., BTB03075, KM02922, and RF02795-showed acceptable binding poses and met all five of the mentioned criteria. Table 6 shows the binding scores of the 18 Maybridge hit compounds and, in comparison to the co-crystalized ligand (CA030), BTB03075, KM02922, and RF02795 demonstrated better binding scores. Figure 8 shows the binding pattern of the three representative compounds (BTB03075, KM02922, and RF02795) in the cathepsin B binding site (PDB ID: 1CSB). These molecules achieved hydrogen-bonding interactions with the key amino acids His110 and His111 in cathepsin B's binding site via the imidazole side chain of the histidine amino acid. These require less distance to form hydrogen bonds with ligands, as the polar hydrogen atom of the imidazole ring acts as a hydrogen bond donor, while the basic nitrogen moiety is a hydrogen bond acceptor. The predicted binding modes of the Maybridge lead hits BTB03075, KM02922, and RF02795 are shown in Figure 8. Similarly, to most of the cathepsin B inhibitors, identified hits were all anchored to the cavity by hydrogen bonds between the ligand and Trp221, along with the p-p stacking interaction with His111 and His199. Aromatic rings of compounds BTB03075 ( Figure 8A(a')), KM02922, and RF02795 ( Figure 8B(b') and C(c')) were embedded in the cavity formed by Trp221, Gln23, and His199 residues. Compound RF02795 showed the best results in terms of binding pattern and docking score amongst the 18 compounds in the 3 crystal structures, and its predicted binding affinity exceeded that of the co-crystalized ligands in the cathepsin B protein structure (Table 6). Regarding the binding pattern of compound RF02795, its triazinyl is well accommodated at the binding site, interacting with its amide functional group through hydrogen bonding with the key amino acids His110 and His111, fitting the hydrophobic methoxy phenyl ring in the vicinity of the hydrophobic side chains of the amino acids Trp30, Val176, Leu181, and Trp221, creating the hydrophobic pocket. These lead hits showed π-π stacking interactions and hydrogen bonding within the binding pocket of cathepsin B. The promising compounds all share common pharmacophoric features, such as a sulfur-and nitrogen-rich moiety (pyridine, sulfonyl, thiophenyl, triazinyl, etc.).

Molecular Dynamics Simulations
BTB03075, KM02922, and RF02795 complexes with the cathepsin B protein were selected for molecular dynamics (MD) simulation to determine the long-term stability of the docked complex. An additional purpose underlying the MD studies was to investigate the positional and conformational changes of lead compounds in relation to the active pocket and specific residues, in order provide insight into the binding stability. The stability and interactions of the receptor-ligand complex was analyzed after 20-ns MD runs, and the root-mean-square deviations (RMSDs) of binding site residues and ligand trajectory were plotted against longer time frames (Figure 9). To elucidate the flexibility in the ligand-protein complex, we examined the root-mean-square fluctuation (RMSF) in each case of receptor-ligand complex ( Figure 9A). RMSF analysis of protein-ligand complexes gave a maximum value of 6 Å for 1CSB-CA030. Initial stable RMSF flexibility was observed between 1 and 3 Å in the 1CSB protein backbone. Three large peaks of RMSF were seen between residues at 30-60, 100-130, and 200-225, arising from the αhelix, loop, and β-strand regions, and fluctuating between 3 and 6 Å ( Figure 9A). This large peak was due to the binding of CA030, which affected the secondary structure of cathepsin B. There was no change in the secondary structure, although fluctuations in RMSF were observed in the α-helix, loop, and β-strand regions of cathepsin B upon BTB03075, KM02922, and RF02795 binding ( Figure 9A). The average RMSD values of binding site residues of cathepsin B while bound with compounds BTB03075, KM02922, and RF02795 were observed to be 1.87, 0.92, and 0.91, respectively, suggesting stable binding sites. However, significant movements were observed for CA030, the bound inhibitor in the crystal structure of cathepsin B (PDB code 1CSB) ( Figure 9B). MD further revealed that after 20 ns of simulations, no significant changes in ligand-docked conformation were observed for lead hits (BTB02922, KM02922, and RF02795) either, with average RMSD values of 2.93, 2.15, and 2.13 for BTB03075, KM02922, and RF02795, respectively ( Figure 9C). Large movements were again observed in the case of the inhibitor-bound crystal structure of cathepsin B (Figure 9C,D). The results shown here confirm the docking stability of the predicted binding modes of BTB02922, KM02922, and RF02795. The trajectory generated from 20-ns MD simulation was then used to calculate ligand binding free energy using the MM-PBSA [66] and MM-GBSA methods [67]. Both the MM-PBSA and MM-GBSA methods can be used to quickly reproduce relative binding affinities for a set of ligands, with reasonable accuracy [68,69]. The MM-PBSA-and MM-GBSA-predicted ligand binding free energies are presented in Table 7, and strong correlation was observed between ligand binding free energies calculated from both the MM-PBSA and MM-GBSA methods. The MM-PBSA-and MM-GBSA-predicted ligand binding free energies for the three virtual screening hits were either much lower than or comparable with the known inhibitor of cathepsin B (Supplementary Figure S2). interactions with the key amino acids His110 and His111 in cathepsin B's binding site via the imidazole side chain of the histidine amino acid. These require less distance to form hydrogen bonds with ligands, as the polar hydrogen atom of the imidazole ring acts as a hydrogen bond donor, while the basic nitrogen moiety is a hydrogen bond acceptor. The predicted binding modes of the Maybridge lead hits BTB03075, KM02922, and RF02795 are shown in Figure 8. Similarly, to most of the cathepsin B inhibitors, identified hits were all anchored to the cavity by hydrogen bonds between the ligand and Trp221, along with the p-p stacking interaction with His111 and His199. Aromatic rings of compounds BTB03075 ( Figure 8A(a')), KM02922, and RF02795 ( Figure 8B(b') and C(c')) were embedded in the cavity formed by Trp221, Gln23, and His199 residues. Compound RF02795 showed the best results in terms of binding pattern and docking score amongst the 18 compounds in the 3 crystal structures, and its predicted binding affinity exceeded that of the co-crystalized ligands in the cathepsin B protein structure (Table 6). Regarding the binding pattern of compound RF02795, its triazinyl is well accommodated at the binding site, interacting with its amide functional group through hydrogen bonding with the key amino acids His110 and His111, fitting the hydrophobic methoxy phenyl ring in the vicinity of the hydrophobic side chains of the amino acids Trp30, Val176, Leu181, and Trp221, creating the hydrophobic pocket. These lead hits showed π-π stacking interactions and hydrogen bonding within the binding pocket of cathepsin B. The promising compounds all share common pharmacophoric features, such as a sulfur-and nitrogen-rich moiety (pyridine, sulfonyl, thiophenyl, triazinyl, etc.).

Molecular Dynamics Simulations
BTB03075, KM02922, and RF02795 complexes with the cathepsin B protein were selected for molecular dynamics (MD) simulation to determine the long-term stability of the docked complex. An additional purpose underlying the MD studies was to investigate the positional and conformational changes of lead compounds in relation to the active pocket and specific residues, in order provide insight into the binding stability. The stability and

Cathepsin B Inhibition Activity of Virtually Screened Hit Compounds
To validate our mixed-feature 3D pharmacophore modeling and ligand-based virtual screening approach, we tested the cathepsin B inhibitory effects of the test compounds in vitro. The three hit molecules BTB02922, KM02922, and RF02795 were selected after passing our high-throughput screening protocol and showing higher pose stability in MD studies. These drug-like compounds were incubated with human cathepsin B in a concentration-dependent manner. F-F-FMK was used as a positive control and was tested for % inhibition at a concentration of 10 μM only. The compounds-namely, BTB02922, KM02922, and RF02795-were tested at dose-dependent concentrations of 2.5 μM, 5.0 μM, 10.0 μM, 15.0 μM, and 20.0 μM, and exhibited inhibition of the cathepsin B enzyme in the kinetic assay. KM02922 showed the best relative inhibitory activity at a relative inhibition of 44.67, 51.0, and 58% at concentrations of 2.5, 5.0, and 10 μM, respectively ( Figure 10). The second-best compound was BTB02922, which also showed a high percentage relative inhibition, but did not reach the level of compound KM02922. The control inhibitor (F-F-FMK) demonstrated 69.67% relative inhibition of cathepsin B at a concentration of 10 μM.

Cathepsin B Inhibition Activity of Virtually Screened Hit Compounds
To validate our mixed-feature 3D pharmacophore modeling and ligand-based virtual screening approach, we tested the cathepsin B inhibitory effects of the test compounds in vitro. The three hit molecules BTB02922, KM02922, and RF02795 were selected after passing our high-throughput screening protocol and showing higher pose stability in MD studies. These drug-like compounds were incubated with human cathepsin B in a concentration-dependent manner. F-F-FMK was used as a positive control and was tested for % inhibition at a concentration of 10 µM only. The compounds-namely, BTB02922, KM02922, and RF02795-were tested at dose-dependent concentrations of 2.5 µM, 5.0 µM, 10.0 µM, 15.0 µM, and 20.0 µM, and exhibited inhibition of the cathepsin B enzyme in the kinetic assay. KM02922 showed the best relative inhibitory activity at a relative inhibition of 44.67, 51.0, and 58% at concentrations of 2.5, 5.0, and 10 µM, respectively ( Figure 10). The second-best compound was BTB02922, which also showed a high percentage relative inhibition, but did not reach the level of compound KM02922. The control inhibitor (F-F-FMK) demonstrated 69.67% relative inhibition of cathepsin B at a concentration of 10 µM.

Discussion
AD is a neurodegenerative disease that affects neuronal cells, eventually leading to irreversible damage and death of neurons [70,71]. Identification of target-directed inhibitors to combat this complex disease can greatly help in mechanism-based drug development. Aβ plaques and tau hyperphosphorylation are two hallmark features of AD, and various reports have indicated that these events are not mutually exclusive [71][72][73]. The development of inhibitors against the cysteine protease cathepsin B, which participates in the breakdown of APP and is involved in Aβ pathology by acting as a beta-secretase, may be of vital importance. Hook et al. reported that the cysteine protease inhibitor E64d reduces Aβ in the brain and improves memory deficits in AD animal models by inhibiting cathepsin B, but not beta-secretase activity [74]. A neuroprotective role of cathepsin B is also reported in some of the studies by lowering Aβ levels and improving neuronal dysfunction in AD animal models [75][76][77]. Thus, a dual role of cathepsin B as a neurodegenerative and neuroprotective enzyme provides an important therapeutic target for AD. We have previously reported novel molecular scaffolds for Alzheimer's acetylcholinesterase inhibitors using a rational drug design approach [42].
The current study investigated the step-by-step in silico drug screening protocol to identify new small molecules with cathepsin B inhibitory activity using a combination of molecular docking, pharmacophore mapping, SIFts, and molecular dynamics approaches. Firstly, anti-cathepsin-B compounds of both natural and synthetic origins were initially docked to the cathepsin B active site using AutoDock v4.2 ( Table 1). The molecular docking technique is a fast method to analyze the binding affinity of the protein-ligand interactions and has been implicated in the discovery of several drug molecules [78,79]. A ligand-based virtual screening (LBVS) pharmacophore model was developed using the ligandset module in LigandScout 3.1 [45]. This method was useful to identify the structural features of compounds interacting with cathepsin B, and to define the three-dimensional alignment of unique pharmacophoric features [80]. The LBVS method exploits the information of compounds with inhibitory activity in order to retain important chemical and

Discussion
AD is a neurodegenerative disease that affects neuronal cells, eventually leading to irreversible damage and death of neurons [70,71]. Identification of target-directed inhibitors to combat this complex disease can greatly help in mechanism-based drug development. Aβ plaques and tau hyperphosphorylation are two hallmark features of AD, and various reports have indicated that these events are not mutually exclusive [71][72][73]. The development of inhibitors against the cysteine protease cathepsin B, which participates in the breakdown of APP and is involved in Aβ pathology by acting as a beta-secretase, may be of vital importance. Hook et al. reported that the cysteine protease inhibitor E64d reduces Aβ in the brain and improves memory deficits in AD animal models by inhibiting cathepsin B, but not beta-secretase activity [74]. A neuroprotective role of cathepsin B is also reported in some of the studies by lowering Aβ levels and improving neuronal dysfunction in AD animal models [75][76][77]. Thus, a dual role of cathepsin B as a neurodegenerative and neuroprotective enzyme provides an important therapeutic target for AD. We have previously reported novel molecular scaffolds for Alzheimer's acetylcholinesterase inhibitors using a rational drug design approach [42].
The current study investigated the step-by-step in silico drug screening protocol to identify new small molecules with cathepsin B inhibitory activity using a combination of molecular docking, pharmacophore mapping, SIFts, and molecular dynamics approaches. Firstly, anti-cathepsin-B compounds of both natural and synthetic origins were initially docked to the cathepsin B active site using AutoDock v4.2 ( Table 1). The molecular docking technique is a fast method to analyze the binding affinity of the protein-ligand interactions and has been implicated in the discovery of several drug molecules [78,79]. A ligand-based virtual screening (LBVS) pharmacophore model was developed using the ligandset module in LigandScout 3.1 [45]. This method was useful to identify the structural features of compounds interacting with cathepsin B, and to define the three-dimensional alignment of unique pharmacophoric features [80]. The LBVS method exploits the information of compounds with inhibitory activity in order to retain important chemical and physical properties required for the ligand to bind to the protein and elicit biological activity and a pharmacological response [81]. Importantly, all LBVS is based on the common conception that all bioactive compounds interact with the protein (receptor) in a similar manner, leaving ample diversity in the dataset with similar functional groups, which can later be converted to pharmacophoric features that are related to ligand biological activity [82]. Therefore, these pharmacophoric features could underscore the chemical features of cathepsin B interaction. To generate 3D pharmacophore models, 24 potent cathepsin B inhibitors of both natural and synthetic origins were randomly selected as training set compounds, mainly comprising a broad range of small scaffold molecules previously reported as having anti-cathepsin-B activity ( Figure 2, marked with *). The training set represents the following classes: ethylene oxide, benzopyran, nitramide, pyrrolidine, diazonium, triazine, and sulfonyl pyrazol ( Figure 2, Table S1). Ten different hypotheses (Hypo I to Hypo X) were generated using pharmacophore-fit and atom-overlap scoring functions, as shown in Table 2. The refinement of the hypothesis was carried out by adding the WOMBAT database of 741 compounds inactive against cathepsin B. This refinement process allowed us to validate the three pharmacophore hypotheses with the highest goodness of fit scores and enrichment factors. Finally, Hypo III, with eight features (2 HYs, 4 HBAs, and 2 HBDs), was selected because it demonstrated the best features to distinguish between the active and inactive compound libraries described in Table 3. It was noted that all LBVS models share HBA and HBD features, reflecting that the group present in all molecules that acts as H-bond acceptors and donors-expected to form H-bond interactions at the cathepsin B catalytic site with Cys29 and His199 ( Figure 4A)-is crucial for anti-cathepsin-B activity [10].
The greatest challenge of the pharmacophore model is to discriminate the active and inactive compounds and predict the activity of the test/training sets at the same time. Thus, in order to carefully examine the quality of the proposed pharmacophore models, the following validation methods were used: (1) formation of the raining set to confirm that the used ligands are detected; (2) differentiation of the highly active, moderately active, and less active compounds in the test set; and (3) non-active molecules (decoys) and active molecules to validate the predictability of the hypothesis, in order to pick the active from the unknown database. Furthermore, the performance of the 3D pharmacophore model is more accurately measured in larger, rather than smaller, datasets [83]; thus, the validation dataset consisted of 796 (759 decoys and 37 active) molecules. Hypo III correctly estimated the highly active (IC50 ≤ 0.3 µM, +++), moderately active (0.3 µM > IC50 < 2.5 µM, ++), and less active (IC50 ≥ 2.5 µM, +) molecules ( Figure 5, Table 4).
After validation of pharmacophore Hypo III, it was employed for rapid screening of the drug-like chemical library obtained from the Maybridge database [84]. As a benchmark of the current study, small molecules were required in order to significantly reduce the database size by both satisfying pharmacophores and detecting the promising hits correctly. Around 1728 drug-like compounds matched to the Hypo III pharmacophore model, thereby yielding 2.85% of the Maybridge screen library. In addition, fast docking of 1728 drug-like molecules using ADt-Vena was employed to predict the binding conformation and classify the ability of these molecules to form a stable complex within the binding cavity of the cathepsin B protein [85]. These hits were also predicted online for blood-brain barrier (BBB) penetration using an online BBB predictor [86] (Figure 3), reduced to 176 drug-like compounds. Moreover, the analysis of the ROC curves also assesses the performance of the pharmacophore hypotheses in separating active from decoy compounds ( Figure 6, Table 6), and could be effectively used to find novel anti-cathepsin-B compounds. To this end, the pharmacophore model Hypo III, was employed to rapidly screen 176 drug-like compounds passing through visual inspection, Lipinski's rule of five, and docking score; a total of 18 hits survived [87,88]. The ligand-binding energies for the protein-ligand complexes of each one of these 18 hits docked to the cathepsin B protein, and were obtained by performing LGA calculation (Table 6) [89].
LGA estimates the binding affinity for flexible ligand-receptor docking, which allows the accommodation of many degrees of freedom. The top docked hits, with binding energy scores < −8.00 kcal mol −1 , were sorted based on binding energy score ( Table 6). The binding poses of these ligands were analyzed for their relative positions and orientations, and the various molecular interactions they formed in the binding site of cathepsin B, through the generation of SIFt profiles, which were utilized as a virtual screening tool to predict protein-ligand interaction contours when studied in the same target protein (Figure 7). Key representations of the interactions in binding pockets were similarly employed by Ritschel et al. for quantifying the similarities of binding site subpockets in order to explain the pharmacological effects of statin (HMG-CoA reductase) inhibitors based on pharmacophore fingerprints [90]. The best docking poses from known cathepsin B inhibitors and virtual screening hits were clustered using similarity matrices (Jaccard-Tanimoto coefficient) between all of the fingerprints ( Figure 7B,C). The dataset used for the generation of SIFt profiles consisted of 79 molecules (61 known cathepsin B inhibitors and 18 virtual screening hits). Figure 7B shows a dendrogram of the less active (Cluster 1) and most active (Cluster 2) clusters obtained via the hierarchical clustering approach [53]. The binding mode of hit molecules presented in Cluster 2, identified through hierarchical clustering, suggested that the bound conformation of virtual screening hits strongly mimics the observed conformation of cathepsin B inhibitors at the protein-binding site. In the specialized case environment, the thiol and imidazole side chains of cathepsin B Cys29 and His199 amino acid residues form an ion pair [91]. The cleavage is then mediated by nucleophilic attack by S-from Cys29 on the carbon atom, followed by proton donation from His199 [91]. The SIFt outcome was promising because, although the SIFt dataset was assigned to the less active side of the cathepsin B inhibitors, ultimately, smaller clusters of active compounds containing lead-like molecules were obtained. These molecules form hydrogen bonds with His110 and His111 and π-stacking interactions with His199 ( Figure 8). The promising lead compounds (BTB03075, KM02922, and RF02795) share common pharmacophoric features, such as a sulfur-and nitrogen-rich moiety (pyridine, sulfonyl, thiophenyl, triazinyl, etc.).
Molecular dynamic simulations are a key component of the drug discovery process to assess the conformational stability of the docked protein-ligand complexes, as well as to analyze the molecular interactions between the ligands and the protein-binding site residues [92,93]. The top three prioritized drug-like molecules, along with the cathepsin B crystal structure (PDB 1D: 1CSB) used for the virtual screening, were subjected to the same simulation parameters (Supplementary Materials). The RMSF and RMSD analyses revealed that, for all four systems, there was minimal fluctuation in the flexibility of the protein backbone atoms, suggesting that binding of all three ligands remained stable, and that ligand binding did not affect protein stability throughout the course of the simulation run ( Figure 9). In rational drug design, MM-PBSA and MM-GBSA are powerful tools for optimizing lead compounds, because they can critically analyze the interactions of ligand-receptor bonds [55]. It has been suggested that electrostatic interactions dominate over non-covalent bonding in the recognition of molecular interactions between drug and target molecules [94]. However, this cannot always be true, as the geometric shape is also an important aspect, which is affected by other forces, such as van der Waals interactions, solvation/desolvation energy, and entropy. MM-PBSA and MM-GBSA predicted the binding free energy components of CA030, BTB03075, KM02922, and RF02795, as shown in Table 7 and Supplementary Figure S2. There are numerous studies where the critical interactions of the ligand-receptor pairs employed MM-PBSA and MM-GBSA approaches in a computing framework [95][96][97][98]. For example, by using biological and computational MM-PBSA free energy evaluation, a series of resveratrol derivatives to provide an explanation for the anti-β-secretase (BACE-1 in neurons involved in Aβ deposition in the brain) activity in BACE-1 and oxytosis inhibition assay [99], while in another study, application of MD and explicit water thermodynamics was used to identify a new class of cathepsin B inhibitors [100].

Conclusions
In this study, we carried out LBVS of 60,538 Maybridge drug-like molecules by screening these compounds through a 3D pharmacophore model generated from novel inhibitors of cathepsin B, of both natural and synthetic origins. The pharmacophore model Hypo III represents a spatial orientation of eight specific molecular features, including two HYs, four HBAs, and two HBDs, which facilitate sufficient interactions for binding to cathepsin B and, consequently, the potential inhibition of its protease function. The most active molecule in the training set fits the pharmacophore model perfectly with the highest scores. We found 1728 compounds with the relevant eight-feature pharmacophore. These 1728 Maybridge compounds were then tested by fast docking using AutoDock-Vina to perform virtual screening and assess their ability to cross the BBB, which reduced the number of drug-like compounds to 176. Docking score and Lipinski's rule of five further lowered the number to 18 compounds. SIFt mapping made it possible to quickly shortlist promising compounds and automate the visualization of putative active binding modes using hierarchical clustering. The active Cluster 2 demonstrates important interactions, including hydrogen bonding and π-π stacking with the key residues in the cathepsin B binding site residue, viz. Gln23, Gly24, Gly27, Asn72, Gly74, His110, Glu122, Met196, Gly198, His199, and Trp221. Enzyme assay was performed on three virtually screened compounds to validate the pharmacophore hypothesis, and KM02922 was found to be a potent cathepsin B inhibitor. However, compounds BTB03075 and RF02795 showed a lower percentage of relative inhibitory activity. Furthermore, the stability of the lead molecules was subjected to 20-ns scale MD simulations in order to satisfy the selected pharmacophore model. MM-PBSA and MM-GBSA calculation are predicted to inhibit the cathepsin B activity, with significant binding energies. We successfully identified new entities-including pyridine, acetamide, and benzohydrazide derivatives-that have not been previously characterized in the published literature as AD cathepsin B inhibitors. These findings may be useful from a medicinal chemistry and combinatorial chemistry perspective to obtain new molecular starting points for the design and optimization of novel cathepsin B inhibitors for AD.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/cells10081946/s1, Figure S1: (A) Active site of cathepsin B, displaying the binding mode of some cathepsin B inhibitors; (B) virtual screening hits belonging to active Cluster 2 compounds displaying π-π interactions; and (C) virtual screening hits belonging to less active Cluster 1, which contains molecules with distinct binding modes. Some representative images of structures are shown; Figure S2: Correlation plot between MM-PBSA-and MM-GBSA-predicted binding free energy of in silico screening hits; Table S1: Ligand dataset used to generate three-dimensional pharmacophores. Physicochemical properties of the ligand dataset. MW: molecular weight; XLogP3-AA: octanol-water partition coefficient; HBD: hydrogen bond donor; HBA: hydrogen bond acceptor; RBC: rotatable bond count; TPSA: topological polar surface area; HAC: heavy atom count; IC50: half-maximal inhibitory concentration (nM); Table S2: In silico absorption, distribution, and toxicity prediction of natural compounds. HIA: human intestinal absorption; IVCCP: in vitro Caco-2 cell permeability; IVMCM: in vitro MDCK cell permeability; LogKp: in vitro skin permeability; IVPPB: in vitro plasma protein binding; BBP: in vivo blood-brain barrier penetration (C.brain/C.blood); Table S3: In silico absorption, distribution, and toxicity prediction of synthetic compounds. HIA: human intestinal absorption; IVCCP: in vitro Caco-2 cell permeability; IVMCM: in vitro MDCK cell permeability; LogKp: in vitro skin permeability; IVPPB: in vitro plasma protein binding; BBP: in vivo blood-brain barrier penetration (C.brain/C.blood); Video S1: Binding mode of virtually screened hit molecules.