Identification of CDK7 Inhibitors from Natural Sources Using Pharmacoinformatics and Molecular Dynamics Simulations

The cyclin-dependent kinase 7 (CDK7) plays a crucial role in regulating the cell cycle and RNA polymerase-based transcription. Overexpression of this kinase is linked with various cancers in humans due to its dual involvement in cell development. Furthermore, emerging evidence has revealed that inhibiting CDK7 has anti-cancer effects, driving the development of novel and more cost-effective inhibitors with enhanced selectivity for CDK7 over other CDKs. In the present investigation, a pharmacophore-based approach was utilized to identify potential hit compounds against CDK7. The generated pharmacophore models were validated and used as 3D queries to screen 55,578 natural drug-like compounds. The obtained compounds were then subjected to molecular docking and molecular dynamics simulations to predict their binding mode with CDK7. The molecular dynamics simulation trajectories were subsequently used to calculate binding affinity, revealing four hits—ZINC20392430, SN00112175, SN00004718, and SN00262261—having a better binding affinity towards CDK7 than the reference inhibitors (CT7001 and THZ1). The binding mode analysis displayed hydrogen bond interactions with the hinge region residues Met94 and Glu95, DFG motif residue Asp155, ATP-binding site residues Thr96, Asp97, and Gln141, and quintessential residue outside the kinase domain, Cys312 of CDK7. The in silico selectivity of the hits was further checked by docking with CDK2, the close homolog structure of CDK7. Additionally, the detailed pharmacokinetic properties were predicted, revealing that our hits have better properties than established CDK7 inhibitors CT7001 and THZ1. Hence, we argue that proposed hits may be crucial against CDK7-related malignancies.


Introduction
Cancer is one of the leading causes of death in the 21st century and the most critical obstruction for the upsurge of the global lifespan [1]. As a result, the pharmaceutical industry and scientific communities have all focused on reducing the cancer-related death rate, with the expectation of rapid development of effective and safe cancer treatment. Generally, the genetic alteration in signaling pathways that control cell-cycle progression, apoptosis, and cell growth is the common hallmark of the disease progression [2]. Among the reasons mentioned above, the abnormal cell cycle is one of the most protruding features of tumor cells. Anomalous expression of cell-cycle-related proteins provides tumor cells their invasive, metastatic, drug-resistant, and anti-apoptotic properties [3]. The cell cycle is a highly regulated process managed by numerous checkpoints to safeguard the division and proliferation in an ideal way. The central machines that drive the cell-cycle progression are mediated by cyclin-dependent kinases (CDKs) and partner cyclins [4,5]. The CDKs are a family of approximately 20 serine/threonine kinases that regulate the fundamental processes [6]. CDKs are primarily divided into two main groups; the first ones are the cell-cycle linked CDKs (CDK1, 4, and 6) that directly regulate the cell-cycle progression, and the second ones are the transcription-linked CDKs (CDK7, 8, 9, 12, and 13) [7]. CDK7 is a unique member of the CDK family among transcription-associated kinases due to its dual function in cell-division control and transcription [8]. CDK7 forms a dimeric complex with MAT1, which is an element of various chromatin remodeling complexes. Furthermore, the dimeric complex with additional involvement of cyclin H is known as CDK-activating kinase (CAK), which phosphorylates the T-loop of corresponding CDK members (CDKs 1, 2, 4, and 6) to regulate the cell cycle [9,10]. The transcriptional regulation by CDK7 or CAK is performed by phosphorylating the carboxy-terminal domain (CTD) of RNA polymerase II at serine 5 and 7, as well as other transcription factors [11][12][13]. Bartkova et al., have identified the expression of CDK7 in normal and tumor cells for the first time [14]. According to recent studies, CDK7 overexpression is reported in various malignancies such as hepatocellular carcinoma, gastric cancer, oral squamous cell carcinoma, breast cancer, ovarian cancer, high-grade glioma, cholangiocarcinoma, pancreatic cancer, and colorectal cancer with aggressive clinicopathological features and poor prognosis [15][16][17][18][19][20][21][22][23][24][25]. As a result of CDK7's direct involvement in numerous cancers, it has become an attractive target in cancer therapy [26][27][28]. To date, only one ATP-bound human CDK7 crystal structure is known [29]. The structure reveals that the ATP-binding site is located in the cleft between the residues of the N-terminal and C-terminal lobes. Interestingly, CDK7 has a 44% sequence similarity with CDK2 with a reported root mean square deviation of 1.25 Å [29]. The available structural and functional information of CDK7 was exploited previously by researchers to develop inhibitors that can bind to the ATP-binding site of CDK7 [28,[30][31][32]. The literature survey confirms that great progress has been made over the past few years in discovering and developing CDK inhibitors during the last decade. Still, unfortunately, very few inhibitors were reported effective against CDK7 due to adverse effects and low efficacy [30,31]. The CDK7 inhibitors are classified either as reversible or irreversible. The first selective, reversible small-molecule inhibitor identified against CDK7 was BS-181, inhibiting CDK7 with a half-maximal inhibitory concentration (IC 50 ) of 21 nM [33]. Interestingly, BS-181 was derived using computer-aided drug designing from roscovitine. Further studies based on BS-181 led to the identification of ICE0942 (CT7001) which inhibited CDK7 with an IC 50 of 40 nM and is currently under Phase I clinical trials [30,34,35]. The inhibitors mentioned above have demonstrated nanomolar potency against CDK7 while inhibiting numerous other CDK family members, limiting their usage as selective inhibitors [31]. Intriguingly, Nathanael's group identified THZ1, a phenylaminopyrimidine derivative, as the first irreversible ATP-competitive (covalent) inhibitor, inhibiting CDK7 with an IC 50 of 3.2 nM and showing less affinity towards other CDK members [36]. To further improve the potency and selectivity of THZ1, Syros Pharmaceuticals developed SY-1365 which inhibits CDK7 with an IC 50 of 84 nM and is currently under Phase I clinical trials [30]. Apart from small success in inhibiting CDK7, there is a continuous need for research in the design of selective inhibitors due to the dual role of CDK7 in cell-cycle regulation and transcription. A survey of literature reported numerous studies for inhibiting CDK7 considering in vitro or in silico strategies. However, a ligand-and structure-based pharmacophore study has not reported for CDK7 to date. Therefore, the present study was undertaken to explore the molecular mechanism of the selective binding of inhibitors with CDK7 over CDK2. In the current study, ligandand structure-based pharmacophore models were generated to obtain the key features responsible for CDK7 inhibition. The generated pharmacophore models were validated and used as 3D queries to screen a drug-like database of natural compounds. The obtained compounds were subsequently subjected to molecular docking for the identification of the binding mode with CDK7. Furthermore, molecular dynamics simulations were performed on individual complexes to study the stability and binding affinity of the potential inhibitors with CDK7 under physiological conditions. The identified hit compounds were then checked for their in silico selectivity towards CDK7 over CDK2, using molecular docking studies. Finally, the detailed in silico pharmacokinetic properties were predicted for the identified hits, which may be helpful for their optimization or synthesis for further studies.

Ligand-Based Pharmacophore Generation
The ligand-based pharmacophore can be built as either a qualitative (common feature pharmacophore) or quantitative (3D-QSAR) model [37]. Owing to the lesser number of inhibitors reported against CDK7 to date, a reliable 3D-QSAR model cannot be generated. Therefore, in the present study, a ligand-based common feature pharmacophore approach was selected. In the literature, two types of inhibitors were reported for CDK7 inhibition, viz. selective or non-selective [28,30,31]. Rationally, a small set of well-known selective inhibitors was selected as a training set for hypothesis generation [31]. The training set compounds were downloaded from the PubChem database in SDF format and checked manually in Discovery Studio (DS) v18 (www.accelrys.com (accessed on 25 March 2021)) Accelrys Inc. San Diego, CA, USA). The compounds were then energy minimized using the Steepest Descent algorithm with CHARMM force field in DS. Before pharmacophore generation, the Feature Mapping module of the DS was used to identify the most common features of the training set compounds. The pharmacophore model generation was then carried out with the Common Feature Pharmacophore Generation module of DS. This module is based on the Hip-Hop algorithm, which identifies the three-dimensional (3D) spatial arrangements of chemical features common to training set molecules. A maximum of 255 hypothesis conformations were generated using the BEST algorithm with an energy threshold of 20 kcal/mol. Ten pharmacophore models were generated with various parameters such as the rank of the hypothesis, features, direct hit, partial hit, and max fit. During the hypothesis generation, special weightage was given to well-known CDK7 inhibitors-CT7001 and THZ1-by applying Principal and Max Omit feat values 2 and 0, respectively, to ensure that the inhibitor's chemical features are considered in building pharmacophore space [38]. At the same time, other training set compounds were regarded as reasonably active, where all but one feature must map to the compound.

Structure-Based Pharmacophore Generation
To build a reliable structure-based pharmacophore model, a protein's 3D structural complex with a highly active ligand is a prerequisite. Lolli et al., reported the first X-ray crystal structure in 2004 for CDK7 in complex with ATP [29]. Thenceforth, no other ligandbound X-ray crystal structure was reported with CDK7. Interestingly, electron microscopy (EM)-derived CDK7 structure, bound with the highly selective covalent inhibitor, THZ1, was deposited recently in Protein Data Bank (PDB) (PDB ID: 6XD3) [39]. The structure was downloaded and prepared in DS using the Clean Protein module. The unwanted molecules were removed, and the Receptor-Ligand Pharmacophore Generation module was used to generate the pharmacophore model. This module develops selective pharmacophore models based on protein-ligand interactions [40]. The BEST algorithm was opted for the conformation generation with the flexible fitting method, which generates ten hypotheses with different feature sets and selectivity scores. The best hypothesis was selected based on validation parameters and key interacting features with active site residues.

Validation of the Pharmacophore
Validation of the pharmacophore model is an essential step for its selection and evaluation. In the present study, two commonly used validation approaches, mainly, the receiver operating characteristic (ROC) curve and the Güner-Henry (GH) approach, were used [41,42]. The ROC curve analysis was performed during hypothesis generation in both ligand-and structure-based procedures. First, a small dataset was prepared with known active and inactive compounds. The four compounds used for pharmacophore generation were considered as known actives, and the other eight were taken as inactive. The top three hypotheses from each approach were selected and further validated with a second validation technique, the GH or decoy set method. A decoy set of 110 compounds was generated with 6 already known active inhibitors of CDK7 (IC 50 < 100 nm) [30,31] and 104 inactive compounds. The Ligand Pharmacophore Mapping module in DS was used to screen the decoy dataset. The resulting mapping data were used for assessment of the pharmacophore quality by evaluating the following equation: The selected and validated hypotheses from the ligand-and structure-based pharmacophore procedures were exploited as 3D queries to screen four natural compound databases in DS.

Drug-like Database Generation and Virtual Screening
Four natural compound libraries (ZINC, SuperNatural2, ExiMed, and InterBioScreen) were considered in the present study. To retrieve the drug-like small-molecule compounds from these libraries, it is recommended to filter the huge number of compounds during early steps to save time and unnecessary effort at a later stage in drug discovery [43,44]. Therefore, the libraries were filtered at first with Lipinski's rule of five (Ro5) and then for their ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties in DS [45,46]. The validated hypotheses from the ligand-and structure-based approaches were then used to map the compounds using the Ligand Pharmacophore Mapping protocol in DS, with the FAST algorithm for generating the conformations with the flexible fitting method. The obtained compounds were then checked manually aligning with the generated models, and the most appropriate ones were selected for the molecular docking study.

Molecular Docking
Molecular docking is a productive and cost-effective technique in computational drug design to identify and assess molecular interactions between the ligands and receptors [47]. In the present work, the aforementioned drug-like database constructed from natural sources was used for docking with the active site of CDK7 using the Genetic Optimization for Ligand Docking (GOLD v5.2.2) program [48]. The only available inhibitor-bound structure of CDK7 (PDB ID: 6XD3) was used for the molecular docking study. The active site for molecular docking was defined within a 13.10 Å radius around the THZ1, using DS's Define and Edit Binding Site module. The docking sphere's X, Y, and Z coordinates were 116.84, 95.12, and 79.66, respectively. In general, ten docking runs/ligand were performed for consensus generation. For the scoring of ligands, the commonly used scoring function-GoldScore fitnesswas used. The GoldScore fitness function is a molecular mechanics-like function optimized for predicting the ligand-binding site considering hydrogen bonding energy, van der Waals energy, metal interactions, and torsion deformations [49]. Higher GoldScore than reference (REF) inhibitors were used as a cut-off criterion for the selection of potential CDK7 binders. The compounds were further filtered based on their binding mode and vital residual interactions with active site residues reported as necessary for inhibiting CDK7 in the cell.

Molecular Dynamics Simulation
Molecular dynamics (MD) simulations were conducted using the Groningen Machine for Chemical Simulations (GROMACS v5.1.5) to understand better the molecular interaction mechanism of protein and potential hit chemicals under physiological settings in a better way [50]. The simulation parameters and coordinates of CDK7 and potential hit compounds were generated using CHARMM27 force field in GROMACS and SwissParam, respectively [51,52]. The TIP3P water model was used for hydration during the simulation run, and the prepared system was energy minimized with a force of 10 kJ/mol to avoid steric hindrance. The equilibration step was performed under NVT and NPT ensembles for 500 ps at 300 K, using a V-rescale thermostat and Parrinello-Rahman barostat, respectively [53,54]. All the simulations were run under periodic boundary conditions for 50 ns. The MD simulation trajectories were analyzed using DS and visual molecular dynamics (VMD) [55].

Binding Free Energy Calculations
The docking studies used to calculate the binding affinity for the target protein ignores the flexibility of the protein. Despite the advantage of screening large libraries of ligands in a short time, the docking scoring functions often lead to inconsistent results [56]. Therefore, a more reliable method, molecular mechanics Poisson−Boltzmann surface area (MM-PBSA), is generally used to rank the simulated complexes. The combined MD simulations and MM-PBSA calculations can incorporate the conformational fluctuations and entropic contributions to the binding energy [57]. The g_mmpbsa plugin tool in GROMACS was used to calculate the binding free energy from MD simulation trajectories [58]. The precise method used to calculate the binding free energy can be found elsewhere [59]. In general terms, the binding free energy of the protein-ligand complex in the solvent can be calculated as where G complex is the total free energy of the protein-ligand complex whereas G protein and G ligand are the total free energies of the protein and ligand alone in the solvent, respectively. The final ∆G bind value for the CDK7-ligand complex was computed as the average value from the last 40 to 50 ns of the MD simulation trajectories.

In Silico Specificity over CDK2
Designing small-molecule inhibitors with selectivity profiles that will ultimately be successful in the clinic is a huge concern in kinase drug research due to higher similarities among the family members [60,61]. The literature survey reveals that CDK7 shares high similarities with one of its family members, CDK2 [29]. Therefore, to select a specific inhibitor for CDK7, we performed molecular docking of the selected hits from MD simulation analysis with CDK2. The crystal structure of CDK2 in complex with CT7001 was obtained from PDB (PDB ID: 5JQ5) [62]. As mentioned earlier, the structure was prepared in DS, and the GOLD program was used for molecular docking with similar docking parameters. The results were analyzed based on docking scores and fundamental residual interactions.

In Silico Prediction of Pharmacokinetic Properties
Predicting the pharmacokinetic properties (PK) using in silico tools is a common step in drug discovery to identify novel inhibitors [38,63,64]. The PK properties, including subcategories in absorption, distribution, metabolism, excretion, and toxicity of a particular compound, were considered. The detailed in silico prediction of the PK properties might be helpful for further optimization of the selected hit as a successful leader. Therefore, in the present study, PK properties were predicted using an online webserver, pkCSM [65]. The selected hits were converted to their SMILES format in BIOVIA Draw and used as input for assessing their properties (http://biosig.unimelb.edu.au/pkcsm/ (accessed on

Results
The identification of prospective and specific inhibitors of CDK7 was conducted in the present study using several computational methodologies. A general overview of the study is depicted (Figure 1).
Predicting the pharmacokinetic properties (PK) using in silico tools is a common step in drug discovery to identify novel inhibitors [38,63,64]. The PK properties, including subcategories in absorption, distribution, metabolism, excretion, and toxicity of a particular compound, were considered. The detailed in silico prediction of the PK properties might be helpful for further optimization of the selected hit as a successful leader. Therefore, in the present study, PK properties were predicted using an online webserver, pkCSM [65]. The selected hits were converted to their SMILES format in BIOVIA Draw and used as input for assessing their properties (http://biosig.unimelb.edu.au/pkcsm/ (accessed on 30 May 2021)). The output results were analyzed according to the threshold values and compared with REF inhibitors CT7001 and THZ1 used in the study.

Results
The identification of prospective and specific inhibitors of CDK7 was conducted in the present study using several computational methodologies. A general overview of the study is depicted (Figure 1).

Ligand-Based Pharmacophore Generation
A qualitative common feature pharmacophore model was generated using a small group of selective inhibitors known against CDK7 ( Figure 2).

Ligand-Based Pharmacophore Generation
A qualitative common feature pharmacophore model was generated using a small group of selective inhibitors known against CDK7 ( Figure 2).
The Feature Mapping module in DS was used to select the most common features present in the training set compounds, which revealed a higher number of hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), ring aromatic (RA), hydrophobic (HYP), and positive ionizable (PI) features. The obtained information was then used as input for the pharmacophore generation. A total of ten hypotheses (Hypo) were generated with a narrow range of ranking scores (67.71 to 69.73), indicating that all ten models' features were spatially ordered. A deeper insight revealed that a minimum of 3-5 HBD features was present in all the hypotheses. A detailed overview of the generated hypothesis is given (Table 1). The Feature Mapping module in DS was used to select the most common features present in the training set compounds, which revealed a higher number of hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), ring aromatic (RA), hydrophobic (HYP), and positive ionizable (PI) features. The obtained information was then used as input for the pharmacophore generation. A total of ten hypotheses (Hypo) were generated with a narrow range of ranking scores (67.71 to 69.73), indicating that all ten models' features were spatially ordered. A deeper insight revealed that a minimum of 3-5 HBD features was present in all the hypotheses. A detailed overview of the generated hypothesis is given (Table 1). As mentioned in the Section 2, a validation step with ROC was performed simultaneously during the generation of the hypothesis. The ROC results revealed that Hypo1, Hypo2, and Hypo7 showed the highest specificity score of 0.75, 0.87, and 0.87, respectively (Table S1). The specificity score is the fraction of the genuinely inactive compounds being   As mentioned in the Section 2, a validation step with ROC was performed simultaneously during the generation of the hypothesis. The ROC results revealed that Hypo1, Hypo2, and Hypo7 showed the highest specificity score of 0.75, 0.87, and 0.87, respectively (Table S1). The specificity score is the fraction of the genuinely inactive compounds being correctly recognized, and its value ranges from zero to one [66]. Therefore, the hypotheses mentioned above were selected for further analysis based on ROC analysis ( Figure S1).

Structure-Based Pharmacophore Generation
The CDK7 structure consists of an N-and C-terminal, which includes the kinase domain of the protein ranges from 12 to 295 amino acids ( Figure 3A,B). The ATP-binding region of the protein primarily contains residues 90 to 170 ( Figure 3C,D).
The Receptor-Ligand Pharmacophore Generation protocol generated a total of ten hypotheses with five to six features. A minimum of three to four features was hydrophobic in all the hypotheses, and the selectivity score ranged from 8.79 to 10.31. A detailed summary of the hypothesis, such as the number and types of features, and selectivity score, is given ( Table 2). The ROC analysis further revealed Hypo1, Hypo3, and Hypo4 displayed a specificity value of 0.75, 0.75, and 0.87 in terms of identifying inactive compounds; therefore, these hypotheses were initially selected for further study (Table S2 and Figure S1). correctly recognized, and its value ranges from zero to one [66]. Therefore, the hypotheses mentioned above were selected for further analysis based on ROC analysis ( Figure S1).

Structure-Based Pharmacophore Generation
The CDK7 structure consists of an N-and C-terminal, which includes the kinase domain of the protein ranges from 12 to 295 amino acids ( Figure 3A,B). The ATP-binding region of the protein primarily contains residues 90 to 170 ( Figure 3C,D). The Receptor-Ligand Pharmacophore Generation protocol generated a total of ten hypotheses with five to six features. A minimum of three to four features was hydrophobic in all the hypotheses, and the selectivity score ranged from 8.79 to 10.31. A detailed summary of the hypothesis, such as the number and types of features, and selectivity score, is given ( Table 2). The ROC analysis further revealed Hypo1, Hypo3, and Hypo4 displayed a specificity value of 0.75, 0.75, and 0.87 in terms of identifying inactive compounds; therefore, these hypotheses were initially selected for further study (Table S2 and Figure S1).

Pharmacophore Validation
Hypothesis validation is an essential step in pharmacophore modeling to check the performance of the hypothesis generated. The two commonly used approaches, ROC and GH, were used to validate the generated hypotheses [41,42]. Based on ROC, we initially selected the top three hypotheses from both pharmacophore approaches as mentioned in the above sections ( Figure S1). Then, selected hypotheses were further subjected to validation by the GH approach. A test set of 106 inactive and 6 active (IC 50 < 100) compounds were compiled and named as decoy set [30,31]. The mapping analysis revealed that in the ligand-based approach, Hypo7 was found to have the highest GH score of 0.75, followed by Hypo2 and Hypo1. On the other hand, in the structure-based approach, Hypo4 displayed a GH score of 0.83, followed by Hypo3 and Hypo1. A detailed analysis of the GH method for both approaches is shown (Table 3).

Hypothesis Comparison
The validation results confirmed that Hypo7 (Pharm-A) and Hypo4 (Pharm-B) from ligand-and structure-based approaches, respectively, have the potential to differentiate between the active and inactive compounds of a given dataset. Therefore, these hypotheses can be further utilized for the process of virtual screening. Furthermore, the detailed inspection reveals that Pharm-A includes five HBD and one HYP feature ( Figure 4A). On the other hand, Pharm-B has one HBD, two HBA, and three HYP characteristics ( Figure 4C). Thus, it can be seen that the structure-based hypothesis includes more diverse features than the ligand-based hypothesis.
Biomedicines 2021, 9, x FOR PEER REVIEW 10 of 25 4C). Thus, it can be seen that the structure-based hypothesis includes more diverse features than the ligand-based hypothesis. Interestingly, the superimposition of the Pharm-B in the active site reveals that the HBA feature situated outside of the cluster is responsible for interaction with Cys312, which is reported to provide selectivity and a covalent nature for CDK7 inhibitors ( Figure  3C,D). On the contrary, this is mimicked by HBD in the case of Pharm-A. The interfeature distance was also calculated for both hypotheses, displaying the distance between individual features in Å ( Figure 4B,D).

Drug-Like Database and Virtual Screening
To reduce the cost and time of the screening process, we first filtered the compound libraries based on their Lipinski's Ro5 and ADMET properties ( Figure 5 and Table S3), as suggested in previous reports [43,44]. Ro5 states that compounds have drug-like characteristics when the AlogP value ≤ 5, while HBD and HBA numbers are ≤5 and 10, respectively. The molecular weight of the compounds was extended beyond Ro5 ≤ 600 Da to obtain a greater number of compounds for the screening process. In ADMET descriptors, the properties encompassing blood-brain barrier (BBB) permeability (BBBp), solubility, absorption, hepatotoxicity, and CYP2D6 were considered. The compounds were evaluated as drug-like only if they had an absorption level of 0, with solubility and BBBp level of 3. Furthermore, the compounds which predicted false value in CYP2D6 and hepatotoxicity properties were considered. The application of filters resulted in a database of 57,578 drug-like compounds ( Figure 5). Interestingly, the superimposition of the Pharm-B in the active site reveals that the HBA feature situated outside of the cluster is responsible for interaction with Cys312, which is reported to provide selectivity and a covalent nature for CDK7 inhibitors ( Figure 3C,D). On the contrary, this is mimicked by HBD in the case of Pharm-A. The interfeature distance was also calculated for both hypotheses, displaying the distance between individual features in Å ( Figure 4B,D).

Drug-like Database and Virtual Screening
To reduce the cost and time of the screening process, we first filtered the compound libraries based on their Lipinski's Ro5 and ADMET properties ( Figure 5 and Table S3), as suggested in previous reports [43,44]. Ro5 states that compounds have drug-like characteristics when the AlogP value ≤ 5, while HBD and HBA numbers are ≤5 and 10, respectively. The molecular weight of the compounds was extended beyond Ro5 ≤ 600 Da to obtain a greater number of compounds for the screening process. In ADMET descriptors, the properties encompassing blood-brain barrier (BBB) permeability (BBBp), solubility, absorption, hepatotoxicity, and CYP2D6 were considered. The compounds were evaluated as drug-like only if they had an absorption level of 0, with solubility and BBBp level of 3. Furthermore, the compounds which predicted false value in CYP2D6 and hepatotoxicity properties were considered. The application of filters resulted in a database of 57,578 drug-like compounds ( Figure 5). Selected pharmacophore models were then used to screen the database, and the analysis revealed that Pharm-A and Pharm-B mapped a total of 219 and 48 compounds, respectively ( Figure 5). Additional manual inspection of the obtained 267 compounds resulted in 195 compounds that mapped correctly on the hypothesis. The selected compounds were then subjected to molecular docking with CDK7.

Molecular Docking
The THZ1-bound structure (PDB ID: 6XD3) revealed that it could target both the ATP-binding site as well as the site outside the kinase domain ( Figure S2). A deeper inspection of the active site revealed hydrophobic interactions with residues Gly21, Lys41, Phe91, Leu144, Ala154, and hydrogen bonding with residues Met94, Glu95, Asp155, and Cys312 are essential for THZ1 interactions with CDK7. Before the docking experiment, the validation of the docking program GOLD was performed by re-docking the bound inhibitor THZ1. The results displayed a root mean square deviation < 2 Å between the docking pose and co-crystallized THZ1 pose, confirming that GOLD is suitable for our docking study ( Figure S3). The REF inhibitor, CT7001, an ATP-competitive CDK7 inhibitor, displayed a GoldScore of 56.48, and THZ1, the first covalent CDK7 inhibitor, displayed a GoldScore of 55.80 (Tables S4 and S5). During the docking experiment, the compounds with better scores than both the REF inhibitors were selected initially. Finally, the compounds were filtered based on fundamental molecular interactions with the active site residues mentioned above. Our analysis revealed that the docked compounds obtained from Pharm-A followed a non-competitive mode for binding similar to CT7001. In contrast, compounds obtained from Pharm-B showed a competitive binding mode similar to THZ1. The GoldScores and SMILES IDs of the selected compounds are shown (Tables  S4 and S5).

Molecular Dynamics Simulations
The docking results were further validated using MD simulations. A total of 24 CDK7 bound inhibitor complexes were simulated individually for 50 ns in the present study. For Selected pharmacophore models were then used to screen the database, and the analysis revealed that Pharm-A and Pharm-B mapped a total of 219 and 48 compounds, respectively ( Figure 5). Additional manual inspection of the obtained 267 compounds resulted in 195 compounds that mapped correctly on the hypothesis. The selected compounds were then subjected to molecular docking with CDK7.

Molecular Docking
The THZ1-bound structure (PDB ID: 6XD3) revealed that it could target both the ATP-binding site as well as the site outside the kinase domain ( Figure S2). A deeper inspection of the active site revealed hydrophobic interactions with residues Gly21, Lys41, Phe91, Leu144, Ala154, and hydrogen bonding with residues Met94, Glu95, Asp155, and Cys312 are essential for THZ1 interactions with CDK7. Before the docking experiment, the validation of the docking program GOLD was performed by re-docking the bound inhibitor THZ1. The results displayed a root mean square deviation < 2 Å between the docking pose and co-crystallized THZ1 pose, confirming that GOLD is suitable for our docking study ( Figure S3). The REF inhibitor, CT7001, an ATP-competitive CDK7 inhibitor, displayed a GoldScore of 56.48, and THZ1, the first covalent CDK7 inhibitor, displayed a GoldScore of 55.80 (Tables S4 and S5). During the docking experiment, the compounds with better scores than both the REF inhibitors were selected initially. Finally, the compounds were filtered based on fundamental molecular interactions with the active site residues mentioned above. Our analysis revealed that the docked compounds obtained from Pharm-A followed a non-competitive mode for binding similar to CT7001. In contrast, compounds obtained from Pharm-B showed a competitive binding mode similar to THZ1. The GoldScores and SMILES IDs of the selected compounds are shown (Tables S4 and S5).

Molecular Dynamics Simulations
The docking results were further validated using MD simulations. A total of 24 CDK7 bound inhibitor complexes were simulated individually for 50 ns in the present study. For comparative analysis, REF inhibitors (CT7001 and THZ1) were also simulated under similar conditions. The stability of the simulated complexes during the MD run was scrutinized by analyzing the root mean square deviation (RMSD) and root mean square fluctuation (RMSF) plots. The binding affinity of the compounds towards CDK7 was calculated through MM-PBSA. Compounds and inhibitors were ranked according to the binding free energy (Tables S4 and S5). The compounds which failed to show stability during the simulation and desirable binding free energy (∆G) values were removed from the analysis. Lastly, based on the stable binding mode, four molecules were selected and considered as hits against CDK7 (Table S6). It is noteworthy that Hit1 and Hit2 are obtained from the ligand-based approach, while Hit3 and Hit4 are from the structure-based approach.

Root Mean Square Deviation and Fluctuations
The residual deviations and fluctuations were determined using backbone RMSD and RMSF analyses [67]. The backbone RMSD for CT7001, Hit1, and Hit2 demonstrated that simulated systems displayed steady-state stability after 10 ns of run time ( Figure 6A). A slight bump was observed for CT7001 near 20 ns. Hit1 showed an average RMSD value of 0.27 nm, whereas a similar average value of 0.21 nm was observed for Hit2 and CT7001 (Table S6). Although the RMSD for Hit3, Hit4, and THZ1 showed stable RMSD after 5 ns, only a tiny bump was observed in the case of Hit3 near 30 ns ( Figure 6C). The average RMSD value for Hit3 was 0.24 nm, whereas Hit4 and THZ1 displayed a similar value of 0.22 nm (Table S5). The RMSD values of all the simulated compounds declined after 40 ns and remained constant until the endpoint ( Figure 6A,C). The RMSF is another essential parameter for the identification of the rigid and flexible region of the protein. It can be used to assess the flexibility of the backbone elements of the protein structure. The backbone RMSF was measured for all four hits and REF inhibitors (Figure 6B,D). The average RMSF value for Hit1, Hit2, Hit3, and Hit4 was 0.11, 0.09, 0.10, and 0.09 nm, respectively (Table S6). The REF inhibitors also showed similar average RMSF values of 0.09 and 0.11 for THZ1 and CT7001, respectively (Table S6). Significant fluctuations were observed for Hit1 at residues 15 and 167, Hit2 at residue 303, and CT7001 at residue 312 ( Figure 6B). The observed residues are not part of the ATP-binding pocket; except these, no significant fluctuations were observed. Broadly, all simulated systems displayed <0.3 nm of RMSD and RMSF values, indicating no substantial deviations and changes, which can influence the structural stability of CDK7.  (Tables S4 and S5). The compounds which failed to show stability during the simulation and desirable binding free energy (ΔG) values were removed from the analysis. Lastly, based on the stable binding mode, four molecules were selected and considered as hits against CDK7 (Table S6). It is noteworthy that Hit1 and Hit2 are obtained from the ligand-based approach, while Hit3 and Hit4 are from the structure-based approach.

Root Mean Square Deviation and Fluctuations
The residual deviations and fluctuations were determined using backbone RMSD and RMSF analyses [67]. The backbone RMSD for CT7001, Hit1, and Hit2 demonstrated that simulated systems displayed steady-state stability after 10 ns of run time ( Figure 6A). A slight bump was observed for CT7001 near 20 ns. Hit1 showed an average RMSD value of 0.27 nm, whereas a similar average value of 0.21 nm was observed for Hit2 and CT7001 (Table S6). Although the RMSD for Hit3, Hit4, and THZ1 showed stable RMSD after 5 ns, only a tiny bump was observed in the case of Hit3 near 30 ns ( Figure 6C). The average RMSD value for Hit3 was 0.24 nm, whereas Hit4 and THZ1 displayed a similar value of 0.22 nm (Table S5). The RMSD values of all the simulated compounds declined after 40 ns and remained constant until the endpoint ( Figure 6A,C). The RMSF is another essential parameter for the identification of the rigid and flexible region of the protein. It can be used to assess the flexibility of the backbone elements of the protein structure. The backbone RMSF was measured for all four hits and REF inhibitors (Figure 6B,D). The average RMSF value for Hit1, Hit2, Hit3, and Hit4 was 0.11, 0.09, 0.10, and 0.09 nm, respectively (Table S6). The REF inhibitors also showed similar average RMSF values of 0.09 and 0.11 for THZ1 and CT7001, respectively (Table S6). Significant fluctuations were observed for Hit1 at residues 15 and 167, Hit2 at residue 303, and CT7001 at residue 312 ( Figure 6B). The observed residues are not part of the ATP-binding pocket; except these, no significant fluctuations were observed. Broadly, all simulated systems displayed <0.3 nm of RMSD and RMSF values, indicating no substantial deviations and changes, which can influence the structural stability of CDK7.

Binding Free Energy Analysis
The MD simulation trajectories were used for the binding free energy (∆G) calculations. A total of 40 snapshots were taken from the last 10 ns of stable trajectories. The REF inhibitors, CT7001 and THZ1, showed an average ∆G value of −90.58 and −91.48 kJ/mol, respectively (Figure 7 and Table 4).

Binding Free Energy Analysis
The MD simulation trajectories were used for the binding free energy (ΔG) calculations. A total of 40 snapshots were taken from the last 10 ns of stable trajectories. The REF inhibitors, CT7001 and THZ1, showed an average ΔG value of −90.58 and −91.48 kJ/mol, respectively (Figure 7 and Table 4).   Table 4). The decomposition showed that van der Waals interaction contributed significantly to the binding of hits with CDK7, followed by electrostatic and SASA energy. In addition, the polar solvation contributed positively to the binding and therefore opposed the complex formation.

Binding Mode Analysis
The binding mode of the selected hits with CDK7 was evaluated using the average structure calculated from the last 10 ns of MD simulation trajectories. All the CDK7/ligand structures were superimposed in DS ( Figure S4). Accordingly, the superimposition revealed that Hit1 and Hit2 occupied the ATP-binding pocket of CDK7, similar to that of   Table 4). The decomposition showed that van der Waals interaction contributed significantly to the binding of hits with CDK7, followed by electrostatic and SASA energy. In addition, the polar solvation contributed positively to the binding and therefore opposed the complex formation.

Specificity of Inhibitors and Hits with CDK7 over CDK2
CDK family members, CDK7 and CDK2, share highly homologous structures and have sequence identity over 44% [29]. Thus, a deeper understanding of the molecular mechanism of CDK7-specific inhibitors over CDK2 may provide essential knowledge in Figure 9. Binding mode and intermolecular interaction of THZ1 (A,D), Hit3 (B,E), and Hit4 (C,F) with CDK7. Upper and lower panel represents the interactions in 3D and 2D, respectively. The protein in the background is shown with grey lines, whereas interacting residues are shown as grey sticks. The dark green dashed lines indicating hydrogen bonds, light green: van der Waals, purple: π-sigma, while the π-π and π-alkyl interactions are shown in pink.

Specificity of Inhibitors and Hits with CDK7 over CDK2
CDK family members, CDK7 and CDK2, share highly homologous structures and have sequence identity over 44% [29]. Thus, a deeper understanding of the molecular mechanism of CDK7-specific inhibitors over CDK2 may provide essential knowledge in the structure-based drug design of isoform-selective inhibitors. To date, only a tiny number of CDK7 inhibitors have been identified with higher selectivity over CDK2 [30,31]. From the computational end, molecular docking can predict the binding mode of the potential hit compounds inside the active site of two or more isoforms. A recent study also used molecular docking methodology to explore the possible mechanism of ligand specificity in CDK family members [68]. To investigate isoform selectivity of hit compounds, we employed a similar approach with CDK2. The crystal structure of CDK2 in complex with CT7001 (PDB ID: 5JQ5) was downloaded from PDB and prepared for molecular docking, utilizing similar parameters as that for CDK7 [62]. Docking results indicated that our identified hits demonstrate lesser docking scores against CDK2 than CDK7 (Table S8). Accordingly, the representative docking pose of hit molecules with CDK2 and CDK7 were also displayed ( Figure S5). It was observed that Hit1 formed a hydrogen bond with Thr14 and an unfavorable acceptor-acceptor bond with the most frequently occurring HBA feature of CDK2, Leu83 ( Figure S5B). Moreover, Hit2 also exhibited unfavorable bonds with Gln131 and Leu134 of CDK2 ( Figure S5C). Furthermore, hit molecules from a structure-based approach establish only hydrophobic and van der Waals interactions with CDK2 residues, whereas no hydrogen bond was observed ( Figure S5E,F). Hydrogen bonds with essential CDK2 residues Ile10, Leu83, Asp86, Lys89, and Asp145 were not observed for our identified hits with CDK2 [68,69]. On the other hand, our hits display interactions with CDK7 hinge residue Met94 through hydrogen bonds as maintained during MD simulation analysis (Figure 8). Additionally, our hits target crucial CDK7 residues, Pro310 and Cys312 (Figure 9), which are not observed in CDK2 ( Figure S6). These residues, along with additional CDK7 residues Val100 and Thr96 are selective for CDK7 [68]. Our docking analysis indicated interactions with the residues mentioned above via van der Waals and hydrophobic π-bonds (Table S7). While the co-crystallized CDK2 inhibitor CT7001 established hydrogen bonds with residues Leu83 and Asp145, our docking results suggested that identified hits could not follow a similar binding pattern ( Figure S5). Therefore, we argue that our identified hits may be selective for CDK7 over CDK2.

In Silico Prediction of Pharmacokinetic Properties
In the present study, PK properties were analyzed using the pkCSM webserver (Table 5). In absorption properties, the water solubility of the hits was predicted as being more soluble than the REF inhibitors, CT7001 and THZ1. According to the literature, a compound that exhibits a value >0.90 may have a high absorption rate in Caco-2 cell lines. The results indicated that CT7001 has higher permeability, whereas THZ1 has a moderate absorption rate. Interestingly, Hit2 and Hit3 also showed high absorption levels, whereas Hit4 may have a reasonable absorption level. Unfortunately, Hit1 failed to cross the Caco-2 cell line. The hits and REF inhibitors displayed an intestinal absorption rate of >30%, which indicated that all might have high intestinal absorption. The skin permeability for the compounds was found below the threshold value >−2.5, which confirmed that all compounds could easily cross the skin barriers. P-glycoprotein I, also known as multi-drug resistance protein 1 (MDR1), functions as a biological barrier by extruding toxins and xenobiotics outside of the cell. P-gp II or MDR2 functions as a phospholipid translocator. Results indicated that all the hits and REF inhibitors are P-gp substrates. The accumulation of these compounds can be reduced in specific tissues. P-gp I/II inhibition results showed that Hit2, Hit3, and THZ1 are predicted to inhibit both the variants studied. Hit1 was not predicted to inhibit P-gp II, whereas CT7001 was not predicted to inhibit P-gp I. Hit4 was not predicted to be a P-gp I/II inhibitor. In the distribution the following properties: volume of distribution (VDss), blood-brain barrier permeability (BBBp), and central nervous system (CNS) permeability (CNSp) were considered. VDss below 0.71 L/kg is considered low, and above 2.81 L/kg is considered high. The higher the VDss, more the drug is distributed in tissues. Here, CT7001 showed higher VDss, followed by Hit1, Hit3, and Hit2. Hit4 and THZ1 showed the least values of VDss. If a compound showed a value of >0.3 for BBBp and >−2 for CNSp, they can cross the BBB and CNS. Interestingly, all the hits and REF inhibitors were not predicted to be permeable for both the parameters; hence, chances of brain-related toxicities are negligible. Cytochrome P450 (CYP450) is a vital detoxification enzyme found in the liver. It oxidizes the xenobiotics to facilitate their excretion. The two main isoforms of cytochrome responsible for drug metabolism are 2D6 and 3A4, which were studied. The results indicate that none of the hits and REF inhibitors were predicted as substrates or inhibitors for the 2D6 isoform of CYP450. Unfortunately, Hit1, Hit3, and both REF inhibitors were predicted as 3A4 substrates and inhibitors. Hit2 and Hit4 were not predicted to be 3A4 inhibitors. Additionally, Hit4 was not predicted as a 3A4 substrate. Clearance can be used to calculate the rate at which drugs must be added to the circulation to maintain the steady-state plasma concentration. The clearance results showed that Hit1, CT7001, and Hit4 might have a higher clearance rate than Hit2, Hit3, and THZ1, which may have an increased half-life. The critical parameters of pharmacokinetic toxicity were also studied. None of the hits and the REF inhibitors were predicted to be mutagenic in AMES toxicity prediction. The maximum tolerated dose and oral rat toxicity properties were also predicted for hits and REF inhibitors, and are reported in our study ( Table 5). The inhibition of human ether-a-go-go-related gene (hERG), which encodes a potassium ion (K) channel with two subtypes, hERG I and hERG II, was also predicted. The hERG I inhibitors may lead to cardiotoxicity-related effects. Hit compounds were not predicted to be hERG I inhibitors. hERG II is known to play a role in insulin secretion; all hits, except Hit4, were predicted to be hERG II inhibitors, and therefore may affect the glucose level. The pkCSM hepatotoxicity results showed that all hits and REF inhibitors might have hepatotoxic effects, but the results predicted using DS showed that hits could not be hepatotoxic. Nevertheless, these effects can be further cleared via in vitro studies. Our results showed that none of the hits and REF inhibitors were involved in skin sensitization allergic reactions. Abbreviation: IA-intestinal absorption, P-gp-P-glycoprotein, VDss-volume of distribution, BBBp-blood-brain barrier permeability, CNSp-central nervous system permeability, TC-total clearance, AMES-Salmonella typhimurium reverse mutation assay, hERG-human ether-a-go-go-related gene.

Discussion
Cyclin-dependent kinase 7 (CDK7) regulates the cell cycle and transcription and, therefore, plays a key role in cancer development and progression [70]. The role of CDK7 has been reported in multiple human cancers and, therefore, is considered a promising therapeutic target [30]. Several investigations have found CDK7 inhibitors so far; however, first-generation inhibitors have shown considerable adverse effects, limiting their usage in clinical trials [71]. Another reason reported for the failure of first-generation inhibitors is the higher level of structural similarity among CDK family members. In the case of CDK7, it shares a 44% sequence identity with its family member, CDK2 [29]. Therefore, many of the CDK7 inhibitors have been reported to target CDK2 [32]. However, only a few inhibitors with high selectivity for CDK7 have been identified in recent years, and they are now in the early stages of clinical studies [31]. Hence, the inhibitors which can selectively inhibit CDK7 are of primary importance. Since CDK7 plays such a critical role in cell proliferation and transcription, there is a constant need for research to find effective inhibitors that can control overexpression and combat emerging cancer resistance. Therefore, we employed a ligand-and structure-based pharmacophore modeling approach complexed with a series of other computational methods as a valuable strategy for targeting CDK7 inhibition ( Figure 1). The combination approach can retrieve more drug-like compounds from databases as both works on different principles [72,73]. To our knowledge, this is the first pharmacophore-based study to search for novel and selective CDK7 inhibitors. In the ligand-based approach, due to the small number of inhibitors known to date against CDK7 [31], a common feature approach was selected where a limited number of inhibitors are required as the training set ( Figure 2) [43]. The only available covalent inhibitor, THZ1bound structure (PDB ID: 6DX3), was used in the structure-based approach (Figure 3) [39]. Both approaches generated a total of ten hypotheses each (Tables 1 and 2). The hypotheses were validated with two well-known methods-ROC and GH approach [41,42]. According to our hypothesis validation outcomes Hypo7 and Hypo4 from the ligand-and structurebased approach, respectively, can filter out known active and inactive compounds (Table 3, Figure 4). As a result, both hypotheses were used for virtual drug-like database screening. The mapping resulted in 267 drug-like compounds showing the potential to interact with CDK7 ( Figure 5). The binding potential of obtained compounds was then studied by a molecular docking study [74,75]. Two well-known selective CDK7 inhibitors, CT7001 and THZ1, were used in the docking studies for comparative analysis. Recently, Wang et al., reviewed the inhibitor design studies of CDK7 and concluded that one or more hydrogen bonds with ATP-binding site residues, Gly21, Phe91, Met94, Asp155, Thr170, and Cys312, located outside the kinase domain are responsible for the inhibition [30]. Therefore, we selected only those compounds that displayed a similar binding mode as CT7001 and THZ1, greater docking scores than both REF inhibitors, and key molecular interactions with residues mentioned above. The representative pose of THZ1 and CT7001 in the docking study displayed a GoldScore value of 55.8 and 56.48, respectively. The molecular docking results further revealed that 13 compounds from the ligand-based approach and 11 from the structure-based approach have a greater docking score than CT7001 and THZ1 (Tables S4  and S5). One of the major limitations of the docking study is that they do not consider realtime dynamics of the protein-ligand interaction [47]. To study the real-time dynamics, we employed MD simulation studies. The selected protein-ligand complexes were prepared and subjected to MD simulations [43,44,59]. For comparative study, the known inhibitors THZ1 and CT7001 were also simulated. The MD study revealed that all the simulated systems showed stable RMSD and RMSF < 0.3 nm ( Figure 6 and Table S6), which are the criteria generally used in the stability assessment of simulated complexes [67,76]. We were further interested in studying the binding affinity of the simulated compounds compared to known inhibitors. For this purpose, a well-known method, MM-PBSA, was employed using the g_mmpbsa tool [58]. The analysis revealed that CT7001 showed average binding free energy of −90.58 kJ/mol, and THZ1 displayed −91.48 kJ/mol (Figure 7, Table 5). The previous reports confirm that the lower the binding free energy values, the higher the affinity of the molecules towards the protein [43,75,76]. A similar relationship was also observed in some combined in silico and in vitro studies [77,78]. Our analysis found that Hit1 displayed significantly better binding free energy, −170.01 kJ/mol compared to REF inhibitors; this was followed by Hit2 −103.17, Hit3 94.66, and Hit4 −90.58. All the hits displayed better binding free energies than CT7001, whereas THZ1 showed slightly better binding affinity than Hit4. Although four other compounds demonstrated better binding energies than THZ1 and CT7001 (Table S5), they were not considered for further analysis because they displayed a slightly different binding mode than REF inhibitors with CDK7. The binding mode of the compounds was assessed using the average structure extracted from the last 10 ns MD simulation trajectories. CT7001 is known to inhibit CDK7 with an IC 50 of 40 nM and CDK2 with an IC 50 of 620 nM via an in vitro kinase assay [35]. The study by Hazel et al., demonstrated that MD simulation of CT7001/CDK7 could form hydrogen bonds with hinge region residue Met94, and G-rich loop residues Gly21, Asp137, and Asp155. Additionally, using the Asp155 mutant, it has been confirmed that interaction with Asp155 is key for the selectivity of the CT7001 towards CDK7 [62]. Our study observed that Hit1 and Hit2 obtained a similar binding mode as CT7001 (Figure 8). Hit1 forms a hydrogen bond with hinge region residue Met94; unfortunately, molecular interaction with Asp155 was not observed. Hit2 was observed to form hydrogen bond interactions with Met94 and Asp155, which enhances the selectivity towards CDK7. Interestingly, Hit2 also forms alkyl interactions outside the kinase domain with Cys312, which is reported to provide a covalent nature to THZ1. The phenylaminopyrimidine derivative THZ1 is reported as the first covalent inhibitor for CDK7 with an IC 50 of 3.2 nM [36]. The docking studies indicated that THZ1 targets the CDK7 kinase domain and outside the domain via interacting with Cys312 through a hydrogen bond. The study also indicated that mutation of Cys312 to Ser312 prevented the THZ1 binding with CDK7 in a covalent fashion. This confirms the role of Cys312 interaction in covalent inhibition [36]. However, there is no MD simulation study reported to date for THZ1 dynamics. Here, we obtained a similar binding mode as reported, but instead of a hydrogen bond, a van der Waals interaction was observed for THZ1 with Cys312 ( Figure 9). This may happen due to the position of Cys312 as it is present at the last position or maybe due to a change in dynamics, as docking studies do not consider the protein flexibility. Our analysis revealed that THZ1 forms a hydrogen bond with hinge region residues Met94, Glu95, Ala154, and Asp155. Hit3 was observed to form hydrogen bonds with Glu95 and Asn141. Unfortunately, a hydrogen bond with Cys312 was not observed in the case of Hit3 but it formed a π-alkyl interaction with the Cys312, suggesting the compound may act as a covalent inhibitor. Hit4 binding mode revealed that it forms a hydrogen bond with Glu95, Asp97, and Cys312 located outside the kinase domain, important for covalent inhibition ( Figure 9). As mentioned earlier, CDK7 has similarities with CDK2; we next investigated for selectivity of the identified hits. For this purpose, a molecular docking study was performed with the CDK2 crystal structure, using similar parameters. The docking results confirmed that the docking scores were less than the co-crystal ligand CT7001 of CDK2 (Table S8). The structural details reveal that residues Lys33, Asp86, Gln131, and Asp145 are responsible for the polar interactions [68]. The hydrogen bond with Leu83 of CDK2 was reported in more than 90% of the CDK2 inhibitor interactions [68]. Our docking results followed the same interactions for the REF inhibitors CT7001 and THZ1. Interestingly, hits were not found to interact with the polar interaction site of CDK2 via hydrogen bond. Hit1 formed one hydrogen bond with Thr14, and Hit2 formed a hydrogen bond with Asn132, and both the residues are not reported to contribute much to inhibitor interactions. Additionally, our hits formed various interactions with CDK7 residues, which are not similar in CDK2. The sequence alignment of CDK7 with CDK2 revealed that ATP-binding site residues (Leu18, Ala24, Thr25, Glu95, Thr96, Val100, and Gln141) and residues outside the kinase domain (Arg309, Pro310, Asn311, and Cys312) are aligned with different CDK2 residues ( Figure S6). Interestingly, our hits were found targeting these residues via various types of interactions (Table S7). This indicates that identified hits might show selectivity for CDK7 (Figures 8 and 9). As part of the final assessment strategy, the detailed PK properties were assessed for REF inhibitors and hits, using the well-known tool pkCSM (Table 5). Results indicated that our hits showed slightly better PK properties than REF inhibitors. Among all, Hit4 was observed to be the best candidate in terms of absorption, distribution, metabolism, and toxicity parameters. Lastly, we provided the scaffolds comprised of the hits, IUPAC names, and database IDs (Table 6). Hit1 (ZINC20392430) and Hit4 (SN00262261) were observed to encompass benzoate scaffolds, while Hit2 (SN00112175) and Hit3 (SN00004718) were perceived to be pentamide and propanamide, respectively.  (Table 5). Results indicated that our hits showed slightly better PK properties than REF inhibitors. Among all, Hit4 was observed to be the best candidate in terms of absorption, distribution, metabolism, and toxicity parameters. Lastly, we provided the scaffolds comprised of the hits, IUPAC names, and database IDs (Table 6). Hit1 (ZINC20392430) and Hit4 (SN00262261) were observed to encompass benzoate scaffolds, while Hit2 (SN00112175) and Hit3 (SN00004718) were perceived to be pentamide and propanamide, respectively.  (Table 5). Results indicated that our hits showed slightly better PK properties than REF inhibitors. Among all, Hit4 was observed to be the best candidate in terms of absorption, distribution, metabolism, and toxicity parameters. Lastly, we provided the scaffolds comprised of the hits, IUPAC names, and database IDs (Table 6). Hit1 (ZINC20392430) and Hit4 (SN00262261) were observed to encompass benzoate scaffolds, while Hit2 (SN00112175) and Hit3 (SN00004718) were perceived to be pentamide and propanamide, respectively.  (Table 5). Results indicated that our hits showed slightly better PK properties than REF inhibitors. Among all, Hit4 was observed to be the best candidate in terms of absorption, distribution, metabolism, and toxicity parameters. Lastly, we provided the scaffolds comprised of the hits, IUPAC names, and database IDs (Table 6). Hit1 (ZINC20392430) and Hit4 (SN00262261) were observed to encompass benzoate scaffolds, while Hit2 (SN00112175) and Hit3 (SN00004718) were perceived to be pentamide and propanamide, respectively. Table 6. The database ID, IUPAC name, and 2D representation of the final identified hits.  (Table 5). Results indicated that our hits showed slightly better PK properties than REF inhibitors. Among all, Hit4 was observed to be the best candidate in terms of absorption, distribution, metabolism, and toxicity parameters. Lastly, we provided the scaffolds comprised of the hits, IUPAC names, and database IDs (Table 6). Hit1 (ZINC20392430) and Hit4 (SN00262261) were observed to encompass benzoate scaffolds, while Hit2 (SN00112175) and Hit3 (SN00004718) were perceived to be pentamide and propanamide, respectively. [(6R,7R)-3-[(E)-3-acetoxyprop-1-enyl]-7-hydroxy-7-methyl-8-oxo-5,6-dihydro-1H-isochromen-6-yl] 2,4-dihydroxy-6-methyl-benzoate

Conclusions
To identify new potential scaffolds against CDK7, two commonly used pharmacophore modeling approaches, ligand and structure-based, were used to generate hypotheses. The generated hypotheses were used for virtual screening of a drug-like database prepared from four natural compound databases. The filtered compounds were then subjected to molecular docking and subsequent molecular dynamics simulations for identification of their binding mode with CDK7. Additional ∆G calculations confirmed that four hits display a better binding affinity for CDK7 when compared with CT7001 and THZ1. Moreover, the selectivity of the identified hits was checked using molecular docking against CDK2, a close homolog of CDK7. Our results confirmed that the identified hits form polar and non-polar interactions with the residues unique to CDK7 (Ala24, Glu95, Thr96, Val100, Gln141, Pro310, Asn311, and Cys312), reported to enhance the selectivity. As a result, we recommend that future drug design studies focus on these residues in order to develop covalent inhibitors of CDK7, in line with earlier reports. Finally, the pharmacokinetic (PK) properties were investigated, and analyses revealed that hits have better PK properties when compared with CT7001 and THZ1. We argue that our identified hits will help to design novel drugs for CDK7.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/biomedicines9091197/s1, Figure S1. The pharmacophore models from both the approaches obtained after ROC validation. Figure S2. Molecular docking site used in the present study. The bound ligand THZ1 was used to define the docking sphere. Figure S3. Validation of docking parameters using co-crystallized ligand THZ1 (grey) and docked pose (black). Figure S4. Binding patterns of the reference inhibitors and the hits in the active site of CDK7. Superimposition (left) of THZ1, CT7001, Hit1, Hit2, Hit3, and Hit4 (left) and its enlarged view (right). The protein is shown as grey color ribbon representation, and the ligands are shown with stick representation. Only polar hydrogen atoms are shown for clear visualization. Figure S5. The 2D molecular interactions of the reference inhibitors and identified hits with CDK2 crystal structure bound with CT7001 (PDB ID: 5JQ5). The dark green dashed lines indicating hydrogen bonds, light green: van der Waals, purple: π-sigma, red: unfavorable acceptor-acceptor, orange: π-cation, while the π-π and π-alkyl interactions are shown in pink. Figure S6. Sequence alignment of CDK7 (PDB ID: 6XD3) and CDK2 (PDB ID: 5JQ5) resulted in 44.4% identity and 67.1% similarity. The red boxes are used to show the differences between the active site and outside the active site residues (309-312) of both the proteins. The degree of identity ranges from dark cyan color (identical) to white color (non-identical). Table S1. Ligand-based pharmacophore validation using receiver operating characteristic (ROC) curve. Table S2. Structure-based pharmacophore validation using receiver operating characteristic (ROC) curve. Table S3. Details of different properties used for generation of drug-like database. Table S4. List of potential compounds obtained after molecular docking. A total of 13 compounds show a better GoldScore than reference (REF) inhibitor. The binding free energies obtained after MD simulation are also shown. Table S5. List of potential compounds obtained after molecular docking. A total of 11 compounds show better GoldScore than REF inhibitor. The binding free energies obtained after MD simulation are also shown. Table S6. Molecular docking and molecular dynamics simulation analysis of hits and reference inhibitors against CDK7. Table S7. Molecular interactions of reference inhibitors and hits with CDK7 active site residues acquired from stable molecular dynamics simulation trajectories. Table S8. The molecular docking scores of hits and reference inhibitors with CDK2 and CDK7.