Structure-Based Virtual Screening Identifies Multiple Stable Binding Sites at the RecA Domains of SARS-CoV-2 Helicase Enzyme

With the emergence and global spread of the COVID-19 pandemic, the scientific community worldwide has focused on search for new therapeutic strategies against this disease. One such critical approach is targeting proteins such as helicases that regulate most of the SARS-CoV-2 RNA metabolism. The purpose of the current study was to predict a library of phytochemicals derived from diverse plant families with high binding affinity to SARS-CoV-2 helicase (Nsp13) enzyme. High throughput virtual screening of the Medicinal Plant Database for Drug Design (MPD3) database was performed on SARS-CoV-2 helicase using AutoDock Vina. Nilotinib, with a docking value of −9.6 kcal/mol, was chosen as a reference molecule. A compound (PubChem CID: 110143421, ZINC database ID: ZINC257223845, eMolecules: 43290531) was screened as the best binder (binding energy of −10.2 kcal/mol on average) to the enzyme by using repeated docking runs in the screening process. On inspection, the compound was disclosed to show different binding sites of the triangular pockets collectively formed by Rec1A, Rec2A, and 1B domains and a stalk domain at the base. The molecule is often bound to the ATP binding site (referred to as binding site 2) of the helicase enzyme. The compound was further discovered to fulfill drug-likeness and lead-likeness criteria, have good physicochemical and pharmacokinetics properties, and to be non-toxic. Molecular dynamic simulation analysis of the control/lead compound complexes demonstrated the formation of stable complexes with good intermolecular binding affinity. Lastly, affirmation of the docking simulation studies was accomplished by estimating the binding free energy by MMPB/GBSA technique. Taken together, these findings present further in silco investigation of plant-derived lead compounds to effectively address COVID-19.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has drastically affected almost 218 countries while imposing a severe health and economic burden [1]. A novel coronavirus (nCoV, SARS-CoV-2) is reported to be the causative agent of this infectious disease with the mode of transmission of COVID-19 being found to be through nasopharyngeal discharge from the nose, including droplets of saliva expelled during sneezing or coughing by an infected person [2,3]. Despite the non-specificity of the symptoms and asymptomatic condition of the disease, a range of prevailing acute symptoms such as dry cough, loss of smell part of Pocket 26, with pocket 25 being another potential target of a helicase inhibitor, Darunavir with antiviral activity. Likewise, a number of other plant derived natural compounds were identified as helicase inhibitors in vitro, particularly flavonoids such as xanthones, rutin, triptexanthoside D, phyllaemblinol and quercetagetin [10]. Other effective inhibitors of SARS-CoV helicases including myricetin, scutellerein, eubananin, bananin, vanillinbananin, and iodobananin are also reported. These compounds work by blocking the ATPase activity rather than through the unwinding activity [28,29]. Besides natural products with inhibitory activity against SARS helicase enzyme, synthetic chemical compounds are also reported and these include; 7-ethyl-8-mercapto-3-methyl-3,7-dihydro-1H-purine-2,6-dione, SSYA10-001, a 1,2,4 triazole, and (E)-3-(furan-2-yl)-N-(4-sulfamoylphenyl)acrylamide [30][31][32].
Many other FDA-approved antiviral drugs have also shown promising inhibitory activity against helicases. The drugs that are so far predicted to be repurposed for the treatment of COVID-19 include anticoagulants (dabigatran), antifungals (itraconazole), anti-bacterials (lymecycline, cefsulodine and rolitetracycline), diuretics (canrenoic) and anti-HIV-1 drugs (saquinavir) [8,10,33]. Computer-aided drug discovery, design and development of small molecules against viral protein targets, therefore, offers a fast-paced, cost-effective approach [34]. Among the viral protein targets required for the design of small-molecule agents with inhibitory activity, SARS-CoV-2 helicase (nsp13) is of particular interest due to its highly conserved genomic sequences across coronaviruses, besides its unique function, and characteristic active site [35]. Given all this, using different applications of computational drug design herein we virtually screened Medicinal Plant Database for Drug Designing (MPD3) database [36] against SARS-CoV-2 helicase to identify new phytochemicals with improved binding, pharmacokinetics, non-toxicity and easily available for experimentalists for in vitro and in vivo testing.

Materials and Methods
A summary of the methodology flow used in this study for the identification of hit and stable molecules against SARS-CoV-2 helicase is presented in Figure 1.

Preparation of the SARS-CoV-2 Helicase Structure
The study commenced with the retrieval of SARS-CoV-2 helicase enzyme crystal structure (PDB ID, 6ZSL) available at a good resolution of 1.94 Å. Immediately, the structure was treated in UCSF Chimera, alpha version 1.15 [37] minimization phase where its geometry was optimized, loops and side chains were fixed and hydrogen atoms were added. All co-crystalized ligands were deleted and the structure energy was minimized via two-step process to remove high energies. It was noticed that minimization of 500 steps of steepest descent steps and 500 conjugate gradient steps at step size of 0.02 Å are enough to get high stereo-chemical quality of the enzyme close to the native structure.

Phytochemicals Library Preparation
For virtual screening, MPD3 (https://www.bioinformation.info/index.html) accessed on 10 September 2020 was used [36]. This database is freely available, downloadable and contains information pertaining to phytochemicals, their structures and activities and test targets. Currently, the MPD3 contains 12,281 phytochemicals which are grouped into several categories, i.e., aromatics, alkaloids, steroids, saponins, flavonoids, etc. The complete library was downloaded and imported to the PyRx virtual screening package 0.8 [38] where all compounds were energy minimized and converted to pdbqt format. Nilotinib, which is a tyrosine kinase inhibitor, was used as a control molecule. This molecule has been demonstrated to inhibit SARS-CoV-2 in vitro [39] and interacts with SARS-CoV-2 helicase enzyme [19].

Binding Conformational Analysis
AutoDock 4.2 [40] was utilized to dock the control inhibitor (nilotinib) as well as library of phytochemicals from Nsp13 helicase enzyme protein towards the whole protein surface. The grid box was centered at x: −13.62, y: 26.04 and z: −70.09 coordinates, with the dimensions of the grid points set to 69.75 × 86.68 × 68.21. The grid spacing for this enzyme was adjusted to 0.375 Å. The Lamarckian genetic algorithm (LGA) was then used for the molecular docking with its specified parameters set to default as follows; initial population size; 150 individuals, maximum number of generations: 27,000, maximum number of energy evaluations: 2,500,000, with 0.02 gene mutation rate, cross over rate of 0.8 with number of runs set to 100 GA. The root mean square deviation (RMSD) [41] having a threshold value of 2.0 was used for binding conformational studies, with the lowest inhibition constant values and the lowest binding energy considered as the most favorable binding conformation. UCSF Chimera, alpha version 1.15 [37], Discovery Studio Visualizer [42] and Molecular Operating Environment (MOE) [43] programs were used to analyze the conformational binding and molecular basis of interactions between the enzyme and ligands. Drug-likeness, pharmacokinetics, and toxicity profiles of hits were then unraveled through SwissADME [44] and PreADMET [45].

Molecular Dynamics (MD) Simulation
To understand and assess the chemical, biological, physical, as well as structural stability, it was crucial to analyze the conformational behavior of the screened ligand and its complexes with the SARS-CoV-2 helicase enzyme [46]. The AMBER18 program [47] with the general AMBER force field [48] for ligands preparation and the ff14SB force field [49] for enzyme preparation were used during the molecular simulation to evaluate the dynamic and structural profiles of ligands docked into the binding sites of the target protein of interest. After initial preparation, each system was subjected to 500 steps of steepest descent and conjugate gradient minimization steps. The immersion of top-selected docking complexes was performed in TIP3P water box (the spacing between the edge box and complex was adjusted at 12.0 Å. Counterion treatment was done for system neutralization. The NVT ensemble was run for 20 ps to heat the system to a target temperature set to 310 K. Consequently, NPT ensemble was applied to the system for approximately 40 ns to equilibrate the system, followed by 50 ns of production simulation. The pressure was maintained at an average of 1 atm using isotropic position scaling. Temperature controlled was accomplished via Langevin dynamics allowing the collision frequency of 1 ps −1 [50]. For non-bonded interactions, a cutoff of 8 Å was used, while for long range electrostatic interaction, Particle Mesh Ewald (PME) method was employed [51]. The hydrogen bonds were constrained by SHAKE method [52]. Lastly, the generated MD trajectories were analyzed through CPPTRAJ [53] and Visual Molecular Dynamics (VMD) v.1.9.3 [54].

Free Binding Energy Calculations via MMPB/GBSA
The binding free energy calculations were performed using a force field-based approach through a MMPB/GBSA method [55]. This is used to calculate the difference in binding free energy resulting from the interactions between the ligands (small molecules), protein (macromolecular target) and the solution complex free energies [56]. These intermolecular activities between the small molecules and their ability to bind to the target protein is mathematically equated as follows: Where the symbols 'L' and 'P' represent the ligand and target protein and the complex is represented by 'LP'. In principal, this in silico approach provides useful details on the assessment of the free energy of this reaction as represented by ∆G bind . Thereby, predicting the binding affinity of any drug without the need to experimentally synthesize it first. The following equation is computed for the calculation of free binding energy: The mathematical relationship between the free energy associated with the ligand, proteins and their complexes, with their decomposition state into the gaseous phase, MM energy, including the nonpolar and polar solvent and entropy are represented by the following formula: ∆G = ∆E MM + ∆G solv − T·∆S = ∆E BAT + ∆E vdW + ∆E coul + ∆G solv,p + ∆G solv,np − T·∆S The sum of bond, torsion and angle terms in the force field are collectively denoted by E BAT , and E MM , whereas E vdW , and E coul represent the van der Waals term and Coulombic term, respectively. The generalized-Born (GB) approximation is used for the estimation of the solvation free energy, where G solv,np denotes the nonpolar solvation free energy, which is a linear function of a computational interface; solvent-accessible surface area (SASA). Then, the VSGB 2.0 solvation model/ MMPB/GBSA energy model was used to calculate the binding energies of ligand-protein complexes, neglecting the entropy term [57]. The net MMPB/GBSA energy associated with each screened compound which is estimated through the 100-trajectory frames collected per simulation run.

Comparative Binding Sites and Conformational Analysis
The top ranked compounds and controls were examined for their natural tendency of binding to the SARS-CoV

Comparative Binding Sites and Conformational Analysis
The top ranked compounds and controls were examined for their natural tendency of binding to the SARS-CoV

Comparative Chemical Interactions Analysis
Next, molecular-level interactions involved in binding the compound/control at different sites of the SARS-CoV-2 helicase enzyme were investigated to decipher the key chemical forces crucial for intermolecular binding and stability of complexes. The control Nilotinib at the ATP binding site is reported to form strong hydrogen bonds, in particular with the enzyme H9 helix residues (Gly287, Lys288, and Ser289) at Rec1A domain via its (trifluoromethyl)benzene. The rest of the compound structure stabilization is provided by medium and long range van der Waals and other hydrophobic interactions (Figure 4). The

Comparative Chemical Interactions Analysis
Next, molecular-level interactions involved in binding the compound/control at different sites of the SARS-CoV-2 helicase enzyme were investigated to decipher the key chemical forces crucial for intermolecular binding and stability of complexes. The control Nilotinib at the ATP binding site is reported to form strong hydrogen bonds, in particular with the enzyme H9 helix residues (Gly287, Lys288, and Ser289) at Rec1A domain via its (trifluoromethyl)benzene. The rest of the compound structure stabilization is provided by medium and long range van der Waals and other hydrophobic interactions (Figure 4). The lowest binding energy conformation of the compound at site 1 is anchored at the H14-H15 helix Rec1A domain loop, with further chemical stabilization by dual hydrogen bonds with Asn557 of Rec2A via 1,2,4-triazolidine ring ( Figure 5A). The predominant ATP binding site (binding stie 2) of the hit compound involved mainly van der Waals bonding and alky interactions at the binding site of Rec1A and Rec2A domains throughout the length of the compound ( Figure 5B). At binding site 3, the conformation of the compound produces two hydrogen bonds through its 1,2,4-triazolidine ring with Glu142 stalk H5 helix, the acetophenone is attached to Rec1A domain through a single hydrogen bond with Asn361, and 3,3,5,5,8-pentamethyl-2,3,4,4a,5,10b-hexahydropyrano[3,2-c]chromene also formed a hydrogen bond with Arg339 Rec1A H11-H12 loop. The remaining compound structure established multiple van der Waals, sigma and alkyl interactions with Rec1A, stalk and 1B domains ( Figure 5C). In the least determined binding conformation (binding site 4), 1,2,4-triazolidine ring again engaged Glu142 of the stalk H5 helix in hydrogen bonding while the remaining chemical moieties are hydropophically attached with Rec1A, stalk and 1B domains ( Figure 5D).  Figure 5C). In the least determined binding conformation (binding site 4), 1,2,4-triazolidine ring again engaged Glu142 of the stalk H5 helix in hydrogen bonding while the remaining chemical moieties are hydropophically attached with Rec1A, stalk and 1B domains ( Figure 5D).

SwissADME Analysis
SwissADME is an online server used to compute different physicochemical descriptors along with predictions of drug-like nature, ADME parameters, medicinal chemistry friendliness and pharmacokinetic properties to assist drug discovery. Detail results of each term for the hit molecule described above are listed in Table 1. The oral bioavailability radar of the compound is presented in Figure 6. Physicochemically, the compound properties are within the range of drug-likeness and do not violate any of the Lipinski rule parameters. Topological polar surface area (TPSA), which is the surface sum of all polar atoms, is commonly used metrics to optimize drug capacity to penetrate the blood barrier [58]. Moreover, the compound has good lipophilicity thus maximizing its transportation and reaching to the target site [59]. Additionally, the compound has good gastrointestinal absorption and does not inhibit the majority of the cytochrome P450 isoforms that are significant in drug elimination through the process of metabolic biotransformation. The compound was also demonstrated to fulfill all requirements of the prominent Lipinski [60], Veber [58], Egan [61] and Muegge [62] druggability rules. The bioavailability score of the compound is 0.55. This predicts the compound probability to be at least 10% bioavailable. From a synthetic chemistry perspective, the compound synthesis is easy. The molecule also predicted not to contain Pan-assay interference compounds (PAINS) alerts and will not interact non-specifically with multiple biological targets but rather react with one specific desired target [63]. More importantly, the compound is non-toxic.

SwissADME Analysis
SwissADME is an online server used to compute different physicochemical descriptors along with predictions of drug-like nature, ADME parameters, medicinal chemistry friendliness and pharmacokinetic properties to assist drug discovery. Detail results of each term for the hit molecule described above are listed in Table 1. The oral bioavailability radar of the compound is presented in Figure 6. Physicochemically, the compound properties are within the range of drug-likeness and do not violate any of the Lipinski rule parameters. Topological polar surface area (TPSA), which is the surface sum of all polar atoms, is commonly used metrics to optimize drug capacity to penetrate the blood barrier [58]. Moreover, the compound has good lipophilicity thus maximizing its transportation and reaching to the target site [59]. Additionally, the compound has good gastrointestinal absorption and does not inhibit the majority of the cytochrome P450 isoforms that are significant in drug elimination through the process of metabolic biotransformation. The compound was also demonstrated to fulfill all requirements of the prominent Lipinski [60], Veber [58], Egan [61] and Muegge [62] druggability rules. The bioavailability score of the compound is 0.55. This predicts the compound probability to be at least 10% bioavailable. From a synthetic chemistry perspective, the compound synthesis is easy. The molecule also predicted not to contain Pan-assay interference compounds (PAINS) alerts and will not interact non-specifically with multiple biological targets but rather react with one specific desired target [63]. More importantly, the compound is non-toxic.

MD Simulation of the Docked Models for Structural Stability Analysis
With the docked model having the highest stability profile, MD simulation was conducted with a run-time of 50 ns. Then, using root-mean-square deviation (RMSD) of the SARS-CoV-2 helicase and control/compound as shown in Figure 7A,B, the structural stability analysis were performed on the docked models. The mean RMSDs and standard deviations of the enzyme structure in all complexes are as; control (2.86 Å ± 0.33), binding site 1 (3.84 Å ± 0.66), binding site 2 (3.07 Å ± 0.53), binding site 3 (2.52 Å ± 0.31) and binding site 4 (3.26 Å ± 0.52). Furthermore, ligands mean RMSDs and standard deviations values in these complexes are; control (1.04 Å ± 0.19), binding site 1 (0.99 Å ± 0.15), binding site 2 (1.19 Å ± 0.33), binding site 3 (0.37 Å ± 0.08) and binding site 4 (2.34 Å ± 0.17). The conformations derived from the VMD analysis revealed the inhibitors were constantly attached to the binding sites of target proteins in the complex. Furthermore, any changes of residues as well as the similar patterns with fluctuations within complexes were identified using root mean square fluctuations (RMSF) ( Figure 7C). RMSD analysis indicated that the binding site 2 (ATP) binding site is more comparable to the control and has the same stability pattern. In contrast, the complex of the enzyme and compound at binding site 3 demonstrated high residual flexibility. The compound binding site at 4 was observed to induce more residual flexibility but still highly within the acceptable range. The highly fluctuating regions revealed the residues Thr228-Val570 present towards the active site with highly flexible loops, as shown in ( Figure 4C). Hence, the stability of the docked models were confirmed by both the RMSD and RMSF analyses. Similarly, the helicase enzyme in all complexes is highly compact and can be concluded to enjoy structural stability in the enzyme presence ( Figure 7D fluctuating regions revealed the residues Thr228-Val570 present towards the active site with highly flexible loops, as shown in ( Figure 4C). Hence, the stability of the docked models were confirmed by both the RMSD and RMSF analyses. Similarly, the helicase enzyme in all complexes is highly compact and can be concluded to enjoy structural stability in the enzyme presence ( Figure 7D). The mean ROG values for the complexes are; control (27.50 Å ± 0.11), binding site 1 (27.57 Å ± 0.15), binding site 2 (27.52 Å ± 0.14), binding site 3 (27.22 Å ± 0.11) and binding site 4 (27.82 Å ± 0.27).

Protein-Inhibitor Stability Involving Hydrogen Bond Interactions
The MD simulations were also performed to study the effect of hydrogen bond interaction by measuring the distances between the hydrogen bond (usually heavy atoms) donors and acceptors [64]. This further provided the number and specific binding patterns between the control/compound and enzyme as shown by the active sites given in Figure 8. The control was inferred to be engaged in a network of strong hydrogen bonds (maximum 3) with close distances from the ATP site throughout the simulation time, in particular getting stronger towards the end. Likewise, the predominant ATP binding site of the compound (site 2) seems to follow the same binding pattern of control and demonstrated favor formation of close distance hydrogen bonding. At the binding site 3, it was observed during the simulation procedure that the number of hydrogen bonding and distances were fluctuating, such alterations, however, did not influence the interaction capacity of the binding compound. The binding site 4 of the compound was found the most unstable in terms of hydrogen bonds and the binding patterns seemed highly fluctuating.

Determining Binding Free Energies
The MD trajectories were utilized to estimate the total and residual binding free energies associated with the interactions between the control/compound and the enzyme. During the trajectory analysis, the main interacting residues involved in inhibitor-bound conformation were determined. From the 100 frames of MD trajectories, all complexes including the control had similar and stable net binding energy values indicating stable binding of the control as well as that of the compound at different binding sites of the SARS-CoV-2 helicase triangular base. Comparatively, in the case of MMGBSA, the control compound complex had better net binding energy value. On the contrary, the filtered high affinity binding at site 1 had the stronger binding, followed by site 4th, 3rd and predominant ATP binding site. In all these complexes, higher contribution was found from the gas phase with significant stability provided by van der Waals energy and sufficient by electrostatic energy. For the stronger binding, the nonpolar energy was also demonstrated to play some role in the complex binding. mum 3) with close distances from the ATP site throughout the simulation time, in particular getting stronger towards the end. Likewise, the predominant ATP binding site of the compound (site 2) seems to follow the same binding pattern of control and demonstrated favor formation of close distance hydrogen bonding. At the binding site 3, it was observed during the simulation procedure that the number of hydrogen bonding and distances were fluctuating, such alterations, however, did not influence the interaction capacity of the binding compound. The binding site 4 of the compound was found the most unstable in terms of hydrogen bonds and the binding patterns seemed highly fluctuating.

Determining Binding Free Energies
The MD trajectories were utilized to estimate the total and residual binding free energies associated with the interactions between the control/compound and the enzyme. During the trajectory analysis, the main interacting residues involved in inhibitor-bound conformation were determined. From the 100 frames of MD trajectories, all complexes including the control had similar and stable net binding energy values indicating stable binding of the control as well as that of the compound at different binding sites of the SARS-CoV-2 helicase triangular base. Comparatively, in the case of MMGBSA, the control compound complex had better net binding energy value. On the contrary, the filtered high affinity binding at site 1 had the stronger binding, followed by site 4th, 3rd and predominant ATP binding site. In all these complexes, higher contribution was found from the gas   The given Table 1 also showed that the average MMPBSA binding energy value (−41.52 ± 5.29 kcal/mol) for control, was found to be lower at site 1 of the compound (−45.78 ± 3.54 kcal/mol). While, the binding free energy for site 2, site 3 and site 4 are −32.18 ± 4.52 kcal/mol, −34.16 ± 4.40 kcal/mol, −39.16 ± 4.04 kcal/mol, respectively. A similar trend of higher van der Waals and less electrostatic contribution as reported in the MMGBSA was also seen in the MMPBSA. Again, the net gas phase energy is dominated by both the control and compound binding to the SARS-CoV-2 helicase enzyme in contrast to the non-favorable role of the solvation energy.

Residue Wise Energy Contribution
To gain further insight into the role of individual residues in the binding pockets of the compound/control, the MMGBSA binding free energy was decomposed per residue. It was observed that the majority of the interacting residues of the control and the screened compound are located within the hydrophobic pocket towards the binding site and have shown moderate interactions with the ligand molecule and hence moderate binding affinities as predicted by their docking analysis. In case of control that binds to the ATP binding site, the strongest residues average free energy were those of Gly518 (−3.41 kcal/mol), Glu355

Conclusions
Due to the alarming increase in transmissibility and infectivity rate of SARS-CoV-2, the development of new antiviral therapies remains a serious and demanding challenge. The SARS-CoV-2 helicase is an integral part of the virus replication machinery, does not show any sequence homology and coverage to the human proteome [65], and its crystal structure has been determined previously through X-ray crystallography. All this make SARS-CoV-2 enzyme an attractive biological target for inhibitory molecules design. Our present in silico study focused on identifying biologically-active phytochemicals that interact exclusively and with high affinity with the selected enzyme. To study the nature of these interactions as well, the insights into vital contributing residues that facilitated binding between the target protein and the control/compound, docked models were generated. The docking runs revealed that the top ranked filtered compounds and controls tend to bind to the ATP binding site of SARS-CoV-2 helicase enzyme. The binding mode of each ligand-protein docked complex was then subjected to an extensive molecular dynamic analysis. We then gathered further computational details to characterize the key residues that contribute towards binding affinity. The parameters such as the binding free energies associated with each residue towards their respective active sites were then estimated. Interestingly, it was found that the binding free energies of the intermolecular hydrogen bonding at the binding pocket showed relatively weaker contributions to the binding. On the contrary, the binding free energy derived from the hydrophobic interactions relatively showed a greater binding strength reflecting stronger interactions between the compounds bound to the helicase enzyme. Overall, during the docking process, the compounds showed a tendency of binding to three different sites of the helicase enzyme: the predominant binding site is the ATP molecule binding site (binding site 2) where both the control drug and hit compounds of this study bind. The binding site 1 is H14-H15 helix, Rec1A domain loop. On this site, the compound bounded with high affinity but were seen in fewer docking runs compared to binding site 2. The 3rd and 4th binding sites between Rec1A and Rec2A, on the other hand, are the least reported sites for compound binding. Interesting, it was inferred that the four sites were crucial in enhancing the binding of the compound to the enzyme without contributing towards the hydrogen bonding. It was further observed that the complexes are quite stable from an energy perspective, and several residues at the docked sites of the enzyme are engaging the compound strongly via van der Waals force and less by hydrogen bonding.