Discovery of Novel Chinese Medicine Compounds Targeting 3CL Protease by Virtual Screening and Molecular Dynamics Simulation

The transmission and infectivity of COVID-19 have caused a pandemic that has lasted for several years. This is due to the constantly changing variants and subvariants that have evolved rapidly from SARS-CoV-2. To discover drugs with therapeutic potential for COVID-19, we focused on the 3CL protease (3CLpro) of SARS-CoV-2, which has been proven to be an important target for COVID-19 infection. Computational prediction techniques are quick and accurate enough to facilitate the discovery of drugs against the 3CLpro of SARS-CoV-2. In this paper, we used both ligand-based virtual screening and structure-based virtual screening to screen the traditional Chinese medicine small molecules that have the potential to target the 3CLpro of SARS-CoV-2. MD simulations were used to confirm these results for future in vitro testing. MCCS was then used to calculate the normalized free energy of each ligand and the residue energy contribution. As a result, we found ZINC15676170, ZINC09033700, and ZINC12530139 to be the most promising antiviral therapies against the 3CLpro of SARS-CoV-2.


Introduction
The ongoing COVID-19 worldwide pandemic (declared by the WHO) has caused more than 6.58 million deaths and 632.9 million cases worldwide as of 24 October 2022 [1]. Even though the symptoms are relatively mild, the high transmission rate has caused a crisis that has resulted in a variety of supply chain and economic issues as workplaces and ports globally have shut down [2,3]. Treatments to fight the virus are often thwarted by the virus's ability to mutate and evolve. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has already mutated into several different variants such as Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Iota, Theta, and Kappa [4]. The newer variants and subvariants are not only more transmissible but also become more difficult for host antibodies to recognize. This deadly combination combats the long-term immunity that vaccines are trying to achieve.
Therefore, the innovations of less time-consuming antiviral drug development methods such as drug repurposing are needed to keep up with the rates of infection to eventually solve the problem [5]. Drug repurposing is one of the quickest ways to develop effective therapeutics against COVID-19 [6]. For example, in one study, researchers conducted

Virtual Screening of TCM
We performed ligand-based virtual screening of the traditional Chinese medicines database (TCM) to exclude ligands that do not have the potential to bind with 3CL pro within the binding pocket of the five inhibitors listed above. We narrowed down the 177,764 TCM compounds to 17,939 compounds that have similar pharmacophore features to the five inhibitors listed above. Then we conducted structure-based virtual screening for these 17,939 compounds and listed the top 10 ligands with the best docking scores. The suitable ligands found in this study are ZINC15676170, ZINC15675325, ZINC12529667, ZINC13550544, etc. (Table 1). The best docking score, −9.178 kcal/mol, was reported for  ZINC15676170. The candidates shown in Table 2 have high potential against the 3CL pro of SARS-CoV-2. database (TCM) to exclude ligands that do not have the potential to bind with 3CL pro within the binding pocket of the five inhibitors listed above. We narrowed down the 177,764 TCM compounds to 17,939 compounds that have similar pharmacophore features to the five inhibitors listed above. Then we conducted structure-based virtual screening for these 17,939 compounds and listed the top 10 ligands with the best docking scores. The suitable ligands found in this study are ZINC15676170, ZINC15675325, ZINC12529667, ZINC13550544, etc. (Table 1). The best docking score, −9.178 kcal/mol, was reported for ZINC15676170. The candidates shown in Table 2 have high potential against the 3CL pro of SARS-CoV-2. database (TCM) to exclude ligands that do not have the potential to bind with 3CL pro within the binding pocket of the five inhibitors listed above. We narrowed down the 177,764 TCM compounds to 17,939 compounds that have similar pharmacophore features to the five inhibitors listed above. Then we conducted structure-based virtual screening for these 17,939 compounds and listed the top 10 ligands with the best docking scores. The suitable ligands found in this study are ZINC15676170, ZINC15675325, ZINC12529667, ZINC13550544, etc. (Table 1). The best docking score, −9.178 kcal/mol, was reported for ZINC15676170. The candidates shown in Table 2 have high potential against the 3CL pro of SARS-CoV-2. database (TCM) to exclude ligands that do not have the potential to bind with 3CL pro within the binding pocket of the five inhibitors listed above. We narrowed down the 177,764 TCM compounds to 17,939 compounds that have similar pharmacophore features to the five inhibitors listed above. Then we conducted structure-based virtual screening for these 17,939 compounds and listed the top 10 ligands with the best docking scores. The suitable ligands found in this study are ZINC15676170, ZINC15675325, ZINC12529667, ZINC13550544, etc. (Table 1). The best docking score, −9.178 kcal/mol, was reported for ZINC15676170. The candidates shown in Table 2 have high potential against the 3CL pro of SARS-CoV-2. The table above shows the structure, docking score, molecular weight, and logP value of the ligands with the highest therapeutic potential against 3CL pro of SARS-CoV-2.

Molecular Dynamics Simulation of the Top Ligands
To better understand the conformational and dynamics features of the top ligands against the 3CL pro of SARS-CoV-2, molecular dynamics simulations were performed for the top 10 ligands. Dynamics features of the top 10 ligands were calculated after a 100 ns MD simulation.
As we can see from Table 2, the binding free energies predicted by the molecular dynamics simulation of ZINC09033700, ZINC15676170, and ZINC12530139 are −6.8953 kcal/mol, −4.3351 kcal/mol, and −3.0082 kcal/mol, respectively, which indicates that the binding of 3CL pro with ZINC09033700, ZINC15676170, and ZINC12530139 is stable.

The Time Course of Root-Mean-Square Deviations (RMSD) Analysis
The stabilities of all the top 10 systems were evaluated by using the time course of root-mean-square deviations (RMSD) analysis. As shown in Figures S1 and S2, the results indicated that the stabilities of all ten systems were high with acceptable deviation. The average RMSD for all systems ranged between 1 and 5Å.

The Root-Mean-Square Fluctuation (RMSF) Analysis
The RMSF delineated the average deviation of a particle over time from a reference location. As shown in Figure 1, the deviation pattern of these three systems is very similar. Significant fluctuations were observed within the regions 0-10, 40-50, 85-95, 145-160, 240-250, and 290-307, which represent the loop regions. The regions 10-20, 50-90, 100-110, and 250-275 are very stable, which represent helix strands or β turns. These findings are consistent with the structure of the 3CLpro of SARS-CoV-2. The region 200-220 fluctuates greatly, which may represent a connecting loop. Fluctuations in other regions were relatively flat. These findings indicate that the 3CLpro of SARS-CoV-2 is stabilized by the binding of ZINC15676170, ZINC09033700, and ZINC12530139.

Energy Decomposition of Three Potential Inhibitors and Their Complex by MM/GBSA Calculation and MCCS
Residue energy contribution is an important factor to be considered during the screening process. By acknowledging the importance of the bonds in key residues, the results from the screening process are more accurate and effective. The residue energy contributions shown in Tables 3-5 were predicted by MM/GBSA calculation. We only listed the top 10 residues. MCCS utilized the jdock function to calculate and display the energy contributions of each residue. To identify residues classified as "key residues", we only chose residues that had an inter-ligand free energy contribution of <−0.3 kcal/mol (the lower values represent the stronger binding between the ligand and the receptor). The values differed from the docking scores because MCCS calculates total free energy with the jdock function, while Glide, the ligand docking program we used on Maestro, calculates docking scores with the SP and XP GlideScore functions.
ules 2023, 28, x FOR PEER REVIEW 6 of were relatively flat. These findings indicate that the 3CLpro of SARS-CoV-2 is stabilize by the binding of ZINC15676170, ZINC09033700, and ZINC12530139.

Energy Decomposition of Three Potential Inhibitors and Their Complex by MM/GBSA Calculation and MCCS
Residue energy contribution is an important factor to be considered during th screening process. By acknowledging the importance of the bonds in key residues, th results from the screening process are more accurate and effective. The residue energ contributions shown in Tables 3-5 were predicted by MM/GBSA calculation. We on listed the top 10 residues. MCCS utilized the jdock function to calculate and display th energy contributions of each residue. To identify residues classified as "key residues", w only chose residues that had an inter-ligand free energy contribution of <−0.3 kcal/m (the lower values represent the stronger binding between the ligand and the receptor The values differed from the docking scores because MCCS calculates total free energ with the jdock function, while Glide, the ligand docking program we used on Maestr calculates docking scores with the SP and XP GlideScore functions.     The residue energy contribution between ZINC15676170 and the 3CL pro of SARS-CoV-2 was analyzed by both MD simulation and MCCS. As shown in Figure 2, the binding pose between ZINC15676170 and the 3CL pro of SARS-CoV-2 predicted by MD simulation indicated that the nitrogen atom of ZINC15676170 can form a hydrogen bond with the oxygen atom of HIS164, with the distance of 3.5Å, and the oxygen atom of ZINC15676170 can form a hydrogen bond with the nitrogen atom of GLY143, with the distance of 3.1Å. The MCCS prediction results indicated that the HIS164 and GLY143 have a very high chance of forming a hydrogen bond with ZINC15676170, which is consistent with the MD simulation prediction. simulation indicated that the nitrogen atom of ZINC15676170 can form a hydrogen bond with the oxygen atom of HIS164, with the distance of 3.5Å, and the oxygen atom of ZINC15676170 can form a hydrogen bond with the nitrogen atom of GLY143, with the distance of 3.1Å. The MCCS prediction results indicated that the HIS164 and GLY143 have a very high chance of forming a hydrogen bond with ZINC15676170, which is consistent with the MD simulation prediction.  The residue energy contribution between ZINC09033700 and the 3CL pro of SARS-CoV-2 was analyzed by both MD simulation and MCCS. As shown in Figure 3, the binding pose between ZINC09033700 and the 3CL pro of SARS-CoV-2 predicted by MD simulation indicated that the oxygen atom of ZINC15676170 can form a hydrogen bond with the nitrogen atom of GLU166, with the distance of 3.1Å. The MCCS prediction results indicated that the GLU166 has a very high chance of forming a hydrogen bond with ZINC09033700, which is consistent with the MD simulation prediction.
The residue energy contribution between ZINC09033700 and the 3CL pro of SARS-CoV-2 was analyzed by both MD simulation and MCCS. As shown in Figure 3, the binding pose between ZINC09033700 and the 3CL pro of SARS-CoV-2 predicted by MD simulation indicated that the oxygen atom of ZINC15676170 can form a hydrogen bond with the nitrogen atom of GLU166, with the distance of 3.1Å. The MCCS prediction results indicated that the GLU166 has a very high chance of forming a hydrogen bond with ZINC09033700, which is consistent with the MD simulation prediction.  The residue energy contribution between ZINC12530139 and the 3CL pro of SARS-CoV-2 was analyzed by both MD simulation and MCCS. As shown in Figure 4, the binding pose between ZINC12530139 and the 3CL pro of SARS-CoV-2 predicted by MD simulation indicated that the nitrogen atom of ZINC15676170 can form a hydrogen bond with the oxygen atom of SER144, with the distance of 3.3Å. The MCCS prediction results indicated that the SER144 has a very high chance of forming a hydrogen bond with ZINC12530139, which is consistent with the MD simulation prediction.
Molecules 2023, 28, x FOR PEER REVIEW 10 of 16 The residue energy contribution between ZINC12530139 and the 3CL pro of SARS-CoV-2 was analyzed by both MD simulation and MCCS. As shown in Figure 4, the binding pose between ZINC12530139 and the 3CL pro of SARS-CoV-2 predicted by MD simulation indicated that the nitrogen atom of ZINC15676170 can form a hydrogen bond with the oxygen atom of SER144, with the distance of 3.3Å. The MCCS prediction results indicated that the SER144 has a very high chance of forming a hydrogen bond with ZINC12530139, which is consistent with the MD simulation prediction.  The purpose of MD simulation is to simulate a dynamic process, while MCCS uses the jdock function to predict a static binding pose. The representative conformation is the conformation found in 1000 snapshots of the MD simulation that is closest to the average conformation. The energy decomposition of MM/GBSA predicted by the MD simulation represents the energy decomposition of the representative conformation. Therefore, there may be some difference between the energy decomposition of MM/GBSA and the energy decomposition of MCCS. Although the rankings of residues were not identical due to different scoring mechanisms for residues, we found that at least 8 out of the top 10 residues (80%) were consistent. The accuracy of MCCS prediction has been approved by another manuscript [21].
From the energy contribution between the three potential inhibitors and 3CL pro calculated by MM/GBSA and MCCS, we can predict that HIS41, ASN142, GLY143, SER144, HIS163, HIS164, MET165, GLU166, and GLN189 are the key residues that play an important role within the binding between the potential inhibitors and 3CL pro . Among the key residues, GLY143, SER144, HIS164, and GLU166 have a very high potential to form hydrogen bonds with potential inhibitors. Steric hindrance and hydrogen bonding are the main forms of energy contribution. As shown in Figure S3, we also analyzed the hydrogen-bonding occupancy of ZINC15676170, ZINC09033700, and ZINC12530139. In ZINC15676170, the hydrogen bonds of GLY143 and GLU166 accounted for 25.98% and 15.6% of the trajectory, respectively. In ZINC09033700, the hydrogen bonds of GLU166 accounted for 21.5% of the trajectory. In ZINC12530139, the hydrogen bonds of GLY143 and SER144 accounted for 16.5% and 0.4% of the trajectory, respectively.

Protein and Ligand Structure Organization
In this study, we retrieved the three-dimensional coordinates of 3CL pro from RCSB protein data bank (PDB id: 6LU7) [22]. This structure consists of N3 co-crystal inhibitor and 3CL pro . Protein preparation wizard implemented in Schrödinger software (Schrödinger, LLC, New York, NY, USA) was used to prepare and optimize the structure of 6LU7, which included filling in missing side chains, using Prime and adding hydrogens. The OPLS3 force field was used for protein-energy minimization. LigPrep was used to prepare the ligands.

TCM Database
The chemical compounds used in this study were collected from the TCM database. We chose to use the TCM database because the TCM database is currently the largest and most comprehensive database on traditional Chinese medicine [8]. We collected 225,667 natural compounds, which were then filtered through using Lipinski's rule of five (we excluded any natural compounds that violated this rule). This narrowed 225,667 natural compounds down to 177,764 ligands, which were more likely to be a hit. These ligands were then uploaded to Maestro for scoring function calculations and virtual screening.

Inhibitor Filtering by Ligand-Based Pharmacophore Model
In this study, we collected 5 inhibitors that bonded well to 3CL pro in SARS-CoV-2. These include the N3, GRL-2420, ML188, CALPAIN1, and nirmatrelvir inhibitors retrieved from Protein databank 6LU7, 7JKV, 7L0D, 7LBN, and 7U29 respectively. Among these inhibitors, we conducted the ligand-based virtual screening process, which uses structural commonalities between the 5 inhibitors to accurately identify the key pharmacophore model that plays an important role in the binding process. The pharmacophores that were inserted into the crystal structure of 3CL pro let us identify the key residues and the interaction type that play a major role in the binding between the inhibitor and 3CL pro of SARS-CoV-2. Each inhibitor contains 4 or 5 pharmacophore features that can form interactions with 3CL pro , and we collected the ligands that contain at least 2/4 or 3/5 pharmacophore features as the inhibitors listed above. This allowed us to narrow down the 177,764 TCM compounds to 17,939 compounds that have similar pharmacophore features as the 5 inhibitors listed above.

Virtual Screening and Selection Method
The Maestro software allows us to screen and select ligands from the TCM database. First, we used the Maestro virtual screening software to display the preprocessed and minimized structures of the 6LU7 protein. Then, we used several visual tools to find the various bonds (hydrogen bonds, salt bridges, and π-π stacks) between the ligand and the crystal structure of 3CL pro . The bonds between the key residues of 3CL pro and the ligand are considered very important because of the lower binding free energy between the key residues of 3CL pro and the ligands, thus the more stable the complex formed by ligand and 3CL pro is. The set of 17,939 ligands were docked into the active site of 3CL pro and then displayed in the order of highest (most negative) docking score. The docking score was calculated using the Glide program that utilized the SP and XP GlideScore scoring functions. We selected the top 10 ligands (according to docking score). The Schrödinger Maestro tools calculate various bond types (hydrogen bonds, salt bridges, and π-π stacks), which help us screen the number of bonds that each ligand has with the protein.

Molecular Dynamics (MD) Simulation
We performed molecular dynamics simulations on the top 10 ligands and 3CL pro complexes to further validate our virtual screening results. The MD simulations were carried out in a cubic box (~90Å × 90Å × 90Å) with periodic boundary conditions to model a representative part of the interface devoid of any arbitrary boundary effects [23]. The system was solvated in a 0.15 mol/L NaCl solution, including 15,021 water molecules, 40 Cl − and 44 Na + ions. The systems were protonated prior to the MD simulations. Special caution was applied to histidine residues, which can be ionized at pH 7.40. AMBER ff14SB force field [24] and GAFF [25] were applied to the protein and ligands, respectively. The TIP3P water model was used to treat the water molecules [26]. Subsequently, the PMEMD.mpi and PMEMD.cuda modules in the AMBER16 software package were used to perform the MD simulations [27][28][29].
Before the MD simulations, we first relaxed the complex by minimizing 5000 cycles to avoid the possible spatial conflicts-500 cycles of steepest descent and 4500 cycles of conjugated gradient minimization [30]. After minimization, it entered the heating phase. In this step, each system was gradually heated from 0 K to 303.15 K, followed by equilibrium and production stages (303.15 K) with a time step of 2 fs. Periodic boundary conditions were used to maintain constant temperature and pressure (NPT) ensembles. The pressure was set to 1 atm and was controlled by an anisotropic (x-, y-, z-) pressure scaling protocol, and the pressure relaxation time was 1 ps. Langevin dynamics were used to control the temperature to keep the temperature at the collision frequency of 2 ps −1 [31,32]. The particle grid Ewald (PME) method [33,34] was used to deal with long-distance static electricity, and the critical value of actual spatial interaction was set to 10 Å. The SHAKE algorithm was used to constrain all covalent bonds related to hydrogen atoms [35]. Both systems were subjected to 100 ns MD simulations, and a trajectory of the simulated system was saved every 100 ps.

Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) Calculation
In MM/GBSA, their binding free energy (∆G bind ) of 3CL/molecule was calculated by the following equation: Among them, ∆E MM was the gas phase MM energy during binding, ∆G sol and −T∆S were the changes in solvation free energy and the conformational entropy on ligand binding, respectively [30]. T was the absolute temperature and ∆S was the molecule's entropy [36]. ∆E internal specifically refers to bond, angle, and dihedral energy, which together with electrostatic ∆E electrostatic energy and van der Waals force ∆E vdw form ∆E MM [30]. The specific calculation formula is as follows: The energy contribution of ∆G sol includes electrostatic solvation energy (polar contribution) ∆G GB and non-electrostatic solvation component ∆G SA (non-polar contribution). The non-polar contribution obtained by fitting the solvent accessible surface area (SASA) [37] and linear combination of pairwise overlaps (LCPO) model [38,39]. The specific calculation formula is as follows: In addition, the energy of each residue could also be calculated. Energy analysis could analyze the energy contribution of key residues in the decomposition of the binding process [40].

Molecular Complex Characterizing System
As stated in the introduction before, looking only at the docking scores produces false positive results [9]. Considering the key residues in the docking process can make the results of virtual screening more convincing and make it easier for biological experiments to verify the inhibitory efficacy of ligands against 3CL pro . The energy contribution of key residues was calculated by Molecular Complex Characterizing System (MCCS). Firstly, Chimera (version 1.15) [41] was applied to fix the incomplete side chain of protein PDBs. To be more specific, the full protein structures were scanned by Chimera and then the incomplete residues were revealed. The truncated side chains were replaced with entire side chains of the same residue type with the aid of the Dunbrack rotamer library [42]. Secondly, the polar hydrogens, Vina force field, and Gasteiger charges were added by VEGA [43]. Thirdly, the PDB format was converted to the PDBQT format. For ligand files, VEGA was first used to prepare the ligand PDBs as the protein preparation. Secondly, PROPKA (version 3.1) [44] was used to predict the pKa values of ligands [45]. MCCS would protonate the tertiary (3 • ) amide in the compounds when the computed pKa value of the ligands was greater than or equal to the supplied pH (7.4 by default). Thirdly, the torsions of ligands were determined by VEGA, and the ligand PDB format was converted to the PDBQT format. Finally, the PDBQT files of the ligand and the protein and the pKa file of the ligand were used as inputs into MCCS, which includes scoring and docking with jdock to calculate the residue energy contribution.
Jdock is the core implementation of the MCCS algorithm, which can generate nine different energy contribution vectors in total, including Gauss, Gauss1, Gauss2, hydrogenbonding, hydrophobic, non-steric (hydrogen-bonding and hydrophobic), repulsion, steric (Gauss1, Gauss2, and repulsion), and total energy contribution [9]. The docking mechanism in jdock can generate up to 999 bound poses. The top 10-15 key residues were then recorded and used in the virtual screening process to filter out false positive results.

Conclusions
The SARS-CoV-2 virus relies on receptor binding using spike proteins to release its genetic material and spread the infection of the virus to other cells. Therefore, techniques that utilize drugs to inhibit the spike glycoprotein or 3CL pro are effective treatments against SARS-CoV-2 [11]. In this study, the molecular docking and molecular dynamics simulations proved that ZINC15676170, ZINC09033700, and ZINC12530139 are very likely candidates for an antiviral therapeutic treatment process against SARS-CoV-2. MCCS displayed the energy contributions and key residues that greatly helped the virtual screening process, and MD simulations validated these results. In order to evaluate our virtual screening results, we again used Glide docking in Schrödinger to dock N3, GRL-2420, CALPAIN1, nirmatrelvir, and ML188 inhibitors retrieved from the protein databases 6LU7, 7JKV, 7LBN, 7U29, and 7L0D, respectively, into 3CL pro . The docking scores of N3, GRL-2420, CAL-PAIN1, nirmatrelvir, and ML188 are −10.073, −9.655, −7.28, −6.552, and −6.002 kcal/mol, respectively. N3 can form a covalent bond with 3CL pro , and GRL-2420 can also form a covalent bond with the Cys-145 of 3CL pro , which account for the high docking scores of N3 and GRL-2420 [46,47]. The docking scores of ZINC15676170, ZINC0903370, and ZINC12530139 are −9.178, −8.799, and −8.774, respectively, which are significantly higher than the docking scores of CALPAIN1, nirmatrelvir, and ML188. We also performed MD simulations of ZINC15676170, ZINC0903370, and ZINC12530139 with the 3CLpro of SARS-CoV-2, respectively, and the system was subjected to 500 ns MD simulations. The time course of the root-mean-square deviations (RMSD) plot (Figure S4) of the 500 ns MD simulations indicated the bindings of ZINC15676170, ZINC0903370, and ZINC12530139 with the 3CLpro of SARS-CoV-2, respectively, are stable. Obviously, we still need to conduct in vitro tests and human clinical studies to determine the efficacy of these drugs and to further validate our results. However, in silico computational prediction has proven to be a large asset for drug discovery and repurposing. The speed and efficiency of such studies have been essential to the development of effective antiviral therapeutic treatments [20].