Virtual Screening of Natural Chemical Databases to Search for Potential ACE2 Inhibitors

The angiotensin-converting enzyme II (ACE2) is a multifunctional protein in both health and disease conditions, which serves as a counterregulatory component of RAS function in a cardioprotective role. ACE2 modulation may also have relevance to ovarian cancer, diabetes, acute lung injury, fibrotic diseases, etc. Furthermore, since the outbreak of the coronavirus disease in 2019 (COVID-19), ACE2 has been recognized as the host receptor of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The receptor binding domain of the SARS-CoV-2 S-protein has a strong interaction with ACE2, so ACE2 may be a potent drug target to prevent the virus from invading host cells for anti-COVID-19 drug discovery. In this study, structure- and property-based virtual screening methods were combined to filter natural product databases from ChemDiv, TargetMol, and InterBioScreen to find potential ACE2 inhibitors. The binding affinity between protein and ligands was predicted using both Glide SP and XP scoring functions and the MM-GBSA method. ADME properties were also calculated to evaluate chemical drug-likeness. Then, molecular dynamics (MD) simulations were performed to further explore the binding modes between the highest-potential compounds and ACE2. Results showed that the compounds 154-23-4 and STOCK1N-07141 possess potential ACE2 inhibition activities and deserve further study.


Introduction
Angiotensin-converting enzyme (ACE) is a highly glycosylated transmembrane protein existing in two differentially spliced forms: the two-domain somatic ACE (ACE1, Nand C-domains) with similar, though not identical, substrate specificities and the singledomain testicular form (ACE2). Donoghue et al. identified the ACE2 gene as one that was upregulated in a human heart failure cDNA library [1]. ACE2 actually serves a multiplicity of functions and plays vital roles in different diseases, such as ovarian cancer, diabetes, acute lung injury, fibrotic diseases, etc. [2]. ACE2 is very common in ovarian cancer with amplification mutations. High expression of ACE2 promotes the prognosis of patients with ovarian cancer [3]. Recently, it has been proven that human ACE2 is a host receptor of severe acute respiratory syndrome coronavirus (SARS-CoV), which could specifically bind to SARS-CoV spike protein with high affinity [4,5]. A newly published paper reported that the novel coronavirus could enter ACE2-expressing cells, but not the cells that did not express ACE2, so ACE2 is also the host receptor for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [6].
SARS-CoV-2 is a well-known novel coronavirus that causes an acute infectious pneumonia disease, coronavirus disease 2019 (COVID-19) [7,8]. Symptoms of the infection include respiratory symptoms, fever, cough, shortness of breath, and breathing difficulties. In more severe cases, infections can cause pneumonia, severe acute respiratory syndrome, kidney failure, and even death [9][10][11]. It has been reported that bats might be the original host of this virus [12][13][14] and some animals sold at seafood markets may be the intermediate hosts for novel coronavirus [15]. The World Health Organization (WHO) announced that

ADME Analysis
ADME properties were calculated for the 500 top-ranked ligands after MM-GBSA treatment. The criteria contained in the rule of five were: (I) molecular mass less than 500 dalton, (II) partition coefficient (QPlogPo/w) not greater than 5, (III) hydrogen bond donors less than 5, (IV) hydrogen bond acceptors less than 10, (V) PSA less than 140 Å 2 , and (VI) percent human oral absorption more than 25. These rules were selected to evaluate the drug-likeness of compounds or to determine if a compound had pharmacological or biological potency [30]. A total of 298 ligands remained after deleting molecules that did not meet these standards.

Cluster Analysis
Cluster analysis was executed based on the molecular structural information. The remaining 298 molecules were clustered into 12 different categories, and the compound number contained in each category is shown in Table 1. The chemical with the best docking score in every category was chosen for further analysis.

Virtual Screening Results
The selected representative structures of the 12 ligands with their corresponding ADME properties are shown in Table 2. The interactions between the receptor and top 12 ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds. The selected representative structures of the 12 ligands with their corresponding ADME properties are shown in Table 2. The interactions between the receptor and top 12 ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds. The selected representative structures of the 12 ligands with their corresponding ADME properties are shown in Table 2. The interactions between the receptor and top 12 ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds. ADME properties are shown in Table 2. The interactions between the receptor and top 12 ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds. ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds. ligands are shown in Figure 2. Their docking results and binding residues are listed in Table 3 (The figures shown in this table were depicted using Maestro 10.1 (Schrödinger Inc., LLC, New York, NY, USA)). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds.  ). The structures of the 12 ligands are mostly phenol, ketone, and amine compounds.  Four of the twelve ligands were found to have clear sources and medicinal effects as known drug ingredients; these were compounds 154-23-4, 132-98-9, STOCK1N−53429, and STOCK1N-07141. With the increase in confirmed COVID-19 cases and deaths, it is a good strategy to find novel effects from known drugs or ingredients to prevent the invasion of SARS-CoV-2 because drug repositioning possesses several advantages when considering time, research cost, and safety [31]. Therefore, compounds 154-23-4, 132-98-9, STOCK1N−53429, and STOCK1N-07141 were selected for further analysis, and their interactions and distance are shown in Figure 3 (figures were depicted using PyMOL Molecular Graphics System Version 2.5.2 (Schrödinger, Inc., LLC)).        Four of the twelve ligands were found to have clear sources and medicinal effects as known drug ingredients; these were compounds 154-23-4, 132-98-9, STOCK1N−53429, and STOCK1N-07141. With the increase in confirmed COVID−19 cases and deaths, it is a good strategy to find novel effects from known drugs or ingredients to prevent the invasion of SARS-CoV-2 because drug repositioning possesses several advantages when The first natural product was 154-23-4 (catechin), a polyphenolic compound found in the bark and twigs of plants [32], which possesses antioxidant, anti-inflammatory, antibacterial, antifungal, antiviral, and anticancer properties [33][34][35][36]. Catechin also possesses virus inhibition activity, and the EC50 for the influenza A (H1N1) virus is 18.4 µg/mL [36]. The Glide docking results showed strong interaction between catechin and ACE2. The three OH from chromane in 154-23-4 formed an OH-O hydrogen bonding interaction with the backbone carbonyl atom from Ala387 with a distance of 1.9 Å. The five OH from chromane formed an OH-O interaction with the backbone carbonyl atom from Ala386 with a distance of 1.9 Å. The three OH from phenyl in 154-23-4 formed an OH-N interaction with the amide -NH from Asn33 with a distance of 2.2 Å and formed an OH-O interaction with the oxygen from carboxyl in Asp30 with a distance of 1.8 Å. The four OH from phenyl formed an OH-O interaction with the oxygen from carboxyl in Asp30 with a distance of 1.7 Å. All these results indicated a strong interaction between the ligand and protein with a Glide score of −5.418 and a value of −37.592 kcal/mol.
The second natural product was 132-98-9 (Penicillin V Potassium), which is useful for the treatment of bacterial infections [37]. The docking results indicated that Penicillin V Potassium could bind to ACE2 very well. The oxygen from carboxyl in 132-98-9 formed an O-HN interaction with the -NH from Lys26 with a distance of 1.8 Å. The carbonyl from carboxyl formed an O-HN interaction with the -NH from Asn90 with a distance 1.6 Å. The carbonyl from β-lactam in 132-98-9 formed an O-HN interaction with the -NH from Gln96 with a distance 2.0 Å. The oxygen from phenoxy in 132-98-9 formed an O-HN interaction with the -NH from Asn33 with a distance of 2.3 Å. The Glide score and ∆G value were −4.335 and −18.899 kcal/mol, respectively.
The third natural compound was STOCK1N−53429 (quinic acid), a widely presented natural product found in plants [38] Quinic acid has antioxidants, increases urinary excretion, and enhances DNA repair and immunity properties [39][40][41][42]. Glide docking results showed good affinity between quinic acid and ACE2 with four hydrogen bonding interactions and a salt-bridge interaction. The three OH from cyclohexane in STOCK1N−53429 formed an OH-O interaction with amide carbonyl from Gln76 with a distance of 1.8 Å. The OH from cyclohexane formed an O-HN interaction with the -NH from Gln76 with a distance of 2.1 Å, as well as an OH-O interaction with oxygen from carboxyl in Glu35 with a distance of 1.8Å. Carbonyl in STOCK1N−53429 formed an O-HN interaction with the -NH from Lys31 with a distance of 1.8 Å. Furthermore, there was a salt-bridge interaction between STOCK1N−53429 and Lys 31 with a distance of 4.4 Å. The Glide score and ∆G value were −5.923 and −19.312 kcal/mol, respectively.
The fourth natural compound was STOCK1N-07141 (arbutin), which was first discovered in the leaves of the bearberry plant and is widely found in animals, plants, and microbes [43,44]. Arbutin has been shown to have antioxidant, anti-inflammatory, and antibacterial properties [45][46][47]. Hydroxymethyl in STOCK1N-07141 formed an OH-O interaction with the oxygen from carboxyl in Asp30 with a distance of 2.4 Å and formed an O-HN interaction with the -NH from Asn33 with a distance of 1.9 Å. The two OH in STOCK1N-07141 formed an O-HN interaction with the -NH from Arg393 with a distance of 2.0 Å and formed an OH-O interaction with the oxygen from carboxyl in Glu37 with a distance of 1.7 Å. The Glide score was −4.898, and the ∆G value was −27.518 kcal/mol. These four chemicals have been found to possess medicinal effects, including enhanced immunity, anti-inflammatory, and antibacterial properties, etc. In addition, chemical 154-23-4 has antiviral effects. From the screening results, we knew that chemicals 154-23-4 and STOCK1N−53429 both had five interactions, including hydrogen and salt-bridge, with ACE2, and ligands 132-98-9 and STOCK1N-07141 had four hydrogen interactions with ACE2. These four compounds all had strong binding abilities with ACE2. In order to better determine the binding stability and binding modes of these four compounds and ACE2, we performed MD simulations on the systems combining these four compounds with ACE2.

Molecular Dynamics Simulation Results
After running a 100 ns molecular dynamics simulation for each system, we analyzed the obtained trajectory of each system. In order to research the stability of the structure, we calculated the root mean square deviation (RMSD) of the complex, the ligand, and the binding pocket (defined as residues within 5 Å around the ligand). As shown in Figure 4A, each system had a certain fluctuation in the initial stage of the molecular dynamics simulation and then gradually stabilized in the last 20 ns. In order to obtain more accurate analytical results, we chose the last 20 ns trajectory of each system for the succeeding analysis. Then, we extracted the last frame structure from the trajectory of each system and superimposed them with the initial structures to determine whether the small molecule was still at the binding site after the molecular dynamics simulation. As shown in Figure 5A, we found that the binding position of 132-98-9 in ACE2 changed. As shown in Figure 5B,C, although the conformation of 154-23-4 and STOCK1N-07141 underwent some changes, they remained stable at the binding site. As shown in Figure 5D, STOCK1N−53429 could not stably exist at the binding site. Since STOCK1N−53429 could not bind to ACE2 stably, we no longer considered this small molecule in subsequent analyses. 5A, we found that the binding position of 132-98-9 in ACE2 changed. As shown in Figure  5B,C, although the conformation of 154-23-4 and STOCK1N-07141 underwent some changes, they remained stable at the binding site. As shown in Figure 5D, STOCK1N−53429 could not stably exist at the binding site. Since STOCK1N−53429 could not bind to ACE2 stably, we no longer considered this small molecule in subsequent analyses. The root mean square fluctuation (RMSF) of each residue was calculated to research the stability of residues in these systems. As shown in Figure 6, the RMSF trends of these three systems were very similar. Compared with the stability of amino acid residues in other systems, the residues in the 154-23-4/ACE2 system had higher RMSF values, which The root mean square fluctuation (RMSF) of each residue was calculated to research the stability of residues in these systems. As shown in Figure 6, the RMSF trends of these three systems were very similar. Compared with the stability of amino acid residues in other systems, the residues in the 154-23-4/ACE2 system had higher RMSF values, which indicated that the residues in the 154-23-4/ACE2 system were more unstable than other systems during the molecular dynamics simulation.
The binding energy between the ligand and protein of each system was calculated using the MM-GBSA method and is listed in Table 4. From the values of binding free energy (∆G bind ), we concluded that these three compounds bind well to ACE2. At the same time, we found that the contribution of electrostatic interaction energy (∆E ele ) was quite different in the three systems. The contribution of electrostatic interaction energy presented a higher positive value in the 132-98-9/ACE2 system, while it presented a negative value in both the 154-23-4/ACE2 system and the STOCK1N-07141/ACE2 system. This indicated that electrostatic interaction energy was not conducive to the binding of ligand and protein in the 132-98-9/ACE2 system, but it had a positive effect on the binding of ligand and protein in both the 154-23-4/ACE2 system and the STOCK1N-07141/ACE2 system.  Meanwhile, we found that the contribution of polar solvation free energy (∆G GB ) was also quite different in the three systems. The value of polar solvation free energy was negative in the 132-98-9/ACE2 system, but it was opposite in the other two systems. This result showed that polar solvation free energy was beneficial to the binding of ligand and receptor in the 132-98-9/ACE2 system, while it had a disadvantageous effect on the binding of ligand and protein in the other two systems. The difference in the contributions of electrostatic interaction energy and polar solvation free energy to the three systems may be due to the difference in binding sites.
In addition, we decomposed the binding energy to study the contribution of each residue, as shown in Figure 7. In this figure, the key amino acid residues presenting high energy contributions were different in each of the three systems. In order to more clearly observe the energy contributions of these amino acid residues in the different systems, we extracted and mapped the energy contributions of these amino acid residues. As shown in Figure 8, Q24 and T27 in the 132-98-9/ACE2 system presented a higher contribution to the binding, and they showed almost no energy contribution in the other two systems. This situation may be related to the changed binding site in the 132-98-9/ACE2 system. At the same time, although the binding sites of the other two systems did not change, the energy contributions of their amino acid residues still showed large differences. K26, T92, Q388, and P389 had high energy contributions in the 154-23-4/ACE2 system, while N33, H34, E37, and K353 had high energy contributions in the STOCK1N-07141/ACE2 system.  In order to explore the reasons why the energy contributions of amino acid residues were different in the 154-23-4/ACE2 system and the STOCK1N-07141/ACE2 system, we calculated the hydrogen bonds (defined as the distance between acceptor and donor <0.35 nm and an angle >120 • ) of each system during the MD simulation. As shown in Table 5, hydrogen bond interaction had little effect on the 132-98-9/ACE2 system, while Q388 formed a hydrogen bond with the ligand in the 154-23-4/ACE2 system, which may be the reason for its greater contribution in this system. In the MD process, the hydrogen bond between E37 and the ligand occupied a higher proportion. This could be the reason why E37 contributed the most to the integration of the entire system. The difference in hydrogen bonds may be the reason for the differences in the energy contributions of the residues in different systems.

Protein Preparation
The crystal structure of ACE2 (PDB ID: 1R42) was obtained from the Protein Data Bank and prepared using the Protein Preparation Wizard in Maestro. After deleting water, adding hydrogens, and filling in missing side chains, the protein was optimized at PROPKA pH, and the side chains were minimized with a liquid simulation OPLS_2005 force field. The grid was generated using receptor grid generation protocol. As Wan et al. reported,residues 31,35,38,82, and 353 play a critical role in the affinity between ACE2 and SARS-CoV-2 [22]. These residues were chosen as active center, and the receptor grid size was defined as a 15 Å box. Other adjustable settings were set to default.

Ligand Preparation
Natural product databases from ChemDiv (398), TargetMol (2131), and InterBioScreen (68373) were combined for virtual screening. All molecular structures with ionization states were generated at a specific pH of 7.0 ± 2.0 [28], and stereoisomer computation was left at determined chiralities using the Lig Prep protocol on an OPLS_2005 force field [48].

Molecular Docking
Molecular docking was implemented using a glide docking package. The prepared ligands were docked into the active site of the protein using standard precision (SP), which considers the ligands as rigid, followed by extra precision (XP), which considers the ligands as flexible. Glide SP is a softer docking method that seeks to minimize false negatives; Glide XP is a harder function that exacts severe penalties for poses that violate established physical chemistry [49]. The docked conformers were evaluated using a docking score.
The binding free energies of the docked receptor and ligand complexes were calculated using the prime molecular mechanics-generalized Born surface (MM-GBSA) protocol on an OPLS_2005 force field [50].

ADME Analysis
Absorption, distribution, metabolism, and excretion (ADME) properties are crucial to the development of new drugs [51]. ADME properties were calculated to screen higher quality molecules and identify whether the molecules have drug-formability [52]. The Quik Prop protocol was employed to predict the molecular ADME properties and evaluate the drug-like properties stated in the rule of five (RO5) [53].

Cluster Analysis
The obtained ligands after docking and ADME screening were selected to export data files (cnv format) for clustering using Cavas' tool to obtain different molecular scaffolds. The representative compounds with the highest XP Glide scores among the different clusters were selected for further analysis.

Molecular Dynamics Simulation
In order to further study these systems, AMBER14 [54]. was used to run molecular dynamics simulations (MD). We used Gaussian 09 at the HF/6-31G* level of theory to optimize the ligands, and we used the restrained electrostatic potential (RESP) protocol to calculate the partial atomic charges of the ligand atoms [55][56][57]. Then, the general AMBER force field (GAFF) [58]. was used to create force field parameters for the ligands, and the standard ff14SB [59]. force field was used to generate force field parameters for the protein.
Then, the tLEaP module of AMBER14 was executed to add the hydrogen atoms and the appropriate number of Na+ to the systems. Then, we wrapped each system with a TIP3P cube box [60]. with each atom in the system at least 10 Å from the edge of the water box. Then, the Sander program was used to operate the minimization, heating, density, and equilibration protocols.
In this study, we performed a three-step minimization procedure with a limiting force of 5.0 kcal·mol −1 ·Å −2 on all atoms, a limiting force of 3.0 kcal·mol −1 ·Å −2 on the protein backbone atoms, and no restriction on all atoms. We ran 5000 steps for each minimization: the steepest descent method was used for the first 2500 steps, and the conjugated gradient method was used for the last 2500 steps. After minimization, each system was heated from 0.0 K to 310.0 K in an NVT ensemble. In this step, we imposed a limit force of 5.0 kcal·mol −1 ·Å −2 on all atoms in each system and executed 100 ps. In order to balance the solvent density, a short equilibration simulation was carried out for 50 ps, and all atoms were constrained by 5 kcal·mol −1 ·Å −2 under 1 atm pressure in the isothermal isobaric (NPT) ensemble. Then, we performed 1.5 ns equilibration in the NPT ensemble for each system. The first 1.0 ns was divided into five stages, including a restriction force of 5.0 kcal·mol −1 ·Å −2 , 4.0 kcal·mol −1 ·Å −2 , 3.0 kcal·mol −1 ·Å −2 , 2.0 kcal·mol −1 ·Å −2 , and 1.0 kcal·mol −1 ·Å −2 for each system, respectively, and carried out for 200 ps in each stage. After that, a 500 ps equilibration for each system without imposing the limiting force was performed. Finally, we used the PMEMD program in AMBER14 to carry out a 100 ns production of MD simulations at 310.0 K with 1 atm in the NPT ensemble without any restraint for each system. In the simulation process, the SHAKE algorithm [61]. was used to bound hydrogen bond length and the particle-mesh Ewald (PME) [62]. method was used to deal with long-range Coulomb interactions. The non-bonded cutoff value was set as 10.0 Å to deal with non-bonded interactions, and periodic boundary conditions were applied to avoid edge effects. The time step size was set as 2 fs. We kept a record of coordinate trajectory every 2 ps for all the production trajectories.

Binding Free Energy Calculations
In this study, we used the MM-GBSA [63][64][65][66]. method to calculate the binding free energy of each system. In this process, an average of 2000 structures were extracted at an interval of 10 ps from the last 20 ns of MD trajectory. We used the equations below to calculate the binding free energy: The binding free energy (∆G bind ) is the sum of the enthalpy term (∆H) and entropy term (−T∆S). ∆H of the system is the summation of the interaction energy of the gas phase between the protein-ligand (∆E MM ) and the solvated free energy (∆G solv ). ∆E MM is obtained by adding the internal energy (∆E internal , consists of the energies of bonds, angels, and torsions), the electrostatic interaction energy (∆E ele ), and the van der Waals interaction energy (∆E vdW ). ∆G solv is the sum of the polar solvation free energy (∆G GB ) and the nonpolar solvation free energy (∆G NP ).

Per-Residue Free Energy Decomposition Analysis
We decomposed per-residue free energy decomposition using the 2000 structures collected from the last 20 ns of MD trajectory at an interval of 10 ps. The MM-GBSA method was employed to calculate per-residue free energy decomposition (∆G MM-GBSA ) with the following equation: In this formula, ∆E vdW represents the van der Waals interaction energy, ∆E ele represents the electrostatic interaction energy, ∆E P represents the polar solvation free energy, and ∆E NP represents the nonpolar solvation free energy.

Conclusions
In summary, we first downloaded natural products from the TargetMol, ChemDiv, and InterBioScreen databases (70,902 molecules in total). Then, we implemented docking study to virtually screen chemicals that could bind to ACE2. Then, MM-GBSA ∆G values and ADME properties were calculated and predicted for 500 top-ranked ligands. Subsequently, we selected and clustered 298 molecules into 12 categories. As a result, four natural compounds with strong binding affinity activities to ACE2 and known medicinal effects were selected as potential ACE2 inhibitors to prevent the invasion of SARS-CoV-2. Afterwards, we performed MD simulations on these four systems and found that the compound STOCK1N−53429 could not bind with ACE2 stably, and the compound 132-98-9 also showed changes in the binding site during the MD simulation. The results showed that the compounds 154-23-4 and STOCK1N-07141 are the most promising candidates deserving further research.