Inhibitors of SARS-CoV-2 Main Protease from a Library of Marine Natural Products: A Virtual Screening and Molecular Modeling Study

The current emergency due to the worldwide spread of the COVID-19 caused by the new SARS-CoV-2 is a great concern for global public health. Already in the past, the outbreak of severe acute respiratory syndrome (SARS) in 2003 and Middle Eastern respiratory syndrome (MERS) in 2012 demonstrates the potential of coronaviruses to cross-species borders and further underlines the importance of identifying new-targeted drugs. An ideal antiviral agent should target essential proteins involved in the lifecycle of SARS-CoV. Currently, some HIV protease inhibitors (i.e., Lopinavir) are proposed for the treatment of COVID-19, although their effectiveness was not yet assessed. The main protease (Mpro) provides a highly validated pharmacological target for the discovery and design of inhibitors. We identified potent Mpro inhibitors employing computational techniques that entail the screening of a Marine Natural Product (MNP) library. MNP library was screened by hyphenated pharmacophore model, and molecular docking approaches. Molecular dynamics and re-docking further confirmed the results obtained by structure-based techniques and allowed to highlight some crucial aspects. Seventeen potential SARS-CoV-2 Mpro inhibitors have been identified among the natural substances of marine origin. As these compounds were extensively validated by a consensus approach and by molecular dynamics, the likelihood that at least one of these compounds could be bioactive is excellent.


Introduction
The new coronavirus, designated as SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), was first identified in Wuhan, China, in December 2019 [1]. SARS-CoV-2 belongs to the family of Coronaviridae, single-stranded RNA virus (+ ssRNA) that spreads widely among humans and other mammals, causing a wide range of infections from common cold symptoms to fatal diseases, such as severe respiratory syndrome [2,3]. The mortality rate of this new virus appears to be around 2% in China, which is much lower than the mortality rate of SARS-CoV [4]. It should be considered that the majority of fatal cases are vulnerable populations with comorbidity such as immunosuppression, diabetes, or heart disease. The primary issue all over the world is the high human-to-human transmission that has led to the spread of its outbreaks in many countries [5]. More than 81,170 cases of infection and 3,270 deaths have been reported in China, and 63,900 and 6,000, respectively, in Italy since January 22, 2020 (data from https://www.worldometers.info/coronavirus/ at 24 March 2020). Also, cases of infection from USA, Spain, Germany, Iran, France, Sud Korea, Switzerland, UK, Netherlands, Austria, Belgium, Norway, Australia, Canada, Portugal, Sweden (in decreasing order of infection since >2,000) and others have been reported, indicating that the disease poses a potential threat to global health [1]. Unfortunately, the number of infected continues to overgrow, and no drugs have been approved to be effective. Therefore, the need to discover and develop drugs for the treatment of the Coronavirus Disease 2019 (COVID-19) is urgent. Many research teams have exploited the major protease (M pro ) also named chymotrypsin-like protease (3CL pro ) [6] of SARS-CoV-2 as a potential drug target to combat COVID-19. Sequence alignment revealed that the SARS-CoV-2 M pro shares a 96% similarity to that of SARS-CoV [7]. For example, Jiang and colleagues identified 30 drugs and compounds as SARS-CoV-2 M pro inhibitors through protein modeling and virtual screening [8], which represents rapid progress in the way of dealing with the crisis. Furthermore, one of the 30 drugs/compounds, Remdesivir, was suggested to be a potential SARS-CoV-2 inhibitor from Liu et al., which also suggested three other possible inhibitors [9]. The SARS-CoV-2 M pro and SARS-CoV M pro structures are quite similar, the main difference being the surface of the proteins, where 12 different amino acids are located. Both enzymes consist of three domains; the domains I (residues 1-101) and II (residues 102-184) consist of an antiparallel β-barrel, and, for enzymatic activity, the α-helical domain III is required (residues 201-301) [10][11][12]. Furthermore, both SARS-CoV-2 M pro and SARS-CoV M pro have a structure similar to that of the cysteine proteases, although the third catalytic residue is missing at their active site. Their active site includes a catalytic dyad, His41 and Cys145, and a particularly stable water molecule that forms at least three hydrogen-bonding interactions with the surrounding residues, including a catalytic histidine, which corresponds to the position of the third catalytic element [11 [12]]. It should also be noted that one of the different amino acids in SARS-CoV-2 M pro , namely Ser46, is located on the Cys44-Pro52 loop that flanks the cavity of the active site. Another fundamental difference between SARS-CoV-2 M pro and SARS-CoV M pro is given by the different sizes and shapes of the external pockets found in the two systems. Surprisingly, the volume of the external pockets of the SARS-CoV-2 M pro is, on average, half that of the SARS-CoV M pro . Thus, considering the similarity of the two structures, it is presumable to assume that there may be large differences between the accessibility to the binding cavity and/or the accommodation of the shape of the cavity in response to an inhibitor that can be bound. Moreover, the SARS-CoV M pro apo structure has shown the largest outer pocket respect to both ligand bounded M pro structures, suggesting that the SARS M pro binding cavity is highly flexible and significantly changes in both volume and shape after ligand binding [11 [12]].
These features can be exploited in the design of lead compounds with a potentially broad spectrum of activity. Generally, a suitable substrate can be converted to a good inhibitor by replacement of a part of the substrate sequence that binds directly to the active site of the protease, reversibly or irreversibly, with the chemical warhead targeting the catalytic mechanism. Peptide inhibitors were designed by attaching a chemical warhead (i.e., Michael acceptors, aldehydes, ketones, etc.) to a peptide that mimics the natural substrate [13]. These inhibitors act through a twostep procedure, wherein they first bind and form a non-covalent complex with the enzyme such that the warhead is preorganized near the catalytic residue. Subsequently, the nucleophilic attack occurs at the cysteine, leading to the formation of the covalent bond. Some peptidomimetic derivatives contain Michael acceptors as warheads and are an important class of cysteine protease inhibitors. The cysteine residue undergoes 1,4-addition to the inhibitor at the Michael acceptor warhead group, and the subsequent protonation of the α-carbanion results in the irreversible inhibition of the enzyme [13]. Our work aims to perform a virtual screening against the SARS-CoV-2 M pro binding site using a library of Marine Natural Products (MNP). Many MNP have been detected as having various biological activities; peptides isolated from fish as well as algal polysaccharides have been reported to have anti-cancer, anticoagulant, and inhibitor activities. Marine bacteria and fish oils contain a considerable amount of omega-3 fatty acids, whereas seaweeds and shellfish, such as crustaceans, have potent antioxidants, including carotenoids and phenolic compounds [14]. On a pharmacophore model, built by Pharmit server (http://pharmitcsb.pitt.edu/) [15] starting from the SARS-CoV-2 M pro (PDB ID: 6LU7) and the complexed ligand N3 (PRD_002214) structure employed as input, the virtual screening on the 164,952 conformers of the 14,064 molecules contained in the MNP library was carried out. Among these, 180 molecules were docked using AutoDock Vina software. Finally, we conducted a parallel docking study with AutoDock4 and MD simulation studies, of 10 ns each, on the 17 molecules that showed the best promising results in terms of inhibitory activity.

Catalytic site of the SARS-CoV-2 M pro
The crystal structure of SARS-CoV-2 M pro (PDB ID 6LU7) in complex with the N3 (ID PRD_002214) inhibitor was previously solved by X-ray crystallography at a high resolution of 2.16 Å [8]. In this structure, the N3 peptide inhibitor is covalently bound with the Cys145 residue ( Figure 1). Moreover, the surface topology of the active site inherent both SARS-CoV-2 M pro and SARS-CoV M pro proteins shows some differences, consisting in more sub-cavities and a smaller volume (337.2 A 3 vs 447.7 A 3 ) for the first one ( Figure 1). It should be noted that the volume of the crystallized ligand influences the volume of the catalytic pocket. Moreover, the high flexibility of the catalytic site plays an essential role in the volume of the M pro pocket. Consequently, inhibitors of the SARS-CoV M pro might present altered binding affinities for the SARS-CoV-2 M pro .

Pharmacophore Model
A pharmacophore defines the essential features of interactions, including the spatial arrangement of each interaction in the bond between the drug and the target [15]. Structure-based pharmacophore modeling usually uses the 3D-structure of a protein while is interacting with its docked ligand. Therefore, pharmacophore is capable of incorporating more detailed information about regions available to the ligand to bind efficiently to its receptor.
In drug design, the arrangement of steric and electronic characteristics of the pharmacophore is necessary to guarantee the finest molecular interactions with the respective biological target and to inhibit its biological function [16]. The use of multiple techniques, therefore, improves the probability of identifying a diverse set of lead compounds. To design a novel SARS-CoV-2 M pro inhibitor, we sought new scaffolds from MNP using a pharmacophore model of the N3/SARS-CoV-2 M pro cocrystallized structure. The workflow of the virtual screening strategy for the discovery of new scaffolds binding the catalytic site of SARS-CoV-2 M pro is depicted in Figure 2. The 3D pharmacophore search was performed using the Pharmit server [15], which provides both pharmacophore and molecular shape search modalities as well as a ranking of results by energy minimization. The proposed pharmacophore model is a binding-site-derived pharmacophore model, which includes the following pharmacophore features of ligands binding to the enzyme active site: three amide nitrogen atoms to represent hydrogen bond donors (HBD), two negatively charged oxygen atoms (as in a carboxyl group) to represent a hydrogen bond acceptor (HBA), and the isopropyl group to represent a hydrophobic center (HPC) ( Figure 3). The essential characteristics of the generated pharmacophore model included four HBD features, P1-P4, allocated in an opportune position to interact with residues Thr190, Glu166, Gnl189, and His164, respectively; one HBA feature, P5, that is positioned to interact with Glu166 residue; and one HPC feature, P6, that is related to the interactions of the isopropyl group of the N3 inhibitor located in the hydrophobic pocket. The generated pharmacophore model was used to filter a vast library of MNP (14,064 molecules, 164,952 conformers). From this library, 770 conformers meet the criteria of the pharmacophore filter, and after an ulterior filter that retained only one conformer for molecule with an RMSD lower than 2 Å with respect to the co-crystallized N3 ligand, 197 structures remained. Peptide structures without peptidomimetic regions or unable to act as Michael acceptors were manually discarded, as they do not meet the pharmacophore criteria of protease inhibitors. The residual 180 molecules have undergone the docking procedure. Molecular docking algorithms are often calibrated on a training set of experimental ligandprotein complexes, and the accuracy of these docking programs is often highly dependent on the used training set [19]. In this case, due to the lack of known ligands, it is essential to confirm that the docking software used for virtual screening can replicate the binding mode of a known experimental inhibitor for the enzymes studied. Although neither an effective antiviral drug nor a vaccine against COVID-19 is currently available, several reports have indicated that HIV-1 protease inhibitors, such as Lopinavir, have the potential for designing SARS-CoV-2 protease ligands [17]. In the attempt to have reference values (positive control), we decided to consider both the N3 co-crystallized ligand within the catalytic site of SARS-CoV-2 M pro (PDB ID: 6LU7) and Lopinavir as comparative standards for the molecular docking and MD simulation experiments.

Molecular Docking and MD Simulation
Replication of the experimental binding pose by molecular docking confirmed the suitability of the docking algorithm for virtual screening. Autodock Vina software was able to accurately predict the pose of N3, with an RMSD of only 0.254 Å respect to that of the co-crystallized one.
The 180 MNP structures filtered and selected according to the pharmacophore descriptors were separately docked into the SARS-CoV-2 M pro binding site. The flexible virtual screening was performed using Autodock Vina to find the most favorable binding interactions, and the calculated free binding energies were reported in Table 1 for the best 17 compounds and in Table S1 for all the others. To further validate the pharmacophore model descriptors, validate the poses and binding energies, and in-depth investigate the interactions of the new ligands within the catalytic site of the protease, we conducted a parallel docking study, with Autodock4, and MD simulations on those compounds (1-17) that showed a better affinity (Table 1). For all compounds, a re-docking was performed, using Autodock Vina, after MD simulation, taking the averaged pose of the last 3 ns.
The most promising inhibitors of the SARS-CoV-2 M pro (Table 1, 1-17) are primarily represented by a class of molecules called phlorotannins, oligomers of phloroglucinol (1,3,5-trihydroxybenzene), isolated from Sargassum spinuligerum brown alga [18]. Although most of these phlorotannins were identified in Sargassum spinuligerum, other species of Sargassum may also contain a large number of phlorotannins, including phlorethols, fuhalols, and fucophlorethols. [19]. Algae from the Sargassum family are used extensively in traditional Chinese medicine [18].
The results of the molecular docking showed that the tested compounds (1-19) had docking energies ranging from −14.6 to −10.7 kcal/mol (Table 1). Heptafuhalol A (1), showed the lowest docking energy (−14.60 kcal/mol). As shown in Figure 4, the hydroxyl groups in heptafuhalol A form an extensive network of H-bonds within the protease receptor site. The acceptor residues of hydrogen bonds are represented by Thr24, Ser46, Asn142, Glu166, and Pro168. Furthermore, π-hydrogen bonds with His41 and Gly143 residues, and hydrophobic interactions with Met49, Met65, Leu141, and Pro168, further stabilize the ligand-receptor complex. The results from the MD simulation of the heptafuhalol A/SARS-CoV-2 M pro complex are shown in Figure S1. It is interesting to note that during the MD simulation the His41 residue, belonging to the catalytic dyad, establishes an H-bond with the hydroxyl residue of the ligand, with a distance of 1.97 Å and an average D-H-A angle of 177.8° after 10 ns of simulation, with a free energy of −18.0 kcal/mol from re-docking. The RMSDs of the overall structures (SARS-Cov-2 and compound 1) with respect to the first ones at time 0 were analyzed and plotted during the 10 ns of MD simulation (Figure 4c). The overall RMSD for the protein system appeared to have reached equilibrium after 800 ps, and the stabilization of the protein-ligand complex after 3 ns, keeping the broad network of hydrogen bonds of the complex constant. The ligand is linearly arranged with three aromatic systems along the back of the protein responsible for binding to the viral proteins. Total energy, RMSDs, ligand interactions, and docking binding pose of compounds 2-17, N3, and Lopinavir were reported in Figures S2-19.    Among the most active inhibitors, we find the compounds 7, 10, and 11, belonging to the family of phlorotannins, isolated in the brown algae Ecklonia cava. The latter is an edible seaweed, which has been recognized as a rich source of bioactive derivatives, mainly phlorotannins. These phlorotannins exhibit various beneficial biological activities such as antioxidant, anticancer, antidiabetic, antihuman immunodeficiency virus, antihypertensive, matrix metalloproteinase enzyme inhibition, hyaluronidase enzyme inhibition, radioprotective, and antiallergic [20][21][22][23][24][25][26]. Noteworthy, dieckol (11) has already reported as one of the most potent SARS-CoV M pro phlorotannin inhibitors (IC50 = 2.7 M). Docking studies highlighted that interactions between dieckol and the amino acid residues in the active site of M pro are mainly constituted by an H-bonds network with calculated binding energy of −11.76 kcal/mol [27], that is comparable to the energy found by us with the SARS-CoV-2 M pro .
Although compound 7 has a better affinity (average binding energy = −12.9 kcal/mol), due to an extensive network of H-bonds, it is interesting to note that compounds 7 and 10 interact through Hbonds with the residues of the catalytic dyad His41 and Cys145 ( Figure 5). In particular, 6,6'-bieckol forms an H-bond with the His41 residue with a D-A distance of 1.91 Å and a D-H-A angle of 176.0°. Dieckol shows a profile of hydrophobic interactions lower than the other two (7 and 10), showing interactions with the residues Leu27, Met41, Met49, Met165, Leu167, and Leu167. Among the compounds with the best-calculated activity against the SARS-CoV-2 M pro , also pseudotheonamide D (12) and pseudotheonamide C (17) emerged with a binding energy of −11.6 kcal/mol and −10.7 kcal/mol, respectively. In both cases, there is the formation of H-bonds with the residues Glu166 and Gln189, fundamental for the recognition of viral peptides within the catalytic site. Furthermore, phenyl groups occupy the small hydrophobic pocket of the enzyme interacting with the residues Leu27, Met49, Phe140, and Leu167, so stabilizing the ligand-protein complex (Figure 6a). It is interesting to note that these compounds could represent Michael acceptor covalent inhibitors. In fact, the conjugate bond is located at a distance of about 4 Å from Cys145, suggesting that both pseudotheonamides (12 and 17) can form a covalent bond with Cys145. These pseudotheonamides have been isolated from the marine sponge Theonella swinhoei and have shown good inhibitory activity on the serine protease [28]. Consequently, we simulated the possible formation of a covalent bond between compounds 12 and 17 and the residue Cys145. A short (2 ns) MD simulation was performed in order to stabilize the new complex. The lower energy system was further minimized, and covalent docking was performed. The binding energy of 12 and 17 is very similar (−14.9 kcal/mol and −14.4 kcal/mol, respectively) with a significant increase compared to the non-covalent interaction. The two compounds adopt a similar pose within the catalytic site, establishing H-bonds with the Asn142, Ser144, and Glu166 residues, while the benzyl groups settle into the hydrophobic pockets (Figures 6b,c). Peptidomimetic derivatives contain Michael acceptors as warheads are an essential class of cysteine protease inhibitors. In general, inhibitor design strategies involve the replacement of a substrate scissile amide bond with an appropriate Michael acceptor group. The cysteine residue undergoes 1,4-addition to the inhibitor at the Michael acceptor warhead group, and the subsequent protonation of the α-carbanion results in the irreversible inhibition of the enzyme [29][30][31].
Another class of promising M pro inhibitors has been identified in flavonoids such as Apigenin-7-O-neohesperidoside, Luteolin-7-rutinoside, and Resinoside. These compounds are also widespread on terrestrial plants and in food waste with good anti-tumor, anti-inflammatory, and antioxidant activity [32][33][34][35][36]. Among these, Apigenin-7-O-neohesperidoside or Rhoifolin (whose structure belongs to flavone glycoside and its aglycone is apigenin, while the neohesperidose disaccharide constitutes the glycosidic structure) has the best binding energy (−12.39 kcal/mol). The docking pose of apigenin (FigureS8) shows H-bonds between the aromatic region and residues Leu141, Glu166, and Thr190, establishing a π-stacking interaction with Gln189. In SARS-CoV M pro it has been shown that the Gln189 mutation negatively affects inhibitory activity, suggesting that this area of the protein plays a key role in the binding interaction [37].

Dataset of compounds
The chemical structures of the marine dataset were retrieved from Marine Natural Products (MNP, http://dockingfiles.umh.es/download/MNP_sdf_14492.zip). The full list of the 180 molecules that passed the pharmacophore filter, including the MNP ID, contacting receptor residues, and Vina  (Table S1).

Pharmacophore-based virtual screening and database preparation
The 3D pharmacophore search was performed using the Pharmit server (http://pharmit.csb.pitt.edu/) [15]. The pharmacophore model was constructed by Pharmit by inserting the SARS-CoV-2 enzyme (PDB 6LU7) and N3 ligand (PRD_002214) structures as input. Pharmit parameters for 3D-pharmacophore research have remained unchanged, except for the hydrophobic center (isopropyl group) with a radius of 1.5 A. This model was the basis for the virtual screening of the MNP library, which contained 14,064 molecules for a total of 164,952 conformers. The search was directed to select only one orientation for each conformation of the molecules. Compounds with an RMSD ≥2 Å than the N3 ligand were discarded. The remaining poses were minimized according to the functions of Pharmit.

Structures preparation and minimization
The structures of all the molecules used in this study were built using Marvin Sketch (18.24, ChemAxon Ltd., Budapest, Hungary). A first molecular mechanics energy minimization was used for 3D structures created from the SMLES; the Merck molecular force field (MMFF94) present in Marvin Sketch [38] was used. The protonation states were calculated, assuming a neutral pH. The PM3 Hamiltonian, as implemented in the MOPAC package (MOPAC2016 v. 18.151, Stewart Computational Chemistry, Colorado Springs, CO, USA) [39][40][41], was then used to further optimize the 3D-structures before the alignment for the docking calculations.

Molecular docking
Flexible ligand docking experiments were performed by employing AutoDock 4.2.6 and AutoDock Vina software implemented in YASARA (v. 19.5.5, YASARA Biosciences GmbH, Vienna, Austria) [42,43], using the three-dimensional crystal structure of SARS-CoV-2 M pro in complex with an inhibitor N3 PRD_002214 (PDB ID: 6LU7) obtained from the Protein Data Bank (PDB, http://www.rcsb.org/pdb), and the Lamarckian genetic algorithm (LGA). The covalent bond between the Cys145 residue and the crystallized ligand has been eliminated. His41 and Cys145 residues were protonated and optimized using YASARA software. The maps were generated by the program AutoGrid (4.2.6) with a spacing of 0.375 Å and dimensions that encompass all atoms extending 5 Å from the surface of the structure of the crystallized ligand. All parameters were inserted at their default settings, as previously reported [44]. In the docking tab, the macromolecule and ligand were selected, and GA parameters were set as ga_runs = 100, ga_pop_size = 150, ga_num_evals = 25,000,000, ga_num_generations = 27,000, ga_elitism = 1, ga_mutation_rate = 0.02, ga_crossover_rate = 0.8, ga_crossover_mode = two points, ga_cauchy_alpha = 0.0, ga_cauchy_beta = 1.0, number of generations for picking worst individual = 10.

Molecular Dynamics Simulations
The molecular dynamics simulations of the M pro /ligand complexes were performed with the YASARA Structure package. A periodic cubic simulation cell with boundaries extending 8 Å [45] from the surface of the complex was employed. The box was filled with water, with a maximum sum of all water bumps of 1.0 Å, and a density of 0.997 g mL −1 .
The setup included an optimization of the hydrogen bonding network [46] to increase the solute stability, and a pKa prediction to fine-tune the protonation states of protein residues at the chosen pH of 7.4 [47]. NaCl ions were added with a physiological concentration of 0.9%, with an excess of either Na or Cl to neutralize the cell. Water molecules were deleted to readjust the solvent density to 0.997 g/mL. The simulation was run using the ff14SB force field [48] for the solute, GAFF2 [49], and AM1BCC [50] for ligands and TIP3P for water. The cutoff was 10 Å for Van der Waals forces (the default used by AMBER [51], no cutoff was applied to electrostatic forces (using the Particle Mesh Ewald algorithm, [52]). The equations of motions were integrated with multiple time steps of 2.5 fs for bonded interactions and 5.0 fs for nonbonded interactions at a temperature of 298 K and a pressure of 1 atm (NPT ensemble) using algorithms described in detail previously [53]. The final system dimensions were approximately 80 × 80 × 80 Å 3 . Short MD simulation was run on the solvent only to remove clashes. The entire system was then energy minimized using first a steepest descent minimization to remove conformational stress, followed by a simulated annealing minimization until convergence (<0.01 kcal/mol Å). Finally, 10 ns MD simulations without any restrictions were conducted, and the conformations of each system were recorded every 200 ps. After inspection of the solute RMSD as a function of simulation time, the last 3 ns averaged structures were considered for further analysis.

Conclusions
Natural compounds can represent a synergy to be combined with pharmacological treatments in various pathologies [54]. In some cases, the natural resource is not readily available and cannot be reproduced on a large scale (for example, marine sponges) to insert it in the global market. In other cases, the raw materials for the extraction, purification, or enrichment of bioactive compounds can be reproducible (for example, algae) for the formulation of supplements.
Here, we described the screening of a collection of MNP (14,064 compounds) in search of new, potentially SARS-CoV-2 M pro inhibitors. Structure-based and ligand-based drug design approaches have developed into valuable drug discovery tools, owing to their versatility and synergistic character [44,55,56].
For the ligand-based evaluation, we used a pharmacophore model developed by the Pharmit server, whereas, for the structure-based evaluation, an initial docking analysis was performed, followed by a parallel docking approach on lead molecules. The poses of the 17 selected ligands have been analyzed by MD simulations. The selected compounds showed a better energy score than the drug currently used to treat COVID-19. Furthermore, it has been shown that several classes of compounds, such as phlorotannins, flavonoid, and pseudo peptides, can inhibit the SARS-CoV-2 M pro , as demonstrated for the SARS-CoV M pro .
Future in vitro activity assays of the ligands identified in this study will provide vital information on novel scaffolds for lead optimization.