Potential Coronaviral Inhibitors of the Nucleocapsid Protein Identified In Silico and In Vitro from a Large Natural Product Library

The nucleocapsid protein (NP) is one of the four structural proteins of coronaviruses, including the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), discovered in 2019. NP packages the viral RNA during virus assembly and is, therefore, indispensable for virus reproduction. NP consists of two domains, i.e., the N- and C-terminal domains. RNA binding is mainly performed by a binding pocket within the N-terminal domain (NTD). NP represents an important target for drug discovery to treat COVID-19. In this project, we used the Vina LC virtual drug screening software and a ZINC-based database with 210,541 natural and naturally derived compounds to identify compounds that specifically target the binding pocket of the NTD of NP. Our aim was to identify coronaviral inhibitors that target NP not only of SARS-CoV-2 but also of other diverse human pathogenic coronaviruses. Virtual drug screening and molecular docking procedures resulted in 73 candidate compounds with a binding affinity below −9 kcal/mol with the NP NTD of SARS-CoV-1, SARS-CoV-2, MERS-CoV, HCoV-OC43, HCoV-NL63, HCoV-229E, and HCoV-HKU1. The top five compounds that met the applied drug-likeness criteria were then tested for their binding in vitro to the NTD of the full-length recombinant NP proteins using microscale thermophoresis. Compounds (1), (2), and (4), which belong to the same scaffold family of 4-oxo-substituted-6-[2-(4a-hydroxy-decahydroisoquinolin-2-yl)2H-chromen-2-ones and which are derivatives of coumarin, bound with good affinity to NP. Compounds (1) and (4) bound to the full-length NP of SARS-CoV-2 (aa 1–419) with Kd values of 0.798 (±0.02) µM and 8.07 (±0.36) µM, respectively. Then, these coumarin derivatives were tested with the SARS-CoV-2 NP NTD (aa 48–174). Compounds (1) and (4) revealed Kd values of 0.95 (±0.32) µM and 7.77 (±6.39) µM, respectively. Compounds (1) and (4) caused low toxicity in human A549 and MRC-5 cell lines.
These compounds may represent possible drug candidates, which need further optimization to be used against COVID-19 and other coronaviral infections.


Introduction
In Wuhan, the capital of Hubei Province, China, cases of an unknown severe lung disease were first reported in December 2019; they were caused by a novel virus, termed the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was called coronavirus disease 2019 (COVID-19). In March 2020, COVID-19 was declared a global pandemic by the World Health Organization (WHO) [1]. Through easy human-to-human transmission via small droplets and aerosols, the virus spread very quickly all over the world. The symptoms range widely between symptomless (mild), approved activity against variants with wildtype or mutated spike proteins are missing as of yet.
Therefore, we hypothesize that identifying small molecules that are selectively active against a druggable target other than the spike protein might be attractive, because such a compound may address all spike phenotypes independent of their mutational status. Furthermore, several computational simulation models have predicted a high probability of another virus epidemic or pandemic. It may therefore be wise to have candidate drugs at hand that are active not only against wildtype and mutated SARS-CoV-2, but also against other coronaviruses and eventually against still unknown coronaviruses that might appear in the future. Such molecules could bind to a target of interest that harbors fewer mutations across different strains. Some targets are conserved among the coronavirus family members.
In this study, we focused on small molecules targeting the RNA binding domain of NP in all seven human pathogenic coronaviruses in an attempt to identify inhibitors with broad spectrum activity.

Multiple Sequence Alignment
We performed a multiple sequence alignment for the NP NTDs of all human pathogenic coronaviruses, i.e., SARS-CoV-2, SARS-CoV-1, MERS-CoV, HCoV-OC43, HCoV-HKU1, HCoV-NL63, and HCoV-229E. As shown in Figure 1, multiple amino acids are shared across the different NTD sequences. The color code shows the different properties of the amino acids: positively charged amino acids are shown in red, negatively charged amino acids in magenta, hydrophobic amino acids in blue, polar amino acids in green, cysteines in pink, glycines in orange, prolines in yellow, aromatic amino acids in cyan, and non-conserved amino acids in white. The amino acids Ser4, Pro28, Gly40, Tyr41, Arg47, Gly55, Leu60, Pro62, Phe66, Tyr67, Tyr68, Gly70, Thr71, Gly72, Pro73, Gly86, Trp88, Val89, Ala91, and Arg106 were present in all NP NTDs. It can be assumed that conserved amino acids are important for the function of the proteins. Therefore, we considered these residues for the subsequent drug screening steps. We performed a homology analysis of the seven coronaviral NTD sequences taken from UniProtKB using Clustal Omega. The highest homology was found between the NTDs of SARS-CoV-2 and SARS-CoV-1 with 92.54% similarity, while the NTD of MERS-CoV showed 60% similarity to SARS-CoV-2, followed by the NTDs of HCoV-HKU1, HCoV-OC43, HCoV-229E, and HCoV-NL63 with homologies of 44.09%, 41.73%, 33.90%, and 31.09%, respectively.
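The two quantities discussed above, pairwise similarity and fully conserved positions, can be illustrated with a minimal sketch. It assumes the sequences have already been aligned (e.g., by Clustal Omega) and are given as gapped strings of equal length. The function names and the toy alignment are hypothetical and are not the NTD sequences used in this study. The percentage is computed as identity over gap-free columns, which may differ from the exact definition behind the similarity values reported here.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def conserved_columns(alignment: Dict[str, str]) -> List[int]:
    """0-based column indices where every sequence carries the same non-gap residue."""
    seqs = list(alignment.values())
    return [
        i
        for i, col in enumerate(zip(*seqs))
        if col[0] != "-" and len(set(col)) == 1
    ]


# Hypothetical toy alignment (assumption: for illustration only)
toy = {
    "virus_A": "MSDNGPQ-RA",
    "virus_B": "MSDNGPQSRA",
    "virus_C": "MTDQGP-SRG",
}

print(percent_identity(toy["virus_A"], toy["virus_C"]))
print(conserved_columns(toy))
```

Column indices returned by `conserved_columns` refer to the alignment, so they would still need to be mapped back to residue numbers of a reference sequence before being compared with the residues listed above.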

Literature Research for Known Active Residues in RNA Binding of the NTD
The essential residues for RNA binding of the NP NTD from SARS-CoV-2, SARS-CoV-1, MERS-CoV, HCoV-OC43, and HCoV-NL63 were reported in the literature (Table 1). The amino acids that are consistently reported for RNA binding in more than one NTD sequence are underlined. In almost every NTD, Arg45, Tyr62, Tyr64, and Arg102 (numbering refers to the SARS-CoV-2 NTD) were involved in the binding function.
Table 1. Amino acids involved in RNA binding of the NP NTD of different coronaviruses (SARS-CoV-2, SARS-CoV-1, MERS-CoV, HCoV-OC43, and HCoV-NL63). The sequence data were taken from UniProtKB. Residues that occurred in more than one NTD are underlined.

Virtual Drug Screening and Molecular Docking
Using bioinformatic methods, we performed a virtual drug screening with a ZINC-based natural product library of more than 210,000 compounds and the NTD of SARS-CoV-2 NP. We used the Vina LC software to run an established workflow on a high-performance computer (MOGON). The protein binding pocket was defined based on amino acids known to be involved in the RNA binding of the NP NTD. As a first step of the screening, the SARS-CoV-2 NP NTD (PDB:6M3M) was used as the target protein. The top 30% of compounds with the lowest binding energies were subsequently rescreened with the NTDs of SARS-CoV-1 (PDB:2OFZ), MERS-CoV (PDB:4UD1), HCoV-OC43 (PDB:4J3K), and HCoV-NL63 (PDB:5N4K). A total of 73 compounds revealed binding energies below −9 kcal/mol to all investigated coronavirus NPs. The Venn diagram in Figure 2 shows the number of compounds binding to each single coronavirus NTD and the intersections between them. The intersection in the middle shows the 73 common compounds binding to all NTDs with a binding affinity below −9 kcal/mol. From the results of this virtual screening, we selected five compounds for further investigations (Figure 3). Table 2 displays the compounds with their different properties.
(1) ZINC000011867103 (2) ZINC000011867127 (3) ZINC000011867122 (4) ZINC000104071421
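The selection logic behind the Venn diagram, i.e., keeping only compounds that score below the cutoff against every target, can be sketched as a set intersection. This is a minimal illustration and not the screening workflow itself. The compound IDs, scores, and function names are hypothetical, and only the −9 kcal/mol cutoff is taken from the text.

```python
from typing import Dict, Set

CUTOFF = -9.0  # kcal/mol, threshold stated in the text


def hits_below_cutoff(scores: Dict[str, float], cutoff: float = CUTOFF) -> Set[str]:
    """Compound IDs whose docking score is strictly below the cutoff."""
    return {cid for cid, energy in scores.items() if energy < cutoff}


def common_hits(per_target: Dict[str, Dict[str, float]], cutoff: float = CUTOFF) -> Set[str]:
    """Compounds that pass the cutoff for every target (centre of the Venn diagram)."""
    hit_sets = [hits_below_cutoff(s, cutoff) for s in per_target.values()]
    return set.intersection(*hit_sets) if hit_sets else set()


# Hypothetical compound IDs and docking scores (assumption: illustration only)
scores = {
    "SARS-CoV-2": {"cmpd_a": -9.8, "cmpd_b": -9.1, "cmpd_c": -8.7},
    "SARS-CoV-1": {"cmpd_a": -9.5, "cmpd_b": -8.9, "cmpd_c": -9.3},
    "MERS-CoV": {"cmpd_a": -9.2, "cmpd_b": -9.4, "cmpd_c": -9.0},
}

print(sorted(common_hits(scores)))
```

A strict comparison is used because the text speaks of energies "below" the cutoff, so a score of exactly −9.0 kcal/mol is not counted as a hit in this sketch.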

Cytotoxicity of Active Compounds towards A549 and MRC-5 Cell Lines
Promising drug candidates should be non-toxic for human cells. Therefore, we tested compounds (1) and (4) using the resazurin assay and two different lung cell lines, A549 and MRC-5. Figure 9 shows the cell viability after 72 h of treatment with concentrations between 0.003 and 100 µM. Compound (1) resulted in an IC50 value of 51.02 ± 8.13 µM for A549 cells and no toxicity for MRC-5 cells. Compound (4) resulted in an IC50 value of 93.39 ± < 4.34 µM for MRC-5 cells and no toxicity for A549 cells.

Discussion
The nucleocapsid protein plays a major role in SARS-CoV-2 infection. It interferes with the expression of the stress granule formation G3BP1/2 and RIGL1 receptor pathway genes [32,33], increases cytokine and chemokine production [34], and interferes with many other pathways in the human body [35]. NP also interacts with the NLRP3 inflammasome in mice by boosting its assembly and activation. It increases proinflammatory reactions, such as elevated expression of different proinflammatory cytokines (e.g., IL-1β, IL-6, and TNF). Subsequently, the strong IL-1β expression stimulates the NF-κB signaling pathway, and even more cytokines are released. This can ultimately lead to a cytokine storm [20,21]. The correlation between NP, the NLRP3 inflammasome, and NF-κB was also confirmed by using Ingenuity Pathway Analysis (IPA) (Figure 10). Therefore, NP is a noteworthy target for small molecules to fight acute SARS-CoV-2 infections and the subsequent long-term effects termed "long-COVID". NP, as one of the main structural proteins of all coronaviruses (and many other viruses), should be given more consideration as an important drug target in addition to the coronaviral spike protein [4,13,36]. Some NP inhibitors of SARS-CoV-2 and MERS-CoV have been previously described [37,38]. However, most of these studies reported solely in silico data [39,40,41,42]. A few examples of candidates investigated both in silico and in vitro were the synthetic drugs remdesivir and ceftriaxone. Remdesivir showed promising results but has to be further tested for safety and efficacy. Ceftriaxone, which is an antibacterial drug, demonstrated a high binding affinity to the SARS-CoV-2 NP NTD and is discussed as a potential drug against COVID-19 [40,41]. NP represents, therefore, not only an attractive drug target but also provides ample opportunities for natural product-derived compounds. Therefore, our goal was to find NP inhibitors by a combined in silico and in vitro approach.
The in silico compound screening of a ZINC-derived natural product library with 210,541 compounds resulted in 73 candidates that bound to the NP NTDs of SARS-CoV-2, SARS-CoV-1, MERS-CoV, HCoV-OC43, and HCoV-NL63 with free binding energies below −9 kcal/mol. We selected five of them according to lowest binding energy, molecular weight, logP, and commercial availability for subsequent in vitro experiments to confirm their binding activity. Four of them have similar structures with a coumarin core, differing only in conformation and in the position of OH and methyl groups. As a control, we performed docking using AutoDock 4.2.6 with the known ligands rapamycin, hydroxychloroquine, and ceftriaxone. The binding energies were −7.75, −6.07, and −8.69 kcal/mol, respectively. These values were higher (less negative) than those of our compounds. Hence, the chosen compounds in this project demonstrated better in silico binding affinities to the SARS-CoV-2 NP NTD than the control drugs.
Microscale thermophoresis experiments with compounds (1), (2), and (4) verified the binding activity predicted in silico. Compound (1) resulted in a Kd value of 798 ± 2.03 nM, which was the lowest of all active compounds, suggesting a good potential as a SARS-CoV-2 NP inhibitor. On the other hand, compound (2) showed the highest Kd value with 22.79 ± 1.52 µM and was, therefore, excluded from further analysis. Next, we tested these active compounds with the SARS-CoV-2 NP NTD and confirmed that compounds (1) and (4) bound specifically to the NTD. Even though the sequences of both NTDs are highly conserved, the conformation or folding of the NTD of SARS-CoV-1 may differ from that of SARS-CoV-2, leading to different binding energies and Kd values [43].
The new omicron variants of SARS-CoV-2 that emerged at the end of 2021 [44] contained mutations not only in the spike protein but also in NP [45]. Therefore, we were interested in testing our candidate compounds with the NP of the SARS-CoV-2 omicron mutant as well. Compound (1) bound with a Kd value of 12.43 ± 0.26 µM. This Kd value was higher than that of compound (1) binding to the wildtype NP of SARS-CoV-2 (Kd = 798 ± 2.03 nM). The omicron variant has no mutations within the NTD sequence of NP (aa 48–174) but outside of it (B.1.1.529: P13L, ERS31-33del, R203K, G204R) (https://de.acrobiosystems.com/P4496-SARS-CoV-2-Nucleocapsid-protein-His-Tag-%28B11529Omicron%29.html (accessed on 16 May 2022)). These mutations can influence the conformation of the full-length protein, since they are present within the dynamic phosphorylatable linker region (LKR) [12,15], and can therefore have an impact on the binding of compounds to NP. We also compared our MST results to the binding affinity of RNA to NP, since RNA is the natural ligand of this protein. Wu et al. (2021) measured the binding affinities between NP and RNA via a fluorescence polarization assay and calculated a Kd value of 0.007 ± 0.001 µM for the NP wildtype. This value was about two orders of magnitude lower than the Kd value of compound (1), the most active of our compounds, for NP. This difference is possibly due in part to the different methods used to determine the Kd value. These authors also found that the binding affinity to the NP NTD alone was much lower compared to the full-length protein [46]. In contrast, the Kd values for NP and NP NTD were similar for our compounds.
MST is a very sensitive method for analyzing the binding between proteins and ligands. We used the labeling MST technique, since the specific labeling of proteins with fluorescent markers reduces interference from visible and UV-active ligands. This might also apply to the compounds presented here because of their aromatic systems [47]. On the other hand, the high sensitivity might be a limitation of MST. Sample preparation must be carried out very precisely, since very small concentrations and volumes are used. Throughout the whole process, from labeling to measuring, there are many sources of error [48].
The molecular docking analyses suggested that compounds (1) and (4) interact with key residues of SARS-CoV-2 NP NTD, including Ser4, Arg45, Tyr62, Tyr64, and Arg102. This may explain the inhibitory capacity of the two compounds, since these amino acid residues are involved in the RNA-binding activity of NP.
Compounds (1) and (4) showed low toxicity to human lung cells. Both compounds have a relatively high logP, making them more hydrophobic. This could possibly also affect their toxicity and off-target effects. Local anesthetics may serve as an example, since they have lipophilic characteristics [49]. However, it should be mentioned that hydrophobic characteristics can increase cellular absorption because of higher affinities to the lipid membrane [50]. This could be a positive effect of these compounds.
In the MST experiments, compound (1) reached Kd values of <1 µM for SARS-CoV-2 NP and NTD and a Kd value of <13 µM with the omicron variant, equal to the Kd value for SARS-CoV-1 NP. Compared to the IC50 value in A549 cells, these Kd values were more than 50-fold and 4-fold lower, respectively. For compound (4), a Kd value of ~8 µM was obtained for SARS-CoV-2 NP and NTD, and <1 µM for SARS-CoV-1 NP. The Kd value for SARS-CoV-2 NP was more than 11-fold lower than the IC50 value in MRC-5 cells. Hence, we concluded that compounds (1) and (4) did not show any cytotoxicity if used in the concentration range of the measured Kd values.
Compounds (1) and (4) are derivatives of the natural product coumarin. No further specific information was available regarding the natural origin of either compound. Chromene derivatives are usually secondary metabolites of plants such as Poaceae and Faboideae [50]. Typical plants containing coumarin are Melilotus officinalis, Galium odoratum, and Prunus mahaleb, but also plants from other families such as Phoenix dactylifera, Dipteryx odorata, and several Cinnamomum species. Therefore, we have to leave it open whether our compounds are plant metabolites, metabolites in the human body, or semisynthetic derivatives that do not occur in nature. Nevertheless, we suggest that these two compounds may represent promising chemical scaffolds for further development against COVID-19 and other coronaviral infections.
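The fold differences quoted above follow from dividing the IC50 by the corresponding Kd. A minimal sketch of this arithmetic is shown here, using the mean values reported in this article without their error margins. The function and variable names are hypothetical.

```python
def fold_margin(ic50_um: float, kd_um: float) -> float:
    """Ratio of cytotoxic IC50 to binding Kd (both in µM); larger means a wider margin."""
    if kd_um <= 0:
        raise ValueError("Kd must be positive")
    return ic50_um / kd_um


# Mean values as reported in the text (µM)
ic50_a549_cmpd1 = 51.02   # compound (1), A549 cells
kd_cmpd1_wildtype = 0.798  # compound (1), full-length SARS-CoV-2 NP
kd_cmpd1_omicron = 12.43   # compound (1), omicron NP
ic50_mrc5_cmpd4 = 93.39   # compound (4), MRC-5 cells
kd_cmpd4_wildtype = 8.07   # compound (4), full-length SARS-CoV-2 NP

print(round(fold_margin(ic50_a549_cmpd1, kd_cmpd1_wildtype), 1))  # more than 50-fold
print(round(fold_margin(ic50_a549_cmpd1, kd_cmpd1_omicron), 1))   # about 4-fold
print(round(fold_margin(ic50_mrc5_cmpd4, kd_cmpd4_wildtype), 1))  # more than 11-fold
```

Note that a Kd from a binding assay and an IC50 from a cell viability assay measure different things, so this ratio is only a rough indication of the margin between binding and cytotoxic concentrations.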

Virtual Screening with Vina LC
Virtual screening and estimation of binding affinities were performed using an HPC "snakemake workflow" that implemented automated steps of structure-based screening. The PDB files were converted to PDBQT files (Protein Data Bank Partial Charge and Atom Type). Heteroatoms were removed. The grid box dimensions (x, y, z) were fitted to include all possible active amino acids of the RNA-binding site, and the structures were saved in "gpf" format. Using AutoDock 4.2.6, a Lamarckian algorithm was applied to perform docking between the active compounds and the SARS-CoV-2 NP NTD, to identify the amino acids responsible for hydrophobic interactions and H-bonds, and furthermore to create the graphic presentation of the protein domain with the bound compound.
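Fitting a grid box around the residues of a binding site can be sketched as an axis-aligned bounding box with some padding. The following is a minimal illustration under stated assumptions, not the procedure used in the workflow: the atom coordinates, the 4 Å padding, and the function name are hypothetical, and the result is expressed as a centre and edge lengths in Å, as used in AutoDock Vina-style configurations.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def grid_box(coords: Iterable[Point], padding: float = 4.0) -> Tuple[Point, Point]:
    """Return (center, size) of an axis-aligned box enclosing all coordinates.

    `padding` (in Å, an assumed value) is added on every side so that
    ligands can be placed around the selected residues.
    """
    pts = list(coords)
    if not pts:
        raise ValueError("at least one coordinate is required")
    mins = [min(p[i] for p in pts) for i in range(3)]
    maxs = [max(p[i] for p in pts) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Hypothetical atom coordinates of binding-site residues (illustration only)
residue_atoms = [(10.0, 5.0, -2.0), (14.0, 9.0, 2.0), (12.0, 3.0, 0.0)]

center, size = grid_box(residue_atoms, padding=4.0)
print("center:", center)
print("size:", size)
```

In practice, the coordinates would be read from the ATOM records of the selected residues in the prepared receptor file, and the padding would be chosen to suit the size of the ligands being screened.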

Cell Lines
A549 lung cancer cells are frequently used in COVID-19 studies [33,54]. They were obtained from the Tumor Bank of the German Cancer Research Center (DKFZ, Heidelberg, Germany) and were maintained in Gibco™ RPMI 1640 medium with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin (P/S). Human diploid MRC-5 lung fibroblasts were kindly provided by Dr. Sebastian Zahnreich (Department of Radiation Oncology and Radiation Therapy, University Medical Centre of the Johannes Gutenberg University, Mainz, Germany). MRC-5 cells were cultivated in Gibco™ DMEM, low glucose, pyruvate medium with 15% FBS, 1% P/S, and 1% Gibco™ MEM non-essential amino acids. Both cell lines were incubated at 37 °C, 5% CO2, and 90% humidity. A549 cells were passaged every third day and MRC-5 cells every 6–7 days.

Cytotoxicity Assay
Cytotoxicity was tested using a resazurin reduction assay [55,56]. Exponentially growing A549 and MRC-5 cells were seeded in 96-well plates at a density of 10^4 cells/well. Different compound dilutions ranging between 0.003 and 100 µM were added in a total volume of 200 µL and incubated for 72 h. Thereafter, 20 µL/well resazurin (0.01% w/v) was added (Sigma Aldrich, Taufkirchen, Germany). Fluorescence was measured after 4 h of incubation with an Infinite M200 Pro plate reader (Tecan, Crailsheim, Germany). Dose-response curves were generated by calculating the percentage of viable cells in treated samples compared to untreated control samples. The 50% inhibition concentration (IC50) was calculated from three independent experiments with six parallel measurements each.
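The two calculation steps described above, normalizing fluorescence to untreated controls and reading the IC50 from the dose-response curve, can be sketched as follows. The fitting procedure used in this study is not stated, so the sketch swaps in a simple log-linear interpolation between the two concentrations that bracket 50% viability; a sigmoidal curve fit would be a common alternative. All function names and data points are hypothetical.

```python
import math
from typing import Optional, Sequence


def percent_viability(treated: float, untreated_mean: float, blank: float = 0.0) -> float:
    """Fluorescence of a treated well as a percentage of the untreated control."""
    return 100.0 * (treated - blank) / (untreated_mean - blank)


def ic50_by_interpolation(concs_um: Sequence[float], viability: Sequence[float]) -> Optional[float]:
    """Estimate the IC50 (µM) by linear interpolation on a log10 concentration axis.

    `concs_um` must be ascending and positive. Returns None if viability
    never crosses 50% in the tested range (reported as "no toxicity" here).
    """
    for i in range(len(concs_um) - 1):
        v_lo, v_hi = viability[i], viability[i + 1]
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_lo = math.log10(concs_um[i])
            log_hi = math.log10(concs_um[i + 1])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical dose-response data (illustration only)
concs = [1.0, 10.0, 100.0]
viab = [90.0, 70.0, 30.0]

print(ic50_by_interpolation(concs, viab))
print(ic50_by_interpolation(concs, [95.0, 90.0, 80.0]))
```

Returning `None` when the curve does not reach 50% viability mirrors the way compounds without measurable toxicity up to the highest tested concentration are reported without an IC50 value.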

Conclusions
In this work, we first performed in silico studies to find possible inhibitors of the nucleocapsid protein NTD of the seven currently known human-pathogenic coronaviruses, especially SARS-CoV-2. From these results, we chose five compounds for in vitro testing using MST. The binding of compounds (1) and (4) to wildtype SARS-CoV-2 NP and to the NP NTD could be confirmed. Compound (1) also bound to the NP of the omicron variant. Both compounds demonstrated low or no toxicity towards lung cells. Since this is one of the first attempts to find compounds against the coronaviral nucleocapsid protein, further improvements in compound selection and in the planning of future studies can be considered to substantiate the activity of these compounds. Although compound (1) bound to both the SARS-CoV-2 and SARS-CoV-1 wildtype NP, its binding to the omicron variant was reduced. Compound (4) did not bind to the omicron variant. Therefore, it is desirable to extend the search to more compounds in the future and to conduct further experiments on the mechanism of action in addition to MST.