Two Ligand-Binding Sites on SARS-CoV-2 Non-Structural Protein 1 Revealed by Fragment-Based X-ray Screening

The regular reappearance of coronavirus (CoV) outbreaks over the past 20 years has caused significant health consequences and financial burdens worldwide. The most recent and still ongoing novel CoV pandemic, caused by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) has brought a range of devastating consequences. Due to the exceptionally fast development of vaccines, the mortality rate of the virus has been curbed to a significant extent. However, the limitations of vaccination efficiency and applicability, coupled with the still high infection rate, emphasise the urgent need for discovering safe and effective antivirals against SARS-CoV-2 by suppressing its replication or attenuating its virulence. Non-structural protein 1 (nsp1), a unique viral and conserved leader protein, is a crucial virulence factor for causing host mRNA degradation, suppressing interferon (IFN) expression and host antiviral signalling pathways. In view of the essential role of nsp1 in the CoV life cycle, it is regarded as an exploitable target for antiviral drug discovery. Here, we report a variety of fragment hits against the N-terminal domain of SARS-CoV-2 nsp1 identified by fragment-based screening via X-ray crystallography. We also determined the structure of nsp1 at atomic resolution (0.99 Å). Binding affinities of hits against nsp1 and potential stabilisation were determined by orthogonal biophysical assays such as microscale thermophoresis and thermal shift assays. We identified two ligand-binding sites on nsp1, one deep and one shallow pocket, which are not conserved between the three medically relevant SARS, SARS-CoV-2 and MERS coronaviruses. Our study provides an excellent starting point for the development of more potent nsp1-targeting inhibitors and functional studies on SARS-CoV-2 nsp1.


Introduction
In the past two decades, coronaviruses (CoVs) have posed a constant threat to humans in various forms, from Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), the etiological agent for the outbreak of severe acute respiratory syndrome (SARS) in 2002 [1], to the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in 2012 [2], and the most recent novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), leading to the ongoing coronavirus pandemic first emerging at the end of 2019 in Wuhan, China [3]. The high infectivity and fatality of CoVs have caused great public concern [4] and its worldwide impact is causing a huge financial burden [5][6][7]. These events stress the importance of timely precaution and effective curative strategies such as vaccination and the development of antiviral agents, which significantly curbed the effects of the disease [8,9]. Even though vaccines are great at reducing hospitalisations and deaths the effectiveness of these is limited to the present variants [10,11] and the applicability and safety are unsatisfying for certain populations [12][13][14]. Therefore, there is a high number of infected individuals, both unvaccinated and breakthrough cases in the vaccinated, in urgent need for antivirals to alleviate their symptoms, and prevent complications. Remdesivir is an antiviral drug approved for the treatment of COVID-19, exerting its effect mainly by disrupting viral RNA synthesis [15]. The Ritonavir-boosted protease inhibitor PF-07321332 (Paxlovid) and various anti-SARS-CoV-2 monoclonal antibodies have been authorised for the use in the European Union for the treatment of COVID-19. Other medications are currently being evaluated in clinical trials for the treatment of COVID-19 [16][17][18]. All antivirals were more effective when administered early in the disease course [19,20]. There are also limitations that these antiviral agents are not available to all patients due to drug-drug interactions and contraindications [21,22]. Therefore, the discovery of specific anti-viral drugs for SARS-CoV-2 is urgently required.
The CoV genome is composed of a single, 5 -capped and 3 -polyadenylated RNA [23]. The first two-thirds of the genome encode two overlapping open reading frames (ORFs), ORF1a and ORF1b that produce polyproteins 1a and 1ab in a ribosomal frameshift. Downstream of this region, the remaining one-third of the genome encodes structural proteins (spike, envelope, membrane, and nucleocapsid) and several ORFs that produce the accessory proteins [24]. The polyproteins 1a and 1ab are then processed post-translationally by a virus-encoded protease, known as non-structural protein 5 (nsp5 or main protease) to produce other non-structural proteins (nsps) [25].
Among these proteins, non-structural protein 1 (nsp1) encoded by the gene closest to the 5 end of the viral genome received extensive attention. Nsp1 proteins of β-CoVs display similar biological functions in suppressing host gene expression [26,27] and inhibiting the innate immune response to virus infection [25,26]. SARS-CoV and SARS-CoV-2 nsp1 induce a near-complete shutdown of host protein translation by a three-pronged strategy: First, it binds to the small ribosomal subunit and stalls canonical mRNA translation at various stages [28,29]. Second, nsp1 binding to the host ribosome leads to endonucleolytic cleavage and subsequent degradation of host mRNAs [28]. The interaction between nsp1 and a conserved region in the 5 untranslated region (UTR) of viral mRNA releases the shutdown of viral protein expression as a self-protection mechanism [30,31]. Third, nsp1 interacts with heterodimeric host messenger RNA (mRNA) export receptor NXF1-NXT1, which is responsible for nuclear export of cellular mRNAs, causing a significant number of cellular mRNAs to be retained in the nucleus during infection [32]. Likewise, MERS-CoV nsp1 also exhibits a conserved function to negatively regulate host gene expression, despite employing a different strategy that selectively inhibits the host translationally competent mRNAs [33]. After infection, nsp1 also contributes to the evasion of the innate anti-viral immune response, mainly by antagonising interferon (IFN) induction and downstream signalling. In view of the crucial roles of nsp1 in the replication and virulence of CoVs, it represents an attractive target for both vaccine development and drug discovery.
The crystal structure of the N-terminal domain of SARS-CoV-2 nsp1 has been determined to 1.65 Å resolution (PDB entry 7K3N) [34] and the small C-terminal domain in complex with the human 40S ribosome (PDB entry 7K5I) [31] has been determined by cryo-electron microscopy providing useful structural information although the entire nsp1 structure could not be determined, probably due to its flexible linker region connecting these two distinct domains.
In this project we optimised crystallisation conditions and determined the structure of the N-terminal domain of nsp1 to atomic resolution of 0.99 Å, which served as a basis for fragment-based screening via X-ray crystallography. We identified two ligand binding pockets on nsp1, which bind a range of fragments. The fragment hits were further characterised by orthogonal biophysical assays. These fragments will serve as a basis for subsequent structure-based drug design.

Structure Determination of Nsp1 to Atomic Resolution
SARS-CoV-2 nsp1 is a protein of 180 residues consisting of an N-terminal domain (residues 1 to 128), followed by a linker region (residues 129 to 148) and a smaller C-terminal domain (residues 149 to 180) ( Figure 1A). We first generated an expression construct covering full-length SARS-CoV-2 nsp1 (residues 2 to 180) and an expression construct covering almost the entire N-terminal globular domain (residues 10 to 126). DNA inserts coding for residues 2 to 180 and 10 to 126 of SARS-CoV-2 nsp1, named nsp1 2-180 and nsp1  , respectively, throughout the manuscript, were codon optimised for expression in E. coli and the proteins were expressed and purified. Nsp1 10-126 was subjected to crystallisation trials using published crystallisation conditions [34], while no crystallisation conditions for Nsp1  have been reported so far. However, the published crystallisation condition of Nsp1 10-126 did not yield any crystals. We therefore tested a variety of commercial crystallisation screens using both proteins. The screening of Nsp1 10-126 yielded three types of crystals with distinct morphology (supplementary material Section Table S1), whereas no crystals appeared in the screens of Nsp1 2-180 , possibly due to the flexible linker between two structured domains hindering crystal formation. Nsp1 10-126 crystals were then cryo-cooled in liquid nitrogen, tested for diffraction quality and data were processed, all yielding the same space group P4 3 2 1 2 with very similar cell parameters (a = b = 36.81 Å, c = 140.96 Å, α = β = γ = 90 • ) and identical to the published space group [34].   The structure of Nsp1 10-126 was determined by molecular replacement (MR) and refined to atomic resolution of 0.99 Å. Data collection and refinement statistics are summarised in Table 1. The asymmetric unit contains one molecule of nsp1. The final model covers residues Glu10 to Asn126. The high resolution reveals a detailed structure of SARS-CoV-2 nsp1. The structure of SARS-CoV-2 nsp1 10-126 features a unique topological arrangement resulting in the formation of a seven-stranded (n = 7) β-barrel that is primarily antiparallel, except for strands β1 (His13-Val20) and β2 (Cys51-Val54) ( Figure 1B,C). Additional major structural features include helix α1 (Val35-Asp48), which is positioned as a cap along one opening of the β-barrel, three 3 10 helices that run parallel to each other, and the strand β5 (ILe95-Tyr97), which is not part of the β-barrel but forms a β-sheet interaction with the β4 strand (Val84-Leu92). Compared to the recently published structures determined at 1.65 Å and 1.77 Å resolution (PDB entries 7K3N and 7K7P) [34,35], our atomic structure identified a third 3 10 helix in the flexible loop region between β3 and β4 (residues 80 to 82). As the density indicates that this (loop) region may have various conformations, we tentatively fitted one conformation into the density. In addition, our structure also reveals a new strand, β6, and shows a longer strand β5 than the previously published structures. This complete model refined to higher resolution provides a more accurate structural picture of nsp1 and will serve as an excellent starting point for fragment-based screening.

Fragment Screening via X-ray Crystallography
Fragment screening was performed by soaking 584 fragments from the Maybridge Ro3 library into nsp1 10-126 crystals followed by validation through X-ray diffraction experiments in fully automated mode at beamline MASSIF-1 [36,37]. In addition, 40 nsp1 crystals without fragments were prepared to construct the Ground State (GS) model for data analysis in the multi-crystal software system PanDDA [38]. The overall resolution of data sets for the GS was 1.3 Å. 398 fragments were accepted for PanDDA analysis. Datasets that were rejected for the high R free values (14) were reprocessed and manually inspected. Potential fragment hits were discovered by PanDDA (34), and nine of them were verified in the single crystal system by manual inspection in COOT [39] followed by refinement in Phenix [40]. Here, we characterise five selected fragment hits. Data collection and refinement statistics for SARS-CoV-2 nsp1 10-126 -fragment complexes are summarised in Table S2 of the supplementary material section and the chemical structures of the fragment hits are shown in Figure 2. Fragments 10B6, 11C6 and 5E11 represent hits in which a phenyl group is connected to a 5-membered heterocyclic ring system. The second group of fragments represented by 7H2 and 8E6 contain a single phenyl ring system containing single or double substitutions.

Identification of Ligand Binding Sites in SARS-CoV-2 Nsp1
By overlaying all nsp1-fragment complexes, we could identify two distinct ligandbinding pockets. The majority of fragment hits, 3, bind to ligand binding site I, whereas 2 of the 5 hits bind to the shallower binding site II ( Figure 3A). Binding site I consists of strands β1 and β7 as well as residues of helix α1. Ligand-binding site II is formed by nsp1 and one of its symmetry mates involving the loop region between the first 3 10 helix and helix α1, two residues of α1 and residues in the loop connecting strands β6 and β7. Residues from the symmetry mate involved in binding are located in and close to the first 3 10 helix and in the second 3 10 helix. The shortest distance between the two binding pockets is about 18 Å and seems to be challenging to bridge for fragment linking strategies. The characteristics of these two pockets are summarised in Table 2.

Fragment Hits Binding to Pocket I Induce Small Structural Rearrangements
Having high-resolution native and fragment-bound structures allows us to discuss ligand-induced structural changes in detail. Fragment 11C6 was chosen as an example. With respect to binding site I, Glu10, Lys47 and Lys125 show the most obvious changes. Whereas in the native atomic structure the side chain of Lys125 is only partially visible, it is fully visible in the ligand bound structure, as it runs parallel to the fragment and forms hydrophobic interactions with one of the phenol rings and displays a hydrogen bond interaction with Glu10 ( Figure 3B). In the fragment bound structure the Glu10 side chain density is therefore mostly visible, while it is absent in the native structure. The density for the Lys47 side chain is also only partially visible in both native and fragment bound structures but some positive density in the native structure suggests one conformation extending toward the space, which the fragment would occupy. In the native structure, two water molecules are located in, or close to, the space otherwise occupied by the 3chlorophenyl group of the ligand, but these are absent in the fragment bound structure. In conclusion, binding of fragments to this site leads to a stabilisation of surrounding residues indicating a small but noticeable induced fit mechanism caused by fragment binding. In contrast, binding of fragment 7H2 to binding site II does not induce any significant structural changes in its surrounding environment.

Detailed Description of SARS-CoV-2 Nsp1 10-126 -Fragment Interactions
Fragments hits 10B6, 11C6 and 5E11 are chemically related as they contain a substituted phenyl ring connected to a five-membered heterocyclic thiophen or thiazol ring system. All three bind to ligand binding site I, establishing various interactions.
10B6 establishes various hydrophobic interactions with a range of residues of binding pocket I ( Figure 4A). The 2-amino group establishes a hydrogen bond interaction with a water molecule. Fragment 11C6 maintains the overall binding conformation in the same pocket ( Figure 4B). However, this fragment contains a chloride substituent in the meta-position instead of the para-substituted hydroxyl group. Binding occurs exclusively through hydrophobic interactions, but compared to 10B6, 11C6 displays significantly less interactions with nsp1. Interestingly, the orientation for fragment 5E11 is reversed whereby the 6-and 5-membered rings switch positions ( Figure 4C). Overall, many hydrophobic interactions of the fragment hit with residues in binding pocket I are maintained, but the nitrogen of the Lys47 side chain now establishes a hydrogen bond interaction with the sulphur of the thiophen ring system, while a N-methyl methanamine substituent on the 6-membered ring system does not establish a hydrogen bond interaction. This example shows the importance of high-quality and high-resolution structural information as mostly flat fragments can be easily fit into the electron density in various orientations and a wrong orientation of the ligand could easily lead to incorrect conclusions in subsequent SAR studies.
Fragments 7H2 and 8E6 bind to a distinct site on SARS-CoV-2 nsp1 10-126 named binding pocket II ( Figure 4D-G). The common feature of these two hits is that they both possess a single substituted phenyl ring system.
7H2 displays various hydrophobic interactions with residues of SARS-CoV-2 nsp1 10-126 ( Figure 4D). The ethan-1-amine group establishes a hydrogen-bond interaction through a water molecule with the main chain oxygen of Phe31. The opposite chloride substituent does not seem to contribute to binding to SARS-CoV-2 nsp1 10-126 . It is noticeable that compared to fragments binding to pocket I, 7H2 has a reduced number of interactions with residues of the protein, probably because this ligand binding site II is shallower than pocket I. Compared to 7H2, 8E6 is a mono-substituted fragment lacking the chloride substituent but carrying an amin-1-ethan-1-ol substituent instead of the ethan1-1-amine ( Figure 4E). As the data for this complex were particularly good, with a resolution of 1.18 Å, we placed the fragment in all possible orientations into the electron density and conducted refinement for six distinct orientations. The best refinement was obtained in which the substituent points towards the solvent, establishing a hydrogen-bond interaction with the side chains of Glu41 and His45 through a water molecule. Despite essentially observing electron density for only part of the fragment, 8E6 also establishes hydrophobic interactions with Pro109 and Val111, similar to 7H2. In contrast to 10B6, 11C6 and 5E1, 7H2 ( Figure 4F) and 8E6 ( Figure 4G) also establish interactions with a symmetry mate generating a set of novel interactions, which requires orthogonal assays to establish if there is a "real" binding site, or if these only bind in the presence of the arrangement dictated by the crystal.

Orthogonal Biophysical Assays to Further Validate Fragment Hits
To further validate SARS-CoV-2 nsp1 10-126 -targeting fragment hits, we employed two orthogonal biophysical assays, thermal shift assays (TSA) and microscale thermophoresis (MST). For MST experiments, the binding affinities between SARS-CoV-2 nsp1 10-126 and fragment hits were evaluated using the Monolith NT. 115 instrument with the pico-red fluorescent channel. The MST results of the fragments are summarised in Table 3. The K d values vary between 480 µM and larger than 20 mM. Interestingly, fragment hits binding in the "deeper" binding site systematically show better K d values than the two hits binding in the shallower site II which display weaker K d values. Crystallographic analysis of the binding site for 7H2 and 8E6 revealed that a symmetry mate contributed to the binding of the two fragments to nsp1, raising the possibility that both proteins in the particular crystallographic arrangement are required for binding and that these two fragments may not bind to nsp1 in solution. However, the calculated K d values clearly show that binding to a SARS-CoV-2 nsp1 10-126 monomer in solution is sufficient.
For TSA, nsp1 in the presence of an equal concentration of DMSO was used as a control to offset the influence of DMSO on the protein. The averaged change of inflection temperature ( T i ) of the protein in the presence of fragments higher than three-fold of standard deviation (| Ti|-3*SD > 0) was regarded as statistically significant. As can be seen from Table 3, except for 5E11, which showed atypical unfolding curves, all fragment hits influenced the unfolding property of SARS-CoV-2 nsp1 10-126 . Destabilisation effects were measured for fragment hits 8E6, 10B6, 7H2 and 11C6. In this regard, TSA may not be suitable yet for the validation of hits against SARS-CoV-2 nsp1 10-126 , as we have recently also observed for another SARS-CoV-2 protein, nsp10 [41]. With respect to physicochemical properties, calculated values for fragments are well within the expected range, indicating good starting points for structure-based drug design. Table 3. Summary of X-ray crystallography, MST and TSA results for fragment hits targeting SARS-CoV-2 nsp1 10-126 and selected calculated properties. The ∆T i values of SARS-CoV-2 nsp1 10-126 in the presence of fragments obtained from nanoscale differential scanning fluorimetry (nanoDSF) and fragment affinities for SARS-CoV-2 nsp1 10-126 determined by MST are shown. TSA and MST experiments were conducted at least in triplicate. The T i of SARS-CoV-2 nsp1 10-126 in the presence of 2.0% DMSO is 54.05 • C, which was used as a control in ∆T i calculations. The following chemical properties were calculated by Molsoft's ICM Pro software [42]. MW, molecular weight; PSA, polar surface area; MollogP, calculated partition coefficient; HBA, hydrogen bond acceptor; HBD, hydrogen bond donor; MollogS, calculated solubility. *APUC: atypical protein unfolding curve. To expand our investigation to other β-coronaviruses of medical relevance, fragment hits identified by X-ray crystallography were also tested on SARS-CoV-1 and MERS nsp1s to investigate potential cross-binding effects using MST (Table 4). With the high sequence identity of 86.3% between SARS-CoV-2 and SARS-CoV-1 nsp1s, only 7H2 shows no binding to SARS-CoV-1 nsp1 whereas 11C6 showed strong cross-binding effects. With lower sequence identity between MERS and SARS-CoV-2 nsp1, only 10B6 showed moderate binding affinity to MERS nsp1. This may be a first indication that developing pan inhibitors across medically relevant β-coronavirus by targeting nsp1 is challenging. To probe the homology of SARS-CoV-2, SARS-CoV-1, and MERS nsp1s, the sequences of the N-terminal domain of the three nsp1s were compared using structure-based sequence alignment ( Figure 5). Whereas nsp1 is highly conserved between SARS-CoV-1 and SARS-CoV-2 (86.3% identical residues, 8.6% strongly similar, 0.9% weakly similar, 4.3% different), the protein sequence from MERS differs significantly (19.4% identical residues, 21.0% strongly similar, 11.3% weakly similar, 48.4% different). A major difference between MERS and the two SARS sequences is the presence of four insertions in loop regions of the protein.
Consequently, most residues forming part of the two ligand binding pockets for the five fragment hits are therefore conserved between SARS-CoV-2 nsp1 and SARS-CoV-1 nsp1 but differ in MERS nsp1, in particular in binding site II.  Key residues in binding pockets I and II are indicated by blue and green triangles, respectively. The sequences were aligned using CLUSTAL W multiple alignment [43] and the secondary structure elements were extracted using ESPript 3.0 [44]. (B) Overlay of SARS-CoV-1 (coloured in light grey) and SARS-CoV-2 (coloured in light blue) highlighting the structural differences around binding pocket I (RMSD of 0.748 Å). Pairwise key residues of the two nsp1s in pocket I interacting with fragment hits are shown in sticks. Glu10 is absent in the NMR structure of SARS-CoV-1 nsp1 (PDB entry 2HSX).
As the structure of MERS nsp1 has not yet been determined, the structures of the two available SARS-CoV-1 and SARS-CoV-2 nsp1 proteins were overlaid highlighting key residues in the two binding pockets. For binding pocket I, almost all key residues of SARS-CoV-2 nsp110-126 involved in fragment binding are conserved in the N-terminal domain of SARS-CoV-1 nsp1. However, due to the noticeable displacement of the secondary structure elements in this region between the two proteins, these conserved residues are oriented in different directions or located at different locations ( Figure 5B). As even minor changes in these residues could greatly influence the interaction with fragment hits, these Key residues in binding pockets I and II are indicated by blue and green triangles, respectively. The sequences were aligned using CLUSTAL W multiple alignment [43] and the secondary structure elements were extracted using ESPript 3.0 [44]. (B) Overlay of SARS-CoV-1 (coloured in light grey) and SARS-CoV-2 (coloured in light blue) highlighting the structural differences around binding pocket I (RMSD of 0.748 Å). Pairwise key residues of the two nsp1s in pocket I interacting with fragment hits are shown in sticks. Glu10 is absent in the NMR structure of SARS-CoV-1 nsp1 (PDB entry 2HSX).
As the structure of MERS nsp1 has not yet been determined, the structures of the two available SARS-CoV-1 and SARS-CoV-2 nsp1 proteins were overlaid highlighting key residues in the two binding pockets. For binding pocket I, almost all key residues of SARS-CoV-2 nsp1 10-126 involved in fragment binding are conserved in the N-terminal domain of SARS-CoV-1 nsp1. However, due to the noticeable displacement of the secondary structure elements in this region between the two proteins, these conserved residues are oriented in different directions or located at different locations ( Figure 5B). As even minor changes in these residues could greatly influence the interaction with fragment hits, these structural changes could probably alter the binding for fragment hits on the N-terminal domain of SARS-CoV-1 nsp1, explaining their reduced binding affinity compared to SARS-CoV-2 nsp1. In binding pocket II, structural changes are less pronounced, but may be sufficient to abolish weak binding affinities of the fragments. Based on these structural comparisons, we therefore hypothesise that although SARS-CoV-1 and SARS-CoV-2 nsp1 share high sequence identities in binding sites I and II, the structural differences in these pockets probably explain the loss of binding affinity of fragment hits 5E11, 10B6, 7H2, and 8E6. A full understanding would require obtaining structural information on MERS and SARS-CoV-1 nsp1-fragment complexes.

The Predicted RNA Binding and Validated DNA Polymerase α-Primase Binding Regions Are in Close Proximity to Fragment Binding Sites I and II
The N-terminal nsp1 domain has been reported to release the expression shutoff for viral RNA by direct interaction with the stem loop 1 (SL1) [45]. Therefore, the spatial relationship between the binding pockets and the interaction regions was first investigated for potential overlaps of key residues for both fragment hits and mRNA binding. According to computational docking experiments between nsp1 and the host RNA SL1 region, nsp1 displays direct interactions mainly through residues 11-17, 118-130, and 144-148, where hydrogen bonds and ionic interactions were observed for a range of key residues [45]. Among these, Lys125 is both an important contributor to the two types of interactions with RNA but also interacts with fragment hits in binding pocket I ( Figure 6A). In view of the close proximity and this shared key residue, we speculate that targeting binding pocket I could interfere with RNA binding to nsp1, and subsequently impact on viral RNA expression.

Development of Resistance through Mutations in Nsp1 Ligand Binding Pockets?
Recently, Mou and co-workers, investigated close to 300,000 fully sequenced SARS-CoV-2 genomes for mutations in nsp1 [51]. For full-length nsp1 containing 180 residues they identified 933 non-synonymous mutations in the entire coding region, identifying at least one mutation for each single residue. The 19 most common mutations with a frequency between 113 and 1122 were reported. By comparing residues forming binding pocket I (Table 2) with the 19 residues with the highest frequency we conclude that none The primosome, the complex of DNA polymerase α (Pol α) and primase that is responsible for initiating DNA synthesis during DNA replication has recently been reported to be an important human host protein targeted by SARS-CoV-2 nsp1 [46,47]. In addition, the observation of disease-associated mis-splicing of POLA1 coding for the catalytic subunit of Pol α resulting in abnormal IFN I response and auto-inflammatory manifestation reveals the participation of Pol α in innate immunity [48,49]. By inspecting the interaction surface between SARS-CoV-2 nsp1 and the catalytic subunit of Pol α from the determined cryoelectron microscopy structure of the SARS-CoV-2 nsp1-primosome complex [50], we found that the fragment binding pocket II is located at the centre of the interaction interface and shares several key residues (Val28, Phe31, Glu41, His45, Pro109 and Val111) with the primosome ( Figure 6B). Therefore, we suggest that targeting binding pocket II could interrupt the interaction between SARS-CoV-2 nsp1 and the primosome, thus releasing the inhibition on the primosome in DNA synthesis and immune response.

Development of Resistance through Mutations in Nsp1 Ligand Binding Pockets?
Recently, Mou and co-workers, investigated close to 300,000 fully sequenced SARS-CoV-2 genomes for mutations in nsp1 [51]. For full-length nsp1 containing 180 residues they identified 933 non-synonymous mutations in the entire coding region, identifying at least one mutation for each single residue. The 19 most common mutations with a frequency between 113 and 1122 were reported. By comparing residues forming binding pocket I (Table 2) with the 19 residues with the highest frequency we conclude that none of the residues of this binding site belongs to the group with highest frequency, indicating that these residues do not seem to be among the group of residues that mutate easily. However, all residues in binding site I display some tendency to mutate with frequencies between 4 and 9. As binding pocket II is very shallow, only a limited number of SARS-CoV-2 nsp1 10-126 residues establish interactions with the fragment hits. Interestingly, one of the three residues, His45, belongs to the group of residues with most frequent mutations. However, since the interaction occurs via the main chain but not through the side chain, this mutation should have no significant effect, in case they do not significantly alter the local structure. In contrast, in the symmetry mate, Arg24 is one of the residues forming the ligand binding pocket. This residue shows the tendency to mutate with the highest frequency (1122) of all residues. In conclusion, binding pocket II is not only shallower than binding pocket I, but its residues involved in the formation of the pocket also show a higher tendency to mutate. We hypothesise that SARS-CoV-2 may be able to overcome inhibitors in binding pocket II by either mutating key residues involved in inhibitor binding, or through primary resistance as mutations may already be present in the virus.

The Rationale for Targeting N-and C-Terminal Domains of Nsp1
After infection of host cells, SARS-CoV-2 nsp1 can bind to the 40 S subunit of host ribosomes to block the entry of the host mRNA expression channel through its C-terminal domain. The C-terminal domain is connected by a linker region to its N-terminal globular domain, which remains tethered around the entry site [30]. As a result, host mRNAs are inaccessible to the expression cleft. Therefore, the host ribosome is hijacked by SARS-CoV-2 nsp1 to mainly serve the production of viral proteins. Targeting the C-terminal SARS-CoV-2 nsp1 domain would thus represent a valid strategy to interfere with its binding to the 40 S subunit and closure of the channel in the small ribosome subunit. However, this approach remains challenging as this small domain appears to be unstructured [52], only folding into two helices upon binding to the ribosome [30].
Residues in the N-terminal and linker region of SARS-CoV-2 nsp1 are not involved in docking into the mRNA entry channel, but they do stabilise its association with the ribosome, enhancing its restriction of host gene expression [53]. Ribosome-bound SARS-CoV-2 nsp1 further induces endonucleolytic RNA cleavage in the 5 -UTR of host mRNAs [54,55]. However, viral mRNA with a special stem-loop 1 (SL1) within the leader sequence in the 5 -UTR can circumvent the SARS-CoV-2 nsp1-mediated translational inhibition and cleav-age by directly interacting with the N-terminal domain and the linker region of the protein, ensuring sufficient viral protein expression [31,45,56]. It was also reported that the activity of SARS-CoV-2 nsp1 in translation termination is provided by its N-terminal domain by binding with the 80 S ribosomes translating host mRNAs and removing them from the pool of the active ribosomes [57]. Additionally, the interaction between the N-terminal globular domain of SARS-CoV-2 nsp1 and the primosome, which is essential for host DNA replication, identified by biochemical and structural characterisation, suggests that targeting the primosome is part of a novel mechanism driving SARS-CoV-2 infections [50]. In view of the important roles played by both N-and C-terminal domains in host gene expression and escape of viral mRNA in translational repression, nsp1 appears to be a potential drug target.
In conclusion, we identified two ligand-binding pockets in the N-terminal domain of nsp1 that accommodate fragments and described their detailed molecular interactions. Whereas the first ligand binding site can be considered as a "true" pocket establishing multiple interactions between residues of SARS-CoV-2 nsp1 10-126 and fragment hits, the second ligand binding pocket is shallower with a reduced number of protein-fragment interactions and weaker K d estimates. In an accompanying paper, we have used extensive molecular dynamics simulations and fragment-based screening to highlight the cryptic nature of these pockets [58]. Furthermore, our fragment screening work is also an excellent example of how difficult it can be to fit fragment hits into positive electron density, despite high resolution between 1.1 to 1.4 Å and excellent data quality, when fragment hits are predominantly flat, substituents around the core chemical structure are flexible and fragment hits also interact with symmetry mates in the crystal. MST assays conducted using full-length nsp1 from SARS and MERS, two previously known β-CoVs with severe medical impact, to investigate cross-binding effects provide a first hint of how challenging it may be to develop pan CoV inhibitors against diverging medically relevant CoV such as SARS-CoV-2, SARS and MERS. The experimental identification of binding pockets and induced fit changes will be a good starting point for computational docking and screening for inhibitors with lower K d more accurately. Nevertheless, fragment hits will act as templates in future analogue design, combined with structure-based optimisation. Fragment hits reveal druggable binding pockets on SARS-CoV-2 nsp1 10-126 , which will provide novel insights into further drug design, but we assume that additional ligand binding pockets on SARS-CoV-2 nsp1 may be discovered in the future. Currently, the existence of two pockets allows for fragment growing and merging techniques but not fragment linking.

Subcloning, Expression and Purification of SARS-CoV-2 Nsp1 Constructs
Codon-optimised DNA for expression in E. coli for the N-terminal domain of SARS-CoV-2 nsp1 was obtained by extracting the protein sequence of the N-terminal domain from PDB entry 7K7P. In frame restriction sites for NcoI and XhoI were added at the 5 and 3 and the synthesised insert was subcloned into expression vector ppSUMO-2, containing a Ulp1-cleavable N-terminal histag and SUMO domain, using the same restriction sites. The expressed nsp1 protein construct codes for residues 10 to 126 of SARS-CoV-2 nsp1 and includes two unspecific residues, Thr-Met, at the N-terminus of the protein due to the cloning strategy.
200 µL of competent E. coli BL21-Gold (DE3) cells (Agilent Technologies) was gently mixed with 1 µL ppSUMO-nsp1 expression plasmid and incubated on ice for 30 min. A heat shock at 42 • C was employed for 45 sec followed by 2 min incubation on ice. Then, 400 µL of SOC medium (Sigma-Aldrich, UK) was added followed by incubation for 1 h on a shaker-incubator at 37 • C, 500-600 rpm. 200 µL of transformed cells were spread evenly on LB agar (MP Biomedicals, LLC France) plates supplemented with 50 µg/mL kanamycin and incubated at 37 • C overnight.
To express the nsp1 constructs, a single BL21-Gold (DE3) colony was picked and added into 35 mL of TB medium supplemented with 50 µg/mL kanamycin. The small culture was then incubated in a shaker at 37 • C, 220 rpm overnight. Subsequently, the small culture was separately added into 12 Erlenmeyer flasks each containing 1 L of TB medium supplemented with 50 µg/mL kanamycin and incubated in a shaker at 37 • C, 220 rpm for 5 h until the OD 600 value reached an OD between 0.6 to 0.9. The large cultures were kept at 4 • C for 20 min before being induced for protein expression using 1 mM (Isopropylthio-βgalactoside) IPTG and incubated at 18 • C, 220 rpm for 24 h. The following day, the culture was harvested by centrifugation at 8000 g (Avanti ® J-E centrifuge from Beckman Coulter, rotor: JLA 16.250), for 30 min at 4 • C. The cell pellets were then resuspended with 300 mL of buffer A (25 mM NaH 2 PO 4 , 25 mM Na 2 HPO 4 pH 7.5, 300 mM NaCl, 20 mM Imidazole and 1 mM PMSF), frozen in liquid nitrogen and stored at −80 • C.
All subsequent purification steps were performed at 4 • C. The cell pellets were thawed and sonicated for 15 rounds on ice for 30 sec followed by a one min rest interval in each round. The lysate was centrifuged at 20000 rpm for 45 min in centrifuge Beckman Coulter Avanti ® J-E with a, JLA 25.50 rotor). The supernatant was collected and loaded into a 5 mL HisTrap FF crude column pre-equilibrated with buffer A. The column was then washed with 50 CVs of buffer B (25 mM NaH 2 PO 4 , 25 mM Na 2 HPO 4 pH 7.5, 300 mM NaCl and 20 mM Imidazole) on a ÄKTA FPLC. Protein bound to the column was then eluted isocratically with 20 CVs of buffer C (25 mM NaH 2 PO 4 , 25 mM Na 2 HPO 4 pH 7.5, 300 mM NaCl and 250 mM Imidazole). The purity of the fractions was monitored by SDS-PAGE (Invitrogen by Thermo Fisher Scientific) and then combined and quantified with Bradford reagent (BioRad, Solna, Sweden). A quarter of the protein was transferred into a SnakeSkin ® Dialysis membrane tubing (10 kDa) in 2 L of buffer E (10 mM HEPES pH 7.6, 300 mM NaCl) overnight for buffer exchange. The protein was harvested and concentrated to 18 mg/mL using an Amicon Ultra Centrifugal Filters (10 kDa) for subsequent MST assays.
The remaining three quarters of the protein were dialysed in a SnakeSkin ® Dialysis membrane tubing supplied with 3 mM β-mercaptoethanol (β-ME) and Ulp1 protease (1 mg Ulp1 for 30 mg protein) in 2 L of buffer D (25 mM NaH 2 PO 4 , 25 mM Na 2 HPO 4 pH 7.6, 300 mM NaCl and 3 mM β-ME) overnight to cleave the His-SUMO fusion protein. After dialysis, nickel-affinity purification was applied. The cleaved protein was loaded into the second pre-equilibrated HisTrap FF crude column followed by washing with 3 CVs of buffer B. The cleaved nsp1 in the flow through was collected. The purity and the concentration were verified by SDS-PAGE and Bradford reagent. Afterwards, the protein was concentrated to 25 mg/mL using an Amicon Ultra Centrifugal Filters (10 kDa), aliquoted into 100 µL fractions, frozen in liquid nitrogen and stored at −80 • C for subsequent crystallisation.

Crystallisation of Nsp1
SARS-CoV-2 nsp1 10-126 crystallisation trials were conducted with sitting drop vapour diffusion using commercial crystallisation screens and the Mosquito ® crystallisation nanodrop robot (SPT Labtech, Melbourne, UK) on 96-well 3-drop plates (SWISSCI AG, Zug, Switzerland). The thawed SARS-CoV-2 nsp1 10-126 was diluted to 20 mg/mL with buffer E. Each drop was set up by mixing 200 nL of SARS-CoV-2 nsp1 10-126 with 200 or 400 nL of reservoir solution from the commercial crystallisation screens. The crystallisation plates were incubated at 18 • C. Large crystals appeared after incubation overnight in several crystallising conditions in three different shapes (supplementary Table S1).
Cryoprotectant solution was prepared by adding 10%, 15% or 20% DMSO into crystallisation buffer (0.1 M HEPES pH 7.5 and 25% w/v PEG3500). 200 nL of cryoprotectant was mixed with around 200 nL of protein drops with large crystals formed therein to give final DMSO concentrations of 5%, 7.5% and 10%. As three different shapes of crystals were observed, after incubation for 5 min at room temperature, large crystals of each type were harvested, cryo-cooled in liquid nitrogen, and stored in unipucks, which were also kept in liquid nitrogen ready for diffraction experiments.
Based on the X-ray diffraction results of the crystals, condition Index 44 (0.1 M HEPES pH 7.5, 25% w/v PEG3350) was chosen for further SARS-CoV-2 nsp1 10-126 crystallisation in large quantities. Frozen stocks of SARS-CoV-2 nsp1 10-126 were thawed on ice and centrifuged in a Thermo Scientific Pico 17 Microcentrifuge, 24-Pl Rotor at 4 • C, 20000 rpm for 10 min to remove aggregates before the determination of the protein concentration. Subsequently, the protein stock was diluted to 20 mg/mL with precrystallisation buffer. 400 uL of Index condition 44 (0.1 M HEPES pH 7.5 and 25% w/v PEG3350) was added into each reservoir well. Five protein drops were set on each cover slip by mixing 1 µL of protein solution with 1 µL of the reservoir. The 24-well Linbro plates were placed at 18 • C in an incubator.

Fragment Soaking of Nsp1 Crystals
For nsp1 fragment soaking we used the same vapour diffusion protocol as described above. Fragments were selected from the Maybridge Ro3 library at a concentration of 200 mM dissolved in DMSO. Fragments were diluted in Index 44 solution to 40 mM containing 20% DMSO. An equal amount of DMSO was diluted in Index 44 solution to obtain 20% DMSO. Then, 1.5 µL of each fragment solution were added into around 1 µL of crystallisation drops, making the final concentration of fragments 24 mM and approximately 12% DMSO. 1.5 µL of 20% DMSO solution was mixed with about 1 µL of crystallisation drops as protein-only samples, which were used to construct the ground state model required for PanDDA analysis. Drops were incubated at room temperature for 3-4 h followed by freezing of crystals in liquid nitrogen and stored in unipucks at the same temperature for subsequent diffraction experiments.

X-ray Diffraction Data Collection, Structure Determination and Refinement
For high-resolution data collection of SARS-CoV-2 nsp1 10-126 , protein crystals were soaked in 10% MPD for 15 sec and cryo-cooled in liquid nitrogen for subsequent data collection at ID30B. 2300 images were collected with 0.02 s exposure time, an oscillation of 0.05 degrees, and a beam size of 50 µm. Data collection of fragment-soaked nsp1 10-126 crystals was conducted at beamline MASSIF-1 [37] at the ESRF. The high-resolution diffraction data was indexed, processed and integrated using the XDS package [59] using a minimum 0.5 value for I/sigma and statistically significant CC(1/2) values to define the high resolution cutoff. The diffraction data for fragment-soaked protein samples were automatically processed by various data reduction packages autoPROC [60], autoPROC_staraniso [61], XDSAPP [62], XIA2_DIALS [63], EDNA_proc [64]) to generate merged and integrated data sets. The data sets with the best statistics in terms of resolution, completeness and merging statistics were downloaded from ISPyB. For the native high-resolution structure, molecular replacement (MR) was performed in Phenix with the published coordinate set of SARS-CoV-2 nsp1 10-126 (PDB entry 7K7P) as a search model. The obtained structure was first automatically refined in Phenix followed by manual inspection and real space refinement in COOT, and followed by at least a second round of refinement in Phenix. Several iterations of refinement in Phenix and visual inspection in COOT were performed until the R free value did not decrease any further. The model derived from refinement against the highest resolution data set was used as the model in the following MR procedure for SARS-CoV-2 nsp1 10-126 -fragment complex data sets.
To identify fragment hits binding to SARS-CoV-2 nsp1 10-126 , PanDDA, a multi-dataset crystallographic analysis program was employed. Each dataset contains a coordinate file with the suffix "pdb" and an electronic density map file with the suffix "mtz" for each protein sample along with the chemical structure files with the suffix "pdb" and "cif" for the fragment in which the SARS-CoV-2 nsp1 10-126 crystal was soaked. The "pdb" and "mtz" files were generated after MR by the program Dimple in CCP4 [65] with the new model, while the chemical structure files for fragments were generated by Elbow in Phenix. In addition, 40 high-resolution datasets of native SARS-CoV-2 nsp1 10-126 from the protein soaked in 12% DMSO solution were used to construct a "ground state" model of SARS-CoV-2 nsp1 10-126 in PanDDA.
After automatic analysis and inspection, the identified hits were verified manually in COOT by fitting the chemical structures of the fragments into their corresponding electron density. The complexes were then refined in Phenix to verify the correctness of the fragment conformations in binding events.

Thermal Shift Assays for Nsp1-Fragment Complexes Using nanoDSF
The protein concentration used in this assay was determined by analysing the unfolding curves of SARS-CoV-2 nsp1 10-126 at different concentrations in the presence of 2% DMSO. Fragments dissolved in DMSO (200 mM stocks) were diluted in buffer E supplied with 0.05% Pluronic ® F-127 and mixed with nsp1  . The final concentration of fragments was 4 mM containing 2.0% DMSO, while the final protein concentration was 1.25 mg/mL. After centrifugation for 5 min at 6000 rpm, the samples were loaded into Tycho NT.6 capillaries (TY-C001, NanoTemper, München, Germany) with 1.25 mg/mL SARS-CoV-2 nsp1  in the presence of 2% DMSO as control and measured by the Tycho NT.6 instrument (NanoTemper, München, Germany) in triplicate. The resulting T i values are presented as means ± SD (n = 3) and the T i values for each sample were calculated with the following equation.
T i = T i (protein-fragment) − T i (protein-DMSO)

MST for the Estimation of K d Values of Fragment Hits
To label the protein with fluorescent dye, 100 nM of RED-TRIS-NTA 2nd Generation dye (MO-L018, NanoTemper, Müchen, Germany) was mixed with 800 nM SARS-CoV-2 nsp1 10-126 , SARS-CoV nsp1 2-180 or MERS nsp1  and incubated for 30 min on ice. The mixtures were centrifuged in a Thermo Scientific Pico 17 MicroCentrifuge, 24-Pl Rotor at 15,000× g for 10 min at 4 • C to remove any aggregates. For fragment hits 5E11, 10B6, and 11C6, labelled protein was diluted to 200 nM in assay buffer (buffer E supplemented with 0.05% Pluronic(R) F-127) and the fragment hits were diluted from 200 mM to 20 mM with assay buffer. 20 µL of DMSO was mixed with 180 µL of assay buffer as ligand buffer. The fragment solution was serially diluted with ligand buffer with a dilution factor of two, obtaining 16 fragment solutions in gradient concentrations. Then, an equal volume of protein solution was mixed with each fragment solution and incubated for 30 min at 4 • C to reach the binding equilibrium. The final fragment concentrations were from 10 mM to 305 nM with 5% DMSO. The final protein concentration in each sample was 100 nM. The samples were centrifuged at 6000 rpm for 5 min before the supernatant was being loaded into capillaries and detected in the Monolith NT.115 Pico instrument (NanoTemper, Müchen, Germany) under the Pico-RED channel with 5% excitation power and 40% MST power under the Binding Affinity mode in the MO. Control software. For fragment hits 7H2 and 8E6, labelled protein was diluted to 80 nM in assay buffer and the fragment hits were diluted from 200 mM to 40 mM with assay buffer. 40 µL of DMSO was mixed with 160 µL of assay buffer as ligand buffer. The fragment solution was serially diluted with ligand buffer with a dilution factor of 1.3, obtaining 16 fragment solutions in gradient concentrations. Then, an equal volume of protein solution was mixed with each fragment solution and incubated for 30 min at 4 • C to reach the binding equilibrium. The final fragment concentrations were from 20 mM to 390 µM with 10% DMSO. The final protein concentration in each sample was 40 nM. The samples were centrifuged at 6000 rpm for 5 min before the supernatant was loaded into capillaries and detected under the Pico-RED channel with 20% excitation power and 40% MST power under the Expert mode in the MO. Control software. The temperature was set at 25 • C. Each fragment hit was tested at least in triplicate. The data were analysed, and the figures were generated in the MO. Affinity Analysis software.

Calculation of Physico-Chemical Properties and Preparation of Figures
The molecular weight (MW), the calculated partition coefficient (clogP) and the polar surface area (PSA) of fragment hits were calculated from the chemical structures using ChemDraw version 19.1. Hydrogen-bond donors and acceptors as well as calculated solubility (clogS) were calculated on the drug-likeness prediction server of Molsoft: http://molsoft.com/mprop/ (accessed on 11 November 2021). Figures were prepared using ChemDraw 19.2, Pymol (The PyMOL Molecular Graphics System, Version 2.4.1, Schrödinger, LLC), Chimera [66] and Liplot+ [67]. The default values for hydrogen-bond interactions (2.7 to 3.35 Å) and non-bonded contacts (2.9 to 3.9 Å) were used.