Identification of B-Cell Linear Epitopes in the Nucleocapsid (N) Protein Conserved among the Main SARS-CoV-2 Variants

The Nucleocapsid (N) protein is highlighted as the main target for COVID-19 diagnosis by antigen detection due to its abundance in circulation early during infection. However, the effects of the described mutations in the N protein epitopes and the efficacy of antigen testing across SARS-CoV-2 variants remain controversial and poorly understood. Here, we used immunoinformatics to identify five epitopes in the SARS-CoV-2 N protein (N(34–48), N(89–104), N(185–197), N(277–287), and N(378–390)) and validated their reactivity against samples from COVID-19 convalescent patients. All identified epitopes are fully conserved in the main SARS-CoV-2 variants and highly conserved with SARS-CoV. Moreover, the epitopes N(185–197) and N(277–287) are highly conserved with MERS-CoV, while the epitopes N(34–48), N(89–104), N(277–287), and N(378–390) are poorly conserved with common cold coronaviruses (229E, NL63, OC43, HKU1). These data are in accordance with the observed conservation of the amino acids recognized by the antibodies 7R98, 7N0R, and 7CR5, which are conserved in the SARS-CoV-2 variants, SARS-CoV, and MERS-CoV but poorly conserved in common cold coronaviruses. Therefore, we support antigen tests as a scalable solution for the population-level diagnosis of SARS-CoV-2, but we highlight the need to verify the cross-reactivity of these tests against the common cold coronaviruses.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the current worldwide outbreak, which was declared a pandemic by the World Health Organization (WHO) [2] less than four months after the first case in China [1]. To date, coronavirus disease 2019 (COVID-19) has affected more than 663 million people and resulted in about 7 million deaths worldwide [3]. Currently, despite the use of vaccines to curb the spread of the virus and minimize the associated morbidity and mortality [4], more than 80,000 cases and about 500 deaths are still reported daily around the world [3]. In this scenario, with about 37 million confirmed COVID-19 cases and more than 698,000 deaths, Brazil currently ranks second in deaths and sixth in COVID-19 cases worldwide [3].
SARS-CoV-2, like all RNA viruses, is susceptible to rapidly accumulating mutations in its genome [5] leading to changes that can allow the virus to evade the immune response or affect its ability to spread or cause disease, resulting in the rise of lineages that are

Sequences Data and 3D Structures
To predict possible antigenic properties and select potential B-cell epitopes, the sequence of SARS-CoV-2 N protein (UniProt ID: P0DTC9) was used. The complete structure of the N protein was modeled using the Robetta server (http://new.robetta.org/, accessed on 10 September 2020) [20,21] based on the full-length amino acid sequence of the protein. This server is continually evaluated through CAMEO (Continuous Automated Model Evaluation) and generates five models analyzed by MolProbity (molprobity.biochem.duke.edu; accessed on 20 September 2020), which is a widely used system of model validation for protein structures. The best predictive model was selected and used in further analysis.

In Silico Prediction of Linear B-Cell Epitopes
We used a combination of web-based tools for B-cell epitope prediction: the Immune Epitope Database (IEDB) [22] and ABCpred [23] servers.
The IEDB (http://www.iedb.org/, accessed on 10 March 2020) is a freely available resource funded by NIAID. This server catalogs experimental data on antibody and T cell epitopes studied in humans, non-human primates, and other animal species in the context of infectious disease, allergy, autoimmunity, and transplantation. The IEDB also hosts tools to assist in the prediction and analysis of epitopes. In this study, we used the ElliPro [24], Bepipred 1 [25], and EMINI Surface Accessibility [26] modules on the IEDB server with default settings to define B-cell linear epitopes exposed on the protein surface. We also used the ABCpred server [23] to refine our prediction using an artificial neural network (ANN) method. All algorithms were accessed on 10 March 2020. Finally, predicted sequences longer than 9 residues that were predicted by at least three of the algorithms were defined as linear B-cell epitopes.
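The selection rule described above can be illustrated with a minimal Python sketch. It assumes that the output of each server has already been reduced to a yes/no call per candidate sequence; the function name, the dictionary layout, and the placeholder sequences are hypothetical and are not part of the original workflow.

```python
from typing import Dict, List

MIN_LENGTH = 10  # "more than 9 mers" in the text
MIN_TOOLS = 3    # predicted by at least three of the algorithms


def select_consensus_epitopes(calls: Dict[str, Dict[str, bool]]) -> List[str]:
    """Keep sequences that are long enough and supported by enough tools.

    `calls` maps a candidate sequence to {tool name: predicted or not}.
    """
    selected = []
    for sequence, by_tool in calls.items():
        n_tools = sum(1 for predicted in by_tool.values() if predicted)
        if len(sequence) >= MIN_LENGTH and n_tools >= MIN_TOOLS:
            selected.append(sequence)
    return selected


# Hypothetical placeholder candidates (not real N protein sequences).
example_calls = {
    # 12 residues, 3 tools agree: kept
    "AAAAAAAAAAAA": {"ElliPro": True, "BepiPred": True, "Emini": True, "ABCpred": False},
    # 12 residues, only 2 tools agree: dropped
    "CCCCCCCCCCCC": {"ElliPro": True, "BepiPred": False, "Emini": True, "ABCpred": False},
    # 7 residues, 4 tools agree: dropped for length
    "DDDDDDD": {"ElliPro": True, "BepiPred": True, "Emini": True, "ABCpred": True},
}

print(select_consensus_epitopes(example_calls))
```

Only the first placeholder passes both filters. How a partial overlap between tools is counted as a prediction is not specified here and would have to be decided before filling the yes/no calls.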

Prediction of Antigenicity
To exclude non-antigenic sequences, the predicted linear B-cell epitopes were evaluated by the VaxiJen server (accessed on 12 March 2020), which is the first server for alignment-independent prediction of protective antigens. Its algorithm was developed to allow antigen classification based solely on the physicochemical properties of proteins, without recourse to sequence alignment. Viral datasets were selected to derive models for the prediction of whole-protein antigenicity, showing prediction accuracy from 70 to 89% [27][28][29]. Using the default threshold (0.5), all sequences predicted as non-antigenic were excluded from the study.

Peptide Synthesis
The best predicted epitopes were synthesized by fluorenylmethoxycarbonyl (Fmoc) solid-phase chemistry [30,31] (GenOne Biotechnologies, Rio de Janeiro, Brazil). Analytical chromatography of the peptides demonstrated a purity of >95%, and mass spectrometric analysis indicated masses corresponding to the molecular masses of the predicted peptides.

Patients and Samples
Samples from convalescent COVID-19 donors: Twenty individuals (12 women and 8 men), with ages ranging from 25 to 51 years (mean age: 35.8 ± 6.7 years) and confirmed SARS-CoV-2 infections who had been tested using real-time RT-PCR for viral infections or who had tested positive in the serological assay for COVID-19, were invited to enroll in the study. The serum samples were collected only in the convalescent phase. After recovering from COVID-19, convalescent donors were screened for symptoms and had to be symptom-free and approximately 3 weeks out from symptom onset at the time of the blood draw. Asymptomatic individuals, who had had contact with infected patients and were positively tested by RT-PCR but who had not presented symptoms for at least 21 days post-diagnosis, were also invited to enroll in the study. All donors voluntarily gave their informed consent before being enrolled in the study. Individuals did not receive compensation for their participation.
Healthy unexposed donors: A total of 20 samples (13 women and 7 men), from blood donors, in Brazilian blood centers between the years 2010 and 2018 with ages ranging from 20 to 56 years (36.7 ± 10.4 years) were randomly selected from the serum biobank for the development of diagnostic tests of the Institute of Technology in Immunobiologicals. These samples were considered to be from unexposed controls, given that SARS-CoV-2 emerged as a novel pathogen in late 2019, more than one year after the collection of any of these samples.
Peripheral blood samples were collected by venipuncture in EDTA tubes. After centrifugation (350× g, 10 min), the plasma was collected and stored at −30 °C.
Written informed consent was obtained from all COVID-19 donors, and the study was reviewed and approved by the Oswaldo Cruz Foundation Ethical Committee and the National Ethical Committee of Brazil (CEP-FIOCRUZ CAAE 31368620.0.0000.5262).

Antibody Assays
Plasma samples from donors were screened for naturally acquired antibodies against the S-ECD and S-RBD recombinant proteins and the synthetic peptides predicted as linear B-cell epitopes in the SARS-CoV-2 N protein by enzyme-linked immunosorbent assay (ELISA), essentially as previously described [32][33][34]. Briefly, MaxiSorp 96-well plates (Nunc, Rochester, NY, USA) were coated with PBS containing 5 µg/mL of recombinant protein or 20 µg/mL of peptide. After overnight incubation at 4 °C, plates were washed with PBS and blocked with PBS-Tween containing 5% skim milk (PBS-Tween-M) for 1 h at 37 °C. Individual plasma samples diluted 1:100 in PBS-Tween-M were added to duplicate wells, and the plates were incubated at 37 °C for 2 h. After three washes with PBS-Tween, bound antibodies were detected with peroxidase-conjugated goat anti-human IgM (Sigma, St. Louis, MO, USA, cat number A 6907) or peroxidase-conjugated goat anti-human IgG (Southern, AL, cat number 2040-05), followed by the addition of 3,3′,5,5′-tetramethylbenzidine (Sigma, St. Louis, MO, cat number N301). Optical density was measured at 450 nm using a Spectra-Max microplate spectrophotometer (Molecular Devices, Sunnyvale, CA, USA). The results for total IgM and IgG were expressed as reactivity indexes (RIs), calculated as the ratio between the mean optical density of an individual's tested sample and the mean optical density of the samples from 20 unexposed individuals plus 2.5 standard deviations. Subjects were classified as responders to an antigen if the RI for IgM or IgG was higher than 1.
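The reactivity index calculation can be sketched as follows. This is an illustration only: the function names and the optical density values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def cutoff(control_ods: Sequence[float], n_sd: float = 2.5) -> float:
    """Mean OD of the unexposed controls plus n_sd standard deviations.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return mean(control_ods) + n_sd * stdev(control_ods)


def reactivity_index(replicate_ods: Sequence[float],
                     control_ods: Sequence[float]) -> float:
    """Mean OD of a donor's duplicate wells divided by the control cutoff."""
    return mean(replicate_ods) / cutoff(control_ods)


def is_responder(ri: float) -> bool:
    """A subject is classified as a responder when RI is higher than 1."""
    return ri > 1.0


# Hypothetical OD450 readings, for illustration only.
controls = [0.10, 0.12, 0.08, 0.10]
donor_duplicates = [0.30, 0.26]

ri = reactivity_index(donor_duplicates, controls)
print(round(ri, 2), is_responder(ri))
```

Because the cutoff is built into the denominator, an RI of exactly 1 corresponds to a sample whose mean optical density equals the control mean plus 2.5 standard deviations.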

In Silico Conservancy Analysis of Amino Acid Residues Recognized by Antibodies
To investigate the cross-reactivity of described antibodies against the N protein across the SARS-CoV-2 variants and other coronaviruses, we selected the crystallographic structures of two nanobodies (PDB: 7R98 and 7N0R) and one antibody (PDB: 7CR5). Using LigPlus, we listed the main SARS-CoV-2 N protein amino acids recognized by each antibody/nanobody and compared the conservation of these amino acids between SARS-CoV-2 variants and other coronaviruses, whose sequences were aligned by MUSCLE using MegAlign Pro in the DNASTAR Lasergene software.
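The comparison described above reduces to a percent identity over the antibody-contacted positions only. The sketch below assumes that the contacted residues and the corresponding aligned residues in another coronavirus have already been extracted as equal-length strings; the function name and the four-residue example are hypothetical and do not reproduce the actual contact residues.

```python
def percent_identity(reference: str, aligned: str) -> float:
    """Percent of contacted positions with the same residue in both proteins.

    `reference` holds the SARS-CoV-2 residues at the contacted positions and
    `aligned` holds the residues at the same alignment columns in another
    virus. A gap ('-') in `aligned` counts as a mismatch.
    """
    if not reference:
        raise ValueError("reference must contain at least one residue")
    if len(reference) != len(aligned):
        raise ValueError("both strings must cover the same alignment columns")
    matches = sum(1 for r, a in zip(reference, aligned) if r == a)
    return 100.0 * matches / len(reference)


# Hypothetical contacted residues, for illustration only.
sars_cov_2_contacts = "RNAY"
other_virus_contacts = "RNAF"

print(percent_identity(sars_cov_2_contacts, other_virus_contacts))
```

Restricting the calculation to contacted positions is what lets the conservation of a binding site differ from the identity of the whole protein.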

Statistical Analysis
All statistical analyses were carried out using Prism 5.0 for Windows (GraphPad Software, Inc.). The one-sample Kolmogorov-Smirnov test was used to determine whether a variable was normally distributed. The Wilcoxon matched-pairs test or the paired t-test was used to compare the reactivity indexes of synthetic peptides and recombinant proteins. Differences in the frequency of IgM and/or IgG responders to recombinant proteins were evaluated by the chi-square test (χ²).
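For readers who wish to reproduce the comparison of responder frequencies outside Prism, a Pearson chi-square test on a 2 × 2 table can be sketched in plain Python. This is an illustration under stated assumptions: no continuity correction is applied (whether Prism applied one is not stated in the text), the function name is hypothetical, and the counts are invented.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table

        responders  non-responders
          a            b            (antigen 1)
          c            d            (antigen 2)

    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 15/20 responders to one antigen, 5/20 to another.
chi2, p = chi_square_2x2(15, 5, 5, 15)
print(round(chi2, 2), round(p, 4))
```

With small expected counts, an exact test may be preferable to the chi-square approximation.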
The first column indicates the name of the epitope, representing the start and end position of the sequence. Vaxijen scores above 0.5 were considered antigenic. The "X" indicates that the algorithm in this column predicted, completely or partially, the sequence in the line, while the "-" indicates that the algorithm did not predict the sequence. Sequences predicted as antigenic linear B-cell epitopes were considered promising targets for serological tests and are highlighted in bold.


Profile of Convalescent COVID-19 Donors
To experimentally validate B-cell linear epitopes in the SARS-CoV-2 N protein, plasma samples from 20 convalescent COVID-19 donors were obtained in the early months of the COVID-19 pandemic in Brazil, in June and July 2020. All studied individuals were from the state of Rio de Janeiro, Brazil, where more than 130,000 cases and about 12,000 deaths had been reported by early July 2020. All convalescent donors had recovered from COVID-19 and were screened for symptoms before blood draws were scheduled. They had to be symptom-free and approximately 3 weeks out from symptom onset at the time of the blood draw. Regarding the diagnosis of SARS-CoV-2 infection, 60% of donors tested positive only by RT-PCR for SARS-CoV-2, 30% tested positive only by a commercial serological assay, and 10% tested positive by both methods. Regarding the clinical spectrum, 80% of donors experienced mild illness and reported fatigue, fever, headache, and cough as the most common symptoms; 10% of donors were asymptomatic cases, diagnosed by RT-PCR and serological methods, who remained without symptoms for the 25 days between molecular diagnosis and blood draw; and two donors presented complications (thrombosis and bacterial pneumonia), both having recovered by the time of the blood draw. The characteristics of the studied individuals are summarized in Table 2.

Analysis of Epitope Conservation across Other Human Coronaviruses
After verifying the conservation of the identified epitopes among SARS-CoV-2 isolates, we compared their conservation among other human coronaviruses (SARS-CoV, MERS-

Evaluation of Antibody Cross-Reaction against SARS-CoV-2 Variants and Other Coronaviruses
To investigate the cross-reactivity of the described antibodies across the SARS-CoV-2 variants and other coronaviruses, we selected the crystallographic structures of two nanobodies (PDB: 7R98 and 7N0R) and one antibody (PDB: 7CR5). Using LigPlus, we listed the main SARS-CoV-2 N protein amino acids recognized by each antibody/nanobody and compared the conservation of these amino acids between SARS-CoV-2 variants and other coronaviruses.
As shown in Table 4, all amino acids recognized by an antibody were completely conserved across SARS-CoV-2 variants, since these amino acids were not among the 16 key mutations related to the main variants (D3L, Q9L, P13L, Del31/33, D63G, P80R, E136D, R203M, R203K, G204R, T205I, G215C, L230F, S235F, D377Y, S413R). Regarding cross-reactivity with other human coronaviruses, we observed different levels of conservation across the human coronaviruses (SARS, MERS, 229E, NL63, OC43, and HKU1). First, the recognized amino acids were completely conserved (100% identity) with SARS-CoV and highly conserved with MERS-CoV, since the residues recognized by the nanobodies 7N0R and 7R98 showed more than 75% identity and the residues recognized by the antibody 7CR5 showed more than 50% identity with the SARS-CoV-2 N protein.
On the other hand, when compared to the N proteins of common cold coronaviruses, the recognized residues showed lower conservation, ranging from 0% to 75% identity. Interestingly, among the three evaluated antibodies, while the residues recognized by nanobody 7N0R and antibody 7CR5 showed a low degree of conservation (ranging from 0% to 33% identity), the residues recognized by nanobody 7R98 were conserved in coronaviruses 229E (50% identity) and NL63 (75% identity) and poorly conserved in coronaviruses OC43 and HKU1 (25% identity) (Table 4).
Table 4. Conservation of SARS-CoV-2 N protein amino acids recognized by nanobodies (7R98 and 7N0R) and the antibody 7CR5 across other human coronaviruses.

Aligned Residues in Other Human Coronaviruses
Non-conserved amino acids in human coronaviruses are indicated by gray cells.

Discussion
Despite the global use of vaccines to control the disease, with a drastic reduction in cases and deaths by COVID-19, the continuous rise of new SARS-CoV-2 variants makes constant, large-scale surveillance essential to avoiding new waves and the spread of infection. In this scenario, because it is highly immunogenic and abundantly present in blood and saliva during early asymptomatic and symptomatic SARS-CoV-2 infection [39,40], the N protein is highlighted as the major target for antigen detection, which is a valuable and inexpensive strategy for COVID-19 epidemiological surveillance [13]. However, studies investigating the sensitivity of antigen tests for SARS-CoV-2 variants [41][42][43] raise the question of the true efficacy of antigen detection tests against current and future variants. Therefore, this study aimed to identify B-cell linear epitopes in the SARS-CoV-2 N protein and to investigate their conservation across the variants of concern and variants of interest.
Regarding the recognition of identified epitopes, 80% of studied individuals recognized at least one of these, with a prevalence of IgM response against N(34–48), a prevalence of IgG response against N(89–104), and similar frequencies of IgM and IgG reactivity against the epitopes N(185–197), N(277–287), and N(378–390). Interestingly, these data corroborate the IgG prevalence observed against the peptide 96-100 in the study of Wang and collaborators but disagree with the prevalence of IgG response against peptides 386-390 and 366-400, which was observed in the same study [59]. These differences between studies may be related to differences in the genetics of the Brazilian and Chinese populations or in the stage of infection, since samples were from patients in the convalescent phase in our study and from early-stage patients in the Wang et al. study [59]. Notably, the number of individuals studied is a limitation of our study, but we believe it is sufficient to validate the natural immunogenicity of the epitopes. In support of this point, the aforementioned study by Wang et al. used samples from only 10 patients to identify peptides recognized by antibodies [59]. However, we believe that a larger number of samples is needed to evaluate the true sensitivity and specificity of these epitopes in serological assays.
Concerning the use of antigen detection tests in COVID-19 mass surveillance, we investigated the conservation of predicted epitopes across the main variants. Remarkably, we did not observe mutations inside the identified epitopes, suggesting that the identified epitopes should be recognized by the same antibodies in the main SARS-CoV-2 variants. These data are inconsistent with the study by Kumar et al., who mapped mutations in predicted B cell linear epitopes in the SARS-CoV-2 N protein [4]. From our point of view, this discrepancy could be related to the selection of the variants studied, since we selected the top growing lineages in the world and the most common variants in Brazil, while Kumar's study used variations present in the N protein among 831 Indian isolates of SARS-CoV-2 [4], resulting in a local observation of the mutations.
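The positional argument behind this observation can be checked with a short Python sketch that tests whether any of the 16 key N protein mutations listed in the Results falls inside the five identified epitopes. The function names are hypothetical, and reading "Del31/33" as a deletion spanning residues 31 to 33 is an assumption about the notation.

```python
import re
from typing import Dict, List, Set, Tuple

# Epitope boundaries and key mutations as given in the text.
EPITOPES: Dict[str, Tuple[int, int]] = {
    "N(34-48)": (34, 48),
    "N(89-104)": (89, 104),
    "N(185-197)": (185, 197),
    "N(277-287)": (277, 287),
    "N(378-390)": (378, 390),
}
KEY_MUTATIONS: List[str] = [
    "D3L", "Q9L", "P13L", "Del31/33", "D63G", "P80R", "E136D", "R203M",
    "R203K", "G204R", "T205I", "G215C", "L230F", "S235F", "D377Y", "S413R",
]


def mutation_positions(label: str) -> Set[int]:
    """Residue positions affected by a substitution or deletion label."""
    deletion = re.fullmatch(r"Del(\d+)/(\d+)", label)
    if deletion:
        # Assumption: "Del31/33" means residues 31 through 33 are deleted.
        start, end = int(deletion.group(1)), int(deletion.group(2))
        return set(range(start, end + 1))
    substitution = re.fullmatch(r"[A-Z](\d+)[A-Z]", label)
    if substitution:
        return {int(substitution.group(1))}
    raise ValueError(f"unrecognized mutation label: {label}")


def mutations_in_epitopes(epitopes: Dict[str, Tuple[int, int]],
                          mutations: List[str]) -> Dict[str, List[str]]:
    """Map each epitope to the mutations that fall inside its range."""
    hits: Dict[str, List[str]] = {}
    for name, (start, end) in epitopes.items():
        inside = [m for m in mutations
                  if any(start <= pos <= end for pos in mutation_positions(m))]
        if inside:
            hits[name] = inside
    return hits


print(mutations_in_epitopes(EPITOPES, KEY_MUTATIONS))
```

For the lists above the result is an empty dictionary. The closest cases are Del31/33, which ends one residue before N(34–48), and D377Y, which lies one residue before N(378–390).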
Leuzinger et al. performed a study comparing different immunoassays and showed that the full-length N protein was cross-recognized by pre-existing antibodies, which were produced during previous exposures to other human coronaviruses [60]. Based on this finding, we investigated the conservation of the identified epitopes in other human coronaviruses (SARS, MERS, 229E, NL63, OC43, and HKU1) and found that all identified epitopes are highly conserved in the SARS-CoV N protein, and epitopes N(185–197) and N(277–287) share more than 70% identity with the MERS-CoV N protein. Notably, despite the high conservation compared to SARS and MERS, the identified epitopes share low conservation (identities ranging from 23% to 61%) with common cold coronaviruses (229E, NL63, OC43, and HKU1), suggesting that these epitopes are mainly recognized by antibodies specific to the SARS-CoV-2 N protein, with low cross-reactivity against the most common coronaviruses. These data corroborate studies that suggested the use of N domains with deleted conserved regions as a target for specific antibody-based assays for COVID-19 [60,61]. Furthermore, our study corroborates the study by Wen et al., reporting that monoclonal antibodies to the SARS-CoV-2 nucleocapsid protein cross-react with their counterparts of SARS-CoV but not other human betacoronaviruses [62], and it suggests that knowing the epitope conservation shared among human coronaviruses is a critical step in predicting the cross-reactivity of a protein used as an antigen in serologic tests, as epitope conservation does not necessarily reflect the conservation of the whole protein [63,64].
Considering that the N protein is the main target of antigen detection tests and that the sensitivity of the test depends on the specificity and affinity of the antibodies used to detect it, we evaluated the conservation of the amino acids recognized by three antibodies against the SARS-CoV-2 N protein among the major SARS-CoV-2 variants and other human coronaviruses. Remarkably, the amino acid residues recognized by the three antibodies studied were not located at key mutation positions, suggesting that these antibodies can recognize the N protein of the main SARS-CoV-2 variants and corroborating the hypothesis that the decrease in sensitivity of antigen tests may be related to other causes, such as a lower viral load of some variants [43]. Furthermore, our data suggest that these antibodies to the SARS-CoV-2 N protein can cross-recognize the N protein of SARS and MERS, corroborating previous studies that demonstrated that antibodies against the N protein from SARS and MERS can cross-react with the SARS-CoV-2 protein [64][65][66], and reinforcing that caution should be used when interpreting assay results if the full-length recombinant N protein of SARS-CoV-2 is used as a reagent for the diagnosis of SARS-CoV-2 infections in humans.
Nevertheless, despite the cross-reactivity with SARS and MERS, our data support the use of antigen testing as a tool for mass epidemiological surveillance of COVID-19. Considering that neither SARS nor MERS is usually reported worldwide, the common cold coronaviruses (229E, NL63, OC43, and HKU1) should be the main concern for antigen test cross-reactivity. In this context, regarding the potential cross-reactivity with the N protein from common cold coronaviruses, only one studied antibody seems to recognize amino acids conserved in coronaviruses 229E and NL63. Therefore, although our data show that most of the identified epitopes are poorly conserved across common cold coronaviruses, there are identified epitopes and antibody-recognized amino acids that are highly conserved across these viruses. Several studies have reported differences in the sensitivity of antigen tests to detect SARS-CoV-2 infection [67][68][69][70], which may be related to patient characteristics such as days of symptoms, virus variant characteristics such as viral load, or characteristics of the antibodies used in the test, such as affinity and specificity. These data highlight the need to evaluate the cross-reactivity of antigen tests against common cold coronaviruses to prove their specificity.
In conclusion, this study gives a comprehensive view of the B-cell epitopes of the SARS-CoV-2 N protein and their conservation across the main SARS-CoV-2 variants and other coronaviruses. Although studies have demonstrated that mutations in the Spike protein of SARS-CoV-2 variants may lead to decreased sensitivity to neutralizing antibodies [71][72][73], our study shows that the main mutations described in major SARS-CoV-2 variants are located neither in the identified epitopes nor in the amino acids recognized by the evaluated antibodies. In this context, our study supports the use of antigen testing as a scalable solution for population-level diagnosis of SARS-CoV-2; however, we highlight the need to verify the cross-reactivity of these tests against the N protein of common cold coronaviruses, which are described worldwide.

Conclusions
Our study experimentally validated five predicted B-cell linear epitopes as naturally immunogenic by their reactivity against serum samples from COVID-19 convalescent patients. Remarkably, these epitopes were conserved across the major SARS-CoV-2 variants, suggesting that antigen tests based on antibodies specific to these epitopes could recognize the N protein of these variants. We also examined the conservation of these epitopes and of the amino acid residues recognized by the three antibodies across other human coronaviruses and showed that there is an increased likelihood of cross-reaction with SARS and MERS coronaviruses and a decreased likelihood of cross-reaction with common cold coronaviruses, but these cross-reactions should be verified by antigen test developers.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15040923/s1; Table S1: Frequency of N protein key mutations across the SARS-CoV-2 major variants.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to confidential information related to donors' personal data in accordance with the Institutional Ethics Committee of the Oswaldo Cruz Foundation.