1. Introduction
A cluster of severe pneumonia cases of unknown origin, linked to the Huanan seafood wholesale market in Wuhan, Hubei Province, China, was reported by the Chinese health authorities on 31 December 2019. Sequencing-based analysis of lower respiratory tract samples (bronchoalveolar lavage fluid) identified a novel betacoronavirus, provisionally named 2019-nCoV, as the causative pathogen of coronavirus disease 2019 (COVID-19). This virus shared >85% sequence similarity with a bat severe acute respiratory syndrome (SARS)-like coronavirus (CoV) [1,2].
CoVs constitute a family of enveloped, positive-strand RNA viruses with genomes of 26–32 kilobases (kb) [3]. This seventh CoV known to cause disease in humans was further characterized. It showed approximately 79% sequence homology with SARS-CoV (now referred to as SARS-CoV-1), which caused the 2002 outbreak in China, and approximately 50% homology with the Middle East respiratory syndrome (MERS)-CoV, which caused the 2012 outbreak in the Middle East (mainly in Saudi Arabia) [4]. The novel CoV was subsequently named SARS-CoV-2 [5]. It spread rapidly to most countries worldwide, and the World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020.
Owing to the lack of established herd immunity and effective therapeutic treatments [6], identifying COVID-19 patients and asymptomatic carriers who can transmit the virus remains the main approach to disease monitoring and management worldwide. At the same time, the availability of SARS-CoV-2 genomic sequences has significantly improved our understanding of SARS-CoV-2 evolution and spread. It has supported the identification of strains with a selective advantage over other variants, and it has provided a tool for stratifying the risk of SARS-CoV-2 infectivity at scales ranging from individual patients to communities and nations [7,8]. Risk-based community/population-wide PCR screening strategies have clearly contributed to controlling epidemics, lowering death rates, and reopening economic activities (see the examples of Wuhan and Singapore [9,10]). Nonetheless, with current methods and resources, most countries test only symptomatic and suspected cases and sequence only a fraction of these. This highlights the need for population-wide screening approaches to monitor the spread of SARS-CoV-2 and its mutations and variants.
Wastewater-based epidemiology (WBE) has been successfully applied in numerous case studies worldwide, most extensively so far for estimating drug consumption [11,12]. WBE cannot replace clinical screening and diagnostics. However, because far fewer samples need to be tested, it offers a cheaper and faster route to population-wide surveillance without selection bias. In this regard, WBE screening could capture asymptomatic SARS-CoV-2 carriers, who are less likely to undergo testing. It could also capture symptomatic patients who avoid testing because of stigmatization and social isolation. Moreover, mutational profiling of SARS-CoV-2 in wastewater could represent an innovative, cost-effective step towards an early-warning system for monitoring SARS-CoV-2 genomic epidemiology globally at community/population levels [13,14,15].
Nonetheless, developing and optimizing the analytical protocols needed for SARS-CoV-2 WBE is very challenging. The difficulties stem from the nature of the sample (PCR inhibition, complexity of DNA/RNA templates), the moderate efficiency of SARS-CoV-2 concentration, and the quality/degradation of the RNA template, caused, for example, by excessive amounts of detergents in wastewater samples [16]. To overcome these bottlenecks, we performed an extensive in silico RNA stability analysis to identify the most stable genomic regions of SARS-CoV-2. We then developed four different nested PCR/real-time PCR assays to improve the sensitivity and reduce the false-negative rate of SARS-CoV-2 detection in wastewater. Moreover, we used nested PCR to amplify targeted regions of the SARS-CoV-2 genome and applied targeted DNA-seq to these amplicons for the mutational/strain analysis of SARS-CoV-2 in wastewater.
3. Discussion
Considering the lack of effective treatments against COVID-19 [6], an evidence-based approach to applying restriction measures in the near future currently depends on two things. The first is accurate, large-scale screening that is representative at the community level. The second is in-depth epidemiological analysis of existing and newly emerging variants. To this end, WBE of SARS-CoV-2 has emerged as a modern approach for real-time, population-wide surveillance of SARS-CoV-2. Indeed, SARS-CoV-2 concentrations in wastewater appear to correlate with COVID-19 infection rates and to precede both epidemic expansion and molecular testing at the general population level [14,18]. Moreover, genetic analysis of SARS-CoV-2 in human samples could provide a severity-stratification tool for COVID-19 patients. It could also provide a much-needed approach for the early identification of newly emerging SARS-CoV-2 variants. However, these approaches remain particularly costly and time-consuming. Thus, mutational profiling of SARS-CoV-2 in wastewater could offer an innovative, cost-effective approach for monitoring existing variants and an early-warning system for newly emerging ones at community/population levels.
The current analytical protocols for WBE of SARS-CoV-2 suffer from reduced sensitivity and false-negative results, due to, for example, low viral loads, RNA degradation, and PCR inhibition [16]. To overcome these drawbacks, we performed an RNA stability analysis of the SARS-CoV-2 RNA genome and identified regions predicted to be highly stable. We used this knowledge to design novel in-house methods that combine random hexamer-based reverse transcription with nested PCR/real-time PCR amplification of four highly stable regions of SARS-CoV-2 RNA. The evaluation of our novel assays showed an improved limit of detection (LOD; as low as two copies per PCR reaction) compared with one-step RT-PCR methods [19,20]. It also showed significantly improved sensitivity compared with an in-house CDC/2019-nCoV_N1-based assay. Interestingly, more than half of the positive samples were detected by only one assay. This is consistent with the anticipated ongoing degradation of SARS-CoV-2 RNA in wastewater and clearly demonstrates that SARS-CoV-2 RNA detection in wastewater depends on the genomic region targeted.
Nested PCR approaches have previously been applied successfully to detect SARS-CoV-2 in wastewater [21,22,23,24]. Here, we have documented that the detection of SARS-CoV-2 in wastewater is assay dependent. Consequently, reducing false-negative results in WBE requires targeting more than one SARS-CoV-2 genomic region. In our data, the detection sensitivity of a single assay ranged from ~30% to 60%. Combining two or three assays improved sensitivity to 82% and 94%, respectively.
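The gain from combining assays comes from calling a sample positive when any assay in the panel is positive. The Python sketch below illustrates this. It assumes a composite reference standard, in which a sample counts as positive if any assay detects it; this may differ from how sensitivity was defined in this study. The sample names, assay names (`A1`–`A4`), and detection calls are hypothetical toy values, not our data.

```python
from itertools import combinations

# Hypothetical detection calls (True = SARS-CoV-2 RNA detected); toy values only.
TOY_CALLS: dict[str, dict[str, bool]] = {
    "s1": {"A1": True, "A2": False, "A3": False, "A4": False},
    "s2": {"A1": False, "A2": True, "A3": False, "A4": False},
    "s3": {"A1": True, "A2": True, "A3": False, "A4": False},
    "s4": {"A1": False, "A2": False, "A3": True, "A4": False},
    "s5": {"A1": False, "A2": False, "A3": False, "A4": False},
}


def combined_sensitivity(calls: dict[str, dict[str, bool]], assays: tuple[str, ...]) -> float:
    """Share of reference-positive samples detected by at least one assay in `assays`.

    A sample is reference-positive if ANY assay detects it (composite reference,
    an assumption of this sketch). A panel calls a sample positive when any of its
    assays is positive (logical OR).
    """
    positives = [hits for hits in calls.values() if any(hits.values())]
    if not positives:
        return 0.0
    detected = sum(any(hits[a] for a in assays) for hits in positives)
    return detected / len(positives)


def best_panel(calls: dict[str, dict[str, bool]], k: int) -> tuple[str, ...]:
    """Return the k-assay panel with the highest combined sensitivity (first one on ties)."""
    names = sorted(next(iter(calls.values())))
    return max(combinations(names, k), key=lambda panel: combined_sensitivity(calls, panel))


if __name__ == "__main__":
    for k in (1, 2, 3):
        panel = best_panel(TOY_CALLS, k)
        print(k, panel, combined_sensitivity(TOY_CALLS, panel))
```

The OR rule is what lowers the false-negative rate as assays are added. The price is that any false positive from a single assay, such as one caused by carry-over contamination in a nested PCR, also propagates to the panel call.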
RNA viruses have high mutation rates and high genetic diversity, which give them a selective advantage in evolving and adapting to dynamic changes in their environments and hosts [25]. SARS-CoVs possess proofreading machinery [26], yet the genetic diversity of SARS-CoV-2 keeps growing, highlighting the unique position of CoVs in the RNA virus world. Specific amino acid changes in SARS-CoV-2-encoded polypeptides could therefore alter the SARS-CoV-2 life cycle, infectivity, and/or antigenicity. Such changes could weaken the efficacy of ongoing vaccination programs and COVID-19 treatments.
As the SARS-CoV-2 Spike protein is the main target of COVID-19 vaccines [27], mutations of the S gene are frequently reported and studied [8,28]. In this regard, the D614G (23403A>G) missense mutation, initially identified in Europe, has become the dominant pandemic form, likely owing to a significant fitness advantage. The D614G mutation has been strongly associated with higher viral loads in the upper respiratory tract and higher infection rates in younger hosts. It has also been associated with increased replication and higher pseudotyped viral titers ex vivo [7,29,30]. Moreover, recent studies have confirmed the enhanced infectivity and transmission of the G614 strain in vivo [31,32]. Mutations of the RNA-dependent RNA polymerase (RdRP) are also in the spotlight, because RdRP is targeted by the antiviral nucleoside analogues remdesivir [33] and favipiravir [34]. Interestingly, the P323L (14408C>T) mutation has been reported to co-evolve with D614G worldwide; this adaptation might strengthen the replication rates and infectivity of the SARS-CoV-2 G614 strain [35].
In this regard, we initially targeted five well-characterized missense mutations spanning different genomic regions of SARS-CoV-2: D614G (23403A>G), P323L (14408C>T), Q57H (25563G>T), R203K (28881G>A), and G204R (28883G>C). The corresponding nested PCR amplicons were sequenced by DNA-seq. This approach allowed the quantification of SARS-CoV-2 mutations/variants. Our data showed a high prevalence (>99%) of the D614G and P323L mutations in wastewater samples obtained from the WWTP of Athens, Greece, during September–November 2020, in line with worldwide data based on COVID-19 patient samples. Additionally, our samples confirmed the growing worldwide trend of the Q57H mutation, which reached a frequency of ~47% in samples collected during October/November 2020.
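Converting the genomic notation used above (e.g., 23403A>G) into an amino acid change (e.g., D614G) requires locating the affected codon within its gene. The Python sketch below does this for the S, ORF3a, and N genes, using their 1-based coding-sequence start coordinates in NC_045512.2 (S 21563, ORF3a 25393, N 28274). It deliberately omits ORF1ab, whose −1 ribosomal frameshift needs separate handling. It also expects the caller to supply the reference codon, read from the reference genome; the function names are our own illustration.

```python
import re

# Standard genetic code; codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# 1-based start coordinates of the coding sequences in NC_045512.2.
GENE_START = {"S": 21563, "ORF3a": 25393, "N": 28274}


def parse_substitution(notation: str) -> tuple[int, str, str]:
    """Split notation such as '23403A>G' into (position, ref_base, alt_base)."""
    match = re.fullmatch(r"(\d+)([ACGT])>([ACGT])", notation)
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution: {notation!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def locate(genome_pos: int, gene: str) -> tuple[int, int]:
    """Return (codon number, 1-based position within the codon) for a genomic position."""
    offset = genome_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"position {genome_pos} lies upstream of {gene}")
    return offset // 3 + 1, offset % 3 + 1


def to_amino_acid_change(notation: str, gene: str, ref_codon: str) -> str:
    """Translate e.g. ('23403A>G', 'S', 'GAT') into 'D614G'."""
    pos, ref, alt = parse_substitution(notation)
    codon_no, idx = locate(pos, gene)
    if ref_codon[idx - 1] != ref:
        raise ValueError("reference codon does not match the reference base")
    alt_codon = ref_codon[: idx - 1] + alt + ref_codon[idx:]
    return f"{CODON_TABLE[ref_codon]}{codon_no}{CODON_TABLE[alt_codon]}"
```

For example, `to_amino_acid_change("23403A>G", "S", "GAT")` returns `"D614G"`. The sketch handles one substitution at a time. Adjacent changes in the same codon, such as 28881G>A together with 28882G>A in the N gene, would need to be applied to the codon jointly before translation.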
Interestingly, a previously unreported missense mutation in the S gene, H625R (23436A>G), was identified in ~6% of September samples. Both histidine and arginine have positively charged polar side chains, yet the in silico structure analysis suggested significant changes in S protein folding. Specific monitoring of H625R and other newly emerging mutations in COVID-19 patients and in wastewater is therefore needed. Such monitoring would establish whether these mutations confer a selective advantage, whether they are associated with SARS-CoV-2 infectivity, and whether they affect existing vaccines.
Turning to the N gene, our samples confirmed the declining trend of the 28881G>A, 28882G>A, and 28883G>C substitutions. Interestingly, the 28884G>A substitution, which reportedly has a ~1% prevalence worldwide, was overrepresented in samples from Athens (09/2020, 70%; 11/2020, 35%). It correlated significantly with 28883G>C, resulting in the G204L substitution of the nucleocapsid protein. Moreover, we observed a novel variant arising from a simultaneous 4 nt deletion (28881_28884del) and a 4 nt insertion at position 28885 (28885_28886insACAT). This variant leads to the R203K and G204H missense mutations of the nucleocapsid protein.
Finally, to facilitate in-depth epidemiological analysis of existing and/or newly emerging SARS-CoV-2 variants, we expanded our methodology to mutational profiling of the whole SARS-CoV-2 S gene. As a proof of principle, we analyzed samples from March 2021. The B.1.1.7 (Alpha) variant was dominant in wastewater samples from the WWTP of Attica, in accordance with the prevalence of B.1.1.7 (Alpha) during the third pandemic wave in Greece.
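One simple way to summarize whole-S-gene results against a lineage definition is to count how many lineage-defining mutations are observed above a frequency cutoff. The Python sketch below uses the S-gene changes commonly listed for B.1.1.7 (Alpha). The 0.5 frequency cutoff and the function name are assumptions for illustration. A wastewater sample can contain several lineages at once, so this is a screening summary, not a lineage assignment.

```python
# S-gene amino acid changes commonly used to define B.1.1.7 (Alpha).
# D614G is omitted because nearly all lineages circulating in 2021 share it.
ALPHA_S_SIGNATURE = frozenset({
    "H69del", "V70del", "Y144del", "N501Y", "A570D",
    "P681H", "T716I", "S982A", "D1118H",
})


def signature_support(
    observed: dict[str, float],
    signature: frozenset[str] = ALPHA_S_SIGNATURE,
    min_freq: float = 0.5,
) -> tuple[float, dict[str, float]]:
    """Return (fraction of signature mutations seen at >= min_freq, the matching mutations).

    `observed` maps amino acid changes (e.g. 'N501Y') to their allele frequency (0-1)
    in one sample. The 0.5 default cutoff is an assumption of this sketch.
    """
    hits = {m: f for m, f in observed.items() if m in signature and f >= min_freq}
    return len(hits) / len(signature), hits
```

A value near 1.0 indicates that most Alpha-defining S mutations are present at high frequency. Intermediate values can reflect co-circulating lineages, amplicon dropout, or a cutoff that is too strict.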
4. Materials and Methods
4.1. SARS-CoV-2 RNA Stability Analysis
The in silico stability analysis of SARS-CoV-2 RNA was carried out with the ScanFold algorithm [36]. The Wuhan-Hu-1 reference genome (NC_045512.2) was analyzed with ScanFold-Scan using a 300 nt window and a 150 nt step size, resulting in 198 analyzed windows. Each window was analyzed using the RNAfold algorithm included in the ViennaRNA package. For each window, the minimum free energy (MFE) structure and its ΔG° value were predicted using the Turner energy model at 18 °C.
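The window count follows directly from the genome length. A 300 nt window advanced in 150 nt steps over the 29,903 nt NC_045512.2 genome gives full-length windows starting at positions 1, 151, ..., 29,551, which is 198 windows. The Python sketch below reproduces that arithmetic. The `fold` argument is a hypothetical hook for any MFE predictor, not ScanFold's own interface, and the sketch does not model how ScanFold handles the final partial window.

```python
from typing import Callable, Iterator


def scan_windows(genome_length: int, window: int = 300, step: int = 150) -> Iterator[tuple[int, int]]:
    """Yield 1-based, inclusive (start, end) coordinates of every full-length window."""
    for start in range(0, genome_length - window + 1, step):
        yield start + 1, start + window


def scan(
    sequence: str,
    fold: Callable[[str], tuple[str, float]],
    window: int = 300,
    step: int = 150,
) -> list[tuple[int, int, str, float]]:
    """Apply an MFE predictor `fold(seq) -> (dot_bracket, mfe)` to each window."""
    results = []
    for start, end in scan_windows(len(sequence), window, step):
        structure, mfe = fold(sequence[start - 1:end])
        results.append((start, end, structure, mfe))
    return results
```

If the ViennaRNA Python bindings are installed, `fold` could wrap `RNA.fold_compound(seq, md).mfe()`, where `md = RNA.md()` has its `temperature` attribute set to 18 to match the conditions above. Check this against the installed ViennaRNA version before relying on it.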
4.2. Wastewater Sampling
Composite (24 h) influent wastewater samples were collected from the WWTP of Athens, which serves a population equivalent of 5,200,000 inhabitants. Samples were collected in precleaned high-density polyethylene (HDPE) bottles, transported on ice to the laboratory, and stored at 4 °C. All samples were analyzed immediately upon arrival at the laboratory.
4.3. Sample Concentration and RNA Extraction
Samples were concentrated immediately after arrival by polyethylene glycol 8000 (PEG 8000; Promega Corporation, Madison, WI, USA) precipitation. Specifically, 50 mL of influent wastewater was centrifuged at 4750× g for 30 min at 4 °C to remove debris, bacteria, and large particles. The supernatant was transferred to a clean centrifuge tube containing 3.5 g PEG 8000 and 0.8 g NaCl. The mixture was stirred at ambient temperature until the solids had completely dissolved and then centrifuged at 10,050× g for 2 h at 4 °C. Most of the supernatant was discarded without disturbing the pellet, and the tube was centrifuged again at 10,050× g for 5 min at 4 °C. Finally, the pellet was resuspended in 500 μL of nuclease-free water.
Immediately after concentration, RNA was extracted from 200 μL of concentrate using a Water DNA/RNA Magnetic Bead kit (IDEXX Laboratories Inc., Westbrook, ME, USA). Percentage recovery was determined at three concentrations (10, 100, and 1000 copies/μL) of the EURM-019 SARS-CoV-2 reference material and was 78.6%, 82.4%, and 91.3%, respectively [16,37]. Moreover, mengovirus (MgV) was used as an extraction control to evaluate extraction efficiency.
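Reporting results per litre of wastewater requires scaling the copies measured in a reaction back through each volume step of the workflow. The Python sketch below does this. Its default volumes come from Sections 4.3–4.5, but the RNA elution volume is not stated here and must be supplied; the 100 μL value in the checks is hypothetical. The sketch also makes two further assumptions. First, each RNA copy yields one cDNA copy. Second, `copies_per_pcr` refers to template copies entering the 1st PCR. With a nested design, that figure would have to be calibrated against standards carried through both rounds.

```python
def gene_copies_per_litre(
    copies_per_pcr: float,
    *,
    elution_ul: float,                        # RNA eluate volume: NOT given in the text
    cdna_in_pcr_ul: float = 5.0,              # cDNA template per 1st PCR
    rt_volume_ul: float = 20.0,               # reverse transcription reaction volume
    rna_in_rt_ul: float = 5.0,                # RNA input per RT reaction
    concentrate_extracted_ul: float = 200.0,  # concentrate used for extraction
    concentrate_ul: float = 500.0,            # final PEG concentrate volume
    wastewater_ml: float = 50.0,              # wastewater volume concentrated
    recovery: float = 1.0,                    # fractional recovery, e.g. 0.824 for 82.4%
) -> float:
    """Scale copies per PCR reaction back to gene copies per litre of wastewater."""
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be a fraction in (0, 1]")
    copies = copies_per_pcr * rt_volume_ul / cdna_in_pcr_ul  # whole RT reaction
    copies *= elution_ul / rna_in_rt_ul                      # whole RNA eluate (1 cDNA per RNA assumed)
    copies *= concentrate_ul / concentrate_extracted_ul      # whole PEG concentrate
    copies /= recovery                                       # correct for process losses
    return copies * 1000.0 / wastewater_ml                   # per-sample volume -> per litre
```

For example, two copies per reaction, with the hypothetical 100 μL eluate and no recovery correction, correspond to 8000 copies/L. Correcting for a recovery below 100% raises the estimate in inverse proportion.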
4.4. First-Strand cDNA Synthesis
Total RNA from wastewater samples was reverse transcribed in a 20 μL reaction containing 5.0 μL RNA, 1.0 μL of 10 mM dNTP mix (Jena Bioscience GmbH, Jena, Germany), 100 U SuperScript III reverse transcriptase (Invitrogen, Carlsbad, CA, USA), 50 U RNaseOUT recombinant ribonuclease inhibitor (Invitrogen), and 1.0 μL of 50 μM random hexamers (Invitrogen). The mixture of total RNA, dNTPs, and random hexamers was incubated at 65 °C for 5 min. Reverse transcription was then performed at 25 °C for 10 min, followed by 50 °C for 50 min. The enzyme was inactivated at 70 °C for 15 min. The AMPLIRUN SARS-CoV-2 RNA control (Vircell S.L., Granada, Spain) was used as the SARS-CoV-2 complete-genome control.
4.5. Detection of SARS-CoV-2
Novel nested PCR and nested real-time PCR assays were designed and validated against the identified highly stable regions of SARS-CoV-2 RNA. The Wuhan-Hu-1 reference genome (NC_045512.2) was used for the in silico analysis and the design of SARS-CoV-2-specific primers and fluorescent probes (Table S1).
A Veriti 96-well fast thermal cycler (Applied Biosystems, Carlsbad, CA, USA) was used for the nested PCR assays. Each 25 μL reaction consisted of 5.0 μL cDNA template (1st PCR) or 2.0 μL PCR product (2nd PCR), 1.0 μL of 10 mM dNTP mix (Jena Bioscience GmbH), 500 nM of each forward/reverse primer, and 1 U of Kapa Taq polymerase (Kapa Biosystems, Inc., Woburn, MA, USA). The thermal protocol began with a polymerase activation step at 95 °C for 3 min. This was followed by 15 cycles (1st PCR) or 40 cycles (2nd PCR) of denaturation at 95 °C for 30 s, primer annealing at 60 °C for 30 s, and extension at 72 °C for 1 min, and then by a final extension step at 72 °C for 5 min. After completion of the 2nd reaction, 10 μL of PCR product was electrophoresed on a 1.5% w/v agarose gel, visualized with ethidium bromide staining, and photographed under UV light.
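The stated primer concentration converts into pipetting volumes via C1V1 = C2V2. The Python sketch below assembles a per-reaction recipe and a master mix for the 25 μL nested PCR. Several values in it are assumptions not stated in the text: 10 μM primer stocks, a 5 U/μL polymerase stock, 2.5 μL of 10X reaction buffer, and a 10% pipetting overage.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, reaction_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (both concentrations in the same unit)."""
    return final_conc * reaction_ul / stock_conc


def per_reaction(template_ul: float, reaction_ul: float = 25.0) -> dict[str, float]:
    """Per-reaction volumes (uL) for one nested PCR; stock concentrations are assumed."""
    recipe = {
        "10X buffer (assumed)": 2.5,
        "dNTP mix, 10 mM": 1.0,
        # 500 nM final from an assumed 10 uM (10,000 nM) stock
        "forward primer, 10 uM stock (assumed)": stock_volume_ul(500, 10_000, reaction_ul),
        "reverse primer, 10 uM stock (assumed)": stock_volume_ul(500, 10_000, reaction_ul),
        "Taq, 5 U/uL stock (assumed)": 1.0 / 5.0,  # 1 U per reaction
    }
    water = reaction_ul - template_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    recipe["nuclease-free water"] = water
    return recipe


def master_mix(n_reactions: int, template_ul: float, overage: float = 0.10) -> dict[str, float]:
    """Scale the per-reaction recipe (template excluded) to n reactions plus overage."""
    factor = n_reactions * (1 + overage)
    return {name: vol * factor for name, vol in per_reaction(template_ul).items()}
```

Passing the template volume explicitly lets the same function serve both rounds: 5.0 μL of cDNA for the 1st PCR and 2.0 μL of PCR product for the 2nd.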
The probe-based fluorescent real-time PCR assays were performed on a 7500 Fast Real-Time PCR System (Applied Biosystems). The product of the 1st conventional PCR, described above, served as the template for the real-time PCR assay (2nd reaction). Each 20 μL reaction consisted of 2.0 μL PCR product, 10 μL Kapa Probe Fast Universal (2X) qPCR Master Mix (Kapa Biosystems), 500 nM each of the forward and reverse primers, and 125 nM fluorescent probe. The thermal protocol consisted of an initial polymerase activation step at 95 °C for 3 min, followed by 40 cycles of denaturation at 95 °C for 15 s and combined primer/probe annealing and extension at 60 °C for 1 min.
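A limit of detection expressed in copies per reaction implies calibration against standards of known copy number. In real-time PCR, this is commonly done with a log-linear standard curve, Ct = slope · log10(copies) + intercept, from which the amplification efficiency is 10^(−1/slope) − 1. The Python sketch below fits such a curve by ordinary least squares. It is a generic illustration, not the calibration used in this study. With the nested design above, standards would also need to pass through the 1st PCR for their 2nd-reaction Ct values to be comparable with those of samples.

```python
import math


def fit_standard_curve(copies: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept; returns (slope, intercept)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 means perfect doubling per cycle (slope about -3.32)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)
```

In practice, Ct values are usually reported only within the range spanned by the standards. A sample with a Ct beyond the lowest standard is generally called "detected, below the quantifiable range" rather than assigned an extrapolated copy number.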
4.6. Targeted DNA-seq for the Mutational Analysis of SARS-CoV-2 in Wastewater
In-house targeted DNA-seq assays based on semiconductor sequencing technology were used to analyze SARS-CoV-2 variants/mutations in wastewater samples.
Nested PCR assays were carried out to amplify the target regions. The Wuhan-Hu-1 reference genome (NC_045512.2) was used for the in silico analysis and the design of specific primers for the analysis of:
(i) the five missense mutations D614G (23403A>G, S gene), Q57H (25563G>T, ORF3a gene), P323L (14408C>T, ORF1ab/RdRP gene), R203K (28881G>A, N gene), and G204R (28883G>C, N gene) (Table S2);
(ii) the whole Spike (S) gene of SARS-CoV-2 (Table S3).
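Profiling a whole gene with nested amplicons depends on the amplicons tiling the gene without gaps. The Python sketch below checks a set of amplicon coordinates against the S gene (positions 21563–25384 in NC_045512.2, 1-based inclusive) and reports any uncovered stretches. The amplicon coordinates in the checks are hypothetical; the actual designs are listed in Table S3. Primer-binding sites are usually trimmed before variant calling, so the primer-excluded insert coordinates are the ones worth checking.

```python
S_GENE = (21563, 25384)  # NC_045512.2, 1-based inclusive


def coverage_gaps(
    amplicons: list[tuple[int, int]],
    region: tuple[int, int] = S_GENE,
) -> list[tuple[int, int]]:
    """Return uncovered (start, end) stretches of `region`; all coordinates 1-based inclusive."""
    start, end = region
    gaps = []
    cursor = start  # first position not yet covered
    for amp_start, amp_end in sorted(amplicons):
        if amp_end < cursor:
            continue  # lies within already-covered sequence or upstream of the region
        if amp_start > cursor:
            gaps.append((cursor, min(amp_start - 1, end)))
        cursor = max(cursor, amp_end + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps
```

An empty list means the amplicons tile the whole region. Each reported gap marks positions at which S-gene mutations would go undetected.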
The Veriti 96-well fast thermal cycler (Applied Biosystems) was used for the nested PCR assays. Each 25 μL reaction consisted of 5.0 μL cDNA template (1st PCR) or 2.0 μL PCR product (2nd PCR), 1.0 μL of 10 mM dNTP mix (Jena Bioscience), 500 nM of each forward/reverse primer, and 1 U of Kapa Taq polymerase (Kapa Biosystems). The thermal protocol began with a polymerase activation step at 95 °C for 3 min. This was followed by 20 cycles (1st PCR) or 40 cycles (2nd PCR), each comprising denaturation at 95 °C for 30 s, primer annealing for 30 s, and extension at 72 °C for 1 min. Annealing was performed at 60 °C for most assays and at 57 °C for assays 5, 6, 10, and 14 of the S gene analysis (Table S3). The program ended with a final extension step at 72 °C for 5 min. After completion of the 2nd reaction, 10 μL of PCR product was electrophoresed on a 1.5% w/v agarose gel, visualized with ethidium bromide staining, and photographed under UV light.
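Because the annealing temperature differs between assays, it can help to store the cycling programs as data rather than as free text. The Python sketch below encodes the DNA-seq nested PCR programs described in this section: annealing at 57 °C for S-gene assays 5, 6, 10, and 14 and at 60 °C otherwise. It also sums the hold times. The data layout is our own illustration, and ramp times, which depend on instrument settings, are ignored.

```python
from dataclasses import dataclass
from typing import Optional

LOW_ANNEALING_S_ASSAYS = frozenset({5, 6, 10, 14})  # S-gene assays annealed at 57 °C


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: tuple[Step, ...]
    cycles: int
    final: Step

    def hold_minutes(self) -> float:
        """Total hold time in minutes, excluding ramping between temperatures."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return (self.initial.seconds + self.cycles * per_cycle + self.final.seconds) / 60


def nested_program(pcr_round: int, s_gene_assay: Optional[int] = None) -> Program:
    """Program for the 1st (20 cycles) or 2nd (40 cycles) round of the DNA-seq nested PCRs."""
    if pcr_round not in (1, 2):
        raise ValueError("pcr_round must be 1 or 2")
    anneal_c = 57.0 if s_gene_assay in LOW_ANNEALING_S_ASSAYS else 60.0
    return Program(
        initial=Step("polymerase activation", 95.0, 180),
        cycle=(
            Step("denaturation", 95.0, 30),
            Step("annealing", anneal_c, 30),
            Step("extension", 72.0, 60),
        ),
        cycles=20 if pcr_round == 1 else 40,
        final=Step("final extension", 72.0, 300),
    )
```

For example, `nested_program(1, 5)` anneals at 57 °C, whereas `nested_program(2)` covers the mutation-panel assays at 60 °C. The holds alone add up to 48 min for the 1st round and 88 min for the 2nd.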
An Ion Xpress Plus Fragment Library Kit (Ion Torrent, Thermo Fisher Scientific Inc., Waltham, MA, USA) was used to construct the DNA-seq library, with 1 μg of purified PCR product mix as input. Adapter ligation, nick repair, and purification of the ligated DNA were carried out according to the manufacturer's guidelines. The adapter-ligated library was quantified using the Ion Library TaqMan Quantitation Kit (Ion Torrent) on an ABI 7500 Fast Real-Time PCR System (Applied Biosystems).
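The text specifies a 1 μg input of pooled PCR products but not how the amplicons were balanced within the pool. If they are pooled in equimolar amounts (an assumption), each amplicon's share of the total mass scales with its length. The Python sketch below computes pipetting volumes under that assumption, using the common approximation of 650 g/mol per base pair of double-stranded DNA. The amplicon names, lengths, and concentrations in the checks are hypothetical.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate mass of one dsDNA base pair


def pmol_from_ng(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to picomoles of molecules."""
    return ng * 1000.0 / (length_bp * BP_MASS_G_PER_MOL)


def equimolar_pool(amplicons: dict[str, tuple[int, float]], total_ng: float = 1000.0) -> dict[str, float]:
    """Volumes (uL) to pipette so that every amplicon contributes the same number of moles.

    `amplicons` maps a name to (length in bp, concentration in ng/uL). Each amplicon's
    mass is proportional to its length, and the masses sum to `total_ng`.
    """
    total_length = sum(length for length, _ in amplicons.values())
    return {
        name: (total_ng * length / total_length) / conc_ng_per_ul
        for name, (length, conc_ng_per_ul) in amplicons.items()
    }
```

Equal molar representation matters because a pool balanced by mass alone would under-represent the longer amplicons in terms of molecule number, and therefore in sequencing reads.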
The sequencing template was generated by emulsion PCR on an Ion OneTouch 2 System (Ion Torrent) using the Ion PGM Hi-Q View OT2 kit (Ion Torrent). The Ion OneTouch ES instrument (Ion Torrent) was then used for template enrichment. Finally, the amplicons were sequenced by semiconductor sequencing on the Ion Torrent PGM system.
4.7. Bioinformatics Analysis
Sequencing reads were aligned to the reference genome using the Burrows-Wheeler Aligner (BWA-MEM) [38]. The resulting BAM files were inspected in the Integrative Genomics Viewer (IGV) v.2.8.12 to visualize and assess the alignment results.
Variants were called from the NGS datasets with the iVar algorithm [39], using default parameters. iVar is designed to detect viral genomic variants (SNVs or indels) from amplicon-based sequencing assays. It was also used to identify the corresponding codons and to translate variants into amino acids, based on the SARS-CoV-2 GFF file.
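Per-sample summaries, such as the mutation frequencies given in the Discussion, can be extracted from iVar's tab-separated variant table. The Python sketch below assumes that the table has a header row containing the `POS`, `REF`, `ALT`, `ALT_FREQ`, and `PASS` columns, as in the iVar versions we are aware of; check this against the installed version. The sketch keeps rows flagged as passing and applies an additional frequency cutoff, whose default value is an assumption. Because iVar can emit one row per overlapping GFF feature, keying on position and alleles also collapses such duplicates.

```python
import csv
import io
from typing import TextIO


def read_ivar_variants(handle: TextIO, min_freq: float = 0.03) -> dict[str, float]:
    """Map 'POS REF>ALT' keys (e.g. '23403A>G') to alternative-allele frequency."""
    variants: dict[str, float] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["PASS"].upper() != "TRUE":
            continue
        freq = float(row["ALT_FREQ"])
        if freq < min_freq:
            continue
        # Indels keep the ALT notation as written in the table (+/- prefixed in iVar output).
        variants[f'{row["POS"]}{row["REF"]}>{row["ALT"]}'] = freq
    return variants


if __name__ == "__main__":
    # Hypothetical two-row table with only the columns this sketch reads.
    demo = "POS\tREF\tALT\tALT_FREQ\tPASS\n23403\tA\tG\t0.9\tTRUE\n23436\tA\tG\t0.01\tTRUE\n"
    print(read_ivar_variants(io.StringIO(demo)))
```

Running this for each sample yields a set of per-sample dictionaries that can be merged into a table of mutation frequencies across sampling dates.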
4.8. In Silico 3D Protein Folding Analysis
To examine the predicted structural changes caused by each detected genomic variant, 3D structure models were generated with the I-TASSER v.5.1 server [40]. To detect structural differences between the mutated and the corresponding “wild-type” polypeptides, only the 3D structure with the highest confidence score was considered.