Key SARS-CoV-2 Mutations of Alpha, Gamma, and Eta Variants Detected in Urban Wastewaters in Italy by Long-Read Amplicon Sequencing Based on Nanopore Technology

: The emergence of SARS-CoV-2 variants of concern (VOCs) and variants of interest (VOIs) poses an increased risk to global public health and underlines the need to prioritise monitoring and research to better respond to the COVID-19 pandemic. Wastewater monitoring can be used to monitor SARS-CoV-2 spread and to track SARS-CoV-2 variants. A long read amplicon sequencing approach based on the Oxford Nanopore technology, targeting the spike protein, was applied to detect SARS-CoV-2 variants in sewage samples collected in central Italy on April 2021. Next-generation sequencing was performed on three pooled samples. For variant identiﬁcation, two approaches–clustering (unsupervised) and classiﬁcation (supervised)–were implemented, resulting in the detection of two VOCs and one VOI. Key mutations of the Alpha variant (B.1.1.7) were detected in all of the pools, accounting for the vast majority of NGS reads. In two different pools, mutations of the Gamma (P.1) and Eta (B.1.525) variants were also detected, accounting for 22.4%, and 1.3% of total NGS reads of the sample, respectively. Results were in agreement with data on variant circulation in Italy at the time of wastewater sample collection. For each variant, in addition to the signature key spike mutations, other less common mutations were detected, including the amino acid substitutions S98F and E484K in the Alpha cluster (alone and combined), and S151I in the Eta cluster. Results of the present study show that the long-read sequencing nanopore technology can be successfully used to explore SARS-CoV-2 diversity in sewage samples, where multiple variants can be present, and that the approach is sensitive enough to detect variants present at low abundance in wastewater samples. In conclusion, wastewater monitoring can help one discover the spread of variants in a community and early detect the emerging of clinically relevant mutations or variants.


Introduction
Coronavirus disease 2019 (COVID- 19), the viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is the cause of the ongoing worldwide public health emergency. SARS-CoV-2 is defined by a variety of lineages harbouring distinctive genetic changes in the genome and, more significantly, in the spike (S) protein. Some fast-spreading SARS-CoV-2 variants have become the dominant circulating strains. The emergence of variants that posed an increased risk to global public health prompted the characterisation of specific variants of concern (VOCs) and variants of interest (VOIs) in order to prioritise global monitoring and research, and ultimately to inform the ongoing response to the COVID-19 pandemic (tracking SARS-CoV-2 variants (who.int)). VOCs show clear evidence of either increased transmissibility, increased virulence, change in clinical disease presentation, decreased efficacy of public health measures, diagnostics, or vaccines. VOIs have mutations with established or suspected phenotypic implications that could imply an effect on transmissibility, severity and/or immunity, realistically having an impact on the epidemiological situation (SARS-CoV-2 variants of concern as of 18 June 2021 (https://www.ecdc.europa.eu/en/covid-19/variants-concern, accessed on 11 August 2021). Finally, variants under monitoring are those for which there is some indication that they could have properties similar to those of a VOC, but the evidence is still weak or an assessment has not yet be completed. On 31 May, 2021, WHO assigned simple labels for key variants of SARS-CoV-2 using letters of the Greek alphabet (WHO announces simple, easy-to-say labels for SARS-CoV-2 variants of interest and concern). As of August 2021, there are four VOCs identified by WHO Theta and Epsilon variants still on the VOI list, and the Iota variant instead included among variants under monitoring (https://www.ecdc.europa.eu/en/covid-19/variants-concern, accessed on 11 August 2021).
In a previous study, we described a long nested RT-PCR assay targeting the spike protein (~1600 bps, covering amino acids 58-573), combined with Sanger sequencing [6]. This approach allowed the detection of panels of multiple nucleotide mutations distinctive of Alpha and Gamma SARS-CoV-2 variants as well as of the Pango lineage B.1.177 (20E.EU1). However, since classical Sanger sequencing on PCR amplicons may underestimate the existence of possibly less common sequences, in this study we aimed to combine the protocol with NGS for a more in-depth analysis, as previously successfully applied to enteric viruses [14][15][16]. The NGS technologies more commonly used, requiring sequencing of short DNA fragments and assembling in unique contigs, display some limitations in complex matrices like urban wastewaters, where a mixture of different genomes is frequently present. Using this technology, indeed, due to DNA fragmentation, it is not possible to pinpoint co-occurrence of mutations on the same DNA strand, while it is possible to detect signature mutations of SARS-CoV-2 variants by considering the frequency of mutations within a sample. The introduction of single-molecule sequencing platforms, such as single-molecule sensing technologies on the Oxford Nanopore Technologies (ONT) MinION platform, has opened the way to obtaining ultra-long reads. Herein we applied the ONT technology to identify a panel of key spike mutations, in order to discriminate the different SARS-CoV-2 variants in wastewater samples.

Materials and Methods
2.1. Sampling, Nucleic Acids Extraction, Real-Time RT-qPCR, RT-Nested-PCR, and Sanger Sequencing Twelve samples were collected in three wastewater Treatments Plants located in the Province of Latina, Latium, Central Italy between 28 April and 28 May 2021. These WTPs were selected since a high-risk COVID red zone was declared in that area, due to the high proportion of local Indian farm labourers who have tested positive for the virus and the concern about the possible circulation of the Delta variant (COVID: red zone declared near Latina as dozens of Indians test positive-English-ANSA.it).
Before viral concentration, samples underwent a 30 min treatment at 56 • C to increase the safety of the analytical protocol for the laboratory personnel and environment, and were then concentrate using polyethylene glycol (PEG)-based concentration, according to the protocol by Wu et al. (2020) [17], with slight modifications. Briefly, sewage samples (45 mL) were centrifuged at 4500× g for 30 min; after centrifugation 40 mL of sample were mixed with polyethylene glycol 8000 8% (wt/vol) and NaCl (0.3 M) (both supplied by Sigma-Aldrich, St. Louis, MO, USA), spiked with a known quantity of mengovirus, used as a process control; sample was then centrifuged at 12,000× g for 2 h and the viral pellet was resuspended in 2 mL NucliSENS Lysis Buffer reagent (bioMerieux, Marcy-l'Étoile, France) for RNA extraction. Nucleic acids were extracted using the MiniMAG extraction system (bioMerieux) and eluted RNA (100 µL) was further purified from PCR inhibitors using the OneStep PCR Inhibitor Removal Kit (Zymo Research, Irvine, CA, USA) and stored at −80 • C until molecular analysis. Viral recovery was assessed by mengovirus analysis as described in Costafreda et al. (2006) [18]. Screening for SARS-CoV-2 detection was performed by real-time RT-qPCR as reported in La   [19]. Conventional RT-PCR was performed using the SuperScript IV OneStep RT-PCR System with Platinum SuperFi RT-PCR Master mix (Invitrogen, Carlsbad, CA, USA) and previously described primers amplifying a long fragment (1592 bps) of the spike region [6]. Amplifications were performed in a T100 PCR thermal cycler (Bio-Rad Laboratories, Hercules, CA, USA). Mutations detectable by the long PCR assays for VOCs and VOIs variants are shown in Table 1.
Amplification conditions were as follows: reverse transcription at 45 • C for 10 min, denaturation at 98 • C for 2 min, followed by 35 cycles of 98 • C for 10 s, 58 • C for 10 s, 72 • C for 1 min (PCR ID 979), and a final extension at 72 • C for 5 min. PCR reactions was performed using 2 µL of each primer (10 µM) and 8 µL of RNA in a final volume of 50 µL. After the first round PCR, nested PCR (PCR ID 980) was performed in a 50 µL volume using the Phusion Hot Start II DNA Polymerase with GC buffer (Thermo Fisher Scientific, Waltham, MA, USA), 4 µL of the first PCR product, 2 µL of each primer (10 µM) and the following conditions: initial denaturation at 98 • C for 30 s, followed by 45 cycles of 98 • C for 10 s, 62 • C for 30 s, 72 • C for 1 min, and a final extension at 72 • C for 10 min. Standard precautions were taken to avoid laboratory contamination. RNA from SARS-CoV-2 Wuhan prototype strain (provided by the Robert Koch Institute under the RefBio project) and molecular grade water were included in each PCR run as positive and negative controls, respectively. PCR products were observed by gel electrophoresis (2% agarose gel, stained with GelRed, Biotium; Fremont, CA, USA). Positive PCR products were purified using Montage PCRm96 Micro-Well Filter Plate (Millipore, Burlington, MA, USA) and were sequenced on both strands (Bio-Fab Research, Rome, Italy, and Eurofins Genomics, Ebersberg, Germany). Positive PCR products confirmed by Sanger sequencing were mixed in three different pools (barcode BC01, BC02, and BC03) based on sampling points (see Table 2) for subsequent nanopore sequencing.
Percentage represent the frequency of mutation occurrence in each variant according to GISAID data (data extracted on 6 August, 2021). Key mutations used for the BAMQL queries are reported in bold.

Library Preparation of SARS-CoV-2 PCR Products for Nanopore Sequencing, Data Acquisition, Basecalling, and Bioinformatics Analyses
The workflow for library preparation, nanopore sequencing, and data analysis is shown in Figure 1. Nanopore sequencing was performed at Bio-Fab Research (Rome, Italy) using a MinION sequencer. Libraries were prepared using the cDNA-PCR Sequencing kit (SQK-PCS109) from Oxford Nanopore Technologies (Oxford, UK) following the manufacturer's instructions, combined with the Native barcoding (EXP-NBD104) kit. MinION flow cells (FLO-MIN106 R9.4.1) were loaded with 50 fmol of pooled libraries and sequencing was carried on over 17 h using the MinKNOW software 3.6.5. The raw data (FAST5 files) were base called using the high-accuracy models (dna_r9.4.1_450bps_hac.cfg) and then demultiplexed by Guppy basecalling suite v3.6.1 + 249406c on Ubuntu 18.04 machine, to obtain the final FASTQ files. Filtering parameters were applied to select only for reads preserving the 5 and 3 adaptors. Subsequently, reads were selected for a length range consistent with the PCR amplicon product (1400-1700 bases), and for an average quality equal or above 8. In order to eliminate non-specific signals, individual sequence reads were mapped to the SARS-CoV-2 reference sequence (NC_045512) and only reads aligned with this reference were used for further analysis. The resulting BAM file was the input of the subsequent analysis for amplicon assignation. Both unsupervised and supervised approaches were used for NGS reads assignment. When the alignment showed the presence of additional mutations besides the key mutations associated with the selected variant (e.g., the sub-variant of the Alpha cluster, including the amino acid substitution E484K, or other), a second query was performed to extract the reads subsets containing the additional amino acid changes. Further to the presence of queried mutations, the visualization step in IGV was used to check for alignment anomalies, such as Ins/Del heavy patterns (see Supplementary Material, Figures S4  and S5) and positive/negative strand heavy unbalance. In presence of such unbalancing, the alignment was considered unreliable and therefore discarded. A sequencing coverage level of at least 100 reads after the bioinformatic analysis was required for variant assignment. Finally, a comparison between classification outputs (variants and sub-variants) was performed extracting read IDs from the BAMQL output and using them as input for The unsupervised clustering was performed by using the phasing by the alignment model of the LongAmpliconPhasing.py tool (https://github.com/PacificBiosciences/ pbampliconclustering, accessed on 11 August 2021). Most similar reads were clustered, based on SNPs present in BAM files, calling a unique haplotype variant. Results were shown in an alleleClusterSummary file, which illustrates the different clusters and associated mutations.
The supervised classification approach was performed extracting reads subsets from BAM files, querying them with BAMQL tool v1.6 [20] using ad hoc command lines designed to search a panel of deletions and/or amino acid substitutions in linkage, each characteristic for specific VOCs or VOIs. Mutations included in the command lines for each VOC/VOI are illustrated in the Supplementary Material (Table S2). BAMQL reads subset outputs were loaded in IGV v2.9.4 for visual confirmation of the alignment and of selected mutations.
When the alignment showed the presence of additional mutations besides the key mutations associated with the selected variant (e.g., the sub-variant of the Alpha cluster, including the amino acid substitution E484K, or other), a second query was performed to extract the reads subsets containing the additional amino acid changes. Further to the presence of queried mutations, the visualization step in IGV was used to check for alignment anomalies, such as Ins/Del heavy patterns (see Supplementary Material, Figures S4 and S5) and positive/negative strand heavy unbalance. In presence of such unbalancing, the alignment was considered unreliable and therefore discarded. A sequencing coverage level of at least 100 reads after the bioinformatic analysis was required for variant assignment. Finally, a comparison between classification outputs (variants and sub-variants) was performed extracting read IDs from the BAMQL output and using them as input for the VIB web tool to draw Venn diagrams (Draw Venn Diagram (ugent.be). Bioinformatics analysis was performed under Linux Ubuntu 18.04 machine.

Results
Viral recovery, assessed by mengovirus analysis was on average 2.9% (range 1.3-6.2%). SARS-CoV-2 RNA was detected in 11 of the 12 samples by real-time RT-qPCR, with concentrations ranging from 1.9 × 10 3 to 9.1 × 10 4 genome copies (g.c.)/L of sewage (see Supplementary Materials, Table S1). Amplification by the long RT-nested-PCR assay (ID 980) was achieved in 6 of the 12 samples, and resulting amplicons were sequenced by conventional Sanger sequencing. Mutations characteristic of both Alpha (B.1.1.7) and Gamma (P.1) VOCs were detected, in these sewage samples, as shown in Table 2. These positive samples were then combined into three pools for nanopore sequencing (pool BC01, BC02, and BC03), each containing two samples collected at the same WTP in close dates. A total of 1.367.887, 1.925.893, and 764.517 raw reads were obtained for barcoded pool BC01, BC02, and BC03, respectively. After selecting only reads preserving the 5 and 3 adaptors, trimming, filtering for fragment length, and aligning to the reference Wuhan prototype strain, a total of 312.697 reads were obtained for BC01, 328.701 for BC02, and 330.950 for BC03 (see Supplementary Material, Table S1). The number of reads assigned to each variant/sub-variant by the two approaches (clustering and BAMQL classification) is shown in Table 3. Table 3. Number of reads assigned to each variant/sub-variant by the two bioinformatics analysis approaches.

Clustering Results
The results of clustering analysis using the LongAmpliconPhasing.py tool showed the presence of two clusters, corresponding to the Alpha (11,055 reads) and Gamma (8443) variants in pool BC01 (see Supplementary Materials, Figure S1); two clusters, Alpha (70,714 reads) and Alpha with substitution S98F (C21855T) (8604 reads) in BC02; and only one cluster, the Alpha variant (85,971 reads) in BC03.

Classification Results
With the analysis using BAMQL queries (see Supplementary Materials, Table S2), a total of 54,489 reads and 69,980 reads were assigned to the Alpha and Gamma clusters, respectively, in pool BC01. In pool BC02, 85,191 reads were assigned to the Alpha cluster. Of these, 38,460 included also the mutation S98F (C21855T), 761 harboured the mutation E484K (G23012A), and 931 displayed both S98F and E484K (see Table 3 and Supplementary Material, Figure S1). The Eta variant was also detected in pool BC02, with 4325 total reads, including 2036 reads displaying also the S151I mutation (G22014T) (see Table 3 and Supplementary Material, Figure S2). Venn diagrams, which use overlapping circles or other shapes to illustrate the relationships between sets of items, confirmed the results obtained in BC02 for Alpha and Eta variants/sub-variants (see Supplementary Material, Figure S3). In pool BC03, 105,315 reads were assigned to the Alpha variant. No sub-variants or other variants were detected in this pool.
In a previous investigation, we detected SARS-CoV-2 VOCs (Alpha and Gamma) in Italy by a long nested PCR (~1600 bps) targeting the spike region, followed by Sanger sequencing [6]. All SARS-CoV-2 variants involve genetic mutations of the spike protein, which is therefore a significant region to be sequenced in order to discriminate VOCs and VOIs. In the present study, we implemented this method by combining it with a long read sequencing approach based on the Oxford Nanopore MinION platform.
Key mutations associated with two VOCs (Alpha and Gamma) and a VOI (Eta) were detected in urban sewage. Specifically, a panel of mutations characteristic of the Alpha variant (H69del, V70del, Y144del, N501Y, A570D) was detected in all the tested pools, confirming that the B.1.1.7 lineage was widespread in Italy at the time of sample collection (April/May 2021), as reported by the surveillance data on VOCs in Italy (prevalenza delle tre varianti SARS-CoV-2: lineage B.1.1.7, P1, in Italia). In all the pools, the Alpha cluster accounted for the vast majority of reads and three alpha sub-clusters were identified based on the presence of additional mutations (S98F, E484K, and S98F + E484K in combination). These two mutations are indeed occasionally found in SARS-CoV-2 sequences of the Alpha variant. As of 13 August 2021, of the 1,049,488 Alpha sequences in GISAID, 18,490 (1.8%) also contain the amino acid substitution S98F. This mutation was also detected by Sanger sequencing in one sample (ID 4162) later included in the pool ( Table 2). With regard to mutation E484K, it is usually present in variants as Beta, Gamma, Eta and Zeta, but only in some isolates of the Iota and Alpha variants. In detail, mutation E484K is reported in 2959 (0.3%) of the Alpha sequences from GISAID, including 26 sequences from Italy. According to ECDC, the variant B.1.1.7 + E484K shows evidence for impact on transmissibility, immunity, and severity of the infection (https://www.ecdc.europa.eu/ en/covid-19/variants-concern, accessed on 11 August 2021). The combination Alpha + mutations S98F and E484K, found in BC02, is overall extremely rare, having being found thus far only in 12 Alpha sequences, none of which are from Italy. However, the visual control of the alignment confirmed the reads being of good quality and not artefacts of sequencing.
In pool sample BC01, a significant number of reads (approximately 70,000) were assigned to the Gamma cluster. All the reads showed the key panel of mutations characteristic of this variant (D138Y, R190S, K417T, E484K, N501Y) and no additional mutations were found in the sample.
VOCs other than Alpha and Gamma were not identified in sewage collected in April 2021. This is in agreement with clinical data at the time of sewage collection. Since March 2021, the Italian National Institute of Health (ISS) has been reporting on SARS-CoV-2 variants in nasopharyngeal swabs by whole genome sequencing of samples collected in Italy from cases notified in specific dates (so called 'flash survey'). Monthly reports are available in the ISS website (Comunicati stampa-ISS). According to the report related to 20 April 2021, the Alpha variant prevalence in Italy was estimated at 91.6%, the Gamma at 4.5%, and all the other variants were below 0.5%. Interestingly, the Eta variant was also detected in this survey in 11 of the 2041 sequences samples, with an estimated prevalence of 0.4%. The panel of mutations of the Eta variant (A67V, H69del, V70del, Y144del, E484K) was detected in more than 4300 reads in pool BC02. An Eta sub-cluster with the additional mutation S151I (G22014T) was also detected, accounting for 2036 reads, and the visual control of the alignment confirmed these reads to be of good quality. According to GISAID database, this mutation has been detected in 741 sequences from 45 countries worldwide thus far, but has rarely been associated with the Eta variant (only three sequences). To further explore the extent of the circulation of this variant in Italy at the time of collection, the GISAID database was interrogated, and, interestingly, five B.1.525 sequences were reported for nasopharyngeal swabs collected since February 2021 and from patients of the hospital S.M. Goretti of Latina, the city where the sewage samples with the Eta variant were collected, suggesting the occurrence of a small cluster of the Eta variant in this area. Significantly, the last of these swab samples had been collected on 21 April, which is consistent with our findings on the presence of eta variants in sewage on 28-29 April in the WTP catchment area.
In this study, the Delta variant was not detected in sewage samples, despite samples being collected specifically in an area where there was concern over the possible spread of this variant in the community, due to the high number of local Indian farm labourers who had tested positive for SARS-CoV-2 (COVID: Red zone declared near Latina as dozens of Indians test positive-English-ANSA.it). It is worthy of note, however, that the absence of the Delta variant in wastewater samples in April 2021 is not surprising since, at the time of sewage sample collection, only one sequence of the Delta variant had been submitted to GISAID from Italy (hCoV-19/Italy/FVG-ASUGI_4_S4_Run10/2021) and three weeks later, on May 18, the estimated prevalence of this variant in Italy was still below 1% (Prevalenza delle tre varianti SARS-CoV-2: lineage B.1.1.7, P1, in Italia). Moreover, the epidemiological investigation on the epidemic in the province of Latina, carried out by 'Lazzaro Spallanzani' National Institute for Infectious Diseases in Rome, confirmed by extended sequencing of positive nasopharyngeal swabs that no infection attributable to the Delta variant was present in the community of Latina (https://newsrnd.com/life/2021-04-29 -early-tests-latin-sikh-community--no-indian-variant-cases.HkWhSZL_vu.html, accessed on 11 August 2021).
With regard to the deep sequencing approach used in this study, both clustering and classification by specific queries were implemented for amplicon reads assignment. Clustering is a type of unsupervised analysis, used to group data points having similar characteristics as clusters. The supervised BAMQL classification analysis, instead, is able to group NGS reads in clusters on the base of selected target characteristics (query based on known key mutations typical of each variant). Significantly, while BAMQL offers direct SNPs call, it doesn't allow to directly call for deletions. In this study, we implemented a strategy to filter deletions by querying the absence of a specific nucleotide in the deleted region; along with the presence of wild type nucleotides at its 5 and 3 , to minimise selection of random deletions generated by nanopore sequencing. This "deletion calling strategy", concatenated with the panel of SNPs has a strong discriminatory power for VOCs/VOIs. Our results showed that the unsupervised clustering approach missed the Eta variant, as well as Alpha cluster, with the additional mutation E484K or the additional mutations S98F plus E484K, suggesting the inability of this approach to identify small clusters in the presence of significantly larger clusters [27,28]. The BAMQL classification, on the other hand, was able to identify also the small clusters, including the Eta variant, which represented only the 1.3% of the total reads in pool sample BC02. This method is characterised by high stringency: of the 972,348 reads obtained in total for the three pools after the filters (reads preserving both the 5 and 3 adaptor, consistent with the PCR amplicon product size, with an average quality equal or above 8, and aligning to the SARS-CoV-2 reference sequence) only 319,300 (32.8%) could be assigned. The Oxford Nanopore MinION, third generation sequencing technology, has several advantages such as portability, long read length, user friendliness, speed of sequencing, capability for multiplexing samples, and it has been demonstrated that Nanopore sequencing can identify low frequency variants in a mix of sequences [29]. However, it is important to underline that reverse transcription and PCR amplification in complex matrices display variable degrees of efficiency on different molecular targets and may significantly affect the final proportions of the amplified sequences. Therefore, the number of amplicon reads in sequenced samples do not necessary represent variant proportions in wastewater samples and sequencing results should not be considered in a quantitative manner.
MinION's main disadvantage is a relatively low read accuracy compared with the highly accurate sequencing of Sanger and Illumina methods, the estimated error rate being between 5 and 15% [30][31][32] despite its constant improvement. Therefore, the analysis of ONT sequence reads is a complex task, particularly when variant detection is required.
Here we demonstrated that, despite the low read accuracy of the system, SARS-CoV-2 variant assignment is feasible, first using stringent filtering parameters (i.e., selection only of reads with double barcodes and of the expected length), then applying a supervised tool for the assignation of reads based on stringent queries, and lastly by inspecting the sequence alignment to check for anomalies. Manual review of aligned reads for confirmation and interpretation of variant calls is an important step that can significantly increase the confidence in calls and reduce the risk of false positives. For example, if an alignment check reveals that variant bases are only on strands of the same polarity the subsequent reads assignation is likely to be a false positive, since the expected strands distribution is 50-50 with the protocol used for this sample sequencing [33]. In this study, the protocol was designed to avoid false positive mutation calls, in the awareness that the sequencing system has a non-negligible rate of errors. However, it should be considered that a high stringency of the analysis may also lead to false negative results, missing the detection of SARS-CoV-2 variants that could be present in the sample. We can assume that the vast majority of the unassigned reads were composed by PCR chimera products, or were of low quality, including too many nucleotide substitutions or insertion/deletions to be assigned. In particular, a characterization via Ins/Del ratio of unassigned reads versus assigned reads highlights that unassigned reads display a considerable higher number of small deletions, therefore hindering their correct alignment and assignation. Furthermore, sequences with an exceedingly high number of deletions and misalignments may easily be missed by the BAMQL stringent queries of the classification approach, due to absence/mismatch of one of the queried targets.
In the current study, sequencing of PCR products was not performed on separate samples, but on pooled PCR amplicons obtained from samples collected at the same WTP in close dates, thus reducing analytical costs and time. Indeed, despite the reduction of NGS costs in recent years, sequencing of large numbers of samples is still challenging. Therefore, barcoding and pooling of samples is a cost-and time-effective approach. One of the most challenging problems of sample pooling is the correct identification of rare variants. In this study, however, we were able to identify mutations related to the Eta variant accounting for only 1.3% of total NGS reads of the sample, confirming that rare variants can be successfully detected after pooling, as already demonstrated by other studies [34].
In this study, 6 of the 12 samples could not be amplified using the long RT-nested-PCR assay. Of these, 5 samples displayed presence of SARS-CoV-2 RNA by real-time RT-qPCR in low concentrations (equal or below 3.5 g.c./µL of tested RNA). This means that SARS-CoV-2 infection was present in the community, but variant analysis could not be completed by partial spike gene sequencing since the amplification of long fragments poses significant challenges in complex matrices as wastewaters. Indeed, while falsepositive errors can be reduced through strict adherence to best laboratory practices with appropriate controls, false-negative results may be associated to intrinsic sensitivity limits of the analytical methods. For a detailed review on factors contributing to SARS-CoV-2 RT-PCR false-negative errors in wastewater surveillance, see Ahmed et al., 2021 [35].

Conclusions
In conclusion, the present study, the first to our knowledge using long-read amplicon sequencing based on nanopore technology to detect SARS-CoV-2 variants in sewage, provides the results of a preliminary analysis of VOCs and VOIs based on partial spike gene amplicon sequencing, using the ONT MinION. The developed bioinformatics analysis approach for variants identification allowed the detection of mutations characteristic of the Alpha, Gamma, end Eta variants in sewage samples from central Italy. Regular collection of wastewater samples and systematic sequencing of the spike gene of SARS-CoV-2 in sewage will help identify VOCs and VOIs and track their circulation in the population.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/w13182503/s1. Figure S1. AlleleClusterSummary for pool BC01 (Clustering unsupervised results); Figure S2. Mutations detected in the alpha cluster; Figure S3. Mutations detected in the eta cluster; Figure S4. Venn diagrams for Alpha and Eta reads in pool BC02; Figure S5. Distribution of small deletions in assigned reads (A) and in unassigned reads (B): small deletions are significantly more frequent in unassigned reads; Figure S6. Insertion/Deletion analysis in assigned reads (A) and unassigned reads (B): in unassigned reads Ins/Del ratio is exceedingly low in the range of 10 bases indicating an abundance of small deletions in comparison to assigned reads. Table S1. Results of real-time RT-qPCR detection of SARS-CoV-2 in the tested samples; Table S2. Number of raw and final reads selected after filtering steps; Table S3. Concatenated mutations queried using the BAMQL tool v1.6 for the detection of each VOC/VOI.

Funding:
The study has received funding from the Italian Ministry of Health in the framework of the national project "Wastewater based epidemiology: implementation of a surveillance system for the early detection of pathogens with a focus on SARS-CoV-2" (CCM 2020 program).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.