The Spread of SARS-CoV-2 Omicron Variant in CALABRIA: A Spatio-Temporal Report of Viral Genome Evolution

We investigated the evolution of SARS-CoV-2 spread in Calabria, Southern Italy, in 2022. A total of 272 RNA isolates from nasopharyngeal swabs of individuals infected with SARS-CoV-2 were sequenced by whole genome sequencing (N = 172) and/or Sanger sequencing (N = 100). Analysis of diffusion of Omicron variants in Calabria revealed the prevalence of 10 different sub-lineages (recombinant BA.1/BA.2, BA.1, BA.1.1, BA.2, BA.2.9, BA.2.10, BA.2.12.1, BA.4, BA.5, BE.1). We observed that Omicron spread in Calabria presented a similar trend as in Italy, with some notable exceptions: BA.1 disappeared in April in Calabria but not in the rest of Italy; recombinant BA.1/BA.2 showed higher frequency in Calabria (13%) than in the rest of Italy (0.02%); BA.2.9, BA.4 and BA.5 emerged in Calabria later than in other Italian regions. In addition, Calabria Omicron presented 16 non-canonical mutations in the S protein and 151 non-canonical mutations in non-structural proteins. Most non-canonical mutations in the S protein occurred mainly in BA.5 whereas non-canonical mutations in non-structural or accessory proteins (ORF1ab, ORF3a, ORF8 and N) were identified in BA.2 and BA.5 sub-lineages. In conclusion, the data reported here underscore the importance of monitoring the entire SARS-CoV-2 genome.


Introduction
In December 2019, a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1] caused an outbreak of atypical pneumonia in Wuhan, China [2], designated by the World Health Organization (WHO) as Coronavirus Disease 2019 (COVID -19). The Wuhan strain of SARS-CoV-2 (NC_045512.2) was rapidly replaced in late January 2020 by a variant called B.1, characterized by mutation D614G in the spike protein (S protein) gene, which increases the transmissibility of SARS-CoV-2 [3]. The B.1 lineage spread worldwide in March 2020 [4]. The rapid spread of SARS-CoV-2 was triggered by the increase in mutations in the S protein that enhanced binding to the ACE2 receptor [5], counteracted the neutralizing effect of natural antibodies [6,7], and/or increase the transmissibility of the virus to other species [8][9][10][11]. In November 2020, the variant of concern (VOC) designated Alpha (B.1.1.7, according to Pango nomenclature [12]) emerged in the United Kingdom. The Alpha variant had nine mutations in the S protein, including the N501Y mutation that promotes infection and transmission of SARS-CoV-2 [13]. In late 2020, two additional VOCs

Sequencing of Viral Genome
The SARS-CoV-2 nasopharyngeal positive swabs were collected in the hospitals of the Catanzaro area, Southern Calabria, Italy. The diagnosis was carried out by the following assays: Allplex SARS-CoV-2 Variants (Arrow Diagnostics Srl, Genoa, Italy), Applied Biosystems TaqPath COVID-19 high throughput combo (ThermoFisher Scientific, Waltham, MA, USA), Xpert ® Xpress SARS-CoV-2 (Cepheid, Sunnyvale, CA, USA) and/or cobas ® SARS-CoV-2 Test (Roche Diagnostics, Indianapolis, IN, USA). Viral RNA was extracted using the NUCLISENS ® easyMAG ® (bioMérieux, Florence, Italy). The RNA samples were subjected to retro-transcription through the InvitrogenTM SuperScript IV VILO Master Mix (ThermoFisher Scientific) and then S gene regions were amplified with the primer list reported in Table S2 and using PCR products obtained through the BigDye Direct Cycle Sequencing Kit (Applied Biosystems, Waltham, MA, USA) were then purified according to the instructions of the BigDye Terminator Purification Kit (Applied Biosystems) and sequenced through the Genetic Analyzer 3500 Dx (Applied Biosystems). Electropherograms were analyzed by SeqScape Software, v.4 (Applied Biosystems), and compared to the SARS-CoV-2 reference genome (NC_045512.2). In brief, 7 µL of viral RNA were retrotranscribed by using InvitrogenTM SuperScriptTM VILO cDNA Synthesis Kit (ThermoFisher Scientific). Libraries were prepared using the Ion AmpliSeq SARS-CoV-2 Research Panel (ThermoFisher Scientific), which consists of 237 amplicons ranging from 125 to 275 bp in length across the SARS-CoV-2 genome. Libraries preparation was performed either manually according to the Ion AmpliSeq Library Kit Plus or on the automatic Ion ChefTM Instrument by using the Ion AmpliSeq Kit for Chef DL8 (ThermoFisher Scientific). The final concentration of manually prepared cDNA libraries was determined on the Agilent 2200 System by the Agilent High Sensitivity DNA Assay (Agilent Technologies, Santa Clara, CA, USA), following manufacturer's recommendations. The concentration of automatically prepared cDNA libraries was determined by Ion Library TaqMan Quantitation Kit (ThermoFisher Scientific). Barcoded libraries were diluted to 30 pM and then loaded onto the Ion ChefTM Instrument (ThermoFisher Scientific) for emulsion PCR, enrichment, and loading onto the Ion S5 520 chip. Post-sequencing run analysis was performed by the Ion Torrent Suite Software with the following plugins: The SARS-CoV-2_coverageAnalysis, COVID19AnnotateSnpEff for variant annotation, and generateConsesus to obtain consensus sequence for each barcode. Variant classification was performed on whole-genome sequences using the Pangolin tool 4.1.1 [34].

Phylogenetic Analysis
Phylogenetic analysis was performed using the full FASTA dataset via NGPhylogeny web tool [35] on FastTree [36] default parameters (bootstrap = 100). Visualization of the phylogenetic tree was performed by Interactive Tree of Life (iTOL) v6 software [37].

Analysis of Viral Strains
Distribution of Omicron sub-lineages in Italy was extracted from GISAID database (data extraction 30 September 2022) using the filters human, VOC Omicron GRA, Italy, complete and low coverage excluded. A total of 48,453 sequences were extracted (Table S3). The frequency of non-canonical mutations in S protein in the viral isolates in Italy was extracted from GISAID database using the filters human, VOC Omicron GRA, complete, low coverage excluded (Table S3).  (Table S4). From January 2022 the highest number of infections was registered in the Reggio Calabria province (N = 135,090 cases), followed by Catanzaro province (N = 68,421 cases), Crotone province (N = 44,896 cases) and Vibo Valentia province (N = 34,227 cases). The percentage of positive swabs per month in the specific regional area was on average 44% (range 22-85%) for Reggio Calabria, 22% (range 8-28%) for Catanzaro, 14% (range 8-25%) for Crotone and 11% (range 6-24%) for Vibo Valentia, (Table S4). The percentage of sequenced samples compared to positive swabs in the Calabria area was on average 0.4% (range 0.1-0.8%). As previously reported [39], Calabria Omicron arose in January 2022 and became the prevalent variant in February 2022. To investigate the evolution of the Omicron variant in Calabria, in this study we have analyzed SARS-CoV-2 swabs obtained during 2022, including previously published data that referred to January and February 2022. Viral RNA was extracted from 272 nasopharyngeal swabs that were positive between March and September 2022, of which 172 were subjected to whole genome sequencing while 100 were sequenced by targeted Sanger sequencing of specific regions of S, M, and N genes (see Section 2). As previously indicated SARS-CoV-2 NGS sequences relative to January and February 2022 were already published [39] and available in the GenBank [40] under accession No. SUB11466277. As to the swabs sequenced in this study, the median depth of sequencing was 4200 (range, 790-11,169), with genome coverage > 99% in 97% of the viral genome. A median number of 613,908 reads (range 110,033-1,618,768) was generated for each sample (Table S5). For data relative to sequences of swabs collected in January and February 2022, see De Marco et al., 2022 [39]. From January 2022, Omicron was the prevalent variant in the Calabria region. Phylogenetic tree analysis of the complete genomes of Omicron isolates is shown in Figure 1.

Evolution of Omicron Sub-Lineages
Through this analysis, we have identified five Omicron lineages that were further divided into different sub-lineages. The five Omicron lineages identified in this study were:  Figure S1 and Table S4. The distribution of the 10 Omicron sub-lineages present in the GISAID database (on 30 September 2022) showed, as a whole, a similar behavior throughout all Italian regions, though with some notable exceptions. The highest frequency of BA.1 in Italy occurred in Lazio, Piemonte, and Trentino-Alto Adige (31%, 37%, and 31%, respectively), whereas the lowest frequency was observed in Marche (4%). Conversely, Calabria was the Italian region with the highest frequency of BA.2.9 variant (12%). As described, BA.2.9 appeared in Calabria in April 2022 and remained until July 2022 with a frequency higher than that observed in Italy (1%). In addition, from the analysis of the Gisaid database Calabria, Emilia-Romagna, and Marche showed a frequency of 7% for BA.4, which is the highest frequency of this sub-lineage observed in Italy during 2022 ( Figure 3c).    observed in Italy during 2022 (Figure 3c). In this study, mutations that allowed the identification of a specific VOC were indicated as canonical, whereas mutations that appeared sporadically in one or more isolates were indicated as non-canonical. Both canonical and non-canonical mutations were present in the S gene, in the genes encoding both structural (E, M, and N) and nonstructural viral proteins (ORF1ab, ORF3a, ORF6, ORF7a, ORF8) [41].      In this study, mutations that allowed the identification of a specific VOC were indicated as canonical, whereas mutations that appeared sporadically in one or more isolates were indicated as non-canonical. Both canonical and non-canonical mutations were present in the S gene, in the genes encoding both structural (E, M, and N) and non-structural viral proteins (ORF1ab, ORF3a, ORF6, ORF7a, ORF8) [41].   We have also observed that the S protein of Omicron lineages presented 16 noncanonical mutations (L5F, V6I, P9T, S12F, T22I, M153I, D178N, G181V/A, P209L, I624M, N658S, A701V, A713S, P862S, E1144V). The most frequent non-canonical mutations in the cohort under analysis were L5F and A701V (2.5% and 2.7%, respectively), with N658S and A713S showing lower frequencies (0.8% and 0.5%, respectively). All the remaining non-canonical mutations (V6I, P9T, S12F, T22I, M153I, D178N, G181V/A, P209L, I624M, P862S, E1144V) appeared in only one isolate. L5F was the only non-canonical mutation that appeared in two lineages (BA.2.ˆ, and BA.5.ˆ).

Mutations in the S Protein
Notably, the lineages that presented the highest percentage of isolates with noncanonical mutations were BA.1ˆ(4.8%), BA.5.ˆ(3.9%), and BA.2.ˆ(2.9%). Concerning the lineages with a greater number of non-canonical mutations, BA.5.ˆshowed nine different non-canonical mutations (L5F, S12F, M153I, G181V/A, P209L, A713S, P862S, E1144V). In total, 36 isolates (10%) had only 1 non-canonical mutation in the S protein. Details are reported in Table 2 and Figure 4.   Concerning the timing of the emergence of non-canonical mutations, we have observed that A701V was first detected in nine patients in January, P9T and T22I appeared each in one patient in March, L5F was detected in three patients in April, I624M was detected in one patient in May, V6I, M153I, N658S were detected in five patients in June, S12F, D178N, G181A, P209L, and A713S were detected in six patients in July, G181V and P862S were detected in one patient in August, and finally, E1144V was detected in one patient in September. Notably, the highest frequency of occurrence of non-canonical mutations in the S protein was observed in July (31%). Interestingly, most non-canonical mutations were in the NTD, CT1-CT2, S1/S2 cleavage site, and FP domains of the S protein [42] but not in the RBD domain (Table 3). Some mutations are apparently of particular interest since they are located within (M153I) or near (P9T, S12F and T22I) the antigenic supersite of the NTD domain, which represents the main target of antibodies produced by memory B cells after administration of mRNA vaccine [43,44]. Three additional non-canonical mutations (N658S, A701V, A713S) were located between CT1/CT2 and S1/S2 cleavage site and may be involved in the increase in cleavage capability of the S protein [16]. Finally, N658S, identified in three BA.4 isolates, is implicated in resistance to antibody neutralization [45].
P862S is a mutation that falls in the FP domain. It showed a frequency of 0.3% in the cohort under analysis. Notably, during the Omicron wave, P862S mutation occurred with a very low frequency in Europe (0.0008%) whereas it was detected in a single patient from the Italian region, Trentino-Alto Adige, in April 2022 (GISAID database). This may be due to the observation that residue P862 in the FP domain may impair viral entry because it is necessary to initiate viral fusion with the host cell membrane [46,47]. Accordingly, only a small number of mutations in the FP domain have been described during the pandemic, possibly because this domain is highly conserved among β-coronaviruses [48,49]. The only mutation in this domain described worldwide is N856K with an overall frequency of 43% [41].
The mutation density per domain defined as the number of mutations divided by the amino acidic size of the domain was calculated and reported in Table 3. The highest mutation density was shown by the NTD domain (3.4%) followed by the CT1-CT2 (1.3%), S1/S2 cleavage site (1.6%), and FP (1.1%) domains (Table 3). Among the different Omicron lineages, BA.5.ˆand BA.2.ˆpresented the highest number of mutations in the NTD domain (N = 7 and N = 2, respectively), compared to BA.1.ˆand BA.4.ˆ(N = 1). It is of note that BA.5.â lso showed the major number of mutated domains compared to other lineages (N = 4). See Table 3.

Mutations in Structural Proteins
Structural proteins E, M, and N play crucial roles in viral replication, assembly, and release [50]. Mutations identified so far in the above-mentioned proteins are listed in Table S7 and are described in File S1. E protein shows only one canonical mutation (T9I) which is present in all lineages. M protein shows two canonical mutations (Q19E and A63T) that are present in all lineages. Only the BA.5.ˆlineage presents an additional mutation (D3N) in the M protein. Mutations R203K and G204R in N protein are common to all lineages whereas mutation P151S has been identified only in BA.4.ˆ.
In this study, we have identified five non-canonical mutations in the M protein (S4F, T7I, L34F, A85V, H125R) and fourteen non-canonical mutations in the N protein (D3V,  T49I, H59R, D103N, P162H, N181L, S194L, A220T, G275C, T362I, P365S, T366I, T379I,  A414S). Details are provided in Table 4. Several isolates in this study (8%) showed noncanonical mutations in N while a more limited number of isolates showed non-canonical mutations in M (3%). Among the identified Omicron lineages, BA.2.ˆand BA.5.ˆhad the highest percentage of isolates with non-canonical mutations in N (5% and 2%, respectively). Conversely, non-canonical mutations in the gene encoding the M protein were identified exclusively in BA.1.ˆand BA.2.ˆ(2% and 1% of isolates, respectively) ( Figure 4). These data are in agreement with previous reports showing a higher frequency of mutations in N protein compared to M in the Omicron variant compared to variants such as Alpha, Gamma, and Delta [51][52][53]. It is of note that N genomic region along with the S gene is widely used in clinical practice to diagnose SARS-CoV-2 positivity [54].

Mutations in Non-Structural Proteins
Mutations identified in non-structural proteins are listed in Table S7 and are described in Supplementary File S1. A total of 115 non-canonical mutations were identified in the ORF1ab gene; 10 non-canonical mutations and 1 deletion were identified in ORF3a; 1 non-canonical mutation was identified in ORF6; 2 non-canonical mutations were identified in ORF7a; 4 non-canonical mutations were identified in ORF8. Most non-canonical mutations in ORF1ab were identified in nsp2, nsp3, and nsp1 (18%, 15%, and 13%, respectively), in ORF3a (7% of isolates), in ORF8 (6% of isolates) and in Orf7a (3%). Only one isolate showed a non-canonical mutation in ORF6.
ORF1ab showed 10 canonical mutations in BA. With respect to other non-structural proteins, no mutations were identified in nsp5, nsp7, nsp8, nsp9, nsp10, or nsp11 while, remarkably, we detected mutations in the catalytic subunit of RdRp polymerase (nsp12) which may have a negative impact on remdesivir efficacy as previously demonstrated [55][56][57]. Within non-structural proteins, nsp1 plays an important role in the evolution of Omicron variants. Nsp1 inhibits host gene expression and suppresses the innate immune response [58,59].
In the present study, we also identified three deletions in nsp1: 141-143 del, 82-86 del, and 85 del. Four isolates showed the 141-143 del; eight isolates presented the 82-86 del and five isolates presented the 85 del. Of note, the 141-143 del was described as a canonical alteration of nsp1 in BA.4.ˆlineage and as a non-canonical alteration in four isolates of BA.2.ˆ. Deletion 79-89 del is known to be associated with a lower viral load [60]. However, when we compared the Ct values of isolates with or without the deletions in nsp1, we observed that the mean Ct values of isolates carrying the deletions 82-86 del (21.88 ± 3.08, N = 17) or 141-143 del (23.25 ± 3.30, N = 4) were not significantly higher than the Ct values presented by the isolates without deletions (19.98 ± 2.96, N = 185).
ORF3a protein plays a key role in viral genome replication and virion assembly, inhibition of host mRNA export and translation, and escape from the host immune system [61].
We identified 9% of isolates with non-canonical mutations in the gene encoding ORF3a. ORF3a is a viral potassium ion channel (viroporin) that inhibits autophagy and activates the inflammasome and apoptosis [62,63]. We found 11 non-canonical mutations that, to our knowledge, have been poorly characterized so far, except for the L108F and F207L mutations. Although extremely rare, these mutations have been predicted to exert a destabilizing effect on the stability of the protein structure [64].
Conversely, ORF8 impairs host cell-mediated immune responses by downregulating MHC-1 molecules [65,66]. ORF8 is poorly conserved among related coronaviruses, likely because of its key role in the pathogenicity of SARS-CoV-2. Many mutations and deletions of ORF8 have been identified [67]. All reports conclude that SARS-CoV-2 can survive without a functional ORF8 though the accuracy of serological tests may be compromised in such cases [68,69]. In this study, we found four non-canonical mutations (Q27*, E64G, A65D, S67F) of the ORF8 protein in 8% of isolates.

Discussion
The present manuscript provides data regarding the evolution of SARS-CoV-2 spread in Calabria, Southern Italy.
The first key finding of this study includes the analysis of the diffusion of Omicron variants in Calabria. Omicron spread during 2022 presented a similar trend in Italy and Calabria, with some notable exceptions: (i) recombinant BA.1/BA.2 showed higher frequency in Calabria (13%) than in the rest of Italy (0.02%), and presented all canonical mutations of S protein of BA.2.ˆlineage; (ii) BA.2.9, BA.4 and BA.5 emerged in Calabria later than other Italian regions; (iii) BA.1 sub-lineage disappeared in April in Calabria but not in the rest of Italy.
As observed in this study, the recombinant BA.1/BA.2 increased in Calabria in April (13%) while its frequency was negligible in the rest of Italy (mean frequency 0.02%), with the highest frequency in Friuli Venezia Giulia (0.4%) and the lowest in Lazio (0.02%).
BA.2.9 occurred for the first time in February in Umbria (0.07%) while in Calabria this sub-lineage was detected for the first time in April.
BA.4 appeared for the first time in May in Lombardia (0.03%) and BA.5 appeared for the first time in May in Umbria, Lazio, and Puglia (0.09%, 0.04%, and 0.02%, respectively), while the first cases in Calabria were detected in June.
Finally, unlike what was observed in Calabria where BA.1 disappeared in April, it was still present in Italy until May with cases in Lombardia (0.04%) and Campania (0.002%). In Italy in 2022 COVID-19 showed three peaks of infection associated with the spread of Omicron lineages BA.1, BA.2, BA.4, and BA. 5 [38]. In particular, the spread of the BA.1 lineage, from late December 2021 to mid-January 2022, led to an exponential increase in COVID-19 cases, characterized by a 55% increase in the number of diagnosed cases. The third point was the discovery of 16 non-canonical mutations identified in the S protein as well as 151 non-canonical mutations in the non-structural and accessory proteins of the SARS-CoV-2 genome. The mutations identified in ORF1ab, ORF3a, M, ORF8, and N were not common to all Omicron sub-lineages but showed a specific distribution among variants.

Conclusions
This study provides a detailed description of the genome evolution of the Omicron variant in the Calabria region of Southern Italy. This analysis allowed the identification of non-canonical mutations in most genes (ORF1ab, S, ORF3a, M, ORF8, N) of the SARS-CoV-2 genome. The non-canonical mutations identified in this study occurred in viral proteins involved in viral replication, assembly and release, viral fusion with cellular membranes, escape from the immune system, resistance to neutralization by antibodies, and resistance to antiviral therapies. The mutations highlighted here show a specific distribution among Omicron sub-lineages allowing discrimination among them.
In summary, sequencing of the entire SARS-CoV-2 genome allows not only the identification of the exact viral variant with which humans are infected but also real-time monitoring of the emergence of novel mutations that could impact viral transmissibility and the efficacy of therapies.