Molecular Evolution and Epidemiological Characteristics of SARS COV-2 in (Northwestern) Poland

The emergence of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) evolved into a worldwide outbreak, with the first Polish cases in February/March 2020. This study aimed to investigate the molecular epidemiology of the circulating virus lineages between March 2020 and February 2021. We performed variant identification, spike mutation pattern analysis, and phylogenetic and evolutionary analyses for 1106 high-coverage whole-genome sequences, implementing maximum likelihood, multiple continuous-time Markov chain, and Bayesian birth–death skyline models. For time trends, logistic regression was used. In the dataset, virus B.1.221 lineage was predominant (15.37%), followed by B.1.258 (15.01%) and B.1.1.29 (11.48%) strains. Three clades were identified, being responsible for 74.41% of infections over the analyzed period. Expansion in variant diversity was observed since September 2020 with increasing frequency of the number in spike substitutions, mainly H69V70 deletion, P681H, N439K, and S98F. In population dynamics inferences, three periods with exponential increase in infection were observed, beginning in March, July, and September 2020, respectively, and were driven by different virus clades. Additionally, a notable increase in infections caused by the B.1.1.7 lineage since February 2021 was noted. Over time, the virus accumulated mutations related to optimized transmissibility; therefore, faster dissemination is reflected by the second wave of epidemics in Poland.


Introduction
Coronaviruses (CoVs) are a family of large, up to~30kb in length single-stranded RNA viruses. Four phylogenetically distinct groups (alpha, beta, gamma, delta) have been identified, including seven human coronavirus variants (hCoVs): the alpha-CoVs HCoVs-NL63 and HCoVs-229E, beta-CoVs HCoVs-OC43 and HCoVs-HKU1, severe acute respiratory syndrome-CoV (SARS-CoV), the Middle East respiratory syndrome-CoV (MERS-CoV), and lastly SARS-CoV-2 [1]. The emergence of the last one, identified in Wuhan, China, is responsible for the ongoing pandemic of coronavirus disease 2019 (COVID- 19) and has initiated unprecedented efforts in studies on its molecular evolution, including wholegenome sequencing [2]. Coronaviruses possess lower mutation rates than other respiratory RNA viruses (a rate of ∼1.1 × 10 −3 substitutions per site per year, corresponding to one substitution every ∼11 days); however, a high number of infections in the expanding epidemics translates into increasing interhost genomic diversity [3].
As of 22 June 2021, more than 2,041,359 complete genomes of SARS-CoV-2 were deposited in the Global Initiative on Sharing All Influenza Data (GISAID) database [4].
This extensive molecular surveillance of virus requires a straightforward approach to the classification of variant genetic diversity. Three main nomenclatures have been introduced for SARS-CoV-2, including Nextstrain clades [5], PANGO lineages (PANGO, Phylogenetic Assignment of Named Outbreak LINeages) [6], and GISAID groups [7]. While PANGO provides more detailed outbreak cluster information, the other two classifications offer broad geographical and temporal clade trends. A vast amount of molecular data enables real-time tracing of the pandemic evolution by investigation of the outbreaks and surveillance of the emergence of novel circulating strains.
The emergence of the novel variants requires constant molecular surveillance, as molecular diversity results in evolution into variants of concern (VOC), with clear scientific evidence on the improved transmissibility, immune escape, or severity. This evolution may notably hinder efforts in the combat of SARS-CoV-2 pandemics [8]. Furthermore, expanding population immunity exerts novel selection pressures on the virus, further underscoring the importance of monitoring the vaccine, convalescent plasma, and immunoglobulin escape variants [9]. Novel virus strains are also classified as variants of interest and variants under monitoring, with preliminary data suggesting an increase in transmission risk or disease severity. Key VOCs currently circulating in Europe and worldwide are variants B.1.1.7 originating in the UK, which has dominated the EU epidemics in recent months, B.1.351 from South Africa, and P.1 first identified in Brazil, all with clear evidence on their increased transmissibility and severity [10,11].
All VOCs and most variants under surveillance harbor mutations within the spike protein-coding regions, allowing for optimized binding to the human angiotensin-converting enzyme receptor (ACE-2). Specific mutations, such as D614G, have been fixed in the circulating viral strains since the initial months of the pandemic, while others are common among the variants of increased virulence (e.g., N501Y, E484K) [12,13]. Additionally, deletions in the spike-coding regions, such as ∆H69/V70, were associated with the increased incorporation of spike into virions, which may act as a permissive factor allowing for the emergence of other deleterious mutations [14]. Studies on the impact of the mutations on virus evolution are ongoing, and continuously identify novel variants and mutations, with the key recent ones being L452R, E484Q, and T478K from Indian isolate B.1.617-VOC Delta and Kappa [15].
In Poland, the first confirmed case of the COVID-19 disease was registered on 3 March 2020 (https://www.gov.pl/web/coronavirus accessed on 22 June 2021). The epidemic has evolved in three waves so far. The first was a mild one observed in the spring of 2020, while the second and third waves have been associated with the high COVID-19 rates/100,000, significant mortality, and limited access to medical services. Recent increases in the number of new cases have been associated with the introduction of the B.1.1.7 variant into the population. So far, national totals as of 21 June 2021 were 2,878,840 cases of infections and 74,829 deaths which translate into <3% official COVID-19 mortality [16]. The country is scaling up the molecular detection and sequence-based identification efforts to provide upto-date information on the population evolution of the virus; however, so far, no detailed phylogenetic study on the variant evolution and mutation emergence has been published.
In the present study, we aimed to present molecular surveillance data on SARS-Cov-2 variant evolution in the first year of the pandemic in Poland, with the characteristics of spike protein mutations, based on sequences from Northwestern Poland supplemented with the GISAID data. Moreover, we performed evolutionary and epidemiological analyses to reflect the characteristics of the virus variants circulating at the country level, tracing the origin and the temporal population dynamics of SARS-CoV-2 in Poland.

Study Group
In this study, a dataset of locally obtained samples (159 cases) with nucleic acid amplification testing (NAAT)-confirmed SARS-CoV-2 infection was supplemented with a set of 1005 sequences (a total of 1164 sequences) from Poland available in the public GISAID database as of 1 March 2021. Clinical data were obtained from medical record reviews or collected by the sequencing laboratory in the process of clinical testing.

SARS-CoV-2 Whole Genome Sequencing (WGS)
RNA was extracted using the MagMAX Viral/Pathogen Nucleic Acid Isolation Kit (Thermo Fisher Scientific (TFS), Vantaa, Finland), and the automated KingFisher Flex (TFS, Singapore) instrument for automated sample purification according to manufacturer instructions. Subsequently, the extracted RNA was quantified with TaqMan

Sequence Data Sets
The MAFFT v.7.471 program [17] was used to align and remove sequences with more than 5% ambiguous letters. Following quality control, from the initial dataset of 1164 Polish SARS-CoV-2 sequences we selected 1106 high-coverage full-genome sequences including 122 local Northwestern Poland samples and 984 genomes obtained from GISAID. In the next step, the SARS-CoV-2 alignments were filtered, and sequences were masked following the script published by Nicola del Mario et al. [18]. For SARS-CoV-2 variant identification and mutation calling, we employed two lineage assignment tools: PANGOLIN v2.3.2 (https://github.com/cov-lineages/pangolin (accessed on 15 March 2021); referred to as PANGO) and NEXTCLADE v0.14.2 (https://github.com/nextstrain/nextclade (accessed on 15 March 2021); referred to as Nextstrain). Finally, for phylogeographic analyses, each West Pomeranian sequence was used as a query in a BLAST search (Basic Local Alignment Search Tool) against all the GISAD SARS-CoV-2 sequences. For every West Pomeranian isolate, ten of the most similar sequences were downloaded, and duplicates removed. As a result, a set of 376 sequences related to Polish Northwestern SARS-CoV-2 genomes was obtained. Quality checks of the final sequences and evaluation of genetic distance were performed in MEGAX software [19].

Phylogenetic and Phylodynamic Analyses
All phylogenetic trees were generated with IQ-Tree v2.0.5 [20] using the maximum likelihood (ML) method with approximate likelihood ratio test (aLRT) and ultrafast bootstrap with 1000 replicates. The GTR+F+G model with four gamma categories was selected as optimal for the analyzed dataset using ModelFinder accuracy estimates [21]. All trees were visualized in the Interactive Tree of Life (iTOL) [22]. After TempEst v.1.5.3 [23] analysis, the SARS-CoV-2 phylogenies exhibited a moderate association between genetic distances and sampling dates and were suitable for phylogenetic molecular clock analysis in Bayesian Evolutionary Analysis by Sampling Trees (BEAST). A large residual point scatter from the regression line suggested that a relaxed molecular clock model should be most appropriate for subsequent analysis. Different coalescent tree priors for identified clade 1-3 sequences were separately implemented in the BEAST v.1.10.4 software package [24]. For the time-scaled analysis, the uncorrelated relaxed clock model with an underlying lognormal distribution (UCLN) and tree prior of coalescent Bayesian skyline growth population with five groups piecewise-constant model was used. As identified by ModelFinder, we used a multiple continuous-time Markov chain (MCTMC) GTR+F+G4 model. Five hundred million Markov chain Monte Carlo (MCMC) runs with sampling every 10,000 steps were computed and processed in two independent replicates of the same inference [25]. LogCombiner was used for combining the output from multiple runs, and results were visualized and checked in Tracer v1.8 [26]. The effective sampling size values (ESS) were 200 or more, indicating adequate convergence.
The Bayesian birth-death skyline model (BDSKY) [27] was implemented in BEAST v2.62 [28] to estimate changes in the effective reproductive number (R e ) for three clades separately. For heterochronous data, the Birth Death Skyline Serial prior with UCLN distribution was selected under a GTR+G4 substitution model based on ModelFinder selection. The evolutionary rates for each clade were based on the slope of root-to-tip plots assigned in TempEst. A lognormal distribution with a mean of 0 and standard deviation of 1.0 for R e was chosen, and the dimension of the parameter was selected to be five. The rate to become uninfectious (δ) had a normal distribution with a mean of 48.7 and a standard deviation of 15. These values reflect the inverse of the time of infectiousness and were estimated by Li Q. et al. [29]. The sampling proportion (ρ) prior was assessed with the alpha parameter set to 1 and beta to 9999. The origin of the epidemic was approximated with normal prior using a mean of 0.25 and standard deviation of 0.05 units per year, as described elsewhere [30]. MCMC ran for at least 200 million generations and was sampled every 50,000 steps. The ESS value reads were diagnosed using Tracer, and values above 200 indicated sufficient sampling.

Statistics and Visualization
Statistical comparisons were performed with Fisher's exact and X 2 tests for nominal variables, as needed. The confidence intervals (CI) were marked as appropriate. Statistical calculations were made with commercial software (Statistica v13. Statasoft, Warsaw, Poland). The R (4.0.2.) platform [31] was performed with packages including MASS [32] for time trends and logistic regression, CORRPLOT [33] for Spearman Rank test, and VCD [34] to visualize the two-way contingency plots.

Phylogenetic Analysis of SARS-CoV-2 Genomes
Phylogenetic analysis revealed that 105 (86.07%) sequences from the West Pomerania region formed three main clades with aLRT support >90% ( Figure 2). We performed phylogenetic reconstruction of the sequences and the ancestry to identify clade-defining mutations (Table 1). For the purpose of this study we named these monophyletic groups as  Table 1). The mean number of base substitutions per site within the clade (6.58 × 10 −4 , SEM = 4.23 × 10 −5 ) versus mean inter-clade variability (1.32 × 10 −3 , SEM = 1.04 × 10 −4 ) showed the higher intra-group sequence similarity, as shown in the inferred phylogenetic tree (estimates of clade average evolutionary divergence presented in Supplementary Table S1).

Phylogenetic Analysis of SARS-CoV-2 Genomes
Phylogenetic analysis revealed that 105 (86.07%) sequences from the West Pomerania region formed three main clades with aLRT support >90% ( Figure 2). We performed phylogenetic reconstruction of the sequences and the ancestry to identify clade-defining mutations (Table 1). For the purpose of this study we named these monophyletic groups as  Table 1). The mean number of base substitutions per site within the clade (6.58 × 10 −4 , SEM = 4.23 × 10 −5 ) versus mean inter-clade variability (1.32 × 10 −3 , SEM = 1.04 × 10 −4 ) showed the higher intra-group sequence similarity, as shown in the inferred phylogenetic tree (estimates of clade average evolutionary divergence presented in Supplementary Table S1).

Genetic Variability of the SARS-CoV-2 Spike Protein
This section examines the variability of the SARS-CoV-2 spike protein-coding region in the analyzed dataset. In total, we identified 21 mutations, of which 11 were found in more than 5% of the sequences ( Figure S4). D614G substitution was fixed (97.47%) in the circulating viruses in Poland.
Spike protein H69V70 deletion was the second most common mutation, observed in 279 (25.23%) sequences. Genetic variability of the SARS-CoV-2 and number of spike mutation containing variants increased rapidly since September 2020, with additional accumulation of P681H, N439K, S98F, during the second wave of SARS CoV-2 epidemics in Poland ( Figure 5, Tables S1 and S2).

Genetic Variability of the SARS-CoV-2 Spike Protein
This section examines the variability of the SARS-CoV-2 spike protein-coding region in the analyzed dataset. In total, we identified 21 mutations, of which 11 were found in more than 5% of the sequences ( Figure S4). D614G substitution was fixed (97.47%) in the circulating viruses in Poland.
Spike protein H69V70 deletion was the second most common mutation, observed in 279 (25.23%) sequences. Genetic variability of the SARS-CoV-2 and number of spike mutation containing variants increased rapidly since September 2020, with additional accumulation of P681H, N439K, S98F, during the second wave of SARS CoV-2 epidemics in Poland ( Figure 5, Tables S1 and S2). The frequencies of the 14 most frequent sequence changes were compared using the Spearman's correlation rank test to reflect the co-existence of the mutations in the analyzed genomes. Six substitutions and two deletions were strongly correlated ( Figure 6). Co-existence of ΔH69V70, ΔY144, P681H, T716I, S982A, A570D, N501Y, and D1118H was the signature for the B.  The frequencies of the 14 most frequent sequence changes were compared using the Spearman's correlation rank test to reflect the co-existence of the mutations in the analyzed genomes. Six substitutions and two deletions were strongly correlated ( Figure 6). Coexistence of ∆H69V70, ∆Y144, P681H, T716I, S982A, A570D, N501Y, and D1118H was the signature for the B.

Temporal Trends for Spike Mutation Frequency
For the analyses of time trends related to spike mutation frequency, samples obtained before November 2020 were excluded due to low genetic diversity. The proportion of eight analyzed most common missense mutations in the spike protein-coding region increased significantly over time ( Figure 7 and     Two mutations, A570D and N501Y, were in perfect linkage; therefore, the increase in incidence from 0.88% in December to 51.32% in February (OR: 10.12, 95% CI 6.83-15.37, p < 0.0001) was the same. The last notable increasing trend was detected for D1118H, from 0.88% in December to 49.68% in February (OR: 9.90, 95% CI 6.83-15.01, p < 0.0001). Interestingly, only the frequency of N439K dropped, from 26.31 to 10.97% (OR: 0.68, 95% CI 0.57-0.82, p < 0.0001) in the analyzed months. Of note, when excluding the B.1.1.7 variant from temporal trends analysis, only four mutations (delH69V70, P681H, S98F, A222V) had a significantly different frequency from November 2020 until February 2021 (see Figure S5 and Table S4).

Phylodynamic Analysis of the Polish Dataset
The mean tMRCA (time to most recent common ancestor) was estimated for Clade 1 (lineages B.   The effective reproduction number (R e ) estimates of Clade 1 showed complex phylodynamics, indicated by two declining phases and two growing frames (Figure 9a). The mean value ranged from 1.25 (95% HPD 1.17-1.33) since the origin of the epidemic to 0.92 (95% HPD 0.87-0.96) in June-August 2020. In mid-August 2020, the curve started to grow until 1.089 (95% HPD 1.06-1.12) in November-December 2020 and finally reclined to 1.06 (95% HPD 1.04-1.09) at the end of sampling. For Clade 2, the curve started to grow in the second half of October 2020 and reached the value of 1.52 (95% HPD 0.97-2.14) in December 2020 (Figure 9b). After the new year, the value of R e fell to its lowest level of 0.91 (95% HPD 0.76-1.04) and rose again to 1.08 (95% HPD 0.98-1.21) in February 2021. During the sampling period, R e estimates for Clade 3 had the highest values. At the beginning of November 2020, R e was 2.16 (95% HPD 1.74-2.64), then showed a fast decrease to 1.02 (95% HPD 0.87-1.17) at the end of January 2021. In February 2021, the R e value peaked at 1.37 (95% HDP 1.24-1.52) and then dropped to 0.34 (95% HDP 0.13-0.55) in the first week of March 2021 (Figure 9c).

Discussion
Following the introduction of the novel virus into the population, surveillance studies assessing molecular evaluation and diversity remain of primary importance. Since the emergence of the SARS-CoV-2 pandemic, the expanding use of genomic technologies has unprecedently allowed the rapid and continuous update of the phylodynamic evolution of this virus [35]. The emergence of the novel variants and mutations affecting viral transmissibility and pathogenicity require constant phylogenetic updates to inform public responses and vaccine studies [36]. Novel variants may increase the R 0 due to spike protein mutations with the modeling data suggesting 56-75% higher transmissibility compared to the previously circulating strains (https://www.ecdc.europa.eu/en/publications-data/ covid-19-risk-assessment-spread-new-variants-concern-eueea-first-update accessed on 22 June 2021). In this study, we present novel data on the sequence evolution, mutation patterns and phylodynamics of SARS-CoV-2 from both the regional (Northwestern Poland) and national perspective, including temporal reconstructions.
We report a significant change in the virus characteristics over time, with the dynamic expansion of genetic variability, observed both as the increase in strain diversity and number of spike mutations, coinciding with the beginning of the second wave of epidemics in Poland observed from September 2020. In this period, the seven-day average in the country exceeded 20,000 cases with >800 cases per 100,000 during the peak of the wave; however, molecular detection of active infection was focused on symptomatic cases, which resulted in underreporting of the morbidity (https://covid19-surveillance-report.ecdc.europa.eu/ accessed on 22 June 2021), underscoring the importance of the presented research from the phylodynamic perspective with the added value of the presented coalescent and birth-death models [37].
Interestingly, the trees' root includes local and other Polish sequences closely related to the 19A and 19B strains and the Wuhan-obtained reference. This may reflect the early introduction of the virus to Poland within the short time of virus entry to Europe. Similar phenomena were observed in Italy and Germany [38]. Our analysis shows that subsequently, three genetic lines of the SARS-CoV-2 molecular evolution have emerged, with the largest clade (Clade 1, lineages B.1.1.*) being the most diverse genetically, with a significant proportion (24.6% for this clade) of the VOC B.1.1.7 variant characterized by a signature combination of the ∆Y144, N501Y, A570D, S982A, D1118H, and T716I spike mutations. This variant, however, became dominant in early 2021, which is in line with observations observed in other countries-Germany, the United Kingdom, and Denmark [10]. Interestingly, two genetically convergent strains (Clade 2 and 3) classified as B.1.258 and B.1.221 were responsible for almost 1/3 (29.2%) of the remaining infections in the analyzed period, with Clade 2 characterized by the conjunction of two spike mutations, ∆H69_V70 and N439K. It was previously suggested that ∆H69_V70 is associated with the increased transmissibility via spike incorporation into virions, and may be regarded as a "permissive mutation", enhancing infection and allowing tolerance of the immune escape mutations related to the loss of replicative capacity. In comparison, N439K increases the binding affinity to ACE2 and may be responsible for the immune escape from the convalescent sera and monoclonal antibodies [39]. It has been indicated before that N439K co-occurs with ∆H69_V70 in the PANGO B.1.258 clade, exactly as confirmed in the presented phylogenetic analysis. In August 2020, this variant was present mainly in Ireland. It began to spread to Central Europe, with a high number of infections in the Czech Republic (November 2020) and Slovenia (December 2020/January 2021). In Poland, the peak of infection caused by the B.1.258 lineage was recorded in January 2021 (see Supplementary Figure S5) [40,41]. This lineage was probably introduced to Western Pomerania not from Southern Europe but from Germany or Nordic countries (Figure 4) Figure S5) and is still circulating. The Netherlands and UK have the third-and first-largest representation of Polish emigrants in Europe (https://stat.gov.pl/en/ accessed on 22 June 2021) and likely have links with migration-and travel-related introduction of infections.
As presented, during the first wave of the SARS-CoV-2 epidemic in Poland the virus was less diverse genetically, with practically only a D614G spike mutation observed in the analyzed genomes. On the other hand, in the second wave of pandemics observed in autumn 2020, the molecular diversity of the virus has increased in line with an explosive number of cases and significant mortality. In this period, we report an expansion of more virulent variants such as B.1.1.7 VOC and strains not associated with the increased transmissibility per se but containing the described above ∆H69_V70, N439K, or P681H mutations. From November to the end of the analysis, the frequencies of the spike mutations increased by several folds, most likely reflecting the increasing dynamic of the infection in the population and a high number of circulating viral strains in the susceptible population. As shown, this increment was associated with the increasing prevalence of B.1.1.7, which naturally contains some of the spike mutations, the accumulation of deletions, and other substitutions in non-VOC strains. A decrease in the prevalence of the N439K variant was the exception. This was most likely due to the expansion of the B.1.1.7 variant, which does not contain N439K substitution, and a smaller proportion of the B.1.258 infections in early 2021. This was also reflected by the decrease in ∆H69_V70 frequency in non-B.1.1.7 and confirmed by the phylodynamic analyses birth-death skyline plot of Cluster 2 and 3, indicating drop in the R e in the last months of the analysis.
As expected and suggested by the previous studies, D614G spike mutation has become fixed in the observed sequences as this variant was associated with in early sequences with transmission advantage and higher SARS-CoV-2 viral loads [42]. The second most common mutation is the ∆H69_V70 deletion. Further studies are required to indicate if this mutation will also predominate in the circulating strains, but it is also present in VOC B.1.1.7, it can be assumed that its frequency will continue to increase.
When analyzing the clustered Northwestern sequences from Poland, the highest homology with German sequences was observed for all identified clades, which is understandable due to the geographic location of the region but may importantly confirm the international cross-border spread of the SARS-CoV-2 between adjacent territories. Of note, in East Germany, the three main lineages, B.1.177, B.1.258, and B. 1.221, from the second wave (between August to October 2020) had been circulating, representing a very similar pattern to West Pomeranian isolates [43].
In the phylodynamic analyses, tMRCAs were in line with the epidemic course in Poland and support the observation of the increasing genetic diversity of the circulating virus. Moreover, the effective reproductive number estimated for March 2020 was greater than 1, which suggests the spread of the virus before the first confirmed COVID-19 isolate was collected. Skyline plots for all clades closely reflect the observed epidemiology, with peaks of cases seen from October/November 2020, explaining an increase in the number of infections in the same period. The values of R e confirmed that B.1.1.* (Clade 1) was responsible for the spread of the first wave of infections in Poland. Furthermore, during the second wave of epidemics, all three clades expanded, with a high increment in variants B.1.221 (Clade 3) and B.1.258 (Clade 2). Restrictive measures introduced in the period from mid-October 2020 to mid-January 2021 reduced and stabilized the infections observed during the second wave of the epidemic.
The principal limitation of the study was the number and time frame of collected isolates from the West Pomerania region and analyzed GISAID sequences. They represent less than 2‰ of recorded SARS-CoV-2 infections in Poland. Nevertheless, the number of samples was substantial, allowed reliable phylogenetic analysis, and inferred population genetics. The scale of the epidemic excludes molecular surveillance of each isolate.
To conclude, continuous tracing of emerging virus lineages should be focused on variants of interests and variants of concern and the evolution of spike mutations. Phylodynamic studies identify the introductory events with subsequent spread of the virus and its divergence into clades. Increasing molecular variability during the second wave of pandemics in Poland might have resulted in the number of cases not only related to expanding infections with VOC B.1.1.7. Additionally, expansion of the variants bearing mutations related to optimized transmissibility and potentially higher virulence might have contributed to the epidemic waves. Continuous surveillance allows follow-up of virus evolutionary variability and the risks associated with the emergence of new variants.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/v13071295/s1, Figure S1: Prevalence of main SARS-CoV-2 lineages in Poland, Figure S2A and B: Two-way contingency plots of differences in Sars-CoV-2 variant prevalence by analyzed region, Figure S3: Frequency of major Spike mutation identified in Polish Sars-CoV-2 strains, Figure S4 Table S1: Estimates of average evolutionary divergence over West Pomeranian sequence pairs within and between Clades, Table S2: Differences in Sars-CoV-2 variants prevalence by analyzed region, Table S3: Differences in mutations of Sars-CoV-2 spike protein prevalence by analyzed region,   Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The sequences used in this work have been deposited in the GISAID and can be found under the appropriate IDs: EPI_ISL_2631232-EPI_ISL_2631325, and EPI_ISL_2650471-EPI_ISL_2650498.