Emergence and Spread of SARS-CoV-2 Lineages B.1.1.7 and P.1 in Italy

Italy’s second wave of SARS-CoV-2 has hit hard, with more than three million cases and over 100,000 deaths, representing an almost ten-fold increase in the numbers reported by August 2020. Herein, we present an analysis of 6515 SARS-CoV-2 sequences sampled in Italy between 29 January 2020 and 1 March 2021 and show how different lineages emerged multiple times independently despite lockdown restrictions. Virus lineage B.1.177 became the dominant variant in November 2020, when cases peaked at 40,000 a day, but since January 2021 this is being replaced by the B.1.1.7 ‘variant of concern’. In addition, we report a sudden increase in another documented variant of concern—lineage P.1—from December 2020 onwards, most likely caused by a single introduction into Italy. We again highlight how international importations drive the emergence of new lineages and that genome sequencing should remain a top priority for ongoing surveillance in Italy.


Introduction
COVID-19, caused by infection with severe respiratory coronavirus 2 (SARS-CoV-2), has had a devastating global impact, with an estimated 116 million cases and 2.5 million deaths [1]. Of these reported infections, more than 78% have occurred between September 2020-March 2021, greatly outnumbering the 25 million cases and~800 k deaths reported globally prior to 31 August 2020 [2]. This second wave of infections is also characterized by the emergence of numerous new lineages of SARS-CoV-2, three of which are commonly referred to global 'variants of concern'-B.1.1.7, B.1.351, and P.1 [3]. The B.1.1.7 lineage was first detected in the United Kingdom in late September 2020 [4] and is characterized by 17 changes across the genome including the N501Y amino acid mutation and a two amino acid deletion at positions 69 and 70 of the spike protein [5]. The B.1.351 lineage was first reported in South Africa and shares similar substitutions to B.1.1.7 including N501Y but not the 69/70 deletion [6], while the P.1 lineage was first identified in Brazil and shares two substitutions with B.1.351-E484K and N501Y in the spike protein [4]. All of these lineages have seemingly replaced previous circulating variants in their geographic regions and have spread to other countries in Europe, the Americas, and Asia [7][8][9]. There is also considerable concern about the possibility of reinfection with these new lineages due to reduced cross-protective immunity [10][11][12], which may also have implications for vaccine efficacy [13,14]. Thus, increased surveillance to document and understand the spread of different SARS-CoV-2 lineages is of upmost importance.
The first cases of SARS-CoV-2 in Italy were reported in January 2020 and the country was hit hard with a peak of >6500 cases a day in late March 2020. Italy was also the first European Viruses 2021, 13, 794 2 of 9 country to impose a nation-wide lockdown from 9 March to the 18 May 2020. In retrospect, this first wave was of relatively low magnitude compared to the huge number of infections reported between September 2020 and March 2021. Indeed, during this second wave, Italy experienced a maximum of~40,000 cases a day in mid-November 2020 (Figure 1a), and an estimated 3.1 million cases and~100,000 deaths have been reported to date (8 March 2021) [15]. Beginning in October 2020, a color-coded system was established to restrict the mobility of residents in each administrative region in Italy, depicting increasing the levels of restriction from yellow to red. In addition, in attempt to reduce viral transmission during the holiday season, further restrictions to social mobility and on possible sources of infection (schools, restaurants, etc.) were put in place for the whole of Italy between 21 December 2020 and 6 January 2021. On 20 December 2020, the Italian Minister of Health signed an order prohibiting all air flights from/to United Kingdom. On 16 January 2021 the same restriction was applied to the air flight from/to Brazil.
Viruses 2021, 13, x FOR PEER REVIEW 2 of 10 lineages due to reduced cross-protective immunity [10][11][12], which may also have implications for vaccine efficacy [13,14]. Thus, increased surveillance to document and understand the spread of different SARS-CoV-2 lineages is of upmost importance. The first cases of SARS-CoV-2 in Italy were reported in January 2020 and the country was hit hard with a peak of >6500 cases a day in late March 2020. Italy was also the first European country to impose a nation-wide lockdown from 9 March to the 18 May 2020. In retrospect, this first wave was of relatively low magnitude compared to the huge number of infections reported between September 2020 and March 2021. Indeed, during this second wave, Italy experienced a maximum of ~40,000 cases a day in mid-November 2020 (Figure 1a), and an estimated 3.1 million cases and ~100,000 deaths have been reported to date (8 March 2021) [15]. Beginning in October 2020, a color-coded system was established to restrict the mobility of residents in each administrative region in Italy, depicting increasing the levels of restriction from yellow to red. In addition, in attempt to reduce viral transmission during the holiday season, further restrictions to social mobility and on possible sources of infection (schools, restaurants, etc.) were put in place for the whole of Italy between 21 December 2020 and 6 January 2021. On 20 December 2020, the Italian Minister of Health signed an order prohibiting all air flights from/to United Kingdom. On 16 January 2021 the same restriction was applied to the air flight from/to Brazil.  We previously showed that virus importation associated with travel, followed by local transmission, were key drivers of viral spread during the first wave of SARS-CoV-2 in Italy [16]. After a year into the pandemic, lockdown restrictions have proven effective in reducing virus transmission and limiting geographic spread [17], although international travel is still a major source for the introduction of new lineages into Italy [18]. Here we show how the B.1.1.7 and P.1 variants entered Italy before the implementation of border closures and that these two variants have subsequently spread rapidly across the country.

Sequence Data
In Italy, the Department of Health Prevention, itself organized into nearly 100 Local Health Units (ASL), are also in charge of implementing all SARS-CoV-2 surveillance and tracing activities. Surveillance is mainly based on nasopharyngeal swab testing by real time RT-PCR in approved public laboratories, and performed on all individuals with clinical signs compatible with COVID-19 (persisting fever, dyspnea, anosmia, etc.) as well as those in strict contact with confirmed cases in the 14 days before clinical onset. In addition, mass drive-through testing initiatives that make use of antigenic rapid tests are organized in ports, airports, main train and bus stations, major cities, and towns experiencing a significant increase in case numbers. People with positive results to antigenic rapid tests are then subjected to a follow-up swab for molecular testing. In addition, periodic testing is performed on at-risk worker categories, such as health and laboratory personnel. Sequencing and genome analysis activities based on direct infected swab samples are still far from optimal, and are performed on a voluntary basis by equipped laboratories scattered all over the Italian territory. However, of a total of 834,274 sequences available on the GISAID database (https://www.gisaid.org/ (accessed on 22 March 2021)) up to 22 March 2021, only 13,769 originated from Italy, with huge differences between regions.
All Italian sequences of SARS-CoV-2 available on the GISAID database were downloaded (8 March 2021, Supplementary Table S1). Sequences without a complete date of collection were removed. As per our previous study, cases were grouped according to geographical non-administrative macro areas in North (n = 1195), Central (n = 1403), and South Italy (n = 3917) [16]. SARS-CoV-2 lineages were assigned to each sequence using the Pangolin COVID-19 Lineage Assigner tool v2.0.7 (github.com/cov-lineages/pangolin). All available global sequence data for B.1.351 and P.1 were similarly downloaded from the GISAID database. Due to the large amount of data available for B.1.1.7 (>170,000), a random subset of 3000 global sequence data was used in combination with all available Italian sequence data. In brief, a global phylogeny containing~289,000 sequences was downloaded from GISAID [19]. Sequences classifying as B.1.1.7 lineage (~94,000) were extracted from which a random subset of 3000 sequences was selected. A list with accession numbers for the global sequence data used in the phylogenies can be found in Supplementary Table S2.

Phylogenetic Analysis
Separate nucleotide sequence alignments were constructed for the B.1.1.7, P.1, and B.1.351 lineages. A representative sequence for each lineage was used as a reference and as an outgroup for tree rooting (B.1.1.7 = EPI_ISL_862239, B.1.351 = EPI_ISL_660190, P.1 = EPI_ISL_833137). Three separate alignments were performed using MAFFT implementing the L-INS-I algorithm and manually inspected for accuracy using Geneious Prime ® 2021.1.1 (https://www.geneious.com (accessed on 12 March 2021)) [20]. Full genome sequences with >5% ambiguity and no exact sampling date were removed. The final data sets comprised: P.1 n = 614 (Italy n = 111), B.1.351 n = 2317 (Italy n = 8), and B.1.1.7 n = 4361 (Italy n = 1461). A maximum likelihood tree was estimated for each lineage using IQ-TREE implementing the Hasegawa-Kishino-Yano nucleotide substitution model with a gamma distributed rate variation among sites (HKY+Γ) and an ultrafast bootstrap method (1000 repetitions) [21,22]. Although these data show that different lineages of SARS-CoV-2 were first reported within a short time period throughout the country, we also identified region-specific sampling biases with unusual abundances of sequences obtained on individual days (Supplementary Figure S1). These sampling 'peaks' are most likely the result of targeted sampling after an outbreak. For example, 193 sequences were sampled on 26 October 2020 alone, of which 104 were B.1.177 all sampled in the region Campania in South Italy. Also, in this region, 42 B.1.1.187 sequences were sampled on 29 June 2020 alone, and which represents 51% of all sequence data available for this lineage in Italy. Notably, according to GISAID, this lineage has only been sampled in the UK (n = 14) and Japan (n = 1), and South Italy (n = 81).  (Figure 1b). The first P.1 sequences were identified on 7 January 2021 in Central Italy in travelers returning from Brazil [23]. This lineage was subsequently found on 21 January and 12 February in North and South Italy, respectively.

Lineages of Global
Notably, the Italian B.1.1.7 sequences were distributed across the global phylogeny indicating multiple independent introductions into the country (Figure 2). In addition, we observed numerous Italian-specific transmission clades (i.e., extended transmission chains), although most (n = 73, 78%) were area specific, the largest containing 104 infections all sampled in South Italy (Figure 2). Of the remaining clades (n = 22), 82% contained infections from two and 18% from all three macro areas. For 73% (n = 16) of these, the clade in question comprised ≥80% of infections from a single area only and only a small number of infections from a second or third macro area. For example, one clade with 64 sequence comprised 63 from Central Italy, one from South Italy and none from North Italy. Similarly, another clade contained 76 infections from North Italy and one from Central Italy. In addition, we identified six clades with more inter-area mixing, including one example consisting of 35 infections (10 North, 12 Central, 13 South). Notably, this is the only example of prolonged transmission between macro areas for this time period (Figure 2). number of infections from a second or third macro area. For example, one clade with 64 sequence comprised 63 from Central Italy, one from South Italy and none from North Italy. Similarly, another clade contained 76 infections from North Italy and one from Central Italy. In addition, we identified six clades with more inter-area mixing, including one example consisting of 35 infections (10 North, 12 Central, 13 South). Notably, this is the only example of prolonged transmission between macro areas for this time period ( Figure  2). 19-February) outside of both clade II and III also contained this amino acid substitution (Figure 3). nt 3357/aa 1118 Ref EPI_ISL_833137), and 43 of these Italian sequences (clade III) also contained a non-synonymous mutation in the spike leading to an S813N mutation (spike nt 2437/aa 813 Ref EPI_ISL_833137). This S813N mutation is located between two fusion peptides and not in the receptor binding domain where the E484K and N501Y mutations are located (Supplementary Figure S2). P.1. viruses with the mutation were first sampled on 13 January 2021 in Central Italy and the most recent occurrence was found in an infection sampled on 1 March. To date, the mutation has only been found in infections from Central and South Italy. Of note, one additional sequence (EPI_ISL_1169907, Central Italy, 19-February) outside of both clade II and III also contained this amino acid substitution ( Figure  3).  Bootstrap node support for clades I-III is shown. Note, a single sequence containing also the S813N mutation but falling in clade I is marked with an asterisk. Branches without a tip circle represent sequences from Germany. (top) Schematic view of the spike protein alignment for different reference sequences as well as the clade containing the S813N substitution. The protein structural regions are marked: S1 NTD = N-terminal domain of the S1 subunit, S1 RBD = receptor binding domain of S1, S1-S2 S2 = S1/S2 cleavage region and S2 fusion subunit. A detailed view of the protein alignment can be found in Supplementary Figure S2.
Finally, five of the eight B.1.351 sequences were scattered across the global phylogeny, confirming that these are independent infections associated with returned travelers (30 January and 11 February in North Italy n = 2, 23 February in Central Italy n = 1, 19 and 26 February in South Italy n = 2, Supplementary Figure S3). However, we also found B.1.351 sequences from North Italy sampled on 11 and 13 February 2021 who formed a phylogenetic cluster, suggestive of some local transmission.

Discussion
We document the complex patterns of virus transmission and lineage turnover during the second wave of SARS-CoV-2 in Italy, with an analysis based on >6500 sequences sampled from January 2020-March 2021. First, we observed the replacement of B.1 and B. 1.1.74 by B.1.177, the latter being the most sampled lineage during the second wave. Our data shows how different lineages appeared almost simultaneously in North, Central, and South Italy (i.e., within a five-day time frame), suggesting a complex pattern of multiple introductions into the different macro areas. Second, and more notably, we observed the more recent appearance of the emerging 'variants of concern' B.1.1.7 and P.1, with the former being the most sampled lineage during January and February 2021.
We focused our analysis on lineages B.1.1.7, B.1.351, and P.1 as these have been the subject of considerable discussion and concern. An importation of B.1.351 was first reported by Novazzi et al. [24], with this variant identified in a returning traveler from Malawi, although this did not lead to ongoing transmission. Indeed, in the case of B.1.351, we only found one instance of local transmission in Italy: three infections from mid-February 2021, with no evidence for additional ongoing transmission. In contrast, we found strong evidence for local ongoing transmission of both the B.1.1.7 and P.1 lineages, with the former potentially replacing the previous dominant strain B.1.177. Of note, from 8 March 2021 onwards, B.1.1.7 represented on average 50% of all sequences sampled. This is compatible with observations that B.1.1.7 has enhanced transmissibility [5] and has become the dominant variant in numerous other European countries, e.g., Denmark [25].
Our phylogenetic analysis confirms extensive virus transmission for B.1.1.7 within North, Central, and South Italy, but only limited transmission between these regions, consistent with lockdown restrictions imposed in December 2020 that prohibited travel between regions. We did find one cluster containing a mix of sequences sampled from infection in North, Central, and South Italy. However, the limited sequence data available as well as the lack of metadata makes it challenging to accurately identify the directionality of transmission and lineage spread across the country, although our previous study similarly showed limited interregional transmission during the lockdown period [16]. Although no metadata regarding virulence was available for the sequence data included here, it was recently shown that some B.1.1.7 variants are characterized by longer persistence and higher viral RNA loads in nasopharyngeal swabs [26], in the UK the B.1.1.7 is associated with both a higher production number and a greater hazard of death [5]. Of note, no clade defining amino acid substitutions were found for the B.1.1.7 lineage in spike gene of Italian sequences analyzed here.
After Brazil, Italy harbors the second highest number of P.1 infections reported to date [3]. This variant was first reported in Manaus (Brazil), which was hit hard by the first wave of SARS-CoV-2 with an estimated 76% of the population infected [27]. Thus, the surge of P.1 in this area strongly implies the possibility of reinfection [9]. An earlier study reported the presence of P.1 in three returning travelers from Brazil to Abruzzo sampled on 18 January 2021 [23], although we show here that this small cluster did not lead to (detectable) subsequent transmission events. A second study reported the presence of P.1 in a returning traveler from Brazil [28]. This individual was sampled on 22 January in Milan, North Italy. However, as the sequence deposited only covered the spike protein, we did not include it in our analysis. Despite this our molecular epidemiological analysis provided strong evidence for a single introduction of P.1 into Italy, followed by extensive local transmission. Indeed, our phylogenetic analysis depicts importation of the P.1 variant into Italy, although from a yet unknown source, followed by possible onward spread from Italy to Germany. Importantly, within the Italian network we also found a unique, clade defining amino acid substitution-S813N. While the function of this mutation is currently unknown, it is likely unrelated to virus binding and hence does not represent an immune escape variant, although this should be assessed further [13,29].
Our study contains notable limitations. Although we included all available sequence data from GISAID (~6000 sequences), this represents a tiny fraction of the total number infections reported in Italy to date (>3 million). We also observed a marked sampling bias between North, Central, and South Italy, which hampers estimates on the extent of interregional transmission. Finally, our study focused on three key variants only and we did not report the transmission patterns for B.1 and B.1.177, which may have greater levels of inter-region transmission following the easing of population lockdowns.

Conclusions
We depict the rapid emergence and replacement of new lineages of SARS-CoV-2 in Italy despite major lockdown restrictions, suggesting that the disease management policies employed were insufficient to halt the spread of emerging variants. Notably, both the B.1.1.7 and P.1 variants of concern spread across Italy before international border closers were imposed for the UK and Brazil. This highlights the importance of a rapid and inclusive vaccine roll-out on a global scale.