A Simplified Sanger Sequencing Method for Detection of Relevant SARS-CoV-2 Variants

Molecular surveillance of the new coronavirus through new genomic sequencing technologies revealed the circulation of important variants of SARS-CoV-2. Sanger sequencing has been useful in identifying important variants of SARS-CoV-2 without the need for whole-genome sequencing. A sequencing protocol was constructed to cover a region of 1000 base pairs, from a 1120 bp product generated by a two-step RT-PCR assay in samples positive for SARS-CoV-2. Consensus sequence construction and mutation identification were performed. Of the 103 samples sequenced between June 2020 and February 2022, 69 contained relevant variants: 20 BA.1, 13 delta, 22 gamma, and 14 zeta. All sequences found were aligned with representative sequences of the variants. Using the Sanger sequencing methodology, we were able to develop a more accessible protocol to assist viral surveillance.


Introduction
SARS-CoV-2, a new coronavirus that causes a severe acute respiratory syndrome (COVID-19), was first reported in December 2019 [1] and soon after became a pandemic due to its high transmission rate, which favored the formation of new genomic mutations and contributed to the emergence of new variants [2].
In mid-2021, the delta variant caused an outbreak and became the most prevalent and worrying variant worldwide [7]. However, the emergence of new cases and the prevalence of the omicron variant in South Africa marked a change in the epidemiological landscape, in which the circulation of the delta variant was suppressed by the high transmissibility of omicron [8].
Molecular surveillance of the new coronavirus through new genomic sequencing technologies has played a fundamental role in the monitoring of COVID-19 evolution by allowing the identification and screening of the main variants responsible for outbreaks around the world [9].
Methodologies involving Next Generation Sequencing (NGS) have been essential during the pandemic, from the detection of the etiological agent in the first samples to the determination of variants from the sequencing data deposited on platforms such as GISAID [10]. However, NGS has restricted accessibility due to operational costs, which makes widespread molecular tracking of the virus difficult [11]. Although Sanger sequencing is an old methodology, it is still very useful for mutation analysis and

Study Design
This is an observational cross-sectional study developed at the Research Laboratory for Infectious Diseases (LAPI) at Hospital Universitario Professor Edgard Santos (HUPES) of the Federal University of Bahia, Salvador, Brazil.
We selected preserved samples, positive for SARS-CoV-2, from individuals over 18 years of age who came to this center with symptoms of infection, as well as from HUPES healthcare workers who developed COVID-19. Individuals were seen at the HUPES pneumology outpatient clinic from June 2020 to February 2022. All samples from participants were confirmed for COVID-19 through detection of SARS-CoV-2 in saliva by an RT-qPCR assay [12], using the commercial kit 1copy™ COVID-19 qPCR 4plex Kit (1Drop, Seongnam-si), which amplifies 3 genes: E, N, and RdRp. Samples were considered detectable when the Cycle Threshold (C.T) was equal to or less than 40 for all 3 genes.
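The detection criterion described above can be written as a small check. This is a minimal sketch only: the function name, the dictionary layout, and the use of `None` for a target without amplification are assumptions made for illustration and are not part of the kit's software.

```python
CT_CUTOFF = 40.0              # from the text: C.T equal to or less than 40
TARGETS = ("E", "N", "RdRp")  # the three genes amplified by the kit


def is_detectable(ct_values):
    """Return True if every target gene has a C.T value at or below the cutoff.

    ct_values maps a gene name to its C.T (float), or to None when the
    target did not amplify (hypothetical convention for this sketch).
    """
    for gene in TARGETS:
        ct = ct_values.get(gene)
        if ct is None or ct > CT_CUTOFF:
            return False
    return True
```

A sample with one target above the cutoff, or with a missing target, is reported as not detectable under this rule.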

Amplification Protocol
We used 2.5 µL of RNA extracted from saliva in a reverse transcription assay using the SuperScript III Reverse Transcriptase kit (Invitrogen, Waltham, MA, USA) according to the manufacturer's manual. The cycles of the two stages were 72 °C for 5 min (1st stage), thermal shock for 3 min (after thermal time), then 50 °C for 10 min, 55 °C for 10 min, 58 °C for 58 min, and 94 °C for 1 min (2nd stage). For the PCR reaction, 6 µL of the reverse transcription reaction was used in a final volume of 25 µL. The enzyme used was Platinum Taq Polymerase from Brazil (Invitrogen, Massachusetts). The sequences of the primers for the first and second rounds of amplification, shown in Table 1, were taken from published works [13] (https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V3/nCoV-2019.tsv accessed on 12 April 2021). The concentrations of all reagents followed the guidelines of the enzymatic kits. The cycling of the first round was 94 °C for 2 min, followed by 35 cycles of 94 °C for 20 s, 60 °C for 30 s, and 72 °C for 1 min and 30 s, and a final step of 72 °C for 2 min. An amount of 2 µL of the first-round product was used for a second-round PCR (nested PCR) in a final volume of 60 µL, cycling at 94 °C for 2 min, followed by 35 cycles of 94 °C for 20 s, 60 °C for 30 s, and 72 °C for 1 min and 10 s, and a final step of 72 °C for 2 min. At the end of the PCR, 10 µL of the product was used for electrophoresis in a 1.5% agarose gel.

Table 1. The name, sequence, melting temperature (T.M), annealing temperature (T.A), and position of each primer are shown. The terms in bold represent the steps of RT-PCR, where the first round (1R) will produce a 1645 bp strand and the second round (2R) will produce a 1120 bp strand. The sequencing of the strand generated by the 2R will allow the identification of 25 important target mutations. (a): primers taken from ARTICnetwork; (b): primers taken from Shaibu et al. [13].
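The two PCR cycling programs described above can be written down as data, which makes it easy to estimate the time spent at programmed temperatures. The sketch below assumes that the three middle steps (denaturation, annealing, extension) are the ones repeated for 35 cycles; the class and variable names are hypothetical, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time in seconds


@dataclass(frozen=True)
class PcrProgram:
    initial: Step       # initial denaturation
    cycle: tuple        # steps repeated n_cycles times
    n_cycles: int
    final: Step         # final extension

    def hold_seconds(self):
        """Total programmed hold time, excluding ramping between steps."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# First round: extension of 1 min 30 s (90 s) per cycle.
FIRST_ROUND = PcrProgram(
    initial=Step(94, 120),
    cycle=(Step(94, 20), Step(60, 30), Step(72, 90)),
    n_cycles=35,
    final=Step(72, 120),
)

# Second (nested) round: extension of 1 min 10 s (70 s) per cycle.
SECOND_ROUND = PcrProgram(
    initial=Step(94, 120),
    cycle=(Step(94, 20), Step(60, 30), Step(72, 70)),
    n_cycles=35,
    final=Step(72, 120),
)

if __name__ == "__main__":
    for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
        print(name, round(program.hold_seconds() / 60, 1), "min")
```

Under these assumptions, the programmed hold time is roughly 86 min for the first round and 74 min for the second; actual run times are longer because of ramping.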

Purifications and Sequencing
We used the commercial kit PureLink Genomic DNA Mini Kit (Invitrogen, MA, USA) according to the manufacturer's manual, obtaining a purified product of 40 µL. The sequencing primers can cover a region of about 1000 base pairs (bp) in the S gene. The sequencing reaction was performed using the BigDye Terminator V3.1 Cycle Sequencing kit. The BigDye reaction product was purified using 75% isopropanol, and at the end 10 µL of formamide was added to each well for sequencing using the SeqStudio (Applied Biosystems, Waltham, MA, USA).

Data Analysis
The results were analyzed with the Geneious 9.0.5 software (Dotmatics, MA, USA) for the construction of the consensus sequence against the reference sequence NC_045512.2 and the identification of mutations along the sequenced region. Sequences that contained mutations common to important variants were aligned to other sequences, representative of variants of concern, deposited on the GISAID and NCBI GenBank platforms. The sequences generated in this study were deposited on the GISAID and GenBank platforms (Table S1). The classification adopted for the variants was based on the PANGO lineage nomenclature proposed by Rambaut et al. 2020 [14].
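The mutation identification step was performed in Geneious; the sketch below is not that workflow, only a minimal illustration of the underlying idea of comparing a consensus sequence with the reference codon by codon. It assumes the two nucleotide strings are already aligned, in frame, gap-free, and of equal length. The function name and the `first_codon` parameter are hypothetical, and the short sequences in the usage note are toy data.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def call_substitutions(ref_nt, sample_nt, first_codon=1):
    """List amino acid substitutions (e.g. 'E484K') between two aligned,
    in-frame nucleotide sequences.

    first_codon is the amino acid position encoded by the first codon of
    the supplied fragment. Codons containing ambiguous bases are skipped.
    """
    if len(ref_nt) != len(sample_nt) or len(ref_nt) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    calls = []
    for offset in range(0, len(ref_nt), 3):
        ref_aa = CODON_TABLE.get(ref_nt[offset:offset + 3].upper())
        alt_aa = CODON_TABLE.get(sample_nt[offset:offset + 3].upper())
        if ref_aa is None or alt_aa is None:
            continue  # ambiguous or unrecognized codon: no call
        if ref_aa != alt_aa:
            calls.append(f"{ref_aa}{first_codon + offset // 3}{alt_aa}")
    return calls
```

For example, with the toy fragment `call_substitutions("GAA", "AAA", first_codon=484)` the function reports a single substitution, `E484K`, while a synonymous change such as `GAA` to `GAG` produces no call.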

Results
A total of 103 samples obtained from people with COVID-19 between 23 June 2020 and 12 February 2022 were used (Figure 1). The proportion of men in the sample was 53.2%. The age of the participants ranged from 18 to 85 years, with a mean of 48 years. Among all the participants, cough was the most prevalent symptom (60%); HIV infection was the most common health condition, comprising 18% of the samples collected.
The gamma variant had the highest prevalence among the analyzed samples, while BA.1 was the only variant found in the first two months of 2022.
We were able to amplify a single, extensive region of DNA of 1120 bp in all samples, which is enough to identify important mutations for the characterization and differentiation of emerging variants (Figure 2). The C.T values for the samples ranged from 4.6 to 37. In total, 28 important mutations present in VOCs and VOIs can potentially be identified, all located between amino acids K417 and T716, from position 1251 to 2148 of the S gene (Table 2). Across all samples, we identified 23 mutations. All of these mutations allowed us to identify and differentiate 15 viral variants, including 4 omicron subvariants.
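The relationship between the amino acid positions and the nucleotide positions quoted above can be checked with simple arithmetic. The sketch below assumes 1-based numbering from the first nucleotide of the S gene coding sequence; the function names are hypothetical. Under that convention, positions 1251 and 2148 equal 3 × 417 and 3 × 716, that is, the last nucleotides of codons 417 and 716.

```python
def codon_to_nt_range(codon):
    """Return the (first, last) 1-based nucleotide positions of a codon,
    counted from the start of the coding sequence."""
    if codon < 1:
        raise ValueError("codon numbering starts at 1")
    return (3 * codon - 2, 3 * codon)


def nt_to_codon(position):
    """Return the 1-based codon number that contains a nucleotide position."""
    if position < 1:
        raise ValueError("nucleotide numbering starts at 1")
    return (position - 1) // 3 + 1


if __name__ == "__main__":
    print(codon_to_nt_range(417))  # codon encoding K417
    print(codon_to_nt_range(716))  # codon encoding T716
```

The span from the first nucleotide of codon 417 to the last nucleotide of codon 716 is 900 nucleotides, which fits within the region of about 1000 bp covered by the sequencing primers.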
The sequences corresponding to the variants found were aligned with representative sequences of these variants deposited on the GISAID platform, plus the first one generated in Wuhan, NC_045512.2. All detected mutations representative of each variant matched those found in the sequences retrieved for alignment (Figure 3).

Discussion
The emergence of new SARS-CoV-2 variants has raised concerns about the new challenges of the pandemic. Sanger sequencing seems to be a viable alternative tool for identifying emerging SARS-CoV-2 variants [15]. In this work, we used Sanger sequencing technology to construct a simplified protocol for detecting the most important variants in the global scenario. We were able to identify several variants of SARS-CoV-2 during two years of the pandemic, using a simplified protocol that allowed us to follow the changes in circulating variants.
The protocol was constructed with the aim of identifying 28 key mutations in the S gene of SARS-CoV-2, covering the RBD, the S1/S2 cleavage region, and the beginning of the S2 subunit [2].
Other authors using the same technology were also able to identify mutations in the same region [16][17][18]. Salles et al. developed a sequencing protocol covering the entire S gene where they were able to identify the gamma variant without the need to sequence the entire genome [11]. Besides the gamma variant, others such as delta, alpha, beta, epsilon, iota, kappa, and eta could be identified by other authors by sequencing five regions of the S gene [19]. Our protocol has the advantage of identifying the main variants of the coronavirus using only a single amplified fragment.
NGS platforms are useful tools for tracking viral gene variability in an infected individual, as well as for tracking mutations not detected by the Sanger platform [20]. As the pandemic progressed and new variants emerged, genome sequencing could not keep pace with the increase in the number of cases worldwide: more than 600 million cases were reported by May 2022, while only 13 million sequences have been deposited in the GISAID database since the beginning of the pandemic [9]. This confirms the weakness of current viral molecular surveillance, as the use of these platforms is concentrated in places with more sophisticated facilities and greater research resources. In addition, for a virus such as SARS-CoV-2, which does not present the high mutation rate [21] observed in other viruses such as HIV [22], Sanger technology becomes a more affordable alternative for screening important variants associated with new outbreaks worldwide [15].
In our study, we were able to sequence 103 samples from different periods of the pandemic, distributed over the peak periods. There were 38 (37%) between June 2020 and December 2020, 45 (44%) during 2021, and 20 (19%) in January and February 2022. The C.T value of all samples was used to evaluate the effectiveness of the protocol against different viral loads. Our protocol was able to sequence samples with C.T from 4.6 to 37.0, showing its robustness despite the variation of viral load in clinical samples.
We included samples from 2020 to 2022, which provided us a way to follow the prevalence of variants over time. We detected the circulation of four important variants in addition to those belonging to wild strains predominant in 2020. In Bahia, in June 2020, variants B.1.1.28 and B.1.1.33 dominated the epidemiological picture, being suppressed by the emergence of gamma (P.1) and zeta (P.2) in January 2021 [23]. In total, we identified 34 samples containing wild-type strains, not specified by our protocol.
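The sample counts reported in this study can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (variant counts, wild-type samples, and samples per period); the variable names are hypothetical.

```python
TOTAL_SAMPLES = 103

# Samples assigned to each relevant variant, as reported in the text.
variant_counts = {"BA.1": 20, "delta": 13, "gamma": 22, "zeta": 14}

# Samples containing wild-type strains, not specified by the protocol.
wild_type = 34

# Samples per collection period, as reported in the text.
period_counts = {
    "Jun-Dec 2020": 38,
    "2021": 45,
    "Jan-Feb 2022": 20,
}


def percent(count, total=TOTAL_SAMPLES):
    """Percentage of the total, rounded to the nearest integer."""
    return round(100 * count / total)


if __name__ == "__main__":
    print("variant samples:", sum(variant_counts.values()))
    print("variant + wild-type:", sum(variant_counts.values()) + wild_type)
    for period, count in period_counts.items():
        print(period, count, f"({percent(count)}%)")
```

The variant counts sum to 69, the 69 variant samples plus the 34 wild-type samples account for all 103 samples, and the per-period percentages round to 37%, 44%, and 19%.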
The omicron variant is composed of several sublineages, three of which are considered VOCs (BA.2, BA.4, and BA.5) and have basically the same number of mutations [28,29]. Currently, BA.4 and BA.5 dominate the world epidemiological scenario, demonstrating greater transmissibility than BA.2. In addition, BA.4 and BA.5 can reinfect individuals who have already had COVID-19 caused by previous subvariants of omicron [30,31]. In our study, we identified 20 omicron samples, all characterized as BA.1. We did not identify the newest subvariants. Our protocol can differentiate BA.1, BA.2, BA.4, and BA.5, even though it cannot cover mutations such as D405N and R408S present in BA.2, BA.4, and BA.5. It is also able to continue identifying relevant variants as the viral evolution progresses. The absence of substitutions G446S, G496S, and T547K in BA.2 and the presence of L452R and F486V in BA.4/BA.5 became differentiating criteria between them (Table 2) [27,28].
Thirteen sequences were classified as delta variants. This variant was first discovered in October 2020 but named by the WHO in May 2021 [6]. It has been considered a VOC due to its high transmissibility, short incubation period, and potential evasion of the neutralizing activity of antibodies generated by vaccines and previous infections [7,32]. In these samples, we identified three delta-specific mutations (L452R, T478K, and P681R) in addition to D614G, in agreement with other sequencing protocols based on the same technology that are capable of identifying delta [33].
The gamma variant, which is no longer important in the overall epidemiological picture of the pandemic, was identified in 22 samples. This variant was initially discovered in Japanese travelers returning from Manaus in November 2020 and was associated with the peak of cases in that city [34]. In these samples, we found the common mutations described for gamma (K417T, E484K, N501Y, and H655Y). The E484K mutation was also found in 14 more samples, identified as zeta, a former VOI that is no longer circulating. Zeta was prevalent in Brazil in late 2020, where its emergence was accompanied by the spread of strains containing the E484K mutation [35].
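The way the mutation sets described above separate the variants found in this study can be illustrated with a toy rule-based classifier. This is a simplified sketch, not the authors' procedure: the signatures use only the mutations named in the text, the BA.1 markers are an assumption derived from the statement that G446S, G496S, and T547K are absent in BA.2, and a lone E484K is not specific to zeta in general (the rule mirrors how the samples in this study are described). More specific signatures are tested first.

```python
# Signature sets, ordered so that more specific patterns are tested first.
SIGNATURES = (
    ("delta", frozenset({"L452R", "T478K", "P681R"})),
    ("gamma", frozenset({"K417T", "E484K", "N501Y", "H655Y"})),
    ("BA.1", frozenset({"G446S", "G496S", "T547K"})),  # assumed markers, see lead-in
    ("zeta", frozenset({"E484K"})),
)


def classify(mutations):
    """Return the first variant label whose signature is fully contained
    in the observed mutations, or 'unassigned' if none matches."""
    observed = set(mutations)
    for label, signature in SIGNATURES:
        if signature <= observed:
            return label
    return "unassigned"


if __name__ == "__main__":
    print(classify(["L452R", "T478K", "D614G", "P681R"]))
    print(classify(["E484K", "D614G"]))
    print(classify(["D614G"]))
```

Samples carrying only D614G, or none of the listed signatures, fall into the "unassigned" group, which corresponds to sequences not specified by a signature in this sketch.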
The target region of our sequencing protocol comprises important amino acids for the infection mechanism of SARS-CoV-2 [36]. The amino acid K417, the first to be identified in our protocol, is part of the receptor-binding domain and is immediately adjacent to the receptor-binding motif (RBM), which encompasses the amino acids at positions 438 to 506 [2]. The S1 domain exhibits greater variability in amino acids compared to the S2 domain, with RBM being the most variable portion of RBD [2]. Within this RBM region, there are numerous important mutations, and all the VOCs already identified have at least one mutation in this region, conferring resistance to neutralizing antibody action and enhanced antigen-receptor interaction [37][38][39][40]. The enzymatic cleavage region between S1 and S2 stands out, because three essential mutations for the identification of important variants (H655Y, N679K and P681H) increase the proteolytic activity in this region, leading to increased viral infectivity and replication [27].
Mutations in the S gene responsible for the genotypic characterization of SARS-CoV-2 variants also influence their phenotypic characteristics, conferring escape from antibodies generated by vaccines or previous infections. Booster doses have emerged with the goal of improving vaccination coverage due to the emergence of new variants and the possibility of reinfection [41,42]. Success in identifying the variants through our protocol allowed us to observe that 100% of those who had the omicron variant and 70% of those who became infected with delta were vaccinated. The current protocol proved to be a simplified and effective tool for variant detection and could be applied to variant monitoring in vaccinated individuals.

Conclusions
Molecular surveillance of SARS-CoV-2 using genome sequencing methodologies has provided worldwide data on viral evolution and adaptability as new cases emerged. The use of NGS for this purpose played an important role in this pandemic, although it is a platform of limited access due to operational costs. With the Sanger sequencing methodology, we were able to develop a more accessible protocol, which allowed us to identify four important variants from a single amplified DNA fragment. Thus, this protocol can assist viral surveillance from a more accessible platform.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics12112609/s1, Table S1: Sequence access number.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available in Table S1 (Supplementary Materials).