SARS-CoV-2 Variants by Whole-Genome Sequencing in a University Hospital in Bangkok: First to Third COVID-19 Waves

Background: Multiple severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants emerged globally during the recent coronavirus disease (COVID-19) pandemic. From April 2020 to April 2021, Thailand experienced three COVID-19 waves, and each wave was driven by different variants. Therefore, we aimed to analyze the genetic diversity of circulating SARS-CoV-2 using whole-genome sequencing analysis. Methods: A total of 33 SARS-CoV-2 positive samples from three consecutive COVID-19 waves were collected and sequenced by whole-genome sequencing, of which, 8, 10, and 15 samples were derived from the first, second, and third waves, respectively. The genetic diversity of variants in each wave and the correlation between mutations and disease severity were explored. Results: During the first wave, A.6, B, B.1, and B.1.375 were found to be predominant. The occurrence of mutations in these lineages was associated with low asymptomatic and mild symptoms, providing no transmission advantage and resulting in extinction after a few months of circulation. B.1.36.16, the predominant lineage of the second wave, caused more symptomatic COVID-19 cases and contained a small number of key mutations. This variant was replaced by the VOC alpha variant, which later became dominant in the third wave. We found that B.1.1.7 lineage-specific mutations were crucial for increasing transmissibility and infectivity, but not likely associated with disease severity. There were six additional mutations found only in severe COVID-19 patients, which might have altered the virus phenotype with an inclination toward more highly pathogenic SARS-CoV-2. Conclusion: The findings of this study highlighted the importance of whole-genome analysis in tracking newly emerging variants, exploring the genetic determinants essential for transmissibility, infectivity, and pathogenicity, and helping better understand the evolutionary process in the adaptation of viruses in humans.


Introduction
Three years have passed since the world faced the major challenges caused by the coronavirus disease (COVID- 19) crisis; when the crisis will end remains unknown. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which first appeared in Wuhan, China, in late December 2019. Since then, it has infected more than 700 million people and caused over 6 million deaths globally [1]. The more the virus circulates, the greater the chance for mutations to occur; as a process, viral evolution

Ethical Approval
This study was approved by The Ethical Review Board of the Faculty of Medicine, Vajira Hospital, Navamindradhiraj University (approval ID COA 049/2563). The informed consent was waived due to all samples being anonymous.

Sample Selection and Viral RNA Extraction
Forty nasopharyngeal swab (NPS) specimens of COVID-19-suspected patients routinely sent for diagnosis at the Molecular Laboratory Unit, Division of Central Laboratory and Blood Bank, Faculty of Medicine, Vajira Hospital, Navamindradhiraj University from April 2020 to April 2021 were selected for RNA extraction and WGS. Viral RNA of COVID-19 cases confirmed by a routine method were extracted using QIAamp Viral RNA Mini kit (QIAGEN, Qiagen, Germantown, MD, USA) following the manufacturer's instructions.

Collection of Demographic and Clinical Data
The demographic and clinical data, including sex, age, clinical features, inpatient/outpatient status, and comorbidities, were retrieved from medical records. The disease severity was classified as asymptomatic, mild, moderate, severe, and critical illnesses according to World Health Organization; asymptomatic (laboratory-confirmed patients without COVID-19 symptoms), mild (fever and upper respiratory symptoms with no signs of pneumonia), moderate (fever and respiratory symptoms with evidence of lower respiratory disease), severe (oxygen saturation < 94%, ratio of arterial partial pressure of oxygen to fraction of inspired oxygen < 300 mm Hg, respiratory rate > 30 breaths/min, or lung infiltration > 50%), and critical (respiratory failure, septic shock, and/or multiple organ failure) [16].

WGS of the First Wave
Nine samples from April 2020 were subjected to WGS using an Atoplex System, in accordance with the manufacturer's protocol. In brief, 10 uL of RNA were reverse transcribed to cDNA. Then, the DNA library which contained a Dual barcode adaptor was performed by two-step multiplex PCR to amplify the RNA target region. The pooled DNA library was used to create circularized single strand DNA (ssCirDNA) by following the protocol of ATOPlex RNA Universal Library Preparation Kit (MGI Tech Co., Ltd., Shenzhen, China). DNA nanoball (DNB) was generated from ssCirDNA for subsequent sequencing on MGISEQ-2000RS sequencer by following the protocol of MGISEQ-2000RS High-throughput Sequencing Set (MGI Tech Co., Ltd., Shenzhen, China). The sequences were cleaned and retrieved by the bioinformatician at the Medical Genome Company, Bangkok, Thailand.
After obtaining sequencing read in FASTQ format, the bioinformatic analysis is done by the iGVD software following the iGenomeVirusDetector User Guide (Genome Wisdom (Beijing, China) Gene Technology Co., Ltd.).

WGS of the Second and Third Waves
A total of 31 samples collected during December 2020 to April 2021 were analyzed using the Ion Proton system (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer's protocol. The Ion AmpliSeq SARS-CoV-2 research panel used in this study contains 247 amplicons in two pools targeting the SARS-CoV-2 genome with >99% coverage. The library preparation of amplified samples was performed on the Ion OneTouch 2 System and then sequenced on the Ion Proton system (Thermo Fisher Scientific, Waltham, MA, USA) at the Center of Medical Genomics, Ramathibodi Hospital, Mahidol University, Thailand.
For data analysis, the sequencing reads were processed with the Ion Proton software plug-ins coverageAnalysis, IRMAreport, AssemblerTrinity, variantCaller (Torrent Suite software v5.20, with germline low-stringency settings according to the TS5.20 user guide), and GenerateConsensus (Ion AmpliSeq SARS-CoV-2 Insight research assay). Coverage analysis was set to a minimum depth of 30 reads. The consensus sequence was obtained directly from the IRMAreport and GenerateConsensus as the fasta file format. The consensus sequences were assessed using NextClade (https://clades.nextstrain.org/, version 1.9.0, accessed on 1 October 2021) for sequence clade assignment, identification quantification of mutations, and sequence quality analyses.

Phylogenetic Tree
To assess the evolutionary relationships among the studied sequences, we aligned the sequence using an online version of MAFFT version 7 (https://mafft.cbrc.jp/alignment/ server/, accessed on 13 February 2022). Then, evolutionary history was inferred using the maximum likelihood method. The best models of evolution for the dataset were selected from the Bayesian Information Criterion (BIC) of MEGA 11: Molecular Evolutionary Genetics Analysis across computing platforms [17]. The model with the lowest BIC score was selected to construct a maximum likelihood phylogenetic tree. The percentage of trees in which the associated taxa clustered together was shown next to the branches. The initial tree for the heuristic search was automatically obtained by the application of the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach. We then selected the topology associated with a superior log likelihood value. The tree was drawn to scale, with branch lengths representing the number of substitutions per site. A bootstrap analysis was conducted using 500 replicates, and bootstrap values ≥50% are shown above branches.

SARS-CoV-2 Lineage Classification
Consensus FASTA files of SAR-CoV2 genomic sequences were uploaded to the Pangolin web service to assign the most likely SARS-CoV-2 lineage to our samples (Pango nomenclature) [18]. CoVsurver online server (CoVsurver-CoronaVirus Surveillance Server, 2021) was used for GISAID clade assignment.

Distribution of Three Waves of COVID-19 Outbreaks in Thailand
The COVID-19 situation in Thailand was divided into distinct waves driven by different dominant SARS-CoV-2 strains. In this context, the first (March 2020-November 2020), second (December 2020-March 2021), and third (April 2021-June 2021) waves were studied. The first wave of the SARS-CoV-2 outbreak started in early March 2020 and peaked between the 22 and 31 March 2020 (188 cases/day). The number of confirmed cases never exceeded 200 cases per day. For the second wave, a peak of transmission (959 cases/day) started on the 26 January 2021 until the 4 February 2021 with less than 1000 new COVID-19 cases per day. At the beginning of the third wave in early April 2021, the number of cases started to increase rapidly. A record of 9635 new cases, which was the highest daily number of cases since the pandemic began, was reported on 17 May 2021. The average number of new COVID-19 cases reported per day during this wave was nearly 3000 (Figure 1).
A total of 40 NPS swabs were collected from 40 confirmed COVID-19 patients who acquired the virus by local transmission and were subjected to WGS study. Given the poor genomic coverage (<95%), seven samples were excluded from the analysis. Thirty-three samples passed the Chi-square test performed by IQ-TREE multicore version 1.6.12 and were analyzed to represent the first, second, and third waves of the SARS-CoV-2 outbreak in the hospital ( Table 1). The samples collected from travelers or residents returning home in the state quarantine were not included in the study. Viral lineages were assigned using Pangolin online software (https://cov-lineages.org/pangolin.html; accessed on 30 November 2021), respectively.  A total of 40 NPS swabs were collected from 40 confirmed COVID-19 patients who acquired the virus by local transmission and were subjected to WGS study. Given the poor genomic coverage (<95%), seven samples were excluded from the analysis. Thirty-three samples passed the Chi-square test performed by IQ-TREE multicore version 1.6.12 and were analyzed to represent the first, second, and third waves of the SARS-CoV-2 outbreak in the hospital ( Table 1). The samples collected from travelers or residents returning home in the state quarantine were not included in the study. Viral lineages were assigned using Pangolin online software (https://cov-lineages.org/pangolin.html; accessed on 30 November 2021), respectively.    The shared SARS-CoV-2 data were retrieved by CoV spectrum, showing that 27.7% of A.6 in the first wave, 78.9% of B.1.36.16 in the second wave, and 72.1% of B.1.1.7 in the third wave were sequenced from clinical samples in Thailand and submitted to the database ( Figure 2). Thus, A.6, B.1.36.16, and B.1.1.7 were the predominant lineages that circulated in the first, second, and third waves, respectively. Of note, B.1.36.16 was detected in very low frequency (1%) during the first wave, while a small number (8.63%) of B.1.1.7 was found in the second wave, demonstrating that these lineages had already emerged and circulated for a few months before becoming dominating variants of the second and third waves. The B.1.375 and Q.3 found in the samples for the current study had not been previously recorded in Thailand's database. circulated in the first, second, and third waves, respectively. Of note, B.1.36.16 was detected in very low frequency (1%) during the first wave, while a small number (8.63%) of B.1.1.7 was found in the second wave, demonstrating that these lineages had already emerged and circulated for a few months before becoming dominating variants of the second and third waves. The B.1.375 and Q.3 found in the samples for the current study had not been previously recorded in Thailand's database.

Phylogenetic Tree
This study assessed the genetic relationship in the samples by building a phylogenetic tree in MEGA11 [17]. The T93:Tamura-Nei parameter model and the nonuniformity of evolutionary rates among sites can be modeled by using a discrete gamma distribution (+G), with five rate categories selected based on the best model of evolution analyses (BIC score: 87244.0255270594). Subsequently, a phylogenetic tree was constructed by maximum likelihood analysis based on the T93 + G model with 500 replications of bootstrap

Phylogenetic Tree
This study assessed the genetic relationship in the samples by building a phylogenetic tree in MEGA11 [17]. The T93:Tamura-Nei parameter model and the nonuniformity of evolutionary rates among sites can be modeled by using a discrete gamma distribution (+G), with five rate categories selected based on the best model of evolution analyses (BIC score: 87244.0255270594). Subsequently, a phylogenetic tree was constructed by maximum likelihood analysis based on the T93 + G model with 500 replications of bootstrap analysis. The tree with the highest log likelihood (−43190.83) was shown. A + G was used to model the evolutionary rate differences among sites [5 categories (+G, parameter = 0.0500)]. As a tree (Figure 3), 34 nucleotide sequences were involved. The codon positions included were 1st + 2nd + 3rd + Noncoding. A total of 30,266 positions were included in the final dataset.  . Phylogenetic tree of SAR-CoV-2 sequences. The percentage of trees in which the associated taxa clustered together is shown next to the branches. The initial tree for the heuristic search was obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the MCL approach, and the topology with a superior log likelihood value was selected. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site.

Characterization of SARS-CoV-2 Variants in Thailand
We further characterized the aa changes in each lineage. Figure 4 shows the mutations throughout the genome of each lineage in all three COVID-19 waves. Since A.6, B, B.1, and B.1.375 were the earliest clusters in Thailand, low levels of genomic diversity were expected in these lineages. In comparison to the reference strain Wuhan-Hu-1, there were less than 10 mutations in an entire genome. Focusing on the S protein, there was only one mutation (A829T) in the A.6 lineage, while D614G was initially detected in the B.1 and B.1375 lineages and remained present in all the lineages that circulated in Thailand. Similar to D614G, P323L in nonstructural protein (Nsp) 12 was first identified in the first wave (B.1 and B.1.375) and frequently detected throughout the period of study.
More mutations were observed in the second wave; B.1.36.16 carried 32 mutations, in which a high frequency was located on Nsp; there were nine mutations in Nsp3 and 15 mutations in Nsp14 (12 of these were deletions at positions 216-227). However, only three mutations (L5F, S459F, and D614G) were identified on the S protein. L37F in Nsp6 and Q57H in NS3, which were found in few samples from the first wave, later became predominant mutations of the second wave.
Notably, the B.1.1.7 lineage in the third wave harbored a large number of mutations. A total of 54 aa substitutions were identified throughout the genome, where the majority

Characterization of SARS-CoV-2 Variants in Thailand
We further characterized the aa changes in each lineage. Figure 4 shows the mutations throughout the genome of each lineage in all three COVID-19 waves. Since A.6, B, B.1, and B.1.375 were the earliest clusters in Thailand, low levels of genomic diversity were expected in these lineages. In comparison to the reference strain Wuhan-Hu-1, there were less than 10 mutations in an entire genome. Focusing on the S protein, there was only one mutation (A829T) in the A.6 lineage, while D614G was initially detected in the B. lineage. Thus, these were recognized as lineage-defining mutations. The Q.3 lineage, which was sequenced from one sample during the third wave, contained lineage-defining mutations in all regions without any uncommon aa changes detected.
The global occurrence of all mutations found in this study are shown in the Suppl mentary Data section. Similar to high global frequency, S: D614G and Nsp12: P323L we detected in 28/33 (84.8%) and 26/33 (78.8%) in all three epidemic waves, respectively. mutation that had never been reported elsewhere was not considered a mutation of co cern and was thus excluded for further analysis.  The global occurrence of all mutations found in this study are shown in the Supplementary Data section. Similar to high global frequency, S: D614G and Nsp12: P323L were detected in 28/33 (84.8%) and 26/33 (78.8%) in all three epidemic waves, respectively. A mutation that had never been reported elsewhere was not considered a mutation of concern and was thus excluded for further analysis.

Clinical Characteristics of COVID-19 Patients in Three Epidemic Waves
The demographic and clinical information were obtained from 32 patients (data on one patient in the first wave remained unavailable) ( Table 2). The median age of all patients was 37 years (range 1-74); 46.9% were male, and 53.1% were female. The distributions of age and gender in the first and second waves were similar; the median ages were 32 (13-48) and 34 (18-67) years, respectively. Females were affected more than males, whereas higher median age and number of males than females were observed in the third wave. The clinical features of patients in each wave were elaborated. In the first wave, the numbers of asymptomatic and symptomatic patients were comparable at 42.9% and 57.1%, respectively. The symptoms included mild upper respiratory indications (fever, cough, and sore throat) and diarrhea, which were relieved without hospitalization. The second wave caused more symptomatic indications compared to the first wave; 80% of patients presented symptoms, of which 60% had mild illnesses, and 20% developed severe-to-critical illnesses. One of two severe patients had underlying diseases, namely hypertension and diabetes. All of the patients in the second wave outbreak recovered. A notably more intense situation was observed in the third wave, in which the mass vaccination campaign had not yet started. Nearly all infected people had symptoms, particularly lower respiratory symptoms including cough, sputum production, and shortness of breath. All patients required hospital admission (isolation in a healthcare facility for an asymptomatic patient); 40% had moderate symptoms, and 33.3% developed severe-to-critical illnesses. Patients who had comorbidities (66.7%) such as hypertension, diabetes, and heart disease were affected more by this cluster and had a higher risk of death. The clinical outcomes of the third wave resulted in 11 (73.3%) recoveries and four (26.7%) deaths, all of which had underlying diseases.

SARS-CoV-2 Mutations Associated with Disease Severity
The mutations found in virus sequences retrieved from patients who had no symptoms, mild symptoms, and severe-to-critical illness were characterized and grouped by COVID-19 waves (Table 3). A majority of aa substitutions were lineage-defining mutations which were not likely associated with disease severity. For example, (i) S:A829T (A.6 lineage) was detected in asymptomatic and mild-symptom patients in the first wave; (ii) S:L5F, S459F, and D614G (B.1.36.16 lineage) were detected in asymptomatic and severe-to-critical patients in the second wave; (iii) ten aa substitutions in the S protein of B.1.1.7 were found in all patients with varying degrees of severity. However, some mutations were specifically identified in certain clinical spectra. For example, S protein T549I and envelope (E) protein S55F were detected only in severe-to-critical patients, who eventually died, suggesting that these mutations might be potentially pathogenic determinants of the virus which could have influenced disease severity.  * More than one symptom or comorbidity can be given.  Nucleocapsid S187L S194L S194L S194L L557F  L557F  D164G  D164G  P1200S  T183I  T183I  T183I  T183I   T583I  P1200S  P1200S  A890D  A890D  A890D  A890D   T860del  L1195I  I1412T  I1412T  I1412T  I1412T   A861R  V393F  M1529I  Q1530H  D1755N   L862M  T943N  Q1530E  A894del   F1110I F1532V L895V

Discussion
SARS-CoV-2 has placed a continuous strain on the global population with the emergence of several VOCs. Genomic surveillance has been implemented to track viral transmission, monitor mutations, evaluate the rate of evolution, and determine the potential for causing future outbreaks. From the beginning of the COVID-19 pandemic up to June 2021, Thailand experienced three COVID-19 waves that were dominated by distinct viral lineages. In the absence of herd immunity, the introduction of a new variant in each wave contributed to increases in the number of confirmed cases of COVID-19 as well as disease severity. The total case numbers and deaths in the third wave were nearly 30 times that from the first wave [10]. This fact led to the exploration of the genomic diversity of dominant viral lineages in the three waves using WGS. The correlation between mutations and clinical data was also analyzed. Whole-genome analyses revealed the differences in mutation patterns among predominant lineages, which might be associated with transmissibility, infectivity, and disease severity.
During the first wave, Pangolin lineages A.6, B, B.1, and B.1.375 were identified from the samples in the current study and linked to asymptomatic or mild symptoms. The results in this study were in line with those from the study of Puenpa J. et al., wherein clade L (Wuhan-Hu-1-like), S (A.6), G (B.1), V (B), and O (Others) were detected in samples collected from mild symptomatic patients during the first COVID-19 wave in Thailand [11]. Several research groups have attempted to link genetic variations with disease severity. The majority of mutations that occurred during the first wave have been shown to relate to asymptomatic and mild diseases. Nagy Á. et al. identified the mutations in viral sequences retrieved from the GISAID database and observed that NS8:L84S (A.6 lineage) and Nsp6:L37F (B lineage) had higher frequencies among asymptomatic and mild patients [19]. Similar to the study by Aiewsakul P. and colleagues, Nsp6:L37F was significantly associated with asymptomatic patients [20]. Moreover, structural analysis and ex vivo study of NS3:Q57H (B.1 lineage) provided evidence supporting that this mutation was associated with decreased viral virulence, resulting in decreased mortality rates and increased transmission [21,22].
The second wave of the COVID-19 surge was driven by the B.1.36.16 lineage. Most of the patients in the second wave had mild respiratory illnesses, which is consistent with an article from South China in which Xu H. and colleagues reported that patients infected with B.1.36.16 were more symptomatic than earlier strains [23]. High-frequency substitutions were S: D614G; N:S194L; Nsp6:L37F; Nsp12:P323L; NS3:Q57H, which were found in all samples collected during the second wave. Co-mutational combinations (S:D614G, NS3:Q57H, N:S194L) have been shown to correlate with mild and severe outcome based on a model predicting mutations-associated disease severity [24], indicating that the co-mutations were possibly responsible for symptomatic manifestations compared to the previous outbreak. The second wave of the COVID-19 surge was driven by the B.1.36.16 lineage. Most of the patients in the second wave had mild respiratory illnesses, which is consistent with an article from South China in which Xu H. and colleagues reported that patients infected with B.1.36.16 were more symptomatic than earlier strains [25,26]. For example, the combination of S:D614G and Nsp12:P323L has been proven to be an epidemiologically successful variant, giving an advantage to increase the viral fitness and transmissibility but not disease severity [27][28][29]. Our results showed that S:D614G and Nsp12:P323L were dominant in the second and third waves, respectively, suggesting that this combination might have significantly impacted enhanced transmission, contributing to increasing COVID-19 case numbers.
The predominant lineages of the third wave were designated as VOCs (Alpha variant; B.1.1.7 and its sub-lineages). This variant spread rapidly across the globe, increasing the risk of ICU admission and mortality, especially in among older people [30,31]. Data for the third wave showed the same trend as other countries in terms of clinical characteristics and the rise of daily COVID-19 confirmed cases [32][33][34]. The B.1.1.7 and Q3 lineages harbored mutation patterns that could facilitate transmissibility, infectivity, and immune evasion, particularly in the S protein such as del69-70HV, del144Y, N501Y, A570D, D614G, P681H, T761I, S982A, and D1118H. For example, N501Y increases the binding affinity to the host receptor; P681H enhances the spike cleavage, impacting the viral entry; del69-70HV is likely to be involved with immune evasion and an increase in viral transmission [35,36]. Mutations in non-spike proteins such as (N: D3L, R203K, G204R, S235F; Nsp3: T183I, A890D, I1412T; Nsp6: SGF106-108del; Nsp12: P323L; NS3: G254stop; NS8: Q27stop, R52I, K68stop, Y73C) were also observed at a high frequency (85-100%) corresponding to the global prevalence [37]. Although the potential effects of these mutated proteins have not been investigated thoroughly, the relationship among mutations may have provided transmissibility, infectivity, and replication advantages for the alpha variant, eventually making it predominant in Thailand. However, these signature mutations were not likely to contribute to enhanced disease severity since they were detected in patients with varying degrees of clinical symptoms and illnesses ranging from mild to critical.
In this study, a few additional mutations were found only in sequences retrieved from severe/critical and deceased patients. The aa substitutions in structural proteins might affect viral attachment and entry into host cells. The mutational effect of S:T549I has not been reported previously. However, position 549 on the S protein is proximal to RBD, which may affect the binding affinity to the ACE2 receptor and subsequently influence viral entry. S55F was predicted to improve the binding of the SARS-CoV-2 envelope protein to a tight junction-associated protein (PALS1) [38]. In addition, changes in the non-structural proteins might be involved with viral replication and the ability to counteract with host immune response [39]. For example, Nsp2 mutations might affect virulence by interfering with the interaction with host proteins [40]; Nsp3 mutations might be responsible for interacting with host immunity and contribute to unfavorable clinical outcome in patients [41]; and Nsp14 mutations possibly affect the genomic diversity and evolution of viruses [40]. Nevertheless, the functional impacts of these substitutions need to be investigated further. Due to the low frequency of occurrence, we postulated that these mutations probably change the virus phenotype to be more pathogenic and decrease viral fitness. It should be noted that the mutation-associated disease severity could vary widely due to the wide-ranging clinical information regarding symptom classification and patient management in each country.
Not only are genetic variations of the virus responsible for virulence, but the host factor also contributes to severe outcomes. The presence of pre-existing comorbidities including hypertension, heart disease, diabetes, and obesity combined with pathogenic determinants in the viral genome can cause detrimental effects on the host. Given the small sample size, determining the correlation between aa substitutions and clinical data is difficult. Further investigations including in vitro, in vivo, and structural analyses are required to explore whether additional mutations affect viral pathogenesis and disease severity since these mutations might rebound in subsequent emerging variants.

Conclusions
This study presented the genome characterization of SARS-CoV-2 variants that predominated during three COVID-19 waves in Thailand. The number of mutations increased over time and was directly proportional to the increase in confirmed cases as well as disease severity. The A.6, B, and B.1 lineages contained mutations that had minimal impacts on transmission; the lockdowns and preventive measures could have effectively controlled this wave. Accumulated mutations were observed in B.1.36.16, relative to the increase of symptomatic COVID-19 patients in the second wave. The alpha variant that was predominant in the third wave harbored mutations that were essential for enhanced transmissibility and infectivity. Thus, a sharp rise in case numbers and increased hospitalizations were recorded. All lineage-defining mutations were not likely to be associated with severe diseases since they were detected in sequences retrieved from patients in all clinical spectra. Nevertheless, six mutations were detected only in severe and deceased patients, suggesting that these additional mutations possibly gave rise to phenotypic changes to attain higher pathogenicity, especially in patients with underlying conditions. The findings of this study can provide more insights into the genomic epidemiology and diversity of SARS-CoV-2 circulating in Thailand.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/pathogens12040626/s1, Table S1: Global prevalence of amino acid substitutions identified in our study.
Author Contributions: C.S., M.T. and T.W. conceived and designed the study, analyzed and interpret the results, and wrote the original manuscript; C.S. and M.T. performed RNA extraction; T.W. and C.T. performed the data curation and genetic and phylogenetic analyses; T.H., U.P. and C.T. helped in coordinating the project. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the Faculty of Medicine Vajira Hospital, Navamindradhiraj University, approval code: COA 049/2563.

Informed Consent Statement:
A waiver of informed consent was approved by the Vajira Institutional Review Board (IRB) since the study involves no greater than minimal risk. In addition, a waiver of informed consent will not adversely affect the rights and welfare of the subjects. Permission from Vajira Hospital is required for the secondary use of electronic medical records.

Data Availability Statement:
The data presented in this study are not publicly available due to confidentiality but are available on reasonable request.