A Global Mutational Profile of SARS-CoV-2: A Systematic Review and Meta-Analysis of 368,316 COVID-19 Patients
Life 2021, 11, 1224

Since its first detection in December 2019, more than 232 million cases of COVID-19, including 4.7 million deaths, have been reported by the WHO. SARS-CoV-2 genomes have evolved rapidly worldwide, leading to the emergence of new variants. This systematic review and meta-analysis was conducted to provide a global mutational profile of SARS-CoV-2 from December 2019 to October 2020. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, and the study protocol was registered with PROSPERO. Data from 62 eligible studies involving 368,316 SARS-CoV-2 genomes were analyzed. Most studies detected mutations in the Spike protein (n = 50), followed by the Nucleocapsid phosphoprotein (n = 34), ORF1ab gene (n = 29), 5′-UTR (n = 28) and ORF3a (n = 25). Under the random-effects model, the pooled prevalence of SARS-CoV-2 variants was estimated at 95.1% (95% CI: 93.3–96.4%; I² = 98.952%; p < 0.001). Subgroup meta-analysis by country showed that most studies were classified as 'Worldwide' (n = 10), followed by 'Multiple countries' (n = 6) and the USA (n = 5). This high estimated prevalence indicates a need to continuously monitor new mutations because of their potential influence on disease severity, transmissibility and vaccine effectiveness.


Introduction
Coronavirus Disease 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [1,2]. Since SARS-CoV-2 was first reported in Wuhan, China, the clinical picture of COVID-19 has evolved from clinically apparent pulmonary or flu-like symptoms to include subclinical or even silent infections. COVID-19 frequently presents as an asymptomatic or paucisymptomatic infection, which facilitates its spread in the general population [3]. COVID-19 causes respiratory illness with mild to severe symptoms and can be fatal, particularly in individuals with a chronic disease or a compromised immune system [4]. Various clinical outcomes in COVID-19 patients have also been documented in other regions across the world. As of 29 September 2021, according to the World Health Organization (WHO), the SARS-CoV-2 pandemic had caused 232,075,351 confirmed infections worldwide, resulting in 4,752,988 deaths and significant disruption to daily activities and national economies.
The international scientific community has continuously characterized the pathophysiological features of COVID-19, developed diagnostic tools, evaluated immune responses, and identified risk factors for severe illness. Clustered outbreaks and superspreading events of SARS-CoV-2 pose a unique challenge to pandemic control [5]. However, the basic characteristics of SARS-CoV-2 genome evolution and transmission dynamics within the human population are still not fully understood [6]. COVID-19 infection has been associated with an inflammatory state of the upper airway mucosa and with olfactory neurotoxic damage; however, to date, no reliable method for evaluating the nasal health of post-infection patients has been established [7].
Genomic sequencing of SARS-CoV-2 from several geographical regions has recently revealed that the virus changes rapidly by accumulating mutations in its genome. It has been proposed that new SARS-CoV-2 variants may adapt better to new geographical locations, making them more potent than the virus first identified in Wuhan, China.
All viral genomes accumulate mutations over time. However, several factors, including the mutation rate and the effects of mutations on viral dynamics within and between individual hosts, influence the rate at which mutations accumulate and their consequences for transmission and illness in the host population [8]. Together, these factors determine the emergence and transmission of viral variants and the course of the pandemic. Because RNA virus genomes are highly susceptible to mutation, detecting the mutations that spread worldwide is essential for a better understanding of viral evolution, biopathology and transmission [9].

Study Design and Protocol
The Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P 2015) guidelines [10] were used as the checklist for this study. The study population comprised individuals with SARS-CoV-2 infection, and the main outcome was mutations in the SARS-CoV-2 genome. The Wuhan strain was used as the reference comparator. A PROSPERO protocol (No. CRD42021229620) was registered for this study.

Literature Review
The PROSPERO database and the Database of Abstracts of Reviews of Effects (DARE) (http://www.library.UCSF.edu; accessed on 10 January 2021) were searched to ensure that no other meta-analysis on the impact of the mutational profile of SARS-CoV-2 on transmissibility and disease severity existed or was ongoing. The literature search was performed in the international databases PubMed, Scopus, ScienceDirect and Google Scholar using the search terms listed in Table S2. Two authors carried out the database search to minimize bias.

Inclusion and Exclusion Criteria for Studies
Inclusion criteria were: (1) studies reporting on human COVID-19; (2) studies reporting on SARS-CoV-2 mutations; and (3) studies reporting on SARS-CoV-2 mutations and their association with superspreading events, transmissibility and severity of illness in COVID-19 patients. Exclusion criteria were review papers, animal studies, protein characterization studies, studies on environmental sampling, and media reports.
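As an illustration only, the eligibility rules can be written as a screening predicate. This is a minimal sketch under stated assumptions: the `Record` fields and the study-type labels are hypothetical names introduced here (the review's screening was performed by the authors, not by code), and criterion (3) is assumed to be a subset of criteria (1) and (2), so it adds no separate check.

```python
from dataclasses import dataclass

# Hypothetical labels for the excluded study types listed in the text.
EXCLUDED_TYPES = {
    "review",
    "animal",
    "protein_characterization",
    "environmental_sampling",
    "media_report",
}


@dataclass
class Record:
    """Hypothetical screening record; the field names are illustrative only."""
    title: str
    study_type: str           # e.g. "original", "review", "animal"
    human_covid19: bool       # criterion (1): reports on human COVID-19
    reports_mutations: bool   # criterion (2): reports SARS-CoV-2 mutations


def is_eligible(rec: Record) -> bool:
    """Apply the exclusion list first, then require criteria (1) and (2)."""
    if rec.study_type in EXCLUDED_TYPES:
        return False
    # Criterion (3) is assumed to be a subset of (1) and (2), so no extra check here.
    return rec.human_covid19 and rec.reports_mutations


# Hypothetical records used only to exercise the rules.
records = [
    Record("Study A", "original", True, True),
    Record("Study B", "review", True, True),
    Record("Study C", "animal", False, True),
    Record("Study D", "original", True, False),
]
eligible = [r.title for r in records if is_eligible(r)]
print(eligible)  # ['Study A']
```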

Quality Assessment
The methodological quality of the included studies was assessed independently by two authors using the Joanna Briggs Institute (JBI) critical appraisal checklist for prevalence data [11]. A score of 1 was assigned for each "yes" answer and 0 for any other response, giving a total quality score ranging from 0 to 9. Studies with an overall score of 7–9 were considered to be of sufficient quality (Table S3).
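The scoring rule can be sketched as a small function. It assumes the nine checklist items take the JBI answer options "Yes", "No", "Unclear" and "Not applicable"; the function and constant names are hypothetical, and the example appraisal is invented.

```python
from __future__ import annotations

# Answer options on the JBI checklist for prevalence studies (9 items).
VALID_ANSWERS = {"yes", "no", "unclear", "not applicable"}
N_ITEMS = 9
SUFFICIENT_THRESHOLD = 7  # overall scores of 7-9 counted as sufficient quality


def jbi_score(answers: list[str]) -> int:
    """Score 1 for each 'yes' and 0 for any other answer (total 0-9)."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    normalized = [a.strip().lower() for a in answers]
    unknown = [a for a in normalized if a not in VALID_ANSWERS]
    if unknown:
        raise ValueError(f"unexpected answers: {unknown}")
    return sum(1 for a in normalized if a == "yes")


def is_sufficient_quality(answers: list[str]) -> bool:
    return jbi_score(answers) >= SUFFICIENT_THRESHOLD


# Hypothetical appraisal of one study: seven 'yes', one 'unclear', one 'no'.
example = ["Yes"] * 7 + ["Unclear", "No"]
print(jbi_score(example), is_sufficient_quality(example))  # 7 True
```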

Data Extraction
Two authors independently performed data extraction using standardized forms, which recorded the manuscript title, authors, journal, publication year, countries of study, study period, number of participants, number of mutated cases, regions of mutations, types of mutations, mutations, viral load, symptoms, severity (mild, moderate, severe, fatal), sample types (nasopharyngeal swab, bronchoalveolar lavage), viral shedding, comorbidity, mutation detection method, the database used (data downloaded), database accessed and transmissibility.
Studies that analyzed genetic mutations from more than one country were categorized as 'Multiple countries' rather than by the individual countries included. When mutational data from different countries and regions were analyzed as a whole rather than by country, they were categorized as 'Worldwide', and the data were extracted and analyzed in that form to avoid confusion. To simplify analysis, mutation regions labelled as ORF1a, ORF1b, nsp1–14, 3C-like proteinase, RNA-dependent RNA polymerase (RdRp), helicase, 3′-to-5′ exonuclease, endoRNAse, 2′-O-ribose methyltransferase, or leader protein were grouped as 'ORF1ab'. Where more than one article reported mutational data from the same group of samples, records, or patient cohort, only one article was counted and selected.
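A minimal sketch of these categorization rules follows. It assumes free-text region labels and a hypothetical `cohort_id` field for detecting overlapping cohorts; the text does not say which of several overlapping articles was kept, so retaining the first one is an assumption.

```python
from __future__ import annotations

# Region labels that the review grouped under 'ORF1ab' (taken from the text, lower-cased).
ORF1AB_LABELS = {
    "orf1a", "orf1b", "3c-like proteinase", "rna-dependent rna polymerase",
    "rdrp", "helicase", "3'-to-5' exonuclease", "endornase",
    "2'-o-ribose methyltransferase", "leader protein",
}


def _is_nsp1_to_14(key: str) -> bool:
    """True for labels 'nsp1' ... 'nsp14'."""
    digits = key[3:]
    return key.startswith("nsp") and digits.isdigit() and 1 <= int(digits) <= 14


def normalize_region(label: str) -> str:
    key = label.strip().lower().replace("\u2032", "'")  # treat primes as apostrophes
    if key in ORF1AB_LABELS or _is_nsp1_to_14(key):
        return "ORF1ab"
    return label.strip()


def country_category(countries: list[str], analyzed_as_whole: bool) -> str:
    """Map a study's countries to the subgroup label used in the meta-analysis."""
    if analyzed_as_whole:
        return "Worldwide"
    if len(set(countries)) > 1:
        return "Multiple countries"
    return countries[0]


def deduplicate(studies: list[dict]) -> list[dict]:
    """Keep the first study per cohort; 'cohort_id' is a hypothetical key."""
    seen: set = set()
    kept = []
    for study in studies:
        if study["cohort_id"] not in seen:
            seen.add(study["cohort_id"])
            kept.append(study)
    return kept


print(normalize_region("nsp12"), normalize_region("Spike"))  # ORF1ab Spike
print(country_category(["Italy", "Spain"], False))           # Multiple countries
```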

Data Synthesis and Analysis
Data analysis was conducted using Comprehensive Meta-Analysis software (CMA) (Version 2.0) (https://www.meta-analysis.com/; accessed on 25 July 2021). The pooled prevalence of SARS-CoV-2 variants was calculated, and subgroup analysis was performed by country. A random-effects model using the DerSimonian–Laird method was employed to determine the pooled estimates of the reported SARS-CoV-2 variants and subtype proportions. A forest plot was then generated to summarize the individual studies visually alongside the pooled estimate and the degree of heterogeneity. Publication bias was examined using funnel plots (a visual aid for detecting bias) and Egger's regression test. Heterogeneity of the study-level estimates (i.e., variation in outcomes between studies) was evaluated with Cochran's Q test and quantified using the I² statistic; I² values of 25%, 50%, and 75% were considered to indicate low, moderate, and high heterogeneity, respectively [12].
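To make the pooling step concrete, the sketch below implements DerSimonian–Laird random-effects pooling of proportions in plain Python. It assumes proportions are pooled on the logit scale with a 0.5 continuity correction for studies at 0% or 100% (a common choice for prevalence data, not confirmed for the CMA run in this review). It also labels I² below 50% as low, matching the Discussion's treatment of 0% and 5.4% as low. This is not CMA's implementation, and the counts are hypothetical.

```python
from __future__ import annotations

import math


def logit_with_variance(events: float, n: float) -> tuple[float, float]:
    """Logit of a proportion and its approximate within-study variance."""
    if events == 0 or events == n:
        # Assumed 0.5 continuity correction when a study reports 0% or 100%.
        events, n = events + 0.5, n + 1.0
    p = events / n
    return math.log(p / (1.0 - p)), 1.0 / events + 1.0 / (n - events)


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(y: list[float], v: list[float]) -> dict:
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(y)
    w = [1.0 / vi for vi in v]                                  # inverse-variance weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0            # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]                      # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"mu": mu, "se": se, "tau2": tau2, "q": q, "df": df, "i2": i2}


def classify_i2(i2: float) -> str:
    """Label I^2 with the 50%/75% cut-offs; values below 50% are treated as low."""
    if i2 >= 75.0:
        return "high"
    if i2 >= 50.0:
        return "moderate"
    return "low"


def pooled_prevalence(studies: list[tuple[int, int]]) -> dict:
    """studies: (samples carrying variants, total samples) for each study."""
    y, v = zip(*(logit_with_variance(e, n) for e, n in studies))
    res = dersimonian_laird(list(y), list(v))
    lo, hi = res["mu"] - 1.96 * res["se"], res["mu"] + 1.96 * res["se"]
    return {"prevalence": inv_logit(res["mu"]), "ci": (inv_logit(lo), inv_logit(hi)),
            "i2": res["i2"], "heterogeneity": classify_i2(res["i2"])}


# Hypothetical counts, not data from the included studies.
result = pooled_prevalence([(95, 100), (180, 200), (48, 50), (290, 300)])
print(result)
```

Pooling on the logit scale keeps the confidence interval inside 0–100% after back-transformation, which matters when prevalences sit close to 100% as they do here.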
Subgroup meta-analysis was used to explore sources of heterogeneity, and a sensitivity analysis was conducted using the leave-one-out method. A p-value < 0.001 was considered statistically significant for all tests.
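The leave-one-out sensitivity analysis re-estimates the pooled prevalence with each study omitted in turn; a study whose removal shifts the estimate markedly is influential. This self-contained sketch repeats a logit-scale DerSimonian–Laird pooling under assumed settings (logit transform, 0.5 continuity correction) and uses hypothetical counts in which the last study is a deliberate outlier.

```python
from __future__ import annotations

import math


def _logit_var(events: float, n: float) -> tuple[float, float]:
    if events == 0 or events == n:          # assumed 0.5 continuity correction
        events, n = events + 0.5, n + 1.0
    p = events / n
    return math.log(p / (1.0 - p)), 1.0 / events + 1.0 / (n - events)


def dl_pooled_prevalence(studies: list[tuple[int, int]]) -> float:
    """DerSimonian-Laird random-effects pooled proportion on the logit scale."""
    y, v = zip(*(_logit_var(e, n) for e, n in studies))
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-mu))


def leave_one_out(studies: list[tuple[int, int]]) -> list[float]:
    """Pooled prevalence recomputed with each study omitted in turn."""
    return [dl_pooled_prevalence(studies[:i] + studies[i + 1:]) for i in range(len(studies))]


# Hypothetical counts; the last study is a deliberate outlier.
studies = [(95, 100), (96, 100), (94, 100), (20, 100)]
overall = dl_pooled_prevalence(studies)
for i, est in enumerate(leave_one_out(studies)):
    print(f"omit study {i}: {est:.3f} (all studies: {overall:.3f})")
```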

Search Result and Eligible Studies
The complete literature search process is displayed in Figure 1. The search strategy initially identified 352 articles, of which 325 remained after removal of duplicates. Of these, 253 articles were excluded based on the exclusion criteria. The full texts of the remaining 72 articles were assessed for eligibility, and ten were excluded because they lacked mutation data or did not report country-specific mutation data. A total of 62 articles were included in the qualitative synthesis, of which 51 articles published between December 2019 and October 2020 were included in the quantitative synthesis (meta-analysis).
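The counts in the selection process can be checked with simple arithmetic: 352 - 325 = 27 duplicates, 325 - 253 = 72 full texts, and 72 - 10 = 62 included studies. The small sketch below derives each stage from the reported numbers (the function name is hypothetical); the text does not state why 11 of the 62 studies were synthesized qualitatively only, so the code just reports the difference.

```python
def prisma_flow(identified: int, after_duplicates: int, excluded_screening: int,
                excluded_full_text: int) -> dict:
    """Derive each PRISMA stage from the counts reported in the text."""
    full_text = after_duplicates - excluded_screening
    return {
        "duplicates_removed": identified - after_duplicates,
        "full_text_assessed": full_text,
        "included_qualitative": full_text - excluded_full_text,
    }


flow = prisma_flow(identified=352, after_duplicates=325,
                   excluded_screening=253, excluded_full_text=10)
print(flow)  # {'duplicates_removed': 27, 'full_text_assessed': 72, 'included_qualitative': 62}

# 62 studies entered the qualitative synthesis and 51 the meta-analysis.
qualitative_only = flow["included_qualitative"] - 51
print(qualitative_only)  # 11
```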

Characteristics of the Eligible Studies
All eligible studies included in the meta-analyses were of high methodological quality. Of the 62 studies included from December 2019 to October 2020 (Table 1) [2, …], the largest numbers were classified as 'Worldwide' (n = 10), 'Multiple countries' (n = 6) and the USA (n = 5). Mutations in the 368,316 samples and genomic sequences analyzed in these studies were detected by quantitative reverse transcription PCR (qRT-PCR) or DNA sequencing (Sanger, next-generation, whole-genome or Nanopore sequencing).

Subgroup Meta-Analysis
Subgroup meta-analysis by country showed that most studies were classified as 'Worldwide' (n = 10), followed by 'Multiple countries' (n = 6) and the USA (n = 5). Notably, the three studies from China had an I² of 5.356% and a pooled prevalence of 97.5% (CI: 85.1–99.6%), whereas the three studies from Italy had an I² of 0.000% and a pooled prevalence of 98.1% (CI: 88.2–99.7%). Heterogeneity was highest among the 'Worldwide' studies (I² = 99.747%), followed by the six 'Multiple countries' studies (I² = 95.168%) (Table 2). The forest plot is shown in Figure 6.
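Subgroup pooling amounts to running the random-effects model separately within each country category and reporting each subgroup's pooled prevalence and I². The sketch below does that with a logit-scale DerSimonian–Laird model (0.5 continuity correction assumed); the subgroup labels and counts are hypothetical, not the review's data.

```python
import math
from collections import defaultdict


def _logit_var(events, n):
    if events == 0 or events == n:          # assumed 0.5 continuity correction
        events, n = events + 0.5, n + 1.0
    p = events / n
    return math.log(p / (1.0 - p)), 1.0 / events + 1.0 / (n - events)


def _dl(y, v):
    """Return (pooled logit, I^2 in %) using DerSimonian-Laird random effects."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return mu, i2


def subgroup_analysis(rows):
    """rows: (subgroup label, events, total). Pools each subgroup separately."""
    groups = defaultdict(list)
    for label, events, n in rows:
        groups[label].append(_logit_var(events, n))
    out = {}
    for label, pairs in groups.items():
        y, v = zip(*pairs)
        mu, i2 = _dl(list(y), list(v))
        out[label] = {"k": len(pairs), "prevalence": 1.0 / (1.0 + math.exp(-mu)), "i2": i2}
    return out


# Hypothetical rows: subgroup A is homogeneous, subgroup B is not.
rows = [("A", 90, 100), ("A", 90, 100), ("A", 90, 100),
        ("B", 95, 100), ("B", 20, 100)]
for label, res in subgroup_analysis(rows).items():
    print(label, res)
```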

Meta-Regression
Meta-regression was performed with country as the single covariate, using the method of moments as the computational option, and a scatter plot was generated (Figure 7). The p-value obtained for 'Country' was 0.000 (i.e., p < 0.001), indicating that, beyond chance, country may account for part of the heterogeneity observed in this study.
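For a single categorical moderator such as country, a method-of-moments meta-regression can be sketched in three steps. First, fit one mean per category with inverse-variance weights. Second, estimate the residual between-study variance τ² from the residual Q. Third, test the moderator with a chi-square statistic computed on random-effects weights. The code follows the general method-of-moments estimator and is not necessarily identical to CMA's computation; the function names and the logit-scale inputs are hypothetical.

```python
import math
from collections import defaultdict


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df (stdlib only)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j < df/2} h^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: start from df = 1 and add h^s e^-h / Gamma(s + 1) for s = 1/2, 3/2, ...
    total = math.erfc(math.sqrt(h))
    s = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
        s += 1.0
    return total


def moments_meta_regression(y, v, groups):
    """Mixed-effects meta-regression on one categorical moderator (method of moments)."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    k, n_groups = len(y), len(members)
    w = [1.0 / vi for vi in v]

    # Step 1: fixed-effect fit with one mean per group -> residual Q and trace term.
    q_resid = trace_term = 0.0
    for idx in members.values():
        s1 = sum(w[i] for i in idx)
        mean_g = sum(w[i] * y[i] for i in idx) / s1
        q_resid += sum(w[i] * (y[i] - mean_g) ** 2 for i in idx)
        trace_term += sum(w[i] ** 2 for i in idx) / s1
    df_resid = k - n_groups
    c = sum(w) - trace_term
    tau2 = max(0.0, (q_resid - df_resid) / c) if c > 0 else 0.0

    # Step 2: refit with random-effects weights and test between-group differences.
    w_re = [1.0 / (vi + tau2) for vi in v]
    grand = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    q_model = 0.0
    for idx in members.values():
        s1 = sum(w_re[i] for i in idx)
        mean_g = sum(w_re[i] * y[i] for i in idx) / s1
        q_model += s1 * (mean_g - grand) ** 2
    df_model = n_groups - 1
    p_model = chi2_sf(q_model, df_model) if df_model > 0 else float("nan")
    return {"tau2": tau2, "q_model": q_model, "df_model": df_model, "p_model": p_model}


# Hypothetical logit-scale effects and variances for two "countries".
y = [3.0, 3.1, 2.9, -1.0, -1.1, -0.9]
v = [0.2] * 6
groups = ["X", "X", "X", "Y", "Y", "Y"]
print(moments_meta_regression(y, v, groups))
```

The closed-form chi-square tail keeps the sketch dependency-free; a statistics library would normally be used instead.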

Discussion
With high infection numbers worldwide, SARS-CoV-2 has evolved, accumulated mutations and given rise to new genetic variants, some with increased infectivity and transmissibility. Efforts are under way to characterize the virus and its genomic variability at the molecular level. Viral mutations and variants around the globe are routinely monitored through sequence-based surveillance, epidemiological analysis and laboratory studies.
This study examined the mutational profile of SARS-CoV-2 from December 2019 to October 2020 using 62 studies from different continents. The pooled prevalence of SARS-CoV-2 variants in COVID-19 patients' samples, estimated with the random-effects model, was 95.1%. After adjustment for potential publication bias with the trim-and-fill method, the estimated prevalence of the variants remained high at 82.5%.
Between-study variability was high (I² = 98.95%). Subgroup meta-analysis showed that the subgroups contributing most to this heterogeneity were 'Worldwide' (I² = 99.7%), 'Multiple countries' (I² = 95.2%), Hong Kong (I² = 93.7%) and the USA (I² = 92.3%). Only two countries, Italy (I² = 0%) and China (I² = 5.4%), showed low heterogeneity. The different methods used to detect mutations may have contributed to the high heterogeneity, especially in the 'Worldwide' and 'Multiple countries' subgroups. The heterogeneity could also be attributed to the different regions of the SARS-CoV-2 genome analyzed (Spike protein, ORF1ab, Nucleocapsid phosphoprotein, etc.) and to the types of samples used in the studies. Most studies, especially those classified as 'Worldwide' and 'Multiple countries', analyzed patients' genomic data downloaded from the GISAID database.
Most reported mutations were located in the Spike gene, followed by the Nucleocapsid gene and the ORF1ab gene. The large number of studies reporting on the Spike gene may reflect its importance in the pathogenicity and transmissibility of SARS-CoV-2. The Spike (S) protein consists of two subunits, S1 and S2: S1 mediates receptor binding, whereas S2 mediates downstream membrane fusion [74]. The S1 receptor-binding domain (RBD) binds with high affinity to the human ACE2 receptor on alveolar type 2 (AT2) cells in the lungs. Once the virus has attached to the host cell receptor, cleavage occurs between the S1 and S2 subunits, and S2 then drives fusion of the viral and cellular membranes, directly facilitating entry into the host cell. Both S1 and S2 are therefore crucial for infection [14].
Data extracted from the publications included in this study showed that the 23403A>G mutation in the S gene, which produces the D614G missense substitution in the Spike protein, was recorded in 43 of the 62 studies. The D614G substitution is usually linked to three other mutations: 241C>T in the 5′-UTR, the synonymous mutation 3037C>T, and the non-synonymous mutation 14408C>T in the RNA-dependent RNA polymerase (RdRp), known as P323L (RdRp numbering) or P4715L (ORF1ab numbering).
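The link between the genomic change 23403A>G and the protein change D614G can be sketched by mapping the position onto the Spike coding sequence. The coordinates below assume the Wuhan-Hu-1 reference (GenBank NC_045512.2: 5′-UTR 1–265, ORF1ab 266–21555, S 21563–25384), and the reference codon `GAT` at Spike position 614 is an assumption supplied by the caller; the function names are hypothetical. ORF1ab positions such as 14408 are only assigned to a region, not translated, because the ribosomal frameshift within ORF1ab complicates codon numbering.

```python
import re

# Genome coordinates on the Wuhan-Hu-1 reference (GenBank NC_045512.2), assumed here.
REGIONS = [("5'-UTR", 1, 265), ("ORF1ab", 266, 21555), ("S", 21563, 25384)]
SPIKE_START = 21563

# Standard genetic code built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + m]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for m, b3 in enumerate(BASES)
}


def region_of(pos: int) -> str:
    for name, start, end in REGIONS:
        if start <= pos <= end:
            return name
    return "other"


def spike_substitution(mutation: str, ref_codon: str) -> str:
    """Translate a Spike nucleotide change such as '23403A>G' into an amino-acid change."""
    match = re.fullmatch(r"(\d+)([ACGT])>([ACGT])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if region_of(pos) != "S":
        raise ValueError(f"{pos} is outside the Spike coding sequence")
    offset = pos - SPIKE_START
    codon_number, frame = offset // 3 + 1, offset % 3  # 1-based codon, 0-based frame
    if ref_codon[frame] != ref:
        raise ValueError("reference base does not match the supplied codon")
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}"


# 'GAT' as the reference codon at Spike position 614 is an assumption.
print(spike_substitution("23403A>G", "GAT"))              # D614G
print([region_of(p) for p in (241, 3037, 14408, 23403)])  # ["5'-UTR", 'ORF1ab', 'ORF1ab', 'S']
```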
The included data showed that D614G was detected in Europe from mid-to-late February 2020 [51]. By early March, it had spread rapidly to the United States (US) [51] and South America [62]. In Asia, the D614G variant was found in Thailand in a sample from a patient diagnosed with COVID-19 in early March 2020 [69], while in China it was detected in samples collected from January to April 2020 [25]. By June 2020, D614G was found in every sample sequenced worldwide [75].
The mutation appeared to arise independently and sweep simultaneously across multiple geographic regions, suggesting natural selection and an adaptive benefit of D614G. However, subsequent sequencing identified the D614G mutation in viruses from several Chinese provinces in late January (first D614G in China: hCoV-19/Zhejiang/HZ103/2020; 24 January 2020), raising the possibility that the global spread of this mutation resulted from chance founder events. Viruses carrying the 614G mutation may have initiated most early transmission events in multiple locations, suggesting that the D614G mutation was not necessarily adaptive, despite in vitro data showing its effects on receptor binding [76].
A study of more than 25,000 sequences from the UK found that viruses carrying the 614G mutation were associated with higher viral loads and younger patient age. These viruses appeared to spread faster and to seed larger phylogenetic clusters than those carrying 614D; however, no association was found between Spike 614G and clinical severity or COVID-19 mortality [65].
This study has several limitations. Because the included studies did not report the relevant data, we were unable to assess the impact of the identified mutations on patients' viral loads, disease severity or transmissibility, although an understanding of these effects would be invaluable. Furthermore, most of the included studies analyzed only viral genomic data from COVID-19 patients downloaded from the NCBI and GISAID databases, which limited our access to demographic information such as sex and age, and to clinical data such as viral loads, symptoms, comorbidities and disease severity. The scarcity of these data also limited the subgroup meta-analyses that could be conducted.

Conclusions
In this study, a systematic review and meta-analysis was conducted to estimate the global prevalence of SARS-CoV-2 variants, which was 95.1%. Although heterogeneity was high, we believe the estimate provides a good indication of the prevalence of SARS-CoV-2 variants worldwide from December 2019 to October 2020. Given the rapid evolution of SARS-CoV-2, new mutations need to be monitored continuously because of their potential influence on disease severity, transmissibility, resistance to antiviral drugs and vaccine effectiveness.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/life11111224/s1, Table S1: Major characteristics of the included studies; Table S2: Search strategy in four electronic databases; Table S3: Quality of included studies by JBI critical appraisal checklist for studies reporting prevalence data.