Molecular Analysis of SARS-CoV-2 Genetic Lineages in Jordan: Tracking the Introduction and Spread of COVID-19 UK Variant of Concern at a Country Level

The rapid evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is manifested by the emergence of an ever-growing pool of genetic lineages. The aim of this study was to analyze the genetic variability of SARS-CoV-2 in Jordan, with a special focus on the UK variant of concern. A total of 579 SARS-CoV-2 sequences collected in Jordan were subjected to maximum likelihood and Bayesian phylogenetic analysis. Genetic lineages were assigned using the Pango system, and amino acid substitutions were investigated using the Protein Variation Effect Analyzer (PROVEAN) tool. A total of 19 different SARS-CoV-2 genetic lineages were detected. The most frequent was the first Jordan lineage (B.1.1.312), first detected in August 2020 (n = 424, 73.2%), followed by the second Jordan lineage (B.1.36.10), first detected in September 2020 (n = 62, 10.7%), and the UK variant of concern (B.1.1.7; n = 36, 6.2%). In the spike gene region, the molecular signature of B.1.1.312 was the non-synonymous mutation A24432T, resulting in a deleterious amino acid substitution (Q957L), while the molecular signature of B.1.36.10 was the synonymous mutation C22444T. Bayesian analysis revealed that the UK variant of concern (B.1.1.7) was introduced into Jordan in late November 2020 (mean estimate), four weeks earlier than its official reporting in the country. In Jordan, an exponential increase in COVID-19 cases due to the B.1.1.7 lineage coincided with the new year 2021. The highest proportion of phylogenetic clustering was detected for the B.1.1.7 lineage. The amino acid substitution D614G in the spike glycoprotein was present in all sequences collected in the country from July 2020 onwards. Two Jordanian lineages dominated infections in the country, with continuous introduction/emergence of new lineages. In Jordan, the rapid spread of the UK variant of concern should be monitored closely.
The spread of SARS-CoV-2 mutants appeared to be related to the founder effect; nevertheless, the biological impact of certain mutations should be further investigated.


Introduction
The evolutionary analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is appealing for several reasons.
First, this novel virus harbours a ribonucleic acid (RNA) genome that is replicated by an RNA-dependent RNA polymerase. This replicase has minimal proofreading activity, a hallmark of rapidly evolving viruses (e.g., influenza virus and hepatitis C virus) [1,2].
In addition, the pandemic nature of coronavirus disease 2019 (COVID-19), with more than 100 million detected cases so far, translates into a huge pool of susceptible hosts exerting varying selective pressures on the viral genome [3,4]. This has resulted in the rapid divergence of the virus.
The molecular signature found consistently in the Spike gene region of the first Jordan lineage, B.1.1.312, was the replacement of adenine by thymine at position 24,432 (A24432T) of the reference genome NC_045512 (thymine is reported instead of uracil because the results come from DNA sequencing). This non-synonymous mutation replaces glutamine (Q) with leucine (L) at position 957 of the spike glycoprotein (Q957L).
The molecular signature in the Spike gene region for the second Jordan lineage B.1.36.10 was C22444T (a synonymous mutation).
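The positional arithmetic behind these two signatures can be checked with a short sketch. It assumes that the S gene of NC_045512 begins at nucleotide 21,563 (the standard annotation of the Wuhan-Hu-1 reference) and uses the standard genetic code. The reference codons passed in the usage lines (CAA for codon 957 and GAC for codon 294) are illustrative assumptions rather than codons read from the reference sequence in this study, and the helper names are hypothetical.

```python
# Minimal sketch: map an NC_045512 coordinate to a spike codon and classify a
# single-nucleotide change. Assumption: the S gene starts at nucleotide 21,563.
SPIKE_START = 21563

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def spike_codon(genome_pos):
    """Return (codon number, position within codon 1-3) for a genome position."""
    offset = genome_pos - SPIKE_START  # 0-based offset into the S coding sequence
    if offset < 0:
        raise ValueError("position lies upstream of the S gene")
    return offset // 3 + 1, offset % 3 + 1


def classify_change(ref_codon, codon_pos, alt_base):
    """Apply a point change to a reference codon and translate both versions."""
    alt_codon = ref_codon[:codon_pos - 1] + alt_base + ref_codon[codon_pos:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


print(spike_codon(24432))               # (957, 2)
print(classify_change("CAA", 2, "T"))   # ('Q', 'L', 'non-synonymous')
print(spike_codon(22444))               # (294, 3)
print(classify_change("GAC", 3, "T"))   # ('D', 'D', 'synonymous')
```

A24432T falls at the second position of codon 957 and therefore changes the encoded residue, whereas C22444T falls at a third (wobble) position, which is consistent with its being synonymous.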
Using the Tamura

The Proportion of Phylogenetic Clustering among the Three Most Common Lineages in Jordan
To determine sequence clustering among the three most common genetic lineages of SARS-CoV-2 in Jordan, we constructed maximum likelihood (ML) phylogenies. Using the Spike gene region, the proportion of phylogenetic clustering was highest for the B.1.1.7 lineage (Figure 2). Please refer to the Materials and Methods section for an explanation of the difference in the number of B.1.1.312 sequences between the two sub-genomic regions (Supplementary S1).
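The clustering criterion used in the study is defined in the Materials and Methods section and is not reproduced here. As a purely illustrative sketch, the code below assumes one common definition: a sequence counts as clustered if it falls in a clade whose branch support is at or above a threshold (a hypothetical 0.9), that contains at least two tips, and whose tips all belong to the lineage of interest. The `Node` class, the thresholds, and the toy tree are assumptions, not the study's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node: leaves carry a name, internal nodes a support value."""
    name: str = ""
    support: float = 0.0
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def clustered_leaves(node, lineage_of, lineage, min_support=0.9, min_size=2):
    """Leaves of `lineage` inside supported clades made up only of that lineage."""
    if not node.children:
        return set()
    tips = node.leaves()
    if (node.support >= min_support and len(tips) >= min_size
            and all(lineage_of[tip] == lineage for tip in tips)):
        return set(tips)  # largest qualifying clade on this path; no need to descend
    found = set()
    for child in node.children:
        found |= clustered_leaves(child, lineage_of, lineage, min_support, min_size)
    return found


def proportion_clustered(tree, lineage_of, lineage, **thresholds):
    """Share of the lineage's sequences that fall into qualifying clusters."""
    total = sum(1 for tip in tree.leaves() if lineage_of[tip] == lineage)
    if total == 0:
        return 0.0
    return len(clustered_leaves(tree, lineage_of, lineage, **thresholds)) / total


# Toy tree: three X tips in a well-supported clade, one X singleton,
# and a poorly supported clade mixing lineages X and Y.
tree = Node(children=[
    Node(support=0.95, children=[Node("a1"), Node("a2"), Node("a3")]),
    Node("b1"),
    Node(support=0.5, children=[Node("c1"), Node("c2")]),
])
lineage_of = {"a1": "X", "a2": "X", "a3": "X", "b1": "X", "c1": "X", "c2": "Y"}
print(proportion_clustered(tree, lineage_of, "X"))  # 0.6
```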

Amino Acid Substitutions in the Surface Glycoprotein of the Three Major Genetic Lineages in Jordan
For the three major genetic lineages in Jordan (B.1.1.312; B.1.36.10 and B.1.1.7), an assessment of amino acid substitutions in the spike glycoprotein compared to that in the reference sequence (YP_009724390) was undertaken.
The amino acid substitution D614G was detected in the vast majority of sequences (n = 566, 97.8%), and the wild type (D614) was last identified in June 2020.
The amino acid substitutions N501Y and P681H besides the deletion ∆69/70 were consistently found among the lineage B.1.1.7 sequences, while N501I was detected in a single sequence from the first Jordan lineage B.1.1.312.
The amino acid substitutions K417N and E484K were absent from all sequences analyzed in this study.
Using the Protein Variation Effect Analyzer (PROVEAN) tool, two amino acid substitutions were predicted to be deleterious for the spike glycoprotein: T716I detected among B.1.1.7 sequences and Q957L found in the first Jordan lineage B.1.1.312 (Table 2).

Table 2. Prediction of amino acid substitution impact in the spike glycoprotein of SARS-CoV-2 stratified by the three major genetic lineages detected in Jordan.

SARS-CoV-2 Lineage | Amino Acid Substitution | PROVEAN¹ Score | Prediction (Cutoff = −2.5)
UK variant of concern (B.
¹ Variants with a score equal to or below −2.5 are considered "deleterious," and variants with a score above −2.5 are considered "neutral" in the Protein Variation Effect Analyzer (PROVEAN) tool.
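The PROVEAN decision rule in the footnote to Table 2 amounts to a single threshold comparison. The sketch below applies it to hypothetical scores; the labels `sub_1` to `sub_3` and their values are invented for illustration and are not the entries of Table 2. Note that a score of exactly −2.5 is classified as deleterious.

```python
# Decision rule from the Table 2 footnote: score <= -2.5 -> "deleterious".
PROVEAN_CUTOFF = -2.5


def provean_prediction(score, cutoff=PROVEAN_CUTOFF):
    """Map a PROVEAN score to the tool's binary prediction."""
    return "deleterious" if score <= cutoff else "neutral"


# Hypothetical scores for illustration only; not the values reported in Table 2.
for label, score in {"sub_1": -4.0, "sub_2": -2.5, "sub_3": 0.7}.items():
    print(label, score, provean_prediction(score))
```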

The UK Variant of Concern was Introduced into Jordan in Late November 2020
Bayesian analysis of the UK variant of concern (B.1.1.7) lineage, using 35 SARS-CoV-2 spike (S) gene sequences collected in Jordan between 24 December 2020 and 6 January 2021, revealed that the time to the most recent common ancestor (tMRCA) of this lineage in Jordan was 21 November 2020 (95% highest posterior density (HPD) interval: 17 November 2020 to 24 December 2020). Coalescent analysis using a Bayesian skyline plot showed a rapid exponential increase in the effective number of infections between 1 January 2021 and 5 January 2021 (Figure 3).
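A 95% highest posterior density interval such as the one reported for the tMRCA is usually estimated from MCMC output as the shortest interval that spans 95% of the sampled values. A minimal sketch of that sorted-samples approach is shown below; it is similar in spirit to what summary tools such as Tracer report, though their exact conventions may differ, and the tMRCA draws in the usage lines are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval spanning at least `mass` of the sorted posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    span = math.floor(mass * n)  # index distance between the interval's two ends
    if not 0 < mass < 1 or span < 1:
        raise ValueError("need 0 < mass < 1 and enough samples")
    # Width of every candidate interval xs[i] .. xs[i + span]; keep the narrowest.
    widths = [xs[i + span] - xs[i] for i in range(n - span)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return xs[best], xs[best + span]


# Hypothetical posterior draws of a tMRCA in decimal years, for illustration only.
draws = [2020.86, 2020.87, 2020.88, 2020.88, 2020.89,
         2020.89, 2020.90, 2020.91, 2020.95, 2020.99]
print(hpd_interval(draws, mass=0.8))
```

Unlike an equal-tailed credible interval, this interval can be asymmetric around the mean, which is why an HPD upper bound may sit far from the point estimate when the posterior is skewed.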

Discussion
In this study, we used molecular clock and coalescent analyses to describe the timeline of the introduction of genetic lineage B.1.1.7, commonly known as the UK variant of concern, and its spread in Jordan. Additionally, we employed the Pango classification system, which facilitates the classification and nomenclature of SARS-CoV-2 genetic lineages; the molecular signatures of these lineages can help track their introduction/emergence and spread [5]. This approach can be used to evaluate public health measures, including control and mitigation practices [31]. The negative consequences of the current COVID-19 pandemic necessitate such in-depth epidemiologic studies, which can help in planning effective preventive strategies [32,33].
The major finding of this study was that the genetic lineage B.1.1.7 was introduced into Jordan about four weeks before the official reporting of its introduction into the country [27]. Bayesian skyline coalescent analysis showed that the exponential increase in infections due to the B.1.1.7 lineage coincided with the new year 2021, following a lag phase of several weeks. Human behavior can drive a surge in infections, for example when a super-spreader event occurs at a large gathering [34,35], so gatherings around the new year may have contributed to this surge. However, this hypothesis needs further evaluation using contact tracing data together with dense sampling to reconstruct the evolutionary history of this lineage in the country.

Although further evidence is needed regarding the biological significance of the B.1.1.7 lineage, several studies have reported its rapid dissemination in the UK and several other countries [6,16,36,37]. This proposed change in virus behavior may be related to enhanced binding between the spike glycoprotein of this lineage and its receptor, an enhancement that has been proposed to result from the N501Y amino acid substitution [18,38].
Additionally, we used the Pango classification system to describe the molecular epidemiology of COVID-19 in Jordan [5]. Since the first introduction of the novel coronavirus into humans, the expanding genetic diversity of the virus has demanded a scheme to classify and name monophyletic clades, facilitating the study of the epidemiologic features of the virus, including its spread. Such a scheme also provides a common framework for studying the possible biological significance of individual lineages [39,40]. In this study, we adopted the approach conceived by Rambaut et al., which can help in analyzing patterns of introduction and spread of this novel virus within a given region [5,12].
Community transmission of SARS-CoV-2 in Jordan became apparent in August 2020 and was dominated by three genetic lineages: first the two Jordan lineages (B.1.1.312 and B.1.36.10) and, more recently, the UK variant of concern (B.1.1.7). The emergence/introduction of the two Jordan lineages can mostly be attributed to a founder effect, since no discernible advantageous or neutral mutations were detected among the two lineages [41-43]. The molecular signature of the second Jordan lineage (B.1.36.10) was found in earlier sequences collected in Turkey [44]. This raises the possibility that the lineage was introduced into Jordan in early September 2020, considering that travelers arriving from Turkey (classified as a green country at that time) were not required to quarantine [45].
One result that warrants further investigation is the higher proportion of phylogenetic clustering for the B.1.1.7 lineage compared with the two Jordan lineages. This indicates a higher proportion of domestic transmission, which may be linked to enhanced transmissibility of the virus. However, further evidence is needed to support the current observations linking this lineage with higher transmission [37].
In line with several previous studies, genetic analysis of SARS-CoV-2 in Jordan showed a shift to the B.1 lineage and its descendants, which harbor the spike D614G amino acid substitution; all sequences collected in Jordan from July 2020 onwards carried this substitution [23,24]. The substitution was already present in the country in March 2020, and its subsequent dominance hints at the effects of viral genetic changes on epidemic behavior, although further evidence is needed to support such a correlation [21,46-48].

Study Strengths and Limitations
The current study used state-of-the-art phylogenetic inference methods to characterize the molecular epidemiology of SARS-CoV-2 in Jordan. Additionally, to the best of our knowledge, this study is among the first in the Middle East and North Africa region to use the Pango classification system to characterize the genetic diversity of SARS-CoV-2.
A limitation of this study was potential temporal sampling bias, manifested as variation in the proportion of newly diagnosed cases sequenced each month: 1.3% of newly diagnosed cases were sequenced before October 2020, compared with 0.1% thereafter.
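The monthly sequencing proportion described in this limitation is the number of sequenced samples collected in a month divided by the number of newly diagnosed cases in that month. A minimal sketch, assuming sequence collection dates (for example, from GISAID metadata) and daily case counts keyed by date; the function name and the toy counts are hypothetical, chosen only so the output reproduces rates of 1.3% and 0.1%.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_sequencing_proportion(daily_cases, collection_dates):
    """Return {(year, month): sequences collected / new cases diagnosed}."""
    cases = defaultdict(int)
    for day, n_new in daily_cases.items():
        cases[(day.year, day.month)] += n_new
    sequences = Counter((d.year, d.month) for d in collection_dates)
    # Months without diagnosed cases are skipped to avoid division by zero.
    return {month: sequences[month] / n for month, n in sorted(cases.items()) if n > 0}


# Hypothetical toy data (not the Jordanian counts).
daily_cases = {date(2020, 9, 1): 600, date(2020, 9, 2): 400, date(2020, 11, 1): 2000}
collection_dates = [date(2020, 9, 5)] * 13 + [date(2020, 11, 3)] * 2
for (year, month), share in monthly_sequencing_proportion(daily_cases, collection_dates).items():
    print(f"{year}-{month:02d}: {share:.1%}")
```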
Another caveat is the enhanced surveillance of passengers (and their contacts) arriving from the UK or other countries where the UK variant of concern had been reported. This may have contributed to the dominance of the B.1.1.7 lineage among sequences collected in December 2020 and January 2021.

Compilation of SARS-CoV-2 Jordanian Dataset and Epidemiologic Data
All SARS-CoV-2 sequences collected in Jordan and available in GISAID as of 30 January 2021 were retrieved [11]. The Jordanian sequences were aligned together with the reference SARS-CoV-2 sequence Wuhan-Hu-1 (accession number: NC_045512) using MAFFT v.7, a multiple alignment program for amino acid or nucleotide sequences [49].
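After alignment, nucleotide changes such as A24432T are reported in the coordinates of the reference genome, which requires skipping alignment columns where the reference carries a gap. The sketch below shows that bookkeeping for one reference/query pair taken from an alignment. It ignores insertions relative to the reference and ambiguous bases (such as N); this handling, and the helper names, are assumptions for illustration rather than a description of the study's pipeline.

```python
def reference_positions(aligned_ref):
    """1-based reference coordinate for each alignment column (None at reference gaps)."""
    positions, pos = [], 0
    for base in aligned_ref:
        if base == "-":
            positions.append(None)
        else:
            pos += 1
            positions.append(pos)
    return positions


def call_substitutions(aligned_ref, aligned_query):
    """List substitutions in reference coordinates, e.g. 'A24432T'."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    for ref, alt, pos in zip(aligned_ref.upper(), aligned_query.upper(),
                             reference_positions(aligned_ref)):
        # Skip insertions relative to the reference, gaps, ambiguous bases, and matches.
        if pos is None or alt not in "ACGT" or alt == ref:
            continue
        calls.append(f"{ref}{pos}{alt}")
    return calls


print(call_substitutions("AC-GTA", "ATTGTT"))  # ['C2T', 'A5T']
```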
Data on daily diagnosed COVID-19 cases and deaths in Jordan, covering the period from 3 March 2020 to 29 January 2021, were retrieved from Coronavirus Source Data [50].

SARS-CoV-2 Lineage Assignment
To describe the genetic lineages of the sequences in the SARS-CoV-2 Jordanian dataset, we utilized Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin) [51]. The Pangolin tool follows the 'Pango' nomenclature system for classifying SARS-CoV-2 genomic sequences [5,12].
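Lineage frequencies such as those reported in the abstract (e.g., 424 of 579 sequences, 73.2%, assigned to B.1.1.312) can be tabulated from Pangolin's per-sequence output. The sketch below assumes a CSV report with `taxon` and `lineage` columns, as in Pangolin's `lineage_report.csv` (check the column names of the version you run), and uses a toy in-memory report; the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def lineage_frequencies(report):
    """Count lineages in a Pangolin-style CSV and return (lineage, n, percent) tuples."""
    counts = Counter(row["lineage"] for row in csv.DictReader(report))
    total = sum(counts.values())
    return [(lineage, n, round(100 * n / total, 1)) for lineage, n in counts.most_common()]


toy_report = io.StringIO(
    "taxon,lineage\n"
    "seq1,B.1.1.312\n"
    "seq2,B.1.1.312\n"
    "seq3,B.1.36.10\n"
    "seq4,B.1.1.7\n"
)
for lineage, n, pct in lineage_frequencies(toy_report):
    print(f"{lineage}\t{n}\t{pct}%")
```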
Within-lineage genetic distances were measured using MEGA6, which was also used to detect the following amino acid substitutions/deletions in the spike glycoprotein sequence: D614G, E484K, N501Y, P681H, 69-70del, and K417N [52].
Genetic divergence from the reference sequence of SARS-CoV-2 and within-lineage genetic diversity were assessed using the Tamura-Nei model as implemented in MEGA6 [52,53].
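For reference, the Tamura-Nei (TN93) distance separates purine transitions (A↔G, proportion P1), pyrimidine transitions (C↔T, P2) and transversions (Q), weighting each by base composition. The sketch below implements the standard closed-form TN93 distance for one pair of aligned sequences, estimating base frequencies from the pair itself and skipping gaps and ambiguous sites. MEGA's options (for example, pairwise versus complete deletion, or frequencies estimated across the whole alignment) may differ, so this is an illustration rather than a reproduction of the MEGA6 computation.

```python
import math

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned nucleotide sequences.

    Assumes all four bases occur at the compared sites; very divergent pairs
    can make a log argument non-positive, which raises ValueError.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # drop gaps and ambiguous bases
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    # Base frequencies pooled over both sequences at the compared sites.
    counts = {base: 0 for base in "ACGT"}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pi = {base: c / (2 * n) for base, c in counts.items()}
    pi_r, pi_y = pi["A"] + pi["G"], pi["C"] + pi["T"]
    # Proportions of A<->G transitions (P1), C<->T transitions (P2) and transversions (Q).
    p1 = sum(1 for a, b in pairs if a != b and {a, b} == PURINES) / n
    p2 = sum(1 for a, b in pairs if a != b and {a, b} == PYRIMIDINES) / n
    q = sum(1 for a, b in pairs if (a in PURINES) != (b in PURINES)) / n
    ag, ct = pi["A"] * pi["G"], pi["C"] * pi["T"]
    term1 = -2 * ag / pi_r * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
    term2 = -2 * ct / pi_y * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
    coef3 = pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y
    term3 = -2 * coef3 * math.log(1 - q / (2 * pi_r * pi_y))
    return term1 + term2 + term3


s1 = "ACGT" * 10
s2 = "GCA" + s1[3:]  # one A->G and one G->A change, base composition unchanged
print(round(tn93_distance(s1, s2), 5))
```

With equal base frequencies and P1 = P2 the expression reduces to the Kimura two-parameter distance, which is a convenient sanity check.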

Assessment of the Spike Protein of the Major Lineages in Jordan
For the three major SARS-CoV-2 lineages circulating in Jordan (B.1.1.312; B.1.36.10; and B.1.1.7), we used the Protein Variation Effect Analyzer (PROVEAN) tool in order to assess the possible functional changes in the spike glycoprotein compared to that in the reference sequence (YP_009724390) [54].
Phylogenetic trees for the two sub-genomic Jordanian datasets were constructed with the ML approach in PhyML v3 [55]. Smart Model Selection (SMS) was used to select the most appropriate nucleotide substitution model based on the Akaike Information Criterion (AIC) [56]. The models used for ML tree construction were GTR + G for ORF1ab and HKY85 + I for the S region. [57]. The following settings were used for the Bayesian evolutionary analysis by sampling trees (BEAST) analysis: the HKY nucleotide substitution model with discrete gamma-distributed rate heterogeneity, an uncorrelated relaxed clock model with a uniform rate prior (initial value of 0.0065), and a Bayesian skyline tree density model [23]. A single run with a chain length of 200 million was performed, with trees and parameters sampled every 20,000 steps and the first 20% discarded as burn-in. Convergence was assessed in Tracer v1.6.0, with effective sample sizes (ESSs) > 200 for all parameters. The Bayesian skyline plot was constructed in Tracer, and the maximum clade credibility (MCC) tree was summarized with TreeAnnotator from the BEAST package [57]. Trees were visualized in FigTree [58].
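BEAST reports node ages on the time scale of the tip dates. Assuming tip dates were specified as decimal years (the text does not state the date format used), a tMRCA is the date of the most recent sample minus the root height, converted back to a calendar date. The sketch below uses one common decimal-year convention (fraction of the year elapsed since 1 January, rounded to the nearest day); other tools use slightly different conventions, and the root height in the usage line (0.12572 years) was back-calculated from the reported tMRCA for illustration, not taken from the study's output.

```python
from datetime import date, timedelta


def date_to_decimal(d):
    """Calendar date -> decimal year, counting days elapsed since 1 January."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def decimal_to_date(x):
    """Decimal year -> calendar date, rounded to the nearest day."""
    year = int(x)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=round((x - year) * days_in_year))


def tmrca_date(latest_tip, root_height_years):
    """tMRCA = date of the most recent sample minus the root height (in years)."""
    return decimal_to_date(date_to_decimal(latest_tip) - root_height_years)


latest_tip = date(2021, 1, 6)  # latest B.1.1.7 collection date given in the text
print(tmrca_date(latest_tip, 0.12572))  # 2020-11-21
```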

Conclusions
In the current study, molecular characterization of SARS-CoV-2 in Jordan was undertaken for the first time, to the best of our knowledge. A recent report by Edyth Parker et al. investigated the emergence of lineage B.1.1.7 in Jordan and revealed the current dominance of this lineage in the country [59]. Two Jordan lineages dominated infections in the country, followed by the recent introduction of lineage B.1.1.7. This UK variant of concern was present in the country several weeks before its official reporting and spread exponentially over the first few days of the new year 2021. The introduction of new lineages into the country appeared to be related to a founder effect; nevertheless, the biological significance of certain mutations should be further evaluated. An important distinction should be made between the epidemiologic and contact-tracing value of determining virus lineages and the identification and characterization of novel strains, subtypes, or types of viruses with distinct biological features. Thus, continuous surveillance of SARS-CoV-2 genetic variability is recommended to track the emergence of new genetic variants, followed by studies of their potential biological significance.
The intense media coverage of the UK variant of concern seems justified, considering its rapid spread and the number of amino acid changes detected in the spike glycoprotein of this lineage, which can have important effects on antigenicity and transmissibility. In turn, these changes can have implications for current vaccine formulations and for the resurgence of new waves of infection.

Data Availability Statement:
The authors of the sequence data can be contacted directly via the GISAID website: https://www.gisaid.org/. The datasets analysed during the current study (ML analysis files, XML files without sequences, and Tracer log files) are available from the corresponding author (M.S.) on reasonable request, subject to the GISAID terms of use.

Acknowledgments:
We sincerely thank the originating lab (Biolab Diagnostic Laboratories) and the submitting lab (Andersen lab at Scripps Research), who sequenced and shared the full-genome SARS-CoV-2 data in the GISAID database.

Conflicts of Interest:
The authors declare no conflict of interest.