Bayesian Molecular Dating Analyses Combined with Mutational Profiling Suggest an Independent Origin and Evolution of SARS-CoV-2 Omicron BA.1 and BA.2 Sub-Lineages

The ongoing evolution of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) has resulted in the recent emergence of a highly divergent variant of concern (VOC) defined as Omicron or B.1.1.529. This VOC is of particular concern because it has the potential to evade most therapeutic antibodies and has undergone a sustained genetic evolution, resulting in the emergence of five distinct sub-lineages. However, the evolutionary dynamics of the initially identified Omicron BA.1 and BA.2 sub-lineages remain poorly understood. Herein, we combined Bayesian phylogenetic analysis, mutational profiling, and selection pressure analysis to track the virus’s genetic changes that drive the early evolutionary dynamics of the Omicron. Based on the Omicron dataset chosen for the improved temporal signals and sampled globally between November 2021 and January 2022, the most recent common ancestor (tMRCA) and substitution rates for BA.1 were estimated to be that of 18 September 2021 (95% highest posterior density (HPD), 4 August–22 October 2021) and 1.435 × 10−3 (95% HPD  =  1.021 × 10−3 − 1.869 × 10−3) substitution/site/year, respectively, whereas 3 November 2021 (95% highest posterior density (HPD) 26 September–28 November 2021) and 1.074 × 10−3 (95% HPD  =  6.444 × 10−4 − 1.586 × 10−3) substitution/site/year were estimated for the BA.2 sub-lineage. The findings of this study suggest that the Omicron BA.1 and BA.2 sub-lineages originated independently and evolved over time. Furthermore, we identified multiple sites in the spike protein undergoing continued diversifying selection that may alter the neutralization profile of BA.1. This study sheds light on the ongoing global genomic surveillance and Bayesian molecular dating analyses to better understand the evolutionary dynamics of the virus and, as a result, mitigate the impact of emerging variants on public health.


Recombination Analysis
The complete coding genomic sequences of SARS-CoV-2, where the ORFs were concatenated in the following order: ORF1ab + S + ORF3a + E + M + ORF6 + ORF7a + ORF7b + ORF8 + N + ORF10, were screened for recombination signals using RDP v4.101, which implements nine distinct algorithms: RDP, GENECONV, Bootscan, MaxChi, Chimaera, Siscan, PhylPro, LARD, and 3seq [30]. Using the default settings, these sequences were examined for each identified recombination breakpoint. In order to reduce false positive recombination signals, we only considered recombination events detected by at least two of the nine algorithms.

Root-to-Tip Regression Analysis to Assess the Temporal Signals
In order to assess the temporal signals in the dataset, we employed root-to-tip regression analysis on the entire coding genomic sequences of SARS-CoV-2 Omicron variant. Briefly, root-to-tip regression analyses are commonly used to estimate the relationship between root-to-tip genetic divergence and sampling dates generated from Maximum Likelihood (ML) phylogeny. The slope of the regression line provides an estimate of the evolutionary rates (substitutions per site per year), whereas the intercept with the time axis estimates the age of the root.
We first screened for the Maximum Likelihood (ML) fits of 88 alternative nucleotide substitution models for the SARS-CoV-2 Omicron sub-lineages' (BA.1 and BA.2) datasets, and subsequently used ModelFinder to identify the best fitting nucleotide substitution model based on the Bayesian Information Criterion (BIC) (Table S2) [31]. The phylogenetic trees were then estimated using the ML inference and ultrafast bootstrap with 1000 replicates as implemented in IQ tree v2.1.2 [32]. Finally, these ML trees were used to investigate the temporal molecular evolutionary signals for each SARS-CoV-2 Omicron sub-lineage using TempEst v1.5.3 [33].
Of the complete dataset containing high-quality, unique Omicron BA.1 (n = 767) and BA.2 (n = 1002) complete genomic sequences, we removed outliers (often caused by sequencing errors or incorrect labeling) that did not fit to a root-to-tip regression and sequences showing evidence of recombination signals aside from that of the BA.1 (n = 381) and BA.2 (n = 579) dataset. As a result, after removing the outliers and recombinant sequences, we were left with 386 BA.1 and 423 BA.2 genomic sequences. Subsequently, to assess and improve the temporal signals, we generated three datasets by randomly picking up unique, representative Omicron sequences from each country: (i) one sequence from each sampling date (n = 26 for BA.1 and n = 48 for BA.2), (ii) two sequences from each sampling date (n = 75 for BA.1 and n = 86 for BA.2), and (iii) three sequences from each sampling date (n = 107 for BA.1 and n = 116 for BA.2).

Molecular Clock Phylogenetics
To infer the substitution/evolutionary rates and timescale of SARS-CoV-2 Omicron variant, Bayesian inference analyses were performed on the second dataset containing 161 dated, non-recombinant nucleotide sequences for the complete coding sequences of the BA.1 (n = 75) and BA.2 (n = 86) sub-lineages using a Markov Chain Monte Carlo (MCMC) framework [34], implemented in the Bayesian evolutionary analysis by sampling trees (BEAST) v2.6.7 [35]. As previously indicated, the best-fit nucleotide substitution model for each Omicron sub-lineage dataset was chosen.
To identify the best combination of tree priors and clock models, we tested and compared four coalescent tree priors: a constant population size [36], exponential population [37], Bayesian skyline [38], and extended Bayesian skyline [39] tree prior; and two clock models: a strict clock and an uncorrelated relaxed clock with log-normal distribution (UCLN) [40]. All Bayesian analyses were run for 100 million steps across two independent MCMC simulations with states and parameters sampled after every 10,000 steps. To find the best tree prior-clock model combination, Bayesian model testing, a statistical fit measure calculated by computing the log marginal likelihood, was performed and subsequently, each model combination was ranked accordingly. The log Bayes factor (BF) is the difference between two tree prior-clock models' log marginal likelihoods [41]. A log BF of at least 1.1 in favor of a model is described as 'substantial evidence', with 2.3 being 'strong' and 4.6 being 'decisive' [42]. We considered two marginal likelihood estimators: path sampling and stepping-stone sampling [43][44][45]. The parameters for the best-fit model combination for Omicron sub-lineages attained an effective sample size of more than 200, indicating adequate sampling. Using Tracer v1.7.1, we extracted time to the most recent common ancestor (tMRCA) and clock rate estimations from the best-fit model combination [46]. After deleting the first 10% of samples as burn-in, the maximum clade credibility (MCC) tree was extracted using TreeAnnotator v1.8.4 [47]. The MCC trees were visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/, accessed on 11 July 2022). All these analyses were performed on the Hokusai BigWaterfall supercomputer of the Institute of Physical and Chemical Research (RIKEN), Japan.

Selection Pressure Analysis
The complete coding sequences of SARS-CoV-2 Omicron VOC were screened for the presence of recombination breakpoints using the Genetic Algorithm for Recombination Detection (GARD) method, implemented in the HyPhy package's Datamonkey web server [48]. This screening was necessary because the inference based on the data with recombination breakpoints frequently yields more false positive sites [49,50]. In the cases of data with recombination breakpoints, the sequences were partitioned into the recombination blocks, and the selection pressure was estimated individually for each block. Then, using the Single Likelihood Ancestor Counting (SLAC) method, the site-specific selection pressure within the SARS-CoV-2 Omicron variant was estimated as the ratio of nonsynonymous (dN) to synonymous (dS) nucleotide substitutions per site (ω = dN/dS) [48].
We tested and compared the results of sites under a selection pressure estimated by five different methods with their default parameters, namely, the SLAC [48], Fixed Effect Likelihood (FEL), Fast Unbiased Bayesian AppRoximation (FUBAR) [51], adaptive Branch-Site Random Effects Likelihood (aBSREL) [52,53], and Branch-Site Unrestricted Statistical Test for Episodic Diversification (BUSTED) [54], all available at the Datamonkey web server. Furthermore, the sites in each ORF of SARS-CoV-2 Omicron variant experiencing positive/diversifying and negative/purifying selection pressure were taken into account only when anticipated by at least two of the aforementioned methods.

Mutational Scanning of SARS-CoV-2 Omicron Reveals Its Independent Emergence
The recently discovered Omicron VOC is distantly related to earlier documented VOCs (Alpha, Beta, Gamma, and Delta) and has a remarkably high number of mutations in the spike protein ( Figure 1A). As a result of Omicron's continued genetic evolution, two sub-lineages (BA.1 and BA.2) were initially identified; BA.1 emerged as the dominant sublineage in late 2021 and spread quickly throughout the world, but BA.2 quickly overtook BA.1 on a global scale ( Figure 1B). The identification of a highly divergent Omicron VOC advocated the possibility that Omicron may have evolved in a cellular micro-environment completely different from other known VOCs. In comparison to the wild-type SARS-CoV-2 (WT), Omicron's receptor-binding domain (RBD) contains 15 mutations, 10 of which are in the receptor-binding motif (RBM), which mediates binding to host cells via the angiotensin-converting enzyme 2 (ACE2) receptor and to the majority of NAbs ( Figure 2). Four mutations in Omicron's RBM (N440K, S477N, T478K and N501Y), and one in RBD (G339D) have been demonstrated to improve binding affinity to human ACE2 [55,56]. count only when anticipated by at least two of the aforementioned methods.

Mutational Scanning of SARS-CoV-2 Omicron Reveals Its Independent Emergence
The recently discovered Omicron VOC is distantly related to earlier documented VOCs (Alpha, Beta, Gamma, and Delta) and has a remarkably high number of mutations in the spike protein ( Figure 1A). As a result of Omicron's continued genetic evolution, two sub-lineages (BA.1 and BA.2) were initially identified; BA.1 emerged as the dominant sublineage in late 2021 and spread quickly throughout the world, but BA.2 quickly overtook BA.1 on a global scale ( Figure 1B). The identification of a highly divergent Omicron VOC advocated the possibility that Omicron may have evolved in a cellular micro-environment completely different from other known VOCs. In comparison to the wild-type SARS-CoV-2 (WT), Omicron's receptor-binding domain (RBD) contains 15 mutations, 10 of which are in the receptor-binding motif (RBM), which mediates binding to host cells via the angiotensin-converting enzyme 2 (ACE2) receptor and to the majority of NAbs ( Figure 2). Four mutations in Omicron's RBM (N440K, S477N, T478K and N501Y), and one in RBD (G339D) have been demonstrated to improve binding affinity to human ACE2 [55,56].  The variants of concern (VOCs) are denoted by different colour schemes. The figure was generated by the Nextstrain using the data from the GISAID, (B) The global distribution frequencies of VOCs are denoted by different colour schemes over the timespan since SARS-CoV-2's emergence. Furthermore, 8 RBD mutations (G339D, S373P, S375F, N440K, E484A, Q493R, Q498R, and Y505H) have been shown to be associated with escape from a wide range of different classes of NAbs [57][58][59][60][61][62][63]. These RBD mutations, together with N-terminal domain (NTD) mutations, particularly H69-V70del and G142-Y144del, favor Omicron's evasion from the majority of NAbs induced either by infections or vaccinations [22,64]. Notably, the mutation profile of Omicron BA.2 s NTD (T19I, L24-P26del, A27S, G142D, and V213D) differs significantly from that of BA.1, although their functional roles remain unknown ( Figure 2). Some mutations in the Omicron's spike protein may also contribute to modulating the virus host spectrum. For example, the acquisition of positively charged amino acids at 493 and 498 (Q493K and Q498H) has been shown to allow SARS-CoV-2 to infect mice via interacting with murine ACE2 [65,66]. These two mutations, Q493R and Q498R (both containing positively charged amino acids and found in Omicron's RBM), were acquired after 30 passages in the mouse lung (GISAID accession number EPI_ISL_1666328) [67]. Furthermore, H655Y was selected during replication in the mink model, implying a role in modifying the host range [68]. As a result, Q493R, Q498R, and H655Y carried by Omicron's spike protein reflects its adaptation in mice and mink. These findings, together with a recent study demonstrating Omicron's ability to mediate the enhanced entry into cells expressing multiple animal species' ACE2 [69], imply that, Omicron may have a broader host range and a greater proclivity to establish an animal reservoir for its family than previously known VOCs. Furthermore, 8 RBD mutations (G339D, S373P, S375F, N440K, E484A, Q493R, Q498R, and Y505H) have been shown to be associated with escape from a wide range of different classes of NAbs [57][58][59][60][61][62][63]. These RBD mutations, together with N-terminal domain (NTD) mutations, particularly H69-V70del and G142-Y144del, favor Omicron's evasion from the majority of NAbs induced either by infections or vaccinations [22,64]. Notably, the mutation profile of Omicron BA.2′s NTD (T19I, L24-P26del, A27S, G142D, and V213D) differs significantly from that of BA.1, although their functional roles remain unknown ( Figure 2). Some mutations in the Omicron's spike protein may also contribute to modulating the virus host spectrum. For example, the acquisition of positively charged amino acids at 493 and 498 (Q493K and Q498H) has been shown to allow SARS-CoV-2 to infect mice via interacting with murine ACE2 [65,66]. These two mutations, Q493R and Q498R (both containing positively charged amino acids and found in Omicron's RBM), were acquired after 30 passages in the mouse lung (GISAID accession number EPI_ISL_1666328) [67]. Furthermore, H655Y was selected during replication in the mink model, implying a role in modifying the host range [68]. As a result, Q493R, Q498R, and H655Y carried by Omicron's spike protein reflects its adaptation in mice and mink. These findings, together with a recent study demonstrating Omicron's ability to mediate the enhanced entry into cells expressing multiple animal species' ACE2 [69], imply that, Omicron may have a broader host range and a greater proclivity to establish an animal reservoir for its family than previously known VOCs.
Intriguingly, of 30 mutations in the Omicron's spike protein, it shares only 8 mutations with other known VOCs, including H69-V70del and P681H in Alpha, K417N in Beta, H655Y in Gamma, T19I and T478K in Delta, N501Y in Alpha, Beta, and Gamma, and D614G in Alpha, Beta, Gamma, and Delta. Omicron's unique mutation profile, combined with the amino acid substitution pattern's low similarity to other known VOCs, opens the door for designating any of the previously reported VOCs or other variants as its most recent common ancestor. Furthermore, we noted distinct sub-lineage-specific mutations in the spike protein of BA.1 (A67V, T95I, G142-Y144del, Y145D, N211del, L212I, Intriguingly, of 30 mutations in the Omicron's spike protein, it shares only 8 mutations with other known VOCs, including H69-V70del and P681H in Alpha, K417N in Beta, H655Y in Gamma, T19I and T478K in Delta, N501Y in Alpha, Beta, and Gamma, and D614G in Alpha, Beta, Gamma, and Delta. Omicron's unique mutation profile, combined with the amino acid substitution pattern's low similarity to other known VOCs, opens the door for designating any of the previously reported VOCs or other variants as its most recent common ancestor. Furthermore, we noted distinct sub-lineage-specific mutations in the spike protein of BA.1 (A67V, T95I, G142-Y144del, Y145D, N211del, L212I, 214EPEins, S371L, G446S, T547K, N856K, and L981F) and BA.2 (T19I, L24-P26del, A27S, G142D, V213G, S371F, T376A, D405N, and R408S), possibly indicating their separate/independent emergence and evolution. Furthermore, regardless of the vaccination status, these sub-lineage-specific mutations may be associated with a higher susceptibility of infection by the BA.2 sublineage in comparison to the BA.1 sub-lineage [70].
Out of a cluster of three substitutions (H655Y, N679K, and P681H) found near the S1/S2 furin cleavage site of Omicron's spike protein, two substitutions (P681H in Alpha, H655Y in Gamma, and P681R in Delta) have been demonstrated to facilitate cleavage of the spike protein and increase viral fusogenicity in the host cells [68,[71][72][73]. Additionally, the spike protein's fusion peptides of both Omicron's sub-lineages (BA.1 and BA.2) contained a D796Y substitution that was absent from the previously identified VOCs. The combination of these substitutions may enhance the fusogenicity and transmissibility of Omicron [68].
Other than the spike glycoprotein, Omicron has several mutations in other proteins. The BA.1 sub-lineage is distinct from the BA.2 sub-lineage in that the former had three substitutions (K38R, L1266I, and A1892T) and one deletion (S1265 del) in nsp3, but the latter did not. The rest of the mutations are common in both the sub-lineages, including T492I in nsp4, P132H in nsp5, a S106-G107-F108 deletion and I189V in nsp6, P323L in NS12, and I42V in nsp14. Nonetheless, little is known about their functional roles, aside from: a deletion in nsp6 (del105-107) for the evasion of innate immunity [74], P323L in nsp12 (RNA-dependent RNA polymerase) for reduced binding affinity to remdesivir [75], and two mutations (R203K and G204R) in nucleocapsid for enhanced infectivity [76].

Recombination Analysis
Recombination is a fundamental mechanism for generating diversity among positive-sense RNA viruses, including SARS-CoV-2, and is an important tool for understanding the evolutionary history of viruses. Furthermore, the Bayesian molecular dating analyses on the dataset having evidence of recombination can result in the biased phylogenetic and phylodynamic inferences [77,78]. As a result, the construction of the dataset free from the recombination signals is a crucial step in deriving the molecular clock phylogenetics inferences.
We individually screened the sequences of the Omicron BA.1 and BA.2 sub-lineages for recombination signals using the RDP v4.101, which implements nine distinct algorithms to locate evidence of recombination signals. In the SARS-CoV-2 Omicron BA.1 sub-lineage dataset, we found a total of four recombination signals that were recognized by at least two different algorithms (Table S3). Three of these, however, lacked a high level of evidence. We found moderate evidence for only one recombination signal, which was identified by Chimaera (p = 0.  [4]. Therefore, the continuous monitoring of recombinants, especially in wastewater samples, together with combined individual testing is an effective and efficient approach in forecasting of new SARS-CoV-2 variants, thereby assisting the scientific community in preparing for future public health challenges. Lastly, we identified and removed all the recombinant sequences projected to convey even a low level of evidence from the dataset before conducting the Bayesian molecular dating analyses.

Bayesian Molecular Dating Analyses of Omicron VOC
After screening 88 distinct nucleotide substitution models for the SARS-CoV-2 Omicron sub-lineages (BA.1 and BA.2) datasets, General Time Reversible (GTR + F + I) and Tamura-Nei (TN + F + I) models were found to be the best nucleotide substitution models for BA.1 and BA.2 sub-lineages, respectively (Table S2). The ML trees were generated using these best fitting nucleotide substitution models. Using TempEst v1.5.3, root-to-tip regression analysis was performed on the ML trees generated separately for the Omicron BA.1 and BA.2 sub-lineages to assess the temporal molecular evolutionary signals. The coefficient of the determinant, R 2 , which measures the clock-likeness of the sequences, and the correlation of coefficient (r) were low for the BA.1 (r = 0.371 and R 2 = 0.137) and BA.2 (r = 0.166 and R 2 = 0.027) datasets.
To improve the temporal signals in our dataset, we first identified and removed any outliers (often caused by sequencing errors or incorrect labeling) that did not fit to a root-totip regression. Subsequently, while maintaining the sequence heterogeneity, we generated three datasets by randomly picking up unique Omicron sequences from each country: (i) one sequence from each sampling date, (ii) two sequences from each sampling date, and (iii) three sequences from each sampling date. By doing this, temporal signals for both Omicron sub-lineages BA.1 (r = 0.454 and R 2 = 0.206) and BA.2 (r = 0.549 and R 2 = 0.302) improved significantly in the second set of data ( Figure 3, Table S4).
BA.1 and BA.2 sub-lineages to assess the temporal molecular evolutionary signals. The coefficient of the determinant, R 2 , which measures the clock-likeness of the sequences, and the correlation of coefficient (r) were low for the BA.1 (r = 0.371 and R 2 = 0.137) and BA.2 (r = 0.166 and R 2 = 0.027) datasets.
To improve the temporal signals in our dataset, we first identified and removed any outliers (often caused by sequencing errors or incorrect labeling) that did not fit to a rootto-tip regression. Subsequently, while maintaining the sequence heterogeneity, we generated three datasets by randomly picking up unique Omicron sequences from each country: (i) one sequence from each sampling date, (ii) two sequences from each sampling date, and (iii) three sequences from each sampling date. By doing this, temporal signals for both Omicron sub-lineages BA.1 (r = 0.454 and R 2 = 0.206) and BA.2 (r = 0.549 and R 2 = 0.302) improved significantly in the second set of data ( Figure 3, Table S4). Next, Bayesian molecular dating analyses were performed on the second dataset containing 161 dated, non-recombinant nucleotide sequences for the complete coding sequences of Omicron. We compared the prior, posterior, and likelihood distributions of each of the eight tree priors-clocks combinations in order to determine the best model-fit for the Omicron sub-lineages ( Figure 4A-F). The estimated tMRCA dates and evolutionary rates of BA.1 sub-lineage were relatively comparable across all the tree priors (e.g., Next, Bayesian molecular dating analyses were performed on the second dataset containing 161 dated, non-recombinant nucleotide sequences for the complete coding sequences of Omicron. We compared the prior, posterior, and likelihood distributions of each of the eight tree priors-clocks combinations in order to determine the best model-fit for the Omicron sub-lineages ( Figure 4A-F). The estimated tMRCA dates and evolutionary rates of BA.1 sub-lineage were relatively comparable across all the tree priors (e.g., constant population size, exponential population, Bayesian skyline, and extended Bayesian skyline), but varied greatly depending on the clock model used. Comparatively, the estimated tMRCA dates and evolutionary rates of BA.2 sub-lineage were very similar depending on the clock model used, but varied across different tree priors (Table S5).
constant population size, exponential population, Bayesian skyline, and extended Baye ian skyline), but varied greatly depending on the clock model used. Comparatively, t estimated tMRCA dates and evolutionary rates of BA.2 sub-lineage were very similar d pending on the clock model used, but varied across different tree priors (Table S5). The Constant Population coalescent tree prior with strict clock was the best fit to bo of the Omicron sub-lineages, according to Bayesian hypothesis testing using the log Bay factor ( Figure 4G-H). A time-scaled maximum-clade-credibility tree showed that all of t omicron sequences could be separated into two distinct sub-lineages, BA.1 (n = 75) an BA.2 (n = 86). The most recent common ancestors (tMRCA) for the BA.1 and BA.2 su November 2021), respectively ( Figure 5). The substitution rates of BA.1 and BA.2 were estimated to be 1.435 × 10 −3 (95% HPD = 1.021 × 10 −3 − 1.869 × 10 −3 ) substitution/site/year and 1.074 × 10 −3 (95% HPD = 6.444 × 10 −4 − 1.586 × 10 −3 ) substitution/site/year, respectively, which is in line with several previous studies that estimated the substitution rates of SARS-CoV-2 [9,[80][81][82]. In comparison to our dataset's tMRCA for the BA.1 sub-lineage, the estimated tMRCA of the South African BA.1 sub-lineage sequence was found to be early October 2021 (9 October 2021, 95% HPD 30 September-20 October 2021) [9]. This discrepancy is likely due to differences in the dataset's geographical heterogeneity. The tMRCA estimate for the BA.2 sub-lineage is in perfect accord with other studies, such as the tMRCA of BA.2 sub-lineage on 6 November 2021 (95% HPD = 9 October 2021 to 29 November 2021) [26], and mid-November for the Philippines BA.2 lineage sequences (18 November 2021; 95 % HPD = 6-28 November 2021) [83]. Since the BA.1 and BA.2 sub-lineages exhibit different tMRCAs, which is in line with the more recent emergence of BA.2 as compared to BA.1, they are expected to bear sub-lineage specific mutational profiles, and it is possible that these two sub-lineages of Omicron VOC might have originated and evolved independently.
The evolutionary history of Omicron is presently governed by three hypotheses. The first hypothesis is that Omicron might have spread silently in a geographical region with limited surveillance and sequencing facility [84,85]. Second, Omicron might have evolved in an immunocompromised patient, allowing long-term sustained evolution and adaptation of the virus [84,86]. Third, the Omicron might have accumulated mutations in a nonhuman host before jumping to humans [86,87]. Presently, the second hypothesis seems to be the more plausible explanation for the evolutionary origin of Omicron [87]. Nonetheless, in the event of the emergence of multiple new mutations in the Omicron's spike protein, which are quite distinct in the BA.1 and BA.2 sub-lineages, as well as their estimated In comparison to our dataset's tMRCA for the BA.1 sub-lineage, the estimated tM-RCA of the South African BA.1 sub-lineage sequence was found to be early October 2021 (9 October 2021, 95% HPD 30 September-20 October 2021) [9]. This discrepancy is likely due to differences in the dataset's geographical heterogeneity. The tMRCA estimate for the BA.2 sub-lineage is in perfect accord with other studies, such as the tMRCA of BA.2 sub-lineage on 6 November 2021 (95% HPD = 9 October 2021 to 29 November 2021) [26], and mid-November for the Philippines BA.2 lineage sequences (18 November 2021; 95 % HPD = 6-28 November 2021) [82]. Since the BA.1 and BA.2 sub-lineages exhibit different tMRCAs, which is in line with the more recent emergence of BA.2 as compared to BA.1, they are expected to bear sub-lineage specific mutational profiles, and it is possible that these two sub-lineages of Omicron VOC might have originated and evolved independently.
The evolutionary history of Omicron is presently governed by three hypotheses. The first hypothesis is that Omicron might have spread silently in a geographical region with limited surveillance and sequencing facility [83,84]. Second, Omicron might have evolved in an immunocompromised patient, allowing long-term sustained evolution and adaptation of the virus [83,85]. Third, the Omicron might have accumulated mutations in a non-human host before jumping to humans [85,86]. Presently, the second hypothesis seems to be the more plausible explanation for the evolutionary origin of Omicron [86]. Nonetheless, in the event of the emergence of multiple new mutations in the Omicron's spike protein, which are quite distinct in the BA.1 and BA.2 sub-lineages, as well as their estimated separate most recent common ancestor, it may be more plausible to conclude that a combination of RBD-and NTD-directed classes of antibody therapeutics at sub-optimal doses in COVID-19 patients or optimal doses in an immunocompromised patient or waned vaccine-induced immunity may have provided a conducive environment to accumulate multiple mutations in Omicron's spike protein. However, the role of intermediate hosts, particularly rodents, in Omicron transmission and evolution, followed by reverse-zoonosis, should not be neglected.

Selection Analysis
The rapid evolution of RNA viruses, particularly SARS-CoV-2, is governed by increasing and persistent selection pressure, leading to the creation of viruses with altered genetic and phenotypic characteristics [87,88]. These evolutionary changes are the re-sult of two opposing forces: positive selection (which generates various genetic changes beneficial for virus enhanced fitness to its host) and negative selection (which aims to maintain fitness without producing any advantageous changes). Therefore, positive selection may have accelerated the accumulation of multiple mutations in the Omicron BA.1 and BA.2 sub-lineages. To test for evidence of selection (both positive/diversifying and negative/purifying) in the Omicron's evolution, we employed a selection pressure estimation pipeline comprising of five methods: SLAC, FEL, FUBAR, aBSREL, and BUSTED at the Datamonkey web server of the HyPhy package. We filtered out sequences that do not have Omicron's spike protein signature mutational profile and then extracted individual protein coding regions (11 ORFs) from unique BA.1 (n = 689) and BA.2 (n = 948) sub-lineages to estimate the site-specific selection pressure. To improve the robustness of site-selection and reduce the false positive rates, only sites predicted by at least two of the aforementioned methods were considered.
Among the 11 ORFs, three ORFs (ORF3a, ORF7a, and ORF7b) experienced a strong selection (ratio of non-synonymous to synonymous substitutions is more than 1.5). There were seven positively selected sites in ORF1ab, 13 in the spike protein, and 1 in the Nucleocapsid protein of the Omicron BA.1 sub-lineage, and only one in the Membrane protein of BA.2 sub-lineage, indicating that multiple codon sites drive the genetic diversity in the BA.1 sub-lineage (Table 1). Furthermore, of the spike protein's 13 sites in the BA.1 sub-lineage, eight sites (339, 371, 375, 440, 446, 484, 493, and 505) that showed evidence of diversifying or positive selection, were associated with escape from different classes of NAbs [57][58][59][60][61][62][63], implying that these sites are still evolving in order to modify BA.1 s neutralization profile. Importantly, four sites in S (346, 452, 554, and 1260), and one each in N (215) and nsp6 (3646) that showed positive selection signals were not Omicron-defining mutations, and these sites could have been carried by its most recent common ancestor. However, three sites carrying the positive selection signals converge on mutations found in previously identified SARS-CoV-2 VOCs (S:452, N:215, and nsp6:3646 in Delta; S:346 in Mu VOI). These observations support the notion that Omicron might have originated and evolved independently.

Conclusions
The findings of this study, which investigated the early global evolutionary dynamics of the recently identified, highly divergent SARS-CoV-2 Omicron VOC sampled between November 2021 and January 2022, combined with mutational profiling suggest that the Omicron BA.1 and BA.2 sub-lineages originated independently and evolved over time. The currently available evidence supports the idea that the Omicron VOC may have originated due to the long-term persistence in an immunocompromised patient or COVID-19 patient with waned vaccine-induced immunity. However, the role of intermediate hosts, particularly rodents, in Omicron transmission and evolution, followed by reverse-zoonosis, should not be neglected. This study also advocates the continued diversifying selection that may alter the neutralization profile of BA.1. Numerous mutations, particularly in the spike protein's NTD and nsp3, whose functions are still unclear, warrant functional characterization in order to understand their contributions to differential viral transmissibility and diminished efficacy of therapeutics and vaccines. Finally, this study emphasizes the significance of ongoing global genomic surveillance, Bayesian molecular dating analyses, and mutational profiling in understanding the virus's evolutionary dynamics and, as a result, mitigating the impact of emerging variants on public health.

Supplementary Materials:
The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/v14122764/s1, Table S1: An EPI_SET Identifier for the metadata of highquality SARS-CoV-2 Omicron genomic sequences used in this study; Table S2: Identification of best fitting nucleotide substitution model based on the Bayesian Information Criterion (BIC) using the ModelFinder; Table S3: The recombination signals detected in the Omicron BA.1 sub-lineage; Table S4: Temporal signals in three datasets (of BA.1 and BA.2 sub-lineages) generated by randomly picking up unique Omicron sequences from each country: (i) one sequence from each sampling date (1SEQ), (ii) two sequences from each sampling date (2SEQ), and (iii) three sequences from each sampling date (3SEQ); Table S5