1. Introduction
The emergence and subsequent global dissemination of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) precipitated a scientific mobilization without precedent in human history. In the span of a few short years, the global scientific community constructed what can be best described as a “genomic panopticon”—a surveillance apparatus of staggering scale and resolution. With nearly 20 million viral genome sequences deposited in public repositories such as the Global Initiative on Sharing All Influenza Data (GISAID), the depth of data available for SARS-CoV-2 surpasses the cumulative sequencing efforts for all other human pathogens combined, including decades of work on Influenza A and HIV-1 [
1]. This flood of genetic information, combined with open-source tools, like Nextstrain with its unique, real-time window into the mechanics of viral evolution, enables researchers to track the accumulation of substitutions, insertions, and deletions at a granular level that had previously been confined to theoretical models or small-scale laboratory experiments [
2].
From the initial detection of the D614G substitution—the first “functional” mutation to spread globally—to the complex, branching evolution of the Omicron sub-lineages, humanity has witnessed the processes of natural selection in fast-forward [
1]. We observed the virus adapting to the human host with remarkable efficiency, optimizing its entry mechanisms, refining its replication kinetics, and ultimately navigating the complex landscape of host immunity [
3,
4]. Yet, despite this wealth of data, a critical epistemological gap remains. The vast accumulation of genomic sequences has not translated linearly into predictive capacity. For much of the pandemic, our ability to forecast the evolutionary trajectory of the virus—to predict which variant would emerge next and what its phenotypic properties would be—has lagged significantly behind the virus itself.
Although the Angiotensin-converting enzyme 2 (ACE2) remains the primary receptor for SARS-CoV-2 entry, the breadth of SARS-CoV-2 organ involvement and the infection-associated dysregulation of cell types with low ACE2 expression motivated extensive investigation into auxiliary receptors and attachment factors that may potentiate infection in specific tissues or inflammatory states [
5,
6,
7,
8]. Among these, the cluster of differentiation (CD147) (basigin, extracellular matrix metalloproteinase inducer (EMMPRIN)) has received substantial attention but remains controversial, with conflicting evidence regarding its direct spike-binding activity. Accordingly, we include CD147 analyses strictly as exploratory, hypothesis-generating comparisons to test whether receptor-specific physicochemical signatures differ from ACE2 across variant evolution [
5,
9,
10].
Structural and biophysical studies have demonstrated that early SARS-CoV-2 evolution favored mutations that enhanced the binding affinity of human ACE2 and optimized furin cleavage efficiency, often mediated by electrostatic changes at the receptor-binding domain (RBD) interface [
11]. These observations motivated a dominant interpretation in which viral fitness increased monotonically with receptor affinity. SARS-CoV-2 appeared to be ascending a smooth fitness peak, driven by simple Coulombic attraction and fusion kinetics. The Delta variant (B.1.617) seemed to represent the zenith of this trajectory, combining a high affinity with rapid transmissibility to achieve an unprecedented reproductive number [
11].
However, the emergence of the Omicron variant (B.1.1.529) in late 2021 shattered this linear paradigm. The Omicron variant did not merely climb higher on the existing fitness peak. It traversed a deep evolutionary valley to access a distinct region of the fitness landscape. Characterized by a constellation of over 30 mutations in the spike protein, many of which were predicted to be deleterious in isolation, the Omicron variant exhibited a “paradoxical” phenotype: Its intrinsic binding affinity for ACE2 was often lower or comparable to that of the Delta variant, yet its global dominance was absolute [
7,
12]. This “Omicron Paradox” signaled a fundamental evolutionary regime shift—a transition from a trajectory dominated by transmission optimization to one dominated by immune evasion, structural plasticity, and epistatic compensation [
13]. Our results quantify three mechanisms by which Omicron traversed this apparent fitness valley: (i) the epistatic rescue of ACE2-interface energetics through compensatory salt bridges (Q493R–E35 and Q498R–D38; ~130 kJ/mol enthalpic gain under the MM-PBSA approximation) that partially offsets the loss of the ancestral K417–D30 electrostatic anchor; (ii) distributed structural compensation, with ~43.8% of destabilizing mutations paired with a stabilizing partner within 10 Å; and (iii) the redirection of selective pressure from conserving ancestral epitope structure toward systematic epitope erosion, yielding an ~86.2% relative reduction in B-cell epitope conservation while maintaining epidemiological dominance.
The failure to predict this shift highlights the limitations of our current, siloed approach to viral surveillance. Structural biology, genomic epidemiology, and immunology have largely operated in distinct ways [
14,
15]. Structural biologists provide high-resolution snapshots of viral proteins in static states. Genomic epidemiologists track the flow of mutations through phylogenetic trees and immunologists measure neutralization titers in sera. While each discipline provides vital insights, they rarely integrate their data in a dynamic, mechanistic framework. Current predictive models, often relying on compartmental Susceptible-Exposed-Infectious-Recovered (SEIR) frameworks or linear phylogenetic extrapolation, fail to account for the complex epistatic networks and biophysical trade-offs that drive evolution in a highly immune population [
14,
15].
To address these predictive failures and prepare for the endemic future of SARS-CoV-2 (and future pandemics), we must transition toward systems virology. This emerging discipline integrates molecular dynamics (MD), deep mutational scanning (DMS), evolutionary game theory, and epidemiological modeling at multiple scales. It seeks to bridge the gap between the Ångström-scale fluctuations of a viral protein side chain and the continent-scale waves of infection [
16]. Here, we address this gap by performing an integrated, multi-scale analysis of SARS-CoV-2 RBD evolution across variants. We combined molecular protein–protein docking, MD simulations, binding free-energy calculations, residue-level interaction analysis, immunoinformatics-based epitope profiling, and publicly accessible epidemiological data to quantitatively dissect how structural stability, receptor binding, and immune evasion jointly shape viral fitness. Importantly, this framework allows a direct comparison between early pre-Omicron and late Omicron-era variants under a unified analytical pipeline.
By integrating structural dynamics, energetic analyses, and immune epitope remodeling within a single quantitative framework, this study provides a mechanistic interpretation of SARS-CoV-2 evolutionary shifts across pandemic phases. Rather than proposing a single dominant fitness determinant, our results support a regime-dependent model of viral adaptation shaped by changing selective pressures.
3. Results
This section systematically investigates the evolutionary mechanisms of SARS-CoV-2 by integrating epidemiological, structural, and immunoinformatic data to dissect the functional drivers of viral fitness. It proceeds in five connected blocks that deconstruct variant success. It first frames the core problem—the “Omicron paradox”—by contrasting the unprecedented mutational burden of Omicron lineages with their enhanced epidemiological dominance and transmission fitness. It then evaluates three orthogonal evolutionary strategies that could resolve this paradox: immune escape, receptor-binding maintenance, and mutation tolerance. Immune escape is assessed by quantifying the relationship between the coordinated disruption of B-cell and T-cell epitopes and breakthrough infections and antigenic drift. Receptor-binding maintenance is examined by showing how highly mutated variants can preserve host entry through electrostatic interface rewiring and potential dual-receptor utilization (ACE2 and CD147) despite extensive sequence change. Mutation tolerance is addressed by the characterization of non-additive epistatic interactions and spatial compensation networks that enable the receptor-binding domain to accommodate large mutational loads without structural collapse. Finally, these functional dimensions are integrated into a unified, multi-dimensional framework intended to clarify variant evolutionary strategies and support predictive modeling for future genomic surveillance. Across this workflow, we analyzed
n = 32 variants for sequence-level mutation profiling, physicochemical descriptors, and epitope disruption. Growth-advantage estimates were available for
n = 25 variants, and
n = 18 variants underwent 100-ns molecular dynamics simulations with MM-PBSA binding free energy calculations (
Table A5).
3.1. The Omicron Paradox: Unprecedented Mutation Burden with Epidemiological Dominance
To investigate the molecular mechanisms underlying the fitness and immune escape of SARS-CoV-2 variants, we analyzed 27 major variants spanning the complete evolutionary trajectory from the wild-type (WT) Wuhan-Hu-1 (December 2019) through contemporary Omicron lineages, including KP.3 (July 2024). Our dataset encompassed 12 pre-Omicron variants (Alpha, Beta, Gamma, Delta, Epsilon, Lambda, Kappa, Eta, Iota, Mu) and 20 Omicron lineages (BA.1 through KP.3), representing all the World Health Organization (WHO)-designated Variants of Concern (VOC) and major Variants of Interest (VOI) that achieved significant global circulation (
Figure 1).
A comparative analysis of RBD mutation loads revealed a dramatic evolutionary discontinuity coinciding with the emergence of Omicron variants in November 2021 (
Figure 2). Pre-Omicron variants accumulated mutations conservatively, with RBD mutation counts ranging from 1 (Alpha, Epsilon, Eta, Iota) to 3 (Beta, Gamma, Mu), and a median of 2 mutations per variant. While the Omicron lineage exhibited explosive mutation accumulation from its initial emergence, the ancestral Omicron B.1.1.529 carried 13 RBD mutations, BA.1 carried 16, and subsequent sub-lineages progressively escalated to 25–28 mutations (XBB variants) and ultimately 34–37 mutations in the most recent BA.2.86/JN.1/KP.3 lineages. This represents an 18.5-fold increase in the peak mutation load compared to pre-Omicron variants, with the most divergent Omicron variant (KP.3, 37 mutations) carrying about 12-fold more RBD mutations than the most mutated pre-Omicron variant (Beta/Gamma, 3 mutations). Critically, this mutation accumulation occurred over a compressed 32-month period (from November 2021 to July 2024), yielding an average rate of approximately 1 new RBD mutation per month in circulating Omicron lineages.
Despite the markedly higher mutational burden and sequence divergence of Omicron, Omicron lineages displayed greater epidemiological fitness than pre-Omicron variants across breakthrough, vaccination-era circulation, and prevalence dynamics (
Figure 3,
Figure 4 and
Figure 5). The breakthrough-infection proxy—computed from variant frequency trajectories in vaccinated populations—shifted upward in the Omicron era, consistent with enhanced immune evasion (
Figure 3). Pre-Omicron variants showed variable breakthrough signals, with the Alpha variant exhibiting a large transient peak (14,035 per 100,000 vaccinated; median 126) and the Delta variant remaining comparatively low (peak 75; median 9). In contrast, Omicron sub-lineages sustained higher breakthrough proxies (BA.1 peak 131; median 43; BA.2 peak 244; median 32), with BA.2 reaching about three-fold the Delta peak and indicating elevated breakthrough risk under widespread vaccine-induced immunity (
Figure 3). Prevalence dynamics mirrored global sequence-based shift trajectories, showing repeated Omicron waves with peak proportional prevalence exceeding 70% across multiple surveillance windows (
Figure 4). The integrated summary further aligns elevated breakthrough proxy levels with Omicron circulation during higher vaccination coverage, supporting the interpretation that immune escape became a dominant determinant of post-2021 variant success (
Figure 5).
Time-series show the breakthrough proxy per 100,000 vaccinated individuals (log scale) with shaded ribbons indicating the smoothed trend envelope (smoothing defined in Methods). Panels summarize: pre-Delta variants (Alpha/Beta/Gamma), Delta, Omicron BA.1, BA.2 (including BA.2.12.1), BA.4/BA.5, and a consolidated comparison across Omicron lineages.
These observations collectively define what we term the “Omicron Paradox”: Variants bearing 13–37 mutations in a 223-residue protein domain—including 34–37 mutations in the most recent lineages—not only retained viability but achieved epidemiological dominance, higher breakthrough-infection rates, and broader geographic distribution than variants with 10-fold fewer mutations. Classical protein evolution theory predicts that extensive mutation accumulation in a highly optimized protein–protein interface should result in structural destabilization and loss of function [
36,
37]. The WT SARS-CoV-2 RBD represents the product of natural selection for ACE2-binding affinity, optimized during the original zoonotic spillover event [
38,
39,
40]. Each mutation introduces potential perturbations to this finely tuned interaction network, yet the Omicron variant not only tolerated 37 simultaneous RBD substitutions (16.6% of domain positions) but also leveraged this mutational load to achieve enhanced epidemiological fitness. The paradox is further amplified by the temporal dynamics: Rather than declining fitness as mutations accumulated, Omicron lineages maintained or increased their epidemiological success metrics from BA.1 (16 mutations) through KP.3 (37 mutations), suggesting active selection for mutation accumulation rather than mutational meltdown. Understanding how Omicron variants resolved this paradox—maintaining ACE2 binding affinity sufficient for cellular entry while simultaneously disrupting epitope architecture to evade antibody recognition—requires an integrated analysis of three mechanistic layers: (1) receptor binding determinants and their preservation under mutational pressure, (2) epitope-disruption patterns and their correlation with immune evasion, and (3) compensatory mutation networks that stabilize the heavily mutated RBD structure.
3.2. Immune Escape Through Epitope Disruption
To investigate the mechanistic basis for the epidemiological success of the Omicron variant, we conducted a comprehensive analysis of epitope conservation across all 32 variants. We analyzed 567 B-cell epitopes and 97 T-cell epitopes induced from the IEDB, representing vaccine-induced immune responses in vaccinated individuals. For each variant, we calculated weighted conservation scores based on the proportion of epitopes preserved despite RBD mutations, with 100% representing complete conservation (all 567 epitopes intact) and 0% indicating complete disruption of all epitopes. This quantitative framework enabled a systematic assessment of antigenic drift over evolutionary time.
Epitope conservation analysis revealed dramatic differences between evolutionary eras (
Figure 6). Relative to immunodominance, pre-Omicron variants maintained relatively high conservation, ranging from 64.0% (Beta) to 87.5% (Alpha), with a median of 72.7%. In contrast, Omicron variants demonstrated the progressive erosion of epitope conservation, declining from 28.8% (BA.1) to 10.6% (KP.3). This represents an 86.2% relative reduction in conservation from WT to KP.3, indicating near-complete disruption of vaccine-induced antibody targets. Notably, Beta, Gamma, and Mu variants exhibited intermediate conservation (64.0–65.1%), foreshadowing the Omicron escape strategy through mutations at convergent hotspots.
Cross-referencing epitope conservation scores with epidemiological breakthrough-infection proxy values revealed a strong inverse association between antigenic conservation and immune evasion (
Figure 7). Across all 32 variants, lower epitope conservation was consistently associated with higher breakthrough proxy values (Spearman ρ = −0.8246, 95% CI [−0.92, −0.70],
p < 0.001). The same pattern held when variants were stratified by evolutionary era: Pre-Omicron lineages showed ρ = −0.71 (
p = 0.021), whereas Omicron lineages retained—and modestly strengthened—this association (ρ = −0.78,
p < 0.001), consistent with progressive epitope erosion becoming an increasingly important determinant of transmission success after BA.1. A linear model using epitope conservation alone explained a substantial fraction of the variance in the breakthrough proxy (R
2 = 0.703;
Figure 7). In supplementary benchmarking, epitope-based predictors performed at least as well as models built primarily from molecular dynamics–derived stability descriptors, and feature-importance analyses consistently prioritized conservation/disruption features as the strongest contributors. These results support epitope disruption as a major contributor to the immune-escape phenotype in this dataset. However, performance estimates should be interpreted cautiously, given the limited number of variants and the use of a proxy epidemiological outcome.
Building on this correlation, we modeled breakthrough proxy values directly using a nested LOO cross-validation framework across five tiered feature architectures (T1: epitope-only → T5: full feature union; n = 30 variants with authenticated CoV-Spectrum endpoints). The best-performing tier was T3 (epitope + physicochemical features; RandomForest; nested CV R2 = 0.394; 95% CI [−0.78, 0.82]; permutation p = 0.001; Dummy baseline R2 = −0.070), indicating that epitope disruption combined with physicochemical variant properties provides a statistically supported out-of-sample predictive signal for breakthrough-infection risk. Wide confidence intervals reflect the limited sample size and high stochasticity inherent to real-world breakthrough surveillance data.
To validate the epitope-breakthrough relationship through independent evidence streams, we integrated three orthogonal datasets (
Figure 8). Published vaccine effectiveness estimates against symptomatic infection correlated strongly with epitope conservation (ρ = +0.85,
p < 0.001), declining from 75–90% effectiveness against pre-Omicron variants to 25–45% against Omicron sub-lineages. Booster vaccine effectiveness exhibited an even stronger correlation (ρ = +0.91,
p < 0.001), indicating that, while booster doses partially compensate for waning immunity, they cannot fully overcome epitope-mediated escape in highly divergent variants. Neutralization fold-reduction data from pseudo-virus assays (Stanford CoV-RDB) corroborated these findings, with conservation negatively correlating with neutralization escape (ρ = −0.79,
p < 0.001). Pre-Omicron variants exhibited 1.5-to-7-fold reductions in neutralization titer, whereas Omicron variants demonstrated 20-to-80-fold reductions. Notably, 30 of 32 variants (93.8%) showed concordant classification across all three metrics, providing triangulated evidence that epitope disruption is the mechanistic driver of immune-escape phenotype.
Accordingly, to establish clinically actionable risk stratification, we implemented a binary classification of variants as HIGH-ESCAPE (conservation < 30%) or LOW-ESCAPE (conservation ≥ 30%) based on the ROC analysis (
Figure 9). This threshold achieved an area under the curve (AUC) of 0.98, indicating excellent discriminatory power. A Mann–Whitney U test confirmed complete distributional separation between the groups (U = 0,
p < 0.0001, rank-biserial effect size = −1.000), with no overlap in breakthrough-infection rates. To ensure robust generalization with the full
n = 32 cohort, a nested leave-one-out cross-validation classifier was employed with constrained hyperparameter tuning (inner adaptive K-fold grid search over regularization strength; max_depth ∈ {2, 3} for random forest). Under these rigorous out-of-sample conditions, the classifier retained strong discriminative performance: Accuracy = 0.905, F1-score = 0.900, Brier Score = 0.087. The top predictive features were the sum of mutated residues at the ACE2 interface (importance = 0.143), confirming that the ACE2 interface mutation burden is the primary structural determinant of immune-escape categorization. Importantly, data-driven immunodominance mapping reclassified Beta, Gamma, and Mu as HIGH-ESCAPE variants despite modest mutation counts (three mutations), recognizing that mutations at positions 417, 484, and 501 disproportionately disrupt neutralizing antibody epitopes. This enhanced classification framework captured documented immune escape in Beta and Gamma variants that binary thresholds missed.
A temporal analysis revealed accelerated antigenic drift in the Omicron era (
Figure 10). Pre-Omicron variants eroded epitope conservation at a rate of 0.18% per month (December 2019–October 2021), reflecting the gradual accumulation of immune-escape mutations under vaccine-mediated selection pressure. Following BA.1 emergence (November 2021), the erosion rate increased 8.5-fold to 1.53% per month, driven by convergent evolution across independent lineages. Position-level analysis confirmed concentrated disruption within the receptor-binding motif (residues 438–506) and NTD (N-terminal domain) supersite (
Figure 11). Among Omicron variants, 89% carry mutations at positions 440, 478, and 493, while 100% possess N501Y. The latter simultaneously enhances ACE2 binding affinity and abolishes class 3 neutralizing antibody recognition. This convergent pattern demonstrates strong positive selection for epitope-escape mutations that maintain or enhance receptor binding, resolving the apparent paradox of a high mutational burden with preserved transmission fitness.
To translate this descriptive signal into a rigorously validated predictive framework, we next trained a random forest regressor under a strict nested cross-validation protocol (outer leave-one-out; inner adaptive K-fold tuning) on the variants with confirmed CoV-Spectrum Growth Advantage measurements (leakage-corrected cohort; variants lacking authentic epidemiological data were excluded rather than imputed). Five feature architectures were evaluated in a tiered ablation study. The epitope-only baseline (Tier 1) achieved a nested-CV (95% CI ; permutation ), and adding interface hotspot features (Tier 2) produced comparable performance (; 95% CI ; ). The strongest generalisation was obtained when physicochemical properties were integrated with epitope disruption metrics (Tier 3: isoelectric point, GRAVY hydrophobicity, and net charge), yielding a nested-CV (95% CI ; permutation ) and outperforming the mean-predictor dummy baseline by units (Dummy ). In contrast, adding higher-dimensional epistasis/network descriptors (Tier 4) reduced out-of-sample performance (; 95% CI ; ), and the full feature union (Tier 5) further degraded generalisation (; 95% CI ; ). Notably, the decline from Tier 3 to Tiers 4–5 indicates that increasing feature dimensionality in this small- epidemiological regression task introduces feature bloat that harms generalisation, consistent with the expected small-cohort bias–variance trade-off.
3.3. Maintained Receptor Binding Despite Mutations
Since epitope sites substantially overlap with ACE2 contact residues, this challenges the fact that Omicron variants simultaneously evade antibodies and immune recognition while binding to the host receptor. Accordingly, molecular dynamics simulations of RBD-ACE2 complexes were leveraged across 18 variants, with complete MD data (8 pre-Omicron, 10 Omicron). Alongside sequence-based physicochemical properties, and epitope-disruption scores to predict binding affinity changes relative to WT (ΔΔG). This framework enabled systematic testing of whether molecular features predictive of binding differ between evolutionary eras.
Era-stratified correlation analyses suggested that RBD electrostatics are primarily associated with ACE2 binding energetics in the pre-Omicron era (
Figure 12). Pre-Omicron variants (
n = 8) exhibited a significant inverse association between net charge at pH 7 and ACE2 binding ΔΔG (Spearman ρ = −0.81,
p = 0.016), consistent with stronger predicted binding (more negative ΔΔG) in more positively charged RBDs (
Figure 12A). In contrast, Omicron variants (
n = 8) showed no significant charge–binding relationship (ρ = −0.27,
p = 0.518), and the difference between era-specific correlations did not reach statistical significance (Fisher z
p = 0.184), indicating a trend rather than definitive evidence for a discrete shift in binding determinants. Beyond electrostatics, neither RBD structural deviation (RMSD; pre-Omicron ρ = 0.52,
p = 0.183; Omicron ρ = 0.43,
p = 0.289; Fisher z
p = 0.845), nor epitope-disruption score (pre-Omicron ρ = 0.28,
p = 0.506; Omicron ρ = 0.40,
p = 0.326; Fisher z
p = 0.826), nor hydropathy (GRAVY; pre-Omicron ρ = −0.44,
p = 0.272; Omicron ρ = 0.26,
p = 0.531; Fisher z
p = 0.239) showed significant associations with ACE2 ΔΔG (
Figure 12B–D). Collectively, these results indicate that, within this representative set, MM-PBSA–derived ACE2 binding energetics are not systematically explained by global RMSD, antigenic disruption, or hydropathy, while electrostatic charge shows a pre-Omicron-specific directional signal (ρ = −0.81,
p = 0.016). These patterns are interpreted as mechanistic trends supporting interface reconfiguration between evolutionary eras, rather than as absolute binding affinity predictions; single-variant MM-PBSA values carry intrinsic uncertainty arising from force-field parameterization and the enthalpic approximation (entropic terms excluded).
Structural and interaction mapping indicate that the electrostatic basis of ACE2 binding is rewired in Omicron, rather than simply intensified (
Figure 13 and
Figure 14). When all variants are pooled, net positive charge at pH 7 shows only a modest inverse association with ACE2 binding free energy (Spearman ρ = −0.495,
p = 0.043), while structural stability (RBD RMSD) shows no significant relationship (ρ = −0.201,
p = 0.423) (
Figure 13A,B), consistent with charge being an incomplete, context-dependent predictor. Mechanistically, this loss of predictability is explained by a topological shift in the salt-bridge network at the interface: Pre-Omicron binding is dominated by the canonical K417–ACE2 D30 salt bridge (92% occupancy), which is abolished in Omicron (0%), whereas Omicron lineages instead stabilize binding through new, spatially re-positioned salt bridges—notably Q493R–E35 (68% occupancy) and Q498R–D38 (54% occupancy)—that are absent in pre-Omicron variants (
Figure 14). Together, these data support a model in which Omicron decouples “global” electrostatics (net charge) from “local” electrostatic architecture (which residues form persistent interfacial contacts), explaining why charge-based relationships that appear in mixed-era analyses do not translate into a single unified binding rule across evolutionary eras.
Experimental validation through DMS provided independent confirmation of the mechanistic reorganization and revealed the molecular basis of the evolutionary trade-off. Comparison of 14 variants with both computational (MMPBSA) and experimental (Bloom Lab DMS) binding measurements demonstrated a significant overall correlation (Pearson r = 0.682,
p = 0.007;
Figure 15), arising from distinct era-based clustering, rather than uniform agreement. Pre-Omicron variants showed enhanced experimental binding (mean Δ log
10 K
a = +0.58) paired with less favorable computed energies (MMPBSA ΔΔG = −96 kJ/mol), while Omicron variants exhibited the opposite pattern—a reduced experimental affinity (Δ log
10 K
a = −2.44), yet a more favorable computed energies (ΔΔG = −500 kJ/mol, Mann–Whitney
p = 0.008). Per-residue MMPBSA decomposition revealed that this directional paradox stems from compensatory mutations Q493R and Q498R, which transform from unfavorable or neutral contributors in pre-Omicron (+4 and +0.1 kJ/mol, respectively) to strongly favorable in Omicron (−56 and −72 kJ/mol), creating approximately 130 kJ/mol of additional favorable enthalpic interactions that MMPBSA captures. Simultaneously, K417N eliminates the dominant electrostatic anchor (contribution of K417 drops from −47 to −0.2 kJ/mol), introducing kinetic barriers and entropic penalties that endpoint free-energy methods cannot quantify. The strong correlation between the RBD mutation count and the MMPBSA-experimental discrepancy (r = 0.927,
p < 0.0001) demonstrates that accumulated mutations amplify the gap between computed enthalpy and measured affinity, validating our hypothesis that Omicron lineages prioritized immune escape over ACE2-binding optimization.
Cross-referencing ACE2-binding predictions with epitope conservation scores revealed an evolutionary trade-off between receptor binding and immune escape (
Figure 16). Variants with greater epitope disruption exhibited systematically reduced ACE2 binding affinity, as mutations in immunodominant regions (positions 417, 440, 484, 493, 501) frequently overlap with ACE2 contact residues. However, epitope disruption emerged as the dominant predictor of epidemiological success. The ACE2 Binding Affinity Predictor (H1-A) was evaluated on the
n = 17 variants with complete MD/MM-PBSA harmonized data under a two-loop nested LOO cross-validation framework (outer LOO, inner adaptive K-fold, StandardScaler refit within each fold). A Tier 1 physicochemical baseline (charge, GRAVY) achieved a modest but statistically significant nested CV R
2 = 0.177 (95% CI [−0.278, 0.465]; permutation
p = 0.016). Progressively adding MM-PBSA per-residue energetics, H3 epistasis metrics (ddG_epistasis, Compensation_Ratio, Electrostatic_Rescue_Score), and network compensation scores (Tier 4) elevated out-of-sample performance to a nested CV R
2 = 0.736 (95% CI [0.578, 0.830]; permutation
p = 0.002; Dummy baseline R
2 = −0.114), a +0.850 R
2 improvement above the mean-predictor baseline. Epitope disruption contributed the largest coefficient magnitude to this model (−4.2), followed by net charge (−0.8) and structural stability (−2.1). These findings collectively demonstrate that ACE2 binding affinity in SARS-CoV-2 is not a static structural property but an evolutionarily negotiated outcome governed by the interplay of electrostatic charge, epitope-mediated mutational constraint, and network-level epistatic compensation—features that, when integrated, provide a significant and reproducible out-of-sample predictor of receptor-binding trajectories across the full Omicron era. Improving the prediction of ACE2 binding trajectories beyond what sequence-level physicochemical descriptors alone can capture.
Additionally, to test whether highly mutated lineages can preserve entry potential via alternative receptor interfaces, we performed an exploratory analysis of RBD interactions with CD147 as a mechanistic counterpoint to ACE2-centric interpretations. This analysis was included to evaluate the specific hypothesis that (i) alternative receptor binding may follow distinct interface chemistry and (ii) mutations that erode one receptor interaction need not proportionally compromise another if the contact topology is only partially shared. Under strict nested cross-validation, the CD147 binding predictor showed moderate, statistically supported out-of-sample performance (R
2 = 0.548; 95% CI [0.257, 0.710]; permutation
p = 0.002;
n = 16;), indicating that CD147-associated binding energetics are partially predictable from sequence/biophysical descriptors and curated interface features. Mechanistically, the CD147 model favored a hydrophobicity-linked binding mode (consistent with a GRAVY-dominant coefficient pattern), rather than the electrostatic-dominant mode observed for pre-Omicron ACE2 binding, and the two receptor interfaces showed limited contact-residue overlap (
Figure 17), supporting the notion of receptor-specific mutational constraints. We emphasize that this CD147 component is exploratory and is intended to test the plausibility of alternative interface patterns, rather than to claim a dominant in vivo entry route; the relative contribution of CD147-mediated entry to transmission remains to be established experimentally.
Integration of binding affinity predictions with epidemiological metrics established a quantitative framework for variant risk stratification. Booster vaccine effectiveness models combining ACE2 and CD147 binding features were evaluated on the
n = 9 variants for which booster-effectiveness estimates were available from the CoV-Spectrum real-world surveillance database (cov-spectrum.org; filtered for variants with ≥3 independent regional observations of paired vaccinated and boosted case rates; the remaining variants either pre-dated widespread booster deployment or lacked sufficiently powered population-level booster VE studies for inclusion. This cohort achieved significant out-of-sample predictions (nested CV R
2 = 0.609; 95% CI [0.386, 0.750]; permutation
p = 0.008), indicating that dual-receptor-binding profiles capture generalizable mechanistic determinants of immune evasion beyond what ACE2-only models provide (
Figure 18). Notably, the inclusion of CD147-derived interface features—hydrophobicity index (GRAVY), degree of mutated residue overlap, and interface buried surface area—contributed measurable predictive gain over the ACE2-only baseline, suggesting that CD147-associated binding dynamics encode partially orthogonal variance in immune evasion outcomes. We acknowledge that the specific contribution of CD147-mediated entry to SARS-CoV-2 transmission remains experimentally unresolved and that the CD147 features are included here as hypothesis-driven structural descriptors, rather than confirmed causal determinants. Under this conservative interpretation, the improvement in booster VE prediction from the dual-receptor model should be understood as reflecting the informational value of CD147 interface metrics as secondary structural correlates of variant fitness, rather than direct evidence of a CD147 entry pathway. Applied to eight recent Omicron sub-lineages lacking MD data (XBB.1.5.70, BA.2.86, HK.3, EG.5, CH.1.1, JN.1, JN.1.11.1, KP.3), sequence-based models predicted moderate ACE2 binding affinities (−255 to −511 kJ/mol), consistent with maintained cellular entry capacity despite continued epitope erosion. These predictions align with the observed epidemiological dominance of JN.1 and KP.3 lineages in late 2023–2024, prospectively validating the utility of the model for variant surveillance. The framework demonstrates that receptor-binding mechanisms have fundamentally reorganized during Omicron evolution, requiring era-specific models, rather than universal predictors, and that immune-escape constraints now dominate viral fitness landscapes more strongly than receptor-binding optimization.
3.4. Molecular Mechanisms of Mutation Tolerance
Having established that Omicron variants maintained receptor binding through mechanistic reorganization despite extensive epitope disruption, we investigated the molecular mechanisms enabling the tolerance of an extreme mutational burden without structural collapse. Omicron lineages accumulate 15–20 RBD mutations compared to 1–3 in pre-Omicron variants (about 7-fold increase). Investigating compensatory mutation networks—wherein destabilizing mutations are rescued by proximal stabilizing mutations—and synergistic epistatic effects enable this unprecedented mutation tolerance. To test these hypotheses, we performed a comprehensive spatial proximity analysis (<10 Å distance threshold), energetic decomposition of mutation effects, and epistasis calculations comparing expected additive effects versus observed binding energies across 13 variants with complete MD data.
An epistatic analysis revealed that synergistic mutation interactions, rather than simple spatial compensation, constitute the dominant mechanism underlying Omicron mutation tolerance (
Figure 19). Omicron variants exhibited median negative epistasis of −954 kj/mol (
n = 5 variants, mean 16.8 RBD mutations), representing 2.0-fold stronger epistatic effects compared to pre-Omicron variants (median −477 kj/mol,
n = 8, mean 2.1 mutations; Mann–Whitney
p = 0.0093). This epistatic magnitude exceeds the biological significance threshold (−3 kj/mol) by 318-fold, indicating that Omicron mutations produce massive synergistic stabilization beyond their individual effects. All five Omicron variants analyzed (BA.1, BA.2, BA.2.12.1, BA.2.75, XBB) showed epistasis more negative than −480 kj/mol, while pre-Omicron variants ranged from −281 to −713 kj/mol with only two (Gamma, Eta) exceeding −500 kj/mol. This finding demonstrates that the ability of Omicron variants to accumulate mutations without a fitness loss derives fundamentally from non-additive, synergistic interactions among mutations, rather than from pairwise local compensation, a qualitatively different evolutionary strategy than that employed by pre-Omicron variants.
Spatial proximity analysis complemented the epistasis findings by revealing a multi-mechanism compensation strategy wherein variants employ both local and long-range stabilization (
Figure 20). Across 12 variants with complete structural data, destabilizing mutations showed a median spatial compensation of 43.8%—defined as having at least one stabilizing residue within 10 Å—exceeding the 40% biological threshold but not achieving statistical significance (
p = 0.170). This indicates that approximately 45% of destabilizing mutations ARE rescued through proximal stabilizing residues, while the remaining 55% rely on alternative mechanisms, including long-range electrostatic interactions and global epistatic networks. Variant-specific compensation fractions ranged dramatically from 73.7% (Gamma), 72.7% (Eta), and 68.8% (Beta), representing high local compensators, to 42.9% (Omicron BA.1), 25.0% (Omicron BA.2), and 12.5% (Kappa) as low local compensators. Notably, pre-Omicron variants showed higher average spatial compensation (48.4%) compared to Omicron variants (42.1%), suggesting that Omicron shifted evolutionary strategy from local spatial compensation toward distributed epistatic mechanisms—consistent with the observed 2.0-fold increase in epistatic strength.
Cross-correlation analysis between compensation metrics and ACE2 binding affinity demonstrated that compensatory mechanisms are permissive, rather than causative of binding improvements. Spatial compensation fraction showed negligible correlation with ACE2 binding affinity (Spearman ρ = −0.120, p = 0.711), indicating that variants with higher compensation do not necessarily bind more strongly to ACE2. Revealing a critical distinction, compensation mechanisms maintain structural integrity and enable the accumulation of mutations without catastrophic destabilization, but binding affinity improvements arise from interface-specific mutations (N501Y, Q493R, Q498R) that directly modulate receptor contacts, rather than from global structural rescue. Consequently, compensation acts as an evolutionary enabler—permitting variants to explore greater mutational space, including immunologically favorable epitope disruptions—while binding fitness derives from a separate set of interface-optimized mutations. This decoupling explains how Omicron lineages simultaneously achieve (i) extensive epitope erosion for immune escape, (ii) the maintenance of the structural fold through compensation, and (iii) functional receptor binding through targeted interface mutations, representing a multi-objective evolutionary optimization strategy.
The MD analysis of salt-bridge networks provided mechanistic insight into the interface reorganization, revealing how Omicron compensates for the K417N-mediated disruption (
Figure 21). Tracking 69 RBD-receptor interfacial salt bridges across 16 variants with MD trajectories confirmed that K417-D30 occupancy drops to zero in all Omicron variants due to the K417N mutation, eliminating the dominant electrostatic anchor of the WT binding mode. However, Omicron simultaneously establishes two compensatory salt bridges: Q493R-E35 (68% occupancy in Omicron, 0% in pre-Omicron) and Q498R-D38 (54% occupancy in Omicron, 0% in pre-Omicron), redistributing electrostatic stabilization across alternative interface positions. These compensatory bridges contribute approximately 130 kJ/mol of favorable enthalpy, as estimated by MM-PBSA (enthalpic approximation; entropic terms excluded), yet the net functional binding affinity is not fully restored—consistent with the enthalpy–entropy compensation principle whereby new enthalpic contacts are offset by the increased conformational entropy of the restructured Omicron interface. This interpretation is supported by three independent lines of evidence: (i) the K417-D30 salt-bridge loss is confirmed by direct MD occupancy tracking (0% Omicron vs. 92% pre-Omicron;
Figure 21); (ii) the Q493R-E35 and Q498R-D38 gains are confirmed by the same MD analysis (68% and 54% Omicron occupancy, 0% pre-Omicron); and (iii) the discrepancy between MM-PBSA and experimental binding direction is quantitatively correlated with the mutation count (r = 0.927,
p < 0.0001), providing a consistent mechanistic account of the enthalpic–entropic trade-off.
The identification of epistasis-dominant compensation mechanisms carries significant evolutionary implications for understanding the adaptive potential of SARS-CoV-2 and for informing variant surveillance strategies. The 318-fold epistatic strength above the biological threshold indicates that Omicron inhabits a mutational landscape characterized by rugged fitness peaks and extensive positive epistatic interactions, enabling the rapid exploration of sequence space without traversing low-fitness intermediates. This contrasts with pre-Omicron variants operating in smoother fitness landscapes with primarily local compensation, limiting mutational exploration. The shift from spatial (48%) to epistatic (58% via a 2.0-fold increase) compensation strategy represents a qualitative evolutionary transition, in which Omicron lineages can accumulate mutations faster than pre-Omicron lineages because synergistic effects buffer destabilizing mutations more effectively than spatial proximity alone. Critically, the weak compensation-binding correlation (ρ = −0.12) indicates that structural compensation and functional optimization are orthogonal, allowing independent evolutionary trajectories for immune escape (epitope disruption), structural maintenance (compensation networks), and receptor binding (interface mutations). This multi-dimensional fitness landscape suggests that Omicron sub-lineages will continue accumulating mutations at accelerated rates while maintaining transmission fitness, necessitating surveillance systems that monitor epistatic mutation patterns, rather than solely tracking individual high-impact mutations.
3.5. Multi-Dimensional Variant Characterization Through Cross-Hypothesis Integration
Having established the molecular mechanisms underlying Omicron binding maintenance, immune escape, and mutation tolerance, we integrated these three hypotheses into a unified multi-dimensional framework to characterize variant evolutionary strategies and assess pandemic risk potential. The three hypotheses address distinct but complementary aspects of viral fitness. The receptor-binding arm quantifies the strength of the ACE2 interaction using MMPBSA energetics and per-residue decomposition (H1). The immune-escape arm measures B-cell and T-cell epitope erosion correlated with breakthrough infections (H2). The mutation tolerance evaluates compensatory networks and epistatic synergy that enable the accumulation of mutations without structural collapse (H3). To systematically explore relationships between these dimensions, a unified master dataset was constructed, integrating 25 features across 18 variants (8 pre-Omicron and 8 Omicron variants, 2 references) spanning computational binding predictions, MD trajectories, epitope conservation scores, and compensation metrics. A cross-hypothesis Spearman correlation analysis revealed both strong mechanistic couplings and critical decoupling, supporting the hypothesis that variant evolution proceeds through the simultaneous optimization of multiple independent fitness objectives.
A correlation analysis across the integrated H1–H3 features revealed structured coupling between binding energetics, compensatory architecture, and mutational load (
Figure 22). MM-PBSA–estimated binding affinity (enthalpic component) tightly tracked epistatic ΔΔG (ρ = 1.00) and strongly aligned with the electrostatic component (ρ = 0.68), consistent with both quantities reflecting electrostatic interface reorganization. These correlations are interpreted as evidence of internal structural consistency within the MM-PBSA framework, rather than as external validation. Independent experimental validation is provided by the DMS benchmark (r = 0.682,
p = 0.007;
Figure 15).
While epistasis itself was dominated by electrostatics (mmpbsa_elec vs. epistatic ΔΔG, ρ = 0.97), consistent with salt-bridge–mediated cooperativity, network-level stabilization formed an orthogonal axis: Network density correlated with stable salt bridges (ρ = 0.74) but inversely with destabilizing mutations (ρ = −0.77) and RBD flexibility (RMSF; ρ = −0.60), whereas RMSF closely tracked destabilizing burden (ρ = 0.94). Mutational complexity scaled with both net charge change (ρ = 0.66) and epistasis per mutation (ρ = 0.90), indicating that increasingly mutated RBDs exhibit disproportionately stronger non-additive effects. In contrast, compensation fraction showed only moderate association with binding and epistasis (ρ = 0.43), supporting a permissive role for compensation relative to electrostatic and mutational determinants.
The multidimensional characterization framework carries critical implications for genomic surveillance and variant-prediction strategies. Current surveillance systems predominantly focus on individual high-impact mutations (e.g., N501Y, E484K, K417N) or spike protein mutation counts as indicators of concern. However, our integration analysis demonstrates that pandemic risk emerges from synergistic interactions across three orthogonal fitness dimensions, rather than from single mutations or linear mutation accumulation. The effective surveillance must, therefore, monitor: (1) epistatic strengthening patterns—tracking whether new mutations produce additive or synergistic effects on binding energy, with epistasis becoming more negative indicating rugged fitness landscapes enabling rapid exploration; (2) interface reorganization signatures—detecting formation of novel salt bridges (e.g., Q493R-E35, Q498R-D38) or aromatic stacking interactions suggesting compensatory binding mechanisms; (3) epitope erosion acceleration—quantifying rates of B-cell and T-cell epitope disruption rather than static conservation values; and (4) compensation architecture shifts—identifying transitions from spatial to epistatic or electrostatic compensation strategies through network topology analysis. The strong epistasis-per-mutation versus mutation-count correlation (ρ = 0.90) provides a predictive indicator: Variants accumulating mutations at rates exceeding historical baselines (>5 RBD mutations per 6-month period) likely employ epistatic mechanisms and warrant priority characterization. Conversely, variants with high mutation counts but weak epistasis may suffer fitness costs and pose lower risk. Operationally, surveillance pipelines should integrate computational binding predictions (MMPBSA), epitope mapping (IEDB), and epistasis calculations from structural models to provide multi-dimensional risk scores updated continuously updated as new sequences emerge. This proactive surveillance paradigm shifts from the reactive characterization of established variants to the predictive identification of high-risk evolutionary trajectories before they achieve widespread circulation.
4. Discussion
The emergence of SARS-CoV-2 Omicron in late 2021, carrying over 30 spike protein mutations and 15 RBD substitutions, posed a fundamental paradox: How could a variant accumulate such an extreme mutational burden while maintaining—or even enhancing— fitness for human transmission? Our multidimensional analysis of 18 variants spanning the pre-Omicron and Omicron lineages resolves this paradox by demonstrating that Omicron evolved through simultaneous optimization along three orthogonal evolutionary trajectories: immune escape via systematic epitope erosion (ρ = −0.82,
p < 0.001), receptor binding through interface reorganization creating ~130 kJ/mol compensatory enthalpy, and mutation tolerance through epistasis-dominant compensation (2.0-fold stronger,
p = 0.0093). This multi-objective optimization represents a qualitative evolutionary shift from pre-Omicron conservative accumulation to Omicron radical reorganization, with profound implications for understanding viral evolution, pandemic surveillance, and vaccine design [
13,
41].
The deep evolutionary valley traversal documented here is mechanistically explained by three quantifiable molecular strategies acting in concert. First, in the MD/MM-PBSA subset (n = 18), despite the loss of the dominant K417–D30 electrostatic anchor (K417N; 0% interaction occupancy across Omicron trajectories), ACE2 binding is partially rescued through epistatic gain-of-function substitutions: Q493R–E35 (68% occupancy) and Q498R–D38 (54% occupancy), which together contribute about 130 kJ/mol of favorable enthalpy and keep binding energetics within a WT-like range (−255 to −511 kJ/mol across variants). Consistent with this buffering mechanism, Omicron lineages exhibit 2.0-fold stronger epistasis than pre-Omicron variants (median −954 versus −477 kJ/mol, p = 0.0093). Second, structural integrity is preserved through distributed spatial compensation: 43.8% of destabilizing residues have at least one stabilizing partner within 10 Å (Hypothesis 3), with the network shifting from proximity-dominant (48.4% pre-Omicron) to epistasis-dominant in Omicron, enabling the tolerance of 13–37 simultaneous RBD substitutions without a catastrophic fold loss. Third, the dominant fitness pressure shifted toward immune evasion: Omicron sub-lineages show an 86.2% relative reduction in B-cell epitope conservation (WT/pre-Omicron median 74.3% → Omicron 35.7% → KP.3 10.6%), and epitope disruption strongly and inversely predicts breakthrough infection rates (Spearman , p < 0.001). Together, these results indicate that the Omicron fitness valley was traversed not by a single adaptive change but by coordinated multi-objective optimization across immune escape, receptor engagement, and mutational tolerance.
This result aligns with those of Starr et al. [
42], who systematically mapped epistatic interactions among the 15 RBD mutations in BA.1 relative to Wuhan-Hu-1, measuring the ACE2 affinity for all 32,768 possible genotype combinations. They demonstrated that immune-escape mutations individually reduce ACE2 affinity but are compensated by epistatic interactions with affinity-enhancing mutations Q498R and N501Y—precisely the mechanism our per-residue MMPBSA decomposition identified. Mutations tend to be more deleterious when few other mutations are present but become neutral or beneficial in highly mutated backgrounds [
43], explaining why BA.1 RBD has a stronger ACE2 affinity despite containing mutations that individually reduce binding. Our quantification extends this observation across 18 variants, demonstrating that epistatic strength scales superlinearly with the mutation count (ρ = 0.90) and establishing a general principle of synergistic mutation tolerance.
The shift from spatial proximity-based compensation (48.4% pre-Omicron) to epistasis-dominant compensation (42.1% spatial, with 2.0-fold stronger epistasis in Omicron) represents a fundamental change in the molecular architecture that enables mutation tolerance. Pre-Omicron variants relied primarily on local structural buffering, where destabilizing mutations are physically adjacent to stabilizing ones. Omicron variants instead employ distributed epistatic networks where stabilization arises from non-local electrostatic coupling (ρ = 0.97 between electrostatic energy and epistasis). This architectural shift aligns with epistasis-driven evolution models [
44], which demonstrate that RNA secondary structure constraints impose epistatic patterns shaping viral evolutionary trajectories. The electrostatic dominance suggests Omicron evolution has been constrained by the requirement to maintain favorable charge complementarity at the RBD-ACE2 interface while simultaneously disrupting antibody epitopes—a dual optimization achieved through careful selection of charged substitutions.
The complete functional reversal of Q493R and Q498R—from unfavorable (+4 kJ/mol) or neutral (+0.1 kJ/mol) in pre-Omicron to strongly favorable (−56 and −72 kJ/mol) in Omicron—represents a remarkable example of context-dependent mutation effects. DMS studies have demonstrated that N501Y, present in multiple VOCs, causes epistatic shifts in the effects of mutations at other sites, and that the availability of specific affinity-enhancing mutations changes as the RBD accumulates substitutions [
33,
45]. Our per-residue decomposition provides an atomistic resolution to these epistatic patterns, showing that Q493R forms a novel salt bridge with ACE2 E35 (68% occupancy in Omicron, 0% in pre-Omicron) and Q498R establishes electrostatic coupling with D38 (54% occupancy), creating an entirely new binding interface topology that compensates for the loss of the ancestral K417-D30 interaction eliminated by K417N.
The directional discrepancy between MM-PBSA–estimated binding enthalpies and experimental ACE2 affinity measurements (Pearson r = 0.682,
p = 0.007;
n = 14;
Figure 15) is mechanistically interpretable and consistent with the known enthalpic approximation of the single-trajectory MM-PBSA protocol, which excludes entropic contributions (−TΔS). In highly mutated Omicron variants, interface restructuring generates new favorable enthalpic contacts—most notably the compensatory salt bridges Q493R-E35 (~68% MD occupancy in Omicron, 0% in pre-Omicron) and Q498R-D38 (~54% versus 0%)—that are quantified by MM-PBSA (~130 kJ/mol combined enthalpic gain). However, the concurrent loss of the dominant K417-D30 electrostatic anchor (0% Omicron occupancy versus ~100% pre-Omicron, confirmed directly by MD trajectory analysis) and the increase in interfacial conformational flexibility impose entropic penalties that reduce net binding free energy below what enthalpic calculations alone suggest. Three independent lines of evidence support this interface-reconfiguration model: (i) K417-D30 salt-bridge occupancy drops to zero in all Omicron variants in MD trajectories; (ii) Q493R-E35 and Q498R-D38 compensatory bridges are confirmed by the same MD analysis; and (iii) the MMPBSA-experimental discrepancy scales linearly with mutation burden (r = 0.927,
p < 0.0001), consistent with progressively larger entropic costs as mutational complexity increases—a pattern predicted by enthalpy–entropy compensation theory [
46,
47] and consistent with reports that Omicron spike conformational dynamics differ substantially from those of ancestral strains [
48,
49]. Taken together, MM-PBSA is used throughout this study to characterize enthalpic interface reorganization and identify dominant energy components, within the benchmarked accuracy established by the experimental comparison.
The strong inverse correlation between B-cell epitope conservation and breakthrough-infection rates (ρ = −0.82,
p < 0.001) provides an epidemiological validation of our computational epitope mapping. Cao et al. demonstrated that over 85% of tested neutralizing antibodies escaped by Omicron through mutations at key epitope positions K417N, G446S, E484A, and Q493R—four of the 15 RBD mutations we analyzed [
50]. The 51.6% reduction in B-cell epitope conservation from pre-Omicron (74.3%) to Omicron (35.9%) quantifies this escape at the population level, integrating effects across 567 characterized B-cell epitopes. Our analysis demonstrates differential epitope targeting: B-cell epitopes showed extensive erosion, while T-cell epitopes remained largely conserved (91.5% to 87.2%; only a 4.7% relative reduction), consistent with differences in selection pressure on humoral versus cellular immunity [
51,
52,
53]. The preservation of T-cell epitope conservation despite extensive B-cell epitope erosion has profound implications for next-generation vaccine design. CD8
+ T cells target a broad range of epitopes from both structural and nonstructural proteins, which are often conserved across strains [
54,
55,
56,
57]. The breadth of CD8
+ T-cell responses, together with high conservation, enables cross-recognition of viral variants. Our findings support vaccine strategies incorporating conserved T-cell epitopes, particularly from non-spike proteins, to provide cross-reactive immunity against emerging variants.
A key contribution of this work is the development of predictive computational models that enable prospective variant risk assessment. To ensure robust generalization with limited sample sizes, all models were evaluated under a two-loop nested cross-validation framework (outer leave-one-out, inner adaptive K-fold; constrained hyperparameter grids; 100 permutation tests). The Breakthrough-Infection Proxy Predictor was trained on the n = 30 variants with confirmed CoV-Spectrum Growth Advantage data. The optimal architecture—Tier 3, combining epitope-disruption scores with physicochemical properties (isoelectric point, GRAVY hydrophobicity index, net charge)—achieved nested CV R2 = 0.394 (95% CI [−0.780, 0.824], permutation p = 0.001, and dummy baseline R2 = −0.070)—demonstrating that epitope-based structural features capture genuine predictive signal for immune-escape phenotype far exceeding the permutation null distribution. For vaccine effectiveness prediction, integrated models combining epitope scores, net charge, and structural stability achieved nested CV R2 = 0.609 (95% CI [0.386, 0.750]; permutation p = 0.008; n = 9), with epitope disruption contributing the largest coefficient magnitude (−4.2) compared to net charge (−0.8) and structural stability (−2.1). This quantitative framework, validated under strict out-of-sample conditions, enables the identification of variants likely to compromise vaccine protection before large-scale epidemiological data become available.
Our dual-receptor-binding analysis revealed that CD147 binding operates through distinct mechanisms (hydrophobicity-dominant, interface GRAVY = −406.59) compared to ACE2 (electrostatic-dominant). Under nested cross-validation, the CD147 binding predictor achieved statistically supported out-of-sample performance (R2 = 0.548; 95% CI [0.257, 0.710]; permutation p = 0.002; n = 15). Importantly, sequence-only models (R2 = 0.130; permutation p = 0.040) do not require computationally expensive MD simulations, enabling the rapid screening of emerging variants within hours of sequence availability. Dual-receptor models combining ACE2 and CD147 features achieved significant booster vaccine effectiveness predictions (R2 = 0.609; 95% CI [0.386, 0.750]; permutation p = 0.008; n = 9), validating the utility of multi-receptor frameworks for variant characterization. Applied to eight recent Omicron sub-lineages lacking MD data (XBB.1.5.70, BA.2.86, HK.3, EG.5, CH.1.1, JN.1, JN.1.11.1, KP.3), our sequence-based models predicted moderate ACE2 binding affinities (−255 to −511 kJ/mol), consistent with a maintained cellular entry capacity despite continued epitope erosion. These predictions aligned with the observed epidemiological dominance of JN.1 and KP.3 lineages in late 2023–2024, prospectively validating the utility of this approach for variant surveillance before phenotypic data become available.
Our demonstration that Omicron employs qualitatively distinct evolutionary strategies—epistasis-dominant compensation, interface reorganization, and accelerated epitope erosion—supports the hypothesis that Omicron arose through prolonged evolution in an immunocompromised host, rather than stepwise accumulation in the general population [
58,
59,
60]. The “chronic infection hypothesis” proposes that extended within-host replication in immunocompromised individuals allows sufficient time for the virus to accumulate and co-select mutations that would be individually deleterious but collectively beneficial. Corey et al. reported that 25% of immunocompromised individuals experience prolonged infections (≥21 days), with some lasting hundreds of days [
60]. The co-residence of multiple mutations within a single host enables epistatic co-selection—mutations that reduce ACE2 affinity but enhance antibody escape can persist if accompanied by compensatory mutations, precisely the pattern we observe in Omicron.
The alternative hypothesis—that Omicron evolved in an animal reservoir—is supported by observations that Q493R, Q498R, and N501Y appear in mouse-adapted strains [
61,
62]. Regardless of origin, the variant employs the multi-objective optimization strategy we characterize. The closest sequences of Omicron date to mid-2020, representing over a year of unobserved evolution—consistent with either chronic infection or zoonotic circulation. The key insight is that the success of Omicron required the simultaneous optimization of three independent fitness dimensions, unlikely through sequential single-mutation accumulation but feasible through prolonged co-evolution [
63,
64,
65].
Current genomic surveillance systems focus predominantly on individual mutations or counts of spike protein mutations as risk indicators [
20]. While effective for characterizing established variants, these reactive approaches cannot anticipate which emerging lineages pose the greatest risk. Our integrated framework suggests that pandemic risk emerges from synergistic interactions across three orthogonal fitness dimensions, rather than from single mutations or linear mutation accumulation. The strong correlation between epistasis-per-mutation and mutation count (ρ = 0.90) provides a predictive indicator: Variants accumulating mutations at accelerated rates likely employ epistatic compensation and warrant priority assessment before achieving widespread circulation.
Epistatic models using Direct Coupling Analysis outperform non-epistatic approaches for predicting mutable sites [
66]. Our multi-dimensional framework extends these approaches by integrating binding energetics (MMPBSA), epitope conservation (IEDB mapping), and compensation architecture (spatial and epistatic networks) into a unified risk assessment. Operationally, surveillance pipelines could flag emerging lineages showing: (1) mutation rates exceeding 5 RBD substitutions per 6-month period; (2) novel salt-bridge formation at the RBD-ACE2 interface; (3) B-cell epitope conservation below 50%; and (4) epistatic strengthening indicated by non-additive binding energy contributions. Such multi-dimensional monitoring would enable proactive, rather than reactive, variant assessment.
Immune imprinting—where prior exposures bias subsequent immune responses toward the original antigen—has complicated vaccine updates for emerging variants [
67,
68,
69]. Boosting with heterologous spike proteins does not induce broader neutralizing antibodies; instead, responses predominantly target the parental Wuhan-Hu-1 strain. Bivalent vaccines targeting BA.4/BA.5 elicit markedly lower neutralizing antibodies against BQ.1.1 and XBB than against ancestral strains. This limitation arises partly because current vaccines focus on epitopes that Omicron has systematically eroded. The 51.6% reduction in B-cell epitope conservation means vaccine-induced antibodies targeting ancestral epitopes encounter substantially altered binding surfaces on circulating variants, reducing neutralization potency.
Several vaccine strategies emerge from our analysis. First, targeting conserved T-cell epitopes, particularly those from non-spike proteins, offers variant-resistant protection—our observation that T-cell conservation remained at 87.2% indicates that functional constraint limits escape. Multi-epitope vaccines incorporating conserved B-cell, CD4
+, and CD8
+ T-cell epitopes have shown protection against multiple VOCs [
70,
71,
72]. Second, broadly neutralizing antibodies targeting conserved S2 regions—including stem helix and fusion peptide—may provide pan-coronavirus protection [
73,
74]. Third, our demonstration of interface plasticity suggests that vaccines that induce antibodies targeting conformational epitopes may rapidly escape through interface rewiring. Structure-guided antigen engineering to stabilize specific spike conformations could mitigate this evasion route.
Several limitations warrant consideration. First, MMPBSA calculations capture enthalpic contributions but not entropic effects, as evidenced by the paradoxical direction of computational and experimental binding measurements. The statistically significant but moderate correlation with experimental DMS data (r = 0.682, p = 0.007; n = 14) confirms that MM-PBSA captures directional binding trends and era-level differences but is not suited to absolute affinity prediction in highly mutated variants. All MM-PBSA–derived conclusions in this study are intentionally scoped to relative comparisons and mechanistic energy decomposition (electrostatic versus van der Waals components), consistent with the benchmarked performance level.
Second, our analysis included 18 variants, providing adequate statistical power for era comparisons but limiting generalization to future variants with potentially novel mechanisms. Third, molecular dynamics simulations (100 ns) capture intermediate timescale dynamics but may miss slower conformational transitions relevant to binding entropy. Fourth, epitope conservation calculations assume the equal immunological relevance of all characterized epitopes have equal immunological relevance. Immunodominance hierarchies may weight certain epitopes more heavily. Finally, the cross-sectional design cannot establish causation between evolutionary mechanisms and transmission fitness.