Abstract
Merbecovirus, a subgenus of coronaviruses that includes the highly pathogenic Middle East respiratory syndrome coronavirus (MERS-CoV), poses a significant zoonotic threat. To better understand its host adaptation and potential for cross-species transmission, we conducted a comprehensive analysis of codon usage patterns in 1967 Merbecovirus sequences. Phylogenetic analysis confirmed the division of Merbecoviruses into seven distinct clusters. Codon usage bias was found to be low and predominantly shaped by natural selection, with a consistent A/U-rich composition across the genome. Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses indicate that Merbecovirus exhibits potential host adaptation to Sus scrofa (pigs), Equus caballus (horses), and Oryctolagus cuniculus (rabbits), suggesting a risk of cross-species transmission. Strikingly, this genomic-level adaptation prediction is supported by emerging functional evidence: recent studies have demonstrated that key Merbecovirus lineages utilize diverse cell entry receptors (DPP4 or ACE2), a fundamental determinant of host tropism. For instance, the ability of the HKU5 lineage to utilize ACE2 receptors from mustelids such as minks (Neogale vison) provides mechanistic support for the host adaptability trends inferred from our genomic analyses. By integrating existing receptor specificity data, this study provides the first systematic, large-scale analysis of codon usage across the Merbecovirus subgenus, elucidating key mechanisms of genomic adaptation and viral evolution. Our analytical framework provides a novel comparative perspective on host diversity and pinpoints specific surveillance priorities for mitigating future spillover risks.
1. Introduction
Merbecovirus, a subgenus of coronaviruses within the Betacoronavirus genus of the Coronaviridae family, includes viruses isolated from various hosts, primarily bats, that are frequently transmitted across species and pose significant health risks to both animals and humans. Of these, Middle East respiratory syndrome coronavirus (MERS-CoV) [], the most notable member of the subgenus, first emerged in Saudi Arabia in 2012 and has since spread internationally, causing regional and global outbreaks through cross-species transmission from dromedary camels to humans. MERS-CoV infection ranges from asymptomatic or mild respiratory illness to severe disease, including pneumonia, respiratory failure, and death, with older adults and individuals with chronic conditions at higher risk. As of September 2025, the World Health Organization (WHO) reported 2627 cumulative confirmed cases of MERS worldwide since April 2012, including 947 deaths, a case fatality rate of 36% []. Given the serious threat that MERS-CoV poses to human health, in-depth analysis of the genomic characteristics, host range, ecological distribution, and possible public health implications of other viruses in the Merbecovirus subgenus could provide essential scientific knowledge for the prevention and control of future coronavirus pandemics. The Merbecovirus subgenus includes four species, as defined by the International Committee on Taxonomy of Viruses (ICTV): Betacoronavirus cameli, Betacoronavirus erinacei, Betacoronavirus pipistrelli, and Betacoronavirus tylonycteridis. In the present classification, four representative viruses are each assigned to one of these four species: Middle East respiratory syndrome-related coronavirus (MERSr-CoV), Tylonycteris bat coronavirus HKU4 (Ty-BatCoV HKU4), Pipistrellus bat coronavirus HKU5 (Pi-BatCoV HKU5), and Hedgehog coronavirus 1 (HedCoV1).
In addition, to comprehensively capture the genetic diversity within the subgenus, this study included three unclassified Merbecoviruses: mink-derived HKU5-like viruses, Manis javanica HKU4-related coronavirus (MjHKU4r-CoV), and Erinaceus amurensis hedgehog coronavirus HKU31 (Ea-HedCoV HKU31). Of note, Ty-BatCoV HKU4 and Pi-BatCoV HKU5 were identified in Hong Kong, marking the initial detection of Merbecovirus five years before the MERS epidemic outbreak [,]. In 2024, Pipistrellus bat coronavirus HKU5-like viruses were further identified in two farmed minks, and the mink-derived HKU5-like CoV lineage was found to be phylogenetically closely related to viruses previously reported exclusively in bats, within which recombination events have been documented [].
Several natural hosts of Merbecoviruses have been identified, such as pangolins (Manis javanica) [] and hedgehogs (Erinaceus spp.) []. Beyond these, previous studies have demonstrated varying susceptibility to MERS-CoV among different species, with cells from rhesus monkeys (Macaca mulatta) [], common marmosets (Callithrix jacchus) [], goats (Capra hircus) [], horses (Equus caballus) [], rabbits (Oryctolagus cuniculus) [], pigs (Sus scrofa) [], and civets (Civettictis civetta) [] supporting MERS-CoV replication. In addition, novel Merbecovirus-like viruses and new hosts continue to be discovered, increasing the potential impact and threat posed by the Merbecovirus subgenus [,,].
The genome of Merbecoviruses encodes one main polyprotein (pp1ab), which comprises, among others, the RNA-dependent RNA polymerase (RdRp), and four structural proteins: Spike (S), Envelope (E), Membrane (M), and Nucleocapsid (N). The S protein is particularly vital for invading host cells because it mediates virus attachment and cell membrane fusion []. The redundancy inherent in the genetic code allows multiple codons to encode the same amino acid, a feature that influences protein production efficiency and fidelity []. These synonymous codons, while interchangeable, may vary in their representation within cells and in their recognition speed by ribosomes []. Codon selection, termed codon usage bias (CUB), is not random within or across genomes, being influenced by factors such as mutation pressure, natural selection, and environmental conditions [,,]. Mutation pressure, arising from repeated nucleotide substitutions, occurs at a rate of $10^{-4}$ to $10^{-5}$ per nucleotide per replication cycle [], while translation-related selection further influences codon choice. CUB can modulate viral mRNA stability, translational efficiency, and protein expression, and may facilitate immune evasion by altering RNA secondary structures or reducing CpG content []. Heterogeneity in CUB across genes or evolutionary stages reflects viral functional requirements and selective pressures. Understanding viral codon usage can illuminate viral evolution and gene expression regulation, and can aid vaccine development by optimizing viral protein expression for immune response generation [,].
Therefore, it is essential to monitor both known and potential hosts and understand the virus–host interaction, with particular attention to the host immune response. In this study, codon usage patterns of Merbecovirus were systematically analyzed to investigate host adaptation and to elucidate how codon usage influences viral adaptability and transmission potential.
2. Materials and Methods
2.1. Source of Target Sequences
We collected 1965 Merbecovirus sequences from the National Center for Biotechnology Information (NCBI) (data available as of April 2025, the time of this study), complemented by two additional Pipistrellus bat coronavirus HKU5-like sequences identified in farmed mink and retrieved from GenBase []. To ensure sequence quality, we removed sequences that were too long or too short, or that contained degenerate or undetermined bases (W, K, R, and X). After quality control, five curated datasets were retained for subsequent analyses: 796 sequences for RdRp; 998 for S protein; 759 for E protein; 770 for M protein; and 815 for N protein.
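The quality-control step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline used in this study: the function name `passes_qc` and the length bounds `min_len` and `max_len` are hypothetical, because the thresholds applied to each gene are not given in the text.

```python
# Minimal QC sketch: keep a sequence only if its length is within bounds
# and it contains no degenerate or undetermined bases.
# The bounds are hypothetical parameters, not values from this study.
UNAMBIGUOUS_BASES = set("ACGTU")


def passes_qc(seq, min_len, max_len):
    """Return True if seq has an acceptable length and only A/C/G/T/U."""
    s = seq.upper()
    if not (min_len <= len(s) <= max_len):
        return False
    # Any symbol outside A/C/G/T/U (e.g., W, K, R, X, N) fails the check.
    return set(s) <= UNAMBIGUOUS_BASES


def filter_sequences(records, min_len, max_len):
    """records: dict of {sequence_id: sequence}; returns the retained subset."""
    return {name: s for name, s in records.items()
            if passes_qc(s, min_len, max_len)}
```

Applying `filter_sequences` gene by gene, each with its own length window, would yield one curated dataset per gene.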
To ensure consistency and comparability in further analyses, all Merbecovirus sequences were filtered to retain only seven lineages: MERSr-CoV, Ty-BatCoV HKU4, Pi-BatCoV HKU5, HedCoV1, Ea-HedCoV HKU31, MjHKU4r-CoV, and mink-derived HKU5-like. The virus lineages analyzed in this study were further classified according to their corresponding hosts as follows: MERSr-CoV: Camelus dromedarius, Homo sapiens, Vespertilionidae-MERSr; Ty-BatCoV HKU4: Vespertilionidae-HKU4; Pi-BatCoV HKU5: Vespertilionidae-HKU5; HedCoV1: Erinaceus europaeus; Ea-HedCoV HKU31: Erinaceus amurensis; MjHKU4r-CoV: Manis javanica; mink-derived HKU5-like coronavirus: Neogale vison (Supplementary Table S1). RdRp and four structural protein sequences (S, E, M, and N) were then extracted from each group for subsequent analyses.
2.2. Recombination Analysis
Only the main viral ORFs (RdRp and S) were selected for recombination analysis because they are homologous across the viral lineages studied and encode proteins essential to key stages of the viral life cycle, such as replication and entry. RdRp and S gene sequences were analyzed separately to identify potential recombination events using RDP4 []. Seven recombination detection methods implemented in RDP4 were employed: RDP [], GENECONV [], 3Seq [], Chimaera [], SiScan [], MaxChi [], and LARD []. Recombination events detected by at least three methods with a p-value ≤ 0.05 were considered credible []. Recombinant sequences were subsequently removed, and the analysis was iteratively repeated until no further recombination was detected. Accordingly, five RdRp sequences and two S gene sequences were excluded from downstream analyses.
2.3. Phylogenetic Analysis
The RdRp and S sequences remaining after removal of recombinants were aligned using MAFFT v.7.520 [] with default parameters. To minimize the influence of synonymous substitutions, phylogenetic trees were constructed from the amino acid sequences using the maximum-likelihood (ML) method in IQ-TREE v.2.2.2.6 [], with the best-fit substitution model selected automatically by the ModelFinder Plus option (-m MFP). Node support was assessed with ultrafast bootstrap analysis using 1000 replicates, and only nodes with bootstrap values ≥ 70% were indicated on the tree []. FigTree and iTOL v.6 [] were used to visualize and annotate the phylogenetic trees.
2.4. Principal Component Analysis (PCA)
PCA is a multivariate statistical method used to identify the primary trends in codon usage patterns. Each sequence was represented as a 59-dimensional vector of RSCU values (one value per synonymous codon) and projected onto principal components (PCs). PC1 and PC2, which accounted for the majority of the variance in RSCU, were plotted; PC2 and PC3 were also analyzed to reveal secondary variation patterns and support clustering structures. PCA was performed using GraphPad Prism 9.0.
2.5. Nucleotide Composition Analysis
The compositional parameters of the RdRp and four structural protein (S, E, M, and N) genes were calculated after removing stop codons (UAA, UGA, UAG) as well as AUG and UGG (because Met and Trp are each encoded by a single codon, AUG and UGG, respectively). The nucleotide frequencies at the third position of synonymous codons (A3s, G3s, C3s, U3s) were calculated using CodonW (v1.4.2). In addition, the Grand Average of Hydropathicity (Gravy) and Aromaticity (Aroma) indices were calculated using CodonW (v1.4.2) to assess the overall hydrophobicity and aromaticity of the encoded proteins, respectively. The frequencies of A, U, G, and C were calculated using the CAIcal server (http://genomes.urv.es/CAIcal/; accessed on 30 September 2025). The G + C contents at the first (GC1s), second (GC2s), and third (GC3s) codon positions were determined using the online EMBOSS server (http://emboss.toulouse.inra.fr/; accessed on 30 September 2025). The G + C content at the first and second positions was combined to calculate GC12s.
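As an illustration of the positional G + C measures described above, the following Python sketch computes the G + C fraction at each codon position after excluding AUG, UGG, and the stop codons. It is a simplified stand-in for CodonW and EMBOSS, whose exact definitions may differ; the function names are hypothetical, and taking GC12 as the arithmetic mean of GC1 and GC2 is an assumption about how the two positions are "combined".

```python
# Codons excluded before computing composition: Met, Trp, and the stops.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def split_codons(seq):
    """Convert to the RNA alphabet and cut into complete in-frame codons."""
    s = seq.upper().replace("T", "U")
    return [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]


def gc_by_position(seq):
    """Return G+C fractions at codon positions 1, 2, 3 and their GC12 mean."""
    codons = [c for c in split_codons(seq) if c not in EXCLUDED]
    if not codons:
        raise ValueError("no codons left after exclusion")
    n = len(codons)
    gc1, gc2, gc3 = (sum(c[pos] in "GC" for c in codons) / n for pos in range(3))
    # Assumption: GC12 is the mean of GC1 and GC2.
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}
```

For the toy sequence `ATGGCTGCCAAATAA`, only GCU, GCC, and AAA are retained, so the function reports GC1 = GC2 = 2/3 and GC3 = 1/3.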
2.6. Analysis of Relative Synonymous Codon Usage (RSCU)
RSCU is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons encoding the same amino acid were used equally []. If a codon is used without preference, RSCU equals 1. RSCU > 1 indicates a preference for the codon; RSCU ≥ 1.6 marks a high-frequency (over-represented) codon, and RSCU ≤ 0.6 marks a low-frequency (under-represented) codon. RSCU values for 59 codons (excluding AUG [Met], UGG [Trp], the three stop codons UAA, UGA, and UAG, and codons containing ambiguous bases) were calculated using the CAIcal website (http://genomes.urv.es/CAIcal/; accessed on 30 September 2025). For each amino acid, the optimal codon was chosen as the synonymous codon with the highest number of occurrences and the largest RSCU.
Synonymous codon usage data for hosts were retrieved from the Codon Usage Database (http://codonstatsdb.unr.edu/; accessed on 30 September 2025) []. The RSCU values were calculated using the following formula []:
$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\dfrac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$
where $X_{ij}$ represents the number of occurrences of the $j$th codon for the $i$th amino acid, which has $n_i$ types of synonymous codons.
Previous research demonstrated that MERS-CoV replicates in cells from rhesus monkeys, goats, horses, rabbits, pigs, civets, dromedary camels, and bats []. Research has also shown that camels, the primary host of MERS-CoV, come into contact with wild rodents, rabbits, and possibly bats []. Based on this ecological and experimental context, the selection of host species for codon usage analysis in this study followed two primary criteria: biological relevance and data availability. The analysis included all confirmed natural hosts of Merbecoviruses with complete codon usage data available in the Codon Usage Database, namely Homo sapiens, Camelus dromedarius, Erinaceus europaeus, and Manis javanica. Potential hosts were selected based on indications from the aforementioned studies, and included pigs, horses, and rabbits, provided they possessed unique and complete entries in the database. Although bats are recognized as key reservoirs, this group was excluded from the quantitative codon adaptation analysis due to the presence of multiple, inconsistent entries at the family level in the database, which prevented a reliable and representative assessment.
Based on the computed RSCU values for five genes, heatmaps were generated using R (v4.4.3) to visually illustrate differences in codon usage patterns among different viral lineages.
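To make the RSCU definition used in this section concrete (the count of a codon divided by the mean count of all synonymous codons for the same amino acid), a minimal Python sketch is given below. It is not the CAIcal implementation; the names `CODON_TABLE` and `rscu` are illustrative, the standard genetic code is assumed, and amino acids absent from the sequence are simply skipped.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in U, C, A, G order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return {codon: RSCU} for the 59 codons with synonymous alternatives."""
    s = seq.upper().replace("T", "U")
    counts = Counter(s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3))
    # Group codons into synonymous families, dropping Met, Trp, and stops.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in codons:
            # X_ij divided by the family mean (total / n_i).
            values[c] = counts[c] * len(codons) / total
    return values
```

For example, a sequence with three GCU codons and one GCC codon gives RSCU values of 3.0 for GCU, 1.0 for GCC, and 0.0 for GCA and GCG, since the mean count across the four alanine codons is 1.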
2.7. Analysis of Dinucleotide Relative Abundance and Characterization
To understand the impact of dinucleotide frequencies on codon usage selection and identify overrepresented dinucleotides, the occurrence frequencies of all 16 possible dinucleotides within coding sequences were calculated. DAMBE software (v7.3.11) was used to compute the relative abundance of dinucleotides. The formula for calculating the relative abundance of the 16 dinucleotides is as follows:
$$P_{xy} = \frac{f_{xy}}{f_x f_y}$$
where $f_x$, $f_y$, and $f_{xy}$ denote the occurrence rate of nucleotide X, the prevalence of nucleotide Y, and the recorded occurrence frequency of the dinucleotide XY, respectively. When $P_{xy}$ exceeds 1.23 (or falls below 0.78), the dinucleotide XY is deemed overrepresented (or underrepresented).
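The observed-to-expected ratio described above (the frequency of dinucleotide XY divided by the product of the frequencies of X and Y) can be sketched as follows. This is an illustrative reimplementation rather than the DAMBE routine: the function names are hypothetical, and counting overlapping dinucleotides along the whole sequence is an assumption about the windowing scheme.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {XY: f_XY / (f_X * f_Y)} for the 16 dinucleotides."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n < 2:
        raise ValueError("sequence too short")
    mono = Counter(s)
    # Assumption: overlapping dinucleotides at every position.
    di = Counter(s[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGU":
        for y in "ACGU":
            fx, fy = mono[x] / n, mono[y] / n
            if fx == 0 or fy == 0:
                continue  # ratio undefined if a base is absent
            ratios[x + y] = (di[x + y] / (n - 1)) / (fx * fy)
    return ratios


def classify(ratio, low=0.78, high=1.23):
    """Apply the thresholds quoted in the text."""
    if ratio > high:
        return "overrepresented"
    if ratio < low:
        return "underrepresented"
    return "within expected range"
```

Under these thresholds, the mean CpG value of 0.59 reported in the Results would be classified as underrepresented.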
2.8. Analysis of Effective Number of Codons (ENC)
ENC is a measure of the strength of preference in the usage of synonymous codons []. The ENC value ranges from 20 (when only one codon is used for each amino acid) to 61 (when all synonymous codons are used equally). A lower ENC value indicates a stronger codon preference, and an ENC value below 35 suggests a strong codon preference. ENC values were calculated using CodonW (v1.4.2). The formula for calculating the ENC value is as follows:
$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$
where $\bar{F}_k$ (k = 2, 3, 4, 6) represents the mean of the $F_k$ values for the k-fold degenerate amino acids, and $F_k$ is calculated as outlined below:
$$F_k = \frac{n\sum_{j=1}^{k}\left(\dfrac{n_j}{n}\right)^2 - 1}{n - 1}$$
where $n$ is the total count of occurrences for the codons associated with a particular amino acid, and $n_j$ represents the count of occurrences for the specific $j$th codon related to that amino acid.
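A compact Python sketch of the ENC calculation is shown below: per-amino-acid homozygosity values are averaged within each degeneracy class and combined with the weights 9, 1, 5, and 3 for the two-, three-, four-, and six-fold classes. It is not the CodonW code. Several choices here are assumptions made only for illustration: amino acids with fewer than two codons or a non-positive F are skipped, a missing three-fold class is replaced by the mean of the two- and four-fold classes, and the result is capped at 61.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def homozygosity(counts):
    """F = (n * sum(p_j^2) - 1) / (n - 1); None if fewer than two codons."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def enc(seq):
    """Effective number of codons for one coding sequence."""
    s = seq.upper().replace("T", "U")
    codon_counts = Counter(s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon_counts[codon])
    by_class = defaultdict(list)
    for counts in families.values():
        f = homozygosity(counts)
        if f is not None and f > 0:  # assumption: skip unusable values
            by_class[len(counts)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # assumption for missing Ile
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("a degeneracy class is missing; ENC is undefined")
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)
```

A sequence that always uses a single codon per amino acid has F = 1 in every class, so the function returns the theoretical minimum of 20.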
2.9. ENC-GC3s Plot Analysis
ENC-GC3s plots are often used to visualize whether mutation pressure is a major determinant of codon usage bias. An ENC-GC3s plot involves constructing a scatter plot with GC3s as the independent variable and the ENC value as the dependent variable. If mutational pressure is the sole driving factor behind codon usage bias, the points on the plot will lie on a curve that can be predicted when the value of the ENC depends only on genomic composition, and computed as follows:
$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$
where $s$ denotes the GC3s value.
Alternatively, if the points are below the standard curve, it suggests that factors other than mutational pressure influence codon usage bias. ENC values and GC3s were calculated using CodonW (v1.4.2), and GraphPad Prism 9.0 was used for plotting.
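The standard curve can be evaluated directly. The short Python sketch below assumes the usual null expectation, ENC = 2 + s + 29 / (s² + (1 − s)²) with s equal to GC3s expressed as a fraction between 0 and 1; the function names are illustrative.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage depends only on GC3s (a 0-1 fraction)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def lies_below_curve(observed_enc, gc3s):
    """True if a gene's observed ENC falls below the expected curve."""
    return observed_enc < expected_enc(gc3s)
```

The curve peaks at 60.5 when GC3s is 0.5 and falls to about 31 at the extremes, so a gene with an ENC in the mid-40s and a GC3s near 0.35 lies clearly below it.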
2.10. The Parity Rule 2 (PR2) Analysis
A PR2 plot is used to examine the base composition at the third codon position. The CAIcal server (http://genomes.urv.es/CAIcal/; accessed on 30 September 2025) was used to calculate the A3%, C3%, G3%, and U3% values. A3/(A3 + U3) is compared with G3/(G3 + C3) to assess the relationship between mutation pressure and natural selection. The center of the plot, where A = U and G = C (both axis values equal to 0.5), indicates a balance between mutation pressure and natural selection.
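The two PR2 coordinates can be sketched in Python as below. This is a simplified illustration, not the CAIcal output: third-position bases are counted over all codons except AUG, UGG, and the stop codons, which is an assumption (some PR2 implementations restrict the count to four-fold degenerate codons), and the function name is hypothetical.

```python
from collections import Counter

EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def pr2_coordinates(seq):
    """Return A3/(A3+U3) and G3/(G3+C3) for one coding sequence."""
    s = seq.upper().replace("T", "U")
    codons = [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]
    thirds = Counter(c[2] for c in codons if c not in EXCLUDED)
    au = thirds["A"] + thirds["U"]
    gc = thirds["G"] + thirds["C"]
    if au == 0 or gc == 0:
        raise ValueError("ratio undefined: no A/U or no G/C at position 3")
    return {"A3/(A3+U3)": thirds["A"] / au, "G3/(G3+C3)": thirds["G"] / gc}
```

A sequence whose third positions contain equal numbers of A and U and equal numbers of G and C maps to the center of the plot (0.5, 0.5).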
2.11. Neutrality Analysis
Neutrality plots were constructed by regressing GC12s (y-axis) against GC3s (x-axis) to assess the relative impacts of mutational pressure and natural selection on codon usage. A regression slope close to 1, together with a strong correlation, indicates that codon usage bias is primarily influenced by mutational pressure. Conversely, as the slope decreases toward 0, natural selection plays an increasing role in codon usage bias. Neutrality plots were constructed and regression lines were calculated using GraphPad Prism 9.0.
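The regression underlying a neutrality plot is an ordinary least-squares fit of GC12s on GC3s. The sketch below is a generic implementation for illustration, not the GraphPad Prism procedure; the function name is hypothetical, the requirement of at least three points is an assumed minimum, and reading the slope as the share attributable to mutational pressure (with the remainder attributed to selection) follows the interpretation used in this paper.

```python
def neutrality_regression(gc3s, gc12s):
    """Least-squares fit of GC12s (y) on GC3s (x); returns (slope, intercept)."""
    n = len(gc3s)
    if n != len(gc12s) or n < 3:  # assumed minimum, not from the paper
        raise ValueError("need at least three paired observations")
    mean_x = sum(gc3s) / n
    mean_y = sum(gc12s) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3s)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3s, gc12s))
    if sxx == 0:
        raise ValueError("GC3s values are all identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pressure_shares(slope):
    """Interpret the slope as percent mutation pressure vs. selection."""
    mutation = 100.0 * slope
    return {"mutation_pressure_%": mutation, "natural_selection_%": 100.0 - mutation}
```

With this reading, a slope of 0.13 corresponds to a 13% contribution of mutational pressure and 87% of natural selection.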
2.12. Correlation Analysis
To investigate the relationship between nucleotide composition and codon usage features, Spearman’s rank correlation analyses were performed in R (v4.4.3) for A3, T3, G3, C3, GC3, ENC, as well as A%, T%, G%, C%, GC%, Gravy, and Aroma values. The correlation results were then visualized as heatmaps using R (v4.4.3). To discern the primary evolutionary forces shaping codon usage bias, two analytical approaches were employed. First, the correlation between the base composition at the third codon position (A3, U3, G3, and C3) and the overall genomic base composition was assessed. A strong positive correlation, particularly between GC3 and overall GC content, was interpreted as evidence for the dominance of genome-wide mutation pressure. Second, the association between the ENC and indices of protein physicochemical properties (namely, the Gravy and Aroma indices) was evaluated. A significant association in this case was considered indicative of natural selection acting through functional constraints.
2.13. Codon Adaptation Index (CAI) Analysis and Relative Codon Deoptimization Index (RCDI) Analysis
CAI analysis was performed to predict the adaptability of Merbecovirus RdRp, S, M, N, and E genes to their natural hosts and potential hosts. The CAI value ranges from 0 to 1.0, where a higher CAI value indicates better adaptation of the virus to its host.
RCDI reflects the similarity in codon usage between a given coding sequence and a reference genome []. RCDI values were calculated for the viral coding sequences relative to each natural and potential host. A value of 1.0 indicates that the codon usage of the sequence matches that of the host, whereas a value greater than 1.0 indicates that the codon usage deviates from that of the host. Both CAI and RCDI were computed using the CAIcal server (https://ppuigbo.me/programs/CAIcal/; accessed on 30 September 2025).
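To show how the two indices relate a viral sequence to a host codon usage table, the following Python sketch computes both from raw codon counts. It is a hedged approximation of the commonly used definitions, not the CAIcal code: CAI is taken as the geometric mean of each codon's host frequency relative to the most frequent synonymous codon, and RCDI as the count-weighted mean of the ratio between viral and host synonymous codon frequencies. The function names, the restriction to the 59 synonymous codons, and the pseudocount of 0.5 for host codons with zero counts are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa not in "*MW":
        FAMILIES[_aa].append(_codon)


def count_codons(seq):
    """Counts of the 59 synonymous codons in a coding sequence."""
    s = seq.upper().replace("T", "U")
    counts = Counter(s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3))
    return {c: n for c, n in counts.items() if CODON_TABLE.get(c, "*") not in "*MW"}


def family_frequencies(counts, codons, pseudocount=0.0):
    """Relative frequency of each codon within its synonymous family."""
    adjusted = [counts.get(c, 0) or pseudocount for c in codons]
    total = sum(adjusted)
    return {c: a / total for c, a in zip(codons, adjusted)}


def cai_and_rcdi(seq, host_counts):
    """host_counts: {codon (RNA alphabet): count} for the reference host."""
    seq_counts = count_codons(seq)
    n_total = sum(seq_counts.values())
    if n_total == 0:
        raise ValueError("no synonymous codons in sequence")
    log_w_sum = rcdi_sum = 0.0
    for codons in FAMILIES.values():
        if not any(seq_counts.get(c, 0) for c in codons):
            continue
        host = family_frequencies(host_counts, codons, pseudocount=0.5)
        virus = family_frequencies(seq_counts, codons)
        best = max(host.values())
        for c in codons:
            n = seq_counts.get(c, 0)
            if n:
                log_w_sum += n * math.log(host[c] / best)  # CAI weight
                rcdi_sum += (virus[c] / host[c]) * n       # RCDI term
    return math.exp(log_w_sum / n_total), rcdi_sum / n_total
```

When the viral sequence uses synonymous codons in exactly the same proportions as the host, every ratio equals 1 and the RCDI is 1.0, matching the interpretation given above.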
3. Results
3.1. The Phylogenetic Relationship of Merbecovirus Shows Clustering Patterns Similar to Those Seen in the PCA
We performed phylogenetic analyses of Merbecovirus based on the RdRp gene and the S gene (Figure 1), providing an extensive understanding of the evolutionary relationships among the different viral lineages analyzed. Phylogenetic analysis revealed seven distinct host-associated groups in the Merbecovirus subgenus, represented by MERSr-CoV, HedCoV1, Pi-BatCoV HKU5, Ty-BatCoV HKU4, MjHKU4r-CoV, Ea-HedCoV HKU31, and mink-derived HKU5-like. Phylogenetic analysis of S and RdRp genes shows that MjHKU4r-CoV is most closely related to Ty-BatCoV HKU4, indicating a close evolutionary relationship, which is consistent with the PCA analysis where both viruses cluster together, further confirming their high genetic similarity (Figure 2, Supplementary Figure S2). In the phylogenetic tree of S and RdRp genes, virus sequences from Homo sapiens and Camelus dromedarius infected with MERSr-CoV alternate and cluster together, indicating a high degree of genomic similarity among these strains. This pattern may reflect frequent cross-host transmission events, particularly between Homo sapiens and Camelus dromedarius, which is a phenomenon that has been well-documented. Moreover, this clustering may also suggest that MERSr-CoV experiences similar selective pressures across different hosts, leading to convergent evolutionary trajectories.
Figure 1.
(A) Maximum-likelihood phylogenetic tree of Merbecovirus RNA-dependent RNA polymerase (RdRp) gene based on the amino acid sequences; (B) maximum-likelihood phylogenetic tree reconstructed based on the amino acid sequences of the spike (S) gene. Branch lengths are proportional to the number of amino acid substitutions per site, and the scale bar represents the estimated genetic distance. Branches are colored according to the host, and the outer ring indicates the virus lineages. Phylogenetic trees were constructed by IQTree based on the maximum-likelihood method with a bootstrap of 1000 replicates and visualized using the iTOL online tool (https://itol.embl.de/; accessed on 30 September 2025). Nodes exhibiting statistically significant support (bootstrap values ≥ 70%) are annotated with orange circular markers, with the marker diameter scaled proportionally to the corresponding bootstrap support value.
Figure 2.
Principal Component Analysis (PCA) based on the RSCU values of 59 synonymous codons. (A,B) represent codon usage patterns of the RdRp and S gene in different Merbecovirus lineages, respectively; (C,D) represent codon usage clustering of the RdRp and S gene in different hosts. MERSr-CoV, Pi-BatCoV HKU5, Ty-BatCoV HKU4, MjHKU4r-CoV, Ea-HedCoV HKU31, HedCoV1, and mink-derived HKU5-like are represented in blue, red, yellow, green, dark gray, purple, and dusty pink. Erinaceus europaeus, Erinaceus amurensis, Vespertilionidae-MERSr, Camelus dromedarius, Homo sapiens, Vespertilionidae-HKU5, Vespertilionidae-HKU4, Manis javanica, and Neogale vison are represented in blue, red, yellow, green, light gray, purple, cameo brown, black, and dusty pink, respectively. Vespertilionidae-MERSr: Vespertilionidae carrying MERSr-CoV; Vespertilionidae-HKU5: Vespertilionidae carrying Pi-BatCoV HKU5; Vespertilionidae-HKU4: Vespertilionidae carrying Ty-BatCoV HKU4.
The PCA plots revealed that, as observed in the phylogenetic analysis, Pi-BatCoV HKU5 and MERSr-CoV partially overlapped for the RdRp and S genes, and Ty-BatCoV HKU4 and MjHKU4r-CoV partially overlapped for the RdRp, S, and N genes (Supplementary Figures S1 and S3). Similarly, at the host level, in the PCA plots for the RdRp, S, E, and N genes, points for Vespertilionidae carrying Ty-BatCoV HKU4 and Manis javanica carrying MjHKU4r-CoV clustered closely. For Homo sapiens and Camelus dromedarius, there was substantial overlap in the PCA plots of all five genes, mirroring the phylogenetic relationships.
3.2. Nucleotide Composition Analysis Indicated a High Abundance of AU
The values of nucleotide contents in the RdRp, S, M, N, and E genes of Merbecovirus were analyzed (Supplementary Table S2). The results showed that nucleotides U and A were abundant in all proteins except the N protein, with about 60% AU content. Although A and C were the most abundant nucleotides in the N protein with 29.88 ± 0.36 (mean ± standard deviation) and 26.64 ± 0.38, respectively, the AU content (52.69%) was also greater than the GC content (47.31%). In addition, U3s and A3s were also higher than C3s and G3s in all proteins. For example, in RdRp, U3s (0.53 ± 0.02) and A3s (0.29 ± 0.01) were higher than C3s (0.25 ± 0.02) and G3s (0.21 ± 0.01). Similarly, the GC content in different proteins showed a consistent trend that the highest GC frequency (%) was located in position 1 and the lowest GC frequency was located in position 3. For example, the GC1/2/3 content of RdRp was 49.13%, 36.37%, and 34.41%, respectively. Analysis of five genes (RdRp, S, E, M, and N) from different host species revealed consistent AU-rich characteristics and GC positional trends across all corresponding viral genomes (Supplementary Table S3). In conclusion, the analysis of the nucleotide composition of the different proteins showed that Merbecovirus codons preferred U and A, and usually ended with U.
3.3. The RSCU of Merbecovirus Was A/U-End Biased and Opposite to the Hosts
Through RSCU analysis, the codon usage trend was studied to further understand why A/U nucleotides are preferentially used at the third position in the RdRp and four structural proteins of Merbecovirus (Supplementary Figures S4 and S5). In the RdRp and S protein, seven distinct viral lineages share 5 (UUU, UAU, CAU, AAU, and UGU) and 18 (UUU, AUU, GUU, UCU, CCU, GCU, UAU, CAU, CAA, AAU, GAU, GAA, CGU, GGU, UAU, AAU, UGU, and GGU) common optimal codons, respectively, all ending with A/U (Supplementary Table S4). Moreover, the over-represented codons (RSCU > 1.6) tend to be A/U-ended, while the underrepresented codons (RSCU < 0.6) are mainly G/C-ended. For different viral groups of Merbecovirus, there were two (UAU, CAA), three (CAU, AAU, UGU), and three (AAU, GAU, AGA) common optimal codons for E, M, and N, respectively, all ending with A/U (Supplementary Table S5). However, the optimal codons for various viral groups of Merbecovirus in RdRp and the four structural proteins are entirely distinct from those of all known and potential hosts (horse, rabbit, and pig), all of which exhibit a shared preference for codons ending in C/G. In addition, codons containing CpG dinucleotides (UCG, CCG, ACG, GCG, CGC, and CGA) were mostly underrepresented in RdRp and four structural proteins.
3.4. Mutation Pressure and Natural Selection Have Both Influenced Codon Usage Patterns
To investigate the effects of natural selection and mutational pressure on codon usage, we performed ENC-plot analysis, PR2 analysis, neutrality analysis, and correlation analysis. In this study, the ENC values for all seven groups in different proteins were above 35, indicating a low codon preference for Merbecovirus (Table 1). Individually, the highest ENC value was for the E (53.91 ± 4.63) and the lowest for the S (46.01 ± 4.39) coding sequence. Comparing the ENC values of seven lineages of Merbecovirus, we found that the highest ENC value was for MERSr-CoV (52.58 ± 4.84) and the lowest for MjHKU4r-CoV (46.30 ± 6.34).
Table 1.
The effective number of codons (ENC) values for viral structural protein (S, E, M, N) and nonstructural gene (RdRp) of different Merbecovirus lineages.
To further investigate the synonymous codon usage pattern of Merbecovirus, the relationship between ENC and GC3s was assessed. The ENC-GC3s plot revealed that the RdRp, S, and N (Figure 3, Supplementary Figure S6) are located below the standard curve, which indicates that natural selection could be responsible for the codon usage bias. In contrast, the M and especially E (Supplementary Figure S6) are closer to or even exceed the standard curve, suggesting that they experienced greater mutational pressure compared to other proteins. More specifically, these two proteins have been subjected to some degree of mutational pressure in MERSr-CoV and Ea-HedCoV HKU31.
Figure 3.
Analysis of codon usage bias using ENC-GC3s plots for the RdRp and S gene. (A,B) represent ENC plotted against GC3s for the RdRp and S gene in different Merbecovirus lineages, respectively; (C,D) represent ENC plotted against GC3s for the RdRp and S gene in different hosts, respectively. Solid curves represent the expected ENC values. Arrows point to the magnified views (right) of the areas marked by the dashed orange circles.
In addition, we constructed PR2 bias plots (Figure 4, Supplementary Figure S7), in which all points lie far from (0.5, 0.5) and the majority fall in the region where A3s/(A3s + U3s) > 0.5 and G3s/(G3s + C3s) < 0.5. This indicates that all genes tend to use A over U and C over G at the third codon position, further confirming the combined effect of natural selection and mutational pressure on Merbecovirus codon preferences. Neutrality analysis was used to further determine whether natural selection or mutational pressure primarily shaped the codon usage patterns of Merbecovirus. Due to insufficient sequence data, the mink-derived HKU5-like, MjHKU4r-CoV, and HedCoV1 groups did not meet the requirements for constructing regression curves and were, therefore, excluded from the neutrality analysis. Similarly, sequences from Ea-HedCoV HKU31 were not included in the RdRp analysis as they also failed to meet the criteria for regression curve construction. The analysis showed (Figure 5, Supplementary Figure S8) that the contribution of mutation pressure to the codon usage bias of RdRp in MERSr-CoV, Pi-BatCoV HKU5, and Ty-BatCoV HKU4 was only 13%, 7%, and 29%, respectively, suggesting that natural selection dominated codon usage bias in Merbecovirus. For the S gene, the contribution of mutational pressure was 15%, 10%, 16%, and 36% in MERSr-CoV, Pi-BatCoV HKU5, Ty-BatCoV HKU4, and Ea-HedCoV HKU31, respectively, which is higher than for RdRp in MERSr-CoV and Pi-BatCoV HKU5. Although the ENC-GC3s plot showed that the E gene is under more mutational pressure in MERSr-CoV, neutrality analysis indicated that natural selection is still dominant, contributing 72%. Correlation analysis yielded results consistent with the findings described above (Supplementary Figure S9). Combining the ENC-GC3s plot, PR2 bias analysis, neutrality analysis, and correlation analysis, we conclude that despite the dominance of natural selection in codon usage bias, mutational pressure still exerts a non-negligible influence.
Figure 4.
Parity Rule 2 (PR2) analysis of codon usage for the RdRp and S genes. (A,B) represent PR2 plot for the RdRp and S gene in different Merbecovirus lineages, respectively; (C,D) represent PR2 plot for the RdRp and S gene in different hosts, respectively. The center of the plot (0.5, 0.5), indicates the place where there is no bias in the effect of mutation or selection pressure.
Figure 5.
Neutrality plot analysis of codon usage (GC3s against GC12s) for the RdRp and S genes. (A,B) represent neutrality plot for the RdRp and S gene in different Merbecovirus lineages, respectively; (C,D) represent neutrality plot for the RdRp and S gene in different hosts, respectively. GC12s is plotted on the ordinate, and GC3s on the abscissa. The dotted line is the linear regression of GC12s against GC3s.
3.5. Analysis of Dinucleotide Relative Abundance and Characterization
Apart from mutation pressure and natural selection, other factors such as dinucleotide abundance are considered to influence codon usage bias. The relative abundances of the 16 dinucleotides were calculated for the RdRp and four structural protein genes of Merbecovirus to assess their influence on codon usage selection. Deviations from the expected value (relative abundance = 1) were observed (Supplementary Figures S10 and S11), indicating non-random dinucleotide occurrences. In summary, we found four dinucleotides with high proportions (ApG, CpA, GpC, and UpC) and two dinucleotides with low proportions (CpG and GpA). Among them, CpG was severely underrepresented, with a mean relative abundance of only 0.59. Dinucleotide profiles varied among Merbecovirus lineages, but CpG was markedly deficient in RdRp and all structural genes. In the RSCU analysis, of the eight CpG-containing codons, all except CGU (encoding Arg) were underrepresented. The CpG deficiency can be attributed to a preference for codons ending in U/A, consistent with our RSCU analysis. Overall, dinucleotide composition contributes to codon usage bias in Merbecovirus.
3.6. Codon Adaptation Index (CAI) and Relative Codon Deoptimization Index (RCDI) Analysis
To investigate the adaptability of Merbecovirus to its natural and potential hosts, we calculated CAI and RCDI values for eight reported hosts and three potential hosts of Merbecovirus (Table 2). CAI relates codon usage patterns to gene expression levels, and higher CAI values indicate stronger adaptability. RCDI reflects how closely the codon usage of a pathogen matches that of its host species, with a lower RCDI value meaning higher adaptation. CAI and RCDI scores showed diverse adaptive patterns across hosts and also varied among proteins within the same host. For instance, for Homo sapiens the average CAI score (0.71 ± 0.02) was comparatively high and the average RCDI score (1.48 ± 0.17) comparatively low relative to other hosts, suggesting that Merbecovirus is better adapted to humans. Specifically, the RdRp protein exhibited improved adaptation in humans, as indicated by a higher CAI (0.73) and lower RCDI (1.36). Based on CAI and RCDI analyses, pigs, horses, and rabbits exhibited codon adaptation patterns similar to those observed in known natural hosts. Interestingly, among the potential hosts, horses displayed the highest average CAI (0.64 ± 0.02), with an average RCDI of 1.6 ± 0.22, possibly indicating a favorable codon adaptation of Merbecovirus in horse cells.
Table 2.
The codon adaptation index (CAI) and the relative codon deoptimization index (RCDI) of natural hosts and potential hosts of Merbecovirus.
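For readers unfamiliar with the two indices in Table 2, the following Python sketch shows how they are typically defined. CAI is taken as the geometric mean of each codon's relative adaptiveness (its host count divided by the count of the most used synonymous codon in the host), and RCDI as the count-weighted mean ratio of the virus codon share to the host codon share within each amino acid. This is a minimal sketch under those standard definitions, not the web servers used in this study; the function names and the toy host table in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code in the RNA alphabet; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SKIPPED = {"M", "W", "*"}  # no synonymous choice, or stop codon


def family_stats(counts):
    """Map codon -> (share within its synonymous family, count / family maximum)."""
    total, top = defaultdict(int), defaultdict(int)
    for codon, aa in CODON_TABLE.items():
        if aa not in SKIPPED:
            total[aa] += counts.get(codon, 0)
            top[aa] = max(top[aa], counts.get(codon, 0))
    return {
        codon: (counts.get(codon, 0) / total[aa], counts.get(codon, 0) / top[aa])
        for codon, aa in CODON_TABLE.items()
        if aa not in SKIPPED and total[aa] > 0
    }


def cai_and_rcdi(cds, host_counts):
    """Return (CAI, RCDI) of one coding sequence against a host codon table.

    `host_counts` maps RNA-alphabet codons to their counts in the host.
    """
    seq = cds.upper().replace("T", "U")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    gene = Counter(c for c in triplets if CODON_TABLE.get(c, "*") not in SKIPPED)
    if not gene:
        raise ValueError("no scorable codons in sequence")
    host, virus = family_stats(host_counts), family_stats(gene)
    n = sum(gene.values())
    log_w_sum = rcdi_sum = 0.0
    for codon, k in gene.items():
        host_share, host_w = host.get(codon, (0.0, 0.0))
        if host_share == 0:
            # A zero host count would make both indices undefined.
            raise ValueError(f"host table has no count for {codon}")
        log_w_sum += k * math.log(host_w)              # CAI: sum of log(w)
        rcdi_sum += k * virus[codon][0] / host_share   # RCDI: weighted ratio
    return math.exp(log_w_sum / n), rcdi_sum / n
```

With a toy host table such as `{"GCU": 40, "GCC": 20, "GCA": 30, "GCG": 10, "AAA": 60, "AAG": 40}`, a sequence whose synonymous codon shares match the host exactly gives an RCDI of 1, and the value rises as the usage diverges, which is why lower RCDI is read as higher adaptation.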
4. Discussion
The potential threat posed by Merbecovirus is underscored by its ability to cross species barriers, infecting various mammalian hosts and raising concerns about its capacity to cause future zoonotic outbreaks. Phylogenetic analyses suggest that Merbecoviruses transmitted from bats or wild animals to farmed animals or humans have different geographical origins, implying the existence of several animal hosts []. A comprehensive codon usage analysis of RdRp and the four structural proteins (S, E, M, and N) in Merbecoviruses enables a comparative evaluation of their molecular evolution and host adaptability. Importantly, the host adaptability inferred from this analysis finds strong support in a growing body of in vitro and in vivo evidence regarding the fundamental mechanism of viral entry: receptor specificity.
Systematic analyses of nucleotide composition revealed that Merbecoviruses exhibit a high abundance of A and U, particularly at the third codon position. RSCU analysis further showed that Merbecoviruses preferentially use A/U-ending codons. This preference is not a virus-specific adaptation but rather a common molecular feature of RNA viruses (also observed in MERS-CoV, SARS-CoV-2, and various mammalian RNA viruses), whose coding sequences typically exhibit a compositional bias characterized by reduced C/G-ending and enriched A/U-ending codons [,,,,,]. A shared AU-rich bias was observed in both the viruses and their primary bat hosts, with host genomic GC content ranging from 36% to 50%. Given the AT-rich nature of the human genome (58%), such nucleotide composition likely facilitates viral adaptability and evolution within host environments [,,,]. Studies show that the zinc-finger antiviral protein (ZAP) binds CpG-rich viral RNA to suppress replication []; conversely, low CpG content in viral RNA can help evade immune recognition by receptors like RIG-I, delaying interferon production and facilitating viral escape []. Dinucleotide composition reflects viral family characteristics rather than host genomes, and in line with this our results show consistently low viral CpG levels across different hosts. Moreover, the RSCU pattern of Merbecoviruses correlates poorly with that of the host, which may reduce translational efficiency but simultaneously support the correct folding of viral proteins [,].
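The RSCU values referred to above can be illustrated with a short Python sketch. It assumes the standard definition, in which the observed count of a codon is divided by the count expected if all synonymous codons for that amino acid were used equally, over the 59 codons that have at least one synonym. The function name `rscu` is hypothetical and the code is not the software used in this study.

```python
from collections import Counter, defaultdict

# Standard genetic code in the RNA alphabet; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group the 59 codons with synonyms by amino acid (Met, Trp, stops excluded).
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa not in "MW*":
        FAMILIES[_aa].append(_codon)


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU for every codon of each amino acid present in the sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    values = {}
    for members in FAMILIES.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in members:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(members) / total
    return values
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage; values well above or below 1 mark preferred and avoided codons, respectively, which is how a preference for A/U-ending codons shows up in such an analysis.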
The ENC values for all viral groups were consistently above 35, indicating a generally low codon usage bias, which may facilitate efficient replication, transcription, and translation during host infection []. Among them, MERSr-CoV exhibited the highest ENC values, suggesting particularly weak codon bias that could reflect distinct codon preferences and support efficient replication in vertebrate hosts. Notably, the E and M proteins of MERSr-CoV from Camelus dromedarius and Homo sapiens, as well as partial sequences of Ea-HedCoV HKU31, deviated above the standard curve, with the E protein showing the greatest deviation. This pattern, which has also been reported in recent MERS-CoV studies, indicates potential selective pressures and accelerated evolution that may affect pathogenicity and immune evasion [,]. Correlation, PR2, and neutrality analyses further revealed that natural selection predominantly shapes codon usage bias, with mutational pressure contributing to a lesser extent. Recent comparative analyses of codon usage patterns across Betacoronaviruses suggest that both mutational pressure and natural selection contribute to shaping their evolutionary trajectories. Embecoviruses (e.g., HCoV-OC43 and HCoV-HKU1) exhibit codon usage profiles indicative of long-term adaptation to human hosts, characterized by relatively weak bias predominantly governed by mutational pressure []. In contrast, Sarbecoviruses, particularly SARS-CoV-2, display progressive optimization toward human-preferred codons over the course of ongoing evolution [,].
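The standard curve mentioned above is usually the expected ENC when codon usage is determined by GC3s alone, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The Python sketch below assumes this widely used form of Wright's formula; the helper `enc_deviation` is a hypothetical name added for illustration.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC if codon usage were shaped only by GC3s composition."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must lie between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(observed_enc: float, gc3s: float) -> float:
    """Observed minus expected ENC; negative values lie below the curve."""
    return observed_enc - expected_enc(gc3s)
```

Points close to the curve are compatible with compositional constraints alone, whereas points lying away from it suggest that additional factors such as selection contribute to codon usage.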
The canonical receptor for MERS-CoV is human dipeptidyl peptidase-4 (hDPP4) []. In the phylogenetic analysis of the S and RdRp genes, our finding that MERSr-CoV sequences from humans and camels cluster closely together is consistent with this known receptor usage, as camels are the established zoonotic reservoir. Furthermore, the recently identified pangolin-derived MjHKU4r-CoV, which clusters with bat HKU4 viruses in our phylogenetic and PCA analyses, has also been shown experimentally to use hDPP4 for efficient cell entry, confirming its potential for zoonotic transmission [].
Perhaps the most significant validation of our codon-based predictions comes from the recent paradigm shift in understanding Merbecovirus receptor usage. It is now established that several Merbecovirus lineages have evolved to use angiotensin-converting enzyme 2 (ACE2) instead of DPP4. Bat-origin Merbecoviruses such as NeoCoV, PDF-2180, MOW15-22, and PnNL 2018B have been confirmed to use ACE2 as a functional receptor [,], and NeoCoV and its close relative PDF-2180 can utilize ACE2 orthologs from various bat species for entry []. More directly relevant to our findings, the HKU5 lineage, members of which were identified in farmed minks in our dataset, has been demonstrated to use ACE2 from its natural bat host (Pipistrellus abramus) and, crucially, from American mink (Neogale vison) [,]. These viruses exhibit distinct receptor usage due to significant receptor-binding domain (RBD) sequence divergence. NeoCoV/PDF-2180 retains a MERS-CoV-like RBD fold but forms a more compact ACE2-binding interface via conformational changes, relying on glycosylation sites N54 and N329, unlike SARS-CoV-2 or NL63 [,,,,]. The receptor shift results from key amino acid changes and RBD recombination, with different lineages evolving independently: NeoCoV/PDF-2180 uses bat ACE2, while HKU5-CoV-2 employs a novel interface for human ACE2, illustrating convergent evolution [,]. Codon usage bias in viral genomes may therefore be associated with the evolution of receptor-binding domains, suggesting a plausible direction for exploring viral host adaptation mechanisms.
Analysis of CAI and RCDI revealed that Merbecoviruses exhibit codon adaptation patterns in pigs, horses, and rabbits similar to those observed in their natural hosts. These findings are consistent with in vitro experiments and animal infection studies, in which MERSr-CoV has been shown to complete its replication cycle and generate infectious viral particles in primary cells or in vivo models of these species [,,]. Together, the concordance between codon adaptation metrics and experimental infection data provides multi-dimensional evidence supporting the role of pigs, horses, and rabbits as potential susceptible hosts of Merbecoviruses, warranting further investigation into their potential involvement in cross-species viral transmission.
In conclusion, our comprehensive codon usage analysis reveals the evolutionary adaptation of Merbecoviruses to a diverse range of hosts. The agreement between our genomic predictions and established functional studies on receptor tropism underscores the reliability of this approach. The identification of potential new host species, coupled with the demonstrated ability of certain Merbecovirus lineages to utilize different entry receptors—a trait potentially acquired through recombination events—significantly expands the perceived host range of these viruses and highlights an ongoing risk of cross-species transmission.
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/v17111479/s1, Figure S1: Principal Component Analysis (PCA) based on the RSCU values of 59 synonymous codons. Scatter plot of Merbecovirus genes on the plane defined by the first two principal components (PC1 and PC2); Figure S2: Principal Component Analysis (PCA) of major Merbecovirus genes based on the second and third principal components (PC2 and PC3); Figure S3: Principal Component Analysis (PCA) of major Merbecovirus genes based on the second and third principal components (PC2 and PC3); Figure S4: Heatmap of relative synonymous codon usage (RSCU) values for RdRp gene and S gene; Figure S5: Heatmap of relative synonymous codon usage (RSCU) values for E gene, M gene, and N gene; Figure S6: ENC-GC3s plot analysis of codon usage for the structural genes E, M, and N; Figure S7: Parity Rule 2 (PR2) analysis of codon usage for the E, M, and N genes; Figure S8: Neutrality plot analysis of codon usage (GC3s against GC12s) for the E, M, and N genes; Figure S9: Correlations among nucleotide composition and amino acid usage features of Merbecovirus; Figure S10: Relative abundance analysis of 16 dinucleotides in the RdRp and S genes; Figure S11: Relative abundance analysis of 16 dinucleotides in the E, M, and N genes; Table S1: Overview of Merbecovirus seven lineages and their associated host-based groups; Table S2: The results of nucleotide composition analysis in viral structural protein (S, E, M, N) and nonstructural gene (RdRp) of Merbecovirus; Table S3: The results of nucleotide composition analysis in viral structural protein (S, E, M, N) and nonstructural gene (RdRp) of host-based groups of Merbecovirus; Table S4: The relative synonymous codon usage (RSCU) patterns of 59 codons encoding 18 amino acids in RdRp and S for various lineages of Merbecovirus, compared with known and potential hosts; Table S5: The relative synonymous codon usage (RSCU) patterns in M, N, and E proteins for various lineages of Merbecovirus, compared with known and potential hosts.
Author Contributions
Conceptualization, W.-T.H.; methodology, G.Y., Y.L. and M.Z.; software, H.Z. and H.L.; validation, H.Z. and G.F.; formal analysis, G.Y. and Y.L.; investigation, H.Z.; resources, J.D. and W.-T.H.; data curation, G.Y., Y.L. and M.Z.; writing—original draft preparation, G.Y. and Y.L.; writing—review and editing, G.Y., H.Z., X.C. and W.-T.H.; visualization, Y.L.; supervision, W.-T.H.; project administration, W.-T.H.; funding acquisition, W.-T.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Open Project Program of Jiangsu Key Laboratory of Zoonosis.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Acknowledgments
We thank Meng Lu and Ruiya Liu for their constructive suggestions and for proofreading the draft manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Zaki, A.M.; Van Boheemen, S.; Bestebroer, T.M.; Osterhaus, A.D.; Fouchier, R.A. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N. Engl. J. Med. 2012, 367, 1814–1820.
- World Health Organization Regional Office for the Eastern Mediterranean (WHO-EMRO). MERS Situation Update, July 2025. 2025. Available online: https://applications.emro.who.int/docs/WHOEMCSR833E-eng.pdf (accessed on 1 July 2025).
- Woo, P.C.; Lau, S.K.; Li, K.S.; Poon, R.W.; Wong, B.H.; Tsoi, H.W.; Yip, B.C.; Huang, Y.; Chan, K.H.; Yuen, K.Y. Molecular diversity of coronaviruses in bats. Virology 2006, 351, 180–187.
- Woo, P.C.; Wang, M.; Lau, S.K.; Xu, H.; Poon, R.W.; Guo, R.; Wong, B.H.; Gao, K.; Tsoi, H.W.; Huang, Y.; et al. Comparative analysis of twelve genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features. J. Virol. 2007, 81, 1574–1585.
- Zhao, J.; Wan, W.; Yu, K.; Lemey, P.; Pettersson, J.H.; Bi, Y.; Lu, M.; Li, X.; Chen, Z.; Zheng, M.; et al. Farmed fur animals harbour viruses with zoonotic spillover potential. Nature 2024, 634, 228–233.
- Chen, J.; Yang, X.; Si, H.; Gong, Q.; Que, T.; Li, J.; Li, Y.; Wu, C.; Zhang, W.; Chen, Y.; et al. A bat MERS-like coronavirus circulates in pangolins and utilizes human DPP4 and host proteases for cell entry. Cell 2023, 186, 850–863.e16.
- Lau, S.K.P.; Luk, H.K.H.; Wong, A.C.P.; Fan, R.Y.Y.; Lam, C.S.F.; Li, K.S.M.; Ahmed, S.S.; Chow, F.W.N.; Cai, J.P.; Zhu, X.; et al. Identification of a Novel Betacoronavirus (Merbecovirus) in Amur Hedgehogs from China. Viruses 2019, 11, 980.
- Yao, Y.; Bao, L.; Deng, W.; Xu, L.; Li, F.; Lv, Q.; Yu, P.; Chen, T.; Xu, Y.; Zhu, H.; et al. An animal model of MERS produced by infection of rhesus macaques with MERS coronavirus. J. Infect. Dis. 2014, 209, 236–242.
- Falzarano, D.; de Wit, E.; Feldmann, F.; Rasmussen, A.L.; Okumura, A.; Peng, X.; Thomas, M.J.; van Doremalen, N.; Haddock, E.; Nagy, L.; et al. Infection with MERS-CoV causes lethal pneumonia in the common marmoset. PLoS Pathog. 2014, 10, e1004250.
- Kandeil, A.; Gomaa, M.; Shehata, M.; El-Taweel, A.; Kayed, A.E.; Abiadh, A.; Jrijer, J.; Moatasim, Y.; Kutkat, O.; Bagato, O.; et al. Middle East respiratory syndrome coronavirus infection in non-camelid domestic mammals. Emerg. Microbes Infect. 2019, 8, 103–108.
- Meyer, B.; García-Bocanegra, I.; Wernery, U.; Wernery, R.; Sieberg, A.; Müller, M.A.; Drexler, J.F.; Drosten, C.; Eckerle, I. Serologic assessment of possibility for MERS-CoV infection in equids. Emerg. Infect. Dis. 2015, 21, 181–182.
- Haagmans, B.L.; van den Brand, J.M.; Provacia, L.B.; Raj, V.S.; Stittelaar, K.J.; Getu, S.; de Waal, L.; Bestebroer, T.M.; van Amerongen, G.; Verjans, G.M.; et al. Asymptomatic Middle East respiratory syndrome coronavirus infection in rabbits. J. Virol. 2015, 89, 6131–6135.
- Vergara-Alert, J.; van den Brand, J.M.; Widagdo, W.; Muñoz, M.t.; Raj, S.; Schipper, D.; Solanes, D.; Cordón, I.; Bensaid, A.; Haagmans, B.L.; et al. Livestock Susceptibility to Infection with Middle East Respiratory Syndrome Coronavirus. Emerg. Infect. Dis. 2017, 23, 232–240.
- Lu, G.; Wang, Q.; Gao, G.F. Bat-to-human: Spike features determining ‘host jump’ of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. 2015, 23, 468–478.
- Pomorska-Mól, M.; Ruszkowski, J.J.; Gogulski, M.; Domanska-Blicharz, K. First detection of Hedgehog coronavirus 1 in Poland. Sci. Rep. 2022, 12, 2386.
- He, W.T.; Hou, X.; Zhao, J.; Sun, J.; He, H.; Si, W.; Wang, J.; Jiang, Z.; Yan, Z.; Xing, G.; et al. Virome characterization of game animals in China reveals a spectrum of emerging pathogens. Cell 2022, 185, 1117–1129.e1118.
- Aloor, A.; Aradhya, R.; Venugopal, P.; Gopalakrishnan Nair, B.; Suravajhala, R. Glycosylation in SARS-CoV-2 variants: A path to infection and recovery. Biochem. Pharmacol. 2022, 206, 115335.
- Spencer, P.S.; Barral, J.M. Genetic code redundancy and its influence on the encoded polypeptides. Comput. Struct. Biotechnol. J. 2012, 1, e201204006.
- Brule, C.E.; Grayhack, E.J. Synonymous Codons: Choose Wisely for Expression. Trends Genet. 2017, 33, 283–297.
- Ingvarsson, P.K. Molecular evolution of synonymous codon usage in Populus. BMC Evol. Biol. 2008, 8, 307.
- Bulmer, M. The selection-mutation-drift theory of synonymous codon usage. Genetics 1991, 129, 897–907.
- Hershberg, R.; Petrov, D.A. Selection on codon bias. Annu. Rev. Genet. 2008, 42, 287–299.
- Domingo, E.; Perales, C. Viral quasispecies. PLoS Genet. 2019, 15, e1008271.
- Mordstein, C.; Cano, L.; Morales, A.C.; Young, B.; Ho, A.T.; Rice, A.M.; Liss, M.; Hurst, L.D.; Kudla, G. Transcription, mRNA Export, and Immune Evasion Shape the Codon Usage of Viruses. Genome Biol. Evol. 2021, 13, evab106.
- Morgunov, A.S.; Babu, M.M. Optimizing membrane-protein biogenesis through nonoptimal-codon usage. Nat. Struct. Mol. Biol. 2014, 21, 1023–1025.
- Victor, M.P.; Acharya, D.; Begum, T.; Ghosh, T.C. The optimization of mRNA expression level by its intrinsic properties-Insights from codon usage pattern and structural stability of mRNA. Genomics 2019, 111, 1292–1297.
- Bu, C.; Zheng, X.; Zhao, X.; Xu, T.; Bai, X.; Jia, Y.; Chen, M.; Hao, L.; Xiao, J.; Zhang, Z.; et al. GenBase: A Nucleotide Sequence Database. Genom. Proteom. Bioinform. 2024, 22, qzae047.
- Martin, D.P.; Murrell, B.; Golden, M.; Khoosal, A.; Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 2015, 1, vev003.
- Martin, D.; Rybicki, E. RDP: Detection of recombination amongst aligned sequences. Bioinformatics 2000, 16, 562–563.
- Padidam, M.; Sawyer, S.; Fauquet, C.M. Possible emergence of new geminiviruses by frequent recombination. Virology 1999, 265, 218–225.
- Boni, M.F.; Posada, D.; Feldman, M.W. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 2007, 176, 1035–1047.
- Gibbs, M.J.; Armstrong, J.S.; Gibbs, A.J. Sister-scanning: A Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 2000, 16, 573–582.
- Smith, J.M. Analyzing the mosaic structure of genes. J. Mol. Evol. 1992, 34, 126–129.
- Holmes, E.C.; Worobey, M.; Rambaut, A. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 1999, 16, 405–409.
- Sabir, J.S.; Lam, T.T.; Ahmed, M.M.; Li, L.; Shen, Y.; Abo-Aba, S.E.; Qureshi, M.I.; Abu-Zeid, M.; Zhang, Y.; Khiyami, M.A.; et al. Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia. Science 2016, 351, 81–84.
- Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
- Nguyen, L.T.; Schmidt, H.A.; von Haeseler, A.; Minh, B.Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 2015, 32, 268–274.
- Hoang, D.T.; Chernomor, O.; von Haeseler, A.; Minh, B.Q.; Vinh, L.S. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 2018, 35, 518–522.
- Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v6: Recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024, 52, W78–W82.
- Wang, M.; Liu, Y.-S.; Zhou, J.-H.; Chen, H.-T.; Ma, L.-N.; Ding, Y.-Z.; Liu, W.-Q.; Gu, Y.-X.; Zhang, J. Analysis of codon usage in Newcastle disease virus. Virus Genes 2011, 42, 245–253.
- Subramanian, K.; Payne, B.; Feyertag, F.; Alvarez-Ponce, D. The codon statistics database: A database of codon usage bias. Mol. Biol. Evol. 2022, 39, msac157.
- Sharp, P.M.; Tuohy, T.M.; Mosurski, K.R. Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986, 14, 5125–5143.
- Wright, F. The ‘effective number of codons’ used in a gene. Gene 1990, 87, 23–29.
- Puigbò, P.; Aragonès, L.; Garcia-Vallvé, S. RCDI/eRCDI: A web-server to estimate codon usage deoptimization. BMC Res. Notes 2010, 3, 87.
- Cotten, M.; Watson, S.J.; Kellam, P.; Al-Rabeeah, A.A.; Makhdoom, H.Q.; Assiri, A.; Al-Tawfiq, J.A.; Alhakeem, R.F.; Madani, H.; AlRabiah, F.A. Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: A descriptive genomic study. Lancet 2013, 382, 1993–2002.
- Bellare, P.; Dufresne, A.; Ganem, D. Inefficient Codon Usage Impairs mRNA Accumulation: The Case of the v-FLIP Gene of Kaposi’s Sarcoma-Associated Herpesvirus. J. Virol. 2015, 89, 7097–7107.
- Hussain, S.; Shinu, P.; Islam, M.M.; Chohan, M.S.; Rasool, S.T. Analysis of Codon Usage and Nucleotide Bias in Middle East Respiratory Syndrome Coronavirus Genes. Evol. Bioinform. 2020, 16, 1176934320918861.
- Wang, W.; Zhou, L.; Ge, X.; Han, J.; Guo, X.; Zhang, Y.; Yang, H. Analysis of codon usage patterns of porcine enteric alphacoronavirus and its host adaptability. Virology 2023, 587, 109879.
- Peng, Q.; Zhang, X.; Li, J.; He, W.; Fan, B.; Ni, Y.; Liu, M.; Li, B. Comprehensive analysis of codon usage patterns of porcine deltacoronavirus and its host adaptability. Transbound. Emerg. Dis. 2022, 69, e2443–e2455.
- Kustin, T.; Stern, A. Biased Mutation and Selection in RNA Viruses. Mol. Biol. Evol. 2021, 38, 575–588.
- Ventoso, I. Codon Usage Bias in Human RNA Viruses and Its Impact on Viral Translation, Fitness, and Evolution. Viruses 2025, 17, 1218.
- Cheng, S.; Wu, H.; Chen, Z. Evolution of Transmissible Gastroenteritis Virus (TGEV): A Codon Usage Perspective. Int. J. Mol. Sci. 2020, 21, 7898.
- Rima, B.K.; McFerran, N.V. Dinucleotide and stop codon frequencies in single-stranded RNA viruses. J. Gen. Virol. 1997, 78, 2859–2870.
- Piovesan, A.; Pelleri, M.C.; Antonaros, F.; Strippoli, P.; Caracausi, M.; Vitale, L. On the length, weight and GC content of the human genome. BMC Res. Notes 2019, 12, 106.
- Dutta, R.; Buragohain, L.; Borah, P. Analysis of codon usage of severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) and its adaptability in dog. Virus Res. 2020, 288, 198113.
- Takata, M.A.; Gonçalves-Carneiro, D.; Zang, T.M.; Soll, S.J.; York, A.; Blanco-Melo, D.; Bieniasz, P.D. CG dinucleotide suppression enables antiviral defence targeting non-self RNA. Nature 2017, 550, 124–127.
- Franzo, G. SARS-CoV-2 and other human coronavirus show genome patterns previously associated to reduced viral recognition and altered immune response. Sci. Rep. 2021, 11, 10696.
- Hu, J.S.; Wang, Q.Q.; Zhang, J.; Chen, H.T.; Xu, Z.W.; Zhu, L.; Ding, Y.Z.; Ma, L.N.; Xu, K.; Gu, Y.X.; et al. The characteristic of codon usage pattern and its evolution of hepatitis C virus. Infect. Genet. Evol. 2011, 11, 2098–2102.
- Kumar, N.; Kulkarni, D.D.; Lee, B.; Kaushik, R.; Bhatia, S.; Sood, R.; Pateriya, A.K.; Bhat, S.; Singh, V.P. Evolution of Codon Usage Bias in Henipaviruses Is Governed by Natural Selection and Is Host-Specific. Viruses 2018, 10, 604.
- Huang, W.; Guo, Y.; Li, N.; Feng, Y.; Xiao, L. Codon usage analysis of zoonotic coronaviruses reveals lower adaptation to humans by SARS-CoV-2. Infect. Genet. Evol. 2021, 89, 104736.
- Maldonado, L.L.; Bertelli, A.M.; Kamenetzky, L. Molecular features similarities between SARS-CoV-2, SARS, MERS and key human genes could favour the viral infections and trigger collateral effects. Sci. Rep. 2021, 11, 4108.
- Kumar, N.; Kaushik, R.; Tennakoon, C.; Uversky, V.N.; Mishra, A.; Sood, R.; Srivastava, P.; Tripathi, M.; Zhang, K.Y.J.; Bhatia, S. Evolutionary Signatures Governing the Codon Usage Bias in Coronaviruses and Their Implications for Viruses Infecting Various Bat Species. Viruses 2021, 13, 1847.
- Ramazzotti, D.; Angaroni, F.; Maspero, D.; Mauri, M.; D’Aliberti, D.; Fontana, D.; Antoniotti, M.; Elli, E.M.; Graudenzi, A.; Piazza, R. Large-scale analysis of SARS-CoV-2 synonymous mutations reveals the adaptation to the human codon usage during the virus evolution. Virus Evol. 2022, 8, veac026.
- Gu, H.; Chu, D.K.W.; Peiris, M.; Poon, L.L.M. Multivariate analyses of codon usage of SARS-CoV-2 and other betacoronaviruses. Virus Evol. 2020, 6, veaa032.
- Lu, G.; Hu, Y.; Wang, Q.; Qi, J.; Gao, F.; Li, Y.; Zhang, Y.; Zhang, W.; Yuan, Y.; Bao, J.; et al. Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26. Nature 2013, 500, 227–231.
- Xiong, Q.; Cao, L.; Ma, C.; Tortorici, M.A.; Liu, C.; Si, J.; Liu, P.; Gu, M.; Walls, A.C.; Wang, C.; et al. Close relatives of MERS-CoV in bats use ACE2 as their functional receptors. Nature 2022, 612, 748–757.
- Ma, C.B.; Liu, C.; Park, Y.J.; Tang, J.; Chen, J.; Xiong, Q.; Lee, J.; Stewart, C.; Asarnow, D.; Brown, J.; et al. Multiple independent acquisitions of ACE2 usage in MERS-related coronaviruses. Cell 2025, 188, 1693–1710.e18.
- Madel Alfajaro, M.; Keeler, E.L.; Li, N.; Catanzaro, N.J.; Teng, I.T.; Zhao, Z.; Grunst, M.W.; Yount, B.; Schäfer, A.; Wang, D.; et al. HKU5 bat merbecoviruses engage bat and mink ACE2 as entry receptors. Nat. Commun. 2025, 16, 6822.
- Park, Y.J.; Liu, C.; Lee, J.; Brown, J.T.; Ma, C.B.; Liu, P.; Gen, R.; Xiong, Q.; Zepeda, S.K.; Stewart, C.; et al. Molecular basis of convergent evolution of ACE2 receptor utilization among HKU5 coronaviruses. Cell 2025, 188, 1711–1728.e1721.
- Ma, C.; Liu, C.; Xiong, Q.; Gu, M.; Shi, L.; Wang, C.; Si, J.; Tong, F.; Liu, P.; Huang, M.; et al. Broad host tropism of ACE2-using MERS-related coronaviruses and determinants restricting viral recognition. Cell Discov. 2023, 9, 57.
- Wu, K.; Li, W.; Peng, G.; Li, F. Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor. Proc. Natl. Acad. Sci. USA 2009, 106, 19970–19974.
- Lan, J.; Ge, J.; Yu, J.; Shan, S.; Zhou, H.; Fan, S.; Zhang, Q.; Shi, X.; Wang, Q.; Zhang, L.; et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 2020, 581, 215–220.
- Zhao, J.; Kang, M.; Wu, H.; Sun, B.; Baele, G.; He, W.T.; Lu, M.; Suchard, M.A.; Ji, X.; He, N.; et al. Risk assessment of SARS-CoV-2 replicating and evolving in animals. Trends Microbiol. 2024, 32, 79–92.
- Chen, J.; Zhang, W.; Li, Y.; Liu, C.; Dong, T.; Chen, H.; Wu, C.; Su, J.; Li, B.; Zhang, W.; et al. Bat-infecting merbecovirus HKU5-CoV lineage 2 can use human ACE2 as a cell entry receptor. Cell 2025, 188, 1729–1742.e1716.
- Hemida, M.G.; Chu, D.K.W.; Perera, R.; Ko, R.L.W.; So, R.T.Y.; Ng, B.C.Y.; Chan, S.M.S.; Chu, S.; Alnaeem, A.A.; Alhammadi, M.A.; et al. Coronavirus infections in horses in Saudi Arabia and Oman. Transbound. Emerg. Dis. 2017, 64, 2093–2103.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).