Strategies and Patterns of Codon Bias in Molluscum Contagiosum Virus

Trends associated with codon usage in molluscum contagiosum virus (MCV) and factors governing the evolution of codon usage have not been investigated so far. In this study, attempts were made to decipher the codon usage trends and discover the major evolutionary forces that influence the patterns of codon usage in MCV with special reference to sub-types 1 and 2, MCV-1 and MCV-2, respectively. Three hypotheses were tested: (1) codon usage patterns of MCV-1 and MCV-2 are identical; (2) SCUB (synonymous codon usage bias) patterns of MCV-1 and MCV-2 slightly deviate from that of human host to avoid affecting the fitness of host; and (3) translational selection predominantly shapes the SCUB of MCV-1 and MCV-2. Various codon usage indices viz. relative codon usage value, effective number of codons and codon adaptation index were calculated to infer the nature of codon usage. Correspondence analysis and correlation analysis were performed to assess the relative contribution of silent base contents and significance of codon usage indices in defining bias in codon usage. Among the tested hypotheses, only the second and third hypotheses were accepted.


Introduction
In the universal genetic code, every amino acid except tryptophan and methionine is encoded by a set of degenerate codons called synonymous codons [1,2]. Because a mutation that replaces one synonymous codon with another in a coding region does not modify the amino acid sequence, such mutations are called 'silent' [3]. Although these synonymous changes are seemingly neutral, selection among synonymous codons occurs during evolution because 'silent' changes have many effects on the functioning of a living cell [3]. As a result of selection, even though translational mechanisms are relatively conserved across organisms, patterns of synonymous codon usage (SCU) are non-random across species, resulting in species-specific SCU [4,5]. Further, usage of synonymous codons varies among genes of the same genome [6][7][8].
Intraspecies synonymous codon usage bias (SCUB) is often viewed as the result of selection because the higher the number of preferred codons, the higher the level of gene expression would be [23,27]. In contrast, mutational pressure is assumed to be the primary player in determining interspecies SCUB [1,28,29]. However, such generalizations of the driving forces behind SCUB in intraspecies and interspecies scenarios are not yet fully justified [30,31], as compositional constraints (differential nucleotide contents) of genomes are also crucial. For instance, GC-rich genomes tend to favor G and C ending codons whereas AT-rich genomes preferentially use A and T ending codons [6,32,33]. Research on SCUB in various species unveiled the role of weak selection acting at the molecular level in molecular evolution [34][35][36], and such studies produced substantial evidence for developing molecular evolutionary models based on selection rather than on the neutral model of molecular evolution [37,38]. An understanding of the differential influences of these forces on shaping SCUB in a species is of paramount importance, as it paves the way for studying the evolutionary potential of the genomic machinery of that species.
Viruses are parasites which depend on host cells to undertake key biomolecular measures of survival, such as transcription, translation and replication [39]. Viral genes are capable of altering various steps in the pathogen identification pathways of host cell [40]. Certain viruses are proposed to remain in host cell for long durations without being identified by host immune mechanisms and may follow a relaxed inexorable way of reproduction using cell's replication machinery [39]. Essentially, such long-term association in host cells can cause transformation of whole viral genome (DNA/RNA) as an integral part of the host genome (colonization), which will decide the direction of the evolution of the host. Analyses of SCUB of various viral genomes reported that the efficiency of adaptation of viral genomes to the host is directly proportional to rate of similarity of SCUB between virus and host; the more the similarity, the higher the adaptation will be [41,42]. A recent study revealed that optimum SCUB pattern of viral genome follows slight deviation from the SCUB pattern of the natural host in order to avoid excessive expression and depletion of the tRNA pool as host fitness is important for the virus to survive in the natural host/virus systems [43]. Although debatable, the concept that viruses develop unique genes and then colonize bacterial and vertebrate lineages reveals the evolutionary significance of viruses [39]. Hence, studying SCUB patterns of viral genomes will help to gain significant insights into overall viral sustenance, codon adaptability and viral pathogenesis with respect to natural and symptomatic hosts [44].
Molluscum contagiosum virus (MCV) is a double-stranded DNA virus belonging to the genus Molluscipoxvirus of the Poxviridae family [45]. Molluscum contagiosum (MC) is a self-limited skin disease caused by MCV in humans which is characterized by small but raised mollusca (lesions) on the top layer of skin [46]. High incidence of MC is limited to the pediatric population, but immunodeficient individuals and sexually active adults are also susceptible to this infectious dermatosis [47]. The disease characteristics were initially described in 1814 [48], but the viral background of the disease was discovered in 1905 [49]. Although the raised mollusca associated with this infection are observed to be self-limiting, lesion clearance may take from 6 months to as long as 5 years [50]. As no significant difference was observed between treated and untreated cases [51], no FDA-approved therapy exists for treatment [52]. In general, 'active non-intervention' is adopted as a recommended strategy in dealing with MCV infections [52]. Currently, MCV cannot be cultured in vitro, limiting the ability to investigate replication and pathogenesis [53]. Four subtypes of MCV are identified, viz., MCV-1, MCV-2, MCV-3 and MCV-4 [53]. Among these subtypes, MCV-1 causes nearly 98% of cases, particularly in children, whereas MCV-2 causes skin lesions in immunocompromised adults [53]. The double-stranded DNA genome of MCV contains 182 non-overlapping coding frames, but only half of them share homologies with other poxvirus proteins [54]. The variable region of the MCV genome hosts a number of unique genes [55]; hence, the genomic machinery of MCV is highly divergent from other mammalian chordopoxviruses [56].
Considering the unique features of MCV such as (i) restriction to humans as a significant host, (ii) a lack of a system for culture and (iii) high divergence from other poxviruses [56], continued studies of MCV are required to gain insights into viral evolution [44], pathogenesis and cellular mechanisms which control the host's response to infection [57]. The present study focused on the genomes of MCV-1 and MCV-2 because these two sub-types cause the largest share of infections among the four sub-types. As MCV uses humans as its natural host, long-term association with human cells may provide MCV a platform for its own evolution [58]. In light of the fact that MCV has unique strategies to coexist with its natural host [45], the present study is focused on testing the following three hypotheses to obtain insight into the co-evolving trend of the MCV genome with the host genome: (1) codon usage patterns of MCV-1 and MCV-2 are identical, (2) SCUB patterns of MCV-1 and MCV-2 slightly deviate from that of the human host to avoid affecting the fitness of the host, and (3) translational selection predominantly shapes the SCUB of MCV-1 and MCV-2.

Effect of Base Compositional Constraints on SCUB
Overall and site-specific base contents of coding sequences were estimated for MCV-1 and MCV-2 genomes to assess the effect of base composition in shaping SCUB. In all selected genomes of MCV-1 and MCV-2, G and C contents were higher overall than A and T contents (Figure 1), indicating that MCV is GC-rich. In the first codon position, G content was high, whereas in the second position, T content was high although overall T content was relatively low. In synonymous sites (third position), C content was high in both subtypes. Complex correlations were observed between overall and site-specific base contents in MCV-1 and MCV-2 genomes (Table 1). In both subtypes, A content was in significant negative correlation with G3, whereas A content was in positive correlation with A3 in five genomes of MCV-1. In MCV-2 genomes, A and A3 were not correlated. In all genomes, T content was in significant positive correlation with A3 and T3 and was in negative correlation with C3, G3 and GC3. Except for two genomes of MCV-1 and five genomes of MCV-2, all genomes exhibited significant negative correlation between G and T3, whereas positive correlation existed between G and G3 in all selected MCV-1 and MCV-2 genomes. In both subtypes, C content was positively correlated with C3, G3 and GC3, whereas it was negatively correlated with A3 and T3.

Relative Magnitude of Selection versus Mutation
ENC and GC3 values were calculated for coding sequences of MCV-1 and MCV-2 genomes. The mean ENC value was 45.03 ± 0.57 and the mean GC3 value was 53.308 ± 0.78. ENC values of the majority of coding sequences lay between 33 and 54 in MCV-1 and MCV-2 genomes, indicating a clear but weak bias [59]. In the ENC vs. GC3 plot, the majority of coding sequences lay considerably below the expected curve, indicating a high possibility of selection influencing SCUB (Figure 2). The Mann-Whitney two-sample test did not reveal any significant differences between intergenomic ENC values. Moreover, a strong positive correlation between ENC and GC3 values was observed in all genomes (p < 0.0001), indicating the possible role of mutation as one of the major determining factors in shaping SCUB. Among the coding sequences analyzed, a few were observed to have low SCUB (ENC ≥ 55) (Table 2).
In the neutrality plot, strong positive correlations were observed between GC12 and GC3 in seven MCV-1 genomes (Figure 3a-g), and relatively weaker negative correlations were observed between GC12 and GC3 in two MCV-1 genomes and all selected MCV-2 genomes (Figure 3h-o). These significant correlations (p ≤ 0.001) indicated the critical role of mutation in shaping SCUB in the genomes of MCV-1 and MCV-2 but with varying intensities. Among the selected MCVs, in the seven genomes of MCV-1, slopes of regression lines were close to 1, revealing that mutational pressure is highly influential in determining SCUB (Figure 3a-g) [60,61], but the narrow distribution of GC3 could be due to the effect of some amount of selection. In the remaining genomes (two MCV-1 and all selected MCV-2; Figure 3h-o), the scatter plots were widespread with relatively weaker correlations, and the slopes of regression lines were ≤0.50. This indicated that mutational pressure is relatively lower and selection pressure is relatively higher in these genomes (Figure 3h-o) when compared with that of the seven MCV-1 genomes mentioned above [44].
The PR2 bias plot revealed non-proportional usage of AT and GC at the third codon position of four-fold degenerate codons in MCV-1 and MCV-2 genomes. Frequencies of nucleotides A and T at degenerate positions (A3 and T3) were not equal to those of nucleotides G3 and C3 (Figure 4). AT bias at degenerate positions in the coding sequences of MCV-1 and MCV-2 deviated considerably from the center (bias = 0.5, i.e., A = T) relative to GC bias at degenerate positions in the four-fold degenerate codons.

Over-Represented and Under-Represented Codons
RSCU values of 59 synonymous codons of coding sequences of MCV-1 and MCV-2 were tabulated (Table 3). No strand-specific bias was observed in synonymous codon usage (Table 4). MCV-1 and MCV-2 genomes exhibited preference towards G/C ending rather than A/T ending codons for all amino acids except methionine (Met) and tryptophan (Trp), as Met and Trp are each coded by a single codon. Among the thirty codons that were under-represented (RSCU < 0.6), 29 were A/T ending and one was G ending (CGG for Arg). Of the twenty-one G/C ending codons that were over-represented (RSCU > 1.6), TTC and CAG were found to be over-represented only in MCV-2 genomes. The codon CCC was over-represented only in a single MCV-2 genome and CCG was over-represented in all genomes except two MCV-1 genomes and one MCV-2 genome (Table 3). RSCU values of only 8 codons (~13.5%) were in the range of 0.6-1.6. Analyses of dinucleotide frequencies revealed that dinucleotide contents were not randomly distributed (χ2 test; p ≤ 0.05). The CC, GG and TA dinucleotides were the most under-represented in both MCV sub-types. The dinucleotides CG and GC were over-represented in all chosen MCV-1 and MCV-2 genomes.
Among the 18 amino acids that are coded by synonymous codons, the most preferred codons for six amino acids were recognized by suboptimal isoacceptor tRNAs (GCG for Ala, CCG for Pro, ACG for Thr, TCG for Ser, CGC for Arg and ATC for Ile) in the isoacceptor tRNA pool (Table 5). The most preferred codons for the remaining 12 amino acids were recognized by the abundant isoacceptor tRNAs (Table 5).

Major Factors Influencing SCUB
No single axis could explain the majority of variations in RSCU values of coding sequences of MCV-1 and MCV-2 (Supplementary Figures S1-S8). Cumulatively, axes 1-7 accounted for more than half of the codon usage variations in both sub-types of MCV. Among the seven principal axes chosen, axis 1 in MCV-1 and MCV-2 accounted for ~24% of total variations. Axis 1 was positively correlated with G3, C3, GC3 and gene length in all chosen sub-types of MCV, whereas axis 1 was negatively correlated with A3, T3, ENC and CAI (Table 6). Most of the genes were spread along axis 1 (Supplementary Figure S1). Grouping of A/T ending codons to the left and G/C ending codons to the right of axis 1 was noticed in both MCV-1 and MCV-2 genomes. Cluster analyses revealed distinct grouping of MCV-1 and MCV-2 based on RSCU values (Figure 5).

Discussion
Deciphering genomic nucleotide composition is a prerequisite for characterization of viral genomes [62]. Nucleotide composition at third codon sites is found unequal and nonrandom between species [63,64] and identification of major determining factors of SCUB is essential for understanding viral genome evolution [65]. In this study, patterns of SCUB and various factors which influence the formation of SCUB patterns in selected individuals of MCV-1 and MCV-2 were examined in detail. Positively correlated homogeneous base contents and negatively correlated heterogeneous base contents in MCV-1 and MCV-2 indicate the major influence of mutational pressure [66]. However, correlation analyses revealed the existence of positive heterogenous correlations (T and A3; C and G3) in all selected MC viruses. Positively correlated heterogenous correlations (T and A3; C and G3) in MCV-1 and MCV-2 revealed that natural selection by host must have influenced the SCUB patterns as in viral genomes, positive correlation between heterogeneous contents and negative correlation between homogeneous contents indicate host-induced natural selection [67]. The highest occurrence of nucleotide C at silent sites confirms the fact that overall base contents of genomes determine patterns of SCUB [33,63] as MCV genomes are GC rich [45].
ENC values of the majority of genes were within the range of 33-54, which indicates the prevalence of a distinct but weak SCUB [59]. The mean ENC value of 45.03 ± 0.57 revealed a relatively stable codon usage in genomes of MCV sub-types, as ENC > 35 indicates a conserved genomic architecture [68,69]. Significant differences in intragenomic ENC (SD ≥ 5.7) and GC3 (SD ≥ 7.2) and the strong positive correlation between ENC and GC3 point to the role of base compositional constraints in shaping SCUB, as reported in large double-stranded DNA viruses [6,70]. Highly biased genes possess low ENC values (<35) [6], indicating high levels of gene expression [71]. Variola virus, a poxvirus genetically close to MCV, causes a severe systemic disease with a high immune response in humans, whereas MCV does not cause fulminant systemic disease and elicits a low immune response [45]. The low immune response to MCV infection can be attributed to the absence from the MCV genome of the highly expressed Variola virus genes that produce proteins enabling virus-host interactions [45]. The weak SCUB (low expression) of MCV genomes may be related to the ability of MCV to remain in the host for longer periods of time without eliciting a fulminant immune response. As the majority of genes lie far below the bell-shaped portion of the expected ENC curve, the assumption that G + C biased mutation pressure is the sole factor behind the SCUB patterns in MCV does not hold true [71]. Rejection of this null hypothesis, that is, that SCUB is dictated solely by GC-biased mutational pressure due to GC richness in MCV genomes, reveals the possibility of selection influencing SCUB patterns [42] in MCV-1 and MCV-2. The possible role of selection was further supported by the narrow distribution of GC3 in seven MCV-1 genomes and the low regression slopes of the remaining MCV-1 and all selected MCV-2 genomes [44].
Mean values of AT bias [A3/(A3 + T3)] and GC [G3/(G3 + C3)] bias were greater than 0.5, indicating preference of purines over pyrimidines, that is, A over T and G over C [42,72] in synonymous codons of four-fold degenerate amino acids.
The strong preference towards G/C ending codons was due to over-representation of CG/GC dinucleotides in MCV genomes. The low frequency of the GG dinucleotide resulted in the under-representation of the CGG codon for Arg. This confirms that bias in dinucleotide frequencies shapes SCUB [6,73]. The under-representation of the TA dinucleotide in MCV genomes may possibly be due to low thermal stability [74] resulting in destabilization of mRNA, coupled with sensitivity of uracil in UpA (uracil-phosphate-adenine) to cytoplasmic RNase [75], which regulates mRNA turnover in a cell [42]. Among the CG containing codons, GCG, CCG, CGC, TCG and ACG were used preferentially (RSCU > 1.5) whereas CGA, CGG and CGT were under-represented (RSCU < 0.6). The low frequencies of GG and GT dinucleotides can explain the under-representation of CGG and CGT. The low preference for CGA may be attributed to the low overall A content. These results suggest that SCUB in MCV genomes is largely influenced by dinucleotide bias, as reported previously [42,76]. Although codon usage patterns shared some common features as mentioned above, the cluster analysis (Figure 5) revealed a clear difference in RSCU patterns of MCV-1 and MCV-2, as both sub-types formed distinct clusters.
The role of translational selection in shaping SCUB in MCV can be assessed by checking whether the most preferred codons are recognized by the most abundant isoacceptor tRNAs in the isoacceptor tRNA pool [9,42]. In the selected MCV sub-types, the most preferred codons of 12 amino acids correspond to the most abundant isoacceptor tRNAs, indicating the role of translational selection [77,78]. Most of the non-optimal codon-anticodon base pairing occurred with CG dinucleotide containing codons (GCG for Ala, CCG for Pro, ACG for Thr, TCG for Ser, CGC for Arg) in MCV genomes; that is, the most preferred CG dinucleotide containing codons in MCV were translated by rare tRNAs. This can be considered a selective force that keeps the rate of translation low [79,80] in the beginning to allow proper folding of viral proteins [81] and to evade host immunity [82] by reducing the anti-viral response from the host [73]. Moreover, strong positive correlations between CAI and ENC (p < 0.0001) also indicate selection pressure, as observed in Nipah viruses [42], since the correlation between ENC and CAI reflects the relative magnitude of selection versus mutation [83]. The strong correlations between axis 1 and silent base contents (A3, T3, G3 and C3) pointed to the relative influence of mutational pressure due to compositional constraints in shaping SCUB. CAI values are associated with selection, and ENC values reveal SCUB, which can be due to either mutation or selection [42]. The strong correlation between axis 1 and these two indices (ENC and CAI) indicated the relatively high magnitude of selection over mutation in MCV genomes.
Similar to the pattern observed in MCV genomes, host cells also used G/C ending codons most preferentially [81,84]. Although both MCV and host cells preferred G/C ending codons, the non-optimal codon-anticodon base pairing of most preferred codons containing CG dinucleotides indicated that MCV genomes may follow a deliberate slight deviation from host codon usage to remain in the host for a certain period to become adapted to host for acquiring ambient 'climate' for genome evolution [39]. Viral adaptation to host in terms of codon usage is essential for the infection to be successful in human host [41] either due to coevolution of human genome along with infected viral genome or due to human genome evolution from viral genome [85].

Conclusions
This study was performed to test the veracity of following three hypotheses. First hypothesis-Codon usage patterns of MCV-1 and MCV-2 are identical: Although SCUB patterns of MCV-1 and MCV-2 shared common features, apparent intrinsic differences existed in codon usage patterns as revealed by grouping of MCV-1 and MCV-2 in cluster analysis. Thus, the first hypothesis was not accepted.
Second hypothesis-SCUB patterns of MCV-1 and MCV-2 slightly deviate from that of human host to avoid affecting the fitness of host: Although both human and MCV genomes used G/C ending codons, the most preferred codons containing CG dinucleotides were not recognized by the most abundant isoacceptor isotypes. This indicated that MCV genomes followed a slight deviation from the codon usage pattern of host cells. Thus, the second hypothesis was accepted.
Third hypothesis-Translational selection predominantly shapes the SCUB of MCV-1 and MCV-2: Findings such as the strong correlations between ENC and CAI, the strong correlations of axis 1 with ENC and with CAI, and the recognition of the majority of most preferred codons in MCV genomes by the most abundant isoacceptor isotypes in host cells indicate a dominant role of selection along with mutational pressure. Thus, the third hypothesis was also accepted.

Data Retrieval
The coding sequences (CDS) with exact initiation and termination codons of nine MCV-1 and six MCV-2 genomes were retrieved in FASTA format from GenBank database of the National Center for Biotechnology Information (NCBI). Details such as subtypes, accession numbers, country of isolation, total number of CDS, selected CDS and size of genomes are provided in Table 7. Only coding sequences of length ≥ 300 nucleotides were selected for analyses to avoid sampling errors and stochastic variations [6]. Sequences were aligned using MUSCLE algorithm [86] embedded in MEGA X [87]. For each genome, coding sequences on the plus and minus strands were grouped separately to assess strand-specific codon usage bias.
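The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the function name `is_usable_cds` is hypothetical, and the requirements of an ATG start codon, a length divisible by three, no internal stop codon and no ambiguous bases are assumptions added here; only the 300-nucleotide cutoff and the need for exact initiation and termination codons come from the text.

```python
# Hypothetical helper: only the 300-nt cutoff and the start/stop requirement
# come from the text; the other checks are assumptions for illustration.
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_usable_cds(seq, min_len=300):
    """Return True if seq looks like a complete CDS of at least min_len nt."""
    seq = seq.upper()
    if len(seq) < min_len or len(seq) % 3 != 0:
        return False
    if seq[:3] not in START_CODONS or seq[-3:] not in STOP_CODONS:
        return False
    # Codons before the terminal stop: reject any internal stop codon.
    internal = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        return False
    # Reject ambiguous bases such as N.
    return set(seq) <= set("ACGT")
```

Applying such a filter before computing codon usage indices keeps very short genes, whose codon counts are dominated by sampling noise, out of the analysis.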

Relative Synonymous Codon Usage
Relative synonymous codon usage (RSCU) is an important measure to analyze the biased usage of synonymous codons in coding a given amino acid [88]. The RSCU value of a codon which codes for a given amino acid is calculated as the ratio of the observed occurrences of that codon to the expected occurrences of the same codon provided all synonymous codons of that particular amino acid are used equally [27]. If the RSCU value of a codon is greater than 1, it indicates preferred usage over its synonymous counterparts [27,89]. If the RSCU value is less than 1, it indicates non-preferred usage, and for rare codons, RSCU values fall below 0.66 [32]. No bias is indicated if the RSCU value is 1 [27]. The RSCU value was calculated according to the equation given below [27]
$$RSCU_{mn} = \frac{F_{mn}}{\frac{1}{c_i}\sum_{k=1}^{c_i} F_{kn}}$$
where RSCU_{mn} is the relative synonymous codon usage value of the mth codon of the nth amino acid, F_{mn} is the observed frequency of the mth codon of the nth amino acid and c_i is the number of standard synonymous codons of the nth amino acid, i.e., the level of codon degeneracy.
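As an illustration of the definition above (observed count of a codon divided by the count expected under equal usage of all synonymous codons for that amino acid), a minimal Python sketch follows. The study itself computed RSCU in DAMBE; the function name `rscu` and the compact encoding of the standard genetic code are choices made here for illustration only.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks termination codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(seq):
    """Return {codon: RSCU} for the 59 codons with synonymous alternatives."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Group codons by amino acid, skipping stops, Met and Trp.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values
```

For a toy sequence with three GCG codons and one GCC codon, Ala has four synonymous codons and four observations, so GCG receives 3.0, GCC receives 1.0 and the unused GCA and GCT receive 0.0. Amino acids absent from the sequence yield NaN rather than a misleading zero.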

Dinucleotide Analysis
Dinucleotide frequencies were estimated to check whether any of the 16 possible dinucleotides are preferentially used, as dinucleotide bias is linked with SCUB [33]. The relative abundance (odds ratio) of each dinucleotide was calculated as follows [42]
$$P_{xy} = \frac{F_{xy}}{F_x F_y}$$
where F_x is the frequency of nucleotide x, F_y is the frequency of nucleotide y and F_{xy} is the frequency of dinucleotide xy. The odds ratio is thus the ratio of the observed frequency of a dinucleotide to the expected frequency of that particular dinucleotide. If the odds ratio of a given dinucleotide is above 1.25, the dinucleotide is over-represented, and if the value is below 0.78, it is under-represented [42,76].
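The odds ratio described above (observed dinucleotide frequency divided by the product of the two mononucleotide frequencies) can be sketched as follows. The function names `dinucleotide_odds` and `classify` are hypothetical; the thresholds 1.25 and 0.78 are those quoted in the text, and counting overlapping dinucleotides along a single strand is an assumption made here.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return {dinucleotide: observed/expected} for all 16 dinucleotides."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides along the sequence (an assumption here).
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    odds = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n) * (mono[y] / n)
            observed = di[x + y] / (n - 1)
            odds[x + y] = observed / expected if expected else float("nan")
    return odds


def classify(ratio, high=1.25, low=0.78):
    """Label an odds ratio using the thresholds quoted in the text."""
    if ratio > high:
        return "over"
    if ratio < low:
        return "under"
    return "normal"
```

For the toy sequence CGCGCGCG, the frequencies of C and G are both 0.5 and CG makes up four of the seven dinucleotides, giving an odds ratio of 16/7 for CG, which `classify` labels as over-represented.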

ENC vs. GC3 Plot
Effective number of codons (ENC) was calculated to assess the extent of SCUB. ENC values range from 20 (extreme bias of synonymous codon usage, i.e., one codon for one amino acid) to 61 (near uniform synonymous codon usage). The expected ENC value of a given sequence is calculated as follows [71]
$$ENC = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$
where s is the GC content at the synonymous position of codons (GC3). In the ENC vs. GC3 plot, the expected curve is a bell-shaped curve indicating the expected values of ENC (ordinate) determined solely by base composition (GC3; abscissa) as per the equation above [71]. In a biological system, for a given sequence, observed ENC values may not always follow the path of the expected curve. If observed ENC values fall on or just near the expected curve, it can be assumed that compositional constraints influence the SCUB to a great extent [89]. On the other hand, if observed ENC values fall considerably below the expected curve, it can be assumed that certain other factors (e.g., selection) must be influencing the shaping of SCUB [89]. Coding sequences having ENC values ≤ 30 are considered to be highly biased and those with ENC values ≥ 55 are considered to be less biased [59].
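To make the expected curve concrete, the sketch below evaluates the standard expected-ENC relation, ENC = 2 + s + 29/(s^2 + (1 - s)^2), with s taken as GC3 expressed as a fraction between 0 and 1. The function names `expected_enc` and `gap_below_curve` are hypothetical, and the code is meant only as an illustration of how a gene's position relative to the curve can be quantified.

```python
def expected_enc(gc3):
    """Expected ENC under mutational bias alone; gc3 is a fraction in [0, 1]."""
    s = gc3
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def gap_below_curve(observed_enc, gc3):
    """Positive values mean the gene lies below the expected curve."""
    return expected_enc(gc3) - observed_enc
```

The curve peaks at s = 0.5, where the expected ENC is 60.5, and falls to 31 and 32 at s = 0 and s = 1, respectively. At a GC3 of about 0.53, the expected value is close to 60.3, so an observed ENC of 45 would lie roughly 15 units below the curve.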

Neutrality Plot
Average GC compositions at the 1st, 2nd and 3rd codon positions were calculated. Using GC values at the 1st and 2nd positions (GC1 + GC2 = GC12; ordinate) and GC3 (abscissa), a neutrality plot was developed to assess the mutation-selection balance in framing SCUB [44]. In the scatter plot, each CDS is indicated by a dot, and a high correlation between GC12 and GC3 with a slope coefficient close to 1 indicates the role of mutation in shaping SCUB [90]. If the dots are widespread with no correlation between GC3 and GC12 and the slope coefficient tends towards 0, selection is presumed to be influencing the SCUB [6,44].
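The quantities behind a neutrality plot can be sketched as below. Two assumptions are made that the text does not settle: GC12 is taken as the mean of GC1 and GC2, as is common practice, and the regression is an ordinary least-squares fit of GC12 on GC3. The function names `gc_by_position` and `regression_slope` are hypothetical.

```python
def gc_by_position(seq):
    """Return (GC12, GC3) as fractions for one coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    gc1, gc2, gc3 = fractions
    # Assumption: GC12 is the mean of GC1 and GC2.
    return (gc1 + gc2) / 2, gc3


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With GC3 values on the x-axis and GC12 values on the y-axis, a slope near 1 would correspond to the mutation-dominated case described above, and a slope near 0 to the selection-dominated case.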

Parity Rule 2 Plot
Parity rule 2 (PR2) plot was developed to determine relative magnitude of mutation and selection in framing base composition of coding sequences [44]. In this plot, AT bias [A/(A + T)] and GC bias [G/(G + C)] are plotted on ordinate and abscissa [91]. If equal proportion of nucleotides (A = T = G = C = 0.25) is assumed, 0.5 would be the value at the center of the plot indicating that effects of mutation and selection are equal [92]. In this study, AT and GC bias at the third codon positions [A3/(A3 + T3), G3/(G3 + C3)] of four-fold degenerate amino acids of each coding sequence were plotted as PR2 biases at the synonymous positions are relatively more significant [93,94].
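A minimal sketch of the PR2 coordinates for one coding sequence is given below. Which codon families count as four-fold degenerate is an assumption here: the sketch uses the eight four-fold codon boxes of the standard code (Ala, Gly, Pro, Thr, Val and the four-fold boxes of Leu, Ser and Arg), whereas some analyses restrict this to the five purely four-fold amino acids. The names `FOURFOLD_PREFIXES` and `pr2_bias` are hypothetical.

```python
from collections import Counter

# Assumption: the eight four-fold codon boxes of the standard genetic code.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}


def pr2_bias(seq):
    """Return (AT bias, GC bias) at third positions of four-fold codons."""
    seq = seq.upper()
    third = Counter(
        seq[i + 2]
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 2] in FOURFOLD_PREFIXES
    )
    at_total = third["A"] + third["T"]
    gc_total = third["G"] + third["C"]
    at_bias = third["A"] / at_total if at_total else float("nan")
    gc_bias = third["G"] / gc_total if gc_total else float("nan")
    return at_bias, gc_bias
```

Each coding sequence then contributes one point, with GC bias on the abscissa and AT bias on the ordinate; points away from 0.5 on either axis indicate unequal usage of A versus T or G versus C at synonymous sites.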

Correspondence Analysis
Correspondence analysis (CA) was performed on 59 synonymous codons (excluding ATG for Met, TGG for Trp and the termination codons TAA, TAG and TGA) by treating each coding sequence as a 59-dimensional vector in which each dimension corresponds to the RSCU value of a codon [61,95], in order to delineate SCU variations across the genes of MCV genomes. The relative importance of each codon over each orthogonal axis is represented by the eigenvalue [96]. The total variation of codon usage was partitioned across 59 orthogonal axes in terms of the percentage of variation accounted for by each CA axis [97]. The first axis of CA explains the largest share of variation, followed by subsequent axes accounting for progressively smaller shares [97]. The number of axes used for Spearman's rank correlation analyses of the relative influence of various factors on SCUB was determined based on the condition that the selected axes together account for the majority (>50%) of codon usage variations.

Cluster Analysis
Cluster analysis was performed on the pooled RSCU values of coding sequences of MCV-1 and MCV-2 genomes to study the pattern of codon usage in subtypes of selected MCV based on grouping of subtypes in terms of codon usage [6,70]. A 15 × 59 matrix was constructed in which rows corresponded to 15 MCV strains (nine MCV-1 and six MCV-2) and columns corresponded to pooled RSCU values of 59 codons. The method employed for clustering MCV-1 and MCV-2 subtypes based on RSCU values was unweighted pair-group average clustering based on Euclidean distances [6].
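Unweighted pair-group average clustering on Euclidean distances can be sketched in pure Python as below. The study performed this step in PAST; the function name `upgma`, the nested-tuple output and the toy two-codon profiles in the usage note are illustrative assumptions, not the actual data or implementation.

```python
from itertools import combinations
from math import dist


def upgma(profiles):
    """Cluster {name: RSCU vector} by average linkage; return a nested tuple."""
    # Map each cluster (a tuple of member names) to its subtree.
    clusters = {(name,): name for name in profiles}

    def average_distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of members.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merged = (clusters.pop(a), clusters.pop(b))
        clusters[a + b] = merged
    return next(iter(clusters.values()))
```

For example, with made-up profiles such as `{"m1a": [2.0, 0.1], "m1b": [2.05, 0.1], "m2a": [1.0, 1.0], "m2b": [1.1, 1.0]}`, the two "m1" entries join first, the two "m2" entries join next, and the two pairs merge last, mirroring the kind of subtype separation a dendrogram would show. In the real analysis each vector would hold 59 pooled RSCU values.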

Statistical Analysis and Software Used
DAMBE ver 7.3.2 [98] was employed to compute overall base contents, site-specific nucleotide compositions, RSCU, ENC and codon adaptation index (CAI) values. The isoacceptor tRNA pool was identified using an online tool (GtRNAdb: Genomic tRNA database) [42]. All correlation analyses were carried out using the non-parametric Spearman rank correlation method [6,97]. Spearman rank correlation, the Mann-Whitney two-sample test and cluster analysis were performed using PAST 4.03 [99]. For all statistical analyses, the level of significance was taken as p < 0.05.
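For readers who wish to reproduce the correlation step without PAST, a minimal sketch of the Spearman rank correlation coefficient is given below. The function names `average_ranks` and `spearman_rho` are hypothetical; ties receive average ranks, the coefficient is the Pearson correlation of the ranks, and no p-value is computed, so significance testing would still require a statistics package.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Such a function could be applied, for instance, to per-gene ENC and GC3 values, or to axis 1 coordinates and CAI values, to obtain the rank correlations reported in the Results.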