Analysis of the Codon Usage Pattern of HA and NA Genes of H7N9 Influenza A Virus

Novel H7N9 influenza virus transmitted from birds to human and, since March 2013, it has caused five epidemic waves in China. Although the evolution of H7N9 viruses has been investigated, the evolutionary changes associated with codon usage are still unclear. Herein, the codon usage pattern of two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), was studied to understand the evolutionary changes in relation to host, epidemic wave, and pathogenicity. Both genes displayed a low codon usage bias, with HA higher than NA. The codon usage was driven by mutation pressure and natural selection, although the main contributing factor was natural selection. Additionally, the codon adaptation index (CAI) and deoptimization (RCDI) illustrated the strong adaptability of H7N9 to Gallus gallus. Similarity index (SiD) analysis showed that Homo sapiens posed a stronger selection pressure than Gallus gallus. Thus, we assume that this may be related to the gradual adaptability of the virus to human. In addition, the host strong selection pressure was validated based on CpG dinucleotide content. In conclusion, this study analyzed the usage of codons of two genes of H7N9 and expanded our understanding of H7N9 host specificity. This aids into the development of control measures against H7N9 influenza virus.


Introduction
Before the outbreak of H7N9 in China in 2013, H7 subtype avian influenza viruses (AIVs) mainly existed in birds and with less frequency in humans leading to mild symptoms [1]. After March 2013, H7N9 was first isolated in human [2][3][4] and so far, five epidemic waves have been well studied and defined from October of each year to September of the following year [5]. By the end of the fifth wave, H7N9 caused 1564 human cases with a mortality rate of nearly 40% according to the World Health Organization (WHO). The number of infections of the first four waves decreased almost year by year, while the number of cases in the fifth wave increased sharply from late 2016 to early 2017, up to 766 cases (http://www.who.int/influenza/human_animal_interface/HAI_Risk_Assessment/en/), with the simultaneous emergence of highly pathogenic H7N9, indicating a serious threat to public health. The main route of human H7N9 infection is transmission from poultry [6,7]. According to the WHO, most human clinical cases had been exposed to live birds, in particular, live poultry markets [6,8] which may facilitate the adaptability and transmission from birds to humans.
H7N9 AIV belongs to the influenza a virus genus of Orthomyxoviridae. It is a single segmented negative-stranded RNA enveloped virus. The genome contains eight fragments of a total length

Phylogenetic Analyses of the HA and NA Genes of H7N9
The maximum likelihood (ML) trees of HA and NA genes showed similar topology to previous studies [5] with HA highly pathogenic sequences of human clustering in almost a single branch and dispersion of highly pathogenic sequences in the NA gene ( Figure 1). Furthermore, we found a closer relationship of human with chicken compared with duck, with branches containing most duck-derived strains far from other strains. In addition, the newly added strains in 2018 clustered with the fifth wave.

Trends of Codon Usage Patterns Based on Different Classifications of HA and NA
To understand the major variations of H7N9 HA and NA, PCA was calculated according to the relative synonymous codon usage (RSCU) value. We found that the first and second axis account for 29.81% and 27.95% variations of HA, respectively, and 27.62% and 21.76% for NA. Next, we classified all the sequences in clusters. Strains categorized based on the environment, avian, and human clustered together with no effective separation among them, except for several avian strains for both NA and HA ( Figure 2). This is consistent with the evolutionary tree suggesting that they may derive from the same source. We also found that, apart from the strains of fifth wave and low pathogenicity strains, other strains often clustered together.

Trends of Codon Usage Patterns Based on Different Classifications of HA and NA
To understand the major variations of H7N9 HA and NA, PCA was calculated according to the relative synonymous codon usage (RSCU) value. We found that the first and second axis account for 29.81% and 27.95% variations of HA, respectively, and 27.62% and 21.76% for NA. Next, we classified all the sequences in clusters. Strains categorized based on the environment, avian, and human clustered together with no effective separation among them, except for several avian strains for both NA and HA ( Figure 2). This is consistent with the evolutionary tree suggesting that they may derive from the same source. We also found that, apart from the strains of fifth wave and low pathogenicity strains, other strains often clustered together. The environment, human, and avian are represented in orange, beige, and cyan, respectively. Circles are marked by dark purple, olive green, light yellow, orange, and light purple, corresponding to waves 1 to 5, respectively. Grass green corresponds to high pathogenicity.

Nucleotide Composition
We found that the highest mean mononucleotide composition of H7N9 HA and NA corresponded to nucleotide A (greater than 34%), while the remaining nucleotide values were approximately 20%. The frequency of nucleotide on the third position suggested that, in synonymous codons, A was also the highest in accordance with the mononucleotide composition. The overall AU content accounted for 60% compared with 40% of GC. For HA and NA, the highest value of the position of GC was for position 1 and the lowest value for position 3 ( Table 1). The same patters were identified when considering different classifications, different waves, hosts, and pathogenicity. Altogether, we can conclude that these two genes are biased towards the use of A bases and thus, the existence of codon usage preferences in the HA and NA genes.  The environment, human, and avian are represented in orange, beige, and cyan, respectively. Circles are marked by dark purple, olive green, light yellow, orange, and light purple, corresponding to waves 1 to 5, respectively. Grass green corresponds to high pathogenicity.

Nucleotide Composition
We found that the highest mean mononucleotide composition of H7N9 HA and NA corresponded to nucleotide A (greater than 34%), while the remaining nucleotide values were approximately 20%. The frequency of nucleotide on the third position suggested that, in synonymous codons, A was also the highest in accordance with the mononucleotide composition. The overall AU content accounted for 60% compared with 40% of GC. For HA and NA, the highest value of the position of GC was for position 1 and the lowest value for position 3 ( Table 1). The same patters were identified when considering different classifications, different waves, hosts, and pathogenicity. Altogether, we can conclude that these two genes are biased towards the use of A bases and thus, the existence of codon usage preferences in the HA and NA genes.  (Table S2), with values greater than 35, indicative of these two genes possessing a lower preference of codon usage no matter in relating to waves, hosts, or pathogenicity ( Figure 3). values greater than 35, indicative of these two genes possessing a lower preference of codon usage no matter in relating to waves, hosts, or pathogenicity ( Figure 3).

RSCU Value of HA and NA Genes
Based on RSCU analysis (Table 2), we found that optimal codons terminated in nucleotide A (11 codons in A, 5 in U, and 1 in G and C) for HA. For NA, nucleotide A was also the most commonly used base at the end of the optimal codon, followed by C, U, and G. It is worth noting that 7 and 8 for HA and NA genes of the 18 preferred synonymous codons had a value > 1.6, the highest being AGA (3.4 for HA and 2.47 for NA), which indicates they are over-represented. Moreover, almost all of the above synonymous codons were A-ended, whereas most of synonymous codons were underrepresented. Most of the low-expression synonymous codons ended in C, G, and U, with the exception of UUA and CGA, which encode Leu and Arg in HA. In addition, the codon usage results based on different categories were closely related to the results of all sequence analyses. Furthermore, we evaluated the H7N9 RSCU value compared with the host species, even though the association was almost non-existent. In particular, for the optimal synonymous codon only 3 to 5 of the 18 preferred codons were identical. The relevance was considered to be minimal. The RSCU of highly pathogenic HA and NA were identical with the host, rather than with all sequences.

RSCU Value of HA and NA Genes
Based on RSCU analysis (Table 2), we found that optimal codons terminated in nucleotide A (11 codons in A, 5 in U, and 1 in G and C) for HA. For NA, nucleotide A was also the most commonly used base at the end of the optimal codon, followed by C, U, and G. It is worth noting that 7 and 8 for HA and NA genes of the 18 preferred synonymous codons had a value > 1.6, the highest being AGA (3.4 for HA and 2.47 for NA), which indicates they are over-represented. Moreover, almost all of the above synonymous codons were A-ended, whereas most of synonymous codons were underrepresented. Most of the low-expression synonymous codons ended in C, G, and U, with the exception of UUA and CGA, which encode Leu and Arg in HA. In addition, the codon usage results based on different categories were closely related to the results of all sequence analyses. Furthermore, we evaluated the H7N9 RSCU value compared with the host species, even though the association was almost non-existent. In particular, for the optimal synonymous codon only 3 to 5 of the 18 preferred codons were identical. The relevance was considered to be minimal. The RSCU of highly pathogenic HA and NA were identical with the host, rather than with all sequences.

Factors Driving Codon Usage Bias
Factors shaping the codon usage bias of H7N9 HA and NA genes were illustrated by ENC-plot, Parity Rule 2 (PR2), and neutrality analysis. We found that the points corresponding to HA and NA genes clustered below the expected curve regardless of the classifications in ENC-plot ( Figure 4). This indicates the effect of mutational pressure on codon usage bias with natural selection being more important than other factors. Based on PR2 analysis ( Figure 5), these points were away from the origin (0.5, 0.5), indicating a bias between the effect of mutation pressure and natural selection.

Factors Driving Codon Usage Bias
Factors shaping the codon usage bias of H7N9 HA and NA genes were illustrated by ENC-plot, Parity Rule 2 (PR2), and neutrality analysis. We found that the points corresponding to HA and NA genes clustered below the expected curve regardless of the classifications in ENC-plot ( Figure 4). This indicates the effect of mutational pressure on codon usage bias with natural selection being more important than other factors. Based on PR2 analysis ( Figure 5), these points were away from the origin (0.5, 0.5), indicating a bias between the effect of mutation pressure and natural selection.   The ENC-plot analysis and PR2 analysis showed that both mutation pressure and natural selection govern the codon usage pattern. Next, to assess the extent of the mutational pressure compared with natural selection, the correlation between GC12s and GC3s was investigated by neutrality analysis. For both HA and NA, the correlation between the indexes was extremely significant (p < 0.0001). However, the coefficient of the slope was 0.001749 ± 0.4562, −0.2510 ± 0.5428, and −0.1174 ± 0.4966 for avian, human, and the environment of HA, respectively. This indicates that the contribution of natural selection was 99%, 75%, and 89%, respectively. We also found strong natural selection pressure in sequence analysis according to other classifications, with the regression slope close to 0.0 ( Figure 6). In general, although natural selection pressure had different strength for different classifications, the influence of natural selection was more dominant than mutation pressure in shaping codon usage bias of HA and NA complete sequences. The ENC-plot analysis and PR2 analysis showed that both mutation pressure and natural selection govern the codon usage pattern. Next, to assess the extent of the mutational pressure compared with natural selection, the correlation between GC12s and GC3s was investigated by neutrality analysis. For both HA and NA, the correlation between the indexes was extremely significant (p < 0.0001). However, the coefficient of the slope was 0.001749 ± 0.4562, −0.2510 ± 0.5428, and −0.1174 ± 0.4966 for avian, human, and the environment of HA, respectively. This indicates that the contribution of natural selection was 99%, 75%, and 89%, respectively. We also found strong natural selection pressure in sequence analysis according to other classifications, with the regression slope close to 0.0 ( Figure 6). In general, although natural selection pressure had different strength for different classifications, the influence of natural selection was more dominant than mutation pressure in shaping codon usage bias of HA and NA complete sequences. Figure 6. Neutrality analysis of HA and NA depicted by plotting GC3s against GC12s. The higher the slope, the greater the effect of natural selection pressure. The environment, human, and avian are represented in orange, beige, and cyan, respectively. Dark purple, olive green, light yellow, orange, and light purple correspond to waves 1 to 5, respectively. Grass green corresponds to high pathogenicity.

The HA and NA Genes of H7N9 Virus Are Highly Adapted to Gallus gallus
The adaptation of H7N9 HA and NA genes to Gallus gallus and Homo sapiens was investigated by codon adaptation index (CAI) analysis (Figure 7). Both the HA and NA genes showed higher CAI values in Gallus gallus compared with Homo sapiens. Among the three classifications, the highest values of the HA gene were environment, high pathogenicity, and wave 5 with mean ± SD values of 0.7457 ± 0.003, 0.7467 ± 0.001, and 0.747 ± 0.0019, respectively. Similar results were found for NA (Figure 7). Regarding the lowest CAI value, a different trend was identified for HA and NA. The lowest CAI value was found in the low pathogenicity classification irrespective of the gene.
Relative codon deoptimization index (RCDI) analysis was performed to explore the codon deoptimization. The value of Homo sapiens was higher than that of Gallus gallus. The RCDI value for environment (1.349 ± 0.021), high pathogenicity (1.339 ± 0.003), and wave 5 (1.340 ± 0.012) was lowest in HA. For NA, the lowest value was environment (1.378 ± 0.012), high pathogenicity (1.371 ± 0.008), and wave 5 (1.374 ± 0.011). Overall, the CAI or RCDI values of NA gene were higher than HA. Figure 6. Neutrality analysis of HA and NA depicted by plotting GC3s against GC12s. The higher the slope, the greater the effect of natural selection pressure. The environment, human, and avian are represented in orange, beige, and cyan, respectively. Dark purple, olive green, light yellow, orange, and light purple correspond to waves 1 to 5, respectively. Grass green corresponds to high pathogenicity.

The HA and NA Genes of H7N9 Virus Are Highly Adapted to Gallus gallus
The adaptation of H7N9 HA and NA genes to Gallus gallus and Homo sapiens was investigated by codon adaptation index (CAI) analysis (Figure 7). Both the HA and NA genes showed higher CAI values in Gallus gallus compared with Homo sapiens. Among the three classifications, the highest values of the HA gene were environment, high pathogenicity, and wave 5 with mean ± SD values of 0.7457 ± 0.003, 0.7467 ± 0.001, and 0.747 ± 0.0019, respectively. Similar results were found for NA ( Figure 7). Regarding the lowest CAI value, a different trend was identified for HA and NA. The lowest CAI value was found in the low pathogenicity classification irrespective of the gene.

Figure 7.
Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analysis of HA and NA. The coordinate axis was divided into two segments, and then placed the above two analyses on the same figure for observation. CAI corresponds to dark purple for avian and coffee for human. In RCDI, dark red and dark green represent Gallus gallus and Homo sapiens, respectively. Cylindrical maps are classified according to different taxonomy with SiD values as ordinates. Blue and yellow are used to represent Homo sapiens and Gallus gallus, respectively.

Strong Selection Pressure of Homo Sapiens on H7N9
Similarity index (SiD) analysis was performed to find out the effect of the overall codon usage pattern of the host on the total codon usage of the H7N9 virus. We found that Homo sapiens had a strong selection pressure on the virus compared with Gallus gallus (Figure 7). The codon similarity between host species and the varied waves, as well as pathogenicity in HA showed a gradual downward trend from wave 1 to 5 while low pathogenicity was higher than that of high pathogenicity. For NA, although there was no same downward trend as HA, wave 5 was significantly lower than the other waves, while the conclusion of pathogenicity was identical to that of HA. In general, Homo sapiens had a greater impact on H7N9. Furthermore, we also calculated the incidence of CpG dinucleotide frequencies to understand their relationship with the host. We tracked the evolution of all H7N9 strains, including the CpG content after cross-host (Figure 8). The range of CpG content of HA was 0.345 to 0.511 for Gallus gallus and 0.331 to 0.455 for Homo sapiens. The values of NA were 0.300 to 0.451 for Gallus gallus and 0.316 to 0.436 for Homo sapiens. All these values were lower than 0.78, implying that CpG was underrepresented. Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analysis of HA and NA. The coordinate axis was divided into two segments, and then placed the above two analyses on the same figure for observation. CAI corresponds to dark purple for avian and coffee for human. In RCDI, dark red and dark green represent Gallus gallus and Homo sapiens, respectively. Cylindrical maps are classified according to different taxonomy with SiD values as ordinates. Blue and yellow are used to represent Homo sapiens and Gallus gallus, respectively.

Strong Selection Pressure of Homo Sapiens on H7N9
Similarity index (SiD) analysis was performed to find out the effect of the overall codon usage pattern of the host on the total codon usage of the H7N9 virus. We found that Homo sapiens had a strong selection pressure on the virus compared with Gallus gallus (Figure 7). The codon similarity between host species and the varied waves, as well as pathogenicity in HA showed a gradual downward trend from wave 1 to 5 while low pathogenicity was higher than that of high pathogenicity. For NA, although there was no same downward trend as HA, wave 5 was significantly lower than the other waves, while the conclusion of pathogenicity was identical to that of HA. In general, Homo sapiens had a greater impact on H7N9. Furthermore, we also calculated the incidence of CpG dinucleotide frequencies to understand their relationship with the host. We tracked the evolution of all H7N9 strains, including the CpG content after cross-host (Figure 8). The range of CpG content of HA was 0.345 to 0.511 for Gallus gallus and 0.331 to 0.455 for Homo sapiens. The values of NA were 0.300 to 0.451 for Gallus gallus and 0.316 to 0.436 for Homo sapiens. All these values were lower than 0.78, implying that CpG was underrepresented.

Discussion
Influenza virus evolution is driven by genetic shift and drift [32]. H7N9 AIV originated from poultry via reassortment in 2013 and caused the highest number of human cases in the latest (fifth) wave according to the WHO. Therefore, it is urgent to analyze its genetic evolution and adaptability. Codon usage studies of the epidemic H7N9 virus in different avian hosts based on the PB2 gene have been reported [31]. These studies lay foundation for further research on the evolution of H7N9. Herein, we collected 2024 HA and 1989 NA genes sequences of all H7N9 available sequences in China from all hosts until 2019 and performed a comprehensive and systematic analysis based on host, wave, and pathogenicity.
Based on ML tree of HA and NA genes, H7N9 isolates from different waves and hosts displayed no clear dependent branch. Even if there were obvious sequence differences between exact genes of isolates with high and low pathogenicity, most of the high pathogenicity sequences clustered together. However, they shared the same branch with low pathogenicity isolates, indicating they derive from a common source as previously shown [33]. Based on codon analysis, the results of PCA were consistent with the evolutionary tree. Of note, the branch-clustering high pathogenicity strains displayed highest homology with chickens rather than other poultry animals.
Codon usage bias is common in other viruses, such as ZIKA virus [25], H3N2 CIVs [34], etc. We found that the overall AU content of HA and NA was higher than GC and the optimal codons ended with A. ENC values revealed a low-level overview among HA and NA. A higher codon usage bias is in contrast with other IAV, such as the 1918 pandemic H1N1 (52.50) [35], H3N8 EIVs (52.09) [36], and H5N1 influenza virus (almost 52.00) [37]. Moreover, the average value of the HA gene of ICV and IDV were 44.15 ± 0.92 [38], 48.3 ± 0.179 [39], respectively. It is hypothesized that a low codon bias of H7N9 AIV compared with other influenza viruse subtypes might promote effective replication by reducing competition between viruses and hosts during protein synthesis according to previous reports [40]. Hence, H7N9 had different extent of codon usage bias in the avian and human hosts with lower codon usage preference in the human than in avian host helping maintain the successful replication of the virus and possibly increase in virulence [40]. The nucleotide composition displayed an extremely higher AU content than GC, in agreement with the optimal synonymous codon on the third position. We concluded that the codon preference was impacted by composition, i.e., mutation pressure. In addition, we compared the RSCU of the virus with the host RSCU. H7N9 evolved almost exactly in the opposite direction to host RSCU. It has been reported that the usage of the same synonymous codon allows efficient translation of the virus [41]. Thus, the phenomenon observed here indicates that the translation efficiency may be reduced, while the viral protein can be correctly folded [41].
By ENC-plot and PR2 analyses, we found the effect of both mutation pressure and natural selection. However, the predominant factor in shaping the codon usage bias of specific classification

Discussion
Influenza virus evolution is driven by genetic shift and drift [32]. H7N9 AIV originated from poultry via reassortment in 2013 and caused the highest number of human cases in the latest (fifth) wave according to the WHO. Therefore, it is urgent to analyze its genetic evolution and adaptability. Codon usage studies of the epidemic H7N9 virus in different avian hosts based on the PB2 gene have been reported [31]. These studies lay foundation for further research on the evolution of H7N9. Herein, we collected 2024 HA and 1989 NA genes sequences of all H7N9 available sequences in China from all hosts until 2019 and performed a comprehensive and systematic analysis based on host, wave, and pathogenicity.
Based on ML tree of HA and NA genes, H7N9 isolates from different waves and hosts displayed no clear dependent branch. Even if there were obvious sequence differences between exact genes of isolates with high and low pathogenicity, most of the high pathogenicity sequences clustered together. However, they shared the same branch with low pathogenicity isolates, indicating they derive from a common source as previously shown [33]. Based on codon analysis, the results of PCA were consistent with the evolutionary tree. Of note, the branch-clustering high pathogenicity strains displayed highest homology with chickens rather than other poultry animals.
Codon usage bias is common in other viruses, such as ZIKA virus [25], H3N2 CIVs [34], etc. We found that the overall AU content of HA and NA was higher than GC and the optimal codons ended with A. ENC values revealed a low-level overview among HA and NA. A higher codon usage bias is in contrast with other IAV, such as the 1918 pandemic H1N1 (52.50) [35], H3N8 EIVs (52.09) [36], and H5N1 influenza virus (almost 52.00) [37]. Moreover, the average value of the HA gene of ICV and IDV were 44.15 ± 0.92 [38], 48.3 ± 0.179 [39], respectively. It is hypothesized that a low codon bias of H7N9 AIV compared with other influenza viruse subtypes might promote effective replication by reducing competition between viruses and hosts during protein synthesis according to previous reports [40]. Hence, H7N9 had different extent of codon usage bias in the avian and human hosts with lower codon usage preference in the human than in avian host helping maintain the successful replication of the virus and possibly increase in virulence [40]. The nucleotide composition displayed an extremely higher AU content than GC, in agreement with the optimal synonymous codon on the third position. We concluded that the codon preference was impacted by composition, i.e., mutation pressure. In addition, we compared the RSCU of the virus with the host RSCU. H7N9 evolved almost exactly in the opposite direction to host RSCU. It has been reported that the usage of the same synonymous codon allows efficient translation of the virus [41]. Thus, the phenomenon observed here indicates that the translation efficiency may be reduced, while the viral protein can be correctly folded [41].
By ENC-plot and PR2 analyses, we found the effect of both mutation pressure and natural selection. However, the predominant factor in shaping the codon usage bias of specific classification was natural selection. In addition, CAI analysis was used to analyze the role of natural selection deeply. Overall, the adaption of H7N9 to Gallus gallus was higher than to Homo sapiens. However, on the basis of host classification, the CAI value of Homo sapiens was higher than that of Gallus gallus. In addition, the CAI values in Homo sapiens relating to waves showed a gradually increasing tendency. This may be related to the emergence of highly pathogenic strains in the fifth wave [42] leading to a large number of human deaths. The CAI of high pathogenic strains was also expected to be higher than that of low pathogenic. Therefore, we inferred that the level of CAI might be related to the virulence of the virus to host and potential hosts, similarly to previously reported data [43]. In addition, the combination of RCDI and CAI analysis further validated the high adaptability of the virus to Gallus gallus. For SiD analysis, the strong selection pressure on Homo sapiens compared with Gallus gallus is indicative of the virus gradually adapting to Homo sapiens, involving new mutations coinciding with huge outbreaks of human infections in the fifth wave in China [44]. The lower CpG content found in human, especially for HA indicates that there is a strong selection pressure in human [45].
In general, we found that H7N9 has a low codon bias and is mainly driven by natural selection. After avian influenza virus transmitted to human, a rapid adaptation was observed in relation to codon usage bias. This information is of great significance for studying the structure and function of H7N9 HA and NA and for understanding the evolution of H7N9. More and more epidemiological surveillance should be considered due to the increasing number of human infections and deaths caused by the emergence of high pathogenic viruses.

Data Sequences
All the complete coding sequences of HA and NA gene of H7N9 virus (including viruses infecting avian and human) from China were downloaded from GenBank of National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/genbank/) and GISAID (https://www.gisaid.org/). A total of 2024 HA and 1989 NA genes were analyzed. The detailed information of strain name, collection date, and province as well as host is listed in the Supplementary Materials (Table S1).

Phylogenetic Analysis
Sequences were aligned using MAFFT (v 7.1) [46], manually adjusted, and divided into three sets according to host, wave, and high or low pathogenicity to humans. Maximum likelihood trees of HA and NA genes were reconstructed with RAxML (v8.2.10) [47] using the GTR+I+Γ nucleotide substitution model, which was inferred by ModelGenerator [48].

Correspondence Analysis
Correspondence analysis is a method of multiple vector statistics that reveals the codon usage pattern trends of genes. Each sequence is presented in 59-dimensional result using the RSCU value as a benchmark. Previous studies showed that the first two axes account for a large proportion of the total changes, indicating that they account for the main part of codon usage change [49,50]. Therefore, we selected the first two dimensions of the data as the basis for the next analysis.

Relative Synonymous Codon Usage Analysis
In order to understand the frequency at which codon is used in a synonymous codon family, the RSCU value was calculated by MEGA 7.0. The calculation formula of RSCU is as follows: where the g ij is the quantity of the ith codon of jth amino acid. The denominator is the sum of all synonymous codons encoding the amino acid, and is multiplied by the number of synonymous codons at the end [51]. If the value = 1 means that the usage frequency of the synonymous codons is equal [52].
If it is >1.0 or <1.0, it means abundant codons and less abundant codons, respectively. Two extreme values of RSCU were >1.6 and <0.6 and were treated as 'over-represented' and 'underrepresented' codons, respectively [53].

Effective Number of Codons Analysis
The effective number of codons is considered a standard method to evaluate codon usage bias [54]. The ENC values range from 20 to 61, representing the use of only one codon per amino acid and all possible synonymous codons. The formula to calculate it is as follows: where F i (i = 2, 3, 4, 6) is the average of the F i values of the i-fold degenerate amino acids. Using the formula to calculate the F i value, we obtain: where n is the total number of codon occurrences of the amino acid and n j is the total number of occurrences of the jth codon of the amino acid. The cut-off point of the ENC value is 35 [55]. When it is less than 35, it means that the gene has a strong codon preference. The larger the ENC value, the lower the codon usage bias.

ENC-Plot Analysis
The main codon usage bias driving factors are mutation pressure and natural selection [56,57] among others such as, replication, protein structure, and dinucleotide frequency [36,58]. The ENC value is plotted in the ordinate and GC3s as the abscissa for analysis. The expected ENC value was calculated as follows: ENC expected = 2 + s + 29 where 's' is the frequency G + C at the third position of synonymous codons. If the point lies on or around the standard curve, it means codon usage bias is merely constrained by mutation pressure. In contrast, if the point lies below and away from the standard curve, this means other factors besides mutation pressure drive codon bias.

Parity Rule 2 Analysis (PR2)
PR2 analysis takes [A3/(A3 + U3)] of four-codon amino acids as the ordinate and [G3/(G3 + C3)] as the abscissa and investigates the impact of mutation pressure and natural selection pressure. It takes 0.5 and 0.5 as the origin of coordinate axis. When the value is located at the origin, it is confirmed that there is no deviation between the effect of mutation pressure and natural selection [59,60].

Neutrality Analysis
Neutrality analysis was used to verify the major factors effecting the codon usage pattern, especially mutation pressure or natural selection [61]. It uses a linear relationship representing GC12s and GC3s. If the slope is 0, the effect of direct mutation pressure is not present while if the slope of the linear relationship is 1, it means mutation pressure plays a major role. The higher the slope, the greater the effect of natural selection pressure [61]. Each dot represented one sequence of H7N9 HA gene or NA gene.

Codon Adaptation Index
CAI values are generally used to predict gene expression levels according to reference host RSCU values, ranging from 0 to 1.0. The CAI value was calculated by CAIcal server (http://genomes.urv.es/ CAIcal/) [62]. The CAI value was calculated based on the reference value of the host. The higher the value, the stronger the adaptability of the corresponding host, and vice versa [63]. The reference RSCU was obtained from the Codon Usage Database (CUD) [64], in which the host species were Homo sapiens and Gallus gallus, as the existing hosts of H7N9.

Relative Codon Deoptimization Index
The codon deoptimization trend is determined by comparing the codon usage of a given coding sequence with the reference genome. The RCDI was calculated by CAIcal server (http://genomes.urv.es/CAIcal/). Contrary to CAI, the value is ≥1. The larger the value, the weaker the adaptability to the host [65,66]. The reference RSCU value of the hosts Homo sapiens and Gallus gallus was obtained from CUD (http://www.kazusa.or.jp/codon/).

Similarity Index
The Similarity index analysis is the effect of the overall codon usage pattern of the host on the codon usage of the virus. A common estimate of SiD is the cosine of the angle between A and B is:

CpG Dinucleotides Frequency
The CpG content of each strain of HA and NA gene of H7N9 was calculated using DAMBE [68]. The ratio of CpG is divided by the observed value and the expected value. As mentioned above, the expected value was also obtained. When the relative dinucleotide abundances are >1.23 or <0.78 it indicates over-represented and under-represented dinucleotides, respectively [69].
Supplementary Materials: The following are available online at http://www.mdpi.com/1422-0067/21/19/7129/s1. Table S1. Host, country, and date of HA and NA sequences used in this study. Table S2. Nucleotide composition of HA and NA sequences and relevant plotting data.