Synonymous Codon Usages as an Evolutionary Dynamic for Chlamydiaceae

The family Chlamydiaceae contains a group of obligate intracellular bacteria that can infect a wide range of hosts. The evolutionary trend of members of this family is an active research topic, because it informs our understanding of the cross-infection of these pathogens. In this study, 14 whole genomes of 12 Chlamydia species were used to investigate nucleotide, codon, and amino acid usage bias by means of synonymous codon usage values and the information entropy method. The results showed that all the studied Chlamydia spp. had A/T-rich genes, with A or T over-represented and G or C under-represented at the third codon positions, suggesting that nucleotide usage influenced synonymous codon usage. The overall codon usage trend derived from synonymous codon usage variation divides the Chlamydia spp. into four separate clusters, while amino acid usage divides the Chlamydia spp. into two clusters with some exceptions, which reflects the genetic diversity of the Chlamydiaceae family members. The overall codon usage pattern represented by the effective number of codons (ENC) was significantly positively correlated with gene GC3 content. A negative correlation exists between ENC and the codon adaptation index for some Chlamydia species. These results suggest that mutation pressure caused by nucleotide composition constraints played an important role in shaping synonymous codon usage patterns. Furthermore, the codon usage of the T3ss and Pmps gene families was adapted to that of the corresponding genome. Taken together, these analyses help our understanding of the evolutionary interactions between nucleotide, synonymous codon, and amino acid usage in genes of Chlamydiaceae family members.


Introduction
Chlamydia spp. are a group of obligate intracellular bacteria that are widely distributed throughout the world, causing a variety of diseases in humans and animals [1]. To date, 12 species have been identified in the single genus of the family Chlamydiaceae: C. trachomatis, C. pneumoniae, C. muridarum, C. suis, C. psittaci, C. pecorum, C. abortus, C. felis, C. caviae, C. avium, C. gallinacea, and C. ibidis [2,3]. Among these species, C. trachomatis and C. pneumoniae mainly cause diseases in humans, while the other species often cause animal diseases and most have zoonotic potential [4]. C. trachomatis infects the ocular and genital mucosa. It is the leading cause of preventable infectious blindness in developing countries and contributes to the most prevalent bacterial sexually transmitted diseases (STDs) throughout the world [5,6]. Urogenital infection of C. trachomatis may cause serious sequelae

Bias A/T versus G/C in Chlamydia spp. Genes
To clarify the organization of nucleotide usage at the gene level in Chlamydia spp., the nucleotide contents of genes in each species were calculated. The average contents of the four nucleotides showed similar patterns in the 12 species, namely a bias for high AT content versus low GC content (Table 1). Furthermore, nucleotide usage at third codon positions strongly influenced the organization at the gene level (Table 1). Generally, the overall nucleotide usage at the gene level had an obvious effect on the organization of nucleotide usage at third codon positions, suggesting that synonymous codon usage could be influenced by the stable organization of nucleotide usage at the gene level in Chlamydia spp. Analysis of information entropy showed an overall nucleotide usage bias derived from the four nucleotide contents (Table 1). Figure 1 shows that the nucleotide usage bias at third codon positions was generally stronger than the overall nucleotide usage bias in genes of Chlamydia spp. This result implies that mutation pressure caused by nucleotide composition played an important role in shaping synonymous codon usage in Chlamydia spp. In addition, the three biovars of C. trachomatis exhibited different nucleotide usage biases at third codon positions, with the lymphogranuloma venereum biovar strain L2/25567R significantly different from the trachoma biovar strain A-HAR-13 and the genital tract infection biovar strain E/SW3 (Figure 1). This result might suggest that mutation pressure acted as a regulator distinguishing the biovars of C. trachomatis.

Nucleotide Usage Influencing Codon Usage
We quantified bias in synonymous codon usage using relative synonymous codon usage (RSCU) values. All over-represented synonymous codons ended with A or T, while all under-represented synonymous codons ended with G or C (Table S2). These features reflect strong constraints of nucleotide composition on synonymous codon usage bias in Chlamydia. However, although nucleotide usage patterns dominated codon usage and its variation in Chlamydia, some synonymous codon usage patterns showed that nucleotide composition was not the only constraint affecting the evolutionary dynamics of Chlamydia genes. In Table S2, although four synonymous codons (CTA for Leu, ATA for Ile, AGT for Ser and CCA for Pro) ended with A or T, their RSCU values were less than 1.0 in all 12 species, suggesting that usage of these four codons was suppressed. Similar features were found in stop codon usage. In Table S3, despite the bias for stop codon TAA in all 12 Chlamydia species, stop codon TGA was used less than stop codon TAG, even though the two have identical nucleotide contents. These findings imply that translational constraints modified the usage of specific synonymous codons and stop codons.
Compositional asymmetry of nucleotide contents between the leading and lagging strands of replication is common in bacteria [37], including Chlamydiae. To quantify synonymous codon usage variation between the leading and lagging strands of each Chlamydia species, we used PCA to capture the overall codon usage pattern. The results showed different codon usage for genes located on the leading and lagging strands in most Chlamydia spp. (Figure S1), which is consistent with previous reports [38,39]. However, codon usage in C. avium and C. abortus did not show separation between the leading and lagging strands.

Genetic Diversity of Chlamydia spp. in Codon and Amino Acid Usage
To quantify overall codon usage trends from synonymous codon usage variation in Chlamydia spp., we used PCA. The first (f'1) and second (f'2) PCA axes accounted for 49.0% and 17.0% of the total codon usage variation, respectively. The three biovars of C. trachomatis almost perfectly overlapped and clustered with C. suis and C. muridarum, while C. pecorum, C. pneumoniae and C. abortus clustered together. C. caviae, C. felis, and C. psittaci formed a third cluster. Interestingly, the newly identified species C. avium, C. gallinacea and C. ibidis showed different overall codon usage trends (Figure 2). For amino acid usage, the first (f'1) and second (f'2) PCA axes accounted for 51.0% and 19.2% of the total variation, respectively. The amino acid usage patterns of C. trachomatis, C. muridarum, C. suis, C. felis, C. psittaci and C. caviae could be divided into two genetic clusters, whereas the other species had their own specific amino acid usage patterns (Figure 3). These results suggest that both codon usage and amino acid usage patterns can be regarded as evolutionary dynamics related to the balance between mutation pressure and natural selection in driving the evolution of Chlamydia spp.

Multiple Selection Forces Influencing Codon Usage Patterns in Chlamydia spp.
To identify whether gene codon usage patterns in each Chlamydia species were shaped solely by mutation pressure, by natural selection, or by both, ENC vs. GC3 content plots were constructed for each strain. The vast majority of points for each species did not fall on the expected curve but lay below it, and this below-curve scatter reflected the effects of natural selection on the genes of each species. Among the closely related species C. trachomatis, C. muridarum and C. suis, the gene-level codon usage patterns of C. trachomatis A-HAR-13 and C. muridarum Nigg were more limited than those of C. trachomatis L2/25567R, C. trachomatis E/SW3 and C. suis MD56 (Figure 4a-e). For C. pneumoniae and most of the mammal-infecting Chlamydia spp., similar codon usage patterns were observed among the corresponding genomes (Figure 4f-i,l-m). Among the three newly identified bird-infecting Chlamydia spp., the codon usage patterns of C. avium 10DC88 were more limited than those of C. gallinacea and C. ibidis (Figure 4j,k,n). To better identify the role of mutation pressure arising from gene nucleotide composition, correlations between ENC and the GC3 content of genes were calculated. ENC and GC3 content were positively correlated in all species (Table 2), suggesting that mutation pressure has a dominant role in shaping codon usage in all Chlamydia. In addition, a significant negative rank correlation (r ranging from −0.308 to −0.067) was found between CAI and ENC for all strains except C. suis MD56, C. pecorum E58 and C. pneumoniae TW-183 (Table 3), implying that the effect of codon usage bias on the codon usage pattern of the gene population was just one among several evolutionary dynamics, compared with the role of mutation pressure.

Table 3. The correlation between the codon adaptation index (CAI) and ENC of Chlamydia spp.

High Codon Usage Adaptation of T3ss and Pmps Gene Families to that of Corresponding Genome
We analyzed the extent of codon usage adaptation between the T3ss and Pmps gene families and the corresponding genome to better identify the influence of the codon usage of the gene population on these target genes, which play important roles in the life cycle of Chlamydia. As shown in Figure 5, the two gene families generally had strong codon usage adaptation to the corresponding genome (D(A,B) < 0.1) but did not follow one common pattern of codon usage across Chlamydia spp. In the closely related species C. trachomatis, C. muridarum and C. suis, the T3ss and Pmps genes had a similar adaptation of codon usage in all strains except C. suis MD56 (Figure 5a-e). The T3ss and Pmps gene families had an obviously different adaptation of codon usage in C. pneumoniae TW-183 compared with its closely related Chlamydia spp., such as C. psittaci, C. pecorum, C. abortus, C. felis, and C. caviae (Figure 5f-i,l,m). Among the three newly identified bird Chlamydia spp. (C. gallinacea, C. avium and C. ibidis), the two gene families had an obviously different adaptation of codon usage in C. gallinacea 08-1274/3 compared with the other two species (Figure 5j,k,n). Since the T3ss and Pmps genes had specific adaptations of codon usage in the corresponding genome, the strong codon usage adaptation of the two gene families to the corresponding genome implies that an evolutionary dynamic derived from genome organization influenced the codon usage pattern of genes that play an important role in the life cycle of Chlamydiae.

Discussion
Here, we provide an evolutionary insight into the relationship between nucleotide usage and codon usage in the genomes of Chlamydiaceae family members, by means of information entropy, RSCU, ENC, CAI and a similarity index of codon usage adaptation. Although mutation pressure derived from nucleotide usage was identified as the dominant evolutionary dynamic in the genomes of Chlamydia spp., other evolutionary dynamics, such as natural selection, also influenced codon usage patterns and modified the evolutionary trends of Chlamydia spp. In previous reports, GC3 content was used as a yardstick for the influence of overall nucleotide composition variation on codon usage at the gene level [31,40-43]. Because the four nucleotide bases are the building blocks of genomic organization in microorganisms, a systematic estimate of the usage patterns of all four nucleotides is better than an estimate of GC3 or AT3 content alone for showing the role of nucleotide usage patterns in the formation of synonymous codon usage patterns [44,45]. Members of Chlamydiaceae have similar genomes and share almost the same gene contents, although they infect different hosts with pathogenic diversity [46]. Quantifying the usage variation of the four nucleotides across the gene population of these bacteria therefore helps to characterize the overall nucleotide usage bias. A great deal of genetic information, such as the origin of Chlamydia spp., efficient nutrient usage, and the regulation and expression of genes in the corresponding genomes, is contained in the complex interplay between the usage variations of the four bases (A, C, T and G) [47,48]. The information entropy method can systematically display this complex weight ratio between the four bases. The nucleotide usage bias at the gene level in Chlamydia spp. reflects the role of efficient nutrient usage in strand-specific nucleotide usage and GC content.
In addition, almost all bacterial genomes exhibit nucleotide compositional asymmetry between the leading and lagging strands of replication: there is an excess of G relative to C in the leading strand and of C relative to G in the lagging strand [36]. This study, in accordance with previous studies, showed that nucleotide compositional asymmetry contributed to the codon usage bias of genes located on the leading and lagging strands in Chlamydia spp. [37,38], which demonstrates the influence of nucleotide usage bias on codon usage. Interestingly, the codon usage patterns derived from the genomes of C. abortus and C. avium differed from those of the other Chlamydia spp. in this study. This implies that gene location in the corresponding genome might serve as one evolutionary dynamic for codon usage formation in Chlamydia spp. Generally, genes in these Chlamydia genomes were AT-rich, suggesting that AT-richness could be an evolutionary response to genome reduction, in which many genes related to metabolic activities were lost. AT-richness at the gene level enables bacteria to replicate with a small amount of energy [29,32,49-51].
The nucleotide usage bias at the third codon position was generally stronger than the overall nucleotide usage bias at the gene level in all Chlamydia spp., suggesting that the dominant evolutionary dynamic acting on the codon usage pattern of Chlamydiaceae arose from nucleotide composition. Interestingly, the overall nucleotide usage bias, the nucleotide usage bias at the third codon position and the synonymous codon usage pattern each showed similar patterns across Chlamydiaceae. Currently, there are two evolutionary theories that explain why genetic code changes do not result in extinction of the species: the 'codon capture' theory and the 'ambiguous intermediate' theory [52]. For instance, Mycoplasma, a group of extracellular bacteria, have synonymous codon usage patterns that are explained by the 'codon capture' theory [53,54] because of their extreme nucleotide usage bias (AT-rich). On the other hand, Chlamydia spp., as obligate intracellular bacteria without an extreme nucleotide usage bias at the gene level, should not be governed by the 'codon capture' theory but might follow the 'ambiguous intermediate' theory.
As obligate intracellular bacteria, Chlamydiae have a developmental cycle that includes a unique intracellular stage, in which the microorganisms grow and proliferate in chlamydial inclusions inside the host cytoplasm, so that the living activities of Chlamydiae interact deeply with host cellular processes. The parasitic lifestyle of Chlamydia spp. has driven them to adapt to the harsh intracellular environment during evolution. It is accepted that co-evolution between intracellular pathogens and hosts can be driven by positive selection from the host immune response and by mutation pressure from the pathogens [55,56]. Approximately two-thirds of predicted proteins are shared across Chlamydia spp., which reflects genetic conservation and the evolutionary constraints imposed by their intracellular lifestyle and conserved developmental cycle [25,57]. Similarly, the members of Chlamydiaceae have undergone significant genomic degradation during evolution when compared with Parachlamydia acanthamoebae UWE25, a symbiont of ubiquitous protozoa, which is considered the evolutionary homolog of the last common ancestor of Chlamydia spp. and has a genome twice as large [39]. Investigating the evolutionary dynamics of this family would therefore improve our understanding of the genetic trends of Chlamydiae. Compared with the genetic diversity reflected in amino acid usage in Chlamydia spp., the evolutionary divergence reflected in codon usage can separate Chlamydia spp. into different subgroups, suggesting that codon usage plays an important role in evolutionary trends in the Chlamydiaceae family. Horizontal gene transfer in Chlamydia plays an important role in sustaining a wide range of susceptible hosts [58].
Since horizontal gene transfer can be considered an important exogenous dynamic resulting in cross-species infection by these microorganisms, can endogenous genes that play important roles in the life cycle of Chlamydia be influenced by the genome of the corresponding host bacterium? Pmps genes are considered a highly heterogeneous gene family with key biological activities in the life cycle of Chlamydiae, and T3ss effectors, which also play important roles in the life cycle, show high variation in sequence similarity within Chlamydiaceae [1,59,60]. Interestingly, T3ss and Pmps genes show strong codon usage adaptation to the genome of the corresponding host bacterium, implying that during the formation of codon usage patterns, genes with important functions in Chlamydia spp. undergo natural selection from that genome. Previous reports have pointed out that genes with important biological activities have a strong codon adaptation to their corresponding genomes, including genes for ribosomal proteins, major transcription/translation processing factors and major chaperone/degradation proteins, as well as genes encoding enzymes of fatty acid, amino acid, and nucleotide biosynthesis [51,61,62].

The Genome Data
Fourteen whole genomes of the 12 chlamydial species (three genomes for C. trachomatis) in the Chlamydiaceae family, with coding sequence annotations, were obtained from the National Center for Biotechnology Information (NCBI) GenBank database. Details of the selected strains are given in Table S1.

Nucleotide Usage Patterns by Information Entropy
To clarify the effects of nucleotide composition on codon usage patterns, the following compositional properties were calculated for the coding sequences of the 14 genomes: the overall frequency of occurrence of each nucleotide (N%, 'N' meaning any nucleotide), the frequency of each nucleotide at the third codon position (N3%) and the frequency of occurrence of G and C at the third codon position (GC3%). Following previous reports on nucleotide usage bias [32,55], we employed information entropy to reflect the overall nucleotide usage bias and the nucleotide usage bias at the third codon position.
In the entropy calculation, f_i denotes the probability (relative frequency) of a specific nucleotide i, and F_i denotes the number of occurrences of that nucleotide. The entropy value for nucleotide usage bias represents how evenly the four types of nucleotide contribute: the higher the value, the more uniform the nucleotide usage; in contrast, a lower value reflects a more biased usage of nucleotides.
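As an illustration of the entropy calculation described above, the following is a minimal sketch that assumes the standard Shannon form, H = -sum_i f_i log2 f_i with f_i = F_i / sum_j F_j. The logarithm base, the absence of normalization, and the function names `shannon_entropy` and `nucleotide_entropies` are assumptions made for illustration and are not taken from the text.

```python
import math
from collections import Counter


def shannon_entropy(symbols, alphabet="ACGT"):
    """Shannon entropy in bits of the symbols that belong to `alphabet`.

    Assumption: base-2 logarithm, no normalization (maximum is 2 bits
    for four nucleotides).
    """
    counts = Counter(s for s in symbols if s in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no countable symbols")
    # f_i = F_i / total; symbols with zero count contribute nothing
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def nucleotide_entropies(cds):
    """Return (overall entropy, third-codon-position entropy) for one CDS."""
    cds = cds.upper()
    third_positions = cds[2::3]  # every third base, starting at index 2
    return shannon_entropy(cds), shannon_entropy(third_positions)


# Hypothetical toy coding sequence, for illustration only.
overall, third = nucleotide_entropies("ATGAAATTTGCATTAGATAAA")
```

A lower third-position value than overall value corresponds to the pattern reported for Figure 1, where bias at third codon positions is stronger than the overall bias.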

Relative Synonymous Codon Usage (RSCU) Value
The RSCU values for all coding sequences of the genomes of the 12 chlamydial species were calculated to determine the characteristics of synonymous codon usage without the confounding influences of amino acid usage or gene length [63]. RSCU values close to 1.0 indicate a lack of bias for the corresponding codons, whereas RSCU values deviating from 1.0 reflect usage bias for the corresponding codons. When the RSCU value is 1.0, the corresponding synonymous codon is used equally and randomly. Furthermore, to better reflect the extent of synonymous codon usage trends, codons with RSCU values greater than 1.6 and less than 0.6 were regarded as 'over-represented' and 'under-represented', respectively [64].
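The RSCU calculation and the 1.6 / 0.6 thresholds above can be sketched as follows. This is a minimal illustration that assumes the usual definition (observed codon count divided by the mean count of all synonymous codons for the same amino acid) and the standard genetic code amino acid assignments; the function names `rscu` and `classify` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """RSCU for the 59 codons with synonyms (Met, Trp and stops excluded)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, skip the family
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values


def classify(values, high=1.6, low=0.6):
    """Label codons using the thresholds stated in the text."""
    return {
        c: "over" if v > high else "under" if v < low else "neutral"
        for c, v in values.items()
    }


# Hypothetical toy sequence: nine AAA and one AAG codon (both Lys).
labels = classify(rscu("AAA" * 9 + "AAG"))
```

In practice the counts would be pooled over all coding sequences of a genome before the ratio is taken.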

Amino Acid Usage Bias by Information Entropy
To clarify the extent of amino acid usage bias, we followed the formulation of nucleotide usage bias described above. For the amino acid usage bias of each gene, the information entropy over the frequencies of the different amino acids in a given gene was calculated as described previously [32,55], where f_i denotes the probability (relative frequency) of a specific amino acid i and F_i denotes the number of occurrences of that amino acid. There are 20 types of amino acid in total. The entropy value for amino acid usage bias ranges from 0 to 1 and represents how evenly the twenty types of amino acid contribute: the higher the value, the more uniform the amino acid usage; in contrast, a lower value reflects a more biased usage of amino acids.
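Because the text states that the amino acid entropy ranges from 0 to 1, one plausible reading is a Shannon entropy normalized by its maximum, which is equivalent to using a base-20 logarithm. The sketch below makes that assumption explicit; the normalization and the function name `amino_acid_entropy` are not taken from the text.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_entropy(protein):
    """Normalized Shannon entropy of amino acid usage, in [0, 1].

    Assumption: base-20 logarithm, so uniform usage of all 20 amino
    acids gives 1 and use of a single amino acid gives 0.
    """
    counts = Counter(a for a in protein.upper() if a in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return -sum((n / total) * math.log(n / total, 20) for n in counts.values())


# Hypothetical toy peptide, for illustration only.
value = amino_acid_entropy("MKKLLLAKKE")
```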

Genetic Diversity of Chlamydia at Synonymous Codon and Amino Acid Usages
Principal component analysis (PCA) is a multivariate statistical method that reduces data dimensionality by performing a covariance analysis of a data matrix. To assess the genetic diversity of Chlamydia at the codon level, PCA was carried out on the RSCU data of the 14 genomes of the 12 chlamydial species. To assess genetic diversity at the amino acid level, PCA was carried out on the amino acid compositions of the same genomes.
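The covariance-based PCA described above can be illustrated with a small sketch. It extracts only the first principal axis by power iteration on the covariance matrix, which is a simplification chosen to keep the example dependency-free; it is not the software used in the study, and the function name `first_principal_axis` and the toy matrix are hypothetical.

```python
import math


def first_principal_axis(rows, iterations=500):
    """Return (scores on the first axis, fraction of variance explained).

    rows: list of equal-length numeric lists (e.g. one RSCU vector per genome).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]
    total_var = sum(cov[a][a] for a in range(m))
    if total_var == 0:
        raise ValueError("data have zero variance")

    # Deterministic, non-uniform start vector. A uniform start would lie in
    # the null space of the covariance for RSCU data, because RSCU values
    # sum to a constant within each codon family.
    vec = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data")
        vec = [x / norm for x in nxt]

    eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)
    )
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return scores, eigenvalue / total_var


# Hypothetical toy matrix: three genomes, RSCU of one two-codon family.
scores, explained = first_principal_axis([[1.5, 0.5], [1.0, 1.0], [0.5, 1.5]])
```

The sign of the axis is arbitrary, so only relative positions of the scores are meaningful. The second axis could be obtained by deflating the covariance matrix and repeating the iteration.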

Codon Usage Index
To better identify the relationship between nucleotide usage and the overall codon usage bias, the effective number of codons (ENC) was used in this study [65]. ENC values range from 20 to 61 and can reflect the role of GC3 content in the overall codon usage bias. The lower the ENC value, the more biased the overall codon usage. In addition, to identify the relationship between the overall codon usage bias and protein properties, the codon adaptation index (CAI), the grand average of hydropathicity and aromaticity were used in this study. These codon indices for the coding sequences of the 14 Chlamydia genomes were calculated with the CodonW software (https://sourceforge.net/projects/codonw/).
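ENC vs. GC3 plots compare observed ENC values with the ENC expected when codon usage is determined by GC3 composition alone. The sketch below uses the commonly cited expected curve, ENC = 2 + s + 29 / (s^2 + (1 - s)^2) with s the GC3 fraction; that this exact curve was the one drawn in the study is an assumption, and the function names are hypothetical.

```python
def expected_enc(gc3):
    """Expected ENC under GC3 composition alone (gc3 as a fraction in [0, 1])."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def fraction_below_curve(genes):
    """Fraction of genes whose observed ENC lies below the expected curve.

    genes: iterable of (gc3_fraction, observed_enc) pairs.
    """
    genes = list(genes)
    if not genes:
        raise ValueError("no genes supplied")
    below = sum(1 for gc3, enc in genes if enc < expected_enc(gc3))
    return below / len(genes)


# Hypothetical (GC3, ENC) pairs, for illustration only.
share = fraction_below_curve([(0.30, 45.0), (0.30, 55.0)])
```

Genes far below the curve have fewer effective codons than composition alone would predict, which is the pattern interpreted in the results as evidence of selection.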

Similarity of Codon Usage
To quantify the extent of codon usage adaptation of Pmps genes or T3ss genes to the corresponding species, the R(A,B) index was used in this study. The R(A,B) index was calculated as follows:

$$R(A,B) = \frac{\sum_{i=1}^{59} a_i b_i}{\sqrt{\sum_{i=1}^{59} a_i^{2} \cdot \sum_{i=1}^{59} b_i^{2}}}$$

where R(A,B) is the cosine of the angle between the vectors A and B and indicates the degree of similarity between the Pmps/T3ss genes and the specific species in terms of the overall codon usage pattern, a_i is the RSCU value of a specific codon among the 59 synonymous codons in the coding sequences of Pmps or T3ss, and b_i is the RSCU value of the same codon in the corresponding species. The D(A,B) index represents the potential effect of the overall codon usage of the corresponding genome on that of the Pmps/T3ss genes, and this value ranges from zero to 1.0 [66].
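The similarity index can be sketched in a few lines. R(A,B) follows directly from the cosine definition in the text; the conversion D(A,B) = (1 - R(A,B)) / 2 is an assumption based on the form commonly used with this index and on the stated range of zero to 1.0. The function names and the toy RSCU values are hypothetical.

```python
import math


def similarity_r(a, b):
    """Cosine similarity between two RSCU profiles given as codon -> RSCU dicts."""
    codons = sorted(set(a) | set(b))
    va = [a.get(c, 0.0) for c in codons]
    vb = [b.get(c, 0.0) for c in codons]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va) * sum(y * y for y in vb))
    if norm == 0:
        raise ValueError("at least one profile is all zeros")
    return dot / norm


def distance_d(a, b):
    """Assumed form D = (1 - R) / 2; smaller D means closer codon usage."""
    return (1.0 - similarity_r(a, b)) / 2.0


# Hypothetical RSCU values for one two-codon family.
gene_family = {"AAA": 1.6, "AAG": 0.4}
genome = {"AAA": 1.5, "AAG": 0.5}
d = distance_d(gene_family, genome)
```

Under this assumed form, the D(A,B) < 0.1 threshold quoted in the results corresponds to R(A,B) > 0.8.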

Statistical Methods
One-way ANOVA was used to compare the means of two or more groups of numerical response data using SPSS 16.0 (IBM, Chicago, IL, USA) for Windows, and differences were considered significant at p < 0.05. Spearman's rank correlation was used to assess the relationships between CAI and ENC and between ENC and GC3 content for each strain.
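For readers who want to reproduce the correlation step without SPSS, Spearman's coefficient can be computed as the Pearson correlation of ranks. The sketch below is a minimal illustration with average ranks for ties; it returns only the coefficient, not a p value, and the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("one variable is constant")
    return cov / (sx * sy)


# Hypothetical per-gene GC3 fractions and ENC values, for illustration only.
rho = spearman([0.20, 0.25, 0.31, 0.40], [38.0, 41.5, 44.0, 47.2])
```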

Acknowledgments:
The authors appreciate Jianhua Zhou for his kind help in the analysis of data and organization of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.