Novel Structural Variation and Evolutionary Characteristics of Chloroplast tRNA in Gossypium Plants

Cotton is one of the most important fiber and oil crops in the world. Chloroplast genomes harbor their own genetic materials and are considered to be highly conserved. Transfer RNAs (tRNAs) act as “bridges” in protein synthesis by carrying amino acids. Currently, the variation and evolutionary characteristics of tRNAs in the cotton chloroplast genome are poorly understood. Here, we analyzed the structural variation and evolution of chloroplast tRNA (cp tRNA) based on eight diploid and two allotetraploid cotton species. We also investigated the nucleotide evolution of chloroplast genomes in cotton species. We found that cp tRNAs in cotton encoded 36 or 37 tRNAs, and 28 or 29 anti-codon types with lengths ranging from 60 to 93 nucleotides. Cotton chloroplast tRNA sequences possessed specific conservation and, in particular, the Ψ-loop contained the conserved U-U-C-X3-U. The cp tRNAs of Gossypium L. contained introns, and cp tRNAIle contained the anti-codon (C-A-U), which was generally the anti-codon of tRNAMet. The transition and transversion analyses showed that cp tRNAs in cotton species were iso-acceptor specific and had undergone unequal rates of evolution. The intergenic region was more variable than coding regions, and non-synonymous mutations have been fixed in cotton cp genomes. On the other hand, phylogeny analyses indicated that cp tRNAs of cotton were derived from several inferred ancestors with greater gene duplications. This study provides new insights into the structural variation and evolution of chloroplast tRNAs in cotton plants. Our findings could contribute to understanding the detailed characteristics and evolutionary variation of the tRNA family.


Introduction
Chloroplasts (cp) are unique oblate organelles equipped with important photosynthesis functions [1]. Besides the complete photosynthetic components, they have also been reported as active organelles participating in other diverse biomolecule processes including biosynthesis of starch, fatty acids, pigments, amino acids, etc. [2][3][4]. Some previous studies found that chloroplasts originated from cyanobacterial ancestors [5,6]. For most plants, the cp genome is a double-stranded circular unit involving four main regions: two inverted repeats (IRs) separated by a large single copy (LSC) region and a small single copy (SSC) region [7,8]. In addition, the features of non-recombination and maternal inheritance (in most angiosperms) of the cp genome provide the opportunity to employ it as material for

Structural Analysis of tRNAs
ARAGORN and tRNAScan-SE software [42,43] were employed to investigate the secondary structure of tRNA sequences of cp genomes. The default parameters of ARAGORN software were set to investigate tRNAs. The parameters of tRNAScan-SE were set as: bacterial for sequence source, default for search mode, formatted (FASTA) for query sequences, and universal for genetic code for tRNA isotype prediction.

Phylogenetic Tree Construction
The phylogenetic tree was constructed using MEGA7.0 software [47,48]. To investigate the evolution of chloroplast tRNAs, a matrix of whole tRNA sequences was created by Clustal Omega software before the phylogenetic tree was constructed. MEGA7 software was employed to turn the matrix file into MEGA file format for tree construction, in which the lowest Bayesian information criterion (BIC) was selected for the model. As a result, the Kimura2 + G + I model was found to have the lowest BIC score, 8980. 50. Thus, this model was adopted for the phylogenetic tree construction. The other related parameters were as follows: phylogeny reconstruction for analysis, maximum likelihood model, bootstrap method in phylogeny test, 1000 bootstrap replicates, nucleotide type, γ distributed with invariant sites (G + I) model, 5 discrete γ categories, partial deletion for gaps/missing data treatment, 95% site coverage cutoff, and very strong for branch swap filter.

Analysis of Disparity Index
A disparity index test of pattern heterogeneity was conducted to check the homogeneity of nucleotide substitutions and find whether all substitutions in nucleotides happened at equal rates, i.e., homogeneity in the process of evolution. The statistical parameters were as follows based on a previous study [29]: disparity index test for substitution pattern homogeneity, in sequence pairs, 10,001 Monte Carlo replications, nucleotide for substitution type, partial deletion of gaps/missing data treatment, and site coverage cutoff 95%.

Transition/Transversion Analysis
The transition and transversion rates of the tRNAs genes were analyzed according to their isotypes by MEGA 7 software [49]. The parameters were set as follows: substitution pattern estimation (ML) for analysis, automatic (neighbor-joining tree), maximum likelihood statistical method, nucleotide for substitution type, Kimura 2-parameter model, γ distributed (G) site rates, 5 discrete γ categories, partial deletion of gaps/missing data treatment, 95% of site coverage cutoff, and very strong branch swap filter.

Evolutionary Analysis of Single Nucleotide Polymorphisms
The diversity of single nucleotide polymorphisms of 8 diploid Gossypium cp genomes was calculated based on 3 regions: coding regions, introns, and intergenic spacers. The synonymous (dS) and non-synonymous (dN) substitutions of coding genes were also calculated by DnaSP v5.10 software [50]. The coding regions, introns, and intergenic regions of the studied Gossypium cp genome were extracted through Geneious 8.0.2 software and aligned manually [41].

Calculation of Mutation Rate
The rate of indel mutation (µ, per site per year) was calculated with the formula: where m is the number of sites of observed mutation, n is the total number of sites, and T is the divergence time of Gossypium. The µ value of structural mutations was calculated according to the method of Saitou and Ueda [51]. The T value was obtained through relevant published literature searches in the Fossil works database and the Cenozoic Angiosperm Database [52,53]. Additionally, the mutation rates of protein-coding genes and tRNA genes were calculated.

Duplication/Loss Analysis of tRNA Genes
To investigate duplication or loss events of the tRNAs, we used the NCBI taxonomy browser to construct the species tree of G. arboreum, G. anomalum, G. robinsonii, G. klotzschianum, G. somalense, G. longicalyx, G. bickii, G. hirsutum, G. barbadense, and G. populifolium. Additionally, the previously constructed phylogenetic tree of the tRNAs was employed as the gene tree. Subsequently, Notung 2.9 software [54,55] was used to reconcile the gene tree and the species tree and obtain the gene duplication and loss nodes.

Basic Characteristics of Cotton Chloroplast tRNAs
The results of detailed genomic analysis showed that G. arboreum, G. anomalum, G. robinsonii, G. klotzschianum, G. somalense, G. hirsutum, G. barbadense, G. longicalyx, and G. populifolium coded 37 tRNAs, respectively, while only G. bickii coded 36 tRNAs ( Table 2). The length of the chloroplast tRNAs ranged from 60 nt (tRNA Gly in G. arboreum, GCC) to 93 nt (tRNA Ser , UGA), with an average length of 76 nt (Table S1). The genomic analysis results showed that chloroplast tRNA genes of the investigated cotton plants coded 28 or 29 anti-codon types, of which only G. anomalum, G. longicalyx, and G. bickii coded 29 anti-codons. The most common anti-codons observed in cp tR-NAs were UGC-tRNA Ala , GCC-tRNA Gly , GAC-tRNA Val , ACG-tRNA Arg , CAA-tRNA Leu , GUU-tRNA Asn , CAU-tRNA Ile , GAU-tRNA Ile , and CAU-tRNA Met . Additionally, each of these genes, except GCC-tRNA Gly and CAU-tRNA Met , had two copies (Table S2). GCC (tRNA Gly ) was found with one copy in G. anomalum, G. longicalyx, and G. bickii but two copies in the other cotton species. Similarly, CAU (tRNA Met ) was found with one copy in G. bickii but two copies in the other cotton species. There were 33 tRNAs with various isoacceptors missing in the cp genome of Gossypium (Table S2). UCC-tRNA Gly was observed in G. anomalum, G. longicalyx, and G. bickii. All investigated cotton species were shown to possess at least one anti-codon type for each kind of tRNA. Furthermore, the anti-codon CAU was a typical characteristic of tRNA Met , which harbored only one type of iso-acceptor. Apart from the existence of the anti-codon CAU in tRNA Met , tRNA Ile was also observed to encode the anti-codon CAU in the cotton cp genome (Table S2).
All the tRNA gene families were analyzed by multiple sequence alignment, from which the limited conservation and consensus sequences were found in the Ψ-arm and Ψ-loop. The Ψ-arm of tRNAs was observed to contain the G-G consensus sequence, and the Ψ-loop was observed to contain the conserved sequence U-U-C-X3-U ( Table 3).
Most of the tRNAs possessed a G nucleotide at the first position of the acceptor arm, while tRNA Gln , tRNA Pro , and tRNA Val were observed to contain a U and an A (Table 3). Except for tRNA Lys , tRNA Met , tRNA Thr , tRNA Tyr , and tRNA Val , other tRNAs contained a G in the first nucleotide of the D-arm. Additionally, there was usually an A at the last position of the D-loop and ANC-loop. At the final site of the D-arm, except for tRNA Arg , tRNA Glu , tRNA Ile , tRNA Leu , tRNA Met , tRNA Ser , and tRNA Tyr , the other investigated tRNAs were found to have a C (Table 3). Additionally, we also observed that all tRNA Lys , tRNA Ala , tRNA Glu , tRNA Arg , tRNA Tyr , and a few tRNA Leu had a C-C-A tail in the 3 end.
Note: The asterisk (*****) shows the absence of conserved nucleotide consensus sequence in the respective region of the chloroplast tRNAs.

Diversification of tRNA Structure
The tRNA feature of possessing various arms and loops was responsible for protein translation. The results of structure analysis showed that the acceptor arm in cotton chloroplast tRNA contained 6 to 7 nt. Among the 369 investigated tRNA sequences, only nine were found to contain 6 nt, while the remaining 360 tRNAs (97.56%) contained 7 nt (Table S3). In addition, the D-arm was observed to contain 2 to 4 nt, among which 20 contained only 2 nt, and 110 (29.81%) contained 3 nt. The remaining tRNAs (64.50%) were observed to possess 4 nt in the D-arm. Moreover, the D-loop contained 7 to 11 nt. For all the involved tRNAs, 84 contained 7 nt in their D-loops; 69 (18.70%) contained 8; 105 (28.46%) contained 9; 40 (10.84%) contained 10; and the rest (18.98%) contained 11 nt (Table S3).

Chloroplast tRNA Contained Introns
The cotton chloroplast tRNAs had intron annotations according to previous studies. The tRNA Val of G. populifolium was observed to contain an intron in its anti-codon loop region ( Figure 1). The introns in bacterial and plant chloroplast tRNAs had conserved G-A-T-T-T and C-T-T-C-A consensus sequences ( Figure 2). Phylogenetic analysis showed that chloroplast tRNA introns grouped with the introns in cyanobacteria ( Figure 3). Introns contained in the same tRNAs of most plants (tRNA Ile and tRNA Leu ) tended to appear in the same branch. This revealed their close phylogenetic relationship. The introns in chloroplast tRNA Val of G. populifolium and Zea mays L. aggregated to the same branch ( Figure 3), showing that the introns of corn and cotton have a close phylogenetic relationship.
G-A-T-T-T and C-T-T-C-A consensus sequences ( Figure 2). Phylogenetic analysis showed that chloroplast tRNA introns grouped with the introns in cyanobacteria (Figure 3). Introns contained in the same tRNAs of most plants (tRNA Ile and tRNA Leu ) tended to appear in the same branch. This revealed their close phylogenetic relationship. The introns in chloroplast tRNA Val of G. populifolium and Zea mays L. aggregated to the same branch (Figure 3), showing that the introns of corn and cotton have a close phylogenetic relationship.

Chloroplast tRNAs with Non-Typical Features
A few unconventional tRNAs were observed in cotton cp genomes. tRNA Leu , tRNA Ser , and tRNA Tyr were observed to have a loop in the variable region ( Figure 4). In these non-typical tRNAs, the anti-codon loop harbored 7 nt with the X-U-X3-A-A consensus sequence, and the stem of the anti-codon loop contained 4 to 5 nt. The variable loop region was observed to have 2 to 8 nt for tRNA Leu , tRNA Ser , and tRNA Tyr . The stem of

Chloroplast tRNAs with Non-Typical Features
A few unconventional tRNAs were observed in cotton cp genomes. tRNA Leu , tRNA Ser , and tRNA Tyr were observed to have a loop in the variable region (Figure 4). In these non-typical tRNAs, the anti-codon loop harbored 7 nt with the X-U-X3-A-A con-

Chloroplast tRNAs with Non-Typical Features
A few unconventional tRNAs were observed in cotton cp genomes. tRNA Leu , tRNA Ser , and tRNA Tyr were observed to have a loop in the variable region ( Figure 4). In these nontypical tRNAs, the anti-codon loop harbored 7 nt with the X-U-X3-A-A consensus sequence, and the stem of the anti-codon loop contained 4 to 5 nt. The variable loop region was observed to have 2 to 8 nt for tRNA Leu , tRNA Ser , and tRNA Tyr . The stem of the variable region contained 3 to 7 nt pairs. Obviously, tRNA Ser had mainly 6 or 7 nt pairs (Figure 4). These loop structures in variable regions of tRNAs put forward the question of whether these loops play an important role in the process of protein translation of the chloroplast. (d-f) tRNA Ser ; (g) tRNA Tyr . Anti-codon loop had seven nucleotides with the conservative X-U-X3-A-A consensus sequence in these tRNAs. Loop structure of the variable region was observed to contain three to eight nucleotides from tRNA Leu , tRNA Ser , and tRNA Tyr .

Cotton Chloroplast tRNAs Were Derived from Several Evolutionary Ancestors
The phylogenetic tree of cotton and other various species' tRNA sequences from a wide range of taxonomic positions, including algae (Nostoc sp. PCC 7524), bryophytes (Dumortiera hirsuta), ferns (Psilotum nudum), gymnosperms (Pinus taeda), and angiosperms (Arabidopsis lyrata), presented three major clusters (Table 4). In all, there were 87 groups in integrated clade I, 43 groups in integrated clade II, and 7 groups in integrated clade III. In the phylogenetic tree, tRNA Ser , tRNA Leu , tRNA Arg , tRNA Val , tRNA Ile , tRNA Met , and tRNA Gln were polyphyletic, positioned in more than two integrated clades. In integrated clade I, most of tRNA Met was clustered into two polyphyletic sub-clades (containing nine groups), one embedded in the tRNA Thr group and another adjacent to tRNA Cys and tRNA Phe , while tRNA Met of cyanobacteria was independently clustered to its tRNA Arg . In integrated clade II, tRNA Leu was closely related to cyanobacterial tRNA Gln , even with its polyphyly. tRNA Trp , except the cyanobacterial one, was clustered into a monophyletic group, while the cyanobacterial one was embedded into the sister clade of the tRNA Trp group. tRNA Ile was found in all three clades. Interestingly, tRNA Leu , tRNA Ile , tRNA Gln , tRNA Phe , and tRNA Arg present in integrated clade I were also found in clade II, and the tRNAs in integrated clade III were also found in clade I ( Figure 5). Table 4. The distribution of types of tRNAs in the three major clusters of the phylogenetic tree.

Integrated
Clades Types of tRNAs (d-f) tRNA Ser ; (g) tRNA Tyr . Anti-codon loop had seven nucleotides with the conservative X-U-X3-A-A consensus sequence in these tRNAs. Loop structure of the variable region was observed to contain three to eight nucleotides from tRNA Leu , tRNA Ser , and tRNA Tyr .

Cotton Chloroplast tRNAs Were Derived from Several Evolutionary Ancestors
The phylogenetic tree of cotton and other various species' tRNA sequences from a wide range of taxonomic positions, including algae (Nostoc sp. PCC 7524), bryophytes (Dumortiera hirsuta), ferns (Psilotum nudum), gymnosperms (Pinus taeda), and angiosperms (Arabidopsis lyrata), presented three major clusters (Table 4). In all, there were 87 groups in integrated clade I, 43 groups in integrated clade II, and 7 groups in integrated clade III. In the phylogenetic tree, tRNA Ser , tRNA Leu , tRNA Arg , tRNA Val , tRNA Ile , tRNA Met , and tRNA Gln were polyphyletic, positioned in more than two integrated clades. In integrated clade I, most of tRNA Met was clustered into two polyphyletic sub-clades (containing nine groups), one embedded in the tRNA Thr group and another adjacent to tRNA Cys and tRNA Phe , while tRNA Met of cyanobacteria was independently clustered to its tRNA Arg . In integrated clade II, tRNA Leu was closely related to cyanobacterial tRNA Gln , even with its polyphyly. tRNA Trp , except the cyanobacterial one, was clustered into a monophyletic group, while the cyanobacterial one was embedded into the sister clade of the tRNA Trp group. tRNA Ile was found in all three clades. Interestingly, tRNA Leu , tRNA Ile , tRNA Gln , Genes 2021, 12, 822 9 of 17 tRNA Phe , and tRNA Arg present in integrated clade I were also found in clade II, and the tRNAs in integrated clade III were also found in clade I ( Figure 5).  D, aspartate (Asp); C, cysteine (Cys); Q, glutamine (Gln); E, glutamate (Glu); G, glycine (Gly); H, histidine (His); I, isoleucine (Ile); L, leucine (Leu); K, lysine (Lys); M, methionine (Met); F, phenylalanine (Phe); P, proline (Pro); S, serine (Ser); T, threonine (Thr); W, tryptophan (Trp); Y, tyrosine (Tyr); V, valine (Val). The green solid circle represents Gossypium; the red solid triangle represents P. taeda; the pink solid diamond represents A. lyrate; the yellow solid triangle represents P. nudum; the orange solid triangle represents D. hirsuta; and the blue solid square represents Nostoc sp. PCC 7524. Phylogenetic analysis showed the polyphyletic origin of chloroplast tRNAs.

Transition/Transversion of tRNAs
tRNAs have evolved with almost equal transition and transversion rates in spite of the small probability of transition or transversion events in tRNAs. In the present study, we found some intriguing phenomena in the substitution rate of cotton chloroplast tRNAs. A transition rate of 16.66 and transversion rate of 16.68 were found in tRNA Ala , tRNA Phe , tRNA Asp , tRNA Pro , tRNA Tyr , tRNA Val , and tRNA Ile . This showed that the transversion rate was slightly higher than the transition rate, and these tRNAs evolved with almost the same transition and transversion rates ( Figure 6, Table S4). The highest transition rate was 50.00 for tRNA Trp and the highest transversion rate was 25.00 for tRNA His . Correspondingly, the lowest transition rate (0.00) was observed for tRNA His , and the lowest transversion rate (0.00) for tRNA Trp (Figure 6). This indicated that the tRNA Trp of the cotton cp genome had experienced a high transition rate without any transversion. Similarly, tRNA His had experienced a high rate of transversion but no transition. For tRNA Lys , tRNA Arg , tRNA Met , tRNA Asn , tRNA Cys , tRNA Ser , tRNA Glu , tRNA Gly , and tRNA Leu , the transition rate was higher than the transversion rate ( Figure 6A). Additionally, the transition rate was two times higher than transversion for tRNA Arg , tRNA Cys , and tRNA Glu , which showed that the evolution of chloroplast tRNA Arg , tRNA Cys , and tRNA Glu tended toward transition rather than transversion. For tRNA Gln and tRNA Thr , the transversion rate was apparently higher than the transition rate ( Figure 6B). This indicated that these tRNA iso-acceptors experienced transversion substitutions more easily than transition. The substitution rates of overall cp tRNAs showed that the average transition rate (23.64) was greater than the transversion rate (13.18) ( Figure 6B).

Evolutionary Characteristics of Single Nucleotide Polymorphisms
The biallelic and parallel mutation SNPs of the eight diploid Gossypium cp genomes were calculated. There were 2709 SNPs in Gossypium cp genomes (Table 5). They were subdivided into coding, intron, and intergenic spacer regions for further analyses. Among the 2709 SNPs, 906 were in coding regions, 299 were in intron regions, and 1504 were in intergenic spacers. The percentage of SNPs to total length was 1.14, 1.39, and 2.90%, respectively. In coding regions, the overall ratio of nonsynonymous mutations to synonymous mutations (dN/dS) was 3.04.

Mutation Rate of Chloroplast Genome
The divergence time of Gossypium was speculated to be 9.8 Mya from Theobroma [52,53]. The total length of the eight diploid cotton cp genomes was 156,796 bp. The mutation rate (µ 1 ) of the protein-coding genes and the mutation rate (µ 2 ) of the tRNA genes were calculated using the length of genomes, the number of indel mutations (522 for protein-coding genes and 133 for tRNA genes), and the divergent time. The value of µ 1 of 0.34 × 10 -9 per site per year for protein-coding genes and the value of µ 2 of 0.09 × 10 -9 per site per year for tRNA genes were obtained, respectively. the cotton cp genome had experienced a high transition rate without any transversion. Similarly, tRNA His had experienced a high rate of transversion but no transition. For tRNA Lys , tRNA Arg , tRNA Met , tRNA Asn , tRNA Cys , tRNA Ser , tRNA Glu , tRNA Gly , and tRNA Leu , the transition rate was higher than the transversion rate ( Figure 6A). Additionally, the transition rate was two times higher than transversion for tRNA Arg , tRNA Cys , and tRNA Glu , which showed that the evolution of chloroplast tRNA Arg , tRNA Cys , and tRNA Glu tended toward transition rather than transversion. For tRNA Gln and tRNA Thr , the transversion rate was apparently higher than the transition rate ( Figure 6B). This indicated that these tRNA iso-acceptors experienced transversion substitutions more easily than transition. The substitution rates of overall cp tRNAs showed that the average transition rate (23.64) was greater than the transversion rate (13.18) ( Figure 6B).

Evolutionary Characteristics of Single Nucleotide Polymorphisms
The biallelic and parallel mutation SNPs of the eight diploid Gossypium cp genomes were calculated. There were 2709 SNPs in Gossypium cp genomes (Table 5). They were subdivided into coding, intron, and intergenic spacer regions for further analyses. Among the 2709 SNPs, 906 were in coding regions, 299 were in intron regions, and 1504 were in intergenic spacers. The percentage of SNPs to total length was 1.14, 1.39, and

tRNA Duplication/Loss Events
Besides substitution, duplication and loss events of genes had a vital influence on their evolution. The analysis of duplication or loss events indicated that the investigated cotton chloroplast tRNA genes had experienced 226 duplication and 93 co-duplication events ( Figure S1). However, only 63 loss events were observed ( Figure S1, Table S5). The results showed that the duplication of genes was greater than the loss of genes for all of the tRNAs.

Discussion
The nucleotide composition of tRNA is closely related to its senior structure, which is responsible for the translation process. In many species, the tRNA family is conserved in evolution [56,57]. As one of the major gene components of semi-autonomous chloroplast, tRNAs were shown to have several basic conserved genomic features. tRNA Leu and tRNA Ser were observed to have 80 nt or more. These two tRNA isotypes were also found to harbor more than 83 nt in Adoxaceae plants, which shows that the gene sequences of tRNA Ser and tRNA Leu are longer than other tRNAs [58]. In addition, some tRNA isoacceptors were not observed in the cp genome of Gossypium, which was similar to that of the species in Gramineae [28]. Additionally, this is perhaps related to the codon usage and wobble base pairing in the genomic constitution of plants [59]. Furthermore, selenocysteine and suppressor tRNAs were found lacking in the cotton cp genome, which were present in Oryza sativa, Sachharum officinarum L., Sorghum bicolor (L.) Moench, Triticum aestivum L., and Zea mays. This may be related to the biotoxin accumulation ability of these two tRNAs [29]. The differences between the quantity (36 or 37) and anti-codon types (28 or 29) of tRNAs in diploid and allotetraploid cotton species were not significant. This may be related to the stability and conservation of tRNA genes and the long cultivation of cotton species, which reduced the level of interspecific diversity [60,61].
The tRNA family was conserved in its genomic composition. In this study, most of the involved tRNAs were observed to harbor the conservative sequence U-U-C-X3-U at their Ψ-loop (Table 3). Additionally, in Adoxaceae, U-U-C-A conservative nucleotides at its Ψ-loop region were found [58]. This showed that the short consensus sequences might be important components of the molecular recognizer in the Ψ-loop, and are associated with the recognition of ribosomal RNA during the translation process [62]. In our study, we found that the acceptor arm of cotton chloroplast tRNA contained 6 to 7 nt; the D-arm contained 2 to 4 nt; the D-loop contained 7 to 11 nt; the anti-codon loops contained 7 and 9 nt; the Ψ-arms contained 5 nt; and the Ψ-loops had 7 nt. Additionally, in gymnosperm chloroplast genomes, the acceptor arm of tRNAs harbors 6 bp to 7 bp; the D-arm has 3 bp or 4 bp; and the D-loop contains 7 nt to 11 nt; the anti-codon loop contains 7 nt; the Ψ-arm contains 5 bp; and the Ψ-loop has 7 nt [63]. Our results and previous findings together suggest that chloroplast RNAs are significantly conserved, though a few tRNAs contain rare secondary structures [64,65]. In the present study, some tRNA Leu , tRNA Ser , and tRNA Tyr were found in the variable region including a loop structure. The presence of this unconventional structure in the variable region exhibited the structural variation existing in tRNAs and this might be relevant to the maintenance of tRNA structures and the interaction with the D-arm and Ψ-arm [66].
Introns are previously reported in archaeal and eukaryotic genomes that break the continuity of numerous eukaryotic genes [67]. Here, the intron was observed in chloroplast tRNAs of G. populifolium. This is consistent with previous reports in other organisms [68,69]. In general, the group I intron has a few hundred nucleotides because of its built-in ribozyme unit [70]. tRNAs contain sequences of less than 100 polynucleotides that fold into a clovertype secondary structure [71]. Thus, the intron we observed in this study might be a part of intron I that was contained within the cotton chloroplast genome.
Phylogenetic analysis revealed that the introns in chloroplast tRNA Val of Gossypium populifolium and Zea mays aggregated to the same branch (Figure 3), suggesting that the introns of corn and cotton have a close phylogenetic relationship. Additionally, chloroplast tRNAs with introns were grouped with cyanobacteria, providing supportive evidence for a common cyanobacterial lineage source of cp tRNAs [58].
In addition to the anti-codon CAU of tRNA Met , tRNA Ile was observed to code the anticodon CAU in the cotton chloroplast genome. This may be associated with the modification of tRNA [72]. Three types of tRNAs-tRNA fMet , tRNA Met , and tRNA kIle , with anti-codon CAU-were found in a bacterial genome, and tRNA kIle was able to identify the codon of isoleucine instead of methionine after anti-codon modification [73]. For the tRNA Ile CAU observed in the genomes of plants, the first position in the anti-codon of these tRNAs has a lysidine-like nucleotide. On the other hand, a methionine-discerning anti-codon CAT was found in the genes. After the modification of the C residue of the CAT anti-codon, the mature tRNA shows isoleucine-identifying activity rather than methionine-identifying activity [74,75].
The presence of three obvious clusters and diverse groupings, and the appearance of tRNA groups from different plants with a wide range of taxonomic positions in integrated clade I and II, clade I and III, and clade I, II, and III of the phylogenetic tree, indicate frequent duplication and divergence during their evolution. In addition, tRNA Ser , tRNA Leu , tRNA Arg , tRNA Val , tRNA Ile , tRNA Met , and tRNA Gln were found in more than two integrated clades. This indicates their multiple evolutionary origins. Diverse groupings and the overlapping of tRNA groups from different plants suggest that the tRNAs have several inferred ancestors, including tRNA Met , tRNA Ile , etc., in the evolutionary history [63]. The interlaced emergence of tRNA groups (tRNA Met , tRNA Thr , tRNA Arg , tRNA Trp , tRNA Val , and tRNA Ser ) also suggests that evolutionary relationships among these tRNA types are relatively close.
Just as the cases with the most genetic variations, the transition rate was greater than the transversion rate of Gossypium cp tRNAs at the overall level. However, tRNA Trp and tRNA His showed different substitutive choices, which may be caused by various factors, such as their neighbor bases and the efficiency of the repair system of DNA strands [76]. The percentage of SNPs in coding regions, intron regions, and intergenic spacers compared to the total cotton cp genomes was 1.14, 1.39, and 2.90%, respectively, which implies that the density of SNPs was different in the cotton cp genome and the intergenic spacer of Gossypium was more variable than intron regions. This may be associated with the richness of A/T and G/C repetitive units in certain regions of the cp genome. It also implies that the evolutionary mutation potential of different positions in the genome is unbalanced [77]. In coding regions, the overall ratio of non-synonymous to synonymous mutations (dN/dS) was 3.04, showing that non-synonymous mutations had been fixed in the cotton cp genome. In the involved diploid Gossypium cp genomes, the mutation rate (µ 1 ), 0.34 × 10 -9 per site per year of protein-coding genes, was significantly higher than the mutation rate (µ 2 ), 0.09 × 10 -9 per site per year of tRNA genes (more than three times than that of µ 2 ), which also implies that compared with protein-coding genes, tRNA genes and structures have considerable evolutionary stability [78]. In addition, it should be mentioned that the genomic data employed in our study still have limitations. Perhaps the addition of more diverse gene sequences would provide more interesting results.
In concert with multiple factors of tRNA evolution, most important gene functions have evolved from gene duplication events [79,80]. In this study, there were about five times as many duplication events as loss events. Many studies have also shown that frequent duplication greatly promotes evolution and functional diversification in gene families [81,82]. This may provide further insights to confirm that the evolution of cotton tRNAs derived from polyphyletic evolutionary ancestors.

Conclusions
Cotton chloroplast tRNAs encode 28 or 29 anti-codon types, and 36 or 37 anti-codonspecific tRNAs with lengths ranging from 60 (tRNA Gly ) to 93 nucleotides (tRNA Ser ). Thirtythree anti-codon types including AGC-tRNA Ala are absent in the cp genome of Gossypium. The CAU anti-codon is encoded in both tRNA Met and tRNA Ile . The acceptor arm of cotton chloroplast tRNA contains 6 to 7 nt; the D-arm contains 2 to 4 nt; the D-loop contains 7 to 11 nt; the anti-codon loop contains 7 and 9 nt; the Ψ-arms contains 5 nt; and the Ψ-loop has 7 nt. The Ψ-arm of cotton chloroplast tRNAs contains the G-G consensus sequence, and the Ψ-loop contains the conserved U-U-C-X3-U motifs. Additionally, cotton chloroplast tRNAs were found to contain introns and a few tRNA Leu , tRNA Ser , and tRNA Tyr were observed to have a loop in the variable region. Furthermore, phylogenetic analysis suggests that tRNAs possibly have several inferred ancestors, including tRNA Met , tRNA Ile , etc., in the evolutionary history. On the other hand, the average transition rate of all involved cp tRNAs was greater than their transversion rate. The density of SNPs was unbalanced and the intergenic spacer of Gossypium was more variable than intron regions. Gene duplication events (226 duplication and 93 co-duplication) have occurred more frequently than gene loss events (63) in cotton chloroplast tRNAs. These results provide helpful insights into the detailed characteristics and evolutionary variation of the tRNA family.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12060822/s1. Figure S1: Duplication and loss events of the chloroplast tRNAs. Blue: duplication events; gray: loss events; D: duplication node; cD: conditional duplication node. Analysis indicates that the chloroplast tRNAs underwent frequent duplication events during their evolution with subsequent diversification, Table S1: Information registration of the analyzed tRNA genes, Table S2: Distribution of anti-codons in cotton chloroplast genome, Table S3: Nucleotide composition in acceptor arm (AA), D-arm (DA), D-loop (DL), anti-codon arm (ACA), anti-codon loop (ACL), variable loop (VL), pseudouridine arm (ΨA), and pseudouridine loop (ΨL) of chloroplast tRNA, Table S4: Results of transition and transversion rates of chloroplast tRNAs, Table S5: Loss events of the chloroplast tRNAs.