The Complete Chloroplast Genome of Two Important Annual Clover Species, Trifolium alexandrinum and T. resupinatum: Genome Structure, Comparative Analyses and Phylogenetic Relationships with Relatives in Leguminosae.

Trifolium L., which belongs to the IR lacking clade (IRLC), is one of the largest genera in the Leguminosae and contains several economically important fodder species. Here, we present whole chloroplast (cp) genome sequencing and annotation of two important annual grasses, Trifolium alexandrinum (Egyptian clover) and T. resupinatum (Persian clover). Abundant single nucleotide polymorphisms (SNPs) and insertions/deletions (In/Dels) were discovered between those two species. Global alignment of T. alexandrinum and T. resupinatum to a further thirteen Trifolium species revealed a large amount of rearrangement and repetitive events in these fifteen species. As hypothetical cp open reading frame (ORF) and RNA polymerase subunits, ycf1 and rpoC2 in the cp genomes both contain vast repetitive sequences and observed high Pi values (0.7008, 0.3982) between T. alexandrinum and T. resupinatum. Thus they could be considered as the candidate genes for phylogenetic analysis of Trifolium species. In addition, the divergence time of those IR lacking Trifolium species ranged from 84.8505 Mya to 4.7720 Mya. This study will provide insight into the evolution of Trifolium species.


Introduction
Trifolium L. (Leguminosae, Fabaceae), one of the largest genera in the Leguminosae, contains several important fodder species, such as T. repens (white clover), T. pratense (red clover), T. alexandrinum (Egyptian clover), T. resupinatum (Persian clover) amongst others [1]. Trifolium species are also widely grown as green manure crops, and about 11 species, including T. alexandrinum and T. resupinatum, were introduced to the subtropical zone of east Asia and have been reported to be excellently adapted to saline-alkali soil thus useful for agricultural production [2]. T. alexandrinum is generally grown as an annual winter legume fodder crop in the Middle East, Mediterranean and the Indian paired-end sequences to the reference of T. medeseum (Figure 1). The cp genomes of T. alexandrium and T. resupinatum were detected with a lack of IR and have a size of 148,545 bp and 149,026 bp, respectively ( Table 1). The GC content observed in the two cp genomes was 34.09% and 33.80% overall, and 37.05% and 36.64% in coding sequences (CDS). A total of 112 and 109 genes were consisted in the complete cp genomes of T. alexandrinum and T. resupinatum, which contains 31 and 37 tRNA, 75 and 66 mRNA, and 6 rRNA, and 13 and five genes possessing introns, respectively. In particular, there were two rrn 16 genes in each of the Trifolium species with the unidentical sequences. alignment of paired-end sequences to the reference of T. medeseum ( Figure 1). The cp genomes of T. alexandrium and T. resupinatum were detected with a lack of IR and have a size of 148,545 bp and 149,026 bp, respectively ( Table 1). The GC content observed in the two cp genomes was 34.09% and 33.80% overall, and 37.05% and 36.64% in coding sequences (CDS). A total of 112 and 109 genes were consisted in the complete cp genomes of T. alexandrinum and T. resupinatum, which contains 31 and 37 tRNA, 75 and 66 mRNA, and 6 rRNA, and 13 and five genes possessing introns, respectively. In particular, there were two rrn 16 genes in each of the Trifolium species with the unidentical sequences.  There were 46 genes related to photosynthesis in cp genomes of T. alexandrinum and T. resupinatum (Table 2), of which four genes psbN, atpF, ndhA and ndhB were specific for T. alexandrinum. These genes include the ones encoding subunits of Rubisco, subunits of photosystem I, subunits of photosystem II, subunits of ATP synthase, cytochrome b/f complex, c-type cytochrome synthesis and subunits of NADH dehydrogenase. Thirty-one genes were related to self-replication, including four ribosomal RNA genes and 27 transfer RNA genes, in which trnT-CGU was unique in T. alexandrinum. Besides, ten genes encoded ribosomal proteins and twelve were associated with transcription. Among them, rpl2 and rpoC1 were unique in T. alexandrinum. Furthermore, three genes clpP, accD and ycf3 with other functions were particular for T. alexandrinum (Table S1).
Introns are generally not subject to natural selection thus theoretically accumulate more mutations than exons. In this study, a total of seven genes (atpF, clpP, ndhA, ndhB, rpoC1, rps18 and tRNA-CGU) only contained an intron in T. alexandrinum (Table S1). Other five genes tRNA-UAA, tRNA-UAC, tRNA-UGC, tRNA-UUC and tRNA-UUU all had an intron in T. alexandrinum and T. resupinatum. The exons length of those five genes was more conserved compared with the intron. In particular, ycf3 had two introns in T. alexandrinum.

Repeat Sequences Analysis
Scattered repetitive sequences (palindrome repeats and direct repeats) and simple sequence repeats (SSRs) were analyzed respectively. A total of 1941 scattered repetitive sequences in the T. alexandrinum cp genome were annotated, which was greater than T. resupinatum (1250). The percentages of palindrome repeats (type P, 50.49%, Figure 2B) of T. alexandrinum were slightly larger than T. resupinatum (46.4%). A total of 370 ( Figure 2A) and 383 SSRs (sizes ranged from 8-81 bp and 8-36 bp) were predicted in T. alexandrinum and T. resupinatum and 30.54% and 23.24% of them were distributed in genic regions. In particular, the majority of SSRs were located in ycf1 (18 for T. alexandrinum and T. resupinatum), followed by rpoC2 (11 for T. resupinatum and 9 for T. alexandrinum). Mononucleotide repeats were dominant (65.41% in T. alexandrinum and 74.93% in T. resupinatum), followed by trinucleotide repeats (25.68% in T. alexandrinum and 22.19% in T. resupinatum), in which the polyadenine repeats (poly A, 37.34% for T. resupinatum and 35.95% for T. alexandrinum) and polythymine (poly T, 36.55% for T. resupinatum and 37.84% for T. alexandrinum) were much more than guanine (G) or cytosine (C) repeats (less than 1.35%). A total of 24 SSRs were identified to be shared by T. alexandrinum and T. resupinatum (Additional file2: Table S2; Figure 2A). The common repeat sequences larger than 30 bp with the longest length of 117 bp was showed in Figure 2C.

Repeat Sequences Analysis
Scattered repetitive sequences (palindrome repeats and direct repeats) and simple sequence repeats (SSRs) were analyzed respectively. A total of 1941 scattered repetitive sequences in the T. alexandrinum cp genome were annotated, which was greater than T. resupinatum (1250). The percentages of palindrome repeats (type P, 50.49%, Figure 2B) of T. alexandrinum were slightly larger than T. resupinatum (46.4%). A total of 370 ( Figure 2A) and 383 SSRs (sizes ranged from 8 -81 bp and 8 -36 bp) were predicted in T. alexandrinum and T. resupinatum and 30.54% and 23.24% of them were distributed in genic regions. In particular, the majority of SSRs were located in ycf1 (18 for T. alexandrinum and T. resupinatum), followed by rpoC2 (11 for T. resupinatum and 9 for T. alexandrinum). Mononucleotide repeats were dominant (65.41% in T. alexandrinum and 74.93% in T. resupinatum), followed by trinucleotide repeats (25.68% in T. alexandrinum and 22.19% in T. resupinatum), in which the polyadenine repeats (poly A, 37.34% for T. resupinatum and 35.95% for T. alexandrinum) and polythymine (poly T, 36.55% for T. resupinatum and 37.84% for T. alexandrinum) were much more than guanine (G) or cytosine (C) repeats (less than 1.35%). A total of 24 SSRs were identified to be shared by T. alexandrinum and T. resupinatum (Additional file2: Table S2; Figure 2A). The common repeat sequences larger than 30 bp with the longest length of 117 bp was showed in Figure 2C.

Relative Synonymous Codon Usage Analysis
Relative synonymous codon usage analysis (RSCU), which is considered to be a combination result of natural selection, species mutation and genetic drift, was analyzed (Additional file1: Figure  S1; Additional file3: Table S3). The RSCU value for initiation codon AUG was 2.9745 in T. alexandrinum and 2.9721 in T. resupinatum. The values of three termination codons UAA, UAG and UGA were 1.6215, 0.5676 and 0.8109 in T. alexandrinum, and 1.5909, 0.5454 and 0.8637 in T. resupinatum. The codons with an RSCU value greater than one were considered to be a greater codon frequency. 46.97% (31 of 66, include three termination codons) of the codons were with the greater codon frequency both in T. alexandrinum and T. resupinatum, in whih 93.55% (29 of 31) prefers A or U in the third sites. In the other codons with RSCU values less than one (including one), C or G were more common in the third position (88.57%, 31 of 35).

Relative Synonymous Codon Usage Analysis
Relative synonymous codon usage analysis (RSCU), which is considered to be a combination result of natural selection, species mutation and genetic drift, was analyzed (Additional file1: Figure  S1; Additional file3: Table S3). The RSCU value for initiation codon AUG was 2.9745 in T. alexandrinum and 2.9721 in T. resupinatum. The values of three termination codons UAA, UAG and UGA were 1.6215, 0.5676 and 0.8109 in T. alexandrinum, and 1.5909, 0.5454 and 0.8637 in T. resupinatum. The codons with an RSCU value greater than one were considered to be a greater codon frequency. 46.97% (31 of 66, include three termination codons) of the codons were with the greater codon frequency both in T. alexandrinum and T. resupinatum, in whih 93.55% (29 of 31) prefers A or U in the third sites. In the other codons with RSCU values less than one (including one), C or G were more common in the third position (88.57%, 31 of 35).

Whole cp Genome Comparison with Other Trifolium Species
In order to examine the sequence divergence of Trifolium genus and further shed light on the evolutionary events, such as gene mutation, rearrangement and gene loss, cp genomes of fifteen IR lacking species (T. grandiflorum, T. hybridum, T. lupinaster, T. occidentale, T. semipilosum, T. aureum, T. boissieri, T. glanduliflerum, T. strictum, T. repens, T. pratense, T. subterraneum, T. meduseum, T. alexandrinum and T. resupinatum) were compared. The results showed that the size of cp genomes of these IR lacking species ranged from 121,178 bp (T. pratense) to 149,026 bp (T. resupinatum), with an average of 134,062 bp ( Table 1). The GC content of those fifteen species changed from 33.80% to 37.20% in whole cp genome with the mean value of 34.99%, and 35.71% to 37.34% in CDS with the mean of 36.55%. Only minor variations were detected in the total numbers of genes, tRNA and mRNA among the selected species. T. pratense possessed the smallest numbers of tRNA (28), mRNA (58) and total number of genes (90). Furthermore, abundant gene rearrangements at the cp genome level were detected among fifteen Trifolium species using MAUVE program and the T. resupinatum as the reference sequence ( Figure 6).

Whole cp Genome Comparison with Other Trifolium Species
In order to examine the sequence divergence of Trifolium genus and further shed light on the evolutionary events, such as gene mutation, rearrangement and gene loss, cp genomes of fifteen IR lacking species (T. grandiflorum, T. hybridum, T. lupinaster, T. occidentale, T. semipilosum, T. aureum, T. boissieri, T. glanduliflerum, T. strictum, T. repens, T. pratense, T. subterraneum, T. meduseum, T. alexandrinum and T. resupinatum) were compared. The results showed that the size of cp genomes of these IR lacking species ranged from 121,178 bp (T. pratense) to 149,026 bp (T. resupinatum), with an average of 134,062 bp ( Table 1). The GC content of those fifteen species changed from 33.80% to 37.20% in whole cp genome with the mean value of 34.99%, and 35.71% to 37.34% in CDS with the mean of 36.55%. Only minor variations were detected in the total numbers of genes, tRNA and mRNA among the selected species. T. pratense possessed the smallest numbers of tRNA (28), mRNA (58) and total number of genes (90). Furthermore, abundant gene rearrangements at the cp genome level were detected among fifteen Trifolium species using MAUVE program and the T. resupinatum as the reference sequence ( Figure 6).

Whole cp Genome Comparison with Other Trifolium Species
In order to examine the sequence divergence of Trifolium genus and further shed light on the evolutionary events, such as gene mutation, rearrangement and gene loss, cp genomes of fifteen IR lacking species (T. grandiflorum, T. hybridum, T. lupinaster, T. occidentale, T. semipilosum, T. aureum, T. boissieri, T. glanduliflerum, T. strictum, T. repens, T. pratense, T. subterraneum, T. meduseum, T. alexandrinum and T. resupinatum) were compared. The results showed that the size of cp genomes of these IR lacking species ranged from 121,178 bp (T. pratense) to 149,026 bp (T. resupinatum), with an average of 134,062 bp ( Table 1). The GC content of those fifteen species changed from 33.80% to 37.20% in whole cp genome with the mean value of 34.99%, and 35.71% to 37.34% in CDS with the mean of 36.55%. Only minor variations were detected in the total numbers of genes, tRNA and mRNA among the selected species. T. pratense possessed the smallest numbers of tRNA (28), mRNA (58) and total number of genes (90). Furthermore, abundant gene rearrangements at the cp genome level were detected among fifteen Trifolium species using MAUVE program and the T. resupinatum as the reference sequence ( Figure 6).

Whole cp Genome Comparison with Other Trifolium Species
In order to examine the sequence divergence of Trifolium genus and further shed light on the evolutionary events, such as gene mutation, rearrangement and gene loss, cp genomes of fifteen IR lacking species (T. grandiflorum, T. hybridum, T. lupinaster, T. occidentale, T. semipilosum, T. aureum, T. boissieri, T. glanduliflerum, T. strictum, T. repens, T. pratense, T. subterraneum, T. meduseum, T. alexandrinum and T. resupinatum) were compared. The results showed that the size of cp genomes of these IR lacking species ranged from 121,178 bp (T. pratense) to 149,026 bp (T. resupinatum), with an average of 134,062 bp ( Table 1). The GC content of those fifteen species changed from 33.80% to 37.20% in whole cp genome with the mean value of 34.99%, and 35.71% to 37.34% in CDS with the mean of 36.55%. Only minor variations were detected in the total numbers of genes, tRNA and mRNA among the selected species. T. pratense possessed the smallest numbers of tRNA (28), mRNA (58) and total number of genes (90). Furthermore, abundant gene rearrangements at the cp genome level were detected among fifteen Trifolium species using MAUVE program and the T. resupinatum as the reference sequence ( Figure 6).

Phylogenetic Divergence Time Estimation
The 41 protein-coding genes shared in cp genomes of the 25 species (23 of Papilionoideae, one of Caesalpinioideae and one of Mimosaceae) were subjected to phylogeny analysis and divergence times estimation ( Figure 7A). The topological structure of phylogenetic tree was almost consistent with the classification of Leguminosae with strong bootstrap support. Three subfamilies of Leguminosae, Papilionoideae, Caesalpinioideae and Mimosaceae were clearly separated. Furthermore, two genes ycf1 and rpoC2, which both contain vast repetitive sequences and high Pi values (0.7008, 0.3982) between T. alexandrinum and T. resupinatum, were also used to construct the phylogenetic trees. The result showed that the topological structure of the phylogenetic relationship of Trifolium species based on ycf1 ( Figure 7B) and rpoC2 ( Figure 7C) was almost in accordance with the phylogenetic tree constructed using 41 shared genes. Each Section of Trifolium was clearly divided based on ycf1 and rpoC2 genes. The Trifolium species of "refractory clade" (including Section Lupinaster, Trifolium, Tricocephalum, Vesicastrum and Trifoliastrum) were grouped together in Figure 7C. In Figure 7B, however, Section Trifolium, Tricocephalum, Vesicastrum and Trifoliastrum were grouped into one clade, and another clade consisted of Section Lupinaster, Paramesus and Subg. Chronosemium. Trifolium species split from Medicago species during the Early Cretaceous (116.1575 Mya) and the divergence time of those fifteen IR lacking Trifolium species ranged from 84.8505 Mya to 4.7720 Mya (Figure 7).

Phylogenetic Divergence Time Estimation
The 41 protein-coding genes shared in cp genomes of the 25 species (23 of Papilionoideae, one of Caesalpinioideae and one of Mimosaceae) were subjected to phylogeny analysis and divergence times estimation ( Figure 7A). The topological structure of phylogenetic tree was almost consistent with the classification of Leguminosae with strong bootstrap support. Three subfamilies of Leguminosae, Papilionoideae, Caesalpinioideae and Mimosaceae were clearly separated. Furthermore, two genes ycf1 and rpoC2, which both contain vast repetitive sequences and high Pi values (0.7008, 0.3982) between T. alexandrinum and T. resupinatum, were also used to construct the phylogenetic trees. The result showed that the topological structure of the phylogenetic relationship of Trifolium species based on ycf1 ( Figure 7B) and rpoC2 ( Figure 7C) was almost in accordance with the phylogenetic tree constructed using 41 shared genes. Each Section of Trifolium was clearly divided based on ycf1 and rpoC2 genes. The Trifolium species of "refractory clade" (including Section Lupinaster, Trifolium, Tricocephalum, Vesicastrum and Trifoliastrum) were grouped together in Figure 7C. In Figure 7B, however, Section Trifolium, Tricocephalum, Vesicastrum and Trifoliastrum were grouped into one clade, and another clade consisted of Section Lupinaster, Paramesus and Subg. Chronosemium. Trifolium species split from Medicago species during the Early Cretaceous (116.1575 Mya) and the divergence time of those fifteen IR lacking Trifolium species ranged from 84.8505 Mya to 4.7720 Mya (Figure 7).

Genome Feature of T. alexandrinum and T. resupinatum and Comparison with Other IR Lacking Trifolium Species
In this study, we sequenced and annotated cp genomes of two IR lacking species T. alexandrinum and T. resupinatum, identified the repeats and hotspot genes within the cp genomes, and constructed the phylogenetic tree along with other cp genomes. Our results added information about cp genomes of Trifolium species and provided new insights into the evolution study of IR lacking species.
The two cp genomes sequenced in this study revealed 75 and 66 protein-coding genes in T. alexandrinum and T. resupinatum, respectively (Table 1). There were nine unique genes in the T. alexandrinum cp genome which were absent in T. resupinatum, which might have been transferred from the cp genome to the nucleus genome of T. resupinatum in the evolutionary process of the species. A similar result was revealed in tobacco with an absent accD gene, which was essential for leaf development [15,16]. The rpl2 gene, which was absent in T. resupinatum, has been completely or partially transferred to the nucleus genome of some legumes like soybean and Medicago [17]. Besides, the loss of two ndh genes (ndhA and ndhB) was usually related to the nutritional status and

Genome Feature of T. alexandrinum and T. resupinatum and Comparison with Other IR Lacking Trifolium Species
In this study, we sequenced and annotated cp genomes of two IR lacking species T. alexandrinum and T. resupinatum, identified the repeats and hotspot genes within the cp genomes, and constructed the phylogenetic tree along with other cp genomes. Our results added information about cp genomes of Trifolium species and provided new insights into the evolution study of IR lacking species.
The two cp genomes sequenced in this study revealed 75 and 66 protein-coding genes in T. alexandrinum and T. resupinatum, respectively (Table 1). There were nine unique genes in the T. alexandrinum cp genome which were absent in T. resupinatum, which might have been transferred from the cp genome to the nucleus genome of T. resupinatum in the evolutionary process of the species. A similar result was revealed in tobacco with an absent accD gene, which was essential for leaf development [15,16]. The rpl2 gene, which was absent in T. resupinatum, has been completely or partially transferred to the nucleus genome of some legumes like soybean and Medicago [17]. Besides, the loss of two ndh genes (ndhA and ndhB) was usually related to the nutritional status and rearrangement in most angiosperm species [18]. Although gene loss of atpF [19], psbN [19], rpoC1 [19] and ycf3 [20] was also found in other species, however, the detailed reason for that gene loss remained to be explained.
Compared to other IR containing cp genomes of angiosperms, the number of protein-coding genes of Trifolium species is less conserved [21,22]. It might be related to the fact that IR lacking will lead to an extensively arranged cp genome thus causing diverse genes loss [11]. According to Millen et al. [23], the vast majority of cp genomes of angiosperms held in shared 74 coding-protein genes but other five genes (accD, ycf1, ycf2, rpl23 and infA) only existed in some specific species. infA gene, which codes for translation initiation factor 1 (IF1), was defunct in all the listed fifteen Trifolium species. Considered as the most transferable gene in cp genome, infA was in existence in about 24 angiosperm lineages including Trifolium species, and related Medicago species [24,25]. It is worth noting that there are two rrn16 gene in each of Trifolium species sequenced in this study, and this phenomenon was also reported in cp genomes of T. strictum [13] and T. glanduliferum [13]. Furthermore, we also found that one of the rrn16 and rrn23 was partially overlapped. Indeed, no similar phenomenon was found in other Trifolium species. Previous studies have shown that Cyanobacteria, Chlamydomonas reinhardtii, Cyanophora paradoxa, Zea mays, Oryza sativa and Arabidopsis thaliana can start and stop transcription anywhere in the cp genome [26]. Chloroplasts are known to originate from ancient Cyanobacteria, this transcription property may be conserved in some species like T. alexandrinum and T. resupinatum, so that the rRNA genes of those two species may gain potential transcription ability. On the other hand, the cp genomes of most plant species are small and in order to avoid costs, genes overlapping happened. The Ycf1 (hypothetical chloroplast reading frame no. 1) gene, generally with the premature stop codons in the CDS thus be defined as pseudogene in other angiosperm [27], has undergone processes of accelerated mutation rate, decreased GC content, and decreased secondary structure stability [28]. In this study, the ycf1 gene detected in all the fourteen Trifolium species (except T. hybridum) contains a normal stop codon and is able to code protein. Two genes ycf4 (hypothetical chloroplast reading frame no. 4) and rps16 (ribosomal protein S16), which were found in most cp genomes of angiosperm and some relic plant like Amborella trichopoda [29], are not present in the two Trifolium species cp genomes sequenced in this study, and rps16 is not present in all the fifteen Trifolium species. The rps8 (ribosomal protein S8) gene was found without stop codons in Medicago truncatula and M. sativa [25], while it possesses a stop codon in the fifteen Trifolium cp genomes. In the gene ndhB, no internal stop codon was detected in the fourteen (except T. resupinatum) Trifolium cp genomes, which is inconsistent with many legume cp genomes whose ndhB gene contains an internal stop codon [25].

Relative Synonymous Codon Usage Analysis (RSCU)
The unequal using frequencies of synonymous codons detected in most sequenced genomes was termed synonymous codons usage bias [30], and is now considered crucial in shaping gene expression and cellular function [31]. RSCU indicates the relative probability of a particular codon encoding the synonymous codon of the corresponding amino acid. In this study, 93.5% of the codons were found prefer A/U in third position, similar to that of M. sativa [25] and Gossypium [32]. This phenomenon could be attributed to the fact that dicotyledon prefers to the A/U-end codons and manifests a potential force in molecular evolution of Trifolium species: mutation and natural selection.

SSRs and Large Repeat Sequences
Given a matrilineal inheritance feature, rich number of tags and low frequency of genetic recombination, cpSSRs (chloroplast simple sequence repeats) are considered an efficient molecular marker in genetic variety analyzing, population structure studying, species identification and phylogeny analysis [33]. The SSRs identified in T. alexandrinum and T. resupinatum were poly(A)/(T), which was consistent with the majority of plant family [34][35][36]. Although there have many studies reported on the application of cpSSRs in plant genetic diversity analysis, the important potential of cpSSRs in studying the ecological and evolutionary processes of wild materials of plants and their related species still need to be recognized [37]. Therefore, the cpSSRs of the two Trifolium species detected in this study can be used to evaluate genetic relationships among different species and to detect polymorphism of Trifolium species and their relatives at the population level.
Repetitive sequences related to the continuously self-replicating of genetic material in the process of evolution, thus indicating the greatly expanded and enriched genetic information [38]. The present study revealed a relatively high repetitive percentage (7.325%, Table 1) in the cp genomes of fifteen IR lacking species, which was higher than IR lacking Tydemania expeditionis (0.4%) and Bryopsis plumose (2.4%) [7]. The number of repeated sequences in the cp genome are associated with rearrangement in some species [14]. However, the driving force of the repetitive sequence was predominantly related to nuclear genes and genomic recombination [12], such as homologous recombination and microhomology-mediated break-induced replication acting on more than 50 bp and less than 30 bp repeats, respectively. Known as "hotspots" for variation [39], ycf1 and rpoC2 possessed high values of Pi ( Figure 5) and the majority of repetitive sequences in T. alexandrinum and T. resupinatum. Therefore, these two genes could have suffered from selection pressure and could be used for phylogenetic analysis and population genetic study of Trifolium species.

Sequence Divergence and Hotspots
Point mutation was generally more common than frame shift for natural mutation [40]. As expected, more SNPs (21963, 6618 Tn and 15345 Tv; Additional file4: Table S4) than In/Del (2097) were found between T. alexandrinum and T. resupinatum. What's more, 60% of them occurred in intergenic regions, which was consistent with the hypothesis that CDS had a slower rate of evolution compared with CNS [41]. Furthermore, minor SNPs (159 between Oryza sativa and O. nivara [42], 330 between Citrus sinensis and C. aurantiifolia [43] and 231 between Machilus yunnanensis and M. balansae [44]) were identified in IR containing species, which were exceptionally smaller than the SNPs between two IR lacking species T. alexandrinum and T. resupinatum calculated in the present study. As an important structure in stabilizing cp genome, the IR region can prevent the genome mutating by selective force [45]. Thus the observed abundant SNPs/Indels in T. alexandrinum and T. resupinatum are not surprising.
The comparison between the Ka and Ks of genes is an important measure of molecular evolution [46]. Most genes were subjected to neutral selection and purification selection; however, there are also limited genes whose rate of Ka is higher than that of Ks because the function of the gene has been dramatically changed, called Darwinian positive selection [47]. Lacking one IR region is believed to directly enhance the nucleotide substitution rate of the single repeat sequence. Previous studies have shown that in the IR lacking cp genome, the nucleotide substitution rate in the remaining repeat region is comparable to that of the single repeat region, which is 2.3 times higher than that in the IR containing cp genome [48]. Here, seven protein-coding genes in the cp genomes of T. alexandrinum and T. resupinatum (rps4, rpoC2, ndhG, ccsA, ndhF, ycf1 and psaC) have a high ratio of Ka to Ks, which is led by high values of Ka but extremely low values of Ks, and could imply that they are under positive selections. rps4 [49] and rpoC2 [50] have been reported to be under positive selection in previous studies. However, beneficial mutations might be fixed in those genes and, thus, reduce genetic polymorphism at selected sites [51].
In general, there is a strong correlation between the presence of IR and structurally stabilization of cp genomes. Substantial rearrangement was usually found in cp genomes lacking IR [13]. Among those IR lacking species of Leguminosae such as alfalfa, subclover, pea, and so forth, some are structurally stable and have not been rearranged, some undergo intermediate rearrangements, while others experienced a series of complex rearrangements [13]. This study found abundant rearrangements within fifteen cp genomes of IR lacking species of Trifolium ( Figure 6). According to Palmer and Thompson [8], IR could prevent the rearrangement of cp sequence to some extent, so the rearrangement probability will be increased in the species lacking IR, this could be why there are many rearrangement events detected among those fifteen Trifolium species [8]. However, lacking IR leading to increased rearrangement is only one of the explanations. The repeats, acting as a locus of recombination within homologous genes, along with transposable elements (TEs), have been suggested as a reasonable mechanism for highly-rearranged cp genomes of Trifolium species [8].

Phylogeny Analysis and Divergence Time
The topological structure of other thirteen Trifolium species using 41 protein coding genes in this study was generally in agreement with the report of Sveinsson and Cronk [13] by 58 protein coding genes. In addition, the phylogenetic location of tested T. alexandrinum and T. resupinatum was confirmed (Figure 7). Furthermore, T. alexandrinum and T. pratense, both belonging to Trifolium Sect. Trifolium, were clustered together though T. alexandrinum and T. subterraneum were grouped together in Malaviya's study based on isozyme data [52]. Three IR lacking species T. boissieri, T. grandiflorum and T. aureum were predicted to differentiate with other twelve species in the late Cretaceous period, then another two IR lacking species T. strictum and T. glanduliferum were further diverted at about 8 Mya. In late Cretaceous period, violent crustal movement and sea-land changes led to a flourished development of angiosperms and IR lacking species might form at the same time. It looks as if the ancestor of some IR lacking species had gone through a battery of evolutionary alternation (including high rearrangement and repetition) and the precise mechanism of such an evolutionary pattern is underway to illuminate.

Plant Material, DNA Isolation and Sequencing
Plant seeds of T. alexandrinum (cv 'Elite II') and T. resupinatum (cv 'Laser') were kindly provided by Barenbrug (Queensland, Australia) then germinated in a growth chamber (25 • C, 300 µmols·m 2 ·s −1 ; 16-h photoperiod). Total DNA was extracted from 50 mg of fresh leaves following the Plant DNA Isolation Kit (Tiangen, Beijing, China). Sheared low molecular weight DNA fragments were used to construct paired-end (PE) libraries according to protocol of Illumina manual (San Diego, CA, USA). Completed libraries were pooled and sequenced in the Illumina NovaSeq platform with PE150 sequencing strategy and 350 bp insert size.

cp Genome Assembly, Annotation and Visualization
The raw read data for two Trifolium species were filtered according to the following criteria-reads of less than 5% unidentified nucleotides and more than 50% of their bases with a quality score of >20 were retained. With the reference genome of T. meduseum [12] (National Center for Biotechnology Information, NCBI number KJ 788288), the cp DNA were assembled as follows. In order to decrease the difficulty of sequences assembly, filtered reads (clean data) were aligned to the cp genome database built by Genepioneer Biotechnologies (Nanjing, China) using Bowtie2 v 2.2.4 [53] and SPAdes v3.10.1 [54] to acquire SEED sequences then obtained contigs by kmer iterative extend seed. Contigs, whose E-values were less than 1×10 −5 , Identities values were close to 100% and Gaps were close to 0 by a BLAST research in NCBI with the data set including T. meduseum [12], T. pratense [13], T. repens [12] and T. subterraneum [20], were obtained. Then the contigs were connected as scaffolds using SSPACE v 2.0 [55] followed by gap filling using Gapfiller v 2.1.1 [56] until the complete chloroplast genome sequence was recovered.

The Relative Synonymous Codon Usage Analysis (RSCU) and Simple Sequence Repeats (SSRs) Prediction
The RSCU was analyzed using MEGA v7.0 to reflect the relative preference of a particular base encoding the corresponding amino acid codon [61]. Values of RSCU over one were considered to be a greater codon frequency. SSRs with the same repeats units and times and distributed in the genic regions were considered as shared repeats, the repetitive sequences were distinguished using VMATCH V2.3.0 (http://www.vmatch.de/) and MISA v1.0 (http://pgrc.ipk-gatersleben.de/misa/misa.html) based on the genomic data, which was also utilized to determine the mono-, di-, tri-, tetra-, penta-and hexa-nucleotides.

Conclusions
cp genomes of T. alexandrinum and T. resupinatum, which belong to inverted-repeat-lacking clade (IRLC), were sequenced and annotated in present study and were compared with the cp genomes of other thirteen IR lacking Trifolium species reported previously. The results revealed abundant SNP and In/Del in T. alexandrinum and T. resupinatum cp genomes and high variation in CDS and abundant rearrangement within Trifolium genus. This valuable information will provide insight into the evolution of IR lacking species.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/4/478/s1, Figure S1. The relative synonymous codon usage (RSCU) for the T. alexandrinum and T. resupinatum. Figure S2. The assembly sequence genomic coverage maps of T. alexandrinum and T. resupinatum. Table S1. Location and length of intron-containing genes in the chloroplast genomes of T. alexandrinum and T. resupinatum . Table S2. The shared repeats of T. alexandrinum and T. resupinatum. * means the shared location for T. alexandrinum and T. resupinatum, res means locations particular for T. resupinatum, the numbers of "Number" mean number of repeats in T. alexandrinum and T. resupinatum, respectively. Table S3. The relative synonymous codon usage (RSCU) analyzed using CodonW. Table S4. Transversion (Tv) and transition (Tn) were detected between T. alexandrinum and T. resupinatum. Table S5. The synonymous/synonymous substitution rates (Ka/Ks) calculated using 59 shared genes in T. alexandrinum and T. resupinatum. Table S6. The nucleotide diversity (Pi) computed using 96 common genes of T. alexandrinum and T. resupinatum.