The Plastome Sequences of Triticum sphaerococcum (ABD) and Triticum turgidum subsp. durum (AB) Exhibit Evolutionary Changes, Structural Characterization, Comparative Analysis, Phylogenomics and Time Divergence

The mechanism and course of Triticum plastome evolution are currently unknown; thus, it remains unclear how Triticum plastomes evolved during recent polyploidization. Here, we report the complete plastomes of two polyploid wheat species, Triticum sphaerococcum (AABBDD) and Triticum turgidum subsp. durum (AABB), and compare them with 19 available complete Triticum plastomes to create the first map of genomic structural variation. Both T. sphaerococcum and T. turgidum subsp. durum plastomes were found to have a quadripartite structure, with plastome lengths of 134,531 bp and 134,015 bp, respectively. Furthermore, diploid (AA), tetraploid (AB, AG) and hexaploid (ABD, AGAm) Triticum species plastomes displayed a conserved gene content and commonly harbored an identical set of annotated unique genes. Overall, there was a positive correlation between the number of repeats and plastome size. In all plastomes, the number of tandem repeats was higher than the number of palindromic and forward repeats. We constructed a Triticum phylogeny based on the complete plastomes and 42 shared genes from 71 plastomes. Using a well-resolved plastome tree, we estimated that Hordeum vulgare diverged from wheat around 11.04–11.9 million years ago (mya). Similarly, Sitopsis species diverged 2.8–2.9 mya, before Triticum urartu (AA) and Triticum monococcum (AA). Aegilops speltoides was shown to be the maternal donor of polyploid wheat genomes and diverged ~0.2–0.9 mya. The phylogeny and divergence time estimates presented here can act as a reference framework for future studies of Triticum evolution.


Introduction
Triticeae Dumort. is an economically valuable grass tribe with around 360 species and subspecies in 20-30 genera. The genus Triticum is an important agricultural allopolyploid complex containing two diploids, two tetraploids, and two hexaploids. Over 10,000 years, one diploid and both tetraploid species have been domesticated, whereas two hexaploid species emerged under cultivation in Eurasia [1]. Hybridization of a cultivated type of tetraploid Triticum turgidum (AABB genomes) with diploid goat grass Aegilops tauschii [...]
[...] province, Pakistan. The plants were identified by the lead taxonomist at the National Agricultural Research Centre (NARC), Pakistan. Both species, T. sphaerococcum and T. turgidum subsp. durum, are available under accession numbers 3 and 25, respectively, at the Gene Bank of the Bioresources Conservation Institute, Pakistan. The leaf samples were collected in plastic zip bags, immediately placed in liquid nitrogen, and stored at −80 °C for further analysis. A standard protocol for DNA extraction was followed, as described previously [34]. Pure DNA was sequenced using an Illumina HiSeq2000. In total, 115,455,321 and 86,123,254 raw reads were generated for T. sphaerococcum and T. turgidum subsp. durum, respectively. Next, the quality of the paired-end Illumina reads was assessed in FastQC, and the GetOrganelle pipeline version 1.6.2 [35] was used with default settings to select trimmed reads corresponding to the plastid, using the plastome of T. aestivum as a reference. Finally, the plastid-filtered reads from GetOrganelle version 1.6.2 were imported into Geneious Prime using the default settings. Furthermore, the complete plastomes of 19 Triticum species available in GenBank (as of 18 May 2021) were downloaded from the NCBI database (Table S1). The species with incorrect annotations were reannotated using CpGAVAS [36] and DOGMA [37] (http://dogma.ccbb.utexas.edu/ accessed on 25 February 2022). Moreover, tRNAscan-SE version 1.21 [38] was used to detect tRNA genes.
Finally, the annotations were verified in Geneious Prime [39]. Graphical presentations were prepared in R 4.0 with the ggplot2 package [40]. The number of genes shared among Triticum plastomes was identified via a Venn diagram webtool (bioinformatics.psb.ugent.be/webtools/Venn/ accessed on 25 February 2022). Pearson's correlation coefficients, computed with the cor function in R (https://www.r-project.org/ accessed on 25 February 2022), were used to evaluate the associations among different plastome characteristics, and the corresponding graphs were drawn in R.
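The correlation step can be illustrated with a minimal Python sketch. The study itself used the cor function in R; the code below is only an equivalent hand-written calculation of Pearson's coefficient, and the function name pearson_r and the five data points are hypothetical placeholders rather than values from Table 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical placeholder data (NOT taken from the study's tables):
# plastome sizes in bp and total repeat counts for five plastomes.
plastome_size_bp = [133_900, 134_500, 135_200, 136_000, 136_900]
repeat_count = [60, 66, 68, 71, 75]

r = pearson_r(plastome_size_bp, repeat_count)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to +1 on real data would correspond to the positive associations reported between plastome characteristics.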

Characterization of Repetitive Sequences and SSRs
REPuter was used to determine the repetitive sequences (direct, reverse, and palindromic repeats) within plastomes [41]. For repeat identification, the following settings were used in REPuter: (1) a minimum repeat size of 30 bp, (2) ≥90% sequence identity, and (3) a Hamming distance of 1. Tandem Repeats Finder version 4.07b was used to find tandem repeats with the default settings applied [42]. To find SSRs, MISA [43] was employed with the search parameters set to ≥3 repeat units for pentanucleotide and hexanucleotide repeats, ≥4 repeat units for trinucleotide and tetranucleotide repeats, ≥8 repeat units for dinucleotide repeats, and ≥10 repeat units for mononucleotide repeats.
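The SSR thresholds listed above can be expressed as a short Python sketch. This is not MISA itself: it is a simplified regular-expression search for perfect SSRs under the same minimum repeat counts, and the names MIN_REPEATS, is_primitive, and find_perfect_ssrs, as well as the toy sequence, are assumptions made for illustration.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text:
# mono >= 10, di >= 8, tri >= 4, tetra >= 4, penta >= 3, hexa >= 3.
MIN_REPEATS = {1: 10, 2: 8, 3: 4, 4: 4, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. 'AAA' reported as a trinucleotide; it is a mononucleotide run.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical): an A(12) run, an (AT)8 run, and an (AAG)4 run.
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 8 + "CCT" + "AAG" * 4 + "T"
for start, motif, count in find_perfect_ssrs(toy):
    print(start, motif, count)
```

Because the search is leftmost-first and non-overlapping, the reported start and motif phase of an SSR can differ from MISA's output when flanking bases happen to extend a repeat; the sketch is meant to make the thresholds concrete, not to reproduce the reported counts.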

Sequence Divergence, Phylogenetic Analyses, and Divergence Time
To assess genome divergence among Triticum plastomes, mVISTA [44] was used in Shuffle-LAGAN mode with T. turgidum subsp. durum as the reference genome. Sequence divergence among Triticum species was calculated for both the complete plastomes and the 42 shared protein-coding genes. Comparative analysis was performed after multiple sequence alignment, and gene order was compared to identify ambiguous and missing gene annotations. To align the complete plastomes, MAFFT version 7.222 [45] was used with the default parameters, and Kimura's two-parameter (K2P) model [46] was used to determine pairwise sequence divergence. To resolve the phylogenetic position of Triticum, the complete plastomes and 42 shared genes from 71 plastomes (including 23 complete and 39 draft plastomes) of Triticum were used for analysis. Initially, a separate ML analysis of these data was conducted using RAxML [47] implemented in CIPRES with the default general time reversible (GTR + G) model and the fast bootstrap option, as previously reported [48]. The resulting phylogenetic reconstruction was displayed using FigTree version 1.4.1 [49] and Interactive Tree Of Life (iTOL) version 6 [50].
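For readers unfamiliar with the K2P model, the following Python sketch shows how a pairwise K2P distance is obtained from two aligned sequences, using the standard formula d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name k2p_distance and the decision to skip alignment columns containing gaps or ambiguous bases are assumptions; the text does not state which software or gap treatment was used for this calculation.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: columns with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


# Toy alignment (hypothetical): one transition in 100 sites.
d = k2p_distance("A" * 100, "G" + "A" * 99)
print(f"K2P distance = {d:.6f}")
```

For closely related plastomes the corrected distance is only slightly larger than the raw proportion of differing sites, which is in line with the small pairwise values (0.001 to 0.004) reported for the whole-plastome comparison.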
To determine the divergence time of the Triticum genus relative to those of Triticeae species, we used a concatenated data matrix. Briefly, the GTR + G substitution model was used with four rate categories and a Yule tree speciation model was applied with a lognormal relaxed clock model in BEAST [51] using a prior rate of substitution. We used an average substitution rate of 3.0 × 10⁻⁹ substitutions per site per year (s/s/y) and a fossil-based method to calibrate the molecular divergence as previously reported [52]. To root the calibration time, we included four outgroup Hordeum species: H. vulgare subsp. vulgare, H. vulgare subsp. spontaneum, H. jubatum, and H. bogdani. We also incorporated four fossil constraints (Supplementary Table S2) that are widely recognized and have been used in previous molecular dating of Triticeae [53]. The mean root height constraint of 11.8 ± 1.8 mya was based on the work of [53,54]. This calibration is consistent with previous work completed by [22]. We selected these outgroups as all these species are closely related to our study model species and have fossil records older than that of the Triticum genus [53].
Calibration limited to the root height was used to compare with internal calibrations and assess their influence on divergence time estimates. The dating analyses involved three independent Markov Chain Monte Carlo runs of 25 million generations. LogCombiner was used to combine the tree files from each of the three runs. Convergence and effective sample sizes were assessed in Tracer 1.5 [55]. From each analysis, we removed 25% of trees as burn-in. Finally, the tree was calculated using TreeAnnotator and a tree with the 95% highest posterior density was visualized in FigTree 1.4.

Triticum Plastome Characteristics (Genome Size and GC Content Variation)
The complete chloroplast (cp) genomes of the two sequenced Triticum species, T. sphaerococcum (MZ230675) and T. turgidum subsp. durum (MZ230674), are circular molecules with the quadripartite structure typical of angiosperm cp genomes. The sizes of the T. sphaerococcum and T. turgidum subsp. durum plastomes are 134,531 bp and 134,015 bp, respectively (Figure 1). Both plastomes were analyzed and compared with 19 associated Triticum cp genomes, with sizes ranging from 133,873 bp (T. aestivum; KJ592713) to 136,886 bp (T. monococcum subsp. monococcum; LC005977) (Table 1). Variation in plastome length was detected in Triticum species, even among plastomes of the same species sequenced by different authors. In this study, T. sphaerococcum and T. turgidum subsp. durum each had a pair of IR regions, of 20,699 and 20,701 bp, respectively, which separated the LSC region from the SSC region. The LSC and SSC lengths were 80,342 and 12,791 bp in T. sphaerococcum and 79,817 and 12,788 bp in T. turgidum subsp. durum. Both of these species had the same GC content in their genomes, 38.3% in the whole plastome and 44% in the IR regions (Figure 1, Table 1).
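Because a quadripartite plastome consists of one LSC, one SSC, and two IR copies, the reported region lengths can be checked against the total length with a one-line calculation. The sketch below is only such a consistency check; the function name quadripartite_length is hypothetical.

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a circular plastome with one LSC, one SSC, and two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Region lengths reported in the text for T. sphaerococcum.
t_sphaerococcum_bp = quadripartite_length(lsc_bp=80_342, ssc_bp=12_791, ir_bp=20_699)
print(t_sphaerococcum_bp)  # 134531
```

The result, 134,531 bp, matches the plastome length given for T. sphaerococcum in the abstract. Applying the same function to the region lengths given for T. turgidum subsp. durum (79,817, 12,788, and 20,701 bp) yields 134,007 bp, which is 8 bp short of the reported 134,015 bp; the text does not account for this difference.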

Gene Content and Gene Loss in Triticum Plastomes
The gene content of the 21 Triticum plastomes varied considerably. There were 72-89 protein-coding genes, 8 rRNA genes, and 32-42 tRNA genes in these plastomes (Figure 1, Table 1). The number of genes annotated in a plastome ranged from 112 (T. macha, T. monococcum subsp. monococcum) to 136 (T. turgidum subsp. durum cultivar Langdon). Both plastomes sequenced in this study had 131 genes, including 8 rRNA genes, 39 tRNA genes, and 84 protein-coding genes (5 genes for photosystem I, 15 genes associated with photosystem II, 11 genes for large ribosomal proteins, and 17 genes related to small ribosomal proteins) (Table 2). In both plastomes, the lengths of the protein-coding region, tRNA, and rRNA were 59,538, 3004, and 9192 bp, respectively (Table 1). The gene contents were generally conserved throughout all Triticum species (Figure 2). The ycf1, ycf2, ycf15, and ycf68 genes were lacking in almost all plastomes (Figure 2), with the exception of the T. turgidum subsp. durum cultivar Langdon plastome, in which the ycf1 gene was detected. Notably, as in previously reported plastomes, the plastid gene accD was lost in all Triticum plastomes. The rps12 gene (small ribosomal protein 12) is trans-spliced and has one intron; the 5′ end exon is in the LSC region, whereas the 3′ end exon is in the IRb region and duplicated in the IRa region.
Each of the sequenced Triticum species contained 12 intron-containing protein-coding genes and 8 tRNA genes. As in other angiosperm plastomes, ycf3 contained two introns, whereas the remaining genes had only a single intron, and the rps12 gene is trans-spliced. The smallest intron was found in the trnS-CGA gene in both T. sphaerococcum and T. turgidum subsp. durum (658 bp), whereas the largest intron was found in trnK-UUU in T. sphaerococcum (2486 bp) and in T. turgidum subsp. durum (2490 bp) (Table 3). The protein-coding region formed 44.2% of the whole plastome of T. sphaerococcum and 44.4% of that of T. turgidum subsp. durum. Similarly, tRNA and rRNA comprised 2.23% and 6.83%, respectively, of the plastome of the former species and 2.24% and 6.85% of that of the latter species.
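The percentage shares quoted above follow directly from the feature lengths and the plastome lengths. The Python sketch below reproduces that arithmetic; the names percent_of_plastome, FEATURE_BP, and PLASTOME_BP are hypothetical, and the pairing of 134,531 bp with T. sphaerococcum and 134,015 bp with T. turgidum subsp. durum follows the abstract.

```python
def percent_of_plastome(feature_bp, plastome_bp):
    """Share of the plastome (in percent) occupied by a feature class."""
    return 100.0 * feature_bp / plastome_bp


# Feature lengths reported for both plastomes (bp).
FEATURE_BP = {"protein-coding": 59_538, "tRNA": 3_004, "rRNA": 9_192}
# Plastome lengths as given in the abstract (bp).
PLASTOME_BP = {
    "T. sphaerococcum": 134_531,
    "T. turgidum subsp. durum": 134_015,
}

shares = {
    species: {
        feature: percent_of_plastome(length, total)
        for feature, length in FEATURE_BP.items()
    }
    for species, total in PLASTOME_BP.items()
}

for species, by_feature in shares.items():
    for feature, pct in by_feature.items():
        print(f"{species}: {feature} = {pct:.2f}%")
```

The computed shares agree with the reported values to within rounding in the last digit.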

Functional Repeats within Triticum Plastomes
In the repeat analysis, different forms of repeat sequences in the 21 plastomes, including those of T. sphaerococcum and T. turgidum subsp. durum, were analyzed. T. sphaerococcum had 19 palindromic repeats, 23 forward repeats, and 29 tandem repeats, whereas T. turgidum subsp. durum had 21 palindromic repeats, 25 forward repeats, and 27 tandem repeats (Figure 3). Across all species, there was some variation in the number of repeats. The overall number of repeats in these genomes (including palindromic, forward, and tandem repeats) ranged from 59 (T. monococcum subsp. monococcum) to 75 (T. aestivum; NC002762), with 71 and 73 repeats found in T. sphaerococcum and T. turgidum subsp. durum, respectively. Among these repeats, the highest number of palindromic repeats (21) was detected in four species, including T. turgidum subsp. durum (Figure 3A-E). However, the highest number of forward repeats (26) was detected in the T. timopheevii cultivar Tim01 and T. zhukovskyi plastomes. Similarly, the highest number of tandem repeats was detected in the T. aestivum (NC002762) and T. turgidum subsp. durum cultivar Langdon plastomes. Overall, there was a positive correlation between plastome size and IR length (Figure 3F).

Simple Sequence Repeat (SSR) Analysis in Triticum Plastomes
We analyzed perfect SSRs in all studied plastomes (Figure 4). As with other plastome characteristics, there was some variation in the number of SSRs in these plastomes: SSR numbers ranged from 124 (T. aestivum NC002762) to 132 (T. timopheevii cultivar Tim01 and T. zhukovskyi). Unexpectedly, SSR numbers varied even among plastomes of the same species; for example, 124 SSRs were detected in T. aestivum (NC002762), whereas 131 were detected in T. aestivum (KJ614403). We observed a positive association between SSR numbers and plastome size in Triticum (Figure 3F). The T. sphaerococcum and T. turgidum subsp. durum plastomes sequenced in this study contained 128 and 127 SSRs, respectively. In T. sphaerococcum, of the 128 SSRs, 121 were mononucleotide repeats (Figure 4A), and 1 dinucleotide, 3 trinucleotide, and 3 pentanucleotide repeats were detected; 82% of SSRs were found in LSC regions, around 10.1% were in SSC regions, and 3.9% were in IR regions. Similarly, T. turgidum subsp. durum contained 120 mononucleotide repeats, with the same numbers of dinucleotide, trinucleotide, and pentanucleotide repeats as in the T. sphaerococcum plastome, and 81.8% of its SSRs were in LSC regions, 10.2% in SSC regions, and 3.9% in IR regions (Figure 4). In all plastomes, the most abundant repeat motifs were mononucleotides, ranging from 117 in T. aestivum (NC002762) to 125 in T. timopheevii (Tim01), followed by trinucleotides and pentanucleotides, which were the next most abundant in most plastomes (Figure 4). Using our search criteria, tetranucleotide and hexanucleotide SSRs were absent in most plastomes except in those of T. aestivum spelta PI384000, T. turgidum TA0060, and T. turgidum cultivar TA1133, in which one hexanucleotide was observed in each plastome. Notably, only one dinucleotide SSR was detected in most plastomes.

Comparative Analysis and Divergence of Triticum Plastomes
Comparison of T. sphaerococcum and T. turgidum subsp. durum with the 19 related plastomes showed that variation existed across whole plastomes. We aligned and compared all plastomes to find the average pairwise distance among the species (Table S2). T. sphaerococcum was used as a reference in the pairwise sequence divergence analysis, in which it showed the highest divergence (0.004) with T. monococcum and the lowest divergence (0.001) with the T. aestivum, T. turgidum, and T. timopheevii plastomes (Table S2). Similarly, the synteny of the T. sphaerococcum and T. turgidum subsp. durum plastomes with the 19 plastomes of the other Triticum species was analyzed by mVISTA. The results showed high sequence similarities among the plastomes of several species, especially in protein-coding and IR regions (Figure S1). However, greater divergence was observed in noncoding regions than in coding regions. The matK gene exhibited a nearly uniform level of divergence in all plastomes. The region between the rps16 and psbI genes showed the highest divergence, except in T. sphaerococcum, which had comparatively low divergence. The atpF gene exhibited divergence in all cp genomes except those of T. sphaerococcum and T. turgidum subsp. durum. Similarly, the noncoding region between the trnL and ndhJ genes of T. monococcum subsp. monococcum showed large divergence from the corresponding region in the T. turgidum subsp. durum plastomes. Similar results were observed in the psbE and petL regions. Furthermore, a significant divergence was observed in all plastomes in the rpl23-ndhB region. Moreover, 42 protein-coding gene sequences were compared to obtain the average pairwise distance among the 21 Triticum plastomes. The results showed relatively low levels of average pairwise divergence, and several divergent genes, i.e., matK, ccsA, ndhH, petA, psbD, rpl14, and ycf3, were detected in these plastomes. The highest pairwise divergence was detected in the matK (0.67) and ccsA (0.56) genes (Figure 4E).

Evolution and Origin of IRs in Triticum Plastomes
IR regions are considered to be the most conserved regions in a chloroplast genome. Larger plastome sizes correlate with larger IR lengths. Similar to previously described angiosperm plastomes, both the T. sphaerococcum and T. turgidum subsp. durum plastomes contained IRs, with lengths of 20,699 and 20,701 bp, respectively. The gene arrangement in the IR region of these plastomes was similar to that of T. aestivum, in which seven protein-coding genes (rpl2, rpl23, rps12, rps7, ndhB, rps15, and ndhH) are duplicated. Comparative analyses of the plastomes showed that the smallest IR region was found in T. turgidum subsp. durum cultivar Langdon (17,066 bp), whereas the largest was detected in all four T. timopheevii cultivars (21,553 bp). Furthermore, the IR borders and adjacent genes in the T. sphaerococcum and T. turgidum subsp. durum plastomes were compared with those of already published Triticum plastomes. Due to the conserved nature of these Triticum plastomes, no significant changes were observed in the four IR junctions (JLB, JSB, JSA, and JLA) or in the genes located across these borders. However, a few variations were detected in some plastomes. For example, ndhH at the JSA junction spans 975 bp of the SSC and 207 bp of IRa in all plastomes except that of T. turgidum subsp. durum cultivar Langdon, in which it deviates slightly, spanning 1004 bp of the SSC and 208 bp of IRa (Figure S2). Furthermore, rps19 is present in the IRa region at JLA in all plastomes except that of T. turgidum subsp. durum cultivar Langdon, in which it is located in the LSC at JLA, changing the location of the rpl22 gene relative to all other plastomes. Overall, the compared plastomes were found to be highly conserved.

Plastome Phylogenomics and Diversification of Triticum Plastomes
We used full-length plastome sequences and 42 concatenated protein-coding genes shared among all Triticum species to infer the phylogenetic position and divergence time of both T. sphaerococcum and T. turgidum subsp. durum in relation to the other Triticum species. Phylogenetic analyses were performed using maximum likelihood (ML) and Bayesian inference methods. The current study provides the first molecular phylogeny of the Triticum genus based on the complete plastomes and 42 shared genes from all the Triticum species available in the NCBI database. The plastomes of barley (Hordeum species) were used as an outgroup for the Triticum/Aegilops complex. The molecular clock was calibrated using 11.6 million years ago (mya) for the divergence time between barley and wheat (Figures 5 and 6). The topologies of the two trees are almost identical. According to our phylogenetic analysis, Sitopsis species, including the S1S1, SsSs, ShSh, and SbSb genomes, diverged before T. urartu (AA) and T. monococcum (AA). Similarly, A. speltoides (SS) formed a monophyletic clade with T. timopheevii (AAGG) and T. zhukovskyi (AmAmAAGG), as previously reported [56]. Polyploid Triticum species and A. speltoides formed a clade, indicating that A. speltoides is the maternal donor of polyploid wheat genomes (Figure 5). The precise phylogenetic relationship among the A, B, and D genomes remains a topic for debate. In the current results, all hexaploid T. aestivum species form a clade with wild tetraploid T. turgidum species based on 60 protein-coding genes. Similar results were obtained from the whole plastomes, except that T. turgidum subsp. durum and T. aestivum (NC002762) together formed a separate clade (Figure 6). The divergence time of the AB and ABD genomes was estimated at 0.4 to 0.8 mya based on both the complete plastomes and 42 shared genes. The sequenced plastomes of T. sphaerococcum and T. turgidum subsp. durum shared the same clade and diverged 0.19 to 0.24 mya based on both complete plastomes and protein-coding genes (Figures 5 and 6).

Discussion
In this study, we sequenced two plastomes from the Triticum genus, those of T. sphaerococcum (ABD) and T. turgidum subsp. durum (AB), and compared them to 19 complete plastomes that were accessible from GenBank to investigate the relationships within Triticum. Both the T. sphaerococcum and T. turgidum subsp. durum plastomes were found to have a quadripartite structure with plastome lengths of 134,531 and 134,015 bp, respectively. Variations were observed in plastome size and GC content compared with those of plastomes from related species, especially those with AA genomes. As with previously reported plastomes, variation in plastome length may be due to factors such as loss of IR regions and absence of essential genes [57–59]. Diploid (AA), tetraploid (AB, AG) and hexaploid (ABD, AGAm) Triticum species plastomes showed conserved gene content and commonly harbored an identical set of annotated unique genes. These plastomes comprised 72-89 protein-coding genes, 4-8 rRNA genes, and 32-42 tRNA genes (Table 1), which are characteristic of plastomes [60–63]. The number of genes annotated in plastomes ranged from 112 (T. macha and T. monococcum subsp. monococcum) to 136 (T. turgidum subsp. durum cultivar Langdon).
Gene transfer and loss commonly occur in plant plastomes [64,65]. For example, accD, ycf1, ycf2, rpl23, rpl22, infA, rps16, and ycf4 were reported to be partially or completely lost from legume plastomes [66,67]. Of these, some genes, such as infA, have even been lost multiple times. To the best of our knowledge, this is the first study to use such a large number of representative plastomes to assess gene loss patterns in closely related plants; here, we observed that the ycf1, ycf2, ycf15, and ycf68 genes were lost from the Triticum plastomes that have been sequenced and analyzed (Figure 2). According to our findings, T. macha has the highest number of missing protein-coding genes compared with other Triticum plastomes, i.e., ycf1, ycf2, ycf15, ycf68, petB, and petD, whereas T. monococcum subsp. monococcum and T. turgidum subsp. durum cultivar Langdon each lack one protein-coding gene (ycf1 and ycf15, respectively) (Figure 2). All these plastomes lacked the accD gene, which encodes a subunit of acetyl-CoA carboxylase. Similar findings have previously been observed in plastomes of Poaceae members [59]. This gene has also been lost or pseudogenized in several species from Campanulaceae [68,69], Geraniaceae [70], Oleaceae [71], and all Poaceae [59], including the Triticeae studied in this work. The deletion of the accD gene from chloroplast genomes in Poaceae, with its related protein encoded by a nuclear gene, is perhaps the first such example for a non-ribosomal component [72]. In conclusion, to reduce the size of a chloroplast genome, in addition to gene transfer from the chloroplast to the nucleus, a chloroplast gene could be deleted and a nuclear gene could instead encode the related protein [59].
A remarkable feature of the plastome is the conservation of its overall structure and genome size [73]. To determine the contribution of repetitive DNA sequences to genome variation and evolution, we investigated their evolutionary dynamics across these closely related plastomes. The total number of repeats (including palindromic, forward, and tandem repeats) in these genomes ranged from 59 (T. monococcum) to 75 (T. aestivum; NC002762), as shown in Figure 3. Overall, there was a positive correlation between the number of repeats and plastome size. In all plastomes, the number of tandem repeats was higher than the number of palindromic and forward repeats. However, the molecular mechanism underlying the de novo generation of novel repeated sequences in a chloroplast genome remains largely unresolved [74]. Numerous repeats have previously been identified in angiosperm plastomes [52,74,75], but the mechanisms underlying the appearance of these tandem repeats remain to be elucidated. Nevertheless, plastome rearrangement, gene expansion, and gene duplication are known to be associated with such repeats [76,77].
Plastome SSR diversity is an attractive research area in plant biology because of the codominant inheritance, high reproducibility, multiallelic composition, richness, and ease of detection of SSRs [78,79]. SSRs are normally small tandem mononucleotide repeats, typically found in the noncoding regions of the chloroplast genome, which usually exhibit intraspecific differences in repeat number [80,81]. Genome-wide characterization of SSRs showed that, as with other plastome characteristics, there was variation in the number of SSRs in these plastomes: SSR numbers ranged from 124 (T. aestivum NC002762) to 132 (T. timopheevii cultivar Tim01 and T. zhukovskyi). The T. sphaerococcum and T. turgidum subsp. durum plastomes sequenced in this study contained 128 and 127 SSRs, respectively (Figure 4). Among these SSR motifs, the most abundant were mononucleotides, ranging from 117 in T. aestivum (NC002762) to 126 in T. urartu cultivar PI428335, followed by trinucleotides and pentanucleotides. Using our search criteria, tetranucleotide and hexanucleotide SSRs were absent in most plastomes. Similarly, a linear relationship between SSR numbers and plastome size was detected in Triticum plastomes. A similar correlation was discovered in a previous analysis of various plant whole genomes [79]. As previously reported, genome rearrangement and sequence diversity arise from incorrect recombination of these repeat sequences and slipped-strand mispairing [82,83]. Furthermore, the occurrence of these repeats indicates that the region is a critical hotspot for plastome reconfiguration [83]. Additionally, these repeats represent a useful source for developing genetic markers for Triticum species, which could be further applied in phylogenetic and population studies.
Comparative analysis of the Triticum plastomes showed that they shared 42 protein-coding genes with known functions. Furthermore, pairwise sequence divergence analysis showed that T. sphaerococcum exhibited the highest divergence with T. urartu. Similarly, synteny analysis showed high sequence similarities among the Triticum plastomes, especially in the protein-coding and IR regions. The psbD and atpF genes exhibited divergence in all plastomes excluding those of T. sphaerococcum and T. turgidum subsp. durum. Similarly, the noncoding region between the trnL and ndhJ genes of T. monococcum subsp. monococcum was significantly divergent from that of the T. turgidum subsp. durum plastomes. Similar results were observed for the psbE and petL regions. Furthermore, a significant divergence was observed in all plastomes in the rpl23-ndhB region. Moreover, the average pairwise distances among the 42 shared protein-coding genes revealed that the most divergent genes were matK, ccsA, ndhH, petA, psbD, rpl14, and ycf3 in almost all species. This is in agreement with previously reported findings for the plastomes of angiosperms [33,52]. Menezes et al. [84] concluded that divergent plastome genes are predominantly detected in the LSC regions, suggesting a more rapid evolutionary trend [85].
In addition to the overall conservation of gene content, the analyzed Triticum plastomes were structurally conserved and displayed an almost uniform order of the same set of genes, with slight differences at boundary junctions. The plastomes of Triticum, like those of other Poaceae members [59], are generally smaller than those of most angiosperms due to the pseudogenization of the ycf1 and ycf2 genes, i.e., two of the longest open reading frames in angiosperms. Due to the conserved nature of these Triticum plastomes, no substantial changes in the four IR junctions and gene positions across IR boundaries were identified, with only minor variations between plastomes. This is not exceptional, as slight IR expansion/contraction has also been reported in other angiosperm plastomes [71,86,87].
Despite a plethora of molecular phylogenies, the relationships within the Triticum genus remain poorly understood [20,22,23,88–92]. The acceptance levels of taxa vary greatly among studies at the genus level and below [20,93,94]. One important reason for this is the complex mode of evolution within Triticeae. The majority of species are allopolyploids, and many of them likely originated repeatedly through processes involving genetically different parent species [95–97]. Bread wheat is the most prominent polyploid and evolved via consecutive hybridizations of three diploids; thus, it combines three related genomes (A, B, and D) [11,92]. It is assumed that diploid species and monogenomic taxa are the basic units within Triticeae and that the heterogenomic polyploids form a second level of taxonomic entities [98,99]. The paternal ancestors of polyploid wheat, i.e., T. urartu, T. monococcum, and A. tauschii, have been well established based on the sequence analysis of many nuclear genes. The history of the maternal ancestors of the G genome of Timopheevi and the B genome of Emmer wheat has been more difficult to decipher based on nuclear genome sequences.
We presented phylogenetic analyses of Triticum based on complete plastomes and 42 concatenated protein-coding genes shared among these Triticum species. Our results are similar to those of previous studies. In particular, the divergence of H. vulgare from wheat, previously estimated to have occurred approximately 8-12 mya [53,56,100], was dated to 11.04 and 11.9 mya using complete plastomes and protein-coding genes, respectively. These values are in the range of estimates reported by [53] (approx. 7-16 mya) and [100] (8.3-11.3 mya), but more similar to the 11.4 ± 0.6 mya reported by [56]. Of particular interest to us was the precise phylogeny and dating of the wheat genome donors. Our analysis indicates that the donors (or their close relatives) of the wheat genomes diverged within the past 3 million years, as previously reported [101]. According to our phylogenetic analysis, Sitopsis species, including the S1S1, SsSs, ShSh, and SbSb genomes, diverged before T. urartu (AA) and T. monococcum (AA), around 2.8-2.9 mya. Our estimates are in the range of previous reports [53,56], which ranged rather widely from 2 to 6 mya. Similarly, T. monococcum diverged from T. urartu about 1.7-1.9 mya. These estimates are more recent than the previous ones [53,56]. Furthermore, our results showed that polyploid Triticum species and A. speltoides formed a clade, indicating that A. speltoides is the maternal donor of polyploid wheat genomes and diverged ~0.2-0.9 mya, based on complete plastomes and protein-coding gene trees, respectively, from the T. timopheevii (AAGG) and T. zhukovskyi (AmAmAAGG) genomes. Similar results were reported previously by various researchers [22,53,101]. The relationships within this clade support the hypothesis that two different A. speltoides lineages were involved in their formation [5,9,22,102]. As in previous studies, the direct maternal donor for T. timopheevii and T. zhukovskyi (G) was identifiable because they share the chloroplast haplotype of A. speltoides genomes. According to our results, the donor remains uncertain for T. turgidum and T. aestivum (B), indicating that either our sampling of A. speltoides was insufficient to cover the species diversity or that a now extinct donor lineage previously existed. Aegilops speltoides and the polyploid wheat species form three groups: (1) most A. speltoides accessions form a clade of their own (S), (2) some share a clade with T. timopheevii and T. zhukovskyi wheat, and (3) all accessions of T. turgidum and T. aestivum share the same haplotype (B). Additionally, the use of entire plastomes and shared protein-coding genes also suggests that diploid Triticum species (A) diverged from D-genome taxa and the remaining Aegilops species 2.83 mya.

Conclusions
Decoding Triticum plastomes during the last three decades has greatly increased the amount of available plastomic data and provided an improved picture of Triticum plastome evolution. The elucidated plastomes of diploid (AA), tetraploid (AB, AG) and hexaploid (ABD, AGAm) species show conserved gene content. Despite significant effort to characterize Triticum plastomes at the genus/species level, certain species remain poorly studied; thus, full systematic phylogenetic studies are lacking. Based on 42 shared genes and full plastomes, the tree height between H. vulgare and Triticum was found to be 11.04 and 11.9 mya, respectively. In the future, additional plastomes will be sequenced and comparatively analyzed to provide a more complete picture of Triticum plastome evolution.