Comparative Chloroplast Genomics of 21 Species in Zingiberales with Implications for Their Phylogenetic Relationships and Molecular Dating

Zingiberales includes eight families and more than 2600 species, with many species having important economic and ecological value. However, the backbone phylogenetic relationships of Zingiberales still remain controversial, as demonstrated in previous studies, and molecular dating based on chloroplast genomes has not been comprehensively studied for the whole order. Herein, 22 complete chloroplast genomes from 21 species in Zingiberales were sequenced, assembled, and analyzed. These 22 genomes displayed typical quadripartite structures, which ranged from 161,303 bp to 163,979 bp in length and contained 111–112 different genes. The genome structures, gene contents, simple sequence repeats, long repeats, and codon usage were highly conserved, with slight differences among these genomes. Further comparative analysis of the 111 complete chloroplast genomes of Zingiberales, including 22 newly sequenced ones and the remaining ones from the national center for biotechnology information (NCBI) database, identified three highly divergent regions comprising ccsA, psaC, and psaC-ndhE. Maximum likelihood and Bayesian inference phylogenetic analyses based on chloroplast genome sequences found identical topological structures and identified a strongly supported backbone of phylogenetic relationships. Cannaceae was sister to Marantaceae, forming a clade that was collectively sister to the clade of (Costaceae, Zingiberaceae) with strong support (bootstrap (BS) = 100%, and posterior probability (PP) = 0.99–1.0); Heliconiaceae was sister to the clade of (Lowiaceae, Strelitziaceae), then collectively sister to Musaceae with strong support (BS = 94–100%, and PP = 0.93–1.0); the clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)) was sister to the clade of (Musaceae, (Heliconiaceae, (Lowiaceae, Strelitziaceae))) with robust support (BS = 100%, and PP = 1.0). The results of divergence time estimation of Zingiberales indicated that the crown node of Zingiberales occurred approximately 85.0 Mya (95% highest posterior density (HPD) = 81.6–89.3 million years ago (Mya)), with major family-level lineages becoming from 46.8 to 80.5 Mya. These findings proved that chloroplast genomes could contribute to the study of phylogenetic relationships and molecular dating in Zingiberales, as well as provide potential molecular markers for further taxonomic and phylogenetic studies of Zingiberales.

In addition to studying phylogenetic relationships in Zingiberales, researchers have also explored the divergence time of Zingiberales, but most studies have been based on gene fragments.Kress and Specht (2006) [7] constructed the phylogenetic divergence time of Zingiberales based on three genes (atpB, rbcL, and 18S) from 36 taxa (including 24 species and 12 outgroups) and five calibration points, indicating that Zingiberales emerged approximately 124 million years ago (Mya), with major family-level lineages becoming established approximately 80-110 Mya.Specht (2006) [22] used chloroplast trnL-F and trnK sequence data for 37 taxa and one outgroup, and estimated the divergence time of Costaceae.The results showed that the initial diversification within Costaceae occurred approximately 65 Mya [22].Givnish et al. (2018) [6] used draft chloroplast genomes generated for 52 species and estimated the divergence time of Zingiberales using five genes (atpB, psaA, psbD, rbcL, and rps4).The results indicated that the stem and crown nodes of Zingiberales took place 114 Mya and 83 Mya, respectively [6].More recently, Fu et al. (2022) [9] used five genes (ccsA, matK, ndhF, rpoC1, and rpoC2) from 61 chloroplast genomes of Zingiberales and one outgroup to estimate the divergence time, showing that the stem and crown nodes of Zingiberales occurred at 98.57 Mya and 87.59 Mya, respectively [9].Additionally, Ashokan et al. (2022) [26] used three chloroplast markers (trnK/matK, trnL, and rps16) plus one nuclear (ITS), three fossil calibration constraints, and one secondary calibration to estimate the divergence time of Hedychium.The result revealed that Hedychium occurred 10.6 Mya [26].Most of these studies based on gene fragments generated relatively weakly supported relationships; therefore, more evidence is crucial to further exploring the divergence time and to reconstruct the phylogenetic relationships of Zingiberales.
In this study, we sequenced 22 complete chloroplast genomes of 21 species, including 5 genera (Cornukaempferia, Hedychium, Kaempferia, Calathea and Stromanthe) from two families Zingiberaceae and Marantaceae, and explored the chloroplast genomes structural features and sequences differentiation among these species.Furthermore, we combined them with published chloroplast genomes from the NCBI database, representing eight families of Zingiberales and four outgroups, for reconstructing phylogenetic relationships and estimating the divergence time of Zingiberales.The main aims of this study were to: (1) analyze the structures and features of the newly sequenced complete chloroplast genomes, (2) identify highly variable regions in complete chloroplast genomes as potential DNA markers for future species identification in Zingiberales, (3) infer the molecular evolution of complete chloroplast genomes in Zingiberales, (4) reconstruct the phylogenetic relationships of Zingiberales by chloroplast genomes, especially determining the phylogenetic positions of Cornukaempferia, Hedychium, and Kaempferia, and (5) estimate the divergence time of Zingiberales by chloroplast genomes.

Characteristics of 22 Complete Chloroplast Genomes in Zingiberales
The 22 newly sequenced samples of Zingiberales generated approximately 222.60 Gb of paired-end clean data, ranging from 5.84 to 13.26 Gb clean data for each sample after removing adapters and low-quality data (Table S1).All 22 sequenced chloroplast genomes displayed a typical quadripartite structure containing one large single-copy (LSC), one small single-copy (SSC) and two inverted repeat regions (IRa and IRb) by OGDRAW [27] and CGView tool [28] (Figure 1 and Table 1).The 22 complete chloroplast genomes generated here were deposited in GenBank with accession numbers OP805573 to OP805594 (Table 1).Their size ranged from 161,303 bp (Stromanthe sanguinea) to 163,979 bp (Hedychium menghaiense) (Tables 1 and S2).They showed four junction regions, including a separate LSC region of 87,056-91,910 bp, an SSC region of 15,640-20,848 bp, and a pair of IRs (IRa and IRb) of 27,074-29,788 bp each (Figure 1, Tables 1 and S2).The GC content of these 22 complete chloroplast genomes was very similar (36.08-36.67%)(Table 1).Specifically, the GC content in the IR regions (41.14-42.16%)was higher than that in the LSC regions (33.84-34.44%)and SSC regions (29.49-30.35%)(Table 1).The GC content of protein-coding genes of these 22 complete chloroplast genomes ranged from 36.91% to 37.60% (Table 1).
In this study, each sequenced chloroplast genome contained 131-134 predicted functional genes, which consisted of 87-88 protein-coding genes, 36-38 tRNA genes, and 8 rRNA genes (Tables 1 and S2).Among these genes, a total of 111-112 different genes were detected in these 22 chloroplast genomes, including 79 protein-coding genes, 28-29 tRNA genes and 4 rRNA genes (Tables 1 and S2).Although most of the protein-coding genes tRNAs and rRNAs among these 22 chloroplast genomes were similar, slight differences existed, for examples, rpl22 had two copies only in the chloroplast genome of Calathea makoyana.In addition, trnH-GUG had only one copy in the chloroplast genome of Hedychium brevicaule in which trnR-UCU was missing (Tables 2 and S2).
Note: CDS protein-coding genes, GC guanine-cytosine, LSC large single-copy region, SSC small single-copy region, and IR inverted repeat.
Table 2. Gene contents in the newly sequenced 22 complete chloroplast genomes of Zingiberales in this study.

Repeat Sequences Analyses
In this study, four types of long repeats including forward repeats, palindromic repeats, reverse repeats, and complement repeats were identified in our newly assembled 22 chloroplast genomes (Figure 2A and Table S3).Among these 22 chloroplast genomes, C. makoyana had the largest number of long repeats (384) and Kaempferia parviflora had the smallest number of long repeats (51) (Figure 2A and Table S3).The number of forward repeats varied between 19 (Cornukaempferia aurantiflora) and 306 (C.makoyana), the number of palindromic repeats varied from 23 (K.parviflora) to 70 (C.makoyana), the number of reverse repeats varied between 3 (K.parviflora) and 21 (Hedychium villosum var.tenuiflorum), and the number of complement repeats varied from 1 (C.makoyana and Hedychium yunnanense) to 9 (H.villosum var.tenuiflorum) (Figure 2A and Table S3).Long repeats with 30-34 bp were found to be the most common in these 22 chloroplast genomes (Figure 2B and Table S3).Long repeats with lengths of ≥45 bp and lengths of 35-39 bp were the second and third most common, respectively (Table S3).

Repeat Sequences Analyses
In this study, four types of long repeats including forward repeats, palindromic repeats, reverse repeats, and complement repeats were identified in our newly assembled 22 chloroplast genomes (Figure 2A and Table S3).Among these 22 chloroplast genomes, C. makoyana had the largest number of long repeats (384) and Kaempferia parviflora had the smallest number of long repeats (51) (Figure 2A and Table S3).The number of forward repeats varied between 19 (Cornukaempferia aurantiflora) and 306 (C.makoyana), the number of palindromic repeats varied from 23 (K.parviflora) to 70 (C.makoyana), the number of reverse repeats varied between 3 (K.parviflora) and 21 (Hedychium villosum var.tenuiflorum), and the number of complement repeats varied from 1 (C.makoyana and Hedychium yunnanense) to 9 (H.villosum var.tenuiflorum) (Figure 2A and Table S3).Long repeats with 30-34 bp were found to be the most common in these 22 chloroplast genomes (Figure 2B and Table S3).Long repeats with lengths of ≥45 bp and lengths of 35-39 bp were the second and third most common, respectively (Table S3).Six types of simple sequence repeats (SSRs) were discovered among these 22 chloroplast genomes: mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide (Figure 3A and Table S4).In total, there were 85 to 127 SSRs in each chloroplast genome (Figure 3A).Among these SSRs, only the chloroplast genome of C. aurantiflora had no pentanucleotide repeats, and four chloroplast genomes of Hedychium chrysoleucum, Hedychium tienlinense, H. menghaiense, and S. sanguinea had no hexanucleotide repeats (Figure 3A and Table S4).Among each sequenced chloroplast genome, mononucleotide repeats were the most frequent, with numbers ranging from 41 to 63, followed by dinucleotide repeats, ranging from 14 to 35, tetranucleotide repeats, ranging from 10 to 21, trinucleotide repeats, ranging from 3 to 14, hexanucleotide repeats, ranging from 0 to 7, and pentanucleotide repeats, ranging from 0 to 6 (Figure 3A and Table S4).The majority of the mononucleotide SSRs were A/T repeats, which accounted for 92.68-100% of all the mononucleotide types among these 22 genomes (Figure 3B and Table S4).In the dinucleotide repeats, the AG/CT repeats were observed most frequently except in the chloroplast genome of C. makoyana (Figure 3B and Table S4).In the chloroplast genome of C. makoyana, AT/AT repeats were the most frequent dinucleotide repeats (Figure 3B and Table S4).In the tetranucleotide category, the AAAT/ATTT repeats were the most abundant type (Figure 3B and Table S4).SSRs were more frequently located in the LSC regions (54-87 loci, 62.09-73.68%)than in the SSC regions (15-23 loci, 14.02-22.35%),IRa regions (3-10 loci, 3.53-8.40%),and IRb regions (3-9 loci, 3.53-8.05%)among these 22 chloroplast genomes (Table S4).

Codons Usage Analysis
The 22 newly sequenced chloroplast genomes of Zingiberales were analyzed to survey information on the codon usage, amino acid frequency, and relative synonymous codon usage (RSCU) (Table S5).The total codons (excluding stop codons) of these 22 chloroplast genomes ranged from 25,908 (S. sanguinea) to 27,694 (H.brevicaule).Of these, 61 codons encoded 20 amino acids (Figure 4 and Table S5).The codons ATG and TGG, encoding methionine (Met) and tryptophan (Trp), respectively, showed no codon bias, both with RSCU values of 1.00 among these 22 chloroplast genomes (Figure 4 and Table S5).The codons of four with the highest RSCU values (AGA, TTA, GCT, and TCT) and six with the lowest RSCU values (AGC, GGC, CGC, GCG, CTG and GAC) were found in these 22 chloroplast genomes (Figure 4 and Table S5).Twenty-nine codons showed codon usage bias with RSCU > 1.00 in protein-coding genes of 21 chloroplast genomes; however, in protein-coding genes of C. makoyana chloroplast genome, thirty codons indicated codon usage bias with RSCU > 1.00 (Table S5).

Comparative Analysis of 22 Complete Chloroplast Genomes in Zingiberales
Using the complete chloroplast genome of H. bijiangense as the reference, 22 newly sequenced chloroplast genomes of Zingiberales were compared by using mVISTA [29] and CGView [28] (Figures 1 and 5).The mVISTA result revealed that the two IR regions were less divergent than the LSC and SSC regions (Figure 5).The non-coding regions showed obviously higher divergence than the protein-coding regions (Figure 5).The main divergences for the protein-coding regions were trnQ, rpl33, trnG, ycf3, trnS and ccsA.For the non-coding regions, highly divergent regions were accD-psaI, rpl32-trnL, rbcL-accD, and ndhG-psaC (Figure 5).The CGView result also indicated that the LSC and SSC regions were significantly more divergent than the two IR regions, and the main divergences originated from LSC and SSC regions (the innermost fourth color ring to the outwards 25th ring in Figure 1).Compared to the chloroplast genome of H. bijiangense (the innermost fourth color ring in Figure 1), the other 21 complete chloroplast genomes showed four divergent regions in LSC (trnG-trnS, rbcL-accD, psaA-ycf3 and ycf4-cemA), one region in SSC (ndhG-psaC) and one region in iRa (ycf1).

Comparative Analysis of 22 Complete Chloroplast Genomes in Zingiberales
Using the complete chloroplast genome of H. bijiangense as the reference, 22 newly sequenced chloroplast genomes of Zingiberales were compared by using mVISTA [29] and CGView [28] (Figures 1 and 5).The mVISTA result revealed that the two IR regions were less divergent than the LSC and SSC regions (Figure 5).The non-coding regions showed obviously higher divergence than the protein-coding regions (Figure 5).The main divergences for the protein-coding regions were trnQ, rpl33, trnG, ycf3, trnS and ccsA.For the non-coding regions, highly divergent regions were accD-psaI, rpl32-trnL, rbcL-accD, and ndhG-psaC (Figure 5).The CGView result also indicated that the LSC and SSC regions were significantly more divergent than the two IR regions, and the main divergences originated from LSC and SSC regions (the innermost fourth color ring to the outwards 25th ring in Figure 1).Compared to the chloroplast genome of H. bijiangense (the innermost fourth color ring in Figure 1), the other 21 complete chloroplast genomes showed four divergent regions in LSC (trnG-trnS, rbcL-accD, psaA-ycf3 and ycf4-cemA), one region in SSC (ndhG-psaC) and one region in IRa (ycf1).The borders of LSC/IRb, IRb/SSC, SSC/IRa, and IRa/LSC among these 22 chloroplast genomes were compared and shown in detail (Figure 6).Among these 22 chloroplast genomes, the rps19 and psbA genes were located in the IRa/LSC borders, respectively (Figure 6).The distances between the ends of rps19 and the IRa/LSC borders were 120-305 bp, and the distances between the start of psbA and the IRa/LSC boundaries ranged from 93 bp to 291 bp (Figure 6).For the LSC/IRb borders, the rpl22 and rps19 genes were located in the LSC/IRb borders in all the 22 chloroplast genomes.A total of 24-95 bp were found between the ends of rpl22 and the LSC/IRb borders among these 22 chloroplast genomes, and the distances between the start of rps19 and the LSC/IRb borders ranged from 124 bp to 306 bp (Figure 6).The SSC/IRa borders were located in the ycf1 gene, which crossed into the IRa region in the 21 chloroplast genomes; however, the lengths of ycf1 in the IRa regions significantly varied among these 21 chloroplast genomes, ranging from 1050 bp to 3877 bp (Figure 6).In S. sanguinea, ndhF was found in the SSC/IRa border in its chloroplast genome; and the distance between the end of ndhF and the SSC/IRa border was 5 bp (Figure 6).Regarding the IRb/SSC borders, the ycf1 and ndhF genes were located in the IRb/SSC borders in the 21 chloroplast genomes, except in the chloroplast genome of S. sanguinea.The ycf1 gene expanding into the SSC regions ranged from 6 bp to 132 bp, and the distances between the start of ndhF and the IRb/SSC borders ranged from 4 bp to 447 bp among these 21 chloroplast genomes (Figure 6).Regarding S. sanguinea, only ycf1 gene was located at the border of IRb/SSC in its chloroplast genome, and the ycf1 gene expanded into the SSC region by 4107 bp (Figure 6).In conclusion, the IR/LSC borders of these 22 chloroplast genomes of Zingiberales were relatively conserved and similar, but the IR/SSC borders exhibited variations.bp (Figure 6).The SSC/iRa borders were located in the ycf1 gene, which crossed into the iRa region in the 21 chloroplast genomes; however, the lengths of ycf1 in the iRa regions significantly varied among these 21 chloroplast genomes, ranging from 1050 bp to 3877 bp (Figure 6).In S. sanguinea, ndhF was found in the SSC/iRa border in its chloroplast genome; and the distance between the end of ndhF and the SSC/iRa border was 5 bp (Figure 6).Regarding the iRb/SSC borders, the ycf1 and ndhF genes were located in the iRb/SSC borders in the 21 chloroplast genomes, except in the chloroplast genome of S. sanguinea.The ycf1 gene expanding into the SSC regions ranged from 6 bp to 132 bp, and the distances between the start of ndhF and the iRb/SSC borders ranged from 4 bp to 447 bp among these 21 chloroplast genomes (Figure 6).Regarding S. sanguinea, only ycf1 gene was located at the border of iRb/SSC in its chloroplast genome, and the ycf1 gene expanded into the SSC region by 4107 bp (Figure 6).In conclusion, the IR/LSC borders of these 22 chloroplast genomes of Zingiberales were relatively conserved and similar, but the IR/SSC borders exhibited variations.

Highly Divergence Regions and Selective Pressure Analyses of Zingiberales
Nucleotide diversity (Pi) and single-nucleotide substitutions in the LSC, SSC, IRa, IRb and the total of chloroplast genomes were analyzed among the 111 complete chloroplast genomes of Zingiberales (Figure 7 and Table 3).One hundred and eleven chloroplast genomes of Zingiberales were aligned with a matrix of 163,883 bp with 111,120 variable sites (67.80%) and 92,746 parsimony informative sites (56.59%).The Pi value of the complete chloroplast genomes was 0.1565 (Table 3).The LSC region had the highest Pi value (0.1751) and the IRa region had the lowest Pi value (0.1455) (Table 3).Among the protein-coding regions, the Pi values ranged from 0 to 0.2695 and had an average value of 0.1399 (Table S6).Four protein-coding regions (trnM, trnL, ccsA and psaC) showed remarkably high values (Pi > 0.20; Figure 7A and Table S6).For the non-coding regions, Pi values ranged from 0 to 0.2379 (ccsA-ndhD) and had an average of 0.1423 (Table S6).Five of these regions had remarkably high values (Pi > 0.20), including trnQ-psbK, psbL-psbF, ccsA-ndhD, ndhD-psaC, and psaC-ndhE (Figure 7B and Table S6).Considering the results of Pi values and the length of regions ≥ 150 bp for the selection of potential molecular markers of Zingiberales, three regions were identified, including ccsA, psaC and psaC-ndhE.

Genome Structure Comparison and Sequence Variation
In this study, 22 complete chloroplast genomes of Zingiberales showed a typical quadripartite structure (a single LSC region, a single SSC region and a pair of IR regions),

Genome Structure Comparison and Sequence Variation
In this study, 22 complete chloroplast genomes of Zingiberales showed a typical quadripartite structure (a single LSC region, a single SSC region and a pair of IR regions), which had been reported in other Zingiberaceae and Musaceae species [9][10][11]13,14].Although most of the protein-coding genes tRNAs and rRNAs were highly conserved, gene loss and gene duplication occurred in these 22 complete chloroplast genomes.For examples, H. brevicaule lost trnR-UCU and had only one copy of trnH-GUG in its chloroplast genome and C. makoyana had two copies of rpl22 only in its chloroplast genome, suggesting that gene loss and insertion had happened during the evolutionary processes of H. brevicaule and C. makoyana.In contrast, there had been reports of gene loss and duplication in other chloroplast genomes of higher plants, such as losses of ndh genes in families Orobanchaceae [30] and Orchidaceae [31], and duplication of trnS-GCU and trnT-UGU in Globba schomburgkii [19].IR contraction and expansion of chloroplast genomes were recognized to be important evolutionary events and may cause chloroplast genome size variations, production of pseudogenes, gene duplication or the reduction in duplicate genes to single genes [19,30,31].In this study, the IR region of C. makoyana was 27,074 bp long, which was shorter than that of the other 21 chloroplast genomes (Table 1), indicating the other Zingiberales species differentiated later than C. makoyana.Additionally, among studied 22 chloroplast genomes herein, only ycf1 and ndhF were found at the IRb/SSC and SSC/IRa borders in the chloroplast genome of S. sanguinea, respectively.Therefore, changes in the SSC/IR borders may be the main reasons for the IR contraction and expansion in these Zingiberales species.
Comparative analysis of the 111 complete chloroplast genomes of Zingiberales revealed that the LSC and SSC regions were more divergent than the IR regions (Table 3 and Figure 7), consistent with findings for other plants [12][13][14][15]31].Previous studies used three combined molecular markers (atpB, rbcL and 18S) to confirm the relationships among the ginger families, but were not able to resolve the earliest divergent lineages in Zingiberales [2].Based on the Pi values studied in the present study, it was also obvious that the frequently used chloroplast genome markers, including atpB and rbcL, presented relatively low polymorphisms (0.17, 0.16, respectively) at the order level.Based on Pi values studied currently, 3 divergent hotspot regions among 111 complete chloroplast genomes of Zingiberales were identified, including ccsA, psaC and psaC-ndhE.By comparison, ccsA was also reported as a divergent hotspot region in Myrtales [32] and Monsteroideae [33].Therefore, these variable regions may be suitable for potential DNA markers for phylogenetic relationships and species identification studies of Zingiberales.

Positive Selection and Phylogenetic Analysis within Zingiberales
In this study, 32 genes with positive selection sites were identified among 111 complete chloroplast genomes of Zingiberales (Table 4).In contrast, we found fewer genes under positive selection than results in a previous study [34], in which most protein-coding genes (62) underwent positive selection pressures among 14 Araceae species.Our results found that 15 positively selected sites were identified in clpP genes for Zingiberales, suggesting that clpP may play an important role in the adaptive evolution of Zingiberales.Additionally, we found that rpoC2, rbcL and matK also possessed relatively high positive selection sites (8, 8, 7, respectively).Current studies have revealed that these four genes with positive selection in land plants may be very common [31,[35][36][37][38].For examples, rpoC2 and rbcL have been reported as positive selection in orchid species [31]; matK and rbcL have been identified under positive selection in Ulmaceae species [35]; rpoC2 and clpP have been identified under positive selection in Anisodus species [36]; matK has been identified under positive selection in the Cotinus species [37], and clpP has been identified under positive selection in the Astragalus species [38].The high positive selection sites of clpP, rpoC2, rbcL and matK indicated that these four genes were valuable markers for the adaptive evolution studies of Zingiberales.On the one hand, Zingiberales species commonly maintained high levels of plant diversity, such as diverse pseudostem heights and leaf sizes; for instance, Musa coccinea had pseudostem height of 100-200 cm and leaf size of 50-100 × 10-25 cm, respectively, while Orchidantha chinensis and Roscoea humeana had pseudostem heights of 50-100 cm and 13-25 cm, respectively, and leaf sizes of 22-30 × 5.5-11 cm and 10-30 × 3-6 cm, respectively [39].On the other hand, Zingiberales species also owned diverse habitats; for example, R. humeana lived in pine forests, meadows and rocky hillside at altitudes of 2900-3800 m, whereas M. coccinea scattered in the valley and slope below altitudes of 600 m [39,40].Therefore, genes of chloroplast genome involved in energy (photosystem, ATP synthase, NADH dehydrogease) and development (ribosome and others), may play important roles during the evolution and adaptation of Zingiberales species to their respective habitats.
Compared with previous studies based on morphology, chloroplast genome genes and nuclear ITS [1,3,23], incomplete chloroplast genomes and nuclear genes [4][5][6]20], our results sampled more species using chloroplast genome sequences and showed high resolution of phylogenetic relationships in Zingiberales (Figure 8).Eight clades representing eight families were fully resolved with strong support (Figure 8).Across these previous studies, the four ginger families (Cannaceae, Marantaceae, Costaceae, and Zingiberaceae) and all the relationships among them were strongly supported; however, the placements of the other four banana families (Musaceae, Strelitziaceae, Lowiaceae, and Heliconiaceae) have to date been uncertain and inconsistent.On the one hand, in some previous studies, Musaceae [1,2,4,8] or Heliconiaceae [3] or Lowiaceae [27] were placed as the sister to all the other families in the Zingiberales.On the other hand, Musaceae was sister to the clade containing Heliconiaceae, which was in turn sister to Strelitziaceae and Lowiaceae with weak to strong support (MLBS = 69-100%, and BIPP = 0.6-1.0)[5,6,20].Our phylogenomic analysis of Zingiberales identified strong support for the sister relationship between Musaceae and a clade of Heliconiaceae + Strelitziaceae-Lowiaceae (MLBS = 94-100%, and BIPP = 0.93-1.0; Figure 8), which was in agreement with the backbone from some previous studies [5,6,20].Additionally, the phylogenetic positions of Cornukaempferia, Hedychium and Kaempferia within Zingiberaceae were also determined.The present phylogenetic results provided high degree of credibility that complete chloroplast genomes may be useful for phylogeny of Zingiberales in the future.Further study should sample more taxa of Zingiberales and obtain more complete chloroplast genomes data to identify whether our results are in agreement with those from nuclear genes.

Divergence Time within Zingiberales
In this study, the results of divergence time estimation showed that the crown node of Zingiberales most likely occurred at 85.0 Mya (95% HPD: 81.6-89.3Mya) (Figure 9).The divergence time of Zingiberales estimated herein was in close proximity to some previous reports, such as 87.59 Mya reported by Fu et al. (2022) [9], and 83 Mya reported by Givnish et al. (2018) [6].However, Kress and Specht (2006) using three genes (atpB, rbcL, and 18S) from 36 taxa (including 24 species of Zingiberales and 12 outgroups) and five calibration points, estimated the divergence time of Zingiberales to be approximately 124 Mya [7].Present analyses estimated that the major family-level lineages of Zingiberales became established approximately 46.8-80.5 Mya (Figure 9), which were younger than the ages estimated by Kress and Specht (2006) (80-110 Mya) [7].In addition, within Zingiberaceae, the crown node of Hedychium had occurred approximately 8.1 Mya (95% HPD: 4.2-15.4Mya) (Figure 9), which coincided with a report that Hedychium was a very young lineage that originated approximately 10.6 Mya [26].Because taxon sampling, analysis methods, molecular data, and calibration points were different among these divergence time studies [6,7,9,26], we cannot perfectly compare across all these studies.Changes in taxon sampling, analysis methods, molecular data, and calibration points may lead to differences in divergence times estimation in these studies [6,7,9,26].

Plant Sample Collection, DNA Extraction, and Sequencing
Fresh leaves of 22 individuals from 21 species, representing 5 genera and 2 families in Zingiberales, were collected from the resource garden (23 • 23 N, 113 • 26 E) of the environmental horticulture research institute at the Guangdong Academy of Agricultural Sciences, Guangzhou, China.Each leaf sample was immediately frozen in liquid nitrogen and stored at −80 • C until use.Total chloroplast genomic DNA was extracted from each leaf sample using sucrose gradient centrifugation method with minor modifications [41].The chloroplast DNA quality and concentration were detected by using 1% (w/v) agarose gel electrophoresis and NanoDrop 2000 microspectrometer (Wilmington, DE, USA).Each qualified chloroplast DNA (1.0 µg) was used for construction a DNA library with fragments of approximately 350 bp, volume of 25 µL and concentration of 21.5 ng/µL, and then sequenced on an Illumina NovaSeq 6000 platform with 150 bp paired-end reads length (Biozeron, Shanghai, China).The original raw data were filtered by Trimmomatic v. 0.39 with default parameters [42] to delete adaptors and low-quality reads.

Chloroplast Genome Assembly and Annotation
Filtered high-quality clean reads were assembled into complete chloroplast genomes using GetOrganelle v. 1.7.6. 1 [43] with default settings.Geneious Prime 2022 (Biomatters Ltd., Auckland, New Zealand) [44] was used for sequence correction with a reference chloroplast genome of Hedychium coronarium from Guangdong (GenBank MK262736).Each assembled complete chloroplast genome was annotated using GeSeq [45] and the online Dual Organellar Genome Annotator (DOGMA) [46] with default parameters, respectively.The transfer RNA (tRNA) and ribosomal RNA (rRNA) sequences were confirmed by tR-NAscanSE v. 2.0.5 [47] and BLAST v. 2.13.0 [48].The newly annotated complete chloroplast genome sequences were first validated using online GB2sequin [49], further were verified and formatted using Sequin v. 15.50 from NCBI, and deposited in GenBank (accession numbers are shown in Table 1).Newly complete chloroplast genomes maps were drawn using Organellar Genome Draw (OGDRAW) v. 1.3.1 [27].

Repeat Sequences Analysis
Four types of long repeats of the newly sequenced 22 chloroplast genomes of Zingiberales, namely, forward, palindrome, reverse and complement repeats, were identified using REPuter [50] with a minimal repeat size of 30 bp, a hamming distance of 3 and a repeat identity of more than 90%.Simple sequence repeats (SSRs) of these 22 chloroplast genomes were examined by MISA-web [51] with the following parameters: minimum repeat units of 10 for mononucleotides; 5 for dinucleotides, 4 for trinucleotides, and 3 for tetra-, penta-and hexanucleotides.

Codon Usage Analysis
The relative synonymous codon usage (RSCU) and amino acid frequencies of the 22 chloroplast genomes of Zingiberales were analyzed using MEGA v. 7.0 [52] with default parameters.A RSCU value of >1 indicates that the codon is used more frequently, a RSCU value = 1 indicates that the codon has no use preference, and a value of <1 indicates that the codon is used less frequently.Clustered heat map of RSCU values of newly sequenced 22 chloroplast genomes of Zingiberales was constructed with R v. 4.0.2(https://www.R-project.org)(accessed on 10 September 2022).

Chloroplast Genomes Comparison
Chloroplast genomes structures among the 22 newly sequenced genomes of Zingiberales were compared with the mVISTA program in the Shuffle-LAGAN mode [29], using the chloroplast genome of Hedychium bijiangense as the reference.The newly generated 22 chloroplast genomes of Zingiberales for LSC/IR and SSC/IR boundaries and their adjacent genes were analyzed using IRscope [53].Furthermore, chloroplast genomes comparisons across 22 chloroplast genomes of Zingiberales were performed using CGView Server [28].GC contents were detected based on GC skew using the equation: GC skew = (G − C)/(G + C).

Nucleotide Diversity and Gene Selective Pressure Analyses
Nucleotide diversity (Pi) was calculated by sliding window analysis using DnaSP v. 6.12.03 [54].In total, 111 complete chloroplast genomes of Zingiberales were analyzed, including 22 newly sequenced complete chloroplast genomes and 89 complete chloroplast genomes from the NCBI database (Table S7).Additionally, variable and parsimony informative base sites of the LSC, SSC, IRa, IRb, and complete chloroplast genomes were also calculated.
To detect positively selected amino acid sites in 111 complete chloroplast genomes of Zingiberales, the nonsynonymous (dN) and synonymous (dS) substitution rates of consensus protein-coding genes were calculated by using the CodeML program implemented in EasyCodeML [55].Gene selective pressure analysis was based on 66 consensus protein-coding genes sequences after removing all stop codons.The positive selection model of M8 (β & ω > 1) was used to detect positively selected sites based on both the dN and dS ratios (ω) and likelihood ratio tests (LRTs) values [56].The bayes empirical bayes (BEB) method was used to identify the most likely codons under positive selection, with a posterior probability higher than 0.95 and 0.99 indicating sites under positive selection and strong positive selection, respectively [57].

Phylogenetic Relationship Analysis
To reconstruct and confirm the phylogenetic relationships of eight families in Zingiberales, 115 chloroplast genomes of Zingiberales, including 22 complete chloroplast genomes generated in the present study, 89 complete chloroplast genomes and 4 incomplete chloroplast genomes downloaded from the NCBI GenBank database, were analyzed (Table S8).Commelina communis (MW617984), Pollia japonica (MW617990), Rhopalephora scaberrima (MW617991), and Siderasis fuscata (MW617992) were used as outgroups.Chloroplast genome sequences were aligned using MAFFT v. 7.458 [58] with default parameters, and manually checked when necessary.Phylogenetic tree was constructed using ML and BI methods, respectively.The best nucleotide substitution model (GTR + G + I) was determined using Akaike Information Criterion (AIC) in jModelTest v. 2.1.10[59].ML analysis was conducted in PhyML v. 3.0 [60] with 1000 bootstrap replicates.BI analysis was performed in MrBayes v. 3.2.6 [61].Two Markov Chain Monte Carlo algorithm (MCMC) runs were conducted simultaneously with 200,000 generations and four Markov chains, starting from random trees, sampling trees every 100 generations, and discarding the first 10% of samples as burn-in.The phylogenetic trees were edited and visualized using iTOL v. 3.4.3(http://itol.embl.de/itol.cgi)(accessed on 10 December 2022).

Divergence Time Estimation
The dataset of 115 chloroplast genomes of Zingiberales was analyzed using GTR + G + I model determined in jModelTest v. 2.1.10 [59] to search for the best tree topology.Divergence time estimation of Zingiberales was performed by using MCMC tree in PAML v. 4.4 [62].Three fossil records and one calibration point was obtained and used in this divergence time estimation.Zingiberopsis attenuate was used as a mean age of 65 Million years ago (Mya) for the crown age of family Zingiberaceae [63].Ensete oregonense was applied to calibrate the crown age of Ensete and Musella with a mean age of 43 Mya [64].Spirematospermum chandlerae was used to calibrate the crown age of Zingiberales with a mean age of 83.5 Mya [65].Each fossil calibration point was assumed to follow a normal distribution with a standard deviation of 2 and an offset of 2, resulting in 63.1-70.9Mya, 41.1-48.9Mya, and 81.6-89.4Mya, 95% intervals, respectively.Then, the stem root of Zingiberales was set with a mean age of 100 Mya and a standard deviation of 5 (90.2-110Mya, 95% intervals), based on previous studies [9,66].The ML tree constructed from chloroplast genome sequences was used as a starting tree for MCMC run.MCMC run was set 1,000,000 generations, sampling every 100 generations, and removing the first 10% generations as burn in.Divergence time estimation was calculated by parameters of clock = 2 and model = 0, with 95% highest posterior density (HPD) intervals.

Conclusions
In this study, we sequenced, assembled and compared structural characteristics of 22 complete chloroplast genomes from 21 Zingiberales species, studied the molecular evolution of chloroplast genomes in Zingiberales, reconstructed the phylogenetic relationships of Zingiberales with a high-resolution backbone, and inferred the phylogenetic divergence time of Zingiberales.The newly sequenced 22 chloroplast genomes of Zingiberales had a typical quadripartite structure and contained 111-112 different genes, including 79 proteincoding genes, 28-29 tRNA genes and 4 rRNA genes, with chloroplast genome length of 161,303-163,979 bp.Comparative analyses of 111 complete chloroplast genomes of Zingiberales identified 3 highly divergent regions, which can be used as candidate markers for phylogenetic analyses and species identification.Both ML and BI phylogenetic trees based on chloroplast genome sequences identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Heliconiaceae, (Lowiaceae, Strelitziaceae))) (MLBS = 94-100%, and BIPP = 0.93-1.0).Reconstruction divergence time of Zingiberales revealed that the stem and crown nodes of Zingiberales were approximately 100.1 Mya (95% HPD: 92.2-109.6Mya), and 85.0 Mya (95% HPD: 81.6-89.3Mya), respectively.The major family-level lineages of Zingiberales becoming ranged from 46.8 to 80.5 Mya.In addition, 32 genes were under positive selection at levels of amino acids with high posterior probabilities among 111 complete chloroplast genomes of Zingiberales.All the obtained genomic resources in this study will contribute to future studies of species identification, phylogeny, molecular evolution, and conservation of Zingiberales species.
photosystem assembly/stability factors RubisCO large subunit NADH dehydrogenase ATP synthase cytochrome b/f complex photosystem II photosystem I

Figure 2 .
Figure 2. Analysis of long repeats in the 22 newly sequenced complete chloroplast genomes of Zingiberales.(A) Total number of four long repeat types.(B) Length distribution of long repeats in each sequenced chloroplast genome.

Figure 2 .
Figure 2. Analysis of long repeats in the 22 newly sequenced complete chloroplast genomes of Zingiberales.(A) Total number of four long repeat types.(B) Length distribution of long repeats in each sequenced chloroplast genome.

Figure 3 .
Figure 3. Types and distribution of SSRs in 22 newly sequenced complete chloroplast genomes of Zingiberales.(A) Number of different SSR types.(B) Number of identified SSR motifs in different repeat class types.SSR, simple sequence repeat.

Figure 3 .
Figure 3. Types and distribution of SSRs in 22 newly sequenced complete chloroplast genomes of Zingiberales.(A) Number of different SSR types.(B) Number of identified SSR motifs in different repeat class types.SSR, simple sequence repeat.

Figure 4 .
Figure 4. Heat map for the relative synonymous codon usage values of the 22 newly sequenced complete chloroplast genomes of Zingiberales.

Figure 4 .
Figure 4. Heat map for the relative synonymous codon usage values of the 22 newly sequenced complete chloroplast genomes of Zingiberales.

Figure 5 .
Figure 5. Visualization of the alignment of the 22 newly sequenced complete chloroplast genomes of Zingiberales.The chloroplast genome of H. bijiangense is used as the reference.The y-axis represents the percent identity ranging from 50% to 100%.The x-axis depicts sequence coordinates within the chloroplast genome.Purple bars represent exons, sky-blue bars represent untranslated regions (UTRs), red bars represent non-coding sequences (CNS), grey bars represent mRNA and white regions represent sequence differences among 22 analyzed chloroplast genomes.

Figure 6 .
Figure 6.Comparison of the IR/SC borders among 22 newly sequenced complete chloroplast genomes of Zingiberales.LSC, large single-copy region; SSC, small single-copy region; IR, inverted repeat region.

Figure 6 .
Figure 6.Comparison of the IR/SC borders among 22 newly sequenced complete chloroplast genomes of Zingiberales.LSC, large single-copy region; SSC, small single-copy region; IR, inverted repeat region.

Figure 7 .
Figure 7.Comparison of nucleotide diversity values across the 111 complete chloroplast genomes of Zingiberales.(A) Protein-coding regions.(B) Non-coding regions.

Figure 7 .
Figure 7.Comparison of nucleotide diversity values across the 111 complete chloroplast genomes of Zingiberales.(A) Protein-coding regions.(B) Non-coding regions.

Figure 9 .
Figure 9. Divergence time estimation of Zingiberales based on the 115 chloroplast genome sequences.The fossil and calibration taxa are indicated with red points on the corresponding nodes.The mean divergence time of the nodes is shown at the nodes with blue.The blue bars correspond to 95% HPD of estimated divergence time, with minimum and maximum values, respectively.

Figure 9 .
Figure 9. Divergence time estimation of Zingiberales based on the 115 chloroplast genome sequences.The fossil and calibration taxa are indicated with red points on the corresponding nodes.The mean divergence time of the nodes is shown at the nodes with blue.The blue bars correspond to 95% HPD of estimated divergence time, with minimum and maximum values, respectively.

Table 1 .
Basic characteristics of the newly sequenced 22 complete chloroplast genomes of Zingiberales in this study.

Table 3 .
Variable site analyses of the 111 complete chloroplast genomes of Zingiberales.

Table 4 .
Positively selected sites detected in the 111 complete chloroplast genomes of Zingiberales.
Note: * and ** indicate a posterior probability higher than 0.95 and 0.99, respectively.