Complete Chloroplast Genome Sequence of Broomcorn Millet (Panicum miliaceum L.) and Comparative Analysis with Other Panicoideae Species

Broomcorn millet (Panicum miliaceum L.) is one of the earliest domesticated cereals worldwide, holding significant agricultural, historical, and evolutionary importance. However, genomic knowledge of this species is currently limited, hampering further genetic and evolutionary studies. Here, we sequenced and assembled the chloroplast (cp) genome of broomcorn millet and compared it with those of five other Panicoideae species. The cp genome of broomcorn millet was 139,826 bp in size, with a typical quadripartite structure. In total, 108 genes were annotated, and 18 genes were duplicated in the inverted repeat (IR) regions, which was similar to other Panicoideae species. Comparative analysis showed a rather conserved genome structure among them, with three common regions. Furthermore, RNA editing, codon usage, and expansion of the IR, as well as simple sequence repeat (SSR) elements, were systematically investigated, and 13 potential DNA markers were developed for Panicoideae species identification. Finally, phylogenetic analysis implied that broomcorn millet was a sister species to Panicum virgatum within the tribe Paniceae, and supported the monophyly of the Panicoideae. This study reports for the first time the genome organization, gene content, and structural features of the chloroplast genome of broomcorn millet, which provides valuable information for genetic and evolutionary studies in the genus Panicum and beyond.


Introduction
Chloroplasts are organelles that are responsible for photosynthesis, which is vital for providing energy for plants and algae. It is believed that chloroplasts originated from cyanobacteria through endosymbiosis [1][2][3]. They have their own genetic replication mechanisms, have a mostly maternal inheritance, and transcribe their own genome relatively independently [4]. In higher plants, the chloroplast (cp) genome is generally a closed circular DNA molecule present in multiple copies in a plant cell. The chloroplast genome of higher plants has a highly conserved quadripartite structure with approximately 110-130 genes, containing two inverted repeat sequences (IR), as well as one large single-copy region (LSC) and one small single-copy region (SSC) [5]. The size of the chloroplast genome ranges from 115 to 165 kb, depending on species, and the genome size variation is mainly determined by the length variation of the IR region [6,7]. Due to its abundant genetic information, moderate nucleotide substitution rate, maternal inheritance, and highly conserved structure [5,8], the chloroplast genome is widely used for species identification, plant barcoding, phylogenetic classification, and molecular evolutionary studies [9][10][11]. Rapid progress in next-generation sequencing technologies has resulted in a boom in completed chloroplast genomes [12,13], making phylogenomic analysis based on complete cp genomes accessible. The phylogenomic analysis of cp genomes is of great value in providing resolution and support for evolutionary relationships of angiosperms. Furthermore, it sheds light on cp genome evolution, such as sequence inversion [9,14], gene loss and gain [15,16], and structural variation [17,18].
Panicoideae is the second largest subfamily of the grass family (Poaceae), comprising approximately 3316 species in 12 tribes [19], including lawn grasses (Eremochloa ophiuroides and Stenotaphrum secundatum) [20], biofuel stocks (Panicum virgatum) [21], and some important crops like maize (Zea mays L.), sugarcane (Saccharum officinarum L.), sorghum (Sorghum bicolor L.), and broomcorn millet (Panicum miliaceum L.), as well as foxtail millet (Setaria italica L.) [22,23]. In light of its economic and evolutionary significance, extensive studies have been performed to investigate the phylogenetic relationships and genetic divergence in the Panicoideae subfamily [24,25]. Early studies showed that there were at least seven tribes in the Panicoideae subfamily [26]. Numerous studies have attempted to resolve the phylogeny based on morphological characters and molecular markers, although the relationships could not be fully defined and support values for phylogenetic nodes remained low [27,28]. Recently, a phylogenomic analysis using 50 full plastomes provided better resolution of the phylogenetic relationships in the Panicoideae subfamily [29].
Broomcorn millet is one of the oldest domesticated cereals in China, and its agricultural value in the northern region can be traced back to the Pleistocene and Holocene boundaries [30]. Before rice and wheat were domesticated, broomcorn millet was the main food staple of humans living in the semi-arid region of Asia. Broomcorn millet has many excellent agronomic traits, such as a short growth cycle, high yield, lower nutrient and water requirements, and good resistance to diverse stresses [31]. In recent years, progress has been made in the study of the nuclear genome of broomcorn millet, such as the exploration and identification of the stress-related WRKY gene family in the transcriptome of broomcorn millet and SSR analysis based on high-throughput sequencing [32,33]. However, the chloroplast genome of broomcorn millet has not been sequenced, and the phylogenetic relationship between Panicum miliaceum L. and other Panicoideae species is still not well understood. In this study, we sequenced and assembled the complete chloroplast genome of Panicum miliaceum L. based on Illumina sequencing technology. Then, a comparative analysis of broomcorn millet with five other closely related species, including sugarcane (KU214867.1), foxtail millet (KF646538.1), sorghum (EF115542.1), switchgrass (NC_015990.1), and maize (KF241981.1), was performed. This study not only provides useful information on genome organization, gene content, and structural variation in the broomcorn millet cp genome, but also provides important clues to its phylogenetic relationships, which will contribute to genetic and evolutionary studies in broomcorn millet and beyond.

Chloroplast Genome Assembly and Sequence Analysis of Broomcorn Millet
Based on the assembly method described by Nie et al. [9], we obtained the complete chloroplast genome sequence of broomcorn millet. After annotation and modification, the genome sequence was submitted to the GenBank database with accession no. KU343177.1. The size of this genome was 139,826 bp, with the typical quadripartite structure containing a pair of inverted repeats (IRA and IRB, 22,785 bp each), one SSC region (12,574 bp), and one LSC region (81,682 bp), which is similar to other Panicoideae cp genomes [22,29] (Table 1). The GC content of the IR region (43.95%) was much higher than that of the LSC (36.47%) and SSC regions (33.09%) in the broomcorn millet cp genome. The higher GC content in the IR region was probably the result of the presence of four ribosomal RNA (rRNA) genes in this region, which is consistent with previous analyses in other plants [18,34] (Table 1 and Figure 1). Additionally, the GC content in the overall cp genome and IR region was nearly the same as in other Panicoideae cp genomes, suggesting that the cp genome in this subfamily has a rather conserved genome organization [15,29].
A total of 108 genes were annotated in the broomcorn millet cp genome, including 76 protein-coding genes, 28 transfer RNA (tRNA) genes, and four rRNA genes. Among them, six protein-coding genes (namely rps19, rps7, rpl23, rpl2, rps15, and ycf15), four rRNA genes (rRNA23, rRNA16, rRNA5, and rRNA4.5), and eight tRNA genes (trnfM-CAU, trnH-GUG, trnI-CAU, trnI-GAU, trnL-CAA, trnM-CAU, trnN-GUU, trnR-ACG and trnV-GAC) were duplicated because they are located in the IR regions. At the same time, the SSC region consisted of 10 protein-coding genes and one tRNA, while the LSC region consisted of 61 protein-coding genes and 22 tRNAs (Table 2). Additionally, 28 unique tRNA genes, which encoded 20 amino acids, were distributed unevenly within the genome (one in the SSC region, 19 in the LSC region, and eight in the IR region). Furthermore, out of 108 genes, 13 genes each have one intron (namely rps16, atpF, petB, petD, rpl2, ndhB, ndhA, trnK-UUU, trnA-UGC, trnL-UAA, trnV-UAC, rpl16, and trnI-GAU) and one gene (ycf3) has two introns. Out of the 14 genes with introns, nine were located in the LSC (five protein-coding genes, three tRNAs with one intron, and one ycf gene with two introns), one in the SSC (a protein-coding gene with one intron), and four in the IR region (two protein-coding genes and two tRNAs with one intron). As in most other land plants, rps12 is a trans-spliced gene with the 5' end exon located in the LSC region and the duplicated 3' end exon located in the IR regions. The trnK-UUU gene has the largest intron (2472 bp), within which another important gene, matK, is located (Table 3).
Sequence analysis revealed that 43.5%, 2.9%, and 6.6% of the broomcorn millet cp genome encoded proteins, tRNAs, and rRNAs, respectively, and the remaining 47.0% belonged to non-coding regions. Among the coding regions, there were 76 protein-coding genes categorized into different functional groups, including nine genes for large subunit ribosomal proteins, 12 genes for small subunit ribosomal proteins, seven genes for photosystem I, 15 genes for photosystem II, and six genes for ATP synthase. A similar gene composition was also found in other monocots [34] (Table 2). Furthermore, the 82 protein-coding genes comprise 60,903 bp, coding for 20,301 codons. The codon usage frequency of the broomcorn millet cp genome was then calculated. There were 2199 codons (10.8%) encoding leucine, the most used amino acid, while there were 228 codons (1.12%) for cysteine, the least used (Table 4). Among them, ATT (823) for isoleucine was the most used codon. The AT content was 52.17%, 60.43%, and 69.33% at the 1st, 2nd, and 3rd codon positions, respectively. The high AT content at the 3rd position indicated a codon bias towards A or T at the third codon position, which was consistent with that of switchgrass, barley, sorghum, and foxtail millet [35,36].
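The figures reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, using only numbers quoted in the text; the variable names are illustrative and not part of the original analysis.

```python
# Region lengths (bp) as reported in the text for the broomcorn millet cp genome.
LSC = 81_682
SSC = 12_574
IR = 22_785  # length of each inverted repeat copy (IRA and IRB)

# A quadripartite genome is LSC + SSC + two IR copies.
total = LSC + SSC + 2 * IR
print("genome size (bp):", total)

# Coding length reported for the protein-coding genes, and the implied codon count.
cds_bp = 60_903
codons = cds_bp // 3
print("codons:", codons)

# Share of leucine and cysteine codons, from the counts given in the text.
leu_share = 100 * 2199 / codons
cys_share = 100 * 228 / codons
print(f"Leu: {leu_share:.1f}%  Cys: {cys_share:.2f}%")
```

The region lengths sum to the reported genome size, and the codon count and amino acid percentages follow directly from the reported coding length.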

Comparison of cp Genomes between Broomcorn Millet and Other Five Panicoideae Species
Five complete cp genomes within the Panicoideae subfamily were chosen to compare with broomcorn millet. The results showed that some small variation was present in the size of these cp genomes. The genome size of broomcorn millet, sugarcane, foxtail millet, sorghum, switchgrass and maize was 139,826 bp, 141,182 bp, 135,516 bp, 141,279 bp, 139,619 bp, and 140,447 bp, respectively. The size variation among them was mainly due to the difference in the length of the LSC and IR regions (Table 1). Furthermore, the general gene structure, gene organization, and gene content were highly conserved among these six cp genomes, and a total of 75 protein-coding genes were shared by them. However, we found that three genes, namely accD, ycf1, and ycf2, were all lost in these six cp genomes, which is similar to previous results regarding other Panicoid grasses [37,38]. Similarly, two genes, ycf15 and orf42, were only found in broomcorn millet, and one gene, ycf68, was observed in broomcorn millet, switchgrass, and maize. In addition, the loss of an intron in clpP and rpoC1 was also identified in the broomcorn millet cp genome, which was also found in switchgrass, barley, and sorghum cp genomes [22]. Further study is needed to reveal the molecular mechanism and impact of intron loss events in these genes.
Furthermore, the detailed IR-SSC and IR-LSC borders, together with the adjacent genes, were systematically compared across the six cp genomes. The IRb/SSC border was located in the coding region of the ndhF gene in all six Panicoideae cp genomes, resulting in an ndhF pseudogene in the IRa region of the same size as the IRb expansion into ndhF. However, different lengths of IRb expanding into ndhF were found; the size in broomcorn millet was 112 bp, while it was 29 bp for the other five Panicoideae species (Figure 2). The expansion into ndhF was found to be unique to Panicoideae, and could be considered a distinctive feature for barcoding the members of this subfamily [29]. At the same time, we found that there was no expansion of the IRa/SSC boundary into the ndhH gene in any of the six Panicoideae species, a difference from Ehrhartoideae and Pooideae species, where a partial duplication of ndhH was found. Additionally, the size of the intergenic spacer region between rps19 and the end of the IRa region was consistently 35 bp in these six cp genomes, while the distances from the start of the LSC to psbA, from rpl22 to the end of the LSC, and from the start of IRb to rps19 showed some minor variations. Overall, although the six genomes showed diverse contraction or expansion of the IR, this only partially contributed to the variation in the total size of the Panicoideae plastid genome. The variations in intergenic spacer regions might also exert an impact on the Panicoideae plastid genome size [29].
Pairwise alignment was also conducted between broomcorn millet and the other five Panicoideae plastids, and the mVISTA tool was used to plot the overall sequence identity of these six Panicoideae cp genomes, using broomcorn millet as a reference (Figure 3). The SSC and LSC regions showed more divergence than the IRa and IRb regions. Also, non-coding regions were more divergent than coding regions. Among these six cp genomes, some highly divergent regions were found, including rpoC2, ndhB, trnL, rpl22, rpl23, psbK and matK, some of which have been commonly used as markers for plant identification and phylogenetic analysis [25,39]. In addition, the newly identified regions could be considered as novel candidates for species identification and phylogenetic studies in Panicoideae and beyond.
Gene order in broomcorn millet and the other five Panicoideae species showed similar patterns, while it was different from that of Nicotiana tabacum as a result of three inversions of 28 kb, 6 kb, and less than 1 kb that were found in these Panicoideae species and have been identified by previous studies [40,41] (Figure 4). Compared to N. tabacum, the endpoints of the first inversion occurred between trnG-UCC and trnR-UCU at one end and rps14 and trnfM-CAU at the other end, and the second inversion (about 6 kb) had endpoints between trnS and psbD on one end and trnG-UCC and trnT-GGU on the other end [42]. The third inversion has endpoints between trnG-UCU and trnT-GGU to trnT-GGU and trnE-UUC. This inversion is very small and explains the inverted orientation of trnT-GGU.

SSR Loci Identified in Panicoideae cp Genomes
Simple sequence repeats (SSRs), also known as microsatellites, are tandem repeats of 1-6 bp nucleotide motifs that are widely distributed in eukaryotic genomes. Due to their abundant polymorphism, high stability, codominant inheritance and ease of use, SSRs have gradually become a popular and robust tool for population genetic analysis in non-model organisms [43]. It is believed that the threshold for SSR screening should be over 8 bp, as SSRs of 8 bp or longer are prone to slip-strand mispairing (SSM), which is considered the main mutational mechanism affecting SSRs [7,16]. In the present study, we set parameters over 8 bp to identify the potential SSRs in the chloroplast genome of broomcorn millet, together with the five other related species. In total, 97 perfect SSRs were detected in the broomcorn millet cp genome, unevenly distributed across both genic and intergenic regions (Figure 5). Similarly, 94, 78, 87, 89, and 98 SSRs were identified in the maize, switchgrass, sorghum, foxtail millet, and sugarcane cp genomes, respectively. The majority of the SSRs in these six cp genomes were trinucleotides, ranging from 42 in foxtail millet to 50 in maize, and mononucleotides were the second most abundant, varying from 21 in foxtail millet to 36 in broomcorn millet. Additionally, SSRs with pentanucleotide and hexanucleotide motifs in these genomes were scarce: only one hexanucleotide was found in sorghum, and one pentanucleotide was found in each of foxtail millet, sugarcane, and switchgrass (Figure 5 and Table S1). Among all the SSR motifs identified in the broomcorn millet cp genome, 85.9% of mononucleotides were composed of A or T, and a similarly high proportion of trinucleotides (73.8%) was composed of A or T. Likewise, the motifs tended to have a higher percentage of A or T in the other five related species. Our results were consistent with previous studies reporting that potential SSR markers identified in the cp genome commonly consisted of polyA or polyT repeats, and seldom included C and G repeats, which partially explained the high AT content of the broomcorn millet cp genome [34,36]. Based on the SSR dynamics in these six cp genomes, a total of 13 potential SSR markers were developed, which could be used for molecular identification of Panicoideae and other species (Table S2).
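To illustrate how perfect SSRs of this kind can be located, the sketch below scans a sequence for tandem repeats using the minimum repeat counts stated in the methods (≥10 for mononucleotides, ≥5 for di-, tri- and tetranucleotides, ≥3 for penta- and hexanucleotides). It is a simplified stand-in written for illustration, not the MISA implementation used in the study; the function names and the toy sequence are assumptions.

```python
# Minimum repeat counts per motif length, taken from the thresholds in the methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 5, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs that belong to a shorter motif class.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hits.append((i, motif, reps))
                i += reps * k  # continue after the repeat tract
            else:
                i += 1
    return sorted(hits)


# Hypothetical toy sequence: a 12-bp poly-A tract and five ATT repeats.
toy = "GC" + "A" * 12 + "GCGC" + "ATT" * 5 + "CG"
ssr_hits = find_ssrs(toy)
print(ssr_hits)
```

Positions are 0-based. Counting the hits whose motifs contain only A and T would give the kind of A/T proportion discussed above.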

Prediction of RNA Editing Sites in Panicoideae cp Genomes
RNA editing is an important way to regulate gene expression at the post-transcriptional level, which generally changes the original genetic information by insertion, deletion, and base substitution of nucleotides within mRNA molecules [44]. Various studies in the chloroplasts of higher plants have suggested that identification of RNA editing sites could be very useful for enriching the genetic information, which makes it easier for organisms to adapt to the environment, and for uncovering the evolutionary patterns of RNA editing [18,45]. Here, a total of 175 editing sites were predicted in these six Panicoideae cp genomes, and further analysis found that all of these sites were C to U conversions, which was consistent with a previous study of RNA editing in the cp genome of N. tabacum [46]. Among them, 31 editing sites in 14 genes were identified for broomcorn millet, 30 editing sites in 15 genes for maize, 29 editing sites in 14 genes for switchgrass, 28 editing sites in 14 genes for sorghum, 28 editing sites in 14 genes for foxtail millet, and 29 editing sites in 15 genes for sugarcane. Furthermore, 11 sites in switchgrass, two sites in sorghum, one site in foxtail millet, and nine sites in maize were validated by EST alignment analysis. In addition, we compared the RNA editing site patterns in these plastids, and 15 sites in nine genes were found to be shared by these six Panicoideae cp genomes (Table 5). It has been reported that RNA editing is evolutionarily conserved and that species having a closer relationship generally share more editing sites [45]. In this study, broomcorn millet was found to share 29 editing sites with switchgrass, which was more overlap than for the other four species. Although they seemed to have rather similar features in RNA editing, there were still some specific ...


Phylogenetic Analysis
Poaceae is one of the most species-rich families of the angiosperms, with approximately 12,000 species and 771 genera [28]. Among them, there are 3560 species in the subfamily Panicoideae, which are further divided into 12 tribes. The chloroplast genome is a vital resource for addressing diverse phylogenetic questions across the Panicoideae subfamily and analyzing the evolutionary relationships within it. Various studies have examined the taxonomic status of its members in order to better understand the phylogenetic relationships in the Panicoideae subfamily; in one, the single gene matK was chosen to conduct a molecular phylogenetic study [39]. In another, the trnL-trnF, psbJ-petA, and trnQ-rps16 regions were combined to analyze the evolutionary relationships among highly variable and controversial taxonomic groups of Panicoideae [42]. In order to further study the phylogenetic relationship of broomcorn millet within the Panicoideae subfamily, 81 genes shared by 66 cp genomes, including five Panicoideae members, were used for multiple alignment based on the MSWAT database. After gap removal, there were 55,628 characters remaining in the final data set. MP analysis in MEGA 7.0 [47], with Nuphar advena and Nymphaea alba as outgroups, generated a single tree with a length of 146,741, a consistency index of 0.3838, and a retention index of 0.6016. Bootstrap analysis implied that 51 of 63 nodes had bootstrap values >95%, 42 of which had bootstrap values of 100%. Maximum likelihood (ML) analysis generated a tree with the same topology as the MP method. ML bootstrap values were high, and all 63 nodes had 100% bootstrap values (Figure 6). Overall, the phylogenetic tree was divided into two major clades: monocots and eudicots. All five Panicoideae species were clustered into the Panicoideae group and placed within the monocots. It was observed that P. virgatum was sister to P. miliaceum L. In a previous study, P. virgatum was clustered into subtribe Panicinae, which belongs to tribe Paniceae, and S. bicolor was sister to S. officinarum, though S. bicolor belonged to subtribe Sorghinae and S. officinarum belonged to subtribe Saccharinae. In addition, Z. mays was clustered into subtribe Tripsacinae. All three subtribes belonged to the tribe Andropogoneae, which was similar to previous reports [44]. All five species clustered into the PACMAD clade, and P. virgatum showed a closer relationship with P. miliaceum than with the other three species.

Genome Assembly and Genome Annotation
The raw Illumina sequence data of broomcorn millet downloaded from the NCBI database (accession no. SRR2163427) were used in this study. The chloroplast genome of broomcorn millet was assembled using the method reported by Nie et al. with some modifications [9]. Low-quality reads were filtered out with the trimmomatic-0.36 tool (http://www.usadellab.org/cms/?page=trimmomatic). The resulting clean data were used to capture chloroplast reads by mapping against the switchgrass and maize cp genomes as references with BWA software [48]. Then, the captured cp reads were directly assembled into contigs with a minimum length of 100 bp using the SOAPdenovo2 software [49]. Furthermore, these contigs were aligned to the switchgrass cp genome (considered the reference genome) with the BLAST program (http://blast.ncbi.nlm.nih.gov/), and the aligned contigs were ordered according to the reference genome. Finally, the clean data were first mapped against the obtained sequence to close gaps, and then the consensus sequence was used to close the gaps again to obtain the candidate genome sequence.
The online program DOGMA was used to predict and annotate the protein-coding genes, tRNA genes, and rRNA genes in the broomcorn millet cp genome with default parameters [50], coupled with manual corrections. The positions of codons and intron/exon junctions of the protein-coding genes were determined following Sugita and Sugiura [51], with those of the switchgrass cp genome as a reference. A circular map of the broomcorn millet cp genome was drawn using OGDRAW v1.2 software [52]. The GC content and codon usage were calculated manually.
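The GC content and codon usage calculations mentioned above are straightforward to script. The following is a minimal sketch, assuming plain nucleotide strings as input; the function names and the toy coding sequence are hypothetical and do not reproduce the authors' manual procedure.

```python
from collections import Counter


def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def codon_usage(cds):
    """Count codons in a coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def at_by_codon_position(cds):
    """AT percentage at the 1st, 2nd and 3rd codon positions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        at = bases.count("A") + bases.count("T")
        result.append(100.0 * at / len(bases) if bases else 0.0)
    return result


# Hypothetical toy coding sequence: ATG ATT ATT CTT TGT TAA
toy_cds = "ATGATTATTCTTTGTTAA"
print(round(gc_content(toy_cds), 2))
print(codon_usage(toy_cds).most_common(1))
print([round(x, 2) for x in at_by_codon_position(toy_cds)])
```

Applied to the concatenated protein-coding sequences of a genome, the same functions would yield per-codon counts and position-specific AT percentages of the kind reported in Table 4.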

Sequence and Structural Comparison in Six Panicoideae cp Genomes
The Shuffle-LAGAN mode of the mVISTA online program (http://genome.lbl.gov/vista/mvista/) [53] was used to compare the sequence similarity of the full chloroplast genome of broomcorn millet with five other Panicoideae chloroplast genomes, including sugarcane (KU214867.1), foxtail millet (KF646538.1), sorghum (EF115542.1), switchgrass (NC_015990.1), and maize (KF241981.1), using the annotation information of broomcorn millet. A structural comparison of these genomes was performed using the MAUVE tool [54]. Finally, the IR expansion or contraction, together with the adjacent genes, was compared manually across the six Panicoideae cp genomes based on the annotation files.

SSRs Identification and RNA Editing
MISA software (http://pgrc.ipk-gatersleben.de/misa) was used to detect simple sequence repeats (SSRs) within the six cp genomes, with the parameters set to ≥10 repeats for mononucleotides, ≥5 repeats for dinucleotides, trinucleotides and tetranucleotides, and ≥3 repeats for pentanucleotide and hexanucleotide SSRs. All the protein-coding genes of the six Panicoideae cp genomes were obtained from the NCBI database (http://www.ncbi.nlm.nih.gov) based on their assembled and annotated genome information. The RNA editing sites in these genes were predicted with the Predictive RNA Editor for Plants (PREP)-Cp tool (http://prep.unl.edu/cgi-bin/cp-input.pl) with default parameters. Significant hits were examined manually, and only C to T transitions were regarded as RNA editing sites. In order to validate the prediction, the expressed sequence tag (EST) sequences of the six species obtained from the Compositae Genome Project Database (http://compgenomics.ucdavis.edu/) were searched against the predicted genes using BLAST tools.
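The effect of a single C to T (C to U in the transcript) edit on the encoded amino acid can be illustrated as follows. This is a minimal sketch using the standard codon table written in the DNA alphabet; it does not reproduce the PREP-Cp prediction method, and the function name and example sequence are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_c_to_t_edit(cds, index):
    """Apply a C->T change at a 0-based nucleotide index of an in-frame CDS.

    Returns (codon_before, aa_before, codon_after, aa_after).
    """
    cds = cds.upper()
    if cds[index] != "C":
        raise ValueError("only C positions can undergo a C-to-T (C-to-U) edit")
    start = index - index % 3  # first base of the affected codon
    before = cds[start:start + 3]
    offset = index - start
    after = before[:offset] + "T" + before[offset + 1:]
    return before, CODON_TABLE[before], after, CODON_TABLE[after]


# Hypothetical example: editing the second position of a TCA (Ser) codon.
example_edit = apply_c_to_t_edit("ATGTCAGGT", 4)
print(example_edit)
```

An edit at the second codon position, as in this example, changes the amino acid, whereas many third-position changes would be synonymous.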

Phylogenetic Analysis
A total of 81 cp genes from 66 taxa, comprising five Panicoideae species, 58 other angiosperm lineages, and three gymnosperms, were used to analyze the phylogenetic relationships among these species. All sequences were aligned using the MSWAT online tool (http://mswat.ccbb.utexas.edu/) and edited manually. Nuphar advena and Nymphaea alba were designated as outgroups. MEGA version 7 [47] was used to perform maximum parsimony (MP) analysis with 1000 replicates and tree bisection-reconnection (TBR) branch swapping with the MulTrees option. The GTR + I + G nucleotide substitution model was selected for the maximum likelihood (ML) analysis using PhyML v3.0 [55].

Conclusions
This study reported the complete cp genome of broomcorn millet and compared six Panicoideae cp genomes to reveal their genome features and evolutionary significance. Furthermore, the SSR composition, RNA editing sites, and phylogenetic relationships of these cp genomes were systematically investigated and compared. The results provide important information on the genome organization, gene content, and structural variation of broomcorn millet and other Panicoideae cp genomes, as well as clues and tools for molecular phylogeny and population studies within the Panicoideae subfamily and beyond.
Author Contributions: X.N. performed the entire analysis and drafted the paper. X.Z. and S.W. contributed to genome assembly and comparative analysis. C.L. and H.L. performed the RNA editing analysis. W.T. contributed reagents/materials/analysis tools and revised the manuscript. Y.G. conceived the entire study. All authors read and approved the manuscript.

Figure 1. Chloroplast genome map of broomcorn millet. Genes located outside the outer circle are transcribed clockwise, and genes located inside are transcribed counterclockwise. Genes with different functions are divided into groups shown in different colors.

Figure 2. Comparison of the border positions of SSC, LSC, and IR regions among six Panicoideae chloroplast genomes.

Figure 3. Percent identity analysis for the comparison of six Panicoideae chloroplast genomes using the mVISTA program. The top line displays genes in order with arrows. Sequence similarity of the aligned regions between broomcorn millet and the other five species is shown as horizontal bars indicating the average percent identity, between 50% and 100% (y-axis). The x-axis represents the coordinate in the cp genome.

Figure 4. Comparison of the genome structure of six Panicoideae chloroplast genomes, with N. tabacum as the reference. The boxes above the line represent DNA sequences in the clockwise direction, and those below the line represent DNA sequences in the counterclockwise direction.

Figure 5. Analysis of simple sequence repeat (SSR) distribution in broomcorn millet and five other Panicoideae species.

Figure 6. The MP phylogenetic tree based on 81 genes from 66 plant taxa. The MP tree has a length of 146,741, a consistency index of 0.3838, and a retention index of 0.6016. The numbers above the nodes are bootstrap support values. The ML tree, which has the same topology, is not shown.

Table 1. Comparisons of cpDNA features among the six Panicoideae species.

Table 2. Genes found in the assembled broomcorn millet chloroplast genome.
NADH oxidoreductase: ndhA (b), ndhB (b, c), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Cytochrome b6/f: petA, petB (b), petD (b), petG, petL, petN
ATP synthase: atpA, atpB, atpE, atpF (b), atpH, atpI
(a) Gene containing two introns. (b) Gene containing a single intron. (c) Two gene copies in the IRs. (d) Gene divided into two independent transcription units.

Table 3. The genes having introns in the broomcorn millet cp genome, and the lengths of their exons and introns. Exon and intron lengths are given in base pairs (bp). * rps12 is a trans-spliced gene with the 5' end exon located in the LSC region and the duplicated 3' end exon located in the IR regions.

Table 4. The codon-anticodon recognition pattern and codon usage for the broomcorn millet cp genome.
* Numerals indicate the frequency of usage of each codon among 20,301 codons in 78 potential protein-coding genes.