The Complete Chloroplast Genomes of Punica granatum and a Comparison with Other Species in Lythraceae

Pomegranates (Punica granatum L.) are one of the most popular fruit trees cultivated in arid and semi-arid tropics and subtropics. In this study, we determined and characterized three complete chloroplast (cp) genomes of P. granatum cultivars with different phenotypes using the genome skimming approach. The complete cp genomes of three pomegranate cultivars displayed the typical quadripartite structure of angiosperms, and their length ranged from 156,638 to 156,639 bp. They encoded 113 unique genes and 17 are duplicated in the inverted regions. We analyzed the sequence diversity of pomegranate cp genomes coupled with two previous reports. The results showed that the sequence diversity is extremely low and no informative sites were detected, which suggests that cp genome sequences may be not be suitable for investigating the genetic diversity of pomegranate genotypes. Further, we analyzed the codon usage pattern and identified the potential RNA editing sites. A comparative cp genome analysis with other species within Lythraceae revealed that the gene content and organization are highly conserved. Based on a site-specific model, 11 genes with positively selected sites were detected, and most of them were photosynthesis-related genes and genetic system-related genes. Together with previously released cp genomes of the order Myrtales, we determined the taxonomic position of P. granatum based on the complete chloroplast genomes. Phylogenetic analysis suggested that P. granatum form a single clade with other species from Lythraceae with a high support value. The complete cp genomes provides valuable information for understanding the phylogenetic position of P. gramatum in the order Myrtales.


Introduction
Pomegranates (Punica granatum L.) are an economically important fruit tree of the tropical and subtropical regions of the world. It is native to central Asia and has been highly praised in many human cultures since ancient times [1]. Pomegranates have showy edible fruit with a high content of anthocyanins and flavonoids [2,3]. It has been well demonstrated that pomegranates are valuable to human health due to high levels of flavonoids and anthocyanins, which are considered potent antioxidants offering protection against heart disease and cancer [4,5]. Also, the pomegranate tree is suitable for genetic analysis due to its short juvenile period, and the high number of progenies [6]. As important resources for basic research and crop improvement, the genome of P. granatum 'Taishanhong' has been determined [7]. This genome will shed new light on the understanding

General Features of Pomegranate Chloroplast Genomes
The complete cp genomes of 'Nana', 'Tunisia', and 'Taishanhong' were de novo assembled using whole genome sequencing data with GetOrganelle [27]. The cp genomes of 'Nana', 'Tunisia', and 'Taishanhong' were found to be 158,638, 158,639, and 158,638 bp in size, respectively. All of them exhibited a typical quadripartite structure, consisting of a pair of IRs separated by a large single copy region (LSC) and a small single copy region (SSC) (Figure 1). There are identical sets of 113 genes with the same gene order, including 79 protein-coding, 30 tRNA, and 4 rRNA genes. Six protein-coding genes (rps7, rps12, rpl2, rpl23, ndhB, ycf2), seven tRNA genes (trnI-CAU, trnN-GUU, trnR-ACG, trnA-UGC, trnI-GAU, trnV-GAC, trnL-CAA), and all rRNA genes (4.5S, 5S, 16S, 23S) are located at the IR regions. Eleven of the protein-coding genes and six of the tRNA genes contain introns, 14 of which contain a single intron, whereas three (rps12, ycf3, clpP) have two introns (Table 1). In particular, the rps12 is a trans-spliced gene, whose first exon is located in the LSC, while the second and third exons reside in IRs. The infA gene was identified as a pseudogene because of the accumulation of the premature stop codons [28]. Another pseudogene ycf1 existed because of the incomplete duplication of the normal copy of ycf1 in the IRa and SSC junction, which is identical with previous reports [29,30]. There are some exceptions where non-ATG codons were identified as start codons, such as ACG for psbL, GTG for rps19, and ACG for ndhD. Alternate start codons have been found in other plant species [31]. Alternate start codons are still translated as Met when they are the start of a protein because a separate transfer RNA is used for initiation [32]. The overall GC content was 36.92%; this was consistent with previously reported GC content of IRs (42.78%) being higher than that of the LSC (34.89%) and SSC (30.64%) [33]. The high GC percentage of IRs could be due to the presence of rRNA sequences in these regions [34].
normal copy of ycf1 in the IRa and SSC junction, which is identical with previous reports [29,30]. There are some exceptions where non-ATG codons were identified as start codons, such as ACG for psbL, GTG for rps19, and ACG for ndhD. Alternate start codons have been found in other plant species [31]. Alternate start codons are still translated as Met when they are the start of a protein because a separate transfer RNA is used for initiation [32]. The overall GC content was 36.92%; this was consistent with previously reported GC content of IRs (42.78%) being higher than that of the LSC (34.89%) and SSC (30.64%) [33]. The high GC percentage of IRs could be due to the presence of rRNA sequences in these regions [34]. Chloroplast DNA has already been used in accessing the genetic diversity and phylogenetic structure at an intraspecies level. For instance, hypervariable regions of cp DNA such as atpB-rbcL, trnL-trnF and rps16-trnQ were used to assess the genetic diversity of Tunisian apricot accessions [35]. Also, chloroplast microsatellite loci were used to investigate the genetic diversity of Iranian pomegranate genotypes [36]. In our present study, the sequence diversity of pomegranate cp genomes was investigated combined with two previously reported cp genomes (NC_035240, MG878386). The results showed that the sequence diversity of pomegranates is extremely low (0.0008). Only 42 singleton variable sites were detected, and there were no parsimony variable sites in the alignment of the cp genomes of the five Chloroplast DNA has already been used in accessing the genetic diversity and phylogenetic structure at an intraspecies level. For instance, hypervariable regions of cp DNA such as atpB-rbcL, trnL-trnF and rps16-trnQ were used to assess the genetic diversity of Tunisian apricot accessions [35]. Also, chloroplast microsatellite loci were used to investigate the genetic diversity of Iranian pomegranate genotypes [36]. In our present study, the sequence diversity of pomegranate cp genomes was investigated combined with two previously reported cp genomes (NC_035240, MG878386). The results showed that the sequence diversity of pomegranates is extremely low (0.0008). Only 42 singleton variable sites were detected, and there were no parsimony variable sites in the alignment of the cp genomes of the five pomegranate accessions. Therefore, we propose that cp genome sequences might not be appropriate for investigating the genetic diversity of pomegranate genotypes. Ribosomal proteins (SSU) rps2, rps3, rps4, rps7*, rps8, rps11, rps12* b , rps14, rps15, rps16 a , rps18, rps19 Ribosomal proteins (LSU) rpl2 * , rpl14, rpl16 a , rpl20, rpl22, rpl23*, rpl32, rpl33, rpl36 Hypothetical chloroplast reading frames ycf1, ycf2*, ycf3 b , ycf4 Translation initiation factor IF-1 infA

Codon Usage Bias
As an essential evolutionary feature, the codon usage pattern has been widely investigated in many plant species [37][38][39]. In our study, we explored the codon usage pattern in the cp genomes of pomegranates. Protein-coding genes with more than 300 nucleotides were selected for further analysis. Firstly, the base composition on three different codon positions was determined, and the data are displayed in Figure 2A. The results indicated that the average GC content of the first (GC1), second (GC), and third codon positions (GC3) were 47.04, 39.79, and 28.34%, respectively. The base compositions of the three different positions were distributed unevenly. The average GC3 content was significantly lower than those of GC1 and GC2. The results of neutrality plots ( Figure 2B) showed that no significant correlation (R2 = 0.0036) between GC12 and GC3 was observed, which suggests that selective pressure affects the codon usage bias in the pomegranate cp genomes [40,41]. The codon adaptation index (CAI) value ( Figure 2C) ranged from 0.5 to 1 with a default E. coli reference gene set as the reference. According to their functions in the chloroplast, the protein-coding genes can be classified into three categories: photosynthesis related genes (photo-genes), genetic system related genes (genet-genes), and other genes. A recent study about codon usage bias of cp genomes in cultivated and wild Solanum species concluded that photo-genes always had higher CAI values than genet-genes because the expression level of photo-genes is relatively higher than that of genet-genes [42]. The same result was also observed in the cp genome of pomegranates. The main reason is probably due to the fact that photo-genes may have a higher codon usage bias for the requirement of high gene expression than do the genet-genes in the plant cp genomes. Furthermore, the relationship between bas compositions and codon usage was investigated by ENC-plot ( Figure 2D). Effective number of codons (ENC) values ranged from 35.73 to 61, suggesting that codon usage bias is relatively weak in the pomegranate cp genomes. The distribution of most genes was far away from the standard curve, which shows that there are other factors that affect the codon usage, other than base compositions [43][44][45]. As the cp genomes were highly AT-rich, it was not surprising that AT-ending codons would be predominant in the protein-coding genes. The results are also consistent with the mutational bias towards AT being the force driving the strong bias of codon usage of plan cp genomes [43]. Also, the Arg amino acid coded with the AGA codon was the most frequent codon with a relative synonymous codon usage (RSCU) value 2.02 ( Table 2).
According to their functions in the chloroplast, the protein-coding genes can be classified into three categories: photosynthesis related genes (photo-genes), genetic system related genes (genet-genes), and other genes. A recent study about codon usage bias of cp genomes in cultivated and wild Solanum species concluded that photo-genes always had higher CAI values than genet-genes because the expression level of photo-genes is relatively higher than that of genet-genes [42]. The same result was also observed in the cp genome of pomegranates. The main reason is probably due to the fact that photo-genes may have a higher codon usage bias for the requirement of high gene expression than do the genet-genes in the plant cp genomes. Furthermore, the relationship between bas compositions and codon usage was investigated by ENC-plot ( Figure 2D). Effective number of codons (ENC) values ranged from 35.73 to 61, suggesting that codon usage bias is relatively weak in the pomegranate cp genomes. The distribution of most genes was far away from the standard curve, which shows that there are other factors that affect the codon usage, other than base compositions [43][44][45]. As the cp genomes were highly AT-rich, it was not surprising that AT-ending codons would be predominant in the protein-coding genes. The results are also consistent with the mutational bias towards AT being the force driving the strong bias of codon usage of plan cp genomes [43]. Also, the Arg amino acid coded with the AGA codon was the most frequent codon with a relative synonymous codon usage (RSCU) value 2.02 ( Table 2).

RNA Editing Sites
RNA editing is a posttranscriptional process, which has been experimentally identified in organellar transcriptomes from several species [46,47]. It mainly involves the conversion of cytidine to uridine, which generally results in amino acid changes. Therefore, knowing where sites of RNA editing exist in the organelle transcriptome could provide information for understanding the structure and function of the translated proteins [48]. The potential RNA editing sites in the pomegranate cp genome were predicted using the online program PREP. A total of 64 editing sites in 20 protein-coding genes were identified ( Table 3). The ndhB gene had the highest number of potential editing sites (11), followed by the ndhD gene (9). In accordance with previous reports [49,50], we observed that most conversions at the codon positions changed from serine (S) to leucine (L) and most RNA editing sites led to amino acid changes from polar to apolar, which resulted in an increase in protein hydrophobicity. Table 3. Predicted RNA editing sites in the cp genome of P. granatum.

Sequence Diversity of the Chloroplast Genomes among Lythraceae Species
Four complete cp genomes within Lythraceae, available in GenBank with our newly assembled 'Taishanhong', were selected to analyze the sequence diversity. The mean P-distance value was designated to represent the level of divergence. The genetic distance of all 76 protein-coding genes ( Figure 3A) ranged from 0.003053 (psbN) to 0.108932 (rpl22), with an average of 0.024379. The intergenic regions had a relatively higher genetic distance ( Figure 3B) compared to the protein-coding regions, ranging from 0.005621 (trnL-ndhB) to 0.23463 (trnL-ycf2) with an average value of 0.069775. The results agree with previous reports that intergenic regions showed greater divergence than coding regions ( Figure 3D) [51]. The SSC region exhibited higher divergence levels than the LSC and IRs ( Figure 3C). Three intergenic regions with genetic distance values over the 95th percentile were considered as highly divergent regions, including trnH-psbA, trnL-ccsA, and trnL-ycf2. These highly variable regions may be regarded as potential molecular markers for application in phylogenetic analyses in Lythraceae.
assembled 'Taishanhong', were selected to analyze the sequence diversity. The mean Pdistance value was designated to represent the level of divergence. The genetic distance of all 76 protein-coding genes ( Figure 3A) ranged from 0.003053 (psbN) to 0.108932 (rpl22), with an average of 0.024379. The intergenic regions had a relatively higher genetic distance ( Figure 3B) compared to the protein-coding regions, ranging from 0.005621 (trnL-ndhB) to 0.23463 (trnL-ycf2) with an average value of 0.069775. The results agree with previous reports that intergenic regions showed greater divergence than coding regions ( Figure 3D) [51]. The SSC region exhibited higher divergence levels than the LSC and IRs ( Figure 3C). Three intergenic regions with genetic distance values over the 95 th percentile were considered as highly divergent regions, including trnH-psbA, trnL-ccsA, and trnL-ycf2. These highly variable regions may be regarded as potential molecular markers for application in phylogenetic analyses in Lythraceae.

Structure Comparison among the Chloroplast Genomes of Lythraceae Species
Five complete cp genomes within Lythraceae were selected for comparison with each other. The genome sizes was ranged from 152,205 to 159,219 bp. The length of the LSC, SSC, and IRs varied in the range of 84,046-89,021 bp, 16,914-18,821 bp, and 23,902-25,914 bp, respectively. Lagerstroemia indica has the smallest genome, and this difference is mostly attributed to variation in the length of the LSC and IR regions (Table 4). A detailed comparison on four borders between the two IRs and the two single-copy regions showed that the border structures were highly similar with one another (Figure 4). However, a slight difference in junction positions was observed among these five cp genomes. For instance, the ndhF gene was located at the SSC region in Sonneratia alba, Trapa maximowiczii, Punica granatum, and Heimia

Structure Comparison among the Chloroplast Genomes of Lythraceae Species
Five complete cp genomes within Lythraceae were selected for comparison with each other. The genome sizes was ranged from 152,205 to 159,219 bp. The length of the LSC, SSC, and IRs varied in the range of 84,046-89,021 bp, 16,914-18,821 bp, and 23,902-25,914 bp, respectively. Lagerstroemia indica has the smallest genome, and this difference is mostly attributed to variation in the length of the LSC and IR regions (Table 4). A detailed comparison on four borders between the two IRs and the two single-copy regions showed that the border structures were highly similar with one another (Figure 4). However, a slight difference in junction positions was observed among these five cp genomes. For instance, the ndhF gene was located at the SSC region in Sonneratia alba, Trapa maximowiczii, Punica granatum, and Heimia myrtifolia, while it varied from 3 to 54 bp apart from the IRb/SSC junction. However, the ndhF gene crossed over the IRb/SSC region in Lagerstroemia indica. The rps19 gene was located in the junction of the LSC/IRb in Trapa maximowiczii, Lagerstroemia indica, and Punica granatum, with 24-83 bp located in the IRb. However, in Heimia myrtifolia and Sonneratia alba, the rps19 gene was fully located in the LSC region, and 4-16 bp apart from the LSC/IRb border. Overall, the IR boundary regions varied slightly in the Lythraceae cp genomes. IR expansion and contraction often results in genome size variations among various plant lineages, which can be used to study the phylogenetic classification and the genome evolution among plant lineages. Three reasons may explain the diversification of the IR boundary region sequences: the first is intramolecular recombination, the second is the presence of multiple repeat sequences, and the third is the indels, which caused a mismatch that resulted in the upstream sequence becoming a single copy [52].
Pairwise alignment of the P. granatum cp genome with the other Lythraceae species revealed a high degree of synteny and gene order conservation ( Figure 5), suggesting an evolutionary conservation of the Lythraceae cp genomes at the genome-scale level. IR boundary regions varied slightly in the Lythraceae cp genomes. IR expansion and contraction often results in genome size variations among various plant lineages, which can be used to study the phylogenetic classification and the genome evolution among plant lineages. Three reasons may explain the diversification of the IR boundary region sequences: the first is intramolecular recombination, the second is the presence of multiple repeat sequences, and the third is the indels, which caused a mismatch that resulted in the upstream sequence becoming a single copy [52].

Positive Selection Analysis
The ratios of non-synonymous (dN) and synonymous (dS) substitutions for 75 proteincoding genes among five Lythraceae were calculated based on the site-specific model. Eleven

Positive Selection Analysis
The ratios of non-synonymous (dN) and synonymous (dS) substitutions for 75 protein-coding genes among five Lythraceae were calculated based on the site-specific model. Eleven genes with positively selected sites within the Lythraceae family were identified (Table 5). Those genes contained one subunit of acetyl-CoA carboxylase (accD), one photosystem I subunit gene (psaI), two NADH-dehydrogenase subunit genes (ndhF, ndhJ), one ribosome large subunit gene (rpl22), five ribosome small subunit genes (rps2, rps4, rps7, rps8, rps12), and the ycf1 gene. According to the M8 model, the ycf1 gene possessed 10 positive sites, followed by ndhF (7) and rpl22 (5). The other eight genes each had only one positive site. The Photo-genes included four genes (accD, psaI, ndhF, ndhJ). The Genet-genes included six genes (rpl22, rps2, rps4, rps7, rps8, rps12). The ycf1 gene was considered as the other gene. Most positively selected genes were genetic system or photosynthesis related genes, which indicated that the chloroplast functional genes played vital roles during the plant evolution [53,54].

The Phylogenetic Position of P. granatum
Complete cp genomes are important as they can provide information for understanding the phylogenetic relationships at multiple taxonomic levels. For example, recent phylogenetic analyses based on protein-coding genes of the cp genome provided the broad phylogenetic framework for Viridiplantae, which has significant value both to evolutionary biologists and ecologists [55]. The genus Punica belongs to the order Myrtales and most likely originated from Saxifragales. However, the placement of genus Punica under different families such as Punicaceae, Lythraceae, and Myrtaceae has remained debatable [6]. Recent phylogenomic analysis based on 106 single-copy nuclear genes was performed and supported that P. granatum belongs to the Lythraceae family rather than the monogeneric Punicaeceae family [7].
To determine the position of P. granatum and to further analyze the relationships within Myrtales, we used the complete cp genome sequences to perform the phylogenetic analysis. Eighty-five species representing five families from the order Myrtales were selected. Two species from the order Vitales (Vitis acerifolia, Vitis vinifera) were selected as outgroups. To avoid systematic errors produced by poor alignment, we removed poorly aligned sites using Gblocks. After the removal of the ambiguously aligned regions, the alignment contained 106,571 sites in total, including 20,088 parsimony-informative sites, 9178 singleton sites, and 77,305 constant sites. The method of data analysis (ML and BI) did not affect the results, and the treetopologies of ML and BI were found to be the same. The phylogenetic relationships among five families were fully resolved, and node support values for the ML type were higher than 90.
The tree showed ( Figure 6) that all the accessions of the pomegranate were grouped into a single clade with other closely related species of the Lythraceae family. The monophyly of the family Lythraceae and the sister relation with family Onagraceae is highly supported (>90). Myrtaceae were strongly supported as monophyletic and formed a sister relationship with Melastomataceae. Five pomegranate accessions formed a clade with zero or nearly zero branches length, which suggests that the cp genome might be of limited use for cultivar identification and population genetic studies of P. granatum. Our full genomic data set resolved the phylogenomic placement of Punica and provided strong support for most relationships of Myrtales. www.mdpi.com/journal/ijms the placement of genus Punica under different families such as Punicaceae, Lythraceae, and Myrtaceae has remained debatable [6]. Recent phylogenomic analysis based on 106 single-copy nuclear genes was performed and supported that P.granatum belongs to the Lythraceae family rather than the monogeneric Punicaeceae family [7].
To determine the position of P. granatum and to further analyze the relationships within Myrtales, we used the complete cp genome sequences to perform the phylogenetic analysis. Eightyfive species representing five families from the order Myrtales were selected. Two species from the order Vitales (Vitis acerifolia, Vitis vinifera) were selected as outgroups. To avoid systematic errors produced by poor alignment, we removed poorly aligned sites using Gblocks. After the removal of the ambiguously aligned regions, the alignment contained 106,571 sites in total, including 20,088 parsimony-informative sites, 9178 singleton sites, and 77,305 constant sites. The method of data analysis (ML and BI) did not affect the results, and the treetopologies of ML and BI were found to be the same. The phylogenetic relationships among five families were fully resolved, and node support values for the ML type were higher than 90.
The tree showed ( Figure 6) that all the accessions of the pomegranate were grouped into a single clade with other closely related species of the Lythraceae family. The monophyly of the family Lythraceae and the sister relation with family Onagraceae is highly supported (> 90). Myrtaceae were strongly supported as monophyletic and formed a sister relationship with Melastomataceae. Five pomegranate accessions formed a clade with zero or nearly zero branches length, which suggests that the cp genome might be of limited use for cultivar identification and population genetic studies of P. granatum. Our full genomic data set resolved the phylogenomic placement of Punica and provided strong support for most relationships of Myrtales.

Plant Material
Three P. granatum cultivars with distinct phenotypes were chosen to reconstruct the complete cp genome: 'Nana' is a dwarf pomegranate, which has a small and sour fruit with hard seeds. 'Tunisia' is a domesticated cultivar characterized as a normal-sized tree with sweet taste and soft seeds. 'Taishanhong' is a widely grown landrace in China, characterized as having a bright red peel with delicious taste and hard seeds. The materials used in this study were collected from the experimental orchard at Nanjing Forestry University. The voucher specimen was deposited in Nanjing Forestry University.

DNA Sequencing, Genome Assembly, and Annotation
Total genomic DNA was extracted from mature leaves using a modified CTAB protocol. Firstly, 1.0 µg Genomic DNA was sheared into an average fragment size of 350 bp by a Covaris S220 sonicator (Woburn, Massachusetts, MA, USA). Then, the size distribution and concentration of the libraries were determined using an Agilent 2100 Bioanalyzer and qualified by real-time PCR (2 nM), respectively. DNA libraries were sequenced on Illumina Hiseq X Ten (Nanjing, China) for at least 150 bp reads. The raw sequence data reported in this paper were deposited in the Genome Sequence Archive in Big Data Center [56], Beijing Institute of Genomics (BIG) [57], Chinese Academy of Sciences, under the BioProject with the accession number PRJCA001313. After the fragments were filtered and trimmed by the fastp program [58], clean reads were obtained. Subsequently, the high-quality paired-end reads were used to de novo assemble the complete cp genomes using the GetOrganelle program [27] with a combined k-mer of 95,105,125. Genome annotation was performed using the online program GeSeq [59] for the pomegranate cp genomes previously reported. The annotation results were inspected using Geneious [60] and adjusted manually as needed. The cp genome map was drawn using the online tool OGDRAW [61]. The complete cp genomes have been submitted to Genbank with accession number MK603511-MK603513.

Codon Usage
The complete cp genome of the pomegranate cultivar 'Taishanhong' was selected to analyze the codon usage pattern. The protein-coding genes with more than 300 nucleotides were extracted according to the annotation file. The GC content of GC1, GC2, and GC3 was calculated using an in-house python script. The codon usage indices were calculated by CodonW v1.4.4, including the relative synonymous codon usage (RSCU), codon adaptation index (CAI), and the effective number of codons (ENC). RSCU values were close to 1 indicating that all synonymous codons encoding the same amino acid were used equally. CAI is used to measure the extent of bias towards preferred codons in a gene. A higher CAI value means a stronger codon usage bias and a higher expression level. ENC is used to measure codon usage evenness. Its value ranges from 20 (extremely biased) to 61 (totally unbiased) [62].

RNA Editing Sites
Prediction of the possible RNA editing sites in P. granatum protein-coding genes were performed using the online program predictive RNA editor for plants (PREP) suite [63] with 35 genes as reference. Only those sites which had a cutoff value of 0.8 were kept.

Sequence Diversity
Four cp genomes from Lythraceae were downloaded from GenBank, including Lagerstroemia indica (NC_030484), Heimia myrtifolia (MG921615), Sonneratia alba (NC_039975), and Trapa maximowiczii (NC_037023). These four cp genomes together with that of our newly assembled 'Taishanhong' genome were used to detect the divergent hot spot. Intergenic and protein-coding regions from five Lythraceae cp genomes were extracted using an in-house python script. Multiple sequence alignment was performed using MAFFT [64] and the mean P-distances were calculated using R package 'ape' [65] with Kimura's two-parameter model.

Structure Comparison
IR expansion and contraction of cp genomes among the five Lythraceae species mentioned above were analyzed using IRscope (Helsinki, Finland) [66]. We also conducted a co-linear analysis. A pairwise alignment was achieved by the lastz program. The results were visualized using AliTV (Wurzburg, Germany) [67].

Positive Selection Analysis
In order to detect the protein-coding genes under selection in Lythraceae, the sequences for each gene were aligned separately using the Muscle (codon) implemented in MEGA [68], and the Maximum likelihood phylogenetic tree based on complete cp genome sequences was constructed using IQ-tree [69]. The site-specific model was performed to test for natural selection using the CODEML algorithm [70] implemented in EasyCodeML [71]. Six codon substitution models described as M0, M1a, M2a, M3, M7, and M8 were investigated. This model allowed the ω ratio to vary among sites with a fixed ω ratio in all branches in order to test for site-specific evolution in the gene phylogeny. Two likelihood ratio tests were performed to detect positively selected sites: M1a (neutral) vs. M2a (positive selection), M7 (β) vs. M8 (β and ω), and M0 (one-ratio) vs. M3 (discrete), which were compared using a site-specific model [72].

Phylogenetic Analysis
To determine the phylogenetic position of P. granatum, phylogenetic analysis was performed using the complete cp genomes in the Myrtales. The cp genomes previously published in the Myrtales and two species from Vitales were downloaded from NCBI using an in-house python script. Multiple sequences alignment was achieved by HomBlocks pipelines [73]. Some poorly aligned regions were removed with Gblocks [74]. Two methods were employed to construct a phylogenetic tree, including Maximum likelihood (ML), and Bayes inference (BI). The dataset was unpartitioned. ML was implemented in IQ-tree [69] under the best-fit model selected by using ModelFinder [75] according to Akaike information criterion (AIC). The ML tree was inferred with 1000 bootstrap replicates (the '-bb' options). BI was performed with MrBayes [76] under the best-fit model determined by Modeltest with the AIC. The Marjkov chain Monte Carlo (MCMC) analysis was run for 2 × 200,000 generations. Trees were sampled at every 1,000 generations with the first 25% discarded as burn-in. The stationarity was considered to be reached when the average standard deviation of split frequencies remained below 0.001. Phylogenetic trees with bootstrap values (BS) and posterior probabilities (PP) were visualized using R package 'ggtree' [77].

Conclusions
Next generation whole genome shotgun sequences of plant species often contain numerous reads that are derived from the cp genomes, which provides a unique opportunity to assemble complete cp genomes. This method of using low coverage of the whole genome sequencing data to recover highly repetitive genome regions such as organelle genomes is called the "whole genome skimming approach". In the present study, we determined and characterized the complete cp genomes of three P. granatum cultivars using the whole genome sequencing data. Sequence diversity analysis revealed that cp genome sequences may not be suitable for investigating the genetic diversity of pomegranate genotypes. The genome sequencing data of three different cultivars are valuable resources for pomegranate breeding programs. The complete cp genome sequences that were newly assembled in our study could provide valuable information for understanding the evolutionary relationships among the Myrtales.

Conflicts of Interest:
The authors declare no conflict of interest.