Morphological Characteristics and Comparative Chloroplast Genome Analyses between Red and White Flower Phenotypes of Pyracantha fortuneana (Maxim.) Li (Rosaceae), with Implications for Taxonomy and Phylogeny

Pyracantha fortuneana (Maxim.) Li (Rosaceae), commonly known as Chinese firethorn, is an evergreen shrub with high nutritional, medicinal, and horticultural importance. This species typically has white flowers, but a rare red flower phenotype has been found in very few wild populations in western Hubei, China, showing great ornamental potential. In this study, the complete chloroplast genome of the red flower phenotype of P. fortuneana was reported for the first time, using high-throughput sequencing technology. The complete chloroplast genome was 160,361 bp in length and showed a typical quadripartite structure with a pair of inverted repeat (IR) regions (26,350 bp) separated by a large single-copy (LSC) region (88,316 bp) and a small single-copy (SSC) region (19,345 bp). A total of 131 functional genes were annotated in this chloroplast genome, including 86 protein-coding genes (PCGs), eight rRNA genes, and 37 tRNA genes. Comparative chloroplast genome analyses revealed that high genome similarity existed not only between red and white flower phenotypes of P. fortuneana, but also among Pyracantha species. No evidence for positive selection was found in any PCG, suggesting the evolutionary conservation of Pyracantha chloroplast genomes. Furthermore, four mutational hotspots (trnG-trnR-atpA, psbZ-trnG-trnfM-rps14, ycf3-trnS-rps4, and ndhF-rpl32) with π > 0.004 were identified as potential molecular markers for Pyracantha species. Phylogenomic analysis strongly supported that the red flower phenotype of P. fortuneana was nested within the common white flower phenotype. Based on both morphological and molecular evidence, we suggest that the red flower phenotype of P. fortuneana could be considered as a new forma. Overall, the availability of these genetic resources will not only offer valuable information for further studies on molecular taxonomy, phylogeny, and population genetics of Pyracantha species but also could be used as potential genetic resources for Chinese firethorn breeding.


Introduction
Pyracantha M. Roem. (Rosaceae), commonly called firethorn, is composed of 10 evergreen shrub species and mainly distributed in contiguous areas from Eastern Asia to Southern Europe [1,2]. China is the most important distribution center of this genus and harbors seven species, including five endemic species [1,3]. Previous morphological and molecular phylogenetic analyses placed this genus in the subfamily Maloideae [3], or treated it as a subtribe (Pyracanthinae) within the tribe Maleae of subfamily Amygdaloideae [4][5][6]. However, Pyracantha is currently recognized as a stable monophyletic Chloroplasts are believed to derive from a single primary endosymbiotic event involving the capture of a cyanobacterium and have their own genomes encoding many key proteins in relation to photosynthesis and other major cellular functions, including synthesis of starch, fatty acids, pigments, and amino acids [18][19][20]. A typical chloroplast genome is a double-stranded circular DNA and portrays a quadripartite structure, including a pair of inverted repeat (IR) regions separated by a large single-copy (LSC) region and a small single-copy (SSC) region [21]. Due to its independent matrilineal inheritance, the lack of genetic recombination, low levels of nucleotide substitution, and small effective popula-tion size, the chloroplast genome has been widely used for accurate species identification and phylogenetic inference in different plant lineages, especially those with a complex phenotypic evolution [22][23][24][25][26].
According to our field observation, we hypothesized that the red flower phenotype of P. fortuneana could be recognized as a rare phenotype resource of this species or be considered as a new forma. To test this hypothesis, the complete chloroplast genome of the red flower phenotype of P. fortuneana (P. fortuneana (red)) was sequenced and assembled. Combined with previously published chloroplast genomes of this genus, including two individuals of P. fortuneana with white flower phenotype (P. fortuneana-1 (white) (NC_059101) and P. fortuneana-2 (white) (MW596361)) and one individual each for Pyracantha atalantioides (Hance) Stapf (MW801001), Pyracantha coccinea M. Roem. (NC_062343), and Pyracantha angustifolia (Franch.) Schneid. (KY419957), respectively, we provided a total of six chloroplast genomes for comparative genomic and phylogenomic analyses. Our study aims were to: (1) characterize and compare the chloroplast genomes of Pyracantha species to demonstrate their evolutionary patterns; (2) screen and identify candidate DNA barcodes for species/phenotype identification within Pyracantha; (3) resolve phylogenetic relationships within the genus Pyracantha, and (4) gain insights into the taxonomy of the red flower phenotype of P. fortuneana based on both molecular and morphological evidence.

Sample Collection and Genome Sequencing
Young leaf materials of P. fortuneana with the red flower phenotype were collected from one wild population in western Hubei, China (29 • 21 15.68 N, 108 • 58 16.61 E, Alt. 750 m) and then dried with silica gel. Total genomic DNA was extracted from silica gel-dried leaf samples using a modified procedure of CTAB (cetyltrimethylammonium bromide) [27]. The purified genomic DNA was fragmented to construct short-insert libraries for low-depth whole-genome sequencing, using the Illumina paired-end technology platform (HiSeq-PE150), and about 8 Gb of raw data were obtained. Library construction and sequencing were conducted by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China).

Chloroplast Genome Assembly and Annotation
The raw reads were employed to assemble the complete chloroplast genome sequence of P. fortuneana (red) using a GetOrganelle pipeline [28] with the suggested default parameters. The connection and circularity of the assembly graphs from GetOrganelle were subsequently visually checked in Bandage v.0.8.1 [29]. The Chloroplast genome annotation was performed using the Plastid Genome Annotator (PGA) program [30], with the annotated sequences of Amborella trichopoda (AJ506156) and P. fortuneana (NC_059101) as references. The draft annotation was further verified by GeSeq software v.1.4.2 [31] and checked manually. The annotated chloroplast genome sequence of P. fortuneana was deposited in GenBank (accession No.: OM793283). The circular chloroplast genomic map was drawn using the online software Chloroplot (https://irscope.shinyapps.io/Chloroplot/, accessed on 2 December 2022) [32], with subsequent manual editing.

Comparative Chloroplast Genome Analyses
The basic features of the chloroplast genome sequence of P. fortuneana (red), including the size and GC content of different regions, and gene classification were analyzed with Geneious software v.10.2.3 [33] and were compared with those in four other complete chloroplast genomes of Pyracantha, i.e., P. fortuneana-1 (white), P. fortuneana-2 (white), P. coccinea, and P. atalantioides (note: the chloroplast genome of P. angustifolia was not included in this analysis because only one copy of the IR region was kept by the GenBank submitter). The inverted repeat-single copy (IR/SC) junction characteristics of these five Pyracantha chloroplast genome sequences were drawn in Adobe Illustrator. Furthermore, all six chloroplast genome sequences (only one copy of IR included) from four Pyracantha species (i.e., P. fortuneana, P. coccinea, P. atalantioides, and P. angustifolia) were aligned to identify potentially sequence rearrangements of Pyracantha chloroplast genomes using Mauve software v.2.3.1 [34]. The genome-wide similarity of these Pyracantha species was also plotted using the online software Circoletto (http://tools.bat.infspire.org/circoletto/, accessed on 12 July 2022) [35].

Codon Usage and RNA Editing Analyses
For the codon usage bias analysis, MEGA v.7.0 [36] was used to calculate the RSCU (relative synonymous codon usage) values of coding sequences (CDSs) across all six Pyracantha chloroplast genomes. Additionally, the potential RNA editing sites in P. fortuneana chloroplast genomes were further predicted using the PREP-Cp web server (http://prep.unl.edu/cgibin/cp-input.pl) [37], with a cutoff value of 1.

Selection Pressure and Analysis
The selection pressure on protein-coding genes (PCGs) of six Pyracantha chloroplast genomes was evaluated using the Datamonkey web server (https://www.datamonkey. org/, accessed on 1 July 2022) [38], with FEL (Fixed Effects Likelihood) as the best-fit method according to the step tips. All 78 shared PCGs of six Pyracantha chloroplast genomes were extracted in Geneious software v.10.2.3 and aligned using the program Muscle in MEGA software v.7.0 [39]. For each gene alignment matrix, the stop codons and unaligned fragments were removed using the program Gbloks v.0.91.b [40]. Then, all these single-gene alignment matrices were concatenated into a supermatrix alignment in PhyloSuite v.1.2.2 [41]. Finally, the concatenated protein-coding sequences were uploaded to Datamonkey to perform selection pressure analysis.

Identification of SSRs and Highly Variable Regions
The MISA Perl program [42] was used to identify simple sequence repeats (SSRs) across chloroplast genome sequences of three P. fortuneana accessions (one accession of the red flower phenotype and two accessions of the white flower phenotype) and another three Pyracantha species (i.e., P. coccinea, P. atalantioides, and P. angustifolia) with the common minimum repeat settings: ten for mononucleotides, five for dinucleotides, four for trinucleotides, and three for tetranucleotides, pentanucleotides, and hexanucleotides, respectively. DnaSP software v.6.12.3 [43] was used to calculate genome-wide nucleotide diversity (Pi) of the aligned chloroplast genome sequences of Pyracantha (excluding one copy of IR), with a window length of 600 bp and a step size of 200 bp.

Genetic Distance Analyses
Pairwise genetic distances between all Pyracantha individuals were computed under the analysis of molecular variance (AMOVA) and used to construct a neighbor-joining phenogram [44] in MEGA v.7.0 [36] based on the multiple alignment of Pyracantha chloroplast genomes with MAFFT software v.7.409 [45].

Phylogenetic Analyses
To explore phylogenetic relationships within the genus Pyracantha and among the members of the tribe Maleae of Rosaceae, a total of 31 chloroplast genome sequences (only one copy of IR kept) were used to reconstruct phylogenetic trees based on the methods of maximum likelihood (ML) and Bayesian inference (BI). Except for P. fortuneana (red), the other 30 chloroplast genome sequences were downloaded from the NCBI database (five accessions of the subtribe Pyracanthinae, five accessions of the subtribe Vauqueliniinae, five accessions of the subtribe Lindleyinae, 13 accessions of the subtribe Mespilinae, and two outgroups (Gillenia stipulata and Gillenia trifoliata)) ( Table S1). All 31 chloroplast genome sequences were aligned using MAFFT software v.7.409 [45], and then the best-fit models for the ML and BI methods were selected according to the Bayesian information criterion (BIC) using the ModelFinder program [46] in Phylosuite v.1.2.2 [41]. ML analysis was implemented in IQ-TREE v.2.1.2 [47] with 1000 bootstrap replications under the best fit model TVM + I + G2 + F. The BI tree was constructed using MrBayes software v.3.2.6 [48], under the best-fit model GTR + F + I + G4. The Markov chain Monte Carlo (MCMC) algorithm was run for two independent runs of 1 × 10 6 generations, with four independent Markov chain Monte Carlo (MCMC) chains each (i.e., one cold and three heated) and a sampling frequency of 1000 trees. The initial 25% of sampled trees were discarded as burn-in. The ML and BI trees were then combined in TreeGraph software v.2 [49], based on the consistent topological structures, and the combined phylogenetic tree was visualized using Figtree software v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/, accessed on 31 July 2022).
A total of 30 nrDNA ITS sequences from two flower phenotypes of P. fortuneana and related species were also used for constructing phylogenetic trees following the same described methods as above for chloroplast genomes. Among these sequences, the ITS sequences of P. fortuneana (red), P. fortuneana-2 (white) and P. coccinea were generated from raw sequence data (NCBI SRA accession nos. SRR21976475, SRR17631715, and SRR13004386 for these three accessions, respectively) using GetOrganelle v.1.7.4 [28], while the other 27 ITS sequences (P. fortuneana-1 (white), four accessions of the subtribe Vauqueliniinae, four accessions of the subtribe Lindleyinae, 16 accessions of the subtribe Mespilinae, and two outgroups (G. stipulata and G. trifoliata)) were downloaded from the NCBI database (see details in Table S2).

Chloroplast Genome Characteristics
The complete chloroplast genome of P. fortuneana (red) was 160,361 bp in length and retained the typical quadripartite structure, comprising a large single-copy (LSC) region of 88,316 bp, a small single-copy (SSC) region of 19,345 bp, and a pair of inverted repeat (IR) regions of 26,350 bp ( Figure 2). The overall GC content of the whole genome was 36.5%, and the corresponding values of the LSC, SSC, and IR regions were 34.1%, 30.4%, and 42.7%, respectively. Among the five Pyracantha chloroplast genomes, i.e., P. fortuneana (red), P. fortuneana-1 (white), P. fortuneana-2 (white), P. coccinea, and P. atalantioides, whole-chloroplastgenome sizes varied from 160,361 bp (P. fortuneana (red)) to 160,803 bp (P. atalantioides), with LSC from 88,316 bp (P. fortuneana (red)) to 88,698 bp (P. atalantioides), SSC from 19,344 bp (P. fortuneana-2 (white)) to 19,438 bp (P. coccinea), and IR from 26,342 bp (P. coccinea) to 26,355 bp (P. fortuneana-1 (white)) ( Table 1). The length variations among the three individuals (two flower phenotypes) of P. fortuneana were much less than those among the three Pyracantha species. Furthermore, GC content in the LSC, SSC, and IR regions as well as whole genome sequences were almost the same among these five Pyracantha chloroplast genomes (Table 1).
A total of 131 functional genes, including 86 protein-coding genes (PCGs), 37 tRNA genes, and eight rRNA genes were annotated in the chloroplast genome of P. fortuneana (red), which could be further divided into four categories ( Table 2).

Whole-Chloroplast-Genome Comparison
The IR/SC junction characteristics of chloroplast genomes were almost the same not only between P. fortuneana (red) and P. fortuneana (white) but also even between P. fortuneana and three other Pyracantha species (Figure 3). The rps19 gene of these five chloroplast genomes extended 120 bp into the IRb region at the junction of the LSC/IRb (JLB), creating a duplicated pseudogene at the IRa region (pseudogene not shown). Similarly, the ycf1 gene crossed the SSC/IRa (JSA), and the pseudogene fragment was located at the IRb region with 1074 bp (pseudogene not shown). The gene trnH-GUG was completely located in the LSC region, with a length of 120-205 bp away from the LSC/IRa (JLA) boundary ( Figure 3). Additionally, structural comparison of Pyracantha chloroplast genomes revealed that there were high levels of syntenic similarity between P. fortuneana (including two flower phenotypes) and three other Pyracantha species (Figure S1), with no significant rearrangements detected ( Figure S2). IRb region with 1074 bp (pseudogene not shown). The gene trnH-GUG was completely located in the LSC region, with a length of 120-205 bp away from the LSC/IRa (JLA) boundary ( Figure 3). Additionally, structural comparison of Pyracantha chloroplast genomes revealed that there were high levels of syntenic similarity between P. fortuneana (including two flower phenotypes) and three other Pyracantha species ( Figure S1), with no significant rearrangements detected ( Figure S2).

Codon Usage Bias and RNA Editing Sites
A total of 26,292 codons were identified in the chloroplast genome of P. fortuneana (red) ( Table S3). Among all amino acids, tryptophan and methionine were the only two amino acids translated by one codon (UGG and AUG, respectively), while the remaining amino acids were translated by two to six codons. The most frequently used codon was UUA (1.93%, leucine), while the least abundant codon was AGC (0.38%, serine). Meanwhile, the three most frequent amino acids were serine (6.00%), arginine (6.00%), and leu-

Codon Usage Bias and RNA Editing Sites
A total of 26,292 codons were identified in the chloroplast genome of P. fortuneana (red) ( Table S3). Among all amino acids, tryptophan and methionine were the only two amino acids translated by one codon (UGG and AUG, respectively), while the remaining amino acids were translated by two to six codons. The most frequently used codon was UUA (1.93%, leucine), while the least abundant codon was AGC (0.38%, serine). Meanwhile, the three most frequent amino acids were serine (6.00%), arginine (6.00%), and leucine (5.99%), whereas the two least frequent amino acids were methionine (1.00%) and tryptophan (1.00%) ( Table 3). It is also important to note that nearly all (30/32) C/G-ending codons had RSCU values lower than 1, while nearly all (29/32) A/U-ending codons had RSCU values higher than 1, indicating that most of the amino acids tended to use A/Uending codons rather than C/G-ending codons ( Table 3). Among all six Pyracantha chloroplast genomes, the total number of codons ranged from 26,288 (P. atalantioides) to 26,317 (P. angustifolia), with a slight difference between three accessions of P. fortuneana and two other Pyracantha species (Table S3). The RSCU values of the same codons were highly identical in six Pyracantha chloroplast genomes, and for each kind of amino acid, the sum of RSCU values of all codons involved in its encoding was also equal ( Figure 4).   The three P. fortuneana chloroplast genomes (one accession with the red flower phenotype and two accessions with the white flower phenotype) shared the same 38 potential RNA editing sites, which were distributed on 17 PCGs associated with NADHdehydrogenase (ndhB, ndhD, and ndhF), ATP synthase (atpA, atpB, and atpI), photosystem II (psbE, psbF, and psbL), the small subunit of ribosome (rps2 and rps14), RNA polymerase subunits (rpoA and rpoB), subunits of cytochrome (petB), and other functional genes (clpP, accD, and matK) ( Figure 5, Table S4). Among these genes, the gene ndhB had the highest number of RNA editing sites (10 sites), followed by genes ndhD (four sites), ndhF (three sites), rpoB (three sites), atpA (three sites), accD (two sites), petB (two sites), and rps14 (two sites), while all the others had only one RNA editing site ( Figure 5). The conversion of these sites was all from "C" to "T", and most of the converted sites occurred in the second base of the codon, accounting for 76.31% of all sites (29/38). Of 38 RNA editing sites, the most frequent conversion of RNA editing sites was from TCA (serine) to TTA (leucine) (13), followed by CCA (proline) to CTA (leucine) (6), CTT (serine) to TCA (phenylalanine) (4), CAT (histidine) to TAT (tyrosine) (3) (Table S4). Correspondingly, the most frequent conversion of amino acids was from serine to leucine (15), followed by proline to leucine (7), serine to phenylalanine (5), and histidine to tyrosine (4) ( Table S4).

Selection Pressure
The selection pressure analysis revealed that no CDSs across all six Pyracantha chloroplast genomes experienced positive selection (dN/dS ratio > 1), while at least 10 CDSs, mainly involved in NADH-dehydrogenase (ndhD, ndhF, and ndhH), ATP synthase (atpI), photosystem II (psbM), RNA polymerase subunits (rpoC1), small subunit of ribosome (rps18), and other functional genes (ccsA, matK, and rbcL) had at least one site under purifying selection (Table 4), thus presumably had conserved function. In addition, a total of 42 neutral selected sites (dN/dS ratio = 1) were identified in 27 CDSs, most of which were involved in NADH-dehydrogenase (five genes, 12 sites), RNA polymerase subunits (two genes, three sites), the small subunit of ribosome (four genes, four sites), the large subunit of ribosome (three genes, three sites), photosystem II (three genes, three sites), subunits of cytochrome (two genes, three sites), photosystem I (one genes, three sites), ATP synthase (one gene, two sites) (Table S6).

Selection Pressure
The selection pressure analysis revealed that no CDSs across all six Pyracantha chloroplast genomes experienced positive selection (dN/dS ratio > 1), while at least 10 CDSs, mainly involved in NADH-dehydrogenase (ndhD, ndhF, and ndhH), ATP synthase (atpI), photosystem II (psbM), RNA polymerase subunits (rpoC1), small subunit of ribosome (rps18), and other functional genes (ccsA, matK, and rbcL) had at least one site under purifying selection (Table 4), thus presumably had conserved function. In addition, a total of 42 neutral selected sites (dN/dS ratio = 1) were identified in 27 CDSs, most of which were involved in NADH-dehydrogenase (five genes, 12 sites), RNA polymerase subunits (two genes, three sites), the small subunit of ribosome (four genes, four sites), the large subunit of ribosome (three genes, three sites), photosystem II (three genes, three sites), subunits of cytochrome (two genes, three sites), photosystem I (one genes, three sites), ATP synthase (one gene, two sites) (Table S6).

Genetic Distance
AMOVA analysis revealed that there was almost no genetic difference among three P. fortuneana individuals (two flower color phenotype) and between Pyracantha two species (P. fortuneana and P. angustifolia). The genetic differences mainly occurred between P. coccinea and the rest of five Pyracantha individuals (Table 5). Furthermore, the visualized neighbor-joining tree also showed that the genetic distances were short among three P. fortuneana individuals and between P. fortuneana and P. angustifolia, while longer genetic distances were observed between P. coccinea and five other Pyracantha individuals ( Figure  8).

Genetic Distance
AMOVA analysis revealed that there was almost no genetic difference among three P. fortuneana individuals (two flower color phenotype) and between Pyracantha two species (P. fortuneana and P. angustifolia). The genetic differences mainly occurred between P. coccinea and the rest of five Pyracantha individuals (Table 5). Furthermore, the visualized neighborjoining tree also showed that the genetic distances were short among three P. fortuneana individuals and between P. fortuneana and P. angustifolia, while longer genetic distances were observed between P. coccinea and five other Pyracantha individuals (Figure 8).

Phylogenetic Relationships
The phylogenetic tree based on chloroplast genome sequences strongly supported that the genus Pyracantha was monophyletic and was also the only genus in the subtribe Pyracanthinae. This genus was further recovered as sister to the subtribe Mespilinae, comprising the genera Crataegus, Hesperomeles, Amelanchier, and Malacomeles. Within Pyracantha, the accession with the red flower phenotype of P. fortuneana (P. fortuneana (red)) was nested within the accessions with the white flower phenotype (P. fortuneana (white)), rather than formed a sister relationship to P. fortuneana (white) (Figure 9). Phylogenetic relationships within the genus Pyracantha and among the members of the tribe Maleae of Rosaceae inferred from ITS sequences ( Figure S3) were highly similar to those inferred from chloroplast genome sequences.

Phylogenetic Relationships
The phylogenetic tree based on chloroplast genome sequences strongly supported that the genus Pyracantha was monophyletic and was also the only genus in the subtribe Pyracanthinae. This genus was further recovered as sister to the subtribe Mespilinae, comprising the genera Crataegus, Hesperomeles, Amelanchier, and Malacomeles. Within Pyracantha, the accession with the red flower phenotype of P. fortuneana (P. fortuneana (red)) was nested within the accessions with the white flower phenotype (P. fortuneana (white)), rather than formed a sister relationship to P. fortuneana (white) (Figure 9). Phylogenetic relationships within the genus Pyracantha and among the members of the tribe Maleae of Rosaceae inferred from ITS sequences ( Figure S3) were highly similar to those inferred from chloroplast genome sequences.

Chloroplast Genome Features
Chloroplast genomes within a species are highly conserved in terms of genomic structure, gene content, gene order, and GC content [50,51], with P. fortuneana being no exception in this regard. All three individuals of P. fortuneana (two flower phenotypes) possessed the typical quadripartite structure of land plant chloroplast genomes, with a pair of IR regions (26,350-26,355 bp) separating LSC (88,316-88,393 bp) and SSC (19,344-19,348 bp) regions, and encoded 113 identical unique genes, including 79 PCGs, 30 tRNAs, and four rRNAs (Figure 2, Tables 1 and 2). All three chloroplast genomes also shared the

Chloroplast Genome Features
Chloroplast genomes within a species are highly conserved in terms of genomic structure, gene content, gene order, and GC content [50,51], with P. fortuneana being no exception in this regard. All three individuals of P. fortuneana (two flower phenotypes) possessed the typical quadripartite structure of land plant chloroplast genomes, with a pair of IR regions (26,355 bp) separating LSC (88,316-88,393 bp) and SSC (19,348 bp) regions, and encoded 113 identical unique genes, including 79 PCGs, 30 tRNAs, and four rRNAs (Figure 2, Tables 1 and 2). All three chloroplast genomes also shared the same overall GC content (36.5%), higher than that in LSC (34.1%) and SSC (30.4%) regions, but lower than that in IR regions (42.7%) ( Table 1), likely due to the high GC content of the four rRNAs. The high GC content in IR regions may also contribute to the stability of their chloroplast genomes [52][53][54]. Moreover, the location of the IR/SC boundaries were nearly identical either within P. fortuneana or in comparison to P. atalantioides, P. coccinea, and P. angustifolia, and no gene arrangements were detected in Pyracantha chloroplast genomes (Figures 3 and S3). At a broader taxonomic scale, the genome size, GC content, and gene number of Pyracantha chloroplast genomes also resembled those of previously published Maleae species [55,56]. As an example, the gene for the translation initiation factor, infA, which was found to be lost in Pyracantha chloroplast genomes, was also widely absent in the chloroplast genomes of the tribe Maleae and Fabaceae and may have been transferred to the nuclear genome or replaced with other related genes [57].
RNA editing is a post-transcriptional process on the target transcripts by base insertion, deletion, or replacement [58]. It mainly occurs in chloroplast and mitochondrial genomes, and the number of editing sites varies in terrestrial plants [59]. In this study, however, the RNA editing sites and codon usage bias were completely identical not only between red and white flower phenotypes of P. fortuneana but also even among Pyracantha species (Figures 4 and 5), strongly supporting the functional conservation of RNA editing and translation in Pyracantha chloroplast genomes. In addition, the ratio of nonsynonymous to synonymous substitutions (dN/dS) has been widely applied to evaluate the selection pressure sites and nucleotide evolution rates in coding sequences [60]. It is worth noting that only purifying and neutral selected sites were detected in Pyracantha chloroplast genomes (Table 4 and Table S6), indicating the conserved functions of these chloroplast genes in their evolutionary history [61]. Together, these findings provided further evidence on the conserved nature of Pyracantha chloroplast genomes.
Comparative chloroplast genome analyses revealed that most of the sequence variations were found in the LSC and SSC regions, while the IR regions exhibited comparatively fewer sequence variations (Figure 7). The lower sequence divergence was observed in the IRs than the LSC and SSC regions, which may be due to copy correction between IR sequences by gene conversion [61]. Furthermore, a total of four highly variable regions, i.e., trnG-trnR-atpA, psbZ-trnG-trnfM-rps14, ycf3-trnS-rps4, and ndhF-rpl32 (Pi > 0.004) (Figure 7) identified in this study could be served as molecular markers for future phylogenetic, population genetic and phylogeographic studies of Pyracantha.

Morphology, Phylogeny, and Taxonomy of the Red Flower Phenotype of P. fortuneana
Morphologically, the red flower phenotype of P. fortuneana is evergreen or semievergreen shrubs (mature leaves turn red in winter), up to 3 m tall. Lateral branches are short, and their apex is thornlike. Young branchlets are densely rusty pubescent, and mature branchlets are dark brown and glabrous. Buds are small and cover pubescent in the outer. Leaves mainly grow on the short branches, and petioles are glabrous or slightly pubescent when young. Leaf blades are obovate or obovate-oblong and glabrous in both surfaces, 1.5-5.5 cm long, 0.5-2 cm wide. Leaf base is cuneate to wide round, and the serrations of leaf margin are conspicuous or inconspicuous. Leaf apex is obtuse or emarginate. Dense flowers are clustered into loose compound corymbs, ca. 25 cm in diameter. Peduncles are nearly glabrous and caducous bracts are lanceolate. Pedicels are nearly glabrous, ca. 1 cm long. The color of flowers is red to pink, ca. 1 cm in diameter. Calyx tubes are campanulate and their outer surfaces are glabrous. Sepals are triangular to triangular-ovate and glabrous, 1-1.5 mm long, with entire margin and blunt apex. Petals are red to pink and nearly round, ca. 4 × 3 mm long, and apex is rounded or blunt. Stamens are 20 and filaments are red to pink, 3-4 mm long. Ovaries are densely white pubescent on the upper part and styles are red to pink, 3-4 mm long. Pome is orange-red to dark red and subglobose, ca. 5 mm long. Fruit pedicels are short, 2-5 mm long. Sepals are persistent in fruit apex and erect. The florescence is from April to May, and the fruiting period is from August to November [1,3].
Although P. fortuneana individuals with the red flower phenotype can be conspicuously distinguished from those with the white flower phenotype, by their pink to red floral parts, including inner sepals, petals, styles and filaments, and red mature leaves (Figure 1, Table 6), most vegetative morphological traits between red and white flower phenotype individuals are always the same. For example, both of them have short, thornlike lateral branches, rusty-pubescent young branchlets, dark brown and glabrous mature branchlets, obovate or obovate-oblong leaf blades, short, glabrous and pubescent petioles [1,3,13]. Although Wang [17] proposed that P. fortuneana individuals with the red flower phenotype appear to have a wide round leaf base and conspicuous serrations, different from the characteristics of cuneate leaf base and inconspicuous serrations in individuals with the white flower phenotype, our field observations showed that the red flower phenotype individuals also harbored the above traits in white flower phenotype individuals ( Figure 1). Thus, except for flower color, individuals with these two flower phenotypes are not clearly distinguished. Furthermore, in terms of geographical distribution, P. fortuneana individuals with the red flower phenotypes were only found in the Enshi Tujia and Miao Autonomous Prefecture, western Hubei, China, and have a scatted distribution in thickets, stream sides, and roadsides at altitudes of 750-1500 m, mixed with the white flower phenotype individuals. Precisely because red and white phenotypes individuals shared the same habitat, we ruled out the possibility of the elements of calcium and phosphorus causing the difference in flower color [62,63]. Phylogenetically, chloroplast genome and ITS trees are identical, both of which strongly supported the monophyly of the species P. fortuneana, and P. fortuneana (red) was nested within P. fortuneana (white) (Figures 9 and S3), undoubtedly supporting that the red flower phenotype of P. fortuneana should belong to the species of P. fortuneana. Moreover, comparative chloroplast genome analyses between red and white flower phenotypes of P. fortuneana also revealed there is a very close genetic relationship between them. Based on above evidence, we thus suggested that the red flower phenotype of P. fortuneana could be recognized as a rare phenotype resource of this species, conforming to the initial view of the first discoverer [17], or be considered as a new forma.

Limitations and Future Directions
In conclusion, the findings obtained in this study will not only provide new insights into chloroplast genome evolution and phylogeny of Pyracantha, and the taxonomic status of red flower phenotype of P. fortuneana, but also be useful for breeding, cultivation and utilization of this economically important species. However, at least two limitations of our study should be acknowledged. First, current sampling is too sparse for the red flower phenotype of P. fortuneana (only one individual), thus additional sampling is needed to resolve intra-specific relationships and to evaluate how well species delimitations based primarily on morphology coincide with chloroplast genome lineages. Second, although previous studies have reported that flower color in Rosaceae is mainly attributed to anthocyanin accumulation, and controlled by the transcription factor classes of MYB, bHLH, and WD40, e.g., [64,65], the genomic/genetic data currently available for these two flower phenotypes of P. fortuneana are inadequate to uncover the exact molecular mechanism underlying the flower color variations in this species. Thus, the family classic genetic techniques, along with transcriptome profiling, gene expression and population genetic data and/or approaches, are needed to identify putative loss-of-function mutations and/or gene expression changes that generate rare, red flowers instead of the common, white color in P. fortuneana. Finally, it is also worth emphasizing that within-population flower color variation is relatively uncommon, the population of P. fortuneana from the Enshi Tujia and Miao Autonomous Prefecture, western Hubei, China provides an ideal system to explore the possible mechanisms that maintain flower color variation within populations, which will be useful in understanding fundamental evolutionary processes that create and maintain trait variation [66].
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13122404/s1, Figure S1: Synteny comparison among the chloroplast genomes of P. fortuneana and three other Pyracantha species. The colored blocks outside the sequences refer to the score/max bit core ration, with red > 0.75, orange > 0.50, green > 0.25, and blue ≤ 0.25. Figure S2: The structural rearrangement analysis of six Pyracantha chloroplast genomes using the mauve multiple alignment algorithm. Figure S3: The phylogenetic tree of the genus Pyracantha based on 30 nrDNA ITS sequences using both ML and BI methods. The BI posterior probabilities / ML bootstrap values are displayed above the lines ("-" stands for the value less than 0.5/50). Table S1: Sample list of chloroplast genomes used in this study. Table S2: Sample list of rDNA ITS sequences used in this study. Table S3: The codon number in six Pyracantha chloroplast genomes. Table S4: List of the potential RNA editing sites in the chloroplast protein-coding genes of P. fortuneana (red) and two individuals of P. fortuneana (white). Table S5: Summary of SSRs in six Pyracantha chloroplast genomes.   Data Availability Statement: The chloroplast genome sequence and rDNA ITS sequence of P. fortuneana (red) can be found in the GenBank (https://www.ncbi.nlm.nih.gov/genbank/) under accession numbers OM793283 and OP821228, respectively. Raw Illumina data were deposited at NCBI SRA (no. SRR21976475).