Comparative Analysis of Complete Chloroplast Genomes of Rubus in China: Hypervariable Regions and Phylogenetic Relationships

With more than 200 species of native Rubus, China is considered a center of diversity for this genus. Due to a paucity of molecular markers, the phylogenetic relationships for this genus are poorly understood. In this study, we sequenced and assembled the plastomes of 22 out of 204 Chinese Rubus species (including varieties) from three of the eight sections reported in China, i.e., the sections Chamaebatus, Idaeobatus, and Malachobatus. Plastomes were annotated and comparatively analyzed with the inclusion of two published plastomes. The plastomes of all 24 Rubus species were composed of a large single-copy region (LSC), a small single-copy region (SSC), and a pair of inverted repeat regions (IRs), and ranged in length from 155,464 to 156,506 bp. We identified 112 unique genes, including 79 protein-coding genes, 29 transfer RNAs, and four ribosomal RNAs. With highly consistent gene order, these Rubus plastomes showed strong collinearity, and no significant changes in IR boundaries were noted. Nine divergent hotspots were identified based on nucleotide polymorphism analysis: trnH-psbA, trnK-rps16, rps16-trnQ-psbK, petN-psbM, trnT-trnL, petA-psbJ, rpl16 intron, ndhF-trnL, and ycf1. Based on whole plastome sequences, we obtained a clearer phylogenetic understanding of these Rubus species. All sampled Rubus species formed a monophyletic group; however, sections Idaeobatus and Malachobatus were polyphyletic. These data and analyses demonstrate the phylogenetic utility of plastomes for systematic research within Rubus.


Introduction
Rubus L. is a large and diverse genus in the family Rosaceae.Besides being among the most important genetic resources of berry fruits [1], Rubus species also have great potential as medicines [2][3][4] and in ecological restoration [5,6].There are at least 750 to 1000 species of Rubus worldwide, mainly distributed in northern temperate areas [7].In China, there are ca.204 species, accounting for 97% of Asian species [8,9]; thus, China is considered a key center of Rubus diversity, with 138 endemic taxa, mainly distributed in Southwest China [10].Due to frequent hybridization, polyploidization, and apomixis [11][12][13], the classification of Rubus is challenging [13].Among different classification systems, Focke's system with 12 subgenera and 22 sections was the first to be widely accepted [14][15][16].More recent systems, such as the Flora of China, recognizing eight sections and 24 subsections, were applied to Chinese Rubus species [7,17], with sections Idaeobatus (83 species, 11 subgroups) and Malachobatus (86 species, 13 subgroups) being the most diverse [17].Extensive studies on morphology [18,19], palynology [20,21], and cytology [22][23][24] notwithstanding, prevailing classifications were still controversial.Notably, the monophyly of some recognized sections has not been supported [25][26][27][28], and the placement of certain species remains Genes 2024, 15, 716 2 of 16 uncertain [11].Phylogenetic analyses with few genetic makers were insufficient to obtain highly confident phylogenetic trees [11], and resultant phylogenetic trees were poorly resolved.Therefore, a broader array of suitable genetic markers is urgently needed to clarify phylogenetic relationships, essential for the efficient development and utilization of the Rubus.
With a typical quarter-ring structure, plant plastomes are usually composed of two inverted repeat regions (IRs) separated by a large single-copy region (LSC) and a small singlecopy region (SSC).Plastomes sizes of most seed plants range between 120 and 160 kb [29], with occasional significant enlargement or reduction, such as in Pelargonium transvaalense (242 kb, KM527900), Actinidia cylindrica (224 kb) [30], Carnegiea gigantea (113 kb) [31], and Hordeum spontaneum (114 kb, KC912688).In most cases, the contraction, expansion, or loss of IRs leads to changes in plastome size [32].The gene content and order of angiosperm plastomes are usually very conservative, but rearrangements can be found in some families, including Leguminosae [33], Campanulaceae [34], Geraniaceae [35], and Oleaceae [36].These rearrangements can be associated with the loss of genes or introns [37], IR expansion, inversion [38], and repetitive sequence expansion [39] or transposable elements (TEs) [40,41].With the merits of moderate substitution rates, high homology, and often maternal inheritance nature, plastomes are now extensively used for phylogenetic analyses across diverse plant lineages [42][43][44].For example, Zhang et al. [45] reconstructed the phylogeny of Rosaceae based on the plastome data, with most nodes being well resolved and well supported.Li et al. [46] recovered high phylogenetic resolution along the backbone of the tribe Potentilleae within the Rosaceae family based on plastid phylogenomics.
In this study, we used short reads from Illumina sequencing to assemble and annotate 22 plastomes of Chinese Rubus species for genome characterization and phylogenetic analysis.Our objectives were to (1) characterize the composition, structure, and sequence variation of Rubus plastome; (2) identify variation hotspots in these plastomes; and (3) explore the potential utility of plastomes for the phylogenetic reconstruction of Rubus.

Plant Material and DNA Sequencing
Twenty-two Rubus species representing 20 subsections of three sections (Idaeobatus, Malachobatus, and Chamaebatus) were collected from Yunnan, Sichuan, Chongqing, Guangxi, Guangdong, and Hainan provinces in China (Table S1).The identification of each species was verified twice by Bine Xue and Longyuan Wang, and corresponding voucher specimens were deposited in the herbarium of Sun Yat-sen University (SYS) (Table S1).Young, healthy leaves of each collection were dried with silica gel, and their DNA was extracted using a modified CTAB method.Prior to sequencing, the integrity of the DNA samples was checked by gel electrophoresis and the purity was evaluated with the NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) with a ratio of absorbance at 260 nm/280 nm ca.1.8 and finally measured (>20 ng•µL −1 ) with a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA).A NEBNex ® t Ultra TM II DNA Library Prep Kit (New England Biolabs) was used to construct a library with an insert size of 350 bp, which was sequenced with at least 20-fold coverage on the Illumina HiSeq X™ Ten Sequencing System (San Diego, CA, USA), with 150 bp paired-end reads.

Assembly, Annotation, and Visualization of Chloroplast Genome
By using the de novo assembler NOVOPlasty 3.6 [47] with default parameters, the 22 Rubus plastomes were assembled with 5 million raw reads after the removal of adapters and further annotated with Rubus leucanthus (MK105853) as a reference with the package Plann [48] with default parameters.Further rRNA annotations were implemented in GeSeq [49] (https://chlorobox.mpimp-golm.mpg.de/geseq.html(accessed on 20 December 2019)) with default parameters, and all annotation information was checked by Sequin v.9.50.After manual correction, plastome circle maps were obtained from the online web program OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html(accessed on 31 May 2020)).
The plastomes of 24 Rubus species were compared and analyzed with R. leucanthus (MK105853) as a reference with Shuffle-LAGAN packaged in mVISTA [53].The program MAFFT 7 [54] with default parameters was used to align the complete plastomes, LSC, SSC, and IRs of the24 Rubus species, and the corresponding numbers of mutation sites and information sites were estimated with MEGA X [50] using default parameters.
Plastome nucleotide polymorphisms (Pi) among the 24 plastomes were estimated by DnaSP v.6 [55] with a window length of 600 bp and a step size of 200 bp.The nine fragments with the highest Pi values were selected as variation hotspots among the 24 Rubus species, and the locations of these fragments were determined based on the annotation information.
Prior to phylogenetic reconstruction, alignment partition schemes and substitution model determinations were implemented by using the program PartitionFinder v 2.1.1 [56] with default parameters.Following Ma's partition scheme strategies for plastomes [57], we divided the plastome as follows: a. partition0, unpartitioned; b. partition2, coding and non-coding regions; c. partition3, LSC, IRs, SSC; partition6, rRNA, tRNA, non-coding regions, codon 1,2,3 of coding regions.The best partition schemes and corresponding substitution models were determined among the four partition schemes by maximum likelihood estimation or Akaike information criterion (AIC).Based on these partitions and corresponding substitution modes, maximum likelihood (ML) phylogenetic tree estimation was implemented in RAxML 8.2.12 [58] with 1000 bootstrap analyses using default parameters.Bayesian analysis (BI) was performed by using MrBayes 3.2.7 [59] with 1,000,000 generations and a sample frequency of 1000 generations until the average standard deviation of split frequency values was <0.01.The first 25% of samples were discarded as "burn-in", and the remaining samples were summarized to construct a 50% majority-rule consensus tree.The convergence of MCMC for MrBayes analysis was diagnosed using Tracer 1.7.1 [60] with an effective sample size (ESS) of over 200.All phylogenetic trees were visualized using the program in FigTree v1.4.4.

Characteristics of Chloroplast Genomes
We sequenced and annotated the complete chloroplast genomes of 22 Rubus species (Table 1).All plastomes of these species contain the four typical structural regions including a large single-copy region, two inverted repeat regions, and a small single-copy region (Figure 1).Their genome size ranged between 155,464 bp (R. pileatus) and 156,506 bp (R. pectinaris) (Table 1).The LSC length varied from 84,847 bp (R. pileatus) to 86,214 bp (R. pectinaris), accounting for 54.58-55.09% of the plastome length.The SSC length ranged between 18,481 bp (R. innominatus) and 18,875 bp (R. xanthocarpus), accounting for 11.89-12.10% of the plastome length.The length of the IR (single) ranged from 25,737 (R. peltatus) to 25,997 (R. pileatus), accounting for 16.45-16.72%of the plastome length.The length ratio of each structural region to the total length was relatively stable in 24 Rubus species, and plastome size variation was relatively small.With an average IR region length of 18,778 bp and an average SSC region length of 25,801 bp, the variation among these species was less than 400 bp for both regions.The total chloroplast genome length, averaging 156,000 bp, showed significant variation across the 24 Rubus species, ranging from 6 to 1042 bp.In addition, the total length and the length of the three structural regions of the Rubus plastomes were similar to those species in the same subfamily (Rosoideae) (Fragaria vesca, F. pentaphylla, and Rosa multiflora), but quite different from members of subfamily Amygdaloideae (Pyrus pyrifolia and Malus hupehensis).
We identified 112 unique genes, including 79 protein-coding genes, 29 tRNAs, and four rRNAs in 24 Rubus plastid genomes, with the total number of genes ranging between 129 and 131 (Table 1).Numbers of both tRNAs and rRNAs were conserved across the Rosaceae, but those of protein-encoding genes exhibited significant variation ranging between 83 (Pyrus pyrifolia) and 90 (Rosa multiflora), while our Rubus species consistently had 85 proteinencoding genes.Protein-encoding genes were classified into four functional categories: self-replication (58), photosynthesis (45), other functions (5), and unknown functions (4) (Table S2).In the IR region, the composition of four rRNAs, seven tRNAs (trnA-UGC, trnI-CAU, and trnI-GAU, etc.), and six protein-coding genes (rps7, rpl2, ndhB, etc.) were common, and, sometimes, one end of trans-spliced gene rps12 or ycf1 was observed among the 24 Rubus species.For protein-encoding genes with introns, two introns were observed in clpP and ycf3 and 14 genes had only one intron.We identified 112 unique genes, including 79 protein-coding genes, 29 tRNAs, and four rRNAs in 24 Rubus plastid genomes, with the total number of genes ranging between 129 and 131 (Table 1).Numbers of both tRNAs and rRNAs were conserved across the Rosaceae, but those of protein-encoding genes exhibited significant variation ranging between 83 (Pyrus pyrifolia) and 90 (Rosa multiflora), while our Rubus species consistently had 85 protein-encoding genes.Protein-encoding genes were classified into four functional categories: self-replication (58), photosynthesis (45), other functions (5), and unknown functions (4) (Table S2).In the IR region, the composition of four rRNAs, seven tRNAs (trnA-UGC, trnI-CAU, and trnI-GAU, etc.), and six protein-coding genes (rps7, rpl2, ndhB, etc.) were common, and, sometimes, one end of trans-spliced gene rps12 or ycf1 was observed among the 24 Rubus species.For protein-encoding genes with introns, two introns were observed in clpP and ycf3 and 14 genes had only one intron.GC contents among the 24 Rubus species were analyzed, and the average GC content of the plastomes ranged between 36.91% (R. peltatus) and 37.30% (R. innominatus), and no significant bias was observed.However, obvious differences existed in different regions, i.e., IR (42.79%),LSC (35.05%), and SSC (31.15%).Furthermore, this same pattern was observed in the other five Rosaceae species.

Comparative Analysis of Chloroplast Genomes and Identification of Divergence Hotspots
Collinearity analysis demonstrated no genome rearrangements among 24 Rubus species (Figure S1).In addition, the boundaries of IRA and IRB regions were relatively stable among these species (Figure 2).For instance, the LSC-IRB boundary was mainly located between rps19 and rpl2 (except for in R. leucanthus), with a distance of 13-25 bp between rps19 and the adjacent boundary.The boundary between IRB and SSC of most Rubus species was close to the gene ndhF of the SSC region, except for five species (R. corchorifolius, R. peltatus, R. leucanthus, R. innominatus, and R. pileatus), where it was located in ycf1 with a length of 11 bp to 26 bp in the SSC region.The SSC-IRA boundaries were all inside ycf1, with lengths varying from 4605 bp (R. tsangii) to 4650 bp (R. leucanthus) in the SSC region and with lengths from 432 bp (excluding 1092 bp of R. leucanthus) in the IRA region.Except for R. pentagonus, R. xanthocarpus, R. innominatus, and R. pileatus, the IRA-LSC boundary of other Rubus species was adjacent to trnH in the LSC region.Overall, the structural borders of Rosoideae species (F.vesca, F. pentaphylla, and Rosa multiflora) more closely resembled those of the Rubus species than those of two Amygdaloideae species (P.pyrifolia and M. hupehensis).By using mVISTA, chloroplast genome sequences of 23 Rubus species were compared with reference to R. leucanthus annotations (Figure 3).Overall, the IR region exhibited the least variation (0.91%), much less than that of LSC (6.88%) or SSC (8.53%) (Table 2).As expected, the variable proportions of coding genes were significantly less than those of noncoding genes (introns and intergenic spacers) (Table 2).The nucleotide polymorphism (Pi) of plastomes among 24 Rubus species varied from 0 to 0.0375 (Figure 4), and the IR regions, with a mean Pi value < 0.005, showed the least differentiation.Based on Pi values, nine highly variable regions were selected (Pi value > 0.0225), including seven gene spacers (trnH-psbA, trnK-rps16, rps16-trnQ-psbK, petN-psbM, trnT-trnL, petA-psbJ, ndhF-trnL), one gene intron (rpl16 intron), and one protein-coding gene (ycf1).Among them, trnH-psbA exhibited the highest level of polymorphism (Pi = 0.0565) and a parsimony information site proportion of 8.63%; rps16-trnQ-psbK displayed the highest variation in site proportion (17.62%).

Phylogenetic Analysis
With an increased number of partitions, decreased AIC or increased maximum likelihood supported the scheme partition 6 (Table S3); thus, we divided the plastome alignment into rRNA, tRNA, non-coding regions, and codons 1, 2, and 3 of the coding regions, respectively.For each partition, GTR + I + R was selected as the best-fit substitution model.From these datasets, the tree topologies estimated by the Bayesian inference (BI) method and maximum likelihood (ML) method were consistent (Figure 5).All Rubus species form a monophyletic clade (PP = 1.00,ML BS = 100%) as a sister to the Fragaria and Rosa species (PP = 1.00,ML BS = 100%).The Rubus species split into three clades: clade A consisted of only four Rubus species, sister to the remaining Rubus species; the remaining species diverged into two distinctive sister clades (clade B and clade C).Species in clades A and B all belonged to the sect.Idaeobatus.The species in clade C were further divided into two subclades, one assigned to sect.Malachobatus and the other with three species belonging to sections Malachobatus, Idaeobatus, and Chamaebatus, respectively (Figure 5).The monophyly of sects.Idaeobatus and Malachobatus were therefore not supported, unless R. pentagonus was misassigned at the sectional level.

Phylogenetic Analysis
With an increased number of partitions, decreased AIC or increased maximum likelihood supported the scheme partition 6 (Table S3); thus, we divided the plastome alignment into rRNA, tRNA, non-coding regions, and codons 1, 2, and 3 of the coding regions, respectively.For each partition, GTR + I + R was selected as the best-fit substitution model.From these datasets, the tree topologies estimated by the Bayesian inference (BI) method and maximum likelihood (ML) method were consistent (Figure 5).All Rubus species form a monophyletic clade (PP = 1.00,ML BS = 100%) as a sister to the Fragaria and Rosa species (PP = 1.00,ML BS = 100%).The Rubus species split into three clades: clade A consisted of only four Rubus species, sister to the remaining Rubus species; the remaining species diverged into two distinctive sister clades (clade B and clade C).Species in clades A and B all belonged to the sect.Idaeobatus.The species in clade C were further divided into two subclades, one assigned to sect.Malachobatus and the other with three species belonging to sections Malachobatus, Idaeobatus, and Chamaebatus, respectively (Figure 5).The monophyly of sects.Idaeobatus and Malachobatus were therefore not supported, unless R. pentagonus was misassigned at the sectional level.ML and BI analyses were also conducted based on the dataset of the nine mutation hotspots in Rubus species.The phylogenetic trees based on this dataset were generally ML and BI analyses were also conducted based on the dataset of the nine mutation hotspots in Rubus species.The phylogenetic trees based on this dataset were generally consistent with those obtained from whole plastomes, with somewhat decreased support for some nodes or position shifts for some species (Figure 6).
consistent with those obtained from whole plastomes, with somewhat decreased support for some nodes or position shifts for some species (Figure 6).

Variation in Plastome Size and Gene Composition
The gene content of plastomes is conservative, typically involving 100 to 120 unique genes [61].In these Rubus species, we identified 112 unique genes and 129 to 131 total genes, with the differences attributed to the copy numbers of ycf1 and trnH-GUG.Among these, two full copies of the ycf1 gene were only observed in five species (R. pileatus, R. innominatus, R. leucanthus, R. peltatus, and R. corchorifolius), with only one copy in the others.Two copies of ycf1, one complete and one pseudogene truncated by the IRB-SSC boundary [62], have been reported in Nelumbonaceae [63], Salicaceae [64], and Brassicaceae [65], and some have even been lost completely in some plants [66].In 19 of the 24 Rubus plastomes, one copy of ycf1 in the IRB region has been lost, but the random distribution of these species across the phylogenetic tree suggests that these events give no clear phylogenetic signal.
Of 29 Rosaceae species, two copies of trnH-GUG located in the LSC region were only observed in R. leucanthus.This duplication is common in monocotyledons (e.g., Orchidaceae, Poaceae) and some basic angiosperms (Magnoliales and Chloranthales) [67].Mardanov et al. [68] considered that the expansion from the IR region to the LSC region might be attributed to the additional copy of trnH-GUG, representing an ancestral pattern in basal angiosperms.Wang et al. [69] found that the karyotypes of Chinese Rubus resembled those of the basic angiosperms.Hence, Rubus may be relatively primitive within the Rosaceae.

Variation in Plastome Size and Gene Composition
The gene content of plastomes is conservative, typically involving 100 to 120 unique genes [61].In these Rubus species, we identified 112 unique genes and 129 to 131 total genes, with the differences attributed to the copy numbers of ycf1 and trnH-GUG.Among these, two full copies of the ycf1 gene were only observed in five species (R. pileatus, R. innominatus, R. leucanthus, R. peltatus, and R. corchorifolius), with only one copy in the others.Two copies of ycf1, one complete and one pseudogene truncated by the IRB-SSC boundary [62], have been reported in Nelumbonaceae [63], Salicaceae [64], and Brassicaceae [65], and some have even been lost completely in some plants [66].In 19 of the 24 Rubus plastomes, one copy of ycf1 in the IRB region has been lost, but the random distribution of these species across the phylogenetic tree suggests that these events give no clear phylogenetic signal.
Of 29 Rosaceae species, two copies of trnH-GUG located in the LSC region were only observed in R. leucanthus.This duplication is common in monocotyledons (e.g., Orchidaceae, Poaceae) and some basic angiosperms (Magnoliales and Chloranthales) [67].Mardanov et al. [68] considered that the expansion from the IR region to the LSC region might be attributed to the additional copy of trnH-GUG, representing an ancestral pattern in basal angiosperms.Wang et al. [69] found that the karyotypes of Chinese Rubus resembled those of the basic angiosperms.Hence, Rubus may be relatively primitive within the Rosaceae.

Variation and Divergence of Rubus Plastomes
So far, the expansion or contraction of IR boundaries and gene loss have been attributed to plastome size variations [70,71], and, in some taxa, these changes were phylogenetically informative [67].Beyond the Rubus species, the IR boundaries expanded at rps19 and ndhF genes in two other species in the subfamily Amygdaloideae, and the same expansion of IR boundaries also occurred in five species of Prunus in the same subfamily [72].Therefore, we speculate that the IR boundary changes were responsible for the plastome size difference between Rubus and the subfamily Amygdaloideae.The plastid genome of most plants has a non-coding sequence ranging from 0 to 30 bp between the IRA-LSC boundary and the 3 ′ end of trnH-GUG [73].However, such non-coding sequences expanded significantly and showed high similarity to mitochondrial gene sequences in some taxa, for instance, Apiales species (>200 bp) including Petroselinum (345 bp) [73].We also observed a non-coding sequence ranging between one and eight bp at this position in R. pentagonus, R. xanthocarpus, R. innominatus, and R. pileatus, but it showed no phylogenetic signal.
The GC content was associated with DNA stability; the higher the GC content, the lower the mutation rates [74].The GC content of IR, LSC, and SSC among Rubus species decreased in turn, which was well in accordance with the variation levels among the three regions for the Rubus species.In our study, variation indices, including the mutation site proportion, parsimony information site proportion, nucleotide polymorphisms, and differentiation levels, all showed the same trend among the three regions [75].

Development of Chloroplast DNA Markers
Chloroplast DNA markers have been widely used in Rubus phylogenetic analyses, e.g., protein-coding genes, such as ndhF, rbcL, and rpl16, and intergenic regions trnL-trnF and trnS-trnG [27].However, most of these markers have shown limitations for phylogenetic reconstruction in Rubus and other Rosaceae [76,77].In contrast, by using nine screened hyper-mutation fragments, we obtained a well-resolved phylogenetic tree comparable to one based on whole plastome sequences, and these nine markers might be useful for phylogenetic analyses in other groups of Rosaceae.Some of our recommended fragments, i.e., trnH-psbA [78], trnT-trnL [79], petA-psbJ [80], and rpl16 intron [81] have been widely applied in other plants, and ycf1a or ycf1b have even been recommended as candidates for core DNA barcodes [82].

Phylogenetic Analysis
Due to frequent hybridization, apomixis, and polyploidy, the classification and phylogeny of Rubus have long been controversial [11][12][13]28].By using whole chloroplast genomes, the resolution of the maternal phylogenetic tree can be improved significantly over those based on only a few fragments.Our phylogenetic analyses showed that Rubus is monophyletic and more closely related to Fragaria and Rosa than to Pyrus and Malus, consistent with another recent plastid phylogenomic study of Rosaceae [45].Our phylogenetic analyses showed that sections Idaeobatus and Malachobatus as currently recognized were polyphyletic, consistent with previous studies of sect.Idaeobatus [83,84].However, more taxon sampling will be required before proposing any new infra-generic classification.
Species in sect.Idaeobatus mainly have compound leaves and those in sect.Malachobatus mainly bear simple leaves, implying that compound leaves might be ancestral in Rubus.In previous studies, the relationships among three taxa, R. ellipticus, R. ellipticus var.obcordatus, and R. wallichianus, were controversial [24,85,86].Our phylogenetic analysis revealed that R. ellipticus has a more distant relationship with R. wallichianus and R. ellipticus var.obcordatus, lending support for the recognition of R. ellipticus var.obcordatus at the species level, which will require a broader investigation.
In our study, we examined six species from section Idaeobatus and three from section Malachobatus, which overlapped with those studied by Carter et al. (2019) [28].Our findings show a divergence in the relationship of R. ichangensis and R. lambertianus within section Malachobatus, unlike Carter et al. (2019), who identified them as sister species based on nuclear loci.Additionally, we observed inconsistencies in the phylogenetic relationship for R. lineatus within Malachobatus and for R. calophyllus and R. pentagonus within Idaeobatus when comparing chloroplast genomes with nuclear loci.The remaining six species examined in both studies showed that relationships among R. niveus, R. coreanus, and R. innominatus in our clade A are congruent with the chloroplast phylogeny reported in the earlier study [28], yet conflict with their nuclear phylogeny.Conversely, the alignment of R. ellipticus and R. crataegifolius in our clade B matches with both their chloroplast and nuclear loci findings [28].Due to extensive natural hybridization within Rubus, the matrilineal chloroplast genome is difficult to explain via reticular evolution.In subsequent studies, the nuclear genome data with biparental inheritance and detailed genetic information can be applied to explore the complex evolutionary history of Rubus including reticular evolution, hybrid infiltration, and polyploidization.

Conclusions
In this study, we assembled and annotated the complete chloroplast genomes of 22 Chinese Rubus species.Comparative plastid genome analysis revealed that gene order and content were conserved among the Rubus species.Differences in genome size among them were small and typically related to the length variations of the LSC region.There were slight differences in the total number of genes between a few Rubus species.Plastid genome structures were stable, and there was no obvious expansion or contraction observed at the boundaries of the two IR regions.The composition and order of the chloroplast genes were also conservative, with high collinearity.Phylogenetic trees were constructed based on the whole chloroplast genome dataset, and the combined nine hyper-variable regions dataset strongly supported that Rubus species in this study formed a monophyletic group with three main clades, but both sections Idaeobatus and Malachobatus are polyphyletic.This study reaffirms the great potential of plastid genomes to resolve biosystematic relationships among Rubus species.

Figure 1 .
Figure 1.Overview of circos map for the Rubus chloroplast genomes.Genes in the large circle are transcribed clockwise, genes outside the large circle are transcribed counterclockwise, and dark gray in the small circle corresponds to GC content.LSC, large single-copy region; SSC, small single-copy region; IRA, IRB, inverted repeat region.Different functional genes are distinguished by color.

Figure 1 .
Figure 1.Overview of circos map for the Rubus chloroplast genomes.Genes in the large circle are transcribed clockwise, genes outside the large circle are transcribed counterclockwise, and dark gray in the small circle corresponds to GC content.LSC, large single-copy region; SSC, small single-copy region; IRA, IRB, inverted repeat region.Different functional genes are distinguished by color.

Genes 2024 , 17 Figure 2 .
Figure 2. Comparison of the boundaries of LSC, IR, and SSC among chloroplast genomes of 24 Rubus species and 5 other Rosaceae species.JLB, junction between LSC and IRB; JSB, junction between SSC and IRB; JSA, junction between SSC and IRA; JLA, junction between LSC and IRA.

Figure 2 .
Figure 2. Comparison of the boundaries of LSC, IR, and SSC among chloroplast genomes of 24 Rubus species and 5 other Rosaceae species.JLB, junction between LSC and IRB; JSB, junction between SSC and IRB; JSA, junction between SSC and IRA; JLA, junction between LSC and IRA.

Genes 2024 , 17 Figure 3 .
Figure 3. Comparative analysis of chloroplast genome sequences in 24 Rubus species.R. leucanthus was chosen as the reference genome; the gray arrow indicates the direction of the gene.UTR, untranslated region; CNS, non-coding sequence.The y-axis represents from 50% to 100% consistency.

Figure 4 .
Figure 4.Nucleotide polymorphism analysis of chloroplast genomes in 24 Rubus species.The x-axis denotes the coordinates of the chloroplast genome and the y-axis represents the polymorphisms measured with Pi.

Figure 4 .
Figure 4.Nucleotide polymorphism analysis of chloroplast genomes in 24 Rubus species.The x-axis denotes the coordinates of the chloroplast genome and the y-axis represents the polymorphisms measured with Pi.

Figure 5 .
Figure 5.The Bayesian inference (BI) and maximum likelihood (ML) tree for Rubus based on the complete plastome.The number on the branch is BI posterior probability (PP)/ML bootstrap (BS); branches without numbers indicate nodes with 1.00/100 support values.

Figure 5 .
Figure 5.The Bayesian inference (BI) and maximum likelihood (ML) tree for Rubus based on the complete plastome.The number on the branch is BI posterior probability (PP)/ML bootstrap (BS); branches without numbers indicate nodes with 1.00/100 support values.

Figure 6 .
Figure 6.The Bayesian inference (BI) and maximum likelihood (ML) tree for Rubus based on nine highly variable cpDNA makers.The number on the branch is BI posterior probability (PP)/ML bootstrap (BS); branches without numbers indicate nodes with 1.00/100 support values.

Figure 6 .
Figure 6.The Bayesian inference (BI) and maximum likelihood (ML) tree for Rubus based on nine highly variable cpDNA makers.The number on the branch is BI posterior probability (PP)/ML bootstrap (BS); branches without numbers indicate nodes with 1.00/100 support values.

Table 1 .
Characteristics of chloroplast genomes of 24 Rubus species and 5 other species in Rosaceae.

Table 2 .
Summaries of variable information for 24 Rubus chloroplast genomes.Comparative analysis of chloroplast genome sequences in 24 Rubus species.R. leucanthus was chosen as the reference genome; the gray arrow indicates the direction of the gene.UTR, untranslated region; CNS, non-coding sequence.The y-axis represents from 50% to 100% consistency.

Table 2 .
Summaries of variable information for 24 Rubus chloroplast genomes.