Comparative Chloroplast Genome Analysis of Rhubarb Botanical Origins and the Development of Specific Identification Markers

Rhubarb is an important ingredient in traditional Chinese medicine, where it is known as Rhei radix et rhizome. However, this common name refers to three different botanical species with different pharmacological effects. To facilitate the genetic identification of these three species and their more precise application in Chinese medicine, we aimed to provide chloroplast sequences that contain specific identification sites and are easy to amplify. We therefore sequenced the complete chloroplast genomes of all three species and screened them for sequences suitable for distinguishing the species. The three chloroplast genomes ranged in length from 161,053 bp to 161,541 bp, with a total of 131 encoded genes, including 31 tRNA, eight rRNA, and 92 protein-coding genes. Simple sequence repeat analysis revealed differences among the species; phylogenetic analyses showed that the chloroplast genome can be used as an ultra-barcode to distinguish the three botanical species of rhubarb; variation in the non-coding regions was higher than in the protein-coding regions; and variation in the single-copy regions was higher than in the inverted repeats. Based on the highly variable regions, twenty-one specific primer pairs were designed, and eight specific identification sites were experimentally confirmed that can be used as special DNA barcodes for identifying the three species. This study provides a molecular basis for precise medicinal plant selection and lays the groundwork for further comparison and correct identification of closely related Rheum species, which are important medicinal plants.


Introduction
Rheum (Polygonaceae), a genus containing eight sections and ~60 herbaceous species, is widely distributed in Asia, especially in the temperate and subtropical high mountainous regions [1]. Rhubarb (Rhei radix et rhizome), an important multi-origin traditional Chinese medicine, was first recorded in the Shennong Herbal Classic as Jun Yao due to its efficacy as an analgesic and anti-inflammatory, effective at clearing heat, removing toxicity, and improving blood stasis [2]. Modern pharmacology research suggests that rhubarb also exhibits anticancer, antiviral, hypotensive, and immune system regulatory effects [3]. At present, R. palmatum, R. tanguticum, and R. officinale are considered the legal species to be used to produce Rhei radix et rhizome, as recorded in the Chinese Pharmacopoeia 2015 edition.
In past reports, researchers have mainly focused on the extraction of bioactive components, chemistry, or pharmacological properties. However, because rhubarb is a typical multi-origin medicine, the composition of each pharmacological product differs among the three Rheum species, and its effect is thus variable [4]. The source species plays a decisive role in the chemical composition of rhubarb [5]. The difference between the three rhubarb plants mainly lies in the degree of leaf division: R. officinale leaves are lobed and broadly triangular, R. palmatum leaves are lobed and triangular, and R. tanguticum leaves are parted and lanceolate [6]. The methods that have long served to identify rhubarb medicinal materials are mainly trait identification, microscopic identification, and physical and chemical identification, but these methods depend on experience, are strongly subjective, and can hardly distinguish between processed products and powders [7]. It is therefore particularly important to identify the species of rhubarb accurately. Some scholars have used the trnL-trnF sequences of 13 species of the genus Rheum to analyze and design specific primer pairs for the identification of different botanical species of rhubarb [8]. psbA-trnH has also been used to distinguish rhubarb from 19 other related Polygonaceae species [9]. The gene sequences of matK showed potential for distinguishing different Rheum sections [10]. According to the identification efficiency analysis, the identification success rate of trnH-trnF for Rheum is 84%, and the success rate for matK is 83.7%, indicating that the identification efficiency of the chloroplast genes is significantly higher than that of nuclear sequences, such as ITS2, for Rheum. Nevertheless, the current methods can identify closely related Rheum species only to a certain degree.
Chloroplasts are ubiquitous in plant cells and play important roles in photosynthesis and energy conversion. The chloroplast genome is independent of the nuclear genome and is predominantly maternally inherited. The chloroplast genome of most angiosperms consists of four parts: a pair of inverted repeats (IRA and IRB), a large single-copy region (LSC), and a small single-copy region (SSC) [11]. Depending on the expansion and contraction of the IR region, the chloroplast genome size is approximately 120 to 160 kb [12]. Numerous scholars suggest that the whole chloroplast genome sequence is an ideal genomic barcode because the genome size is moderate, the intraspecific sequences are relatively conserved, the interspecific variation is large, and the substitution rate is lower than that of nuclear genes but higher than that of mitochondrial genes [13].
In the present study, the chloroplast DNA of three different botanical species of rhubarb, the famous Chinese medicinal herb, was used. After assembly and annotation, the characteristics of the chloroplast genomes were analyzed, and a few specific short sequences were identified that are easy to amplify and contain identification sites distinguishing the study species. The current study lays the foundation for super barcode utilization in rhubarb, provides a molecular basis for precise medicinal plant selection, and lays the groundwork for further comparison and correct identification of closely related Rheum species, which are important medicinal plants.

Chloroplast Genome Features
The chloroplast genomes ranged from 161,053 bp (R. tanguticum) to 161,541 bp (R. palmatum) in length, and the chloroplast genomes of the three species shared the same GC content, 37.3%, which is similar to that reported for other angiosperm chloroplast genomes [14]. The GC content of the IR region is higher than that of the LSC and SSC regions. These genome sequences have been submitted to GenBank with accession numbers MH572012 for R. officinale and MH572013 for R. tanguticum. The sequence of R. palmatum is similar to the published sequence with accession number KR816224 [15] (Table 1).
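The GC comparison between regions can be illustrated with a short calculation. The following is only a minimal sketch: the function names `gc_content` and `gc_by_region` are hypothetical, the region coordinates in the usage line are toy values rather than the boundaries of the Rheum genomes, and this is not the procedure used in the study.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Ambiguous bases such as N are ignored; an empty sequence yields 0.0.
    return 100.0 * gc / total if total else 0.0


def gc_by_region(genome: str, regions: dict) -> dict:
    """Compute GC content per region.

    `regions` maps a region name to (start, end) in 1-based inclusive
    coordinates. The coordinates are supplied by the caller and are an
    assumption of this sketch.
    """
    return {
        name: gc_content(genome[start - 1:end])
        for name, (start, end) in regions.items()
    }


# Toy usage with a made-up 8 bp "genome" and hypothetical boundaries.
toy = gc_by_region("AAATGGCC", {"LSC": (1, 4), "IR": (5, 8)})
```

With real data, the region boundaries would come from the annotated LSC, SSC, and IR junctions.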
The chloroplast genome of Rheum was found to encode 131 predicted functional genes, including 92 protein-coding genes, 31 tRNA genes, and eight rRNA genes (Table 2). Among them, seven genes (trnL-TAG, trnI-GAT, rpl2, atpF, rpoC1, ndhA, and ndhB) contain one intron; two genes (ycf3 and clpP) contain two introns; 18 genes have two copies; and one gene has three copies. Variation was observed among the different species; for example, trnL-TAG contains one intron in R. officinale, but none in R. palmatum and R. tanguticum.
LSC: large single-copy region; IR: inverted repeats; SSC: small single-copy region.
Table 2. A list of genes found in the chloroplast genomes of the three Rheum species, including copy number and introns.

Characterization of Simple Sequence Repeats
The distribution of repeated sequences and the presence of SSRs in the chloroplast genomes of the three species were analyzed. Among the three species, R. palmatum (314) and R. officinale (312) contained more SSRs than R. tanguticum (301). In total, there were 512, 183, 203, 25, and four mono-, di-, tri-, tetra-, and pentanucleotide SSRs, respectively. No hexanucleotide repeats were found in the three species. Among all SSRs, mononucleotide repeats were the most common, accounting for 55.23% of the SSR population, and 507 of them were A/T repeats, accounting for 99.0% of mononucleotides. The number of mononucleotides (174) is the same in R. officinale and R. palmatum; R. palmatum and R. tanguticum contain the same number of trinucleotides (68) and pentanucleotides (1); there are 8 tetranucleotides each in R. officinale and R. tanguticum; and the numbers of dinucleotides in R. officinale, R. palmatum, and R. tanguticum were 61, 62, and 60, respectively (Figure 2a). The SSRs of the three Rheum plants are similar in number in the four regions (Figure 2b), but the percentage of SSRs differs among the four regions (Figure 2c). These findings show that SSRs are not evenly distributed in the chloroplast genomes. Most SSRs were <20 bp in length (Figure 2d). The distribution of p-type SSRs with a length greater than 10 bp was analyzed using R. officinale as the representative (Table 3). The repeated sequences were mostly distributed in the non-coding sequences (CNS), that is, intergenic spacers and intron regions, but some were also found in coding regions (CDS), such as rpoC2, petA, ycf2, ndhF, ndhG, matK, atpA, ndhD, and ycf1. There is a 10 bp SSR between petL and the CNS. The SSR distributions of the other two species were similar to that of R. officinale and are displayed in Supplementary Tables S1 and S2.
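The SSR counts quoted above are internally consistent, which can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the four per-species values that the text does not state are filled in by subtraction from the type totals and are therefore inferred values, not reported ones. All variable names are illustrative.

```python
# Totals per repeat type across the three species, as reported in the text.
totals = {"mono": 512, "di": 183, "tri": 203, "tetra": 25, "penta": 4}

# Per-species counts stated explicitly in the text (None = not stated).
stated = {
    "R. officinale": {"mono": 174, "di": 61, "tri": None, "tetra": 8, "penta": None},
    "R. palmatum": {"mono": 174, "di": 62, "tri": 68, "tetra": None, "penta": 1},
    "R. tanguticum": {"mono": None, "di": 60, "tri": 68, "tetra": 8, "penta": 1},
}
species_totals = {"R. officinale": 312, "R. palmatum": 314, "R. tanguticum": 301}

# Fill each single missing cell by subtraction (INFERRED, not reported).
counts = {sp: dict(row) for sp, row in stated.items()}
for kind, total in totals.items():
    missing = [sp for sp in counts if counts[sp][kind] is None]
    known = sum(row[kind] for row in counts.values() if row[kind] is not None)
    if len(missing) == 1:
        counts[missing[0]][kind] = total - known

# The inferred table must reproduce the reported per-species totals.
for sp, row in counts.items():
    assert sum(row.values()) == species_totals[sp]

grand_total = sum(totals.values())                 # 927 SSRs overall
mono_share = 100 * totals["mono"] / grand_total    # share of mononucleotides
at_share = 100 * 507 / totals["mono"]              # A/T share of mononucleotides

print(f"{grand_total} SSRs, {mono_share:.2f}% mono, {at_share:.1f}% A/T")
```

The row sums match the reported totals of 312, 314, and 301, and the two percentages agree with the 55.23% and 99.0% given in the text.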
Molecules 2018, 23, x FOR PEER REVIEW

Phylogenetic Analysis
A phylogenetic tree summarizes the results of a systematic study and can be used to describe the evolutionary relationships between species [16]. As can be seen from the neighbor joining tree, the monocotyledons and the dicotyledons each clustered together, with a support rate of 100%. Neighbor joining strongly supported the position of Fagopyrum esculentum and F. tataricum as sister to the closely related species in the Polygonaceae.
The three species of rhubarb and Rheum wittrockii clustered together in a polytomy, indicating that, despite their close relationship, the three species of rhubarb can be separated from each other via these regions (Figure 3). The chloroplast genome sequence thus provides a new method for the identification of Rheum.

Comparative Genomic and Candidate Identification Sequence Analysis
The mVISTA [17] software was used to compare the chloroplast genomes of the three studied Rheum species, with R. palmatum as the reference sequence. Overall, the IR regions were more conserved than the LSC and SSC regions, and coding regions were more conserved than non-coding ones. The genetic differences among the three species are mainly concentrated in the introns and intergenic spacers and mostly take the form of base substitutions. For example, ycf3 and clpP both have two introns, while two sites are substituted in the ycf3 exon and six sites are replaced in clpP. Differences were found in intergenic spacers such as psaA-ycf3, trnD-trnT, psbD-trnT, rpl16-rps3, and ccsA-ndhD. The genes accD and rpoC2, which have no introns, carry two and eight base substitutions, respectively (Figure 4).
To distinguish the three species, specific primer pairs were designed for the variable regions identified in the interspecific chloroplast genome comparison, and the target fragments were amplified in nearly one hundred samples (30 samples of each species). Primer pair 1, which amplifies the trnD-trnT intergenic spacer, can be used to distinguish R. tanguticum from the other two species because of a six-base deletion at positions 331 to 336 in the sequence alignment (Figure 5a). The psaA-ycf3 intergenic spacer amplified by primer pair 7 can distinguish R. tanguticum through a base insertion (sites 45 to 50) and a base substitution (C to A at 164) (Figure 5b). Primer pair 9 and primer pair 10 were used to amplify the genes rpoA and rpl16, respectively. SNPs were found at site 321 (G to A) in rpoA, as well as at sites 164 (T to C) and 198 (A to T) in rpl16, which can also be used to distinguish R. tanguticum (Figure 5c,d). Primer pair 15, which amplifies the trnN-ycf1 intergenic spacer, was also appropriate for the identification of R. tanguticum because of a base deletion at sites 304 to 329 (Figure 5e). The region amplified by primer pair 17 has base substitutions at sites 111 to 113 (CCT/TAA), so it can be used to identify R. palmatum (Figure 5f). The trnN-ycf1 intergenic spacer amplified by primer pair 21 was very specific, with a C to T substitution at site 35, which can be used to identify R. palmatum, and a 26-base deletion at sites 290 to 315, which can be used to identify R. tanguticum (Figure 5g). The psbD-trnT intergenic spacer amplified by primer pair 6 was used to identify R. officinale, with base substitutions at sites 271 (G to T) and 276 (G to T) (Figure 5h). The three species can therefore be distinguished from one another based on the substitution and deletion of bases in the single target fragment amplified by primer pair 21.
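The way the primer pair 21 fragment separates all three species can be expressed as a small decision rule. The sketch below is illustrative only: the function name `classify_pair21` is hypothetical, the input is assumed to be an amplicon already aligned to the same coordinates as Figure 5g (1-based columns, gaps written as `-`), and the text does not state explicitly which base R. palmatum carries at site 35, so the code assumes T for R. palmatum and C for the other two species.

```python
# Diagnostic sites of the primer pair 21 amplicon (trnN-ycf1 spacer).
# Coordinates are 1-based alignment columns, as quoted in the text.
SNP_SITE = 35            # C-to-T substitution site
DELETION = (290, 315)    # 26-base deletion, inclusive


def classify_pair21(aligned_seq: str) -> str:
    """Assign a species label to one aligned amplicon (gaps written as '-')."""
    if len(aligned_seq) < DELETION[1]:
        raise ValueError("aligned sequence is shorter than the diagnostic window")
    seq = aligned_seq.upper()
    window = seq[DELETION[0] - 1:DELETION[1]]
    if set(window) == {"-"}:
        # The whole 26-column window is gapped: deletion present.
        return "R. tanguticum"
    base = seq[SNP_SITE - 1]
    if base == "T":
        # ASSUMPTION: T is the R. palmatum state at site 35.
        return "R. palmatum"
    if base == "C":
        # ASSUMPTION: no deletion and C at site 35 leaves R. officinale.
        return "R. officinale"
    return "undetermined"


# Synthetic sequences built only to exercise the rule (not real data).
reference = "A" * 34 + "C" + "A" * 300
with_snp = reference[:34] + "T" + reference[35:]
with_deletion = reference[:289] + "-" * 26 + reference[315:]
labels = [classify_pair21(s) for s in (reference, with_snp, with_deletion)]
```

A rule of this kind returns "undetermined" rather than guessing when site 35 holds an unexpected base, which seems the safer choice for degraded or mixed samples.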

Discussion
At present, DNA barcoding technology relies on chloroplast loci [18], such as matK, rbcL, trnH-psbA, rpoB, rpoC1, atpF-atpH, psbK-psbI, ycf5, and trnL, which have been discussed in detail [19,20]. These traditional single chloroplast loci typically lack sufficient variation; phylogenetic analyses of these regions are meaningful at higher taxonomic levels, but single chloroplast loci are generally not suitable at lower taxonomic levels. Because of the inherent deficiencies of single-locus DNA barcoding, a new method is needed to identify closely related species [21]. It has recently been suggested that the complete chloroplast genome contains as many sites of variation as mitochondrial regions in animals and may be used as a plant DNA barcode [22]. The chloroplast genome is now considered a species-level DNA barcode because it can greatly improve plant phylogenetic and population genetic analyses, facilitating the recovery of lineages as monophyletic at lower taxonomic levels [23]. Using the chloroplast genome as a plant DNA barcode can prevent identification failures caused by gene deletion and low PCR amplification success rates, and it can also solve the problem of sequence retrieval encountered in traditional barcode research [24,25]. Sequencing costs and the difficulty of obtaining high-quality DNA were once the main obstacles to enriching the chloroplast ultra-barcode database [26], but these obstacles have been overcome by next-generation sequencing (NGS) combined with other technologies [27]. Thus, neither extraction methods nor sequencing capacity can be considered limiting factors for obtaining chloroplast genome data [28]. Whole chloroplast genomes have already been used as super molecular markers for species identification and phylogenetic analysis of closely related plant species [18,29,30]; medicinal plants, such as Gynostemma pentaphyllum, have also been analyzed using chloroplast genomes [31]. In the present study, three closely related Rheum species that cannot be accurately identified by traditional barcodes (single-locus and multi-locus barcodes) were analyzed. The size, number of annotated genes, and number of simple sequence repeats of their chloroplast genomes differed, and the corresponding phylogenetic analysis showed that the genomes can be effectively used to distinguish among these closely related species (Figure 3). The results further suggest the potential use of the chloroplast genome as a super barcode for the identification of closely related species.
The ultimate goal of DNA barcoding is to distinguish species rather than to find a universal marker (Li et al., 2015). Sequencing the chloroplast genome of every sample is very expensive, yet a single gene locus is not suitable at the species level because of its modest discriminatory power, so we sought mutational hotspots for which primer pairs could be designed to identify the three different botanical origins of rhubarb. The trnD-trnT intergenic spacer, psbD-trnT intergenic spacer, psaA-ycf3 intergenic spacer, rpl16-rps3 intergenic spacer, trnN-ycf1 intergenic spacer, ndhF-rpl32 intergenic spacer, trnN (GUU)-ycf1 intergenic spacer, rpoA, and rpl16 were found to contain stable mutation sites that can be used for the identification of rhubarb botanical origins. In general, what is sought is a DNA fragment that has a sufficiently high mutation rate and is easily amplified, so that it can identify a species within a given taxonomic group. The chloroplast genome offers an effective approach to differentiating closely related plants, including most of the multi-origin herbal medicines.
In the Chinese Pharmacopoeia 2015 edition, about one-quarter of the Chinese traditional medicines have multiple origins, meaning they can be derived from different species. Thus, the accurate identification of the different botanical origins of these multi-origin Chinese traditional medicines has become a focus of attention. Correct identification ensures the safety of clinical medications and the control of drug quality. The present study lays the foundation for super barcode utilization in rhubarb, provides a molecular basis for precision medication, and lays the groundwork for further investigation of these important medicinal species. This research also provides a reference for identifying the botanical origin of multi-origin medicinals.

Plant Materials and DNA Extraction
Fresh leaves of the three different botanical species (R. officinale, R. palmatum, R. tanguticum) for chloroplast genome sequencing were collected from Qinghai, Sichuan, and Gansu Provinces, China (Table 1). The silica gel dried samples used for screening specific markers were collected from Gansu, Guizhou, Hebei, Henan, Hubei, Jilin, Qinghai, and Sichuan Provinces, and thirty samples were collected for each of the species. The voucher specimens were deposited at the Hubei Institute for Drug Control and identified by Professor Ling Xiao. Total genomic DNA was extracted from leaves with a modified cetyltrimethylammonium bromide (CTAB) method [32,33]. The concentration of DNA was estimated by measuring A260 and A280 using an ND-2000 spectrometer (Nanodrop Technologies, Wilmington, DE, USA); samples were also visually examined by 1% agarose 1× TAE gel electrophoresis.

Sequencing, Assembly, and Annotation
The DNA integrity and quantity were analyzed by 1% agarose gel electrophoresis, a NanoDrop 2000C Spectrophotometer (Thermo Scientific, Waltham, MA, USA), a Qubit® 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA), and an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). The DNA was then randomly fragmented into ~300 bp long fragments using an ultrasonicator (Bioruptor Pico, Denville, NJ, USA). After the sequencing libraries were constructed according to the manufacturer's protocols (NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, Beijing, China), sequencing was carried out on an Illumina HiSeq2000 high-throughput sequencer. The raw reads obtained were filtered using the NGS QC Toolkit [34] to omit reads with more than 30% low-quality bases (Q < 30) and reads with more than 5% non-ATCG (N) bases. The low-quality regions of the reads were trimmed using trimmingReads.pl. Clean data were stored for subsequent analysis. All the clean reads were pooled for chloroplast genome assembly, and Geneious 9.1.4 (Biomatters Ltd., Auckland, New Zealand) [35] with BLAST 2.0.3+ (National Institutes of Health, Bethesda, MD, USA) [36] was employed to assemble the genomes. The four junctions between the inverted repeat (IR) and large single-copy/small single-copy (LSC/SSC) regions were confirmed by PCR amplification and Sanger sequencing.
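The two read-filtering thresholds described above can be written as a simple predicate. This is a plain-Python sketch of the stated criteria, not the NGS QC Toolkit itself: the function name `passes_filter` and its parameters are hypothetical, and Phred+33 quality encoding is assumed.

```python
def passes_filter(seq: str, qual: str,
                  min_q: int = 30,
                  max_lowq_frac: float = 0.30,
                  max_n_frac: float = 0.05,
                  phred_offset: int = 33) -> bool:
    """Return True if a read survives both thresholds stated in the text.

    A read is dropped when more than 30% of its bases have quality below
    Q30, or when more than 5% of its bases are not A, T, C, or G.
    ASSUMPTION: quality strings use Phred+33 encoding.
    """
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    n = len(seq)
    low_quality = sum(1 for ch in qual if ord(ch) - phred_offset < min_q)
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return low_quality / n <= max_lowq_frac and ambiguous / n <= max_n_frac


# Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
good_read = passes_filter("ACGT" * 25, "I" * 100)
noisy_read = passes_filter("ACGT" * 25, "I" * 69 + "#" * 31)
```

Reads sitting exactly on a threshold (30% low-quality bases or 5% N) are kept, following the "more than" wording of the criteria.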
DOGMA (available online: http://dogma.ccbb.utexas.edu/) [37] and CPGAVAS [38] were used to annotate the chloroplast genomes of the three study species for comparison, and the annotations were further confirmed using BLAST and DOGMA [37]. The tRNA genes were identified by tRNAscan-SE [39]. Circular genome maps of the three different botanical origins of rhubarb were drawn with the OrganellarGenomeDRAW tool [40]. The characteristics of the chloroplast genomic sequences were determined using MEGA7 [41].

Figure 1. Gene map of the Rheum chloroplast genome. The genes lying inside and outside the outer circle are transcribed in a clockwise and counterclockwise direction, respectively (as indicated by arrows). Colors denote the genes belonging to different functional groups. The hatch marks on the inner circle indicate the extent of the inverted repeats (IRa and IRb) that separate the small single copy (SSC) region from the large single copy (LSC) region. The dark gray and light gray shading within the inner circle correspond to the percentage of G + C and A + T content, respectively.

Figure 2. Comparison of SSR types and quantities in the three studied Rheum species. (a) Number of SSR types; (b) SSRs of the three species in four regions; (c) Percentages of SSRs in four regions; (d) Frequency of SSRs by length. SSR: simple sequence repeats; LSC: large single-copy region; SSC: small single-copy region; IRA and IRB: inverted repeats.

Figure 3. Phylogenetic tree constructed using neighbor joining (NJ), based on the whole chloroplast genomes from different species. Amborella trichopoda was set as the outgroup.

Figure 4. Comparison of three chloroplast genomes using R. palmatum as the reference. The vertical scale indicates the percentage of identity, ranging from 50% to 100%; the horizontal axis indicates the coordinates within the chloroplast genome. Annotated genes are displayed along the top. Genome regions are color-coded as either protein-coding exons, rRNA, tRNA, or conserved non-coding sequences (CNS). UTR: Untranslated Region.

Figure 5. Base information of the identification sites of sequences obtained by chosen primer pairs for the three study species. (a) Primer pair 1; (b) Primer pair 7; (c) Primer pair 9; (d) Primer pair 10; (e) Primer pair 15; (f) Primer pair 17; (g) Primer pair 21; (h) Primer pair 6. For more detailed information on primer pairs see Table S3.

Table 1. The basic characteristics of chloroplast genomes of the three Rheum species.