Sequencing and Comparative Analysis of the Chloroplast Genome of Angelica polymorpha and the Development of a Novel Indel Marker for Species Identification

The genus Angelica (Apiaceae) comprises valuable herbal medicines. In this study, we determined the complete chloroplast (CP) genome sequence of A. polymorpha and compared it with that of Ligusticum officinale (GenBank accession no. NC039760). The CP genomes of A. polymorpha and L. officinale were 148,430 and 147,127 bp in length, respectively, with 37.6% GC content. Both CP genomes harbored 113 unique functional genes, including 79 protein-coding, four rRNA, and 30 tRNA genes. Comparative analysis of the two CP genomes revealed conserved genome structure, gene content, and gene order. However, highly variable regions, sufficient to distinguish between A. polymorpha and L. officinale, were identified in the hypothetical chloroplast open reading frame 1 (ycf1) and ycf2 genic regions. Nucleotide diversity (Pi) analysis indicated that the ycf4–chloroplast envelope membrane protein (cemA) intergenic region was highly variable between the two species. Phylogenetic analysis revealed that A. polymorpha and L. officinale clustered within the family Apiaceae. The ycf4-cemA intergenic region in A. polymorpha carried a 418 bp deletion compared with L. officinale. This region was used for the development of a novel indel marker, LYCE, which successfully discriminated between A. polymorpha and L. officinale accessions. Our results provide important taxonomic and phylogenetic information on herbal medicines and facilitate their authentication using the indel marker.


Introduction
In plants, the chloroplast (CP) plays an important role in photosynthesis and carbon fixation as well as starch, fatty acid, and amino acid biosynthesis [1]. In higher plants, the CP genome ranges in size from 120 to 180 kb and has a quadripartite structure, including a large single copy (LSC) region, a small single copy (SSC) region, and two copies of an inverted repeat (IR) region (IRa and IRb) [2]. Angiosperm CP genomes usually contain 110-130 genes, with up to 80 protein-coding genes, approximately 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes [3]. Despite the highly conserved genome structure, gene content, and gene order among plant species, CP genomes exhibit genomic rearrangement, IR region loss, and gene loss in some angiosperms such as parasitic plants [4][5][6][7][8][9]. Advances in next generation sequencing technologies have reduced the cost and complexity of CP genome assembly compared with Sanger sequencing [10]. The CP genome has been widely used for phylogenetic analysis and molecular marker development in plant species. These molecular markers are highly useful DNA barcoding tools for the authentication and identification of plant taxa including herbal medicines. For example, matK and rbcL genes in CP genomes are used as universal plant DNA barcodes [11]. The genus Angelica comprises valuable herbal medicines [12]. Although the CP genomes of a few Angelica species have been reported [13][14][15][16][17] and are available from GenBank, limited genomic information is available for the identification of Angelica species. Additional genomic information is needed to understand the utility of herbal medicines in the genus Angelica.
Insertions/deletions (indels) in CP genomes occur because of genomic rearrangements resulting from slipped strand mispairing, stem-loop secondary structure, and intramolecular recombination [18][19][20]. Indels represent intraspecific polymorphisms in plant populations [21] and are used for species identification. An indel in the trnL-F region has been widely used as a universal DNA barcode for species classification [22]. Phylogenetic relationships among 41 Poa species have been determined using indels in trnL-F and trnL introns, clustering these species into four major groups [23]. Indels in trnL-F, trnG-trnS, and trnL introns have been used for the analysis of the CP genomes of Silene latifolia and S. vulgaris; in that study, indels evolved at slightly higher rates than single nucleotide polymorphisms (SNPs) in the genus Silene [24]. In the genus Aconitum, four species have been distinguished using indels in CP genomes [25,26]. Furthermore, CP genome indels have been used to identify interspecific variation in the genera Fagopyrum (F. tataricum vs. F. esculentum) and Ipomoea (I. nil vs. I. purpurea) [27,28]. Thus, indels in CP genomes are useful for phylogenetic and evolutionary analyses of plant species as well as for species identification.
The genus Angelica (family Apiaceae) is a taxonomically complex and controversial group comprising approximately 110 species with diverse morphology [29]. Ligusticum officinale is widely distributed in East Asia, and dried rhizomes of this species are used as an important herbal medicine [12]. Unfortunately, A. polymorpha is frequently misused as L. officinale in inauthentic preparations of herbal medicines in Korean herbal markets because sliced preparations of the two species are highly similar to the naked eye. To ensure a consistent pharmacological effect of herbal medicines, accurate identification of L. officinale and A. polymorpha is essential. Therefore, an objective method of analysis, such as a molecular marker, is needed for the identification of herbal medicines.
In this study, we characterized the CP genome of A. polymorpha and compared it with that of L. officinale to identify highly variable regions and understand the phylogenetic relationship between the two species. Additionally, we aimed to develop an efficient molecular marker to distinguish between the CP genomes of these species.

CP Genome Organization of A. polymorpha
The CP genome of A. polymorpha was sequenced using the Illumina MiSeq platform. Sequencing at approximately 75× coverage generated 1.25 Gb of paired-end reads (Table S1). The complete circular CP genome of A. polymorpha, completed after gap filling and manual editing, was 147,121 bp in length. Paired-end read mapping was conducted to validate the draft genome (Figure S1). The CP genome of A. polymorpha showed a quadripartite structure, as in most land plants, consisting of a pair of IR regions (17,870 bp each) separated by LSC (93,591 bp) and SSC (17,796 bp) regions (Figure 1, Table 1). The CP genome of A. polymorpha was AT-rich (62.5%), and the AT content of the LSC (64.1%) and SSC (69%) regions was higher than that of the IR regions (54.9%); these data are consistent with those of other angiosperm CP genomes [2,30]. Sequences of the junctions between the LSC, SSC, and IR regions were validated using PCR-based sequencing (Table S2). The CP genome of A. polymorpha harbored 113 predicted genes, of which 97 were present as single copies in the LSC and SSC regions, while 17 were duplicated in the IR regions (Table S3). The 113 unique genes included 79 protein-coding genes, 30 tRNA genes, and four rRNA genes. Additionally, the CP genome of A. polymorpha harbored 17 intron-containing genes. Among these, 14 genes (nine protein-coding and five tRNA genes) contained a single intron, while two genes (ycf3 and clpP) contained two introns (Table S4). Of the 17 intron-containing genes, 12 genes (nine protein-coding and three tRNA genes) were located in the LSC region, one protein-coding gene in the SSC region, and four genes (two protein-coding and two tRNA genes) in the IR regions. Of the 79 protein-coding genes, six genes (ndhB, rpl2, rpl23, rps7, rps12, and ycf15) were duplicated in the IR regions. The start codons of ndhD and rps19 were ACG and GTG, respectively, which were used as alternatives to ATG. The use of ACG and GTG as start codons is a common phenomenon in various genes in CP genomes of land plants [31][32][33]. The protein-coding genes comprised 21,587 bp in the CP genome of A. polymorpha (Table S5), and codons of leucine and isoleucine were highly abundant (Figure S2A). Relative synonymous codon usage (RSCU) values revealed synonymous codon usage bias, with a high proportion of synonymous codons harboring an A or T(U) nucleotide in the third position (Figure S2B). Overall, the genome structure, gene number, and codon usage in the CP genome of A. polymorpha were consistent with those in CP genomes of other Angelica species [13,15,17].
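The RSCU values mentioned above follow a simple definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function name `rscu` and the toy input are hypothetical, and the standard codon assignments are assumed (start and stop codon handling is simplified).

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order (assumed for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)  # equal usage of synonymous codons
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy example: four alanine codons (GCT, GCT, GCA, GCC).
example = rscu(["GCTGCTGCAGCC"])
print(example["GCT"], example["GCG"])
```

An RSCU value above 1 indicates that a codon is used more often than expected under equal usage, which is how a bias toward A/T-ending codons would appear in such a table.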
Molecules 2019, 22, x FOR PEER REVIEW

Analysis of Repeated Sequences in the CP Genomes of A. polymorpha and L. officinale
Repeated sequences were abundant in the CP genomes of both species. Such repeats contribute to structural variation through genomic rearrangement, gene expansion, and pseudogene formation [8,35]. Simple sequence repeats (SSRs), also known as microsatellites, consist of tandemly repeated motifs of 1-6 nucleotides [36]. We analyzed SSRs in the CP genomes of the two species (Figure S3). The CP genomes of A. polymorpha and L. officinale harbored a similar number of SSRs (209 and 203, respectively). Most of these SSRs were located in single copy regions (LSC and SSC), as expected. The number of SSRs was similar between the SSC and IR regions. SSRs were more abundant in the intergenic spacer (IGS) region, especially the non-coding region, than in genic regions, and mononucleotide motifs were the most abundant type of repeats, followed by dinucleotide motifs, in both CP genomes (Figure S3C). We also identified tandem repeats (>20 bp) in the two CP genomes (Figure S4). Most of the tandem repeats (20-59 bp) were located in IGS and LSC regions. The longest tandem repeat (100 bp) was present in the CP genome of L. officinale. Palindromic repeats were located in the LSC region in both CP genomes (Table S6). Overall, the CP genomes of A. polymorpha and L. officinale showed a similar number and type of repeats, and no polymorphism was detected between the two genomes.
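To illustrate how SSRs with 1-6 nucleotide motifs can be located in a sequence, the sketch below scans for perfect tandem repeats using regular expressions. It is not the tool used in this study: the function name `find_ssrs` is hypothetical, and the minimum repeat numbers in `MIN_REPEATS` are assumed thresholds chosen only for illustration, since the text does not state the criteria that were applied.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (mononucleotide) or "ATAT" (dinucleotide).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence with one mononucleotide and one dinucleotide repeat.
toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GG"
print(find_ssrs(toy))
```

Counting hits per motif length and per region (LSC, SSC, IR, genic, IGS) would then give the kind of summary shown in Figure S3.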

Comparative Analysis of the CP Genomes of A. polymorpha and L. officinale
The IR regions represent the most highly conserved sequences in the CP genome [37]. The contraction and expansion of sequences at the borders of IR regions is a common evolutionary event, which is mainly responsible for variation in CP genome size and genomic rearrangement [38]. In this study, we analyzed the border structure of LSC, SSC, and IR regions in the two CP genomes (Figure 2). The ycf2 gene was located at the LSC/IRa junction. The ycf1 pseudogene and ycf1 gene, which were located at the IRa/SSC and SSC/IRb junctions, extended into the SSC region. The location of most other genes was similar to their location in other CP genomes [28,39].
Gene content, order, and orientation were similar between the CP genomes of A. polymorpha and L. officinale. To compare CP genomes of the two species, we performed multiple sequence alignment of the whole CP genome sequences using mVISTA (Figure 3). The non-coding region was more variable than the coding region, and the IGS region was the most variable in both CP genomes. Five highly variable regions were identified in this study, including three IGS regions (trnT-psbD, ycf4-cemA, and ycf2-trnL) and two genic regions (ycf1 and ycf2). To determine sequence divergence between the CP genomes of A. polymorpha and L. officinale, we calculated the nucleotide diversity (Pi) of the CP genome sequences (Figure 4). The IR regions were more highly conserved than the LSC and SSC regions, with average Pi values of 0.002 in IR regions and 0.009 in single copy regions (with some IR regions showing a Pi value of 0). In the LSC region, ycf4-cemA exhibited the highest Pi value (0.189). Although the CP genomes of both species were mostly highly conserved, the IGS regions showed divergence. High divergence in the IGS regions, including trnT-psbD, ycf4-cemA, and ycf2-trnL, because of the presence of indels, SNPs, and structural variation has been previously reported in CP genomes of other plant species [40][41][42][43]. In this study, the ycf4-cemA region was used for the development of a molecular marker to distinguish between A. polymorpha and L. officinale.
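For two aligned sequences, Pi reduces to the proportion of compared sites at which the sequences differ, and a sliding window over the alignment highlights regions such as ycf4-cemA. The sketch below shows this calculation under stated assumptions: the function names are hypothetical, alignment columns containing a gap are skipped, and the default window (600 bp) and step (200 bp) are illustrative values, not parameters reported in this study.

```python
def pairwise_pi(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are excluded.
    """
    diffs = 0
    sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        sites += 1
        if x != y:
            diffs += 1
    return diffs / sites if sites else 0.0


def sliding_pi(a, b, window=600, step=200):
    """Return (window_start, Pi) pairs along a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    last_start = max(len(a) - window, 0)
    return [
        (start, pairwise_pi(a[start:start + window], b[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


# Toy alignment: the second window differs at two of four sites.
print(sliding_pi("AAAACCCC", "AAAACCGG", window=4, step=4))
```

Because gapped columns are excluded here, a large indel lowers the number of compared sites rather than raising Pi directly; tools differ in how they treat gaps, so values from this sketch need not match published figures.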

Phylogenetic Relationship between A. polymorpha and L. officinale
CP genomes are valuable genomic resources for the reconstruction of accurate high-resolution phylogenies [44,45]. To identify the phylogenetic positions of A. polymorpha and L. officinale within the Apiaceae family, 52 protein-coding sequences shared by 33 CP genomes were aligned over a total length of 38,279 bp (Figure 5). In the maximum likelihood (ML) and Bayesian inference (BI) trees, 22 of 30 nodes had ML bootstrap values of 100% and BI posterior probabilities of 1.0. Both the ML and BI analyses recovered Apiaceae and Araliaceae with ML bootstrap values of 100% and BI posterior probabilities of 1.0. L. tenuissimum and L. officinale clustered together. Moreover, these two Ligusticum species were closely related to Coriandrum sativum within Apiaceae. A. polymorpha was well-positioned within the genus Angelica. Foeniculum vulgare and Anethum graveolens formed a monophyletic group and a sister relationship with Petroselinum crispum within Apiaceae. The genus Angelica showed high ML bootstrap values and BI posterior probabilities, and species within this genus were well clustered according to the APG IV system [46]. However, Glehnia littoralis weakly clustered within the genus Angelica in this study. In previous studies, phylogenetic trees of Apiaceae were reconstructed using internal transcribed spacer (ITS) and CP loci [29,47,48], and our results were consistent with phylogenetic trees based on both ITS and CP loci. The genera Glehnia and Angelica show different morphological characteristics, and their phylogenetic relationship was not clear based on whole CP genome sequences. Therefore, to understand the phylogenetic relationship between Angelica and Glehnia species, in-depth investigation of other CP genomes and reinterpretation of morphological data are needed. Furthermore, taxonomic delimitation of the following four species at the genus level has changed depending on the viewpoint of taxonomists [49][50][51][52]: Ledebouriella seseloides (=Saposhnikovia divaricata (Turcz.) Schischk), L. tenuissimum (=Conioselinum tenuissimum (Nakai) Pimenov & Kljuykov), L. officinale (=Cnidium officinale Makino), and Peucedanum insolens [=Sillaphyton podagraria (H. Boissieu) Pimenov]. Among these species, L. tenuissimum and L. officinale clustered within a monophyletic group in this study. We suggest that the Ligusticum taxa should be considered for further investigation. Taken together, our results provide insights into the phylogenetic relationship among species within Apiaceae.
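The phylogenetic analysis relies on a concatenated alignment of protein-coding genes shared by all genomes. A minimal sketch of that concatenation step is given below, assuming each gene has already been aligned separately; the function name `build_supermatrix`, the input layout (gene name mapped to a taxon-to-sequence dictionary), and the toy data are hypothetical and do not reproduce the workflow used in this study.

```python
def build_supermatrix(alignments):
    """Concatenate per-gene alignments for genes present in every taxon.

    alignments: {gene: {taxon: aligned_sequence}}
    Returns (shared_genes, {taxon: concatenated_sequence}).
    """
    # All taxa seen in any gene alignment.
    all_taxa = sorted(set().union(*(aln.keys() for aln in alignments.values())))
    # Keep only genes for which every taxon has a sequence.
    shared = sorted(g for g, aln in alignments.items()
                    if set(aln) == set(all_taxa))
    for gene in shared:
        lengths = {len(s) for s in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned")
    matrix = {
        taxon: "".join(alignments[gene][taxon] for gene in shared)
        for taxon in all_taxa
    }
    return shared, matrix


# Toy input: ycf15 is missing from sp2, so it is excluded.
toy = {
    "rbcL": {"sp1": "ATG-", "sp2": "ATGC"},
    "matK": {"sp1": "AA", "sp2": "AT"},
    "ycf15": {"sp1": "GG"},
}
genes, matrix = build_supermatrix(toy)
print(genes, matrix)
```

The resulting matrix would typically be written to a FASTA or PHYLIP file and passed to ML and BI software; that step is omitted here.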

Development and Validation of an Indel Marker for Authentication of Cnidii Rhizoma
In this study, we identified divergent regions in the CP genomes of A. polymorpha and L. officinale to distinguish between these two species. The CP genome of A. polymorpha carries a 418 bp deletion in the ycf4-cemA region compared with L. officinale. To characterize this region, we aligned its sequences with those available in the non-redundant (NR) database of NCBI. Multiple sequence alignment revealed species-specific sequences but no copy number variation of tandem repeats. To develop indel markers, sequence-specific primers were designed in the conserved regions flanking ycf4 and cemA (Table 2). The LYCE primers successfully amplified sequences from both L. officinale and A. polymorpha (Figure 6). The indel marker was tested on 21 accessions collected from different sites in Korea using LYCE primers. These 21 samples were clearly separated into 12 L. officinale and nine A. polymorpha samples (Table S7). The CP DNA fragments amplified from the tested samples were sequenced to determine the exact amplicon size. The LYCE primer pair amplified a 540 bp amplicon from L. officinale samples and a 122 bp fragment from A. polymorpha samples. The predicted sizes of insertions or deletions in the CP genomes were consistent with fragment sizes amplified from L. officinale and A. polymorpha samples.
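The size difference between the two LYCE amplicons (540 bp and 122 bp) equals the 418 bp indel, so a sample can be assigned by fragment size alone. The sketch below illustrates that decision rule; the function name `classify_amplicon` and the ±10 bp tolerance are assumptions made for illustration and are not part of the reported protocol.

```python
# Amplicon sizes reported for the LYCE primer pair.
EXPECTED_AMPLICON_BP = {
    "L. officinale": 540,
    "A. polymorpha": 122,
}


def classify_amplicon(size_bp, tolerance_bp=10):
    """Return the species whose expected amplicon matches size_bp.

    tolerance_bp is an assumed allowance for sizing error; None is
    returned when no species (or more than one) matches.
    """
    matches = [
        species
        for species, expected in EXPECTED_AMPLICON_BP.items()
        if abs(size_bp - expected) <= tolerance_bp
    ]
    return matches[0] if len(matches) == 1 else None


# The amplicon size difference corresponds to the 418 bp indel.
indel_bp = (EXPECTED_AMPLICON_BP["L. officinale"]
            - EXPECTED_AMPLICON_BP["A. polymorpha"])
print(indel_bp, classify_amplicon(540), classify_amplicon(125))
```

Returning None for unmatched sizes keeps ambiguous or failed amplifications from being silently assigned to either species.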
Dried rhizomes of L. officinale are used as a traditional herbal medicine in Korea [12]. Although phylogenetic analysis indicated that A. polymorpha is distant from L. officinale, a molecular approach is needed for efficient differentiation between authentic herbal medicines and adulterants, which appear similar because of their similarly shaped rhizomes and sliced herbal products. Indels in CP genomes are useful for species identification and for distinguishing between authentic and inauthentic herbal medicines. Previous studies have reported indel markers of CP genomes [26,53,54]. Aconitum pseudolaeve, A. longecassidatum, and A. barbatum have been clearly distinguished on the basis of variation in CP genomes using indel markers [25]. Similarly, species identification of F. tataricum and F. esculentum has been performed using the same approach [27]. Thus, indel markers play an important role in species identification and herbal medicine authentication. The LYCE indel marker developed in this study will be useful for the identification of L. officinale and authentication of Cnidii Rhizoma.

Plant Materials
Fresh leaves of A. polymorpha (KIOM201501014664) were collected from natural populations in Korea and used for CP genome sequencing. All samples were assigned identification numbers and registered in the Korean Herbarium of Standard Herbal Resources (Index Herbariorum code KIOM) at the Korea Institute of Oriental Medicine (KIOM, Naju, Korea). Plant samples used for CP genome analysis and indel marker validation are listed in Table S7.

Sequencing and Assembly of the CP Genome of A. polymorpha
DNA was extracted from leaf samples using the DNeasy Plant Maxi Kit (Qiagen, Valencia, CA, USA), according to the manufacturer's instructions. Illumina short-insert paired-end sequencing libraries were constructed and sequenced using the Illumina MiSeq platform (Illumina, San Diego, CA, USA). CP genome sequences were determined from the de novo assembly of low-coverage whole genome sequences. Trimmed paired-end reads (Phred score ≥20) were assembled using the CLC genome assembler ver. 4.06 beta (CLC Inc., Aarhus, Denmark) with default parameters. Principal contigs representing the CP genome were retrieved from the total collection of contigs using Nucmer [55] and aligned with the reference CP genome sequence of Angelica acutiloba (KT963036). Gaps were filled with the SOAPdenovo GapCloser based on the aligned paired-end reads [56].
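The Phred threshold used for read trimming can be illustrated with a short sketch that removes low-quality bases from the 3' end of a read. This is not the trimming algorithm of the CLC software, which is not described here; the function names are hypothetical, and Phred+33 quality encoding (standard for current Illumina FASTQ files) is assumed.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Trim trailing bases whose Phred score is below min_q.

    Returns the trimmed (sequence, quality) pair.
    """
    scores = phred_scores(quality)
    end = len(scores)
    # Move the end index left while the last retained base is low quality.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(trim_3prime("ACGTAC", "IIII##"))
```

A Phred score of 20 corresponds to an estimated base-calling error probability of 1 in 100, which is why it is a common cutoff.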

Annotation and Comparative Analysis
Gene annotation of the CP genome of A. polymorpha was performed using GeSeq [57], and annotation results were concatenated using an in-house script pipeline. Protein-coding sequences were manually curated and confirmed using Artemis [58] and checked against the NCBI protein

Figure 1. Circular gene map of the CP genome of A. polymorpha. Genes drawn inside the circle are transcribed clockwise, and those drawn outside the circle are transcribed counterclockwise. The darker gray inner circle represents the GC content.

Figure 2. Comparison of CP genome sequences of A. polymorpha and L. officinale at the junctions of the LSC, IR (IRa and IRb), and SSC regions. ψ: pseudogenes.

Figure 3. Comparative analysis of the CP genomes of A. polymorpha and L. officinale using mVISTA. Complete CP genomes of the two species were compared, with the CP genome of A. polymorpha used as a reference. Blue block, conserved genes; sky-blue block, tRNA and rRNA genes; red block, conserved non-coding sequences (CNSs); white block, regions polymorphic between A. polymorpha and L. officinale.

Figure 4. Comparison of nucleotide diversity (Pi) between the CP genomes of A. polymorpha and L. officinale.

Figure 5. Phylogenetic tree showing the relationship of A. polymorpha with 31 species based on 52 protein-coding genes, inferred using maximum likelihood (ML) and Bayesian inference (BI). The ML topology is shown with ML bootstrap support values and BI posterior probabilities at each node. The '+' sign indicates ML bootstrap values of 100%, and the '-' sign indicates BI posterior probabilities of 1.0. Black triangles represent the CP genomes of A. polymorpha and L. officinale examined in this study.

Table 1. Characteristics of the CP genomes of A. polymorpha and L. officinale (columns: Characteristic, A. polymorpha, L. officinale).

Table 2. Primers used for the development of the indel marker.
