The Complete Chloroplast Genome Sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and Development of Molecular Markers for Distinguishing Species in the Aconitum Subgenus Lycoctonum

Aconitum pseudolaeve Nakai and Aconitum longecassidatum Nakai, which belong to the Aconitum subgenus Lycoctonum, are distributed in East Asia, including Korea. Aconitum species are used in herbal medicine and contain highly toxic components, including aconitine. A. pseudolaeve, an endemic species of Korea, is a commercially valuable material that has been used in the manufacture of cosmetics and perfumes. Although Aconitum species are important plant resources, they have not been extensively studied, and genomic information is limited. Within the subgenus Lycoctonum, which includes A. pseudolaeve and A. longecassidatum, a complete chloroplast (CP) genome is available for only one species, Aconitum barbatum Patrin ex Pers. Therefore, we sequenced the complete CP genomes of A. pseudolaeve and A. longecassidatum, which are 155,628 and 155,524 bp in length, respectively. Both genomes have a quadripartite structure similar to that of other Aconitum CP genomes. In each, a pair of inverted repeat regions (51,854 and 52,108 bp in total, respectively) separates the large single-copy (86,683 and 86,466 bp) and small single-copy (17,091 and 16,950 bp) regions. Both CP genomes contain 112 unique genes, comprising 78 protein-coding genes, 4 ribosomal RNA (rRNA) genes, and 30 transfer RNA (tRNA) genes. We identified 268 and 277 simple sequence repeats (SSRs) in A. pseudolaeve and A. longecassidatum, respectively. We also identified 36 potential species-specific SSRs, 53 indels, and 62 single-nucleotide polymorphisms (SNPs) between the two CP genomes. Furthermore, a comparison of the three Aconitum CP genomes from the subgenus Lycoctonum revealed highly divergent regions, including trnK-trnQ, ycf1-ndhF, and ycf4-cemA. Based on this finding, we developed indel markers from the indel sequences in trnK-trnQ and ycf1-ndhF. A. pseudolaeve, A. longecassidatum, and A. barbatum could be clearly distinguished using the novel indel markers AcoTT (Aconitum trnK-trnQ) and AcoYN (Aconitum ycf1-ndhF). These two new complete CP genomes provide useful genomic information for species identification and evolutionary studies of the Aconitum subgenus Lycoctonum.


Introduction
Chloroplasts (CPs) play important functional roles in photosynthesis, biosynthesis, and the metabolism of starch and fatty acids throughout the plant life cycle [1]. The angiosperm CP genome is a circular molecule with a quadripartite structure. It consists of large single-copy (LSC) and small single-copy (SSC) regions separated by two copies of an inverted repeat (IR) region. Typically, the CP genomes of higher plants contain 110-120 genes, encoding proteins, transfer RNAs (tRNAs), and ribosomal RNAs (rRNAs) [...] of A. pseudolaeve, A. longecassidatum, and A. barbatum Patrin ex Pers., we developed indel markers to distinguish three species of the subgenus Lycoctonum based on divergent regions of the CP genome. These results will provide useful genetic tools for identifying Aconitum species in the subgenus Lycoctonum, and will also serve as genomic resources for evolutionary studies of these plants.

Repeat Analysis in Two Aconitum Chloroplast Genomes
Microsatellites, or simple sequence repeats (SSRs), are tandem repeats of 1-6-nt motifs that are abundant in genomes. They are useful markers because of their high degree of polymorphism, and they are also used in phylogenetic analysis and population genetics [30,31]. We identified 268 and 277 SSRs in the CP genomes of A. pseudolaeve Nakai and A. longecassidatum Nakai, respectively (Figure 3). Mononucleotides were the most abundant motifs, constituting 128 (47.8%) and 126 (45.5%) of the SSRs in A. pseudolaeve and A. longecassidatum, respectively. Approximately 37% of SSRs were located in coding regions, and more SSRs were present in single-copy regions than in IR regions. To find potential SSR loci for developing markers that distinguish the two Aconitum species, we identified 36 SSR indels consisting of A or T motifs, ranging in length from 1 to 6 bp (Table 4). The greatest difference between the species was a 6-bp SSR in the psbM-trnD intergenic spacer (IGS). One SSR is located in an exon of ndhG (Table 4). Repeat sequences play important evolutionary roles, influencing changes in genome structure such as duplication and rearrangement [32]. We detected tandem repeats of 20 or 21 bp in A. pseudolaeve and 19 bp in A. longecassidatum (Table S4). Most tandem repeats were located in IGS regions, and tandem repeats were also found in both ycf1 and ycf2. Fourteen repeats were shared between the two Aconitum species. Three tandem repeats were located in ycf2, and two in trnK-rps16. In both species, six palindromic repeats were present, ranging in size from 21 to 33 bp (Table 5). In particular, the ycf2 gene contained short tandem repeats as well as a palindromic repeat.
Phylogenetic analysis of CP genome sequences places A. pseudolaeve and A. longecassidatum within the Aconitum subgenus Lycoctonum, where they are genetically closest to A. barbatum [27]. Consistent with this, the CP genomes of A. pseudolaeve and A. longecassidatum are 99.7% similar. Their genome structure, gene content, and gene order are nearly identical, although the lengths of their single-copy (LSC and SSC) and IR regions differ slightly. The LSC and IR regions of the A. barbatum CP genome are slightly longer than those of A. pseudolaeve and A. longecassidatum, whereas its SSC region is shorter. Thus, overall, the CP genomes of the three Aconitum species are very similar.
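The region lengths given in the abstract can be checked for consistency with simple arithmetic. The sketch below assumes that each reported IR length is the combined length of both IR copies. That assumption is what makes LSC + SSC + IR equal the total genome length. It also assumes the two copies are equal in length when halving the IR value. The function names are illustrative only.

```python
# Region lengths (bp) reported for the two new CP genomes.
# Assumption: "ir" is the combined length of both inverted-repeat copies.
GENOMES = {
    "A. pseudolaeve": {"total": 155_628, "lsc": 86_683, "ssc": 17_091, "ir": 51_854},
    "A. longecassidatum": {"total": 155_524, "lsc": 86_466, "ssc": 16_950, "ir": 52_108},
}


def check_quadripartite(regions: dict) -> bool:
    """True if LSC + SSC + both IR copies add up to the full genome length."""
    return regions["lsc"] + regions["ssc"] + regions["ir"] == regions["total"]


def single_ir_copy(regions: dict) -> int:
    """Length of one IR copy, assuming the two copies are identical in length."""
    return regions["ir"] // 2


for name, regions in GENOMES.items():
    print(name, check_quadripartite(regions), single_ir_copy(regions))
# A. pseudolaeve True 25927
# A. longecassidatum True 26054
```

Both totals reconcile exactly. This supports reading the IR figures as two-copy totals rather than single-copy lengths.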
To identify divergent regions among the three species, we aligned the A. pseudolaeve and A. longecassidatum CP genomes against the A. barbatum CP genome (Figure 4). The greatest divergence was observed in non-coding regions. In particular, A. barbatum contains a large insertion in trnK-trnQ that is not present in the other two species. Smaller divergent regions are present in petN-psbM, trnT-trnL, ndhC-trnV, rbcL-accD, and other loci, and A. barbatum has more divergent regions than the other two species. Coding regions are highly conserved between A. pseudolaeve and A. longecassidatum. The most divergent regions between these two species are non-coding regions such as trnR-atpA, trnT-psbD, ycf4-cemA, ndhC-trnV, and ycf1-ndhF (Figure S2). To analyze divergence at the sequence level among the three Aconitum CP genomes, we also calculated nucleotide variability (Pi) (Figure 5). As expected, IR regions are highly conserved among the three species, and the single-copy (LSC and SSC) regions are more variable than the IR regions. The divergence among the three Aconitum species is greater than that between A. pseudolaeve and A. longecassidatum. As shown in Figure 5, a few regions exhibited divergence (atpH, trnL, ndhJ, rpl16, ycf1, and ndhA), with a maximum Pi value of 0.7%.
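The sliding-window Pi profile can be illustrated with a simplified re-implementation of Nei's nucleotide diversity. It computes the mean number of pairwise differences per site. This is a sketch, not DnaSP's code, and it makes several assumptions:

* The input is equal-length, uppercase aligned sequences.
* Any column containing a gap ('-') is dropped.
* The defaults are the 600-bp window and 200-bp step given in the Methods.

DnaSP's treatment of gaps and window coordinates may differ.

```python
from itertools import combinations
from typing import Iterator


def nucleotide_diversity(columns: list[tuple[str, ...]]) -> float:
    """Nei's pi: mean pairwise differences per site, over gap-free columns only."""
    valid = [col for col in columns if "-" not in col]
    if not valid or len(valid[0]) < 2:
        return 0.0
    pairs = list(combinations(range(len(valid[0])), 2))
    diffs = sum(col[i] != col[j] for col in valid for i, j in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_window_pi(
    alignment: list[str], window: int = 600, step: int = 200
) -> Iterator[tuple[int, float]]:
    """Yield (window midpoint, pi) along an alignment of equal-length sequences."""
    columns = list(zip(*alignment))  # one tuple of bases per alignment column
    last_start = max(len(columns) - window, 0)
    for start in range(0, last_start + 1, step):
        block = columns[start:start + window]
        yield start + len(block) // 2, nucleotide_diversity(block)


# Toy alignment with a small window so the output is easy to follow.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
for midpoint, pi in sliding_window_pi(toy, window=4, step=2):
    print(midpoint, round(pi, 3))  # 2 0.167 / 4 0.167 / 6 0.167
```

In a real run, `alignment` would hold the three aligned CP genomes, and the defaults would apply.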

Indel and SNP Mutation between A. pseudolaeve and A. longecassidatum
Indels and SNPs are common mutations in the evolution of higher plant CP genomes [9,33-35]. They provide information that is useful for resolving evolutionary relationships in phylogenetic analyses of related taxa [36]. We detected 61 indels between A. pseudolaeve Nakai and A. longecassidatum Nakai (Table S5), of which 53 are located in IGS regions and the remaining eight in coding regions. Most indels are 1 to 6 bp long, and eight indels are longer than 10 bp; the longest, in ycf4-cemA, is 256 bp. No indels were found in IR regions. Comparison of the three Aconitum species revealed a large insertion (1582 bp) in A. barbatum Patrin ex Pers. that is not present in A. pseudolaeve or A. longecassidatum. This insertion lies in trnK-trnQ, a region that is highly conserved between A. pseudolaeve and A. longecassidatum.
We also detected 62 SNPs between the two CP genomes, consisting of 27 transitions (Ts) and 35 transversions (Tv) (Figure 6 and Table S6). The Ts/Tv ratio was 0.77 (27/35), similar to that of other CP genomes [9,37]. A-to-C and T-to-G substitutions accounted for 32% of SNPs, whereas C-to-G and G-to-C substitutions were the least frequent (3%). Of these 62 SNPs, 26 are located in coding regions. In particular, the ycf1 gene contains nine SNPs (three Ts, six Tv) and thus represents a hotspot of clustered variation [9,38]. We detected no non-synonymous SNPs between A. pseudolaeve and A. longecassidatum.
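The classification behind the Ts/Tv ratio and the six substitution types in Figure 6 can be sketched as follows. In this hypothetical helper, a substitution is merged with its reverse-strand complement, so that, for example, A-to-C and T-to-G count as one class. The input below is a toy list that only reproduces the reported 27 Ts / 35 Tv counts; it is not the actual SNP table.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_transition(ref: str, alt: str) -> bool:
    """Transitions are purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def substitution_class(ref: str, alt: str) -> str:
    """Merge a substitution with its reverse-strand complement (e.g. A>C with T>G)."""
    forward = f"{ref}>{alt}"
    reverse = f"{ref.translate(COMPLEMENT)}>{alt.translate(COMPLEMENT)}"
    return min(forward, reverse)


def summarize(snps: list[tuple[str, str]]) -> tuple[int, int, float, Counter]:
    """Count Ts and Tv, the Ts/Tv ratio, and the six non-strand-specific classes.

    Assumes every (ref, alt) pair is a genuine substitution (ref != alt).
    """
    ts = sum(is_transition(ref, alt) for ref, alt in snps)
    tv = len(snps) - ts
    ratio = ts / tv if tv else float("inf")
    classes = Counter(substitution_class(ref, alt) for ref, alt in snps)
    return ts, tv, ratio, classes


ts, tv, ratio, classes = summarize([("A", "G")] * 27 + [("A", "C")] * 35)
print(ts, tv, round(ratio, 2))  # 27 35 0.77
```

The ratio is Ts divided by Tv, so 27/35 gives 0.77.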

Development and Validation of the Indel Marker for Authentication of Three Species in the Aconitum Subgenus Lycoctonum
Indel regions are commonly used for marker development because they are easy to detect and suitable primers are straightforward to design [14,15,39]. We developed indel markers from the sequence variability of the large indel regions in A. pseudolaeve Nakai, A. longecassidatum Nakai, and A. barbatum Patrin ex Pers. (Figure 4). Specifically, we designed primers in the conserved regions flanking the indels in trnK-trnQ and ycf1-ndhF. The AcoTT (Aconitum trnK-trnQ) and AcoYN (Aconitum ycf1-ndhF) primers successfully amplified the predicted products in all three Aconitum species (Figure 7 and Data S1). A. pseudolaeve and A. longecassidatum show a small length difference in AcoTT, whereas A. barbatum yields a longer PCR product than the other two species, as expected. As shown in Figure 7, A. barbatum, A. pseudolaeve, and A. longecassidatum yielded AcoTT amplicons of 1865 bp, 275 bp, and 283 bp, respectively. Furthermore, A. longecassidatum has a 6-bp insertion relative to A. pseudolaeve. In AcoYN, only A. longecassidatum (259 bp) differs from A. pseudolaeve and A. barbatum (370 bp).
In a previous molecular phylogenetic study based on the CP genome sequences of Aconitum species, we found that the two Aconitum subgenera, Aconitum and Lycoctonum, were clearly separated [7]. To confirm the variability of the indel regions among Aconitum species and subgenera, we examined PCR amplification profiles for the indel markers AcoTT and AcoYN. We tested a total of 27 Aconitum samples (nine species and one variety) from the subgenera Aconitum and Lycoctonum (Figure 7). Interestingly, all 27 additional samples yielded only an 877-bp amplicon for AcoTT, but AcoYN gave three band patterns (Figure 7). The PCR products of A. monanthum Nakai and A. kirinense Nakai were 431 bp, that of A. coreanum was 410 bp, and those of the other species were 502 bp. Nevertheless, A. longecassidatum was clearly distinguished from the other Aconitum species. Taken together, these findings confirm that each of the three Aconitum species has specific sequences that make it possible to distinguish it from other Aconitum species.
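The marker logic above amounts to a two-locus lookup on amplicon size. The sketch below uses the sizes reported for the three Lycoctonum species. The function name and the 3-bp sizing tolerance are hypothetical; the tolerance must stay below 4 bp so that the 275-bp and 283-bp AcoTT windows do not overlap. Sizes are assumed to come from sequenced amplicons rather than gel estimates.

```python
# AcoTT and AcoYN amplicon sizes (bp) reported for the three Lycoctonum species.
EXPECTED_SIZES = {
    "A. barbatum": (1865, 370),
    "A. pseudolaeve": (275, 370),
    "A. longecassidatum": (283, 259),
}

# Hypothetical sizing tolerance; must be < 4 bp to keep 275 and 283 apart.
TOLERANCE_BP = 3


def assign_species(acott_bp: int, acoyn_bp: int, tolerance: int = TOLERANCE_BP) -> list[str]:
    """Return every species whose expected (AcoTT, AcoYN) sizes both match."""
    return [
        species
        for species, (tt_bp, yn_bp) in EXPECTED_SIZES.items()
        if abs(acott_bp - tt_bp) <= tolerance and abs(acoyn_bp - yn_bp) <= tolerance
    ]


print(assign_species(283, 259))  # ['A. longecassidatum']
print(assign_species(877, 502))  # [] -> not one of the three species
```

An empty result corresponds to the 877-bp AcoTT profile seen in the other Aconitum samples.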
Aconitum. In this study, we addressed the limitations of universal DNA barcodes for interspecies identification. Thus, our indel markers (AcoTT and AcoYN) will be useful for identifying A. pseudolaeve, A. longecassidatum, and A. barbatum (Table 6). Furthermore, we confirmed that these markers can be used to distinguish Aconitum at the subgenus level. It is likely that the subgenus Aconitum exhibits greater conservation (i.e., less variation) than the subgenus Lycoctonum. Although only a few Aconitum species were examined in this study, our findings will contribute to species classification in the Aconitum subgenus Lycoctonum.

Plant Materials and Genome Sequencing
We collected fresh leaves of A. pseudolaeve Nakai (KIOM201401010986) and A. longecassidatum Nakai (KIOM201401010506) from medicinal plantations in Korea for CP genome sequencing. Voucher specimens were registered under these identification numbers in the Korean Herbarium of Standard Herbal Resources (Index Herbariorum code KIOM) at the Korea Institute of Oriental Medicine (KIOM) [20]. DNA was extracted using the DNeasy Plant Maxi Kit (Qiagen, Valencia, CA, USA). Illumina paired-end sequencing libraries were constructed and sequenced on the MiSeq platform (Illumina, San Diego, CA, USA).

Assembly and Annotation of Two Aconitum Species
CP genomes were obtained by de novo assembly of low-coverage whole-genome sequence data. Trimmed paired-end reads (Phred score ≥ 20) were assembled using CLC Genome Assembler (ver. 4.06 beta, CLC Inc., Aarhus, Denmark) with default parameters. Contigs representing the CP genome were retrieved from the total contigs using Nucmer [40], with the CP genome sequence of Aconitum barbatum var. puberulum (KC844054) as the reference. Genes were annotated using DOGMA [41], followed by manual curation with BLAST. Circular maps of the A. pseudolaeve and A. longecassidatum CP genomes were drawn using OGDRAW [42]. Codon usage and base composition of the CP genomes were analyzed using MEGA6 [43]. The NCBI accession numbers of the CP genome sequences are KY407562 (A. pseudolaeve) and KY407561 (A. longecassidatum).
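The trimming algorithm used before assembly is not described beyond the Phred ≥ 20 cutoff. The following is therefore only a generic end-trimming sketch, not the CLC implementation. It assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding and removes low-quality bases from both read ends; the function names are hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ (Phred+33) encoding


def phred_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def trim_ends(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Remove bases with Phred < min_q from the 5' and 3' ends of a read."""
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:  # walk in from the 5' end
        start += 1
    while end > start and scores[end - 1] < min_q:  # walk in from the 3' end
        end -= 1
    return seq[start:end], quality[start:end]


# '#' encodes Q2, 'I' encodes Q40, and '5' encodes Q20 (kept, since 20 >= 20).
print(trim_ends("ACGTA", "#II5#"))  # ('CGT', 'II5')
```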

SSR, Tandem, and Palindromic Repeat Analysis in Two Aconitum CP Genomes
Tandem repeats of ≥20 bp were identified [44] with the minimum alignment score and maximum period size set to 50 and 500, respectively, and a repeat identity of ≥90%. SSRs were detected using MISA [45], with the minimum number of repeat units set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. Palindromic (inverted) repeats were detected using the Inverted Repeats Finder [46] with default parameters and were required to be ≥20 bp in length with 90% similarity.
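The SSR thresholds above can be illustrated with a small regular-expression scanner for perfect repeats. This is a simplified sketch, not MISA itself:

* It ignores compound SSRs.
* It skips motifs that are themselves repeats of a shorter unit (e.g. 'AA' or 'ATAT'), so a poly-A run is reported only once, as a mononucleotide.
* The function names are hypothetical.

```python
import re

# Minimum repeat units for mono- to hexanucleotide motifs, as set in the Methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "G"))
# [(2, 'A', 12), (16, 'AT', 6)]
```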

Comparative Analysis of CP Genomes of A. pseudolaeve and A. longecassidatum
The mVISTA program [47] was used to compare the CP genomes of Aconitum barbatum var. puberulum (KC844054), A. pseudolaeve, and A. longecassidatum. To calculate nucleotide variability (Pi) among the CP genomes, we performed a sliding-window analysis using DnaSP version 5.1 [48] with a window length of 600 bp and a step size of 200 bp. Indels and SNPs were identified from sequence alignments generated with MAFFT [49].
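The rules used to extract indels and SNPs from the MAFFT alignments are not specified. The following is a hypothetical sketch of one common approach for a pairwise alignment:

* Each contiguous run of gap columns in one sequence counts as a single indel.
* A column where both bases are present but differ counts as an SNP.

Ambiguity codes such as N would be counted as SNPs here, so real data may need extra filtering.

```python
def call_variants(ref_aln: str, alt_aln: str) -> tuple[list, list]:
    """Scan a pairwise alignment and return (snps, indels) in alignment columns.

    snps:   (column, ref_base, alt_base) where both bases are present but differ
    indels: (start_column, length, kind), with kind relative to ref_aln
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], []
    run_start, run_kind = 0, None  # current gap run, if any
    for col, (r, a) in enumerate(zip(ref_aln.upper(), alt_aln.upper())):
        if r == "-" and a != "-":
            kind = "insertion"  # base present only in alt
        elif a == "-" and r != "-":
            kind = "deletion"  # base present only in ref
        else:
            kind = None
        if kind != run_kind:
            if run_kind is not None:  # close the previous gap run
                indels.append((run_start, col - run_start, run_kind))
            run_start, run_kind = col, kind
        if kind is None and r != a:
            snps.append((col, r, a))
    if run_kind is not None:  # gap run reaching the end of the alignment
        indels.append((run_start, len(ref_aln) - run_start, run_kind))
    return snps, indels


print(call_variants("ACGT--ACGTA", "ACTTGGAC---"))
# ([(2, 'G', 'T')], [(4, 2, 'insertion'), (8, 3, 'deletion')])
```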

Development and Validation of Indel Markers (AcoTT and AcoYN) Among Aconitum Species
We selected indel regions based on mVISTA similarity plots and designed primers using Primer-BLAST (NCBI). Indel regions were amplified from 20 ng of genomic DNA in a 20-µL PCR mixture (Solg™ 2X Taq PCR Smart mix 1, SolGent, Daejeon, Korea) containing 10 pmol of each primer (Bioneer, Daejeon, Korea). Amplification was performed on a ProFlex PCR System (Applied Biosystems, Waltham, MA, USA) using the following programs:
(1) AcoTT primers: initial denaturation at 95 °C for 2 min; 35 cycles of 95 °C for 1 min, 61 °C for 1 min, and 72 °C for 1.5 min; and final extension at 72 °C for 5 min.
(2) AcoYN primers: initial denaturation at 95 °C for 2 min; 35 cycles of 95 °C for 50 s, 60 °C for 50 s, and 72 °C for 50 s; and final extension at 72 °C for 5 min.
PCR products were separated on 2% agarose gels at 150 V for 40 min. We then validated the specificity of the indel markers and confirmed the variability of the indel regions among Aconitum species and between the subgenera Aconitum and Lycoctonum. For this, we examined PCR amplification profiles of 27 additional samples representing nine species and one variety of Aconitum from both subgenera, provided by the KIOM herbarium. In addition, two samples per species were sequenced to confirm the sizes of the PCR products. Each PCR product was purified from the agarose gel, subcloned into the pGEM-T Easy vector (Promega, Madison, WI, USA), and sequenced on a DNA sequence analyzer (ABI 3730, Applied Biosystems Inc., Foster City, CA, USA) to determine amplicon sizes and verify their sequences.

Figure 1 .
Figure 1. Circular gene map of A. pseudolaeve and A. longecassidatum. Genes drawn inside the circle are transcribed clockwise, and those outside the circle counterclockwise. The darker gray in the inner circle represents GC content. The gene map corresponds to A. pseudolaeve. LSC: large single copy; IR: inverted repeat; SSC: small single copy; GC: guanine-cytosine; ORF: open reading frame.

Figure 3 .
Figure 3. Distribution of simple sequence repeats (SSRs) in the A. pseudolaeve Nakai and A. longecassidatum Nakai CP genomes. (A) Distribution of SSR types in the two Aconitum CP genomes. (B) Distribution of SSRs between coding and non-coding regions. (C) Number of SSRs per unit length in the indicated genomic regions of the Aconitum CP genomes. CP: chloroplast; LSC: large single copy; IR: inverted repeat; SSC: small single copy.

Figure 4 .
Figure 4. Comparison of the A. pseudolaeve Nakai, A. longecassidatum Nakai, and Aconitum barbatum Patrin ex Pers. chloroplast (CP) genomes using mVISTA. Complete CP genomes of A. pseudolaeve and A. longecassidatum were compared with A. barbatum as a reference. Blue block: conserved gene; sky-blue block: tRNA and rRNA; red block: intergenic region. White peaks indicate regions of sequence variation between A. barbatum and A. pseudolaeve, and between A. barbatum and A. longecassidatum. CP: chloroplast; tRNA: transfer RNA; CNS: conserved non-coding sequence.

Figure 6 .
Figure 6. The nucleotide substitution pattern in the A. pseudolaeve Nakai and A. longecassidatum Nakai CP genomes. The patterns were divided into six types, as indicated by the six non-strand-specific base-substitution types (i.e., numbers of considered G to A and C to T sites for each respective set of associated mutation types). The A. pseudolaeve chloroplast genome was used as a reference. CP: chloroplast.

Table 1 .
Size comparison of two Aconitum species' CP genomic regions.

Table 2 .
Genes present in the CP genomes of A. pseudolaeve Nakai and A. longecassidatum Nakai.

Table 3 .
Genes with introns in the A. pseudolaeve Nakai CP genome, and lengths of exons and introns.
* The rps12 gene is trans-spliced. Gene lengths in A. longecassidatum Nakai are shown in parentheses. CP: chloroplast; LSC: large single copy; IR: inverted repeat; SSC: small single copy.

Table 5 .
Distribution of palindromic repeats in the CP genomes of A. pseudolaeve Nakai and A. longecassidatum Nakai.

Table 6 .
Primer information for the insertion/deletion (indel) markers developed in this study.