Complete Mitochondrial Genome and a Set of 10 Novel Kompetitive Allele-Specific PCR Markers in Ginseng (Panax ginseng C. A. Mey.)

Abstract: Panax ginseng C. A. Mey., a perennial herb belonging to the family Araliaceae, is a valuable medicinal plant with distinctive biological characteristics. However, comprehensive analyses of the mitochondrial genome (mitogenome) are lacking. In this study, we sequenced the complete mitogenome of ginseng based on long-read data from the Nanopore sequencing platform. The mitogenome was assembled into a “master circle” form of 464,705 bp and contained 72 unique genes. The genome had three large repeat regions, and 10.42% of the sequences were mitogenome sequences of plastid origin (MTPTs). In total, 278 variants (213 SNPs and 65 InDels) were discovered, most of which were identified in intergenic regions. The MTPT regions were mutational hotspots, harboring 74.5% of the variants. The ginseng mitogenome showed a higher mutation rate than that of the chloroplast genome, and this pattern is uncommon in plants. In addition, 10 Kompetitive allele-specific PCR (KASP) markers were developed from 10 SNPs, excluding those in MTPT regions. These markers accurately identified the genotypes of 59 Korean ginseng accessions and elucidated mitogenome diversity. These results provide insight into organellar genomes and genetic diversity in ginseng. Moreover, the complete mitogenome sequence and 10 KASP markers will be useful for ginseng research and breeding.


Introduction
Panax ginseng C. A. Mey., a perennial herb belonging to the family Araliaceae, is a valuable medicinal crop [1]. Ginseng is one of the most well-known plants in the world [2], and its components have various pharmacological effects, such as immune enhancement, antioxidant activity, and antiaging effects [3][4][5]. Owing to these beneficial effects, its use is expanding to other fields, including the development of functional beverages and cosmetics [6,7]. However, ginseng grows very slowly, with a long life cycle of at least 4 years [8]. The seed production per plant is significantly lower than that of other major crops; therefore, considerable effort is required to construct breeding populations. To effectively utilize ginseng, it is necessary to develop efficient genetic tools for crop improvement.
Mitochondria are essential organelles in most eukaryotic organisms, with vital roles in cellular energy production [9]. They contain an independent mitochondrial genome (mitogenome), which usually shows maternal inheritance [10]. The plant mitogenome varies substantially across species [11,12] and can show multipartite architectures within a species due to scattered repetitive sequences [13][14][15]. Mitogenomes in plants exhibit far lower mutation rates than those of their nuclear or chloroplast counterparts; accordingly, the structure and gene organization are highly conserved within a species [16]. This characteristic provides useful information for evolutionary and phylogenetic studies [17,18].
Recently, there has been significant progress in genomic research on ginseng. Reference nuclear and chloroplast genome sequences of various ginseng resources have been reported, providing essential information for genetic studies [19][20][21][22][23]. However, the ginseng mitogenome is poorly understood, and only unverified sequence data are available in GenBank (accession no. KF735063). Mitogenome sequencing is a challenging task owing to the structural complexity of the genome, but technological advances, including the development of long-read sequencing platforms and assembly programs, have made it feasible. To gain a deeper understanding of the ginseng genome, a complete mitogenome sequence from an official cultivar and comprehensive analyses of that sequence are required.
DNA molecular markers are useful tools for crop improvement [24] and for studies of genetic diversity and species authentication [25,26]. DNA molecular marker techniques have recently been developed for rapid and accurate genotype identification. Kompetitive allele-specific PCR (KASP) is an efficient molecular marker system that can be used to simultaneously analyze the genotype of many specimens using fluorescence signals [27]. The combination of efficient KASP markers and the conserved mitogenome can facilitate molecular breeding.
In this study, we sequenced and characterized the complete ginseng mitogenome using an official cultivar. We discovered various polymorphisms by a comparative analysis and developed KASP markers from single-nucleotide polymorphisms (SNPs) at specific loci. The markers were used to elucidate the mitogenome diversity among ginseng accessions. We performed the first analysis of diversity in the ginseng mitogenome and observed an unusually high mutation rate. These results broaden our understanding of organellar genomes and genetic diversity in ginseng. The newly developed KASP markers are expected to serve as essential genetic tools for efficient ginseng breeding and further research, including pedigree and population structure analyses.

Sampling, DNA Extraction, and Sequencing
Three Panax species were used (Table S1). Fresh leaves were collected from 59 P. ginseng C. A. Mey. accessions (12 cultivars and 47 breeding lines), 10 P. quinquefolius L. accessions, and 10 P. notoginseng (Burk.) F. H. Chen accessions at the National Institute of Horticultural and Herbal Science (Eumseong, Korea). Genomic DNA from the 79 individual plants was extracted from ~100 mg of frozen leaves using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. The DNA quantity and quality were checked using a QIAxpert system (Qiagen). P. ginseng cv. "Gumpoong" was selected as a representative cultivar, and 2 µg of DNA was provided to PHYZEN (Seongnam, Korea) for library construction and sequencing. Oxford Nanopore and Illumina MiSeq libraries were prepared using a Rapid Sequencing Kit (SQK-RAD004, Oxford Nanopore Technologies (ONT), Oxford, UK) and the TruSeq Nano DNA Kit (Illumina, San Diego, CA, USA), respectively, in accordance with the manufacturers' instructions. The two genomic libraries were sequenced using the Oxford Nanopore MinION and Illumina MiSeq sequencing platforms.

Mitogenome Assembly and Annotation
Raw ONT sequencing data were trimmed using Porechop v. 0.2.3 (https://github.com/rrwick/Porechop), and an in-house script was used to remove adaptor sequences and chimeric sequences. The trimmed ONT sequencing data were self-corrected using Canu assembler v. 1.71 (https://github.com/marbl/canu) with default parameters, and corrected ONT reads were assembled de novo using SMARTdenovo (https://github.com/ruanjue/smartdenovo). Assembled contigs were used as queries for BLASTN [28] searches against the National Center for Biotechnology Information (NCBI) mitochondrion database (https://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/) with an E-value threshold of 1E-6, after which only mitochondrial contigs longer than 10,000 bp showing 95% identity were selected. Overlapping contigs were merged, and one copy of each overlapping sequence was removed to join the mitochondrial contigs. The Illumina MiSeq reads were trimmed using the CLC quality trim tool with default parameters and mapped to the assembled mitogenome using the clc_ref_assemble tool in the CLC Assembly Cell package (v. 4.21, CLC Inc., Aarhus, Denmark). Error correction was conducted by manual curation. The completed mitogenome sequence was annotated using the GeSeq tool with default parameters [29]. [...] were used as references for primary annotation. The precise gene regions were determined by manual curation using BLAST searches [28]. A circular map of the annotated genome was visualized using OrganellarGenomeDRAW [30]. Large repeat structures were identified using BLASTZ [31], and tandem repeat (TR) sequences were discovered using the MISA-web program with custom parameters [32]. The completed mitogenome sequence and annotation data were deposited in the NCBI database.
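The contig selection step described above can be illustrated with a short sketch. It is not the authors' in-house procedure: it assumes BLASTN results exported as tab-separated text with a hypothetical column order (query ID, subject ID, percent identity, query length, E-value), and it interprets "showing 95% identity" as "at least 95%". The function and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    contig: str        # query (contig) identifier
    contig_len: int    # query length in bp
    identity: float    # percent identity of the hit
    evalue: float      # BLAST E-value


def parse_hits(lines):
    """Parse tab-separated hits: qseqid, sseqid, pident, qlen, evalue.

    The column order is an assumption made for this sketch.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        qseqid, _sseqid, pident, qlen, evalue = line.split("\t")
        hits.append(Hit(qseqid, int(qlen), float(pident), float(evalue)))
    return hits


def select_mito_contigs(hits, min_len=10_000, min_identity=95.0, max_evalue=1e-6):
    """Keep contigs longer than min_len with at least one qualifying hit."""
    selected = set()
    for h in hits:
        if (h.contig_len > min_len
                and h.identity >= min_identity
                and h.evalue <= max_evalue):
            selected.add(h.contig)
    return sorted(selected)


# Made-up example records, not data from the study.
EXAMPLE = [
    "ctg1\tref_a\t98.2\t52000\t0.0",     # passes all three filters
    "ctg2\tref_a\t91.0\t80000\t1e-50",   # identity too low
    "ctg3\tref_b\t99.0\t8000\t1e-30",    # contig too short
    "ctg4\tref_b\t96.5\t15000\t1e-3",    # E-value too high
]
MITO_CONTIGS = select_mito_contigs(parse_hits(EXAMPLE))
```

Only `ctg1` survives in this toy input, since each of the other records fails exactly one of the three thresholds.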

Comparative Analysis
The assembled mitogenome sequence of cv. "Gumpoong" (GenBank accession no. MW029460) was compared with the registered mitogenome sequence of P. ginseng C. A. Mey. (GenBank accession no. KF735063) in the NCBI database, which served as the reference sequence. A sequence alignment of the two mitogenomes was generated using MAFFT [33]. To identify variants, including SNPs and InDels, in the mitogenome, a modified version of msaTovcf (using an in-house script) was used [34]. The variant positions were determined based on the reference ginseng mitogenome sequence. In addition, the chloroplast sequence of ginseng (GenBank accession no. KM067388) was used to explore regions showing high similarity to the plastid genome in the assembled mitogenomes. Sequences longer than 500 bp showing over 95% identity were selected using BLASTN with an E-value threshold of 1E-6 [28].
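As a rough illustration of how SNPs and InDels can be read off a pairwise alignment with positions expressed on the reference, consider the sketch below. It is not the modified msaTovcf script used in the study. The function name, the tuple layout, and the convention of anchoring each InDel at the last reference base before the event are assumptions made for this example.

```python
def call_variants(ref_aln, qry_aln):
    """Call variants from two gap-aligned sequences of equal length.

    Returns (snps, indels):
      snps   -> list of (ref_pos, ref_base, alt_base), 1-based reference positions
      indels -> list of (kind, anchor_pos, length), where kind is "ins" or "del"
                relative to the reference and anchor_pos is the last reference
                base before the event (0 if the event starts the alignment)
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], []
    ref_pos = 0   # number of reference bases consumed so far
    run = None    # currently open gap run: [kind, anchor_pos, length]
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # a gap-only column carries no information
        kind = "ins" if r == "-" else "del" if q == "-" else None
        if run is not None and run[0] != kind:
            indels.append(tuple(run))  # the gap run ended at the previous column
            run = None
        if kind is None:
            ref_pos += 1
            if r != q:
                snps.append((ref_pos, r, q))
        else:
            if run is None:
                run = [kind, ref_pos, 0]
            run[2] += 1
            if kind == "del":
                ref_pos += 1  # deleted bases still occupy reference positions
    if run is not None:
        indels.append(tuple(run))
    return snps, indels


# Toy alignment: one substitution, one 1-bp insertion, one 2-bp deletion.
SNPS, INDELS = call_variants("ACGT-ACGTAC", "ACCTTAC--AC")
```

Consecutive gap columns are merged into a single InDel, so the count of InDels reflects events rather than gapped bases.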

KASP Marker Development
From the SNPs identified in the mitogenome, 10 SNP sites lacking homology with the ginseng chloroplast genome were used to develop candidate KASP markers. The SNP information was sent to LGC (Teddington, UK), including 100-bp flanking sequences on both sides, to design two allele-specific forward primers and a common reverse primer. KASP amplification and allelic discrimination were performed using the Nexar system (LGC Douglas Scientific, Alexandria, VA, USA) in the Seed Industry Promotion Center (Gimje, Korea). An aliquot (0.8 µL) of 2× Master Mix (LGC Genomics), 0.02 µL of 72× KASP assay mix (LGC Genomics), and 5 ng of genomic DNA template were mixed in a 1.6 µL KASP reaction mixture in a 384-well Array Tape. The reactions were run in duplicate, including non-template controls as negative controls. KASP amplification was performed using the following thermal cycling profile: 15 min at 94 °C, a touchdown phase of 10 cycles of 94 °C for 20 s and 61 °C to 55 °C (dropping 0.6 °C per cycle) or 68 °C to 62 °C (dropping 0.6 °C per cycle) for 60 s, followed by 26 cycles of 94 °C for 20 s and 55 °C or 62 °C for 60 s (first PCR stage). Next, recycling was performed using three cycles of 94 °C for 20 s and 57 °C for 60 s (second PCR stage). Fluorescence was read for KASP genotyping after PCR.
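The annealing temperatures implied by this cycling profile can be written out cycle by cycle. The sketch below assumes that the first touchdown cycle anneals at the starting temperature and that each subsequent cycle is 0.6 °C lower, so the tenth touchdown cycle anneals at 55.6 °C (or 62.6 °C) and the 26-cycle stage begins at 55 °C (or 62 °C). That reading of the protocol, and the function name, are assumptions.

```python
def kasp_annealing_schedule(start_c, step_c=0.6, touchdown_cycles=10,
                            plateau_cycles=26, recycle_cycles=3, recycle_c=57.0):
    """Return a list of (stage, annealing_temperature_in_C), one entry per cycle."""
    schedule = []
    # Touchdown: the annealing temperature drops by step_c after every cycle.
    for i in range(touchdown_cycles):
        schedule.append(("touchdown", round(start_c - i * step_c, 1)))
    # Plateau: constant temperature reached after the full touchdown drop.
    plateau_c = round(start_c - touchdown_cycles * step_c, 1)
    schedule.extend(("plateau", plateau_c) for _ in range(plateau_cycles))
    # Recycling (second PCR stage) at a fixed temperature.
    schedule.extend(("recycle", recycle_c) for _ in range(recycle_cycles))
    return schedule


LOW_PROFILE = kasp_annealing_schedule(61.0)   # 61 °C to 55 °C variant
HIGH_PROFILE = kasp_annealing_schedule(68.0)  # 68 °C to 62 °C variant

for stage, temp in LOW_PROFILE[:12]:
    print(f"{stage:10s} {temp:.1f} °C")
```

Under these assumptions each profile comprises 39 amplification cycles in total (10 touchdown, 26 plateau, and 3 recycling cycles), not counting the initial 15 min at 94 °C.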

KASP Marker Application
The set of 10 KASP primers was applied to 79 accessions, including 59 P. ginseng C. A. Mey., 10 P. quinquefolius L., and 10 P. notoginseng (Burk.) F. H. Chen accessions, for validation. The genotypes at each polymorphic locus were used for a genetic diversity analysis. The number of alleles (N_A), major allele frequency (MAF), gene diversity (GD), polymorphism information content (PIC), and Nei's genetic distance [35] for each SNP locus were calculated using PowerMarker v. 3.25 [36]. A clustering analysis of Panax species was performed based on Nei's genetic distances and the unweighted pair group method with arithmetic mean (UPGMA) [37] using PowerMarker.

Complete Mitogenome Sequence of P. ginseng
We obtained the complete mitogenome sequence of P. ginseng cv. "Gumpoong" (GenBank accession no. MW029460) based on long-read data from the ONT platform. Short reads from the MiSeq platform were used for error correction. The total amounts of long and short reads were approximately 6.78 Gb and 5.58 Gb, respectively (Table S2), and the average read depths were about 110× and 165×, respectively. The de novo assembled sequence was 464,705 bp and had a single circular form (Figure 1). The new ginseng mitogenome was slightly larger than the previously reported genome (464,680 bp, GenBank accession no. KF735063). The ginseng mitogenome encoded 72 unique genes, including 45 protein-coding, 24 tRNA, and 3 rRNA genes (Table 1; Table S3). Among these, seven genes (cob, rpl10, trnE-UUC, trnM-CAU, trnP-UGG, trnQ-UUG, and trnY-GUA) were present in multiple copies, and seven copies of the trnM-CAU gene were identified. As a result of the gene duplication, 84 genes encoding 47 proteins, 34 tRNAs, and 3 rRNAs were identified in the ginseng mitogenome. In addition, 10 genes (ccmFc, cox2, nad1, nad2, nad4, nad5, nad7, trnA-UGC, trnStop-UUA, and trnV-UAC) had two or more exonic regions, and four genes (cox2, nad1, nad2, and nad5) had exonic regions that were located far apart (Table S3). Most of the protein-coding genes in the mitogenome had the common start codon (ATG); however, three additional types of start codons were also identified: ACG (cox1, nad4L, rps1, and rps10; C-to-U RNA editing), ATA (tatC; A-to-G RNA editing), and GTG (atp1; G-to-A RNA editing at the first site).

Table 1. List of annotated genes in the mitogenome of P. ginseng.
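The relationship between the 72 unique genes and the 84 gene copies can be checked with a small tally. The sketch assumes that every duplicated gene other than trnM-CAU is present in exactly two copies. The text does not state this explicitly, but it is the reading under which the reported totals add up. The variable and function names are illustrative.

```python
# Unique gene counts per class, as reported in the text.
UNIQUE_GENES = {"protein": 45, "tRNA": 24, "rRNA": 3}

# Copy numbers of the multi-copy genes. Two copies is an assumption for every
# gene except trnM-CAU, for which seven copies are reported.
COPY_NUMBER = {
    "cob": ("protein", 2),
    "rpl10": ("protein", 2),
    "trnE-UUC": ("tRNA", 2),
    "trnM-CAU": ("tRNA", 7),
    "trnP-UGG": ("tRNA", 2),
    "trnQ-UUG": ("tRNA", 2),
    "trnY-GUA": ("tRNA", 2),
}


def total_gene_copies(unique, copy_number):
    """Add the extra copies of multi-copy genes to the unique counts per class."""
    totals = dict(unique)
    for gene_class, copies in copy_number.values():
        totals[gene_class] += copies - 1  # one copy is already counted as unique
    return totals


TOTALS = total_gene_copies(UNIQUE_GENES, COPY_NUMBER)
print(sum(UNIQUE_GENES.values()), TOTALS, sum(TOTALS.values()))
```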

Structure and Variation in Ginseng Mitogenome
The ginseng mitogenome harbored three large repeat structures, including two direct repeats and one inverted repeat (Figure 2). Two protein-coding genes (cob and rpl10) were duplicated due to these large repeat blocks. A total of 42 TR sequences were identified, mainly mononucleotides (18) and nonanucleotides (13) (Table S4). Mitogenome sequences of plastid origin (MTPTs), transferred from the plastid genome, were 48,463 bp (10.42%) in length (Table S5). The newly assembled ginseng mitogenome was highly similar to the reference sequence (GenBank accession no. KF735063), with a sequence identity of 99.8%. A comparative analysis of the two genomes revealed 278 polymorphisms, including 213 SNPs and 65 InDels, in the ginseng mitogenome (Tables S6 and S7). Most polymorphisms were located in intergenic regions; however, approximately 10.1% of the variants (24 SNPs and 4 InDels) were located in genic regions. In particular, the MTPT regions were mutational hotspots, including 74.5% of the variants (165 SNPs and 42 InDels) (Figure 2).
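The variant proportions above, and the sense in which the MTPT regions are hotspots, follow from simple arithmetic on the reported counts. The sketch below recomputes the shares and compares variant density per kilobase inside and outside the MTPT regions. The density comparison is an illustration rather than an analysis from the study, and it is approximate because variant positions are defined on the reference sequence whereas the MTPT length refers to the new assembly.

```python
GENOME_BP = 464_705      # assembled mitogenome length
MTPT_BP = 48_463         # total length of MTPT regions
TOTAL_VARIANTS = 213 + 65
GENIC_VARIANTS = 24 + 4
MTPT_VARIANTS = 165 + 42


def share(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


def per_kb(count, length_bp):
    """Variant density in variants per kilobase."""
    return count / (length_bp / 1000.0)


genic_share = share(GENIC_VARIANTS, TOTAL_VARIANTS)
mtpt_share = share(MTPT_VARIANTS, TOTAL_VARIANTS)
mtpt_density = per_kb(MTPT_VARIANTS, MTPT_BP)
other_density = per_kb(TOTAL_VARIANTS - MTPT_VARIANTS, GENOME_BP - MTPT_BP)
enrichment = mtpt_density / other_density

print(f"genic: {genic_share:.1f}%  MTPT: {mtpt_share:.1f}%")
print(f"MTPT: {mtpt_density:.2f}/kb  elsewhere: {other_density:.2f}/kb")
```

On these figures, about a tenth of the genome carries roughly three quarters of the variants, which corresponds to a variant density on the order of 25 times higher inside the MTPT regions than elsewhere.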

Development and Validation of KASP Markers
We developed allele-specific markers based on the SNPs and their 100-bp flanking sequences. Ten candidate SNPs were selected and a KASP primer set was designed (Table 2). The primer set was used to validate the polymorphisms and to genotype 79 accessions from three Panax species. All the KASP markers provided genotyping results that could be classified into two clusters (Figure 3, Table S8). The Korean ginseng samples showed diverse genotypes, whereas the 10 American and 10 Chinese ginseng samples showed no variation within each species. The pgmt20 and pgmt199 loci were not identified in Chinese ginseng accessions.

Genetic Diversity in Ginseng
The genotypes of 59 Korean ginseng accessions were surveyed based on the 10 KASP markers (Table S8). The KASP markers exhibited polymorphism among the Korean accessions. We used them to evaluate the diversity in the ginseng mitogenome (Table 3). All the KASP loci had two alleles, as expected, and the MAF at each locus ranged from 62.71% for pgmt57 to 84.75% for pgmt17 and pgmt199. On average, 76.61% of the accessions contained a common major allele. The GD for each locus ranged from 0.2585 to 0.4677, with an average of 0.3494. The PIC values for KASP markers from mitogenomes ranged from 0.2251 to 0.3583, with an average of 0.2862. A clustering analysis of Korean ginseng accessions showed that the populations could be divided into seven groups based on the mitogenome haplotypes (Figure 4).
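The GD and PIC ranges reported here are consistent with the standard formulas GD = 1 − Σp_i² and PIC = GD − ΣΣ 2p_i²p_j² (summing over allele pairs i < j). The sketch below applies them to a biallelic locus. The allele counts of 37/22 and 50/9 among 59 accessions are inferred from the reported MAF values of 62.71% and 84.75% rather than taken from Table S8, and the exact estimator used by PowerMarker is not stated in the text, so this should be read as a plausibility check.

```python
from itertools import combinations


def allele_freqs(counts):
    """Convert allele counts at one locus into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein-style formula)."""
    penalty = sum(2.0 * p * p * q * q for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - penalty


# Inferred counts (assumption): 37/59 = 62.71% and 50/59 = 84.75%.
most_diverse = allele_freqs([37, 22])    # MAF reported for pgmt57
least_diverse = allele_freqs([50, 9])    # MAF reported for pgmt17 and pgmt199

for name, f in (("MAF 62.71%", most_diverse), ("MAF 84.75%", least_diverse)):
    print(f"{name}: GD = {gene_diversity(f):.4f}, PIC = {pic(f):.4f}")
```

For a biallelic locus, GD cannot exceed 0.5 and PIC cannot exceed 0.375, which is why markers of this type yield modest values even when both alleles are common.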

Discussion
Plant mitogenomes are larger than those of animals [9,11,12]. Because these genomes have a complex structure, sequencing is very labor-intensive. The recent development of various assembly technologies has made it possible to obtain long and accurate sequences [38]. Complete mitogenome sequences have been reported for many plants [39][40][41]. However, in ginseng, detailed studies of the mitogenome sequence are lacking, and only unverified sequence data (GenBank accession no. KF735063) have been reported. We assembled the complete ginseng mitogenome sequence into a 464,705-bp-long "master circle" based on long-read data (Figure 1). In vascular plants, the mitogenome often has a multipartite architecture derived from recombination between large repeats [42,43]. The ginseng mitogenome also has three large repeat structures, but a unique circular genome has been maintained for a long time.
Mitogenomes in plants generally exhibit lower variation rates than those of other intracellular genomes [16]. Accordingly, their structure and gene contents are highly conserved within a species. However, the results for ginseng differed from established theory [44]. A previous study of diverse cultivated ginseng accessions identified low levels of variation, including six SNPs and six InDels, in the chloroplast genome [22], whereas the newly assembled ginseng mitogenome had 278 polymorphisms (i.e., an increase of >23-fold), including 213 SNPs and 65 InDels. A similar phenomenon has been observed in algae and a few land plants [43,45,46]. These variations mainly occur in intergenic regions or in coding regions as synonymous substitutions [47,48]. This observation could be explained by the specific DNA repair mechanisms for the mitogenome [49,50]. In fact, variations occur frequently in the mitogenome; however, the nucleotide substitution rates are low in coding regions [47,51]. Most polymorphisms in the ginseng mitogenome were found in intergenic regions (89.9%), and 74.5% of all polymorphisms were located in MTPT regions. These results strongly support the hypothesis that a higher variation rate in the mitogenome than in the chloroplast genome among land plants might not be as uncommon as once thought.
The development of accurate molecular markers in ginseng is very difficult owing to the extensive repetitive regions [52,53] and, in particular, paralogous regions arising from recent whole genome duplication events [19,54]. Accordingly, molecular marker development should be performed with caution [55]. An approach based on the mitogenome or chloroplast genome could be more efficient. We successfully developed 10 KASP markers based on the SNPs in the ginseng mitogenome. The SNPs in MTPT regions were excluded to prevent misinterpretation caused by simultaneous amplification in the two organellar genomes [56]. The 10 KASP markers developed in this study were validated and yielded clear genotypes for many ginseng accessions simultaneously. We also characterized the mitogenome diversity in ginseng using the newly developed KASP markers. In previous studies, ginseng showed low variation rates in the nuclear genome; however, high levels of diversity at the polymorphic loci were observed [25,57,58]. The ginseng mitogenome exhibited a lower diversity than that of the nuclear genome; however, there were many more polymorphisms in the mitogenome than in the chloroplast genome. Although the GD and PIC values were low because the markers are mitogenome-derived, the ginseng accessions were effectively classified based on mitogenome haplotypes using these KASP markers. In addition, since the mitogenome shows uniparental inheritance and has more polymorphisms than the chloroplast genome, it is advantageous for classifying resources and for pedigree analyses. These KASP markers will strongly support ginseng breeding, for which molecular breeding systems are not yet well established.
In this study, we sequenced the complete ginseng mitogenome using the official cultivar "Gumpoong" based on long-read data. Through a comprehensive sequence analysis, the structural characteristics of the ginseng mitogenome were determined, and an unusually high level of variation was observed. In addition, we developed 10 KASP markers to evaluate the diversity in ginseng mitogenomes and to classify ginseng accessions. To our knowledge, this is the first analysis of mitogenome diversity in ginseng. These results broaden our understanding of the genetic diversity in ginseng and provide insight into the organellar genome. Furthermore, the complete mitogenome sequence will provide essential information for further studies, and the newly developed KASP markers will be essential genetic tools for efficient ginseng breeding.