Comparative Analysis of Complete Chloroplast Genome Sequences and Insertion-Deletion (Indel) Polymorphisms to Distinguish Five Vaccinium Species

Abstract: We report the identification of interspecific barcoding InDel regions in Vaccinium species. We compared five complete Vaccinium chloroplast (cp) genomes (V. bracteatum, V. vitis-idaea, V. uliginosum, V. macrocarpon, and V. oldhamii) to identify regions that can be used to distinguish them. Comparative analysis of nucleotide diversity across the five cp genomes revealed 25 hotspot coding and noncoding regions, occurring in 65 of a total of 505 sliding windows, that exhibited nucleotide diversity (Pi) > 0.02. PCR validation of 12 hypervariable InDel regions identified seven candidate barcodes with high discriminatory power: accD-trnT-GGU, rpoB-rpoA, ycf2-trnL-GAA, rps12-ycf15, trnV-GAC, and ndhE-ndhF. Among them, the rpoB-rpoA(2) and ycf2-trnL-CAA sequences clearly showed the intraspecific and interspecific distances among the five Vaccinium species under the Kimura 2-parameter (K2P) model. In the phylogenetic analysis, Bayesian and Neighbor-Joining (NJ) trees that included the five Vaccinium species (n = 19) recovered all species in two major clades and resolved the taxonomic positions within species groups. These two loci provide comprehensive information that aids the phylogenetics of this genus and increases discriminatory capacity during species authentication.

mvista/submit.shtml) in Shuffle-LAGAN mode was used to compare the four Vaccinium cp genomes, using the V. macrocarpon cp genome as a reference. DnaSP version 6.0 [23] was used to calculate nucleotide diversity (Pi) among the five Vaccinium cp genomes. Only regions with a nucleotide diversity (Pi) value of >0.02 were considered. CPGAVAS2 [24] was used to annotate the cp genomes and predict the rRNA/tRNA sequences of V. macrocarpon and V. oldhamii. The LSC/IRB/SSC/IRA junctions of these related species were compared and visualized with IRscope (http://irscope.shinapps.io/irapp/), based on the annotations of their available cp genomes in GenBank.
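The sliding-window Pi screen described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it assumes the five cp genomes are already aligned to equal length and written in uppercase, it skips any alignment column containing a gap or ambiguous base, and the window length (600 bp) and step (200 bp) are placeholder values chosen for illustration, since the text does not state them. The function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (Pi).

    Columns containing anything other than A, C, G or T are skipped,
    which is an assumption of this sketch.
    """
    cols = [c for c in zip(*seqs) if all(b in "ACGT" for b in c)]
    if not cols or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(sum(1 for c in cols if c[i] != c[j]) for i, j in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, Pi) pairs along the alignment.

    window and step are illustrative defaults, not values from the study.
    """
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


def hotspots(windows, threshold=0.02):
    """Keep only windows whose Pi exceeds the threshold (Pi > 0.02 in the text)."""
    return [(mid, pi) for mid, pi in windows if pi > threshold]
```

Calling `hotspots(sliding_pi(aligned_genomes))` on a list of aligned sequences would return the window midpoints that pass the Pi > 0.02 cut-off; adjacent windows can then be merged into candidate hotspot regions.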

Development and Validation of the Candidate DNA Barcodes
To validate interspecies polymorphisms within the chloroplast genomes, specific primers were designed using Primer3Plus, based on hotspot regions with high nucleotide diversity identified in these Vaccinium cp genomes [25]. All DNA and fresh leaf samples are listed in Table 1 and were identified based on our previous study [26]. For the extraction of total genomic DNA, fresh leaf samples of all species (80 mg wet weight) were added to a tube filled with stainless steel beads (2.38 mm in diameter) from a PowerPlantPro DNA Isolation Kit (Qiagen, Valencia, CA, USA), and the mixture was homogenized in a Precellys® Evolution homogenizer (Bertin Technologies, Montigny-le-Bretonneux, France). Genomic DNA was extracted using the PowerPlantPro DNA Isolation Kit according to the manufacturer's instructions. PCR amplifications were performed in a reaction volume of 50 µL containing 5 µL 10× Ex Taq buffer (with MgCl2), 4 µL dNTP mixture (2.5 mM each), Ex Taq (5 U/µL; Takara, Japan), 10 ng genomic DNA, and 1 µL (10 pM) forward and reverse primers. The mixtures were denatured at 95 °C for 5 min and amplified for 40 cycles at 95 °C for 30 s, 55 °C for 20 s, and 72 °C for 30 s, with a final extension at 72 °C for 5 min. The target DNA was extracted and purified using a MinElute PCR Purification Kit (Qiagen). PCR products were visualized on 1.5% agarose gels with ethidium bromide. Purified PCR products were sequenced by CosmoGenetech (Seoul, Korea) using forward and reverse primers. The sequencing results were analyzed by BLAST searches of the GenBank database. Sequence alignment and data visualization were carried out using CLC Sequence Viewer 8.0 [27]. The specimens (Table 1) were deposited at the National Institute of Biological Resources (NIBR) and the Jeollanamdo Institute of Natural Resources Research (JINR), Korea.
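The length differences scored for InDel markers correspond to runs of gap characters in the aligned amplicon sequences. The following Python sketch shows one way to list such runs from a pairwise alignment. It assumes two sequences that have already been aligned to the same length with "-" as the gap character; the function name and the insertion/deletion labels (defined relative to the first sequence) are hypothetical and are not part of CLC Sequence Viewer or of the workflow used in this study.

```python
def find_indels(ref_aln, qry_aln, gap="-"):
    """List gap runs in a pairwise alignment as (start, length, kind).

    kind is "insertion" when the reference row carries the gap and
    "deletion" when the query row carries it. start is a 0-based
    alignment column. Columns gapped in both rows are ignored.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    events = []
    run_start, run_kind = None, None
    for pos, (r, q) in enumerate(zip(ref_aln, qry_aln)):
        if r == gap and q != gap:
            kind = "insertion"
        elif q == gap and r != gap:
            kind = "deletion"
        else:
            kind = None
        if kind != run_kind:
            # A run ends whenever the column type changes.
            if run_kind is not None:
                events.append((run_start, pos - run_start, run_kind))
            run_start, run_kind = pos, kind
    if run_kind is not None:
        events.append((run_start, len(ref_aln) - run_start, run_kind))
    return events


def amplicon_length_shift(events):
    """Net length difference of the query amplicon relative to the reference."""
    return sum(n if kind == "insertion" else -n for _, n, kind in events)
```

A nonzero `amplicon_length_shift` indicates a product-size difference between the two species for that primer pair, which is what makes the locus usable as a length-based marker.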
The sequences of the seven loci obtained from each Vaccinium species in this study were compared with Vaccinium chloroplast genomes in GenBank (accession numbers LC521967, LC521968, LC521969, NC_019616, and NC_042713) using the Basic Local Alignment Search Tool (BLAST, available at http://blast.ncbi.nlm.nih.gov/Blast.cgi). Rhododendron delavayi (MN711645) and Rhododendron pulchrum (MN182619) were used as outgroup taxa. The sequences were aligned using the Clustal W algorithm implemented in MEGA ver. 7.0. The phylogenetic tree was constructed using the neighbor-joining (NJ) method in MEGA, with the Kimura 2-parameter (K2P) model and bootstrap analysis with 1000 replicates. Genetic distances were also calculated using the K2P model. Bayesian analysis was conducted with MrBayes ver. 3.2 using two replicates of 1 million generations under the best-fit nucleotide substitution model, GTR + I + G, which was selected using the Akaike Information Criterion (AIC) in MrModeltest ver. 2.3.
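For reference, the K2P distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below applies this formula directly. It is an illustration, not the MEGA implementation: it assumes pre-aligned sequences, skips any site where either sequence has a gap or ambiguous base, and returns infinity when the correction is undefined. The function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption of this sketch)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return float("inf")  # sequences too divergent for the correction
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Computing this value for every pair of sequences yields the distance matrix that an NJ tree is built from, and comparing within-species against between-species values gives the intraspecific and interspecific distances discussed in the text.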

Comparative Analyses of the Chloroplast Genome of Five Vaccinium Species
Three Vaccinium cp genome sequences were previously deposited in NCBI GenBank and published [26]. We compared the features of these newly sequenced cp genomes with those of V. macrocarpon (NC_019616) and V. oldhamii (NC_042713), which were already available in NCBI GenBank. The five Vaccinium cp genomes each contained a pair of inverted repeat regions (IRs: 30,637-34,242 bp), which were separated by a small single copy region (SSC: 2979-3518 bp) and a large single copy region (LSC: 104,552-106,565 bp). The five genomes varied in gene number: the total ranged from 117 to 147, protein-coding genes from 75 to 85, and tRNA genes from 30 to 38. All five cp genomes contained 8 rRNA genes. The overall GC content in each cp genome was approximately 37.1% (Table 2). These genes can be classified into five categories based on their different roles in the chloroplast (Table 3). The rpoA, rps7, rps12, rps16, petB, and petD genes were present in V. macrocarpon and V. oldhamii, but not in the other three species. The trnG-GCC gene was absent from V. macrocarpon and V. oldhamii. The trnD-GUC, trnfM-CAU, trnK-UUU, trnR-UCU, trnV-UAC, trnY-GUA, and rpl2 genes were absent from V. bracteatum. The rpl20 and psbZ genes were absent from V. macrocarpon. The lhbA, infA, ycf2, ycf15b, and ycf68 genes were present only in V. macrocarpon.
Table 3. List of genes encoded by the five Vaccinium chloroplast genomes. a, gene with two copies; *, gene with one intron; **, gene with two introns; •, gene present; -, gene loss; Vb, Vaccinium bracteatum; Vu, Vaccinium uliginosum; Vv, Vaccinium vitis-idaea; Vm, Vaccinium macrocarpon; Vo, Vaccinium oldhamii.

Table 3 (body not recovered). Columns: Gene Category, Gene Group, Gene Names, Vb, Vu, Vv, Vm, Vo. Recovered row labels: Large subunit of ribosome; Gene of unknown function; Open reading frame; ycf4.
In addition, we observed variation in the copy numbers and intron numbers of several genes. Six protein-coding genes, four rRNA genes, and two tRNA genes were present in two copies. Furthermore, rpoA, rps3, rps18, and rpl22 had two copies only in V. macrocarpon. Moreover, two copies of the rps12 gene were identified in both V. macrocarpon and V. oldhamii. Fourteen genes contained introns; these included the rpoC1 RNA polymerase gene, seven tRNA genes, and five protein-coding genes. The rpoC1, trnA-UGC, trnI-GAU, ndhA, and ndhB genes contained one intron in all five cp genomes. The trnG-UCC and rps16 genes contained one intron in the four cp genomes other than V. macrocarpon. The trnK-UUU gene in V. bracteatum and V. macrocarpon, and the trnV-UAC gene in V. bracteatum, contained no introns, while rps3 had one intron in the three cp genomes other than V. macrocarpon and V. oldhamii. The psbA, petB, and petD genes in V. macrocarpon and V. oldhamii contained one intron. Only the ycf3 gene had two introns in each of the five cp genomes. All of the above divergences are shown in Table 3.
We compared the border structure of the five cp genomes in detail (Figure 1). IR regions contained rpl32 and the IRA/LSC border contained a part of the psbA gene. V. vitis-idaea contained two copies of the psbA gene: one in the IRA/LSC border and the other in the IRB region. The ndhF gene was located in the SSC region, between 87 and 203 bp away from the borders. rpl32 resided in IRB, 616-669 bp away from the SSC/IRA border. In V. uliginosum and V. vitis-idaea, the 38 bp trnV-UAC gene was located in the LSC region, while V. macrocarpon had a 661 bp variant, and V. bracteatum lacked this gene.

Discussion
In our previous work, we sequenced the cp genomes of V. bracteatum, V. vitis-idaea, and V. uliginosum using the Illumina HiSeq platform, which provided resources for evolutionary and genetic studies of Vaccinium [26]. Although cp genome sequences of a few other Vaccinium species, such as V. macrocarpon and V. oldhamii, have recently been submitted to the NCBI GenBank database, most research has been limited to "core" DNA barcodes, with resolution limited to the species level. By comparing the gene structure, content, and arrangement of five Vaccinium cp genomes, we detected valuable variations in intergenic spacer lengths, which could serve as interspecific DNA barcodes.
Of these five species, V. macrocarpon had the largest cp genome and IR length; the other species exhibited minor differences in genome and IR length, whereas V. vitis-idaea and V. oldhamii had the largest SSC and LSC lengths, respectively. All five cp genomes contained variation in protein-coding and tRNA genes, with the exception of V. uliginosum and V. vitis-idaea, which were identical to the reference. All cp genomes had identical rRNA genes. The ycf15 and ycf68 genes were lost in three of the cp genomes and were inferred to be pseudogenes in V. macrocarpon. Their function as hypothetical genes is ambiguous in various land plants [28]. The infA gene, which codes for a translation initiation factor, was missing in all species other than V. macrocarpon. Millen et al. [29] demonstrated at least 24 independent losses of infA in angiosperms, with a transfer into the nucleus in four lineages.
DNA barcodes are universal DNA sequences, such as rbcL, trnH-psbA, and matK, that have a high mutation rate. Their use allows researchers to distinguish a species within a given taxon and to reliably identify plant species. Because the core DNA barcodes lack sufficient variation between closely related taxa, none of them works across all plant species [17]. With advances in NGS technologies, recent barcoding studies have focused on the use of whole-chloroplast genome-based barcodes. Because these barcodes are more efficient at detecting gene loss and determining gene order than the established DNA barcodes, they are better able to distinguish between closely related taxa [30]. The continuing advances in NGS technology may make them the method of choice for plant identification. Compared with SNPs and SSRs, InDels have received more attention recently; they are relatively abundant, spread throughout the genome, contribute to both intra- and interspecific variation, and are suitable for fast and cost-effective genotyping.
Our phylogenetic analysis of the rpoB-rpoA(2) and ycf2-trnL-CAA sequences confirmed two major clades (A and B): clade A comprises the species with red fruit (closely related to cranberry), and clade B comprises the species with dark blue fruit (closely related to blueberry). Previous studies reported that V. macrocarpon is closely related to V. vitis-idaea based on a phylogenetic analysis of nrITS sequence data [31,32]. We suggest that these intergenic spacers are suitable for use as DNA barcodes; they have good priming sites and exhibit both length variation and interspecific variation. This study analyzed a limited number of Vaccinium species based on available cp genomes; more complete cp genome sequences are needed to resolve the comprehensive phylogenies and genetic divergence within the Vaccinium genus.