Mining Single Nucleotide Polymorphism (SNP) Markers for Accurate Genotype Identification and Diversity Analysis of Chinese Jujube (Ziziphus jujuba Mill.) Germplasm

Chinese jujube (Ziziphus jujuba Mill.) is an economically important fruit tree with outstanding adaptability to marginal lands and a broad range of climate conditions. There are over 800 cultivars, mostly landraces from China. However, a high rate of mislabeling in Chinese jujube germplasm restricts the sharing of information and materials among jujube researchers and hampers the use of jujube germplasm in breeding. In the present study, we developed a large panel of single nucleotide polymorphism (SNP) markers and validated 288 SNPs by genotyping 114 accessions of Chinese jujube germplasm. The validation resulted in the designation of a set of 192 polymorphic SNP markers that revealed a high rate of synonymous mislabeling in the jujube germplasm collection in Ningxia, China. A total of 17 groups of duplicates were detected, encompassing 49 of the 114 Chinese jujube cultivars. Model-based population stratification revealed two germplasm groups, and the core members of the two groups showed a significant genetic differentiation (Fst = 0.16). The results supported the hypothesis that the cultivated Chinese jujube had multiple origins and multiple regions of domestication. The Neighbor-Joining dendrogram further revealed that this collection comprises multiple sub-groups, each including 1-13 closely related cultivars. Parentage analysis of cultivars with known pedigree information proved the efficacy of using these SNP markers for parentage verification. A subset of 96 SNPs with high information index was selected for future downstream applications, including gene bank management, verification of pedigrees in breeding programs, quality control for propagation of planting materials, and support of the traceability and authentication of jujube products. This is the first SNP discovery and validation study in jujube, demonstrating the utility of published genomic resources as an approach for rapid development of high-quality genotyping tools.


Introduction
Chinese jujube (Ziziphus jujuba Mill.) is a diploid fruit crop (2n = 2x = 24) in the Rhamnaceae family. The species is native to China, with its putative center of origin located in the Yellow River basin [1–3]. Chinese jujube (hereafter referred to as jujube) is a multipurpose tree cultivated for its fruits and is of great economic importance. It is one of the earliest domesticated fruit trees in China, with a history of utilization going back more than 7000 years [2–4]. Recent research suggests that the current cultivars of Chinese jujube were originally selected from sour jujubes (Ziziphus jujuba Mill. var. spinosa), which are still widely distributed in Northern, Central, and Southwestern China [1,3,4]. Jujube has become increasingly popular in China and abroad for its outstanding adaptability, nutritious fruits, and many attributes that are utilized in food and traditional medicine. It is an ideal economic crop for arid and semiarid areas of temperate and subtropical regions where most common fruit trees cannot be grown. Presently it ranks 7th among fruit tree [...] These SNP markers, as well as the genotyping method, will be particularly useful for jujube germplasm management, breeding programs, and propagation of planting materials.

Discovering Jujube SNP Markers through Data Mining
SNP data mining was performed using sequence data of 36 Ziziphus jujuba genotypes (SRR3095649 to SRR3095689, SRR3310162 to SRR3310166, SRR5041640, SRR5041641, SRR5041644, SRR5041645), as well as the related species Ziziphus mauritiana (SRR6267272) and Ziziphus spina-christi (SRR6277366), all of which were deposited in the NCBI Sequence Read Archive (SRA) database. The SRA reads were downloaded and mapped to the jujube reference genome (JREP00000000) [17] using the BWA program [18]. The Genome Analysis Toolkit (GATK) package v3.5 [19] was used for SNP calling using HaplotypeCaller with default parameters. Hard filters (parameters: QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0) were then applied to exclude low-quality alleles. Four sequencing datasets (SRR3081153, SRR3081197, SRR3081340, and SRR3081342), which had been used in the jujube genome assembly, were also downloaded and included in the GATK SNP calling steps. These data were used as internal references to correct erroneous or ambiguous sequences in the jujube genome assembly. Loci that were polymorphic among the 36 jujube genotypes (minor allele frequency, MAF > 0.10) were selected as candidate SNP loci. To select high-quality SNPs for experimental validation, any SNP that had another possible SNP site within 80 bp upstream or 80 bp downstream was eliminated. From the discovered putative SNPs, a subset of 288 was selected for validation using the nanofluidic array genotyping system (Fluidigm Co, South San Francisco, CA, USA). The primers for the selected 288 SNPs were designed by Fluidigm and tested on the selected jujube cultivars for validation.
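The two filtering rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on GATK and an in-house Perl script); the function names, the input layout, and the reading of "within 80 bp" as a distance of 80 bp or less are assumptions made for illustration. Real VCF records can also lack some annotations, a case this sketch does not handle.

```python
from typing import Dict, List


def fails_hard_filter(qd: float, fs: float, mq: float,
                      mq_rank_sum: float, read_pos_rank_sum: float) -> bool:
    """Return True if a site trips any hard-filter threshold quoted in the text."""
    return (qd < 2.0 or fs > 60.0 or mq < 40.0
            or mq_rank_sum < -12.5 or read_pos_rank_sum < -8.0)


def drop_crowded_snps(positions_by_chrom: Dict[str, List[int]],
                      flank: int = 80) -> Dict[str, List[int]]:
    """Keep only SNPs with no other SNP within `flank` bp on either side.

    Assumption: a neighbour at exactly `flank` bp counts as too close.
    """
    kept: Dict[str, List[int]] = {}
    for chrom, positions in positions_by_chrom.items():
        ordered = sorted(set(positions))
        isolated = []
        for i, pos in enumerate(ordered):
            left_clear = i == 0 or pos - ordered[i - 1] > flank
            right_clear = i == len(ordered) - 1 or ordered[i + 1] - pos > flank
            if left_clear and right_clear:
                isolated.append(pos)
        kept[chrom] = isolated
    return kept


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    toy = {"chr1": [100, 150, 400, 1000, 1080, 2000]}
    print(drop_crowded_snps(toy))  # {'chr1': [400, 2000]}
```

In the toy input, positions 100 and 150 exclude each other, as do 1000 and 1080, so only the two isolated sites remain.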

Plant Materials and DNA Extraction
A total of 114 jujube cultivars (Table 1) were used in the present study. These jujube germplasm accessions were maintained in the jujube collection in Yinchuan, Ningxia, China. For DNA extraction, three fully expanded healthy leaves were harvested and freeze-dried. The DNeasy Plant Mini kit (Qiagen Inc., Valencia, CA, USA) was used to extract DNA from the dried jujube leaves. A TissueLyser II (Qiagen Inc.) was used to disrupt the dry leaf tissue samples with high-speed shaking (30 Hz for 1 min) using Lysing Matrix A (MP Biomedicals, Solon, OH, USA) as described in Fang et al., 2013. A NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA) was used to determine DNA concentration by absorbance at 260 nm and to estimate DNA purity from the 260:280 and 260:230 absorbance ratios.

Data Analysis
Duplicate cultivars were identified using pairwise multilocus matching among all individual samples. DNA samples that were fully matched at all genotyped SNP loci were considered the same cultivar or clones. The multilocus matching procedure implemented in the program GenAlEx 6.5 [20] was used for computation. The probability of identity among siblings (PID-SIB), which is the probability that two sibling individuals drawn at random from a population have the same multilocus genotype, was used to measure the statistical rigor of the matching result. The overall PID provides the minimum number of loci required to resolve all individuals and relatives in a group. After duplicate identification, the redundant samples were removed and only one genotype from each duplicate group was retained and included in the subsequent diversity analysis. Summary statistics, including minor allele frequency, observed heterozygosity, expected heterozygosity, and Shannon's information index, were computed using GenAlEx 6.5 [20].
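As an illustration of the matching step, the sketch below groups samples with identical calls at every locus and computes a cumulative PID-SIB as the product of per-locus values. It is a simplified stand-in for the GenAlEx procedure, not a reimplementation: exact string equality is assumed for matching (no special handling of missing calls), and the per-locus expression is the commonly cited one, PID-SIB = 0.25 + 0.5 Σp² + 0.5 (Σp²)² − 0.25 Σp⁴, where p are the allele frequencies at a locus. All function names and the toy genotypes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def find_duplicate_groups(profiles: Dict[str, Sequence[str]]) -> List[List[str]]:
    """Group samples whose genotype calls are identical at every locus."""
    groups = defaultdict(list)
    for name, calls in profiles.items():
        groups[tuple(calls)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def pid_sib_locus(freqs: Sequence[float]) -> float:
    """Per-locus probability of identity among siblings."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative_pid_sib(loci: Sequence[Sequence[float]]) -> float:
    """Multiply per-locus values, assuming the loci are independent."""
    result = 1.0
    for freqs in loci:
        result *= pid_sib_locus(freqs)
    return result


if __name__ == "__main__":
    # Made-up three-locus profiles, for illustration only.
    toy = {
        "sample_1": ["AA", "AG", "CC"],
        "sample_2": ["AA", "AG", "CC"],
        "sample_3": ["AG", "GG", "CT"],
    }
    print(find_duplicate_groups(toy))      # [['sample_1', 'sample_2']]
    print(pid_sib_locus([0.5, 0.5]))       # 0.59375
```

Because even a maximally informative biallelic locus has a PID-SIB of about 0.59, many loci are needed before the cumulative value becomes small, which is why a panel of this size is used.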
Population structure of the jujube samples was determined using the model-based Bayesian clustering software STRUCTURE v2.3.4 [21]. The admixture model was applied and the number of genetic clusters (K value) was set from 1 to 10. The analyses were carried out without assuming any prior information about the genetic groups or geographic origins of the samples. Ten independent runs were assessed for each fixed number of clusters (K value), each consisting of 100,000 iterations after a burn-in of 200,000 iterations. The Delta K value [22] was used to detect the most probable number of clusters using the online program STRUCTURE HARVESTER [23]. Cluster permutation across replicate runs was performed using the computer program CLUMPP v1.1.1 [24], and the resultant outputs were then visualized using the computer program Distruct v1.1 [25].
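The Delta K criterion can be sketched in a few lines of Python. This is an illustrative approximation rather than the STRUCTURE HARVESTER code: it assumes that the second-order difference is taken on the mean log-likelihoods across runs and divided by the sample standard deviation of the runs at that K. The log-likelihood values below are invented.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using run means for L."""
    means = {k: mean(runs) for k, runs in ln_prob.items()}
    result: Dict[int, float] = {}
    for k in sorted(ln_prob):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the smallest and largest K tested
        spread = stdev(ln_prob[k])
        if spread == 0:
            continue  # skip K values where all runs agree exactly
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


if __name__ == "__main__":
    # Invented log-likelihoods for two runs at each K, for illustration only.
    runs = {1: [-100.0, -102.0], 2: [-80.0, -82.0],
            3: [-78.0, -80.0], 4: [-77.0, -79.0]}
    scores = delta_k(runs)
    print(max(scores, key=scores.get))  # 2
```

The K with the largest Delta K is taken as the most probable number of clusters; note that the statistic cannot be evaluated at K = 1, so it cannot by itself show that a sample is unstructured.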
Distance-based multivariate analysis was performed on the individual data. Pairwise genetic distances were computed using the Distance option, followed by Principal Coordinates Analysis (PCoA), within the GenAlEx 6.5 program [20]. Neither the distance nor the covariance was standardized. In addition, a cluster analysis using the neighbor-joining (NJ) method was used to further examine the genetic relationships among the cultivars with unique SNP profiles. Nei's distance [26] was chosen as the genetic distance measure for the individual accessions and was computed with the program MICROSATELLITE ANALYZER [27]. A dendrogram was generated from the resulting distance matrix using the NJ algorithm available in PHYLIP version 3.697 [28], and the tree was visualized with the program FigTree v1.4.3 [29].
To test the efficacy of using these SNPs for pedigree verification in jujube germplasm, seven cultivars with known parents (per literature records) were selected for parentage analysis (Table 1). It is known that these seven cultivars were selected from true seedlings (in contrast to clonal selection), but their parentage was either partially known or unknown. These cultivars were considered 'offspring', for which parentage analyses were carried out using the rest of the cultivars as potential candidate parents. A likelihood-based method implemented in the program CERVUS 3.0 was used for computation [30,31]. A likelihood ratio (LOD score) was calculated for each parent-offspring pair. Critical LOD scores were determined for the assignment of parentage to a group of individuals without knowing the maternal or paternal parent. Simulations were run for 10,000 cycles, assuming that 20% of candidate parents were sampled and that 95% of loci were typed with a 1% typing error rate. The most probable single mother (or father) for each offspring was identified based on the critical difference in LOD scores (D) between the most likely and the next most likely candidate parents at greater than 95% confidence [30,31].
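The logic behind parent-offspring testing can be illustrated with a much simpler exclusion check than the likelihood method used here. The sketch below is not the CERVUS algorithm and computes no LOD scores; it only counts Mendelian incompatibilities, that is, loci where an offspring and a candidate parent share no allele (for example AA versus GG). The genotype encoding, the function names, and the toy data are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[str]  # e.g. "AG"; None marks a no-call


def mendelian_mismatches(offspring: Sequence[Genotype],
                         candidate: Sequence[Genotype]) -> Tuple[int, int]:
    """Return (mismatching loci, loci compared) for one offspring-candidate pair."""
    mismatches = compared = 0
    for child, parent in zip(offspring, candidate):
        if child is None or parent is None:
            continue  # skip loci with a missing call in either sample
        compared += 1
        if not set(child) & set(parent):
            mismatches += 1  # no shared allele: incompatible with parentage
    return mismatches, compared


def rank_candidates(offspring: Sequence[Genotype],
                    candidates: Dict[str, Sequence[Genotype]]
                    ) -> List[Tuple[str, int, int]]:
    """Sort candidates by fewest mismatches, then by most loci compared."""
    scored = [(name, *mendelian_mismatches(offspring, calls))
              for name, calls in candidates.items()]
    return sorted(scored, key=lambda row: (row[1], -row[2]))


if __name__ == "__main__":
    # Made-up genotypes, for illustration only.
    child = ["AG", "CC", "TT", None]
    pool = {"candidate_A": ["AA", "CT", "TT", "GG"],
            "candidate_B": ["GG", "TT", "TT", "AG"]}
    print(rank_candidates(child, pool))
```

A likelihood approach such as the one in CERVUS additionally weighs allele frequencies and an assumed typing error rate, so a single mismatch does not automatically exclude a candidate.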
To facilitate future large-scale application of these SNPs in jujube genotyping, a core set of 96 SNPs was selected out of the 192 SNPs. The Quality Assurance Module from SNP Variation Suite version 8 (SVS8; Golden Helix Inc., Bozeman, MT, USA) was applied to remove SNPs that were in a high level of linkage disequilibrium (LD) with each other (r² ≥ 0.5). The final core set of 96 SNPs was then selected based on Shannon's information index values. The accumulated PID value was computed for these 96 SNP markers following the method of Waits et al. (2001), using GenAlEx 6.5 [20]. Genetic distances among the jujube cultivars were computed using the selected 96 SNP markers. A Mantel test was performed between the distance matrix based on the full panel of 192 SNPs and the matrix based on the selected 96 SNPs, using the same computer program.

Data Mining and SNP Discovery
A total of 41 files containing high-throughput sequencing data were downloaded from NCBI, accounting for 376.43 gigabases (Gb). NGS QC Toolkit (v. 2.3.2) software was used to remove reads with 20% or more low-quality bases (Phred score < 20) [32]. High-quality reads from all sequencing data were then aligned using the short-read mapping program BWA. SNP calling with GATK was performed separately for each sequencing dataset, resulting in a large number of potential SNPs. On average, around 2,500,000 potential SNPs were called in each jujube genotype. An in-house Perl script was then used to merge all potential loci, resulting in a total of 11,366,557 SNPs. To select high-quality SNPs for experimental validation, SNP sites having an adjacent SNP site within 80 bp upstream or 80 bp downstream were eliminated. In total, 32,249 putative SNPs suitable for experimental validation were obtained; they were located in both coding and intergenic regions and covered all jujube chromosomes. Detailed information on these putative SNPs is presented in the Supplementary Materials (Supplementary Data 1).
A total of 288 putative SNPs were selected for validation testing. Out of the 288 SNP markers, only 12 failed, likely due to sequence complexity or the presence of polymorphisms within the flanking sequences. Among the 276 successful SNPs, 40 were either monomorphic across the 114 samples (i.e., only one SNP variant was identified in all individuals) or had a minor allele frequency lower than 0.02. These monomorphic markers may have resulted from errors in sequencing, which then led to the incorrect identification of SNPs. It is also possible that some of these SNPs correspond to rare alleles that were not present in the tested set of jujube accessions. From the remaining 236 SNP markers, a total of 192 polymorphic SNPs were selected based on their no-call rate and the consistency of the genotyping results. Primers with a no-call rate above 5% were excluded. This final set of 192 SNPs was included in the subsequent data analysis. The flanking sequences for these 192 SNPs are listed in Supplementary Data 2, whereas the genotyping results generated by the 192 SNP markers for all 114 analyzed Chinese jujube cultivars are presented in Supplementary Data 3.
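The marker counts in this paragraph form a simple funnel, which can be checked with a few lines of Python. The sketch only restates the numbers given above; the function and key names are arbitrary.

```python
from typing import Dict


def validation_funnel(selected: int = 288, failed: int = 12,
                      monomorphic_or_rare: int = 40,
                      final_panel: int = 192) -> Dict[str, float]:
    """Recompute the marker counts and percentages reported in the text."""
    successful = selected - failed                    # assays that produced calls
    polymorphic = successful - monomorphic_or_rare    # informative markers
    return {
        "successful": successful,
        "polymorphic": polymorphic,
        "polymorphic_pct": round(100 * polymorphic / selected, 1),
        "final_panel_pct": round(100 * final_panel / selected, 1),
    }


if __name__ == "__main__":
    print(validation_funnel())
    # {'successful': 276, 'polymorphic': 236,
    #  'polymorphic_pct': 81.9, 'final_panel_pct': 66.7}
```

About 82% of the tested SNPs were therefore successful and polymorphic, and two-thirds of the tested SNPs were retained in the final panel.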

Cultivar Identification
SNP profiles of multiple leaf samples from the same jujube cultivars showed that the genotyping results were highly consistent, as shown by the high repeatability of the internal controls (Supplementary Data 3). An example of the multilocus SNP data among jujube cultivars is presented in Table 2. Multilocus matching of SNP fingerprints revealed a high rate of duplicates in this jujube collection. Out of the 114 analyzed cultivars, a total of 49 cultivars could be classified into 17 synonymous groups (Table 3). The number of cultivars in each synonymous group ranged from two to eight. The probability that two jujube cultivars will have the same genotype at the 192 SNP loci is approximately 1 in 1,000,000, as computed by the multilocus matching procedure in GenAlEx 6.5 [20]. From each of the synonymous groups, only one cultivar was retained and used for the subsequent diversity analysis. This procedure led to the identification of 79 genotypes that had unique SNP profiles.
Descriptive statistics were then computed for the 192 polymorphic SNPs across the 79 jujube cultivars, and the results are presented in Supplementary Data 4. The mean information index was 0.577, ranging from 0.010 to 0.693. The observed heterozygosity ranged from 0.013 to 0.842 with an average of 0.355, whereas the mean expected heterozygosity was 0.350, ranging from 0.008 to 0.500 (Supplementary Data 4). Based on Shannon's Information Index, a subset of 96 SNP markers was selected (Supplementary Data 2). Every single cultivar could be distinguished by the combined use of these 96 SNPs. The accumulated PID of these 96 SNPs was 6.37 × 10⁻¹². Correlation between the full-panel (192 SNPs) and the core-panel (96 SNPs) matrix of genetic distance was highly significant (r = 0.8075, p < 0.01), as shown by the Mantel Test (Figure 1).
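The per-locus statistics reported above follow from simple formulas, sketched here in Python under the usual definitions: Shannon's information index I = −Σ p ln p, expected heterozygosity He = 1 − Σ p², and observed heterozygosity as the fraction of heterozygous calls. For a biallelic SNP, both I and He peak when the two alleles are equally frequent, at ln 2 ≈ 0.693 and 0.5, which are the upper ends of the ranges reported above. The function names and the genotype encoding are assumptions for illustration.

```python
from math import log
from typing import Optional, Sequence


def shannon_index(freqs: Sequence[float]) -> float:
    """Shannon's information index, I = -sum(p * ln p)."""
    return -sum(p * log(p) for p in freqs if p > 0)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """He = 1 - sum(p^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def observed_heterozygosity(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of called genotypes (e.g. 'AG') that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g[0] != g[1]) / len(called)


if __name__ == "__main__":
    print(round(shannon_index([0.5, 0.5]), 3))      # 0.693
    print(expected_heterozygosity([0.5, 0.5]))      # 0.5
```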

Population Stratification
Population stratification of the 79 jujube accessions, based on the ΔK value computed by STRUCTURE HARVESTER, revealed two clusters (Figure 2) as the most probable number of K [22]. At a high assignment coefficient value (Q > 0.80), the first group included 21 core members, whereas the second group included 24. The remaining 34 cultivars were classified as admixed genotypes (Figure 3 and Table 4). The three groups did not show a consistent pattern of geographical origin (i.e., each group included jujube cultivars from different provinces). Nonetheless, in the first group of core members, two-thirds of the cultivars were from Shanxi and Shaanxi, whereas in the second group only 15% of the cultivars were from these two provinces.
Analysis of molecular variance (AMOVA) showed that both the within- and among-group variations were highly significant, accounting for 84% and 16% of the total molecular variance, respectively (Figure 4). Pairwise Fst between the two groups was 0.16, and the result of the permutation test was highly significant (p < 0.001), showing a significant genetic differentiation between these two groups.
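The assignment rule used above (core member if Q > 0.80, otherwise admixed) can be written as a short Python sketch. The input layout, a mapping from sample name to its membership coefficients, and the Q values shown are hypothetical; only the 0.80 threshold comes from the text.

```python
from typing import Dict, Sequence


def classify_by_q(q_values: Dict[str, Sequence[float]],
                  threshold: float = 0.80) -> Dict[str, str]:
    """Label each sample by its dominant cluster, or 'admixed' below the threshold."""
    labels: Dict[str, str] = {}
    for name, q in q_values.items():
        best = max(range(len(q)), key=lambda k: q[k])
        # Strictly greater than the threshold, as in "Q > 0.80".
        labels[name] = f"group {best + 1}" if q[best] > threshold else "admixed"
    return labels


if __name__ == "__main__":
    # Invented membership coefficients for K = 2, for illustration only.
    toy = {"s1": [0.95, 0.05], "s2": [0.10, 0.90],
           "s3": [0.60, 0.40], "s4": [0.80, 0.20]}
    print(classify_by_q(toy))
    # {'s1': 'group 1', 's2': 'group 2', 's3': 'admixed', 's4': 'admixed'}
```

With this rule the three classes reported above (21, 24, and 34 cultivars) sum to the 79 unique genotypes.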


[Figure 4. Percentages of molecular variance: among populations, 16%; within populations, 84%.]

PCoA and Clustering Analysis
Genetic relationships among the analyzed jujube accessions are presented in the principal coordinates analysis (PCoA) plots (Figure 5). The two core member groups assigned by the Bayesian clustering analysis were clearly distinguished without overlapping, showing the different genetic backgrounds of these two groups of cultivars. However, the geographical pattern was not clearly reflected in the PCoA.
The NJ tree revealed additional insight complementary to that provided by PCoA and Bayesian stratification (Figure 6). The NJ tree classified the jujube cultivars into 20 small sub-clusters, which were deeply separated. However, each of these 20 sub-clusters comprised 1-13 closely related cultivars. Some of these sub-clusters, such as Huizao, Ruchengzao, Lizao, Longzhu 1, etc., reflected specific geographical origins.

Parentage Analysis
Among the five cultivars that were indicated as true seedlings, four were assigned a paternal (or maternal) parent (>95% confidence level) that matched the records (Table 5). The only exception was 'Zaoqiuhong', whose maternal parent was supposed to be 'Dalingzao' from Shandong. The result further clarified that cv. 'Longzhu 1' and 'Longzhu 2', which were thought to be siblings from the same parents, were not related and had different parents. 'Longzhu 1' was a progeny of 'Lizao', whereas 'Longzhu 2' was a progeny of 'Dongzao'. The result of the parent-offspring assignment is also highly compatible with the cluster analysis (Figure 6), where the identified parent-offspring pairs were all grouped closely in the same sub-clusters.

Development of SNP Markers through Data Mining
Despite great progress in genomics research on jujube, availability of advanced molecular tools to support germplasm management has been scarce. Developing SNP markers using available sequences could fill the gap between genomic research and downstream applications by jujube breeders and genebank curators. In the present study, we developed 32,249 putative SNPs based on SRA sequences of jujube in a public database and used them to genotype a diverse panel of 114 jujube cultivars. We obtained a success rate of approximately 80% for marker validation, which demonstrated that this approach is effective and can thus serve as a shortcut for large-scale SNP development.

Jujube Cultivar Identification Using SNP Markers
Reliable identification of jujube cultivars is invaluable for the management of jujube genetic resources, the propagation of planting materials, and the breeding of new cultivars with desirable agronomic traits and quality attributes. The present study demonstrated that SNP marker fingerprinting is effective for assessing the genetic identity of jujube germplasm. Results from multiple clones of the same cultivar showed 100% concordance, demonstrating that the nanofluidic array system is a reliable platform for generating jujube DNA fingerprints with high accuracy.
The present results revealed a high rate of genetic redundancy in the tested jujube collection. This result is consistent with that of Xu et al. (2016), who reported that 47% of the analyzed germplasm accessions had at least one duplicated accession. This high rate of synonymous mislabeling can be explained by germplasm exchange. Jujube has a long cultivation history in China. Elite cultivars were introduced to different regions, and the long-term interregional cultivar exchange has resulted in extensive duplications in germplasm collections. Some of the identified duplicates are well-documented synonymous cultivars. For example, 'Jinsixiaozao' is a popular cultivar widely distributed in the provinces of the lower Yellow River valley, such as Shandong, Henan, Hebei, and Beijing. As shown in the present study, the same cultivar was labeled differently in different regions (e.g., 'Jinsixiaozao', 'Laolingxiaozao', 'Cangxiantunzizao', and 'Puyangxiaozao'), which caused duplications in ex situ genebanks. The same pattern was found in the elite cultivars 'Zanhuangchangzao', 'Zhongningyuanzao', and 'Minqinxiaozao'. Identification of these synonymous groups will significantly improve the accuracy and efficiency of the exchange, conservation, and use of jujube germplasm.
However, caution is needed when interpreting cultivars with the same SNP profiles. This is because somatic mutations are commonly reported in jujube and can modify many phenotypic traits, such as fruit skin color, flesh color, growth habit, and fruit quality attributes [3]. These somatic mutations have been the major source of variation exploited for the selection of new cultivars. For example, between 2007 and 2014, there were 11 newly released cultivars in China that were selections based on somatic mutations [33]. This challenge also exists in fingerprinting projects dealing with other vegetatively propagated crops, such as pineapple [10] and banana [34]. For these types of duplicates, more comprehensive genomic approaches, such as genome resequencing, would be needed for the detection of somatic mutations and copy number alterations in the corresponding genes or alleles. For this reason, the reduction of genetic redundancy in jujube genebanks should not be based on DNA fingerprints only. Characterization of the phenotypic traits of the synonymous group members is still essential to complement DNA fingerprinting for genotype identification.
In addition, genotyping of the jujube collection in Ningxia alone is not sufficient to fully correct the mislabeling in this collection. This is because most of the jujube accessions in the Ningxia repository were introduced from various jujube collections in other provinces in China. These germplasms are not necessarily authentic. Therefore, to correct the mislabeling in these introduced cultivars, the reference profiles of the original trees in the source genebanks need to be established using the same set of SNP markers. These reference SNP profiles then can be compiled and deposited in a jujube germplasm database, which should be publicly accessible, in order to make comparisons between reference standard and any tested cultivars or clones.

Parentage Verification for Improved Jujube Cultivars
In addition to accurate cultivar identification, accurate parentage and pedigree information is also imperative for jujube cultivar registration and protection of breeders' rights, as well as for the efficient use of germplasm in breeding programs. Although most jujube cultivars currently used in production are landraces, improved cultivars and breeders' selections are being released at an accelerated pace [9]. However, the recorded parentage has not always been clear for the released cultivars. Moreover, a recorded parent could itself be a mislabeled accession. The present study evaluated the efficacy of using the developed SNP panel for parentage verification. Among the five cultivars that have known parental cultivars or parentage background, four were proven to have the correct parent-offspring relationship, and one was found to be misreported.
The discrepancy between breeding records and observed SNP profiles was well illustrated by the example of 'Longzhu 2' and 'Longzhu 3'. These two varieties were recorded as siblings selected from the same progeny population of 'Dongzao' × 'Lizao'. However, the two were found to be duplicates in the present study, suggesting that the breeding record might be wrong. Nonetheless, more samples of 'Longzhu 2' and 'Longzhu 3', preferably from the original sources, need to be examined to confirm the observation. The results demonstrate the usefulness of these SNP markers for supporting jujube cultivar registration. Given that hybrid verification is of critical importance in jujube breeding because of self-compatibility in some germplasm accessions [9,32], these SNP markers could also be used by jujube breeders to effectively manage breeding lines based on marker-based parentage and family pedigree.

The Core Set of SNP Markers for Universal Jujube Cultivar Identification
Various molecular markers have been applied to jujube cultivar identification. However, the key challenge is to have a standard set of markers that allows cross-laboratory data comparison. Despite the high polymorphism of SSR markers, it is difficult to compare and combine SSR fingerprints generated by different laboratories or genotyping platforms (e.g., ABI, SEQ, or other gel box electrophoresis). Additional challenges of SSR genotyping include low accuracy, low efficiency, and high cost. Therefore, a small core set of SNP markers is needed for various downstream applications in the jujube value chain. Data generated by such a small set of SNPs can be easily compared, regardless of the genotyping platform used.
The present study selected 96 high-quality SNPs (out of the 192 SNPs reported here), which form a jujube genotyping kit. Markers showing a high level of linkage disequilibrium (LD) with each other were removed, and the retained markers have high polymorphism information content (PIC). The accumulated PID value demonstrated that this panel has sufficient statistical power for accurate identification of jujube cultivars. The Mantel test showed a high correlation (r = 0.91) between these 96 SNPs and the full panel of 192 SNPs. The generated SNP profiles can be converted into a simple bar code and be used in many other downstream applications, such as nursery accreditation, cultivar registration, and the authentication of geographically referenced jujube products.

Genetic Relationships among the Different Germplasm Groups
Bayesian stratification (Figure 3) showed that the 79 unique jujube cultivars (and synonymous groups) could be grouped into two different clusters. The partitioning result did not show a consistent pattern of geographical origin. The Fst value between the two germplasm groups was 0.16, demonstrating a substantial interpopulation differentiation and therefore supporting the hypothesis of significant regional differentiation of jujube germplasm [1,3].
Since the Evanno Delta K graph also showed secondary peaks at K = 3 and K = 5, we included the corresponding partitioning results in Supplementary Data 5. However, it is worth noting that at K = 3 and K = 5, a much larger proportion of the cultivars were classified as admixed. This was likely due to the relatively small sample size used in the present study. Indeed, out of the 800 or so existing germplasm accessions, only a small fraction was included in the present study. Full-scale sampling of the cultivated jujube gene pool, together with established reference standards, will be needed to correctly partition the jujube varieties into appropriate genetic clusters.
The NJ tree revealed complementary insight about the relationships among the 79 cultivars. The 19 small sub-clusters were deeply separated in the NJ tree (Figure 6), which suggests that there was a lack of crosses and recombination among these sub-clusters. However, each sub-cluster comprised several (up to 13) closely related cultivars, and some of them were exclusively from the same region. This observation indicates that these closely related cultivars may share a common ancestry or parentage. This type of clustering pattern suggests that the large number of jujube cultivars (>800) in China could have been derived from a much smaller number of progenitors that have not been crossed with each other extensively, either due to geographical separation or reproductive barriers (e.g., cross-incompatibility and self-fertilization).
This pattern of genetic structure in jujube germplasm suggests that there is great potential to explore heterosis between the germplasm clusters and sub-clusters. From the perspective of long-term germplasm conservation and genebank management, the present results also suggest that a much smaller collection can be sampled to represent most of the genetic diversity existing in the large number of jujube cultivars. In this way, more resources could be allocated to conserving other related taxa, ensuring that maximum genetic diversity in the primary gene pool of jujube is conserved.
In conclusion, we conducted a study to develop a large number of SNP markers for jujube germplasm management and genetic improvement. We validated a small set and applied it to fingerprint the jujube germplasm collection in Ningxia, China, using a nanofluidic array method. This approach enabled us to generate high-quality SNP profiles for accurate identification of jujube cultivars. This tool is highly useful for the management of jujube genetic resources and will also lead to more efficient selection of parental clones for jujube breeding. Furthermore, these SNP markers can be used to protect the intellectual property rights of breeders, to monitor the clonal purity of planting materials, and to authenticate premium jujube products. Our results also generated significant insight regarding the classification of jujube cultivars. For the identified synonymous groups, morphological characterization is underway to identify any somatic mutations that may have occurred within these groups. Genome resequencing will be applied to gain a comprehensive understanding of the genetic basis for mutation-based changes in important agronomic traits. This SNP-based genotyping approach will be highly useful in many other areas of the jujube industry.