Characterization and Development of Genomic SSRs in Pecan (Carya illinoinensis)

Research Highlights: The distribution of simple sequence repeat (SSR) motifs in two draft genomes of pecan was evaluated. Sixty-six SSR loci were validated by PCR amplification in pecan. Twenty-two newly developed markers can be used for genetic studies in the genus Carya. Background and Objectives: Pecan has good nutritional and health benefits and is an important crop worldwide. However, genetic research in this species is insufficient. One of the main reasons is the lack of accurate, convenient, and economical molecular markers. Among different marker types, SSR loci are enormously useful in genetic studies. However, the number of SSRs available for C. illinoinensis (Wangenh.) K. Koch is limited. Materials and Methods: The distribution of SSR motifs in the pecan genome was analyzed. Then, primers for each SSR were designed. To evaluate their usability, 74 SSR loci were randomly selected and amplified in pecan. Finally, 22 new SSRs and eight previously reported ones were used to evaluate the genetic diversity in 60 pecan genotypes and to determine their transferability to other Carya species. Results: In total, 145,714 and 143,041 SSR motifs were obtained from the draft genomes of ‘87MX3-2’ and ‘Pawnee’, respectively, and 9145 candidate primer pairs were obtained. Sixty-six (89.19%) of the 74 tested primer pairs amplified the target products. Among the 30 SSRs, 29 loci showed polymorphism in 60 pecan genotypes. The polymorphic information content (PIC) values ranged from 0.012 to 0.906. In total, 26, 25, and 22 SSRs can be used in C. cathayensis Sarg., C. dabieshanensis W. C. Cheng & R. H. Chang, and C. hunanensis W.C. Liu, respectively. Finally, a dendrogram of all individuals was constructed. The results agree with the geographic origin of the four species and the pedigree relationships between different pecan cultivars.
Conclusions: The characterization of SSRs in the pecan genome and the new SSRs will promote the progress of genetic study and breeding in pecan, as well as other species of genus Carya.


Introduction
Pecan (Carya illinoinensis (Wangenh.) K. Koch), native to North America, is an important crop worldwide [1]. Pecan nuts are rich in unsaturated fatty acids, phenolics, and flavonoids and have good nutritional and health benefits [2-4]. The nut shell contains high levels of bioactive compounds, including tocopherols, phytosterols, total phenolics, and condensed tannins, and shows antioxidant, antimicrobial, and potential anticancer activity [5,6]. Meanwhile, the high oil content (>70% of the fresh weight) and high monounsaturated fatty acid content of the nut make pecan an excellent oil crop [4]. In addition, the biomass waste of the tree makes pecan a potential energy crop [7]. Previous studies

Plant Materials and DNA Extraction
In total, 80 plant samples were used in this study, including 21 pecan cultivars (20 cultivars introduced from the USA), 14 excellent seedlings, 25 pecan seedling trees, 10 Chinese hickory strains, five C. hunanensis W. C. Cheng & R. H. Chang seedling trees, and five C. dabieshanensis W.C. Liu seedling trees (Table 1; Figure 1). Genomic DNA was extracted from buds or young leaves of each individual using a Plant DNA Extraction Kit (Tiangen, Beijing, China). All DNA was stored at −80 °C until use.

Amplification and Validation of gSSRs
A total of 74 gSSR loci were randomly selected. Genomic DNA from 'Pawnee' and 'Mahan' was used to amplify the target products and validate the usability of these primers. PCR reactions were performed in 15 μL reaction volumes containing 7.5 μL of 2 × Tsingke Master mix (Beijing Tsingke Biological Technology Co., Ltd., Beijing, China), 1 μL of each primer (10 pmol), 1 μL of genomic DNA, and 4.5 μL of ddH2O. The PCR procedure was 94 °C for 5 min, then 35 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 5 min. PCR products were electrophoresed on 1% agarose gels at 300 V for 12 min. The primers that amplified a product of the expected size were selected and labeled with HEX, ROX, FAM, or TAMRA. The PCR amplifications were then repeated, and the products were separated by capillary electrophoresis on an ABI 3730 sequencer (Applied Biosystems, Foster City, CA, USA). Finally, the primers that gave clearly distinguishable peaks were used for the genetic study of the pecan population.

Genetic Diversity Analysis of Pecan and Transferability of gSSRs to Other Species
Twenty-two validated gSSRs and eight SSRs from Grauke et al. (2003) were chosen to amplify genomic DNA from 80 genotypes [13]. PCR products from each individual were separated using capillary electrophoresis. The genotype at each SSR locus was called with GeneMapper 4.1 software (Applied Biosystems, Foster City, CA, USA). Several genetic parameters, including observed allele number (Na), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphic information content (PIC), were calculated using the Popgen 32 program (University of Alberta and Center for International Forestry Research, Canada). A phylogenetic dendrogram was constructed using the unweighted pair group method with arithmetic average (UPGMA).
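As a rough illustration of how these per-locus parameters can be derived from genotype calls, the following Python sketch computes Na, Ho, He, and PIC for a single locus. It is not the implementation used in the study: it assumes diploid genotypes given as pairs of allele sizes, uses the uncorrected expected heterozygosity (He = 1 − Σp_i²) and the standard PIC formula (PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²), and the function name `locus_stats` and the toy genotypes are hypothetical. Population genetics software may apply sample-size corrections that this sketch omits.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarize one SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Missing calls are assumed to have been removed beforehand.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies over the 2n sampled gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    na = len(freqs)                                    # observed allele number
    ho = sum(1 for a, b in genotypes if a != b) / n    # observed heterozygosity
    he = 1.0 - sum(p * p for p in freqs)               # expected heterozygosity (uncorrected)
    # PIC subtracts the uninformative-mating term from He.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {"Na": na, "Ho": ho, "He": he, "PIC": pic}


# Hypothetical toy data: allele sizes in base pairs.
monomorphic = [(150, 150)] * 4   # a single allele, as for a non-polymorphic locus
balanced = [(150, 154)] * 4      # two alleles at equal frequency, all heterozygous

print(locus_stats(monomorphic))  # Na = 1, Ho = He = PIC = 0
print(locus_stats(balanced))     # Na = 2, Ho = 1.0, He = 0.5, PIC = 0.375
```

A monomorphic locus yields Na = 1 and zero for the other three statistics, which is the pattern expected for a locus that shows no polymorphism in the examined population.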

Primer Design and Validation
A total of 60,018 primer pairs for 20,006 gSSRs in 7143 sequences were designed using Primer 3 software (Whitehead Institute, Cambridge, MA, USA). After low-quality primer pairs were removed, 9145 (15.24%) candidate primer pairs for 5002 gSSRs (25.00%) remained (Table S2). To validate these primers, a set of 74 primer pairs was randomly picked and synthesized. All of them were distributed in 74 contigs of '87MX3-2' genome sequences. These loci were distributed in 70 scaffolds of 'Pawnee' genome sequences: Ciz034 and Ciz035 in scaffold72459, Ciz037 and Ciz058 in scaffold69125, Ciz039

Genetic Diversity Analysis and Cross-Species Transferability of gSSRs
Twenty-two gSSRs randomly selected from the 66 validated loci and eight SSRs from a previous study were employed in this section (Table 3) [13]. In total, 60 pecan genotypes and 20 individuals from three other species were used for the genetic diversity analysis and the cross-species transferability test, respectively. The PCR products were separated using capillary electrophoresis (Figure 3). Among the 22 newly developed gSSR loci, 21 (95.45%; all except Ciz036) exhibited polymorphism in the examined population (Table 3). For all loci, the observed allele number (Na) ranged from 1 (Ciz036) to 18 (Ciz031). A total of 221 alleles were detected in pecan, and the mean number of alleles per locus was 7.367. The observed heterozygosity ranged from 0.00 (Ciz036) to 1.00 (Ciz047), with a mean value of 0.439. In addition, the minimum and maximum PIC values were 0.00 (Ciz036) and 0.893 (Ciz031), with 23 loci showing high PIC values (PIC > 0.5), and the mean value was 0.547. Twenty-six (86.67%) gSSRs yielded the target product in hickory, while 25 (83.33%) and 22 (73.33%) loci could be used in C. dabieshanensis and C. hunanensis, respectively (Table 4).
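The percentages reported for validation, polymorphism, and transferability are simple success rates over the number of loci tested. The short Python sketch below reproduces them from the counts given in the text; the helper name `percent` and the dictionary labels are hypothetical and serve only to make the arithmetic explicit.

```python
def percent(successes, tested):
    """Return the success rate as a percentage rounded to two decimals."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return round(100 * successes / tested, 2)


# (successful loci, loci tested), taken from the counts reported in the text.
counts = {
    "amplified in pecan": (66, 74),
    "new loci polymorphic in pecan": (21, 22),
    "transferable to hickory": (26, 30),
    "transferable to C. dabieshanensis": (25, 30),
    "transferable to C. hunanensis": (22, 30),
}

for label, (ok, total) in counts.items():
    print(f"{label}: {ok}/{total} = {percent(ok, total)}%")
```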

Population Structure and Cluster Analysis
The 80 individuals were mainly divided into two clusters (Figure 4). Cluster I contained all samples of the species native to China, including 10 hickory, five C. dabieshanensis, and five C. hunanensis individuals. Corresponding to the three species, cluster I was divided into three subclasses. The genetic distance between hickory and C. dabieshanensis was relatively short. The pecan samples clustered together and formed two subclasses. Interestingly, all the cultivars and most excellent strains (10 out of 14) were classified together, while nearly all the seedling trees (21 out of 25) and four excellent strains were clustered into another subgroup. Nine cultivars ('Mohawk', 'Forkert', 'Oconee', 'Choctaw', 'Mahan', 'Osage', 'Waco', 'Pawnee', and 'Creek') and three excellent strains (1101, XH5, and XH6) appeared to have a short genetic distance. In addition, 'Colby', 'Carter', 'Syrup Mill', 'Gloria Grande', and 'Stuart'; 'Major', 'Lakota', and 'Greenriver'; and 'Elliott' and 'Navaho' were clustered together, respectively (Figure 4).
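To make the clustering step concrete, the following Python sketch shows how UPGMA builds a dendrogram from a pairwise distance matrix. It is a minimal illustration under stated assumptions, not the procedure used to produce Figure 4: the genetic distance measure is not specified here, so the distances are taken as given input, and the function name `upgma`, the sample labels, and the toy distances are all hypothetical.

```python
def upgma(labels, dist):
    """Cluster items by UPGMA (average linkage).

    labels: list of item names.
    dist: dict mapping frozenset({a, b}) to the distance between items a and b.
    Returns (tree, merges), where tree is a nested tuple of labels and merges
    lists (left_subtree, right_subtree, height) in the order of joining.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Pairwise distances between active clusters, keyed by (smaller_id, larger_id).
    d = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            d[(i, j)] = dist[frozenset((labels[i], labels[j]))]

    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        del d[(i, j)]
        # Distance to the new cluster is the size-weighted mean of the old ones,
        # which equals the arithmetic average over all leaf pairs.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        merges.append((tree_i, tree_j, dij / 2))  # node height is half the distance
        next_id += 1

    tree = next(iter(clusters.values()))[0]
    return tree, merges


# Hypothetical toy distances between four samples.
names = ["A", "B", "C", "D"]
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 10.0,
    frozenset(("B", "D")): 10.0,
}
tree, merges = upgma(names, toy)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

In this toy case the two close pairs are joined first and the two resulting groups are joined last, which mirrors how two well-separated groups of samples end up as the two main clusters of a dendrogram.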

Discussion
Recently, next- and third-generation sequencing technologies have been employed to investigate the molecular mechanisms of key traits and to construct a reference genome in pecan [3,4,12,28]. Transcriptomes and draft genomes offer an opportunity to identify SSR sites on a large scale. In the present study, two draft genomes of '87MX3-2' and 'Pawnee' were independently analyzed to mine genomic SSR loci [14,28]. The frequencies of occurrence of gSSRs were 1/3.65 kb and 1/4.55 kb in '87MX3-2' and 'Pawnee', respectively, so the gSSR densities of the two cultivars differed only slightly. The genomic SSR frequency was considerably higher than that of genic SSRs (1/6.10 kb; 6860 SSRs, including di-, tri-, tetra-, penta-, and hexanucleotides, in 41,858,722 bp of sequence) in pecan [29], which confirms that gSSRs are more abundant than transcriptomic SSRs in this plant. The gSSR density is higher than that of bamboo (1 SSR/16 kb) [31], and lower than that of walnut (1 SSR/2.3 kb) and Chinese jujube
The genus Carya consists of 17 species [1], of which pecan, hickory, C. hunanensis, and C. dabieshanensis have significant economic value in China [3]. However, the molecular markers available for these species are far from sufficient. In this study, 66 gSSRs were validated in 'Pawnee' and 'Mahan', and 21 newly developed loci and eight previous SSRs showed genetic polymorphisms in different germplasms (Table 3). Among them, 23 loci showed high PIC values (>0.5). Therefore, these gSSR loci are highly polymorphic across germplasms and could be used in genetic diversity studies, genetic map construction, quantitative trait loci (QTL) mapping, and cultivar identification in pecan. Regarding cross-species transferability, 86.67% (26/30) of the gSSRs can be used in hickory, which is higher than the rate reported in a previous study (63.02%) [20]. Moreover, the transferability rates from pecan to two other Carya species, C. dabieshanensis and C. hunanensis, were 83.33% (25/30) and 73.33% (22/30), respectively (Table 4). Similarly high transferability rates have also been reported in other taxa, such as Pistacia vera L. [37], Casuarina L. ex Adans [38], and Melilotus [39].
SSRs are a useful tool for evolutionary studies and for evaluating pedigree relationships [16]. As shown in Figure 4, C. hunanensis, C. dabieshanensis, and C. cathayensis were clustered together, and the last two species showed a closer relationship. These observations are consistent with the results of Zhang et al. (2013), who reported that the Carya species from eastern Asia and eastern North America were classified into two different groups [40]. For pecan, 'Mohawk', 'Forkert', 'Oconee', 'Choctaw', 'Mahan', 'Osage', 'Waco', 'Pawnee', and 'Creek' exhibited a wide range of similarities. Similar results were also reported by previous studies [12,13]. 'Mohawk' and 'Choctaw' originated from a controlled cross of 'Success' × 'Mahan' [14,41] (Table S4), and 'Mohawk' was a parent of 'Pawnee' and 'Creek'. 'Forkert' was selected from the offspring of 'Success' × 'Schley' [14,41]. 'Schley' was an ancestor of 'Oconee' and 'Waco' [41]. Therefore, these individuals have pedigree relationships with each other, and the grouping patterns agree with this fact. Incidentally, '1101', 'XH5', and 'XH6' were also contained in this subgroup, which implies that these excellent seedlings might be progenies of these cultivars. 'Stuart' is presumed to be a parent of 'Gloria Grande' [41]. Here, 'Gloria Grande' and 'Stuart' showed a high similarity, which supports this speculation. In addition, 'Lakota' originated from 'Mahan' × 'Major', and 'Greenriver' was selected from the same woods as 'Major', which explains why these three cultivars were joined together. Pecan was introduced into China in the 1900s. Currently, a large number of seedling trees are distributed across several provinces in China. However, the collection and genetic study of these germplasms are still insufficient. As shown in Figure 4, a relatively long genetic distance exists between the cultivars and most seedling trees in China. Therefore, hybridization between the two types of germplasm might create more genetic variation, which might be useful for cross breeding in the future.

Conclusions
Pecan is an important multipurpose tree worldwide. However, the progress of genetic study in pecan is limited by the lack of a sufficient number of accurate, convenient, and economical molecular markers. In the present study, the distribution of SSR motifs was evaluated in the pecan genome. The gSSR densities ranged from 1/4.55 kb ('Pawnee') to 1/3.65 kb ('87MX3-2'). In total, 60,018 primer pairs for 20,006 gSSRs were designed, and a set of 66 gSSRs was successfully validated. Thirty SSRs were employed to analyze the genetic diversity and pedigree relationships of pecan genotypes and their cross-species transferability to other Carya species. In total, 29 SSR loci showed polymorphism, and 26, 25, and 22 loci can be used in hickory, C. dabieshanensis, and C. hunanensis, respectively. We believe that the characterization of SSRs in the pecan genome and the new gSSRs reported here will promote the progress of genetic study and marker-assisted selection (MAS) research in pecan, as well as in other species of the genus Carya.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/1/61/s1, Table S1: The frequency of identified SSR motifs, Table S2: The information of primers, Table S3: The information of 66 primer pairs used in the present study, Table S4: The pedigree information of 21 cultivars.

Conflicts of Interest:
The authors declare no conflicts of interest.