Microsatellite Loci Reveal Genetic Diversity of Asian Callery Pear (Pyrus calleryana) in the Species Native Range and in the North American Cultivars

Pyrus calleryana Decne. (Callery pear) includes cultivars that in the United States are popular ornamentals in commercial and residential landscapes. Last few decades, this species has increasingly naturalized across portions of the eastern and southern US. However, the mechanisms behind this plant’s spread are not well understood. The genetic relationship of present-day P. calleryana trees with their Asian P. calleryana forebears (native trees from China, Japan, and Korea) and the original specimens of US cultivars are unknown. We developed and used 18 microsatellite markers to analyze 147 Pyrus source samples and to articulate the status of genetic diversity within Asian P. calleryana and US cultivars. We hypothesized that Asian P. calleryana specimens and US cultivars would be genetically diverse and would show genetic relatedness. Our data revealed high genetic diversity, high gene flow, and presence of population structure in P. calleryana, potentially relating to the highly invasive capability of this species. Strong evidence for genetic relatedness between Asian P. calleryana specimens and US cultivars was also demonstrated. Our data suggest the source for P. calleryana that have become naturalized in US was China. These results will help understand the genetic complexity of invasive P. calleryana when developing management for escaped populations: In follow-up studies, we use the gSSRs developed here to analyze P. calleryana escape populations from across US.


Introduction
Genetic diversity and population structure of any species are important factors for determining the long-term survival of the species and their adaptation to environmental changes [1]. The population genetic profile of a species can inform about the origins of sub-populations [2,3], which provides greater understanding about the introduction history of the tree and its cultivars. Such knowledge can be used to formulate an effective management plan. Assessment of the existing genetic variation of any species requires widespread and intensive sampling from both native and introduced ranges [4].
Pyrus calleryana Decne., Callery pear, is a species of pear tree native to eastern and southern China, Taiwan, Korea, and Japan [5]. This deciduous tree often has a conical to rounded crown and flowers as early as three years of age [5,6]. Pyrus calleryana is a popular ornamental tree well-known for its early spring flowers, robust growth, and fall color display [7]. 'Bradford' is the most widely planted and the most commonly known Callery pear cultivar in the United States (USA) [8,9]. Pyrus calleryana is diploid with a haploid chromosome number of 17, 2n = 34 [6] and flow cytometry estimated genome size of 588 Mbp/1C [10]. Flowers from the cultivars of P. calleryana are self-incompatible and cultivars are routinely vegetatively propagated by grafting. The trees can produce viable, long-lived seeds [11], when flowers are cross-pollinated by other compatible Pyrus species, compatible rootstock-based pollen, or other P. calleryana cultivar [12]. When this occurs, the seeds can germinate and establish when dispersed in favorable environments [9]. In addition, P. calleryana adapts well to soils of variable pH (acidic to alkaline soil) and trees are tolerant of drought [7]. Several traits, including gametophytic self-incompatibility, pathogen and herbivore resistance, and tolerance of various abiotic stresses have contributed to the spread and persistence of P. calleryana in a variety of environments [12]. Although the underlying mechanisms and processes that have enabled the broad spread and invasiveness of the species are not well understood, intraspecific hybridization among the genetically distinct cultivars and interspecific hybridization with the other escaped Pyrus species could be among the possible reasons behind expansion of P. calleryana populations [7,13]. Additionally, several insects promote cross-pollination [8,14]. The fruits on fertile naturalized trees become secondary foods for birds that then disperse seeds into distant areas, and contribute to the spread of the species [15].
Major imports of P. calleryana seeds to the USA were made from East Asia in 1917 through 1920 and that germplasm was used to breed with European pear, Erwinia amylovora Selections were performed based on resistance against the fire blight pathogen, Erwinia amylovora Burrill [12]. The hardiness of P. calleryana lent itself to the development and release of several intraspecific hybrid cultivars making it popular among local gardeners and landscapers. Soon, P. calleryana appeared in and naturalized almost all habitats throughout the USA [8,[16][17][18][19]. Naturalized P. calleryana trees can be found in 33 states [16,17] and due to expanding populations of naturalized trees, P. calleryana has been listed, or watch-listed, as an invasive species in many USA states [9,12,20]. Pyrus calleryana can potentially become one of the most problematic invasive species in the USA [21]. hence, there is need for understanding genetic contributions to invasiveness as a means to develop effective management plans and strategies that can mitigate or limit the spread of P. calleryana. To address this knowledge gap, we need to know more about the biology and genetics of this invasive species.
The assessment of the molecular population genetics preferably using co-dominant markers is necessary for understanding the dynamics of adaptation and the spread of any species [22]. The use of simple sequence repeats (microsatellites; SSRs) in population genetics and phylogenetics provides a more reliable interpretation of genetic diversity than other arbitrary markers such as random amplified polymorphic DNAs (RAPDs) [23,24]. SSRs, highly informative and multi-allele genetic markers, are experimentally reproducible and transferable among closely related species [25]. The presence of multiple alleles at an SSR locus makes SSRs more informative than other molecular markers, including SNPs [26].
Most of our knowledge on P. calleryana originates within its native range in Asia, where P. calleryana naturally occurs as scattered individuals in the wild to the point of being considered a threatened species [14,27]. Most of these studies have focused on identification and characterization of cultivars or Pyrus species using different types of DNA markers such as SSRs [28][29][30][31], amplified fragment length polymorphisms (AFLPs) [32], RAPDs [33], restriction fragment length polymorphisms (RFLPs) [34], and chloroplast DNA (cpDNA) [35]. Few studies, however, have investigated the genetic diversity and population structure of P. calleryana. Nuclear SSRs (nSSRs) and cpDNA were used in a study of genetic diversity of P. calleryana in Zhejiang province of China [36], which revealed the geographic distance as the major factor in shaping the population structure of P. calleryana. Additionally, an examination of the genetic diversity of P. calleryana var. dimorphophylla in the Tokai district of central Japan [7,27,37] found complex genetic structure of the species, probably originating from artificial propagation and introgression with other Pyrus species. There are limited studies investigating the genetic diversity and population structure of the invasive USA P. calleryana escapes, and this issue urgently needs further investigation [7,37].
We assessed the genetic diversity and population structure of P. calleryana from both native (i.e., Asian) and introduced ranges (the USA-introduced cultivars). We hypothesized that there would be high genetic diversity present within Asian P. calleryana populations, and that they would be genetically related to early USA-naturalized cultivars of P. calleryana (owing to the origin of the species) in terms of genetic distance and their genetic clustering. Specifically, we aimed to (a) develop novel microsatellite markers for P. calleryana, (b) test the cross-species amplification of the developed genomic short sequence repeats (gSSRs) to other Pyrus and Malus species (a related outgroup, of economic importance), and (c) use the most informative of these markers to evaluate the genetic diversity and genetic relatedness among Asian P. calleryana specimens and early USA-naturalized cultivars of P. calleryana.

Sample Collection
Leaf and flower bud samples of pears (Pyrus) and the original USA cultivar selections of P. calleryana were obtained from 10 herbaria and arboreta in the USA. In total, 147 samples (including 80 samples of P. calleryana and 67 samples of other Pyr and Malus species) were received. Respective permissions or licenses for limited destructive sampling were granted by all these institutions, in accordance with their internal regulations. The geographical coordinates, the country of origin, and the year of collection were recorded for each sample and whenever possible for historical specimens [38]. The collection was reduced to 90 specimens due to the unreliable/inconsistent amplification of samples. Out of those 90 samples, 57 were P. calleryana specimens (36 specimens of Asian P. calleryana and 21 samples of 7 USA commercial cultivars of P. calleryana) and 33 samples represented 14 different species of Pyrus and 2 samples of Malus rockii (Supplementary Table S1). These 33 Pyrus and Malus species samples were used to evaluate the potential for cross-amplification among gSSRs.

gDNA Extraction
Approximately 100 mg of dried leaf samples or fresh flower buds were taken from each specimen and homogenized using a Bead mill 24 (Fisher Scientific, Pittsburgh, PA, USA). samples were homogenized four times and kept frozen in liquid nitrogen for at least 5 min between each homogenization step in order to improve the tissue homogenization. The gDNA was extracted using EZNA DNA DS mini Kit (Omega Bio-Tek, Norcross, GA, USA) protocol. Nanodrop (Thermo Fisher Scientific, Wilmington, DE, USA) was used to evaluate the purity and measure the concentration of the isolated gDNA. Re-extraction was attempted following the CTAB protocol [39] for those samples that did not produce good quality and quantity of gDNA using EZNA DNA DS mini Kit. gDNAs was re-extracted wherever enough of plant sample material was available.

Microsatellite Primers and Genotyping Conditions
Genomic SSRs (gSSRs) were developed by using closely related pear (Pyrus × bretschneideri Rehder) whole genome sequence data (GenBank number: JH994112) [40]. The development of gSSRs involved several steps including: (i) genome assembly using MaSuRCA [41]; and (ii) SSR mining using GMATA [42]. The gSSRs of interest with a motif length of 2 to 6 bp and a minimum of 5 repeats were retained. Finally, 18 to 22 bp gSSR primers were designed using Primer3 [43] with 45 to 55% GC content, 58 to 62 • C melting temperature, and 100 to 500 bp of expected product sizes. The developed gSSRs were used for this P. calleryana study (IDT, Coralville, IA, USA; Supplementary Table S2). Genomic locations of the gSSRs used in this study were analyzed in six available genomes of related Prunus spp. using BLAST [44] with default settings, owing to their relatively higher quality (contiguity, N score, and annotation) than the genome used for their development. Genome with most gSSRs mapping to it was used to visualize their locations. Additionally, for cross-species amplification analysis, gDNA samples of the other Pyrus species that failed to amplify with the gSSRs were further tested with Internal transcribed spacer (ITS) primers [45] and ribosomal protein S16 (rps16/cpDNA03) primers [46]. This was done to confirm that the failure of gSSR-PCR was not due to inferior quality of gDNA used. Both ITS (nuclear) and rps16 (plastidic) primers were used in our study for more effective DNA quality controls.
Polymerase chain reaction (PCR) amplifications were performed in a 10 µL reaction mixture consisting of 4 ng gDNA, 1 µM final concentration of each primer, 5 µL of 2 × GoTaq ® DNA Polymerase (Promega, Madison, WI, USA), and 0.5 µL of dimethyl sulfoxide (DMSO). In order to validate the data, the gDNA extracted from the P. calleryana herbarium specimen (Arnold Arboretum, catalog number: 156,119) was used as a positive control, and sterile distilled water as a negative control, for each of 18 gSSRs tested. PCR amplification was performed using the touch-down protocol to ensure the specificity of the amplified fragments [47] using the following thermal profile: initial denaturation at 94 • C for 3 min., followed by 10 cycles of denaturation at 94 • C for 30 s, annealing at 65 • C for 30 s with a touch-down of 0.7 • C/cycle and an extension at 72 • C for 30 s then, followed by 30 cycles of denaturation at 94 • C for 30 s, annealing at 58 • C for 30 s and an extension at 72 • C for 30 s, with a final extension of 72 • C for 4 min.
QIAxcel Capillary Electrophoresis System (QIAGEN, Germantown, Maryland, USA) was used to visualize the P. calleryana PCR products and determine allelic sizes using a 15/600 bp alignment marker and 25 to 500 bp DNA size marker. Allele sizing was performed for each of the 147 gDNA samples against each of the 18 gSSR markers. The PCR reaction was repeated twice for the samples that did not amplify, which were then declared missing data if still failed to amplify. samples with missing data in more than 9 of 18 loci were discarded from the analyses resulting in the final dataset of 57 P. calleryana samples (Supplementary Table S3 and data not shown).
For cross-species amplification test, PCR amplification in the gDNA samples of the other Pyrus and Malus species was also performed using this PCR reaction mixture composition and following the touch-down protocols detailed above. In addition, PCR amplification for the samples that failed to amplify for each gSSR was further attempted using ITS [45] and rps16 primers [46] using a 10 µL reaction mixture: 10 ng gDNA, 5 µM final concentration of each primer (ITS or rps16, respectively), and 5 µL of AccuStart II PCR SuperMix (Quantabio, Beverly, MA, USA). Pyrus calleryana (Carnegie herbarium, catalog number: 396078) DNA was used as a positive control and sterile distilled water was used as a negative control.
The optimized thermal profile for PCR with ITS1 and ITS4 primers included an initial denaturation at 94 • C for 2 min, 40 cycles of 95 • C for 30 s, 60 • C for 1 min, 72 • C for 90 s, and the final extension at 72 • C for 7 min. The thermal cycle for touchdown PCR involving

Data Analysis
The raw allele sizes were binned into statistically identical allelic classes using FlexiBin excel macro [48]. The binned allelic data were used for further analyses. For each gSSRs, the P. calleryana dataset was transformed using the motif lengths to represent the number of repeats rather than the PCR allele sizes using PGDSpider [49] [50] 3.6.2 (R Foundation, Vienna, Austria). There were no clones in the dataset (total n = 57). Based on their country of origin, the samples were divided into four population groups, denoted "China" (n = 20), "Japan" (n = 11), "Korea" (n = 5), and "USA" (n = 21). In order to assess the contribution of mutation rate to the population structure, SPAGeDi performed the permutation tests to determine the phylogenetic distance between individuals using 10,000 permutations among alleles within each locus and to determine the significance of inbreeding coefficients using 10,000 permutations among gene copies [56]. For many samples, including many of the historical specimens, Global Positioning System (GPS) coordinates were not available, shrinking the sample subset further, and consequently creating a possible bias in our study. The entire P. calleryana dataset (n = 57) was used to determine the phylogeographical signal within populations as R ST does not depend on GPS coordinates, whereas, only the individuals with known GPS coordinates of origin (total n = 28; "China", "Japan", "Korea" groups) were used to determine the phylogeographical signal among populations.
Analysis of Molecular Variance (AMOVA) was performed using the package poppr with 1000 permutations in order to evaluate the molecular variance partitioning within and among the population groups. Linkage Disequilibrium (LD) of the used gSSRs was determined in poppr, with significance assessed using 1000 permutations. The pairwise index of association (r d ) was calculated using permutation approach to assess whether the loci were linked and to ensure that the observed pattern of LD is not due to a single pair of loci [57].

Population Structure Mantel Test for Isolation by Distance
Samples with known GPS coordinates (n = 28; Supplementary Figure S1) were analyzed using Mantel and partial Mantel tests implemented in the package MASS [58] version 7.3-50 (CSIRO, Cleveland, Australia) using 1000 permutations for the assessment of tests significance. These tests estimate isolation by distance (IBD) to determine the correlation between the genetic and the geographical distance matrices of the individuals. The Mantel test was also standardized using the year of sampling (partial Mantel test), to additionally assess the effect of mutation across time. Further, Mantel correlogram tests were performed to further examine the underlying correlative relationship using the packages ade4  [63]. ObStruct [64] version 1.0 (University of Auckland, Auckland, New Zealand) was used to determine the correlation of population structure of inferred ancestral profiles to that of predefined/sampled pop-ulations. The program uses the ad hoc R 2 statistics whose values range from 0 (recent divergence of predefined populations or a lot of migration/admixture between populations) to 1 (strong diversification and/or population structure) [64]. One-tailed t-tests were used to accrue the significance of the differences when consecutively removing each one of the pre-defined populations or the inferred clusters, to assess this group's impact on R 2 .
The population structure of the P. calleryana dataset was also analyzed using modelfree multivariate clustering approach, Discriminant Analysis of Principal Components (DAPC) using the package adegenet [65] version 2.1.1 (Université de Lyon, Lyon, France). The DAPC analysis was optimized and cross-checked utilizing 1000 permutations over Principal Component Analysis (PCA) range from 2 to 109 (total number of alleles -1). The analysis was further confirmed using a dendrogram of unrooted neighbor-joining tree of pairwise genetic distances [66] among P. calleryana originating populations. A separate DAPC analysis was also performed for USA cultivars only.

gSSR Development and Selection
After gSSR mining with GMATA and execution of additional in-house scripts, a total of 115,838 SSRs were discovered from three Pyrus × bretschneideri draft genomes. Marker polymorphism was determined based on allelic variation within each genome or across the three genome assemblies. SSR mining and primer design resulted in 105,557 SSRs with acceptable primers and of these, 90,987 were dinucleotide, 10,913 were trinucleotide, and 2891 were tetranucleotide repeats. Primers between 59 to 60 • C with <1 • C difference between forward and reverse primers melting temperatures resulted in the discovery of 15,269 single motifs monomorphic SSRs along with 306 single motifs polymorphic SSRs. Only single motifs polymorphic SSRs were considered for the further analysis. In the single motifs polymorphic SSRs collection, AG was the most frequent dinucleotide motif and AAG was the most frequent trinucleotide motif (Supplementary Figure S2a). For the study, 50 gSSRs were selected and tested using gDNA samples from three locally escaped, naturalizing Callery pear trees (Third Creek Greenway, Knoxville, TN, USA). Based on preliminary data and the resulting amplification robustness, polymorphic character, and agreement with the expected product sizes, 40 gSSRs were selected for further evaluations. Of these, based on the initial assessment, 18 gSSRs with high discriminating power for multi-locus genotypes of P. calleryana gDNA were included in the subsequent analyses (Supplementary Table S2). Genomic locations of those 18 gSSRs were analyzed using the six available high-quality genomes of related Prunus spp. Among those, 17 gSSR mapped to Prunus dulcis (Mill.) D.A.Webb genome GCF_902201215.1 [67] (Supplementary Figure S2b).

Genetic Diversity
All 57 individual P. calleryana samples represented unique MLGs and their genotypic data were used for analyses. Throughout the dataset, there was about 10% missing data ( Table 1). The locus and population with the highest missing data were PyC012 with 26%, and "China" with 15% missing data, respectively. The dataset deviated from hardy-Weinberg equilibrium (HWE; Supplementary Figure S4), which is a possible result from the relatively low sample number, from specimens that were sampled across a broad time range (approximately 1912 to 2019 AD), as well as from wide geographical origins.  [24]; P a : Number of private alleles in each population group. Significance was assessed by using 10,000 randomizations of gene copies among all individuals, *** = p < 0.0001 and ns = p > 0.05 at 10,000 permutations.
Among the four populations, the average Shannon-Wiener Index of MLG diversity (H) was 4.04, indicating high genetic diversity in the genotypic dataset ( Table 1). The average effective number of alleles in the four populations was 5, ranging from 2 in "Korea" to 6 in "China". This indicated high genetic diversity and variations of P. calleryana populations. Similarly, the overall A r of 3.79 varied from 2.48 in "Korea" to 4.61 in "China". A total of 88 private alleles were found in the P. calleryana dataset, with "China" population having the most of private alleles (n = 37).
The gSSRs had high power in discriminating the MLGs requiring only 8 gSSRs to capture all of the MLGs present (Supplementary Figure S5). The calculation of a small range of pairwise values of LD (r d = 0 to 0.4, p-value = 0.113) indicates no linked loci and suggests distribution of gSSRs across the genome (Supplementary Figure S6).
Across the dataset, an average of about 12 alleles per locus (ranging from 5 to 20) were detected ( Table 2). The mean allelic richness (A r ) calculated in the locus-wise manner was 4.70, ranging from 2.74 (PyC050) to 6.22 (PyC031), suggesting a high long-term adaptability and persistence potential of the P. calleryana populations. There was a moderate observed heterozygosity across 18 gSSRs overall (H o = 0.34) ranging from 0.03 (PyC050) to 0.93 (PyC038). In addition, there was a high expected overall heterozygosity (H e = 0.81) across all loci ranging from 0.41 (PyC050) to 0.92 (PyC031 and PyC017). The overall h o < h e implied the presence of population structure within our P. calleryana collection. Furthermore, the dataset indicated high gene flow across all gSSRs with an overall value of 1.79. . Significance of the dataset was assessed by 10,000 randomization of gene copies among all individuals using **** = p < 0.0001; *** = p < 0.001; ** = p < 0.01; * = p < 0.05; ns = p > 0.05. NA: Not applicable as a negative value does not represent any gene flow data.
For the assessment of phylogeographic signals within the P. calleryana dataset, SPAGeDi was implemented, using 10,000 permutations among alleles within each locus. The mean permuted R ST over all loci was not statistically different from F ST in accordance with the expectations of this test, and the observed R ST was bigger than the observed F ST (Pobs > exp = 0) indicating the presence of phylogeographic signal within populations (data not shown). Furthermore, the slope test of pairwise R ST was evaluated in both linear and logarithmic forms to assess the phylogeographic signal among populations. From this slope test, we found no evidence of phylogeographic signal among populations (Pobs > exp = 0.95). Only the samples with available GPS coordinates (n = 28) were used for the slope test, thus creating a possible bias in this analysis.
AMOVA was used to investigate the partitioning of the molecular variance. It suggested a low proportion of total molecular variance among populations (6.2%), with more than half of the total molecular variance partitioned within individuals (63.5%) (Supplementary Table S5). Significance of the test result (p < 0.001) indicated the existence of population structure within the genotyped collection of P. calleryana.

Population Structure Mantel Test for Isolation by Distance
Several independent analyses were performed to determine the population structure in the P. calleryana collection. Isolation by distance analysis using the Mantel test yielded no evidence of correlation between genetic and geographic distances among the analyzed P. calleryana individuals of Asian origins (total n = 28; Supplementary Figure S1; Mantel statistic (r) = 0.04, p-value = 0.30) (Figure 1). The partial Mantel test (standardized geographic distance matrix by the year of sample collection) did not change that result (Mantel statistic using year of sample collection (r') = 0.04, p-value = 0.33). Additionally, there was a non-linear relationship of the genetic and geographic distances across the space (Figure 1b). The amplitude of the Mantel's r scores in the correlogram was low (−0.1 to 0.05), indicating a low impact of spatial distancing on the population structure of P. calleryana dataset.  Bayesian clustering using Structure and DAPC Bayesian clustering using STRUCTURE indicated two genetically distinct clusters (ΔK = 2) in the genotyped P. calleryana dataset (Figure 2a,b). The admixture of both the inferred clusters was observed for all four population groups. Additionally, four genetically distinct clusters (ΔK = 4) of the genotyped P. calleryana dataset (Figure 2c) showed the admixture of all four inferred clusters in all population groups except "Korea". Negligible variability among 30 independent Markov chains of STRUCTURE was detected. The overall R 2 between predefined and inferred clusters of the P. calleryana collection (when K = 2) was 0.42 ± 0.07 suggesting moderate divergence among the predefined populations and STRUC- Bayesian Clustering Using Structure and DAPC Bayesian clustering using STRUCTURE indicated two genetically distinct clusters (∆K = 2) in the genotyped P. calleryana dataset (Figure 2a,b). The admixture of both the inferred clusters was observed for all four population groups. Additionally, four genetically distinct clusters (∆K = 4) of the genotyped P. calleryana dataset (Figure 2c) showed the admixture of all four inferred clusters in all population groups except "Korea". Negligible variability among 30 independent Markov chains of STRUCTURE was detected. The overall R 2 between predefined and inferred clusters of the P. calleryana collection (when K = 2) was 0.42 ± 0.07 suggesting moderate divergence among the predefined populations and STRUCTUREderived genetic clusters within the dataset. The information on the contribution of sampled and inferred populations to the observed structure of P. calleryana was derived by changes to R 2 using iterative successive removal of the pre-defined populations and the inferred clusters using ObStruct (Supplementary Table S6). Only minor changes in R 2 index value were evident when the predefined populations or the inferred clusters were removed sequentially. Removal of the "China" population caused decrease (p-value = 0.32) in R 2 indicating that this population contributed the most to the population structure from among those analyzed. Additionally, the removal of the "Japan" population caused increase (p-value = 0.20) in R 2 which suggests that this population was of mixed ancestries and contributed the least to the population structure. Furthermore, removal of the inferred clusters resulted in no major changes in R 2 indicating no major contribution of the inferred clusters to the structure within the data. In addition to this, the overall R 2 between predefined and inferred clusters of the P. calleryana dataset (when K = 4) was 0.41 ± 0.02, again with no major changes when sequentially removing the pre-attributed or the inferred groups (Supplementary Table S6). Using DAPC, a multivariate analysis, the P. calleryana dataset showed the clustering different from STRUCTURE. DAPC indicated 4 clusters for the P. calleryana dataset. The DAPC result indicated the possibility of the "China" population being ancestral to the bulk of the species with diverged "Japan" and "Korea" populations ( Figure 3). This was Using DAPC, a multivariate analysis, the P. calleryana dataset showed the clustering different from STRUCTURE. DAPC indicated 4 clusters for the P. calleryana dataset. The DAPC result indicated the possibility of the "China" population being ancestral to the bulk of the species with diverged "Japan" and "Korea" populations ( Figure 3). This was also supported by the genetic distance dendrogram, and in congruence with the results of the Bayesian clustering. In addition to this, DAPC analysis of USA cultivars alone showed that the cultivars named same from the same or different source institutions were not identical (Supplementary Figure S7). Eigenvalues (Insert bar graph, bottom right) represent the factor by which eigenvectors are scaled, which expresses the spatial relationship among populations at different spatial scales. The two respective axes are indicated by the alleles explaining the most of variance within the sampled population. Insert genetic distance tree, top right: Unrooted neighbor-joining tree of pairwise genetic distances (FST) [66] among the sampled P. calleryana specimens.

Discussion
We investigated the genetic diversity of P. calleryana in the collection of original noncultivated Asian specimens and early developed USA cultivars. Available historical records and our results support the hypothesis for high genetic diversity within Asian pear specimens and their genetic relatedness with the early developed USA cultivars. Sample collection from herbaria and arboreta demonstrate their great value for research studies such as ours, or when sampling in native environment is hindered. We found high levels of genetic diversity within P. calleryana populations supporting the fact that wild populations of forest tree species maintain high levels of genetic diversity [73]. Our findings on P. calleryana diversity are also supported by other studies of woody forest trees: North America-native Cercis canadensis [74] and Asia-native Cornus kousa [75].
We developed and used 18 gSSRs discriminating individual MLGs of P. calleryana. Despite availability of the SSRs published by others [36], we chose to develop new markers, as is standard in our laboratories [25,75]. This approach increases the pool of available markers and provides a higher level of confirmation for the results accrued by others. Our novel gSSRs also cross-amplified to conserved sequences of DNA extracted from several Pyrus species. However, these gSSRs did not perform well, with fewer gSSRs amplifying informative sequences from the more distantly related Malus species. Wide genetic variation among Pyrus species exists, and the failure of gSSRs to cross-amplify some Pyrus species could be due to speciation and changing genomic landscape [36]. Those non-amplified Pyrus species are related to P. bretschneideri/P. calleryana but they are highly admixed

Discussion
We investigated the genetic diversity of P. calleryana in the collection of original noncultivated Asian specimens and early developed USA cultivars. Available historical records and our results support the hypothesis for high genetic diversity within Asian pear specimens and their genetic relatedness with the early developed USA cultivars. Sample collection from herbaria and arboreta demonstrate their great value for research studies such as ours, or when sampling in native environment is hindered. We found high levels of genetic diversity within P. calleryana populations supporting the fact that wild populations of forest tree species maintain high levels of genetic diversity [73]. Our findings on P. calleryana diversity are also supported by other studies of woody forest trees: North America-native Cercis canadensis [74] and Asia-native Cornus kousa [75].
We developed and used 18 gSSRs discriminating individual MLGs of P. calleryana. Despite availability of the SSRs published by others [36], we chose to develop new markers, as is standard in our laboratories [25,75]. This approach increases the pool of available markers and provides a higher level of confirmation for the results accrued by others. Our novel gSSRs also cross-amplified to conserved sequences of DNA extracted from several Pyrus species. however, these gSSRs did not perform well, with fewer gSSRs amplifying informative sequences from the more distantly related Malus species. Wide genetic variation among Pyrus species exists, and the failure of gSSRs to cross-amplify some Pyrus species could be due to speciation and changing genomic landscape [36]. Those non-amplified Pyrus species are related to P. bretschneideri/P. calleryana but they are highly admixed likely due to inter-species hybridization and genetic admixture common in Pyrus [76]. Nevertheless, the overall cross-amplification potential of our gSSRs makes these novel markers useful for future studies of at least several species of Pyrus. Owing to limited resources we did not confirm the gSSR transfer to other species used here; PCR product size agreement provides clues that our cross-amplification was indeed successful. The gSSRs can be sequenced as needed for each species undergoing analyses.
The genetic diversity, allelic richness, and gene flow patterns of any plant species are related to their ecology and evolution [77][78][79]. In an outcrossing and widespread species, the genetic diversity mainly exists within populations [80]. Our study revealed a high level of genetic diversity (H e = 0.81) and allelic richness, and the presence of population structure in P. calleryana. A similarly high level of genetic diversity was found in P. calleryana (H e = 0.639) using 14 nSSRs to study 77 individuals [36] and in Ambrosia artemisiifolia (H e = 0.776) using 13 gSSRs and 13 expressed sequence tag SSRs (EST-SSRs) to study 321 individuals [81]. We found high genetic diversity for P. calleryana populations compared to other invasive species such as southwestern Puerto Rico's invasive tree [82], Albizia lebbeck with h e = 0.27, or southeastern US invasive species [83], Pueraria lobata with h e = 0.25.
We recorded high gene flow among P. calleryana populations across the species' native range. Irrespective of the geographical barriers, high gene flow could be a possible reason for this species' invasiveness because gene flow promotes evolution through the spread of new genes or mixture of genes throughout the range of species [84]. The high gene flow rate observed here is consistent with other invasive species [82,85] such as A. lebbeck and Fallopia species. Such a high gene flow rate promotes exchange of genes among populations especially in self-incompatible species by providing seeds for more population growth and colonization of new habitats [82]. As an outcrossing species, P. calleryana has been able to maintain population structure with high genetic diversity. The individuals of a particular cultivar have the same self-incompatibility genotype and cannot produce fruits. however, if the rootstock is allowed to sprout then that rootstock can cross with genetically different scion resulting in fruit set with mixed cultivar types [12]. Additionally, our study found a low impact of spatial distance in population structure of P. calleryana dataset which could support the seeds or pollen as the far-reaching mechanism of dispersal.
In P. calleryana, flowers are indiscriminately visited by various generalist pollinator species and the pollen is often carried among neighboring cultivars, resulting in intraspecific hybridization [8]. Seeds can be dispersed over long distances as a result of frequent frugivorous animal activities [34]. In a previous study, most of the P. calleryana seedlings were found with almost no genetic similarity to nearby mature trees, implying that those seedlings might have originated from foraging birds' defecation [12]. In addition, there could be an intraspecific hybridization among the cultivars and an interspecific hybridization with the other escaped Pyrus species [13], as implied by our cross-amplification data. These characteristics may help P. calleryana maintain high genetic diversity and high gene flow rate, aiding in the continued spread of the species by providing environment-specific genotypes needed to adapt to a varied environmental condition [82].
High genetic differentiation found in our P. calleryana collection suggested the existence of population structure. Pears are expected to undergo random mating, but an unexpected positive result for the inbreeding coefficient (F IS ) values was found in our study indicating the probability of alleles coming from a common ancestor. A similar positive F IS result was found in a study in P. calleryana in China [36], where the species is considered threatened. One of the possible explanations for such positive F IS result could be that the sampled individuals might have experienced some human interferences, such as selection, propagation, and intentional transportation resulting in the escape of cultivation in the USA. Other explanations could be the insect pollination with limited pollen flow creating local population structure, or the selection of loci under negative selection by chance, where any mutation would be lethal. Studies testing this phenomenon in the escaped P. calleryana populations are underway.
Our study did not indicate geographic distance as the major factor contributing to the genetic structure of the populations. A non-significant correlation between genetic and geographic distance could be a result of the small size of our P. calleryana dataset collection. Our result is unlike the Mantel test result obtained for wild P. calleryana from China [36]. In addition to this, our study partitioned 57 individuals from 4 populations into 2 major genetic clusters. however, considering the biology of P. calleryana, DAPC result, unrooted neighbor-joining tree, and bias of STRUCTURE towards k = 2 [86,87], we assumed k = 4 as the best clustering for our P. calleryana dataset. Almost all P. calleryana individuals throughout all populations showed extensive genetic admixture, in agreement with the detected high gene flow among populations. The individuals from "China" and "USA" population groups had relatively higher assignment probability to the major genetic clusters. Furthermore, genetic distances of "China" and "USA" populations placed them close to each other. Thus, our results support the historical import records of P. calleryana from China to the USA [12] with the "China" population being ancestral to the "USA" population.
The historical events suggest the development of P. calleryana cultivars using P. calleryana as a common rootstock. We expected at least some clones within the "USA" P. calleryana individuals as such cultivars were multiplicated clonally. however, no clones within the "USA" P. calleryana individuals were found, as each "USA" P. calleryana individual was unique with no shared MLGs within this population. Our study also found no identical cultivars although named the same from the source institutions. In such case, 'Trinity' cultivar of one source is genetically different from 'Trinity' cultivar of another source. Such a great genetic composition and diversity with no clones in our "USA" P. calleryana individuals signifies the great invasive potential of the species, and alarms about the level of the previously noted cultivar mislabeling and mishandling [7,88]. Our result is in stark contrast to those of others, who used 2 SSRs from P. pyrifolia and 7 from Malus × domestica on 14 P. calleryana cultivars [7]. They detected clonal MLGs in agreement with the cultivar description, confirming genetic identity of 'Chanticleer', 'Cleveland Select', and 'Stonehill'-in accord with their history, and origin from the same source [7].
The high level of genetic diversity within the investigated P. calleryana collection suggests the high potential of the species for evolving resilience and ability to thrive in a variety of environmental conditions. In addition, the high gene flow among P. calleryana populations creates the complex genetic admixture, adding more challenge to their control. For the development of measures for improved control, it is necessary to study in detail the genetic composition and diversity, and to infer the evolutionary potential of the species across the introduced ranges. Considering the ultimate goal of well-informed control measures, we are already taking the next steps in analyses of the naturalized P. calleryana: the fine-scale study of P. calleryana in localized area of the USA using the gSSRs developed here, and the broad-scale study of P. calleryana across the USA using reduced-representation genotyping-by-sequencing. Thus, P. calleryana populations across the USA provide a great prospect for further research and study, to better understand the spread mechanism of this invasive species.

Conclusions and Outlook
Our study has highlighted factors in P. calleryana native populations that are likely to be contributing to its invasive capability within introduced range in the USA. This tree species has been able to maintain high genetic diversity, which has helped it survive and adapt under diverse environmental, physical, and geographical conditions. The genetic complexity of the species adds complications to efforts at managing naturalized populations and mitigating further spread. The evidenced mishandling of the cultivars likely contributed to enabling the species spread in the naturalized range. In order to limit the cultivar mishandling, we suggest nursery professionals to verify their cultivars with the original ones or to maintain only one/few properly identified cultivars per station, and to consider the high genetic diversity of P. calleryana in large scale production of P. calleryana saplings. Further molecular and remote imagery research is underway to better understand the mechanisms behind the observed spread of P. calleryana across the USA.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/life11060531/s1, Supplementary Supplementary Figure S1. Map of geographic origins for the Asian samples of P. calleryana with known details (n = 28). Locations for several specimens are obscured due to the marker size and the overlap. Map generated using GoogleEarth version 7.3.3.7786 (Google LLC, Mountain View, CA, USA), with surface distance of 300 km marked. Supplementary Figure S2. Frequency of di-, tri-, tetra-, and hexa-nucleotide motif repeats disovered in 306 single motifs polymorphic gSSRs collection. The indicated motifs and their redundant iterations are grouped together. Insert: Frequency of di-, tri-, and tetra-nucleotide motif repeat gSSRs in the discovered 105,557 gSSRs collection; bp = base pair. Supplementary Figure S3. Gel image of crossspecies amplification using ITS and rps16, ran on the same gel. (a) The uppermost row in gel image represents the amplification done using ITS primers [45]; (b) The lowermost row in gel image represents the amplification done using rps16 primers [46]. Order of samples in (a) and (b) is identical, and fro\m left to right includes: DNA ladder (DNA molecular marker of 100 bp; BIONEER, Catalog No.:D-1030; Oakland, California, USA); positive control (P. calleryana; PC_A_019); negative control (water); P. gharbiana (20_PC_AO_29 and 20_PC_AO_33); P. korshinskyi (20_PC_AO_14 and 20_PC_AO_15); P. regelii (20_PC_AO_16 and 20_PC_AO_21); P. hybrid ('Bartlett' × P. salicifolia; 20_PC_AO_08). The expected size of PCR products for ITS was 678 bp and for rps16 was 911 bp. Supplementary Figure S4