Genetic Diversity of Walnut (Juglans Regia L.) in the Eastern Italian Alps

Juglans regia L. is distributed primarily across temperate and subtropical regions of the Northern Hemisphere. During the last glaciation, the species survived in refugial areas that in Europe included the Balkans and the Italian peninsula, two areas joined by a corridor represented by the Friuli Venezia Giulia region, where two germplasm reservoirs met and likely intercrossed during re-colonization after the last glaciation. In this work, two hundred and fifteen wild accessions native to the area were sampled, georeferenced, and genotyped with 20 microsatellite loci selected from the literature. The local accessions of this study displayed moderate genetic diversity with 80 alleles identified. The number of alleles/loci was 4.0 (4.7 alleles for the genomic SSRs (Simple Sequence Repeats) and 2.7 alleles per EST (Expressed Sequence Tag)-derived SSR, on average). An analysis of molecular variance (AMOVA) revealed that most of the molecular diversity was between individuals (nearly 98% of variation explained). The model-based clustering algorithms implemented either in STRUCTURE and GENELAND software revealed two clusters: The first one encompassed most of the samples and showed a great genetic admixture throughout the five sampling areas defined on the base of orographic characteristics of the region. The second cluster represented a small island with three samples traced back to an introduction from Russia at the beginning of the 20th century.


Introduction
Persian walnut (Juglans regia L.) is one of approximately 21 species in the genus Juglans L. [1].The genus originated in the late Paleocene, diversified during the Eocene [2], and maintained its presence in the area between the 45 • and 65 • paleolatitudes during evolution of the climate to cooler conditions [3].Juglans regia L. is nowadays distributed primarily across the temperate and subtropical regions of the Northern Hemisphere from Central Asia to the Mediterranean basin [4].Because of the presence of large walnut (J.regia L.) forests in countries of Central Asia such as Kyrgyzstan, Uzbekistan, and Tajikistan, and the large phenotypic diversity recorded in those countries, Vavilov considered Central Asia to be the primary center of diversity of the species [5], but the subject is still debated [6].It is likely that prior to the Pleistocene glaciations the species had a wide distribution in Eurasia, but during the glacial periods the distribution was contracted to refugial areas in China, the Himalayan slopes, Southern-Central Asia, the Balkans, and the Iberian and Italian peninsulas [7][8][9][10].Following the last glaciation, recolonization of the species' original areas of diffusion and other neighboring areas with compatible climates has likely masked the preglacial centers of origin of this species.Yet, this spontaneous spread was accompanied 2000 years ago by artificial diffusion through human silvo-pastoral practices, as has been demonstrated in the Fergana Valley where walnut forest stands are a mosaic of natural and planted trees [9,[11][12][13].Nevertheless, disregarding the problem of the original center of diffusion of the species, and considering only the recent spread after the last glacial maximum (LGM), the Southern Balkan and Italian peninsulas were two likely reservoirs of germplasm from which the species recolonized northern latitudes [7].However, the severe reduction in the walnut populations in these reservoirs that has been claimed by some authors would have favored the re-introduction of the species from Western Asia [14].The above-mentioned peninsulas, which are separated from each other by the Adriatic Sea, are joined to the north by a corridor represented by the Friuli Venezia Giulia Region and neighboring areas such as Istria.These are known to be areas where different germplasm populations, which had retreated during the last glaciation, came into contact again, thus creating new genetic diversity.The botanical richness of these regions evidences the intermingling of the flora of the two glacial refuges [15].Walnut fossils are commonly found in the Neolithic sites of northeastern Italy [16].Therefore, a wide genetic diversity is expected in the corridor represented by the Friuli Venezia Giulia region, so a wide exploration of wild walnut was undertaken in that area by sampling 215 old trees propagated by seed alone according to information collected from local people.The genetic diversity was investigated with 20 Simple Sequence Repeat (SSR) markers selected from the literature.

Plant Materials
The work plan aimed to sample walnut trees (Juglans regia L.) originating from seedlings in the Friuli Venezia Giulia region.To achieve a good representation of the native walnut population, the region was initially split into geographically homogeneous areas, represented by the eastern Alpine highlands ('Alpi Giulie', pop 1), the western Alpine highlands ('Alpi Carniche', pop 2), the Friuli plains (pop 3), the valleys of the Torre and Natisone rivers (pop 4), and the Trieste karst (pop 5) (Figure 1).At each location, the trees sampled were among the oldest individuals in the area.In some areas that were poorly populated, additional younger trees were sampled when their seed origins were known with confidence.A total of 215 accessions were collected.The list of the accessions analyzed is reported in the Supplementary Material (Table S1).

DNA Extraction
Leaves were collected from young shoots at the beginning of the season or mature leaves later in the season.DNA was extracted using the DNeasy 96 Plant Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's protocol.

SSR Genotyping
A review of the SSRs present in the literature [10,12,[17][18][19][20][21][22][23][24][25][26] suggested the selection of twenty-two SSR markers, which were tested initially in a small panel of accessions.The control of amplification, the goodness of signal recorded by the automatic sequencer, and the number and quality of the true peaks suggested that only 20 of those SSR markers be retained for the analysis of the whole set of accessions.Furthermore, the neutrality of selected loci was checked with the Ewens-Watterson Test (1000 permutations) [27] using PopGene software [28].Markers from different literature sources are collectively reported in    The forward primers were tailed by adding a 19-base M13 oligo sequence (M13 tail) labeled with FAM or HEX dye to the 5 end [29,30].The PCR reaction was carried out in a 10 µL solution containing 10 ng genomic DNA, 1 × Mg-free PCR buffer solution, 0.20 mM•dNTPs (Deoxynucleotide Triphosphates), 3.0 mM•MgCl2, 0.10 pmol forward primer, 0.30 pmol reverse primer, 0.30 pmol M13-labeled primer, 1.0 U AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, CA, USA), and dH 2 O. Amplification was performed in a 9700 Thermal Cycler (Applied Biosystems) with the following cycles: 10 min at 94 • C followed by 30 cycles of 30 s at 94 • C, 45 s at 57

Genetic Variation and Related Metrics
Diversity parameters were estimated for each SSR locus, including the number of alleles, the number of unique genotypes, observed heterozygosity (Ho), expected heterozygosity or gene diversity (GD), polymorphism information content (PIC), and the major allele frequency (MAF).The parameters were calculated with PowerMarker v3.25 software [31].The Hardy-Weinberg equilibrium (HW) and the probability of identity (PID) for both unrelated genotypes and full sibs [32] were calculated using Cervus (http://www.fieldgenetics.com)[33,34].The estimation of the expected frequency of null alleles (Fnull) was calculated in Cervus using an iterative algorithm (10 iterations) based on the observed and expected frequencies of the various genotypes [35].The kinship coefficient (Fij) was estimated according to J. Nason as described in [36] using SPAGeDi 1.5 [37], and statistical significance was determined by Jackknifed after 20,000 permutation estimators [38].

The Comparison among Populations
Genetic diversity was measured for each population across all loci by calculating the actual (A) and the effective (Ne) number of alleles/loci, observed (Ho) and expected heterozygosity (He), using GenAlEx 6.3 software [39].Allelic richness (Rs) was calculated by using the rarefaction method with HP-Rare software [40,41].The estimates of Rs were standardized on a minimum sample size of six individuals.

Distance-Based Clustering
Distance matrixes of pairwise combinations of populations and individual accessions were calculated for co-dominant data by the Codom-Genotypic distance option in GenAlEx 6.1 software [39,42].The matrixes generated were used for subsequent PCA and analysis of molecular variance (AMOVA) analyses with 999 permutations.A Mantel test, as implemented in GenAlEx, was performed using genetic and geographical distances to test for isolation by distance [43].Cluster analysis was carried out on genetic distances by the neighbor joining method [44] using MEGA 6 software [45].The results were visualized in the form of a circular tree using TreeView v1.6.6 [46].

Model-Based Clustering
We applied the model-based clustering algorithm implemented in the software, STRUCTURE [47].STRUCTURE was run independently 20 times for each K value (range 1-20) using 250,000 iterations for burn-in and 1,000,000 iterations for MCMC (Markov chain Monte Carlo).We used the admixture model option with the correlated allele frequencies [48].Parameters were set to their default values, as advised in the package documentation [49].All accessions were treated as having unknown origin (USEPOPINFO = 0).The inference of true K (number of populations) was calculated based on the second order rate of change of the likelihood (∆K) [50].The STRUCTURE results were processed using STRUCTURE HARVESTER web version v0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester/) [51].

Landscape Genetics
We also inferred population structure using a Bayesian Monte Carlo Markov Chains method implemented in the Geneland package, version 4.0.5 [52], under the R Language and Environment for Statistical Computing software, as described by Guillot et al. [53][54][55] and Guillot [56].Twenty independent Markov Chain Monte Carlo runs were performed by Geneland with the following settings: 1,000,000 iterations with 100 thinning intervals and a burn-in period of 250,000, using the correlated allele frequencies model.The maximum number of populations was unknown and hence treated as simulated variable along the MCMC simulations allowed to vary between 1 and 10.The run displayed a clear mode at K = 2 which was thus the maximum a posteriori estimate of K, confirming the number of sub-populations (K) estimated by the empirical statistic K [50].Therefore, we did a second run with the maximum number of populations set to 2. A map of posterior probabilities (membership) was obtained by inputting PostProcessChain and PostTessellation functions into Geneland by tessellating the landscape at a resolution of 1 m.The posterior probabilities of population memberships were plotted on Google maps as reported in the Geneland package version 4.0.5.Null alleles were also calculated with Geneland 4.0.5 [55], but are not reported here.

SSR Marker Polymorphism and Genetic Metrics
Several SSR markers of walnut reported in the literature were unable to provide reliable outputs, despite their repeated citation.Lack of polymorphism, multilocus amplification, a large frequency of null alleles, strong peak stuttering, and inconstant amplifications were the main drawbacks.After a preliminary amplification test, twenty markers that provided good amplifications and clear fragment separation were selected and used to genotype the whole set of samples.The allelic profiles of the collected accessions are reported in the Supplementary Material (Table S1).The number of alleles per locus ranged from two to nine (mean 4.0).The observed heterozygosity (Ho) ranged from 0.051 to 0.712 (mean 0.462), and the genetic diversity ranged from 0.059 to 0.745 (mean 0.516), which in most cases closely followed the observed heterozygosity.The polymorphism information content (PIC) ranged from 0.058 to 0.698 (mean 0.438), which also followed the observed and expected heterozygosity (Table 2).Contig_721 and Contig_1692 were the two least informative markers due to the very high frequency of the major allele (0.97 and 0.96, respectively).Generally speaking, SSR markers derived from genomic libraries were more informative and discriminated better among accessions compared to those derived from EST (Expressed Sequence Tag) libraries, with the mean number of alleles being 4.7 vs. 2.7, the observed heterozygosity 0.558 vs. 0.367, the gene diversity 0.589 vs. 0.381, and the PIC 0.505 vs. 0.313, respectively (Table 2).In spite of this low polymorphism of EST-derived markers, the cumulative probability of identity (PID), which is the probability that two individuals would share by chance the same profile over all loci, was appreciably low, ranging from 2.93 × 10 −11 for unrelated individuals to 1.075 × 10 −5 for full sibs (Table 2).The estimated frequency of null alleles was below 12%, except in the case of WGA349 where the frequency was rather higher (a.23%).The estimates provided by Cervus and Geneland were similar in magnitude but different in values due to the different algorithms adopted by the two packages.It is noteworthy that the null allele frequency is estimated indirectly because the value simply indicates an excess of homozygotes, which are not necessarily linked to the presence of null alleles.For this we took the estimate of null alleles as a warning about the use of a given marker in genotyping individuals.None of the 215 accessions showed identical profiles to any other accession of the dataset (data not shown).
Table 2. Genetic metrics of the 20 SSR markers tested on the whole set of walnut accessions: number of alleles, allele size ranges, number of genotypes, observed heterozygosity (Ho), gene diversity (GD), polymorphism information content (PIC), major allele frequency without null alleles (MAF), Hardy-Weinberg equilibrium (HW), estimated frequency of null alleles (Fnull), and probability of identity (PID) for unrelated individuals and full-sibs.  a) In the case of probability of Identity (PID) the reported value is the combined non-exclusion probability; (b) The fragment sizes do not include the 19 pigtail bases.

Hierarchical and Model-Based Clustering of Walnut Accessions
The clusters that could be identified in the tree topology, according to the level of similarity selected (Figure 2), did not group individuals according to their geographical origin.Frequently, accessions sampled from adjacent locations were clustered far away from each other.Although such evidence has already been reported for Italian walnut germplasm [24], it is not easy to explain such an unexpected outcome.From the evidence of the current study it would seem likely that the accession clustering is the result of the traditional ways that walnuts have been distributed across the region.In previous centuries, and especially in the 19th and 20th centuries, fruits collected in the alpine valleys were traditionally sold in markets across the entire region during local festivals.We have found traces of this traditional nut market in local books and leaflets (cited in [57]).The analysis of molecular variance (AMOVA) confirmed that the differences between the populations accounted for only 2.12% of the entire variance, while the variability within populations accounted for 97.87% (Table 3).The Mantel test did not reveal any significant correlation between geographic and genetic distances among accessions (data not shown).The pairwise comparisons of populations for FST did not show any apparent subdivision of the whole population in the study (Table 4), thus confirming the large genetic admixture observed with other analyses.A further analysis carried out by splitting the whole population into the five geographic sub-populations confirmed that the whole population should be correctly treated as a non-subdivided population.The only new information obtained from such an analysis was the existence of private alleles, that is alleles carried by a single individual accession (Table 5).The existence of accessions with private alleles could have the significance of an incipient genetic isolation of those accessions from the original population or sub-population, but when private alleles of a single accessions are one or very few, as this is the case, no prediction can be made on the evolution of population fragmentation.The situation could not be permanent and gene flow could change by the modification of the genetic and geographical barriers.The genetic structure was investigated by two further approaches: the first one based on the Bayesian cluster analysis implemented in STRUCTURE [47] and a second one represented by the spatial clustering model as implemented in Geneland.The advantage of using spatial vs. non-spatial clustering models lies in the ability to obtain more accurate results when the number of loci is relatively small and data are characterized by low levels of genetic differentiation [52,53].In the first case, the analysis strongly supported two clusters with K = 2 corresponding to the maximum ∆K (delta K) (Figure 3A,B).We did not find any apparent common feature, such as the geographic origin, among the samples belonging to the same cluster (Figure 4).The results of Geneland analyses were consistent with STRUCTURE by clearly showing that the same two distinct clusters can be identified in the study area (Figures 4 and 5A).In addition, Geneland indicated that the genetic differentiation between the two subpopulations (clusters 1 and 2) is small (FST = 0.033), thereby showing a similar genetic structure.In particular, cluster 1 showed a higher FIS value (0.21) with respect to cluster 2 (0.05), indicating a higher deficit of heterozygotes in the cluster 1.It is noteworthy that cluster 1 is composed of the individuals W156, W157, and W216 (Figure 5B,C).The kinship coefficient (Fij) between W156 and W157 was 0.25, indicating a full sib origin.The W216 individual has a Fij equal to 0.02 with individual W157, and a negative value with all the other individuals analyzed.Therefore, even if the relationship between the W216 and W157 individuals is very low, this is sufficient to explain the presence of W216 in cluster 1.  S1. Accessions from the same geographic area have the same color (see the legend in Figure 1B for association of the colors with the geographical areas).S1.Accessions from the same geographic area have the same color (see the legend in Figure 1B for association of the colors with the geographical areas).S1.Accessions from the same geographic area have the same color (see the legend in Figure 1B for association of the colors with the geographical areas).

Conclusions
Considering the large number of accessions collected in the sampling area, we are confident that we have sampled a good representation of the genetic diversity of walnuts in the Friuli Venezia Giulia region.If we exclude the EST-derived markers, which showed very low polymorphism, the genetic metrics observed in the 13 genomic markers of the study were similar to those reported by Pollegioni et al. [24].These authors reported that the common markers analyzed in 456 walnut trees sampled in Italy were substantially comparable with earlier reports that considered large regional or national sampling [18,22].As stated by other authors, the genetic diversity of J. regia investigated primarily on a national or regional scale across Eurasia has revealed low levels of genetic differentiation among populations and no robust geographic patterns of genetic diversity [10,12,24], and this has been reinforced by the current work, even though it was carried out on a limited geographical scale.The reasons for such moderate genetic diversity are not easily determined.One hypothesis is that the reservoirs of walnut genetic diversity in Europe were largely eroded during the last glacial maximum [14]; the second hypothesis, which is not in contrast with the first one, rules out that the re-introduction of walnut from the Western Asia by humans suffered a bottleneck [12], and has not been mitigated by the few generations that have passed in the intervening 2000 years.Historians report a well-developed walnut industry in the sub-alpine arch surrounding the Friuli plain.The people of Carnia and the Torre and Natisone valleys attended local festivals bringing and selling their nuts collected from the wild or grown in their courtyards from heirloom seed [57].Walnut seeds were for human consumption, but they were occasionally planted at the birth of children in looking ahead to their eventual marriages, at which time the trees were cut to craft furniture for the new family.This dispersal of seeds could explain the inability of the dendrogram of genetic distances among accessions to group samples according to neighborhood, but instead it distributed the local samples into numerous clusters that did not represent any of the geographical areas identified at the time of planning the study.The walnut admixture we report as a result of human traditions and heirloom

Figure 1 .Figure 1 .
Figure 1.(A) General map with a red circle representing the region of study; (B) Explosion of the sampling areas, for the color code see the legend below.Trieste Karst

Figure 2 .
Figure 2. UPGMA (Unweighted Pair Group Method with Arithmetic Mean)-based circular dendrogram showing genetic similarities between the 215 walnut (Juglans regia L.) accessions analyzed as listed in the supplementary material, TableS1.Accessions from the same geographic area have the same color (see the legend in Figure1Bfor association of the colors with the geographical areas).

Figure 3 .
Figure 3. Inference of K, the most probable number of clusters, using STRUCTURE software, based on microsatellite analysis of 215 walnut samples.(A) Plot of mean likelihood L(K) and variance per K value from STRUCTURE on a dataset containing 215 individuals genotyped for 20 polymorphic microsatellite loci over 20 runs for each K value.(B) Plot of the ad hoc statistic ΔK, which is based on the rate of change of the log-likelihood as K is increased)[50] ΔK tends to peak at the value of K that corresponds to the highest level of hierarchical substructure.The modal value of this distribution is the true K, here two clusters.

Figure 2 .
Figure 2. UPGMA (Unweighted Pair Group Method with Arithmetic Mean)-based circular dendrogram showing genetic similarities between the 215 walnut (Juglans regia L.) accessions analyzed as listed in the Supplementary Material, TableS1.Accessions from the same geographic area have the same color (see the legend in Figure1Bfor association of the colors with the geographical areas).

Figure 2 .
Figure 2. UPGMA (Unweighted Pair Group Method with Arithmetic Mean)-based circular dendrogram showing genetic similarities between the 215 walnut (Juglans regia L.) accessions analyzed as listed in the supplementary material, TableS1.Accessions from the same geographic area have the same color (see the legend in Figure1Bfor association of the colors with the geographical areas).

Figure 3 .
Figure 3. Inference of K, the most probable number of clusters, using STRUCTURE software, based on microsatellite analysis of 215 walnut samples.(A) Plot of mean likelihood L(K) and variance per K value from STRUCTURE on a dataset containing 215 individuals genotyped for 20 polymorphic microsatellite loci over 20 runs for each K value.(B) Plot of the ad hoc statistic ΔK, which is based on the rate of change of the log-likelihood as K is increased)[50] ΔK tends to peak at the value of K that corresponds to the highest level of hierarchical substructure.The modal value of this distribution is the true K, here two clusters.

Figure 3 .
Figure 3. Inference of K, the most probable number of clusters, using STRUCTURE software, based on microsatellite analysis of 215 walnut samples.(A) Plot of mean likelihood L(K) and variance per K value from STRUCTURE on a dataset containing 215 individuals genotyped for 20 polymorphic microsatellite loci over 20 runs for each K value; (B) Plot of the ad hoc statistic ∆K, which is based on the rate of change of the log-likelihood as K is increased) [50] ∆K tends to peak at the value of K that corresponds to the highest level of hierarchical substructure.The modal value of this distribution is the true K, here two clusters.Forests 2017, 8, 81 10 of 15

Figure 4 .
Figure 4. Population structure inference for 215 walnut (Juglans regia L.) accessions as assigned using STRUCTURE [47] and the admixture option for K = 2 (see text for details).Vertical bars represent individual accessions.The length of segment color in each vertical bar represents the proportion contributed by each of the two populations in the model (represented by different colors) to that individual.

Figure 4 .
Figure 4. Population structure inference for 215 walnut (Juglans regia L.) accessions as assigned using STRUCTURE [47] and the admixture option for K = 2 (see text for details).Vertical bars represent individual accessions.The length of segment color in each vertical bar represents the proportion contributed by each of the two populations in the model (represented by different colors) to that individual.

Figure 4 .
Figure 4. Population structure inference for 215 walnut (Juglans regia L.) accessions as assigned using STRUCTURE [47] and the admixture option for K = 2 (see text for details).Vertical bars represent individual accessions.The length of segment color in each vertical bar represents the proportion contributed by each of the two populations in the model (represented by different colors) to that individual.

Figure 5 .Table 5 .
Figure 5. Results of Geneland analysis of 215 walnut (Juglans regia L.) accessions for K = 2. (A) Map of estimated population membership plotted on Google maps, with cluster 1 in green and cluster 2 in white.The accession IDs of the three individuals in cluster 1 are recorded (see text for details).(B) and (C) are maps of posterior probabilities of cluster 1 and cluster 2, respectively.Contour lines represent posterior probabilities of belonging to cluster 1 (B) and 2 (C).Table 5. Genetic diversity within the five populations of Persian walnut (between brackets the no. of samples) collected in the Friuli Venezia Giulia Region assessed with 20 SSR loci.A = no. of alleles, Ne = effective no. of alleles, Rs = allelic richness, Ho = observed heterozygosity, GD = gene diversity, Pa = private alleles (the private allele length is listed in brackets).

Figure 5 .
Figure 5. Results of Geneland analysis of 215 walnut (Juglans regia L.) accessions for K = 2. (A) Map of estimated population membership plotted on Google maps, with cluster 1 in green and cluster 2 in white.The accession IDs of the three individuals in cluster 1 are recorded (see text for details); (B) and (C) are maps of posterior probabilities of cluster 1 and cluster 2, respectively.Contour lines represent posterior probabilities of belonging to cluster 1 (B) and 2 (C).

Table
1 with primers and annealing temperatures for the convenience of the reader.General map with a red circle representing the region of study; (B) Explosion of the sampling areas, for the color code see the legend below.General map with a red circle representing the region of study; (B) Explosion of the sampling areas, for the color code see the legend below.General map with a red circle representing the region of study; (B) Explosion of the sampling areas, for the color code see the legend below.
Figure 1.(A) General map with a red circle representing the region of study; (B) Explosion of the sampling areas, for the color code see the legend below.

Table 1 .
List of the 20 Simple Sequence Repeat (SSR) markers selected from the literature for this study.
• C, 45 s at 72 • C; eight cycles of 30 s at 94 • C, 45 s at 53 • C, 45 s at 72 • C; and a final extension cycle of 30 min at 72 • C. The PCR products were separated in an ABI 3730 DNA automatic sequencer (Applied Biosystems) and fragments sized by means of the GeneScan™ 500 LIZ ® Size Standard (Applied Biosystems).Data were analyzed and alleles called with GeneMarker software version 2.2.0 (SoftGenetics, College Station, TX, USA).

Table 3 .
Analysis of Molecular Variance (AMOVA) of five walnut populations analyzed with 20 SSRs.

Table 4 .
Pairwise Genetic distances (FST) between sampling locations (the number of samples is provided in brackets), d.f.= degrees of freedom

Table 5 .
Genetic diversity within the five populations of Persian walnut (between brackets the no. of samples) collected in the Friuli Venezia Giulia Region assessed with 20 SSR loci.A = no. of alleles, Ne = effective no. of alleles, Rs = allelic richness, Ho = observed heterozygosity, GD = gene diversity, Pa = private alleles (the private allele length is listed in brackets).