Molecular Characterization of Genetic Diversity in Apricot Cultivars: Current Situation and Future Perspectives

In recent years, an important renewal of apricot cultivars has been taking place worldwide with the introduction of a large number of new releases, which are replacing traditional and local cultivars in many situations. To study the current genetic diversity, a group of 202 apricot accessions, including landraces and releases from breeding programs in several countries, was characterized using 13 microsatellite markers. The diversity parameters showed higher diversity in modern releases than in landraces, but also suggested a loss of diversity associated with recent breeding. Two main clusters according to the pedigree origin of the accessions were clearly differentiated in the phylogenetic analysis based on Nei's genetic distance. The first group comprised mostly European and North American traditional cultivars, and the second group included the majority of recent and commercial releases from breeding programs. Further population analyses showed the same clustering trend in the distribution of individuals and clusters, confirming the results obtained in the molecular phylogenetic analysis. These results provide insight into the erosion and decrease of genetic diversity in currently grown apricot and highlight the importance of preserving traditional cultivars and local germplasm to ensure genetic resources for further breeding.


Introduction
Apricot (Prunus armeniaca L.) is a diploid fruit tree species of the Rosaceae family. It originated in Central Asia, where the first evidence of apricot cultivation dates from 406-250 BC [1,2]. The crop spread worldwide through three diffusion routes: Eastern Asia to Japan, the Irano-Caucasian region, and Continental Europe [3]. From the Irano-Caucasian region, it reached the Mediterranean countries by two secondary routes, Southern Europe and North Africa, giving rise to three major apricot gene pools throughout the Mediterranean Basin: the "Irano-Caucasian", the "North-Mediterranean Basin" and the "South-Mediterranean Basin" [4]. A recent study of single nucleotide polymorphisms (SNPs) in apricot revealed that cultivated apricot resulted from two different domestication events. The European cultivated apricots diverged from the wild populations of Northern Central Asia and resulted in four differentiated groups: Mediterranean countries, Continental Europe, North America and North Africa. On the other hand, Chinese cultivated apricots were domesticated from Southern Central Asian wild populations [5].
Nowadays, apricots are cultivated in temperate regions around the world, constituting the third stone fruit crop in economic importance worldwide. Nearly 50% of the world production is concentrated in Mediterranean countries. Turkey was the leading world producer in 2019 with 846,606 t (20.7% of world production), followed by Uzbekistan (536,544 t, 13.1%), Iran (329,638 t, 8.1%), Italy (272,990 t, 6.7%) and Algeria (209,224 t, 5.1%) [6].
Apricot cultivars have been traditionally classified into eco-geographical groups according to their geographical origin: Central Asian, East Chinese, North Chinese, Dzhungar-Zailij, Irano-Caucasian, and European [7]. Cultivars from the Central Asian group, which is the oldest and most diverse, are mainly self-incompatible, show high chilling requirements and produce small to medium fruits. The Dzhungar-Zailij group includes mostly self-incompatible and small-fruited cultivars. Asian cultivars have recently been differentiated into two main gene pools: Central Asia and Eastern Asia, which includes Japanese apricots [3]. Most cultivars from the Irano-Caucasian group are characterized by self-incompatibility and low chilling requirements. The European group includes most of the commercial cultivars of Europe, North America, South Africa and Australia. Most of them are self-compatible [8] and have low chilling requirements [9] and a short ripening time [10,11]. Two main gene pools have recently been differentiated in the European group: Mediterranean Europe and Continental Europe [3].
In recent years, an important renewal of apricot cultivars has been taking place worldwide with the introduction of a large number of new releases in response to productive and industrial changes in the crop. Breeding programs from several countries have developed a number of new commercial cultivars focused on common objectives: self-compatibility [12,13], resistance to Plum Pox Virus (PPV), fruit quality, and extension of the ripening period [14]. The release of these new cultivars has led to the displacement of local cultivars in many countries, resulting in genetic erosion of apricot diversity [15].
However, information is lacking on the diversity relationships of most recent apricot releases. In order to fill this gap, a group of 202 apricot accessions, including landraces and releases from breeding programs in several countries, was characterized using SSRs to (i) evaluate the current genetic diversity, (ii) establish the similarity relationships between cultivars, and (iii) estimate the levels of population structure in the main cultivars currently grown.

Plant Material
A group of 202 apricot accessions was analyzed, including 30 landraces, 171 releases from breeding programs, and one aprium, an interspecific hybrid between apricot (P. armeniaca) and plum (Prunus salicina Lindl.) (Table 1). Plant material was collected from germplasm collections and commercial orchards of Aragón, Cataluña, and Extremadura (Spain). The landraces originate from six countries, and the releases come from 33 private and public breeding programs in ten countries. Two groups were considered within the bred accessions: commercial cultivars registered in the Community Plant Variety Office (N = 132) [28] and recent unregistered releases (N = 39). The aprium hybrid was considered a commercial release for the analysis (Table 1).

Data Analysis
In order to analyze the genetic variability and the population structure, different statistical analyses were performed using the R programming environment (version 4.1.0 [36]). The genetic profiles were stored in a CSV file in which each allele was coded by a character string. In order to process the SSR dataset, the file was converted into a matrix of allelic frequencies stored in a genind object with the "df2genind" function of the R package "adegenet" v. 2.1.3 [37]. Missing data (<0.1%) were replaced with the mean frequency of the corresponding allele, which avoids adding artefactual between-group differentiation [38].
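The conversion and imputation step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the "adegenet" implementation; the function name `locus_frequencies`, the "a/b" genotype coding and the accession labels are assumptions introduced here.

```python
from collections import Counter

def locus_frequencies(genotypes, sep="/"):
    """Per-individual allele frequencies at one locus (hypothetical helper).

    genotypes maps an accession label to a string such as "120/124",
    or to None when the genotype is missing. Missing rows are filled
    with the mean frequency of each allele over the typed individuals.
    """
    alleles = sorted({a for g in genotypes.values() if g for a in g.split(sep)})
    table = {}
    for name, g in genotypes.items():
        if g:
            counts = Counter(g.split(sep))
            n = sum(counts.values())
            table[name] = {a: counts[a] / n for a in alleles}
        else:
            table[name] = None  # filled in below
    typed = [row for row in table.values() if row is not None]
    means = {a: sum(row[a] for row in typed) / len(typed) for a in alleles}
    return {name: (row if row is not None else dict(means))
            for name, row in table.items()}

# Illustrative labels and allele sizes only, not data from this study.
example = locus_frequencies({"acc1": "120/124", "acc2": "124/124", "acc3": None})
print(example["acc3"])
```

Because the imputed row equals the overall mean, a missing genotype sits at the centre of the frequency space and does not pull an accession toward any particular group.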
Number of alleles per locus (Na), allelic richness (Ar), private alleles (Pa), observed heterozygosity (Ho), expected heterozygosity (He), tests for deviations from Hardy-Weinberg expectations (HWE) and inbreeding coefficients (FIS) were calculated for the landraces and bred cultivars using the "adegenet" v. 2.1.3 [37], "hierfstat" v. 0.5-7 [39], "pegas" v. 1.0-1 [40] and "PopGenReport" v. 3.0.4 [41] packages. The levels of genetic differentiation between all pairs of populations for pre-defined and inferred groups were estimated as Nei's pairwise FST values with the "hierfstat" v. 0.5-7 R package [39]. In order to validate the pairwise FST values, a bootstrap with 1000 replicates was carried out using the function "boot.ppfst". Results were plotted as a correlation plot with the R package "corrplot" v. 0.90 [42]. Additionally, the distribution of genetic diversity across the population structure was evaluated with an Analysis of Molecular Variance (AMOVA) using the "poppr" R package v. 2.9.2 [43]. A three-level hierarchical analysis was designed to show the variation within and among the source of origin, the classification (landrace, commercial and recent releases), and the breeding program. All AMOVA analyses were performed with the "ade4" v. 1.7-16 package [44] using 1000 permutations to assess the significance of variance components.
An R script was developed to detect homonymies and synonymies in the data. Homonymies were identified by comparing all accession names with the "duplicated" function, and synonymies were identified by applying the same function to the allele data in order to detect identical genetic profiles.
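As a hedged illustration of how three of these parameters relate to each other, the sketch below computes Na, Ho, He and FIS for a single locus using the textbook definitions He = 1 − Σp² and FIS = 1 − Ho/He. It is a Python sketch with a hypothetical function name, and it omits the sample-size corrections applied by packages such as "hierfstat", so its values would not match the reported ones exactly.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Na, Ho, He and FIS for one locus (simplified, uncorrected estimators).

    genotypes is a list of (allele, allele) tuples for diploid individuals.
    """
    n = len(genotypes)
    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies from the 2n gene copies.
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Positive FIS: heterozygote deficit; negative FIS: heterozygote excess.
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return {"Na": len(counts), "Ho": ho, "He": he, "FIS": fis}

# Illustrative genotypes only, not data from this study.
print(locus_diversity([(1, 1), (1, 2), (2, 2), (1, 2)]))
```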
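The duplicate check can be sketched as follows. This is a Python illustration of the idea rather than the authors' R script; the function name `find_repeats`, the record layout and the accession labels are assumptions.

```python
from collections import defaultdict

def find_repeats(records):
    """Flag repeated accession names and identical multilocus profiles.

    records is a list of (name, profile) pairs, where profile is a
    sequence of per-locus genotype strings. Returns the names that occur
    more than once and the groups of distinct names sharing one profile.
    """
    name_counts = defaultdict(int)
    names_by_profile = defaultdict(list)
    for name, profile in records:
        name_counts[name] += 1
        names_by_profile[tuple(profile)].append(name)
    repeated_names = sorted(n for n, c in name_counts.items() if c > 1)
    synonym_groups = sorted(
        sorted(set(names))
        for names in names_by_profile.values()
        if len(set(names)) > 1
    )
    return repeated_names, synonym_groups

# Illustrative records only, not data from this study.
records = [
    ("acc1", ("120/124", "200/200")),
    ("acc2", ("120/124", "200/200")),  # same profile as acc1
    ("acc3", ("118/124", "200/204")),
    ("acc3", ("118/120", "200/204")),  # same name, different profile
]
print(find_repeats(records))
```

Repeated names still need manual inspection: two entries with the same name and the same profile are simply a duplicated accession, whereas the same name with different profiles points to a homonymy.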
Genetic relatedness among genotypes was analyzed by UPGMA (Unweighted Pair Group Method with Arithmetic averages) cluster analysis based on the Nei and Li similarity index. A dendrogram was generated using the "poppr" package v. 2.9.2 [43] with 1000 bootstrap replicates and plotted with "ape" package v. 5.3 [45].
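A minimal sketch of the two ingredients of this analysis is given below, in Python and under stated assumptions: each allele at each locus is treated as a band, the Nei and Li index is computed as S = 2·n_ab / (n_a + n_b), and UPGMA is run on the distances 1 − S. The function names and data layout are hypothetical, and the bootstrap support computed with "poppr" is not reproduced.

```python
def nei_li_similarity(profile_a, profile_b):
    """Nei and Li index between two multilocus profiles.

    Each profile is a sequence of per-locus allele tuples; every
    (locus, allele) combination is treated as one band.
    """
    bands_a = {(i, al) for i, g in enumerate(profile_a) for al in g}
    bands_b = {(i, al) for i, g in enumerate(profile_b) for al in g}
    shared = len(bands_a & bands_b)
    return 2.0 * shared / (len(bands_a) + len(bands_b))

def upgma(names, dist):
    """UPGMA clustering; returns the tree as nested tuples.

    dist maps a pair of names (any iterable of two) to their distance.
    """
    size = {n: 1 for n in names}
    d = {frozenset(k): v for k, v in dist.items()}
    while len(size) > 1:
        # Join the closest pair of current clusters.
        a, b = sorted(min(d, key=d.get), key=str)
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the distances to the merged members.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

# Illustrative profiles only, not data from this study.
profiles = {
    "acc1": ((120, 124), (200, 200)),
    "acc2": ((120, 120), (200, 204)),
    "acc3": ((118, 126), (204, 208)),
}
labels = sorted(profiles)
dist = {
    (x, y): 1.0 - nei_li_similarity(profiles[x], profiles[y])
    for i, x in enumerate(labels) for y in labels[i + 1:]
}
print(upgma(labels, dist))
```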
The genetic structure of the set of accessions was analyzed by a Principal Components Analysis (PCA) and a Discriminant Analysis of Principal Components (DAPC). PCA was performed using the "stats" package v. 3.6.0 and was plotted using the "ggplot2" v. 3.3.4 package [46]. DAPC was carried out using the "adegenet" package v. 2.1.3 [37]. First, genetic data were transformed using PCA, and then a Discriminant Analysis was performed on the retained principal components (PCs), selected using a cross-validation method. The optimal number of clusters (k) was determined according to the lowest Bayesian Information Criterion (BIC) value obtained with the "find.clusters" function. A cross-validation function ("xvalDapc") was used to confirm the appropriate number of PCs to be retained.

Microsatellite Polymorphism and Genetic Diversity
Ten of the 13 microsatellite markers were polymorphic in the analysis of the 202 accessions (Table 1), but no amplification patterns were obtained with three loci (UDP96-001, ssrPaCITA12 and ssrPaCITA19) (Table 2). The alleles obtained for each accession and locus can be found in Supplementary Table S1.
In order to evaluate the genetic diversity, different parameters were compared between the landraces and the releases from breeding programs. Additionally, the diversity indexes were studied for each group of landraces, commercial cultivars and recent releases (Table 3). The mean number of alleles found in landraces (6.50) was higher than those obtained in previous reports for traditional cultivars in Spain (4.00 [23]; 4.27 [15]) or Iran (4.62 [47]; 3.01 [27]), presumably because of the larger and more diverse set of accessions analyzed in this work. However, the number of alleles was lower than those obtained in reports that include wild apricots (23.00 [48]; 16.75 [20]), probably because cultivars from China, the center of origin [20], were not included, as reported by Bourguiba et al. [3]. The highest number of alleles for all of the studied SSR loci was detected in the group of releases from breeding programs, ranging from 7 to 12, in which the average number of alleles (9.40) was higher than that of landraces, although the difference was not significant at the 0.05 level (Supplementary Table S2). Traditional cultivars from different countries have been widely used as material for breeding [49,50]. In Europe, traditional cultivars were susceptible to Sharka (PPV) and most of them were replaced in the late 20th century. Then, most breeding programs focused on developing new cultivars by crossing traditional cultivars from each country, to conserve interesting traits, with PPV-resistant cultivars developed in North America in order to transmit the source of PPV resistance to the new releases. This admixture in new cultivars could explain the higher number of alleles observed in bred releases, due to the incorporation of genotypes developed in North America with Asian ancestors.
In commercial cultivars, the average number of alleles (9.30) was higher than in recent releases (6.30), although not significantly so at the 0.05 level, indicating a gradient of decreasing genetic diversity, which suggests that controlled selection in breeding programs may be causing a reduction in the diversity of the crop in recent years. In spite of the higher allelic diversity found in the releases from breeding programs, the allelic richness calculated to measure genetic diversity (6.97) did not show significant differences at the 0.05 level with respect to landraces (6.20), which can be related to the diversity of origins in the bred accessions.
Table 3. Genetic parameters of apricot landraces and cultivars released from breeding programs (commercial and recent). Mean number of alleles (Na), mean allelic richness (Ar), number of private alleles (Pa), mean observed heterozygosity (Ho), mean expected heterozygosity (He), and mean inbreeding coefficient (FIS).
Only one private allele was exclusively found in the landraces, while 30 were found in releases from breeding programs. The presence of rare alleles in bred cultivars is related to the fact that these cultivars have been enriched with germplasm of landraces from different origins. However, the number of private alleles was lower when two separate groups were considered within the group of bred releases: commercial cultivars (22) and recent releases (1), showing a loss of variability in the next generation of apricot cultivars. In general, most of the private alleles were also unique, as they were exclusive to only one genotype; for example, the one found in the group of landraces is only present in the cultivar "Stella" (Tables 1 and 3).
The mean inbreeding coefficient (FIS) was positive in the landraces (0.08), indicating a certain degree of inbreeding, whereas it was negative in both recent and commercial releases (−0.04) (Table 3). In all accessions, seven out of ten loci showed no deviation from Hardy-Weinberg Equilibrium (HWE), and three (ssrPaCITA23, ssrPaCITA27 and UDP98-412) showed significant departures from HWE (p < 0.05) (Supplementary Table S2). A loss of apricot genetic diversity during domestication and diffusion from the center of origin to areas of more recent cultivation has been reported in previous works [3,4]. Our results showed an excess of heterozygosity in the bred releases, which is an indicator of a recent bottleneck, suggesting a loss of diversity associated with breeding. This can be due to the use of the same or very closely related parental genotypes in different breeding programs, thus reducing the diversity found in the new cultivars. On the other hand, we found a deficit of heterozygotes in the group of landraces as a result of inbreeding, which is a sign of expansion since traditional cultivars are a source of genetically interesting traits.
To analyze the genetic distance between all pairs of populations, we calculated Nei's pairwise FST matrix (Supplementary Table S3). The pairwise FST values among the pre-defined groups showed a moderate and significant genetic differentiation ranging from 0.09 to 0.12 (p < 0.001), although lower than that observed in previous studies in which wild apricots (0.14 [48]) or cultivars from different eco-geographic groups were analyzed (0.58 [19]; 0.14 [26]; 0.32 [51]; 0.38 [15]).
The hierarchical population structure examined by an Analysis of Molecular Variance (AMOVA) (Table 4) showed that the highest variation (93.55%) occurred within the cultivars, although no significant differences were found. The distribution of genetic diversity suggested a significant population structure considering two origin levels, landraces and breeding releases, and a moderate (6.18%) and significant differentiation among the breeding programs. The level of variation among populations is lower than that obtained when apricot cultivars are grouped by geographical criteria [3,4] or reported in other Prunus species [56,57]. However, these values, together with those of the fixation index, seem to support the hypothesis that there is a certain level of variation among landraces and new releases.
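For orientation, the quantity behind these values can be written as FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled allele frequencies. The Python sketch below evaluates this for a single locus; it is a simplified illustration with a hypothetical function name and without the sample-size corrections used by "hierfstat", so it is not expected to reproduce the values in Supplementary Table S3.

```python
def nei_fst(pop_freqs):
    """Simplified single-locus FST = (HT - HS) / HT.

    pop_freqs is a list of dicts, one per population, mapping each
    allele to its frequency in that population.
    """
    alleles = set().union(*pop_freqs)
    k = len(pop_freqs)
    # HS: mean within-population expected heterozygosity.
    hs = sum(1.0 - sum(p.get(a, 0.0) ** 2 for a in alleles) for p in pop_freqs) / k
    # HT: expected heterozygosity from the unweighted mean allele frequencies.
    mean = {a: sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles}
    ht = 1.0 - sum(f ** 2 for f in mean.values())
    return (ht - hs) / ht if ht > 0 else float("nan")

# Illustrative frequencies only, not data from this study.
print(nei_fst([{"A": 0.75, "B": 0.25}, {"A": 0.25, "B": 0.75}]))
```

Values near 0 mean that the groups share almost the same allele frequencies, and values near 1 mean that they are fixed for different alleles.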

Genetic Relationships and Similarities among Genotypes
The dendrogram generated from the UPGMA cluster analysis based on the Nei and Li similarity index revealed two main clusters supported by a strong bootstrap value (100) (Figure 1A), which clearly differentiated landraces (I) from breeding releases (II). The first cluster (I), containing mostly European and North American traditional cultivars, was subdivided into two sub-clusters according to geographical origin. One of them (I.A) was composed of traditional cultivars from Spain (e.g., "Currot", "Moniqui", "Búlida"), France ("Beliana") and Greece ("Lito", "Précoce de Tirynthos"). Additionally, recent cultivars from Spanish public breeding programs such as "Dama rosa" (IVIA), "Mirlo rojo", "Mirlo blanco" and "Murciana" (CEBAS) were grouped close to the Spanish traditional cultivars, which is expected since they were developed from local Spanish germplasm. The other sub-cluster (I.B) included three North American cultivars: "Stella" and "Veecot" from the USA, and "Harval" from Canada. This grouping supports the view that North American cultivars originated from European cultivars, as suggested by Faust et al. [1] and later shown in different studies [22,58,59]. Our results are also in agreement with the recent breeding activity, since a great varietal renewal has taken place in the last 15 years, with the introduction of 322 new cultivars in Europe [28] from breeding programs worldwide [14]. The fact that breeding programs share some objectives, including resistance to sharka (PPV), has led to the use of a few PPV-resistant North American cultivars as parental lines to introduce this trait into European germplasm to generate new resistant cultivars, since all European cultivars are susceptible to PPV [49,50].
The second cluster (II) comprises the aprium hybrid and two sub-clusters formed mainly of breeding accessions, including recent and commercial releases. Sub-cluster II.A included cultivars from France and Hungary. The Hungarian cultivar "Gönci magyar" was clustered close to the French cultivars "Luizet", "Paviot", "Bergeron", "Bergecot", which is a mutation of "Bergeron", and "Bergarouge", which is a descendant of "Bergeron" [60]. This and previous studies [22,51] support the hypothesis of the presence of Hungarian apricots in the pedigree of some French cultivars [1].

Population Structure
To study the genetic structure of the set of apricot accessions, a Principal Components Analysis (PCA) was carried out, followed by a Discriminant Analysis of Principal Components (DAPC) to further analyze the population structure. This approach identifies and describes clusters of genetically related individuals, which provides a visual assessment of between-population genetic structure.
The first two components of the PCA are represented in Figure 2. The first axis (PC-1, 12.85%) reflected population differentiation corresponding to the breeding origin, forming two main clusters. Landraces were located on the left along the x-axis, and breeding releases on the right, without a clear division between recent releases and commercial cultivars. In the second axis (PC-2, 9.05%), no clearly differentiated groups were observed.
In the DAPC, BIC values were used to determine the most appropriate number of clusters (k = 10) (Supplementary Figures S1 and S2). The first two principal components of the DAPC are plotted in Figure 3. The distribution of individuals and clusters showed the same clustering trend that we detected in the molecular phylogenetic analysis (Figure 1A,B), with a hierarchical structure where three groups could be identified. Spanish landraces were included in only two clusters (Supplementary Table S4). Clusters 1 (N = 10) and 2 (N = 20) were markedly separated from the others on the first principal component (Figure 3, horizontal axis), suggesting genetic structure differentiation. Although the clusters partially overlap, these results corroborate the hypothesis of the existence of two main genetic pools in Spain [23]: Cluster 1 encompassed "Moniquí" and its synonyms, and Cluster 2 comprised the cultivars originating in Valencia and Murcia.
Interestingly, North American cultivars ("Harval", "Henderson", "Stark Early Orange", "Stella", and "Veecot") and most of the commercial cultivars were included in the other clusters (Supplementary Table S4). This is in agreement with the use of American genotypes as a genetic source for Sharka resistance in apricot breeding. The majority of the recent releases from Spanish breeding programs and a high number of commercial Spanish cultivars were plotted overlapping each other and separately from the rest of the bred cultivars in clusters 8 (N = 24) and 10 (N = 20). Finally, the other six clusters overlapped, indicating relatedness among them within the breeding programs. This repeated use of some common genotypes as parental lines in most breeding programs may result in a loss of genetic diversity and, as a consequence, there is a risk of a bottleneck in future generations of this crop.
FST values between clusters were calculated in order to study the differentiation between the populations obtained by DAPC (Figure 4; Supplementary Table S5). All pairwise values were highly significant (p < 0.001), showing moderate or high genetic differentiation among the defined clusters. The values were higher between cluster 1 and the rest of the clusters (0.20-0.39), reflecting a great genetic differentiation. Therefore, these results provide valuable information for breeding programs and for the conservation of germplasm collections. The choice of landraces from cluster 1 in breeding programs would increase the diversity of the new cultivars. Furthermore, the use of this traditional germplasm would provide a source of interesting traits that could be used to respond to new market demands or agroclimatic conditions.

Conclusions
Results reveal a clear differentiation between apricot landraces, commercial cultivars and recent releases developed from breeding programs. The results showed higher diversity in bred cultivars than in landraces. This could seem a paradox but this situation could be explained by the introduction in most breeding programs of North American genotypes with alleles from Asian genotypes not present in the European landraces. As a consequence, although the introduction of new releases is increasing allelic diversity in cultivated apricot germplasm, our results suggest that the use of common parents in breeding programs can lead to a genetic bottleneck. Thus, the replacement of local landraces and traditional cultivars by genetically related bred genotypes is resulting in an erosion and decrease of the genetic diversity in grown apricot worldwide. The preservation of traditional cultivars and local germplasm is important in order to reduce the genetic erosion and conserve valuable genetic resources for further breeding.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/agronomy11091714/s1, Figure S1: Cross-validation procedure to choose the optimal number of Principal Components for the DAPC analysis. (A) A general analysis with 30 replicates of cross-validation for each PC. (B) Specifying a number interval of PC around 50 with 1000 replicates. Figure S2: Graph of BIC values showing the optimum number of clusters (k = 10). Table S1: Genotypes of 201 diploid apricot accessions and one interspecific hybrid (aprium) revealed by 10 microsatellite markers. Table S2: Genetic parameters of apricot landraces and cultivars released from breeding programs (commercial and recent) for the 10 studied microsatellite loci. Table S3: Pairwise FST calculated among apricot accessions using the "hierfstat" v. 0.5-7 R package. (A) Landraces (LN) and cultivars released from breeding programs (BP). (B) Upper limit (above the diagonal) and lower limit (below the diagonal) of the 99% confidence interval based on 1000 bootstrap replicates between LN and BP. (C) Landraces (LN), commercial (CR) and recent released selections (RR) from breeding programs. (D) Upper limit (above the diagonal) and lower limit (below the diagonal) of the 99% confidence interval based on 1000 bootstrap replicates among LN, CR, and RR. Table S4: List of accessions including group assignment from DAPC analysis (k = 10). Table S5: Upper limit (above the diagonal) and lower limit (below the diagonal) of the 99% confidence interval based on 1000 bootstrap replicates among populations identified by DAPC using the "hierfstat" v. 0.5-7 R package.