Labelling Selective Sweeps Used in Durum Wheat Breeding from a Diverse and Structured Panel of Landraces and Cultivars

Simple Summary Evaluation of the genetic diversity of a crop species is a critical step for breeding. Landraces are essential to avoid genetic erosion, and Mediterranean landraces are an important group of genetic resources due to their high genetic variability, adaptation to local conditions in rainfed environments, and their resilience to pests and pathogens. This study uses a genome-wide association approach employing eigenvectors to identify selective sweeps among Mediterranean durum wheat landraces and a world panel of modern cultivars. Abstract A panel of 387 durum wheat genotypes including Mediterranean landraces and modern cultivars was characterized with 46,161 diversity arrays technology (DArTseq) markers. Analysis of population structure uncovered the existence of five subpopulations (SP) related to the pattern of migration of durum wheat from the domestication area to the west of the Mediterranean basin (SPs 1, 2, and 3) and further improved germplasm (SPs 4 and 5). The total genetic diversity (HT) was 0.40 with a genetic differentiation (GST) of 0.08 and a mean gene flow among SPs of 6.02. The lowest gene flow was detected between SP 1 (presumably the ancient genetic pool of the panel) and SPs 4 and 5. However, gene flow from SP 2 to modern cultivars was much higher. The highest gene flow was detected between SP 3 (western Mediterranean germplasm) and SP 5 (North American and European cultivars). A genome wide association study (GWAS) approach using the top ten eigenvectors as phenotypic data revealed the presence of 89 selective sweeps, represented as quantitative trait loci (QTL) hotspots, widely distributed across the durum wheat genome. A principal component analysis (PCoA) using 147 markers with −log10 p > 5 identified three regions located on chromosomes 2A, 2B and 3A as the main drivers for differentiation of Mediterranean landraces. Gene flow between SPs offers clues regarding the putative use of Mediterranean old durum germplasm by the breeding programs represented in the structure analysis. EigenGWAS identified selective sweeps among landraces and modern cultivars. The analysis of the corresponding genomic regions in the ‘Zavitan’, ‘Svevo’ and ‘Chinese Spring’ genomes discovered the presence of important functional genes including Ppd, Vrn, Rht, and gene models involved in important biological processes including LRR-RLK, MADS-box, NAC, and F-box.


Introduction
Durum wheat (Triticum turgidum L. var. durum) originated in the Fertile Crescent 10,000 years ago and propagated across the Mediterranean basin, arriving in the Iberian Peninsula from two routes: southern Europe and northern Africa [1,2]. During this migration, both natural and human selection occurred and new traits allowing adaptation to the new environments were selected, resulting in the expansion of local landraces [3]. Landraces were broadly cultivated until the 1960s, when they were replaced by new and improved semi-dwarf cultivars arising from the Green Revolution. However, due to their

Plant Material
The diversity panel was comprised of a panel of 387 durum wheat genotypes, including 183 landraces from 24 Mediterranean and eastern European countries and a set of commercial varieties from 24 countries, representing the main durum wheat growing areas in the world (204 genotypes) (Supplementary Table S1). The landrace populations were supplied by public gene banks (the Centro de Recursos Fitogenéticos CRF-INIA, Spain, the ICARDA Germplasm Bank, and the USDA Germplasm Bank) and were increased in bulk and purified to select the dominant type (frequency higher than 80%). Modern cultivars were provided by the IRTA durum wheat collection, international centres (CIMMYT and ICARDA), and breeding companies.

Genotyping
DNA was isolated from fresh leaf samples according to Doyle and Doyle [24]. Highthroughput genotyping was performed at Diversity Arrays Technology Pty Ltd. (Canberra, Australia) (http://www.diversityarrays.com, accessed on 1 February 2020) with the DArTseq GBS platform [25]. A total of 46,161 markers were used to genotype the association mapping panel, including 35,837 presence-absence variants (PAV) and 10,324 SNPs. The consensus map of wheat v4, available at https://www.diversityarrays.com/technologyand-resources/genetic-maps/ (accessed on 1 February 2020) (Diversity Arrays Technology Pty Ltd., Canberra, Australia), was used for mapping purposes.
Linkage disequilibrium (LD) was estimated using TASSEL 5.0 [30] as the square of marker correlations (r 2 ) for mapped markers at a significance level of p < 0.001 with a sliding window of 50 cM. The r 2 values were plotted against the genetic distance and a locally estimated scatterplot smoothing (LOESS) curve was fitted to determine the distance at which the curve intercepts the line of a critical value of r 2 to estimate the LD decay. The critical value of r 2 was determined as the mean r 2 for each genome.
The genetic structure of the association mapping panel was estimated using the Bayesian clustering algorithm implemented in the software STRUCTURE v2.3.4 [31], which uses an admixture model with burn-in and Monte Carlo Markov chain for 10,000 and 100,000 cycles, respectively. The Evanno method [32] was used to calculate the most likely number of subpopulations using the online software STRUCTURE HARVESTER [33]. Principal coordinates analysis (PCoA) based on genetic distance was calculated using GenAlEx 6.5 [34]. Diversity analysis between genotypes was defined by the simple matching coefficient [35] using DARwin software v.6 [36]. The un-rooted tree was calculated using the neighbor-joining method [37].

Identification of Selective Sweeps
Identification of loci under selection among landraces and modern cultivars was performed by GWAS utilizing the eigenvectors corresponding to the top ten eigenvalues as the phenotype data, similar to the eigenGWAS [13], but using a mixed linear model (MLM) with TASSEL software version 5.0 [29]. The MLM accounted for population structure using a principal component analysis (PCA) matrix with 6 principal components as the fixed effect and a kinship (K) matrix as the random effect (PCA + K) at the optimum compression level. MLM followed the equation: y = Xβ + Zu + e where y is the trait value (the eigenvector in this case), β is the fixed effect for the marker, and u is a vector of random effects not associated with the markers; X and Z are incidence matrices linking y to β and u. Finally, e is the undetected vector of the random residual. In addition, the heading date was incorporated as a cofactor in the analysis. Two thresholds were established for considering marker-trait association (MTA) significance. A highly significant threshold was established using a false discovery rate (FDR) threshold [38] at p < 0.05, and a moderate threshold at −log 10 p = 3. In order to simplify the GWAS results, QTL hotspots grouping closely located MTAs were determined based on LD decay. Graphical representations of Manhattan plots were carried out using the R package "CMplot" (http://www.r-project.org (accessed on 15 April 2020)).

Genetic Diversity and Population Structure
Overall, 46,161 DArTseq markers were used to genotype the set of 387 durum wheat genotypes, of which 183 corresponded to Mediterranean and eastern Europe landraces and 204 to modern cultivars. To diminish the risk of false positives, markers and accessions were analyzed for the presence of duplicated patterns and missing values.
Of the 35,837 presence/absence variants (PAV), 24,188 had a known map position in the wheat v4 consensus map (Diversity Arrays Pty Ltd., Canberra, Australia). Of these, 4745 markers with a minor allele frequency (MAF) lower than 5% were excluded from the analysis, resulting in 19,443 PAVs remaining. Of 10,324 SNPs, 6957 were located on the wheat v4 consensus map. Of these, 1260 markers with missing data higher than 30% and 1011 markers with MAF < 5% were excluded from the analysis, resulting in a total of 4686 SNPs. Moreover, 413 markers were found to be duplicated among SNPs and PAVs, so the corresponding PAVs were discarded, leaving a total of 23,716 markers for the analyses. Forty-one percent of the markers corresponded to genome A and 59% to genome B. The total length of the map was 2129.2 cM, with a mean coverage of 11 markers/cM. Polymorphic information content (PIC) values were estimated for each chromosome, ranging from 0.26 in chromosome 7A to 0.29 in 7B, with an average of 0.28. PIC values showed a skewed distribution, with 48% of the markers having a PIC of <0.3 (Supplementary Figure S1).
Linkage disequilibrium (r 2 ) was estimated for locus pairs in genomes A and B. A total of 471,319 and 681,389 possible pair-wise loci were found for genomes A and B, respectively. The percentage of locus pairs showing LD at p < 0.001 was 43% for both genomes. Mean values for r 2 were 0.12 and 0.11 for genomes A and B, respectively. These means were used as a threshold for estimating the intercept of the LOESS curve to determine the distance at which LD decays in each genome. LD decays were established at 1 cM for both genomes ( Figure 1).

Genetic Diversity and Population Structure
Overall, 46,161 DArTseq markers were used to genotype the set of 387 durum wheat genotypes, of which 183 corresponded to Mediterranean and eastern Europe landraces and 204 to modern cultivars. To diminish the risk of false positives, markers and accessions were analyzed for the presence of duplicated patterns and missing values.
Of the 35,837 presence/absence variants (PAV), 24,188 had a known map position in the wheat v4 consensus map (Diversity Arrays Pty Ltd., Canberra, Australia). Of these, 4745 markers with a minor allele frequency (MAF) lower than 5% were excluded from the analysis, resulting in 19,443 PAVs remaining. Of 10,324 SNPs, 6957 were located on the wheat v4 consensus map. Of these, 1260 markers with missing data higher than 30% and 1011 markers with MAF < 5% were excluded from the analysis, resulting in a total of 4686 SNPs. Moreover, 413 markers were found to be duplicated among SNPs and PAVs, so the corresponding PAVs were discarded, leaving a total of 23,716 markers for the analyses. Forty-one percent of the markers corresponded to genome A and 59% to genome B. The total length of the map was 2129.2 cM, with a mean coverage of 11 markers/cM. Polymorphic information content (PIC) values were estimated for each chromosome, ranging from 0.26 in chromosome 7A to 0.29 in 7B, with an average of 0.28. PIC values showed a skewed distribution, with 48% of the markers having a PIC of <0.3 (Supplementary Figure S1).
Linkage disequilibrium (r 2 ) was estimated for locus pairs in genomes A and B. A total of 471,319 and 681,389 possible pair-wise loci were found for genomes A and B, respectively. The percentage of locus pairs showing LD at p < 0.001 was 43% for both genomes. Mean values for r 2 were 0.12 and 0.11 for genomes A and B, respectively. These means were used as a threshold for estimating the intercept of the LOESS curve to determine the distance at which LD decays in each genome. LD decays were established at 1 cM for both genomes ( Figure 1). Analysis of population structure was performed according to the distance of LD decay using only SNP markers showing less than 25% of missing data, minor allele frequencies higher than 10%, and PIC values higher than 0.3. A total of 1695 markers were used. The highest value for ΔK was observed for K = 2, followed by K = 5 ( Figure 2A). In the first case, the Bayesian clustering method used the Evanno test [32] to separate the genotypes by their origin (landraces vs. modern cultivars). Considering a membership coefficient of q > 0.6, the first group comprised 201 genotypes, 19 of them modern cultivars (9%) and 182 (91%) landraces. The second group included 160 modern cultivars. Finally, 26 genotypes remained as admixed (one landrace and 25 modern cultivars). When K = 5, the genotypes were structured according to their origin, showing a geographical pattern. In Analysis of population structure was performed according to the distance of LD decay using only SNP markers showing less than 25% of missing data, minor allele frequencies higher than 10%, and PIC values higher than 0.3. A total of 1695 markers were used. The highest value for ∆K was observed for K = 2, followed by K = 5 ( Figure 2A). In the first case, the Bayesian clustering method used the Evanno test [32] to separate the genotypes by their origin (landraces vs. modern cultivars). Considering a membership coefficient of q > 0.6, the first group comprised 201 genotypes, 19 of them modern cultivars (9%) and 182 (91%) landraces. The second group included 160 modern cultivars. Finally, 26 genotypes remained as admixed (one landrace and 25 modern cultivars). When K = 5, the genotypes were structured according to their origin, showing a geographical pattern. In this case, q > 0.5 was established as a threshold for considering a genotype within a subpopulation (SP).   The first group (SP 1) included 19 landraces, from which 89% corresponded to eastern Mediterranean countries and 11% to northern Mediterranean countries. The second group (SP 2) grouped 116 landraces and three modern cultivars. Landraces were mainly from northern Mediterranean countries (66%), and in lower percentages from eastern Mediterranean (21%) and southern Mediterranean (North of Africa) (13%) countries. The modern cultivars came from Italy ('Creso') and Spain ('Anibal' and 'Paramo'). The third group (SP 3) showed both landraces (31) and modern cultivars (12), mainly from western Mediterranean countries (including the south of Europe and north of Africa) (84% and 83% of the landraces and modern cultivars, respectively). The fourth (SP 4) and fifth (SP 5) groups included only modern cultivars. SP 4 (116 genotypes) was represented by modern cultivars mainly developed from CIMMYT and ICARDA germplasm, whereas SP 5 (39 genotypes) represented modern cultivars mainly from northern America (56%) and Europe (France, Italy and Spain) (41%). The remaining 51 genotypes (17 landraces and 34 modern cultivars) remained as admixed.
A principal coordinate analysis (PCoA) was carried out to graphically represent the results of the structure analysis in a bi-dimensional plot ( Figure 2B). In agreement with the structure analysis, the first two coordinates of the PCoA separated landraces, located on the positive side of the first coordinate, from the modern cultivars, located on the negative side of the first coordinate. Admixed genotypes were in the center of the plot. Within these clusters, the different subpopulations were clearly defined, as shown in Figure 2B.
As a complementary approach, a neighbor-joining tree based on a distance matrix was constructed to support the previous results ( Figure 2C). The tree presented a main division in two clusters, grouping landraces and modern cultivars separately. Within the cluster of landraces, there is a clear separation among SP 1 with landraces from eastern Mediterranean countries, SP 2 with landraces from northern Mediterranean countries, and the western Mediterranean landraces from SP 3. This cluster, grouping western Mediterranean landraces and modern cultivars by structure analysis, separated both types of genotypes in the main clusters. The modern cluster separately grouped the genotypes from the western Mediterranean (SP 3), north America (SP 5) and cultivars developed by CIMMYT and ICARDA breeding programs (SP 4). In addition to these main clusters, a small one representing modern cultivars from north America and southern Europe remained separate.
Results of the analysis of molecular variance (AMOVA) indicated that variation within SPs accounted for 92% of the total variance, whereas the remaining 8% corresponded to variation between SPs. Total genetic diversity (H T ) among SPs ranged from 0.40 in SP 4 to 0.35 for SP 3 and the admixed genotypes ( Table 1). The genetic diversity among SPs (D ST ) was low (0.03), causing a genetic differentiation (G ST ) among SPs of 0.08. This means that only about 8% of the variability observed was due to differences between SPs, as previously reported by AMOVA. The estimation of the gene flow (Nm) among SPs was 6.02, indicating a high level of gene exchange according to the low genetic differentiation among the SPs. Comparisons among SPs revealed that gene flow ranged from 2.54 between SP 4 (modern cultivars mainly developed by CIMMYT and ICARDA) and SP 5 (modern cultivars from north America and Europe) to 69.81 between SP 2 (landraces mainly from northern Mediterranean countries) and SP 3 (western Mediterranean landraces and cultivars) ( Table 1).

Identification of Loci under Selection by EigenGWAS
EigenGWAS was conducted using the top ten eigenvectors resulting from the PCoA obtained for the whole collection of genotypes, including landraces and modern cultivars. The largest eigenvalue was 3600.4, explaining 11.3% of the genetic variation, whereas the 10th eigenvalue was 408.0, explaining 1.3% of the genetic variation. The top ten eigenvalues accounted for 32.3% of the genetic variation, which indicates the complexity of the population structure of this durum wheat collection. A total of 1575 marker-trait associations (MTAs) were reported for the top ten eigenvectors using a moderate threshold of −log 10 p = 3.0. Based on the LD decay for a maximum distance of 1 cM, a highly significant FDR threshold at p < 0.05 was established for −log 10 p = 4.6. Following this approach, 250 MTAs were significant ( loci under selection, QTL hotspots were identified by grouping closely located MTAs. Confidence intervals were defined based on the distance of 1 cM of the LD decay. A total of 89 QTL hotspots were identified, including 1491 MTAs, 248 of them (17%) above the FDR threshold ( Table 2). The remaining 84 single MTAs were not considered further in the analysis. The number of MTAs per QTL hotspot ranged from 2 to 158, with a mean of 17 MTAs/QTL hotspot. The number of QTL hotspots per chromosome ranged from two in chromosome 4B to nine in chromosome 3A. The number of MTAs per chromosome ranged from 7 in chromosome 4B to 277 in chromosome 2B. Chromosome 4B did not carry any MTA above the FDR threshold, whereas chromosome 5B reported 51 MTAs out of 184 above the FDR threshold.  To simplify this information and to identify consensus genomic regions controlling loci under selection, QTL hotspots were identified by grouping closely located MTAs. Confidence intervals were defined based on the distance of 1 cM of the LD decay. A total of 89 QTL hotspots were identified, including 1491 MTAs, 248 of them (17%) above the FDR threshold ( Table 2). The remaining 84 single MTAs were not considered further in the analysis. The number of MTAs per QTL hotspot ranged from 2 to 158, with a mean of 17 MTAs/QTL hotspot. The number of QTL hotspots per chromosome ranged from two in chromosome 4B to nine in chromosome 3A. The number of MTAs per chromosome ranged from 7 in chromosome 4B to 277 in chromosome 2B. Chromosome 4B did not carry any MTA above the FDR threshold, whereas chromosome 5B reported 51 MTAs out of 184 above the FDR threshold.

Identification of Selection Regions among SPs
To identify the genome regions most involved in the selection among the different SPs, markers with −log 10 p > 5 (147) from the eigenGWAS were selected and a PCoA was carried out (Figure 4). Markers were widely distributed along the genomes in all chromosomes, except chromosome 4B which harbored at least two MTAs or one QTL hotspot. The PCoA separated two clear groups: group 1, on the left of the Y-axis, included 173 genotypes (mainly modern cultivars (79%)), whereas group 2, on the right of the Y-axis, included 214 genotypes, of which 69% corresponded to landraces. By SPs, those repre- The PCoA separated two clear groups: group 1, on the left of the Y-axis, included 173 genotypes (mainly modern cultivars (79%)), whereas group 2, on the right of the Y-axis, included 214 genotypes, of which 69% corresponded to landraces. By SPs, those represented mainly by landraces (SP 1, SP 2 and SP 3) were mostly included in group 2 (95%, 77%, and 63%, respectively), whereas SPs represented mainly by modern cultivars (SP 4 and SP 5), were mostly included in group 1 (63% and 85%, respectively). All north American cultivars from SP 5 were located within group 1. Most of the landraces from the northern Mediterranean included in SP 2 were also represented in group 1.
The selected 147 markers, corresponding to 35 QTL hotspots, were analyzed to identify differences in the marker allele between both groups, as well as the different SPs (Supplementary Table S3). To identify robust differences among groups, a threshold of allele frequency within a group was established at 80%. When both alleles of the marker comply with this condition, the marker was considered significant for locus selection. Following this approach, 35 markers from five QTL hotspots were identified: eigenQTL2A.7, eigenQTL2B.3, eigenQTL3A.5, eigenQTL3A.6 and eigenQTL3A.7 (Table 3). However, when the markers were blasted against the reference genomes of bread wheat [22], durum wheat [23], and wild emmer [21], it was observed that markers corresponding to eigenQTL3A.5, eigenQTL3A.6, and eigenQTL3A.7 shared the same physical positions.

Gene Annotation
Gene models were successfully identified using the different Gbrowse tools for the bread wheat cultivar 'Chinese Spring' [22], the durum wheat cultivar 'Svevo' [23], and the wild emmer cultivar 'Zavitan' [21] (Supplementary Table S4). The genome interval to identify gene models was defined based on the position of the flanking markers of the corresponding QTL hotspot. Thus, for eigenQTLT2A.7, 27, 29, and 6 gene models were identified in 1.40 Mb, 1.67 Mb and 270 Kb for 'Chinese Spring', 'Svevo' and 'Zavitan', respectively. For eigenQTLT2B.3, 47, 36, and 23 gene models, with 3.92 Mb 3.79 Mb and 4.11 Mb in 'Chinese Spring', 'Svevo' and 'Zavitan', respectively. Finally, eigenQTLT3A.5-7 were those with a higher number of gene models for the three genomes, with 77, 62, and 42 covering 6.12 Mb, 6.73 Mb and 6.57 Mb, respectively. Some of the gene models were represented in clusters, as was the case for F-box proteins, kinase proteins and resistance genes (Supplementary Table S4). Figure 5 summarizes the identification of common gene models between the three genomes for each of the three selected QTL hotspots. To reduce complexity, when a gene model was represented by more than one copy, it was reduced to a unique gene. represented in clusters, as was the case for F-box proteins, kinase proteins and resistance genes (Supplementary Table S4). Figure 5 summarizes the identification of common gene models between the three genomes for each of the three selected QTL hotspots. To reduce complexity, when a gene model was represented by more than one copy, it was reduced to a unique gene. From 133 gene models within the three QTL hotspots, 33 (25%) were common for the three genomes, 25 (19%) were common between 'Chinese Spring' and 'Svevo', 11 (8%) were common between 'Chinese Spring' and 'Zavitan', and 3 (2%) were in common between 'Svevo' and 'Zavitan'. Finally, 46% of the gene models were unique for the different genomes.

Genetic Diversity and Population Structure
Genetic diversity is essential in plant breeding because it represents a source of new alleles for genes of interest. A useful approach for recovering and broadening allelic variation in traits of interest is the use of landraces in breeding programs [40], which may be of particular interest for suboptimal environments such as those prevailing in the Mediterranean basin [41].
The average chromosomal PIC value was 0.28. This value is similar to that reported previously in studies using bi-allelic markers such as SNP or DArT in durum wheat. Baloch et al. [42] reported PIC values of 0.26 and 0.30 depending on the marker type, (DArTseq or SNP, respectively). Kabbaj et al. [43] found a PIC value of 0.32 with 8173 SNPs from the Axiom 35K array. Pascual et al. [39] using a collection of Spanish landraces of bread and durum wheat genotyped with the DArTseq technology and obtained an average PIC value for both species between 0.30 and 0.35. According to the classification proposed by Botstein et al. [44] which separates PIC values into three categories of highly informative (PIC > 0.5), moderately informative (0.25 < PIC < 0.5) and slightly informative (PIC < 0.25), the markers in our panel are considered moderately informative. In previous From 133 gene models within the three QTL hotspots, 33 (25%) were common for the three genomes, 25 (19%) were common between 'Chinese Spring' and 'Svevo', 11 (8%) were common between 'Chinese Spring' and 'Zavitan', and 3 (2%) were in common between 'Svevo' and 'Zavitan'. Finally, 46% of the gene models were unique for the different genomes.

Genetic Diversity and Population Structure
Genetic diversity is essential in plant breeding because it represents a source of new alleles for genes of interest. A useful approach for recovering and broadening allelic variation in traits of interest is the use of landraces in breeding programs [40], which may be of particular interest for suboptimal environments such as those prevailing in the Mediterranean basin [41].
The average chromosomal PIC value was 0.28. This value is similar to that reported previously in studies using bi-allelic markers such as SNP or DArT in durum wheat. Baloch et al. [42] reported PIC values of 0.26 and 0.30 depending on the marker type, (DArTseq or SNP, respectively). Kabbaj et al. [43] found a PIC value of 0.32 with 8173 SNPs from the Axiom 35K array. Pascual et al. [39] using a collection of Spanish landraces of bread and durum wheat genotyped with the DArTseq technology and obtained an average PIC value for both species between 0.30 and 0.35. According to the classification proposed by Botstein et al. [44] which separates PIC values into three categories of highly informative (PIC > 0.5), moderately informative (0.25 < PIC < 0.5) and slightly informative (PIC < 0.25), the markers in our panel are considered moderately informative. In previous studies from our group in durum and bread wheat, Soriano et al. [45] genotyped a panel of 192 durum wheat genotypes (mainly Mediterranean landraces) with 44 microsatellite markers and found an expected heterozygosity of 0.71. Rufo et al. [46] genotyped bread wheat collections of landraces and modern cultivars with a 15K SNP array and obtained a mean PIC value of 0.30, in accordance with the results obtained in the present study. These lower PIC values using DArTseq are explained by their bi-allelic nature, as the maximum PIC corresponds to 0.5 when both alleles have the same frequency [47].
Population structure analysis clearly divided the collection into two main subpopulations based on historical breeding periods, separating the genotypes in landraces and modern cultivars. To conduct a deeper analysis, the second highest value for K in the Evanno test was used. The genetic distribution of the landraces in the three SPs and the huge gene flow between them may be associated with the pattern of migration of durum wheat from the Fertile Crescent to the west of the Mediterranean basin described by Moragues et al. [48]. SP 1 contains the largest proportion of landraces from countries close to the zone of wheat domestication (89.4%), and only two Italian landraces (10.5%). Therefore, it is conceivable that SP 1 may putatively incorporate the oldest genetic background within the germplasm panel used in this study. The lowest level of admixture in SP 1 (89% of the genotypes with q > 0.7) agrees with this hypothesis. SP 2 could represent a further step in the history of wheat dispersal within the Mediterranean basin, as it gathers 21% of landraces from eastern Mediterranean countries, but 76% from western areas where it is supposed that wheat arrived between 2 and 3 millennia after its domestication [1]. Finally, SP 3 includes 72% of landraces and 28% of modern cultivars from western Mediterranean countries, the most distant from the area of wheat domestication, and so the most evolved from an evolutionary point of view. The highest gene flow between SP 2 and SP 3 and the lower, but still very high gene flow between SP 1 and SP 2, agree with this interpretation.
Gene flow between SPs offers clues regarding the putative use of Mediterranean old durum germplasm by the breeding programs represented here. The lowest gene flow was detected between SP 1 (assumed to gather the ancient genetic pool of the panel) and SPs involving only modern germplasms (SP 4 and SP 5). However, gene flow from SP 2 to modern cultivars was much higher, in agreement with the fact that this SP includes landraces adapted to a wide range of environmental conditions. The highest gene flow between SP 3 and SP 5 suggests that modern north American and European cultivars incorporate a significant portion of the genetic background of germplasms adapted to western Mediterranean environments. The relatively low gene flow observed between Mediterranean germplasms and the CIMMYT-ICARDA genetic pool may be a consequence of these international centers acting globally, thus incorporating germplasms in their breeding programs from around the world. SP 4 and SP 5 included only modern cultivars and had a low genetic flow between them, in agreement with the CIMMYT and north American durum wheats belonging to different germplasm pools [49,50].
Modern SPs presented a higher genetic diversity than SPs that included landraces in the following direction: SP 4 > SP 5 > SP 1 = SP 2 > SP 3. In agreement with the international nature of CIMMYT and ICARDA and their role as germplasm providers worldwide, SP 4 incorporates a wide range of cultivars with a worldwide distribution, thus showing a heterogeneous genetic background and the largest genetic diversity. SP 2 and SP 3 have mainly a western Mediterranean background (including the south of Europe and the north of Africa) and thus, with higher germplasm exchange, they could produce uniformity in the cultivars. The slightly higher values of H T observed in modern SPs may be due to the type of markers used in the study, as DArTseq and SNP are biallelic markers. For example, Soriano et al. [45] used SSR markers in a similar collection of 172 Mediterranean landraces and 20 modern cultivars and found higher values for H T in landrace SP and lower numbers of alleles in modern cultivars.
Genetic differentiation indicated that only 8% of the variability observed corresponded to differences between SPs, according to the high estimation of gene flow among SPs, thus indicating a high level of genetic exchange. Comparison of the genetic exchange between SPs revealed that the highest gene flow was observed between the western Mediterranean landraces and modern cultivars from western Mediterranean countries, as well as those between landraces from east to west in the Mediterranean basin. However, the lowest gene flow was found between eastern Mediterranean landraces and the germplasms from CIMMYT and ICARDA and between these germplasms and the modern cultivars from north America. This agrees with the results reported by Parzies et al. [51] and Ben-Romdhane et al. [52], suggesting that the genetic differentiation among landrace SPs is due to farmer trade and is mainly influenced by geographic distances. Cultivars with a CIMMYT/ICARDA origin reported lower values of gene flow than the other SPs, as reported previously by Rufo et al. [46] in bread wheat. These authors concluded that this was mainly due to the release of improved inbred lines distributed by local breeding programs through the nurseries to which these international centers distribute worldwide.

Detection of Selective Sweeps by EigenGWAS
Eigenvectors are frequently used to infer the genetic structure of a given population as they are estimated for any single individual. Several studies have pointed out the usefulness of primary eigenvectors to analyze population differentiation [53][54][55]. In this direction, eigenGWAS was developed by Chen et al. [13] as an approach to identify genomic regions underlying genetic differentiation. The analysis of selective sweeps produced during breeding is important for the identification of loci under selection that will be of interest for marker-assisted selection and the selection of the improved germplasm.
Other authors identified selective sweeps in hexaploid wheat. Cavanagh et al. [56] identified 21 selective regions in spring wheat and 39 in winter wheat using a worldwide collection of 2994 accessions. These authors found that most of the selective regions were involved in yield potential, vernalization, plant height, and biotic and abiotic stress. More recently, Zhou et al. [57] found 148 selective regions in a collection of 717 Chinese wheat landraces associated with yield and disease resistance. Liu et al. [15], using a worldwide panel comprising landraces from China and Pakistan and modern cultivars genotyped with the 90K SNP array, identified 477 selective sweeps. Some of these loci comprised known functional genes for disease resistance, vernalization, quality, adaptability, and yield. This is the first study of this type conducted in durum wheat. We identified selective sweeps among Mediterranean landraces and modern cultivars in the durum wheat genome using eigenvectors as phenotypic traits in the GWAS. A total of 1575 MTAs were significant for the first ten eigenvectors at a moderate threshold, whereas for a highly significant threshold, 250 MTAs were significant. To simplify this information, 89 QTL hotspots (including 1491 MTAs) were defined as consensus genomic regions controlling loci under selection. These QTL hotspots included important loci that were selected during the breeding process such as the photoperiod loci Ppd-A1 and Ppd-B1, the vernalization loci Vrn-A1 and Vrn-B1, and the dwarfing genes Rht-B1, Rht12 and Rht25. The cycle length of durum wheat was shortened during the breeding process by the incorporation of favorable alleles from these loci, as reported by Royo et al. [41,49]. The development of semi-dwarf germplasm by CIMMYT at the end of the 1960s had a world-wide impact on wheat production. The major dwarfing genes Rht-B1b and Rht-D1 (this last in bread wheat) incorporated in the modern cultivars reported yield increases of up to 35% in both durum wheat [50] and bread wheat [58]. The quality loci for the high molecular weight (HMW) glutenin subunits (GS) Glu-A1 and Glu-B1 were found within QTL hotspots in chromosomes 1A and 1B, respectively. Previous studies reported by De Vita et al. [59] and Subirà et al. [60] revealed the improvement of pasta-making quality in modern cultivars during the 20th century in Italy and Spain due to the incorporation of favorable alleles for HMW-and low molecular weight (LMW)-GS loci. Other loci involved in grain quality located within hotspots were the polyphenol oxidase (PPO) genes Ppo-A1 [61] and Ppo-B2 [62], which cause the undesir-able brown color in semolina, and thus the identification of the alleles producing low PPO activity is essential in durum wheat breeding programs. The peroxidase activity genes, such as Pod-A1 [63], affect the natural carotenoid pigment content and are associated with the color of flour. GPC-B1, located on chromosome 6B [64], confers a shorter duration of the grain filling period due to early flag leaf senescence and thus increases grain protein content. Wheat grain avenin-like proteins (ALPs), such as TaALP-4A, are involved in dough quality and antifungal activities [65]. Finally, Psy-B1 belongs to the phytoene synthase (PSY) gene family, which are involved in the biosynthesis of carotenoid pigments in durum wheat, influencing grain yellowness [66]. Other genes included within QTL hotspots were related to grain yield, such as the locus TaSus2-2A which is associated with grain weight as reported by Jiang et al. [67], the transcript elongation factor TaTEF-7A [68] which regulates tillering and increases grain number per spike, and the glutamine synthetase gene TaGS2-B1 [69] which plays a key role in plant growth, nitrogen use efficiency, and yield potential in wheat. The identification of these genes that were incorporated into elite cultivars during the breeding process suggest the QTL hotspot regions as target loci in wheat improvement.
Among the 250 MTA over the highly significant threshold, 147 MTAs showed −log 10 p > 5. These markers, distributed in all chromosomes except 4B, were used to perform a new PCoA. Interestingly, a similar pattern with two main groups was observed in both analyses, separating most of the landraces from modern cultivars, with a higher level of admixture among subpopulations in the latter. When markers were analyzed to find allelic differences among the two groups, five QTL hotspots (eigenQTL2A.7, eigenQTL2B.3, eigenQTL3A.5, eigenQTL3A.6, and eigenQTL3A.7) were identified as being responsible for the main separation in the PCoA among landraces and modern cultivars. However, at the genome level [21][22][23], hotspots on chromosome 3A were located in the same physical positions. Differences in the genetic position may correspond to heterozygous genotypes and missing data. According to our results, these hotspots are suggested to be the main drivers in the genetic differentiation of Mediterranean landraces from modern cultivars.
The analysis of the genome sequence covering these QTL hotspots revealed the presence of gene models involved in important biological functions (Supplementary Table S4). Among them, different gene models were related to disease resistance; a CsAtPR5-like protein was found to be linked to the powdery mildew resistance gene PmLK906 in the wheat line 'Lankao 90(6) 21-12 [70]. According to Larriba et al. [71] Rhomboid-like proteins are involved in fungal-plant interactions. Proteins belonging to the UDP-glycosyltransferase protein superfamily were found to participate in fusarium head blight (FHB) resistance in wheat [72]. The kinase family proteins are involved in different processes, ranging from physiological roles such as control of shoots and floral meristems to pathogen identification [73]. This protein family also includes the leucine-rich repeats receptor-like kinase (LRR-RLK) genes, a large and complex gene family in plants mainly participating in the development and stress reactions. LRR domains are characterized by a high variation in the number of repeats, allowing a wide range of protein-protein interactions [74]. Proteins containing NAC and heat shock domains were reported to regulate biotic and abiotic stresses [75,76].
Other genes with implications in stress and plant development corresponded to MADS-box and tetratricopeptide repeats (TPR). According to Ma et al. [77], the MADSbox gene family plays key roles in different developmental processes such as flowering time, floral meristems, fruit formation, and flower organs and seeds. The authors found that several wheat MADS-boxes were expressed in the roots, stems, leaves, spikes, and grains during different developmental stages. Other MADS-box genes showed different expression under stress, as reported by Guo et al. [78] in response to stripe rust in wheat, suggesting their role in plant-microbe interactions. In Brachypodium, MADS-box genes were also identified to be regulated under drought and cold stresses [79]. TPRs mediate protein-protein interactions and are present across all plant species. Some TPRs are involved in plant stress and hormone signaling [80]. Auxin response factors (ARF) regulate the development of plant organs. Qiao et al. [81] characterized the ARF family in wheat and found that one of them, TaARF15-A.1, may regulate the development of roots and leaves. Expansins were found to be involved in root development. The experiments of Li et al. [82] overexpressing of the wheat expansin gene TaEXPB23 in tobacco enhanced drought tolerance and accelerated root development. Zinc finger proteins play important roles in several plant mechanisms, from growth regulation and development, signaling and responses, to abiotic stresses. In wheat, the zinc finger protein TaZFP34 is overexpressed in roots, reducing shoot growth but maintaining root elongation [83]. The homeobox protein LUMINIDEPEDENS was found in eigenQTL3A.5, 6, and 7 in the three genomes. This gene controls flowering time in Arabidopsis, as mutations in the gene have been found to produce late flowering that is partially suppressed by vernalization [84]. Other gene models within these eigenQTLs were found to enhance grain yield. F-box proteins were found in 'Chinese Spring' and 'Svevo' annotations in the three hotspots on chromosome 3A. Among the different functions of these genes, Li et al. [85] demonstrated that the F-box gene LARGER PANICLE improves the panicle architecture of rice, thus enhancing grain yield. In wheat, Hong et al. [86] reported that members of the F-box E3 ubiquitin ligases regulate spike development. Carboxypeptidases were implicated in grain size control in rice through the regulation of grain width, grain filling, and weight [87]. These authors found that the expression of GS 5 was correlated with larger grains in rice. Finally, a tapetum determinant gene was found. According to Lei and Liu [88], disrupted tapetum development alters the expression of many genes involved in male meiosis in higher plants.

Conclusions
The use of local landraces in breeding programs is considered a valuable approach to broadening the genetic background of crops lost during the breeding process and improving traits of commercial importance [40,45]. The present study uses a GWAS approach with eigenvectors to identify selective sweeps among durum wheat Mediterranean landraces and modern cultivars from different origins. Most of the chromosomes reported selective regions, some of them harboring functional genes for important agronomic traits involved in yield performance, plant development, and grain quality. Three genome regions in chromosomes 2A, 2B, and 3A were identified as the main drivers for the differentiation of the Mediterranean landraces. Within these regions, gene models for disease resistance, abiotic stress, plant development, and yield were found.