www.mdpi.com/journal/ijms

Genetic Diversity Revealed by Single Nucleotide Polymorphism Markers in a Worldwide Germplasm Collection of Durum Wheat

Evaluation of genetic diversity and genetic structure in crops has important implications for plant breeding programs and the conservation of genetic resources. Newly developed single nucleotide polymorphism (SNP) markers are effective in detecting genetic diversity. In the present study, a worldwide durum wheat collection consisting of 150 accessions was used. Genetic diversity and genetic structure were investigated using 946 polymorphic SNP markers covering the whole genome of tetraploid wheat. Genetic structure was greatly impacted by multiple factors, such as environmental conditions, breeding methods reflected by release periods of varieties, and gene flows via human activities. A loss of genetic diversity was observed from landraces and old cultivars to the modern cultivars released during the Early Green Revolution, but an increase was observed in cultivars released during the Post Green Revolution. Furthermore, a comparative analysis of genetic diversity among the 10 mega ecogeographical regions indicated that South America, North America, and Europe possessed the richest genetic variability, while the Middle East showed moderate levels of genetic diversity.


Introduction
Modern wheat cultivars usually refer to two species: hexaploid bread wheat, Triticum aestivum (2n = 6X = 42, AABBDD), and tetraploid, hard or durum-type wheat, T. durum (2n = 4X = 28, AABB) [1]. Durum wheat is traditionally grown around the Mediterranean Sea and is the most common cultivated form of allotetraploid wheat. Currently, more than half of the durum wheat is still grown in the Mediterranean basin, mainly in Italy, Spain, France, Greece, West Asian, and North African countries [2].
Wheat domestication took place 12,000 years ago in the Near East, with the wild ancestor (T. dicoccoides) giving rise to the first domesticated form (emmer wheat, T. dicoccum) [3]. About 2000 years after this event, durum wheat, which is characterized by free threshing, appeared in the eastern Mediterranean and replaced its ancestor T. dicoccum to become the major cultivated form of allotetraploid wheat by the second millennium BC [3][4][5]. Durum was part of the initial crop package introduced into Europe and North Africa during the Neolithic period but was preferred in the western Mediterranean basin [6], whereas emmer was the staple crop in Ancient Egypt until the introduction of durum in the Hellenistic Period [7]. Durum wheat continued to spread throughout Europe at the end of the 15th century [8]. When Europeans first reached the shores of the Americas in 1492, the Columbian Exchange (the artificial re-establishment of connections through the commingling of Old and New World plants, animals, and bacteria) allowed durum wheat to spread from the Old World to the New World [9,10]. European agriculture had a profound effect on the Americas, especially during the Spanish colonial period of the 16th and 17th centuries. The most recent history of durum wheat has been marked by modern genetic improvement, involving the replacement of landraces by inbred varieties and the introduction of dwarfing genes (second half of the 20th century) [3]. These historical events are likely to have altered the original genetic structure and genetic diversity pattern of wheat.
Molecular markers are particularly useful for the evaluation of genetic diversity in wheat and other crop species with a narrow genetic base [11]. To date, a variety of molecular marker techniques are available for genome analysis in wheat. Molecular markers that do not rely on genomic sequence information were designed first, including restriction fragment length polymorphisms (RFLPs) [12][13][14], random amplified polymorphic DNA (RAPD) [14][15][16], and amplified fragment length polymorphism (AFLP) [11,14,17,18,19,20]. These markers have been used successfully for genetic mapping, phylogenetic relationships [17,18], comparative genomic studies [20], and diversity evaluation [18,19]. However, none of them has been used extensively in breeding programs because they do not meet the requirements for efficient application in marker-assisted selection (MAS), i.e., adaptability to flexible and high-throughput detection methods, high efficiency with low-quantity and low-quality DNA, low cost per assay, tight linkage to target loci, and a high level of polymorphism in breeding materials [21,22].
Until now, simple sequence repeat (SSR) markers relying on genomic sequences have been proven to be the most widely used DNA marker type in characterizing germplasm collections of crops, because of their easy use, relatively low cost, and high degree of polymorphism provided by the large number of alleles per locus [23,24]. In the past decade, thousands of SSR markers have been developed for wheat and more than 4000 have been mapped genetically (see GrainGenes: A Database for Triticeae and Avena. [25]). However, operationally, there have been problems in their use caused by challenges in accurately sizing SSR alleles due to PCR and electrophoresis artifacts [26].
More recently, single nucleotide polymorphism (SNP) markers have gained significant attention because they are bi-allelic in nature and occur at a much higher frequency in the genome than SSRs and other markers. Furthermore, their genotyping can be easily automated [26]. In crops, the availability of SNP genotyping platforms would facilitate the genetic dissection of traits of economic importance and the application of marker-assisted and genomic selection [21,27,28,29]. Moreover, SNPs are the most abundant class of sequence variability in the genome and thus have the potential to provide the highest map resolution [26,30]. Genome-wide maps comprising large numbers of SNP markers have been reported in Arabidopsis [31], rice [32], soybean [33], and barley [34]. However, so far only a limited number of SNPs have been reported in wheat [35][36][37][38][39][40], because large-scale SNP discovery in wheat is limited by both the polyploid nature of the organism and the high sequence similarity found among the three homoeologous wheat genomes [38,41]. In addition, no study has yet reported genetic diversity and genetic structure detected by SNP markers in worldwide durum wheat germplasm resources.
Information about the genetic diversity and genetic structure in germplasm is of fundamental importance for crop improvement [24]. It is widely argued that the genetic diversity of major crops, especially self-pollinating cereals, has suffered an overall reduction with time, due to the pressure of pure-line selection applied in breeding programs [42][43][44]. Genetic diversity in durum wheat germplasm has been studied using several types of molecular markers. However, SNP-detected diversity patterns and genetic relationships in a worldwide germplasm collection of durum wheat have not been reported. The objectives of our study were therefore to (a) evaluate the genetic diversity in a global durum wheat collection using SNP markers covering the whole genome; (b) unravel the genetic structure of durum wheat; and (c) assess genetic variation temporally and spatially by comparing the diversity among release periods of varieties and among different geographical origins, respectively.

SNP Marker Quality and Genomic Distribution
A total of 230,400 data points were generated by genotyping 150 durum wheat accessions with a multiplexed 1536-SNP Illumina Golden Gate assay. Of the 1536 SNPs present in our oligonucleotide pool assay (OPA), 1366 (89%) yielded high-quality genotype calls, while the remaining 11%, which failed to generate clear genotype clusters, were removed. Of the 1366 scoreable SNP markers, 420 were monomorphic across all 150 accessions, giving an overall polymorphism rate of 69.3%. Because SNP markers are mainly bi-allelic, all polymorphic SNPs showed only two alleles. The 946 polymorphic SNP markers were used for further analysis. Marker distribution, Nei's gene diversity, and PIC values estimated for each chromosome and genome are listed in Table 1. SNP loci were not evenly distributed across the seven homoeologous groups, and coverage ranged from 108 loci in group 5 to 161 loci in group 6. Nei's gene diversity and PIC values across groups ranged from 0.2004 to 0.2508 and from 0.1656 to 0.2006, respectively. Chromosome group 1 had higher genetic diversity, and groups 3, 4, and 5 had lower genetic diversity, than the genome-wide average (Table 1).
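The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the variable names are illustrative and not part of the original analysis.

```python
# Counts taken from the text above; variable names are illustrative only.
n_accessions = 150      # durum wheat accessions genotyped
n_assayed = 1536        # SNPs in the oligonucleotide pool assay
n_scoreable = 1366      # SNPs with high-quality genotype calls
n_monomorphic = 420     # scoreable SNPs with a single allele in the panel

data_points = n_accessions * n_assayed             # genotype calls attempted
call_rate = n_scoreable / n_assayed                # fraction of SNPs retained
n_polymorphic = n_scoreable - n_monomorphic        # SNPs kept for analysis
polymorphism_rate = n_polymorphic / n_scoreable    # share of scoreable SNPs

print(f"data points:       {data_points}")
print(f"scoreable SNPs:    {100 * call_rate:.1f}%")
print(f"polymorphic SNPs:  {n_polymorphic}")
print(f"polymorphism rate: {100 * polymorphism_rate:.1f}%")
```

The check reproduces the 230,400 data points, the 946 polymorphic markers, and the 69.3% polymorphism rate, and shows that about 11% of the assayed SNPs were discarded.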
Of the polymorphic loci, 516 and 430 were located in the A and B genomes of durum wheat, respectively. As shown in Table 1, higher genetic diversity was detected in genome B, with Nei's gene diversity and PIC values of 0.2384 and 0.1970, respectively, compared with 0.2193 and 0.1819 for genome A. This difference between genomes A and B was not statistically significant for either gene diversity (t = 1.459, p = 0.195, paired t test) or PIC (t = 1.488, p = 0.187, paired t test). In the A genome of durum wheat, chromosome 6A had higher genetic diversity (Nei's gene diversity, 0.2526; PIC, 0.2072), and chromosome 4A had lower genetic diversity (Nei's gene diversity, 0.1899; PIC, 0.1576), than the rest of the chromosomes (Table 1). In the B genome, genetic diversity was lower in chromosomes 4B and 5B than the genome-wide average, while genetic diversity was higher in chromosome 1B (Nei's gene diversity, 0.2695; PIC, 0.225) than the genome-wide average (Table 1).

Genetic Structure
Genotyping data generated by the 946 polymorphic SNP markers were used for genetic structure analysis, using the Bayesian clustering model implemented in the Structure software. The estimated log probability of the data (LnP(D)) increased continuously with increasing K, and there was no obvious K value clearly defining the number of populations (Figure 1a). However, the rate of change in the log probability of the data between successive K values, relative to its standard deviation (ΔK) [45], suggested that the best structure was K = 2 (Figure 1b). Thus, the analyzed durum wheat germplasm can be divided into two genetically distinct groups. Similarly, the unrooted NJ tree based on shared-allele genetic distances also distinguished two major groups of accessions (Groups I and II), corresponding to the structure grouping (Figure 2). However, group II can be further divided into four subgroups: IIa, IIb, IIc, and IId. Ecogeographical origin, improvement status (landraces vs. cultivars), and pedigree information of accessions were analyzed to explain the inferred structure. Group I contained 39 accessions, about half (20/39) of which were collected from the Americas (North America and South America). Further analysis of these accessions showed that this group is dominated by landraces (16) and cultivars released during the Post Green Revolution (PGR) (14) (Figure 2).
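The ΔK criterion can be illustrated with a short sketch. It assumes the commonly used form of the statistic, in which the absolute second-order difference of the mean LnP(D) at K is divided by the standard deviation of LnP(D) across replicate runs at K. The function name `delta_k` and the LnP(D) values are hypothetical and are not taken from this study, which used five runs per K.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_runs maps K to the list of LnP(D) values from replicate runs.
    """
    avg = {k: mean(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_runs[k])
        if sd == 0:
            continue  # skip K when replicate runs are identical (division by zero)
        result[k] = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1]) / sd
    return result


# Hypothetical LnP(D) values, three replicate runs per K (not from the paper).
example_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4000.0, -4001.0, -3999.0],
    3: [-3900.0, -3950.0, -3850.0],
    4: [-3850.0, -3800.0, -3900.0],
}
example_delta_k = delta_k(example_runs)
best_k = max(example_delta_k, key=example_delta_k.get)
print(example_delta_k, "best K:", best_k)
```

In this made-up example LnP(D) keeps rising with K, as in Figure 1a, yet ΔK peaks sharply at K = 2 because the gain in likelihood levels off after that point while the run-to-run variance grows.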
Group II contained 96 accessions, which can be further divided into four major subgroups: IIa, IIb, IIc, and IId. Although each subgroup is ecogeographically very heterogeneous, the grouping of some accessions appeared to be associated, to some extent, with the release period of varieties (Figure 2). Group IIa is dominated by landraces and old cultivars (OC). Group IIc is dominated by landraces and cultivars released during the Post Green Revolution. Both groups IIb and IId are dominated by cultivars released during the Early Green Revolution (EGR).

Figure 2.
Dendrogram of 150 T. durum accessions based on the shared-allele genetic distance calculated from data of 946 SNP markers, using the NJ algorithm as the clustering method. Numbers on nodes are bootstrap probabilities estimated by permutation test with 1000 replications.

Genetic Diversity between Landraces and Cultivars
As shown in Table 2, the difference between landraces and cultivars was significant for both Nei's gene diversity (t = 7.214, p < 0.001, paired t test) and PIC (t = 9.026, p < 0.001, paired t test). Higher genetic diversity was detected in the cultivars, with Nei's gene diversity and PIC values of 0.2310 and 0.1919, compared with 0.2192 and 0.1800 for the landraces, respectively. Furthermore, the molecular variance components in cultivars and landraces were compared as a complementary indicator of genetic diversity. Analysis of molecular variance (AMOVA) revealed that individuals within cultivars (65.54%) are more genetically variable than individuals within landraces (33.97%) (Table 3). Similarly, the higher level of polymorphism in the cultivars also reflects greater genetic variation than in the landraces. Of the 946 polymorphic SNP markers over the panel of 150 accessions, 756 showed polymorphism (756/946 = 79.9%) among the 53 landraces, while 933 showed polymorphism (933/946 = 98.6%) among the 97 cultivars (Table 2). Thus, the panel of 53 landraces has a significantly lower level of genetic diversity than the panel of 97 durum wheat cultivars. In contrast, previous research showed that landraces had wide genetic diversity, while cultivars had narrow genetic diversity due to high selection pressure and genetic drift in breeding programs [20,46,47]. To explain why a higher level of genetic diversity exists within the improved accessions, the 97 cultivars were further divided into three temporal groups: OC, EGR, and PGR. As shown in Table 2, a loss of genetic diversity was observed from OC to EGR (Nei's gene diversity, t = 6.484, p < 0.001, paired t test; PIC, t = 6.304, p < 0.001, paired t test), but an increase was observed in PGR (Nei's gene diversity, t = 9.617, p < 0.001, paired t test; PIC, t = 9.885, p < 0.001, paired t test).
That is, genetic diversity narrowed from 1930 to 1980 but increased from 1981 to 2009. Notably, plant height, an extremely important target trait in modern wheat breeding, also decreased significantly. The "Green Revolution" in cereals was achieved by reducing plant height, thereby reducing lodging susceptibility and increasing grain yield [1,48]. As shown in Table 4, the mean plant heights of landraces and old cultivars were 132.46 and 130.72, respectively, while cultivars released during the EGR and PGR periods had significantly lower plant heights (F = 19.02, p < 0.01, ANOVA), with averages of 119.13 and 101.91, respectively.

Divergence between Landraces and Cultivars
We conducted further analyses to identify candidate loci that are under positive selection between landraces and cultivars. An analysis of Fst on a locus-by-locus basis provided a cutoff for identifying loci that may be under positive selection [49]. Therefore, we used an outlier detection method implemented in the LOSITAN program [50]. Between landraces and cultivars, a total of 92 outlier loci under positive selection were identified. The chromosomal distribution of these loci is shown on wheat chromosome bin maps in Figure 3. A high proportion of these loci (54.3%) was located on chromosomes of groups 2, 6, and 7. Among the 92 loci, the genes P-EA (phosphoethanolamine methyltransferase), TsPAP1 (prolyl aminopeptidase 1), CPK10 (calcium-dependent protein kinase), PI-PLC1 (phosphoinositide-specific phospholipase C1), RSZ38 (alternative splicing regulator), PDS (phytoene desaturase), and LOX3 (lipoxygenase), which play important roles in plant responses to biotic and abiotic stresses or in grain storage in wheat, were identified as under positive selection between landraces and cultivars. We inferred putative functions of these loci based on comparison to a protein sequence database (Table 5).

Figure 3. The number in parentheses at the bottom of each chromosome is the number of EST loci mapped in that chromosome without knowing the exact bin. Only those bins with mapped loci are indicated.

Genetic Diversity vs. Place of Origin
Knowledge of genetic diversity from different ecogeographic areas was expected to have a significant impact on the conservation and utilization programs of durum germplasm, allowing breeders to develop strategies to incorporate useful diversity in their breeding programs. A summary of the genetic diversity data of the 10 mega ecogeographical regions is shown in Table 6.

SNP-Based Polymorphism and Genetic Diversity
Average Nei's gene diversity and PIC values revealed by SNP markers in this study were 0.2280 and 0.1888, respectively (Table 1). Compared with previous studies on durum wheat, this level of genetic diversity is not high. Moragues et al. [8] reported the genetic diversity of 63 durum wheat landraces from the Mediterranean basin, and obtained PIC values of 0.24 and 0.70 for AFLP and SSR, respectively. Maccaferri et al. [2] studied the genetic diversity of elite durum wheat germplasm from Italy and other Mediterranean countries using SSR markers, and estimated a mean diversity index (DI) of 0.56. The relatively lower genetic variation revealed by SNP markers is expected. Because SNP markers are mainly bi-allelic, gene diversity and PIC cannot exceed 0.50, whereas the maximum can approach 1 for multi-allelic markers, such as SSRs.
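The ceiling for bi-allelic markers can be made explicit. The short derivation below is a sketch that assumes the standard definitions of Nei's gene diversity and of PIC (the formulas are not written out in this paper), with p and q = 1 - p denoting the two allele frequencies at a locus.

```latex
% Bi-allelic locus with allele frequencies p and q = 1 - p
H = 1 - p^2 - q^2 = 2pq \le \tfrac{1}{2},
\qquad
\mathrm{PIC} = 1 - p^2 - q^2 - 2p^2q^2 = 2pq\,(1 - pq) \le \tfrac{3}{8},
% both maxima are reached at p = q = 1/2.
% For k equally frequent alleles (e.g. an SSR locus):
H = 1 - \sum_{i=1}^{k} \frac{1}{k^2} = 1 - \frac{1}{k} \longrightarrow 1 \quad (k \to \infty).
```

Under these definitions the averages reported here (0.2280 and 0.1888) sit well inside the bi-allelic range, whereas SSR-based values such as 0.70 are only attainable with several alleles per locus.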
Despite this limitation, a sufficient level of genetic variation and similar trends in variation can be detected using SNP markers. For example, our results agree with previous studies showing that chromosomes 4A and 4B have relatively low genetic diversity due to the evolutionary translocation events involving chromosome 4A [14,51,52]. Greater genetic variation was detected in the B genome than in the A genome in this study (Table 1), which suggests a larger contribution of the B genome than the A genome to durum genetic variation. The different contributions of the A and B genomes to genetic variation were also demonstrated in previous studies using SSRs [53], RFLPs [54], and AFLP [14] in common hexaploid wheat as well as in T. dicoccoides [1,55]. These results suggest that SNPs can be used as an effective type of molecular marker for genetic evaluation in wheat.

Genetic Structure Revealed by SNP Markers
The genetic structure inferred for the 150 T. durum accessions was similar whether it was based on the Bayesian clustering model implemented in the Structure software or on the NJ algorithm implemented in POWERMARKER Ver. 3.25 and PHYLIP (Figures 1 and 2). For most accessions, the grouping showed no clear geographical or ecological pattern. This result suggests that the relationships we found between countries are greatly affected by within-country variability. Consequently, countries that showed large variability do not group easily (their grouping distance is large). AMOVA indicated that 90.81% of the genetic variation resided among accessions within countries (data not shown).
The reason might be that gene flow via germplasm exchange among different regions occurred frequently, or that historical human transfer of genes produced extensive admixture. This is consistent with the known history. Contact between the Old and New World after Columbus' voyages allowed the exchange of many domesticated plants, including wheat. In the case of the Spanish colonies in the Americas in particular, it is well known that Spaniards not only tried by all possible means to introduce their own European culture, but also persistently introduced many crops (including durum wheat landraces and cultivars) from Europe to the American territories [10]. In addition, emigration had a profound influence on the world in the 18th, 19th, and 20th centuries. Through trade routes and immigration, new varieties of wheat were sold or shared by people from different regions. Our ongoing experiment, which includes many more durum landraces collected from Spain and Mexico, will help us further understand germplasm exchanges between the Old and New World.
An alternative or complementary explanation may be found in breeding history. In this study, most of the accessions selected were cultivars (97/150 = 64.7%), which have experienced primarily artificial selection, and only secondarily natural selection, for certain desirable characteristics. For example, breeding efforts focused on early maturity and yield increase before 1930, disease resistance from 1930 to 1970, and multiple disease resistance and quality improvement after 1970 [56][57][58]. Such human activities must have played a great part in the genetic shift. That is also why the grouping pattern of durum wheat accessions appeared to be associated, to some extent, with the release period of varieties (Figure 2).
However, not all accessions released in the same period were clustered in the same group. Instead, some accessions from the same geographic region were clustered together, in groups corresponding to their geographical regions of collection (Figure 2). For example, of the 12 accessions from South America, most (7/12) were clustered together in Group I, and the others were mainly distributed in Group IId. Most of the American accessions (7/13) were also clustered together in Group I. These results indicate that many of the accessions were clustered into groups corresponding to their geographical regions of collection, which may be due to shared environmental conditions or agronomic practices.
Overall, the genetic structure and grouping patterns of the 150 durum wheat accessions were affected by environmental conditions, release period of varieties, and gene flows via germplasm exchanges or artificial transfer of genes.

Genetic Diversity
Measurements of genetic diversity in crops have important implications for plant breeding programs and the conservation of genetic resources. In the present study, temporal and spatial genetic variation was analyzed by comparing the diversity among released periods of varieties and among different geographical origins, respectively.

Temporally: Genetic Diversity vs. Year of Release
It has been argued that the level of genetic diversity in modern durum wheat cultivars may have declined due to the strong pure-line selection pressure applied in breeding programs. A decline has also been reported for wild emmer wheat and wild barley, attributed to global warming, in a recent study by Nevo et al. [59]. However, our results demonstrated that a substantial level of genetic variation still exists within a set of durum wheat cultivars as detected by SNP markers (Table 2).
We did find a significant reduction in the diversity of varieties released in the 1960s and 1970s, compared with the diversity levels in the landraces and old cultivars (1930-1964) (p < 0.001, paired t test). However, diversity increased significantly in varieties released after the 1960s and 1970s (p < 0.001, paired t test) (Table 2). That is, the genetic basis of durum wheat narrowed from 1930 to 1980 but widened from 1981 to 2009 (Table 2). These results agree with the previous reports by Soleimani et al. [11] and Maccaferri et al. [2]. Genetic diversity estimates in modern cultivars of durum wheat using AFLP and pedigree-based techniques showed that the level of genetic variation within the most recently developed cultivars is fairly substantial [11]. Likewise, microsatellite analysis also revealed a progressive widening of the genetic basis in elite durum wheat germplasm [2]. However, our results contrast with those of Fu et al., who concluded that there was a genome-wide reduction of genetic diversity in Canadian wheat breeding programs [56][57][58]. The discrepancy may be due to differences in the materials used and the regions of collection. A worldwide durum wheat collection consisting of 150 accessions was used to estimate genetic diversity in this study, while 75 Canadian hard red spring wheat (T. aestivum L.) cultivars were used in Fu's study.
The low diversity levels of varieties released in 1965-1980 might be due to the "Early Green Revolution", which was characterized by breeding semi-dwarf varieties possessing a higher yield potential [60,61]. Interestingly, this reduction in genetic diversity coincided with the decrease in plant height in durum wheat (Tables 2 and 4). The increase in genetic diversity from the 1980s may be explained by a change in the breeding strategy of the International Maize and Wheat Improvement Center (CIMMYT) in the late 1970s. During the last 50 years, CIMMYT has played a great role in wheat improvement, including durum. Out of 140 durum varieties released in the period 1966-1992, 90 varieties (64%) are from CIMMYT crosses [62]. When CIMMYT realized the danger of narrowing its germplasm base in the late 1970s, it changed its breeding strategy, aiming at increasing productivity while ensuring genetic diversity. Our results showed that genetic diversity narrowed from 1930 to 1980 but increased from 1981 to 2009 (Table 2), which is consistent with CIMMYT breeders having successfully increased genetic diversity. The increase in genetic diversity can be obtained mainly through the introgression of various novel wheat materials [63,64], which is supported by this study. Many cultivars used in this study were obtained by crossing T. dicoccoides and durum wheat. The pedigree information of these accessions can be obtained from the Germplasm Resources Information Network (GRIN) [65] based on accession identifier # (Table 7).

Table 7. List of durum wheat accessions used in the study. Geographical region of origin, year of release, accession identifier #, geographical parameters, and improvement status are reported.

Overall, the greater genetic diversity in cultivars than in landraces may be due to breeding strategy and breeders' efforts. Alternatively, the imbalanced sample sizes of the two groups (53 landraces vs. 97 cultivars) may have contributed to this result.

Spatially: Genetic Diversity vs. Place of Origin
Generally speaking, great genetic variation should exist in the center of origin and domestication. Moreover, Vavilov reported that the Middle and Near East regions and North Africa are considered the centers of origin and diversification of durum wheat [66]. However, in the present study, comparative analysis of genetic diversity among the 10 mega ecogeographical regions indicated that the greatest genetic diversity was found in South America, followed by North America and Western Europe, while the Middle East showed moderate levels of genetic diversity (Table 6).
These results support the idea that the centers of diversity are not confined exclusively to their centers of origin [5,67]. Harlan [68,69] studied the distribution of variability in crops and concluded that there exist several centers of diversity in different crops which could not be regarded as centers of their origin. But it is worth pointing out that our results correspond to the centers of genetic diversity described by Vavilov [64]: North Africa should be considered as a microcenter of diversity for durum wheat in the southeastern Mediterranean (Table 6).
We detected higher genetic diversity in the New World than in the Old World, where durum evolved. This can be explained by a combination of the uneven distribution of landraces and cultivars among countries and the different genetic diversity levels of the landraces and cultivars used in this study. As shown in Table 2, the greatest genetic diversity was found in the cultivars released during the PGR, followed by landraces, old cultivars, and EGR cultivars. In this study, a larger number of cultivars released during the period of 1981-2009 came from ecogeographical regions with greater genetic diversity, such as South America, North America, and Western Europe. For example, of the 33 accessions from North America, 24 are cultivars released during the period of 1981-2009, accounting for 72.7%. In contrast, the Middle East has relatively lower genetic diversity based on 32 accessions, 18 of which are landraces and 9 of which are old cultivars.

Divergence between Landraces and Cultivars Revealed by SNP Markers
Durum wheat has undergone intensive selection during domestication and the subsequent breeding process for certain desirable characteristics, such as high and stable yields. Such artificial selection may result in significant differentiation at some loci, since traits such as grain yield, seed size, and plant height are quantitatively inherited [1]. An Fst-outlier method was used to identify loci that may be under positive selection and therefore might be linked to genome regions conferring the phenotypic variation present in the analyzed germplasm.
We identified 92 candidate loci under positive selection based on Fst values that fall outside of the 99% confidence interval established for the distribution. These loci may be directly under selection, but more likely mark regions of the genome that have been selected during evolution. The loci we identified have a disproportional bias with 54.3% mapping to chromosomes 2, 6 and 7 ( Figure 3, Table 5). This observation suggests that there are "hot spots" for directional selection in durum wheat. In addition, seven genes including P-EA, TsPAP1, CPK10, PI-PLC1, RSZ38, PDS, and LOX3, which play important roles in plant responses to biotic and abiotic stresses or in grain storage in wheat, appear to be under selection when comparing landraces with cultivars (Table 5). These results suggest that the use of objective approaches to identify outliers will reveal portions of the genome that are under selection. Such objective assessment will provide a scalable means for comprehensive assessments of genetic variation within durum wheat as emerging sequence data and improved genotyping platforms lead to larger data sets [49].

Plant Materials
A total of 150 durum wheat accessions consisting of 53 landraces and 97 cultivars were used in this study. The 97 cultivars were further divided into three temporal groups according to their release period: group 1, 1930-1964 (old cultivars, OC); group 2, 1965-1980 (Early Green Revolution, EGR); group 3, 1981-2009 (Post Green Revolution, PGR) [62,63,70,71]. The "Early Green Revolution" was characterized by breeding semi-dwarf varieties. The first semi-dwarf durum variety was released in Mexico in 1965 [60,61]. These 150 accessions were collected from 10 mega ecogeographical regions: East Asia, South Asia, Middle East, North America, South America, Oceania, Western Europe, Eastern Europe, South Africa, and North Africa, covering 41 countries and spatially reflecting different genetic backgrounds (Figure 4, in which only those countries with durum wheat sampling are indicated by green asterisks). Detailed information about each accession is shown in Table 7.
The 150 durum wheat accessions were genotyped with 1536 SNP markers. These SNPs, discovered in a panel of 32 lines of tetraploid and hexaploid wheat, were downloaded from the Wheat SNP Database [73]. SNP selection and assay design were performed according to previously described procedures [35,74]. The following criteria were applied for SNP selection: no more than 2 SNPs were selected per locus, with preference being given to SNPs present in at least two lines in the discovery panel. Additional SNPs were discovered by sequencing the transcriptomes of T. aestivum cv. Chinese Spring and Jagger [35,74].
A total of 150 ng of genomic DNA per genotype was used for Illumina SNP genotyping at the Genome Center of the University of California, Davis, using the Illumina Bead Array platform and Golden Gate Assay following the manufacturer's protocol [75]. Genotype scores were called using Illumina's Genome Studio V 2010.3. Each of the 1536 SNP clusters was manually examined to correct imperfect calls from the automated clustering.

Genetic Diversity
Genetic diversity was evaluated using POWERMARKER Ver. 3.25 [76]. The genetic parameters including Nei's gene diversity and polymorphism information content (PIC) were used. Nei's gene diversity was defined as the probability that two randomly chosen alleles from the population are different [77]. PIC values provide an estimate of the probability of finding polymorphism between two random samples of the germplasm.
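To make the two parameters concrete, the sketch below computes them for a single locus from a list of allele calls. It assumes the standard formulas (gene diversity as one minus the sum of squared allele frequencies, and PIC as that value minus the sum of 2p_i²p_j² over allele pairs) and treats each inbred accession as contributing one allele call per locus. The function names and the example calls are illustrative, and this is not the POWERMARKER implementation.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(calls):
    """Allele frequencies at one locus; None marks a missing call."""
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored calls at this locus")
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Nei's gene diversity: probability that two random alleles differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content for one locus."""
    pairs = combinations(freqs.values(), 2)
    return gene_diversity(freqs) - sum(2.0 * p * p * q * q for p, q in pairs)


# Hypothetical calls for one SNP across nine accessions (one missing).
example_calls = ["A"] * 6 + ["B"] * 2 + [None]
example_freqs = allele_frequencies(example_calls)
print(example_freqs, gene_diversity(example_freqs), pic(example_freqs))
```

Averaging these per-locus values over the loci of a chromosome, genome, or accession group gives summary values of the kind reported in Tables 1 and 2.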

Genetic Structure and Population Differentiation
To gain better insight into the genetic structure of durum wheat, different methods were used. First, we applied the Bayesian model-based clustering algorithm implemented in STRUCTURE 2.2 [78]. Admixture and correlated allele frequency models were employed with the number of clusters (K) ranging from 1 to 12. For each K, five runs were carried out. Burn-in time and replication number were both set to 100,000 for each run. Accessions with a probability of membership greater than 80% were assigned to a subgroup, while those with lower probabilities were assigned to the "mixed" subgroup. Dendrograms, based on the NJ algorithm according to shared-allele distance, were also used to analyze the genetic structure of the germplasm. A phylogenetic tree was constructed with POWERMARKER Ver. 3.25. Bootstrapping over loci with 1000 replications was carried out to assess the strength of the evidence for the branching patterns in the resulting NJ tree. A consensus tree with bootstrap values was reconstructed with the consensus program of PHYLIP [79] and displayed with FigTree Ver. 1.3.1 [80].
Population differentiation was assessed with the AMOVA implemented in the ARLEQUIN version 3.11 software [81]. Significance levels for variance components were estimated using 16,000 permutations. We identified loci under positive selection between landraces and cultivars using an Fst-outlier detection method as implemented in the LOSITAN workbench [50]. The analysis was performed with 100,000 simulations using an infinite allele model. Candidate loci with Fst values outside of the 99% confidence interval were considered to be under positive selection and were used for further analysis.
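Two of the steps described above are simple enough to sketch directly: the 80% membership rule and the pairwise distance underlying the NJ tree. The code assumes fully homozygous lines coded as one allele per locus, in which case the shared-allele distance reduces to the proportion of jointly scored loci at which two accessions differ. The function names and inputs are hypothetical and do not reproduce the STRUCTURE or POWERMARKER code.

```python
def assign_group(q, threshold=0.80):
    """Return the 1-based cluster index, or "mixed" if no membership exceeds the threshold.

    q is the list of membership coefficients of one accession (one value per cluster).
    """
    best = max(range(len(q)), key=lambda k: q[k])
    return best + 1 if q[best] > threshold else "mixed"


def shared_allele_distance(calls_a, calls_b):
    """Proportion of jointly scored loci at which two homozygous lines differ.

    Each argument holds one allele call per locus; None marks a missing call.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci scored in both accessions")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical membership coefficients for K = 2.
for q in ([0.93, 0.07], [0.15, 0.85], [0.60, 0.40]):
    print(q, "->", assign_group(q))

# Hypothetical allele calls at four loci for two accessions.
print(shared_allele_distance(["A", "B", "A", None], ["A", "A", "A", "B"]))
```

A full distance matrix built this way for all pairs of accessions is the kind of input a neighbor-joining routine expects; resampling loci with replacement before recomputing the matrix corresponds to the bootstrap over loci.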
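The per-locus statistic behind the outlier scan can be sketched as follows. This is a simplified illustration only: it computes Fst for a bi-allelic locus as (Ht - Hs)/Ht with the two groups weighted equally, which is an assumption made here for brevity. It does not reproduce the simulation-based null distribution that LOSITAN uses to place the 99% confidence interval, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a bi-allelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(p1, p2):
    """Fst = (Ht - Hs) / Ht for two equally weighted groups.

    p1 and p2 are the frequencies of the same allele in the two groups
    (for example landraces and cultivars).
    """
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)      # diversity of the pooled groups
    if h_t == 0.0:
        return 0.0                            # locus fixed for the same allele in both groups
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies (group 1, group 2) at three loci.
example_loci = {"locus_1": (0.50, 0.50), "locus_2": (0.60, 0.40), "locus_3": (0.90, 0.10)}
example_fst = {name: fst_biallelic(p1, p2) for name, (p1, p2) in example_loci.items()}
print(example_fst)
```

Loci whose Fst is unusually high for their heterozygosity, like the third hypothetical locus, are the ones an outlier test would flag as candidates for positive selection.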

Statistical Tests
The SPSS V.13.0 program was used for statistical analyses [82]. The significance of differences in Nei's gene diversity and PIC among chromosomes was tested by estimating a 95% confidence interval (CI) of the genome mean, which was calculated using bootstrap analysis with 1000 replications. Chromosome means outside of the 95% CI were declared significantly different from the genome mean [36]. The paired t test was used to test the significance of differences in genetic diversity between genomes, using Nei's gene diversity and PIC per chromosome as variables. The significance of differences in genetic diversity parameters between cultivars and landraces was also tested by the paired t test. The plant height data were analyzed by analysis of variance (ANOVA), and the means among groups were further compared by Duncan's Multiple Range Test.
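The two tests described above can be sketched in a few lines. The code assumes a percentile bootstrap in which per-locus values are resampled with replacement (the text does not specify the resampling unit) and the usual paired t statistic on per-chromosome differences. The function names and all numeric inputs are hypothetical, and this is not the SPSS procedure used in the study.

```python
import math
import random
from statistics import mean, stdev


def bootstrap_ci_of_mean(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean (resampling values with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[round((alpha / 2) * n_boot)]
    hi = means[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched observations."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-locus gene diversity values (not from Table 1).
locus_values = [0.10, 0.20, 0.25, 0.30, 0.15, 0.40, 0.05, 0.22, 0.18, 0.35]
ci_low, ci_high = bootstrap_ci_of_mean(locus_values)
print("95% CI of the mean:", ci_low, ci_high)

# Hypothetical matched values, e.g. homoeologous chromosome pairs.
t_value, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
print("paired t:", t_value, "df:", dof)
```

With the seven homoeologous chromosome pairs of genomes A and B as matched observations, the paired test has six degrees of freedom; a chromosome mean falling outside the bootstrap interval would be declared different from the genome mean.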

Conclusions
In this study, we used worldwide germplasm accessions and 946 SNP markers to estimate the genetic structure and genetic diversity of durum wheat at the whole-genome level. Genetic structure, based on a set of 150 accessions from different places of origin, was greatly affected by many factors, such as environmental conditions, release period of varieties, and gene flows via germplasm exchanges or human activities. The genetic diversity analysis indicated that a substantial level of genetic variation still exists within modern cultivars of durum wheat as detected by SNP markers, despite rigorous selection pressure aimed at cultivar purity and associated breeding practices. Our results can be used to accelerate wheat improvement by addressing the patterns of genetic variation within durum wheat, conserving an adequate type and number of germplasm accessions, and helping breeders maximize the level of variation present in segregating populations by crossing cultivars with greater genetic distance.