An Analysis of Genetic Variability and Population Structure in Wheat Germplasm Using Microsatellite and Gene-Based Markers

Knowledge of the natural patterns of genetic variation and their evolutionary basis is required for sustainable management and conservation of wheat germplasm. In the current study, the genetic diversity and population structure of 100 individuals from four Triticum and Aegilops species (T. aestivum, Ae. tauschii, Ae. cylindrica, and Ae. crassa) were investigated using two gene-based markers (start codon targeted (SCoT) polymorphism and CAAT-box derived polymorphism (CBDP)) and simple-sequence repeats (SSRs). The SCoT, CBDP, and SSR markers yielded 76, 116, and 48 polymorphic fragments, respectively. The CBDP marker was more efficient than the SCoT and SSR markers due to its higher polymorphism information content (PIC), resolving power (Rp), and marker index (MI). An analysis of molecular variance (AMOVA) performed using each marker system and the combined data showed that more genetic variation was distributed within species than among them. Ae. cylindrica and Ae. tauschii had the highest values for all genetic variation parameters. A cluster analysis using each marker system and the combined data showed that the SSR marker was more efficient in grouping the tested accessions, and the results of principal coordinate analysis (PCoA) and population structure analysis confirmed the obtained clustering patterns. Hence, combining the SCoT and CBDP markers with polymorphic SSR markers may be useful in genetic fingerprinting, fine mapping, and association analysis in wheat and its germplasm for various agronomic traits or tolerance mechanisms to environmental stresses.


Introduction
Genetic erosion is one of the negative consequences of modern agriculture based on improved high-yield cultivars. In addition, climate change directly affects the occurrence of abiotic stresses such as drought, heat, and salinity, which pose serious risks to agricultural production. One way to increase resilience to these adverse conditions is to take advantage of potential new alleles in the gene pool of plants [1]. Because improved crop species have limited genetic diversity for adapting to climate change, and therefore offer limited possibilities for obtaining new alleles, wild relatives of crops may be a rich and diverse source of new alleles for breeders. Most studies on wild relatives of crop species have focused on wild ancestors of wheat [2]. Using new gene sources from this germplasm is a good approach to establish new varieties [3,4]. Indeed, wild relatives of wheat, especially the genera Aegilops and Triticum, are precious genetic resources that contain many genes associated with resistance to different abiotic stresses and have interesting breeding potential. Hence, due to their high level of genetic diversity, these species play a key role in wheat breeding programs [2,5-10].
One of the basic requirements for wheat breeding is to estimate the diversity of wild relatives of wheat for breeding goals [11]. Genetic diversity is the basis of any breeding program, and modeling genetic diversity may reveal possible adaptations to different environments. Studying genetic diversity also makes it possible to identify genetic traits associated with important breeding goals [12]. Due to the high level of genetic diversity within the germplasm resources of wild relatives of wheat in Iran [7,13], these natural resources may be beneficial in wheat breeding programs. Assessing the genetic diversity in germplasm collections is one of the main tasks of breeding programs, as this may assist in selecting cultivars and lines with higher diversity and better performance under specific conditions [7]. In this regard, DNA markers are a suitable tool to assess the genetic structure of plant populations and to analyze genetic diversity in plant germplasm. There have been several reports on the use of polymerase chain reaction (PCR)-based markers in the evaluation of wheat germplasm [14-17]. PCR-based marker systems include amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), simple sequence repeats (SSR), and inter-simple sequence repeats (ISSR). These methods mainly target neutral, relatively repetitive sequences of the genome [18].
Molecular markers provide useful information for crop breeding, particularly in studies of genetic variability and genetic relationships among accessions of many plant species. Among molecular marker systems, SSR is the most popular PCR-based marker and has been widely used to analyze genetic diversity in different plant species. SSRs are tandem repeats of one to six nucleotides found in both coding and non-coding regions. They are a marker of choice for genotyping due to their high frequency, high level of allelic diversity, co-dominant inheritance, and analytical convenience [19]. In addition, this marker can be used effectively in phylogenetic studies, identification of genetic diversity, construction of wheat genome maps, and estimation of genetic relationships among accessions [20-22].
In the last decade, progress in molecular markers has yielded gene-based markers for biological research. CAAT box-derived polymorphism (CBDP), a promoter-targeted marker, uses the nucleotide sequence of the CAAT box of plant promoters. The CAAT box has a specific consensus sequence (GGCCAATCT) located upstream of the start codon of eukaryotic genes [23]. CBDP primers are 18 nucleotides long and consist of a central CCAAT core flanked by a filler sequence at the 5′ end and di- or trinucleotides at the 3′ end [23]. CBDP primers are PCR-based DNA markers that are inexpensive, highly polymorphic, and contain extensive genetic information that may be useful for assessing genetic variation and population structure, identifying genotypes, and mapping quantitative trait loci (QTL) [24-27]. Start codon targeted (SCoT) polymorphism is another gene-based marker that was designed based on the short conserved regions flanking the start codon (ATG) in plant genes. Similar to CBDP, this marker uses an 18-nucleotide primer that enables the detection of sequence polymorphisms around the ATG start codon in plant genes. SCoT markers are highly polymorphic and reproducible, and designing primers for this marker does not require information on the genome sequence [27]. In addition, this technique can provide additional information on biological properties as compared with other DNA marker techniques. SCoT markers have been successfully applied in many plant species [16,24,25,28-30].
The main objective of this study was to investigate the genetic diversity within and among selected wild relatives of wheat using the molecular markers SCoT, CBDP, and SSR. Furthermore, a comparative analysis using these markers was also performed.

Genetic Materials and DNA Isolation
A set of 100 accessions, comprising 25 samples each from T. aestivum, Ae. tauschii, Ae. cylindrica, and Ae. crassa, was analyzed in this study (Table 1). All samples were provided by the Ilam University Genebank (IUGB). Total genomic DNA of all studied accessions was extracted according to the CTAB protocol [31]. Agarose gel (0.8%) electrophoresis was used to assess the quality of the extracted DNA.

PCR Amplification and Genotyping Assays
For SCoT analysis, eight primers were selected based on the literature [15,32] (Table 2). PCR amplifications were performed in a 20 µL volume consisting of 10 µL PCR master mix (ready-to-use PCR master mix 2X, Ampliqon, Odense, Denmark), 2 µL of DNA, 2 µL of each primer, and 6 µL ddH2O. All reactions were performed under the following conditions: an initial denaturation step of 5 min at 94 °C, followed by 45 cycles of denaturation for 45 s at 94 °C, primer annealing for 45 s (temperature varied for each primer), and extension for 3 min at 72 °C, with a final extension for 7 min at 72 °C. Amplified fragments were stained with SafeView II and visualized by gel electrophoresis in 1.5% agarose.
For CBDP analysis, a set of 12 CBDP primers was designed based on Singh et al. [23] (Table 2). Each PCR reaction was carried out in a 20 µL volume consisting of 2 µL DNA, 2 µL primer, 6 µL ddH2O, and 10 µL master mix (ready-to-use PCR master mix 2X, Ampliqon). All reactions were performed as follows: an initial denaturation step for 5 min at 94 °C, followed by 45 cycles of denaturation for 45 s at 94 °C, primer annealing for 45 s at 56 °C, and primer elongation for 90 s at 72 °C, with a final extension for 10 min at 72 °C. The PCR products were stained with Safestaine-II (Yekta Tajhiz Azma, Tehran, Iran) and visualized on a 1.5% agarose gel with a gel documentation device.
In the SSR analysis, 25 microsatellite primers were selected from the set of SSRs developed for the D genome of bread wheat by Roder et al. [21] (Table 3). As with the other marker systems, all PCR reactions were performed in a 20 µL reaction mixture containing 10 µL master mix (ready-to-use PCR master mix 2X, Ampliqon), 6 µL ddH2O, 2 µL template DNA from each sample, and 2 µL of each primer. Amplification reactions were run with an initial denaturation for 5 min at 95 °C, followed by 35 cycles of denaturation for 45 s at 95 °C, primer annealing for 45 s (temperature varied for each primer from 51.3 to 69.3 °C), and primer elongation for 1 min at 72 °C. The final extension was 5 min at 72 °C. The amplified products were separated on a 2% agarose gel, stained with SafeView II, and visualized under UV light using an imaging system.

Data Analysis
Binary matrices were created based on the presence (1) and absence (0) of amplified fragments across all studied samples. Several informativeness parameters were calculated, such as the number of polymorphic bands (NPB), polymorphism information content (PIC), resolving power (Rp), and marker index (MI). The analysis of molecular variance (AMOVA) was performed using the GenAlEx package ver. 6.5 [33]. Several genetic parameters, including the number of observed alleles (Na), effective number of alleles (Ne), Shannon's information index (I), percentage of polymorphic loci (PPL), and Nei's gene diversity (H), were estimated using the GenAlEx package [33]. Jaccard's genetic similarity coefficients were used to create phylogenetic dendrograms using the MEGA ver. 5.1 software [34]. Furthermore, principal coordinate analysis (PCoA) was performed using the GenAlEx package [33]. The STRUCTURE 2.3.4 software [35] was used to analyze ancestral population structure based on a Bayesian clustering model. This analysis was run 10 times, with each run consisting of a burn-in of 100,000 steps followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations, assuming an admixture model with correlated allele frequencies and a number of clusters (K) ranging from 1 to 10. The optimum number of K was estimated using the web-based STRUCTURE HARVESTER v2.3.4 [36].
Table 2 summarizes the informativeness parameters for the SCoT, CBDP, and SSR markers. The eight SCoT primers amplified a total of 76 fragments across 100 samples of bread wheat landraces and its wild relatives; all were polymorphic. The number of polymorphic bands varied between 7 and 12 with a mean of 9.50. The PIC ranged from 0.34 to 0.48 with a mean of 0.42. The lowest and highest PIC values were recorded for the SCoT-3 and SCoT-18 primers, respectively. The average MI was 3.97, and primers SCoT-3 and SCoT-19 had the lowest (3.12) and highest (5.11) values, respectively. The lowest Rp was 4.44 (SCoT-3), and the average was 7.07.
In the CBDP analysis, 12 primers amplified 116 polymorphic fragments. The average number of polymorphic bands was 9.67, and primers CBDP-12 and CBDP-6 showed the minimum (8) and maximum (12) numbers, respectively. PIC ranged from 0.40 to 0.48 with a mean of 0.45. The lowest and highest values of this parameter were observed for the CBDP-10 and CBDP-1 primers, respectively. The MI (mean of 4.33) ranged from 3.56 to 5.36 among the tested primers. Rp varied between 6.32 and 9.98 with an average of 8.42; primers CBDP-7 and CBDP-1 had the lowest and highest values, respectively.
In the SSR analysis (Table 3), 25 primers generated a total of 49 polymorphic alleles in the 100 investigated samples. The PIC values for the primers varied between 0.09 (Xgwm-121) and 0.50 (Xgwm-16), with a mean of 0.32. Several primers reached the highest PIC value (Table 3). The MI values ranged from 0.19 (Xgwm-121) to 1 (Xgwm-16) with a mean of 0.64. Rp (mean 2.52) varied between 1.80 (Xgwm-349) and 3.84 (Xgwm-272).
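The informativeness parameters reported above can be derived directly from a binary band matrix. The sketch below is illustrative only: the text does not state which formulas were applied, so it assumes definitions that are commonly used for dominant markers, namely PIC = 2f(1 - f) per band (f is the band frequency), Rp as the sum of band informativeness 1 - 2|0.5 - f|, and MI as mean PIC multiplied by the effective multiplex ratio. The function name and the toy matrix are hypothetical.

```python
def marker_informativeness(matrix):
    """Summarize one primer from a binary band matrix.

    matrix: list of rows (one per accession), each a list of 0/1 band scores.
    The formulas below are assumed, commonly used definitions; the paper
    does not spell out the exact ones that were applied.
    """
    n_samples = len(matrix)
    # Frequency of band presence for every scored band (column).
    freqs = [sum(col) / n_samples for col in zip(*matrix)]
    n_bands = len(freqs)
    polymorphic = [f for f in freqs if 0.0 < f < 1.0]
    npb = len(polymorphic)

    # PIC for a dominant marker band: 2 f (1 - f), averaged over polymorphic bands.
    pic = sum(2 * f * (1 - f) for f in polymorphic) / npb if npb else 0.0
    # Resolving power: sum of band informativeness over all bands.
    rp = sum(1 - 2 * abs(0.5 - f) for f in freqs)
    # Marker index: PIC x effective multiplex ratio (NPB x fraction polymorphic).
    emr = npb * (npb / n_bands) if n_bands else 0.0
    mi = pic * emr
    return {"NPB": npb, "PIC": pic, "Rp": rp, "MI": mi}


# Toy data: 4 accessions scored for 3 bands (made-up values).
toy = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
summary = marker_informativeness(toy)
```

Under these assumed definitions, PIC cannot exceed 0.5, which is consistent with the maximum PIC of 0.50 reported for the SSR primers.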

Genetic Diversity Analysis
To dissect the genetic diversity within and among the investigated populations, an AMOVA was performed based on each marker system and the combined genotyping data (Table 4). The AMOVA results indicated that the larger share of the variance was found within populations (SCoT = 81%, CBDP = 80%, SSR = 58%, combined data = 77%). A population genetic diversity analysis using SCoT showed that the highest Na value was estimated among the Ae. crassa accessions, whereas the highest values of Ne, expected heterozygosity (He), I, and PPL were estimated among the Ae. cylindrica accessions. In the CBDP analysis, the highest values of Ne, I, He, and PPL were estimated for the Ae. cylindrica population. The SSR analysis showed that the Ae. tauschii population was the most diverse, with the highest values of all genetic variation parameters. This finding was confirmed by analyzing the combined data (SCoT + CBDP + SSR) (Table 4).
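As a rough illustration of how the per-population parameters Ne, He, I, and PPL can be obtained from binary band data, the following sketch assumes the treatment usually applied to dominant diploid markers: the null-allele frequency is estimated as q = sqrt(1 - band frequency) under Hardy-Weinberg equilibrium, with p = 1 - q. This is an assumption about the calculation, not a reproduction of the software output reported in Table 4, and the function names are hypothetical.

```python
import math


def diversity_from_band_frequency(band_freq):
    """Return (Ne, He, I) for one locus from its band-presence frequency.

    Assumes a dominant diploid marker in Hardy-Weinberg equilibrium, so the
    null (band-absent) allele frequency is q = sqrt(1 - band_freq).
    """
    q = math.sqrt(1.0 - band_freq)
    p = 1.0 - q
    he = 2 * p * q                      # expected heterozygosity
    ne = 1.0 / (p * p + q * q)          # effective number of alleles
    # Shannon's information index; terms with zero frequency contribute 0.
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    return ne, he, shannon


def population_diversity(matrix):
    """Average the per-locus statistics over all loci of one population."""
    n = len(matrix)
    freqs = [sum(col) / n for col in zip(*matrix)]
    per_locus = [diversity_from_band_frequency(f) for f in freqs]
    k = len(freqs)
    return {
        "Ne": sum(s[0] for s in per_locus) / k,
        "He": sum(s[1] for s in per_locus) / k,
        "I": sum(s[2] for s in per_locus) / k,
        # Percentage of loci where the band is neither fixed nor absent.
        "PPL": 100.0 * sum(1 for f in freqs if 0.0 < f < 1.0) / k,
    }


# Toy population: 4 accessions, 2 loci (made-up values).
stats = population_diversity([[1, 0], [1, 1], [1, 0], [0, 1]])
```

For a band present in 75% of accessions, this gives q = p = 0.5, so He = 0.5, Ne = 2, and I = ln 2.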

Genetic Distance and Grouping of Samples
The molecular data from the SCoT, CBDP, SSR, and combined markers were used to estimate Jaccard's genetic distance coefficient (GD) between pairs of the investigated wheat accessions. In the SCoT analysis, the GD values ranged from 0.068 to 0.909 with a mean of 0.720. The highest GD value was estimated between two samples of T. aestivum (accessions No. 33 and No. 17); the lowest value was found between two samples of Ae. crassa (accessions No. 76 and No. 84). Using the CBDP data, the GD values ranged between 0.068 and 0.909 with an average of 0.684. The highest and lowest GD coefficients were found between accessions No. 30 (Ae. tauschii) and No. 98 (Ae. crassa) and between No. 64 (Ae. cylindrica) and No. 63 (Ae. cylindrica), respectively. In the SSR analysis, the GD values ranged between 0.075 and 0.956 with an average of 0.780. The highest GD was estimated between accessions No. 4 (T. aestivum) and No. 65 (Ae. cylindrica), whereas the lowest was found between two samples of Ae. tauschii (accessions No. 30 and No. 33). In the analysis of the combined data, the average GD was 0.810. Two samples of T. aestivum (accessions No. 12 and No. 21) and two samples of Ae. tauschii (accessions No. 30 and No. 43) showed the highest and lowest GD values, respectively (data not shown).
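The pairwise distances described above follow from the binary band profiles. A minimal sketch, assuming GD is taken as one minus Jaccard's similarity (shared bands divided by bands present in at least one of the two accessions); the function names and the two toy profiles are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two 0/1 band profiles of equal length."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        # Neither accession shows any band; treat the pair as identical.
        return 0.0
    return 1.0 - shared / either


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Jaccard distances."""
    n = len(profiles)
    return [[jaccard_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Two made-up accessions scored for four bands.
acc_1 = [1, 1, 0, 1]
acc_2 = [1, 0, 1, 1]
gd = jaccard_distance(acc_1, acc_2)  # 2 shared bands out of 4 present: 0.5
```

Joint absences (0, 0) are ignored by this coefficient, which suits dominant markers where a missing band can have several causes.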
To investigate the genetic relationships among wheat landraces and the wild relative accessions, cluster analyses based on Jaccard's similarity coefficients and the neighbor-joining (NJ) algorithm were computed for each marker system and the combined data (Figure 1). Based on the SCoT data, most of the investigated accessions were clearly separated into distinct groups and subgroups. However, some accessions from different species were clustered with each other in the same group (Figure 1A). The efficiency of the CBDP data in grouping accessions was lower than that of the SCoT marker. As shown in Figure 1B, except for a few accessions from each species that grouped with each other, the remaining accessions were mixed within the same group or subgroup. The dendrogram rendered by the SSR data revealed a clear grouping pattern of the studied accessions (Figure 1C). All accessions belonging to Ae. crassa and Ae. cylindrica clustered into two distinct groups. T. aestivum and Ae. tauschii accessions formed a single group. Except for two accessions of T. aestivum and Ae. tauschii, all samples of these species clustered into distinct subgroups. When the cluster analysis was computed using the combined data, a better grouping pattern was observed. As shown in Figure 1D, only one accession of T. aestivum was separated from its group and clustered with Ae. cylindrica accessions. The results of Mantel's test further supported these results. Based on Mantel's test [37], there were positive and significant correlations among all the marker systems used (data not shown).
The PCoA results further confirmed the grouping pattern. The first two coordinates accounted for 52.95%, 51.66%, 54.57%, and 54.47% of the total molecular variation using SCoT, CBDP, SSR, and combined data, respectively (Figure 2). A comparative analysis showed that the SSR marker grouped the investigated accessions well according to their phylogenetic relationships. According to the SSR results, accessions belonging to Ae. cylindrica and Ae. tauschii were clearly separated from each other. Similar to the dendrogram obtained by cluster analysis, the samples belonging to T. aestivum and Ae. tauschii were located in the same region of the biplot.
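The agreement between marker systems mentioned above is typically assessed with a Mantel test, which correlates the off-diagonal entries of two distance matrices and judges significance by permuting the accession labels of one of them. The following is a minimal sketch under that assumption; the function names, the number of permutations, and the one-sided test are illustrative choices and not details taken from the study.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p) for two distance matrices."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together (relabel the accessions).
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Made-up 4 x 4 distance matrix compared with itself.
d = [
    [0.0, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
r, p = mantel(d, d, permutations=99)
```

In practice the two inputs would be the Jaccard distance matrices from two marker systems computed over the same 100 accessions in the same order.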

Structure and Pattern of Classification
Stratification of the total sample assembly based on all marker systems showed the existence of a distinct genetic structure. For this analysis, we used an assumed population range from K = 2 to K = 10, with 10 replications per K. In the SCoT analysis, the optimum number of subpopulations was K = 4. The first subpopulation consisted of fourteen and eighteen accessions of Ae. cylindrica and Ae. crassa, respectively. The second subpopulation consisted of six accessions of Ae. crassa. All the Ae. tauschii accessions, along with one and nine accessions from T. aestivum and Ae. cylindrica, respectively, formed the third subpopulation. The fourth subpopulation consisted of twenty-four accessions of T. aestivum and two accessions of Ae. cylindrica. One accession of T. aestivum was identified as an admixed sample (Figure 3A). In the CBDP analysis, the optimum number of subpopulations was three. The first subpopulation consisted of four, six, and two accessions of Ae. tauschii, Ae. cylindrica, and Ae. crassa, respectively. The second subpopulation included fourteen accessions of Ae. tauschii. The remaining accessions from all species were in the third subpopulation. One accession from Ae. tauschii showed admixture between the first two subpopulations (Figure 3B). The population structure analysis using SSR data showed a clear pattern of classification. All samples grouped into four distinct subpopulations. The first subpopulation consisted of all accessions of Ae. tauschii along with three accessions of T. aestivum; the second subpopulation included all accessions of Ae. crassa; the third subpopulation consisted of all Ae. cylindrica accessions; and the fourth subpopulation included the remaining accessions of T. aestivum (Figure 3C).
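The choice of the optimum K from replicated STRUCTURE runs is commonly based on the ΔK statistic of Evanno et al., which is the criterion STRUCTURE HARVESTER reports. The sketch below assumes that criterion was the one used here and computes ΔK as the absolute second difference of the mean log-likelihood divided by the standard deviation across runs at that K. The function name and the log-likelihood values are made up for illustration and are not results from this study.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of Ln P(D) values, one per replicate run.
    """
    mean = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean or k + 1 not in mean:
            continue  # delta K is undefined at the ends of the K range
        sd = statistics.stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        delta[k] = second_diff / sd
    return delta


# Made-up log-likelihoods: two replicate runs for each K from 1 to 4.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -904.0],
    3: [-880.0, -884.0],
    4: [-875.0, -879.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested, so the tested range should extend beyond the plausible number of subpopulations.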


Discussion
The genetic diversity of wild relatives of wheat, which are the main germplasm source for bread wheat, should be elucidated for conservation and utilization and to expedite breeding programs. Molecular markers are efficient and accurate tools to reveal and estimate genetic diversity and to determine the population structures of many plant species [30]. Over the past decade, several novel gene-based marker systems have been developed to aid genetic diversity and population structure analyses. The SCoT and CBDP markers are two of these novel molecular systems. Several studies have indicated that these markers perform well in genetic research due to their ability to reveal polymorphisms in conserved regions and their high reliability as compared with other systems [15,16,23,25,27,32,38]. In the present study, data were provided on the genetic diversity and structure of 100 samples of Aegilops and Triticum populations collected from different natural habitats of Iran. The results of this work revealed that the SCoT, CBDP, and SSR markers could be successfully used to investigate genetic variation among and within populations of bread wheat and its wild relatives. The applicability of these marker systems to characterize the genetic diversity and phylogenetic relationships was also compared. The CBDP marker showed higher polymorphism than the SCoT and SSR marker systems (Tables 2-4).
The mean values of PIC, Rp, and MI were also higher for the CBDP marker than for the SSR and SCoT markers. Thus, the CBDP marker is a more efficient molecular system for investigating the genetic diversity among wheat germplasms. Similarly, the reliability and efficiency of this marker system to examine the genetic diversity in wheat and other plants has been reported [16,25,39]. Several reports on the efficiency of SCoT have indicated that it is a suitable molecular tool to dissect polymorphisms in wheat germplasm [15,30].
Based on the results of the AMOVA analysis using each marker system and combined data, the level of genetic diversity within species was greater than among them, which indicated that all samples from each species had a diverse genetic background (Table 4). When the rate of diversity was evaluated using several genetic variation parameters, we found that the CBDP and SCoT markers yielded higher values for all parameters (Na, Ne, He, I, and PPL) than the SSR marker (Table 4).
Indeed, one of the main reasons for this result may be the number of amplified fragments. Among the four species, the highest values of the genetic variation parameters were estimated for Ae. cylindrica using the SCoT and CBDP markers, while the SSR marker and the combined data yielded the highest values for Ae. tauschii (Table 4). Although the results obtained from the different markers differed, the identification of Ae. cylindrica and Ae. tauschii as the most diverse species was a notable finding. Similarly, several studies have reported a high level of diversity among these species using agro-morphological characteristics and different molecular marker types. For instance, a high level of genetic diversity in Ae. cylindrica has been reported using the SCoT marker [32], whereas the CBDP marker indicated a high level of diversity in Ae. tauschii as compared with other wild relatives of wheat [39]. Furthermore, the SSR marker showed that Ae. cylindrica and Ae. tauschii had the highest values of genetic variation parameters as compared with bread wheat and its other relatives [40]. Among the wild relatives of wheat, Ae. cylindrica and Ae. tauschii have good potential for use in breeding programs, and various breeding aspects of these species have been highlighted in numerous studies [3-6,9,10,41]. For example, Pour-Aboughadareh et al. [6] reported that Ae. tauschii, due to its physiological mechanisms, could be used as an ideal genetic source for the discovery of novel genes to improve drought tolerance in bread wheat. In another study, Ahmadi et al. [42] reported that Ae. tauschii responded well to high levels of salinity stress as compared with other ancestral species. The breeding potential of these species has been highlighted in a review by Pour-Aboughadareh et al. [10].
The clustering patterns of samples generated by the different marker types differed in some cases. However, the pattern obtained by the SSR marker was clearer than those of the other markers (Figure 1). The best clustering pattern was obtained when the genotyping data were combined. Hence, we propose that combining gene-based markers (i.e., SCoT and CBDP) and a conserved marker (i.e., SSR) would yield the best grouping of accessions based on their genetic background, as has also been observed in previous studies [15,25,32]. In this regard, the results of the Mantel test between pairs of markers showed that the SSR marker had a positive and significant correlation with the SCoT and CBDP markers. In the present study, we used PCoA to confirm the clustering patterns. The best pattern of classification was obtained with the SSR data (Figure 2). In general, these results were confirmed by the population structure analyses. Based on the structure analysis (Figure 3), the investigated samples were largely separated from each other according to their taxonomic group. Indeed, our results suggest that conserved markers (such as SSR) have greater efficiency than gene-based techniques for studying phylogenetic relationships.

Conclusions
The present study indicated a high level of genetic diversity within wild relatives of wheat, especially Ae. cylindrica and Ae. tauschii. Knowledge of the genetic diversity of these species may assist in efficient management of these natural germplasms of wheat. Our findings also revealed that the two gene-based markers, SCoT and CBDP, are more suitable for detecting polymorphism and provide greater informativeness than the SSR marker. Moreover, the SCoT and CBDP markers are suitable for use in fine mapping studies.