Genome-Wide Association Mapping for Stripe Rust Resistance in Pakistani Spring Wheat Genotypes

Stripe rust caused by the pathogen Puccinia striiformis f. sp. tritici (Pst) is a major threat for wheat, resulting in low yield and grain quality loss in many countries. Genetic resistance is a prevalent method to combat the disease. Mapping the resistant loci and their association with traits is highly exploited in this era. A panel of 465 Pakistani spring wheat genotypes were evaluated for their phenotypic response to stripe rust at the seedling and adult plant stages. A total of 765 single nucleotide polymorphism (SNP) markers were applied on 465 wheat genotypes to evaluate their stripe rust response against nine races during the seedling test and in three locations for the field test. Currently, twenty SNPs dispersed on twelve chromosomal regions (1A, 1B, 1D, 2A, 2B, 4A, 4B, 5B, 6A, 6B, 6D and 7B) have been identified that were associated with rust race-specific resistance at the seedling stage. Thirty SNPs dispersed on eighteen chromosomal regions (1A, 1B, 1D, 2A, 2B, 2D, 3A, 3B, 3D, 4B, 5A, 5B, 6A, 6B, 6D, 7A, 7B and 7D) are associated with adult plant resistance. SNP loci IWB3662 was linked with all three Pakistani races, and likewise IWA2344 and IWA4096 were found to be linked with three different USA races. The present research findings can be applied by wheat breeders to increase their resistant capability and yield potential of their cultivars, through marker-assisted selection.


Introduction
Triticum aestivum L., commonly known as bread wheat, is the world's major food crop for about 2 billion people. It fulfills the nutritional requirements of almost 1/3 of the world's population (about 35%) by providing half of their protein and above half of their calorie requirement [1]. Wheat contains a high quantity of dietary nutrients which includes carbohydrates (70%), crude fibers (22%), proteins (12%), water (12%), fats (2%) and minerals (1.8%) [2].
Wheat can withstand a wide range of climate changes but it is also prone to different biotic and abiotic stresses. Among the biotic stresses (rust, smut, mildew, bunt etc.), rust has global economic and historic significance. Rust (leaf, stripe and stem) is the most devastating fungal disease that is threatening the overall world wheat production [3] and its cyclic rotation is considered responsible for famine in many parts of the world.
Wheat in its growing season, if faced with a cool environment, is mostly arrested by stripe (yellow) rust which is a major foliar disease caused by Puccinia striiformis f. sp. tritici (Pst) [4], but nowadays wheat growing in warmer regions is also prone to stripe rust epidemics [5,6]. Mostly, 0.1-5% wheat yield losses are observed due to stripe rust disease and rarely, losses expand up to 25% [4]. The yield loss due to stripe rust can exceed up to 100% for susceptible wheat landraces. Wheat infected with

Phenotypic Variation to Stripe Rust Response
The phenotypic characterization of 465 spring wheat genotypes was done with nine Pst races (six races selected from the United State of America and three races from Pakistan). The seedling test was performed in a greenhouse (controlled condition) and the rust-infection score (infection type (IT)) collected is summarized in Figures 1 and 2. The frequency of resistance and susceptibility varied with all the stripe rust races. About 15,56,72   Wheat genotypes showed a slightly different behavior with the Pakistani stripe rust races such that the numbers of resistant genotypes were greater when compared to the USA stripe rust races ( Figure 2). The phenotypic analysis revealed 72 resistant genotypes to stripe rust races PK07-4 and PK07-12, while 98 genotypes showed a resistance response against the PK08-2 stripe rust race. Most of the genotypes were susceptible to the USA races PSTv-37 and PSTv-198 and the Pakistani race PK07-12. Most of the genotypes showed resistance to the USA races PSTv-40 and PSTv-51 and to the Pakistani race PK08-2 (Figures 1 and 2).
In a field trial, the adult stage response of the wheat genotype against the stripe rust was slightly different as compared to the controlled conditions in the greenhouse. In a field, the resistance behavior against stripe rust infection of the genotypes was observed, namely that of PAK-University of Agriculture, Faisalabad  (Figures 3 and 4). The genotypes susceptibility frequencies were higher in the USA locations than in Pakistan, hence the response varied with the environment. In Pakistan, due to the warm environment, the disease impact was slightly less but was not negligible as compared to the cold environment near Pullman, WA, USA. The estimation of the variance component showed a highly significant behavior (p < 0.0001) of the genotypes and the genotype x environment interaction across all the environments. Non-significant responses of environmental variance indicate the variable climate conditions and the influence of different races in the disease development. A high range of the coefficient of determination and broad sense heritability of (H 2 ) 87% for the IT and 93% for the SEV indicate the reliability of the dataset for GWAS (Table 1).

Population Structure and Linkage Disequilibrium
The population structure was performed on a total 465 entire wheat genotypes by the Bayesian clustering approach. An admixture-based model was used to cluster the genotype into three subpopulations based on the ∆K ( Figure 5; Supplementary Table S4) value. Population structure reduces the false positive association among the markers and traits. Subpopulation one (Q1) contains 85 individuals. Similarly, subpopulation two (Q2) contains 179 and subpopulation three (Q3) contains 201 individuals. Each subpopulation is shown by a different color in the cluster analysis. The length of each color represents the estimated contribution of each sample to the subgroups.
After the filtration of minor allelic frequency, 765 SNP markers out of 1500 SNPs were used for the linkage disequilibrium analysis (LD). Linkage disequilibrium (LD) depends on many factors, including the population structure, genetic drift, chromosomal region and natural selection. LD decay relies on the value of r 2 , whose value is calculated for all chromosomes. Critical r 2 value 0.12 was identified for all 465 spring wheat genotypes by taking the 95th percentile of the coefficient square (showed by the red line in Figure 6). The mean r 2 value across the genome was found 0.03 with 50 cm distance. The highest number of pair-markers were found on the A genome (48%) followed by the B genome (46%) and the D genome (5%). Chromosome 1A had the highest number of pairs (2138) and 4D had the minimum number (nine) of pair markers. The LD decay was constructed using chromosomal distance and the critical r 2 value as the threshold to indicate the LD decay length, which attained 1.25 cm for the whole genome ( Figure 6).  Asterisks ** significant at p < 0.0001 and * significant at p < 0.005, ns: non-specific, IT: infection type, SEV: disease severity, Var: variance, R 2 : coefficient of determination, SD: standard deviation, H 2 : broad sense heritability.

Marker-Trait Association at Seedling Stage
Genome-wide association studies (GWAS) of SNP markers with IT scores of nine stripe rust races identify twenty SNPs associated with rust resistance at the seedling stage (Table 2). Manhattan and Q-Q plots, showing the marker-trait association of stripe rust-resistance response at the seedling stage, with all nine races IT scores are provided in the Supplementary Figures S1 and S2. Twelve chromosomal regions (1A, 1B, 1D, 2A, 2B, 4A, 4B, 5B, 6A, 6B, 6D and 7B) were found to be significantly associated (p < 0.0001) with the IT scores of nine rust races at the seedling stage. A total of fourteen SNP loci were identified to be linked with resistance against USA races and twelve loci were identified to be linked with stripe rust-resistance response to Pakistani races. One SNP present on chromosome 1A (IWB3662) was found to be associated with all three Pakistani rust races, namely PK07-4, PK07-12 and PK08-2. A SNP on the 1B chromosome (IWB12258) was linked with the IT score of PK08-2. At the chromosome 1D, two SNPs IWA7171 (linked to PK07-12) and IWA4344 (linked to PSTv-4) were identified. Two SNP markers, IWB50806 linked with PK07-12 and IWA7638 linked with PSTv-14, were identified on chromosome 2A. The targeted SNP IWA2344 was associated with three rust races PSTv-198, PSTv-51 and PSTv-40 at chromosome 2B. Similarly, for the three rust races, PSTv-198, PSTv-51 and PSTv-4, one SNP (IWA4097) marker on the chromosome 1B was identified. Two SNPs were identified at chromosome 2B associated with six stripe rust race. At chromosome 4A, one SNP (IWA3361) identified associated with PSTv-4. Three SNPs identified on the 4B chromosome, IWB12434, IWA2031 and IWA408, were associated with PK07-12, PSTv-14 and PK08-2, respectively. The SNP IWB5781 was associated with PSTv-4, identified at chromosome 5B. At 6A three SNPs identified IWA8022 linked with PSTv-40 and IWA3463, IWB25252 both linked with PK07-12. One SNP (IWB7615) on chromosome 6B was associated with PSTv-51. IWB12259 was linked with PK07-4 at the location 6D. Finally, two SNPs (IWA6857 and IWB10895) were associated with PK07-12 and PSTv-40 on chromosome 7B. Seven SNPs were identified at genome A, ten SNPs at genome B and three SNPs at genome D. The largest number of SNPs (six) was identified to be linked to PK07-12. The identified SNP markers associated with rust race-specific resistance can be linked to different Yr genes based on the virulence and avirulence formula of races (Supplementary Table S2 and S3).

Marker-Trait Association at Adult Plant Stage
Genome-wide association studies of 765 SNPs with the IT and SEV responses of 465 wheat genotypes against stripe rust at three different locations were performed. The association with the IT scores at different locations, namely PAK-UAF16, PAK-UAF17, USA-MTV18, USA-PCFS18 and with the BLUE (best linear unbiased estimator) value identifies twenty-two SNPs that were associated with stripe rust resistance at the adult plant stage. Significant SNPs was identified based on p < 0.001. A total of fifteen chromosomal regions (1A, 1B, 1D, 2A, 2B, 2D, 3A, 3B, 4B, 5A, 5B, 6A, 6D, 7A and 7B) were mapped with the identified SNPs. The highest number of SNPs with an IT score was identified in the USA environment, whereas with the SEV response, twenty-two SNPs were mapped in sixteen chromosomal regions (1A, 1B, 1D, 2A, 2B, 2D, 3A, 3B, 3D, 4B, 5A, 6A, 6B, 7A, 7B and 7D). Overall, thirty significant SNPs were identified with both IT and SEV response. The highest number of loci (eight) was mapped for the 2A chromosomes with the SNPs IWA11136, IWB12554 and IWA6798 at 9.41 cm, 143.6 cm and 150.1 cm respectively. The SNP name, chromosome, position and the probability of its association with IT and SEV, are discussed in detail in Table 3. Five chromosome regions (1D, 3D, 4B, 6D and 7A) were identified that were mapped with significant SNPs in the Pakistan environments with both the IT and SEV disease scores of the genotypes. Nine regions were identified with both the IT and SEV response in the environment USA-MTV18 and eight regions were mapped with significant SNP markers in the environment USA-PCFS18 (Table 3). In the present work, the SNP marker, IWB11136 (2A at 9.41 cm), was found to be linked with all the USA locations with both IT and SEV resistant scores of stripe rust. Manhattan and Q-Q plots showing the marker-trait association with the stripe rust resistance response IT and SEV scores, at all three locations and with the BLUE value, are provided in Supplementary Figures S3 and S4.  a Putative name of identified quantitative trait loci (QTL) from the marker-trait association, b SNP marker associated with all-stage resistance to stripe rust against different races, cd Chromosome number and the position of the associated SNP marker according to [25],

Phenotypic Variation to Stripe Rust Response
The present study was conducted to study the spectrum of genetic diversity, as well as to map the resistance loci in 465 Pakistani spring wheat genotypes in response to stripe rust (Puccinia striiformis f. sp. tritici) at seedling and adult plant stage. Profiling with high-density SNP markers helps in the identification of genomic diversity, population structure among genotypes and marker-trait association to identify well-worn and novel resistance sources in germplasm. In previous reports, variable numbers of stripe rust races were used to compute the response against the specific race [19,[26][27][28]. In the current study, a total of nine Pst races, including six from the USA and three of Pakistani origin were used to evaluate the genotypes response at the seedling stage also three different locations, tested for adult plant response (two years in Pakistan considered as one location and two locations in the USA).
In the current study, almost 50% intermediate response was observed at the seedling stage. Moreover, a high level of resistance (IT score = 0-3) response appeared at the adult plant stage and a low level of resistance phenotype appeared at the seedling stage. Similarly, a more susceptible response appeared at the seedling stage than at the adult plant stage [16,[27][28][29]. Wheat genotypes were more susceptible to USA stripe rust races compared to Pakistani stripe rust races, where the resistant phenotype response was dominant. Pakistani wheat genotypes have long history of cultivation [30] before the emergence of modern wheat. This suggests that they have more interaction with Pst races, prevailing themselves as a major source of resistance to Pst [31].
Population structure, based on the Bayesian model, divides the 465 genotypes into three subpopulations (K = 3). Based on the genotype diversity, a different number of subpopulations were reported, as two subpopulations [16], three subpopulations [29], six subpopulations [32], seven subpopulation [31] and eight subpopulations [27]. Critical r 2 was used to estimate the extent of the LD decay with the line intersecting the smooth curve [19]. In present study, the critical r 2 value for the whole genome was 0.12 (for 765 SNPs) and we found LD decay at a confidence interval of 1.25, previously reported as 1.6 cm [31].

Genome-Wide Association Analysis with Stripe Rust Response
The major factors that influenced the association mapping were the population size, the germplasm choice and the marker density over whole genome [32]. Several association mapping studies were reported for stripe rust using different molecular markers as diversity arrays technology (DArT) [26,33,34], simple sequence repeats (SSR) [35,36] and SNP [19,27,29,37,38].
SNP loci QYr.uaf-1A.1, mapped on the short arm of 1A positioned at 10.69 cm, was found to be associated with resistance to all three Pakistani races (PK07-4, PK07-12 and PK08-2). Likewise, earlier it was reported that the position of SNP IWB3662 lay within the confidence interval of YrEDWL-1AS, associated with resistance to PSTv-14 and PSTv-37 in durum wheat [19] and QYrid.ui-1A_Rio Blanco [39]. Three SNP markers (IWB12795, IWB20633 and IWB56353) associated with the loci QYr.tsw-1A were positioned on the 1A chromosome had already been reported [16] in close proximity to currently identified loci that were linked with resistance to PSTv-4 in spring wheat germplasm. Stripe rust-resistant locus QYr.uaf-1B.1 tagged with SNP IWB12258 in current research findings was found in close proximity of IWA1191 [28]. The Chromosomal position 1B reported with many Yr-associated genes as Yr9, Yr10, Yr15, Yr34/Yr26, Yr29, Y64, Yr65, YrAlp and YrH52 [31]. Chromosome 1B locus QYr.uaf-1B.1 was found in close proximity to QYrcau-1BS_AQ24788-53 [40] from Chinese winter wheat and Yr9 resistant gene [31]. In Pakistan, Yr9 was first reported in 1994, and after that many cultivated varieties were developed with this gene, due to its linkage with other genes (Lr26, Sr31, Pm8) and pleiotropic effect [41]. The successful translocation of the Yr9 gene from rye and alongside high-yield potential, made Yr9 a highly dominated gene in Pakistani germplasm [35]. Wheat varieties carrying this gene were highly cultivated during 1990s, but resistance breakdown after a few years made the resistant cultivars susceptible [42]. A major yellow rust epidemic was observed in 1995 with a 20% loss in the affected area in Pakistan. At that time, Pak81, also known as Veery#5 carrying Yr9 gene, predominated. Two major cultivars, Pak81/Pirasabak 85 became susceptible during the period 1994-1995 due to the ineffectiveness of the Yr9 gene and Inquilab 91 in 2002 due to the virulence occurrence of the Yr27 gene [11].
In the present research work, the QTL, QYr.uaf-2A.3 (linked SNP IWB11136), identified to be positioned at 9.41 cm on the 2A chromosome, was significantly associated with six locations including the IT (USA-MTV18, USA-PCFS18, BLUE-IT) and SEV response (USA-MTV18, USA-PCFS18 and BLUE-SEV). The same SNP marker IWB11136 (QYr.tsw-2A.3) that was identified previously was also found significantly associated with all the tested locations for the stripe rust response [16]. This locus was also found within the confidence interval of QYr.tam-2AS [54] from the hard winter wheat TAM111 and Yr17 genes [55]. The Yr17 gene was developed by 2NS-2AS locus translocation (25 to 38 cm) from T. ventricosum (2NS), a famous wild Triticeae species to 2AS of wheat chromosome [55,56]. The 2NS-AS translocation was first carried out in winter bread wheat VPM1 and afterwards in California and Washington, where many winter wheat cultivars were developed such as Madsen, Hyak and Expresso (spring wheat). Furthermore, the 2B QTL QYr.uaf-2B.3 positioned at 130.6 cm was linked with both the IT and SEV score at the USA location MTV, and was linked with the Yr genes Yr53 and Yr43 [47]. At chromosome 3B, two SNP markers (IWB11270 and IWB36652) were associated with the QTLs QYr.uaf-3B.2 and QTL QYr.uaf-3B.3, and were aligned with both the IT and SEV scores of USA-MTV, USA-PCFS18 and the BLUE value, which was found in close proximity to QYr.cim-3B_Pastor and QYr.inra-3Bcentr_Renan [57,58].

Collection of Genetic Material
Four hundred and sixty-five (465) genotypes of bread wheat (Triticum aestivum L.) were collected from the Wheat Research Institute, Ayub Agricultural Research Institute (AARI), Pakistan, Faisalabad.

Field Based Resistance Screening to Puccinia striiformis (Pst)
The panel of 465 spring wheat genotypes was tested for stripe rust response under field conditions. All genotypes were grown at the experimental area of the Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad (31 •  The evaluation of Pst at the adult plant stage was done in a field with the natural conditions of disease epidemics at all three locations (two years in Pakistan considered as one location and two locations in USA). In the field, the data scoring was done by visualizing the impact of the disease on the flag leaf of the susceptible check that the IT score, which ranged from 7-9 and the disease severity score which ranged from 70-100% [19]. One location was in Pakistan at the University of Agriculture, Faisalabad (UAF) for two consecutive years 2015-2016 (PAK-UAF16) and 2016-2017 (PAK-UAF17). The other two locations data was recorded at the USA for year 2018 at Pullman, WA (USA-PCFS18) and at Mount Vernon, WA (USA-MTV18). Infection type (IT) and disease severity (SEV) were two disease phenotype scores that were recorded for the Puccinia striiformis (Pst) infection in the field. The IT score based on the 0-9 scale is explained in Supplementary Table S1 whereas the SEV recorded as the % age area of the flag leaf covered with disease and was scored from 0 to 100%.

Greenhouse-Based Resistance Screening to Puccinia striiformis (Pst)
The seedling response of 465 wheat genotypes was evaluated with isolates of six Pst USA races including  and three isolates of Pakistani races PK07-4, PK07-12, PK08-2 [59] under controlled greenhouse conditions. The virulence and avirulence formulae of the stripe rust race isolates are provided in the Supplementary Tables S2 and S3. All the stripe rust races were collected from the USDA, Wheat Health, Genetics and Quality Research Unit, Pullman, WA. Four to five seeds of each genotype were planted per well in a 96 wells tray filled with the number 1 sunshine mix growing medium (Sungro Horticulture, Bellevue, WA, USA). Trays were regularly watered and kept in a rust free greenhouse at 20 • C with 50% relative humidity (RH). The inoculation was done when the seedling reached the 2-leaf stage after approximately 9-10 days of sowing. The inoculation of each race was done by mixing the rust urediniospores with talcum powder. Inoculated plants were incubated in a dark dew chamber for 24 h at 10 • C and 100% relative humidity and then moved to the greenhouse having the 8 • C day, 16 • C night temperature and 16 h photoperiod. The reaction to Puccinia striiformis f. sp. tritici was scored after 18 to 20 days of inoculation, using the 0-9 scale for the infection type (IT) [60,61]. Based on the infection type, the genotypes were grouped as 0-3 = resistant; 4-6 = intermediate; 7-9 = susceptible. The scale of the infection type (IT) disease score is discussed in Supplementary Table S1.

Statistical Analysis
The range, mean, standard deviation, coefficient of determination (R 2 ) were scored within and across the environments using JMP Genomics 15.1.0 (SAS Institute Inc., Cary, NC, USA, 2007). Broad sense heritability (H 2 ) was calculated by the variance component obtained from REML (random effects model) computed using JMP software. The BLUE (best linear unbiased estimator) value for the IT and SEV scores of all environments was calculated using the PROC MIXED procedure in SAS v9.3 (SAS Institute Inc., Cary, NC, USA, 2007) considering the genotype as a fixed effect [16].

DNA Extraction, SNP Genotyping
Wheat genotypes sown in greenhouse and young leaves used for DNA extraction using a robotic system of oKtopureTM at the Western regional small grain genotyping laboratory (WRSGGL) (Washington State University, Pullman, WA, USA) [62] for SNP genotyping against stripe rust. Targeted amplicon sequencing (TAS) for stripe rust resistant genes was done using NextSeq ® 500 (Illumina, Inc., Pullman, WA, USA). Genotypic calling and removing monomorphic as well as low quality SNPs, was carried out using GenomeStudio Software v2011.1. (Illumina, Inc., Pullman, WA, USA) to call bi-allelic SNPs AA, AB and BB, for this default clustering algorithm, was used. A total 1500 SNPs were yielded and subjected to TASSEL (trait analysis by association, evolution and linkage) software v.5.2.61 [63] to remove the SNPs with minor allelic frequency MAF < 0.05 and to made kinship matrix. A total 765 high-quality SNPs were selected and projected on to a consensus map of hexaploid wheat to order them based on the chromosome position [25]: these 765 SNPs were used for association analysis.

Population Structure and Linkage Disequilibrium (LD)
Major genetic structure of the selected genotypes was determined using 150 SNP markers with inter marker distances >5 cm from each other using the Bayesian model-based clustering algorithm in STRUCTURE v2.3.4 [64]. The number of subpopulations (K) was estimated by the running simulation from burn-in 10,000 iteration to 100,000 Monte Carlo Markov Chain (MCMC) replicates. K was run from 1-10 times and 10 independent runs were set for each run. The STRUCTURE results were visualized to determine the value of K (subpopulation) based on the ad hoc criterion by using the STRUCTURE HARVESTER [65,66].
The measurement of the linkage disequilibrium between the pairs of the SNP marker was estimated using the program TASSEL (v5.2.61). The LD parameters D' and r 2 among the loci and comparison-wise significance was computed by 1000 permutations. The critical r 2 value was determined by taking the 95th percentile of the unlinked markers [67]. The scatter plot among the r 2 and distance on chromosome, of all significant (p < 0.001) pairwise combinations, were used to fit the locally weighted polynomial regression curve (LOESS) to estimate the extent of the LD decay in the R environment [16] using the critical r 2 value.

Genome-Wide Association (GWA) Analysis
Integrated mixed-model (MLM) method for association mapping, which accounts for multiple levels of relatedness, was used to narrate the genetic polymorphism to important phenotypic variation in specific traits [68]. An association test was performed using both (1) the Genome Association and Prediction Integrated Tool (GAPIT) [69] and (2) fixed and random model circulating probability unification (FarmCPU) package [70] implemented in R software v.3.6.1 (https://www.r-project.org/). The MLM model utilized trait data, genotype data K (kinship) and PCA (principle component analysis) to find the marker-trait association. The model comparison was done to select the best model for the marker-trait association (MTA) with each trait as K + P (kinship and principal component) [44] and K + Q (kinship and population structure) [68]. The final results were analyzed by FarmCPU, selecting the model (K + Q) based on their respective Q-Q plots. Significant MTA was described based on the p-value. Markers with p < 0.0001 were considered significant for the seedling test and p < 0.001 for the field experiment. Marker-trait association was performed with all nine rust races data, scored at the seedling stage and with the field data of all the environments separately for the IT score, the disease severity score and with the best linear unbiased estimator (BLUE) value using the FarmCPU package implemented in R software v.3.6.1.

Conclusions
GWAS provides a good outline of the distribution and frequency of resistance genes over the whole world subpopulation. This spring wheat Pakistani germplasm was proved an efficient source of phenotypic diversification to combat stripe rust infection for both seedling and field experiments and to determine the yield QTLs related to the yield components. The genotypes possessing a higher fraction of resistance loci of stripe rust divulged themselves as a parental breeding line and hence can increase the breeding efficiency for stripe rust resistance. The present research findings can be exploited by wheat breeders to increase the resistant capability and yield potential of their cultivars by marker-assisted selection breeding.