Combination of Linkage Mapping, GWAS, and GP to Dissect the Genetic Basis of Common Rust Resistance in Tropical Maize Germplasm

Common rust (CR) caused by Puccina sorghi is one of the destructive fungal foliar diseases of maize and has been reported to cause moderate to high yield losses. Providing CR resistant germplasm has the potential to increase yields. To dissect the genetic architecture of CR resistance in maize, association mapping, in conjunction with linkage mapping, joint linkage association mapping (JLAM), and genomic prediction (GP) was conducted on an association-mapping panel and five F3 biparental populations using genotyping-by-sequencing (GBS) single-nucleotide polymorphisms (SNPs). Analysis of variance for the biparental populations and the association panel showed significant genotypic and genotype x environment (GXE) interaction variances except for GXE of Pop4. Heritability (h2) estimates were moderate with 0.37–0.45 for the individual F3 populations, 0.45 across five populations and 0.65 for the association panel. Genome-wide association study (GWAS) analyses revealed 14 significant marker-trait associations which individually explained 6–10% of the total phenotypic variances. Individual population-based linkage analysis revealed 26 QTLs associated with CR resistance and together explained 14–40% of the total phenotypic variances. Linkage mapping revealed seven QTLs in pop1, nine QTL in pop2, four QTL in pop3, five QTL in pop4, and one QTL in pop5, distributed on all chromosomes except chromosome 10. JLAM for the 921 F3 families from five populations detected 18 QTLs distributed in all chromosomes except on chromosome 8. These QTLs individually explained 0.3 to 3.1% and together explained 45% of the total phenotypic variance. Among the 18 QTL detected through JLAM, six QTLs, qCR1-78, qCR1-227, qCR3-172, qCR3-186, qCR4-171, and qCR7-137 were also detected in linkage mapping. GP within population revealed low to moderate correlations with a range from 0.19 to 0.51. Prediction correlation was high with r = 0.78 for combined analysis of the five F3 populations. Prediction of biparental populations by using association panel as training set reveals positive correlations ranging from 0.05 to 0.22, which encourages to develop an independent but related population as a training set which can be used to predict diverse but related populations. The findings of this study provide valuable information on understanding the genetic basis of CR resistance and the obtained information can be used for developing functional molecular markers for marker-assisted selection and for implementing GP to improve CR resistance in tropical maize.


Introduction
Common rust (hereafter CR) caused by Puccina sorghi is one of the most destructive fungal foliar diseases in maize-growing regions predominantly in humid areas and has been reported to cause 12 to 61% yield losses in a favorable environment [1,2]. These yield losses are subject to leaf area infected and host growth stages whereby the former has been estimated to cause about 3-8% yield loss for each 10% of the total leaf area affected [3]. Quantitative resistance is due to partial resistance or adult plant resistance [4]. Numerous studies have suggested that older and mature tissues have more resistance to CR than younger soft tissues [4][5][6].
Past efforts to control CR through conventional means have been largely unsuccessful and also affected by unpredictable weather, and the use of fungicides leads to environmental effects and increased production costs [7]. Host-plant resistance has been identified as the most reliable and economically viable option among several available options to alleviate plant diseases [7,8]. In the case of CR, researchers have identified both qualitative and quantitative resistance [9,10]. Resistance through R genes has been identified more than 25 dominant race-specific (Rp) genes in chromosomes (chr) 3, 4, and 10 [9,11], which mediate the recognition of the pathogen and trigger a hypersensitive reaction to prevent further spread of the pathogen [12]. However, novel P. sorghi races can overcome the qualitative resistance in some genotypes and this requires a continuous search for sources of stable and durable resistance (quantitative) in order to manage the disease. Identifying resistant germplasm to CR and incorporation of resistant genes or genomic regions to elite lines and commercial hybrids has the potential to increase yields with lower production costs [13].
Genetic mapping through linkage analyses and GWAS have been used in many studies in plant breeding [12,[14][15][16][17][18][19][20][21][22][23][24][25][26]. The two approaches exploit the recombination's ability to break up the genome into fragments that can be correlated with phenotypic variation but differ with the type of control they have on the recombination [27]. Linkage mapping in plants utilizes biparental crosses and, thus, is a closed controlled system. This further limits the number of recombinations that can sufficiently shuffle the genome into small fragments and results into QTLs localized in large chromosomal regions [28]. On the other hand, GWAS uses natural populations that mimic historical recombinations and provide a higher resolution as compared to linkage mapping [25]. This approach, however, has no control over relatedness and is prone to spurious associations. Accounting for population structure and kinship relatedness in a mixed model has been the most effective method of reducing false associations in GWAS [29]. Quantitative resistance for CR, a more durable approach, has also been exploited in a few studies through quantitative trait loci (QTL) analysis and genome-wide association studies (GWAS). A total of 41 QTLs have been identified conferring resistance to CR across all maize chromosomes in four separate studies utilizing different maize germplasm [7,10,13,30]. One study by Olukolu et al. [12], carried out GWAS and identified three marker trait associations (MTAs) in chr 2, 3, and 8 using 274 diverse inbred lines and 246,497 SNPs. Another study combined linkage and association mapping on 296 tropical maize inbreds and identified 25 QTLs on chr 1, 3, 5, 6, 8, and 10 associated with CR resistance [31]. These markers and candidate genes were directly or indirectly involved with plant disease responses [12,31].
In cases where the phenotype is strongly correlated with relatedness, population mapping even with the use of mixed models can be severely underpowered [27]. Joint linkage association mapping (JLAM) has the potential to overcome the downsides of the both linkage mapping and association mapping approaches whereby one compliments the other. Association mapping will increase the mapping resolution power while linkage mapping will account for relatedness in cases where Q + K explains most of the phenotypic variance. To our knowledge, no study has been reported utilizing JLAM to identify QTLs conferring resistance to CR in maize.
Another promising genomic tool is the genomic prediction (GP) which has been applied successfully in many plant breeding studies [32,33]. Previous reports have indicated the potential of GP to increase genetic gain and reduce the time taken in breeding programs significantly [33]. GP utilizes genome-wide markers to estimate the effects of all loci and computes a genomic estimated breeding value (GEBVs) [34]. Unlike genetic mapping, GP does not identify significant MTAs first but goes ahead to use all markers available to estimate their effects thus providing a powerful approach to account for any effects that might have been missed by genetic mapping. However, this doesn't mean complete withdrawal of genetic mapping but rather the incorporation of the two in genetic studies as complimentary approaches since each provide significant advantages. The main objectives of this study were to: (1) assess the phenotypic variation for CR by evaluating five segregating populations and an association panel in multiple environments; (2) detect stable and the novel genetic loci significantly associated with CR resistance by using QTL mapping, JLAM, and GWAS; and (3) GP within populations and Improved maize for African Soil (IMAS) panel and among the IMAS and F3 populations in order to understand the genetic architecture of maize CR resistance.

Results
Comparable disease pressure was observed in each environment as indicated by significant genotypic variance at each environment for each population (Table S1). Further, significant (p < 0.05) Pearson correlations were also observed among phenotypic values determined at different environments for each population (Table S2). This suggested that there was enough CR disease pressure in each environment. Cross-environment analyses revealed normal distributions of CR disease severity (score 1-9) in the F 3 populations and IMAS association panel, ranging from high resistance to moderate susceptibility with scores ranging from 2.0 to 6.0 and with average means ranging from 3.4 to 4.1 for different populations (Table 1 and Figure 1). Analysis of variance for the biparental populations and the association panel showed significant genotypic and genotype x environment (GXE) variances except for GXE of Pop1 and Pop4 (Table 1). Heritability (h 2 ) estimates were moderate with range of 0.37-0.45 for the individual F3 populations, 0.45 across the F3 populations and 0.65 for the IMAS panel. An IMAS association mapping panel was earlier used to study the maize lethal necrosis resistance [20], where the population structure was reported in detail. In the IMAS panel, a rapid linkage disequilibrium (LD) decay across physical distance in kb was reported (Figure 2). At LD cut-off points of r 2 = 0.1, the average physical distance was 14.97 kilo base pairs. GWAS was performed using a mixed linear model by integrating population structure (PCA) and family relatedness (kinship) within the IMAS panel using 337,110 high quality SNPs. GWAS for CR identified 14 significant marker-trait associations (MTA, significant threshold p < 9 × 10 −6 ). These SNPs were found across all chromosomes except on chr 6, 7, and 8 (Table 2 and Figure 3). The distribution of SNPs across chromosomes and their level of significance for CR are shown in a Manhattan plot ( Figure 3A). To test the ability of the model used, a quantile-quantile plot of the observed-log p-value vs. the expected-log p-value was plotted. As shown in Figure 3B, the population structure was controlled well by the mixed linear model ( Figure 3). In terms of the percentage of phenotypic variance explained (PVE), the SNPs identified from GWAS individually explained 6-10% (Table 2). SNP S5_51353429 had the largest significant MTA while S3_21856582 explained the most phenotypic variance. Candidate genes were selected around the significant SNPs and identify the putative function of these genes. Six candidate genes were identified in the significant SNP sites or adjacent to these sites ( Table 2). We identified two candidate genes each on chromosomes 2, 5, and 10. well by the mixed linear model ( Figure 3). In terms of the percentage of phenotypic variance explained (PVE), the SNPs identified from GWAS individually explained 6-10% (Table 2). SNP S5_51353429 had the largest significant MTA while S3_21856582 explained the most phenotypic variance. Candidate genes were selected around the significant SNPs and identify the putative function of these genes. Six candidate genes were identified in the significant SNP sites or adjacent to these sites (Table 2). We identified two candidate genes each on chromosomes 2, 5, and 10.   well by the mixed linear model ( Figure 3). In terms of the percentage of phenotypic variance explained (PVE), the SNPs identified from GWAS individually explained 6-10% (Table 2). SNP S5_51353429 had the largest significant MTA while S3_21856582 explained the most phenotypic variance. Candidate genes were selected around the significant SNPs and identify the putative function of these genes. Six candidate genes were identified in the significant SNP sites or adjacent to these sites (Table 2). We identified two candidate genes each on chromosomes 2, 5, and 10.    Linkage map for each of the five populations was constructed. For each population, the number of progenies or families, markers, map lengths, and average genetic distances between the markers are presented in Table S3. Detection of QTL in the F3 populations revealed seven QTLs in pop1, nine QTL in pop2, four QTL in pop3, five QTL in pop4 and one QTL in pop5, distributed on all chromosomes except chromosome 10 ( Table 3). In pop1, QTLs individually explained 3-7% of phenotypic variance, in pop2, individual QTL explained 2-11% of total variation. In pop3, each QTL explained 5-9% of variation, whereas the range was 3-20% in pop4 and 17% in pop5. The total variance explained by each population ranged from 14 to 40%. Comparison of QTL detected among five populations revealed that QTL qCR1-78 was detected in both pop1 and pop5 on chromosome 1. Another QTL qCR3-151 detected in pop2 was within the confidence interval of the QTL qCR3-113 detected in pop1 (Table 3). QTL qCR9-117 detected in pop3 overlapped with QTL qCR9-118 observed in pop2.  Linkage map for each of the five populations was constructed. For each population, the number of progenies or families, markers, map lengths, and average genetic distances between the markers are presented in Table S3. Detection of QTL in the F 3 populations revealed seven QTLs in pop1, nine QTL in pop2, four QTL in pop3, five QTL in pop4 and one QTL in pop5, distributed on all chromosomes except chromosome 10 ( Table 3). In pop1, QTLs individually explained 3-7% of phenotypic variance, in pop2, individual QTL explained 2-11% of total variation. In pop3, each QTL explained 5-9% of variation, whereas the range was 3-20% in pop4 and 17% in pop5. The total variance explained by each population ranged from 14 to 40%. Comparison of QTL detected among five populations revealed that QTL qCR1-78 was detected in both pop1 and pop5 on chromosome 1. Another QTL qCR3-151 detected in pop2 was within the confidence interval of the QTL qCR3-113 detected in pop1 (Table 3). QTL qCR9-117 detected in pop3 overlapped with QTL qCR9-118 observed in pop2.
All five F 3 populations and IMAS panel was plotted by using first three principal components, which together contributed for 27.5% of the variation. PCA plot showed clear clustering of the five  (Table 4). These QTLs individually explained 0.3 to 3.1% and together explained 45% of the total phenotypic variance. Among the 18 QTL detected through JLAM, six QTLs, qCR1-78, qCR1-227, qCR3-172, qCR3-186, qCR4-171, and qCR7-137 were also detected in individual population-based linkage mapping ( Table 3). Out of these 18 QTLs, one QTL on chromosome 7 (qCR7-10) followed by QTL on chromosome 1 (qCR1-78) were strongly associated with CR disease severity in terms of p values. all chromosomes except on chromosome 8 (Table 4). These QTLs individually explained 0.3 to 3.1% and together explained 45% of the total phenotypic variance. Among the 18 QTL detected through JLAM, six QTLs, qCR1-78, qCR1-227, qCR3-172, qCR3-186, qCR4-171, and qCR7-137 were also detected in individual population-based linkage mapping ( Table 3). Out of these 18 QTLs, one QTL on chromosome 7 (qCR7-10) followed by QTL on chromosome 1 (qCR1-78) were strongly associated with CR disease severity in terms of p values.    We used the RR-BLUP model to predict the lines performance within each population. The prediction correlation was highest for pop4 (r = 0.51) followed by the IMAS panel (r = 0.46) and pop 2 (r = 0.46) and was low for pop3 with 0.19 ( Figure 5A). However, the prediction through combined analysis of the five F 3 populations reported high improvement in the prediction accuracy with 0.78 for CR ( Figure 5A). Prediction of biparental populations by using IMAS panel as training population reveals the correlations of 0.15, 0.05, −0.14, 0.22 and 0.07 for pop1, pop2, pop3, pop4, and pop5, respectively ( Figure 5B).

Discussion
Over the past years, CR epidemics have mainly resulted from high levels of susceptibility among commercial maize hybrids [3]. Past efforts to exploit genetic resistance for CR have largely been through R genes [9,11]. A major drawback with genetic resistance through R genes is that it is overcome with time [12]. Quantitative resistance is by far a better option compared to single gene based resistance; however, few studies have been carried out on quantitative resistance to CR. This study therefore aimed to detect quantitative resistance to CR through GWAS by using IMAS association mapping panel, and QTL detection, JLAM, and GP using five F3 populations.
Phenotypic analyses revealed normal distributions in all populations varied from highly resistance to moderate susceptibility, therefore indicating polygenic resistance in the materials used. Genotype and GxE interaction variances were significant in most populations which showed their importance in CR resistance. Heritability estimates were moderate in F3 populations. High

Discussion
Over the past years, CR epidemics have mainly resulted from high levels of susceptibility among commercial maize hybrids [3]. Past efforts to exploit genetic resistance for CR have largely been through R genes [9,11]. A major drawback with genetic resistance through R genes is that it is overcome with time [12]. Quantitative resistance is by far a better option compared to single gene based resistance; however, few studies have been carried out on quantitative resistance to CR. This study therefore aimed to detect quantitative resistance to CR through GWAS by using IMAS association mapping panel, and QTL detection, JLAM, and GP using five F 3 populations.
Phenotypic analyses revealed normal distributions in all populations varied from highly resistance to moderate susceptibility, therefore indicating polygenic resistance in the materials used. Genotype and GxE interaction variances were significant in most populations which showed their importance in CR resistance. Heritability estimates were moderate in F 3 populations. High heritability observed in the IMAS association panel may be attributed to the large genetic diversity of the germplasm used.
Previous studies on the population structure of the IMAS association panel used in the present GWAS showed confounding structure in the panel posing a need to account for it [18,35]. The quantile-quantile plot of expected vs. observed log p values indicated that the mixed model used in this study effectively accounted for the population structure and kinship matrix. Ideally, the p values are expected to follow the diagonal plot of the expected vs. observed p values assuming no marker trait associations. In real datasets, the markers are expected to deviate to the left side of the plot as a result of true marker trait associations, however, strong deviations to the left indicate inflated false positives while to the right indicate deflated false negatives [25].
Association mapping resolution also strongly depends on the LD of the particular population under study since it exploits the historical recombinations that occur in natural populations [36]. Rapid LD decay as shown in the IMAS population shows the significant diversity in the panel and its suitability for GWAS. The small mapping distance of 50kb and rapid decay at r 2 = 0.1 enable the detection of marker trait associations with small effects that might have otherwise been missed with other mapping approaches.
In the GWAS, 14 significant SNPs were associated with P. sorghi resistance across three environments (Table 2, Figure 3), of which some overlapped with previously reported QTL intervals [12,31,37]. Specifically, SNP S3_21856582 detected in this study is also reported earlier [12] and interestingly it is also overlapped with two biparental based QTL studies [10,13] which suggests the possibility of potential candidate gene/marker for CR resistance across genetic backgrounds and environments. Another two SNPs on chromosome 5, S5_10087070 and S5_10089138 are found close to SNP S5_10,055,423 found in the previous association mapping study in tropical maize germplasm [31]. Other SNPs detected in this study seems to be novel and add for the new source of resistance for CR.
The candidate genes in the present study were identified as encoding transcription factors (TFs), ATP binding and intracellular signaling. A putative candidate gene GRMZM2G181030 has been linked with the MYB superfamily which is one of the three largest TF families in maize [38]. The important roles of MYB superfamily are in developmental processes and defense responses in plants [39]. GRMZM2G322582 was encoded as ATP binding protein which play important roles in membrane transport, cellular motility and regulation of various metabolic processes [40]. GRMZM2G086484 was linked to the Pleckstrin homology (PH) domain superfamily protein which plays a role in recruiting proteins to different membranes, thus targeting them to appropriate cellular compartments or enabling them to interact with other components of the signal transduction pathways [41]. GRMZM2G181002 was annotated in protein kinase superfamily specifically the phosphotransferases, serine and threonine specific kinase superfamily. Serine/threonine kinase receptors play a role in the regulation of cell proliferation, programmed cell death (apoptosis), cell differentiation, and embryonic development [42]. Lastly, GRMZM2G009188 was encoded as 11-beta-hydroxysteroid dehydrogenase 1B whose role is in the regulation of energy metabolism and immune system by locally reactivating glucocorticoids predominantly by catalyzing the reduction of cortisone to cortisol in intact cells that also express hexose-6-phosphate dehydrogenase (H6PDH), which provides cofactor NADPH [43].
The linkage analysis within the individual bi-parental populations was also carried out revealing 23 QTLs associated with CR resistance in maize and were distributed on all chromosomes (  [7,10,12,13,31]. The present study's results corroborate these earlier studies, as most of the identified QTLs fell in those bins of linkage groups. Within the current study we also found several QTLs overlapping across populations with linkage mapping, across methods with linkage analyses, JLAM, and GWAS (Table 2, Table 3, Table 4). QTL qCR1-139 detected on pop4 was co-located with QTL detected in F 3 pop 3 and F 3 pop 5, which suggests one of possible potential candidate for CR resistance in chromosome 1 across different genetic back grounds.
Identification of genomic regions for CR resistance have previously been carried out in biparental-based linkage analysis [7,10,13,30] and natural population-based association mapping [12,31]. The current study consequently sought to exploit JLAM using multiple segregating populations in conjunction with QTL analysis and GWAS. GWAS and JLAM increased the resolution within the confidence intervals of QTL for CR resistance. qCR1-139, which explained >20% of the phenotypic variance, was a locus with major effect based on genetic linkage mapping, and was likely to consist of two QTL as revealed by GWAS and JLAM results. SNP S1_220067760 identified through GWAS and SNP S1_227241027 identified through JLAM fall within the confidence interval of QTL qCR1-139 (Tables 2-4). The significant marker S2_16361185 detected through GWAS and marker S2_20589802 (qCR2-20) detected through JLAM fall within the confidence interval of another QTL on chromosome 2, qCR2-16 detected in pop3, which helps to increase the precision of QTL position. Fractionation of single major QTL was also reported previously in maize [44]. Thus, the combination of linkage mapping, JLAM and GWAS approaches helped to further refine the CR resistance loci to an extent that it became possible to separate the effects of two co-segregating QTL (on chromosome 1 and 2). Thus, GWAS and JLAM can serve as a great complementary tool to identify and validate the molecular markers linked to QTL and to be used in applied breeding. Nevertheless, it is warranted to validate through fine mapping and gene cloning coupled with functional genomics studies, in order to clarify further on QTL intervals identified and refined in this study.
Since GS was demonstrated to be useful in plant breeding, there have been many studies that demonstrate the utility of GP in breeding for disease resistance in crops [26,[45][46][47][48][49]. Prediction correlations for within association panel and biparental populations is comparable to the other disease reported in maize [18,24,50]. As described in earlier study [51], the benefits of large populations or combined population predictions were higher compared to individual population-based predictions. This corroborates with the present study whereby combined population prediction was higher over individual or smaller population-based predictions ( Figure 5A).
The success of GP gains more attention when the training population is related but independent of testing or validating populations. In the current study, since the parental lines of F 3 populations are part of the IMAS association panel, we tried to use IMAS panel as an independent training population and predict the F 3 populations for their performance for CR resistance. We observed, for four out of five populations, prediction correlations of 0.05, 0.07, 0.15, and 0.22 (Figure 4), which is comparable to the prediction correlations observed for maize lethal necrosis [26] under similar testing environments. Overall, results from our study encourages to develop an independent but related diverse population as training population to predict CR resistance. This also supports the use of GEBVs as indirect selection criteria, at least to remove the lines with poor performance for CR resistance. Conducting separate field trials for CR screening in the breeding pipelines always takes additional resources. The model's predictive ability, with cross validation, showed moderate prediction accuracies for CR in pop1 and pop4 were comparable to the correlations observed under similar scenario of predicting biparental populations by using association mapping panel for maize lethal necrosis and maize chlorotic mottle virus resistance [26]. Overall, to have an effective independent training population, relatedness between training and validation population is critical as well as the trait architecture and heritability. Compared to abiotic stress related traits, traits like CR and many other maize diseases are relatively less complex and our results suggests that it is possible to have an independent training population with continuous improvement and used routinely in breeding program.
In conclusion, we used five F3 populations and one IMAS association mapping panel, together comprising 1300 lines, to unravel the genetic architecture of CR resistance. In this study we identified new QTLs as well as reconfirmed the QTLs reported in earlier studies for CR resistance in tropical and subtropical maize germplasm. Linkage mapping identified 26 QTLs with four major QTLs, which explained >10% of phenotypic variance. The detected QTLs were validated with GWAS, and several SNPs were found overlapping with the identified QTLs through either linkage mapping or JLAM. These genomic regions can serve as potential sources to improve resistance to CR. GP can be used in combined populations to predict the response of the germplasm to CR resistance. Having a common training population derived from intensively phenotyped and genotyped lines with diverse representation from a breeding program holds promise in breeding for CR resistance.
All the lines from IMAS panel and five F 3 populations were planted in a 4-m-long single row plots in an alpha lattice design with two replications in each location. Two seeds were planted per hill and thinned to a single plant per hill, three weeks after emergence to ensure uniform plant density. Standard agronomic practices were followed. The chosen locations were natural hotspots for foliar diseases including CR; good disease infection pressure across the trials at each location was observed. The IMAS panel, and F 3 populations were evaluated for their responses to CR in two to three environments (Tables S1 and S2). CR disease severity data was recorded after flowering and scored plot-wise on an ordinal scale of 1 (no rust, highly resistant, without disease symptoms) to 9 (highly susceptible, most severe). On the IMAS panel, in addition to CR severity scoring, data were also collected for anthesis date (AD). The populations used in this study, methods applied, and the different type of analyses used in this study are presented as a general analyses diagram ( Figure 6).
All the lines from IMAS panel and five F3 populations were planted in a 4-m-long single row plots in an alpha lattice design with two replications in each location. Two seeds were planted per hill and thinned to a single plant per hill, three weeks after emergence to ensure uniform plant density. Standard agronomic practices were followed. The chosen locations were natural hotspots for foliar diseases including CR; good disease infection pressure across the trials at each location was observed. The IMAS panel, and F3 populations were evaluated for their responses to CR in two to three environments (Table S1 and S2). CR disease severity data was recorded after flowering and scored plot-wise on an ordinal scale of 1 (no rust, highly resistant, without disease symptoms) to 9 (highly susceptible, most severe). On the IMAS panel, in addition to CR severity scoring, data were also collected for anthesis date (AD). The populations used in this study, methods applied, and the different type of analyses used in this study are presented as a general analyses diagram ( Figure 6).

Phenotypic and Genotypic Data Analysis
For IMAS panel and each F 3 populations, each location-year combinations were treated as an independent environment which resulted into three environments for IMAS panel, and F 3 pop2, whereas the number of environments were two for F 3 pop1, F 3 pop3, F 3 pop4, and F 3 pop5. Analysis of variance for individual and across environments was undertaken using the ASREML-R [54] for each bi-parental population and the IMAS panel. The following statistical model was used to estimate the variance components (Equation (1): where CR mnop is the phenotypic performance of the mth genotype at the nth environment in the oth replication of the pth incomplete block, µ is an intercept term, Genotype m is the genetic effect of the mth genotype, Env n is the effect of the nth environment, (Genotype x Env) mn is the interaction effect between genotype and environment, Rep(Env) on is the effect of the oth replication at the nth environment, Block(Rep.Env) pno is the effect of the pth incomplete block in the oth replication at the nth environment, and e mnop is the residual. The genotypic effect, genotype by environment interaction and effect of incomplete blocks were treated as random effects in order to estimate their variances and residual error variance. Environments and replications were treated as fixed effects. Assuming fixed genotypic effects, a mixed linear model was fitted to obtain the best linear unbiased estimates (BLUEs). Significance of variance components were tested by model comparison with likelihood ratio tests in which the halved p-values were used as an approximation. Broad-sense heritability (H 2 ) was calculated for all the traits using the following Equation (2): where σ 2 g is the genotype variance; σ 2 gl is the genotype × environment interaction variance; and σ 2 ε is the error variance, l represents number of environments and r for number of replications. META-R software [55] was used to obtain best linear unbiased prediction (BLUP) for each genotype across environments. Combined analyses of the five F 3 populations were carried out in META-R.
The F 3 populations lines as well as their parents, and the IMAS association panel inbred lines were genotyped with Genotyping by Sequencing (GBS). DNA of all lines was extracted from 3-4 leaves stage seedlings and genotyped using GBS platform at the Institute for Genomic Diversity, Cornell University, Ithaca, USA as per the procedure described in earlier studies [56]. The~955K GBS SNP datasets were filtered where a minor allele frequency of <0.05, heterozygosity of >5% and missing data rates >10% were excluded from further analysis in TASSEL ver 5.2 [57].

PCA and Linkage Disequilibrium
The principal components (PC) of the five F 3 populations and IMAS panel were estimated using TASSEL ver 5.2 and visualized using R software version 3.2.5 (https://www.r-project.org/) to obtain the explained variance of each PCs. Further, the LD of the IMAS panel was calculated using Tassel version 5.2 to obtain the rate of LD decay in the population. LD decay rate between each pair of SNPs was analyzed with the squared Pearson correlation coefficient (r 2 ). The rate of LD decay with physical distance was visualized and average pairwise distances at which LD decayed at r 2 = 0.1 and 0.2 were calculated in R software. The 'nlin' function in R was used to fit non-linear models into the genome-wide LD data by incorporating r 2 as responses (y axis) and pairwise distances (x axis) as predictors. The average estimator for LD decay was calculated at 'significance' threshold of r 2 = 0.1 and r 2 = 0.2 cutoff points in relation to distance [58] and a representative scatter plot was drawn as LD between adjacent markers versus chromosome distance (kb).

Genome-wide Association Analyses
The BLUP values obtained for CR were used in GWAS as phenotypes. The kinship matrix obtained with a centered identity by state (IBS) and the first five PCs which explained maximum variation were used to correct the population structure in a mixed linear model using TASSEL version 5.2. Genome wide scans for marker-trait associations were conducted with mixed linear model. The amount of phenotypic variation explained by the model was assessed using the R 2 statistics, calculated by fitting all significant SNPs simultaneously in a linear model in R. To determine the significance threshold, multiple testing correction was conducted where the total number of tests were estimated based on the average extent of LD at r 2 = 0.1 [19]. With respect to the above, significant associations were declared when p values in independent tests were less than 9 × 10 −6 . The 50 bp source sequences of the significantly associated SNPs were used to perform BLAST searches against the B73 RefGen v2 genome set in Maize GDB (http://www.maizegdb.org). The putative candidate genes identified in Maize GDB were within or adjacent to each associated SNP.

Detection of QTLs and Joint Linkage Association Mapping
The number of SNPs was further reduced by selecting homozygous and polymorphic markers between the parents in each population and were further filtered based on minimum distance between adjacent SNPs as ≥ 200 kilo base pairs (Kbps). For JLAM, markers from all five F 3 populations were combined, and markers with <1% missing value and >5% MAF and heterozygosity of <5% were retained. Finally, a set of 5000 SNPs that are uniformly distributed across the genome were used for JLAM analyses.
QTL IciMapping version 4.1 [59] was used to construct the linkage map based on data from all five biparental populations. QTL IciMapping was used to remove the highly correlated SNPs that do not provide any additional information by using an inbuilt tool BIN. This resulted in retention of 1130, 1047, 1099, 1122, and 1081 high-quality SNPs in pop1, pop2, pop3, pop4, and pop5, respectively (Table S3). BLUP values across environments for the F 3 populations were used in QTL detection analysis using inclusive composite interval mapping (ICIM). The probability in the stepwise regression was set at 0.01 and the scanning step was 1 cM. For determination of QTL significance, the threshold LOD score was set to >2.5 by using 1000 permutations and a p value ≤ 0.05. The phenotypic variation explained (PVE) by each QTL and across all QTLs for each trait was estimated. The origin of the favorable allele for CR resistance was identified based on the sign of the additive effects of each QTL.
A set of 5000 high-quality GBS SNPs derived from the genotyped F 3 populations were used in JLAM analyses. BLUPs calculated across populations and environments were used as phenotypes. To best carry out association mapping in multiple biparental populations, a two-step biometric model which incorporates population effect, cofactors and a marker effect across populations was used to detect QTL [60,61]. This model was explained in detail by Liu et al. [61] and Würschum et al. [60]. The first step involved selection of cofactors based on Schwarz Bayesian Criterion (SBC) [62] by including a population effect and cofactors carried out using PROC GLM SELECT implemented in the statistical software SAS 9.4 [63]. In the second step, p values for the F-test were calculated using a full model (including SNP effect) versus a reduced model (without SNP effect). Genome-wide scans for QTLs were implemented in R version 3.2.5 (R Development Core Team, Vienna, Austria; 2015). The Bonferroni-Holm procedure [64] was used to detect markers with significant (p < 0.05) main effects and was controlled for multiple testing. The total proportion of PVE by the detected QTLs was calculated by fitting all significant SNPs simultaneously in a linear model to obtain an adjusted R2 [65].

Genomic Prediction
GP was carried out with ridge regression BLUP [66] within and across the five F 3 populations as well as in the IMAS panel for CR resistance at five-fold cross-validation. BLUEs across environments for each of the biparental populations and across five F3 populations were used for the analysis.
For all biparental populations and IMAS panel, same set of high-quality uniformly distributed 5000 SNPs with no missing values and MAF > 0.05 were used. Prediction scenarios used for GP include: 'within population' where training and validation sets are derived from within individual biparental population and IMAS panel; a 'combined population' prediction approach where data from five F 3 populations were combined and sampled randomly to form validation and training set; and 'across population' prediction in which IMAS association panel was used as a training set and each F 3 population as a validation population. For each approach, 100 iterations were done for sampling of the training and validation sets.