Mapping Fusiform Rust Resistance Genes within a Complex Mating Design of Loblolly Pine

Fusiform rust resistance can involve gene-for-gene interactions where resistance (Fr) genes in the host interact with corresponding avirulence genes in the pathogen, Cronartium quercuum f.sp. fusiforme (Cqf). Here, we identify trees with Fr genes in a loblolly pine population derived from a complex mating design challenged with two Cqf inocula (one gall and 10 gall mixtures). We used single nucleotide polymorphism (SNP) genotypes at sufficient density to ensure linkage between segregating markers and Fr genes. Using BayesC, we identified SNPs that explained high proportions of variance in disease incidence and that were also significant using Bayesian Association with Missing Data (BAMD) software. Two SNPs mapped near Fr1 and generated significant LOD scores in single marker regression analyses for Fr1/fr1 parent 17 as well as four other parents. One SNP mapped near Fr8 and was significant for parent 28. Two SNPs mapped to linkage groups not previously shown to contain Fr genes and were significant for three parents. Parent 2 showed evidence of Fr gene stacking. Our results suggest that it is feasible to identify trees segregating for Fr genes, and to map Fr genes, based on parental analysis of SNPs that cosegregate with disease incidence in designed resistance screening trials.


Introduction
Fusiform rust is one of the most important pine diseases in the southeastern United States [1,2]. It is caused by the fungus Cronartium quercuum (Berk.) Miyabe ex Shirai f.sp. fusiforme (Cqf), which alternates between oak and pine host species [1]. In pine, this fungus causes galls in stems and branches, reducing growth, reducing wood quality and making trees susceptible to breakage in windstorms, thereby generating significant economic losses [2].
Genetic resistance to fusiform rust can involve major genes in the pine host [3][4][5][6]. Resistant host genotypes carry one or more fusiform rust resistance (Fr) genes that interact with corresponding avirulence (Avr) genes in the pathogen. These are allele-specific interactions between host and pathogen: if the host is homozygous recessive (fr/fr) for an Fr gene, or if the pathogen carries an allele for virulence (avr) that can overcome the host Fr gene, the result in both cases is a diseased host. When the host carries a dominant allele (Fr) for resistance, and the pathogen is avirulent (Avr) to that gene, host resistance is expressed.
Given the difficulty of unraveling specific genetic interactions in populations with multiple Fr genes (at varying but unknown frequency) and multiple Avr genes (at varying but unknown frequency), mapping Fr genes has been best accomplished in families that segregate for single Fr genes after being challenged with genetically defined, single-spore-derived inoculum [7,8]. A useful approach has been to separate galled and non-galled progeny of a suspected heterozygous Fr/fr tree, and screen the two pools with many hundreds of DNA markers using bulk segregant analysis. When the inoculum is avirulent to the Fr gene, markers linked to the Fr gene co-segregate with the resistance phenotype. This strategy was successfully applied to identify the first fusiform rust resistance locus in loblolly pine, Fr1, in progeny derived from a heterozygous (Fr1/fr1) genotype [6,8]. The DNA marker J7_470 cosegregated with resistance in seedlings inoculated with a Cqf single-spore isolate that was homozygous for the corresponding Avr1 gene, and thus the marker was linked to Fr1. This finding was subsequently validated using the same parent [6] in the clonally-propagated CCLONES (Comparing Clonal Lines ON Experimental Sites) population [9] and in a slash × loblolly hybrid family [10].
Other Fr genes (numbered Fr2 through Fr9) have since been discovered and mapped to single loci in the loblolly pine genome. These studies typically used the same strategy of bulk segregant analysis to identify linked markers in progeny of trees with Fr genes that had been challenged by single-spore cultures with distinct avirulence specificities [3,8]. When Fr genes are detected in two different parents at the same locus (i.e., show linkage to the same DNA marker(s)) but show distinct interactions with inocula, they are assumed to be part of a cluster of resistance genes. Resistance gene clusters (e.g., complex resistance loci) contain two or more resistance genes as defined by their distinct avirulence specificities [8]. These gene clusters have been characterized for resistance genes in several different plant species in which the host has coevolved with the pathogen [11][12][13][14].
Given the availability of a large number of single nucleotide polymorphism (SNP) markers for the loblolly pine genome [15], we hypothesized that associations could be detected based on SNP marker linkage to Fr genes. The CCLONES population was already genotyped for SNP markers on each chromosome, so that bulk segregant analysis need not be performed on pools of galled and non-galled progeny to identify Fr-linked markers. Instead, phenotypes and genotypes can be associated to identify SNPs that explain variance in (i.e., co-segregate with) host resistance to the inoculum. We reasoned that the presence of parents 17 (known Fr1/fr1) and 32 (known Fr8/fr8) in the CCLONES population [16], coupled with the reported avirulence of both one gall and 10 gall inocula to Fr1 [9], would create useful internal checks for validating the approach, and for the possible discovery of additional Fr genes within the population.
To identify SNPs associated with genetic resistance, we used a statistical method called BayesC that is applied in genomic selection experiments [17] to identify those SNPs with the largest effects on disease phenotypes [18]. We then used the association software BAMD (Bayesian Association with Missing Data), which performs multiple imputation to account for missing data, provides a simultaneous solution for all markers analyzed, and generates a confidence interval around each SNP's effect [19]. We detected linkage to Fr genes that were previously mapped to single loci using RAPD markers, which were subsequently localized on the genetic map of loblolly pine [20]. We also detected linkage to at least one Fr gene in seven other parents, and found evidence of Fr gene stacking in one parent. The approach described here should prove useful for identifying parents that harbor Fr genes, and in tracking their transmission in southern pine breeding populations.
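The gene-for-gene logic described above can be sketched as a small truth table for a single Fr locus. This is an illustrative sketch only: the function name, the string encoding of host genotypes, and the boolean flag for inoculum avirulence are assumptions made for the example, not part of the study.

```python
def host_outcome(host_genotype: str, inoculum_avirulent: bool) -> str:
    """Return 'resistant' or 'diseased' for one Fr locus.

    host_genotype: 'Fr/Fr', 'Fr/fr' or 'fr/fr' (hypothetical encoding).
    inoculum_avirulent: True if the inoculum is avirulent (Avr) to this Fr gene.
    """
    alleles = host_genotype.split("/")
    has_dominant_allele = "Fr" in alleles
    # Resistance is expressed only when a dominant Fr allele meets an
    # avirulent pathogen; every other combination gives a diseased host.
    if has_dominant_allele and inoculum_avirulent:
        return "resistant"
    return "diseased"


if __name__ == "__main__":
    for genotype in ("Fr/fr", "fr/fr"):
        for avirulent in (True, False):
            print(genotype, avirulent, host_outcome(genotype, avirulent))
```

Only one of the four host-by-pathogen combinations yields resistance, which is why a marker linked to an Fr gene is expected to cosegregate with the phenotype only when the inoculum is avirulent to that gene.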

Plant Material and Disease Phenotyping
Phenotypic data from the 69 full-sib families of CCLONES were obtained from a previous study [16], where inocula were derived from aeciospores from a single gall (one gall inoculum), as well as from a mixture of aeciospores collected from 10 different galls (10 gall inoculum). The data consisted of results obtained six months after inoculation for both tests. Gall score was recorded at the ramet level: 1 for ramets with at least one gall or 0 for ramets with no galls, and then the gall score was calculated for each genotype as the proportion of galled ramets. Gall length (mm) was measured as a continuous trait on all galled ramets [16]. The initial study [16] included misclassified families, which were discovered after genotypic data became available. Therefore, a DNA marker-corrected pedigree structure of CCLONES was utilized for the current study [21].
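The genotype-level gall score described above is a simple proportion, which the following sketch illustrates. The record layout (pairs of clone identifier and a 0/1 ramet score) and the function name are assumptions for the example; the clone identifiers are placeholders.

```python
from collections import defaultdict


def genotype_gall_scores(ramet_records):
    """Return {clone_id: proportion of galled ramets}.

    ramet_records: iterable of (clone_id, has_gall) pairs, where has_gall
    is 1 for a ramet with at least one gall and 0 for a ramet with none.
    """
    counts = defaultdict(lambda: [0, 0])  # clone_id -> [galled, total]
    for clone_id, has_gall in ramet_records:
        counts[clone_id][0] += int(has_gall)
        counts[clone_id][1] += 1
    return {clone: galled / total for clone, (galled, total) in counts.items()}


if __name__ == "__main__":
    # Placeholder records: clone "c1" has 3 of 4 ramets galled, "c2" has none.
    records = [("c1", 1), ("c1", 0), ("c1", 1), ("c1", 1), ("c2", 0), ("c2", 0)]
    print(genotype_gall_scores(records))
```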

Identification of Markers Using BayesC
Genotypic data for 4853 polymorphic SNPs were obtained using the Illumina Infinium platform (Illumina, San Diego, CA) [22]. Genotypic data were available for 803 of the 1360 clones inoculated with the 10 gall inoculum and for 467 of the 698 clones inoculated with the one gall inoculum.
The hierarchical Bayesian model BayesC [17] was used to identify SNPs with large effects, since it is computationally less intensive than the association testing platform BAMD. As a consequence, we used BayesC to initially detect a subset of potentially significant SNPs, which we then evaluated in BAMD. For each trait, deregressed breeding values [23] were regressed on all markers simultaneously with BayesC and an estimate of the effect of each marker was obtained. For gall score, breeding values were obtained using the ramet incidence (estimating clonal breeding values using individual ramet gall score data). For gall length data, breeding values were obtained directly from the gall length measurements only from the subset of trees with galls. In each case, markers explaining greater than 0.2% of the phenotypic variance (i.e., the deregressed breeding values) were selected for association testing.
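The marker pre-selection step can be illustrated with a short filter. This sketch assumes the per-marker proportions of variance explained are already available from a BayesC run as fractions (0.002 corresponds to 0.2%); the function names and SNP identifiers are hypothetical, and the numbers are placeholders rather than results from the study.

```python
def select_snps(variance_explained, threshold=0.002):
    """Return (snp_id, fraction) pairs above the threshold, largest first.

    variance_explained: {snp_id: fraction of variance explained}.
    threshold: 0.002 corresponds to the 0.2% cut-off.
    """
    selected = [
        (snp, frac) for snp, frac in variance_explained.items() if frac > threshold
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


def top_fraction(selected, n=5):
    """Sum the variance fractions of the n highest-ranked selected SNPs."""
    return sum(frac for _, frac in selected[:n])


if __name__ == "__main__":
    # Placeholder values, not data from the study.
    example = {"snp_a": 0.081, "snp_b": 0.0015, "snp_c": 0.004, "snp_d": 0.002}
    chosen = select_snps(example)
    print(chosen, top_fraction(chosen))
```

The comparison is strict, so a marker explaining exactly 0.2% is not selected, matching the "greater than 0.2%" wording.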

Association Testing Using BAMD
Association analyses were performed using BAMD (Bayesian Association with Missing Data), which finds simultaneous solutions for SNP effects, performs multiple imputation of missing SNP data and generates a confidence interval from the posterior distribution [19]. Significant associations were detected using the following linear mixed model:

Y = Xβ + Zγ + ε (1)

where Y is the vector of deregressed breeding values [23] for the trait, X is the matrix for population structure effects, β is the coefficient for population structure effects, Z is the matrix for SNP effects, γ is the coefficient for SNP additive effects with a single variance and ε is the residual, with ε ~ N(0, Iσ_ε²). No specific population structure groups were identified using Structure software, version 2.3.4 [24,25], so the X matrix represented a single population group. The matrix I is an identity matrix. Significant SNPs were obtained based on the 90%, 95%, 97.5%, 99% and 99.9% confidence intervals from the last 50,000 of a total of 100,000 iterations of the Gibbs sampler in BAMD.

Single Marker Regression Analyses and Mapping of Significant SNPs
Single marker regression was performed on the significant SNPs from the one gall and 10 gall score datasets. This was done to determine the likelihood of each marker being linked to a gene causing the phenotype. We used SAS software [26] to fit the following linear model:

Y = µ + Xr + Zb + ε (2)

where Y is the vector of observations for the trait, µ is the population mean, r is the vector for fixed effect replication (1-5 reps), b is the vector for fixed effect SNP genotype (AA, AB, BB), ε is the vector of random residual effects, and X and Z are incidence matrices. A reduced model was then fit in which the SNP genotype effect was omitted from the equation, and a likelihood ratio test was performed.

LR = −2ln(reduced model) + 2ln(full model) (3)
Logarithm of the odds (LOD) scores were obtained from likelihood ratios by dividing LR by 4.61 (i.e., 2 ln 10) to convert from natural to base-10 logarithms. LOD scores of 3 or higher were noted, and LOD scores of 4 or higher were used in genetic interpretations.
We conducted all possible single-marker regression analyses by parent for significant SNPs detected by both BayesC and BAMD. SNPs that were significantly associated with gall score were examined for their potential inclusion in genetic linkage maps relative to existing markers, which included RAPD markers previously recognized as linked to Fr loci [8,20,22] (Munoz P. and Peter G. unpublished data [27]), with our reference being the standard linkage group identifiers for loblolly pine [20]. Significant SNPs were recorded for all parents in the CCLONES population, and genetic inferences regarding Fr gene segregation were made based on the assumptions that: (1) parents segregating for an Fr gene would show a significant association with a marker linked to that Fr gene, and (2) the significant relationship would only be detected when the inoculum is avirulent to that Fr gene. These assumptions were applied on a parent-by-parent basis given prior knowledge that parent 17 is Fr1/fr1 [6] and parent 32 is Fr8/fr8 (selection D in Amerson et al. [8]). We inferred that significant SNPs were associated with resistance when they cosegregated with an Fr gene (LOD > 4), and that SNPs would cosegregate with resistance only in cases where the inoculum was avirulent to the corresponding Fr gene. When a previously untested parent appeared to segregate for an Fr gene, we dropped out families (if any) shared with other parents that were either known to have an Fr gene, or inferred to have an Fr gene based on the parent-wise analysis. If the LOD score dropped below 4, we inferred that the previously untested parent did not have an Fr gene, but rather that the association was due to the parent known to have an Fr gene.
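The likelihood ratio test and LOD conversion can be sketched as follows. This is a minimal illustration, not the SAS analysis used in the study: it assumes ordinary least squares with Gaussian residuals, fixed effects for replication and SNP genotype, and the standard relationship LOD = LR / (2 ln 10), which is approximately LR / 4.61. The function names and the toy data are hypothetical.

```python
import math

import numpy as np


def dummies(labels):
    """Indicator columns for every level of a factor except the first."""
    levels = sorted(set(labels))
    return np.array(
        [[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]
    )


def gaussian_loglik(y, design):
    """Maximised Gaussian log-likelihood of a least squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    n = len(y)
    sigma2 = float(residuals @ residuals) / n  # ML estimate of residual variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def lr_to_lod(lr):
    """Convert a likelihood ratio statistic (natural log scale) to a LOD score."""
    return lr / (2.0 * math.log(10.0))


def lod_for_snp(y, rep, genotype):
    """LOD score comparing the model with and without the SNP genotype effect."""
    y = np.asarray(y, dtype=float)
    intercept = np.ones((len(y), 1))
    reduced = np.hstack([intercept, dummies(rep)])
    full = np.hstack([reduced, dummies(genotype)])
    lr = 2.0 * (gaussian_loglik(y, full) - gaussian_loglik(y, reduced))
    return lr_to_lod(lr)


if __name__ == "__main__":
    # Toy data: gall scores differ sharply between the two genotype classes.
    y = [0.9, 0.8, 1.0, 0.7, 0.2, 0.1, 0.3, 0.0]
    rep = [1, 2, 1, 2, 1, 2, 1, 2]
    genotype = ["AA"] * 4 + ["AB"] * 4
    print(round(lod_for_snp(y, rep, genotype), 2))
```

With this convention, a LOD threshold of 4 corresponds to a likelihood ratio statistic of about 18.4.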

SNP Effect Size and Number Differs for Gall Score and Gall Length
The SNPs detected using BayesC that explained the greatest proportion of variance (over 0.2%) in gall score and gall length were normalized, superimposed, and the results for the 10 gall inoculum are presented in Figure 1. A comparatively coarser profile of SNP effects is observed for gall score as compared to gall length. Individual SNPs explained a greater proportion of the variance for gall score than for gall length, as the top five SNPs for gall score accounted for 29.9% of the variance, whereas the top five SNPs for gall length accounted for only 4.3%. In addition, a greater number of SNPs accounted for >0.2% of variance for gall score (N = 50) than for gall length (N = 16). The same trends were observed for the one gall inoculum, where SNP effects for gall score were larger (8.8% for the top five) and greater in number (N = 29) than for gall length (3.1% for the top five, N = 15; Figures S1 and S2). Taken together, more SNPs with major effects were detected for gall score than for gall length, which suggested that major effect SNPs for gall score warranted closer examination.

SNPs Differ in One Gall and 10 Gall Experiments
Considering all SNPs identified by BayesC with effects >0.2% on gall score (Figure S1), eight SNPs were common to the one gall and 10 gall experiments, with their ranking being similar. Stated a different way, there was a total of 71 SNPs that interacted uniquely with either the one gall or 10 gall inocula, and eight SNPs that interacted similarly with both inocula.
SNPs selected using BayesC were analyzed in BAMD, and the results from both methods are presented in Figure S1. BAMD assigned a nonzero mean effect on gall score with a confidence interval of 90% or greater for a total of 16 SNPs (out of a total 29) for the one gall inoculum and 28 (out of a total 50) for the 10 gall inoculum. Because the SNPs declared by BAMD as significant represented a subset of the total number of SNPs detected by BayesC, we focused our attention on those SNPs that were detected by both methods. SNPs for both gall score and gall length identified by both BayesC and BAMD are tallied in Table 1.
Since some of the significant SNPs presented in Table 1 were common to the two experiments, they are further compared in Figure 2. Of the 40 SNPs identified with high confidence (>90%) in one gall and 10 gall score experiments, four were shared. Collectively, these results imply there are some major SNP effects that explain host responses to both inocula. This would be expected when both inocula are predominantly avirulent to a specific Fr gene that is segregating in the population, e.g., SNP #4 (Figure 1). However, most SNPs are unique to the one gall or 10 gall inocula, presumably reflecting differential interactions of Fr genes avirulent to only one inoculum. All significant SNPs were examined for potential linkage to one or more rust resistance genes. Of the total of 21 significant SNPs that were detected in the 10 gall inoculum test, four were also significant for the one gall inoculum at varying levels of significance, while 12 were only significant for the one gall inoculum.
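The overlap counts behind the Venn comparison follow from inclusion-exclusion. The sketch below uses the BAMD counts for gall score reported in the text (16 significant SNPs for the one gall inoculum, 28 for the 10 gall inoculum, four shared); the function name is an assumption made for the example.

```python
def venn_counts(n_first, n_second, n_shared):
    """Return (only_first, only_second, union) for two overlapping sets."""
    only_first = n_first - n_shared
    only_second = n_second - n_shared
    union = only_first + only_second + n_shared
    return only_first, only_second, union


if __name__ == "__main__":
    only_one_gall, only_ten_gall, total = venn_counts(16, 28, 4)
    print(only_one_gall, only_ten_gall, total)  # prints: 12 24 40
```

Under these counts, 12 SNPs are significant only for the one gall inoculum and the union of both sets contains the 40 SNPs mentioned above.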

Significant SNP Markers are Associated with Resistance
Significant SNPs were regressed on phenotypes (i.e., gall score de-regressed breeding values) across the entire CCLONES population to determine the magnitude of the association, and by parent to identify those parents that may be segregating for resistance (Figure 3). In the population analysis (Figure 3A), the SNP genotype classes were typically represented by two major classes, with one of the homozygous classes in some cases represented by a small number of individuals, although that was not the case for all significant SNPs (Tables S2 and S3). Associations between gall score and the SNP marker are detected as significant mean differences between the major homozygous class and the heterozygous class (Figure 3A). When single-marker regression was performed by parent, most parents showed non-significant relationships between the SNP class and gall score among their progeny (Figure 3B). However, significant relationships (LOD score > 4) were detected for subsets of parents. Cases were found in which the significant relationship between a marker and gall score for a parent was observed for one gall inoculum only, or for 10 gall inoculum only, or for both inocula, presumably reflecting differences in virulence of the inocula to specific Fr genes.
We tabulated the results from single marker regression analyses into a summary table (Table S1) for all parents, including the families for which they were either the seed or pollen donor, the inoculum source (one gall or 10 gall) that appears avirulent to the corresponding Fr gene, the LOD scores for significant SNPs, the linkage group to which the significant SNP has been assigned, and our genetic interpretation of the results.
We detected a few cases in which a significant LOD score was inconsistent with what was already known about a parent. Most notably, parents 18 and 19 were known test cross parents for Fr1 (i.e., they are both fr1/fr1); however, both parents showed significant LOD scores for an Fr gene linked to the Fr1 locus. The most likely explanation for this inconsistency relates to the circular mating design within the CCLONES population. The parental analyses are not entirely independent, so parents adjacent to one another in the mating scheme tend to share families, e.g., family 400 (parent 17 × parent 18) and family 401 (parent 17 × parent 19). We eliminated shared families, repeated regression analyses for the relevant SNPs, and filtered the results in Table S1 to include only those parents for which we could confidently assign Fr genes to the correct parents.
The results from parents heterozygous for Fr genes, and for other parents that shared the same significant associations, are presented in Table 2. Two parents (parents 17 and 32) are known heterozygotes for Fr genes (Fr1/fr1 and Fr8/fr8, respectively; selections 10-5 and D, in Amerson et al. [8]), while one parent (parent 23) is heterozygous for a potentially new Fr gene (tentatively defined as Fr10) mapping to linkage group 11, which distinguishes it from previously recognized Fr genes 1-9 [8]. Along with parent 17, four additional parents (parents 2, 12, 20 and 22) showed significant associations with Fr1-linked SNPs. Parent 32 (Fr8/fr8) showed no significant associations in the SNP marker analyses, while parent 28 showed significant associations with a SNP marker linked to a previously identified marker linked to Fr8 [8]. Finally, SNPs associated with gall score in parent 23 (Fr10) were also associated in parents 2 and 4. Parent 2 showed evidence of Fr gene stacking, with Fr genes within the Fr1 and Fr10 complex loci. We eliminated parents 18 and 19 from Table 2 based on their known fr1/fr1 genotype, which we confirmed by finding non-significant LOD scores for parents 18 and 19 when families shared with parents 17 and 20 were excluded from the regression analysis.

Discussion
Rapid progress has been made in genetic development of fusiform rust resistance with a few generations of selection [3,28,29]. Given this rapid progress, it seems reasonable to conclude that much of the improvement has been due to selection for major gene resistance [3,30,31,32]. Genomic markers now enable detection of parents that harbor specific Fr genes, so that their progeny deployed in plantation forests can be monitored for durability under field conditions. As a step toward the goal of identifying Fr genes in breeding populations, we used two statistical approaches, BayesC and BAMD, to detect segregating Fr genes in a population generated by a complex mating design. BayesC is computationally rapid, and appears to efficiently identify SNPs associated with oligogenic traits [17,18]. BAMD is computationally more intensive as it performs formal multiple imputation for missing SNPs, and generates a Bayesian confidence interval to assess the quality of associations. By using these two methods in tandem, we identified SNPs that were highly significant for rust resistance. It was not our intention to formally compare and contrast BayesC and BAMD; rather, we wanted to screen for SNPs that mapped near Fr loci. We assume that higher LOD scores for an association represent closer linkage of the SNP marker to the Fr gene, coupled with a small proportion of misclassified genotypes, for example due to a low level of virulence in the respective inoculum to a particular Fr gene that is being mapped.

SNPs Mapped to Fr1 (LG2)
The avirulence of the one gall and 10 gall inocula to most Fr genes was not known prior to this study; however, an exception was Fr1, where we previously reported [9] that both one and 10 gall inocula were avirulent to Fr1 with rare exceptions. Therefore, we expected markers linked to Fr1 in the progeny of parent 17 would show significant associations in both one and 10 gall experiments. No predictions about avirulence to other Fr genes were made; rather, we made genetic inferences about avirulence based on significant associations that were detected.
We validated the overall computational approach by detecting significant SNPs (including SNP #4) that were linked to Fr1 in parent 17 (Fr1/fr1). In addition to detecting Fr1, there is at least one other Fr gene at the Fr1 locus. This Fr gene is in parents 2 and 22 and differs from Fr1 in that only the one gall inoculum is avirulent to it (i.e., in the 10 gall inoculum this Fr gene has been overcome). The available data do not allow us to identify this Fr gene in parents 2 and 22; it could be one previously identified that is linked to Fr1 (Fr3, Fr4, Fr6, Fr7, Fr9) or a new one. More detailed analysis of parents 2 and 22 is warranted.
Parents 12 and 20 are similar to parent 17. The data are consistent with parents 12 and 20 harboring Fr1, but we cannot exclude a different Fr gene to which both inocula are avirulent.

SNPs Mapped to Fr8 (LG10)
Parents 28 and 32 appear to have different Fr genes near the Fr8 locus on linkage group 10. Previous mapping experiments [8] determined that parent 32 is Fr8/fr8, and it appears neither inoculum is avirulent to Fr8 because we detected no significant SNPs in this parent. By contrast, we detected significant SNPs linked to the locus in parent 28 (only one gall avirulent).

SNPs Mapped to LG11
Parents 2, 4, and 23 have at least one Fr gene that maps to linkage group 11. Previous inoculation trials and limited marker investigation (Amerson, unpublished) suggested that parent 23 is heterozygous for a previously unmapped Fr gene. For the resistance gene that we have recognized in the current study in parent 23, it appears that only the 10 gall inoculum is avirulent to this gene. This pattern is shared among the other parents, but we cannot exclude that one or more may harbor a different Fr gene to which only the 10 gall inoculum is avirulent.

Genetic Architecture of Rust Resistance
Significant SNPs were detected in some parents at more than one known Fr locus. Parent 2 may harbor more than one Fr gene, based on segregation for genes at Fr1 and the new locus on linkage group 11. As analyses proceed in more advanced pedigrees, it should be expected that parents with multiple Fr genes will be detected, especially given the importance of rust resistance as a selected trait and the opportunities for Fr gene "stacking" to have occurred during the first, second and/or third breeding cycle of southern pine tree improvement [28][29][30][33][34][35]. Parent 2 is an interesting example in which the markers linked to Fr1 and the linkage group 11 locus enabled the detection of these Fr genes in only the one gall and in both inocula, respectively. Thus, the inocula served as differentials for the Fr genes, and the markers provided the information on their locations. Based on our results, it seems feasible to identify these kinds of parents in complex populations, and then sort out the actual Fr identification using more laborious but precise methods [4,30].
We also analyzed gall length on susceptible genotypes, in which larger galls indicate greater severity of the disease, whereas smaller galls may reflect a certain degree of tolerance, a possible indicator of partial resistance [16,35,36]. No SNPs were shared between gall score and length datasets, which reinforces the observed lack of genetic correlation between the traits using quantitative genetic methods [16]. In contrast to gall score, the genetic architecture of gall length appears to be quantitative, with many genes of small effect [16,36].

Identifying Fr Genes: Paths Forward
It will be useful to map all significant SNPs, so that individuals can be genotyped by breeders for Fr genes using flanking markers. Identification of Fr genes themselves would obviate the need to use SNPs in linked genes as surrogates for marker assisted selection. In several plant systems, disease resistance genes have been molecularly identified and this creates useful tools for breeders to monitor their transmission to progeny [37][38][39][40][41]. Many of these resistance genes have been identified in genomic mapping studies, and as the loblolly pine draft genome continues to improve with additional information, integration of the genetic and physical maps around Fr loci should be feasible. The logic of genomic mapping is that an Fr gene must be present on any scaffold containing two known flanking markers. The loblolly pine genome is physically the largest sequenced to date at 21.7 Gb [42] with a genetic map distance of 899 cM [22]. In a hypothetical situation in which two SNPs flanking Fr1 were 1 cM apart on a linkage map, the minimum length of a scaffold containing both SNPs would be ~20,000 kb. Alternatively, candidate gene approaches based on homology to known R genes may help identify Fr genes. The continued refinement of the loblolly pine genome, coupled with increasingly dense genetic maps, should enable Fr genes to be identified in the near future.
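The scaffold-length estimate above can be reproduced as a back-of-envelope calculation. This sketch assumes recombination is uniform along the genome, which is a simplification; the constant and function names are chosen for the example. Under that assumption the genome-wide average is roughly 24,000 kb per cM, the same order of magnitude as the ~20,000 kb figure quoted in the text.

```python
GENOME_SIZE_BP = 21.7e9  # loblolly pine genome size, 21.7 Gb
MAP_LENGTH_CM = 899.0    # total genetic map length in cM


def expected_interval_kb(interval_cm):
    """Average physical length (kb) of a genetic interval of interval_cm cM.

    Assumes a uniform recombination rate across the genome.
    """
    bp_per_cm = GENOME_SIZE_BP / MAP_LENGTH_CM
    return interval_cm * bp_per_cm / 1e3


if __name__ == "__main__":
    print(round(expected_interval_kb(1.0)))
```

Local recombination rates vary, so the physical distance between two specific flanking SNPs could differ substantially from this average.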

Conclusions
We identified SNPs linked to known Fr genes in loblolly pine by using BayesC and BAMD in tandem. Specific parents that segregate for one or more Fr genes were identified using single-marker regression analysis. We also identified significant SNPs that map to presumed Fr loci in previously unreported regions of the pine genome. Four general principles seem the most important to consider in detecting Fr genes in breeding and natural populations: (1) incorporation of host genotypes with known Fr genes as internal controls; (2) integration of markers linked to Fr loci in the host population; (3) use of differential inoculum; and (4) reliable assessment of phenotypes to ensure correct classification of resistant and susceptible trees. Our approach supports the feasibility of using markers to guide breeding and selection for fusiform rust resistance.

Figure 1. Comparative magnitude of the normalized effects (absolute values) for gall score (red markers) and gall length (blue markers) obtained using BayesC in the 10 gall datasets. SNP markers are represented by a single line and are in alphanumerical order. Numbers correspond to ranking of the top five SNPs with largest effects on gall score in the 10 gall dataset (see Figure S1 for identification of the top-ranked SNPs).

Figure 3. (A) Mean gall scores for 10 gall (top) and one gall (bottom) inoculum across genotypic classes for two significant SNPs for gall score. Error bars correspond to standard errors of the mean and the numbers within the graphs correspond to the number of clones for each genotypic class in the CCLONES population; (B) Single marker regression results showing LOD scores vs. mean gall score for 10 gall (or one gall) inoculum by parent for the two significant SNPs shown in A. Each point within the plot corresponds to a single parent. Red dots show parents with LOD scores above 3 with their corresponding identification number, while black dots show parents with LOD scores <3.

Table 1. Significant SNPs after tandem BayesC and BAMD analysis for resistance to fusiform rust.

Figure 2. Venn diagram showing significant SNPs for rust resistance between gall score for 10 gall and one gall data that are summarized in Table 1.

Table 2. Significant SNPs obtained using BayesC and BAMD that mapped to linkage groups (LG) containing Fr genes. Each parent was known from previous work to segregate for a major gene, or yielded a LOD score >4 when the genotypic classes for the SNP were regressed on gall score breeding values from one gall and 10 gall inoculation experiments.