Insights into the Genetic Architecture and Genomic Prediction of Powdery Mildew Resistance in Flax (Linum usitatissimum L.)

Powdery mildew (PM), caused by the fungus Oidium lini in flax, can cause defoliation and reduce seed yield and quality. To date, one major dominant gene (Pm1) and three quantitative trait loci (QTL) on chromosomes 1, 7 and 9 have been reported for PM resistance. To fully dissect the genetic architecture of PM resistance and identify QTL, a diverse flax core collection of 372 accessions augmented with an additional 75 breeding lines were sequenced, and PM resistance was evaluated in the field for eight years (2010–2017) in Morden, Manitoba, Canada. Genome-wide association studies (GWAS) were performed using two single-locus and seven multi-locus statistical models with 247,160 single nucleotide polymorphisms (SNPs) and the phenotypes of the 447 individuals for each year separately as well as the means over years. A total of 349 quantitative trait nucleotides (QTNs) were identified, of which 44 large-effect QTNs (R2 = 10–30%) were highly stable over years. The total number of favourable alleles per accession was significantly correlated with PM resistance (r = 0.74), and genomic selection (GS) models using all identified QTNs generated significantly higher predictive ability (r = 0.93) than those constructed using the 247,160 genome-wide random SNP (r = 0.69), validating the overall reliability of the QTNs and showing the additivity of PM resistance in flax. The QTNs were clustered on the distal ends of all 15 chromosomes, especially on chromosome 5 (0.4–5.6 Mb and 9.4–16.9 Mb) and 13 (4.7–5.2 Mb). To identify candidate genes, a dataset of 3230 SNPs located in resistance gene analogues (RGAs) was used as input for GWAS, from which an additional 39 RGA-specific QTNs were identified. Overall, 269 QTN loci harboured 445 RGAs within the 200 Kb regions spanning the QTNs, including 45 QTNs located within the RGAs. These RGAs supported by significant QTN/SNP allele effects were mostly nucleotide binding site and leucine-rich repeat receptors (NLRs) belonging to either coiled-coil (CC) NLR (CNL) or toll interleukin-1 (TIR) NLR (TNL), receptor-like kinase (RLK), receptor-like protein kinase (RLP), transmembrane-coiled-coil (TM-CC), WRKY, and mildew locus O (MLO) genes. These results constitute an important genomic tool for resistance breeding and gene cloning for PM in flax.


Introduction
Diseases are major constraints to crop production. Powdery mildew (PM) is a common fungal disease of many crops worldwide. Most PM pathogen species are host-specific or infect few hosts, implying that their genomes encode distinct "toolboxes" of pathogenesisrelated genes [1]. Importantly, all PM fungi are obligate biotrophic plant pathogens; hence, their growth and reproduction are entirely reliant on the availability of water and nutrients from living host cells [2]. PM interferes with plant growth and reduces the crop's quality. For example, PM can reduce grain yields by up to 40% in wheat [3]. PM occurs later in the season when temperatures are between 20 and 25 • C, and the extent of the damages is only determined after the crop has been harvested [4]. Once the host is infected, the infection spreads quickly under favourable conditions, resulting in PM outbreaks. Furthermore, the evolution of pathogen races with increased virulence may result in a "breakdown" of resistance. For instance, Pm17, Pm3a, and Pm4a were defeated in several Eastern and mid-Atlantic regions of the United States [5,6], as was Pm8 in China [7]. As a result, most resistance genes become ineffective after a period of time [8]. Thus, it is necessary to keep looking for new sources of resistance or create new combinations to stay abreast of new races. This can be achieved by identifying new genes/alleles that will allow us to build molecular tools to quickly and efficiently introduce them into breeding lines and to diversify the resistance sources against the rapidly evolving pathogen races.
Plant disease resistance is usually categorized as either "qualitative," defined by the presence of major resistant (R) genes, or "quantitative" defined by the presence of resistance-related quantitative trait loci (QTL) [9,10]. R genes, for example, have been used successfully in wheat where they form the basis of resistance breeding programs that have produced many resistant commercial varieties [11]. The majority of the known R proteins have nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains and act as intracellular immune receptors that recognize their cognate effectors directly or indirectly [12]. Two other genetically specified R proteins are cell surface-localized receptorlike transmembrane proteins (RLPs) and receptor-like kinases (RLKs) [13]. Mildew locus O (MLO) is a well-studied PM susceptibility gene that was first discovered in barley in 1942 [14] and later identified in rice and wheat [15,16].
Qualitative resistance means "monogenic", "vertical" or "race-specific", while quantitative resistance refers to adult plant resistance (APR), "slow-mildewing", or "partial resistance" [17]. Qualitative resistance genes are usually short-lived due to the frequent changes in the pathogen population [9]. Quantitative genetics approaches, such as estimating genetic elements, heritability, and efficient gene numbers, are typically used to investigate quantitative resistance. Several studies have identified more than 100 QTL on all chromosomes in wheat [11]. Quantitative resistance is more durable [18,19], and offers long-term defence to host plants. However, detecting such resistance is difficult, particularly when R genes are present. Traditionally, genetic studies of quantitative traits use segregating biparental populations that have been tested for the traits of interest and genotyped with DNA-based molecular markers. This method necessitates the costly and time-consuming creation of specific mapping populations. Additionally, the resolution is limited by the number of crossovers and the high linkage disequilibrium (LD), necessitating additional research to fine-map the QTL that often span several cM [20].
Genome-wide association studies (GWAS) are a powerful alternative strategy to the traditional linkage-based QTL mapping, focusing on LD obtained from unrelated genotypes of a collection that reflects historical recombination, thereby resulting in more accurate positioning of QTL and higher mapping resolution. However, GWAS are less effective at detecting alleles with small effects and rare alleles, and can produce a large number of false-positive associations. To improve the statistical mapping power of GWAS, a large population size and high marker density are needed. Several GWAS analyses for PM resistance have been performed in plant species such as wheat [11,21,22], barley [23] and oat [24]. These studies revealed that resistance to PM was mainly mediated by a few minor QTL with small effects and heavily influenced by the genetic background of the populations studied, the phenotyping conditions, and the genotype-by-environment interactions. These findings imply the limited efficiency of marker-assisted selection (MAS) to improve PM resistance by pyramiding small-effect favourable alleles. Genomic selection (GS), also known as genomic prediction (GP), is a promising alternative in crops, particularly for improving complex traits. In GS, genome-wide markers are used to predict the genomicestimated breeding values (GEBVs) of individuals by capturing the benefits of both majorand minor-effect QTL [25,26]. As a result, GS captures a more significant proportion of the genetic variance of the selected traits than MAS, which is often restricted to a small number of markers linked to major QTL.
In flax, PM, caused by the obligate biotrophic ascomycete Oidium lini Skoric, is one of the most destructive and common flax foliar diseases [27,28]. It was first reported in Western Canada in 1997 [29] where it remains a sporadic disease. Infection with PM at an early stage of development will result in significant yield and seed quality reductions [30]. Most Canadian flax varieties are moderately resistant to PM under field conditions with natural inoculum [31]. One major dominant gene for resistance to PM, designated Pm1, has been identified from the Canadian varieties AC Watson, AC McDuff and AC Emerson as well as from the introduced varieties Atalante and Linda. Two additional dominant genes have also been postulated in Linda [32]. The use of resistant varieties in combination with a systematic disease management program is the most successful way to reduce the incidence of PM and save producers money.
In the present study, both GWAS and GS analyses were performed on 447 flax accessions. Field resistance assessment was performed for five to eight years (2010-2017) in Morden, Manitoba, Canada and genotyping was achieved through short-read sequencing of genomic DNA. The major objectives of this study were: (1) to discover major and minor QTNs associated with PM resistance in flax through GWAS using seven widely used multilocus and two single-locus statistical methods; (2) to identify putative candidate genes for these QTNs; and (3) to evaluate the efficiency of the QTNs as markers in GS models.

Evaluation of Powdery Mildew Resistance
The average PM rating of the 372 accessions of the core collection over five years (2012-2016) was 4.84 ± 1.35, while that of the 75 selected breeding lines was 2.24 ± 0.43 ( Figure 1A, Table S1). The overall average was 4.5 ± 1.4 over the five years with large phenotypic variation (coefficient of variation (CV): 28.1-53.2%) ( Table 1).  (2012)(2013)(2014)(2015)(2016). Box plots of PM ratings for the 372 accessions of the core collection and the 75 selected breeding lines (A) and the number of accessions and mean PM ratings by resistance groups (B). R: resistant (PM ratings of 1 to <3); MR: moderately resistant (3 to <4); MS: moderately susceptible (4 to <5); S: susceptible (5 to <7); HS: highly susceptible (7 to 9). Based on the average PM ratings over five years, 85 genotypes were either highly resistant (HR) or resistant (R); this group included 70 of the 75 selected breeding lines and 15 accessions from the core collection. Of these 85 genotypes, 83 belong to the linseed morphotype while F_LTU_B_CN10111 and F_NLD_C_CN18983 are fibre-type flax. A total of 92 genotypes were moderately resistant (MR), including the five selected breeding lines; 114 genotypes were moderately susceptible (MS); 118 genotypes were susceptible (S); and 38 genotypes were highly susceptible (HS) ( Figure 1B). The PM ratings averaged 2.2 ± 0.4, 3.4 ± 0.3, 4.4 ± 0.3, 5.8 ± 0.6 and 7.3 ± 0.3 for the R, MR, MS, S and HS groups, respectively ( Figure 1B).
Similar performance for PM resistance between years was observed, although the yea-to-year PM ratings of the genotypes differed slightly. The PM ratings in 2014 and 2015 were generally lower than those in other years (Table 1), while the PM ratings of the 75 breeding lines were lower in 2010 (Table S1). Pearson correlation analysis corroborated the significant correlations across years (r = 0.35-0.81, p < 0.01) ( Figure 2).

Genetic Structure of the Population
Principal component analysis (PCA) was performed for the flax core collection of 372 accessions and 75 selected breeding lines using all 247,160 SNPs identified from the 447 genotypes ( Figure S1). The first three principal components (PCs) accounted for 11.8%, 8.1% and 6.1% of the total variation, respectively. These three PCs grouped all genotypes into either linseed or fibre morphotypes ( Figure S1). Most of the 75 selected breeding lines were highly resistant to PM ( Figure 1A), creating a spatial cluster that differed from the core collection ( Figure S1). To capture more of the structural contributions of the PCs, the first ten PCs (38% of the total variation) were selected as a genetic structure matrix for the downstream GWAS analyses.  (Table S2). QTNs located in the same haplotype blocks were treated as QTL, and the QTNs with the largest effect (R 2 ) were chosen as the tag to represent these QTL. Singleton QTNs were considered independent QTL. Hereafter, we use the tag QTNs or singleton QTNs to represent the QTL. The 349 tag QTNs and their related information are listed in Table S3 and depicted on chromosomes in Figure 3. The 349 unique QTNs were identified using a number of GWAS statistical models (Table S4). The single-locus model GLM detected only nine QTNs, but these had relatively large effects, ranging from 0.65 to 27.62% and averaging 11.34% of R 2 (Table S4). No QTNs were detected with the single-locus model MLM. The multi-locus FarmCPU method detected 15 QTNs with relatively large effects (9.07% of R 2 ). The remaining six mrMLMbased multi-locus models identified most of the QTNs, including some with large and minor effects accounting for 3.19-5.79% of R 2 . The QTN numbers varied from 68 with FASTmrEMMA to 108 with pLARmEB (Table S4). Only 2-4 QTNs were common between GLM and the seven multi-locus models; in contrast, 13-39 QTNs were shared by two of seven multi-locus models ( Table S5).

Identification of QTL
The identified QTNs also differed across individual PM phenotypic datasets (Table S6). A total of 45-84 QTNs were identified from the six PM datasets, of which few were shared by more than one PM dataset (Table S7). However, when assessing the effects of the QTN alleles across PM datasets, most QTNs were found to be stable over years. Of the 349 unique QTNs, 265 had significant allele effects in at least three of the PM datasets (Table S3). A total of 122, 56, 54, 33, 32 and 52 QTNs had significant allele effects in six, five, four, three, two and one PM datasets, respectively. QTN effect R 2 estimates were positively correlated with the number of PM datasets with significant allele effects and negatively correlated to the coefficients of variation (CVs) of the QTN effects, indicative of the stability of the QTNs ( Figure 4). Large-effect QTNs were stable across years, while the small-effect QTNs only significant in one PM dataset were least stable ( Figure 4, Table S3). Of the 122 QTNs stable across all six PM datasets, 44 were of large QTN effects ≥ 10% of R 2 , of which the following 11 had  Table S3). Of the 44 large-effect QTNs with R 2 ≥ 10%, 15 were located on chromosome 5 and 5 on chromosome 13.

Favourable Alleles
According to the effect direction (positive or negative) of two alleles for a QTN, the favourable allele (FA) composition of all identified QTNs existing within each genotype was determined. For all 349 unique QTNs, the total number of favourable alleles (NFAs) within a genotype ranged from 141 to 305. The NFAs were significantly negatively correlated with the PM ratings of the accessions ( Figure 5A) (R 2 = 0.62). The 75 selected breeding lines that were resistant (R) or moderately resistant (MR) were clearly distinguished through NFAs of accessions. The NFAs for five groups of genotypes were 262 ± 35, 204 ± 22, 193 ± 15, 180 ± 13 and 169 ± 12 for R, MR, MS, S and HS, respectively ( Figure 5B), confirming the same linear correlation between NFAs and PM ratings displayed in Figure 5A. Based on the number of favourable alleles in the genotypes of the population, three types of QTNs were observed ( Figure 6, Table S3): type 1 QTNs with low favourable allele frequencies (FAFs) ( Figure 6A), type 2 QTNs with high FAFs ( Figure 6B), and type 3 QTNs with FAFs close to 0.5 ( Figure 6C). Of the 447 genotypes in this study, 177 or 37% were R or MR; thus, the FAFs of good QTNs were expected to be approximately 0.37. We found that the 33 QTNs with R 2 > 20% had an average FAF of 0.27 (0.15-0.33), and the 33 QTNs with R 2 between 10% and 20% had an average FAF of 0.48 (0.19-0.86), compared to the remaining QTNs with R 2 ≤ 10% that had an average FAF of 0.60 (0.06-0.94). The FAFs of most large-effect QTNs (>10%) ranged from 0.1 to 0.6 in all genotypes and 0.5 to 0.9 in resistant genotypes ( Figure 7).

Relationship between PM Resistance and Flax Morphotypes
Of the 447 accessions used in this study, 367 and 80 belonged to the linseed and fibre morphotypes, respectively. Linseed and fibre genotypes had average PM ratings of 4.2 ± 1.5 and 5.3 ± 1.5, and average NFAs of 208.8 ± 38.1 and 175.7 ± 12.2, respectively ( Figure 8). Significant differences in PM ratings and NFAs between linseed and fibre genotypes were observed (Wilcox test, p = 2.487 × 10 −8 for PM rating and p < 2.2 × 10 −16 for NFAs). The 10% most resistant accessions were all linseed types, while the 10% most susceptible included 18 fibre accessions (~38%) even though fibre accessions made up only 17.9% of the overall collection. Overall, a higher proportion of fibre accessions than expected based on their representation in the collection were susceptible, and this was congruent with the proportionally lower NFAs.
To further identify candidate genes for PM resistance, a subset of 3230 SNPs located on 838 flax RGAs were extracted from the overall SNP dataset and GWAS analysis was performed with the same models as mentioned above. Significant QTNs were identified from 42 RGAs, including Lus10030587, Lus10003971, and Lus10002249 that had already been detected with the 247,160 genome-wide SNP dataset. Combining the results from both datasets, a total of 388 QTNs were obtained, including 132 QTNs identified on 45 RGAs (Table 2) and 87 non-RGA genes (Table S8).

Genomic Prediction for PM Resistance
To evaluate the overall reliability of the identified QTNs, ten GS models were compared using the 349 unique QTNs and a five-fold cross-validation scheme was used to identify the best models for genomic prediction of PM resistance. All models performed similarly, except for RKHS and RFR ( Figure S4). GBLUP, BRR and SVR generated the highest predictive ability of 0.93. Thus, GBLUP was used for the remaining comparisons.
Five marker sets were used to construct the GBLUP GS models: (1) 349 QTNs identified from the genome-wide dataset of 247,160 SNPs, (2) 388 QTNs obtained using the genomewide and the RGA-derived SNP datasets, (3) 132 QTNs located within genes, including 45 RGAs and 87 non-RGA genes, (4) 294 QTNs or SNPs located on 294 candidate RGAs with significant allele effects, and (5) the genome-wide dataset of 247,160 SNPs. The 349 QTN and the 388 QTN datasets explained the highest genetic variation for PM resistance (h 2 = 0.69-0.70) and generated the highest predictive ability (r = 0.925-0.926) ( Table 4). The predictive ability obtained from the other datasets was significantly lower and the lowest was obtained from the genome-wide dataset of 247,160 SNPs (r = 0.690). The prediction model constructed using all 447 accessions as a training population, the PM ratings over five years, and all 349 QTNs with GBLUP explained 96% of PM variation (R 2 = 0.96), showing high predictive ability and the potential of this model in applied genomic prediction ( Figure S5).

Discussion
Genetic studies of PM resistance in flax are few in number, limiting our understanding of the genetic architecture of this trait to a few major genes. Indeed, using traditional genetic analyses, the single dominant gene Pm1 that confers resistance to PM was identified in several Canadian ('AC Watson', 'AC McDuff', and 'AC Emerson') and introduced ('Atalante' and 'Linda') cultivars, and two putative dominant genes were additionally postulated in 'Linda' [32]. In agreement with the latter, a QTL mapping study using biparental F 3 and F 4 populations derived from a cross between the susceptible cultivar NorMan and the resistant cultivar Linda identified three PM resistance QTL [27]. Here, we exploited a large genetic panel that included 372 accessions from the diverse flax core collection [33,34] from which we defined a high density genome-wide dataset of 247,160 SNPs and an RGA-specific subset of 3230 SNPs, with the view of gaining greater insights into the genetic architecture of PM resistance in flax in order to design breeding methods for its improvement. Because the core collection contained few highly resistant lines, we augmented the germplasm with 75 selected breeding lines previously phenotyped and deemed highly resistant to PM. This significantly enhanced the detection power and the reliability of the identified QTNs through an increase in population size and the genetic variation of the PM ratings. We also used two single-and seven multi-locus statistical models to identify both large-and small-effect QTNs, resulting in a total of 349 unique QTNs from the genome-wide SNP dataset (Table S3). This has proven to be a good strategy to take advantage of the strength of each model as well as to palliate their shortcomings [35,36]. Post-identification QTN analysis ( Figure S4) also contributed to the reliability of the QTNs by removing potentially redundant and false-positive QTNs. Overall, the methodology used herein constitutes a powerful strategy, combining different tools and methods to identify the most reliable QTN-trait association.
Current GWAS methods are limited to predicting genes or genetic features controlling traits. This may mostly depend on the density of genome-wide markers, the size of a genetic panel, the association of QTN with the trait or the extent of the QTN effect, and so on. To date, the main method for predicting candidate genes from QTNs identified by GWAS remains to scan the annotated genes in their vicinity. An optimal window size may be determined based on linkage disequilibrium decay [38,39]. In this study, we adopted a window of 200 Kb flanking a QTN [35,40]. To provide further support to the candidate RGAs as it relates to their function in disease resistance, we first narrowed the candidate genes to the flax RGAs identified from its annotated reference genome sequence [37,41,42]. Second, we reduced the set to the candidate RGAs located within the specified window size of 200 Kb and that had at least one SNP with significant allele effects on PM ratings within the RGAs. That is, even though no QTN was identified from a candidate gene, the SNP(s) on a candidate gene must have been significantly related to PM rating to be considered. Third, we performed GWAS using the 3230 SNPs exclusively located within flax RGAs in order to identify QTNs associated with PM resistance that were specifically located within RGAs. In this way, we identified 45 RGAs that possessed intragenic QTNs and an additional 270 RGAs supported by SNPs within these genes, all with significant allele effects.
QTL mapping provides a useful statistical genetics tool to identify QTNs and candidate genes associated with PM resistance. Molecular markers can be designed from some largeeffect QTNs for MAS. Alternatively, or in addition to, all QTN markers can be used to establish GP models for predicting GEBVs of germplasm or breeding lines. In this study, we counted favourable alleles of all QTNs (ignoring QTN effects) in each accession of the genetic panel and we observed a significant correlation between the number of favourable alleles and PM ratings (R 2 = 0.62, Figure 5A). PM-resistant accessions had significantly more favourable alleles than PM-susceptible accessions ( Figure 5B), clearly indicating the significant additive feature of the identified QTNs. Consequently, the pyramiding of favourable alleles through cross prediction [43], hybridization and recombination, as well as GS is expected to translate into cultivars with improved resistance to PM. The 75 selected breeding lines with high PM resistance and a high number of favourable alleles demonstrate the potential of this approach. Genomic cross prediction is an advanced genomic tool that integrates computer simulation and GS to predict the genetic performance of different types of crosses by evaluating the expected breeding values and genetic variances of their segregating populations for the purpose of selecting superior crosses and consequently enhancing the potential for success [43]. GS has been evaluated via a cross-validation approach for agronomic, abiotic and biotic stress-related traits, including pasmo resistance in flax [36,44,45]. For example, the predictive ability of pasmo resistance in flax was 0.92 when 500 QTL were used for prediction [44], similar to our results of 0.925 obtained using 349 QTNs for PM resistance. Therefore, these QTNs offer the potential for PM resistance breeding using genomics-assisted breeding methods.
The potential candidate genes can be further validated using functional approaches. Once functionally validated, they can be genetically edited to improve cultivar PM resistance. For example, MLO is a resistance gene that confers PM resistance in many crops, such as grapevine [46], wheat [47], and barley [48]. Several technologies such as genome editing and targeting induced lesions in genomes (TILLING) have been used to create MLO mutants with enhanced PM resistance in bread wheat [47,49]. Nekrasov et al. (2017) [50] reported a non-transgenic tomato variety resistant to PM (Oidium neolycopersici) produced through editing the MLO gene (SlMlo1) using the CRISPR/Cas9 technology, which is based on the Cas9 DNA nuclease guided to a specific DNA target by a single guide-RNA (sgRNA). PMR4 encodes a callose synthase, and its loss-of-function mutants are resistant to PM in Arabidopsis and tomato. The CRISPR/Cas9-mediated knockout mutants of the PMR4 ortholog (SlPMR4) in tomato showed partial resistance against the PM pathogen O. neolycopersici [51]. RNA silencing of SlPMR4 also enhanced the resistance to PM in tomatoes [52]. In this study, we detected six MLO gene orthologues (Lus10036120, Lus10036121, Lus10023506, Lus10015461, Lus10012698, and Lus10001336) that had significant SNP effects with R 2 of 1-10%. QTN Lu13-1749576 (R 2 = 1.23%) was identified from the gene region of the MLO ortholog Lus10001336. These MLO and other identified candidate genes could be candidates for genome editing to improve PM resistance.
The RPW8 domain was found in several broad-spectrum powdery mildew resistance proteins from Arabidopsis thaliana and other dicots [53,54]. The A. thaliana locus RPW8 contains two naturally polymorphic, dominant R genes, RPW8.1 and RPW8.2, which individually control resistance to a broad range of powdery mildew pathogens [53]. Therefore, the three RPW8 genes identified on chromosome 13 are potentially important candidate genes. An additional PM resistance study based on a fibre flax recombinant inbred line (RIL) population derived from the Aurore/Adelie cross also showed that several RPW8 genes may explain the PM resistance difference between the two parents [55].
It is worth noting that some paired candidate duplicate genes co-located with some paired QTNs (Table S9). For instance, in Figure 9, three RPW8 genes (Lus10000835, Lus10000836 and Lus10009328) are paired duplicate genes with sequence similarity of 0.67-0.69 (Table S9). Correspondingly, four QTNs Lu13-4791823, Lu13-4830850, Lu13-4866704, and Lu13-4900476 are paired QTNs. This finding is worthy of further investigations to test whether these duplicated genes and QTNs contribute additively to PM resistance in flax.
We observed that in the combined population of 372 accessions from the core collection and 75 breeding lines, linseed accessions tended to be more resistant to PM than fibre accessions. The same result was also observed in the flax core collection alone [56]. This result is somewhat surprising because fibre flax is grown in coastal, humid environments that are conducive to the development of the disease and resistance to PM has long been a major objective of fibre flax breeding programs considering that PM has the potential to considerably reduce fibre quality. Therefore, crosses with resistant linseed varieties can be used to introduce new allelic diversity for PM resistance in fibre flax breeding programs and genomic selection can be used to assist in selecting the most resistant lines while preserving the fibre morphotype.

Genetic Population
A diverse genetic panel of 372 accessions from the previously described flax core collection [33,34] was used. The core collection was assembled from the world collection of 3378 flax accessions, collected from 39 countries and corresponding to 11 regions of the world: North America, South America, Eastern Asia, Western Asia, Southern Asia, Central and Eastern Europe, Western Europe, Southern Europe, Northern Europe, Oceania, and Africa. This panel contained 17 landraces, 84 breeding lines, 234 cultivars, and 37 accessions of unknown improvement status that were grouped into two morphotypes: 80 fibre and 292 linseed accessions [56].
The core collection [56] contained few highly resistant accessions. To empower the GWAS and GP analyses, an additional 75 resistant breeding lines were added to the core collection.
The 75 selected breeding lines were field-evaluated using the same procedure as the flax core collection, but this was accomplished during eight consecutive years (2010-2017). As both populations were assessed in the same PM nursery using the same procedure, the two datasets of the five common years (2012-2016) were combined for downstream analyses.

Genotyping and SNP Identification
Whole-genome resequencing methodology was employed to genotype all individuals of the core collection. The lines were grown in growth chambers with 20 h light at 22 • and 4 h dark at 18 • until they were approximately 7-8 cm tall. The compact tip of the plants (75-100 mg) was collected, flash-frozen in liquid nitrogen, and immediately lyophilized. Genomic DNA was extracted using the DNeasy 96 Plant kit (Qiagen, Mississauga, ON, Canada) and quantified using the Quant-iT PicoGreen dsDNA assay kit (Thermo Fisher Scientific, Waltham, MA, USA), both according to the manufacturer's instructions. The Illumina HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA) was used to generate 100 bp paired-end reads with~15.5X genome coverage equivalents of the reference genome. All reads from each individual of the population were aligned to the scaffold sequences of the flax reference genome [57] using BWA v0.6.1 [58] with base-quality Q score in Phred scale >20 and other default parameters. The alignment file for each individual was used as input for SNP discovery using the software package SAMtools v1.12 [59]. All variants were further filtered to obtain a set of high-quality SNPs as previously described [60]. The SNP coordinates were then converted to the chromosome scale of the flax pseudomolecules v2.0 upon its release [42]. All procedures were implemented in the AGSNP pipeline [61,62] and its updated GBS version [60]. The detected SNPs were further filtered with minor allele frequency (MAF) > 0.05 and SNP call rate ≥ 60%. To minimize the contribution from regions of extensive strong linkage disequilibrium (LD), a single SNP was retained per 200 Kb window when pairwise correlation coefficients (r 2 ) among neighbouring SNPs were greater than 0.8 [63,64], resulting in a total of 258,873 SNPs. Missing SNPs (on average, 14.13% of a missing data rate) were imputed using Beagle v.4.2 with default parameters [65].
A similar approach was used for the genotyping of the 75 selected breeding lines. Shotgun PCR-free library preparation (Lucigen, Middleton, WI, USA), library quality control (Illumina, San Diego, CA, USA) and sequencing were performed by the Centre d'expertise et de services Génome Québec (Montréal, QC, Canada). Twenty-one samples were indexed per lane, and sequencing on the HiSeqX platform (Illumina) generated 150 bp paired-end reads to an estimated average of 15X genome coverage per genotype. The same procedure was used to identify SNPs from the 75 selected breeding lines. The identified SNPs were combined with the SNP set from the core collection, resulting in a common set of 247,160 SNPs that was used in all downstream analyses.

Genome-Wide Association Studies (GWAS)
GWAS analyses were conducted separately for the five individual year datasets and the 5-year average dataset with the single-locus models GLM [66] and MLM [67] and the following seven multi-locus models: FarmCPU [68], pLARmEB [69], pKWmEB [70], FASTmrMLM [71], FASTmrEMMA [72], ISIS EM-BLASSO [73] and mrMLM [71]. The R package rMVP v1.0.4 [68] was used to run the GLM, MLM and FarmCPU models, while the R package mrMLM v5.0.1 [74] was used to run the remaining six multi-locus models. The kinship matrix used for each model was calculated using the module implemented in the corresponding software. The population structure of the 447 accessions was estimated using principal component analysis (PCA) and the first ten principal components (PCs) as cofactors in the models.
The threshold of significant associations for GLM, MLM and FarmCPU was determined by a critical p value (α = 0.05) subjected to Bonferroni correction, i.e., the corrected p-value = 2.02 × 10 −7 (0.05/247,160 SNPs). A log of odds (LOD) score of 3.0 was used to detect significant associations for the six models implemented in the mrMLM package.
The putative QTNs identified were first analysed by testing the statistical significance of QTN alleles for PM resistance. Statistically significant differences between alleles provided validity to the QTNs. Wilcox non-parametric tests were performed using the R function wilcox.test to remove the non-significant QTNs at a 5% probability level. The direction (positive or negative) of QTN effects was subsequently determined. Only QTNs with consistent effect directions in all datasets were considered valid and were retained. Such QTNs were grouped into QTL by calculating haplotype blocks using plink v1.9 [75]. QTNs located in the same haplotype block were grouped into QTL. For each such defined QTL, the QTN with the largest average R 2 over all datasets was chosen as the tag QTN representative of the QTL. R 2 values were calculated based on simple regressions of QTNs on PM ratings, representing the proportion of the total variation for PM resistance explained by the QTNs/QTL.
To analyse the stability of the QTNs, we used the coefficient of variation (CV) of the allele effect values across the six PM datasets (five individual years and mean over years) for each QTN. Favourable alleles were determined based on the difference in PM rating of individuals with either of the two alleles at a given QTN. Stable QTNs have the same favourable allele in all PM datasets. QTNs with inconsistent effect values were considered unreliable and were removed. QTNs were declared highly stable if the effect differences across datasets were significant. Thus, for a significant QTN, its effect difference must be significant in at least one PM dataset. CV for effect values over datasets can be used to measure the stability of QTNs. The second criterion was R 2 (%), i.e., the variance proportion of the phenotypic variation explained by the QTN. Thus, for each QTN, CV and R 2 were used to describe its stability and the extent of its effect. A stable QTN is defined here as having R 2 > 10% and CV < 100%.
To test QTN effect additivity, the number of QTNs with negative effect or favourable alleles (NFA) in all accessions was tallied. A simple regression of NFA on PM in the population was calculated. Correlations between the NFA and PM rating in the six datasets were calculated using the R function "cor". A complete description of the post-identification QTN analysis pipeline is depicted in Figure S6.

Candidate Gene Prediction
A total of 1327 RGAs were identified using the RGAugury pipeline in the flax pseudomolecule [41,42]. An additional 117 disease-resistance-related genes were detected in the flax genome based on a homology search against the annotated Arabidopsis genes. Thus, a total of 1444 RGAs of known annotated functions to disease resistance were used for candidate gene prediction. To predict candidate resistance genes co-localized with QTNs, the RGAs located within 200 Kb of a QTN, i.e., 100 Kb on either side of the QTN, were investigated. In addition, GWAS were also performed using the models previously mentioned and the 3230 SNP subset present within 838 of the 1444 RGAs and that segregated in the germplasm under study.
Five-fold cross-validation with 50 iterations was used to estimate the predictive ability of the models for the 372 accessions in the core collection and 75 selected breeding lines. The predictive ability (r) was calculated as the Pearson's correlation coefficient between the mean genomic estimated breeding values (GEBVs) and the observed phenotypes. A custom genomic selection pipeline (GSPipeline v1.0) integrating the ten genomic prediction models implemented in the R packages rrBLUP v4.6.1 [76], BGLR v1.0.9 [79], BLR v1.6 [80], randomForest v4.7-1 [81] and sommer v4.1.6 [82] was used for GP model construction and cross-validation. Tukey's multiple pairwise comparisons (HSD.test function) were performed to test the statistical significance of the predictive ability.

Conclusions
This study reaffirmed the quantitative nature of PM resistance in flax. The flax core collection is a valuable genetic panel that contains a broad range of genetic variation from diverse geographical origins and different improvement status, and has demonstrated, herein, that it also possesses considerable genetic variation for PM resistance. However, few accessions were highly resistant and its complementation with 75 highly resistant breeding lines was beneficial to our study. This large genetic panel with its high-density genome-wide SNPs combined with multiple single-and multi-locus GWAS models proved powerful to identify large-and minor-effect QTL, resulting in the identification of 349 QTNs and 445 candidate RGAs associated with PM resistance in flax. GWAS using the small RGA-derived SNP set further refined the candidate gene set down to 45 RGAs, which harboured QTNs within their coding sequences. Other candidate RGAs near QTNs were also supported by at least one significant SNP within their coding sequence. Significant additive features of the identified QTNs facilitate the application of these QTNs in markerassisted and genomic selection, and a high predictive ability is expected for PM resistance. This large-scale QTL identification study provides great potential to use the identified QTNs and their potential candidate genes for PM resistance breeding and gene cloning.