Genome-Wide Association Study Identifies QTL and Candidate Genes Governing Seed Mucilage Content and Hull Content in Flax ( Linum usitatissimum L . )

New flaxseed cultivars differing in seed mucilage content (MC) with low hull content (HC) represent an attractive option to simultaneously target the food and feed markets. Here, a genome-wide association study (GWAS) was conducted for MC and HC in 200 diverse flaxseed accessions genotyped with 1.7 million SNP markers. The data obtained for MC and HC indicated a broad phenotypic variation and high (~70%) and a moderate (~49%) narrow sense heritability, respectively. MC and HC did not differ statistically between fiber and oil morphotypes, but yellow-seeded accessions had 2.7% less HC than brown-seeded ones. The genome wide linkage disequilibrium (LD) decayed to r2 = 0.1 at a physical distance of ~100 Kb. Seven and four QTL were identified for MC and HC, respectively. Promising candidate genes included Linum usitatissimum orthologs of the Arabidopsis thaliana genes TRANSPARENT TESTA 8, SUBTILISIN-LIKE SERINE PROTEASE, GALACTUROSYL TRANSFERASE-LIKE 5, MUCILAGE-MODIFIED 4, AGAMOUS-LIKE MADS-BOX PROTEIN AGL62, GLYCOSYL HYDROLASE FAMILY 17 and UDP-GLUCOSE FLAVONOL 3-O-GLUCOSYLTRANSFERASE that have been shown to play a role in mucilage synthesis and release, seed coat development and anthocyanin biosynthesis in A. thaliana were identified. The favorable alleles will be useful in flaxseed breeding towards the goal of achieving the ideal flaxseed cultivars for food and feed by genomic-based breeding.


Introduction
Flaxseed (Linum usitatissimum L.), one of the oldest crops, has been used as human food and animal feed since ancient times.Nowadays, flaxseed enjoys new prospects in the functional food market because of the growing consumers` interest for food with health benefits [1].Indeed, flaxseed is a raw material rich in bioactive compounds such as α-linolenic acid (omega-3) with cardiovascular benefits, lignans with anticancer properties, insoluble and soluble fiber (mucilage) capable of lowering cholesterol and insulin [2].
Flaxseed mucilage is a heterogeneous polysaccharide composed of xylose, arabinose, glucose, galactose, rhamnose and fructose [3] that can be purified into neutral and acidic polymers.Mucilage abounds in the seed coat where it makes up to 8-10% of the seed weight [4].Mucilage synthesis is tightly linked to seed coat development [5] and both tissues form the seed hull, a structure representing 37-48% of the seed weight [6,7].These two fractions rich in polysaccharides are components of the flaxseed meal, primarily used as a protein rich livestock and poultry feed [6,8].
Absorption of flaxseed meal's advantageous 31-45% protein content [9] may be hindered by mucilage and cell wall polysaccharides due to the swelling capacity of polysaccharides in the digestive tract of monogastric animals with the concomitant growth depression and reduced feed efficiency [7,10].In that context, reduction of mucilage and hull contents in flaxseed meal would be desirable to achieve improved feeding value for livestock and poultry.Studies of flaxseed mucilage degradation are focused on chemical retting, enzyme retting and steam explosion [11].Reduction of the hull content in flaxseed and rapeseed meal has been achieved through dehulling methods [12] and to the use of yellow-seeded genotypes [7,13].Since food and feed markets demand flaxseed cultivars differing in mucilage and hull content, it is crucial to decipher the genetic factors underlying these complex traits to accelerate the development of market specific flaxseed cultivars, leading to an improvement in overall seed value.
Similarly, putative flax orthologs of the TRANSPARENT TESTA 3, 4, 5 and 7 (TT3, TT4, TT5 and TT7), flavonol synthase (FLS) and BANYULS (BAN) involved in flavonoids synthesis during seed coat development have also been identified [15].Genetic variation for mucilage and hull content in flaxseed has been assessed [4,7; 16-18] but no QTL have been reported so far, which would constitute a first step toward dissecting the genetics of these traits.In flaxseed, QTL studies have been rather scarce.QTL for Fusarium wilt resistance [19], powdery mildew [20], iodine value, palmitic, linoleic and linolenic acids [21,22] and seed and flower color [23] were reported.QTL for seed protein, cell wall, straw weight, yield-related traits and phenological traits have also been reported using bi-parental mapping and association mapping [22,24,25].Recently, genome wide association studies (GWAS) have been conducted for agronomic and seed quality traits using thousands of SNP loci [26,27].Since GWAS mines the natural sequence diversity within a species and captures historical recombination events, it is a suitable approach to discover loci that control complex traits, leading to a higher mapping resolution facilitating the identification of candidate genes [28].
Currently, there exists a suite of genomic tools available for flaxseed genetic studies [21,22,26,27; 29-32] so that QTL dissection through GWAS and subsequent gene discovery is now feasible.Here, we used 200 diverse flax accessions genotyped with a set of 1.7 million single nucleotide polymorphisms (SNPs) and performed GWAS for mucilage and hull contents.This study enabled the identification of highly promising candidate genes and markers for the development of flaxseed cultivars differing in mucilage content with reduced hull content, leading to an increase overall seed value of this important cash crop.

Phenotypic evaluation
Seed mucilage (MC) and hull (HC) contents displayed normal distribution across the two environments according to the Shapiro-Wilk normality test and normality plots (Table S1, Figure S1).Variance component analysis indicated significant effects of genotype, environment and genotype x environment interaction according to the Wald statistic (P < 0.001 g -1 for the fiber and oil morphotypes, respectively (P = 0.651) (Figure S2a).The average MC registered values of 56.63 and 59.22 (P = 0.517) for the brown and yellow seeded classes, correspondingly.The average HC did not differ statistically between flax morphotypes (fiber = 43.41%,oil = 42.79%;P = 0.373).On the other hand, yellow-seeded genotypes averaged 2.66% less HC than brown seeded accessions (P = 3.2 e -5 ) (Figure S2b).

Population structure and linkage disequilibrium
The dendrogram based on 771,914 SNPs and the STRUCTURE plot grouped the 200 individuals into two major clusters arbitrarily assigned as "red" and "blue" (Figures 2a and 2b).In the K against ΔK plot, a break in the slope was clearly observed at K = 2 (Figure 2b).The red cluster comprised almost exclusively genotypes belonging to the oil morphotype while the blue cluster included both flax morphotypes.The coefficient of population differentiation (FST = 0.08) indicated a weak population structure between the two clusters.
The genome wide LD decayed to r 2 = 0.1 at a physical distance of ~100 Kb (Figure 2c).
Intra-chromosomal LD decayed to r 2 = 0.1 at a distance between marker pairs ranging from ~40 Kb on chromosome 8 to ~210 Kb on chromosome 13 (Figure S3).A highly significant positive correlation (r = 0.75, P = 0.0012) between marker density and the intra-chromosomal LD blocks was observed (Figure 3).For example, chromosomes 4 and 8 with the smallest number of markers, and chromosomes 6 and 13 with the largest, displayed the fastest and slowest LD decays, respectively (Figure 3 and Figure S3).The fast LD decay observed in this association panel are indicative of its advantageous potential for reducing QTL intervals and fine mapping of candidate genes for MC and HC.

Genome-wide association analysis
Three GWA models were tested including GLM-Q, GLM-PCA and MLM-K.According to the Q-Q plot results, the GLM-Q model showed a strong skew toward significance for every trait (Figure S4a), indicating that the Q matrix was insufficient to account for population structure and cryptic relatedness.Conversely, the MLM-K which only used the kinship matrix, led to an overcorrection of these confounding factors, particularly for HC (Figure S4b).The GLM-PCA was tested with 5 and 10 PCA covariates for HC and MC, accounting for 30.1 and 37.1% of the variation, respectively.Both GLM-5PCA and GLM-10PCA models performed well in controlling the rate of false positives, providing a suitable statistical power to identify significant marker-trait associations for MC and HC (Figure 4).Therefore, the GLM-PCA model was applied for GWA in this study.
GWA analysis identified 12 and 17 significant associations for MC in CAR2014 and HU2015, respectively (P < -log10 (P) = 6.88), and markers Lu5-3808878, Lu7-13225294 and Lu11-2498303 were significant in both environments (Table 1, Figure 4,).Various significant SNP markers fell into the same LD blocks.For example, five other significant markers surrounded the peak SNP Lu5-3808878 (Figure 4), thus they were considered the same QTL.Following this criterion, seven QTL were delineated on chromosomes 2, 3, 5, 7 and 11.The peak SNPs of these QTL accounted for 11.8 to 17.3% of phenotypic variation, and the combined three consistent QTL accounted for 43.6% of the MC variation (Table 1).
A total of three and four significant associations were detected for HC in CAR2014 and HU2015, respectively (P < -log10 (P) = 6.88).Markers Lu7-6577527 and Lu13-2803224 were significant in both environments (Table 1).The four QTL identified on chromosomes 7, 10, 12 and 13 explained between 13.8 to 17.8% of the HC variation.The two consistent QTL Lu7-6577527 and Lu13-2803224, accounted for a combined 33% of the HC variation (Table 1).The peak SNPs effect for MC and HC were all significant according to the non-parametric Kruskal-Wallis test (P < 0.05) except for Lu3-26033342 associated with MC (Figures 5a and 5b; Figure S5).Accessions with a thymine (T) allele at Lu2-22298066 displayed on average an increase of 15.3 and 9.4 mg g -1 in MC compared to accessions with a cytosine (C) allele in CAR2014 and HU2015, respectively (Figure 5a).Similarly, accessions with a "T" allele at Lu3-7398487 had on Accessions with a guanine (G) allele at Lu7-13225294 had on average 8.56 and 7.71 mg g -1 more mucilage compared to accessions carrying an adenine (A) allele in CAR2014 and HU2015, respectively (Figure 5a).The allelic effect for the other four peak SNPs is illustrated in Figure S5.The allelic effect of peak SNPs for HC revealed that accessions harboring a "C" allele at Lu7-6577527 had on average 1.4% and 1.3% less HC compared with "A" allele genotypes in CAR2014 and HU2015, correspondingly (Figure 5b.)On average, HC was reduced by 1.4% and 1.3% (Lu7-6577527) to 2.6% and 2.7% (Lu13-2803224) in CAR2014 and HU2015, respectively (Figure 5b, Figure S5).The combined QTL effect revealed that the MC of accessions harboring none of the favorable QTL alleles averaged 44.6 and 48.9 mg g -1 compared to 72.1 and 67.6 mg g -1 for those with five favorable alleles in CAR2014 and HU2015, respectively (Figure 6).No accession had all seven favorable QTL alleles.The combined QTL effect for HC indicated that genotypes with none of the favorable QTL alleles averaged 45.5% and 45.3% HC compared to genotypes with four favorable QTL alleles, in which HC averaged 42.7% and 42.9% in CAR2014 and HU2015, respectively (Figure 6).

Identification of candidate genes
The LD blocks harboring the peak SNPs were mined for genes relevant to MC and HC using the L. usitatissimum v.1.0reference genome.A total of 204 and 118 candidate genes were identified for MC and HC, respectively (Table 2, Table S2).Several genes ascribed to carbohydrate metabolism, seed mucilage synthesis, modification and release, and cell wall synthesis and modification were identified at the MC QTL loci (

Phenotypic variation of mucilage and hull contents
Flaxseed mucilage and seed hull possess valuable nutritional and rheological attributes [33,34] but are also known to affect animal performance [7].The presence of mucilage and fiber components (i.e.acid detergent lignin) in flaxseed meal reduces the energy uptake in both monogastric and ruminant animals [35].Therefore, knowledge about the phenotypic variation and genetic control of seed mucilage content (MC) and hull content (HC) is pivotal to better design breeding strategies aiming to improve the overall food and feed value of flaxseed.The broad phenotypic variation of MC and HC in the association panel and the degree of additivity of the genetic components hint at the potential for improving flaxseed for either high or low MC and reduced HC through marker assisted selection.
Kaewmanne et al. [4] reported MC ranging from 1.8 to 2.9% in seven Italian flaxseed cultivars, while [16] found that MC ranged from 3.6 to 8.0% in 109 flaxseed accessions.We found a slightly wider range from 2 to 10% in our diversity panel.Little information exists for HC variation in large collection of flaxseed.In general, HC ranges from 22-27% to 36-48% were reported in mechanically treated and hand-dissected seeds, respectively [7,36], which is much higher than canola at 18.6% and soybean at 16.1% [6].Reduction of HC can be achieved through the use of yellow-seeded cultivars, known to contain higher oil content and less HC than their brown-seeded counterparts [7,37].
Indeed, the yellow-seeded accessions displayed a lower HC compared to the brown-seeded as the answer to reduced HC flaxseed because their susceptibility to natural splitting and mechanical cracking of the seed coat can negatively affect seed quality [38].Consequently, breeding and mechanical trials should be conducted together in order to identify the ideal HC that would ensure seed mechanical quality.All considered, our association panel harbored abundant phenotypic variation for dissecting the genetic landscape of MC and HC.

Population structure and linkage disequilibrium
When the main factors accounting for population subdivision correlate with a trait under study (i.e.geographic distribution and flowering time), then marker-trait associations for the specific trait will undergo a more accentuated effect of the structure confounding factor [39].In flaxseed, population structure has been assessed in varying number of accessions, where geographic origin and flax morphotype seemed to have been the main factors underlying population subdivisions [40][41][42].In our association panel, the "red" and "blue" clusters were slightly differentiated (FST = 0.08), with a weak morphotype effect on dendrogram topology, possibly due to the small number of fiber type (33) compared to the larger number of oilseed type accessions (153).
Linkage disequilibrium (LD) is the main factor influencing marker density requirement and mapping resolution in GWAS.Mating system and genetic diversity influence LD decay.LD decays more rapidly in outcrossing plant species than in self-pollitated plants [43], and similarly in wild relatives and landraces compared to modern cultivars [44].Here, we observed a rapid LD decay for most of the chromosomes, comparable to some maize commercial elite inbred lines [45] and faster than winter type Brassica napus (480 to 1283 kb, r 2 = 0.1) [46].Therefore, the 200 flaxseed accessions of our diversity panel is expected to contain plentiful allelic diversity as suggested for the generally short LD blocks for the 15 chromosomes; consequently assisting the search for candidate genes through efficient narrowing of the putative QTL regions.

Genome-wide association and candidate genes
Several general (GLM) and mixed (MLM) linear models have been proposed to control both population structure and cryptic relatedness [47][48][49].In flax, MLM has been the preferred association model for multiple traits [24][25][26]42].The "red" and "blue" clusters were weakly differentiated, and MC and HC between flax morphotypes was not statistically significant (Figure S2a and S2b) opposite to a report comparing indica and japonica rice types assessed for 34 traits [39].Hence, the genetic architecture of MC and HC seem to be only weakly correlated with population and family structures, and GLM-PCA was sufficient for controlling the rate of false positive associations.
The discovery of QTL for agronomic and economically important traits in crops is of great importance to marker assisted breeding.This is the first report of QTL for MC in flax, likely because this trait has not been a breeding priority in the most important breeding programs of the world [18].
In the present study, GWAS identified seven QTL for MC, and their effects clearly suggested the promise for marker assisted selection towards modifying MC.Chromosome 3's multiple MC QTL harbored candidate genes orthologous to Arabidopsis TT8 gene which is part of a transcription factor complex that, along with GLABRA2 (GL2), regulates MUM4 gene expression [50].MUM4 is required to produce rhamnose, a key substrate for mucilage biosynthesis [50] and chromosome 3 Lus10009311 is its flax ortholog.In Arabidopsis,  [51].The presence of a L. usitatissimum ortholog Lus10009311 in a LD block with a peak SNP for MC corresponds to the expected role of RG I synthesis.Another Arabidopsis gene, SBT1.7 triggers the activation of cell-wall modifying enzymes necessary for mucilage release upon imbibition [52].In line with the expected seed coat mucilage dynamics, we identified two orthologous copies of this gene in two independent QTL (Table 2).Arabidopsis PECTIN METHYLESTERASE INHIBITOR 6 (PMEI6) mutants were defective in seed coat mucilage release [53].An ortholog of the Arabidopsis gene PECTIN METHYLESTERASE 36 (PME36), another family member, was harbored at one of the MC QTL loci identified herein.While PME36 has not been shown to be involved in mucilage release, it might participate indirectly because it exerts a similar role to that of PMEI6 in pectin synthesis and cell wall modification [54].
Oil content is an economically important but genetically complex trait.MC is negatively correlated with oil content; therefore reducing MC should facilitate the increasing of oil content.
Indeed, reduced accumulation of mucilage accompanied by increased oil content was observed in Arabidopsis MUM4 or GL2 mutants [55].We observed a significant negative correlation (r = -0.15,P = 0.03) between MC and oil content in the association panel (data not shown), perhaps due to increased carbon allocation to the embryo in reduced or no seed coat mucilage synthesis in low MC accessions as proposed in Arabidopsis [55].
Increasing seed oil content and reducing the fiber fraction of the meal have been important goals in oil crop breeding.In B. napus and L. usitatissium, seed coat thickness or HC are negatively correlated with seed oil and protein contents as well as seed color [56][57][58].QTL for seed coat color to indirectly increase oil content and minimize HC have been identified in B. napus and soybean [37,59,60].In flax, a pleiotropic QTL controlling yellow seed and white flower color was recently dissected at the molecular level but its effect on HC has not been addressed [23].Here, we identified four QTL whose effects reduced HC by 2.6% on average.Chromosome 7 harbored Lus10035456, which resembles the A. thaliana transcription factor AGL62. AGL62 mutants initiated embryo and endosperm formation, but failed to form a seed coat [61].Light seed color and low HC are thought to coincide because the biochemical pathways leading to lignin and pigment synthesis share common precursors [59].In Arabidopsis, the core components of seed coat pigments are proanthocyanidins (PAs) [62].Chromosome 12 encompassed three candidate genes including the ortholog of Arabidopsis O-GLYCOSYL HYDROLASES FAMILY 17 gene.GH17 is coexpressed with TT12, AHA10 and BAN, that might process glycosylated flavan-3-ol monomers, leading to accumulation of PAs in the seed coat [63].In black seed soybean, a UDP-glucose:flavonoid 3-O-glucosyltransferase (UGT78K1), was isolated from the seed coat, a key enzyme that catalyzes the final step in anthocyanin biosynthesis [64].Chromosome 13 contained Lus10026926, an ortholog of the A.
thaliana UGT79B1, a gene also involved in anthocyanin biosynthesis.Yellow seed color stems from blocked biosynthesis of PAs that impart the brown color to the seed coat [65].The flaxseed meal derived from brown-seeded cultivars contains PAs that negatively affect protein digestion [66], hence low PA meal is preferred in animal ration.Additional advantages of modifying the seed color and reducing MC and HC include higher limpidity of the crude oil from the removal of gum-like residues and dark pigments, higher protein content and better feeding value of flaxseed meal for livestock and poultry [7].

Plant material and field trials
A total of 200 L. usitatissimum accessions, including 153 belonging to the oil morphotype, 33 to the fiber morphotype and 14 of unknown morphotype from the Canadian flax core collection [67] were selected for this study based on their geographic distribution and genetic diversity.This germplasm has already been fully re-sequenced using Illumina technology, generating 1.7 million SNP markers which were anchored to the physical [32] and genetic maps [31] as part of the Total Utilization Flax GENomic (TUFGEN) project (Table S3).The 200 genotypes were planted in 2014 and 2015 at the Agriaquaculture Nutritional Genomic Center (CGNA) experimental stations located in Vilcún (CAR2014) and Huichahue (HU2015), La Araucania region, Chile.The association panel was planted at each location using a completely randomized design (CRD) with biological three replicates.Genotypes were arranged in rows and columns in order to take into account spatial heterogeneity.The association panel was planted in single 2-m rows, with 20-cm row spacing and a seeding rate of 550 seeds m -2 .

Phenotyping of mucilage and hull contents
The seed mucilage content (MC) was determined in three biological replicates following the procedure described by [4] with minor modifications.A total of 2 g of seeds were incubated in 20 mL of water at 100°C for 15 min in 50 mL Falcon tubes.Next, the tubes were shaken for 30 min at 250 rpm.The soluble extract was recovered by centrifugation at 6132 relative centrifugal force (RCF) during 30 min and the mucilage fraction was precipitated by adding 30 mL of ethanol (95%) overnight at 4°C.The residual ethanol was evaporated at 45°C for 24 h.The seeds were recovered and the extraction procedure was carried out twice more to maximize mucilage recovery.Following this, the mucilage pellet was weighed and expressed as milligrams of mucilage per gram of seed (mg g -1 ).
Hull content (HC) was determined in three biological replicates by separating the hull from the embryos using a dissecting needle and tweezers from 50 seeds after imbibition in water for 24 h.
Both fractions were dried at 90°C for 4 h before their dry weights were measured.HC was expressed as: [hull dry weight / (hull dry weight + embryo dry weight)] x 100, averaged from 50 seeds.[70].Trait h 2 estimates were computed using the equation: h 2 = σ 2 a / σ 2 a + σ 2 e, where σ 2 a is the additive genetic variance and σ 2 e is the residual error variance [70].Subsequently, BLUEs were used for GWA analysis.

Population structure and linkage disequilibrium
Population structure was estimated with 259 neutral SSR loci [41], with linkage disequilibrium < 0.4 and distributed across flax's 15 chromosomes.The software STRUCTURE v.2.3.4 [47] was employed with predefine numbers of genetic cluster (K) from 1 ̶ 5, using 50,000 burn-in iterations followed by 100,000 MCMC across five independent runs for each K values.The number of clusters (K) was calculated with the Evanno method [71] implemented in the R package POPHELPER v. 1.1.10 [72].A total of 771,914 SNPs filtered from the 1.7 million SNPs by removing those with a minor allele frequency < 0.05 and > 10% missing data was used to produce a dendrogram using the neighbour-joining (NJ) algorithm implemented in TASSEL v.5.2.31 [70].Genome wide linkage disequilibrium (LD) and intra-chromosomal LD between pairs of SNPs using the 771,914 filtered SNPs was estimated using squared allele frequency correlations (r 2 ) in TASSEL v.5.2.31 [70] with a sliding window size of 50.LD values were plotted against physical distance to determine the LD decay using the Hill and Weir [73] function.A cut-off value of r 2 = 0.1 was set to estimate the average LD blocks [41].

Genome-wide association analysis
GWAS was performed in TASSEL v.5.2.31 [70] using the 771,914 filtered SNPs.Three models were performed including GLM-Q, GLM-PCA and MLM-K.The Q matrix generated from STRUCTURE was used as cofactor to adjust for population stratification (GLM-Q).A GLM-PCA was assessed including up to ten principal component covariates.The ten PCAs were generated in TASSEL v.5.2.31 [70] with 105,038 SNPs (MAF > 0.05 and at least 95% present among the 200 genotypes).For the MLM-K, a kinship matrix was created in TASSEL v.5.2.31 [70] with the set of 105,038 SNPs and used as covariate to account for cryptic relatedness.A quantile-quantile (Q˗Q) plot was displayed using the R package qqman [74] to evaluate the fitness and efficiency of the different models.The final Manhattan plots were also displayed using the qqman package [74].The Bonferroni correction (0.05 / n) criterion for multiple test comparisons provides a strict threshold of significance; therefore, a corrected P-value for multiple hypotheses testing of 6.88 (-log (0.1 / 771,914)) was used as threshold for the significance of marker-trait associations.

Identification of candidate genes
To identify candidate genes associated with significant SNPs, the Jbrowse feature of Phytozome v.12.1 (http://phytozome.jgi.doe.gov/pz/portal.html)was used to examine the L. usitatissimum v.1.0genome [75] for genes relevant to MC and HC in flaxseed.As mentioned above, a cut-off value of r 2 = 0.1 was set to estimate the average LD block for each chromosome.The defined physical distance was used to pinpoint candidate genes on either sides of the most significant SNPs.A plausible candidate gene was defined by the following criteria: BLASTX searches from the Arabidopsis genome returned orthologous protein sequences with functions associated to the phenotypes of interest.

Conclusions
We performed GWAS using a set of 771,914 SNPs, identifying seven and four QTL for MC and HC, respectively.Above all, chromosome 3 encompassed three QTL harboring promising candidate genes for MC.Three of the QTL associated with HC contained plausible candidate genes related to seed coat and anthocyanin biosynthesis.These QTL and candidate genes shed light on the genetic architecture of MC and HC, two complex traits whose phenotyping is labor-intensive and time-consuming.The favorable alleles will be useful in flaxseed breeding to tailor flaxseed cultivars for food and feed through molecular-assisted breeding schemes.Further validation of candidate genes like LuTT8, LuSBT1.7,LuMUM4 and LuAGL62 through gene expression analysis or gene editing is warranted to deepen our understanding of the high complexity of cell wall dynamics as they pertain to seed mucilage and seed coat biosynthesis in flaxseed.

Figure 1 .
Figure 1.Mucilage (MC) and hull (HC) contents distribution in the association panel in two environments: CAR2014 and HU2015.Values represent the mean of three biological replicates for each trait.

Figure 2 .
Figure 2. Population structure and genome-wide linkage disequilibrium decay.(a) Neighbour-joining (NJ) tree for 200 flax accessions based on 779,914 SNPs.(b) Model-based population structure of 200 flax accessions belonging to two clusters predefined by the STRUCTURE software.Each accession is represented by a vertical bar.The color subsections within each vertical bar indicate membership coefficient (Q) to different clusters.(c) Genome-wide linkage disequilibrium decay of r 2 values (red line), against physical distance (Kb) using the Hill and Weir (1988) function in L. ussitatissum.Yellow line indicates the cutoff value (r 2 = 0.1) used to determine the genome-wide LD block size.

Figure 3 .
Figure 3. Single nucleotide polymorphism (SNP) density plot across the L. usitatissimum genome.Number of SNPs and LD blocks are also indicated for each of the 15 chromosomes.

Figure 4 .
Figure 4. Manhattan plots and quantile-quantile (Q-Q) plots of the GLM-PCA across two environments.(a) GWA results (GLM + 10 PCA) for mucilage content in CAR2014 and HU2015, respectively.(b) GWA results (GLM + 5 PCA) for hull content in CAR 2014 and HU2015, respectively.Negative log10 transformed p values are plotted against physical position on each of the 15 chromosomes.The black horizontal dotted line indicates the genome-wide significance threshold (-log10 (P) = 6.88).The most plausible candidate genes are indicated.

Figure 6 .
Figure 6.Combined phenotypic effects of favorable QTL associated with (a) mucilage and (b) hull contents in the association panel.Grey and white boxplots represent the CAR2014 and HU2015 locations, respectively.

Few
accessions combined favorable alleles for reduced MC and HC.It should be possible to combine these attributes in a single genotype through the pyramiding of the respective favorable Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 14 August 2018 doi:10.20944/preprints201808.0258.v1alleles because the significant QTL for both traits did not co-locate in the flax genome.The development of yellow-seeded cultivars with low HC and either low or high MC for different industrial uses is a real option and represents an opportunity for increased market share and value.
(a) the gene had a function known to be related to the trait evaluated based on gene ontology term descriptions in Phytozome; (b) Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 14 August 2018 doi:10.20944/preprints201808.0258.v1 ).The phenotypic variation for MC in CAR2014 ranged from 23.52 to 103.57mg g -1 with an average of 58.67.A lower variation was observed for MC in HU2015, which ranged from 18.88 to 91.90 mg g -1 with an average of 55.04.HC variation ranged from 35.56 to 48.59% in CAR2014 and from 35.73 to 48.59% in HU2015 (Figure1, TableS1).MC and HC were significantly positively correlated in CAR2014 and HU2015 with coefficients of 0.28 and 0.29, respectively.Narrow sense heritability (h 2 ) for MC attained 70.7 and 73.8% in CAR2014 and HU2015, respectively.Lower h 2 of 51.4 and 46.2% for HC at CAR2014 and HU2015 were observed.MC did not differ statistically between flax morphotypes nor seed color classes according to the Kruskal-Wallis non-parametric test.The average MC was 55.33 and 56.63 mg

Table 1 .
Genome-wide significant peak SNPs for mucilage content (MC) and hull content (HC).

Table 2 )
. Five particularly promising candidate genes were identified.

preprints.org) | NOT PEER-REVIEWED | Posted: 14 August 2018 doi:10.20944/preprints201808.0258.v1
[68]rmation criterion (AIC) and Bayesian information criterion (BIC), and the most appropriate model in each environment was used to obtain best linear unbiased estimates (BLUEs) for mucilage and hull contents in GenStat v.16[68].The significance for fixed terms was evaluated using a Wald statistic test.Descriptive statistics and Shapiro-Wilk normality test were conducted in the R package Preprints (www.