Genetic Dissection for Maize Forage Digestibility Traits in a Multi-Parent Advanced Generation Intercross (MAGIC) Population

Forage feedstock is the greatest source of energy for livestock. Unfortunately, less than 50% of their fiber content is actually digested and assimilated by the ruminant animals. This recalcitrance is mainly due to the high concentration of plant cell wall material and to the limited digestion of the fiber by the microorganisms. A Genome-Wide Association Study (GWAS) was carried out in order to identify Single Nucleotide Polymorphisms (SNPs) associated with forage digestibility traits in a maize Multi-Parent Advanced Generation Intercross (MAGIC) population. We identified seven SNPs, corresponding to five Quantitative Trait Loci (QTL), associated to digestibility of the organic matter, 11 SNPs, clustered in eight QTLs, associated to Neutral Detergent Fiber (NDF) content and eight SNPs corresponding with four QTL associated with Acid Detergent Fiber (ADF). Candidate genes under the QTL for digestibility of the organic matter could be the ones involved in pectin degradation or phenylpropanoid pathway. Transcription factor genes were also proposed for the fiber QTL identified, in addition to genes induced by oxidative stress, or a gene involved in lignin modifications. Nevertheless, for the improvement of the traits under study, and based on the moderate heritability value and low percentage of the phenotypic variability explained by each QTL, a genomic selection strategy using markers evenly distributed across the whole genome is proposed.


Introduction
Maize (Zea mays L.) is the most important crop in terms of production, preceding wheat and rice, and is a crop adapted to almost every region in the planet. Forage feedstock is the greatest source of energy for ruminants. Apart from this, the quality of forage feedstock directly affects milk production and performance, being essential for dairy farms. In 2018, forage maize was harvested in 247,384,552 ha in the United States that yielded a total of 121,361 tons/ha [1]. In forage species, fiber comprises 300-800 mg/g dry matter contents. Unfortunately, less than 50% of the fiber content is actually digested and used by the animal. This recalcitrance is mainly due to the high concentration of plant cell wall material and to the limited digestion of the fiber by the microorganisms of rumen. In this sense, the cell wall comprises most of the plant's dry weight, and is composed primarily of three polymer components: cellulose (40-50%), hemicellulose (15-25%), lignin (20-25%), and other components (5-10%) [2]. Cell wall fiber in particular highly influences the nutritional value of the forage [3].
Cell wall digestibility and silage traits (cell wall fiber content) have been extensively studied as breeding targets for improving the feeding value of the forage crops [3][4][5][6]. Overall, to improve forage varieties, it is necessary to improve both production and quality, which translates to the enzymatic digestibility of the forage. Both traits are subjected to plant genetic variation, and are of major interest for silage quality in cereals. Major genes such as those behind brown midrib mutants, as well as minor genes underlying small-effect QTL that have been detected all over the genome, contribute to the variability in cell wall digestibility [7]. Some studies of silage quality and digestibility assign their variations mainly to genes and QTLs related or involved in monolignol biosynthesis [3]. Nevertheless, some other studies suggest that a large part of the variation for cell wall digestibility is explained not only by lignin, but also by the crosslinking of feruloylated arabinoxylans to G units of lignin via ether bonds mediated by ferulic acid, as well as the crosslinking of arabinoxylans [3].
In the identification of genomic regions associated with maize biomass digestibility mapping of QTL has been widely used. In this sense, Truntzler et al. [7] carried out a meta-QTL analysis of the QTL described for cell wall digestibility as a way to synthesize information, infer the number of meta-QTL involved in trait variation and estimate the position of the meta-QTL with an increased accuracy. As a result, they described 26 meta-QTL for digestibility. The studies in the 14 publications analyzed by Trunztler et al., along with the later published research were performed using bi-parental populations, with the exception of a study performed by Wang et al. [8] that used an inbred panel. Multiparent Advanced Generation Intercross (MAGIC) populations, present several advantages over bi-parental populations and association panels: (1) contrary to bi-parental mapping populations, multiple founders' populations have the ability to analyze several alleles simultaneously, besides presenting more genetic variation, (2) allele frequencies are more balanced than in panels because founders contribute equally to the population (3) the mapping power and resolution of MAGIC maize population are strengthened by high minor allele frequencies and a rapid decay of linkage disequilibrium, respectively [9,10]. Therefore, results from QTL mapping in MAGIC populations could be complementary to results from bi-parental populations and association mapping panels. In addition, even though QTL resolution in MAGIC populations is not as high as in diversity panels, MAGIC populations present a known underlying structure that prevents false positive associations [11,12].
One of the most robust techniques for association mapping is the genome wide association (GWAS) analysis, that enables high-resolution mapping of QTL to narrow genomic regions and to search for genes that contribute to the variability of each trait [10]. This study aims to deepen into the genomics of maize digestibility and forage quality traits in a maize MAGIC population adapted to European Atlantic coast conditions using a GWAS approach. This would provide useful information in order to set the groundwork for future breeding programs focused in the enhancement of forage digestibility, especially in the region where this research was carried out, where dairy farms are crucial for rural development.

Development of the MAGIC Population
A MAGIC population using eight temperate maize inbred lines of diverse genetic origin was developed. The eight founders have two shared characteristics: the lack of stiff stalk materials in their pedigrees and partial resistance to corn borer attack [13,14]. Six of them come from European germplasm (EP17, EP43, EP53, EP86, PB130 and F473) and two from American germplasm (A509, EP125). Procedures used to release the 672 recombinant inbred lines of the population have been previously reported [13,14]. The pedigree of each line is shown in Table 1.
The initial step to build the MAGIC population was to obtain four single crosses, each including a different pair of parents. In the next generation, two double hybrids were obtained, each involving a different pair of single hybrids. Finally, the two double hybrids were crossed to get the eight-way cross (described in [15]). The eight-way cross was random mated for six generations. In each generation a minimum of 50 crosses were made between 100 different individuals. A bulk was made with the same number of kernels form each ear to contribute to the next generation. After six cycles of recombination, plants were self-pollinated during six generations using the single seed descent method and approximately 700 highly homozygous lines were obtained, from which approximately 600 RILs are maintained nowadays. The method used guarantees that each RIL derives from a different plant of the random mating population.

Experimental Design
A subset of 408 RILs of the MAGIC population together with eight founders were tested in a single augmented design with 10 blocks in Pontevedra, Spain (42 • 24 N, 8 • 38 W and 20 m above sea level), during two seasons (2016 and 2017). Forty-two non-replicated RILs plus the eight founders (PB130 and EP42 were replaced by EC212 and EP60 in both years respectively due to lack of seed availability) were randomly assigned to plots within each block. Only 30 RILs were evaluated in block 10. Each experimental plot consisted of a single row with 13 single-kernel hills planted manually, spacing between consecutive hills in a row being 0.18 m and 0.8 m between rows, obtaining a final density of~70,000 plants ha −1 . Local agronomical practices were fulfilled.

Phenotypic Data
Plots were harvested approximately 55 days after silking (days from planting until half of the plants in the plot showed visible silks). In order to avoid precocity/maturity differences, RILs were harvested in sets based on its flowering time. In this way, dry matter content of the samples varied from 40 to 60%. The stover sample was composed of tissue from at least two plants per plot, chopped, pre-dried at 35 • C in a forced air camera, then dried at 60 • C in a stove. Lastly, dry stover samples from each plot were grounded in a Wiley mill with a 0.75 mm screen. Biochemical determinations were performed using these samples.
All the samples harvested were analysed by NIRs. 214 were included as an expansion of a previous calibration equation for this type of material (calibration set). From those 214, a total of 47 were as well analysed by wet chemistry.

Digestibility of Organic Matter
Digestibility of Organic Matter (DOM) of stover was determined by Near-infrared spectroscopy (NIRs) at LIGAL. Calibration reference set consisted of 214 samples, from which 47 were selected for wet chemistry determinations. The spectral information of the dried and grounded (0.75 mm) samples for DOM was obtained using a Foss NIRSystem 6500 monochromator spectrophotometer (Foss NIRSystem, Silver Spring, WA, USA), located in an isothermal chamber (24 ± 1 • C), provided with a rotation module that performs reflectance measurements in the spectral region between 400 and 2500 nm, at 2 nm intervals.
The collection of the spectral data and the chemometric analysis of the data was carried out using the WinISI II v program 1.5 (Infrasoft International, Port Matilda, PA, USA). In order to detect the presence of extrapolations of the prediction model, the identification of "outliers" samples (spectra not represented within the available NIRS calibration group) was performed using the Global Mahalanobis distance (GH). Those samples that presented GH values greater than 3 were considered outliers [16]. The samples recognized as outliers were analyzed by the in vitro digestibility procedure described by Tilley and Terry [17] modified by Alexander and McGowan [18]. Predictive performance of the NIRs calibrations model is detailed below: DMO (R 2 = 0.94, 1 − VR = 0.92, SEC = 1.65, SECV = 1.90, RPD = 3.42, and RER = 16.78). In which R 2 refers to the coefficient of determination of calibration, 1 − VR is the coefficient of determination of cross validation, SEC the standard error of calibration, SECV the standard error of cross validation, RPD the Ratio of Performance to Deviation and RER the ratio Error Range.

Fibers Composition
Neutral Detergent Fiber (NDF) and Acid Detergent Fiber (ADF) were determined by Near-Infrared Spectroscopy (NIRs). Calibration reference set consisted on 214 samples, from which 47 were selected for wet chemistry determinations.
The spectral information of the grounded dry (0.75 mm) stover samples was collected in the same way as for DMO. The samples recognized as outliers were analyzed in duplicate by reference methods in the LIGAL laboratory. Subsequently, the reference values were integrated into the corresponding spectral library of the calibration group, expanding and updating the NIRS prediction equations. Wet determinations of NDF and ADF were carried out following the procedures proposed by Van Soest and Robertson [19] and by Goering and Van Soest [20], respectively, adapted to the Fibertec System model 1020 digester (Foss  Tecator AB

Statistical Analysis
RILs were previously genotyped with 955,690 SNPs using a genotyping-by-sequencing (GBS) strategy at the Institute of Biotechnology of the Cornell University. Genotype matrix was filtered. SNPs with more than 50% missing data and a minor allele frequency less than 5% were omitted. Heterozygous genotypes were considered missing data. After filtering, 215,131 SNPs distributed across the maize genome were retained.
Individual and combined analyses of variance of trials were performed using the mixed model procedure (PROC MIXED) of the SAS program (version 9.4, SAS Institute Inc.: Cary, NC, USA,) [21] and the best linear unbiased estimator (BLUE) for each inbred was calculated based on the combined data for the 2-year analysis. Lines were considered as fixed while years and blocks (years) were considered random factors. The phenotype matrix was created with BLUE estimations. The comparison of means was carried out using the Fisher's protected least significant difference (LSD). Heritabilities (ĥ2) were estimated for each trait on a family mean basis as previously described by Holland et al. [22]. The genetic (rg) and phenotypic (rp) correlation coefficients were calculated using REML estimates according to a published SAS mixed model procedure [23].
A GWAS was completed with Tassel 5 [24] based on a mixed linear model using a genotype-phenotype matrix and a kinship matrix obtained by the centered IBS method [25]. Among the mixed linear model options, we used the optimum compression level and P3D to estimate the variance components.

SNPs, QTL and Candidate Gene Selection
The experiment-wise threshold for a significant association between a trait and a SNP was p = 1 × 10 −4 after verifying that at that point the observed F test statistics deviated from the expected F test statistics in the Q-Q plot of the model (Supplementary Figure S1). We considered a ±700 kbp confident interval region around each significant SNP following previous association studies using the same mapping population [26]. In case confidence intervals of two SNPs overlapped they were assigned to a single QTL. The two described genes that delimit the ±700 kbp region around the SNP in the reference genome assembly version 2 were positioned in version 4 of the reference genome, and all genes contained in the region delimited by those genes were then identified and characterized based on the maize B73 reference genome assembly (version 4) available on the MaizeGDB browser [27] (Supplementary Table S1).

Results
RILs of the MAGIC population together with the eight founders were evaluated for digestibility traits in stover samples. Means and ranks are shown in Table 2. Means among inbred founders and among RILs differed significantly for DOM and for fibers, whereas heritability estimates for those traits were also significant and moderate ( Table 2).

Genotypic and Phenotypic Correlations
We found high positive correlations, both genotypic and phenotypic, between NDF and ADF. However, correlation coefficients between cell wall fibers and DOM were lower and negative (Table 3).

Association Analysis
A marker was considered significantly associated with a trait at p values less than 1.00 × 10 −4 (−log10 (p-value) = 4.0) following Q-Q plot for all traits. We considered a +/−700 kbp region as SNP confident interval and two SNPs were considered the same QTL when their confident intervals overlapped. We found a total of 26 SNPs that corresponded with 17 QTL. We identified seven SNPs associated with five DOM QTL, 11 SNPs, clustered in eight QTL, associated with NDF and eight SNPs, corresponding with four QTLs, associated with ADF (Table 4). Major frequency alleles contributed always to increased fiber content and digestibility of the organic matter levels except for one SNP associated to ADF. The percentages of variances explained by each significant SNP ranged from 5 to 9%. The significant SNPs found in the current study were distributed in bins 1.07, 3.05, 5.03 and 8.03 for digestibility of the organic matter; bins 1.08, 3.02, 3.05, 5.09 and 7.02 for NDF, and in bins 7.02, 3.05, 5.04 and 6.04 for ADF (Table 4). Table 4. SNPs and QTL significantly associated for final use of maize stover traits Digestibility of the organic matter and fibers), including SNP's chromosome, bin and position within chromosome, allelic variants, number of lines with an allelic variant, and additive effect for the SNP, proportion of total variance explained by the SNPs and p-value for the association between the SNP and the phenotype. indicates the chromosome and the number after the last underscore indicates the QTL order within the chromosome. c : The number before the underscores indicates the chromosome number and the number after the underscore indicates the physical position in bp within the chromosome. d : Chromosome. e : A bin is the interval that includes all loci from the leftmost or top Core Marker to the next Core Marker. The genetic maps are divided into 100 segments of approximately 20 centiMorgans designated with the chromosome number followed by a two-digit decimal) [27]. f : The letter before the diagonal is the nucleotide with the largest value; and the letter after the diagonal is the nucleotide with the smallest value. g : No: number of inbred lines homozygous for a determined allelic variant, the number before the diagonal represents the number of homozygous with the largest mean value; and the number after the diagonal the number of homozygous with the smallest mean value. h : Additive effect: the additive effect was calculated as half the difference between the mean of the homozygous for the allele with the largest value and the mean of the homozygous for the allele with the smallest value. i : Percentage of phenotypic variance explained by each marker.

Candidate Gene/Enzyme Selection
Based on the annotated functions of the identified genes and enzymes within the supporting intervals for QTLs associated with DOM, we spot two genes that could be considered as good candidates: Rhamnogalacturonan lyase (Zm00001d031993) which degrades the rhamnogalacturonan I backbone of pectin [28] and (tf 38) (Zm00001d032024) which was shown to be under-expressed in bmr1 and bmr2 mutant plants [29].
In the case of cell wall fibers, we selected several candidate genes and enzymes that could be behind the significant associations between SNP polymorphisms and NDF content: MYB 51 (Zm00001d018465), MYB 22 (Zm00001d039475), glutathione reductase (Zm00001d018454), UDP-glucose dehydrogenase (Zm00001d033188). The gene 5,10-methylenetetrahydrofolate reductase (Zm00001d042247) is proposed as candidate gene for the overlapping QTL for NDF and ADF contents. In addition, the enzyme glutathione dehydroascorbate reductase (Zm00001d037240) is suggested as a probable candidate for the other QTL detected for ADF content (Table 5). Among the genes suggested as being associated with fibers there are genes that encode transcription factors that may be involved in cell wall development (MYB tf); enzymes induced in oxidative stress scenarios (glutathione reductase) and a gene involved in lignin polymer modifications (MTHFR).

Means Comparisons, Correlations and Heritability
We have observed genetic variation among RILs for all traits; therefore, this allows us to propose the best RILs that could be next crossed to testers in order to determine their suitability for being parents of commercial hybrids. Besides, as this MAGIC population is characterized by the lack of Reid germplasm, we could expect good hybrids combinations with inbreds of this heterotic group. Simultaneously, the best RILs could be used as base materials for selection programs oriented to provide, in the medium or long run, new inbreds with improved characteristics.
Correlations between fibers are significant and follow the trends reported in the literature [6,8,30,31]. This suggests an important co-variation among the main fibrous components of the cell wall. Attending to digestibility, in this MAGIC population we found a medium-low correlation coefficient between fibers and digestibility following the trend reported in the literature but with lower values [8,31]. This low biological significance may be caused by the development of the MAGIC population. The eight-way cross of the founder lines was random mated for six generations and after this, plants were self-pollinated during six generations using the single seed descent method [15]. After the multiple recombination events the initial associations that could be found among the founders can be lost or weakened in the RILs.
In addition, the moderate heritability estimates for DMO and fibers, that agree with the ones obtained in previous studies [32][33][34], point to the feasibility of improvement through phenotypic selection. However, this type of selection is time and labor-consuming. Another strategy, to be used in combination with phenotypic selection, could be a marker assisted selection (MAS) using markers significantly associated with the traits under study. Nevertheless, MAS should be excluded because of the low percentage of the variability explained by the QTL detected in the current study. In this sense, in a whole genome base a genomic selection could be implemented as an alternative to phenotypic or marker assisted selection as next described.

QTL Co-Localizations
Several genomic studies have focused in enhancing feedstock digestibility and ruminant energy intake, thus regions where we found markers associated to forage digestibility are included within the supporting intervals of QTLs previously described for the same traits. In this way, markers associated with DOM localized with QTLs for this trait portrayed by Barrière et al. [32,35] and Meng et al. [33] in bin 1.07; by, Wang et al. [8] and Courtial et al. [36] in bin 3.05, and by Bohn et al. [37] in bin 8.06.
Similarly, the QTL detected for both NDF and ADF co-localized in the bin 3.05 with QTLs previously reported by other authors [4,32,34,35,38,39]. Additionally, markers associated with ADF in this study colocalized with described QTLs for the same trait by Wang et al. [8] in bins 5.04 and 7.02.
This coincidence of co-localizations between traits related to digestibility makes bin 3.05 and the QTLs contained on it good candidates for future breeding programs. In addition, although digestibility and fibers have been extensively studied, some genomic regions identified in the current study can be added, such as QTLs in bin 1.08 and 5.09 for NDF and in bin 5.03 for DOM.
Moreover, in the same genomic regions that we found associated with forage quality traits, we also found QTLs previously reported for hydroxcycinnamic acids [40][41][42][43] and other important cell wall components such as hemicellulose, cellulose or lignin content [39,41,44].
To summarize, even though correlations among DOM and fibers were significant, the values obtained were not high enough to support indirect selection programs for enhancing DOM based on fibers. Thus, these results support a direct breeding strategy for the improvement of DOM, based preferably on genomic selection. Genomic selection is based on trait values predicted as the sum of and individual's breeding value across all the markers distributed along the genome. It has been observed that this selection strategy leads to high correlations between predicted and true breeding value for a quantitative trait [45].

Candidate Genes/Enzymes Proposal
From the listed genes/enzymes found within the confidence interval for each SNP, we explain here why genes involved in cell wall synthesis and deconstruction, oxidative stress, and transcriptional control of genes involved in the phenylpropanoid pathway could be considered as good candidate genes for DOM or fiber content.
Although lignin content arises as the main target to increase stover digestibility, manipulating cell wall polysaccharides appears as an alternative to improve roughage digestibility [5]. Thus, enzymes that participate both in the synthesis and modification of cell wall polysaccharides have been considered good candidates for DOM and cell wall fibers QTLs [5,[46][47][48]. Within the supporting intervals of qDOM_1_1 QTL we found a rhamnogalacturonan lyase, an enzyme that degrades the rhamnogalacturonan I backbone of pectins. Despite its low abundance in maize stems and leaves, pectin fraction may influence digestibility; in this sense, this lyase enzyme can reduce overall content of rhamnogalacturonan I but may also induce morphological changes in the pectin fraction [28,49]. Transgenic poplar lines expressing an Arabidopsis thaliana gene encoding rhamnogalacturonan lyase showed enhanced cell-cell separation and increased accessibility of cellulose and xylan to hydrolytic enzyme activities, which would have enhanced digestibility [50]. Increasing the pectin content of the cell walls has been proposed as a target for improving digestibility as pectin degradation, in contrast to cellulose or hemicellulose degradation, is more quickly digested [5]. On the other hand, and in reference to fibers, we found an UDP-glucose dehydrogenase within the supporting interval of qNDF_1_1. This enzyme oxidizes UDP-Glucose to the corresponding uronic acid and thereby controls the pool of the common precursor, UDP-D-glucuronate, of major cell wall polysaccharides (UDP-D-xylose and UDP-L-arabinose). This gene could be a good candidate for NDF due to its putative regulatory role in sugar interconversions within cell-wall polysaccharide synthesis [51,52].
Oxidative stress has a negative effect on biomass and plant fitness [53][54][55] so we hypothesized that plant mechanisms to protect from ROS damage could contribute to enhanced plant tolerance, and thereby contributing to enhance plant development, biomass and fiber composition. Accordingly, we suggest a glutathione reductase as candidate for qNDF_5_1 and a glutathione dehydroascorbate reductase for qADF_6_ and also catalyzes the GSH-dependent reduction of dehydroascorbate (DHA) to ascorbate, a reaction implicated in plant redox homeostasis.
Lignin has been pointed out as the most important polymer in the determination of biomass recalcitrance [56]. This role has led to the identification of the genes involved in monolignol biosynthesis and polymerization, as well as the transcription factors that regulate lignin synthesis. The maize MYBs are classified into 37 groups [57], according to the phylogeny, expression patterns, and structural and functional characteristics. Those groups encompass a large number of plant biological functions such as regulation of secondary metabolism, the control of cell shape, the response to various stress conditions, or hormone responses in higher plants [57]. In the current association study, we identified two genes encoding MYB tf that lie within the QTL confidence intervals. We found one MYB tf (MYB51, Zm00001d018465), included in the group "Defence/Cell Wall development", within the confidence interval of qNDF_5_1; and another (MYB22, Zm00001d039475) within the confidence interval of qNDF_3_1 clustered in the group "Cell Wall development/thickening". We considered those genes as candidates for NDF as it is considered as proxy of total cell wall concentration, thus any gene that regulates cell wall development or thickening would probably influence NDF content [58].
Similarly, within the confidence interval of a DOM QTL, we found a gene encoding MYB tf 38. Guillaumie et al. [29] studied the differential expression of phenylpropanoid pathway related genes in brown midrib mutants. They found that MYB tf 38, was under expressed in brm1 and brm2, maize mutants characterized by increased digestibility. They found that ZmMYB 38 protein was the closest maize sequence to Eucalyptus EgMYB2, which regulated secondary cell wall formation and lignin biosynthesis [29,59].
Finally, within the supporting interval of a QTL associated with both NDF and ADF, we spot a 5,10-methylenetetrahydrofolate reductase (MTHFR). This enzyme that catalyzes the conversion of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate, can disturb lignin biosynthesis. MTHFR enzyme is involved in the metabolism of methionine, a precursor of the methyl donor S-adenosyl-L-methionine (SAM) [60,61]. Because SAM is consumed by both caffeoyl-CoA O-methyltransferase (CCoAOMT) and caffeic acid/5-hydroxyferulic acid O-methyltransferase (COMT) [60], alterations in the accumulation of SAM are expected to reduce the accumulation of both G-and S-lignin monolignols affecting cell wall composition and performance. Therefore, we consider MTHFR as an interesting candidate considering that is linked to both acid and neutral detergent fiber composition.
To sum up, some genes are presented as candidates for being involved in genomics of the traits under study based on their annotated function. Nevertheless, other genes that are not yet characterized, or encoding genes and "hypothetical protein" could also be potential genes of interest. The regions identified by the QTL in this study are starting points to narrow down candidate genes, but may also help to advance the understanding of maize's genetic complexity for digestibility and fiber composition. The annotated associations need to be further validated in subsequent studies.

Conclusions
Markers underlying QTL and target genes in this research allow a better understanding of the factors that can globally impact the traits under study, and may help to establish breeding programs for increasing components that directly influence the final use of the maize feedstock. Of special interest are those QTLs detected in this study that co-localized with other previously described digestibility QTLs, since they emphasize the influence and importance of said genomic regions. The best method for improving DMO would be direct genomic selection using genotypic data across the whole genome. However, in view of the heritability values obtained, a phenotypic selection could be also adequate.