Multi-Allelic Haplotype-Based Association Analysis Identifies Genomic Regions Controlling Domestication Traits in Intermediate Wheatgrass

Intermediate wheatgrass (IWG) is a perennial forage grass undergoing a rigorous domestication as a grain crop. As a young grain crop, several agronomic and domestication traits need improvement for IWG to be relevant in current agricultural landscapes. This study genetically maps six domestication traits in the fourth cycle IWG breeding population at the University of Minnesota: height, seed length, seed width, shattering, threshability, and seed mass. A weak population structure was observed and linkage disequilibrium (r2) declined rapidly: 0.23 mega base pairs at conventional r2 value of 0.2. Broad-sense heritabilities were overall high and ranged from 0.71–0.92. Association analysis was carried out using 25,909 single SNP markers and 5379 haplotype blocks. Thirty-one SNP markers and 17 haplotype blocks were significantly associated with the domestication traits. These associations were of moderate effect as they explained 4–6% of the observed phenotypic variation. Ten SNP markers were also detected by the haplotype association analysis. One SNP marker on Chromosome 8, also discovered in haplotype block analysis, was common between seed length and seed mass. Increasing the frequency of favorable alleles in IWG populations via marker-assisted selection and genomic selection is an effective approach to improve IWG’s domestication traits.


Introduction
Intermediate wheatgrass [IWG, Thinopyrum intermedium (Host) Barkworth & D.R. Dewey subsp. intermedium, 2n = 6x = 42] is a perennial forage grass species currently under domestication as a new grain crop [1]. In addition to its grain's use as human food [2], IWG offers remarkable ecosystem services such as reduction in soil erosion, surface nutrient runoff, nutrient leaching to groundwater, and increased carbon sequestration [3][4][5]. A close relative of wheat (both belong to the Triticeae tribe), IWG was introduced into the United States from the Maikop region of Russia in 1932 [6]. The crop was recently adopted by multiple breeding programs in an attempt to domesticate it as a food crop.
The University of Minnesota (UMN) started its IWG domestication program in 2011 from 2,560 individuals that were derived 66 mother plants of the third recurrent selection cycle at The Land Institute in Salina, KS, USA [7]. The main goals of the breeding program are to improve grain yield and yield component traits, domestication traits such as seed shatter, free threshing grain, and seed size as well as other agronomic characteristics such as height, resistance to lodging, uniform flowering and maturity, and disease resistance. Between 2011 and 2020, the UMN IWG breeding program has completed four selection cycles and extensive progress has been made in improving these traits. This progress throughout the years enabled the breeding program to release the first synthetic IWG cultivar for grain production, MN-Clearwater, in 2019 [8].
Despite the progress made in improving IWG germplasm, IWG is still in the early stages of domestication. Several key traits need major improvement to help improve the crop's agronomic performance. A set of traits, commonly known as the "domestication syndrome", distinguishes most food crops from their wild progenitors as they essentially inspired humans to adopt plant cultivation [9,10]. Examples of domestication traits include larger seed size (mass and dimensions), reduced seed shattering, better grain threshing, i.e. de-hulled state, and short plant height. Intermediate wheatgrass was selected out of 100 other perennial grain species because of its easily threshed seeds, relatively large seeds, and resistance to shatter [1]. Yet, after nearly four decades of selection, further improvement of these traits is still needed to ensure the crop's success in the future.
One approach to improve traits of interest is by recurrent phenotypic selection. This was the primary method of IWG germplasm improvement until the mid-2010s when Zhang et al. [7] optimized a genomic selection-based breeding approach in IWG. Vital to the success of genomic as well as marker-based selection methods is the availability of genome-wide markers that provide adequate coverage of loci controlling the traits of interest [11]. Discovery of markers in strong linkage with trait loci can assist in markerassisted selection of suitable genotypes as well as in discovery of candidate genes [12]. Such marker-trait associations (MTAs) are generally uncovered by association mapping studies or genome-wide association studies that evaluate large populations for traits of interest and regress the trait data on marker information to identify the MTAs. As the significance of MTAs can confound with allelic frequency due to population structure, it is important to correct for existing population structures to avoid false associations [13]. In recent years, association studies have increasingly used mixed models with genetic relationships as cofactors to correct for population structure to lower the number of false signals [14]. When paired with high-throughput marker systems, association studies report higher levels of marker polymorphism and allelic variation, thus aiding the discovery of useful, novel alleles associated with the traits [15,16].
Single nucleotide polymorphism (SNP) markers are the widely preferred high-throughput marker system for genomic studies including gene mapping approaches. They are abundant in most genomes, offer better discriminatory power relative to other marker types, and are easily converted to molecular markers for wet-lab assays [17]. SNP markers discovered via partial or whole-genome sequencing also enable the discovery of population-specific SNPs free of ascertainment bias at a low cost per data point [18][19][20]. In most methods of association analysis, SNP markers are independently tested for their association with the trait, but this can be problematic for complex traits controlled by several genomic loci [21,22]. In addition, a single-SNP model might not be able to capture the true allelic diversity given their usual bi-allelic nature [23]. Testing multiple SNP markers, either with a multi-locus model that simultaneously evaluates adjacent markers or with haplotype blocks where tightly linked markers are converted to a single multi-locus haplotype block, is proposed to better represent the genetic architecture of complex traits [24,25].
This research study was therefore carried out with the following objectives: (i) discover genome-wide markers and describe the haplotype distribution in the fourth cycle IWG breeding population at the University of Minnesota; (ii) evaluate trait relationships and heritabilities, linkage disequilibrium, and population structure; and (iii) use single SNPs and multi-allelic haplotypes to characterize genomic regions controlling six important domestication traits: plant height, seed length, seed width, shattering, and threshability.

Plant Population
The population used for association analysis in this study is the fourth recurrent selection cycle (C4) at the University of Minnesota (UMN_C4 hereafter). This population was obtained by crossing 73 cycle 3 (UMN_C3) IWG genets selected based on their superior genomic estimated breeding values (GEBVs) obtained from genomic selection models trained on the UMN_C3 field data collected during 2017-2019. A genet is defined as a genetically unique organism and refers to individual plants in an outcrossing species such as IWG [7]. These 73 parental genets were selected mainly for larger seed size, better Agriculture 2021, 11, 667 3 of 15 threshability, and reduced seed shattering, while strong selection pressure was also placed on higher grain yield and reduced plant height. The parental genets were vernalized at 4 • C for 8 weeks during November-December 2017 and intercrossed in a greenhouse during January-March 2018. From each mother plant, nine random seeds were germinated in June 2018, cloned into two replicates in August 2018, and transplanted in the field in September 2018 at two MN locations: Crookston and St. Paul. Transplanted clones were planted approximately 0.91 m (3 ft) apart and IWG border plants surrounded the plots on all sides. In both cropping years (2019, 2020), 45 kg ha −1 of N was applied in April in St. Paul and in May in Crookston. Weed control in the plots was done with mechanical cultivation and manual labor. The herbicide Dual II Magnum (S-Metolachlor 82.4%, Syngenta) was applied in April of both years at a rate of 1.2 L ha −1 . After harvesting the grain, plants were mowed to 15-20 cm. In the final association analysis, 637 genets were used after discarding genets that were lost due to plant death or poor genotypic data.

Genotyping
DNA was extracted from 10-15 cm long leaf tissue using the BioSprint 96 DNA Plant Kit (QIAGEN, Valencia, CA). Extracted DNA was quantified using picogreen, normalized to 10 ng/µL, and digested with the enzymes PstI and MspI. Samples were pooled at 192-plex to form sequencing libraries that were sequenced on Illumina's Novaseq 600. Generated read sequences were passed through a quality filter of Q > 30 and de-multiplexed to obtain reads for each genet. Reads were aligned to the IWG reference genome v2.1 [26] using 'bwa mem' [27], and SNP-calling was done with 'samtools mpileup' and 'bcftools' [28]. SNPs with minor allele frequency (MAF) of less than 3% and more than 20% missing data were removed which resulted in 25,909 high quality SNPs. This SNP-set was imputed in Tassel version 5.2.71 [29] with the LD-kNNi method [30] using 30 nearest neighbors. To estimate imputation accuracy, known genotypes of 20% alleles in the input file were masked before imputation then compared with allelic predictions of the masked genotypes. Forcedimputation was not carried out if the missing genotype of a locus could not be resolved.

Phenotyping and Statistical Analysis
The UMN_C4 IWG population was evaluated at Crookston, MN and St. Paul, MN in 2019 and 2020 for multiple agronomic and domestication traits, of which we focus our association analysis on six important domestication traits: (1) plant height, (2) seed length, (3) seed width, (4) shattering, (5) free grain threshing (threshability hereafter), and (6) seed mass measured in terms of thousand kernel weight (TKW). To measure these traits, 10 mature spikes were harvested per plant and dried at 32 • C for 72 h. Plant height was measured 1-7 d before harvest by measuring the length from base (ground) to the tallest inflorescence. Shattering was measured as the percentage of seed, spikelets, and occasionally a portion of the spike itself that broke off from the spike. It was measured on a 0-9 scale where 0 is no shattering observed and 9 is 90% or more shattering. Spikes were threshed using a Wintersteiger LD 350 (Wintersteiger Inc, Salt Lake City, UT, USA), and threshability was measured on a 0-9 scale where 0 was assigned to completely hulled grain and 9 for 90% or more de-hulled (naked) grain. Approximately 100-300 de-hulled grain of each genet were scanned using a Marvin seed analyzer (MARViTECH GmbH, Wittenburg, Germany) to obtain seed dimensions. The same imaged seeds were weighed to obtain TKW.
A mixed model in the R package 'lme4' was used to correct the trait data for trial effect (i.e., environmental variability) and obtain the best linear unbiased estimate (BLUE) for each genet. Specifically, BLUEs were obtained by removing the fixed effect estimate for each environment from the trait value for each genet in that environment. Broad-sense heritability (H) of each trait was calculated on a genet-mean basis using the following formula: H = σ g 2 /(σ g 2 + σ gl 2 /l + σ gy 2 /y + σ e 2 /ly), where σ g 2 is the genetic variance, σ gl 2 is the genotype x location variance, σ gy 2 is the genotype x year variance, σ e 2 is the residual variance, l is the number of locations, and y is the number of years.

Linkage Disequilibrium and Population Relatedness
Pairwise linkage disequilibrium (LD) among the markers was calculated in Tassel 5.2.71 with a sliding window of 50 markers. Obtained LD (r 2 ) values were plotted against the physical distance obtained from IWG v2.1 reference genome and a locally weighted polynomial regression (LOWESS) curve was fitted to display the LD decay. Decay distance of LD was estimated using the method of Hill and Weir [31], and assessed at the conventionally accepted r 2 value of 0.2 [32]. Genetic relatedness among the genets was estimated in Tassel 5.2.71 using the Centered_IBS method with default parameters. Population structure was analyzed using principal component (PC) analysis with the function prcomp in R.

Haplotype Construction
Haplotype blocks were constructed for each chromosome separately using HAPLOVIEW 4.2 [33]. Block construction was done based on the confidence interval algorithm of Gabriel et al. [34] that considers LD (D') among the markers or loci within and outside a proposed block. SNP markers were considered to be in strong LD if the lower boundary confidence interval of D' was ≥0.80 and the upper boundary was ≥0.98. Haplotype blocks were converted into multi-allelic markers by considering allelic combinations within each block as independent alleles. Thus, haplotype blocks were numbered in ascending order (1,2,3 . . . n) if a block was not observed before [35]. Singletons, i.e., haplotypes defined by a single SNP, were not used in the association analysis that used multi-allelic haplotype markers.

Association Analysis
Association analysis was carried out using the trait BLUEs, as described above. Single SNP analysis was done using the 'FarmCPU' model in the R package 'GAPIT' [36,37]. FarmCPU is a multi-locus mixed linear model and is known to control false positive and false negative associations. Association analysis using multi-allelic haplotype data was done in Tassel 3 with the mixed linear model [29]. Because of a weak population structure, PC values were not included in the model for association analyses. In both association analyses, significant quantitative trait loci (QTL) were declared at default Bonferroni thresholds at α = 0.05, i.e., i.
For multi-allelic haplotype blocks: at α/no. of observations = 0.05/5379 = p value of 9.30 × 10 −06 or LOD equivalent of 5.03 In both analyses, the percentage of phenotypic variation explained by the significant markers (R 2 ) were estimated following the method of Sen and Churchill [38] as implemented in the R package 'qtl' [39]. Favorable alleles in this study are defined as alleles that contribute towards the improvement of trait values for seed length, seed width, threshability, and thousand kernel weight (TKW) and that reduce trait values of plant height and shattering.

Genotyping and Population Properties
Allele calling using the IWG reference genome v2.1 resulted in discovery of 4,782,922 genome-wide SNP markers in the UMN_C4 population. After discarding SNP markers with minor allele frequency lower than 3%, 1,820,973 remained, which subsequently reduced to 25,909 after removing markers with missing allele proportion of >20%. The average marker distribution per chromosome was 1234 and ranged from 588 (Chromosome 21) to 2177 (Chromosome 20). Missing allelic information was imputed using the LD-kNNi method in Tassel using 30 nearest SNPs. Imputation accuracy on this set of 25,909 SNPs was 94.7%. The imputation step also lowered the highest missing allele proportion from 20% to 3.1%. The overall population heterozygosity was 34%.
A weak population structure was observed among the genets in UMN_C4 as the first two principal component (PC) axes explained 5.1% of the total genetic variation ( Figure 1A); the first 10 PC axes explained <16% of the total genetic variation. Mean genome-wide LD (r 2 ) value was 0.05 and ranged from 0.04 (Chromosome 21) to 0.06 (Chromosome 11). The physical distance at which LD reduced to half its value at conventional r 2 = 0.20 was found to be 0.23 mega base pairs (Mbp) ( Figure 1B).

Haplotype Distribution
Haplotype construction using HAPLOVIEW resulted in the placement of 14 (58%) out of 25,909 SNP markers in 5379 haplotype blocks (Table 1). On average, chromosome housed 256 haplotype blocks with the least number of blocks (127) on C mosome 21 and the highest number of blocks (453) on Chromosome 20. The average n ber of unique haplotype blocks per chromosome was 8. Most haplotype blocks w formed with 2 SNP markers (58%) followed by 3 (23%). The maximum number of markers in the blocks was 11 (<0.5%). Haplotypes were converted to multi-allelic mar by considering allelic combinations within each block as independent alleles. The num of alleles (i.e., nucleotide base calls) in these multi-allelic markers ranged between and 68 per block, with an average of 11 alleles (median = 7). The most frequent haplo block consisted of seven allelic combinations (10%), followed by four and six (both a frequency).

Haplotype Distribution
Haplotype construction using HAPLOVIEW resulted in the placement of 14,951 (58%) out of 25,909 SNP markers in 5379 haplotype blocks (Table 1). On average, each chromosome housed 256 haplotype blocks with the least number of blocks (127) on Chromosome 21 and the highest number of blocks (453) on Chromosome 20. The average number of unique haplotype blocks per chromosome was 8. Most haplotype blocks were formed with 2 SNP markers (58%) followed by 3 (23%). The maximum number of SNP markers in the Agriculture 2021, 11, 667 6 of 15 blocks was 11 (<0.5%). Haplotypes were converted to multi-allelic markers by considering allelic combinations within each block as independent alleles. The number of alleles (i.e., nucleotide base calls) in these multi-allelic markers ranged between two and 68 per block, with an average of 11 alleles (median = 7). The most frequent haplotype block consisted of seven allelic combinations (10%), followed by four and six (both at 9% frequency).

Trait Distribution, Correlation, Heritability
A broad range of phenotypic variation was observed in the UMN_C4 IWG population for the traits plant height, shattering, threshability, seed length, seed width, and seed mass (measured as thousand kernel weight, TKW) ( Figure 2). The mean plant height was 132 cm with the shortest plant measuring 64 cm and the tallest plant measuring 162 cm. The average shattering score was 3, with a majority of the population, i.e., 431 genets (68%) exhibiting high resistance to seed shatter (≤30% shatter). The average threshability score was 6, and a majority of the population, i.e., 407 genets (64%), had a high proportion of de-hulled grain (≥60% de-hulled). The shortest seed length and seed width were 5.2 mm and 1.5 mm, respectively; the longest seed length and seed width were 7.5 mm and 2.0 mm, respectively. Population means for seed length and seed width were 6.2 mm and 1.7 mm, respectively. The smallest seed mass (TKW) was 5.7 g and the largest was 12.3 g with an average of 8.3 g.
Pairwise trait correlations varied by trait pairs ( Table 2). The strongest positive correlation was observed between seed length and TKW with the coefficient of correlation, r = 0.67 followed by seed width and TKW (r = 0.66) and shattering and seed width (r = 0.34). The strongest negative correlation was observed between seed width and threshability (r = −0.42), followed by shattering and threshability (r = −0.37). Threshability was negatively correlated with all other domestication traits. Plant height was moderately, yet significantly, correlated with seed length (r = 0.20), seed width (r = 0.11), and TKW (r = 0.20). Pairwise trait correlations varied by trait pairs ( Table 2). The strongest positive correlation was observed between seed length and TKW with the coefficient of correlation, r = 0.67 followed by seed width and TKW (r = 0.66) and shattering and seed width (r = 0.34). The strongest negative correlation was observed between seed width and threshability (r = −0.42), followed by shattering and threshability (r = −0.37). Threshability was negatively correlated with all other domestication traits. Plant height was moderately, yet significantly, correlated with seed length (r = 0.20), seed width (r = 0.11), and TKW (r = 0.20).

Association Analysis
Association analysis using the FarmCPU method with single SNP markers resulted in detection of 31 significant marker-trait associations (MTAs) in 14 chromosomes (Table   (Table S1). The highest H was observed for seed length (0.92), and the lowest was for shattering (0.71). The estimate of H was 0.73 for plant height, and 0.79 for both threshability and TKW.

Association Analysis
Association analysis using the FarmCPU method with single SNP markers resulted in detection of 31 significant marker-trait associations (MTAs) in 14 chromosomes (Table 3, Figure 3 and Figure S2). The greatest number of MTAs were detected in Chromosome 20 (7). Five MTAs were discovered for plant height and explained a total of 23% of observed phenotypic variation. Two of the five MTAs contributed favorable alleles, i.e., they conditioned for shorter plant height with the largest favorable allele contributed by the marker S20_613713856. Seed length had the most MTAs identified, with eight, and were distributed in seven chromosomes; these eight MTAs explained a combined phenotypic variation of 36.6%, and three contributed favorable alleles. Five MTAs were discovered for seed width on four chromosomes (2,3,5,9) that explained 23% of the phenotypic variation and alleles of three of the five significant markers had positive phenotypic effect. Four MTAs were discovered for shattering in four chromosomes that explained a total of 18.6% of the phenotypic variation. Alleles from all but one SNP marker were favorable and contributed towards reduction in seed shatter. Six MTAs were discovered for threshability in five chromosomes that explained a total of 26.6% of the phenotypic variation. Of the six MTAs for threshability, alleles of four SNP markers contributed toward better threshability and two against. Seed mass (TKW) had the least number of MTAs identified at three in three chromosomes that explained 14% of the total observed phenotypic variation in the UMN_C4 population. Only one out of the three significant SNP markers had a favorable allele. MAF for all observed MTAs ranged from 7.2% to 44.4%. No large-effect QTL were observed as all MTAs accounted for phenotypic variation in the range of 4.0-6.5%. The SNP marker S08_323540788 on chromosome 8 was a common MTA between seed length and TKW; no other common markers were observed among the remaining traits. The average minor allele frequency (MAF) of all significant loci was 0.22 Table 3. SNP markers, significantly associated with the domestication traits height, seed length, seed width, shattering, threshability, and thousand kernel weight (TKW) in the UMN_C4 intermediate wheatgrass population. 'MAF' is minor allele frequency, 'LOD' is −log 10 p-value of association analysis, and 'R 2 ' is the proportion of phenotypic variance explained by the significant haplotype block. Column 'Hb' indicates if the SNP marker belongs to a haplotype block, 'NA' means the SNP marker was not binned into a haplotype block. Association analysis using the haplotype blocks resulted in detection of 17 significant haplotype-trait associations (HTAs) in 11 chromosomes (Table 4). Chromosome 8 had the highest frequency (3) of HTAs. Six HTAs were discovered for plant height that explained a total of 26.4% of observed phenotypic variation; three of which were favorable. Three HTAs were discovered for threshability that explained a total of 17.4% of phenotypic variation; two of the three HTAs had favorable alleles, i.e., they contributed towards improved threshability. Seed length, seed width, shattering, and seed mass (TKW) each had two HTAs detected and explained a combined phenotypic variation of 11%, 11.7%, 11.9%, and 11.1%, respectively. One of the two HTAs for seed length and seed width had favorable alleles, whereas both significant alleles within HTAs for TKW were unfavorable and contributed towards smaller seed mass. No large-effect loci were detected as all HTAs accounted for phenotypic variation in the range of 4-5%. The haplotype block Chr08-Hb.228 was common between seed length and TKW; no other common blocks were observed among the remaining traits.  Association analysis using the haplotype blocks resulted in detection of 17 significant haplotype-trait associations (HTAs) in 11 chromosomes (Table 4). Chromosome 8 had the highest frequency (3) of HTAs. Six HTAs were discovered for plant height that explained a total of 26.4% of observed phenotypic variation; three of which were favorable. Three HTAs were discovered for threshability that explained a total of 17.4% of phenotypic variation; two of the three HTAs had favorable alleles, i.e., they contributed towards improved threshability. Seed length, seed width, shattering, and seed mass (TKW) each had two HTAs detected and explained a combined phenotypic variation of 11%, 11.7%, 11.9%, and 11.1%, respectively. One of the two HTAs for seed length and seed width had favorable alleles, whereas both significant alleles within HTAs for TKW were unfavorable and contributed towards smaller seed mass. No large-effect loci were detected as all HTAs accounted for phenotypic variation in the range of 4-5%. The haplotype block Chr08-Hb.228 was common between seed length and TKW; no other common blocks were observed among the remaining traits.  Table 4. Haplotype blocks in significant association with the domestication traits height, seed length, seed width, shattering, threshability, and thousand kernel weight (TKW) in the UMN_C4 intermediate wheatgrass population. 'LOD' is −log 10 p-value of association analysis, and 'R 2 ' is the proportion of phenotypic variance explained by the significant haplotype block. 'SigSNP' indicates SNP markers within the significant haplotype block that were also significant in the single marker analysis; 'NA' means no SNP marker was binned into the haplotype block.

Comparison of Mapping Results between Marker Types
Few SNP markers discovered in the single marker association analysis were also discovered in haplotype analysis. Of the 31 single significant markers, 14 were binned into 14 haplotype blocks in nine chromosomes ( Table 3). The haplotype block Chr08-Hb.228 housed the SNP marker S08_323540788, a common MTA between seed length and TKW. Of the 17 HTAs, 10 blocks had SNPs that were also significant in single SNP association analysis (Table 4). For these 10 common SNPs, pairwise correlation coefficients for their p-values of marker-trait associations, percentage of phenotypic variation explained (R 2 ), and additive allelic effect were 0.25, 0.41, and 0.99, respectively.
For the six domestication traits, more MTAs were discovered relative to HTAs, i.e., 31 MTAs and 17 HTAs. Cumulative R 2 values of the MTAs were therefore larger than cumulative R 2 values of HTAs for all traits except plant height (Figure 4). For plant height, the HTAs explained 27% of the observed phenotypic variation whereas the MTAs explained 22%. For remaining traits, R 2 values for MTAs exceeded BTA R 2 by 58% on average. The largest difference was observed for seed length where MTAs explained 36% of observed phenotypic variation whereas the HTAs explained only 8%.

IWG Population, Trait Properties, and Association Mapping
In this study, we used the IWG breeding germplasm from the fourth recurrent selection cycle at the University of Minnesota (UMN_C4) to analyze population properties and carry out marker-trait associations. Similar to our third cycle breeding germplasm (UMN_C3), UMN_C4 also showed a rapid decline in LD with decay at 0.23 Mbp when r 2 = 0.2 [40]. Also similar to UMN_C3, UMN_C4 lacked strong population stratification as the first two PC axes explained only 5.1% of the variation; this value was 3.9% in UMN_C3. In another study that combined several IWG populations from the University of Minnesota and The Land Institute (Salina, KS, USA), the first two PC axes explained 4.9% of the genotypic variation [41]. This lack of a strong population structure in IWG breeding populations is likely due to the limited number of founder parents [1]. Yet, population divergence could occur in future populations as breeding programs impose differential selection pressure on various agronomic and local adaptation traits.
We evaluated six important domestication traits, plant height, seed length, seed width, shattering, threshability, and seed mass expressed in terms of thousand kernel weight (TKW), in the UMN_C4 IWG breeding population and mapped genomic loci controlling these traits. Heritability estimates for these traits were in general high and agreed with previous studies [40,42,43]. Most traits had moderate to strong correlations among each other with few notable exceptions, e.g., r = −0.07 between height and threshability

IWG Population, Trait Properties, and Association Mapping
In this study, we used the IWG breeding germplasm from the fourth recurrent selection cycle at the University of Minnesota (UMN_C4) to analyze population properties and carry out marker-trait associations. Similar to our third cycle breeding germplasm (UMN_C3), UMN_C4 also showed a rapid decline in LD with decay at 0.23 Mbp when r 2 = 0.2 [40]. Also similar to UMN_C3, UMN_C4 lacked strong population stratification as the first two PC axes explained only 5.1% of the variation; this value was 3.9% in UMN_C3. In another study that combined several IWG populations from the University of Minnesota and The Land Institute (Salina, KS, USA), the first two PC axes explained 4.9% of the genotypic variation [41]. This lack of a strong population structure in IWG breeding populations is likely due to the limited number of founder parents [1]. Yet, population divergence could occur in future populations as breeding programs impose differential selection pressure on various agronomic and local adaptation traits.
We evaluated six important domestication traits, plant height, seed length, seed width, shattering, threshability, and seed mass expressed in terms of thousand kernel weight (TKW), in the UMN_C4 IWG breeding population and mapped genomic loci controlling these traits. Heritability estimates for these traits were in general high and agreed with previous studies [40,42,43]. Most traits had moderate to strong correlations among each other with few notable exceptions, e.g., r = −0.07 between height and threshability and r = −0.05 between seed length and threshability (Table 2). Threshability had significant yet negative correlations with shattering (r = −0.37) and seed width (r = −0.42). Negative relationships of threshability with shattering and seed dimensions have also been previously observed in different IWG populations [42,44], and present a unique challenge to a breeder. Plants that produce larger seeds with minimal to no shattering and high threshability are desired to maximize the yield potential of IWG. As threshability is negatively correlated with shattering and seed weight, improving it increases the risk of depreciating trait values for the other traits. Yet, a large enough breeding population could provide the opportunity for genotypes with desirable trait combinations to exist, e.g., individuals that have reduced shattering and high threshability with larger grain. For example, in UMN_C4, there were 59 (9%) out of 657 genets with no more than 30% seed shatter and 70% or higher threshability with grain width ranging from 1.70-1.90 mm; the widest grain was 2.03 mm in the population. A careful and strong selection pressure is thus warranted to simultaneously improve these three traits in IWG.
We carried out association analysis using multi-allelic haplotypes for the first time in IWG. Ten of 17 (59%) blocks that were significantly associated with our domestication traits also contained SNP markers discovered using the single marker association analysis (Tables 3 and 4). Differences between haplotype analysis and single marker analysis are not uncommon [35,45,46]. These differences are primarily attributed to the differences in allelic composition and frequencies as well as LD with the causative loci [47,48]. This difference was evident in our UMN_C4 population as a majority (81%) of the haplotype blocks were formed with 2-3 SNPs, likely due to a rapid decline in LD (0.23 Mbp). Because of the inherent differences between the two marker types, multiple studies have suggested using both single markers and haplotypes as they often complement each-other [45,49]. Our results corroborate that it is best to use both marker types in association analysis to obtain a better understanding of the genetic control behind complex quantitative traits in IWG.

Comparison with Existing IWG QTL
Association analysis of six domestication traits in the UMN_C4 IWG breeding population led to discovery of 31 single marker-trait associations (MTAs) and 17 haplotype block trait associations (HTAs). The locus S09_398417599 was significantly associated with plant height in UMN_C4 and falls within the lower (Chr09_370475337) and upper (Chr09_81355723) bounds of the QTL associated with stem length discovered by Larson et al. [42]. Larson et al. [42] also reported that their QTL region was proximally aligned to the barley DENSO (Semidwarf 1) gene [50]. All other MTAs and HTAs discovered for plant height appear to be unique to our population.
More than 60 QTL have been reported for seed length in different IWG populations [40,42,51]. The SNP marker S08_323540788 and the haplotype block Chr08-Hb.228 (which contained S08_323540788) were significantly associated with seed length in UMN_C4. This marker is located at a distance of <1 Mbp from the significant marker S08_323540788 for seed length discovered by Bajgain et al. [40] in an association mapping study in the UMN_C3 population. Likewise, more than 50 QTL have been reported for seed width in IWG [40,42,51]. All MTAs and HTAs detected in our study were >17 Mbp away from these previously reported QTL and thus likely are unique to our population. Of the MTAs and HTAs we discovered for TKW, the significant marker S20_314990021 is located <1 Mbp away from the QTL Ti_QSws.umn_20.2 reported by Zhang et al. [51]. In Zhang et al.'s study, the QTL Ti_QSws.umn_20.2 was associated with the traits TKW and seed area in two different populations and had allelic effect of −0.31 for TKW and −0.21 for seed area. In our study, this marker had a larger, positive allelic effect (0.32) and explained 5% of the observed phenotypic variation.
In this study, we discovered five unique associations (between MTA and BTA) on chromosomes 11, 15, 17, and 20 for shattering. In various IWG populations, more than 50 QTL have been reported for resistance to shatter [42,44]. The significant SNP marker S11_233685188 was located less than 4 Mbp and 2 Mbp from the markers S11_229911033 and S11_231841692 in studies by Larson et al. [42] and Altendorf [44], respectively. It is therefore likely that these three markers represent the same locus/gene. The significant marker S05_156358046 for threshability was also discovered in a nested association mapping population created from genets of the UMN_C2 population [44]. The SNP marker S05_156358046 explained 5% of the phenotypic variation (R 2 ) and had an allelic effect of 0.35 in our study whereas in Altendorf's study, this marker had a larger R 2 value depending on the environment (10.4-13.8%) yet smaller allelic effect (0.008-0.011). Another SNP marker, S20_583569893, is located more than 7 Mbp away from the marker S20_590832266 discovered by Altendorf et al. [44], and is therefore unlikely to be the same locus.

QTL Size and Implications in Intermediate Wheatgrass Breeding
For all domestication traits in our UMN_C4 population, we observed high broad-sense heritability estimates (H ≥ 0.71). This indicates that the variations observed for all traits were largely under the control of genetic factors. Despite a higher degree of genetic control over the traits, all genomic loci associated with the traits were of small to moderate effects (proportion of explained phenotypic variance, R 2 = 4-6%). The cumulative phenotypic variance explained by the MTAs and HTAs are likely a function of number of discovered loci as previously suggested [35]. Yet, it was interesting to observe higher cumulative R 2 for HTAs relative to MTAs. However, the small to moderate effects of the QTL pose a two-fold problem: (1) the difficulty in combining the loci and using markers associated with these loci in marker-assisted selection is higher, and (2) improving trait values and increasing genetic gain from phenotypic selection only can be more challenging because of the resources required to accumulate many small-effect loci. The problem is exacerbated in a near-obligate out-crossing species such as IWG with heterogeneous population composition and many heterozygous loci. Allelic interactions in such populations are likely to dissociate rapidly with each advancing generation [52], hindering the process of rapid accumulation of favorable alleles.
A better method to accumulate causative loci and their favorable alleles is an in silico selection approach, such as genomic selection that evaluates all genome-wide markers to predict traits of interest [53]. In particular, significant markers from association studies, when used in genomic prediction models as covariates, are known to improve traits prediction in self-pollinated as well as cross-pollinated crop species [54][55][56]. Similar results have also been observed in IWG where the use of significant markers from association mapping has improved trait predictive ability of genomic prediction models in disease and yield component traits [40,57]. When significant markers are fitted as fixed effects, the model estimates effects of remaining markers, potentially avoiding strong shrinkage due to medium-large fixed-effect markers and allows the model to predict trait values with higher accuracy [58,59]. The use of genome-wide markers, including significant markers, in genomic selection can therefore improve important traits while saving time and resources compared to phenotypic selection only.
To summarize, IWG is a novel perennial grain crop that offers substantial ecological benefits, while producing edible grain. Because the domestication timeline of IWG is very short, with modern genomic tools not applied until recently, the crop still requires significant improvements of several agronomic and domestication traits. We identified several genomic loci-31 single markers and 17 haplotype blocks-that control six important domestication traits in IWG: plant height, seed length, seed width, shattering, threshability, and seed mass. The accumulation of these loci by increasing the frequencies of their favorable alleles in IWG breeding germplasm will be instrumental in improving this crop.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/agriculture11070667/s1, Figure S1: Squared allele-frequency correlations (r 2 ) plotted against physical distance (mega base pairs, Mbp) in the UMN_C4 intermediate wheatgrass population. The decline of linkage disequilibrium is shown by plotting the LOWESS curve in blue color. Figure S2: QQ-plots generated by FarmCPU model during genome-wide association scan for significant SNP markers. A: Height (cm), B: Seed length (mm), C: Seed width (mm), D: Shattering, E: Threshability, F: Thousand kernel weight (TKW, g). Table S1  Data Availability Statement: Sequence data of UMN_C4 can be accessed through NCBI's SRA Bioproject PRJNA722274. Phenotypic data is available for research purposes upon a reasonable request.