Genome-Wide Association Analysis for Hybrid Breeding in Wheat

Disclosure of markers that are significantly associated with plant traits can help develop new varieties with desirable properties. This study determined the genome-wide associations based on DArTseq markers for six agronomic traits assessed in eight environments for wheat. Moreover, the association study for heterosis and analysis of the effects of markers grouped by linkage disequilibrium were performed based on mean values over all experiments. All results were validated using data from post-registration trials. GWAS revealed 1273 single nucleotide polymorphisms with biologically significant effects. Most polymorphisms were predicted to be modifiers of protein translation, with only two having a more pronounced effect. Markers significantly associated with the considered set of features were clustered within chromosomes based on linkage disequilibrium in 327 LD blocks. A GWAS for heterosis revealed 1261 markers with significant effects.


Introduction
Wheat (Triticum aestivum L.) is an important source of calories and proteins for humans worldwide [1]. This crop is cultivated on approximately 216 million hectares and yields over 765 million tons of grain [2]. For the last 60 years, the average yield of wheat has been increased 3-fold to the current level of approx. 3.5 t/ha [2,3], but the progress in this regard has been relatively slow in recent years. Norman Borlaug's "Green Revolution" initiated this continuous progress, which aimed to release new highly productive cultivars for developing countries. The use of heterosis and hybrid wheat breeding is a possible option for maintaining or boosting yield productivity in modern wheat cultivars. The possibility of the commercial use of wheat F1 hybrids was suggested as early as 1963 by Briggle [4]. Wheat hybrids may significantly outperform traditional cultivars regarding yield level (usually by under 5%, but in particular situations by more than 40%; [5]) because of the effect of heterosis and can reveal higher stability in various environments due to their heterozygotic nature [6][7][8]. Moreover, the certified hybrid seeds desired by producers protect the interests of breeders [9].
The availability of high-throughput genotyping techniques, such as genotyping by sequencing (GBS) or single nucleotide polymorphism (SNP) arrays, has accelerated global genetic analyses of large genome species, such as wheat [10][11][12][13][14]. The results of genomewide association studies (GWAS) can be applied to the selection of the most valuable genotypes (genomic selection). GWAS was recently used to analyze anther extrusion in wheat [15,16]. Association studies based on various types of populations for the wheat plant traits analyzed in this report were also performed; we refer to a number of them in Discussion, Sections 3.1-3.3. More complex analyses focusing on the multi-trait requirements of hybrid breeding have not yet been reported. In the breeding of traditional cultivars, the target of GWAS is relatively clear, and the objective of genomic selection is to identify genotypes with alleles associated with increased values of yield-related traits that are frequently affected by environmental changes [17][18][19][20]. The application of GWAS for genomic selection in hybrid breeding procedures is complex. The successful release of a cultivar results from selecting a proper maternal line (seed parent), paternal line (pollen parent), and prediction of the heterosis effect revealed by the final hybrid. The challenge is that a desirable trait in one of these three components can be undesirable in another. Furthermore, the best performance of the final hybrid should be achieved when numerous loci of high-yielding parents remain heterozygotic. Genomic selection in hybrid breeding should be adjusted to meet these requirements.
In addition to the agronomic performance of related traits, the condition of heterosis breeding success is the appropriate flowering biology of the parental components. Different elements of flowering biology determine the suitability of male and female wheat parents for hybrid breeding. Efficient anther extrusion and a high quantity of released pollen over an extended period are considered the most important traits desired in male candidates for the successful production of hybrid wheat seeds. For female parents as pollen recipients, more significant is the ability to open flowers in the time necessary to get pollinated for extended periods, coupled with hairy stigmas. Considering the plant height, it is recommended that the pollen parent be taller than the seed parent because the pollen of wheat is much less mobile than the pollen of typical wind-pollinated plants such as maize and rye [3,7,16,[21][22][23].
Despite several studies, the processes of flower opening and anther extrusion, a very complex and dynamic phenomenon combining anatomical, physiological, and biochemical parameters, are still not fully understood or defined [24][25][26]. Nevertheless, the variation and heritability of flowering-related traits are considered moderate to high, which provides the opportunity to select desirable parental components [7,27].
In the presented work, we use the data presented in [28], which was applied to the analysis of the genetic structure of a large pool of 509 wheat varieties and breeding lines. On adding to them the phenotypic observations obtained in two independent series of field trials, we presume that by using GWAS we will identify chromosomal regions associated with six agronomically important traits that are significant for wheat hybrid breeding.
To verify this presumption, we set the following objectives: (1) to apply the genotyping data [28] consisting of SNP marker observations determined by the DArTseq platform for GWAS, and (2) to identify chromosomal regions associated with six traits within a set of wheat genotypes across different environments.

Phenotypic Characteristics of the Accessions
Six phenotypic traits were assessed in eight environments. The distributions of genotypic means (BLUPs obtained in individual experiments; Table S1) were approximately normal; a slight skewness to the right was observed for time of flowering-FT ( Figure S1). Two phenological traits, time of heading-HT and FT, were positively correlated with each other and with plant height-PH (p-value < 0.05; Figure S1). In contrast, for PH, a negative correlation was observed with yield-related traits (number of kernels per spike-KN, weight of kernels per spike-KW, and thousand kernel weight-TKW). KW was positively correlated with TKW.
In ANOVA, all fixed effects, i.e., effects of the year (Y), localization (L), and Y × L interaction, were significant (at least at a p-value < 0.05, but most of them at p-value < 0.001). The variance components for random effects of genotypes (G) were much larger than those for G × Y and G × L interactions for traits HT, FT, and PH, which was reflected by the large heritability for these traits (88-90%; Table 1). The variance components were in the same order for yield-related traits, and therefore, heritability was smaller (38-54%). Table 1. Mean values, coefficients of variation (CV), variance components (with standard errors) for genotypes (G) and their interactions with year (Y) and localization (L), and broad-sense heritability coefficients for the six traits. A comparison of the distributions of genotypic means in particular experiments showed that accessions headed and flowered earlier and were shorter in 2018 than in 2019 ( Figure 1A), and that the yield parameters were higher in 2018. The weather conditions, as characterized by monthly precipitation and temperatures, differed slightly between the two growing seasons. In 2017/2018, winter temperatures were lower, and summer precipitation was slightly higher than in 2018/2019 ( Figure 1B). flowering-FT ( Figure S1). Two phenological traits, time of heading-HT and FT, were positively correlated with each other and with plant height-PH (p-value < 0.05; Figure  S1). In contrast, for PH, a negative correlation was observed with yield-related traits (number of kernels per spike-KN, weight of kernels per spike-KW, and thousand kernel weight-TKW). KW was positively correlated with TKW.
In ANOVA, all fixed effects, i.e., effects of the year (Y), localization (L), and Y × L interaction, were significant (at least at a p-value < 0.05, but most of them at p-value < 0.001). The variance components for random effects of genotypes (G) were much larger than those for G × Y and G × L interactions for traits HT, FT, and PH, which was reflected by the large heritability for these traits (88-90%; Table 1). The variance components were in the same order for yield-related traits, and therefore, heritability was smaller (38-54%). Table 1. Mean values, coefficients of variation (CV), variance components (with standard errors) for genotypes (G) and their interactions with year (Y) and localization (L), and broad-sense heritability coefficients for the six traits. A comparison of the distributions of genotypic means in particular experiments showed that accessions headed and flowered earlier and were shorter in 2018 than in 2019 ( Figure 1A), and that the yield parameters were higher in 2018. The weather conditions, as characterized by monthly precipitation and temperatures, differed slightly between the two growing seasons. In 2017/2018, winter temperatures were lower, and summer precipitation was slightly higher than in 2018/2019 ( Figure 1B). Stability analysis in the AMMI model revealed that, in general, the reactions of genotypes to conditions in the four locations were more similar in 2019 than in 2018 ( Figure S2A). In 2018, for traits HT, PH, KN, and TKW, a considerable difference existed between reactions to conditions in (Antoniny, Nagradowice) vs. (Konczewice, Strzelce). The patterns of the interactions for localization were similar for HT and TKW. Genotypes showed continuous distributions of interactions. For FT accessions, PHR_6 (Akteur), STH_148 (UKR 12), and for KW accessions, STH_65 (STH 9025), STH_167 (Cornea ost.) showed outlying instability. The largest significant correlations existed between instability variances for genotypes estimated for pairs of traits: HT and FT and KW and TKW ( Figure S2B,C); correlations for trait pairs (PH, FT) and (KW, KN) were statistically significant but very small. The ranks of genotypes concerning instability variances for all traits are presented in Table S2; they show that the most stable genotypes for one trait were usually much less stable for the other.

Association Analysis
GWAS revealed 7603 SNPs with polymorphisms significantly associated with at least one phenotypic trait (only SNPs with more than five accessions in each of the two homozygous classes; BH corrected p-value < 0.05). The characteristics of the allelic substitution effects for these loci are shown in Table 2. By considering also the criterion related to the size of the substitution effect being in the second lower or upper percentile, the number of associated markers was reduced to 1273, and the number of significant associations was reduced to 1344 ( Figure 2, Table 3 and Table S3).
The mean linkage disequilibrium (LD) with other SNPs in the region ±5 Mb was determined for significant SNPs. A detailed analysis of LD has been provided in [28] (a heatmap of LD on chromosome 1A in Figure S2A, exemplary LD clusters in Figure S2B, and the plots of LD vs. physical distance between markers with 0-20 Mb distance intervals in Figure 5A of [28]). The percentage of associations with a small mean LD (<0.01) in this interval varied from 5% for PH to 20% for KW ( Table 3). The significant SNPs were clustered into small groups of no more than five markers within chromosomes based on LD ( Table 3). The highest number of LD clusters was on chromosome 3A for the two phenological traits, on chromosome 2B for PH, on 2B and 6B for KN, on 7B for KW, and on 2A for TKW ( Figure 3).
The fraction of associations showing interaction with the environment was the lowest for PH and KN and the highest for KW. Associations were unevenly distributed in the wheat subgenomes. Most of the associated SNPs mapped to the A and B subgenomes (38.4% and 45.01%, respectively), whereas only 14.14% were derived from D, and 2.44% remained unmapped (Table 3, Figure 2); however, the proportion of the relative number of SNPs was similar: 8.99% from the A subgenome, 10.14% from B, 8.45% from D and 11.07 from Un. For five traits, most associations were detected in subgenome A, and only for PH were most associations detected in subgenome B. More than 50% of the associated SNPs for all the traits were located in genes. Most SNPs associated with traits were predicted to modify protein translation, and only two SNPs significantly influenced  Allelic substitution effects for different traits were correlated ( Figure S3). However, not all correlations were of the same size and sign as those of the traits themselves. For example, allelic effects for PH were negatively correlated with effects for phenological traits (positive correlation for traits), and effects for KW were positively correlated with phenological traits (negative correlation for traits).
SNPs associated with traits were divided into three sets corresponding to plant phenology (HT or FT, set 1), PH (set 2), and kernel properties (KN or KW or TKW, set 3), with the numbers of associations in these sets being 592, 391, and 429, respectively. There were 26, 99, and 26 SNPs common to sets 1 and 2, 1 and 3, and 2 and 3, respectively, and the 12 SNPs common for all three sets ( Figure 4, Table S3). For pairs of traits, the largest number of common significant SNPs was recorded for HT and FT (312), which is consistent with a large correlation between these two traits reported above. * Total number of markers indicated by VEP [29] (in brackets: number of markers assigned to specific genes).      Of the 26 markers that belonged to sets 1 and 2, 17 were assigned to genes, with 5 being modifiers and 4 substitutions having moderate effects (Table 4). There were three markers with concordant effects on earliness and plant height, with the modifying effect of substitutions. Two of these (2253029|F|0-10|CT and 1237800|F|0-13|CG) were associated with genes (Table S3). Out of 12 SNPs common for sets 1, 2 and 3, eleven were associated with KN and one with TKW. The same direction of effect could be seen for FT and KW, while the opposite direction was seen for KN and PH.

Marker LD Clusters
Clusters of markers significantly associated with HT, FT (set 1), and with PH (set 2) ( Table 3) were considered for LD-based GWAS. Combinations of their genotypes, represented by at least 25 accessions from the investigated collection, were considered for further analysis. The characteristics of the 327 marker clusters are presented in Table S4. Among them, we attempted to identify the effects of genotypic combinations that were higher than the separate allele effects of particular single markers. Examples of genotypic combinations with large negative effects on FT, HT, and PH are shown in Figure 5. For marker LD block no. 21, which clusters markers "1207903|F|0-56|AT_4992730|F|0-46|CT", the variant "T/T T/T" was present in 26 accessions and was related to early flowering (effect of −2.9 days) and low plants (effect of −3.8 cm) in comparison to the average of all accessions. One of these markers was assigned as an upstream modifier of the gene TraesCS1B02G028100, annotated as involved in "serine-type endopeptidase inhibitor activity." Other interesting marker clusters with higher effects contributed by haplotypes were identified, for example, no. "291", 2260931|F|0-40|CT_1004422|F|0-41|AG. We noted that 76% of the genotypic variants in marker LD blocks were fully homozygous. Therefore, their information was equivalent to the knowledge of haplotypes.

Marker LD Clusters
Clusters of markers significantly associated with HT, FT (set 1), and with PH (set 2) ( Table 3) were considered for LD-based GWAS. Combinations of their genotypes, represented by at least 25 accessions from the investigated collection, were considered for further analysis. The characteristics of the 327 marker clusters are presented in Table S4. Among them, we attempted to identify the effects of genotypic combinations that were higher than the separate allele effects of particular single markers. Examples of genotypic combinations with large negative effects on FT, HT, and PH are shown in Figure 5. For marker LD block no. 21, which clusters markers "1207903|F|0-56|AT_4992730|F|0-46|CT", the variant "T/T T/T" was present in 26 accessions and was related to early flowering (effect of −2.9 days) and low plants (effect of −3.8 cm) in comparison to the average of all accessions. One of these markers was assigned as an upstream modifier of the gene TraesCS1B02G028100, annotated as involved in "serine-type endopeptidase inhibitor activity." Other interesting marker clusters with higher effects contributed by haplotypes were identified, for example, no. "291", 2260931|F|0-40|CT_1004422|F|0-41|AG. We noted that 76% of the genotypic variants in marker LD blocks were fully homozygous. Therefore, their information was equivalent to the knowledge of haplotypes.

Heterosis
Searching for heterosis of SNP polymorphisms (on the set of SNPs with more than five heterozygous lines; BH corrected p-value < 0.05) revealed 1261 SNPs with significant effects (Table 5 and Table S5), with the largest number of effects for phenological traits and no effects for KW. Significant SNPs were evenly distributed over the subgenomes, with 53-80% of SNPs in the genes ( Figure 6). None of the SNPs revealed a high translation effect, and the majority of the predicted effects were classified as low or modifying protein translation.

Heterosis
Searching for heterosis of SNP polymorphisms (on the set of SNPs with more than five heterozygous lines; BH corrected p-value < 0.05) revealed 1261 SNPs with significant effects (Tables 5 and S5), with the largest number of effects for phenological traits and no effects for KW. Significant SNPs were evenly distributed over the subgenomes, with 53-80% of SNPs in the genes ( Figure 6). None of the SNPs revealed a high translation effect, and the majority of the predicted effects were classified as low or modifying protein translation.  Association sets for traits (HT, FT), (PH), (KN, KW, and TKW) based on heterosis effects contained 480, 324, and 83 markers, respectively. The latter (yield-related) set contained 52 markers assigned to genes; the heterosis effects in this set were: from −4.50 to 3.09 grains for KN, from −0.12 to 0.09 g for KW, and from −2.09 to 4.13 g for TKW.
The correlations of heterosis effects were similar to those of additive effects ( Figure S4).

Allelic Substitution Effects vs. Heterosis Effects
Allelic effects were correlated with heterosis effects for markers with high heterozygosity (Figure 7). The range of additive effects was smaller than that of heterosis effects for all traits.   [29] (in brackets: number of markers assigned to specific genes).

Allelic Substitution Effects vs. Heterosis Effects
Allelic effects were correlated with heterosis effects for markers with high heterozygosity (Figure 7). The range of additive effects was smaller than that of heterosis effects for all traits.

GO Annotation of Associated Markers
GO terms (biological processes) represented in sets of genes assigned to SNPs in association sets 1, 2, and 3 for allelic effects and heterosis effects are shown in Table S6A and Table S6B, respectively. Most frequently (16-26%), markers associated with both additive and heterosis effects were involved in redox processes and protein phosphorylation. The next most common GO terms were regulation of transcription (DNA-templated), carbohydrate metabolic process, and transmembrane transport. One marker, 1238701|F|0-16|GA (TraesCS2D02G127300), belonged to sets 1 and 2 with respect to additive effects (effects for HT −0.75, for PH 2.47), and to set 3 with respect to heterosis effects (effect 2.25 for TKW).

GO Annotation of Associated Markers
GO terms (biological processes) represented in sets of genes assigned to SNPs in association sets 1, 2, and 3 for allelic effects and heterosis effects are shown in Table S6A and Table S6B, respectively. Most frequently (16-26%), markers associated with both additive and heterosis effects were involved in redox processes and protein phosphorylation. The next most common GO terms were regulation of transcription (DNA-templated), carbohydrate metabolic process, and transmembrane transport. Responses to auxins were identified in groups of genes responsible for plant height additive effects more frequently than proteolysis.
Responses to auxins were identified in groups of genes responsible for plant height additive effects more frequently than proteolysis.

Discussion
The main goal of hybrid wheat breeding is to exploit the heterosis effect. Final success depends on several factors, but yield is a crucial goal. Without a significant increase in productivity offered by hybrid cultivars, they will not be the choice of farmers; as an alternative, they may apply less expensive seeds of well-performing classic cultivars. The cost of hybrid seed production is also important. It depends on numerous factors related to plant morphology and the biology of flowering [3,7,30]. The cross-pollination of wheat plants is based on the wind, but it is significantly less effective than that of typical openpollinated plants, such as rye and maize. Waines and Hedge [31] noticed that singular wheat pollen could be found as far as 1000 m from the plantation, but hybrid seeds are usually undetectable beyond 20-30 m. The distance between parental lines of wheat allowing for economically sufficient effectiveness of seed production is limited to 2-3 m, and larger values can be applied when the pollen parent is taller than the seed parent. The reduction of plant height in wheat (highly recommended for maternal components) can be achieved by utilizing properly selected reduced height (Rht) genes.
In this study, we present a multifaceted statistical analysis aimed at providing information on a broad population of cultivars and breeding strains, which can primarily be used in wheat hybrid breeding. We investigated other morphological and phenological traits significant for efficient seed production and chosen traits related to wheat productivity, aiming to verify whether our non-standard genome-wide association study methods have potential application in hybrid breeding of wheat.
The analysis of variance showed that genotypic variability, measured in relation to variabilities caused by genotype by year and genotype by localization interactions, and, consequently, broad-sense heritability, was higher for phenological traits and for plant height than for yield-related traits. This indicates the possibility of selecting valuable parental forms independently, to some extent, from the target environment. Additionally, stability analysis (AMMI) showed that, although a given genotype may be subjected to various degrees of environmental variance for various traits, some correlation exists among instability variances for traits related to hybrid component selection.
However, our association analysis was performed assuming that SNP variant effects can also interact with environmental conditions. This aspect of GWAS analysis is often addressed by performing separate analyses for each environment (cf. [32][33][34]), mainly because of the limited options presented by the data analysis software. Based on the mixed model developed by Malosetti et al. [35], our approach allowed for the explicit testing of markers by environment interaction effects, which provided a conclusion on the lower environmental variability of SNP effects for plant height than for phenology. We also performed GWAS analysis using marker LD clusters. The most promising marker clusters were those for which the effects of genotypic combinations were higher than the allelic effects of individual markers. Furthermore, concordant effects of earliness and plant height are required for the practical importance of the haplotype. The applied procedure allowed the identification of groups of markers with a high impact on phenotypic traits, which could be expected for individual polymorphisms. Thus, we demonstrated the possibility of using the phenomenon of non-allelic interactions in LD-informed hybrid breeding. Moreover, as hybrid breeding is based on the effects of heterozygous materials, another association analysis was performed with respect to heterosis. It aimed to identify genomic loci for which heterozygotes could be more profitable than homozygotes, especially for yield-related traits. Although the results obtained in this way are far less valuable than those that could be gained from observations of hybrids, this is undoubtedly progress compared to the methodological approaches used to date in this type of research.
Validation of GWAS results can be done by an independent experimental study concerning another pool of genotypes. In our case, the pool covers most of the accessions that are practically interesting for project stakeholders. Another form of validation can be based on a real breeding process; this has been started by passing our results to breeders, who have already performed selected crosses and are currently assessing the value of hybrids by their standard procedures. In this report, to verify the results of GWAS, publicly available data from independent, post-registration trials performed in a wider set of environments were used. The mean BLUPs for genotypes from the HYBRE experiments were correlated with the general means from the PRT. Moreover, substantial correlations between allelic substitution effects obtained from GWAS on both datasets were obtained; the same was true for heterosis effects. The effects concerning plant height were found to be the most correlated. This means that the data obtained in the HYBRE experiment are representative, and it can be assumed that new breeding creations can be evaluated in limited field experiments rather than in large experimental systems.

Flowering-Related QTLs and Genes
The expected female component of the hybrid should contain long stigmatic hairs that are fully extruded and receptive for long periods [7]. Wheat stigma remains receptive for up to 13 days after anthesis; however, it is usually highly receptive for no more than 3 days after anthesis [21]. Effective anther extrusion of the pollen parent is crucial for the successful setting of hybrid seeds. Recently, Denisow et al. [36] showed that the anther extrusion plays a much more important role in the contribution to the final amount of pollen available for cross-pollination than previously thought. Flowering generally begins at the center of a wheat spike and proceeds in both directions (up and down). Within each spikelet, the primary floret opens first, followed by the secondary, tertiary, etc. The first two florets of the spikelet produce the most abundant anthers, filled with the most viable pollen [37]. Thus, it is suggested that when these highly effective florets of the male component begin flowering, the female parent should be ready to receive pollen. The optimal difference between the flowering times of the seed parent and pollen parent is 2-3 days [30]. The phenotypic efficiency of some of the SNP markers indicated in our analyses met these criteria. The studied set of wheat accessions had been focused on genotypes cultivated in Europe [28]. Therefore, phenotypic variation was, in some way, limited. Despite this, some of the selected SNP markers revealed additive effects ranging from 1-1.6, corresponding to a difference from 2 to over 3 days of flowering time of homozygotic wheat genotypes, which are potential parental components of hybrids.
Markers associated with flowering time are necessary to synchronize the flowering of hybrid wheat components. Eight QTLs for heading time have been reported [38]. One of them, QHD_7A_psr_ParW670_CFLN17, corresponding to a relatively short region 710-719 Mbp on 7A, colocalized with the 3025631|F|0-10|GA marker found in our study. The regulation of flowering has been extensively studied at the gene level. Flowering is mainly regulated by vernalization (VRN1, VRN2, VRN3, and VRN-D4), photoperiod (Ppd-A1, Ppd-B1 Ppd, and Ppd-D1), and earliness per se (eps) genes [39]. Some of these genes showed pleiotropic effects, i.e., Ppd-A1 increased TKW and yield. Similarly, Ppd-B1 was associated with a high kernel number [40]. Eps genes interact with Ppd1 and are associated with spikelet number [41]. In wheat, a homolog of the Arabidopsis early-flowering 3 (ELF3) gene was identified as a candidate gene for Eps-Am1 [42]. Additional genes affecting flowering time in wheat were reviewed by Zhang et al. [39].
Searching Ensembl Plants for wheat orthologs of 204 Arabidopsis thaliana genes related to flowering [43] resulted in a list of 617 wheat genes. Of these, 31 were assigned to some of the SNPs analyzed in this study. Five SNPs were associated with FT or HT, with a negative effect of the ALT allele; these were orthologs of the A. thaliana genes UGT87A2, MFT, FRI, AGL57, TT16, and MAF4.

Plant Height-Related QTLs and Genes
Some Rht loci have been suggested to negatively affect anther extrusion [44,45]. The insertion of Rht genes was crucial for the success of the "green revolution" in wheat. Replacement of some widely used Rht loci with alternative variants more neutral for the flowering process may be recommended in hybrid breeding [46]. For example, the Rht1 and Rht24 genes reduce plant height, but Rht1 simultaneously reduces anther extrusion, whereas Rht24 does not have such adverse effects [30]. All wheats are assumed to be monomorphic for Rht-A1a [47]. Rht24 occurs at relatively high frequencies in European and Chinese wheat cultivars and was mapped to the same region as Rht14, Rht16, and Rht18 [48,49].
QTLs and candidate genes for PH were collected from the catalog of gene symbols for wheat ([47] with updates), and 44 microarray probes reported for PH [50] were mapped to separate physically linked regions in the Chinese Spring genome V 1.0 (Table S7A). We found eight DArTseq markers overlapping the four regions identified with these microarray probes. Additionally, DarTseq markers were found in the physical regions of QHt.nau-2D, Rht8, Rht13, and Rht22 [51][52][53].
Information on known genes responsible for wheat PH was used to search for homologous and paralogous genes (with identity to target > 50%). Sequences (TraesCS4A02G271000, TraesCS4B02G043100, and TraesCS4D02G040400) corresponding to gibberellic acid insensitive (GAI) Rht-A1, Rht-B1, and Rht-D1 were mapped to the 4AL, 4BS, and 4DS chromosome arms [54,55]. These GAI loci were highly conserved, and a search with blastn revealed no additional candidate homologous genes.
The remaining Rht genes were sensitive to GA. Two known genes from this group are GA2-β-dioxygenase (GA2oxA9) and AP2-D (Q gene), which correspond to Rht18 and Rht23 loci, respectively [50,56]. Two GA2oxA9 homologs and three paralogs were found, and five DArTseq markers coincided with the GA2oxA9 homolog located on the 6B chromosome. Two sequences were used to identify AP-2 homologs. Five AP-2 homologs were found, three of which were located in proximity (<5 Mbp) to 14 DArTseq markers associated with PH (Table S7A).
We identified 224 genes (containing 313 SNPs) assigned to DNA sequences related to PH using the text search in the description of genes by GO terms and Interpro features for the texts "gibberellin," "auxin," "cytochrome," "kaurene," "kaurenoic," and "DELLA." Most detected hits were for "cytochrome," as in [50]. Of these, only 12 genes (with 14 SNPs) associated with PH showed noticeable negative or positive phenotypic effects. Three of them were orthologs of the A. thaliana genes CYP85A1, CYP85A2, CYP84A4, CYP84A1, and CYP734A1.

Spike Traits Related to QTLs and Genes
The positions of significant SNPs for grain weight, number per spike, and TKW were compared with those reported in previous studies (Table S7B). Maintaining a higher KN is an important breeding target for stress-tolerant lines [57]. A total of 142 regions were associated with KN, and 11 overlapped (±1 Mb) with the DArTseq markers reported in our study. Only precise QTLs spanning regions not exceeding 5 Mb were included in the search for common positions with SNP markers identified in our study. Ninety-two markers associated with kernel weight have been described in the literature, and only one marker had a position congruent with the SNP identified in our study. Out of the 123 SNPs or QTL described as responsible for TKW, five SNP markers were identified in previously described regions. TaGW8-B1 is associated with agronomic traits in bread wheat cultivars. The TaGW8-B1a allele increases TKW and spikelet number per spike and provides higher yields than cultivars with the TaGW8-B1b allele [58]. KASP markers for TKW have also been developed [59][60][61][62].

SNP Translation Effects
Among the identified polymorphisms, two SNPs had high translational effects. The first identified polymorphism, 1090593|F|0-44|CT, was associated with TKW and located in the gene TraesCS4B02G086500, annotated with the GO term "polygalacturonase activity". Polygalacturonases are hydrolyzing enzymes implicated in a wide range of plant developmental processes, such as cell elongation, organ abscission, fruit ripening, microspore release, and pollen tube growth [63,64]. Members of two out of the five clades, C and F, are also expressed in grasses during root and seed development [65]. They were detected in the outer pericarp and intermediate layers of grains of the related species Aegilops tauschii [66]. Ye et al. [67] found three putative candidate genes for QTL mapped on chromosome 4A in wheat; two (TraesCS4A02G229600 and TraesCS4A02G229700) were orthologous to the Arabidopsis gene At2g43860, coding for polygalacturonase. SNP 1090593|F|0-44|CT was localized in the same homologous group but on chromosome 4B and had a high translational level, resulting in a STOP codon instead of a lysine codon. It is predicted that this substitution resulted in protein shortening (by 40 residues) and an altered C-terminus (the structures of wild-type and altered proteins, Figure S5).
The second SNP (1130302|F|0-44|TG) was located in gene TraesCS6A02G085100 (Arabidopsis RPT3 ortholog) with a high translational effect and annotated with the GO term "protein ubiquitination" and was associated with FT. Protein ubiquitination is a sophisticated system of post-translational modification in all eukaryotes and has been demonstrated to play a key role in various plant developmental stages and processes, such as seed dormancy and germination, root growth, flowering time control, self-incompatibility, chloroplast development, and several abiotic stress responses [68]. The SNP 1130302|F|0-44|TG was positioned in the intron and localized in the 5 UTR without any direct impact on the protein sequence. In general, cis-acting elements present in UTRs are essential for post-transcriptional control, including alternative polyadenylation, riboswitching, shortpeptide translation, nonsense-mediated decay, and alternative splicing [69]. However, the T/G mutation found in our study may disrupt splicing, possibly resulting in the retention of this intron. Intron retention in the 5 UTR affects gene expression (by alternative splicing, alternative polyadenylation, and protein translation). Such an effect has been shown, inter alia, in Arabidopsis ZIF2 [70], EF1α-A3 [71] and rice OsmiR156h [72]. It has been postulated that UTR-related regulation of gene expression helps plants adapt to environmental fluctuations [69]. Flowering time is a major factor in climatic adaptation [73][74][75].

Germplasm Resources
The study was conducted using 509 wheat varieties and breeding lines. A set of 277 European varieties, registered mainly in Germany, Poland, and the United Kingdom, was used. Advanced breeding lines were represented by 232 accessions from the ongoing programs of the Plant Breeding Strzelce (STH) and Poznań Plant Breeding (PHR) companies. The genetic characteristics of the studied resources were provided by Tyrka et al. [28].

Genotyping
A detailed description of genotyping and genotypic data processing was provided by Tyrka et al. [28]. Briefly, for each genotype, DNA was isolated from 15-20 bulked 2-week-old seedlings. DNA concentration and purity were determined using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and DNA quality was assessed using 1.5% agarose gel electrophoresis. The DNA was stored at −20 • C and diluted to a working concentration of 50 ng/µL for subsequent wheat DArTseq 1.0 genotyping completed by Diversity Arrays Technology (Bruce, Australia). Only selected high-quality data for 13,499 SNPs were taken into account in this research (dominant markers of the Silico-DArT type were omitted).

Data Analysis
Phenotypic data were analyzed using a linear mixed model (LMM) with fixed effects of the year (Y), location (L), Y × L interaction, and random effects of genotypes (G), G × Y interaction, and G × L interaction. Broad-sense heritability was estimated by the method described by Cullis et al. [77]. The additive main effects and multiplicative interaction (AMMI) analysis was performed as described by Gauch [78]. Genome-wide association study (GWAS), using SNP polymorphisms and genotypic means (BLUPs) obtained in individual experiments, was performed by a method that allows for the interaction of genetic effects with the environment, developed by van Eeuwijk et al. [79] and Malosetti et al. [35]. This is based on a linear mixed model with the population structure estimated by eigenanalysis of the kinship matrix (see [28]) and the compound symmetry variancecovariance model used for environmental variation. Marker data used for these analyses were coded as follows: 0, reference (REF) homozygote; 1, heterozygote; and 2, alternative (ALT) homozygote. Standard GWAS considers the additive marker effects, but also non-additive effects might explain an important proportion of the variation in complex traits ( [80]). Therefore, GWAS for heterosis effects, estimated as the difference between the mean for heterozygotes and the mean for two homozygotes, was performed based on mean values over all experiments. To do this, marker data coded with 0, 1, and 2 values in the model matrix (used in GWAS for allelic substitution effects) were recoded to values of 0 for homozygotes (of both types) and 1 for heterozygotes [80]. The population structure was represented in the model by the eigenanalysis scores. The effects of marker linkage disequilibrium (LD) clusters were analyzed using mean trait values over all experiments and analysis of covariance, with the genotypic combinations in these clusters as classifying factors and eigenscores as covariates. p-values for allelic substitution effects and heterosis effects were corrected for multiple testing using the Benjamini-Hochberg (BH) method. Only SNP data with more than five accessions present in both homozygous classes were used during the genome-wide association analyses. An SNP effect was considered statistically significant if the BH-corrected p-value was lower than 0.05.
Statistical analyses were performed using Genstat for Windows 19th edition [81]. Visualizations of results were performed using Genstat 19 or R software. The annotation of SNP markers with respect to genomic positions, neighboring genes, their Gene Ontology classification, and SNP translation effects [29] used in this study were the same as those used by Tyrka et al. [28].

Conclusions
In conclusion, by employing an appropriate experimental and statistical approach, we generated a set of SNP markers that can be used in breeding practice to predict the earliness and height of plants to assess whether a given genotype will be particularly good as a maternal (early and short plant) or paternal form (late and tall plant). The estimated heritability of phenological traits and plant height was relatively high; therefore, selection based on the markers indicated during this study may be successful. In contrast, the heritability of yield-related traits in our field trials was relatively low; nevertheless, it remained within the range reported by other authors [82][83][84][85].

1.
The successful parallel selection of homozygous parental genotypes (based on traits regulated by additive genes controlling phenology and plant height) and components revealing a high heterosis effect (the choice based on the highest values of spike-yieldrelated traits revealed by heterozygous genotypes) using GWAS.

2.
Demonstrating that the application of clustered markers for the choice of genotypes in multi-feature processes may be more efficient than classic selection based on single marker polymorphism (SNP) 3.
Validation of the GWAS results using post-registration trial data.

Conflicts of Interest:
The authors declare no conflict of interest.