Dissecting the Contribution of Environmental Inﬂuences, Plant Phenology, and Disease Resistance to Improving Genomic Predictions for Fusarium Head Blight Resistance in Wheat

: Environmental factors like temperature and humidity are presumed to greatly inﬂuence Fusarium head blight FHB infections in wheat. Anther retention AR, on the other hand, is a morphologically neutral trait that shares a common genetic basis with FHB resistance. In this study, our aims were to: (i) Evaluate two types of corrections of FHB severity scores, namely method-1 via linear regression on ﬂowering time (FT), and method-2 via a best-subset multiple linear regression analysis comprising FT plus accumulated thermal time variables; and (ii) assess the performance of multi-trait genomic selection (MT.GS) models for FHB severity assisted by AR. The forward prediction scenarios where GS models were trained with data from the previous years revealed average prediction accuracies (PA) of 0.28, 0.33, and 0.36 for FHB severity scores that were uncorrected or corrected by method-1 and method-2, respectively. FHB severity scores free from the inﬂuences of both environment and phenology seemed to be the most e ﬃ cient trait to be predicted across di ﬀ erent seasons. Average PA increments up to 1.9-fold were furthermore obtained for the MT.GS models, evidencing the feasibility of using AR as an assisting trait to improve the genomic selection of FHB resistance breeding lines.


Introduction
Fusarium head blight FHB has become a major threat for wheat production, particularly in warm and humid regions [1]. FHB is caused by several members of the Fusarium genus yet evidence supports that resistance to FHB is neither Fusarium-species-nor isolate-specific [2][3][4]. The economic impact of this fungal disease is caused by either subtle to severe grain yield and quality losses, or mycotoxin accumulation [5]. According to recent studies, the combination of tolerant varieties, fungicides, and specific management practices might be used to decrease FHB losses [6,7] such as the $1.18 billion reached in United States in 2015-2016 [8].
Genetic improvement of host resistance is considered the most sustainable and suitable approach to manage this disease [9]. Conventional breeding, however, is limited mainly by both the lack of highly resistant germplasm and the quantitative nature of the resistance to FHB [10]. More than 500 quantitative trait loci (QTLs) for FHB resistance have been mapped into 44 chromosomal regions spreading across all 21 wheat chromosomes [10][11][12][13][14]. To date, ten QTL [10] have been either validated or employed in Marker-Assisted Selection (MAS) breeding: 2B-2, 2D-2, 3A-1, 3B-1 (Fhb1), 3B-2, 4B-1 (Fhb4), 5A-1 (Fhb5), 6A-2, 6B-1 (Fhb2), and 7B-1. From these loci, Fhb1 is the most intensively studied, and it has been deployed, through MAS, in a few registered cultivars in the United States [15][16][17], Canada [18], Australia [19], and Europe [20]. Genomic selection (GS) appears to be ideal to target the complex genetic architecture of FHB resistance under its assumption that at least some of the markers are in linkage disequilibrium (LD) with loci associated with the trait of interest [21]. Several studies [10,11] have assessed the effectiveness of GS models in FHB resistance improvement in wheat and some report higher accuracies and selection gains than MAS. The performances of GS models varied in whether a major QTL was included, the traits representing FHB resistance, the size and composition of the training and validation populations, and the types of prediction models. Multivariate GS models generally improve prediction accuracies employing strongly correlated and highly heritable traits as covariates [22,23], and even more when the records of the indicator traits are also available for the tested genotypes [24][25][26][27].
Morpho-agronomical and phenological traits plus the environment directly affect FHB infection [1]. Plant height, heading/flowering date, and floral morphological traits like anther retention (AR) are traits that have been extensively studied in the context of FHB disease [11]. Concerning earliness, it has been shown that there is no systematic type of its association with field FHB resistance, which frequently results in ambiguous associations with reports of negative [28,29] and positive [5,[30][31][32][33] correlations. It has been postulated that season-specific weather conditions at flowering and inoculation time rather than a shared genetic control might better explain this apparent tradeoff [34][35][36].
Variations in environmental temperature and humidity in the atmosphere are major factors modulating FHB infection, and there is a general consensus that warm and wet conditions at anthesis favor FHB disease severity [4,[37][38][39][40]. Wetness periods of at least 24 h and temperatures above 15 • C are required for successful infections by most of the FHB causing agents [4,39], although some epidemics have occurred in seasons with lower temperatures and above-average precipitation at anthesis [41,42]. Temperature also plays an important role from inoculum production and dispersal to its infection of wheat heads [43]. Optimum temperature conditions for disease development depend on the Fusarium species, inoculum type and virulence, and the affected tissue [39]. The two most prevalent Fusarium species that produce the mycotoxin deoxynivalenol (DON) are Fusarium culmorum and Fusarium graminearum. Their growth temperature optima under in vitro growth are 24-28 • C and 20-25 • C for F. culmorum and F. graminearum, respectively [39,43,44]. This is in concordance with the fact that F. graminearum is predominant in the United States, Canada, Australia, and parts of continental Europe with hotter summers than North Western Europe, where F. culmorum and F. avenaceum are among the more predominant species [45]. Ideal weather conditions at anthesis not only promote infection but also encourage the vegetative spread of mycelium to neighboring florets by favoring disease cycle components like perithecia maturation and ascospore formation [40,[46][47][48]. Earliness and environmental variables have been studied together in another pathosystem on wheat via a step-wise multiple linear regression and resulted in temperature and rainfall measurements preferred over heading date [49].
The extent of retained anthers after flowering is, on the other hand, a trait phenotypically associated with FHB resistance [50] specifically with resistance to initial infection [51]. Anther retention (AR) possesses a quantitative genetic nature and a shared genetic correlation with FHB traits as twelve QTL have been reported to be associated with both FHB resistance and AR [51][52][53][54][55][56][57][58]. In the field, AR usually shows a positive correlation with FHB severity, i.e., partially or not extruded anthers are indicative of higher FHB infection and fully extruded anthers are associated with reduced FHB infection [56]. When anthers are either partially extruded or stuck between palea and lemma, they become the media to facilitate the FHB infection into the floret cavity. However, if the anthers are fully extruded, it is more difficult for the FHB pathogen to colonize the spikelet tissue [52]. It has been postulated that the selection of wheat lines with low AR could be a good strategy for breeders when breeding against FHB Agronomy 2020, 10, 2008 3 of 16 susceptibility [56,57,59]. Successful cases have been reported in breeding programs in China [60] and promising results in Europe [61].
In this study, we aim to compare the performance of FHB severity scores from a best-subset multiple linear regression analysis involving flowering time and thermal variables within a genomic prediction framework. Additionally, we are motivated to evaluate multi-trait GS models for FHB severity having anther retention as an indicator trait measured in both the training and validation sets.

Plant Material and Field Experiments
Training Sets (TS) for GS models were tested in three trials seasonally evaluated between the years 2015 and 2017 totaling 853 genotypes, being either F 4:6 , F 5:7 or double haploid breeding lines. On the other hand, the 143 overlapping lines each evaluated in two consecutive years in the time period 2015 to 2018 were used as Validation Sets (VS). The lines belonged to 429 bi-parental families with sizes varying from 1 to 22 individuals derived from 305 parents. For the purposes of this study, the latter will be referred to as 16 All the trials were phenotyped for Fusarium head blight severity (FHBs) in an artificially inoculated disease nursery at the experimental station of the Department of Agrobiotechnology in Tulln (16 • 04, 16 E, 48 • 19, 08 N, and 177 m above sea level). Within each trial, two replicates per genotype were sown in double-rows of 1 m length with 17 cm spacing. A DON-producing Fusarium culmorum isolate (Fc91015) was applied at a conidial concentration of 2.5 × 10 4 spores mL −1 several times at anthesis with an automatic backpack sprayer. The anthesis date itself was recorded as flowering time (FT) observed as days after 1 May. Constant humidity/moisture conditions were kept through a mist irrigation system during 20 h after each inoculation. Anther retention (AR) was measured at five days post-anthesis as the proportion of 20 florets per plot: Four basal florets of five heads were manually opened and inspected on whether anthers remained within the floret or between lemma and palea. FHBs symptoms were scored as percentage of infected spikes for each plot on 10, 14, 18, and 22 days after inoculation (dai). The area under the disease progress curve (AUDPC) scores were standardized to a 0-100% scale as stated by [62].
where y t is the observed data at time t and d t is the tth day of measurement, going from t = 10 to 22, and r is the total number of observations r = 4. Accumulated thermal time (ATT) at each of the disease evaluation's scoring dates plus 1, 2, 4, and 7 dai were determined as the sum of degree-days per every single 24 h period, calculated in a modified form as suggested by [63]: where minimal (T min ) and maximal (T max ) temperatures are measured in • C. Base temperature, defined as the one below which growth in the system ceases, was ignored.

FHB Severity Correction Methods
The following correction methods were applied over the raw data in a plot observations basis for each trial separately. In addition, every application was done every time a given TS or VS was sampled. AUDPC scores were first corrected as the residuals from the regression on the FT scores x FT , as stated in [64]: where β 0 and β FT are the intercept and the regression coefficient, respectively.

Method-2: Feature Selection
Alternatively, a preceding filtering step of ATT variables was conducted targeting nonlinear temporal trends. Basically, only the ATT variables showing a high correlation of r = ± 2 √ 0.60 with respect to flowering time were considered in the next analysis stages. A lasso regression [65] model was employed for selecting relevant predictors amongst the input variables of flowering time, as well as the thermal variables. Let y be the dependent variable, i.e., FHBr, and p be the total number of predictors x i ; the original linear regression model can be written as follows: The lasso algorithm estimates linear regression coefficients (β) through L1-constrained least squares, minimizing the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant (s). Specifically, for model (4), the constrained L1 norm can be given by the following inequality: The lasso parameter estimates' calculation is a problem equivalent to minimizing the following loss function in a typical Lagrangian form for model (4): where N is the sample size and λ ≥ 0 is a complexity tuning parameter controlling the degree of shrinkage. It was obtained through cross-validation as implemented in the cv.glmnet function of the glmnet R package [66]. Similar to (3), the new subset of features (p ) were the new predictors used to correct AUDPC scores:

Field Trials Analysis
Best linear unbiased estimates (BLUEs) were derived from each trial and trait from a linear mixed model of the form: where µ is the overall mean; y i,j are the plot-basis observations of the traits AR, FT, AUDPC, AUDPC-1, and AUDPC-2; g i is the fixed effect of the ith genotype; r j is the random effect of the jth replication; and e ij is the random residual effect. Model (8) was expanded to account for the effect of the kth trial t k as follows: where µ is the overall mean; y ijk are the plot-basis observations of the traits AR, FT, AUDPC, AUDPC-1, and AUDPC-2; g i is the fixed effect of the ith genotype; t k is the effect of the kth trial; (g·t) ik and (t·r) kj are the interaction terms of "genotype × trial" and "trial × replication", respectively; and e ijk is the random residual effect. Entry-mean heritabilities (h 2 ) for both training and validation sets were calculated as h 2 = σ 2 g /(σ 2 g + 1 2 MVD), where σ 2 g corresponds to the genetic variance and MVD to the mean variance of a difference of the BLUEs [67]. Additionally, plot-basis heritabilities (H 2 ) were for the training sets and as for the validation sets where nr is the number of replications, σ 2 e is the residual variance, nt is the number of trials, and σ 2 gt is the "genotype × trial" variance component.

Genotypic Data
DNA from all the lines was extracted with a modified protocol by [68], and each one was genotyped with the genotyping-by-sequencing (GBS) approach (Diversity Array technologies P/L). Quality control filtered out markers with more than 10% of missing data or a minor allele frequency smaller than 5%, which resulted in a set of 5700 single nucleotide polymorphism (SNP) markers.

Validation Schemes
Every validation step consisted of randomly sub-setting 25 and 200 lines as validation (VS) and training (TS) sets, respectively, from their larger sets. The latter sampling was repeated 300 times. Only forward prediction scenarios were considered, resulting in each of the three OV being predicted using their previous trial as TS (Figure 1a). In addition, 17-OV2 and 18-OV3 were predicted with enlarged training sets of 400-600 lines composed of the sum of 200 sampled lines from each of the previous two or three trials ( Figure 1b). nr is the number of replications, is the residual variance, nt is the number of trials, and σ is the "genotype x trial" variance component.

Genotypic Data
DNA from all the lines was extracted with a modified protocol by [68], and each one was genotyped with the genotyping-by-sequencing (GBS) approach (Diversity Array technologies P/L). Quality control filtered out markers with more than 10% of missing data or a minor allele frequency smaller than 5%, which resulted in a set of 5700 single nucleotide polymorphism (SNP) markers.

Validation Schemes
Every validation step consisted of randomly sub-setting 25 and 200 lines as validation (VS) and training (TS) sets, respectively, from their larger sets. The latter sampling was repeated 300 times. Only forward prediction scenarios were considered, resulting in each of the three OV being predicted using their previous trial as TS (Figure 1a). In addition, 17-OV2 and 18-OV3 were predicted with enlarged training sets of 400-600 lines composed of the sum of 200 sampled lines from each of the previous two or three trials (Figure 1b).

GS Models
Genomic best-linear unbiased predictions were obtained from the following model: where y contains the calculated AUDPC estimates either obtained by (1), (3), or (7); β is the vector of the fixed effects containing the overall mean; g is the vector of genomic estimated breeding values (GEVBs) [g ∼ N 0, Gσ 2 g ], with σ 2 g being the genotypic variance estimated by the restricted maximum likelihood (REML) approach. X and Z are the design matrices for fixed and random effects, respectively; e is a vector containing the residuals e ∼ N 0, σ 2 e ; and G accounts for the genomic relationship matrix as [69] where Z is the matrix of m markers and n individuals with elements z ij = x ij − 2p j + 1, x ij is the value of a given allele for the ith genotype at the jth locus, and p j the allele frequency of the jth marker. Model (10) includes a fixed effect for the kth trial when multiple trials was combined as a training set.
Additionally, a so-called trait-assisted genomic prediction [25,26] was evaluated, in which for each validation run, the assistance trait (AR) estimators of the validation set were taken into account and represented pre-existing information about the genotype performance corresponding to the AR scores measured in the trial preceding the validation year.
Multi-trait genomic best linear unbiased prediction (GBLUP) models followed this general equation: where y t is the array for a given number of t traits, i.e., AUDPC and AR, containing the BLUE from the phenotypic analysis; and g t is the array containing the GEBVs [g ∼ MVN 0, Σ g ⊗ G ] with Σ g as the complete unstructured variance-covariance matrix, σ 2 , and M t as its design matrix. The terms σ 2 g 1 , σ 2 g 2 are the genetic variances of AUDPC and AR, respectively. The residual array follows a distribution: [e ∼ MVN(0, Σ e ⊗ I n )], where I n stands for an nxn identity matrix and Σ e for the completely unstructured variance-covariance matrix accounting for the residual variance and correlations between both traits.
Prediction accuracy (PA) was estimated as the Pearson correlation coefficient between the BLUE estimates for AUDPC-corrected scores and their respective genomic-estimated breeding values GEBVs accordingly divided by the square root of the entry-mean heritability h 2 .

Phenotypic Values
Disease severity was considerably higher in 2016 than in the other years while the lowest range and less variable scores were observed in the 2017 trial. The 2018 trial was the most variable in terms of flowering time contrarily to the 2017 trial that had the smallest range in FT. The lines evaluated in the 2017 trial were more prone to trap anthers across the largest training sets ( Figure S1).
Flowering time was the trait with the highest heritabilities' estimates concerning the across-trials analysis. Heritabilities for AUDPC were moderately high and slightly lower in magnitude than both corrected AUDPC scores, suggesting that the latter were able to capture a higher proportion of genotypic variability.
However, across training sets, heritabilities of the corrected AUDPC scores were lower (Table 1) and a closer look at the repeatability at plot level within each trial showed that they were not improved by the applied corrections (Table 2). Interestingly, the genetic variances (σ 2 g ) were higher for the validation sets than for the training sets.

Trait Correlations and Variable Selection
Disease severity scores in the 2017 trial were atypically correlated toward FT as they showed a moderate negative coefficient (r = −0.46), while in both the 2015 and 2016 trials, those traits were strongly and moderately positively correlated with coefficients of r = 0.51 and 0.29, respectively. The smaller validation set in the 2018 trials also exhibited a stronger positive tradeoff r = 0.73 ( Figure S2).
The correlations between accumulated thermal time (ATT) and AUDPC scores were specific to each trial and differed in magnitude and sense. ATT features in the 2015 trial showed a two-clustered profile correlation with AUDPC being positive from one to four dai and negative from 14 to 22 dai. In the 2016 trial, none of the ATT variables seemed to play a major role concerning disease severity nor flowering time, and this led to method-1 and method-2 to be equivalent in this specific TS. ATT variables at one, two, and eighteen dai were discarded from the feature selection process in the 2017 TS and the rest of the ATT features were significantly positively correlated with disease severity except ATT at 22 dai. In the 2018 trial, all ATT features were considered relevant, and conversely to the 2017 trial, disease severity and earliness were directly related, meaning that the most affected and late-flowering genotypes accumulated a higher degree-day rating ( Figure S2). The latter lead to the observation that, typically, the most affected and earlier genotypes in this trial were also the ones that accumulated much more degree-days.
Concerning the training sets of both 2015 and 2017 trials, a maximum of three predictors were selected under method-2 and none of those were recurrent. Flowering time was not present on the best-features set in the 2017 trial presumably replaced by ATT at 22 dai as both were negatively correlated with disease severity. The models derived through method-2 significantly improved their counterparts with FT as the unique predictor in terms of R 2 and Akaike information criterion (AIC) ( Table 2). FT was consistently selected as the best explanatory variable in 2017 OV lines, and ATT04 was the unique variable producing the best fit in the 2018 trial (Table 3).

Phenotypic Selection
The highest phenotypic correlation of AUDPC raw scores between overlapping lines was detected between the 2016 and 2017 trials (r = 0.74) and the lowest between the 2017 and 2018 trials (r = 0.18). Nonetheless, phenotypic selection based on corrected AUDPC scores was superior to the AUDPC raw-based selection across every validation set (Table 3). Significant increments of 4% were on average obtained when selecting lines from 16-OV1 and 17-OV2 regardless of the correction method and, remarkably, selection of the 18-OV3 lines based on corrected scores led to up to more-than-threefold increments where method-2 outperformed method-1 ( Table 3).
Given that anther retention AR scores were used to select for disease severity (indirect selection IS), the correlation coefficients were lower than the direct selection described above except for the selection of AUDPC raw scores of 18-OV3 lines. This indirect selection based on AR was most favorable in the selection of 16-OV1 lines because of both the high correlations AR-AR and AR-AUDPC on the implied trials (Table 3, Figure S3).

Single Trials
Across the three validation scenarios with TS composed of single trials (schemes detailed in Figure 1a), prediction accuracies PA averaged 0.28, 0.33, and 0.36 for the GS models based on AUDPC raw scores (AU-R), and corrected under method-1 (AU-1) and method-2 (AU-2), respectively (Figure 2a). Hence, genomic predictions based on corrected AUDPC scores under method-2 showed a slightly superior PA among all three scenarios, being most notorious for the prediction of both 16-OV1 and 18-OV3 sets. This advantage of method-2 over method-1 reflects the better model fit in the phenotypic analysis of FHB severity in both TS of the 2015 and 2017 trials, as well as in the 18-OV3 VS by selecting additional and/or different features than flowering time ( Table 2, Table 3).
Increments from 8% up to more than sixfold were achieved when anther retention scores from their previous year served as the assisting trait in multi-trait genomic selection (MT.GS) models predicting 16-OV1 and 18-OV3 lines, respectively (Figure 2a). Single-trait ST.GS models predicting 18-OV3 lines performed poorly in terms of PA; however, their MT versions produced the largest increments when modeling AUDPC-corrected scores (Figure 2a). On the other hand, the prediction of the 17-OV2 lines was the scenario where the most accurate ST genomic predictions were obtained, and it was precisely the one that did not represent an opportunity for their MT versions with poorer PA performances, which was 3% lower regardless of the type of AUDPC scores (Figure 2a). The latter observation for such a case might be partly explained by the fact that this was the only validation scenario where, on average, prediction accuracy is sensibly higher than the correlation of AR-AUDPC seen in the indirect selection based on AR scores from the previous year. the three previous trials regardless of the type of severity score, while for their MT.GS model versions, the most successful cases were models trained with the merge of lines from 2016 and 2017 trials. Across all scenarios in consideration, the relative increments in PA of MT.GS models averaged 3.1, 2.6, and 2.1 folds for the AU-R, AU-1, and AU-2, respectively, from their single trait model versions.

Combined Trials
The GS models aiming to predict 17-OV2 lines trained with the combination of lines from the 2015 and 2016 trials led to worse PA performances for both traits AU-1 and AU-R scores (PA = 0.48 and 0.41) compared with the model trained only with the 2016 trial (PA = 0.52 and 0.54). In the latter case, the prediction of the AUDPC scores corrected by method-2 (AU-2) outperformed the other traits and was equivalent to its respective single-trial counterpart (Figure 2b).
The following outcomes might be highlighted from the three scenarios of the genomic selection of the 18-OV3 lines employing TS composed from various trials (Figure 2b): The poor performance of models exclusively trained with lines from the 2017 trial was improved and, more specifically, genomic predictions based on AU-2 scores resulted in performances averaging PA = 0.18 and 0.36 for the ST and MT models' versions, respectively, versus models based on AU-1 scores PA = 0.15 and 0.38, and models based on AUDPC-raw scores PA = 0.13 and 0.38. The only scenario where the type of FHB severity score represented significant differences in PA was when 2015 and 2017 trials were combined to train the single trait GS model, leading to the following order according to their PA: AU-2 > AU-1, AU-R. The highest PA rates for the ST.GS models were obtained when the TS was combined with lines from the three previous trials regardless of the type of severity score, while for their MT.GS model versions, the most successful cases were models trained with the merge of lines from 2016 and 2017 trials. Across all scenarios in consideration, the relative increments in PA of MT.GS models averaged 3.1, 2.6, and 2.1 folds for the AU-R, AU-1, and AU-2, respectively, from their single trait model versions.

Discussion
This study evaluated a procedure consisting of a best-subset multiple linear regression analysis accounting for earliness and temperature variables to produce corrected FHB severity scores that were subsequently compared with the original trait under various genomic prediction scenarios. The convenience of including anther retention as an indicator trait in phenotypic and genomic selection was also investigated, and by doing so, this becomes the first report, at least to the best of our knowledge, of such an attempt in winter wheat.
QTL mapping studies have found twelve overlapping loci between HD/FT and FHB resistance traits, with the most frequently detected QTL being close to the Vrn and Ppd genes that control, among others, the vernalization requirement and photoperiodic sensitivity in wheat [11]. However, when it comes to multi-environment trials, the nonsystematic type of FT-FHBr associations have been detected [36] and several studies follow a pre-correction of the FHBs scores to dissect it from passive mechanisms of resistance [34,70,71]. In order to tackle this trade-off, genome-wide association mapping and prediction studies [27,[70][71][72] oftentimes follow the approach described in [34], which is basically the same as that referred to in this study as method-1. Our findings constitute in this way an enhancement from the latter approach that exclusively considered earliness traits and increased the explainability of the genetic variability of FHB resistance of wheat lines evaluated in consecutive years. According to our results, up to three thermal variables were preferably selected over FT in 2017 and 2018 trials as best predictors. Noticeably, stepwise linear regression has often been used to identify the best predictors when modeling epidemiologic data including the FHB pathosystem [73][74][75]. However, the usage of the forward and backward elimination of variables has been discouraged, with all subset selections being a more statistically grounded option [74], whereas lasso regression often outperforms the latter methods due to a higher coefficient stability, especially in cases of multicollinearity between predictor variables [76].
Wheat heads are most susceptible to FHB infection at anthesis [77] but infection can occur up to the soft dough stage [78]. ATT at the day of anthesis/inoculation was significant and directly correlated with FHBs, except in the 2016 trial, and it became the best predictor in the TS of the 2015 trial and the 18-OV3 lines. ATT variables closer to the date of anthesis were likewise chosen in 2015 (two dai) and in 2017 (both four and seven dai). On the other hand, ATTs at 14 and 22 dai were additionally selected in 2015 and 2017 trials, respectively. Models employed to predict the risk of FHB epidemics [73,79] often consider among their predictors temperature measurements at up to 15 days post-anthesis and interactions with either rainfall or humidity variables. Considering the latter weather measurements, as well as alternative nonlinear responses of FHBs to environmental variables [63], might improve the proposed method-2. Concerning relative humidity, it was not accounted for in the presented approach because of the assumption that mist irrigation conditions kept this factor at high and constant levels in trials conducted for the study at hand.
The accumulation of temperature seemed to favor the development of disease at early stages of disease development (one to seven dai) across all trials. Contrarily, less visible symptoms were correlated with ATT variables at later evaluation stages in 2017 (at 22 dai) and especially in 2015 with the difference that in the former trial, earliness was associated with more susceptible genotypes. A shift in the daily temperature might explain those changes between 43 and 51 days after May 1st in 2015 where a decrease of almost 11 • C in the daily mean temperature was detected. Seemingly in 2017, there was a sustained increase between 48 and 53 days after 1 May 2017 of 8.5 • C ( Figure S4).
The composition and size of the training sets are of utmost importance in GS as evidenced by simulated and empirical data [80][81][82], and in that regard, this study proposed validation scenarios using complete trials as folds to validate genotypes in unobserved years. Poor performances of GS models predicting 18-OV3 lines were found in absolute terms of prediction accuracies that were improved on average 2.9-and 1.9-fold for 2-3 times larger training set sizes, respectively, by adding both lines and additional trials. Nevertheless, the contrary effect was observed for 17-OV2 lines especially for uncorrected AUDPC scores, where the prediction accuracy was 20% lower when the TS consisted of lines from both 2015 and 2016 trials. As both the 2015 and 2018 trials were heavily dependent on earliness and temperature, adjusting their AUDPC scores from these strong phenological influences putatively allowed the GS models to exploit genomic relatedness more efficiently. Prediction accuracy is supposed to be directly related to the degree of genetic relationship between training and validation sets. The performance of GS models in terms of prediction accuracy for FHB resistance in wheat has been validated in independent samples [83] and found to be comparable to cross-validated schemas conditioned to the relatedness level TS-VS. Additionally, based on empirical results, it has been proposed that less biased predictions across breeding cycles/years can be expected for highly heritable traits like FHB or plant height, as shown in barley [84] compared to quality traits in wheat [85] and sugar beet [86].
The correspondence between a high proportion of retained anthers and an increased FHB severity has been systematically found across wheat populations [58]. AR is considered a morphological marker for FHBr, especially for its Type I or resistance to initial infection [52,59]. Here, indirect selection based on AR measurements led to correlations between 0.30 and 0.51, and they were on average 34% lower than the actual direct selection based on AUDPC scores. Anther retention keeps the advantages of traits like plant height or flowering time as a low-cost phenotype indirectly associated with FHBr, but unlike the latter two, it is agronomically more neutral in the sense that further breeding implications/considerations like the selection of unfavorable tall genotypes might be largely avoided. Although AR was also correlated with plant height in the germplasm studied here, this correlation was low and mostly attributed to the presence/absence of the dwarfing alleles at the Rht-B1 and Rht-D1 loci (data not shown). Multi-trait GS models in the FHB resistance context in wheat have been studied, most of them accounting for flowering time and plant height [27,87,88] as covariates or related traits like FHB incidence, DON concentration, or Fusarium damage kernel index (FDK) [89,90]. An upsurge in prediction accuracy of MT.GS models was confirmed here with AR as the indicator trait for most of the validation scenarios. For instance, increments in prediction accuracy averaged 46% across the three scenarios consisting of multiple trial-training sets predicting 18-OV3 lines; however, the MT.GS models performed slightly worse (−3%) when aiming to predict the set of 17-OV2 lines with lines from the previous year. From simulation and empirical proofs, it has been noted that both higher heritabilities and correlations are needed for the indicator trait in order to yield higher accuracies [23,88,91,92]. As both FHBr and AR had a similar heritability in the study at hand, the degree of their correlation in each GS scenario seemed to be the major constraint for improving the prediction accuracy. Hence, in cases when this correlation was low, the prediction accuracy could not be improved beyond the respective single-trait model, while an advantage of using AR as an assisting trait in MT.GS models was generally seen with an increase in the correlation of AR-FHBr. Finally, MT.GS models assisted by AR also represented an advantage over their indirect phenotypic selection counterparts.

Conclusions
The approach presented here to account for temperature variations within and among seasons for correcting FHB severity scores, although not being the most comprehensive, turned out to be a simple and fast way to produce consistent predictions for observed and unobserved winter wheat genotypes. Furthermore, multi-trait genomic selection models evidenced the feasibility of the usage of AR as an assisting trait even in cases without FHB records available (e.g., early breeding stages), which could be scored and used for indirect FHBr selection and will not require additional considerations to compensate for undesirable indirect effects during selection.