Multi-Year Dynamics of Single-Step Genomic Prediction in an Applied Wheat Breeding Program

The availability of cost-efficient genotyping technologies has facilitated the implementation of genomic selection into numerous breeding programs. However, some studies reported a superiority of pedigree over genomic selection in line breeding, and as, aside from systematic record keeping, no additional costs are incurring in pedigree-based prediction, the question about the actual benefit of fingerprinting several hundred lines each year might suggest itself. This study aimed thus on shedding some light on this question by comparing pedigree, genomic, and single-step prediction models using phenotypic and genotypic data that has been collected during a time period of ten years in an applied wheat breeding program. The mentioned models were for this purpose empirically tested in a multi-year forward prediction as well as a supporting simulation study. Given the availability of deep pedigree records, pedigree prediction performed similar to genomic prediction for some of the investigated traits if preexisting information of the selection candidates was available. Notwithstanding, blending both information sources increased the prediction accuracy and thus the selection gain substantially, especially for low heritable traits. Nevertheless, the largest advantage of genomic predictions can be seen for breeding scenarios where such preexisting information is not systemically available or difficult and costly to obtain.


Introduction
Four prediction scenarios have been of principal interest when implementing genomic selection into breeding programs: The prediction of untested and tested genotypes in tested and untested environments. These combinations have thus been investigated in various studies, which found that the prediction of tested genotypes in already tested environments results in the highest accuracy followed by the prediction of untested genotypes in tested environments [1,2]. The most challenging scenario is though given by predicting untested genotypes in untested environments, which corresponds to, among others, genomic-based prediction across multiple years in applied breeding programs [3][4][5][6]. This prediction problem can however be simplified in some cases to predicting already tested genotypes in untested environments or years by including preexisting information e.g., from preliminary yield trials for the trait of interest or of correlated secondary traits into prediction models [7][8][9].
Such a multi-year prediction is oftentimes for a large extent equivalent with a prediction across multiple breeding cycles or cohorts in cereal breeding [10], giving rise to the further challenge of an increasing genetic distance between the training population and selection candidates, which generally results in lower prediction accuracies in comparison to training populations that are closely related to the selection candidates [11,12]. An additional issue, aside from actual relationships, is given by the several possibilities to address this genetic distance in order to derive accurate performance predictions. Marker-based relationship matrices have been frequently reported to outperform models with pedigree relationship matrices for this purpose [13,14], whereas some studies have reported similar or even superior performance of pedigree in comparison to genomic selection in applied line breeding programs [15][16][17].
Given that no additional direct costs or labour, aside from systematic record keeping, are incurring when exploiting pedigree data for deriving breeding values, the question about the benefit of fingerprinting several hundred of even thousands of lines each year might suggest itself. One possibility to putatively enhance accuracies above pure pedigree and genomic prediction is given by blending both relationship matrices for a so-called single-step prediction [18,19]. The merit of such a single-step prediction is though dependent on the fine-tuning of several scaling factors used for setting-up a combined relationship matrix [20], and on the dynamics of genotype-by-environments interactions when prediction across multiple years. The aims of this study were thus to (i) compare various possibilities for blending pedigree and genomic relationship information, and (ii) to investigate the dynamics of pedigree, genomic, and single-step prediction models across a time period of ten years in an applied wheat (Triticum aestivum L.) breeding program.

Plant Material
A population of 4032 recombinant inbred and double haploid breeding lines, developed in an applied winter wheat breeding program, was analysed in this study (Table 1). A subset of 2567 of these lines were phenotyped for grain yield, protein content using near-infrared spectroscopy (NIRS), and protein yield as the product of both traits merely in preliminary yield or observation trials at one location in Austria between 2012 and 2018, where F 4:5 generation lines were tested unreplicated alongside replicated checks. Another subset of 1464 of these lines were also investigated for the mentioned traits in a total of 148 Central and Eastern European multi-environment trials in a total of different 38 locations from 2010 to 2019. The latter lines represented selection candidates with superior performance that were advanced to multi-environment testing in several countries ranging from Austria, Serbia and Romania to Turkey, and within each year a different subpopulation of 113-209 lines was tested in 2-24 trials, depending on the year and trait. Trial series within each year were generally unbalanced and partially replicated lines were tested alongside checks in plots of 5-10 m 2 size within each trial, while each line was on average replicated five times for protein yield and the protein content and eight times for grain yield across all trials. The number of lines from preliminary trials, which were retested in multi-environment trials in the corresponding subsequent year varied finally between 76 and 170. All trials were managed according to good agronomical practice at the given location.  2010  127  2011  162  2012  208  142  130  2013  193  306  170  2014  202  361  156  2015  206  676  151  2016  192  666  139  2017  209  875  83  2018  113  554  76  2019 114 † Including lines that were tested in multiple years of multi-environment trials; ‡ Number of F 5:6 , F 5:7 and doubled haploid lines tested in multi-environment trials; § Number of F 4:5 and doubled haploid lines tested in the unreplicated preliminary yield trial; ¶ Number of selected lines that were retested in multi-environment trials in the next year.

Statistical Analysis of the Phenotypic Data
Phenotypic data from all 148 multi-environment trials and the seven preliminary yield trials were firstly analysed individually with various models correcting for spatial trends. All 15 possible combinations of random row and/or column effects with/without modelling autoregressive variancecovariance structures between the plots either in row, in column or in both directions [21] were compared with a baseline model without spatial correction (Supplementary Table S1). The best fitting model was chosen by Akaike's information criterion (AIC) to derive best linear unbiased estimates (BLUE) and compute the heritability by where σ 2 G designates the genetic variance of the investigated trait, and MVD is the mean variance of a difference between the BLUEs [22]. The derived BLUEs from the 148 multi-environment trials were subsequently used for an across-trial analysis within each year with a linear mixed model of the form y jk = µ + g j + t k + e jk (2) where y jk are the BLUEs from the individual trials for grain yield, protein content and protein yield, respectively; µ is the grand mean; and g j is the effect of the j-th line which was modelled fixed to derive year-specific BLUEs and random to estimate the genetic variance. The effect of the kth trial t k was fixed, while the effect e jk that incorporated both the line-by-trial interaction variance and the residual effect was assumed to be random following a normal distribution with e ∼ N 0, Iσ 2 e . It should be stressed that the preliminary yield trials were not subject to an across-trial analysis, and each preliminary yield trial was regarded as an individual set in all subsequent analysis. The year-wise heritabilities obtained in this across-trial analysis were again derived using Equation (1). All phenotypic analyses were conducted with ASReml 3 for R 3.6.3 [23].

Pedigree and Genotypic Data
The 4032 breeding lines could be grouped into 1420 different families with a size of 1-34 lines per family and a genealogy of 1151 ancestors tracing back up to 13 generations. DNA of all investigated breeding lines was extracted with a modified protocol following [24], and each line was genotyped with the genotyping-by-sequencing (GBS) approach from Diversity Array Technologies in Australia [25]. Markers with more than 10% of missing data and a minor allele frequency smaller than 5% were removed, while only one SNP of completely identical markers was retained for all for further analyses. Therefore, a final set of 3137 SNP markers was available after quality control and chromosome-wise imputing missing data points with the missForest algorithm [26]. The part of missing datapoints amounted 5.7% and an imputation accuracy of r = 0.93 was assessed at a 10% missing level by masking another 4.3% of the data points and correlating their observed and imputed marker profiles in a 50 times replicated resampling scheme. Originally missing data points not caused by masking were excluded from this evaluation as suggested by [27]. This marker set was subsequently used to examine the population structure by a principal component analysis (Supplementary Figure S1) and utilized for modelling genomic relationships in all employed prediction models. Phenotypic and genotypic data for a simulated population of wheat lines based on the actual GBS maker data as well as accompanied R Code is furthermore available as supplementary material to illustrate the mentioned prediction models that will be described in the next section.

Comparison of Phenotypic with Pedigree, Genomic and Single-Step Prediction across Years
The merit of different models for predictive breeding was assessed in a forward prediction of 2013-2019, where a random sample of 50 lines tested in multi-environment trials constituted one Agronomy 2020, 10, 1591 4 of 18 validation population for each of these years. Phenotypic prediction was based on the observed line performance in unreplicated preliminary yield trials in the respective preceding year, while pedigree-based and genomic-based prediction was conducted by randomly sampling a unique set of 100 lines tested in multi-environment trials from each of the three years preceding the validation year into a training population ( Figure 1A). It should be noticed that none of the lines in the validation population were contained in the mentioned training population to avoid an upward bias of the prediction accuracy, and an assessment of the within-family was not feasible as the validation populations were mostly comprised of families with less than three lines and merely three families contained more than ten lines. The described sampling scheme was repeated 50 times for each year, and the corresponding breeding values were obtained from Best Linear Unbiased Prediction Models for each trait of interest: where y is an Nx1 vector of BLUEs obtained in the year-wise phenotypic analysis with model (2), while the vector e designates of residual effects with e ∼ N 0, Iσ 2 e . X is a fixed effect matrix with the corresponding vector b modelling the grand mean as well as fixed year effect. Additionally, the potential of combining multi-environment with preliminary yield trial data for a pedigree-assisted and genomic-assisted prediction was assessed by using BLUEs of both multi-environment and preliminary yield trials in model (3), where b modelled the grand mean as well as fixed year-by-set effects, i.e., a multi-environment trial series and a preliminary trial tested in the same year had different fixed effects. The prediction models were furthermore extended by an additional fixed group effect, differentiating between multi-environment and preliminary yield trials, while a unique residual variance was assigned to each of these groups.
Agronomy 2020, 10, x FOR PEER REVIEW 5 of 20 Figure 1. Schematic representation of the tested prediction scenarios with 3-year training population of lines tested in multi-environment trials (A) as well as cumulative training population coming from all years preceding a validation year (B). A total of 100 lines from each year were randomly sampled in a training population and 50 lines in a validation population in the first scenario, while in the second scenario all available lines were sampled in the training population and the prediction models were validation with the entire set of lines that was in common between the respective preliminary yield trials and subsequent year of multi-environment trials. Prediction models were either fitted without pre-existing information of the selection candidates in the validation population (blue arrows) or exploiting such information in common models (purple arrows) and compared with phenotypic selection based on observations from preliminary yield trials (red arrows).
The relationship information was modelled by the random effects matrix and the corresponding vector of additive line effects with ~ N(0, σ G 2 ) for pedigree best linear unbiased prediction (P-BLUP) and ~ N(0, σ G 2 ) for genomic best linear unbiased prediction (G-BLUP). The necessary pedigree relationship matrix was computed by the recursive method proposed by [28], Figure 1. Schematic representation of the tested prediction scenarios with 3-year training population of lines tested in multi-environment trials (A) as well as cumulative training population coming from all years preceding a validation year (B). A total of 100 lines from each year were randomly sampled in a training population and 50 lines in a validation population in the first scenario, while in the second scenario all available lines were sampled in the training population and the prediction models were validation with the entire set of lines that was in common between the respective preliminary yield trials and subsequent year of multi-environment trials. Prediction models were either fitted without pre-existing information of the selection candidates in the validation population (blue arrows) or exploiting such information in common models (purple arrows) and compared with phenotypic selection based on observations from preliminary yield trials (red arrows).
The relationship information was modelled by the random effects matrix Z and the corresponding vector g of additive line effects with g ∼ N 0, Aσ 2 G for pedigree best linear unbiased prediction (P-BLUP) and g ∼ N 0, Gσ 2 G for genomic best linear unbiased prediction (G-BLUP). The necessary pedigree relationship matrix A was computed by the recursive method proposed by [28], while the genomic relationship matrix G was built following [29] where W is a centered marker matrix of the j lines with W jl = Z jl + 1 − 2p l and p l being the allele frequency at the lth locus. Blending the pedigree and genomic relationship matrix was facilitated by the method proposed by [18] as well as [19] for a single-step genomic best linear unbiased prediction (SSG-BLUP), assuming an additive line effect with g ∼ N 0, Hσ 2 G , where H was computed by with G mod = G adj + A 22 , and G adj = a + bG as adjusted genomic relationship matrix according to the suggestion by the authors of [30], which was employed to account for the impact of genetic trends across multiple generations by the intercept a and models the according reduction in genetic variance from the base population to the population of genotyped lines by b. The matrix A 11 contains the pedigree relationship between non-genotyped lines, A 22 the pedigree relationship between genotyped lines, while A 12 and A 21 model the pedigree relationship between genotyped and non-genotyped lines. Given that lines are oftentimes genotyped during a stage of preliminary yield trials in applied line breeding programs, a less costly strategy was tested that restricted genotyping to advanced lines tested in multi-environment trials, i.e., A 11 contained all lines of the validation population and A 22 modelled the relationship between lines in the training population for the single-step genomic predictions tested in the study at hand, while A 12 and A 21 modelled the relationship between lines in the training and validation population. The inverse of H, which is necessary for solving the underlying mixed model equations, was given by [20] where α, β, τ and ω are scaling factors influencing the mixing proportion between the involved pedigree and genomic relationship matrix. Based on the single-step prediction methodology, the merit of blending the genomic and pedigree relationship matrix was finally tested assuming that a breeding programs budget allows to genotype all early generation lines, i.e., A 11 was absent and A 22 was equivalent to A when deriving H mod −1 by It should be noticed that in this scenario the full matrix H −1 given in (6) is reduced to its lower right quadrant, i.e., the blending aspect of the single-step genomic prediction approach, and this approach will be referred to as pedigree-genomic prediction (PG-BLUP) in the study at hand. The scaling factors in H −1 and H mod −1 for both SSG-BLUP and PG-BLUP were subsequently investigated for their merit to increase the prediction accuracy by firstly varying τ and ω in the intervals τ = [0, 2] and ω = [−1, 1], excluding the combination ω = 1 and τ = 0 as H is indefinite in this case. The scaling factors α and β were furthermore kept constant at α = 1 and β = 0 at first, whereas they were subsequently varied in the grid (α, β) ∈ [0.05, 1]x[0.05, 1] when the two other scaling factors were kept constant at τ = 1 and ω = 1.
Agronomy 2020, 10, 1591 6 of 18 The above-described models were subsequently tested in another resampling scheme, where a unique set of 100 lines tested in multi-environment trials from one to nine years preceding the validation year were sampled into a training population in order to investigate the effect of its size. Please notice that for predicting 2013 a maximum of three years, i.e., 300 lines were used for model training, while for 2019 the entire range of one to nine years, i.e., 100-900 lines in the training population could be explored. Additional to the resampling schemes, a forward prediction using all available lines from multi-environment trials as well as the specific or all preliminary yield trials preceding a validation year for model training was finally conducted ( Figure 1B). The latter scenario combined all data from previous multi-environment and preliminary yield trials for prediction model training and represented thus an option to vastly increase the size of the training population albeit with lines possessing qualitatively lower phenotypic records. The prediction accuracy for all models was obtained from the correlation of the predicted breeding values with the observed genotypic performance divided by the square root of the heritability [31]. The prediction accuracy for phenotypic selection was analogously derived from the correlation of the BLUEs from the preliminary yield trial and the observed genotypic performance in a corresponding validation year divided by the square root of the heritability. The relationship matrices A and H were generated with the R package AGHmatrix [32], and all models for genomic and pedigree prediction were fitted with sommer [33].

Prediction Accuracy within and across Families
A simulation study was subsequently conducted to gain further insight into the performance of the empirically tested pedigree, genomic and single-step models particularly regarding the accuracy of predictions within bi-parental families and across family averages, which was not possible in the empirical study due to the above-mentioned small family sizes. For this purpose, a total of 50 families that were actually tested in preliminary yield trials in 2015-2018 were 50 times randomly sampled, and the existing marker information from the according parents were used to simulate progeny populations with 50 recombinant inbred lines per family. These validation populations were generated using the R/qtl package [34], which features a Stahl model [35] that assumes independent crossovers along the chromosomes to simulate recombination breakpoints. The training population was again built by sampling 100 genotypes from the three preceding breeding cycles or cohorts as in the empirical study ( Figure 1A).
The underlying causal variants of a quantitatively inherited trait were represented by sampling N QTL = 100 marker loci, while another set of N SNP = 2000 markers was sampled to represent linked loci for building genomic relationship matrices. The QTL effects α were randomly sampled from an identical and independent normal distribution with α ∼ N(0, 1) to obtain the estimated breeding values of all lines by P EBV = Q P α + e = P TBV + e (8) where Q P is the matrix of marker genotypes of the lines in the training and validation population, and α is the vector of QTL effects that were used to derive the vector P TBV of true breeding values. The entries for the vector of error effects e were randomly sampled from a normal distribution with zero mean and a variance equal to where σ 2 TBV is the variance of the true breeding values, and h 2 an aspired heritability of h 2 = 0.30, h 2 = 0.50 and h 2 = 0.70 that was separately determined for lines within each of the breeding cycles or cohorts representing the training population. The corresponding heritability of the validation population was on the other hand set to h 2 = 0.10, h 2 = 0.30 and h 2 = 0.50 in order to reflect the lower heritability of testing in preliminary yield trials. The accuracy of the respective prediction models was Agronomy 2020, 10, 1591 7 of 18 assessed by correlating the predicted with the true breeding values, while the true breeding value TBV ij of each line was additionally dissected into TBV ij = TBV F i + TBV L ij (10) where TBV F i is the true breeding value of the i-th family's average and TBV L ij is the deviation of the true breeding value of the j-th line from the i-th family´s average. The predicted breeding values were analogously divided into family averages as well as deviations from the average, of which both were used for assessing the prediction accuracy for the respective source of variation.

Results
A high average heritability was estimated for the protein content (H 2 = 0.76), followed by the grain yield (H 2 = 0.58) and protein yield (H 2 = 0.43) according to expectations (Supplementary  Table S2) [36][37][38]. Grain yield varied between 39.2 and 100.3 dt ha −1 and the protein content between 10.7 and 17.2% across all years, resulting in a protein yield of 6.0-13.6 dt ha −1 in the analysis across trials that were common for both traits. Mixing pedigree and genomic relationship matrices for pedigree-genomic-based prediction (PG-BLUP) gave hardly any benefit over genomic-based selection, i.e., when all lines in the training and validation had genotypic data and ω = 1 as well as β = 0 in H mod −1 [7], for any of the investigated traits without preexisting information of the selection candidates in the tested resampling scheme with 3-year training populations ( Figures 1A and 2A-F). Utilising data from preliminary yield trials for a pedigree-genomic-assisted prediction resulted, however, in a noticeable increase of up to 7.0-19.8% in prediction accuracy over pedigree-assisted (H mod −1 = A −1 ) and 2.5-10.9% over genomic-assisted (H mod −1 = G adj −1 ) prediction models, respectively, when conducting a grid-search for the optimal combination of α and β ( Figure 2G-I). Altering τ and ω in H mod −1 of the PG-BLUP model in this scenario revealed on the other hand a negligible advantage over the baseline G-BLUP and P-BLUP models except for the protein content ( Figure 2J-L). The more cost-efficient single-step genomic prediction with a non-genotyped validation population (SSG-BLUP) showed an increase in predictive performance for grain yield and protein content when ω → 1 in H −1 [6] in comparison to the baseline P-BLUP model (Figure 3 central + right columns), but it resulted generally in lower prediction accuracies than the mentioned pedigree-genomic prediction (PG-BLUP) models. Blending of the pedigree and genomic relationship matrices by αG adj + βA 22 appeared not to be favourable in SSG-BLUP as higher prediction accuracies were achieved when α → 1 and β → 0 . Noticeable, a larger weight on the pedigree relationship matrix was beneficial for the protein yield (Figure 3 left column) both for the single-step-based and single-step-assisted predictions utilising preexisting information of lines contained in the validation population.
As the difference in prediction accuracy of the optimum and equal weighting of α and β was less than 2%, the factor combination α = β = 0.5 and τ = ω = 1 was used for pedigree-genomic predictions (PG-BLUP) in this scenario. A similar observation was made, at least for grain yield and the protein content, for the scaling parameters in the single-step genomic predictions (SSG-BLUP), which were thus set to α = 1, β = 0 and τ = ω = 1. Enlarging the training population size by using an increasing number of years preceding a validation year from model training resulted in many cases in an increase of the prediction accuracy for the pedigree-genomic predictions (PG-BLUP) (Figure 4). Although there was a general trend of an increase in prediction accuracy with an increase of the training population size, employing the largest possible training population size did not always result in the highest prediction accuracy as, e.g., the prediction accuracy for protein yield was larger in 2016 with a training population size of 300 (3 years) than with 600 (6 years) lines ( Figure 4A). Similar patterns were also found for the other tested models for predicting grain yield (Supplementary Figure S2), protein content (Supplementary Figure S3) and protein yield (Supplementary Figure S4).  genomic prediction (PG-BLUP) models. Blending of the pedigree and genomic relationship matrices by α adj + β appeared not to be favourable in SSG-BLUP as higher prediction accuracies were achieved when α → 1 and β → 0. Noticeable, a larger weight on the pedigree relationship matrix was beneficial for the protein yield (Figure 3 left column) both for the single-step-based and singlestep-assisted predictions utilising preexisting information of lines contained in the validation population.  . Average prediction accuracy for protein yield (first columns), grain yield (central column) and protein content (right column) obtained from the 50 times repeated resampling scheme with a 3-year training population in a single-step genomic prediction scenario where only lines in the training population are genotypes to derive single-step-based (A-F; blue coloured) and single-step-assisted predictions (G-L; red coloured) for the non-genotyped lines in the validation population with the latter exploiting preexisting information of the validation population from preliminary yield trials. The weights on the scaling factors α, β, τ and ω to create the relationship matrix H mod −1 (Equation (7)) that blended pedigree and genomic relationship information were varied in a grid search, while the factors α and β were kept constant at α = 1 and β = 0 when varying τ and ω, whereas τ and ω were kept constant at ω = 1 and τ = 1 when modifying α and β. Isolines of same prediction accuracy for different combinations of the scaling factors are highlighted in black colour, while a darker colouring within the respective subplots A-L indicates a higher prediction accuracy for the particular model and scaling factor combination.
of the training population size, employing the largest possible training population size did not always result in the highest prediction accuracy as, e.g., the prediction accuracy for protein yield was larger in 2016 with a training population size of 300 (3 years) than with 600 (6 years) lines ( Figure 4A). Similar patterns were also found for the other tested models for predicting grain yield (Supplementary Figure S2), protein content (Supplementary Figure S3) and protein yield (Supplementary Figure S4).  . Prediction accuracy for protein yield using pedigree-genomic-based (A-C; blue coloured) as well as pedigree-genomic-assisted (D-F; red coloured) forward predictions with the latter exploiting pre-existing information from preliminary yield trials. The training population for each prediction model was built by randomly sampling 100 lines from one to nine years preceding the respective validation year. A darker colouring generally indicates a higher prediction accuracy.
Using the same scaling parameters as described above, the models combining pedigree and genomic information were subsequently compared with the baseline G-BLUP and P-BLUP models in a forward prediction across multiple years with an accumulating training population of all lines tested in the years preceding the validation year ( Figure 1B). Genomic-based prediction (G-BLUP) was on average superior to pedigree-based prediction (P-BLUP) for all investigated traits in this forward prediction ( Figure 5), while using single-step-based prediction (SSG-BLUP) with a training population of genotyped advanced generation lines led to a notable increase of the prediction accuracy in comparison to the latter model except for the protein content.
Combining data from all previous multi-environment and preliminary yield trials for prediction model training was used to test the merit of vastly increasing the size of the training population for the mentioned models, especially for the more recent years ( Figure 6). This augmentation of the training population did not lead to a clear advantage over prediction models that used only preexisting information from one preliminary yield trial for a particular set of selection candidates in combination with a training set comprised of lines tested in multi-environment trials (Figure 7), and the average difference between both methods ranged from r = −0.03 to r = 0.02.
Except for the protein yield, all tested pedigree and genomic models without preexisting information of the selection candidates performed on average worse in comparison to phenotypic selection based on observations from preliminary yield trials, which was, however, dependent on the specific year ( Supplementary Figures S5 and S6). Notwithstanding, exploiting phenotypic data from preliminary yield trials for model training resulted in the majority of cases in more accurate predictions than possible by a selection based on phenotypic observations alone. Markedly, pedigree-assisted selection was in this scenario inferior to genomic-assisted selection for protein yield ( Figure 5B), but performed similar for the protein content ( Figure 5F), and even superior for grain yield ( Figure 5D). Nevertheless, making additional use of molecular marker data for a pedigree-genomic-assisted selection (PG-BLUP) resulted in prediction accuracies that were 48.1%, 5.2%, and 0.7% higher in comparison to pedigree-assisted selection (P-BLUP) for each of the aforementioned traits, whereas a single-step-assisted selection (SSG-BLUP) performed worse than each of the mentioned methods.
Agronomy 2020, 10, x FOR PEER REVIEW 11 of 20 Using the same scaling parameters as described above, the models combining pedigree and genomic information were subsequently compared with the baseline G-BLUP and P-BLUP models in a forward prediction across multiple years with an accumulating training population of all lines tested in the years preceding the validation year ( Figure 1B). Genomic-based prediction (G-BLUP) was on average superior to pedigree-based prediction (P-BLUP) for all investigated traits in this forward prediction (Figure 5), while using single-step-based prediction (SSG-BLUP) with a training population of genotyped advanced generation lines led to a notable increase of the prediction accuracy in comparison to the latter model except for the protein content.  . The employed training populations included all available lines tested in multi-environment trials preceding a specific validation year, whereas both scenarios without (TP small ) or with augmenting the training population with lines tested in all preliminary yield trials preceding a validation year (TP large ) were considered. All prediction models were fitted either without including preexisting information of a specific validation population from the preliminary yield trial preceding the validation year ("Based"-predictions) or with including this source of phenotypic information into the training population ("Assisted"-predictions). the mentioned models, especially for the more recent years (Figure 6). This augmentation of the training population did not lead to a clear advantage over prediction models that used only preexisting information from one preliminary yield trial for a particular set of selection candidates in combination with a training set comprised of lines tested in multi-environment trials (Figure 7), and the average difference between both methods ranged from r = −0.03 to r = 0.02.  . Difference in the prediction accuracy with (TPlarge) or without (TPsmall) augmenting the training population with lines tested in all preliminary yield trials preceding a validation year (∆ = r TP large − r TP small ) using pedigree-based (P-BLUP), single-step-based (SSG-BLUP), genomic-based (G-BLUP) and pedigree-genomic-based (PG-BLUP) forward predictions (A-C; blue coloured) as well as pedigree-assisted, single-step-assisted, genomic-assisted and pedigree-genomic-assisted forward predictions with the latter exploiting preexisting information of the validation population from preliminary yield trials (D-F; red coloured). A darker colouring generally indicates an advantage for augmenting the training population with lines tested in all preliminary yield trials preceding a validation year.
Except for the protein yield, all tested pedigree and genomic models without preexisting information of the selection candidates performed on average worse in comparison to phenotypic selection based on observations from preliminary yield trials, which was, however, dependent on the specific year ( Supplementary Figures S5 and S6). Notwithstanding, exploiting phenotypic data from preliminary yield trials for model training resulted in the majority of cases in more accurate predictions than possible by a selection based on phenotypic observations alone. Markedly, pedigreeassisted selection was in this scenario inferior to genomic-assisted selection for protein yield ( Figure   Figure 7. Difference in the prediction accuracy with (TP large ) or without (TP small ) augmenting the training population with lines tested in all preliminary yield trials preceding a validation year (∆ = r TP large − r TP small ) using pedigree-based (P-BLUP), single-step-based (SSG-BLUP), genomic-based (G-BLUP) and pedigree-genomic-based (PG-BLUP) forward predictions (A-C; blue coloured) as well as pedigree-assisted, single-step-assisted, genomic-assisted and pedigree-genomic-assisted forward predictions with the latter exploiting preexisting information of the validation population from preliminary yield trials (D-F; red coloured). A darker colouring generally indicates an advantage for augmenting the training population with lines tested in all preliminary yield trials preceding a validation year.
Similar observations were made in the simulation study, where the prediction accuracy for differentiating lines within families could be increased by including phenotypic data from the validation population into the model (Table 2). Noticeably, within-family prediction was in this case also possible with pedigree relationships, resulting in the same accuracy as phenotypic within-family prediction. Pedigree-assisted prediction performed however still worse than genomic prediction, whereas this difference became smaller with an increase in heritability. Lastly, predicting across family averages was always very accurate and even close to one when utilising preexisting information of the selection candidates for model training. Table 2. Comparison of the prediction accuracies for different validation metrics in the simulation using only phenotypic (OBS), pedigree (P-BLUP), single-step (SSG-BLUP), genomic (G-BLUP) or pedigree-genomic (PG-BLUP) predictions with and without using preexisting information of the selection candidates.

Discussion
Blending vast pedigree records with DNA fingerprints by single-step prediction has been a major topic in animal breeding [20], and several studies in dairy cows, poultry and pigs showed a clear advantage of implementing this method into breeding programs [39]. The application of this approach to enlarge training populations has likewise revealed a slight advantage for predicting fingerprinted genotypes in plant breeding programs for which extensive pedigree records were available [15,40]. Some merit of using a single-step genomic prediction was found in this study when testing a breeding strategy in which only advanced generation lines tested in multi-environment trials are genotyped to improve the prediction of non-genotyped early generation lines tested in preliminary yield trials. The above-mentioned studies employed however fixed values for the scaling factors α, β, τ and ω, whereas a slight advantage was found by [41] when varying the mixing proportion of the pedigree and genomic relationship matrix by altering the ratio τ : (τ − 1) with the constraint τ = ω and τ > 0. The variation of the scaling factors showed on the other hand a marginal effect on accuracy of the tested single-step genomic prediction approach in the study at hand, making the basic weights of τ = ω = 1 preferable, while setting arbitrary values of α = 0.95 and β = 0.05 can furthermore ensure that G mod in the lower right quadrant of H is invertible and aid in the convergence of the respective prediction models [39] albeit this was not an issue in the study at hand.
Notwithstanding, in case that the budget of a breeding program allows genotyping all early generation lines, the aim of blending relationship matrices in the here tested pedigree-genomic prediction can be seen in targeting genetic variation that could not be explained by markers resulting in a higher heritability in comparison to the G-BLUP model (Supplementary Table S3). Altering α and β in the multi-year forward prediction in the studied wheat breeding program resulted accordingly in an appreciable increase in prediction accuracy, which was in line with previous reports in small-grain cereals [42,43]. A similar goal was pursued by the authors of [14], who suggested to model two separate kernels for combining pedigree and genomic relationship information in prediction models. This approach gave also promising results for the mentioned prediction across years, with prediction accuracies similar to the presented pedigree-genomic selection models (data not shown). However, fulfilling both goals of targeting genetic variation not explained by marker data as well as enlarging training populations with non-genotyped lines is difficult to realise with such two-kernel models, making single-step prediction with a stronger blending of the genomic and pedigree relationship matrices putatively a more preferable approach. Notwithstanding, the quality of the phenotypic data is in such an approach decisive, which was indicated by a similar or even lower prediction accuracy in the forward prediction of the protein content, grain yield and protein yield when enlarging the training population by more than 2000 lines that were solely tested in unreplicated preliminary yield trials (Figures 5 and 7).
Regarding the comparison of genomic models with genotyped selection candidates and predictions based on pedigree data alone, the latter showed clearly a worse average performance, especially when preexisting information of the selection candidates was not included into the training population. One reason is certainly given by the fact that pedigree-based prediction does not address the Mendelian sampling term, i.e., the variation within progeny families [44,45]. Given the absence of line per se performance, the within-family prediction accuracy is thus zero for pedigree-based selection. The within-family prediction accuracy is, however, also rather low for genomic-based prediction [46,47] in comparison to predictions across a diverse panel of lines and families [11,12] as seen in the simulation study. Including full-sibs and half-sibs as in the genomic-assisted and pedigree-genomic-assisted selection approaches can thus be regarded as one factor that resulted in an approximately twice as high within-family prediction accuracy in the simulation. This strategy showed moreover in most cases a clear benefit in comparison to the reliance on genomic-based, pedigree-genomic-based or phenotypic selection alone in the empirical study. Yet, exploiting this preexisting information of the selection candidates with the tested models did not take the genetic correlation between multi-environment and preliminary yield trials into account, while [9] suggested to regard, e.g., grain yield in multi-environment and preliminary yield trials as two different traits in a bivariate genomic prediction model. Testing the latter approach revealed a genomic correlation of r G = 0.75, r G = 0.34 and r G = 0.28 between the multi-environment and preliminary yield trials for the protein content, grain yield and protein yield, respectively, but did not result in a higher prediction accuracy in comparison to the univariate models presented in this study (Supplementary Figure S7).
Notwithstanding, it should be mentioned that the pedigree-assisted models including phenotypic observations from preliminary yield trials performed sometimes similar or even superior to the corresponding genomic-assisted selection model. This observation was likewise made by the authors of [15,17] when predicting already tested lines across a multitude of international mega-environments, and might be attributed to the ability to differentiate between lines within families if preexisting information of the selection candidates is exploited for making predictions. Further possible influencing factors include the altering of performance rankings by inducing a shrinkage towards family means when modelling covariances between family members [48] as well as the previous mentioned drawing of performance information from half-sibs and full-sibs [49,50]. Therefore, unless traits for an indirect selection that explain a sufficient proportion of genetic variance of the target trait are available, prediction accuracies within families are zero or rather low [51], and predictions based on pedigree and/or genomic relationships differentiate for the larger part between averages of progeny populations. Although this might seem rather trivial at first glance, as such a differentiation is also feasible by the phenotypic mid-parent value [52], the latter is oftentimes less accurate than predictions containing relationship information [53]. Phenotypic mid-parents are moreover dependent on the magnitude of the parents´reliability for the traits of interest and both parents have to be tested "in-house" by a breeding program, which is not always the case when germplasm is exchanged and crossed in the framework of the breeders´exemption. Accurately differentiating between family averages by making use of pedigree and/or marker data can furthermore guide a breeder´s decisions, e.g., in how many genotypes should be advanced to the next generation from each family or even if breeding efforts for some of the conducted crosses should be discontinued, as the population mean is a decisive part in the selection among crosses by criteria such as "usefulness" [52,53].

Conclusions
Given the availability of deep pedigree records, pedigree prediction can perform similar to genomic prediction if preexisting information of the selection candidates is available, while blending both information sources can increase the prediction accuracy and thus the selection gain substantially, especially for low heritable traits. Nevertheless, the largest advantage of predictions using genomic data can generally be seen for breeding scenarios and traits were such preexisting information is not systemically available or difficult and costly to obtain.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4395/10/10/1591/s1, Figure S1: Principal component analysis of the 4032 recombinant inbred and double haploid breeding lines involved in the study. The lines are coloured according to their membership to the breeding cycles 2010 to 2019 that were designated to the first year of multi-environment testing of a particular cohort; Figure S2: Prediction accuracy for protein yield using pedigree-based (A), single-step-based (B), genomic-based (C), and pedigree-genomic-based; Figure S3: Prediction accuracy for grain yield using pedigree-based (A), single-step-based (B), genomic-based (C), and pedigree-genomic-based (D) forward predictions (blue coloured) as well as pedigree-assisted (E), single-step-assisted (F), genomic-assisted (G), and pedigree-genomic-assisted (H) forward predictions (red coloured) with the latter exploiting pre-existing information from preliminary yield trials. The training population for each prediction model was built by randomly sampling 100 lines from one to nine years preceding the respective validation year. (D) forward predictions (blue coloured) as well as pedigree-assisted (E), single-step-assisted (F), genomic-assisted (G), and pedigree-genomic-assisted (H) forward predictions (red coloured) with the latter exploiting pre-existing information from preliminary yield trials. The training population for each prediction model was built by randomly sampling 100 lines from one to nine years preceding the respective validation year; Figure S4: Prediction accuracy for the protein content using pedigree-based (A), single-step-based (B), genomic-based (C), and pedigree-genomic-based (D) forward predictions (blue coloured) as well as pedigree-assisted (E), single-step-assisted (F), genomic-assisted (G), and pedigree-genomic-assisted (H) forward predictions (red coloured) with the exploiting pre-existing information from preliminary yield trials. The training population for each prediction model was built by randomly sampling 100 lines from one to nine years preceding the respective validation year; Figure S5: Prediction accuracy for protein yield (A-D), grain yield (E-H), and protein content (I-L) using pedigree-based (P-BLUP), single-step-based (SSG-BLUP), genomic-based (G-BLUP), and pedigree-genomic-based (PG-BLUP) forward predictions as well as pedigree-assisted, single-step-assisted, genomic-assisted, and pedigree-genomic-assisted forward predictions exploiting pre-existing information of the validation population from preliminary yield trials. The displayed results were obtained with an accumulating training population of lines tested in multi-environment trials coming from all years preceding the validation years 2013-2019, while only the preliminary yield trial preceding the validation year was used when fitting the prediction models. The average performance of each model (coloured horizontal lines) is compared with the predictive performance of phenotypic selection based on preliminary yield trials (green horizontal dashed line); Figure S6: Prediction accuracy for protein yield (A-D), grain yield (E-H), and protein content (I-L) using pedigree-based (P-BLUP), single-step-based (SSG-BLUP), genomic-based (G-BLUP), and pedigree-genomic-based (PG-BLUP) forward predictions as well as pedigree-assisted, single-step-assisted, genomic-assisted, and pedigree-genomic-assisted forward predictions exploiting pre-existing information of the validation population from preliminary yield trials. The displayed results were obtained with an accumulating training population of lines tested in multi-environment trials coming from all years preceding the validation years 2013-2019, while all preliminary yield trials preceding the validation year were used when fitting the prediction models. The average performance of each model (coloured horizontal lines) is compared with the predictive performance of phenotypic selection based on preliminary yield trials (green horizontal dashed line); Figure S7: Prediction accuracy for protein yield (A), grain yield (B), and protein content (C) using pedigree (P-BLUP), single-step (SSG-BLUP), genomic (G-BLUP), and pedigree-genomic (PG-BLUP) prediction. A random sample of 50 lines tested in multi-environment trials in 2015-2019 constituted one validation population for each of these years, while 100 lines tested in multi-environment trials from each of the three years preceding a respective validation year were sampled into a training population, Table S1: Frequency of the models used for spatial correction as in [1] with random row and/or column effects with/without modelling autoregressive variance-covariance structures (AR1) between the plots either in row, in column or in both directions; Table S2: Mean, range, and heritability (H 2 ) for protein yield, grain yield, and protein yield across the multi-environment trials conducted between 2010 and 2019; Table S3: Estimated heritability for the pedigree (P-BLUP), genomic