Multi-trait regressor stacking increased genomic prediction accuracy of sorghum grain composition

Cereal grains, primarily composed of starch, protein, and fat, are major source of staple for human and animal nutrition. Sorghum, a cereal crop, serves as a dietary staple for over half a billion people in the semi-arid tropics of Africa and South Asia. Genomic prediction has enabled plant breeders to estimate breeding values of unobserved genotypes and environments. Therefore, the use of genomic prediction will be extremely valuable for compositional traits for which phenotyping is labor-intensive and destructive for most accurate results. We studied the potential of Bayesian multi-output regressor stacking (BMORS) model in improving prediction performance over single trait single environment (STSE) models using a grain sorghum diversity panel (GSDP) and a biparental recombinant inbred lines (RILs) population. A total of five highly correlated grain composition traits: amylose, fat, gross energy, protein and starch, with genomic heritability ranging from 0.24 to 0.59 in the GSDP and 0.69 to 0.83 in the RILs were studied. Average prediction accuracies from the STSE model were within a range of 0.4 to 0.6 for all traits across both populations except amylose (0.25) in the GSDP. Prediction accuracy for BMORS increased by 41% and 32% on average over STSE in the GSDP and RILs, respectively. Predicting whole environments by training with remaining environments in BMORS yielded higher average prediction accuracy than from STSE model. Our results show regression stacking methods such as BMORS have potential to accurately predict unobserved individuals and environments, and implementation of such models can accelerate genetic gain.

Cereal grains provide more than half of the total human caloric consumption globally and amount to 2 over 80% in some of the poorest nations of the world [1]. Sorghum [Sorghum bicolor (L.) Moench], a 3 drought-tolerant cereal crop, is a dietary staple for over half a billion people of semi-arid tropics which 4 is inhabited by some of the most food insecure and malnourished populations [2]. In industrialized 5 countries, such as United States and Australia, grain sorghum is primarily grown for animal feed. 6 But in recent years the uses of sorghum grain have expanded to baking, malting, brewing, and 7 biofortification [3][4][5]. Therefore, genetic improvement of sorghum grain composition is crucial to 8 mitigate the global malnutrition crisis, to increase efficiency of feed grains used in animal production, 9 and to serve evolving niche markets for gluten-free grains. 10 In the last two decades, the use of genome-wide markers in prediction of genetic merit of individuals 11 has revolutionized plant and animal breeding. Genomic prediction (GP) uses statistical models to 12 estimate marker effects in a training population with phenotypic and genotypic data which is then 13 used to predict breeding values of individuals solely from genetic markers [6,7]. Training population 14 1/15 size, genetic relatedness between individuals in training and testing population, marker density, span 15 of linkage disequilibrium and genetic architecture of traits are some of the factors that can affect the 16 predictive ability of the models [8][9][10]. Genomic prediction models are routinely studied and applied 17 by breeding programs around the world in several crops. Novel statistical methods that are capable 18 of incorporating pedigree, genomic, and environmental covariates into statistical-genetic prediction 19 models have emerged as a result of extensive computational research [11]. 20 One of the main advantages of GP is that breeders can use phenotypic values from some lines 21 in some environments to make predictions of new lines and environments. Genomic best linear 22 unbiased prediction (GBLUP) proposed by VanRaden [12] is probably the most widely used genomic 23 prediction model in both plant and animal breeding. Since then GBLUP model has been extended 24 to include G × E interactions resulting in improved prediction accuracy of unobserved lines in 25 environments [13,14]. Burgueño et. al. [13] found an increase in prediction ability of unobserved 26 wheat genotypes by about 20% in multi-environment GBLUP model compared to single environment 27 model. Also an extension of the GBLUP model, Jarquín et. al. [14] introduced a reaction norm model 28 which introduces the main and interaction effects of markers and environmental covariates by using 29 high-dimensional random variance-covariance structures of markers and environmental covariates. 30 While most of the genomic prediction studies have been on individual traits, breeding programs use 31 selection indices based on several traits to make breeding decisions. To address those challenges, 32 expanded genomic prediction models that perform joint analysis of multiple traits have been studied 33 using empirical and simulated data [15,16]. Subsequent improvement in prediction accuracy from 34 multi-trait model over single-trait model depends on trait heritability and correlation between the 35 traits involved [15,17]. 36 Data generated in breeding programs span multiple environment and are recorded for multiple 37 traits for each individual. While multi-environment models and multi-trait models are implemented 38 separately, a single model to account for complexity of variance-covariance structure in a combined 39 multi-trait multi-environment (MTME) model was lacking until Montesinos-López et. al. [18] 40 developed a Bayesian whole genome prediction model to incorporate and analyze multiple traits and 41 multiple environments simultaneously. Montesinos-López et. al. [18] also developed a computationally 42 efficient Markov Chain Monte Carlo (MCMC) method that produces a full conditional distribution 43 of the parameters leading to an exact Gibbs sampling for the posterior distribution. Another MTME 44 model that employs a completely different method was proposed by Montesinos-López et. al. [19]. 45 This method, called the Bayesian multi-output regression stacking (BMORS), is a Bayesian version 46 of multi-target regressor stacking (MTRS) originally proposed by Spyromitros-Xioufis et. al. [20,21]. 47 This method consists of training in two stages: first training multiple learning algorithms for the 48 same dataset and then subsequently combining the predictions to obtain the final predictions. 49 Genomic prediction for grain quality traits has previously been reported in crops such as wheat 50 [22][23][24], rye [25], maize [26], and soybean [27]. Hayes et. al. [28] and Battenfield et. al. [23] used 51 near-infrared derived phenotypes in genomic prediction of protein content and end-use quality in 52 wheat. Multi-trait genomic prediction models can simultaneously improve grain yield and protein 53 content despite being negatively correlated [24,29]. In sorghum, grain macronutrients have shown 54 to be inter-correlated among one another [30], which suggests the multi-trait models may increase 55 predictive ability of individual grain quality traits. The ability to assess genetic merit of unobserved 56 selection candidates across environments is promising for reducing evaluation cost and generation 57 interval in the sorghum breeding pipeline where parental lines of commercial hybrids are currently 58 selected on the basis of extensive progeny testing [31]. In order to extend capacities to performance 59 index selection for multiple environments, we need to study and effectively implement MTME genomic 60 prediction models in our breeding programs. In this study, we report the first implementation of 61 genomic prediction for grain composition in sorghum, and the objective was to assess potential for 62 improvement in prediction accuracy of multi-trait regressor stacking model over single trait model 63 for five grain composition traits: amylose, fat, gross energy, protein and starch.  [32]. The details on experimental field design and agronomic practices are described in Boyles 72 et. al. [33] and Sapkota et. al. [34]. Briefly, the experiments were planted in a two row plots each 6.1 73 m long, separated by row spacing of 0.762 m with an approximate planting density of 130,000 plants 74 ha −1 . Fields were irrigated only when signs of drought stress was seen across the field. The primary panicle of three plants selected from each plot were harvested at physiological maturity. 84 The plants from beginning and end of the row were excluded to account for border effect. Panicles 85 were air dried to a constant moisture (10-12%) and threshed. A 25g subsample of cleaned and 86 homogenized grain ground to 1-mm particle size with a CT 193 Cyclotec Sample Mill (FOSS North 87 America) was used in near-infrared spectroscopy (NIRS) for compositional analysis.

88
Grain composition traits such as total fat, gross energy, crude protein, and starch content can be 89 measured using NIRS. Previous studies have shown high NIRS predictability of the traits used in 90 feed analysis [35,36]. We used a DA 7250 TM NIR analyzer (Perten Instruments). The ground sample 91 was packed in a gradually rotating Teflon dish positioned under the instrument's light source and 92 predicted phenotypic values was generated based on calibration curve for spectral measurements. The 93 calibration curve was built using wet chemistry values from a subset of samples. The wet chemistry 94 was performed by Dairyland Laboratories, Inc. (Arcadia, WI) and the Quality Assurance Laboratory 95 at Murphy-Brown, LLC (Warsaw, NC). The details on the prediction curves and wet chemistry can 96 be found in Boyles et. al. [30].

97
Genotypic data 98 Genotyping-by-sequencing (GBS) was used for genetic characterization of the GSDP and RILs 99 populations [30,33,37]. Sequenced reads were aligned to the BTx623 v3.1 reference assembly 100 (phytozome) using Burrows-Wheeler aligner [38]. SNP calling, imputation and filtering was done 101 using TASSEL 5.0 pipeline [39]. The TASSEL plugin FILLIN for GSDP and FSFHap for RILs 102 population were used to impute for missing genotypes. Following imputation SNPs with minor allele 103 frequency (MAF)<0.01, and sites missing in more than 10% and 30% of the genotypes in GSDP and 104 RILs, respectively, were filtered. The number of genotypes studied for each population represent 105 those with at least 70% of SNP sites. The genotype matrix with 224,007 SNPs from GSDP and 106 56,142 SNPs from RILs population was used for whole genome regression.

Statistical analysis 108
The statistical software environment 'R' was used for model building and analysis [40]. The phenotypic 109 values of the traits were adjusted for random effects of replications within environment using 'lme4' 110 package in R [41]. Principal component analysis was done using the R package 'factoextra' [42]. 111 Marker-based estimates of narrow sense (genomic) heritabilities were calculated using the SNP 112 genotype matrix and phenotypic values using the R package 'heritability' [43]. A matrix with dummy 113 variables '1' and '0' representing combinations of environmental variables (replication and year for 114 GSDP, and replication, year and location for RILs) was used as co-variate in heritability estimation. 115 Single-trait single-environment (STSE) model: 116 The following genomic best linear unbiased prediction (GBLUP) model was used to assess prediction 117 performance of an individual trait from a single environment: where y j is a vector of adjusted phenotypic mean of the j th line (j = 1, 2,..., J ). µ is the overall 119 mean which is assigned a flat prior, g j is a vector of random genomic effect of the jth line, with 120 g = (g 1 , ..., g j ) T ∼ N (0, Gσ 2 1 ), σ 2 1 is a genomic variance, G is the genomic relationship matrix in the 121 order J × J and is calculated [12] as G = ZZ T 2 pj qj , where q j and p j denote major and minor allele 122 frequency of j th line respectively, and Z is the design matrix for markers of order J × p (p is total 123 number of markers). Further, e j is residual error assigned the normal distribution e ∼ N (0, Iσ 2 e ) 124 where I is identity matrix and σ 2 e is the residual variance with a scaled-inverse Chi-square density. 125 Bayesian multi-environment (BME) GBLUP model:

126
Considering genotype × environment interaction can contribute to substantial amount of pheno-127 typic variance in complex traits, we fit the following univariate linear mixed model to account for 128 environmental effects in prediction performance: where y j is a vector of adjusted phenotypic mean of the j th line in the i th environment (i = 1, 130 2,.., I, j = 1, 2,..., J ). E i represents the effect of i th environment and g j represents the genomic effect 131 of the j th line as described in equation 1. The term gE ij represents random interaction between the 132 genomic effect of j th line and the i th environment with gE = (gE 11 ,..., gE IJ ) T ∼ N (O, σ 2 2 I I ⊗ G), 133 where σ 2 2 is an interaction variance, and e ij is a random residual associated with the jth line in the 134 ith environment distributed as N(0, σ 2 e ) where σ 2 e is the residual variance.

136
BMORS is the Bayesian version of multi-trait (or multi-target) regressor stacking method [19]. The 137 multi-target regressor stacking (MTRS) was proposed by Spyromitros-Xioufis et. al. [20,21] based on 138 multi-labeled classification approach of Godbole and Sarawagi [44]. In BMORS or MTRS, the training 139 is done in two stages. First, L univariate models are implemented using the multi-environment 140 GBLUP model given in equation 2, then instead of using these models for prediction, MTRS performs 141 the second stage of training using a second set of L meta-models for each of the L traits. The 142 following model is used to implement each meta-model: where the covariatesẐ 1ij ,Ẑ 2ij ,...,Ẑ Lij represent the scaled prediction from the first stage training 144 with the GBLUP model for L traits, and β 1 , ..., β L are the regression coefficients for each covariate 145 in the model. The scaling of each prediction was performed by subtracting its mean (µ lij ) and 146 dividing by its corresponding standard deviation (σ lij ), that is,Ẑ lij = (ŷ lij -µ lij )σ −1 lij , for each l = 147 1,..., L. The scaled predictions of its response variables yielded by the first-stage models as predictor 148 information by the BMORS model. Simply put, the multi-trait regression stacking model is based 149 on the idea that a second stage model is able to correct the predictions of a first-stage model using 150 information about the predictions of other first-stage models [20,21]. 151 Performance of prediction model:

152
All prediction models were fit using Bayesian approach in statistical program 'R'. The STSE model 153 (1) was fit using the R package 'BGLR' [45], BME model (2) and BMORS model (3) were fit using 154 the R package 'BMTME' [19]. A minimum of 20,000 iterations with 10,000 burn-in steps was used 155 for each Bayesian run.

156
The evaluation of prediction performance of models was done using a five-fold cross validation 157 (CV), which means 80% of the samples were used as training set and testing was done on the 158 remaining 20% for each cross-validation fold. The individuals were randomly assigned into five 159 mutually exclusive folds. Four folds were used to train prediction models and to predict the genomic 160 estimated breeding values (GEBVs) of the individuals in fifth fold (validation/test set). The accuracy 161 of prediction for each fold was calculated as Pearson's correlation coefficient (r) between predicted 162 values and adjusted phenotypic means for the individuals in validation set. Each cross validation run, 163 therefore, resulted in five estimates of prediction accuracy. The same set of individuals were assigned 164 to training and validation across different traits and models tested by using set.seed() function in R. 165 In order to avoid bias due to sampling, we performed 10 different cross-validation runs to calculate 166 the mean and dispersion of the prediction accuracies.

168
Phenotypic variation 169 A single calibration curve for NIRS was used for both populations studied. Table 1 outlines the 170 summary statistics of NIRS predictions and phenotypic distribution and heritability of the grain 171 composition traits. The cross validation accuracy (R 2 ) of the NIRS calibration curve was moderately 172 high to high, except for fat which had a moderate R 2 value (0.41). We had a total of three 173 environments (three years in one location) for the GSDP and four environments (two years in two 174 locations) for the RILs. Traits were normally distributed except amylose in two 2014 environments 175 in the RILs which had bimodal distribution (S1 Fig, S2 Fig). All traits showed significant variation 176 in distribution across the environments, except starch in GSDP. Table 1. Summary statistics of NIRS calibration and phenotypic distribution. R 2 is the prediction accuracy and SECV is the standard error of cross validation for the NIRS calibration curve. Mean represents the phenotypic mean of the trait with its standard deviation (SD). h 2 is the estimate of genomic heritability.

177
Trait The genomic heritabilities of all traits except gross energy were significantly higher (p <0.05) 178 in the RILs than in the GSDP (Table 1). Trait heritabilities were high in the RILs, with protein 179 and gross energy having the highest and lowest heritabilities, respectively. In the GSDP, genomic 180 5/15 heritability was moderately high for fat and gross energy, moderate for protein and starch, and low 181 for amylose. The poor genomic heritability (0.24) of amylose in the GSDP was expected because only 182 a very small proportion (1%) of accessions have low amylose as a result of waxy gene (Mendelian 183 trait). Prediction performance in single and multiple environment 198 We first implemented GBLUP prediction model for single-trait single-environment (STSE). Predic-199 tion accuracies of the STSE model varied across environments in both populations (Fig 2). The 200 environments 2014 in the GSDP and TX2014 in the RILs had highest average prediction accuracy 201 but were not always the best predicted environment for all traits. While poorly predicted for amylose, 202 the environments 2017 in the GSDP and TX2015 in the RILs had higher prediction accuracy for 203 starch compared to all or most environments. Despite variation across environments and populations, 204 the average prediction accuracies from the STSE were within the range of 0.4 to 0.6 for all traits 205 except amylose (0.25) in the GSDP. The average prediction accuracy of the STSE model in the GSDP 206 was positively correlated (r=0.86) with the genomic heritability of the traits. In the RILs, there 207 was a positive correlation (r=0.77) between average prediction accuracy and genomic heritability for 208 amylose, fat and gross energy, but the traits (protein and starch) with the highest heritabilities had 209 6/15 relatively lower average prediction accuracies. We didn't see substantial improvement in multi-environment (BME) model over the STSE 211 prediction accuracies (Fig 3). In the GSDP, the multi-environment models resulted in a decline 212 in average prediction accuracy compared to the STSE model for fat (21%), amylose (10%) and 213 protein (4%), however, no significant change was observed for gross energy and starch (S4 Fig, S1 214 Table). The prediction accuracy in the RILs increased by an average of 3% in the BME compared to 215 the STSE, however, the overall trend of prediction accuracy for traits and environments remained 216 unchanged (S4 Fig). The environment SC2014 showed consistent increase in accuracy for BME over 217 STSE model across all traits with about 10% increase for protein (S2 Tab). Amylose in TX2015 218 environment had the single greatest increase (12%) in average prediction accuracy in the BME among 219 all trait-environment combinations for the RILs.

220
Bayesian multi-output regression stacking 221 We tested two different prediction schemes in the BMORS prediction model using the two functions 222 BMORS() and BMORS Env() as described in [19]. While the BMORS() function was used for a 223 five-fold CV as described in the methods section, the BMORS Env() was used to assess the prediction 224 performance of whole environments while using the remaining environments as training.

225
Five-fold CV 226 The prediction accuracy from five-fold CV in BMORS increased by 41% and 32% on average over 227 the STSE model in GSDP and RILs, respectively. Fig 4 shows the prediction accuracy of BMORS 228 for each trait and environment combination across the two populations. While the percent change in 229 accuracy varied across environments, the BMORS model nonetheless had higher average prediction 230 accuracy than the STSE and BME models for all traits (Fig 3). The increase in average accuracy 231 ranged from 11% (amylose, 2014) to 66% (amylose, 2013) in the GSDP with exception of amylose in 232 2017 (13% decrease), and 15% (fat, SC2015) to 60% (protein, TX2014) in the RILs (S1 Table). The 233 increase in average prediction accuracy was higher (35%) for both locations in 2014 for the RILs, 234 whereas, the year 2013 in the GSDP increased the most (S1 Table, S2 Table). Among the traits, 235 protein (54%) had the greatest average increase in prediction accuracy in the GSDP, whereas in the 236 RILs, protein and starch (42%) both showed the greatest increase.

Prediction of whole environment 238
Predicting a whole environment using the BMORS model usually yielded higher accuracy than the 239 mean prediction accuracy from the STSE model for each trait and environment combination (Fig 2, 240  5, Table 2). The distribution of prediction accuracy across trait and environment combination were, 241 however, similar to the results from the STSE model. In the GSDP, little variation in prediction 242 accuracies was observed across environments for gross energy, starch and protein, whereas, amylose 243 and fat showed greater variability in prediction accuracy between environments. In the RILs, 244 prediction accuracy for all traits except protein had high variability across the environments (Table 245 2).

246
In order to assess predictability by location or year in the RILs, we tested one location or year 247 8/15 by training the BMORS model using the other location or year, respectively ( Table 2). The Texas 248 location had higher accuracy of prediction for fat (+0.11) and gross energy (+0.1) compared to 249 South Carolina, but rest of the traits had similar prediction accuracy (difference <0.02). Prediction 250 accuracy of whole years varied across traits, amylose (+0.09) and fat(+0.04) were higher in 2014, 251 protein was higher (+0.05) in 2015, and starch and gross energy were similar. 252 Figure 5.
Prediction accuracy of the test environments predicted using the BMORS Env in the GSDP. Values on top of the bar represent the height of the bar. Phenotyping for grain compositional traits is: 1) challenging and labor-intensive, 2) destructive 254 for most accurate results, and 3) only performed after plants reach physiological maturity and are 255 harvested. The use of genomic prediction for compositional traits will be extremely valuable because 256 it increases selection intensity and decreases generational interval by overcoming the phenotyping 257 challenges. Moreover, these traits are complex and quantitatively inherited so will benefit from 258 genomic prediction's ability to account for many small effect QTLs in estimating breeding values. While the accuracy of NIRS calibration for traits in this study ranged from moderate to high, there 261 was prediction error associated with NIRS prediction. However, it is unclear if and what effects NIRS 262 prediction error had on genomic prediction. No direct correlation was observed between the genomic 263 prediction accuracy and NIRS statistics for the traits studied. The trait with the lowest NIRS R 2 , 264 fat, was predicted as well as or better than starch, protein and gross energy, which had NIRS R 2 265 >0.7. Despite varying strength of correlations between traits across the two populations studied, the 266 nature of relationship was similar for a given pair of traits, which is also in agreement with previous 267 studies [30,46,47]. The strong negative relationship of starch and amylose to protein, fat and gross 268 energy was further elucidated by the PCA analysis of phenotypic correlation matrix (S3 Fig). Since 269 starch, protein and fat were measured on a percent dry matter basis, the strong correlation between 270 them is expected.

271
Genetic relatedness and trait architecture are known to affect the accuracy of genomic prediction 272 [8,48]. The genetic relatedness between individuals and heritability of the traits were higher in 273 the RILs than in GSDP (S3 Fig, Table 1 ). Those factors could be contributing to higher average 274 prediction accuracy in the RILs. However, the average prediction accuracies for gross energy and 275 starch were comparable between GSDP and RILs (Fig 3). Prediction accuracy in the GSDP could 276 have been boosted by greater genetic diversity despite lower genetic relatedness [34]. Heffner et. 277 al. [22] observed a prediction accuracy of 0.5-0.6 for wheat flour protein in two biparental populations. 278 Guo et. al. [26] reported prediction accuracies of 0.44 and 0.8 for protein and amylose in rice diversity 279 panel. Similar results were observed in our STSE models for protein content (Fig 2). Whereas, 280 lower prediction accuracy of amylose in our diversity panel is probably due to the lack of sufficient 281 low-amylose lines with the waxy gene [30]. While we couldn't find any previous genomic prediction 282 studies on starch, fat and gross energy, these traits are nutritionally one of the most important traits 283 in cereal grains. The moderate to high prediction accuracy observed suggests implementation of 284 genomic selection can improve genetic gain for these grain quality traits.

285
Multi-trait regressor stacking 286 One of the daunting tasks of genomic prediction is estimating the effects of unobserved individuals 287 and environments. As multiple traits are analyzed across several environments, the ability to 288 combine information from multiple traits and environments can be crucial in increasing accuracy of 289 prediction [13,15,16]. When the correlations among traits are high, prediction accuracies of complex 290 traits can be increased by using multivariate model that takes this correlation into account [15,18]. 291 We fit a Bayesian multi-environment (BME) model (2) that takes the genotype × environment effects 292 into consideration. In the GSDP, where environments were three years at the same location, the 293 BME model showed a slight decline (7%) in average prediction accuracy which was mostly due to the 294 two traits, amylose and fat (Supplementary Table S1). The RILs showed slight increase (2-3%) in 295 prediction accuracy of traits when averaged over the environments, but there was variability across 296 the environments (S2 Table). 297 We implemented two functions [BMORS () and BMORS Env ()] which are not only used to 298 evaluate prediction accuracy but are also computationally efficient [19]. The BMORS model (3) 299 performs two-stage training by stacking the multi-environment models from all the traits. The 300 five-fold cross validation conducted for BMORS was similar to the CV1 strategy of Montesinos-López 301 et. al. [18]. The use of multi-trait models has been consistently shown to increase prediction accuracy 302 over single-trait models across different crops and traits [15][16][17]49]. The multi-target regressor 303 stacking increased average prediction accuracy by 41% and 32% in the GSDP and RILs, respectively, 304 as compared to the STSE prediction accuracy. Average prediction accuracy of all traits improved 305 in BMORS over STSE and BME across both the populations (Fig 3). Consistent improvement in 306 accuracy of BMORS is a result of the ability to use not only correlation between traits but also 307 between environment in the model training [18,19]. The ability to accurately estimate genetic merit of 308 lines in unobserved environments is of tremendous value in plant breeding. Our results show potential 309 10/15 of BMORS Env() function for predicting the whole environment. Testing a whole environment 310 by training BMORS model using all other environments resulted in higher prediction accuracy for 311 that trait-environment combination than using STSE or BME model. Prediction accuracy of all 312 environments were 0.5 or higher with exception of amylose in GSDP, the reason for which we have 313 discussed above (Fig 5, Table 2).

314
Application for breeding 315 Grain quality traits such as starch and protein content have been under selection since the inception 316 of phenotypic selection in modern breeding practices. More recently, total energy supplement of 317 grain has gained attention for increasing feed efficiency in animal production, and a need exists 318 for increasing total calories for human nutrition in the wake of global malnutrition crisis. Despite 319 high correlations among these traits, the genetic variation underlying starch, protein and fat can be 320 decoupled. [30] have shown major and minor effect QTLs underlying the three traits are distributed 321 across the genome and are segregating in biparental populations. However, in practice, selection 322 would be conducted simultaneously for these traits using a selection index rather than for individual 323 traits. Velazco et. al. [31] observed an increase in predictive ability by using a multi-trait model for 324 grain yield and stay green in sorghum, and argue that such an exercise would allow for using selection 325 index for implementation of genomic selection for correlated traits. Increased prediction accuracy, 326 improved selection index, and estimation of precise genetic, environmental and residual co-variances 327 makes multi-trait multi-environment models preferable over univariate models [18]. The multi-trait 328 regression stacking model we tested shows large scale improvement in model prediction and can 329 be used in tandem with Bayesian multi-trait multi-environment (BMTME) model for parameter 330 estimation and assessing prediction accuracy. The ability to estimate genetic effects and breeding 331 values of unobserved environments will be of great advantage to predict performance in diverse 332 environments and for implementation of selection theory.

334
Phenotyping of grain compositional traits using near-infrared spectroscopy is labor-intensive, generally 335 destructive, and time limiting. Therefore, the use of genomic selection for these traits will be 336 extremely valuable. This study establishes the potential to improve genomics-assisted selection of 337 grain composition traits by using multi-trait multi-environment model. The phenotypic measurements 338 obtained from NIRS prediction were amenable to genomic selection as shown by moderate to high 339 prediction accuracy for single trait prediction. While multi-environment model alone didn't lead to 340 much improvement over single environment model, stacking of regression from multiple traits showed 341 substantial improvement in prediction accuracy. The prediction accuracy increased by 32% and 342 41% in the RILs and GSDP, respectively, when using the Bayesian multi-output regressor stacking 343 (BMORS) model compared to a single trait single environment model. The ability to predict line 344 performance in an unobserved environment is of great importance to breeding programs, and results 345 show high accuracy for predicting whole environments using BMORS.