Prediction Strategies for Leveraging Information of Associated Traits under Single- and Multi-Trait Approaches in Soybeans

: The availability of molecular markers has revolutionized conventional ways to improve genotypes in plant and animal breeding through genome-based predictions. Several models and methods have been developed to leverage the genomic information in the prediction context to allow more e ﬃ cient ways to screen and select superior genotypes. In plant breeding, usually, grain yield (yield) is the main trait to drive the selection of superior genotypes; however, in many cases, the information of associated traits is also routinely collected and it can potentially be used to enhance the selection. In this research, we considered di ﬀ erent prediction strategies to leverage the information of the associated traits ([AT]; full: all traits observed for the same genotype; and partial: some traits observed for the same genotype) under an alternative single-trait model and the multi-trait approach. The alternative single-trait model included the information of the AT for yield prediction via the phenotypic covariances while the multi-trait model jointly analyzed all the traits. The performance of these strategies was assessed using the marker and phenotypic information from the Soybean Nested Association Mapping (SoyNAM) project observed in Nebraska in 2012. The results showed that the alternative single-trait strategy, which combines the marker and the information of the AT, outperforms the multi-trait model by around 12% and the conventional single-trait strategy (baseline) by 25%. When no information on the AT was available for those genotypes in the testing sets, the multi-trait model reduced the baseline results by around 6%. For the cases where genotypes were partially observed (i.e., some traits observed but not others for the same genotype), the multi-trait strategy showed improvements of around 6% for yield and between 2% to 9% for the other traits. Hence, when yield drives the selection of superior genotypes, the single-trait and multi-trait genomic prediction will achieve signiﬁcant improvements when some genotypes have been fully or partially tested, with the alternative single-trait model delivering the best results. These results provide empirical evidence of the usefulness of the AT for improving the predictive ability of prediction models for breeding applications.


Introduction
Genomic prediction (GP) [1] has substantially changed conventional animal and plant breeding [2,3] in the last 20 years. It uses massive information of genome-wide dense markers to describe the genomic similarities between pairs of genotypes [4,5], allowing the borrowing of information between tested and untested genotypes. The ultimate goal in GP is to predict the expected and thus, the borrowing of information between traits was leveraged. Details of these two methods are provided below.

Single-Trait
M1: Single-trait based on marker data G. Consider that the phenotypic measure y ij of the ith (i = 1, 2, . . . , n) genotype for the jth (j = 1, 2, . . . , t) trait can be represented as the sum of a common effect µ j plus a linear combination between p marker SNPs X ijk ; k = 1, 2, . . . , p and their corresponding effects, plus an error term, such that: In addition, consider further assumptions on the marker effects (IID: independent and identically distributed outcomes) following normal densities, such that b jk ∼ N 0, σ 2 b j and with σ 2 b j acting as the corresponding variance component. From the results from the multivariate normal density, we find that the vector of genetic effects g j = g ij , g ij = p k=1 X ijk b jk , for the jth trait is distributed as g j ∼ N 0, Gσ 2 g j , with G = XX p [4], σ 2 g j = p × σ 2 b j and X representing the centered (by columns) matrix of molecular markers. After collecting the aforementioned results, M1 becomes: where e ij , the error term, follows IID normal densities, such that e ij ∼ N 0, σ 2 j and σ 2 j represents the residual variance for the jth trait.
M2: Single-trait based on AT only. This model is similar to M1 and is exclusively used for yield prediction; however, instead of using maker SNPs as covariates, it incorporates phenotypic information from the AT. Consider that the yield performance y i1 of the ith genotype can be represented as the sum of a common effect across genotypes (µ 1 ) plus a linear combination between the AT y j , j = 2, 3, . . . , t and their corresponding effects (or weights), such that c i = t j=2 y ij w j , where c i represents the phenotypic effect of the ith genotype, with w t ∼ N 0, σ 2 t and σ 2 t as the corresponding variance component. This model can be written as follows: and from properties of the multivariate normal density, we find that the vector of phenotypic effects c = {c i } follows a multivariate density, such that c ∼ N 0, , Y {−1} is the centered (by columns) matrix of AT and σ 2 T = (t − 1)σ 2 t is the corresponding variance component. T is the matrix of phenotypic relationships, and its entries describe the phenotypic similarities (based on the AT) between pairs of genotypes. This model requires perfect information of the AT for all genotypes (i.e., no missing values are allowed on these).
M3: Single-trait combining maker and AT data (G + T). This model combines marker SNPs and the information of the AT by coupling models M1 and M2 into a single linear predictor as follows: with all the model terms as previously defined.

Multi-Trait
M4: Multi-trait model based on marker data G.
Consider y = y 1 , y 2 , . . . , y t as the vector of phenotypic responses for t (7) traits and it is of order n × t such that the linear predictor for all traits can be written in matrix formulation as: (4) u = [u 1 , u 2 , . . . , u t ], with u j as the vector of the common effect for the jth trait across genotypes; g = [g 1 , g 2 , . . . , g t ] with g j as the vector of genomic effects for the jth trait, such that g ∼ N(0, G ⊗ Σ) and e ∼ N(0, I n ⊗ R), where Σ and R are the corresponding genetic and residual co-variance matrices between pairs of traits and these are of order t × t (7 × 7).

Prediction Schemes
To evaluate the performance of the described models for predicting grain yield and the AT as well, four different cross-validation (CV) scenarios were considered. Figure 1 provides a graphical representation of the different scenarios for predicting untested (CV00) and partially tested (CV2, CV2H, CVY) genotypes. These correspond to different prediction objectives (yield and AT), different levels of overlapping (50% and 75%) across genotypes, and different training (50% and 75%) and testing set sizes (50% and 25%), respectively.
Agriculture 2020, 10, x FOR PEER REVIEW 13 of 13 the AT, and of the AT with marker data in the analyses, respectively. The model M4 was used to jointly perform predictions on all traits for (i) partially tested (CV2, CV2H), (ii) untested (CV00) genotypes, (iii) and for yield prediction only (CVY) using information from AT.   Figure 1).
Graphical representation of four different cross-validation designs; horizontal gray blocks of lines (20 per block) correspond to those genotypes assigned to the training set while the horizontal blocks of white lines (20 per block) conform the testing set. CV2: for each trait, half of the phenotypic information was assigned to the training set while the remaining half to the testing set, such that this design allows different degrees of overlapping across traits for genotypes varying between no phenotypic information (20 genotypes) and information on all the 7 traits (20 genotypes). CV2H: similar to the previous scheme; however, there is phenotypic information available on all the traits for those genotypes with available yield records (training set for yield). CV00: there is phenotypic information available across traits only for those genotypes with yield records. CVY: There is phenotypic information on grain yield for half of the genotypes while full information for the remaining traits (6). Figure 1) considers the partial information of genotypes (i.e., for a particular genotype, some traits were observed but not others). In this case for each trait, 50% (1280 genotypes) of the genotypes were assigned to training and testing sets, allowing some degree of overlapping across traits. There are 20 genotypes with no information for any of the 7 traits; also, there are 20 genotypes with information on all traits, and 140 genotypes with information for only one trait (not the same trait, 20 genotypes per trait). The full list of the number of genotypes by the number of observed traits is as follows: 20 (0), 140 (1), 420 (2), 700 (3), 700 (4), 420 (5), 140 (6), and 20 (7). In all scenarios, the rows of the different cross-validation schemes (the same for all schemes) from Figure 1 were randomly assigned to training and testing sets, such that 50% of the genotypes comprised the training set and the remaining 50% the testing set. In this case, it is expected half of the genotypes in the previous list were assigned to each one of the sets.

CV2 (top left in
CV2H (top right) also corresponds to partially tested genotypes but ensuring full information of all AT for those genotypes with yield records (training set) and partial information of the AT for the remaining genotypes (testing set). As such, 50% of the yield observations were assigned to the training sets, and consequently 75% of AT were assigned to the training set, and the remaining were assigned to the testing set (50% for yield and 25% for the AT). The full list of the number of genotypes by the number of observed traits is as follows: 20 (0), 120 (1), 300 (2), 400 (3), 300 (4), 120 (5), 20 (6), and 1280 (7). This prediction scheme allows (i) assessment of the effects in predictive ability when increasing the information of the AT for yield prediction, and (ii) assessment of the effects of increasing the training set size when predicting the AT. Under this scenario, the single-trait and the multi-trait methods can be applied. In this case, the training and the testing sets were comprised of the same rows as in the previous scheme. Similar to the previous scheme, 50% of the genotypes in this listing were expected to conform the training set while the remaining 50% composed the testing set.
CV00 (bottom left) corresponds to the case of predicting the trait performance of fully untested genotypes. No information for any of the traits is available for those genotypes in the testing set. Here, 50% of the genotypes were assigned to the training set, and the remaining 50% to the testing set.
CVY (bottom right) is exclusively used for yield prediction, and it requires the full information of the AT for all genotypes for its implementation. The objective of this scheme was to assess the levels of yield predictive ability that can be reached when considering perfect information of the AT under the single-trait and the multi-trait approaches.
Since different cross-validation schemes were intended for different prediction objectives with different models to avoid confusion, all the studied cases (strategies) are provided in Table 1. These strategies (model-cv scheme) were implemented for yield prediction only and for predicting the AT as well using the information of marker SNPs of AT and combined. With model M1 (baseline model), each trait is analyzed separately, and thus no borrowing of information between traits is possible. M2 and M3 models were exclusively used for yield prediction, and these allowed the incorporation of the AT, and of the AT with marker data in the analyses, respectively. The model M4 was used to jointly perform predictions on all traits for (i) partially tested (CV2, CV2H), (ii) untested (CV00) genotypes, (iii) and for yield prediction only (CVY) using information from AT. Table 1. Strategies (model-cv scheme) for predicting yield performance and associated traits (AT) for different approaches (S, single-trait; M multi-trait) and different training/testing sizes (further details in Figure 1).

Assessing Predictive Ability
Despite the adopted strategy and the approach (single-trait/multi-trait) used for performing the predictions, the predictive ability was assessed as the Pearson correlation between predicted and observed values within traits. The assignation of the training/testing sets was repeated 20 times and the mean correlation and corresponding standard deviation were computed across replicates for each strategy.

Software
Single-trait analyses (M1, M2, and M3) were performed using BGLR R-package [32] while the multi-trait analysis was implemented using the MTM R-package [20]. The fitted models were developed under the Bayesian approach considering normal priors for the main effects while Scaled-inverse Chi-Squared priors were considered for the variance components. A total of 50,000 samples were considered for the Gibbs sampler, 20,000 of these were used for burn-in and a thinning of 5. Thus, the final chain consisted of 6000 samples. Table 2 presents the percentage of phenotypic variability explained for the different model terms For the other traits, the heritability ranged between 0.4 (lodging) and 0.68 (protein and oil).  Table 3 presents the phenotypic (lower triangular) and genetic (upper triangular) correlations among pairs of traits. The phenotypic correlations were computed as the Pearson correlation between pairs of traits. The matrix of genetic correlations was obtained by performing the full data set analysis (no missing data) under model M4. As expected, yield and protein showed negative phenotypic (−0.310) and genetic (−0.776) correlations while yield and oil showed positive trends (0.259 and 0.800, respectively). Interestingly, while yield and plant height showed a strong negative genetic correlation (−0.720), the phenotypic correlation was close to zero (−0.094). Yield and lodging showed strong negative phenotypic and genetic correlations (−0474 and −0.921, respectively). The phenotypic and genetic correlations between yield and seed size were positive (0.149 and 0.582), the same as between yield and fiber content (0.162 and 0.566).  Table 4 shows for the 7 traits, the squared root of the genomic heritability (H) (the theoretical upper threshold for predictive ability) and the goodness of fit derived from the full data analysis (FD; i.e., the correlation between the predicted and observed values without considering missing values) under the single-trait (M1_FD) and the multi-trait (M4_FD) models, and the average prediction accuracy for different prediction strategies (model-CV scheme) with different levels of connectivity across traits. Table 4. The square root of the heritability (H), the goodness of fit using single-trait (M1_FD) and multi-trait (M4_FD) models, and the average predictive ability (20 replicates) of unobserved records for 7 traits using different prediction strategies: M1_CV00, single-trait genomic prediction with 50% of missing values per trait; M1_CV2H, single-trait genomic prediction with 75% of missing values for traits other than yield (50%); M4_CV00, multi-trait genomic prediction model with 50% of missing values across traits; M4_CV2, multi-trait genomic prediction model with 50% of the genotypes partially observed (i.e., some traits but not others for the same genotype); M4_CV2H, multi-trait genomic prediction with 75% of missing values for traits other than yield (50%).

Predictive Ability
Regarding the predictive ability under the different prediction scenarios, the single-trait approach (M1_CV00) returned an average predictive ability of 0.547 (SD = 0.011) for yield while for the other traits, it varied between 0.469 (height, SD = 0.016) and 0.645 (protein, SD = 0.008). For grain yield, the results from this strategy were used as the baseline to compare the performance of the other strategies. In addition, the results of the strategies M1_CV00, M1_CV2, M1_CV2H, and CVY were equivalent for the yield predictive ability, and thus only those from the M1_CV00 strategy were used to contrast with the remaining strategies (M4_CV00, M4_CV2, M4_CV2H).
The M4_CV00 strategy based on the multi-trait model returned a mean correlation for grain yield of 0.513 (SD = 0.019) while for the other traits, these values ranged between 0.478 (lodging) and 0.600 (seed size). Thus, when no information for any of the traits for those genotypes in the testing set was available, their predictive ability significantly decreased, except for plant height (0.469 and 0.487). When the genotypes were partially observed (50%) for some traits but not others, M4_CV2 returned an average yield predictive ability of 0.581 (SD = 0.014). This represents an improvement of 6% and of 13% with respect to the M1_CV00 and M4_CV00 strategies, respectively. For the remaining traits, the M4_CV2 strategy always returned the best results compared with strategies M1_CV00 (~1-14%) and M4_CV00 (~3-29%). Seed size and protein content were the less and more benefited traits of the borrowing of information between traits, respectively.
When the percentage of partially observed genotypes (i.e., not all the traits were observed) increased from 50% to 75% for the AT (CV2_H), benefits were not observed in yield prediction by increasing the levels of connectivity across traits. M4_CV2H returned a mean correlation of 0.580 (SD = 0.15). Under this strategy, yield is the only trait that can be compared with the other strategies because in all cases, the testing set is the same (size and genotypes). For the remaining traits, the training and testing set sizes changed with respect to the previous strategies, thus their corresponding predictive ability can be compared only between strategies M1_CV2H and M4_CV2H. Slight improvements (2%~9%) in predictive ability were observed with M4 with respect to M1; under the CV2H scheme, the most benefited trait of the borrowing of information between traits was protein.
Models M2 and M3 were used to leverage the information of the AT for yield prediction under the single-trait approach. These models require complete information of the AT for all genotypes (i.e., no missing values are allowed on these).
A fair comparison of the performance of all the different prediction strategies can be done with yield prediction only (training-testing sets). Figure 2 presents the square root of the heritability (H) obtained with the single-trait model M1, the goodness of fit derived from the single-trait (M1_FD) and the multi-trait (M4_FD) models, and the average yield predictive ability (and SDs) for the different prediction strategies (model-CV scheme). In all the cases, the testing set comprised 50% of the genotypes while the remaining 50% were used for model calibration. M3_CVY, the single-trait strategy that combines marker SNPs and information of the AT, returned the best results (mean = 0.681 and SD = 0.010). M4_CVY, the multi-trait strategy that considers the same amount of information than the previous one, was the second-best strategy (mean = 0.607, SD = 0.014). The improvement of M3_CVY and M4_CVY with respect to the baseline strategy (M1_CVY; mean = 0.547, SD = 0.011) was 25% and 11%, respectively. Additionally, the improvement of the single-trait strategy M3_CVY with respect to the multi-trait strategy M4_CVY was 12%.
The multi-trait strategies M4_CV2 (0.581, SD = 0.014) and M4_CVH (0.580, SD = 0.015) practically performed the same and their results were slightly below M4_CVY (0.607). When only phenotypic information of the AT was used, the M2_CVY strategy returned a mean correlation of 0.516 (SD = 0.011). The less efficient strategy was M4_CV00 (0.513, SD = 0.018). Thus, under the multi-trait model, when no information of the AT is available for those genotypes in the testing set, the predictive ability is negatively affected.

Discussion
In this study, we compared different prediction strategies for addressing different prediction problems. Only the results for grain yield can be compared across all these prediction strategies. For the remaining traits (AT), their results are comparable only when the same training and testing sets were used for different prediction scenarios (CV00, CV2, and CV2H). The heritability varied across traits, with protein and oil being the traits with the highest values (0.68). These results suggest the predominance of the pleiotropic effect of genes [33]. On the other hand, grain yield was the trait with the lowest heritability (0.49). Thus, an improvement in predictive ability can be expected when the information of traits with high heritability in the multi-trait context is included [28,33,34].
Regarding the phenotypic and the genetic correlations, yield and protein exhibited a moderated negative phenotypic correlation (−0.310) while the genetic correlation was stronger (−0.766). On the other hand, as we expected, yield and oil showed positive phenotypic (0.259) and a genetic correlation (0.800). Oil and protein are two of the main constituents of the seed composition and these were strongly negatively correlated at the phenotypic (−0.693) and the genetic (−0.918) levels as addressed by [35,36]. The strong correlation among traits (between AT, and between AT and yield) might indicate that the inclusion of the AT can be beneficial for yield prediction and for predicting the AT as well as under the multi-trait approach [28,33].
The goodness of fit was significantly higher under the multi-trait model compared with the single-trait approach. Additionally, these values were higher than the squared root of the heritability obtained with the single-trait model. The square root of the heritability can be considered as the upper threshold of the predictive ability under the single-trait model. These results provide insights about the levels of prediction accuracy that can be reached when the phenotypic information from the AT is combined with the marker data. We should recall that H was computed using maker and yield phenotypic information from the same trait (yield), thus an increase in predictive ability is expected with the addition of the information from the AT. In our case, the multi-trait strategies improved the

Discussion
In this study, we compared different prediction strategies for addressing different prediction problems. Only the results for grain yield can be compared across all these prediction strategies. For the remaining traits (AT), their results are comparable only when the same training and testing sets were used for different prediction scenarios (CV00, CV2, and CV2H). The heritability varied across traits, with protein and oil being the traits with the highest values (0.68). These results suggest the predominance of the pleiotropic effect of genes [33]. On the other hand, grain yield was the trait with the lowest heritability (0.49). Thus, an improvement in predictive ability can be expected when the information of traits with high heritability in the multi-trait context is included [28,33,34].
Regarding the phenotypic and the genetic correlations, yield and protein exhibited a moderated negative phenotypic correlation (−0.310) while the genetic correlation was stronger (−0.766). On the other hand, as we expected, yield and oil showed positive phenotypic (0.259) and a genetic correlation (0.800). Oil and protein are two of the main constituents of the seed composition and these were strongly negatively correlated at the phenotypic (−0.693) and the genetic (−0.918) levels as addressed by [35,36]. The strong correlation among traits (between AT, and between AT and yield) might indicate that the inclusion of the AT can be beneficial for yield prediction and for predicting the AT as well as under the multi-trait approach [28,33].
The goodness of fit was significantly higher under the multi-trait model compared with the single-trait approach. Additionally, these values were higher than the squared root of the heritability obtained with the single-trait model. The square root of the heritability can be considered as the upper threshold of the predictive ability under the single-trait model. These results provide insights about the levels of prediction accuracy that can be reached when the phenotypic information from the AT is combined with the marker data. We should recall that H was computed using maker and yield phenotypic information from the same trait (yield), thus an increase in predictive ability is expected with the addition of the information from the AT. In our case, the multi-trait strategies improved the results from the single-trait strategy, which only uses marker data, when at least 50% of the genotypes were partially observed.
The different cross-validation schemes attempt to mimic some realistic scenarios that breeders might face in their breeding programs. When the main interest is the yield prediction only, sometimes there might be available information of the AT for those genotypes in the testing set. Thus, this information can be used for improving the yield predictive ability to some degree with respect to the single-trait baseline strategy. In our case, the M4_CV2 strategy, which includes information of the AT for some of the genotypes (50%) in the testing set, improved the results of the baseline model (CV1_CV00) in about 6%. Increasing the information of the AT up to 75% (M4_CV2H), ensuring complete information for all genotypes in the training set, did not benefit the yield predictive ability in comparison with the previous case.
On the other hand, when no information for the AT was available for any of the genotypes in the testing set, the yield predictive ability decreased to 0.513. This result contrasts with other studies [21,28,33], where the predictive ability practically remained unchanged with respect to the baseline model (single-trait model). When the information of all the AT was available for all genotypes (M4_CVY = 0.607), slight improvements were observed with respect to all the previous strategies (single-trait and multi-trait) and these ranged between 5% and 18%.
When the single-trait strategy was implemented considering the same information as in the previous strategy (i.e., perfect information for AT and marker data) but leveraged in a different way, the predictive ability was significantly improved with respect to all previous strategies (single-trait and multi-trait). The M3_CVY strategy returned a mean correlation 0.681 and this represents a relative improvement between 12% and 33% with respect to all the previous strategies. In this case, the information from the AT was included using a variance-covariance matrix between traits in addition to the molecular marker information. Other authors have shown similar improvements predicting yield when combining information from secondary traits with maker information in comparison with the baseline strategy. For example, [30] showed an improvement of 22% when combining soybean canopy coverage recorded in the early stages of plant growth (days 14-33 after planting) with marker data. Additionally, [29] showed an improvement of 13% when combining the soybean canopy data with the marker information via a hybrid matrix where the importance of both components was sequentially weighted. Other studies have shown similar improvements when combining both sources of information (marker and AT) for yield prediction in maize [37] and wheat [38].
The strategy that only considers the information of the AT for yield prediction (M2_CVY) decreased the predictive ability of the baseline strategy (M1_CVY) by around 6%. Additionally, these results were slightly better than those from the multi-trait strategy when the information from the AT was not available for those genotypes in the testing set (M4_CV00). Other authors also showed a reduction in the yield predictive ability when using the information from other traits only. For example, [29] showed a significant reduction in the soybean predictive ability of around 48% with respect to the M1_CVY strategy; [30] also showed a decrease of around 25% in two out of three predicted soybean trials.
With respect to the other traits, we observed that the predictive ability was improved (6-14%) only when the genotypes in the testing set were partially observed. Additionally, small improvements were observed when the training sample size was increased under the multi-trait approach (M4_CV2H) and these improvements were higher than those from the single-trait (M1_CV2H) for the same partitions. In this case, these results provide evidence of the benefits of the borrowing of information between correlated traits. On the other hand, when the information of the AT was absent for all genotypes, the predictive ability was reduced (except for plant height) under the multi-trait strategy (M4_CV00) with respect to the baseline strategy (M1_CV00).
In cases of full data availability, M3 showed to be more appropriate. However, perfect information of the AT is not always available in the routine of a soybean breeding program. For these cases, it is feasible to apply multi-trait models in scenarios where there is correspondence up to a certain degree between genotypes in training and testing sets across the correlated traits (i.e., M4_CV2). Additionally, when the testing sets are reduced in terms of the number of genotypes for the secondary traits, the yield predictive ability remained unchanged with respect to the previous strategy; however, the M4_CV2H predictive ability of the remaining traits was slightly improved. Further studies should consider a better selection of the AT to be included for yield prediction and the prediction of these ATs as well. For example, questions like up to what extent can the predictive ability be improved by adding only those ATs with the strongest phenotypic/genomic correlation with yield, or what is the lowest phenotypic/genetic correlation threshold that should be considered to ensure significantly higher correlations than those derived from the baseline model should be studied.
In general, the results from this study showed that improvements in the yield predictive ability can be accomplished only when the information from the AT is available for those genotypes in testing sets when performing the single-trait and the multi-trait strategies. In both approaches, the predictive ability was significantly improved with respect to the baseline strategy (M1_CVY). We also showed that the same information (i.e., markers and AT) can be leveraged differently depending on the adopted strategy. The single-trait strategy (M3_CVY) more efficiently combined the maker and phenotypic information from the AT than the multi-trait strategy (M4_CVY). A practical implication of adopting one approach or the other is that the single-trait approach is much easier to implement and is computationally less demanding. Another important implication of these results is that the use of AT is also convenient because in many cases, the measurement of these traits does not require an evaluation of extensive field trials (plots with or without replicates) to be conducted. In these cases, the information of the AT could significantly improve the yield predictive ability, thus saving resources (water, land, seed availability, time, money, etc.).

Conclusions
In this study, we compared different methods for effectively including the information of associated traits for yield prediction and for predicting the AT as well under different cross-validation schemes using the single-trait and multi-trait approaches. The multi-trait model was shown to improve the performance of the conventional single-trait model but only when the information from the AT was available for those genotypes in the testing set. However, the predictive ability was reduced in comparison with the single-trait baseline model when predicting completely untested genotypes. As was expected, under the multi-trait fashion, the best results were obtained when there was available complete or partial information of the AT; however, we observed that a more efficient use of this information could be accomplished under the single-trait parameterization by combining the complete information of the AT with marker SNPs via covariance structures.
The implications of these results are that (i) the single-trait approach is much easier to implement than the multi-trait model and since it returned the best results, we recommend its use for yield prediction; and (ii) the convenience of using the information of correlated traits is evident when some of these traits can be measured easier than grain yield or do not require extensive field trials.
Finally, one of the disadvantages of using the information of correlated traits for effectively improving yield predictive ability of genotypes is that these must be partially tested in order to measure the AT. This fact poses challenges to fully implement the alternative single-trait or the conventional multi-trait strategies to leverage the availability of the information on the AT for those genotypes in the training set. Perhaps, a solution would consider predicting the missing values of the AT of those genotypes in the testing set using the single-trait approach first, and then combine the predicted and the observed values to perform predictions of yield records under the alternative single-trait or multi-trait approach as a second step. This is still an area of development, and we expect that future research will consider/address these challenges.