Article

GBLUP Outperforms Quantile Mapping and Outlier Detection for Enhanced Genomic Prediction

by Osval Antonio Montesinos-López 1, José Crossa 2,3, Paolo Vitale 2, Guillermo Gerard 2, Leonardo Crespo-Herrera 2, Susanne Dreisigacker 2, Carolina Saint Pierre 2, Luis G. Posadas 4, Afolabi Agbona 5,6, Raymundo Buenrostro-Mariscal 1, Abelardo Montesinos-López 7,* and Aakash Chawade 8,*

1 Facultad de Telemática, Universidad de Colima, Colima 28040, CL, Mexico
2 International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México–Veracruz, Texcoco 52640, EM, Mexico
3 Colegio de Postgraduados, Montecillos 56230, EM, Mexico
4 Department of Agronomy and Horticulture, University of Nebraska, 363 Keim Hall, Lincoln, NE 68583, USA
5 International Institute of Tropical Agriculture (IITA), Ibadan 200001, Nigeria
6 Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX 77843, USA
7 Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, JA, Mexico
8 Department of Plant Breeding at SLU, Swedish University of Agricultural Sciences, P.O. Box 190 Alnarp, SE-23422 Lomma, Sweden
* Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(8), 3620; https://doi.org/10.3390/ijms26083620
Submission received: 20 February 2025 / Revised: 9 March 2025 / Accepted: 7 April 2025 / Published: 11 April 2025
(This article belongs to the Special Issue Advances in Plant Genomics and Genetics: 2nd Edition)

Abstract

Genomic selection (GS) accelerates plant breeding by predicting complex traits using genomic data. This study compares genomic best linear unbiased prediction (GBLUP), quantile mapping (QM)—an adjustment to GBLUP predictions—and four outlier detection methods. Using 14 real datasets, predictive accuracy was evaluated with Pearson’s correlation (COR) and normalized root mean square error (NRMSE). GBLUP consistently outperformed all other methods, achieving an average COR of 0.65 and an NRMSE reduction of up to 10% compared to alternative approaches. The proportion of detected outliers was low (<7%), and their removal had minimal impact on GBLUP’s predictive performance. QM provided slight improvements in datasets with skewed distributions but showed no significant advantage in well-distributed data. These findings confirm GBLUP’s robustness and reliability, suggesting limited utility for QM when data deviations are minimal.

1. Introduction

Genomic selection (GS) has changed plant breeding over the past decade, fundamentally transforming genetic evaluation and selection. By integrating genomic data into predictive models, GS has accelerated breeding cycles, improved selection precision, and enhanced genetic gains [1,2]. Unlike traditional methods reliant on extensive phenotypic evaluations, GS leverages genome-wide markers to predict genotype performance, reducing the costs and time associated with field trials [3]. This innovation has been pivotal in addressing global challenges such as food security and climate change by enabling the rapid development of high-yielding, resilient crop varieties [4]. Today, GS is a cornerstone of modern plant breeding, integrating cutting-edge technologies and big data analytics to drive sustainability and innovation.
GS has been successfully applied across diverse crops, enhancing yield potential and disease resistance in maize and wheat [2], accelerating the development of stress-tolerant rice varieties [5], and shortening breeding cycles in perennials like sugarcane and oil palm [6]. Its ability to predict genetic potential using genome-wide markers has significantly reduced the need for extensive phenotypic evaluations. Additionally, GS has improved genetic gains for complex traits such as drought tolerance and nutrient use efficiency, underscoring its transformative impact on modern agriculture [7].
The GBLUP (genomic best linear unbiased prediction) statistical model remains one of the most popular and widely used approaches in genomic prediction due to its simplicity, robustness, and interpretability. Despite the emergence of modern machine learning methods, GBLUP is preferred in many cases because it is computationally efficient and provides reliable predictions, especially for traits controlled by many small-effect loci [8]. Its linear mixed-model framework accounts for genetic relationships using genomic relationship matrices, making it particularly suitable for plant and animal breeding programs [1]. While machine learning methods like random forests and deep learning can capture complex non-linear interactions, they often require large datasets and extensive hyperparameter tuning, and they are prone to overfitting when data are limited [2]. In contrast, GBLUP provides a balance between accuracy and simplicity, ensuring stable performance across a variety of traits and environments [9,10]. Its widespread adoption in GS underscores its reliability and practical advantages, particularly in agricultural contexts where interpretability and computational feasibility are critical.
Given the computational efficiency and widespread use of GBLUP in genomic prediction, there is significant interest in exploring strategies to enhance its predictive power. Combining GBLUP with quantile mapping (QM) and outlier detection techniques offers a promising avenue for improvement. Quantile mapping can address biases in the distribution of predicted values by aligning them more closely with the observed data, thereby increasing prediction accuracy and ensuring a better calibration [11]. Outlier detection, on the other hand, enhances the robustness of the model by identifying and removing data points that disproportionately influence predictions, which is especially crucial in genomic datasets prone to noise and inconsistencies [12]. Together, these methods can, in theory, synergistically improve GBLUP by refining its inputs and outputs, ultimately leading to more reliable predictions. This combined approach not only leverages the interpretability and computational advantages of GBLUP but also integrates advanced techniques to address limitations inherent to genomic datasets, making it a powerful tool for plant and animal breeding.
QM is widely utilized across disciplines for bias correction and improving data alignment. In climate science, QM adjusts biases in model outputs, enhancing the accuracy of temperature and precipitation projections for reliable climate assessments [13]. In hydrology, it refines streamflow and rainfall-runoff predictions, crucial for flood and drought evaluations [14]. In remote sensing, QM harmonizes satellite-derived data with ground-based observations, improving environmental dataset utility [15]. Beyond environmental sciences, QM is applied in genomics for aligning predicted values with observed data, enhancing prediction accuracy, and in economics for bias correction in income and risk assessments. Its versatility makes QM a critical tool across multiple fields.
Outlier detection plays a critical role in improving predictions in machine learning by identifying and mitigating the impact of anomalous data points that can distort model performance. By detecting and removing outliers, models achieve a better generalization, reduced bias, and enhanced accuracy, especially in regression and classification tasks. Methods such as statistical thresholds, clustering, and advanced algorithms like isolation forests are commonly applied to detect outliers in diverse datasets. Outlier detection has shown effectiveness in applications such as genomic prediction, fraud detection, and environmental modeling, where precise predictions are essential for decision-making [16]. These approaches refine training data quality and ultimately lead to more robust and reliable machine learning models [17,18]. These studies underscore the importance of addressing outliers to enhance the reliability of genomic prediction models.
As already mentioned, previous studies have shown that quantile mapping (QM) and outlier detection can enhance GBLUP for genomic predictions, which motivated this study. QM improves calibration by aligning predicted values with observed distributions, addressing biases from GBLUP’s normality assumptions. Outlier detection enhances robustness by mitigating the impact of extreme values that could distort variance estimates and bias predictions. Given these prior findings, this study aimed to further evaluate their effectiveness. To strengthen the rationale, it is important to explicitly reference previous studies, clarify how these methods theoretically improve predictions, and demonstrate their impact through comparative analyses.
By leveraging QM for bias correction and four outlier detection methods (Invchi, Logit, Meanp, and SumZ) to refine the training set, this study aims to maximize the predictive potential of GBLUP across diverse datasets. The benchmark analysis, conducted on 14 real datasets, evaluates predictive accuracy using Pearson’s correlation (COR) and normalized root mean square error (NRMSE). For simplicity, we present full results below for three datasets (Disease, EYT_1, and Wheat_1) as well as results across datasets. We studied GBLUP alone and GBLUP in combination with QM and the four outlier detection methods, making a total of 10 genomic prediction models. Additional results are shown in Appendix A, Appendix B and Appendix C.

2. Results

The results are presented in four sections. Sections 2.1, 2.2 and 2.3 present the results for the datasets Disease, EYT_1, and Wheat_1. Section 2.4 provides the results across datasets. Appendix A provides the tables of results corresponding to datasets Disease, EYT_1, Wheat_1, and across datasets. Appendix B and Appendix C provide the figures and tables of results for the other datasets included in the study: Maize, Japonica, Indica, Groundnut, EYT_2, EYT_3, Wheat_2, Wheat_3, Wheat_4, Wheat_5, and Wheat_6. The results are provided in terms of Pearson’s correlation (COR) and normalized root mean square error (NRMSE). The assignment of datasets to the appendices was random, that is, not based on any specific criteria.
As described in Section 4 below, we compared the genomic prediction accuracy of 10 different model options: GBLUP alone; GBLUP combined only with quantile mapping (QM); GBLUP combined with each of the four outlier detection methods (Invchi, Logit, Meanp, and SumZ); and GBLUP combined with both QM and each of the four outlier detection methods (QM_Invchi, QM_Logit, QM_Meanp, and QM_SumZ).

2.1. Disease

Figure 1 presents the results for the Disease dataset, comparing the GBLUP, Invchi, Logit, Meanp, SumZ, QM, QM_Invchi, QM_Logit, QM_Meanp, and QM_SumZ models in terms of their predictive performance measured by COR and NRMSE. For more details, see Table A1 (in Appendix A).
The analysis of Pearson’s correlation between observed and predicted values (Figure 1A) for the Disease dataset shows that GBLUP is the most effective approach, achieving a correlation of 0.1766, which is 0.8567% greater than QM’s correlation of 0.1751. GBLUP also outperforms Meanp (0.1728, by 2.1991%), QM_Meanp (0.1661, by 6.3215%), SumZ (0.1630, by 8.3436%), QM_SumZ (0.1586, by 11.3493%), Logit (0.1559, by 13.2777%), Invchi (0.1552, by 13.7887%), QM_Invchi (0.1530, by 15.4248%), and QM_Logit (0.1528, by 15.5759%).
Regarding the NRMSE between observed and predicted values (Figure 1B) for the Disease dataset, GBLUP achieves the lowest average NRMSE, making it the most effective option. GBLUP yields a value of 0.4313, which is 0.1159% better than Meanp (0.4318) and 0.5565% better than SumZ (0.4337). Additionally, GBLUP outperforms Logit (0.4345) by 0.7419% and Invchi (0.4346) by 0.7651%. The advantages are larger over the QM-based models: QM_Logit (0.4984) by 15.5576%, QM_Invchi (0.4986) by 15.604%, QM_SumZ (0.4987) by 15.6272%, QM_Meanp (0.5072) by 17.598%, and QM (0.5234) by 21.354%.
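To make the percentages in this section easier to read, the following minimal Python sketch shows one convention that reproduces the reported figures. It assumes that COR differences are expressed relative to the competing model and NRMSE differences relative to GBLUP. This convention is inferred from the reported values rather than stated in the text, and the function names are hypothetical.

```python
def relative_gain_cor(best, other):
    """Percentage by which the best correlation exceeds a competing one."""
    return (best - other) / other * 100.0


def relative_gain_nrmse(best, other):
    """Percentage by which a competing NRMSE exceeds the best (lowest) one."""
    return (other - best) / best * 100.0


# Disease dataset, values taken from the text above.
cor_gain_vs_qm = relative_gain_cor(0.1766, 0.1751)         # GBLUP vs. QM
nrmse_gain_vs_meanp = relative_gain_nrmse(0.4313, 0.4318)  # GBLUP vs. Meanp
nrmse_gain_vs_qm = relative_gain_nrmse(0.4313, 0.5234)     # GBLUP vs. QM
```

Under this reading, a COR gain and an NRMSE gain are not directly comparable, because they use different denominators.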
Overall, the analysis of the Disease dataset indicates that the GBLUP method is the most effective approach, demonstrating a higher Pearson’s correlation compared to other methods, including QM and Meanp. This trend is also reflected in the NRMSE metric, where GBLUP achieves the lowest average NRMSE, confirming its superior performance. Its advantages over a range of alternative methods, including various quantile mapping strategies, further solidify the reliability and effectiveness of GBLUP for predictive tasks in this context.

2.2. EYT_1

The results for the models evaluated on the EYT_1 dataset (Figure 2) were assessed using the same metrics, COR and NRMSE. For more details, see Table A2 (in Appendix A).
The evaluation of Pearson’s correlation between observed and predicted values (Figure 2A) for the EYT_1 dataset indicates that GBLUP is the most effective strategy, attaining a correlation of 0.4659, which is 3.9955% greater than Meanp’s correlation of 0.4480. GBLUP also surpasses QM (0.4429, by 5.193%), Invchi (0.4417, by 5.4788%), Logit (0.4414, by 5.5505%), SumZ (0.4389, by 6.1517%), QM_Meanp (0.4273, by 9.0335%), QM_SumZ (0.4270, by 9.1101%), QM_Logit (0.4257, by 9.4433%), and QM_Invchi (0.4193, by 11.1138%).
Regarding the NRMSE between observed and predicted values (Figure 2B) for the EYT_1 dataset, GBLUP achieves the lowest average NRMSE, establishing it as the most effective choice. GBLUP has a value of 0.0450, which is 0.8889% better than Meanp (0.0454) and 1.1111% better than Invchi (0.0455). Additionally, GBLUP outperforms Logit (0.0456) and SumZ (0.0456) by 1.3333%. The advantages are larger over the QM-based models: QM_SumZ (0.0512) by 13.7778%, QM_Logit (0.0519) by 15.3333%, QM_Invchi (0.0533) by 18.4444%, QM_Meanp (0.0534) by 18.6667%, and QM (0.0545) by 21.1111%.
Overall, the analysis of the EYT_1 dataset indicates that the GBLUP method consistently outperforms other strategies, displaying both the highest Pearson’s correlation and the lowest NRMSE. This establishes GBLUP as the most effective choice compared to Meanp, Invchi, and the various quantile mapping methods. Its superior performance across both metrics underscores its reliability and potential for the enhancement of predictive accuracy in related applications.

2.3. Wheat_1

This section presents the results of the genomic prediction models evaluated on the Wheat_1 data, considering the same metrics as before. For more details, see Table A3 (in Appendix A).
The assessment of Pearson’s correlation between observed and predicted values (Figure 3A) for the Wheat_1 dataset shows that GBLUP is the most effective strategy, achieving a correlation of 0.4682, which is 3.8598% greater than QM’s correlation of 0.4508. GBLUP also outperforms Meanp (0.4406, by 6.2642%), SumZ (0.4400, by 6.4091%), Logit (0.4387, by 6.7244%), Invchi (0.4314, by 8.5304%), QM_Meanp (0.4299, by 8.909%), QM_Invchi (0.4256, by 10.0094%), QM_SumZ (0.4214, by 11.1058%), and QM_Logit (0.4187, by 11.8223%).
Regarding the NRMSE between observed and predicted values (Figure 3B) for the Wheat_1 dataset, GBLUP achieves the lowest average NRMSE, establishing it as the most effective option. GBLUP has a value of 0.8870, which is 1.5671% better than Logit (0.9009) and 1.6347% better than Meanp (0.9015). Additionally, GBLUP outperforms SumZ (0.9016) by 1.646% and Invchi (0.9047) by 1.9955%. The advantages are larger over the QM-based models: QM_Invchi (0.9866) by 11.2289%, QM_Meanp (0.9895) by 11.5558%, QM_Logit (1.0148) by 14.4081%, QM_SumZ (1.0238) by 15.4228%, and QM (1.0293) by 16.0428%.
The assessment of the Wheat_1 dataset reveals that the GBLUP method is the most effective strategy, achieving a higher Pearson’s correlation compared to other approaches, including Meanp and remaining methods. The performance of GBLUP is not only superior in correlation but also presents the lowest average NRMSE, further establishing its effectiveness. It significantly outperforms other methods, such as Logit and SumZ, as well as a range of quantile mapping strategies, indicating its reliability for predictive tasks. Overall, the consistent advantages of GBLUP reinforce its position as the preferred method in this context.

2.4. Across Data

This section presents the results across datasets, using the same models and metrics as before. For more details, see Table A4 (in Appendix A).
The assessment of Pearson’s correlation between observed and predicted values (Figure 4A) across datasets highlights GBLUP as the most effective strategy, achieving a correlation of 0.4834, which is 3.7562% greater than QM’s correlation of 0.4659 and 3.9794% greater than Meanp’s correlation of 0.4649. GBLUP also outperforms SumZ (0.4584, by 5.4538%), Logit (0.4569, by 5.8%), and Invchi (0.4533, by 6.6402%). The advantages are larger over the combinations of QM with outlier detection, including QM_Meanp (0.4458, by 8.4343%), QM_Logit (0.4412, by 9.5648%), QM_SumZ (0.4405, by 9.7389%), and QM_Invchi (0.4355, by 10.9989%).
The assessment of the NRMSE between observed and predicted values (Figure 4B) across datasets indicates that GBLUP achieves the lowest average NRMSE, establishing it as the most effective option. GBLUP has a value of 0.6954, which is 0.7046% better than Meanp (0.7003) and 0.9347% better than SumZ (0.7019). Additionally, GBLUP outperforms Logit (0.7019) and Invchi (0.7043) by 0.9347% and 1.2798%, respectively. The advantages are larger over the QM-based models, including QM_Logit (0.7928) by 14.0063%, QM_Meanp (0.7976) by 14.6966%, QM_Invchi (0.8018) by 15.3005%, QM_SumZ (0.8110) by 16.6235%, and QM (0.8160) by 17.3425%.
The assessment of Pearson’s correlation across datasets reveals that the GBLUP method is the most effective approach, achieving a higher correlation compared to other methods, including Meanp and various quantile mapping strategies. GBLUP not only excels in correlation but also records the lowest average NRMSE, solidifying its status as the most reliable option. Its performance surpasses that of Logit and SumZ, as well as several quantile mapping methods, indicating a clear advantage. Overall, GBLUP’s consistent effectiveness across both metrics reinforces its preference for predictive tasks in this context.

3. Discussion

The successful implementation of GS in plant breeding faces several challenges, including the need for high-quality genomic and phenotypic data, appropriate statistical models, and robust validation strategies. One key hurdle is the limited availability of large, diverse datasets required to capture the genetic architecture of complex traits and account for genotype-by-environment interactions, which are critical in breeding programs targeting multiple environments [2,19]. Additionally, computational demands increase significantly with the inclusion of high-dimensional genomic data, requiring advancements in algorithms and computational resources. Another challenge lies in translating genomic predictions into actionable breeding decisions, demanding integration with traditional breeding practices and decision-support tools [20]. Addressing these issues involves interdisciplinary collaboration and significant investment in training, data curation, and infrastructure to fully leverage the potential of GS in enhancing genetic gains and breeding efficiency.
Improving the efficiency of GS in plant breeding relies on strategies that enhance prediction accuracy, optimize resource allocation, and integrate GS into breeding pipelines. One successful approach is the use of multi-environment trials (MET) to capture genotype-by-environment interactions, enabling better predictions across diverse target environments [2]. Sparse testing schemes, which involve phenotyping only a subset of genotypes in certain environments, are also effective in reducing costs while maintaining prediction accuracy when paired with robust statistical models [21,22]. Additionally, leveraging complementary data sources such as high-throughput phenotyping and environmental covariates can further enhance GS accuracy by providing insights into complex trait architectures [23]. Implementing these strategies requires investment in advanced data management systems and interdisciplinary collaboration to fully integrate GS into breeding programs and maximize genetic gains.
Despite its potential, the practical application of GS in plant breeding remains highly challenging due to complexities such as the need for high-quality genomic and phenotypic data, the variability in genotype-by-environment interactions, and the computational burden of analyzing large datasets. The effectiveness of GS often depends on the accuracy of prediction models, which can be hindered by limited training data, especially for less-studied traits or environments [24]. Furthermore, the integration of GS into breeding programs requires adapting existing workflows and overcoming economic and logistical barriers, such as the cost of genotyping and the need for skilled personnel [2]. To address these limitations, researchers are actively exploring novel approaches, including integrating environmental data, leveraging machine learning techniques, and developing strategies like sparse testing to improve the efficiency and scalability of GS [25]. These efforts aim to refine GS methodologies and make them more applicable to real-world breeding scenarios.
For this reason, this study explored the use of quantile mapping and the removal of outlier observations within a GBLUP framework to improve the predictive accuracy of the conventional GBLUP model. In theory, these combinations have the potential to enhance the prediction accuracy of GBLUP by addressing critical issues such as the influence of extreme values and non-normality in the data. Quantile mapping, by transforming the distribution of predictions to better align with observed values, can correct systematic biases that often undermine model performance. Simultaneously, outlier removal helps reduce noise and ensures that the model focuses on patterns representative of the majority of the data, which is particularly important when dealing with genomic data characterized by high dimensionality and complex interactions. These adjustments aim to refine the training dataset and statistical assumptions of the model, ultimately resulting in more robust and reliable predictions. Furthermore, integrating these strategies within the GBLUP framework offers an opportunity to adapt this widely used genomic prediction method to varying data qualities and environmental conditions, addressing persistent challenges in plant breeding programs.
However, our results combining the GBLUP method with quantile mapping and outlier detection techniques did not meet expectations. In terms of Pearson’s correlation, across all datasets and within each individual dataset, the GBLUP method proved to be the most effective, consistently achieving higher correlations than the alternative approaches. This superior performance of GBLUP is further supported by its ability to minimize errors, as evidenced by lower NRMSE values. Compared to other methods, including any outlier detection method, quantile mapping, and resulting combinations of quantile mapping with outlier detection techniques, GBLUP consistently delivers more accurate predictions, reaffirming its reliability and robustness in the context of breeding programs.
Our results emphasize the benefits and robustness of the GBLUP method, which remains one of the most popular approaches for genomic prediction. Its popularity stems from several key factors. Firstly, GBLUP is computationally efficient and relatively simple to implement, making it accessible for a wide range of breeding programs. Secondly, it leverages genomic relationships to predict breeding values, effectively capturing additive genetic effects, which are crucial for many quantitative traits. Additionally, GBLUP is grounded in a solid statistical framework, offering reliable and interpretable results. Its ability to handle high-dimensional genomic data without overfitting further contributes to its widespread use. Moreover, the compatibility of GBLUP with extensions, such as the incorporation of environmental covariates or non-additive effects, enhances its adaptability to complex breeding scenarios. These advantages collectively solidify the position of GBLUP as a cornerstone method in genomic prediction.
Finally, we want to emphasize that our results are specific to the datasets used in this study, which reflect genetic and environmental conditions. The observed lack of improvement in predictive accuracy when combining GBLUP with quantile mapping and outlier detection techniques may be influenced by the nature of the datasets, such as their size, genetic architecture, or level of noise. While these combinations did not outperform the conventional GBLUP method in this context, it is important to acknowledge that their effectiveness could vary under different circumstances. For instance, in datasets with pronounced outliers or non-normal distributions, quantile mapping and outlier removal may play a more significant role in improving model performance. Additionally, these techniques might offer advantages in scenarios in which specific traits exhibit strong non-linear patterns or in which genotype-by-environment interactions are highly complex. Therefore, while our findings reaffirm the robustness of the standard GBLUP method, they also suggest the need for further exploration of these combinations across diverse datasets to fully understand their potential.
This study evaluates the impact of quantile mapping and outlier detection on the accuracy of genomic predictions using GBLUP. However, confidence intervals for the accuracy metrics, Pearson’s correlation and normalized root mean square error, were not computed, which limits the ability to assess the statistical uncertainty associated with the observed differences. Additionally, formal hypothesis testing, such as paired statistical tests to compare GBLUP with and without these enhancements, was not conducted. While the study primarily focused on practical predictive performance rather than statistical inference, future research should incorporate bootstrapping or cross-validation techniques to estimate confidence intervals and apply appropriate statistical tests, such as paired t-tests or Wilcoxon signed-rank tests, to determine whether the observed differences are statistically significant. Implementing these approaches would strengthen the robustness of the conclusions and provide a clearer understanding of the reliability and generalizability of the proposed methods across different datasets and breeding populations.
Furthermore, computational time was not systematically evaluated, which is an important factor when implementing these methods in large-scale genomic selection programs. Future studies should assess the trade-off between prediction accuracy and the additional computational cost associated with quantile mapping and outlier detection, particularly in large datasets where efficiency and scalability are key considerations.

4. Methods and Materials

4.1. Datasets

A detailed overview of the 14 datasets used in this study is provided in Table 1.

4.2. Bayesian GBLUP Model

The GBLUP model implemented was:
$$Y_i = \mu + g_i + \epsilon_i \tag{1}$$
where $Y_i$ represents the best linear unbiased estimate (BLUE) for the $i$-th genotype. The grand mean is denoted by $\mu$, and the random effects associated with genotypes (lines), $g_i$, $i = 1, \ldots, J$, are distributed as $\mathbf{g} = (g_1, \ldots, g_J)^T \sim N_J(\mathbf{0}, \sigma_g^2 \mathbf{G})$, where $\mathbf{G}$ is the genomic relationship matrix [8] and $\sigma_g^2$ is the genetic variance component. The residual errors, $\epsilon_i$, are assumed to be independent and normally distributed with mean 0 and variance $\sigma^2$. This model was implemented in R statistical software version 4.4.3 [26] with the BGLR library of Pérez and de los Campos [27].
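To illustrate how the genomic relationship matrix shrinks and shares information among genotypes, the following Python sketch computes the BLUP of the genotype effects under simplifying assumptions. It is not the Bayesian BGLR fit used in the study: the variance ratio `lam` (residual variance divided by genetic variance) is treated as known, the grand mean is replaced by the sample mean, and the function names `solve` and `gblup_effects` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gblup_effects(y, G, lam):
    """Return (mu_hat, g_hat) with g_hat = G (G + lam I)^{-1} (y - mu_hat).

    Assumptions: lam = sigma^2 / sigma_g^2 is known, and mu_hat is the
    sample mean of y (a simplification of the mixed-model estimate).
    """
    n = len(y)
    mu_hat = sum(y) / n
    centered = [v - mu_hat for v in y]
    lhs = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(lhs, centered)
    g_hat = [sum(G[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    return mu_hat, g_hat


# Toy example: two unrelated genotypes (G = identity) and equal variances (lam = 1),
# so each centered phenotype is shrunk by one half.
mu_hat, g_hat = gblup_effects([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], 1.0)
```

With off-diagonal entries in G, the predicted effect of one genotype also borrows information from its relatives, which is what allows prediction for lines without phenotypes.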

4.3. Quantile Mapping (QM)

QM is a widely used bias adjustment technique for post-processing climate model simulations. It addresses the mismatch between the coarse spatial resolution of model outputs and finer spatial scales of interest [9]. QM achieves this by aligning the cumulative distribution function (CDF) of the modeled data with that of reference data for each target location. Specifically, it creates a quantile-dependent correction function to map simulated quantiles to their corresponding reference values. This correction function is then applied to the modeled time series, yielding bias-adjusted values that align with the distribution of the reference data. QM operates under the assumption that the CDF of a variable in the forecast and observational time series remains consistent in future periods [28]. Given a variable $x$, QM minimizes discrepancies between the CDFs of model data and reference data over a calibration period. In practice, the algorithm maps the model output $x$ to the observational output $y$ using a transformation function $h$, ensuring the two CDFs become equivalent [29]. In terms of equations, this results in:
$$y = h(x), \qquad CDF_y(y) = CDF_x(x) \tag{2}$$
$$y = CDF_y^{-1}\left(CDF_x(x)\right) \tag{3}$$
where $CDF^{-1}$ is the inverse function of the CDF. From Equation (2) it becomes clear that QM equates the cumulative distribution functions (CDFs) $CDF_y$ and $CDF_x$ of the observed data $y$ and the modeled data $x$, respectively, over a historical period, which results in the transfer function (3). The implemented QM scheme was based on the R package map version 3.4.2.1 [13].
Since the QM method relies on the observed and predicted values from the training set to adjust predictions, it is important to emphasize that QM is specifically implemented to refine the predicted values generated by the GBLUP method. In other words, the conventional GBLUP results are enhanced through this QM adjustment process.
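The transfer function described above can be sketched with empirical CDFs. The following Python example is a minimal illustration, not the R implementation used in the study: it assumes a step-function empirical CDF for the training predictions and an order-statistic inverse for the training observations, and the function name `quantile_map` is hypothetical.

```python
from bisect import bisect_right


def quantile_map(pred_train, obs_train, pred_new):
    """Map each new prediction x to CDF_y^{-1}(CDF_x(x)) using empirical CDFs.

    pred_train: GBLUP predictions on the training set (defines CDF_x)
    obs_train:  observed values on the training set (defines CDF_y)
    pred_new:   predictions to adjust (for example, on the testing set)
    """
    xs = sorted(pred_train)
    ys = sorted(obs_train)
    n, m = len(xs), len(ys)
    mapped = []
    for x in pred_new:
        k = bisect_right(xs, x)          # empirical CDF_x(x) = k / n
        idx = (k * m + n - 1) // n - 1   # ceil(k * m / n) - 1, in integer arithmetic
        idx = min(max(idx, 0), m - 1)    # clamp values outside the training range
        mapped.append(ys[idx])
    return mapped


# Toy example: predictions on a compressed scale are mapped back to the observed scale.
adjusted = quantile_map([1, 2, 3, 4], [10, 20, 30, 40], [2, 3.5, 0, 5])
```

Because the mapping is monotone, it changes the scale and shape of the predictions but not their ranking. Adjusted values are always drawn from the observed training values, so this variant cannot extrapolate beyond them.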

4.4. Outlier Detection Methods

The four methods used for the detection of influential measures are based on the p-value-based meta-analysis approach proposed by Budhlakoti and Mishra [30]. A brief description of these approaches is as follows. Let us assume there are $K$ independent tests, and their corresponding p-values are $p_1, p_2, \ldots, p_K$. Under $H_0$, it is assumed that p-values from different methods (for individual observations) are uniformly distributed between 0 and 1 (i.e., $p_k \sim U(0, 1)$). To determine the overall statistical significance of the hypothesis under test ($H_0$, i.e., null hypothesis vs. $H_1$, alternative hypothesis), individual p-values for each observation/genotype from different methods are combined. The specific methods used for this purpose are summarized in Table 2.
This approach (Table 2) was used to compute the final statistical significance values, specifically the combined p-values for the selected observations or genotypes. Influential observations were determined by applying a suitable p-value threshold. The methods were implemented using source code available from a GitHub repository at GitHub—BudhlakotiN/OGS: R/OGS: Outlier in Genomics Data.
It is important to note that these four outlier detection methods (Invchi, Logit, Meanp, SumZ) were applied to the training set of each fold. After implementation, any observations identified as outliers were removed from the training set. The reduced training set was then used to implement the GBLUP method, as described in Equation (1).
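To make the idea of combining K p-values per observation concrete, the sketch below implements two textbook combination rules in Python. It is an illustration under stated assumptions, not the OGS code: Fisher's method is used here as the special case of the inverse chi-square approach with 2 degrees of freedom per p-value (the degrees of freedom behind Invchi are not stated in this section), Stouffer's sum of z-scores stands in for SumZ, and the Logit and Meanp rules are omitted. The function names and the 0.05 threshold are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(log p) follows chi-square(2K) under H0.

    Assumes every p-value lies strictly between 0 and 1.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # test statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom 2K:
    # exp(-s) * sum_{i=0}^{K-1} s^i / i!, with s = statistic / 2.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined_p(p_values):
    """Stouffer's method: sum of z-scores divided by sqrt(K) is N(0, 1) under H0."""
    norm = NormalDist()
    z = sum(norm.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - norm.cdf(z)


def flag_outliers(p_matrix, combine, threshold=0.05):
    """Return indices of observations whose combined p-value is below `threshold`.

    `p_matrix` holds one row of K p-values per observation; the default
    threshold is an illustrative assumption.
    """
    return [i for i, row in enumerate(p_matrix) if combine(row) < threshold]


# Hypothetical p-values for two genotypes from three detection tests.
flagged = flag_outliers([[0.001, 0.002, 0.01], [0.4, 0.6, 0.7]], fisher_combined_p)
```

In the workflow described above, the flagged indices would be dropped from the training set of the fold before GBLUP is refitted.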

4.5. Combining Quantile Mapping with Outlier Detection Methods Using GBLUP

Combining the quantile mapping (QM) method with the four outlier detection methods (Invchi, Logit, Meanp, and SumZ) resulted in the development of four additional approaches, denoted as QM_Invchi, QM_Logit, QM_Meanp, and QM_SumZ. These methods were implemented as follows: first, each of the four outlier detection methods was applied as previously described. Subsequently, the QM method was applied using the observed and predicted values from the training set produced by each of the four outlier detection methods.
Therefore, a total of 10 models were evaluated in this study: GBLUP alone; GBLUP combined with quantile mapping (QM); GBLUP combined with each of the four outlier detection methods (Invchi, Logit, Meanp, and SumZ); and GBLUP combined with each of the four combinations of QM with an outlier detection method (QM_Invchi, QM_Logit, QM_Meanp, and QM_SumZ). Results are presented for all 10 GBLUP-based models.

4.6. Evaluation of Prediction Performance

To evaluate the proposed methods, we used a 10-fold cross-validation approach. In each fold, 80% of the data were allocated for training and 20% for testing. For each testing set, prediction accuracy was assessed using two metrics: average Pearson’s correlation (COR) and normalized root mean square error (NRMSE) [35]. These metrics were selected because they are independent of the trait’s scale and therefore facilitate comparisons across different traits. Both metrics were calculated using the observed values ($y_i$) and the predicted values [$\hat{f}(x_i)$] from the testing set of each fold. The average performance over the 10 folds was reported. COR and NRMSE were selected not only for their utility in genomic prediction but also because they are widely used metrics for evaluating prediction performance.
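The two metrics and their averaging over folds can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes that NRMSE is the root mean square error divided by the mean of the observed values (the exact normalization is defined in [35] and is not restated in this section), and the function names are hypothetical.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson's correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def average_over_folds(folds):
    """Average COR and NRMSE over (observed, predicted) pairs, one per fold."""
    cors = [pearson_correlation(obs, pred) for obs, pred in folds]
    errors = [nrmse(obs, pred) for obs, pred in folds]
    return sum(cors) / len(cors), sum(errors) / len(errors)


# Hypothetical testing sets from two folds.
mean_cor, mean_nrmse = average_over_folds([
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
])
```

The second toy fold shows why both metrics are reported: its predictions are perfectly correlated with the observations (COR = 1) yet carry a large error, which only NRMSE reveals.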

5. Conclusions

Our benchmark analysis shows that the conventional GBLUP method outperforms quantile mapping, outlier detection techniques, and their combination in the context of genomic prediction. These findings reaffirm the effectiveness and robustness of GBLUP, which remains one of the most widely used techniques in plant and animal breeding for genomic selection. However, our results are not definitive, as substantial empirical evidence suggests that removing outliers from the training data can enhance prediction accuracy and quantile mapping can improve predictions in the testing set. Therefore, further empirical evaluations are essential to thoroughly assess the benefits and limitations of these alternative methods within the context of genomic selection. This will provide a more comprehensive understanding of their potential to complement or improve upon GBLUP.

Author Contributions

Conceptualization, O.A.M.-L. and A.M.-L.; methodology, O.A.M.-L. and A.M.-L.; validation and investigation, O.A.M.-L., A.M.-L., A.A. and J.C.; writing—original draft preparation and software, A.A. and R.B.-M.; writing—review and editing, A.C. Authors P.V., G.G., L.C.-H., S.D., C.S.P. and L.G.P. have read the first version. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by SLU Grogrund grant number SLU.ltv.2020.1.1.1-654.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The genomic and phenotypic data used in this study are available at https://github.com/osval78/Refaning_Penalized_Regression (accessed on 8 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

GS: Genomic selection
GBLUP: Genomic best linear unbiased predictor
QM: Quantile mapping
CDF: Cumulative distribution function
COR: Correlation
NRMSE: Normalized root mean square error

Appendix A

Results for datasets Disease (Table A1), EYT_1 (Table A2), Wheat_1 (Table A3), and across datasets (Table A4).
Table A1. Performance comparison of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Disease dataset, using quantile mapping.
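| Metric | Method | Min | Mean | Median | Max |
|---|---|---|---|---|---|
| COR | GBLUP | 0.0976 | 0.1766 | 0.1977 | 0.2344 |
| COR | Invchi | 0.0544 | 0.1552 | 0.1820 | 0.2293 |
| COR | Logit | 0.0506 | 0.1559 | 0.1829 | 0.2344 |
| COR | Meanp | 0.1008 | 0.1728 | 0.1755 | 0.2421 |
| COR | QM | 0.1018 | 0.1751 | 0.1882 | 0.2351 |
| COR | QM_Invchi | 0.0511 | 0.1530 | 0.1844 | 0.2234 |
| COR | QM_Logit | 0.0486 | 0.1528 | 0.1841 | 0.2258 |
| COR | QM_Meanp | 0.0842 | 0.1661 | 0.1735 | 0.2407 |
| COR | QM_Sumz | 0.0699 | 0.1586 | 0.1855 | 0.2204 |
| COR | SumZ | 0.0681 | 0.1630 | 0.1917 | 0.2291 |
| COR_SE | GBLUP | 0.0169 | 0.0234 | 0.0242 | 0.0290 |
| COR_SE | Invchi | 0.0276 | 0.0309 | 0.0323 | 0.0327 |
| COR_SE | Logit | 0.0221 | 0.0317 | 0.0341 | 0.0388 |
| COR_SE | Meanp | 0.0247 | 0.0272 | 0.0252 | 0.0318 |
| COR_SE | QM | 0.0206 | 0.0247 | 0.0228 | 0.0307 |
| COR_SE | QM_Invchi | 0.0303 | 0.0326 | 0.0323 | 0.0352 |
| COR_SE | QM_Logit | 0.0227 | 0.0320 | 0.0328 | 0.0407 |
| COR_SE | QM_Meanp | 0.0241 | 0.0267 | 0.0241 | 0.0318 |
| COR_SE | QM_Sumz | 0.0201 | 0.0282 | 0.0306 | 0.0340 |
| COR_SE | SumZ | 0.0200 | 0.0274 | 0.0295 | 0.0327 |
| NRMSE | GBLUP | 0.4055 | 0.4313 | 0.4127 | 0.4757 |
| NRMSE | Invchi | 0.4066 | 0.4346 | 0.4136 | 0.4837 |
| NRMSE | Logit | 0.4060 | 0.4345 | 0.4135 | 0.4840 |
| NRMSE | Meanp | 0.4044 | 0.4318 | 0.4149 | 0.4761 |
| NRMSE | QM | 0.4670 | 0.5234 | 0.5028 | 0.6005 |
| NRMSE | QM_Invchi | 0.4464 | 0.4986 | 0.4728 | 0.5767 |
| NRMSE | QM_Logit | 0.4439 | 0.4984 | 0.4750 | 0.5762 |
| NRMSE | QM_Meanp | 0.4464 | 0.5072 | 0.4954 | 0.5798 |
| NRMSE | QM_Sumz | 0.4463 | 0.4987 | 0.4741 | 0.5756 |
| NRMSE | SumZ | 0.4063 | 0.4337 | 0.4127 | 0.4820 |
| NRMSE_SE | GBLUP | 0.0032 | 0.0063 | 0.0065 | 0.0092 |
| NRMSE_SE | Invchi | 0.0039 | 0.0066 | 0.0064 | 0.0095 |
| NRMSE_SE | Logit | 0.0043 | 0.0065 | 0.0058 | 0.0093 |
| NRMSE_SE | Meanp | 0.0036 | 0.0062 | 0.0058 | 0.0093 |
| NRMSE_SE | QM | 0.0076 | 0.0091 | 0.0090 | 0.0107 |
| NRMSE_SE | QM_Invchi | 0.0077 | 0.0100 | 0.0105 | 0.0119 |
| NRMSE_SE | QM_Logit | 0.0069 | 0.0089 | 0.0099 | 0.0101 |
| NRMSE_SE | QM_Meanp | 0.0076 | 0.0102 | 0.0111 | 0.0118 |
| NRMSE_SE | QM_Sumz | 0.0059 | 0.0091 | 0.0098 | 0.0116 |
| NRMSE_SE | SumZ | 0.0043 | 0.0066 | 0.0062 | 0.0093 |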
Table A2. Performance comparison of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the EYT_1 dataset, using quantile mapping.
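| Metric | Method | Min | Mean | Median | Max |
|---|---|---|---|---|---|
| COR | GBLUP | 0.4282 | 0.4659 | 0.4727 | 0.4901 |
| COR | Invchi | 0.4037 | 0.4417 | 0.4445 | 0.4741 |
| COR | Logit | 0.3937 | 0.4414 | 0.4477 | 0.4765 |
| COR | Meanp | 0.4028 | 0.4480 | 0.4570 | 0.4753 |
| COR | QM | 0.3542 | 0.4429 | 0.4633 | 0.4908 |
| COR | QM_Invchi | 0.3588 | 0.4193 | 0.4283 | 0.4617 |
| COR | QM_Logit | 0.3715 | 0.4257 | 0.4306 | 0.4698 |
| COR | QM_Meanp | 0.3393 | 0.4273 | 0.4477 | 0.4746 |
| COR | QM_Sumz | 0.3794 | 0.4270 | 0.4292 | 0.4702 |
| COR | SumZ | 0.3963 | 0.4389 | 0.4438 | 0.4717 |
| COR_SE | GBLUP | 0.0096 | 0.0164 | 0.0151 | 0.0256 |
| COR_SE | Invchi | 0.0162 | 0.0196 | 0.0178 | 0.0264 |
| COR_SE | Logit | 0.0153 | 0.0184 | 0.0181 | 0.0220 |
| COR_SE | Meanp | 0.0126 | 0.0179 | 0.0167 | 0.0257 |
| COR_SE | QM | 0.0109 | 0.0234 | 0.0197 | 0.0432 |
| COR_SE | QM_Invchi | 0.0193 | 0.0279 | 0.0291 | 0.0339 |
| COR_SE | QM_Logit | 0.0156 | 0.0226 | 0.0233 | 0.0284 |
| COR_SE | QM_Meanp | 0.0122 | 0.0226 | 0.0222 | 0.0338 |
| COR_SE | QM_Sumz | 0.0120 | 0.0225 | 0.0236 | 0.0308 |
| COR_SE | SumZ | 0.0126 | 0.0190 | 0.0191 | 0.0252 |
| NRMSE | GBLUP | 0.0349 | 0.0450 | 0.0448 | 0.0552 |
| NRMSE | Invchi | 0.0355 | 0.0455 | 0.0455 | 0.0557 |
| NRMSE | Logit | 0.0354 | 0.0456 | 0.0456 | 0.0556 |
| NRMSE | Meanp | 0.0353 | 0.0454 | 0.0453 | 0.0558 |
| NRMSE | QM | 0.0386 | 0.0545 | 0.0575 | 0.0646 |
| NRMSE | QM_Invchi | 0.0428 | 0.0533 | 0.0538 | 0.0628 |
| NRMSE | QM_Logit | 0.0377 | 0.0519 | 0.0536 | 0.0629 |
| NRMSE | QM_Meanp | 0.0376 | 0.0534 | 0.0571 | 0.0617 |
| NRMSE | QM_Sumz | 0.0380 | 0.0512 | 0.0522 | 0.0626 |
| NRMSE | SumZ | 0.0355 | 0.0456 | 0.0456 | 0.0558 |
| NRMSE_SE | GBLUP | 0.0004 | 0.0006 | 0.0006 | 0.0010 |
| NRMSE_SE | Invchi | 0.0005 | 0.0008 | 0.0008 | 0.0011 |
| NRMSE_SE | Logit | 0.0006 | 0.0008 | 0.0007 | 0.0012 |
| NRMSE_SE | Meanp | 0.0006 | 0.0008 | 0.0008 | 0.0010 |
| NRMSE_SE | QM | 0.0003 | 0.0031 | 0.0023 | 0.0076 |
| NRMSE_SE | QM_Invchi | 0.0010 | 0.0040 | 0.0043 | 0.0064 |
| NRMSE_SE | QM_Logit | 0.0006 | 0.0033 | 0.0038 | 0.0049 |
| NRMSE_SE | QM_Meanp | 0.0006 | 0.0032 | 0.0024 | 0.0074 |
| NRMSE_SE | QM_Sumz | 0.0006 | 0.0026 | 0.0024 | 0.0049 |
| NRMSE_SE | SumZ | 0.0006 | 0.0008 | 0.0007 | 0.0010 |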
Table A3. Performance comparison of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_1 dataset, using quantile mapping.
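| Metric | Method | Min | Mean | Median | Max |
|---|---|---|---|---|---|
| COR | GBLUP | 0.4682 | 0.4682 | 0.4682 | 0.4682 |
| COR | Invchi | 0.4314 | 0.4314 | 0.4314 | 0.4314 |
| COR | Logit | 0.4387 | 0.4387 | 0.4387 | 0.4387 |
| COR | Meanp | 0.4406 | 0.4406 | 0.4406 | 0.4406 |
| COR | QM | 0.4508 | 0.4508 | 0.4508 | 0.4508 |
| COR | QM_Invchi | 0.4256 | 0.4256 | 0.4256 | 0.4256 |
| COR | QM_Logit | 0.4187 | 0.4187 | 0.4187 | 0.4187 |
| COR | QM_Meanp | 0.4299 | 0.4299 | 0.4299 | 0.4299 |
| COR | QM_Sumz | 0.4214 | 0.4214 | 0.4214 | 0.4214 |
| COR | SumZ | 0.4400 | 0.4400 | 0.4400 | 0.4400 |
| COR_SE | GBLUP | 0.0149 | 0.0149 | 0.0149 | 0.0149 |
| COR_SE | Invchi | 0.0116 | 0.0116 | 0.0116 | 0.0116 |
| COR_SE | Logit | 0.0123 | 0.0123 | 0.0123 | 0.0123 |
| COR_SE | Meanp | 0.0122 | 0.0122 | 0.0122 | 0.0122 |
| COR_SE | QM | 0.0175 | 0.0175 | 0.0175 | 0.0175 |
| COR_SE | QM_Invchi | 0.0161 | 0.0161 | 0.0161 | 0.0161 |
| COR_SE | QM_Logit | 0.0129 | 0.0129 | 0.0129 | 0.0129 |
| COR_SE | QM_Meanp | 0.0137 | 0.0137 | 0.0137 | 0.0137 |
| COR_SE | QM_Sumz | 0.0166 | 0.0166 | 0.0166 | 0.0166 |
| COR_SE | SumZ | 0.0127 | 0.0127 | 0.0127 | 0.0127 |
| NRMSE | GBLUP | 0.8870 | 0.8870 | 0.8870 | 0.8870 |
| NRMSE | Invchi | 0.9047 | 0.9047 | 0.9047 | 0.9047 |
| NRMSE | Logit | 0.9009 | 0.9009 | 0.9009 | 0.9009 |
| NRMSE | Meanp | 0.9015 | 0.9015 | 0.9015 | 0.9015 |
| NRMSE | QM | 1.0293 | 1.0293 | 1.0293 | 1.0293 |
| NRMSE | QM_Invchi | 0.9866 | 0.9866 | 0.9866 | 0.9866 |
| NRMSE | QM_Logit | 1.0148 | 1.0148 | 1.0148 | 1.0148 |
| NRMSE | QM_Meanp | 0.9895 | 0.9895 | 0.9895 | 0.9895 |
| NRMSE | QM_Sumz | 1.0238 | 1.0238 | 1.0238 | 1.0238 |
| NRMSE | SumZ | 0.9016 | 0.9016 | 0.9016 | 0.9016 |
| NRMSE_SE | GBLUP | 0.0092 | 0.0092 | 0.0092 | 0.0092 |
| NRMSE_SE | Invchi | 0.0053 | 0.0053 | 0.0053 | 0.0053 |
| NRMSE_SE | Logit | 0.0060 | 0.0060 | 0.0060 | 0.0060 |
| NRMSE_SE | Meanp | 0.0055 | 0.0055 | 0.0055 | 0.0055 |
| NRMSE_SE | QM | 0.0436 | 0.0436 | 0.0436 | 0.0436 |
| NRMSE_SE | QM_Invchi | 0.0367 | 0.0367 | 0.0367 | 0.0367 |
| NRMSE_SE | QM_Logit | 0.0363 | 0.0363 | 0.0363 | 0.0363 |
| NRMSE_SE | QM_Meanp | 0.0367 | 0.0367 | 0.0367 | 0.0367 |
| NRMSE_SE | QM_Sumz | 0.0461 | 0.0461 | 0.0461 | 0.0461 |
| NRMSE_SE | SumZ | 0.0059 | 0.0059 | 0.0059 | 0.0059 |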
Table A4. Performance comparison of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) across the datasets, using quantile mapping.
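| Metric | Method | Min | Mean | Median | Max |
|---|---|---|---|---|---|
| COR | GBLUP | 0.0976 | 0.4834 | 0.4937 | 0.6941 |
| COR | Invchi | 0.0544 | 0.4533 | 0.4682 | 0.6667 |
| COR | Logit | 0.0506 | 0.4569 | 0.4728 | 0.6676 |
| COR | Meanp | 0.1008 | 0.4649 | 0.4751 | 0.6739 |
| COR | QM | 0.1018 | 0.4659 | 0.4805 | 0.6978 |
| COR | QM_Invchi | 0.0511 | 0.4355 | 0.4406 | 0.6653 |
| COR | QM_Logit | 0.0486 | 0.4412 | 0.4588 | 0.6668 |
| COR | QM_Meanp | 0.0842 | 0.4458 | 0.4593 | 0.6751 |
| COR | QM_Sumz | 0.0699 | 0.4405 | 0.4645 | 0.6633 |
| COR | SumZ | 0.0681 | 0.4584 | 0.4709 | 0.6662 |
| COR_SE | GBLUP | 0.0092 | 0.0200 | 0.0179 | 0.0503 |
| COR_SE | Invchi | 0.0107 | 0.0228 | 0.0197 | 0.0659 |
| COR_SE | Logit | 0.0073 | 0.0223 | 0.0199 | 0.0657 |
| COR_SE | Meanp | 0.0093 | 0.0222 | 0.0199 | 0.0605 |
| COR_SE | QM | 0.0109 | 0.0245 | 0.0225 | 0.0507 |
| COR_SE | QM_Invchi | 0.0140 | 0.0278 | 0.0246 | 0.0570 |
| COR_SE | QM_Logit | 0.0096 | 0.0255 | 0.0206 | 0.0668 |
| COR_SE | QM_Meanp | 0.0088 | 0.0265 | 0.0247 | 0.0550 |
| COR_SE | QM_Sumz | 0.0120 | 0.0256 | 0.0217 | 0.0502 |
| COR_SE | SumZ | 0.0097 | 0.0219 | 0.0191 | 0.0577 |
| NRMSE | GBLUP | 0.0297 | 0.6954 | 0.4210 | 7.9058 |
| NRMSE | Invchi | 0.0305 | 0.7043 | 0.4254 | 7.9443 |
| NRMSE | Logit | 0.0304 | 0.7019 | 0.4249 | 7.8848 |
| NRMSE | Meanp | 0.0300 | 0.7003 | 0.4236 | 7.8912 |
| NRMSE | QM | 0.0361 | 0.8160 | 0.4860 | 8.8085 |
| NRMSE | QM_Invchi | 0.0312 | 0.8018 | 0.4719 | 8.6206 |
| NRMSE | QM_Logit | 0.0377 | 0.7928 | 0.4765 | 8.6409 |
| NRMSE | QM_Meanp | 0.0376 | 0.7976 | 0.4829 | 8.6800 |
| NRMSE | QM_Sumz | 0.0380 | 0.8110 | 0.4701 | 8.6088 |
| NRMSE | SumZ | 0.0300 | 0.7019 | 0.4221 | 7.9065 |
| NRMSE_SE | GBLUP | 0.0004 | 0.0885 | 0.0064 | 2.7860 |
| NRMSE_SE | Invchi | 0.0005 | 0.0885 | 0.0063 | 2.7937 |
| NRMSE_SE | Logit | 0.0006 | 0.0882 | 0.0059 | 2.7865 |
| NRMSE_SE | Meanp | 0.0006 | 0.0878 | 0.0058 | 2.7730 |
| NRMSE_SE | QM | 0.0003 | 0.1316 | 0.0107 | 3.1423 |
| NRMSE_SE | QM_Invchi | 0.0010 | 0.1282 | 0.0104 | 3.0366 |
| NRMSE_SE | QM_Logit | 0.0006 | 0.1194 | 0.0102 | 3.0848 |
| NRMSE_SE | QM_Meanp | 0.0006 | 0.1295 | 0.0112 | 3.0857 |
| NRMSE_SE | QM_Sumz | 0.0006 | 0.1320 | 0.0124 | 3.0495 |
| NRMSE_SE | SumZ | 0.0006 | 0.0887 | 0.0063 | 2.8039 |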

Appendix B

Figures for datasets Maize (Figure A1), Japonica (Figure A2), Indica (Figure A3), Groundnut (Figure A4), EYT_2 (Figure A5), EYT_3 (Figure A6), Wheat_2 (Figure A7), Wheat_3 (Figure A8), Wheat_4 (Figure A9), Wheat_5 (Figure A10), and Wheat_6 (Figure A11).
Figure A1. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Maize dataset, using quantile mapping.
Figure A2. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Japonica dataset, using quantile mapping.
Figure A3. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Indica dataset, using quantile mapping.
Figure A4. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Groundnut dataset, using quantile mapping.
Figure A5. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the EYT_2 dataset, using quantile mapping.
Figure A6. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the EYT_3 dataset, using quantile mapping.
Figure A7. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_2 dataset, using quantile mapping.
Figure A8. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_3 dataset, using quantile mapping.
Figure A9. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_4 dataset, using quantile mapping.
Figure A10. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_5 dataset, using quantile mapping.
Figure A11. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and the normalized root mean square error (NRMSE) (B) for the Wheat_6 dataset, using quantile mapping.

Appendix C

Table of results for datasets Maize, Japonica, Indica, Groundnut, EYT_2, EYT_3, Wheat_2, Wheat_3, Wheat_4, Wheat_5, and Wheat_6.
Table A5. Comparative performance of genomic prediction models in terms of Pearson’s correlation (COR and COR standard error COR_SE) and the normalized root mean square error (NRMSE and NRMSE standard error, NRMSE_SE) for Maize, Japonica, Indica, Groundnut, EYT_2, EYT_3, Wheat_2, Wheat_3, Wheat_4, Wheat_5, and Wheat_6 datasets.
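| Data | Metric | Method | Min | Mean | Median | Max |
|---|---|---|---|---|---|---|
| Maize | COR | GBLUP | 0.4225 | 0.4225 | 0.4225 | 0.4225 |
| Maize | COR | Invchi | 0.4106 | 0.4106 | 0.4106 | 0.4106 |
| Maize | COR | Logit | 0.4276 | 0.4276 | 0.4276 | 0.4276 |
| Maize | COR | Meanp | 0.4235 | 0.4235 | 0.4235 | 0.4235 |
| Maize | COR | QM | 0.3748 | 0.3748 | 0.3748 | 0.3748 |
| Maize | COR | QM_Invchi | 0.3691 | 0.3691 | 0.3691 | 0.3691 |
| Maize | COR | QM_Logit | 0.3809 | 0.3809 | 0.3809 | 0.3809 |
| Maize | COR | QM_Meanp | 0.3741 | 0.3741 | 0.3741 | 0.3741 |
| Maize | COR | QM_Sumz | 0.3792 | 0.3792 | 0.3792 | 0.3792 |
| Maize | COR | SumZ | 0.4264 | 0.4264 | 0.4264 | 0.4264 |
| Maize | COR_SE | GBLUP | 0.0180 | 0.0180 | 0.0180 | 0.0180 |
| Maize | COR_SE | Invchi | 0.0173 | 0.0173 | 0.0173 | 0.0173 |
| Maize | COR_SE | Logit | 0.0176 | 0.0176 | 0.0176 | 0.0176 |
| Maize | COR_SE | Meanp | 0.0174 | 0.0174 | 0.0174 | 0.0174 |
| Maize | COR_SE | QM | 0.0229 | 0.0229 | 0.0229 | 0.0229 |
| Maize | COR_SE | QM_Invchi | 0.0237 | 0.0237 | 0.0237 | 0.0237 |
| Maize | COR_SE | QM_Logit | 0.0201 | 0.0201 | 0.0201 | 0.0201 |
| Maize | COR_SE | QM_Meanp | 0.0218 | 0.0218 | 0.0218 | 0.0218 |
| Maize | COR_SE | QM_Sumz | 0.0206 | 0.0206 | 0.0206 | 0.0206 |
| Maize | COR_SE | SumZ | 0.0175 | 0.0175 | 0.0175 | 0.0175 |
| Maize | NRMSE | GBLUP | 7.9058 | 7.9058 | 7.9058 | 7.9058 |
| Maize | NRMSE | Invchi | 7.9443 | 7.9443 | 7.9443 | 7.9443 |
| Maize | NRMSE | Logit | 7.8848 | 7.8848 | 7.8848 | 7.8848 |
| Maize | NRMSE | Meanp | 7.8912 | 7.8912 | 7.8912 | 7.8912 |
| Maize | NRMSE | QM | 8.8085 | 8.8085 | 8.8085 | 8.8085 |
| Maize | NRMSE | QM_Invchi | 8.6206 | 8.6206 | 8.6206 | 8.6206 |
| Maize | NRMSE | QM_Logit | 8.6409 | 8.6409 | 8.6409 | 8.6409 |
| Maize | NRMSE | QM_Meanp | 8.6800 | 8.6800 | 8.6800 | 8.6800 |
| Maize | NRMSE | QM_Sumz | 8.6088 | 8.6088 | 8.6088 | 8.6088 |
| Maize | NRMSE | SumZ | 7.9065 | 7.9065 | 7.9065 | 7.9065 |
| Maize | NRMSE_SE | GBLUP | 2.7860 | 2.7860 | 2.7860 | 2.7860 |
| Maize | NRMSE_SE | Invchi | 2.7937 | 2.7937 | 2.7937 | 2.7937 |
| Maize | NRMSE_SE | Logit | 2.7865 | 2.7865 | 2.7865 | 2.7865 |
| Maize | NRMSE_SE | Meanp | 2.7730 | 2.7730 | 2.7730 | 2.7730 |
| Maize | NRMSE_SE | QM | 3.1423 | 3.1423 | 3.1423 | 3.1423 |
| Maize | NRMSE_SE | QM_Invchi | 3.0366 | 3.0366 | 3.0366 | 3.0366 |
| Maize | NRMSE_SE | QM_Logit | 3.0848 | 3.0848 | 3.0848 | 3.0848 |
| Maize | NRMSE_SE | QM_Meanp | 3.0857 | 3.0857 | 3.0857 | 3.0857 |
| Maize | NRMSE_SE | QM_Sumz | 3.0495 | 3.0495 | 3.0495 | 3.0495 |
| Maize | NRMSE_SE | SumZ | 2.8039 | 2.8039 | 2.8039 | 2.8039 |
| Japonica | COR | GBLUP | 0.5681 | 0.5914 | 0.5803 | 0.6366 |
| Japonica | COR | Invchi | 0.5266 | 0.5461 | 0.5412 | 0.5753 |
| Japonica | COR | Logit | 0.5372 | 0.5602 | 0.5602 | 0.5831 |
| Japonica | COR | Meanp | 0.5478 | 0.5662 | 0.5637 | 0.5896 |
| Japonica | COR | QM | 0.5246 | 0.5633 | 0.5691 | 0.5905 |
| Japonica | COR | QM_Invchi | 0.4955 | 0.5333 | 0.5320 | 0.5737 |
| Japonica | COR | QM_Logit | 0.5171 | 0.5430 | 0.5364 | 0.5821 |
| Japonica | COR | QM_Meanp | 0.5164 | 0.5473 | 0.5421 | 0.5884 |
| Japonica | COR | QM_Sumz | 0.5068 | 0.5430 | 0.5378 | 0.5895 |
| Japonica | COR | SumZ | 0.5376 | 0.5609 | 0.5580 | 0.5901 |
| Japonica | COR_SE | GBLUP | 0.0182 | 0.0233 | 0.0198 | 0.0352 |
| Japonica | COR_SE | Invchi | 0.0167 | 0.0279 | 0.0262 | 0.0426 |
| Japonica | COR_SE | Logit | 0.0201 | 0.0285 | 0.0272 | 0.0395 |
| Japonica | COR_SE | Meanp | 0.0218 | 0.0297 | 0.0270 | 0.0428 |
| Japonica | COR_SE | QM | 0.0187 | 0.0337 | 0.0328 | 0.0507 |
| Japonica | COR_SE | QM_Invchi | 0.0178 | 0.0278 | 0.0272 | 0.0390 |
| Japonica | COR_SE | QM_Logit | 0.0197 | 0.0370 | 0.0307 | 0.0668 |
| Japonica | COR_SE | QM_Meanp | 0.0217 | 0.0329 | 0.0319 | 0.0460 |
| Japonica | COR_SE | QM_Sumz | 0.0173 | 0.0301 | 0.0270 | 0.0490 |
| Japonica | COR_SE | SumZ | 0.0175 | 0.0287 | 0.0276 | 0.0422 |
| Japonica | NRMSE | GBLUP | 0.0297 | 0.1274 | 0.0524 | 0.3752 |
| Japonica | NRMSE | Invchi | 0.0305 | 0.1322 | 0.0546 | 0.3891 |
| Japonica | NRMSE | Logit | 0.0304 | 0.1313 | 0.0540 | 0.3869 |
| Japonica | NRMSE | Meanp | 0.0300 | 0.1303 | 0.0538 | 0.3837 |
| Japonica | NRMSE | QM | 0.0413 | 0.1427 | 0.0598 | 0.4100 |
| Japonica | NRMSE | QM_Invchi | 0.0312 | 0.1424 | 0.0606 | 0.4171 |
| Japonica | NRMSE | QM_Logit | 0.0409 | 0.1413 | 0.0558 | 0.4126 |
| Japonica | NRMSE | QM_Meanp | 0.0487 | 0.1430 | 0.0558 | 0.4117 |
| Japonica | NRMSE | QM_Sumz | 0.0392 | 0.1448 | 0.0622 | 0.4157 |
| Japonica | NRMSE | SumZ | 0.0300 | 0.1312 | 0.0541 | 0.3866 |
| Japonica | NRMSE_SE | GBLUP | 0.0010 | 0.0045 | 0.0030 | 0.0110 |
| Japonica | NRMSE_SE | Invchi | 0.0009 | 0.0049 | 0.0038 | 0.0110 |
| Japonica | NRMSE_SE | Logit | 0.0010 | 0.0047 | 0.0037 | 0.0106 |
| Japonica | NRMSE_SE | Meanp | 0.0009 | 0.0049 | 0.0038 | 0.0110 |
| Japonica | NRMSE_SE | QM | 0.0018 | 0.0078 | 0.0093 | 0.0107 |
| Japonica | NRMSE_SE | QM_Invchi | 0.0011 | 0.0054 | 0.0051 | 0.0102 |
| Japonica | NRMSE_SE | QM_Logit | 0.0016 | 0.0066 | 0.0072 | 0.0103 |
| Japonica | NRMSE_SE | QM_Meanp | 0.0017 | 0.0073 | 0.0078 | 0.0118 |
| Japonica | NRMSE_SE | QM_Sumz | 0.0018 | 0.0081 | 0.0086 | 0.0135 |
| Japonica | NRMSE_SE | SumZ | 0.0008 | 0.0047 | 0.0038 | 0.0106 |
| Indica | COR | GBLUP | 0.3510 | 0.5151 | 0.5283 | 0.6530 |
| Indica | COR | Invchi | 0.2863 | 0.4622 | 0.4592 | 0.6439 |
| Indica | COR | Logit | 0.2919 | 0.4682 | 0.4644 | 0.6523 |
| Indica | COR | Meanp | 0.3178 | 0.4874 | 0.4917 | 0.6484 |
| Indica | COR | QM | 0.3509 | 0.4982 | 0.5026 | 0.6367 |
| Indica | COR | QM_Invchi | 0.2783 | 0.4346 | 0.4326 | 0.5947 |
| Indica | COR | QM_Logit | 0.2903 | 0.4524 | 0.4549 | 0.6097 |
| Indica | COR | QM_Meanp | 0.3266 | 0.4544 | 0.4443 | 0.6025 |
| Indica | COR | QM_Sumz | 0.2847 | 0.4552 | 0.4584 | 0.6192 |
| Indica | COR | SumZ | 0.3072 | 0.4696 | 0.4650 | 0.6413 |
| Indica | COR_SE | GBLUP | 0.0268 | 0.0360 | 0.0334 | 0.0503 |
| Indica | COR_SE | Invchi | 0.0212 | 0.0423 | 0.0411 | 0.0659 |
| Indica | COR_SE | Logit | 0.0233 | 0.0392 | 0.0339 | 0.0657 |
| Indica | COR_SE | Meanp | 0.0284 | 0.0403 | 0.0361 | 0.0605 |
| Indica | COR_SE | QM | 0.0230 | 0.0359 | 0.0361 | 0.0486 |
| Indica | COR_SE | QM_Invchi | 0.0433 | 0.0493 | 0.0484 | 0.0570 |
| Indica | COR_SE | QM_Logit | 0.0197 | 0.0389 | 0.0386 | 0.0586 |
| Indica | COR_SE | QM_Meanp | 0.0352 | 0.0456 | 0.0462 | 0.0550 |
| Indica | COR_SE | QM_Sumz | 0.0225 | 0.0372 | 0.0381 | 0.0502 |
| Indica | COR_SE | SumZ | 0.0233 | 0.0376 | 0.0346 | 0.0577 |
| Indica | NRMSE | GBLUP | 0.0335 | 0.1393 | 0.0473 | 0.4293 |
| Indica | NRMSE | Invchi | 0.0366 | 0.1419 | 0.0468 | 0.4372 |
| Indica | NRMSE | Logit | 0.0365 | 0.1415 | 0.0465 | 0.4363 |
| Indica | NRMSE | Meanp | 0.0353 | 0.1405 | 0.0471 | 0.4324 |
| Indica | NRMSE | QM | 0.0361 | 0.1544 | 0.0561 | 0.4691 |
| Indica | NRMSE | QM_Invchi | 0.0392 | 0.1562 | 0.0573 | 0.4709 |
| Indica | NRMSE | QM_Logit | 0.0425 | 0.1571 | 0.0540 | 0.4780 |
| Indica | NRMSE | QM_Meanp | 0.0527 | 0.1601 | 0.0586 | 0.4704 |
| Indica | NRMSE | QM_Sumz | 0.0427 | 0.1538 | 0.0533 | 0.4661 |
| Indica | NRMSE | SumZ | 0.0369 | 0.1405 | 0.0468 | 0.4315 |
| Indica | NRMSE_SE | GBLUP | 0.0015 | 0.0070 | 0.0019 | 0.0228 |
| Indica | NRMSE_SE | Invchi | 0.0015 | 0.0074 | 0.0019 | 0.0241 |
| Indica | NRMSE_SE | Logit | 0.0017 | 0.0075 | 0.0018 | 0.0248 |
| Indica | NRMSE_SE | Meanp | 0.0017 | 0.0075 | 0.0019 | 0.0247 |
| Indica | NRMSE_SE | QM | 0.0014 | 0.0081 | 0.0050 | 0.0209 |
| Indica | NRMSE_SE | QM_Invchi | 0.0018 | 0.0096 | 0.0084 | 0.0198 |
| Indica | NRMSE_SE | QM_Logit | 0.0014 | 0.0098 | 0.0081 | 0.0217 |
| Indica | NRMSE_SE | QM_Meanp | 0.0081 | 0.0122 | 0.0099 | 0.0210 |
| Indica | NRMSE_SE | QM_Sumz | 0.0014 | 0.0087 | 0.0069 | 0.0196 |
| Indica | NRMSE_SE | SumZ | 0.0015 | 0.0073 | 0.0019 | 0.0237 |
| Groundnut | COR | GBLUP | 0.5928 | 0.6457 | 0.6480 | 0.6941 |
| Groundnut | COR | Invchi | 0.5673 | 0.6257 | 0.6345 | 0.6667 |
| Groundnut | COR | Logit | 0.5723 | 0.6257 | 0.6315 | 0.6676 |
| Groundnut | COR | Meanp | 0.5828 | 0.6311 | 0.6339 | 0.6739 |
| Groundnut | COR | QM | 0.5910 | 0.6445 | 0.6446 | 0.6978 |
| Groundnut | COR | QM_Invchi | 0.5588 | 0.6201 | 0.6282 | 0.6653 |
| Groundnut | COR | QM_Logit | 0.5653 | 0.6207 | 0.6253 | 0.6668 |
| Groundnut | COR | QM_Meanp | 0.5764 | 0.6261 | 0.6265 | 0.6751 |
| Groundnut | COR | QM_Sumz | 0.5595 | 0.6178 | 0.6242 | 0.6633 |
| Groundnut | COR | SumZ | 0.5664 | 0.6230 | 0.6298 | 0.6662 |
| Groundnut | COR_SE | GBLUP | 0.0117 | 0.0162 | 0.0168 | 0.0193 |
| Groundnut | COR_SE | Invchi | 0.0144 | 0.0182 | 0.0184 | 0.0217 |
| Groundnut | COR_SE | Logit | 0.0162 | 0.0187 | 0.0189 | 0.0206 |
| Groundnut | COR_SE | Meanp | 0.0174 | 0.0186 | 0.0178 | 0.0214 |
| Groundnut | COR_SE | QM | 0.0117 | 0.0161 | 0.0169 | 0.0187 |
| Groundnut | COR_SE | QM_Invchi | 0.0143 | 0.0190 | 0.0194 | 0.0232 |
| Groundnut | COR_SE | QM_Logit | 0.0157 | 0.0187 | 0.0193 | 0.0207 |
| Groundnut | COR_SE | QM_Meanp | 0.0177 | 0.0193 | 0.0191 | 0.0212 |
| Groundnut | COR_SE | QM_Sumz | 0.0172 | 0.0189 | 0.0187 | 0.0210 |
| Groundnut | COR_SE | SumZ | 0.0164 | 0.0188 | 0.0186 | 0.0215 |
| Groundnut | NRMSE | GBLUP | 0.1857 | 0.2185 | 0.2098 | 0.2688 |
| Groundnut | NRMSE | Invchi | 0.1932 | 0.2243 | 0.2140 | 0.2760 |
| Groundnut | NRMSE | Logit | 0.1927 | 0.2239 | 0.2144 | 0.2741 |
| Groundnut | NRMSE | Meanp | 0.1913 | 0.2225 | 0.2118 | 0.2749 |
| Groundnut | NRMSE | QM | 0.1852 | 0.2219 | 0.2151 | 0.2720 |
| Groundnut | NRMSE | QM_Invchi | 0.1930 | 0.2264 | 0.2170 | 0.2787 |
| Groundnut | NRMSE | QM_Logit | 0.1924 | 0.2264 | 0.2181 | 0.2772 |
| Groundnut | NRMSE | QM_Meanp | 0.1901 | 0.2249 | 0.2153 | 0.2790 |
| Groundnut | NRMSE | QM_Sumz | 0.1924 | 0.2271 | 0.2178 | 0.2803 |
| Groundnut | NRMSE | SumZ | 0.1927 | 0.2246 | 0.2148 | 0.2764 |
| Groundnut | NRMSE_SE | GBLUP | 0.0042 | 0.0064 | 0.0066 | 0.0083 |
| Groundnut | NRMSE_SE | Invchi | 0.0041 | 0.0070 | 0.0072 | 0.0095 |
| Groundnut | NRMSE_SE | Logit | 0.0043 | 0.0068 | 0.0073 | 0.0083 |
| Groundnut | NRMSE_SE | Meanp | 0.0043 | 0.0069 | 0.0072 | 0.0089 |
| Groundnut | NRMSE_SE | QM | 0.0045 | 0.0065 | 0.0063 | 0.0090 |
| Groundnut | NRMSE_SE | QM_Invchi | 0.0043 | 0.0067 | 0.0064 | 0.0097 |
| Groundnut | NRMSE_SE | QM_Logit | 0.0049 | 0.0065 | 0.0063 | 0.0083 |
| Groundnut | NRMSE_SE | QM_Meanp | 0.0047 | 0.0068 | 0.0067 | 0.0091 |
| Groundnut | NRMSE_SE | QM_Sumz | 0.0049 | 0.0066 | 0.0062 | 0.0090 |
| Groundnut | NRMSE_SE | SumZ | 0.0043 | 0.0068 | 0.0071 | 0.0088 |
| EYT_2 | COR | GBLUP | 0.4493 | 0.5320 | 0.5296 | 0.6196 |
| EYT_2 | COR | Invchi | 0.4341 | 0.5066 | 0.5039 | 0.5846 |
| EYT_2 | COR | Logit | 0.4358 | 0.5092 | 0.5082 | 0.5845 |
| EYT_2 | COR | Meanp | 0.4385 | 0.5133 | 0.5092 | 0.5961 |
| EYT_2 | COR | QM | 0.4135 | 0.5154 | 0.5163 | 0.6157 |
| EYT_2 | COR | QM_Invchi | 0.3956 | 0.4867 | 0.4893 | 0.5726 |
| EYT_2 | COR | QM_Logit | 0.3959 | 0.4946 | 0.5063 | 0.5701 |
| EYT_2 | COR | QM_Meanp | 0.4177 | 0.4881 | 0.4915 | 0.5516 |
| EYT_2 | COR | QM_Sumz | 0.3730 | 0.4807 | 0.4848 | 0.5805 |
| EYT_2 | COR | SumZ | 0.4373 | 0.5113 | 0.5086 | 0.5909 |
| EYT_2 | COR_SE | GBLUP | 0.0127 | 0.0154 | 0.0147 | 0.0193 |
| EYT_2 | COR_SE | Invchi | 0.0146 | 0.0194 | 0.0202 | 0.0228 |
| EYT_2 | COR_SE | Logit | 0.0160 | 0.0185 | 0.0175 | 0.0230 |
| EYT_2 | COR_SE | Meanp | 0.0153 | 0.0183 | 0.0180 | 0.0219 |
| EYT_2 | COR_SE | QM | 0.0134 | 0.0219 | 0.0223 | 0.0297 |
| EYT_2 | COR_SE | QM_Invchi | 0.0203 | 0.0255 | 0.0249 | 0.0319 |
| EYT_2 | COR_SE | QM_Logit | 0.0162 | 0.0209 | 0.0180 | 0.0313 |
| EYT_2 | COR_SE | QM_Meanp | 0.0159 | 0.0259 | 0.0261 | 0.0352 |
| EYT_2 | COR_SE | QM_Sumz | 0.0202 | 0.0247 | 0.0249 | 0.0287 |
| EYT_2 | COR_SE | SumZ | 0.0153 | 0.0188 | 0.0185 | 0.0231 |
| EYT_2 | NRMSE | GBLUP | 0.7866 | 0.8463 | 0.8508 | 0.8970 |
| EYT_2 | NRMSE | Invchi | 0.8159 | 0.8640 | 0.8674 | 0.9054 |
| EYT_2 | NRMSE | Logit | 0.8164 | 0.8628 | 0.8653 | 0.9043 |
| EYT_2 | NRMSE | Meanp | 0.8084 | 0.8602 | 0.8649 | 0.9027 |
| EYT_2 | NRMSE | QM | 0.8378 | 0.9930 | 0.9979 | 1.1382 |
| EYT_2 | NRMSE | QM_Invchi | 0.8768 | 0.9898 | 0.9869 | 1.1086 |
| EYT_2 | NRMSE | QM_Logit | 0.8777 | 0.9479 | 0.9020 | 1.1098 |
| EYT_2 | NRMSE | QM_Meanp | 0.9213 | 0.9863 | 0.9926 | 1.0385 |
| EYT_2 | NRMSE | QM_Sumz | 0.8717 | 1.0384 | 1.0440 | 1.1939 |
| EYT_2 | NRMSE | SumZ | 0.8117 | 0.8619 | 0.8664 | 0.9032 |
| EYT_2 | NRMSE_SE | GBLUP | 0.0077 | 0.0100 | 0.0105 | 0.0114 |
| EYT_2 | NRMSE_SE | Invchi | 0.0090 | 0.0109 | 0.0111 | 0.0124 |
| EYT_2 | NRMSE_SE | Logit | 0.0092 | 0.0106 | 0.0104 | 0.0125 |
| EYT_2 | NRMSE_SE | Meanp | 0.0097 | 0.0104 | 0.0101 | 0.0118 |
| EYT_2 | NRMSE_SE | QM | 0.0115 | 0.0746 | 0.0611 | 0.1648 |
| EYT_2 | NRMSE_SE | QM_Invchi | 0.0170 | 0.0824 | 0.0737 | 0.1652 |
| EYT_2 | NRMSE_SE | QM_Logit | 0.0120 | 0.0429 | 0.0281 | 0.1036 |
| EYT_2 | NRMSE_SE | QM_Meanp | 0.0149 | 0.0783 | 0.0894 | 0.1195 |
| EYT_2 | NRMSE_SE | QM_Sumz | 0.0440 | 0.1105 | 0.1163 | 0.1654 |
| EYT_2 | NRMSE_SE | SumZ | 0.0092 | 0.0108 | 0.0108 | 0.0124 |
| EYT_3 | COR | GBLUP | 0.4760 | 0.4884 | 0.4881 | 0.5012 |
| EYT_3 | COR | Invchi | 0.4604 | 0.4689 | 0.4661 | 0.4830 |
| EYT_3 | COR | Logit | 0.4598 | 0.4695 | 0.4705 | 0.4771 |
| EYT_3 | COR | Meanp | 0.4648 | 0.4755 | 0.4715 | 0.4944 |
| EYT_3 | COR | QM | 0.4340 | 0.4618 | 0.4566 | 0.5002 |
| EYT_3 | COR | QM_Invchi | 0.4104 | 0.4383 | 0.4307 | 0.4814 |
| EYT_3 | COR | QM_Logit | 0.3839 | 0.4381 | 0.4474 | 0.4736 |
| EYT_3 | COR | QM_Meanp | 0.4372 | 0.4535 | 0.4420 | 0.4930 |
| EYT_3 | COR | QM_Sumz | 0.3968 | 0.4405 | 0.4486 | 0.4681 |
| EYT_3 | COR | SumZ | 0.4667 | 0.4730 | 0.4710 | 0.4830 |
| EYT_3 | COR_SE | GBLUP | 0.0137 | 0.0196 | 0.0203 | 0.0243 |
| EYT_3 | COR_SE | Invchi | 0.0138 | 0.0185 | 0.0189 | 0.0224 |
| EYT_3 | COR_SE | Logit | 0.0145 | 0.0184 | 0.0178 | 0.0234 |
| EYT_3 | COR_SE | Meanp | 0.0134 | 0.0181 | 0.0179 | 0.0231 |
| EYT_3 | COR_SE | QM | 0.0139 | 0.0244 | 0.0254 | 0.0329 |
| EYT_3 | COR_SE | QM_Invchi | 0.0140 | 0.0274 | 0.0274 | 0.0406 |
| EYT_3 | COR_SE | QM_Logit | 0.0147 | 0.0225 | 0.0200 | 0.0352 |
| EYT_3 | COR_SE | QM_Meanp | 0.0133 | 0.0254 | 0.0279 | 0.0325 |
| EYT_3 | COR_SE | QM_Sumz | 0.0158 | 0.0286 | 0.0291 | 0.0402 |
| EYT_3 | COR_SE | SumZ | 0.0151 | 0.0179 | 0.0173 | 0.0218 |
| EYT_3 | NRMSE | GBLUP | 0.8663 | 0.8758 | 0.8757 | 0.8854 |
| EYT_3 | NRMSE | Invchi | 0.8750 | 0.8850 | 0.8866 | 0.8919 |
| EYT_3 | NRMSE | Logit | 0.8787 | 0.8843 | 0.8842 | 0.8899 |
| EYT_3 | NRMSE | Meanp | 0.8694 | 0.8809 | 0.8826 | 0.8892 |
| EYT_3 | NRMSE | QM | 0.9549 | 1.1644 | 1.1869 | 1.3290 |
| EYT_3 | NRMSE | QM_Invchi | 0.9226 | 1.1572 | 1.1758 | 1.3544 |
| EYT_3 | NRMSE | QM_Logit | 0.9324 | 1.1272 | 1.0212 | 1.5341 |
| EYT_3 | NRMSE | QM_Meanp | 0.9229 | 1.1000 | 1.1039 | 1.2693 |
| EYT_3 | NRMSE | QM_Sumz | 0.9358 | 1.1875 | 1.1450 | 1.5243 |
| EYT_3 | NRMSE | SumZ | 0.8755 | 0.8824 | 0.8835 | 0.8871 |
| EYT_3 | NRMSE_SE | GBLUP | 0.0080 | 0.0116 | 0.0116 | 0.0154 |
| EYT_3 | NRMSE_SE | Invchi | 0.0073 | 0.0100 | 0.0104 | 0.0117 |
| EYT_3 | NRMSE_SE | Logit | 0.0076 | 0.0098 | 0.0094 | 0.0129 |
| EYT_3 | NRMSE_SE | Meanp | 0.0070 | 0.0100 | 0.0102 | 0.0124 |
| EYT_3 | NRMSE_SE | QM | 0.0128 | 0.1321 | 0.1500 | 0.2155 |
| EYT_3 | NRMSE_SE | QM_Invchi | 0.0105 | 0.1374 | 0.1307 | 0.2777 |
| EYT_3 | NRMSE_SE | QM_Logit | 0.0148 | 0.1040 | 0.0775 | 0.2461 |
| EYT_3 | NRMSE_SE | QM_Meanp | 0.0113 | 0.1521 | 0.1362 | 0.3246 |
| EYT_3 | NRMSE_SE | QM_Sumz | 0.0128 | 0.1383 | 0.1316 | 0.2773 |
| EYT_3 | NRMSE_SE | SumZ | 0.0078 | 0.0097 | 0.0096 | 0.0117 |
| Wheat_2 | COR | GBLUP | 0.3605 | 0.3605 | 0.3605 | 0.3605 |
| Wheat_2 | COR | Invchi | 0.3258 | 0.3258 | 0.3258 | 0.3258 |
| Wheat_2 | COR | Logit | 0.3236 | 0.3236 | 0.3236 | 0.3236 |
| Wheat_2 | COR | Meanp | 0.3257 | 0.3257 | 0.3257 | 0.3257 |
| Wheat_2 | COR | QM | 0.3476 | 0.3476 | 0.3476 | 0.3476 |
| Wheat_2 | COR | QM_Invchi | 0.3241 | 0.3241 | 0.3241 | 0.3241 |
| Wheat_2 | COR | QM_Logit | 0.3064 | 0.3064 | 0.3064 | 0.3064 |
| Wheat_2 | COR | QM_Meanp | 0.2999 | 0.2999 | 0.2999 | 0.2999 |
| Wheat_2 | COR | QM_Sumz | 0.3115 | 0.3115 | 0.3115 | 0.3115 |
| Wheat_2 | COR | SumZ | 0.3303 | 0.3303 | 0.3303 | 0.3303 |
| Wheat_2 | COR_SE | GBLUP | 0.0144 | 0.0144 | 0.0144 | 0.0144 |
| Wheat_2 | COR_SE | Invchi | 0.0144 | 0.0144 | 0.0144 | 0.0144 |
| Wheat_2 | COR_SE | Logit | 0.0153 | 0.0153 | 0.0153 | 0.0153 |
| Wheat_2 | COR_SE | Meanp | 0.0162 | 0.0162 | 0.0162 | 0.0162 |
| Wheat_2 | COR_SE | QM | 0.0216 | 0.0216 | 0.0216 | 0.0216 |
| Wheat_2 | COR_SE | QM_Invchi | 0.0140 | 0.0140 | 0.0140 | 0.0140 |
| Wheat_2 | COR_SE | QM_Logit | 0.0281 | 0.0281 | 0.0281 | 0.0281 |
| Wheat_2 | COR_SE | QM_Meanp | 0.0291 | 0.0291 | 0.0291 | 0.0291 |
| Wheat_2 | COR_SE | QM_Sumz | 0.0257 | 0.0257 | 0.0257 | 0.0257 |
| Wheat_2 | COR_SE | SumZ | 0.0144 | 0.0144 | 0.0144 | 0.0144 |
| Wheat_2 | NRMSE | GBLUP | 0.9340 | 0.9340 | 0.9340 | 0.9340 |
| Wheat_2 | NRMSE | Invchi | 0.9477 | 0.9477 | 0.9477 | 0.9477 |
| Wheat_2 | NRMSE | Logit | 0.9495 | 0.9495 | 0.9495 | 0.9495 |
| Wheat_2 | NRMSE | Meanp | 0.9484 | 0.9484 | 0.9484 | 0.9484 |
| Wheat_2 | NRMSE | QM | 1.1166 | 1.1166 | 1.1166 | 1.1166 |
| Wheat_2 | NRMSE | QM_Invchi | 1.0345 | 1.0345 | 1.0345 | 1.0345 |
| Wheat_2 | NRMSE | QM_Logit | 1.1027 | 1.1027 | 1.1027 | 1.1027 |
| Wheat_2 | NRMSE | QM_Meanp | 1.1647 | 1.1647 | 1.1647 | 1.1647 |
| Wheat_2 | NRMSE | QM_Sumz | 1.0955 | 1.0955 | 1.0955 | 1.0955 |
| Wheat_2 | NRMSE | SumZ | 0.9470 | 0.9470 | 0.9470 | 0.9470 |
| Wheat_2 | NRMSE_SE | GBLUP | 0.0053 | 0.0053 | 0.0053 | 0.0053 |
| Wheat_2 | NRMSE_SE | Invchi | 0.0039 | 0.0039 | 0.0039 | 0.0039 |
| Wheat_2 | NRMSE_SE | Logit | 0.0043 | 0.0043 | 0.0043 | 0.0043 |
| Wheat_2 | NRMSE_SE | Meanp | 0.0043 | 0.0043 | 0.0043 | 0.0043 |
| Wheat_2 | NRMSE_SE | QM | 0.0732 | 0.0732 | 0.0732 | 0.0732 |
| Wheat_2 | NRMSE_SE | QM_Invchi | 0.0090 | 0.0090 | 0.0090 | 0.0090 |
| Wheat_2 | NRMSE_SE | QM_Logit | 0.0699 | 0.0699 | 0.0699 | 0.0699 |
| Wheat_2 | NRMSE_SE | QM_Meanp | 0.0851 | 0.0851 | 0.0851 | 0.0851 |
| Wheat_2 | NRMSE_SE | QM_Sumz | 0.0659 | 0.0659 | 0.0659 | 0.0659 |
| Wheat_2 | NRMSE_SE | SumZ | 0.0039 | 0.0039 | 0.0039 | 0.0039 |
| Wheat_3 | COR | GBLUP | 0.3719 | 0.3719 | 0.3719 | 0.3719 |
| Wheat_3 | COR | Invchi | 0.3117 | 0.3117 | 0.3117 | 0.3117 |
| Wheat_3 | COR | Logit | 0.3085 | 0.3085 | 0.3085 | 0.3085 |
| Wheat_3 | COR | Meanp | 0.3224 | 0.3224 | 0.3224 | 0.3224 |
| Wheat_3 | COR | QM | 0.3486 | 0.3486 | 0.3486 | 0.3486 |
| Wheat_3 | COR | QM_Invchi | 0.2866 | 0.2866 | 0.2866 | 0.2866 |
| Wheat_3 | COR | QM_Logit | 0.2862 | 0.2862 | 0.2862 | 0.2862 |
| Wheat_3 | COR | QM_Meanp | 0.2808 | 0.2808 | 0.2808 | 0.2808 |
| Wheat_3 | COR | QM_Sumz | 0.2888 | 0.2888 | 0.2888 | 0.2888 |
| Wheat_3 | COR | SumZ | 0.3136 | 0.3136 | 0.3136 | 0.3136 |
| Wheat_3 | COR_SE | GBLUP | 0.0132 | 0.0132 | 0.0132 | 0.0132 |
| Wheat_3 | COR_SE | Invchi | 0.0107 | 0.0107 | 0.0107 | 0.0107 |
| Wheat_3 | COR_SE | Logit | 0.0073 | 0.0073 | 0.0073 | 0.0073 |
| Wheat_3 | COR_SE | Meanp | 0.0106 | 0.0106 | 0.0106 | 0.0106 |
| Wheat_3 | COR_SE | QM | 0.0194 | 0.0194 | 0.0194 | 0.0194 |
| Wheat_3 | COR_SE | QM_Invchi | 0.0200 | 0.0200 | 0.0200 | 0.0200 |
| Wheat_3 | COR_SE | QM_Logit | 0.0204 | 0.0204 | 0.0204 | 0.0204 |
| Wheat_3 | COR_SE | QM_Meanp | 0.0253 | 0.0253 | 0.0253 | 0.0253 |
| Wheat_3 | COR_SE | QM_Sumz | 0.0180 | 0.0180 | 0.0180 | 0.0180 |
| Wheat_3 | COR_SE | SumZ | 0.0097 | 0.0097 | 0.0097 | 0.0097 |
| Wheat_3 | NRMSE | GBLUP | 0.9299 | 0.9299 | 0.9299 | 0.9299 |
| Wheat_3 | NRMSE | Invchi | 0.9514 | 0.9514 | 0.9514 | 0.9514 |
| Wheat_3 | NRMSE | Logit | 0.9533 | 0.9533 | 0.9533 | 0.9533 |
| Wheat_3 | NRMSE | Meanp | 0.9483 | 0.9483 | 0.9483 | 0.9483 |
| Wheat_3 | NRMSE | QM | 1.1177 | 1.1177 | 1.1177 | 1.1177 |
| Wheat_3 | NRMSE | QM_Invchi | 1.1223 | 1.1223 | 1.1223 | 1.1223 |
| Wheat_3 | NRMSE | QM_Logit | 1.1258 | 1.1258 | 1.1258 | 1.1258 |
| Wheat_3 | NRMSE | QM_Meanp | 1.1778 | 1.1778 | 1.1778 | 1.1778 |
| Wheat_3 | NRMSE | QM_Sumz | 1.1217 | 1.1217 | 1.1217 | 1.1217 |
| Wheat_3 | NRMSE | SumZ | 0.9517 | 0.9517 | 0.9517 | 0.9517 |
| Wheat_3 | NRMSE_SE | GBLUP | 0.0055 | 0.0055 | 0.0055 | 0.0055 |
| Wheat_3 | NRMSE_SE | Invchi | 0.0031 | 0.0031 | 0.0031 | 0.0031 |
| Wheat_3 | NRMSE_SE | Logit | 0.0022 | 0.0022 | 0.0022 | 0.0022 |
| Wheat_3 | NRMSE_SE | Meanp | 0.0033 | 0.0033 | 0.0033 | 0.0033 |
| Wheat_3 | NRMSE_SE | QM | 0.0625 | 0.0625 | 0.0625 | 0.0625 |
| Wheat_3 | NRMSE_SE | QM_Invchi | 0.0635 | 0.0635 | 0.0635 | 0.0635 |
| Wheat_3 | NRMSE_SE | QM_Logit | 0.0647 | 0.0647 | 0.0647 | 0.0647 |
| Wheat_3 | NRMSE_SE | QM_Meanp | 0.0823 | 0.0823 | 0.0823 | 0.0823 |
| Wheat_3 | NRMSE_SE | QM_Sumz | 0.0632 | 0.0632 | 0.0632 | 0.0632 |
| Wheat_3 | NRMSE_SE | SumZ | 0.0034 | 0.0034 | 0.0034 | 0.0034 |
| Wheat_4 | COR | GBLUP | 0.3629 | 0.3629 | 0.3629 | 0.3629 |
| Wheat_4 | COR | Invchi | 0.3311 | 0.3311 | 0.3311 | 0.3311 |
| Wheat_4 | COR | Logit | 0.3329 | 0.3329 | 0.3329 | 0.3329 |
| Wheat_4 | COR | Meanp | 0.3409 | 0.3409 | 0.3409 | 0.3409 |
| Wheat_4 | COR | QM | 0.3505 | 0.3505 | 0.3505 | 0.3505 |
| Wheat_4 | COR | QM_Invchi | 0.3159 | 0.3159 | 0.3159 | 0.3159 |
| Wheat_4 | COR | QM_Logit | 0.3326 | 0.3326 | 0.3326 | 0.3326 |
| Wheat_4 | COR | QM_Meanp | 0.3408 | 0.3408 | 0.3408 | 0.3408 |
| Wheat_4 | COR | QM_Sumz | 0.3422 | 0.3422 | 0.3422 | 0.3422 |
| Wheat_4 | COR | SumZ | 0.3414 | 0.3414 | 0.3414 | 0.3414 |
| Wheat_4 | COR_SE | GBLUP | 0.0149 | 0.0149 | 0.0149 | 0.0149 |
| Wheat_4 | COR_SE | Invchi | 0.0150 | 0.0150 | 0.0150 | 0.0150 |
| Wheat_4 | COR_SE | Logit | 0.0165 | 0.0165 | 0.0165 | 0.0165 |
| Wheat_4 | COR_SE | Meanp | 0.0160 | 0.0160 | 0.0160 | 0.0160 |
| Wheat_4 | COR_SE | QM | 0.0238 | 0.0238 | 0.0238 | 0.0238 |
| Wheat_4 | COR_SE | QM_Invchi | 0.0220 | 0.0220 | 0.0220 | 0.0220 |
| Wheat_4 | COR_SE | QM_Logit | 0.0167 | 0.0167 | 0.0167 | 0.0167 |
| Wheat_4 | COR_SE | QM_Meanp | 0.0159 | 0.0159 | 0.0159 | 0.0159 |
| Wheat_4 | COR_SE | QM_Sumz | 0.0157 | 0.0157 | 0.0157 | 0.0157 |
| Wheat_4 | COR_SE | SumZ | 0.0159 | 0.0159 | 0.0159 | 0.0159 |
| Wheat_4 | NRMSE | GBLUP | 0.9334 | 0.9334 | 0.9334 | 0.9334 |
| Wheat_4 | NRMSE | Invchi | 0.9448 | 0.9448 | 0.9448 | 0.9448 |
| Wheat_4 | NRMSE | Logit | 0.9444 | 0.9444 | 0.9444 | 0.9444 |
| Wheat_4 | NRMSE | Meanp | 0.9421 | 0.9421 | 0.9421 | 0.9421 |
| Wheat_4 | NRMSE | QM | 1.1115 | 1.1115 | 1.1115 | 1.1115 |
| Wheat_4 | NRMSE | QM_Invchi | 1.0890 | 1.0890 | 1.0890 | 1.0890 |
| Wheat_4 | NRMSE | QM_Logit | 1.0185 | 1.0185 | 1.0185 | 1.0185 |
| Wheat_4 | NRMSE | QM_Meanp | 1.0222 | 1.0222 | 1.0222 | 1.0222 |
| Wheat_4 | NRMSE | QM_Sumz | 1.0158 | 1.0158 | 1.0158 | 1.0158 |
| Wheat_4 | NRMSE | SumZ | 0.9416 | 0.9416 | 0.9416 | 0.9416 |
| Wheat_4 | NRMSE_SE | GBLUP | 0.0059 | 0.0059 | 0.0059 | 0.0059 |
| Wheat_4 | NRMSE_SE | Invchi | 0.0047 | 0.0047 | 0.0047 | 0.0047 |
| Wheat_4 | NRMSE_SE | Logit | 0.0051 | 0.0051 | 0.0051 | 0.0051 |
| Wheat_4 | NRMSE_SE | Meanp | 0.0050 | 0.0050 | 0.0050 | 0.0050 |
| Wheat_4 | NRMSE_SE | QM | 0.0758 | 0.0758 | 0.0758 | 0.0758 |
| Wheat_4 | NRMSE_SE | QM_Invchi | 0.0741 | 0.0741 | 0.0741 | 0.0741 |
| Wheat_4 | NRMSE_SE | QM_Logit | 0.0138 | 0.0138 | 0.0138 | 0.0138 |
| Wheat_4 | NRMSE_SE | QM_Meanp | 0.0148 | 0.0148 | 0.0148 | 0.0148 |
| Wheat_4 | NRMSE_SE | QM_Sumz | 0.0137 | 0.0137 | 0.0137 | 0.0137 |
| Wheat_4 | NRMSE_SE | SumZ | 0.0050 | 0.0050 | 0.0050 | 0.0050 |
| Wheat_5 | COR | GBLUP | 0.4367 | 0.4367 | 0.4367 | 0.4367 |
| Wheat_5 | COR | Invchi | 0.4140 | 0.4140 | 0.4140 | 0.4140 |
| Wheat_5 | COR | Logit | 0.4157 | 0.4157 | 0.4157 | 0.4157 |
| Wheat_5 | COR | Meanp | 0.4267 | 0.4267 | 0.4267 | 0.4267 |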
Wheat_5CORQM0.42770.42770.42770.4277
Wheat_5CORQM_Invchi0.39760.39760.39760.3976
Wheat_5CORQM_Logit0.39980.39980.39980.3998
Wheat_5CORQM_Meanp0.42560.42560.42560.4256
Wheat_5CORQM_Sumz0.40460.40460.40460.4046
Wheat_5CORSumZ0.41890.41890.41890.4189
Wheat_5COR_SEGBLUP0.01790.01790.01790.0179
Wheat_5COR_SEInvchi0.01980.01980.01980.0198
Wheat_5COR_SELogit0.01980.01980.01980.0198
Wheat_5COR_SEMeanp0.01950.01950.01950.0195
Wheat_5COR_SEQM0.01600.01600.01600.0160
Wheat_5COR_SEQM_Invchi0.02090.02090.02090.0209
Wheat_5COR_SEQM_Logit0.02140.02140.02140.0214
Wheat_5COR_SEQM_Meanp0.01880.01880.01880.0188
Wheat_5COR_SEQM_Sumz0.02400.02400.02400.0240
Wheat_5COR_SESumZ0.01820.01820.01820.0182
Wheat_5NRMSEGBLUP0.90110.90110.90110.9011
Wheat_5NRMSEInvchi0.91280.91280.91280.9128
Wheat_5NRMSELogit0.91270.91270.91270.9127
Wheat_5NRMSEMeanp0.90670.90670.90670.9067
Wheat_5NRMSEQM1.07871.07871.07871.0787
Wheat_5NRMSEQM_Invchi1.05221.05221.05221.0522
Wheat_5NRMSEQM_Logit1.05171.05171.05171.0517
Wheat_5NRMSEQM_Meanp0.99090.99090.99090.9909
Wheat_5NRMSEQM_Sumz1.04761.04761.04761.0476
Wheat_5NRMSESumZ0.90980.90980.90980.9098
Wheat_5NRMSE_SEGBLUP0.00970.00970.00970.0097
Wheat_5NRMSE_SEInvchi0.00880.00880.00880.0088
Wheat_5NRMSE_SELogit0.00930.00930.00930.0093
Wheat_5NRMSE_SEMeanp0.00900.00900.00900.0090
Wheat_5NRMSE_SEQM0.05270.05270.05270.0527
Wheat_5NRMSE_SEQM_Invchi0.06050.06050.06050.0605
Wheat_5NRMSE_SEQM_Logit0.06120.06120.06120.0612
Wheat_5NRMSE_SEQM_Meanp0.02110.02110.02110.0211
Wheat_5NRMSE_SEQM_Sumz0.07150.07150.07150.0715
Wheat_5NRMSE_SESumZ0.00800.00800.00800.0080
Wheat_6CORGBLUP0.53070.53070.53070.5307
Wheat_6CORInvchi0.51670.51670.51670.5167
Wheat_6CORLogit0.52180.52180.52180.5218
Wheat_6CORMeanp0.52060.52060.52060.5206
Wheat_6CORQM0.51090.51090.51090.5109
Wheat_6CORQM_Invchi0.50010.50010.50010.5001
Wheat_6CORQM_Logit0.51920.51920.51920.5192
Wheat_6CORQM_Meanp0.52000.52000.52000.5200
Wheat_6CORQM_Sumz0.49700.49700.49700.4970
Wheat_6CORSumZ0.51830.51830.51830.5183
Wheat_6COR_SEGBLUP0.00920.00920.00920.0092
Wheat_6COR_SEInvchi0.01070.01070.01070.0107
Wheat_6COR_SELogit0.00950.00950.00950.0095
Wheat_6COR_SEMeanp0.00930.00930.00930.0093
Wheat_6COR_SEQM0.01640.01640.01640.0164
Wheat_6COR_SEQM_Invchi0.02310.02310.02310.0231
Wheat_6COR_SEQM_Logit0.00960.00960.00960.0096
Wheat_6COR_SEQM_Meanp0.00880.00880.00880.0088
Wheat_6COR_SEQM_Sumz0.01910.01910.01910.0191
Wheat_6COR_SESumZ0.01120.01120.01120.0112
Wheat_6NRMSEGBLUP0.84980.84980.84980.8498
Wheat_6NRMSEInvchi0.86310.86310.86310.8631
Wheat_6NRMSELogit0.85980.85980.85980.8598
Wheat_6NRMSEMeanp0.85820.85820.85820.8582
Wheat_6NRMSEQM0.98630.98630.98630.9863
Wheat_6NRMSEQM_Invchi0.95900.95900.95900.9590
Wheat_6NRMSEQM_Logit0.89940.89940.89940.8994
Wheat_6NRMSEQM_Meanp0.90170.90170.90170.9017
Wheat_6NRMSEQM_Sumz0.95360.95360.95360.9536
Wheat_6NRMSESumZ0.86060.86060.86060.8606
Wheat_6NRMSE_SEGBLUP0.00590.00590.00590.0059
Wheat_6NRMSE_SEInvchi0.00620.00620.00620.0062
Wheat_6NRMSE_SELogit0.00580.00580.00580.0058
Wheat_6NRMSE_SEMeanp0.00520.00520.00520.0052
Wheat_6NRMSE_SEQM0.06760.06760.06760.0676
Wheat_6NRMSE_SEQM_Invchi0.06480.06480.06480.0648
Wheat_6NRMSE_SEQM_Logit0.01070.01070.01070.0107
Wheat_6NRMSE_SEQM_Meanp0.00890.00890.00890.0089
Wheat_6NRMSE_SEQM_Sumz0.05090.05090.05090.0509
Wheat_6NRMSE_SESumZ0.00640.00640.00640.0064

References

  1. Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829.
  2. Crossa, J.; Pérez-Rodríguez, P.; Cuevas, J.; Montesinos-López, O.; Jarquín, D.; de Los Campos, G.; Burgueño, J.; González-Camacho, J.M.; Pérez-Elizalde, S.; Beyene, Y.; et al. Genomic selection in plant breeding: Methods, models, and perspectives. Trends Plant Sci. 2017, 22, 961–975.
  3. Heffner, E.L.; Sorrells, M.E.; Jannink, J.L. Genomic selection for crop improvement. Crop Sci. 2009, 49, 1–12.
  4. Varshney, R.K.; Roorkiwal, M.; Sorrells, M.E. Genomic selection for crop improvement: An introduction. In Genomic Selection for Crop Improvement: New Molecular Breeding Strategies for Crop Improvement; Springer: Cham, Switzerland, 2017; pp. 1–6.
  5. Xu, Y.; Liu, X.; Fu, J.; Wang, H.; Wang, J.; Huang, C.; Prasanna, B.M.; Olsen, M.S.; Wang, G.; Zhang, A. Enhancing genetic gain through genomic selection: From livestock to plants. Plant Commun. 2020, 1, 100005.
  6. Voss-Fels, K.P.; Cooper, M.; Hayes, B.J. Accelerating crop genetic gains with genomic selection. Theor. Appl. Genet. 2019, 132, 669–686.
  7. Rutkoski, J.; Poland, J.; Mondal, S.; Autrique, E.; Pérez, L.G.; Crossa, J.; Reynolds, M.; Singh, R. Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3 Genes Genomes Genet. 2016, 6, 2799–2808.
  8. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423.
  9. Fernando, R.L.; Gianola, D. Optimal properties of the conditional mean as a selection criterion. TAG Theor. Appl. Genet. Theor. Angew. Genet. 1986, 72, 822–825.
  10. Robinson, G.K. That BLUP is a Good Thing: The Estimation of Random Effects. Stat. Sci. 1991, 6, 15–32.
  11. Cannon, A.J.; Sobie, S.R.; Murdock, T.Q. Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes? J. Clim. 2015, 28, 6938–6959.
  12. Tarr, G.; Müller, S.; Weber, N.C. Robustness and outlier detection in genomic prediction. BMC Bioinform. 2016, 17, 1–13.
  13. Gudmundsson, L.; Bremnes, J.B.; Haugen, J.E.; Engen-Skaugen, T. Technical Note: Downscaling RCM precipitation to the station scale using statistical transformations—A comparison of methods. Hydrol. Earth Syst. Sci. 2012, 16, 3383–3390.
  14. Li, H.; Sheffield, J.; Wood, E.F. Bias correction of monthly precipitation and temperature fields from Intergovernmental Panel on Climate Change AR4 models using equidistant quantile matching. J. Geophys. Res. Atmos. 2010, 115, D10101.
  15. Feng, S.; Zhang, H.; Tong, X.; Wang, Y.; Liu, B. Application of quantile mapping bias correction in remote sensing hydrology. Remote Sens. 2021, 13, 2118.
  16. Hodge, V.J.; Austin, J. A survey of outlier detection methodologies. Artif. Intell. Rev. 2004, 22, 85–126.
  17. Zhou, L.; Ding, X.; Zhang, Q.; Wang, Y.; Lund, M.S.; Su, G. Influence of outliers on accuracy of genomic prediction for feed efficiency traits in dairy cattle. J. Dairy Sci. 2014, 97, 7346–7357.
  18. González-Recio, O.; Rosa, G.J.M.; Gianola, D. Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits. J. Dairy Sci. 2014, 97, 4648–4659.
  19. Heslot, N.; Akdemir, D.; Sorrells, M.E.; Jannink, J.L. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor. Appl. Genet. 2014, 128, 669–682.
  20. Spindel, J.E.; McCouch, S.R. When more is better: How data sharing would accelerate genomic selection of crop plants. New Phytol. 2016, 212, 814–826.
  21. Montesinos-López, O.A.; Pierre, C.S.; Gezan, S.A.; Bentley, A.R.; Mosqueda-González, B.A.; Montesinos-López, A.; van Eeuwijk, F.; Beyene, Y.; Gowda, M.; Gardner, K.; et al. Optimizing sparse testing for genomic prediction of plant breeding crops. Genes 2023, 14, 927.
  22. Montesinos-López, O.A.; Mosqueda-González, B.A.; Salinas-Ruiz, J.; Montesinos-López, A.; Crossa, J. Sparse multi-trait genomic prediction under balanced incomplete block design. Plant Genome 2023, 16, e20305.
  23. Montesinos-López, O.A.; Herr, A.W.; Crossa, J.; Montesinos-López, A.; Carter, A.H. Enhancing winter wheat prediction with genomics, phenomics and environmental data. BMC Genom. 2024, 25, 544.
  24. Hickey, J.M.; Chiurugwi, T.; Mackay, I.; Powell, W. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat. Genet. 2017, 49, 1297–1303.
  25. Wang, X.; Xu, Y.; Hu, Z.; Xu, C. Genomic selection methods for crop improvement: Current status and prospects. Crop J. 2018, 6, 330–340.
  26. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.R-project.org/ (accessed on 8 January 2025).
  27. Pérez, P.; de Los Campos, G. Genome-wide regression and prediction with the BGLR statistical package. Genetics 2014, 198, 483–495.
  28. Tong, Y.; Gao, X.; Han, Z.; Xu, Y.; Xu, Y.; Giorgi, F. Bias correction of temperature and precipitation over China for RCM simulations using the QM and QDM methods. Clim. Dyn. 2021, 57, 1425–1443.
  29. Piani, C.; Haerter, J.O.; Coppola, E. Statistical bias correction for daily precipitation in regional climate models over Europe. Theor. Appl. Climatol. 2010, 99, 187–192.
  30. Budhlakoti, N.; Rai, A.; Mishra, D.C. Statistical approach for improving genomic prediction accuracy through efficient diagnostic measure of influential observation. Sci. Rep. 2020, 10, 8408.
  31. Won, S.; Morris, N.; Lu, Q.; Elston, R.C. Choosing an optimal method to combine P-values. Stat. Med. 2009, 28, 1537–1553.
  32. Mudholkar, G.S.; George, E.O. The logit method for combining probabilities. In Symposium on Optimizing Methods in Statistics; Rustagi, J., Ed.; Academic Press: New York, NY, USA, 1979; pp. 345–366.
  33. Sutton, A.J.; Abrams, K.R.; Jones, D.R.; Sheldon, T.A.; Song, F. Methods for Meta-Analysis in Medical Research; Wiley: Chichester, UK, 2000; Volume 348.
  34. Stouffer, S.A.; Suchman, E.A.; DeVinney, L.C.; Star, S.A.; Williams, R.M., Jr. The American Soldier: Adjustment During Army Life. (Studies in Social Psychology in World War II), Vol. 1; Princeton Univ. Press: Princeton, NJ, USA, 1949.
  35. Montesinos-López, O.A.; Montesinos-López, A.; Crossa, J. Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer: Cham, Switzerland, 2022.
Figure 1. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and normalized root mean square error (NRMSE) (B) for Disease dataset.
Figure 2. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and normalized root mean square error (NRMSE) (B) for EYT_1 dataset.
Figure 3. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and normalized root mean square error (NRMSE) (B) for Wheat_1 dataset.
Figure 4. Comparative performance of genomic prediction methods in terms of Pearson’s correlation (COR) (A) and normalized root mean square error (NRMSE) (B) across datasets, using quantile mapping.
Table 1. Brief data description. RCBD denotes randomized complete block design, while alpha-lattice denotes the alpha lattice experimental design.
| Data | No. Lines | No. Markers | Multi-Environment Data | BLUEs Across Environments | Experimental Design |
|---|---|---|---|---|---|
| Indica | 327 | 16,383 | YES | YES | RCBD |
| Japonica | 320 | 16,383 | YES | YES | RCBD |
| Groundnut | 318 | 8268 | YES | YES | Alpha-lattice |
| Maize | 722 | 54,113 | YES | YES | RCBD |
| Wheat_1 | 1301 | 78,606 | YES | YES | Alpha-lattice |
| Wheat_2 | 1403 | 78,606 | YES | YES | Alpha-lattice |
| Wheat_3 | 1403 | 78,606 | YES | YES | Alpha-lattice |
| Wheat_4 | 1388 | 78,606 | YES | YES | Alpha-lattice |
| Wheat_5 | 1398 | 78,606 | YES | YES | Alpha-lattice |
| Wheat_6 | 1277 | 78,606 | YES | YES | Alpha-lattice |
| EYT_1 | 776 | 2038 | YES | YES | Alpha-lattice |
| EYT_2 | 775 | 2038 | YES | YES | Alpha-lattice |
| EYT_3 | 964 | 2038 | YES | YES | Alpha-lattice |
| Disease | 438 | 11,617 | YES | YES | RCBD |
Table 2. Outlier detection methods that combine p-values to calculate overall significance, where $p_k$ denotes the statistical significance value (p-value) from the kth method for an individual or genotype; K: number of methods whose p-values are combined; df: degrees of freedom; N: normal distribution; t: central t-distribution; $\chi^2$: central Chi-square distribution.
| Methods | Authors | Test Statistics | Transformed Variable | Dist. Under $H_0$ |
|---|---|---|---|---|
| Inverse Chi-Square (Invchi) | Won et al. (2009) [31] | $L = \sum_{k=1}^{K} Z_k$ | $Z_k = -2\log p_k$ | $\chi^2_{2K}$ |
| Logit | Mudholkar and George (1979) [32] | $S = \sum_{k=1}^{K} S_k$ | $S_k = \log\frac{p_k}{1-p_k}$ | $t_{5K+4}$ |
| Meanp | Sutton et al. (2000) [33] | $W = (0.5 - \bar{p})\sqrt{12K}$ | $\bar{p} = \frac{\sum_{k=1}^{K} p_k}{K}$ | $N(0,1)$ |
| SumZ | Stouffer et al. (1949) [34] | $Z = \frac{\sum_{k=1}^{K} w_k z(p_k)}{\sqrt{\sum_{k=1}^{K} w_k^2}}$ | NA | $N(0,1)$ |
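The combining rules listed in Table 2 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation, and several details are assumptions made for illustration: the function names (`invchi`, `logit_statistic`, `meanp`, `sumz`) are hypothetical, $z(p_k)$ is taken to be the upper-tail normal quantile $\Phi^{-1}(1 - p_k)$, the SumZ weights default to 1, and combined p-values are taken from the upper tail of the reference distribution. For the Logit method only the statistic $S$ is computed, because the standard library has no Student t distribution. All input p-values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def invchi(pvalues):
    """Inverse chi-square rule: L = sum(-2 log p_k), chi-square with 2K df under H0."""
    k = len(pvalues)
    stat = sum(-2.0 * math.log(p) for p in pvalues)
    half = stat / 2.0
    # Closed-form upper tail of a chi-square with even df 2K:
    # P(X >= L) = exp(-L/2) * sum_{i=0}^{K-1} (L/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


def logit_statistic(pvalues):
    """Logit rule statistic S = sum(log(p_k / (1 - p_k))); no p-value is computed here."""
    return sum(math.log(p / (1.0 - p)) for p in pvalues)


def meanp(pvalues):
    """Mean-p rule: W = (0.5 - mean(p)) * sqrt(12 K), compared with N(0, 1)."""
    k = len(pvalues)
    p_bar = sum(pvalues) / k
    stat = (0.5 - p_bar) * math.sqrt(12.0 * k)
    return stat, 1.0 - _STD_NORMAL.cdf(stat)


def sumz(pvalues, weights=None):
    """Weighted sum-of-z rule: Z = sum(w_k z_k) / sqrt(sum(w_k^2)), compared with N(0, 1)."""
    if weights is None:
        weights = [1.0] * len(pvalues)  # assumption: equal weights
    # Assumption: z(p) is the upper-tail normal quantile, so small p gives large z.
    z = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    stat = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(sum(w * w for w in weights))
    return stat, 1.0 - _STD_NORMAL.cdf(stat)


if __name__ == "__main__":
    example = [0.04, 0.10, 0.02]  # made-up p-values for one genotype
    for name, rule in (("Invchi", invchi), ("Meanp", meanp), ("SumZ", sumz)):
        stat, p = rule(example)
        print(f"{name}: statistic={stat:.4f}, combined p={p:.4f}")
    print(f"Logit: statistic={logit_statistic(example):.4f}")
```

A simple sanity check on these conventions is that with a single p-value (K = 1), `invchi` and `sumz` return that same p-value as the combined result.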
