Article

ExAutoGP: Enhancing Genomic Prediction Stability and Interpretability with Automated Machine Learning and SHAP

1 College of Big Data, Yunnan Agricultural University, Kunming 650201, China
2 Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China
3 Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, China
* Author to whom correspondence should be addressed.
Animals 2025, 15(8), 1172; https://doi.org/10.3390/ani15081172
Submission received: 1 April 2025 / Revised: 16 April 2025 / Accepted: 17 April 2025 / Published: 18 April 2025
(This article belongs to the Special Issue Molecular Markers and Genomic Selection in Farm Animal Improvement)

Simple Summary

In recent years, machine learning has garnered significant attention in genomic selection. However, the inherent “black-box” nature of machine learning models often hinders their interpretability, making it challenging to understand their decision-making processes. In this study, we propose ExAutoGP, a novel genomic prediction method that leverages automated machine learning (AutoML) to enhance predictive accuracy while improving model interpretability through SHapley Additive exPlanations (SHAP). We evaluate the predictive performance of ExAutoGP against genomic best linear unbiased prediction (GBLUP), BayesB, support vector regression (SVR), kernel ridge regression (KRR), and random forest (RF) across three datasets. The results demonstrate that ExAutoGP consistently delivers robust and superior predictive performance across all traits. Furthermore, the SHAP method is applied to determine feature importance and interpret the ExAutoGP model.

Abstract

Machine learning has attracted much attention in the field of genomic prediction due to its powerful predictive capabilities, yet the opacity of its modeling decisions remains a major challenge. In this study, we propose a novel machine learning method, ExAutoGP, which aims to improve the accuracy of genomic prediction and enhance model transparency by combining automated machine learning (AutoML) with SHapley Additive exPlanations (SHAP). To evaluate ExAutoGP’s effectiveness, we designed a comparative experiment consisting of a simulated dataset and two real animal datasets. For each dataset, we applied ExAutoGP and five baseline models: Genomic Best Linear Unbiased Prediction (GBLUP), BayesB, Support Vector Regression (SVR), Kernel Ridge Regression (KRR), and Random Forest (RF). All models were trained and evaluated using five repetitions of five-fold cross-validation, and their performance was assessed in terms of both predictive accuracy and computational efficiency. The results show that ExAutoGP exhibits robust and excellent prediction performance on all datasets. In addition, the SHAP method not only effectively reveals the decision-making process of ExAutoGP and enhances its interpretability, but also identifies genetic markers closely related to the traits. This study demonstrates the strong potential of AutoML in genomic prediction, while the introduction of SHAP provides actionable biological insights. The synergy of high prediction accuracy and interpretability offers new perspectives for optimizing genomic selection strategies in livestock and poultry breeding.

1. Introduction

Genomic selection is an advanced breeding strategy that significantly enhances the efficiency of genetic improvement by leveraging genome-wide single nucleotide polymorphisms (SNPs) to estimate individual genomic breeding values (GEBVs) [1]. The concept of genomic selection initially emerged in animal breeding and has gained widespread adoption over the past decade, driven by the rapid decline in the cost of high-throughput genotyping and sequencing technologies. This approach has been successfully applied in both animal and plant breeding, demonstrating its effectiveness and broad applicability [2,3]. Compared to conventional breeding methods that rely on phenotypic selection or marker-assisted selection, genomic selection overcomes key limitations by enabling the prediction of breeding values for individuals that have not undergone phenotypic evaluation. It operates under the assumption that each quantitative trait locus (QTL) is in linkage disequilibrium with at least one marker within the genome, thereby enhancing the accuracy of breeding value estimation. In addition to accelerating genetic progress and increasing genetic gain, genomic selection significantly reduces breeding costs [4]. For instance, its implementation in dairy cattle breeding in 2008 reduced breeding costs by approximately 92% relative to traditional progeny testing, while eliminating the need for progeny-testing programs [5]. In American Holstein cattle, the annual genetic gain for yield traits increased by approximately 50–100% [6]. Today, genomic selection is widely employed in large-scale livestock breeding programs, including those for pigs, sheep, and poultry, with more than 400 million animals genotyped to date [7]. The adoption of genomic selection has revolutionized both animal and plant breeding, fundamentally transforming genetic improvement strategies and accelerating breeding efficiency [8,9].
Currently, parametric methods are the dominant tools for genomic selection in practical breeding, chiefly GBLUP [10], single-step GBLUP [11], the least absolute shrinkage and selection operator [12], and Bayesian methods such as BayesB and Bayesian LASSO [13,14]. GBLUP estimates GEBVs by integrating genome-wide marker data with phenotypic information [15], whereas Bayesian methods predict GEBVs by estimating the effect of each marker within a Bayesian framework [16]. The prediction accuracy of Bayesian methods is usually better than that of GBLUP, but they rely on Markov chain Monte Carlo algorithms for parameter estimation, which are computationally complex and time-consuming. Moreover, these methods are all based on linear models that mainly consider additive genetic effects and ignore potentially complex non-linear relationships between markers and phenotypes, such as epistasis, dominance effects, and genotype–environment interactions [17]. In addition, they usually assume that genotypic and phenotypic data follow specific distributions (e.g., the normal distribution) [1] and that gene effects are independent and fixed, which fails to adequately capture interactions between genes and between genes and the environment.
To overcome the limitations of traditional parametric models, machine learning (ML) has garnered significant attention in genomic selection due to its ability to model complex nonlinear relationships. As a key branch of artificial intelligence, ML estimates GEBVs by uncovering statistical patterns between large-scale genotype and phenotype data without requiring strict assumptions about data distribution or gene effects [18]. Unlike conventional methods, ML can integrate genome-wide marker information and effectively capture nonlinear interactions in high-dimensional genomic data, making it particularly advantageous when the number of variables far exceeds the sample size. Furthermore, as an advanced branch of machine learning, deep learning employs multilayer neural networks to model complex nonlinear relationships. It is capable of automatically extracting features and uncovering deep patterns from high-dimensional genomic data. Consequently, an increasing number of genomic selection studies have begun to incorporate deep learning approaches [19,20]. In recent years, ML has emerged as a powerful tool in genomic selection, offering new perspectives for genomic prediction. Despite its advantages, the widespread adoption of ML in genomic selection still faces several challenges. For researchers without expertise in ML, model selection and hyperparameter optimization often rely on manual tuning, which is both computationally intensive and laborious. Furthermore, the effective application of ML techniques necessitates a deep understanding of the theoretical foundations, algorithmic properties, and appropriate application scenarios of different methods, thereby increasing the technical barrier to entry. Consequently, optimizing model selection and parameter tuning to enhance the usability and practical applicability of ML in genomic selection remains a critical area of ongoing research.
To address these issues, AutoML has been proposed to streamline the ML process. AutoML is an automated, end-to-end machine learning paradigm that automates key steps in model development, including data preprocessing, feature engineering, model selection, and hyperparameter tuning, thereby accelerating model deployment and significantly reducing reliance on human intervention and expertise [21]. Among existing AutoML frameworks, the genetic programming-based Tree-based Pipeline Optimization Tool (TPOT) is widely used. TPOT automatically explores and evolves machine learning pipelines to find the optimal pipeline [22]. AutoML has been shown to perform well in several domains: for example, the AutoML tool developed by Castagno et al. [23] demonstrated high reliability in predicting the progression of knee osteoarthritis, and AutoML achieved excellent performance in predicting 90-day mortality after gastric cancer surgery [24]. Furthermore, Zhang et al. [25] proposed a model called CascadeDumpNet that combines deep learning and AutoML for open dump detection, which effectively eliminated false detections and improved accuracy. Nevertheless, the application of AutoML to animal genomic selection remains largely unexplored. Furthermore, the ‘black-box’ nature of ML models makes their results difficult to interpret, which limits researchers’ trust in and adoption of such models [26]. In the field of genomic selection, model interpretability is paramount. When generating predictions or recommendations, it is essential to provide clear, transparent, and logically coherent justifications, so that breeders can trust the model’s outputs and make well-informed breeding decisions. Therefore, we hypothesize that integrating AutoML with interpretability methods can not only enhance the predictive performance of the model but also improve its transparency and practicality.
Based on the above background, this study proposes a novel machine learning approach, ExAutoGP, which integrates AutoML with SHAP to improve the accuracy and interpretability of genomic prediction. First, we present the framework design of ExAutoGP in detail, including the automated modeling pipeline optimized by AutoML and the implementation of SHAP for feature interpretation. Second, we conduct a comparative experiment using one simulated dataset and two real animal datasets to systematically evaluate the predictive performance and computational efficiency of ExAutoGP against baseline models (GBLUP, BayesB, SVR, KRR, and RF). A five-times repeated five-fold cross-validation scheme is employed to ensure the robustness and reliability of the results. In addition, SHAP-based visual analysis is used to elucidate the decision-making process of ExAutoGP and clarify the specific contribution of each feature to the model output. Finally, we comprehensively analyze the experimental results and discuss the advantages and potential applications of ExAutoGP in genomic prediction. Overall, our research results provide valuable insights into genomic prediction and offer a promising direction for optimizing breeding strategies.

2. Materials and Methods

2.1. Datasets

To evaluate the performance of ExAutoGP in comparison with other methods, three publicly available datasets were used. In brief, the details of these datasets are as follows.

2.1.1. Simulation Dataset

The first dataset analyzed in this study was obtained from the 16th QTL-MAS workshop (http://qtl-mas-2012.kassiopeagroup.com/en/dataset.php) and was simulated by Usai et al. (2012) [27]. This publicly available dataset comprises 3000 training individuals (generations 1–3) and 1000 validation individuals (generation 4). To simulate genetic effects, 50 QTLs were randomly selected, and their effect sizes were drawn from a gamma distribution with shape parameter 0.42 and scale parameter 5.4. Three genetically related quantitative traits (T1, T2, and T3) were generated from these 50 QTLs based on the genotype data of the 3000 training individuals. The genotype data span five chromosomes, encompassing a total of 10,000 SNPs. This dataset has been widely used to assess the performance of genomic prediction methods [28,29].
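For illustration, a minimal sketch (not the workshop’s original simulation code) of how such QTL effect sizes could be drawn is shown below; the use of NumPy and the seed are assumptions of this sketch.

```python
# Illustrative sketch only: draw 50 QTL effect sizes from a gamma distribution
# with shape 0.42 and scale 5.4, as described for the QTL-MAS 2012 dataset.
# NumPy and the seed are assumptions of this sketch, not the workshop's code.
import numpy as np

rng = np.random.default_rng(seed=2012)
qtl_effects = rng.gamma(shape=0.42, scale=5.4, size=50)
print(qtl_effects.round(3))
```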

2.1.2. German Holstein Cattle

This dataset comprises genotype data from 5024 German Holstein bulls, all genotyped with the Illumina Bovine SNP50 BeadChip (https://www.g3journal.org/content/5/4/615.supplemental). During quality control [28], SNPs with call rates below 95%, minor allele frequency (MAF) < 0.01, or deviation from Hardy–Weinberg equilibrium (HWE, p < 1 × 10⁻⁴) were excluded, and 42,551 SNPs were ultimately retained for subsequent analysis. Based on previous studies [30,31], three traits representing different genetic architectures were selected: milk yield (MY, kg), milk fat percentage (MFP, %), and somatic cell score (SCS). The MFP trait has high heritability and is determined mainly by a single major gene (DGAT1) together with multiple small-effect genes and genomic regulatory elements (GREs). The MY trait has medium heritability, and its variance is influenced by DGAT1 along with multiple genes of small to medium effect and GREs. The SCS trait, in contrast, is a low-heritability trait influenced by many small-effect genes and GREs; its phenotypic values were transformed to approximate a normal distribution to facilitate subsequent analyses.

2.1.3. Chicken Dataset

This study utilized a dataset of 1063 Rhode Island Red chickens published by Liu et al. [32] (https://figshare.com/articles/Genome-wide_Association_Analysis_of_Age-Dependent_Egg_Weights_in_Chickens/5844420), which includes both genotypic and phenotypic data for egg weight (EW). The experimental material came from the 11th-generation purebred Rhode Island Red population of Beijing Huadu Yukou Poultry Breeding Co. (Beijing, China). To investigate the genetic basis of egg weight, 1078 hens with well-documented pedigrees were selected and genotyped using the Affymetrix Axiom® 600K Chicken Genotyping Microarray. Egg weight was recorded at seven time points, spanning from the onset of laying to 80 weeks of age. The egg weight at first egg (EWAFE) was defined as the weight of the first egg laid by each hen; subsequent egg weights were measured at 28, 36, 56, 66, 72, and 80 weeks of age. During the early laying period (≤56 weeks), egg weights were recorded over two consecutive days, whereas in the later laying period (>56 weeks), data were collected over three consecutive days. These time points were chosen based on breeding objectives, and the mean egg weight over the 2 or 3 days was used as the phenotypic value for each sample. Individuals with missing genotypic or phenotypic data were excluded from the analysis. Since the dataset underwent quality control in the original study, no additional filtering was applied here. In total, the three datasets cover 13 traits; summary statistics for each dataset are listed in Table 1.

2.2. Genomic Prediction Models

2.2.1. GBLUP

GBLUP uses genomic relationships to estimate the genetic value of an individual [33]. The statistical model of GBLUP can be written as:
$$\mathbf{y}_c = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e},$$
where $\mathbf{y}_c$ is the vector of corrected phenotypes of genotyped individuals, $\mu$ is the overall mean, $\mathbf{1}$ is an $n$-dimensional column vector of ones, $\mathbf{Z}$ is the incidence matrix relating $\mathbf{y}_c$ to $\mathbf{g}$, and $\mathbf{g}$ is the vector of genomic breeding values, assumed to follow $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2)$, where $\mathbf{G}$ is the genomic relationship matrix (G matrix) and $\sigma_g^2$ is the additive genetic variance. The residual vector $\mathbf{e}$ is distributed as $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$, where $\mathbf{I}$ is an identity matrix and $\sigma_e^2$ is the residual variance. GBLUP is implemented using the R package ‘BGLR’.
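Although GBLUP is fitted in this study with the R package ‘BGLR’, a minimal NumPy sketch of the same model (VanRaden’s G matrix plus the mixed-model solve, with variance components assumed known) may help illustrate the computation; the function names and the jitter term are our own assumptions, not part of BGLR.

```python
# Minimal NumPy sketch of GBLUP (illustrative only; the study uses R's 'BGLR').
# Assumes a genotype matrix M coded 0/1/2 (individuals x SNPs) and a corrected
# phenotype vector y_c; variance components are taken as known here.
import numpy as np

def vanraden_g(M):
    """VanRaden (2008) genomic relationship matrix from 0/1/2 genotype codes."""
    p = M.mean(axis=0) / 2.0                 # allele frequency of each SNP
    Z = M - 2.0 * p                          # center genotypes by twice the frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_gebv(M, y_c, sigma_g2=1.0, sigma_e2=1.0):
    """GEBVs for the genotyped individuals, variance components assumed known."""
    n = len(y_c)
    G = vanraden_g(M) + 1e-6 * np.eye(n)     # small jitter keeps G invertible
    lam = sigma_e2 / sigma_g2
    # Mixed-model solution with Z = I: g = (I + lambda * G^-1)^-1 (y_c - mu)
    g = np.linalg.solve(np.eye(n) + lam * np.linalg.inv(G), y_c - y_c.mean())
    return y_c.mean() + g
```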

2.2.2. BayesB

BayesB is a stochastic search variable selection method [1]. It assumes that, for a given trait, most markers have no effect, an assumption consistent with the genetic architecture of quantitative traits, while a few markers have effects whose variances differ. The prior assumptions in BayesB for SNP effects and their corresponding variances are:
$$\alpha_j \mid \pi, \sigma_j^2 \overset{iid}{\sim} \begin{cases} N(0, \sigma_j^2) & \text{with probability } 1 - \pi \\ 0 & \text{with probability } \pi \end{cases}$$
and
$$\sigma_j^2 \mid v, S \overset{iid}{\sim} \chi^{-2}(v, S),$$
where $\alpha_j$ is the effect of the $j$-th marker, $\sigma_j^2$ is the marker effect variance, $v$ is the prior degrees of freedom, and $S$ is the prior scale parameter. Meuwissen et al. [1] showed that when the prior distribution of the effect variance is a scaled inverse chi-squared distribution, its posterior distribution is likewise a scaled inverse chi-squared distribution; that is, $P(\sigma_{\alpha_j}^2) = \chi^{-2}(v, S)$, where $\sigma_{\alpha_j}^2$ is the effect variance of the $j$-th SNP. Here the prior distribution of the SNP effect, $P(\alpha_j)$, is normal, i.e., $N(0, \sigma_j^2)$, which defines the prior distribution of $\sigma_j^2$. Within the Bayesian inference framework, the scaled inverse chi-squared distribution serves as the conjugate prior for the variance parameter of the normal distribution, facilitating posterior derivation and sampling and thereby improving computational efficiency [34]. Its heavy-tailed property also allows better modeling of the heterogeneity in SNP effect variances, in line with the sparsity assumption in genomic data that a small number of loci exhibit large effects while the majority have effects close to zero. BayesB is implemented using the R package ‘BGLR’.

2.2.3. SVR

Support vector regression (SVR) is a technique for solving regression problems based on the principles of the support vector machine (SVM) [35]; it is the application of SVM to regression with quantitative responses. SVR uses a linear or nonlinear kernel function to map the input space (the labeled dataset) to a higher-dimensional feature space [36], in which modeling and prediction are performed. The SVR model can be expressed as:
$$f(\mathbf{x}) = \beta_0 + h(\mathbf{x})^T \boldsymbol{\beta},$$
where $f(\mathbf{x})$ is the predicted value, $h(\mathbf{x})$ is the feature mapping induced by the kernel, $\boldsymbol{\beta}$ is the weight vector, and $\beta_0$ is the bias. SVR is typically formulated as the minimization of the following regularized loss function:
$$\min_{\beta_0, \boldsymbol{\beta}} \; \frac{1}{2}\|\boldsymbol{\beta}\|^2 + C \sum_{i=1}^{n} V_\epsilon\big(f(\mathbf{x}_i) - y_i\big),$$
where $C$ is a regularization constant controlling the trade-off between prediction error and model complexity, $y_i$ are the observations, $V_\epsilon$ is the $\epsilon$-insensitive loss, and $\|\cdot\|$ is the norm in the Hilbert space. The $\epsilon$-insensitive loss is defined as:
$$V_\epsilon(r) = \begin{cases} 0 & \text{if } |r| < \epsilon \\ |r| - \epsilon & \text{otherwise.} \end{cases}$$
After optimization, the final form of SVR can be written as:
$$f(\mathbf{x}) = \sum_{i=1}^{m} (\hat{a}_i - \alpha_i)\, k(\mathbf{x}, \mathbf{x}_i),$$
where $k(\mathbf{x}, \mathbf{x}_i) = \phi(\mathbf{x})^T \phi(\mathbf{x}_i)$ is the kernel function and $\hat{a}_i$ and $\alpha_i$ are Lagrange multipliers. In this study, the SVR method is implemented using the SVR function in Python’s Scikit-learn library (version 1.6.0), with the best kernel function and hyperparameters found by grid search.
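As a concrete illustration of this setup, a minimal sketch with scikit-learn’s SVR and GridSearchCV follows; the grid values are illustrative assumptions, not the study’s exact search space.

```python
# Minimal sketch of the SVR setup described above: scikit-learn's SVR with a
# grid search over kernels and hyperparameters (grid values are illustrative).
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

param_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1.0, 10.0],        # regularization constant C
    "epsilon": [0.01, 0.1, 0.5],  # width of the epsilon-insensitive tube
}
svr_search = GridSearchCV(SVR(), param_grid, cv=5, scoring="r2", n_jobs=-1)
# svr_search.fit(X_train, y_train)   # X: SNP matrix, y: corrected phenotypes
# y_pred = svr_search.predict(X_test)
```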

2.2.4. KRR

Kernel ridge regression (KRR) is a nonlinear regression method that combines the advantages of kernel techniques and ridge regression: by adding a kernel function to the ridge regression algorithm, it can effectively uncover nonlinear structure in the data [37]. KRR differs from ridge regression in that it uses the kernel trick to define a higher-dimensional feature space and then builds a ridge regression model in that space [38]. The KRR prediction function can be written as:
$$f(\mathbf{x}) = \mathbf{k}^T (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y},$$
where $\mathbf{k}$ is the vector with entries $k_i = k(\mathbf{x}, \mathbf{x}_i)$, $i = 1, 2, \ldots, n$, $n$ is the total number of training samples, $\mathbf{K}$ is the Gram matrix, $\lambda$ is the regularization constant, and $\mathbf{I}$ is the identity matrix. KRR is computationally efficient and can handle larger-scale datasets [39]. At the same time, it applies $L_2$ regularization to prevent overfitting, and this regularization also makes it robust to noisy data. We optimized the hyperparameters using a grid search and implemented KRR via the KernelRidge function in Python’s Scikit-learn library (version 1.6.0).
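Analogously to SVR, a minimal sketch of the KRR setup follows; the grid values are illustrative assumptions, not the study’s exact search space.

```python
# Minimal sketch of the KRR setup described above: scikit-learn's KernelRidge
# tuned via grid search (grid values are illustrative).
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

param_grid = {
    "kernel": ["linear", "rbf", "polynomial"],
    "alpha": [0.01, 0.1, 1.0, 10.0],  # L2 regularization strength (lambda)
    "gamma": [1e-4, 1e-3, 1e-2],      # kernel coefficient (ignored for "linear")
}
krr_search = GridSearchCV(KernelRidge(), param_grid, cv=5, scoring="r2", n_jobs=-1)
# krr_search.fit(X_train, y_train)
# y_pred = krr_search.predict(X_test)
```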

2.2.5. RF

Random forest (RF) is a widely used ensemble learning algorithm for predicting GEBVs. First proposed by Breiman [40], RF is essentially an ensemble of decision trees [41]: the predictions of the individual base learners (decision trees) are aggregated to obtain the final result, so the key steps in RF prediction are the construction of the trees and the forest [42]. Random forests can handle large numbers of features and are commonly used for classification and regression on high-dimensional data. In genomic selection, because the construction of a random forest is randomized, drawing on random samples and random features, it is robust and effectively avoids overfitting [43]. However, the more decision trees and the greater the tree depth, the higher the computational complexity, and a large amount of training data is needed to achieve good predictions. Random forest regression can be expressed as:
$$\hat{y} = \frac{1}{M} \sum_{m=1}^{M} t_m\big(\psi_m(y : X)\big),$$
where $\hat{y}$ is the random forest prediction, $t_m(\psi_m(y : X))$ is the decision tree constructed at the $m$-th iteration from the bootstrap sample $\psi_m(y : X)$, and $M$ is the number of decision trees in the forest. The RF method was implemented using the RandomForestRegressor function from Python’s Scikit-learn library (version 1.6.0), with a grid search employed to optimize the number of decision trees ($M$) and the maximum tree depth.
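A minimal sketch of this RF setup follows; the grid values are illustrative assumptions, not the study’s exact search space.

```python
# Minimal sketch of the RF setup described above: RandomForestRegressor with a
# grid search over the number of trees (M) and the maximum depth (values illustrative).
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],  # number of decision trees M
    "max_depth": [None, 10, 20],      # maximum tree depth
}
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=5, scoring="r2", n_jobs=-1
)
# rf_search.fit(X_train, y_train)
# y_pred = rf_search.predict(X_test)
```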

2.2.6. TPOT: ExAutoGP Framework

TPOT [44] is a state-of-the-art AutoML approach proposed by Olson and Moore [22]: a framework for automatically designing and optimizing ML pipelines with genetic programming, whose operators are mainly derived from the scikit-learn package [45]. TPOT begins by randomly generating a population of pipelines comprising data preprocessing, feature selection, and model selection steps, and evaluates their performance using cross-validation [22]. It then uses the high-performing pipelines as ‘parents’ and generates the next generation via a genetic algorithm, applying crossover (fusing components from different pipelines) and mutation (randomly modifying specific parts of a pipeline) to produce ‘offspring’ pipelines. Over multiple iterations, TPOT progressively improves the pipeline population through selection, crossover, and mutation. To reduce the risk of overfitting, TPOT adopts an early stopping strategy from ML: it monitors pipeline performance and, if no significant improvement is observed after a number of generations, terminates the evolutionary process and returns the best pipeline [46]. Furthermore, TPOT’s optimization strategy balances exploiting the best pipelines found so far against exploring novel pipeline designs. By automating the end-to-end ML process, TPOT not only saves time and reduces human intervention, but may also discover complex pipeline structures inaccessible to manual design, increasing the likelihood of high-performance pipelines that generalize to unseen data [47]. Figure 1 shows an overview of the workflow of the ExAutoGP framework implemented on top of TPOT.
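To make this concrete, the following minimal sketch shows how a TPOT regression search could be configured; the generation count, population size, and early-stopping values are illustrative assumptions, not the study’s exact configuration.

```python
# Illustrative sketch of a TPOT regression search (settings are assumptions;
# the study's exact generations/population/early-stop values are not listed here).
from tpot import TPOTRegressor

tpot = TPOTRegressor(
    generations=20,        # number of evolutionary iterations
    population_size=50,    # pipelines per generation
    cv=5,                  # internal cross-validation for pipeline evaluation
    early_stop=5,          # stop if no improvement for 5 generations
    n_jobs=-1,
    random_state=0,
    verbosity=2,
)
# tpot.fit(X_train, y_train)
# best_pipeline = tpot.fitted_pipeline_      # the optimized scikit-learn pipeline
# tpot.export("exautogp_pipeline.py")        # export the pipeline as Python code
```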

2.3. Evaluation

In this study, we evaluated the performance of all models using 5-fold cross-validation. Each dataset was randomly divided into five non-overlapping subsets, four of which were used as the training set and the remaining one as the test set, so that each subset served as the test set exactly once. To further enhance the stability and reliability of the evaluation, this cross-validation process was repeated five times, generating a total of 25 independent results. To quantify prediction accuracy, we calculated the Pearson correlation coefficient for each trait in every fold of the five repetitions and took the average of the 25 results as the final assessment metric. The Pearson correlation coefficient was calculated as:
$$r(\mathbf{y}_c, \mathrm{GEBV}) = \frac{\mathrm{cov}(\mathbf{y}_c, \mathrm{GEBV})}{\sqrt{\mathrm{var}(\mathbf{y}_c)\,\mathrm{var}(\mathrm{GEBV})}},$$
where $\mathbf{y}_c$ is the corrected phenotype.
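The following minimal sketch illustrates this evaluation scheme in Python; it assumes NumPy arrays and a scikit-learn-style model, and the function and variable names are our own.

```python
# Illustrative sketch of the evaluation scheme: five repeats of 5-fold CV,
# scoring each fold by the Pearson correlation between observed corrected
# phenotypes and predicted GEBVs, then averaging the 25 fold correlations.
import numpy as np
from scipy.stats import pearsonr
from sklearn.base import clone
from sklearn.model_selection import RepeatedKFold

def repeated_cv_pearson(model, X, y, n_splits=5, n_repeats=5, seed=0):
    rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = []
    for train_idx, test_idx in rkf.split(X):
        m = clone(model).fit(X[train_idx], y[train_idx])
        r, _ = pearsonr(y[test_idx], m.predict(X[test_idx]))
        scores.append(r)
    return np.mean(scores)    # mean of the 25 fold correlations
```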

2.4. SHAP: Modelling Interpretation

SHAP is a method for interpreting model predictions based on the concept of Shapley value in game theory [48], which aims to quantify the contribution of features to the prediction of an ML model. The SHAP value is calculated by the following formula:
$$\mathrm{SHAP}(i) = \sum_{F \subseteq N \setminus \{i\}} \frac{|F|!\,\big(|N| - |F| - 1\big)!}{|N|!} \Big[ f\big(F \cup \{i\}\big) - f(F) \Big],$$
where $\mathrm{SHAP}(i)$ denotes the Shapley value of the $i$-th feature, $N$ denotes the set of all features, $F$ denotes a subset of features that excludes the $i$-th feature, $f(F)$ denotes the model output on the feature subset $F$, $|N|$ and $|F|$ denote the number of features in $N$ and $F$, respectively, and $f(F \cup \{i\}) - f(F)$ is the marginal contribution of feature $i$. A significant advantage of SHAP is that it is model-agnostic, enabling it to be applied to a wide range of ML algorithms, including linear models, decision trees, and neural networks, thus intuitively revealing the features that play a key role in model prediction. Because it is based on the Shapley value from game theory, SHAP systematically computes the difference in a feature’s marginal contribution to the prediction when that feature is included or excluded, averaged over all possible feature subsets. This approach not only ensures fairness and consistency in the allocation of feature contributions but also provides highly interpretable quantitative results.
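As an illustration, a minimal sketch applying the shap library to a fitted pipeline follows; `best_pipeline`, `X`, and `snp_names` are hypothetical placeholders, and a model-agnostic explainer is used because TPOT may return an arbitrary scikit-learn pipeline.

```python
# Illustrative sketch of the SHAP analysis: explain a fitted pipeline's
# predictions and produce the two plot types discussed in Section 3.4.
# `best_pipeline`, `X` (SNP matrix), and `snp_names` are assumed placeholders.
import shap

# Model-agnostic (permutation-based) explainer on the pipeline's predict function.
# Note: this can be slow for genome-wide SNP sets; in practice a subset of
# samples or markers may be used.
explainer = shap.Explainer(best_pipeline.predict, X, feature_names=snp_names)
shap_values = explainer(X)

shap.plots.bar(shap_values, max_display=20)       # mean |SHAP| feature importance
shap.plots.beeswarm(shap_values, max_display=20)  # per-sample SHAP summary plot
```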

3. Results

3.1. Comparison of ExAutoGP with GBLUP and BayesB in Genomic Prediction Accuracy

The ExAutoGP method demonstrated substantial improvements over the traditional genomic prediction approaches GBLUP and BayesB across multiple datasets and phenotypic traits. In the simulated dataset (Figure 2), ExAutoGP achieved prediction accuracies of 0.466 and 0.592 for the T2 and T3 traits, respectively, surpassing GBLUP (0.397 and 0.545). For the T2 trait, ExAutoGP improved predictive accuracy by 17.4% over GBLUP and by 1.1% relative to BayesB (0.461); however, it was slightly inferior to BayesB on the T1 and T3 traits (BayesB: 0.608 for T3). For the cattle dataset (Figure 3), ExAutoGP achieved a prediction accuracy of 0.854 on the MFP trait, superior to GBLUP (0.815). ExAutoGP likewise outperformed GBLUP and BayesB on the SCS trait, while the three methods were comparable on the MY trait. On the chicken dataset (Figure 4), ExAutoGP outperformed GBLUP and BayesB to varying degrees on the EWAFE, EW28, EW56, and EW80 traits, with improvements ranging from 0.5% to 6.5%. For the EW66 trait, there was no significant difference among the three methods. For the EW36 trait, BayesB performed best, with ExAutoGP and GBLUP comparable. On the EW72 trait, GBLUP improved by 4% over ExAutoGP, while ExAutoGP improved by 3% over BayesB. Overall, ExAutoGP delivered consistently strong prediction performance across all three datasets. Details of the 5-fold cross-validated Pearson correlation coefficients for the five replications are given in Supplementary Tables S1–S3.

3.2. Comparison of ExAutoGP and Traditional Machine Learning Methods in Genomic Prediction Accuracy

As expected, ExAutoGP significantly outperformed the three traditional ML methods, SVR, KRR, and RF, on all evaluation datasets. Averaged over the three datasets, the prediction accuracy of ExAutoGP improved by 7.5%, 7.46%, and 28.73% over SVR, KRR, and RF, respectively, with per-trait improvements ranging from 0.4% to 20.8%, 1.8% to 24.5%, and 6.4% to 58.2%. In the simulated dataset (Figure 2), ExAutoGP achieved an average prediction accuracy of 0.489, an increase of 11.9%, 9.5%, and 16% over SVR (0.437), KRR (0.447), and RF (0.422), respectively. Notably, for the T2 trait, ExAutoGP exhibited substantial improvements over SVR (20.8%), KRR (17%), and RF (23.7%). On the cattle dataset (Figure 3), ExAutoGP showed varying degrees of improvement over SVR, KRR, and RF on all three traits, including a 44.6% gain over RF on the SCS trait. On the chicken dataset (Figure 4), ExAutoGP’s prediction accuracy for the EW28 trait improved by 18.4%, 15.3%, and 53.6% over SVR, KRR, and RF, respectively. Similarly, ExAutoGP improved by 7.6%, 24.5%, and 43.6% over SVR, KRR, and RF on the EW72 trait. For the EW80 trait, ExAutoGP improved on RF by as much as 58.2%. Detailed 5-fold cross-validated Pearson correlation coefficients for the five replications are given in Supplementary Tables S1–S3.

3.3. Comparison of Computation Time

We compared the computation time required by the six genomic prediction methods on the three datasets, as shown in Figure 5. Run times are reported in seconds, and all methods were run on a high-performance server (CentOS Linux 8, Intel(R) Xeon(R) Gold 5318H CPU @ 2.50 GHz, 144 CPU cores, 1 TB total memory). To improve efficiency, the 5-fold cross-validation of each method was parallelized. TPOT’s computation is comparatively time-consuming, as it searches for the best machine learning pipeline through continual iteration. Nevertheless, ExAutoGP ran two to six times faster than BayesB on all three datasets. On the simulated and chicken datasets, ExAutoGP required more computation time than SVR, KRR, and RF; nonetheless, its fully automated process may offset the additional time cost of manual hyperparameter tuning. BayesB had the longest average runtime, primarily due to the iterative nature of the Markov chain Monte Carlo algorithm, which requires extensive iterations to achieve convergence. This issue was particularly pronounced in the chicken dataset, where the substantially larger number of SNPs further exacerbated BayesB’s computational burden. A detailed breakdown of the computation times for each method is provided in Supplementary Table S4.

3.4. Interpreting ExAutoGP Model Using the SHAP Method

We systematically assessed the importance of each feature in the ExAutoGP model and its contribution to the model’s predictions using the SHAP method on the real traits (see Figure 6 and Figures S1–S9). To illustrate the approach, we present the EWAFE trait from the chicken dataset as an example. Building on the model performance evaluation, we calculated SHAP values to quantify the impact of individual features on the prediction results. The feature importance bar chart (Figure 6A) ranks features by the mean absolute SHAP value, listing the top 20 most important variables in descending order. These values reflect the average impact of each feature on the magnitude of the model output. The results show that the AX-75490162_A locus makes the most significant contribution among the 20 key loci, followed by the AX-75490231_A and AX-75490266_A loci. Note that the feature importance visualization in Figure 6A includes only variables retained after the feature selection process, and all features are displayed with their original nomenclature.
To further dissect the model’s decision-making mechanism, we used the SHAP summary plot (Figure 6B). This plot arranges individual features around a center line in order of importance. Each point represents a sample, positioned according to the SHAP value of the corresponding feature. Points to the left of the center line have negative SHAP values, indicating a negative effect on the model’s predictions, while points to the right have positive SHAP values, pushing the predictions toward higher outputs. In addition, the color of each point reflects the magnitude of the feature value, with red indicating higher values and blue indicating lower values. Figure 6B shows that the SHAP value distribution for the AX-75490162_A locus is relatively concentrated and positively skewed, indicating a positive contribution to the model output. In contrast, the AX-80872441_A and AX-75662925_A loci exhibit negative contributions, reducing the model’s predicted values. Additionally, the SHAP values of the AX-75490180_A, AX-75490174_A, and AX-75490232_A loci span a narrow range, suggesting a relatively stable influence on model predictions.
According to the study by Liu et al. [32], eight of the top twenty key loci identified using the SHAP method overlapped with SNPs that were previously reported to have significant or suggestive associations with the EWAFE trait. Notably, loci AX-75490180_A (rs314056488) and AX-75490174_A (rs316324927) are located in the intronic region of the candidate gene CECR1 within the chromosomal region associated with cat eye syndrome. In addition, locus AX-75490232_A (rs316675251) is positioned upstream of CECR2, a candidate gene associated with both the EWAFE and EW56 traits. Previous studies have demonstrated that CECR2 plays a crucial role in embryonic development and organogenesis. These findings provide critical insights into the genetic architecture underlying the EWAFE trait and offer valuable directions for further investigation.

4. Discussion

In this study, we evaluate the performance of the ExAutoGP method in genomic prediction by comparing it with traditional approaches, including GBLUP and BayesB, as well as ML methods such as SVR, KRR, and RF. The predictive accuracy of ExAutoGP is assessed across multiple datasets to establish its robustness and generalizability. In the simulated dataset, ExAutoGP achieved prediction accuracies of 0.466 and 0.592 for the T2 and T3 traits, respectively, surpassing GBLUP by 17.4% (0.397) and 8.6% (0.545). The genomic prediction accuracies obtained using GBLUP (0.404, 0.397, and 0.545) closely matched previously reported values (0.404, 0.400, and 0.545) [49], confirming the reliability of our implementation. Similarly, in the cattle dataset, the prediction performance of GBLUP and BayesB for the three traits was consistent with values reported in the literature [28,50] and slightly worse than that reported by An et al. [51]; this discrepancy may be attributable to differences in the cross-validation strategies used across studies. Notably, for the milk fat percentage (MFP) trait, ExAutoGP demonstrated superior predictive accuracy (0.854) compared to GBLUP (0.815), highlighting its advantage for high-heritability traits. The effectiveness of ExAutoGP was further verified in the chicken dataset, where its prediction accuracies for the EWAFE and EW80 traits were 0.295 and 0.318, respectively, 2.8% and 5.5% higher than GBLUP and 6.2% and 6% higher than BayesB. However, for some traits (e.g., EW72), GBLUP performed slightly better than ExAutoGP, which may be related to the population structure and the genetic architecture of the traits [52].
Comparison with traditional ML methods further highlights the advantages of ExAutoGP. Across all evaluation datasets, the average prediction accuracy of ExAutoGP improved by 7.5%, 7.46%, and 28.73% over SVR, KRR, and RF, respectively. In particular, on the T2 trait in the simulated dataset, ExAutoGP’s prediction accuracy improved by 20.8%, 17%, and 23.7% over SVR, KRR, and RF, respectively. Notably, the performance of SVR and KRR on the three simulated traits in our study was consistent with the findings of Liang et al. [49], while our proposed ExAutoGP model significantly outperformed their method. Furthermore, for the three traits in the cattle dataset, ExAutoGP demonstrated superior predictive performance compared to the improved KRR and SVR models developed by Li et al. [50] and achieved results comparable to the KAML approach proposed by Yin et al. [53]. On the EW80 trait in the chicken dataset, ExAutoGP’s improvement over RF was as high as 58.2%. This substantial gain in predictive accuracy can be attributed to ExAutoGP’s ability to efficiently capture non-linear relationships and complex patterns in genomic data through automated pipeline optimization, whereas traditional ML methods are constrained by the need for manual parameter tuning. Among the three conventional ML models, RF exhibited the poorest performance, primarily due to its sensitivity to hyperparameters such as the number and maximum depth of decision trees; tuning RF is computationally intensive and often fails to yield optimal parameters, further compromising its predictive performance [54]. However, improvements in prediction accuracy often come with increased computational cost. In this study, we found that although ExAutoGP requires more computational resources than SVR, KRR, and RF on the simulated and chicken datasets, it still runs two to six times faster than BayesB. ExAutoGP’s TPOT-based iterative search is time-consuming, but it avoids the additional cost of manual parameter tuning through parallel processing and automation, thus potentially maintaining high overall efficiency.
In many areas of scientific research, achieving high predictive accuracy while simultaneously enhancing the interpretability of model decision-making processes is essential. In this study, SHAP analysis was employed to quantify the contributions of individual features to model predictions and to improve interpretability through visualization. The results revealed that the AX-75490162_A locus exerted the most significant influence on model output, with its positive SHAP value distribution further confirming its critical role in prediction outcomes. Moreover, among the SNPs previously reported to be significantly associated with the EWAFE trait [32], eight of the top twenty key loci identified by SHAP analysis in this study were consistent with known loci. These co-identified SNPs play essential roles in the genetic regulation of the EWAFE trait, further reinforcing the reliability of the proposed approach. It is worth emphasizing that the integration of AutoML with SHAP analysis not only enhanced the predictive performance of the model but also improved its interpretability, rendering the decision-making process more transparent. SHAP analysis not only quantifies the importance of individual trait-associated variables and their contributions to model predictions but also facilitates the identification of key genetic markers significantly associated with the trait. This approach enhances the understanding of the genetic regulatory mechanisms underlying complex traits and provides new perspectives for future genetic research.
Previous studies have shown that no single predictive model can be universally applied to all traits due to the complexity and heterogeneity of trait genetic architectures [50,55,56]. To address this challenge, this study introduces AutoML into genomic prediction, offering a novel perspective for improving model adaptability. ExAutoGP leverages TPOT to automate the construction of high-performance ML pipelines, which includes data pre-processing, feature engineering, model selection, and hyperparameter tuning. By systematically and iteratively searching for optimized model architectures tailored to specific datasets and task requirements [19], ExAutoGP effectively captures the diverse association patterns between genotypes and phenotypes. Unlike conventional ML approaches, ExAutoGP dynamically adapts to the genetic background of different traits by automatically evaluating multiple ML models and hyperparameter configurations. This eliminates the reliance on a priori assumptions regarding trait genetic architectures and manual parameter tuning. Furthermore, ExAutoGP significantly reduces the technical barriers for breeders in applying ML by simplifying model development, thereby facilitating its wider application in genomic selection.
Although ExAutoGP demonstrates considerable potential in genomic prediction, it still faces several limitations. On the one hand, ExAutoGP significantly reduces manual intervention and improves modeling efficiency through automated hyperparameter optimization, model selection, and feature engineering; however, this automated process requires systematic search over and evaluation of a large number of model configurations, which incurs higher computational costs. On the other hand, only three datasets were used in this study, which may limit conclusions about the model’s generalization capability. Future studies may therefore consider more efficient search strategies or distributed computing techniques to reduce the computational burden and improve efficiency. Furthermore, ExAutoGP should be validated on larger and more diverse datasets [57], especially for traits regulated by non-additive genetic effects (e.g., dominance and epistasis), where it is expected to further improve the prediction accuracy of GEBVs [58,59]. Notably, an important finding of this study is that SHAP analysis successfully identified multiple key loci significantly associated with the target traits, revealing the model’s decision logic from the perspective of biological mechanisms. This finding highlights the value of introducing interpretable methods into genomic prediction and points to an important direction for further exploring the biological interpretability of models.

5. Conclusions

In this study, we introduce ExAutoGP, a novel machine learning strategy for genomic prediction, leveraging the AutoML framework TPOT. Our evaluation of ExAutoGP against five established genomic prediction methods across three datasets demonstrates its ability to dynamically adapt to traits with diverse genetic architectures, delivering robust predictions and consistently outperforming several other machine learning models in prediction accuracy. The integration of the SHAP approach enhances model transparency and enables the identification of key loci associated with target traits, shedding light on the underlying genetic mechanisms and offering valuable insights for breeding strategies. To further advance ExAutoGP, future work will focus on employing more sophisticated search strategies and validating the method on larger and more diverse datasets. In addition, the successful application of SHAP analysis highlights the potential of interpretable approaches in genomic prediction, paving the way for deeper exploration of biological interpretability in predictive modeling.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani15081172/s1.

Author Contributions

Y.R. and L.Y. conceived the project. Y.R. performed statistical analysis and wrote the manuscript. L.Z. and L.G. provided help on ML and wrote the Python code. S.W. participated in data analysis. L.Z., L.G., S.W. and L.Y. contributed to writing and revising the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Research and Demonstration on Intelligent Management of High Quality Beef Cattle Industry in Yunnan Plateau (Major Special Project of Yunnan Province, No. 202102AE090009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset(s) supporting the conclusions of this article is (are) available in the public repository. Simulation dataset: http://qtl-mas-2012.kassiopeagroup.com/en/dataset.php; German Holstein cattle: https://www.g3journal.org/content/5/4/615.supplemental; Chicken: https://figshare.com/articles/Genome-wide_Association_Analysis_of_Age-Dependent_Egg_Weights_in_Chickens/5844420.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine learning
SHAP: SHapley Additive exPlanations
AutoML: Automated machine learning
SNP: Single nucleotide polymorphism
GEBV: Genomic estimated breeding value
QTL: Quantitative trait locus
GBLUP: Genomic best linear unbiased prediction
SVR: Support vector regression
KRR: Kernel ridge regression
RF: Random forest
TPOT: Tree-based pipeline optimization tool
MY: Milk yield
MFP: Milk fat percentage
SCS: Somatic cell score
EW: Egg weight
EWAFE: Egg weight at first egg
EW28: Egg weight at 28 weeks of age
EW36: Egg weight at 36 weeks of age
EW56: Egg weight at 56 weeks of age
EW66: Egg weight at 66 weeks of age
EW72: Egg weight at 72 weeks of age
EW80: Egg weight at 80 weeks of age

References

1. Meuwissen, T.H.; Hayes, B.J.; Goddard, M. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829.
2. De Roos, A.; Schrooten, C.; Veerkamp, R.; Van Arendonk, J. Effects of genomic selection on genetic improvement, inbreeding, and merit of young versus proven bulls. J. Dairy Sci. 2011, 94, 1559–1567.
3. Heffner, E.L.; Jannink, J.L.; Sorrells, M.E. Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome 2011, 4, 65.
4. Hayes, B.J.; Bowman, P.J.; Chamberlain, A.J.; Goddard, M.E. Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 2009, 92, 433–443.
5. Schaeffer, L. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 2006, 123, 218–223.
6. García-Ruiz, A.; Cole, J.B.; VanRaden, P.M.; Wiggans, G.R.; Ruiz-López, F.J.; Van Tassell, C.P. Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc. Natl. Acad. Sci. USA 2016, 113, E3995–E4004.
7. Voss-Fels, K.P.; Cooper, M.; Hayes, B.J. Accelerating crop genetic gains with genomic selection. Theor. Appl. Genet. 2019, 132, 669–686.
8. Sandhu, K.; Patil, S.S.; Pumphrey, M.; Carter, A. Multitrait machine-and deep-learning models for genomic selection using spectral information in a wheat breeding program. Plant Genome 2021, 14, e20119.
9. Crossa, J.; Pérez-Rodríguez, P.; Cuevas, J.; Montesinos-López, O.; Jarquín, D.; De Los Campos, G.; Burgueño, J.; González-Camacho, J.M.; Pérez-Elizalde, S.; Beyene, Y.; et al. Genomic selection in plant breeding: Methods, models, and perspectives. Trends Plant Sci. 2017, 22, 961–975.
10. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423.
11. Misztal, I.; Legarra, A.; Aguilar, I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 2009, 92, 4648–4655.
12. Daneshvar, A.; Mousa, G. Regression shrinkage and selection via least quantile shrinkage and selection operator. PLoS ONE 2023, 18, e0266267.
13. González-Recio, O.; Forni, S. Genome-wide prediction of discrete traits using Bayesian regressions and machine learning. Genet. Sel. Evol. 2011, 43, 1–12.
14. Yi, N.; Xu, S. Bayesian LASSO for quantitative trait loci mapping. Genetics 2008, 179, 1045–1055.
15. Meuwissen, T.; Hayes, B.; Goddard, M. Genomic selection: A paradigm shift in animal breeding. Anim. Front. 2016, 6, 6–14.
16. Pérez, P.; de Los Campos, G. Genome-wide regression and prediction with the BGLR statistical package. Genetics 2014, 198, 483–495.
17. Ge, X.J.; Hwang, C.C.; Liu, Z.H.; Huang, C.C.; Huang, W.H.; Hung, K.H.; Wang, W.K.; Chiang, T.Y. Conservation genetics and phylogeography of endangered and endemic shrub Tetraena mongolica (Zygophyllaceae) in Inner Mongolia, China. BMC Genet. 2011, 12, 1–12.
18. Montesinos López, O.A.; Montesinos López, A.; Crossa, J. Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer Nature: Berlin/Heidelberg, Germany, 2022.
19. Weigel, K.; VanRaden, P.; Norman, H.; Grosu, H. A 100-Year Review: Methods and impact of genetic selection in dairy cattle—From daughter–dam comparisons to deep learning algorithms. J. Dairy Sci. 2017, 100, 10234–10250.
20. Shahinfar, S.; Mehrabani-Yeganeh, H.; Lucas, C.; Kalhor, A.; Kazemian, M.; Weigel, K.A. Prediction of Breeding Values for Dairy Cattle Using Artificial Neural Networks and Neuro-Fuzzy Systems. Comput. Math. Methods Med. 2012, 2012, 127130.
21. Baratchi, M.; Wang, C.; Limmer, S.; van Rijn, J.N.; Hoos, H.; Bäck, T.; Olhofer, M. Automated machine learning: Past, present and future. Artif. Intell. Rev. 2024, 57, 122.
22. Olson, R.S.; Moore, J.H. TPOT: A tree-based pipeline optimization tool for automating machine learning. In Proceedings of the Workshop on Automatic Machine Learning, PMLR, New York, NY, USA, 24 June 2016; pp. 66–74.
23. Castagno, S.; Birch, M.; van der Schaar, M.; McCaskie, A. Predicting rapid progression in knee osteoarthritis: A novel and interpretable automated machine learning approach, with specific focus on young patients and early disease. Ann. Rheum. Dis. 2025, 84, 124–135.
24. SenthilKumar, G.; Madhusudhana, S.; Flitcroft, M.; Sheriff, S.; Thalji, S.; Merrill, J.; Clarke, C.N.; Maduekwe, U.N.; Tsai, S.; Christians, K.K.; et al. Automated machine learning (AutoML) can predict 90-day mortality after gastrectomy for cancer. Sci. Rep. 2023, 13, 11051.
25. Zhang, S.; Ma, J. CascadeDumpNet: Enhancing open dumpsite detection through deep learning and AutoML integrated dual-stage approach using high-resolution satellite imagery. Remote Sens. Environ. 2024, 313, 114349.
26. Azodi, C.B.; Tang, J.; Shiu, S.H. Opening the black box: Interpretable machine learning for geneticists. Trends Genet. 2020, 36, 442–455.
27. Usai, M.G.; Gaspa, G.; Macciotta, N.P.; Carta, A.; Casu, S. XVIth QTLMAS: Simulated dataset and comparative analysis of submitted results for QTL mapping and genomic evaluation. BMC Proc. 2014, 8, 1–9.
28. Zhang, Z.; Erbe, M.; He, J.; Ober, U.; Gao, N.; Zhang, H.; Simianer, H.; Li, J. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix. G3: Genes, Genomes, Genet. 2015, 5, 615–627.
29. Li, H.; Su, G.; Jiang, L.; Bao, Z. An efficient unified model for genome-wide association studies and genomic selection. Genet. Sel. Evol. 2017, 49, 1–8.
30. Hu, Z.L.; Park, C.A.; Wu, X.L.; Reecy, J.M. Animal QTLdb: An improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Res. 2013, 41, D871–D879.
31. Zhang, Z.; Ober, U.; Erbe, M.; Zhang, H.; Gao, N.; He, J.; Li, J.; Simianer, H. Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS ONE 2014, 9, e93017.
32. Liu, Z.; Sun, C.; Yan, Y.; Li, G.; Wu, G.; Liu, A.; Yang, N. Genome-wide association analysis of age-dependent egg weights in chickens. Front. Genet. 2018, 9, 128.
33. Habier, D.; Fernando, R.L.; Dekkers, J. The impact of genetic relationship information on genome-assisted breeding values. Genetics 2007, 177, 2389–2397.
34. Gelman, A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 2006, 1, 515–534.
35. Maenhout, S.; De Baets, B.; Haesaert, G.; Van Bockstaele, E. Support vector machine regression for the prediction of maize hybrid performance. Theor. Appl. Genet. 2007, 115, 1003–1013.
36. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016.
37. Christmann, A.; Xiang, D.; Zhou, D.X. Total stability of kernel methods. Neurocomputing 2018, 289, 101–118.
38. He, J.; Ding, L.; Jiang, L.; Ma, L. Kernel ridge regression classification. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 2263–2267.
39. Chen, B.W.; Abdullah, N.N.B.; Park, S.; Gu, Y. Efficient multiple incremental computation for Kernel Ridge Regression with Bayesian uncertainty modeling. Future Gener. Comput. Syst. 2018, 82, 679–688.
40. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
41. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
42. Zhang, P.; Liu, C.; Lao, D.; Nguyen, X.C.; Paramasivan, B.; Qian, X.; Inyinbor, A.A.; Hu, X.; You, Y.; Li, F. Unveiling the drives behind tetracycline adsorption capacity with biochar through machine learning. Sci. Rep. 2023, 13, 11512.
43. Xiang, T.; Li, T.; Li, J.; Li, X.; Wang, J. Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs. FASEB J. 2023, 37, e22961.
44. Elshawi, R.; Maher, M.; Sakr, S. Automated machine learning: State-of-the-art and open challenges. arXiv 2019, arXiv:1906.02287.
45. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
46. Radzi, S.F.M.; Karim, M.K.A.; Saripan, M.I.; Rahman, M.A.A.; Isa, I.N.C.; Ibahim, M.J. Hyperparameter tuning and pipeline optimization via grid search method and tree-based autoML in breast cancer prediction. J. Pers. Med. 2021, 11, 978.
47. Kiala, Z.; Odindi, J.; Mutanga, O. Determining the capability of the tree-based pipeline optimization tool (TPOT) in mapping parthenium weed using multi-date Sentinel-2 image data. Remote Sens. 2022, 14, 1687.
48. Mosca, E.; Szigeti, F.; Tragianni, S.; Gallagher, D.; Groh, G. SHAP-based explanation methods: A review for NLP interpretability. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 4593–4603.
49. Liang, M.; An, B.; Li, K.; Du, L.; Deng, T.; Cao, S.; Du, Y.; Xu, L.; Gao, X.; Zhang, L.; et al. Improving genomic prediction with machine learning incorporating TPE for hyperparameters optimization. Biology 2022, 11, 1647.
50. Li, M.; Hall, T.; MacHugh, D.E.; Chen, L.; Garrick, D.; Wang, L.; Zhao, F. KPRR: A novel machine learning approach for effectively capturing nonadditive effects in genomic prediction. Briefings Bioinform. 2025, 26, bbae683.
51. An, B.; Liang, M.; Chang, T.; Duan, X.; Du, L.; Xu, L.; Zhang, L.; Gao, X.; Li, J.; Gao, H. KCRR: A nonlinear machine learning with a modified genomic similarity matrix improved the genomic prediction efficiency. Briefings Bioinform. 2021, 22, bbab132.
52. Zingaretti, L.M.; Gezan, S.A.; Ferrão, L.F.V.; Osorio, L.F.; Monfort, A.; Muñoz, P.R.; Whitaker, V.M.; Pérez-Enciso, M. Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species. Front. Plant Sci. 2020, 11, 25.
53. Yin, L.; Zhang, H.; Zhou, X.; Yuan, X.; Zhao, S.; Li, X.; Liu, X. KAML: Improving genomic prediction accuracy of complex traits using machine learning determined parameters. Genome Biol. 2020, 21, 1–22.
54. Wang, X.; Shi, S.; Wang, G.; Luo, W.; Wei, X.; Qiu, A.; Luo, F.; Ding, X. Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs. J. Anim. Sci. Biotechnol. 2022, 13, 60.
55. Chafai, N.; Hayah, I.; Houaga, I.; Badaoui, B. A review of machine learning models applied to genomic prediction in animal breeding. Front. Genet. 2023, 14, 1150596.
56. Liu, B.; Liu, H.; Tu, J.; Xiao, J.; Yang, J.; He, X.; Zhang, H. An investigation of machine learning methods applied to genomic prediction in yellow-feathered broilers. Poult. Sci. 2025, 104, 104489.
57. Meyer, P.G.; Cherstvy, A.G.; Seckler, H.; Hering, R.; Blaum, N.; Jeltsch, F.; Metzler, R. Directedeness, correlations, and daily cycles in springbok motion: From data via stochastic models to movement prediction. Phys. Rev. Res. 2023, 5, 043129.
58. Long, N.; Gianola, D.; Rosa, G.J.; Weigel, K.A. Application of support vector regression to genome-assisted prediction of quantitative traits. Theor. Appl. Genet. 2011, 123, 1065–1074.
59. Millet, E.J.; Kruijer, W.; Coupel-Ledru, A.; Alvarez Prado, S.; Cabrera-Bosquet, L.; Lacube, S.; Charcosset, A.; Welcker, C.; van Eeuwijk, F.; Tardieu, F. Genomic prediction of maize yield across European environmental conditions. Nat. Genet. 2019, 51, 952–956.
Figure 1. End-to-end workflow of the ExAutoGP framework. It consists of three key components: (1) the data input module; (2) the core TPOT module, which combines data preprocessing, feature selection, model selection, hyperparameter optimization, and model evaluation to automatically and iteratively optimize the ML pipeline; and (3) extraction of the optimal pipeline and its interpretation with SHAP.
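For readers who wish to trace the three stages of Figure 1 in code, the sketch below is a minimal illustration assuming the classic TPOT API (TPOTRegressor, fitted_pipeline_) and SHAP's model-agnostic Explainer; the genotype matrix, search budget, and all settings are placeholders rather than the configuration used in this study.

```python
# A minimal sketch of the Figure 1 workflow, assuming the classic TPOT API
# (TPOTRegressor, fitted_pipeline_) and SHAP's model-agnostic Explainer.
# Data, search budget, and settings are placeholders, not the paper's setup.
import numpy as np
import shap
from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split

# (1) Data input: SNP genotypes coded 0/1/2 and a toy additive phenotype.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 100)).astype(float)
y = X[:, :10].sum(axis=1) + rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# (2) Core TPOT module: genetic programming searches over preprocessing,
# feature selection, model choice, and hyperparameters.
tpot = TPOTRegressor(generations=5, population_size=20, cv=5,
                     random_state=0, verbosity=0)
tpot.fit(X_train, y_train)

# (3) Extract the optimized pipeline and interpret it with SHAP.
best_pipeline = tpot.fitted_pipeline_
explainer = shap.Explainer(best_pipeline.predict, X_train[:50])
shap_values = explainer(X_test)
shap.summary_plot(shap_values, X_test)
```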
Figure 2. Comparison of prediction accuracies of SVR, KRR, RF, ExAutoGP, GBLUP, and BayesB on the simulated dataset. In each box plot, the center line marks the median correlation, the whiskers extend to the maximum and minimum non-outlier values, and points beyond the whiskers are outliers.
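The accuracies summarized in Figures 2–4 come from five repeats of five-fold cross-validation, with each fold scored by the Pearson correlation between predicted and observed phenotypes. The sketch below shows one way to implement that protocol with scikit-learn; the SVR baseline and the simulated data are illustrative assumptions, not the paper's tuned models or real datasets.

```python
# A sketch of the evaluation protocol behind Figures 2–4: five repeats of
# five-fold cross-validation, scoring each fold by the Pearson correlation
# between predicted and observed phenotypes. The SVR settings and simulated
# data are illustrative, not the paper's tuned configuration.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import RepeatedKFold
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 1000)).astype(float)  # SNPs coded 0/1/2
y = X[:, :20] @ rng.normal(size=20) + rng.normal(size=300)

rkf = RepeatedKFold(n_splits=5, n_repeats=5, random_state=1)
accuracies = []
for train_idx, test_idx in rkf.split(X):
    model = SVR(kernel="rbf", C=1.0).fit(X[train_idx], y[train_idx])
    r, _ = pearsonr(model.predict(X[test_idx]), y[test_idx])
    accuracies.append(r)

# The 25 fold-level correlations feed one box in the corresponding box plot.
print(f"mean r = {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```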
Figure 3. Comparison of prediction accuracies of SVR, KRR, RF, ExAutoGP, GBLUP, and BayesB on the Holstein cattle dataset. In each box plot, the center line marks the median correlation, the whiskers extend to the maximum and minimum non-outlier values, and points beyond the whiskers are outliers.
Figure 4. Comparison of prediction accuracies of SVR, KRR, RF, ExAutoGP, GBLUP, and BayesB on the chicken dataset. In each box plot, the center line marks the median correlation, the whiskers extend to the maximum and minimum non-outlier values, and points beyond the whiskers are outliers.
Figure 5. Comparison of computation time (in seconds) of SVR, KRR, RF, ExAutoGP, GBLUP, and BayesB on the three datasets.
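As a rough guide to how the per-model training times in Figure 5 can be measured, the sketch below times model fitting with time.perf_counter. The models, their default settings, and the random data are stand-ins; the paper's actual benchmark covers all six methods on the full datasets.

```python
# A rough sketch of timing model training as in Figure 5. Models, settings,
# and data are stand-ins, not the paper's benchmark code.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(500, 2000)).astype(float)
y = rng.normal(size=500)

for name, model in [("SVR", SVR()),
                    ("KRR", KernelRidge(alpha=1.0)),
                    ("RF", RandomForestRegressor(n_estimators=100,
                                                 random_state=3))]:
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```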
Figure 6. SHAP-based feature importance analysis of the EWAFE trait in the chicken dataset. (A) Feature importance ranking: the top 20 features, sorted in descending order by mean absolute SHAP value. (B) SHAP summary plot showing the distribution of SHAP values for each feature.
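The two panels of Figure 6 can be generated along the following lines, assuming a tree-based final model so that SHAP's TreeExplainer applies; the random forest, the simulated genotypes, and the SNP labels are illustrative assumptions, not the fitted ExAutoGP pipeline.

```python
# A sketch of the Figure 6 analysis, assuming a tree-based final model so
# that SHAP's TreeExplainer applies. The random forest, simulated genotypes,
# and SNP labels are illustrative assumptions.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(200, 300)).astype(float)
y = 0.8 * X[:, 5] + 0.5 * X[:, 42] + rng.normal(size=200)

model = RandomForestRegressor(n_estimators=200, random_state=2).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# (A) Rank markers by mean absolute SHAP value and keep the top 20.
mean_abs = np.abs(shap_values).mean(axis=0)
top20 = np.argsort(mean_abs)[::-1][:20]

# (B) Summary plot of per-sample SHAP values for the top-ranked markers.
shap.summary_plot(shap_values[:, top20], X[:, top20],
                  feature_names=[f"SNP_{i}" for i in top20])
```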
Table 1. Summary of the three datasets.

Dataset          Trait  N     SNPs     h²    Mean    SD
Simulation       T1     3000  10,000   0.36  0.00    176.52
                 T2     3000  10,000   0.35  0.00    9.51
                 T3     3000  10,000   0.52  0.00    0.02
German Holstein  MY     5024  42,551   0.95  370.79  641.60
                 MFP    5024  42,551   0.94  −0.06   0.28
                 SCS    5024  42,551   0.88  102.32  11.73
Chicken          EWAFE  1052  294,705  0.10  0.00    3.27
                 EW28   1063  294,705  0.50  57.19   3.47
                 EW36   1063  294,705  0.50  59.15   3.28
                 EW56   1027  294,705  0.51  65.83   4.25
                 EW66   960   294,705  0.51  80.85   4.34
                 EW72   847   294,705  0.54  60.97   3.53
                 EW80   852   294,705  0.29  62.33   5.07

Note: Trait: the specific phenotype or characteristic being measured. N: number of individuals with the phenotype. SNPs: the number of genetic markers used for analysis. h²: heritability of the trait. Mean: the average value of the trait. SD: standard deviation, reflecting the degree of dispersion of trait values.