Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization

Simple Summary: Machine learning has become a crucial tool for genomic prediction. However, the complicated process of tuning hyperparameters has tremendously hindered its application in actual breeding programs, especially for people without experience in hyperparameter tuning. In this study, we applied a tree-structured Parzen estimator (TPE) to tune the hyperparameters of machine learning methods. Overall, incorporating kernel ridge regression (KRR) with TPE achieved the highest prediction accuracy on both simulated and real datasets.

Abstract: Owing to its excellent prediction ability, machine learning has been considered one of the most powerful tools for analyzing high-throughput sequencing genome data. However, the sophisticated process of tuning hyperparameters impedes its wider application in animal and plant breeding programs. Therefore, we integrated an automatic hyperparameter tuning algorithm, the tree-structured Parzen estimator (TPE), with machine learning to simplify the process of using machine learning for genomic prediction. In this study, we applied TPE to optimize the hyperparameters of kernel ridge regression (KRR) and support vector regression (SVR). To evaluate the performance of TPE, we compared the prediction accuracy of KRR-TPE and SVR-TPE with that of genomic best linear unbiased prediction (GBLUP) and of KRR-RS, KRR-Grid, SVR-RS, and SVR-Grid, which tuned the hyperparameters of KRR and SVR using random search (RS) and grid search (Grid), on a simulated dataset and real datasets. The results indicated that KRR-TPE achieved the highest prediction accuracy across all populations and was the most convenient to use. In particular, for the Chinese Simmental beef cattle and Loblolly pine populations, the prediction accuracy of KRR-TPE showed an 8.73% and 6.08% average improvement over GBLUP, respectively. Our study will greatly promote the application of machine learning in GP and further accelerate breeding progress.


Introduction
Genomic selection (GS), which predicts the genomic estimated breeding value (GEBV) from whole-genome marker information, has been widely used in actual animal and plant breeding [1]. Compared with traditional selection methods that rely on progeny testing, GS tremendously accelerates the breeding process by predicting GEBV from genotypes [2]. The American Holstein dairy cattle population was the first breed in which GS was applied to select high-fitness individuals, a practice that has continued up to the present [3]. After more than ten years of development, GS has gradually been applied to the practical genetic improvement of pigs, horses, chickens, rice, and wheat, and it has significantly improved breeding [4][5][6][7][8][9][10]. Genomic best linear unbiased prediction (GBLUP) and BayesB are representative GS methods. The former directly estimates GEBV by constructing the genomic relationship matrix.

TPE:
TPE is a Bayesian optimization algorithm that optimizes hyperparameters automatically. The hyperparameter optimization performed by TPE can be represented as:

x^* = \arg\max_{x \in R} f_M(x), \quad (1)

where f_M(x) represents the objective score of model M to be maximized, x^* is the set of hyperparameters that yields the highest score, and x can take any value in the search space R.
In the optimization process of Equation (1), expected improvement (EI) is used as the criterion. EI is the expectation, under some model M of f : X \to R, of the improvement of f_M(x) relative to some threshold y^*:

EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p_M(y \mid x)\, dy. \quad (2)

In direct contrast to the Gaussian process and random search approaches, which model p_M(y \mid x) directly, TPE models p(x \mid y) and p(y), where p(x \mid y) can be written as:

p(x \mid y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \ge y^* \end{cases} \quad (3)

where l(x) is the density formed from the observations x^{(i)} whose corresponding objective score was lower than y^*, and g(x) is the density formed from the remaining observations. Applying Bayes' rule, p(y \mid x) = p(x \mid y)\, p(y) / p(x), and letting \gamma = p(y < y^*), the EI equation becomes:

EI_{y^*}(x) = \frac{\gamma\, y^*\, l(x) - l(x) \int_{-\infty}^{y^*} p(y)\, dy}{\gamma\, l(x) + (1 - \gamma)\, g(x)}. \quad (4)

Finally, the EI can be represented as:

EI_{y^*}(x) \propto \left( \gamma + \frac{g(x)}{l(x)} (1 - \gamma) \right)^{-1}. \quad (5)

Equation (5) indicates that EI is proportional to the ratio l(x)/g(x); therefore, to maximize EI, we should draw hyperparameter candidates that are more likely under l(x) than under g(x). TPE works by drawing sample hyperparameters, evaluating them in terms of l(x)/g(x), and returning the candidate x^* with the greatest EI at each iteration. In this study, TPE was implemented with the hyperopt Python package, which can be obtained at https://github.com/hyperopt/hyperopt, accessed on 29 September 2022. The process of using TPE to optimize the hyperparameters of KRR or SVR is demonstrated in Figure 1.

Figure 1. Optimization of the hyperparameters of KRR and SVR to construct the GP model using TPE. The working process can be divided into three steps: (1) select the ML algorithm and the dimension reduction method; in this study, the ML algorithm was KRR or SVR, and the dimension reduction method was principal component analysis; (2) determine the hyperparameters to be optimized, which include the number of principal components used to construct the prediction model and the hyperparameters of KRR (e.g., kernel, alpha) and SVR (e.g., kernel, gamma, C); and (3) optimize the hyperparameters using TPE and train the prediction model for GP.
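To make this workflow concrete, below is a minimal sketch of tuning KRR with TPE via the hyperopt package, following the three steps of Figure 1. The toy data, search space, and scoring function are illustrative assumptions, not the exact configuration used in this study.

```python
# Minimal sketch: tuning KRR hyperparameters with TPE via hyperopt.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

# Toy genotype matrix (n individuals x p markers, coded 0/1/2) and phenotype.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 500)).astype(float)
y = X[:, :10].sum(axis=1) + rng.normal(size=200)

# Step 2 of Figure 1: define the hyperparameter search space.
space = {
    "alpha": hp.loguniform("alpha", np.log(1e-3), np.log(1e3)),
    "kernel": hp.choice("kernel", ["linear", "rbf"]),
}

# Objective: hyperopt minimizes, so return the negative cross-validated
# score (5-fold R^2 here, as a stand-in for the study's accuracy metric).
def objective(params):
    model = KernelRidge(alpha=params["alpha"], kernel=params["kernel"])
    return -cross_val_score(model, X, y, cv=5).mean()

# Step 3: TPE proposes candidates that maximize expected improvement.
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=200, trials=trials)
print("Best hyperparameters found:", best)
```

Each call to tpe.suggest proposes the candidate favored by the l(x)/g(x) ratio of Equation (5), so the search concentrates on regions of the space that have produced good scores.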

PCA:
Principal component analysis (PCA) is a popular technique for analyzing large datasets with many dimensions per observation: it increases the interpretability of the data while preserving the maximum amount of information and enables the visualization of multidimensional data. In this study, we applied PCA to reduce the dimensionality of the input features, and the number of components (k) incorporated in SVR and KRR was optimized by TPE. The PCA procedure was performed in Python with the sklearn package.

KRR:
KRR utilizes the kernel trick (\varphi(x_i)) to construct a latent feature space and then builds a ridge regression model in that space. KRR is represented as:

f(x_i) = \varphi(x_i)^T \beta,

where \beta is the vector of weights. Regularized least squares is applied to optimize \beta as follows:

L_{KRR} = \min_{\beta} \frac{1}{2}\|\beta\|^2 + \frac{C}{2} \sum_{i=1}^{n} \left( y_i - \varphi(x_i)^T \beta \right)^2,

where C is the regularization constant. By taking the derivative of L_{KRR} with respect to \beta and equating the resulting equations to zero, the output weight vector \beta is obtained as:

\beta = \left( \Phi^T \Phi + \frac{I}{C} \right)^{-1} \Phi^T y,

where \varphi(x_i) is the instance in the latent feature space and I is an identity matrix. According to the representer theorem, \beta can be represented with \alpha as:

\beta = \sum_{i=1}^{n} \alpha_i \varphi(x_i) = \Phi^T \alpha,

thus:

\alpha = \left( K + \frac{I}{C} \right)^{-1} y,

where K is the kernel matrix whose entries are obtained as K_{ij} = \varphi(x_i)^T \varphi(x_j) = K(x_i, x_j). Finally, for a new test instance x_i, the predicted value is obtained using the dual weights and the similarity between the test sample x_i and all the training samples:

f(x_i) = k^T \alpha,
where k = [K(x_i, x_j)], j = 1, 2, \ldots, n. In this study, KRR was performed in Python with the sklearn package, and the hyperparameters that needed to be tuned are demonstrated in Table 2.
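The following minimal sketch shows the PCA-then-KRR model described above with sklearn; the values of k, alpha, and the kernel are placeholder assumptions standing in for the quantities that TPE tunes.

```python
# Minimal sketch of the PCA -> KRR pipeline from Figure 1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 500)).astype(float)  # toy SNP matrix
y = X[:, :10].sum(axis=1) + rng.normal(size=200)        # toy phenotype

# k principal components feed the kernel ridge regression; in the study,
# n_components, alpha, and the kernel were the hyperparameters TPE optimized.
model = Pipeline([
    ("pca", PCA(n_components=50)),
    ("krr", KernelRidge(alpha=1.0, kernel="rbf", gamma=1e-3)),
])
model.fit(X[:150], y[:150])
gebv = model.predict(X[150:])  # predicted values for the held-out individuals
```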
SVR:
SVR maps the input into a latent feature space and fits the regression function:

f(x) = h(x)^T \beta + \beta_0,

where h(x) is the kernel-induced basis expansion. For '\varepsilon-insensitive' SVM regression, the SVR problem is represented as:

\min_{\beta, \beta_0} \sum_{i=1}^{n} V_{\varepsilon}(y_i - f(x_i)) + \frac{\lambda}{2}\|\beta\|^2,

where V_{\varepsilon}(r) is the '\varepsilon-insensitive' error measure: the loss is counted only when the absolute error between f(x_i) and y_i of the i-th instance is larger than \varepsilon,

V_{\varepsilon}(r) = \begin{cases} 0 & \text{if } |r| < \varepsilon \\ |r| - \varepsilon & \text{otherwise,} \end{cases}

\lambda is a positive constant, and \|\cdot\|^2 denotes the squared norm in a Hilbert space.
The SVR solution can be written as:

f(x) = \sum_{i=1}^{n} (\hat{\alpha}_i^* - \hat{\alpha}_i)\, K(x, x_i),

where \hat{\alpha}_i^* and \hat{\alpha}_i are positive weights given to each observation and estimated from the data, and the inner-product kernel K(x_i, x_j) forms an n \times n symmetric, positive definite matrix. All SVR procedures were performed in Python with the sklearn package, and the hyperparameters to be optimized are shown in Table 2.
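A minimal sklearn sketch of the \varepsilon-insensitive SVR described above follows; the kernel, C, gamma, and epsilon values are placeholders that TPE, RS, or Grid would normally select.

```python
# Minimal sketch of epsilon-insensitive SVR for genomic prediction.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(200, 500)).astype(float)  # toy SNP matrix
y = X[:, :10].sum(axis=1) + rng.normal(size=200)        # toy phenotype

# Only observations outside the epsilon tube receive nonzero dual weights
# (the alpha_i of the solution above); C penalizes those errors.
svr = SVR(kernel="rbf", C=10.0, gamma=1e-3, epsilon=0.1)
svr.fit(X[:150], y[:150])
pred = svr.predict(X[150:])
```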
GBLUP: GBLUP estimates GEBVs using phenotypes and genomic relationships calculated from whole-genome marker information [43]. In GBLUP, the effect of each SNP is assumed to follow an identical normal distribution [44]. The GBLUP model is as follows:

y^* = Z\gamma + e,

where y^* is the vector of corrected phenotypes and Z is an incidence matrix for individual effects. \gamma \sim N(0, G\sigma_g^2) is the vector of breeding values, where \sigma_g^2 is the genetic variance; e \sim N(0, I\sigma_e^2) is the vector of residuals, where I is an identity matrix and \sigma_e^2 is the residual variance. The G matrix was calculated as:

G = \frac{MM^T}{2\sum_i p_i(1 - p_i)},

where M is the centered genotype matrix and p_i is the MAF of the i-th marker. This study performed GBLUP using the R software with the rrBLUP package.
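For comparison with the ML sketches above, the following is a minimal numpy sketch of GBLUP under assumed, fixed variance components; the study itself used the rrBLUP R package, which also estimates \sigma_g^2 and \sigma_e^2 rather than taking them as given.

```python
# Minimal numpy sketch of GBLUP with assumed variance components.
import numpy as np

rng = np.random.default_rng(3)
M = rng.integers(0, 3, size=(200, 500)).astype(float)  # 0/1/2 genotypes
y = M[:, :10].sum(axis=1) + rng.normal(size=200)       # toy phenotype

# G matrix: center genotypes by twice the allele frequency and scale
# by 2 * sum(p_i * (1 - p_i)), as in the formula above.
p = M.mean(axis=0) / 2.0
Zc = M - 2.0 * p
G = Zc @ Zc.T / (2.0 * np.sum(p * (1.0 - p)))

# BLUP of breeding values with an assumed ratio lam = sigma2_e / sigma2_g:
# gamma_hat = G (G + lam I)^(-1) (y - mean(y)).
lam = 1.0
n = len(y)
gebv = G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())
```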

Assessing Prediction Performance
This study quantified prediction accuracy as the Pearson correlation coefficient between the predicted GEBV and the phenotypes. The prediction accuracy reported in this paper is the mean of the fifty values produced by ten repeats of five-fold cross-validation for each trait. The Pearson correlation coefficient was calculated as follows:

r = \frac{\mathrm{cov}(\hat{g}, y^*)}{\sigma_{\hat{g}}\, \sigma_{y^*}},

where \hat{g} is the predicted GEBV and y^* is the phenotype.
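A minimal sketch of this evaluation scheme, with a placeholder model and simulated data, might look as follows.

```python
# Minimal sketch: ten repeats of five-fold cross-validation, reporting the
# mean Pearson correlation over the fifty test folds.
import numpy as np
from scipy.stats import pearsonr
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import RepeatedKFold

rng = np.random.default_rng(4)
X = rng.integers(0, 3, size=(200, 500)).astype(float)  # toy SNP matrix
y = X[:, :10].sum(axis=1) + rng.normal(size=200)        # toy phenotype

model = KernelRidge(alpha=1.0, kernel="rbf", gamma=1e-3)  # placeholder model
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

accs = []
for train, test in cv.split(X):
    model.fit(X[train], y[train])
    r, _ = pearsonr(model.predict(X[test]), y[test])  # accuracy = r(GEBV, y*)
    accs.append(r)
print(f"prediction accuracy: {np.mean(accs):.3f} over {len(accs)} folds")
```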
To test the statistical significance of the observed differences between the methods, Friedman's and Nemenyi post hoc tests were performed in Python with the Orange and Scipy packages. Friedman's test operates on the average ranks of the methods and checks the null hypothesis that all methods are equivalent. The Nemenyi post hoc test was then used to find which methods in particular differed from GBLUP.
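As a minimal illustration of the Friedman step with scipy (the accuracy values below are hypothetical, and the Nemenyi post hoc test performed via the Orange package is omitted):

```python
# Minimal sketch of Friedman's test over per-trait prediction accuracies.
import numpy as np
from scipy.stats import friedmanchisquare

# Rows: traits/datasets; one array per method -- illustrative numbers only.
gblup   = np.array([0.41, 0.35, 0.44, 0.29, 0.52])
krr_tpe = np.array([0.45, 0.38, 0.47, 0.31, 0.55])
svr_tpe = np.array([0.40, 0.34, 0.43, 0.28, 0.51])

stat, p = friedmanchisquare(gblup, krr_tpe, svr_tpe)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
# A small p-value rejects the null hypothesis that all methods are
# equivalent; a post hoc test then locates the differing methods.
```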

Results
Simulation dataset: First, we compared the performance of KRR and SVR, with hyperparameters tuned by TPE, RS, and Grid, against GBLUP on the simulated dataset (Table 3). The simulated dataset, QTLMAS 16th, includes three traits with heritabilities of 0.45, 0.38, and 0.48, respectively. Table 3 summarizes the prediction accuracy of each strategy. The results show that the average prediction accuracy of GBLUP is similar to that of KRR-TPE, while the prediction accuracy of SVR is slightly lower. For KRR, using TPE to optimize the hyperparameters achieved the highest accuracy, although KRR-RS also performed well, on average only 0.24% lower than KRR-TPE. The performance of SVR was unsatisfactory on this dataset: it not only had lower prediction accuracy for T1 and T2 but also could not be applied to the GEBV prediction of T3. Note: the prediction accuracy of each method was assessed by the Pearson correlation coefficient between the predicted GEBV and the phenotypes of each trait. Five-fold cross-validation repeated ten times was used to ensure the high credibility of the results.
Chinese Simmental beef cattle: In addition to the simulated data, we evaluated the prediction ability of each strategy on real animal and plant datasets from beef cattle, dairy cattle, pigs, and pine. We compared the average prediction accuracy of each method in Table 4. The results indicate that KRR-TPE predicted more accurate GEBV than KRR-RS, KRR-Grid, and GBLUP, improving prediction accuracy by 1.67% (−0.47% to 6.12%), 10.36% (8.73% to 13.08%), and 8.73% (3.40% to 17.88%), respectively. Although the performance of SVR did not meet expectations based on reported studies, the prediction ability of SVR-TPE and SVR-RS was comparable with that of GBLUP for the Chinese Simmental beef cattle population. Note: prediction accuracy was measured by the Pearson correlation coefficient, calculated with five-fold cross-validation repeated ten times; the prediction accuracy in Table 4 is thus the mean of fifty Pearson correlation coefficients between the GEBV predicted by each method and the phenotypes.
Loblolly Pine: The Loblolly pine dataset included eight quantitative traits, with heritabilities ranging from 0.12 to 0.43. Figure 2 compares the prediction accuracy of each method for these traits; across them, KRR-TPE achieved an average improvement of 6.08% over GBLUP.

German Holstein population and pig population:
Moreover, we calculated the average prediction accuracy of each method in the German Holstein and pig populations, both of which were genotyped with chips at a marker density of about 50 K. Figure 3 compares the predictive ability of the various methods in the two populations. For the pig population (Figure 3A), except for KRR-Grid, which performed worse than GBLUP by an average of 5.63%, the methods were comparable with GBLUP, and no method outperformed the others. Figure 3B reports the predictive performance in the German Holstein population, which was similar to that in the pig population; there was no obvious difference in the averages of the methods.
Figure 3. Prediction accuracy was assessed by the Pearson correlation coefficient between predicted GEBV and phenotypic values with five-fold cross-validation repeated ten times.
General evaluation: We used Friedman's test and the Nemenyi post hoc test to assess the statistical significance of the observed differences across all datasets; the results are shown in Figure 4. The p-value of Friedman's test was 0.003, meaning that the average ranks of the methods were not equivalent. According to the Nemenyi post hoc test, the average rank of KRR-TPE differed significantly from that of GBLUP, while the average ranks of the other methods did not differ significantly. Subsequently, we compared the computation time of GBLUP, KRR-TPE, and SVR-TPE (Figure 5); the results indicated that KRR-TPE was slower than GBLUP but faster than SVR-TPE.


Discussion
Since ML entered the twenty-first century, its theory and applications have developed greatly. Breeders desire to predict more accurate GEBV with ML, further accelerating genetic gain in animal and plant breeding. Before constructing a prediction model, the hyperparameters must be specified, and they determine whether a machine learning algorithm can effectively mine the information in high-throughput genome data. Most reported studies that used ML to predict GEBV tuned hyperparameters manually, which is cumbersome, relies on rich experience, and is thus not friendly to novices. Therefore, there is an urgent need for an uncomplicated hyperparameter optimization strategy to promote the application of ML in GP. We therefore simplified the process of predicting GEBV by using TPE, an automatic hyperparameter tuning strategy that has performed well in reported studies [30][31][32][33]. The performance of KRR and SVR, tuned by TPE, RS, and Grid separately, was assessed on the simulated dataset and four real datasets. Overall, KRR-TPE outperformed the other methods.
Typically, grid search sets a series of initial values for each hyperparameter and then evaluates the prediction ability of the estimator under every combination, selecting the most robust hyperparameter configuration. Although grid search works well in most cases, it can only guarantee a relatively optimal configuration because it searches a discrete grid rather than the complete parameter space. RS is essentially similar to grid search, except that it randomly samples combinations from the hyperparameter space and repeats this process; given enough repetitions, it can approach the best configuration. To a certain extent, RS is capable of tuning hyperparameters automatically, but it seems impractical for genomic prediction because it is time-consuming. Although grid search and RS may find hyperparameters relatively suitable for the prediction model, both share this obvious disadvantage and are still not intelligent enough.
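As a concrete illustration, the following sketch contrasts the two strategies with sklearn on a toy KRR task; the search spaces are illustrative assumptions.

```python
# Minimal sketch contrasting grid search and random search for KRR.
import numpy as np
from scipy.stats import loguniform
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(5)
X = rng.integers(0, 3, size=(200, 500)).astype(float)  # toy SNP matrix
y = X[:, :10].sum(axis=1) + rng.normal(size=200)        # toy phenotype

# Grid search: exhaustively evaluates a fixed, discrete grid.
grid = GridSearchCV(
    KernelRidge(kernel="rbf"),
    {"alpha": [0.01, 0.1, 1, 10], "gamma": [1e-4, 1e-3, 1e-2]},
    cv=5,
).fit(X, y)

# Random search: samples the continuous space, here 50 draws.
rand = RandomizedSearchCV(
    KernelRidge(kernel="rbf"),
    {"alpha": loguniform(1e-3, 1e2), "gamma": loguniform(1e-5, 1e-1)},
    n_iter=50, cv=5, random_state=0,
).fit(X, y)
print(grid.best_params_, rand.best_params_)
```

Neither strategy uses the history of past evaluations to guide the next candidate, which is exactly what TPE adds, as discussed next.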
The TPE algorithm is designed to optimize hyperparameters with a smarter strategy, finding a configuration that achieves the expected accuracy target. TPE is an iterative process that uses the history of evaluated hyperparameters to build a probabilistic model via Bayes' rule, which then suggests the next hyperparameter configuration to evaluate. It is therefore highly performant and very time-efficient compared with RS and grid search [45]. Our results again proved that TPE is an excellent automatic hyperparameter tuning strategy: according to the results of the general evaluation section, KRR-TPE was the only method significantly superior to GBLUP.
As for why SVR-TPE did not outperform SVR-Grid, we analyzed the process of tuning the hyperparameters of SVR and believe there are two main determinants. First, to ensure the consistency of experimental results and keep computation time acceptable for an actual breeding program, the number of iterations for optimizing the hyperparameters of SVR-TPE and KRR-TPE was set to only 200. SVR has more hyperparameters than KRR, and insufficient iterations might have prevented SVR-TPE from achieving higher prediction accuracy. Second, our team had previously accumulated considerable experience in SVR hyperparameter optimization for GP; consequently, we could easily achieve high SVR prediction accuracy by manually tuning hyperparameters, but this is not realistic for most breeders and experimenters. Therefore, we have reason to believe that TPE has great potential to promote the wide application of ML in GP.
Although the computing speed of computers has improved significantly, it remains a major obstacle to the wide application of ML in GP. To further tap the potential of automatic hyperparameter tuning and promote the application of ML in actual animal breeding, running the GP program on a graphics processing unit (GPU) might be a practical tactic. From our perspective, only by combining computer science, mathematics, and biology can we fully parse high-throughput genome information and achieve a breakthrough in animal breeding.

Conclusions
In conclusion, we integrated KRR with the tree-structured Parzen estimator, which optimizes hyperparameters automatically, to simplify the use of machine learning for genomic prediction. The results indicate that incorporating KRR with TPE can greatly and conveniently improve the prediction accuracy of GEBV. In particular, for the Chinese Simmental beef cattle and Loblolly pine populations, KRR-TPE achieved average improvements in prediction accuracy of 8.73% and 6.08% over GBLUP, respectively. Our study also promotes the wider application of machine learning in actual animal and plant breeding.