Genomic Selection for Milk Production Traits in Xinjiang Brown Cattle

Simple Summary Milk production is an important trait in the breeding and genetic improvement of Xinjiang Brown cattle. To obtain the best strategy for improving the reliability of the breeding value estimation for each trait, we used single-trait and multitrait models based on the A-array pedigree-based best linear unbiased prediction (PBLUP) and H-array single-step genomic best linear unbiased prediction (ssGBLUP) to perform the genetic evaluation of different strategies using the restricted maximum likelihood (REML) and Bayesian methods. Upon comparison, the ssGBLUP calculation results of the multitrait models obtained using the REML and Bayesian methods were better than those of other strategies. Considering the calculation time, the multitrait model REML method is recommended for ssGBLUP calculation to accurately predict the breeding value of young animals; thus, this strategy should be used for the early breeding selection of Xinjiang Brown cattle. Abstract One-step genomic selection is a method for improving the reliability of the breeding value estimation. This study aimed to compare the reliability of pedigree-based best linear unbiased prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP), single-trait and multitrait models, and the restricted maximum likelihood (REML) and Bayesian methods. Data were collected from the production performance records of 2207 Xinjiang Brown cattle in Xinjiang from 1983 to 2018. A cross test was designed to calculate the genetic parameters and reliability of the breeding value of 305 daily milk yield (305 dMY), milk fat yield (MFY), milk protein yield (MPY), and somatic cell score (SCS) of Xinjiang Brown cattle. The heritability of 305 dMY, MFY, MPY, and SCS estimated using the REML and Bayesian multitrait models was approximately 0.39 (0.02), 0.40 (0.03), 0.49 (0.02), and 0.07 (0.02), respectively. The heritability and estimated breeding value (EBV) and the reliability of milk production traits of these cattle calculated based on PBLUP and ssGBLUP using the multitrait model REML and Bayesian methods were higher than those of the single-trait model REML method; the ssGBLUP method was significantly better than the PBLUP method. The reliability of the estimated breeding value can be improved from 0.9% to 3.6%, and the reliability of the genomic estimated breeding value (GEBV) for the genotyped population can reach 83%. Therefore, the genetic evaluation of the multitrait model is better than that of the single-trait model. Thus, genomic selection can be applied to small population varieties such as Xinjiang Brown cattle, in improving the reliability of the genomic estimated breeding value.


Introduction
Xinjiang Brown cattle was the first breed of cattle used for milk and meat purposes after the founding of the People's Republic of China [1]. The breeding industry of Xinjiang Brown cattle accounts for a large proportion of the local economic development as well as farmers' and herders' income. In 2018, the population of Xinjiang Brown cattle reached 1.5 million [2]. However, compared with the Holstein cattle, genetic improvement technologies for Xinjiang Brown cattle are relatively behind. Xinjiang Brown cattle are a unique species in Xinjiang, where for a long time, the breeding value for the milk production traits of this cattle breed was estimated using pedigree-based best linear unbiased prediction (PBLUP) through the construction of the additive genetic relationship matrix (A matrix) [1]. However, with the reduction in sequencing cost, cattle breeding has entered the genome era [3].
The large-scale breeding of Xinjiang Brown cattle population is limited, and the number of the grazing or semihouse feeding and semi-grazing population is high. Although the production performance database of core breeding farms has been preliminarily established, the limitations of Xinjiang Brown cattle production performance data, coupled with the lack of pasture management and the presence of a significant amount of truncated data, have led to the genetic evaluation of Xinjiang Brown cattle in failing to achieve better reliability in terms of genomic estimated breeding value, thus influencing the accurate selection of excellent breeding animals. Compared with PBLUP, genome prediction can reduce the breeding cost and generation interval by at least 50% [4], leading to faster genetic progression. Genomic selection (GS) technology can also improve the reliability of genetic evaluation; therefore, it is imperative to apply it in genetically evaluating Xinjiang Brown cattle. A study on the GS of cattle mainly focuses on the reliability of models, methods, and genomic estimated breeding values [4]. On the basis of a previous calculation of genomic best linear unbiased prediction (GBLUP) using the genetic-pedigree joint relationship matrix (G matrix), single-step GBLUP (ssGBLUP) was proposed [5]. Furthermore, it is more comprehensive than GBLUP in genetically evaluating individuals [6]. ssGBLUP can simultaneously use the phenotype and pedigree information of genotype and nongenotype animals [7,8], and combine the A and G matrices to construct a genetic-pedigree relationship matrix (H matrix) [9][10][11] to perform a one-step genetic evaluation. For example, Li et al. [12] used a one-step method to estimate the milk production trait of Holstein cattle in China and increased the accuracy of estimation by 0.12 compared with the two-step GBLUP method. However, the study also found that adjusting the proportion of G and A matrices in the one-step method did not improve reliability [13,14]. Studies using different livestock species further confirmed that ssGBLUP was more reliable than BLUP and GBLUP in estimating GEBV [15][16][17][18].
The multitrait BLUP (MBLUP) has been studied for decades. It is a method for the genetic evaluation of individuals for two or multiple traits using information such as the phenotypic and genetic correlations of traits. An advantage of MBLUP [19] is the increased accuracy of genetic evaluation. The multitrait genetic evaluation of the REML and Bayesian methods is widely used in cattle breeding [20]. Studies have shown that the accuracy of the breeding value estimated by the multitrait model is twice that of PBLUP [21]. The accuracy of estimation based on the G-matrix multitrait model was also 3% higher than that of the single-trait model [22], where many studies have reported the advantages of multitrait genetic evaluation or GS [23][24][25][26].
Therefore, this study aimed to use the REML and Bayesian methods to estimate the genetic parameters of the milk production traits of Xinjiang Brown cattle using the A and H matrices, respectively, with a goal to ultimately obtain the variance components and genomic breeding values of each trait of the single-and multitrait models; this will improve the reliability of the estimated values and provide theoretical support for a more accurate estimation of the genetic parameters of the Xinjiang Brown cattle population.

Data Source and Processing
Data of Xinjiang Brown cattle were obtained from 7516 production performance measurement records and 16 Effect Division: The field effect was divided into four levels, where the calving year was divided into seven levels according to phenotypic records: 1985-1995, 1996-2000, 2001-2005, 2006-2008, 2009-2011, 2012-2014, and 2015-2018; the calving season effect, according to the unique climatic conditions in Xinjiang, was divided into spring (April and May), summer (June, July, and August), autumn (September), and winter (January, February, March, October, November, and December) using the pentad mean temperature method; the birth order effect was divided into six levels: 1, 2, 3, 4, 5, and 6 (including >6 births).

Genotyping Data
The chip data included 403 Xinjiang Brown cattle cows, 71 Xinjiang Brown cattle bulls, and 11 Swiss brown cattle bulls, where a total of 139,376 single nucleotide polymorphism (SNP) loci were detected using the GeneSeek GGP Bovine 150 K chip (Illumina). The Beagle v.4.1 software was used for imputation, which mainly inferred the presence of haplotypes in the population based on the principle of linkage disequilibrium. Therefore, the quality of the chip data needed to be controlled to ensure the accuracy of imputation. Quality control standard: the SNP loci with an individual genotyping detection rate of <90%, individual genotype deletion rate of <10%, minor allele frequency of >0.01, and Hardy Weinberg equilibrium p-value of >1 × 10 −6 were excluded. Finally, 118,021 SNP loci of Xinjiang Brown cattle were retained for subsequent analysis.

Estimation of Genetic Parameters of the Single-Trait Model
Different relationship matrices (A or H matrix) were constructed using single-trait models, PBLUP and ssGBLUP, and the genetic parameters of milk production traits of Xinjiang Brown cattle were estimated using the REML method.
Matrix expression formula of each trait model is shown below: where Y is the observed value vector of each trait (including 305 dMY, MFY, MPY, and SCS), b is the fixed effect vector (field, calving year, calving season, and birth order), a is the random additive genetic effect vector, e is the residual effect vector, X is the fixed effect coefficient matrix, and Z is the random additive genetic effect coefficient matrix. For PBLUP [27] or ssGBLUP [28], assume a ∼ N 0, A or Hσ 2 a , e ∼ N 0, Iσ 2 e , where A is the additive genetic relationship matrix based on the pedigree, H is the genomicpedigree joint relationship matrix, σ 2 a represents additive genetic variance, I is the unit matrix, and σ 2 e represents the variance of random error effect. With given A and G values, H is calculated using the following formula: where subscripts 1 and 2 of A represent the populations of non-genotyped and genotyped animals, respectively; G is the genomic relationship matrix, and its calculation formula ; M is the incidence matrix of SNP effect, where elements 0 − 2p j , 1 − 2p j , and 2 − 2p j represent 11 homozygotes, 12 or 21 heterozygotes, and 22 homozygotes, respectively; p j is the minimum allele frequency of the jth SNP; m is the number of markers; and p k is the allele frequency of the Kth SNP. Therefore, the formula for H −1 is , where A −1 is the inverse matrix of all pedigree relationships; G −1 is the inverse matrix of genomic relationship; and A −1 22 is the inverse matrix of the pedigree relationship of genotyped individuals (the same letter in the multitrait model represents the same meaning).

Estimation of Genetic Parameters of the Multitrait Model
The REML and Bayesian methods were used on the multitrait animal model to perform genetic parameter estimation based on PBLUP and ssGBLUP, where the matrix form of the multitrait model [29] is as follows: where y i is the observed value vector of all individuals, b i is the fixed effect vector of the ith trait, a i is the additive genetic effect vector of the ith trait, e i is the residual random effect vector of the ith trait, and X i and Z i are the relationship matrices of β and a i , respectively. Assume where A is the additive genetic effect matrix; σ 2 a i and σ 2 e i are the additive genetic variance and random error effect variance of the ith trait, respectively; σ a i a j and σ e i e j (i = j) are the additive genetic covariance and random error effect covariance between the ith and jth traits, respectively. The formula for H −1 is The REML method of the REMLF90 module of BLUPF90 software [30] was used to estimate the variance components. After the calculation results converged, AIREMLF90 was used to run for 0 times and the approximate standard errors of all calculation parameters were obtained based on the algorithm implemented by Meyer and Houle in the AIREMLF90 program. As an alternative of SE, calculate SD for function of (co)variances by repeated sampling of parameter estimates from their asymptotic multivariate normal distribution, following ideas presented by Meyer and Houle 2013 [31]. The Bayesian method was followed using the GIBBS1F90 module in the BLUPF90 software and the Bayesian Gibbs sampling method. In the Bayesian method, the total strand length of samples was 100,000 and the length of the preheating strand was 10,000, with a sparse interval of 50. The Geweke diagnostics method of POSTGIBBSF90 module of BLUPF90 software was used to check the convergence of the Gibbs chain.

Calculation of Heritability
The calculation formula of heritability is as follows: where h 2 is heritability, σ 2 a is the additive genetic variance, σ 2 pe is the permanent environmental variance of individual, and σ 2 e is the residual variance. The formula for the standard error of heritability is shown below: where SE 2 h 2 is the standard error of heritability, σ 2 a is the additive genetic variance, and σ 2 p is the overall phenotypic variance, σ 2

Reliability of Breeding Value Estimation
The definition of reliability of GEBV was: where a and ε are independent, R 2 GEBV is the reliability of genomic estimated breeding value, Cor 2 (GEBV, a) is the square of the correlation coefficient between genomic estimated breeding and actual values. The reliability of animal breeding values was calculated according to prediction error variance as where PEV is the prediction error variance and V a is the direct additive genetic variance. Table 1 lists the sample number of observed values, minimum value, maximum value, mean value, standard deviation, and coefficient of variation of the milk production and reproductive traits of Xinjiang Brown cattle. Figure 1 shows that the data frequency distribution of the milk production traits of Xinjiang Brown cattle followed a normal distribution. The average milk production of Xinjiang Brown cattle in 305 days was 4216.49 kg, which was similar to that of Sanhe cattle and Chinese Simmental cattle but lower than that of dual-purpose cows such as European Fleckvieh cattle and Swiss brown cattle [32]. The mean MFY and MPY of the population were 168.53 kg and 143.79 kg, respectively, which were lower than those of Fleckvieh cattle and Swiss Brown cattle [33], and similar to those recently reported Italian Simmental cattle [34] SCS can be used as an indicator of breast health, where the lower the SCS value, the lower the risk for mastitis.

Descriptive Statistical Analysis of Each Trait
These traits can reflect the production efficiency and health status of dairy cows and are also important goals for the breeding of Xinjiang Brown cattle.   Table 2 shows the estimation results of the genetic parameters for the milk production traits of Xinjiang Brown cattle obtained using the single-and multitrait models based on PBLUP and ssGBLUP using the REML or Bayesian methods. The heritability of milk production traits of Xinjiang Brown cattle that was calculated on the basis of PBLUP and ssGBLUP using the multitrait models REML and Bayesian methods was higher than those of each trait calculated using the single-trait model REML method; moreover, the standard error of heritability of the former model was lower than that of the latter. However, the heritability results of the milk production traits of Xinjiang Brown cattle calculated using the REML and Bayesian methods based on PBLUP and ssGBLUP of the multitrait model were similar; Gibbs parameter chains of additive genetic variance of milk production traits under different relationship matrices were obtained using the Bayesian method. Geweke diagnostic test results show that the convergence trend of iteration curve indicates that the parameter chain converges. The heritability of 305 dMY, MFY, and MPY estimated by the two methods in the multitrait model was approximately 0.39 (0.02), 0.40 (0.03), and 0.49 (0.02), respectively, which indicated high heritability traits (h 2 > 0.3). However, the heritability of SCS was approximately 0.07 (0.02), which suggested a low heritability trait (h 2 < 0.1). Table 3 compares the estimated breeding value (EBV), GEBV reliability, and increased reliability (∆reliability) of the overall and genotyped populations of Xinjiang Brown cattle that were calculated via different methods and models. Results showed that the reliability of GEBV calculated using ssGBLUP was higher than that of EBV calculated using PBLUP in the overall and genotyped populations regardless of whether it was based on the single-or multitrait model. Comparing with the multi-and single-trait models of the REML method, the reliability of EBV estimated using the multitrait model was higher than that estimated using the single-trait model; the reliability of GEBV estimated using the multitrait model was higher than that estimated using the single-trait model. However, comparing the REML and Bayesian methods of the multitrait model, the reliability of EBV and GEBV of SCS calculated using the Bayesian method was slightly higher than that calculated using the REML method in the overall and genotyped populations. In contrast, the EBV and GEBV reliability results of other traits were similar (Figure 2). In all results, the ∆reliability of the genotyped population was higher than that of the overall population and the ∆reliability of the Bayesian method was slightly higher than that of the REML method. Table 2. Variance components and heritability of milk traits obtained using the pedigree relationship matrix (pedigree-based best linear unbiased prediction (PBLUP)) and combined genomic-pedigree matrix (single-step genomic best linear unbiased prediction (ssGBLUP)) (SE of variance components and heritability reported in parentheses).   Figure 2. Density map of the reliability of GEBV using different methods of the single-trait and multitrait models. M_GIBBS_H is the reliability of GEBV using the Bayesian method of the multitrait model, M_REML_H is the reliability of GEBV using the REML method of the multitrait model, and S_REML_H is the reliability of GEBV using the REML method of the single-trait model.

Analysis of Genetic Parameters of Milk Production Traits
In this study and the one by Zhou Jinghang, the heritability of 305 dMY and SCS estimated using the REML method of the DMU software based on the pedigree data of Xinjiang Brown cattle was 0.40 (0.017) and 0.08 (0.009), respectively, which were both higher than the heritability results of PBLUP calculated using the REML and Bayesian methods in this study; the standard error of the former was lower. However, the heritability results were similar to those of ssGBLUP calculated using the REML and Bayesian methods in this study. The estimated heritability of MFY and MPY was 0.30 (0.013) and 0.20 (0.011), respectively, which were significantly lower than the results estimated using the REML and Bayesian methods for different genetic relationship matrices of the multitrait model in this study [35]. In addition, the heritability of somatic scoring was lower than that of the American and Italian brown cattle populations (0.12 [36] and 0.14 [37], respectively).
In this study, the heritability obtained using the REML and Bayesian methods in the multitrait model was higher than that in the single-trait model. Furthermore, in all model methods, except for the decrease in SCS, the heritability of other traits calculated based on ssGBLUP was slightly higher than that calculated based on PBLUP [29]; the standard errors of the heritability of the two were similar. The heritability in genetic parameter estimation based on the G matrix was lower than that based on the A matrix [38,39]. During ssGBLUP estimation, the slight increase in heritability was mainly reflected in the increase in additive genetic variance components [40,41]. The additive genetic variance difference between PBLUP and ssGBLUP estimations was primarily due to the construction of the A and H matrices being different, and the scale of diagonal elements was also different [42]. For example, the individual breeding value, a i = 0.5(a s + a d ) + m i , where a s is the breeding value of the father, a d is the breeding value of the mother, and m i is the Mendelian sampling deviation of individuals. For the A matrix, its diagonal is the expected value of the genetic relationship coefficient between individuals. In contrast, the diagonal in the G matrix is the real genetic relationship coefficient between individuals considering the Mendelian sampling deviation [43,44]. For example, when using paternal or maternal phenotypic information to estimate the breeding value, if the genetic relationship of the A matrix is 0.5, the genetic relationship of the G matrix will be ≥0.5 or ≤0.5; the H matrix combines the A and G matrices. Several studies have shown that GBLUP and ssGBLUP are better than PBLUP. The main reason is that the G matrix shows a more realistic genetic relationship between individuals than the A matrix [44][45][46]. Therefore, the parameter estimation of the A and H matrices may have deviations [47]. In addition, many studies conducted in China and overseas have confirmed that the REML method is relatively ideal for estimating genetic parameters of livestock breeding [48]. As the Bayesian method can consider prior information of unknown parameters and provide accurate posterior distribution for a limited sample size [49], it is favorable for the genetic evaluation of small breed populations when there is a large amount of historical data available [50]; however, its calculation time is relatively long.

Predictive Analysis of EBV and GEBV Reliability
When using the REML method of the single-trait model, only the reliability of the breeding value of 305 dMY was >0.4. The reliability of the breeding value of milk production traits estimated based on PBLUP and ssGBLUP of the REML and Bayesian methods in the multitrait model ranged from low to moderate. In general, the increased ratio of the reliability of genotyped population was 0.9%-3.6% higher than that of the overall population, and the GEBV reliability of the genotyped population was up to 83%. Compared with PBLUP, ssGBLUP method with chip information added significantly and simultaneously improved the reliability of breeding value estimation in both REML and Bayesian methods (Figure 2), which indicated that GS using ssGBLUP is feasible for the genetic evaluation of Xinjiang Brown cattle. Although the Bayesian method of the multitrait model is slightly better than the REML method, using the REML method to calculate ssGBLUP can save more calculation time.
Relying on the linkage disequilibrium between SNP markers and quantitative trait locus of target traits, genomic prediction promotes the association between SNP markers and individual phenotypic values. Thus, the accurate estimation of SNP's genetic effect based on the phenotypic value has become a key factor in genomic prediction; the accuracy of genomic prediction [51] can be improved by increasing the number and accuracy of phenotypic values (by expanding the reference population) [52]. Genotyping candidate animals with different SNP loci and chip densities can also influence the accuracy of GS [53]. In the ssGBLUP method, the genomic breeding value estimation accuracy is higher for traits with a larger number of animal phenotypes and genotypes and those with high heritability [54].
With increased reference population size and improved pedigree data integrity, the accuracy of EBV or genomic breeding estimation increases [55]. However, there are relatively few genotyped populations of Xinjiang Brown cattle (genotyped individuals account for only 2.89%), which is the key factor restricting the reliability of ssGBLUP estimation of breeding value. Furthermore, some studies have reported that the reliability of low heritability traits is greatly improved when GBLUP is used [56]. In studies using different livestock [15,17,[57][58][59], the same conclusion that ssGBLUP can more accurately predict the genetic value of animals than the classical PBLUP method was drawn. In addition to increasing the number of reference groups of this variety, some studies have shown that the combined reference population significantly improves the reliability of GEBV for the populations of four countries. Compared with the use of the domestic population alone as a reference, the reliability of GEBV is increased by an average of 10% when different reference population sizes are used for different countries and traits. In addition, expanding the reference population size can significantly improve the reliability of GEBV [50] by 2-19% [52,60]. Therefore, in the future, the cross variety GS of Xinjiang Brown cattle can also be used as a strategy to expand the reference population and improve the reliability of the breeding value estimation.

Conclusions
It is feasible to use ssGBLUP to perform the genomic evaluation of the milk production traits of Xinjiang Brown cattle. However, the single-trait model does not consider the covariance between traits, and thus has a large error. The use of the multitrait model, regardless of the use of PBLUP or ssGBLUP, can greatly improve the reliability of the breeding value estimation. Moreover, the estimated genetic parameters provide a basis for calculating a more accurate breeding value of Xinjiang Brown cattle. However, if it is needed to obtain a higher level of breeding value reliability, the number of genotypes needs to be simultaneously expanded when using high-density chips (150 k). Nevertheless, applying the ssGBLUP method to the genetic evaluation of Xinjiang Brown cattle necessitates further study and analysis. The next challenge is to build a joint reference group through cross variety GS to expand the number of genotyped animals with the objective to improve the reliability of the breeding value estimation of various traits of Xinjiang Brown cattle.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: "Not applicable" for studies not involving humans.

Data Availability Statement:
The study did not report any data.