Impact of Selection Signature on Genomic Prediction and Heritability Estimation in Livestock

Hongzhi Zhang; Zhixu Pang; Wannian Wang; Liying Qiao; Wenzhong Liu

doi:10.3390/ani15101383

,

and

¹

Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, Shanxi Agricultural University, Taigu 030801, China

²

Key Laboratory of Farm Animal Genetic Resources Exploration and Precision Breeding of Shanxi Province, Taigu 030801, China

^*

Author to whom correspondence should be addressed.

Animals2025, 15(10), 1383;https://doi.org/10.3390/ani15101383

This article belongs to the Section Animal Genetics and Genomics

Version Notes

Order Reprints

Simple Summary

Genomic best linear unbiased prediction (GBLUP) has become a widely adopted method for animal genomic evaluation. This method directly estimates genomic estimated breeding value (GEBV) by constructing a genomic relationship matrix (GRM). The GRM is central to GBLUP and remains an active research topic, as different GRM construction methods correspond to different assumptions. In this study, two selection-adjusted GRMs were compared. Both models were evaluated for their accuracy in GEBV prediction and their performance in heritability estimation under varying selection signatures. The results demonstrated that explicitly modeling the relationship between marker effect sizes and minor allele frequency (MAF) during GRM construction can effectively improve the accuracy of GEBV predictions and enhance heritability estimation performance.

Abstract

Natural or artificial selection could shape genetic architecture, e.g., the relationship between minor allele frequency (MAF) and the effect sizes of causal variants (CVs). This study aimed to investigate the impact of the MAF–effect size relationship (as a selection signature, S) on genomic prediction and heritability estimation in livestock, using both simulated data (Holstein) and real datasets (Holstein and pigs). We evaluated the performance of two models: (1) selection-adjusted genomic best linear unbiased prediction (GBLUP-S), and (2) MAF-stratified selection-adjusted genomic best linear unbiased prediction (GBLUP-SMS). Simulation results demonstrated that for traits under strong negative selection (S < −1), both GBLUP-S and GBLUP-SMS outperformed classic GBLUP. The prediction accuracy of GBLUP-S improved by 0.011–0.031, while GBLUP-SMS achieved a gain of 0.005–0.025. Furthermore, GBLUP-SMS exhibited lower sensitivity to variations in S-values, whereas GBLUP-S heavily relied on accurate S specification. When the true S was matched, GBLUP-SMS generated more unbiased (or comparable) heritability estimates and higher prediction accuracy relative to GBLUP-S. Critically, mismatched S in GBLUP-S led to increased bias in heritability estimates and reduced prediction accuracy. Cross-validation with real phenotypic data from Holsteins and pigs demonstrated that implementing selection-adjust methods improved prediction accuracy by 0.015 for FP in Holsteins and 0.01 for T1 in pigs, while enhancing the unbiasedness of heritability estimates across all traits. Negative selection signatures were identified for cattle (S = −0.5) and pig T1, T2, and T3 (S = −1.5, −1, and −2, respectively). These findings advance the theoretical framework of GBLUP-based genomic prediction and heritability estimation.

Keywords:

genomic selection; heritability estimation unbiasedness; MAF–effect size relationship; GBLUP; MAF stratification

1. Introduction

Genomic selection (GS) has been widely applied in livestock and plant breeding programs [1,2]. The central task of GS is the prediction of genomic estimated breeding value (GEBV). Since the first statistical method for GS was proposed by Meuwissen et al. [3], new methods have continually emerged [4,5]. Under the linear model framework, there are mainly two types of methods: derivatives of best linear unbiased prediction (BLUP) [6] and Bayesian Alphabet methods [7,8]. The former methods calculate GEBV by constructing the genomic relationship matrix (GRM) [9], exemplified by genomic best linear unbiased prediction (GBLUP) [10], single-step genomic best linear unbiased prediction (SSGBLUP), and their extensions [11], while the latter methods indirectly derive GEBVs through marker effect estimation, such as BayesA [3]. However, no consensus exists on the optimal GS method for the prediction of specific traits.

In human genetic studies, natural selection has been shown to shape genetic architecture, e.g., the relationship between minor allele frequency (MAF) and the effect sizes of causal variants (CVs) [12]. Negative selection can cause a negative MAF–effect size correlation of CVs (i.e., lower MAF, bigger effect size), whereas positive selection reverses this trend. The linkage-disequilibrium-adjusted kinship (LDAK) model, designed for single nucleotide polymorphism (SNP)-based heritability estimation, accounts for the effect caused by selection through integrating a scale parameter

α

that captures the relationship between MAF and the effect sizes of SNPs in the GRM [13]. When linkage disequilibrium (LD) and genotype certainty are ignored, the GRM under strong negative selection assumption (i.e.,

α

= −1) matches that of the genome-wide complex trait analysis (GCTA) model [14]; whereas the GRM under neutrality assumption (i.e.,

α

= 0) aligns with the method in [10], which is commonly used in animal and plant prediction. Although the impact of selection on heritability estimation has been widely studied in human diseases [15], its implications for livestock heritability estimation and genomic prediction remain poorly understood.

Classic GBLUP uses a single GRM to estimate the similarity between individuals, implying that the effect sizes of all SNPs follow a common Gaussian distribution [16], which is often unrealistic. To address this, Speed proposed MultiBLUP, which constructs GRMs by SNP classification [17]. Additionally, a series of methods have been proposed for the heritability estimation by stratifying SNPs based on MAF and LD [18], such as genomic restricted maximum likelihood (GREML)-LDMS [19]. While the effectiveness of LD-stratified GRMs has been explored for GS [20], the suitability of MAF-stratified GRMs for GS remains to be investigated.

The objectives of this study were as follows: (1) validate selection-adjusted GRM strategies for genomic prediction in related livestock using Holstein simulations; (2) compare model performance under different selection signatures (MAF–effect size relationships) and GRM approaches (single vs. multi-GRM) in Holstein and pig datasets; (3) quantify trait-specific selection signatures in both species. These results may provide innovative perspectives for optimizing the implementation of genomic selection (GS) in livestock breeding programs.

2. Materials and Methods

2.1. Population and Genotypes

2.1.1. Holstein

This study utilized publicly available genomic data from 5024 German Holstein dairy cattle generated using the Illumina BovineSNP50 BeadChip (Illumina, Inc., San Diego, CA, USA) [21]. Quality control (QC) for SNPs was performed using PLINK 1.90 based on the following: (1) SNPs with missing call rates > 0.05 were removed; (2) SNPs deviating significantly from Hardy–Weinberg equilibrium (HWE) p-value < 10 × 10⁻⁶ were excluded; (3) SNPs with MAF < 0.01 were discarded. After QC, 42,214 SNPs were retained for subsequent analyses.

2.1.2. Pig

For pig data, this study employed publicly available Porcine SNP60 BeadChip (Illumina, Inc., San Diego, CA, USA) data from the Pig Improvement Company (PIC) (Oxfordshire, UK) [22], comprising 3534 individuals. SNPs located on the X and Y chromosomes were excluded. The overall genotype missing rate across the dataset was less than 0.01. Missing genotypes were imputed by Hickey et al. [22] using parental genotype information, resulting in imputed genotypes represented as decimal values. To ensure compatibility with downstream analytical software, these imputed genotypes were rounded to the nearest integer (0, 1 or 2). QC procedures identical to those described for Holstein were applied, retaining 50,435 SNPs after filtering.

2.2. Phenotypes

2.2.1. Simulated Phenotypes

Using QC-filtered SNPs from Holstein, 1000 SNPs were randomly selected as CVs. To simulate the relationship between MAF and the marker effect sizes of CVs, the effect size of each CV was calculated as

g_{i} = \sqrt{[{2 p}_{i} (1 - p_{i})]^{S} ({g_{i}}^{*})^{2}}

, where

{g_{i}}^{*} ~ N (0, 1)

and p_i represents the MAF of the i-th CV. Seven S-values (−2, −1.5, −1, −0.5, 0, 0.5, and 1) were used to simulate traits under varying selection signatures. Residuals were sampled from the distribution e_i ~

N (0, σ_{a}^{2} (1 - h^{2}) / h^{2}))

, where

σ_{a}^{2}

is the total genetic variance, and h² is the heritability (fixed at 0.1). The phenotype y_j for individual j was derived as

y_{j} = \sum_{i = 1}^{m} x_{i j} g_{j} + e_{i}

, where x_i is the genotype (coded as 0, 1, or 2) of individual j at the i-th CV. The simulation was repeated 10 times to minimize sampling variability and was implemented in R.

2.2.2. Real Holstein Traits

Three economically important traits were analyzed: milk yield (MY), milk fat percentage (FP), and somatic cell score (SCS). Estimated breeding values (EBVs) for these traits were derived using traditional methods and used as pseudo-phenotypes. These traits represent distinct genetic architectures: FP is influenced by a major-effect gene and numerous minor-effect genes, MY is influenced by a few moderate-effect genes and minor-effect genes, and SCS is influenced solely by minor-effect genes [23,24]. Prior to heritability estimation and genomic prediction, pseudo-phenotypes were adjusted for population means and fixed effects, resulting in zero-mean value. Standardization was further applied to ensure a standard deviation of 1 for each trait.

2.2.3. Real Pig Traits

Data for three traits (T1, T2, and T3) were analyzed. Environmental effects (birth year and farm) were corrected for all traits. Trait T3 was centered to a zero mean. For traits T1 and T2, pseudo-phenotypes were calculated for individuals without direct phenotypes using weighted progeny information. Single-trait best linear unbiased prediction (BLUP) was employed to estimate EBVs, which were then used as pseudo-phenotypes in this study.

2.3. Models for Genomic Selection and Heritability Estimation

This study implemented two enhanced GBLUP models for genomic prediction and heritability estimation: (1) the selection-adjusted GBLUP (GBLUP-S) model, referred to as the “

α

model” in [12] and applied to heritability estimation in human traits [25]; and (2) the MAF-stratified selection-adjusted GBLUP (GBLUP-SMS) model, an extension of MultiBLUP [17].

2.3.1. GBLUP-S Model

We used a mixed linear model:

y = Z a + e

(1)

where y is a vector of phenotypes corrected for fixed effect and overall mean, a is a vector of additive genetic values with a ~ N(0, G

σ_{a}^{2}

), G is the GRM,

σ_{a}^{2}

is additive genetic variance, Z is an incidence matrix corresponding to a, and e is a vector of residuals with each element ~

N (0, σ_{e}^{2})

, with

σ_{e}^{2}

as the residual variance.

The GRM was constructed as follows:

G = \frac{M M^{'}}{\sum_{i = 1}^{m} {[2 p_{i} (1 - p_{i})]}^{S + 1}}

(2)

where M is a centered genotype matrix with each element

m_{ij} = (x_{ij} - 2 p_{i}) \times {[2 p_{i} (1 - p_{i})]}^{\frac{S}{2}}

, x_ij is the genotype (coded as 0, 1, or 2) of individual j at the i-th SNP, and S is the assumed MAF–effect size relationship.

2.3.2. GBLUP-SMS Model

In this model, the SNPs were partitioned into five groups by MAF bin (0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5).

We used a multi-component mixed linear model:

y = \sum_{i = 1}^{5} Z_{t} a_{t} + e

(3)

where y and e are the same as for Equation (1), a_t is a vector of genetic values of individuals captured by SNPs in the t-th groups, with a_t ~ N(0, G^(t)

σ_{a}^{2}

^(t)), G^(t) as the GRM constructed using SNPs in the t-th groups, following the same method as described in Equation (2),

σ_{a}^{2}

^(t) is the additive genetic variance for the t-th bin, and Z_t is the incidence matrix for a_t.

2.4. Model Assessment

The genomic prediction and heritability estimation in this study were performed using LDAK 5.2 software [26], while data analysis and visualization were conducted using the R programming (v. 4.5.0).

For simulated data, each scenario was replicated 10 times. The reference and candidate populations were split in a 9:1 ratio. The performance of genomic prediction was assessed using the Pearson correlation coefficient between true breeding values (TBVs) and the GEBVs of individuals in the candidate population. Additionally, the log-likelihood ratio test statistic (LRT) was used to evaluate the goodness of model fit.

For real data, a 10-fold cross-validation approach was employed. Genomic prediction performance was assessed using the Pearson correlation coefficient between phenotypes and GEBVs. Heritability enrichment (defined as the ratio of observed heritability to expected heritability) facilitated understanding of the genetic architecture of complex traits [27]. Heritability enrichment served as an indicator to evaluate the performance of heritability estimation under different parameters of S in the GBLUP-SMS model. Genotypic data were stratified into five MAF bins to construct GRMs. The heritability enrichment for the i-th MAF bin was calculated as follows:

E n r i c h m e n t_{i} = \frac{{\hat{h}}_{S N P \in M A F_{i}}^{2}}{E ({\hat{h}}_{S N P \in M A F_{i}}^{2})}

(4)

where

{\hat{h}}_{SNP \in MA F_{i}}^{2}

is the estimated SNP heritability explained by the i-th bin, and

E ({\hat{h}}_{S N P \in M A F_{i}}^{2})

is the expected SNP heritability explained by the i-th bin, computed as follows:

E ({\hat{h}}_{S N P \in M A F_{i}}^{2}) = \frac{\sum_{j \in M A F_{i}} {[2 p_{j} (1 - p_{j})]}^{S + 1}}{\sum_{i = 1}^{5} \sum_{j \in M A F_{i}} {[2 p_{j} (1 - p_{j})]}^{S + 1}}

(5)

Heritability enrichment proportions close to 1:1:1:1:1 across the five bins indicates superior model fit and reliable heritability estimates. A chi-square statistic was constructed to evaluate the goodness-of-fit of the G-SeMS model under a different S:

χ^{2} = \sum_{i = 1}^{5} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}}

(6)

where O_i is the observed heritability enrichment proportion of the i-th bin and E_i is the expected heritability enrichment proportion of the i-th bin. Five MAF bins were tested to assess whether heritability enrichment proportions followed a uniform distribution.

3. Results

3.1. Genetic Architecture for Simulated Traits

This study simulated multiple traits with heritability 0.1 across seven S-values using real genotypes of Holstein to assess the performance of genomic prediction and heritability estimation for GBLUP-S and GBLUP-SMS models. Figure 1 illustrates the genetic architecture of these simulated traits.

Figure 1. Genetic architecture of simulated traits across seven S-values. (A) Distribution of simulated effect sizes from CVs. (B) Distribution of heritability explained by simulated CVs.

The relationship between MAF and the effect sizes of CVs is shown in Figure 1A. The figure demonstrates that when the S = −2, effect sizes were higher in low-MAF regions and lower in high-MAF regions, displaying a left-skewed distribution pattern indicative of strong negative selection. As the S-value increased, effect sizes gradually decreased in low-MAF regions and increased in high-MAF regions. At S = 0, effect sizes were uniformly distributed across MAF intervals. When S = 1, higher marker effect sizes were observed in high-MAF regions, resulting in a right-skewed distribution.

The relationship between heritability and the MAF of CVs under different S-values is illustrated in Figure 1B. The heritability explained by the i-th CV was calculated as

h_{i}^{2} = 2 p_{i} (1 - p_{i}) σ_{g_{i}}^{2}

, where the simulated variance followed

σ_{g_{i}}^{2} \propto [2 p_{i} (1 - p_{i})]^{S}

. When S = −1,

E (h_{i}^{2}) \propto 1

, corresponding to a uniform distribution of SNP heritability across MAF intervals as shown in the figure. For S < −1, the heritability distribution exhibited a “left-skewed” pattern, while for S > −1, it displayed a “right-skewed” distribution.

3.2. Performance of Different Models in Terms of Generic Evaluation and Heritability Estimation in Simulated Data

The prediction accuracies of GEBV, heritability estimates, and LRT for simulated traits with heritability 0.1 under different selection signatures are presented in Figure 2, Figure 3, and Figure 4, respectively. Detailed numerical values of genomic prediction accuracies and heritability estimates are provided in Table S1.

Figure 2. Accuracy of genomic prediction of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Figure 3. Heritability estimates of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Figure 4. LRT statistics of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Both GBLUP-S and GBLUP-SMS methods demonstrated higher prediction accuracies when the S parameters approximated the true selection signature. Notably, GBLUP-S exhibited greater sensitivity to assumed S-value specification compared to GBLUP-SMS, with the latter showing smaller accuracy fluctuations across different simulated selection signatures. For simulated traits with selection signatures ≥ 0, GBLUP-S outperformed GBLUP-SMS across all S-values except at −2 and −1.5. Compared to the classic GBLUP method (as proposed by Vanranden et al., equivalent to GBLUP-S with S = 0), both methods showed great advantages when applied to traits under strong negative selection (selection signatures < −1) using S-values close to the true selection signatures: GBLUP-S achieved accuracy improvements of 0.011–0.031, while GBLUP-SMS showed gains of 0.005–0.025.

For all simulated traits with varying selection signatures, GBLUP-SMS consistently produced higher heritability estimates with greater stability compared to GBLUP-S (Figure 3; Table S1). The GBLUP-S method achieved elevated heritability estimates when the specified S parameters approximated the true selection signature of the simulated trait. Compared to the classic GBLUP, GBLUP-S generally generated higher heritability estimates when appropriate S-values were applied. Notably, both methods exhibited relatively large values in heritability estimates compared to the true heritability (0.1). Notably, no significant concordance was observed between GEBV prediction accuracies and heritability estimates using the two models.

As evidenced by the LRT statistics, while discernible differences in LRT statistics emerged across varying selection signature (S) parameters in the two models, both methods demonstrated adequate model fit. This indicates that the derived heritability estimates maintained satisfactory unbiasedness. Furthermore, a notable consistency emerged between the magnitude of LRT statistics and the heritability estimates, particularly for simulated traits under strong negative selection (S < −1).

3.3. Application to Real Traits

3.3.1. Holstein

For MY (Figure 5, Table S2), GEBV prediction accuracies for both GBLUP-S and GBLUP-SMS stabilized with negligible variation at S ≥ −0.5. GBLUP-SMS maintained robust accuracy across all S-values, while GBLUP-S exhibited lower overall performance. The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S4) demonstrated that GBLUP-SMS achieved optimal unbiasedness in heritability estimation at S = −0.5, yielding a heritability estimate of 0.809.

Figure 5. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for milk yield in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In D, red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

For FP (Figure 6, Table S2), GEBV accuracies for GBLUP-S increased monotonically with S, peaking at S = 1, while GBLUP-SMS maintained stable accuracies across all S-values. Compared to classic GBLUP, the maximum accuracy improvement reached 0.015 (GBLUP-SMS at S = 1 vs. classic GBLUP). The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S5) demonstrated that GBLUP-SMS achieves optimal unbiasedness in heritability estimation at S = 1, yielding a heritability estimate of 0.897.

Figure 6. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for fat percentage in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

For SCS (Figure 7, Table S2), both GBLUP-S and GBLUP-SMS achieved stabilized GEBV accuracies at S ≥ −1, with no significant differences between methods in this range. The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S6) demonstrated that GBLUP-SMS achieved optimal unbiasedness in heritability estimation at S = 0, yielding a heritability estimate of 0.800.

Figure 7. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for somatic cell score in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

3.3.2. Pig

For T1 (Figure 8, Table S3), GBLUP-S demonstrated a prediction accuracy improvement of ~0.01 compared to classic GBLUP at S = 1.5. The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S7) revealed optimal unbiasedness in heritability estimation under strong negative selection (S = −1.5), yielding a heritability estimate of 0.725.

Figure 8. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T1 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

For T2 (Figure 9, Table S3), GEBV prediction accuracies for both methods were indistinguishable across all S-values. The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S8) revealed that optimal unbiasedness in heritability estimation was achieved under negative selection (S = −1), yielding a heritability estimate of 0.744.

Figure 9. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T2 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

For T3 (Figure 10, Table S3), GEBV prediction accuracies for both methods remained indistinguishable across all S-values. The integrated analysis of LRT statistics, MAF-stratified enrichment ratios across varying S-values in the GBLUP-SMS model, and their corresponding chi-square statistics (Table S9) revealed that optimal unbiasedness in heritability estimation was achieved under strong negative selection (S = −2), yielding a heritability estimate of 0.694.

Figure 10. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T3 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

4. Discussion

4.1. Performance of Models in Simulation Analysis

The GRM constructed with parameter S = 0 in the GBLUP-S framework, equivalent to the classic GBLUP [10], remains the most widely applied approach in current animal and plant breeding, which assumes equal variance across all SNP effects while disregarding the impact of natural or artificial selection on genetic architecture, i.e., ignoring the relationship between MAF and marker effects. Notably, the GCTA implements the GBLUP-S model under S = −1, which postulates a strong negative correlation between MAF and marker effects, along with equal heritability contributions from each SNP [14].

Our study demonstrated that integrating selection-adjusted genetic structure into GRM construction through distinct genotype coding standardization strategies greatly influences the performance of heritability estimation and genomic prediction. This finding aligns with simulation results derived from human data [28]. Given the heterogeneity in genetic architectures across traits due to divergent selection pressures in domesticated species, the true selection signatures likely vary. Incorporating the relationship between marker effects and MAF may thus refine heritability estimation and genetic prediction models for livestock. Specifically, our results indicate that when the S value used in models matches or approximates the true selection signature, selection-adjust methods achieve higher heritability estimates while maintaining unbiasedness and genomic prediction accuracy. For empirical datasets, cross-validation within reference populations could provide preliminary inference of plausible ranges for S.

In human genetics, Bayesian methods such as BayesS [29] employ Markov Chain Monte Carlo (MCMC) algorithms to estimate trait-specific selection signatures. However, BayesS exhibits limited efficacy for traits with small phenotypic datasets. Consequently, its adaptation to large-scale breeding populations merits exploration to enhance heritability estimation and genetic evaluation performance [15].

For human heritability analyses, stringent quality control typically removes closely related individuals (e.g., excluding pairs with GRM values > 0.025) [19,30] to mitigate inflation from shared environmental confounders, thereby ensuring narrow-sense heritability estimates [31,32]. In contrast, limited population sizes in animal and plant breeding often preclude the establishment of unrelated population for reliable heritability estimation [19,33]. Our simulations revealed that GBLUP-S or GBLUP-SMS heritability estimates for traits with true h² = 0.1 were systematically larger than the true values, partly attributable to unaccounted shared environmental variance, suggesting that GBLUP-S or GBLUP-SMS estimates in breeding populations conflate SNP-associated variance with non-genetic factors, deviating from narrow-sense heritability definitions [34].

Notably, GBLUP-SMS demonstrated superior stability in heritability estimation and genomic prediction compared to GBLUP-S, with minimal performance fluctuations across varying S-values. This robustness implies that modeling distinct MAF distributions across genomic regions enhances adaptability to diverse genetic architectures [15].

4.2. Performance of Models on Real Data

Our findings indicate that the selection signature parameter S greatly affects heritability estimates and GEBV accuracy. Incorporating selection signatures (e.g., marker effect–MAF relationships) into genomic models may enhance heritability estimation unbiasedness through causal quantitative trait loci (QTL)-linked SNPs identification, with modest gains in genomic prediction.

For the GBLUP-S method, selecting an appropriate S value aligned with the trait’s genetic architecture is critical to achieving optimal heritability estimates or GEBV accuracy, consistent with its theoretical assumptions [26]. In contrast, GBLUP-SMS exhibits reduced sensitivity to S variations, though its performance remains partially influenced by parameter selection. For instance, in pig T3, the stratification of SNPs by MAF bins in GBLUP-SMS, which constructs separate GRMs for distinct MAF bins with independent distributional assumptions, may mitigate discrepancies between genetic architectures and model hypotheses (selection signature), particularly for traits governed by large- or moderate-effect genes.

In real phenotypic data, similar to simulated data, the presence of closely related individuals within populations can lead to relatively larger heritability estimates. However, unlike simulated scenarios where QTL positions are predefined, the unknown genomic locations of causal QTL in empirical data exacerbate the issue of ‘missing’ or ‘hidden’ heritability. Specifically, causal variants with true effects may fail to be captured due to either (1) incomplete linkage disequilibrium (LD) between QTL and genotyped SNPs on commercial panels, or (2) QTL with minor allele frequencies (MAF) too low (<0.01) to establish sufficient LD with nearby markers, collectively contributing to an underestimation of heritability [13].

5. Conclusions

Under the GBLUP model framework, incorporating the relationship between MAF and marker effect sizes caused by natural or artificial selection when constructing the GRM can improve the performance of heritability estimation and genomic prediction. Additionally, the GBLUP-SMS method demonstrates greater stability compared to GBLUP-S, as it does not rely on specifying S parameters in the model. The chi-square test serves as an effective approach to validate the performance of heritability estimation and genomic prediction.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani15101383/s1, Table S1: Performance of GBLUP-S and GBLUP-SMS for simulated data in terms of genomic prediction and heritability estimation. Table S2: Performance of GBLUP-S and GBLUP-SMS for Holstein in terms of genomic prediction and heritability estimation. Table S3: Performance of GBLUP-S and GBLUP-SMS for a pig in terms of genomic prediction and heritability estimation. Table S4: Heritability enrichment ratio and chi-square values within each MAF bin of milk yield of Holstein. Table S5: Heritability enrichment ratio and chi-square values within each MAF bin of milk fat percentage of Holstein. Table S6: Heritability enrichment ratio and chi-square values within each MAF bin of somatic cell score of Holstein. Table S7: Heritability enrichment ratio and chi-square values within each MAF bin of T1 of pig. Table S8: Heritability enrichment ratio and chi-square values within each MAF bin of T2 of pig. Table S9: Heritability enrichment ratio and chi-square values within each MAF bin of T3 of pig.

Author Contributions

Conceptualization, W.L.; methodology, H.Z. and W.L.; software, Z.P. and H.Z.; validation, H.Z.; formal analysis, Z.P. and W.W.; investigation, Z.P.; resources, L.Q.; data curation, H.Z.; writing—original draft preparation, H.Z. and W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (31972560).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data (Holstein and pig) are publicly available from G3 (https://pmc.ncbi.nlm.nih.gov/articles/instance/4390577/bin/supp_5_4_615__index.html (accessed on 1 February 2024)) and (https://pmc.ncbi.nlm.nih.gov/articles/instance/3337471/bin/supp_2_4_429__index.html (accessed on 1 February 2024)).

Acknowledgments

The authors would like to thank the Department of Animal Genetics, Breeding and Reproduction for providing essential equipment. Thanks to the data provider of Holstein and pig.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hickey, J.M.; Chiurugwi, T.; Mackay, I.; Powell, W. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat. Genet. 2017, 49, 1297–1303. [Google Scholar] [CrossRef] [PubMed]
Meuwissen, T.; Hayes, B.; Goddard, M. Genomic selection: A paradigm shift in animal breeding. Anim. Front. 2016, 6, 6–14. [Google Scholar] [CrossRef]
Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829. [Google Scholar] [CrossRef]
Crossa, J.; Pérez-Rodríguez, P.; Cuevas, J.; Montesinos-López, O.; Jarquín, D.; de Los Campos, G.; Burgueño, J.; González-Camacho, J.M.; Pérez-Elizalde, S.; Beyene, Y.; et al. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives. Trends Plant Sci. 2017, 22, 961–975. [Google Scholar] [CrossRef] [PubMed]
Hayes, B.J.; Bowman, P.J.; Chamberlain, A.J.; Goddard, M.E. Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy. Sci. 2009, 92, 433–443. [Google Scholar] [CrossRef]
Misztal, I.; Aguilar, I.; Lourenco, D.; Ma, L.; Steibel, J.P.; Toro, M. Emerging issues in genomic selection. J. Anim. Sci. 2021, 99, 1–14. [Google Scholar] [CrossRef] [PubMed]
Gianola, D.; de los Campos, G.; Hill, W.G.; Manfredi, E.; Fernando, R. Additive genetic variability and the Bayesian alphabet. Genetics 2009, 183, 347–363. [Google Scholar] [CrossRef]
Gianola, D. Priors in whole-genome regression: The bayesian alphabet returns. Genetics 2013, 194, 573–596. [Google Scholar] [CrossRef]
Aguilar, I.; Misztal, I.; Johnson, D.L.; Legarra, A.; Tsuruta, S.; Lawlor, T.J. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genomic prediction of Holstein final score. J. Dairy. Sci. 2010, 93, 743–752. [Google Scholar] [CrossRef]
VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy. Sci. 2008, 91, 4414–4423. [Google Scholar] [CrossRef]
Legarra, A.; Christensen, O.F.; Aguilar, I.; Misztal, I. Single Step, a general approach for genomic selection. Livest. Sci. 2014, 166, 54–65. [Google Scholar] [CrossRef]
Schoech, A.P.; Jordan, D.M.; Loh, P.R.; Gazal, S.; O’Connor, L.J.; Balick, D.J.; Palamara, P.F.; Finucane, H.K.; Sunyaev, S.R.; Price, A.L. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 2019, 10, 790. [Google Scholar] [CrossRef] [PubMed]
Speed, D.; Cai, N.; Johnson, M.R.; Nejentsev, S.; Balding, D.J. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 2017, 49, 986–992. [Google Scholar] [CrossRef]
Yang, J.; Benyamin, B.; McEvoy, B.P.; Gordon, S.; Henders, A.K.; Nyholt, D.R.; Madden, P.A.; Heath, A.C.; Martin, N.G.; Montgomery, G.W.; et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010, 42, 565–569. [Google Scholar] [CrossRef] [PubMed]
Zeng, J.; de Vlaming, R.; Wu, Y.; Robinson, M.R.; Lloyd-Jones, L.R.; Yengo, L.; Yap, C.X.; Xue, A.; Sidorenko, J.; McRae, A.F.; et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 2018, 50, 746–753. [Google Scholar] [CrossRef]
Neves, H.H.; Carvalheiro, R.; Queiroz, S.A. A comparison of statistical methods for genomic selection in a mice population. BMC Genet. 2012, 13, 100. [Google Scholar] [CrossRef] [PubMed]
Speed, D.; Balding, D.J. MultiBLUP: Improved SNP-based prediction for complex traits. Genome Res. 2014, 24, 1550–1557. [Google Scholar] [CrossRef]
Evans, L.M.; Tahmasbi, R.; Vrieze, S.I.; Abecasis, G.R.; Das, S.; Gazal, S.; Bjelland, D.W.; de Candia, T.R.; Goddard, M.E.; Neale, B.M.; et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat. Genet. 2018, 50, 737–745. [Google Scholar] [CrossRef]
Yang, J.; Manolio, T.A.; Pasquale, L.R.; Boerwinkle, E.; Caporaso, N.; Cunningham, J.M.; de Andrade, M.; Feenstra, B.; Feingold, E.; Hayes, M.G.; et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 2011, 43, 519–525. [Google Scholar] [CrossRef]
Ren, D.; Cai, X.; Lin, Q.; Ye, H.; Teng, J.; Li, J.; Ding, X.; Zhang, Z. Impact of linkage disequilibrium heterogeneity along the genome on genomic prediction and heritability estimation. Genet. Sel. Evol. 2022, 54, 47. [Google Scholar] [CrossRef]
Matukumalli, L.K.; Lawley, C.T.; Schnabel, R.D.; Taylor, J.F.; Allan, M.F.; Heaton, M.P.; O’Connell, J.; Moore, S.S.; Smith, T.P.; Sonstegard, T.S. Development and characterization of a high density SNP genotyping assay for cattle. PLoS ONE 2009, 4, e5350. [Google Scholar] [CrossRef] [PubMed]
Cleveland, M.A.; Hickey, J.M.; Forni, S. A common dataset for genomic analysis of livestock populations. G3 2012, 2, 429–435. [Google Scholar] [CrossRef]
Zhang, Z.; Ober, U.; Erbe, M.; Zhang, H.; Gao, N.; He, J.; Li, J.; Simianer, H. Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS ONE 2014, 9, e93017. [Google Scholar] [CrossRef] [PubMed]
Hu, Z.L.; Park, C.A.; Wu, X.L.; Reecy, J.M. Animal QTLdb: An improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Res. 2013, 41, D871–D879. [Google Scholar] [CrossRef]
Lee, S.H.; Yang, J.; Chen, G.B.; Ripke, S.; Stahl, E.A.; Hultman, C.M.; Sklar, P.; Visscher, P.M.; Sullivan, P.F.; Goddard, M.E.; et al. Estimation of SNP heritability from dense genotype data. Am. J. Hum. Genet. 2013, 93, 1151–1155. [Google Scholar] [CrossRef] [PubMed]
Speed, D.; Hemani, G.; Johnson, M.R.; Balding, D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012, 91, 1011–1021. [Google Scholar] [CrossRef]
Tiezzi, F.; Maltecca, C. Accounting for trait architecture in genomic predictions of US Holstein cattle using a weighted realized relationship matrix. Genet. Sel. Evol. 2015, 47, 24. [Google Scholar] [CrossRef]
Eaves, L.J.; Last, K.A.; Young, P.A.; Martin, N.G. Model-fitting approaches to the analysis of human behaviour. Heredity 1978, 41, 249–320. [Google Scholar] [CrossRef]
Zeng, J.; Xue, A.; Jiang, L.; Lloyd-Jones, L.R.; Wu, Y.; Wang, H.; Zheng, Z.; Yengo, L.; Kemper, K.E.; Goddard, M.E.; et al. Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat. Commun. 2021, 12, 1164. [Google Scholar] [CrossRef]
Manolio, T.A.; Collins, F.S.; Cox, N.J.; Goldstein, D.B.; Hindorff, L.A.; Hunter, D.J.; McCarthy, M.I.; Ramos, E.M.; Cardon, L.R.; Chakravarti, A.; et al. Finding the missing heritability of complex diseases. Nature 2009, 461, 747–753. [Google Scholar] [CrossRef]
Visscher, P.M.; Yang, J.; Goddard, M.E. A commentary on ‘common SNPs explain a large proportion of the heritability for human height’ by Yang et al. (2010). Twin Res. Hum. Genet. 2010, 13, 517–524. [Google Scholar] [CrossRef] [PubMed]
Yang, J.; Lee, S.H.; Goddard, M.E.; Visscher, P.M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011, 88, 76–82. [Google Scholar] [CrossRef] [PubMed]
Lee, S.H.; DeCandia, T.R.; Ripke, S.; Yang, J.; Sullivan, P.F.; Goddard, M.E.; Keller, M.C.; Visscher, P.M.; Wray, N.R. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 2012, 44, 247–250. [Google Scholar] [CrossRef] [PubMed]
Zaitlen, N.; Kraft, P.; Patterson, N.; Pasaniuc, B.; Bhatia, G.; Pollack, S.; Price, A.L. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 2013, 9, e1003520. [Google Scholar] [CrossRef]

Figure 1. Genetic architecture of simulated traits across seven S-values. (A) Distribution of simulated effect sizes from CVs. (B) Distribution of heritability explained by simulated CVs.

Figure 2. Accuracy of genomic prediction of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Figure 3. Heritability estimates of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Figure 4. LRT statistics of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values. The white diamonds inside the boxes represent the mean values. The numbers at the top of each plot indicate the true S-values used in simulating the traits, while the values on the x-axis correspond to the S-values applied in the model.

Figure 5. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for milk yield in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In D, red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Figure 6. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for fat percentage in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Figure 7. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for somatic cell score in Holstein. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Figure 8. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T1 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Figure 9. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T2 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Figure 10. Performance of GBLUP-S (red) and GBLUP-SMS (cerulean blue) at different assumed S-values for T3 in pig. (A) Genomic prediction, (B) heritability estimation, (C) LRT statistics, and (D) heritability enrichment propositions across MAF bins obtained using GBLUP-SMS. In (A–C), white diamonds indicate mean values; blue dashed lines represent classic GBLUP. In (D), red, olive yellow, dark green, blue, and purple correspond to MAF bins 0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, and 0.4–0.5, respectively.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Impact of Selection Signature on Genomic Prediction and Heritability Estimation in Livestock

Simple Summary

Abstract

1. Introduction

2. Materials and Methods

2.1. Population and Genotypes

2.1.1. Holstein

2.1.2. Pig

2.2. Phenotypes

2.2.1. Simulated Phenotypes

2.2.2. Real Holstein Traits

2.2.3. Real Pig Traits

2.3. Models for Genomic Selection and Heritability Estimation

2.3.1. GBLUP-S Model

2.3.2. GBLUP-SMS Model

2.4. Model Assessment

3. Results

3.1. Genetic Architecture for Simulated Traits

3.2. Performance of Different Models in Terms of Generic Evaluation and Heritability Estimation in Simulated Data

3.3. Application to Real Traits

3.3.1. Holstein

3.3.2. Pig

4. Discussion

4.1. Performance of Models in Simulation Analysis

4.2. Performance of Models on Real Data

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics