Article

Accuracy of Genomic Predictions Cross Populations with Different Linkage Disequilibrium Patterns

1 College of Animal Science, Anhui Science and Technology University, Chuzhou 233100, China
2 Anhui Province Key Laboratory of Livestock and Poultry Product Safety Engineering, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, China
3 Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Genes 2024, 15(11), 1419; https://doi.org/10.3390/genes15111419
Submission received: 18 October 2024 / Accepted: 29 October 2024 / Published: 31 October 2024
(This article belongs to the Section Population and Evolutionary Genetics and Genomics)

Abstract

Background/Objectives: There is a considerable global population of beef cattle, with numerous small-scale groups. Establishing separate reference groups for each breed in breeding practices is challenging, severely limiting the application of genome selection (GS). Combining data from multiple populations becomes particularly attractive and practical for small-scale populations, offering increased reference population size, operational ease, and data sharing. Methods: To assess the potential of this approach for Chinese indigenous cattle, we evaluated the influence of combining multiple populations on genomic prediction reliability for 10 breeds using simulated data. Results: Within-breed evaluations consistently yielded the highest accuracies across various simulated genetic architectures. Genomic selection accuracy was lower in Group B populations referencing a Group A population (n = 400), but significantly higher in Group A populations with the addition of a small Group B (n = 200). However, accuracy remained low when using the Group A reference group (n = 400) to predict Group B. Incorporating a few Group B individuals (n = 200) into the reference group resulted in relatively high accuracy (~60% of Group A predictions). Accuracy increased with the growing number of individuals from Group B joining the reference group. Conclusions: Our results suggested that multi-breed genomic selection was feasible for Chinese indigenous cattle populations with genetic relationships. This study’s results also offer valuable insights into genome selection in multipopulation settings.

1. Background

The accuracy of genome selection (GS) hinges significantly on the size of the reference population [1]. Notably, GS has seen its most mature application in dairy cows due to their ample reference population [2]. However, challenges arise in the beef cattle industry, which is distinguished by a large population with numerous small-scale groups. Establishing a distinct reference group for each breed becomes impractical in breeding practice, severely limiting GS application in these groups. To address this, multipopulation/variety GS has emerged, combining reference populations from different breeding organizations, enterprises, or breeds. This approach goes beyond merely adding individuals of the same origin. For populations with few individuals, establishing a sufficient reference population size is impossible. Therefore, combining multiple population data for GS increases the reference population size, promotes easy operation, and accomplishes resource sharing, thus making it particularly attractive and practical for small-scale populations.
At present, GS has been widely used. Mucha et al. attempted to estimate the genomic estimated breeding value (GEBV) of hybrid progeny using a reference population comprising three dairy goat breeds [3]. GS studies were conducted on single breed and mixed reference populations using Holstein cattle and Jersey cattle. The use of Bayes method resulted in an over 13% increase in the accuracy of GEBV estimation for specific traits in mixed reference populations [4]. Zhou, in comparing GS accuracy using single- and multi-group reference groups from multiple dairy cow groups in Norway, Denmark, Finland, and Sweden, discovered that mixed reference groups enhanced the prediction accuracy of production performance traits for Norwegian Red Bull and Danish dairy cow groups [5]. The Bayes model, coupled with Genome-Wide Association Studies (GWAS) marker screening, proves more suitable for the cost-effective GS of complex traits in multi-variety populations [6]. In the study of multi-breed GS, Van Den Berg et al. suggested that incorporating a marker screening process before the BayesR method could enhance the accuracy of predicting populations not included in the reference population. This marker screening method significantly improved computational efficiency [7]. Brito et al. compared methods of setting verification groups, including generation-based, k-means clustering, genome clustering, and random grouping. Genome clustering demonstrated the highest accuracy, while the k-means clustering method exhibited the lowest accuracy. It is evident that employing a mixed reference population with sufficient numbers yields greater accuracy than that of a single-variety GS [8]. Van den Berg’s study demonstrated the advantages of using sequencing data in conjunction with marker screening methods in multipopulation GS [9].
Simultaneously, studies utilizing SNP chips of varying densities for multipopulation GS research and application have been conducted. In cross-breed GS, incorporating individual data from candidate populations significantly enhances the reference population [10]. The study by Hoze and colleagues additionally demonstrates that utilizing a reference population encompassing multiple breeds leads to a 2.9% improvement in the precision of predictions as opposed to relying on a single-breed reference group [11]. Conversely, Kachman et al. observed that the prediction accuracy, using a mixed reference population from multiple groups, does not surpass that of a single variety in a sufficiently large reference population. Prediction accuracy is higher with close genetic relationships among layers but diminishes when the genetic relationship is distant [12,13]. Currently, GS results in multipopulation scenarios that are unsatisfactory, potentially influenced by genetic relationships, linkage disequilibrium (LD) consistency, and quantitative trait locus (QTL) differences among populations. Thus, conducting multipopulation studies using diverse methods holds significant importance in both GS theory and breeding applications.
The current situation of beef cattle poses a challenge to the efficiency of beef breeding in China. The genetic resources used in beef cattle production and improvement mainly rely on foreign introductions. Breeding of Chinese beef cattle is in its initial stages, lacking a fully established infrastructure for breeding technology systems, such as performance measurement stations, breeding databases, and genetic evaluation platforms [14]. Since GS technology can make accurate selections without relying on phenotypic and genealogical data, it provides an opportunity for advancing Chinese beef cattle breeding. Chinese indigenous cattle exhibit a diverse pattern of linkage disequilibrium. Consequently, examining these cattle can provide significant insights about the genetic foundations of key characteristics and can be used to evaluate the effectiveness of genomic selection across multiple breeds. The objective of this research is to evaluate the effectiveness of implementing genomic selection across multiple populations in Chinese native cattle and to identify a feasible strategy for genomic selection suitable for Chinese native cattle populations of limited size.

2. Methods

The genotype data were retrieved from our previous study [15]. Therefore, no Animal Care Committee approval was necessary for the purposes of this study.

2.1. Animals and Methods

All individuals from 10 Chinese cattle breeds (Table 1) were genotyped via the Illumina BovineHD Beadchip (Illumina, Inc., San Diego, CA, USA). The simulation procedure was set up to generate a similar linkage disequilibrium structure of each breed as described by a previous study [15]. We started with 21–26 available samples for each breed comprising 658,234 SNPs. For each breed, we simulated 1500 individuals via resampling, which assumes a block of 500 adjacent markers for each population. Thus, the simulated data can retain similar patterns of LD (broken by strong recombination hotspots) and allele frequencies as observed in the original data.
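The block-wise resampling described above can be illustrated with a minimal sketch. It assumes that each simulated individual is assembled by copying, for every block of adjacent markers, the genotypes of one randomly chosen real animal; the function name `resample_genotypes` and the toy data are hypothetical, and the original procedure (which may operate on phased haplotypes rather than genotypes) is the one described in [15].

```python
import random

def resample_genotypes(real, n_sim, block_size=500, seed=0):
    """Build n_sim simulated individuals from real genotype rows.

    real: list of equal-length lists of 0/1/2 genotype codes (one per animal).
    For every block of `block_size` adjacent markers, a donor animal is drawn
    at random and its genotypes for that block are copied unchanged, so LD
    inside a block is preserved while LD across block boundaries is broken.
    """
    rng = random.Random(seed)
    n_markers = len(real[0])
    simulated = []
    for _ in range(n_sim):
        individual = []
        for start in range(0, n_markers, block_size):
            donor = rng.choice(real)
            individual.extend(donor[start:start + block_size])
        simulated.append(individual)
    return simulated

# Toy usage (hypothetical data): 4 "real" animals, 12 markers, blocks of 4.
toy_rng = random.Random(1)
real_toy = [[toy_rng.randint(0, 2) for _ in range(12)] for _ in range(4)]
sim_toy = resample_genotypes(real_toy, n_sim=10, block_size=4)
```

With the settings reported in the text, the call would use `block_size=500` and `n_sim=1500` per breed.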
SNP quality control (QC) was performed with PLINK v1.9 [16]. Samples with total call rates < 0.90 were excluded, and only autosomal SNPs were considered for subsequent analyses. SNPs with call rates (CRs) < 0.90, minor allele frequencies (MAFs) < 0.01, or significant deviation from Hardy–Weinberg Equilibrium (p < 1.0 × 10^−6) were excluded. After QC, genotype phasing was performed using BEAGLE v5.0 [17]. The Chinese native cattle populations were categorized using K-means clustering, as executed within the R 3.6.1 program [18,19].
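To make the SNP-level thresholds concrete, the following sketch applies the call-rate and MAF filters to a small genotype matrix. It is not a substitute for PLINK: the Hardy–Weinberg test is omitted, missing calls are assumed to be encoded as `None`, and all function names are hypothetical.

```python
def snp_call_rate(column):
    """Fraction of non-missing genotypes (None marks a missing call)."""
    return sum(g is not None for g in column) / len(column)

def minor_allele_frequency(column):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def passing_snps(genotypes, min_call_rate=0.90, min_maf=0.01):
    """Indices of SNP columns that pass both thresholds."""
    keep = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        if (snp_call_rate(column) >= min_call_rate
                and minor_allele_frequency(column) >= min_maf):
            keep.append(j)
    return keep

# Toy data (hypothetical): rows are animals, columns are SNPs.
qc_toy = [
    [0, 1, 0, None],
    [1, 2, 0, None],
    [2, 1, 0, 1],
    [1, 0, 0, 2],
]
kept = passing_snps(qc_toy)
```

In this toy matrix the third SNP is monomorphic and the fourth has a call rate of 0.5, so only the first two SNPs are retained.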

2.2. Principal Component Analysis and Persistence of Allele Phase

To examine the genetic makeup of both actual and simulated populations, principal components and the genomic relationship matrix (GRM) [20] were computed utilizing high-quality SNPs. Principal components were estimated using the prcomp function implemented in R package “stats”.
The continuity of alleles between the real and simulated genotypes was evaluated, with phase consistency being quantified by determining the Pearson correlation coefficient of the average linkage phase across various distances. The correlation coefficients (r) were calculated for marker pairs among populations, categorizing the marker distances into specific intervals: 2.5 kilobases (kb) for the short-range category (0–10 kb), 10 kb for the medium-range (10–100 kb), and 100 kb for the long-range (100–1000 kb).
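As an illustration of the phase-consistency statistic, the sketch below computes the signed LD correlation r for a set of marker pairs in two populations and then takes the Pearson correlation of those r values. It assumes phased haplotypes coded 0/1 and markers that are polymorphic in both populations; binning of marker pairs by physical distance is omitted, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; inputs must have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_r(haplotypes, i, j):
    """Signed LD correlation between markers i and j from 0/1 haplotypes."""
    return pearson([h[i] for h in haplotypes], [h[j] for h in haplotypes])

def phase_persistence(haps_a, haps_b, pairs):
    """Correlation of signed r across the same marker pairs in two populations."""
    r_a = [signed_r(haps_a, i, j) for i, j in pairs]
    r_b = [signed_r(haps_b, i, j) for i, j in pairs]
    return pearson(r_a, r_b)

# Toy haplotypes (hypothetical): 6 haplotypes, 4 markers.
haps_a = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
pairs = [(0, 1), (1, 2), (2, 3), (0, 3)]
```

A value close to 1 indicates that the two populations share the same linkage phase for the marker pairs considered, whereas a value close to 0 indicates little shared phase.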

2.3. Simulation

Phenotypes were simulated to correspond to the simulated genotypes. A variety of scenarios were simulated as outlined in Table 2, encompassing different levels of heritability, numbers of QTLs, and distributions of QTL effects. A selection of SNP markers was randomly designated as QTLs, with their additive effects drawn from three distinct normal distributions: $N(0, 0.0001\sigma_g^2)$, $N(0, 0.001\sigma_g^2)$, and $N(0, 0.01\sigma_g^2)$, which represent large-, medium-, and small-effect QTLs, respectively, where $\sigma_g^2$ is the additive genetic variance.
The true breeding values were obtained by summing the effects of each individual's genotypes at the QTLs. Environmental effects were drawn at random from a normal distribution with a mean of 0 and a variance of $V_g(1 - h^2)/h^2$, where $V_g$ is the variance of the genetic values and $h^2$ is the trait heritability. Individual phenotypes were obtained as the sum of the genetic and environmental effects.
For each scenario, phenotypes were simulated, where residuals were extracted from a suitable Gaussian distribution to generate three traits, each possessing a heritability of 0.1, 0.3, and 0.6, respectively. Each of these scenarios were replicated 10 times.
True breeding values (TBVs) were estimated by aggregating the impacts of the genotypes at the QTLs, as dictated by the following formula:
$$ TBV_i = \sum_{j=1}^{n} x_{ij} a_j \tag{1} $$
where $x_{ij}$ is the genotype of individual $i$ at QTL $j$, coded as 0, 1, or 2; $a_j$ is the additive effect of QTL $j$; and $n$ is the number of QTLs.
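The phenotype simulation in this section can be sketched as follows. The environmental variance follows from the definition of heritability, h2 = V_g / (V_g + V_e), which gives V_e = V_g (1 - h2) / h2. The sketch assumes that `effect_var` is passed directly as the variance of the QTL effect distribution; the function name and toy data are hypothetical.

```python
import random

def simulate_phenotypes(genotypes, n_qtl, h2, effect_var, seed=0):
    """Simulate true breeding values and phenotypes from 0/1/2 genotypes."""
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    # Randomly designate markers as QTLs and draw their additive effects.
    qtl = rng.sample(range(n_markers), n_qtl)
    effects = [rng.gauss(0.0, effect_var ** 0.5) for _ in qtl]
    # TBV of each individual: sum of genotype code times QTL effect.
    tbv = [sum(row[j] * a for j, a in zip(qtl, effects)) for row in genotypes]
    mean_tbv = sum(tbv) / len(tbv)
    v_g = sum((t - mean_tbv) ** 2 for t in tbv) / (len(tbv) - 1)
    # Environmental variance chosen so that the expected heritability is h2.
    v_e = v_g * (1 - h2) / h2
    phenotypes = [t + rng.gauss(0.0, v_e ** 0.5) for t in tbv]
    return tbv, phenotypes, v_g, v_e

# Toy usage (hypothetical data): 200 individuals, 50 markers, 10 QTLs.
geno_rng = random.Random(42)
toy_genotypes = [[geno_rng.randint(0, 2) for _ in range(50)] for _ in range(200)]
tbv, phenotypes, v_g, v_e = simulate_phenotypes(
    toy_genotypes, n_qtl=10, h2=0.3, effect_var=0.01
)
```

Because V_e is derived from the realized V_g, the heritability of each replicate matches the target in expectation regardless of the number of QTLs.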

2.4. Genomic Evaluation

Genomic breeding values were calculated across all scenarios employing the method of genomic best linear unbiased prediction (GBLUP). The GBLUP model was applied using the following formula:
$$ y = Xb + Za + Gg + e \tag{2} $$
where $y$ is the vector of observed phenotypes. The matrices $X$, $Z$, and $G$ allocate the phenotypes to the vectors $b$, $a$, and $g$, which correspond to fixed effects (such as the overall mean and breed-specific effects), polygenic breeding values, and genomic breeding values, respectively. The vector $e$ contains the residual errors, distributed as $N(0, I\sigma_e^2)$, where $I$ is the identity matrix and $\sigma_e^2$ is the error variance. Polygenic and genomic breeding values were distributed as $N(0, A\sigma_a^2)$ and $N(0, \mathrm{GRM}\,\sigma_g^2)$, respectively, where $A$ is the numerator relationship matrix, $\sigma_a^2$ is the additive genetic variance, GRM is the genomic relationship matrix, and $\sigma_g^2$ is the genetic variance attributed to genomic variants. The GRM was constructed as described previously [21].
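A reduced version of the GBLUP computation can be sketched in a few lines. The sketch makes several assumptions that are not taken from the text: the GRM is built with the widely used allele-frequency-centred form ZZ'/(2 Σ p(1 − p)), which may differ from the construction in [21]; the model is simplified to an overall mean plus genomic values (no polygenic term); the mean is estimated by the simple phenotypic average; and the variance ratio `lam` = σ_e²/σ_g² is treated as known. All function names are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """GRM as ZZ' / (2 * sum p(1-p)), with Z the frequency-centred genotypes."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gblup(grm, phenotypes, lam):
    """Genomic values g_hat = GRM (GRM + lam I)^-1 (y - mean(y))."""
    n = len(phenotypes)
    mu = sum(phenotypes) / n
    lhs = [[grm[i][k] + (lam if i == k else 0.0) for k in range(n)]
           for i in range(n)]
    alpha = solve_linear_system(lhs, [y - mu for y in phenotypes])
    return [sum(grm[i][k] * alpha[k] for k in range(n)) for i in range(n)]

# Toy usage (hypothetical data): 4 individuals, 3 markers.
grm_genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1], [1, 1, 1]]
grm_toy = genomic_relationship_matrix(grm_genotypes)
y_gblup = [1.0, 0.2, 0.9, 0.5]
gebv_toy = gblup(grm_toy, y_gblup, lam=1.0)
```

Larger values of `lam` (lower heritability) shrink the predicted genomic values more strongly towards zero, which is one way to see why prediction accuracy falls as heritability decreases.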

2.5. Reference and Validation Populations

Three scenarios of references were examined, which varied based on the size and composition of the reference population.
Scenario I: in this case, the reference population was constituted by 1200 individuals drawn from a single simulated breed, with each breed representing a unique reference population.
Scenario II: the reference population, consisting of 1200 individuals, was selected at random from three distinct simulated breeds, ensuring an equal representation from each breed’s population. K-means clustering and Principal Component Analysis (PCA) were applied to categorize the ten populations into three groups. These groups were then merged to create three separate reference populations. Additionally, for comparative analysis, a fourth reference population was established, which included three breeds (XZC, LSC, and HNC) from the different groups.
Scenario III: a reference population was established by pooling individuals from 10 different breeds, achieved by selecting 120 individuals at random from each of the ten individual populations.
The accuracy and bias of genomic prediction (GP) were assessed using a 5-fold cross-validation (CV) procedure. Within this methodology, the entire population was randomly partitioned into five distinct groups. During each round of the process, one group was designated as the validation set, and the other four groups functioned as the reference set. This random division was conducted five separate times for both the GBLUP and Bayesian models. The accuracy of the predictions was assessed by computing the average Pearson correlation coefficient between the adjusted phenotypic values and the GEBVs (genomic estimated breeding values) for the validation groups [22]. The formula is as follows:
$$ \text{Prediction accuracy} = \mathrm{cor}(y, gebv) \tag{3} $$
where $y$ denotes the vector of adjusted phenotypes, and $gebv$ represents the vector of GEBVs.
To quantify the extent of prediction inflation or deflation, we calculated the regression coefficient of the adjusted phenotypes on the GEBVs for individuals within the validation group. This metric was derived as follows:
$$ b = \frac{\mathrm{cov}(y, gebv)}{\mathrm{var}(gebv)} \tag{4} $$
where $y$ and $gebv$ are the same as in Equation (3). The regression coefficient is expected to be approximately 1 for unbiased models, whereas values > 1 indicate deflated GEBV predictions and values < 1 indicate inflated GEBV predictions [23].
To ascertain if haplotype-based approaches could markedly enhance the prediction accuracy compared to SNP-based methods, a one-sided paired t-test was employed to assess the statistical significance of the observed differences. The threshold for statistical significance was established at a p < 0.05.
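The validation statistics used in this section can be illustrated with a short sketch. It assumes that the bias statistic is the slope of the regression of adjusted phenotypes on GEBVs, computed as a covariance divided by a variance, which is the form consistent with an expected value of 1 for unbiased predictions; the function names and toy values are hypothetical.

```python
import math
import random

def _cov(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def prediction_accuracy(y, gebv):
    """Pearson correlation between adjusted phenotypes and GEBVs."""
    return _cov(y, gebv) / math.sqrt(_cov(y, y) * _cov(gebv, gebv))

def prediction_bias(y, gebv):
    """Slope of the regression of adjusted phenotypes on GEBVs."""
    return _cov(y, gebv) / _cov(gebv, gebv)

def k_fold_indices(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy values (hypothetical): GEBVs are the phenotypes shrunk by one half.
y_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
gebv_half = [0.5, 1.0, 1.5, 2.0, 2.5]
folds = k_fold_indices(23, k=5)
```

In this toy case the correlation is 1 while the slope is 2, which shows that the two statistics capture different properties: the ranking is perfect, but the GEBVs are deflated relative to the phenotypes.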

2.6. Comparison of Genome Prediction Accuracy of Different Genetic Relationships Across Populations

Table 2 outlines the 11 selected combinations employed to assess genome prediction accuracy. Using a reference population of 400 individuals from population 1, genomic breeding values were predicted for 200 individuals each from population 1 and population 2. The accuracy was evaluated using the following formula:
$$ r_{GEBV} = \mathrm{Cor}(GEBV, EBV) \tag{5} $$
The precision of genomic prediction was gauged by the correlation between the projected genetic values and the true breeding values of the synthesized phenotypes. Each simulation scenario was conducted in quintuplicate, and the average accuracy was determined from these replicates.

2.7. Assess Genetic Relationships Between Populations

To fulfill genetic assessment requirements, 70 individuals were selected to maximize genetic relationships. Chip genotype data were used for this purpose to ensure accuracy in assessing LD levels, as >70 samples are needed for a reliable evaluation of LD structure consistency between the two populations.

3. Results

3.1. Genetic Relationships Between Populations

The genetic relationships among the 10 simulated populations were evaluated. For ease of quantification, we assessed LD structure consistency between the populations, ensuring marker spacing within the 0–100 mb range. A representative sample of 11 combinations (Table 3) was selected to evaluate the accuracy of cross-breed genome selection.
Results indicated LD structure consistency ranging from 0.107 to 0.516. The lowest LD structure consistency (0.107277) was observed between simnd and simyh, while the highest (0.516203) was between simls and simzt. We attribute this variation to the correlation between LD structure consistency and individuals from the 10 Chinese cattle breeds. As the distance increases among individuals from these breeds, LD structure consistency between populations also increases.

3.2. Multipopulation Genomic Genetic Assessment of Variety Combinations

Genetic relationships guided the selection of populations for cross-breed and joint genetic assessment. Populations with LD structure consistency > 0.45 (marker spacing within 0–100 mb) were suitable for combined genomic genetic assessment. Figure 1 illustrates that, with a reference population comprising 400 and 200 individuals from populations A and B, respectively, the prediction accuracy of the genomes increased as LD consistency improved between the populations.
The average prediction accuracy for four different simulated phenotypes is shown in Table 4 and Figure 1. Overall, cross-breed genome prediction accuracy rises with increased relationship values. When LD structure consistency between populations exceeds 0.45, cross-breed genome prediction accuracy for low-, medium-, and high-heritability traits can reach 0.30, 0.35, and 0.37, respectively, equivalent to 60–69% of single-breed genome prediction accuracy.

3.3. Multipopulation Genomic Genetic Assessment Methods

Figure 2 illustrates the average prediction accuracy for Group A: 0.18, 0.31, and 0.43 for heritabilities (h2) of 0.1, 0.3, and 0.6, respectively. Regardless of heritability, the prediction accuracy remains relatively stable as the number of individuals in Group B increases. However, even when the LD correlation between Groups A and B is high (0.5), using the Group A reference group (n = 400) to predict Group B results in low accuracy. When a few Group B individuals (n = 200) are added to the reference group, the accuracy is relatively high (~60% of the prediction accuracy for Group A). Furthermore, the accuracy increased as the number of Group B individuals joining the reference group increased.
When the same number of individuals joined Group B, the accuracy improvement varied with increasing heritability. Even in direct cross-breed genome assessments between two closely related breeds, the accuracy remains low. In practical applications, when using variety A to predict the genome of variety B, adding an appropriate number of variety B individuals to the reference population is essential for obtaining reliable accuracy.

4. Discussion

4.1. Simulation of Genotype and Phenotype

The assessment of multipopulation genomic prediction is contingent upon the LD patterns present within these populations.
Consequently, comprehending the LD patterns across various populations can provide valuable insights for exploring the genomic prediction accuracy in a multipopulation context. In the current investigation, simulations were conducted using a resampling method to accurately capture and reflect the allele frequencies and population-specific LD patterns observed in actual populations [24,25]; the genotype for each simulated individual was generated by extracting and repurposing genotype segments from the authentic genotypes of the animals under study. Consequently, the simulated population maintains the fundamental LD pattern structures and allele frequencies that are characteristic of the genuine data from Chinese indigenous cattle. The findings from our study serve as significant validation for the theoretical assessment of genomic prediction in Chinese indigenous cattle breeds.
Additionally, we assessed the characteristics of genomic variation across the entire genome and compared the efficacy of various strategies [26]. Based on the outcomes of PCA and the continuity of phase persistence assessments, it was determined that the simulated genotypes are capable of mirroring the realistic LD patterns, thereby making them suitable for exploring genomic prediction across multiple breeds.
In the simulation process, QTLs were chosen at random from the SNP loci present in the genuine genotype dataset. The allele frequencies of these QTLs vary across different breeds, suggesting that the majority of QTLs are segregating within these breeds. The phenotypes simulated for individuals within each population were distinct due to variations in MAF, which aligns with the genetic diversity observed in real datasets across different populations. This approach is instrumental in assessing the efficacy of genomic prediction for multiple populations.

4.2. LD Level and LD Structure Consistency

To assess LD levels in populations, this study used r2 rather than D′ because D′ is more susceptible to sample size [27]. Khatkar’s study indicated that an accuracy of 0.85 is achievable with a sample size > 55, a condition that the groups/populations in this study met, ensuring sufficient LD assessment accuracy. Furthermore, LD levels decreased with increasing marker distance, as observed in other breeds such as Angus and Holstein cattle [27,28,29].
The GP accuracy is influenced by the heritability and genetic structure of traits, impacting genetic progress in breeding [5,30]. GEBV accuracy is affected by trait heritability, with lower heritability leading to lower prediction accuracy [31,32,33]. In the simulation study of Spanish regional cattle, Mouresan et al. determined that genomic prediction accuracy for a trait with 0.4 heritability ranged between 0.363 and 0.330 in the mixed seven breeds. Additionally, prediction accuracy decreased with decreasing heritability [34]. Esfandyari et al. [35] explored the advantages of applying GS to purebreds for enhancing crossbred performance (CP), using purebred data under two scenarios: low or high linkage disequilibrium (LD) phase correlation between the two pure lines. Their findings suggest that when there is a high correlation of LD phase between both pure lines, combining them into a single reference population enhances selection accuracy when predicting marker effects. Morgante et al. [36] showed that within a population of individuals without genetic relationships and with low levels of LD, the additive GRM, which was built from all common variants (~1,800,000), failed to provide adequate predictive accuracy. This was true irrespective of the trait’s genetic architecture, including instances where the trait exhibited a purely additive architecture. Simultaneously, Cañas-Álvarez et al. [37] emphasized that all average accuracy estimates were positive, aligning with the persistent LD found between these populations and their genetic closeness.

4.3. Predictive Accuracies from Admixed Population

In this study, we noted that predictive accuracies were comparatively modest when assessing the reference and validation populations derived from distinct breeds. This discrepancy may arise due to high LD within the studied breeds, contributing to correlation between SNPs and causal polymorphisms, unlike in other breeds [38]. These findings align with prior empirical studies on traits of comparable heritability [31,32,39]. Meuwissen et al., through a simulation analysis, determined that a predictive reliability of 0.62 could be achieved for a training set comprising 1000 phenotypes with a heritability estimate of 0.5 [31]. The composition of the reference population significantly influenced the precision of the predictions, especially concerning the level of genetic association between the reference and the validation populations [40]. Our study indicates that incorporating these breeds into the reference population enhances the accuracy of predictions. Notably, genomic prediction accuracy, reported in a previous study for Holstein–Friesian and Jersey cows, ranged from 0.01 to 0.19. The addition of individuals from other breeds did not significantly enhance accuracy in the reference population [41]. In application, merging data from different breeds can augment genetic progress when the components of the mix share genetic ties. The precision of predictions was enhanced through the integration of multiple breeds grouped via the K-means approach [15]. Our results provided valuable insights into applying a pooled data approach for multiple population selection. However, certain investigations have indicated that the strategy of pooling data could potentially reduce the predictive precision for mixed populations [42,43].
Our results corroborated earlier research, demonstrating that when employing a composite reference of seven breeds, the predictive accuracies for Spanish native cattle with a heritability of 0.4 varied from 0.363 to 0.330. It is important to note that these accuracies tend to diminish as heritability decreases [34]. Utilizing a pooled data strategy could result in reduced accuracy, especially for elements of admixed populations with limited population sizes [44]. The primary concern revolves around optimizing benefits while considering genotyping costs. To improve the precision of genomic breeding value predictions, it is essential to include a significant number of animals with known genotypes and phenotypes in the training dataset [31,45]. Incorporating individuals from populations with genetic affinities is advantageous for the genotyping of smaller populations. This methodology is practical for the implementation of genomic prediction across a variety of small breeds, including local cattle breeds in different nations. This study revealed lower accuracy in cross-breed predictions, where the reference and validation populations originated from different breeds. This reduced accuracy may stem from the high LD [between QTL and single-nucleotide polymorphism (SNP)] present in the reference population but not necessarily mirrored in the validation population. Similar trends have been observed in other populations [32,38,39]. The composition of the reference population significantly impacts the precision of estimating genome breeding values, particularly when there is a considerable genetic distance between the reference and validation populations [40]. In breeding scenarios, the primary concern revolves around achieving accurate selection results while managing breeding costs. To enhance the accuracy of genomic breeding value estimation, it becomes imperative to establish a larger reference population incorporating both genotype and phenotypic data [45].

4.4. Effect of Heritability and Genetic Architecture

The trait’s heritability and genetic framework can affect the genetic improvement achieved through genomic prediction within breeding initiatives [5,30]. In our investigation, GBLUP was employed to forecast genetic merit, operating under the premise that each marker exerts an equal influence [46]. The heritability of a phenotype is a determinant of the dependability of GEBVs [31,33,47], with traits exhibiting low heritability generally yielding lower predictive accuracies. In our simulation, reduced heritability predominantly led to diminished predictive precision for the majority of traits and situations examined. The additive model, which includes all common variants, was only able to explain a fraction of the total heritability for complex traits [36]; the observed decline in predictive accuracy in actual data can be ascribed to the “missing heritability” issue, stemming from the impact of non-additive genetic factors.
Previous research has indicated that genomic predictions derived from real data were not aligning consistently with the outcomes of simulation analyses [48,49]. A potential explanation is that the simulated data, which encompass a diverse genetic structure, substantially deviate from the characteristics of actual populations. Numerous investigations have contrasted methodologies using simulated genetic frameworks that include 50 or fewer QTLs, and their conclusions have demonstrated that the genetic architecture can influence the precision of predictions. This includes the number of QTLs and the variance they contribute [48,50]. In this study, individuals from 10 Chinese cattle breeds are employed, and three different strategies simulate and evaluate genetic relationships among the 10 populations. Thus, results indicate the feasibility of a multipopulation GS strategy for Chinese native cattle, even under small population sizes.

5. Conclusions

In conclusion, this study emphasizes that the accuracy of cross-breed genome assessments is notably low, even when conducted between closely related breeds. However, a practical and effective measure to improve accuracy in genome selection for breeds with small populations involves integrating individuals from other breeds with a close genetic correlation to the reference population. Upon establishing the combination method for genetic evaluation across multipopulations, we utilized common LD information to filter markers. The procedure involved the following steps: (1) calculating LD levels individually for markers in the genotype data of each variety; (2) within each variety, sorting the $r_{ij}$ values among markers and selecting markers based on the criteria of $0.80 < r_{ij} < 0.999$ or $0.90 < r_{ij} < 0.999$; (3) retaining marker pairs that met the criterion in all populations and designating the two SNPs of each pair as an SNP set, forming the basis for interpopulation LD selection for multipopulation genome genetic assessment. This finding holds crucial practical significance for implementing GS in local beef cattle breeds across diverse countries. This study’s results also furnish valuable references for optimizing genome selection strategies in multipopulation contexts.
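The three-step marker filter outlined above can be sketched as follows. The sketch makes assumptions that are not specified in the text: LD is measured as the signed Pearson correlation of 0/1/2 genotype codes, only marker pairs within a fixed `window` of adjacent markers are examined, and the function names and toy data are hypothetical.

```python
import math

def genotype_r(x, y):
    """Pearson correlation of two genotype columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def pairs_in_ld_range(genotypes, low=0.80, high=0.999, window=1):
    """Steps 1 and 2: marker pairs whose r lies strictly between low and high."""
    n_markers = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(n_markers)]
    selected = set()
    for i in range(n_markers):
        for j in range(i + 1, min(i + 1 + window, n_markers)):
            if low < genotype_r(cols[i], cols[j]) < high:
                selected.add((i, j))
    return selected

def shared_ld_pairs(breeds, **kwargs):
    """Step 3: keep only the marker pairs selected in every breed."""
    sets = [pairs_in_ld_range(g, **kwargs) for g in breeds.values()]
    return set.intersection(*sets)

# Toy data (hypothetical): two breeds, 8 animals each, 3 markers.
breed_x = list(map(list, zip(
    [0, 0, 1, 1, 2, 2, 0, 2],
    [0, 0, 1, 1, 2, 2, 1, 2],
    [2, 0, 1, 0, 1, 2, 1, 0],
)))
breed_y = list(map(list, zip(
    [2, 2, 1, 1, 0, 0, 2, 0],
    [2, 2, 1, 1, 0, 0, 1, 0],
    [2, 2, 1, 1, 0, 0, 1, 1],
)))
shared = shared_ld_pairs({"x": breed_x, "y": breed_y})
```

In the toy data, the pair of markers 0 and 1 falls in the target range in both breeds, whereas the pair of markers 1 and 2 does so only in the second breed and is therefore dropped from the shared SNP sets.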

Author Contributions

Data curation, J.L.; Formal analysis, H.J.; Funding acquisition, L.X. and J.H.; Investigation, S.Z.; Methodology, J.L.; Project administration, L.X., J.L. and J.H.; Resources, L.X. and J.H.; Software, L.J.; Supervision, L.X. and J.H.; Validation, Y.J.; Visualization, L.J. and H.J.; Writing—original draft, L.J. and L.X.; Writing—review and editing, L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Program of Anhui Provincial Key Laboratory of Livestock and Poultry Product Safety Engineering (XM2405), Innovative Construction Project of Anhui Province (S202003b06020001), National Natural Science Foundation of China (32002162), and Joint Research on Improved Beef Cattle Breeds in Anhui Province ([2021]1146).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Animal Experiment Welfare Ethics Committee of the Anhui Academy of Agricultural Sciences (protocol code AAAS2024-31; date of approval 10 October 2024).

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data cannot be made available, as they are the property of the cattle producers in China, and this information is commercially sensitive.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GS: genome selection; GEBV: genomic estimated breeding value; GWAS: genome-wide association studies; LD: linkage disequilibrium; QTL: quantitative trait locus; h²: heritability; CP: crossbred performance; SNP: single-nucleotide polymorphism; Simmg: Simulated Inner Mongolia cattle; Simyh: Simulated Yanhuang cattle; Simcdm: Simulated Caidamu cattle; Simpw: Simulated Pingwu cattle; Simls: Simulated Liangshan cattle; Simzt: Simulated Zhaotong cattle; Simws: Simulated Wenshan cattle; Simnd: Simulated Nandan cattle.

References

  1. Goddard, M. Genomic selection: Prediction of accuracy and maximisation of long term response. Genetica 2009, 136, 245–257. [Google Scholar] [CrossRef] [PubMed]
  2. Hayes, B.J.; Bowman, P.J.; Chamberlain, A.J.; Goddard, M.E. Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 2009, 92, 433–443. [Google Scholar] [CrossRef] [PubMed]
  3. Mucha, S.; Mrode, R.; MacLaren-Lee, I.; Coffey, M.; Conington, J. Estimation of genomic breeding values for milk yield in UK dairy goats. J. Dairy Sci. 2015, 98, 8201–8208. [Google Scholar] [CrossRef] [PubMed]
  4. Hayes, B.J.; Bowman, P.J.; Chamberlain, A.C.; Verbyla, K.; Goddard, M.E. Accuracy of genomic breeding values in multi-breed dairy cattle populations. Genet. Sel. Evol. 2009, 41, 51. [Google Scholar] [CrossRef]
  5. Zhou, L.; Heringstad, B.; Su, G.; Guldbrandtsen, B.; Meuwissen, T.H.; Svendsen, M.; Grove, H.; Nielsen, U.S.; Lund, M.S. Genomic predictions based on a joint reference population for the Nordic Red cattle breeds. J. Dairy Sci. 2014, 97, 4485–4496. [Google Scholar] [CrossRef]
  6. Sollero, B.P.; Junqueira, V.S.; Gomes, C.C.G.; Caetano, A.R.; Cardoso, F.F. Tag SNP selection for prediction of tick resistance in Brazilian Braford and Hereford cattle breeds using Bayesian methods. Genet. Sel. Evol. 2017, 49, 49. [Google Scholar] [CrossRef]
  7. van den Berg, I.; Boichard, D.; Guldbrandtsen, B.; Lund, M.S. Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study. G3 Genes Genomes Genet. 2016, 6, 2553–2561. [Google Scholar] [CrossRef]
  8. Brito, L.F.; Clarke, S.M.; McEwan, J.C.; Miller, S.P.; Pickering, N.K.; Bain, W.E.; Dodds, K.G.; Sargolzaei, M.; Schenkel, F.S. Prediction of genomic breeding values for growth, carcass and meat quality traits in a multi-breed sheep population using a HD SNP chip. BMC Genet. 2017, 18, 7. [Google Scholar] [CrossRef]
  9. van den Berg, I.; Bowman, P.J.; MacLeod, I.M.; Hayes, B.J.; Wang, T.; Bolormaa, S.; Goddard, M.E. Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genet. Sel. Evol. 2017, 49, 70. [Google Scholar] [CrossRef]
  10. Toosi, A.; Fernando, R.L.; Dekkers, J.C. Genomic selection in admixed and crossbred populations. J. Anim. Sci. 2010, 88, 32–46. [Google Scholar] [CrossRef]
  11. Hoze, C.; Fritz, S.; Phocas, F.; Boichard, D.; Ducrocq, V.; Croiseau, P. Efficiency of multi-breed genomic selection for dairy cattle breeds with different sizes of reference population. J. Dairy Sci. 2014, 97, 3918–3929. [Google Scholar] [CrossRef] [PubMed]
  12. Daetwyler, H.D.; Swan, A.A.; van der Werf, J.H.; Hayes, B.J. Accuracy of pedigree and genomic predictions of carcass and novel meat quality traits in multi-breed sheep data assessed by cross-validation. Genet. Sel. Evol. 2012, 44, 33. [Google Scholar] [CrossRef] [PubMed]
  13. Wientjes, Y.C.; Veerkamp, R.F.; Calus, M.P. The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction. Genetics 2013, 193, 621–631. [Google Scholar] [CrossRef]
  14. Ma, H.; Li, H.; Ge, F.; Zhao, H.; Zhu, B.; Zhang, L.; Gao, H.; Xu, L.; Li, J.; Wang, Z. Improving Genomic Predictions in Multi-Breed Cattle Populations: A Comparative Analysis of BayesR and GBLUP Models. Genes 2024, 15, 253. [Google Scholar] [CrossRef]
  15. Xu, L.; Zhu, B.; Wang, Z.; Xu, L.; Liu, Y.; Chen, Y.; Zhang, L.; Gao, X.; Gao, H.; Zhang, S. Evaluation of linkage disequilibrium, effective population size and haplotype block structure in Chinese cattle. Animals 2019, 9, 83. [Google Scholar] [CrossRef]
  16. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; De Bakker, P.I.; Daly, M.J. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575. [Google Scholar] [CrossRef]
  17. Browning, B.L.; Zhou, Y.; Browning, S.R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 2018, 103, 338–348. [Google Scholar] [CrossRef]
  18. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108. [Google Scholar] [CrossRef]
  19. Jombart, T.; Ahmed, I. adegenet 1.3-1: New tools for the analysis of genome-wide SNP data. Bioinformatics 2011, 27, 3070–3071. [Google Scholar] [CrossRef]
  20. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423. [Google Scholar] [CrossRef]
  21. Yang, J.; Lee, S.H.; Goddard, M.E.; Visscher, P.M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011, 88, 76–82. [Google Scholar] [CrossRef] [PubMed]
  22. Bolormaa, S.; Pryce, J.; Kemper, K.; Savin, K.; Hayes, B.; Barendse, W.; Zhang, Y.; Reich, C.; Mason, B.; Bunch, R. Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle. J. Anim. Sci. 2013, 91, 3088–3104. [Google Scholar] [CrossRef] [PubMed]
  23. Xu, L.; Gao, N.; Wang, Z.; Xu, L.; Liu, Y.; Chen, Y.; Xu, L.; Gao, X.; Zhang, L.; Gao, H. Incorporating genome annotation into genomic prediction for carcass traits in Chinese Simmental beef cattle. Front. Genet. 2020, 11, 481. [Google Scholar] [CrossRef]
  24. Shi, M.; Umbach, D.M.; Wise, A.S.; Weinberg, C.R. Simulating autosomal genotypes with realistic linkage disequilibrium and a spiked-in genetic effect. BMC Bioinform. 2018, 19, 2. [Google Scholar] [CrossRef]
  25. Chen, L.; Yu, G.; Langefeld, C.D.; Miller, D.J.; Guy, R.T.; Raghuram, J.; Yuan, X.; Herrington, D.M.; Wang, Y. Comparative analysis of methods for detecting interacting loci. BMC Genom. 2011, 12, 344. [Google Scholar] [CrossRef]
  26. Carvajal-Rodríguez, A. Simulation of genomes: A review. Curr. Genom. 2008, 9, 155–159. [Google Scholar] [CrossRef]
  27. Khatkar, M.S.; Nicholas, F.W.; Collins, A.R.; Zenger, K.R.; Cavanagh, J.A.; Barris, W.; Schnabel, R.D.; Taylor, J.F.; Raadsma, H.W. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC Genom. 2008, 9, 187. [Google Scholar] [CrossRef]
  28. Lu, D.; Sargolzaei, M.; Kelly, M.; Li, C.; Vander Voort, G.; Wang, Z.; Plastow, G.; Moore, S.; Miller, S.P. Linkage disequilibrium in Angus, Charolais, and Crossbred beef cattle. Front. Genet. 2012, 3, 152. [Google Scholar] [CrossRef]
  29. Sudrajad, P.; Seo, D.; Choi, T.; Park, B.; Roh, S.; Jung, W.; Lee, S.; Lee, J.; Kim, S.; Lee, S. Genome-wide linkage disequilibrium and past effective population size in three Korean cattle breeds. Anim. Genet. 2017, 48, 85–89. [Google Scholar] [CrossRef]
  30. Karoui, S.; Carabaño, M.J.; Díaz, C.; Legarra, A. Joint genomic evaluation of French dairy cattle breeds using multiple-trait models. Genet. Sel. Evol. 2012, 44, 39. [Google Scholar] [CrossRef]
  31. Meuwissen, T.H.; Hayes, B.J.; Goddard, M. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829. [Google Scholar] [CrossRef] [PubMed]
  32. Saatchi, M.; McClure, M.C.; McKay, S.D.; Rolf, M.M.; Kim, J.; Decker, J.E.; Taxis, T.M.; Chapple, R.H.; Ramey, H.R.; Northcutt, S.L. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet. Sel. Evol. 2011, 43, 40. [Google Scholar] [CrossRef] [PubMed]
  33. Calus, M.; Meuwissen, T.; De Roos, A.; Veerkamp, R. Accuracy of genomic selection using different methods to define haplotypes. Genetics 2008, 178, 553–561. [Google Scholar] [CrossRef] [PubMed]
  34. Mouresan, E.; Cañas-Álvarez, J.; González-Rodríguez, A.; Munilla, S.; Altarriba, J.; Díaz, C.; Baró, J.; Molina, A.; Piedrafita, J.; Varona, L. Evaluation of the potential use of a meta-population for genomic selection in autochthonous beef cattle populations. Animal 2018, 12, 1350–1357. [Google Scholar] [CrossRef] [PubMed]
  35. Esfandyari, H.; Sørensen, A.C.; Bijma, P. Maximizing crossbred performance through purebred genomic selection. Genet. Sel. Evol. 2015, 47, 16. [Google Scholar] [CrossRef]
  36. Morgante, F.; Huang, W.; Maltecca, C.; Mackay, T.F. Effect of genetic architecture on the prediction accuracy of quantitative traits in samples of unrelated individuals. Heredity 2018, 120, 500–514. [Google Scholar] [CrossRef]
  37. Cañas-Álvarez, J.; Mouresan, E.; Varona, L.; Díaz, C.; Molina, A.; Baro, J.; Altarriba, J.; Carabaño, M.; Casellas, J.; Piedrafita, J. Linkage disequilibrium, persistence of phase, and effective population size in Spanish local beef cattle breeds assessed through a high-density single nucleotide polymorphism chip. J. Anim. Sci. 2016, 94, 2779–2788. [Google Scholar] [CrossRef]
  38. Van den Berg, I.; Meuwissen, T.; MacLeod, I.; Goddard, M. Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction. J. Dairy Sci. 2019, 102, 3155–3174. [Google Scholar] [CrossRef]
  39. Van Eenennaam, A.L.; Weigel, K.A.; Young, A.E.; Cleveland, M.A.; Dekkers, J.C. Applied animal genomics: Results from the field. Annu. Rev. Anim. Biosci. 2014, 2, 105–139. [Google Scholar] [CrossRef]
  40. Pszczola, M.; Strabel, T.; Mulder, H.; Calus, M. Reliability of direct genomic values for animals with different relationships within and to the reference population. J. Dairy Sci. 2012, 95, 389–400. [Google Scholar] [CrossRef]
  41. Wientjes, Y.C.; Calus, M.P.; Goddard, M.E.; Hayes, B.J. Impact of QTL properties on the accuracy of multi-breed genomic prediction. Genet. Sel. Evol. 2015, 47, 42. [Google Scholar] [CrossRef] [PubMed]
  42. Kachman, S.D.; Spangler, M.L.; Bennett, G.L.; Hanford, K.J.; Kuehn, L.A.; Snelling, W.M.; Thallman, R.M.; Saatchi, M.; Garrick, D.J.; Schnabel, R.D. Comparison of molecular breeding values based on within-and across-breed training in beef cattle. Genet. Sel. Evol. 2013, 45, 30. [Google Scholar] [CrossRef] [PubMed]
  43. Weber, K.; Thallman, R.; Keele, J.; Snelling, W.; Bennett, G.; Smith, T.; McDaneld, T.; Allan, M.; Van Eenennaam, A.; Kuehn, L. Accuracy of genomic breeding values in multibreed beef cattle populations derived from deregressed breeding values and phenotypes. J. Anim. Sci. 2012, 90, 4177–4190. [Google Scholar] [CrossRef] [PubMed]
  44. Rekaya, R. A multi-compartment model for genomic selection in multi-breed populations. Livest. Sci. 2015, 177, 1–7. [Google Scholar]
  45. Erbe, M.; Hayes, B.; Matukumalli, L.; Goswami, S.; Bowman, P.; Reich, C.; Mason, B.; Goddard, M. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 2012, 95, 4114–4129. [Google Scholar] [CrossRef]
  46. de Los Campos, G.; Vazquez, A.I.; Fernando, R.; Klimentidis, Y.C.; Sorensen, D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 2013, 9, e1003608. [Google Scholar] [CrossRef]
  47. Daetwyler, H.D.; Villanueva, B.; Woolliams, J.A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 2008, 3, e3395. [Google Scholar] [CrossRef]
  48. Habier, D.; Fernando, R.L.; Dekkers, J. The impact of genetic relationship information on genome-assisted breeding values. Genetics 2007, 177, 2389–2397. [Google Scholar] [CrossRef]
  49. Wiggans, G.; VanRaden, P.; Cooper, T. The genomic evaluation system in the United States: Past, present, future. J. Dairy Sci. 2011, 94, 3202–3211. [Google Scholar] [CrossRef]
  50. Chen, L.; Li, C.; Miller, S.; Schenkel, F. Multi-population genomic prediction using a multi-task Bayesian learning model. BMC Genet. 2014, 15, 53. [Google Scholar] [CrossRef]
Figure 1. Relationship between cross-breed genome prediction accuracy and LD consistency between populations.
Figure 2. Influence of different reference population settings on cross-breed prediction accuracy (LD correlation between Groups A and B is 0.5).
Table 1. Information on the breeds used for phenotypic simulation.
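
| Cattle Breed | Abbreviation | Number of Animals |
|---|---|---|
| Inner Mongolia cattle | MGC | 21 |
| Yanhuang cattle | YHC | 24 |
| Caidamu cattle | CDM | 25 |
| Xizang cattle | XZC | 26 |
| Pingwu cattle | PWC | 24 |
| Liangshan cattle | LSC | 22 |
| Zhaotong cattle | ZTC | 23 |
| Wenshan cattle | WSC | 24 |
| Hannan cattle | HNC | 26 |
| Nandan cattle | NDC | 25 |
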
Table 2. Phenotypic simulation strategy.

| Phenotypic Simulation Strategy | nQTL ¹ | nS ² | nM ³ | nL ⁴ | Heritability (h²) |
|---|---|---|---|---|---|
| Strategy I | 100 | 0 | 0 | 100 | 0.1/0.3/0.6 |
| Strategy II | 2000 | 1361 | 614 | 25 | 0.1/0.3/0.6 |
| Strategy III | 5000 | 4595 | 390 | 15 | 0.1/0.3/0.6 |
| Strategy IV | 10,000 | 10,000 | 0 | 0 | 0.1/0.3/0.6 |

¹ Total number of QTLs. ² Number of QTLs with small effect (nS). ³ Number of QTLs with medium effect (nM). ⁴ Number of QTLs with large effect (nL).
Table 3. LD structure consistency between the breed combinations selected for the comparison of cross-breed genome predictions.
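
| Population 1 | Population 2 | LD Structure Consistency |
|---|---|---|
| simnd | simyh | 0.107277 |
| simcdm | simws | 0.174347 |
| simyh | simzt | 0.2228 |
| simnd | simpw | 0.255829 |
| simnd | simzt | 0.28869 |
| simpw | simws | 0.363149 |
| simcdm | simyh | 0.411982 |
| simmg | simyh | 0.457195 |
| simls | simws | 0.431238 |
| simnd | simws | 0.453237 |
| simls | simzt | 0.516203 |
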
Simmg (Simulated Inner Mongolia cattle), Simyh (Simulated Yanhuang cattle), Simcdm (Simulated Caidamu cattle), Simpw (Simulated Pingwu cattle), Simls (Simulated Liangshan cattle), Simzt (Simulated Zhaotong cattle), Simws (Simulated Wenshan cattle), and Simnd (Simulated Nandan cattle).
Table 4. Comparison of genome prediction accuracy of different genetic relationships across populations.
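
| Population Pair | cor100 | Low Heritability, Population 1 | Low Heritability, Population 2 | Medium Heritability, Population 1 | Medium Heritability, Population 2 | High Heritability, Population 1 | High Heritability, Population 2 |
|---|---|---|---|---|---|---|---|
| nd-yh | 0.107 | 0.478 | 0.167 | 0.493 | 0.191 | 0.505 | 0.208 |
| cdm-ws | 0.174 | 0.582 | 0.184 | 0.565 | 0.212 | 0.562 | 0.231 |
| yh-zt | 0.223 | 0.556 | 0.207 | 0.588 | 0.238 | 0.576 | 0.261 |
| nd-pw | 0.256 | 0.479 | 0.234 | 0.494 | 0.270 | 0.505 | 0.298 |
| nd-zt | 0.289 | 0.478 | 0.248 | 0.494 | 0.287 | 0.505 | 0.315 |
| pw-ws | 0.363 | 0.558 | 0.274 | 0.581 | 0.317 | 0.567 | 0.351 |
| cdm-yh | 0.412 | 0.582 | 0.274 | 0.564 | 0.318 | 0.562 | 0.345 |
| ls-ws | 0.431 | 0.705 | 0.281 | 0.647 | 0.326 | 0.620 | 0.349 |
| nd-ws | 0.453 | 0.516 | 0.259 | 0.553 | 0.301 | 0.553 | 0.307 |
| mg-yh | 0.457 | 0.479 | 0.298 | 0.491 | 0.347 | 0.503 | 0.365 |
| ls-zt | 0.516 | 0.519 | 0.347 | 0.550 | 0.404 | 0.553 | 0.438 |
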
cor100 refers to the consistency of inter-marker LD structure between two populations in the range of 0–100 Mb; low, medium, and high heritability correspond to h² values of 0.1, 0.3, and 0.6, respectively.
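As a rough check of the trend summarized in Figure 1, the association between cor100 and prediction accuracy can be computed directly from the values in Table 4. The sketch below is illustrative only. It assumes that the "Population 2" columns hold the cross-breed accuracies, which is a reading of the table layout rather than something the table states, and the names `acc_pop2` and `pearson_r` are introduced here for illustration.

```python
import numpy as np

# cor100 values transcribed from Table 4 (one entry per population pair).
cor100 = np.array([0.107, 0.174, 0.223, 0.256, 0.289, 0.363,
                   0.412, 0.431, 0.453, 0.457, 0.516])

# "Population 2" columns of Table 4, ASSUMED here to be cross-breed accuracies.
acc_pop2 = {
    "low": np.array([0.167, 0.184, 0.207, 0.234, 0.248, 0.274,
                     0.274, 0.281, 0.259, 0.298, 0.347]),
    "medium": np.array([0.191, 0.212, 0.238, 0.270, 0.287, 0.317,
                        0.318, 0.326, 0.301, 0.347, 0.404]),
    "high": np.array([0.208, 0.231, 0.261, 0.298, 0.315, 0.351,
                      0.345, 0.349, 0.307, 0.365, 0.438]),
}


def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# One correlation per heritability level.
trend = {level: pearson_r(cor100, acc) for level, acc in acc_pop2.items()}
for level, r in trend.items():
    print(f"{level} heritability: r = {r:.2f}")
```

A strongly positive correlation at each heritability level would be consistent with accuracy in the second population rising as LD consistency between the two populations increases.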

Jin, L.; Xu, L.; Jin, H.; Zhao, S.; Jia, Y.; Li, J.; Hua, J. Accuracy of Genomic Predictions Cross Populations with Different Linkage Disequilibrium Patterns. Genes 2024, 15, 1419. https://doi.org/10.3390/genes15111419
