Article

Selection Signature Analysis of Whole-Genome Sequences to Identify Genome Differences Between Selected and Unselected Holstein Cattle

1 Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
2 Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, United States Department of Agriculture, Beltsville, MD 20705, USA
3 Department of Animal Science, University of Minnesota, St Paul, MN 55108, USA
* Author to whom correspondence should be addressed.
Animals 2025, 15(15), 2247; https://doi.org/10.3390/ani15152247
Submission received: 7 July 2025 / Revised: 28 July 2025 / Accepted: 29 July 2025 / Published: 31 July 2025
(This article belongs to the Section Animal Genetics and Genomics)

Simple Summary

Using a unique line of Holstein cattle unselected since 1964, we compared the genomes of unselected and selected Holstein cattle and reported genome differences due to long-term selection. We also integrated selection signatures with gene annotation, pathways, and the cattle QTL database to further explore the functional link between selection signatures and economic traits. The candidate selection signatures were involved in multiple functions such as milk production, reproduction, and health. We confirmed that long-term artificial selection affected the whole genome rather than a few major genes due to the polygenic nature of the complex traits under selection.

Abstract

A unique line of Holstein cattle has been maintained without selection in Minnesota since 1964. After many generations, unselected cattle produce less milk, but have better reproductive performance and health traits when compared with contemporary cows. Comparisons between this line of unselected Holstein and those under selection provide useful insights that connect selection and complex traits in cattle. Utilizing these unique resources and sequence data, we sought to identify genome changes due to selection. We sequenced 30 unselected and 54 selected Holstein cattle and compared their sequence variants to identify selection signatures. After many years, the two populations showed completely different patterns in their genome-level population structures and linkage disequilibrium. By integrating signals from five different detection methods, we detected consensus selection signatures from at least four methods covering 14,533 SNPs and 155 protein-coding genes. An integrated analysis of selection signatures with gene annotation, pathways, and the cattle QTL database demonstrated that the genomic regions under selection are related to milk productivity, health, and reproductive efficiency. The polygenic nature of these complex traits is evident from hundreds of selection signatures and candidate genes, suggesting that long-term artificial selection has acted on the whole genome rather than a few major genes. In summary, our study identified candidate selection signatures underlying phenotypic differences between unselected and selected Holstein cows and revealed insights into the genetic basis of complex traits in cattle.

1. Introduction

Over hundreds of years, selective breeding has greatly increased production while also changing the genomes of domestic animals [1]. In dairy farming, selection and breeding practices are routinely employed to enhance economic traits such as milk yield, protein and fat content, and reproductive efficiency [2]. More recently, the U.S. dairy industry has shifted selection goals toward traits related to disease resistance and efficiency (https://uscdcb.com/). Over generations, such artificial selection has left distinctive imprints across the bovine genome, known as selection signatures, reflecting the genetic basis of economically important dairy traits.
In this study, we used two groups of Holstein cattle, a selected group and an unselected line, kept under the same management and environmental conditions at the University of Minnesota since 1964 [3]. The unselected line was bred within the group without any selection, and the selected group was bred with semen from commercial bulls with top milk production. Due to these different breeding strategies, the unselected control line remained unchanged for most traits, while the selected group tracked the national average, with an increasing trend for production traits but a decreasing trend for fertility traits. As the unselected and selected lines were maintained under the same conditions, we hypothesized that genome differences between the two lines are responsible for the divergent phenotypic variation. Understanding the genomic differences underlying these two different populations is crucial not only for mapping genes related to economic traits, but also for maintaining the genetic diversity and long-term sustainability of breeding programs [4].
Many selection signature analysis methods have been developed to detect genomic regions under selection, and they are sensitive to distinct time scales and types of selective sweeps. For instance, population differentiation-based methods such as the Fixation Index (Fst) detect distant selection events with strong interpopulation differentiation. Nucleotide diversity can be used to identify genomic regions that have significantly reduced nucleotide diversity [5]. Among haplotype-based methods, cross-population extended haplotype homozygosity (XP-EHH) is useful for ongoing or nearly fixed sweeps [6], and the integrated haplotype score (iHS) is sensitive to rapidly increasing derived allele frequencies [7]. Allele frequency differentiation-based methods like the cross-population composite likelihood ratio (XP-CLR) effectively detect differences in allele frequencies at multiple loci between two populations without relying on changes in population size [8]. Different methods capture different aspects of selection, but may also miss signals due to their distinct assumptions and sensitivities.
Previously, we reported genome differences between these two groups of cattle using 50K SNP array data [9], which have limited resolution. In this study (Figure 1), we conducted whole genome resequencing analysis on 84 Holstein cattle from the unselected (n = 30) and selected lines (n = 54). We carried out direct genomic comparisons using five methods for selection signature detection: XP-EHH, the iHS, the XP-CLR, the θ/π ratio, and the Fst. We also assembled 8285 quantitative trait loci (QTL) and 155 candidate genes covering six major trait types, especially those associated with milk production and reproduction traits. Employing a set of complementary allele frequency and haplotype-based methods, along with the annotation of QTLs and candidate genes, this study aimed to identify selection signatures underlying phenotypic differences between unselected and selected Holstein cows and to provide a more comprehensive understanding of the genetic basis of complex traits in cattle.

2. Materials and Methods

2.1. Study Population and Sample Collection

This study used two groups of unique Holstein cattle resources (Figure 1): a closed herd maintained without selection since 1964 from Minnesota (GP1, n = 30) and a long-term artificially selected Holstein population in the United States breeding system (GP2, n = 54) [9]. Blood samples were collected from all individuals, and whole-genome sequencing data were generated.

2.2. Whole Genome Sequencing and Data Quality Control

Tissue samples were processed and sequenced with standard procedures. Variants were called per sample using the DeepVariant model with option “WGS” [10]. Quality control measures were applied to ensure high-quality single nucleotide polymorphism (SNP) datasets. All of the acquired SNP data were quality checked using Plink v1.90 software [11], removing SNPs with low quality (QUAL < 10), a low minor allele frequency (MAF < 0.05), or a high missing rate (F_MISS > 0.5), as well as multi-allelic variants. Finally, a total of 12,381,574 high-quality bi-allelic SNPs were retained for downstream analysis.
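The filtering criteria above can be illustrated with a minimal Python sketch. This is not the Plink pipeline used in the study; the `Variant` record and the `passes_qc` function are hypothetical names introduced only to make the four thresholds (QUAL, MAF, missing rate, bi-allelic sites) explicit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical, simplified variant record (not a real VCF parser type)."""
    chrom: str
    pos: int
    n_alt_alleles: int               # number of ALT alleles listed at the site
    qual: float                      # site-level QUAL value
    genotypes: List[Optional[int]]   # ALT-allele count per sample; None = missing


def passes_qc(v, min_qual=10.0, min_maf=0.05, max_missing=0.5):
    """Return True if the variant survives the thresholds stated in the text."""
    if v.n_alt_alleles != 1:         # keep bi-allelic sites only
        return False
    if v.qual < min_qual:            # drop QUAL < 10
        return False
    if not v.genotypes:
        return False
    called = [g for g in v.genotypes if g is not None]
    missing_rate = 1 - len(called) / len(v.genotypes)
    if not called or missing_rate > max_missing:   # drop F_MISS > 0.5
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf            # drop MAF < 0.05


example = Variant("14", 1_000_000, 1, 42.0, [0, 1, 2, 1, None, 0])
print(passes_qc(example))
```

The order of the checks does not change the outcome; the cheap site-level tests simply come first so that genotype-level statistics are only computed when needed.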

2.3. Analysis of Diversity and Population Structure

To compare genetic diversity between the two groups, filtered high-quality SNPs were used to estimate observed and expected heterozygosity (Ho and He, respectively) and the F coefficient with Plink v1.90. To evaluate population structure, filtered SNPs were first pruned for linkage disequilibrium (LD) in Plink v1.90 using the --indep-pairwise option with parameters “50 5 0.2”. Specifically, a genome-wide scan was conducted with a 50-SNP window shifted by 5 SNPs at each step, and SNPs with an r² of more than 0.2 were removed. Principal component analysis (PCA) was then conducted on the LD-pruned SNPs, and the results were visualized using the ggplot2 package in R [12]. To estimate and compare the patterns of LD between the unselected and selected populations, the LD coefficient (r²) was calculated for each pair of SNPs. With a maximum distance of 500 kb, PopLDdecay v3.4218 was used to generate LD decay plots based on the distance between SNPs [13].
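The windowed pruning step can be sketched in plain Python as follows. This is a simplified illustration of the "50 5 0.2" idea, not a reimplementation of Plink: r² is taken here as the squared Pearson correlation of genotype dosages, and when a pair exceeds the threshold the sketch always drops the later SNP, whereas Plink applies its own rule for choosing which variant to remove. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:         # monomorphic SNP: correlation undefined
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps, window=50, step=5, r2_max=0.2):
    """snps: per-SNP dosage lists (0/1/2) in genomic order.

    Slides a window of `window` SNPs in steps of `step` SNPs and, for each
    pair of retained SNPs with r^2 > r2_max, removes the later one.
    Returns the indices of the SNPs that are kept.
    """
    removed = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        in_window = [i for i in range(start, end) if i not in removed]
        for k, i in enumerate(in_window):
            if i in removed:
                continue
            for j in in_window[k + 1:]:
                if j not in removed and r_squared(snps[i], snps[j]) > r2_max:
                    removed.add(j)
        start += step
    return [i for i in range(len(snps)) if i not in removed]


toy = [
    [0, 1, 2, 0, 1, 2],   # SNP 0
    [0, 1, 2, 0, 1, 2],   # SNP 1: identical to SNP 0, so r^2 = 1
    [0, 2, 1, 1, 2, 0],   # SNP 2: uncorrelated with SNP 0
]
print(ld_prune(toy))
```

In the toy data, the duplicated SNP is pruned and the uncorrelated one is retained, which is the behavior the pruning step relies on before PCA.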

2.4. Selection Signature Analysis

To carry out a comprehensive analysis of significant genomic signals under selection, we employed five different detection methods to detect selection signatures between and within populations: the fixation index (Fst), also known as the genetic differentiation index; nucleotide diversity (PI); the cross-population composite likelihood ratio test (XP-CLR) [5,14]; cross-population extended haplotype homozygosity (XP-EHH) [6]; and the integrated haplotype score (iHS) [7]. Each method was chosen for its ability to capture selection signals from different perspectives. All analyses were performed using standardized pipelines and software tools appropriate for each method. The Fst, the θ/π ratio, and allele frequencies were calculated using VCFtools with a 50 kb sliding window and a step size of 20 kb for scanning the whole genome [15]. The XP-CLR program used a 50 kb sliding window with a step size of 20 kb. XP-EHH and iHS scores were computed using Selscan with a 50 kb sliding window and a default step size, with normalization applied to ensure comparability across genomic regions [16]. Finally, for each method we took the top 5% of signals as candidates. To minimize false positives and enhance the reliability of detected signals, we retained only SNPs that fell within the top 5% for at least four of the five methods.
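The consensus rule (top 5% in at least four of five methods) can be sketched as a simple voting scheme. The sketch below assumes that each method's scores have already been mapped to SNPs and oriented so that larger values mean stronger evidence of selection (for example, an absolute value for iHS); that orientation step, the function names, and the toy data are assumptions made for illustration and are not taken from the study's pipeline.

```python
from math import ceil


def top_fraction(scores, fraction=0.05):
    """Return the SNP ids whose score ranks within the top `fraction`."""
    k = max(1, ceil(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_snps(method_scores, fraction=0.05, min_methods=4):
    """method_scores: {method name: {snp id: score}}.

    A SNP receives one vote from each method in which it falls in the
    top `fraction`; SNPs with at least `min_methods` votes are returned.
    """
    votes = {}
    for scores in method_scores.values():
        for snp in top_fraction(scores, fraction):
            votes[snp] = votes.get(snp, 0) + 1
    return {snp for snp, n in votes.items() if n >= min_methods}


def toy_scores(high_snps, n_snps=100):
    """Hypothetical scores: background of 0.0 with a few elevated SNPs."""
    scores = {i: 0.0 for i in range(n_snps)}
    for rank, snp in enumerate(high_snps):
        scores[snp] = 10.0 + rank
    return scores


methods = {
    "Fst": toy_scores([7, 8, 9, 20, 21]),
    "Pi": toy_scores([7, 8, 9, 22, 23]),
    "XP-CLR": toy_scores([7, 8, 9, 24, 25]),
    "XP-EHH": toy_scores([7, 8, 26, 27, 28]),
    "iHS": toy_scores([7, 30, 31, 32, 33]),
}
print(sorted(consensus_snps(methods)))
```

In the toy data, SNP 7 is supported by all five methods and SNP 8 by four, so both pass; SNP 9, with three votes, does not. Requiring agreement across methods trades sensitivity for specificity, which matches the stated goal of minimizing false positives.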

2.5. Functional Annotation and Enrichment Analysis

Annotation of reference genes was obtained from the ARS-UCD 2.0 bovine reference genome (gff_gcf_cattle_2.0) [17]. The significant SNPs were annotated and mapped to genes using the findOverlaps and queryHits functions in the GenomicRanges package [18]. Furthermore, candidate gene sets were identified and tested for enrichment using Shiny 0.82 for KEGG pathways [19] and the Metascape tool for GO biological process enrichment analysis [20].

2.6. Cattle QTL Annotation and Analysis

To further investigate the functional relevance of genomic regions under selection, QTL annotation was performed using the R package GALLO (v1.4.4) [21] with a ±250 kb window around each significant SNP to capture associated QTLs. Cattle QTL data were extracted from the Animal QTLdb (Animal_QTLdb_release55_cattleARS_UCD2.gff) [22]. Annotated QTLs were further categorized into functional groups, and functional enrichment analysis was conducted to explore potential biological pathways influenced by the identified selection signals.
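The ±250 kb window annotation can be illustrated with a short Python sketch. It does not reproduce GALLO's behavior or the Animal QTLdb file format; the tuple layouts, the `annotate_qtls` name, and the trait type labels are hypothetical. The sketch uses a simple nested scan, which is adequate for illustration but would need an interval index for millions of SNPs.

```python
from collections import Counter


def annotate_qtls(snps, qtls, flank=250_000):
    """Count QTLs, by trait type, that overlap a window around any SNP.

    snps: list of (chrom, pos)
    qtls: list of (chrom, start, end, trait_type)
    Each QTL is counted once, even if several SNP windows overlap it.
    """
    hit_indices = set()
    for chrom, pos in snps:
        lo, hi = pos - flank, pos + flank
        for index, (q_chrom, q_start, q_end, _trait) in enumerate(qtls):
            # Two intervals overlap when each starts before the other ends.
            if q_chrom == chrom and q_start <= hi and q_end >= lo:
                hit_indices.add(index)
    return Counter(qtls[i][3] for i in hit_indices)


significant_snps = [("14", 1_000_000), ("14", 1_100_000)]
qtl_table = [
    ("14", 1_200_000, 1_200_100, "Milk"),          # inside both windows
    ("14", 2_000_000, 2_000_100, "Reproduction"),  # too far away
    ("16", 1_000_000, 1_000_100, "Milk"),          # different chromosome
    ("14", 700_000, 760_000, "Health"),            # overlaps first window edge
]
print(annotate_qtls(significant_snps, qtl_table))
```

Counting each QTL once avoids inflating trait categories in regions where many significant SNPs cluster, which matters when signals are concentrated on a few chromosomes.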

3. Results

3.1. Population Analysis Identified Genome-Wide Differences Between Selected and Unselected Cattle

After quality control, a total of 12,381,574 high-quality bi-allelic SNPs were retained for population structure analysis. The average missing rate after quality control was 2.6%, indicating high genotyping reliability. To evaluate the genetic diversity of the population, we estimated the genome-wide inbreeding coefficients (F) of the two groups (Table 1), with the unselected population exhibiting a higher level of inbreeding (0.0783 ± 0.0155 SE) compared to the selected population (0.0493 ± 0.0095 SE). Correspondingly, the average observed heterozygosity (Ho) ratio was 0.291 in the unselected group and 0.305 in the selected group, while the expected heterozygosity (He) ratios were 0.316 and 0.321, respectively.
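As a rough consistency check on these values, the inbreeding coefficient can be approximated at the population level as F = 1 − Ho/He. The sketch below applies that approximation to the heterozygosity values reported above; it is not the per-animal estimate produced by Plink, so only approximate agreement with the reported means should be expected. The function name is hypothetical.

```python
def inbreeding_from_heterozygosity(ho, he):
    """Population-level approximation F = 1 - Ho / He."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Ho and He as reported in the text for each group.
f_unselected = inbreeding_from_heterozygosity(0.291, 0.316)
f_selected = inbreeding_from_heterozygosity(0.305, 0.321)

print(f"unselected: {f_unselected:.4f} (reported mean F = 0.0783)")
print(f"selected:   {f_selected:.4f} (reported mean F = 0.0493)")
```

Both approximations fall close to the reported means and preserve their ordering, with the unselected line showing the higher value.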
The LD decay analysis revealed that the unselected population exhibits a relatively slower LD decay rate (Figure 2A). Additionally, principal component analysis (PCA) using LD-pruned SNPs revealed a clear separation between the animals of these two groups (Figure 2B).

3.2. Selection Signature Analysis with Five Methods Identified Candidate Regions Under Selection

Manhattan plots of five methods were used to show specific selection signatures from each method (Figure 3).
Genome-wide selection signature scanning by five methods (Fst, Pi, XP-CLR, XP-EHH, and iHS) showed a large proportion of overlap in the detected candidate regions among different methods (Figure 4A). A total of 14,533 SNPs were supported by four or more methods. Of these, around 20% mapped to BTA 14 (2975), 14% to BTA 16 (2092), and 11% to BTA 17 (1613). These candidate regions include the genes PLAG1 [23], PREX2 [24], CAPN2 [25], and EDNRA [26]. It is noteworthy that the iHS method, although it detected a higher number of signals, showed a lower percentage of overlap with other methods (Figure 4B).

3.3. Gene Annotation and Enrichment Analysis of Selection Signatures

A total of 163 genes were annotated in the candidate selection signature regions (top 5% from at least four methods) (Figure 5A). Among these, 155 genes had transcript-level annotations and were classified into 137 protein-coding genes and 14 long non-coding RNA (lncRNA) genes. Additionally, eight genes were annotated as pseudogenes, which were identified at the gene level but lacked corresponding transcript records.
GO and KEGG pathway enrichment analyses revealed significant processes related to milk production and overall physiological homeostasis. These included core metabolic functions and ion homeostasis, such as lipid and carbohydrate metabolism, and calcium and sodium ion transport (Figure 5B). Crucially, hormone-related pathways, including thyroid hormone synthesis and cellular response to hormone stimuli, were also enriched (Figure 5C). Furthermore, annotations highlighted processes vital for cellular structure, growth, and development, such as cell–cell junctions, skeletal system development, and mitotic cell-cycle regulation. Collectively, these enriched functions underpin the superior production performance and overall physiological resilience observed in the selected dairy cattle.

3.4. Cattle QTL Enrichment Analysis in Candidate Selection Signatures

Using the cattle QTL database [22], we annotated QTLs within a 250 kb window around the significant selection signature SNPs to better understand their biological importance. An integrated analysis of 8285 annotated QTLs discovered six major trait categories (Figure 6A), indicating a strong enrichment for selection signatures associated with economically important traits. Notably, selection signatures were predominantly associated with milk production and reproduction traits.
Focusing on production-related traits, the top 10 annotated QTLs included traits related to milk composition and yield, milk protein percentage, milk fat percentage, milk yield, and milk fat yield (Figure 6B). Traits related to milk fatty acid profiles, including the C14 index, C16 index, palmitoleic acid content, and myristoleic acid content, were also notably enriched, suggesting selection pressures on both milk quantity and milk quality. Similarly, in the reproduction category (Figure 6C), annotated QTLs were mainly associated with key fertility and calving traits. Specifically, QTLs related to calving ease and age at puberty were predominant.
The functional relevance of these identified selection signatures was further explored by functional enrichment analysis. Consistent with the QTL distribution across trait types, these signatures are predominantly associated with a comprehensive suite of economically important traits, including various aspects of milk production and composition (e.g., milk fat yield, protein content, diverse fatty acid profiles, casein percentages, and rennet coagulation properties), directly aligning with the metabolic, hormone-related, and digestive pathways identified through gene enrichment (Figure 7). Furthermore, significant QTLs linked to reproductive performance (e.g., sexual precocity, calving ease, and dystocia) and growth and carcass characteristics (e.g., metabolic body weight, carcass length, and muscle pH) were also identified.

4. Discussion

This study employed an integrative analysis of selection signatures between a closed herd without selective breeding (unselected group) and contemporary commercially bred animals (selected group) and yielded a robust set of genomic and biological insights regarding selection. A total of 14,533 sequence variants were consistently identified by four or more selection signature methods, overlapping with 155 genes. This high concordance across different methods underscores the reliability of the detected selection signals, providing a strong foundation for understanding the genomic architecture shaped by artificial selection.
Comparison between the two groups of Holstein cattle on the genome-wide level suggests that the unselected population exhibits a slightly reduced level of heterozygosity and elevated inbreeding, likely due to its smaller effective population size and greater degree of local mating from historical isolation. In contrast, the selected group, representing a commercially selected population, shows higher heterozygosity and lower inbreeding, possibly reflecting structured breeding programs that maintained genetic variability.
Gene Ontology (GO) and KEGG pathway enrichment analyses of the 155 candidate genes revealed their involvement in a diverse array of biological processes that are critical for dairy cattle performance. Several biologically relevant terms were identified, possibly reflecting the physiological demands of intensive milk production, as well as the physiological processes of milk synthesis and secretion in dairy cattle [27]. Pathways related to cellular metabolism and ion homeostasis were significantly enriched, including those governing lipid metabolism (CAPN2, LYN, PAX2, RET, PLCB1, PAQR8, and GPR155). CAPN2, LYN, PAX2, and RET have established roles in mammary gland development and function related to milk production and post-lactational processes [28,29,30,31], and PAX2 has been indicated to be relevant to milk fatty acid in Holstein cows [32].
The enrichment analysis identified another important pathway relevant to carbohydrate digestion and absorption and the transport of calcium and sodium ions (EDNRA, PKD2L1, ITPR2, SLC17A1, SLC38A4, PTPRC, TRPV2, DST, CAPN2, MAN1A1, SCGN, MYOF, EFCAB2, ADGRE3, and EFHC1). ITPR2 has been identified as a significant gene associated with milk yield and myristic acid percentage in milk [33]. A previous study reported the association of MAN1A1 with milk production and nitrogen excretion in dairy cows [34]. TRPV2 has been reported as associated with resistance to mastitis as well as milk yield and milking rate [35,36]. Furthermore, genes and pathways related to hormone regulation and signaling were prominently identified, indicating the importance of endocrine systems on growth and reproduction, which are intensely selected in cattle breeding programs. For example, the GHR gene serves an essential role in reproductive function through mediating the influence of growth hormone (GH) in diverse reproductive events [37]; EDNRA and SMYD3 are associated with reproductive traits in Holstein cattle [38]. PAQR8, a membrane progesterone receptor, was shown to have a possible role in mediating progesterone signaling, which is important for reproductive processes [39,40].
Additionally, the gene and enrichment analysis found pathways associated with cellular structure, growth, and development, including genes ADD3, CTNNA2, PTPRM, PDZD2, KAZN, DNMBP, PLEKHA7, THEMIS, IGSF9, CNTN4, SDK1, LAMA1, and CADM2. These genes and pathways are related to cell–cell junctions and homophilic cell adhesion, suggesting selection in dairy cattle for epithelial integrity and efficient cellular communication. Previously, studies have shown that CTNNA2 (catenin alpha 2) is implicated in milk yield, milk fat, milk quality, and milking speed in Holstein cows [41,42]. IGSF9, SDK1, and CADM2 are members of the immunoglobulin superfamily [43,44]. Their patterns of expression and functions imply a possible influence on tissue structure and milk production efficiency. Recent evidence suggests variants in LAMA1 have been associated with improvements in genomic prediction accuracy for milk yield, highlighting LAMA1 as a promising target for enhancing dairy cattle productivity [45].
The potential functional relevance of these selection signatures was also explored by an integrated analysis of 8285 cattle QTLs. These QTLs are associated with a comprehensive set of economically important traits, directly validating the importance of selection signatures and aligning with gene functions and pathways. Notably, dominant milk production QTLs might reflect the historical selection objectives aimed at improving dairy production and quality. QTLs for various aspects of milk composition (e.g., milk fat yield, protein content, diverse fatty acid profiles, casein percentages, and rennet coagulation properties) directly align with the identified metabolic, digestive, and hormone-related pathways. For instance, the enrichment of lipid metabolism genes (e.g., FBP and GPR155) and hormone signaling pathways provides a clear genetic basis for the observed improvements in milk fat and protein synthesis, as well as overall milk yield. Similarly, the prominence of reproduction-related QTLs (e.g., sexual precocity, lactation persistence, calving ease, and dystocia) might indicate breeding programs concurrently targeting functional traits to mitigate the fertility decline associated with intense selection for production. Along with enhancing economic returns, these could further reinforce the role of hormone signaling and developmental processes. For example, GHR is central to growth and metabolism. In addition, the presence of QTLs related to health and immune responses (e.g., interleukin-4 level and bovine tuberculosis susceptibility) further connects selection signatures to broader physiological adaptation, which could correspond to some pathways such as cell–cell junction and calcium ion binding, supported by genes like LYN [46] and INPP4B [47,48]. 
INPP4B facilitates cell survival and proliferation and supports the long-term lodgment of immune cells within tissues, which may highlight the potential of this gene as a high-value marker for genomic selection in dairy cattle for enhanced health and productivity.

5. Conclusions

Our results demonstrate that the genomic regions under selection in dairy cattle are critical for their superior productivity, health, and reproductive efficiency. The polygenic nature of these complex traits is evident from the hundreds of candidate genes and QTLs identified, which highlights that artificial selection has acted on the whole genome rather than on a few major genes. The candidate selection signatures provided useful insights into the genetic basis of complex traits and selection responses in dairy cattle. However, our study was limited by a small sample size and potential noise introduced by genetic drift. Therefore, better-powered studies are needed in the future to elucidate how specific genetic variations affect economically important traits by combining selection signature results with functional association studies.

Author Contributions

L.M. and Y.D. conceived this study. J.C., L.Y. and Y.G. analyzed and interpreted data. J.C. and L.M. wrote the manuscript. Y.D. and G.E.L. contributed tools and materials. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by USDA National Institute of Food and Agriculture (NIFA) AFRI grant numbers 2020-67015-31133, 2021-67015-33409, and 2024-67015-42295. This work was also supported in part by USDA ARS appropriated projects 8042-31000-001-00-D, 8042-31000-002-00-D, and 8042-31310-078-00-D. The funders had no role in the study’s design, data collection and analysis, the decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

Not applicable, as no animal experiments were performed for this study.

Informed Consent Statement

Not applicable.

Data Availability Statement

The whole-genome sequence data of the 30 unselected and 54 selected Holstein cattle were uploaded to the NCBI SRA database with SRA Bioproject number PRJNA1287090.

Acknowledgments

We thank the Council on Dairy Cattle Breeding (CDCB; Bowie, MD) for providing cattle GWAS data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hayes, B.J.; Lewin, H.A.; Goddard, M.E. The Future of Livestock Breeding: Genomic Selection for Efficiency, Reduced Emissions Intensity, and Adaptation. Trends Genet. 2013, 29, 206–214.
  2. Wiggans, G.R.; Cole, J.B.; Hubbard, S.M.; Sonstegard, T.S. Genomic Selection in Dairy Cattle: The USDA Experience. Annu. Rev. Anim. Biosci. 2017, 5, 309–327.
  3. Hansen, L. Consequences of Selection for Milk Yield from a Geneticist’s Viewpoint. J. Dairy Sci. 2000, 83, 1145–1150.
  4. Brito, L.; Bédère, N.; Douhard, F.; Oliveira, H.; Arnal, M.; Peñagaricano, F.; Schinckel, A.; Baes, C.; Miglior, F. Genetic Selection of High-Yielding Dairy Cattle Toward Sustainable Farming Systems in a Rapidly Changing World. Animal 2021, 15, 100292.
  5. Nei, M.; Li, W.-H. Mathematical Model for Studying Genetic Variation in Terms of Restriction Endonucleases. Proc. Natl. Acad. Sci. USA 1979, 76, 5269–5273.
  6. Sabeti, P.C.; Varilly, P.; Fry, B.; Lohmueller, J.; Hostetter, E.; Cotsapas, C.; Xie, X.; Byrne, E.H.; McCarroll, S.A.; Gaudet, R. Genome-Wide Detection and Characterization of Positive Selection in Human Populations. Nature 2007, 449, 913–918.
  7. Voight, B.F.; Kudaravalli, S.; Wen, X.; Pritchard, J.K. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 2006, 4, e72.
  8. Yu, Y.; Fu, J.; Xu, Y.; Zhang, J.; Ren, F.; Zhao, H.; Tian, S.; Guo, W.; Tu, X.; Zhao, J. Genome Re-Sequencing Reveals the Evolutionary History of Peach Fruit Edibility. Nat. Commun. 2018, 9, 5404.
  9. Ma, L.; Sonstegard, T.S.; Cole, J.B.; VanTassell, C.P.; Wiggans, G.R.; Crooker, B.A.; Tan, C.; Prakapenka, D.; Liu, G.E.; Da, Y. Genome Changes Due to Artificial Selection in US Holstein Cattle. BMC Genom. 2019, 20, 128.
  10. Poplin, R.; Chang, P.-C.; Alexander, D.; Schwartz, S.; Colthurst, T.; Ku, A.; Newburger, D.; Dijamco, J.; Nguyen, N.; Afshar, P.T. A Universal SNP and Small-Indel Variant Caller Using Deep Neural Networks. Nat. Biotechnol. 2018, 36, 983–987.
  11. Chang, C.C.; Chow, C.C.; Tellier, L.C.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-Generation PLINK: Rising to the Challenge of Larger and Richer Datasets. GigaScience 2015, 4, s13742-015-0047-8.
  12. Villanueva, R.A.M.; Chen, Z.J. Ggplot2: Elegant Graphics for Data Analysis; Taylor & Francis: Abingdon, UK, 2019.
  13. Zhang, C.; Dong, S.-S.; Xu, J.-Y.; He, W.-M.; Yang, T.-L. PopLDdecay: A Fast and Effective Tool for Linkage Disequilibrium Decay Analysis Based on Variant Call Format Files. Bioinformatics 2019, 35, 1786–1788.
  14. Chen, H.; Patterson, N.; Reich, D. Population Differentiation as a Test for Selective Sweeps. Genome Res. 2010, 20, 393–402.
  15. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T. The Variant Call Format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
  16. Szpiech, Z.A.; Hernandez, R.D. Selscan: An Efficient Multithreaded Program to Perform EHH-Based Scans for Positive Selection. Mol. Biol. Evol. 2014, 31, 2824–2827.
  17. Rosen, B.D.; Bickhart, D.M.; Schnabel, R.D.; Koren, S.; Elsik, C.G.; Tseng, E.; Rowan, T.N.; Low, W.Y.; Zimin, A.; Couldrey, C. De Novo Assembly of the Cattle Reference Genome with Single-Molecule Sequencing. GigaScience 2020, 9, giaa021.
  18. Lawrence, M.; Huber, W.; Pagès, H.; Aboyoun, P.; Carlson, M.; Gentleman, R.; Morgan, M.T.; Carey, V.J. Software for Computing and Annotating Genomic Ranges. PLoS Comput. Biol. 2013, 9, e1003118.
  19. Kanehisa, M.; Furumichi, M.; Sato, Y.; Matsuura, Y.; Ishiguro-Watanabe, M. KEGG: Biological Systems Database as a Model of the Real World. Nucleic Acids Res. 2025, 53, D672–D677.
  20. Zhou, Y.; Zhou, B.; Pache, L.; Chang, M.; Khodabakhshi, A.H.; Tanaseichuk, O.; Benner, C.; Chanda, S.K. Metascape Provides a Biologist-Oriented Resource for the Analysis of Systems-Level Datasets. Nat. Commun. 2019, 10, 1523.
  21. Fonseca, P.A.; Suarez-Vega, A.; Marras, G.; Cánovas, Á. GALLO: An R Package for Genomic Annotation and Integration of Multiple Data Sources in Livestock for Positional Candidate Loci. GigaScience 2020, 9, giaa149.
  22. Hu, Z.-L.; Park, C.A.; Wu, X.-L.; Reecy, J.M. Animal QTLdb: An Improved Database Tool for Livestock Animal QTL/Association Data Dissemination in the Post-Genome Era. Nucleic Acids Res. 2013, 41, D871–D879.
  23. D’Occhio, M.J.; Campanile, G.; Baruselli, P.S.; Porto Neto, L.R.; Hayes, B.J.; Snr, A.C.; Fortes, M.R. Pleomorphic Adenoma Gene1 in Reproduction and Implication for Embryonic Survival in Cattle: A Review. J. Anim. Sci. 2024, 102, skae103.
  24. Gai, Z.; Hu, S.; Ma, J.; Wang, Y.; Gong, G.; Zhao, J. Whole Genome-Wide Analysis of DEP Family Members in Sheep (Ovis aries) Reveals Their Potential Roles in Regulating Lactation. Chem. Biol. Technol. Agric. 2022, 9, 68.
  25. Magalhaes, A.F.; De Camargo, G.M.; Fernandes, G.A.; Gordo, D.G.; Tonussi, R.L.; Costa, R.B.; Espigolan, R.; Silva, R.M.d.O.; Bresolin, T.; De Andrade, W.B. Genome-Wide Association Study of Meat Quality Traits in Nellore Cattle. PLoS ONE 2016, 11, e0157845. [Google Scholar] [CrossRef]
  26. Puglisi, R.; Cambuli, C.; Capoferri, R.; Giannino, L.; Lukaj, A.; Duchi, R.; Lazzari, G.; Galli, C.; Feligini, M.; Galli, A. Differential Gene Expression in Cumulus Oocyte Complexes Collected by Ovum Pick up from Repeat Breeder and Normally Fertile Holstein Friesian Heifers. Anim. Reprod. Sci. 2013, 141, 26–33. [Google Scholar] [CrossRef]
  27. Mukherjee, J.; Das, P.K.; Banerjee, D. Lactation Physiology. In Textbook of Veterinary Physiology; Springer: Berlin/Heidelberg, Germany, 2023; pp. 639–674. [Google Scholar]
  28. Arnandis, T.; Ferrer-Vicens, I.; García-Trevijano, E.; Miralles, V.; García, C.; Torres, L.; Viña, J.; Zaragozá, R. Calpains Mediate Epithelial-Cell Death During Mammary Gland Involution: Mitochondria and Lysosomal Destabilization. Cell Death Differ. 2012, 19, 1536–1548. [Google Scholar] [CrossRef]
  29. García-Trevijano, E.R.; Ortiz-Zapater, E.; Gimeno, A.; Viña, J.R.; Zaragozá, R. Calpains, the Proteases of Two Faces Controlling the Epithelial Homeostasis in Mammary Gland. Front. Cell Dev. Biol. 2023, 11, 1249317. [Google Scholar] [CrossRef] [PubMed]
  30. Tornillo, G.; Knowlson, C.; Kendrick, H.; Cooke, J.; Mirza, H.; Aurrekoetxea-Rodríguez, I.; dM Vivanco, M.; Buckley, N.E.; Grigoriadis, A.; Smalley, M.J. Dual Mechanisms of Lyn Kinase Dysregulation Drive Aggressive Behavior in Breast Cancer Cells. Cell Rep. 2018, 25, 3674–3692.e10. [Google Scholar] [CrossRef]
  31. Vallone, S.A.; García Solá, M.; Schere-Levy, C.; Meiss, R.P.; Hermida, G.N.; Chodosh, L.A.; Kordon, E.C.; Hynes, N.E.; Gattelli, A. Aberrant Ret Expression Affects Normal Mammary Gland Post-Lactation Transition, Enhancing Cancer Potential. Dis. Models Mech. 2022, 15, dmm049286.
  32. Li, C.; Sun, D.; Zhang, S.; Wang, S.; Wu, X.; Zhang, Q.; Liu, L.; Li, Y.; Qiao, L. Genome Wide Association Study Identifies 20 Novel Promising Genes Associated with Milk Fatty Acid Traits in Chinese Holstein. PLoS ONE 2014, 9, e96186.
  33. Chen, Z.; Yao, Y.; Ma, P.; Wang, Q.; Pan, Y. Haplotype-Based Genome-Wide Association Study Identifies Loci and Candidate Genes for Milk Yield in Holsteins. PLoS ONE 2018, 13, e0192695.
  34. Pečnik, Ž.; Jevšinek Skok, D. Identification of Genomic Regions Affecting Nitrogen Excretion Intensity in Brown Swiss Dairy Cows. Anim. Biotechnol. 2024, 35, 2434097.
  35. Cai, Z.; Guldbrandtsen, B.; Lund, M.S.; Sahana, G. Prioritizing Candidate Genes Post-GWAS Using Multiple Sources of Data for Mastitis Resistance in Dairy Cattle. BMC Genom. 2018, 19, 656.
  36. Zare, M.; Atashi, H.; Hostens, M. Genome-Wide Association Study for Lactation Performance in the Early and Peak Stages of Lactation in Holstein Dairy Cows. Animals 2022, 12, 1541.
  37. Chang, C.-W.; Sung, Y.-W.; Hsueh, Y.-W.; Chen, Y.-Y.; Ho, M.; Hsu, H.-C.; Yang, T.-C.; Lin, W.-C.; Chang, H.-M. Growth Hormone in Fertility and Infertility: Mechanisms of Action and Clinical Applications. Front. Endocrinol. 2022, 13, 1040503.
  38. Li, F.; Feng, X.; Li, R.; Du, B.; Xue, X. Genetic Bases and Molecular Breeding of Key Economic Traits in China Dairy Cattle: A Progress Report. Int. J. Clin. Case Rep. Rev. 2022, 12.
  39. Chen, S.; Paul, M.R.; Sterner, C.J.; Belka, G.K.; Wang, D.; Xu, P.; Sreekumar, A.; Pan, T.-c.; Pant, D.K.; Makhlin, I. PAQR8 Promotes Breast Cancer Recurrence and Confers Resistance to Multiple Therapies. Breast Cancer Res. 2023, 25, 1.
  40. Zhang, Y.T.; Hong, W.S.; Liu, D.T.; Qiu, H.T.; Zhu, Y.; Chen, S.X. Involvement of Membrane Progestin Receptor Beta (Mprβ/Paqr8) in Sex Pheromone Progestin-Induced Expression of Luteinizing Hormone in the Pituitary of Male Chinese Black Sleeper (Bostrychus sinensis). Front. Endocrinol. 2018, 9, 397.
  41. Laodim, T.; Koonawootrittriron, S.; Elzo, M.A.; Suwanasopee, T.; Jattawa, D.; Sarakul, M. Genetic Factors Influencing Milk and Fat Yields in Tropically Adapted Dairy Cattle: Insights from Quantitative Trait Loci Analysis and Gene Associations. Anim. Biosci. 2023, 37, 576.
  42. Daldaban, F.; Kıyıcı, J.M.; Akyüz, B.; Aksel, E.G.; Kaliber, M.; Çınar, M.U.; Arslan, K. Association of the Cacna2d1 Gene with Milk Yield and Milk Quality Traits in Holstein Cattle. J. Dairy Res. 2024, 91, 373–377.
  43. Liu, Y.; Wang, H.; Zhao, X.; Zhang, J.; Zhao, Z.; Lian, X.; Zhang, J.; Kong, F.; Hu, T.; Wang, T. Targeting the Immunoglobulin Igsf9 Enhances Antitumor T-Cell Activity and Sensitivity to Anti–PD-1 Immunotherapy. Cancer Res. 2023, 83, 3385–3399.
  44. Dai, L.; Zhao, J.; Yin, J.; Fu, W.; Chen, G. Cell Adhesion Molecule 2 (Cadm2) Promotes Brain Metastasis by Inducing Epithelial-Mesenchymal Transition (EMT) in Human Non-Small Cell Lung Cancer. Ann. Transl. Med. 2020, 8, 465.
  45. Cai, W.; Cole, J.B.; Goddard, M.E.; Li, J.; Zhang, S.; Song, J. Mammary Gland Multi-Omics Data Reveals New Genetic Insights into Milk Production Traits in Dairy Cattle. PLoS Genet. 2025, 21, e1011675.
  46. Arshad, U.; Kennedy, K.M.; Cid de la Paz, M.; Kendall, S.J.; Cangiano, L.R.; White, H.M. Immune Cells Phenotype and Bioenergetic Measures in CD4+ T Cells Differ between High and Low Feed Efficient Dairy Cows. Sci. Rep. 2024, 14, 15993.
  47. Peng, V.; Trsan, T.; Sudan, R.; Bhattarai, B.; Cortez, V.S.; Molgora, M.; Vacher, J.; Colonna, M. Inositol Phosphatase Inpp4b Sustains Ilc1s and Intratumoral NK Cells through an Akt-Driven Pathway. J. Exp. Med. 2024, 221, e20230124.
  48. Chen, Z.Y.; Mortha, A. Inpp4b Ensures That Ilc1s and NK Cells Set up a Productive Home Office. J. Exp. Med. 2024, 221, e20232375.
Figure 1. Overview of study design and analysis.
Figure 2. Population LD and structure of two groups of Holstein cattle. (A) Genome-wide LD decay estimated from each population, with the x-axis indicating the distance between SNPs and the y-axis representing the squared correlation (r²) between pairs of SNPs. (B) Principal component analysis of the two cattle groups, with the x- and y-axes representing principal components 1 and 2, respectively.
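The r² plotted in panel (A) is the squared correlation between the genotypes at two SNPs. The sketch below is only an illustration of that quantity, not the software used in the study. It assumes genotypes coded as 0/1/2 allele counts per animal, and the function name `ld_r_squared` is hypothetical.

```python
def ld_r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two genotype vectors.

    Assumption: genotypes are coded as allele counts (0, 1, 2), one entry
    per animal, in the same animal order for both SNPs.
    """
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 animals")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    # Sums of squares and cross-products around the means.
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    s_aa = sum((a - mean_a) ** 2 for a in snp_a)
    s_bb = sum((b - mean_b) ** 2 for b in snp_b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (s_ab * s_ab) / (s_aa * s_bb)
```

An LD decay curve such as the one in panel (A) would then average these values over SNP pairs grouped by physical distance.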
Figure 3. Manhattan plots of genome-wide selection signatures by five different methods. (A) Genome-wide distribution of Fst windows. (B) Genome-wide distribution of -ln(Pi_ratio) windows. (C) Genome-wide distribution of XP-CLR windows. (D) Genome-wide distribution of iHS values. (E) Genome-wide distribution of XP-EHH values. Black dashed lines represent the threshold of the top 5% values, and windows above the black lines were considered candidate regions of selection signatures.
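The top 5% rule described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: each window has one score per method, larger scores indicate stronger signals, and the window IDs and function names are hypothetical rather than taken from the study's pipeline.

```python
def top_percent_cutoff(scores, top_percent=5):
    """Return the lowest score that still falls within the top `top_percent`.

    Assumption: larger scores mean stronger evidence of selection.
    """
    ranked = sorted(scores, reverse=True)
    # Integer arithmetic avoids floating-point surprises; keep at least one.
    k = max(1, len(ranked) * top_percent // 100)
    return ranked[k - 1]


def candidate_windows(window_scores, top_percent=5):
    """Return the IDs of windows whose score reaches the top-percent cutoff."""
    cutoff = top_percent_cutoff(list(window_scores.values()), top_percent)
    return sorted(w for w, s in window_scores.items() if s >= cutoff)
```

For example, with 40 hypothetical windows the two highest-scoring windows would be retained as candidates.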
Figure 4. Identification of consensus selection signatures from four or more methods. (A) Venn diagram of candidate regions shared by five methods. (B) Number of candidate regions shared by four or five methods.
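The consensus step summarized in Figure 4 (regions supported by at least four of the five methods) can be illustrated with a simple vote count. The sketch assumes candidate regions from each method have already been mapped to shared region IDs. The IDs and the function name `consensus_regions` are hypothetical.

```python
from collections import Counter


def consensus_regions(candidates_by_method, min_methods=4):
    """Map each region to its number of supporting methods, keeping only
    regions supported by at least `min_methods` methods."""
    support = Counter()
    for regions in candidates_by_method.values():
        # set() ensures a method votes at most once per region.
        support.update(set(regions))
    return {region: n for region, n in support.items() if n >= min_methods}


# Hypothetical candidate regions per detection method.
example = {
    "Fst": ["chr1:1", "chr2:5", "chr3:9"],
    "Pi_ratio": ["chr1:1", "chr2:5"],
    "XP-CLR": ["chr1:1", "chr2:5", "chr4:2"],
    "iHS": ["chr1:1", "chr3:9"],
    "XP-EHH": ["chr1:1", "chr2:5"],
}
consensus = consensus_regions(example)
```

In this toy input, `chr1:1` is supported by all five methods and `chr2:5` by four, so both pass the threshold, while `chr3:9` and `chr4:2` do not.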
Figure 5. Gene and functional enrichment analysis of candidate selection signatures. (A) GO term enrichment for genes within candidate selection signatures. (B) Functional region proportion in overlapped genes. (C) KEGG term enrichment for genes within candidate regions.
Figure 6. Proportional distribution of QTLs annotated from selection signatures across trait categories. (A) Percentage of QTLs for different trait categories. (B) Enrichment analysis results of QTLs in milk production traits. (C) Enrichment analysis results of QTLs in reproduction traits.
Figure 7. Top 20 enriched traits for QTLs near selection signature SNPs. The richness factor of a trait is the ratio of the number of QTLs annotated in the candidate regions to the total number of QTLs.
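The richness factor in Figure 7 is a simple ratio, sketched below. The caption does not spell out the denominator, so this sketch assumes "total number of QTLs" means the number of QTLs recorded for the same trait in the reference database. The function names and the trait counts in the usage note are hypothetical.

```python
def richness_factor(n_in_candidates, n_total):
    """Ratio of QTLs annotated in candidate regions to the total QTL count.

    Assumption: both counts refer to the same trait.
    """
    if n_total <= 0:
        raise ValueError("total number of QTLs must be positive")
    if not 0 <= n_in_candidates <= n_total:
        raise ValueError("candidate-region count must lie between 0 and the total")
    return n_in_candidates / n_total


def top_traits(counts, n=20):
    """Rank traits by richness factor, highest first.

    `counts` maps trait name -> (QTLs in candidate regions, total QTLs).
    """
    ranked = sorted(
        ((trait, richness_factor(k, total)) for trait, (k, total) in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:n]
```

For instance, `top_traits({"milk yield": (5, 20), "stature": (1, 10)})` would rank the hypothetical "milk yield" entry first with a richness factor of 0.25.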
Table 1. Genetic diversity of two Holstein populations.
Population | Observed Heterozygosity (Ho) | Expected Heterozygosity (He) | Inbreeding Coefficient (F)
Unselected | 0.291 | 0.316 | 0.078
Selected | 0.305 | 0.321 | 0.049
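As a rough consistency check on Table 1, a common population-level estimate of the inbreeding coefficient is F = 1 - Ho/He. This section does not state how F was computed in the study, so the formula is an assumption here, and the function name is hypothetical.

```python
def inbreeding_from_heterozygosity(ho, he):
    """Population-level estimate F = 1 - Ho/He.

    Assumption: this standard heterozygosity-deficit formula is used only
    as a plausibility check, not as the study's stated method.
    """
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


# Values taken from Table 1.
f_unselected = inbreeding_from_heterozygosity(0.291, 0.316)
f_selected = inbreeding_from_heterozygosity(0.305, 0.321)
```

Both estimates land within about 0.001 of the reported values (0.078 and 0.049). The small gaps could come from rounding of Ho and He or from averaging per-animal estimates, which this sketch does not attempt to reproduce.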
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, J.; Yang, L.; Gao, Y.; Liu, G.E.; Da, Y.; Ma, L. Selection Signature Analysis of Whole-Genome Sequences to Identify Genome Differences Between Selected and Unselected Holstein Cattle. Animals 2025, 15, 2247. https://doi.org/10.3390/ani15152247