Article

Genome-Wide Analysis of Copy Number Variations in Three Populations of Nanyang Cattle Using Whole-Genome Resequencing

1
College of Big Data, Yunnan Agricultural University, Kunming 650201, China
2
Yunnan Engineering Technology Research Center of Agricultural Big Data, Kunming 650201, China
3
Yunnan Engineering Research Center for Big Data Intelligent Information Processing of Green Agricultural Products, Kunming 650201, China
*
Author to whom correspondence should be addressed.
Genes 2025, 16(5), 568; https://doi.org/10.3390/genes16050568
Submission received: 17 March 2025 / Revised: 1 May 2025 / Accepted: 5 May 2025 / Published: 12 May 2025
(This article belongs to the Section Animal Genetics and Genomics)

Abstract

Copy number variation (CNV) is a major contributor to genetic diversity and influences phenotypic diversity, economically important traits, and the evolution of livestock species. This study aimed to characterize the genome-wide CNV landscape of the Nanyang cattle line (Nanyang, Pinnan, and Xianan cattle) and to identify functionally relevant CNVs associated with key economic traits and breed differentiation. We used 27 resequencing datasets to analyze the genome-wide distribution of CNVs in the three populations based on the latest reference genome, ARS-UCD2.0. We identified a total of 97,564 CNVs; after merging CNVs with overlapping genomic positions, we obtained 10,349 CNV regions (CNVRs), accounting for 1.48% of the reference genome. Functional enrichment analysis showed that CNVR genes were mainly involved in organ development, neural regulation, immune regulation, and metabolism. In addition, 131 CNVRs overlapped with 81 quantitative trait loci (QTLs), such as growth and carcass QTLs, multiple birth QTLs, tenderness score QTLs, and antral follicle number QTLs. AOX1, KRT72, and ZBTB7C were found to overlap with body weight QTLs. Furthermore, a selective sweep analysis of CNVRs revealed that numerous genes (KIF26A, SPINT4, OR5W1, etc.) exhibited divergent copy numbers between breeds. In conclusion, this study improves understanding of the genetic characteristics of the Nanyang cattle line at the CNV level and provides valuable information for the development of the Nanyang cattle line breeding system.

1. Introduction

Copy number variations (CNVs) and single nucleotide polymorphisms (SNPs), as critical genomic variations, serve as key drivers in shaping domestication traits and adaptive evolution across animal and plant species [1]. Unlike SNPs, which involve the substitution, deletion, or insertion of a single nucleotide, a CNV is defined as a change in the DNA sequence relative to the reference assembly caused by the loss (deletion) or gain (insertion and duplication) of nucleotide bases. CNVs usually range from 1 kb to several Mb. Given their size, it is generally accepted that CNVs have the potential to significantly affect the phenotypic characteristics of livestock [2]. Consequently, a better understanding of the prevalence and function of CNVs in livestock, particularly those associated with complex traits and environmental adaptation, will facilitate the genetic improvement of economic and production traits, along with animal health [3]. Previously, large-scale CNV detection was predominantly conducted using array comparative genomic hybridization (aCGH) and high-density SNP arrays. However, these methods have certain limitations, including low coverage and low resolution. As sequencing costs have decreased, next-generation sequencing (NGS) has overcome the limitations of arrays and offers significant advantages in the detection of genomic CNVs.
A substantial body of research has analyzed CNV maps in various livestock species, including cattle, goats, sheep, and pigs, and has demonstrated that CNVs have a considerable impact on livestock production performance. In cattle, CNVs were enriched in immune system and olfactory receptor genes and were associated with some economic traits; the same study also detected CNVs in the KIT gene, which is associated with color laterality [4]. Other work identified multiple CNV-overlapping genes (such as EDNRA, ADAMTS20, and ASIP) related to adaptive traits (such as coat color, muscle development, and metabolic processes) [5]. Several important functional genes related to reproductive traits have also been reported, such as KDM2A, ACTN3, RHOD, ACTB, CCDC42, PIK3R5, NTN1, and BMP2. These genes play a key role in embryonic development, spermatogenesis, cell proliferation, migration, and differentiation, and may affect the number of live piglets by changing gene dosage [6]. Lactalbumin Alpha (LALBA), a key gene that controls milk production in cattle, presents highly differentiated CNVs in the promoter region, making it a strong functional candidate gene for differences in milk production-related traits between swamp buffalo and river buffalo [7]. The duplication of RUNX Family Transcription Factor 1 (RUNX1) may promote hypoxia adaptation in OTS and HTS on the Qinghai–Tibet Plateau [8]. In addition, the distal-less homeobox 3 (DLX3) gene overlaps with a CNVR associated with wool curliness, indicating that this CNV is a candidate for the special curly wool phenotype of Tan sheep [9].
Nanyang cattle are one of China’s five premier cattle breeds. During the agricultural era, the breed was indispensable and contributed significantly to the nation’s agricultural output. It is listed in the “National Inventory of Livestock and Poultry Genetic Resources” curated by China’s Ministry of Agriculture and Rural Affairs [10]. Prior research indicates that Nanyang cattle are a crossbreed resulting from the hybridization of Bos taurus and Bos indicus. The breed has notable strengths, including a tall build, good meat texture, and a high intramuscular fat content. However, it also has drawbacks, such as a slow growth rate and a relatively low slaughter yield [11,12]. Pinnan cattle are a population developed through progressive hybridization, cross-breeding, and self-breeding, with Piedmontese cattle as the sire line and Nanyang cattle as the dam line. The population shows early puberty, a fast growth rate, and excellent meat quality [13]. Xianan cattle are a novel breed developed through cross-breeding and the backcrossing of superior individuals, with Charolais cattle as the sire line and Nanyang cattle as the dam line. Xianan cattle exhibit several advantageous traits, including early maturity, accelerated growth, superior meat quality, and a reduced incidence of dystocia, thereby enhancing the efficiency of beef cattle production and the economic returns to the cattle industry [14]. Until now, only a handful of studies have examined the genomic differences within the Nanyang cattle breed at the level of single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs). Moreover, work on copy number variation (CNV) in this breed has mainly addressed the influence of CNV in a single gene on growth-related traits, leaving broader genomic patterns and interactions largely uncharted [10,15].
This study aimed to characterize genome-wide copy number variation patterns across three Nanyang cattle lineages (Nanyang, Pinnan, and Xianan cattle) using high-coverage resequencing data, with the following objectives: (1) systematically map CNVRs and assess their genomic distribution and functional relevance to key economic traits; (2) identify CNVRs overlapping quantitative trait loci (QTLs) associated with growth, carcass quality, and reproductive performance; and (3) investigate population-level differentiation in CNVRs through selective sweep analysis to uncover signatures of artificial or environmental selection. By integrating comparative genomics, functional annotation, and population genetics approaches, this work seeks to establish a foundational CNV resource for the Nanyang cattle line, enabling targeted exploration of structural variation in breeding programs and advancing molecular strategies for trait optimization.

2. Materials and Methods

2.1. Sample Collection and Genome Sequencing

In this study, whole genome sequencing datasets for a total of 27 cattle samples were obtained, all downloaded from the NCBI public database. The datasets included 7 Nanyang cattle (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA396672, https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA379859, accessed on 16 March 2025), 10 Pinnan cattle (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA698276, accessed on 16 March 2025), and 10 Xianan cattle (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1058368, accessed on 16 March 2025) (Supplementary Table S1). All data were sequenced on an Illumina HiSeq 2000 sequencer (Illumina Inc., San Diego, CA, USA) with 100 bp paired-end reads. The downloaded raw data were filtered using fastp version 0.18.0 [16] to obtain high-quality clean reads for downstream analysis. Raw reads were processed according to the following filtering criteria: (1) removal of reads with ≥10% unidentified nucleotides (Ns), (2) removal of reads in which >50% of bases had a Phred quality score ≤ 20, and (3) removal of reads containing adapter sequences [17]. The clean reads were aligned to the most recent cattle reference genome, ARS-UCD2.0, obtained from the Ensembl database, using the BWA software package [18]. Subsequently, to ensure the accuracy of CNV analysis, PCR duplicates that could skew the results were identified and removed using the MarkDuplicates module of Picard 2.9.2 (https://broadinstitute.github.io/picard/, accessed on 16 March 2025).
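As an illustration of filtering criteria (1) and (2), the following minimal Python sketch applies the two thresholds to a single read. It is not the filtering tool used in the study; the function names and the assumption of Phred+33 quality encoding are hypothetical, and criterion (3), adapter removal, is not modeled.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def passes_read_filters(seq, phred_scores,
                        max_n_frac=0.10, low_q=20, max_low_q_frac=0.50):
    """Return True if a read survives criteria (1) and (2).

    (1) discard reads with >= 10% unidentified nucleotides (N)
    (2) discard reads in which > 50% of bases have Phred quality <= 20
    """
    if not seq or len(seq) != len(phred_scores):
        raise ValueError("sequence and qualities must be non-empty and equal in length")
    n_frac = seq.upper().count("N") / len(seq)
    if n_frac >= max_n_frac:
        return False
    low_frac = sum(q <= low_q for q in phred_scores) / len(phred_scores)
    if low_frac > max_low_q_frac:
        return False
    return True
```

For paired-end data, one would typically discard a pair when either mate fails, but that choice is an assumption rather than something stated in the text.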

2.2. Detection of CNVs and CNVRs

We detected CNVs in each individual using CNVnator [19], Lumpy [20], and CNVcaller [21]. We then merged the results of CNVnator and Lumpy with those of CNVcaller, with the goal of retaining as much population-specific variation as possible while reducing rare variation at the individual level. CNVnator is a read-depth tool that detects CNVs by comparing genomic data against the ARS-UCD2.0 reference genome. To improve the precision of CNV predictions, calls were retained only if they had a p-value below 0.001, a proportion of reads with zero mapping quality (q0) below 0.5, and a length greater than 1 kb. For cross-breed comparisons of copy number, the “-genotype” option of CNVnator was used to estimate the copy number of each genomic segment of interest. Lumpy was run with default parameters; for each sample, the Lumpy express module was used to analyze discordant read pairs and split reads, which help to detect and delineate genomic regions with copy number alterations. We used manual inspection and SURVIVOR (version 0.0.1) [22] to merge the results of the three tools and determine the final dataset. The CNVRs of the Nanyang cattle line were classified as duplication CNVRs, deletion CNVRs, or CNVRs containing both duplication and deletion. CNVRs of the deletion type and of the mixed (both deletion and duplication) type did not exceed 50 kb in length, and CNVRs of the duplication type did not exceed 500 kb [23]. Finally, the chromosomal distribution of these CNVRs in the Nanyang cattle line was examined using the RIdeogram tool [24].
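The call filtering and the idea of collapsing overlapping CNVs into CNVRs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study merged calls with SURVIVOR and manual inspection, whereas the sketch uses a plain interval merge; the `Cnv` record, its field names, and the use of 1-based inclusive coordinates are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one CNV call (1-based, inclusive coordinates).
Cnv = namedtuple("Cnv", "chrom start end kind p_value q0")


def keep_call(cnv, max_p=0.001, max_q0=0.5, min_len=1000):
    """Apply the thresholds from the text: p < 0.001, q0 < 0.5, length > 1 kb."""
    length = cnv.end - cnv.start + 1
    return cnv.p_value < max_p and cnv.q0 < max_q0 and length > min_len


def merge_into_cnvrs(cnvs):
    """Merge overlapping calls on the same chromosome into CNV regions."""
    regions = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == cnv.chrom and cnv.start <= last["end"]:
            # Overlaps the current region: extend it and record the call type.
            last["end"] = max(last["end"], cnv.end)
            last["kinds"].add(cnv.kind)
        else:
            regions.append({"chrom": cnv.chrom, "start": cnv.start,
                            "end": cnv.end, "kinds": {cnv.kind}})
    for region in regions:
        # A region holding both deletions and duplications is typed "both".
        kinds = region["kinds"]
        region["type"] = "both" if len(kinds) > 1 else next(iter(kinds))
    return regions
```

The type-specific length limits described above (50 kb and 500 kb) could then be applied as a final filter on `region["end"] - region["start"] + 1`.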

2.3. Functional Annotation and Enrichment Analysis of CNVRs

To interpret the functional implications of the CNVs detected in the Nanyang cattle line, we retrieved the corresponding annotation files from the NCBI database (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/263/795/GCF_002263795.3_ARS-UCD2.0/, accessed on 16 March 2025). The candidate CNVRs were annotated using ANNOVAR [25]. We conducted Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, focusing solely on protein-coding genes, using the Database for Annotation, Visualization, and Integrated Discovery (DAVID; https://david.ncifcrf.gov/, accessed on 16 March 2025). The aim was to identify the biological processes, molecular functions, and cellular components associated with the genes of interest, as well as the signaling pathways and metabolic networks in which they might be involved [26]. Biological process, cellular component, and molecular function were used as GO term categories, with a significance level of 0.01. Furthermore, quantitative trait loci (QTLs) for cattle were downloaded from the cattle QTLdb (https://www.animalgenome.org/cgi-bin/QTLdb/BT/summary, accessed on 16 March 2025) [27] and compared with the identified CNVRs. Because few QTLs have been mapped against the ARS-UCD2.0 genome assembly, we used QTL data documented for ARS-UCD1.2. We retained QTLs with a confidence interval narrower than 5 Mb and used the ‘intersect’ function of Bedtools v2.27.1 [28] to identify QTLs that overlap the CNVRs detected in our study, thereby highlighting genetic loci that may influence traits of interest in Nanyang cattle.
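The QTL filtering and overlap step can be illustrated with a small pure-Python analogue of an interval intersection. This sketch is not the Bedtools command used in the study; the tuple layouts, the function names, and the assumption that both interval sets use BED-style half-open coordinates in the same coordinate system are hypothetical.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def qtl_cnvr_overlaps(cnvrs, qtls, max_qtl_span=5_000_000):
    """Pair each CNVR with the narrow QTLs it overlaps.

    cnvrs: iterable of (chrom, start, end, cnvr_id)
    qtls:  iterable of (chrom, start, end, trait)
    Only QTLs with a span below max_qtl_span (5 Mb in the text) are used.
    """
    narrow_qtls = [q for q in qtls if q[2] - q[1] < max_qtl_span]
    hits = []
    for chrom, c_start, c_end, cnvr_id in cnvrs:
        for q_chrom, q_start, q_end, trait in narrow_qtls:
            if chrom == q_chrom and intervals_overlap(c_start, c_end, q_start, q_end):
                hits.append((cnvr_id, trait))
    return hits
```

The nested loop is quadratic in the number of intervals, which is acceptable for a sketch; a sorted sweep or an interval tree would be the usual choice at genome scale.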

2.4. Selective Sweep Analysis of CNVRs

To identify copy number variation regions (CNVRs) that differ among Nanyang, Xianan, and Pinnan cattle, we computed the Vst statistic. Vst is analogous to the Fst statistic, a well-established measure of genetic differentiation between populations, but is designed to quantify population differences based on copy number data. Vst is calculated as Vst = (Vt − Vs)/Vt, where Vt is the total variance in copy number across all unrelated individuals from the combined populations, and Vs is the average of the variances within each population, weighted by population size [29]. We then focused on the top 5% of CNVRs with the highest Vst values. These outlier loci were examined for potential links with prominent phenotypic traits specific to Nanyang cattle, and functional enrichment analysis was conducted on the selected CNVRs to explore their underlying biological mechanisms and functional roles.
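The Vst formula can be made concrete with a short Python sketch for a single CNVR and a single pair of populations. The function name is hypothetical, and the use of the population variance (dividing by n rather than n − 1) is an assumption, since the text does not specify the estimator.

```python
from statistics import pvariance


def vst(copy_numbers_a, copy_numbers_b):
    """Vst = (Vt - Vs) / Vt for one CNVR and two populations.

    Vt: variance of copy number over all individuals of both populations.
    Vs: size-weighted average of the two within-population variances.
    Returns 0.0 when there is no copy number variation at all (Vt == 0).
    """
    pop_a, pop_b = list(copy_numbers_a), list(copy_numbers_b)
    combined = pop_a + pop_b
    v_total = pvariance(combined)
    if v_total == 0:
        return 0.0
    v_within = (len(pop_a) * pvariance(pop_a)
                + len(pop_b) * pvariance(pop_b)) / len(combined)
    return (v_total - v_within) / v_total
```

As a worked example, copy numbers [1, 3] and [3, 5] give Vt = 2 and Vs = 1, so Vst = 0.5; two populations fixed for different copy numbers give Vst = 1.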

3. Results

3.1. The Landscape of Copy Number Variation in Nanyang Cattle

We gathered whole genome sequencing data from 27 individuals representing three Chinese cattle populations. Sequencing depth ranged from approximately 5.43× to 13.91×. The reads were aligned to the bovine reference genome, ARS-UCD2.0, with an average coverage of 99.68%, which supports the reliability of the subsequent CNV detection (Supplementary Table S1). Across the three populations, we generated a dataset of 10,349 CNVRs, including 4741 duplication CNVRs, 3313 deletion CNVRs, and 2295 CNVRs with both duplication and deletion, with a total length of 36,814,051 bp and an average length of 3557 bp, covering 1.48% of the reference genome (Figure 1A). The 10,349 CNVRs were grouped by length. The size distribution exhibited an L-shaped curve, with 50.9% of CNVRs in the 0–2 kb range, 30.7% in the 2–5 kb range, 6.8% in the 5–10 kb range, and the remaining CNVRs exceeding 10 kb (Figure 1B) (Supplementary Table S2). The distribution of CNVRs across the genome was not uniform. A majority of the CNVRs, 54.3% (5621), were located in intergenic regions, whereas only 1.6% were located in exonic regions, the protein-coding segments of genes (Figure 1C).
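The summary figures in this paragraph can be cross-checked with simple arithmetic. The sketch below recomputes the CNVR total, the mean length, and the intergenic share from the numbers quoted above, and adds a helper that assigns a CNVR length to the size classes of Figure 1B. The function names and the handling of bin boundaries (each bin includes its lower bound only) are assumptions.

```python
# Figures quoted in Section 3.1.
cnvr_counts = {"duplication": 4741, "deletion": 3313, "both": 2295}
total_length_bp = 36_814_051
intergenic_cnvrs = 5621


def summarize(counts, total_bp, n_intergenic):
    """Recompute the headline statistics from the raw counts."""
    total = sum(counts.values())
    return {
        "total_cnvrs": total,
        "mean_length_bp": round(total_bp / total),
        "intergenic_pct": round(100 * n_intergenic / total, 1),
    }


def size_bin(length_bp):
    """Assign a CNVR length to one of the four size classes (assumed bounds)."""
    if length_bp < 2_000:
        return "0-2 kb"
    if length_bp < 5_000:
        return "2-5 kb"
    if length_bp < 10_000:
        return "5-10 kb"
    return ">10 kb"


print(summarize(cnvr_counts, total_length_bp, intergenic_cnvrs))
```

The recomputed values (10,349 CNVRs, a mean length of 3557 bp, and 54.3% intergenic) agree with the figures reported in this section.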

3.2. Functional Annotation of CNVRs

Functional enrichment analysis of GO terms and KEGG pathways was performed through DAVID using the 2851 protein-coding genes located within 1 kb of a CNVR [30]. The 10,349 identified CNVRs were mapped to these 2851 genes, which are prone to copy number variation and therefore represent a valuable resource for studying the relationship between CNV-affected genes and the physical and behavioral characteristics of Nanyang cattle. The analysis identified 173 significantly enriched GO terms (p < 0.01), comprising 76 biological processes, 54 cellular components, and 43 molecular functions (Supplementary Table S3). The functions of these genes are primarily associated with organ development, neural regulation, immune regulation, and metabolism. Examples of specific terms include protein binding (GO:0005515), ATP binding (GO:0005524), metal ion binding (GO:0046872), and signal transduction (GO:0007165). Furthermore, KEGG pathway analysis of the genes harbored in the shared CNVRs revealed enrichment in 48 pathways (p < 0.05; Supplementary Table S4), including the calcium signaling pathway (bta04020), the focal adhesion pathway (bta04510), the oxytocin signaling pathway (bta04921), and the cell adhesion molecules pathway (bta04514), among others.

3.3. QTLs Overlapping with Identified CNVRs

To further examine the relationship between CNVRs and traits in Nanyang cattle, we compared cattle QTL data with the detected CNVRs. A total of 131 CNVRs overlapped with 81 quantitative trait loci (QTLs), including subcutaneous fat thickness QTLs (38 CNVRs), longissimus muscle area QTLs (14 CNVRs), multiple birth QTLs (14 CNVRs), tenderness score QTLs (11 CNVRs), and antral follicle number QTLs (9 CNVRs), among others (Supplementary Table S5). Furthermore, we identified several CNVR genes associated with slaughter performance, including AOX1, KRT72, SFXN1, ZBTB7C, and CACNA1G, located at the longissimus muscle area QTL (223729), meat color QTL (222285), marbling score QTL (222248), and multiple birth QTL (258520). These data can guide future genetic improvement of the Nanyang cattle breed.

3.4. CNVRs Diverging Among Populations

We applied the Vst statistic to analyze CNVR differentiation among Nanyang, Pinnan, and Xianan cattle. The average Vst across all detected CNVRs was 0.1591 between Nanyang and Pinnan cattle, 0.2561 between Nanyang and Xianan cattle, and 0.3157 between Pinnan and Xianan cattle (Supplementary Table S6). Pinnan and Xianan cattle showed the highest degree of differentiation, which is consistent with the breeding history of the two populations. As shown in Figure 2 and Supplementary Table S6, differentiated CNVRs were unevenly distributed across chromosomes. To identify genes that are highly differentiated between breeds, we further examined genes with Vst > 0.79 (the 98th percentile). Four genes (LOC788997, KIF26A, SPINT4, and OR5W1) exceeded the threshold between Nanyang and Pinnan cattle, as did 407 genes (including CEBPA, LOC101905257, TWIST2, KCNJ5, and CRLF3) between Nanyang and Xianan cattle and 529 genes (including IFRD2, SLC16A5, LOC101905099, LRP5, and CLMN) between Pinnan and Xianan cattle. Further functional analysis showed that 14 GO terms were enriched between Nanyang and Xianan cattle, mainly related to neural regulation and control. In addition, 16 KEGG pathways were enriched, including the dopaminergic synapse, oxytocin signaling, and calcium signaling pathways (Supplementary Table S7). A total of 23 GO terms were enriched between Pinnan and Xianan cattle, mainly related to signal transduction and organ development. In addition, 20 KEGG pathways were enriched, including circadian entrainment, cortisol synthesis and secretion, and regulation of actin cytoskeleton (Supplementary Table S8).
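The two summaries used in this section, the mean Vst per breed pair and the set of CNVRs above a Vst cut-off, can be sketched as follows. The function name and the input layout (a mapping from CNVR identifier to Vst for one breed pair) are hypothetical; the default threshold of 0.79 is the value quoted above.

```python
def summarize_pair(vst_by_cnvr, threshold=0.79):
    """Return the mean Vst and the CNVRs whose Vst exceeds the threshold.

    vst_by_cnvr: dict mapping a CNVR identifier to its Vst for one breed pair.
    Outliers are returned as (cnvr_id, vst) tuples, highest Vst first.
    """
    if not vst_by_cnvr:
        raise ValueError("no Vst values supplied")
    mean_vst = sum(vst_by_cnvr.values()) / len(vst_by_cnvr)
    outliers = sorted(
        ((cnvr_id, value) for cnvr_id, value in vst_by_cnvr.items()
         if value > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    return mean_vst, outliers
```

Genes would then be assigned to the outlier CNVRs through the annotation step described in Section 2.3.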

4. Discussion

Throughout domestication and the subsequent diversification of a species, the prevalence of CNVs in its genome shifts in response to selective pressures imposed by environmental demands or human-driven breeding goals. Although substantial research has focused on pinpointing causal mutations and key genes underlying traits of interest, systematically screening and validating genomic markers linked to copy number changes remains challenging because of their structural complexity and the interplay of genetic factors. As a prominent source of genetic diversity among individuals within a population, CNVs have significant potential to drive phenotypic change. They can act through multiple mechanisms, such as disrupting gene structure, altering gene dosage (thereby modifying the amount of gene product), and perturbing the balance of allele frequencies that govern regulatory networks. These effects underscore the role of CNVs in shaping the genetic and phenotypic landscape of a species and their importance in evolution and breeding strategies [31]. Over the past several decades, the rapid advancement of high-throughput sequencing (HTS) methods, coupled with bioinformatics analysis, has transformed genomic research. These technologies have enabled the construction of comprehensive, genome-wide CNV maps at a resolution and scale that were not previously possible [32]. The diversity of CNVs has been extensively explored in various animals, such as cattle, sheep, chickens, and pigs.
In the present study, we used whole genome sequencing data generated by next-generation sequencing (NGS) to detect CNVs. Compared with conventional CNV detection approaches, such as SNP microarrays and array comparative genomic hybridization (aCGH), NGS offers advantages in accurately determining both the number and the size of CNVs. Owing to its high sensitivity, NGS can also locate CNV boundaries with greater precision [33]. In addition, using the ARS-UCD2.0 reference genome rather than ARS-UCD1.2 improves the reliability of CNV screening. To estimate copy numbers accurately at genomic breakpoints and structural variation hotspots, we employed three software tools, each based on a different algorithm for CNV detection. Our results showed that duplication events were more common than deletion events, which is consistent with most previous reports. In addition, CNVRs are distributed non-uniformly and non-randomly across the chromosomes of the cattle genome. Genomic annotation revealed that a substantial proportion of CNVRs mapped to intergenic or intronic segments of the cattle genome (Figure 1C). This finding aligns with prior research, which similarly indicates that numerous CNVRs are located in highly variable genomic loci, often encompassing genes with dynamic regulatory or structural features [23].
In the present study, GO enrichment analysis showed that many CNVR-carrying genes were significantly enriched in GO terms related to sensory perception (Supplementary Table S3). This is consistent with studies of CNVs in humans, yak, pigs, horses, dogs, and mice, which also found significant enrichment of GO terms related to sensory perception [34,35,36]. In addition, GO terms related to energy metabolism were significantly enriched. Fine regulation of energy metabolism plays a decisive role in the healthy growth and reproduction of cattle under different climatic conditions and feed supplies. KEGG pathway analysis showed that CNVR genes were significantly enriched in signal transduction and nutritional metabolism (Supplementary Table S4). The calcium signaling pathway is a pivotal intracellular signaling mechanism that is important across a diverse array of cell types and functions. It regulates processes including cell proliferation, differentiation, apoptosis, and various physiological functions by controlling changes in the intracellular calcium ion concentration, which transmit information and trigger a series of biological reactions. Furthermore, the enrichment of CNVR-overlapping genes in nervous system-related signaling pathways may provide a genetic basis for adaptation of the Nanyang cattle line to diverse environmental challenges, allowing it to respond more effectively to environmental changes through neural responses and behavioral patterns. This finding improves our understanding of the adaptive evolution of cattle and suggests directions for future research aimed at enhancing the genetic traits of Nanyang cattle breeds and their capacity to thrive in diverse environmental conditions.
The QTL analysis in this study showed that many CNVR-overlapping genes were located in growth and carcass QTL regions, such as AOX1, KRT72, SFXN1, and ZBTB7C. The levels of 2-pyrrolidone and glycerophospholipids are regulated by the gene expression of AOX1, which further affects the levels of volatiles, 2-pyrrolidone, and decanal, respectively [37]. The KRT7 gene exhibited elevated expression levels within the phenotype group, suggesting that keratin proteins play a role in the manifestation of the plaque-associated phenotype. Keratin proteins (KRTs) are the primary structural constituents of skin, hair, and wool, and they regulate the growth and development of these tissues [38]. Sfxn1 is essential for erythrocyte maturation by facilitating hemoglobin production in zebrafish [39]. The ZBTB7C gene functions as a regulatory hub that governs the expression of matrix metalloproteinases (MMPs), a family of zinc-dependent endopeptidases with multifaceted roles in cellular processes. MMPs are pivotal in cell proliferation, migration, and differentiation, as well as in regulating angiogenesis and apoptosis [40]. Given these functions, the CNV-harboring genes identified in this study are promising molecular markers. They could guide future breeding strategies for the Nanyang cattle line by offering a genetic basis for targeted improvement of traits relevant to livestock production and health.
Selective sweep analysis is a powerful tool for uncovering genomic regions that harbor candidate genes shaped by both environmental pressures and artificial selection during adaptation and domestication [41]. Notably, in our study, the unconventional kinesin gene KIF26A was among the genes with highly differentiated CNVRs. This gene plays a pivotal role in the development of the enteric nervous system (ENS) by suppressing a cell growth-promoting signaling cascade [42]. SPINT4 is an epididymis-specific protein with anti-proteolytic activity, which is related to spermatozoal maturation, motility, and male fertility [43]. Functional enrichment analysis revealed that genes located within the differentiated CNVRs were predominantly involved in vital biological pathways, including regulatory mechanisms and the calcium signaling pathway, which may contribute to the adaptive differences observed among cattle breeds.
While this genome-wide CNV profiling of Nanyang cattle populations provides valuable structural variation data, two validation layers remain outstanding. Firstly, the absence of orthogonal biological validation (qPCR or droplet digital PCR confirmation for high-impact CNVRs) introduces uncertainty in the accuracy of boundary delineation, particularly for complex tandem duplications. Secondly, the phenotypic correlation analysis is constrained by the lack of quantitative trait datasets, so GWAS-based CNV–phenotype associations could not be tested with mixed linear models. Future investigations should prioritize mediated CNV reconstruction in bovine primary myocytes coupled with transcriptomic profiling to empirically verify dosage effects on candidate genes.

5. Conclusions

The Nanyang cattle lineage exhibits lineage-specific CNVRs enriched in calcium signaling, cell adhesion, and oxytocin pathways, linking structural genomic variation to muscle development, tissue integrity, and reproductive efficiency. Overlap analysis identified 131 CNVRs colocalized with 81 QTLs, including AOX1 (associated with meat flavor compounds) and SFXN1 (linked to meat tenderness), implicating copy number dosage effects in carcass quality. Selective sweep analysis revealed divergent CNVRs (KIF26A and SPINT4) under breed-specific selection, reflecting adaptive pressures on neuronal regulation and reproductive traits. Notably, 54.3% of CNVRs reside in intergenic regions, suggesting regulatory element disruption as a driver of phenotypic diversity. Breed-specific copy number differences in OR5W1 (olfaction) and LRP5 (bone density) further highlight genetic adaptations to ecological niches and management practices, providing actionable insights for precision breeding programs targeting productivity and resilience.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes16050568/s1, Tables S1–S8.

Author Contributions

Conceptualization, D.D. and L.Y.; methodology, D.D.; software and validation, D.D.; formal analysis, Y.R.; investigation, L.Z.; resources, L.G. and L.P.; writing—original draft preparation, D.D.; writing—review and editing, L.P.; visualization, Y.R.; project administration, L.Z.; funding acquisition, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research and Demonstration on Intelligent Management of High Quality Beef Cattle Industry in Yunnan Plateau (Major Special Project of Yunnan Province, No. 202102AE090009).

Institutional Review Board Statement

Not involving humans or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, PRJNA379859, PRJNA396672, PRJNA698276, PRJNA1058368.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mérot, C.; Oomen, R.A.; Tigano, A.; Wellenreuther, M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol. Evol. 2020, 35, 561–572.
  2. Salehian-Dehkordi, H.; Xu, Y.X.; Xu, S.S.; Li, X.; Luo, L.Y.; Liu, Y.J.; Wang, D.-F.; Cao, Y.-H.; Shen, M.; Lv, F.H.; et al. Genome-wide detection of copy number variations and their association with distinct phenotypes in the world’s sheep. Front. Genet. 2021, 12, 670582.
  3. Cao, X.K.; Huang, Y.Z.; Ma, Y.L.; Cheng, J.; Qu, Z.X.; Ma, Y.; Bai, Y.-Y.; Tian, F.P.; Lin, F.; Chen, H.; et al. Integrating CNVs into meta-QTL identified GBP4 as positional candidate for adult cattle stature. Funct. Integr. Genom. 2018, 18, 559–567.
  4. Upadhyay, M.; da Silva, V.H.; Megens, H.-J.; Visker, M.H.P.W.; Ajmone-Marsan, P.; Bâlteanu, V.A.; Dunner, S.; Garcia, J.F.; Ginja, C.; Kantanen, J.; et al. Distribution and functionality of copy number variation across European cattle populations. Front. Genet. 2017, 8, 108.
  5. Liu, M.; Zhou, Y.; Rosen, B.D.; Van Tassell, C.P.; Stella, A.; Tosser-Klopp, G.; Rupp, R.; Palhière, I.; Colli, L.; Sayre, B.; et al. Diversity of copy number variation in the worldwide goat population. Heredity 2019, 122, 636–646.
  6. Stafuzza, N.B.; Silva, R.M.D.O.; Fragomeni, B.D.O.; Masuda, Y.; Huang, Y.; Gray, K.; Lourenco, D.A.L. A genome-wide single nucleotide polymorphism and copy number variation analysis for number of piglets born alive. BMC Genom. 2019, 20, 321.
  7. Yang, L.; Han, J.; Deng, T.; Li, F.; Han, X.; Xia, H.; Quan, F.; Hua, G.; Yang, L.; Zhou, Y. Comparative analyses of copy number variations between swamp buffaloes and river buffaloes. Anim. Genet. 2023, 54, 199–206.
  8. Hu, L.; Zhang, L.; Li, Q.; Liu, H.; Xu, T.; Zhao, N.; Han, X.; Xu, S.; Zhao, X.; Zhang, C. Genome-wide analysis of CNVs in three populations of Tibetan sheep using whole-genome resequencing. Front. Genet. 2022, 13, 971464.
  9. Ma, Q.; Liu, X.; Pan, J.; Ma, L.; Ma, Y.; He, X.; Zhao, Q.; Pu, Y.; Li, Y.; Jiang, L. Genome-wide detection of copy number variation in Chinese indigenous sheep using an ovine high-density 600 K SNP array. Sci. Rep. 2017, 7, 912.
  10. Zhang, Y.; Wei, Z.; Zhang, M.; Wang, S.; Gao, T.; Huang, H.; Zhang, T.; Cai, H.; Liu, X.; Fu, T.; et al. Population Structure and Selection Signal Analysis of Nanyang Cattle Based on Whole-Genome Sequencing Data. Genes 2024, 15, 351.
  11. Lyu, Y.; Wang, F.; Cheng, H.; Han, J.; Dang, R.; Xia, X.; Wang, H.; Zhong, J.; Lenstra, J.A.; Zhang, H.; et al. Recent selection and introgression facilitated high-altitude adaptation in cattle. Sci. Bull. 2024, 69, 3415–3424.
  12. Xia, X.; Zhang, F.; Li, S.; Luo, X.; Peng, L.; Dong, Z.; Pausch, H.; Leonard, A.S.; Crysnanto, D.; Wang, S.; et al. Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment. Genome Biol. 2023, 24, 211.
  13. Zhang, S.; Yao, Z.; Li, X.; Zhang, Z.; Liu, X.; Yang, P.; Chen, N.; Xia, X.; Lyu, S.; Shi, Q.; et al. Assessing genomic diversity and signatures of selection in Pinan cattle using whole-genome sequencing data. BMC Genom. 2022, 23, 460.
  14. Song, X.; Yao, Z.; Zhang, Z.; Lyu, S.; Chen, N.; Qi, X.; Liu, X.; Ma, W.; Wang, W.; Lei, C.; et al. Whole-genome sequencing reveals genomic diversity and selection signatures in Xia’nan cattle. BMC Genom. 2024, 25, 559.
  15. Mei, C.; Junjvlieke, Z.; Raza, S.H.A.; Wang, H.; Cheng, G.; Zhao, C.; Zhu, W.; Zan, L. Copy number variation detection in Chinese indigenous cattle by whole genome sequencing. Genomics 2020, 112, 831–836.
  16. Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, i884–i890.
  17. Yuan, H.; Wei, W.; Zhang, Y.; Li, C.; Zhao, S.; Chao, Z.; Xia, C.; Quan, J.; Gao, C. Unveiling the Influence of Copy Number Variations on Genetic Diversity and Adaptive Evolution in China’s Native Pig Breeds via Whole-Genome Resequencing. Int. J. Mol. Sci. 2024, 25, 5843.
  18. Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009, 25, 1754–1760.
  19. Abyzov, A.; Urban, A.E.; Snyder, M.; Gerstein, M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011, 21, 974–984.
  20. Layer, R.M.; Chiang, C.; Quinlan, A.R.; Hall, I.M. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 2014, 15, R84.
  21. Wang, X.; Zheng, Z.; Cai, Y.; Chen, T.; Li, C.; Fu, W.; Jiang, Y. CNVcaller: Highly efficient and widely applicable software for detecting copy number variations in large populations. Gigascience 2017, 6, gix115.
  22. Jeffares, D.C.; Jolly, C.; Hoti, M. Transient structural variations alter gene expression and quantitative traits in Schizosaccharomyces pombe. bioRxiv 2016, 047266.
  23. Huang, Y.; Li, Y.; Wang, X.; Yu, J.; Cai, Y.; Zheng, Z.; Li, R.; Zhang, S.; Chen, N.; Nanaei, H.A.; et al. An atlas of CNV maps in cattle, goat and sheep. Sci. China Life Sci. 2021, 64, 1747–1764.
  24. Hao, Z.; Lv, D.; Ge, Y.; Shi, J.; Weijers, D.; Yu, G.; Chen, J. RIdeogram: Drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 2020, 6, e251.
  25. Wang, K.; Li, M.; Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010, 38, e164.
  26. Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57.
  27. Hu, Z.L.; Park, C.A.; Reecy, J.M. Building a livestock genetic and genomic information knowledgebase through integrative developments of Animal QTLdb and CorrDB. Nucleic Acids Res. 2019, 47, D701–D710.
  28. Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842.
  29. Redon, R.; Ishikawa, S.; Fitch, K.R.; Feuk, L.; Perry, G.H.; Andrews, T.D.; Fiegler, H.; Shapero, M.H.; Carson, A.R.; Chen, W.; et al. Global variation in copy number in the human genome. Nature 2006, 444, 444–454.
  30. Seol, D.; Ko, B.J.; Kim, B.; Chai, H.-H.; Lim, D.; Kim, H. Identification of copy number variation in domestic chicken using whole-genome sequencing reveals evidence of selection in the genome. Animals 2019, 9, 809.
  31. Pinto, D.; Darvishi, K.; Shi, X.; Rajan, D.; Rigler, D.; Fitzgerald, T.; Lionel, A.C.; Thiruvahindrapuram, B.; MacDonald, J.R.; Mills, R.; et al. Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat. Biotechnol. 2011, 29, 512–520.
  32. Liu, Y.; Mu, Y.; Wang, W.; Ahmed, Z.; Wei, X.; Lei, C.; Ma, Z. Analysis of genomic copy number variations through whole-genome scan in Chinese Qaidam cattle. Front. Vet. Sci. 2023, 10, 1148070.
  33. Yuan, C.; Lu, Z.; Guo, T.; Yue, Y.; Wang, X.; Wang, T.; Zhang, Y.; Hou, F.; Niu, C.; Sun, X.; et al. A global analysis of CNVs in Chinese indigenous fine-wool sheep populations using whole-genome resequencing. BMC Genom. 2021, 22, 78.
  34. Jia, C.; Wang, H.; Li, C.; Wu, X.; Zan, L.; Ding, X.; Guo, X.; Bao, P.; Pei, J.; Chu, M.; et al. Genome-wide detection of copy number variations in polled yak using the Illumina BovineHD BeadChip. BMC Genom. 2019, 20, 376.
  35. Berglund, J.; Nevalainen, E.M.; Molin, A.-M.; Perloski, M.; The LUPA Consortium; André, C.; Zody, M.C.; Sharpe, T.; Hitte, C.; Lindblad-Toh, K.; et al. Novel origins of copy number variation in the dog genome. Genome Biol. 2012, 13, R73.
  36. Paudel, Y.; Madsen, O.; Megens, H.-J.; Frantz, L.A.; Bosse, M.; Bastiaansen, J.W.; Crooijmans, R.P.; Groenen, M.A. Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication. BMC Genom. 2013, 14, 449.
  37. Liu, D.; Zhang, H.; Yang, Y.; Liu, T.; Guo, Z.; Fan, W.; Wang, Z.; Yang, X.; Zhang, B.; Liu, H.; et al. Metabolome-based genome-wide association study of duck meat leads to novel genetic and biochemical insights. Adv. Sci. 2023, 10, 2300148.
  38. Zhao, B.; Cai, J.; Zhang, X.; Li, J.; Bao, Z.; Chen, Y.; Wu, X. Single nucleotide polymorphisms in the KRT82 promoter region modulate irregular thickening and patchiness in the dorsal skin of New Zealand rabbits. BMC Genom. 2024, 25, 458.
  39. Bao, B.; An, W.; Lu, Q.; Wang, Y.; Lu, Z.; Tu, J.; Zhang, H.; Duan, Y.; Yuan, W.; Zhu, X.; et al. Sfxn1 is essential for erythrocyte maturation via facilitating hemoglobin production in zebrafish. Biochim. Biophys. Acta BBA-Mol. Basis Dis. 2021, 1867, 166096.
  40. Jeon, B.N.; Yoon, J.H.; Kim, M.K.; Choi, W.I.; Koh, D.I.; Hur, B.; Kim, K.; Kim, K.S.; Hur, M.W. Zbtb7c is a molecular ‘off’ and ‘on’ switch of Mmp gene transcription. Biochim. Biophys. Acta BBA-Gene Regul. Mech. 2016, 1859, 1429–1439.
  41. Chen, Q.; Qu, K.; Ma, Z.; Zhan, J.; Zhang, F.; Shen, J.; Ning, Q.; Jia, P.; Zhang, J.; Chen, N.; et al. Genome-wide association study identifies genomic loci associated with neurotransmitter concentration in cattle. Front. Genet. 2020, 11, 139.
  42. Zhou, R.; Niwa, S.; Homma, N.; Takei, Y.; Hirokawa, N. KIF26A is an unconventional kinesin and regulates GDNF-Ret signaling in enteric neuronal development. Cell 2009, 139, 802–813.
  43. Zhao, W.; Mengal, K.; Yuan, M.; Quansah, E.; Li, P.; Wu, S.; Xu, C.; Yi, C.; Cai, X. Comparative RNA-Seq analysis of differentially expressed genes in the epididymides of Yak and cattleyak. Curr. Genom. 2019, 20, 293–305.
Figure 1. Genetic diversity and distribution of CNVRs in Nanyang cattle. (A) Autosomal distribution of CNVRs. The colors painted on the chromosomes represent gene density, and the positions of different colors outside the chromosomes represent duplications (orange), deletions (green), and duplications and deletions (purple). (B) The frequency of different types of CNVRs. (C) Functional classification of the detected CNVRs.
Figure 2. The Manhattan plot of the gene Vst values of the whole genome. (A) NY vs. PN. (B) NY vs. XN. (C) PN vs. XN. The red line indicates 0.8 Vst.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dang, D.; Zhang, L.; Gao, L.; Peng, L.; Rao, Y.; Yang, L. Genome-Wide Analysis of Copy Number Variations in Three Populations of Nanyang Cattle Using Whole-Genome Resequencing. Genes 2025, 16, 568. https://doi.org/10.3390/genes16050568

