Identify Candidate Genes Associated with the Weight and Egg Quality Traits in Wenshui Green Shell-Laying Chickens by the Copy Number Variation-Based Genome-Wide Association Study

Simple Summary
The Wenshui green shell-laying chicken is an improved breed crossed from the Wenshang reed-feather chicken and Xinyang green shell-laying chicken, whose major characteristics are reed feathers, green-shelled eggs, a high egg-laying number, and excellent egg quality. The study of body weight and egg quality traits in Wenshui green shell-laying chickens is important for chicken-related breeding work. In this paper, we performed a copy number variation regions (CNVRs)-based genome-wide association study (GWAS) in Wenshui green shell-laying chickens to identify variations and candidate genes associated with their weight and egg quality traits. Finally, we identified important genes associated with body weight and egg quality traits. This study can provide a basic reference for the genetic improvement of chickens' body weight and egg quality traits.

Abstract
Copy number variation (CNV), as an essential source of genetic variation, can have an impact on gene expression, genetic diversity, disease susceptibility, and species evolution in animals. To better understand the weight and egg quality traits of chickens, this paper aimed to detect CNVs in Wenshui green shell-laying chickens and conduct a copy number variation regions (CNVRs)-based genome-wide association study (GWAS) to identify variants and candidate genes associated with their weight and egg quality traits to support related breeding efforts. In our paper, we identified 11,035 CNVRs in Wenshui green shell-laying chickens, which collectively spanned a length of 13.1 Mb, representing approximately 1.4% of its autosomal genome. Out of these CNVRs, there were 10,446 loss types, 491 gain types, and 98 mixed types. Notably, two CNVRs showed significant correlations with egg quality, while four CNVRs exhibited significant associations with body weight. These significant CNVRs are located on chromosome 4. Further analysis identified potential candidate genes that influence weight and egg quality traits, including FAM184B, MED28, LAP3, ATOH8, ST3GAL5, LDB2, and SORCS2. In this paper, the CNV map of the Wenshui green shell-laying chicken genome was constructed for the first time through population genotyping. Additionally, CNVRs can be employed as molecular markers to genetically improve chickens' weight and egg quality traits.


Introduction
Vet. Sci. 2024, 11, 76

CNV is one of the common structural variation phenomena in the genome, ranging in size from 50 bp to several Mb. Its variation types include copy number deletions, insertions, recombinations, and multi-site complex mutations [1]. CNV is also one of the significant genetic bases for the evolution of individual phenotypic diversity and population adaptation [2]. It accounts for a relatively large proportion of the total genetic variation in a species [3] and usually acts through the dosage and positional effects of genes to produce structural variation in genes [4,5]. It can modulate organismal plasticity and influence the onset and development of disease [6]. It is widespread in the genomes of humans and other species, covers many more nucleotides than the total number of single nucleotide polymorphisms (SNPs), and greatly enriches the diversity of genomic variation [7].
In current poultry production, genetic variation has received widespread attention as one of the main factors influencing traits across generations. Several of these studies concern CNV. For example, identified CNVs have been associated with broiler body weight [8], abdominal fat [9], and skin color [10], and breed-specific CNVs have been detected at the population level [11]. Therefore, studying chicken traits from a genomic perspective can help further develop their economic traits.
Currently, there are many methods to investigate CNV. In addition to conventional cytogenetic research methods, other methods include Array Comparative Genomic Hybridization (aCGH) chip technology [12], SNP chip technology [13], and next-generation sequencing (NGS) technology [14]. Compared to aCGH chip technology and SNP chip technology, NGS technology has higher resolution, the ability to perform diversity analysis, and the capability to detect a wider range of variations [15]. In NGS, various software tools can detect CNVs from whole-genome sequencing (WGS) data. According to their underlying principle, these tools can be divided into four categories, namely Read-pair (RP), Split-read (SR), Read-depth (RD), and Assembly (AS); some software also uses the Combined Approach (CA) to detect CNV [16,17].
With the popularization of CNV research in animals, many researchers have attempted to perform CNV-based GWAS research [2,8,18–21]. Since the concept of the GWAS was first proposed by Risch et al. in 1996 [22], the GWAS has been used primarily to discover genes associated with human genetic disease [23]. The genome information of many animals has improved with the rapid development of sequencing technology. GWAS research on CNV has gradually shifted from human diseases to economic and phenotypic traits in livestock species [18,19,24–26]. This indicates that CNV may significantly impact critical economic traits of livestock [27].
This paper aimed to identify CNV in Wenshui green shell-laying chickens and to conduct GWAS analysis for weight and egg quality traits based on copy number variation regions (CNVRs). The aim was to explore the genetic variation and candidate genes related to weight and egg quality traits of Wenshui green shell-laying chickens and to provide the basis for applying molecular breeding techniques such as molecular marker-assisted selection and genome selection to improve chickens' weight and egg quality traits.

Population Description
The animals in this experiment were selected from Jinqiu Agricultural and Animal Husbandry Technology Co., Ltd. (Tai'an, China), and a total of 834 Wenshui green shell-laying chickens from the same batch were selected as the experimental group. Egg production data were recorded for 3 months after the test group started laying, and egg quality was measured for eggs laid at 30 and 40 weeks of age. The Wenshui green shell-laying chicken selected for this experiment is an improved breed. It is a cross between the Wenshang reed-feather chicken and the Xinyang green shell-laying chicken, created through selection that combines modern conventional breeding technology and molecular detection technology. The breed was developed by aggregating desirable characters (genes) such as reed feathers, green-shelled eggs, a high egg number, and good egg quality.

Phenotyping
The egg weight and egg shape index of Wenshui green shell-laying chickens at 30 weeks of age were measured. The egg weight (EW), egg shape index (ESI), yolk color (YC), egg white height (EWH), shell thickness (SH), shell strength (SS), yolk weight (YW), shell weight (SW), egg white weight (EWW), yolk ratio (YR), shell ratio (SR), egg white ratio (EWR), concentrated egg white long diameter (EWL), concentrated egg white short diameter (EWS), and Haugh unit (HU) of Wenshui green shell-laying chickens at 40 weeks of age were measured. The weights of Wenshui green shell-laying chickens at birth (BW), 4 weeks (4-W), 8 weeks (8-W), 13 weeks (13-W), 15 weeks (15-W), and 38 weeks (38-W) of age were measured. To ensure the quality of the data, the phenotypic data were quality checked, and the detailed results are shown in Supporting Information Table S1.
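The paper does not state how the derived egg quality traits were calculated. As a minimal sketch, assuming the standard Haugh formula for HU and assuming that the component ratios (YR, SR, EWR) are component weights expressed as a percentage of egg weight, the calculations could look as follows. The function names are hypothetical and chosen only for illustration.

```python
import math


def haugh_unit(albumen_height_mm: float, egg_weight_g: float) -> float:
    """Standard Haugh unit: 100 * log10(H - 1.7 * W**0.37 + 7.57).

    H is the egg white (albumen) height in mm and W is the egg weight in g.
    Assumption: the paper does not say which HU formula was used.
    """
    return 100.0 * math.log10(
        albumen_height_mm - 1.7 * egg_weight_g ** 0.37 + 7.57
    )


def component_ratio(component_weight_g: float, egg_weight_g: float) -> float:
    """Component weight as a percentage of egg weight (assumed definition)."""
    return 100.0 * component_weight_g / egg_weight_g


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(round(haugh_unit(6.0, 60.0), 1))
    print(component_ratio(18.0, 60.0))
```

Under these assumptions, a yolk weight of 18 g in a 60 g egg corresponds to a yolk ratio of 30%.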
Blood was collected from the test group at around 50 weeks of age using the subwing venous blood collection method. Insulated containers were prepared before collection to prevent the samples from degrading at high temperatures. The storage temperature was −20 °C. TIANGEN's Genomic DNA Extraction Kit was used to extract DNA from blood. Each sample's DNA was extracted from the blood using the phenol-chloroform procedure, and the samples were analyzed for DNA contamination using a 1% agarose gel [28]. The OD260/OD280 ratio was measured to assess the content and quality of the DNA, with the OD260/OD280 generally ranging from 1.7 to 1.9. The extracted high-quality DNA was sent to Beijing Youji Technology Co., Ltd. (Beijing, China) for whole-genome sequencing (paired-end sequencing was performed and reads were 150 bp in length) to obtain the raw genomic data.

Sequence Alignment to Reference Genome
Quality screening of the raw genomic data was carried out by removing adapter sequences using Trimmomatic software v0.38 [29]. The average number of reads per sample after quality control was 44,931,409, and the average sequencing depth was 11.74×. The quality-screened data were aligned to the reference genome (the reference genome is the Wenshui green shell-laying chicken's genome) using bwa software v0.7.17 [30]. The average alignment rate was as high as 99.76%, and the average coverage was 97.55%. After alignment, duplicate (repeat) sequences were marked using GATK software v4.2.6.1 [31].

CNV Detection
In this paper, DELLY software v1.1.6, which follows the CA method, was used to detect CNV in the genomic data of the test population. DELLY combines short-range and long-range paired-end mapping and split-read analysis to detect balanced and unbalanced forms of structural variation, such as deletions, tandem duplications, inversions, and translocations, with high sensitivity and specificity [32]. Since DELLY software v1.1.6 is designed for detecting CNVs in populations, the population results directly present the curated population's CNVRs, obtained by merging overlapping CNVs across samples (a copy number variation region is a large genomic segment formed by adjacent copy number variation sections with overlapping regions [33]). Each CNVR is defined as gain, loss, or mixed (losses and gains happening in the same area). The CNVs detected by DELLY were filtered by retaining the rows that contained "pass" in the FILTER column of the vcf file. Additionally, CNVRs within the range of 50 bp to 5 Mb were selected. After applying these filters, the CNVRs were further examined using BEDTools software v2.26.0 to ensure that any overlapping CNVs in the population were merged [34].
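The filtering and merging logic described above can be illustrated with a small sketch. It is not the DELLY or BEDTools implementation: the `Cnv` record, the function names, and the choice to apply the 50 bp to 5 Mb size filter to individual calls before merging are all assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_LEN = 50          # bp, lower size bound stated in the text
MAX_LEN = 5_000_000   # bp, upper size bound stated in the text


@dataclass(frozen=True)
class Cnv:
    """Hypothetical minimal record for one CNV call."""
    chrom: str
    start: int  # 1-based inclusive start
    end: int    # inclusive end
    kind: str   # "gain" or "loss"
    filt: str   # value of the VCF FILTER column


def keep(cnv: Cnv) -> bool:
    """Keep PASS calls whose length lies within the stated size range."""
    length = cnv.end - cnv.start + 1
    return cnv.filt.upper() == "PASS" and MIN_LEN <= length <= MAX_LEN


def merge_to_cnvrs(cnvs):
    """Merge overlapping calls per chromosome and label each region."""
    regions = []
    for cnv in sorted(filter(keep, cnvs), key=lambda c: (c.chrom, c.start)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == cnv.chrom and cnv.start <= last["end"]:
            # Overlaps the current region: extend it and record the call type.
            last["end"] = max(last["end"], cnv.end)
            last["kinds"].add(cnv.kind)
        else:
            regions.append({"chrom": cnv.chrom, "start": cnv.start,
                            "end": cnv.end, "kinds": {cnv.kind}})
    for region in regions:
        kinds = region.pop("kinds")
        # A region holding both gains and losses is "mixed".
        region["type"] = "mixed" if len(kinds) > 1 else next(iter(kinds))
    return regions


if __name__ == "__main__":
    demo = [Cnv("4", 100, 300, "loss", "PASS"),
            Cnv("4", 250, 500, "gain", "PASS")]
    print(merge_to_cnvrs(demo))
```

In this toy input, the overlapping loss and gain calls on chromosome 4 collapse into one region spanning positions 100 to 500 that is labeled mixed.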

CNV-Based GWAS
We selected the CNVR datasets with frequencies above 0.5% in each population to increase the precision of the GWAS results [18]. The collated vcf files were converted to bim, fam, and bed files using plink v2.0 [35]; the bed files, which store the genotypes, needed to be recoded. Coding was based on the genotype information in the vcf files: gain, loss, and normal (2n) were coded as 1, −1, and 0, respectively [8,19]. The mixed linear model (MLM) of GMAT software was selected for single-trait genome-wide association analysis [36]. The individuals measured in this paper were born during the same period and housed in the same coop in stepped cages for rearing. The uniform model is as follows:

$$y = \mu + Wg + Zu + e$$

where y is the vector of phenotypic observations for individuals; µ is the population mean; g is the individual CNV effect vector and W is the design matrix of g; u is the polygenic effect vector and Z is the design matrix of u; e is the random residual vector. The random effects follow normal distributions:

$$u \sim N(0, G\sigma_a^2), \qquad e \sim N(0, I\sigma_e^2)$$

where G is the genomic relationship matrix constructed by CNV; I is the identity matrix; $\sigma_a^2$ is the additive genetic variance; $\sigma_e^2$ is the residual variance. In the CNVR-based GWAS, we established a threshold for genome-wide significance, which was set at 0.05/N, where N represents the number of CNVRs [18].
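The genotype coding, frequency filter, and significance threshold described in this section can be sketched as follows. This is an illustration only and does not reproduce plink or GMAT: the function names are hypothetical, and CNVR frequency is assumed here to mean the fraction of individuals carrying a non-normal copy number, which the paper does not define explicitly.

```python
# Coding stated in the text: gain = 1, loss = -1, normal (2n) = 0.
CODES = {"gain": 1, "loss": -1, "normal": 0}


def encode(states):
    """Translate per-individual CNV states into numeric genotype codes."""
    return [CODES[state] for state in states]


def cnvr_frequency(codes):
    """Fraction of individuals with a non-normal copy number (assumed definition)."""
    return sum(1 for code in codes if code != 0) / len(codes)


def filter_by_frequency(genotypes, min_freq=0.005):
    """Keep CNVRs whose frequency is above 0.5%."""
    return {cnvr_id: codes for cnvr_id, codes in genotypes.items()
            if cnvr_frequency(codes) > min_freq}


def bonferroni_threshold(n_cnvrs, alpha=0.05):
    """Genome-wide significance threshold 0.05 / N."""
    return alpha / n_cnvrs


if __name__ == "__main__":
    # Toy data for 1000 individuals, not data from the study.
    toy = {
        "cnvr_a": encode(["loss"] * 10 + ["normal"] * 990),
        "cnvr_b": encode(["gain"] * 2 + ["normal"] * 998),
    }
    kept = filter_by_frequency(toy)
    print(sorted(kept), bonferroni_threshold(len(kept)))
```

In the toy data, only `cnvr_a` (carried by 1% of individuals) passes the 0.5% filter, so N = 1 and the threshold equals 0.05; with N tested CNVRs the threshold shrinks proportionally.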

Gene Annotation
Self-written scripts were developed to identify the genes contained in a 100 kb window (50 kb up- and downstream) around the genomic regions of significantly associated CNVRs, based on the gff file of the Wenshui green shell-laying chicken's reference genome.
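The authors' scripts are not shown in the paper. A minimal sketch of the window-based annotation, assuming a GFF3-style file in which gene records carry `ID` or `Name` attributes, could look like this. The function names and the toy records are hypothetical.

```python
FLANK = 50_000  # bp on each side of the CNVR, as stated in the text


def parse_gff_genes(lines):
    """Yield (chrom, start, end, name) for 'gene' features in GFF3 text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID", "unknown")
        yield cols[0], int(cols[3]), int(cols[4]), name


def genes_near(cnvr, genes, flank=FLANK):
    """Return names of genes overlapping the CNVR extended by `flank` bp."""
    chrom, start, end = cnvr
    lo, hi = max(1, start - flank), end + flank
    return [name for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and g_start <= hi and g_end >= lo]


if __name__ == "__main__":
    # Toy GFF3 records, not taken from the study's annotation.
    gff = [
        "##gff-version 3",
        "4\tsrc\tgene\t100000\t120000\t.\t+\t.\tID=gene1;Name=GENE_A",
        "4\tsrc\tgene\t400000\t420000\t.\t-\t.\tID=gene2;Name=GENE_B",
    ]
    genes = list(parse_gff_genes(gff))
    print(genes_near(("4", 160000, 165000), genes))
```

A gene is reported whenever any part of it falls inside the extended window, so genes that only partially overlap the 50 kb flanks are also included; whether the authors used the same overlap rule is not stated.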

CNVR-Based Genome-Wide Association Study
The genome-wide association study (GWAS) was performed to identify associations between the CNVRs and 23 weight and egg quality traits evaluated in Wenshui green shell-laying chickens. Five traits were found to be significantly associated with CNVRs. A Manhattan plot of the CNVRs significantly associated with weight and egg quality traits on the 38 autosomes is shown in Figure 4. Table 2 lists the CNVRs significantly associated with weight and egg quality traits, among which two CNVRs were significantly correlated with 30-EW, one CNVR was significantly correlated with 40-EW and 40-EWW, four CNVRs were significantly correlated with 8-W, and one CNVR was significantly correlated with 38-W. Eleven genes were identified within a 100-kb window in genomic regions defined by significant CNVRs associated with 30-EW, 40-EW, 40-EWW, 8-W, and 38-W.

Notes to Table 2: CNVR Position: CNVR position based on the Wenshui green shell-laying chicken's genome. p-value: genome-wide significance. Proximal Gene: ID of the gene on NCBI.

Discussion
In this paper, we identified 11,035 CNVRs in Wenshui green shell-laying chickens with a total length of 13.1 Mb, representing approximately 1.4% of their autosomal genome. There were 10,446 loss types, 491 gain types, and 98 mixed types. CNVR sizes ranged from 51 bp to 642.6 kb. These results can be compared with previous studies. Rao et al. detected 357 CNVRs using PennCNV software in F2 generation flocks of White Recessive Rock and Xinghua chickens; among these CNVRs, there were 213 loss types, 112 gain types, and 32 mixed types [37]. Chen et al. combined a variety of CNV detection methods (mrFAST, CNVnator, BreakDancer, and Pindel) and applied them to one original breed (Red Jungle fowl), two commercial breeds (Recessive White Rock chickens, White Leghorn chickens), and local Chinese chickens (Xinghua chickens, Luxi Game fowl, Beijing-You chickens). A total of 11,123 CNVRs were detected, of which 8834 were loss types, 1911 gain types, and 378 mixed types [38]. Seol et al. used CNVnator software v0.4 to identify CNVs in four species (Cornish chickens, White Leghorn chickens, Rhode Island Red chickens, Red Jungle fowl) and screened 3079 CNVRs, of which 2443 were loss types and 636 were gain types [11]. Zhang et al. used PennCNV software to identify CNVs in the 475th generation of broiler lines from Northeast Agricultural University and screened 460 CNVRs, of which 320 were loss types, 93 gain types, and 47 mixed types [9]. Han et al. used aCGH microarray technology to detect CNVs in five breeds of chickens (Xichuan black-bone chickens, Silkie chickens, Lushi chickens, Gushi chickens, and Houdan chickens) and screened 281 CNVRs, among which 181 were loss types, 91 gain types, and 9 mixed types [10].

When comparing these results, it was found that the CNVR replication rate across the studies was not high, which could be due to a number of reasons, such as different methods (aCGH array, SNP array, and next-generation sequencing (NGS)), different species, different algorithmic software, and different quality screening of results. In this paper, the results showed that there were more loss-type CNVs than gain-type and mixed-type CNVs, which is basically consistent with the results of the other studies mentioned above. The reasons for this result may be multifaceted, and there is no specific explanation yet. It is speculated that it may be due to differences in the genetic structure of the species and differences in selective pressure during the breeding process. In addition, in this paper, CNVs were not detected on some chromosomes, which may be due to various reasons, including the quality of the genomic data, the detection methods used, and the structure of the chromosomes. Among these factors, we have already conducted quality checks on the genomic data, and the detection method used in this experiment is widely accepted. Therefore, when considering these factors together, it is possible that the structure of the chromosomes themselves is the reason.
After conducting a CNVR-based GWAS on weight and egg quality traits, it was found that six CNVRs on chromosome 4 were significantly correlated with weight and egg quality traits. Among them, two CNVRs were significantly correlated with egg quality traits. The genes screened were FAM184B, MED28, LAP3, ATOH8, and ST3GAL5. Four CNVRs were significantly correlated with body weight traits, and the genes screened were LOC112532307, LDB2, LOC107053295, LOC121110716, SORCS2, and LOC121110591.
The FAM184B gene is a protein-coding gene that is widely expressed in a variety of tissues, including the brain and skin. In chickens, Zhang et al. found that the FAM184B gene on chromosome 4 was associated with the first spawning weight in Jinghai yellow chickens [39]. Jin et al. found that the FAM184B gene on chromosome 4 was closely related to body weight in Yancheng chickens [40]. In this paper, 30-EW and 40-EW were associated with the FAM184B gene. One of the factors affecting egg weight is the weight of hens at the start of laying after sexual maturity, and the weight at the start of laying directly determines the weight of hens at the peak of laying [41–43]. These results suggest that the FAM184B gene indirectly affects egg weight during the laying period by influencing the body weight of the Wenshui green shell-laying chickens.
The MED28 gene is a protein-coding gene involved in cell proliferation and cycle regulation. In cattle, the MED28 gene was found to be associated with body weight and intramuscular fat content [44–46]. In sheep, the MED28 gene was found to be associated with prenatal and postnatal body weights [47,48]. In pigs, the MED28 gene was found to be associated with muscle development [49]. There are no reports in the literature about the MED28 gene in chickens. In this paper, 30-EW and 40-EW were correlated with the MED28 gene. It is hypothesized that the MED28 gene indirectly influences egg weight during the laying period by influencing the body weight of the Wenshui green shell-laying chickens.
The LAP3 gene encodes an aminopeptidase that catalyzes N-terminal amino acid removal and is implicated in protein maturation and degradation [50]. Liu et al. found that the LAP3 gene on chromosome 4 in Beijing-You chickens is associated with carcass weight and visceral weight [51]. In this paper, 30-EW and 40-EW were correlated with the LAP3 gene. It is hypothesized that the LAP3 gene indirectly influences egg weight during the laying period by influencing the body weight of the Wenshui green shell-laying chickens.
The ATOH8 gene encodes a transcription factor with a bHLH domain that is involved in the development of the nervous system, kidney, pancreas, retina, and muscle [52]. Studies in chickens have shown that the ATOH8 gene is expressed during skeletal myogenesis [53,54]. In mice, it has been found that the ATOH8 gene regulates muscle cell proliferation by modulating myopeptide signaling [53]. The energy metabolism status of muscles is a primary factor influencing the quality of eggs [55]. In this paper, a correlation was found between the 30-EW and the ATOH8 gene. This suggests that the ATOH8 gene indirectly affects the egg weight during the laying period by influencing the growth of skeletal muscles in Wenshui green shell-laying chickens.
The ST3GAL5 gene is a protein-coding gene that plays a crucial role in glycosylation reactions and is involved in biological processes such as immune regulation and neural system development [56,57]. Research conducted in chickens has found that the ST3GAL5 gene exhibited activity toward lactosylceramide. Furthermore, it was observed to be expressed at relatively higher levels in the small intestine, large intestine, and spleen of chickens [58]. The small intestine and large intestine are vital organs for maintaining the digestive, endocrine, metabolic, and immune functions in livestock [59]. In this paper, a correlation was found between the 30-week egg weight (30-EW) and the ST3GAL5 gene. This suggests that the ST3GAL5 gene indirectly affects the egg weight during the laying period in Wenshui green shell-laying chickens by influencing their digestive system.
The LDB2 gene product may bind to a number of transcription factors and is critical for brain development and blood vessel formation [60,61]. In a paper on chickens, Gu et al. discovered a correlation between the LDB2 gene on chromosome 4 and body weight during weeks 7-12 in an F2 population of Silky fowl and White Plymouth Rock chickens [62]. Zhang et al. and Wang et al. also found a correlation between the LDB2 gene on chromosome 4 and body weight in Gushi-Anka F2 chickens and Jinghai Yellow chicken hens [63,64]. Liu et al. found that the LDB2 gene on chromosome 4 in Beijing-You chickens is associated with carcass weight and visceral weight [51]. Dou et al. discovered that the LDB2 gene is an important candidate gene influencing the rapid growth of broiler chickens [65]. In this paper, a correlation was found between the 8-week weight (8-W) and the LDB2 gene. This suggests that the LDB2 gene can influence the body weight of Wenshui green shell-laying chickens.
The SORCS2 gene is a member of the Vps10p protein family, and Vps10p is associated with neurological disorders in mammals. In a paper on chickens, Li et al. explored the genetic mechanisms associated with aggressive behavior in chickens. They found that the SORCS2 gene on chromosome 4 in a Chinese native breed, a dwarf yellow meat-type chicken, may play an important role in regulating dopamine pathways and neurotrophic factors involved in aggressive behavior [66]. Chen et al. found that the SORCS2 gene is a candidate gene for aggressive behavior in Luxi Game fowl [38]. In this paper, a correlation was found between the 8-week weight (8-W) and the SORCS2 gene. This suggests that the SORCS2 gene can influence the body weight of Wenshui green shell-laying chickens.
To our knowledge, this is the first CNV identification in Wenshui green shell-laying chickens. CNVs were identified and compiled into CNVRs, and only autosomal chromosomes were considered. The association between weight and egg quality traits and CNVRs was determined by the GWAS. The interpretation of genes adjacent to CNVRs helps to better understand the weight and egg quality traits of Wenshui green shell-laying chickens. A limitation of this paper is that many types of genetic variation influence animal traits, such as SNPs, structural variation (SV), CNV, and DNA methylation. This study only focused on CNV detection in the Wenshui green shell-laying chicken breed, and further research is needed to investigate other aspects of genetic variation. This would contribute to a more comprehensive understanding of the Wenshui green shell-laying chicken breed.

Conclusions
CNVs were identified for the first time in a population of Wenshui green shell-laying chickens and merged into CNVRs. GWAS analysis based on CNVRs was conducted to investigate their association with weight and egg quality traits. A total of 11,035 CNVRs were identified in the entire population, accounting for approximately 1.4% of the chicken autosomal genome. The paper identified FAM184B, MED28, LAP3, ATOH8, and ST3GAL5 as potential candidate genes influencing 30-EW and 40-EW, while LDB2 and SORCS2 were identified as potential candidate genes influencing 8-W. Therefore, this research reveals the potential impacts of CNVs on weight and egg quality traits in Wenshui green shell-laying chickens, providing new insights for future studies.

Figure 1. Distribution of CNVR types in Wenshui green shell-laying chickens.

Figure 2. The overall CNVR maps for Wenshui green shell-laying chickens in the 38 autosomes. There are three distinct categories of CNVR: loss (grey), gain (red), and mixed (blue). Autosome values are on the Y-axis, and chromosomal location in Mb is on the X-axis.

Figure 4. Manhattan plot and QQ plot for CNVR segments on the 38 autosomal chromosomes associated with 30-EW, 40-EW, 40-EWW, 8-W, and 38-W. In the Manhattan plot, the X-axis represents the autosome and the Y-axis indicates the −log10 (p-value). The lines in the chart indicate FDR-corrected p-values of 0.05. The Manhattan plot shows the CNVRs that are significantly associated with the trait, while the QQ plot shows the significance of the association between CNVRs and the trait.

Author Contributions: Conceptualization, H.T. and C.N.; methodology, D.W.; software, C.N.; validation, H.T., C.N. and D.W.; formal analysis, S.Y.; investigation, D.W.; resources, H.T.; data curation, C.Y.; writing-original draft preparation, S.Y.; writing-review and editing, H.T.; visualization, W.L.; supervision, Q.Z.; project administration, H.T.; funding acquisition, H.T. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Agricultural Breeding Project of Shandong Province (2022LZGCQY016) and the Shandong Modern Agricultural Industry and Technology System (SDAIT-11-02).

Institutional Review Board Statement: All animals were treated in accordance with the guidelines for the care and use of laboratory animals prescribed by Shandong Agricultural University (No. SDAUA-2018-018).

Table 1. Distribution of CNVRs across autosomal chromosomes of the Wenshui green shell-laying chicken's genome.

Table 2 lists the CNVRs significantly associated with weight and egg quality traits.

Table 2. Significant CNVRs associated with weight and egg quality traits in Wenshui green shell-laying chickens. 30-EW: egg weight at 30 weeks of age; 40-EW: egg weight at 40 weeks of age; 40-EWW: egg white weight at 40 weeks of age; 8-W: weight at 8 weeks of age; 38-W: weight at 38 weeks of age. Gain: duplications; loss: deletions.