Genome-Wide Association Study of Egg Production Traits in Shuanglian Chickens Using Whole Genome Sequencing

Egg production is the most important economic trait in laying hens. To identify molecular markers and candidate genes associated with egg production traits, such as age at first egg (AFE), weight at first egg (WFE), egg weight (EW), egg number (EN), and maximum consecutive egg laying days (MCD), a genome-wide analysis by whole genome sequencing was performed in Shuanglian chickens. Through whole genome sequencing and quality control, a total of 11,006,178 SNPs were obtained for further analysis. Heritability estimates ranged from moderate to high for EW (0.897) and MCD (0.632), and from low to moderate (0.193~0.379) for AFE, WFE, and EN. The GWAS results showed 11 genome-wide significant SNPs and 23 suggestive significant SNPs were identified to be associated with EN, MCD, WFE, and EW. Linkage disequilibrium analysis revealed twenty-seven SNPs associated with EN were located in a 0.57 Mb region on GGA10, and clustered into five blocks. Through functional annotation, three candidate genes NEO1, ADPGK, and CYP11A1, were identified to be associated with EN, while the S1PR4, LDB2, and GRM8 genes was linked to MCD, WFE, and EW, respectively. These findings may help us to better understand the molecular mechanisms underlying egg production traits in chickens and contribute to genetic improvement of these traits.


Introduction
Egg production is the most important economic trait in laying hens.In 2022, global chicken egg production was approximately 88.68 million tons, accounting for about 93% of the poultry egg production [1].Egg production traits mainly include the egg number (EN), age at first egg (AFE), weight at first egg (WFE), and egg weight (EW).EN is the most important production performance indicator among these traits, which is directly related to egg production [2,3].Furthermore, improving these egg production traits is the main objective in chicken breeding programs [4].However, since these traits are commonly of low heritability, complex, and controlled by polygenes, the genetic progress obtained through conventional breeding approaches is minor [5].Therefore, it is necessary to identify the key markers and genes associated with these traits for molecular breeding.
QTL mapping and genome-wide association study (GWAS) have been widely employed to detect many molecular markers for complex traits in chickens [6][7][8].To date, a total of 540 QTLs have been reported to be associated with egg number (EN), 56 QTLs with age at first egg (AFE), and 393 QTLs with egg weight (EW) in the Chicken QTL database (https://www.animalgenome.org/cgi-bin/QTLdb/GG/index,accessed on 25 August 2023).However, the resolution of QTL mapping and GWAS is poor due to the wide confidence intervals for position, resulting in few detected causative genes associated with these traits.In addition, studies on candidate genes related to egg production traits are mainly focused on the endocrine hypothalamic-pituitary-gonadal (HPG) axis [9].A number of candidate genes, including GnRH-I, NPY, GNRHR, GnIH, FSHR, PRL, PRLR, ESR, and VIP, have been identified by these association studies [10].In novel genes mining by GWAS, related studies have only reported a few SNPs significantly associated with egg production traits, and most of these SNPs are located on chromosomes 1, 2, 3, 4, 5, 6, 9, and 21 [9,[11][12][13][14].Therefore, to better understand the molecular mechanisms underlying egg production performance, more candidate genes and SNPs need to be identified.
The Shuanglian chicken is a well-known indigenous breed in China, which has characteristics of early sexual maturity, good meat quality, disease resistance, and excellent egg quality.In the present study, GWAS was performed on 163 Shuanglian chickens to identify molecular markers and candidate genes affecting egg production traits by whole genome sequencing.These promising SNPs and genes could be helpful to engineer practical breeding programs for the improvement of egg production performance.

Ethics Statement
All the animal research activities in this study were approved by the Animal Care and Use Committee of the Institute of Animal Science and Veterinary Medicine, Hubei Academy of Agricultural Sciences (HBAAS20220216).

Population and Phenotyping Measurement
The population size of Shuanglian chickens was 163 in this study.All chickens were raised in stair-step individual cages at Jinshui Experimental Base (Wuhan, Hubei province, China) under the regular feeding and management conditions.We recorded the egg production every day.The phenotyping measurement includes six traits: age at first egg (AFE), weight at first egg (WFE), egg weight (EW), egg number at 40 weeks (EN40), egg number at 43 weeks (EN43), and maximum consecutive egg laying days (MCD).The phenotypic correlation between each pair of traits were evaluated using SPSS software 20.A blood sample of each chicken was collected and put into a corning tube with EDTA for preparing the genomic DNA.

Whole Genome Sequencing and Quality Control
The genomic DNA was extracted using a standard technique of digestion with proteinase K followed by phenol-chloroform extraction and ethanol precipitation [15].DNA concentration and purity were measured by NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA, USA).The DNA extract was stored at −80 • C for further analysis.

Estimation of Genetic Parameters
Genetic parameters, including heritability and genetic correlation, were estimated based on average information-restricted maximum likelihood algorithm (AI-REML) in GCTA software 1.24.7 [20].The estimate model was as follows: Genes 2023, 14, 2129 3 of 13 where y is the vector of phenotype; b is the fixed effect; X and Z are the matrices corresponding to b and u; and e is the residual.u indicates the normal distribution ( u ∼ N(0, G jk σ 2 u )) that all the SNPs obey, in which σ 2 u is additive genetic variance, and G jk is constructed by genomic information: where X ij is the copy number of the reference allele at the ith SNP of individual j; X ik is the copy number of the reference allele at the ith SNP of individual k; p i is the minor allele frequency of the i th SNP in the population; and N is the number of SNPs.The heritability of each trait was calculated as . Genetic correlation was calculated as: where r G represents the genetic correlation between trait X and Y; cov G XY indicates the covariance of two traits; and σ 2 G X and σ 2 G Y indicate the variances of two traits, respectively.

Genome-Wide Association Study
The genome-wide association analysis was performed by mixed linear model in the GEMMA software 0.98.3 [21], and the model was as follows.
where y is the vector of phenotype; W is a matrix of covariates (including a column of 1 s and top three principal components of population structure); a is a vector of the corresponding coefficients including the intercept; X is a vector of SNP genotypes; b is the effect size of the SNP; u is a vector of individual random effects; e is a vector of errors.p-values were adjusted by the Bonferroni multiple test method, the threshold p-values of suggestive and genome-wide significance were calculated as 9.09 × 10 −8 (1/11,006,178) and 4.54 × 10 −9 (0.05/11,006,178), respectively.

Variance Explained by the Most Significant SNPs
Based on the most significant SNP associated with each trait, a linear regression model was constructed for analysis of variance: where Y is the phenotype; X is the genotype of the SNP; and b is the effect value of the genotype; a is the intercept which represents the background value of individual phenotype after deducting the genotype effect; and e is the residual.Therefore, there are three variances in this model, and their relationship was as follows: where Var(Y) is the variance of phenotype; Var(bX) is the variance of genetic effect of the SNP; and Var(e) is the variance of residual.The percentage of the phenotype variance explained by the most significant SNP was calculated as follows: Genes 2023, 14, 2129 4 of 13

Linkage Disequilibrium Analysis and Identification of Candidate Genes
Linkage disequilibrium (LD) analysis was performed using LDBlockShow software 1. 40 [22].The candidate genes near the most significant SNPs or in the candidate genomic region were identified by the gene annotation information from Ensembl genome browser 110 (http://asia.ensembl.org/index.html,accessed on 25 August 2023).Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were employed to identify related pathways.KEGG and GO analyses were performed using OmicShare Tools (https://www.omicshare.com,accessed on 25 August 2023).

Phenotype Data, Whole Genome Sequencing Data, and Population Structure
The phenotype data of age at first egg (AFE), weight at first egg (WFE), egg weight (EW), egg number at 40 weeks (EN40), egg number at 43 weeks (EN43), and maximum consecutive egg laying days (MCD) in the Shuanglian chicken population are shown in Table 1.Except for EW, the coefficient of variation (CV) of all traits is above 10%, with MCD having the highest CV (54.3%).Through whole genome sequencing and quality control, a total of 11,006,178 high-quality SNPs were obtained.The SNP density plot is presented in Figure 1.Principal component analysis (PCA) was carried out.And, the scatter plot of the first two principal components is displayed in Figure S1.Genes 2023, 14, x FOR PEER REVIEW 4 of 13

Linkage Disequilibrium Analysis and Identification of Candidate Genes
Linkage disequilibrium (LD) analysis was performed using LDBlockShow software 1. 40 [22].The candidate genes near the most significant SNPs or in the candidate genomic region were identified by the gene annotation information from Ensembl genome browser 110 (http://asia.ensembl.org/index.html,accessed on 25 August 2023).Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were employed to identify related pathways.KEGG and GO analyses were performed using OmicShare Tools (https://www.omicshare.com,accessed on 25 August 2023).

Phenotype Data, Whole Genome Sequencing Data, and Population Structure
The phenotype data of age at first egg (AFE), weight at first egg (WFE), egg weight (EW), egg number at 40 weeks (EN40), egg number at 43 weeks (EN43), and maximum consecutive egg laying days (MCD) in the Shuanglian chicken population are shown in Table 1.Except for EW, the coefficient of variation (CV) of all traits is above 10%, with MCD having the highest CV (54.3%).Through whole genome sequencing and quality control, a total of 11,006,178 high-quality SNPs were obtained.The SNP density plot is presented in Figure 1.Principal component analysis (PCA) was carried out.And, the scatter plot of the first two principal components is displayed in Figure S1.

Estimation of Phenotypic and Genetic Parameters
Heritability, genetic correlation, and phenotypic correlations for AFE, WFE, EW, EN40, EN43, and MCD are shown in Table 2. Heritability of AFE, WFE, EW, EN40, EN43, and MCD was estimated to be within 0.193 and 0.897, with the heritability of egg weight (0.897) being the highest among them.The phenotypic correlation data indicated there was a significant correlation between most traits, AFE had a highly significant positive correlation with EW, and a highly significant negative correlation with EN.Similarly, a strong genetic correlation was observed among EN40, EN43, and MCD, the genetic correlation coefficient was almost 1, while the phenotypic correlation coefficient was also above 0.694.The phenotypic correlation coefficients and genetic correlation coefficients among traits were located in the upper and lower triangle, respectively; the heritability of each trait was located in diagonal of the table.* indicates significant correlation (p < 0.05), ** (p < 0.01) and *** (p < 0.001) indicates extremely significant correlation, respectively.

Genome-Wide Association Analysis
The GWAS results of EN40, EN43, and MCD are shown in Figure 2, and the information of the significant SNPs is illustrated in Table 3.    GWAS results indicated 10 SNPs significantly associated with EN40 and EN43 were detected on chromosome 10 (GGA10), and the significant SNPs detected for these two traits were completely consistent, covering a 0.19 Mb (1.77 Mb−1.96Mb) genomic region.The most significant SNP of EN40 was rs794599852 (G > A), located in the intron of NEO1 gene, and it explained 8.5% of the phenotypic variance.The most significant SNP of EN43 were rs737101872 (A > G), located in the intron of the ENSGALG00010025119 gene, and it explained 7.2% of the phenotypic variance.The most significant SNP of MCD was 28:1696604 (A > G), located in the upstream of the S1PR4 gene on GGA28, and it explained 22.8% of the phenotypic variance.In addition, nineteen suggestive significant SNPs were also found on GGA3, GGA4, and GGA10 to be associated with EN40; seventeen suggestive significant SNPs were also found on GGA1, GGA10, and GGA19 to be associated with EN43; one suggestive significant SNP related to EW and WFE was found on GGA1 and GGA4 (Figure S2, Table 4), respectively.Unfortunately, there was no genome-wide and suggestive significant SNP associated with AFE.

Linkage Disequilibrium Analysis
In order to fine-map the genomic region associated with EN, linkage disequilibrium analysis of the associated SNPs (10 significantly and 17 suggestively) on GGA10 was performed using LDBlockShow software 1.40 in Figure 3.The LD analysis revealed all genome-wide and suggestive significant SNPs associated with EN were located in the 0.57 Mb region, which were in strong LD status and were clustered into five blocks.Gene annotation data showed ten candidate genes (CELF6, BBS4, ADPGK, HEXA, ARIH1, PARP6, NEO1, HCN4, REC114, and NPTN) were identified in the block region.

Enrichment Analysis
To gain insight into the function of these 34 significant SNPs associated with egg production traits, the corresponding candidate genes were used for GO and KEGG enrichment analysis.Most of these genes were significantly enriched in GO terms of biological processes (BP) and participated in cellular process (Figure 4).KEGG analysis revealed a total of seven significantly enriched pathways, mainly including the ubiquinone and other terpenoid-quinone biosynthesis, glycosphingolipid biosynthesis, and steroid hormone biosynthesis (Figure 5).More details of the GO and KEGG enrichment can be found in Tables S1 and S2.
nome-wide and suggestive significant SNPs associated with EN were located in the 0.57 Mb region, which were in strong LD status and were clustered into five blocks.Gene annotation data showed ten candidate genes (CELF6, BBS4, ADPGK, HEXA, ARIH1, PARP6, NEO1, HCN4, REC114, and NPTN) were identified in the block region.

Enrichment Analysis
To gain insight into the function of these 34 significant SNPs associated with egg production traits, the corresponding candidate genes were used for GO and KEGG enrichment analysis.Most of these genes were significantly enriched in GO terms of biological processes (BP) and participated in cellular process (Figure 4).KEGG analysis revealed a total of seven significantly enriched pathways, mainly including the ubiquinone and other terpenoid-quinone biosynthesis, glycosphingolipid biosynthesis, and steroid hormone biosynthesis (Figure 5).More details of the GO and KEGG enrichment can be found in Tables S1 and S2.

Discussion
Egg production is not only an important economic trait, but also a complex quantita tive trait, which is influenced by both genetics and environment, with genetics being th

Discussion
Egg production is not only an important economic trait, but also a complex quantitative trait, which is influenced by both genetics and environment, with genetics being the determining factor [9].In this study, we present a GWAS to identify the candidate genes associated with egg production traits in a Shuanglian chicken population from China.In general, the important conditions for GWAS to achieve better results mainly include the number of markers, population structure, and relationships between individuals [23].With the continuous development of high-throughput sequencing technology, the application of whole genome resequencing is becoming increasingly widespread.In this study, whole genome sequencing was used to obtain more genetic variation information.A total of 11,006,178 high-quality SNPs were finally obtained through quality control, which were evenly distributed on each chromosome.Based on high-quality SNPs, we analyzed the population structure and the relationship between individuals.The analysis of population structure indicated the population roughly fell into three subgroups.Therefore, in the construction of the GWAS model, the first three principal components from PCA were used as covariates, and the genetic relationship between individuals was also taken into consideration.
The coefficient of variation is an important indicator for measuring the uniformity of traits.In this study, the CV of AFE, WFE, MCD, EN40, and EN43 were all above 10%; the CV of MCD was 54.3%, indicating the significant phenotypic variations in these traits.In general, heritability is considered low if the value varies from 0 to 0.20, moderate at 0.20 to 0.40, and high at >0.40.Previous studies have shown the heritability of egg production traits was mostly between 0.160 and 0.640 [24,25], with the heritability of EN ranging from 0.160 to 0.250, indicating a low to moderate level of heritability.In this study, apart from EW (0.897), the heritability of AFE, WFE, MCD, EN40, and EN43 was within 0.193 and 0.632, indicating our estimation results of trait heritability were comparable to previous research.The high heritability of EW in this study indicated the genetic factor accounted for a large proportion of the phenotypic variation, which was conducive to revealing genetic characteristics of this trait in Shuanglian chickens.
The genetic correlation coefficient between AFE and EW, EN40, EN43, MCD were 0.678, −0.764, −0.789, and −0.442, respectively, suggesting a strong correlation among these traits.Moreover, the results of phenotypic correlation were consistent with those of genetic correlation, and there was a significant correlation between these traits, which indicated AFE could be used as a key indicator for predicting the EN.MCD can reflect the continuous production performance of the chicken population [26]; the genetic correlation coefficient between EN and MCD was almost 1 in this study, suggesting the higher the number of eggs produced by the population, the better the continuous production performance.Therefore, the AFE and MCD could be considered important indicators in the subsequent breeding of Shuanglian chickens, while balancing the WFE for joint breeding to further improve its production performance.
GWAS results indicated all the significant SNPs associated with EN40 and EN43 were located on chromosome 10, and these SNPs were all located in regulatory regions such as intron and intergenic regions, indicating regulatory mutation might be the determinants of phenotypic differences.In addition, the proportion of the phenotypic variance explained by the most significant SNP of EN40, EN43, and MCD was 8.5%, 7.2%, and 22.8%, respectively, implying our method for the identification of main candidate genes associated with these traits was powerful and effective.After LD analysis, all the genome-wide and suggestive significant SNPs associated with EN located in the 0.57 Mb region, which were in strong LD status and were clustered into five blocks.These blocks contained a total of 10 candidate genes: CELF6, BBS4, ADPGK, HEXA, ARIH1, PARP6, NEO1, HCN4, REC114, and NPTN.Through functional annotation, two candidate genes NEO1 and ADPGK were reported to be associated with reproductive performance.It was found the NEO1 gene was closely related to the fertility of humans and mice, and this gene was highly expressed in both ovaries and uterus [27,28].The amino acid homology encoded by this gene in mice and chickens is as high as 94%, and the expression pattern of chicken NEO1 is similar to that of mouse [29].ADPGK plays an important role instead of hexokinase in regulating energy metabolism [30].It has also been confirmed to be related to the reproductive performance [31].In addition, we also annotated the upstream and downstream 1 Mb regions of all genome-wide and suggestive significant SNPs and identified an important candidate gene CYP11A1 related to egg production performance.The CYP11A1 gene is the first hydrolytic enzyme that catalyzes the production of steroid hormones and is also the rate-limiting enzyme of this process.It plays an important role in the reproductive physiology of animals regulated by hormones [32].According to previous studies, the LDB2 gene could regulate cell migration, and a 31-bp Indel within LDB2 gene was significantly associated with growth traits in the F2 resource chicken population and affected the expression of LDB2 in muscle tissue [33][34][35].Furthermore, in studies with Lingxian white goose and goats, the LDB2 gene was also confirmed to be associated with growth and weight traits [36][37][38].The above studies indicate the LDB2 gene is closely related to the weight traits of poultry and can be an important candidate gene for WFE of Shuanglian chickens in our study.For the EW trait, the GRM8 gene encodes a G protein-coupled receptor, involved in glutamatergic neurotransmission in the central nervous system, and it plays a role in intracellular signal transduction and cyclic adenylate signal cascade inhibition [39].Previous studies found the GRM8 gene could participate in the generation of GABA, an inhibitory neurotransmitter in the brain and associated with egg weight [40,41].In addition, the S1PR4 gene is a signaling lipid.It could regulate cell fate and participate in STAT3, MAPK, and Akt pathways [42], which were involved in regulating the cell cycle, proliferation and apoptosis of chicken follicular granulosa cells [43].Perhaps, the S1PR4 and GRM8 genes may be potential candidate genes for MCD and EW traits, respectively.

Conclusions
In conclusion, the GWAS performed by whole genome sequencing in this study demonstrated 34 significant SNPs were identified to be associated with EN, MCD, WFE, and EW.Moreover, three genes NEO1, ADPGK, and CYP11A1 were identified as candidates associated with EN, while the S1PR4, LDB2, and GRM8 genes were linked to MCD, WFE, and EW, respectively.Our findings may help us to better understand the molecular mechanisms underlying egg production traits in chickens and contribute to genetic improvement of these traits.

Figure 1 .
Figure 1.The density plot of SNPs.

Figure 1 .
Figure 1.The density plot of SNPs.

Figure 2 .
Figure 2. Manhattan plots and quantile-quantile (Q-Q) plots of egg number at 40 weeks (EN40), egg number at 43 weeks (EN43), and maximum consecutive egg laying days (MCD).The solid and dashed line indicates the genome-wide and suggestive significance threshold, respectively.Red points indicate the genome-wide significant SNPs, green points indicate the suggestive significant SNPs.

Figure 2 .
Figure 2. Manhattan plots and quantile-quantile (Q-Q) plots of egg number at 40 weeks (EN40), egg number at 43 weeks (EN43), and maximum consecutive egg laying days (MCD).The solid and dashed line indicates the genome-wide and suggestive significance threshold, respectively.Red points indicate the genome-wide significant SNPs, green points indicate the suggestive significant SNPs.

Figure 3 .
Figure 3. Linkage disequilibrium analysis of genomic regions associated with EN.

Figure 3 .
Figure 3. Linkage disequilibrium analysis of genomic regions associated with EN.Genes 2023, 14, x FOR PEER REVIEW 9 of 13

Figure 4 .
Figure 4. GO annotation analysis for candidate genes.Figure 4. GO annotation analysis for candidate genes.

Figure 4 .
Figure 4. GO annotation analysis for candidate genes.Figure 4. GO annotation analysis for candidate genes.

Figure 4 .
Figure 4. GO annotation analysis for candidate genes.

Table 3 .
Information of the genome-wide significant SNPs associated with EN40, EN43, and MCD.

Table 3 .
Information of the genome-wide significant SNPs associated with EN40, EN43, and MCD.

Table 4 .
Information of the suggestive significant SNPs associated with EN, WFE, and EW.
1 Pos (bp) indicates SNP position derived from the GRCg7b reference (Ensembl release 110).2MAFindicates the minor allele frequency of first listed marker.