Genome-Wide Association Study on Reproductive Traits Using Imputation-Based Whole-Genome Sequence Data in Yorkshire Pigs

Reproductive traits have a key impact on production efficiency in the pig industry. It is necessary to identify the genetic structure of potential genes that influence reproductive traits. In this study, a genome-wide association study (GWAS) based on chip and imputed data of five reproductive traits, namely, total number born (TNB), number born alive (NBA), litter birth weight (LBW), gestation length (GL), and number of weaned (NW), was performed in Yorkshire pigs. In total, 272 of 2844 pigs with reproductive records were genotyped using KPS Porcine Breeding SNP Chips, and then chip data were imputed to sequencing data using two online software programs: the Pig Haplotype Reference Panel (PHARP v2) and Swine Imputation Server (SWIM 1.0). After quality control, we performed GWAS based on chip data and the two different imputation databases by using fixed and random model circulating probability unification (FarmCPU) models. We discovered 71 genome-wide significant SNPs and 25 potential candidate genes (e.g., SMAD4, RPS6KA2, CAMK2A, NDST1, and ADCY5). Functional enrichment analysis revealed that these genes are mainly enriched in the calcium signaling pathway, ovarian steroidogenesis, and GnRH signaling pathways. In conclusion, our results help to clarify the genetic basis of porcine reproductive traits and provide molecular markers for genomic selection in pig breeding.


Introduction
The reproductive performance of pigs plays a key role in the pig industry. Improving the reproductive performance of sows can lead to higher economic benefits for pig farms. However, reproductive traits have low heritability, and their genetic architecture is complex [1]. Therefore, it is difficult to improve these traits rapidly using traditional breeding methods. With the development of molecular breeding technology, marker-assisted selection (MAS) and genomic selection (GS) have become effective ways to improve pig breeding efficiency [2].
In recent years, genome-wide association studies (GWASs) have been widely applied to screen the genome for trait-associated variants and to find quantitative trait loci (QTL) for economic traits [3]. Thus far, 35,384 QTLs have been identified in pigs according to pigQTLdb, of which 3315 QTLs are associated with reproduction (https://www.animalgenome.org/cgi-bin/QTLdb/SS/summary, accessed on 25 April 2022). In pigs, GWAS has identified numerous SNPs significantly associated with growth traits [4,5], meat quality [6,7], feed efficiency [8,9], semen traits [10,11], coat color [12,13], genetic defects [14,15], disease susceptibility [16,17], and microbial phenotypes [18]. However, most of these studies genotyped animals with SNP microarrays, and the density of markers is a key factor affecting GWAS efficiency [19]. With the development of sequencing technology and its increasingly low cost, many researchers have used sequencing or resequencing to perform relevant studies [20–22]. However, the sequencing or resequencing of large population samples is still too costly to be an efficient strategy. Genotype imputation is an effective strategy in GWAS [23] and has been widely used in human genetics research, for example in the HapMap [24] and 1000 Genomes [25] projects. It can increase the total number and density of SNPs used for association analysis and provide the opportunity to discover new potential genes.
In our study, we performed GWAS using two different genotype imputation databases and identified genetic variants related to five reproductive traits in Large White (Yorkshire) pigs.

Ethics Statement
All ear tissue sample collection procedures were approved by the Institutional Animal Care and Use Committee of the Northwest A & F University (approval number: NWAFU-314021167).

Animals and Phenotypes
The pig population was uniformly reared at the core breeding farm of Zhumei Group Limited (Zhumadian City, China). Briefly, we collected breeding information and pedigree records of Large White pigs from 2011 to 2019 at this farm. There were 3733 pigs with complete pedigrees, and pedigrees could be traced back three generations. A total of 10,206 reproduction records of 2844 pigs were collected. The phenotype records included parity (eight levels: 1 to 7, and 8 or higher), herd-year-season, and five reproductive traits. The five reproductive phenotypes, namely, total number born (TNB), number born alive (NBA), litter birth weight (LBW), gestation length (GL), and number of weaned (NW), were chosen for subsequent analysis. Table 1 presents the descriptive statistics of the five traits. Apart from GL, the other four traits had coefficients of variation above 25%.

Estimation of Genetic Parameters and Genetic Correlation
The variance and covariance components and genetic correlations of the five traits were calculated using a repeatability model in DMU v6.0 software [29].
The animal model was as follows:
$$y = Xb + Z_a a + Z_{pe}\,pe + e \tag{1}$$
In the model, $y$ is a vector of phenotype records; $b$ is a vector of fixed effects (herd-year-season and parity with eight levels); $X$ is a design matrix relating $b$ to $y$; $a$ is a vector of additive genetic effects; $pe$ is a vector of random permanent environmental effects; and $e$ is a vector of random residual effects. $Z_a$ and $Z_{pe}$ are the corresponding incidence matrices.
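As an illustration of how variance components from a repeatability model of this form are commonly converted into heritability and repeatability, the sketch below applies the standard ratios: heritability is the additive variance over the total phenotypic variance, and repeatability adds the permanent environmental variance to the numerator. This is not the DMU workflow used in the study; the function name and the input values are hypothetical.

```python
def repeatability_ratios(var_a, var_pe, var_e):
    """Return (heritability, repeatability) from variance components.

    var_a  : additive genetic variance
    var_pe : permanent environmental variance
    var_e  : residual variance
    """
    if min(var_a, var_pe, var_e) < 0:
        raise ValueError("variance components must be non-negative")
    var_p = var_a + var_pe + var_e  # total phenotypic variance
    if var_p <= 0:
        raise ValueError("total phenotypic variance must be positive")
    h2 = var_a / var_p               # heritability
    rep = (var_a + var_pe) / var_p   # repeatability
    return h2, rep


# Hypothetical variance components, chosen only to illustrate the arithmetic.
h2, rep = repeatability_ratios(var_a=0.5, var_pe=0.7, var_e=8.8)
print(f"h2 = {h2:.3f}, repeatability = {rep:.3f}")
```

Repeatability can never fall below heritability under this definition, which gives a quick sanity check on estimated components.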
The genetic correlation was calculated as follows:
$$r_{12} = \frac{\mathrm{cov}(a_1, a_2)}{\sigma_{a_1}\,\sigma_{a_2}}$$
where $r_{12}$ is the genetic correlation between trait 1 and trait 2, $a_1$ and $a_2$ represent the additive genetic values of trait 1 and trait 2 for the same individuals, $\mathrm{cov}(a_1, a_2)$ is the genetic covariance of the two traits, and $\sigma_{a_1}$ and $\sigma_{a_2}$ are the genetic standard deviations of trait 1 and trait 2, respectively.
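The genetic correlation defined above, the additive genetic covariance of two traits divided by the product of their additive genetic standard deviations, can be sketched in a few lines. The function name and the example values are hypothetical, and the inputs are assumed to be (co)variance estimates already obtained from a bivariate analysis.

```python
import math


def genetic_correlation(cov_a12, var_a1, var_a2):
    """Genetic correlation from additive genetic (co)variance estimates.

    cov_a12 : additive genetic covariance between trait 1 and trait 2
    var_a1  : additive genetic variance of trait 1
    var_a2  : additive genetic variance of trait 2
    """
    if var_a1 <= 0 or var_a2 <= 0:
        raise ValueError("additive genetic variances must be positive")
    return cov_a12 / (math.sqrt(var_a1) * math.sqrt(var_a2))


# Hypothetical estimates, chosen only to illustrate the arithmetic.
r12 = genetic_correlation(cov_a12=2.0, var_a1=4.0, var_a2=4.0)
print(f"r12 = {r12:.2f}")
```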

Genome-Wide Association Study (GWAS)
To perform GWAS, we used the sum of an individual's estimated breeding value (EBV) and residual as the adjusted phenotype in this study. We used fixed and random model circulating probability unification (FarmCPU) models for GWAS in GAPIT3 [30]. This method iteratively takes advantage of the mixed linear model (MLM) as the random model and stepwise regression as the fixed model [31]. Candidate SNPs were identified using Bonferroni-corrected thresholds: p < 0.05/N was taken as the genome-wide significance threshold and p < 1/N as the genome-wide suggestive significance threshold, where N is the number of SNPs tested. Manhattan and Q-Q plots were generated using the R CMplot package version 4.2.0 [32].
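The two thresholds can be computed directly from the number of tested SNPs. The sketch below is a minimal illustration that assumes N is the count of SNPs remaining after quality control; the function names and the marker count of 50,000 are hypothetical and are not taken from the study.

```python
def bonferroni_thresholds(n_snps):
    """Return the suggestive (1/N) and significant (0.05/N) p-value thresholds."""
    if n_snps <= 0:
        raise ValueError("n_snps must be a positive integer")
    return {"suggestive": 1.0 / n_snps, "significant": 0.05 / n_snps}


def classify_snp(p_value, n_snps):
    """Label a SNP by comparing its p-value against both thresholds."""
    thresholds = bonferroni_thresholds(n_snps)
    if p_value < thresholds["significant"]:
        return "significant"
    if p_value < thresholds["suggestive"]:
        return "suggestive"
    return "not significant"


# Hypothetical marker count, used only to illustrate the arithmetic.
n_markers = 50_000
thresholds = bonferroni_thresholds(n_markers)
print(thresholds)
print(classify_snp(5e-7, n_markers))
```

Because both thresholds scale with 1/N, imputed data with many more markers face a stricter cutoff than chip data for the same trait.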

Candidate Gene Search
We used BedTools [33] to search for candidate genes in the regions 0.5 Mb upstream and downstream of the significant SNPs based on the pig reference genome (http://useast.ensembl.org/Sus_scrofa/Info/Index/, accessed on 16 December 2022, Sscrofa11.1). Additionally, to better understand the biological processes and pathways of these candidate genes, we also performed enrichment analyses. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms were enriched via KOBAS-i [34].
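The window search itself is an interval-overlap test. The sketch below reimplements that logic in plain Python for illustration only; it is not the BedTools command used in the study, and the `Gene` record, the function name, and the gene coordinates are hypothetical placeholders rather than Sscrofa11.1 annotations.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # gene start position in bp
    end: int    # gene end position in bp


def genes_near_snp(chrom, pos, genes, flank=500_000):
    """Return names of genes overlapping the window [pos - flank, pos + flank]."""
    window_start = max(0, pos - flank)
    window_end = pos + flank
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= window_end and g.end >= window_start
    ]


# Hypothetical annotation records, used only to exercise the function.
annotation = [
    Gene("GENE_A", "1", 1_000_000, 1_050_000),  # inside the window
    Gene("GENE_B", "1", 2_000_000, 2_020_000),  # beyond the window
    Gene("GENE_C", "2", 1_000_000, 1_010_000),  # different chromosome
    Gene("GENE_D", "1", 850_000, 905_000),      # partially overlaps the window
]
hits = genes_near_snp("1", 1_400_000, annotation)
print(hits)
```

A gene is reported when any part of it falls inside the window, so genes that straddle the window boundary are kept.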

Genetic Parameters and Genetic Correlations of Reproductive Traits
The genetic parameters of the five reproductive traits are presented in Table 2. The heritability estimates of TNB, NBA, LBW, GL, and NW were 0.0442 ± 0.0011, 0.0442 ± 0.0012, 0.0476 ± 0.0025, 0.1571 ± 0.0009, and 0.0727 ± 0.0021, respectively. As can be seen, these traits are all low-heritability traits. Table 3 shows the genetic correlations of the five reproductive traits. The genetic correlations ranged from −0.235 to 0.985, with standard errors ranging from 0.001 to 0.015. Among the five reproductive traits, TNB, NBA, LBW, and NW show strong positive correlations, with correlations ranging from 0.751 to 0.985. In contrast, GL shows some negative correlations with the remaining four traits, but the correlations are not strong.

Bioinformatics Annotation Analysis
In this research, candidate functional genes were sought within 0.5 Mb upstream and downstream of the suggestive SNPs identified by GWAS based on chip data and the two imputed datasets. The genes associated with TNB are linked to glycolysis/gluconeogenesis, the TGF-β signaling pathway, the oxytocin signaling pathway, and oocyte maturation processes. For NBA and LBW, the same genes, PDGFRB, CAMK2A, and MMP2, are identified; they are mainly enriched in the calcium signaling pathway, the GnRH signaling pathway, and the embryonic organ development process. Finally, the functional genes related to GL are enriched in the mTOR signaling pathway, ovarian steroidogenesis, the prolactin signaling pathway, embryo development, and regulation of the G protein-coupled receptor signaling pathway (Table 7).

Discussion
Reproductive traits such as TNB, NBA, LBW, GL, and NW are closely related to sow fertility and are important quantitative indicators of pig production. However, most of them have low heritability due to the complexity of the genetic structure. Therefore, it is important to clarify the genetic relationships between reproductive traits and to identify key candidate genes. In this study, a repeatability model was used to estimate the heritability of reproductive traits. The heritability estimates of the TNB, NBA, LBW, GL, and NW traits were 0.0442, 0.0442, 0.0476, 0.1571, and 0.0727, respectively. This is in agreement with the results of previous studies [35–37]. We also calculated genetic correlations between individual traits and found strong positive correlations between TNB, NBA, LBW, and NW, with correlation coefficients ranging from 0.751 to 0.985, in agreement with previous reports [38,39]. This suggests that selection could focus on a subset of these traits to simplify breeding work.
Genotype imputation has been widely used in recent years with the development of sequencing technologies, price reductions, and the demand for high-density markers. This approach allows low-density chip data to be imputed to whole-genome sequence (WGS) data, and the imputation accuracy is affected by the density of the target SNPs, the size of the reference population, the genetic distance between the target and imputation reference population, and the imputation procedure [40]. In our study, we imputed chip data using two publicly available online imputation platforms. PHARP v2 provides genotype imputation using Minimac4, and the reference panel includes 4096 haplotypes, 53 million autosome SNPs, and 122 pig breeds [27]. The reference panel of SWIM 1.0 has a total of 2259 pigs, representing 44 different breeds. Based on the imputed data of the two imputation platforms mentioned above, combined with chip data, we performed GWAS for five reproductive traits.
In our GWAS for the five reproductive traits, imputed data from the SWIM platform yielded more significant or suggestive loci than imputed data from the PHARP platform. This may be because the SWIM platform has a larger number of pigs in its reference panel. In addition, an imputation strategy can improve on previous SNP chip-based studies without the need for additional data and expense. Furthermore, a common set of SNPs can be obtained with an imputation approach, thus making a meta-analysis possible.
Some studies have shown that the FarmCPU model can be effective in GWASs for identifying loci for low-heritability traits [35]. We therefore performed GWAS using the FarmCPU model, which divides the MLM into two parts and uses them iteratively [31]. For the TNB trait, a total of 19 suggestive candidate genes were identified based on chip data and imputed data. Among them, the RPS6KA2 gene plays a major role in the EGF signaling cascade at ovulation, which is also correlated with oocyte developmental quality [41]. As a transcription factor, SMAD4 plays an important role in the porcine reproductive system. It has been shown that miR-143 [42], miR-26b [43], and miR-10b [44] can inhibit apoptosis and promote E2 release via SMAD4 in porcine granulosa cells. For both the NBA and LBW traits, GWAS based on imputed data identified the CAMK2A, NDST1, and RPS14 genes. In a meta-analysis of reproductive traits in heifers, the CAMK2A gene was identified as being involved in calcium signaling mechanisms and acting on pituitary gonadotropin secretion [45]. This is consistent with our findings. In addition, NDST1 has been shown to be critical for many organogenesis processes, and the targeted disruption of the NDST1 gene impaired heart development in mice [46]. NDST1$^{f/f}$/2$^{null}$/3$^{null}$ mice showed defective decidualization, resulting in female infertility [47]. It has been reported that RPS14 is a key gene in early embryonic development [48]. Embryonic stem cells heterozygous for the RPS14 gene showed defects in embryoid body differentiation [49]. For the GL trait, GWASs based on both chip data and imputed data identified genome-wide significant SNPs. Based on KEGG and GO analyses, we annotated a total of 13 candidate genes, mainly related to the ovarian steroidogenesis pathway and embryo development process. Among these, ADCY5 was identified as being associated with seasonal estrus in Sunite sheep [50], egg production in white Muscovy ducks [51], and fertility in cows [52], while in human GWAS, ADCY5 was found to be associated with gestational duration [53]. Interestingly, it has been reported that ADCY5 is associated with fetal growth and birth weight [54]. However, the ADCY5 gene has not been studied in pig reproduction, and we speculate that it may be a key gene influencing reproductive performance in pigs. Unfortunately, no potential SNPs were identified for the NW trait, probably due to the small size of the population and the high number of missing phenotypic data points. Overall, our results identify a number of new key candidate genes and loci associated with reproductive traits in Large White pigs, but further studies are needed to confirm the functions of these genes.

Conclusions
In this study, the genetic parameters of TNB, NBA, LBW, GL, and NW in Yorkshire pigs were estimated using a repeatability model. All five traits had low heritability. There were strong positive correlations between TNB, NBA, LBW, and NW, whereas GL was weakly negatively correlated with them. GWASs based on chip data and imputed data were performed for the five reproductive traits. Finally, combining the results of GWAS and bioinformatics annotation analysis, SMAD4, RPS6KA2, CAMK2A, NDST1, and ADCY5 were identified as novel genes, and some of them have not been studied in livestock, so they may be key candidate genes affecting reproductive traits in pigs. The results of this study highlight some new major genes regulating reproductive traits in pigs and can benefit genomic selection in pig breeding.