Genome-Wide Association Study for Reproductive Traits in a Duroc Pig Population

Simple Summary Reproductive traits are economically important in the pig industry, and it is critical to explore their underlying genetic architecture. Hence, four reproductive traits, including litter size at birth (LSB), litter weight at birth (LWB), litter size at weaning (LSW), and litter weight at weaning (LWW), were examined. Through a genome-wide association study in a Duroc pig herd, several candidate single-nucleotide polymorphisms (SNPs) and genes were found potentially associated with the traits of interest. These findings help to understand the genetic basis of porcine reproductive traits and could be applied in pig breeding programs. Abstract In the pig industry, reproductive traits constantly influence the production efficiency. To identify markers and candidate genes underlying porcine reproductive traits, a genome-wide association study (GWAS) was performed in a Duroc pig population. In total, 1067 pigs were genotyped using single-nucleotide polymorphism (SNP) chips, and four reproductive traits, including litter size at birth (LSB), litter weight at birth (LWB), litter size at weaning (LSW), and litter weight at weaning (LWW), were examined. The results showed that 20 potential SNPs reached the level of suggestive significance and were associated with these traits of interest. Several important candidate genes, including TXN2, KCNA1, ENSSSCG00000003546, ZDHHC18, MAP2K6, BICC1, FAM135B, EPHB2, SEMA4D, ST3GAL1, KCTD3, FAM110A, TMEM132D, TBX3, and FAM110A, were identified and might compose the underlying genetic architecture of porcine reproductive traits. These findings help to understand the genetic basis of porcine reproductive traits and provide important information for molecular breeding in pigs.


Introduction
Animal reproductive traits are economically important but are mostly sex-specific (such as sperm quality in males and fertility in females), and most of them are complex and present low heritability. Hence, the genetic improvement on these traits is especially difficult compared to other complex traits in livestock breeding practices. In the pig industry, litter traits are extremely important economic traits for pig production, as they are directly related to production efficiency. Improving the litter size is the main breeding goal and has been intensively selected in breeding programs for decades in many well-organized breeding systems, such as at the Canadian Center for Swine Improvement (http://www.ccsi.ca/). Though genetic gain for these traits has been obtained with traditional breeding strategies, the slow genetic improvement has increased the need for molecular breeding, such as genomic selection [1].
The fast development of molecular quantitative genetics methods and high-throughput genotyping techniques has increased the feasibility of genetic improvements of reproductive traits via marker-assisted selection or genomic selection. In genomic selection, it has been proved that the genetic gain could be achieved by incorporating prior information such as candidate genes or quantitative trait loci (QTL) affecting the traits under consideration [2]. Hence, for a better genetic dissertation and breeding practice, it is critical to explore the underlying genetic architecture of reproductive traits. A powerful way consists in testing the association between reproductive records and genetic markers covering the whole genome via genome-wide association studies (GWAS) [3].
In the past decade, GWAS was widely used to dissect the genetic architecture of growth [4,5], reproduction [6,7], and meat quality [8,9] traits in a variety of pig populations. These GWAS and former QTL mapping studies together identified 28,720 QTLs in total, of which 2129 QTLs are associated with reproductive traits [10]. However, a limited number of genes were reported for each reproductive trait, which can explain only a small proportion of genetic variance [11]. More QTLs or genes underlying reproductive traits are yet to be uncovered.
The main objective of this study was to perform a GWAS to identify potentially important single-nucleotide polymorphism (SNPs) or QTL regions associated with four reproductive traits, including litter size at birth (LSB), litter weight at birth (LWB), litter size at weaning (LSW), and litter weight at weaning (LWW) in a Duroc pig population. Subsequently, the potential function of significant chromosomal regions was analyzed in detail.

Ethics Statement
Animal care and experiments were conducted according to the Regulations for the Administration of Affairs Concerning Experimental Animals (Ministry of Science and Technology, China, revised in June 2004) and were approved by the Animal Care and Use Committee of the South China Agricultural University, Guangzhou, Guangdong, China (permit number: SCAU#2013-10).

Population and Phenotyping
The population used in the present study was normally maintained in a breeding herd in Fujian, China. Four reproductive traits, including LSB, LWB, LSW, and LWW, were recorded for all sows in this herd. LSB and LWB were measured within 24 hours after delivery, and LSW and LWW were recorded 24 days (the day of weaning) after delivery. At present, information for 4539 Duroc pigs with 15,662 farrowing records has been collected for a period of 10 years (2008 to 2017). A multi-traits animal model was used to estimate covariance components for calculating heritability and genetic correlation. The models used were as follows: where Y is the vector of phenotypic records, b is the vector of fixed effects including parity and year-season, a is the vector of additive genetic, pe is the vector of permanent environmental effects, e is a vector of residuals, and X, Z, and W are incidence matrices for b, a, and pe. Estimated breeding values (EBVs) of all pigs and the reliabilities of EBVs were imputed using animal model best linear unbiased prediction [12] and obtained from the in-farm genetic evaluation software Herdsman swine management platform (S & S Programming, Lafayette, IN, USA). As EBVs include pedigree information, which could be significantly associated with the examined traits rather than with the phenotype, de-regressed EBV (DEBV) were calculated for each pig to remove the contribution of information from relatives. The equation for DEBV [13] is as follows: where DEBV is de-regressed EBV, PA represents parental average, EBV and REL are the estimated breeding value and reliability of each pig.

Genomic DNA Extraction and Genotyping
In the present study, a total of 1067 Duroc pigs (81 boars and 986 sows) were genotyped for further GWAS analysis. Genomic DNA was extracted from pig ear tissue using the TaKaRa

Data Quality Control
For phenotypes, descriptive statistical analyses were performed in R [14] to check the data quality.
For genotypes, quality control on genotypes was performed using the PLINK software [15]. In the present study, the SNPs common between the Illumina PorcineSNP60 BeadChip and the GeneSeek GGP-Porcine chip were retained, and these SNPs were filtered according to the following criteria: (1) call rate <90%; (2) minor allele frequency (MAF) <1%; and (3) Hardy-Weinberg equilibrium (HWE) testing with a p-value < 1.00e-6. Following the quality control, 32,147 SNPs were retained for further analysis. In order to check whether population stratification existed in the Duroc pig herd, the genomic kinship between all pairs of individuals was calculated using SNPs ion autosomes.

Statistical Analysis
The association between each SNP marker and the phenotypes was tested with a single-marker regression mixed linear model by GEMMA software [16]. The statistical model is as follow: where y is a vector of dependent variable (DEBVs in this study), µ is the overall mean, G is the realized relationship matrix constructed with markers, σ 2 α is the additive genetic variance, Z represents incidence matrices corresponding to u, e is the vector of residual errors, and σ 2 e is the residual variance. To confirm the thresholds for the genome-wide significance and suggestive significance, effectively independent tests based on the independent markers and linkage disequilibrium (LD) block (defined as a set of SNPs with pairwise r square values >0.40) were calculated as in [17]. A total of 9266 effectively independent tests was suggested in the present study. Therefore, the genome-wide significance threshold was 0.05/9266 = 5.40 × 10 −6 , and the genome-wide suggestive significance threshold was 1/9266 = 1.08 × 10 −4 .

Identification of Candidate Genes
Candidate genes were identified according to their physical positions and functions based on the Sus scrofa 10.2 reference genome assembly. The SNP-containing or nearest annotated genes for each potential SNP were obtained from the Ensembl release 89 (http://may2017.archive.ensembl.org/index. html) and taken as candidate genes.

Functional Enrichment Analysis
Genes that were less than 1 Mb away from the potential SNPs were selected and identified as the functional genes. Further Gene Ontology (GO) terms annotation was conducted using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) Version 6.8 [18], then the biological process GO terms were selected with the Benjamini-Hochberg method, adjusted p-value <0.05.

Description of Phenotypes and Genotypes
Descriptive statistics of DEBVs for the traits of LSB, LWB, LWW, and LSW analyzed in this study are presented in Table 1. The genetic correlations for all pairs of traits are given in Table 2. In the present study, the heritability of LSB, LWB, LWW, and LSW was 0.158, 0.161, 0.173, and 0.140, respectively. The traits of LSB and LWB were strongly and positively correlated, and the correlation between LSW and LWW showed the same pattern.
There were 38,544 SNPs before frequency and genotyping pruning. Through quality control, 130 and 6267 SNPs were excluded from our dataset due to HWE testing and MAF, respectively. Finally, 32,147 SNPs from 1067 animals were retained for further analysis.

Genome-Wide Association Results
In total, 20 SNPs that reached the suggestive significance level were found to be associated with one of the tested reproductive traits (Table 3) and were defined as the potential SNPs for each trait. Manhattan plots were used to visualize the association results of the four traits ( Figure 1). The most significant SNPs associated with LSB were rs80979042 and rs80825112, located in the intron of BICC1 gene on chromosome 14. Additionally, the other five potential SNPs were located on chromosome 5, 6, and 12. For LWB, two SNPs located between 5.66 Mb and 5.68 Mb on chromosome 4, two SNPs located between 68.26 Mb and 74.78 Mb on chromosome 6, and a SNPs located on chromosome 14 at 1.18 Mb reached suggestive significance. For both LWW and LSW, SNP rs328230332 located in the intron of FAM110A on chromosome 17 was simultaneously associated with these two traits. Additionally, there were other four and two potential SNPs associated with LWW and LSW, respectively. By extending 1 Mb downstream and upstream of the potential SNPs, 146, 62, 58, and 63 functional genes were identified for LSB, LWB, LSW, and LWW, respectively. For LSB, the functional genes were enriched in "GO: 0061436, establishment of skin barrier", "GO: 0045606, positive regulation of epidermal cell differentiation", and some other epidermal growth-associated GO terms. For LWB, the functional genes were involved in organismal defense-and immunity-associated GO terms, including "GO: 0042742, defense response to bacterium" and "GO: 0045087, innate immune response". Furthermore, the functional genes of LSW were enriched in "GO: 0042742, defense response to bacterium", "GO: 0045087, innate immune response", and "GO: 003511,~embryonic forelimb morphogenesis". Besides, the functional genes of LWW were involved in "GO: 0061436, establishment of skin barrier", "GO: 0043552, positive regulation of phosphatidylinositol 3-kinase activity", and "GO: 0034644, cellular response to UV" (Table 4). Table 3. Potential single-nucleotide polymorphisms (SNPs) and candidate genes detected in the genome-wide association study for four porcine reproduction traits.

Discussion
In the present study, we used the Illumina Porcine SNP60 Chips and the GeneSeek GGP-Porcine Chip to genotype a Duroc pig population. Then, a genome-wide association analysis between the genotypes and four reproductive trait phenotypes was performed using a single-marker regression approach. Finally, 20 potential SNPs reaching suggestive significance were identified to be associated with the four pig reproductive traits.
SNP rs328230332 was associated with both LWW and LSW. This SNP is located in the intron of the FAM110A gene (family with sequence similarity 110, member A). FAM110A has been reported to be significantly associated with young-onset hypertension in Han Chinese population of Taiwan [19]. In livestock, Gutiérrez-Gil et al. stated FAM110A is located in genomic regions associated with dairy

Discussion
In the present study, we used the Illumina Porcine SNP60 Chips and the GeneSeek GGP-Porcine Chip to genotype a Duroc pig population. Then, a genome-wide association analysis between the genotypes and four reproductive trait phenotypes was performed using a single-marker regression approach. Finally, 20 potential SNPs reaching suggestive significance were identified to be associated with the four pig reproductive traits.
SNP rs328230332 was associated with both LWW and LSW. This SNP is located in the intron of the FAM110A gene (family with sequence similarity 110, member A). FAM110A has been reported to be significantly associated with young-onset hypertension in Han Chinese population of Taiwan [19].
In livestock, Gutiérrez-Gil et al. stated FAM110A is located in genomic regions associated with dairy production in sheep. SNP rs80979042 and rs80825112 on chromosome 14 were associated with LSB and are located within the intron of Bicaudal C Homolog 1 (BICC1) gene. BICC1 has been identified by GWAS as a candidate gene associated with major depressive disorder in humans [20]. BICC1 provokes renal and pancreatic cysts and the visceral left-right patterning of ectopic Wnt/β-catenin signaling during visceral left-right patterning [21]. In pigs, BICC1 has been reported to be associated with prenatal development of skeletal muscles and to be expresses differently during myogenesis in Pietrain and Duroc pigs [22]. Two SNPs, rs81476258 and rs81326131, were annotated within the intron of EPHB2, which belongs to the Eph family, the largest known tyrosine kinase receptor (RTK) family [23]. EPHB2 is activated by its ligand EphrinB1, which results in protein phosphorylation in cells. In embryonic development and postnatal life processes, EPHB2 is expressed in a tissue-specific and time-specific manner [24] and also plays an important role in axon orientation, angiogenesis, and tumorigenesis [25]. Furthermore, EPHB2 is involved in regulating postnatal myogenesis through inhibiting satellite cell formation in mice [26].
Additionally, several other candidate genes were identified to be potentially associated with these traits of interest. KCNA1, a member of potassium channels family, has been involved in the maturation of the mouse auditory forebrain [27]. ZDHHC18 is the component of Hippo pathway, influencing skeletal muscle feed arteries in rat [28]. FAM135B, associated with the traits of LWB, has been related to mid-test metabolic weight in U.S. beef cattle [29]. SEMA4D differential expression has been reported in the skeletal muscle of pigs with distinct growth and fatness profiles [30]. For the functional enrichment analysis results, several biological processes GO terms were preferentially associated with the functional genes of more than two traits studied in the present research, including "GO: 0061436, establishment of skin barrier", "GO: 0045087, innate immune response", and "GO: 0042742, defense response to bacterium". These results reveal that the functional genes detected in the present study, mainly involved in the growth of epidermal tissues, organismal defense, and immunity, are strongly associated with birth (alive or dead) and weaning in piglets [31]. There were no common SNPs between the present study and some previous studies [17,32], which might indicate that different pig breeds need different selection strategies for reproductive traits.
Taking the DEBV as "dependent variable" could improve the power of GWAS. Generally, trait phenotype is the dependent variable in most GWAS. However, in this study, DEBVs obtained from on-farm genetic evaluation were employed, because they are more suitable than the original phenotypes. This is mainly because: (1) Systematic environmental effects are excluded from DEBVs; (2) More phenotypic records from relatives were combined to evaluate the individual under consideration; (3) Some trait phenotypes can be recorded only in one gender; and (4) They do not contain information from relatives. Recently, many GWAS have used DEBVs as "phenotype", especially for livestock populations [33][34][35].
In this study, we have not identified significant SNPs associated with the four reproduction traits examined. We speculate that the relatively small number of SNPs reaching the suggestive significance level is partly due to the small size of the research population used in the present study. Additionally, the substructure within this population further decreased the effective population size and, hence, affected the power of GWAS. In a population with limited size, only large or moderate QTLs can be detected. Hence, research populations must be enlarged in our future studies to confirm the findings from the present study.

Conclusions
Through a genome-wide association study of four reproductive traits in a Duroc pig herd, we detected 20 SNPs that were potentially associated with these traits of interest. TXN2, KCNA1, ENSSSCG00000003546, ZDHHC18, MAP2K6, BICC1, FAM135B, EPHB2, SEMA4D, ST3GAL1, KCTD3, FAM110A, TMEM132D, TBX3, and FAM110A might be important candidate genes that compose the underlying genetic architecture of porcine reproductive traits. These findings help to understand the genetic basis of porcine reproductive traits and could be potentially applied in pig breeding programs.