Genome-Wide Association Study Reveals Novel Candidate Genes Influencing Semen Traits in Landrace Pigs

Simple Summary: Improving the reproductive performance of boars is a major goal in pig-breeding programs. Semen traits are significantly associated with the reproductive performance of boars; thus, identifying the single-nucleotide polymorphisms (SNPs) and genes that affect semen traits is important. Genome-wide association studies have been widely used to detect significant SNPs associated with economically important traits in pigs. The results of our study can be used in genomic selection models to facilitate future breeding programs aimed at improving reproductive traits in boars.

Abstract: Artificial insemination plays a crucial role in pig production, particularly in enhancing the genetic potential of elite boars. To accelerate genetic progress for semen traits in pigs, it is vital to identify and understand the genetic markers associated with desirable traits. Herein, we genotyped 1238 Landrace boars with the GeneSeek Porcine SNP50K BeadChip and conducted genome-wide association studies to identify genetic regions and candidate genes associated with 12 semen traits. Our study identified 38 SNPs associated with the 12 analyzed semen traits. Furthermore, we identified several promising candidate genes, including HIBADH, DLG1, MED1, APAF1, MGST3, MTG2, and ZP4. These candidate genes have the potential to facilitate the breeding of boars with improved semen traits. By further investigating the roles of these genes, we can develop more effective breeding strategies that contribute to the overall enhancement of pig production. The results of our study provide valuable insights for the pig-breeding industry and support ongoing research efforts to optimize genetic selection for superior semen traits.


Introduction
Artificial insemination (AI) is a widely used technology in swine breeding that enables the efficient dissemination of improved genetics from elite boars to commercial pig populations. To achieve successful AI outcomes, it is essential to produce high-quality semen characterized by favorable genetic traits, good sperm motility, and a high number of sperm cells per ejaculate. However, the semen quality of boars is affected by various factors, such as nutrition, disease, the interval between ejaculations, season, and age [1]. In addition, genetics plays an important role in semen quality; semen traits are known to have low to moderate heritability [2,3].
Currently, there are several approaches to evaluating semen quality [4], including measuring semen volume, concentration, motility, progressive motility, and the total proportion of sperm morphological abnormalities. These approaches are crucial for commercial boar stations to maintain the consistency and stability of boar semen quality. In recent years, with advances in high-throughput genotyping and molecular techniques, there has been growing interest in understanding the molecular processes and genetic mechanisms that influence semen traits. Although several genes and markers associated with pig semen traits were identified and described in previous studies [5,6], few studies have extensively analyzed large datasets to identify novel quantitative trait loci (QTL) and provide deeper insights into the genes controlling the semen traits of boars. This is likely due to the genetic complexity of sperm cell production and maturation.
Genome-wide association studies (GWAS) are commonly used to identify single-nucleotide polymorphisms (SNPs) with major effects on traits of interest [7]. To improve these traits, traditional selection methods must be supplemented with marker-assisted selection. This can be achieved by identifying the genotypes underlying reproductive traits and determining the associated polymorphisms and their phylogenetic relationships.
Previous studies have reported genetic regions and/or candidate genes associated with boar semen traits [8-10], contributing to our understanding of the genetic architecture of swine semen traits. These studies varied in terms of the genetic background, number of boars, and density of the genetic marker panels used. Therefore, the objective of the present study was to identify the QTL regions and candidate genes associated with semen quality in a Landrace boar population using GWAS.

Phenotypic and Pedigree Data
We collected 225,468 semen samples from 2059 Landrace boars (including 1238 genotyped pigs) in three locations (Tiantishan, Daling, and Gongzhutun) operated by Guangxi Yangxiang Co., Ltd. (Guigang, China) from 2016 to 2021. All 2059 boars had both phenotypic data and pedigree information and were used to obtain estimated breeding values (EBVs). The age of the boars ranged from 5 to 71 months. All boars were raised in hog houses with automatically controlled temperature, humidity, and wind speed. Each boar was housed in a single pen with approximately 7 m² of space and fed a specialized diet. Semen traits, including semen density (DEN), semen motility (MOT), and percentage of abnormal sperm (ABN), were measured directly using computer-aided sperm analysis. The semen volume (VOL) was measured by weight, assuming that 1 mL = 1 g. Other semen traits, such as total sperm number (TSN) and functional sperm number (FSN), were calculated from the measured traits using the formulae in [11]. To ensure quality, semen data were retained only if they met the following criteria: VOL range of 30-800 mL, DEN range of 0.01-10 × 10⁸/mL, MOT range of 10-99%, and ABN range of 0-90%.
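The trait derivation and record filter described above can be sketched in Python. The sketch assumes the conventional definitions TSN = VOL × DEN and FSN = TSN × MOT × (1 − ABN), which may differ from the formulae in [11]; the class and function names are illustrative and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Ejaculate:
    vol_ml: float          # semen volume (VOL), mL
    den_1e8_per_ml: float  # semen density (DEN), 10^8 sperm per mL
    mot_pct: float         # motility (MOT), percent
    abn_pct: float         # abnormal sperm (ABN), percent


def passes_qc(e: Ejaculate) -> bool:
    """Apply the retention ranges stated in the text."""
    return (30 <= e.vol_ml <= 800
            and 0.01 <= e.den_1e8_per_ml <= 10
            and 10 <= e.mot_pct <= 99
            and 0 <= e.abn_pct <= 90)


def tsn(e: Ejaculate) -> float:
    """Total sperm number in units of 10^8 (assumed: VOL x DEN)."""
    return e.vol_ml * e.den_1e8_per_ml


def fsn(e: Ejaculate) -> float:
    """Functional sperm number (assumed: TSN x MOT x (1 - ABN))."""
    return tsn(e) * (e.mot_pct / 100) * (1 - e.abn_pct / 100)


# Hypothetical record, for illustration only.
sample = Ejaculate(vol_ml=200, den_1e8_per_ml=3.0, mot_pct=80, abn_pct=10)
if passes_qc(sample):
    print(tsn(sample), round(fsn(sample), 2))
```

Keeping the filter separate from the derived traits makes it easy to swap in the exact formulae from [11] without touching the quality-control step.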
We calculated the coefficient of variation (CV) for semen data collected in the same year and season to measure the degree of variation in boar semen data as follows:

$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100\%$$

where CV was the coefficient of variation of the semen traits of Landrace, σ was the standard deviation of the semen traits of Landrace, and µ was the average of the semen traits of Landrace.
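As a minimal sketch of this calculation, the CV can be computed per group of records. The grouping key (boar, year, season) and the use of the sample standard deviation are assumptions; the text only states that records from the same year and season were pooled.

```python
from collections import defaultdict
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: standard deviation / mean x 100."""
    return stdev(values) / mean(values) * 100


def cv_by_group(records):
    """records: iterable of (boar_id, year, season, value) tuples.

    Returns {(boar_id, year, season): CV in percent}. Groups with fewer
    than two records are skipped because a standard deviation is undefined.
    """
    groups = defaultdict(list)
    for boar_id, year, season, value in records:
        groups[(boar_id, year, season)].append(value)
    return {key: cv_percent(vals) for key, vals in groups.items() if len(vals) > 1}


# Hypothetical semen volumes (mL) for one boar in one season.
records = [
    ("B1", 2020, "spring", 180.0),
    ("B1", 2020, "spring", 200.0),
    ("B1", 2020, "spring", 220.0),
]
print(cv_by_group(records))
```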

Genotypic Data
Total DNA was extracted from 1238 boars using genome extraction kits (Wuhan NanoMagBio Technology Co., Ltd., Wuhan, China), according to the manufacturer's instructions. Boars were genotyped using the GeneSeek Porcine SNP50K BeadChip (Neogen, Lansing, MI, USA), which contains 50,703 SNPs. The SNP positions were remapped to the Sus scrofa 11.1 reference genome using the genome remapping procedure available from the NCBI (National Center for Biotechnology Information). Autosomal SNPs were filtered using PLINK v1.9, as previously described [12], based on the following quality control criteria: individual call rate ≥ 90%, SNP call rate ≥ 90%, minor allele frequency ≥ 0.01, and Hardy-Weinberg equilibrium p-value ≥ 10⁻⁶. After quality control, 1238 boars and 43,876 SNPs were retained for further analysis. Missing genotypes were imputed using the Beagle software (version 4.1), as previously described [13]. Subsequently, the imputed SNPs were subjected to another round of quality control in PLINK v1.9 using the same criteria.
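To illustrate what two of these per-SNP filters compute, the sketch below applies the SNP call-rate and minor-allele-frequency thresholds to genotypes coded as 0, 1, or 2 copies of one allele, with None for a missing call. It is not PLINK and omits the individual call-rate and Hardy-Weinberg filters; the coding scheme and function names are assumptions made for illustration.

```python
def snp_call_rate(genotypes):
    """Fraction of animals with a non-missing genotype at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)


def keep_snp(genotypes, min_call_rate=0.90, min_maf=0.01):
    """True if the SNP passes both thresholds given in the text."""
    return (snp_call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical genotypes for ten animals at two SNPs.
polymorphic = [0, 1, 2, 1, None, 0, 0, 1, 2, 1]
monomorphic = [0] * 10
print(keep_snp(polymorphic), keep_snp(monomorphic))
```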

Statistical Model
Variance components were estimated using the DMUAI module of the DMU package and were then used in DMU to predict estimated breeding values (EBVs). In the multi-trait animal model for PBLUP (pedigree best linear unbiased prediction), Y was the vector of phenotypic values; b was the vector of fixed effects, including center-year-season (levels are shown in Figure S1); a was the vector of additive genetic effects of the boar; p was the vector of permanent environmental effects of the boar; Z and W were design matrices for a and p, respectively; age was the age of the boar; Intv was the interval between the present and previous semen collections; and e was the vector of residual effects. It was assumed that a ~ N(0, Aσ_a²), p ~ N(0, Iσ_p²), and e ~ N(0, Iσ_e²), where σ_a², σ_p², and σ_e² were the additive genetic, permanent environmental, and residual variances, respectively. A was the relationship matrix built from pedigree information, and I was an identity matrix. The pedigree used in this study encompassed 5284 pigs across three generations.
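Given the three variance components of this model, heritability and repeatability follow as ratios of variances. A minimal sketch, assuming the usual definitions h² = σ_a² / (σ_a² + σ_p² + σ_e²) and repeatability = (σ_a² + σ_p²) / (σ_a² + σ_p² + σ_e²); the numeric inputs are made up for illustration and are not estimates from this study.

```python
def heritability(var_a, var_p, var_e):
    """Share of phenotypic variance explained by additive genetic effects."""
    return var_a / (var_a + var_p + var_e)


def repeatability(var_a, var_p, var_e):
    """Share of phenotypic variance that is permanent to the animal."""
    return (var_a + var_p) / (var_a + var_p + var_e)


# Hypothetical variance components (additive, permanent environment, residual).
var_a, var_p, var_e = 2.0, 1.0, 7.0
print(round(heritability(var_a, var_p, var_e), 3))
print(round(repeatability(var_a, var_p, var_e), 3))
```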
We then obtained deregressed EBVs (DEBVs) for the 1238 boars. Following [14,15], the DEBV of each boar was calculated from its parental average (PA), its estimated breeding value (EBV), and the reliability (REL) of that EBV. The calculation also used std, the standard error of the EBV of each boar, and σ_a², the additive genetic variance of the corresponding trait.
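One commonly used simplified deregression that matches the quantities listed above is REL = 1 − std² / σ_a² and DEBV = PA + (EBV − PA) / REL. The sketch below implements that form as an assumption; it is not confirmed to be the exact expression used in [14,15], and the input values are hypothetical.

```python
def reliability(se_ebv, var_a):
    """Assumed form: REL = 1 - SE(EBV)^2 / additive genetic variance."""
    return 1 - se_ebv ** 2 / var_a


def deregress(ebv, parent_avg, rel):
    """Assumed form: DEBV = PA + (EBV - PA) / REL."""
    if rel <= 0:
        raise ValueError("reliability must be positive to deregress an EBV")
    return parent_avg + (ebv - parent_avg) / rel


# Hypothetical values for one boar and one trait.
rel = reliability(se_ebv=0.5, var_a=1.0)
print(rel, deregress(ebv=2.0, parent_avg=0.5, rel=rel))
```

Dividing by the reliability removes the shrinkage that BLUP applies to animals with little information, which is why boars with low reliability would receive strongly inflated values under this form.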
We performed GWAS in the 1238 genotyped pigs using the Fixed and random model Circulating Probability Unification (FarmCPU) method implemented in the rMVP software (version 1.0.8), as previously described [16]. FarmCPU iterates between a fixed-effect model and a random-effect model to control false positives while limiting false negatives. In the fixed-effect model, Y was the vector of DEBVs; T was a matrix of fixed effects, including the top three principal components, with their corresponding effects; P_j was the genotype matrix of the j pseudo-quantitative trait nucleotides (pseudo-QTNs), which were fitted as fixed effects, and q_j was the corresponding effect; m_k was the vector of genotypes for the kth marker to be tested, and h_k was the corresponding effect; and e was the residual effect vector with distribution e ~ N(0, Iσ_e²), where σ_e² was the residual variance. The random-effect model was used to select the most appropriate pseudo-QTNs.
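For reference, the fixed-effect testing model of FarmCPU is commonly written as shown below. This is a sketch in the notation of the definitions above; the symbol w for the effects of T is an assumption, since the text does not name it.

```latex
Y = T w + P_j q_j + m_k h_k + e, \qquad e \sim N(0, I\sigma_e^2)
```

Each marker k is tested in turn through h_k, while the pseudo-QTNs in P_j absorb the signal of loci already detected.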
The genome-wide significance threshold was set at p < 0.05/N, where N was the number of qualified SNPs. In this study, N was 43,876 and the threshold was set to 1.14 × 10⁻⁶. Phenotypic and genetic correlations among the semen traits were calculated using the "asreml" package in the R 4.0.4 statistical environment (www.r-project.org, accessed on 23 December 2023) and were used to determine whether there were associations between the GWAS results.
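The threshold arithmetic can be checked directly. This short sketch reproduces the Bonferroni correction with the values given in the text; the function name is illustrative.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 43876)
print(f"{threshold:.2e}")              # prints 1.14e-06
print(round(-math.log10(threshold), 2))  # height of the cutoff line on a Manhattan plot
```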

Annotation of Candidate Genes
We identified potential candidate genes within 500 kb upstream and downstream of genome-wide significant SNPs in the Sus scrofa 11.1 genome from the Ensembl database. Candidate genes were selected for each trait according to their biological function using the NCBI and GeneCards databases.
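The window search can be sketched as an interval-overlap test. The gene records, names, and coordinates below are hypothetical placeholders rather than Ensembl annotations, and counting any gene that overlaps the window (not only genes lying fully inside it) is an assumption.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, flank_bp=500_000):
    """Return names of genes overlapping [snp_pos - flank_bp, snp_pos + flank_bp].

    genes: iterable of dicts with keys 'name', 'chrom', 'start', 'end'
    (1-based, inclusive coordinates on the same assembly as the SNP).
    """
    lo = max(1, snp_pos - flank_bp)
    hi = snp_pos + flank_bp
    return [
        g["name"]
        for g in genes
        if g["chrom"] == snp_chrom and g["start"] <= hi and g["end"] >= lo
    ]


# Hypothetical annotation, for illustration only.
annotation = [
    {"name": "GENE_A", "chrom": "18", "start": 400_000, "end": 450_000},
    {"name": "GENE_B", "chrom": "18", "start": 480_000, "end": 520_000},
    {"name": "GENE_C", "chrom": "18", "start": 1_400_000, "end": 1_600_000},
    {"name": "GENE_D", "chrom": "5", "start": 900_000, "end": 1_100_000},
]
print(genes_near_snp("18", 1_000_000, annotation))
```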

Phenotypic Data Analysis and Heritability Estimates
We collected 225,468 semen samples from 2059 boars and calculated the VOL, DEN, MOT, ABN, TSN, and FSN for each sample. The descriptive statistics of the 12 semen traits are listed in Table 1. In brief, the coefficient of variation (CV) of the 11 analyzed semen traits ranged from 25.07% (CV_VOL) to 89.01% (ABN), indicating scope for genetic improvement of these traits. Table S1 shows the data distribution of DEBV. Table S2 shows the estimates of the variance components of the semen traits. The genomic heritability estimates for VOL, DEN, MOT, ABN, TSN, FSN, CV_VOL, CV_DEN, CV_MOT, CV_ABN, CV_TSN, and CV_FSN were 0.20, 0.17, 0.23, 0.24, 0.11, 0.15, 0.017, 0.003, 0.001, 0.011, 0.002, and 0.044, respectively.

Discussion
In recent years, there has been an increasing number of studies on genomic regions affecting boar semen traits, owing to advances in molecular and genotyping techniques, statistical methods, and the application of GWAS [9,10]. In this study, we performed a GWAS to identify QTL regions and candidate genes associated with semen quality in a Landrace boar population. A total of 38 SNPs and 13 genes were considered to be candidate markers associated with semen traits.
The HIBADH (3-hydroxyisobutyrate dehydrogenase) gene encodes mitochondrial 3-hydroxyisobutyrate dehydrogenase and may be associated with certain semen traits in men, such as sperm motility [17]. In cattle, HIBADH was reported to be related to low sperm vitality through whole-genome association analysis [18]. Our results suggest that HIBADH, in the region of SNP ASGA0080065 on chromosome 18, is significantly associated with six semen traits: CV_VOL, CV_DEN, CV_MOT, CV_ABN, CV_TSN, and CV_FSN. However, the mechanism whereby HIBADH affects semen traits is not well understood; further research is required to elucidate this association.
The DLG1 (discs large MAGUK scaffold protein 1) gene, in the region of SNP ASGA0058857 on chromosome 13, was significantly associated with CV_VOL and CV_ABN. In Pelibuey sheep, DLG1 was shown to be related to litter size via the Hippo signaling pathway, which plays a key role in mechanical transduction [19]. DLG1 encodes a scaffold protein that participates in controlling key cellular processes, such as polarity, proliferation, and migration, by interacting with various cell partners [20]. DLG1 is highly expressed in oocytes and granulosa cells [21] and encodes a member of the MAGUK protein family involved in epithelial cell polarization in mice [22].
The MED1 (mediator complex subunit 1) gene (also known as NR2C2) encodes mediator subunit 1, a component of the transcriptional mediator complex, and was significantly associated with CV_DEN, CV_MOT, CV_ABN, CV_TSN, and CV_FSN in the current study. MED1 is thought to act as a bridge between transcriptional activators and the RNA polymerase II complex, thereby facilitating the transcription of target genes. MED1 is involved in the regulation of various biological processes, including development, cell proliferation, and apoptosis [23,24]. Bovo et al. [25] found that MED1 was related to calcium ion concentration in the serum of Large White pigs. Calcium ions play a role in sperm motility and fertilization and may be involved in regulating the contractile activity of smooth muscle cells in the vas deferens, the duct that carries sperm from the testes to the urethra [26]. Calcium ions may also play a role in the acrosome reaction, in which the contents of the sperm acrosome (a structure containing hydrolytic enzymes) are released during fertilization [27].
The APAF1 (apoptotic peptidase activating factor 1) gene, in the region of SNP WU_10.2_5_89558832 on chromosome 5, was found to be significantly associated with CV_ABN. APAF1 is a central component of the apoptosome, a multi-protein complex that activates procaspase-9 after cytochrome c is released from the mitochondria in the intrinsic pathway of apoptosis [28]. A C/T mutation in this gene affects the success rate of sperm fertilization and the mummy rate in dairy cows [29].
The MGST3 (microsomal glutathione S-transferase 3) gene, in the region of SNP ALGA0026338 on chromosome 4, was found to be significantly associated with three semen traits: MOT, ABN, and FSN. MGST3 participates in the synthesis of prostaglandins that are involved in cell differentiation and apoptosis [30].
The EFNA5 (ephrin A5) gene, in the region of SNP MARC0067122 on chromosome 2, was identified as a candidate gene associated with five semen traits: CV_DEN, CV_MOT, CV_ABN, CV_TSN, and CV_FSN. The loss of EFNA5 in female mice results in subfertility, implying that Eph-ephrin signaling may also play a previously unidentified role in fertility regulation in women. EFNA5 is highly expressed in cancerous prostate tissues, suggesting that EFNA5 may participate in prostate carcinogenesis [31]. The prostate is the largest organ that secretes prostate fluid, which is an important component of semen.
The SS18L1 (SS18L1 subunit of BAF chromatin remodeling complex) gene on chromosome 17 was found to be significantly associated with CV_VOL in our study. The methylation level of SS18L1 in sperm from low-fertility buffaloes was found to be significantly higher than that in sperm from high-fertility buffaloes [32]. This further suggests that SS18L1 could be a candidate gene influencing semen traits. In the Dehong humped cattle population, SS18L1 was significantly related to disease resistance and heat tolerance in a genome-wide selection analysis [33]. This suggests that the quality of boar semen may be affected not only by genetics but also by the external environment, including disease, temperature, humidity, and nutrition. The semen quality of boars with stronger disease and heat resistance may be less affected by these factors.
AKR1B1 (aldo-keto reductase family 1 member B) encodes a protein present in bovine epididymal spermatozoa that is found exclusively in the detergent-soluble fractions of detergent-resistant membranes [34] and may be compartmentalized in the lumen of the epididymosomes rather than at the membrane surface [35]. The intracellular localization of AKR1B1 may be involved in the accumulation of sorbitol within the cytoplasm, which protects epididymal spermatozoa against hypertonic conditions and enhances sperm survival during epididymal transit and storage [36,37]. Significant differences were found in the expression of AKR1B1 between blastocysts of different qualities [38,39].
ZP4 (zona pellucida (ZP) glycoprotein 4) was suggested to play important roles in species-specific sperm-egg binding, preventing polyspermy, ZP-induced acrosome reactions, and protecting the embryo [40].Human ZP4 alone is insufficient to support the binding of human sperm to the ZP in transgenic mice, suggesting that other zona proteins may also play a role in human gamete recognition [41,42].ZP4 was shown to induce the acrosome reaction and improve the efficiency of in vitro porcine fertilization [43]; however, ZP4 is probably dispensable for female fertility [44].
In the present study, other candidate genes associated with semen traits in Landrace pigs were identified, such as HSD17B12 (hydroxysteroid 17-beta dehydrogenase 12), HABP2 (hyaluronan binding protein 2), CTNND2 (catenin delta 2), and MTG2 (mitochondrial ribosome-associated GTPase 2); however, information on their function in relation to semen traits is limited.

Conclusions
In total, 38 SNPs were found to be associated with 12 semen traits in Landrace pigs using GWAS. Of the candidate genes linked to these SNPs, ZP4 may be worth further investigation. These findings provide valuable insights into future molecular breeding of boar semen traits in the context of genomic selection, as the identified SNPs can be assigned higher weights in prediction models.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ani14131839/s1, Figure S1: The number of observations per each fixed-effect-level; Table S1: DEBV of semen traits; Table S2: Estimates of heritability for semen traits in pigs.

Table 1. Descriptive statistics of semen traits.
N1: the number of boars; N2: the records of semen data; Average: average number of samples per boar; S.D: Standard deviation; Min: minimum; Max: maximum; C.V: coefficient of variation.

Table 2. Correlation coefficients of semen traits in the boars.

Table 3. Candidate genes for semen traits in pigs.