Genome-Wide Association Analysis of Reproductive Traits in Chinese Holstein Cattle

This study was to explore potential SNP loci for reproductive traits in Chinese Holstein cattle and identify candidate genes. Genome-wide Association Study based on mixed linear model was performed on 643 Holstein cattle using GeneSeek Bovine 50 K SNP chip. Our results detected forty significant SNP loci after Bonferroni correction. We identified five genes (VWC2L, STAT1, PPP3CA, LDB3, and CTNNA3) as being associated with pregnancy ratio of young cows, five genes (PAEP, ACOXL, EPAS1, GLRB, and MARVELD1) as being associated with pregnancy ratio of adult cows, and nine genes (PDE1B, SLCO1A2, ARHGAP26, ADAM10, APBB1, MON1B, COQ9, CDC42BPB, MARVELD1, and HPSE2) as being associated with daughter pregnancy rate. Our study may provide valuable insights into identifying genes related to reproductive traits and help promote the application of molecular breeding in dairy cows.


Introduction
Genome-wide Association Study (GWAS) was proposed by Risch and Merikan-gas in 1996 when studying the genetic base of human complex diseases [1].GWAS applies a Single Nucleotide Polymorphism (SNP) as a molecular genetic marker at genome-wide level and identifies SNPs and candidate genes for target traits [2].The analysis basis of GWAS is the phenomenon of linkage disequilibrium (LD) caused by recombination in the long-term evolution of natural populations [3].According to the decay relationship between SNP locus and target trait, the SNP locus can be analyzed using statistics to identify those closely related to phenotypic variation.With the development of sequencing technology and high-throughput genotyping, GWAS has become an important method for analyzing genetic variation.Since the introduction of cattle high-density chips, there are more extensive and in-depth livestock-related GWAS research.
Reproductive traits are important economic traits of dairy cows, including calving interval (CI), calving ease (CE), heifer conception rate (HCR), cow conception rate (CCR), and daughter pregnancy rate (DPR).Among them, HCR, CCR, and DPR can more intuitively represent the reproductive ability of cows.In the past few decades, most intensive selection on milk production traits has led to a decline in the reproductive function of dairy cows, for example, CI, CE, HCR, CCR, and DPR, which indirectly affected the economic benefits [4,5].Nowadays, the concept of balanced breeding has been popularized and recognized, and the reproductive traits of dairy cows have been given attention.With the continuous maturity of sequencing technology, GWAS studies on the reproductive traits of dairy cows are been reported successively, but there is still a lot of room for research.Olsen et al. [6] performed a GWAS analysis of 2552 Norwegian Red Bull groups from 108 bull families, and target traits mainly included stillbirth, dystocia, and 56-days non-return rate, and found 16 SNPs significantly associated with stillbirth and dystocia.Among them, SNP-rs29012179, located in BTA 12, affects both reproductive and milk production traits.Sahana et al. [7] analyzed on 11 reproductive traits using GWAS and identified 24 QTL regions on 14 chromosomes composed of highly significant SNPs in Danish and Swedish Holstein.Hering et al. [8] performed a GWAS analysis of semen quality with 320 bulls as a test population, finding 34 SNPs that reached genome-level significance.Huang et al. [9] studied the reproductive traits of Holstein cattle using the selective DNA pooling method and found twenty-two SNPs related to fertilization rate and five SNPs related to blastocyst rate.Liu et al. [10].conducted genome-wide association analysis on the first birth age of HOST cattle and found genes including KLHL 4 Ls, TRAM 1, TRAM 2, ZNF438 and MATK were associated with reproductive disorders within 200 kb upstream and downstream of the significant SNP.Multiple SNP sites within the 1 Mb region of chromosome 7 were significantly associated with the day age of primary labor.However, GWAS analysis of reproductive traits including CI, CE, HCR, CCR, and DPR was not fully explored.
In this study, GWAS was used to explore SNP loci affecting important traits including HCR, CCR and DPR in dairy cows and detect the SNP loci associated with target traits.

Sample Genotype and Quality Control
In this study, the tail root blood was collected from 643 Holstein cows, and 3-5 drops of fresh blood from the tail root of dairy cows were drawn for DNA extraction.The first step involved cell lysis, exposing and rupturing membrane barriers such as cell and nuclear membranes.The next step was to remove membrane lipids from the sample; finally, DNA precipitation involved the removal of DNA-associated proteins by protease and removal of RNAs by RNase.The extracted DNA samples were quantitatively detected by a microspectrophotometer, adding DNA samples directly to the base, using two optical fibers to form the detection path, and with full-spectrum detection, the OD value of the DNA samples could be obtained.The DNA genome of Holstein cattle was scanned and analyzed using the Genomic Profiler Bovine 100 K chip containing 88,107 SNPs.Final genotyping data were obtained for the SNP.Quality control of individual data and SNP data in PLINK (v1.9) [11], began by first converting the SNP chip data into files that could be read by the PLINK (v1.9) software (*.map and *.Pad).The converted files included: individual ID, male parent ID (0 without record), female parent ID (0 without record), sex, SNP allele, chromosome, physical location, and genetic distance.A total of 637 individuals and 88,107 SNPs were obtained.The quality control conditions were as follows: (1) individual data with SNP genotype missing rates greater than 10% were filtered out; (2) SNP data with detection rate less than 90% were filtered out; (3) SNP data with a minimum allele frequency of less than 5% were removed; and (4) SNP data with Hardy-Weinberg equilibrium test P values less than 1.0 × 10 −7 were removed.
After quality control, a total of 76,160 high-quality SNPs from 637 samples were used for subsequent GWAS analysis.The quality control command in PLINK software were as follows: PLINK--bfile filename--mind 0.1--geno 0.1--maf 0.05--hwe 0.0000001--make-bed--out outfilename.The output file contained the individual data information and the SNP data information after quality control (Table 1).Information on the distribution of SNP markers on each chromosome before and after quality control is shown in Table 2.

Statistical Analysis
Principal component analysis (PCA) was evaluated with PLINK (v1.9).The population was corrected for PCA results, thus avoiding false positives [12] due to population stratification.GWAS analysis between reproductive traits and genome-wide SNPs is performed using a mixed linear model in the GEMMA (v0.98.5) software [13], establishing the model as follows: y = µ + Xβ + Zv + e In the formula: y is a vector of n × 1, for breeding values after removing genomic effects; µ represents the mean value of the trait phenotypic values; X is the mixed-effect matrix of n × q; β is a vector of q × 1, for the coefficient of the fixed effect; Z is the random polygene effect relationship matrix of n × t; v is the vector influenced by many factors, following the distribution of N (0, Kσ 2 α ), where K is related to all SNP genotypes and σ 2 α additive variance, and is the relatedness matrix of t × t; e is the residual vector following the N (0, Iσ 2 α ) distribution about the unknown variance σ 2 α .

Significance Test
Bonferroni correction was conducted for multiple tests.The significance level of the single test is: chromosome level (1/n), genome-wide level (0.05/n), where n is the number of SNP after quality control.Therefore, the chromosome level significant threshold is (1/76160 = 1.31 × 10 −5 ), and the genome-wide level significant threshold is (0.05/76160 = 6.57× 10 −7 ).

Population Stratification Analysis
To determine whether the population stratification affected the association between SNPs and traits, Q-Q plots (Quantile-Quantile plot) were drawn using R v3.5.1 [14].

Screening and Annotation of the Candidate Genes
Based on National Animal Genome Research Program and National Center for Biotechnology databases, ARS-UCD 1.2 was used as the reference genome.Candidate genes for HCR, CCR, and DPR were searched within 50 kb the upstream or downstream of significant SNPs, and genes functions were annotated [15].

Enrichment Analysis of the Candidate Genes
The enrichment analysis of reproductive trait-related candidates obtained by GWAS analysis was conducted with the DAVID database, and identified the pathways in which these genes were involved.

Population Stratification
Figure 1 main component analysis revealed the sample population was divided into groups (PC1: 0~−1.5, PC2: 0~−0.05;PC1: −0.05~0, PC2: 0.2~0.25;PC1: −0.1~0.1,PC2: −0.1~0.05).Therefore, when performing association analysis on the whole genome after quality control, using population relatedness as a random effect relationship matrix in the mixed linear model can avoid false positive [16][17][18] of results due to population stratification.The Q-Q diagram was drawn with R (3.5.1), as shown in Figures 2-4.The ordinate is −log 10 (GWAS p-value observations), the abscissa is −log 10 (GWAS p-value predictions), and the red line is straight line Y = X.It can be seen the distribution of the χ 2 statistic calculated by the SNP association analysis does not deviate prematurely from the hypothesis test, indicating the analytical model used complies with this study.A deviation occurs on the upper right side of the figure, indicating SNP sites [19] are significantly correlated with HCR, CCR, and DPR, respectively.

Genome-Wide Association Study
After quality control, the genotypes of 88,107 SNPs were obtained in 637 cows.Genome-wide association analysis was conducted on HCR, CCR, and DPR in Chinese Holstein cattle, and a Manhattan plot was generated as shown in Figures 5-7 (different colors represent the different chromosome numbers).After Bonferroni correction, eight SNPs were significantly associated with HCR at the chromosome level (P < 1/76160), eight SNPs were significantly associated with CCR at the chromosome level (P < 1/76160), and twentyfour SNPs were significantly associated with DPR at the chromosome level (P < 1/76160).

Genome-Wide Association Study
After quality control, the genotypes of 88,107 SNPs were obtained in 637 cows.Genome-wide association analysis was conducted on HCR, CCR, and DPR in Chinese Holstein cattle, and a Manhattan plot was generated as shown in Figures 5-7 (different colors represent the different chromosome numbers).After Bonferroni correction, eight SNPs were significantly associated with HCR at the chromosome level (P < 1/76160), eight SNPs were significantly associated with CCR at the chromosome level (P < 1/76160), and twenty-four SNPs were significantly associated with DPR at the chromosome level (P < 1/76160).
Genes 2024, 15, x FOR PEER REVIEW 7 of 14 significant SNPs were annotated, and the 20 closest candidate loci were obtained (Table 3).

Enrichment Analysis of the Candidate Genes
GO enrichment of reproductive trait candidate genes using DAVID database found 28 biological reactions were associated in vivo, each involving zero to fifteen genes (Figure 8).The KEGG analysis revealed no pathway where the candidate genes were located.

Discussion
In this study, we performed a genome-wide association analysis based on a mixed linear model of reproductive-related traits for a population using the GeneSeek Genomic Profiler Bovine 50 K SNP chip in 643 Chinese Holstein cattle.The PLINK software was used to perform the quality control and principal component analysis of the individual

Discussion
In this study, we performed a genome-wide association analysis based on a mixed linear model of reproductive-related traits for a population using the GeneSeek Genomic Profiler Bovine 50 K SNP chip in 643 Chinese Holstein cattle.The PLINK software was used to perform the quality control and principal component analysis of the individual and genotype data.The principal component analysis found the sample population could be divided into three groups, which had an indicative role in avoiding the emergence of false positives and the selection of models due to population stratification [20].In the mixed linear model, SNP effects and population stratification effects were used as fixed effects and kinship matrix as random effects for subsequent GWAS analysis, which greatly improved the accuracy of the association analysis [21,22].

Candidate Genes Associated with the HCR
VWC2L is located on chromosome 2 in dairy cows.VWC2L was previously identified as a member of the cysteine desmin (CKP) family.A previous study suggests it is a novel secreted protein that regulates the function of osteoblast cells and promotes matrix mineralization by regulating the Osterix expression of TGF-βsuperfamily growth factor signaling pathway [23].Wang Kejun et al. [24] found VWC2L gene was significantly associated with food conversion rate, involved in the growth metabolism, and indirectly associated with the growth traits of pigs.Therefore, VWC2L promotes the growth and metabolism of Holstein cattle, and it promotes the pregnancy of young cows by accelerating the formation of fetal bone.
STAT1 was detected on chromosome 2. Signal transducers and activators of transcription (STAT) proteins are essential for the regulation of many biological processes.In cattle, microarray analysis identified STAT1 as differentially expressed genes in the implanted surrounding endometrium.To gain new insights regarding STAT1 during the estrous cycle and early pregnancy, Carvalho et al. [25] investigated STAT1 transcript and protein expression and its biological activity in bovine tissue and endometrial-derived cells.This study found pregnancy increased STAT1 expression on day 16, STAT1 was located in endometrial cells on day 20 of pregnancy, and increased binding of STAT1 to interferon regulatory factor 1 (IRF1), cytokine-induced SH2-containing protein (CISH), and cytokine signaling 1 and 3 (SOCS1, SOCS3) gene promoters was consistent with the induction of its transcripts.The high expression of STAT1 in endometrial cells during pregnancy confirmed its association with pregnancy in young cows.
The PPP3CA gene, a candidate gene for goat high reproductive traits revealed by genome-wide association study, is located on chromosome 6 in cattle and is significantly associated with the pregnancy matching rate in young cows.Yangyang et al. [26] explored the genetic variation of goat PPP3CA and evaluated the genetic impact on litter size.They found only a 20bp insertion-deletion polymorphism in northern Shaanxi white cashmere goats was significantly associated with litter number (p < 0.05), and individuals with deletion/deletion (DD) genotype showed a primary phenotype compared with individuals with other genotypes.This study demonstrated deletion mutations within the PPP3CA gene in goats significantly affected litter size.By UniProt comparison, the PPP3CA protein of Holstein cattle and goats are as high as 98.40%, and the PPP3CA gene, as a candidate gene for the pregnancy rate of young cows, may have some influence on the reproductive traits of Holstein cattle population.
LDB3, located on bovine chromosome 28, is a well-characterized striated PDZ-LIM protein that regulates mechanical stress signaling [27] by association with the mechanosensing domain in filamin C. CTNNA3, also located on bovine chromosome 28, is implicated with preferential expression of the maternal allele in early gestational placental tissue [28] and differentiated villous or extravillous trophoblast.Whether these two genes can be used as candidate genes in young cows needs further confirmation.

Candidate Genes Associated with the CCR
PAEP is located on chromosome 11 in cattle.PAEP is a progesterone-associated endometrial protein, and studies demonstrate this gene enhances cell proliferation through the receptor-mediated membrane IgM receptor, and is the main component in regulating cell proliferation in milk [29].As a 28-kD glycoprotein, PAEP constitutively expresses in both the human reproductive tract and the hematopoietic system [30].Therefore, PAEP promotes cell proliferation under the action of progesterone, which has a positive impact on the pregnancy matching rate in adult cows.
ACOXL is located on chromosome 11.Ana Paula Sbardella et al. [31] used genomewide association analysis to study reproductive traits in Nelore cattle and found ACOXL was significantly associated with multiple reproductive traits, and this gene could also serve as a candidate gene affecting reproductive traits in Chinese Holstein cattle.
MARVELD1 is located on chromosome 26 of dairy cows, which is associated with CCR and DPR.By gene mapping of the mouse cell cycle, Zeng Fanli et al. [32] found overexpression of mouse MARVELD1 resulted in significant inhibition of cell proliferation, G1 arrest, and reduced cell migration.This study showed MARVELD1 was a microtubuleassociated protein that played an important role in cell cycle progression and migration and had an effect on adult cows and daughter pregnancies.
EPAS1 is located on chromosome 11 in cattle and plays multiple supporting roles in maintaining specific aspects of adipocyte function, including regulation of glucose uptake and lipid synthesis [33].GLRB (glycine receptor β), located on chromosome 17, is a regulator of motor neuron numbers [34].These two genes may be candidates for the pregnancy rate in adult cows and require experimental validation.

Candidate Genes Associated with the DPR
PDE1B is located on bovine chromosome 5.Previous studies have shown the PDE1B gene could be regarded as a positional and functional candidate gene for carcass traits in beef cattle.Zhou Di et al. [35] used high-throughput sequencing to analyze the MyoD1 gene knockout transcriptome of MDBK cells (bovine kidney cells) and found PDE1B was highly expressed in the vine leg muscle, dorsal long muscle, and shoulder, which was related to muscle formation and differentiation.Therefore, PDE1B has a certain promoting effect on fetal muscle formation and differentiation, and it has a positive effect on daughter pregnancy, and PDE1B can be further identified as a candidate gene for DPR.
SLCO1A2 is located on bovine chromosome 5; it is a member of the solute carrier organic anion transporter family.Joachim Geyer et al. [36] encode two different fulllength SLCO1A2 cDNA as 666 amino acid membrane proteins containing 12 putative transmembrane-spanning domains.Bovine SLCO1A2 expression was detected in the liver, kidney, brain, and adrenal glands.This study also confirmed bovine SLCO1A2 had functional homology with human SLCO1A2, which had a certain role in the growth and development of various fetal organs, and could be further verified as a candidate gene for DPR.
ADAM10 is located on bovine chromosome 10.It is found ADAM10 initiates the hydrolysis of regulatory membrane proteins by shedding the ectodomains of many different substrates, participates in cell-cell interaction and cell migration [37], has a certain role in the functional maintenance of organs and tissues during pregnancy, and can be further identified as a candidate gene related to DPR.
APBB1 is located on bovine chromosome 15.APBB1 is described as a bridging protein, acting in concert with the βamyloid precursor protein (APP) and Tip60 (histone acetyltransferase).Studies have shown APBB1 played a dominant role in nuclear signaling [38].This gene could be further identified as a candidate gene for DPR.
MON1B is located on bovine chromosome 18.It has been shown MON1B participated in blastocyst formation to promote embryonic development, and gene expression [39] existed in mature oocytes and embryos.
COQ9 is located on bovine chromosome 18.To explore the effects of single nucleotide polymorphisms in COQ9 on mitochondrial, ovarian function, and fertility in Holstein cows, M Sofia Ortega et al. [39] evaluated fertility phenotype measures of up to five lactation in a population of 2273 Holstein cows and showed the coenzyme Q9 mutation affected ovarian function.The A allele was associated with increased mitochondrial DNA copy number in oocytes, with overdominance effects on oocyte COQ9 expression, follicle number, and anti-mullerian hormone concentration.COQ9 is associated with fertility in Holstein cattle and can be a candidate gene influencing DPR.
ARHGAP26, CDC42BPB, HPSE2 are located on chromosomes 7, 21, and 26, respectively.The mechanism of these three genes related to pregnancy is not clear, and its effect on the pregnancy rate needs to be further explored.

Candidate-Gene Enrichment Analysis
GO term function enrichment and KEGG pathway enrichment for the overall different genes, do not distinguish which different gene is up-or down-regulated, and cannot reflect the overall form of activation or inhibition.However, these analyses can categorize genes according to the cell components, molecular function, and biological processes following a multiple gene set analysis, so that researchers can better understand gene function.KEGG pathway integration of the information of genomics, biochemistry, and system functional omics helps researchers to study gene and expression information as a whole, to better understand the mechanism of gene regulation, and also to judge whether the candidate genes of each trait are reasonable, and provide reference for subsequent identification tests.
In this study, GO enrichment found 28 biological reactions were correlated with reproductive traits, with each reaction involving zero to fifteen genes.The specific regulatory mechanism needs to be explored.

Conclusions
In this study, genome-wide association analysis of reproductive traits in 643 Chinese Holstein cattle population in Tianjin, based on a mixed linear model, identified multiple SNP loci significantly associated with the target trait and identified corresponding candidate genes, but most of the genes were not confirmed and a further identification is needed.This study provides the theoretical basis and breeding direction for the improvement of the breeding performance of Chinese Holstein cattle, and also lays a foundation for further mining and identifying the important gene loci of dairy cows in Tianjin, improving the molecular breeding theory and improving the production performance of dairy cows.
Genes 2024, 15, x FOR PEER REVIEW 5 of 14 indicating the analytical model used complies with this study.A deviation occurs on the upper right side of the figure, indicating SNP sites [19] are significantly correlated with HCR, CCR, and DPR, respectively.

Figure 5 .
Figure 5.The Manhattan plot of the HCR.Figure 5.The Manhattan plot of the HCR.

Figure 5 .
Figure 5.The Manhattan plot of the HCR.Figure 5.The Manhattan plot of the HCR.

Figure 5 .
Figure 5.The Manhattan plot of the HCR.

Figure 6 .
Figure 6.The Manhattan plot of the CCR.Figure 6.The Manhattan plot of the CCR.

Figure 6 .
Figure 6.The Manhattan plot of the CCR.Figure 6.The Manhattan plot of the CCR.Genes 2024, 15, x FOR PEER REVIEW 8 of 14

Figure 7 .
Figure 7.The Manhattan plot of the DPR.

Figure 7 .
Figure 7.The Manhattan plot of the DPR.

Genes 2024 , 14 Figure 8 .
Figure 8. GO enrichment results for candidate genes for reproductive traits.

Figure 8 .
Figure 8. GO enrichment results for candidate genes for reproductive traits.
numbers and/or tables, reviewed the draft paper, and approved the final draft.All authors have read and agreed to the published version of the manuscript.Funding: This work was supported by Special Seed Industry Project of Tianjin Academy of Agricultural Sciences-Research on innovation and Efficient Breeding Technology of Livestock and poultry germplasm Resources (2023ZYCX011); Tianjin Science and Technology Plan Project-Integrated Demonstration of High Quality Beef Cattle Breeding (22ZYCGSN00020); Tianjin Science and Technology Plan Project-Key Technology Research and germplasm innovation of High-yield Milk Breeding (22ZXZYSN00020).Institutional Review Board Statement: Experimental procedure was approved by the Animal Care and Use Committee of Animal Husbandry and Veterinary Research Institute of Tianjin, China.Data and model availability statement: The data were confidential and not deposited in an official repository.

Table 1 .
Descriptive statistics for reproductive traits.

Table 2 .
Information on the distribution of SNP markers on each chromosome before and after quality control.

Table 3 .
Results of genome-wide association analysis of reproductive traits in Chinese Holstein cattle.

Table 3 .
Results of genome-wide association analysis of reproductive traits in Chinese Holstein cattle.