Genome-Wide Association Study Provides Insights into Important Genes for Reproductive Traits in Nelore Cattle

Simple Summary In this study, we investigated the association between single nucleotide polymorphisms (SNPs) and reproductive traits in order to identify candidate genes and biological pathways associated with these traits in Nelore beef cattle. The genome-wide association analysis revealed genomic regions that could explain part of the genetic variance of the studied traits. The results revealed genes with important functions for reproductive traits, such as fertility and precocity. Some genes were associated with more than one trait, being important for reproductive efficiency. The identification of candidate genes that were associated with the studied traits as well as genes enriched in the functional terms and pathways may be useful for exploring the genetic architecture underlying reproductive traits and may be used in Nelore breeding programs. Abstract The identification of genomic regions associated with reproductive traits as well as their biological processes allows a better understanding of the phenotypic variability of these traits. This information could be applied to animal breeding programs to accelerate genetic gain. The aim of this study was to evaluate the association between single nucleotide polymorphisms (SNP) with a scrotal circumference at 365 days of age (SC365) and at 450 days of age (SC450), gestation length (GL) as a calf trait, age at first calving (AFC), accumulated productivity (ACP), heifer early calving until 30 months (HC30), and stayability (STAY) traits, in order to identify candidate genes and biological pathways associated with reproductive traits in Nelore cattle. The data set consisted of pedigree, phenotypes, and genotypes of Nelore cattle from the “Associação Nacional de Criadores e Pesquisadores” (ANCP). The association analyses were performed using the Weighted Single-Step Genome-Wide Association method; the regions, consisting of 10 consecutive SNP, which explained more than 0.5% of additive genetic variance, were considered as a significant association. A total of 3, 6, 7, 5, 10, 25, and 12 windows were associated with SC355, SC450, GL, AFC, ACP, HC30, and STAY, respectively. The results revealed genes with important functions for reproductive traits, such as fertility and precocity. Some genes were associated with more than one trait, among them CAMK1D, TASP1, ACOXL, RAB11FIP5, and SFXN5. Moreover, the genes were enriched in functional terms, like negative regulation of fat cell differentiation, fatty acid alpha-oxidation, and sphingolipids signaling pathway. The identification of the genes associated with the traits, as well as genes enriched in the terms and pathway mentioned above, should contribute to future biological validation studies and may be used as candidate genes in Nelore breeding programs.


Introduction
Greater inclusion of animal protein in human food has increased demand for beef production. Producers and researchers have been working, using the benefits of genetics, nutrition, and animal reproduction, on the search for a performance that is compatible with the growth of beef cattle production and that results in an increase in productivity [1,2].
Reproductive traits are economically relevant to beef cattle production systems [3]. Sexual precocity is especially important for Bos taurus indicus breeds that are commonly less precocious than Bos taurus taurus breeds [4]. Nevertheless, Zebu breeds have adaptive advantages in tropical production in comparison with Taurine breeds [5,6]. Reproductive inefficiency results in monetary losses caused by decreased production and detained reproduction, which impacts the costs and the sustainability of cattle production systems [7]. Therefore, reproductive traits must be considered in beef cattle production.
However, the functional reproductive traits selection is limited due to its smallmoderate heritability [8,9] and result in a slow genetic gain. These traits are controlled by many genes and have relatively low to moderate heritability; that is, they are strongly influenced by environmental factors [10]. Thus, a small proportion of the phenotypic variance may be explained by genetic variance [9]. Furthermore, most reproductive traits are measured late in the life of the animals and are limited by sex, making selection and genetic gain difficult [11][12][13]. The response selection for reproductive traits is slower than for productive traits [14], which have intermediate to high heritability estimates. Thus, alternatives that improve the response to selection and consequently accelerate genetic gain are necessary to improve the reproductive performance of beef cattle herds. The genomic information has been included in the genetic evaluation, and even with the low heritability estimates of reproductive traits, a positive impact on fertility has been observed [8,15,16].
The selection focus on production traits and its antagonism with fertility traits in dairy cattle has caused a progressive decline in fertility. Thus, the inclusion of the reproductive traits in breeding programs and the use of genomic information benefit the identification of quantitative trait loci (QTL), and candidate genes associated with reproductive traits are highlighted [16]. García-Ruiz et al. [15] showed that the low heritability traits, such as fertility, are those that most benefit from the use of genomic information, increasing accuracy and selection differential and decreasing the generation interval.
Genomic studies have been widely used for beef cattle. Single nucleotide polymorphism (SNP) applied in genome-wide association studies (GWAS) can be used to identify possible associations between chromosomal regions and complex traits, such as reproductive traits. GWAS have been developed aiming to identify candidate genes and to understand the genetic mechanisms involved with the reproductive traits [17][18][19]. Thus, the use of more informative markers and the disposal of those that generate noise in the predictions could improve the accuracy of genomic predictions [20].
The SNP associated with genes that influence reproductive traits, once identified, can be incorporated into SNP panels to increase the accuracy of genomic predictions for fertility. Thus, contributing to the selection process [21] and making it possible, using singlestep approaches, to evaluate genotyped young animals without their phenotypes [8] and increase the understanding of the mechanisms regulating the reproductive performance of beef cattle.
Genomic studies are the basis for accurate animal production [22]. Therefore, the aim of this study is to evaluate the association between SNP and scrotal circumference at 365 days of age (SC365) and at 450 days of age (SC450), gestation length (GL) as a calf trait, age at first calving (AFC), accumulated productivity (ACP), heifer early calving until 30 months (HC30) and stayability (STAY) traits, in order to identify candidate genes and biological pathways associated with reproductive traits in Nelore beef cattle.

Data
The data set was composed of pedigree, phenotypes, and genotypes of Nelore beef cattle from the "Associação Nacional de Criadores e Pesquisadores" (ANCP). The ANCP animal breeding program aims to identify, select and offer to the beef market genetically superior animals for productive and reproductive traits. The pedigree used was composed of 140,199 Nelore animals born in the years 1934 to 2016, with 9.37 ± 5.64 mean generations known and 16.82% of missing parents rate (at least one unknown parent), and with an effective population size equal to 404. The phenotypic data set consisted of a total of 50,331 animals with observations for at least one of the SC365, SC450, GL, AFC, ACP, HC30, and STAY traits in Nelore beef cattle (Table 1). The ACP index was calculated under the expression described in the study by Lôbo et al. [23]: where W 210 is the average body weight of weaned calves corrected for 210 days of age, n c is the total number of calves produced, and ADC n is the dam's age at last calving. The 365 and 550 constants enable fertility expression on an annual basis. HC30 and Stayability are bi-categorical traits. HC30 was defined attributing the value of 1 (success) for heifers first calving until 30 months of age, and 0 (failure) otherwise. For Stayability, cows that attained at least three calvings at 76 months of age had the phenotype categorized as "success"; otherwise, cows that had not had three calvings until this age had a phenotype described as "failure".
The animals were raised on pasture, weaned between 6 and 8 months of age, and the reproductive management consisted of the mating season lasting from 60 to 120 days using artificial insemination or controlled natural mating. The contemporary groups (CG) were constituted by sex (when the trait was measured in both sexes), animals born on the same farm, year, and season and belonging to the same management group.

Phenotype Data
The CG with a frequency of fewer than five animals was excluded. The connectability analysis between CG for each trait was performed using the AMC software [25], and disconnected CG was disregarded in the analyzes. The residual normality and variance homogeneity assumptions were performed, and observations with ±3.0 standard deviations were excluded for each trait. The number of observations after phenotype data editing is summarized in Table 1.

Genotype Data
The SNP with a call rate less than 90% and minor allele frequency (MAF) less than 5% were excluded. Samples with a call rate of less than 90% were also not considered in analyses. Only autosomal SNPs were considered. After genomic data quality control, 460,838 SNPs and 8545 animals with observations for at least one trait were available.

Genome-Wide Association Analysis
The methodology used was the weighted single-step genome-wide association (Wss-GWAS); that is, modification of the BLUP method, with the numerator relationship matrix A −1 replaced by the genomic matrix H −1 [26]: where A is the numerator relationship matrix for all animals; A 22 is the numerator relationship matrix for genotyped animals, and G is the genomic relationship matrix. The G matrix was obtained according to Amin et al. [27] and Leuttenegger et al. [28]: where Z is the coefficient matrix adjusted for allele frequency; D is the matrix that contains the weights for all SNPs.
For the derivation of SNPs effects and weights, the animal effect was decomposed into genotyped animals (a g ) and not genotyped (a n ), as described by Wang et al. [29]. The animal effect of the genotyped animals is a function of the SNP effects: where Z is a matrix relating genotypes of each loci and u is a vector of the SNP effects. The animal effect variance was calculated by: where D is the diagonal matrix of weights for variances of SNP; σ 2 u is the genetic additive variance captured by each SNP and G * is the weighted relationship genomic matrix, obtained by G * = Gq where q is a weighting/normalizing factor. This factor was used according to Chen et al. [30], to ensure genomic evaluations unbiased; for this, the average diagonal in G to that of A 22 .
Thus, the effect of SNPs was obtained following the equation taken from Wang et al. [29]: The iterative process described by Wang et al. [29] in six steps was used: Step one, D = I; step two, to calculateâ g with WssGWAS; step three, to calculate the SNP effect; step four, to calculate the variance of each SNP, d i =û 2 i 2p i (1 − p i ), where i is the ith SNP; step five, the normalized value of the SNP to keep the additive genetic variance constant and step six, to calculate the G matrix. A weighted process that recalculated the weights of SNPs consisted of repeating once steps 2 to 6 [29].
This process increases SNP weights explaining larger genetic variance and decreases SNP weights explaining small genetic variance. The percentage of genetic variance explained by the ith region was calculated as follows: where a i is the genetic value of the ith region that consists of 10 consecutive SNP, σ 2 a is the total genetic variance, Z j is the vector of SNP content of the jth SNP for all individuals and u j is the marker effect of the jth SNP within the ith region.
The analyzes for estimation of the genomic association were performed using software from the BLUPF90 program family [31,32]. The SC365, SC450, GL, AFC, and ACP traits were evaluated under a linear model using POSTGSF90 software, while for HC30 and STAY traits; a threshold model was applied using THRGIBBS1F90 software. The variance components and heritability estimates of traits are provided in the Supplementary Table S1. The general animal models used for SC365, SC450, AFC, ACP, STAY, and HC30 (8) and for GL (9) were: where y t is the vector of phenotypic observations for each trait, except for STAY and HC30, that is the threshold vector; b is the vector of fixed effects, that included CG; a is the vector of effects of the animals; m is the vector of maternal effect, and e is the vector of residual effects; X, Z 1 and Z 2 are the incidence matrix for b, a, and m, respectively. The variances of a and e were calculated by: where σ 2 a is the additive genetic variance; σ 2 m is the maternal additive genetic variance; σ 2 e is the residual variance; σ am is the additive-maternal covariance and H is the matrix that combines the relationship and genomic information matrix, and I is the identity matrix.

Search for Associated Genes
The chromosome regions, consisting of windows with 10 consecutive SNPs that explained more than 0.5% of additive genetic variance were considered as associated with the studied trait [33], and then the investigation of the genes that this region contained was carried out with Ensembl Genome Browser (http://www.ensembl.org/index.html accessed on 18 October 2018) using the Ensembl gene 94 version of the gene model and UMD3.1.1 bovine genome assembly as reference. The enrichment of genes, as well as their classification according to biological function and identification of metabolic pathways, was performed using the "Database for Annotation, Visualization and Integrated Discovery (DAVID) v. 6.8" (http://david.abcc.ncifcrf.gov/ accessed on 14 November 2018) for each trait, with a p-value significance criterion of 0.05.

Results and Discussion
The associated regions with the traits studied are shown in Figure 1 and presented in Table 2. A total of 3, 6, 2, 5, 10, 25, and 12 windows with 10 SNP were associated with SC355, SC450, GL, AFC, ACP, HC30, and STAY, respectively.
The genes found in the significant regions for each trait are presented in Supplementary Table S2. In the regions associated with SC365 (Table 2), no gene was found. A total of five known genes and three unknown genes are associated with SC450, and in the enrichment analysis, no term was significantly enriched (p > 0.05). A single known gene (CDH22) was found in GL-associated regions (Table S2), and in the enrichment analysis, no term was significantly enriched (p > 0.05).  A total of 231 genes were found in the AFC-associated regions (Table S2), with 178 known and 53 unknown genes. The enrichment analysis (p ≤ 0.05) for AFC-associated genes in six terms (Table S3), including mitochondrial translational initiation and elongation. In these pathways, the mitochondrial ribosomal protein S5 (MRPS5) and mitochondrial ribosomal protein S9 (MRPS9) genes have been enriched and have already been associated with feed intake [34]. Basarab et al. [35] showed that animals with positive feed intake consumed more feed and retained more energy. The AFC has a negative genetic correlation with weaning and yearling weight gain (−0.20 and −0.24, respectively), indicating that the selection for greater weight gain results in an AFC decrease [36]. In Brazil, the Nelore is selected for weight gain. This can impact the sexual precocity of the animals, resulting in lower AFC [36]; in addition, it is noteworthy that the selection for weight gain can also increase the weight of adult females [37,38], which may not be interesting depending on the production system.
A total of 244 genes were found in the ACP-associated regions (Table S2), of which 156 are known genes, and 88 are unknown. A total of 22 terms (Table S4) were significantly enriched (p ≤ 0.05), being related to immune system processes and body growth.
The neuropeptide Y receptor Y1 (NPY1R) gene was enriched in the regulation of multicellular organism growth term. The NPY1R has been found in association with Angus heifers' fertility by Neupane et al. [39], and the neuropeptide Y associated with maternal behavior dependent on nutritional status in mice [40]. Adding to this, and considering that ACP indicates the abilities of the female to calve at a young age, to maintain the regularity of calving, and to wean heavy calves [41], the maternal behavior can influence the weaning weight of animals, and cow fertility can influence not only in AFC but also to maintain a shorter calving interval, which reflects the occurrence of regular calving, evidencing the importance of the NPY1R gene for a cow's ACP throughout life.
In the regions associated with HC30, 580 genes were found (Table S2), of which 434 are known and 146 are unknown. Analyzing the total of genes, 22 terms (Table S5) were significantly enriched (p ≤ 0.05), among them fatty acid alpha-oxidation, negative regulation of fat cell differentiation, and innate immune response were highlighted.
Into the negative regulation of fat cell differentiation term, the E2F transcription factor 1 (E2F1), GATA binding protein 3 (GATA3), additional sex combs like 1, transcriptional regulator (ASXL1), and tribbles pseudokinase 3 (TRIB3) genes were enriched. This term involves processes that inhibit the differentiation of adipocytes [42]. Considering the association of the two terms mentioned above with HC30, the results in the present study suggest that the favorable selection of alleles that inhibit degradation of 3-methyl branched fatty acids and which favor differentiation of adipocytes can result in higher fat deposition in these animals, which would be sexually precocious, i.e., with more success for HC30.
The biological process of innate immune response was significantly enriched (p ≤ 0.05) associated with the HC30 trait. Melo et al. [18] identified an association between the number of calves at 53 months of age trait and the immune processes pathway. The phenotypic correlations were reported by Banos et al. [43] between immune traits with reproductive performance traits; in especial, the authors reported that a higher concentration of CD8+ cells was associated with longer calving intervals. Thus, the immune system can interfere not only in the HC30 trait but also in cow longevity since the longer calving interval results in a lower stay in the herd.
A total of 52 genes were found in the STAY-associated regions (Table S2), of which 37 are known genes, and 15 are unknown genes. The analysis of the biological functions of the genes enriched six terms (p ≤ 0.05) (Table S6), among them, the sphingolipid signaling pathway. The lipid pathways are associated with the reproductive traits, such as the Nelore heifer reconception trait, indicating that cows with greater capacity to accumulate fat have better rebreeding performance [18] and consequently greater STAY.
The sphingolipid signaling and its metabolic products have a signaling function in several other pathways, such as the metabolic sphingosine-1-phosphate (S1P) that acts as a growth and survival factor and the ceramide (Cer) that activates the apoptotic pathways [44]. The relation of this pathway to the activation of growth factor interferes in the sexual precocity of the heifers, which can take less time to reach the necessary weight for reproductive maturity, thus decreasing AFC and increasing the cow's STAY. By activating survival factors, this pathway could also be associated with the longevity of cows in the herd, and thus, in STAY.
The phospholipase C beta 1 (PLCB1), enriched in the sphingolipid signaling pathway, and phospholipase C beta 4 (PLCB4) genes, both located in BTA 13, were associated with STAY. Early embryonic development in mammals depends on maternal miRNA stocks prior to the initiation of zygote regulation; this miRNA involvement has been described in rats [45] and bovine animals [46]. The PLCB1 is a target gene of the miR-205, and this miRNA was differentially expressed during preimplantation embryonic development, and both PLCB1 and miR-205 have a reciprocal expression pattern [46]. The PLCB1 and PLCB2 genes are targets of the miR-301b, which is associated with ovarian follicle development in cattle [47].
The jagged 1 (JAG1) and mitochondrial ribosomal protein L33 (MRPL33) genes, both located in BTA 13, were found associated with STAY. Although these genes have not been enriched in biological pathways, they have important reproductive functions. The JAG1 gene has a signaling function during embryogenesis [48], and the MRPL33 gene was cited by Melo [49] for being in association with heifer rebreeding in Nelore cattle. This fact reinforces the association between MRPL33 and STAY found in our work because the stayability of a cow in the herd is influenced by its ability for reconception even when it is a heifer.
Many uncharacterized genes were found associated with the traits. These genes do not yet have an identified ortholog, which means that they do not have a homologous gene that has the same function in different organisms; therefore, in the enrichment analyzes, these genes were not found. All the biological processes, cell components, molecular functions, and pathways that were significantly enriched (p ≤ 0.05) in the gene ontology analysis are presented in Supplementary Tables S3-S6. None of the genes found in the associated regions with each trait studied were associated with all the traits. The associated genes with SC450, GL, and ACP did not coincide with any gene associated with the other traits. According to Kluska et al. [50], using part of the database of this manuscript, the SC450 presented genetic correlation with AFC, HC30, and STAY equal to −0.29 ± 0.06, 0.52 (0.31-0.74) and 0.35 (0.20-0.50), respectively. This indicates that selection for SC can decrease AFC [51] and increase HC30 and STAY; therefore, it is an indicator of the sexual precocity of Nelore heifers. Grossi et al. [41] found a negative genetic correlation between ACP and AFC (−0.33 ± 0.04) in Nelore dams, i.e., the gains in ACP are related to a lower AFC. The genetic correlation between these traits, related by Kluska et al. [50] and Grossi et al. [41] indicates that these traits are controlled by common genes, reinforcing the GWAS results of this paper that found some common genes associated with the traits.
The common genes associated with the reproductive traits are presented in Table S7. A single common and known gene is associated with AFC and STAY; 3 common genes are in association with AFC and HC30 (2 known genes); 31 genes common to HC30 and STAY (22 known genes).
Different studies reported that AFC and STAY present negative genetic correlation (−0.64 (−0.83-−0.44); −0.69 and −0.32 (−0.40-−0.25) [9,50,52]; thus, the decrease in AFC would lead to an increase in STAY. The acyl-CoA oxidase like (ACOXL) gene located in the BTA 11 was found in the regions associated with AFC and STAY, so the selection targeted to this gene would allow a possible gain for both traits. ACOXL is a primary rate-limiting enzyme in peroxisomal fatty acids β-oxidation [53,54] and has been found in association with pregnancy outcome (pregnant or not pregnant) using fixed-time artificial insemination in Brahman heifers [55].
The RAB11 Family Interacting Protein 5 (RAB11FIP5) and Sideroflexin 5 (SFXN5) genes, both located in BTA 11, were associated with AFC and HC30. The RAB11FIP5 and SFXN5 genes have already been described as associated with shear force [56], a method used to evaluate the tenderness of meat. There is a high degree of interdependence of puberty traits and growth and body composition [57]. Pacheco et al. [58] reported that subcutaneous fat thickness is favored when younger animals are slaughtered. Still, part of a chromosomal region associated with the occurrence of early calving is also described when meat tenderness in Nelore cattle is investigated [17,59]. In view of this, the RAB11FIP5 and SFXN5 genes seem to influence sexual precocity in cattle, justifying the association of these genes with AFC and HC30 traits identified in the present study.
The gene calcium/calmodulin dependent protein kinase ID (CAMK1D) was found associated with HC30 and STAY. This gene is located in BTA 13, and its association with heifer rebreeding and age at first calving was found by Costa et al. [60] in a GWAS study of Nelore dams. In addition, Melo et al. [19] reported that this gene might contribute to sexual precocity in Nelore cattle.
Threonine aspartase 1 (TASP1) gene, also associated with HC30 and STAY, generates alpha and beta subunits, forming the active alpha2-beta2, necessary to cleave the protein used for the maintenance of HOX gene expression [61]. When the HOX gene has its expression altered, it interferes with uterine development, preventing the implantation of the embryo and causing infertility in women [62]. In bovine, the HOX expression profile suggests a role during early developmental events, including oocyte maturation, the maternal to embryonic transition, and the first differentiation events [63]. In the same study, the comparison between two bovine and mouse species indicated that HOX expression at initial stages is partly conserved among mammals. Thus, the idea of the influence of the TASP1 gene on cow fertility is reinforced and justifies its association with STAY and HC30 traits.

Conclusions
The results in the present study provide a better understanding of the genes associated with reproductive traits studied in Nelore cattle. Many genes found in the associated windows appear to be related to more than one trait, indicating that these genes have a pleiotropic effect and are important for reproductive efficiency. Among these common genes, the CAMK1D and TASP1 genes were associated with HC30 and STAY; the ACOXL gene that appeared to be associated with AFC and STAY, and RAB11FIP5 and SFXN5 genes associated with AFC and HC30 should be highlighted due to their relationship with reproductive efficiency.
Many pathways and terms were associated with the reproductive traits in Nelore cattle, but the fatty acid alpha-oxidation and negative regulation of fat cell differentiation terms stand out for being related with HC30, while the sphingolipid signaling pathway was highlighted as being related to STAY. The identification of the genes associated with the traits, as well as genes enriched in the terms and pathway mentioned above, should contribute to future biological validation studies and may be used as candidate genes in Nelore breeding programs.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/ani11051386/s1. Table S1: Estimations of genetic variance (σ 2 a ), genetic variance of maternal effect (σ 2 m ), genetic standard deviation (σ a ), genetic standard deviation of maternal effect (σ m ), residual variance (σ 2 e ), residual standard deviation (σ e ), heritability (h 2 a ), and heritability of maternal effect (h 2 m ) for scrotal circumference at 365 days of age (SC365), scrotal circumference at 450 days of age (SC450), gestation length (GL), as a calf trait, age at first calving (AFC), accumulated productivity (ACP), early calving until 30 months (HC30), and stayability (STAY). Table S2: Genes in significant genomics regions associated with reproductive traits in Nelore cattle. Table S3: Enriched GO terms and KEGG pathways significantly enriched (p ≤ 0.05) from DAVID software for age at first calving. Table S4: Enriched GO terms and KEGG pathways significantly enriched (p ≤ 0.05) from DAVID software for accumulated productivity. Table S5: Enriched GO terms and KEGG pathways significantly enriched (p ≤ 0.05) from DAVID software for heifer early calving until 30 months. Table S6: Enriched GO terms and KEGG pathways significantly enriched (p ≤ 0.05) from DAVID software for stayability. Table S7: Common gene list associated with de with reproductive traits in Nelore cattle