Genomewide Association Analyses of Lactation Persistency and Milk Production Traits in Holstein Cattle Based on Imputed Whole-Genome Sequence Data

Lactation persistency and milk production are among the most economically important traits in the dairy industry. In this study, we explored the association of over 6.1 million imputed whole-genome sequence variants with lactation persistency (LP), milk yield (MILK), fat yield (FAT), fat percentage (FAT%), protein yield (PROT), and protein percentage (PROT%) in North American Holstein cattle. We identified 49, 3991, 2607, 4459, 805, and 5519 SNPs significantly associated with LP, MILK, FAT, FAT%, PROT, and PROT%, respectively. Various known associations were confirmed while several novel candidate genes were also revealed, including ARHGAP35, NPAS1, TMEM160, ZC3H4, SAE1, ZMIZ1, PPIF, LDB2, ABI3, SERPINB6, and SERPINB9 for LP; NIM1K, ZNF131, GABRG1, GABRA2, DCHS1, and SPIDR for MILK; NR6A1, OLFML2A, EXT2, POLD1, GOT1, and ETV6 for FAT; DPP6, LRRC26, and the KCN gene family for FAT%; CDC14A, RTCA, HSTN, and ODAM for PROT; and HERC3, HERC5, LALBA, CCL28, and NEURL1 for PROT%. Most of these genes are involved in relevant gene ontology (GO) terms such as fatty acid homeostasis, transporter regulator activity, response to progesterone and estradiol, response to steroid hormones, and lactation. The significant genomic regions found contribute to a better understanding of the molecular mechanisms related to LP and milk production in North American Holstein cattle.


Introduction
Milk production and composition are the most intensively selected traits in dairy cattle breeding programs around the world due to their direct economic impact to the industry and close link with nutritional properties [1,2]. Additionally, lactation persistency (LP), defined as the ability of a cow to maintain milk production at a high level after reaching the milk production peak, greatly impacts the economic return of the dairy sector [3]. Different indicators of LP have been proposed over time [4][5][6][7], with heritability estimates ranging from 0.14 to 0.24 [8,9]. Heritability estimates in Holstein cattle for milk (MILK), fat (FAT), and protein (PROT) yields usually range from 0.24 to 0.52, and from 0.36 to 0.68 for fat percentage (FAT%) and protein percentage (PROT%) [10][11][12][13].
Identifying genomic regions and candidate genes related to milk production traits is crucial to better understand the biological mechanisms underlying their phenotypic

iWGS and Genomic Quality Control
Imputed WGS data from 9131 Holstein cows, with 29,548,077 SNPs were available for this study. Genotype imputation was performed first from medium-density SNP panels (MD, 9131 cows and 56,955 or 60,914 SNPs; Illumina, San Diego, CA, USA) to a high-density panel (HD, 311,725 SNPs; Illumina, San Diego, CA, USA); and secondly, from HD to WGS, resulting in the iWGS datasets. The HD reference population had 2397 animals (from the North American Holstein population), and there were 1147 animals with WGS data (from the 1000 Bull Genomes Project, which also included North American Holstein animals). Imputation was performed using the FImpute software [39]. Imputation accuracies greater than 95% have been achieved for Holstein and other North American dairy cattle breeds [19,40].
Before imputation, SNPs with a call rate lower than 0.95, extreme departure from Hardy-Weinberg equilibrium (p < 10 −8 ), located in nonautosomal chromosomes or with unknown position, and excess of heterozygosity greater than 0.15 were removed from the dataset. Only SNPs present in both MD and HD files were retained in the MD dataset. After quality control (QC), 38,955 SNPs remained in the MD dataset to be used in the imputation analyses. A total of 297,114 SNPs (from the initial 311,725 SNPs) remained in the HD panel dataset after QC. Subsequently, imputation from HD to WGS was performed for all autosomal chromosomes.
After imputation to WGS, an additional QC was performed to remove the individuals and SNPs with a missing rate greater than 0.1, minor allele frequency (MAF) lower than 0.01, and extreme deviation from Hardy-Weinberg equilibrium (p < 10 −8 ; e.g., [41][42][43]. Finally, a range of 5,108,861 to 6,101,357 SNPs remained for the GWAS analyses of LP and milk production traits. The PLINK software [44] was used to perform all the QC applied. All these analyses were done following Chen et al. [21].

Association Analyses, Statistical Models, and Significance Testing
Association analyses were performed using the GCTA package [45], fitting a mixed linear model (MLM), including a polygenic effect. Therefore, SNP effects were estimated using the following statistical model: where y is a vector of pseudo-phenotypes (dEBVs); 1 is a vector of ones; µ is the overall mean; b is the fixed effect of the SNP tested for association with each trait, X is a vector containing the genotype score for the tested SNP; u is a vector of polygenic effects with u ∼ N 0, Gσ 2 u , where G is the genomic-based relationship matrix (GRM) and σ 2 u is the additive genetic variance of polygenic effects; Z is the incidence matrix of u; and e is a vector of residual effects with e ∼ N 0, Iσ 2 e , where I is an identity matrix and σ 2 e is the residual variance.
To avoid double-fitting candidate SNPs [46], GRMs were alternatively constructed by randomly sampling 50,000 SNPs from all chromosomes, except the one in which the analyzed SNPs were located. After the GWAS analyses, all SNPs were ranked based on their p-values and clumped according to their LD pattern (r 2 > 0.90), as suggested by Prive et al. [47]. The genomic inflation factor (λ) was calculated as the ratio of the median of the observed distribution of the X 2 statistic to the expected median (λ = median X 2 /0.4549), for which a 95% confidence interval (CI) of λ value was further derived.
To avoid the occurrence of excessive false-negative results by using the Bonferroni adjustment, as a large number of variants were tested [48], we alternatively calculated the threshold of significant testing as 0.05 divided by the number of independent chromosomal segments (M e ) at chromosome-wise levels [49]. M e is a function of the effective population size (N e ) and chromosome length (L, in centi-Morgans-cM), and was calculated as [50]: M e = 2N e L log(N e L) N e was considered to be equal to 58 [51] and one cM equivalent to 1 Mb [52]. The SNPs were considered as statistically significant if their − log 10 (P) was higher than the chromosome-wide threshold.

Functional Genomic Analyses
The SNP coordinates were based on the ARS-UCD1.2 assembly of the cattle reference genome (GCA_002263795.2). The annotation information was obtained from the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov, accessed on 30 October 2021). The GALLO R package [53] was used to detect genes located within ±100 Kb of the significant SNPs and QTL regions previously cited in the Animal QTLdb [54]. The Variant Effect Predictor (VEP) tool from Ensembl [55] was utilized to identify novel variants associated with the main peaks observed in the Manhattan plots. Functional enrichment analyses of the candidate genes identified were performed using the DAVID platform [56]. Gene network joint analyses were performed using the STRING database [57].

GWAS
After QC and LD-based clumping [47], the remaining number of informative SNP ranged from 1,673,052 (FAT) to 2,117,121 (LP). The Manhattan plots illustrate the chromosomal distribution of SNPs significantly associated with each trait (Figure 1). Additionally, Manhattan plots with a y-axis truncated at a lower level for the milk production traits are available as Figure S1 for a better visualization of peaks other than those found on BTA14. For the milk production traits, the significant peaks were higher and sharper, suggesting a more precise detection of narrower QTL regions distributed across the whole genome. A strong association was found in BTA14 (Tables 2 and 3), with ARHGAP39, bta-mir-2308,  C14H8orf82, LRRC24, LRRC14, RECQL4, MFSD3, GPT, PPP1R16A, FOXH1, KIFC2, CYHR1,  TONSL, VPS28, ENSBTAG00000053637, SLC39A4, CPSF1, and ADCK5 harboring the most significant SNPs for MILK, FAT, FAT%, and PROT%; and with MAF1, SHARPIN, CYC1, GPAA1, EXOSC4, OPLAH, SPATC1, GRINA, PARP10, PLEC, and bta-mir-2309 being the most significant candidate genes for PROT. Milk production traits were highly associated with the diacylglycerol O-acyltransferase 1 (DGAT1) gene (Tables S1-S5), contributing to the highest peak found in BTA14. For LP, significant SNPs were spread across various chromosomes and with less-defined peaks. The most significant regions for LP were observed in BTA28 (p-value = 5.28 × 10 −7 ) and BTA18 (p-value = 8.56 × 10 −7 ), in which, for the BTA28 peak, the most significant SNPs were associated with ZMIZ1 and PPIF; and for the BTA18 peak, with ARHGAP35, NPAS1, TMEM160, ZC3H4, and SAE1 (Table 2). Moreover, the genes presented in Tables 2 and 3 were previously reported in the Animal QTLdb to be associated with a large number of QTL regions. Most of these QTL are related to milk production traits, but there are some others linked to reproduction, health, production, and exterior (conformation and appearance). were associated with ZMIZ1 and PPIF; and for the BTA18 peak, with ARHGAP35, NPAS1, TMEM160, ZC3H4, and SAE1 (Table 2). Moreover, the genes presented in Tables 2 and 3   Table 2; Table 3 were previously reported in the Animal QTLdb to be associated with a large number of QTL regions. Most of these QTL are related to milk production traits, but there are some others linked to reproduction, health, production, and exterior (conformation and appearance).

Commonly Identified Genes for Two or More Traits
Similar genomic regions were detected to be associated with different LP and milk production traits (Figure 2), indicating potential pleiotropic effects. LP presented common candidate genes with MILK, FAT, FAT%, and PROT%, where 15 candidate genes were concurrently associated with LP and the mentioned production traits: CXCL13 and LDB2 (MILK); ZMIZ1, bta-mir-371, NLRP12 and PPIF (FAT); INPP5A (FAT%); and SER-PINB6, SERPINB9, IGF2BP1, and DLX4 (PROT%). Additionally, LP, FAT%, and PROT% showed common candidate genes between the three traits simultaneously: ABI3, GNGT2, B4GALNT2, and PHOSPHO1, demonstrating their importance not only in the genetic background of milk solids production, but also in the duration of the lactation peak.
As a result of the high genetic correlation existing between the milk production traits, 98 genes were commonly associated with the five milk production traits (MILK, FAT, FAT%, PROT, and PROT%), as demonstrated in Figure 2. All those common genes were located on BTA14, reinforcing the impact of this genomic region on milk production traits. The candidate genes located closer to genomic region linked to the top SNP found (BTA14: 465,742 bp) among MILK, FAT, FAT%, and PROT% were PPP1R16A, FOXH1, KIFC2, and CYHR1. In addition, the closer common genes related to the top SNP found for PROT (BTA14, BP= 827,938) were GRINA, PARP10, and PLEC.
The gene interaction network analysis revealed a strong connection between LP and milk production traits ( Figure 3). Genes such as ARHGAP35, TMEM160, and SAE1 (BTA18), which were highly associated with LP, demonstrated to be linked with ARHGAP39, PPP1R16A, FOXH1, and CYHR1 (BTA14), which were significantly associated with milk production. Three big clusters were formed rounding ARHGAP39, TONSL, ADCK5, reinforcing that the molecular interactions among these three genes seem to be related to the control of the gene expression and protein regulation of milk traits.

Functional Analyses of Candidate Genes
Gene ontology (GO) enrichment analyses were performed to better understand the functional role of the candidate genes identified. GO terms for 14 biological processes and 12 molecular functions were significantly enriched, with 44 genes for MILK, 86 genes for FAT, and 33 for FAT% (Table 4). Furthermore, GO terms for 26 biological processes and nine molecular functions were significantly related to 41 genes for PROT, 109 genes for PROT%, and 6 for LP (Table 5). Five GO terms were found to be related to two or more traits. The GO:1903494 term associated with the response to dehydroepiandrosterone, GO:1903496 linked to the response of 11-deoxycorticosterone, and GO:0032355 associated with the response to estradiol were commonly identified for MILK, PROT, and PROT%. In addition, GO:0015125, related to bile acid transmembrane transporter activity, was identified for FAT and FAT%; and GO:0005149, which is related to the interleukin-1 receptor binding, was associated with FAT% and PROT%.

Commonly Identified Genes for Two or More Traits
Similar genomic regions were detected to be associated with different LP and milk production traits (Figure 2), indicating potential pleiotropic effects. LP presented common candidate genes with MILK, FAT, FAT%, and PROT%, where 15 candidate genes were concurrently associated with LP and the mentioned production traits: CXCL13 and LDB2 (MILK); ZMIZ1, bta-mir-371, NLRP12 and PPIF (FAT); INPP5A (FAT%); and SERPINB6, SERPINB9, IGF2BP1, and DLX4 (PROT%). Additionally, LP, FAT%, and PROT% showed common candidate genes between the three traits simultaneously: ABI3, GNGT2, B4GALNT2, and PHOSPHO1, demonstrating their importance not only in the genetic background of milk solids production, but also in the duration of the lactation peak.  milk production traits (Figure 3). Genes such as ARHGAP35, TMEM160, and SAE1 (BTA18), which were highly associated with LP, demonstrated to be linked with ARHGAP39, PPP1R16A, FOXH1, and CYHR1 (BTA14), which were significantly associated with milk production. Three big clusters were formed rounding ARHGAP39, TONSL, ADCK5, reinforcing that the molecular interactions among these three genes seem to be related to the control of the gene expression and protein regulation of milk traits.

Discussion
The majority of previous GWAS carried out to search for genomic regions associated with economically important traits in dairy cattle have been conducted using MD or HD SNP panel data. However, iWGS can increase the statistical power for detecting QTLs and causative variants for complex traits [58]. Based on iWGS-based GWAS, we identified novel genomic regions of interest, revealing novel candidate genes (ARHGAP35, NPAS1, TMEM160, ZC3H4, SAE1, ZMIZ1, PPIF, LDB2, ABI3, SERPINB6, and SERPINB9 for LP; NIM1K, ZNF131, GABRG1, GABRA2, DCHS1, and SPIDR for MILK; NR6A1, EXT2, POLD1, GOT1, and ETV6 for FAT; DPP6, LRRC26, and the KCN gene family for FAT%; CDC14A, RTCA, HSTN, and ODAM for PROT; and HERC3, HERC5, LALBA, and NEURL1 for PROT%), and confirmed previously reported associations. The fairly high number of novel candidate genes for LP and milk production traits supports the use of a denser mapping of the genome for GWAS purposes, as well as the polygenic nature of these traits. Liu et al. [59] evaluating 1220 Holstein cows with a SNP panel containing 124,743 markers identified 10 highly significant SNPs associated with FAT and PROT. However, those authors did not detect any significant SNP related to milk yield, even with a moderate number of markers involved in the GWAS, reinforcing the argument that the use of WGS can substantially contribute to this type of analysis. For instance, numerous QTLs were found for MILK (146), FAT (152), and PROT (166) in five French and Danish dairy breeds when using WGS data [18]. Using iWGS-based GWAS, we identified multiply genomic regions affecting LP, a complex trait with few known QTLs. Therefore, our findings greatly contribute to elucidating the genetic background of LP in Holstein cattle.

Candidate Genes for Lactation Persistency
Among the traits evaluated in this current study, LP is the less explored one, and to our best knowledge, this is the first iWGS-based GWAS for LP, which presents a good opportunity to explore novel genomic regions influencing its phenotypic expression. Despite the fact that the peaks were not well defined due to the highly polygenic nature of LP [7], BTA18 and BTA28 were the ones with the most significant regions. The genes highly associated to LP on BTA18 were ARHGAP35, NPAS1, TMEM160, ZC3H4, and SAE1 and on BTA28 were ZMIZ1 and PPIF. None of these genes were previously linked to LP. Interestingly, most of these genes mentioned above associated with LP were previously linked to fertility traits in dairy cattle [60][61][62], indicating that fertility and LP are genetically correlated in dairy cattle populations [63]. Other studies have also demonstrated the genetic relationship between reproductive traits and LP in dairy cattle. For instance, Muir et al. [64] showed that heifers with difficult first calving tended to have more persistent first lactations and lower peak yields, indicating an antagonistic relationship between calving ease and LP. More recently, Yamazaki et al. [9] reported that differences in LP are related to a cow's ability to conceive after the second calving. Therefore, the best bulls for improving female fertility after the second calving may differ with the production system and herd milk production, given that a strong genetic correlation was verified between LP and fertility traits in Japanese Holstein cattle [9]. Another important finding was the gene network connection among ARHGAP35, TMEM160, and SAE1 (BTA18) and ARHGAP39, PPP1R16A, FOXH1, and CYHR1 (BTA14), as demonstrated in Figure 3. No other study has reported a close network interaction between some of the main genes responsible for LP and milk production traits, indicating potential pleiotropic effects (Figure 3 and Supplemental Figure S2). Further studies should investigate the connection between LP and milk production traits at the molecular level. Evidence of interaction between LP and milk production traits were previously reported by Jakobsen et al. and Yamazaki et al. [65,66], where moderate to high genetic correlations were observed between these two trait groups.
Three GO terms were significantly enriched for LP, GO:0030334 (regulation of cell migration), GO:19003955 (positive regulation of protein targeting to mitochondrion), and GO:0004867 (serine-type endopeptidase inhibitor activity). The genes linked to the GO terms ABI3 and LDB2 (GO:0030334), SAE1 (GO:1903955), and SERPINB6 and SERPINB9 (GO:0004867) could also be considered as novel candidate genes and its molecular role related to LP should be deeper investigated. The ABI3-ABI gene family member 3 was significantly associated in our study not only with LP but also FAT% and PROT% (Figure 2), confirming its influence in the phenotypic expression of bovine milk-related traits. The ABI3 gene was previously related to milk pregnancy-associated glycoproteins in Holstein cattle but its molecular role in dairy cattle has been little explored [67]. SAE1, which also appear as one of the most significant genes for LP, was previously associated with lactation evolution in mice, indicating that this gene plays an important role in the expression of lactation in other mammalian species [68]. SERPINB6 and SERPINB9 are genes belonging to the serpins superfamily of protease inhibitors, which uses a conformational change to inhibit target enzymes. They are central genes controlling many proteolytic cascades, including important mammalian coagulation pathways [69]. Both genes were formerly associated with milk production traits in buffaloes [70] and somatic cell count in Jersey cows and clinical ketosis lactation in Holstein cattle [71,72], but this is the first time that SERPINB6 and SERPINB9 have been associated with LP. Due to its relationship with other dairy cattle traits, its molecular role in the expression of LP should be further investigated.

Candidate Genes for Milk Yield
Several powerful associations detected here support previously reported genomic regions for milk production traits. For instance, the region containing highly significant SNPs on BTA14 for MILK, including ARHGAP39, PPP1R16A, FOXH1, KIFC2, CYHR1, and TONSL was reported in other studies [15,37,73,74]. Atashi et al. [74] reported this region on BTA14 to be associated with 305-day milk yield and peak yield in dairy cows. Other relevant genomic regions were found on BTA20, BTA5, and BTA6, also containing genes previously identified as potentially influencing milk production. MGST1, SLC15A5 (BTA5); MOB1B, DCK, SLC4A4 (BTA6) and NIM1K, ZNF131 (BT20) can be highlighted for its strong significance with MILK (Table 2). Raven at al. [75] reported the link between MGST1 and milk production in multibreed cattle. Additionally, it is noteworthy that the SLC4A4 gene had an important function in the production of MILK in a study with Holstein cattle in the USA [73]. SLC4A4 is a solute transporter, belonging to one of the major transporter superfamilies mostly involved in the active transport of glucose. Glucose uptake by mammary epithelial cells is a crucial stage in milk synthesis, and therefore, directly influences MILK [76]. Furthermore, NIM1K and ZNF131 have been reported to prolong lactation period culminating in higher milk production levels in Canadian dairy cattle [36]. Banos et al. [77] reported that ZNF131 was expressed in the milk transcriptome and the mammary gland of dairy sheep, highlighting the impact of this gene in the process of molecular transcription of regions related to sheep milk production. The importance of zinc finger protein 131 (ZNF131) on the transcription of molecular codes responsible for the milk production seems clear. Lastly, this is the first time that NIM1K and ZNF131 have been directly associated with milk yield in dairy cattle.
Four other novel candidate genes for MILK presented significant GO terms (FDR ≤ 1%) in the enrichment analyses. GO:0004890 and GO:0005230: GABRG1 (BTA14) and GABRA2 (BTA6); GO:0003273: DCHS1 (BTA15) and GO:0071479: SPIDR (BTA14), shown in Table 4, can be highlighted as genes that might play fundamental roles in metabolic functions involving MILK. Interestingly, the four genes were previously linked to FAT [62,78,79] or other milk components [80], confirming its close relationship with milk production traits. For instance, γ-aminobutyric acid type A genes (GABRG1 and GABRA2), which contributes to γ-aminobutyric acid (GABA) chloride ion channel activity and participates in GABA-A receptor activity, were previously associated with milk production traits in Holstein and Xinjiang Brown cattle [81,82].

Candidate Genes for Fat Yield
As observed for MILK, the GWAS for FAT revealed several genomic regions with highly significant SNPs associated with 989 genes spread across 21 chromosomes. Besides the genes located in BTA14 (ARHGAP39, PPP1R16A, FOXH1, KIFC2, CYHR1, and TONSL) identified for MILK, a highly significant genomic region was identified for FAT on BTA5, which harbors the SLC15A5 gene. SLC15A5 was previously associated with FAT, presented in a large region of 88.26-93.69 Mb on BTA5, that seems to have a cluster of additive effects linked with MGST1, PLEKHA5, and ABCC9 [62]. The same genes (MGST1, PLEKHA5, and ABCC9) were also significantly associated with FAT in our study (Supplemental Table S1), suggesting that this genomic region plays an important role in the expression of FAT in Holstein cattle. Other significant peaks for FAT were observed on BTA11, BTA15, BTA16, and BTA26, revealing novel candidate genes. On BTA11, the most significant region contains the NR6A1 gene, which was also included in the significant GO:0000978, demonstrating its potential for being a novel candidate gene associated with FAT. NR6A1 has high homology among different species [83], and acts in the expression of traits linked to metabolism, reproduction, and production, as demonstrated in a study with swine where this gene was related with fat deposition [84]. Another relevant gene located on BTA11 that might be interesting to investigate further is OLFML2A. Besides been highly associated with FAT, OLFML2A was also found in VEP analysis, confirmg its potential as a novel candidate gene associated with FAT. OLFML2A (Olfactomedin Like 2A) is a protein coding gene related to protein homodimerization activity and extracellular matrix binding. To our best knowledge, this gene has never been associated with any milk trait in the literature before and its influence on milk related traits warrants a deeper investigated. Interestingly, the OLFML2A gene was found differently expressed in a study of fat depotspecific gene signatures in mice, contributing to the distinct patterns of extracellular matrix remodeling and adipose function in different fat depots [85]. Additionally, on BTA15, the most significant SNP found was associated with EXT2, which was also related with a significant GO term responsible for cell differentiation (GO:0030154). EXT2 was previously cited as a suggestive gene associated with milk iron content in Jersey cows [86]. However, this is the first time that this gene has been directly associated with FAT in Holstein cattle.
Among the most significant GO terms, GO:0055089 is involved in fatty acid homeostasis in which important genes such as DGAT1 are included. The link between the DGAT gene with FAT and other milk production traits is widely known [28,87]. However, out of the five genes related to GO:0055089, POLD1 and GOT1 have not been previously associated with FAT. POLD1, which is located on BTA18, was previously associated with PROT in Nordic Holstein cattle, but its function in the expression of milk production traits is not yet well established [88]. GOT1, located on BTA26, was previously associated with milk fatty acids acting in the transformation of citrate by ATP-citrate lyase in the cytosol, which is required for fatty acid synthesis [89]. Another novel candidate gene for FAT is ETV6, which was identified in two GO terms (GO:0030154 and GO:0000978), representing cell differentiation and RNA polymerase II core promoter proximal region sequence-specific DNA binding, respectively. ETV6 was also previously associated with FAT% in Brown Swiss cattle [90] and MILK in Holstein cattle [78], but this is the first time that ETV6 was linked to FAT.

Candidate Genes for Fat Percentage
For FAT%, 4459 SNPs, distributed across 22 chromosomes, associated with 2016 annotated genes were found. Among the highly significant regions, it can be highlighted the SLC15A5 (BTA5) and MRPS30 (BTA20) genes, besides those already mentioned from the BTA14 also found in MILK and FAT. SLC15A5, a solute carrier family member, was also recently associated with FAT% in Holstein cattle [62,91], confirming its importance to milk quality. Furthermore, other genes from the solute carrier family demonstrated to be relevant for FAT% since SLCO1A2, SLCO1B3, SLCO1C1 and SLCO2B1 were associated in the GO:0015125, one of the significant GO terms enriched for this trait (Table 4). MRPS30 was first associated with FAT% in Jiang et al. [62] and previously linked to MILK in studies reported by Fang et al. and Cai et al. [92,93].
The most significant GO terms for FAT% are GO:0005149 (interleukin-1 receptor binding) and GO:0015459 (potassium channel regulator activity). The GO:0005149 presented the highest significance (8.5 × 10 −8 ) and showed a big group of IL family genes associated to FAT% (Table 4). There are few reports in the literature integrating the interleukin-1 receptor with dairy cattle-only research that associated genes from this gene family with mastitis or fertility indicators [94,95]. However, there is proven evidence in other species, including humans, of the relationship between the IL gene family and the structuring of fat in different tissues, which may demonstrate the importance of some of these genes in the context of the structuring of milk fat [96,97]. Another important GO term found was the GO:0015459, which is related to the potassium channel activity. The main genes involved in this GO are DPP6 (BTA4), LRRC26 (BTA11) besides others from the KCN gene family. Genes from the KCN family were previously associated with milk traits [98], but its molecular involvement with FAT% needs to be further explored. A possible mechanism that might be worth investigating is that, in milk, potassium is correlated with lactose, and therefore with milk yield, via osmotic regulation [99]. Therefore, changes in percentage traits could be driven by potassium-induced changes in milk volume.

Candidate Genes for Protein Yield
For PROT, CDC14A and RTCA (BTA3); and MAF1, SHARPIN, CYC1, EXOSC4, PARP10, OPLAH, GRINA, PLEC (BTA14) are key candidate genes for milk protein expression. On BTA3, this is the first time that CDC14A and RTCA are associated with PROT, but interestingly both genes were already detected in signatures of selection of other milk production traits in Valdostana cattle populations [100]. Furthermore, RTCA was previously associated with milk production traits in buffalo [101], but its connection with milk traits has not been described yet. The MAF1 gene located on BTA14 has been associated with milk protein synthetic capacity and for this reason, it has been pointed out as a key candidate genes for PROT [34,62,102]. The other genes identified on BTA14 (SHARPIN, CYC1, EXOSC4, PARP10, OPLAH, GRINA, and PLEC) were recently strongly associated with milk production traits, including PROT [37,93,102]. According to Jena et al. [103], SHARPIN influences mammary gland development and controls extracellular matrix organization of stroma during branching morphogenesis, which induces alveologenesis during pregnancy and lactation. Moreover, Lin et al. [104] found strong association of SHARPIN, CYC1, EXOSC4, and PARP10 with milk serum albumin, which is one of the main protein contents of cattle milk. These facts reinforce the hypothesis that the large genomic region where these genes are located (0.5 Mb upstream and downstream from SHARPIN) is important for milk protein expression. It is important to highlight that both GRINA and PARP10 were found in our VEP analysis, confirming that those two genes are highly associated with PROT and should be considered as novel relevant variants for this trait. The Glutamate Receptor Ionotropic NMDA-Associated Protein 1 (GRINA) belongs to the Lifeguard family and is involved in calcium homeostasis [105]. This gene was previously reported to play an important role in the lipid, major proteins, and cholesterol homeostasis in milk content of dairy cows, suggesting that GRINA might contribute to the regulation of solids present in dairy milk [12,102]. PARP10, which is a member of the poly (ADP-ribose) polymerases family, is related to several essential biological functions, such as immunity, metabolism, apoptosis, and DNA damage repair [106]. From a physiological perspective, the concentration of albumin in milk is influenced by pathological and genetic factors, which could connect the action of PARP10 on the regulation of albumin in dairy cattle milk content [104].
The GO enrichment analyses for PROT (Table 5) also revealed the casein gene cluster containing the CSN1S1, CSN1S2, CSN2, and CSN3 genes, which encode αs1, αs2, β, and κ casein, respectively. These genes have a strong influence on casein synthesis in cattle milk, and polymorphisms in this region have significant impact on milk protein composition [107]. In this context, HSTN and ODAM, which are located at the same region of the casein gene cluster, can be considered as novel candidate genes for PROT due to their close binding on BTA6 with CSN1S1, CSN1S2, CSN2, and CSN3, where all these genes are in linkage disequilibrium.

Candidate Genes for Protein Percentage
With several genomic regions adjacent to those mentioned in the other milk production traits, PROT% was the trait with the largest number of significant markers, i.e., 5519 SNPs, spread across 24 chromosomes and 2739 annotated genes. The most significant regions were found on BTA5, BTA6, BTA14, BTA15, and BTA20, with special emphasis on BTA6, BTA20, and BTA14. The most significant genes located on BTA6 are HERC3, HERC5, and PIGY. HERC3 and HERC5 can be considered as novel candidate genes for PROT% as they were previously associated with PROT and FAT%, respectively [29,30]. Genes from the HERC family of ubiquitin ligases associate with prolactin to regulate important milk proteins such as β-casein [108], turning these two genes into strong candidates to be intimately related to PROT%. On the BTA20 chromosome, PAIP1, C20H5orf34, TMEM267, CCL28, HMGCS1, and NIM1K are the most significant genes located in this genomic region. Of those, only CCL28 was previously reported to be associated with PROT% in North American Holstein cattle [62]. However, the other five genes also deserve special attention, not only for being highly associated with PROT% but also for being grouped in a narrow genomic region (31.20-31.50 Mb) that possibly makes these genes act together in the expression of PROT%. The C-C motif chemokine ligand 28 gene (CCL28) belongs to the subfamily of small cytokine CC genes that are involved in immunoregulatory and inflammatory processes [109]. Because of its relation with antimicrobial activity, CCL28 can play an important role in mastitis control and thus, indirectly influencing milk production [110]. This gene seems to be a strong candidate gene, as it was identified in our VEP analysis and is directly associated with one of the markers with one of the highest significance levels for PROT% on the BTA20.
Eighteen GO terms were significantly enriched (Table 5) and associated with various processes, especially GO:0005149 (interleukin-1 receptor binding), GO:1903496 (response to 11-deoxycorticosterone), GO:1903494 (response to dehydroepiandrosterone), and GO:0007595 (lactation). As found for FAT%, the interleukin-1 receptor binding presents a large group of IL family genes for PROT%, which are crucial in the expression of milk production traits in many species, including dairy cattle [97]. The terms GO:1903496 and GO:1903494 were also found for PROT, reinforcing the relevance of the casein gene cluster for milk protein, in both forms, total yield and percentage. Additionally, in both terms, the gene LALBA (milk whey protein α-lactoalbumin) was also identified for PROT%, highlighting the involvement of this well-known gene with PROT%, as previously reported in other dairy cattle studies [111,112]. On GO:0007595, many genes known for its association with PROT% were identified such as STAT5A, STAT5B, ATP2B2, CSN3, CSN2, and PRLR, validating the connection of these genes with milk composition in dairy cattle. VDR has also been previously linked to PROT% in Holstein and Jersey cows [113].

Common Genes Identified in GO Terms
Several overlapping genomic regions were found either between milk production traits or between milk production and LP traits (Figure 2). This fact supports the hypothesis that many genes associated with these traits could have a pleiotropic effect in dairy cattle [16]. According to Oliveira et al. [16], the pleiotropic effects observed on genes related to milk traits suggest a biological function on the use of energy resources directly affecting the synthesis of milk and solids. Few genes have been reported to commonly influence LP and milk production traits, which reinforces the fact that the common genes found in our study can help to elucidate the molecular interactions among various candidate genes with potential pleiotropic effects.
Fifteen genes were significantly associated with LP and at least one of the milk production traits, as CXCL13 and LDB2 (MILK); ZMIZ1, bta-mir-371, NLRP12 and PPIF (FAT); INPP5A (FAT%); SERPINB6, SERPINB9, IGF2BP1 and DLX4 (PROT%). Additionally, LP, FAT%, and PROT% had common candidate genes between the three traits simultaneously: ABI3, GNGT2, B4GALNT2, and PHOSPHO1, demonstrating their importance on the genetic structure of milk solid production, but also in the duration of peak lactation. Out of those mentioned genes, SERPINB6 and SERPINB9 have been previously associated with somatic cell count in Jersey cattle [16] and milk production traits in water buffalo [114]. Furthermore, it is noteworthy that the genes LDB2 (MILK), SERPINB6, SERPINB9 (PROT%) and ABI3 (FAT% and PROT%) were also present in the significant GO terms of biological process, revealing its relationship with regulation of cell migration, and regulation of protein and enzyme inhibitor activity.
As expected, due to the high genetic correlation between the studied traits, numerous genes were simultaneously associated with the five milk production traits. Oliveira Junior et al. [13] working with the same North American Holstein population found high genetic correlation (>0.48) among MY, FAT, PROT, FAT%, and PROT% highlighting the close genetic relationship between these traits. Even in multibreed dairy populations the high correlations among production traits are usually observed [115]. In total, 98 common genes were identified for all milk production traits ( Figure 2). Interestingly, all these genes are located on BTA14, reinforcing the importance of this genomic region for milk production traits. The genes located closer to the genomic region linked to the top SNP found among MILK, FAT, FAT%, and PROT% are PPP1R16A, FOXH1, KIFC2, and CYHR1. Nayeri et al. [15] reported that PPP1R16A, FOXH1, and CYHR1 were commonly linked with FAT and FAT% in Holstein cattle and Cai et al. [93] also demonstrated that PPP1R16A, FOXH1, KIFC2, and CYHR1 presented a potential pleiotropic effect on MILK, FAT, and PROT. Finally, the closer common genes related to the top SNP found for PROT are GRINA, PARP10, and PLEC. GRINA and PARP10 were also reported by Cai et al. [93] as genes with pleiotropic effect on MILK, FAT, and PROT. Furthermore, PLEC was the only gene reported to be commonly associated with MILK, PROT, and FAT in Chinese Holstein cattle [17], which is in agreement with our findings.

Potential Implications and Limitations
Several novel candidate genes associated with LP and milk production traits in dairy cattle were identified, while previous associations were also confirmed. These findings will be useful for optimizing genomic prediction of breeding values in Holstein cattle and other dairy breeds, by adding the significant SNPs into commercial SNP panels to increase the accuracy of predictions and also give differential weight to these important genomic regions through biology-driven genomic prediction methods [116]. Furthermore, the genomic regions revealed are initial targets for future studies investigating the molecular mechanisms influencing the phenotypic expression of milk related traits. For instance, some important candidate genes found require a better understanding of their molecular functions, such as SAE1, SERPINB6, and SERPINB9, which were highly associated with LP but their biological functions are not clear. SAE1-SUMO activating enzyme subunit 1, is a gene linked to one of the most significant SNPs for LP and also present in one of the enriched GO terms found here, was previously associated with dairy cow mammary gland epithelial cells [117], but its molecular functions related to LP are still unclear.
Additionally, even with an application of a strict quality control to reduce the influence of the poorly imputed variants and individuals on the GWAS analysis, it is still expected that some removed low-frequency variants could be associated with the studied traits and not identified here. Despite the great advantage of identifying causal mutations at low frequency [20], they could also be false positives. Future studies should focus on the biological validation of the key candidate genes reported in this study. This could be done based on in-vitro experiments and gene knock-out models.

Conclusions
We have shown that the use of imputed whole-genome sequence data for GWAS enabled the identification of a high number of SNPs associated with lactation persistency and milk production traits in dairy cattle. Several genomic regions and candidate genes were identified, which are widely distributed across all autosomal chromosomes, especially BTA5, BTA6, BTA14, BTA18, BTA20, and BTA28. This study also confirmed the importance of the BTA14 for milk production traits. Additionally, many genomic regions with potential pleiotropic effects were identified. Numerous novel candidate genes were revealed: ARHGAP35, NPAS1, TMEM160, ZC3H4, SAE1, ZMIZ1, PPIF, LDB2, ABI3, SERPINB6, and SERPINB9 (LP); NIM1K, ZNF131, GABRG1, GABRA2, DCHS1, and SPIDR (MILK); NR6A1, OLFML2A, EXT2, POLD1, GOT1, and ETV6 (FAT); DPP6, LRRC26, and KCN gene family (FAT%); CDC14A, RTCA, HSTN, and ODAM (PROT); HERC3, HERC5, LALBA, CCL28, and NEURL1 (PROT%), involved in key biological pathways such as fatty acid homeostasis, transporter regulator activity, response to progesterone and estradiol, response to steroid hormones, and lactation. Lastly, another relevant finding was that a variety of genomic regions related to LP and milk production were previously associated with fertility traits in dairy cattle, confirming the links between these two groups of traits. Our findings contribute to a better understanding of the molecular mechanisms underlying the phenotypic expression of lactation persistency and milk production traits, which can be useful for improving the genomic evaluation of important economic traits in the Holstein cattle.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12111830/s1, Figure S1: Manhattan plots with truncated Y-axis for the GWAS results for milk yield (MILK), fat yield (FAT), fat percentage (FAT%), protein yield (PROT) and protein percentage (PROT%), based on imputed whole-genome sequence data. Figure S2: Gene interaction network for genes associated with lactation persistency in North American Holstein cattle. Table S1: Description of significant SNPs, associated genes id and name, and gene biotype for milk yield (MILK) in North American Holstein cattle. Table S2: Description of significant SNPs, associated genes id and name, and gene biotype for fat yield (FAT) in North American Holstein cattle. Table S3: Description of significant SNPs, associated genes id and name, and gene biotype for fat percentage (FAT%) in North American Holstein cattle. Table S4: Description of significant SNPs, associated genes id and name, and gene biotype for protein yield (PROT) in North American Holstein cattle Table S5: Description of significant SNPs, associated genes id and name, and gene biotype for protein percentage (PROT%) in North American Holstein cattle. Table S6: Variants identified in variant effect predictor analysis for LP, MILK, FAT, FAT%, PROT, and PROT% in North American Holstein cattle. Table S7: Description of significant SNPs, associated genes id and name, and gene biotype for lactation persistency (LP) in North American Holstein cattle.