Genome-Wide Association Study Using Whole-Genome Sequence Data for Fertility, Health Indicator, and Endoparasite Infection Traits in German Black Pied Cattle

This genome-wide association study (GWAS) aimed to identify sequence variants (SVs) and candidate genes associated with fertility and health in endangered German Black Pied cattle (DSN) based on whole-genome sequence (WGS) data. We used 304 sequenced DSN cattle for the imputation of 1797 genotyped DSN to WGS. The final dataset included 11,413,456 SVs of 1886 cows. Cow traits were calving-to-first service interval (CTFS), non-return after 56 days (NR56), somatic cell score (SCS), fat-to-protein ratio (FPR), and three pre-corrected endoparasite infection traits. We identified 40 SVs above the genome-wide significance and suggestive threshold associated with CTFS and NR56, and three important potential candidate genes (ARHGAP21, MARCH11, and ZNF462). For SCS, most associations were observed on BTA 25. The GWAS revealed 61 SVs, a cluster of 10 candidate genes on BTA 13, and 7 pathways for FPR, including key mediators involved in milk fat synthesis. The strongest associations for gastrointestinal nematode and Dictyocaulus viviparus infections were detected on BTA 8 and 24, respectively. For Fasciola hepatica infections, the strongest associated SVs were located on BTA 4 and 7. We detected 200 genes for endoparasite infection traits, related to 16 pathways involved in host immune response during infection.


Introduction
Rapid advances in sequencing technology have opened new opportunities for dairy cattle breeding. Whole-genome sequencing (WGS) and genotype imputation to wholegenome sequence genotypes is an effective method to identify genes and causal mutations for single-gene traits, genetic defects, and complex polygenic traits in cattle [1]. Compared to commonly used single nucleotide polymorphism (SNP) chip arrays, WGS data enable discovering both common and rare variants affecting complex polygenic traits in cattle [2]. Consideration of complete linkage information in WGS data plus the possible consideration of the full genetic variance may contribute to the identification of variants, which do not exceed the significance threshold in genome-wide association studies (GWAS) using commercial SNP panels [2,3]. As an outstanding step forward, using WGS data in GWAS enhances the discovery of causative mutations [4,5] and increases the power to map genes for low heritable functional traits in cattle [4,6,7].
Powerful gene mapping approaches for functional traits is of special importance for the conservation of local and endangered breeds, since these are less competitive to large cattle populations with regard to economically relevant traits [8,9]. Being predominantly kept in outdoor systems, local breeds are considered to be robust to harsh environmental conditions, displaying favorable adaptive trait characteristics such as heat stress or disease resistance [9][10][11]. The conservation of local endangered breeds is essential to offer solutions for future challenges in animal husbandry and to preserve genetic diversity [11]. In this regard, WGS data may include genomic markers which can be used to improve functional traits [12]. Moreover, compared with pedigree data, deep and dense genomic information as provided by WGS data improves inbreeding estimations and is powerful to identify deleterious variants in a population. Both aspects, i.e., controlling inbreeding and deleterious allele frequencies, are important mechanisms in genomic selection programs to maintain long-term selection response and genetic diversity [13].
With a small population size of~2560 cattle, the German dual-purpose Black Pied cattle (DSN, German: Deutsches Schwarzbuntes Niederungsrind) is an endangered local breed in Germany [14]. The breed has a long breeding history in the German grassland region East Frisia, as well as in the German federal states of Lower Saxony and Brandenburg located in the eastern part of Germany [15,16]. DSN is considered the founder breed of the modern Holstein Friesian (HF) and characterized by higher fertility and milk protein and fat content compared to HF [10,17]. DSN cows are well adapted to grazing systems, and thus, 50% of all DSN cows in Germany are kept under organic pasture conditions [18]. However, grazing is associated with an increased risk for endoparasite infections [19].
Gastrointestinal nematodes (GIN), the bovine lungworm (Dictyocaulus viviparus), and the liver fluke (Fasciola hepatica) are the three most important helminthic species in pastured dairy cattle [20]. The annual estimated costs of these three helminth infections were 941 million euros in dairy cattle in Europe due to impairments in milk production and fertility [20]. Since anthelmintic treatment against endoparasite infections is restricted in organic dairy farming, there is a need to develop breeding strategies to improve cows' resistance to helminthic parasites. In this regard, research has aimed to investigate the genetic background of resistance to endoparasite infections in dairy cattle since 2017 [21][22][23]. Surprisingly, in a cross-classified research design for selection line comparisons, May et al. [21] identified higher GIN and D. viviparus infection rates for DSN compared to HF. An ongoing GWAS [22] detected the strongest associations for GIN and D. viviparus infections on BTA 2,5,8,15,17,21, and 24 in a population of 148 DSN cows with 700K imputed genotypes. For F. hepatica, associated significant SNP markers were identified on BTA 1, 7, and 28. The genes ALCAM, CDH2, EGFR, and PHLPP1 were annotated as main candidates involved in immunological functions during endoparasite infections in DSN. As an extension to commercial SNP chip applications, GWAS based on WGS data is expected to unravel the genetic architecture of endoparasite resistance and further important functional traits in DSN much deeper. For example, Twomey et al. [7] applied a GWAS for endoparasite infection traits in Irish dairy and beef cattle and demonstrated that only 0 to 11% of all quantitative trait loci (QTL) from imputed WGS data were detected when using 50 K genotypes.
Further GWAS in DSN were based on commercial 50 K SNP chip panels with focus on milk production [24] and udder health traits [25]. These studies revealed candidate genes previously described in HF and other cattle breeds, raising the interest in identifying species-specific variants for functional traits in DSN with WGS data. Clinical mastitis is the most common disease in DSN with incidences up to 26% [25], raising the interest in improving somatic cell counts (SCC) in milk as an indicator for udder health in DSN via genomic selection. Jaeger [10] studied udder health indicator traits in DSN phenotypically and identified a strong detrimental impact of increased SCC on feed intake and rumination. Although well adapted to grazing systems, DSN are exposed to metabolic stress due to heat stress impact on pastures and associated mobilization of body reserve in response to restricted feed intake. The fat-to-protein ratio (FPR) is a valuable indicator for metabolic health (e.g., ketosis) and for metabolic stability in this regard [26,27]. Moreover, FPR might be a novel indicator trait for robustness in DSN, especially in the context of challenging heat stress environments in outdoor systems [28]. Due to their favorable grazing abilities, the conservation of DSN is financially supported by the German government. However, ongoing DSN breed competitiveness implies an optimization of the preventive health management in grazing and organic farming systems via genomic selection, especially to improve metabolic stability and resistance to infectious diseases. Therefore, the present study aimed to identify genome-wide associations, potential candidate genes and pathways for female fertility, SCC as an indicator for udder health, FPR as an indicator for metabolic health and stability, and endoparasite infections in DSN based on WGS data. Providing genomic markers for fertility and functional health traits is important for the development of future genomic selection programs to preserve DSN and to improve the breed adaption towards pasture environments.

Cow Traits
Cow traits were available for 1886 DSN cows from eight dairy farms located in the German federal states of Brandenburg (five farms), Hesse (one farm), and Lower Saxony (two farms). The cows were born between 2005 and 2016. The first calving age ranged from 22 to 40 months (mean: 27.3 months). Fertility traits included calving-to-first service interval (CTFS) and the non-return after 56 days (NR56 = insemination success proved at day 56 after first insemination) in first parity cows. For the udder and metabolic health indicator traits SCC and FPR, we included the first test-day record between days in milk (DIM) 5 and 40 from the first parity. SCC was log-transformed into somatic cell score (=SCS =log 2 (SCC/100,000) + 3) [29]. For fertility and health indicator traits, we excluded cows with a genetic breed (DSN) percentage lower than 90% according to our own developed algorithm to clearly differentiate between DSN and HF cows [30].
The endoparasite infection traits considered repeated measurements for fecal egg counts for GIN (FEC-GIN), fecal egg counts for F. hepatica (FEC-FH), and fecal larvae counts (FLC) for D. viviparus (FLC-DV). A modified McMaster technique [31] with a sensitivity of 25 eggs/g feces was used to determine FEC-GIN. For GIN, strongylid eggs were the predominant morphotype followed by Strongyloides papillosus and Capillaria spp. eggs. Fecal egg counts for F. hepatica were examined by the sedimentation technique using 10 g feces per sample. The fecal larvae count for D. viviparus was determined with the Baermann technique using 40 g feces per sample [32]. Endoparasite traits (FEC-GIN, FLC-DV, FEC-FH) were available for an initial dataset including 1166 untreated and pastured Black and White dairy cows (including DSN) examined in 2015 [21]. Using this initial dataset, endoparasite traits were pre-corrected for fixed effects via linear mixed models in the statistical software SAS (version 9.4 [33]) as described in May et al. [22]. In this regard, farm, parity, genetic line (DSN and other Holstein Friesian selection lines), season of parasitological examination, and lactation stage of cows were included as fixed effects. The pre-corrected phenotypes (residuals) for the three endoparasite traits (FEC-GIN, FEC-FH, and FLC-DV) from the linear mixed models are later denoted as RES-GIN, RES-FH, and RES-DV, respectively. Descriptive statistics for FEC-GIN, FLC-DV, and FEC-FH in the initial dataset of 1166 cows are presented in Table 1.

Whole-Genome Sequencing and Imputation of 50K Genotypes
Whole-genome sequencing data were available for 304 DSN cattle, of which 47 were bulls (all available DSN sires used for artificial insemination) and 257 were cows from eight different herds, reflecting the whole phenotypic range for milk yield, milk composition, and reproduction traits in the DSN population. Sequencing was performed on the Illumina NovaSeq platform (Novogene Bioinformatics Technology Co., Ltd., Beijing, China) with 150 paired end reads and 15× coverage. Sequence read mapping, variant calling, and recalibration were performed following the 1000 Bull Genome (http://www.1000bullgenomes.com; accessed on 21 February 2020 guidelines. Lower quality sequence variants (SVs) were discarded by applying the machine learning method "variant quality score recalibration" in the Genome Analysis Toolkit (GATK, version 4.1.3.0 [36]) and using training and truth set variants provided by the 1000 Bull Genomes database. After, the dataset included 20,567,619 SVs (including 18.5 million SNPs and 2.0 million indels). SVs with a minor-allele frequency (MAF) <1% and a call rate < 95% were discarded. SVs with 5% of Mendelian inconsistences (i.e., opposing homozygotes detected from 156 sire-offspring pairs) were removed. Moreover, SVs with a quality by depth (QD) <10 and coverage <3000 were removed in case they were not predicted as high or moderate impact by the variant effect predictor (VEP) [37]. After quality filtering, the dataset contained 16,175,216 SVs from 304 DSN. This dataset provided a reference panel for the imputation of 1797 DSN genotyped with the BovineSNP50 (50 K) Bead Chip V2 (Illumina Inc., San Diego, CA, USA). The average numerator relationship between the 304 cattle from the reference panel and the 1797 genotyped cows for imputation was 6.7%. The inbreeding coefficients were 1.9% for the reference group and 2.1% for the imputation group. The imputation was performed in the software BEAGLE (version 5.1 [38]) based on a one-step imputation accuracy approach [39]. The average imputation accuracy was 97.04%. SVs with MAF <5%, linkage disequilibrium (r 2 ) <0.5 (output from BEAGLE), and significant deviation from the Hardy-Weinberg equilibrium (HWE, p < 10 −6 ) were discarded from the imputed dataset. Then, the 1797 imputed DSN genotypes were merged with the 304 sequenced DSN, resulting in a total number of 12,164,173 SVs (including 11,043,497 SNPs and 1,120,676 indels) from 2101 sequence level DSN genotypes.

Quality Control of Whole-Genome Sequence Genotypes
Imputed sequence-level genotypes and cow traits were merged, resulting in a total number of 1683 cows with phenotypes for fertility traits, 1638 cows with phenotypes for health and metabolic stability indicator traits, and 142 cows with phenotypes for pre-corrected endoparasite infection traits. Quality control of imputed WGS genotypes was performed within the three datasets in three consecutive runs in PLINK (version 1.9 [40]). SVs with MAF <5% and significant deviation from HWE (p < 10 −6 ) or a call rate <95% were discarded. After quality control, 11,391,082 SVs (10,343,725 SNPs and 1,047,357 indels) remained for cows with phenotypes for CTFS and NR56, 11,413,456 SVs (10,363,951 SNPs and 1,049,505 indels) remained for cows with phenotypes for SCS and FPR, and 10,595,540 SVs (9,624,332 SNPs and 971,208 indels) remained for cows with precorrected phenotypes (residuals) for endoparasite infection traits (RES-GIN, RES-FH and RES-DV). Descriptive statistics for all traits of cows with imputed sequence level genotypes after quality control is shown in Table 1.

Genome-Wide Association Analyses
We applied a single marker linear mixed model in the software package GCTA [34] to identify genome-wide associations for fertility traits (CTFS, NR56), health and metabolic stability indicator traits (SCS, FPR), and pre-corrected endoparasite infection traits (RES-GIN, RES-FH, RES-DV). All GWAS were performed as MLMA-LOCO "leaving one chromosome out" analysis for large datasets using the -mlma and -mlma-subtract-grm options in GCTA. The statistical model for testing single-locus effects was defined as follows: y = Xβ + Zu + Ss + e where y = vector including records for CTFS, NR56, SCS, FPR, RES-GIN, RES-FH, and RES-DV; β = vector of fixed effects (farm, calving year, calving month, and a linear regression on age at first calving for CTFS and NR56; farm, test-day year-season, a linear regression on DIM, and a linear regression on fat percentage for SCS; farm, test-day year-season, and a linear regression on age at first calving for FPR); u = vector of polygenic effects with u~N (0, Gσ 2 u ), with G denoting the genetic similarity matrix among individuals, and σ 2 u the polygenic variance; s = vector for marker effects; e = vector of random residuals; and X, Z, and S were incidence matrices for β, u, and s, respectively.

Candidate Gene Annotations and Pathway Analyses
We applied the biomaRt R package [42,43] from Bioconductor to retrieve 'rs accession numbers' of associated SVs using the getBM() function. Potential candidate genes were queried and assigned to the associated SVs using the current gene annotations from ENSEMBL (release 104) [44] based on the Bos taurus ARS-UCD1.2 genome assembly [45]. A gene was considered as a candidate gene if at least one SV above pSug was positioned in the respective gene and/or within 100 kb up-and downstream of the respective candidate gene. Physiological functions and positions of potential candidate genes were manually reviewed in the ENSEMBL and KEGG [46] databases. In addition, the identified potential candidate genes were manually submitted to the DAVID (version 6.8 [47]) and KEGG pathway databases for pathway analysis.

Calving-to-First Service Interval
The Manhattan plot from the GWAS with CTFS is given in Figure 1A. The genomic inflation factor λ was 1.041. We identified 33 SNPs above pSug on BTA 5, 12, 13, 15, and 28 (Table S1). In addition, one SNP (rs380946888) on BTA 12 surpassed the genome-wide significance threshold pBonf. The highest number of associations was identified on BTA 13 (n = 15) and on BTA 28 (n = 10). The associated markers were annotated to six potential candidate genes on BTA 12, 13, 15, and 28 (Table 2). On BTA 12, we identified the Kelch-like family member 1 (KLHL1) gene. The highest association (p = 2.25 × 10 −7 ) on BTA 13 was found within the Rho GTPase activating protein 21 (ARHGAP21) gene with five SNPs located in the gene and nine SNPs and one indel close to the gene ( Table 2). Further annotated genes were the lin-7 homolog C (LIN7C) gene, ENSBTAG00000052005, and the olfactory receptor family 5 subfamily member 5 (OR5BE5) gene on BTA 15. All associated variants on BTA 28 are located in the choline O-acetyltransferase (CHAT) gene. Figure 1B shows the Manhattan plot from the GWAS with NR56. The genomic inflation factor λ was 1.014. The GWAS revealed six SVs (five SNPs and one indel) above pSug on BTA 8, 11, 13, and 20 for NR56 (Table S1). The associated markers were annotated to three potential candidate genes ( Table 2). The SNPs rs110809463 and rs135364419 on BTA 8 are located in the zinc finger protein 462 (ZNF462) gene. The SNP rs383197946 on BTA 11 is located in the EFR3 homolog B (EFR3B) gene. The SNP rs207515592 on BTA 20 is closely related to the membrane associated ring-CH-type finger 11 (MARCH11) gene. The Manhattan plot from the GWAS with SCS is presented in Figure 2A. The genomic inflation factor λ was 1.072. The GWAS revealed one SNP (rs137783421) above pBonf on BTA 11 and 53 SNPs above pSug on BTA 1, 9, 11, 13, 23, and 25 (Table S2). The majority of associations (n = 47) was found on BTA 25. We identified nine potential candidate genes for SCS (Table 3). The claudin 8 (CLDN8) gene on BTA 1 was annotated to the leukocyte transendothelial migration pathway and to the cell adhesion molecules pathway, playing a major role in gene expressions during Escherichia coli-induced mastitis in dairy cows (Table 4). One marker on BTA 9 was located in the SUMO specific peptidase 6 (SENP6) gene. The SNP rs137783421 (maximum association for SCS) on BTA 11 is located in the Rap guanine nucleotide exchange factor 1 (RAPGEF1) gene. The RAN binding protein 9 (RANBP9) gene on BTA 23 was annotated to the SNP rs876215027. All genomic associations on BTA 25 (n = 47; bp: 6,629,679-6,656,058) are located in or close to the RNA binding fox-1 homolog 1 (RBFOX1) gene.

Somatic Cell Score
The Manhattan plot from the GWAS with SCS is presented in Figure 2A. The genomic inflation factor λ was 1.072. The GWAS revealed one SNP (rs137783421) above pBonf on BTA 11 and 53 SNPs above pSug on BTA 1, 9, 11, 13, 23, and 25 (Table S2). The majority of associations (n = 47) was found on BTA 25. We identified nine potential candidate genes for SCS (Table 3). The claudin 8 (CLDN8) gene on BTA 1 was annotated to the leukocyte transendothelial migration pathway and to the cell adhesion molecules pathway, playing a major role in gene expressions during Escherichia coli-induced mastitis in dairy cows

Fat-to-Protein Ratio
The Manhattan plot from the GWAS for FPR is presented in Figure 2B. The genomic inflation factor λ was 1.088. The GWAS revealed one SNP (rs439994366) above pBonf on BTA 12 and 60 SVs (54 SNPs and six indels) above pSug on BTA 9, 11, 12, 13, 19, 26, and 27 (Table S2). The associated markers were annotated to 19 potential candidate genes (Table 3). The SNP rs439994366 is located in the Von Willebrand factor A domain containing 8 (VWA8) gene on BTA 12, with 15 SVs (thereof two indels) located in the gene and one further SNP at a close distance. The region between 58.4 and 75.3 Mb on BTA 13 harbored a cluster of ten potential candidate genes. On BTA 27, we identified the GINS complex subunit 4 (GINS4) gene, the glycerol-3-phosphatase acetyltransferase 4 (GPAT4) gene, and the thyroid hormone receptor β (THRB) gene.
The glutamate metabotropic receptor 1 (GRM1) gene is related to the calcium signaling, neuroactive ligand-receptor interaction, and phospholipase D signaling pathway, being associated with milk yield in dairy cattle (Table 4). Furthermore, the angiopoietin 4 (ANGPT4) gene and the thyroid hormone receptor β (THRB) gene are located in four pathways previously described for milk fat or protein synthesis in dairy cows (Table 4).

Fat-to-Protein Ratio
The Manhattan plot from the GWAS for FPR is presented in Figure 2B. The genomic inflation factor λ was 1.088. The GWAS revealed one SNP (rs439994366) above pBonf on BTA 12 and 60 SVs (54 SNPs and six indels) above pSug on BTA 9, 11, 12, 13, 19, 26, and 27 (Table S2). The associated markers were annotated to 19 potential candidate genes ( Table 3). The SNP rs439994366 is located in the Von Willebrand factor A domain containing 8 (VWA8) gene on BTA 12, with 15 SVs (thereof two indels) located in the gene and one further SNP at a close distance. The region between 58.4 and 75.3 Mb on BTA 13 harbored a cluster of ten potential candidate genes. On BTA 27, we identified the GINS complex subunit 4 (GINS4) gene, the glycerol-3-phosphatase acetyltransferase 4 (GPAT4) gene, and the thyroid hormone receptor β (THRB) gene.
The glutamate metabotropic receptor 1 (GRM1) gene is related to the calcium signaling, neuroactive ligand-receptor interaction, and phospholipase D signaling pathway, being associated with milk yield in dairy cattle (Table 4). Furthermore, the angiopoietin 4 (ANGPT4) gene and the thyroid hormone receptor β (THRB) gene are located in four pathways previously described for milk fat or protein synthesis in dairy cows (Table 4). Table 3. Potential candidate genes related to the identified sequence variants (SVs) significantly associated with the udder health indicator trait somatic cell score (SCS) and with the metabolic health and stability indicator trait fat-to-protein ratio (FPR) in DSN. The associations between each SV and RES-GIN are shown in the Manhattan plot in Figure 3A. The genomic inflation factor λ was 1.019. We identified 47 SVs (thereof one indel) above pBonf and 270 SNPs and 30 indels above pSug on 14 chromosomes associated with RES-GIN, respectively (Table S3). Most significantly and suggestively associations were identified on BTA 2 (n = 165) and on BTA 24 (n = 89). The strongest associations were identified on BTA 8 with seven markers surpassing pBonf. Additionally, 29 SNPs and one indel on BTA 24 and 10 SNPs on BTA 26 surpassed pBonf. The majority of associations surpassing pSug was found on BTA 2 (n = 165) and BTA 24 (n = 59) (Table S3). Significantly and suggestively associated SVs were annotated to 46 potential candidate genes (Table 5; Table S4). The most interesting candidate genes with at least 10 SVs located in the gene were the contactin-associated protein-like 5 (CNTNAP5) gene and the microtubule-associated protein 2 (MAP2) gene on BTA 2. On BTA 24, we identified 23 SVs (22 SNPs and one indel) located in the α kinase 2 (ALPK2) gene and 58 further SVs (thereof one indel) in close distance (<19.2 kb) to ALPK2 ( Figure 3B). Both genes NEDD4 like E3 ubiquitin protein ligase (NEDD4L) and MALT1 paracaspase (MALT1) flank APLK2, forming an association cluster (bp: 57,593,275-57,876,049) with a total number of 79 associated SNPs and three associated indels ( Figure 3B). In addition, we detected ten SNPs in the cartilage acidic protein 1 (CRTAC1) gene on BTA 26. Table 4, the gene activin receptor type 1C (ACVR1C) on BTA 2 is related to the cytokine-cytokine interaction pathway and to the TGF-ß signaling pathway, both involved in host immune response mechanisms. The adenylate cyclase 1 (ADCY1) on BTA 4 is part of the chemokine signaling and estrogen signaling pathways, previously identified for GIN infections in cattle (Table 4). Further pathways involved in host immune response were annotated to the MALT1 and PH domain leucine rich repeat protein phosphatase 1 (PHLPP1) gene on BTA 24. These pathways include the B cell receptor signaling pathway, Ctype lectin receptor signaling pathway, NF-kappa B signaling pathway, PI3K-Akt signaling pathway, and the T cell receptor signaling pathway (Table 4). and three associated indels ( Figure 3B). In addition, we detected ten SNPs in the cartilage acidic protein 1 (CRTAC1) gene on BTA 26. As shown in Table 4, the gene activin receptor type 1C (ACVR1C) on BTA 2 is related to the cytokine-cytokine interaction pathway and to the TGF-ß signaling pathway, both involved in host immune response mechanisms. The adenylate cyclase 1 (ADCY1) on BTA 4 is part of the chemokine signaling and estrogen signaling pathways, previously identified for GIN infections in cattle (Table 4). Further pathways involved in host immune response were annotated to the MALT1 and PH domain leucine rich repeat protein phosphatase 1 (PHLPP1) gene on BTA 24. These pathways include the B cell receptor signaling pathway, C-type lectin receptor signaling pathway, NF-kappa B signaling pathway, PI3K-Akt signaling pathway, and the T cell receptor signaling pathway (Table 4).
The identified candidate genes were annotated to 12 pathways potentially involved in host-parasite interactions ( Table 4). The phospholipase C β 1 (PLCB1) gene on BTA 13 (three SNPs located in the gene) was assigned to four biological pathways potentially involved in host-Fasciola hepatica interactions (Table 4). We identified two suggestively associated SNPs in the protein tyrosine kinase 2 (PTK2) gene in BTA 14, which is part of the chemokine signaling, leukocyte transendothelial migration, and natural killer cell mediated cytotoxicity pathways ( Table 4). The four genes interleukin 21 (IL21), prolactin-related protein VII (PRP-VII), prolactin-related protein IX (PRP9) and SMAD family member 4 (SMAD4) were related to the cytokine-cytokine receptor interaction, JAK-STAT signaling, PI3K-Akt signaling, TGF-β signaling, and Th17 cell differentiation pathways, respectively (Table 4).

Bovine Lungworm (Dictyocaulus viviparus) Infections
The Manhattan plot for the GWAS with RES-DV is shown in Figure 5A. The genomic inflation factor λ was 1.064. We identified 332 SVs (311 SNPs and 21 indels) above pBonf on BTA 2, 5, 6, 9, 15, 20, and 24 (Table S3). The most (n = 208) and strongest associations above pBonf were identified on BTA 24. The GWAS revealed 823 further SVs (761 SNPs and 62 indels) above pSug on all BTA except for BTA 1, 12, 18, 19, 22, and 25. The highest number of associations above pSug were detected on BTA 9 (n = 231) and on BTA 24 (n = 214). We identified 92 potential candidate genes for RES-DV (Table 5; Table S6). The region between 94.9 and 97.7 Mb on BTA 5 includes a cluster of seven potential candidate genes (Table S6). The genes including the largest number of associations within this cluster were the Rho GDP dissociation inhibitor β (ARHGDIB) gene, the glutamate ionotropic receptor NMDA type subunit 2B (GRIN2B) gene, and the epithelial membrane protein 1 (EMP1) gene. We identified the contactin 5 (CNTN5) gene on BTA 15 with 47 significantly associated SVs (thereof four indels) located in the gene and 56 associated SVs (thereof three indels) in the flanking region of the gene. The region between 68.2 and 68.4 Mb on BTA 21 includes a cluster of five potential candidate genes. The strongest association within this cluster (p = 1.62 × 10 −7 ; bp 68,299,895) was located in the kinesin light chain 1 (KLC1) gene, with a total number of 59 SVs (49 SNPs and 10 indels) located in the gene (Table S6). On BTA 24, we identified a cluster of five genes including a large number of associations in the region between 7.0 and 8.4 Mb. These genes include the suppressor of cytokine signaling 6 (SOCS6) gene, the rotatin (RTTN) gene, the CD226 molecule (CD226) gene, the docking protein 6 (DOK6) gene, and the coiled-col domain containing 102B (CCDC102B) gene. Within this cluster, we identified the largest number of associations for DOK6 ( Figure 5B). A further potential candidate gene was laminin subunit α 3 (LAMA3) with 34 SVs (thereof two indels) located in the gene.
The genes SOCS6, CD226, and LAMA3 were related to the JAK-STAT signaling, cell adhesion molecules, and PI3K-Akt signaling pathways, respectively ( Table 4). The bone morphogenetic protein receptor type 1B (BMPR1) gene on BTA 6 was annotated to the cytokine-cytokine receptor interaction and to the TGF-β signaling pathway, which are involved in the secretion and up-and downregulation of cytokines during helminth infections (Table 4). In addition, we identified the JAK-STAT signaling pathway and the PI3K-Akt signaling pathway for the cyclin D3 (CCND3) gene.
The genes SOCS6, CD226, and LAMA3 were related to the JAK-STAT signaling, cell adhesion molecules, and PI3K-Akt signaling pathways, respectively ( Table 4). The bone morphogenetic protein receptor type 1B (BMPR1) gene on BTA 6 was annotated to the cytokine-cytokine receptor interaction and to the TGF-β signaling pathway, which are involved in the secretion and up-and downregulation of cytokines during helminth infections (Table 4). In addition, we identified the JAK-STAT signaling pathway and the PI3K-Akt signaling pathway for the cyclin D3 (CCND3) gene.      [48].
C-type lectin receptor signaling pathway bta04625 RES-GIN MALT1 (24) C-type lectin receptors are involved in innate and adaptive immunity to pathogens [46]; helminth C-type lectins are involved in host-parasite interactions [55].
Natural killer cell mediated cytotoxicity bta04650 RES-FH PTK2 (14) Natural killer cells are lymphocytes of the innate immune response involved in host defense against infections with parasites [46]; cytotoxic natural killer cells were involved in early stage of infection by F. hepatica in rats [63]; pathway identified for F. hepatica infections in mice [54].
NOD-like receptor signaling pathway bta04621 RES-FH PLCB1 (13) Family of pattern recognition receptors responsible for various pathogens and generating innate immune response [46].
Phospholipase D signaling pathway bta04072 FPR GRM1 (9) Phospholipase D is an essential enzyme for the production of phosphatidic acid, a key intermediate in milk fat synthesis during lactation [61]. FPR ANGPT4 (13) Pathway associated with milk fat traits in dairy cattle [62].
T cell receptor signaling pathway bta04660 RES-GIN MALT1 (24) Involvement of B cells in immune response to gastrointestinal nematodes in ruminants [48].

RES-FH
SMAD4 (24) RES-DV BMPR1B (6) Th17 cell differentiation bta04659 RES-FH IL21 (17), SMAD4 (4) F. hepatica-induced TGF-ß suppresses Th17 responses in infected mice [66]. Table 5. Potential candidate genes (sorted alphabetically) related to the identified sequence variants (SVs) associated with residuals for fecal egg counts of gastrointestinal nematodes (RES-GIN), residuals for fecal egg counts of liver flukes (RES-FH), and residuals for fecal larvae counts of bovine lungworms (RES-DV) in DSN. Genes with at least one associated variant in the respective gene are marked in bold. A gene was considered a candidate gene if at least one SV above pSug was positioned in the respective gene and/or within 100 kb up-and downstream of the gene.

Discussion
This is the first study investigating genome-wide associations and candidate genes for functional traits in the local endangered DSN breed based on imputed WGS data. In our study, we used a quite large reference population of 304 sequenced DSN for imputation, resulting in a high imputation accuracy of 97.04%. Brondum et al. [67] and Mao et al. [68] demonstrated higher imputation accuracies for WGS data when using multi-breed compared to single-breed reference populations. Lower imputation accuracies of up to 95% were achieved in cattle studies using multi-breed reference panels with 242 to 1577 sequenced cattle [5,67,69]. Korkuć et al. [39] compared different imputation strategies in DSN and suggested a large reference panel of the same breed instead of a multi-breed (DSN together with HF) approach. Achieving an imputation accuracy above 97%, we expected a high power in our GWAS to identify genomic loci associated with complex functional traits in DSN.
We focused on fertility, udder health, metabolic health and stability indicator traits, and endoparasite infection traits since these are highly relevant for the conservation of the breed [10,24]. For CTFS, we identified SVs on BTA 12, 13, 15, and 28 with the largest number of associations on BTA 13. In Holstein cows, genomic loci for CTFS were detected on BTA 13, too [70], but associated regions on BTA 13 did not overlap with our findings. We identified ARHGAP21 on BTA 13 as one main candidate gene for CTFS with five SNPs located in the gene or within a region up to 42.3 kb downstream of the gene. The ARHGAP21 gene was reported to be involved in post-partum anestrus in tropical beef cattle [71]. Rosa et al. [72] demonstrated in a mouse model that ARHGAP21 is involved in insulin secretion. Insulin plays a major role in the regulation of energy balance and reproductive functions in dairy cattle [73]. Gong et al. [74] showed that diets with higher insulin levels significantly reduced the intervals from calving to first ovulation in dairy cows. Thus, ARHGAP21 might impact CTFS by regulating important hormone functions in dairy cattle. For NR56, our GWAS revealed significant SVs on BTA 8,11,13, and 20. Holmberg and Andersson-Eklund et al. [75] identified SNPs on BTA 11 and on BTA 20 associated with NR56 in Swedish dairy cows. We identified three genes for NR56, of which two were previously reported to be involved in cattle female fertility. The ZNF462 gene was identified as a transcription factor involved in fertility in Brangus heifers [76]. Moore et al. [77] showed reduced expressions of ZNF462 in the corpus luteum of low fertility HF cows compared to the high fertility control group. Kiser et al. [78] described MARCH11 on BTA 20 as a top locus for conception rate at first service and repeated artificial insemination in Holstein heifers. Furthermore, MARCH11 was recently associated with endometriosis in HF cows in our studies [79]. Hence, our detected genes influencing fertility in DSN may have similar functions for fertility in other cattle breeds. This observation is in accordance with Korkuć et al. [24], who found a high overlap of genomic regions in DSN and HF for milk production traits, although they identified no common genes for DSN and HF. Although we identified genes for CTFS and NR56 in DSN which were previously described in other breeds, the same gene can be differentially expressed in different breeds, which might phenotypically explain improved fertility in DSN. For example, Timperio et al. [80] showed transcriptomic level disparities in liver tissues in two closely related Bos taurus breeds, which contribute to large physiological differences. Moreover, Lehnert et al. [81] demonstrated that gene expression levels for growth-related genes can differ in different breeds, although the genes are involved in growth in cattle generally.
For FPR, we detected a cluster of ten closely related genes on BTA 13. This finding is in contrast to the GWAS by Korkuć et al. [24], where no SNP reached the significance threshold on BTA 13 for milk fat and protein traits in a population of 1816 50K genotyped DSN. However, similar to our findings, the authors detected SVs on BTA 9, 11, 12, and 27 associated with milk fat content or milk fat yield in DSN. Interestingly, Klein et al. [27] identified similar associations for FPR in Holstein cows on BTA 9, 13, 14, and 27, indicating a strong genomic relatedness of HF and DSN. The DGAT1 gene on BTA14 was highly associated with FPR in Holstein cows [27]. Furthermore, Jaeger [10] identified DGAT1 for fat content in a multi-breed GWAS including a quite large number of DSN cows. However, although genomic relatedness between DSN and HF is high, we detected no associations in the DGAT1 gene for FPR in DSN, which coincides with the observation by Korkuć et al. [24] for other milk production traits in DSN. We identified the glycerol-3-phosphatase acyltranserase 4 (GPAT4) gene on BTA 27 as a potential candidate gene for FPR. The GPAT4 gene is well described as a locus involved in lipid metabolism in HF cows [82]. The THRB gene on BTA 27 was associated with FPR and RES-DV, and might be an interesting candidate gene when aiming for improvements for both traits, i.e., less susceptibility to D. viviparus infections and stable FPR, which has been already proven by May et al. [21] quantitative-genetically. Furthermore, we found seven pathways for genes associated with FPR (e.g., MAPK signaling pathway), which were associated with milk production and fat yield in different cattle breeds [62,83]. The overlap of pathways identified for fat yield in previous studies and FPR in our study might be explained by the fact that fat yield is more variable than protein yield, and thus, contributes to a larger variation in FPR.
For the health and metabolic stability indicator traits SCS and FPR, Jaeger [10] estimated genetic correlations lower than 0.80 between two classes of average herd sizes and first calving ages in DSN based on pedigree data, indicating the presence of genotype-byenvironment interactions (GxE). Hence, we assume that a large number of genetic loci contribute to GxE for SCS and FPR, which should be explored in ongoing studies. The knowledge of how GxE contributes to the phenotypic variance for adaptive and characteristic traits in DSN is of great importance for the conservation of the breed, since large GxE SNP effects in cattle indicate considerable opportunity to improve environmental resistance and health [84]. For SCS, our GWAS revealed SVs on BTA 1,9,11,13,23, and 25. Significant marker associations for SCS in other dairy cattle breeds were described on each chromosome [85], indicating the complex genetic architecture of the trait. In the present study, we estimated additive genetic effects by neglecting non-additive dominance effects, which could play a crucial role in the genomic architecture of complex functional traits [86]. As pointed out by Howard et al. [87], dominance decreases when alleles are almost fixed due to inbreeding. Interestingly, a large inbreeding depression was observed for SCS in Holstein cows [88], while inbreeding depression for SCS is currently not present in DSN, possibly due to different selection strategies with only a small number of DSN bulls with high SCS breeding values used for artificial insemination [30]. Substantial dominance effects in DSN for endoparasite infection traits were estimated for an immunological relevant gene [89]. The largest number of associations was detected on BTA 25 within and close to the RBFOX1 gene, which was previously associated with subclinical ketosis in Jersey cattle [90]. An interesting candidate gene associated with SCS was the CLDN8 gene on BTA 1, which was found to be increasingly expressed in the bovine mammary glands during reduced milking frequency [91]. Furthermore, CLDN8 is part of the cell adhesion molecules and leukocyte transendothelial migration pathways, being involved in the recruitment and activation of macrophages during acute Escherichia coli-induced mastitis [50,92]. Due to the biological relevance of CLDN8 in udder health, we recommend a more detailed investigation of polymorphisms and favorable alleles in CLDN8, which might contribute to improved udder health in DSN. No overlap was seen between the genomic loci affecting SCS and loci affecting mastitis, which have been identified by Meier et al. [24]. In contrast to our findings for SCS, Meier et al. [25] detected genomic loci on BTA 3, 6, 9, and 26 for mastitis in a population of 1026 50K genotyped DSN, and they identified BMPR1B on BTA 6 as the main candidate gene associated with mastitis. The BMPR1B gene was described to be involved in immune response to the mastitis pathogen Staphylococcus aureus and is related to the cytokine-cytokine interaction and TGF-ß signaling pathway. In the present study, BMPR1B was associated with RES-DV assuming that BMPR1B plays an important role in both bacterial and endoparasite infections in DSN.
The GWAS revealed a large number of associations and candidate genes for endoparasite infection traits. Interestingly, we identified three genes associated with two endoparasite traits (TRERF1 on BTA 23, PIK3C3 on BTA 24, and PCDH15 on BTA 26). The PCDH15 gene, identified for GIN and F. hepatica infections, was associated with D. viviparus infections in our preliminary GWAS in DSN based on imputed high-density (HD) 700K data [22]. Hence, PCDH15 seems to be involved in overall resistance against endoparasite infections in DSN. Twomey et al. [7] applied a GWAS for antibody responses to endoparasite infections (F. hepatica, Neospora caninum, and Ostertagia ostertagi) in dairy and beef cattle and identified up to 248 SVs and up to 48 QTL for the different traits, which is similar to the number of SNPs and candidate genes for endoparasite infection traits detected in our study. However, the number of detected SVs for fertility and health indicator traits was smaller, which might be due to smaller heritabilities or a more complex genetic architecture for genes involved in host immune response to parasitic disease. Another explanation for the larger number of SVs detected for endoparasite traits addresses the chosen pre-correction approach for endoparasite infection traits in order to consider a larger number of cows examined for endoparasite infections. However, when applying the same approach for fertility and health indicator traits and using the model residuals from pre-corrected traits, the number of significantly and suggestively associated markers did not increase substantially for these traits (results not shown). Furthermore, the genomic inflation factor λ ranged from 1014 to 1088 for all traits and pre-corrected endoparasite traits, indicating that the number of false-positive associations for endoparasite infection traits does not differ to the other investigated traits.
For RES-GIN, a large number of associations surpassing pBonf were detected on BTA 2, 8, 24, and 26. Using the same study design and statistical approach for endoparasite infection traits in DSN imputed to high-density (HD) 700K, we identified associations for RES-GIN on BTA 2,4,5,8,9,18,22,24, and 26 in a previous GWAS [22]. Interestingly, a large proportion of SNPs detected with HD did not reach the significance threshold in the present study based on WGS data and only one candidate gene (PHLPP1) on BTA 24 overlapped in both datasets. The same observation was made by Wu et al. [6] when comparing GWAS results from different SNP panels and WGS data for the same trait. A possible explanation refers to differences in the genomic relationship matrix and LD between lowor high-density SNP panels and WGS data. Moreover, the significance thresholds in the present GWAS differs for different marker densities as a result of Bonferroni multiple testing correction and the calculated effective number of independent SVs. In consequence, markers detected with 700K HD data by May et al. [22] did not reach the suggestive and significance threshold in this study. However, for RES-DV, we detected linkage (r 2 > 0.6) between 21 significantly associated SNPs from the GWAS by May et al. [22] and 258 significantly or suggestively associated SVs in the present study. Furthermore, when marker density increases, LD within a region becomes stronger, implying a larger number of SVs around the identified genes affecting the trait of interest. For fecal egg excretion in Angus cattle, Kim et al. [52] detected significant associations on BTA 3, 5, 8, 15, and 27 and, similar to our GWAS, they found genes annotated to the chemokine signaling and cytokine-cytokine interaction pathway. In contrast, Twomey et al. [7] identified the strongest associations on BTA 3, 4, 12, 13, 14, 19, 21, and 23 for antibody response to O. ostertagi (most common GIN species in cattle) in a multi-breed GWAS. We identified ALPK2 on BTA 24 as the gene with the largest number of associations within and in close distance to the gene. Surprisingly, ALPK2 is not involved in immunological pathways and was not reported as a gene underlying parasite resistance in other animal species. In accordance to the study by May et al. [22], the largest number and strongest associations for RES-GIN and RES-DV were detected on BTA 2 and on BTA 24, which might be explained by the biological relatedness of both nematode traits RES-GIN and RES-DV. However, we detected no overlapping gene for RES-GIN and RES-DV. The gene FAM234B on BTA 5 and the gene DOK6 on BTA 24 were previously reported for D. viviparus larvae counts [22]. This is the first GWAS for patent D. viviparus infections in cattle based on WGS data and suggesting that the cytokine-cytokine interaction, JAK-STAT signaling, PI3K-signaling, and TGF-β signaling pathways are associated with D. viviparus infections in cattle.
For RES-FH, WGS data revealed a large number of candidate genes and related pathways previously described for F. hepatica infections via transcriptomic studies [51,53,56,59,63], indicating the advantage of WGS data to identify causal mutations for complex health traits in dairy cattle. We identified the fibroblast growth factor receptor 1 (FGFR1) gene on BTA 27 associated with RES-FH. Yu et al. [93] showed that fibroblast growth factor polypeptides are associated with liver fibrosis in mice. Liver fibrosis plays a key role in host immune response to F. hepatica infections [64]. Interestingly, TGF-ß plays a crucial role in fibrogenic processes during F. hepatica infections and acts as a host receptor for a parasitic growth factor [64,94]. Fu et al. [64] described the SMAD4 gene on BTA 24 as a key factor involved in fibrogenetic processes during F. hepatica infections. Here, we detected SMAD4 associated with RES-FH and related to the TGF-ß signaling pathway, indicating the power of our GWAS approach to identify genes associated with complex traits in DSN. Together with IL21, SMAD4 is related to the Th17 cell differentiation pathway. Walsh et al. [66] showed that TGF-ß suppresses Th17 immune response in the host via F. hepatica infections. Furthermore, we detected six genes for RES-FH related to the cGMP-PKG signaling pathway and the JAK-STAT signaling pathway, which are well known to be involved in F. hepatica infections [56,59]. Surprisingly, the genes prolactin-related protein IIV and IX (PLR9 and PRP-VII) on BTA 23 were associated with RES-FH and annotated to the JAK-STAT signaling pathway, too. This finding addresses possible polymorphisms in the identified prolactin genes in DSN associated with both improved milk production and resistance to F. hepatica infections. This is of great importance, since genetic correlations between milk production and F. hepatica egg excretion was shown to be favorable in quantitative-genetic studies [21,22]. Hence, genetic correlations with milk production traits should be taken into consideration when selecting DSN being well adapted to pasture production systems with improved fertility and metabolic stability and enhanced resistance to endoparasite and udder bacterial infections.

Conclusions
Using WGS data, we identified a large number of SVs and potential candidate genes associated with fertility, udder and metabolic health indicator traits, metabolic stability, and endoparasite infection traits in the local DSN population. Such in-depth insights into the genomic particularities enable improved selection strategies in the local DSN breed, especially when defining criteria in the context of conservation of genetic resources. The investigated traits are known as specific adaptive traits in DSN. Genetic improvements of these traits contribute to overall robustness and to possible DSN breed advantages or DSN breed competitiveness over large commercial dairy cattle populations. Selection of DSN animals carrying favorable alleles for the identified SVs is an efficient breeding approach in this regard. For CTFS and NR56, three annotated genes have similar functions in other cattle breeds. CLDN8 and RBFOX1 were the most interesting potential candidate genes for SCS. The largest number of associations for FPR were detected on BTA 12 and 27. For endoparasite infection traits, we detected potential candidate genes and related biological pathways, which are involved in host immune response to endoparasite infections. In particular, the identified markers within immunological relevant genes should be used for future genomic selection strategies in DSN, aiming to improve health and adaption to pasture environments.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12081163/s1, Table S1: List of all SVs associated with calving-to-first-service interval (CTFS) and non-return at day 56 (NR56), Table S2: List of all SVs associated with somatic cell score (SCS) and fat-to-protein ratio (FPR), Table S3: List of all SVs associated with residuals of gastrointestinal nematode infections (RES-GIN), Dictyocaulus viviparus infections (RES-DV), and Fasciola hepatica infections (RES-FH), Table S4: Potential candidate genes with corresponding number of SVs within and close to the gene related to the identified SVs associated with residuals of fecal egg counts for gastrointestinal nematodes, Table S5: Potential candidate genes with corresponding number of SVs within and close to the gene related to the identified SVs associated with residuals of fecal egg counts for Fasciola hepatica, Table S6: Potential candidate genes with corresponding number of SVs within and close to the gene related to the identified SVs associated with residuals of fecal larvae counts for Dictyocaulus viviparus. Institutional Review Board Statement: The whole genotyping process from tissue sampling up to the SNP database development was embedded in the logistics and infrastructure of the cow genotyping activities in Germany, organized by the German Holstein breeding organization and their participating regional breeding organizations and farmers. This activity is the basis to implement a national cow training set for genomic selection. Farmers agreed to participate in this study and to collect small fecal samples for endoparasite determinations. Such fecal collection does not influence the wellbeing of cows. All further data were provided by the respective breeding organizations and milk recording organizations provided (i.e., the traits from official milk recording schemes, genomic data, and pedigree data). Thus, no ethical approval was required for this study.

Informed Consent Statement: Not applicable.
Data Availability Statement: All the data supporting the results of this article are included within the article. The raw phenotypic and genotypic data are stored in the databases of the IT company vit Verden and the cluster justHPC from Giessen University (https://www.hkhlr.de/de/cluster/ justhpc-giessen, accessed on 2 June 2021). All data can be provided on request.