A Million-Cow Genome-Wide Association Study of Three Fertility Traits in U.S. Holstein Cows

A genome-wide association study (GWAS) of the daughter pregnancy rate (DPR), cow conception rate (CCR), and heifer conception rate (HCR) using 1,001,374–1,194,736 first-lactation Holstein cows and 75,140–75,295 SNPs identified 7567, 3798, and 726 additive effects, as well as 22, 27, and 25 dominance effects for DPR, CCR, and HCR, respectively, with log10(1/p) > 8. Most of these effects were new effects, and some new effects were in or near genes known to affect reproduction including GNRHR, SHBG, and ESR1, and a gene cluster of pregnancy-associated glycoproteins. The confirmed effects included those in or near the SLC4A4-GC-NPFFR2 and AFF1 regions of Chr06 and the KALRN region of Chr01. Eleven SNPs in the CEBPG-PEPD-CHST8 region of Chr18, the AFF1-KLHL8 region of Chr06, and the CCDC14-KALRN region of Chr01 with sharply negative allelic effects and dominance values for the recessive homozygous genotypes were recommended for heifer culling. Two SNPs in and near the AGMO region of Chr04 that were sharply negative for HCR and age at first calving, but slightly positive for the yield traits could also be considered for heifer culling. The results from this study provided new evidence and understanding about the genetic variants and genome regions affecting the three fertility traits in U.S. Holstein cows.


Introduction
The U.S. Holstein genomic evaluation included three female fertility traits, daughter pregnancy rate (DPR) as the percentage of cows that become pregnant during each 21 d period [1], and cow conception rate (CCR) and heifer conception rate (HCR) as measures of the conception ability of cows and heifers [2,3]. These fertility traits all have low heritability [4][5][6] with large random variations, which made identifying and understanding the genetic factors affecting these traits challenging. For the same three fertility traits in U.S. Holstein cattle, several GWAS were reported, including studies using 27,214 Holstein bulls [7], 24,000 Holstein bulls and 36,000 cows [4], and a large-scale GWAS using 186, 188-269,158 Holstein cows [8]. These studies have accumulated some consensus for genetic effects affecting the three fertility traits, including effects in the GC and KALRN genes for DPR and CCR [7,8] and in the AFF1 gene for HCR [4,8]. The sample sizes of 186, 188-269,158 Holstein cows for the large-scale GWAS of DPR, CCR, and HCR were unprecedentedly large in 2019, but the numbers of significant effects detected for the three fertility traits were far fewer than those for the five production traits: 15-360 additive effects and 1-2 dominance effects for each of the three fertility traits compared to 9803-15,215 additive

Results and Discussion
The GWAS detected 7576, 3798, and 726 additive effects and 22, 27, and 25 dominance effects with log 10 (1/p) > 8 for DPR, CCR, and HCR, respectively, for a total of 12,100 additive effects and 74 dominance effects for the three fertility traits. These effects were considerably more than the 487 additive effects and 5 dominance effects for the same three fertility traits reported in the 2019 study [8]. The new effects included some effects involving genes known to affect reproduction and some rare, but very negative effects and confirmed some of the previously detected effects. The additive effects each with a small effect, except with a few sharply negative additive effects, pointed to the polygenetic nature of the fertility traits. The number of significant dominance effects (74 effects) was much smaller than the number of additive effects (12,100 effects), but some of the dominance effects involved rare genotypes with sharply negative dominance values. The details of the GWAS results are provided in the Supplementary Materials (Tables S1-S6), including the top-1000 additive effects for DPR (Table S1) and CCR (Table S3), all 726 significant additive effects for HCR (Table S5), and all 74 significant dominance effects of the three fertility traits (Tables S2, S4, and S6), whereas the main text focused on the most-significant chromosome regions and effects. The statistical significance measured by the log 10 (1/p) from the t-test of Equation (5) is an increasing function of the size of the additive effect (α defined by Equation (10)) for the additive test or the size of the dominance effect (δ defined by Equation (12)) for the dominance test and is an increasing function of the minor allele frequency for the additive test and minor genotypic frequency for the dominance test. Therefore, an SNP with a rare allele or genotype, but a large additive or dominance effect could be highly significant, and an SNP with a small additive or dominance effect, but a large minor allele or genotypic frequency could also be highly significant if the sample size is sufficiently large such as the million-cow samples in this study. The results of this study had examples of both cases. For the same effect size, the alternative alleles and genotypes of the same SNP may have drastically different impacts on the phenotype. Therefore, the presentation and discussion of the results will cover the statistical significance, the effect size, the allelic effects and dominance values, and the allele and genotypic frequencies associated with the allelic effects and dominance values.

Additive Effects for DPR
Two chromosome regions had highly significant additive effects, the 83.44-93.83 Mb region of Chr06 with peak effects in or near GC, NPFFR2, and SLC4A4 and the 43.59-51. 11 Mb region of Chr18 with peak effects in or near PAK4, DEDD2, POU2F2, CEBPG, PEPD, and SIPA1L3. Other chromosome regions with similar statistical significance included  Chr30 is the pseudoautosome region of the X chromosome. Chr31 is the nonrecombining region of the X chromosome (no recombination with the Y chromosome). 4

of 27
The 83.44-93.83 Mb region of Chr06 (10.4 Mb in size) had a large cluster of highly significant additive effects (Figure 1c, Table S1). In this large region, positive and negative effects on average had similar sizes, an average of 0.23 for positive effects and −0.22 for negative effects. The three SNPs in GC (GC vitamin D binding protein), CSN1S2 (casein alpha-S2) and CSN3 (casein kappa) each had the largest positive effects, and the SNP effect in GNRHR (gonadotropin-releasing hormone receptor) was an example of symmetric effects, where the positive and negative effects of the SNP had similar sizes (Figure 1c, Table S1). The known biological functions of these four genes should all be relevant to dairy cows.
Within this large 10.4 Mb region, the 86.79-87.32 region (0.53 Mb in size) containing GC, NPFFR2, and SLC4A4 had eleven of the top-twenty significant effects (Table 1). In this 0.53 Mb region, negative effects on average had larger sizes than the positive effects, an average of −0.32 for negative effects and 0.29 for positive effects. Two SNPs between SLC4A4-GC, one SNP between GC-NPFFR2, and one SNP in NPFFR2 had the largest negative effects and were the most-significant effects among all SNPs ( Figure 1c, Table  S1, Appendix A Figure A1). The #1 significant effect (rs110434046) was between GC and NPFFR2, about 220 Kb downstream of GC (Table 1, Appendix A Figure A1). "u" indicates that the SNP is upstream of the gene. "α" is the additive effect of the SNP as the difference between the allelic effects of "allele 1" and "allele 2" (Equation (10)). "ae+" is the allelic effect of the positive allele (Equation (11)). "ae−" is the allelic effect of the negative allele (Equation (11)). "f_al+" is the frequency of the positive allele. "f_al−" is the frequency of the negative allele.
The 86.79-87.32 region had two examples indicating the power of large samples for fine mapping to distinguish the relative importance of two closely linked loci to the phenotype. The two SNPs in NPFFR2, rs109034709 and rs137844449, were only 1062 bp apart, but had very different statistical significance and effect sizes, log 10 (1/p) = 94.27 (#2 ranking) and |α| = 0.71 for rs109034709 and log 10 (1/p) = 32.28 (#286 ranking) and |α| = 0.41 for rs137844449 (Table 1, Appendix A Figure A1). This example showed that the sample size of 1,194,736 cows for DPR was able to distinguish the additive effects of two SNPs as close as 1062 bp apart, noting that the two SNPs had similar minor allele frequencies, 0.364 for rs109034709 and 0.375 for rs137844449 (Tables 1 and S1). Another two closely linked SNPs in the same region also had very different log 10 (1/p) values (Appendix A Figure A1).
The additive effects of Chr18 among the top-1000 effects were in the 43.10-59.75 Mb region of the chromosome (Figure 1d, Table S1). This Chr18 region had seven SNPs with the top-20 effects (ranking #8-#19) in or near PAK4, DEDD2, POU2F2, CEBPG, PEPD, CHST8, and SIPA1L3 (Table 1). Among these seven SNPs, three SNPs in or near CEBPG, PEPD, and CHST8 had some of the most-negative effects, which will be discussed later along with the dominance effects of those SNPs.
The additive effects of Chr01 among the top-1000 effects were mostly in two regions: the 68.04-70.7b Mb and 134.36-141.05 Mb (Figure 1e). In the 68.04-70.7b Mb region, there were two SNPs in KALRN and one SNP in CCDC14. These sharply negative allelic effects along with those in or near CEBPG, PEPD, and CHST8 of Chr18 had a common feature: the alternative alleles of the SNP behaved like neutral alleles with slightly positive allelic effects above the "0" line ( Figure 1d). The 134.36-141.05 Mb had the most-significant effect of Chr01 in IGSF5. This SNP along with the SNP in PRDM15 had some of the highest positive allelic effects of this chromosome (Figure 1e), along with the positive effects in GC, CSN1S2, and CSN3 of Chr06 (Figure 1c).
The additive effects of Chr05 among the top-1000 effects were in the 24.82-103.67 Mb region ( Figure 1f, Table S1) with the most-significant effects in AMN1 and SLC38A4 and the most-negative effects in C2CD5 and near KRAS.
The most-significant additive effect of Chr19 (#46 among all significant additive effects) was 5139 bp downstream of SHBG (sex-hormone-binding globulin), and this SNP also had a highly significant effect on AFC [9].

Dominance Effects for DPR
DPR had 22 significant dominance effects with log 10 (1/p) > 8 (Table S2). The dominance tests showed that only four chromosomes had significant dominance effects for DPR, Chr18, Chr06, Chr01, and Chr09 (Table 2, Figure 2a-e). The CEBPG-PEPD-CHST8 region of Chr18 had the most-significant dominance effects and the most-negative dominance values of the homozygous recessive genotypes (d_RR), followed by the AFF1 KLHL8 region of Chr06 and the CCDC14-KALRN region of Chr01. The dominance values of the homozygous recessive genotypes of Chr09 were also negative ( Figure 2e), but not as negative as those of Chr18, Chr06, and Chr01. The most-significant dominance effect of Chr04 (Figure 2f) was statistically insignificant for DPR, but this effect was more negative than those of Chr09 and was highly significant for HCR to be described later. a This SNP is recommended for culling heifers with the homozygous recessive genotype based on its effect on DPR (δ > 2.0). "u" indicates the SNP is upstream of the gene. "δ" is the dominance effect of the SNP as the difference between the heterozygous dominance value and the average of the two homozygous dominance values (Equation (12)). "DR" is the heterozygous genotype with one dominant allele and one recessive allele. "d_DR" is the dominance value of the heterozygous genotype with one dominant allele (D) and one recessive allele (R) (Equation (13)). "DD" is the homozygous genotype with two dominant alleles. "d_DD" is the dominance value of the homozygous genotype with two dominant alleles (DD) (Equation (13)). "RR" is the homozygous genotype with two recessive alleles. "d_RR" is the dominance value of the homozygous genotype with two recessive alleles (RR) (Equation (13)). "f_DR" is the frequency of the heterozygous genotype. "f_DD" is the frequency of the homozygous genotype of the dominant allele. "f_RR" is the frequency of the homozygous genotype of the recessive allele.   The number of significant dominance effects was much smaller than those of the additive effects, but the size of each dominance effect generally was much larger than that of the additive effects. Among the 22 significant dominance effects (Table S2), 13 dominance effects had effect sizes greater than the largest additive effect, 1.86-4.81-times as large as the largest additive effect, and all these 13 dominance effects had rare homozygous recessive genotypes with genotypic frequencies of 0.3-0.6% (Table 2).

Additive Effects for CCR
The additive effects for CCR (Table S3) overlapped by 60.5% those of DPR (Table S1) among the top-1000 additive effects of each trait. The highly significant additive effects for CCR (Figure 3a-f) mostly overlapped with the highly significant effects for DPR (Figure 1a-f) with some differences. The same SNP (rs110434046) of Chr06 had the most-significant additive effect for both CCR (Table S3) and DPR (Table 1). A major difference was that Chr03 was more significant for CCR ( Figure 3a) than for DPR (Figure 1a). The #2 most-significant effect for CCR among all SNPs was on Chr03 (Table S3), and this effect was #59 for DPR (Table S1). The additive effects of Chr03 for CCR covered a large distance of Chr03 and was mostly in the region of 44-64 Mb, and most of the negative alleles had larger effect sizes than the positive effect sizes (Appendix B Figure A2). Another difference was that Chr01, Chr03, Chr05, Chr06, and Chr18 had similar statistical significance for the most-significant effects on each chromosome for CCR (Figure 3a), unlike DPR, where Chr06 and Chr18 were considerably more significant than the other chromosomes ( Figure 1a). Other differences between CCR and DPR included different effects among the top-1000 effects (Tables S1 and S3) and the location shifts of significant effects, e.g., the most-significant Chr01 effect for CCR was downstream of 7SK (Table S3), whereas the most-significant Chr01 effect for DPR was in IGFS5 (Table 1). Among the top-1000 most-significant SNP effects, negative allelic effects on average had larger sizes (average −0.29) than positive allelic effects (average 0.24), with chromosomes 1 and 18 having sharply lower additive effects than those of other chromosomes (Figure 3c,d).

Dominance Effects for CCR
CCR had 27 significant dominance effects with log 10 (1/p) > 8 (Table S4), and 19 of these 22 dominance effects overlapped 19 of the 22 dominance effects for DPR. The most-significant dominance effects for CCR were on Chr18, Chr06, Chr01, and Chr09 (Figure 4a-e), the same chromosomes with dominance effects for DPR. The dominance effect of CCR in estrogen receptor 1 (ESR1) of Chr09 (Figure 4e) was not shared by CCR and DPR. The dominance values of the three genotypes of this dominance effect had the smallest differences among all significant dominance effects for CCR. The reason for this dominance effect to be statistically significant was the large minor genotypic frequency of the three genotypes of 0.143 for the recessive genotype (the "22" genotype; Table S4). This was an example of a significant effect with a small effect size, but a large minor genotypic frequency.
The CEBPG-PEPD-CHST8 region of Chr18 had the most-significant and -negative dominance values of the homozygous recessive genotypes (Figure 4b), followed by the AFF1-KLHL8 region of Chr06 ( Figure 4c) and the CCDC14-KALRN region of Chr01 ( Figure 4d). The dominance values of the homozygous recessive genotypes of Chr09 were also negative ( Figure 4e), but not as negative as those of Chr18, Chr06, and Chr01. The most-significant dominance effect of Chr04 was statistically insignificant for DPR ( Figure 2f), but this effect was more negative than those of Chr09 and was highly significant for HCR to be described later.

Additive Effects for HCR
HCR had 726 significant additive effects with log 10 (1/p) > 8 (Table S5). The highly significant additive effects for HCR were in the AFF1-KLHL8 region of Chr06, the EPN2-GRAP region of Chr19, and a chromosome region with multiple genes of pregnancyassociated glycoprotein (PAG2-PAG7, PAG10, PAG11) of Chr29 (Figure 5a). Among the 726 significant SNP effects, negative allelic effects on average had larger sizes (average −0.21) than positive allelic effects (average 0.17), with Chr01 and Chr06 having sharply lower additive effects than those of other chromosomes (Figure 5b-d). It was interesting that the sharply negative additive effects of Chr18 for CCR and DPR were not shared by HCR and the sharply negative dominance effects of Chr18 for CCR and DPR were also not shared by HCR, as to be shown. Some of the Chr19 effects on HCR also had highly significant effects on the age at first calving (AFC) [9], including SNPs in or near NCOR1, ARHGAP44, HS3ST3A1-COX10, and SHBG (Table 3 and Table S5, Figure 5e). The pregnancy-associated glycoprotein genes of Chr29 (Figure 5f) were some of the genes detected in this study with known biological functions related to reproduction. "u" indicates the SNP is upstream of the gene, and "d" indicates the SNP is downstream of the gene. "α" is the additive effect of the SNP as the difference between allelic effects of "allele 1" and "allele 2" (Equation (10)). "ae+" is the allelic effect of the positive allele (Equation (11)). "ae−" is the allelic effect of the negative allele (Equation (11)). "f_al+" is the frequency of the positive allele. "f_al−" is the frequency of the negative allele.

Common Effects for DPR, CCR, and HCR
The additive effects of the three fertility traits had 50 common SNPs (Table S7). The overlapping effects were not ranked among the top-20 effects for any of the three traits. The highest-ranked effect was an SNP in LOC100848253 for DPR (#28), upstream of EYA4 for CCR (#44), and in GLDN for HCR (#33). The SNP in ARHGAP44 (rs110401500) affecting all three traits was not ranked high (#295-#348, Table S6), but had the most-significant effect for AFC [9]. Another SNP in ARHGAP44 (rs135166134) had the #13 effect for HCR. These results of ARHGAP44's involvement in the three fertility traits and AFC indicated that ARHGAP44 had an important role in all these four reproductive traits. The SNP downstream of SHBG (rs111004845) was another interesting example of an SNP effect affecting all four reproduction traits, noting that the 80K SNP data in this study did not have any SNP in SHBG. The effects of this SNP for the three traits were ranked #37 for HCR, #46 for DPR, and #186 for CCR (Table S7), but this SNP had the third-most-significant effect for AFC [9]. SHBG is one of the candidate genes with known biological functions affecting reproduction. This gene encodes a steroid-binding protein, and the encoded protein transports androgens and estrogens in the blood, binding each steroid molecule as a dimer formed from identical or nearly identical monomers [10]. Two other candidate genes (COX10 and TTC19) also had effects affecting all four reproductive traits. These candidate genes, ARHGAP44, SHBG, COX10, and TTC19, should be strong candidate genes for their effects on all four reproductive traits.

Dominance Effects for HCR
HCR had 25 significant dominance effects with log 10 (1/p) > 8 (Table S6) Table S6). An SNP upstream of AGMO and an SNP in AGMO were sharply negative for HCR (Figure 6b), noting that these SNPs did not have significant effects for DPR ( Figure 2f) and CCR (Figure 4f). HCR dominance effects in the 101.88-102.21 Mb region of Chr06 were sharply negative (Figure 4c), and these effects were from the same SNPs in AFF1 and KLHL8, which also had sharply negative dominance values for DPR ( Figure 2f) and CCR (Figure 4f). All the highly significant dominance effects had low frequencies for the recessive genotypes and generally had larger effect sizes than the additive effects (Tables 3 and 4). a This SNP is recommended for culling heifers with the homozygous recessive genotype based on its effect on HCR (δ > 2.0). "u" indicates the SNP is upstream of the gene. "δ" is the dominance effect of the SNP as the difference between the heterozygous dominance value and the average of the two homozygous dominance values (Equation (12)). "DR" is the heterozygous genotype with one dominant allele and one recessive allele. "d_DR" is the dominance value of the heterozygous genotype with one dominant allele (D) and one recessive allele (R) (Equation (13)). "DD" is the homozygous genotype with two dominant alleles. "d_DD" is the dominance value of the homozygous genotype with two dominant alleles (DD) (Equation (13)). "RR" is the homozygous genotype with two recessive alleles. "d_RR" is the dominance value of the homozygous genotype with two recessive alleles (RR) (Equation (13)). "f_DR" is the frequency of the heterozygous genotype. "f_DD" is the frequency of the homozygous genotype of the dominant allele. "f_RR" is the frequency of the homozygous genotype of the recessive allele.

Elimination of Rare Negative Recessive Genotypes for Heifer Culling
Based on the results of the additive and dominance effects for DPR, CCR, and HCR, we recommend 11 SNPs with sharply negative dominance values for heifer culling by eliminating heifers with the homozygous recessive genotype for any of the 11 SNPs, plus two Chr05 SNPs previously recommended for heifer culling based on AFC, for a total of 13 SNPs (Table 5). Of these SNPs, the three Chr01 SNPs were selected based on the additive effects for DPR and CCR (Figures 1e and 3c); the four SNPs of Chr06 were based on the dominance effects for DPR, CCR, and HCR (Figures 2c, 4c and 6c); the four Chr18 SNPs were based on the additive and dominance effects for DPR and CCR (Figures 1d, 2b, 3d and 4b). In addition, two Chr04 SNPs were selected based on the dominance effects for HCR (Figure 6b). Among these 15 SNPs, the 4 Chr06 SNPs and the 2 Chr05 SNPs were also recommended for heifer culling for AFC [9]. The homozygous recessive genotype of each SNP affected 2240-7079 cows among the more than one-million cows in this study (    DPR is the daughter pregnancy rate. CCR is the cow conception rate. HCR is the heifer conception rate. AFC is the age at first calving, where positive AFC values represent older first calving ages and are considered negative, whereas negative AFC values represent younger first calving ages and are considered positive. "u" indicates the SNP is upstream of the gene, and "d" indicates the SNP is downstream of the gene. a This Chr06 SNP was also recommended for heifer culling in the GWAS for AFC [9]. b This Chr05 SNP was recommended for heifer culling in the GWAS for AFC [9], and this study showed that this SNP also had negative effects on the three fertility traits. To evaluate the consequence of eliminating the homozygous recessive genotypes using these SNPs, we calculated the measure of "negative impact" (NI), which is the difference between the average of the homozygous recessive phenotypic values and the average of the phenotypic values of the other two genotypes (Equation (14)) for seven traits, the three fertility traits, AFC, and the three yield traits ( Table 5). The NI values provided estimated how much the homozygous recessive genotype of each SNP affected the seven traits if a heifer was eliminated for having the homozygous recessive genotype.
The results showed that the Chr01, Chr06, and Chr18 SNPs had the most-negative NI for DPR and CCR, whereas the Chr06 and Chr04 SNPs had the most-negative NI for HCR and AFC (Table 5). Only the two Chr04 SNPs had a minor positive NI for the three yield traits, and all the remaining thirteen SNPs on Chr01, Chr06, Chr18, and Chr05 had a negative NI for all seven traits, noting that a positive NI value for AFC means an older first calving age and is interpreted as a negative result. The four Chr06 SNPs recommended by this study for all three fertility traits and by the AFC study [9] had the most-negative NI for AFC, and the two Chr05 SNPs recommended for AFC had the most-negative NI for the three yield traits. Therefore, all these thirteen SNPs could be recommended for culling heifers with the homozygous recessive genotype for any of the thirteen SNPs, whereas the use of the two Chr04 SNPs for heifer culling should consider the minor losses in milk, fat, and protein yields. We limit this recommendation to heifer culling because more research is needed to recommend the 13-15 SNPs for bull culling.

Interpretation of Dominance Effects
Dominance effects can be interpreted using three measures: dominance values (Equation (13)), corrected genotypic means after removing pedigree additive (breeding) values from the original phenotypic values (Equations (1) and (4)), and the genotypic means of the corrected phenotypic values after removing the pedigree additive (breeding) values from the original phenotypic values (Equations (1) and (4)) and the genotypic means of the original phenotypic values without removing the pedigree additive values. Based on dominance values that removed the SNP additive effects, all dominance effects were positive overdominance effects in the sense the heterozygous dominance value was more positive than the dominance value of either homozygous genotype of each SNP (Figures 2, 4 and 6).
Based on the genotypic means of the corrected phenotypic values that removed the pedigree additive (breeding) values and the original phenotypic values that did not remove the pedigree additive values, the dominance effects were nearly complete dominance or partial dominance effects (Figure 7). The Chr04 example for HCR (Figure 7a,b) was nearly complete dominance where the heterozygous genotypic mean was nearly the same as the genotypic mean of the homozygous dominance genotype, whereas the genotypic mean of the homozygous recessive genotype was sharply lower than those of the other two genotypes. The Chr06 example for CCR (Figure 7c,d) was between nearly complete dominance and partial dominance, where the heterozygous genotypic mean was close to the genotypic mean of the homozygous dominance genotype, but not as close as that of the Chr04 example for HCR (Figure 7a,b). The Chr01 example for CCR (Figure 7e,f) was partial dominance, where the heterozygous genotypic mean was farther apart from the genotypic mean of the homozygous dominance genotype than in the Chr04 example for HCR (Figure 7a,b) and the Chr06 example for CCR (Figure 7c,d). DPR dominance effects (Appendix C Figure A3 (Figure 7b,d,f). The lack of deviation of the genotypic mean of each homozygous dominant genotype from the population mean was considered to have no effect on the phenotype by the dominant allele when in homozygous status. DPR examples of the dominance effects (Appendix C Figure A3) provided further evidence that all dominant alleles in this study were neutral alleles when in homozygous status.
The most-important finding of the dominance effects in this article along with the dominance effects for AFC in a previous study [9] was the existence of dominant and recessive alleles for quantitative traits in dairy cattle, where the recessive allele had a sharply negative effect on the phenotype when in homozygous status and the dominant allele neutralized most of the negative effect of the recessive allele when in heterozygous status and behaved like a neutral allele with no effect on the phenotype when in homozygous status.

Statistical Significance, Effect Size, and Frequency
This study used the statistical significance measured by the log 10 (1/p) from the t-test as the primary evidence of the genetic effect of the fertility traits, but also considered the size and frequency of each genetic effect, because statistical significance is affected by the size of the genetic effect being tested (Figure 8a) and the allele frequency of the SNP (Figure 8b) and because the effect size and frequency have practical implications. Figure 8a shows a clear increasing trend of the statistical significance as the size of the additive effect increases, except a small number of SNPs with large effects, but medium statistical significance due to the low MAF of those SNPs. Figure 8b shows the statistical significance had a trend to be high for the high MAF around 0.5 of the negative alleles, although exceptions existed due to different effect sizes. The relationship between statistical significance and the frequency of the positive allele is just the mirror image of Figure 8b.  shows a clear increasing trend of the statistical significance as the size of the additive effect increases, except a small number of SNPs with large effects, but medium statistical significance due to the low MAF of those SNPs. Figure 8b shows the statistical significance had a trend to be high for the high MAF around 0.5 of the negative alleles, although exceptions existed due to different effect sizes. The relationship between statistical significance and the frequency of the positive allele is just the mirror image of Figure 8b. A high statistical significance could be due to a large genetic effect and a small MAF such as the significance of rs42739334 in KLHL8 (Table 3) or a small genetic effect with a high MAF such as rs3423297865 in ESR1 (Table S4). An SNP with a large and low-frequency effect had a large impact on the phenotype, but affected a small number of cows, whereas an SNP with a small and high-frequency effect had a small impact on the phenotype, but affected many cows. Among the positive and negative allelic effects, most effects had small sizes (Figure 8c,d), indicating the polygenetic nature of DPR with many genetic effects each having a small effect. Negative allelic effects had larger effects (Figure 8c) than the positive effects (Figure 8d). If implemented, the recommendation from this article for eliminating heifers with sharply negative homozygous recessive genotypes should help reduce the frequency of the large negative fertility effects in the U.S. Holstein population, although those negative genotypes already had low frequencies. A high statistical significance could be due to a large genetic effect and a small MAF such as the significance of rs42739334 in KLHL8 (Table 3) or a small genetic effect with a high MAF such as rs3423297865 in ESR1 (Table S4). An SNP with a large and low-frequency effect had a large impact on the phenotype, but affected a small number of cows, whereas an SNP with a small and high-frequency effect had a small impact on the phenotype, but affected many cows. Among the positive and negative allelic effects, most effects had small sizes (Figure 8c,d), indicating the polygenetic nature of DPR with many genetic effects each having a small effect. Negative allelic effects had larger effects (Figure 8c) than the positive effects (Figure 8d). If implemented, the recommendation from this article for eliminating heifers with sharply negative homozygous recessive genotypes should help reduce the frequency of the large negative fertility effects in the U.S. Holstein population, although those negative genotypes already had low frequencies.

Comparison between the 2023 and 2019 Studies
The large-scale GWAS [8] (2019 study) and the million-cow GWAS of this article (2023 study) were most comparable because these studies all used U.S. Holstein cows, the unprecedently large samples at the time of each study, and the same methods of data analysis. Therefore, the comparison with previous studies was focused on the comparison between the 2023 and 2019 studies. However, these two studies had major differences.
A major difference between these two studies was the different sample sizes, with the sample sizes of the 2023 study being 4.28-5.38-times as large as those of the 2019 study (Table 6). Two other major differences were the different versions of the cattle genome assembly and the different numbers of SNPs for the GWAS. The 2019 study used the UMD 3.1 version of the cattle genome assembly and 60,671 SNPs (60K), whereas the 2023 study used the ARS-UCD1.3 version of the cattle genome assembly and 78,964 SNPs (80K). The 80K and 60K had 44,718 common SNPs; the 80K had 34,246 SNPs not in the 60K; the 60K had 15,953 SNPs not in the 80K. Therefore, the estimates of the significant effects of the 60K confirmed by the 80K of the 2023 study should only be interpreted as approximate estimates of the confirmed effects by the 2023 study. a Effects from the 2019 study were compared to the top-1000 effects for DPR (Table S1) and CCR (Table S3) and to all 726 significant additive effects for HCR (Table S5) from the 2023 study. DPR is the daughter pregnancy rate. CCR is the cow conception rate. HCR is the heifer conception rate.
All additive effects of the 2019 study were compared with the top-1000 additive effects for DPR and CCR and with all 726 additive effects for HCR of the 2023 study. The results showed that 13-42% of additive effects from the 2019 study were confirmed by the 2023 study, with DPR having the highest percentage of confirmed effects and HCR having the lowest percentage of confirmed effects (Table 6). Given the sample sizes of the 2019 study with 269,158 cows for HCR, only 13% (2 out of 15) of the previous effects making the top 726 effects in the 2023 study were an indication of the challenging task of identifying genetic effects for a low-heritability trait such as HCR and was an indication that large samples such as the million-cow samples of the 2023 study could be extremely valuable for identifying the genetic effects of low-heritability traits. The results of DPR with 42% confirmed effects and CCR with 18% confirmed effects pointed to the same conclusions, which were further supported by the dominance results.
The 2019 study detected only five dominance effects for the three fertility traits involving two SNPs, one SNP in AFF1 of Chr06 for all three fertility traits and one SNP in SIPA1L3 of Chr18 for DPR and CCR. The dominance effects of the AFF1 SNP were confirmed for all three fertility traits, whereas the dominance effects of the SIPA1L3 SNP were not confirmed by the 2023 study, although the additive effect of the SIPA1L3 SNP was among the top-20 additive effects for DPR in the 2023 study ( Table 1). The 2023 study detected 69 more dominance effects than in the 2019 study (74 vs. 5).
Some new effects of the 80K were due to new SNPs not in the 60K, and some effects in the 60K only were due to SNPs not in the 80K. The effects of SNPs in the 80K only included those in the CEBPG-PEPD and DEDD2-POU2F2 region of Chr18 for DPR and CCR and in KLHL8 and PAG6 for HCR. The most-notable 60K-only SNP effect was that in COX17, which had the second-most-significant effect for DPR and the most-significant effect for CCR. The most-notable effects confirmed by both the 2023 and 2019 studies included those in the SLC4A4-GC-NPFFR2 region of Chr06 and in KALRN of Chr01 for DPR and CCR and in AFF1 of Chr06 for all three fertility traits.
The comparison between HCR results of this study and the AFC results in a previous study [9] led to an interesting inference about the potential differences of the genetic effects with respect to the conception rate and successful pregnancy. The HCR effects (Table S5) overlapped many of the AFC effects on Chr19, but had no overlap with the AFC effects on Chr15, which along with Chr19 had most of the significant AFC effects. Since AFC involves puberty, conception, and successful pregnancy till birth, the Chr19 AFC effects should be contributing to the conception component of AFC, whereas the Chr15 AFC effects should be contributing to the successful pregnancy given that the Chr15 AFC effects had no overlap with the HCR effects. It is interesting that the progesterone receptor (PGR) gene of Chr15 known to be associated with the maintenance of pregnancy had a highly significant effect on AFC, but had no effect on HCR, consistent with the assumption that the Chr15 effects of AFC were related to the pregnancy maintenance of heifers.
This 2023 study detected some effects in or near genes known to affect reproduction that the 2019 study did not detect, including SHBG (Figure 1a), GNRHR (Figure 1c), ESR1 (Figure 4e), and multiple genes of pregnancy-associated glycoproteins (Figure 5f). SHBG is the sex-hormone-binding globulin gene and is likely associated with early puberty [11,12]. GNRHR is gonadotropin-releasing hormone receptor, and the activation of the receptor ultimately causes the release of gonadotropic luteinizing hormone (LH) and follicle-stimulating hormone (FSH) [13]. ESR1 is the estrogen receptor 1 gene, and the protein encoded by this gene regulates the transcription of many estrogen-inducible genes that play a role in growth, metabolism, sexual development, gestation, and other reproductive functions and is expressed in many non-reproductive tissues [14]. Pregnancy-associated glycoproteins are correlated with pregnancy success and are predictive of impending embryonic mortality in both beef and dairy cattle [15]. These known reproductive functions should increase the likelihood that genetic effects in or near these genes were true effects.

Gene Ontology of Candidate Genes
To understand the potential biological functions of the candidate genes, we searched the Gene Ontology Resources (GO) [16] for the biological processes involved for the candidate genes of the additive and dominance effects for DPR (Tables 1 and 2) and HCR (Tables 3 and 4). A major question to answer for the GO search was whether candidate genes with highly significant effects such as GC, NPFFR2, and SLC4A4 had known reproductive functions, but the GO results found none for DPR (Table S9; Figure S1), noting that the candidate genes with known reproductive functions were not included in the GO analysis, such as SHBG (Figure 1a), GNRHR (Figure 1c), ESR1 (Figure 4e), and multiple genes of pregnancy-associated glycoproteins (Figure 5f). The GO results for the HCR additive effects found one candidate gene (MYOCD) involving reproduction (development of the reproductive system and structure; Table S10, Figure S2). No candidate gene of the dominance effects was found to be involved in reproduction. The GO results of DPR and HCR showed that every candidate gene involved multiple biological processes, and some biological processes involved multiple genes (Tables S9 and S10). GC (GC vitamin-D-binding protein) was involved in vitamin D and steroid metabolic processes, and NPFFR2 (neuropeptide FF receptor 2) and SLC4A4 (solute carrier family 4 member 4) each involved many regulatory processes (Table S9). However, none of those regulatory processes was known to affect reproduction. In contrast to the lack of evidence for the connections between the known biological functions of the candidate genes and the phenotypes, the GWAS results in this study provided Holstein-specific and high-confidence evidence for the association between many candidate genes and the three fertility traits. The combination of the GO results with the GWAS results of this study should point to the potential involvement of various regulatory processes of NPFFR2 and SLC4A4, as well as the vitamin D and steroid metabolism of GC in Holstein female fertility. Similarly, the combination of the GO and GWAS results should provide useful indications of the potential genetic mechanisms of other significant SNP effects affecting the fertility traits in Holstein cows.

Holstein Population and SNP Data
The Holstein population in this study had 1,001,374-1,194,736 first-lactation Holstein cows with phenotypic observations on the daughter pregnancy rate (DPR), cow conception rate (CCR), and heifer conception rate (HCR) and genotypes of 78,964 original and imputed SNPs. The SNP genotypes were from 32 SNP chips with various densities (Table S11) and were imputed to 78,964 SNPs via the FindHap algorithm [17] as a routine procedure for genomic evaluation by the Council on Dairy Cattle Breeding (CDCB) [18]. The SNP genotyping quality control by the CDCB had checks and requirements at the individual and SNP levels, including the call rate, parent-progeny conflicts, sex verification using X-specific SNPs, and Hardy-Weinberg equilibrium [19,20]. In addition, we applied a minor allele frequency (MAF) of 5% for SNP filtering in this study. The phenotypic values used in the GWAS analysis were the phenotypic residuals after removing fixed non-genetic effects available from the December 2022 U.S. Holstein genomic evaluation by the CDCB. The basic statistics of the cows and the phenotypic data of the three fertility traits are given in Table S12. With the requirement of a 5% MAF, the number of SNPs for the GWAS analysis was 75,524 for DPR, 75,295 for CCR and 75,140 for HCR. For 75,524 SNPs with additive and dominance effects, the threshold p-value for declaring significant t-tests for the Bonferroni correction with 0.05 genome-wide false positives was 10 −8 , or log 10 (1/p) = 8. The SNP and gene positions were those from the ARS-UCD1.3 cattle genome assembly [21]. Genes containing or in the proximity of highly significant additive and dominance effects were identified as candidate genes affecting the three fertility traits. All figures for the GWAS results were produced using SNPEVG2 in the SNPEVG package [22].

GWAS Analysis
The GWAS analysis used an approximate generalized least-squares (AGLS) method. The AGLS method combines the least-squares (LS) tests implemented by EPISNP1mpi [23,24] with the estimated breeding values from routine genetic evaluation using the entire U.S. Holstein population. The statistical model was: where y = column vector of phenotypic deviation after removing fixed nongenetic effects such as heard-year-season (termed as "yield deviation" for any trait) using a standard procedure for the CDCB/USDA genetic and genomic evaluation; µ = common mean; I = identity matrix; g = column vector of genotypic values; X g = model matrix of g; b = (µ, g ) , X = (I, X g ); a = column vector of additive polygenic values; Z = model matrix of a; and e = column vector of random residuals. The first and second moments of Equation (1) are:E(y) = Xb and var(y) = V = ZGZ + R = σ 2 a ZAZ + σ 2 e I, where σ 2 a = additive variance, A = additive relationship matrix, and σ 2 e = residual variance. The problem of estimating the b vector that includes SNP genotypic values in Equation (1) is the requirement of inverting the V if the generalized least-squares (GLS) method is used or inverting the A matrix if the mixed model equations (MMEs) [25] are used. However, both V and A cannot be inverted for our sample size. To avoid inverting these large matrices, the GWAS used the method of approximate GLS (AGLS), which replaces the polygenic additive values (a) with the best linear unbiased prediction based on pedigree relationships [8]. The AGLS method is based on the following results: where y * = y − Zâ andâ is the best linear unbiased prediction (BLUP) of a. Equation (2) is the GLS solution, and Equation (3) is the MME solution of b. These two equations yield identical results, andb from either equation is termed the best linear unbiased estimator (BLUE) [25]. Ifâ is known, the LS version of the BLUE given by Equation (3) is computationally efficient relative to the GLS of Equation (2), requiring the V inverse, or the joint MME solutions ofb andâ, requiring the A inverse. The AGLS method uses two approximations. The first approximation is to useã from routine genetic evaluation as an approximation ofâ in Equation (3): where y * = y − Zã, andã is the column vector of 2(PTA) with PTA being the predicted transmission ability from the routine genetic evaluation. Equation (4) achieves the benefit of sample stratification correction from mixed models using pedigree relationships without the computing difficulty of inverting V or A. The second approximation of the AGLS approach is the t-test using the LS, rather than the GLS formula of the t-statistic, to avoid using the V inverse in the GLS formula. The significance tests for additive and dominance SNP effects used the t-tests of the additive and dominance contrasts of the estimated SNP genotypic values [23,26]. The t-statistic of the AGLS was calculated as: where L j = additive or dominance contrast; var(L j = standard deviation of the additive or dominance contrast; s a = row vector of additive contrast coefficients = P 11 /p 1 0.5P 12 (p 2 − p 1 )/(p 1 p 2 ) −P 22 /p 2 ; s d = row vector of dominance contrast coefficients = −0.5 1 0.5 ; v 2 = (y − Xb) (y − Xb)/(n − k)= estimated residual variance; g = column vector of the AGLS estimates of the three SNP genotypic effects of g 11 , g 12 , and g 22 from Equation (4); (X X) − gg = submatrix of (X X) − corresponding toĝ; where p 1 = frequency of A 1 allele, p 2 = frequency of A 2 allele of the SNP, P 11 = frequency of A 1 A 1 genotype, P 12 = frequency of A 1 A 2 genotype, P 22 = frequency of A 2 A 2 genotype, n = number of observations, and k = rank of X. The formula of s a defined above allows Hardy-Weinberg disequilibrium [26] and simplifies to p 1 p 2 − p 1 −p 2 under Hardy-Weinberg equilibrium. The additive effects of each SNP were estimated using three measures, the average effect of gene substitution, the allelic mean, and the allelic effect of each allele based on quantitative genetics definitions [26,27]. The allelic mean (µ i ), the population mean of all genotypic values of the SNP (µ), the allelic effect (a i ), and the average effect of gene substitution of the SNP (α) are: µ 1 = P 11.1 g 11 + 0.5P 12.1 g 12 (6) µ 2 = 0.5P 12.2 g 12 + P 22.2 g 22 (7) where P 11.1 = P 11 /p 1 , P 12.1 = P 12 /p 1 , P 12.2 = P 12 /p 2 , and P 22.2 = P 22 /p 2 . The additive effect measured by the average effect of gene substitution of Equation (10) is the distance between the two allelic means or effects of the same SNP and is the fundamental measure for detecting SNP additive effects, as shown by the t-statistic of Equation (5). The allelic effects defined by Equation (9) provide an understanding of the effect size and direction of each allele. However, the allelic effect of Equation (9) is not comparable across SNPs because the allelic effect is affected by the genotypic mean of the SNP defined by Equation (8). To compare allelic effects across SNPs, we replaced the SNP genotypic mean (µ) in Equation (9) with the average of all SNP genotypic means (µ all ): Equation (11) was used only for the purpose of graphical displaying the allelic effects.
The dominance effect of each SNP was estimated as the dominance contrast ofĝ from Equation (4): δ = L d = d 12 − (d 11 +d 22 )/2 = g 12 − (g 11 + g 22 )/2 (12) where g ij is the AGLS estimates of the SNP genotypic value from Equation (4) (i, j = 1, 2) and d ij is the dominance value (dominance deviation) of the A i A j SNP genotype: In this study, overdominance refers to the fact that the dominance value of the heterozygous genotype is more extreme than that of either homozygous genotype, i.e., d 12 > d 11 (1) and (4)), and the genotypic means of the original phenotypic values without removing the pedigree additive values were also used for the interpretation of the dominance effects. For the example of using the original phenotypic values, complete dominance of the dominant allele over the recessive allele was defined by y rd = y dd (not observed in this study), nearly complete dominance by y rd ≈ y dd (observed in this study), partial dominance by y rd < dd (observed in this study), and positive overdominance by y rd > y dd and y rd > y rr (not observed in this study), where y rr = the mean of the original phenotypic values without removing the pedigree additive (breeding) values of cows with the homozygous recessive genotype, y rd = the mean of the original phenotypic values without removing the pedigree additive (breeding) values of cows with the heterozygous genotype, and y dd = the mean of the original phenotypic values without removing the pedigree additive (breeding) values of cows with the homozygous dominant genotype of the SNP. A neutral dominant allele was defined as the lack of deviation of the genotypic mean of the homozygous dominant genotype from the population mean and was considered to have no effect on the phenotype by the dominant allele when in homozygous status.
To evaluate the impact of sharply negative homozygous recessive genotypes, a measure of negative impact was calculated as the difference between the mean phenotypic values of cows with the homozygous recessive genotypes and the mean values of the other two genotypes, the homozygous dominant genotype and the heterozygous genotype: NI = y rr − (y rd +y dd )/2 (14) where NI = negative impact of the homozygous recessive genotype.
To understand the potential reproductive functions of selected candidate genes, Gene Ontology (GO) analysis was performed using the OmicShare platform (www.omicshare. com/tools, accessed on 12 June 2023).

Conclusions
The million-cow GWAS identified many additive effects and a small number of dominance effects affecting the three fertility traits (DPR, CCR, HCR) in U.S. Holstein cows with high statistical confidence. The additive effects mostly were small effects, whereas the dominance effects mostly were substantially larger than the additive effects. Each dominance effect typically involved a dominant allele and a recessive allele, where the recessive allele had a sharply negative effect when in homozygous status and the dominant allele neutralized most of the negative effect of the recessive allele when in heterozygous status, but behaved like a neutral allele when in homozygous status. Most of the effects were new; some effects confirmed previously reported effects; some effects involved candidate genes with known functions affecting reproduction. The results from this study provided a new understanding of genetic factors underlying the three fertility traits in U.S. Holstein cows.

Appendix A
ARS computing facilities. The use of the USDA-ARS computers by this research was supported by USDA-ARS Projects 8042-31000-002-00-D and 8042-31000-001-00-D.

Conflicts of Interest:
The authors declare no conflict of interest. Appendix A Figure A1. Additive effects for DPR in the 86.8-87.3 Mb region of Chr06. "ae-" indicates the negative allelic effect. "ae+" indicates the positive allelic effect. Red and green circles indicate very different statistical significance measured by the log10(1/p) of two closely linked SNPs. The image of the gene regions at the bottom of this figure was from Ensembl [28]. Each green arrowed line indicates an SNP position.   Figure A3. Examples of the dominance effects of daughter pregnancy rate (DPR). DPR dominance effects were partial dominance, and the dominant alleles were neutral alleles when in homozygous status with genotypic means nearly overlapping the population mean of all genotypes or all cows.