Genetic Variability of Methane Production and Concentration Measured in the Breath of Polish Holstein-Friesian Cattle

Simple Summary Methane is one of the main contributors to climate change. A potential to reduce emissions by genetic selection exists; however, the genetic architecture of methane production remains largely unknown. We aimed to estimate its heritability and to perform genome-wide association studies for the identification of candidate genes associated with daily methane production and concentration. Methane was measured in the air exhaled by dairy cows during milking in an automated milking system and was analyzed using genomic information that was available for some of the cows. We showed that methane production and concentration are partly controlled by genes; however, no major genes were found. The estimated heritabilities indicate that selection for lower methane emission could be successful. Abstract The genetic architecture of methane (CH4) production remains largely unknown. We aimed to estimate its heritability and to perform genome-wide association studies (GWAS) for the identification of candidate genes associated with two phenotypes: CH4 in parts per million/day (CH4 ppm/d) and CH4 in grams/day (CH4 g/d). We studied 483 Polish Holstein-Friesian cows kept on two commercial farms in Poland. Measurements of CH4 and carbon dioxide (CO2) concentrations exhaled by cows during milking were obtained using gas analyzers installed in the automated milking system on the farms. Genomic analyses were performed using a single-step BLUP approach. The percentage of genetic variance explained by SNPs was calculated for each SNP separately and then for the windows of neighbouring SNPs. The heritability of CH4 ppm/d ranged from 0 to 0.14, with an average of 0.085. The heritability of CH4 g/d ranged from 0.13 to 0.26, with an average of 0.22. The GWAS detected potential candidate SNPs on BTA 14 which explained ~0.9% of genetic variance for CH4 ppm/d and ~1% of genetic variance for CH4 g/d. All identified SNPs were located in the TRPS1 gene. We showed that methane traits are partially controlled by genes; however, the detected SNPs explained only a small part of genetic variation—implying that both CH4 ppm/d and CH4 g/d are highly polygenic traits.


Introduction
Methane (CH 4 ) is one of the main contributors to climate change. Its global warming potential over 100 years is 28 times higher than that of carbon dioxide (CO 2 ) [1]. Among various sources of emissions, the livestock industry plays an important role [2]. Although CH 4 does not accumulate over time, by reducing its emission the livestock industry can slow down the climate change process [3].
Among enteric methane mitigation strategies, which include nutritional and management strategies, selective breeding to reduce the amount of methane produced per kg of milk is of interest. Genetic reduction of CH 4 emission requires measurements of emission from many animals. Emission of CH 4 can be measured in several ways. One group of measuring methods is breath analyzing techniques. These are well-documented methods based on the principle that about 90% of enteric CH 4 is released during eructation events and by breathing [4]. These methods are cost-effective and non-invasive. Some of them are based on infrared spectroscopy and have been used successfully by several research groups [5][6][7], showing the possibility of acquiring large amounts of data suitable for genetic analysis.
Once the methane measurements are collected, one may define various methane phenotypes, which include: methane production, methane intensity, methane yield, or residual methane production [8]. Commonly, the amount of methane produced by an animal [9,10] is expressed in grams or liters of CH 4 per day, which is calculated in accordance to Madsen et al. (2010) [11]. The aforementioned phenotype is based on the CH 4 /CO 2 ratio (ppm) multiplied by CO 2 L/day estimated from the heat production unit [11,12]. An alternative solution is to use raw CH 4 concentrations, which are obtained without taking into account other traits. Thus far there have been few studies based on direct CH 4 emission [13,14], and only the study of van Engelen et al. (2018) [15] provides heritability estimates-but without a focus on more detailed genetic architecture.
The genetic architecture of CH 4 remains largely unknown. Several studies have attempted to approach this issue, and results indicate that genes control approximately 20-30% of methane emission variability [9,10,16]. Differences between the estimates may arise from the complexity of the emission process and differences between populations and periods of measurement. In addition, the emission process is affected by several factors, e.g., biological processes in the rumen, type of diet, amount of feed intake, and size of the animal [17].
In this paper, we aim to estimate heritability for two methane phenotypes: CH 4 ppm/day (CH 4 ppm/d) and CH 4 grams/day (CH 4 g/d), and to perform genome-wide association studies (GWAS) for the identification of candidate genes associated with CH 4 ppm/d and CH 4 g/d in the air exhaled by dairy cows.

Materials and Methods
We studied 483 Polish Holstein-Friesian cows kept on two commercial farms in Poland. Detailed information on farms, measuring set-up, and data processing of records used in this study can be obtained from Pszczola et al. (2017) [7]. The CH 4 and CO 2 concentrations were measured on Farm 1 during two periods: from 02 December 2014 to 03 February 2016, and 01 June 2016 to 17 September 2016 from the 15th to 305th day in milk (DIM). Measurements on Farm 2 were taken over the period of 05 February 2016 to 14 March 2016, from the 5th to 305th DIM. Measurements of CH 4 and CO 2 concentrations were taken during milking using an FTIR gas analyzer (GASMET 4030; Gasmet Technologies Oy, Helsinki, Finland) that was installed in the feeding bin of the Automated Milking System (AMS, Astronaut, Lely Industries, NV, Maassluis, The Netherlands). Cows were milked repeatedly during the experiment.
Measurements of CH 4 and CO 2 concentrations were corrected for diurnal variation with a Fourier series approach [18,19] using the model described by Pszczola et al. (2017) [7], and averaged per cow per day. Therefore, the analyzed phenotypes were daily average CH 4 concentrations (CH 4 ppm/d), with a total of 34,359 CH 4 measurements. The detailed number of observations and average CH 4 emission per farm, along with production parameters, are presented in Table 1. The cows had ad-libitum group access to partial mixed rations, and additional concentrates were given to cows during milking. Further details on the feeding regime can be obtained from Pszczola et al. (2017) [7].  [11] by multiplying the CH 4 to CO 2 concentration ratio by the expected CO 2 production (liters/day) estimated from heat production [11,12]. This estimation of CO 2 production used the milk production, live weight, and day of pregnancy obtained from the AMS system. This resulted in an estimated CH 4 production expressed in liters per day, and was converted to grams per day using a CH 4 density of 0.668 g/L, which assumes normal temperature (20 • C) and pressure (101.325 kPa). For more details, see Pszczola et al. (2017) [7].
Overall, 330 of the 483 cows were genotyped with the Illumina BovineSNP50 v2.0 BeadChip (Illumina Inc., San Diego, CA, USA). The SNP data were processed with the following quality control checks: (1) being in Hard-Weinberg equilibrium, (2) not being monomorphic, and (3) having an SNP call rate of above 0.95. We also removed SNPs located on the sex chromosomes and unassigned SNPs. Six cows were removed as they had a call rate below 0.9. After quality control, 39,269 SNPs remained for the genome-wide association analysis.
The statistical model used for estimation of (co)variance components included random and fixed effects, which were selected based on the Akaike information criteria obtained from the lme4 package in R software [20,21]. Variance components were estimated using the REML approach in airemlf90 software [22].
The final model was the following: (1) in which CH 4 ijkl is the raw average daily CH 4 concentration in ppm of the ith cow in the jth lactation, on the kth day in milk (DIM) over a range of 5th to 305th day, and the lth farm-year-week of the measurement; βn are fixed regression coefficients. Lactation (LAC) had two levels: 1 and 2+, for which records up to lactation 8 were included. The general lactation curve was modeled using third-order Legendre polynomials and random animal (animal), and permanent environment (pe) effects were modeled with second-order Legendre polynomials [23].
The covariance structure for model (1) was: where G and P are covariance matrices of the random regression coefficients for animal and permanent environment effects, ⊗ is the Kronecker product, H is the combined pedigree and genomic relationship matrix, I is an identity matrix and σ 2 e is the residual variance. The G matrix was defined as: where Z contains genotypes adjusted with allele frequency and p j is the allele frequency for marker j [24]. The H matrix was created as in [25,26]: where subscripts 1 and 2 represent ungenotyped and genotyped animals, respectively, A is a pedigree matrix and G is a genomic relationship matrix. All three matrices were computed using the BLUPF90 suite of programs [22]. To investigate the existence of a possible population structure, we performed principal component analysis on the calculated G matrix. Some of the animals with phenotypes were not genotyped. Therefore, to utilize all the available information for estimating SNP effects, we used a single-step genomic BLUP (ssGBLUP) method that integrates phenotypes, genotypes, and pedigree information [25,26]. Genome-wide association studies were conducted with the BLUPF90 suite of programs [27]. The implemented algorithm uses the ssGBLUP procedure, which fits all the SNPs simultaneously and then obtains estimates of marker effects and their associated p-values from the estimated breeding values [28,29]. The main advantage of using ssBLUP for GWAS is the potential for using ungenotyped animals and the lack of need for using de-regressed phenotypes. The threshold for identifying significant SNPs was p ≤ 0.05; therefore, after Bonferroni correction to account for multiple testing (i.e., 0.05/39,269), we obtained a threshold of 5.9 on the -log10 scale.
Next, we calculated the proportion of genetic variance explained by significant SNPs. In addition, we calculated the proportion of genetic variance explained by the moving windows of adjacent SNPs. We used two approaches to define the windows: in the first, window size was arbitrarily fixed to 50 SNPs, and in the second, window size was set according to the average length of the haplotype block. A haplotype block was required to have at least 95% of combinations of SNPs within a region in very high linkage disequilibrium [30]. As the average block length in the bovine genome is 69.7 +/− 7.7 kb, the window size was set to 70 kb. As a result, the number of SNPs grouped per window varied from 1 to 5 SNPs across the chromosomes. In the final step, potential candidate SNPs were checked in the Cow QTL database [31] for linkage to previously detected quantitative trait loci (QTLs) for CH 4 production or other traits.

CH 4 Measurements
Variances of average daily CH 4 ppm/d were 36,401 (CH 4 ppm/d) 2 for Farm 1 and 27,820 (CH 4 ppm/d) 2 for Farm 2 and differed significantly (Bartlett test p-value < 0.001). Variances of average daily CH 4 g/d were 7789 (CH 4 g/d) 2 for Farm 1 and 10,962 (CH 4 g/d) 2 for Farm 2 and differed significantly (Bartlett test p-value < 0.001). Animals on Farm 1 and Farm 2 produced similar amounts of CH 4 ppm/d (Table 1). Although Farm 2 had higher CH 4 concentrations by about 12 ppm than Farm 1, differences between average CH 4 ppm/d per farm were not statistically significant (Welch two-sample t-test). In the case of CH 4 g/d (Table 1), Farm 2 had higher CH 4 concentrations by 107 g and was significantly different from Farm 1 (p-value < 0.001, Welch two-sample t-test).

Variance Components
Genetic, permanent environment, and residual variances for CH 4  Genetic, permanent environment effect, and residual variances for CH4 g/d throughout lactation, estimated using model (1), are presented in Figure 2. The genetic variance fluctuated during lactation, with an average of 1199 (g/d) 2 , a maximum of 1486 (g/d) 2 on the 5th DIM, and a minimum of 853 (g/d) 2 on the 292nd DIM. Permanent environment variance also fluctuated with a mean of 1709 (g/d) 2 , a maximum of 3397 on the 305th DIM, and a minimum of 1016 on the 102nd DIM. The residual variance was estimated at 2620 (g/d) 2 .   Genetic, permanent environment effect, and residual variances for CH4 g/d throughout lactation, estimated using model (1), are presented in Figure 2. The genetic variance fluctuated during lactation, with an average of 1199 (g/d) 2 , a maximum of 1486 (g/d) 2 on the 5th DIM, and a minimum of 853 (g/d) 2 on the 292nd DIM. Permanent environment variance also fluctuated with a mean of 1709 (g/d) 2 , a maximum of 3397 on the 305th DIM, and a minimum of 1016 on the 102nd DIM. The residual variance was estimated at 2620 (g/d) 2 .

Heritability Estimates
The heritability of CH 4 ppm/d systematically rose during lactation (Figure 3) starting at 0 on the 5th DIM and reaching a maximum of 0.14 at the end. The average heritability was 0.085. The heritability of CH 4 g/d (Figure 3)  The heritability of CH4 ppm/d systematically rose during lactation ( Figure 3) starting at 0 on the 5th DIM and reaching a maximum of 0.14 at the end. The average heritability was 0.085. The heritability of CH4 g/d (Figure 3) fluctuated with a mean of 0.22, a maxi mum of 0.26 on the 128th DIM, and a minimum of 0.13 on the 305th DIM.

Genome-Wide Association Study
The principal component analyses showed two clusters (Figure 4), indicating th presence of a family structure. Animals from both farms were present in the two clusters The obtained PC scores were small and explained 3.4% (PC1) and 2.5% of variance, re spectively. p-values were obtained to identify candidate SNPs associated with CH4 ppm/d and with CH4 g/d, and we found that none of the tested SNPs was significantly associated with the phenotypes. To identify candidate regions, we evaluated SNPs in 70 kbp and 5 SNP windows for CH4 ppm/d and CH4 g/d. The analysis using 70 kbp windows did no detect any promising SNPs. However, the 50-SNP window approach detected potentia candidate SNPs on BTA 14 which explained more than 0.09% of variance for CH4 ppm/d ( Figure 5) and more than 1% of genetic variance for CH4 g/d ( Figure 6). The discussed SNPs (Table 2) are located in the TRPS1 gene, which has been previously associated with milk fat yield, as evidenced by QTL:181825 [31].

Genome-Wide Association Study
The principal component analyses showed two clusters (Figure 4), indicating the presence of a family structure. Animals from both farms were present in the two clusters. The obtained PC scores were small and explained 3.4% (PC1) and 2.5% of variance, respectively. p-values were obtained to identify candidate SNPs associated with CH 4 ppm/d and with CH 4 g/d, and we found that none of the tested SNPs was significantly associated with the phenotypes. To identify candidate regions, we evaluated SNPs in 70 kbp and 50 SNP windows for CH 4 ppm/d and CH 4 g/d. The analysis using 70 kbp windows did not detect any promising SNPs. However, the 50-SNP window approach detected potential candidate SNPs on BTA 14 which explained more than 0.09% of variance for CH 4 ppm/d ( Figure 5) and more than 1% of genetic variance for CH 4 g/d ( Figure 6). The discussed SNPs (Table 2) are located in the TRPS1 gene, which has been previously associated with milk fat yield, as evidenced by QTL:181825 [31].

CH4 Measurements
The CH4 measurements varied across farms 1 and 2 for both CH4 ppm/d and CH4 g/d. According to Huhtanen et al. (2015) [32], potential sources of higher differences between farms may be sampling locations with different airflow patterns and cow variation in exhalation rate. Animals at Farm 2 were heavier (about 125 kg) and produced about 3 kg milk more per day (for more details see Table 1). This may indicate differences in feed intake or feed efficiency, although an investigation of this would require feed intake data, which were unavailable as the measurements were taken on commercial farms.

Variance Components
Estimates of genetic variance for CH4 g/d were lower when compared to the study of Pszczola et al. (2017) [7], who used the same method and data, but no genotypes. Therefore, the use of genotypes lowered the estimates. Fluctuations of permanent environment variance in both traits and high levels of residual variance are probably as a result of unaccounted disturbances such as changes in the exhalation rate of the cows, head movement, or changes in the airflow around the feeding bin that could have been present during the measuring period [32,33].

CH 4 Measurements
The CH 4 measurements varied across farms 1 and 2 for both CH 4 ppm/d and CH 4 g/d. According to Huhtanen et al. (2015) [32], potential sources of higher differences between farms may be sampling locations with different airflow patterns and cow variation in exhalation rate. Animals at Farm 2 were heavier (about 125 kg) and produced about 3 kg milk more per day (for more details see Table 1). This may indicate differences in feed intake or feed efficiency, although an investigation of this would require feed intake data, which were unavailable as the measurements were taken on commercial farms.

Variance Components
Estimates of genetic variance for CH 4 g/d were lower when compared to the study of Pszczola et al. (2017) [7], who used the same method and data, but no genotypes. Therefore, the use of genotypes lowered the estimates. Fluctuations of permanent environment variance in both traits and high levels of residual variance are probably as a result of unaccounted disturbances such as changes in the exhalation rate of the cows, head movement, or changes in the airflow around the feeding bin that could have been present during the measuring period [32,33].

Heritability Estimates
Pszczola et al. (2017) [7] reported an average heritability of 0.27 for CH 4 g/d for pedigree-based analyses of 483 Polish Holstein-Friesian cows. In this current study, the addition of genotype data for 330 of the 483 animals resulted in a decrease in the heritability estimates, suggesting that the genetic influence on CH 4 g/d may be lower than previously estimated. Lassen and Lovendahl [34] reported a heritability of 0.21 for Holstein-Friesian cattle, and heritability estimates for CH 4 g/kg of dry matter intake (DMI) predicted from milk fat composition ranged between 0.12 and 0.44 [16]. A possible explanation for these differing results is the higher number of observations (e.g., Lassen and Lovendahl (2016) [13] used records collected from 3121 animals in their study). Direct comparison of CH 4 heritability estimates from a range of studies requires traits that are comparable with one another. The CH 4 g/kg of DMI in the study of van Engelen et al., 2015 [16] is predicted from milk fat composition; therefore, its heritability estimates depend on production parameters. Genomic studies based on CH 4 g/kg of DMI could result in studying production parameters and not CH 4 emission itself. An alternative could be studies based on direct CH 4 measurements. The CH 4 production expressed in CH 4 g/d is calculated using a ratio of direct measurements of CH 4 and CO 2 multiplied by the predicted heat production, which is calculated using production parameters (i.e., fatprotein corrected milk), resulting in higher CH 4 emission heritability estimates. Therefore, while the CH 4 g/d is based on direct CH 4 measurements, it is still to some extent associated with cow production traits. Lassen and Lovendahl (2016) [34] indicated a positive additive genetic correlation of 0.43 ± 0.10 between CH 4 g/d and fat-protein corrected milk, which may have an impact on the results of the genetic studies on CH 4 g/d. Van Engelen et al. (2017) [15] used log 10 -transformed CH 4 ppm/d and obtained a heritability of 0.11. This was a similar phenotype to that of CH 4 ppm/d presented in this current study, since it also is based only on direct CH 4 measurements, without any additional traits involved in the calculations. Heritability estimates of both CH 4 ppm/d and log 10 -transformed CH 4 ppm/d are relatively low when compared to other CH 4 phenotypes. While genetic studies based on CH 4 ppm can remain unaffected by other traits, estimates of CH 4 ppm/d or log 10 -transformed CH4 ppm/d are affected by environmental factors and animal behavior during the measuring time. The exhalation rate of the cow, head movement, or changes in the airflow around the feeding bin may affect CH 4 ppm/d estimation and, as a result, may affect the estimation of CH 4 ppm/d heritability [32,33]. It is, therefore, still unclear which phenotype is more reliable, and more studies are required in this field.

Genome-Wide Association Study
The principal component analysis showed some stratification in the population; however, the magnitude of PC values was small, and both farms were represented in the two clusters ( Figure 4). As shown by Mancin et al. [35], who analyzed the impact of different population structures on GWAS results, the level of stratification present in our data should have no influence on our analyses.
QTLs linked to the development of the digestive tract were identified by Pszczola et al. (2018) [9] using similar data, but with CH 4 g/d phenotype and a Bayesian variable selection method [36]. In this study, we used single-step GWAS for post-analyses of ssBLUP, since it allows the inclusion of non-genotyped animals and thus provides more insight into the genetic architecture of the studied trait. The use of different methods on the same dataset also allows for the validation of previous results, as well as providing a reference point for the analysis of the new CH 4 phenotype (i.e., CH 4 ppm/d). In our current approach, none of the SNPs reported by Pszczola et al. (2018) [9] reappeared, which could indicate that Bayesian variable selection is more sensitive than the single-step GWAS approach.
Several studies [37][38][39] used the GBLUP method for estimation of marker effects for GWAS and Aguilar et al. (2019) [28] indicated an empirical agreement between association studies on beef cattle in the literature and their findings based on single-step GWAS. Legarra et al. (2018) [40] also showed similarities between the results of the Bayesian approach and the GBLUP approach for GWAS studies. Pszczola et al. (2018) [9] used the Bayesian approach for the GWAS analyses of CH 4 g/d, for which only genotyped animals could be used. In this current study, we decided to use ssGWAS, as it allowed us to include all the 483 phenotyped animals, of which only 330 were genotyped. A larger number of both genotyped and ungenotyped animals can improve the power of a GWAS analysis; however, in our case, it did not yield a higher number of significant SNPs as compared with Pszczola et al. (2018) [9].
In this study, we detected potentially novel SNPs for both CH 4 ppm/d and CH 4 g/d. The identification of the same SNPs in both traits-which express CH 4 emission differentlymay indicate a true association with CH 4 emission. However, the found QTLs, similar to previous studies, explained only a small percent of the genetic variation-suggesting either the need for analysis on a larger dataset or a lack of SNPs with a large effect on the analyzed traits. Another possible reason for finding a small number of SNPs with significant effects on the investigated trait is the size of the SNP chip used. In this study we used the Illumina BovineSNP50 v2.0 BeadChip; possibly, a higher density chip would help to pin-point more significant SNPs.

Conclusions
This research aimed to study the genetic architecture of the CH 4 ppm/day and CH 4 g/day phenotypes. The obtained heritabilities of both traits indicated that CH 4 emission is partially controlled by genes. While GWAS studies reported potentially novel SNPs associated with both CH 4 phenotypes, all SNPs were associated with only one gene and explained only a small percent of the genetic variation. We conclude that both CH 4 ppm/d and CH 4 g/d are highly polygenic traits; thus, the identification of a specific gene or region will be challenging and will require large datasets.