Identification of Genetic Effects of ACADVL and IRF6 Genes with Milk Production Traits of Holstein Cattle in China

With the development of high-throughput sequencing, RNA sequencing has been widely used in the identification of candidate genes for complex traits in livestock, and the functional genes and mutations with large genetic effects on milk production traits can provide molecular information for marker-assisted selection to increase the selection accuracy and accelerate genetic gain in dairy cattle. Our previous study on the liver transcriptome of Holstein cows found that acyl-CoA dehydrogenase (ACADVL) and interferon regulatory factor 6 (IRF6) are differentially expressed between dry and peak lactation periods, as well as that they are involved in lipid metabolism and the proliferation and differentiation of mammary epithelial cells. Thus, the two genes were considered candidates for milk traits. Hence, this study further collected 1186 Holstein cows from 110 sire families to investigate their genetic associations with milk yield and composition traits. By resequencing the entire exons and 2000 bp of the 5′ and 3′ flanking regions of the two genes, we identified eight SNPs in ACADVL and eight SNPs in IRF6. Subsequent single-locus association analyses showed that the eight SNPs in ACADVL were all significantly associated with milk fat yield, fat percentage, and protein yield (p values ≤ 0.0001–0.0414), and the eight SNPs in IRF6 were associated with milk, fat, and protein yields in the first or second lactation (p values ≤ 0.0001–0.0467). Using Haploview 4.2, one haplotype block with eight of the SNPs in ACADVL (D’ = 0.99–1.00) and two haplotype blocks in IRF6 with three of the SNPs in each were observed (D’ = 0.98–1.00). Similarly, the haplotype combinations of ACADVL were significantly associated with milk yield, fat percentage, fat yield, and protein yield in the two lactations (p values ≤ 0.0001–0.0125), and those of IRF6 were associated with five milk traits (p values ≤ 0.0001–0.0263). Furthermore, with the JASPAR software, it was predicted that the SNPs 19:g.26933503T>C in ACADVL and 16:g.73501985G>A in IRF6 changed the transcription factor binding sites of ZEB1, PLAGL2, and RHOXF1, implying their impacts on the expressions of the corresponding genes. Our findings demonstrated that the ACADVL and IRF6 genes have significant genetic effects on milk yield and composition traits, and the valuable SNPs might be used as genetic markers for genomic selection programs in dairy cattle.


Introduction
It is well known that milk and dairy products are one of the ideal foods for humans because they not only possess a comprehensive nutritional composition, such as rich nutrients, proteins, fatty acids, vitamins, etc. [1], but also reduce the morbidity of some chronic diseases, such as type II diabetes, childhood obesity, and cardiovascular disease [2][3][4].
With the development of China's economy, the dairy industry has become one of the most important components of the Chinese economy [5], and the social demand is gradually changing from an increased milk intake to the intake of high-quality milk [6]. For the dairy industry, milk production traits are the most important economic traits, which include milk production, fat yield, fat percentage, protein yield, and protein percentage, and there is a strong association between all of the milk production traits [7,8]. After years of continuous breeding, the production performance and the genetic improvement of dairy cattle have made great progress, but traditional breeding methods still have shortcomings, such as a long generation interval and slow progress. With the advent of genomic selection, the breeding of dairy cattle has developed more rapidly [9]. Genomic selection can estimate the genomic estimated breeding values (GEBVs) of dairy cows by using genomewide SNP markers and combining them with progeny assays [10][11][12] and then adding the gene information with large genetic effects to the SNP chips used in the genomic selection to improve the accuracy of the GEBVs [13].
With the development of high-throughput sequencing, RNA sequencing has been widely used in the identification of candidate genes for complex traits in livestock [14][15][16]. In our previous study, we analyzed the liver transcriptome of three healthy Holstein cows in dry, early, and peak lactation periods and identified promising candidate genes for milk protein and fat traits [17]. The acyl-CoA dehydrogenase (ACADVL) gene, which is located on chromosome 19 and has a full length of 5307 bp, acts on the first step of fatty acid β-oxidation by encoding the very long-chain acyl-coenzyme A dehydrogenase (VLCAD) and plays an important role in lipid metabolism and long-chain fatty acid oxidation energy provision [18]. The ACADVL has also been shown to correlate positively with the concentration of triglycerides (TGs) in the liver, β-hydroxybutyric acids (BHBA) in mice [19], and non-esterified fatty acids (NEFA) in dairy cow serum [20], all of which are the main precursors of synthetic milk fat [21]. The interferon regulatory factor 6 (IRF6) gene is a member of the IRF family of transcription factors that plays a major role in innate immune responses and is involved in tumor suppression, cell cycle regulation, and apoptosis [22][23][24][25]. The IRF6 is responsible for the proliferation and differentiation of mammary epithelial cells, and a large number of mammary epithelial cells constitute the acinus, which is the basic unit of mammary synthesis and milk production [26,27]. Based on these previous studies, the ACADVL and IRF6 genes are probably candidates for milk production traits. Therefore, in this study, we detected the genetic variants within the ACADVL and IRF6 genes, statistically analyzed whether both of the genes had significant genetic effects on the milk yield and composition traits in a Holstein cow population, and identified potential functional mutations. Furthermore, our study will provide valuable information for understanding the genetic architecture of milk traits and genetic markers for genomic selection programs in dairy cattle.

Animal and Phenotype Data Collection
A total of 1186 Holstein cows from 110 sire families with daughter numbers ranging from 5 to 80 and an average of 11 daughters per sire were used in this study. All of the cows in this study were selected from two farms in Hebei Province, China, where regular and standard performance testing (dairy herd improvement, DHI) has been regularly conducted for many years. The phenotypic values for the five milk production traits (305 d milk yield, 305 d milk protein yield, 305 d milk fat yield, 305 d milk protein percentage, and 305 d milk fat percentage) for these cows were provided by the Dairy Data Center of China, Dairy Association of China (http://www.bdcc.com.cn/ (accessed on 10 November 2021); Beijing, China). They were calculated based on test-day records of the DHI data, and there were 1122 records for the first lactation and 617 records for the second lactation. The descriptive statistics of the phenotypic values of the milk production traits in the first and second lactations are shown in Table S1.
The genomic DNA was extracted from each blood sample using the Tiananpu Blood Deoxyribonucleic Acid Kit (Tiangen, Beijing, China). The concentration and purity were detected by a NanoDrop 2000 spectrophotometer (Thermo Scientific, Hudson, DE, USA) and 1.0% agarose gel electrophoresis, respectively.

SNP Identification and Genotyping
Based on the genomic sequences of the bovine ACADVL (Gene ID: 282130) and IRF6 (Gene ID: 614253) genes from GenBank (https://www.ncbi.nlm.nih.gov/nuccore (accessed on 20 March 2022)), 19 and 16 pairs of primers (Table S2) were designed using Primer 3.0 (http://bioinfo.ut.ee/primer3-0.4.0/ (accessed on 20 March 2022)) to amplify the entire coding region and the upstream and downstream regulatory regions of 2000 bp of the ACADVL and IRF6, respectively. The primers were synthesized at the Beijing Genomics Institute (Beijing, China). We randomly selected blood genomic DNA samples from 111 cows and diluted each sample to 50 ng/µL. Next, we placed each sample randomly into five pools, four of which contained 22 samples, and the fifth one contained 23 samples. Using the pooled DNA samples as templates, PCR amplifications were performed in a final reaction volume of 2 µL of genomic DNA, 1.25 µL of each primer (10 mM), 12.5 µL of Premix TaqTM (Takara, Dalian, China), and 8µL of DNase/RNase-free deionized water (Tiangen, Beijing, China). The following PCR protocol was used: 5 min at 94 • C for the initial denaturation, followed by 35 cycles at 94 • C for 30 s, 60 • C for 30 s, and 72 • C for 30 s, and a final extension at 72 • C for 7 min for all primers. Then, each PCR product was sequenced by the ABI3730XL DNA analyzer (Applied Biosystems, Foster, CA, USA), and the sequences were aligned with the reference genome (ARS-UCD1.2) using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 9 May 2022)) to search for potential SNPs.

Linkage Disequilibrium (LD) Estimation and Association Analyses
First, the Haploview 4.2 software was utilized to calculate the degree of the LD extent among the SNPs of the ACADVL and IRF6 genes. The extent of the LD is measured by the D' value, which is proportional to it. The haplotype block with a frequency greater than 0.05 was retained.
The additive genetic relationship numerator matrix A was constructed using the SAS 9.4 software to trace the pedigree back to three generations of the 1186 involved subjects. The variance components were provided by the Dairy Date Center of China and estimated based on the data of 30,000 Holstein cows in China using the DMU package version 6 (University of Aarhus, Foulum, Denmark). Finally, we estimated the associations between the SNPs and the haplotype blocks with the five milk production traits using a mixed animal model and the SAS 9.4 software. The animal model is as follows: where Y is the phenotypic value of each trait of each cow, µ is the population mean, HYS is the fixed effects of the herd, year, and season, b is the regression coefficient for covariant M, M is the month age effect of calving, G is the effect of the genotype or haplotype, a is the random additive effect of the subjects, distributed as N , and α = a + d(q − p) where AA, BB, and AB are the least square means of the milk production traits in the corresponding genotypes, p is the frequency of allele A, and q is the frequency of allele B.

Biological Function Prediction
We used the Jaspar software (http://jaspar.genereg.net/ (accessed on 26 July 2022)) to predict whether the SNPs in the 5 ′ flanking regions of the ACADVL and IRF6 genes changed the transcription factor binding sites (TFBs) (relative score ≥ 0.90).

SNP Identification
By resequencing the entire coding regions of the ACADVL and IRF6 genes and their upstream and downstream regulatory regions, a total of sixteen SNPs were found in ACAD-VL and IRF6. For the ACADVL gene, the following eight SNPs were identified: rs41607272, rs443625020, and rs134933545 in the introns, rs41904274, rs209157804, rs380322647, and rs211172355 in the 3 ′ flanking region, and rs41642657 in the 5 ′ flanking region. The following eight SNPs of the IRF6 gene were also detected: rs110521095 in the exon, rs41819993, rs109620848, rs41819982, rs111023669, and rs41819977 in the intron, rs136705901 in the 5 ′ flanking region, and rs133566767 in the 3 ′ flanking region. Detailed information and the genotypic and allele frequencies of the sixteen SNPs are shown in Table 1.

Single-Locus Association Analyses with Five Milk Traits
The associations of the SNPs in the ACADVL and IRF6 genes with genetic effects on the five milk traits are presented in Table 2. For the ACADVL gene, the SNP g.26933503T>C was strongly associated with fat and protein yields in both lactations (p values: 0.0064-0.0414) and was strongly associated with fat percentage only in the second lactation (p value = 0.0127). The g.26936476T>C, g.26937081G>C, and g.26941116C>A were significantly associated with all of the traits except for protein percentage in the first lactation (p values < 0.0001-0.0172), while in the second lactation, only the g.26936476T>C had a strong association with milk yield (p value = 0.004), and the g.26937081G>C and g.26941116C>A had significant associations with fat percentage, fat yield, and protein yield (p values < 0.0001-0.0375). A significant association was found between the fat yield and the g.269400-95G>T and g.26941299G>A in the first lactation (p values = 0.0051 and 0.006), while in the second lactation, the g.26940095G>T and g.26941299G>A were significantly associated with fat percentage, fat yield, and protein yield, respectively (p values: 0.006-0.0258). The g.26936494C>G was strongly associated with all of the five traits except for protein percentage in the first lactation (p values < 0.0001-0.0397). Finally, the g.26941290A>G had an association with milk, fat, and protein yields in the first lactation (p values 0.0059-0.0431) and with fat percentage and protein yield in the second lactation (p values = 0.0127 and 0.024).   As for the IRF6 gene, in the first lactation, the g.73507276G>A, g.73515941C>T, g.73516-019C>T, and g.73517802G>C were significantly associated with all of the milk production traits apart from milk yield (p values < 0.0001-0.0311). The g.73501985G>A was only associated with fat percentage (p values = 0.0059), and the g.73510990T>C had associations with fat percentage, milk yield, and protein yield (p values: 0.0237-0.0467). The g.73506456C>T was associated with milk, fat, and protein yields in both lactations (p values: 0.0001~0.0332), and the g.73519434G>A was associated with all of the five traits in the two lactations (p values < 0.0001-0.0068). In addition to the two SNPs above, all of the remaining SNPs were significantly associated with all of the traits except for protein percentage (p values < 0.0001-0.0465). In addition, as shown in Table S3, the additive, dominant, and substitution effects of these sixteen SNPs were significantly associated with at least one milk production trait in the first or second lactation (p values < 0.05).

Haplotype-Based Association Analyses with Five Milk Traits
Using Haploview 4.2, the eight SNPs of the ACADVL gene were observed to be highly linked, and a haplotype block was formed (A: D' > 0.99, Figure 1). The haplotype block consists of five haplotypes, each with haplotypes H1 (CCCCGAGA), H2 (TCCGTCAG), H3 (CCCGGCGA), H4 (CTCGGCGA), and H5 (CCGGGCGA), and their frequencies were as follows: 42.1%, 22.8%, 15.7%, 12.7%, and 6.1%, respectively. The haplotype blocks were strongly associated with four of the milk production traits (p values: < 0.0001-0.0125) except for protein percentage in the two lactations (Table 3). Among them, the haplotype H2 was superior to the others in milk, fat, and protein yields, making it the dominant haplotype combination.

Functional Variation Prediction Caused by SNPs
We predicted the effect of the two SNPs (19:g.26933503T>C and 16:g.73501985G>A) in the 5′ promoter region of the ACADVL and IRF6 genes on TFBSs using the JASPAR software ( Table 4). The results showed that the mutation from allele T to C of the 19:g.26933503T>C in ACADVL caused the disappearance of the binding site (BS) for the transcription factor (TF) zinc finger E-box-binding protein 1 (ZEB1) (relative score = 0.91). As for the IRF6 gene, two haplotype blocks were formed, and each of them included three SNPs (B: D' > 0.98, C: D' > 1.00; Figure 1). The haplotype block 1 formed the following four haplotypes: H1(GCA), H2 (ATG), H3 (GCG), and H4 (GTG). The haplotype block 2 formed the following three haplotypes: H1 (CCC), H2 (CCG), and H3 (TTG). The frequencies of block 1 were as follows: 51.3%, 26.5%, 11.5%, and 10.6%; the frequencies of block 2 were as follows: 58.7%, 36.0%, and 5.3%, respectively. The haplotype combinations in block 1 were significantly associated with all of the five milk production traits in both lactations (p values < 0.0001-0.0018). In block 2, the haplotype combinations were significantly associated with four of the traits except for protein percentage in the second lactation (p values < 0.0001-0.0263). In these two blocks, haplotype H2 was the advantageous haplotype for the yield traits.

Functional Variation Prediction Caused by SNPs
We predicted the effect of the two SNPs (19:g.26933503T>C and 16:g.73501985G>A) in the 5 ′ promoter region of the ACADVL and IRF6 genes on TFBSs using the JASPAR software ( Table 4). The results showed that the mutation from allele T to C of the 19:g.26933503-T>C in ACADVL caused the disappearance of the binding site (BS) for the transcription factor (TF) zinc finger E-box-binding protein 1 (ZEB1) (relative score = 0.91). For the 16:g.73501-985G>A in IRF6, allele G created a BS for the transcription factor pleomorphic adenoma gene-like protein 2 (PLAGL2) (relative score = 0.98), and allele A invented a BS for the Rhox homeobox family member 1 (RHOXF1) (relative score = 0.97).

Discussion
In our previous RNA-seq study, the ACADVL and IRF6 genes were identified as key candidates for milk production traits [17]. Here, we further validated that these two genes indeed displayed significant genetic effects on the milk yield and composition traits of Holstein cattle in China.
The ACADVL gene is involved in fatty acid degradation and metabolism pathways, but it is also located within the significant intervals of known QTLs for milk fat percentage, fat yield, protein percentage, and protein yield in dairy cattle [28][29][30]. Some researchers have found that a mutation in ADACVL causes the loss of VLCAD function, meaning that long-chain fatty acids cannot be oxidized and decomposed, finally resulting in very-long chain acyl-CoA dehydrogenase deficiency (VLCADD), which, in turn, impacts the heart, liver, and muscle functions in humans, dairy cattle, and dogs [31][32][33][34][35]. Tucci et al. reported that ACADVL-knocked-out mice had impaired lipid metabolism and developed hepatic steatosis when fed medium-chain triglycerides [36]. In dairy cows, it has been reported that ACADVL is also involved in the process of lipid metabolism [37] and that ACADVL expression in the liver is significantly higher before calving and decreases after calving, which indicates that the upregulation of ACADVL promotes the hepatic lipid metabolism to cope with the fat mobilization required for calving and supports greater milk production [38,39]. The IRF6 gene is a DNA-binding transcriptional activator that is in a nonphosphorylated state at the cell quiescent phase and plays a crucial role in the proliferation and differentiation of mammary epithelial cells. When cells are in their proliferative phase, the IRF6 protein expression and phosphorylation are regulated by the synergistic action of maspin (a mammary serine protease inhibitor), resulting in a decrease in protein stabil-ity and, thereby, promoting the proliferation and differentiation of mammary epithelial cells [40,41]. Bailey et al. found that IRF6 expression is up-regulated in mammary glands after cessation of lactation [42]. The additive effects are the cumulative effects between the alleles and non-alleles and the part of the genetic inheritance that can be fixed and stably inherited [43]. The dominant effects are the interactions between the genes at the same locus, but this part of the effect cannot be stably inherited and is the main cause of heterosis [44]. Additionally, the substitution effects are the numerical changes that occur when one gene replaces another.
For dairy cattle breeding, the additive and substitution effects are the priority considerations for exploiting gene effects, which could be steadily inherited to the next generation. Hence, based on the association analysis results and the additive and substitution effects on each SNP, we concluded that the ACADVL and IRF6 genes had significant genetic effects on milk yield and compositional traits.
The binding of TFs to TFBSs regulates the transcription of target genes, thereby affecting gene expression [45]. The SNPs located at the TFBs may affect the binding of TFs, leading to differences in gene expression between individuals with different genotypes [46]. In this study, it was predicted that 19:g.26933503T>C changed the bindings of TF ZEB1, which is a highly conserved and multifunctional transcription factor. Some studies have shown that ZEB1 can either enhance or repress the activity of gene promoters [47,48]. Additionally, we found that Holstein cows with the genotype TT of 19:g.26933503T>C in the first lactation had significantly higher milk fat and protein percentages than those with a genotype CC. In addition, for the 16:g.73501985G>A in IRF6, allele G invented a BS for PLAGL2, and an allele created a BS for RHOXF1. PLAGL2 is a zinc finger protein derived from the PLAG gene family (PLAG1, PLAGL1, and PLAGL2). PLAGL2 specifically binds to consensus sequences in target gene promoters, which include a core sequence (GRGGC) and a cluster sequence (RGGK) separated by seven random nucleotides [49]. RHOXF1 is a TF of the RHOX family that could regulate the development of animal embryos [50]. The milk production traits, except for protein percentage, of the AA genotype subjects at the 16:g.73501985G>A in IRF6 were significantly higher than those of the GG genotype subjects in the second lactation. Therefore, we hypothesized that the phenotypic changes in the milk production traits in dairy cows may be due to changes in gene expression caused by such SNPs. However, the biological functions of these two SNPs still need further indepth investigations to be validated.

Conclusions
In conclusion, by performing single-locus and haplotype-based association analyses, we confirmed the significant genetic effects of the ACADVL and IRF6 genes on milk yield and composition traits. Additionally, the haplotype H2 in these two genes could be used as a genetic marker to increase milk, fat, and protein yields. Of note, the SNPs 19:g.26933503-T>C in ACADVL and 16:g.73501985G>A in IRF6 might change the TFBs, thereby regulating the expressions of the two genes. Thus, the identified SNPs in the ACADVL and IRF6 genes could be used for genomic selection programs in dairy cattle.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13122393/s1. Table S1. Descriptive statistics of the phenotypic values for milk production traits and pedigree information. Table S2. Primers and procedures for the PCR used in the SNP identification of the ACADVL and IRF6 genes. Table S3. Additive, dominant, and allele substitution effects of 16 SNPs in the ACADVL and IRF6 genes on milk yield and composition traits of Holstein cattle in China during two lactations.
Author Contributions: Conceptualization, D.S., K.W. and B.H.; experiment design and resources, D.S.; blood collection, experiments, and data analysis, P.P., Y.L. and W.Z.; writing-original draft, P.P.; writing-review and editing, D.S. All authors have read and agreed to the published version of the manuscript.