Single Nucleotide Polymorphisms of ALDH18A1 and MAT2A Genes and Their Genetic Associations with Milk Production Traits of Chinese Holstein Cows

Our preliminary work had suggested two genes, aldehyde dehydrogenase 18 family member A1 (ALDH18A1) and methionine adenosyltransferase 2A (MAT2A), related to amino acid synthesis and metabolism as candidates affecting milk traits by analyzing the liver transcriptome and proteome of dairy cows at different lactation stages. In this study, the single nucleotide polymorphisms (SNPs) of ALDH18A1 and MAT2A genes were identified and their genetic effects and underlying causative mechanisms on milk production traits in dairy cattle were analyzed, with the aim of providing effective genetic information for the molecular breeding of dairy cows. By resequencing the entire coding and partial flanking regions of ALDH18A1 and MAT2A, we found eight SNPs located in ALDH18A1 and two in MAT2A. Single-SNP association analysis showed that most of the 10 SNPs of these two genes were significantly associated with the milk yield traits, 305-day milk yield, fat yield, and protein yield in the first and second lactations (corrected p ≤ 0.0488). Using Haploview 4.2, we found that the seven SNPs of ALDH18A1 formed two haplotype blocks; subsequently, the haplotype-based association analysis showed that both haplotypes were significantly associated with 305-day milk yield, fat yield, and protein yield (corrected p ≤ 0.014). Furthermore, by Jaspar and Genomatix software, we found that 26:g.17130318 C>A and 11:g.49472723G>C, respectively, in the 5′ flanking region of ALDH18A1 and MAT2A genes changed the transcription factor binding sites (TFBSs), which might regulate the expression of corresponding genes to affect the phenotypes of milk production traits. Therefore, these two SNPs were considered as potential functional mutations, but they also require further verification. In summary, ALDH18A1 and MAT2A were proved to probably have genetic effects on milk production traits, and their valuable SNPs might be used as candidate genetic markers for dairy cattle’s genomic selection (GS).


Introduction
Milk, cheese, yogurt, and other dairy products can provide good nutrition. Some studies have shown that certain components in milk and dairy products are beneficial for gastrointestinal health and have neutral to beneficial effects on biomarkers of inflammation [1,2]. The protein in milk plays a certain role in strengthening muscle, reducing blood pressure, and helping learning and memory. Trans fatty acids in milk fat may be associated with anticarcinogenic properties in humans, and may decrease tumor growth and the risk of coronary heart disease [1]. In recent years, while China's economy has developed rapidly and people's living standards have improved, the emphasis on healthy diet is increasing, leading to the soaring proportion of dairy product consumption [3,4]. Meanwhile, China's dairy industry is developing steadily and has achieved remarkable significant SNPs to provide some reference information for their application on GS chip development in dairy cattle. In addition, we predicted the possible effect of the variants at the identified SNP loci on gene expression based on their location, such as transcription factor binding sites (TFBSs), laying a research foundation for further exploring the causal mutations of important traits in dairy cows.

Animal Selection, Pedigree, and Phenotypic Data Collation
We used 924 daughters of 44 Chinese Holstein bull families from 22 farms of Beijing Shounong Animal Husbandry Development Co., Ltd. (Beijing, China) as the experimental population. Among them, the average number of daughters per bull mentioned above was 21 (ranging from 6 to 62), and each cow had three generations of genealogical information. The offspring were all raised from 2009 to 2015, with good health, accurate pedigree, and standardized Dairy Herd Improvement (DHI) records. The DHI measurement method referred to the national standard of "Technical Specification of Chinese Holstein cattle performance test" (standard number NY/T 1450-2007), and the measurement indicators included 305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage ( Table S1). The study was conducted in accordance with Guide for the Care and Use of Laboratory Animals and approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University (Beijing, China; permit number: DK996).

Genomic DNA Extraction from Frozen Semen and Blood Samples
The genomic DNA of the 44 Chinese Holstein bulls' frozen semen samples and their 924 daughters' blood samples were extracted by the optimized high-salt method and a TIANamp Blood DNA Kit (Tiangen Biochemical Technology, Beijing, China), respectively. The concentration and integrity of DNA samples were tested using a NanoDrop2000 spectrophotometer (Thermo Science, Hudson, NH, USA) and gel electrophoresis (1.5%), respectively.

Polymorphism Detection of Candidate Genes
Based on the sequence of full coding region and 2000 bp of upstream and downstream regulatory regions of the ALDH18A1 and MAT2A genes of the cattle in GenBank (National Library of Medicine. Available online: https://www.ncbi.nlm.nih.gov/nuccore (accessed on 8 September 2021)), we designed primers (Table S2) using Primer 3.0 (Primer 3. Available online: https://bioinfo.ut.ee/primer3-0.4.0/ (accessed on 8 September 2021)). Then, we diluted the concentration of genomic DNA in the frozen semen to 50 ng/µL, and pipetted 1 µL each into a mixing pool for PCR amplification (Table S2). After the PCR amplification products were tested by 2% gel electrophoresis, the qualified PCR products were sent to Beijing Qingke Xinye Biotechnology Co., Ltd. (Beijing, China) for bidirectional sequencing. Based on the sequencing results, we used the Chromas 1.62 software to view the sequencing diagram and compared the sequence to the reference sequence (ARS-UCD1.2) on NCBI-BLAST (National Library of Medicine. Available online: https://blast.ncbi.nlm.nih.gov/ Blast.cgi (accessed on 20 November 2021)) in order to identify polymorphic sites and mutation types. Subsequently, we performed SNP genotype testing on 924 individuals using Genotyping by Target Sequencing (GBTS) technology by Boruidi Biotechnology Co., Ltd. (Hebei, China). The allele and genotype frequencies were directly calculated. The Hardy−Weinberg equilibrium was tested by the chi-squared test to compare the observed genotype frequencies within the expected frequencies.

Linkage Disequilibrium (LD) Estimation
The extent of LD between the identified SNPs was estimated using Haploview 4.2 (Broad Institute of MIT and Harvard, Cambridge, MA, USA). The extent of LD is measured by the D value, which is proportional to it. The haplotype block with a frequency greater than 0.05 was retained.

Association Analysis of Single Marker/Haplotype and Milk Production Traits
The MIXED process in SAS 9.4 software (SAS Institute Inc., Cary, NC, USA) was used to conduct association analysis on five milk production traits and SNPs or haplotype blocks, and the animal model used was as follows: where Y is the phenotype values of individual milk production traits (305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage); µ is the overall mean; hys is the fixed effect of farm, calving year, and calving season; M is the age of calving as a covariant; b is the regression coefficient of the covariate M; G is the genotype or haplotype combination effect; a is the individual random additive genetic effect, where the distribution is N 0, Aδ 2 a , A is a pedigree-based relationship matrix, and the additive genetic variance is δ 2 a ; and e is the random residual, where the distribution is N 0, Iδ 2 e , the unit matrix is I, and the residual variance is δ 2 e . We used the Bonferroni multiple test to correct the p value of the hypothesis test in the correlation analysis, and obtained the corrected p.
In addition, we calculated the additive, dominant, and substitution effects of SNP loci, using the following formula: where, a, d, and α are the additive effect, dominant effect, and substitution effect, respectively; AA, AB, and BB are the phenotypic least squares means of the corresponding genotypes; p is the frequency of allele A; and q is the frequency of allele B.

Biological Function Prediction
We used Jaspar (JASPAR. Available online: http://jaspar.genereg.net/ (accessed on 10 April 2022)) and Genomatix (Genomatix software suite. Available online: https://www. genomatix.de/cgi-bin/sessions/login.pl?s=6de98bdc4d8464e81dfaf67276f8f5f7 (accessed on 10 April 2022)) software to predict whether SNPs in the 5 flanking region ALDH18A1 and MAT2A genes changed the TFBS. Based on the characteristics of the two software, we screened the prediction results by different criteria, that is, the relative score of the former was higher than 0.8, and the binding site of the latter was a highly conservative core sequence. We selected the transcription factors that the two software suggested in order to improve the reliability of the predictions.

Single Marker Association Analysis
The associations between the 10 SNPs of the ALDH18A1 and MAT2A genes and 5 milk production traits in the first and second lactation stages, including 305-day milk yield, fat yield, fat percentage, protein yield, and protein percentage, were analyzed. The results showed that for ALDH18A1, 26:g.17100534G>T was significantly associated with the 305-day milk yield in the first lactation (corrected p = 0.0016; Table 2); 26:g.17130318C>A, 26:g.17118244G>A, 26:g.17102977T>C, 26:g.17100534G>T, and 26:g.17086802T>A reached significant association levels for the 305-day milk, fat, and protein yields in the second lactation (corrected p ≤ 0.0488); and 26:g.17088978A>G was significantly associated with the 305-day milk yield, fat yield, fat percentage, and protein yield in the second lactation (corrected p ≤ 0.0072). In MAT2A, 11:g.49465032C>T was significantly associated with the 305-day milk yield and protein yield in the first lactation (corrected p ≤ 0.0252), and the 305-day milk yield, fat yield, and protein yield in the second lactation (corrected p ≤ 0.0204), while 11:g.49472723G>C was significantly associated with the 305-day milk yield and protein percentage in the first lactation (corrected p ≤ 0.0222). The results of additive, dominant, and substitution effects are shown in Table S3.

Haplotype Association Analysis
Through estimating the degree of LD among the 10 identified SNPs by Haploview 4.2, we discovered that 7 SNPs of the ALDH18A1 gene formed two haplotype blocks ( Figure 1; Block 1: D = 0.80-1.00; Block 2: D = 0.98-1.00). In Block 1, six haplotypes were found, and the frequencies of H1 (ACGGG), H2 (TGGGG), H3 (TGAAG), H4 (TGAGG), H5 (TGAGT), and H6 (AGGGG) were 0.58, 0.172, 0.094, 0.066, 0.066, and 0.02, respectively. Block 2 consisted of two haplotypes, H1 (TG) and H2 (CA), with frequencies of 0.874 and 0.124, respectively. Moreover, we found that the haplotype combinations had significant associations with the 305-day milk yield, fat yield, fat percentage, protein yield, or protein percentage in the two lactations (corrected p ≤ 0.025; Table 3). Block 1 was significantly associated with the 305-day milk yield, fat yield, and protein yield in both lactation stages (corrected p ≤ 0.014). Block 2 had a significant association with the 305-day milk yield, fat yield, and protein percentage in the second lactation stage (corrected p ≤ 0.025). The number in the table represents the mean ± standard deviation; the number in the bracket represents the number of cows for the corresponding genotype; p value shows the significance for the genetic effects of SNPs; the superscript letters indicate the significance of different genotypes, involving the comparison between each two pairs; a, b within the same column with different superscripts means p < 0.05; and A, B within the same column with different superscripts means p < 0.01. Corrected p means p value after multiple test correction of Bonferroni.

Functional Variation Prediction Caused by SNPs
We predicted the TFBS changes of the two SNPs, 26:g.17130318C>A and 11:g.49472723G>C, located in the 5 region of the ALDH18A1 and MAT2A genes, respectively, using the Jaspar and Genomatix software to obtain the common result (Table 4). Mutation from the allele C to A of 26:g.17130318C>A in ALDH18A1 caused the disappearance of the binding site (BS) for transcription factor (TF) homeobox containing 1 (HMBOX1) (relative score = 0.92). For 11:g.49472723G>C in MAT2A, allele C created the BS for MYC associated zinc finger protein (MAZ) (relative score = 0.84).

Discussion
Our previous study considered the ALDH18A1 and MAT2A genes to be candidates to affect milk production traits in dairy cows [21]. In this study, we detected the polymorphisms of the ALDH18A1 and MAT2A genes and found that there was a certain genetic association between the SNPs/haplotype blocks and milk production traits. Haplotype analysis has been shown to have a more accurate advantage for detecting genetic variation in complex traits than single-label SNP analysis [28,29].
With different application purposes, gene chips could be divided into high-density gene chips with research value and low-density gene chips with application value [30]. Based on the results of this study, we speculated that there were two possible applications. On the one hand, the reliability of genome breeding value estimation could be improved to a certain extent by increasing the weight of SNPs/ haplotype chromosomal segments of these two genes on high-density SNP chips. On the other hand, considering costeffectiveness, low-density gene chips can also be constructed by selecting loci such as SNPs of ALDH18A1 and MAT2A that are closely associated with milk production traits, focusing on estimating the breeding value of the target traits [30,31]. At present, many studies use low-density SNP chipsets to genotype animals, and then accurately interpolate them into high-density groups to improve the accuracy of breeding value estimation [32][33][34]. However, the application in this direction still needs in-depth verification.
Furthermore, for each identified SNP of ALDH18A1 and MAT2A in the current study, we observed that the phenotypic value of individuals with low-frequency alleles is low, which may indicate that due to long-term artificial breeding of high-yield individuals, the proportion of genotypes that have a beneficial impact on the phenotype gradually occupies a large proportion. This implied the effectiveness of dairy cattle breeding in recent years, and also reflected that these loci might be closely associated with milk production traits. Moreover, low-frequency variants (such as 26:g.17100534G>T) may be caused by potential sampling error effects. If not, we need to increase sample size to vigorously identify such associations, which has been emphasized by recent studies that identified the role of low frequency and rare coding variation in complex traits [35]. However, we found that the ten SNPs of ALDH18A1 and MAT2A showed different associations between the first and second lactations, and considered that it might be caused by the different number of cows selected for genetic association analysis. As we used 924 cows in the first lactation and 635 in the second lactation (the cows selected in the two lactation periods did not completely overlap), the statistical significance might have been impacted. Generally, cows have higher milk production in the second lactation, and the different physiologic status of cows between the two lactations was also one possibility for the different genetic effects across lactations. In addition, the results showed that for some SNPs, there were an increased value in the heterozygous genotype, whereas there were others with an increased value in the homozygous genotype. This might have been caused by the interaction between alleles. Taking the milk production in the second lactation period as an example, for 26:g.17118244G>A, the phenotypic value of heterozygotes was the highest because of the significant dominant effect; for 26:g.17086802T>A, the phenotypic value of heterozygotes was in the middle because of the significant additive effect.
There are many factors that affect gene expression, one of which is that TFBSs regulate the transcription of target genes by binding to different transcription factors [36].The SNPs located at TFBSs may affect the binding of transcription factors, resulting in differences in gene expression among individuals with different genotypes [37]. In this study, we found the changes of TFBSs caused by the SNPs in the 5 flanking regions of the ALDH18A1 and MAT2A genes. In ALDH18A1, the BS of TF HMBOX1 appeared when the allele A mutated to C of 26:g.17130318C>A. HMBOX1 is a homeobox containing protein with transcriptional repressor activity [38][39][40]. Based on the phenotypic data of milk production traits with different genotypes, we found that, for 26:g.17130318C>A in ALDH18A1, the fat and protein yields of cows with genotype AA were significantly higher than those with genotype CC in the second lactation. These findings suggest that the mutant site 26:g.17130318C>A could modulate the expression of ALDH18A1 to affect the milk yield traits in dairy cattle by binding the TF HMBOX1. In addition, for 11:g.49472723G>C in MAT2A, allele C created the BS for MAZ. MAZ is a transcription factor that plays a dual regulatory role in initiating and terminating the transcription of certain genes [41][42][43][44]. The 305-day milk yield and protein percentage of CC genotype individuals at 11:g.49472723G>C in MAT2A were significantly lower than those of GG individuals in the first lactation. The above findings indicate that 11:g.49472723G>C might be a key mutation that affects the phenotypic changes of milk production by the regulation of MAT2A gene by binding the TF MAZ. Thus, we speculated that the change of gene expression caused by SNPs may be one of the reasons for the phenotypic changes of milk production traits in dairy cows.
Furthermore, this study has some limitations compared with a genome-wide association study (GWAS). GWAS can screen SNPs in the whole genome and find parts associated with traits. At the same time, systematic errors such as population stratification can be corrected in the analysis process. Therefore, the significant SNPs obtained from GWAS are generally more reliable than those obtained from allelic association. Indeed, GWAS gives the possibility to take into account a massive amount of SNPs; however, most of them have no influence on production traits. SNPs with no impact on production traits can cause information noise. Moreover, each SNP found using a high-density microarray should be analyzed in detail to prove the associations with production traits.

Conclusions
This study identified single nucleotide polymorphisms of the ALDH18A1 and MAT2A genes, and found that most of these SNPs were associated with milk production traits of cows. We propose that these SNPs could be used as genetic markers for milk production traits in dairy GS; however, this must be confirmed on larger populations of dairy cattle. Two SNPs, 26:g.17130318C>A in ALDH18A1 and 11:g.49472723G>C in MAT2A, might change the TFBSs to regulate expression of the corresponding gene. This study also provided some reference information for further functional verification of ALDH18A1 and MAT2A.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13081437/s1, Table S1: Descriptive statistics of phenotypic values for dairy production traits in two lactations; Table S2: Primers and procedures for PCR used in SNPs identification of ALDH18A1 and MAT2A genes; Table S3: Additive, dominant, and substitution effects of SNPs in ALDH18A1 and MAT2A genes on milk yield and composition traits in Chinese Holstein cattle during two lactations.