Molecular Marker-Assisted Selection of ABCG2, CD44, SPP1 Genes Contribute to Milk Production Traits of Chinese Holstein

Simple Summary Chinese Holstein cattle are the main breed of dairy cows in China. This work performed an association analysis between polymorphisms of three candidate genes (ABCG2, CD44, SPP1) and milk production performance on Chinese Holstein cattle. The results showed that the different genotypes of these 10 SNPs from the ABCG2, CD44, and SPP1 genes significantly affected the milk production performance of Chinese Holstein cattle in terms of milk yield, milk fat percentage, milk protein percentage, somatic cell score, and urea nitrogen content. The ABCG2, CD44, and SPP1 genes could be selected for marker-assisted selection, which is of great significance for future precise molecular breeding. Abstract Based on our results of genome-wide association analysis, we performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis; three candidate genes (ABCG2, CD44, SPP1) were screened in this study for SNPs association analysis with production traits in 999 Holstein cattle. In this research, flight mass spectrometry genotyping was used to detect the polymorphism of SNP seats. It was shown that four, four, and two single nucleotide polymorphisms (SNP) loci were detected for the ABCG2, CD44, and SPP1 genes, respectively, and the different genotypes of these 10 SNPs significantly affected the milk production performance of Chinese Holstein cattle in terms of milk yield, milk fat percentage, milk protein percentage, somatic cell score, and urea nitrogen content. Among them, ABCG2-G.80952G > T locus, ABCG2-G.120017G > A locus and CD44-G.2294G > C locus had significant effects on somatic cell score (p < 0.01). Cows with GG genotypes at ABCG2-G.80952G > T locus, AA and GG genotypes at ABCG2-G.120017G > A locus, and GG genotypes at CD44-G.2294G > C locus had lower somatic cell scores. The present study elucidated that ABCG2, CD44, and SPP1 could be selected for marker-assisted selection and will benefit for future precise molecular breeding.


Introduction
The performance of dairy cows mainly refers to milk production traits. Milk-producing traits have always been the more concerned quantitative economic traits in dairy farming, which are jointly controlled by micro-efficiency polygenes and are susceptible to environmental influences [1]. Due to the long generation interval of dairy cows, the selection of milk producing traits of dairy cows is slow in conventional breeding work. Therefore, molecular marker-assisted breeding is a new situation in the current breeding work. Single nucleotide polymorphisms (SNP) have been widely used in DNA molecular genetic markers, and the prospects are very promising [2].

Sample Collection
The experimental population of this study was from four large-scale dairy farms in Jiangsu, China. In this study, SNP microarray samples were collected from cows for genetic parameter evaluation and association analysis. SNP chips were used to collect samples of hair follicles from cows, and hair follicle samples from 999 cows in four farms were collected from 8580 cows that passed the quality control, these experimental cows were of the same age (24 months) and the same parity (parity 2). The collected samples were sent to Neogen Biotechnology Co. Ltd (Lansing, MI, USA), and DNA was extracted and genotyped from the hair follicles of 999 cows.

Enrichment Analysis and DNA Detection
The database (DAVID) online software (https://david.ncifcrf.gov, accessed on 4 November 2022) and the bovine reference genome ARS-UCD1.2(ftp://hgdownload.soe.ucsc.edu/golden path/bosTau9/, accessed on 10 September 2022) in the UCSC database were used, and other online tools have carried out the GO (https://david.ncifcrf.gov, accessed on 4 November 2022) annotation and KEGG (https://www.kegg.jp/, accessed on 4 November 2022) pathway enrichment analysis of candidate gene functions. DNA purity and concentration were tested as follows: using NanoDroP1000 spectrophotometer can directly detect the ratio of DNA OD260/OD280, OD260/OD230 and the concentration of DNA, using the following formula to calculate the concentration of DNA: DNA concentration (ug/mL) = 50 × OD260, when the ratio of OD260/OD280 is between 1.7-1.9, it indicates that the concentration of DNA is relatively ideal.

PCR Amplification, and Commercial Sequences
Based on the sequences of the bovine ABCG2, SPP1, and CD44 genes published on NCBI, primers were designed by software in the CDS region and part of the intron region of the ABCG2, SPP1, and CD44 genes, respectively. A total of 7 pairs of primers were involved in this test, and the details of the primers were shown in Table 1.

Screening of SNP Mutation Sites
20 DNA samples were randomly selected for each pair of primers for PCR amplification, and the successfully amplified products were sent to Qingke Biological Company (Nanjing, China) for sequencing. The sequencing results were compared with Vector NTI Advance 11 software to find out the mutation sites and their specific positions. The amplification procedure was as follows: pre-denaturation at 94 • C for 5 min, denaturation at 94 • C for 30 s, renaturation at 68 • C for 30 s, and extension at 72 • C for 2 min 10 s, 17 cycles. Denaturation at 94 • C for 30 s, renaturation at 51 • C for 30 s, and extension at 72 • C for 2 min 10 s, 20 cycles. The final extension was at 72 • C for 10 min and stored at 4 • C.

Fractal Detection of SNPs
The screened mutation loci were determined by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) for typing detection. The genotyping detection system adopts the time-of-flight mass spectrometry biochip system (MassARRAY®MALDI-TOF System) developed by Sequenom, Inc. (San Diego, CA, USA). This system has been widely promoted and used in the Human Genome Hapmap Project.

Statistical Analysis
The Hardy-Weinberg equilibrium test (HWE), genotype and allele frequency analysis, and linkage disequilibrium (LD, as measured by D and r 2 ) analysis were performed for SNP sites after ABCG2, SPP1, and CD44 genotyping by SHEsis software (http://analysis2 .bio-x.cn/myAnalysis.php, accessed on 4 November 2022) [9]. Multi-factor ANOVA with SPSS (Ver. 18.0) software and general linear models were used to perform the association analysis between the genotype and production performance of ABCG2, SPP1, and CD44 gene SNPs loci. The reduced linear model excluded fixed effects of age and paternity. The specific models were as follows: where Y ijklmnop is the observed value of the trait, µ is the population mean, and C i is the fixed effect of calving season. M j is the fixed effect of the lactation stage, P k is the fixed effect of parity, J l is the fixed effect of measured season, N m is the fixed effect of the test year, G n is the fixed effect of genotype, F o is the cattle field effect, and e ijklmnop is the random error. Multiple comparisons between genotypes were obtained using Duncan's method.

Functional Annotation and Signal Pathway Analysis of ABCG2, CD44 and SPP1 Genes
The profiles of three genes were in the biological process (BP), cell component (CC), and molecular function (MF) categories by GO analysis, respectively. The BP analysis showed that the three genes were classified into the cellular process, localization, metabolic process, response to stimulus, and single-organism process. The CC annotation revealed that the three genes were involved in the cell, cell part, and organelle. The MF classification of genes suggested that three genes were involved in binding. GO annotation analysis elucidated that the three genes participated in the regulation of production performance by multiple indispensable activities ( Figure 1A). Furthermore, we performed KEGG pathway enrichment analysis to explore the most active pathways of these genes; the results revealed that these genes were related to signaling molecules and interactions, as well as the immune system and infectious diseases, which are partially but not completely involved ( Figure 1B).

Screening of ABCG2, CD44 and SPP1 Genes SNP Mutation Loci
By comparing the sequencing results with the three gene sequences, four SNP loci were found, respectively, in ABCG2 and CD44: Figure 3A-D, and Figure 4A,B show the sequencing peaks of each SNP locus of ACG2, CD44, and SPP1, respectively.

LD Analysis of SNPS in ABCG2, CD44 and SPP1 Genes
LD analysis was performed on the SNP loci of the ABCG2, CD44, and SPP1 genes with SHEsis software, and the results were obtained by D′ value and r 2 value. The D′ > 0.7 and r 2 > 1/3 might suggest a sufficiently strong LD to be available for plotting [10]. All the D' and r 2 values among the different SNP loci of the ABCG2, CD44, and SPP1 genes listed in Table 2. The ABCG2 gene ABCG2-g.57261A > G and ABCG2-g.94683A > G, ABCG2-g.120017G > A, ABCG2-g.80952G > T and ABCG2-g.120017G > A are in a highly linked state (D′ > 0.7, r 2 > 1/3), respectively, and the linkage degree between the remaining loci is relatively low ( Figure 5A). The CD44 gene CD44-g.2294G > C and CD44-g.86978G > A are in a highly linked state (D′ > 0.7, r 2 > 1/3), and there is no high degree of linkage between other loci chain relationship ( Figure 5B). There is no high linkage relationship between the two SNPs of the SPP1 gene (D′ > 0.7, r 2 < 1/3) ( Figure 5C).

LD Analysis of SNPS in ABCG2, CD44 and SPP1 Genes
LD analysis was performed on the SNP loci of the ABCG2, CD44, and SPP1 genes with SHEsis software, and the results were obtained by D value and r 2 value. The D > 0.7 and r 2 > 1/3 might suggest a sufficiently strong LD to be available for plotting [10]. All the D' and r 2 values among the different SNP loci of the ABCG2, CD44, and SPP1 genes listed in Table 2. The ABCG2 gene ABCG2-g.57261A > G and ABCG2-g.94683A > G, ABCG2-g.120017G > A, ABCG2-g.80952G > T and ABCG2-g.120017G > A are in a highly linked state (D > 0.7, r 2 > 1/3), respectively, and the linkage degree between the remaining loci is relatively low ( Figure 5A). The CD44 gene CD44-g.2294G > C and CD44-g.86978G > A are in a highly linked state (D > 0.7, r 2 > 1/3), and there is no high degree of linkage between other loci chain relationship ( Figure 5B). There is no high linkage relationship between the two SNPs of the SPP1 gene (D > 0.7, r 2 < 1/3) ( Figure 5C).

Association Analysis of Four SNP Loci in ABCG2 Gene and Production Traits
The correlation analysis results of the four SNP loci of the BCG2 gene with different genotypes of the Holstein cattle somatic cell score, urea nitrogen in milk, daily milk yield, milk protein rate, and milk fat rate are shown in Table 4. The effect of ABCG2-g.57261A > G locus on milk fat rate reached a significant level (p < 0.01). The milk fat rate of GGtype cows was the highest, and the AA genotype was the lowest. The ABCG2-g.94683A > G locus had a highly significant effect (p < 0.01) on the urea nitrogen content in milk, which was significantly higher in the milk of AA-type cows than in AG and GG types. The ABCG2-g.80952G > T locus had a highly significant (p < 0.01) effect on milk yield, milk protein rate, somatic cell score, and urea nitrogen in cows, with TT-type cows having significantly lower measured daily milk yield than GG and GT types, while TT-type cows had significantly higher milk protein rate, somatic cell score, and urea nitrogen than GG and GT types. ABCG2-g.120017G > A had significant effects on milk yield, milk protein percentage, somatic cell score, and urea nitrogen (p < 0.01) and had significant effects on milk fat percentage (p < 0.05). The milk protein percentage, milk fat percentage, somatic cell score, and urea nitrogen of AG cows were significantly higher than those of AA and GG cows. A square visual diagram of ABCG2 gene difference expression is shown in Figure 6.

Association Analysis of 4 SNP Loci of CD44 Gene and Production Traits
The association analysis results of different genotypes of the 4 SNP loci of CD44 gene with somatic cell score, milk urea nitrogen, measured daily milk yield, milk protein percentage, and milk fat percentage of Holstein cattle are shown in Table 5. The effect of CD44-g.2263A > G locus on milk urea nitrogen reached a very significant level (p < 0.01), and the urea nitrogen content of AA cows was significantly higher than that of AG and GG genotypes. The CD44-g.86978G > A locus had a very significant effect on the milk fat rate of dairy cows (p < 0.01), and the AA type was the highest. CD44-g.2294G > C locus had extremely significant effects on SCS and milk urea nitrogen contents (p < 0.01) and had significant effects on milk fat percentage (p < 0.05). CG-type cows had a higher milk fat percentage and milk urea nitrogen content, which were significantly higher than CC-type cows and CG-type cows; CC-type cows and CG-type cows had higher SCS. The CD44-G.86895A > G locus had significant effects on milk yield, milk fat rate, protein rate, and milk urea nitrogen content of dairy cows (p < 0.01). The milk yield, milk fat percentage, and milk urea nitrogen content of GG dairy cows were the highest, as they were significantly higher than those of other genotypes, and the protein percentage of AA and AG cows was the highest. A square visual diagram of CD44 gene difference expression is shown in Figure 7. Note: Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05); * indicates that the differences reach a significant level (p < 0.05); ** indicates that the differences reach a highly significant level (p < 0.01). Note: Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05); * indicates that the differences reach a significant level (p < 0.05); ** indicates that the differences reach a highly significant level (p < 0.01).

Association Analysis of Two SNP Loci in SPP1 Gene and Production Traits
The correlation analysis results of the two SNP loci of the SPP1 gene between different genotypes and Holstein cattle somatic cell score, urea nitrogen in milk, daily milk yield, milk protein rate, and milk fat rate are shown in Table 6. The SPP1-g.50265G > A locus had a significant level of effect on milk yield p < 0.05) and had a highly significant effect on milk urea nitrogen content (p < 0.01). The milk yield and milk urea nitrogen content of the AG and GG cows were significantly higher than those of the AA cows. The SPP1-g.50315 C > T locus had a significant effect on the milk fat rate of dairy cows (p < 0.05), and the CT-type and TT-type were significantly higher than the CC-type. A square visual diagram of SPP1 gene difference expression is shown in Figure 8. The milk fat percentage of GG-type cows in the ABCG2-g.57261A > G locus was significantly higher than that of AA-type and AG-type cows. (B) ABCG2-g.94683A > G locus had significantly higher urea nitrogen content in the milk of AA type cows than AG and GG types. In ABCG2-g.80952G > T locus, the tested day milk yield (C) of TT type cows was significantly lower than that of GG and GT type cows; somatic cell score (D), milk protein percentage (E), and urea nitrogen (F) of TT type cows were significantly higher than GG and GT types. In ABCG2-g.120017G > A locus, AA type cows tested day milk yield (G) was significantly higher than that of AG and GG type cows, and the milk protein percentage (H), milk fat percentage (I), somatic cell score (J) and urea nitrogen (K) of AG type cows were significantly higher than those of AA and GG type cows. Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05).

Figure 7.
Effect of different genotypes in CD44 four SNP loci on production traits in Chinese Holstein cattle. In CD44-g.2263A > G locus, AA type urea nitrogen content of was significantly higher than that of AG and GG genotypes (A). In CD44-g.2294G > C locus, milk fat percentage (B) and milk urea nitrogen content of CG-type (D) cows were significantly higher than CC and CG types. Higher SCS for CC-and CG-type cows (C). In CD44-g.86978G > A locus, Type AA cows have the highest milk fat percentage (E). In CD44-g.86895A > G locus, tested day milk yield (F), milk fat rate (G), and milk urea nitrogen content (I) of GG-type cows were significantly higher than those of AA, AG types; AA and AG cows had the highest protein rates (H). Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05). Note: Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05); * indicates that the differences reach a significant level (p < 0.05); ** indicates that the differences reach a highly significant level (p < 0.01). Figure 8. SPP1-g.50265G > A and SPP1-g.50315C > T loci on the productive performance of different genotypes. tested day milk yield (A) and milk urea nitrogen content (C) were significantly higher in AG and GG types of cows than in AA-type; CT and TT types of cows were significantly higher than the CC-type, milk fat rate (B) of CT and TT types were significantly higher than that of CC-type. Data in the same column with different lowercase letters on the shoulder indicate significant differences (p < 0.05).

Discussion
Previous studies analyzed the association between ABCG2 gene polymorphism and milk yield and milk composition of Kalanfris cattle and found that the three genotypes of SNP on exon 14 had significant effects on the milk fat percentage of cows [11]. Tantia identified the quantitative trait loci of Indian cattle (Bos indicus) and river cattle (Bubalus bubalis) and found that the allele of ABCG2 had a significant impact on high milk fat yield, high fat, and protein percentage [12]. In this study, four polymorphisms were found in the ABCG2 gene, and all these polymorphisms were located in the intronic region. Existing studies have shown that introns can generate splicing signals through their own sequence properties to affect transcript splicing, thereby affecting protein sequences and individual phenotypes [13]. The results of this study showed that ABCG2-G.57261A > G locus had a significant effect on the milk fat percentage of dairy cows, and GG type was the highest. Milk urea nitrogen content in the AA genotype of ABCG2-G.94683A > G locus was significantly higher than that in the AG and GG genotypes. Milk protein percentage, somatic cell score and urea nitrogen of cows carrying ABCG2-G.80952G > T TT genotype were significantly higher than those of the GG and GT genotypes. Milk protein percentage, milk fat percentage, somatic cell score, and urea nitrogen were significantly higher in cows carrying the AG genotype at the ABCG2-g.120017G > A locus than in the AA and GG types. The effects of ABCG2-G.80952G > T and ABCG2-G.120017G > A on production traits of dairy cows showed similar trends, probably because of the high linkage relationship between them. The results of this study show that ABCG2 polymorphisms have significant effects on dairy cows' milk fat rate, milk yield, milk protein rate, and other production traits, which are basically consistent with previous studies. In the selection of dairy cows, the selection of ABCG2-g.57261A > G site GG type and ABCG2-g.120017G > A site AG type dairy cows should be increased in view of improving milk fat rate. In order to improve milk production and milk protein rate of dairy cows, the selection of TT-type cows at ABCG2-g.80952G > T site and AG-type cows at ABCG2-g.120017G > A site can be strengthened. However, the SCS of cows with these two genotypes are also relatively high, which may increase the burden of udder function with the increase in milk production, aggravate the shedding of mammary epithelial cells, and increase the SCS of cows. Olsen [14] found an SNP mutation in the ABCG2 gene in exon 9, resulting in a mutation of the amino acid from tyrosine to cysteine, but the differences in the five lactation performances among the three genotypes were not significant. In the selection of dairy cow genotypes, it is necessary to comprehensively select the SNPs genotypes corresponding to the target traits, so we are required to use different methods to find more SNP loci that affect dairy cow traits.
Most of the existing studies on CD44 have focused on disease treatment. CD44 gene rs13347 polymorphism is significantly associated with elevated cancer risk in Asians [15]. Deletion of the CD44 gene has been reported to reduce proinflammatory cytokines [16], and treatment of affected mice with CD44 antibodies reduced inflammation [17]. Previous studies on CD44 in cattle have shown that CD44 is related to dairy production traits. Jiang [18] believed that the CD44 gene could affect the synthesis of triglyceride in bovine mammary epithelial cells, thereby affecting the lipid content in milk. A total of 4 SNP loci were found in this study, CD44-g.2263A > G and CD44-g.2294G > C were located in the intron region, CD44-G.86895A > G and CD44-G.86978G > A were located in the exon region, the CD44-g.86895A > G site is a nonsense mutation, and the CD44-g.86978G > A site is a missense mutation. Association analysis found that the milk urea nitrogen content of cows with CD44-g.2263A > G locus AA genotype was significantly higher than that of AG and GG genotypes. The CD44-g.86978G > A locus had a significant effect on milk fat percentage in cows, with the AA type having the highest. Cows with the CD44-G. 2294G > C CG genotype had a higher milk fat percentage and milk urea nitrogen content, which were significantly higher than those with CC and CG genotypes, while cows with CC and CG genotypes had higher SCS. Cows carrying the CD44-g.86895A > G locus GG genotype had the highest milk yield, milk fat percentage, and milk urea nitrogen content, which were significantly higher than other genotypes, while cows with AA and AG genotypes had the highest protein rate. From the results of this study, the genotypes of the three SNP loci have a significant impact on the milk fat rate of dairy cows, which proves that CD44 is related to the lipid metabolism of dairy cows. This result is consistent with previous studies. In order to improve milk fat percentage, the selection of dairy cows carrying CD44-G.86978G > A AA genotype, CD44-G.2294G > C CG genotype and CD44-G.86895A > G GG genotype should be strengthened. There are more and more studies on SPP1 gene polymorphism, which provides a powerful reference for disease treatment and improving livestock and poultry production.
Maria [19] conducted an association analysis of the SPP1 gene polymorphism in Salda sheep and found that rs161844011 locus in exon 7 was associated with SCS. Liang [20] found that people carrying the CC genotype and C allele of the rs11730582 locus of the SPP1 gene had a reduced risk of developing breast cancer. Furthermore, SPP1 gene polymorphism was significantly correlated with lactation persistence of dairy cows, and G was its dominant allele [21]. Mello analyzed the association between SPP1 gene polymorphism and milk yield of Girorando cows and found that 305-day cows carrying the T allele at position g.8514C > T had higher milk yield, but it did not reach a significant level [22]. The SNPs of the two SPP1 genes obtained in this study were located in the intronic region. The milk yield and milk urea nitrogen content of the AG and GG genotypes of SPP1-G. 50265G > A were significantly higher than those of the AA genotypes. SPP1-g.50315 C > T loci CT and TT genotypes had significantly higher milk fat percentages than the CC types. The selection of cows carrying the G allele at the SPP1-G.50265G > A locus can be strengthened in future dairy breeding to improve milk yield. From the perspective of improving the milk fat rate of dairy cows, the selection of dairy cows carrying the SPP1-g.50315 C > T locus T allele can be strengthened.

Conclusions
This work performed GO annotation and KEGG pathway enrichment analysis on the ABCG2, CD44 and SPP1 genes and analyzed the association between the ABCG2, CD44, and SPP1 genes and Chinese Holstein production performance. The milk yield, milk fat rate, milk protein rate, somatic cell score, and blood urea nitrogen content of Holstein cattle with different genotypes of these 10 SNP loci were different. Among them, ABCG2-G.80952G > T locus, ABCG2-G.120017G > A locus and CD44-G.2294G > C locus had significant effects on the somatic cell score (p < 0.01). The present study elucidated that ABCG2, CD44 and SPP1 could be selected for marker-assisted selection and will benefit future precise molecular breeding.