Genetic Analyses Confirm SNPs in HSPA8 and ERBB2 are Associated with Milk Protein Concentration in Chinese Holstein Cattle

Heat shock 70 kDa protein 8 (HSPA8) and erb-b2 receptor tyrosine kinase 2 (ERBB2) were the promising candidates for milk protein concentration in dairy cattle revealed through previous RNA sequencing (RNA-Seq) study. The objective of this post-RNA-Seq study was to confirm genetic effects of HSPA8 and ERBB2 on milk protein concentration in a large Chinese Holstein population and to evaluate the genetic effects of both genes on other milk production traits. There were 2 single-nucleotide polymorphisms (SNPs) identified for HSPA8 and 11 SNPs for ERBB2 by sequencing 17 unrelated Chinese Holstein sires. The SNP-rs136632043 in HSPA8 had significant associations with all five milk production traits (p = 0.0086 to p < 0.0001), whereas SNP-rs132976221 was remarkably associated with three yield traits (p < 0.0001). Nine (ss1996900615, rs109017161, rs109122971, ss1996900614, rs110133654, rs109941438, rs110552983, rs133031530, and rs109763505) of 11 SNPs in ERBB2 were significantly associated with milk protein percentage (p = 0.0177 to p < 0.0001). A 12 Kb haplotype block was formed in ERBB2 and haplotype associations revealed similar effects on milk protein traits. Our findings confirmed the significant genetic effects of HSPA8 and ERBB2 on milk protein concentration and other milk production traits and SNP phenotypic variances above 1% may serve as genetic markers in dairy cattle breeding programs.


Introduction
Molecular selective breeding strategies have been widely applied in animal breeding to improve the important economic traits of livestock. Identification of key genes or causal variations for economic traits is prerequisite to perform molecular selective breeding strategies [1,2]. Milk protein concentration is an important index to evaluate the nutritional value in cow's milk. However, limited key genes or variations for milk protein concentration were found in dairy cattle [3][4][5][6][7][8][9]. Our initial RNA sequencing (RNA-Seq) study revealed that heat shock 70 kDa protein 8 (HSPA8) and erb-b2 receptor tyrosine kinase 2 (ERBB2) were candidate genes affecting milk protein traits in dairy cows [10]. It was observed that HSPA8 (Log 2 fold change = −1.13, q-value = 2.91 × 10 −2 ) and ERBB2 (Log 2 fold change = 1.00, q-value = 5.25 × 10 −3 ) were significant differentially expressed in bovine mammary tissues of cows in high and low milk protein percentage comparisons [10].
HSPA8 is located in BTA15 with a total length of 4426 bp, containing 9 exons and 8 introns, and encoding 650 amino acids. HSPA8 is an important gene in the MAPK pathway [11], which has a positive effect on protein synthesis by increasing the stability of mRNA through phosphorylation of the AU-rich element-binding protein [12]. HSPA8 is highly expressed in the lactating mammary gland [10,13], which has a role in regulating protein folding and processing in the endoplasmic reticulum [14], likely in support of active milk protein synthesis [13]. ERBB2 encodes a receptor of tyrosine kinases [15] and is located in BTA19 with a total length of 24,682 bp, containing 27 exons and 26 introns, and encoding 1255 amino acids. ERBB2 could activate PI3K signaling pathway by directly binding of PI3K regulatory subunit p85 to phosphorylated tyrosine residues, which is known to regulate milk protein synthesis [16]. Most importantly, HSPA8 is known to be downstream of ERBB2 in human [17,18], indicating the probably regulatory effect of ERBB2 to HSPA8 and working together in specific biological processes.
Genetic analyses between selected genes and bovine milk production traits can provide valuable molecular information for the genetic improvement program for milk quality of dairy cattle. Therefore, the aims of this study were to identify genetic polymorphisms in HSPA8 and ERBB2, to explore the linkages among single-nucleotide polymorphisms (SNPs), and to conduct the genetic effects analyses in a large Chinese Holstein population.

Animal Population and Phenotypic Data
A total of 1027 Chinese Holstein cows from 17 sire families were used to construct the study population. Family size ranges from 25 to 187 daughters with an average of 60 daughters per sire (Table S1). Cows were selected from 17 dairy farms in the Beijing Sanyuan Lvhe Dairy Farm Center, where regular and standard performance testing (Dairy Herd Improvement, DHI) have been implemented since 1999. Five milk production traits (305 d milk yield, 305 d protein yield, 305 d fat yield, average 305 d protein percentage, and average 305 d fat percentage) were collected from individual animal via the complete DHI data and were used for the subsequent analyses.

Genomic DNA Extraction
Blood samples were collected from 1027 Chinese Holstein cows via coccygeal vein and stored at −20 • C. The tubule frozen semen samples of 17 sires were collected from Beijing Bull Station. Genomic DNA was extracted from blood samples with a TIANamp Blood DNA kit (TIANGEN Biotech, Beijing, China) and from the semen samples using a standard phenol-chloroform procedure [19]. The quantity and quality of isolated DNA were confirmed before further analysis.

SNP Identification and Genotyping
A DNA pool was constructed from aforementioned 17 Holstein bulls (50 ng/uL of each individual) to identify potential SNPs that were involved in HSPA8 and ERBB2 genes. A total of 13 and 26 pairs of primers (Table S2) were designed to amplify all exons and their partial flanking intronic sequences based on the reference sequences of the bovine HSPA8 (NCBI Reference Sequence: AC_000172.1) and ERBB2 (NCBI Reference Sequence: AC_000176.1) referring to Bos_taurus_UMD_3.1 assembly using Primer3 web Program v.0.4.0 (http://primer3.ut.ee), respectively.
The polymerase chain reaction (PCR) was performed to amplify the pooled DNA from 17 sires with a final reaction volume of 25 µL, comprising of 50 ng genomic DNA, 0.5 µL of each primer, 2.5 µL 10× PCR buffer, 2.5 mM each of dNTP, and 1 U of Taq DNA polymerase (Takara Biotechnology Co., Ltd., Dalian, China). The PCR protocol was 5 min at 94 • C for initial denaturing, followed by 34 cycles at 94 • C for 30 s, 56 • C for 30 s, 72 • C for 30 s, and a final extension at 72 • C for 7 min. The amplification products were visualized by gel electrophoresis on 2% agarose gels, followed by photography under UV light. After that, 40 µL of each PCR product from the pooled DNA was bi-directionally sequenced using the ABI3730XL (Applied Biosystems, Foster City, CA, USA), and the sequences were aligned to the bovine reference sequences (UMD3.1) using BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to identify potential SNPs. The subsequent genotyping analysis of the 1027 Chinese Holstein cows were performed with matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS, Squenom MassARRAY, Bioyong Technologies Inc. HK) assay.

Linkage Disequilibrium Analysis
Pair-wise linkage disequilibrium (LD) was measured for each pair of SNPs genotyped within the HSPA8 and ERBB2 genes based on the criterion of D prime (D ) using Haploview [20]. Genotypes were firstly imputed for each individual using the Beagle3.2 software program [21]. Briefly, an iterative algorithm was applied for fitting a haplotype Hidden Markov Model (HMM) to genotype data that alternated between model building and sampling. In the model-building step, current estimates of phased haplotypes are used for building a new haplotype HMM, in the sampling step, new haplotypes are sampled for each individual conditional upon the genotype data and current haplotype HMM. Estimated phased haplotypes for the initial iteration are obtained by imputing missing genotypes at random according to allele frequencies and randomly phasing heterozygous genotypes. Accordingly, haplotype blocks where SNPs are in high LD (D > 0.90) were determined based on confidence intervals methods [22]. A haplotype with a frequency >5% was considered as a distinguishable haplotype, while the haplotypes with relative frequency <5% were pooled into a single group. Haplotype blocks within relative SNPs were applied to subsequent analyses to detect their associations with phenotypes.

Association and Haplotype Analyses
Hardy-Weinberg equilibrium test was conducted on each identified SNP. Chi-square was used to compare the number of expected and observed genotypes with a significance level of 0.05. The genetic effects of each candidate SNP or haplotype on five milk production traits were analyzed with the mixed procedure of SAS (SAS Institute Inc., Cary, NC, USA) with the following statistical model: where, y ijklmn was the phenotypic value of each trait of cows (n = 1027 for each trait); µ was the overall mean; F i was the fixed effect of farm; YS j was the fixed effect of year-season; P k was the fixed effect of parity; M was the covariate effect of calving month; b was the regression coefficient of M; G l was the fixed effect corresponding to the genotype of polymorphisms or haplotype; α m was the random polygenic effect, distributed as N (0, Aσ a 2 ), with the additive genetic relationship matrix A and the additive genetic variance σ a 2 ; and e ijklmn was the random residual, distributed as N (0, Iσ e 2 ), with identity matrix I and residual error variance σ e 2 . The overall reliability of the whole model was symbolized as 'Goodness of Fit', and was calculated by R 2 as following formula, where, R 2 was the overall reliability; SS R was sum of squares of variables; SS T was total sum of squares; SS E was residual sum of squares; Residual was the residuals; Y was phenotypes of traits; Y was mean values of phenotypes of traits. The results showed that the R 2 of milk yield, milk protein percentage, milk fat percentage, milk protein yield and milk fat yield are 0.55, 0.50, 0.66, 0.73, 0.76, respectively, suggests that the model provide a good fit. The differences among the effects of single SNPs or haplotypes on each trait were compared with Bonferroni correction. The significant level of the multiple tests was equal to the raw P value divided by number of tests. The additive (a), dominance (d) and allele substitution (α) effects were estimated according to the equation proposed by Falconer & Mackay [23], i.e., a = (AA − BB)/2, d = AB − (AA + BB)/2 and α = a + d(q − p), where AA and BB represent the two homozygous genotypes, AB is heterozygous genotype, and p and q are the allele frequencies of corresponding loci.

Phenotypic Variance
The proportion of phenotypic variance of the trait explained by a SNP was symbolized to show the effect of a SNP on a specific trait. The calculation formula is: where p is the allele frequency of SNP, α is the average effect of gene substitution calculated by the linear mixed model, and σ 2 p is the estimate of the phenotypic variance using the complete DHI data of Chinese dairy cattle population.

Ethics Approval and Consent to Participate
All protocols for collection of the blood and frozen semen samples of experimental individuals were approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University (Permit Number: DK996). We obtained written agreements from the cattle owners to use the samples and data.

SNPs Identification
Two SNPs of rs136632043 and rs132976221 were identified for HSPA8 gene, with one located in the 3 regulatory region (3 -UTR) and the other located in the intron (Table 1). A total of 11 SNPs was discovered for ERBB2 gene. Among these identified SNPs, eight SNPs (ss1996900615, rs109122971, ss1996900614, rs110133654, rs109941438, rs110552983, rs133031530 and rs109763505) were found within introns, one (rs133724008) was in the 5' regulatory region (5 -UTR), and two SNPs (rs110735562 and rs109017161) were synonymous substitutions located in exons. All 13 identified SNPs of HSPA8 and ERBB2 genes were in Hardy-Weinberg equilibrium (p > 0.05, Table 2).

Single Locus-Based Association Analyses
SNP-rs136632043 was highly associated with all five milk production traits (p = 0.0086 to p < 0.0001; Table 3). SNP-rs132976221 also showed strong associations with three yield traits (milk yield, fat yield, and protein yield, p < 0.0001). There were six significant pairs of SNP-trait explaining phenotypic variations with greater than 1%, a range from 1.70 to 5.13% (Table 3). In addition, SNPs in HSPA8 also showed the corresponding significant additive, substitution, or dominant effects on target traits (Table S3).
Nine SNPs (ss1996900615, rs109017161, rs109122971, ss1996900614, rs110133654, rs109941438, rs110552983, rs133031530, and rs109763505) in ERBB2 were significantly associated with milk protein percentage (p = 0.0177 to p < 0.0001; Table 3). SNPs of rs110735562 and rs133724008 had significant associations with three yield traits (milk yield, fat yield, and protein yield, p = 0.0290 to p < 0.0001), whereas SNP of rs133724008 was also significantly associated with milk protein percentage (p < 0.0001; Table 3). Phenotypic variations explained by the 11 SNPs in ERBB2 with greater than 1% were existed in four significant pairs of SNP-trait, ranging from 1.49 to 2.05% (Table 3). Nine SNPs (ss1996900615, rs109017161, rs109122971, ss1996900614, rs110133654, rs109941438, rs110552983, rs133031530, and rs109763505) had significant dominant effects on milk protein percentages (Table S3). SNP-rs110735562 had significant dominant effects on three yield traits. SNP-rs133724008 showed significant additive and substitution effects on three milk yield traits and significant dominant effects on milk protein percentage (Table S3). Note: p-value refers to the results of association analysis between each SNP and milk production traits. Different letter (small letters: p < 0.05; capital letters: p < 0.01) superscripts (adjusted value after correction for multiple testing) indicate significant differences among the genotypes.

LD and Haplotypes Analyses
One SNP (ss1996900615) identified for ERBB2 gene was insertion/deletion (InDel), whereas the remaining 10 SNPs were used to perform LD analysis. Pair-wise D measures showed that nine SNPs in ERBB2 were highly linked (D = 0.99~1.00) and one 12 Kb haplotype block comprising these nine SNPs were inferred (Figure 1), in which three main haplotypes were formed. The frequencies of the haplotypes ACGGGCTGC, GTCAATAAT, and GTCAGTAAT were 56.57%, 24.15%, and 15.04%, respectively (Table 4). Subsequently, haplotype-based analysis showed significant association of the haplotypes with all four milk production traits, except the milk yield (p = 0.0130 to p < 0.0001, Table 5). No LD was observed between the two identified SNPs for HSPA8 gene. One SNP (ss1996900615) identified for ERBB2 gene was insertion/deletion (InDel), whereas the remaining 10 SNPs were used to perform LD analysis. Pair-wise D′ measures showed that nine SNPs in ERBB2 were highly linked (D′ = 0.99~1.00) and one 12 Kb haplotype block comprising these nine SNPs were inferred (Figure 1), in which three main haplotypes were formed. The frequencies of the haplotypes ACGGGCTGC, GTCAATAAT, and GTCAGTAAT were 56.57%, 24.15%, and 15.04%, respectively (Table 4). Subsequently, haplotype-based analysis showed significant association of the haplotypes with all four milk production traits, except the milk yield (p = 0.0130 to p < 0.0001, Table  5). No LD was observed between the two identified SNPs for HSPA8 gene.

Discussion
The results of the present study confirmed the significant genetic effects of HSPA8 and ERBB2 genes on milk protein traits in a large population of dairy cattle. The two candidate genes also had remarkable impacts on other milk production traits, which may provide new insights into the enhancement of milk profiles via selection strategies.
Two SNPs, rs110735562 and rs109017161 in exonic region of ERBB2 showed significant associations with milk protein traits are synonymous variations, which could modify mRNA stability and further affect protein expression [24,25]. The SNPs have affect the promoter activity and gene expression [26], and a previous study reported that two SNPs (rs209535817 and rs210440016) in the 5 -UTR region of bovine SAA2 (serum amyloid A2) gene impact the phenotype through altering the promoter activity [27]. Hence, it suggested that the genetic effect of SNP rs133724008 identified in the 5 -UTR of the ERBB2 gene on milk production traits was likely due to the impacts on its transcription. Generally, mature miRNAs are bound to mRNA ribosomal complex and regulate the expression of target genes by complementary recognition of the 3 -UTR of the mRNA [28,29]. Thus, whether there are relative miRNAs binding to the 3 -UTR (Position: Chr15, 34216278-34216505) of HSPA8 including identified SNPs were predicted by RNAhybrid software [30]. The results showed that miR-301a was targeted to the 3 -UTR including SNP rs136632043 (Position: Chr15, 34216487, Table S4), which indicated the significant associations of SNP rs136632043 in 3 -UTR of HSPA8 with milk production traits probably resulted from miR-301a or unknown potential regulatory mechanism. Although an intron does not hold a sequence for coding protein, an important function of SNPs in introns in altering gene transcriptional level has been elucidated [31,32]. Additionally, the significant associations between SNPs in introns with milk protein traits are also likely due to their LD with true causative variation.
As a group of highly conserved and widely expressed proteins, heat-shock proteins (HSPs) play important physiological functions [33]. For example, HSPA8 has been validated as an evolutionarily conserved protein in swine and bovine [34]. HSPA8 is highly involved in many biological processes, including proteasomal degradation [35], catalyzing protein folding and clathrin uncoating [36], and other protein networks involved in protein catabolism, protein homeostasis, ubiquitination, carbohydrate metabolism and cell cycle control [37]. ERBB2 is a member of a family of transmembrane receptor tyrosine kinases involved in the regulation of cellular processes by modulation of several pathways, such as mTOR, MAPK, and PI3K/AKT pathways [38][39][40]. In consistent with the present study, the functional role of ERBB2 on milk production traits has been also identified as positional candidate gene for lactation persistency in Canadian Holstein cattle [41]. Therefore, the results of the current study and previously published research indicate that HSPA8 and ERBB2 are promising candidate genes that have strong genetic effects on milk production traits. In addition, both single SNP-based and haplotype-based association analyses also suggest that the novel SNPs in these two genes may be used as potential genetic markers for genetic improvement in dairy breeding schemes.
Generally, a small proportion with less than 1% of the phenotypic variance was explained by polymorphisms underlying complex traits in livestock animals [42]. In the present study, six and four significant pairs of SNP-trait explaining phenotypic variations with greater than 1% were found in HSPA8 and ERBB2, respectively. These results suggest that the subset of these large-effect SNPs (rs132976221, rs136632043, rs133724008, and rs110735562) could be used as potential genetic markers for further marker-assisted selection (MAS) in milk production traits, especially for milk protein traits. All identified SNPs in HSPA8 and ERBB2 genes could be incorporated into the SNP panels for genomic selection of dairy cattle breeding schemes and could be used to improve frequencies of the genetic markers that are positively related to milk production traits of interest.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/2/104/s1. Table S1: The distribution of daughters from 17 herds in 17 farms. Table S2: PCR primers information of HSPA8 and ERBB2 genes. Table S3: Additive, dominant and allele substitution effects of the SNPs associated with milk production traits of HSPA8 and ERBB2 in Chinese Holstein cattle. Table S4: The predicted miRNAs targeted to the 3 -UTR of HSPA8 gene.
Author Contributions: C.L. and M.W. conducted association analysis and wrote the manuscript. W.C. and S.L. took part in SNP identification and genotyping. C.Z. and H.Y. collected phenotypes and pedigree data. D.S. participated in the data filtering and provided suggestions for the manuscript. S.Z. designed the study and revised the manuscript. All authors read and approved the final manuscript.