The Candidate Chromosomal Regions Responsible for Milk Yield of Cow: A GWAS Meta-Analysis

Simple Summary Milk production is one of the most important economic traits in dairy cattle. Therefore, determining the genomic regions influencing this trait can improve milk yield. In this study, we collected data from 16 articles associated with milk yield genome-wide association studies (GWAS) on different cattle breeds. Based on the information from the analysis and level of significance (p-value < 2.5 × 10−6), we identified different genomic regions on chromosomes with the highest marker density, markers with the highest effect, genes within or near these regions, chromosomes with the greatest effects on milk yield. Abstract Milk yield (MY) is highly heritable and an economically important trait in dairy livestock species. To increase power to detect candidate genomic regions for this trait, we carried out a meta-analysis of genome-wide association studies (GWAS). In the present study, we identified 19 studies in PubMed for the meta-analysis. After review of the studies, 16 studies passed the filters for meta-analysis, and the number of chromosomes, detected markers and their positions, number of animals, and p-values were extracted from these studies and recorded. The final data set based on 16 GWAS studies had 353,698 cows and 3950 markers and was analyzed using METAL software. Our findings revealed 1712 significant (p-value < 2.5 × 10−6) genomic loci related to MY, with markers associated with MY found on all autosomes and sex chromosomes and the majority of them found on chromosome 14. Furthermore, gene ontology (GO) annotation was used to explore biological functions of the genes associated with MY; therefore, different regions of this chromosome may be suitable as genomic regions for further research into gene expression.


Introduction
Milk is an important natural source of nutrients for the growth of newborn mammals. Different methods have been applied to detect genetic factors affecting milk production in dairy cattle, the most recent of which is genome-wide association studies (GWAS). The ultimate goal of GWAS is to identify the dependency between single nucleotide polymorphisms (SNP) and a trait using high-density markers at the genome surface to detect causative mutations that affect the phenotype of a trait [1]. During the last decade, GWAS has become an important source for generating novel hypotheses in the field of genetics. Therefore, GWASs tend to be suitable for detecting common variants associated with specific phenotypes [2].
Using data from GWAS, the meta-analysis technique is used to detect common genomic regions affecting traits by pooling the results of many studies together. Meta-analysis is an essential tool for synthesizing evidence needed to inform clinical decision making and policy. Systematic reviews summarize available literature using specific search parameters followed by critical appraisal and logical synthesis of multiple primary studies [3]. Nowadays, the meta-analysis technique is used in the agricultural and veterinary sciences in order to resolve inconsistencies in the results of scientific sources. Using the meta-analysis technique, which is a systematic and statistical study, data from different studies can be combined to achieve a single conclusion and interpretation. The reason is that individual studies have some limitations regarding the statistical power and reliability of the results. A meta-analysis by combining data and results of different research improves statistical power and accuracy of estimates [4,5]. Meta-analysis is becoming an increasingly important tool in GWAS studies of complex genetic diseases and traits [6]. The aim of this study was to detect the chromosomal regions related to milk yield using meta-analysis of different cow breeds.

Data and Literature Review
The review of GWAS studies on cow milk yield regarding the number of chromosomes and SNP positions reveals details of chromosomal regions that affect the trait. In this study, the data from GWAS tests on MY are from Google Scholar (https://scholar.google.ca/, accessed on 12 December 2019) and the National Center for Biotechnology Information site (www.ncbi.nlm.nih.gov, accessed on 27 December 2019) searched ( Figure 1). Using different filters including articles in journals with high impact factors (greater than 0.9) and timespan 2010-2019 and also had the required factors for analysis with METAL software, 16 out of 19 studies were used for meta-analysis. The required information such as marker name and the number of their chromosomes, their position on the chromosome, their p-values, and also the number of tested animals of each study was stored in a file. It should be noted that the number of autosomal and sexual SNPs associated with milk yield that were extracted from these 16 articles was 3950. Therefore, GWASs tend to be suitable for detecting common variants associated with specific phenotypes [2]. Using data from GWAS, the meta-analysis technique is used to detect common genomic regions affecting traits by pooling the results of many studies together. Meta-analysis is an essential tool for synthesizing evidence needed to inform clinical decision making and policy. Systematic reviews summarize available literature using specific search parameters followed by critical appraisal and logical synthesis of multiple primary studies [3]. Nowadays, the meta-analysis technique is used in the agricultural and veterinary sciences in order to resolve inconsistencies in the results of scientific sources. Using the meta-analysis technique, which is a systematic and statistical study, data from different studies can be combined to achieve a single conclusion and interpretation. The reason is that individual studies have some limitations regarding the statistical power and reliability of the results. A meta-analysis by combining data and results of different research improves statistical power and accuracy of estimates [4,5]. Meta-analysis is becoming an increasingly important tool in GWAS studies of complex genetic diseases and traits [6]. The aim of this study was to detect the chromosomal regions related to milk yield using metaanalysis of different cow breeds.

Data and Literature Review
The review of GWAS studies on cow milk yield regarding the number of chromosomes and SNP positions reveals details of chromosomal regions that affect the trait. In this study, the data from GWAS tests on MY are from Google Scholar (https://scholar.google.ca/, accessed on 12 December 2019) and the National Center for Biotechnology Information site (www.ncbi.nlm.nih.gov, accessed on 27 December 2019) searched ( Figure 1). Using different filters including articles in journals with high impact factors (greater than 0.9) and timespan 2010-2019 and also had the required factors for analysis with METAL software, 16 out of 19 studies were used for meta-analysis. The required information such as marker name and the number of their chromosomes, their position on the chromosome, their p-values, and also the number of tested animals of each study was stored in a file. It should be noted that the number of autosomal and sexual SNPs associated with milk yield that were extracted from these 16 articles was 3950.

Meta-Analysis
The meta-analysis was based on the weighted Z-scores model as implemented in the METAL software [7]. It considers the p-value, direction of effect, and the number of individuals included in each within-population GWAS study [8].

Meta-Analysis
The meta-analysis was based on the weighted Z-scores model as implemented in the METAL software [7]. It considers the p-value, direction of effect, and the number of individuals included in each within-population GWAS study [8].
The GWAS meta-analysis showed the effective chromosomes ( Figure 2). For the Manhattan plot, a pre-determined genome-wide significance threshold of 2.5 × 10 −6 was calculated with formulae 1 and 2 (α = 0.01).  Using the Ensembl site (http://ftp.ensembl.org/pub/release-103/gtf/bos_taurus/, accessed on 20 August 2020), the calculated data were checked and the loci of the effective markers and the genes were identified.

Downstream Analyses
The genes with variants that were significant in the meta-analysis and detected SNPs located on them were used as input for the gene ontology (GO) test. The GO terms (the significance level < 0.05) enrichment analysis with genes found within the top SNPs was performed. Using GO Consortium (https://biit.cs.ut.ee/gprofiler/gost, accessed on 5 February 2021), to investigate the biological processes of genes associated with MY investigated.

Results
The number of SNPs affecting the MY with a significance level lower than <2.5 × 10 −6 were 1712 sites located on all chromosomes and mainly on chromosome 14. The GWAS meta-analysis showed the effective chromosomes by the Manhattan plot ( Figure 2). The number of effective SNPs on chromosomes 14, 20, 6, and 5 were 950, 224, 87, and 65, respectively ( Table 1). The other 386 identified SNPs with significance levels lower than 2.5 × 10 −6 were located on the other 26 sex and autosomal chromosomes. The results showed that fifty-five percent of the effective SNPs related to milk yield were located on chromosome 14. Using the Ensembl site (http://ftp.ensembl.org/pub/release-103/gtf/bos_taurus/, accessed on 20 August 2020), the calculated data were checked and the loci of the effective markers and the genes were identified.

Downstream Analyses
The genes with variants that were significant in the meta-analysis and detected SNPs located on them were used as input for the gene ontology (GO) test. The GO terms (the significance level < 0.05) enrichment analysis with genes found within the top SNPs was performed. Using GO Consortium (https://biit.cs.ut.ee/gprofiler/gost, accessed on 5 February 2021), to investigate the biological processes of genes associated with MY investigated.

Results
The number of SNPs affecting the MY with a significance level lower than <2.5 × 10 −6 were 1712 sites located on all chromosomes and mainly on chromosome 14. The GWAS meta-analysis showed the effective chromosomes by the Manhattan plot ( Figure 2). The number of effective SNPs on chromosomes 14, 20, 6, and 5 were 950, 224, 87, and 65, respectively ( Table 1). The other 386 identified SNPs with significance levels lower than 2.5 × 10 −6 were located on the other 26 sex and autosomal chromosomes. The results showed that fifty-five percent of the effective SNPs related to milk yield were located on chromosome 14. Results for the top loci by p-value in the meta-analysis, with the most significant SNP per locus, are presented in (Table 2). The significance level of 1712 identified SNPs in the meta-analysis was compared and 5 SNPs of rs109421300, rs135549651, rs109146371, rs109350371, and BovineHD4100003579 had the smallest p-values (Table 2).  The identified SNPs were distributed on 18 genes (regardless of duplicate genes) with the names: DGAT1, ENSBTAG00000015040, RPAP3, ZC3H3, MROH1, MAF1, MAPK15, RHPN1, VPS28, TRAPPC9, ENSBTAT00000065585, ADGRB1, CYHR1, PTK2, PLEC, SCRIB, GML, and FAM135B.
The GO annotation based on biological processes (BP) showed 32 genes involved in biological functions associated with MY. According to the GO term, these candidate genes were found to be enriched in 15 biological processes. All of GO terms for MYrelated biological pathways were related to "wound healing", "metaphase/anaphase transition of meiosis I", "meiotic chromosome separation", "cell migration", "cell motility and locomotion" ( Table 3).
The 18 candidate genes for MY resulting from GWAS were associated with the GO terms of PTK2 (wound healing) also in (response to wounding), ENSBTAT00000065585  Note: GO enrichment analysis was performed in candidate genes associated with milk yield (p-value < 2.5 × 10 −6 ).

Discussion
A genome-wide meta-analysis and enrichment analysis for milk yield was conducted according to the results of 16 studies (on 353,698 cows and 3950 SNPs) from all over the world (Table 4). We confirmed substantial contribution of different chromosomal loci associated with MY in cows. Three of the most important SNPs, i.e., rs109421300, rs135549651, and rs109146371, were located on chromosome 14. These observations support the notion that the suggestive loci identified in this study, have an outstanding effect on MY. Moreover, fifty-five percent or 995 identified SNPs with a significance level lower than the specified level, were located on chromosome 14. Therefore, it can be concluded that chromosome 14 is the most effective chromosome on MY. The description of its different regions adds to the accuracy of this issue.
The study showed that regions 1,489,496 to 5,494,654 of chromosome 14 had the most effective SNPs compared to other regions of this chromosome. This means that all of the top 45 SNPs on chromosome 14 were located in this region. Only 24 SNPs in this region were located on the genes. Given that, the density of markers in some regions, including  1,675,278 to 1,967,325 and 4,336,714 to 4,468,478, was higher than in other regions, so that 13 SNPs from 45 of them were located on these regions and the most influential SNP (p-value: 2.93 × 10 −771 ) in this region was on DGAT1 (Diacylglycerol O-Acyltransferase 1), a protein-coding gene. DGAT1 is an enzyme that catalyzes the synthesis of triglycerides from diglycerides and acyl-coenzyme A [25]. The DGAT1 K232A polymorphism was previously shown to have a significant effect on bovine milk production characteristics (milk yield, protein content, fat content, and fatty acid composition) [25]. The next SNP (p-value: 1.12 × 10 −710 ) was located on the LOC100141215 gene. Therefore, because these regions have the highest density and the greatest effect, it can be said, the regions with the most impact.
In In the continuation of this study, for a better understanding of the mechanisms of MY and the genomic regions involved, it was necessary to analyze the candidate regions obtained from the results of this study. After performing downstream analyses and finding the relation between the identified genes and these terms, we investigated the relation between some of them and MY using studies that have been previously conducted.
Wound healing is a localized process that involves inflammation, wound cell migration and mitosis, neovascularization, and regeneration of the extracellular matrix [26]. Milk of the cow, especially low-fat milk, is a rich source of calcium which can play a significant role in the acceleration of wound healing and increment of healing quality [27]. Calcium has an essential role in wound healing; therefore, healing is known as a calcium-dependent process [27].
The metaphase to anaphase transition is a point of no return; the duplicated sister chromatids segregate to the future daughter cells, and any mistake in this process may be deleterious to progeny [28]. The metaphase to anaphase transition is controlled by a ubiquitin-mediated degradation process [28].
Cell migration is a complex process requiring the coordination of numerous interand intracellular events, such as cytoskeleton reorganization, matrix remodeling, cell-cell adhesion modulation, and induction of chemoattractants [29]. Cell migration plays an important role in a variety of normal physiological processes. These include embryogenesis, angiogenesis, wound healing, repairing of intestinal mucosal damage, and immune defense [30]. However, in some pathological conditions, such as atherosclerosis or gastrointestinal ulcers, a large area of denudation is commonly found, and an immediate repair by the reestablishment of the intact monolayer of cells is required [31].
Cell motility is the capacity of cells to translocate onto a solid substratum. This behavior is often a hallmark of fibroblastic cells. In epithelial cells, cell motility occurs after the dissociation of a cell from its neighboring cell(s) and after the modification of its position relative to other cells or a solid substrate [32]. Cell motility plays an integral role in many physiologic and pathologic processes, notably organized wound contraction and fibroblast and vascular endothelial cell migration during wound healing, metastatic tumor cell migration, stem cell mobilization and homing, and tissue remodeling [33]. Sufficient information is not available about other terms and their relation to MY and this requires further investigation.
For a better understanding of the mechanisms of milk production, it is suggested that more downstream analysis on the proposed region affecting MY including pathway analysis is carried out. Furthermore, it may be needed to review the contribution of the genes located in that region on the MY variance. For example, DGAT1, which is a major gene for MY, had the highest significant level in this study. Banabazi et al. (2016) have identified SNPs located on the transcribed regions and their 100 K proposed panel performed 2% better than the 700 K panel [34]. It is suggested to check the SNPs located on the candidate region among 1019 loci that they discovered on the transcriptome of chromosome 14 and 24,842 SNPs located on a high-density commercial SNP array (700 K) on the same chromosome. In addition, the comparison between Bos-taurus and Bos-indicus cattle may highlight the importance of the candidate region.

Conclusions
The most effective SNPs and genes which affect milk yield are located on chromosome 14, and the regions between 1,489,496 to 5,494,654 have the most effective SNPs in terms of the significance level. Emphasis on the use of these SNPs could justify a large part of the genetic variance in MY. Downstream analyses in these regions also partially demonstrated the mechanism of the effect of genes associated with MY in these regions. Additional analysis can help better understand the mechanism of MY in these regions.