Sheep milk production is highly important in Mediterranean and Middle Eastern countries. Spain holds one of the highest dairy sheep livestock counts in Europe [1
], and almost all the sheep milk produced is used for the production of high-quality cheese [2
]. Milk composition has a strong influence on the technological and organoleptic characteristics of dairy products [3
]. Sheep milk properties enable sheep cheeses to have better sensory characteristics than cheeses from goat and cow milk [4].
Breed selection schemes in dairy sheep are generally focused on milk yield and fat and protein contents. Therefore, genetic parameters for these traits have been widely studied [5
]. However, few genetic studies have investigated the genetic component involved in the cheese-making process through the analysis of milk traits (milk yield and composition) and cheese-making traits (milk coagulation properties (MCP) and cheese yield-related traits) [8
]. Cheese-making traits are difficult to measure routinely, which hinders their integration into classical breeding programmes; therefore, the identification of genetic markers associated with these traits may be of high relevance to the sheep dairy industry.
To elucidate the complex genetic architecture underlying milk traits, several research approaches have been performed. Previous studies have focused on the evaluation of polymorphisms in ovine major milk proteins (caseins and whey proteins) and genes related to the fat synthesis of milk. Some of these polymorphisms have been associated with milk yield, protein and fat milk contents and milk technological properties [3
]. Complex traits, such as milk and cheese-making traits, are assumed to be influenced by many genomic regions. In this sense, the availability of genome-wide single nucleotide polymorphism (SNP) panels has enabled the identification of genomic regions associated with complex traits, in many cases by applying the genome-wide association study (GWAS) approach. Detailed information about the genomic regions or quantitative trait loci (QTLs) influencing traits of interest in dairy sheep identified by association-based studies can be found in the SheepQTLdb [14
]. Despite the considerable advantages of the GWAS approach in the identification of genomic regions associated with these traits, we need to consider that, for complex traits, it is difficult to devise experimental designs with adequate power to identify genes that contribute to the genetic variance of these traits [15
]. Some specific statistical procedures, such as stepwise regression, may help to overcome this power limitation. In addition, integrated approaches, such as those based on partial correlation and information theory (PCIT) [16
], have attempted to enrich GWAS analyses with information from other sources, providing useful alternatives for characterising genes and gene networks associated with complex traits [17
]. Generating knowledge on these gene networks may help to elucidate the genetic architecture of complex traits and thus develop genomic tools with predictive value for such traits.
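As an illustration of the partial correlation idea behind PCIT mentioned above, the following minimal sketch filters the edges of a correlation matrix trio by trio. It is a simplified reading of the general PCIT principle and not the implementation used in this study; the function names `partial_corr` and `pcit_filter` and the toy correlation values are assumptions introduced only for this example.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y given z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

def pcit_filter(corr):
    """Return a boolean adjacency matrix of the edges retained after a
    trio-wise filter. Assumes all correlations are non-zero and not +/-1."""
    n = len(corr)
    keep = [[i != j for j in range(n)] for i in range(n)]
    for x in range(n - 2):
        for y in range(x + 1, n - 1):
            for z in range(y + 1, n):
                r_xy, r_xz, r_yz = corr[x][y], corr[x][z], corr[y][z]
                # Tolerance: mean ratio of partial to direct correlation in the trio.
                eps = (partial_corr(r_xy, r_xz, r_yz) / r_xy
                       + partial_corr(r_xz, r_xy, r_yz) / r_xz
                       + partial_corr(r_yz, r_xy, r_xz) / r_yz) / 3.0
                # An edge is dropped when it is weak relative to both other edges.
                if abs(r_xy) < abs(eps * r_xz) and abs(r_xy) < abs(eps * r_yz):
                    keep[x][y] = keep[y][x] = False
                if abs(r_xz) < abs(eps * r_xy) and abs(r_xz) < abs(eps * r_yz):
                    keep[x][z] = keep[z][x] = False
                if abs(r_yz) < abs(eps * r_xy) and abs(r_yz) < abs(eps * r_xz):
                    keep[y][z] = keep[z][y] = False
    return keep

# Toy example: genes 0 and 1 are correlated only through gene 2.
toy_corr = [
    [1.00, 0.25, 0.50],
    [0.25, 1.00, 0.50],
    [0.50, 0.50, 1.00],
]
edges = pcit_filter(toy_corr)
```

In this toy matrix the correlation between genes 0 and 1 is fully explained by gene 2, so only the two edges involving gene 2 are retained.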
In the present study, by using a custom 50K chip integrating SNPs identified in a previous study by our group investigating the variability of the sheep milk transcriptome [19
], we applied a stepwise procedure in combination with classical GWAS, co-association network (PCIT) and pleiotropy analyses to decipher the genetic architecture of 14 milk and cheese-making traits measured in a commercial population of Assaf sheep. The principal aim of this stepwise analysis was to identify gene networks (candidate genes and their potential regulators) and biological processes implicated in milk synthesis and cheese-making efficiency. The results described in this report have enabled us to select a panel of markers that could be used as predictors of an individual’s genetic potential for milk production and cheese-making. This study may provide a practical and cost-effective solution for the genetic improvement of these economically important traits in the studied population.
Almost all the milk produced from Spanish Assaf ewes is used for cheese manufacturing. Therefore, cheese-making traits could be used as selection criteria in dairy sheep breeding programmes. However, the routine measurement of milk traits is simpler and less expensive than that for cheese-making traits, especially at the individual animal level. Since milk traits are already considered selection criteria in genomic selection programmes of dairy sheep, previous studies have focused on identifying the relationship between milk’s physicochemical composition parameters and cheese-making variables [10
]. Furthermore, concerning the milk coagulation properties and cheese yield in the Assaf breed, the genetic parameters of these traits have been adequately discussed in a recent paper by our group [10
]. In this study, we analysed seven milk traits and seven cheese-making traits through a stepwise procedure in combination with classical GWAS, pleiotropy and co-association analyses. Our main aim was to identify SNPs located within a confidence interval of genes that are relevant to the traits considered, which could be used in genomic selection programmes applied in dairy sheep.
In the multifactorial ANOVA (Table 1), pH played the most important role in milk coagulation efficiency, followed by SCC, which agrees with previously reported studies [20
]. On the other hand, our analysis related a high initial milk pH measurement, low SCC and low lactose content to inefficiency in the coagulation process (Figure 3
), which is in agreement with previous reports [2
]. In addition, we found an influence of the DIM on the milk and cheese-making traits (Table 2
), in agreement with Jaramillo et al. [2
], who described the variation of the renneting variables and physicochemical milk composition during lactation in sheep.
The high correlations found between the two families of traits that were analysed (Table 4
) support the possibility of using these correlations to predict the genomic estimated breeding values (GEBVs) for cheese-making traits from milk phenotypes, whose sampling is implemented in the official milk recording system, and the genotypes of the SNP chip. To this end, a stepwise analysis strategy was applied to obtain the minimum number of SNPs that can explain the maximum genetic variance for both types of traits.
The stepwise regression forward selection method generates a GRM in each step, attempting to capture as much additive genetic variance as possible for each trait. The variation of the genomic relationship between the animals of the population (Figure S2
) enabled us to reach the maximum of the genetic variance explained through the design of an idealised pedigree, achieved in the 109th step of the stepwise method, where the animals were arranged in small groups with similar z-score values (Figure S3
). Hence, the z-score estimated here summarises the cheese-making aptitude based on the 14 traits analysed in this study.
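To make the notion of a genomic relationship matrix (GRM) rebuilt at each step more concrete, the sketch below computes a GRM from allele counts using the widely used VanRaden formulation, G = ZZ'/(2 Σ p(1 − p)). The text does not state which GRM definition or software was used, so the formulation, the function name `vanraden_grm`, the `snp_idx` argument and the toy genotypes are all assumptions made for illustration.

```python
def vanraden_grm(genotypes, snp_idx=None):
    """Genomic relationship matrix from allele counts (0, 1 or 2).

    genotypes: list of rows, one per animal, each a list of allele counts.
    snp_idx:   optional list of SNP columns to use (e.g. the SNPs selected
               so far in a forward selection); all SNPs are used by default.
    """
    if snp_idx is None:
        snp_idx = list(range(len(genotypes[0])))
    n_animals = len(genotypes)
    # Allele frequency of each selected SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals) for j in snp_idx]
    # Scaling factor: twice the sum of p(1 - p) over the selected SNPs.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Centre the allele counts by twice the allele frequency.
    centred = [[row[j] - 2.0 * p for j, p in zip(snp_idx, freqs)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]

# Hypothetical genotypes: 4 animals x 4 SNPs.
toy_genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [1, 2, 1, 1],
]
grm_all = vanraden_grm(toy_genotypes)
grm_subset = vanraden_grm(toy_genotypes, snp_idx=[0, 2])
```

Under this assumption, each step of a forward selection would call the function again with the enlarged list of selected SNPs, so that the relationships between animals change as markers are added.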
The genetic component is one of the factors influencing cheese production; therefore, elucidating the genomic regions related to milk and cheese-making traits might help to elucidate the genetic background underlying cheese-making efficiency. In this study, the combination of classical GWAS with the stepwise regression method and pleiotropy analysis was an efficient approach to discover the best combination of genetic variants underlying cheese-making traits. These SNPs, located within genes or in the confidence interval of 20 kb from a gene, can explain the highest proportion of genetic variance and could help to understand the role of the related genes and their co-associations on the studied traits. Through stepwise analysis, we selected two gene sets. The first significantly co-associated gene set, composed of 374 genes, could be useful for the design of a low-density SNP chip to generate information that could help to increase the efficiency of dairy sheep breeding programmes. The second selected gene set, composed of 4586 genes, might help to elucidate the role of the genes that influence cheese-making efficiency. This gene set also revealed how much of the average genetic variance of the 14 traits could be overestimated according to the markers selected for the corresponding analysis.
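The assignment of SNPs to genes described above (a SNP located within a gene or within 20 kb of it) can be sketched as a simple interval check. The gene names and coordinates below are hypothetical placeholders, and the function `genes_near_snp` is an illustrative helper, not part of the pipeline used in this study.

```python
def genes_near_snp(chrom, pos, genes, window=20_000):
    """Return the names of genes that contain the SNP or lie within
    `window` base pairs of it on the same chromosome."""
    return [
        g["name"]
        for g in genes
        if g["chrom"] == chrom and g["start"] - window <= pos <= g["end"] + window
    ]

# Hypothetical annotation: names and coordinates are invented for the example.
toy_genes = [
    {"name": "GENE_A", "chrom": "3", "start": 100_000, "end": 150_000},
    {"name": "GENE_B", "chrom": "3", "start": 165_000, "end": 200_000},
    {"name": "GENE_C", "chrom": "5", "start": 100_000, "end": 150_000},
]

inside_gene = genes_near_snp("3", 120_000, toy_genes)    # within GENE_A
between_genes = genes_near_snp("3", 160_000, toy_genes)  # within 20 kb of two genes
far_away = genes_near_snp("3", 300_000, toy_genes)       # no gene within 20 kb
```

A SNP lying between two closely spaced genes is assigned to both in this sketch; whether such cases were resolved differently in the study is not specified.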
The functional enrichment analyses performed, based on multiple sources of information, enabled us to identify and classify the biological processes related to the two considered gene co-association networks. The first gene co-association network was composed of 374 genes, of which 55 were transcription factors (TFs) or co-transcription factors (CFs). Among the TFs found, the zinc-finger, homeobox and ETS families were the most common. These TFs are related to the control of the expression of multiple genes [33
] involved in regulating the expression of target genes associated with cellular differentiation [34
] and activating or repressing the transcription process [35
]. For that reason, transcription and co-transcription factors were considered potential regulators of the network. Moreover, three transcription factors were located in the confidence intervals of the QTLs related to the traits under study: the MECOM
gene related to cell differentiation and the regulation of transcription, the ZNF250
gene associated with the regulation of transcription and the ZFPM1
gene related to the cell morphogenesis process [11
]. These genes were expressed in the sheep mammary gland during lactation [19
], which supports their role in the synthesis of milk. All these genes could be considered functional candidates affecting milk and cheese-making traits in sheep. The enrichment analysis detailed 139 biological processes associated with protein metabolism pathways and 19 with fat metabolism pathways. Some of the genes that make up this first set (CD44
, ITPR1, PCSK2, among others)
have shown a similar role in dairy cattle [15
] and are detailed in Additional file 8.
The second gene co-association network consisted of 4586 genes, including 497 potential regulators of the network. This gene set includes two additional transcription factor families: bHLH (basic helix–loop–helix), one of the largest families of dimerising transcription factors, and HMG (high mobility group), which is involved in many biological processes, such as transcription, replication and recombination [37
]. Among the new functions associated with transcription factors, the hormone-mediated signalling pathway should be highlighted due to its impact on milk production through the influence of corticotropin, prolactin and thyroid hormones [38
]. The successive gene selection by the stepwise method has allowed us to extend the list of genes possibly involved in milk synthesis and cheese-making efficiency and to identify significant biological processes associated with the gene set. The significant functions detailed were generally related to basal and essential biological processes, but we should emphasise the homeostatic process, ion transport and cellular response to stress. Suárez-Vega et al. [19
] also reported this last function as of significant relevance in the mammary gland, possibly due to the elevated rates of protein and fat synthesis faced by this organ during lactation. Finally, these results suggest that many general biological processes indirectly influence milk yield, composition and coagulation traits.
It is worth highlighting six of the genes gathered in the co-association network, which encode milk proteins or proteins involved in milk fat metabolism [19
]. The LALBA
gene, which encodes the whey protein α-lactalbumin, was reported to be strongly associated with protein and fat percentage in dairy sheep [11
]. The BTN1A1
gene, which encodes butyrophilin subfamily 1 member A1, and the SLC27A6
gene, which encodes solute carrier family 27 member 6, were found to be associated in cows with lipid droplet formation and fatty acid uptake, respectively [43
]. The perilipin-2 protein (encoded by the PLIN2
gene) was found to be related to the packaging of triglycerides for secretion as milk lipids in the mammary gland [44
]. Last, the ACACA
gene, which encodes acetyl-coenzyme A carboxylase α, and the SCD
gene, which encodes stearoyl-CoA desaturase, are related to fatty acid synthesis and desaturation [43
]. The phospholipase A2-activating protein (PLAA
gene), which is related to the phospholipid metabolic (GO:0006644) and prostaglandin metabolic (GO:0006693) processes, and the acetyl-CoA acyltransferase 2 (ACAA2
gene), which is involved in the fatty acid catabolic process (GO:0009062), were also highlighted by the enrichment analysis carried out in a previous study of the transcriptome of the sheep mammary gland [19
]. Moreover, Sanchez et al. [15
] reported 62 genes included in this gene set (see Table S5
) as possible functional candidates related to milk cheese-making properties. The effect of those genes on milk protein, milk fatty acid and milk mineral composition has also been supported in other studies [15
]. Similarly, Cánovas et al. [48
] reported three genes associated with citrate content in cow milk, coding for citrate synthase (CS
gene), dihydrolipoamide dehydrogenase (DLD
gene) and ATP citrate lyase (ACLY
gene), which were also detailed in this gene set.
Pleiotropy is defined as the presence of statistically significant associations of one marker with more than one trait [49
]. Pleiotropic effects estimated for the co-transcription factors were higher than those for the rest of the genes included in both gene networks. Together with the transcription factors, the co-transcription factors have been considered potential regulators of the co-association networks presented in this study and therefore of the metabolic pathways related to milk and cheese-making traits. Apart from the transcription factors, co-transcription factors and coding genes, microRNAs (miRNAs) were also included in these gene networks; one miRNA was included in the first selected gene set (microRNA_125b-1), and 14 were included in the second gene set selected by stepwise analysis (see Table S2
). The miRNAs are involved in the regulation of the expression of complementary messenger RNAs [50
] and have a role in mammary gland development and lactation and lipid and fatty acid metabolism [51
]. In addition, several unannotated genes have been found in both gene networks, which could code for novel proteins or constitute functional noncoding RNAs. Moreover, some genes potentially belonging to the zinc-finger transcription factor family have been found to be unclassified, as in AnimalTFDB 3.0 [29
]. These findings reflect the incomplete annotation of the sheep genome, as previously suggested by Suárez-Vega et al. [19
]. Therefore, it is important to consider that this incompleteness of the reference genome can complicate the interpretation of results from association studies.
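The definition of pleiotropy given earlier in this section (one marker significantly associated with more than one trait) can be illustrated with a short sketch. The SNP identifiers, trait labels, p-values and the significance threshold `alpha` are hypothetical; the way pleiotropic effects were actually estimated in this study is not reproduced here.

```python
def pleiotropic_markers(p_values, alpha=0.05):
    """Return the markers significantly associated with more than one trait.

    p_values: dict mapping marker -> dict mapping trait -> p-value.
    alpha:    significance threshold (an assumed value for this example).
    """
    result = {}
    for snp, trait_p in p_values.items():
        hits = sorted(trait for trait, p in trait_p.items() if p < alpha)
        if len(hits) > 1:
            result[snp] = hits
    return result

# Hypothetical association results for three markers and three traits.
toy_p_values = {
    "snp_1": {"milk_yield": 0.001, "fat_pct": 0.020, "curd_firmness": 0.300},
    "snp_2": {"milk_yield": 0.400, "fat_pct": 0.010, "curd_firmness": 0.600},
    "snp_3": {"milk_yield": 0.700, "fat_pct": 0.800, "curd_firmness": 0.900},
}
pleiotropic = pleiotropic_markers(toy_p_values)
```

Only `snp_1` is flagged in this toy example, because it is the only marker that passes the threshold for more than one trait.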
To summarise, stepwise regression analysis is a computationally costly and exhaustive method for prioritising genes related to the analysed traits, which has enabled us to identify co-association networks composed of candidate genes and their potential regulators. In addition, the approach presented in this study has also allowed us to understand the co-association among the highlighted gene sets and their possible biological roles in milk and cheese traits in sheep. The co-association network composed of 374 genes may be suitable for the design of a low-density chip useful to predict an individual’s genetic potential for cheese-making efficiency. This approach would enable selection for these difficult-to-measure traits earlier in life compared with traditional selection methods [25
]. Sheep milk is mostly transformed into cheese [53
]. Therefore, it is important to implement genomic selection strategies for milk and cheese-making traits to improve cheese-making efficiency without negatively affecting selection for milk production and other functional traits of considerable interest for sheep breeders, such as SCC.