1. Introduction
Sheep meat is the fourth economically most important type of meat product [
1]. Identification of quantitative trait loci (QTLs) and candidate genes that affect growth rate, carcass, and meat productivity traits is a prerequisite for the marker-assisted selection of these traits [
2]. Genome-wide association studies (GWAS) are a modern, agnostic, data-driven method for identifying QTLs associated with different traits. We previously developed a comprehensive database of QTLs associated with growth and meat traits in sheep based on the available published GWAS and gene-candidate studies [
3]. It is regularly updated with newly discovered loci, but only a few of them have been confirmed in independent studies or validated over the past seven years. Therefore, good quality annotation data are required for further development of this database as a necessary and important data source for modern selection practices in sheep.
There are two general ways of increasing the power of QTL detection—one by increasing the sample size of study cohorts and one by using a more powerful methodology for genetic analysis. The application of multivariate methods in genetic association studies can increase the statistical power to discover novel loci involved in the genetic control of quantitative traits [
4,
5,
6]. Multivariate analysis of variance (MANOVA) is one of the most powerful multivariate methods for genetic association studies [
6]. For certain traits, the application of MANOVA was empirically demonstrated to lead to non-trivial increases in the number of loci robustly identified in a replicable fashion [
7]. Recently, this method has become applicable to the so-called GWAS summary statistics (full information about the effect sizes, effective and reference alleles, sample size, standard errors, and
p-values for all available single nucleotide polymorphisms—SNPs). It should be noted that the application of MANOVA is a common practice in human genetics [
8,
9]. Its application in animal genetics is rapidly emerging [
10].
In this study, we focused on the search for new loci associated with meat productivity and carcass traits in sheep using a modern multivariate approach. For the genome-wide association studies in humans, it is customary to provide summary statistics in the supplements to the published research, which is not the case for studies in sheep. However, there are exceptions. Bolormaa et al. [
10] had previously provided full access to their summary-level GWAS data [
10]. Using data from 10,613 animals, they identified 71 significant loci associated with 56 traits, in a multivariate joint analysis of all 56 traits using an approach similar to MANOVA. However, grouping traits by biological relationship and/or by correlation structure of the data allowed us to increase the power of genetic mapping and detect new loci [
7]. In the present study, we exploited this approach to increase the number of loci associated with economically important traits and applied MANOVA to the summary statistics obtained in GWAS. We grouped 18 original univariate traits into three multivariate traits by biological relationship: the multivariate trait related to the general mass and mass of different parts (MMass), the multivariate trait that incorporates fat-related traits (MFat), and the multivariate trait that incorporates traits related to muscle properties (MMeat). Using multivariate analysis, we detected 12 associated loci, eight of which were reported for the first time. We replicated one of them in an independent sample of Russian sheep populations (
N = 108).
2. Materials and Methods
2.1. Data Collection
We used the data on summary statistics (Z-scores) for 510,174 SNPs for 56 carcass composition traits provided by Bolormaa et al. (2016) [
10]. The total sample size was 10,613. The sample consisted of nine breeds: Merino (MER), Poll Dorset (PD), Border Leicester (BL), Suffolk (SUF), white Suffolk (WS), Texel (TEX), Corriedale (CORR), Coopworth (COOP), and various Crosses (MIX). All animals were genotyped by Ovine Infinium
® HD SNP BeadChip, comprising 603,350 (HD) SNPs (Illumina Inc., San Diego, CA, USA) and the Illumina 50 k Ovine SNP chip (Illumina Inc., San Diego, CA, USA), comprising 54,241 (50 k) SNPs. Out of 10,613 animals with 50 k and 1735 HD genotypes, 1682 animals were genotyped by both SNP arrays. The imputation of the 50 k to HD was carried out using Fimpute [
11]. In total, 10,613 animals had real or imputed HD genotypes for 510,174 autosomal SNPs and a phenotypic record for at least one trait. For more details, see [
10].
2.2. Data Pre-Processing
We unified the original summary level data with the available Ovine Infinium
® HD SNP BeadChip reference (the Oar_v3.1 assembly) downloaded from [
12]. SNPs with mismatches in chromosome position or rs number excluded. In total, 493,742 out of 510,174 SNPs passed quality control and were used in further analysis. The program code is freely available and can be found at
https://github.com/Defrag1236/ovines_multivariate_analysis (access date: 3 March 2021).
2.3. Phenotype Grouping
Since this study is focused on the detection of loci associated with meat productivity, we used 18 out of 56 traits from the original study [
10] directly related to mass, carcass, meat, and fat weights. We grouped these 18 traits in three multivariate traits by the similarity of their biological functions: the multivariate trait related to the general mass and the mass of different parts (MMass), the multivariate trait that incorporates fat-related traits (MFat), the multivariate trait that incorporates traits related to muscle properties (MMeat). The description of the groups is provided in
Supplementary Table S1. The correlation matrix of all 56 traits is shown in
Supplementary Figure S1. The correlation matrix for each respective multivariate trait is provided in
Supplementary Figures S2–S4.
2.4. Multivariate GWAS Analysis
The multivariate analysis was performed using the summary statistics-based method described by Stephens (2013) [
6] using the R environment [
13]. This method enables the application of multivariate analysis of variance (MANOVA) using only the Z-scores (the ratio of effect size and its standard error) from the results of GWAS for several traits. In general, MANOVA is a generalized form of univariate analysis of variance (ANOVA) and uses the covariance between outcome variables in testing the statistical significance of the mean differences. In the context of genetic studies, MANOVA searches for the best linear combination of the original traits that is associated with a particular SNP. As a result, it gives the
p-value of the association of each SNP with the group of the traits being studied. In our case, the MANOVA T-score for the
j-th SNP was calculated as
where
is the vector of Z-scores for the original traits and
is a matrix of phenotypic correlations. The matrix of phenotypic correlations was calculated as the correlation between Z-scores for the original traits using the approach proposed by Stephens (2013) [
6]. The
p-value was calculated using the R command
, where T_score is the T-score and X is the corresponding number of degrees of freedom (9 for MMass, 6 for MFat, and 3 for MMeat).
We used the threshold p-value , where is the number of SNPs that passed QC, and 3 is the number of multivariate traits.
The associated loci were defined as regions within ±500 kb around the SNPs most significantly associated with the trait (lead SNPs). We manually checked regional association plots for all loci found and merged two loci into one if they were located within the same linkage disequilibrium block. SNPs located between 22 Mb and 42 Mb on chromosome 6, between 24.5 Mb and 28 Mb on chromosome 11, and between 62 Mb and 67 Mb on chromosome 18 were merged into one locus each.
A locus was defined as new if it was not present in the list of known SNPs and genes from our previously published database of QTLs and genes associated with meat productivity in sheep [
3]. This database contains information from 23 papers published between 2013 and 2020.
The proportion of variance explained by the locus and expressed as a percentage was calculated in the R environment as , where p is the least p-value among those for 56 univariate traits.
Inflation factors for all multivariate traits and original univariate traits used for analysis are presented in
Supplementary Table S2.
2.5. Regional Association Plots
Regional association plots were constructed using a homebrew script adjusted to the sheep genome annotation. The code is available in our Git repository (
https://github.com/Defrag1236/ovines_multivariate_analysis, access date: 3 March 2021). We used Ensembl gene annotation (genome version OAR_3.1). The linkage disequilibrium (
r2) between SNPs was calculated as the squared Pearson correlation between Z-scores for 56 original traits using the approach proposed by Stephens (2013) [
6]. All regional association plots for discovered loci can be found in
Supplementary Figures S8–S19.
2.6. Functional Annotation
We used the Ensembl variant effect predictor (VEP) to analyze the potential effects of SNPs and indels in high linkage disequilibrium (LD;
r2 > 0.7) with the lead SNPs. The analysis was performed using software available online [
14].
2.7. Literature-Based Prioritization
In order to detect promising candidate genes in associated loci, we performed a literature-based prioritization for genes located within 500 kb of the lead SNPs. We used the Uniprot database (
http://www.uniprot.org, access date: 3 March 2021) to extract information about the short gene annotation, molecular function, and biological process of the gene. Due to the relative scarcity of data on sheep gene function, we supplemented our literature-based analysis with information published in the catalog of human genes Online Mendelian Inheritance in Man (OMIM) [
15] and a mouse knockout database [
16].
2.8. The Functional Network of Genes Associated with Meat Productivity Traits
The functional network is based on the assumption that the genes functionally connected and/or involved in one gene network will show a similar pattern of association with traits (in our case 56 traits). The original idea was presented in papers by Bolormaa et al. [
10,
17] and a recent study in human genetics [
18]. To test our hypothesis, we used Spearman’s pairwise correlations calculated using original Z-scores. The sign of the correlation is irrelevant because the direction of the GWAS effect estimates depends on the allele used as a reference. Moreover, we named the loci after the prioritized genes rather than after rs IDs.
To construct the functional network, we performed a pairwise association analysis. Spearman’s correlation coefficient was computed for each pair from among 12 significantly associated SNPs [
10,
17,
19]. As a threshold for the significance of Spearman’s coefficient, we used the
p-value corrected for (12 × 11)/2 tests (
p-value ≤ 7.6 × 10
−4).
For the visualization of the functional network between genes with significant Z-based Spearman’s correlation, we used the “igraph” package for the R environment (
https://www.r-project.org/, access date: 3 March 2021). Clustering and visualization were carried out using the “corrplot” package for R and the basic “hclust” function. For clustering, we estimated squared Euclidean distances by subtracting the absolute values of the genetic correlation from 1 and used Ward’s clustering method.
2.9. Replication in a Russian Independent Sample
The replication was performed on a mix of Russian sheep populations totaling 108 animals. The sample consisted of two genetic pools of animals. The first pool contained backcross hybrids from the Romanovskaya breed and argali (Ovis ammon). The second pool contained mixed hybrids of the Romanovskaya breed and Katahdin breed with argali and mouflon (Ovis gmelini).
All animals were genotyped by Ovine Infinium® HD SNP BeadChip, comprising 603,350 (HD) SNPs (Illumina Inc., San Diego, CA, USA). The alleles were coded using the Illumina A/B forward/forward scheme. We filtered genotypes prior to analysis by MAF > 1% and by genotype call rate >95%. Next, we performed a PCA analysis of genotypes and manually checked whether the animals in each pool were genetically homogenous.
For all animals, phenotype information about 11 original traits was available for three-time intervals after the birth (6 days, 42 days, and 90 days). Based on these phenotypes, seven indexes (see
Supplementary Table S3) were calculated for three-time intervals. Each of these indexes reflects the meat productivity, growth, and carcass properties of the animals. In total, we analyzed 24 traits (seven indexes and mass in three-time intervals) for replication analysis.
GWAS for each trait and for each pool of animals was performed using the GEMMA software [
18], mixed-models, and the Wald test. The number of fetuses and sex were included as covariates. We used EMMAX [
20] for the estimation of the genetic relatedness matrix using the Balding–Nichols method. For each phenotype and time interval, we meta-analyzed two pools of animals by inverse-variance weighted meta-analysis using the METAL software [
21].
Next, we a performed multivariate analysis of three-time intervals on each of the eight phenotypes using the approach described above. Only traits with a univariate sample size of more than 20 were included in the multivariate analysis. In total, we obtained eight multivariate GWAS with a sample size varying from 50 to 108. Inflation factors for each of the univariate and multivariate traits are presented in
Supplementary Table S4. As additional validation, we checked the association of four known loci. Locus rs401834107 was associated with width slope index at
p < 1.106 × 10
−5, which is below the Bonferroni corrected threshold (
p-value
). The other three loci had a minimum
p-value higher than the threshold.
Only six out of eight novel loci were analyzed. Locus rs403766990 was excluded because of a low MAF (<1%), and locus rs428034699 was excluded because of a low genotypic call rate (<95%). The threshold for replication was set to
, where 6 is the number of novel SNPs for replication and 8 is the number of multivariate traits in the replication cohort. Moreover, if SNP was significant at least for one of the multivariate traits we have checked the consistency between the direction of the meat productivity effects estimated for the traits with univariate
p-value by Bolorma et al. [
10] and in our replication study.
4. Discussion
Our results demonstrate that it is possible to extract additional useful information from published data through the use of different methods. In our opinion, reporting full summary-level data may greatly accelerate the accumulation of relevant information about SNPs and their association with economically important traits in sheep. This practice is widely adopted in human GWAS studies.
Interestingly, five of eight novel loci were associated with MMass, the multivariate trait representing general body mass. The number of novel loci associated with the traits representing fat or muscle composition was only three. Among 56 original traits, we have detected the ones most associated with a lead SNP for each new locus (see
Supplementary Table S5). For three loci, the most associated univariate trait was leg bone weight. Dressing and hot carcass weight were among the most significant univariate traits. Thus, we may speculate that the genes located in the new loci are likely to alter the carcass and bone properties, potentially increasing the animal size and, consequently, increasing the animal mass.
We performed gene prioritization for the novel loci. For the locus tagged by rs193632759, we found a missense mutation in the MASP1 gene, which gives us relatively high confidence that this gene is functional for the association. For the locus tagged by rs425579441, there was only one annotated gene (SETBP1) within the 500-kb region (±500 kb away from the lead SNP), making this gene the only candidate for the locus. For the other loci, our confidence in the prioritized genes is not well supported due to the use of literature-based prioritization only. For six loci, our prioritization was based on cumulative evidence from the literature and on the physical proximity of the gene. In total, we have prioritized 13 genes for 8 novel loci. Additional studies are needed to find out what gene is truly functional in the loci with ambiguous prioritization.
We also replicated the novel loci in the Russian sheep population. Although the settings of the cohort were not optimal for replication and the sample size was not large enough to guarantee the optimum power, we replicated one of six novel loci (rs401990068) included in the replication study. Another locus near DOCK8/PGM5/DMRT1 was close to the threshold (p-value = 0.0014) but was not considered as replicated. Further replication studies are needed with larger sample sizes to confirm the association of the other novel loci.
It should be noted that for 3 loci out of the 12 discovered, we observed large blocks of linkage disequilibrium (between 22 Mb and 42 Mb on chromosome 6, between 24.5 Mb and 28 Mb on chromosome 11, and between 62 Mb and 7 Mb on chromosome 18). Those loci are highly associated with meat and carcass traits (the proportion of variance explained for the most associated univariate trait was approximately 3.42%, 1.42%, and 0.48% for loci on chromosomes 6, 11, and 18, respectively). It is most likely that these loci contain a haplotype or several haplotypes with a particularly strong effect and should consequently be considered as some of the most important loci for practical use and selection. A more detailed study of these regions would require a different methodology for fine mapping and/or sequencing of the regions. Moreover, we have shown that these loci together with the novel locus
FAM3C/WNT16 are parts of the same functional network (see
Figure 3). Given that the
LCORL gene is a known transcription factor and the strongest association signal for meat productivity traits, we may speculate that LCORL directly regulates the expression of genes in the other three loci, which most probably consist of receptors, enzymes, and ion channels.
Finally, we have uploaded information about eight new loci into a recently developed database of QTLs and genes associated with growth and meat traits in sheep [
3]. In total, the database now contains more than 130 unique loci. This information can be useful for further association studies and preliminary estimation of genetic variability in different breeds.
5. Conclusions
Eight novel loci containing the MASP1, DOCK8/PGM5/DMRT1, POlR1B, LLPH, SHISAL1, FAM3C/WNT16, SYNPO/PDGFRB/CDX1, and SETBP1 genes are associated with mass, fat, and muscle composition traits in sheep. The loci with the FAM3C/WNT16, LCORL, SLC16A11, and GHR genes are parts of one functional network with shared regulation. Association of locus rs401990068 was replicated on an independent sample of Russian sheep populations.