GWAS and Post-GWAS High-Resolution Mapping Analyses Identify Strong Novel Candidate Genes Influencing the Fatty Acid Composition of the Longissimus dorsi Muscle in Pigs

Fatty acid (FA) composition is one of the most important parameters for the assessment of meat quality in pigs. The FA composition in pork can also affect human health. Our aim was to identify quantitative trait loci (QTLs) and positional candidate genes affecting the FA profile of the longissimus dorsi muscle in a large F2 intercross between Landrace and Korean native pigs comprising 1105 F2 progeny by genome-wide association studies (GWAS) and post-GWAS high-resolution mapping analyses. We performed GWAS using the PorcineSNP60K BeadChip and a linear mixed model. Four genome-wide significant QTL regions in SSC8, SSC12, SSC14, and SSC16 were detected (p < 2.53 × 10−7). Several co-localizations of QTLs in SSC12 for oleic acid, linoleic acid, arachidonic acid, monounsaturated FAs, polyunsaturated FAs, and the polyunsaturated/saturated FA ratio were observed. To refine the QTL region in SSC12, a linkage and linkage disequilibrium analysis was applied and could narrow down the critical region to a 0.749 Mb region. Of the genes in this region, GAS7, MYH2, and MYH3 were identified as strong novel candidate genes based on further conditional association analyses. These findings provide a novel insight into the genetic basis of FA composition in pork and could contribute to the improvement of pork quality.


Introduction
FA composition plays an important role in the evaluation of pork quality because it can contribute to sensorial, nutritional, and functional factors. For example, FA composition controls the oxidative stability of pork, which in turn affects meat color and flavor [1]. Linoleic acid (an omega-6 FA) and α-linolenic acid (an omega-3 FA) are regarded as essential fatty acids because they cannot be synthesized in mammals [2]. In addition, the ratio of unsaturated FA (UFA) and saturated FA (SFA) levels in membrane fluidity and cell-cell interactions has been implicated in various types of human diseases, such as obesity [3], (6890N, Agilent Technologies, Germany). Table 1 shows the descriptive statistics and heritability for the FA traits. Several of the FA profile traits showed significant deviation from normality. Prior to the statistical genetic analyses, these data were transformed using the natural logarithm (i.e., C18:2 and PUFAs) or the square root (i.e., C20:1 and C20:4) to remove skewness.

Estimation of Heritability and GWAS for Mapping QTLs
The ASReml program (ver. 4.0) was used to estimate the heritability of each FA trait recorded in this study (VSN International, UK), and the following linear mixed model was used for analysis: y = Xb + Zu + e (1) where y is the vector of FA phenotypes and b is the vector of fixed effects including the intercept, sex, batch, and carcass weight; u is the vector of random additive effects following a normal distribution u~N(0, Aσ a 2 ), in which A is the numerator relationship matrix based on pedigree and σ a 2 is the additive genetic variance; e is the vector of random residual following a normal distribution e~N(0, Iσ e 2 ), in which I is the identity matrix and σ e 2 is the residual variance; and X and Z are the incidence matrices for b and u.
A single-marker association analysis was performed to identify QTLs for FA profile traits using the genome-wide efficient mixed-model association (GEMMA) program (ver. 0.94.1) [24]. This approach adjusts the familial relatedness within the F 2 intercross during the GWAS. The following linear mixed model was used to assess the association between SNP markers and FA profile traits: y = Xb + Z 1 a + Z 2 u + e (2) where y is the vector of FA phenotypes and b is the vector of fixed effects including the intercept, sex, batch, and carcass weight; a is the vector of fixed effect of SNP marker; u is the vector of random additive effects following a normal distribution u~N(0, Gσ g 2 ), in which G is the genomic relationship matrix that was constructed using the 39,485 SNP markers and σ g 2 is the additive genetic variance; e is the vector of random residual following a normal distribution e~N(0, Iσ e 2 ), in which I is the identity matrix and σ e 2 is the residual variance; X is the incidence matrix of b; and Z 1 and Z 2 are the incidence matrices for a and u.
To address multiple comparison issues of the GWAS, a Bonferroni-adjusted significance threshold (i.e., 0.01/39,485 SNP markers, p < 2.53 × 10 −7 ) was calculated for the LK cohort. Based on the significance threshold, a GWAS was performed chromosome by chromosome. The confidence interval of these QTL regions was determined using two SNPs (upper boundary and lower boundary) having nominal p-values lower than the significance threshold (p < 2.53 × 10 −7 ) in the neighborhood of the top-ranked significant SNP. The % proportion of phenotypic variance (%VAR SNP ) explained by the top-ranked SNP was computed using the following equation: where p is the minor allele frequency of the marker, a is the estimated allelic effect of the marker, and σ p 2 is the phenotypic variance component for each FA profile trait. The phenotypic variance component (σ p 2 ) was estimated using the GCTA program (ver. 1.93.2) [25].

Joint Linkage and Linkage Disequilibrium (LALD) Analysis for High-Resolution QTL Mapping
High-resolution mapping of the most significant QTL detected by GWAS was performed by jointly integrating both linkage (LA) and linkage disequilibrium (LD) using a haplotype-based approach. First, the CRIMAP program (ver. 2.507), developed by Evans and Maddox (URL: http://www.animalgenome.org/bioinfo/tools/share/crimap, accessed on 7 November 2017), was used to establish the genetic linkage map of the chromosome harboring the most significant QTL detected by GWAS. Second, founder haplotypes found in the parental pigs (i.e., Landrace and KNP) were reconstructed using the DualPHASE program (ver. 1.1) [26], which integrates LA information obtained at the family level and LD information obtained at the population level in a Hidden Markov Model framework. A total of 20 founder haplotype clusters (K = 20) were used. Third, the estimated haplotypes were incorporated into a linear mixed model including fixed (i.e., sex, batch, and carcass weight) and random (i.e., the 20 effects of founder haplotypes and animal effects) effects and random residual error terms to perform high-resolution mapping of QTL analysis using the QxPAK program (ver. 5.05) [27]. The confidence interval of the fine-mapped QTL was established using the 1-LOD drop method [28].

Further Conditional Association Analyses for Positional Candidate Gene Identification
Using the GCTA program (ver. 1.93.2), conditional association analyses were performed to further refine the critical region identified by LALD analyses [29]. This was achieved using a multiple regression frame to test the hypothesis of the presence of QTL at a particular DNA marker conditioned on selected markers (i.e., key markers) as cofactors [30,31]. Prior to conditional analysis, a stepwise regression was performed to select key SNP marker(s) using the GCTA-slct procedure. Subsequently, the selected key SNP markers were incorporated into a multiple regression based on a GCTA-cojo model as cofactors, and the effect of other SNP markers in the critical region was evaluated to determine whether additional independent association signals were present. A threshold of p < 0.0005 was applied to determine the statistical significance of the conditional analyses; this corresponded to the Bonferroni adjustment threshold for the critical region by LALD analysis.

Positional Candidate Gene Analyses
The gene list and map position in each QTL was extracted from the NCBI database release 85 based on Sus scrofa 11.1 assembly. A list of genes and the map position in each QTL region was obtained from the NCBI database. If locus information was not available in the NCBI database, the ENSEMBL database based on Sus scrofa 11.1 assembly was used. Comparative analysis of QTL locations was conducted using the Animal QTLdb (URL: https://www.animalgenome.org/cgi-bin/QTLdb/index, accessed on 8 January 2021).

Haplotype Reconstruction and Haplotype-Based Association Analysis
Pigs in the F 2 generation were used to construct haplotypes at the loci of interest using the SNP markers described in Supplementary Table S1. The Beagle program (ver. 3.3.0.) was applied to construct the haplotypes [32]. The haplotype data of the F 2 pigs were used to perform haplotype-based (diplotype) association analysis. The effect of haplotypes on FA traits was evaluated by the following general linear model (GLM) using the MINITAB program (ver. 14, MINITAB, State College, PA, USA): where Y ijkl is the phenotypic value, µ is the overall population mean, S i is the fixed effect of sex, B j is the fixed effect of batch, D k is the fixed effect of diplotype (Supplementary  Table S1), b is a regression coefficient, CW ijkl is covariate for the carcass weight, and e ijkl is the residual term of the model. To determine the statistical significance of the GLM analyses, a threshold of p < 0.01 was used.

Results and Discussion
Basic descriptive statistics and heritability of the 18 phenotypic records of FA traits in the longissimus dorsi muscle in the F 2 pigs are described in Table 1. The most abundant FA was C18:1, followed by C18:0, C18:2, and C16:1 in the F 2 population. The average estimate of heritability for the 19 FA composition traits was 0.396, which indicates that the contribution of genetic effects to phenotypic variation in FA composition traits is substantial.

GWAS Detected a Major QTL for FA Composition in SSC12 in the LK Cross
To investigate the genetic foundation underlying the FA traits of the longissimus dorsi muscle in pigs, we used a large F 2 intercross between Landrace pigs and KNP comprising 1105 F2 offspring. Using this cohort, GWAS detected 22 genome-wide significant QTLs (p < 2.53 × 10 −7 ) for the 17 FA traits located on pig chromosome 8 (SSC8), SSC12, SSC14, and SSC16 (Table 2, Figure 1, Supplementary Figure S1). These QTLs explained 4.4 to 29.4% of the phenotypic variance in FA phenotypes ( Table 2).
In SSC8, we mapped the QTL regions for C16:0 and C16:1 ( Table 2). Previously reported QTLs for C16:0 and C16:1 were colocalized to the identified QTL in SSC8 in this study [14] [33,34]. This QTL region harbors a known positional candidate gene: the ELOVL6 gene. The ELOVL6 (elongation of very-long-chain fatty acid protein 6) gene is located at 111.75-Mb. This gene plays a critical role in assembling carbons, catalyzing 16-carbon FA to form 18-carbon FA [35]. Corominas et al. [36] reported that the level of expression of the ELOVL6 gene was associated with C16:0 and C16:1 contents and the elongation activity ratio in pig muscle.
In SSC14, QTLs for C16:1, C18:0, SFAs, and UFAs were identified ( Table 2). Formerly reported QTLs for the four FA traits were mapped to the same region of identified QTL in SSC14 in this study [14,34,37,38]. The QTLs for C18:0, SFA, and UFA contain the SCD gene. The SCD (stearoyl-CoA desaturase) gene is located at 111.46-Mb. SCD plays a pivotal role in regulating the ratio of C18:0 to C18:1. This enzyme also controls the ratio of C16:0 to C16:1 [39]. We previously reported that the SCD gene has a considerable effect on the FA profile in this F 2 cohort [40].
In SSC16, a QTL for C20:0 was detected ( Table 2). Formerly identified QTLs for C20:0 were colocalized to the QTL identified in SSC16 in this study [14,41]. The QTL region includes the ELOVL7 gene. The ELOVL7 (elongation of very-long-chain fatty acid protein 7) gene is located at 39.46 Mb. This gene encodes an elongase that is involved in the catalysis of very-long-chain fatty acids, including C20:0 [42]. Thus, the ELOVL7 gene can be regarded as a strong positional candidate gene for C20:0 in this QTL.
In SSC12, we identified genomic regions enriched in QTLs for the trait of interests ( Figure 1, Table 2). In particular, the QTLs for C18:1, C18:2, C20:4, MUFAs, PUFAs, and the P/S ratio showed much lower p-values than the other 9 QTLs. These QTLs span from 42.14 to 59.56 Mb (17.42 Mb region) in SSC12 and contain 344 annotated genes. However, the QTL region is too large to pinpoint positional candidate genes for the FA traits in SSC12.

Post-GWAS LALD Mapping Narrowed Down the QTL Confidence Interval to a 749.1 kb Critical Region in SSC12
We conducted a joint linkage and linkage disequilibrium (LALD) mapping to narrow down the confidence intervals (CIs), including positional candidate genes ( Figure 2). A 1-LOD drop method was applied to estimate the CI of the QTLs in SSC12 for 15 FA traits [28]. The longest CI was 8.02 Mb, while the shortest CI was 0.749 Mb. The 0.75 Mb region was shared across the 15 QTLs for FA traits in SSC12 (Supplementary Figure S2). Thus, we defined the CI of the QTLs for FA traits as a 749.1 kb region (12:54,812,172-55,561,243 bp). This critical region overlaps with the previously reported 718.4 kb region (12:54,842,795-55,561,243) affecting the intramuscular fat (IMF) content in longissimus dorsi in the LK cross [12]. To the best of our knowledge, this critical region in SSC12 was revealed for the first time for muscle FA traits in pigs in this study. Recently, a comprehensive GWAS for FA profile traits was reported using multiple pig populations [7]. This study also detected a significant association of SNPs on SSC12 with FA composition traits. However, the location of the SNPs was not overlapped with the critical region that we revealed in this study. This critical 749.1 kb region in SSC12 encompasses 10 annotated genes (i.e., GAS7, MYH13, MYH8, MYH4, MYH1, MYH2, MYH3, ADPRM, TMEM220, and PIRT) with 19 SNP markers in the Sscrofa 11.1 genome dataset.   As shown in Figure 2, the LALD profiles for C18:1, C18:2, C20:4, MUFAs, PUFAs, and the P/S ratio showed a cluster with an extremely high log 10 (1/p-value) compared to those of the other nine traits. In addition, the CIs of the six traits defined by the LALD analysis were narrower than those of the other 9 traits (Supplementary Figure S2). Thus, we focused on these six FA traits for further high-resolution mapping.

Post-GWAS Conditional Association Analysis of the 749.1 kb Critical Region Identified Strong Candidate Genes
Prior to conditional association analysis, we performed a stepwise regression analysis to select key SNP marker(s) among the 19 SNP markers in the 749.1 kb critical region using the GCTA-slct procedure. The selected key SNP was used as a cofactor for subsequent conditional analysis in the critical region. From the variable selection analysis based on stepwise regression, GAS7:g.18482 T>C (12:54,956,054) was selected as the single key marker across the six FA traits (Tables 3-5). In the single-marker association analysis, this marker showed a strong association with C18:1, C18:2, C20:4, MUFAs, PUFAs, and P/S ratio (Table 2, Supplementary Figure S3). Further, we encountered similar evidence of a strong association among adjacent SNP markers in the 749.1 kb critical region because of the comprehensive linkage disequilibrium (LD) structure (Supplementary Figure S3) [43]. (B) Ten NCBI protein-coding genes are located within the 749.1 kb region associated with the FA composition traits. Gene names in parentheses were annotated by our previous study [12].
To explore the possibility of dissociating association signals in the high LD region, we performed a conditional association analysis by conditioning the key SNP at each of the remaining SNP markers in the 749.1 kb critical region. Except for one SNP (intron ENSSSCG00000044652:g.8526 T>C (12:55,530,321)) showing marginal significance for C18:1 (p = 0.0004), no additional significant association signals were detected (Tables 3-5 (Tables 3-5). A single-marker association study reported that these three SNPs showed extremely high significance. However, it was not possible to statistically resolve the effect of these SNP markers, given that this collinearity originated from the LD structure. The LD correlation (r) values were >0.996 between the three SNPs and the key marker (Tables 3-5; Supplementary Figure S3); therefore, we constructed haplotypes of the four markers to evaluate the combined effect of the three genes (i.e., GAS7, MYH2, and MYH3) on the FA traits.

Haplotype Construction and Haplotype-Based Association Analysis
We constructed haplotypes of the SNP markers of the GAS7, MYH2, and MYH3 genes to evaluate the effect of haplotypes derived from the three genes on the FA traits. The haplotype and diplotype frequencies in the F 2 pigs are listed in Supplementary Table S1. The results of haplotype-based association analysis indicated that there were significant associations between diplotype and FA traits in F2 progeny (Figure 3). The effects of diplotype on C18:1, C18:2, C20:4, MUFAs, PUFAs, and the P/S ratio showed extremely low p-values compared to those of the other of 6 traits ( Figure 3). Moreover, some of the data (i.e., C16:0, C17:0, C18:1, C18:2, C20:1, C20:4, MUFAs, PUFAs, and P/S ratio) fit considerably well to an additive model of inheritance (i.e., the genotypic value of heterozygote lies in the middle of the genotypic value of the two homozygotes).

GAS7, MYH2, and MYH3 Were Revealed as Novel Strong Candidate Genes for FA Composition
The FA composition of meat lipids has various effects on meat quality and human health. For example, the proportion of MUFAs, specifically oleic acid (C18:1), was strongly and positively correlated with overall palatability, including flavor, juiciness, and tenderness in beef [44]. Among UFAs, PUFAs have been in the spotlight because of their diverse physiological functions. PUFAs are essential factors in several cellular functions and modulate the physical properties of cell membranes and the gene expression of enzymes involved in triglyceride storage and fatty acid oxidation [45]. Thus, adequate intake of PUFAs in human diets can reduce the risk of several diseases, including cardiovascular diseases [46]. Since the haplotypes of the GAS7, MYH2, and MYH3 genes were reciprocally associated with MUFAs and PUFAs, marker-assisted selection of haplotype1 (ht1) can result in the production of PUFA-enriched pork with low palatability. By contrast, marker-assisted selection of the haplotype2 (ht2) can lead to the production of pork with high MUFAs content. Thus, marker-assisted selection of the ht1/ht2 diplotype could be an immediate approach to simultaneously acquiring the two FA profile characteristics in pork.
The GAS7 gene encodes growth arrest-specific protein 7, which was initially known for its important roles in neuronal development and is implicated in the regulation of lipid metabolism [47,48]. Yang et al. [49] constructed GAS7 transgenic (Tg) mice and reported decreased adiposity and unesterified cholesterol in a Tg mouse model. They also suggested a putative involvement of the GAS7 gene in free FA metabolism on the basis of pathway and network analyses. In the case of genes encoding myosin heavy chain (MYH) isoforms, Lim et al. reported effects of SNPs in the MYH2 locus on muscle fiber characteristics and meat quality including IMF content [50]. Recently, we also reported that a functional regulatory variant of MYH3 is causatively associated with muscle fiber-type composition and IMF content in the same population used in this study [12]. Myoblasts and adipoblasts originate from the same mesoderm layer in embryos, and many studies on the multipotential capacity of muscle satellite cells to differentiate into myogenic, adipogenic, and osteogenic cells have been conducted. However, knowledge of the implications of MYH isoforms in this transdifferentiation between myoblasts and adipoblasts is still limited [51]. Therefore, further functional studies are required to reveal the potential involvement of MYH isoforms in the transdifferentiation process. Because of the strong genetic correlation between IMF content and FA profile [52], our results uncover the possibility of simultaneous manipulation of muscle FA composition and IMF content by identifying the significant haplotype effect of three genes on the FA profile.

Conclusions
In this study, we aimed to identify individual positional candidate gene(s) in the FA profile of the longissimus dorsi muscle in an intercross between Landrace pigs and KNPs. For this purpose, we conducted a GWAS in combination with post-GWAS high-resolution mapping analyses. The GWAS approaches identified significant QTLs in SSC8, SSC12, SSC14, and SSC16 at the genome-wide level. Relevant concordance was revealed between the QTLs detected in this study and previously reported QTLs for muscle FA traits, mainly in SSC8, SSC14, and SSC16. The QTL in SSC12 has not been well documented for FA traits. The post-GWAS high-resolution mapping analyses of the QTL in SSC12 identified a 749.1 kb critical region, and the GAS7, MYH2, and MYH3 genes were identified as strong candidate genes for FA traits. Furthermore, although we could not elucidate the individual contribution of the three genes to the phenotypic variation of the FA traits because of the extremely high LD, these findings provide new insight into the genetic basis of FA composition in pork and could contribute to the genetic improvement of pork quality.