Identifying Shared Risk Genes between Nonalcoholic Fatty Liver Disease and Metabolic Traits by Cross-Trait Association Analysis

Nonalcoholic fatty liver disease (NAFLD) generally co-occurs with metabolic disorders, but it is unclear which genes have a pleiotripic effect on NAFLD and metabolic traits. We performed a large-scale cross-trait association analysis to identify the overlapping genes between NAFLD and nine metabolic traits. Among all the metabolic traits, we found that obesity and type II diabetes are associated with NAFLD. Then, a multitrait association analysis among NAFLD, obesity and type II diabetes was conducted to improve the overall statistical power. We identified 792 significant variants by a cross-trait meta-analysis involving 100 pleiotripic genes. Moreover, we detected another two common genes by a genome-wide gene test. The results from the pathway enrichment analysis show that the 102 shared risk genes are enriched in cancer, diabetes, insulin secretion, and other related pathways. This study can help us understand the molecular mechanisms underlying comorbid NAFLD and metabolic disorders.


Introduction
Nonalcoholic fatty liver disease (NAFLD) refers to fat accumulation in at least 5% of hepatocytes without excessive alcohol intake. It encompasses the spectrum of disease from simple steatosis to nonalcoholic steatohepatitis (NASH) with hepatic inflammation and hepatocyte ballooning. The distinction between these diseases is important because steatosis is less likely to evolve into severe, liver-related complications, whereas NASH might progress to liver cirrhosis and hepatocellular carcinoma [1]. NAFLD is currently recognized as the most common liver disease worldwide [2]. It was estimated that NAFLD affected more than one-third of the population globally and 20 million people would eventually die of NAFLD-related liver diseases [3]. The prevalence of NAFLD is increasing with the dramatic escalation of metabolic syndrome [4]. Metabolic syndrome is defined as the presence of three or more of the following clinical symptoms: abdominal obesity, hypertension, elevated triglycerides, reduced high-density protein cholesterol (HDL) and impaired fasting glucose [5].
Clinical and epidemiological studies showed that NAFLD and metabolic abnormality often coexist in either one patient or different members of the same family [6]. It has been reported that 30-100% of NAFLD patients are obese individuals; type II diabetes is present in 10-75% of NAFLD patients; low HDL affects 64% of NAFLD patients; the prevalence of hypertriglyceridemia in NAFLD patients is 30-42%; the incidence of NAFLD in nonobese, nondiabetic patients with primary hypertension is more than twofold that in the control group [5]. These observations on co-occurrence suggest that there exists comorbidity between NAFLD and metabolism. The high rate of comorbidity may be due to the shared genetic factors and a common molecular mechanism.
NAFLD and metabolic disorders are highly heritable traits [5,7]. Previous twin studies reported substantial genetic correlations between hepatic steatosis and metabolic traits, including HDL (r = 0.451), triglyceride (r = 0.678), hypertension (r = 0.444), obesity (r = 0.534) and diabetes (r = 0.716) [8]. In addition, the liver has been pointed out to be a key determinant of metabolic abnormalities and NAFLD has been considered as a cause and a consequence of metabolic syndrome [9]. Furthermore, a recent study revealed the causal relationship between NAFLD, type II diabetes and obesity by using the Mendelian randomization analysis framework [10]. These studies also demonstrate the existence of shared genetic components between NAFLD and metabolic traits. Identifying shared risk factors among different complex traits is a hot topic at present. For example, shared genetic architectures have been identified between metabolic traits and Alzheimer's disease [11], seven psychiatric traits [12], chronic obstructive pulmonary disease and cardiovascular disease-related traits [13] and so on. In addition, our previous study integrated multitrait and multiomics approaches to identify shared risk genes for asthma, hay fever and eczema [14].
To date, several genome-wide association studies (GWASs) have been carried out on NAFLD and each of the metabolic traits. Some genes (such as PNPLA3 for NAFLD, FTO for obesity and TCF7L2 for type II diabetes) have been successfully discovered and confirmed [15][16][17][18]. In addition, murine NAFLD models and their potential human relevance were also illustrated [1]. However, it is still unclear which genes have a pleiotropic effect on NAFLD and metabolic traits. Therefore, identifying the shared risk genes of NAFLD and metabolic traits may help us uncover the common genetic architecture and may play a positive role in understanding the disease etiology.
In this study, we first estimated the genetic correlations of NAFLD with nine metabolic traits. Then, we conducted a multitrait association analysis among NAFLD and its genetically related traits to improve the overall statistical power. Moreover, we performed a cross-trait meta-analysis and gene test to identify the pleiotropic variants and shared risk genes. Finally, we undertook a transcriptome-wide association study (TWAS), a pathway enrichment analysis and protein-protein interaction (PPI) network analysis to obtain biological insight of the shared risk genes between NAFLD and its related traits.

Data Source
We downloaded the GWAS results from public databases, including NAFLD from the Scalable and Accurate Implementation of GEneralized mixed model (SAIGE) database [19]; hypertension (HTN), obesity and type II diabetes mellitus (T2D) from GWASATLAS database [20]; low-density lipoproteins (LDL), high-density lipoproteins (HDL), triglyceride (TC) and total cholesterol (TG) from The Global Lipids Genetics Consortium (GLGC) website [21]; fasting glucose (FG) and fasting insulin (FIN) from the Meta-Analyses of Glucose and Insulin-related Traits Consortium (MAGIC) website [22]. All the individuals in the SAIGE and GWASATLAS databases are from the UK Biobank (UKBB) study [23]. Detailed GWAS results for each phenotype are listed in Table 1.

Linkage Disequilibrium Score Regression Analysis
In order to quantitatively measure the genetic correlation between NAFLD and nine metabolic traits, we used linkage disequilibrium score regression (LDSC) to conduct a post-GWAS genome-wide genetic correlation analysis. Notice that all SNPs in the GWAS results were merged with HapMap3 SNPs after excluding those in the HLA region as recommended [11]. In addition, LDSC enables us to estimate the heritability of each trait, as well as the LD-score intercepts which can indicate whether the inflation in GWAS is caused by population structure or sample overlap [24].

Multitrait Association Analysis
After assessing genetic correlations, we conducted a multitrait association analysis among NAFLD and its genetically related metabolic traits to boost statistical power using MTAG software. The key idea of MTAG is that, when GWAS results from different traits are correlated, the effect estimates of each trait can be improved by appropriately integrating the information from the GWAS results of other traits [25]. Moreover, MTAG can generate trait-specific effect estimates for each SNP.

Cross-Trait Meta-Analysis
We applied RE2C cross-trait meta-analysis to identify the pleiotropic variations among NAFLD and its related metabolic traits based on the trait-specific GWAS results after MTAG analysis. RE2C is a random effect model to detect the heterogeneous effects of various GWAS results [26]. When we use RE2C software, the correlation matrix is made of the corresponding genetic correlation between each of the two traits. As a result of the GWAS results retrieved from different methods, the regression coefficients are represented as either the SNP effect sizes or odds ratios (ORs). We transformed logOR from logistic regressions for case-control studies into the equivalent regression coefficients obtained on the quantitative scale by the following formula [13]: where beta is the SNP effect size, and N case and N control represent the number of cases and controls, respectively.
To determine significant loci (p < 5 × 10 −8 ) that are independent of each other and genes within the range, we used the clumping function of PLINK software [27] with parameters -clump-p1 5 × 10 −8 -clump-p2 1 × 10 −5 -clump-r 2 0.05 -clump-kb 500clump-range NCBI.37.3.gene.loc [13], i.e., the SNPs with a p-value less than 1 × 10 −5 , LD statistic r 2 more than 0.05, and within 500 kb distance from the peak will be assigned to that peak's clump; parameter NCBI.37.3.gene.loc is the reference gene list file, which contains all the locations of gene in NCBI human genome build 37.

Genome-Wide Gene Test
In order to obtain more overlapping genes, we also conducted a genome-wide gene test based on the adjusted GWAS results (i.e., after MTAG analysis) for NAFLD and its related metabolic traits. The genome-wide gene test was performed using MAGMA (version 1.07) software [28], with gene annotation from NCBI human genome build 37 (including 19,427 gene locations) as reference. The significance threshold after Bonferroni correction was determined as 2.57 × 10 −6 (= 0.05/19,427). MAGMA analysis can identify the genes with significant association for each disease or trait.

Transcriptome-Wide Association Study, Pathway Enrichment Analysis, and Protein-Protein Interaction Network Analysis
We performed a transcriptome-wide association study (TWAS) using FUSION software [29] to detect significant genes whose cis-regulated expression is associated with NAFLD and its related metabolic traits in specific tissues. In our study, TWAS was based on 43 GTEx (version 6) tissues expression weights. The Benjamin-Hochberg method (false discovery rate <0.05) was used for gene-tissue pairs of each trait on p-values to account for multiple test correction. In order to understand the underlying biological pathways for the identified shared risk genes between NAFLD and its related metabolic traits, we used the tool Enrichr [30] to implement the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The significant criterion is the adjusted p-value less than 0.05. In addition, we applied STRING (version 10) [31] to analyze the protein-protein interaction (PPI) network.
The overall study design is shown in Figure 1.

Genetic Correlation between NAFLD and Metabolic Traits
LDSC was used to estimate the genetic correlation r g (ranging from −1 to 1) between NAFLD and nine metabolic traits. p < 0.05 or r g > 10% between two traits was considered to be significant [13]. We found a significant correlation (r g = 0.2106, p = 0.0132) between NAFLD and type II diabetes. Although the p-value between NAFLD and obesity did not reach the statistical significance level of 0.05, its genetic correlation r g was larger than 10%, so obesity was also considered to be significantly correlated with NAFLD. We did not observe a significant correlation between NAFLD and the other seven metabolic traits. Detailed results of genetic correlation are shown in Table 2.

Mutli-Trait Association Analysis between NAFLD and Metabolic Traits
Given the significant genetic correlations between NAFLD and its two related metabolic traits (obesity and type II diabetes), we improved the overall statistical power to detect novel loci by conducting MTAG multitrait association analysis. After MTAG analysis, the number of significant loci in the GWAS result of NAFLD increases from 24 to 157, and the distribution of locations is more extensive (from chromosomes 19, 22 to chromosomes 2, 3, 4, 6,7,8,9,10,16,22). With regard to obesity, the number of significant loci slightly decreases (from 67 to 65), but the locations broaden from chromosomes 16, 18 to chromosomes 3, 10, 16,18,19. Several loci (rs2518796 on chromosome 3, rs3810291 on chromosome 19, and other seven significant loci (rs697237, rs7901695, rs7903146, rs4132670, rs7074440, rs12243326, rs12255372) on chromosome 10 were newly detected. With regard to type II diabetes, the number of significant loci decreases from 385 to 284; however, the distribution of these significant loci is essentially unchanged, both being located on chromosomes 1, 2, 3, 4, 6,7,8,9,10,11,12,13,16,17,19. In addition, we also explored the independent significant loci, the number of which increased from 2 to 16 for NAFLD, from 3 to 6 for obesity, and decreased from 41 to 34 for type II diabetes. The Manhattan plots of NAFLD, obesity and type II diabetes before and after MTAG analysis are displayed in Figure 2.

Mutli-Trait Association Analysis between NAFLD and Metabolic Traits
Given the significant genetic correlations between NAFLD and its two related metabolic traits (obesity and type II diabetes), we improved the overall statistical power to detect novel loci by conducting MTAG multitrait association analysis. After MTAG analysis, the number of significant loci in the GWAS result of NAFLD increases from 24 to 157, and the distribution of locations is more extensive (from chromosomes 19, 22 to chromosomes 2, 3, 4, 6,7,8,9,10,16,22). With regard to obesity, the number of significant loci slightly decreases (from 67 to 65), but the locations broaden from chromosomes 16, 18 to chromosomes 3, 10, 16,18,19. Several loci (rs2518796 on chromosome 3, rs3810291 on chromosome 19, and other seven significant loci (rs697237, rs7901695, rs7903146, rs4132670, rs7074440, rs12243326, rs12255372) on chromosome 10 were newly detected. With regard to type II diabetes, the number of significant loci decreases from 385 to 284; however, the distribution of these significant loci is essentially unchanged, both being located on chromosomes 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 19. In addition, we also explored the independent significant loci, the number of which increased from 2 to 16 for NAFLD, from 3 to 6 for obesity, and decreased from 41 to 34 for type II diabetes. The Manhattan plots of NAFLD, obesity and type II diabetes before and after MTAG analysis are displayed in Figure 2.

Cross-Trait Meta-Analysis between NAFLD and Metabolic Traits
We identified 792 significant pleiotropic variants, in which 85 loci are independent, involving 100 pleiotropic genes in the region. The strongest association signal rs7903146 (located on chromosome 10, PRE2C = 6.95 × 10 −206 ) is mapped to TCF7L2, which is a wellknown gene susceptible to type II diabetes [32]. Moreover, it was reported that TCF7L2 is independently associated with NAFLD in Asian Indians [33], and its effect on type II diabetes risk is modulated by obesity in European populations [34]. The second strongest signal is found on FTO (rs1421085, located on chromosome 16, PRE2C = 4.47 × 10 −34 ). FTO is the first obesity-susceptibility gene identified by GWAS, and it has been termed the "fat gene" [35]. The next three strongest signals are observed on IGF2BP2 (rs7651090, located on chromosome 3, PRE2C = 9.28 × 10 −34 ), CDKAL1 (rs2206734, located on chromosome 6, PRE2C = 1.61 × 10 −32 ), and SLC30A8 (rs13266634, located on chromosome 8, PRE2C = 1.79 × 10 −31 ), respectively. They were all proved to be associated with type II diabetes [36]. In addition, the key gene PNPLA3 (rs738408, located on chromosome 22 PRE2C = 6.84 × 10 −20 ) for NAFLD was also successfully detected [37]. Partial results of the RE2C cross-trait meta-analysis are shown in Table 3 (the full results are shown in Table S1). The functional relevance of genes in Table 3 is listed in Table S2.

Cross-Trait Meta-Analysis between NAFLD and Metabolic Traits
We identified 792 significant pleiotropic variants, in which 85 loci are independent, involving 100 pleiotropic genes in the region. The strongest association signal rs7903146 (located on chromosome 10, P RE2C = 6.95 × 10 −206 ) is mapped to TCF7L2, which is a wellknown gene susceptible to type II diabetes [32]. Moreover, it was reported that TCF7L2 is independently associated with NAFLD in Asian Indians [33], and its effect on type II diabetes risk is modulated by obesity in European populations [34]. The second strongest signal is found on FTO (rs1421085, located on chromosome 16, P RE2C = 4.47 × 10 −34 ). FTO is the first obesity-susceptibility gene identified by GWAS, and it has been termed the "fat gene" [35]. The next three strongest signals are observed on IGF2BP2 (rs7651090, located on chromosome 3, P RE2C = 9.28 × 10 −34 ), CDKAL1 (rs2206734, located on chromosome 6, P RE2C = 1.61 × 10 −32 ), and SLC30A8 (rs13266634, located on chromosome 8, P RE2C = 1.79 × 10 −31 ), respectively. They were all proved to be associated with type II diabetes [36]. In addition, the key gene PNPLA3 (rs738408, located on chromosome 22 P RE2C = 6.84 × 10 −20 ) for NAFLD was also successfully detected [37]. Partial results of the RE2C cross-trait meta-analysis are shown in Table 3 (the full results are shown in Table S1). The functional relevance of genes in Table 3 is listed in Table S2.

Overlapping Genes in Genome-Wide Gene Analysis
We used MAGMA genome-wide gene analysis to identify the genes associated with NAFLD, obesity and type II diabetes. After Bonferroni correction, 18, 18, and 26 genes (P MAGMA < 2.57 × 10 −6 ) were tested to be significantly associated with NAFLD, obesity, and type II diabetes, respectively. NAFLD and type II diabetes have 16 overlapping genes, NAFLD and obesity have six overlapping genes, obesity and type II diabetes have six overlapping genes. There are six genes (IGF2BP2, SLC22A3, ZMIZ1, TCF7L2, FTO, and NPC1) significantly associated with all the three diseases, in which IGF2BP2, TCF7L2 and FTO are the top three genes identified by RE2C analysis. Moreover, SLC22A3 and NPC1 are the two overlapping genes only detected by MAGMA analysis. The results of genome-wide gene test are shown in Table 4 and the functional relevance of genes in Table 4 is also listed in Table S2.

Results from the Transcriptome-Wide Association Study, Biological Pathway Analysis and Protein-Protein Interaction Network Analysis
We also performed TWAS for the adjusted GWAS results of NAFLD, obesity, and type II diabetes by FUSION software. We found that gene ATP13A1 is significantly associated with a tissue (Cells_Transformed_fibroblasts) for NAFLD, and rs16996148 is the expressed quantitative trait loci (eQTL). In addition, 22 gene-tissue pairs (involving nine genes) were found to be associated with obesity and gene KAT8 is significantly associated with 14 tissues. Moreover, there are 41 gene-tissue pairs (involving 21 genes) associated with type II diabetes and WFS1 is significantly associated with eight tissues. However, none of the genes is observed to be associated with all the three diseases. The TWAS results of NAFLD, obesity and type II diabetes are shown in Table S3.
We combined the pleiotropic genes from cross-trait meta-analysis and genome-wide gene test and 102 shared risk genes were obtained. KEGG pathway enrichment analysis showed that shared risk genes were significantly enriched in 20 biological pathways (Table S4), including cancer, diabetes, insulin secretion and other related pathways. The most strongly enriched one is the cancer pathway, including nine enriched genes (TCF7L2, CDKN2B, CCND2, BCL2L11, BCL2, LPAR2, CXCR4, PPARG and ADCY5).
In order to understand the interaction between shared risk genes, PPI network analysis was carried out using the STRING tool. There were a total of 292 pairs of interactions. All the interacting genes had combined scores of no less than 0.4, in which five pairs of genes had scores larger than 0.95. The 13 hub genes (with degrees larger than 10) that interacted extensively with other genes are TCF7L2, CDKAL1, SLC30A8, MTNR1B, CDC123, FTO, IGF2BP2, KCNQ1, PPARG, ADCY5, HHEX, KCNJ11 and THADA, which includes the top five genes of significance identified by cross-trait meta-analysis. The PPI network for shared risk genes is shown in Figure 3.

Discussion
To our knowledge, this study is the first large-scale cross-trait association analysis to investigate the genetic overlap between NAFLD and metabolic traits. Selecting GWASs with the largest samples can facilitate the detection of novel associations. As expected

Discussion
To our knowledge, this study is the first large-scale cross-trait association analysis to investigate the genetic overlap between NAFLD and metabolic traits. Selecting GWASs with the largest samples can facilitate the detection of novel associations. As expected from epidemiologic and genetic studies [3,10,38], obesity and type II diabetes were found to have significant genetic correlation with NAFLD. However, we did not observe significant genetic correlation between NAFLD and hypertension, which suggests that the association between NAFLD and new-onset hypertension might be due to the main linking mechanism, such as insulin resistance [39].
MTAG multitrait association analysis is a very powerful procedure in our study. It incorporates correlations among multiple various GWAS results to enhance detections. In some traits (not all traits), MTAG results are more significant than the original GWASs. Notice that 133 significant loci were newly discovered on chromosomes 2, 3,4,6,7,8,9,10,16 for NAFLD. Furthermore, MTAG makes significant loci for each trait more widely distributed. The distributions of significant loci for NAFLD and obesity were obviously more extensive, and it remained unchanged for type II diabetes (both located on chromosomes 1,2,3,4,6,7,8,9,10,11,12,13,16,17,19) in spite of the decrease in significant loci. Therefore, by applying multitrait association analysis, we have the opportunity to detect novel associations and then identify more shared risk genes.
RE2C cross-trait meta-analysis also has the advantage to detect shared loci, due to the fact that many traits or diseases exist with a polygenic background and most of the variants have too weak effects to be detected. RE2C identified 792 significant pleiotropic loci, involving 100 pleiotropic genes. The top five genes of significance (TCF7L2, FTO, IGF2BP2, CDKAL1 and SLC30A8) are all hub genes according to PPI network analysis. Many pleiotropic genes have been reported to be associated with at least one of the three diseases. For example, PNPLA3, TM6SF2, and GCKR were found to have robust and reproducible associations with NAFLD by GWASs [37]; FTO, LCT, and PPARG were reported to be associated with obesity [18]; TCF7L2, SLC30A8, CDKAL1, CDKN2B, IGF2BP2, HHEX, FTO, PPARG and KCNJ11 were confirmed to be associated with type 2 diabetes [36]. These results indicate the reliability of the pleiotropic genes we identified.
In the genome-wide gene test, we detected two additional novel common genes (SLC22A3 and NCP1). Gene SLC22A3 has been regarded as a therapy target for type 2 diabetes in a previous study, because it may confer metformin clinical responses in patients with type 2 diabetes [40]. This gene has neither been reported in the GWASs of NAFLD or obesity. However, higher SLC22A3 methylation was reported to be related to higher body mass index [41], which suggests that SLC22A3 is associated with obesity. Moreover, genetic studies have implicated the NPC1 gene in susceptibility to obesity, and it may play a role in adipocyte processes underlying obesity [42]. NPC1 was also reported to affect liver fat metabolism by regulating the transport of cholesterol and lipids from lysosome to different cellular compartments, but there is still little evidence for a role of altered NPC1 function in NAFLD pathogenesis [43]. These suggest that NCP1 may play a role in some undetermined mechanisms of NAFLD. The other four genes (IGF2BP2, ZMIZ1, TCF7L2 and FTO) can also be identified by cross-trait meta-analysis and IGF2BP2, TCF7L2 and FTO are the top three genes.
Pathway enrichment analysis shows that 102 shared risk genes are enriched in 20 pathways, in which the cancer pathway has the strongest enrichment signal. This finding is essentially consistent with the previous statement that NAFLD, obesity, and type II diabetes are replacing viral and alcoholic liver disease as the main pathogenic factors of hepatic cellular cancer [44]. Moreover, there are other enriched pathways, maturity onset diabetes of the young, colorectal cancer, insulin secretion, and so on. These conclusions broaden our understanding of NAFLD, obesity and type II diabetes, and provide guidance for disease prevention.
The limitations of our study include the following: First, we used the public datasets with the largest cohorts at the present, and the results cannot be replicated because we did not find other independent GWASs to confirm our findings. Second, genome-wide genetic correlation between NAFLD and each of metabolic traits is relatively weak and local genetic correlation analysis is needed to evaluate the functionally categorized genetic correlation at every LD-independent region. Third, we identified 102 shared risk genes, some of which might be potential causal genes. Further functional experiments are needed to study the causal variants or genes.

Conclusions
Our study showed strong genetic correlations between NAFLD and two of the metabolic traits (obesity and type II diabetes). After a multitrait association study of the GWAS results for NAFLD, obesity and type II diabetes, 102 shared risk genes were identified by cross-trait meta-analysis and gene test. A pathway enrichment analysis revealed that the shared risk genes are enriched in cancer, diabetes, insulin secretion, and other related pathways. These shared risk genes and pathways may serve as common drug targets in NAFLD and metabolic disorders and provide help in the treatment of both diseases in clinical settings.

Data Availability Statement:
The datasets analyzed in the current study are available from public websites. NAFLD: summary results for all variants are available at https://www.leelabsg.org/ resources. HTN and T2D: summary results are available from the GWASATLAS database (https: //atlas.ctglab.nl/). LDL, HDL, TC and TG: summary results are publicly available at the GLGC website (http://lipidgenetics.org/). FG and FIN: summary results are publicly available at the MAGIC website (https://www.magicinvestigators.org/).

Conflicts of Interest:
The authors declare no conflict of interest.