Excess Heritability Contribution of Alcohol Consumption Variants in the “Missing Heritability” of Type 2 Diabetes Mellitus

We aim to compare the relative heritability contributed by variants of behavior-related environmental phenotypes and elucidate the role of these factors in the conundrum of “missing heritability” of type 2 diabetes. Methods: We used Linkage-Disequilibrium Adjusted Kinships (LDAK) and LDAK-Thin models to calculate the relative heritability of each variant and compare the relative heritability for each phenotype. Biological analysis was carried out for the phenotype whose variants made a significant contribution. Potential hub genes were prioritized based on topological parameters of the protein-protein interaction network. We included 16 behavior-related phenotypes and 2607 valid variants. In the LDAK model, we found the variants of alcohol consumption and caffeine intake were identified as contributing higher relative heritability than that of the random variants. Compared with the relative expected heritability contributed by the variants associated with type 2 diabetes, the relative expected heritability contributed by the variants associated with these two phenotypes was higher. In the LDAK-Thin model, the relative heritability of variants of 11 phenotypes was statistically higher than random variants. Biological function analysis showed the same distributions among type 2 diabetes and alcohol consumption. We eventually screened out 31 hub genes interacting intensively, four of which were validated and showed the upregulated expression pattern in blood samples seen in type 2 diabetes cases. Conclusion: We found that alcohol consumption contributed higher relative heritability. Hub genes may influence the onset of type 2 diabetes by a mediating effect or a pleiotropic effect. Our results provide new insight to reveal the role of behavior-related factors in the conundrum of “missing heritability” of type 2 diabetes.


Introduction
Type 2 diabetes mellitus (type 2 diabetes) is a complex disease induced by a combination of environmental and genetic factors. Previous studies have shown that overweight, smoking, sedentary lifestyle and education are common risk factors of type 2 diabetes [1][2][3][4]. Meanwhile, genome-wide association studies (GWAS) have identified more than 500 susceptibility loci that demonstrated a robust association with type 2 diabetes [5]. In contrast to the tremendous stride in GWAS research, the conundrum of "missing heritability" in type 2 diabetes has progressed slowly and arduously. Genome-wide chip heritability analysis explained 19% of type 2 diabetes risk on a liability scale, which is much smaller when compared to heritability estimates expected from the observed trait concordance within families [6,7]. Although there are several hypotheses regarding rare variants, structural variants and gene-environment interactions for the missing heritability [8][9][10], the limited incremental value in heritability estimated by GWAS so far suggests that the genetic prediction of complex diseases on a population basis will be challenging. There is still a long way to go to fully understand the etiology of type 2 diabetes before getting it under control.
An important controversial assumption about heritability is the idea that the genetic influence on trait development can be separated from the environmental context [10]. In addition to the direct effect of genetics, part of the effect of genetic factors is mediated by environmental factors. Baud et al. found social genetic effects (SGE, effects of an individual's genotypes on others' phenotype, also called indirect genetic effects) can explain up to 29% of phenotypic variance, and for several traits, their contribution exceeded that of direct genetic effects (effects of an individual's genotypes on its own phenotype) [11]. Undoubtedly, ignoring SGE can severely bias estimates of direct genetic effects (heritability) [11]. Xia et al. used a linear mixing model to estimate the indirect heritability between partners, and found evidence of indirect genetic effects between partners in about 50% of phenotypes [12]. The genetic nurturing effect proposed by Kong et al. is a manifestation of the social genetic effect within the family. Using results from a meta-analysis of educational attainment, they found the polygenic score computed for the non-transmitted alleles of 21,637 probands with at least one parent genotyped had an estimated effect on the educational attainment of the proband, that is 29.9% (p = 1.6 × 10 −14 ) of that of the transmitted polygenic score [13]. The evidence above suggests that genetic factors can affect individual phenotypes through their contributions to the environment.
Another controversy about missing heritability is that there is currently much debate regarding the best model for how heritability varies across the genome. It has been shown that the LDAK model leads to estimates of common single-nucleotide polymorphism (SNP) heritability, on average, 43% (s.d. 3%) higher than those obtained from the widely used software Genome-wide Complex Trait Analysis (GCTA) and 25% (s.d. 2%) higher than those from the recently proposed extension LD and minor allele frequency (MAF) stratified multi-component GCTA (GCTA-LDMS) across 19 traits [14]. In terms of the rationality of the hypothesis, it is more realistic to employ the LDAK model, where expected heritability varies with both linkage disequilibrium (LD) and MAF [15,16]. In addition, considering the computational burden, the simplified LDAK-Thin model is also an alternative, which is a one-parameter model, and can be incorporated in any existing method simply by changing which predictors are included in the regression and how these are standardized [15].
In this study, we compared the heritability contribution of environmental phenotypes, especially behavior-related environmental phenotypes that have a genetic basis, with that of type 2 diabetes by using heritability estimation models to estimate the relative expected heritability tagged by each variant. The susceptibility variants of candidate environmental phenotypes were further characterized by functional annotation and protein-protein interaction (PPI) analysis to identify the potential key genes of type 2 diabetes. Our work is a new attempt to provide information and evidence to elucidate the genetic mechanisms underlying the missing heritability of type 2 diabetes and promote the development of comprehensive prevention for type 2 diabetes.

Overview of Behavior-Related Phenotypes
Based on the results of the literature review and the results of Yuan et al., we eventually included 16 behavior-related phenotypes, including educational attainment, lifetime smoking index, alcohol consumption, coffee intake, caffeine intake, breakfast skipping, morningness, insomnia, sleep duration, short sleep, daytime napping, restless leg syndrome, moderate to vigorous physical activity, strenuous sports, vigorous physical activity and accelerometer. The union of variants for type 2 diabetes and the phenotype that both appear simultaneously in the tagging file is defined as the valid variant set for the consequent analysis. A total of 2607 valid variants were included in the analysis. The mean minimum allele frequency (MAF) was 0.28 (s.d. 0.14), and 149 variants were rare variants (MAF < 0.05). The results of traditional epidemiological studies on behavior-related phenotypes of type 2 diabetes and the information of susceptibility variants for each phenotype included in the analysis are shown in Tables 1 and 2, and Figure 1.

Estimation of Relative Expected Heritability by LDAK
The relative expected heritability estimated of all 2607 variants estimated by SumHer under the LDAK model assumption was 19.5, which was not higher than that of simulated sampling. All variants of behavior-related phenotypes accounted for 83.39% of the total phenotypic heritability. Educational attainment contributed the most, at 76.43% of the total phenotypic heritability. The heritability contributed by the susceptibility variants was significantly correlated with the number of variants (correlation coefficient = 0.90, p < 0.001), as seen in Table 3. The results of simulation sampling showed that the relative heritability of the susceptibility variants of caffeine intake and alcohol consumption was significantly higher than that of random variants. In caffeine intake, the average heritability of the total variants was 0.01 and the average heritability of phenotypic variants was 0.04, while the attribution heritability of phenotypic variants was 2.43% and the relative heritability of phenotypic variants was 4.51 times. The corresponding parameters for alcohol consumption were 0.01, 0.02, 37.45% and 2.24 times, respectively. The relative heritability of phenotypic variants of skipping breakfast, coffee consumption and strenuous sports were also more than 2 times that of type 2 diabetes variants, while it was not statistically significant compared with simulation sampling.

Estimation of Relative Expected Heritability by LDAK
The relative expected heritability estimated of all 2607 variants estimated by SumHer under the LDAK model assumption was 19.5, which was not higher than that of simulated sampling. All variants of behavior-related phenotypes accounted for 83.39% of the total phenotypic heritability. Educational attainment contributed the most, at 76.43% of the total phenotypic heritability. The heritability contributed by the susceptibility variants was significantly correlated with the number of variants (correlation coefficient = 0.90, p < 0.001), as seen in Table 3. The results of simulation sampling showed that the relative heritability of the susceptibility variants of caffeine intake and alcohol consumption was significantly higher than that of random variants. In caffeine intake, the average heritability of the total variants was 0.01 and the average heritability of phenotypic variants was 0.04, while the attribution heritability of phenotypic variants was 2.43% and the relative heritability of phenotypic variants was 4.51 times. The corresponding parameters for alcohol consumption were 0.01, 0.02, 37.45% and 2.24 times, respectively. The relative heritability of phenotypic variants of skipping breakfast, coffee consumption and strenuous sports were also more than 2 times that of type 2 diabetes variants, while it was not statistically significant compared with simulation sampling.

Estimation of Relative Expected Heritability by LDAK-Thin
The relative expected heritability estimated of all 2607 variants estimated by SumHer under the LDAK-Thin model assumption was 671.3, which was significantly higher than that of simulated sampling. All variants of behavior-related phenotypes accounted for 86.88% of total phenotypic heritability. Educational attainment contributed the most, at 79.48% of the total phenotypic heritability. The heritability contributed by the susceptibility variants was significantly correlated with the number of variants (correlation coefficient = 0.91, p < 0.001), as seen in Table 4. Compared to the simulation sampling, the relative heritability of variants of 11 phenotypes, including insomnia, educational attainment, lifetime smoking index, alcohol consumption, coffee consumption, daytime napping, sleep duration, short sleep, morningness, moderate to vigorous physical activity and vigorous physical activity, was statistically higher than the random variants. Among the phenotypes with significant differences, the relative heritability of phenotypic variants of all phenotypes was higher than that of type 2 diabetes, except alcohol consumption. The relative heritability of phenotypic variants for short sleep was the highest, which was 1.36 times that of type 2 diabetes, which accounted for 8.87% of the total phenotypic heritability.

Biological Function Analysis
As the variants of alcohol consumption were identified to contribute higher relative heritability than that of the random variants in two heritability models, we then performed the functional annotation, enrichment analysis and protein interaction network analysis for the targeted phenotypes (Tables S1 and S2).
For 98 susceptibility variants associated with alcohol consumption, 7 of them (7.14%) were missense mutations, 1 (1.02%) was a synonymous mutation (rs17029090), 14 (14.29%) were in the untranslated region (UTR) and 66 (67.35%) were mutations in the intronic region. The regulatory element functional annotation results revealed six transcription factor binding sites and two CpG sites. Fourteen variants had Combined Annotation-Dependent Depletion (CADD) scores greater than 12.37, suggesting that they might be deleterious mutations (Table S3). The Chi-Squared test showed no significant difference between the distributions for the functional category and RegulomeDB ranking among type 2 diabetes and alcohol consumption (Tables S4 and S5). KEGG pathway enrichment analysis was performed on 55 genes annotated by susceptibility variants of alcohol consumption (Table S6). Under the false discovery rate at the 0.05 level, the study found that the genes were significantly enriched in glycolysis/gluconeogenesis (hsa00010), tyrosine metabolism (hsa00350), fatty acid degradation (hsa00071), retinol metabolism (hsa00830), metabolism of xenobiotics by cytochrome P450 (hsa00980), drug metabolism-cytochrome P450 (hsa00982), chemical carcinogenesis (hsa05204) and propanoate metabolism (hsa00640).

Screening of Hub Genes
Based on the closeness, edge percolated component (EPC), and maximum neighborhood component (MNC) stress of the protein interaction network formed by genes annotated from variants associated with alcohol consumption and type 2 diabetes, the genes in the protein interaction network were sorted (Table S7). We eventually screened out 31 hub genes interacting intensively (p < 0.001), as seen in Figure 2, of which 2 genes (GCKR and TCF4) were identified simultaneously by the susceptibility variants of alcohol consumption and type 2 diabetes (Table S8). GCKR is highly expressed in the liver. CAMD2 and RPTOR were identified by the susceptibility variants of alcohol consumption only, of which CAMD2 is highly expressed in brain-related tissues ( Figure S1).

Significant Upregulation of RPTOR
Among the hub genes screened, four genes (NEUROG3, TCF7L2, MAP2K5 and RPTOR) were validated as showing the upregulated expression pattern in blood samples in type 2 diabetes cases (Figure 3). GCKR is highly expressed in the liver. CAMD2 and RPTOR were identified by the susceptibility variants of alcohol consumption only, of which CAMD2 is highly expressed in brain-related tissues ( Figure S1).

Significant Upregulation of RPTOR
Among the hub genes screened, four genes (NEUROG3, TCF7L2, MAP2K5 and RPTOR) were validated as showing the upregulated expression pattern in blood samples in type 2 diabetes cases (Figure 3). susceptibility variants of alcohol consumption only, of which CAMD2 is highly expressed in brain-related tissues ( Figure S1).

Significant Upregulation of RPTOR
Among the hub genes screened, four genes (NEUROG3, TCF7L2, MAP2K5 and RPTOR) were validated as showing the upregulated expression pattern in blood samples in type 2 diabetes cases (Figure 3).

Discussion
This study provides new insight into the association between type 2 diabetes and alcohol consumption. In this study, we found the variants of alcohol consumption were identified as contributing higher relative heritability than that of the random variants in two heritability models. In the LDAK model, the relative expected heritability contributed by the variants associated with these two phenotypes was twice as much as the relative expected heritability contributed by the variants associated with type 2 diabetes, while in the LDAK-Thin model, the relative expected heritability contributed by the variants associated with these two phenotypes was less than the relative expected heritability contributed by the variants associated with type 2 diabetes. Such inconsistencies in the relative expected heritability of each variant may be due to differences in the weights assigned to the variants in model assumptions. Boyle et al. believed that the heritability of a typical complex phenotype is driven by a large number of variations in the regulatory element region [17]. Liu et al. found that the distribution of heritability in each variant showed tissue specificity, with genes with related functions (e.g., neuronal function in schizophrenia and immune function in Crohn's disease) contributing slightly more to heritability than random genes, while genes not expressed in related cell types did not contribute to heritability [18]. Yet, the specific assumptions of which model is more reasonable still need to be further explored at the level of biological mechanisms. Biological function analysis showed the same distributions for the functional category and RegulomeDB ranking among type 2 diabetes and alcohol consumption. Based on the topological parameters of the protein interaction network, we eventually prioritize an intensively interactive hub 31 genes, of which two hub genes (CAMD2 and RPTOR) were annotated by the variants of alcohol consumption only. Our study provided a comprehensive approach to delineate

Discussion
This study provides new insight into the association between type 2 diabetes and alcohol consumption. In this study, we found the variants of alcohol consumption were identified as contributing higher relative heritability than that of the random variants in two heritability models. In the LDAK model, the relative expected heritability contributed by the variants associated with these two phenotypes was twice as much as the relative expected heritability contributed by the variants associated with type 2 diabetes, while in the LDAK-Thin model, the relative expected heritability contributed by the variants associated with these two phenotypes was less than the relative expected heritability contributed by the variants associated with type 2 diabetes. Such inconsistencies in the relative expected heritability of each variant may be due to differences in the weights assigned to the variants in model assumptions. Boyle et al. believed that the heritability of a typical complex phenotype is driven by a large number of variations in the regulatory element region [17]. Liu et al. found that the distribution of heritability in each variant showed tissue specificity, with genes with related functions (e.g., neuronal function in schizophrenia and immune function in Crohn's disease) contributing slightly more to heritability than random genes, while genes not expressed in related cell types did not contribute to heritability [18]. Yet, the specific assumptions of which model is more reasonable still need to be further explored at the level of biological mechanisms. Biological function analysis showed the same distributions for the functional category and RegulomeDB ranking among type 2 diabetes and alcohol consumption. Based on the topological parameters of the protein interaction network, we eventually prioritize an intensively interactive hub 31 genes, of which two hub genes (CAMD2 and RPTOR) were annotated by the variants of alcohol consumption only. Our study provided a comprehensive approach to delineate the potential causal genes and biological processes involved in type 2 diabetes pathogenesis and proposed new insight into revealing the role of behavior-related environmental factors in the conundrum of "missing heritability" of type 2 diabetes. Systematic reviews have found a U-shaped association between alcohol consumption and type 2 diabetes [19,20]. Moderate alcohol consumption also has a protective effect on blood glucose management. Initiating moderate wine intake, especially red wine, among well-controlled diabetics as part of a healthy diet is apparently safe and modestly decreases cardiometabolic risk. In particular, only alcohol dehydrogenase allele [ADH1B*1] carriers significantly benefited from the effect of both wines on glycemic control compared with persons homozygous for ADH1B*2 [21]. We found that the ADH1B gene is a missense mutation annotated by the variant rs1229984 associated with alcohol consumption, which implied that it may be a key gene in the biological mechanism of alcohol consumption and type 2 diabetes. However, this gene was not tagged as a hub gene in our study, possibly because the number of genes annotated by variants of type 2 diabetes exceeded that of alcohol consumption, thus it may be diluted by type 2 diabetes-related genes.
Among the hub genes identified, we particularly highlighted those annotated by alcohol consumption variants, because these genes may influence the onset of type 2 diabetes by a mediating effect or a pleiotropic effect, which is of significance for the comprehensive prevention of type 2 diabetes. GCKR, a hub gene identified simultaneously by the susceptibility variants of alcohol consumption and type 2 diabetes, has densely interacted with type 2 diabetes-related genes such as FTO and SLC2A2. GCKR is the susceptibility gene candidate of maturity-onset diabetes of the young (MODY), whose protein product binds non-covalently to form an inactive complex with the enzyme to regulate glucokinase in liver and pancreatic islet cells. Previous studies have found that polymorphisms in GCKR (rs780094) are associated with non-alcoholic fatty liver disease in multiple populations [22][23][24]. Evidence of an association between this variant and type 2 diabetes or metabolic risk has also been detected [25,26]. An exome-chip association analysis for circulating FGF21 levels in Chinese individuals found that the common missense variant of GCKR, rs1260326 (p.Pro446Leu), may influence FGF21 expression via its ability to increase glucokinase (GCK) activity [27]. This can lead to enhanced FGF21 expression via elevated fatty acid synthesis, which is recognized as an important metabolic regulator of glucose homeostasis [27,28]. CAMD2 and RPTOR were specifically alcohol consumption annotating genes. CADM2 variants influence a wide range of both psychological and metabolic traits, suggesting common biological mechanisms across phenotypes via the regulation of CADM2 expression levels in adipose tissue [29]. RPTOR encodes a component of a signaling pathway that regulates cell growth in response to nutrient and insulin levels. Its encoded protein forms a stoichiometric complex with the mTOR kinase, of which the dysregulation of signaling is implicated in pathologies that include diabetes, cancer and neurodegeneration [30]. Regarding the indirect effect of genetic factors, our study calculated the heritability contribution of each phenotype and explored the biological function of the potential mechanism. Such a new method identified genes related to the onset of type 2 diabetes, and the function of these pleiotropic genes needs to be verified in subsequent analyses using primary individual-level data or experimental evidence.
There are some limitations in this study. Firstly, due to the limitation of computational resources, only two simple heritability models were considered, and the models weighted by functional annotation were ignored. Since the estimated heritability in this study is the relative expected heritability rather than the absolute heritability, the results between models were not comparable to a certain extent. Although we applied the relative heritability of phenotypic variants, the results of some phenotypes were not consistent. The hypothesis relating to which model is more reasonable still needs to be further explored.
In particular, whether this phenomenon exists in more complex heritability models also needs to be followed up. In addition, the extrapolation of the conclusions in non-European ancestry needs to be further verified as there are systematic differences not only in gene frequency among different populations, but also in their behavior and lifestyle, such as drinking culture. Further studies on a larger scale are needed to verify the reliability of the conclusions in other populations.
Previous studies identified hub genes of type 2 diabetes based on the direct genetic effect, while recent studies found that the majority of phenotypic variance is driven by genes that are not directly related to the phenotypes [18]. Therefore, indirect effects of genetic factors, especially those mediated by modifiable phenotypes such as behaviorrelated phenotypes, should be considered in etiological studies and intervention strategies for chronic diseases such as type 2 diabetes.

Identification for Candidate Environmental Phenotypes Associated with Type 2 Diabetes
Behavior-related environmental phenotypes found to be potentially causally associated with type 2 diabetes were identified as candidate phenotypes based on previous traditional epidemiological literature reports and Mendelian randomization studies. The literature was searched in the PubMed database, and the search strategies were as follows:

The Data Source
Genetic variants information of type 2 diabetes was acquired from Mahajan et al. 's work [32]. In this study, GWAS results from 32 studies for 898,130 individuals (74,124 T2D cases and 824,006 controls) of European ancestry were aggregated. Imputation was implemented using the Haplotype Reference Consortium reference panel. Association summary statistics from sex-combined analyses for each variant across all studies were aggregated using fixed-effect meta-analyses with an inverse-variance weighting of log-ORs and corrected for residual inflation by means of genomic control. In total, 403 independent association signals were detected by conditional analyses at each of the genome-wide-significant risk loci for type 2 diabetes (except at the major histocompatibility complex (MHC) region). Summarylevel data are available at the DIAGRAM consortium (http://diagram-consortium.org/, accessed on 13 November 2020) and Accelerating Medicines Partnership type 2 diabetes (http://www.type2diabetesgenetics.org/, accessed on 13 November 2020). The information of susceptibility variants of candidate phenotypes is shown in Table 1. Detailed definitions of each phenotype are shown in Supplementary Table.

LDAK Model
The LDAK model [14] is an improved model to overcome the equity-weighted defects for GCTA, which weighted the variants based on the relationships between the expected heritability of an SNP and minor allele frequency (MAF), levels of linkage disequilibrium (LD) with other SNPs and genotype certainty. When estimating heritability, the LDAK Model assumes: where E[h 2 j ] is the expected heritability contribution of SNP j and f j is its (observed) MAF. The parameter α determines the assumed relationship between heritability and MAF. In human genetics, it is commonly assumed that heritability does not depend on MAF, which is achieved by setting α = -1; however, we consider alternative relationships. The SNP weights 1 , . . . . . . , m are computed based on local levels of LD; j tends to be higher for SNPs in regions of low LD, and thus the LDAK Model assumes that these SNPs contribute more than those in high-LD regions. Finally, r j ∈ [0,1] is an information score measuring genotype certainty; the LDAK Model expects that higher-quality SNPs contribute more than lower-quality ones.

LDAK-Thin Model
The LDAK-Thin model [15] is a simplification of the LDAK model. The model assumes j is either 0 or 1, that is, not all variants contribute to the heritability based on the LDAK model.

Model Implementation
We applied SumHer (http://dougspeed.com/sumher/, accessed on 13 January 2021) [33] to estimate each variant's expected heritability contribution. The reference panel used to calculate the tagging file was derived from the genotypes of 404 non-Finnish Europeans provided by the 1000 Genome Project. Considering the small sample size, only autosomal variants with MAF ≥ 0.01 were considered. Data preprocessing was completed with PLINK1.9 (https://www.cog-genomics.org/plink/1.9/, accessed on 13 January 2021) [34]. SumHer analysies are completed using the default parameters, and a detailed code can be found in http://dougspeed.com/reference-panel/, accessed on 13 January 2021.

Estimation and Comparison of Expected Heritability
To estimate and compare the relative expected heritability, we define three variants set in the tagging file: G 1 was generated as the set of significant susceptibility variants for type 2 diabetes; G 2 was generated as the union of type 2 diabetes and the set of each behaviorrelated phenotypic susceptibility variants. Simulation sampling is conducted because all estimations calculated from tagging file were point estimated without a confidence interval. We hoped to build a null distribution of the heritability of random variants. This allowed us to distinguish the contribution of phenotypic variants from the null distribution of random variants at the significance level of α = 0.05. Therefore, the random variant set G 3 was generated. G 3 was defined as the union of type 2 diabetes and the random susceptibility variants. The set size of G 3 was equal to that of G 2 , which could control spurious inflation caused by increasing the number of variants. The calculation procedure for G 3 is the exactly same as that for G 2 . The sum of expected relative heritability contributed to by variants in G 1 and G 2 was calculated, respectively. We simulated random sampling progress to generate 100 G 3 sets to the significance of G 2 at the level of α = 0.05. For each phenotype, we also calculated indexes as follows: Average heritability of total variants Average heritability of phenotypic variants Attribution heritability of phenotypic variants Relative heritability of phenotypic variants h 2 G i and n G i (i = 1, 2, 3) were the expected relative heritability and set size of G 1 , G 2 and G 3 .  [35]. SNPNexus is a web-based annotation tool developed by Claude Chelala et.al. The latest version was updated in December 2019. CADD scores greater than 12.37 were considered high probability of a harmful mutation [36]. Potential regulatory functions were annotated by RegulumeDB [37]. We also explored the expression of hub genes in the dataset Genotype-Tissue Expression (GTEx) [38] using FUMA [39].

KEGG Pathway Enrichment Analysis
To clarify the biological mechanism behind the potential pathogenic genes of type 2 diabetes behavior-related phenotypes, we conducted pathway enrichment analysis on the susceptibility variants of type 2 diabetes in Kyoto Encyclopedia of Genes and Genomes (KEGG) dataset [40] behavior-related phenotypes annotated by GRCH37/HG19. An over-represented analysis was used to test whether potential pathogenic genes of behaviorrelated phenotypes of type 2 diabetes were significantly enriched in the above pathways. The data targeted by over-representative analysis is a group of genes of interest. The statistical principle is the hypergeometric distribution test, and the p-value is calculated by Fisher's exact probability method [41]. The p-value in the target pathway (KI) is calculated as follows: Among them, N is the total number of genes studied, N is the total number of potential pathogenic genes for behavior-related phenotypes of type 2 diabetes, M is the total number of genes in pathway K i and M is the total number of potential pathogenic genes for behavior-related phenotypes of type 2 diabetes in pathway K i . Subsequently, the Benjamini and Hochberg method was used to correct the multiple tests, and the significance level of pathway analysis was defined as the false discovery rate (FDR) < 0.05.

Protein Interaction Network Analysis
Based on the "guilt-by-association" principle, protein-protein interaction (PPI) analysis identifies a set of genes whose downstream products (proteins) are associated with each other. These identified genes combine to influence disease. In this study, protein interaction network analysis was completed by String (https://string-db.org/, accessed on 8 February 2021) [42]. String is a database of known and predicted protein-protein interactions designed to collect, score and integrate all publicly available sources of protein-protein interaction information, and to supplement the information by calculating the predictions. The visualization of the network was completed by Cytoscape 3.7.0 [43].

Screening of Hub Genes
The acquisition of hub proteins and subnetworks in the complex differential protein interaction network is particularly important for the search of the mechanism of life activities. Therefore, in this study, the Cytohubba module [44] was used to sequence the genes in the network and screen the hub genes. Cytohubba can predict and explore key nodes and subnetworks in a given network through several topological algorithms. We used four global topology analysis methods, including the Edge Percolated Component (EPC), Maximum Neighborhood Component (MNC) and centralities based on shortest paths including closeness and stress to prioritize genes in the network. Hub genes were defined as the shared top 25% of genes sorted by each method.

Expression Analysis of Hub Gene in Blood Samples
To evaluate whether the hub genes identified are differentially expressed, we used publicly available expression dataset GSE184050 from the Gene Expression Omnibus (https: //www.ncbi.nlm.nih.gov/geo/, accessed on 29 September 2021) database. GSE184050 compared changes in gene expression using two longitudinally collected blood samples from subjects who transitioned to type 2 diabetes between the time points against those who did not with a novel analytical network approach. A total of 116 individual samples (50 from type 2 diabetes cases and 66 from healthy controls) were submitted to the analysis. RNA was extracted, amplified, reverse transcribed, labelled and sequenced with Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA, USA).

Conclusions
We found that alcohol consumption contributed higher relative heritability and eventually screened out 31 hub genes candidate of the development of type 2 diabetes. Hub genes may influence the onset of type 2 diabetes by a mediating effect or a pleiotropic effect. Our results provided new insight into revealing the role of behavior-related factors in the conundrum of the "missing heritability" of type 2 diabetes. Institutional Review Board Statement: The present study did not follow a prespecified analysis plan or protocol. Ethics approval was not required for this study because the data are public available, deidentified, summary-level data. The patients/participants provided their written informed consent to participate in the original studies.
Data Availability Statement: Our datasets analyzed during the current study were derived from the following public domain resources: Summary statistics of the GWAS is available from DIAGRAM consortium (http://diagram-consortium.org/, accessed on 13 November 2020). We applied SumHer (http://dougspeed.com/sumher/, accessed on 13 January 2021) to estimate each variant expected heritability contribution. The reference panel used to calculate the tagging file was derived from the genotypes of 404 non-Finnish Europeans provided by the 1000 Genome Project. Data preprocessing was completed with PLINK1.9 (https://www.cog-genomics.org/plink/1.9/, accessed on 13 January 2021).