An Integrative Phenotype-Genotype Approach Using Phenotypic Characteristics from the UAE National Diabetes Study Identifies HSD17B12 as a Candidate Gene for Obesity and Type 2 Diabetes.

The United Arab Emirates National Diabetes and Lifestyle Study (UAEDIAB) has identified obesity, hypertension, obstructive sleep apnea, and dyslipidemia as common phenotypic characteristics correlated with diabetes mellitus status. As these phenotypes are usually linked with genetic variants, we hypothesized that these phenotypes share single nucleotide polymorphism (SNP)-clusters that can be used to identify causal genes for diabetes. Materials and We explored the National Human Genome Research Institute-European Bioinformatics Institute Catalog of Published Genome-Wide Association Studies (NHGRI-EBI GWAS) to list SNPs with documented association with the UAEDIAB-phenotypes as well as diabetes. The shared chromosomal regions affected by SNPs were identified, intersected, and searched for Enriched Ontology Clustering. The potential SNP-clusters were validated using targeted DNA next-generation sequencing (NGS) in two Emirati diabetic patients. RNA sequencing from human pancreatic islets was used to study the expression of identified genes in diabetic and non-diabetic donors. Eight chromosomal regions containing 46 SNPs were identified in at least four out of the five UAEDIAB-phenotypes. A list of 34 genes was shown to be affected by those SNPs. Targeted NGS from two Emirati patients confirmed that the identified genes have similar SNP-clusters. ASAH1, LRP4, FES, and HSD17B12 genes showed the highest SNPs rate among the identified genes. RNA-seq analysis revealed high expression levels of HSD17B12 in human islets and to be upregulated in type 2 diabetes (T2D) donors. Our integrative phenotype-genotype approach is a novel, simple, and powerful tool to identify clinically relevant potential biomarkers in diabetes. HSD17B12 is a novel candidate gene for pancreatic β-cell function.


Introduction
Type 2 diabetes (T2D) is a multifactorial disorder characterized by the insufficiency of insulin secretion and/or insulin action [1]. The disease is one of the most critical public health challenges of the 21st century [2]. Rapid socio-economic transition is one of the leading causes of the global diabetes epidemic and rising prevalence rates in developing economies [3]. The cost associated with the disease affects more impoverished world regions as well as high-income countries, thus imposing a substantial global economic burden [4]. Recent reports showed a high prevalence of T2D in the United Arab Emirates (UAE) [5]. Thus, effective interventions are urgently needed to slow the diabetes epidemic and reduce diabetes-related complications. An ultimate goal in diabetes management is the identification of novel biomarkers that enable the detection, prevention, treatment of the disease, and its complications long before overt disease development [6].
Genome-wide association studies (GWAS) have identified more than 143 common genetic variants associated with T2D [7]. However, these variants explain only a small proportion of the heritability of the disease [8]. Thus, more extensive studies are needed to identify T2D loci in different populations, as well as novel approaches to make sense of these generated databases [9].
The United Arab Emirates National Diabetes and Lifestyle Study (UAEDIAB) study is a cross-sectional survey designed to investigate the prevalence of diabetes and associated risk factors in 827 Emiratis and 2724 expatriates living in Dubai, Sharjah, and the Northern Emirates [10][11][12]. UAEDIAB study has identified a list of patient's related measurements including obesity/BMI, hypertension, obstructive sleep apnea, and dyslipidemia that correlated with diabetes. Interestingly, most of these identified phenotypes have previously been associated with genetic variants [13][14][15][16].
Random genetic variants are unlikely to occur in the same genomic position unless there is a positive selection for that mutation, which might represent a fitness advantage [17]. However, genetic variants in terms of single nucleotide polymorphism (SNP) can cluster in particular chromosomal regions [18]. Those regions might represent an increased vulnerability to mutation as a result of their unique sequence, structural, and functional features [19]. Such SNPs clusters are thought to occur preferentially around some functional genes. Therefore, the identification of clustered SNPs might provide insights into the molecular mechanisms of mutagenesis. This may include vulnerable regions, possible positive selection for disease development, and regulatory actions associated with T2D.
In this study, we describe an integrative phenotype-genotype approach to identify novel potential biomarkers for T2D by utilizing the phenotypic characteristics of the UAEDIAB study. This was addressed by a systems genetics approach integrating SNP-clusters in chromosomal regions between the identified UAEDIAB-phenotypes and published SNPs using the National Human Genome Research Institute-European Bioinformatics Institute Catalog of Published Genome-Wide Association Studies (NHGRI-EBI GWAS) Catalog. We believe that our study might lead to novel testable hypotheses for genes involved in β-cell function and will add new insight into the molecular pathogenesis and biomarkers for T2D.

Study Design, Population, and Settings
The details of the study design and sampling of the UAEDIAB study are described elsewhere [10,20]. In brief, the UAEDIAB is a cross-sectional study conducted to estimate both prevalence and risk factors of diabetes among UAE nationals and expatriates who have been living in Sharjah, Dubai, and the Northern Emirates for at least four years. The study was approved by the Ethics Committee of Sharjah University and the Ministry of Health Research Ethics Committee (REC number MOHAP/DXB/SUBC/No.14/2017). All participants had consented to the usage of their collected data and samples.

SNP to SNP Functional and Pathway Enrichment
To investigate whether the identified genes share common pathways that may link the given phenotypes, we used eXploring Genomic Relations (XGR) for enhanced interpretation web tools (http: //galahad.well.ox.ac.uk:3040/). XGR provides enhanced interpretation of GWAS via comprehensively utilizing ontology and network information [22]. The tool uses the Experimental Factor Ontology (EFO) option to provide a systematic description of many experimental variables available in EBI databases and the NHGRI-EBI Catalog [23]. Enrichment analysis is based on the hypergeometric/binomial distribution or Fisher s exact test which tests the statistical significance of the observed number of SNPs overlapped between an input group of SNPs and SNPs annotated by EFO. As validation of the results, Enriched Ontology Clustering for the identified genes and SNPs associated with the 5 identified UAEDIAB-phenotypes study were generated using the Metascape (a web-based tool used for comprehensive gene list annotation and analysis resource) and Reactome option in SNPnexus [24]. Socialiser for SNPs in XGR was used to assess the degree of relatedness between any two SNPs in terms of annotation profiles and network were performed by the tool.
Similarity functions serve to conduct similarity analysis calculating semantic similarity-a type of comparison to assess the degree of relatedness between two entities (e.g., genes or SNPs) based on their annotation profiles (by ontology terms) [25]. To do so, information content (IC) of a term is first defined to measure how informative a term is to being used for annotating genes: −log10 (frequency of genes annotated to this term). The similarity between the two terms is then measured based on IC, usually at the most informative common ancestor (MICA). Finally, the similarity between two entities (e.g., genes) was derived from pairwise term similarity using best-matching based methods: average, maximum, and complete. To identify phenotypically important SNPs, we used SNPnexus (http://www.snp-nexus.org) to assess the potential significance of identified SNPs on the major transcriptome, proteome, regulatory, and structural variation models [26].

Targeted Next-Generation Sequencing
As proof of concept to validate the identified SNP-clusters and genes related to diabetes in our cohort, two Emirati diabetic patients (1 male and 1 female; age 60 ± 1) were selected as they have an exceptionally high level of glycated hemoglobin (HbA1c > 10.5%), fasting blood glucose values > 13 mmol/L, BMI > 40, weight 120 ± 3 kg, with dyslipidemia profile, hypertension, and obstructive sleep apnea/snoring. Extracted blood DNA was subjected to targeted DNA next-generation sequencing using the S5 semi-conductor based DNA sequencer as described previously [26]. Briefly, we used the targeted coding-exome sequencing approach by designing multiplex primers that map across key genes along the coding region of the genome. We multiplexed the two patients on one Ion Chip. A pooled barcoded amplicon-tagged library generated using Fluidigm Access Array (Fluidigm Europe B.V, Amsterdam, The Netherlands) was diluted and subjected to emulsion polymerase chain reaction (PCR) with Ion SphereTM particles with Ion Template OT2 200 Kit (Ion OneTouchTM system) following the manufacturer's instructions (Thermo Fisher, Waltham, MA, USA). The pooled samples were sequenced using 540 Ion Chip with Ion 200 Sequencing kit on the S5 sequencer following manufacturer instructions (Thermo Fisher). The total mapped reads obtained were around 40 million per patient, with 109 mean depth and 94% uniformity coverage. 92.4% of the amplicons were aligned to the reference genome (HG19 build). The AQ20 quality mean length was 138 bp and the mean raw accuracy was 98.6%. The mean coverage across the amplicon was around ×130. The bioinformatics analysis was carried out by first aligning the data using the BWA alignment algorithm, followed by sequence filtering using SAMtools. Mutations file (VCF) was generated using vcftools. The mutations were visualized using Integrated Genome Viewer.

In Silico Validation
To link the genomic SNPs to the functional and dynamic mRNA transcriptomics of the identified genes, we searched publicly available transcriptomics database "GEO Omnibus" of pancreatic islets isolated from nondiabetic and compared to diabetic samples. Then, we extracted blood transcriptomic datasets from T2D, pre-diabetic, and healthy controls. Details of the used datasets are listed in Supplementary Table S2.

Statistical Methods
GraphPad Prism version 7.00 for Windows (GraphPad Software, La Jolla, CA, USA) was used for statistical analysis. First, the D Agostino-Pearson normality test was used to determine whether to perform parametric or nonparametric tests. One-way ANOVA test was performed to determine whether there are any statistically significant differences between the mean values of the controls and different groups for the gene expression and protein levels. For nonparametric tests, the Kruskal-Wallis test and Dunn's correction for multiple testing were used. The same software was used to examine the correlations between the different parameters using linear regression. A Student's t-test was used to look for the difference between two groups under a given experiment or treatment. p < 0.05 was considered significant.

NHGRI-EBI GWAS Catalog Analysis Identifies Eight SNP-Clusters in Chromosome Regions with Frequent SNPs Associated with Most of the UAEDIAB-Phenotypes
SNPs related to the different UAEDIAB-phenotypes and diabetes were explored using the NHGRI-EBI GWAS Catalog. Our analysis showed a total of 708 chromosomal regions associated with most SNPs that have been associated with UAEDIAB-phenotypes. Of them, 211 chromosomal regions were associated with diabetes, 270 SNPs with BMI and obesity, 155 regions with lipid profile, and 48 regions with hypertension, while 24 regions were associated with obstructive sleep apnea ( Figure 1). Interestingly, the intersection of chromosomal regions associated with UAEDIAB-phenotypes with diabetes identified eight common chromosomal regions between the phenotypes. Of them, four chromosomal regions (8p22, 1q32.3, 12q24.13, and 7p15.2) were overlapped between BMI/obesity, dyslipidemia, hypertension, and T1D/T2D, three regions (11p11.2, 6q21, and 17q12) between BMI/obesity, dyslipidemia, sleep apnea, and T1D/T2D and only, one common region (15q26.1) was found between BMI/obesity, hypertension, sleep apnea, and T1D/T2D ( Figure 2 and Table 1). Analysis of those eight shared chromosomal regions revealed 46 potential SNPs (Table 2). Data from the NHGRI-EBI GWAS Catalog revealed that 34 genes were reported to be affected by those SNPs (Table 2).

SNPs Enrichment Analysis Showed Significant Ontology Similarity and Metabolic Pathways Enrichment
XGR web tool was used to explore the correlation of the 46 identified SNPs as a group to each other or to UAEDIAB-phenotypes. This analysis further confirms the association between the identified 46 SNPs and the given phenotypes (FDR < 0.05). Additionally, our analysis showed a significant association between the 46 identified SNPs and other phenotypes (Table 3). Of them, 15 SNPs were associated with metabolic disease, 12 SNPs with diabetes mellitus, 13 SNPs with body mass index, and 10 SNPs with T2D.
Next, we harnessed the Socialiser tool to assess the degree of relatedness or network between any two SNPs in the meaning of annotation profiles. Our results showed that 25 out of the 46 SNPs had a similarity score of more than 0.5 in scale range from 0 to 1 with at least five other SNPs (Figure 3 and Table 4). These results indicate a possible co-occurrence or functional relationship of related SNPs. Moreover, clustering of the reported 34 genes in the shared chromosomal regions according to pathway enrichment revealed a significant enrichment of four major pathways, namely developmental biology, signal transduction, and metabolism of lipids and steroids (Supplementary Table S1).   Figure 3. SNP-based Similarity Analysis of the identified SNPs using the "Socialiser for SNPs" option in the eXploring Genomic Relations (XGR) web tool. Seven SNPs showed similarity to at least 10 other SNPs with a similarity score of more than 0.5 in scale range from 0 to 1. C12orf30 SOCS5P5, MARCKS 5

Targeted Next-Generation Sequencing (NGS) Confirms the Presence of Similar SNP-Clusters in Emirati Diabetic Patients
To confirm that our pipeline was able to identify similar SNP-clusters in the Emirati population, we performed NGS on two diabetic patients (as described in the methods section). As shown in Table 5 and Supplementary Figure S1, NGS analysis confirmed the presence of a high rate of SNPs (defined as more than one SNP) in the same genes identified earlier. Table 5. List of SNPs identified by next-generation sequencing (NGS) from two diabetic patients in genes located in the shared chromosomal region.

Identified Genes
Number of SNPs Out of the 34 identified genes by our pipeline, four genes (ASAH1, LRP4, FES, and HSD17B12) showed to have at least four SNPs from each patient (four was the median of the SNPs number per patients after excluding those who have one SNP only).

RNA-Seq Analysis in Human Pancreatic Islets Showed that HSD17B12 is Novel Candidate Gene for Pancreatic β Cell Function
To gain more insights about the role of the four genes (ASAH1, LRP4, FES, and HSD17B12) in pancreatic islets, we used our published RNA-seq data to investigate their expression in human islets [27,28]. RNA-seq mean expression analysis from nondiabetic islets showed that ASAH1 and HSD17B12 are highly expressed at a high level in human islets as compared FES, LRP4, or to the ion channel gene KCNJ11, a functional marker for pancreatic β-cell function. (Figure 4A). Differential expression analysis exhibited that expression of HSD17B12 is significantly reduced (p = 0.03) in diabetic islets (n = 12) when compared with nondiabetic islets (n = 63, Figure 4E). Expression of ASAH1, FES, and LRP4 was not affected by diabetes status (Figure 4B-D). Collectively, based on the expression level and differential expression in human islets, our results suggest that HSD17B12 is a candidate gene for pancreatic β-cell function.

Annotations of SNPs in HSD17B12
Targeted NGS performed in two local Emirati diabetic patients identified a total of four intronic SNPs in the HSD17B12 gene (Supplementary Table S2). Interestingly, the two patients showed one common genotype of rs4573668 (homozygote C/C instead of the reference G). To identify if this SNP is phenotypically important, we used the SNPnexus tool to explore the location and functional consequences of rs4573668. Interestingly, we found that rs4573668 to be located in the 5' untranslated region of the HSD17B12 transcript which could affect its expression, while it does not alter the amino acid sequence of the mature protein (Supplementary Figure S2). This might indicate that rs4573668 plays a role in controlling the activity and function of the HSD17B12 in terms of transcription or epigenetic modifications.

Discussion
In this study, we employed an integrative phenotype-genotype systems approach to investigate whether UAEDIAB-phenotypes shared chromosomal regions containing SNP-clusters, which can be exploited as candidate biomarkers for pancreatic β-cell function and risk of diabetes. Our analysis identified eight shared chromosomal regions in four UAE-DIAB-phenotypes ( Figure 2 and Table 1). The identified regions contain 46 SNPs with reported effect on 34 genes (Table 2), hence, putatively involved in the pathogenesis processes leading to the disease. In vivo validation on Emirati diabetic patients, showed that ASAH1, LRP4, FES, and HSD17B12 have the highest number of SNPs among the 34 genes.
An important part of our study is providing more knowledge for the association of the five identified candidate genes with diabetes in human pancreatic islets from donors with and without diabetes ( Figure 4). ASAH1 gene is a member of the ceramidases family which degrade ceramide to free fatty acids and sphingosine (SPH) [29]. It has been shown that the inhibitory effects of accumulated saturated fatty acids on insulin signaling were prevented by ASAH1 overexpression [30]. Other studies have revealed that endurance exercise was associated with a reduction in ceramide levels in the skeletal muscles of obese individuals, hence leading to the improvement of insulin sensitivity [31]. Although the expression of ASAH1 in human islets was high, we were not able to observe any differential expression of the gene in diabetic islets (Figure 4). This suggests that ASAH1 is probably not involved in β-cell function but rather insulin signaling as evident by other reports [30,31].
LRP4 is a single-pass transmembrane protein belonging to the LDL receptor-related protein (LRP) family. Members of this family were found to be involved in various processes, including development and physiology [32]. LRP4 loss of function has been associated with developmental anomalies associated with Cenani-Lenz syndrome (CLS) disease, which includes limb malformation and renal agenesis [33]. Furthermore, adipocytes in LRP4 deficient mice were characterized by an improvement of glucose and insulin tolerance, lipid homeostasis, less adipocyte hypertrophy as well as reduction of serum fatty acids, thus confirming the role of LRP4 in regulating glucose metabolism [34]. FES gene encodes for a non-receptor protein with tyrosine-specific activity. A recent investigation of the genetic determinants of cardiovascular diseases has found FES to be associated with hypertension and BMI [35]. The findings that expression of LPR4 and FES is not affected by diabetes status and low expressed in human islets indicate that those genes might not be potential important players in β-cell function ( Figure 4).
HSD17B12 (Hydroxysteroid 17-β dehydrogenase) is an essential molecule in the elongation of very-long-chain fatty acids (VLCFAs) and the production of arachidonic acid through the conversion of 17-keto and 17-hydroxysteroids [36]. The gene has been recently linked to BMI and obesity susceptibility [37]. We showed that HSD17B12 expression was very high in human islets and significantly decreased in diabetic islets compared to nondiabetic (Figure 4). The findings may suggest a role of HSD17B12 gene in pancreatic β-cell function. In support of our RNA-seq data, it was reported that tissues involved in lipid metabolisms such as liver, kidney, muscle, and adipose tissues have a high expression level of HSD17B12 in humans and mice [36]. Interestingly, pathway enrichment analyses of the 34 genes pointed out that HSD17B12 is involved in signal transduction and metabolism of lipids and steroids, which fits well with studies of phenotypes.
A previous study showed that individuals carrying the T2D risk allele T for the intronic SNP rs11037579 had lower expression of HSD17B12 in adipose tissue of insulin-resistant subjects [38]. Additionally, rs1061810 was documented to be associated with T2D indicating a role for HSD17B12 in diabetes [39]. Our results indicated that genotyping of rs4573668 (C/C) is common in the local Emirati patients. Since the rs4573668 variant might lead to a change in the protein amino acid sequence, therefore, it could play a significant role in controlling the activity and function of the HSD17B12 in terms of transcription, translation, and epigenetic modifications. Moreover, the findings that HSD17B12 expression is reduced in diabetic islets, highlights the potential of utilization of this gene as an early biomarker in blood samples for pre-diabetic patients.
In summary, our study proposes HSD17B12 as a causal gene for T2D and pancreatic β-cell function. More future functional studies still needed to validate the finding.

Conclusions
In conclusion, our combined approach supports the existence of common chromosomal regions and SNP-clusters among obesity/BMI, hypertension, obstructive sleep apnea, dyslipidemia, and diabetes which might be involved in the pathogenesis of these clinically related phenotypes.
We were able to replicate these SNP-clusters on the local Emirati population different from the ethnic groups available in GWAS. HSD17B12 was identified as a candidate gene for β-cell function. Our data are the implementation of genomics-based approaches to chronic disease detection using bioinformatics in silico approach, and, therefore, further functional validation is still needed to elucidate the role of HSD17B12 and rs4573668 in the pathogenesis of T2D. In the end, we believe that such knowledge could increase our understanding of diabetes and facilitate the development of drugs.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/4/461/s1, Figure S1: Screenshots of examples SNPs in the genes identified generated by targeted DNA next-generation sequencing in Emirati diabetic patients, Figure S2: List of the location and functional consequences of rs4573668 as predicted by SNPnexus tool, Table S1: Enriched Ontology Clusters of the identified 34 genes associated with the UAEDIA-phenotypes, Table S2: List of the 4 identified SNPs in HSD17B12 gene in two Emirati diabetic patients, with their location and significance in gene expression. Funding: J.T. is supported by grants from the University of Sharjah (1701090119-P and 1701090121-P) and the