Bioinformatics and Next-Generation Data Analysis for Identification of Genes and Molecular Pathways Involved in Subjects with Diabetes and Obesity

Background and Objectives: Diabetes accompanied by obesity is a class of metabolic disorder. The current investigation aimed to elucidate potential biomarkers and prognostic targets in subjects with diabetes and obesity. Materials and Methods: The next-generation sequencing (NGS) dataset GSE132831 was downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified. Functional enrichment analysis of the DEGs was conducted with ToppGene. A protein–protein interaction network, its modules, a target gene–miRNA regulatory network and a target gene–TF regulatory network were constructed and analyzed. Furthermore, hub genes were validated by receiver operating characteristic (ROC) analysis. Results: A total of 872 DEGs, including 439 up-regulated genes and 433 down-regulated genes, were identified. Functional enrichment analysis showed that these DEGs are mainly involved in axon guidance, neutrophil degranulation, plasma membrane bounded cell projection organization and cell activation. The top ten hub genes (MYH9, FLNA, DCTN1, CLTC, ERBB2, TCF4, VIM, LRRK2, IFI16 and CAV1) could be utilized as potential diagnostic indicators for subjects with diabetes and obesity, and they were validated by ROC analysis in subjects with diabetes and obesity. Conclusion: This investigation identified candidate molecular biomarkers for diagnosis and prognosis through integrated bioinformatics analysis, suggesting new key therapeutic targets for subjects with diabetes and obesity.


Introduction
Diabetes mellitus and obesity are major metabolic and endocrine disorders whose incidence is increasing dramatically worldwide [1]. The prevalence of obesity and type 2 diabetes mellitus is considerably high [2]. Diabetes mellitus and obesity are linked with the progression of cardiovascular diseases [3], hypertension [4], neurological and neuropsychiatric disorders [5] and asthma [6]. To date, there is no cure for diabetes mellitus or obesity, and treatment and medication tailored to clinical features are recommended. Genetic and environmental factors are the two primary contributors to these disorders [7]. Exploring the molecular mechanisms of diabetes mellitus and obesity will improve understanding of their pathogenesis and has key implications for designing new therapies.
The molecular mechanisms in subjects with diabetes and obesity have been increasingly studied. Previous investigations showed that several genes and signaling pathways are associated with diabetes mellitus and obesity. Key genes such as ENPP1 [8] and FTO [9] are responsible for the development of diabetes mellitus and obesity. Recent investigations identified the PI3K/AKT pathway [10] and the TLR pathway [11] as potential targets for diabetes mellitus and obesity. However, some key genes and pathways associated with diabetes mellitus and obesity have not been completely investigated. Further studies are necessary to elucidate these genes and pathways and to provide novel therapeutic targets for the treatment of diabetes and obesity.
In recent years, the analysis of biological information, known as bioinformatics, has attracted a great deal of attention and produced sustained breakthroughs in the search for biomarkers for various diseases [12][13][14]. With the gradual advancement of next-generation sequencing (NGS) technology, bioinformatics has become increasingly essential in the study of molecular pathogenesis, playing a major role in elucidating disease mechanisms and finding novel targets for disease treatment and patient prognosis [15]. With the widespread use of NGS, a huge amount of data has been generated, and most of these data have been deposited in public databases. NGS data analyses have been carried out on diabetes and obesity in recent years [16], and hundreds of differentially expressed genes (DEGs) have been obtained. Combining bioinformatics methods with NGS data therefore offers an innovative approach to studying these disorders.
Therefore, in this investigation, we downloaded the NGS dataset GSE132831, provided by Osinski et al. [17], from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/, accessed on 11 June 2020) [18] database to identify the DEGs between diabetes mellitus and obesity samples and normal control samples. With the identified DEGs, we performed Gene Ontology (GO) and pathway enrichment analyses to investigate the functions and pathways in which the DEGs are enriched. Additionally, we constructed a protein-protein interaction (PPI) network and screened it for modules, identifying important gene nodes for clustering analysis. Furthermore, we constructed a target gene-miRNA regulatory network and a target gene-TF regulatory network based on these key genes to investigate potential relationships between these genes and diabetes with obesity. Finally, hub genes were validated using receiver operating characteristic (ROC) curve analysis. The research design of this study is shown in Figure 1. By exploring prognostic markers and therapeutic targets, these results might provide novel ideas for future investigation and treatment of diabetes mellitus and obesity.

RNA Sequencing Data
The NGS dataset GSE132831 was downloaded from the GEO database; it was generated on the GPL1857 Illumina NextSeq 500 (Homo sapiens) platform. This dataset, comprising 104 samples from diabetic obese subjects and 120 normal control samples, was deposited by Osinski et al. [17].

Identification of DEGs
The limma R/Bioconductor package [19] was used in R to identify DEGs between diabetic obese samples and normal control samples. The cutoff criteria were logFC > 1.112 for up-regulated genes, logFC < −0.64 for down-regulated genes, and a p-value < 0.05. The p-value gives the probability of observing a difference in expression at least as large as the one measured between the two groups (diabetes mellitus and obesity samples versus normal control samples) if there were no true difference, whereas the log fold change (logFC) measures the magnitude and direction of that difference, so the logFC cutoffs retain only genes whose expression changes substantially between the groups.
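The cutoffs above can be illustrated with a minimal Python sketch. It assumes the per-gene logFC and p-values have already been computed (for example, exported from limma's `topTable()`, whose output includes `logFC` and `P.Value` columns) and only applies the stated thresholds; it does not reproduce limma's moderated statistics. The `GeneResult` class and the gene names and values are hypothetical.

```python
from dataclasses import dataclass

# Thresholds as stated in the text: logFC > 1.112 (up), logFC < -0.64 (down), p < 0.05.
UP_LOGFC = 1.112
DOWN_LOGFC = -0.64
P_CUTOFF = 0.05


@dataclass
class GeneResult:
    gene: str
    log_fc: float   # log2 fold change, diabetic obese vs. normal control
    p_value: float  # p-value from the differential expression test


def classify(result: GeneResult) -> str:
    """Return 'up', 'down' or 'ns' (not significant) for one gene."""
    if result.p_value >= P_CUTOFF:
        return "ns"
    if result.log_fc > UP_LOGFC:
        return "up"
    if result.log_fc < DOWN_LOGFC:
        return "down"
    return "ns"


def split_degs(results):
    """Partition GeneResult objects into lists of up- and down-regulated gene names."""
    up = [r.gene for r in results if classify(r) == "up"]
    down = [r.gene for r in results if classify(r) == "down"]
    return up, down


if __name__ == "__main__":
    # Hypothetical toy values, not taken from GSE132831.
    toy = [
        GeneResult("GENE_A", 1.50, 0.001),  # passes the up-regulation cutoff
        GeneResult("GENE_B", -0.80, 0.010),  # passes the down-regulation cutoff
        GeneResult("GENE_C", 0.90, 0.020),  # significant p, but logFC too small
        GeneResult("GENE_D", 2.00, 0.300),  # large logFC, but p too large
    ]
    up, down = split_degs(toy)
    print("up:", up)      # up: ['GENE_A']
    print("down:", down)  # down: ['GENE_B']
```

Note that the two cutoffs are asymmetric, so a gene with logFC between −0.64 and 1.112 is never called a DEG regardless of its p-value.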

Protein-Protein Interaction (PPI) Network and Module Analysis
The IID interactome (http://iid.ophid.utoronto.ca/, accessed on 11 June 2020) [23] is an online database of known and predicted PPIs. In this investigation, a PPI network of the identified DEGs was constructed using the IID interactome database (combined score > 0.4) and visualized using Cytoscape (http://www.cytoscape.org/, accessed on 11 June 2020) software (version 3.8.2) [24]. Regulatory relationships between genes were analyzed by computing network topological properties, including node degree [25], betweenness centrality [26], stress centrality [27] and closeness centrality [28], with the NetworkAnalyzer app in Cytoscape. The PEWCC1 app (http://apps.cytoscape.org/apps/PEWCC1, accessed on 11 June 2020) [29] in Cytoscape was used to detect modules of the PPI network. GO and pathway enrichment analyses of the identified modules were then performed using the ToppGene database.
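The four topological metrics can be sketched in Python for a small undirected, connected graph. This is an illustrative reimplementation, not NetworkAnalyzer's code: betweenness and stress are returned as raw (unnormalized) sums over node pairs, so absolute values may differ from NetworkAnalyzer's output, and closeness is taken as the reciprocal of the average shortest-path length. The toy edges are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """BFS from source: shortest-path distances and numbers of shortest paths (sigma)."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def centralities(edges):
    """Degree, closeness, betweenness and stress for an undirected, connected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = sorted(adj)
    n = len(nodes)
    bfs = {v: bfs_counts(adj, v) for v in nodes}

    degree = {v: len(adj[v]) for v in nodes}
    # Closeness: (n - 1) divided by the sum of shortest-path lengths to all other nodes.
    closeness = {v: (n - 1) / sum(bfs[v][0].values()) for v in nodes}

    betweenness = {v: 0.0 for v in nodes}
    stress = {v: 0 for v in nodes}
    for s, t in combinations(nodes, 2):    # each unordered pair once
        dist_s, sigma_s = bfs[s]
        d_st, sigma_st = dist_s[t], sigma_s[t]
        for v in nodes:
            if v in (s, t):
                continue
            dist_v, sigma_v = bfs[v]
            if dist_s[v] + dist_v[t] == d_st:        # v lies on a shortest s-t path
                through_v = sigma_s[v] * sigma_v[t]  # shortest s-t paths passing through v
                stress[v] += through_v               # stress counts the paths
                betweenness[v] += through_v / sigma_st  # betweenness counts the fraction
    return degree, closeness, betweenness, stress


if __name__ == "__main__":
    square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    deg, clo, btw, strs = centralities(square)
    print(btw["B"])  # 0.5: B carries one of the two shortest A-C paths
```

The all-pairs loop is cubic in the number of nodes, which is fine for illustration; for a network of thousands of nodes, Brandes' accumulation algorithm is the usual choice.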

Target Gene-TF Regulatory Network
NetworkAnalyst (https://www.networkanalyst.ca/, accessed on 11 June 2020) [31] is a bioinformatics platform used here to predict target gene-TF pairs. In the present study, the TFs regulating the hub genes were predicted using the ChEA TF database. The resulting target gene-TF regulatory network was visualized using Cytoscape software.
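A regulatory network of this kind is commonly passed to Cytoscape as an edge list. The sketch below is an assumption about how such pairs could be handled, not the authors' pipeline: it deduplicates hypothetical (TF, target gene) pairs, counts how many distinct target genes each TF regulates, and formats the pairs in Cytoscape's simple interaction format (SIF), one `source<TAB>interaction<TAB>target` line per edge. The pair names and the `regulates` label are placeholders.

```python
from collections import Counter


def to_sif(pairs, interaction="regulates"):
    """Format unique (TF, target gene) pairs as sorted SIF lines."""
    return [f"{tf}\t{interaction}\t{gene}" for tf, gene in sorted(set(pairs))]


def tf_degree(pairs):
    """Number of distinct target genes per TF (the TF's degree in the network)."""
    return Counter(tf for tf, _ in set(pairs))


if __name__ == "__main__":
    # Hypothetical placeholder pairs; real pairs would come from the ChEA query.
    pairs = [("TF_1", "GENE_A"), ("TF_1", "GENE_B"), ("TF_2", "GENE_A"), ("TF_1", "GENE_A")]
    for line in to_sif(pairs):
        print(line)
    print(tf_degree(pairs).most_common(1))  # [('TF_1', 2)]
```

Writing the `to_sif` lines to a file with a `.sif` extension allows the network to be imported into Cytoscape, where the TF degree can be used to size or rank TF nodes.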

Receiver Operating Characteristic (ROC) Analysis
ROC analysis is a technique for visualizing, organizing and selecting classifiers based on their performance. A diagnostic test was performed to measure the diagnostic value of candidate biomarkers in subjects with diabetes and obesity, and the sensitivity and specificity of each biomarker in this test were determined. ROC curves were generated by plotting sensitivity against specificity (equivalently, sensitivity against 1 − specificity) using the pROC package in R [32]. The area under the ROC curve (AUC) was calculated to assess the efficiency of the diagnostic test: an AUC greater than 0.9 indicates high efficiency, 0.7-0.9 moderate efficiency, and 0.5-0.7 low efficiency.
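The AUC computation described above can be sketched in Python. This is an illustrative analogue of what pROC's `roc()` and `auc()` report in R, not the pROC implementation: it sweeps every observed score as a cutoff, records (1 − specificity, sensitivity), integrates with the trapezoidal rule, and labels the result with the bands given in the text. It assumes higher values indicate cases (for a down-regulated gene the scores would be negated), and treating exactly 0.9 and 0.7 as moderate is an assumption, because the text's ranges share their endpoints. The expression values are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points, sweeping each observed score as a cutoff.

    labels: 1 = case (diabetes and obesity), 0 = control; higher score = more case-like.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def efficiency(area):
    """Label an AUC with the bands given in the text (boundary handling is an assumption)."""
    if area > 0.9:
        return "high"
    if area >= 0.7:
        return "moderate"
    if area >= 0.5:
        return "low"
    return "worse than chance"


if __name__ == "__main__":
    # Hypothetical expression values for one gene in 2 cases and 2 controls.
    scores = [0.8, 0.4, 0.6, 0.2]
    labels = [1, 1, 0, 0]
    a = auc(roc_points(scores, labels))
    print(a, efficiency(a))  # 0.75 moderate
```

The trapezoidal AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half; in the toy data three of the four case-control pairs are correctly ordered, giving 0.75.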

Identification of DEGs
DEGs were screened with the limma package (p-value < 0.05, logFC > 1.112 for up-regulated genes and logFC < −0.64 for down-regulated genes). The GSE132831 dataset yielded 872 DEGs, including 439 up-regulated genes and 433 down-regulated genes. The DEGs are listed in Table S1. The volcano plot is presented in Figure 2, and a heat map of the DEGs is shown in Figure 3.

GO and Pathway Enrichment Analyses of DEGs
To characterize the biological functions of these DEGs, GO functional annotation and REACTOME pathway enrichment analyses were performed with the online tool ToppGene. In the biological process (BP) category, the DEGs were mainly enriched in plasma membrane bounded cell projection organization, neurogenesis, cell activation and secretion (Table S2). In the cellular component (CC) category, they were mainly enriched in neuron projection, Golgi apparatus, secretory granule and secretory vesicle (Table S2). In the molecular function (MF) category, they were significantly enriched in drug binding, ribonucleotide binding, signaling receptor binding and molecular transducer activity (Table S2). REACTOME enrichment analysis showed that the top pathways were axon guidance, extracellular matrix organization, neutrophil degranulation and innate immune system (Table S3).
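Over-representation analysis asks whether a GO term or pathway is annotated to more DEGs than expected by chance. A minimal sketch of a standard test of the kind ToppGene reports is shown below, assuming a one-sided hypergeometric test followed by Benjamini-Hochberg correction; the document does not state which multiple-testing correction was applied, and the background and term sizes are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided P(X >= k) when n genes are drawn from N, of which K carry the term.

    N: background genes, K: background genes annotated to the term,
    n: query genes (e.g. DEGs), k: query genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical term: 40 of 20,000 background genes annotated, 8 of 872 DEGs annotated.
    p = hypergeom_pvalue(k=8, n=872, K=40, N=20000)
    print(p)
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the division is the only rounding step; dedicated statistics libraries use log-space computations to reach the same result faster.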

Protein-Protein Interaction (PPI) Network and Module Analysis
To find the hub genes among the DEGs, the Cytoscape plug-in NetworkAnalyzer was used to compute the topological properties of all nodes and edges. The IID interactome mapped the 872 DEGs into a PPI network containing 3894 nodes and 7142 edges (Figure 4). Hub genes with high node degree, betweenness centrality, stress centrality and closeness centrality are listed in Table S4.
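The text does not state how the four centrality rankings were combined into a single hub list. One hypothetical rule, shown only as an illustration and not as the authors' procedure, keeps the nodes that rank in the top k for every metric; the node names and scores below are invented.

```python
def top_k(scores, k):
    """The k highest-scoring nodes for one metric; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {node for node, _ in ranked[:k]}


def consensus_hubs(metrics, k):
    """Nodes in the top k of every metric (metrics: metric name -> {node: score})."""
    top_sets = [top_k(scores, k) for scores in metrics.values()]
    return sorted(set.intersection(*top_sets))


if __name__ == "__main__":
    # Invented scores for four hypothetical nodes.
    metrics = {
        "degree": {"N1": 12, "N2": 9, "N3": 3, "N4": 10},
        "betweenness": {"N1": 0.30, "N2": 0.05, "N3": 0.01, "N4": 0.20},
        "stress": {"N1": 410, "N2": 90, "N3": 12, "N4": 300},
        "closeness": {"N1": 0.41, "N2": 0.38, "N3": 0.22, "N4": 0.40},
    }
    print(consensus_hubs(metrics, k=2))  # ['N1', 'N4']
```

An intersection rule is conservative: a gene that ranks highly on degree but not on betweenness is dropped, whereas a union or an average-rank rule would keep it.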

Receiver Operating Characteristic (ROC) Analysis
To identify new potential biomarkers for diabetes and obesity, ROC curves were constructed from the data of healthy controls and patients with diabetes and obesity using the pROC package in R. The AUC was calculated to assess the discriminatory ability of the hub genes (Figure 8). The ROC analysis showed that the hub genes MYH9, FLNA, DCTN1, CLTC, ERBB2, TCF4, VIM, LRRK2, IFI16 and CAV1 had high sensitivity and specificity, with AUC values greater than 0.7. This analysis indicates that the hub genes have diagnostic potential.
Dynamic network analysis and disease gene association are known to be important criteria for studying the progression of various diseases [172,173]. The protein-protein interaction (PPI) network and its modules can be regarded as key to understanding the progression of diabetes mellitus and obesity, and might also lead to novel therapeutic approaches. MYH9 [41,174,175,176], ERBB2 [38,177,178,179,180], TCF4 [65,181], VIM (vimentin) [182,183], LRRK2 [184,185] and CAV1 [161,186,187,188,189,190,191,192] have been implicated as principal mediators of diabetes mellitus. VIM (vimentin) binds to insulin-responsive aminopeptidase, a major cargo protein of glucose transporter type 4, and decreases glucose tolerance [182]. IFI16 [193], ERBB2 [194], VIM (vimentin) [182,195] and CAV1 [160,196,197,198,199] are crucial factors in the advancement of obesity. IFI16 has been associated with adipogenesis, an enhanced inflammatory state and impaired insulin-stimulated glucose uptake in adipose tissue [193]. The motor protein MYH9 binds to actin and produces mechanical force through magnesium-dependent hydrolysis of ATP, generating the contraction of striated and smooth muscle [200]. ErbB2 is a member of the receptor tyrosine kinase family whose activity in cells depends on dimerization with another ligand-binding ErbB receptor, and it is associated with the progression of various diseases [201]. TCF4 is a member of the basic helix-loop-helix (bHLH) family of transcription factors, which play key roles in various diseases [202]. VIM (vimentin) is an intermediate filament (IF) protein that plays an important role in epithelial-mesenchymal transition (EMT), a process that occurs during the development of various diseases [203]. LRRK2 is an enigmatic protein that has been one of the central molecules in a number of human diseases [204]. CAV1 is a cell surface protein shown to play a key role in insulin resistance [205]. IFI16 is an innate immune sensor for intracellular DNA and is associated with DNA damage in various diseases [206].
We identified CLTC (clathrin heavy chain), TNS2, PLCG1 and NIFK (nucleolar protein interacting with the FHA domain of MKI67) as novel candidate targets for the specific therapy of diabetes mellitus and obesity. Further investigation is needed to validate these results and to investigate the roles of these biomarkers in diabetes mellitus and obesity.
In the present investigation, NGS data analysis suggested that the occurrence of diabetes mellitus and obesity might be related to the expression of miRNAs and TFs. To assess the target genes, miRNAs and TFs identified by the target gene-miRNA and target gene-TF regulatory network analyses, we compared them with previous reports. Yan et al. [207], Wang et al. [208], Yan et al. [209] and Guo et al. [210] showed that the expression and prognosis of hsa-mir-4329, hsa-mir-3685, hsa-mir-6124, hsa-mir-1297 and SMARCA4 are associated with the risk of cardiovascular diseases. Several studies have shown that hsa-mir-1299 [211], hsa-mir-4779 [212] and hsa-mir-4459 [213] might be predictive biomarkers for the efficacy of diabetes mellitus treatment. TCF7 has been regarded as a diagnostic biomarker in type 1 diabetes mellitus [214]. The transcription factor MYB is involved in asthma [215], and MYB might also be associated with diabetes and obesity. E2F4 [216] and CLOCK [217] are associated with prognosis in patients with diabetes mellitus and obesity. CUX1 [218], NANOG [219], GATA4 [220] and HIF1A [221] play vital roles in patients with obesity. Novel targets, including MYO18A, SEC16A, CCNB1, MAD2L1, hsa-mir-4315, hsa-mir-6134, hsa-mir-9500, KIFC3, FBL (fibrillarin), TUBA1A and GFI1B, might have crucial biological functions in the pathogenesis of diabetes mellitus and obesity. These results suggest that the identified biomarkers are involved in the pathological progression of diabetes and obesity and of their associated complications, including neurological and neuropsychiatric disorders, hypertension, cardiovascular diseases and asthma, thus warranting further exploration.
However, this investigation has some limitations. For instance, the NGS data were obtained from the public GEO database rather than generated by the authors. Therefore, further research should be conducted to verify whether these target genes can be used in the clinical treatment of diabetes mellitus and obesity.

Conclusions
Using a bioinformatics analysis of the NGS dataset GSE132831, we identified genes associated with diabetes and obesity. The DEGs in patients were enriched mainly in axon guidance, neutrophil degranulation, plasma membrane bounded cell projection organization and cell activation. Focusing on the key genes and corresponding pathways involved in diabetes and obesity could provide new insights for the treatment of diabetes mellitus and obesity. Hub genes including MYH9, FLNA, DCTN1, CLTC, ERBB2, TCF4, VIM, LRRK2, IFI16 and CAV1 were identified as potential novel biomarkers for diabetes and obesity and were validated by ROC analysis. Further investigation is urgently needed to validate the hub genes and to uncover further molecular mechanisms. These findings may lay the foundation for a therapeutic strategy to treat diabetes mellitus and obesity.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/medicina59020309/s1, Table S1: The statistical metrics for key differentially expressed genes (DEGs); Table S2: The enriched GO terms of the up- and down-regulated differentially expressed genes; Table S3: The enriched pathway terms of the up- and down-regulated differentially expressed genes; Table S4: Topology table for up- and down-regulated genes; Table S5: miRNA-target gene and TF-target gene interactions.