Systems Approach to Pathogenic Mechanism of Type 2 Diabetes and Drug Discovery Design Based on Deep Learning and Drug Design Specifications

In this study, we proposed a systems biology approach to investigate the pathogenic mechanism for identifying significant biomarkers as drug targets and a systematic drug discovery strategy to design a potential multiple-molecule targeting drug for type 2 diabetes (T2D) treatment. We first integrated databases to construct the genome-wide genetic and epigenetic networks (GWGENs), which consist of protein–protein interaction networks (PPINs) and gene regulatory networks (GRNs) for T2D and non-T2D (health), respectively. Second, the relevant “real GWGENs” are identified by system identification and system order detection methods performed on the T2D and non-T2D RNA-seq data. To simplify network analysis, principal network projection (PNP) was thereby exploited to extract core GWGENs from real GWGENs. Then, with the help of KEGG pathway annotation, core signaling pathways were constructed to identify significant biomarkers. Furthermore, in order to discover potential drugs for the selected pathogenic biomarkers (i.e., drug targets) from the core signaling pathways, not only did we train a deep neural network (DNN)-based drug–target interaction (DTI) model to predict candidate drug’s binding with the identified biomarkers but also considered a set of design specifications, including drug regulation ability, toxicity, sensitivity, and side effects to sieve out promising drugs suitable for T2D.


Introduction
In recent years, chronic diseases are major causes of morbidity and mortality worldwide. As patients' long-term conditions could deteriorate gradually with age, chronic diseases require continuous monitoring and treatment to maintain quality of life. Diabetes is one of the prominent chronic diseases caused by either dysfunctional insulin production or failed deployment of insulin. Among them, type 2 diabetes (T2D) accounts for 90% to 95% cases in all diabetes and is estimated to impact about 435 million patients around the world by 2030 [1]. While T2D is considered to be most common in adults, the diagnosis of pediatric T2D increases steadily [2]. Common symptoms for T2D include frequent urination, thirst, constant hunger, etc. Most importantly, as a risk factor for heart, blood vessels, eyes, kidneys, and nervous system diseases, T2D might inevitably increase the risk of death and the medical burden on society.
T2D has been typically seen as insulin-independent, which implies an ineffective utilization of insulin due to insulin resistance [3]. However, it has been shown that the pancreatic β-cell destruction due to inflammation and immune response might also give rise to T2D aggravation [4,5]. Nowadays, albeit much effort has been dedicated to elucidate to T2D aggravation [4,5]. Nowadays, albeit much effort has been dedicated to elucidate the T2D pathogenic mechanism, few studies discussed how pancreatic destruction occurs in T2D, let alone its correlation with inflammatory response on β-cells. In fact, the systematic pathogenic mechanism of T2D still remains unclear. Therefore, we proposed a systems biology approach to investigate key pathogenic factors in view of genetic and epigenetic networks through system identification and system order detection methods by genome-wide RNA-seq data of T2D.
In the past decade, many methodologies have been proposed to identify the complex relations between the gene-gene, gene-protein, and protein-protein interactions. Although traditional biological experiments have been used to identify the protein-protein interaction network in the late 1990s [6], some drawbacks were also incurred. First, it is expensive and time-consuming to execute a large number of experiments for developing new therapies. Second, the biological experiments have practical limitations on taking the whole genome into consideration [7]. As a result, some potential pathways for diseases may not be well detected and studied. For instance, while the genome-wide association studies (GWAS) have investigated the single nucleotide polymorphisms (SNPs) of the human genome and found many disease-related variations, such findings alone can not explain the complex pathogenesis [8,9]. To overcome these challenges, we employed a systems biology approach to macroscopically analyze the systematic relationship among the proteins, genes, and microenvironment in the T2D pathogenic mechanism.
The systems biology method has been widely used to investigate the pathogenesis of disease such as cancer [10] and the progression of virus infection [11]. Likewise, in the proposed issues of T2D management, the systems biology method was deployed to trim off false positives from the candidate genome-wide genetic and epigenetic networks (GWGENs) as well as identify the disease-based GWGENs. Then, with the help of the principal network projection (PNP) approach, the core GWGENs were sifted out and further projected to KEGG pathways for subsequent analysis. Right after, by comparing the discrepancy between the non-T2D and T2D core signaling pathways, the pathogenic mechanism can be revealed. According to the analyzed pathogenic mechanism in core signaling pathways, the fat accumulation-dependent signaling pathways and the high glucose-induced signaling pathways lead pancreatic β-cells to excessive burden, bringing about cell inflammation and apoptosis during the development of T2D. Such phenomenon reduces the production of insulin secretion and disrupts the balance between glucose and insulin, giving rise to T2D. Hence, we proposed IKK, STAT3, PPARƔ, ETS1, and FAS as the significant pathogenic biomarkers contributing to the fat accumulation-dependent as well as the inflammatory-dependent cell apoptosis for systematic drug discovery design.
The process of drug development is an arduous task because of the high cost and time-consuming trials. It is estimated that it takes about 12-15 years and more than one billion US dollars before marketing a new drug [12]. Pharmaceutical companies need to spend a large amount of time and effort on executing experiments to understand the properties and the interactions of the drug to its targets. In addition, numerous animal and clinical trials ought to be carried out so as to ensure its safety and effectiveness [13]. These complicated procedures vastly increase the risk of failure in drug development, and most of them originate from the poor clinical outcomes [14]. On this ground, we developed systematic strategies based on drug-target interaction prediction and drug design specifications, which include drug regulation ability, toxicity, sensitivity, and side effect, to confront the problems from the perspective of system engineering. As the result, we chose and combined Sulforaphane and Biotin as the multiple-molecule targeting drug, which may potentially regulate IKK, STAT3, PPARƔ, ETS1, and FAS for T2D management. Taken together, we expect that the systematic drug discovery and design procedures presented in this study can provide an efficient way to find the multiple-molecule targeting drug candidates for T2D treatment before clinical trials.
, ETS1, and FAS as the significant pathogenic biomarkers contributing to the fat accumulation-dependent as well as the inflammatory-dependent cell apoptosis for systematic drug discovery design.
The process of drug development is an arduous task because of the high cost and timeconsuming trials. It is estimated that it takes about 12-15 years and more than one billion US dollars before marketing a new drug [12]. Pharmaceutical companies need to spend a large amount of time and effort on executing experiments to understand the properties and the interactions of the drug to its targets. In addition, numerous animal and clinical trials ought to be carried out so as to ensure its safety and effectiveness [13]. These complicated procedures vastly increase the risk of failure in drug development, and most of them originate from the poor clinical outcomes [14]. On this ground, we developed systematic strategies based on drug-target interaction prediction and drug design specifications, which include drug regulation ability, toxicity, sensitivity, and side effect, to confront the problems from the perspective of system engineering. As the result, we chose and combined Sulforaphane and Biotin as the multiple-molecule targeting drug, which may potentially regulate IKK, STAT3, PPAR 2 of 39 [4,5]. Nowadays, albeit much effort has been dedicated to elucidate echanism, few studies discussed how pancreatic destruction occurs orrelation with inflammatory response on β-cells. In fact, the systemhanism of T2D still remains unclear. Therefore, we proposed a sysch to investigate key pathogenic factors in view of genetic and epigegh system identification and system order detection methods by gedata of T2D. de, many methodologies have been proposed to identify the complex e gene-gene, gene-protein, and protein-protein interactions. Altological experiments have been used to identify the protein-protein in the late 1990s [6], some drawbacks were also incurred. First, it is consuming to execute a large number of experiments for developing d, the biological experiments have practical limitations on taking the consideration [7]. As a result, some potential pathways for diseases ected and studied. For instance, while the genome-wide association e investigated the single nucleotide polymorphisms (SNPs) of the hund many disease-related variations, such findings alone can not exthogenesis [8,9]. To overcome these challenges, we employed a sysch to macroscopically analyze the systematic relationship among the microenvironment in the T2D pathogenic mechanism. logy method has been widely used to investigate the pathogenesis of er [10] and the progression of virus infection [11]. Likewise, in the 2D management, the systems biology method was deployed to trim rom the candidate genome-wide genetic and epigenetic networks as identify the disease-based GWGENs. Then, with the help of the ojection (PNP) approach, the core GWGENs were sifted out and fur-G pathways for subsequent analysis. Right after, by comparing the n the non-T2D and T2D core signaling pathways, the pathogenic evealed. According to the analyzed pathogenic mechanism in core the fat accumulation-dependent signaling pathways and the high aling pathways lead pancreatic β-cells to excessive burden, bringing ion and apoptosis during the development of T2D. Such phenomeuction of insulin secretion and disrupts the balance between glucose ise to T2D. Hence, we proposed IKK, STAT3, PPARƔ, ETS1, and FAS hogenic biomarkers contributing to the fat accumulation-dependent matory-dependent cell apoptosis for systematic drug discovery dedrug development is an arduous task because of the high cost and ls. It is estimated that it takes about 12-15 years and more than one fore marketing a new drug [12]. Pharmaceutical companies need to t of time and effort on executing experiments to understand the propctions of the drug to its targets. In addition, numerous animal and o be carried out so as to ensure its safety and effectiveness [13]. These res vastly increase the risk of failure in drug development, and most om the poor clinical outcomes [14]. On this ground, we developed , ETS1, and FAS for T2D management. Taken together, we expect that the systematic drug discovery and design procedures presented in this study can provide an efficient way to find the multiple-molecule targeting drug candidates for T2D treatment before clinical trials. Flowchart of systems biology method and the outline of systematic drug discovery design. The candidate genome-wide genetic and epigenetic networks (GWGEN) consist of gene regulation network (GRN) and protein-protein interaction network (PPIN), where candidate GRN was constructed through integrating gene regulation databases and candidate PPIN was constructed via protein-protein interaction databases. The candidate GWGENs were identified to obtain real GWGENs by RNA-seq data from GSE81608 through system identification and system order detection. Then, core GWGENs were extracted from real GWGENs by the principal network projection (PNP) method. Potential drugs were discovered according to the significant biomarkers determined by investigating the T2D pathogenesis constructed through comparing core signaling pathways of non-type 2 diabetes (T2D) and T2D. Figure 1. Flowchart of systems biology method and the outline of systematic drug discovery design. The candidate genome-wide genetic and epigenetic networks (GWGEN) consist of gene regulation network (GRN) and protein-protein interaction network (PPIN), where candidate GRN was constructed through integrating gene regulation databases and candidate PPIN was constructed via protein-protein interaction databases. The candidate GWGENs were identified to obtain real GWGENs by RNA-seq data from GSE81608 through system identification and system order detection. Then, core GWGENs were extracted from real GWGENs by the principal network projection (PNP) method. Potential drugs were discovered according to the significant biomarkers determined by investigating the T2D pathogenesis constructed through comparing core signaling pathways of non-type 2 diabetes (T2D) and T2D. The T2D pathogenic mechanism investigation by comparing the T2D and non-T2D core signaling pathways. The genes and proteins in the core signaling pathways were chosen from the T2D and non-T2D core GWGENs. The gene regulations and protein interactions were constructed based on the edges in core GWGENs. The blocks of light orange, light blue, and light green background color separately indicate the T2D differential signaling pathways, the common signaling pathways of both T2D and non-T2D, and the non-T2D differential signaling pathways, respectively. The cellular functions caused by target genes are clustered with solid lines in different colors and referred to Uniprot. The bold arrowhead marks in black denote the relatively low and high expression in pathogenic signaling pathways in contrast to non-T2D.2.2. The Pathogenic Microenvironment in T2D.
Note that, to reinforce the reliability of constructed T2D pathogenic mechanism, the collected RNA-seq data on the pancreatic β-cell was selected with age greater than or equal to 50 years due to high incidence, and they were classified into non-T2D and T2D, as shown in Table 1. Table 1. Samples of RNA-seq data on pancreatic β-cells from GSE81608 were selected according to age greater than or equal to 50 years and classified into non-T2D and T2D.

RNA-seq Data Non-T2D T2D
Age ≥ 50 86 123 Based on the information from the accessible bioinformatics databases, the candidate GWGENs were constructed and identified by system identification and the system order detection method to prune off the trivial interactions and regulations. Although the extracted GWGENs (real GWGENs plotted by Cytoscape software in Figure A1) in a smaller scale could be apparently observed as shown in Table A1, the network complexity it owns still blocked the further analysis. To deal with this problem, the PNP method was applied to distill the real GWGENs into the core GWGENs, which effectively reduced the network size and simplified the subsequent pathogenic markers and pathway analysis of T2D. Notably, among the core GWGENs as shown in Figure A2, the top 3000 major nodes from 85% of the real GWGENs after projection were included.

of 33
Thereafter, the core GWGENs for T2D and non-T2D were projected to KEGG pathways by DAVID software to derive the core signaling pathways in Tables 2 and 3, respectively.  According to the enrichment analysis of core T2D signaling pathways shown in Table 2,  there are 22 genes related to insulin resistance and 11 genes related to type II diabetes melli-tus. In addition, 14 genes are associated with lipid metabolism. Such findings indicate that pathogenic factors of diabetes are related to not only high glucose but also fat accumulation. As a result, with the help of KEGG pathway annotation, core signaling pathways for T2D and non-T2D were constructed and individually illustrated in Figures A3 and A4.  According to the T2D pathogenic signaling pathways (Figure 2), the lipid and glucose metabolism pathways are found to play crucial roles in impairing the pancreatic functions, leading to the occurrence of T2D. Lipid accumulation in the pancreas owing to long-term excessive caloric intakes inevitably causes the burden on the pancreas β-cell, which thereby impacts the insulin production. In our body, lipids are degraded into either the triglycerides or the free fatty acids (FFAs). Some of them might also transform into low-density lipoproteins (LDLs). It is known that LDLs and FFAs can act to interfere with insulin biosynthesis, insulin secretion, and cell proliferation [15]. On the other hand, serving as a peptide hormone to consolidate the concentration of glucose level in blood and to stimulate the decomposition of fat, glucagon (GCG) would be suppressed to a lower concentration due to the high glucose intake. Therefore, the ability of lipid decomposition is declined. In addition, higher glucose is often accompanied by the enrichment of glucagon-like peptide-1 (GLP1) and insulin-like growth factor 1 (IGF1). Although the provoked downstream pathways may expand the cell mass and enhance the capacity of insulin secretion to balance the blood sugar level, extreme glucose intake often disturbs homeostasis. As a result, the pancreatic β-cells cannot withstand the impact of high glucose and lipids, and they eventually cause dysfunction.
Furthermore, the effect of immune and inflammatory responses should not be neglected. When the pancreatic β-cells suffer damage from endoplasmic reticulum stress (ER stress), the immune response would be activated along with the secretion of cytokine factors, such as IL6 and FAS or chemokine CXCL10.

Pancreatic β-Cell Proliferation and Apoptosis in the T2D Inflammatory Microenvironment
We conducted a literature survey to outline the biological functions implied in Figure 2. Under high glucose conditions, GLP1 and IGF1 ligands were induced to a higher level than normal. Catalyzed by GLP1, GLP1R delivered the transduction signal via PIK3R1, RAS, RAF1, and MEK1 to activate the MAPK pathway. The activated MAPK due to the overexpression of GLP1 obliquely elevated the level of transcription factor (TF) ETS1, which subsequently upregulated the target gene FOXO1 but downregulated TF FOXA2. Emerging studies revealed that the upregulation of FOXO1 contributes to the apoptosis of the pancreatic β-cell, concurrently alleviating cell proliferation [16]. In addition, serving as an important TF in pathogenic pathways, FOXO1 could suppress TF GSK3B to elevate PDX1 expression, where GSK3B is a negative regulator, and its downregulation maintains PDX1 protein stability to delay its phosphorylated degradation [17]. As a critical regulator in pancreatic β-cell development, PDX1 is responsible for cell proliferation and insulin secretion [18]. However, it has been reported that a decreased FOXA2 could reduce its binding to the PDX1 promoter [19], holding an antagonism. If without sufficient PDX1, pancreatic β-cells cannot repair the damage from cell apoptosis and peroxide [20]. It has been validated that the AKT1 activates the downstream protein MTOR through TSC2 and RHEB and simultaneously upregulates TF FOXO1, which is phosphorylated by kinase PHKB [21,22]. In T2D, MTOR phosphorylated S6KB1, and it is well documented that this upregulation expands the cell size and number of pancreas for producing more insulin and maintaining the pancreatic function to decompose the glucose [23]. Among signaling transductions associated with pancreatic β-cell survival, upon receiving the signal from ligand low-density lipoprotein (LDL), receptor LDLR stimulated MIR24 to restrain the transcription of target gene IRE1. Notably, the inhibition of IRE1 protects the pancreatic β-cell from ER stress-induced apoptosis while accelerating the impairment of insulin secretion [24].
Furthermore, the influence of AKT1-dependent immune response, FFA-induced, and MAPK-relevant pathways occupied a key position on cell survival. In the AKT1 pathway, PDK1 suppressed TF IKK phosphorylation degradation through SGK1. SGK1 plays a role in anti-inflammation, since it impedes the apoptotic promoter NF-κB from translocation to mitigate its ability for inflammatory cytokines transcription [25]. In contrast, the MTOR pathway and CXCL10-mediated MYD88 signaling both enhanced the nuclear translocation of NF-κB through IKK. It can initiate immune response, contributing to the cell inflammation and apoptosis [26,27]. NF-κB is a double-edge sword in immune modulation. In general, NF-κB-dependent transcription not only accelerates the anti-apoptosis mechanism in favor of cell survival but also augments the inflammatory response, leading to cell death. Nevertheless, in T2D, the absence of anti-apoptosis pathways pertaining to NF-κB was found.
On top of that, FFA mediated the reduction of AKT1 phosphorylation by JUNB and XBP1, hence weakening the transcriptional ability of AKT1.
Concerning cell death, the members of the CASP family count for a great deal in inducing apoptosis when stimulated by exogenous and endogenous environmental factors. The IGF1 signal indirectly induced the NEDD8 activation and further upregulated the inflammatory mediator CASP1, resulting in an aggravated inflammation response [28]. Another CASP member, CASP8, could be triggered by the FASL-stimulated apoptotic pathway as well, causing the inception of cell inflammatory response and apoptosis. On the other hand, as a typical hallmark of apoptosis in the CASP family, CASP3 could be indirectly activated by IL6. Although IL6 might upregulate AKT1 activity to raise the cell proliferation through JAK2-induced demethylation, IL6 inhibited the key controller of anti-apoptosis BCL2 through STAT3 downregulation. It is worth noting that from the non-T2D signaling pathway, IL6 was characterized as an anti-inflammatory cytokine and indirectly interacted with ISL1 and SETD7 to activate PDX1, intensifying cell proliferation. Moreover, the FFAdependent ER stress could also interrupt the STAT3-dependent signaling, which causes the blockade of cellular defensive machinery from BCL2. The inhibited BCL2 activated the caspase cleavage TF SP1 and obliquely its target gene CASP3, resulting in apoptosis, which has been suggested through opening the channel on mitochondria membrane to secret CYCS [29].

Abnormality in Insulin Synthesis and Insulin Secretion
Insulin synthesis and secretion are significant and indispensable modulation functions in the pancreas. Without sufficient insulin, the pancreas is not able to effectively decompose glucose to generate enough energy for cell tissues. In T2D, there exist confrontations between the glucose-induced promotion of insulin and the apoptosis-triggered reduction of insulin secretion. From the pathogenic signaling pathways shown in Figure 2, the high expression of phosphorylated PDK1 interacted with TF SGK1 to prompt the upregulation of PLD1, which has previously been described to facilitate insulin secretion [30]. Likewise, the increment of insulin production could also be triggered by GCG-stimulated TF MAFA activation pathway through signaling cascades GPR52, ABCB1, CAMP, PKA, and CREB1. This finding is in line with the observation of a study that GCG level rises in response to lipid metabolism when lipids accumulate in the pancreas [31]. Furthermore, the upregulated SGK1 inhibited NEDD4 to accelerate the GLUT1 deubiquitylation, promoting the insulin secretion. On the other hand, as the ubiquitin ligase, COP1, its inhibition induced by the GLP1-stimulated MAPK pathway could attenuate the degradation of negative modulator ETV1 and impel its target gene EXOC6 to overexpress, therefore weakening the ensuing insulin secretion stimulation.
Meanwhile, in contrast to the upregulation of GLUT1, the target gene GLUT2 was repressed by PPARγ through both the signaling cascades: one via FFA-dependent FFAR1 and FABP5 signal transduction; the other via the decrement of TF PGC1α transcriptional ability through GLP1-catalyzed deacetylated enzyme SIRT upregulation. Consequently, the loss of GLUT2 gave rise to a drop on insulin secretion. Furthermore, miRNAs also play a key role in the pancreas to regulate insulin synthesis and secretion. An abnormal expression of miRNAs often arouses repercussion. Despite holding the potency to prevent pancreatic β-cells from apoptosis through IRE1 inhibition [24], MIR24 was inevitably induced by LDL to dampen insulin synthesis through triggering MAFA downregulation. Aside from that, acting as a downstream of MAPK signaling cascades, when activated, MIR29B2 kept its target MCT1 (SLC16A1, a plasma membrane monocarboxylate transporter to manage the exocrine function of insulin) from expression, thereby resulting in the interruption of insulin secretion [32].

Potential Multiple-Molecule Targeting Drug for T2D Utilizing Systematic Drug Discovery Approach
According to the investigation of the pathogenic mechanism, the primary progression of T2D stemmed from excessive inflammation and cell apoptosis owing to fat accumulation in the pancreas. Moreover, an over intake of glucose pressures the pancreas to overwork, therefore leading to dysfunction. In line with this notion, significant biomarkers related to fat accumulation, cell inflammation, and apoptosis were selected. Then, we used these biomarkers to search for favorable compounds that can serve as potential therapy of T2D. Consequently, we took IKK, STAT3, FAS, ETS1, and PPARγ as biomarkers and sought to reverse their expression levels. Amongst them, IKK, STAT3, and FAS are pertinent to pancreas inflammation and death; ETS1 is responsible for pancreas proliferation; PPARγ can regulate the glucose flux into the pancreas through the channel protein GLUT2 and therefore stimulate insulin secretion.
After defining these potential biomarkers as drug targets, we select candidate drugs by drug repositioning, with consideration of their chemical properties. On one hand, a deep neural network (DNN)-based DTI model was pretrained to predict drug-target binding likely to exist; on the other, drug design specifications, i.e., regulation ability, toxicity, sensitivity, and side effect were further exploited to sieve out potential drugs for designing a multiple-molecule targeting drug for T2D treatment before clinical trials. The flowchart of systematic drug discovery and design procedure is described in Figure 3. . The flowchart of systematic drug discovery and design procedure. The drug-target binding datasets were obtained from BindingDB, which integrated substantial information of drugs and targets from several databases. Then, the drug and target features were sequentially preprocessed through descriptor transformation, standardization, and PCA dimension reduction. Afterwards, the processed data were split into training and testing data for deep neural network (DNN)-based drug-target interaction (DTI) model training and performance evaluation, respectively. During the training process, the model parameters were updated through the error between the true binding label and predicted binding label of each drug-target pair. The well-trained DNN-based DTI model was used to predict the binding probability between drugs and the identified biomarkers to sift out candidate drugs. Finally, with the consideration of drug design specifications including regulation ability, toxicity, sensitivity, and side effect, potential drugs were selected and integrated for novel medication therapy curing T2D. Figure 3. The flowchart of systematic drug discovery and design procedure. The drug-target binding datasets were obtained from BindingDB, which integrated substantial information of drugs and targets from several databases. Then, the drug and target features were sequentially preprocessed through descriptor transformation, standardization, and PCA dimension reduction. Afterwards, the processed data were split into training and testing data for deep neural network (DNN)-based drug-target interaction (DTI) model training and performance evaluation, respectively. During the training process, the model parameters were updated through the error between the true binding label and predicted binding label of each drug-target pair. The well-trained DNN-based DTI model was used to predict the binding probability between drugs and the identified biomarkers to sift out candidate drugs. Finally, with the consideration of drug design specifications including regulation ability, toxicity, sensitivity, and side effect, potential drugs were selected and integrated for novel medication therapy curing T2D.
From our DNN-based DTI model (Figure 4), we set four hidden layers, and each of them is connected with a ReLU activation function layer behind. The ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions adopted to deal with classification issues [33]. Meanwhile, to hinder the model from overfitting during the training process, the dropout layer is incorporated after each hidden layer. The dimension of the input layer is 618, corresponding to the features size of the drug-target pair, and 512, 256, 128, and 64 neurons are embedded respectively in the four hidden layers. Prior to the output layer, a sigmoid activation function is applied to limit the value within the range between 0 and 1 (probability). Note that a sigmoid function is commonly used in binary classification problems. The outcome of DTI indicates the likelihood of a binding, where a higher value corresponds to a more reliable interaction (binding) between the drug and target. The loss and accuracy during the training process are recorded in Figures 5 and 6, respectively. The well-trained DTI model was supervised through applying the 10-fold cross-validation to evaluate the model performance, as shown in Table 4. Eventually, we received an average accuracy of 94.89 (%) with the standard deviation of 0.156 (%), and the model with best testing performance was picked as our DTI model. From our DNN-based DTI model (Figure 4), we set four hidden layers, and each of them is connected with a ReLU activation function layer behind. The ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions adopted to deal with classification issues [33]. Meanwhile, to hinder the model from overfitting during the training process, the dropout layer is incorporated after each hidden layer. The dimension of the input layer is 618, corresponding to the features size of the drug-target pair, and 512, 256, 128, and 64 neurons are embedded respectively in the four hidden layers. Prior to the output layer, a sigmoid activation function is applied to limit the value within the range between 0 and 1 (probability). Note that a sigmoid function is commonly used in binary classification problems. The outcome of DTI indicates the likelihood of a binding, where a higher value corresponds to a more reliable interaction (binding) between the drug and target. The loss and accuracy during the training process are recorded in Figures 5 and 6, respectively. The well-trained DTI model was supervised through applying the 10-fold cross-validation to evaluate the model performance, as shown in Table 4. Eventually, we received an average accuracy of 94.89 (%) with the standard deviation of 0.156 (%), and the model with best testing performance was picked as our DTI model.   From our DNN-based DTI model (Figure 4), we set four hidden layers, and each of them is connected with a ReLU activation function layer behind. The ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions adopted to deal with classification issues [33]. Meanwhile, to hinder the model from overfitting during the training process, the dropout layer is incorporated after each hidden layer. The dimension of the input layer is 618, corresponding to the features size of the drug-target pair, and 512, 256, 128, and 64 neurons are embedded respectively in the four hidden layers. Prior to the output layer, a sigmoid activation function is applied to limit the value within the range between 0 and 1 (probability). Note that a sigmoid function is commonly used in binary classification problems. The outcome of DTI indicates the likelihood of a binding, where a higher value corresponds to a more reliable interaction (binding) between the drug and target. The loss and accuracy during the training process are recorded in Figures 5 and 6, respectively. The well-trained DTI model was supervised through applying the 10-fold cross-validation to evaluate the model performance, as shown in Table 4. Eventually, we received an average accuracy of 94.89 (%) with the standard deviation of 0.156 (%), and the model with best testing performance was picked as our DTI model.

Model Performance (10-Fold Cross-Validation) Validation LossValidation Accuracy (%) Testing Loss
Testing Accuracy (%) The far-left column recorded the numbers of 10-fold cross-validation models. The block with values in bold denotes the model with best testing accuracy in contrast to the other models and is chosen as the well-trained DTI model for drugtarget binding prediction.
Furthermore, we also compared the DNN-based DTI model with other DTI models based on machine learning classification approaches, such as random forest, K-nearest neighbor (KNN), and Support Vector Machine (SVM) by the receiver operating characteristic (ROC) curve measure. The visualization of ROC curve comparison is denoted in Figure 7. From the figure, the performance of our proposed DTI model is apparently better than the others, which indicates that the deep learning algorithm greatly adapts to the calculation of the overwhelming and complicated drug-target interaction data in contrast to other traditional machine learning methods.  Furthermore, we also compared the DNN-based DTI model with other DTI models based on machine learning classification approaches, such as random forest, K-nearest neighbor (KNN), and Support Vector Machine (SVM) by the receiver operating characteristic (ROC) curve measure. The visualization of ROC curve comparison is denoted in Figure 7. From the figure, the performance of our proposed DTI model is apparently better than the others, which indicates that the deep learning algorithm greatly adapts to the calculation of the overwhelming and complicated drug-target interaction data in contrast to other traditional machine learning methods. Through the prediction of the pretrained DNN-based DTI model, candidate drugs were sieved out owing to possessing high probability to bind (dock) to the selected biomarkers. However, the balance between the drug potency and adverse effect should also be concerned, since potent drugs are usually accompanied with a high risk of damage. Accordingly, with the consideration of the drug design specifications such as regulation capacity, toxicity, sensitivity, and side effect, we could further assure the stability and safety of drugs in clinical trials. For the purpose of measuring the regulation capacity of candidate drugs, the available data with well-documented regulation ability information was downloaded from L1000 level 5 dataset, which contains 978 genes treated with 19,811 small molecular compounds in 78 different cell lines [34]. By referring to LINCS L1000, we can examine whether a specific gene was upregulated (positive values) or downregulated (negative values) after being treated with an existing compound. On the other hand, the drug with lower toxicity often possesses a smaller side effect with reference to the median lethal dose (LD50) value in DrugBank. Being the numeric index of lethality, LD50 plays a pivotal role in drug safety evaluation. Further, administering a drug with higher drug sensitivity (a lower value of half maximal effective concentration (EC50)) could also cut down the dosage of the drug and further mitigate the ensuing side effect [35]. Within, the drug sensitivity data were collected from the PRISM dataset, which includes 4518 drugs being experimented across 578 human cell lines based on the EC50. EC50 is used to measure the potency of a drug, where a drug with smaller EC50 implies that it could exert Through the prediction of the pretrained DNN-based DTI model, candidate drugs were sieved out owing to possessing high probability to bind (dock) to the selected biomarkers. However, the balance between the drug potency and adverse effect should also be concerned, since potent drugs are usually accompanied with a high risk of damage. Accordingly, with the consideration of the drug design specifications such as regulation capacity, toxicity, sensitivity, and side effect, we could further assure the stability and safety of drugs in clinical trials. For the purpose of measuring the regulation capacity of candidate drugs, the available data with well-documented regulation ability information was downloaded from L1000 level 5 dataset, which contains 978 genes treated with 19,811 small molecular compounds in 78 different cell lines [34]. By referring to LINCS L1000, we can examine whether a specific gene was upregulated (positive values) or downregulated (negative values) after being treated with an existing compound. On the other hand, the drug with lower toxicity often possesses a smaller side effect with reference to the median lethal dose (LD50) value in DrugBank. Being the numeric index of lethality, LD50 plays a pivotal role in drug safety evaluation. Further, administering a drug with higher drug sensitivity (a lower value of half maximal effective concentration (EC50)) could also cut down the dosage of the drug and further mitigate the ensuing side effect [35]. Within, the drug sensitivity data were collected from the PRISM dataset, which includes 4518 drugs being experimented across 578 human cell lines based on the EC50. EC50 is used to measure the potency of a drug, where a drug with smaller EC50 implies that it could exert the maximum effect with a lower dose [36]. On top of that, we defined the side effect of each drug as its additional binding to other targets rather than the desired biomarkers. The fewer unwanted targets the drug binds, the smaller it affects other pathways. The side effects for the candidate drugs are denoted in Table 5, and further information of the proposed candidate drugs for the identified biomarkers were presented in Table 6. Leveraging these pharmacological properties from databases, appropriate drugs were plausibly selected from Tables 5 and 6 to meet the drug design specifications. Ultimately, we suggested a combination of Sulforaphane and Biotin as our potential multiple-molecule targeting drug for T2D. The side effect of a drug is defined as the number of targets except the desired biomarkers. The candidate drugs with '*' are selected as our potential drugs for T2D. Sulforaphane is a natural edible substance isothiocyanate produced by the enzymatic action of the myrosinase on glucopharanin, which is a 4-methylsulfinylbutyl glucosinolate contained in cruciferous vegetables of the genus Brassica such as broccoli, brussel sprouts, and cabbage. Several experiments have validated that Sulforaphane mitigates oxidative stress and protects cells from damage by invaded tumors and diseases [37]. Biotin, also called vitamin H, is a water-soluble B vitamin and involves a wide range of metabolic processes in body. It plays an important role in not only the protein synthesis but also the fat and carbohydrate metabolism. A previous experiment in rats documented that the insulin secretion dysfunction is related to the loss of biotin [38]. The chemical structures of the T2D multiple-molecule targeting drug and the corresponding drug design specifications with respect to suitable regulation ability, low toxicity, high sensitivity and low side effect are given in Table 7. Table 7. The drug design specifications of a potential multiple-molecule targeting drug for T2D.

Drug Names
Regulation Ability to Specific Biomarkers Toxicity (LD50, moL/kg)

Sulforaphane Biotin
Binding numbers except their desired target biomarkers in core signaling pathways (side effect) 23 19 the T2D multiple-molecule targeting drug and the corresponding drug design specifications with respect to suitable regulation ability, low toxicity, high sensitivity and low side effect are given in Table 7. Table 7. The drug design specifications of a potential multiple-molecule targeting drug for T2D.

Drug Names
Regulation Ability to Specific Biomarkers Toxicity (LD50, mol/kg) Biotin Binding numbers except their desired target biomarkers in core signaling pathways (side effect) 23 19 Chemical structures of multiple molecular drugs '' denotes the drug could bind to the biomarkers with a desired regulation capacity. Among the chemical structures of the multiple molecular drugs, "R(CH2)nH" indicates the alkyl group; "RNCS" means the isothiocyanate group; "RSOR′" represents the sulfinyl group; "R′R"NH" is the secondary amine; "RCOR′" means the carbonyl group; 'RSR′' is the sulfide group; and the "RCOOH" is the carboxyl group.
represents a solid wedge where the bond is pointing out toward the viewer and indicates a hashed wedge where the bond is receding away from the viewer.
According to Tables 5-7, the combination therapy of Sulforaphane and Biotin has the potential to restore the abnormal regulation in T2D. The reversing of STAT3 may reduce the cell apoptosis caused by the endogenous damage substances, and the lower expression of FAS can decrease the cell apoptosis by the interference of an exogenous microenvironment. In addition, the reduction of IKK expression can suppress some phosphorylated degradations, hence mitigating the formation of inflammatory environments and the subsequent activation of cell apoptosis. Furthermore, the downregulation of ETS1 by the proposed multiple-molecule targeting drug can facilitate FOXA2 to form the connection with FOXO1, therefore enhancing the cell proliferation. The recovery expression of PPARγ can achieve the equilibrium between glucose intake and insulin secretion. Taken together, by administering the proposed multiple-molecule targeting drug, an upregulation of STAT3 and PPARγ accompanied by the downregulation of IKK, ETS1, and FAS can validly be attained, yielding encouraging results for the treatment of T2D patients.

The Association between Macrophage Polarization and Inflammatory Response in T2D
When it comes to the T2D pathogenic mechanism, the pancreatic β-cell could not withstand the decomposition of excessive glucose in the body from long-term high glucose intakes, leading to pancreatic β-cell exhaustion and insulin resistance. Although the accumulation of glucose in the body is a crucial factor for T2D development, it is worth noting that the inflammatory-dependent apoptosis stemming from the fat accumulation in the pancreatic β-cell is also a pivotal issue. In T2D specific signaling pathways, FFA produced by the hydrolysis of oils and fats not only disrupted the glucose homeostasis the T2D multiple-molecule targeting drug and the corresponding drug design specifications with respect to suitable regulation ability, low toxicity, high sensitivity and low side effect are given in Table 7. Table 7. The drug design specifications of a potential multiple-molecule targeting drug for T2D.

Drug Names
Regulation Ability to Specific Biomarkers Toxicity (LD50, mol/kg) Biotin Binding numbers except their desired target biomarkers in core signaling pathways (side effect) 23 19 Chemical structures of multiple molecular drugs '' denotes the drug could bind to the biomarkers with a desired regulation capacity. Among the chemical structures of the multiple molecular drugs, "R(CH2)nH" indicates the alkyl group; "RNCS" means the isothiocyanate group; "RSOR′" represents the sulfinyl group; "R′R"NH" is the secondary amine; "RCOR′" means the carbonyl group; 'RSR′' is the sulfide group; and the "RCOOH" is the carboxyl group.
represents a solid wedge where the bond is pointing out toward the viewer and indicates a hashed wedge where the bond is receding away from the viewer.
According to Tables 5-7, the combination therapy of Sulforaphane and Biotin has the potential to restore the abnormal regulation in T2D. The reversing of STAT3 may reduce the cell apoptosis caused by the endogenous damage substances, and the lower expression of FAS can decrease the cell apoptosis by the interference of an exogenous microenvironment. In addition, the reduction of IKK expression can suppress some phosphorylated degradations, hence mitigating the formation of inflammatory environments and the subsequent activation of cell apoptosis. Furthermore, the downregulation of ETS1 by the proposed multiple-molecule targeting drug can facilitate FOXA2 to form the connection with FOXO1, therefore enhancing the cell proliferation. The recovery expression of PPARγ can achieve the equilibrium between glucose intake and insulin secretion. Taken together, by administering the proposed multiple-molecule targeting drug, an upregulation of STAT3 and PPARγ accompanied by the downregulation of IKK, ETS1, and FAS can validly be attained, yielding encouraging results for the treatment of T2D patients.

The Association between Macrophage Polarization and Inflammatory Response in T2D
When it comes to the T2D pathogenic mechanism, the pancreatic β-cell could not withstand the decomposition of excessive glucose in the body from long-term high glucose intakes, leading to pancreatic β-cell exhaustion and insulin resistance. Although the accumulation of glucose in the body is a crucial factor for T2D development, it is worth noting that the inflammatory-dependent apoptosis stemming from the fat accumulation in the pancreatic β-cell is also a pivotal issue. In T2D specific signaling pathways, FFA produced by the hydrolysis of oils and fats not only disrupted the glucose homeostasis '√' denotes the drug could bind to the biomarkers with a desired regulation capacity. Among the chemical structures of the multiple molecular drugs, "R(CH2) n H" indicates the alkyl group; "RNCS" means the isothiocyanate group; "RSOR " represents the sulfinyl group; "R R"NH" is the secondary amine; "RCOR " means the carbonyl group; 'RSR ' is the sulfide group; and the "RCOOH" is the carboxyl group.

W 15 of 39
2D multiple-molecule targeting drug and the corresponding drug design specificawith respect to suitable regulation ability, low toxicity, high sensitivity and low side t are given in Table 7. sign specifications of a potential multiple-molecule targeting drug for T2D. egulation Ability to Specific Biomarkers Toxicity (LD50, mol/kg) Biotin t their desired target biomarkers in core signaling pathways (side effect) 19 Chemical structures of multiple molecular drugs to the biomarkers with a desired regulation capacity. Among the chemical structures of (CH2)nH" indicates the alkyl group; "RNCS" means the isothiocyanate group; "RSOR′" ′R"NH" is the secondary amine; "RCOR′" means the carbonyl group; 'RSR′' is the sulfide carboxyl group.
represents a solid wedge where the bond is pointing out toward the d wedge where the bond is receding away from the viewer.
According to Tables 5-7, the combination therapy of Sulforaphane and Biotin has the tial to restore the abnormal regulation in T2D. The reversing of STAT3 may reduce ell apoptosis caused by the endogenous damage substances, and the lower expression S can decrease the cell apoptosis by the interference of an exogenous microenviron-. In addition, the reduction of IKK expression can suppress some phosphorylated adations, hence mitigating the formation of inflammatory environments and the subent activation of cell apoptosis. Furthermore, the downregulation of ETS1 by the prod multiple-molecule targeting drug can facilitate FOXA2 to form the connection with O1, therefore enhancing the cell proliferation. The recovery expression of PPARγ can ve the equilibrium between glucose intake and insulin secretion. Taken together, by inistering the proposed multiple-molecule targeting drug, an upregulation of STAT3 PPARγ accompanied by the downregulation of IKK, ETS1, and FAS can validly be ed, yielding encouraging results for the treatment of T2D patients.

scussion he Association between Macrophage Polarization and Inflammatory Response in T2D
When it comes to the T2D pathogenic mechanism, the pancreatic β-cell could not stand the decomposition of excessive glucose in the body from long-term high gluintakes, leading to pancreatic β-cell exhaustion and insulin resistance. Although the ulation of glucose in the body is a crucial factor for T2D development, it is worth g that the inflammatory-dependent apoptosis stemming from the fat accumulation e pancreatic β-cell is also a pivotal issue. In T2D specific signaling pathways, FFA uced by the hydrolysis of oils and fats not only disrupted the glucose homeostasis  the T2D multiple-molecule targeting dru  tions with respect to suitable regulation a  effect are given in Table 7. '' denotes the drug could bind to the biomarkers with a desired regula the multiple molecular drugs, "R(CH2)nH" indicates the alkyl group; "R represents the sulfinyl group; "R′R"NH" is the secondary amine; "RCOR group; and the "RCOOH" is the carboxyl group. represents a solid w viewer and indicates a hashed wedge where the bond is receding aw According to Tables 5-7, the combin potential to restore the abnormal regulat the cell apoptosis caused by the endogeno of FAS can decrease the cell apoptosis by ment. In addition, the reduction of IKK degradations, hence mitigating the forma sequent activation of cell apoptosis. Furth posed multiple-molecule targeting drug FOXO1, therefore enhancing the cell prol achieve the equilibrium between glucose administering the proposed multiple-mo and PPARγ accompanied by the downr attained, yielding encouraging results fo

The Association between Macrophage Po
When it comes to the T2D pathoge withstand the decomposition of excessiv cose intakes, leading to pancreatic β-cell accumulation of glucose in the body is a noting that the inflammatory-dependent in the pancreatic β-cell is also a pivotal produced by the hydrolysis of oils and indicates a hashed wedge where the bond is receding away from the viewer.
According to Tables 5-7, the combination therapy of Sulforaphane and Biotin has the potential to restore the abnormal regulation in T2D. The reversing of STAT3 may reduce the cell apoptosis caused by the endogenous damage substances, and the lower expression of FAS can decrease the cell apoptosis by the interference of an exogenous microenvironment. In addition, the reduction of IKK expression can suppress some phosphorylated degradations, hence mitigating the formation of inflammatory environments and the subsequent activation of cell apoptosis. Furthermore, the downregulation of ETS1 by the proposed multiple-molecule targeting drug can facilitate FOXA2 to form the connection with FOXO1, therefore enhancing the cell proliferation. The recovery expression of PPARγ can achieve the equilibrium between glucose intake and insulin secretion. Taken together, by administering the proposed multiple-molecule targeting drug, an upregulation of STAT3 and PPARγ accompanied by the downregulation of IKK, ETS1, and FAS can validly be attained, yielding encouraging results for the treatment of T2D patients.

The Association between Macrophage Polarization and Inflammatory Response in T2D
When it comes to the T2D pathogenic mechanism, the pancreatic β-cell could not withstand the decomposition of excessive glucose in the body from long-term high glucose intakes, leading to pancreatic β-cell exhaustion and insulin resistance. Although the accumulation of glucose in the body is a crucial factor for T2D development, it is worth noting that the inflammatory-dependent apoptosis stemming from the fat accumulation in the pancreatic β-cell is also a pivotal issue. In T2D specific signaling pathways, FFA produced by the hydrolysis of oils and fats not only disrupted the glucose homeostasis but also increased the ER stress to indirectly trigger the follow-up inflammatory response, i.e., IKKinduced NF-κB pathway and apoptotic pathways related to the CASP family. Additionally, to form the inflammatory microenvironment, the immune response is initiated to activate the releasing of pro-inflammatory cytokines such as IL-1β and IL-6 when damage and infection occur.
Among all types of immune cells, macrophages exert a significant effect on pancreatic β-cells in T2D. Macrophages are mononuclear phagocytic cells and widely distributed in human organs. They play an indispensable role in physiological homeostasis, immune surveillance, and cell regeneration. There are mainly two types, M1-and M2-type macrophage, existing in the pancreas. M1-type macrophages are referred to as "proinflammatory macrophages" that can activate inflammatory response and recruit T-cells and natural killer cells to eliminate the harmful substance [39]. However, excessive inflammation stimulated by cytokine and chemokine often inevitably causes great harm to health. Conversely, M2-type macrophages named "anti-inflammatory macrophages" can primarily mediate the side effect arising from the inflammatory and immune response [40]. Under a high glucose and fat environment, glucotoxicity and lipotoxicity in response to the damage infringe the stability of pancreatic β-cell, so that the accumulation of intracellular stress including the oxidative and ER stress intensifies [41]. Moreover, the intracellular stress can modulate the polarization of macrophage from M2-type to M1-type, causing the imbalance in the ratio of M1-type/M2-type macrophages. Consequently, the number of M1-type macrophages residing in the pancreatic β-cell outweighs that of M2-type macrophages to induce pancreatic β-cell toward inflammatory-dependent apoptosis and pancreatic function impairment, which further strengthens the investigation of pancreatic β-cell destruction by apoptosis and inflammation in the T2D pathogenic mechanism.

The Modulation of Ion Channels Involved in Insulin Resistance and Glucose Homeostasis
Nutrients and chemical substances are essential to sustain cell stability and survival. The ways for substances intakes from the extracellular space into cells consist of the diffusion across the plasma membrane and the transmission on it via channel proteins. As modulators of ion channels residing on the plasma membrane, the GLUT family, i.e., GLUT1 and GLUT2, stood out as crucial factors to maintain the equilibrium between the glucose uptake and insulin secretion in the T2D pathogenic mechanism. As the result of high glucose intakes, GLUT1 and GLUT2 increase the ratio of ATP/ADP, leading to inducing the electrical and transductive signal to inactivate the KATP + ion channel protein, a modulator of K + flux in cells [42]. Then, the inactivated KATP + channel further activates the opening of the Ca 2+ ion channel, elevating the concentration of Ca 2+ to promote insulin secretion [43]. However, in the aforementioned pathogenic signaling pathways of T2D, FFA-dependent pathway inhibited GLUT2 to reduce the sensitivity of insulin secretion via pancreatic β-cell in response to glucose accumulation. This phenomenon has also been reported in diseases related to glycogen metabolism and metabolic disorders [44]. It is also commonly found in T2D and should be taken into consideration when performing its medical treatment.

Potential Multiple-Molecule Targeting Drug for to the Identified Biomarkers of T2D
In recent years, pharmacology companies have devoted to the discovery and design of drugs for T2D treatment, e.g., Metformin, Sulfonylureas, Meglitinides, Thiazolidinediones, DPP-4 inhibitors, GLP-1 receptor agonists, etc. Metformin is the first-line medication for the treatment of T2D, working for lowering the production of glucose in liver and improving insulin sensitivity. In spite of holding promising efficacy, it can give rise to acute pancreatitis if overdosed [45,46]. Adjuvant medication therapy with either Sulfonylureas such as Glucotrol or Meglitinides such as Repaglinide can effectively stimulate pancreas β-cells to secrete more insulin in the short term; however, Sulfonylureas can cause the progressive dysfunction of pancreas β-cells in a long-term treatment. Furthermore, they also aggravate the risk of gaining weight and lowering blood glucose levels [47,48]. Thiazolidinediones is an analogue of Metformin, rendering tissues more sensitive to the insulin. However, it may as well result in overweight and even severe side effects, i.e., heart failure and anemia. DPP-4 inhibitors that prevent DPP-4 from degrading GLP-1, e.g., Sitagliptin, and GLP-1 receptor agonists that affect GLP-1 to last longer, e.g., Liraglutide, are of particular interest for their glucose-lowering effects, which are useful to the treatment of T2D [49]. Although they are at very low risk of hypoglycemia and are also known to help with weight loss, these medications might lead to gastrointestinal disorders, e.g., nausea, diarrhea, or constipation, and overdosage often increases the probability of pancreatitis occurrence [50,51]. SGLT2 inhibitors are one of the newer medications used to lower blood sugar in patients with T2D. Unlike most anti-diabetic drugs that work by either increasing insulin in the body or increasing the insulin sensitivity of cells, SGLT2 inhibitors, e.g., Dapagliflozin, Canagliflozin, Empagliflozin, etc., cause kidneys to excrete glucose into urine to reduce the blood sugar level [52]. However, excess sugar in urine creates a cozy environment for bacteria and fungi to thrive in the urinary tract or genital area, giving rise to urinary tract infection (about 50% greater in patients with diabetes) [53]. In addition, some patients may experience increased frequency of urination, which leads to lower blood pressure due to the loss of fluids; others may notice a slight increase in their cholesterol values [54]. Therefore, discovering effective treatments and promising medications for T2D is still needed.
The proposed drug combination of Sulforaphane and Biotin not only is natural and readily available from daily life but also holds a chance to keep the body from inflammatorydependent apoptosis and fat accumulation. Although further evaluation in clinical trials is still needed and the potential side effects after consuming should be monitored, the newfound medication indeed brings hope to improve T2D management.

Overview the Procedure of Systems Biology and Systematic Drug Discovery and Design for Type 2 Diabetes (T2D) and Non-T2D
To investigate and gain much more understanding of the T2D pathogenesis, we applied a systems biology approach [55] to build core signaling pathways and explored discrepancies between non-T2D and T2D from the perspective of molecular genetics and epigenetics. Furthermore, a systematic drug discovery procedure was proposed to discover and design a promising drug combination for treating T2D. Notably, drug design specifications were further utilized for screening potential drugs from predicted candidate drugs. The procedure of a systems biology approach and the outline of systematic drug discovery and design method is shown in Figure 1 and subdivided into a few steps:

The Construction of Candidate GWGENs
The candidate protein-protein interaction network (PPIN) and candidate gene regulatory network (GRN) were constructed and integrated into GWGEN for non-T2D and T2D respectively by mining the protein-protein interaction and gene regulation databases.

The Identification of Real GWGENs
The system identification and system order detection method Akaike information criterion (AIC) are used to remove the false positive protein interactions and gene regulations in candidate GWGENs to obtain the real GWGENs via the RNA-seq data downloaded from NCBI GSE81608.

Extracting Core GWGENs by Principal Network Projection (PNP) Method
The core GWGENs were obtained through extracting 85% of the principal network components consisting of the top 3000 proteins, genes, miRNAs, and lncRNAs by the PNP method from the viewpoint of network significance.

The Explorations of Core Signaling Pathways
According to the nodes and edges in core GWGENs and the KEGG pathways annotations, the core signaling pathways were established for T2D and non-T2D. Subsequently, we investigated the genetic and epigenetic pathogenic mechanism by comparing their core signaling pathways.

Potential Multiple-Molecule Targeting Drug Discovery
The deep neural network (DNN) was trained for the drug-target interaction model (DTI) via the drug-target interaction database. With the help of the DTI model, drugs having the possible interactions with biomarkers were predicted to be the candidate drugs. Then, the potential multiple-molecule targeting drug was selected for T2D treatment before clinical trials from the candidate drugs according to the drug design specifications of drug regulation ability, toxicity, sensitivity, and side effect.

Data Mining, Preprocessing and Candidate GWGENs Construction
In our research, the dataset with accession number GSE81608 was downloaded from the gene expression omnibus (GEO) of the National Center for Biotechnology Information (NCBI), and its relevant experimental platform was GPL16791. The dataset contained mRNA expression levels of genes, proteins, miRNAs, TFs, receptors, and lncRNAs in pancreatic α-cell, β-cell, δ-cell, and PP-cell. The samples of the dataset were assorted into two categories, i.e., T2D and non-T2D. In this study, to identify the T2D pathogenic mechanism on the pancreatic β-cell, the samples of the subtype β-cell were specifically extracted from the original experimental data. Furthermore, according to the WHO report, the age distribution of incidence in diabetes is at the range of approximately 50 years old and older. Therefore, 86 and 123 samples were chosen respectively for T2D and non-T2D with age equal to or greater than to 50 years old. Then, we constructed the candidate PPIN based on the Database of Interacting Proteins (DIP) [56], IntAct [57], the Biological General Repository for Interaction Datasets database (BioGRID) [58], the Biomolecular Interaction Network Database (BIND) [59], and the Molecular INTeraction Database (MINT) [60]. In addition, the candidate GRN was built based on the Integrated Transcription Factor Platform database (ITFP) [61], the Human Transcriptional Regulation Interactions database (HTRI) [62], and the TRANScription FACtor database (TRANSFAC) [63]. MiRNAs and lncRNAs regulations in GRN were referenced to the TargetScanHuman database [64], CircuitsDB [65], and StarBase2.0 [66].

Constructing the Systematic Model for the Candidate GWGEN of T2D and Non-T2D
For the purpose of imitating the human cellular system, we built the stochastic interactive and regulatory models to describe the candidate GWGEN. The candidate GWGEN was composed of PPIN containing the protein-protein interactions and GRN containing the regulations of genes, miRNAs, and lncRNAs. Next, we described the interactions of proteins and regulations of genes, lncRNAs, and miRNAs using the protein-protein interactive model (PPIM), gene regulatory model (GRM), lncRNA regulatory model (LRM), and miRNA regulatory model (MRM) in detail.
First, the q-th protein in PPIM can be described as the following equations: where p q [n] indicates the expression level of the q-th protein in the n-th sample and p r [n] indicates the expression level of the r-th protein in the n-th sample; κ qr denotes the interaction ability between the q-th protein and the r-th protein; G q represents the total number of proteins that interact with the q-th protein; Q denotes the total number of proteins in candidate PPIM; N means the total number of samples in our data; λ q,PPI M shows the basal level in the model of the q-th protein due to unknown protein interactions of histone modifications such as phosphorylation and acetylation; and µ q,PPI M [n] expresses the data noise of the q-th protein.
Second, the transcriptional regulation of the x-th gene in GRM is given as below: , for x = 1, . . . , X, n = 1, . . . , N (2) where g x [n] denotes the expression level of the x-th gene in the n-th sample; t u [n], l v [n], and m w [n] individually indicate the expression level of the u-th TF, the v-th lncRNA and the w-th miRNA of the n-th sample; U x , V x , and W x separately mean the total binding number of TFs, lncRNAs and miRNAs; α xu shows the transcriptional regulatory ability from the u-th TF to the x-th gene; β xv represents the transcriptional regulatory ability from the v-th lncRNA to the x-th gene; γ xw ≥ 0 expresses the post-transcriptional regulatory ability of the w-th miRNA on the x-th gene; X denotes the total number of gene in GRNs; N indicates the total number of data samples; λ x,GRM means the basal level of the x-th gene because of the unknown gene regulations such as methylation; and µ x,GRM [n] is the data noise.
Third, TFs, lncRNAs, and miRNAs also have a potential impact on the i-th lncRNA and we can depict this behavior by the LRM in candidate GWGENs. The equation is obtained as follows: , for i = 1, . . . , I, n = 1, . . . , N where l i [n] indicates the expression level of the i-th lncRNA; t u [n], l v [n], and m w [n] represent the expression level of the u-th TF, the v-th lncRNA, and the w-th miRNA of the n-th sample, respectively; U i , V i , and W i individually show the total binding number of TFs, lncRNAs and miRNAs. σ iu expresses the transcriptional regulatory ability from the u-th TF to the i-th lncRNA; ς iv means the transcriptional regulatory ability from the v-th lncRNA to the i-th lncRNA; τ iw ≥ 0 denotes the post-transcriptional regulatory ability from the w-th miRNA to the i-th lncRNA; I is the total number of lncRNAs and N indicates the total number of samples; λ i,LRM denotes the basal level of the i-th lncRNA; µ i,LRM [n] expresses the data noise.
Fourth, the expression of the j-th miRNA is also affected by the TFs, lncRNAs, and miRNAs. Furthermore, we can illustrate MRM in candidate GWGENs through the following equation: , for j = 1, . . . , J, n = 1, . . . , N (4) where m j [n] means the expression level of j-th miRNA; t u [n], l v [n], and m w [n] separately represent the expression level of the u-th TF, the v-th lncRNA and the w-th miRNA, respectively; U j , V j , and W j show the binding total number of TFs, lncRNAs and miRNAs; ω ju denotes the transcriptional regulatory ability from the u-th TF to the j-th miRNA; ξ jv expresses the transcriptional regulatory ability from the v-th lncRNA to the j-th miRNA; ψ jw indicates the post-transcriptional regulatory ability from the w-th miRNA to the j-th miRNA; J is the total number of miRNAs and N indicates the total number of samples; λ j.MRM is the basal level of the j-th miRNA; µ j,MRM [n] denotes the data noise.

The System Identification and System Order Detection Methods for Real GWGENs Identification
According to the above stochastic models, PPIM in (1) composed the candidate PPIN; GRM in (2), LRM in (3), and MRM in (4) constituted the candidate GRN. We made use of the system identification and system order detection methods to obtain the real GWGENs of T2D and non-T2D by the corresponding RNA-seq data, respectively. In order to identify the parameters of these stochastic models, Equations (1)-(4) could separately be rewritten as the following linear regression forms. (8) for q = 1, . . . , Q, x = 1, . . . , X, i = 1, . . . , I, j = 1, . . . , J, n = 1, . . . , N, where (5), (6), (7), and (8) are separately regression forms for PPIM, GRM, LRM, and MRM. Q, X, I, and J are respectively the total number of proteins, genes, lncRNAs and miRNAs in the candidate GWGWN, and N is the total number of samples.
The linear regression forms in (5), (6), (7), and (8) could be simplified as the following formulas: where the Φ q,PPI M [n], Φ x,GRM [n], Φ i,LRM [n], and Φ j,MRM [n] individually denote the regression vectors of proteins, gene, lncRNAs, and miRNAs in the n-th sample; θ q,PPI M means the parameter vector of the protein-protein interaction abilities and protein basal levels; θ x,GRM , θ i,LRM , and θ j,MRM are the parameter vector of the transcriptional regulatory abilities and basal levels of the genes, lncRNAs, and miRNAs, respectively; ε q,PPI M , ε x,GRM , ε i,LRM , and ε j,MRM are individually the data noise for PPIM, GRM, LRM and MRM. For N samples, the above regression equations are given as below: . . .
The above equations could be individually represented as the follows: M j = Φ j,MRM · Θ j,MRM + E j,MRM , for j = 1, . . . , J where Φ q,PPI M , Φ x,GRM , Φ i,LRM , and Φ j,MRM are separately the regression matrix of proteins, genes, lncRNAs and miRNAs of N samples. Θ q,PPI M , Θ x,GRM , Θ i,LRM , and Θ j,MRM are the corresponding interactive and regulatory parameter vectors. E q,PPI M , E x,GRM , E i,LRM , and E j,MRM are the corresponding data noise vectors.
What is worth noticing is that the maximum degree of the parameter estimation of proteins in PPIs and genes in GRNs must be less than the samples; otherwise, it would cause the overfitting problem during the process of system identification.
Firstly, for the purpose of identifying the real GWGENs, we adopted the least square method to estimate the parameter vectors θ q,PPI M , θ q,GRM , θ q,LRM , and θ q,MRM with negative regulation constraint on miRNA as follows: , for 1,...,  What is worth noticing is that the maximum degree of the parameter estimation of proteins in PPIs and genes in GRNs must be less than the samples; otherwise, it would cause the overfitting problem during the process of system identification.
Firstly, for the purpose of identifying the real GWGENs, we adopted the least square method to estimate the parameter vectors , , for 1,...,  What is worth noticing is that the maximum degree of the parameter estimation of proteins in PPIs and genes in GRNs must be less than the samples; otherwise, it would cause the overfitting problem during the process of system identification.
Firstly, for the purpose of identifying the real GWGENs, we adopted the least square method to estimate the parameter vectors , , , where , q PPIM Ω means the estimated residual error of the q-th protein for the least square parameter estimation in (21) and q Q denotes the number of protein interactions with the q-th protein.
, , Based on the above constrained optimization problems in Equations (21)-(24), we sought out the optimal solution of the interactive ability parameters among proteinŝ Θ q,PPI M , the regulatory parameters of genesΘ x,GRM , lncRNAsΘ i,LRM and miRNAŝ Θ j,MRM via the RNA-seq data of non-T2D and T2D, respectively. The above optimization problems for parameter estimation could be solved by the MATLAB optimization toolbox. Carefully, the negative inequality constraints in Equations (21)-(24) mean that the regulatory parameters of miRNAs should be less than or equal to zero to ensure the negative regulation of miRNAs on genes, lncRNAs and miRNAs.
After the parameter estimation of candidate GWGENs of non-T2D and T2D by the corresponding RNA-seq data, we used the system order detection method, AIC, to detect the system order (the number of interactions of each protein or the number of regulations of each gene, lncRNA and miRNA). The detailed equations of AIC for each protein, gene, lncRNA and miRNA are shown below.
where Ω q,PPI M means the estimated residual error of the q-th protein for the least square parameter estimationΘ q,PPI M in (21) and Q q denotes the number of protein interactions with the q-th protein.
where Ω x,GRM denotes the estimated residual error of the x-th gene in (22) and O x,GRM means the number of regulations of the genes, lncRNAs and miRNAs on the x-th gene; Θ x,GRM is the estimated parameters in (22).
where Ω i,LRM shows the estimated residual error of the i-th lncRNA in (23) and O i,LRM indicates the number of regulations of the genes, lncRNAs and miRNAs on the i-th lncRNA; Θ i,LRM expresses the estimated parameters in (23).
where Ω j.MRM expresses the estimated residual error of the j-th miRNA in (24) and O j.MRM represents the number of parameters regulations of the genes, lncRNAs and miRNAs on the j-th miRNA;Θ j,MRM is the estimated parameter in (24). According to the order detection of AIC in system identification [67], the real order of a system (i.e., the number of interactions of the q-th protein in (1) or the number of regulations on the x-th in (2)) is to minimize the AIC. Therefore, the true number of interactions or regulations for each protein, gene, lnRNA and miRNA in candidate GWGENs can be obtained by solving the following AIC minimization problems.
where Q * q denoted the true number of protein interactions for the q-th protein; U * x , V * x , W * x individually indicate the true number of regulations of genes, lncRNAs and miRNAs on the x-th gene; U * i , V * i , W * i denote the true number of regulations of genes, lncRNAs and miRNAs on the i-th lncRNA, respectively; U * j , V * j , W * j are separately the true number of regulations of genes, lncRNAs and miRNAs on the j-th miRNA. Therefore, the proteinprotein interactions and gene, miRNA, and lncRNA regulations out of true order by AIC minimization problems in (29)-(32) are considered as false positives in candidate GWGEN of non-T2D and T2D and should be removed one by one to obtain the real GWGEN.

The Principal Network Projection (PNP) Method for the Core GWGENs Extraction from Real GWGENs
The real GWGENs of non-T2D and T2D were compared to investigate the genetic and epigenetic pathogenic molecular mechanism. However, it was still harder to analyze the two larger scale and complicated real GEGENs so that we applied the principal network projection (PNP) method on the basis of the singular value decomposition (SVD) to extract the core GWGENs from the real GWGENs. Before studying the core network extraction in depth, we will start by introducing the real GWGEN network matrix H. Network matrix H consists of interactions among proteins and regulations of the TF-gene, TF-lncRNA, TF-miRNA, lncRNA-gene, lncRNA-lncRNA, lncRNA-miRNA, miRNA-gene, miRNA-lncRNA, and miRNA-miRNA in the real GWGEN as the follows:    1  2   11  12  1  1  11  12  1  1  11  12  1  1   21  22  2  2  21  22  2  2  21  22  2  2   1  2  1 2 1ˆˆˆˆx where ˆq r κ is the interaction ability of between the q-th protein and the r-th protein; ˆx u α , ˆx v β , and ˆx w γ are individually the regulation abilities of the u-th TF on the x-th gene, the v-th lncRNA on the x-th gene, and the w-th miRNA on the x-th gene; ˆi u σ , ˆi v ς , and ˆi w τ represent the regulation abilities of the u-th TF on the i-th lncRNA, the v-th lncRNA on the i-th lncRNA, and the w-th miRNA on the i-th lncRNA, respectively; ˆj u ω , ˆj v ξ , and ˆj w ψ separately show the regulation abilities of the u-th TF on the j-th miRNA, the v-th lncRNA on the j-th miRNA, and the w-th miRNA on the w-th miRNA. In addition, some zeros are omitted in the matrix, which means that there is neither interaction nor regulation between the source and target. Thereafter, the core GWGENs were obtained by applying PNP on the network matrix H with an energy threshold of 85%. First, the network matrix H is decomposed by whereκ qr is the interaction ability of between the q-th protein and the r-th protein; α xu ,β xv , andγ xw are individually the regulation abilities of the u-th TF on the x-th gene, the v-th lncRNA on the x-th gene, and the w-th miRNA on the x-th gene;σ iu ,ς iv , andτ iw represent the regulation abilities of the u-th TF on the i-th lncRNA, the v-th lncRNA on the i-th lncRNA, and the w-th miRNA on the i-th lncRNA, respectively;ω ju ,ξ jv , andψ jw separately show the regulation abilities of the u-th TF on the j-th miRNA, the v-th lncRNA on the j-th miRNA, and the w-th miRNA on the w-th miRNA. In addition, some zeros are omitted in the matrix, which means that there is neither interaction nor regulation between the source and target. Thereafter, the core GWGENs were obtained by applying PNP on the network matrix H with an energy threshold of 85%. First, the network matrix H is decomposed by singular value decomposition (SVD) as follows [68]: where S ∈ R (Q * +X * +I * +J * )×(Q * +X * +I * +J * ) and D T ∈ R (U * +V * +W * )×(U * +V * +W * ) are the unitary singular matrices; V = diag(v 1 , · · · , v ii , · · · v U * +V * +W * ) ∈ R (Q * +X * +I * +J * )×(U * +V * +W * ) denotes the diagonal matrix of which the components at the diagonal are the singular values of H and are arranged in descending order, i.e., v In addition, we defined the normalization of singular values in (36) as below.
From the above formula, the top I significant singular vector structures were selected to represent the system with energy equal to or more than 85%. Then we respectively projected each node of the real GWGEN (i.e., each row of network matrix H) to the top I singular vectors as follows.
Z(a, b) = h a,: · d T b,: , for a = 1, . . . , Q * + X * + I * + J * , b = 1, . . . , I where Z(a, b) denotes the projection value of the a-th node on the b-th significant singular vector; h a,: means the a-th row vector of network matrix H, and d T :,b denotes the the b-th column of D T . Next, we define the 2-norm projection value to each node such as protein, gene, lnRNA and miRNA in real GWGEN from the top I significant singular vectors as below.
According to the equation in (40), the top 3000 pivotal proteins, genes, miRNAs, and lncRNAs with higher projection value were selected to construct the core GWGENs for T2D and non-T2D, respectively. Afterwards, the core GWGENs were uploaded to the DAVID website for KEGG pathway enrichment analysis, and the construction of core signaling pathways for non-T2D and T2D were accomplished with the help of the annotation of KEGG pathways. The enrichment analysis was used to validate that our results were associated with T2D. Eventually, the potential biomarkers were chosen through investigating the T2D pathogenesis by comparing the non-T2D and T2D core signaling pathways.

Systematic Drug Discovery Based on Drug Design Specifications for T2D
Based on drug design specifications, we aimed to discover a potential multiplemolecule targeting drug for the identified biomarkers. We proposed a DTI model based on a deep neural network to predict the drug-target interaction between the available drugs and targets (biomarkers). Since it is not enough to consider the drug-target interaction alone for drug design, some specifications, i.e., regulation ability, toxicity, sensitivity and side effect are necessary to sieve the candidate drugs predicted by the DTI model. Then, with these considerations, we suggested an appropriate multiple-molecule targeting drug for T2D treatment before clinical trials.
First, based on the flowchart of the systematic drug discovery method in Figure 3, we accessed an integrated collection of protein-ligand affinity data through BindingDB's unified interface [69], which harvests the selected data and information from multiple existing databases, i.e., PubChem, ChEMBL, UniProt, DrugBank, etc. (for more details, readers can refer to Appendix B). Recently, the feature-based method, for instance, molecular descriptor, is broadly used to describe the structural and chemical properties of molecules such as characteristics from the 2D and 3D spectrum of structure, molecular weight, hydrophilic, hydrophobicity, etc. It was validated that the chemical properties of the drug and genomic sequence of the target could be described with the molecular descriptor for the purpose of convenient analysis in drug design, since the molecular descriptor can transform complicated chemical properties into a simple numerical feature vector [70,71]. On this ground, we utilized the functions from python package pyBioMed to transform both the drug and target into a descriptor as their features individually under the python2.7 environment. The considered drug features of a molecule included constitutional descriptors, connectivity indices, E-state indices, charge descriptors, molecular properties and kappa shape indices. For the target features, the structural and physicochemical features of proteins and peptides from amino acid sequence such as amino acid composition, dipeptide composition . . . , etc. are calculated (for more detailed information about the descriptor transformation, readers could access the documents of pyBioMed [72]). Then, the descriptor of the drug and target were combined into a feature vector v drug-target corresponding to the drug-target pair as below [73]: Among v drug−target , 363 features for a drug and 996 features for a target were collected, where the former features in v drug-target are for the drug and the latter are for the target. d 1 represents the first drug feature; t 1 indicates the first target feature; M is the total number of drug features; and N denotes the total number of target features. Before training the DNNbased DTI model, we encountered a problem that features are not in the same standing. Since the variables of the features are measured at different scales, they do not contribute equally to the model fitting and might end up creating a bias, i.e., the feature with a larger value would dominate the result. To deal with this potential problem, a feature-wise scaling is usually implemented prior to model fitting. As powerful techniques of feature scaling, Min-max Normalization and Standardization methods are commonly used for bringing every feature in the same footing without any upfront importance. Although Min-max Normalization can also normalize the data into the same scale, it is much more sensitive to outliers compared to Standardization. Therefore, Standardization was performed on the features before applying principal component analysis (PCA) to improve the model performance, and the corresponding mathematical formulation is shown as follows: where d i is the i-th drug feature and the d * i is the i-th drug feature after Standardization; µ i and σ i individually represent the mean and standard deviation of the i-th drug feature; t j indicates the j-th feature of the target and t * j denotes the j-th feature of the target after Standardization; µ j and σ j separately signify the mean and standard deviation of the j-th target feature; M denotes the total number of drug features; and N is the total number of target features.
where x and h denote input and output, respectively; w is the weighting matrix and b is the bias vector; σ(·) indicates the activation function with Rectified Linear Unit (ReLU) in the hidden layer and Sigmoid in the output layer. Since the binary classification issue is concerned, the binary-cross entropy is chosen as the cost function to calculate the model loss: where p n means the truth label of positive interaction;p n indicates the predictive probability of positive interaction, 1 − p n shows the truth label of negative interaction, and 1 −p n represents the predicted probability of negative interaction. L(p n , p n ) denotes the average of total loss C(p n , p n ). According to the cost function, the backward propagation algorithm is applied to update the model parameter set θ containing the weighting matrix and bias vector through calculating the gradient of cost function in (46) to get the result in (50) and eventually derive the optimal solution θ * in (48) as follows.
where l is the l-th epoch of learning procedure; η is the learning rate; and ∇L(θ l−1 ) is the gradient of L(θ l−1 ) as below: Based on the backward propagation method, the DNN-based DTI model could adjust the parameters to fit the drug-target interaction data at each iteration well. In addition, the hyperparameters were tuned to not only lower the training time but also achieve the best model performance. We used Adam [74] as an optimizer with a default setting and set the learning rate as 0.0001 to make the model parameter θ converge faster and precisely. We set 100 for epochs and 100 for batch size. For the data, we split one-fourth of the data as testing data and three-fourths of it as training data. Moreover, we further divided the training data into ten equal folds to perform ten-fold cross-validation, in which nine-tenths of them were used for model training and one-tenth was used for validation. Such application is exploited to supervise whether the model was better than that of the former epoch and to guarantee the model stability. Furthermore, to avoid overfitting, not only did we embed the dropout layer (dropout rate = 0.4) behind each hidden layer but also applied the early stopping strategy to monitor whether the test accuracy decreased with the continuous improvement of training accuracy or not. After accomplishing model training as shown in Figure 4, we adopted the AUC (area under the curve) score and ROC (receiver operating characteristics) curve [75] in Figure 7 as the performance measurement. It is one of the most useful evaluation metrics to visualize the model performance when it comes to the binary classification problems. The higher AUC score is that in which the area under the line is larger; the better accuracy is for the DNN-based DTI model predicting the true positive and true negative drug-target interaction. The formulas for the AUC score and ROC curve are shown below.
where TP (True Positive) means that the real value is true and is judged correctly; TN (True Negative) shows that the real value is true and is judged by mistake; FP (False Positive) indicates the real value is false and is judged accurately; FN (False Negative) represents that the real value is false and is judged in error.
It is worth noting that the majority of previous network approaches use machine learning (ML)-based methods to perform predictions over the drug-target interaction space [76][77][78]. However, such techniques have major limitations. Traditional ML is a timeconsuming process and requires lots of expertise to design and run the algorithms. Without a good understanding of the domain knowledge and feature engineering, a traditional machine algorithm can hardly work well.
As a kind of ML-based model with multiple hidden layers and a more complicated parameter training procedure, the deep learning method attracts lots of attention for its relatively better performance and ability to learn representations of data with multiple levels of abstraction [79]. When there is a lack of domain understanding for feature introspection, deep learning techniques outshine others as we do not have to worry much about feature engineering. Additionally, the comparison of deep-learning methods with other acceptable ML algorithms in the task of new DTIs identification has previously been performed as well, where framework based on deep learning could indeed achieve relatively high prediction performance [80]. As a result, for each algorithm compared in our work, only default parameters without fine-tuning were set to learn features from the data. However, a disadvantage should be solved that there are no experimental validated noninteracting drug-target pairs so that it is difficult to select negative samples, which would largely influence the predictive accuracy of the method [81]. Hence, apart from extracting a great number of samples from the presently largest database, BingdingDB, we further followed the criteria in Appendix B to abstract negative examples from existing drug-target interactions, which enabled us to evaluate and manipulate the data more realistically to achieve better performance.

Conclusions
In this study, on the basis of our proposed combination of systems biology and systematic drug discovery design, we not only investigated the complicated pathogenic molecular mechanism of T2D from genetic and epigenetic perspectives but also discovered a potential drug combination for the clinical treatment of T2D based on four drug design specifications. At first, we constructed the stochastic biological networks by systematic identification and system order detection methods by exploring big data. After that, we extracted the core signaling pathways by the PNP method and the annotation of KEGG pathways to select the significant biomarkers from the pathogenesis of T2D. For the purpose of discovering candidate drugs interacting with these biomarkers, we trained a DNN-based DTI model to predict the possible drug-target interactions. Moreover, we considered the drug regulation ability, toxicity, sensitivity and side effects as the drug design specifications to better sieve appropriate potential drugs. As a result, a set of combinational multiple molecular drugs is proposed as a multiple-molecule targeting drug for T2D treatment. Since the beginning of this century, the advent of the genomic era has presented researchers with a myriad of high throughput genome-wide biological data, which can assist in the interpretation of the indecipherable genetic and epigenetic regulations and the optimization of drug efficacy. Considering the combination of multiple types of genomics data could benefit us to gain deeper insight into the pathogenic mechanism of diseases. It is expected that our systems biology and systematic drug discovery design might provide a new orientation for T2D therapeutics.    . The core signaling pathways of non-T2D based on our result for investigating the non-T2D genetic and epigenetic molecular mechanism. The genes and proteins in the core signaling pathways were chosen from the non-T2D core GWGENs. The gene regulations and protein interactions were constructed on the basis of the edges in non-T2D core GWGENs. The cellular functions caused by target genes are clustered with solid lines in different colors. Figure A3. The core signaling pathways of non-T2D based on our result for investigating the non-T2D genetic and epigenetic molecular mechanism. The genes and proteins in the core signaling pathways were chosen from the non-T2D core GWGENs. The gene regulations and protein interactions were constructed on the basis of the edges in non-T2D core GWGENs. The cellular functions caused by target genes are clustered with solid lines in different colors. Figure A4. The core signaling pathways of T2D based on our result for investigating the T2D genetic and epigenetic molecular mechanism. The gene regulations and protein interactions were constructed on the basis of the edges in T2D core GWGENs. The cellular functions caused by target genes are clustered with solid lines in different colors. The cellular functions caused by target genes are clustered with solid lines in different colors. The bold arrowhead marks in black denote the relatively low and high expression in T2D pathogenic signaling pathways in contrast to non-T2D.Appendix B. Figure A4. The core signaling pathways of T2D based on our result for investigating the T2D genetic and epigenetic molecular mechanism. The gene regulations and protein interactions were constructed on the basis of the edges in T2D core GWGENs. The cellular functions caused by target genes are clustered with solid lines in different colors. The cellular functions caused by target genes are clustered with solid lines in different colors. The bold arrowhead marks in black denote the relatively low and high expression in T2D pathogenic signaling pathways in contrast to non-T2D.

Appendix B
BindingDB is a well accessible database of measured binding affinities, focusing chiefly on the interactions among small molecules and proteins. It provided about one million binding data for thousands of small molecules and proteins. By the following five criteria, we collected a binary classification dataset with 33,777 samples for positive examples and 27,493 for negative instances.

1.
The chemical identifier (PubChem CID) is recorded, and the small molecule has chemical structure expressed by SMILES (Although both SMILES and InChI are recorded in BindingDB dataset, SMILES is easier to read and more supported by software).

2.
The protein identifier (Uniprot ID) is recorded, and the protein is represented by the sequence in Fasta format.

3.
The half maximal inhibitory concentration (IC50) value, a primary measure of binding effectiveness, is recorded 4.
The chemical molecule weight is less than 1000 Da due to our focus on small molecule drugs.

5.
According to the activity threshold discussed by Wang et al. [82], it is recorded as positive if the IC50 is less than 100 nm and negative if IC50 is greater than 10,000 nm.
Since most drug-target interaction databases do not provide real negative examples, it is a common solution to randomly sample a small number of unknown interactions as negative examples. However, unknown interactions do not mean negation, i.e., without interactions. There might just be no experimental evidence or record at present. Therefore, we followed the above procedure to evaluate and classify BindingDB samples more practically.