Identifying Drug Targets of Oral Squamous Cell Carcinoma through a Systems Biology Method and Genome-Wide Microarray Data for Drug Discovery by Deep Learning and Drug Design Specifications

In this study, we provide a systems biology method to investigate the carcinogenic mechanism of oral squamous cell carcinoma (OSCC) in order to identify some important biomarkers as drug targets. Further, a systematic drug discovery method with a deep neural network (DNN)-based drug–target interaction (DTI) model and drug design specifications is proposed to design a potential multiple-molecule drug for the medical treatment of OSCC before clinical trials. First, we use big database mining to construct the candidate genome-wide genetic and epigenetic network (GWGEN) including a protein–protein interaction network (PPIN) and a gene regulatory network (GRN) for OSCC and non-OSCC. In the next step, real GWGENs are identified for OSCC and non-OSCC by system identification and system order detection methods based on the OSCC and non-OSCC microarray data, respectively. Then, the principal network projection (PNP) method was used to extract core GWGENs of OSCC and non-OSCC from real GWGENs of OSCC and non-OSCC, respectively. Afterward, core signaling pathways were constructed through the annotation of KEGG pathways, and then the carcinogenic mechanism of OSCC was investigated by comparing the core signal pathways and their downstream abnormal cellular functions of OSCC and non-OSCC. Consequently, HES1, TCF, NF-κB and SP1 are identified as significant biomarkers of OSCC. In order to discover multiple molecular drugs for these significant biomarkers (drug targets) of the carcinogenic mechanism of OSCC, we trained a DNN-based drug–target interaction (DTI) model by DTI databases to predict candidate drugs for these significant biomarkers. Finally, drug design specifications such as adequate drug regulation ability, low toxicity and high sensitivity are employed to filter out the appropriate molecular drugs metformin, gefitinib and gallic-acid to combine as a potential multiple-molecule drug for the therapeutic treatment of OSCC.


Introduction
In recent years, oral cancer has been the eighth most common malignancy of the head and neck, with more than 145,500 deaths worldwide each year [1,2], and oral squamous cell carcinoma (OSCC) accounts for approximate 90% of all cancers in the oral cavity [3]. Despite many advancements in cancer treatment, the 5-year survival rate for OSCC patients is 50% [4], which has remained unchanged over the past decade. In recent years, various environmental factors, such as smoking and chewing betel nut, are the main causes of OSCC [5]. However, not everyone exposed to these triggers will develop oral cancer. Many studies have shown that the occurrence of oral cancer is the result of oncogene activation or tumor suppressor gene inactivation. Therefore, a better understanding of the regulatory target interaction (docking), adequate drug regulation ability, low toxicity and high sen sitivity to select adequate molecular drugs for biomarkers, which are combined as a mul tiple-molecular drug of OSCC, from the perspective of system engineering. As seen in the flowchart of the systematic drug design procedure in a DNN-based DTI model wa trained by DTI databases in advance. Then, the well-trained DNN-based DTI model could predict a set of candidate molecular drugs for each biomarker (drug target). Eventually with the help of the above drug design specifications, we chose and combined metformin gefitinib and gallic-acid among the set of candidate molecular drugs as the multiple-mol ecule drug to target the biomarkers HES1, TCF, NF-κB and SP1 for OSCC. Taken together we expect that the proposed systematic medicine discovery and design procedure can provide an efficient way to design a multiple-molecule drug as a new therapy for OSCC treatment before clinical trials. Figure 1. Flowchart of the systems biology method and the outline of the systematic drug discovery design. The candidate GWGEN consists of a gene regulation network (GRN) and protein-protein in teraction network (PPIN), where the candidate GRN was constructed through integrating gene regu lation databases, and candidate PPI was constructed via protein-protein interaction databases. Th candidate GWGEN was identified to obtain real GWGEN by OSCC microarray data from GSE3078 and GSE17913 through system identification and system order detection, and core GWGEN was ex tracted from real GWGEN by the PNP method. The core signaling pathways of non-OSCC and OSCC are obtained by core GWGENs of non-OSCC and OSCC via the denotation of KEGG pathways, re spectively. The carcinogenic biomarkers were identified by comparing the core signaling pathway and their down streaming abnormal cellular functions of non-OSCC and OSCC. The DNN-based DT model can be employed to predict candidate molecular drugs for these carcinogenic biomarkers, and drug design specifications are used to select a multiple-molecule drug for OSCC. Figure 1. Flowchart of the systems biology method and the outline of the systematic drug discovery design. The candidate GWGEN consists of a gene regulation network (GRN) and protein-protein interaction network (PPIN), where the candidate GRN was constructed through integrating gene regulation databases, and candidate PPI was constructed via protein-protein interaction databases. The candidate GWGEN was identified to obtain real GWGEN by OSCC microarray data from GSE30784 and GSE17913 through system identification and system order detection, and core GWGEN was extracted from real GWGEN by the PNP method. The core signaling pathways of non-OSCC and OSCC are obtained by core GWGENs of non-OSCC and OSCC via the denotation of KEGG pathways,

Overview of the Systems Biology Method and the Systematic Drug Discovery and Design of OSCC
In order to obtain insight into the carcinogenic mechanism to identify significant carcinogenic biomarkers as drug targets of OSCC, we search for potential molecular drugs to target these significant biomarkers by a deep neural network (DNN)-based DTI model and drug design specifications from the viewpoint of system engineering. The first step is to construct a candidate GWGEN of non-OSCC and OSCC by big data mining from the databases DIP [22], IntAct [23], BioGRID [24], MINT [25] HTRIdb [26], ITFP [27], TRANSFAC [28], CircuitDB [29], TargetScanHuman [30] and starBase 2.0 [31]. Then, the system identification method in Equations (1)- (24) by the microarray data of non-OSCC and OSCC and the system order detection method in Equations (25)- (32) are employed to construct real GWGENs of non-OSCC and OSCC in Figure 2 by pruning off the false positives from candidate GWGEN, respectively. Since, at most, 6000 molecules in real GWGENs can be annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, the principal network projection method (PNP) in Equations (33)- (40) is used for extracting the core GWGENs in Figure 3, i.e., the core GWGEN network of non-OSCC and the core GWGEN network of OSCC are at the maximum with 6000 significant nodes. The core GWGENs extracted by the PNP method from the real GWGENs can also simplify the investigation of the carcinogenic mechanism of OSCC. The numbers of proteins, TFs, Receptors, LncRNAs and miRNAs of core GWGENs are also indicated. The core signaling pathways of non-OSCC and OSCC are constructed by projecting the corresponding core GWGENs to KEGG significant pathways in Figure 4. By comparing the core signaling pathways and the downstream abnormal cellular functions between non-OSCC and OSCC in Figure 4, we could investigate the carcinogenic mechanism of OSCC, from which the significant biomarkers were identified as drug targets for the therapeutic treatment of OSCC. Furthermore, the deep neural network of the drug-target interaction (DTI) model is trained in Figure 5 to predict candidate molecular drugs, which are selected by drug design specifications such as adequate drug regulatory ability, low toxicity and high sensitivity as potential molecular drugs that can be combined as a multiple-molecule drug of OSCC.

Overview of the Systems Biology Method and the Systematic Drug Discovery and Design of OSCC
In order to obtain insight into the carcinogenic mechanism to identify significant carcinogenic biomarkers as drug targets of OSCC, we search for potential molecular drugs to target these significant biomarkers by a deep neural network (DNN)-based DTI model and drug design specifications from the viewpoint of system engineering. The first step is to construct a candidate GWGEN of non-OSCC and OSCC by big data mining from the databases DIP [22], IntAct [23], BioGRID [24], MINT [25] HTRIdb [26], ITFP [27], TRANSFAC [28], CircuitDB [29], TargetScanHuman [30] and starBase 2.0 [31]. Then, the system identification method in Equations (1)- (24) by the microarray data of non-OSCC and OSCC and the system order detection method in Equations (25)- (32) are employed to construct real GWGENs of non-OSCC and OSCC in Figure 2 by pruning off the false positives from candidate GWGEN, respectively. Since, at most, 6000 molecules in real GWGENs can be annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, the principal network projection method (PNP) in Equations (33)-(40) is used for extracting the core GWGENs in Figure 3, i.e., the core GWGEN network of non-OSCC and the core GWGEN network of OSCC are at the maximum with 6000 significant nodes. The core GWGENs extracted by the PNP method from the real GWGENs can also simplify the investigation of the carcinogenic mechanism of OSCC. The numbers of proteins, TFs, Receptors, LncRNAs and miRNAs of core GWGENs are also indicated. The core signaling pathways of non-OSCC and OSCC are constructed by projecting the corresponding core GWGENs to KEGG significant pathways in Figure 4. By comparing the core signaling pathways and the downstream abnormal cellular functions between non-OSCC and OSCC in Figure 4, we could investigate the carcinogenic mechanism of OSCC, from which the significant biomarkers were identified as drug targets for the therapeutic treatment of OSCC. Furthermore, the deep neural network of the drug-target interaction (DTI) model is trained in Figure 5 to predict candidate molecular drugs, which are selected by drug design specifications such as adequate drug regulatory ability, low toxicity and high sensitivity as potential molecular drugs that can be combined as a multiple-molecule drug of OSCC.
(A) Real GWGEN of non-OSCC  The core GWGEN of OSCC. The core GWGENs were extracted by the PNP method from the real GWGENs to simplify the annotation of KEGG pathways for the carcinogenic mechanism analysis of OSCC. The numbers denote the node numbers of proteins, TFs, Receptors, LncRNAs and miRNAs, respectively. The purple lines indicate the protein-protein interactions, and the orange lines denote the gene regulations. The core GWGEN of OSCC. The core GWGENs were extracted by the PNP method from the real GWGENs to simplify the annotation of KEGG pathways for the carcinogenic mechanism analysis of OSCC. The numbers denote the node numbers of proteins, TFs, Receptors, LncRNAs and miRNAs, respectively. The purple lines indicate the protein-protein interactions, and the orange lines denote the gene regulations.
(B) Core GWGEN of OSCC The core GWGEN of OSCC. The core GWGENs were extracted by the PNP method from the real GWGENs to simplify the annotation of KEGG pathways for the carcinogenic mechanism analysis of OSCC. The numbers denote the node numbers of proteins, TFs, Receptors, LncRNAs and miRNAs, respectively. The purple lines indicate the protein-protein interactions, and the orange lines denote the gene regulations. Figure 4. The core signaling pathways in three blocks represent specific OSCC, common non-OSCC and specific non-OSCC core signaling pathways from left to right, respectively. The core signaling pathways of non-OSCC and OSCC are based on the annotation of core GWGENs of non-OSCC and OSCC in Figure 3, respectively. For investigating the genetic and epigenetic carcinogenic mechanism of OSCC, the core signaling pathways and the downstream abnormal cellular functions of non-OSCC and OSCC are compared. The genes and proteins in the core signaling pathways were chosen from core GWGENs of the non-OSCC and OSCC by the annotation of KEGG pathways. The gene regulations and protein interactions were constructed based on the edges in core GWGENs of non-OSCC and OSCC. The low and high expression arrow-head marks are relative to non-OSCC. . The core signaling pathways in three blocks represent specific OSCC, common non-OSCC and specific non-OSCC core signaling pathways from left to right, respectively. The core signaling pathways of non-OSCC and OSCC are based on the annotation of core GWGENs of non-OSCC and OSCC in Figure 3, respectively. For investigating the genetic and epigenetic carcinogenic mechanism of OSCC, the core signaling pathways and the downstream abnormal cellular functions of non-OSCC and OSCC are compared. The genes and proteins in the core signaling pathways were chosen from core GWGENs of the non-OSCC and OSCC by the annotation of KEGG pathways. The gene regulations and protein interactions were constructed based on the edges in core GWGENs of non-OSCC and OSCC. The low and high expression arrow-head marks are relative to non-OSCC.
The collected microarray data were classified into non-OSCC and OSCC groups, as shown in Table 1. Based on protein interaction and gene regulation databases, since the candidate GW-GEN was constructed due to different biological conditions and computational predictions in these databases, there are many false positives in the candidate GWGEN. Therefore, the system identification method in Systems Biology [32,33] is employed to identify the protein interactions and gene regulations by the microarray data of OSCC and non-OSCC. Systems order detection is employed to prune off the false positive protein interactions out of the interaction order of each protein and the gene regulations out of the regulation order of each gene in candidate GWGEN to obtain the real GWGENs of OSCC and non-OSCC. The real GWGENs were too complex and large for analyzing the carcinogenesis of OSCC by the annotation of KEGG pathways. To solve this problem, the core GWGENs were extracted from the real GWGENs of OSCC and non-OSCC by the PNP method to reduce the network size in order to simplify the following carcinogenic analysis of OSCC by the annotation through KEGG pathways. In Table 2, the sizes of the core GWGENs of OSCC are smaller than real OSCC GWGENs. Both the real OSCC GWGEN and the real non-OSCC were plotted by Cytoscape software in Figure 2. The corresponding core GWGENs for non-OSCC and OSCC were plotted by Cytoscape software in Figure 3. Moreover, the core signaling pathways were obtained by the annotation of core GWGENs by KEGG pathways to investigate the carcinogenic mechanism by comparing core signaling pathways. The KEGG pathway enrichment analyses of the core signaling pathways of OSCC and non-OSCC are given in Tables 3 and 4, respectively. The core signaling pathways for OSCC and non-OSCC are given in Figure 4. Then, based on the core signal pathways and their downstream abnormal cellular functions of OSCC and non-OSCC in Figure 4, we will investigate the genetic and epigenetic carcinogenic mechanism of OSCC in the following.   The content in the table shows the number of nodes. The rows of the table contain different types of nodes.

Investigating the Genetic and Epigenetic Carcinogenic Mechanism of OSCC
Cancer is a disease mainly caused by abnormal signaling pathways. Smoking and drinking can cause changes in the microenvironment of the human body [34,35]. In our study, adverse factors such as JAG1, Wnt, ECM, EGF and IGF1 produced by the human microenvironment ultimately lead to cellular dysfunctions. From comparing the core signaling pathways of OSCC and non-OSCC in Figure 4, the core signaling pathways of OSCC and their downstream abnormal cellular functions need to be investigated into the genetic and epigenetic carcinogenic mechanism of OSCC. The core signal pathways and their downstream cellular functions of OSCC are investigated as follows: (i) Abnormal MAPK signaling pathway in OSCC In Figure 4, the ligand epithelial growth factor (EGF) in the micro-environment of OSCC targets the EGF receptor (EGFR), activates the phosphorylation of its downstream signaling proteins Src and PI3K and then leads to the well-known estrogen-mediated Ras/Raf/MEK/ERK pathway [36]. Several mutations in the MAPK/ERK pathway have been identified in human cancers. The mitogen-activated protein kinase (MAPK) cascade is a critical pathway for human cancer cell survival, dissemination and resistance to drug therapy. The extracellular signal-regulated kinase (ERK) pathway is a convergent signaling node that receives input from numerous stimuli, including internal metabolic stress and DNA damage pathways and altered protein concentrations, as well as from external growth factors, cell-matrix interactions and communications from other cells. Mutated genes responsible for regulating cell fate, genome integrity and survival can overactivate this pathway by causing increased protein amplification and altering the tumor microenvironment. These mutations can occur upstream of membrane receptor genes [37]. Current and future drug development efforts will require altering and modulating tumor signaling in this complex network of codependent pathways. ERK ultimately redirects to the transcription factor SP1, which is abnormally upregulated in patients with OSCC, and SP1 promotes the expression of CCND1 and COX-2 [38,39]. COX-2 has been reported to be involved in cancer cell migration, cancer cell proliferation, lymph-angiogenesis and metastasis. There was a significant positive correlation between angiogenesis and apoptosis [40]. CCND1 was negatively correlated with apoptosis and an accelerated cell cycle [41,42].
(ii) The impact of the Wnt signaling pathway on OSCC The ligand Wnt is common in embryonic development and cancer [43]. Wnts are secreted glycoproteins that bind to frizzled class 1 (FZD1) receptors, which may be coupled to heterotrimeric G proteins. Intracellularly, signal transduction passes through GBP, glycogen synthase kinase 3β (GSK3β) and tumor suppressor gene product (APC) and then activates a key protein (β-catenin). Consequentially, stable β-catenin enters the nucleus and interacts with TF TCF/LEF, leading to the transcription of the Wnt target genes C-Myc and CCND1 [44]. According to reports and studies [45], there is a regulation of underexpressed TF TCF/LEF on C-Myc and CCND1. CCND1 was positively correlated with apoptosis and an accelerated cell cycle. C-Myc was positively correlated with proliferation and negatively correlated with apoptosis [46].
(iii) Notch signaling pathway in OSCC In Figure 4, the cell surface receptor Notch is activated and causes mutation by the microenvironmental factor JAG1 of OSCC [47,48]. The constitutive activation of the Notch pathway has been demonstrated in various types of malignancies [49]. In this study, the Notch pathway was also found to be extremely important in OSCC. The Notch signaling cascade affects several key aspects of normal development through proliferation and apoptosis [50]. All the signaling pathways transduce down to the TF HES1 [51]. Finally, the signal is transmitted to the proto-oncogene C-Myc [52]. According to reports and our study, HES1 is underexpressed and upregulates C-Myc and then promotes cell apoptosis and proliferation in OSCC patients [50].
(iv) Key transcription factor NF-κB on OSCC Integrin subunit alpha (ITGA) also plays a crucial role in OSCC. In Figure 4, it can be seen that, after the receptor ITGA is affected by the extrinsic factor extra cellular matrix 9 of 29 (ECM) in the micro-environment of OSCC [53], it will then successively affect the downstream signaling transduction proteins Src/PI3K/PKB/AKT. Insulin-like growth factor 1 (IGF-1R) signaling is partially mediated by the Src pathway. The activation of Src and IGF-1R also activates Akt. It is a key effector of the PI3K/Akt pathway. It is abnormally activated in most malignant tumors, promoting cell growth, proliferation and survival [54]. In many reports, AKT is underexpressed and phosphorylated in OSCC patients [55]. It then affects the TFs IKK and NF-κB. NF-κB affects numerous target genes. Many reports indicate that the genes COX-2 and VEGF are particularly important in OSCC patients [55,56]. The genes NF-κB, COX-2 and VEGF are abnormally up-regulated to promote angiogenesis, lympha-angiogenesis, migration, proliferation and apoptosis [55,56].
According to the description of OSCC above, poor diet and living habits can lead to changes in the human microenvironment. Then, a series of pathway signaling leads to downstream cellular dysfunction, among which abnormal apoptosis, proliferation, migration and angiogenesis play crucial roles on the carcinogenesis of OSCC.

Significant Biomarkers as Drug Targets for the Therapeutic Treatment of OSCC Utilizing the Systematic Drug Discovery Approach
After investigating the carcinogenic mechanism of OSCC from the core signaling pathways and their downstream abnormal cellular functions, the significant biomarkers of the carcinogenic mechanism of OSCC will be identified as drug targets for therapeutic treatment.
According to the investigation of carcinogenic mechanisms, OSCC suffers from proliferation and apoptosis. Based on the above core signaling pathways and their downstreaming abnormal cellular functions, we properly selected significant biomarkers that were related to carcinogenically abnormal inflammation, apoptosis, proliferation, angiogenesis and cell cycles. Consequently, we selected HES, TCF, NF-κB and SP1 as significant biomarkers and aimed to reverse their expression levels, i.e., to restore them to normal inflammation, apoptosis, proliferation, angiogenesis and cell cycles. HES plays an important role in the NOTCH pathway. TCF is also a main character in the Wnt pathway. NF-κB is involved in cellular responses to many stimuli. Sp1 has been shown to be involved in apoptosis.
After identifying the significant biomarkers as drug targets, we considered the chemical properties of drugs and targets to select candidate molecular drugs for these drug targets (biomarkers) based on their drug-target bindings (dockings) predicted by the DNN-based DTI model and to design a multiple-molecule drug for OSCC treatment before clinical trials based on some suitable drug design specifications, i.e., adequate drug regulation ability, low toxicity and high sensitivity. The flowchart of the systematic drug discovery and design method is described in Figure 5. The flowchart of the systematic drug discovery and design procedure for OSCC. The drugtarget binding datasets were obtained from the binding database BindingDB, which integrates the information of drugs and targets from several databases. Then, the drug and target features were preprocessed respectively, including descriptor transformation, standardization and PCA dimension reduction. Afterwards, the processed data were split into the training data for DNN-based DTI model training and the testing data for DTI model performance validation in Figures 5 and 6. We updated the model parameters through the model error between the true binding label and the predicted binding label of drug-target pairs. The well-trained DNN-based DTI model was used to predict the binding probability between drugs and targets (biomarkers). Therefore, candidate drugs were predicted for each biomarker in Table 5 by the well-trained DNN-based DTI model from the drug databases and further filtered as potential drugs by the drug design specifications of suitable regulation ability, low toxicity and high sensitivity, which are combined as a multiple-molecule drug for OSCC in Table 6.  The flowchart of the systematic drug discovery and design procedure for OSCC. The drug-target binding datasets were obtained from the binding database BindingDB, which integrates the information of drugs and targets from several databases. Then, the drug and target features were preprocessed respectively, including descriptor transformation, standardization and PCA dimension reduction. Afterwards, the processed data were split into the training data for DNN-based DTI model training and the testing data for DTI model performance validation in Figures 5 and 6. We updated the model parameters through the model error between the true binding label and the predicted binding label of drug-target pairs. The well-trained DNN-based DTI model was used to predict the binding probability between drugs and targets (biomarkers). Therefore, candidate drugs were predicted for each biomarker in Table 5 by the well-trained DNN-based DTI model from the drug databases and further filtered as potential drugs by the drug design specifications of suitable regulation ability, low toxicity and high sensitivity, which are combined as a multiple-molecule drug for OSCC in Table 6. Figure 5. The flowchart of the systematic drug discovery and design procedure for OSCC. The drugtarget binding datasets were obtained from the binding database BindingDB, which integrates the information of drugs and targets from several databases. Then, the drug and target features were preprocessed respectively, including descriptor transformation, standardization and PCA dimension reduction. Afterwards, the processed data were split into the training data for DNN-based DTI model training and the testing data for DTI model performance validation in Figures 5 and 6. We updated the model parameters through the model error between the true binding label and the predicted binding label of drug-target pairs. The well-trained DNN-based DTI model was used to predict the binding probability between drugs and targets (biomarkers). Therefore, candidate drugs were predicted for each biomarker in Table 5 by the well-trained DNN-based DTI model from the drug databases and further filtered as potential drugs by the drug design specifications of suitable regulation ability, low toxicity and high sensitivity, which are combined as a multiple-molecule drug for OSCC in Table 6.   For our DNN-based DTI model in Figure 5, DNN was set with four hidden layers, and each of them is connected with a ReLU activation function layer after it. ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions. Although ReLU is not the best activation function, it is useful for the classification issue. Additionally, the dropout layer is incorporated after each hidden layer to prevent overfitting. The input layer has 996 nodes, and 512, 256, 128 and 64 neurons are embedded, respectively, in four hidden layers. Before the output layer, a sigmoid activation function is used to restrict the output value in the range of 0 to 1 as a probability value. In general, the sigmoid function is usually used for binary classification. The outcome of DTI is a probability value, where a higher value corresponds to a more reliable interaction (docking) between the drug and the target. The loss and accuracy during the training process are separately recorded in Figures 6 and 7, respectively. Table 6. A potential multiple-molecule drug for OSCC from the candidate molecular drugs in Table 5 by the drug design specifications of suitable regulation ability, low toxicity and high sensitivity. For our DNN-based DTI model in Figure 5, DNN was set with four hidden layers, and each of them is connected with a ReLU activation function layer after it. ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions. Although ReLU is not the best activation function, it is useful for the classification issue. Additionally, the dropout layer is incorporated after each hidden layer to prevent overfitting. The input layer has 996 nodes, and 512, 256, 128 and 64 neurons are embedded, respectively, in four hidden layers. Before the output layer, a sigmoid activation function is used to restrict the output value in the range of 0 to 1 as a probability value. In general, the sigmoid function is usually used for binary classification. The outcome of DTI is a probability value, where a higher value corresponds to a more reliable interaction (docking) between the drug and the target. The loss and accuracy during the training process are separately recorded in Figure 6 and For our DNN-based DTI model in Figure 5, DNN was set with four hidden layers, and each of them is connected with a ReLU activation function layer after it. ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions. Although ReLU is not the best activation function, it is useful for the classification issue. Additionally, the dropout layer is incorporated after each hidden layer to prevent overfitting. The input layer has 996 nodes, and 512, 256, 128 and 64 neurons are embedded, respectively, in four hidden layers. Before the output layer, a sigmoid activation function is used to restrict the output value in the range of 0 to 1 as a probability value. In general, the sigmoid function is usually used for binary classification. The outcome of DTI is a probability value, where a higher value corresponds to a more reliable interaction (docking) between the drug and the target. The loss and accuracy during the training process are separately recorded in Figure 6 and For our DNN-based DTI model in Figure 5, DNN was set with four hidden layers, and each of them is connected with a ReLU activation function layer after it. ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions. Although ReLU is not the best activation function, it is useful for the classification issue. Additionally, the dropout layer is incorporated after each hidden layer to prevent overfitting. The input layer has 996 nodes, and 512, 256, 128 and 64 neurons are embedded, respectively, in four hidden layers. Before the output layer, a sigmoid activation function is used to restrict the output value in the range of 0 to 1 as a probability value. In general, the sigmoid function is usually used for binary classification. The outcome of DTI is a probability value, where a higher value corresponds to a more reliable interaction (docking) between the drug and the target. The loss and accuracy during the training process are separately recorded in Figure 6 and Figure 7, respectively. denotes the drug that can bind to the biomarker with a desired regulation ability in (·). For our DNN-based DTI model in Figure 5, DNN was set with four hidden layers, and each of them is connected with a ReLU activation function layer after it. ReLU activation function could avoid vanishing gradient problems and converge much faster than the other activation functions. Although ReLU is not the best activation function, it is useful for the classification issue. Additionally, the dropout layer is incorporated after each hidden layer to prevent overfitting. The input layer has 996 nodes, and 512, 256, 128 and 64 neurons are embedded, respectively, in four hidden layers. Before the output layer, a sigmoid activation function is used to restrict the output value in the range of 0 to 1 as a probability value. In general, the sigmoid function is usually used for binary classification. The outcome of DTI is a probability value, where a higher value corresponds to a more reliable interaction (docking) between the drug and the target. The loss and accuracy during the training process are separately recorded in Figure 6 and   Furthermore, we also plot the receiver operating characteristic (ROC) curve measure of the probability of the prediction accuracy of the DNN-based DTI model. The visualization of the ROC curve comparison is denoted in Figure 8. From Figure 8, the prediction perfor-mance of our proposed DTI model indicates that the deep learning concept is promising to adapt to the large and complicated drug-target interaction data.

Drugs
Furthermore, we also plot the receiver operating characteristic (ROC) curve measure of the probability of the prediction accuracy of the DNN-based DTI model. The visualization of the ROC curve comparison is denoted in Figure 8. From Figure 8, the prediction performance of our proposed DTI model indicates that the deep learning concept is promising to adapt to the large and complicated drug-target interaction data. To find suitable drug candidates, predictions are made based on the high probability of the candidate drug binding (docking) to the selected biomarker. At the same time, since powerful drugs are often associated with a high risk of damage, attention should also be paid to the balance between drug efficacy and adverse effects. Correspondingly, we guarantee the stability and safety of the drug in clinical trials, taking into account the drug design specifications, such as regulatory ability, toxicity and sensitivity.
To measure the regulatory ability of drug candidates, available regulatory capacity data were downloaded from the L1000 level5 dataset [57,58], which contains 978 genes treated with 19,811 small molecule compounds in 78 different cell lines. In the accommodation ability data, positive values indicate up-regulations and negative values indicate down-regulations. Based on this criterion, we searched for molecular drugs in suitable cell lines that could reverse the expression of carcinogenic biomarkers in OSCC to restore their normal expressions to remedy their down streaming cellular dysfunctions of OSCC. In addition, a lower drug toxicity has the effect of reducing side effect by referring to the median lethal dose (LD50) value in DrugBank [59]. LD50 is often considered for disease and cancer drug design. The lower the LD50 value, the greater the toxicity. In addition, a higher drug sensitivity (lower EC50 value) can reduce the drug dosage. Drug susceptibility data were obtained from the PRISM dataset [60], which includes 4518 drugs tested in 578 human cell lines based on the half-maximal effective concentration (EC50). EC50 is a measure of the potency of a drug, where a lower EC50 indicates that the drug works best at lower doses. As mentioned above, aberrant expression in a disease can be well reversed by choosing the right drug. Some candidate drugs predicted by the DNN-based DTI model for identified biomarkers and their regulability, toxicity, and sensitivity To find suitable drug candidates, predictions are made based on the high probability of the candidate drug binding (docking) to the selected biomarker. At the same time, since powerful drugs are often associated with a high risk of damage, attention should also be paid to the balance between drug efficacy and adverse effects. Correspondingly, we guarantee the stability and safety of the drug in clinical trials, taking into account the drug design specifications, such as regulatory ability, toxicity and sensitivity.
To measure the regulatory ability of drug candidates, available regulatory capacity data were downloaded from the L1000 level5 dataset [57,58], which contains 978 genes treated with 19,811 small molecule compounds in 78 different cell lines. In the accommodation ability data, positive values indicate up-regulations and negative values indicate downregulations. Based on this criterion, we searched for molecular drugs in suitable cell lines that could reverse the expression of carcinogenic biomarkers in OSCC to restore their normal expressions to remedy their down streaming cellular dysfunctions of OSCC. In addition, a lower drug toxicity has the effect of reducing side effect by referring to the median lethal dose (LD50) value in DrugBank [59]. LD50 is often considered for disease and cancer drug design. The lower the LD50 value, the greater the toxicity. In addition, a higher drug sensitivity (lower EC50 value) can reduce the drug dosage. Drug susceptibility data were obtained from the PRISM dataset [60], which includes 4518 drugs tested in 578 human cell lines based on the half-maximal effective concentration (EC50). EC50 is a measure of the potency of a drug, where a lower EC50 indicates that the drug works best at lower doses. As mentioned above, aberrant expression in a disease can be well reversed by choosing the right drug. Some candidate drugs predicted by the DNN-based DTI model for identified biomarkers and their regulability, toxicity, and sensitivity information are shown in Table 5. The potential drugs metformin, gallic-acid and gefitinib are selected by the three above-mentioned drug design specifications (i.e., suitable regulation ability, high sensitivity and low toxicity) and are combined as a multiple-molecule drug for therapeutic treatment of OSCC in Table 6.

Discussion Potential Multiple-Molecule Drug for the Identified Biomarkers of OSCC
Recently, Cisplatin has been the most common drug for the treatment of OSCC [61,62], but its powerful side effects are daunting; for example, renal toxicity, nausea, vomiting and neurotoxicity [63]. In this study, we tried to find novel multiple-molecular drugs to treat OSCC. With the consideration of sensitivity, toxicity and regulation ability as drug specifications, we combined deep learning and systems biology methods to find the right molecular drugs for the identified biomarkers of OSCC. Ultimately, the molecular drugs metformin, gefitinib and gallic-acid are detected and combined as a multi-molecule drug in Table 6 for the therapeutic treatment of OSCC.
Gefitinib is a drug that acts on the tyrosine kinase domain of the epidermal growth factor receptor [64]. EGFR is overexpressed in certain human cancers, such as breast and lung cancer [65,66]. The excessive activation of EGFR by the ligand EGF in the microenvironment of OSCC leads to the abnormal activation of anti-apoptotic Ras cell signaling, resulting in uncontrolled cell division [67,68]. Gefitinib binds covalently to the enzyme adenosine triphosphate (ATP) and inhibits the epidermal growth factor receptor tyrosine kinase [69].
Metformin, a low-cost antidiabetic drug [70], has been widely used to treat diabetes by inhibiting hepatic gluconeogenesis and enhancing glucose uptake by skeletal muscle [71]. Several studies have shown that metformin is being repurposed as an anticancer therapeutic for different types of cancer [72]. The insulin receptor transmits signals through growth factor receptor binding protein 2 (GRB2) to the Ras/Raf/ERK pathway that drives cell growth. There is evidence that these pathways play an important role in the changes in cellular metabolism that are typical of tumor cells. Elevated circulating insulin/IGF1 levels and the upregulation of insulin/IGF receptor signaling have been implicated in the development of various cancers. Metformin was found to reduce insulin levels, inhibit the insulin/IGF signaling pathway and alter cellular metabolism in normal and cancer cells [73]. Interestingly, metformin can increase the sensitivity of oral cancer cells to chemotherapeutic drugs such as gefitinib [74,75], improving therapeutic efficacy and reducing dose and toxicity [76]. Overall, metformin in combination with gefitinib may be a potential drug for the development of new therapeutic strategies for human OSCC.
Gallic-acid is a phenolic compound widely found in the plant kingdom [77], such as green vegetables, fruits and other plants [78]. The toxicity of gallic-acid to normal cells of humans is very low [79]. It is highly toxic to bad cells such as fibrotic cells and cancer cells [80,81] and can kill these harmful cells; but when it encounters normal cells, the toxicity will become weaker [82].
In summary, the molecular drugs metformin, gefitinib and gallic-acid are selected and combined as a multi-molecule drug of OSCC in Table 6 from the proposed systematic drug discovery and design perspectives.

A General Review of Constructing Core Genome-Wide Genetic and Epigenetic Networks (GWGENs) of OSCC and Non-OSCC
At first, we divided the data into a disease group and a control group. The disease group data came from GSE30784, and the control group data came from GSE30784 and GSE17913. Then, we used the following steps to construct core GWGENs of OSCC and non-OSCC.
(1) Constructing the candidate GWGEN: We use big database mining to construct a candidate PPIN and a candidate GRN, including genes, miRNAs and lncRNAs. Note that the candidate GWGEN includes a candidate PPIN and a candidate GRN. (2) Identifying real GWGENs: We identify the parameters of PPIN and GRN through the system identification method by solving the corresponding constrained linear least squares estimation problems with the help of the microarray data of OSCC and non-OSCC. After performing system modeling and parameter identification for proteins, genes, miRNAs and lncRNAs in the candidate GWGEN, we used the system order detection method AIC to prune the false positives in the regulation and interactions in the candidate GWGEN to obtain the real GWGENs of OSCC and non-OSCC. (3) Extracting the core GWGENs: To extract the core GWGENs of OSCC and non-OSCC, we applied the PNP approach. By doing this, we could compute a projection value for each node in the real GWGENs to 85% significant network structures of real GWGENs. The top 6000 nodes of real GWGENs with the highest projection values have remained as core GWGENs. (4) Building and comparing the core signaling pathways: The core signaling pathways for cells of OSCC and non-OSCC can be constructed by the annotation of the KEGG pathways of core GWGENs of OSCC and non-OSCC, respectively. We investigated the molecular mechanisms of carcinogenesis of OSCC by comparing the upstream microenvironmental factors, core signaling pathways and their corresponding downstream abnormal cellular functions of OSCC and non-OSCC.

Data Preprocessing for Constructing the Candidate GWGEN
In this research, we downloaded the dataset with accession numbers GSE30784 and GSE17913 from the National Center for Biotechnology Information (NCBI). We divided the data into a disease group and a control group. The disease group data came from GSE30784, and the control group data came from GSE30784 and GSE17913. Note that the candidate GWGEN included a candidate PPIN and a candidate GRN. The candidate GWGEN matrix is a binary matrix. If two nodes have an interaction or regulation, we assigned value one for it; otherwise, we assigned a value zero for it. For building the candidate PPIN, we referred to the following databases: DIP [22], IntAct [23], BioGRID [24] and MINT [25]. Moreover, to construct the candidate GRN, we considered the following databases: HTRIdb [26], ITFP [27], TRANSFAC [28], CircuitDB [29], TargetScanHuman [30] and StarBase 2.0 [31].

System Modeling of the Candidate GWGEN
To model the candidate GWGEN, we build system modeling for proteins, genes, miRNAs and lncRNAs. First, in the candidate GWGEN, the following interactive equation describes the q-th protein interaction with its G q neighboring proteins in the candidate PPIN: κ qr p q [n]p r [n] + λ q,PPI M + µ q,PPI M [n], for q = 1, . . . , Q, n = 1, . . . , N where p q [n] means the expression level of the q-th protein in the n-th sample and p r [n] is the expression level of the r-th protein in the n-th sample; κ qr indicates the interaction ability between the q-th protein and the r-th protein; G q represents the total number of proteins that interact with the q-th protein in the candidate PPIN; Q expresses the total number of proteins in the candidate PPIN; N denotes the total number of samples; λ q,PPI M is the basal level in the model of the q-th protein for unknown protein interactions of histone modifications such as phosphorylation, acetylation and ubiquitination; µ q,PPI M [n] denotes the environment and measurement noise of the q-th protein.
Second, the transcriptional regulation of the x-th gene in GRN is described by the following equation: , for x = 1, . . . , X, n = 1, . . . , N where g x [n] means the gene expression level of the x-th gene in the n-th sample; t u [n], l v [n] and m w [n] individually indicate the gene expression level of the u-th TF, the v-th lncRNA and the w-th miRNA in the n-th sample; U x , V x and W x individually denote the total binding number of TFs, lncRNAs and miRNAs; α xu is the transcriptional regulatory ability of the u-th TF on the x-th gene; β xv expresses the transcriptional regulatory ability of the v-th lncRNA on the x-th gene; γ xw ≥ 0 denotes the post-transcriptional regulatory ability of the w-th miRNA on the x-th gene; X means the total number of genes in the candidate GWGEN; N denotes the total number of samples; λ x,GRM means the basal level of the x-th gene because of the unknown gene regulations such as methylation and genetic mutation; µ x,GRM [n] represents the environment or measurment noise.
Third, TFs, lncRNAs and miRNAs also have a potential impact on the i-th lncRNA, and we can depict this behavior by the lncRNA model (LRM) in the candidate GWGEN. The regulatory equation is described as follows: , for i = 1, . . . , I, n = 1, . . . , N where l i [n] means the expression level of the i-th lncRNA; t u [n], l v [n] and m w [n] indicate the expression level of the u-th TF, the v-th lncRNA and the the w-th miRNA of the n-th sample, respectively; U i , V i and W i individually represent the total binding number of TFs, lncRNAs and miRNAs on the i-th lncRNAs, respectively; σ iu is the transcriptional regulatory ability from the u-th TF on the i-th lncRNA; ς iv expresses the transcriptional regulatory ability from the v-th lncRNA on the i-th lncRNA; τ iw ≥ 0 denotes the posttranscriptional regulatory ability from the w-th miRNA on the i-th lncRNA; I denotes the total number of lncRNAs and N means the total number of samples; λ i,LRM indicates the basal level of the i-th lncRNA; µ i,LRM [n] is the data noise.
Fourth, the expression of the j-th miRNA is also affected by the TFs, lncRNAs and miRNAs. Furthermore, we can illustrate the regulation of miRNA in the miRNA model (MRM) of the candidate GWGEN through the following equation: , for j = 1, . . . , J, n = 1, . . . , N where m j [n] means the expression level of the j-th miRNA; t u [n], l v [n] and m w [n] separately represent the expression level of the u-th TF, the v-th lncRNA and the w-th miRNA, respectively; U j , V j and W j are the binding total numbers of TFs, lncRNAs and miRNAs, respectively; ω ju expresses the transcriptional regulatory ability from the u-th TF on the j-th miRNA; ξ jv denotes the transcriptional regulatory ability from the v-th lncRNA on the j-th miRNA; ψ jw denotes the post-transcriptional regulatory ability from the w-th miRNA on the j-th miRNA; J indicates the total number of miRNAs and N is the total number of samples; λ j.MRM represents the basal level of the j-th miRNA; µ j,MRM [n] means the data noise.

The System Identification and System Order Detection Methods for Real GWGENs of OSCC and Non-OSCC from the Candidate GWGEN
According to the interaction and regulation models we described above, the candidate PPIN is generated from PPI models in (1); the candidate GRN is described and combined by the corresponding regulation models in (2)-(4). We obtain the real GWGENs of OSCC and non-OSCC through pruning the false positive interactions and regulations by system identification and system order detection methods based on the microarray data of OSCC and non-OSCC, respectively. To identify the parameters of the above interactive and regulatory models, Equations (1)-(4) could be respectively expressed by the following linear regression forms.
(1) The above linear regression forms for N samples are denoted, respectively, as the following: [2] . . .
The above equations could be individually expressed as the following algebraic equations: P q = Φ q,PPI M · Θ q,PPI M + E q,PPI M , for q = 1, . . . , Q where Φ q,PPI M , Φ x,GRM , Φ i,LRM and Φ j,MRM separately express the regression matrix of proteins, genes, lncRNAs and miRNAs of N samples. Θ q,PPI M , Θ x,GRM , Θ i,LRM and Θ j,MRM individually mean the corresponding interactive and regulatory parameter vectors. E q,PPI M , E x,GRM , E i,LRM and E j,MRM respectively denote the corresponding data noise vectors.
For the constrained optimization problems for the parameter estimation problems (21)-(24), we look for the optimal parameter estimates of the candidate GWGEN through the microarray data of OSCC and non-OSCC as follows: interactive parameters between proteinsΘ q,PPI M , the regulatory parameters of genesΘ x,GRM , lncRNAsΘ i,LRM and miR-NAsΘ j,MRM . These constrained optimization problems for the parameter estimation of the candidate GWGEN were solved by the MATLAB Optimization Toolbox. It is worth noting that the negative inequality constraints in (22)- (24) represent that the regulatory parameters of miRNAs should be less than or equal to zero to ensure the negative regulation of miRNAs on genes, lncRNAs and miRNAs.
After the parameter estimation of the candidate GWGEN of non-OSCC and OSCC by the respective microarray data, we used the system order detection method, AIC, to detect the system order (the number of interactions of each protein or the number of regulations of each gene, lncRNA and miRNA). The equations of AIC for each protein, gene, lncRNA and miRNA are given as follows.
where Ω q,PPI M is the estimated residual error of the q-th protein from the least square parameter estimationΘ q,PPI M in (21), and Q q means the number of protein interactions with the q-th protein. where where Ω x,GRM means the estimated residual error of the x-th gene in (22), and O x,GRM is the number of regulations of the genes, lncRNAs and miRNAs on the x-th gene;Θ x,GRM denotes the estimated parameters in (22). where where Ω i,LRM denotes the estimated residual error of the i-th lncRNA in (23), and O i,LRM is the number of regulations of the genes, lncRNAs and miRNAs on the i-th lncRNA;Θ i,LRM indicates the estimated parameters in (23).
where Ω j.MRM denotes the estimated residual error of the j-th miRNA in (24), and O j.MRM indicates the number of regulations of the genes, lncRNAs and miRNAs on the j-th miRNA; Θ j,MRM denotes the estimated parameters in (24).
The AIC system order detection method in (25) means that if the system order (number interactions) Q q increases, the second term of AIC in (25) will increase and the first term of AIC will decrease, and vice versa. The real number of interactions (system order) Q q * will balance two terms and achieve the minimum AIC (Q q ). Therefore, employ the AIC technique to determine the real number of interactions or regulations for each protein, gene, miRNA and lncRNA by their AICs in (25)- (28) in the candidate GWGEN to prune the false positive interactions and regulations and obtain the real GWGENs of OSCC and non-OSCC in the following.
Considering the order detection method of AIC in system identification, the real order of the system (i.e., the total number of interactions of the q-th protein in (1) or the number of regulations on the x-th in (2)) is to minimize the AIC problems of system identification in (21)- (28). Therefore, the real number of interactions or regulations for every protein, gene, lncRNA and miRNA in the candidate GWGEN can be obtained by solving the AIC minimization problems below: where Q * q means the real number of protein interactions for the q-th protein; U * x , V * x and W * x respectively express the real number of regulations of genes, lncRNAs and miRNAs on the x-th gene; U * i , V * i and W * i represent the real number of regulations of genes, lncRNAs and miRNAs on the i-th lncRNA, respectively; U * j , V * j and W * j are individually the real number of regulations of genes, lncRNAs and miRNAs on the j-th miRNA. Therefore, the interactions of proteins and the regulations of the gene, miRNA and lncRNA, which are out of real order by solving the AIC minimization problems in (29)- (32), are thought of as false positives in the candidate GWGEN of non-OSCC and OSCC and should be pruned out to obtain the real GWGENs of non-OSCC and OSCC in Figure 2, respectively.
From the above equation, we chose the top I significant singular vector structures to indicate the significant energy of real GWGENs, which is equal or more than the threshold of 0.85. Then, we separately projected every node of real GWGENs (i.e., each row of the network matrix H) to the top I significant singular vectors, as follows.
Z(a, b) = h a,: · d T b,: , for a = 1, . . . , Q * + X * + I * + J * , b = 1, . . . , I where Z(a, b) means the projection value of the a-th node on the b-th significant singular vector; h a,; is the a-th row vector of the network matrix H and d T :,b expresses the b-th column of D T , i.e., the transpose of the b-th singular vector. Then, we define the two-norm projection value of each node such as protein, gene, lnRNA and miRNA in real GWGENs on the top I significant singular vectors as follows: a, b), for a = 1, . . . , Q * + X * + I * + J * According to the two-norm projection values in (40), the top 6000 significant proteins, genes, miRNAs and lncRNAs with high projection values were chosen to comprise the core GWGENs for OSCC and non-OSCC in Figure 3. Then, the core GWGENs were submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID) website to conduct the KEGG pathway enrichment analysis in Tables 3 and 4, and the core signaling pathways of non-OSCC and OSCC in Figure 4 were obtained by exploring the annotation of KEGG pathways. The enrichment analysis was used to check which significant pathways of our results were important to OSCC. Finally, the potential biomarkers were identified in Table 5 for the OSCC carcinogenic mechanism investigated by comparing the core signaling pathways and their downstream abnormal cellular functions of non-OSCC and OSCC in  Based on drug/target (biomarker) prediction and drug design specifications, we want to discover a potential multiple-molecule drug for the carcinogenic biomarkers on OSCC. First, we trained a DTI model based on a deep neural network (DNN) to predict the drugtarget interactions between the drugs and targets (biomarkers). However, it is not enough to only consider the interactions between the molecular drugs and the targets in drug design. Some drug design specifications, i.e., adequate regulation ability, low toxicity and high sensitivity, are necessary to filter the candidate drugs predicted by the DNN-based DTI model. This systems drug discovery method is introduced in the following paragraphs for designing a multiple-molecule drug for OSCC treatment before clinical trials.
The flowchart of the systematic drug discovery method is shown in Figure 5. First, drug-target interaction databases by UniProt [84], DrugBank [59], ChEMBL [85] and Pubchem [86] are combined to train DNN as the DTI model for drug-target interaction prediction. In a few years, the feature-based method, i.e., molecular descriptor, will be widely used to describe the structural and chemical properties of molecules such as characteristics from the 2D, 3D spectrum of the structure, molecular weight, hydrophilic and hydrophobicity, etc. The chemical properties of the drug and genomic sequence of the target could be described with the molecular descriptor for the purpose of convenient analysis in drug design since the molecular descriptor can transform complicated chemical properties into a feature vector. We used the molecule descriptor function and protein descriptor function of python package pyBioMed to transform the drug and target into descriptors as drug and target features, respectively, under the python2.7 environment. Drug features of molecule descriptors include constitutional descriptors, connectivity indices, E-state indices, charge descriptors, molecular properties and kappa shape indices. For the target features, the protein descriptor calculates the structural and physicochemical features of proteins and peptides from the amino acid sequence such as amino acid composition, dipeptide composition, etc. For more detailed information about the descriptor transformation, readers can access the documents of pyBioMed. The drug descriptor and the target descriptor are combined into the feature vector corresponding to the drug-target pair, as follows: where the former symbol D in v drug-target is defined as the drug descriptor and the latter symbol T is the target descriptor. d 1 means the first drug feature; t 1 represents the first target feature; M indicates the total number of features of the drug and N expresses the total number of features of the target. We have 363 features for the drug and 996 features for the target. Before training our DNN-based DTI model, we encountered several problems: features of different scales will affect our results-for example, large-scale features will have a great dominance in the training process. There are far more unverified data in the data than verified data, so there exists a between-class imbalance issue which could lead to a biased parameter updating tendency to the larger class during the training process.
To remedy the imbalance issue, we have to solve the problem of scale first. We use the min-max normalization to deal with this problem. Min-max normalization can handle this problem effectively, but compared to normalization, it is very sensitive to outliers. Additionally, feature normalization is performed before principal component analysis (PCA) to improve the DNN-based DTI model training performance. Therefore, the normalization to the features is given as follows: where d i denotes the i-th drug feature and d * i expresses the i-th drug feature after the standardization; µ i and σ i respectively denote the mean and standard deviations of the i-th drug feature; t j means the j-th feature of the target and t * j represents the j-th feature of the target after standardization; µ j and σ j separately indicate the mean and standard deviation of the j-th target feature; M expresses the total number of drug features and N denotes the total number of target features.
In our data, the number of proven drug-target interaction samples, called the positive class, is 1455, and the number of unverified drug-target interaction samples, called the negative class, is much larger than the positive class. There exists a huge difference in the amount between the negative class and the positive class, which leads to the between-class imbalance issue. Before training the DNN-based DTI model, we also implemented data preprocessing by the principal component analysis (PCA) method-(35)-(40)-to reduce the dimension of the features of the drug and target to the dimension of the input of DNN. The PCA extraction was conducted after down-sampling and standardization to ensure that the PCA could accurately project the original data on the feature plane. It is worth noting that data preprocessing, e.g., PCA, is only conducted in training data because testing data should be deemed unknown to the model. The drug features were selected by the top 85% with higher singular values. The insignificant features of the drug and the target lower than the dimension of the input of DNN are deleted, and the remainders were employed to train DNN as the DTI model to predict candidate drugs for biomarkers (drug targets) of OSCC.
We referenced the basic concept and knowledge of DNN to train a DNN-based DTI model to predict drug-target interaction through the python Tensorflow and Keras package under the python3.7 environment. In the model structure of the DNN-based DTI model in Figure 4, the function in a feedforward step can be denoted as follows: where x and h respectively denote the input and output; w is the weighting matrix and b is the bias vector; σ (.) indicates the activation function with ReLU in the hidden layer and the sigmoid function at the output layer. Since the binary classification issue is concerned, the binary-cross entropy is chosen as the cost function to calculate the model loss [11]: where p n means the truth label of positive interaction;p n indicates the predictive probability of positive interaction, and 1 − p n shows the truth label of negative interaction; 1 −p n represents the predicted probability of negative interaction; L(w, b) denotes the average of total loss C n (w, b). According to the cost function, the backward propagation algorithm is applied to update the model parameter set θ containing the weighting matrix w and the bias vector b through calculating the gradient of the cost function in (46) and eventually obtain the optimal solution θ * in (48), as follows.
where l is the l-th epoch of the learning procedure; η is learning rate and ∇L θ l−1 is the gradient of L θ l−1 , as follows: Based on the backward propagation method, the DNN-based DTI model could adjust the parameters to fit the drug-target interaction data at each iteration well.
In addition, the hyperparameters were tuned to not only lower the training time but also achieve the best model performance. We used an optimizer with default settings and set the learning rate to 0.003 to make model parameter θ converge faster and precisely. We set 100 for epochs and 100 for batch size. For the data, we split one-fourth of the data as testing data and three-fourths of it as training data. Moreover, we divided the training data into five equal folds to perform the five-fold cross-validation strategy. In the five-fold data, four-fifths of them were for the model training and one-fifth of them was used as the validation data, which play the role of supervisor to check whether the model was better than that of the former epoch. Additionally, five-fold cross-validation could verify the stability of the data and model. To avoid model overfitting, we applied an early stopping strategy to check if the test accuracy decreased when the training accuracy increased continuously. Moreover, we embedded the dropout layer after each hidden layer to further prevent model overfitting and set 0.4 for the dropout rate. After training the DNN-based DTI model, we adopted a performance measurement AUC (area under the curve) score and ROC (receiver operating characteristics) curve in Figure 8 to check the model performance. It is one of the most useful evaluation metrics to visualize the model performance when it comes to the classification problems. The higher the AUC score (which means the area under the line is larger), the better the accuracy is for the DNN-based DTI model in predicting the true positive and true negative drug-target interaction. The formulas for the AUC score and ROC curve are shown below [83].
where TP (True Positive) means that the real value is true and is judged correctly; TN shows that the real value is true and is judged by mistake; FP indicates that the real value is false and is judged accurately; FN represents that the real value is false and is judged in error.

Conclusions
In this study, based on our proposed combination of systems biology and systems drug discovery design methods, we investigated the complex carcinogenic molecular mechanisms of OSCC from genome-wide data, genetic and epigenetic network perspectives, and designed prospective multiple-molecular drugs as a multi-molecule drug to target multiple biomarkers of OSCC. We construct a genetic and epigenetic biological network by exploring methods of systematic identification and systematic sequential detection of big data. Afterwards, we extracted core signaling pathways by the PNP method and KEGG pathway annotation to select important biomarkers from OSCC carcinogenesis by comparing core signaling pathways and their downstream abnormal cellular functions. To discover drug candidates that interact with these biomarkers, we trained a DNN-based DTI model by DTI databases to predict drug-target interaction probability values. In addition, we took the drug regulation ability, low toxicity and high sensitivity as the drug design criteria to screen out suitable potential drugs from the predicted drug candidates. Therefore, a set of combined multi-molecular drugs is proposed as a multi-molecule drug for OSCC treatment. In the future, more and more different types of genomics data will be available for epigenetic and epigenetic regulations. A combination of multiple types of genomics data is needed to help us enhance our work and gain a deeper understanding of the significant biomarkers of the carcinogenic mechanism of OSCC. It is anticipated that the proposed systems biology and systems drug discovery design approach may provide new directions for OSCC treatment.

Conflicts of Interest:
The authors declare no conflict of interest.