Multiple-Molecule Drug Repositioning for Disrupting Progression of SARS-CoV-2 Infection by Utilizing the Systems Biology Method through Host–Pathogen–Interactive Time Profile Data and DNN-Based DTI Model with Drug Design Specifications

: The coronavirus disease 2019 (COVID-19) pandemic has claimed many lives since it was first reported in late December 2019. However, there is still no drug proven to be effective against the virus. In this study, a candidate host–pathogen–interactive (HPI) genome-wide genetic and epigenetic network (HPI-GWGEN) was constructed via big data mining. The reverse engineering method was applied to investigate the pathogenesis of SARS-CoV-2 infection by pruning the false positives in candidate HPI-GWGEN through the HPI RNA-seq time profile data. Subsequently, us-ing the principal network projection (PNP) method and the annotations of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, we identified the significant biomarkers usable as drug targets for destroying favorable environments for the replication of SARS-CoV-2 or enhancing the defense of host cells against it. To discover multiple-molecule drugs that target the significant biomarkers (as drug targets), a deep neural network (DNN)-based drug–target interaction (DTI) model was trained by DTI databases to predict candidate molecular drugs for these drug targets. Using the DNN-based DTI model, we predicted the candidate drugs targeting the significant biomarkers (drug targets). After screening candidate drugs with drug design specifications, we finally proposed the combination of bosutinib, erlotinib, and 17-beta-estradiol as a multiple-molecule drug for the treatment of the amplification stage of SARS-CoV-2 infection and the combination of erlotinib, 17-beta-estradiol, and sertraline as a multiple-molecule drug for the treatment of saturation stage of mild-to-moderate


Introduction
In late December 2019, China reported cases of pneumonia to the World Health Organization (WHO). In January 2020, the WHO named this disease coronavirus disease 2019 (COVID- 19). As of 21 June 2022, there have been more than 500 million confirmed cases of COVID-19 and more than 6 million deaths [1]. COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The high transmissibility and clinical severity of COVID-19 rapidly caused a global health crisis. In most cases, the symptoms of COVID-19 are mild. The most commonly reported symptoms are headache (34-70%), myalgia (36-63%), fatigue (63%), cough (50.3-63.2%), and fever (43-45%) [2,3]. However, some cases result in an acute course and complications. So far, the pathogenesis of COVID-19 has not been completely clarified, and more than two years after the beginning of the COVID-19 outbreak, there is still no drug proven to be effective against it.
SARS-CoV-2 is a positive-sense single-stranded RNA virus and encodes approximately 29 proteins, including four structural proteins [4][5][6]. Some proteins of SARS-COV-2 have been confirmed to interact with host proteins and affect host gene expression [7,8]. In addition to genetic regulation, epigenetic regulation is also important to the viral infection [9][10][11]. MicroRNA (miRNA) can cause RNA silencing and post-transcriptional repression of gene expression [12,13]. They can regulate various cellular activities, including apoptosis, cell growth, and differentiation [14]. Long noncoding RNA (lncRNA), a class of non-coding RNA, can be an inhibitory regulator of miRNA [15,16] or involved in transcriptional regulation. They are involved in different regulatory mechanisms during viral infection [17].
For new drug discovery, the global pharmaceutical industry faces high attrition rates [18], high costs increasing with time, and changing regulatory requirements. These risks cause investors to be less willing to invest in the pharmaceutical industry. Therefore, drug repositioning (also called drug repurposing) is proposed [19,20]. Drug repositioning refers to the identification of new usages for investigational or approved drugs which are outside the scope of the original medical indication. However, the experimental methods to verify the drug-target interaction (DTI) are extremely expensive and time-consuming. It is necessary that the systems biology method be employed to investigate pathogenic mechanisms to identify significant drug targets, and this should be followed by computational methods based on the DTI model to predict the drugs for significant drug targets on a large scale to reduce the high cost and development time, especially during the COVID-19 outbreak. Moreover, combination therapy for multiple proteins (drug targets) using more than one medication or modality has been used for many diseases, including various cancers [21] and infections [22]. It has been proposed for the treatment of COVID-19 in some research [23,24]. Veklury (Remdesivir) is an FDA-approved drug for mild-tomoderate COVID-19. However, the efficiency of Remdesivir is still debatable [25][26][27]. Olumiant (Baricitinib), the other FDA-approved drug, is used in the treatment of COVID-19 in hospitalized adults who require supplemental oxygen, invasive or non-invasive mechanical ventilation, or extracorporeal membrane oxygenation (ECMO-that is, Baricitinib is not used for mild-to-moderate COVID- 19).
In this study, we first employed the systems biology method via host-pathogen-interactive (HPI) RNA-seq time profile data to investigate the pathogenesis of COVID-19 in order to identify significant biomarkers as drug targets and treat the SARS-CoV-2 infection at the amplification and saturation stages. Recently, a deep neural network (DNN)based DTI model that targets proteins to significantly improve the prediction of DTI compared to conventional DTI models has been introduced by a deep learning algorithm through the feature vectors of molecular drugs [28,29]. Therefore, a DNN-based DTI model was trained by DTI databases [30][31][32][33][34] to efficiently predict candidate molecular drugs for each significant biomarker (drug target) of the amplification and saturation stages of SARS-CoV-2 infection. Then, based on drug design specifications, in order to prune some candidate drugs, a set of candidate molecular drugs combined as a multiplemolecule drug was proposed as a potential combination therapy to target these significant biomarkers at the amplification and saturation stages to disrupt the progression of SARS-CoV-2 infection. Therefore, the systematic procedure of repositioning a multiple-molecule drug for disrupting the progression of SARS-CoV-2 infection by utilizing the systems biology method through HPI RNA-seq data and DNN-based DTI model with drug design specifications are described as follows: (1) constructing the candidate HPI genome-wide genetic and epigenetic network (HPI-GWGEN) using big data mining from databases [35][36][37][38][39][40][41][42][43][44][45][46]; (2) system identification and system order selection to eliminate the false positives in the candidate HPI-GWGEN to obtain the real HPI-GWGEN using the system models and HPI RNA-seq time profile data; (3) applying the principal network projection (PNP) method to construct the core HPI-GWGEN and obtain the core HPI signaling pathways and their abnormal downstream cellular functions of SARS-CoV-2 infection using annotations of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [47,48]; (4) investigating and comparing the pathogenic mechanisms between the amplification and saturation stages of SARS-CoV-2 infection to identify significant biomarkers as drug targets against SARS-CoV-2 infection at the amplification and saturation stages; and (5) predicting the candidate drugs for significant biomarkers using a DNN-based DTI model and selecting potential multiple-molecule drugs for the therapeutic treatment of the amplification and saturation stages of SARS-CoV-2 infection with three drug design specifications, i.e., regulation ability, toxicity, and sensitivity. We finally proposed the combination of bosutinib, erlotinib, and 17-beta-estradiol as a multiple-molecule drug for the treatment of the amplification stage of SARS-CoV-2 infection and the combination of erlotinib, 17beta-estradiol, and sertraline as a multiple-molecule drug for the treatment of the saturation stage of mild-to-moderate SARS-CoV-2 infection.

Core HPI Signaling Pathways during Amplification and Saturation Stage of SARS-CoV-2 Infection by the Systems Biology Method
To find the molecular drugs for disrupting SARS-CoV-2 infection, systematic methods including reverse engineering (i.e., reversing the infectious process of SARS-CoV-2) were first used to construct the real HPI-GWGEN by the genome-wide HPI RNA-seq time-profile data, and the PNP method was applied to extract the core HPI-GWGEN for analysis. With the annotation of KEGG pathways, the core HPI-GWGEN was annotated as the core signal pathways to identify significant biomarkers as drug targets for the pathogenesis of SARS-CoV-2 infection at the amplification and saturation stages (note that the amplification and saturation stages are defined according to the RNA-seq time profile data in Section 4.2.1). After training the DNN-based DTI model, we predicted the potential multi-molecule drugs that target the significant biomarkers (drug targets) against SARS-CoV-2 infection at the amplification and saturation stages based on the DNN-based DTI model and three drug design specifications, respectively. The procedure of the systems biology method and drug discovery is shown in Figure 1.

Well-trained DNN-based DTI Model
Drug-target feature vectors (PyBioMed) Potential multiple-molecule drug for stage of COVID-19

Drug design specifications screening
Core HPI signaling pathways for SARS-CoV-2 infection Figure 1. The flowchart of the systems biology method of investigating pathogenesis of SARS-CoV-2 infection to identify significant biomarkers as drug targets and systems drug discovery via a DNNbased DTI model to predict candidate molecular drugs for drug targets through a deep learning scheme and then to screen multiple-molecule drugs by drug design specifications for disrupting the SARS-CoV-2 infection at the amplification and saturation stages.
The number of nodes and edges in candidate HPI-GWGEN and real HPI-GWGENs of amplification and saturation infectious stages are shown in Table 1. The significant pruning of edges between candidate and real HPI-GWGEN implies that the false-positive edges in candidate HPI-GWGEN are eliminated by the system order selection. The real HPI-GWGENs of amplification and saturation infectious stages are respectively shown in Figure A1, which are represented with the aid of Cytoscape [49]. Table 1. The number of nodes and edges in candidate and real HPI-GWGENs at amplification and saturation stages. After system identification and system order detection, the false positives in each node of candidate HPI-GWGEN can be pruned through each infectious stage of HPI RNA-seq data and then form the real HPI-GWGEN of each infectious stage. In addition to the significant deletion of false-positive edges, the PNP method was applied to evaluate the significance of each node to the network matrix according to its projection value. The core HPI-GWGENs in the amplification and saturation stages were built by extracting the top-6000-significance nodes from the corresponding real HPI-GWGENs of the amplification and saturation infectious stages. The core HPI-GWGENs of amplification and saturation infectious stages are shown in Figure 2 and Appendix A, which were represented with the aid of Cytoscape [49]. The top-6000-significance nodes can be uploaded to the database for annotation, visualization, and integrated discovery (DAVID) functional annotation tool [47,48] to help investigate the pathogenic mechanism of the amplification and saturation stages of SARS-CoV-2 infection. The enrichment analysis of KEGG pathways for amplification and saturation infectious stages are shown in Tables 2 and 3, respectively. The enrichment analysis of KEGG may imply what pathways are involved in each infection stage. We try to find the pathogenetic molecular mechanism for the viral amplification in early infection and the pathogenetic molecular mechanism for the viral saturation in late infection.   With the help of the enrichment analysis of KEGG and the exiting literature, the core host-pathogen interactive signaling pathways and their downstream abnormal cellular functions of SARS-CoV-2 infection at the amplification and saturation stages are shown in Figure 3. The core signaling pathways with red background in the middle of Figure 3 are common between the amplification and saturation stages of SARS-CoV-2 infection; the specific signaling pathways on the left-hand-side are the amplification stage of SARS-CoV-2 infection; and the specific signaling pathways on the right-hand-side are the saturation stage of SARS-CoV-2 infection.

Investigation of Specific Core HPI Signaling Pathways in Amplification Infectious Stage
According to the core HPI signaling pathways of SARS-CoV-2 infection, several downstream abnormal cellular functions were identified in Figure 3. We try to investigate the progression of SARS-CoV-2 infection. In specific core signaling pathways of the amplification infectious stage in the left column of Figure 3, AP2M1 and AAK1 were identified to be associated with endocytosis. AAK1 can phosphorylate AP2M1, which promotes receptor-mediated endocytosis to help with viral entry [50].
The receptor EGFR was identified to activate MAPK1 by signaling through SOS1, RRAS2, BRAF, and MEK1. The activation of receptor EGFR may be caused by the ligand HB-EGF. For coronavirus infection, the upregulation of ligand HB-EGF, which can bind to the receptor EGFR and ERBB4, has been observed [51]. MAPK1 is the pivotal component to activate the downstream cellular function of receptor EGFR. MAPK1 suppresses the TF FOXO3 by phosphorylation and activates MNK1 and TF FOS. The target gene of TF FOXO3, BCL2L11, encodes the Bcl-2-like protein 11 (BIM) and acts as an apoptotic activator [52]. Lots of pathogens may interfere in the host's cellular functions to establish the proper environment for replication. Because viruses, including SARS-CoV-2, need to use the host cell for replication, the apoptosis of the host cell in early infection may be not desirable for it [53]. Thus, the suppression of BCL2L11 causes the survival of the host cell, which is necessary for the replication of SARS-CoV-2. MNK1 activates the translation initiation factor EIF4E. EIF4E can direct the ribosomes to bind to the 5′ cap of mRNA. It has been reported that the inhibition of the interaction between viral mRNA and EIF4E can suppress the replication of SARS-CoV-2 [54], which implies that the activation of EIF4E is also a desirable event for the replication of SARS-CoV-2. The activation of TF FOS induces the target genes CCL2 and IL12. CCL2 belongs to cytokine genes and is involved in inflammation and immunoregulatory processes. It exhibits a chemotactic activity for basophils and monocytes, which can help the host cell defend the SARS-CoV-2 virus. In addition to activating MAPK1, receptor EGFR also interacts with PIK3C2A and NUMB. Recently, the activation of PIK3C2A has been proven to act both in the control of receptor endocytosis and resensitization [55], and the activation of receptor EGFR can increase the activity of PIK3C2A [56]. The activation of endocytic adaptor protein NUMB has been reported to promote the recycling of EGFR [57]. The recycling and resensitization may enhance the activation of downstream genes of receptor EGFR.
The receptor ERBB4 is also activated by the binding of HB-EGF. After interacting with HB-EGF, receptor ERBB4 activates PIK3CB and then activates the AKT1. AKT1 can suppress TF FOXO3 and activate MTOR, CHUK, and TF CREB1. The phosphorylation of MTOR can activate the S6K1. The activation of S6K1 induces protein synthesis, including the viral protein at the ribosome [58]. The activation of TF CREB1 by AKT1 induces the target gene BCL2, which can encode an outer mitochondrial membrane protein to block the apoptosis of cells. Therefore, the expression of BCL2 can promote the survival of the infected cells. As mentioned above, the survival of infected cells may help in the replication of SARS-CoV-2. The TF CREB1 is also activated by lncRNA MALAT1. The high expression of MALAT1 has been reported to be differentially expressed in severe COVID-19 [59]. A previous study has shown that the lncRNA MALAT1 can maintain the phosphorylation of TF CREB1 for continuous CREB1 signaling activation [60]. The lncRNA MALAT1 has also been found to regulate the miRNA MIR106B. The miRNA MIR106B is silenced by MALAT1 and induces the target gene of MIR106B [61]. Here, the identified target gene of miRNA MIR106B is CDKN1A [62], which can block cell cycle progression by inhibiting the cyclin-dependent kinase (CDK). Lots of viruses can induce the host cell cycle arrest to produce resources and construct a proper environment for viral replication to increase replication efficiency [63].
In summary, the activated EGFR is involved in the endocytosis which may help the entry of virus and establishes a favorable environment, including the suppression of proapoptotic protein and the activation of eukaryotic translation initiation factor. Furthermore, receptor EGFR has been shown to act as a cofactor for the internalization of several viruses [64,65]. Endocytosis is also required for the entry of SARS-CoV-2 into the host cell. To disrupt the proper environment for viral replication, we choose EGFR and AKT1, as significant biomarkers, as drug targets for the specific pathogenesis of the amplification stage to destroy the favorable environment.

Investigation of Common Core HPI Signaling Pathways of Amplification and Saturation Infectious Stages
Some signaling pathways were identified in both the amplification and saturation stages of SARS-CoV-2 infection. The common signaling pathways of amplification and saturation infectious stages are shown in the middle column of Figure 3 with a red background. The receptor TLR2 has been shown to detect the SARS-CoV-2 envelope (E) protein as its ligand [66]. It is a common pathogen recognition receptor that activates the host's innate immunity. After detecting the SARS-CoV-2 E protein, TF NFKB1 is activated by signaling through MYD88, IRAK4, TRAF6, TAK1, and CHUK. The activation of TF NF-B induces the target genes CCL2, IL1A, TNF, and IL12A. All of them are involved in the inflammation and innate immune responses [67][68][69] to defend against SARS-CoV-2 infection. TNF may also be implicated in apoptosis [70]. It is the ligand of receptor TNFR1 as well. In addition to the activation of TF NF-B as a defense response of the host cell, melanoma differentiation-associated protein 5 (MDA5) can detect replicative intermediates of both positive-and negative-strand RNA viruses [71]. After detecting the virus, TF IRF3 is phosphorylated through signaling proteins MDA5, MAVS, TRAF3, and TBK1, which can induce the target gene IFNB1. The IFNB1 can be the ligand of the receptor IFNAR1. The infected host cells will release interferons (IFN) and let nearby cells enhance their antiviral defenses. Like other viruses [72], SARS-CoV-2 also interrupts the host cell's immune system. The SARS-CoV-2 membrane (M) protein was identified to inhibit the production of IFN. A recent study has shown that SARS-CoV-2 M protein can prevent nuclear translocation of TF IRF3 and inhibit the phosphorylation of IRF3 [73].
The host cell cycle is also interfered with by the SARS-CoV-2 protein. The receptor TGFBR1 was identified to activate TF SMAD3, which may be caused by the ligand TGF-. The upregulation of ligand TGF-has been observed in SARS-CoV-2 infection [74,75]. The phosphorylation of SMAD3 induces the target genes CDKN1A, CDKN2B, and SER-PINE1. Both CDKN1A and CDKN2B are CDK inhibitors to disrupt the cell cycle's progression. As mentioned above, cell cycle arrest is a desirable event for viral replication. SARS-CoV-2 nucleocapsid (N) protein was identified to interact with TF SMAD3 to enhance the downstream target genes, which was consistent with previous research [76]. In addition to causing cell cycle arrest, TF SMAD3 is also involved in coagulopathy by inducing the target gene SERPINE1. The overexpression of SERPINE1 has been reported to play an important role in COVID-19-associated coagulopathy, leading to acute respiratory distress syndrome (ARDS) [77]. Coagulopathy may be fatal, so TF SMAD3 should be suppressed.
Briefly, the abnormal suppression of IRF3, which induces IFNB1, by the SARS-CoV-2 M protein and the overactivation of TF SMAD3 establishes the proper environment for the replication of SARS-CoV-2. Therefore, we choose IFNB1 and SMAD3 as significant biomarkers (drug targets) for the overlapping pathogenesis of the amplification and saturation stages.

Investigation of Specific Core HPI Signaling Pathways at Saturation Infectious Stage
In the saturation infectious stage, the mRNA level of the virus is decreased. Based on signaling pathways in the right column of Figure 3, we investigate the host defense mechanism against SARS-CoV-2 infection. The inducted proteins of target genes TNF and IFNB1 may be the ligands. While receiving the ligand TNF-, the downstream proteins of receptor TNFR1 are activated. The activation of receptor TNFR1 can activate TRAF2 and FADD through signaling protein TRADD. After activation of TRAF2, TF JUN is phosphorylated via signaling proteins MEKK1/SEK1/JNK3, which induce the target genes TNF, IL12A, CXCL10, and TP53. The target gene CXCL10 is a pro-inflammatory cytokine that is involved in lots of processes such as apoptosis, chemotaxis, and the activation of peripheral immune cells [78][79][80]. The induction of TP53 can induce the target gene BAX. BAX is a pro-apoptosis protein that has been shown to be involved in P53-mediated apoptosis [81]. The other apoptotic pathway activated by the receptor TNFR1 is through FADD, which activates the BH3 interacting-domain (BID) protein through signaling protein CASP10 [82]. BID is also a pro-apoptosis protein. As mentioned above, the apoptosis of the infected host cells can reduce the replication of virus because the virus must employ the living host cell for replication.
Another specific signaling pathway of the saturation infectious stage is the production of ISG. With the binding of ligand IFN-, receptor IFNAR1 interacts with TYK2. TYK2 can stimulate the phosphorylation of STAT1 to interact with TF IRF1 and TF IRF9. TF IRF1 induces the target gene IFNB1 and TF IRF9 targets genes ISG15 and MX1. Interferon-stimulated gene 15 (ISG15) is important to host cells against virus infection. It is involved in several cellular functions, including the limit of the newly synthesized virus proteins, induction of natural killer cell proliferation, and the enhancement of lytic capabilities of lymphokine-activated killer-like cells [83]. The target gene MX1 can antagonize the replication process of several different RNA and DNA viruses. Recent research has shown that MX1 is upregulated after SARS-CoV-2 infection, that MX1 has a direct effect on the viral ribonucleoprotein complex, and that its GTPase activity is essential for its antiviral function [84].
In summary, for the specific core HPI signaling pathways in the saturation infectious stage, the virus mRNA level begins to reduce. The activation of TNF and IFN signaling pathways may contribute to the host cell defense against SARS-CoV-2 infection. Furthermore, the enhancement of IFNB1 in common core HPI signaling pathways potentially enhances the IFN signaling pathway. Thus, the TF JUN is picked as the drug target for the specific pathogenesis in the saturation infectious stage to enhance the defense of SARS-CoV-2 infection.

Multiple-Molecule Drug Discovery and Design by DNN-Based DTI Model with Drug Design Specifications
After investigating the core HPI signaling pathways and their downstream abnormal cellular functions for SARS-CoV-2 infection at the amplification and saturation stages in the above subsection, we finally chose EGFR, AKT1, IFNB1, and SMAD3 as the drug targets for the amplification stage of SARS-CoV-2 infection and picked IFNB1, SMAD3, and JUN as the drug targets for saturation stage of SARS-CoV-2 infection. The purpose of drug targets for the amplification infectious stage is mostly to reduce the interferences of the host cell by SARS-CoV-2. The drug target JUN for therapy in the saturation infectious stage is to shorten the course of the SARS-CoV-2 infection by reducing the virus's replication. Subsequently, the DNN-based DTI model is constructed and trained by DTI databases to predict the candidate repurposing drugs that target these significant biomarkers (drug targets). Finally, the potential multiple-molecule drug is proposed for SARS-CoV-2 infection by screening the candidate repurposing drugs with three drug design specifications. The architecture of the DNN-based DTI model for DTI prediction is shown in Figure 4.  The DNN-based DTI model was first trained by DTI data at the right column. Then, the well-trained DNN-based DTI model was used for the prediction of candidate drugs and the candidate drugs were finally screened by drug design specifications as the potential drugs to combine a multiplemolecule drug in the left column.

Prediction Performance of DNN-Based DTI Model
The model was trained by Keras with a batch size = 64, epoch = 200 (with Early Stopping), and Adam optimizer (default arguments). A 10-fold cross-validation was applied to estimate the prediction performance of the DNN-based DTI model, which is shown in Table 4. The learning process (loss and accuracy of training and validation) is visualized in Figure 5 (the early stop at epoch = 117 to avoid overfitting). Finally, the receiver operating characteristic (ROC) curve is plotted in Figure 6, and the area under the ROC curve (AUC) is 0.991, which implies that the model has good ability to distinguish between positive and negative interactions.   After training the DNN-based DTI model, we can predict the candidate drugs which target the significant drug targets found by the systems biology method. The regulation ability, toxicity, and sensitivity of the drug are considered as drug design specifications for screening the candidate drugs. The regulation ability indicates the upregulation (> 0) or downregulation (<0) of the drug target interaction. Therefore, if a drug target is upregulated during infection process, we need to select a drug to downregulate it, and vice versa. A small value of sensitivity indicates a lower sensitivity to chemical perturbation for the cell, and a higher LC50 implies a lower toxicity for the body. Thus, to upregulate a target gene, we tend to find a molecular drug with a larger regulation ability, a small value of sensitivity, and a larger LC50 value. Based on these drug design specifications, some candidate drugs for the significant drug targets are shown in Table 5. Finally, the potential multiple-molecule drug shown in Table 6 is proposed for the treatment of the amplification stage of SARS-CoV-2 infection, and the potential multiple-molecule drug shown in Table 7 is proposed for the saturation stage of mild-to-moderate SARS-CoV-2 infection. The drugs with a star (*) are the potential drugs for multiple-molecule drugs in Tables 6 and 7. Table 6. The proposed potential multiple-molecule drug with the corresponding targets for disrupting the progression of the amplification stage of SARS-CoV-2 infection.
Erlotinib 17-beta-estradiol V indicates the drug can induce or inhibit the corresponding target.

Discussion
After two years of the COVID-19 pandemic, many drugs have been used to treat COVID-19. Nevertheless, there is still no drug proven effective against COVID-19 and no clear mechanism for how SARS-CoV-2 affects the host cell to replicate efficiently. We first constructed the dynamic models to describe the candidate HPI-GWGEN and used HPI RNA-seq time-profile data to identify the core signaling pathways of the HPI-GWGEN during SARS-CoV-2 infection. Then, we explored the possible systematic molecular mechanism of why SARS-CoV-2 can replicate rapidly by the reverse engineering method. Although the treatment of cytokine storm (the main cause of mortality in COVID-19) is a crucial issue, the suppression of viral load is also important. For viral infection, the drugs can be designed to target the host or virus. The drugs targeting the virus are designed to target viral proteins such as viral proteases to inhibit their biological function. The rapid mutation of viruses may cause the failure of the drugs that target the virus, especially RNA viruses. The variant of viruses may not only increase the transmissibility but also cause the failure of vaccines and the drugs targeting the viral protein. Although the design of drugs targeting the virus is more popular, antiviral drugs targeting host proteins have also been used for other viral infections. For instance, the commercial drug interferon alpha has been shown to be effective against the viral infection of hepatitis B and C virus, papillomavirus (Kaposi's sarcoma) virus, and human herpesvirus 8 [85]. The drug Nitazoxanide shows antiviral activity for some viruses by targeting the host protein and affecting the host cellular functions [85,86]. Thus, we try to find the drugs that target the host proteins to reduce the risk of the variant of SARS-CoV-2. With the temporal information of HPI RNA-seq time-solved data, we could investigate the core HPI signaling pathways of SARS-CoV-2 infection, as shown in Figure 3, to identify the significant drug targets for the amplification and saturation infectious stages. Then, we utilized the DNNbased DTI model to predict the candidate drugs which can induce or inhibit the significant biomarkers (drug targets). After screening drug design specifications, we suggested the multiple-molecule drug consisting of bosutinib, erlotinib, and 17-beta-estradiol for the treatment of the amplification stage of SARS-CoV-2 infection and the multiple-molecule drug consisting of erlotinib, 17-beta-estradiol, and sertraline for the treatment of the saturation stage of mild-to-moderate SARS-CoV-2 infection.
In the multiple-molecule drug for the amplification stage of SARS-CoV-2 infection, Bosutinib, a synthetic quinolone derivative and tyrosine kinase inhibitor, is commonly used for the treatment of chronic myeloid leukemia. It has been shown that Bosutinib can inhibit the activation of EGFR and induce the apoptosis of cells [87]. In addition to the inhibition of EGFR activation, it can inhibit the activation of Akt as well [87]. Erlotinib, a quinazoline derivative, is a common EGFR inhibitor that is used to treat cancer, especially for EGFR mutation-positive, non-small-cell lung cancer [88]. In addition to the inhibition of EGFR, a previous study has also shown that Erlotinib can inhibit the activation of Smad2/3 [89]. Furthermore, Erlotinib was reported to prevent fibrosis development in in vivo models [51], and it was proposed as a therapeutic agent in the treatment of COVID-19 [90]. For the drug targeting IFNB1, we predicted 17-beta-estradiol. 17-beta-estradiol is a synthetic form of estradiol, a steroid sex hormone, which may be involved in inflammation and the immune [91]. A previous study shows that 17-beta-estradiol can induce IFN, especially IFN-, via the activation of IRF1 [92]. 17-beta-estradiol is also reported to suppress the phosphorylation of Smad2 and Smad3 and reduce their gene reporter activity in response to TGF-beta [93]. Interestingly, many observations have shown that the mortality rate of men is higher than women [94][95][96]. Estradiol may be a protective role against COVID-19 [97]. Hence, it was also proposed as a therapy of COVID-19 [98]. Here, we think that it can induce IFNB1 and suppress the phosphorylation of SMAD3 to achieve protection against COVID-19. In the multiple-molecule drug for the saturation stage of SARS-CoV-2 infection, Sertraline, a selective serotonin reuptake inhibitor used in the treatment of depression, can upregulate JUN and induce apoptosis [99][100][101]. The apoptosis of infected cells is important to clean the viruses in infected cells. In addition to the upregulation of JUN, the anticoagulant property of Sertraline has been reported as well [102].
In summary, we applied the systems biology method to clearly understand the molecular mechanism of rapid replication of SARS-CoV-2. Subsequently, we identified the significant biomarkers as drug targets to destroy the proper microenvironment for the replication of SARS-CoV-2 or enhance the defense of host cells against SARS-CoV-2. EGFR, AKT1, IFNB1, and SMAD3 are chosen as the drug targets for the amplification stage of SARS-CoV-2 infection, and IFNB1, SMAD3, and JUN are picked out as the drug targets for the saturation stage of SARS-CoV-2 infection. After training the DNN-based DTI model with DTI databases, we could predict the potential molecular drugs with three design specifications for the treatment of SARS-CoV-2 infection. Finally, we proposed the combination of bosutinib, erlotinib, and 17-beta-estradiol as the multiple-molecule drug for the treatment of the amplification stage of SARS-CoV-2 infection and the combination of erlotinib, 17-beta-estradiol, and sertraline as the multiple-molecule drug for the treatment of the saturation stage of mild-to-moderate SARS-CoV-2 infection.

Construction of the Candidate HPI-GWGEN using Big Data Mining
The HPI-GWGEN contains two networks: the HPI protein-protein interaction network (HPI-PPIN) and the HPI gene regulation network (HPI-GRN). Both of them can be further classified into the host intraspecies network, host-pathogen interspecies network, and pathogen intraspecies network. In the candidate HPI-GWGEN, we are only concerned with whether proteins, genes, miRNAs, or lncRNAs in the candidate HPI-GWGEN have existing interactions or regulations. This can be expressed by a Boolean matrix.
There are currently not enough regulations between humans and SARS-Cov-2 to construct an HPI-GRN. We first supposed that the regulations of the host on the virus genes are not negative. We then used systematic methods to eliminate the false positives.

HPI RNA-Seq Time-Profile Data
To find the crosstalk between humans and SARS-CoV-2, the dynamic models for the candidate HPI-GWGEN were constructed, and HPI RNA-seq time-profile data were utilized to present the expressions of HPI-GWGEN during the amplification and saturation infectious stages for these HPI-GWGEN models. The dynamic models can describe the candidate HPI-GWGEN through the reverse engineering method [103] using HPI RNAseq time-profile data to reflect the system behavior of HPI-GWGEN during SARS-CoV-2 infection.
The HPI RNA-seq data could be downloaded from the National Center for Biotechnology Information (NCBI) (GEO number: GSE163547) [104]. We found the average for two samples infected with the MOI of 0.25 at 0, 4, 24, 48, 72, and 96 h post-infection (hpi). The genome-wide HPI RNA-seq time-profile data were employed to identify the system parameters of candidate HPI-GWGENs using the system identification method [103]. Since the mRNA level of SARS-CoV-2 majorly increased with time, reached the peak at 48 hpi, and slowly decreased (saturation) after 48 hpi, we defined the period from 0 to 48 hpi as the (viral) amplification stage and the period from 24 to 96 hpi as the (viral) saturation stage. With the Gencode v35/v27 annotation, the nodes were sorted into six types to be adopted for the dynamic models: host proteins, virus proteins, host genes, host miRNAs, host lncRNAs, and virus genes.

Dynamic Models for HPI-GWGEN
For the candidate HPI-PPIN in candidate HPI-GWGEN, the expression levels of host proteins and interactive pathogen proteins can be modeled as the following dynamic equations [103]: where ̇ ( ), ( ), ( ), and ( ) represent the expression level of the th host protein, the ℎth host protein, the th pathogen protein, and the th host gene at time t, respectively; and are the number of the host proteins and the pathogen proteins that interact with the th host protein, respectively; ̇ and specify the interactive ability between the ℎth host protein and the th host protein and between the th pathogen protein and the th host protein, respectively; , − , and indicate the translation rate from the corresponding mRNA, the degradation rate, and the basal activity level of the th host protein, respectively; the basal level denotes the unknown or unavailable interaction such as phosphorylation; ( ) is the stochastic noise of the th host protein at time t; is the total number of host proteins in candidate HPI-PPIN.
The dynamic interaction models of pathogen proteins of HPI-PPIN in the candidate HPI-GWGEN can be described as the following discrete-time dynamic equations [103]: ≥ 0 − ≤ 0, for = 1, 2, ⋯, where ( ), ( ), ( ), and ( ) denote the expression level of the th pathogen protein, the ℎth host protein, the th pathogen protein, and the th pathogen gene at time t, respectively; and are the number of the host proteins and the pathogen proteins that interact with the th pathogen protein in a candidate HPI-PPIN, respectively; and specify the interactive ability between the ℎth host protein and the th pathogen protein and between the th pathogen protein and the th pathogen protein, respectively; , − , and indicate the translation rate from the corresponding mRNA, the degradation rate, and the basal activity level of the th pathogen protein, respectively; ( ) represents the stochastic noise of the th pathogen protein at time t; is the total number of pathogen proteins in a candidate HPI-PPIN.
The dynamic regulatory models of host genes in the HPI-GRN of a candidate HPI-GWGEN can be depicted as the following discrete-time dynamic equations [103]: where ( ), ( ), ( ), and ( ) denote the expression level of the th host gene, the ℎth host TF, the th host miRNA, and the th host lncRNA at time t, respectively; , , and represent the number of host TFs, host miRNAs, and host lncRNAs that have regulation on the th host gene, respectively; , , and specify the regulation ability of the ℎth host TF, the th host miRNA, and the th host lncRNA on the th host gene, respectively; − and indicate the degradation rate and the basal activity level of the th host gene, respectively; ( ) is the stochastic noise of the th host gene at time t; is the total number of host genes in a candidate HPI-GRN. The dynamic regulatory models of the host miRNAs in the HPI-GRN of a candidate HPI-GWGEN can be modeled as the following discrete-time dynamic equations [103]: where ( ) , ( ) , ( ) , and ( ) denote the expression level of the th host miRNA, the ℎth host TF, the th host miRNA, and the th host lncRNA at time t, respectively; , , and represent the number of host TFs, host miRNAs, and host lncRNAs that have regulation on the th host miRNA, respectively; , , and specify the regulation ability of the ℎth host TF, the th host miRNA, and the th host lncRNA on the th host miRNA, respectively; − and represent the degradation rate and the basal activity level of the th host miRNA, respectively; ( ) is the stochastic noise of the th host miRNA at time t; is the total number of host miRNAs in candidate HPI-GRN.
The dynamic regulatory models of the host lncRNAs in the HPI-GRN of a candidate HPI-GWGEN can be depicted as the following discrete-time dynamic equations [103]: where ( ), ( ), ( ), and ( ) denote the expression level of the th host lncRNA, the ℎth host TF, the th host miRNA, and the th host lncRNA at time t, respectively; , , and represent the number of host TFs, host miRNAs, and host lncRNAs that have regulation on the th host lncRNA, respectively; , , and specify the regulation ability of the ℎth host TF, the th host miRNA, and the th host lncRNA on the th host lncRNA, respectively; − and indicate the degradation rate and the basal activity level of the th host lncRNA, respectively; ( ) is the stochastic noise of the th host lncRNA at time t; is the total number of host lncRNAs in a candidate HPI-GRN.
The dynamic regulatory models of the pathogen genes in the HPI-GRN of a candidate HPI-GWGEN can be depicted as the following discrete-time dynamic equations [103]: where ( ), ( ), ( ), ( ), and ( ) denote the expression level of the th pathogen gene, the ℎth host TF, the th host miRNA, the th host lncRNA and th pathogen protein at time t, respectively; , , , and represent the number of host TFs, host miRNAs, host lncRNAs, and pathogen TFs that have regulation on the th pathogen gene, respectively; , , , and specify the regulation ability of the ℎth host TF, the th host miRNA, the th host lncRNA, and the th pathogen protein on the th pathogen gene, respectively; − and indicate the degradation rate and the basal activity level of the th pathogen gene, respectively; ( ) is the stochastic noise of the th pathogen gene at time t; is the total number of pathogen genes in a candidate HPI-GRN.

System Identification and System Order Selection for HPI-GWGEN
With the discrete-time dynamic models for the candidate HPI-GWGEN, we can perform system identification using HPI RNA-seq time-profile data. Nevertheless, the number of parameters may be larger than the number of samples, which may cause over-fitting in the least square parameter estimation. Furthermore, the candidate HPI-GWGEN contains many false positives. Thus, we used the cubic spline interpolation to solve the overfitting problem in the system identification process [103]. Then, we applied a system order detection method, the Akaike information criterion (AIC) [103], to detect the system order of the dynamic equations of protein interaction and gene regulation after the system identification to prune the false positives out of the system order in the candidate HPI-GWGENs to obtain the real HPI-GWGEN.
To estimate the parameters in Equations (1)-(6), we arranged each equation as the linear regression form with regressor (expression level from RNA-seq data) ( ) and the parameter vector as follows: Then, we used the time points from to (T represents the number of time points of each infectious stage after interpolation) as the vector of observations. We could form the regressor matrices and the vectors of observations with Equations (7)-(12) as the following augmented regression equations, respectively: With regressor matrices and the regression vectors of observations , we could obtain the estimated parameter vectors by solving the following constrained optimization problems with some biological upper and lower bounds (i.e., the translation rate ≥ 0, the degradation rate − ≤ 0, and the regulation ability of miRNAs ≤ 0) on Equations (13)-(18) [103]: subject to We wanted to prune the false positives in the candidate HPI-GWGEN by detecting the edges out of the system order (i.e., the number of interactions or regulations for each node). The AIC considers both the estimated variance and model complexity to obtain the fittest number of edges (i.e., the system order). The AIC value for each model of a node with estimated parameters is given as follows (the dim() in equations represents the dimension of vector) [103]: ( , , , ) = ( In the AICs in Equations (25)- (30), increasing the number of interactions or regulations would decrease the system identification error in the first term but increase the second term in the right-hand-side of Equations (25)- (30), and vice versa. The right number (system order of each node) of regulations and interactions would lead to the minimum value of AIC. In other words, when the false positives are considered in AIC with a larger model complexity, a larger AIC value is obtained because the false positives cannot reduce the estimated error variance in the first term and increase the model complexity in the second term. Hence, we can delete some false-positive interactions or regulations in each node of the candidate HPI-GWGEN to achieve the correct (real) HPI-GWGEN via the AIC method. After solving optimization problems in Equations (19)- (24) with the aid of the MATLAB function lsqlin and considering the system order, we could obtain the real HPI-GWGEN. To represent the real HPI-GWGEN as a network matrix, we needed to integrate the PPIN and GRN. When we described the dynamic models in Equations (1)-(6), we did not consider the zero term. To represent different models (especially, different dimension of network) with a network matrix, we had to fill the terms which lack interaction or regulation (i.e., without interaction or regulation in candidate HPI-GWGENs or the false positives pruned off after system order detection) with zeros. For convenience, we still used the same superscript for estimated coefficients of interaction or regulation to denote the relation between two nodes, i.e., still represents the estimated interactive ability between the 1st host protein and the 2nd host protein; it will be zero if these proteins do not interact in the candidate HPI-GWGEN, or it will be a false positive pruned off after system order selection. Then, the network matrix ∈ ( )×( ) of the real HPI-GWGEN was integrated as the following network matrix: where P and G represent the PPIN and GRN, respectively; HP, PP, HG, HM, HL, and PG specify the host protein, the pathogen protein, the host gene, the host miRNA, the host lncRNA, and the pathogen gene, respectively. We wanted to investigate the significant pathogenesis for the amplification and saturation stages based on their real HPI-GWGEN. However, the real HPI-GWGEN was too large to be analyzed by the annotation of KEGG pathways. Thus, we applied the PNP method to extract the core HPI-GWGEN from the real HPI-GWGEN, which is more easily annotated by KEGG pathways [47,48].

PNP Method to Extract the Core HPI-GWGEN from Network Matrix of Real HPI-GWGEN
The is a method that projects each row (node) of network matrix M in Equation (31) (real HPI-GWGEN) onto the significant 85% structure of the overall network so that we can know the importance of each node to the overall network according to the projection value. We first performed the singular value decomposition (SVD) for the network matrix using the following equation: where the columns of L and R denote the left singular vectors and right singular vectors of M, respectively; E only contains the singular values of M on the entries of and ≥ ≥ ≥ ⋯ ≥ 0; = + + + + + and = + + + are the row and column dimensions of network matrix M, respectively.
Here, we adopted the truncated SVD. That is, we chose the top t singular values that consisted of more than 85% of the energy of network. In other words, the minimum t is satisfied with the following equation: Then, we define the projection value for the th row of the network matrix on the top t singular vectors of the network matrix M using the following equation: where and represent the th row of M in Equation (31) and the th column of (the columns of R are the right singular vectors of M), respectively.
Subsequently, the projection values of each node were ranked from large to small values (a larger value implies more significance to the network). We used the top-6000significance nodes to construct the core HPI-GWGEN, which is acceptable for the annotation of KEGG pathways in DAVID [47,48]. Thus, we could extract the core HPI-GWGEN of the amplification and saturation stages from the real HPI-GWGEN of the amplification and saturation stages, correspondingly. With the aid of the annotation of KEGG pathways, we investigated the pathogenic mechanism. Finally, considering the core HPI signaling pathways and their downstream abnormal cellular functions, we selected the significant biomarkers as drug targets against the amplification and saturation stages of SARS-CoV-2 infection.

Preprocess of Targets and Drugs Data
To train the DNN-based DTI model in Figure 4, we first collected the DTI data from databases, including ChEMBL [30], BindingDB [31], Pubchem [32], UniProt [33], and DrugBank [34]. To calculate chemical descriptors for drugs and properties of proteins, we used PyBioMed [105], a python package, to transform the drugs and targets into features. Subsequently, the drug and target features were merged into a feature vector as in the following equation: where and are the features of the drug and target (protein), respectively; and are the th drug feature and the th target feature, respectively; A and B are the total number of features of the drug and the total number of features of the target, respectively. The input of the DNN-based DTI model should be in the feature vector form.
The collected drug-target interaction data contain 80,291 proven (positive) interactions and 100,024 unproven (negative) interactions, which implies that the collected data suffer from data imbalance. Data imbalance can cause the model to predict the majority, leading to prediction bias. Hence, the negative interactions were randomly deleted to match the number of positive interactions. Then, we divided all the data into training data (four-fifths) and testing data (one-fifth). To improve the convergence of gradient descent, we first performed feature scaling for the training dataset. Because there were some outliers for some features, we used standardization for each feature of training data. Then, PCA was applied to reduce the dimensions of the feature vector to 900 for the convenience of computing the DNN-based DTI model with 900 neurons in the input layer, as shown in Figure 4.

Architecture of DNN-Based DTI Model
Recent studies [28,29] have shown that the DNN-based DTI model can improve the prediction of interaction probability. We employed the DNN to predict the DTI for the pre-candidate molecular drugs with the significant biomarkers (drug targets). The architecture of neural network is shown in Figure 4. The neural network contains four hidden layers. Each hidden layer contains 512, 256, 128, and 64 neurons, respectively. Each neuron in the hidden layer has a bias, ReLU as the activation function to learn the nonlinearity, and a dropout (= 0.45) to avoid overfitting [106]. The output layer contains a neuron, a bias, and sigmoid as the activation function to represent the output between 0 and 1 as the drug-target interaction probability.
Since the DTI prediction was the binary classification, we used the binary cross-entropy as the loss function. The adaptive moment estimation (Adam) [107] was adopted as the optimization algorithm to update the parameters of the DNN-based DTI model. The DNN-based DTI model was trained by Keras with batch size = 64, epoch = 200 (with Early Stopping), and Adam optimizer (default arguments). The 10-fold cross-validation was first applied to examine the prediction performance of the DNN-based DTI model. Finally, the AUC was used to judge the ability of DNN-based DTI model to distinguish positive and negative interaction.

Drug Design Specifications
In addition to the drug-target interaction, the regulation ability, toxicity, and sensitivity of the drug are considered when we chose the drugs to make sure the quality of drugs. The Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 dataset [108,109] is used for drug regulation ability, i.e., the indicates the upregulation (>0) or downregulation (<0) of the drug target interaction. The drug sensitivity is also considered. The PRISM Repurposing dataset [110] contains chemical perturbations of compounds for homo sapiens cells. The closer zero value of sensitivity indicates the less sensitivity chemical perturbation for the cell. The other drug design specification is the toxicity (LC50), which is obtained by the tool ADMETlab 2.0 [111]. The higher value of LC50 implies the lower toxicity for the body.

Conflicts of Interest:
The authors declare no conflict of interest.