A New Computational Approach to Evaluating Systemic Gene–Gene Interactions in a Pathway A ﬀ ected by Drug LY294002

: In this study, we investigate how drugs systemically a ﬀ ect genes via pathways by integrating information from interactions between chemical compounds and molecular expression datasets, and from pathway information such as gene sets using mathematical models. First, we adopt drug-induced gene expression datasets; then, employ gene set enrichment analysis tools for selecting candidate enrichment pathways; and lastly, implement the inverse algorithm package for identifying gene–gene regulatory networks in a pathway. We tested LY294002-induced datasets of the MCF7 breast cancer cell lines, and found a CELL CYCLE pathway with 101 genes, ERBB signaling pathway consisting of 82 genes, and MTOR pathway consisting of 45 genes. We consider two interactions: quantity strength depending on number of interactions, and quality strength depending on weight of interaction as positive ( + ) and negative ( − ) interactions. Our methods revealed ANAPC1-CDK6 ( − 0.412) and ORC2L- CHEK1(0.951) for the CELL CYCLE pathway; INS-RPS6 ( − 3.125) and PRKAA2-PRKAA2 ( + 1.319) for the MTOR pathway; and CBLB-RPS6KB1 ( − 0.141), RPS6KB1-CBLC ( + 0.238) for the ERBB signaling pathway to be top quality interactions. Top quantity interactions discovered include 12; the CDC ( − , + ) gene family for the CELL CYCLE pathway, 20; PIK3 ( − ), 23; PIK3CG ( + ) for the MTOR pathway, 11; PAK ( − ), 10; PIK3 ( + ) for the ERBB signaling pathway.


Introduction
Even with the enormous annual expense (~1.8 billion US dollars) associated with drug development [1] and the time-consuming 10-15-year process required to bring a novel drug to market [2,3], the new drug approval rating averaged only 2.01% between 2000 and 2008 [4,5]. To mitigate the time lag, drug repositioning, in which existing approved drugs are recycled for new diseases by integrating targets using computational methods, has been suggested as a pharmaceutical alternative [1]. Target genes of a drug are traditionally identified by chemical similarity [6]. The advantages of using gene expression response of cell lines after treatment include the ability to identify systematic drug influence and the fact that no prior information is required [7,8].
Computational drug studies are based on the similarity of chemical components, physical binding of proteins, and the kernel diffusion method, which integrates gene expression and network data [9][10][11]. A recent study developed a local radiality method for identifying drug-target interactions by integrating topological and differentially expressed gene information [12]. In particular, the study demonstrated a downstream method by identifying the affected pathway from drug-target and gene-gene connections [12]. A recent study [13] reported an approach for the systemic negative feedback gene network by inducing a drug for colorectal cancer. The advantages of the data-driven drug target network model are revealing potential drug targets and being excluded considering We used L 1 and L 2 norm for error functions as follows, L 1 = |R − C|, and L 2 = (R − C) 2 where R is real datasets and C is synthetic datasets. Here, we employed the inverse algorithm package for identifying the relationship of genes in the drug enrichment pathway.
Breast cancer is the leading cancer in women and the second common cancer overall. The total direct medical costs including diagnosis, treatment, and follow up associated with breast cancer are very expensive [24]. Despite all efforts over several decades, breast cancer is the leading cause of death and malignancy in females, and will be expected to reach 16 million by 2025 [25]. Here, we employed the MCF7 breast cancer cell line which has been proved to be a suitable model for breast cancer investigation [26]. We believe that the approach presented here can help the treatment of the cancer as well as reduce the burden of the expense and its side effects.
The novelty of our study is the downstream approach that consists in first identifying systemically affected pathways (gene sets), and then finding gene networks using an inverse problem algorithm which can be used to explore systemic drug efficacy.

Materials
We downloaded the 2017 version of the MCF7 breast cancer cell line dataset, which induced 182 drugs (GPL3921), from the Gene Expression Omnibus (GEO). The MCF7 dataset consisted of 22,268 probes and 218 samples divided into 182 cancer cell lines and 36 control samples using Affymetrix HT human genome U133A array platform by Lamb [27]. We downloaded platform information from BioMart to match probe to gene names, which concluded 13,236 unique genes. The number of unique drugs is 87 out of 182. Among them, we extracted eight samples induced by LY294002. We also downloaded 186 KEGG pathways gene sets in 2019 for molecular path information, and a gene set enrichment analysis (GSEA) tool (to identify enrichment on those gene sets. Additionally, we used web-based open source STITCH (search tool for interactions of chemicals), which shows networks between drugs, pathways, and genes [23].

Methods
The existing gene set enrichment analysis (GSEA) tool was employed to find drug associated enrichment pathways by calculating the enrichment score of gene expression level based on the Kolmogorov-Smirnov statistical test [28]. In addition, we used open source analysis STITCH, which offers networks between 1.5 million genes, 2200 drugs, and 68,000 chemicals. To identify enrichment pathways, we compared cancer datasets to normal datasets; to identify relationships between genes in the selected pathways, we used only cancer datasets. In Figure 1, blue boxes represent input datasets, orange boxes represent input pathway information and the gray box identifies gene-gene networks.
Processes 2020, 8, x FOR PEER REVIEW 3 of 10 offers networks between 1.5 million genes, 2200 drugs, and 68,000 chemicals. To identify enrichment pathways, we compared cancer datasets to normal datasets; to identify relationships between genes in the selected pathways, we used only cancer datasets. In Figure 1, blue boxes represent input datasets, orange boxes represent input pathway information and the gray box identifies gene-gene networks. The inverse algorithm consists of two parts: one for searching parameters and the other for generating synthetic datasets. The main elements of parameter searching include the optimization method, line search, and error function. For the optimization method, we used the Broyden-Fletcher-Goldfarb-Shanno-algorithm [29] as it is computationally efficient, being constructed of an inverse Hessian matrix to determine the values of next step iteration. The error function was used to find the global minimum by integrating the difference between real and synthetic datasets and provides a stop condition for iteration. Here, we used 1 L norm below.
To analyze the influence of the drug to cancer genes, the following linear combination was employed: ' X WX  [30], where , ' X X represent clinical and synthetic datasets, respectively.
The element ij w of matrix W represents the influence of the j th gene on i th gene which we refer to as interaction weight. X represents logarithm base 2 of the expression level of MCF7 breast cancer datasets downloaded from Gene Expression Omnibus (GEO).  The inverse algorithm consists of two parts: one for searching parameters and the other for generating synthetic datasets. The main elements of parameter searching include the optimization method, line search, and error function. For the optimization method, we used the Broyden-Fletcher-Goldfarb-Shanno-algorithm [29] as it is computationally efficient, being constructed of an inverse Hessian matrix to determine the values of next step iteration. The error function was used to find the global minimum by integrating the difference between real and synthetic datasets and provides a stop condition for iteration. Here, we used L 1 norm below.
To analyze the influence of the drug to cancer genes, the following linear combination was employed: X = WX [30], where X, X represent clinical and synthetic datasets, respectively.
The element w ij of matrix W represents the influence of the j th gene on i th gene which we refer to as interaction weight. X represents logarithm base 2 of the expression level of MCF7 breast cancer datasets downloaded from Gene Expression Omnibus (GEO). X represents the synthetic datasets generated based on a clinical dataset with white noise σ k = (−k, k) using Gaussian distribution such as

Results
We extracted enrichment pathways based on enrichment score from Gene Set Enrichment Analysis (GSEA) by comparing cancer gene expression datasets (eight samples) induced by LY294002 to normal datasets (36 samples), in parallel, matched with putative pathways suggested by STITCH. We obtained the CELL CYCLE pathway for the most differentially expressed pathway from GSEA as ordered by the normalized enriched score (NES), ERBB signaling pathway for the least false discovery rate from STITCH, and the MTOR pathway generated from both GSEA and STITCH ordered within the top five. We then calculated the interaction weight between 101 genes of the CELL CYCLE pathway, 82 genes of the ERBB signaling pathway, and 45 genes of the MTOR pathway. The direct code for performing inverse problem is generated from synthetic datasets by adding white noise We obtained a total of 10,201 relationships in the CELL CYCLE pathway, 6724 relationships in ERBB pathways, and 2025 relationships in the MTOR pathway. In the results, we considered quantity interaction depending on the number of connections and quality interaction depending on computed weight. The computed interaction weights from inverse problem differed in independent pathways; therefore, weight cutoffs were different for each pathway. We selected the top 100 interactions. Analysis from STITICH presents 10 genes affected by LY294002 including FOXO3, RPS6KB1, PIK3C (A, B, D, G) GSK3B, AKT1, PTGS2, and MTOR. After we refined the datasets with KEGG pathway information, the ERBB signaling pathway included 82 genes and the MTOR pathway included 45 genes. Eight of 10 genes were regulated in the ERBB signaling pathway (consisting of 82 KEGG genes), and seven genes were regulated in MTOR genes (consisting of 45 KEGG genes). We presented the gene name where appropriate; otherwise, we presented gene families for the quantity strength.

CELL CYCLE Pathway
We obtained the CELL CYCLE pathway including 101 genes as the most enriched pathway from GSEA performance with 42 core genes, while this pathway was not indicated in STITCH. Since interaction weights vary depending on the pathway, we selected the top 100 interactions for each

Results
We extracted enrichment pathways based on enrichment score from Gene Set Enrichment Analysis (GSEA) by comparing cancer gene expression datasets (eight samples) induced by LY294002 to normal datasets (36 samples), in parallel, matched with putative pathways suggested by STITCH. We obtained the CELL CYCLE pathway for the most differentially expressed pathway from GSEA as ordered by the normalized enriched score (NES), ERBB signaling pathway for the least false discovery rate from STITCH, and the MTOR pathway generated from both GSEA and STITCH ordered within the top five. We then calculated the interaction weight between 101 genes of the CELL CYCLE pathway, 82 genes of the ERBB signaling pathway, and 45 genes of the MTOR pathway. The direct code for performing inverse problem is generated from synthetic datasets by adding white noise σ k = (−k, k) where k is {k|0.1 ≤ k ≤ 1} to cancer datasets. We obtained a total of 10,201 relationships in the CELL CYCLE pathway, 6724 relationships in ERBB pathways, and 2025 relationships in the MTOR pathway. In the results, we considered quantity interaction depending on the number of connections and quality interaction depending on computed weight. The computed interaction weights from inverse problem differed in independent pathways; therefore, weight cutoffs were different for each pathway. We selected the top 100 interactions. Analysis from STITICH presents 10 genes affected by LY294002 including FOXO3, RPS6KB1, PIK3C (A, B, D, G) GSK3B, AKT1, PTGS2, and MTOR. After we refined the datasets with KEGG pathway information, the ERBB signaling pathway included 82 genes and the MTOR pathway included 45 genes. Eight of 10 genes were regulated in the ERBB signaling pathway (consisting of 82 KEGG genes), and seven genes were regulated in MTOR genes (consisting of 45 KEGG genes). We presented the gene name where appropriate; otherwise, we presented gene families for the quantity strength.

CELL CYCLE Pathway
We obtained the CELL CYCLE pathway including 101 genes as the most enriched pathway from GSEA performance with 42 core genes, while this pathway was not indicated in STITCH. Since interaction weights vary depending on the pathway, we selected the top 100 interactions for each positive and negative. The CELL CYCLE pathway demonstrated the top five negative quality strengths (−) in Table 1: ANAPC1-CDK5, CDKN2D-YWHAB, CCNA1-CDKN2A, CCNH-RB1, PTTG1-MCM7. Additionally, the top five quantity strengths include the CDC, CCN, CDK, MCM, and ORC family. Table 1 shows the top five positive quality strengths: ORC2L-CHEK1, PTTG1-SFN, CDC16-PTTG2, YWKAB-PKMYT1, and BUB3-PRKDC. Furthermore, the top 5 quantity strengths show that CDC families are activated by 12 genes, CCN (A, B, D, E, H) are activated by 10 genes, the MCM family including MCM (2, 3, 4, 5, 6, 7) is activated by seven, and the ORC family including ORC (1L, 2L, 3L, 4L, 5L, 6L) is activated by seven. As a stress gene, minichromosome maintenance deficient 7 (MCM7) is also overexpressed in esophageal squamous tumor with poor survival time [31]. The result of CCNA1 is also supported that CCNA1 was suggested as inhibitor in cell cycle by using DNA methylation and gene expression levels in MCF7 cell line [32]. In the case of CDK6, there is no evidence of association of SNPs in breast cancer [33]; however, other studies suggested profoundly related with ER+ breast cancer [34,35]. Therefore, our results may help the predictive target genes of specific drug and systemic insight of the mechanism.

MTOR Pathway
The MTOR pathway including 45 genes was selected from GSEA with nine core statistically computed genes: AKT3, PIK3CG, AKT2, RPS6KA6, PIK3CA, ULK2, PGF, VEGFC, and ULK1. The MTOR pathway was connected with seven genes in the STITCH with a 1.11 × 10 14 false discovery rate. We implemented the inverse algorithm with 45 genes of KEGG pathway information, and presented the results in (Table 2) for negative and (B) for positive quality interactions and meaningful quantity interaction from the top 100 selected from each of negative and positive interactions. In the results of negative quantity interactions, we demonstrate stress genes and their frequency from seven unique genes-shown in (Table 2). Especially, PIK3CG and AKT1 are indicated from STITCH to bind LY294002 in the MTOR pathway. In the positive interaction, specific genes instead of gene families are targeted by stress genes. We provide selected validation for our findings. For example, ribosomal protein S6 (RPS6) was associated with breast cancer by targeting miR-129-5p [36] and regulating PIK3CA [37]. Recent work suggested that PIK3CG was a targeted therapy for claudin-low breast cancer; we also found the quantitative target gene of the given drug [38]. The diagram shown in Figure 3 illustrates an aggregation of both negative and positive quality interactions presented in Table 2 We provide selected validation for our findings. For example, ribosomal protein S6 (RPS6) was associated with breast cancer by targeting miR-129-5p [36] and regulating PIK3CA [37]. Recent work suggested that PIK3CG was a targeted therapy for claudin-low breast cancer; we also found the quantitative target gene of the given drug [38]. The diagram shown in Figure 3 illustrates an aggregation of both negative and positive quality interactions presented in Table 2.

ERBB Signaling Pathway
The ERBB signaling pathway was selected from STITCH with 1.17e-15 FDR, not from GSEA. Using the inverse algorithm, 100 interactions were obtained with negative and positive weights. Results in Table 3 present stress genes that are identical for each negative (RPS6KB1) and positive

ERBB Signaling Pathway
The ERBB signaling pathway was selected from STITCH with 1.17 × 10 15 FDR, not from GSEA. Using the inverse algorithm, 100 interactions were obtained with negative and positive weights. Results in Table 3 present stress genes that are identical for each negative (RPS6KB1) and positive weight (CBLC). Note that the number of unique negative and positive stress genes was, respectively, only two (RPS6KB1, MYC) and four (CBLC, CAMK2D, EREG, and SHC3). The most quantified target genes were PAK and PIK3. The selected validation of our results are presented as follows: p21-activated kinase 2 (PAK2) is known to play a critical role in breast cancer cell invasion [39], and p21-activated kinase 7 (PAK7) is another known oncogene [40]. Our method shows that PAK2 and PAK7 are inhibited by RPS6KB1 which responds to LY294002. However, RPS6KB1 is overexpressed in various cancers, implying the likelihood of poor prognosis [41][42][43]. In the previous experiment study, Cbl proto-oncogene C (CBLC) was identified as regulating PARP in breast cancer [44]. Discrimination of the presented method from previous works is to discover interaction genes rather than individual markers. As an example, the gene interaction within the pathway presented in Table 3 is schematically illustrated in Figure 4, which may help understanding systemic drug efficacy.
was identified as regulating PARP in breast cancer [44]. Discrimination of the presented method from previous works is to discover interaction genes rather than individual markers. As an example, the gene interaction within the pathway presented in Table 3 is schematically illustrated in Figure 4, which may help understanding systemic drug efficacy.

Discussion
In this study, we identified LY-294002 drug induced gene-gene networks in the enrichment pathways obtained from GSEA and/or web-based analysis tool STITCH. We selected three pathways:

Discussion
In this study, we identified LY-294002 drug induced gene-gene networks in the enrichment pathways obtained from GSEA and/or web-based analysis tool STITCH. We selected three pathways: the CELL CYCLE pathway from GSEA, the ERBB signaling pathway from STITCH, and the MTOR pathway from GSEA and STITCH. To compute the interaction weight between genes in the selected pathways, we employed the inverse problem package. The inverse algorithm, which is a well-known parameter searching algorithm, includes three major parts: an optimization method for searching parameters, direct code for generating synthetic datasets, error function for computing least squares minimization. The advantages of this proposed study are that no prior knowledge requirement exists, and that systemic drug influence insights can be freely explored. In the results, we presented quantity and quality interactions with weight. Some meaningful interactions of the part of oncogene networks were discovered: PTTG1-MCM7 from the CELL CYCLE pathway [45,46] and PIK3CG-RPS6 from the MTOR pathway [36,37]. In addition, PTTG1 was validated as a biomarker of tumor cells including breast cancer in in vivo and in vitro experiments [45]. Especially, CDK6 in the cell cycle and PIK3CG (not PIK3CA) in the MTOR pathway are recently focused in breast cancer [33,37]. The interaction of RPS6KB1-PAK2(7) from the ERBB signaling pathway was supported with a recent study carried out by Zhang [47], in which PAK2 was the strongest resistance inducer of tamoxifen in breast cancer. Those genes were identified individually, while our findings are gene interactions with activation (positive) or inhibition (negative) of which systemic drug efficacy can be understood. Even though we provided meaningful validation for our results, the computational approaches always have been faced with the limitation of application to the real clinical medicine. Therefore, this network needs to confirm laboratory experiments in the future. In addition, as the preliminary study presented here, a single drug and limited pathways are tested. For generalizability of this approach, we need to test various drugs and pathway information based on breast cancer.