A Computational Approach for Pathway-Based Systemic Drug Inﬂuence

: Drug repositioning is a well-known method used to reduce the time, cost, and development risks involved in bringing a new drug to the market. The rapid expansion of high-throughput datasets has enabled computational research that can suggest new potential uses for existing drugs. Some computational methods allow the prediction of potential drug targets of a given disease from a systematic network. Despite numerous efforts, the path of many drugs’ efﬁcacy in the human body remains unclear. Therefore, the present study attempted to understand drug efﬁcacy by systematically focusing on functional gene sets. The purpose of this study was to carry out modeling to identify systemic gene networks (called drug paths) in drug-speciﬁc pathways. In our results, we found ﬁve different paths for ﬁve different drugs. We performed GSEA to compare seven valproic acid-induced samples to 36 normal samples with 13,235 genes and determined the selenoamino acid metabolism (SelAM), including only 23 genes (Figure 3). Using these 23 genes of the SelAM pathway, we carried out an inverse problem to reveal the systemic functional relationships between those genes in this pathway. We revealed 21 positive relationships and 5 negative relationships using a cutoff of 0.45. The yellow nodes are acceptor nodes, while the green nodes are donors; the red lines indicate negative relationships, and gray lines show positive relationships.


Introduction
The utilization of existing drugs to treat new diseases, which is called drug repositioning, is a process that may help eliminate some of the issues associated with bringing a new drug to market, a 10-15-year process that requires considerable effort and an average cost of 2-3 billion United States dollars [1][2][3]. Despite overcoming these issues, the average approval rate of new drugs for clinical use is only 10%. Drug repositioning methods have been developed on the basis of the similarity of chemical compounds or ligands, the three-dimensional (3D) structure of proteins [4,5], and network-based algorithms for drug-target interactions and drug-disease interactions [6][7][8][9]. Structural approaches have identified potential drug-target interactions using 3D structures. The disadvantages of this method include the time-consuming computational 3D simulation process and the associated limitations of being applied without first obtaining 3D structure information. Ligand-based approaches have identified the target protein by comparing candidate ligands [5,7]. One limitation of this method is that no or few known ligands for the target protein may exist [10]. A network-based model that consists of nodes and edges has indicated that a higher number of connected nodes will produce a greater amount of influence [11,12]. In many cases, a network-based algorithm combined with a mathematical model was applied to develop dynamics of drug studies [13][14][15]. These computational drug repositioning methods usually follow four steps: data collection from a drug bank or gene-protein interaction, a computational model, evaluation of the method using new datasets, and validation through clinical trials [7]. Although network-based models have revealed functional relationships and drug candidates for repositioning, the drug-disease and drug-target relationships are constructed on the basis of prior knowledge, such as protein-protein interactions [2]. In previous studies, we adopted inverse problems and/or ordinary differential equations for inferring parameters of gene-gene interaction [16,17]. Both methods contributed to identifying interactions between nodes, such as genes, drugs, and targets.
Novelty of this study is that we use a mathematical model of an inverse problem for inferring gene-gene networks in enrichment pathway instead of statistical approaches. In particular, this mathematical approach has an advantage of identifying parameters without any prior information. Our method integrates pathway information and differentially expressed gene datasets with a mathematical computational algorithm. Therefore, the proposed study aimed to find the most effected pathway of a given drug as well as systemic gene interactions in the pathway.

Materials
We downloadedMCF7 breast cancer cell line (series GSE5258-GPL3921) updated in 2017 from Gene Expression Omnibus (GEO) [18]. The datasets included 218 total samples divided into 182 drug-induced cancer cell lines and 36 control samples from the Affymetrix-HThuman genome U133A array platform converting 22,268 probes to 13,235 gene names. Datasets were rearranged by eliminating those located under unknown gene names and averaging others listed under duplicate gene names. Our new method (see Section 2.2) requires the statistical analyses in which normal and tumor samples are used previously [19]. In this study, we set the minimum sample size to five arbitrarily. We extracted five druginduced datasets consisting of five or more samples: LY294002, valproic acid, sirolimus, trichostatin A, and wortmannin-induced datasets. These datasets consisted of eight, seven, seven, seven, and five samples, respectively.
The targets of gene markers by stage, grade, and drug sensitivity are unstable among datasets, and thus drug-path methods are introduced [20,21]. Numerous gene sets are available in open source; here, we used the Kyoto encyclopedia of genes and genomes (KEGG) pathway information [22].

Methods
Even though all the pathways play important roles in drug efficacy, we were concerned with identifying the pathway most affected by a drug. Therefore, we employed gene set enrichment analysis (GSEA) downloaded from [23] and obtained it using the enrichment score. Any detailed information regarding statistical analyses involved were stated previously [20].
An inverse algorithm [16,24,25] consists of four subroutines: the optimization method, a line search for finding the next direction for the parameters, objective function for determining whether the algorithm terminates or repeats by computing the distance between real datasets, and a mathematical model to generate synthetic datasets and parameters in matrix form. The method used in the study was also examined and validated in our previous studies [16,25,26]. In this study, we only used cancer datasets to identify the gene-gene network (parameters in an inverse algorithm) of the enrichment pathways. The parameter predictions for an inverse algorithm, however, require both real datasets and synthetic datasets. Synthetic datasets, which are required to search parameters, will be updated in each iteration. By adding white Gaussian noise [27][28][29] to a real dataset, we generated initial synthetic data instead of random permutations, since the variation in the real dataset was about 50% of the mean values. In the case of the datasets used to detect signals such as an interaction, noise must be reduced [27]. In this study, however, noise was added to find interactions. The schematic overall flow chart of the methods is shown in Figure 1. In brief, the formula for generating synthetic datasets was x syn = x real + ε, where ε ∈ (−0.1, 0.1); ε is white Gaussian noise; x syn and x real are synthetic and real datasets, respectively. In each iteration, x syn is newly updated. We adapted the L 2 norm for objective function f (x), which can be rewritten as L 2 = ∑ (x real − x syn ) 2 . The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is one that uses variable metric methods in multidimensions for finding the minimum of objective function, which is called an optimization method. The advantage of this method is that it uses a quasi-Newton method using the quadratic convergence of a Newton method of a nonlinear model, which is faster and more accurate than other optimization methods [24]. The drug modeling is written as X ij = [p ik ] X kj , and the p ik x kj , where x i s are ith gene expression data and p ik is the gene interaction between the ith gene and the jth gene [16]. methods in multidimensions for finding the minimum of objective function, which is called an optimization method. The advantage of this method is that it uses a quasi-Newton method using the quadratic convergence of a Newton method of a nonlinear model, which is faster and more accurate than other optimization methods [24]. The drug modeling is written as ' i x s are th i gene expression data and ik p is the gene interaction between the th i gene and the th j gene [16].

Results
We identified the top enrichment pathways characterized by metabolism using GSEA on the basis of each drug-induced dataset. These pathways include nitrogen, selenoamino acid and cysteine, porphyrin chlorophyll, naphthalene, and anthracene degradation metabolism pathway from LY294002-, valproic acid-, sirolimus-, trichostatin A-, and wortmannin-induced datasets. The cutoff values depend on size of genes in a pathway.

The Nitrogen Metabolism Pathway from LY294002-Induced Datasets
LY294002 is a known inhibitor of oncogenes [30,31]. We performed GSEA to compare LY294002-induced cancer datasets (with eight samples and 13,235 genes) to normal datasets. The nitrogen metabolism (NitroM) pathway was obtained as the top enrichment pathway. To perform an inverse algorithm using NitroM and including 23 unique genes, we obtained 23 positive relations and 2 negative relations with a 0.46 cutoff. Only two genes, asparagine synthetase (ASNS) and CA14, were donors, while all 23 genes in the NitroM pathway were acceptors ( Figure 2).

Results
We identified the top enrichment pathways characterized by metabolism using GSEA on the basis of each drug-induced dataset. These pathways include nitrogen, selenoamino acid and cysteine, porphyrin chlorophyll, naphthalene, and anthracene degradation metabolism pathway from LY294002-, valproic acid-, sirolimus-, trichostatin A-, and wortmannin-induced datasets. The cutoff values depend on size of genes in a pathway.

The Nitrogen Metabolism Pathway from LY294002-Induced Datasets
LY294002 is a known inhibitor of oncogenes [30,31]. We performed GSEA to compare LY294002-induced cancer datasets (with eight samples and 13,235 genes) to normal datasets. The nitrogen metabolism (NitroM) pathway was obtained as the top enrichment pathway. To perform an inverse algorithm using NitroM and including 23 unique genes, we obtained 23 positive relations and 2 negative relations with a 0.46 cutoff. Only two genes, asparagine synthetase (ASNS) and CA14, were donors, while all 23 genes in the NitroM pathway were acceptors ( Figure 2). NitroM is required for cell growth and proliferation and is essential fo of a variety of biomolecular compounds [32]. A prior study described a c investigation of some drugs, including eryaspase targeting NitroM; the tar paraginase of eryaspase may be involved in reducing metastasis in breast c NitroM is required for cell growth and proliferation and is essential for the synthesis of a variety of biomolecular compounds [32]. A prior study described a current clinical investi-Processes 2021, 9, 1063 4 of 8 gation of some drugs, including eryaspase targeting NitroM; the target gene L-asparaginase of eryaspase may be involved in reducing metastasis in breast cancer patients [33]. These studies support our result that the drug's target gene is ASNS in the NitroM pathway and also that LY294002 is one candidate drug for the treatment of breast cancer.

Selenoamino Acid Metabolism from Valproic Acid-Induced Datasets
We performed GSEA to compare seven valproic acid-induced samples to 36 normal samples with 13,235 genes and determined the selenoamino acid metabolism (SelAM), including only 23 genes (Figure 3). Using these 23 genes of the SelAM pathway, we carried out an inverse problem to reveal the systemic functional relationships between those genes in this pathway. We revealed 21 positive relationships and 5 negative relationships using a cutoff of 0.45. The yellow nodes are acceptor nodes, while the green nodes are donors; the red lines indicate negative relationships, and gray lines show positive relationships.
NitroM is required for cell growth and proliferation and is essential for the sy of a variety of biomolecular compounds [32]. A prior study described a current investigation of some drugs, including eryaspase targeting NitroM; the target ge paraginase of eryaspase may be involved in reducing metastasis in breast cancer [33]. These studies support our result that the drug's target gene is ASNS in the pathway and also that LY294002 is one candidate drug for the treatment of breas

Selenoamino Acid Metabolism from Valproic Acid-Induced Datasets
We performed GSEA to compare seven valproic acid-induced samples to 36 samples with 13,235 genes and determined the selenoamino acid metabolism ( including only 23 genes (Figure 3). Using these 23 genes of the SelAM pathway, we out an inverse problem to reveal the systemic functional relationships betwee genes in this pathway. We revealed 21 positive relationships and 5 negative relati using a cutoff of 0.45. The yellow nodes are acceptor nodes, while the green no donors; the red lines indicate negative relationships, and gray lines show positi tionships. In this case, adenosylhomocysteinase (AHCY) was negatively related to fiv and methionine adenosyltransferase 1A (MAT1A) was positively connected genes, which may imply that these two genes could be valproic acid drug ta omarkers.

Cysteine Metabolism Pathway from Sirolimus-Induced Datasets
A GSEA was performed to compare seven sirolimus-induced cancer sample mal samples with 13,235 genes. We obtained the cysteine metabolism (CystM) p In this case, adenosylhomocysteinase (AHCY) was negatively related to five genes, and methionine adenosyltransferase 1A (MAT1A) was positively connected with 21 genes, which may imply that these two genes could be valproic acid drug target biomarkers.

Cysteine Metabolism Pathway from Sirolimus-Induced Datasets
A GSEA was performed to compare seven sirolimus-induced cancer samples to normal samples with 13,235 genes. We obtained the cysteine metabolism (CystM) pathway, including 13 genes with the highest enriched score. The gene networks are shown (cutoff: 0.15, −0.15) with six donors and five acceptors (Figure 4). In this pattern, we found a larger number of donors than acceptors. The results of studies that have investigated the associations between cysteine an breast cancer risk are inconsistent. Some cohort studies have reported positive associa tions [34,35], while another cohort study did not observe an association [36]. A fourt study reported a negative association with breast cancer risk [37]. In addition, a recen study reported that balancing the cysteine pathway was a therapeutic benefit in neuro degenerative disorders [38].

Porphyrin Chlorophyll Metabolism from Trichostatin A-Induced Datasets
We carried out a GSEA to compare seven trichostatin A-induced cancer samples t normal datasets and obtained the porphyrin chlorophyll metabolism (PorChM) as the to  The results of studies that have investigated the associations between cysteine and breast cancer risk are inconsistent. Some cohort studies have reported positive associations [34,35], while another cohort study did not observe an association [36]. A fourth study reported a negative association with breast cancer risk [37]. In addition, a recent study reported that balancing the cysteine pathway was a therapeutic benefit in neurodegenerative disorders [38].

Porphyrin Chlorophyll Metabolism from Trichostatin A-Induced Datasets
We carried out a GSEA to compare seven trichostatin A-induced cancer samples to normal datasets and obtained the porphyrin chlorophyll metabolism (PorChM) as the top enrichment pathway consisting of 28 genes and 29 gene-gene networks (cutoff > 0.6 or <−0.6). UGT2B4 and UGT1A10 are donor genes, and all their negative interactions are shown in Figure 5.

Porphyrin Chlorophyll Metabolism from Trichostatin A-Induced Datasets
We carried out a GSEA to compare seven trichostatin A-induced cancer samples to normal datasets and obtained the porphyrin chlorophyll metabolism (PorChM) as the top enrichment pathway consisting of 28 genes and 29 gene-gene networks (cutoff > 0.6 or <−0.6). UGT2B4 and UGT1A10 are donor genes, and all their negative interactions are shown in Figure 5. Trichostatin A, a histone deacetylase inhibitor, is a target therapeutic drug of hematological malignancies, such as breast cancer [39,40]. However, no systemic influences of trichostatin A have been discovered.

Naphthalene and the Anthracene Degradation Pathway from Wortmannin-Induced Datasets
Wortmannin is a known inhibitor of oncogenes [30]. We performed a GSEA to compare five samples of wortmannin-induced datasets with normal datasets and obtained the Naphthalene_and Anthracene_Degradation (NaphAD) pathway, including 14 genes in the top differentially expressed pathway. The gene-gene interactions of this pathway are Trichostatin A, a histone deacetylase inhibitor, is a target therapeutic drug of hematological malignancies, such as breast cancer [39,40]. However, no systemic influences of trichostatin A have been discovered.

Naphthalene and the Anthracene Degradation Pathway from Wortmannin-Induced Datasets
Wortmannin is a known inhibitor of oncogenes [30]. We performed a GSEA to compare five samples of wortmannin-induced datasets with normal datasets and obtained the Naphthalene_and Anthracene_Degradation (NaphAD) pathway, including 14 genes in the top differentially expressed pathway. The gene-gene interactions of this pathway are shown in Figure 6 with a cutoff > 6.0 and <−6.0. Only two genes affected 12 genes; PRMT8 (chr 12p 13.32) is an inhibitor, while PRMT5 (chr 14q11.2) is an activator. shown in Figure 6 with a cutoff > 6.0 and <−6.0. Only two genes affected 12 genes; PRMT8 (chr 12p 13.32) is an inhibitor, while PRMT5 (chr 14q11.2) is an activator. A previous study reported that PRMT8 was correlated with survival in patients with breast cancer and gastric cancer [41]. That study demonstrated that a high level of PRMT8 expression was associated with positive survival in breast cancer and negative survival in gastric cancer. A previous study reported that PRMT8 was correlated with survival in patients with breast cancer and gastric cancer [41]. That study demonstrated that a high level of PRMT8 expression was associated with positive survival in breast cancer and negative survival in gastric cancer.

Discussion
We analyzed five different drug datasets and obtained five different pathways and systematic gene networks from each pathway (Figure 7). In general, the number of donors was much smaller than that of acceptors, and some relationships between donors were connected either positively or negatively. In the case of trichostatin A, all donors negatively interacted with all acceptors. In contrast, there were more positive interactions than negative ones for LY294002 and valproic acid. Interestingly, in the case of wortmannin, most acceptors were negatively affected by PRMT8 and positively influenced by PRMT5. The only exceptions were DHRS7, PRMT2, and LCMT2, which were negatively affected by PRMT8. In the case of sirolimus, the six donors and five acceptors contributed 4 negative interactions and 19 positive interactions. A previous study reported that PRMT8 was correlated with survival in patients with breast cancer and gastric cancer [41]. That study demonstrated that a high level of PRMT8 expression was associated with positive survival in breast cancer and negative survival in gastric cancer.

Discussion
We analyzed five different drug datasets and obtained five different pathways and systematic gene networks from each pathway (Figure 7). In general, the number of donors was much smaller than that of acceptors, and some relationships between donors were connected either positively or negatively. In the case of trichostatin A, all donors negatively interacted with all acceptors. In contrast, there were more positive interactions than negative ones for LY294002 and valproic acid. Interestingly, in the case of wortmannin, most acceptors were negatively affected by PRMT8 and positively influenced by PRMT5. The only exceptions were DHRS7, PRMT2, and LCMT2, which were negatively affected by PRMT8. In the case of sirolimus, the six donors and five acceptors contributed 4 negative interactions and 19 positive interactions. These results imply that drugs not only directly influence their target gene but also systemically affect hierarchical gene networks in a disease during treatment. Therefore, exploring systemic patterns of drug influence is a better therapeutic method than investigating only one target gene of a drug. From this perspective, the presented method provides a computational approach to determine the pathway-based systemic drug influence; These results imply that drugs not only directly influence their target gene but also systemically affect hierarchical gene networks in a disease during treatment. Therefore, exploring systemic patterns of drug influence is a better therapeutic method than investigating only one target gene of a drug. From this perspective, the presented method provides a computational approach to determine the pathway-based systemic drug influence; given that the patterns of relationships between gene networks and drugs are the same, many drugs can therefore be repurposed to treat other diseases.

Conclusions
In this work of pathway information and inverse problem, we proposed gene-gene interaction-based drug paths in the enriched pathway showing that four out of five drugs have an effect on some genes, which then regulate other genes in the pathway. Although those newly suggested drug paths remain to be validated, the current findings of the study suggest that screening network-based drug efficacy via a specific pathway may offer an unprecedented opportunity of applying a pre-existing drug to a new disease with the same pattern in the selected pathway.