# System-Based Differential Gene Network Analysis for Characterizing a Sample-Specific Subnetwork


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Nonparametric Bayesian Network

Let **X** be an n-by-p data matrix whose element ${x}_{ij}$ is the observation of the j-th variable (that is, the expression value of the j-th gene) for the i-th sample, where n is the total number of samples and p is the number of variables. In the nonparametric BN model, we consider the joint density of all the variables and assume that it can be decomposed as the product of the local conditional densities.
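For reference, the decomposition described above can be written out with the symbols defined in the Abbreviations list. This is only a sketch: the parent vector $pa_{i}^{(j)}$, the number of parents $q_j$, the per-gene parameter vector $\theta_j$ and the noise term $\varepsilon_{ij}$ are notational assumptions introduced here, not necessarily the paper's exact formulation.

$$
f(x_{i1},\dots,x_{ip}) = \prod_{j=1}^{p} f_j\left(x_{ij}\mid pa_{i}^{(j)};\,\theta_j\right),
\qquad
x_{ij} = \sum_{k=1}^{q_j} m_k^{(j)}\left(pa_{ik}^{(j)}\right) + \varepsilon_{ij}
$$

The second expression is the additive nonparametric regression implied by the definition of ${m}_{k}^{(j)}(p{a}_{ik}^{(j)})$ as the contribution of the k-th parent of the j-th gene to ${x}_{ij}$.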

#### 2.2. Proposed Method for Evaluating Sample Specific Edge Contribution Values

The ECv matrix **E** is an n-by-m matrix whose element ${e}_{iv}$ is the ECv of the v-th edge for the i-th sample, where m is the total number of estimated edges and n is the number of samples. The ECv matrix can thus be regarded as the collection of every sample’s ECvs. Because an ECv is a sample-specific profile of an estimated edge, clustering the ECv matrix highlights differences between samples in their ECvs, which in turn allows samples to be grouped by their system-level similarity.
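The construction of the ECv matrix and the grouping of samples can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fitted curves are hypothetical lambdas standing in for the B-spline regressions of the BN model, the helper names `ecv_matrix` and `two_clusters` are invented for this example, and the naive average-linkage clustering is one possible choice of hierarchical clustering.

```python
from itertools import combinations
from math import dist

def ecv_matrix(X, edges):
    """Return the n-by-m matrix E with E[i][v] = m_v(X[i][parent_v])."""
    return [[curve(row[parent]) for parent, _child, curve in edges] for row in X]

def two_clusters(E):
    """Naive average-linkage agglomerative clustering of the rows of E into two groups."""
    clusters = [[i] for i in range(len(E))]
    while len(clusters) > 2:
        def linkage(pair):
            a, b = pair
            return sum(dist(E[i], E[j]) for i in a for j in b) / (len(a) * len(b))
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return [sorted(c) for c in clusters]

# Toy data: 4 samples x 3 genes on a log2 scale (made-up values).
X = [[1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [3.0, 0.2, 2.5], [3.2, 0.1, 2.4]]
# Each edge is (parent index, child index, hypothetical fitted curve).
edges = [(0, 1, lambda x: 0.5 * x), (2, 1, lambda x: -x), (0, 2, lambda x: x - 1.0)]

E = ecv_matrix(X, edges)   # n-by-m ECv matrix
groups = two_clusters(E)   # sample indices grouped by ECv similarity
```

Each row of `E` is one sample's ECv profile, so clustering the rows groups samples by how strongly each edge contributes in them rather than by raw expression.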

#### 2.3. Data Preparation

The log$_{2}$-transformed values of the preprocessed data were used for BN estimation and ECv calculation. The clinical and RNA-seq data of lung cancer patients [20] were acquired from the Genomic Data Commons Data Portal at TCGA and from UCSC Xena [24]. NSCLC patients with either Lung Squamous Cell Carcinoma (LUSC) or Lung Adenocarcinoma (LUAD) were selected. The patients were first screened to retain tumor specimens. The RNA-seq data were filtered to remove genes with a mean percentile lower than 15, leaving 17,450 genes. In the clinical data, we removed entries for patients whose time to last follow-up or to death was more than 2000 days. After these preprocessing steps, we removed patients who did not appear in both the RNA-seq and the clinical data. The final number of patients analyzed was 426 (alive: 238, deceased: 188) for LUSC and 457 (alive: 285, deceased: 172) for LUAD. Details of the acquired data are listed in Table S2.
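The filtering steps above might be sketched as follows. The data layout (dictionaries keyed by gene and patient ID), the field name `days`, and all function names are hypothetical. The sketch also assumes that "mean percentile lower than 15" means a gene whose mean expression falls below the 15th percentile of all gene means, which is one reading of the text.

```python
def percentile_cutoff(values, q):
    """Nearest-rank q-th percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100), at least 1
    return ordered[rank - 1]

def filter_genes(expr, q=15):
    """Keep genes whose mean expression is at or above the q-th percentile of gene means."""
    means = {g: sum(v.values()) / len(v) for g, v in expr.items()}
    cutoff = percentile_cutoff(list(means.values()), q)
    return {g: v for g, v in expr.items() if means[g] >= cutoff}

def filter_patients(clinical, max_days=2000):
    """Drop patients whose follow-up or death record exceeds max_days."""
    return {p: rec for p, rec in clinical.items() if rec["days"] <= max_days}

def common_patients(expr, clinical):
    """Patient IDs present in every gene's RNA-seq record and in the clinical table."""
    rna_ids = set.intersection(*(set(v) for v in expr.values()))
    return sorted(rna_ids & set(clinical))

# Toy data: 8 genes x 3 patients, and a clinical table with partial overlap.
expr = {f"g{k}": {"P1": float(k), "P2": float(k), "P3": float(k)} for k in range(1, 9)}
clinical = {
    "P1": {"days": 350, "status": "alive"},
    "P2": {"days": 2400, "status": "deceased"},  # removed: more than 2000 days
    "P4": {"days": 900, "status": "alive"},      # removed later: no RNA-seq data
}

kept_genes = filter_genes(expr)
patients = common_patients(kept_genes, filter_patients(clinical))
```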

#### 2.4. Differential Expression Gene Analysis

#### 2.5. Molecular Function Analysis

#### 2.6. ECv Matrix Clustering and Survival Analysis

#### 2.7. Network Analysis and Visualization

#### 2.8. Computation Environment

## 3. Results

#### 3.1. Basal Gene Network Estimation

#### 3.2. ΔECv Highlights the EMT-Characterized Edges

To compare the distribution of ΔECv with the log$_{2}$ Fold Change (FC) distribution of mRNAs, which is a standard indicator for DEGs, we overlaid their histograms (Figure 3A). The distribution of ΔECv is much steeper than that of log$_{2}$FC across the thresholds, suggesting that ΔECv can be considered a better indicator for extracting condition-dependent edges. We set the ΔECv threshold to 1.0 because it corresponds to approximately the top 0.1% of all edges. To obtain reliable EMT-distinctive edges, we extracted the 120 edges, comprising 150 genes, that exceeded the threshold in all three cell lines (Figure 3B,C). Because there were nine samples each for TGFβ-treated and control cells, we could evaluate the statistical significance of the extracted edges. We performed t-tests on ECv between TGFβ-treated and control samples and found that 108 of the 120 ΔECv-extracted edges satisfied the criterion of FDR (False Discovery Rate)-corrected p value < 0.01 (Figure S1). This supports the statistical significance of the ΔECv-extracted edges, even though the basal network was estimated from a small number of samples. Furthermore, because we used log$_{2}$-transformed expression data, a ΔECv threshold of 1.0 roughly corresponds to a 2-FC cutoff for the estimated system. We therefore regarded 1.0 as a moderate threshold for the EMT data set and used it in the following analyses. The ΔECv heat map reflects the samples’ distinctive ECv matrix for the selected 120 edges (Figure 3D). The EMT-induced and control samples were clearly separated into two clusters, which further suggests that our ECv method captures the EMT-induced pattern of cellular network differences.
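The edge-extraction logic in this section can be sketched as below. Several points are assumptions made for illustration: ΔECv is computed here as the difference between the mean ECv of treated and control replicates, the edge names and values are invented, and the p values passed to the Benjamini-Hochberg step are placeholders that would in practice come from the per-edge t-tests.

```python
from statistics import mean

def delta_ecv(ecv_treated, ecv_control):
    """ΔECv per edge: mean ECv of treated samples minus mean ECv of control samples."""
    return {e: mean(ecv_treated[e]) - mean(ecv_control[e]) for e in ecv_treated}

def extract_edges(delta, threshold=1.0):
    """Edges whose absolute ΔECv exceeds the threshold."""
    return {e for e, d in delta.items() if abs(d) > threshold}

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy ECvs: (treated, control) replicates per edge for three hypothetical cell lines.
cell_lines = {
    "line1": ({"A->B": [2.1, 2.3, 2.2], "C->D": [0.4, 0.5, 0.6]},
              {"A->B": [0.9, 1.0, 0.8], "C->D": [0.3, 0.5, 0.4]}),
    "line2": ({"A->B": [3.0, 3.2, 3.1], "C->D": [2.0, 2.2, 2.1]},
              {"A->B": [1.5, 1.4, 1.6], "C->D": [0.5, 0.6, 0.4]}),
    "line3": ({"A->B": [0.2, 0.1, 0.3], "C->D": [1.0, 1.1, 0.9]},
              {"A->B": [1.6, 1.5, 1.7], "C->D": [0.8, 0.9, 1.0]}),
}

per_line = [extract_edges(delta_ecv(t, c)) for t, c in cell_lines.values()]
common_edges = set.intersection(*per_line)  # edges over threshold in every cell line

# Placeholder p values standing in for per-edge t-test results.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Requiring an edge to pass the threshold in every cell line is what reduces the per-line candidates to a common set; the FDR step is then an independent check on that set.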

#### 3.3. ΔECv Unveils the EMT Networks

#### 3.4. Biological Validation for the EMT Network with the Comparison between ΔECv and DEG

DEGs were extracted with the criteria log$_{2}$FC > 2 and FDR-corrected p value < 0.00001, approximately following a previous report [19], resulting in 125 genes (Figure S2). This DEG-extracted gene set principally reflects the difference between control and TGFβ-treated samples. ΔECv and DEG shared 71 genes (Figure 5A), indicating that part of the ΔECv gene set belongs to the DEG-extracted gene set. Considering that 2-FC is a standard minimum cutoff for calling potential DEGs, the remaining 79 ΔECv genes and 54 DEG genes might be exclusive to each method (Figure 5A). This implies that the network-driven ΔECv can extract genes that the conventional DEG method misses. To gain more insight into the biology involved, we examined whether the biological functions differ between the gene sets obtained by the two methods. The top 6 of the 10 ranked molecular functions are exactly the same for both (Figure 5B,C), and the top 3 functions, “cellular movement”, “cellular development” and “cellular growth and proliferation”, might represent EMT features. This further supports that at least the majority of the ΔECv-extracted genes are EMT-related. Moreover, representative EMT markers, CDH1 and CDH2, were included in the EMT network (Figure 4). These results show that our method identifies genes that DEG analysis does not, with biological validity, indicating the advantage of our ΔECv method.
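The Venn-diagram counts reported above follow from simple set arithmetic, which can be checked with a short sketch. The gene identifiers are placeholders that only reproduce the reported set sizes (150 ΔECv genes, 125 DEG genes, 71 shared); they are not the actual gene lists, and `compare_gene_sets` is a name invented for this example.

```python
def compare_gene_sets(decv_genes, deg_genes):
    """Sizes of the shared and method-exclusive parts of two gene sets."""
    return {
        "shared": len(decv_genes & deg_genes),
        "decv_only": len(decv_genes - deg_genes),
        "deg_only": len(deg_genes - decv_genes),
    }

# Placeholder identifiers chosen so that the two sets overlap in exactly 71 genes.
decv_genes = {f"gene{i}" for i in range(150)}      # 150 ΔECv-extracted genes
deg_genes = {f"gene{i}" for i in range(79, 204)}   # 125 DEG-extracted genes

counts = compare_gene_sets(decv_genes, deg_genes)
```

With these sizes, 150 − 71 = 79 genes are exclusive to ΔECv and 125 − 71 = 54 are exclusive to DEG, matching Figure 5A.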

#### 3.5. A Clinical Approach Using the EMT Network

## 4. Discussion

## 5. Patents

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation or symbol | Definition |
---|---|
ECv | Edge Contribution value |
EMT | Epithelial-Mesenchymal Transition |
BN | Bayesian Network |
DEG | Differentially Expressed Gene |
TCGA | The Cancer Genome Atlas project |
NNSR | the Neighbor Node Sampling and Repeat algorithm |
NSCLC | Non-Small Cell Lung Cancer |
LUAD | Lung Adenocarcinoma |
LUSC | Lung Squamous Cell Carcinoma |
X | n-by-p data matrix whose element ${x}_{ij}$ corresponds to the expression value of the j-th gene at the i-th sample (samples: $1\le i\le n$, genes: $1\le j\le p$) |
$p{a}_{ik}^{(j)}$ | expression value of the k-th parent of the j-th gene at the i-th sample |
$\mathit{\theta}$ | parameter vector of the conditional density |
${m}_{k}^{(j)}(p{a}_{ik}^{(j)})$ | value of the regression curve by B-splines, or contribution of the k-th parent of the j-th gene to the expression value of ${x}_{ij}$ |
${\sigma}_{j}$ | standard deviation for total observations of the j-th gene |
$EC{v}_{(i)}({j}_{k}\to j)$ | edge contribution value for edge ${j}_{k}\to j$ with respect to the i-th sample |
E | n-by-m ECv matrix whose element ${e}_{iv}$ corresponds to ECv of the v-th edge at the i-th sample (samples: $1\le i\le n$, edges: $1\le v\le m$) |

## References

1. Lecca, P.; Priami, C. Biological network inference for drug discovery. Drug Discov. Today **2013**, 18, 256–264.
2. Schena, M.; Shalon, D.; Heller, R.; Chai, A.; Brown, P.O.; Davis, R.W. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA **1996**, 93, 10614–10619.
3. Rapaport, F.; Khanin, R.; Liang, Y.; Pirun, M.; Krek, A.; Zumbo, P.; Mason, C.E.; Socci, N.D.; Betel, D. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol. **2013**, 14, 3158.
4. Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nat. Protoc. **2009**, 4, 44–57.
5. Aravind, S.; Pablo, T.; Vamsi, K.M.; Sayan, M.; Benjamin, L.E.; Michael, A.G.; Amanda, P.; Scott, L.P.; Todd, R.G.; Eric, S.L.; et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA **2005**, 102, 15545–15550.
6. Mootha, V.K.; Lindgren, C.M.; Eriksson, K.F.; Subramanian, A.; Sihag, S.; Lehar, J.; Puigserver, P.; Carlsson, E.; Ridderstråle, M.; Laurila, E.; et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. **2003**, 34, 267–273.
7. Chai, L.E.; Loh, S.K.; Low, S.T.; Mohamad, M.S.; Deris, S.; Zakaria, Z. A review on the computational approaches for gene regulatory network construction. Comput. Biol. Med. **2014**, 48, 55–65.
8. Creixell, P.; Reimand, J.; Haider, S.; Wu, G.; Shibata, T.; Vazquez, M.; Mustonen, V.; Gonzalez-Perez, A.; Pearson, J.; Sander, C.; et al. Mutation Consequences and Pathway Analysis Working Group of the International Cancer Genome Consortium. Pathway and network analysis of cancer genomes. Nat. Methods **2015**, 12, 615–621.
9. Yan, K.K.; Wang, D.; Sethi, A.; Muir, P.; Kitchen, R.; Cheng, C.; Gerstein, M. Cross-disciplinary network comparison: Matchmaking between hairballs. Cell Syst. **2016**, 2, 147–157.
10. Margolin, A.A.; Nemenman, I.; Basso, K.; Wiggins, C.; Stolovitzky, G.; Dalla Favera, R.; Califano, A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform. **2006**, 7 (Suppl. 1), S7.
11. Araki, H.; Tamada, Y.; Imoto, S.; Dunmore, B.; Sanders, D.; Humphrey, S.; Nagasaki, M.; Doi, A.; Nakanishi, Y.; Yasuda, K.; et al. Analysis of PPARα-dependent and PPARα-independent transcript regulation following fenofibrate treatment of human endothelial cells. Angiogenesis **2009**, 12, 221–229.
12. Wang, L.; Hurley, D.; Watkins, W.; Araki, H.; Tamada, Y.; Muthukaruppan, A.; Ranjard, L.; Derkac, E.; Imoto, S.; Miyano, S.; et al. Cell cycle gene networks are associated with melanoma prognosis. PLoS ONE **2012**, 7, e34247.
13. Affara, M.; Sanders, D.; Araki, H.; Tamada, Y.; Dunmore, B.J.; Humphreys, S.; Imoto, S.; Savoie, C.; Miyano, S.; Kuhara, S.; et al. Vasohibin-1 is identified as a master-regulator of endothelial cell apoptosis using gene network analysis. BMC Genomics **2013**, 14, 23.
14. Singh, A.J.; Ramsey, S.A.; Filtz, T.M.; Kioussi, C. Differential gene regulatory networks in development and disease. Cell Mol. Life Sci. **2018**, 75, 1013–1025.
15. Tamada, Y.; Imoto, S.; Araki, H.; Nagasaki, M.; Print, C.; Charnock-Jones, D.S.; Miyano, S. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers. IEEE/ACM Trans. Comput. Biol. Bioinform. **2011**, 8, 683–697.
16. Shimamura, T.; Imoto, S.; Shimada, Y.; Hosono, Y.; Niida, A.; Nagasaki, M.; Yamaguchi, R.; Takahashi, T.; Miyano, S. A novel network profiling analysis reveals system changes in epithelial-mesenchymal transition. PLoS ONE **2011**, 6, e20804.
17. Yu, X.; Zeng, T.; Wang, X.; Li, G.; Chen, L. Unravelling personalized dysfunctional gene network of complex diseases based on differential network model. J. Transl. Med. **2015**, 13, 189.
18. Kuijjer, M.L.; Tung, M.G.; Yuan, G.; Quackenbush, J.; Glass, K. Estimating sample-specific regulatory networks. iScience **2019**, 14, 226–240.
19. Sun, Y.; Daemen, A.; Hatzivassiliou, G.; Arnott, D.; Wilson, C.; Zhuang, G.; Gao, M.; Liu, P.; Boudreau, A.; Johnson, L.; et al. Metabolic and transcriptional profiling reveals pyruvate dehydrogenase kinase 4 as a mediator of epithelial-mesenchymal transition and drug resistance in tumor cells. Cancer Metab. **2014**, 2, 20.
20. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature **2014**, 511, 543–550.
21. Imoto, S.; Goto, T.; Miyano, S. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac. Symp. Biocomput. **2002**, 7, 175–186.
22. Arima, C.; Kajino, T.; Tamada, Y.; Imoto, S.; Shimada, Y.; Nakatochi, M.; Suzuki, M.; Isomura, H.; Yatabe, Y.; Yamaguchi, T.; et al. Lung adenocarcinoma subtypes definable by lung development-related miRNA expression profiles in association with clinicopathologic features. Carcinogenesis **2014**, 35, 2224–2231.
23. Gendelman, R.; Xing, H.; Mirzoeva, O.K.; Sarde, P.; Curtis, C.; Feiler, H.S.; McDonagh, P.; Gray, J.W.; Khalil, I.; Korn, W.M. Bayesian network inference modeling identifies TRIB1 as a novel regulator of cell-cycle progression and survival in cancer cells. Cancer Res. **2017**, 77, 1575–1585.
24. Goldman, M.; Craft, B.; Hastie, M.; Repečka, K.; McDade, F.; Kamath, A.; Banerjee, A.; Luo, Y.; Rogers, D.; Brooks, A.N.; et al. The UCSC Xena platform for public and private cancer genomics data visualization and interpretation. bioRxiv **2019**, 326470.
25. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. **2015**, 43, e47.
26. Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. B **1995**, 57, 289–300.
27. Krämer, A.; Green, J.; Pollard, J.; Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics **2014**, 30, 523–530.
28. Colaprico, A.; Silva, T.C.; Olsen, C.; Garofano, L.; Cava, C.; Garolini, D.; Sabedot, T.; Malta, T.M.; Pagnotta, S.M.; Castiglioni, I.; et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. **2016**, 44, e71.
29. Mounir, M.; Lucchetta, M.; Silva, T.C.; Olsen, C.; Bontempi, G.; Chen, X.; Noushmehr, H.; Colaprico, A.; Papaleo, E. New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. PLoS Comput. Biol. **2019**, 15, e1006701.
30. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. **2003**, 13, 2498–2504.
31. Heerboth, S.; Housman, G.; Leary, M.; Longacre, M.; Byler, S.; Lapinska, K.; Willbanks, A.; Sarkar, S. EMT and tumor metastasis. Clin. Transl. Med. **2015**, 4, 6.
32. Hynes, R.O. The extracellular matrix: Not just pretty fibrils. Science **2009**, 326, 1216–1219.
33. Song, K.; Li, Q.; Jiang, Z.Z.; Guo, C.W.; Li, P. Heparan sulfate D-glucosaminyl 3-O-sulfotransferase-3B1, a novel epithelial-mesenchymal transition inducer in pancreatic cancer. Cancer Biol. Ther. **2011**, 12, 388–398.
34. Hsu, C.Y.; Chang, G.C.; Chen, Y.J.; Hsu, Y.C.; Hsiao, Y.J.; Su, K.Y.; Chen, H.Y.; Lin, C.Y.; Chen, J.S.; Chen, Y.J.; et al. FAM198B is associated with prolonged survival and inhibits metastasis in lung adenocarcinoma via blockage of ERK-mediated MMP-1 expression. Clin. Cancer Res. **2018**, 24, 916–926.
35. Wang, J.; Ding, N.; Li, Y.; Cheng, H.; Wang, D.; Yang, Q.; Deng, Y.; Yang, Y.; Li, Y.; Ruan, X.; et al. Insulin-like growth factor binding protein 5 (IGFBP5) functions as a tumor suppressor in human melanoma cells. Oncotarget **2015**, 6, 20636–20649.
36. Tzanakakis, G.; Kavasi, R.M.; Voudouri, K.; Berdiaki, A.; Spyridaki, I.; Tsatsakis, A.; Nikitovic, D. Role of the extracellular matrix in cancer-associated epithelial to mesenchymal transition phenomenon. Dev. Dyn. **2018**, 247, 368–381.

**Figure 1.** Overview of our proposed method. The center hairball (blue) is the basal network. The red nodes in the basal network represent the ΔEdge Contribution value (ECv)-extracted Epithelial-Mesenchymal Transition (EMT) network.

**Figure 2.** Graphical representation of ΔECv. The line (blue) is the nonparametric regression curve ${m}_{1}^{(Y)}(X)$ for edge $X\to Y$ estimated with the Bayesian Network. The X axis shows actual mRNA signal values, and the ${Y}^{\prime}$ axis shows partial residuals, in which the effects of the other parents are subtracted from the child’s mRNA signal value. Green and yellow points represent, for instance, control and perturbed samples, respectively. Grey points represent the other values used to determine the regression curve. Values on the ${Y}^{\prime}$ axis also correspond to the output of the regression function given the parent’s mRNA signal; by definition, these are the ECvs. ΔECv is defined as the difference in ECv between two conditions. This example depicts the difference in ECv between perturbed and control samples, i.e., the ΔECv of these two conditions.

**Figure 3.** ΔECv analysis. (**A**) The histograms of absolute ΔECv (green) and absolute log$_{2}$ Fold Change (FC) (magenta) for each cell line. FC is defined as TGFβ-treated/control. The total number of edges is 154,369 for ΔECv. The total number of genes for log$_{2}$FC is 19,849. The Y axis stands for density. The X axis corresponds to the threshold for each of ΔECv and log$_{2}$FC. (**B**) The Venn diagram represents the numbers of ΔECv-extracted edges for all the cell lines with threshold 1.0. (**C**) The subnetwork of all the 2601 edges that were extracted from each cell line by ΔECv. The nodes (150) and edges (120) in red highlight the edges common to all cell lines. (**D**) Heat map and the result of hierarchical clustering for the ECv matrix of the 120 ΔECv-extracted edges with 18 samples.

**Figure 4.** Visualization of the EMT network. One hundred and fifty nodes and 411 edges constitute the network. The total number of connected components is seven. Node: The top 5% hub genes in the EMT network (filled with red) and the top 5% hub genes in the basal network (labeled with red) are displayed. Nodes with a blue outline represent genes extracted exclusively by ΔECv. Edge: Bold edges (grey) and standard edges (grey) represent edges with absolute ΔECv of more than 1.5 and 1.0, respectively. The 38 ECv-high (green) and 70 ECv-low (magenta) edges in the TCGA data-fitting experiment (Figure 6A and Figure S3B) are labeled. Dotted edges (grey) originally belonged to the basal network.

**Figure 5.** Comparison of ΔECv and Differentially Expressed Genes (DEG). (**A**) Venn diagram for the genes extracted by ΔECv and DEG. (**B**) Top 10 terms of the molecular function analysis for the ΔECv-extracted 150 genes. (**C**) Top 10 terms of the molecular function analysis for the 125 genes through DEG.

**Figure 6.** Unsupervised clustering and survival analysis for Lung Squamous Cell Carcinoma (LUSC). (**A**) Heat map with hierarchical clustering for the ECv matrix of 108 edges with 426 samples in LUSC RNA-Seq data. (**B**) Kaplan-Meier curves for the two patient groups, group 1 (blue, n: 244) and group 2 (orange, n: 182), corresponding to the patient clusters in the heat map in A. The survival analysis was performed using the log-rank test for p value calculation.

T | Concordance |
---|---|
10,000 | 72.7% |
100,000 | 89.0% |
500,000 | 94.3% |
1,000,000 | 95.6% |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Tanaka, Y.; Tamada, Y.; Ikeguchi, M.; Yamashita, F.; Okuno, Y. System-Based Differential Gene Network Analysis for Characterizing a Sample-Specific Subnetwork. *Biomolecules* **2020**, *10*, 306.
https://doi.org/10.3390/biom10020306
