Unraveling the Molecular Puzzle: Exploring Gene Networks across Diverse EMT Status of Cell Lines

Heewon Park

doi:10.3390/ijms241612784

School of Mathematics, Statistics and Data Science, Sungshin Women’s University, Seoul 02844, Republic of Korea

Int. J. Mol. Sci. 2023, 24(16), 12784; https://doi.org/10.3390/ijms241612784

This article belongs to the Special Issue Data Science in Cancer Genomics and Precision Medicine

Abstract

Understanding complex disease mechanisms requires a comprehensive understanding of the gene regulatory networks, as complex diseases are often characterized by the dysregulation and dysfunction of molecular networks, rather than abnormalities in single genes. Specifically, the exploration of cell line-specific gene networks can provide essential clues for precision medicine, as this methodology can uncover molecular interplays specific to particular cell line statuses, such as drug sensitivity, cancer progression, etc. In this article, we provide a comprehensive review of computational strategies for cell line-specific gene network analysis: (1) cell line-specific gene regulatory network estimation and analysis of gene networks under varying epithelial–mesenchymal transition (EMT) statuses of cell lines; and (2) an explainable artificial intelligence approach for interpreting the estimated massive multiple EMT-status-specific gene networks. The objective of this review is to help readers grasp the concept of computational network biology, which holds significant implications for precision medicine by offering crucial clues.

Keywords: gene regulatory network; explainable AI; epithelial–mesenchymal transition; precision medicine

1. Introduction

Recently, heterogeneous gene regulatory networks, which describe the regulatory interactions between genes controlling specific cell functions, have attracted considerable attention in various fields of research as a means of understanding disease mechanisms arising from complex molecular networks. Various computational methodologies have been developed to estimate gene regulatory networks, and the effectiveness of these networks has been validated in applications such as the identification of drug combinations and cancer prediction [1,2,3]. Although gene regulatory networks have been studied extensively, most existing studies have been based on gene networks averaged across all cell lines. Such averaged networks cannot reveal cell line (or patient)-specific gene regulatory networks, which contain crucial clues for precision medicine.

In this article, we review computational strategies for cell line characteristic-specific gene network analysis. We first review machine learning approaches for sample-specific gene network estimation. We then focus on epithelial–mesenchymal transition (EMT), which is a biological phenomenon wherein epithelial cells undergo a transformation, leading to the loss of their characteristic properties and the acquisition of mesenchymal characteristics. The EMT plays key roles in cancer invasion, metastasis, and resistance to chemotherapy. Thus, uncovering the EMT mechanism is crucial for developing effective strategies to target cancer metastasis and enhance therapeutic efficacy. To understand the mechanism of transformation from epithelial cells to mesenchymal states and the related markers, we consider gene networks under varying EMT statuses of cell lines and review a study on uncovering system changes in gene networks under varying EMT statuses of cell lines [4]. We then turn our focus to the drawbacks in the studies involving the analysis of cell line characteristic-specific gene networks, specifically the interpretation of the massive multiple networks. The cell line-specific gene network analysis provides hundreds of networks for hundreds of cell lines, where a network is given in matrix form consisting of more than 10,000 rows with more than 1000 columns. Thus, the task of interpreting the massive multiple networks remains a big challenge. However, the inference of gene regulatory networks is not the final result; rather, these networks are intended to help solve biological and biomedical problems by effectively interpreting the estimated gene networks [5]. Gene regulatory networks play a crucial role in comprehending the maintenance, establishment, and disruption of cellular identity in diseases [6]. 
The gene networks facilitate our understanding of the molecular mechanisms governing organisms and reveal the fundamental principles governing a wide array of biological processes and reactions in organisms [7]. Furthermore, extensive evidence shows that epithelial–mesenchymal transition (EMT) is regulated by numerous transcription factors and signaling pathways, and gene regulatory networks play a critical role in controlling EMT programs during both developmental processes and disease [8,9].

In the second part of this article, we review the explainable artificial intelligence (XAI) approach, TRIP, which is used to interpret large-scale gene regulatory networks, enabling researchers to unravel complex biological processes and gain a deeper understanding of disease mechanisms [10]. We also review the application results of XAI in interpreting the estimated EMT status-specific gene networks [11]. TRIP was applied to gene networks estimated a decade ago and EMT markers were uncovered. Interestingly, some of the genes identified in this analysis were also identified as EMT markers in a previous study on EMT network analysis [4]. This implies that the dataset from 10 years ago contained knowledge of the EMT markers and their corresponding EMT-related mechanisms, which have subsequently been discovered in the last decade. Due to the lack of computational strategies for comprehensive analysis and interpretation of the massive gene regulatory networks, we were unable to uncover the EMT markers and their EMT-related mechanisms 10 years ago. Based on these reviews, it is expected that the incorporation of XAI will bring forth new possibilities in computational network biology, and may lead to a more effective understanding of complex disease mechanisms.

The remainder of this paper is organized as follows. In Section 2, we review the computational strategies for cell line-specific gene network analysis (Section 2.1) and the results of applying these strategies to estimate gene networks under varying EMT statuses of cell lines (Section 2.2). Then, we review the explainable artificial intelligence approach for interpreting the massive multiple networks in Section 3. The conclusions are provided in the Discussion section.

2. Investigating System Changes in Epithelial–Mesenchymal Transition through Personalized Gene Network Analysis

2.1. Computational Strategies

The expression levels of $p$ regulator genes are given as $X = (x_{1}, \ldots, x_{n})^{T} \in \mathbb{R}^{n \times p}$, where $x_{i} = (x_{i1}, \ldots, x_{ip})^{T}$. These regulators control the transcription of the target genes $y_{\ell} \in \mathbb{R}^{n}$, $\ell = 1, \ldots, q$, measured in $n$ cell lines. The gene regulatory network can be described by the following linear regression framework,

$$ y_{i\ell} = \beta_{\ell}^{T} x_{i} + \epsilon_{i\ell}, \quad i = 1, \ldots, n, \quad \ell = 1, \ldots, q, \tag{1} $$

where $\beta_{\ell} = (\beta_{\ell 1}, \ldots, \beta_{\ell p})^{T}$ is the regression coefficient vector that represents the effect of the $p$ regulator genes on the $\ell$th target gene, and $\epsilon_{i\ell}$ is a random error term.

$L_{1}$-type regularization methods have been used to estimate gene regulatory networks, e.g., the elastic net [12],

$$ \hat{\beta}_{\ell} = \underset{\beta_{\ell}}{\arg\min} \left\{ \frac{1}{2} \sum_{i=1}^{n} (y_{i\ell} - \beta_{\ell}^{T} x_{i})^{2} + P_{\delta\lambda}(\beta_{\ell}) \right\}, \tag{2} $$

where

$$ P_{\delta\lambda}(\beta_{\ell}) = \lambda \sum_{j=1}^{p} \left[ \frac{1}{2} (1 - \delta) \beta_{\ell j}^{2} + \delta |\beta_{\ell j}| \right]. $$

Here, $\lambda > 0$ is a regularization parameter controlling the degree of shrinkage applied to $\beta_{\ell}$, and $0 \leq \delta \leq 1$ is a mixing parameter between the ridge [13] and lasso [14] penalties. Although $L_{1}$-type regularization methods successfully estimate gene regulatory networks, they provide a single network that is averaged over all cell lines.
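To make the elastic net criterion in Equation (2) concrete, the following is a minimal sketch for a single target gene. It assumes a plain coordinate descent solver, which is a standard way to minimize this criterion but is not necessarily the solver used in the reviewed studies. The function names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, delta, n_iter=200):
    """Coordinate descent for Equation (2) for a single target gene.

    X: list of n rows, each holding p regulator expression values.
    y: list of n expression values of the target gene.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = 0.0   # inner product of regulator j with the partial residual
            sq = 0.0  # squared norm of regulator j
            for i in range(n):
                fit_without_j = sum(beta[k] * X[i][k] for k in range(p) if k != j)
                z += X[i][j] * (y[i] - fit_without_j)
                sq += X[i][j] ** 2
            # The L1 part thresholds the update, the ridge part rescales it.
            beta[j] = soft_threshold(z, lam * delta) / (sq + lam * (1.0 - delta))
    return beta


# Toy data (hypothetical): the target depends on regulator 0 only.
X = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.3], [4.0, -0.2], [5.0, 0.9]]
y = [2.0 * row[0] for row in X]
beta_hat = elastic_net(X, y, lam=1.0, delta=0.9)
```

In this toy example, the coefficient of the unrelated regulator is set exactly to zero, which corresponds to removing that edge from the estimated network, whereas the coefficient of the true regulator is slightly shrunk toward zero.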

However, molecular interplays exhibit diverse structures depending on the characteristics of cell lines. Figure 1 shows the correlation between two genes (i.e., UPF1 and DPM1) in cell lines with low (drug-sensitive), middle (drug-moderate), and high (drug-resistant) IC50 values of an anti-cancer drug (i.e., capecitabine). RNA expression data were obtained from the CCLE dataset, and drug sensitivity data were downloaded from the Genomics of Drug Sensitivity in Cancer Project. As shown in Figure 1, the two genes show a positive correlation in the drug-sensitive cell lines, whereas a negative correlation is seen in the drug-resistant cell lines. Furthermore, these opposing correlation patterns cannot be discerned when all cell lines are pooled without considering their characteristics. This implies that the characteristics of cell lines should be considered when estimating gene regulatory networks to extract crucial information for precision medicine. That is, cell line-specific gene regulatory networks are essential to accurately uncover the gene regulatory system for the specific characteristics of each cell line.

Figure 1. Correlation between two genes (i.e., UPF1 and DPM1) under varying conditions of cell line characteristics (i.e., all cell lines, anti-cancer drug-sensitive, -moderate, and -resistant cell lines).

2.1.1. NetworkProfiler

Shimamura et al. [4] developed a computational strategy, NetworkProfiler, to infer cell line-specific gene networks based on the following varying-coefficient model [15],

$$ y_{i\ell} = \beta_{\ell}^{T}(m_{\alpha}) x_{i} + \epsilon_{i\ell}, \tag{3} $$

where $\beta_{\ell}(m_{\alpha})$ is the regression coefficient vector that describes the effect of the $p$ regulator genes on the $\ell$th target gene in the $\alpha$th cell line. The $\alpha$th cell line has $m_{\alpha}$ as a cancer-related characteristic, where the characteristic of cell lines is referred to as a modulator $M$ (e.g., EMT status of cell lines). To estimate the varying coefficient $\beta_{\ell}(m_{\alpha})$, Shimamura et al. [4] developed the following kernel-based $L_{1}$-type regularization method,

$$ \hat{\beta}_{\ell}(m_{\alpha}) = \underset{\beta_{\ell}(m_{\alpha})}{\arg\min} \left\{ \frac{1}{2} \sum_{i=1}^{n} \{ y_{i\ell} - \beta_{\ell}^{T}(m_{\alpha}) x_{i} \}^{2} G(m_{i} - m_{\alpha} \mid b_{\ell}) + P\{\beta_{\ell}(m_{\alpha})\} \right\}, \tag{4} $$

where $P\{\beta_{\ell}(m_{\alpha})\}$ is the recursive elastic net penalty,

$$ P\{\beta_{\ell}(m_{\alpha})\} = \lambda_{\ell\alpha} \sum_{j=1}^{p} \left[ \frac{1}{2} (1 - \delta_{\ell\alpha}) \beta_{\ell j \alpha}^{2} + \delta_{\ell\alpha} w_{\ell j \alpha} |\beta_{\ell j \alpha}| \right], \tag{5} $$

with $\beta_{\ell j \alpha}$ denoting the $j$th element of $\beta_{\ell}(m_{\alpha})$, and

$$ G(m_{i} - m_{\alpha} \mid b_{\ell}) = \exp\left\{ \frac{-(m_{i} - m_{\alpha})^{2}}{b_{\ell}} \right\} \tag{6} $$

is the Gaussian kernel function used for grouping cell lines according to their cancer-related characteristics (i.e., the modulator values $m_{i}$ for $i = 1, \ldots, n$). In other words, the Gaussian kernel function is employed to quantify the similarity between the characteristics of cell lines and to determine the weights that control the influence of samples in estimating the gene network for the $\alpha$th cell line. Thus, we can estimate $\beta_{\ell}(m_{\alpha})$ based mainly on cell lines whose characteristics $m_{i}$ are similar to that of the target cell line, $m_{\alpha}$. This implies that NetworkProfiler can identify molecular interactions specific to cancer-related statuses of cell lines (e.g., EMT status-, drug sensitivity-, and cancer progression-specific gene regulatory networks).

In practice, genomic datasets frequently contain outliers originating from diverse sources, such as experimental errors and coding errors. However, $L_{1}$-type regularization methods and NetworkProfiler are sensitive to outliers because they are based on the least-squares loss function. Thus, the network estimation and edge selection procedures are disturbed in the presence of outliers, and personalized gene network analysis cannot be performed effectively. In the subsequent subsection, we review robust personalized gene network analysis.
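The role of the Gaussian kernel in Equation (6) can be illustrated with a small sketch. To keep the example short, it assumes a single regulator, no intercept, and no penalty term, so it shows only the kernel-weighted least-squares part of Equation (4), not the full NetworkProfiler estimator. The function names, the bandwidth value, and the toy data are hypothetical.

```python
import math


def gaussian_kernel(m_i, m_alpha, b):
    """Equation (6): similarity between modulator values m_i and m_alpha."""
    return math.exp(-((m_i - m_alpha) ** 2) / b)


def local_slope(x, y, m, m_alpha, b):
    """Kernel-weighted least-squares slope of y on x (no intercept, no penalty)."""
    w = [gaussian_kernel(m_i, m_alpha, b) for m_i in m]
    num = sum(w_i * x_i * y_i for w_i, x_i, y_i in zip(w, x, y))
    den = sum(w_i * x_i * x_i for w_i, x_i in zip(w, x))
    return num / den


# Toy data (hypothetical): the regulator activates the target when the
# modulator is near 0 and represses it when the modulator is near 1.
m = [0.0, 0.05, 0.1, 0.9, 0.95, 1.0]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]

slope_low = local_slope(x, y, m, m_alpha=0.0, b=0.05)    # close to +1
slope_high = local_slope(x, y, m, m_alpha=1.0, b=0.05)   # close to -1
slope_pooled = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
```

In this toy setting, the pooled slope is zero because the positive and negative relationships cancel, whereas the kernel-weighted slopes recover the opposite signs at the two ends of the modulator. This mirrors the sign change between drug-sensitive and drug-resistant cell lines described for Figure 1.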

2.1.2. Robust NetworkProfiler

In practice, clinical and genomic alteration datasets typically contain outliers arising from multiple sources (e.g., experimental errors and reporting or labeling errors). A genomic dataset consists of a substantial number of features (e.g., genes) and a limited number of samples (e.g., cell lines); this type of data is called high-dimensional data. Identifying and managing outliers in a high-dimensional genomic dataset are crucial and challenging tasks. To effectively detect outliers in a high-dimensional genomic dataset, Park et al. [16] considered the following robust Mahalanobis distance computed in the robust principal component space,

$$ \mathrm{R.MD}_{i}^{r.pc} = \sqrt{ (z_{i}^{R} - T^{r.pc})^{T} (C^{r.pc})^{-1} (z_{i}^{R} - T^{r.pc}) }, \tag{7} $$

where $T^{r.pc}$ and $C^{r.pc}$ are the robust mean vector and covariance matrix, respectively, estimated using the minimum volume ellipsoid, and $Z^{R} = (z_{1}^{R}, \ldots, z_{n}^{R})^{T}$ is the matrix of $\kappa$-dimensional robust principal components. By using the robust Mahalanobis distance, Park et al. [16] developed the following weight to control outliers in high-dimensional genomic data,

$$ R_{i}^{\kappa} = \frac{ \min\left( \sqrt{k / \mathrm{R.MD}_{i}^{r.pc}}, 1 \right) }{ \sum_{i'=1}^{n} \min\left( \sqrt{k / \mathrm{R.MD}_{i'}^{r.pc}}, 1 \right) }, \tag{8} $$

where $k$ is the 95% quantile of the $\chi^{2}(df = \kappa)$ distribution [17]. Park et al. [16] incorporated the weight into the kernel-based $L_{1}$-type regularization method as follows,

$$ \hat{\beta}_{\ell}(m_{\alpha}) = \underset{\beta_{\ell}(m_{\alpha})}{\arg\min} \left\{ \frac{1}{2} \sum_{i=1}^{n} R_{i}^{\kappa} \{ y_{i\ell} - \beta_{\ell}^{T}(m_{\alpha}) x_{i} \}^{2} G(m_{i} - m_{\alpha} \mid b_{\ell}) + P\{\beta_{\ell}(m_{\alpha})\} \right\}. \tag{9} $$

The robust NetworkProfiler flags data points as outliers if their $\mathrm{R.MD}_{i}^{r.pc}$ is greater than the 95th percentile of the $\chi^{2}(df = \kappa)$ distribution, and then reduces the effect of the detected outliers by applying the weight $R_{i}^{\kappa}$ in the network estimation. Owing to this down-weighting, the robust NetworkProfiler can effectively estimate personalized gene networks, even in the presence of outliers.
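The weighting scheme of Equation (8) can be sketched as follows. The sketch assumes that the robust Mahalanobis distances have already been computed and that $\kappa = 2$, a case chosen only because the $\chi^{2}$ quantile then has a closed form. The function names and the distance values are hypothetical.

```python
import math


def chi2_quantile_df2(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the distribution is exponential with mean 2, so the quantile
    has the closed form -2 * ln(1 - prob).
    """
    return -2.0 * math.log(1.0 - prob)


def robust_weights(rmd, k):
    """Equation (8): down-weight samples whose robust distance exceeds k."""
    raw = [min(math.sqrt(k / d), 1.0) for d in rmd]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical robust Mahalanobis distances for five cell lines (kappa = 2);
# the last cell line lies far from the bulk of the data.
rmd = [0.8, 1.5, 2.1, 3.0, 24.0]
k = chi2_quantile_df2(0.95)  # about 5.99
weights = robust_weights(rmd, k)
outliers = [i for i, d in enumerate(rmd) if d > k]
```

Cell lines whose distance stays below the threshold share the same weight, and only the flagged cell line contributes less to the weighted loss in Equation (9).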

2.2. Uncovering Changes in Gene Regulatory Networks in the Epithelial–Mesenchymal Transition

We review the application results of NetworkProfiler for gene network analysis under varying EMT statuses of cell lines. Shimamura et al. [4] applied NetworkProfiler to reveal system changes under varying EMT statuses of cell lines. EMT status-specific gene networks were estimated using the EMT modulator that describes the EMT statuses of cell lines, defined using the module discovery method [18]. This method is based on 50 genes labeled as EMT-related genes (i.e., EMT-UP, EMT-DN, JECHLINGER-EMT-UP, and JECHLINGER-EMT-DN) in the Molecular Signatures Database v2.5, where low and high values of the EMT modulator represent the epithelial- and mesenchymal-like cell lines, respectively. The expression profiles of 13,508 genes in 762 cell lines were obtained from the Sanger Cell Line Project (http://bonsai.hgc.jp/~shima/NetworkProfiler/, accessed on 25 June 2023). The gene networks were estimated with 13,508 target genes and 1732 regulator genes, consisting of 47 nuclear receptors, 1183 transcription factors, and 502 human miRNAs. For the 762 modulator values describing the EMT statuses of the 762 cell lines, 762 gene regulatory networks between the 13,508 target and 1732 regulator genes were estimated.

To reveal system changes in the context of EMT, Shimamura et al. [4] focused on the well-known EMT marker, E-cadherin, because the loss of the cell adhesion molecule E-cadherin is a biomarker of EMT. Then, they identified candidate regulators of E-cadherin based on the following regulatory effect of the $j$th regulator on the $\ell$th target gene (i.e., E-cadherin) in the $\alpha$th cell line,

$$ \mathrm{RE}_{j\ell\alpha} = \sum_{s \in \pi_{j\ell\alpha}} \hat{\beta}_{s}^{(j \to \ell)}(m_{\alpha}) \cdot x_{\alpha j}, \tag{10} $$

where $\pi_{j\ell\alpha}$ is the set of all possible paths from $x_{j}$ (i.e., regulators) to $y_{\ell}$ (i.e., E-cadherin), and $\hat{\beta}_{s}^{(j \to \ell)}(m_{\alpha})$ is the product of the estimated coefficients on the $s$th path in $\pi_{j\ell\alpha}$, where the length of the path from $x_{j}$ to $y_{\ell}$ was regarded as 1 or 2. In other words, they considered the parent and grandparent genes as potential regulatory factors. Then, the following regulatory effect change according to the EMT status of cell lines was computed,

$$ \mathrm{REC}_{j\ell} = \max\{ \mathrm{RE}_{j\ell\alpha};\ \alpha = 1, \ldots, n \} - \min\{ \mathrm{RE}_{j\ell\alpha};\ \alpha = 1, \ldots, n \}, \tag{11} $$

to measure how the EMT status affects the regulatory effect of the regulators on E-cadherin. The 25 genes with the highest REC values were extracted as candidate regulators of E-cadherin: IRF6 (-), miR-141 ([19]), GRHL2 (-), ZEB1 ([20]), LSR (-), miR-200b ([19]), KLF4 ([21]), OVOL2 (-), miR-200a ([19]), FOXA2 ([22]), TCF4 ([23]), ELF3 (-), SNF17 (-), MYB (-), KLF5 (-), miR-192 ([24]), FOXA1 ([23]), SNF165 (-), NKX2-1 (-), HNF1B (-), TFE3 (-), ZEB2 ([25]), TRIM29 (-), SNAI2 ([26]), where reference numbers in brackets indicate the previous studies on the regulatory mechanisms of the genes related to E-cadherin.
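The regulatory effect in Equation (10) and its change in Equation (11) can be sketched as follows. The sketch assumes that each cell line-specific network is stored as a nested dictionary of estimated coefficients and that only paths of length 1 and 2 are considered, as stated above. The data layout, the function names, and all coefficient and expression values are hypothetical. CDH1, the gene encoding E-cadherin, is used only as a label.

```python
def regulatory_effect(beta, j, target, x_alpha_j):
    """Equation (10) for one cell line.

    beta[t][r] is the estimated coefficient of regulator r on target t in
    this cell line; missing entries mean "no edge".
    """
    # Path of length 1: j -> target.
    total = beta.get(target, {}).get(j, 0.0)
    # Paths of length 2: j -> k -> target, with k a parent of the target.
    for k, coef_k_to_target in beta.get(target, {}).items():
        if k != j:
            total += beta.get(k, {}).get(j, 0.0) * coef_k_to_target
    return total * x_alpha_j


def regulatory_effect_change(networks, expression, j, target):
    """Equation (11): range of the regulatory effect across cell lines."""
    effects = [
        regulatory_effect(beta, j, target, x[j])
        for beta, x in zip(networks, expression)
    ]
    return max(effects) - min(effects)


# Two hypothetical cell line-specific networks with made-up coefficients.
networks = [
    {"CDH1": {"miR-141": 0.8, "ZEB1": -0.1}, "ZEB1": {"miR-141": -0.5}},
    {"CDH1": {"miR-141": 0.1, "ZEB1": -0.6}, "ZEB1": {}},
]
expression = [{"miR-141": 2.0}, {"miR-141": 0.5}]
rec = regulatory_effect_change(networks, expression, "miR-141", "CDH1")
```

In the first network, the indirect path through ZEB1 contributes a positive term because two negative coefficients are multiplied. Ranking all regulators by this quantity is what yields a list of candidates such as the one above.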

Among the 25 candidate regulators identified, about half of the genes had well-established evidence supporting their regulatory roles in E-cadherin, whereas the mechanisms of the remaining genes were yet to be revealed. NetworkProfiler was employed to predict the mechanistic interpretations of the E-cadherin-related regulatory system as follows:

The expression of miR-141 had a strong positive effect on the expression of E-cadherin in epithelial-like cells, whereas this effect decreased as the transition from epithelial- to mesenchymal-like cell lines occurred.
The expression of ZEB1 had a weak negative effect on the expression of E-cadherin in epithelial-like cells, whereas this effect increased as the transition from epithelial- to mesenchymal-like cell lines occurred.
miR-141 and ZEB1 had a strong negative effect on each other only in epithelial-like cells.

The findings suggest the existence of a negative feedback loop between miR-141 and ZEB1 in epithelial-like cells; this interaction was previously reported in [27].

As the transition from epithelial-like cells to mesenchymal-like cells occurred, the expression levels of miR-141 and E-cadherin decreased, whereas the expression level of ZEB1 increased.

Shimamura et al. [4] suggested, based on the aforementioned results, that the inhibition of miR-141 in mesenchymal-like cells disrupts the negative feedback loop between miR-141 and ZEB1, consequently resulting in decreased expression of E-cadherin due to the increased expression of ZEB1.

2.3. Limitations of Current Personalized Gene Network Analysis

Personalized gene network analysis (e.g., EMT status-specific gene network) generates massive multiple networks, where each network is represented in matrix form with 1732 columns and 13,508 rows. This indicates that the interpretation of the analysis of EMT status-specific gene networks involves the examination of 762 extensive matrices, with each matrix corresponding to a distinct cell line.
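The scale of this interpretation task can be made explicit with a short calculation based on the dimensions reported above; only the variable names are introduced here.

```python
n_cell_lines = 762
n_targets = 13_508
n_regulators = 1_732

entries_per_network = n_targets * n_regulators
total_entries = n_cell_lines * entries_per_network

print(f"{entries_per_network:,} coefficients per network")
print(f"{total_entries:,} coefficients across all networks")
```

Each network therefore contains about 23 million candidate edges, and the full collection contains roughly 1.8 × 10^10 coefficients, which is far beyond what can be inspected edge by edge.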

Although NetworkProfiler enables us to explore changes in the system under different EMT status conditions, the existing study focused only on the well-known EMT marker, E-cadherin, and then interpreted the results based on the neighboring genes of E-cadherin. This approach was taken as the comprehensive analysis and interpretation of the extensive multiple networks were not feasible. However, the narrow interpretation is insufficient to understand the complex mechanism of disease. To effectively advance precision medicine, a comprehensive interpretation of the massive multiple gene networks is essential.

3. Explainable Artificial Intelligence (XAI) for Comprehensive Gene Network Analysis

In this section, we review the explainable artificial intelligence approach, known as Tensor Reconstruction-based Interpretable Prediction (TRIP), for the analysis of massive multiple networks [10]. In recent years, artificial intelligence has garnered considerable attention in various fields of research, including statistics, computer science, and biomedicine. Although AI approaches have shown significant success in terms of prediction or classification accuracy, they frequently encounter the black-box problem, wherein the decision-making process of the model cannot be explained because the decision rules of deep learning models are highly intricate. This limits the utilization of AI in the medical field, where a basis for explanation is required. To address the black-box problem, Maruhashi et al. [10] developed an XAI strategy called TRIP, which is a deep learning approach for tensor decomposition. It aims to find a subspace of the data from multiple networks that minimizes prediction error while retaining as much of the data information as possible. TRIP estimates a human-readable low-dimensional subspace and performs predictions based on the estimated subspace. Thus, we can effectively interpret and understand the analysis of massive multiple networks because the decision boundaries of a model can be visualized in a low, human-readable dimension, even though the model is learned through a deep learning approach. We briefly review the mathematical formulation of TRIP in the following subsection.

3.1. Method: Tensor Reconstruction-Based Interpretable Prediction (TRIP)

The gene network connecting target genes with regulator genes can be regarded as a second-order tensor. Assume a $K$-mode tensor $X$ with size $I_{1} \times \dots \times I_{K}$. TRIP estimates a projection matrix $C^{(k)} \in \mathbb{R}^{I_{k} \times J_{k}}$ for each mode $k$ of $X$ and projects $X$ onto a lower-dimensional subspace using the estimated $C^{(k)}$, i.e.,

$$ \bar{X}_{i} = X_{i} \prod_{k} \times_{k} C^{(k)}. \tag{12} $$

Then, the projected tensor $\bar{X}_{i}$ is used as the input for the prediction or classification model, as follows,

$$ \hat{y}_{i} = f(\bar{X}_{i}, \theta) \quad \text{for } i = 1, \ldots, n. \tag{13} $$

This implies that the prediction or classification is conducted within the estimated human-readable low-dimensional subspace. The projection-based prediction leads to more explainable and interpretable results in the analysis of multilayer gene networks because the complex massive multiple gene networks are effectively visualized within a human-readable dimensional subspace.
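For a second-order tensor (a single gene network stored as a matrix), the projection in Equation (12) can be sketched in a few lines. The sketch assumes that the mode products are read as $C^{(1)T} X C^{(2)}$, so that an $I_{1} \times I_{2}$ network becomes a $J_{1} \times J_{2}$ matrix; this reading, the helper names, and the toy matrices are assumptions and are not taken from Maruhashi et al. [10].

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def project(X, C1, C2):
    """Second-order case of Equation (12), read as C1^T X C2."""
    return matmul(matmul(transpose(C1), X), C2)


def reconstruct(X_bar, C1, C2):
    """Map the projected network back to the original size: C1 X_bar C2^T."""
    return matmul(matmul(C1, X_bar), transpose(C2))


# Hypothetical network with 3 targets and 2 regulators, and projection
# matrices with orthonormal columns.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C1 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # keeps the first two target axes
C2 = [[1.0], [0.0]]                        # keeps the first regulator axis

X_bar = project(X, C1, C2)          # 2 x 1 projected network
X_rec = reconstruct(X_bar, C1, C2)  # 3 x 2 reconstruction
error = sum((X[i][j] - X_rec[i][j]) ** 2 for i in range(3) for j in range(2))
```

The squared reconstruction error measures how much of the original network is lost by the projection; a prediction model would take `X_bar`, not `X`, as its input.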

TRIP estimates the projection matrices $C^{(k)}$ and learns the deep learning-based prediction model simultaneously by minimizing not only the prediction error but also the subspace estimation error. The objective function of TRIP is given as

$$ O_{T} = \frac{1}{n} \sum_{i=1}^{n} \left\{ L(y_{i}, \hat{y}_{i}) + \gamma \left\| X_{i} - \bar{X}_{i} \prod_{k} \times_{k} C^{(k)T} \right\|_{2}^{2} \right\}, \quad \text{subject to } C^{(k)T} C^{(k)} = I, \tag{14} $$

where $\gamma > 0$ is the tuning parameter for the projection error. The first term $L(y_{i}, \hat{y}_{i})$ and the second term are the loss functions for predicting and estimating the projection matrices, respectively. As shown in the objective function (14), TRIP learns a model by minimizing these two loss functions simultaneously. In other words, the projection matrices $C^{(k)}$ are estimated to minimize errors in both projection and prediction. This implies that TRIP enables us to achieve effective prediction results while still retaining a significant amount of the original data variance within the estimated subspace. For details about TRIP, please refer to Maruhashi et al. [10].
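As a worked special case, the objective function (14) can be written out for a second-order tensor, which is the case relevant to gene networks. The expressions below assume that the mode products are read as ordinary matrix products, i.e., $X_{i} \times_{1} C^{(1)} \times_{2} C^{(2)} = C^{(1)T} X_{i} C^{(2)}$; this reading is an assumption made for illustration.

```latex
% Assumption: mode products are read as matrix products on each axis.
\bar{X}_{i} = C^{(1)T} X_{i} C^{(2)}, \qquad
\bar{X}_{i} \times_{1} C^{(1)T} \times_{2} C^{(2)T} = C^{(1)} \bar{X}_{i} C^{(2)T},
\\
O_{T} = \frac{1}{n} \sum_{i=1}^{n} \left\{ L(y_{i}, \hat{y}_{i})
  + \gamma \left\| X_{i} - C^{(1)} C^{(1)T} X_{i} C^{(2)} C^{(2)T} \right\|_{2}^{2} \right\}.
```

Under the constraint $C^{(k)T} C^{(k)} = I$, each $C^{(k)} C^{(k)T}$ is an orthogonal projector, so the second term is the squared distance between a network and its projection onto the learned subspace. A larger $\gamma$ therefore favors subspaces that preserve the networks themselves, whereas a smaller $\gamma$ favors subspaces that serve prediction.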

3.2. Comprehensive Interpretation of the Massive Multiple Gene Networks across Varying EMT Statuses

Park et al. [11] applied TRIP to the 762 gene regulatory networks estimated across varying EMT statuses of cell lines. Each gene network between the 13,508 target genes and 1732 regulator genes was regarded as a second-order tensor. TRIP first estimated the projection matrices $C^{(k)}$, $k = 1, 2$, for the regulator and target gene axes and learned a $50 \times 50$ subspace of the 762 networks. Then, 50 crucial components that describe the importance of the target and regulator genes for EMT-modulator prediction were extracted. Park et al. [11] interpreted the results based on the crucial components of the subspace for regulator genes. Among the estimated 50 components, the first three components explained approximately 70% of the variability in the regulator genes within EMT-related gene networks (first component: 56%; second component: 8%; third component: 4%). Then, the interpretation of the EMT networks was conducted using the first three components. Figure 2 shows the overall framework of the EMT network analysis: EMT status-specific gene networks were estimated for 762 cell lines; the explainable AI technique, TRIP, was applied to interpret the massive multiple gene networks; and the interpretation of these networks was based on the first three crucial components of the regulator axis.

Figure 2. Overall framework of EMT network analysis: (1) Estimation of EMT status-specific gene networks for 762 cell lines. (2) Interpretation of the massive multiple gene networks using the explainable AI technique, TRIP. (3) Interpretation of the networks based on the first three crucial components of the regulator axis.

In order to achieve a more biologically reliable interpretation, Park et al. [11] combined the results obtained using TRIP with well-known EMT markers, i.e., ZEB1, ZEB2, SNAIL1, SNAIL2, and TWIST1 (EMT-TFs). The target networks of the five EMT-TFs were extracted separately for the high- and low-value regions of each component. The target genes (TG.EMT-TFs) of the EMT-TFs and the target genes of the TG.EMT-TFs were also extracted. For the networks in the high and low regions of each component, the binary adjacency matrices were computed. The 10 genes with the most significant differences in edge structure between the binary adjacency matrices for the high and low regions of each component were extracted. Table 1 shows the identified genes from the three components and their EMT-related evidence, where “◯” in the column “In NetworkProfiler” indicates that the genes were also identified as top-25-ranked regulators of E-cadherin in the previous study of EMT status-specific gene network [4].
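The comparison of binary adjacency matrices described above can be sketched as follows. The text does not specify how the difference in edge structure was quantified, so the sketch assumes a simple count of edges that are present in only one of the two regions. The threshold, the function names, the gene labels, and the coefficient values are all hypothetical.

```python
def binarize(network, threshold=0.0):
    """Binary adjacency matrix: 1 where |coefficient| exceeds the threshold."""
    return [[1 if abs(v) > threshold else 0 for v in row] for row in network]


def edge_differences(adj_high, adj_low, genes):
    """Count, per gene (row), the edges present in only one of the two regions."""
    counts = {}
    for gene, row_high, row_low in zip(genes, adj_high, adj_low):
        counts[gene] = sum(a != b for a, b in zip(row_high, row_low))
    return counts


def top_genes(counts, n_top):
    """Genes with the largest number of differing edges, ties broken by name."""
    return sorted(counts, key=lambda g: (-counts[g], g))[:n_top]


# Hypothetical coefficient matrices (rows: genes, columns: partner genes)
# for the high- and low-value regions of one component.
genes = ["geneA", "geneB", "geneC"]
net_high = [[0.9, 0.0, 0.4], [0.0, 0.0, 0.0], [0.2, 0.0, 0.0]]
net_low = [[0.0, 0.7, 0.0], [0.0, 0.0, 0.0], [0.3, 0.0, 0.0]]

counts = edge_differences(binarize(net_high), binarize(net_low), genes)
ranking = top_genes(counts, n_top=2)
```

Here, `geneC` keeps the same edge in both regions even though its coefficient changes, so it is not flagged; only changes in the presence or absence of edges are counted under this assumption.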

Table 1. Identified novel candidate markers involved in EMT related mechanism [11].

Most of the EMT markers identified using TRIP have been supported by strong evidence as EMT markers, and their EMT-related mechanisms have been previously reported in multiple studies. This implies that TRIP has provided biologically reliable insights into the identification of EMT-related mechanisms. Out of the 17 genes identified, GRHL2, IRF6, LSR, and OVOL2 were also identified as EMT markers (i.e., regulators of E-cadherin) in the previous EMT network analysis using the NetworkProfiler [4]. The EMT-related mechanisms of genes such as GRHL2, IRF6, LSR, and OVOL2 were not known a decade ago. However, over the past decade, the EMT-related mechanisms of the genes have been uncovered. Interestingly, the EMT status-specific gene networks were estimated a decade ago based on data from that time, and TRIP was applied to the data to identify the EMT-related mechanisms. This implies that the data from 10 years ago already contained insights into EMT mechanisms that have since been revealed in the past decade. Because of a shortage of computational strategies for comprehensive analysis of the large-scale gene regulatory network, we were unable to reveal EMT-related mechanisms a decade ago (e.g., GRHL2, IRF6, OVOL2, LSR shown in Table 1). The application of the explainable AI technique, TRIP, allowed us to uncover the EMT-related mechanisms that have been unveiled over the past ten years in a single, unified analysis. Reviews on EMT gene network analysis indicate that the application of explainable AI could usher in a new era for computational network biology, potentially revolutionizing our understanding of intricate disease mechanisms.

4. Discussion

The aim of this review paper was to provide insights into the computational strategies for cell line-specific gene networks and the use of the explainable AI approach to overcome the bottleneck of existing black-box AI (i.e., interpreting the massive multiple gene networks). We also reviewed computational strategies for analyzing massive multiple gene networks. Although many studies have been conducted on gene network analysis, the existing studies were related to averaged gene networks for all cell lines (or patients), which cannot provide cell line-specific molecular characteristics. To address this issue, recently, computational strategies for cell line-specific gene network estimation have been developed. Cell line characteristic-specific gene network analysis enables us to uncover gene regulatory systems for specific characteristics of cell lines. The gene networks identified through this approach provide indispensable information that can significantly contribute to precision medicine. However, interpreting the massive multiple gene networks remains a challenge, mainly due to the extensive scale of the networks involved. Each interpretation subject entails hundreds of massive networks, comprising more than 10,000 rows and over 1000 columns. We reviewed the computational strategies for cell line-specific gene network estimation and the explainable AI approach for interpreting the estimated massive multiple gene regulatory networks. We also reviewed the analysis results of the gene regulatory system changes under varying EMT statuses of cell lines. Based on the reviews of two studies, it was found that the explainable AI approach, TRIP, has successfully unraveled the EMT-related mechanisms that have been discovered over the past ten years. This implies that the lack of computational strategies for the comprehensive analysis of large-scale gene networks has hindered the identification of the EMT-related mechanisms that have been revealed in the past ten years. 
We expect that explainable AI methods, such as TRIP, can offer insightful and interpretable solutions for understanding the complex dynamics of cancer network biology.

In this article, we have reviewed computational strategies for cell line-specific gene network analysis and their application for EMT status-specific gene networks. The Connectivity Map is one of the most powerful tools for identifying connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs [61]. It can be suggested that the use of a Connectivity Map with the reviewed strategies can lead to more comprehensive results, especially in the interpretation of drug-related characteristic-specific gene network analysis.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The estimated networks can be downloaded from http://bonsai.hgc.jp/~shima/NetworkProfiler/, accessed on 25 June 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Daoud, M.; Mayo, M. A survey of neural network-based cancer prediction models from microarray data. Artif. Intell. Med. 2019, 97, 204–214.
2. Cheng, F.; Kovacs, I.; Barabasi, A. Network-based prediction of drug combinations. Nat. Commun. 2019, 10, 1197.
3. Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6533–6542.
4. Shimamura, T.; Imoto, S.; Shimada, Y.; Hosono, Y.; Niida, A.; Nagasaki, M.; Yamaguchi, R.; Takahashi, T.; Miyano, S. A novel network profiling analysis reveals system changes in epithelial-mesenchymal transition. PLoS ONE 2011, 6, e20804.
5. Emmert-Streib, F.; Dehmer, M.; Haibe-Kains, B. Gene regulatory networks and their applications: Understanding biological and medical problems in terms of networks. Front. Cell Dev. Biol. 2014, 2, 38.
6. Badia-I-Mompel, P.; Wessels, L.; Müller-Dott, S.; Trimbour, R.; Ramirez Flores, R.O.; Argelaguet, R.; Saez-Rodriguez, J. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 2023, 1–16.
7. Zhao, M.; He, W.; Tang, J.; Zou, Q.; Guo, F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief. Bioinform. 2021, 22, bbab009.
8. Lavin, D.P.; Tiwari, V.K. Unresolved Complexity in the Gene Regulatory Network Underlying EMT. Front. Oncol. 2020, 10, 554.
9. Fazilaty, H.; Rago, L.; Kass Youssef, K.; Ocana, O.H.; Garcia-Asencio, F.; Arcas, A.; Galceran, J.; Nieto, M.A. A gene regulatory network to control EMT programs in development and disease. Nat. Commun. 2019, 10, 5115.
10. Maruhashi, K.; Park, H.; Yamaguchi, R.; Miyano, S. Linear Tensor Projection Revealing Nonlinearity. arXiv 2020, arXiv:2007.03912.
11. Park, H.; Maruhashi, K.; Yamaguchi, R.; Imoto, S.; Miyano, S. Global gene network exploration based on explainable artificial intelligence approach. PLoS ONE 2020, 15, e0241508.
12. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
13. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
14. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
15. Hastie, T.; Tibshirani, R. Varying-Coefficient Models. J. R. Stat. Soc. Ser. B 1993, 55, 757–796.
16. Park, H.; Shimamura, T.; Miyano, S.; Imoto, S. Robust prediction of anti-cancer drug sensitivity and sensitivity-specific biomarker. PLoS ONE 2014, 9, e108990.
17. Khan, J.A.; Van Aelst, S.; Zamar, R.H. Robust linear model selection based on least angle regression. J. Am. Stat. Assoc. 2007, 102, 1289–1299.
18. Niida, A.; Smith, A.D.; Imoto, S.; Aburatani, H.; Zhang, M.Q.; Akiyama, T. Gene set-based module discovery in the breast cancer transcriptome. BMC Bioinf. 2009, 10, 71.
19. Gregory, P.A.; Bert, A.G.; Paterson, E.L.; Barry, S.C.; Tsykin, A.; Farshid, G.; Vadas, M.A.; Khew-Goodall, Y.; Goodall, G.J. The miR-200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat. Cell Biol. 2008, 10, 593–601.
20. Comijn, J.; Berx, G.; Vermassen, P.; Verschueren, K.; van Grunsven, L.; Bruyneel, E.; Mareel, M.; Huylebroeck, D.; van Roy, F. The two-handed E box binding zinc finger protein SIP1 downregulates E-cadherin and induces invasion. Mol. Cell 2001, 7, 1267–1278.
21. Yori, J.L.; Johnson, E.; Zhou, G.; Jain, M.K.; Keri, R.A. Kruppel-like factor 4 inhibits epithelial-to-mesenchymal transition through regulation of E-cadherin gene expression. J. Biol. Chem. 2010, 285, 16854–16863.
22. Song, Y.; Washington, M.K.; Crawford, H.C. Loss of FOXA1/2 is essential for the epithelial-to-mesenchymal transition in pancreatic cancer. Cancer Res. 2010, 70, 2115–2125.
23. Sobrado, V.R.; Moreno-Bueno, G.; Cubillo, E.; Holt, L.J.; Nieto, M.A.; Portillo, F.; Cano, A. The class I bHLH factors E2-2A and E2-2B regulate EMT. J. Cell Sci. 2009, 122, 1014–1024.
24. Kato, M.; Zhang, J.; Wang, M.; Lanting, L.; Yuan, H.; Rossi, J.J.; Natarajan, R. MicroRNA-192 in diabetic kidney glomeruli and its function in TGF-beta-induced collagen expression via inhibition of E-box repressors. Proc. Natl. Acad. Sci. USA 2007, 104, 3432–3437.
25. Eger, A.; Aigner, K.; Sonderegger, S.; Dampier, B.; Oehler, S.; Schreiber, M.; Berx, G.; Cano, A.; Beug, H.; Foisner, R. DeltaEF1 is a transcriptional repressor of E-cadherin and regulates epithelial plasticity in breast cancer cells. Oncogene 2005, 24, 2375–2385.
26. Hajra, K.M.; Chen, D.Y.; Fearon, E.R. The SLUG zinc-finger protein represses E-cadherin in breast cancer. Cancer Res. 2002, 62, 1613–1618.
27. Bracken, C.P.; Gregory, P.A.; Kolesnikoff, N.; Bert, A.G.; Wang, J.; Shannon, M.F.; Goodall, G.J. A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res. 2008, 68, 7846–7854.
28. Daniel, J.G.; Panizzi, J.R. Spatiotemporal expression profile of embryonic and adult ankyrin repeat and EF-hand domain containing protein 1-encoding genes ankef1a and ankef1b in zebrafish. Gene Expr. Patterns 2019, 34, 119069.
29. Wang, S.; Yan, S.; Zhu, S.; Zhao, Y.; Yan, J.; Xiao, Z.; Bi, J.; Qiu, J.; Zhang, D.; Hong, Z.; et al. FOXF1 Induces Epithelial-Mesenchymal Transition in Colorectal Cancer Metastasis by Transcriptionally Activating SNAI1. Neoplasia 2018, 20, 996–1007.
30. Wei, H.J.; Nickoloff, J.A.; Chen, W.H.; Liu, H.Y.; Lo, W.C.; Chang, Y.T.; Yang, P.C.; Wu, C.W.; Williams, D.F.; Gelovani, J.G.; et al. FOXF1 mediates mesenchymal stem cell fusion-induced reprogramming of lung cancer cells. Oncotarget 2014, 5, 9514–9529.
31. Lo, P.K. The controversial role of forkhead box F2 (FOXF2) transcription factor in breast cancer. PRAS Open 2017, 1, 9.
32. Cai, J.; Tian, A.X.; Wang, Q.S.; Kong, P.Z.; Du, X.; Li, X.Q.; Feng, Y.M. FOXF2 suppresses the FOXC2-mediated epithelial-mesenchymal transition and multidrug resistance of basal-like breast cancer. Cancer Lett. 2015, 367, 129–137.
33. Iwasaki, H.; Nakano, K.; Shinkai, K.; Kunisawa, Y.; Hirahashi, M.; Oda, Y.; Onishi, H.; Katano, M. Hedgehog Gli3 activator signal augments tumorigenicity of colorectal cancer via upregulation of adherence-related genes. Cancer Sci. 2013, 104, 328–336.
34. Rodrigues, M.F.S.D.; Miguita, L.; De Andrade, N.P.; Heguedusch, D.; Rodini, C.O. GLI3 knockdown decreases stemness, cell proliferation and invasion in oral squamous cell carcinoma. Int. J. Oncol. 2018, 53, 2458–2472.
35. Chung, V.Y.; Tan, T.Z.; Tan, M.; Wong, M.K.; Kuay, K.T.; Yang, Z.; Ye, J.; Muller, J.; Koh, C.M.; Guccione, E.; et al. GRHL2-miR-200-ZEB1 maintains the epithelial status of ovarian cancer through transcriptional regulation and histone modification. Sci. Rep. 2016, 6, 19943.
36. Xiang, J.; Fu, X.; Ran, W.; Wang, Z. Grhl2 reduces invasion and migration through inhibition of TGFβ-induced EMT in gastric cancer. Oncogenesis 2017, 6, e284.
37. Cieply, B.; Farris, J.; Denvir, J.; Ford, H.L.; Frisch, S.M. Epithelial-mesenchymal transition and tumor suppression are controlled by a reciprocal feedback loop between ZEB1 and Grainyhead-like-2. Cancer Res. 2013, 73, 6299–6309.
38. Mooney, S.M.; Talebian, V.; Jolly, M.K.; Jia, D.; Gromala, M.; Levine, H.; McConkey, B.J. The GRHL2/ZEB Feedback Loop-A Key Axis in the Regulation of EMT in Breast Cancer. J. Cell Biochem. 2017, 118, 2559–2570.
39. Alimirah, F.; Chen, J.; Davis, F.J.; Choubey, D. IFI16 in human prostate cancer. Mol. Cancer Res. 2007, 5, 251–259.
40. Lin, W.; Zhao, Z.; Ni, Z.; Zhao, Y.; Du, W.; Chen, S. IFI16 restoration in hepatocellular carcinoma induces tumour inhibition via activation of p53 signals and inflammasome. Cell Prolif. 2017, 50, e12392.
41. Unterholzner, L.; Keating, S.E.; Baran, M.; Horan, K.A.; Jensen, S.B.; Sharma, S.; Sirois, C.M.; Jin, T.; Latz, E.; Xiao, T.S.; et al. IFI16 is an innate immune sensor for intracellular DNA. Nat. Immunol. 2010, 11, 997–1004.
42. Roy, A.; Ghosh, A.; Kumar, B.; Chandran, B. IFI16, a nuclear innate immune DNA sensor, mediates epigenetic silencing of herpesvirus genomes by its association with H3K9 methyltransferases SUV39H1 and GLP. eLife 2019, 8, e49500.
43. Ke, C.Y.; Xiao, W.L.; Chen, C.M.; Lo, L.J.; Wong, F.H. IRF6 is the mediator of TGFβ3 during regulation of the epithelial mesenchymal transition and palatal fusion. Sci. Rep. 2015, 5, 12791.
44. Li, D.; Cheng, P.; Wang, J.; Qiu, X.; Zhang, X. IRF6 Is Directly Regulated by ZEB1 and ELF3, and Predicts a Favorable Prognosis in Gastric Cancer. Front. Oncol. 2019, 9, 200.
45. Shimada, H.; Abe, S.; Kohno, T.; Satohisa, S.; Konno, Y. Loss of tricellular tight junction protein LSR promotes cell invasion and migration via upregulation of TEAD1/AREG in human endometrial cancer. Sci. Rep. 2017, 7, 37049.
46. Parsana, P.; Amend, S.R.; Hernandez, J.; Pienta, K.J.; Battle, A. Identifying global expression patterns and key regulators in epithelial to mesenchymal transition through multi-study integration. BMC Cancer 2017, 17, 447.
47. Reaves, D.K.; Fagan-Solis, K.D.; Dunphy, K.; Oliver, S.D.; Scott, D.W.; Fleming, J.M. The role of lipolysis stimulated lipoprotein receptor in breast cancer and directing breast cancer cell behavior. PLoS ONE 2014, 9, e91747.
48. Takano, K.; Kakuki, T.; Obata, K.; Nomura, K.; Miyata, R.; Kondo, A.; Kurose, M.; Kakiuchi, A.; Kaneko, Y.; Kohno, T.; et al. The Behavior and Role of Lipolysis-stimulated Lipoprotein Receptor, a Component of Tricellular Tight Junctions, in Head and Neck Squamous Cell Carcinomas. Anticancer Res. 2016, 36, 5895–5904.
49. Liu, J.; Wu, Q.; Wang, Y.; Wei, Y.; Wu, H.; Duan, L.; Zhang, Q.; Wu, Y. Ovol2 induces mesenchymal-epithelial transition via targeting ZEB1 in osteosarcoma. Onco Targets Ther. 2018, 11, 2963–2973.
50. Nilsson, G.; Kannius-Janson, M. Forkhead Box F1 promotes breast cancer cell migration by upregulating lysyl oxidase and suppressing Smad2/3 signaling. BMC Cancer 2016, 16, 142.
51. Roca, H.; Hernandez, J.; Weidner, S.; McEachin, R.C.; Fuller, D. Transcription factors OVOL1 and OVOL2 induce the mesenchymal to epithelial transition in human cancer. PLoS ONE 2013, 8, e76773.
52. Hong, T.; Watanabe, K.; Ta, C.H.; Villarreal-Ponce, A.; Nie, Q.; Dai, X. An Ovol2-Zeb1 Mutual Inhibitory Circuit Governs Bidirectional and Multi-step Transition between Epithelial and Mesenchymal States. PLoS Comput. Biol. 2015, 11, e1004569.
53. Zhang, Y.; Liao, Y.; Chen, C.; Sun, W.; Sun, X.; Liu, Y.; Xu, E.; Lai, M.; Zhang, H. p38 regulated FOXC1 stability is required for colorectal cancer metastasis. J. Pathol. 2020, 250, 217–230.
54. Chandhoke, A.S.; Karve, K.; Dadakhujaev, S.; Netherton, S.; Deng, L.; Bonni, S. The ubiquitin ligase Smurf2 suppresses TGFβ-induced epithelial-mesenchymal transition in a sumoylation-regulated manner. Cell Death Differ. 2016, 23, 876–888.
55. Huang, Y.; Tong, J.; He, F.; Yu, X.; Fan, L.; Hu, J.; Tan, J.; Chen, Z. miR-141 regulates TGF-β1-induced epithelial-mesenchymal transition through repression of HIPK2 expression in renal tubular epithelial cells. Int. J. Mol. Med. 2014, 35, 311–318.
56. Moustakas, A.; Heldin, C.H. Mechanisms of TGFβ Induced Epithelial-Mesenchymal Transition. J. Clin. Med. 2016, 5, 63.
57. Saito, R.A.; Watabe, T.; Horiguchi, K.; Kohyama, T.; Saitoh, M.; Nagase, T.; Miyazono, K. Thyroid transcription factor-1 inhibits transforming growth factor-beta-mediated epithelial-to-mesenchymal transition in lung adenocarcinoma cells. Cancer Res. 2009, 69, 2783–2791.
58. Zhang, Y.; Yan, W.; Chen, X. P63 regulates tubular formation via epithelial-to-mesenchymal transition. Oncogene 2014, 33, 1548–1557.
59. Olsen, J.R.; Oyan, A.M.; Rostad, K.; Hellem, M.R.; Liu, J.; Li, L.; Micklem, D.R.; Haugen, H.; Lorens, J.B.; Rotter, V.; et al. p63 attenuates epithelial to mesenchymal potential in an experimental prostate cell model. PLoS ONE 2013, 8, e62547.
60. Lindsay, J.; McDade, S.S.; Pickard, A.; McCloskey, K.D.; McCance, D.J. Role of DeltaNp63gamma in epithelial to mesenchymal transition. J. Biol. Chem. 2011, 286, 3915–3924.
61. Lamb, J.; Crawford, E.D.; Peck, D.; Modell, J.W.; Blat, I.C.; Wrobel, M.J.; Lerner, J.; Brunet, J.P.; Subramanian, A.; Ross, K.N.; et al. The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science 2006, 313, 1929–1935.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Gene	Components	References	In NetworkProfiler
AFF1	3	-	-
ANKRD5	1	[28]	-
FOXF1	2	[29,30]	-
FOXF2	1	[31,32]	-
GLI3	2	[33,34]	-
GRHL2	1	[35,36,37,38]	◯
IFI16	2, 3	[39,40,41,42]	-
IRF6	3	[43,44]	◯
KANK2	3	-	-
LSR	2	[45,46,47,48]	◯
MAFB	3	-	-
OVOL2	2	[49,50,51,52]	◯
PCBD1	2	-	-
SOX13	2	[53]	-
TGFB1I1	2	[54,55,56,57]	-
TP63	2, 3	[58,59,60]	-
ZNF91	1	-	-
