Hybrid E/M Phenotype(s) and Stemness: A Mechanistic Connection Embedded in Network Topology

Metastasis remains an unsolved clinical challenge. Two crucial features of metastasizing cancer cells are (a) their ability to dynamically move along the epithelial–hybrid–mesenchymal spectrum and (b) their tumor initiation potential or stemness. With increasing functional characterization of hybrid epithelial/mesenchymal (E/M) phenotypes along the spectrum, recent in vitro and in vivo studies have suggested an increasing association of hybrid E/M phenotypes with stemness. However, the mechanistic underpinnings enabling this association remain unclear. Here, we develop a mechanism-based mathematical modeling framework that interrogates the emergent nonlinear dynamics of the coupled network modules regulating E/M plasticity (miR-200/ZEB) and stemness (LIN28/let-7). Simulating the dynamics of this coupled network across a large ensemble of parameter sets, we observe that hybrid E/M phenotype(s) are more likely to acquire stemness relative to “pure” epithelial or mesenchymal states. We also integrate multiple “phenotypic stability factors” (PSFs) that have been shown to stabilize hybrid E/M phenotypes both in silico and in vitro—such as OVOL1/2, GRHL2, and NRF2—with this network, and demonstrate that the enrichment of hybrid E/M phenotype(s) with stemness is largely conserved in the presence of these PSFs. Thus, our results offer mechanistic insights into recent experimental observations of hybrid E/M phenotype(s) that are essential for tumor initiation and highlight how this feature is embedded in the underlying topology of interconnected EMT (Epithelial-Mesenchymal Transition) and stemness networks.


Introduction
Metastasis is the deadliest process in cancer, and it is involved in over 90% of all cancer-related deaths. It is a dynamic multi-stage cascade of events involving the initial detachment of cells from the primary tumor mass, dissemination of cancer cells, their exit (extravasation) at many distant organs, and finally, their ability to colonize distant organs through tumor outgrowth [1]. It is a challenging process for cells with extremely high attrition rates (>99%) such that most cells die in circulation or are unable to adapt to the foreign biochemical and/or biophysical ecology of distant organs [2,3]. Thus, phenotypic plasticity-the ability of disseminating cancer cells to adapt their phenotypes reversibly in response to their dynamic microenvironments-has been regarded as a hallmark of metastasis-initiating cells [4]. Phenotypic plasticity exists at multiple interconnected axes (a) epithelial-mesenchymal plasticity (EMP), (b) metabolic plasticity, and (c) plasticity between a cancer stem cell (CSC) and non-CSC state (i.e., stemness), among others [5]. Understanding such functional inter-dependencies from a dynamical systems level can lead to deciphering the organizing principles of underlying regulatory network modules and, eventually, therapeutic advances that can target cancer cell adaptability/plasticity [6,7].
Out of the various possible pairwise couplings among these different axes of plasticity, the one between EMP and stemness has been investigated most thoroughly through in vitro, in vivo, and in silico approaches. Initial reports, which treated EMP as a binary (all-or-none) process where cells switch between epithelial and mesenchymal phenotypes, suggested that a "full" epithelial-mesenchymal transition (EMT) was associated with increased stemness (tumor initiation potential) [8,9]. This view was challenged by later studies demonstrating that a "full" EMT was dispensable for acquiring stemness, or also perhaps an obstacle for the same [10,11]. A more nuanced recent view highlights that EMP involves multiple stable hybrid E/M states in addition to "fully" E or "fully" M ones and that these hybrid states may be maximally stem-like [12][13][14][15][16][17]. In squamous cell carcinoma cells that were categorized into six phenotypes along the EMP spectrum, stemness was shown to be acquired in earlier stages of EMP, and it did not increase as cells progressed towards a "fully" mesenchymal phenotype. Intriguingly, the metastatic potential was found to be the maximum for hybrid E/M states [16]. Furthermore, in breast cancer cells, a transition from a hybrid E/M phenotype to a "fully" mesenchymal one, as driven by constitutive ectopic expression of EMT-inducing transcription factor (EMT-TF) ZEB1, led to the loss of tumorigenicity in vitro and in vivo [13]. Three-dimensional assays to investigate collective cell invasion in breast cancer organoids showed that most leader cells co-expressed epithelial, mesenchymal, and CSC markers [18], further strengthening the association between hybrid E/M phenotype(s) and stemness. One possible reason for the higher stemness of hybrid E/M cells may be their increased propensity to give rise to epithelial and mesenchymal cells [19,20] in addition to self-renewal, the two critical traits of stem cells in development and homeostasis. While this association of hybrid E/M phenotype with enhanced stemness has been consistently reported across cancer types [21], a mechanistic underpinning facilitating the association's emergence remains unclear.
We have previously developed mechanism-based mathematical models that investigate the coupled nonlinear dynamics of EMP and stemness modules-miR-200/ZEB and LIN28/let-7, respectively. These models have predicted that while the most likely positioning of the "stemness window" is around the mid-point of the "EMP axis" (i.e., hybrid E/M phenotypes) [15], the "window" is dynamic in nature and could move to either more epithelial or more mesenchymal ends too [22,23]. A caveat of this analysis is that it was performed on a few specific parameter sets; thus, its applicability to explain the diverse and mounting experimental evidence across cancer types, as mentioned above, remains limited.
Here, we have simulated the coupled dynamics of miR-200/ZEB and LIN28/let-7 circuit over a wide range of biologically relevant parameter sets to interrogate whether the association of a hybrid E/M phenotype with stemness is a trait embedded in the topology of this coupling itself. Our results suggest that the overlap of hybrid E/M and stemness features is an outcome of coupled network topology between the miR-200/ZEB and LIN28/let-7 feedback loops. Moreover, upon extending the network to include various phenotypic stability factors (PSFs), such as GRHL2, OVOL1/2, and NRF2, we notice that this association is maintained. Thus, our results unravel the underlying design principles of the EMP-stemness association and predict that this interconnection is likely to be seen across multiple carcinomas.

Gene Regulatory Network Underlying E/M Plasticity and Stemness Enables the Existence of Four E/M Phenotypes
To unravel the mechanistic underpinnings of the association between stemness and epithelial-mesenchymal plasticity (EMP), we considered the regulatory interactions among key factors implicated in governing the E/M plasticity (ZEB/miR-200) and the stemness characteristics (LIN28/let-7) ( Figure 1A). E/M plasticity is at least in part controlled by a mutually inhibitory feedback loop between the transcription factor family ZEB and the microRNA family miR-200 [24,25]. Moreover, ZEB can self-activate through ESRP1 and/or CD44/HA signaling [26][27][28]. This network can give rise to three phenotypes: epithelial (high miR-200, low ZEB), mesenchymal (low miR-200, high ZEB), and hybrid E/M (medium miR-200, medium ZEB) [29]. This network can be influenced by other EMT-TFs such as SNAIL which self-inhibits transcriptionally and activates the expression very different between e and m states, its levels are similar for e and he clusters, and for hm and m clusters ( Figure S2A and Figure 1D, Table S7), thus indicating that miR-200 may be a more coarse-grained readout of the EMT state of a cell. Corroborative trends, i.e., the highest levels of let-7 and lowest levels of LIN28 in the e cluster, and lowest levels of let-7 and highest levels of LIN28 in the m cluster, were observed ( Figure 1D and Figure S2B). The levels of NF-kB and SNAIL levels were found to be almost similar across the clusters ( Figure S2C,D), which is not surprising because neither NF-kB nor SNAIL are influenced by any of the other four nodes (miR-200, ZEB, let-7, LIN28) in the coupled EMP-stemness network.  Table S2 in Section S2.1) showing ZEB levels in the four clustersepithelial (e), hybrid epithelial (he), hybrid mesenchymal (hm), and mesenchymal (m). The colors of the labels shown in the heatmap are for e, he, hm, and m clusters displayed here. (D) Expression levels of EMP and the stemness genes in the four clusters. In each node, the horizontal line represents the median value, the darkest region represents the middle 30% (35th to 65th percentile) of the data, and the region next in lightness represents the next 40% (15th to 35th and 65th to 85th percentiles), while the lightest region represents the next 20% of the data (5th to 15th and 85th to 95th percentiles) (E) Principal Component Analysis (PCA) plot colored according to the K-means cluster assignment. (Principal Component 1 (PC1) = 0.48 × miR200 − 0.54 × ZEB − 0.21 × SNAIL − 0.49 × LIN28 + 0.44 × let7 + 0.057 × NF-kB; Principal Component 2 (PC2) = 0.27 × miR200 − 0.08 × ZEB − 0.66 × SNAIL + 0.14 × LIN28 − 0.49 × let7 − 0.47 × NF-kB). For (C,D): Significance bars added to show statistically significant differences (Mann-Whitney U test). Legend for the p-values: *** p < 0.0001; for effect sizes, refer to Table S7 in Supplementary Section S2.5.

Stemness Characteristics are Enriched for in the Hybrid E/M Phenotypes
The gene regulatory network involving the interplay between E/M plasticity and stemness can give rise to different phenotypes (steady states). Next, we investigated whether these states can co-exist for certain parameter sets, i.e., is the network capable of exhibiting multistability? Out of the ensemble of parameter sets sampled by RACIPE, we observed that a subset of parameter sets can give rise to a combination of two or more steady states per parameter set (bistability: 37.3 ± 0.95%, tristability: 24.2 ± 0.09%, etc.), while the others converged to a single steady state for the given system of ordinary differential equations (monostability: 29.6 ± 0.70%). A closer inspection of the monostable solutions revealed epithelial (e) and mesenchymal (m) states as the most predominant solutions (Figure 2A), indicating that epithelial and mesenchymal phenotypes are more  Table S2 in Section S2.1) showing ZEB levels in the four clusters-epithelial (e), hybrid epithelial (he), hybrid mesenchymal (hm), and mesenchymal (m). The colors of the labels shown in the heatmap are for e, he, hm, and m clusters displayed here. (D) Expression levels of EMP and the stemness genes in the four clusters. In each node, the horizontal line represents the median value, the darkest region represents the middle 30% (35th to 65th percentile) of the data, and the region next in lightness represents the next 40% (15th to 35th and 65th to 85th percentiles), while the lightest region represents the next 20% of the data (5th to 15th and 85th to 95th percentiles) (E) Principal Component Analysis (PCA) plot colored according to the K-means cluster assignment. (Principal Component 1 (PC1) = 0. 48 . For (C,D): Significance bars added to show statistically significant differences (Mann-Whitney U test). Legend for the p-values: *** p < 0.0001; for effect sizes, refer to Table S7 in Supplementary Section S2.5.
All these regulatory links in the abovementioned network may be active, with different strengths in individual cells. Further, in cancer cells, the mutational background can alter the stability and/or production rate of various species, for instance, mutant p53 gain of function, triggering ZEB1 [42]. Thus, a rigorous analysis of the emergent dynamics of the coupled EMP-stemness network would require a framework that can decode the features robust to parameter variation of a specific network topology. To achieve this goal, we simulated the coupled EMP-stemness network through random circuit perturbation RACIPE [43] (see Section 4). RACIPE takes network topology as its input, converts it into a set of coupled ordinary differential equations (ODEs) where the influence of each activation or inhibition regulatory link is represented by a shifted Hill function. Parameters corresponding to this set of equations are sampled randomly within a biologically relevant range. Therefore, it generates an ensemble of ODE models, each with a different parameter set, thus representing the intrinsic variability in kinetic parameter values in a given cell population. The set of steady-state values obtained from this ensemble can be then used to identify robust dynamical signatures emerging from this topology through appropriate statistical analysis (see Section 4).
First, we plotted a heatmap of all the steady-state values obtained via RACIPE for this coupled network and performed agglomerative hierarchical clustering ( Figure 1B) on it. We observed the possible existence of four major clusters, indicated in the dendrogram plotted using Ward's minimum variance criterion (see Section 4) ( Figure S1A). We next performed K-means clustering for varied values of K and observed a peak in the average silhouette width at K = 4, endorsing the existence of four clusters in RACIPE solutions ( Figure S1B). This trend was supported by all other cluster quality metrics we used ( Figure S1C-E). We plotted histograms of ZEB levels for the four clusters identified in the heatmap and assigned them phenotypic labels accordingly ( Figure 1C,D). The cluster with the minimum median ZEB levels is referred to as an epithelial (e) phenotype, while the cluster with the maximum ZEB median levels is referred to as a mesenchymal (m) phenotype. The other two clusters had intermediate ZEB levels, being categorized as E/M hybrids, and are assigned the labels of hybrid epithelial (he) and hybrid mesenchymal (hm). Furthermore, we performed principal component analysis (PCA) on all steady-state solutions ( Figure 1E; each color represents a distinct cluster identified by K-means). Principal component 1 and 2 explain 47% and 19% of the total variance, respectively, and show roughly four clusters in the PCA plot where the two hybrid clusters (blue and pink) are sandwiched between the epithelial (purple) and the mesenchymal (yellow) clusters (Figure 1E). PC1 can be thought of as an approximate EMT axis based on the coefficients for miR-200 and ZEB. Interestingly, while miR-200 levels are very different between e and m states, its levels are similar for e and he clusters, and for hm and m clusters ( Figure S2A and Figure 1D, Table S7), thus indicating that miR-200 may be a more coarse-grained readout of the EMT state of a cell. Corroborative trends, i.e., the highest levels of let-7 and lowest levels of LIN28 in the e cluster, and lowest levels of let-7 and highest levels of LIN28 in the m cluster, were observed ( Figure 1D and Figure S2B). The levels of NF-kB and SNAIL levels were found to be almost similar across the clusters ( Figure S2C,D), which is not surprising because neither NF-kB nor SNAIL are influenced by any of the other four nodes (miR-200, ZEB, let-7, LIN28) in the coupled EMP-stemness network.

Stemness Characteristics are Enriched for in the Hybrid E/M Phenotypes
The gene regulatory network involving the interplay between E/M plasticity and stemness can give rise to different phenotypes (steady states). Next, we investigated whether these states can co-exist for certain parameter sets, i.e., is the network capable of exhibiting multistability? Out of the ensemble of parameter sets sampled by RACIPE, we observed that a subset of parameter sets can give rise to a combination of two or more steady states per parameter set (bistability: 37.3 ± 0.95%, tristability: 24.2 ± 0.09%, etc.), while the others converged to a single steady state for the given system of ordinary differential equations (monostability: 29.6 ± 0.70%). A closer inspection of the monostable solutions revealed epithelial (e) and mesenchymal (m) states as the most predominant solutions (Figure 2A), indicating that epithelial and mesenchymal phenotypes are more likely to exist relative to the hybrid ones, thereby reminiscent of observations based on the modeling of larger EMP regulatory networks too [44,45]. This trend is also observed when considering all the solutions together irrespective of the number of states (Table S5). Consistently, among all possible bistable solutions, the most predominant one is the phase {e, m}, i.e., the co-existence of epithelial and mesenchymal states ( Figure S3A). The least frequent bistable phase is {he, hm}, i.e., the one with the co-existence of both hybrid phenotypes, while the frequency of bistable phases containing one hybrid phenotype and one of the epithelial or mesenchymal ones-{e, hm}, {he, m}, {e, he}, {m, hm}-is intermediate ( Figure S3A). Similarly, among all possible tristable solutions, the phases containing both epithelial and mesenchymal states-{e, he, m} and {e, hm, m}-were the most predominant ( Figure S3B). Put together, these results indicate that the coupled EMPstemness regulatory network is multistable and therefore allows for cells to switch among the epithelial, mesenchymal, and the hybrid phenotypes due to stochastic perturbations.
J. Clin. Med. 2021, 10, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/jcm likely to exist relative to the hybrid ones, thereby reminiscent of observations based on the modeling of larger EMP regulatory networks too [44,45]. This trend is also observed when considering all the solutions together irrespective of the number of states (Table  S5). Consistently, among all possible bistable solutions, the most predominant one is the phase {e, m}, i.e., the co-existence of epithelial and mesenchymal states ( Figure S3A). The least frequent bistable phase is {he, hm}, i.e., the one with the co-existence of both hybrid phenotypes, while the frequency of bistable phases containing one hybrid phenotype and one of the epithelial or mesenchymal ones-{e, hm}, {he, m}, {e, he}, {m, hm}-is intermediate ( Figure S3A). Similarly, among all possible tristable solutions, the phases containing both epithelial and mesenchymal states-{e, he, m} and {e, hm, m}-were the most predominant ( Figure S3B). Put together, these results indicate that the coupled EMP-stemness regulatory network is multistable and therefore allows for cells to switch among the epithelial, mesenchymal, and the hybrid phenotypes due to stochastic perturbations.  Table S7 in Supplementary Section S2.5.
Next, we investigated the stemness traits of different cell states along the E/M spectrum. Intermediate levels of OCT4 have been reported to confer maximal stemness to the cells [35][36][37]. Given that OCT4 is a direct target of LIN28, we defined the "stemness for the four phenotypes with the stemness window (black vertical lines) as defined by the middle 30% of the "LIN28 biological range" (median ± one interquartile range averaged over n = 5 RACIPE replicates). Significance bars added to show statistically significant differences (Mann-Whitney U test) (C) Value of p1-the probability of a solution lying in the stemness window, conditional on the solution belonging to a particular phenotype for all the phenotypes (n = 5 RACIPE replicates). (D) Same as (C) except for p2-the probability of a solution belonging to a particular phenotype conditional on it lying in the stemness window. For (A,C,D): Mean values show average across five RACIPE replicates and the error bars represent the across-replicate standard deviations. Two-tailed Student's t-test with unequal variance (Welch's t-test) was performed. Legend for the p-values: + p > 0.01, * 0.01 > p > 0.001, ** 0.001 > p > 0.0001, *** p < 0.0001; for effect sizes, refer to Table S7 in Supplementary Section S2.5.
Next, we investigated the stemness traits of different cell states along the E/M spectrum. Intermediate levels of OCT4 have been reported to confer maximal stemness to the cells [35][36][37]. Given that OCT4 is a direct target of LIN28, we defined the "stemness window" to be a region around median LIN28 expression levels in all steady-state solutions combined and the median averaged over all the replicates of the coupled EMP-stemness circuit. Specifically, we considered the median ± one interquartile range of the LIN28 distribution to be the "biological range" of LIN28, and the middle 30% of this range centered around its median is considered to be the "stemness window", which might allow for a population enriched in the stem-like phenotype ( Figure 2B). To test the robustness of this specifically chosen stemness window, we need a semi-objective heuristic for increasingly wider "stemness windows" centered at the mid-point of the range chosen. When the size of the "stemness window" is considered to be more than one-third of the range of LIN28 levels, we see a qualitative change in the distribution of phenotypes lying in this window. This analysis underscores that hybrid E/M phenotypes are predominantly associated with stemness unless the "stemness window" starts getting too large to be of a biologically implausible size ( Figure S3C).
Next, we quantified the probability that a particular solution lies within the "stemness window" given that it belongs to a particular phenotype along the E-M spectrum (p1). Interestingly, we found that hybrid (he and hm) phenotypes are more likely to be stem-like than either extreme (e and m) E/M phenotypes ( Figure 2C). Alternatively, we computed the probability that a solution belongs to a particular phenotype along the E-M spectrum, given that it already lies in the "stemness window" (p2). Congruently, we observed that hybrid phenotypes (he and hm) are more enriched for in the stemness region than either epithelial (e) or mesenchymal (m) solutions, although the contribution of e and m cell states to "stemness window" cannot be ignored ( Figure 2D). Put together, these observations suggest that although the hybrid phenotypes he and hm are more likely to be associated with a stem-like characteristic, stemness is not exclusive to them. Thus, a subset of "pure" epithelial and/or mesenchymal cell states can also be potentially stem-like, depending on parametric combinations. Similar trends are observed when analyzing the multistable systems only as well (i.e., excluding all solutions obtained from monostable parameter sets), endorsing the robustness in trends (Supplementary Section S4).
Having established a strong association between the hybrid E/M phenotypes and stemness, we sought to understand the mechanistic underpinnings that can facilitate this association. Through RACIPE, we simulated a hypothetical "uncoupled" circuit where the inhibitory links from miR-200 to LIN28 and those from let-7 to ZEB are absent. The ensemble of solutions obtained for this uncoupled circuit also has four clusters (see Supplementary Section S2.2). For the uncoupled circuits, when we plot the clusters in a ZEB-LIN28 plane (indicative of an EMP-stemness plane with ZEB being a proxy for the E/M axis and LIN28 as a proxy for stemness), we observed that the mid-points of these four clusters were arranged in a square-the four vertices being [high ZEB, high LIN28], [low ZEB, high LIN28], [high ZEB, low LIN28], and [low ZEB, low LIN28]-demonstrating the independence of the ZEB and LIN28 levels ( Figure S4A). The [high ZEB, high LIN28] and [low ZEB, low LIN28] clusters correspond to m and e states, respectively, and the location of the former on the two-dimensional plane for the coupled circuit overlapped with that of the uncoupled circuit. However, in the case of the coupled circuit, the square geometry is disturbed; instead, more of a rhombus-like shape emerged due to the shift of the mid-points of the [low ZEB, high LIN28] and [high ZEB, low LIN28] clusters which now tend to have intermediate values of both. This analysis suggests that coupling EMP and stemness modules enables a higher likelihood of both hybrid (he, hm) clusters to be more stem-like ( Figure S4A). This observation can be mechanistically interpreted by understanding the relative "activity" of the two coupling links (miR-200 inhibiting LIN28 and let-7 inhibiting ZEB). In the [high ZEB, high LIN28] state, the levels of let-7 and miR-200 are relatively low, thus, the links are effectively "inactive". In the case of [low ZEB, high LIN28] and [high ZEB, low LIN28], one link is much more active than the other, hence we see relocations mostly along one of the two axes of the LIN28-ZEB plane. For the [low LIN28, low ZEB] state, both links are approximately equally "active", and therefore the cluster migrates along to even lower values of both LIN28 and ZEB.
To validate this hypothesis, we quantified the strength of coupling links-the net effect was strongest for e (i.e., [low LIN28, low ZEB]) and weakest for m (i.e., [high ZEB, high LIN28]) ( Figure S4B). As a control case for the link strength metric, we quantified the degree of asymmetry in the strength of the two links within EMP and stemness modules individually (see Section 4). We observed that, as expected, the effective "suppression" (the expression free topological asymmetry) of ZEB on miR-200 was the highest in the case of hm and m states relative to e and he states ( Figure S4D). On the other hand, we observed that the suppression of LIN28 on let-7 was higher in the case of he and m and lower in the e and hm phenotypes ( Figure S4C).

The Effect of Phenotypic Stability Factors (PSFs) on the Phenotypic Landscape of EMP
Previous work, including ours, has shown that various factors, such as GRHL2, OVOL1/2, and NRF2, can stabilize the hybrid E/M phenotype of cells [46][47][48][49]. Their knockdown in H1975-a stable hybrid E/M lung cancer in vitro model-pushed cells towards a complete EMT, thus, they were classified as "phenotypic stability factors" (PSFs) for hybrid E/M phenotypes. For a specific set of parameters, incorporating GRHL2 also promoted the association between hybrid E/M states and stemness [46]. Thus, we investigated whether the hybrid E/M-stemness interconnection was maintained for multiple PSFs across an ensemble of parameter sets.
We simulated via RACIPE the coupled EMP-stemness networks after incorporating these three PSFs, one at a time ( Figure 3A). As compared to the circuit without these PSFs (i.e., the "base" coupled EMP-stemness network shown in Figure 1A), the circuits including PSFs had a higher frequency of parameter sets leading to tristability and quadrastability, at the expense of a lower frequency of parameter sets primarily leading to monostability and even bistability in the case of the GRHL2 circuit ( Figure 3B). In general, all these circuits showed a trend towards increasing multistability, with the strongest effect seen for the GRHL2 circuit ( Figure 3B). However, incorporating the PSFs altered neither the optimal number of clusters as identified by average silhouette widths ( Figure 3C) nor the position of clusters on the ZEB-LIN28 plane that showed an almost complete overlap with those seen for the "base" circuit ( Figure 3D). Therefore, the introduction of the PSFs does not seem to disrupt the core structure of the steady-state solutions, as obtained earlier.
In addition to being thought of as PSFs, GRHL2 and OVOL1/2 have been considered as Mesenchymal-Epithelial Transition (MET)-inducing transcription factors (MET-TFs), as their overexpression in carcinomas is capable of upregulating the expression levels of Ecadherin and/or revert EMT [50][51][52][53][54] as well as other EMT-associated traits, such as anoikis resistance, metabolic reprogramming [55], and immune evasion [56]. This observation is consistent with their knockdown known to drive a "full" EMT in both cancer cell lines and in developmental contexts [46,49]. Thus, with the goal to identify the effect of overexpression and downregulation of these PSFs on relative proportions of the different EMPs, we quantified the changes in the frequency of different monostable solutions-e, m, he, and hm-upon a 10-fold over-expression (oe10) and a 10-fold down-expression (de10) of these PSFs. Computationally, this provides a dynamically consistent way to characterize the effect attributable to any node on the network dynamics and RACIPE implements this perturbation by altering the rate of production of the specific node (see Section 4). We observed that increasing the expression of any one of the three PSFs enriched the epithelial (e) phenotype at the expense of the mesenchymal (m) phenotype. Similar, albeit weaker, trends were seen for an enrichment of the he phenotype at the expense of the hm phenotype ( Figure 3E). This effect was more pronounced in the GRHL2 case in comparison to either OVOL or NRF2 ( Figure 3E). This trend is also present overall not only for only monostable solutions, but also for all parameter sets considered together (see Supplementary Section S2.4).
J. Clin. Med. 2021, 10, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/jcm hm, m} phase upon overexpression of the PSFs ( Figure S5D-F), with unremarkable changes in the other phases (see Supplementary Section S2.3 for details on the multistable distributions). Put together, these results suggest that incorporating any of these PSFs (GRHL2, OVOL, NRF2) tends to maintain the four clusters seen in the EMPstemness coupled network earlier, and their over-expression can serve as a way to at least inhibit the progression of a complete EMT. Further analyses regarding the association of multistability and stemness can be obtained in Supplementary Section 4.

Networks Incorporating Phenotype Stability Factors (PSFs) Preserve the Association of the Hybrid E/M States with Stemness
Next, we evaluated how the association of EMP phenotypes with stemness is altered upon including PSFs. As earlier, we explored two kinds of association of the hybrid E/M phenotypes and a stem-like state: the probability of being a stem-like state given that a solution belongs to a particular E/M phenotype (p1) and the probability of belonging to a particular E/M phenotype given that the solution is a stem-like state (p2). On comparing p1 and p2 for all circuits with the two hybrid clusters (he, hm) taken to- Moreover, a closer inspection of steady-state solutions from bistable parameter sets showed a consistent trend across PSFs in the case of their overexpression: a most remarkable decrease in the proportion of the {hm, m} phase with the most remarkable increase in the proportion of the {he, e} phase ( Figure S5A-C). Similarly, the tristable states also showed an agreement with the trend of enriching "relatively epithelial" phases, with maximum enrichment of the {e, he, m} phase and a maximum depletion of the {e, hm, m} phase upon overexpression of the PSFs ( Figure S5D-F), with unremarkable changes in the other phases (see Supplementary Section S2.3 for details on the multistable distributions). Put together, these results suggest that incorporating any of these PSFs (GRHL2, OVOL, NRF2) tends to maintain the four clusters seen in the EMP-stemness coupled network earlier, and their over-expression can serve as a way to at least inhibit the progression of a complete EMT. Further analyses regarding the association of multistability and stemness can be obtained in Supplementary Section S4.

Networks Incorporating Phenotype Stability Factors (PSFs) Preserve the Association of the Hybrid E/M States with Stemness
Next, we evaluated how the association of EMP phenotypes with stemness is altered upon including PSFs. As earlier, we explored two kinds of association of the hybrid E/M phenotypes and a stem-like state: the probability of being a stem-like state given that a solution belongs to a particular E/M phenotype (p1) and the probability of belonging to a particular E/M phenotype given that the solution is a stem-like state (p2). On comparing p1 and p2 for all circuits with the two hybrid clusters (he, hm) taken together and the two non-hybrid clusters (e, m) taken together, we observe that the hybrids are much more likely to be stem-like ( Figure 4A) as well as constitute a larger proportion of the stemnessenriched pool of solutions ( Figure 4B), across all cases of PSFs. This observation suggests that the association of hybrid E/M phenotypes with stemness is a dynamical trait encoded in the core network topology, and this feature is maintained upon including PSFs. We additionally looked at the distribution profile of the PSFs in all the clusters ( Figure S7A-I) for all the circuits with down-and over-expression. Overall, the distribution profiles are distinct for all the circuits, perhaps indicating the role of specific topological features in dictating this distribution profile. GRHL2 remains appreciably bimodal throughout the range of expression, and the e and he clusters occupy the higher end of the GRHL2 histogram, while the m and hm clusters occupy the lower end ( Figure  S7A-C). OVOL is bimodal at 10-fold down-expression, but the profile shifts towards unimodality upon increasing expression, possibly due to its self-inhibition. Generally, a lower OVOL level is associated with the m and hm clusters and vice versa ( Figure S7D-F). On the contrary, the NRF2 distribution profile does not show any readily appreciable preference for any of the clusters at any level of expression ( Figure S7G-I).   Table S8 in Supplementary Section S2.5 for the statistical testing of these quantities.
A closer look at the changes in p1 and p2 values upon over-expression or downregulation of PSFs revealed small variable minor changes for p1 ( Figure S6A-C), but the p2 values for the hybrid epithelial (he) phenotype increased, whereas that of hybrid mesenchymal (hm) state decreased with little change in those of e or m phenotypes ( Figure 4C-E). However, at all levels of PSF expression in all circuits (from 10-fold down-expression to a 10-fold over-expression), the hybrids together (he, hm) have a more considerable value of p1 and p2 in comparison to the "fully" epithelial and mesenchymal states (e, m) ( mboxfigfig:jcm-1007161-f004C-E and Figure S6A-C), suggesting that hybrid E/M phenotypes are more likely to be stem-like than either extreme of the EMT axis. Further, over-expression of PSFs may pull the composition of the solutions in the "stemness window" towards the epithelial end of the EMT axis.
We additionally looked at the distribution profile of the PSFs in all the clusters (Figure S7A-I) for all the circuits with down-and over-expression. Overall, the distribution profiles are distinct for all the circuits, perhaps indicating the role of specific topological features in dictating this distribution profile. GRHL2 remains appreciably bimodal throughout the range of expression, and the e and he clusters occupy the higher end of the GRHL2 histogram, while the m and hm clusters occupy the lower end ( Figure S7A-C). OVOL is bimodal at 10-fold down-expression, but the profile shifts towards unimodality upon increasing expression, possibly due to its self-inhibition. Generally, a lower OVOL level is associated with the m and hm clusters and vice versa ( Figure S7D-F). On the contrary, the NRF2 distribution profile does not show any readily appreciable preference for any of the clusters at any level of expression ( Figure S7G-I).
The preference of GRHL2 and OVOL for the e and he clusters may be attributed to the presence of a mutually inhibitory negative feedback loop with ZEB, which causes the PSF (GRHL2, OVOL) levels to be relatively high when the ZEB levels are low (e and he) and vice versa. The GRHL2 and OVOL coupling with the EMP circuit has two differences: (a) GRHL2 activates miR-200 but OVOL does not, and (b) GRHL2 self-activates but OVOL self-inhibits. To deconvolute the effect of these two differences, we investigated a circuit where the activation link from GRHL2 to miR-200 is knocked down (GRHL2-KD). This circuit differs from the OVOL circuit only in terms of the nature of the self-regulation. We then investigated this knockdown circuit with regard to all the investigations explored for each of the other circuits above and compared them to the trends observed for the OVOL circuit. Qualitatively, all the trends matched well, and the only remarkable difference was in the cluster-wise distribution profile of the PSF at different levels of expression ( Figure 4F and Figure S8A-F), suggesting that the specific enrichments observed are primarily dictated by the interacting links between the PSF node and the other nodes, while the PSF distribution profile itself is strongly affected by the nature of the self-link.

Discussion
Metastasis has been indicated as a hallmark of cancer [57]. However, decades of research have highlighted that hallmarks of metastasis are quite different from those of primary tumor growth [58]. Given the ever-changing biochemical and biomechanical environments for the cell in the metastatic cascade, it is not surprising that metastasis is a highly inefficient process and that only a minuscule percentage of disseminated cells that can pass through these bottlenecks by adapting to their dynamic environments can eventually colonize distant organs and form macrometastases. Thus, the concept of Lamarckian evolution seems to be more prevalent than that of Darwinian selection (of a subset of genetic clones) in the context of metastasis. Consequently, environment-driven phenotypic plasticity has been advocated as a hallmark of metastasis [58]. Recent observations in the evolution of cancer therapy resistance endorse the role of Lamarckian induction in enabling the dynamic adaptability of cancer cells [59][60][61][62][63].
Here, we elucidate the design principles of regulatory networks connecting two crucial aspects of phenotypic plasticity during metastasis-epithelial-mesenchymal plasticity (EMP) and stemness. Our results demonstrate that a higher likelihood of hybrid epithelial/mesenchymal (E/M) phenotype(s) acquiring increased stemness is an outcome of the network topology of a coupled EMP-stemness network. These findings offer a mechanistic basis of various recent in vitro, in vivo, and clinical work suggesting an enhanced stemness and/or aggressiveness trait of hybrid E/M phenotype(s), as reported across different carcinomas. A recent quantitative analysis of about a hundred urothelial carcinomas concluded that "it is not the amount, but merely the presence of a minimum of tumor cells in hybrid E/M states that contribute to disease aggressiveness" [64]. This clinical observation is reminiscent of in vivo experiments showing that both "extremely epithelial" (generated via CRISPR/Cas9 by eliminating Zeb1 expression) and "extremely mesenchymal" cells (generated by constitutive over-expression of Zeb1), either alone or in combination, had minimal tumor initiation potential as compared to hybrid E/M cells. Put together, these results highlight the importance of individual cells co-expressing epithelial and mesenchymal markers (i.e., one or more hybrid E/M phenotype(s)) in enabling stemness. At least two factors can underlie this enhanced stemness of hybrid E/M cells: (a) their ability to self-renew, as some cells undergoing a full EMT can undergo cell cycle arrest [65,66], and (b) their ability to give rise to heterogeneous subpopulations of epithelial and mesenchymal cells [19,20] that can potentially cooperate in driving metastatic traits [67]. Thus, hybrid E/M cells can be conceptually thought of as similar to adult stem cells in a tissue, a notion supported by a hybrid E/M phenotype seen in a subset of mammary stem cells [68,69].
It should be noted that we are proposing an enrichment of hybrid E/M phenotype(s) in stem-like populations, but not an exclusive or exhaustive association between hybrid E/M cells and stemness. Not every cell undergoing EMP, even a partial EMT, is expected to be stem-like; nor is every cancer stem cell (CSC) expected to manifest the hybrid E/M phenotype(s). CSCs of varying EMP status have been seen in breast cancer [11], prostate cancer [70], cervical cancer [71], and squamous cell carcinoma [72,73], with different possible spatial localization within a tumor [74]. This possibility can be explained by our model predictions demonstrating the presence of more epithelial or more mesenchymal phenotypes within the dynamic "stemness window", albeit at a relatively lower frequency as compared to that of hybrid E/M phenotype(s). Increased stemness may lead to a higher tumor-initiating potential.
Our results show that stemness as a function of EMP is not necessarily always monotonically increasing. Instead, stemness is likely to first increase as cells begin to go through EMT and then decrease as they cross the range of hybrid phenotype(s) to become "fully" mesenchymal. Whether the terminal value of stemness is higher or lower than the initial "fully" epithelial remains to be rigorously quantified. Similar observations have been reported for tumor aggressiveness as a function of chromosomal instability (CIN) [75,76]. Such reports caution us against falling prey to tacit assumptions about monotonic relationships between any two molecular factors and/or phenomena and reveal the fundamental insights that can be gained by mechanistic modeling of nonlinear dynamics embedded in a complex regulatory network.
To investigate the robustness of our results and quantify the influence of various PSFs on the association between hybrid E/M phenotypes and stemness, we incorporated GRHL2, OVOL2, and NRF2 in our coupled network. We observed that these PSFs maintained the enrichment of hybrid E/M phenotypes in stem-like traits, endorsing the robustness of this association. Consistently, many of these PSFs have been reported to play a functional role in maintaining more epithelial phenotypes (e, he) as well as stemness in experiments in cancer cell lines and in cellular reprogramming contexts involving the modulation of EMP [53,77,78]. GRHL2 and OVOL2 have also been proposed as MET-inducing transcription factors (MET-TFs), but our simulations suggest that their ability to induce a "complete" MET need not be universal. This prediction is validated by recent experimental observations that the ability of GRHL2 to induce a MET depended on the epigenetic background of cells [54,79]. Thus, our results propose that. similar to variations in the ability of EMT-TFs to drive an EMT [80,81], MET-TFs may induce MET to varying degrees. Moreover, the epigenetic background of cells may alter the dynamics of EMT/MET and/or CSCs/non-CSCs and the interconnection between these two axes.
The approach presented here can be expanded to answer a long-standing open question in metastasis-how do cells coordinate various axes of their plasticity during the metastatic cascade? Mechanism-based models of individual axes of plasticity are the building blocks required to answer this question from a dynamical systems perspective. There has been increasing interest in dynamical modeling of EMP [82][83][84][85][86]; similar efforts to model related aspects of EMP, such as metabolic reprogramming [87], autophagy [88,89], therapy resistance [90,91], and immune evasion [92], are being attempted too. Coupling these modules to decode the interconnections among different axes of plasticity can help unravel the survival strategies of metastasis-initiating cells and eventually contribute to designing new clinical strategies. For instance, a recent prediction we made is that breaking the positive feedback loops in a network can restrict plasticity and potentially impact metastatic potential [6]. Indeed, proof-of-principle validation for this prediction was seen when cancer cells with a compromised ZEB/miR-200 feedback loop showed significant metastatic ability in vivo [93]. Future efforts are needed to dissect the functional consequences of breaking feedback loops among different axes of plasticity (such as the ones investigated here) in eventually delaying or preventing metastasis.

Random Circuit Perturbation (RACIPE)
RACIPE [43] is a tool allowing a parameter-agnostic computational simulation of regulatory networks based solely on the network topology. The network is composed of nodes representing various gene products and directed links representing inhibitory and activating links between them. For each node i with its expression level at a time t as p i , with incoming links from a set {A} of nodes, we thus have the ordinary differential equation for the dynamics of p i : where g is the basal production rate, k is the degradation rate, and H S is the shifted Hill function with parameters n, µ, λ, p representing the effect of regulation by node j on i, altering its production rate. λ > 1 represents an activating link and λ < 1 represents an inhibitory link, with λ = 1 representing no regulation. p is the threshold parameter that controls the level of p around which the non-linearity in H S is seen. n is the Hill coefficient representing the "extent of non-linearity" in the function by modulating the steepness of the response curve. See Supplementary Figure S9 for graphs of the shifted Hill function for varied parameters and Supplementary Section S3 for the explicit form of all the model equations. The final form of the shifted Hill function is given below.

Parameter Sampling
Given a network topology, RACIPE generates an ensemble of random models, each with a randomly sampled set of parameters for every link. The default range of random sampling has been estimated from BioNumbers [94], and here we choose the default option of uniform random sampling from the range. Instead of directly randomizing the basal production rate parameter g, RACIPE randomizes the maximal production rate G, following where {A p } is the set of all incoming activating links (production rate when all activating links are maximally active and all inhibitory links are inactive). For each parameter set, RACIPE randomizes the initial values between the smallest possible steady state (all activating regulations are inactive and all inhibitory regulations are maximally active) and the largest possible state (the opposite of the previous configuration) in log scale. RACIPE allows perturbing the circuit by changing the expression levels of different nodes relative to the other unperturbed nodes. Over-expression and downregulation by xfold changes the sampling range for G by x times, hence, a 10-fold over-expression of a node will sample the G of the node from a range 10 times the range of the unperturbed (normal) node, while a 10-fold down-expression shrinks the range to 1/10th of its normal range. Details of the default ranges and the ranges used in our simulation are in Supplementary Section S1.1.2.

Simulation
We generate an ensemble of 10,000 models with each run (analogous to a biological replicate, and referred to elsewhere as such), and repeat this for 5 replicates to get the standard deviation for proportion (or frequency) calculations. RACIPE, for each of the models, simulates the network starting from multiple random initial positions until the network reaches a steady state, and prints as output the levels of each node at all identified steady-state solutions. The details of the run parameters used are in Supplementary Sections S1.1.1 and S1.1.2. The output of RACIPE for levels of all nodes are in log2 scale, and we perform a gene-wise normalization of the values before further analysis. For p i , representing the actual steady-state levels of node i, the final values z i are obtained after normalizing the log-scale node levels: Here, x and σ(x) represent the mean value and the standard deviation of the respective quantities. p re f is taken as the corresponding node in the circuit with no over-expressed or down-expressed nodes (the "reference circuit"), allowing a uniform normalization for circuits across perturbations (see Supplementary Section S1.2.3).

Link Strength Metrics
We define a link strength metric similar to the one used earlier [95] to quantify the strength of each link independent of its activation at steady state. The link strength (l ij ) is directly proportional to an activating λ p or inversely proportional to an inhibitory λ m and it is inversely proportional to the threshold µ. To normalize the threshold, we choose the normalization value G/k, the maximum possible node level at steady state.
For any two links l ij and l gh (usually a part of a mutually inhibitory feedback motif between two nodes), the asymmetry is defined as log 2 l ij /l gh , representing the overall asymmetry in the effective strengths of the two links forming a mutually inhibitory feedback loop. When quantifying the coupling strength, the combined strength is defined as log 2 l ij + log 2 l gh .

Calculation of p1 and p2 Values
For the calculation of the conditional probability of having a feature x conditional on having a feature y, we take the proportion of all steady-state solutions having the feature y that have the feature x too. We use two such conditional probabilities in our results. p1 is the probability of lying in the stemness window conditional on the solution belonging to a particular cluster, and is calculated as the proportion of all solutions of the specific cluster lying in the stemness window. p2 is the probability of belonging to a particular cluster conditional on the solution lying in the stemness window, and is calculated as the proportion of all solutions in the stemness window belonging to a particular cluster.

Clustering
For clustering, a K-means algorithm was used on the appropriately normalized data for only the four nodes (miR-200, ZEB, LIN28, and let7), with all solutions taken together with the same weight, without distinguishing between solutions belonging to systems with different numbers of stable states. K-means finds an optimal splitting of the data into K clusters (where K is a hyperparameter that the algorithm takes as the input) by iteratively minimizing inertia, or the total within-cluster sum of squares distance from the cluster centers. K-means results have a stochastic component since they depend upon the initialization of the cluster centers which are iteratively improved by the algorithm. To offset this, at each run, we randomize the initialization multiple times and select the best outcome in terms of minimized K-means inertia (see Supplementary Section S1.2.1 for further details of the algorithm parameters). In order to select a cluster number, we run the K-means algorithm with different values for K and at each value, compute multiple cluster quality metrics, all of which attempt to quantify the optimality of the clusters obtained at each value for K, and we then plot these metrics across different values of K. A sharp change in the slope as well as an overall higher value for the average silhouette widths indicate an appropriate partitioning [96]. For our data, the peak and the sudden change in slope [97] were almost unambiguously in the favor of K = 4 in all the metrics that we used (average silhouette widths, Calinski-Harabasz index [98], Davies-Bouldin index [99], K-means inertia) (see Supplementary Section S1.2.2 and Figure S2). Other relatively subjective supportive evidence can be gained from the visual evaluation of the PCA plot and observing the dendrogram computed using agglomerative hierarchical clustering using Ward's minimum variance criterion [100]. The clustering algorithms and the cluster quality metrics were from scikit-learn version 0.32 (see Supplementary Section S1.1.3 for more details on the software versions and the platform). Each replicate was separately clustered and normalized (see Supplementary Section S1.2.3 for details on normalization).
This phenotype assignment strategy was maintained to be consistent all throughout the study for all circuits and replicates-a separate run of the K-means algorithm was performed for each replicate and the resultant clusters were assigned phenotypes based on their median ZEB levels (also see Supplementary Section S2.1).

Significance Testing
To compute the statistical significance for all comparisons, we used two strategies. For comparisons within the same RACIPE replicate (where n is large enough), we use the two-sided non-parametric Mann-Whitney U test [101] (also known as the Wilcoxon ranksum test). For cross-replicate comparisons (i.e., proportions), we use Welch's t-test [102], allowing unequal variances though with the assumption of normality of the compared groups. The Mann-Whitney U test is not suitable for small samples as it discards all information about the size of the differences (and as a result, caps the highest significance that can be achieved based on n) and also the assumption of normality of the proportions across replicates is a relatively reasonable assumption since it is stochastic differences which primarily cause the inter-replicate values to differ. Furthermore, we employ the Holm-Bonferroni correction to adjust the p-values accounting for multiple comparisons to control a family-wise error rate (FWER) in the overall rejection of the null hypotheses (see Supplementary Section S2.5). Due to the large number of steady states per RACIPE replicate, for some comparisons within the same replicate, very small differences of means (or medians) give a statistically significant p-value but which is not necessarily biologically relevant. Thus, to ascribe possible biological meaning to such comparisons, for cases of an irrelevantly small "magnitude" of this difference (i.e., the effect size), we provide the ratio of the means/medians and the absolute difference of these quantities.

Supplementary Materials:
The following are available online at https://www.mdpi.com/2077-038 3/10/1/60/s1, Figure S1: Optimal cluster number by varied cluster quality, Figure S2: Four E/M phenotypes enabled by the base circuit, Figure S3: Stemness window and the bistable/tristable phase distributions of the coupled EMP-stemness circuit, Figure S4: Link strength analysis, Figure S5: Distribution of bistable and tristable systems upon incorporating PSFs with the coupled EMPstemness circuit, Figure S6: Stemness probability (p1) changes with PSF expression, Figure S7: PSF expression profiles at different expression levels, Figure S8: GRHL2-KD circuit results for the investigated quantities, Figure S9: Shifted Hill function across different parameters, Table S1: Parameters ranges and the simulation options, Table S2: Median (iqr) for ZEB and LIN28 in different replicates of the base circuit rounded off to 3 significant figures, Table S3: Average Silhouette widths for the Uncoupled E/M and Stemness circuits, Table S4: Proportion (averaged across all replicates) of filtered x-stable parameter sets based on > 1 solutions of the set being assigned the same phenotype (standard deviations in the parentheses), Table S5: Proportion (averaged across all replicates) of all solutions irrespective of the number of stable states of the corresponding parameter set belonging to different phenotypes (with standard deviations in parenthesis), Table S6: Statistical Testing results for differences of various calculated proportions in the base circuit considering all the replicates (i.e., the proportion calculated for the 5 replicates forms one comparison group), Table S7: Statistical Testing results for differences of the node distributions across the clusters for a single replicate of the base circuit, Table S8: Statistical Testing results for the PSF circuits with  the proportions calculated for all the replicates of the de10 circuit forming one comparison group  while the proportions of all the replicates of the oe10 circuit forming the other comparison group,  Table S9: p1 and p2 for solutions of multi-stable systems (after removing monostable systems), Table S10: Different configurations of the association of stemness with the different phenotypes in bistable phases for base circuit, Table S11: Different configurations of the association of stemness with the different phenotypes in tristable phases for base circuit, Table S12: Different configurations of the association of stemness with the different phenotypes in bistable phases for GRHL2 circuit (reference un-perturbed), Table S13: Different configurations of the association of stemness with the different phenotypes in tristable phases for GRHL2 circuit(reference un-perturbed), Table S14: Different configurations of the association of stemness with the different phenotypes in bistable phases for OVOL circuit(reference un-perturbed), Table S15: Different configurations of the association of stemness with the different phenotypes in tristable phases for OVOL circuit(reference un-perturbed), Table S16: Different configurations of the association of stemness with the different phenotypes in bistable phases for NRF2 circuit(reference un-perturbed), Table S17: Different configurations of the association of stemness with the different phenotypes in tristable phases for NRF2 circuit(reference un-perturbed).
Author Contributions: M.K.J. conceived and supervised research; S.P. performed research; S.P. and S.S. analyzed data. All authors contributed to writing and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.