Modeling and Simulation of a miRNA Regulatory Network of the PTEN Gene

The PTEN onco-suppressor gene is likely to play an important role in the onset of brain cancer, namely glioblastoma multiforme. Consequently, the PTEN regulatory network, involving microRNAs and competitive endogenous RNAs, becomes a crucial tool for understanding the mechanism related to low levels of expression in cancer patients. This paper introduces a novel model for the regulation of PTEN whose solution is approximated by a high-dimensional system of ordinary differential equations under the assumption that the Law of Mass Action applies. Extensive numerical simulations are presented that mirror parts of the biological subtext that lies behind various alterations. Given the complexity of processes involved in the acquisition of empirical data, initial conditions and reaction rates were inferred from the literature. Despite this, the proposed model is shown to be capable of capturing biologically reasonable behaviors of inter-species interactions, thus representing a positive result, which encourages pursuing the possibility of experimenting on data hopefully provided by omics disciplines.


Introduction
In humans, PTEN (Phosphatase and TENsin homolog) is a gene located on chromosome 10 and involved in several important processes, such as maintaining genomic stability or cell proliferation, survival and migration [1]. Over the past few decades, however, something else has been the reason this gene has been put in the spotlight, namely its role as a tumor suppressor [2].
Since the earliest evidence dating back to 1998, genetic alterations of PTEN have been reported in several human cancers; for example, in glioblastoma multiforme (GBM), one of the most common and aggressive brain tumors, PTEN was found to be mutated or eliminated with an incidence of 70% of cases [2,3].
Several studies have focused on determining its full involvement and importance in the cellular context, obtaining some interesting results: PTEN, in fact, is not only a commonly deleted or mutated onco-suppressor in cancer, but it also appears to be haploinsufficient when it comes to its function [2]. That is to say, having only one functional copy of PTEN is not enough to provide the levels of protein necessary for the proper functioning of the cell under physiological conditions. It has also been speculated that even a slight decrease in PTEN levels may eventually lead to greater vulnerability to cancer or promote tumor progression [1]. This is why scientists are interested in understanding how this important gene is regulated, and which other actors are involved in determining its quantity within the cell and its down-regulation under pathological conditions [3].
The control of gene expression is a process implementable in all those steps that, starting from the DNA (the gene) and passing through a RNA intermediary (the messenger RNA or mRNA), lead to the final functional product (the protein). In the case of PTEN, for example, some of these mechanisms act before the initial transcription to mRNA, some affect the transcription itself, others can act at the post-transcriptional level, as in the case of RNA interference [1].
In the process just mentioned, short endogenous RNAs (microRNAs or miRNAs) are able to control the expression of a gene by binding to its mRNA. After their initial transcription and processing, in fact, the miRNAs are loaded into a protein complex called RNA-induced silencing complex (RISC). Once inside, they can guide the complex towards their mRNA targets, thereby allowing the RISC to degrade them. Intuitively, this course of action leads to a substantial decrease in target levels over time, hence to the downregulation of the gene under consideration [3].
PTEN has proven to be regulated by a rather extensive network of miRNAs, further complicated by the presence of competitive endogenous RNA (ceRNA), molecular species with a sequence similar to the mRNAs of PTEN, that can subsequently act as a bait for miRNAs [1]. The general picture is that of a complex network, in which miRNAs and ceRNAs interact, and whose alteration can affect the mRNA levels of PTEN available to be translated into the final protein. Several articles have been published on this subject, all indicating the important involvement of this network in various types of cancer, such as melanoma or even glioblastoma [4,5].
Given the crucial importance of the miRNA regulatory network behind PTEN levels and the recent advances in biology and omics disciplines, in this work, we make use of such knowledge to update the stochastic mathematical model presented in [3], which simulates the regulatory network for PTEN, to improve its biological accuracy. To this aim, we critically reviewed and rewrote the model in [3], checking each individual reaction channel and implementing new data on ceRNAs and miRNAs, mainly using databases (such as Reactome [6], Genecard [7,8] or the Gene Expression Atlas [9]) and the literature on the subject [10].

The Model
The model detailed in this section and in Section 3 seeks to reflect the complex network of interactions revolving around the post-transcriptional regulation of PTEN due to miR-NAs and ceRNAs. The main rationale behind the proposed model is that the mRNA levels of PTEN can be reduced upon interaction with miRNAs and can be increased indirectly if ceRNAs are present to act as bait for miRNAs, an activity that has been described as sponge effect in [5].
In Figure 1, the double helix symbolizes the DNA sequences attributable to PTEN, miRNAs and ceRNAs, respectively indicated by the vertical bars colored in blue (right helix section), red (middle helix section), green (left section propeller). After their initial transcription in RNA (corresponding to reactions c1, c2, c3), the species can undergo spontaneous degradation within the cell (reactions c4, c5, c6), or they can eventually be translated into a protein, as in the case of the mRNAs of PTEN. For their part, miRNAs can instead be loaded into RISC (reaction c7), thus becoming capable of targeting and degrading both mRNAs and ceRNAs (reactions c9, c10, respectively). This loading also affects the time required for miRNA degradation, which is now higher than for the unloaded miRNAs (reaction c8). The importance of our new model resides in a few considerations, which we now underline and explain. The first observation concerns the introduction of the loading into RISC as its own reaction, since this process has been repeatedly reported in the literature and referred to as a biological bottleneck: most miRNAs are degraded before such loading can occur, so they lack the ability to bind their targets [11,12].
On the other hand, RISC-loaded miRNAs can persist inside the cell for much longer times [13]; for this reason, a different degradation reaction for loaded miRNAs is provided in our model. To describe the various processes illustrated in Figure 1, our model includes the general set (1a) of chemical kinetic reactions, which represents a blueprint for the further reactions, also present in our model.
The notation adopted in set (1a) is as follows: • DNA p is the DNA sequence for PTEN; • RNA p is the mRNA of PTEN; • ceRNA i is the i-th ceRNA; • DNA ceRNA i is the DNA sequence for ceRNA i ; • miRNA j is the j-th miRNA; • miDNA j is the DNA sequence for miRNA j ; • lmiRNA j represents the loaded analogue of miRNA j .
Each reaction in (1a) is characterized by a specific coefficient c j , linked to the probability that such a reaction will indeed occur, as well as to the instant of time at which it may occur; notice that the same type of reaction may have a different c j value (reaction rate) if the interacting species are different. Figure 1 and the related set (1a) display a synthetic version of our model. The complete series of kinetic reactions are in sets (10a)-(10c). Note that the latter represents a simplified deterministic model, in which the instrinsic stochasticity of biochemical reactions is neglected: this allows it to cope with their large number, in addition to the numerous species involved, leading to a high-dimensional differential system, as described below.
A set of n r monomolecular or bimolecular reactions, the latter having distinct reactants, can be translated into a system of n s ordinary differential equations (ODEs), where n s is the total number of X species that enter the set of chemical reactions and that, within the ODE, assume the role of differential variables X 1 , . . . , X k , . . . X n s .
In other words, X k = X k (t) is a differential variable representing the k-th X-species considered and its biological state at time t ; for instance, X 1 is DNA p , X 2 is RNA p , and so on.
Correspondences between differential variables X k and the X-species appearing in the small model (1a) are shown in Table 1, through which (1a) itself can be rewritten as (1b). Table 1. X-species involved in the chemical reactions (1a).
In a similar way, Table 2 summarizes the correspondences between differential variables X k and the X-species considered in the larger simulation presented in Section 3.
The translation from the set of n r reactions to the system of n s ordinary differential equations is illustrated in the following, employing model (1b) and its corresponding differential system (7), where the reactions are n r = 10 and the differential equations are n s = 7 .
Formula (2) provides the general form of the ODE k-th representative, for k = 1 , . . . , n s .
The differential variables X p and X q , with p , q ∈ {1 , . . . , n s } , are the reactants appearing in the j-th reaction, whose reaction rate is c j . The role of the scalar quantities ν k j is introduced shortly.
The scalar quantities ν k j in (2) are in fact related to the stoichiometry of the chemical reactions. They are collected column-wise to form the so-called stoichiometric vectors ν j , with j = 1 , . . . , n r . For each j-th reaction, the correspondent ν j is defined as in (3).
Notice that if X k is both a reactant and a product in the j-th reaction, then it is obviously ν k j = −1 + 1 = 0 .
A stoichiometric matrix S = ν k j of dimensions n s × n r can thus be defined; for instance, in the case of set (1b), S takes the form outlined in Figure 2. The notation in (2) can be simplified by introducing the so-called propensity functions, whose j-th entry takes the general form (4), where p , q ∈ {1 , . . . , n s } and j = 1 , . . . , n r .
For each j-th reaction, a j (t) expresses the link between the probability c j and the reactants involved; for instance, the propensity functions corresponding to (1b) are shown in (5), where dependence on time t has been temporarily discarded, to ease the notation. a 1 = c 1 X 1 , a 6 = c 6 X 6 , a 2 = c 2 X 3 , a 7 = c 7 X 6 , a 3 = c 3 X 5 , a 8 = c 8 X 7 , a 4 = c 4 X 2 , a 9 = c 9 X 7 X 2 , a 5 = c 5 X 4 , a 10 = c 10 X 7 X 4 .
The system dimensions obviously depend on the number n s of X-species involved (i.e., number of differential equations) and the number n r of reactions considered (i.e., number of parameters). The ODE system for model (1b) is as in (7), with seven equations and ten parameters, and where again dependence on time t has been temporarily discarded.
By defining initial conditions at time t = T 0 : the ODE (6) becomes an Initial Value Problem (IVP), whose integration over the time interval [ T 0 , T ] will yield the solution vector (or state vector):

Simulation
In this section, a complete set of chemical reactions is presented, grouped into three subsets (10a)-(10c) for readability. Table 2 reports all the X-species considered in such subsets, as well as their correspondences with the differential variables X k , which are as follows: the first sixteen reactions in (10a) involve X 1 to X 11 ; the next twenty (from 17 to 36) reactions in (10b) involve X 12 to X 20 , as well as some among X 1 to X 11 ; finally, the last thirty-one (from 37 to 67) reactions in (10c) involve X 21 to X 37 , and again some among X 1 to X 11 .   Reactions (10a)-(10c) can then be rewritten in differential form: although their explicit translation in terms of X k is not provided here, the procedure to obtain the correspondent ODE system is the same as that described in Section 2 for the small model (1a).
Before leaving this section, we mention again that unsimplified modeling would result in a discrete and stochastic system, due to intrinsic noise affecting, for each reaction in the system, the probability to occur, as well as the time instant at which it may occur. In other words, the X-species (n s in total) interact through n r reactions, forming a system whose evolution is characterized by a non-linear Markov process, discrete to mirror the fact that molecules are discrete objects, and whose solution, i.e., the state vector X(t) in Formula (9), is a discrete jump Markov process. In the case of low dimensions, the proper solver should belong to the class of stochastic simulation algorithms [14][15][16]. However, such an approach is unfeasible here, due to the high dimensions of the differential system considered that would result in too slow a simulation. To accelerate the computation, we then considered neglecting the intrinsic stochasticity of the biochemical reactions and obtaining X(t) as the solution of a deterministic ODE, interpretable as the limiting approximation to the stochastic and discrete solution provided by a stochastic simulation. This is indeed equivalent to regarding the time evolution as a continuous process, modeled by the so-called reaction-rate system of ODEs [16] that arises when the Law of Mass Action applies. The latter prescribes that, in a chemical reaction at the equilibrium, the ratio between the concentration of products and reactants is constant; in our case, in which we are considering molecular species in a solution inside the cell, hence subject to a dynamical equilibrium and the Law of Mass Action, such a constant is related to the correspondent c j for each j-th reaction [3].

Numerical Simulation
The ODE system (12), together with the initial conditions X k (T 0 ) = X k,0 reported in Table 3, form an IVP made of thirty-seven equations and sixty-seven parameters. In this section, we present the results of its numerical integration over a time interval [ T 0 , T ] , where T 0 = 0 and T = 10 5 and where the time unit is expressed in minutes. Table 3. Initial conditions X k (T 0 ) = X k,0 , where T 0 = 0 , for the IVP correspondent to reactions (10a)-(10c). Values of c j , which enter the definition of the propensity functions a j in (12), are inferred according to what is known in the literature and the already cited Genecard database [3,12,[17][18][19][20][21][22]. In a few situations where such bibliographic data were not available, missing values have been filled in by making assumptions about the underlying biology. Table 4 reports on the c j values employed in the numerical simulations. All experiments were carried out on a DELL XPS notebook computer, with 7th generation Intel Core i7 processor at 2.8 GHz, and 16 GB RAM at 2400 MHz, running under Ubuntu 20.04.2 LTS operating system. The problem solving environment of Mathematica, version 12.2, is employed to set up the IVP, formed by (12) with the initial conditions in Table 3 and the coefficients c j in Table 4, integrate it and plot its solutions.
The IVP considered shows some stiffness, which is trapped by the LSODA method [23] used by the numerical differential solver NDSolve [24] available in Mathematica. It is a multi-pass method able to switch from an explicit Adams-Bashforth method [25,26] to an implicit Backward Differentiation Formula (BDF) method [27,28]. The Mathematica implementation of these methods also includes order variation based on local error estimates [29,30].
The simulation provides for the subdivision of the overall integration interval [T 0 , T] into two subintervals, [T 0 , T 1 ] and then [T 1 , T] , to highlight an instant of time T 1 in which some event may occur, such as the expression of a miRNA, or the duplication or increased expression of a gene.
The IVP is first integrated over the time interval [ T 0 , T 1 ] , where T 0 = 0 and T 1 = 1.5 × 10 4 . An intermediate solution is thus obtained, which we call Physiological solution: We observe that, for k = 2 , 8 , 9 , 10 , 32 , components Xi k (t) can only be expressed via numerical integration; note that they correspond to RNA p and the four ceRNAs involved in the model. All the other components of Xi(t) are either constant functions or can be expressed in closed form, in terms of the exponential function; among the latter, it is Xi 15 (t) = Xi 27 (t) = Xi 29 (t) and Xi 33 (t) = Xi 35 (t) . The IVP solution to the Physiological case is plotted in Figure 3. At this point, a modified IVP is built, formed by (12) with the coefficients c j in Table 4, and with initial conditions provided by the values of the Physiological solution components Xi(t) evaluated at time T 1 . One exception to the newly computed initial conditions is Xi 13 (t) , whose originally null value is modified so that Xi 13 (T 1 ) = 2 ; this corresponds to considering an IVP which we denominate Ectopic Expression problem and whose solution is illustrated in Figure 4. A different experiment is also considered, formed by (12) with the coefficients c j of Table 4, and with an exception inserted in the new initial conditions (namely, the values of the Physiological solution evaluated at T 1 ): the value of Xi 3 (t) , originally equal to 2 , is modified to become Xi 3 (T 1 ) = 5 ; this corresponds to forming another IVP, denominated Duplication problem, whose solution is given in Figure 5. A further experiment is finally carried out; this time, an exception is inserted in the coefficients c j of Table 4, by setting c 3 = 3.4 × 10 −2 . The new IVP (12), whose new initial conditions are the values of the Physiological solution evaluated at T 1 , is named Increased Expression problem; its solution is given in Figure 6. By integrating each of the three modified IVPs over the time interval [ T 1 , T ] , where T 1 = 1.5 × 10 4 and T = 10 5 , the corresponding final solution X(t) is obtained, whose components can be outlined as follows, in all the cases considered: X k (t) has only numerical form k = 2 , 8 , 9 , 10 , 32 , X k (t) has closed form k = 4,11,15,16,17,18,19,20,28,30,33,34,36 , with X 15 (t) = X 27 (t) = X 29 (t) , X 33 (t) = X 35 (t) , X k (t) = 2 k = 1 , 5 , 6 , 7 , 12 , 13 , 14 , 21 , 23 , 24 , 26 , X k (t) = 0 k = 25 , 31 , 37 , Observations that are similar to those related to the intermediate Physiological solution Xi(t) can also be made on the solution X(t) obtained at the end of the second integration step. As before, X 2 (t) , X 8 (t) , X 9 (t) , X 10 (t) , X 32 (t) cannot be expressed in closed form and require numerical integration, while the remaining components of X(t) are constant or possess an explicit exponential expression; in particular, equalities are verified by the following closed-form components of the solution: X 15 (t) = X 27 (t) = X 29 (t) and X 33 (t) = X 35 (t) . The color coding employed is as follows. The green curves represent the levels of the various ceRNAs, while the black curve represents RNA p . Cyan curves are the considered miRNAs (dashed style) and their lmiRNA counterparts (solid style). Those miRNAs that are subject to change are rendered in red or purple (dashed curves), while their loaded versions are given as solid-style curves in the same red or purple colors. Figure 3 refers to a Physiological case. Here, the initial conditions reflect the various species that are normally present in the human brain under physiological conditions. After an initial steep increase in the levels of RNAp and ceRNAs, a slight decrease can be observed, due to the action of the lmiRNAs on their targets, after both expression of their miRNA and loading into RISC have occurred. From a biological perspective, our model mirrors what should happen inside a brain cell under physiological conditions: since PTEN is an onco-suppressor gene, it is important that its levels reach an equilibrium and do not drop dramatically, even when miRNAs are present that are capable to down-regulate it. Figure 4 is related to an Ectopic Expression case. The sudden expression of a miRNA and its loaded counterpart are simulated here, assumed to occur at the instant of time T 1 and respectively rendered as dashed-purple and solid-purple curves. The ectopic expression of a gene (i.e., the presence of its product in a biological context in which it should not be present) has been observed in many pathological conditions, including tumors. As Figure 4 describes, our model predicts an (obvious) increase in the mRNA levels of the newly appeared miRNA, together with a small overall decrease in the levels of RNA p and ceRNAs. This may be explained by the levels of the newly inserted lmiRNA, which turn out to be comparable to the other lmiRNAs: in other words, even if the new miRNA gets expressed, only a small fraction of it can be loaded into RISC and can, thus, become capable of affecting RNA p levels. Figure 5 illustrates a Duplication case, related to the simulation of the effect of the duplication of a gene: this translates into the presence of an additional copy of the gene being expressed inside the cell, together with an overall increase in the levels of its products. As shown in Figure 5, at the time T 1 of duplication, a noticeable leap in the miRNA product can be observed (dashed red curve), as well as a small upsurge in the lmiRNA levels (solid red curve). The overall effect on RNAp (and the targeted ceRNAs) levels remains modest, as in the previous case of ectopic expression of a gene. Figure 6 refers to an Increased Expression case, that is the simulation of the increased expression of an already present gene, a condition usually referred to as over-expression in biology and that can lead to pathological conditions, similar to those potentially caused by duplications or ectopic expressions. Starting from T 1 , a sharp increase can be observed in the miRNA levels, as well as in the related lmiRNA species (dashed-red and solid-red curves in Figure 6). Unlike in cases of ectopic expression and duplication, here the miRNA curve soon overtakes the RNAp curve. However, the overall levels of RNAp are not subject to a more significant decrease than in the previous two simulation cases. This again seems to imply the robustness of the ceRNAs network in acting as off-targets and the loading into RISC as a critical step in regulating the levels of the targeted species.

Conclusions
Focus of this work is the modeling of a gene regulatory network, in particular that of the miRNAs of PTEN, and its impact on target species, the latter being mRNAs or ceRNAs. To this aim, a model is designed, related to sixty-seven chemical reactions in which thirty-seven species appear, that describes the regulation of the PTEN gene and involves, in particular, four ceRNAs and nine miRNAs. To deal with the large dimensions involved, the simulations presented relate to a simplified deterministic model, in which the intrinsic stochasticity of biochemical reactions is neglected, and show the behavior of the species considered in some biologically relevant situations, as explained in Section 5.
From a biological perspective, our model, although simplified, is able to mirror parts of the biological subtext that lies behind various alterations and can lead to many pathologies, such as GBM, in our specific case. In addition, our results suggest a role for RISC, not just as a restrictive step, but as a kind of safety mechanism. Loading into RISC, being a biological bottleneck, ensures that alterations (or even fluctuations) in miRNA levels will not be capable to directly (and dramatically) affect the entire network, thereby increasing its robustness.
In the model presented, initial conditions and reaction rates are not experimentally determined, but rather inferred from the literature available on the topic; despite this, the model is capable to behave properly, that is, in a biologically reasonable fashion. The interest remains related to the possibility of testing our model on empirical data, currently not available, due to the complexity of processes involved in obtaining them, making this task an auspicable and advisable future work.