Kinetic Theory Modeling and Efficient Numerical Simulation of Gene Regulatory Networks Based on Qualitative Descriptions

In this work, we begin by considering the qualitative modeling of biological regulatory systems using process hitting, from which we define its probabilistic counterpart by considering the chemical master equation within a kinetic theory framework. The last equation is efficiently solved by considering a separated representation within the proper generalized decomposition framework that allows circumventing the so-called curse of dimensionality. Finally, model parameters can be added as extra-coordinates in order to obtain a parametric solution of the model.


Introduction
Using mathematical modeling to address large scale problems in the world of biological regulatory networks has become increasingly necessary given the sheer quantity of data made available by improved technology.In the most general sense, modeling approaches can be thought of as being either quantitative or qualitative.Quantitative methods such as ordinary differential equations or the chemical master equation are widespread in the literature; when the model is well developed, the detail therein can be incredibly informative.However, these methods are not well suited for all applications.Quantitative models require an in depth knowledge of the reaction kinetics and generally fail as the problem size grows.The alternative approach, qualitative models, does not possess the same amount of detail but captures the essential dynamics of the system.In addition, qualitative models have a variety of analysis tools which can be applied regardless of the problem size.Gene regulation, as a sub-genre of biological regulatory networks, is characterized by large numbers of interconnected species whose influences depend on passing some threshold, thus, largely sigmoidal behaviors.The application of qualitative methods to these systems can be highly advantageous to the modeler.
In this work, we begin by considering the qualitative framework of Process Hitting, revisited briefly in Section 2.1.A highly flexible model, Process Hitting captures the most important dynamics of the system with a relatively simple syntax.The structure of this syntax lends it to powerful static analysis tools which can be used to answer some of the most important questions about the model such as steady states or reachability without constructing the state space.Realistic models in gene regulation are immense and highly interconnected: even when considering a boolean space, the very enumeration of the possible states of the resulting system creates a combinatorial explosion.This is a frequent obstacle in the field of computer science and has been dubbed the curse of dimensionality.However, there are some questions for which one must access the underlying probability distribution associated with the Markov transitions of the qualitative model.In addition, gaining access to the probability distribution allows for a qualitative and intuitive analysis of the system as a whole.The most pervasive methods have historically been simulation-based, although there are some instances in which this becomes computationally infeasible.Here, we propose a method to solve the system by treating the Markov equations of a Process Hitting model with numerical techniques.A reduced-basis method, the Proper Generalized Decomposition (PGD) can be used to overcome the curse of dimensionality and provide fast, computationally inexpensive solutions to an otherwise intractable problem, as discussed in Section 2.2.In addition, PGD has certain qualities particularly favorable for applications to gene regulatory networks.Unknown parameters can easily be incorporated into the model at the cost of another dimension, as demonstrated in Section 3.2.

The Qualitative Model: Process Hitting
Process Hitting is a powerful yet simple tool for the analysis of large regulatory networks.Historically related to the discrete models of Stuart Kauffman [1] and René Thomas [2], Process Hitting attempts to address problems of scalability in classical modeling methods while maintaining the highest degree of expressiveness possible.Formally a subclass of asynchronous automata, it relies on large degrees of abstraction to describe the system as a whole.All interacting species -whether they be enzymes, genes or proteins -are abstracted as sorts.These sorts are then subdivided into processes, which could represent concentration levels, spatial configuration, or any other form which has a distinct qualitative impact on the system.Processes interact with one another via actions, in which processes hit one another to create a bounce to some new level of the same sort at a given rate.For gene regulatory networks, processes are often abstractions of relevant concentration ranges, discretized domains of real numbers, and actions represent activation and inhibition reactions.Figure 1 illustrates how to define sorts, processes and actions from a biological understanding of an interaction.Process Hitting relies on the initial construction of the most permissive dynamics, otherwise called generalized dynamics, in which no restrictions are placed on the potential behaviors.An example of this can be seen in Figure 1.

Figure 1.
Creating a Process Hitting action.In gene regulation, we consider two kinds of interactions between species: activation and inhibition.If a is an activator of b , it is common to represent this by a signed, directed graph (left).These interactions have a characteristic form: unlike kinetic reactions, activation and inhibition usually depend on the regulator passing some threshold concentration in order to become effective (middle).Process Hitting (right) represents these reactions via actions: a activates b becomes a 1 hits (solid arrow) b 0 to bounce (bashed arc) to b 1 .Generalized dynamics attempts to create the most permissive dynamics possible for the directed graph.Therefore, the absence of a effectively acts as an inhibitor, adding the action a 0 hits b 1 to bounce to b 0 .
Every action can be associated with temporal and stochastic parameters, the reaction rate for example [3].
The general dynamics may then be successively enriched by the addition of cooperative sorts in order to best capture some known biological behaviors or eliminate undesirable behaviors.Cooperative sorts represent not species but, rather, the combined effects when multiple regulators interact cooperatively on a single target.These sorts are the combined space of the original species, thus must be updated such that the current state of the cooperative sort is compatible with the current state of each of its components.A visual explanation of the construction of a cooperative sort and its refinement of a Process Hitting model can be found in Figure 2.
Although this is a very simplistic representation of the inner kinetics of a biological process, Process Hitting semantics allow us to easily model interactions with only partial knowledge of the logical functions encoded therein and pave the way for powerful static analysis techniques in order to study fixed points, reachability and cut sets which determine minimum criteria for reachability, in spite of the present combinatorial explosion [4,5].Examples of Process Hitting at work can be found in Section 3, where we use static analysis to compare the fitness of the generalized dynamics model with that of the refined model.Furthermore, these tools are freely available online in a software called PINT.We will not attempt to expound completely upon the details of Process Hitting here but, rather, point those interested towards [5] for a formal and thorough introduction to the modeling framework.As we progress to a biological application in Section 3, greater clarity will be given to the concepts described above, including the relevance of cooperative sorts and the power of static analysis.But what should happen in the case that both a and b are present?According to the left hand model, the system will oscillate.If we know more about how the system should function, however, we would like to be able to include this information into our model.With general dynamics, we are unable to express logical gates in which multiple species exhibit deterministic combined effects on a target, such as a ∧ −b , or the presence of the activator without the presence of the inhibitor.In order to add this combined interaction and eliminate the oscillatory behavior, we must refine the Process Hitting model with a cooperative sort, ab .This sort will handle the interactions of a and b on c while leaving the original species to interact with other elements as before.In exchange, more actions must be added such that a and b can effectively update ab so that it truly reflects the current state of both elements.In our example, ab 1,1 will not interact with c 0 , thus c remains inactive.

Treating Qualitative Systems with Numerical Techniques
In order to address Process Hitting's global results, that is, the full and complete description of the systems behavior given an initial condition, we must consider the framework in a stochastic context.Process Hitting actions move the system from one state, z , to another, ẑ , at a given propensity which depends on only the current state and time, or a j (z,t) .As a memoryless random walk, each action corresponds to a Markov equation which tracks the net change in the probability of existing at a certain state and time: The result is a system of linear, time dependent, differential equations, defined given an initial condition.Some of the most famous and broadly used techniques for addressing problems such as these have been simulation based.Simulation can become computationally expensive with respect to computing time and available memory.An alternative approach is the direct application of a numerical method to the Markov equations.Here, we propose Proper Generalized Decomposition (PGD) as an effective and well-suited technique for gene regulatory networks.

Proper Generalized Decomposition
Proper Generalized Decomposition [6,7] is a multi-linear numerical solver which assumes that the target, in this case, the probability distribution, can be written as a sum of a product of separable functions of the interacting species, F i j (z i ) , i = 1,, N sp , and time, F t j (t) : PGD is performed iteratively, starting at some arbitrary guess and searching for sets of functions, one vector at a time, which will minimize the residual of the running sum.These functions are colloquially called modes, however, since the only objective is the reduction of the residual, there is no underlying notion that they represent the greatest source of variance, as is the case with Principal Component Analysis.Although the accuracy increases with every addition, we assume that only a limited number, M , of sets of functions are needed to capture the behavior of the system.If we consider a network of N sp species with N possible levels, the resulting dimensionality is the M sum of N sp functions of size N , or M ⋅ N ⋅ N sp in contrast to the original (N sp ) N .We have not changed the state space but, rather, re-ordered it such that only one vector of size N , usually on the scale of 2 or 3, must be addressed at any given time, see Figure 3. Since all operations can be performed by canonical techniques and are highly parallelizable, iterations are generally fast and computationally inexpensive.

Figure 3.
Refinement Decomposition of a state space.This illustration shows how a multidimensional space, for example, a cubic space of three dimensions, 3 3 can be decomposed into the product of the individual dimensions, 3 × 3 .This mathematical property is exploited by PGD in that we search for the individual vectors which are of relatively small size, never touching the full state space.In such a way, we move from (N sp ) N to M ⋅ N ⋅ N sp , as M sets of these vectors must be found.

Application to a Biological Network
It is easier to understand the concepts of Process Hitting and PGD, as well as to see their individual and combined benefits, when seen in action in the context of a realistic application towards a gene regulatory network.Here, we investigate a medium scale model of the ErbB signaling pathway which regulates a cells transition from G1 to S life phase, an important checkpoint which determines whether a cell should divide, delay division or enter a quiescent state.Over expression of ErbB is associated with many kinds of cancer, and drugs which target it and its receptor are common treatments for breast, lung and colon cancers.The directed graph for this network was taken from [8], where twenty species interact according to Boolean rules.The directed graph can be found in the appendix for reference.We begin our application by constructing a Process Hitting model from this Boolean predecessor, taking the most permissive, generalized dynamics, followed by its refinement via the incorporation of cooperative sorts.The impact that this refinement has on both the static analysis and application of PGD will be investigated, both in terms of expressiveness and complexity.Finally, the potential of PGD's capacity to easily incorporate model parameters as extra coordinates will be demonstrated by taking many potential values for the rates of two reactions in the directed graph.
The translation of a Boolean model to the generalized dynamics of Process Hitting is relatively straight forward, as shown in Figure 2: the absence of an activator effectively serves as an inhibitor and vice versa.The formal relationship between Boolean networks and Process Hitting can be found in [9].At this point, we would like to investigate the model to see if it adequately reflects our biological understanding of the system as a whole: are experimentally demonstrated states reachable, are impossible states unreachable, and are there fixed points if steady state behaviors exist?These questions constitute sanity checks, making sure our model is not essentially flawed from the beginning.The structure of the system, see appendix, suggests two species of experimental interest: EGF as an input, having no predecessor, and pRB as output, having no successor.Using these two species, we can easily formulate simple reachability criteria in order to perform sanity checks on our model.We consider a system at rest, in which all components begin in their inactive state.If no changes are made on the input protein, EGF when it is inactive, we expect that the system will remain at rest and that no change is to occur in the output protein.However if EFG is introduced, the signal should be able to propagate to the output, pRB.In order to be a feasible model, the system must pass these two criteria.Results from static analysis, shown in Table 1 provide good evidence that the generalized dynamics are too permissive and do not accurately capture the biological behaviors which are essential for a functioning model: not only are we are unable to find any fixed points within the system, of which we do expect to find at least one, but the protein pRB may become sporadically activated in a globally inactive system, failing the first sanity check.Therefore, we must refine the model, incorporating the suggested logical gate rules from [8] via cooperative sorts.In doing so, we recapture these vital phenomena, finding three fixed points and passing both sanity checks.These results were obtained in a matter of seconds, using simple commands in freely available software, allowing us to efficiently alter our model before investing time in more computationally expensive analysis.Table 1.Results for ErbB models using generalized dynamics and a refinement with cooperative sorts.Here, the two models were tested using two three sanity checks related to our biological understanding of the system: the presence of fixed points, the lack of impossible behaviors and the presence of demonstrated behaviors.In order to be considered a functioning model, pRB should remain at rest when the system is universally inactive, including the absence of input protein EGF.However, in the presence of EGF, a signal should be able to propagate through the system, potentially activating pRB.We see that, while the generalized dynamics were able to propagate a signal from EGF to pRB (EGF present), it was not able to prevent sporadic activation of pRB in a system at rest (EGF absent), nor find any fixed points.

Cooperative Sorts in the context of PGD
The Markov Equations of the Process Hitting actions provide a system of DEs to which we can apply PGD.Each species occupies a dimension of the state space.With two processes to each sort, the final problem is of size 2 20 , or over one million possible states.The underlying probability distribution is a function of these species and time.Our goal is to approximate this solution by a summation of separable functions In the case of a Process Hitting containing only the generalized dynamics, this is an appropriate and accurate method.However, once cooperative sorts are incorporated into the qualitative model, the cooperating species can no longer be represented by separable functions.To satisfy the enriched model, we may simply combine those dimensions which participate in cooperative sorts.While this does create vectors which grow exponentially with each added species, it is biologically implausible that more than three or four species would participate in a cooperative influence on a single target.Therefore, we can expect this growth to be cut short long before the dimension of a cooperative sort becomes too large.As we combine the state spaces so that they reflect their cooperative sorts, the error associated with PGD solutions as compared to the solution obtained from simulation techniques decreases.But what is to be done when one species participates in multiple cooperative sorts?The species cannot be represented twice in the decomposition, so we cannot construct two separate entities for the cooperative sorts as we would like.Rather, if we simply combine the two cooperative sorts into a single element, we return to the most accurate representation of the system, as each species is only represented once, but any non-separable behavior can be taken into account.Again, while it is possible to experience exponential growth in the combination of cooperative sorts, it is very rare biologically that one species would be participant in more than a handful of interactions.The solutions that we obtain from PGD are approximations of the full probability distribution corresponding to the Markov equations created by Process Hitting.From these probability distributions, we are able to make fast analysis of global behaviors of the system: rather than being limited to asking the questions answerable using static analysis, a modeler can watch the system evolve through time and make general statements on the qualitative behavior.

Incorporation of Unknown Parameters
It is often the case, especially in a growing field such as geneomics, that elements of a regulatory network are disputed or unknown.Researchers may come to very different conclusions about the parameters which fit a particular system.With simulation techniques, each new set of parameters requires a full repetition of all of the trials, limiting the modeler and leading to ad hoc choices made for the sake of feasibility.However, PGD offers a simple way of incorporating these unknown parameters directly into the model, making it possible to obtain an approximate solution for a range of values all at once [10].The parameter is encoded as one of the separable spaces and is included at the cost of one dimension added to the overall solution space.For our example, perhaps one of the regulating reactions is difficult to study separately from the system as a whole, say, interactions involving p27 and p21.Unlike the first half of the directed graph which is simply an activation cascade, these proteins are involved in both inhibiting and activating relationships, so changes to their rate laws should more greatly influence the final expression of pRB.We would like to incorporate many potential values of the action firing rate r into our model, anywhere between two times faster and two times slower than the other reactions in the system.Since our representation requires discretization, we consider forty equally spaced values between r 2 and 2r .Our decomposition of Ψ(z,t) is changed slightly in order to accommodate the parameter for the range of possible values: Wile simulation run time grows linearly with each element, 40 times longer since there are 40 values in the discretization, to obtain a result, we are able to derive a solution in relatively equal time using PGD.In figure 4, we see three solutions for the protein pRB given different values of the rate parameter, r 2 , 3r 2 and 2r .These comparably fast results allow us to perform general analysis on the network by directly observing how the global behavior changes with parameters.Here, we have selected three potential values for the firing rate r , r 2 (left), 3r 2 (middle) and 2r (right), for any interaction involving the proteins p21 or p27.The resulting behaviors of five proteins along the chemical pathway are shown here.Since the system is binary, active expression is plotted in yellow and inactive in blue, following their probabilities on the y-axis.Notice that in all cases, behavior is equivalent until around 1.75 seconds, when the firing cascade reaches p21 and p27.Some signals are simply amplified, whereas others, such as p21 and pRB, develop more complicated behaviors as the firing rate increases, perhaps suggesting cyclical or dampening behaviors.

Evaluation and Analysis
Up to now, we have presented a new method of approaching discrete models of gene regulatory networks, uncovering briefly the origins of its individual components, Process Hitting and PGD, and applying it to a real biological system.As bioinformatics grows and many new methodologies are proposed, however, it is increasingly important to discuss in a straightforward manner how well our method performs, highlighting both its merits and weaknesses.
We began our approach by considering the discrete modeling framework of Process Hitting.This approach allows for a great extent of abstraction in the development of a regulatory model and comes with a well-developed analysis toolbox, making it an attractive starting framework.However, the application of PGD to Markov Equations would be effective for any discrete modeling type which can be described as such.Process Hitting does have an advantage in that the species which interact in nonlinear ways and thus must be represented together in the decomposition are well defined as cooperative sorts in the very construction of the model.Once the Markov Equations have been provided, the method is relatively straightforward; PGD is a well-founded numerical method with thoroughly documented implementations.
As for the results themselves, there are several points to touch on: ease of analysis, model validation, and accuracy.One of the most interesting aspects of this approach is the nature of the results: a full probability distribution as an approximate solution to a set of equations.As demonstrated in Figure 4, the output lends itself to visualization on an individual scale, that is, for each protein involved.The behavior of a gene or protein can be described in very plain and qualitative terms, even for elements whose evolution is complicated and never reaches steady state behaviors.However, the apparent behavior can only be taken at face value: while a protein may appear to oscillate or tend to a certain value, the solution is only valid for the limited time frame in which it is analyzed and is not tied to a mathematical principle governing its evolution.Furthermore, since the solution is approximative, it is possible that species whose static analysis proves total inactivation or activation would be found slightly activated or inactivated in the PGD solution, a potentially important distinction in the world of genomics.In such instances, the static analysis and numerical analysis may be found in conflict with one another, compromising the accuracy of the resulting analysis.

Conclusions
In the case of gene regulatory networks, there are many reasons why a modeler might choose the application of qualitative methods, one of which is Process Hitting.Process Hitting offers many advantages for large scale, which are often the more realistic, systems in the form of static analysis tools.These analysis tools alone, however, cannot provide the complete and intuitive solution of the system as a full probability distribution for each state over time.By translating Process Hitting actions to Markov equations, we are able to treat a system of PDEs directly.Proper Generalized Decomposition has proven efficient in solving Process Hitting models.As opposed to simulation techniques, which have been historically been the preferred methodology, PGD can provide full solutions, including multiple unknown parameters, with a single run.Here, we have shown some of the potential of this method, applying a combination of static analysis and numerical tools in order to maximize the expressiveness and understanding of a qualitative model.Only the basic elements of Process Hitting have been incorporated into the Markov equations considered, that is, actions with simple rate laws.Including temporal and varied stochastic features into these equations would further increase its potential.

Appendix: The ErbB Signaling Pathway
For this work, we used a Boolean model of the ErbB signaling pathway for the regulation of G1/S cell cycle transition as developed by [8].In this article, the authors began by constructing a model from the literature, then proceeded to refine the model via network reconstruction.Although these refinements proved useful in the selection of novel targets for gene therapy, we would like to focus on the initial derivation of the model in which all reactions correspond to cited regulations.However, we will use the logical rules suggested within this article for the refinement of the Process Hitting model via cooperative sorts, shown in Table 2.

Table 2.
The proposed logical rules for species with more than one regulator.EGF (epidermal growth factor) binds to ErbB receptors, of which there are four structural variants, three thought to be involed in this network.These receptors are functional when they form heterodimers, excluding ErbB1 which is able to function as a homodimer as well.Functional receptors transmit signals to AKT1, an apoptosis-inhibiting transmitter, and MEK1, a protein kinase.Along with transcription factors c-MYC and ER-α , these entities downregulate kinase inhibitors p21 and p27 while upregulating the cyclins (CycE1 and CycD1) needed to activate their respective cyclin dependent kinases (CDK).These CDKs will work to phosphoralize, and therefore inactivate, the retinoblastoma protein (pRB).Only when this protein is inactive can the E2F group of transcriptional factors required for DNA replication and, therefore, cell proliferation.Although the interaction between CDKs and pRBs is inhibitive, we have kept the activations as indicated by the authors, using pRB as a proxy for it's following and more interesting product, E2F.In addition, we have included the logical rule proposed for Cyclin D presented in their work.

Figure 2 .
Figure 2. Refinement of a model via Cooperative Sorts.Here, a is an activator of c while b inhibits c .The generalized dynamics of the system have been constructed on the left.But what should happen in the case that both a and b are present?According to the left hand model, the system will oscillate.If we know more about how the system should function, however, we would like to be able to include this information into our model.With general dynamics, we are unable to express logical gates in which multiple species exhibit deterministic combined effects on a target, such as a ∧ −b , or the presence of the activator without the presence of the inhibitor.In order to add this combined interaction and eliminate the oscillatory behavior, we must refine the Process Hitting model with a cooperative sort, ab .This sort will handle the interactions of a and b on c while leaving the original species to interact with other elements as before.In exchange, more actions must be added such that a and b can effectively update ab so that it truly reflects the current state of both elements.In our example, ab 1,1 will not interact with c 0 , thus c

Figure 4 .
Figure 4. Sample of results from incorporating model parameters as dimensions in PGD solution.Here, we have selected three potential values for the firing rate r , r 2 (left), 3r 2 (middle) and 2r (right), for any interaction involving the proteins p21 or p27.The

Figure 6 .
Figure 6.The interaction graph for ErbB mediated G1/S cell cycle transition.Here, elements directly related to the ErbB signaling portion of the network are represented by boxes, while the elements related to kinase activity are represented by circles.Activation interactions are shown in green arrows and inhibition in red blunted arrows.Since this is the initial, most basic network derived from the literature, no combined effects requiring Boolean logic gates are shown.