1. Introduction
An important step in understanding the dynamics of a chemical reaction network (CRN) is its mathematical modelling. The dynamics of a CRN are usually determined by a system of ordinary differential equations (ODEs) known as the kinetic model of the CRN. The set of parameters of such a system is usually partially or even entirely unknown and is often estimated from various types of observational data obtained from biochemical experiments. Typically, the experimental data available for estimating the parameters are time series, i.e., the data are collected at discrete points in time.
The bottom-up approach (see, e.g., [1,2]) is a modelling method widely used in various research domains, including systems biology. In this method, the available experimental data are used to construct a comprehensive mathematical model of a system. The bottom-up modelling technique in systems biology comprises four main stages. The first stage is draft reconstruction, which involves compiling the relevant data from biological experiments. The second stage involves manually curating the gathered data, for example, by imputing missing values and removing irrelevant data. In the third stage, the knowledge of the biological interactions occurring in the CRN is translated into a mathematical model. In the fourth stage, the parameters of this model are numerically estimated from the observed experimental data, resulting in a comprehensive mathematical model.
The final stage of the bottom-up approach, namely parameter estimation, can be carried out in various ways. The fundamental idea behind any parameter estimation method is to compare the available experimental data with the corresponding values produced by the mathematical model. To guarantee a well-posed parameter estimation problem, complete experimental data are required, that is, measurements of all the dependent variables of the mathematical model. For CRNs, the choice of the most suitable technique generally depends on the properties of the data collected from biological experiments and on the structure of the CRN under consideration. Several mathematical techniques have been explored in the literature for estimating the parameters of kinetic models of CRNs from experimental data of species concentrations. A prevalent approach when all species concentrations can be measured experimentally is the well-known (weighted) least squares technique (see, e.g., [3,4]). This optimization approach minimises the (weighted) sum of squared residuals, i.e., the (weighted) sum of squared differences between the observed concentrations and the corresponding values predicted by the model. Other widely used methodologies, such as maximum likelihood estimation, finite differences and the quasi-linearization procedure, are discussed in [5]. For an exhaustive overview of the available mathematical techniques, consult [6,7,8,9]. In specific instances, experiments provide data corresponding to reaction rates (see, e.g., [10,11,12,13]). Recently, a method for estimating the parameters of enzymatic CRNs from this type of experimental data was proposed in [14]. It approximates the vector of species concentrations with parametric Bézier curves [15,16], which, combined with the general least squares approach, leads to a complete mathematical model.
Bayesian parameter estimation techniques have also been widely developed for systems of ODEs (see, e.g., [17,18,19,20,21,22]). In such approaches, the vector of parameters of the system of ODEs is usually treated as a random variable, and a probability distribution (known as the prior distribution), such as the normal, uniform or Poisson distribution (see, e.g., [23]), is assigned to it. The technical core of the Bayesian approach is to construct the joint probability distribution of the vector of parameters and the available data on the dependent variables, and to compute the posterior distribution, i.e., the conditional distribution of the parameters given the experimental data on the dependent variables. Although Bayesian approaches are useful for parameter estimation and are extensively applied to CRNs (see, e.g., [24,25,26,27,28]), they have several shortcomings. One of the main ones is that most of these methods use only the available experimental data and largely ignore the structure of the system of ODEs. Moreover, such approaches are often not straightforward to apply and may require substantial computational effort. Another shortcoming is the ambiguity in constructing appropriate probability specifications, since there is no explicit strategy for choosing a prior distribution. Depending on the characteristics of the available experimental data, there is no guarantee that the model-predicted values corresponding to the Bayesian estimates fit the available experimental data satisfactorily. In such cases, a separate analysis of the model fit is required.
In most cases, not all concentrations can be measured experimentally. This incompleteness of the data makes parameter estimation more challenging, both mathematically and computationally, since the resulting parameter estimation problem is not well posed. To our knowledge, there is no general direct solution to this problem in the literature, so new mathematical techniques are needed to estimate the parameters of mathematical models from this type of data. In this manuscript, we address the problem of parameter estimation for CRNs from observed partial time-series data of species' concentrations. We are mainly interested in CRNs governed by the mass action kinetic rate law (MAKRL), since this law governs a wide range of real-life CRNs. We assume that concentration measurements are available for some of the species at discrete time points, which need not be equidistant. Direct estimation of the parameters of the mathematical model from the available partial data is usually not feasible: since not all concentrations involved in the model are measured, the estimation problem is ill-posed. We therefore employ the Kron reduction method [29] to transform the ill-posed parameter estimation problem into a well-posed one.
Before performing parameter estimation, it is important to understand whether the parameters are identifiable from a particular type of output, i.e., whether a unique parameter vector corresponds to the given output vector. We address the parameter identifiability of dynamical systems described by a system of ODEs. The uniqueness of the parameter vector depends on the structure of the system under consideration and on the type of the available outputs. We first recall two known definitions of parameter identifiability and then demonstrate the link between them.
Our parameter estimation technique consists of three major steps. The first step reduces the known original model of the biochemical reaction network with unknown parameters using the Kron reduction technique [29]. Of the several model reduction techniques for CRNs available in the literature (see [30,31] for a thorough review), we chose Kron reduction for two particular reasons. First and foremost, Kron reduction preserves the kinetics of the original model. Thus, if the original network is governed by MAKRL, the reduced model obtained by Kron reduction also corresponds to a CRN whose species are a subset of the species of the original network, and this reduced CRN is also governed by MAKRL. This kinetics preservation property does not hold for several other known reduction techniques. Second, Kron reduction makes it possible to obtain a reduced model whose dependent variables are exactly the concentrations of the compounds for which time-series data are available. The (unknown) parameters of this reduced model are functions of the parameters of the original mathematical model. This reduced model, together with time-series data for all of its dependent variables, yields a well-posed parameter estimation problem. Solving this problem is the second step of our procedure; because of its simplicity and computational feasibility, we use the least squares optimization technique for it. In the final step, we solve an optimization problem in which the parameters of the original model are chosen so that the difference between a key characteristic property of the original model and that of the Kron-reduced model is minimal. We show that, when the given model is linear, this key characteristic property is related to the settling time of the corresponding CRN. The entire procedure has been automated, and the corresponding MATLAB library is provided as Supplementary Material.
We apply our new techniques to two realistic examples of CRNs from the Biomodels database [32]: a model of nicotinic acetylcholine receptors [33] and a model of Trypanosoma brucei trypanothione synthetase [34]. For each of these models, we first generate partial time-series data of the species' concentrations using the parameter values given in the corresponding reference. We then derive the Kron-reduced mathematical model obtained by deleting a subset of complexes and determine the values of its parameters for which the dynamics of the Kron-reduced model best approximate the available time-series data in the least squares sense. Using these estimated values for the parameters of the Kron-reduced model and their dependence on the parameters of the original mathematical model, we finally determine estimates of the parameters of the original model.
2. Background
In this section, we give a compact description of the mathematical tools that are necessary for demonstrating the main results of the current manuscript.
2.1. Notations
We introduce the notation used throughout the rest of the manuscript. For a vector $v \in \mathbb{R}^n$, $v_i$ refers to its ith entry, i.e., $v = (v_1, \dots, v_n)^T$. $\mathrm{diag}(v)$ denotes the diagonal matrix whose diagonal entries are the entries of the vector v. $M_{ij}$ is the entry of the matrix M in the ith row and the jth column. The eigenvalues of an $n \times n$ matrix M are denoted by $\lambda_1(M), \dots, \lambda_n(M)$. The spectrum of M is denoted by $\sigma(M)$, i.e., $\sigma(M) = \{\lambda_1(M), \dots, \lambda_n(M)\}$. $\bar{z}$ is the complex conjugate of the complex number z. $\mathrm{Re}(z)$ and $\mathrm{Im}(z)$ denote the real and imaginary parts of the complex number z, respectively. $|\mathcal{S}|$ is the cardinality of the set $\mathcal{S}$. Vectors of length m composed entirely of ones and zeros are denoted by $\mathbb{1}_m$ and $\mathbb{0}_m$, respectively. Furthermore, $\mathbb{0}_{m \times n}$ is the $m \times n$ matrix with all its entries equal to zero.
2.2. Mathematical Models
We outline the process of deriving a mathematical formulation that describes the dynamics of a CRN. Let $\mathcal{X}_i$, $i = 1, \dots, s$, be the s distinct chemical species of the considered CRN with r unidirectional reactions $\mathcal{R}_j$, $j = 1, \dots, r$. The connection between the species and the reactions is established through an $s \times r$ stoichiometric matrix denoted by S. Its elements are $S_{ij} = \beta_{ij} - \alpha_{ij}$, where $\alpha_{ij}$ is the number of moles of the ith species $\mathcal{X}_i$ in the substrate of the jth reaction $\mathcal{R}_j$, and $\beta_{ij}$ is the same quantity for the product of the reaction. Denote by $v(x, \theta) \in \mathbb{R}^r$ the vector of unidirectional reaction rates, which depends on both the species' concentration vector $x \in \mathbb{R}^s$ and the parameter vector $\theta$ of the model. The fundamental framework that characterizes the evolution of the species concentration vector is given by the stoichiometric representation of the balance laws:

$$\dot{x} = S\, v(x, \theta). \qquad (1)$$
Graphs are crucial tools for modelling relations and processes in numerous scientific domains, including systems biology. In chemical reaction network theory (CRNT), directed graphs are commonly used to represent the reactions of a CRN. The complexes of a CRN are defined as the left-hand (substrate) and right-hand (product) sides of its reactions. Let $\mathcal{C}_i$, $i = 1, \dots, c$, denote the distinct complexes of the considered CRN. The complexes can be identified with the vertices of a directed graph whose directed edges correspond to the reactions of the CRN. More precisely, if there is a reaction with substrate complex $\mathcal{C}_j$ and product complex $\mathcal{C}_i$, then the corresponding graph of complexes contains a directed edge with tail vertex $\mathcal{C}_j$ and head vertex $\mathcal{C}_i$. A linkage class of a CRN is a connected component of the corresponding graph of complexes, i.e., a maximal set of complexes any two of which are joined by a path in the graph when edge directions are ignored.
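As an illustration of the linkage-class definition, the following Python sketch computes the linkage classes of a hypothetical graph of complexes with a union-find structure, treating each reaction as an undirected edge. The function name `linkage_classes` and the 0-based complex indices are our own illustrative choices, not part of the authors' MATLAB library.

```python
from collections import defaultdict


def linkage_classes(num_complexes, reactions):
    """Return the linkage classes of a graph of complexes.

    reactions: list of (substrate_index, product_index) pairs, 0-based.
    Linkage classes are connected components with edge directions ignored.
    """
    parent = list(range(num_complexes))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for sub, prod in reactions:
        # Direction is irrelevant: merge the components of both endpoints.
        parent[find(sub)] = find(prod)

    classes = defaultdict(list)
    for i in range(num_complexes):
        classes[find(i)].append(i)
    return sorted(classes.values())


# Hypothetical network: C0 <-> C1 -> C2 and a separate reaction C3 -> C4.
print(linkage_classes(5, [(0, 1), (1, 0), (1, 2), (3, 4)]))
```

For this toy input the sketch reports two linkage classes, {C0, C1, C2} and {C3, C4}.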
The balance laws (1) describe any CRN, independently of the kinetic rate law that governs it. In this manuscript, we consider only the mass action kinetic rate law, since it governs a wide range of real-life CRNs. According to this rate law, the rate of a reaction is proportional to the product of the concentrations of the species in its substrate, each raised to a power equal to the number of moles of that species in the substrate. More precisely, the rate $v_j$ of the jth reaction $\mathcal{R}_j$, $j = 1, \dots, r$, is given by

$$v_j(x, k) = k_j \prod_{i=1}^{s} x_i^{\alpha_{ij}}, \qquad (2)$$

where $k_j > 0$ is the rate constant of the jth reaction and, as earlier, $\alpha_{ij}$, $i = 1, \dots, s$, is the number of moles of the species $\mathcal{X}_i$ in the substrate of the jth reaction. Note that in this case, the only parameters of the mathematical model are the rate constants of the reactions, i.e., $\theta = k = (k_1, \dots, k_r)^T$, and the number of unknown parameters is r. Next, we express the vector of reaction rates $v(x, k)$ as a matrix product, which is useful for automated modelling. Define the $s \times r$ substrate composition matrix $\Omega$ and the substrate expression function $\varphi$ of the CRN as

$$\Omega_{ij} = \alpha_{ij}, \qquad \varphi(x) = \mathrm{Exp}\big(\Omega^T \mathrm{Ln}(x)\big),$$

where $\mathrm{Ln}$ and $\mathrm{Exp}$ act componentwise, so that $\varphi_j(x) = \prod_{i=1}^{s} x_i^{\alpha_{ij}}$. The conductance matrix of the CRN is the $r \times r$ diagonal matrix $\mathcal{K}$ whose ith diagonal entry is the rate constant $k_i$ of the ith reaction, i.e., $\mathcal{K} = \mathrm{diag}(k)$. The vector of reaction rates can then be expressed in the matrix multiplication form

$$v(x, k) = \mathcal{K}\, \varphi(x). \qquad (3)$$
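The following minimal Python sketch, assuming NumPy and a small hypothetical three-species network (not the example below), checks that the element-wise mass action rates (2) coincide with the matrix form (3) and evaluates the right-hand side of the balance laws (1). The arrays `alpha` and `beta` hold the substrate and product moles; all names are illustrative.

```python
import numpy as np

# Hypothetical CRN with species X1, X2, X3 and reactions
#   R1: 2X1 -> X2,   R2: X2 -> 2X1,   R3: X2 + X3 -> X1
# alpha[i, j] / beta[i, j]: moles of X_i in the substrate / product of R_j.
alpha = np.array([[2, 0, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
beta = np.array([[0, 2, 1],
                 [1, 0, 0],
                 [0, 0, 0]])
S = beta - alpha        # stoichiometric matrix
Omega = alpha           # substrate composition matrix


def rates_elementwise(x, k):
    """Mass action rates (2): v_j = k_j * prod_i x_i ** alpha_ij."""
    return np.array([k[j] * np.prod(x ** alpha[:, j]) for j in range(len(k))])


def rates_matrix(x, k):
    """Matrix form (3): v = diag(k) @ Exp(Omega^T Ln(x)); needs x > 0."""
    phi = np.exp(Omega.T @ np.log(x))   # substrate expression function
    return np.diag(k) @ phi


x = np.array([0.5, 2.0, 1.5])
k = np.array([1.0, 0.2, 0.7])
v = rates_matrix(x, k)
dxdt = S @ v            # right-hand side of the balance laws (1)
print(v, dxdt)
```

The logarithm in the matrix form requires strictly positive concentrations; the element-wise form does not, which is one reason to keep both available when checking an implementation.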
Example 1. To illustrate the modelling process outlined above, we apply it to the following example of a CRN. Consider five chemical species $\mathcal{X}_i$, $i = 1, \dots, 5$, engaged in the three distinct unidirectional reactions (4). For $i = 1, 2, 3$, let $k_i$ be the rate constant of the ith reaction. Observe that the second reaction can be interpreted as the reverse of the first reaction; in our modelling approach, we treat each reversible reaction as a pair of distinct unidirectional reactions. There are three distinct complexes $\mathcal{C}_1$, $\mathcal{C}_2$, $\mathcal{C}_3$ involved in the CRN, namely the substrate and product sides of the reactions (4). For the first reaction, the graph of complexes corresponding to the CRN (4) contains a directed edge whose tail vertex is the substrate complex and whose head vertex is the product complex of that reaction. The edges corresponding to the other reactions are constructed in the same way. Note that the resulting graph of complexes consists of only a single linkage class. Under the assumption that the reactions (4) are governed by MAKRL, the reaction rates can be computed by Equation (2), and the vector of reaction rates can be written in the matrix multiplication form (3) with the corresponding substrate composition matrix $\Omega$ and substrate expression function $\varphi$. Thus, in this case the balance laws (1) can be written in the form (5).
2.3. The Weighted Directed Laplacian Matrix
In CRNT, the (weighted directed) Laplacian matrix is a matrix representation of the reactions occurring between the different complexes. Here we explain how to construct the Laplacian matrix of the CRN from the (weighted directed) adjacency matrix of its graph of complexes. The adjacency matrix A is a $c \times c$ matrix, with c being the number of complexes of the CRN, whose entry $A_{ij}$ is equal to k if there is a reaction having the jth complex of the network as substrate and the ith complex as product, with k being the rate constant of that reaction, and is equal to zero otherwise. The (weighted directed) degree matrix D of the graph of complexes is the $c \times c$ diagonal matrix whose ith diagonal entry is the sum of the elements of the ith column of the weighted adjacency matrix A. The Laplacian matrix associated with the graph of complexes of the considered CRN is defined as

$$L = D - A.$$
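If there is a reaction with substrate complex $\mathcal{C}_j$ and product complex $\mathcal{C}_i$, then the off-diagonal element $L_{ij}$ is equal to the rate constant of that reaction taken with the negative sign. For useful properties of Laplacian matrices, we refer to [35]. Any directed graph can be described by an incidence matrix [36], which represents the connections between its vertices and edges. In the case of CRNs, the $c \times r$ incidence matrix B of the graph of complexes is defined as

$$B_{ij} = \begin{cases} -1 & \text{if } \mathcal{C}_i \text{ is the substrate of } \mathcal{R}_j, \\ 1 & \text{if } \mathcal{C}_i \text{ is the product of } \mathcal{R}_j, \\ 0 & \text{otherwise}. \end{cases} \qquad (6)$$

Define the $r \times c$ outgoing matrix $\Delta$ of the considered CRN as

$$\Delta_{ji} = \begin{cases} k_j & \text{if } \mathcal{C}_i \text{ is the substrate of } \mathcal{R}_j, \\ 0 & \text{otherwise}. \end{cases}$$

It can be shown that the Laplacian is given in matrix multiplication form by

$$L = -B\, \Delta. \qquad (7)$$

For automatic modelling purposes, it is convenient to construct the Laplacian matrix using the simple matrix multiplication form given in (7).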
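To make the two constructions concrete, here is a sketch for a hypothetical graph of complexes $\mathcal{C}_1 \rightleftarrows \mathcal{C}_2 \to \mathcal{C}_3$ with made-up rate constants. It builds L once as $D - A$ and once from an incidence matrix and an outgoing matrix. The sign convention for B (-1 for the substrate, +1 for the product) and the placement of the rate constants in $\Delta$ are assumptions of this sketch, chosen so that $L = -B\Delta$ reproduces $D - A$.

```python
import numpy as np

# Hypothetical graph of complexes: C1 -> C2 (k1 = 2), C2 -> C1 (k2 = 0.5),
# C2 -> C3 (k3 = 1.5). Each entry is (substrate, product, rate), 0-based.
reactions = [(0, 1, 2.0), (1, 0, 0.5), (1, 2, 1.5)]
c, r = 3, len(reactions)

# Weighted adjacency: A[i, j] = k if there is a reaction C_j -> C_i.
A = np.zeros((c, c))
for sub, prod, k in reactions:
    A[prod, sub] += k
D = np.diag(A.sum(axis=0))   # column sums of A on the diagonal
L = D - A

# Incidence matrix (assumed convention: -1 substrate, +1 product) and
# outgoing matrix Delta[j, i] = k_j if C_i is the substrate of reaction j.
B = np.zeros((c, r))
Delta = np.zeros((r, c))
for j, (sub, prod, k) in enumerate(reactions):
    B[sub, j] = -1.0
    B[prod, j] = 1.0
    Delta[j, sub] = k

L_from_incidence = -B @ Delta
print(L)
```

Both constructions give a matrix whose columns sum to zero, as required of a Laplacian in this convention.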
Next, we show how to represent the balance laws (1) in terms of the weighted directed Laplacian matrix L. The c complexes of the considered CRN are described by an $s \times c$ complex composition matrix Z, whose columns express the complexes of the CRN in terms of their species. More precisely, the element $Z_{ij}$ of the complex composition matrix Z is the number of moles of the ith species $\mathcal{X}_i$ in the jth complex $\mathcal{C}_j$. As explained in [29,37], the balance laws of a mass action CRN can be rewritten as

$$\dot{x} = -Z\, L\, \psi(x), \qquad (8)$$

where $\psi(x)$ is the complex expression function defined as

$$\psi(x) = \mathrm{Exp}\big(Z^T \mathrm{Ln}(x)\big),$$

with $\mathrm{Ln}$ and $\mathrm{Exp}$ acting componentwise, i.e., $\psi_j(x) = \prod_{i=1}^{s} x_i^{Z_{ij}}$.
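The equivalence of the stoichiometric form (1) and the Laplacian form (8) can be checked numerically. The sketch below assumes a hypothetical network $2\mathcal{X}_1 \rightleftarrows \mathcal{X}_2 \to \mathcal{X}_3$ with complexes $2\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{X}_3$ and arbitrary rate constants; it is not the example (4).

```python
import numpy as np

# Hypothetical CRN 2X1 <-> X2 -> X3 with complexes C1 = 2X1, C2 = X2, C3 = X3.
k1, k2, k3 = 2.0, 0.5, 1.5
Z = np.array([[2, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])                  # Z[i, j] = moles of X_i in C_j
L = np.array([[ k1,      -k2, 0.0],
              [-k1, k2 + k3, 0.0],
              [0.0,     -k3, 0.0]])       # L = D - A for C1 <-> C2 -> C3


def rhs_laplacian(x):
    """Balance laws in the Laplacian form (8); requires x > 0."""
    psi = np.exp(Z.T @ np.log(x))          # complex expression function
    return -Z @ L @ psi


def rhs_stoichiometric(x):
    """The same model written directly from the balance laws (1) and (2)."""
    v = np.array([k1 * x[0] ** 2, k2 * x[1], k3 * x[1]])
    S = np.array([[-2, 2, 0],
                  [1, -1, -1],
                  [0, 0, 1]])
    return S @ v


x = np.array([0.7, 1.2, 0.3])
print(rhs_laplacian(x), rhs_stoichiometric(x))
```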
Example 2. For the CRN example (4), the weighted adjacency matrix A and the weighted degree matrix D follow from the definitions above, and the weighted directed Laplacian of the CRN (4) is $L = D - A$. Alternatively, the Laplacian matrix can be computed using Equation (7) with the incidence matrix B and the outgoing matrix $\Delta$ of the CRN (4). The balance laws (5) can be rewritten in the form (8) with the corresponding complex composition matrix Z and complex expression function $\psi$.

In the following theorem we recall certain important spectral properties (see, e.g., [38]) of the weighted directed Laplacian matrix associated with the graph of complexes of a CRN governed by MAKRL.
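Theorem 1 (Spectrum of the weighted directed Laplacian matrix).
If the graph of complexes of a CRN governed by MAKRL has a single linkage class, then the eigenvalues $\lambda_i$, $i = 1, \dots, c$, of the weighted directed Laplacian matrix L associated with the CRN can be ordered in such a way that

$$0 = \lambda_1 \le \mathrm{Re}(\lambda_2) \le \cdots \le \mathrm{Re}(\lambda_c),$$

where every non-zero eigenvalue has a strictly positive real part. First note that $\lambda_1 = 0$. This is because the entries of each column of L sum to zero by the definition of the Laplacian matrix, i.e., $\mathbb{1}_c^T L = \mathbb{0}_c^T$. Moreover, a generalization of the matrix-tree theorem shows that the multiplicity of the zero eigenvalue equals the number of terminal strongly connected components of the graph of complexes; for a weakly reversible network this is the number of linkage classes, which is 1 here. Using Gershgorin's circle theorem [39], one can show that the real parts of the non-zero eigenvalues of L are strictly positive, i.e., if $\lambda_i \in \sigma(L)$ and $\lambda_i \ne 0$, then $\mathrm{Re}(\lambda_i) > 0$: each column disc of L has centre $L_{ii} \ge 0$ and radius $L_{ii}$, so it lies in the closed right half-plane and touches the imaginary axis only at the origin. For a detailed proof of Theorem 1 we refer to [40].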
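For a quick numerical check of Theorem 1, the sketch below computes the spectrum of the Laplacian of a hypothetical network $\mathcal{C}_1 \rightleftarrows \mathcal{C}_2 \to \mathcal{C}_3$ (a single linkage class, arbitrary rate constants) together with its column Gershgorin discs. The helper names are illustrative only.

```python
import numpy as np


def sorted_spectrum(L):
    """Eigenvalues of L ordered by increasing real part."""
    eig = np.linalg.eigvals(L)
    return eig[np.argsort(eig.real)]


def gershgorin_column_discs(L):
    """(centre, radius) of each column Gershgorin disc of L."""
    centres = np.diag(L)
    radii = np.abs(L).sum(axis=0) - np.abs(centres)
    return list(zip(centres, radii))


# Hypothetical Laplacian of C1 <-> C2 -> C3 with k1 = 2, k2 = 0.5, k3 = 1.5.
L = np.array([[ 2.0, -0.5, 0.0],
              [-2.0,  2.0, 0.0],
              [ 0.0, -1.5, 0.0]])
lam = sorted_spectrum(L)
print(lam.real)                      # one zero eigenvalue, the rest positive
print(gershgorin_column_discs(L))    # each radius equals its centre
```

Here the spectrum is {0, 1, 3}: the zero eigenvalue is simple because this graph has a single terminal component, {C3}.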
2.4. Kron Reduction of Chemical Reaction Networks
The Kron reduction for mathematical models of CRNs [29,37,41] is performed by assuming that certain intermediate complexes are complex balanced, and is carried out by computing the Schur complement of the weighted Laplacian matrix associated with the corresponding graph of complexes. We therefore first recall the definition of the Schur complement (see, e.g., [42]) of a given square matrix.
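Definition 1 (Schur complement).
Let $P \in \mathbb{R}^{m \times m}$, $Q \in \mathbb{R}^{m \times n}$, $R \in \mathbb{R}^{n \times m}$, and $T \in \mathbb{R}^{n \times n}$ be constant matrices such that T is invertible. Consider the block matrix

$$M = \begin{pmatrix} P & Q \\ R & T \end{pmatrix}.$$

The Schur complement of the block T of M is the $m \times m$ matrix defined as

$$M/T = P - Q\, T^{-1} R.$$

Let $\mathcal{V}$ be the set of indices of the complexes of the CRN, i.e., $\mathcal{V} = \{1, \dots, c\}$. Suppose we want to remove the complexes whose indices form the subset $\mathcal{V}_r \subset \mathcal{V}$. It should be ensured that $\mathcal{V}_r \ne \mathcal{V}$ and that the principal submatrix $L_{\mathcal{V}_r \mathcal{V}_r}$ of L is invertible, so that the Schur complement is defined. The removal of complexes is accomplished by computing the Schur complement of the Laplacian matrix L with respect to the block corresponding to $\mathcal{V}_r$:

$$\hat{L} = L_{\bar{\mathcal{V}} \bar{\mathcal{V}}} - L_{\bar{\mathcal{V}} \mathcal{V}_r}\, L_{\mathcal{V}_r \mathcal{V}_r}^{-1}\, L_{\mathcal{V}_r \bar{\mathcal{V}}}.$$

Here, $\bar{\mathcal{V}} = \mathcal{V} \setminus \mathcal{V}_r$ is the set of indices of the complexes remaining in the reduced graph of complexes. $\hat{L}$ is again a Laplacian matrix, since it satisfies the properties of Laplacian matrices ([29], Proposition 1). Furthermore, it has been proven that the equation

$$\dot{\hat{x}} = -\hat{Z}\, \hat{L}\, \hat{\psi}(\hat{x})$$

describes the dynamics of a CRN governed by MAKRL with a smaller number of complexes. Here, $\hat{x}$ is the vector of species' concentrations in the reduced mathematical model (which contains a subset of the elements of x), $\hat{Z}$ is the complex composition matrix of the reduced CRN, and

$$\hat{\psi}(\hat{x}) = \mathrm{Exp}\big(\hat{Z}^T \mathrm{Ln}(\hat{x})\big)$$

is its complex expression function.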
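A minimal NumPy sketch of the Schur-complement step, assuming the hypothetical graph $\mathcal{C}_1 \rightleftarrows \mathcal{C}_2 \to \mathcal{C}_3$ with rate constants $k_1, k_2, k_3$: deleting the intermediate complex $\mathcal{C}_2$ leaves a single reaction $\mathcal{C}_1 \to \mathcal{C}_3$ whose rate constant is $p = k_1 k_3/(k_2 + k_3)$. For this toy network the parameter dependence function is therefore $f(k) = k_1 k_3/(k_2 + k_3)$. The function `kron_reduce` is a hypothetical helper, not part of the authors' MATLAB library; it uses a linear solve instead of forming the inverse explicitly.

```python
import numpy as np


def kron_reduce(L, removed):
    """Schur complement of the Laplacian L w.r.t. the complexes in `removed`.

    Returns the reduced Laplacian and the (0-based) indices of the kept
    complexes. Assumes the principal block L[removed, removed] is invertible.
    """
    removed = sorted(removed)
    kept = [i for i in range(L.shape[0]) if i not in removed]
    L_kk = L[np.ix_(kept, kept)]
    L_kr = L[np.ix_(kept, removed)]
    L_rk = L[np.ix_(removed, kept)]
    L_rr = L[np.ix_(removed, removed)]
    # L_kk - L_kr L_rr^{-1} L_rk, computed with a linear solve.
    L_hat = L_kk - L_kr @ np.linalg.solve(L_rr, L_rk)
    return L_hat, kept


k1, k2, k3 = 2.0, 0.5, 1.5
L = np.array([[ k1,      -k2, 0.0],
              [-k1, k2 + k3, 0.0],
              [0.0,     -k3, 0.0]])
L_hat, kept = kron_reduce(L, [1])   # delete the intermediate complex C2
p = k1 * k3 / (k2 + k3)             # rate constant of the reduced C1 -> C3
print(L_hat, p)
```

Replacing the numeric rate constants by symbols (for example with a computer algebra system) would produce f in closed form, which is the role the MATLAB symbolic variables play in the text.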
As explained in [29,37], a well-chosen $\mathcal{V}_r$ results in a reduction of the number of dependent variables of the corresponding mathematical model. Note that the parameters of the Kron-reduced mathematical model can be expressed as functions of the parameters of the original model. More precisely, if $p \in \mathbb{R}^{\hat{r}}_{>0}$ denotes the vector of parameters of the reduced model, with $\hat{r}$ being the number of reactions in it, then there is a function $f : \mathbb{R}^{r}_{>0} \to \mathbb{R}^{\hat{r}}_{>0}$ such that $p = f(k)$. In general, deriving the explicit form of f by hand is not straightforward; we therefore use MATLAB symbolic variables to derive it in a fully automated fashion. We refer to f as the parameter dependence function, since it specifies how the vector of parameters p of the reduced model depends on the vector of parameters k of the original model.
To determine the structure of the reduced CRN, we need the incidence matrix and the complex composition matrix of the reduced network, which can be obtained by the automated procedure described in [37]. The incidence matrix $\hat{B}$ is determined from the Laplacian matrix $\hat{L}$: if $\hat{L}_{ij} \ne 0$, $i \ne j$, then the reduced graph of complexes contains a reaction with the jth complex of the reduced CRN as substrate, the ith complex of the reduced CRN as product, and rate constant $-\hat{L}_{ij}$. The entries of the incidence matrix $\hat{B}$ of the reduced graph of complexes are then defined according to (6). The complex composition matrix $\hat{Z}$ of the reduced CRN is obtained by simply removing the columns of the complex composition matrix Z of the original CRN that correspond to the set of indices $\mathcal{V}_r$. As mentioned earlier, the incidence matrix describes the reactions occurring between the complexes, while the complex composition matrix expresses the complexes in terms of the species. We therefore use $\hat{B}$ and $\hat{Z}$ to determine the reactions corresponding to the reduced graph of complexes.
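The rule for reading reactions off a reduced Laplacian can be sketched as follows. The reduced Laplacian and the complex composition matrix are hypothetical: they correspond to a toy network $2\mathcal{X}_1 \rightleftarrows \mathcal{X}_2 \to \mathcal{X}_3$ with $k_1 = 2$, $k_2 = 0.5$, $k_3 = 1.5$, reduced by deleting the complex $\mathcal{X}_2$. The sign convention for the incidence matrix (-1 for the substrate, +1 for the product) is an assumption, and indices are 0-based within the reduced network.

```python
import numpy as np


def reduced_reactions(L_hat, tol=1e-12):
    """List (substrate, product, rate) for every nonzero off-diagonal entry.

    A nonzero L_hat[i, j], i != j, encodes a reaction from the jth to the ith
    complex of the reduced network with rate constant -L_hat[i, j].
    """
    n = L_hat.shape[0]
    return [(j, i, -L_hat[i, j]) for j in range(n) for i in range(n)
            if i != j and abs(L_hat[i, j]) > tol]


def incidence_matrix(reactions, n):
    """Incidence matrix with the assumed convention -1 substrate, +1 product."""
    B = np.zeros((n, len(reactions)))
    for col, (sub, prod, _) in enumerate(reactions):
        B[sub, col], B[prod, col] = -1.0, 1.0
    return B


# Hypothetical reduced Laplacian of the single reaction C1 -> C3 (p = 1.5);
# reduced index 0 is the original C1 = 2X1, reduced index 1 the original C3 = X3.
L_hat = np.array([[1.5, 0.0], [-1.5, 0.0]])
Z = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 1]])   # original complexes as columns
Z_hat = np.delete(Z, [1], axis=1)                 # drop the removed complex C2
rxns = reduced_reactions(L_hat)
B_hat = incidence_matrix(rxns, L_hat.shape[0])
net_change = Z_hat @ B_hat            # stoichiometry of the reduced reactions
unused_species = np.flatnonzero(~Z_hat.any(axis=1))
print(rxns, net_change.ravel(), unused_species)
```

In this toy case the reduced network is the single reaction $2\mathcal{X}_1 \to \mathcal{X}_3$ with rate constant 1.5, and the all-zero row of `Z_hat` shows that $\mathcal{X}_2$ no longer appears in the reduced CRN.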
Example 3. To illustrate the Kron reduction method for CRNs, we apply it to the example given in (4). Assume that we want to delete the complex $\mathcal{C}_2$ from the graph of complexes by applying the Kron reduction method; in other words, $\mathcal{V} = \{1, 2, 3\}$ and $\mathcal{V}_r = \{2\}$. The elimination of $\mathcal{C}_2$ is carried out by computing the Schur complement of the Laplacian matrix L with respect to the block corresponding to the set of indices $\mathcal{V}_r$, which yields the weighted directed Laplacian matrix $\hat{L}$ associated with the reduced graph of complexes. As explained in [37], the complex composition matrix $\hat{Z}$ of the reduced CRN is obtained by eliminating the second column of the complex composition matrix Z of the original CRN, and the balance laws of the Kron-reduced model are $\dot{\hat{x}} = -\hat{Z} \hat{L} \hat{\psi}(\hat{x})$. Using the Laplacian matrix $\hat{L}$ of the Kron-reduced model, we obtain the incidence matrix $\hat{B}$ of the reduced graph of complexes, and taking into account $\hat{B}$ and $\hat{Z}$, we derive the reactions of the reduced CRN. The parameter p of the reduced CRN is given in terms of the parameters of the original model as $p = f(k_1, k_2, k_3)$, where f is the parameter dependence function of this example. Note that after deleting the complex $\mathcal{C}_2$ from the graph of complexes by the Kron reduction approach, one of the species of the original CRN is not involved in the resulting reduced CRN, since its concentration is conserved in time.

In [29,37], the optimal combination of complexes for deletion is selected by means of an error integral that quantifies the difference between the dynamical behaviors of the original model and the corresponding reduced model. This error integral is a measure based on a particular trajectory of the original model.
2.5. The Least Squares Optimization Method
We explain how to apply the (weighted) least squares optimization technique to estimate the parameters $\theta$ of the kinetic model (1) describing the dynamics of the given CRN. For the general least squares optimization method, we refer to, for example, [3,4]. This optimization method plays a crucial role in our parameter estimation method.
Assume that biological experiments provide complete experimental data of species' concentrations, i.e., measurements of all of the species' concentrations. For $j = 1, \dots, N$, let $y_{ijm}$, $i = 1, \dots, s$; $m = 1, \dots, n_j$, be the observed value of the ith concentration at the time instant $t_{jm}$, which is the mth time point of the jth experiment. We aim to identify the parameter values of the mathematical model (1) that best fit these observed time-series data of species' concentrations. For every $j = 1, \dots, N$, consider the following initial value problem (IVP):

$$\dot{x} = S\, v(x, \theta), \qquad x(t_{j1}) = x_0^{(j)}, \qquad (10)$$

where $x_0^{(j)}$ is the vector of initial concentrations of the jth experiment. Since the available experimental data correspond to measurements of all the species' concentrations, for every parameter vector $\theta$ the IVP (10) generates predicted values of all the species' concentrations; more precisely, the IVP (10) can be solved numerically with respect to time. However, this numerical integration is not always possible if the available experimental data correspond to only some of the concentrations, since the initial values of the unmeasured concentrations are then unknown.
For every $j = 1, \dots, N$, let $x_i(t_{jm}; \theta)$, $i = 1, \dots, s$; $m = 1, \dots, n_j$, denote the model-predicted value of the ith concentration obtained by numerically solving the IVP (10). The least squares error is then defined as the (weighted) sum of squared residuals, i.e., of the differences between the observed concentrations and the corresponding model-predicted values obtained from the IVP (10):

$$E(\theta) = \sum_{j=1}^{N} \sum_{i=1}^{s} \sum_{m=1}^{n_j} w_{ijm} \big( y_{ijm} - x_i(t_{jm}; \theta) \big)^2. \qquad (11)$$

Here, for $i = 1, \dots, s$, $j = 1, \dots, N$ and $m = 1, \dots, n_j$, $w_{ijm} > 0$ denotes the weight of the corresponding measurement. In this case, it is assumed that the measurements have different uncertainties, and each weight can be taken, for example, equal to the reciprocal of the variance of the measurement:

$$w_{ijm} = \frac{1}{\sigma_{ijm}^2}.$$
We refer to this approach as weighted least squares (WLS). If the measurements have equal variance, then all the weights can be set to one; we refer to the corresponding approach as unweighted least squares (UWLS). The (weighted) least squares optimization technique finds the optimal parameter values by minimizing the error (11). This minimization can be carried out, for example, by the standard Levenberg-Marquardt algorithm [43,44] or the modified Levenberg-Marquardt algorithm [45]. We denote by $\theta^*$ the solution of this optimization problem, i.e.,

$$\theta^* = \arg\min_{\theta} E(\theta).$$
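The following is a sketch of an unweighted least squares fit using SciPy. It assumes a hypothetical reversible network $\mathcal{X}_1 \rightleftarrows \mathcal{X}_2$, complete noise-free synthetic data, and known initial concentrations at $t = 0$. `scipy.optimize.least_squares` with `method="lm"` wraps MINPACK's Levenberg-Marquardt implementation and minimizes half the sum of squared residuals, so passing $\sqrt{w_{ijm}}$-scaled residuals reproduces the weighted error (11) up to a constant factor.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Hypothetical CRN X1 -> X2 (k1), X2 -> X1 (k2); both species are measured.
S = np.array([[-1.0, 1.0],
              [1.0, -1.0]])       # stoichiometric matrix


def rhs(t, x, k):
    """Balance laws (1) with mass action rates v = (k1*x1, k2*x2)."""
    v = np.array([k[0] * x[0], k[1] * x[1]])
    return S @ v


def simulate(k, x_init, t_obs):
    """Solve the IVP numerically and return predictions at t_obs."""
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), x_init, t_eval=t_obs,
                    args=(k,), rtol=1e-10, atol=1e-12)
    return sol.y                  # shape: (species, time points)


t_obs = np.linspace(0.5, 5.0, 10)        # measurement times
x_init = np.array([1.0, 0.0])            # assumed known initial state
k_true = np.array([0.8, 0.3])
y_obs = simulate(k_true, x_init, t_obs)  # synthetic noise-free data
w = np.ones_like(y_obs)                  # UWLS weights; use 1/variance for WLS


def residuals(k):
    # sqrt(w) * (observed - predicted), flattened for least_squares.
    return (np.sqrt(w) * (y_obs - simulate(k, x_init, t_obs))).ravel()


fit = least_squares(residuals, np.array([0.5, 0.5]), method="lm",
                    diff_step=1e-6)
print(fit.x)
```

A slightly larger finite-difference step (`diff_step`) than the default is used to keep the Jacobian estimate robust against the small integration error of the ODE solver. With real, noisy data the fitted values would only approximate the true ones.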
3. Parameter Identifiability
In this section, we first recall the definitions of least squares parameter identifiability and of parameter identifiability, and then demonstrate the link between these two identifiability concepts. Consider a dynamical system described by a system of ODEs of the following form:

$$\dot{x} = f(x, \theta), \qquad y = g(x, \theta), \qquad x(0) = x_0, \qquad (12)$$

where f is an s-dimensional vector-valued function determined by the structure of the system and g is an n-dimensional vector-valued function. Here, $\theta \in \mathbb{R}^q$ is the parameter vector, $x \in \mathbb{R}^s$ is the vector of states of the system and y is the n-dimensional output vector. For a given vector of initial states $x_0$, let $y(t; \theta, x_0)$ denote the output trajectory of the system (12) corresponding to the parameter vector $\theta$ and the initial states $x_0$.

Assume that, in an experimental setup, the output has been measured continuously over the time interval $[0, T]$ for some pre-specified $T > 0$. Let $y_m(t)$, $t \in [0, T]$, be the resulting measured output. Consider the cost function $J(\theta)$ defined as

$$J(\theta) = \int_0^T \big\| y_m(t) - y(t; \theta, x_0) \big\|^2 \, dt. \qquad (13)$$
We recall the definition of least squares parameter identifiability of dynamical systems of the form (12), which was first introduced in [46].
Definition 2 (Least squares parameter identifiability).
The dynamical system (12) is least squares parameter identifiable if, for every vector of initial states $x_0$ and every measurement function $y_m$, the cost function (13) admits a unique minimizer. If there exist at least one vector of initial states $x_0$ and a measurement function $y_m$ such that the cost function (13) has multiple minimizers, then the dynamical system (12) is least squares parameter unidentifiable.
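Each vector of parameters $\theta$ determines the set

$$\mathcal{Y}(\theta) = \big\{ \big(x_0, y(\cdot; \theta, x_0)\big) : x_0 \in \mathbb{R}^s \big\}$$

of admissible output trajectories of the system (12), each paired with the initial states that generate it. We recall the definition of parameter identifiability of dynamical systems given in [14].
Definition 3 (Parameter identifiability).
The dynamical system (12) is parameter identifiable if, for any parameter vectors $\theta_1 \ne \theta_2$, we have $\mathcal{Y}(\theta_1) \ne \mathcal{Y}(\theta_2)$. Equivalently, the dynamical system (12) is parameter identifiable if $\mathcal{Y}(\theta_1) = \mathcal{Y}(\theta_2)$ for two parameter vectors $\theta_1, \theta_2$ implies $\theta_1 = \theta_2$. If there are two distinct parameter vectors $\theta_1 \ne \theta_2$ for which $\mathcal{Y}(\theta_1) = \mathcal{Y}(\theta_2)$, then the dynamical system (12) is parameter unidentifiable.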
We finally turn our attention to the main contribution of this section. In the following theorem we specify a link between the two identifiability concepts given above.
Theorem 2. If the dynamical system (12) is parameter unidentifiable, then it is also least squares parameter unidentifiable.
Proof. We assume that the dynamical system (12) is parameter unidentifiable and prove that it is also least squares parameter unidentifiable. Since the dynamical system (12) is parameter unidentifiable, there are two parameter vectors $\theta_1, \theta_2$ such that $\theta_1 \ne \theta_2$ and $\mathcal{Y}(\theta_1) = \mathcal{Y}(\theta_2)$. This means that there is at least one vector of initial states $x_0$ for which $y(t; \theta_1, x_0) = y(t; \theta_2, x_0)$, $t \in [0, T]$. Choose the measurement function $y_m$ as

$$y_m(t) = y(t; \theta_1, x_0), \qquad t \in [0, T].$$

This choice results in $J(\theta_1) = J(\theta_2) = 0$. Thus, for the two parameter vectors $\theta_1 \ne \theta_2$ there are a vector of initial states $x_0$ and a measurement function $y_m$ such that $J(\theta_1) = J(\theta_2) = 0$. Since J is non-negative, zero is its minimum value, and it is attained at two distinct parameter vectors. We conclude that the minimizer of J is not unique in this case, and thus the dynamical system (12) is least squares parameter unidentifiable. □
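The mechanism of the proof can be seen on a toy example that is not taken from the paper: for $\dot{x} = -\theta_1 \theta_2 x$, $y = x$, only the product $\theta_1 \theta_2$ affects the output, so the system is parameter unidentifiable. With the measurement chosen as in the proof, the cost (13) vanishes at two distinct parameter vectors. The sketch evaluates (13) by numerical quadrature with `scipy.integrate.quad`; the horizon $T = 5$ and $x_0 = 1$ are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad


def output(t, theta, x0=1.0):
    """Output of the toy system dx/dt = -theta1*theta2*x, y = x, x(0) = x0."""
    return x0 * np.exp(-theta[0] * theta[1] * t)


def cost(theta, y_meas, T=5.0):
    """J(theta) = integral over [0, T] of (y_meas(t) - y(t; theta))^2, cf. (13)."""
    value, _ = quad(lambda t: (y_meas(t) - output(t, theta)) ** 2, 0.0, T)
    return value


theta1 = np.array([1.0, 2.0])
theta2 = np.array([2.0, 1.0])       # different vector, same product


def y_meas(t):
    # Measurement chosen as in the proof: the output generated by theta1.
    return output(t, theta1)


print(cost(theta1, y_meas), cost(theta2, y_meas), cost(np.array([1.0, 1.0]), y_meas))
```

Both $\theta_1$ and $\theta_2$ give zero cost while a third vector with a different product does not, so (13) has more than one minimizer, exactly as the proof argues.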
6. Discussion
As can be seen from Figure 2 and Figure 5, the Kron-reduced mathematical model with the best-fitting parameter values (in the least squares sense) is generally not an adequate approximation, meaning that the corresponding model-predicted values do not fit the available time-series data well. This is because, depending on the number of complexes deleted from the graph of complexes, the Kron-reduced model is in general not guaranteed to be a reasonable approximation of the original mathematical model.
The choice of the Kron reduction technique as a tool for reducing mathematical models in our parameter estimation method is based on several advantages of this technique that make it particularly suitable for the problem. First, it does not impose any restrictions on the choice of complexes to be deleted; thus, we can delete all the complexes containing at least one unmeasured species. Second, the Kron reduction method preserves the kinetics of the CRN, i.e., if MAKRL governs the given CRN, then the corresponding Kron-reduced model is also governed by this rate law. Third, the Kron reduction method allows us to compare the dynamics of the original model with those of the reduced model using the Laplacian matrices of the two models. To the best of our knowledge, no other reduction technique offers all of these advantages.
The proposed parameter estimation method is applicable only to mass action CRNs with a constant Laplacian matrix, because the estimation procedure uses the eigenvalues of the Laplacian matrix. For a general enzymatic CRN, the Laplacian matrix is not constant, since it depends on the vector of species’ concentrations, and it is not clear how a similar technique could then be used for parameter estimation. Parameter estimation for enzyme kinetic reaction networks from partial data of species’ concentrations therefore remains an open problem, which we will consider in future work.
As explained in [37, 41], a linkage class of a CRN is a connected component of its graph of complexes. These papers also state that if a network has a linkage class with only one reaction, then removing a complex involved in that reaction by Kron reduction removes the reaction itself. When the intermediate Kron reduction phase of our parameter estimation procedure removes some of the reactions of the original model, we expect the estimated parameters associated with the removed reactions to have larger confidence intervals than the parameters associated with the remaining reactions of the network.
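A minimal worked case illustrates this statement. It assumes the same column convention for the Laplacian as above (column j collects the reactions leaving complex j) and a hypothetical linkage class consisting of the single reaction C_1 → C_2 with rate constant k_{12}. Under this convention only the substrate C_1 can be eliminated, because the diagonal entry of C_2 is zero and the Schur complement requires an invertible block:

```latex
L = \begin{pmatrix} k_{12} & 0 \\ -k_{12} & 0 \end{pmatrix},
\qquad
\hat{L} = L_{22} - L_{21}\,L_{11}^{-1}\,L_{12}
        = 0 - (-k_{12})\,\frac{1}{k_{12}}\cdot 0 = 0 .
```

The reduced Laplacian is identically zero, so the reaction disappears and the reduced model carries no information about k_{12}. A fit of the reduced model therefore cannot constrain k_{12} directly, which is consistent with the larger confidence intervals expected for such parameters.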
7. Conclusions
In this paper, we have introduced a new parameter estimation approach for mathematical models of mass action CRNs that uses incomplete time-series experimental data of species’ concentrations. As far as we know, no direct technique exists for deducing the parameters of a mathematical model from this kind of experimental data. We have addressed this problem with an algorithmic strategy that applies the Kron reduction technique for kinetic models as an intermediate step of the overall parameter estimation approach. The complexes to be deleted by Kron reduction are chosen so that only the concentrations of measured species appear in the reduced model. Since all species’ concentrations in the Kron-reduced model are measured, the parameter estimation problem for that model is well posed, and we use the least squares method to identify the best-fitting values of its parameters. To estimate the parameters of the original mathematical model, we have devised a new trajectory-independent measure that quantifies the difference between the dynamics of the original model and those of the corresponding Kron-reduced model. It compares the smallest non-zero real part of the eigenvalues of the original Laplacian matrix with that of the Kron-reduced Laplacian matrix. This choice of measure is motivated by the fact that the smallest non-zero real part of the eigenvalues of a Laplacian matrix is related to the settling time of the CRN it characterizes. The measure can be regarded as a function of the parameter vector of the original mathematical model, since the eigenvalues of both the original and the Kron-reduced Laplacian matrices depend only on this vector of parameters.
Finally, we take as our estimates the parameter values for which this spectral measure attains its smallest value.
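The spectral comparison described above can be sketched for a toy chain C1 ⇄ C2 ⇄ C3 with C2 deleted, for which the Kron-reduced Laplacian has a closed form. This is a minimal sketch under stated assumptions: the Laplacian uses the column convention (column j collects the reactions leaving complex j), the measure is taken as the absolute difference of the two smallest non-zero real parts (the exact form used in the paper may differ, for example squared or normalized), and all function names are hypothetical. The two parameter vectors below produce the same reduced Laplacian, so a least squares fit of the reduced model cannot distinguish them, whereas the spectral measure can.

```python
import numpy as np


def chain_laplacians(k12, k21, k23, k32):
    """Original and Kron-reduced Laplacians of the hypothetical chain
    C1 <-> C2 <-> C3 when C2 is deleted (column convention assumed)."""
    L = np.array([[k12, -k21, 0.0],
                  [-k12, k21 + k23, -k32],
                  [0.0, -k23, k32]])
    s = k21 + k23
    a = k12 * k23 / s  # effective rate C1 -> C3 after deleting C2
    b = k32 * k21 / s  # effective rate C3 -> C1 after deleting C2
    L_red = np.array([[a, -b],
                      [-a, b]])
    return L, L_red


def smallest_nonzero_real_part(L, tol=1e-9):
    """Smallest real part among eigenvalues whose real part exceeds tol."""
    re = np.linalg.eigvals(L).real
    return re[re > tol].min()


def spectral_measure(k):
    """Hypothetical measure: |gap(original) - gap(reduced)|."""
    L, L_red = chain_laplacians(*k)
    return abs(smallest_nonzero_real_part(L) - smallest_nonzero_real_part(L_red))


# Both vectors yield effective rates 0.6 and 1.6, i.e. the same reduced model.
k_slow = (1.0, 2.0, 3.0, 4.0)    # C2 relatively slow
k_fast = (1.0, 20.0, 30.0, 4.0)  # outflow from C2 ten times faster
m_slow = spectral_measure(k_slow)  # = sqrt(10) - 2.8, about 0.362
m_fast = spectral_measure(k_fast)  # about 0.043
```

Here the vector with the faster intermediate complex gives the smaller discrepancy, in line with the usual timescale-separation intuition. The sketch only evaluates the measure; in the estimation procedure it would be minimized over the parameter vector, and any constraints that the reduced-model fit imposes on that search are omitted here.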
We have automated our parameter estimation approach in a MATLAB library that can be used to deduce the parameters from experimental data of species’ concentrations automatically. This MATLAB library is provided as Supplementary Material. We used this library to apply our parameter estimation approach successfully to two real-life CRNs taken from the Biomodels database [32]. For each of these models, our parameter estimation method resulted in a complete mathematical model that could accurately predict the dynamics of the CRN. Although we have applied and tested our method only on a limited scale, with two real-life CRNs, its applicability extends to all networks governed by MAKRL. The method places no restriction on the size or scale of the model, or on the biological purpose of the associated network, as long as its reactions are governed by MAKRL. Thus, it is applicable to models of core metabolism (for instance, the E. coli central carbon metabolism models reviewed in [55]) as well as to models of regulatory networks. It can also be applied to small models, such as the NAR model considered in this paper, as well as to genome-scale models such as those in [56].