Next Article in Journal
Emprint Microwave Thermoablation System: Bridging Thermal Ablation Efficacy between Human Patients and Porcine Models through Mathematical Correlation
Next Article in Special Issue
Deciphering Metabolic Pathways in High-Seeding-Density Fed-Batch Processes for Monoclonal Antibody Production: A Computational Modeling Perspective
Previous Article in Journal
Force-Controlled Biomechanical Simulation of Orthodontic Tooth Movement with Torque Archwires Using HOSEA (Hexapod for Orthodontic Simulation, Evaluation and Analysis)
Previous Article in Special Issue
Prediction of Metabolic Flux Distribution by Flux Sampling: As a Case Study, Acetate Production from Glucose in Escherichia coli
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Parameter Estimation for Kinetic Models of Chemical Reaction Networks from Partial Experimental Data of Species’ Concentrations

1
School of Environmental Engineering, University of Seoul, Seoul 02504, Republic of Korea
2
Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon 21985, Republic of Korea
3
Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
*
Author to whom correspondence should be addressed.
Bioengineering 2023, 10(9), 1056; https://doi.org/10.3390/bioengineering10091056
Submission received: 14 July 2023 / Revised: 23 August 2023 / Accepted: 31 August 2023 / Published: 7 September 2023
(This article belongs to the Special Issue Metabolic Modeling and Engineering)

Abstract

:
The current manuscript addresses the problem of parameter estimation for kinetic models of chemical reaction networks from observed time series partial experimental data of species concentrations. It is demonstrated how the Kron reduction method of kinetic models, in conjunction with the (weighted) least squares optimization technique, can be used as a tool to solve the above-mentioned ill-posed parameter estimation problem. First, a new trajectory-independent measure is introduced to quantify the dynamical difference between the original mathematical model and the corresponding Kron-reduced model. This measure is then crucially used to estimate the parameters contained in the kinetic model so that the corresponding values of the species’ concentrations predicted by the model fit the available experimental data. The new parameter estimation method is tested on two real-life examples of chemical reaction networks: nicotinic acetylcholine receptors and Trypanosoma brucei trypanothione synthetase. Both weighted and unweighted least squares techniques, combined with Kron reduction, are used to find the best-fitting parameter values. The method of leave-one-out cross-validation is utilized to determine the preferred technique. For nicotinic receptors, the training errors due to the application of unweighted and weighted least squares are 3.22 and 3.61 respectively, while for Trypanosoma synthetase, the application of unweighted and weighted least squares result in training errors of 0.82 and 0.70 respectively. Furthermore, the problem of identifiability of dynamical systems, i.e., the possibility of uniquely determining the parameters from certain types of output, has also been addressed.

1. Introduction

An important step in understanding the dynamics of a chemical reaction network (CRN) is its mathematical modelling. The dynamics of a CRN are usually determined by a system of ordinary differential equations (ODEs) known as the kinetic model of the CRN. The set of parameters of such a system is usually partially or even entirely unknown and is often estimated from various types of observational data obtained from biochemical experiments. Typically, the experimental data available for estimating the parameters are time series, i.e., the data are collected at discrete points in time.
The bottom-up approach (see, e.g., [1,2]) is a modelling method widely utilized in various research domains, including systems biology. In this method, accessible experimental data are employed to construct a comprehensive mathematical model of a system. The bottom-up modelling technique in systems biology comprises four primary stages. The initial stage is draft reconstruction, which involves the compilation of appropriate data from biological experiments. The second stage involves manually curating the gathered data, for example, by inserting absent values and removing irrelevant data. In the third stage, the knowledge concerning the biological interactions occurring in the CRN is transcribed into a mathematical expression. In the fourth stage, the parameters of this mathematical expression are numerically approximated from the observed experimental data, culminating in a comprehensive mathematical model.
The final stage of the bottom-up approach, namely parameter determination, can be executed using various approaches. The fundamental concept behind any method for parameter estimation is to compare the available experimental data with the corresponding values produced by the mathematical model. To guarantee a well-defined parameter estimation problem, it is essential to possess complete experimental data of external variables. This means having data corresponding to measurements of all external variables in the mathematical model. In the instance of CRNs, the choice of the most suitable technique generally relies on the qualities of the data amassed from biological experiments and the structure of the CRN being considered. Several mathematical techniques have been explored in the literature for parameter estimation of kinetic models of CRNs utilizing experimental data of species concentrations. A prevalent approach to solve the parameter estimation problem for scenarios where all species concentrations can be experimentally measured is the well-known (weighted) least squares technique (see, e.g., [3,4]). The application of this optimization approach minimises the (weighted) summation of squared residuals, i.e., the (weighted) summation of squared differences between the observed experimental concentration values and the corresponding values foreseen by the model. Some of the widely recognized methodologies such as maximum likelihood estimation, finite differences, quasi-linearization procedure, etc. have been deliberated in [5]. For an exhaustive overview of available mathematical techniques, consult [6,7,8,9]. In specific instances, experiments provide data corresponding to reaction rates (see, e.g., [10,11,12,13]). Recently, in [14], a method for parameter estimation from this form of experimental data for enzymatic CRNs was proposed. It is based on the approximation of the vector of species concentrations with parametric Bézier curves [15,16], which, in combination with the general least squares approach, leads to a complete mathematical model.
Bayesian-based parameter estimation techniques have also been widely developed for systems of ODEs (see, e.g., [17,18,19,20,21,22]). In such cases, the vector of parameters involved in the system of ODEs is usually treated as a random variable. A probability distribution (known as prior distribution) such as the normal distribution, the uniform distribution, the Poisson distribution, etc., (see, e.g., [23]) is therefore assigned to the vector of parameters. The technical core of the Bayesian parameter estimation approach is to construct the joint probability distribution (for the vector of parameters and the available data corresponding to the vector of dependent variables) and to perform computations to determine the posterior distribution (the conditional distribution of the parameters, given the experimental data corresponding to the dependent variable). Even though Bayesian-based approaches are useful techniques for parameter estimation and are extensively applied to CRNs (see, e.g., [24,25,26,27,28]), such approaches have many shortcomings. One of the main shortcomings is that most of these methods use only the available experimental data and do not consider, for the most part, the structure of the system of ODEs. Moreover, such an approach is mostly not straightforward to apply and may require huge computational efforts. Another shortcoming is the ambiguity in the construction of appropriate probability specifications since there is no explicit strategy for properly assigning a prior distribution. Depending on the characteristics of the available experimental data, it cannot always be assured that the model-predicted values corresponding to the obtained Bayesian estimates will be a satisfactory fit for the available experimental data. In such cases, a separate analysis of the fit of the model is required.
In most cases, not all concentrations can be measured experimentally. This incompleteness of the data makes the problem of parameter estimation more challenging, both mathematically and computationally, as we do not have a well-posed parameter estimation problem in this case. As expected, there is no general direct solution to this problem in the literature. Therefore, new mathematical techniques are necessary for estimating the parameters included in mathematical models using this type of data. In this manuscript, we address the problem of parameter estimation for CRNs from observed time series partial experimental data of species’ concentrations. We are mainly interested in CRNs governed by the mass action kinetic rate law (MAKRL), since this is the law that governs most real-life CRNs. We assume that measurements of concentrations are available for some of the species at discrete time points, which may not necessarily be equidistant. Direct estimation of the parameters of the mathematical model using the available partial experimental data is usually not feasible. This is because we do not have a well-posed parameter estimation problem, since not all concentrations involved in the mathematical model are measured experimentally. Therefore, we employ the Kron reduction method [29] to transform the ill-posed parameter estimation problem into a well-posed problem.
Before performing parameter estimation, it is important to understand whether the parameters are identifiable given a particular type of output, i.e., whether there is a unique parameter vector corresponding to the given output vector. We address the parameter identifiability issue of dynamical systems described by a system of ODEs. The uniqueness of the parameter vector depends on the structure of the system under consideration and the type of the available outputs. First, we recall two known definitions of parameter identifiability and then demonstrate the link between them
Our parameter estimation technique consists of three major steps. The first of these involves reduction of the known original model of the biochemical reaction network with unknown parameters using the technique of Kron reduction [29]. Of the several available techniques for model reduction of CRNs in the literature (see [30,31] for a thorough review of such techniques), we chose Kron reduction method for two particular reasons. The first and foremost is that Kron reduction preserves the kinetics of the original model. Thus, if the original network is governed by MAKRL, the reduced model obtained by Kron reduction also corresponds to another CRN, whose variables are concentrations of chemical species that are a subset of the species of the original network and moreover, the reduced CRN is also governed by MAKRL. This kinetics preservation property does not hold for several other known reduction techniques. The second reason is that by using Kron reduction method, it is possible to arrive at a reduced model of a CRN, whose dependent variables are exactly the set of concentrations of compounds whose time-series data are available. The (unknown) parameters of this reduced model are functions of the parameters of the original mathematical model. The availability of this reduced model with unknown parameters together with the time-series data of all the dependent variables of this model leads to a well-posed parameter estimation problem. The solution of this parameter estimation problem is the second step of our procedure. Because of its simplicity and computational feasibility, we apply the least squares optimization technique to deal with this estimation problem. In the final step, we solve an optimization problem in which the parameters of the original model are determined in such a way that the original mathematical model and the Kron reduced model have minimum difference between a key characteristic property associated with them. It is shown that this key characteristic property is related to the settling time of the corresponding CRN in case the given model is linear. The entire procedure has been automated and the corresponding MATLAB library is provided as Supplementary Material.
We apply our new techniques to two realistic examples of CRNs from the Biomodels database [32]. We consider a model of nicotinic acetylcholine receptors [33] and a model of Trypanosoma brucei trypanothione synthetase [34]. For each of these models, we first generate partial time series data of the species’ concentrations using the parameter values given in the corresponding reference. We then explain how to derive the Kron-reduced mathematical model obtained by deleting a subset of complexes and determine the values for its parameters such that the dynamics of the Kron-reduced model in the least squares sense most closely approximate the available time series data. Using these estimated values for the parameters of the Kron-reduced model and their dependence on the parameters of the original mathematical model, we finally determine the estimates for the parameters contained in the original model.

2. Background

In this section, we give a compact description of the mathematical tools that are necessary for demonstrating the main results of the current manuscript.

2.1. Notations

We introduce the notations that will be used throughout the rest of the manuscript. For a vector v R m , v i refers to its ith entry, i.e., v = [ v i ] i = 1 m . Γ v = diag ( v ) denotes the m × m diagonal matrix, whose diagonal entries are the entries of the vector v. M i j is the entry of the matrix M corresponding to the ith row and the jth column. The eigenvalues of the m × m square matrix M are denoted by λ 1 ( M ) , , λ m ( M ) . The spectrum of the square matrix M is denoted by σ ( M ) , i.e., σ ( M ) = λ i ( M ) ; i = 1 , , m . u ¯ is the complex conjugate of the complex number u C . ( u ) and ( u ) denote the real and imaginary parts of the complex number C , respectively. | I | is the cardinality of the set I . Vectors of length m composed entirely of ones and zeros are respectively denoted as 𝟙 m and 𝟘 m . Furthermore, 𝟘 m × n is the m × n matrix with all its entries set to zero.

2.2. Mathematical Models

We outline the process of deriving a mathematical formulation that describes the dynamics of a CRN. Let X i , i = 1 , , s , be the set of s distinct chemical species of the considered CRN with r unidirectional reactions R j , j = 1 , , r . The connection between the species and the reactions is established through an s × r stoichiometric matrix denoted as S. Its elements are determined as S i j = β j i α j i , with α j i representing the number of moles of the ith species X i within the substrate of the jth reaction R j , and β j i indicating the same for the product of the reaction. Denote by ν R + r the vector of unidirectional reaction rates. This vector is dependent on both the species’ concentration vector x and the parameter vector κ R + q inherent to the model. The fundamental framework that characterizes the evolution of the species concentration vector is given by the stoichiometric representation of the balance laws as:
d x d t = S ν ( κ , x ) .
Graphs are crucial tools for modelling various types of relations and processes in numerous scientific domains including systems biology. In chemical reaction network theory (CRNT), directed graphs are commonly used in the process of modelling CRNs to display the link between individual reactions. The complexes of a CRN are defined as the left-hand (substrate) and right-hand (product) sides of the reactions. Let C n , n = 1 , , c , denote the set of distinct complexes of the considered CRN. The complexes can be inherently linked with the vertices of a directed graph, where the directed edges align with the reactions present within the CRN. More precisely, if there is a reaction for which the complex C i is the substrate and the complex C j is the product, then in the corresponding graph of complexes there is a directed edge having the vertex associated with the complex C i as the tail vertex and the vertex associated with the complex C j as the head vertex. A linkage class of a CRN is a connected component of the corresponding graph of complex, i.e., a maximal set of complexes such that every complex in the set is connected by a directed edge to at least one other complex.
Any CRN can be uniquely described by a system of ODEs given in (1), independent of its governing laws. In this manuscript, we consider only mass action kinetics rate law, since it is the governing law of a wide range of real-life CRNs. According to this rate law, the rate of a reaction is directly proportional to the concentration of each species involved in the substrate of the reaction, raised to a power equal to the number of its moles in the expression of this substrate. More precisely, the reaction rate ν j of the jth reaction, j = 1 , , r , is given in the following form
ν j = k j i = 1 s x i α i j ,
where, as earlier, k j , j = 1 , , r , is the rate constant of the jth reaction and α i j , i = 1 , , s , is the number of moles of the species X i in the substrate of the jth reaction. Note that in this case, the only parameters contained in the mathematical model are the rate constants of the reactions, i.e., κ = k and the number of unknown parameters is q = r . Next, we obtain an expression for the vector of reaction rates ν R + r given in terms of matrix multiplication, which is a useful approach for automated modelling purposes. Define the s × r the substrate composition matrix Ω and the substrate expression function φ : R + s R + r of the CRN as:
Ω i j = S i j , if S i j < 0 S 0 , otherwise , φ ( x ) = i = 1 s x i Ω i j j = 1 r .
The conductance matrix  Γ k of the CRN is a r × r diagonal matrix whose ith diagonal entry is the rate constant of the ith reaction R i , i.e., if k R + r is the vector of rate constants, then Γ k = diag ( k ) . Then observe that the vector of reaction rates can be expressed in the following matrix multiplication form:
ν ( k , x ) = Γ k φ ( x ) .
Example 1.
To elucidate the outlined modelling process, we apply it to the subsequent example of a CRN. Consider a scenario where five chemical species, denoted as X i , for i = 1 , , 5 , are engaged in three distinct unidirectional reactions, given as follows:
X 1 + X 2 X 3 X 4 + X 5 .
For i = 1 , , 3 , let k i be the rate constant of the ith reaction. Observe that the second reaction can be interpreted as the reverse of the first reaction. In our modelling approach, we treat each reversible reaction as a pair of distinct unidirectional reactions. There are three distinct complexes C i , i = 1 , , 3 , involved in the CRN, which are given as C 1 = X 1 + X 2 , C 2 = X 3 , and C 3 = X 4 + X 5 . For the first reaction, since the complex C 1 is the substrate and the complex C 2 is the product, in the graph of complexes corresponding to the CRN shown in (4) there is a directed edge having C 1 as the tail vertex and C 2 as the head vertex. Similarly, we can construct the edges of the graph of complex corresponding to the other reactions. The resulting graph of complexes is thus C 1 C 2 C 3 . Note that this graph of complexes consists of only a single linkage class.
Given the assumption that the reactions (4) are governed by MAKRL, the reaction rates can be computed by Equation (2) as follows:
ν 1 = k 1 x 1 x 2 , ν 2 = k 2 x 3 , ν 3 = k 3 x 3 .
The vector of reaction rates can be written in the matrix multiplication form (3) with the substrate composition matrix Ω and the substrate expression function φ given by
Ω = 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 , φ ( x ) = x 1 x 2 x 3 x 3 .
Thus, in this case the balance laws (1) can be written as:
d x 1 d t = d x 2 d t = k 1 x 1 x 2 + k 2 x 3 , d x 3 d t = k 1 x 1 x 2 k 2 x 3 k 3 x 3 , d x 4 d t = d x 5 d t = k 3 x 3 .

2.3. The Weighted Directed Laplacian Matrix

In CRNT, the (weighted directed) Laplacian matrix is a matrix representation of the reactions occurring between the different complexes. Here we explain how to construct the Laplacian matrix of the CRN using the (weighted directed) adjacency matrix corresponding to its graph of complexes. The adjacency matrix A is a c × c matrix, with c being the number of complexes of the CRN, such that its entry A i j is equal to k if there is a reaction having the jth complex of the network as substrate and ith complex of the network as a product with k being the rate constant of the reaction. The (weighted directed) degree matrix D of the graph of complexes corresponding to a CRN is a c × c diagonal matrix such that its ith diagonal entry is equal to the sum of the elements of the ith column of the weighted adjacency matrix A. The c × c Laplacian matrix associated with the graph of complexes of the considered CRN is defined as follows:
L = D A .
If there is a reaction for which the complex C j is the substrate and the complex C i is the product, then the off-diagonal element L i j is equal to the rate constant of the respective reaction taken with the negative sign. For useful properties of Laplacian matrices, we refer to [35]. Any directed graph is defined by an incidence matrix [36], which represents the connections between its vertices and edges. In the case of CRNs, the c × r incidence matrix B of the graph of complexes is defined as follows:
B i j = 1 , if the complex C i is the substrate of the reaction R j , 1 , if the complex C i is the product of the reaction R j 0 , otherwise .
Define the c × r outgoing matrix Δ of the considered CRN as follows:
Δ i j = 0 , if B i j = 1 B i j , otherwise .
It can be shown that the Laplacian is given in matrix multiplication form as follows:
L = B Γ k Δ .
For automatic modelling purposes, it is convenient to construct the Laplacian matrix using the simple matrix multiplication form given in (7).
Next, we show how to represent the balance laws (1) in terms of the weighted directed Laplacian matrix L. The c complexes of the considered CRN are described by an s × c complex composition matrix Z, whose columns express the complexes of the CRN in terms of their species. More precisely, the element Z i j of the complex composition matrix Z is the number of moles of the ith species X i in the expression of jth complex C j . As explained in [29,37], it can be shown that the balance laws of a mass action CRN can be rewritten as follows:
d x d t = Z L ψ ( x ) ,
where ψ : R + s R + c is the complex expression function defined as:
ψ ( x ) = i = 1 s x i Z i j j = 1 c .
Example 2.
With reference to the CRN example (4), the 3 × 3 weighted adjacency matrix A and the 3 × 3 weighted degree matrix D are:
A = 0 k 2 0 k 1 0 0 0 k 3 0 , D = k 1 0 0 0 k 2 + k 3 0 0 0 0 .
The 3 × 3 weighted directed Laplacian L of the CRN (4) is therefore given by
L = k 1 k 2 0 k 1 k 2 + k 3 0 0 k 3 0 ,
On the other hand, the Laplacian matrix can be computed using Equation (7) with the incidence matrix B and the outgoing matrix Δ given by:
B = 1 1 0 1 1 1 0 0 1 , Δ = 1 0 0 0 1 1 0 0 0 .
The balance laws (5) can be rewritten as Equation (8) with the 5 × 3 complex composition matrix Z and the complex expression function ψ given by:
Z = 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 , ψ ( x ) = x 1 x 2 x 3 x 4 x 5 .
In the following theorem we recall certain important spectral properties (see, e.g., [38]) of the weighted directed Laplacian matrix associated with the graph of complexes corresponding to a CRN governed by MAKRL.
Theorem 1
(Spectrum of the weighted directed Laplacian matrix). If the graph of complexes of a CRN governed by MAKRL has a single linkage class, then the eigenvalues λ i , i = 1 , , c , of the weighted directed Laplacian matrix L associated with the CRN can be ordered in such a way that:
0 = λ 1 < ( λ 2 ) ( λ 3 ) ( λ c ) .
First note that 0 σ ( L ) . This is simply because of the fact that the sum of each column of L is equal to zero according to the definition of the Laplacian matrix, i.e., L 𝟙 c = 𝟘 c . Moreover, from the generalization of the matrix-tree theorem it follows that the multiplicity of the zero eigenvalue is equal to the number of connected components, which is c rank ( L ) . Using Greshgorin’s circle theorem [39], it can be shown that the real parts of non-zero complex eigenvalues of L are strictly positive, i.e., if λ σ ( L ) and λ 0 , then ( λ ) > 0 . For a detailed explanation of the proof of Theorem 1 we refer to [40].

2.4. Kron Reduction of Chemical Reaction Networks

The Kron reduction for mathematical models of CRNs [29,37,41] is performed by assuming that certain intermediate complexes are complex balanced and is carried out by computing the Schur complement of the weighted Laplacian matrix associated with the corresponding graph of complexes. We therefore first recall the definition of Schur complements (see, e.g., [42]) of a given square matrix.
Definition 1
(Schur complement). Let A 1 R n × n , A 2 R n × m , A 3 R m × n , and A 4 R m × m be constant matrices such that the latter is invertible. Consider the following ( n + m ) × ( n + m ) block matrix:
L = A 1 A 2 A 3 A 4 .
The Schur complement of the block matrix A 4 is the n × n matrix L ^ defined as:
L ^ = A 1 A 2 A 4 1 A 3 .
Let I be the set of indices corresponding to the complexes of the CRN, i.e., I = { 1 , , c } . Suppose our objective involves removing the complexes associated with the subset of indices denoted as I ¯ . Note that it should be ensured that | I ¯ | < c . The removal of complexes is accomplished through the computation of the Schur complement L ^ R | I ^ | × | I ^ | of the block matrix of the Laplacian matrix L corresponding to the set of indices I ¯ . Here, I ^ = I I ¯ is the set of indices corresponding to the complexes remaining in the reduced graph of complexes. L ^ is again a Laplacian matrix since it satisfies the properties of Laplacian matrices ([29], Proposition 1). Furthermore, it has been proven that the equation
d y d t = Z ^ L ^ ψ ^ ( y ) ,
describes the dynamics of a CRN governed by MAKRL, with a smaller number of complexes. Here, y R + s is the vector of species’ concentrations in the reduced mathematical model (which contains a subset of the elements of x.), Z ^ is the complex composition matrix of the reduced CRN, and
ψ ^ ( y ) = i = 1 s y i Z ^ i j j = 1 | I ^ | .
As explained in [29,37], a well-chosen I ¯ will result in a reduction of dependent variables within the corresponding mathematical model. Note that the parameters contained in the Kron-reduced mathematical model can be represented as a function of the parameters involved in the original model. More precisely, if p R + r ^ denotes the vector of parameters of the reduced model with r ^ being the number of reactions in it, then there is a function f : R + r R + r ^ such that p = f ( k ) . In general, the manual derivation of the explicit form of the function f is not straightforward. However, we use MATLAB symbolic variables to derive the explicit form of f in a fully automated fashion. We refer to the function f as the parameter dependence function, since it specifies the dependence of the vector of parameters p of the reduced model on the vector of parameters k of the original model.
In order to determine the structure of the reduced CRN, we need to find the incidence matrix and the complex composition matrix of the reduced network. This can be done according to the automated procedure described in [37]. The incidence matrix B ^ is determined by making use of its Laplacian matrix L ^ . According to this procedure, if L ^ i j < 0 , i j , then in the reduced graph of complexes there is a reaction for which the jth complex of the reduced CRN is the substrate and ith complex of the reduced CRN is the product complex. Therefore, the entries of the incidence matrix B ^ of the reduced graph of complexes are defined according to (6). The complex composition matrix Z ^ of the reduced CRN is obtained by simply removing the columns of the incidence matrix Z of the original CRN that correspond to the set of indices I ¯ . As mentioned earlier, the incidence matrix describes the reactions occurring between the complexes and the complex composition matrix gives the expression of complexes in terms of the species. We therefore use B ^ and Z ^ to determine the reactions corresponding to the reduced graph of complexes.
Example 3.
To illustrate the Kron reduction method for CRNs, we demonstrate it for the example given in (4). Assume that we want to delete the complex C 2 from the graph of complexes by applying the Kron reduction method. In other words, I ¯ = 2 and I ^ = 1 , 3 . The elimination of C 2 is carried out by computing the Schur complement of the Laplacian matrix corresponding to the set of indices I ¯ , which results in the following 2 × 2 weighted directed Laplacian matrix associated with the reduced graph of complexes:
L ^ = k 1 k 3 k 2 + k 3 0 k 1 k 3 k 2 + k 3 0 .
As explained in [37], the complex composition matrix Z ^ of the reduced CRN is obtained by eliminating the second column of the complex composition matrix Z of the original CRN, i.e.,
Z ^ = 1 0 1 0 0 0 0 1 0 1
Thus the balance laws of the Kron-reduced model are as follows:
d y 1 d t = d y 2 d t = k 1 k 3 k 2 + k 3 y 1 y 2 , d y 3 d t = 0 , d y 4 d t = d y 5 d t = k 1 k 3 k 2 + k 3 y 1 y 2 .
Using the Laplacian matrix L ^ of the Kron-reduced model we obtain the incidence matrix B ^ of the reduced complex graph:
B ^ = 1 1 .
Taking into account Z ^ and B ^ we derive the reactions of the reduced CRN:
X 1 + X 2 p X 4 + X 5 ,
where the parameter p is given in terms of the parameters of the original model as p = k 1 k 3 k 2 + k 3 , i.e., in this case for the explicit form of the function f we have f ( k ) = k 1 k 3 k 2 + k 3 . Note that after deleting the complex C 2 from the graph of complexes by the Kron reduction approach, the species X 3 is not involved in the resulting reduced CRN, since its concentration x 3 is conserved in time.
In [29,37], the optimal combination of complexes for deletion is selected by making use of an error integral, which quantifies the difference between the dynamical behaviors of the original model and the corresponding reduced model. This error integral is measure that is based on a particular trajectory corresponding to the original model.

2.5. The Least Squares Optimization Method

We explain how to apply the (weighted) least squares optimization technique to estimate the parameters κ R + p involved in the kinetic model (1) describing the dynamics of the given CRN. For the general least squares optimization method we refer to, for example, [3,4]. This optimization method plays a crucial role in our parameter estimation method.
Assume that biological experiments provide complete experimental data of species’ concentrations, i.e., measurements that correspond to all of the species’ concentrations. For j = 1 , , n , let x ^ i , m ( j ) , i = 1 , , s ; m = 0 , , N j , be the observed value of the ith concentration at time instant t m ( j ) , which is the mth time point corresponding to the jth experiment. We aim to identify the best-fitting parameter values of the mathematical model (1) corresponding to the above-mentioned observed time-series experimental data of species’ concentrations. For every j = 1 , , n , consider the following initial value problem (IVP):
d x d t = S ν ( κ , x ) x ( t 0 ( j ) ) = x ^ i , 0 ( j ) .
Since the available experimental data corresponds to the measurements of all the species’ concentrations, for every parameter vector κ R + p the IVP given in (10) generates data for the species’ concentrations. More precisely, the IVP (10) can be numerically solved with respect to time. However, this numerical integration is not always possible if the available experimental data corresponds to only some of the concentrations.
For every j = 1 , , n , let x i ( t m ( j ) , κ ) , i = 1 , , s ; m = 0 , , N j , denote the model-predicted value of the ith concentration obtained by numerically solving the IVP (10). In this case, the least squares error is defined as the sum of squared residuals, which are the differences between the observed experimental values of concentrations and the corresponding model-predicted values provided from the IVP given in (10):
ε ( κ ) = j = 1 n i = 1 s w i ( j ) m = 0 N j x i ( t m ( j ) , κ ) x ^ i , m ( j ) 2 .
Here, for i = 1 , , s and j = 1 , , n , w i ( j ) denotes the weight of the corresponding measurement. In this case, it is assumed that the measurements have different uncertainties. Each weight can be taken, for example, equal to the reciprocal of the variance of the measurement:
w i ( j ) = 1 σ i ( j ) 2 , i = 1 , , s ; j = 1 , , n .
We will refer to this approach as the weighted least squares (WLS). If the measurements have equal variance, then the weights can be taken equal to one. We will refer to the corresponding approach as the unweighted least squares (UWLS). The (weighted) least squares optimization technique finds the optimal parameter values by minimizing the error (11). This minimization can be done, for example, by the standard Levenberg-Marquardt algorithm [43,44], or the modified Levenberg-Marquardt algorithm [45]. We denote by κ ^ R + p the solution to the aforementioned optimization problem, i.e., κ ^ = arg min κ ε ( κ ) .

3. Parameter Identifiability

In this section, we first recall the definitions of least squares parameter identifiability and parameter identifiability, and in addition demonstrate the link between these two identifiability concepts. Consider a dynamical system described by a system of ODEs that is given in following the form:
d x d t = f ( κ , x ) y = g ( κ , x ) ,
where f is an s-dimensional vector-valued function depending on the structure of the system and g is n-dimensional. Here, κ R + p is the parameter vector, x : R + R + s is the vector of states of the system and y is the n-dimensional output vector. For a given vector of initial states x 0 R + s , let y ( t | κ , x 0 ) denote the output trajectory of the system (12) corresponding to the parameter vector κ R + p and initial states x 0 R + s .
Assume that, in an experimental setup, the output has been continuously measured over the time interval [ 0 , T ] , for some pre-specified T > 0 . Let y ^ : R + R n be the resulting measured output. Consider the cost function ε x 0 : R + p R + defined as:
ε x 0 ( κ ) = i = 1 n 0 T y i ( t | κ , x 0 ) y ^ i ( t ) 2 d t ,
We recall the definition of least squares parameter identifiability of dynamical systems given in the form (12), which was first introduced in [46].
Definition 2
(Least squares parameter identifiability). The dynamical system (12) is least squares parameter identifiable, if for every given vector of initial states x 0 and for every given measurement function y ^ , the cost function (13) admits a unique minimum.
If there exists at least one vector of initial states x 0 and a measurement function y ^ such that the cost function (13) has multiple minima, then the dynamical system (12) is least squares parameter nonidentifiable.
Each vector of parameters κ R + p determines a set Y κ of admissible output trajectories of the system (12). We recall the definition of parameter identifiability of dynamical systems provided in [14].
Definition 3
(Parameter identifiability). The dynamical system (12) is parameter identifiable, if for parameter vectors κ , κ ¯ R + p such that κ κ ¯ , we have Y κ Y κ ¯ .
Equivalently, the dynamical system (12) is parameter identifiable, if Y κ = Y κ ¯ for two parameter vectors κ , κ ¯ R + p , implies κ = κ ¯ . If there are two distinct parameter vectors κ κ ¯ for which Y κ = Y κ ¯ , then the dynamical system (12) is parameter unidentifiable.
We finally turn our attention to the main contribution of this section. In the following theorem we specify a link between the two identifiability concepts given above.
Theorem 2.
If the dynamical system (12) is parameter unidentifiable, then it is also least squares parameter unidentifiable.
Proof. 
We assume that the dynamical system (12) is parameter unidentifiable and we prove that it is also least squares parameter unidentifiable. Since the dynamical system (12) is parameter unidentifiable, there are two parameter vectors κ , κ ¯ R + p , such that κ κ ¯ and Y κ = Y κ ¯ . This means that there is at least one vector of initial states x 0 R + s for which y ( t | κ , x 0 ) = y ( t | κ ¯ , x 0 ) , t [ 0 , T ] . Choose a measurement function y ^ : R + R n as:
y ^ ( t ) : = y ( t | κ , x 0 ) = y ( t | κ ¯ , x 0 ) , t [ 0 , T ] .
Note that this choice results in ε x 0 ( κ ) = ε x 0 ( κ ¯ ) = 0 . For two parameter vectors κ κ ¯ there is a vector of initial states x 0 R + s and a measurement function y ^ : R + R n such that ε x 0 ( κ ) = ε x 0 ( κ ¯ ) = 0 . Note that zero is the minimum value of ε x 0 since it is a non-negative function. We conclude that the minimum of ε x 0 is not unique in this case and thus the dynamical system (12) is least squares parameter unidentifiable. □

4. Parameter Estimation Procedure

In this section, we describe the main contribution of the current manuscript. We show how to use the Kron reduction method for kinetic models described in the previous section, as a tool for estimating the parameters involved in the corresponding mathematical model from time-series partial experimental data of species’ concentrations.

4.1. Problem Statement: Available Experimental Data

Typically, biological experiments provide measurements of species’ concentrations or reaction rates. In this manuscript, we assume that in an experimental setup some of the species’ concentrations are measured. Suppose that the output of a biological experiment corresponding to the mathematical model (8) is of the form:
z = H x ,
where H R n × s is a constant matrix known as the measurement matrix. In general, H can have an arbitrary structure. However, in general only some of the species’ concentrations are measured experimentally. Thus, we assume that each row of H has only one non-zero element, which is equal to one and is placed in the position corresponding to the particular species. For instance, with reference to the CRN (4), if the species that are measured experimentally are X 1 , X 2 , X 4 and X 5 , then
H = 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 .
Further assume that we have l different sets of time-series data of z collected from biological experiments on the same CRN. For j = 1 , , l ;   i = 1 , , n and m = 0 , , N j , let z ^ i , m ( j ) be the observed value of the ith output at time instant t m ( j ) corresponding to the jth experiment. For compactness, we consider the following observed experimental time-series data sets:
Λ ( j ) = t m ( j ) , z ^ i , m ( j ) i = 1 , , n ; m = 0 , , N j , j = 1 , , l .
We aim to identify the best-fitting parameters corresponding to the mathematical model (8) of the considered CRN from observed time-series data Λ ( j ) , j = 1 , , l , collected from biological experiments.

4.2. Trajectory-Independent Error

We propose a trajectory-independent alternative method to the computation of the error integral defined in [29] for comparing the dynamics of the Kron-reduced model to the one of the original mathematical model. In our parameter estimation procedure, this error will be used as an objective function for minimization. It is based on matching the so called relaxation constant of the Kron-reduced mathematical model to the one of the original mathematical model. We define the relaxation constant  τ ( L ) of a CRN as the smallest non-zero real part of the eigenvalues of its Laplacian matrix, i.e.,
τ ( L ) = min λ σ ( L ) λ 0 ( λ ) .
The quantity τ ( L ) illustrates a key characteristic property of the CRN that is represented by the Laplacian matrix L. In the case when each complex is a single-species complex, i.e., Z = I s , it can be shown that relaxation constant τ ( L ) is an exact indicator of the settling-time of the CRN, i.e., the time instant after which the concentrations of all the species fall and remain within some specified percentage of their corresponding steady-state values. This case is encountered commonly as in one of our demonstrative real-life examples considered in the next section. In this case, the solution to the balance laws (8) is of the form:
x ( t ) = i = 1 q j = 0 p i 1 t j w ( i j ) ( t ) e ( λ i ) t ,
where p i is the multiplicity of the eigenvalue λ i and
w ( i j ) = b ( i j ) cos ( λ i ) t + c ( i j ) sin ( λ i ) t with a , b ( i j ) , c ( i j ) R s .
Since the real parts of eigenvalues of the Laplacian matrix L are all nonnegative, from (16) it follows that the settling-time of the CRN is determined by the slowest decaying term within the summation sign in the right hand side of the equation. Observe that for every i = 1 , , q the terms t j w ( i j ) ( t ) e ( λ i ) t , j = 1 , , p i 1 , decay faster than the term w ( i 0 ) ( t ) e ( λ i ) t whose time constant is equal to ( λ i ) 1 . Among the terms w ( i 0 ) ( t ) e ( λ i ) t , the slowest decaying term is the one with the biggest time constant, i.e., the one with the smallest value of ( λ i ) . Thus, for comparing the settling-time of the Kron-reduced model with the one of the original model it is reasonable to compare the relaxation constant τ ( L ^ ) of the Kron-reduced model with the relaxation constant τ ( L ) of the original mathematical model. In general, irrespective of whether the complex composition matrix Z is the identity matrix or not, we quantify the difference between the dynamics of the original mathematical model and the one of the Kron-reduced model using the trajectory-independent spectral based error δ : R + r R + defined as:
δ ( k ) = τ ( L ) τ ( L ^ ) .
Note that the spectral based error δ defined in (17) is a function that only depends on the vector of parameters k contained in the original model. This is due to the fact that the entries as well as the eigenvalues of both the original Laplacian matrix L and the Kron-reduced Laplacian matrix L ^ are functions of the parameter vector k.
Remark 1.
In the physical sciences, the term “relaxation" usually refers to the return of a system to equilibrium. Each relaxation process can be categorized by a relaxation time (see, e.g., [47,48,49]). Since the quantity τ ( L ) is an indicator of the settling time of the CRN, i.e., its relaxation, we refer to the constant τ ( L ) as the relaxation constant of the CRN described by the Laplacian matrix L.

4.3. Estimation Procedure

We turn our attention to the parameter estimation procedure. We assume that in an experimental setup certain measurements Λ ( j ) , j = 1 , , l , of the form given in (15) are collected. The principal goal is to find estimates for the parameters k of the original model (8) such that the corresponding model-predicted values fit the available time-series partial experimental data (15).
As explained in Section 2.5, since we do not have complete data on species’ concentrations, direct estimation of the parameters involved in the balance laws (8) is not possible. In order to convert the problem to a well-posed parameter estimation problem, we first derive the Kron-reduced mathematical model obtained after deleting a certain subset of complexes from the corresponding graph of complexes. For this purpose, we identify this subset of complexes by making use of the complex composition matrix Z and the measurement matrix H. The latter specifies the set of indices J corresponding to the species that are measured experimentally. For instance, if the measurement matrix H is the one given in (14), then the above-mentioned set of indices is simply J = 1 , 2 , 4 , 5 , i.e., only the species X i , i = 1 , 2 , 4 , 5 , are measured experimentally. Furthermore, the complex composition matrix Z identifies the set of complexes I ¯ that have at least one unmeasured species. We now delete the complexes corresponding to the set of indices I ¯ using the Kron reduction method. The s ^ × c ^ complex composition matrix Z ^ and the c ^ × r ^ incidence matrix B ^ are computed using the procedure explained earlier.
Subsequently, we determine the parameter dependence function f : R + r R + r ^ . In order to express the vector of parameters p R + r ^ of the Kron-reduced model as a function of the vector of parameters k R + r of the original model, i.e p = f ( k ) . In general, the explicit form of this function is complicated and its manual derivation is not straightforward. However, the usage of MATLAB symbolic variables allows us to derive the explicit form of the parameter dependence function f. Using the properties of the Kron reduction method for CRNs governed by mass action kinetics rate law, we conclude that the dependence function is a vector-valued rational function of its argument.
We now consider the parameter estimation problem for the Kron-reduced mathematical model from the available observed time-series experimental data of species’ concentrations (15). It is a well-posed parameter estimation problem since the available experimental data corresponds to all the concentrations involved in the Kron-reduced mathematical model. We apply the least squares optimisation technique to determine the best-fitting values of parameters p ^ R r ^ , i.e., the parameter values for which the available observed time-series experimental data Λ ( j ) , j = 1 , , l given in (15) fits the Kron-reduced model.
Finally, we determine the values of parameters k ^ R r for which the trajectory-independent spectral-based error function δ given in (17) admits its minimum value subject to the constraint
f ( k ) = p ^ .
In other words, we have the following well-posed constrained optimisation problem
min k R + r δ ( k ) subject to f ( k ) = p ^ .
We use the method of Lagrange multipliers to solve the above-mentioned problem using MATLAB Optimization Toolbox. We developed a MATLAB library for the automation of our estimation procedure. The inputs required for our parameter estimation procedure are the s × c complex composition matrix Z, the c × r incidence matrix B, the n × s measurement matrix H, and the observed time-series experimental datasets Λ ( j ) , j = 1 , , l . The output of the parameter estimation method is the vector k ^ R + r of the best-fitting parameter values corresponding to the experimental time-series observed data given in (15).

5. Application to Real-Life Examples

We demonstrate the applicability of our automated parameter estimation algorithm from observed time-series partial experimental data of species’ concentrations on two real-life computational models of biological processes retrieved from the BioModels database [32]. We consider a model of nicotinic acetylcholine receptors [33] and a model of Trypanosoma brucei trypanothione synthetase [34]. For each of these models, we first generate partial time-series data corresponding to the species’ concentrations using the values of parameters provided in the corresponding reference. Next, we perturb these generated data with white Gaussian noise with zero mean and sufficiently small standard deviation. We then apply our estimation technique to determine the best-fitting values of parameters in a fully automated manner. The corresponding MATLAB library is provided as Supplementary Material.

5.1. Nicotinic Acetylcholine Receptors

We consider a model of nicotinic acetylcholine receptors (NAR) developed in [33]. Nicotinic receptors are receptor polypeptides (short chains of amino acids) that respond to the neurotransmitter acetylcholine (a signalling molecule secreted by a neuron) as well as to drugs such as the agonist nicotine. They are found in the central and peripheral nervous systems, muscles, and several other tissues of different organisms. A schematic representation of the network corresponding to the mathematical model of NAR is provided in the left-hand panel of Figure 1. A detailed description of the mathematical model can be found in [33].

5.1.1. The Considered Model and the Available Data

In the considered model of NAR, there are 12 chemical species participating in 17 reversible mass action reactions. All complexes here are single-species complexes. There are thus 12 complexes in the corresponding graph of complexes. As mentioned earlier, for the purpose of automated modelling purposes we regard each reversible reaction as two separate unidirectional reactions. Table 1 provides an overview of the primary compounds participating in the reaction system.
The rest of the compounds are intermediate enzyme complexes participating in the reaction network. The network consists of the following reactions, all of which are reversible.
B k 2 k 1 BL ,   B k 12 k 11 A ,
BL k 4 k 3 BLL ,   BL k 14 k 13 AL ,
BLL k 6 k 5 ALL ,   I k 16 k 15 IL ,
A k 8 k 7 AL ,   IL k 18 k 17 ILL ,
AL k 10 k 9 ALL ,   A k 20 k 19 I ,
AL k 22 k 21 IL ,   I k 30 k 29 D ,
ALL k 24 k 23 ILL ,   IL k 32 k 31 DL ,
D k 26 k 25 DL ,   ILL k 34 k 33 DLL .
DL k 28 k 27 DLL ,
The values of the parameters k i , i = 1 , , 34 , provided in [33] are given in Table 2. We assume that the species that are measured in an experimental setup are the ones having a greater impact on the reactions, which are the activatable resting closed state B (the Basal state), the state of higher affinities with open channel A (the Active state), the states of higher affinities with closed channels I (the Inactivatable state) and D (the Desensitized state). In other words, the corresponding measurement matrix is:
H = I 4 𝟘 4 × 8 .
We generate data for these species using the values of parameters provided in [33]. Additionally, we perturb the obtained data with white Gaussian noise of zero mean and standard deviation 0.0001 . We explain how to estimate the parameters involved in the corresponding mathematical model from the above-mentioned time-series partial data of concentrations using our parameter estimation method.

5.1.2. Proving Parameter Unidentifiability

Before we perform the parameter estimation, we show that the parameters of the mathematical model under consideration cannot be uniquely determined from the available partial time series data of the species’ concentrations. The reason is that we are able to provide two different vectors of parameters, both leading to the same system of ODEs corresponding to the measured concentrations of the species. Consider two vectors of parameters k , k ¯ R + 34 with the following properties:
k i = k ¯ i , i = 1 , 7 , 11 , 12 , 15 , 19 , 20 , 25 , 29 , 30 , k i = k ¯ i = 0 , i = 2 , 8 , 16 , 26 , k i k ¯ i , otherwise .
Any two vectors of parameters k and k ¯ with these properties lead to the same ODE system corresponding to the measured concentrations of the species, and thus to the same sets of admissible output trajectories. Note that it may be cumbersome to derive these properties manually. Therefore, we have used MATLAB symbolic variables to derive these properties in an automatic way. We conclude that the corresponding mathematical model is parameter unidentifiable. From Theorem 2 it follows that the considered mathematical model is also least squares parameter unidentifiable.

5.1.3. Parameter Estimation Procedure

According to our procedure, we first determine the Kron reduced mathematical model obtained by deleting the complexes involving at least one species that is not measured experimentally. By making use of the complex composition matrix and the measurement matrix we identify the single-species complexes BL, AL, IL, DL, BLL, ALL, ILL, and DLL to be the ones that should be deleted from the original mathematical model. These are the precisely the species whose concentrations are not measured. We consequently obtain a new mathematical model in which only the concentrations of the measured species are involved. The reactions corresponding to the Kron reduced mathematical model are given below:
B p 2 p 1 A ,   I p 6 p 5 D ,   B p 10 p 9 D ,
A p 4 p 3 I ,   A p 8 p 7 D ,   B p 12 p 11 I .
The Kron reduced network is schematically illustrated in the right-hand panel of Figure 1. Recall that the vector of parameters p is a function of the vector of parameters k. We now possess a complete time-series data of species’ concentrations corresponding to the Kron reduced mathematical model. In other words, we have a well-posed parameter estimation problem from species’ concentrations, which is solved using the well known least squares method. In order to test the performance of our parameter estimation method, we use both UWLS and WLS to determine the corresponding values of the parameters for which the Kron reduced mathematical model fits the available data of species’ concentrations. These estimates are given in Table 3.
Figure 2 represents the comparison of the available time-series data of species’ concentrations of the model of NAR and the corresponding predicted values (for both WLS and UWLS approaches) obtained from the Kron reduced mathematical model with estimated parameters provided in Table 3.
Note that, while Kron reduction method effectively reduces the complexity of the original mathematical model, it may not fully capture all relevant dynamical properties. As can be seen from Figure 2, the Kron reduced mathematical model is not a good fit for the available time-series data. Finally, we determine the values of parameters k ^ (for both WLS and UWLS) that minimize the eigenvalue-based error δ defined in (17) subject to the constraint defined in (18). These best-fitting values of parameters are provided in Table 2. The comparison of the available time-series data of species’ concentrations and the corresponding model predicted values obtained from (8) with estimated parameter values k ^ (for both WLS and UWLS approaches) is given in Figure 3.
We observe that our parameter estimation method is able to derive a complete mathematical model that is able to make accurate predictions about the dynamics of the CRN. Note from Table 2 that the estimated values of some of the parameters, e.g., k i , i = 21 , , 24 and i = 31 , , 34 , differ by a large percentage from their corresponding values provided in [33]. However, the mathematical model with the estimated values of parameters is a reasonably good fit (as can be seen in Figure 3) for the generated time-series data of species’ concentrations. The reason behind this, as discussed earlier, is the fact that the parameters involved in the considered mathematical model of NAR are not least squares identifiable from the output corresponding to the compounds B, A, I, and D, as proved in Section 5.1.2.
Recall that in our parameter estimation method, we make use of the least squares optimization technique. As mentioned above, we used both WLS and UWLS to determine the best-fitting values of parameters corresponding to each of these approaches (see Table 2). We perform LOOCV in order to understand which one of these approaches is a preferable technique for data-fitting in our proposed parameter estimation procedure. We first give a short summary of LOOCV. It involves splitting the data set into two parts: a single datapoint, that is used for validation (excluded datapoint); and the remaining datapoints (training dataset), that make up the set used for parameter estimation. The parameter estimation procedure is applied using the training dataset, and a model-predicted value is computed for the excluded datapoint. We then compute the mean squared error between the excluded datapoint and its corresponding model-predicted values. Since the excluded datapoint is not used in the fitting process, this error provides an unbiased estimate for the test error. On the other hand, it is a poor estimate for the test error since it is based upon a single datapoint. We can repeat the procedure by excluding each datapoint of the given data exactly once and compute the mean squared error between the excluded datapoint and its corresponding model-predicted values. The LOOCV training error is the average of these test error estimates.
We used WLS in our parameter estimation procedure and computed the LOOCV training error. Subsequently, we used UWLS in our parameter estimation procedure and again computed the LOOCV training error. These training errors are provided in Table 4.
As we can see from this table, using UWLS in our parameter estimation procedure results in a smaller LOOCV training error than using WLS. We thus conclude that, for this particular dataset, UWLS is a preferable approach of data-fitting in our proposed parameter estimation procedure.
We calculated the 95% confidence intervals for the estimated values of parameters corresponding to UWLS by performing bootstrapping (see, e.g., [50,51,52,53]) for our parameter estimation method. The procedure for this calculation is as follows. We first construct 2000 resampled datasets (of the same length as the available data) by randomly choosing datapoints from the available data. Note that the same datapoint can be chosen multiple times. Secondly, we use our parameter estimation method to determine the best-fitting values of parameters corresponding to each of the resampled dataset. As a result, for each of the parameters we obtain a set of 2000 estimates. The 95% bootstrap confidence intervals were constructed by choosing 2.5% and 97.5% percentiles of the corresponding bootstrap estimates. These confidence intervals are included in Table 2.

5.2. Trypanosoma Brucei Trypanothione Synthetase

Subsequently, we demonstrate the applicability of our new parameter estimation method on a very different type of a CRN. We consider a kinetic model of Trypanosoma brucei trypanothione synthetase (TBTS). This mathematical model of TBTS was developed in [34] and describes the entire kinetic profile. Trypanosoma brucei is a species of parasitic kinetoplastid (an organism whose cells contain a cell nucleus and that is not an animal, plant, or fungus). Unlike other parasites (normally infecting blood and tissue cells) it is exclusively extracellular and inhabits the blood plasma as well as body fluids. It causes deadly diseases in humans such as African trypanosomiasis or sleeping sickness. A trypanothione synthetase is a catalytic enzyme. A schematic representation of the model of TBTS is shown on the left-hand side of Figure 4.

5.2.1. The Considered Model and the Available Data

In the considered mathematical model of TBTS, there are 22 species participating in 59 unidirectional reactions in terms of 24 distinct complexes. Table 5 provides an overview of the primary compounds participating in the reaction system.
The remaining compounds occurring in the network are intermediate complexes of the enzymes E and X. The reactions occurring in the network are given as:
E k 2 k 1 EA ,   XQ k 31 k 30 XQB ,
EA k 4 k 3 EAB ,   XQ k 33 k 32 XQR ,
E k 6 k 5 EB ,   E k 35 k 34 EC ,
EB k 8 k 7 EAB ,   EC k 37 k 36 EAC ,
EAB k 10 k 9 EABQ ,   EAC k 39 k 38 EABC ,
EA k 12 k 11 EAQ ,   EC k 41 k 40 EBC ,
EB k 14 k 13 EBQ ,   EBC k 43 k 42 EABC ,
E k 16 k 15 EQ ,   EA k 45 k 44 EAC ,
EQ k 18 k 17 EAQ ,   EB k 47 k 46 EBC ,
EAQ k 20 k 19 EABQ ,   EAB k 49 k 48 EABC ,
EQ k 22 k 21 EBQ ,   EABC k 50 P + XC ,
EBQ k 24 k 23 EABQ ,   XC k 52 k 51 E _ Q ,
EABQ k 25 P + XQ ,   E _ Q k 54 k 53 E ,
XQ k 27 k 26 ER ,   XC k 56 k 55 XCB ,
ER k 29 k 28 E ,   XC k 58 k 57 XCR ,
P k 59 S .
where k i is the rate constant of the ith reaction. The values of these parameters provided in [34] are listed in Table 6.
Note that, unlike the previous example, not every complex is a single-species complex. We assume that the species that are measured experimentally are P, S, XQ, XC, XQB, XQR, XCB, XCR and the main enzyme E. We generate data corresponding to the concentrations of the above-mentioned species using the balance laws (8) and the parameter values provided in [34]. These parameters are given in Table 6. Additionally, the obtained data have been perturbed with white Gaussian noise of zero mean and standard deviation 0.02.

5.2.2. Proving Parameter Unidentifiability

We show that the parameters of the considered mathematical model cannot be uniquely determined from the available partial time series data of the species’ concentrations. Similar to the case of the mathematical model of NAR, we provide two different vectors of parameters, both leading to the same system of ODEs corresponding to the measured concentrations of the species. Consider two vectors of parameters k , k ¯ R + 59 with the following properties:
k i = k ¯ i , i = 1 , 2 , 5 , 6 , 15 , 16 , 19 , 25 , , 35 , 50 , , 59 , k i k ¯ i , otherwise .
Any two vectors of parameters k and k ¯ with these properties lead to the same ODE system corresponding to the measured species’ concentrations, and thus to the same sets of admissible output trajectories. These properties have been derived in an automatic way using MATLAB symbolic variables. We conclude that the corresponding mathematical model is parameter unidentifiable. From Theorem 2 it follows that the considered mathematical model is also lest squares parameter unidentifiable.

5.2.3. Parameter Estimation Procedure

Using the complex composition matrix and the measurement matrix our procedure selects the single-species enzymatic complexes EA, EB, EC, EQ, ER, E_Q, EAB, EAC, EAQ, EBC, EBQ, EABC, EABQ to form the set of complexes that should be deleted from the graph of complexes by applying the Kron reduction method. The reactions corresponding to the resulting Kron-reduced model are given as:
E p 1 P + XQ ,   XC p 10 p 9 E ,
E p 2 P + XC ,   XC p 12 p 11 XCB ,
XQ p 4 p 3 E ,   XC p 14 p 13 XCR ,
XQ p 6 p 5 XQB ,   P p 15 S .
XQ p 8 p 7 XQR ,
Here, as usual, for every i = 1 , , 15 , p i denotes the rate constant of the ith reaction corresponding to the Kron-reduced model. The schematic representation of the Kron reduced network is illustrated in the right-hand panel of Figure 4.
Since we have complete data of the concentrations of the species involved in the Kron reduced model, we may apply the least squares method to find the best-fitting values of the parameters p ^ R + 15 . Similar to the case of the mathematical model of NAR, we use both WLS and UWLS to determine the values of the parameters for which the Kron reduced mathematical model fits the available data of species’ concentrations. These estimates are given in Table 7. Figure 5 visualizes the comparison of the available time-series data and the corresponding model predicted values (for both WLS and UWLS approaches) obtained from the Kron reduced model with estimated values of parameters p ^ . Note that, in this case, both WLS and UWLS result in similar model-predicted values for the output concentrations. Also note that the Kron reduced mathematical model is not a good fit for the available time-series data.
In the final step, we determine the values of parameters k ^ R + 59 (for both WLS and UWLS approaches) of the original model that minimize the eigenvalue-based error δ defined in (17) subject to the constraint defined in (18). These values of parameters are given in Table 6. The comparison of the available time-series data and the corresponding model predicted values obtained from the balance laws (8) with the estimated values of parameters k ^ (for both WLS and UWLS approaches) be seen in Figure 6. Note that our parameter estimation method resulted in a complete mathematical model that is able to make accurate predictions about the dynamical behavior of the CRN.
Observe from Table 6 that for some of the parameters, e.g., k i , i = 7 , , 14 and i = 20 , , 24 , there is a substantial difference between the values provided in [54] and their corresponding estimated values. The reason behind this is the fact that the parameters contained in the mathematical model are least squares unidentifiable from the available time-series partial data of species’ concentrations, as proved in Section 5.2.2.
Similar to the case of the mathematical model of NAR, we perform LOOCV to understand which one of WLS and UWLS is a more preferable approach for data-fitting in our proposed parameter estimation procedure. The corresponding LOOCV training errors are given in Table 8. As we can see from this table, unlike the case of NAR, using WLS in our parameter estimation procedure results in a smaller LOOCV training error than using UWLS. We thus conclude that, for this particular dataset, WLS is a more preferable approach of data-fitting in our proposed parameter estimation procedure. We calculate the 95% confidence intervals for the estimated values of parameters corresponding to WLS by performing bootstrapping for our parameter estimation method. These confidence intervals are included in Table 6.

6. Discussion

The Kron-reduced mathematical model with the best-fitting values of parameters (in the sense of least squares), as we can see from Figure 2 and Figure 5, is generally not an appropriate approximation, meaning that the corresponding model predicted values are far from being good fits for the available time-series data. This is because of the fact that, in general depending on the number of complexes deleted from the graph of complexes, it is not assured that the Kron-reduced model is a reasonable approximation for the original mathematical model.
The choice of the Kron reduction technique as a tool for reducing mathematical models in our parameter estimation method is based on several advantages of this reduction technique that are particularly appropriate for the problem. First of all, it does not impose any restrictions on the choice of complexes to be deleted. Thus, we can delete all the complexes containing at least a single unmeasured species. A second advantage is that Kron reduction method preserves the kinetics of the CRN, i.e., if MAKRL governs the given CRN, then the corresponding Kron-reduced model is also governed by this rate law. A third advantage of the Kron reduction method is that we are able to compare the dynamics of the original model to the one of the reduced model using the Laplacian matrix of the original model and the Laplacian matrix of the reduced model. To the best of our knowledge, there is no other reduction technique that offers all these aforementioned advantages.
The suggested parameter estimation method is only applicable to a mass action CRN with a constant Laplacian. This is because of the fact that in the estimation procedure, the eigenvalues of the Laplacian matrix are used. For a general enzymatic CRN, the corresponding Laplacian matrix is not constant since it depends on the vector of species’ concentrations. In such cases, it is not straightforward how to use a similar technique for parameter estimation purposes. The parameter estimation of enzyme kinetic reaction networks from partial data of species’ concentrations is still an open problem that will be considered in future work.
As explained in [37,41], a linkage class of a CRN is a connected component of its graph of complexes. It is also stated in these papers that if a network has a linkage class with only one reaction, then the removal of a complex involved in such a reaction by Kron reduction leads to the removal of the reaction. In the case, where the intermediate Kron reduction phase of our parameter estimation procedure leads to the removal of some of the reactions of the original model, we would expect that the estimated parameters of the original model associated with the removed reactions would have larger confidence intervals compared with those of the other parameters that are associated with the remaining reactions of the network.

7. Conclusions

In this paper, we have introduced an innovative parameter estimation approach for mathematical models of mass action CRNs using observed time-series incomplete experimental data of species’ concentrations. As far as we know, there exists no direct technique for deducing the parameters in a mathematical model from this sort of experimental data. We have addressed this problem by devising an algorithmic strategy, which involves the application of Kron reduction technique for kinetic models as an intermediate step in the overall parameter estimation approach. The complexes that should be deleted using Kron reduction are chosen in such a way that in the reduced model only the concentrations of the measured species are involved. Since all the species’ concentrations involved in the Kron-reduced model are measured we now have a well-posed parameter estimation problem. We estimate the parameters involved in the Kron-reduced model using the least squares method to identify the best-fitting values of the parameters involved in the Kron-reduced model. To estimate the parameters contained in the original mathematical model, we have devised a new trajectory-independent measure to quantify the difference between the dynamics of the original model and the corresponding Kron-reduced model. It is based on comparing the smallest non-zero real part of the eigenvalues of the original Laplacian matrix with the one of the Kron-reduced Laplacian matrix. The reason behind the choice of measure is the fact that the smallest non-zero real part of the Laplacian matrix is related to the settling time of the CRN that is characterized by the Laplacian matrix. This measure can be regarded as a function of the parameter vector of the original mathematical model since the eigenvalues of both the original Laplacian matrix as well as the Kron-reduced Laplacian matrix depend only on this vector of parameters. Finally, we find the estimates of the parameters for which the above-mentioned spectral-based measure admits its smallest value.
We have devised an automatic process for our parameter estimation approach, crafting a MATLAB library that can be employed to deduce the parameters from experimental data of species’ concentrations automatically. This MATLAB library is given as Supplementary Material. We utilized this MATLAB library to effectively employ our parameter estimation approach on two real-life CRNs taken from the Biomodels database [32]. For each of these models, we observed that our parameter estimation method resulted in a complete mathematical model that could make accurate predictions about the dynamics of the CRN. While we have exclusively applied and tested our parameter estimation method on a limited scale, involving only two real-life instances of CRNs, its applicability extends to all networks regulated by MAKRL. It should be noted that the method places no restriction on the size or scale of the model as well as the biological purpose of the associated network as long as its reactions are governed by MAKRL. Thus the method is applicable for models of core metabolism (for instance E.coli central carbon metabolism models reviewed in [55]) as well as models of regulatory networks. It can also be applied for small models like the one of NAR considered in this paper as well as genome-scaled models as in [56].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bioengineering10091056/s1, The MATLAB library corresponding to the parameter estimation method described in the main manuscript.

Author Contributions

S.R. proposed the problem addressed in the manuscript, while M.G. formulated and further developed the problem; M.G. initially developed the solution methodology, which was further refined and enhanced by S.R.; M.G. conducted the computational experiments associated with the research; M.G. took the lead in writing the manuscript, and S.R. provided critical feedback, reviewed the content, and contributed to its improvement. Both authors actively participated in discussions and revisions to ensure the accuracy and clarity of the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work has been supported by the Research and Development Program of Ghent University Global Campus (GUGC), South Korea. We would like to express our sincere gratitude to Samvel B. Gasparyan (As­traZeneca, Gothenburg, Sweden) and Joris Vankerschaver (University of Ghent, Ghent, Belgium) for their invaluable assistance with both the numerical experiments and statistical analysis. Their contributions were instrumental in the successful completion of this research.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CRNChemical Reaction Network
CRNTChemical Reaction Network Theory
IVPInitial Value Problem
LOOCVLeave-One-Out Cross-Validation
MAKRLMass Action Kinetics Rate Law
NARNicotinic Acetylcholine Receptors
ODEOrdinary Differential Equation
TBTSTrypanosoma Brucei Trypanothione Synthetase
UWLSUnweighted Least Squares
WLSWeighted Least Squares

References

  1. Edwards, L.M.; Thiele, I. Applying systems biology methods to the study of human physiology in extreme environments. Extrem. Physiol. Med. 2013, 2, 8. [Google Scholar] [CrossRef]
  2. Mock, A.; Chiblak, S.; Herold-Mende, C. Lessons we learned from high-throughput and top-down systems biology analyses about glioma stem cells. Curr. Pharm. Des. 2014, 20, 66–72. [Google Scholar] [CrossRef] [PubMed]
  3. Himmelblau, D.M.; Riggs, J.B. Basic Principles and Calculations in Chemical Engineering, 8th ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2012. [Google Scholar]
  4. Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1999. [Google Scholar]
  5. Fröhlich, F.; Kaltenbacher, B.; Theis, F.J.; Hasenauer, J. Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS Comput. Biol. 2017, 13, e1005331. [Google Scholar] [CrossRef] [PubMed]
  6. Crampin, E.J.; Schnell, S.; McSharry, P.E. Mathematical and computational techniques to deduce complex biochemical reaction mechanisms. Prog. Biophys. Mol. Biol. 2004, 86, 77–112. [Google Scholar] [CrossRef] [PubMed]
  7. Donnelly, J.K.; Quon, D. Identification of parameters in systems of ordinary differential equations using quasilinearization and data perturbation. Can. J. Chem. Eng. 1970, 48, 114–119. [Google Scholar] [CrossRef]
  8. Loskot, P.; Atitey, K.; Mihaylova, L. Comprehensive review of models and methods for inferences in bio-chemical reaction networks. Front. Genet. 2019, 10, 549. [Google Scholar] [CrossRef] [PubMed]
  9. Ross, J.; Schreiber, I.; Vlad, M.O. Determination of Complex Reaction Mechanisms: Analysis of Chemical, Biological, and Genetic Networks; Oxford University Press: Oxford, UK, 2005. [Google Scholar]
  10. Antoniewicz, M.R. Methods and advances in metabolic flux analysis: A mini-review. J. Ind. Microbiol. Biotechnol. 2015, 42, 317–325. [Google Scholar] [CrossRef] [PubMed]
  11. Hodges, A.; Chatelier, R. Electrochemical Method for Measuring Chemical Reaction Rates. US Patent 6,444,115, 3 September 2002. [Google Scholar]
  12. Megchelenbrink, W.; Rossell, S.; Huynen, M.A.; Notebaart, R.A.; Marchiori, E. Estimating metabolic fluxes using a maximum network flexibility paradigm. PLoS ONE 2015, 10, e0139665. [Google Scholar] [CrossRef]
  13. Orth, J.D.; Thiele, I.; Palsson, B.Ø. What is flux balance analysis? Nat. Biotechnol. 2010, 28, 245–248. [Google Scholar] [CrossRef]
  14. Gasparyan, M.; Van Messem, A.; Rao, S. Parameter estimation for models of chemical reaction networks from experimental data of reaction rates. Int. J. Control 2021, 96, 392–407. [Google Scholar] [CrossRef]
  15. Bézier, P.E. The Mathematical Basis of the UNISURF CAD System; Butterworth-Heinemann: Newton, MA, USA, 1986. [Google Scholar]
  16. Bézier, P.E.; Sioussiou, S. Semi-automatic system for defining free-form curves and surfaces. Comput.-Aided Des. 1983, 15, 65–72. [Google Scholar] [CrossRef]
  17. Gelman, A.; Bois, F.; Jiang, J. Physiological pharmacokinetic analysis using population modeling and informative prior distributions. J. Am. Stat. Assoc. 1996, 91, 1400–1412. [Google Scholar] [CrossRef]
  18. Girolami, M. Bayesian inference for differential equations. Theor. Comput. Sci. 2008, 408, 4–16. [Google Scholar] [CrossRef]
  19. Barber, D.; Wang, Y. Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; Volume 32, pp. 1485–1493. [Google Scholar]
  20. Chkrebtii, O.A.; Campbell, D.A.; Calderhead, B.; Girolami, M.A. Bayesian solution uncertainty quantification for differential equations. Bayesian Anal. 2016, 11, 1239–1267. [Google Scholar] [CrossRef]
  21. Liu, B.; Wang, L.; Cao, J. Bayesian estimation of ordinary differential equation models when the likelihood has multiple local modes. Monte Carlo Methods Appl. 2018, 24, 117–127. [Google Scholar] [CrossRef]
  22. Huang, H.; Handel, A.; Song, X. A Bayesian approach to estimate parameters of ordinary differential equation. Comput. Stat. 2020, 35, 1481–1499. [Google Scholar] [CrossRef]
  23. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: New York, NY, USA, 2004. [Google Scholar]
  24. Murzin, D.Y.; Wärnå, J.; Haario, H.; Salmi, T. Parameter estimation in kinetic models of complex heterogeneous catalytic reactions using Bayesian statistics. React. Kinet. Mech. Catal. 2021, 133, 1–15. [Google Scholar] [CrossRef]
  25. Rathinam, M.; Yu, M. State and parameter estimation from exact partial state observation in stochastic reaction networks. J. Chem. Phys. 2021, 154, 034103. [Google Scholar] [CrossRef]
  26. Savara, A.; Walker, E.A. CheKiPEUQ Intro 1: Bayesian parameter estimation considering uncertainty or error from both experiments and theory. ChemCatChem 2020, 12, 5385–5400. [Google Scholar] [CrossRef]
  27. Galagali, N.; Marzouk, Y.M. Bayesian inference of chemical kinetic models from proposed reactions. Chem. Eng. Sci. 2015, 123, 170–190. [Google Scholar] [CrossRef]
  28. Cohen, M.; Vlachos, D.G. Chemical kinetics Bayesian inference toolbox (CKBIT). Comput. Phys. Commun. 2021, 265, 107989. [Google Scholar] [CrossRef]
  29. Rao, S.; van der Schaft, A.; van Eunen, K.; Bakker, B.M.; Jayawardhana, B. A model reduction method for biochemical reaction networks. BMC Syst. Biol. 2014, 8, 52. [Google Scholar] [CrossRef] [PubMed]
  30. Radulescu, O.; Gorban, A.N.; Zinovyev, A.; Noel, V. Reduction of dynamical biochemical reactions networks in computational biology. Front. Genet. 2012, 3, 131. [Google Scholar] [CrossRef] [PubMed]
  31. Snowden, T.J.; van der Graaf, P.H.; Tindall, M.J. Methods of model reduction for large-scale biological systems: A survey of current methods and trends. Bull. Math. Biol. 2017, 79, 1449–1486. [Google Scholar] [CrossRef]
  32. Malik-Sheriff, R.S.; Glont, M.; Nguyen, T.V.N.; Tiwari, K.; Roberts, M.G.; Xavier, A.; Vu, M.T.; Men, J.; Maire, M.; Kananathan, S.; et al. BioModels—15 years of sharing computational models in life science. Nucleic Acids Res. 2020, 48, D407–D415. [Google Scholar] [CrossRef] [PubMed]
  33. Edelstein, S.J.; Schaad, O.; Henry, E.; Bertrand, D.; Changeux, J.P. A kinetic mechanism for nicotinic acetylcholine receptors based on multiple allosteric transitions. Biol. Cybern. 1996, 75, 361–379. [Google Scholar] [CrossRef] [PubMed]
  34. Leroux, A.E.; Haanstra, J.R.; Bakker, B.M.; Krauth-Siegel, R.L. Dissecting the catalytic mechanism of Trypanosoma brucei trypanothione synthetase by kinetic analysis and computational modeling. J. Biol. Chem. 2013, 288, 23751–23764. [Google Scholar] [CrossRef]
  35. Bapat, R.B. Graphs and Matrices; Springer: Berlin/Heidelberg, Germany, 2010; Volume 27. [Google Scholar]
  36. Bollobás, B. Modern Graph Theory: Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 1998; Volume 184. [Google Scholar]
  37. Gasparyan, M.; Van Messem, A.; Rao, S. An automated model reduction method for biochemical reaction networks. Symmetry 2020, 12, 1321. [Google Scholar] [CrossRef]
  38. Agaev, R.; Chebotarev, P. On the spectra of nonsymmetric Laplacian matrices. Linear Algebra Its Appl. 2005, 399, 157–168. [Google Scholar] [CrossRef]
  39. Gershgorin, S.A. Über die abgrenzung der eigenwerte einer matrix. Bull. l’Acad. Sci. l’URSS 1931, 6, 749–754. [Google Scholar]
  40. Mesbahi, M.; Egerstedt, M. Graph Theoretic Methods in Multiagent Networks; Princeton University Press: Princeton, NJ, USA, 2010. [Google Scholar]
  41. Gasparyan, M.; Van Messem, A.; Rao, S. A novel technique for model reduction of biochemical reaction networks. IFAC-PapersOnLine 2018, 51, 28–31. [Google Scholar] [CrossRef]
  42. Zhang, F. The Schur Complement and Its Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
  43. Levenberg, K. A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. 1944, 2, 164–168. [Google Scholar] [CrossRef]
  44. Marquardt, D.W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 1963, 11, 431–441. [Google Scholar] [CrossRef]
  45. Fletcher, R. A Modified Marquardt Subroutine for Non-Linear Least Squares; Theoretical Physics Division, Atomic Energy Research Establishment: Harwell, UK, 1971. [Google Scholar]
  46. Bellman, R.; Åström, K.J. On structural identifiability. Math. Biosci. 1970, 7, 329–339. [Google Scholar] [CrossRef]
  47. Caravelli, F.; Traversa, F.L.; Di Ventra, M. Complex dynamics of memristive circuits: Analytical results and universal slow relaxation. Phys. Rev. E 2017, 95, 022140. [Google Scholar] [CrossRef] [PubMed]
  48. Atkins, P.; de Paula, J. Atkins’ Physical Chemistry; Oxford University Press: Oxford, UK, 2014. [Google Scholar]
  49. Yau, M.K.; Rogers, R.R. A Short Course in Cloud Physics; Elsevier: Amsterdam, The Netherlands, 1996. [Google Scholar]
  50. Efron, B.; LePage, R. Introduction to Bootstrap; Wiley & Sons, New York, NY, USA, 1992.
  51. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994. [Google Scholar]
  52. Johnson, R.W. An introduction to the bootstrap. Teach. Stat. 2001, 23, 49–54. [Google Scholar] [CrossRef]
  53. Varian, H. Bootstrap tutorial. Math. J. 2005, 9, 768–775. [Google Scholar]
  54. Clancy, C.E.; Rudy, Y. Cellular consequences of HERG mutations in the long QT syndrome: Precursors to sudden cardiac death. Cardiovasc. Res. 2001, 50, 301–313. [Google Scholar] [CrossRef]
  55. Shepelin, D.; Machado, D.; Nielsen, L.K.; Herrgård, M.J. Benchmarking kinetic models of Escherichia coli metabolism. bioRxiv 2020. [Google Scholar] [CrossRef]
  56. Smallbone, K.; Simeonidis, E.; Swainston, N.; Mendes, P. Towards a genome-scale kinetic model of cellular metabolism. BMC Syst. Biol. 2010, 4, 1–9. [Google Scholar] [CrossRef]
Figure 1. Schematic representation of the original model (left-hand panel) of nicotinic acetylcholine receptors and the corresponding Kron-reduced model (right-hand panel) obtained by deleting the single-species complexes BL, AL, IL, DL, BLL, ALL, ILL, and DLL from the graph of complexes.
Figure 1. Schematic representation of the original model (left-hand panel) of nicotinic acetylcholine receptors and the corresponding Kron-reduced model (right-hand panel) obtained by deleting the single-species complexes BL, AL, IL, DL, BLL, ALL, ILL, and DLL from the graph of complexes.
Bioengineering 10 01056 g001
Figure 2. The available time-series data of species’ concentrations of the model of nicotinic acetylcholine receptors and the corresponding predicted values (using WLS and UWLS) obtained from the Kron reduced mathematical model with estimated parameters provided in Table 3.
Figure 2. The available time-series data of species’ concentrations of the model of nicotinic acetylcholine receptors and the corresponding predicted values (using WLS and UWLS) obtained from the Kron reduced mathematical model with estimated parameters provided in Table 3.
Bioengineering 10 01056 g002
Figure 3. The available time-series data of species’ concentrations of the model of nicotinic acetylcholine receptors and the corresponding model predicted values (for both WLS and UWLS approaches) with parameters estimated by our method. These estimated parameters are given in Table 2.
Figure 3. The available time-series data of species’ concentrations of the model of nicotinic acetylcholine receptors and the corresponding model predicted values (for both WLS and UWLS approaches) with parameters estimated by our method. These estimated parameters are given in Table 2.
Bioengineering 10 01056 g003
Figure 4. Schematic representation of the original model (left-hand panel) of trypanosoma brucei trypanothione synthetase and the corresponding Kron-reduced model (right-hand panel) obtained by deleting the single-species complexes EA, EB, EC, EQ, ER, E_Q, EAB, EAC, EAQ, EBC, EBQ, EABQ, EABC from the graph of complexes.
Figure 4. Schematic representation of the original model (left-hand panel) of trypanosoma brucei trypanothione synthetase and the corresponding Kron-reduced model (right-hand panel) obtained by deleting the single-species complexes EA, EB, EC, EQ, ER, E_Q, EAB, EAC, EAQ, EBC, EBQ, EABQ, EABC from the graph of complexes.
Bioengineering 10 01056 g004
Figure 5. The available time-series data of species’ concentrations of the model of trypanosoma brucei trypanothione synthetase and the corresponding predicted values obtained from the Kron-reduced mathematical model with estimated parameters provided in Table 7.
Figure 5. The available time-series data of species’ concentrations of the model of trypanosoma brucei trypanothione synthetase and the corresponding predicted values obtained from the Kron-reduced mathematical model with estimated parameters provided in Table 7.
Bioengineering 10 01056 g005
Figure 6. The available time-series data of species’ concentrations of the model of trypanosoma brucei trypanothione synthetase and the corresponding model predicted values with parameters estimated by our method. These estimated parameters are given in Table 6.
Figure 6. The available time-series data of species’ concentrations of the model of trypanosoma brucei trypanothione synthetase and the corresponding model predicted values with parameters estimated by our method. These estimated parameters are given in Table 6.
Bioengineering 10 01056 g006
Table 1. An overview of the primary compounds participating in the considered model of the nicotinic acetylcholine receptors.
Table 1. An overview of the primary compounds participating in the considered model of the nicotinic acetylcholine receptors.
Species NotationSpecies NameSpecies ID
AState of higher affinities with open channelActive
BActivatable resting closed stateBasal
IInactivatable state of higher affinities with closed channelInactivatable
DDesensitised state of higher affinities with closed channelDesensitised
Table 2. The values of the rate constants of the model of the nicotinic acetylcholine receptors provided in [33], the estimated values obtained using our method (for both WLS and UWLS), and the corresponding confidence intervals calculated using bootstrapping.
Table 2. The values of the rate constants of the model of the nicotinic acetylcholine receptors provided in [33], the estimated values obtained using our method (for both WLS and UWLS), and the corresponding confidence intervals calculated using bootstrapping.
ParametersProvided ValuesEstimated Values (UWLS)Estimated Values (WLS)Confidence Intervals
k 1 3000.00002985.39822999.1102[2975.0001, 3004.0154]
k 2 8000.00007997.64797992.4287[7658.0406, 8106.2202]
k 3 1500.00001499.45861486.9596[1400.0112, 1587.4735]
k 4 16,000.0015,995.2815,978.15[15,116.77, 16,007.37]
k 5 30,000.000029,991.095129,956.5732[2821.5981, 3033.6088]
k 6 700.0000700.4901722.1788[675.6376, 896.1618]
k 7 3000.00003011.98913038.551[2999.0560, 3066.3532]
k 8 8.64002.007736.7439[1.4659, 237.2756]
k 9 1500.00001499.38461496.7745[1424.2376, 1608.9460]
k 10 17.280018.07715.4371[2.3910, 26.9894]
k 11 0.54000.75220.0934[ 2.82 × 10 5 , 0.7812]
k 12 10,800.0010,809.9510,822.44[10,800.01, 10,833.63]
k 13 130.0000129.89770.01295[0.0002, 145.5059]
k 14 2740.00002739.33152736.4242[2546.3118, 2843.7101]
k 15 3000.00003000.56362848.1724[2772.8053, 3000.0009]
k 16 4.00003.2436162.0834[1.1588, 211.1424]
k 17 1500.00001500.07761491.0929[1488.1059, 1513.4998]
k 18 8.000011.4275378.6929[6.9902, 503.0242]
k 19 19.700019.63930.6956[0.1307, 12.2759]
k 20 3.74004.07545.5581[1.8582, 27.4541]
k 21 19.850018.9582104.6638[17.6869, 190.2117]
k 22 1.74000.255866.9278[0.0327, 77.2369]
k 23 20.000017.80550.0246[0.0014, 23.0991]
k 24 0.81000.00190.01142[0.8588, 2.6275]
k 25 3000.00002999.1782998.2315[2880.6353, 3012.3044]
k 26 4.00004.02984.4320[1.4111, 5.6222]
k 27 1500.00001499.61451500.2207[960.5602, 1739.9138]
k 28 8.00005.772214.2218[3.6051, 18.2161]
k 29 0.05000.00410.0566[ 3.93 × 10 6 , 0.0567]
k 30 0.00100.30510.4618[0.0002, 6.9397]
k 31 0.0500 4.99 × 10 6 0.0015[ 5.15 × 10 7 , 68.9054]
k 32 0.00100.00060.0577[0.0001, 1.8794]
k 33 0.0500 6.78 × 10 8 0.0028[ 1.3 × 10 8 , 0.5321]
k 34 0.0010 1.60 × 10 6 0.0009[ 0.85 × 10 6 , 0.0021]
Table 3. The best-fitting parameter values (for both WLS and UWLS approaches) of the Kron reduced mathematical model of the nicotinic acetylcholine receptors corresponding to the available time-series experimental data.
Table 3. The best-fitting parameter values (for both WLS and UWLS approaches) of the Kron reduced mathematical model of the nicotinic acetylcholine receptors corresponding to the available time-series experimental data.
ParametersEstimated Values (UWLS)Estimated Values (WLS)
p 1 2060.87440.2430
p 2 947.04660.6338
p 3 1385.3214 1.73 × 10 5
p 4 13,732.964713,783.3992
p 5 1061.548353.9256
p 6 8.65121.1069
p 7 2860.88802762.7458
p 8 313.892316.0378
p 9 0.01920.2593
p 10 1617.0009340.8659
p 11 2769.50293.8327
p 12 1.303688.8853
Table 4. Leave-one-out cross-validation training errors for the mathematical model of the nicotinic acetylcholine receptors corresponding to weighted and unweighted least squares optimization methods.
Table 4. Leave-one-out cross-validation training errors for the mathematical model of the nicotinic acetylcholine receptors corresponding to weighted and unweighted least squares optimization methods.
Unweighted Least SquaresWeighted Least Squares
Training error3.21643.6108
Table 5. An overview of the primary compounds participating in the considered model of the Trypanosoma brucei trypanothione synthetase.
Table 5. An overview of the primary compounds participating in the considered model of the Trypanosoma brucei trypanothione synthetase.
Species NotationSpecies NameSpecies ID
EEnzymeE
PAdenosine diphosphateADP
QGlutathionylspermidineGSP
RBis(glutathionyl)spermine T ( SH ) 2
CSpermidineSpd
BGlutathioneGSH
AAdenosine triphosphateATP
XGlutathione spermidine EnzymeE.GS_P
Table 6. The values of the rate constants of the model of the trypanosoma brucei trypanothione synthetase provided in [34], the estimated values obtained using our method (for both WLS and UWLS), and the corresponding confidence intervals calculated using bootstrapping.
Table 6. The values of the rate constants of the model of the trypanosoma brucei trypanothione synthetase provided in [34], the estimated values obtained using our method (for both WLS and UWLS), and the corresponding confidence intervals calculated using bootstrapping.
ParametersProvided ValuesEstimated Values (UWLS)Estimated Values (WLS)Confidence Intervals
k 1 53.417844.100242.8180[36.3628,  53.1468]
k 2 9.0680016.753112.4590[8.4905,  15.9902]
k 3 1.19178.71103.9547[1.6893,  7.7521]
k 4 3.630016.45784.8968[3.7351,  7.9581]
k 5 67.870058.554668.7404[51.6995,  69.8990]
k 6 9.520062.85310.3219[0.0410,  10.7708]
k 7 7.970013.81480.2034[0.0662,  6.8344]
k 8 4.500052.28165.1613[4.6057,  10.0443]
k 9 1.76001.82735.2016[0.0798,  7.7487]
k 10 5.500011.795911.5207[5.9256,  13.4458]
k 11 5.073485.89260.5021[0.0919,   8.6912]
k 12 5.039654.941511.2865[6.5235,   11.8915]
k 13 92.961686.374922.6051[16.9820,   91.2826]
k 14 69.670074.23730.2118[0.0664,   67.9548]
k 15 58.280051.588548.5051[45.9896,   70.1724]
k 16 81.540089.710176.2469[68.4823,   95.0394]
k 17 7.900085.89120.2689[0.0213,   7.9164]
k 18 8.8900101.020314.0437[9.2273,   16.0937]
k 19 13.05002.91138.5301[0.3245,   12.0491]
k 20 6.540093.75669.0591[4.4476,   12.4538]
k 21 6.261552.64660.2456[0.0223,   7.9409]
k 22 9.5536106.583015.6348[5.1120,   16.2136]
k 23 2.747929.71980.7268[0.0109,   5.7346]
k 24 7.950053.897913.1923[8.4006,   19.5529]
k 25 8.130067.24794.1898[1.9219,   12.9717]
k 26 2.78000.00560.1197[0.2465,   4.5403]
k 27 4.810024.74399.0857[5.8291,   13.3738]
k 28 0.0900100.82933.4949[0.0506,   1.8452]
k 29 5.470018.10140.3826[0.1967,   1.7861]
k 30 5.52001.73970.5090[0.0.4621,   1.7866]
k 31 7.86694.13340.8026[0.7527,    3.0838]
k 32 8.60565.01410.8054[0.7705,    3.3345]
k 33 2.67291.15040.2569[0.2016,    0.9534]
k 34 6.640056.19570.6883[0.6682,    6.4683]
k 35 8.350021.481710.9232[8.8105,    12.3105]
k 36 2.600060.63973.9687[2.0922,    7.0713]
k 37 6.090068.201110.5792[7.4107,    13.1953]
k 38 5.980071.01245.3738[3.3981,    30.2118]
k 39 7.080090.12959.6122[7.1826,    11.6251]
k 40 4.230097.82415.0378[3.7236,    6.6922]
k 41 6.901277.27709.0691[7.0122,    10.3685]
k 42 5.145855.66084.9221[3.0766,    9.6034]
k 43 9.836394.671312.5293[9.9403,    20.2133]
k 44 8.010055.930911.7194[8.6101,    12.8384]
k 45 1.70005.81165.0751[1.6858,    6.1590]
k 46 2.090010.30610.2928[0.1207,    19.6999]
k 47 6.270087.72727.2096[6.2626,    33.9641]
k 48 8.430048.08448.3244[8.5120,    11.2713]
k 49 4.490084.91906.1626[4.3403,    9.6653]
k 50 0.94001.77200.2076[0.0253,    0.8675]
k 51 5.231751.05070.8873[0.8291,    1.6575]
k 52 2.859962.85075.3629[0.7383,    6.4126]
k 53 3.22702.257310.4618[3.2739,    15.3034]
k 54 6.47001.60550.1016[0.0154,    0.7756]
k 55 3.24001.35030.6410[0.2601,    1.5042]
k 56 4.95002.43550.8050[0.7399,    2.1214]
k 57 8.96005.64370.8007[0.0290,    3.5065]
k 58 1.25000.47220.3622[0.0869,    0.4206]
k 59 2.31000.96810.8058[0.7560,   1.1369]
Table 7. The best-fitting parameter values (for both WLS and UWLS approaches) of the Kron reduced mathematical model of the Trypanosoma brucei trypanothione synthetase corresponding to the available time-series experimental data.
Table 7. The best-fitting parameter values (for both WLS and UWLS approaches) of the Kron reduced mathematical model of the Trypanosoma brucei trypanothione synthetase corresponding to the available time-series experimental data.
ParametersEstimated Values (UWLS)Estimated Values (WLS)
p 1 8.86810.3572
p 2 2.33870.5386
p 3 0.04090.0893
p 4 5.45800.9955
p 5 4.62003.0531
p 6 7.85254.9069
p 7 8.51191.9480
p 8 2.97760.6329
p 9 2.52137.1061
p 10 3.11722.7720
p 11 3.26436.1818
p 12 4.94303.6137
p 13 9.02303.2929
p 14 0.83620.5099
p 15 2.29471.5590
Table 8. Leave-one-out cross-validation training errors for mathematical model of the Trypanosoma brucei trypanothione synthetase corresponding to unweighted and weighted least squares optimization methods.
Table 8. Leave-one-out cross-validation training errors for mathematical model of the Trypanosoma brucei trypanothione synthetase corresponding to unweighted and weighted least squares optimization methods.
Unweighted Least SquaresWeighted Least Squares
Training error0.82330.6977
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gasparyan, M.; Rao, S. Parameter Estimation for Kinetic Models of Chemical Reaction Networks from Partial Experimental Data of Species’ Concentrations. Bioengineering 2023, 10, 1056. https://doi.org/10.3390/bioengineering10091056

AMA Style

Gasparyan M, Rao S. Parameter Estimation for Kinetic Models of Chemical Reaction Networks from Partial Experimental Data of Species’ Concentrations. Bioengineering. 2023; 10(9):1056. https://doi.org/10.3390/bioengineering10091056

Chicago/Turabian Style

Gasparyan, Manvel, and Shodhan Rao. 2023. "Parameter Estimation for Kinetic Models of Chemical Reaction Networks from Partial Experimental Data of Species’ Concentrations" Bioengineering 10, no. 9: 1056. https://doi.org/10.3390/bioengineering10091056

APA Style

Gasparyan, M., & Rao, S. (2023). Parameter Estimation for Kinetic Models of Chemical Reaction Networks from Partial Experimental Data of Species’ Concentrations. Bioengineering, 10(9), 1056. https://doi.org/10.3390/bioengineering10091056

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop