Multi-Objective Optimization of Experiments using Curvature and Fisher Information Matrix

The bottleneck in creating dynamic models of biological networks and processes often lies in estimating unknown kinetic model parameters from experimental data. In this regard, experimental conditions have a strong influence on parameter identifiability and should therefore be optimized to give the maximum information for parameter estimation. Existing model-based design of experiment (MBDOE) methods commonly rely on the Fisher Information Matrix (FIM) for defining a metric of data informativeness. When the model behavior is highly nonlinear, FIM-based criteria may lead to suboptimal designs since the FIM only accounts for the linear variation of the model outputs with respect to the parameters. In this work, we developed a multi-objective optimization (MOO) MBDOE, where model nonlinearity was taken into consideration through the use of curvature. The proposed MOO MBDOE involved maximizing data informativeness using a FIM-based metric and at the same time minimizing the model curvature. We demonstrated the advantages of the MOO MBDOE over existing FIM-based and other curvature-based MBDOEs in an application to the kinetic modeling of fed-batch fermentation of Baker’s yeast.


Introduction
Dynamic models of biological networks and processes are often created to gain a better understanding of the system behavior. The creation of dynamic biological models requires the values of kinetic parameters, many of which are system-specific and typically not known a priori. These parameters are commonly estimated by calibrating model simulations to the available experimental data. Such parameter fitting is known to be challenging, as there often exist multiple parameter combinations that fit the available data equally well; that is, the model parameters are not identifiable [1][2][3][4][5]. While there exist a number of reasons for such lack of parameter identifiability, experimental conditions have a strong influence on this issue and thus should be carefully designed. In addition, biological experiments and data collection are often costly and time-consuming, further motivating the need for well-planned experiments that would give the maximum information given finite resources.
Model-based designs of experiments (MBDOEs) offer a means for integrating dynamic modeling with experimental efforts, as illustrated by the iterative procedure in Figure 1. The role of the model here is to capture the knowledge and information about the system up to a given iteration. By using MBDOEs, one could harness this knowledge to guide experiments in the next iteration. MBDOE techniques have been used extensively for chemical process modeling [6], and more recently, they have been applied to the modeling of cellular processes [7,8]. For the purpose of parameter estimation, experiments are generally designed to improve the precision of the estimated parameters. In this regard, the Fisher information matrix (FIM), whose inverse provides an estimate of the lower bound of the parameter variance-covariance by the Cramér-Rao inequality [9], has been commonly used to define the objective function in optimal experimental design (see [8] and references therein). Since the turn of the century, FIM-based MBDOE methods have found new applications in the emerging area of systems biology [10][11][12][13][14][15][16]. Besides the FIM, Bayesian approaches have also been used for MBDOEs, where, given the prior distribution of the model parameters, experiments are designed to minimize the posterior parameter variance [8]. Bayesian MBDOE strategies have been applied to the modeling of biological networks for reducing parametric uncertainty [17][18][19]. While our work is concerned with MBDOEs for the purpose of parameter estimation, MBDOE strategies have also been developed and applied for discriminating between biological model structures [20][21][22][23][24] and for reducing cellular process output uncertainty [19,25,26]. In this work, we focused on FIM-based MBDOEs for parameter estimation. The FIM relies on a linear approximation of the model behavior as a function of the parameters. More precisely, the FIM is computed as a function of the first-order parametric sensitivity coefficients (Jacobian matrix) of the model outputs. For systems with a high degree of nonlinearity, the optimal experimental design using the FIM may perform poorly [27]. For this reason, Bates and Watts proposed an MBDOE based on minimizing model curvature using the second-order parametric sensitivities (Hessian matrix) [28]. Hamilton and Watts further introduced a design criterion, called Q-optimality, based on a quadratic approximation of the volume of the parameter confidence region [29]. More recently, Benabbas et al. proposed two curvature-based MBDOEs [30]. In one design, the authors used a minimization of the root mean square (RMS) of the Hessian matrix, while in the other, they employed a constrained optimization guaranteeing the RMS to be lower than a given level. While the second strategy, using a curvature threshold, was demonstrated to give more informative experiments, how to set an appropriate RMS threshold value in a particular application was not described.
Recently, Maheshwari et al. described a multi-objective optimization (MOO) formulation for optimizing the design of experiments using a combination of a FIM-based metric and parameter correlation [15]. Because parameter correlations cannot account for model nonlinearity, this strategy has the same drawback as FIM-based methods when applied to nonlinear models. In this work, we proposed a MOO MBDOE method using a combination of a FIM criterion and model curvature. We demonstrated the advantages of the proposed MOO MBDOE over FIM-based and other curvature-based methods in an application to the kinetic modeling of the fed-batch fermentation of baker's yeast [30,31].

Model-Based Optimal Design of Experiments
We assume that the experimental data y ∈ IR^n are contaminated by additive random noise, as follows:

y = µ + ε, (1)

where µ and ε denote the mean of the measurement data and the random noise, respectively.
When the total number of data points n is greater than the number of parameters p, µ spans a p-dimensional subspace Ω ⊂ IR^n, where

µ = F(x, u, θ). (2)

Here, x ∈ IR^s denotes the state vector, θ ∈ IR^p denotes the parameter vector, u ∈ IR^m denotes the input, and F(x, u, θ) denotes the vector of nonlinear model equations. The subspace Ω is also called the expectation surface or the solution locus. For a dynamic system, the state x is often described by a set of ordinary differential equations (ODEs):

dx/dt = f(x, u, θ), x(t0) = x0. (3)

The estimation of the model parameters θ from a given set of data y is typically formulated as a minimization of the weighted sum of squares of the difference between the model prediction F(x, u, θ) and the measurement data y. For example, the maximum likelihood estimator (MLE) of the model parameters for normally distributed data with known variance V is given by the minimum of the following objective function:

Φ(θ) = (y − F(x, u, θ))^T V^(−1) (y − F(x, u, θ)). (4)

When the model is a linear function of the parameters, F(x, u, θ) = Xθ with X ∈ IR^(n×p), the parameter estimates are given by θ̂ = (X^T V^(−1) X)^(−1) X^T V^(−1) y. In this case, the MLE is the minimum-variance unbiased estimator of θ, and the covariance matrix of the parameter estimates is given by V_θ = (X^T V^(−1) X)^(−1). When the model is nonlinear (with respect to the parameters), the parameter estimates θ̂ = arg min Φ(θ) do not necessarily correspond to the minimum-variance estimator. According to the Cramér-Rao inequality [9], the inverse of the FIM provides a lower bound for the covariance of the parameter estimates θ̂, that is,

Cov(θ̂) ≥ FIM^(−1), FIM = Ḟ^T V^(−1) Ḟ, (5)

where Ḟ = ∂F(x, u, θ)/∂θ |θ=θ̂ is the first-order sensitivity matrix of F(x, u, θ) with respect to the parameters θ.
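The linear-model estimator above can be sketched numerically. The following is a minimal illustration, using a hypothetical design matrix X and diagonal noise covariance V, of the weighted least-squares estimate θ̂ = (X^T V^(−1) X)^(−1) X^T V^(−1) y and its covariance (X^T V^(−1) X)^(−1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model y = X @ theta + noise with known noise covariance V
n, p = 50, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
V = np.diag(rng.uniform(0.5, 2.0, size=n))  # heteroscedastic noise variances
y = X @ theta_true + rng.normal(size=n) * np.sqrt(np.diag(V))

# Weighted least-squares / maximum-likelihood estimate for the linear case:
# theta_hat = (X^T V^-1 X)^-1 X^T V^-1 y
Vinv = np.linalg.inv(V)
XtVinv = X.T @ Vinv
theta_hat = np.linalg.solve(XtVinv @ X, XtVinv @ y)

# Covariance of the estimate (inverse FIM in the linear-Gaussian case)
cov_theta = np.linalg.inv(XtVinv @ X)
print(theta_hat)           # close to theta_true
print(np.diag(cov_theta))  # parameter variances
```

The same quantities underlie the nonlinear case, with X replaced by the parametric sensitivity matrix Ḟ evaluated at θ̂.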
On the basis of the Cramér-Rao inequality, the FIM has been commonly used as a criterion of data informativeness in MBDOEs. Many methods for MBDOEs, such as those listed in Table 1, are based on finding experimental conditions that optimize a FIM-based information metric. As shown in Equation (5), the FIM relies on a linearization of the model behavior with respect to the parameters. Essentially, the linearization replaces the expectation surface Ω by its tangent plane at θ̂. The performance of an experimental design using a FIM-based criterion therefore depends on whether (1) the model outputs vary proportionally with the parameter values (planar assumption), and (2) this proportionality is constant (uniform coordinate assumption) [32]. When the model is highly nonlinear with respect to the parameters, FIM-based MBDOEs may produce suboptimal designs [33,34]. A recent MOO MBDOE using a combination of a FIM criterion and parameter correlation has been shown to provide an improvement over FIM-based MBDOE methods [15]. However, this method also relies on the first-order parametric sensitivity matrix, and thus it cannot account for model nonlinearity.
Table 1. Model-based designs of experiments (MBDOEs) using the Fisher information matrix (FIM).

FIM-Based MBDOE Criterion
Curvature-based design of experiment methods, such as Q-optimality, have been introduced to account for model nonlinearity by employing a second-order approximation of the model output.
Here, the curvature of the expectation surface Ω is captured using the second-order sensitivities of F(x, u, θ) based on the Taylor series expansion:

F(x, u, θ) ≈ F(x, u, θ̂) + Ḟ (θ − θ̂) + ½ (θ − θ̂)^T F̈ (θ − θ̂),

where F̈_ijk = ∂²F_i(x, u, θ)/∂θ_j ∂θ_k |θ=θ̂ is the n × p × p Hessian matrix. As mentioned in the Introduction, several curvature-based MBDOE methods are available, for example, minimizing the curvature or using a curvature threshold [30]. In this work, we employed a MOO approach based on curvatures for designing optimal experiments. The basic premise of our MBDOE is to select experimental conditions that maximize the informativeness of the data and ensure that the model behaves relatively linearly with respect to the parameters. More specifically, our MBDOE uses two objective functions: the first involves the maximization of a FIM-based information metric, and the second involves the minimization of relative curvature measures [28]. The second objective function ensures that the FIM can provide a reliable measure of data informativeness.

Multi-Objective Design of Experiments Based on Curvatures
In this section, we derive the relative curvature measures by following the work of Bates and Watts [28]. We consider an arbitrary straight line in the parameter space passing through θ̂:

θ(b) = θ̂ + b h,

where h = [h1, h2, . . ., hp]^T is a non-zero vector. As the scalar parameter b varies, a curve, also referred to as the lifted line, is traced through the expectation surface according to

µ_h(b) = µ(θ̂ + b h).

The tangent line of this curve at b = 0 is given by

µ̇_h = Ḟ h.

The set of all such tangent lines, that is, the column space of Ḟ, describes the tangent (hyper)plane at µ(θ̂).
Meanwhile, the curvature measures come from a quadratic approximation of µ. In this case, the acceleration of µ_h(b) at b = 0 can be written as follows:

µ̈_h = h^T F̈ h.

The acceleration vector µ̈_h can subsequently be decomposed into two components:

µ̈_h = µ̈ᵗ_h + µ̈ⁿ_h,

where, at µ(θ̂), µ̈ᵗ_h is tangential to the tangent plane and µ̈ⁿ_h is normal to it. The tangential acceleration µ̈ᵗ_h is also called the parameter-effect curvature [28] and provides a measure of nonlinearity along the parameter vector h. The degree of the parameter-effect curvature can change upon reparameterization of the model. Meanwhile, the normal acceleration µ̈ⁿ_h does not vary with the model parameterization, and hence it is called the intrinsic curvature. Finally, the relative curvature measures in the direction of h are given by [28,32]:

Kᵗ_h = ||µ̈ᵗ_h|| / ||µ̇_h||², (12)
Kⁿ_h = ||µ̈ⁿ_h|| / ||µ̇_h||². (13)

Below, we describe the decomposition of the Hessian into the tangential and normal components. We consider the QR-factorization of the Jacobian Ḟ, that is,

Ḟ = Q R, Q = [T | N],

where R ∈ IR^(n×p) has the upper-triangular block R1 ∈ IR^(p×p) in its first p rows and zeros below, and the first p columns of Q (i.e., T) form an orthonormal basis of the tangent plane. The remaining column vectors of Q (i.e., N) are orthonormal to the tangent surface at ϕ = 0. In the same manner, the Hessian matrix in the rotated axes can be written as Ä = L^T F̈ L, where L = R1^(−1) and Ä_ijk = ∂²F_i(x, u, ϕ)/∂ϕ_j ∂ϕ_k. The decomposition of the Hessian into the tangential and normal components is given by the following equation [28]:

Äᵗ_(·jk) = T T^T Ä_(·jk), Äⁿ_(·jk) = N N^T Ä_(·jk).

The matrices Äᵗ and Äⁿ respectively correspond to the parameter-effect and intrinsic curvature components of the Hessian.
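The tangential/normal split above can be sketched numerically: the QR factors of a (hypothetical, randomly generated) Jacobian provide orthonormal bases T and N, and the projections T T^T and N N^T split any acceleration vector into its parameter-effect and intrinsic parts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Jacobian F_dot (n x p) and one acceleration vector mu_ddot (n,)
n, p = 6, 2
F_dot = rng.normal(size=(n, p))
mu_ddot = rng.normal(size=n)

# Full QR: first p columns (T) span the tangent plane, the rest (N) its complement
Q, R = np.linalg.qr(F_dot, mode="complete")
T, N = Q[:, :p], Q[:, p:]

# Tangential and normal components of the acceleration
acc_t = T @ (T.T @ mu_ddot)   # parameter-effect (tangential) part
acc_n = N @ (N.T @ mu_ddot)   # intrinsic (normal) part

# The two parts are orthogonal and sum back to the original vector
print(np.allclose(acc_t + acc_n, mu_ddot))
print(abs(acc_t @ acc_n) < 1e-10)
```

Applying the same projections face-by-face to the Hessian array yields the tangential and normal curvature arrays used in the relative curvature measures.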
To normalize the relative curvatures in Equations (12) and (13), Bates and Watts [28] used the scaling factor ρ = s √p, where s² = (y − µ̂)^T (y − µ̂)/(n − p). Following the same procedure, we define the normalized relative curvatures as follows:

γᵗ_h = ρ Kᵗ_h, γⁿ_h = ρ Kⁿ_h.

In addition, recasting h in the rotated axes as h = L d, the tangent vector µ̇_(Ld) has unit norm (i.e., ||µ̇_(Ld)|| = 1) when d is a unit vector. The computation of γᵗ_h and γⁿ_h thus simplifies to

γᵗ_h = ρ ||d^T Äᵗ d||, γⁿ_h = ρ ||d^T Äⁿ d||.

In the proposed experimental design, the maxima of these curvature measures are used, where

γᵗ_max = max over unit vectors d of γᵗ_h, (19)
γⁿ_max = max over unit vectors d of γⁿ_h. (20)

As mentioned above, in formulating the MOO for the design of experiments, two design criteria have been taken into account. The first is that the experiment should be designed to maximize the informativeness of the data for parameter estimation. In this case, we employ an information metric based on the FIM. Meanwhile, the second design criterion in the MOO aims to minimize both the parameter-effect and intrinsic curvatures. The MOO formulation offers certain advantages, for example, that there is no need to prioritize any one of the criteria beforehand. Instead, we generate the Pareto set or Pareto frontier, representing the set of solutions for which we cannot improve the value of one objective function without negatively affecting the other(s) [35].
Considering the kinetic ODE model given in Equation (3), our multi-objective formulation using the D-optimal criterion is given by

maximize (over the design variables): Π_i λ_i,
minimize (over the design variables): γᵗ_max and γⁿ_max,

subject to the model equations (Equation (3)) and bounds on the design variables, where λ_i is the ith eigenvalue of the FIM (Equation (5)). The first objective function can be substituted with other FIM-based metrics (see Table 1). The parameter vector θ̂ is either an initial guess of the parameter values or the parameter estimate from the current iteration of an iterative model identification procedure [6]. The decision variables may include the initial condition of the states x0, the sampling time points of the measurements t_sp, and the dynamic input u(t). In the case study below, we considered a control vector parametrization (CVP) of the inputs u_i(t), as illustrated in Figure 2.
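The D-optimal information metric, the product of the FIM eigenvalues (i.e., its determinant), can be computed directly from a sensitivity matrix. The following is a minimal sketch on a hypothetical random sensitivity matrix, assuming the usual form FIM = Ḟ^T V^(−1) Ḟ; in practice, the log-determinant is preferred for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical first-order sensitivity matrix F_dot (n observations x p parameters)
n, p = 20, 4
F_dot = rng.normal(size=(n, p))
V = np.eye(n)  # measurement noise covariance (identity for simplicity)

# FIM = F_dot^T V^-1 F_dot
FIM = F_dot.T @ np.linalg.inv(V) @ F_dot

# D-optimal metric: product of eigenvalues = det(FIM);
# the log-determinant avoids overflow/underflow for large or small eigenvalues
eigvals = np.linalg.eigvalsh(FIM)
log_det = np.sum(np.log(eigvals))
print(log_det)
print(np.isclose(log_det, np.linalg.slogdet(FIM)[1]))
```

The A-, E-, and modified E-optimal criteria in Table 1 would replace the product of eigenvalues with, respectively, a function of their sum, the minimum eigenvalue, or the eigenvalue ratio.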

Numerical Implementation of the Curvature-Based MOO Design
As described in the previous section, the parameter-effect and intrinsic curvatures require the computation of the first- and second-order model sensitivities. For the ODE model in Equation (3), the first-order sensitivities can be calculated according to

Ḟ_(·j) = θ_j (∂F/∂x) (∂x/∂θ_j).

The sensitivities in the above equation are normalized with respect to the parameter values. The last term on the right-hand side is the first-order sensitivities of the ODE model, which obey the following differential equation:

d/dt (∂x/∂θ) = (∂f/∂x)(∂x/∂θ) + ∂f/∂θ, ∂x/∂θ (t0) = 0, (24)

where f denotes the right-hand side of the ODE model (Equation (3)). Here, we have assumed that x0 is not part of the parameter estimation, but such an assumption can be easily relaxed. In the case study, the sensitivities ∂x/∂θ were computed by solving the ODE in Equation (24) simultaneously with Equation (3), following a procedure known as the direct differential method [36]. Meanwhile, the Hessian matrix was approximated using a finite-difference method, by perturbing each parameter and differencing the first-order sensitivities:

F̈_(·j·) ≈ [Ḟ(θ̂ + ∆θ_j e_j) − Ḟ(θ̂)] / ∆θ_j,

where e_j is the jth elementary vector, using 1% parameter perturbations (i.e., ∆θ_j/θ_j = 0.01).
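The finite-difference Hessian approximation can be sketched on a toy model with analytic derivatives (the model and its Jacobian below are hypothetical, chosen only so the approximation can be checked against exact second derivatives):

```python
import numpy as np

# Toy nonlinear output F(theta) = [theta1 * exp(-theta2), theta1**2 * theta2]
def F(theta):
    return np.array([theta[0] * np.exp(-theta[1]), theta[0] ** 2 * theta[1]])

def jac(theta):  # analytic first-order sensitivities (n x p)
    return np.array([
        [np.exp(-theta[1]), -theta[0] * np.exp(-theta[1])],
        [2 * theta[0] * theta[1], theta[0] ** 2],
    ])

theta = np.array([0.5, 0.5])
n, p = 2, 2

# Forward-difference Hessian from first-order sensitivities,
# with 1% relative parameter perturbations as in the text
H = np.zeros((n, p, p))
for j in range(p):
    dtheta = 0.01 * theta[j]
    e = np.zeros(p)
    e[j] = dtheta
    H[:, j, :] = (jac(theta + e) - jac(theta)) / dtheta

# H[i, j, k] approximates d^2 F_i / (d theta_j d theta_k),
# e.g. H[0, 0, 1] should be close to -exp(-theta2)
print(H[0, 0, 1])
```

Each of the p perturbed sensitivity evaluations is independent, which is what makes the Hessian computation straightforward to parallelize.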
The second-order sensitivities above are also normalized with respect to the parameter values. Meanwhile, the curvature measures γᵗ_max and γⁿ_max in Equations (19) and (20) were calculated from the Hessian matrix using the alternating least squares (ALS) method [37], an algorithm created to find the maximum singular value σ_max of a three-dimensional matrix. Based on the definitions in Equations (19) and (20), the maximum curvature measures can be determined by computing the maximum singular values of the matrices ρÄᵗ and ρÄⁿ, respectively. More specifically, we implemented the ALS method to solve for

σ_max = max over unit vectors r and s of || Σ_(j,k) B_(·jk) r_j s_k ||,

where B is either ρÄᵗ or ρÄⁿ. The ALS algorithm starts with initial guesses of the vectors r and s and uses the above equation to solve for one vector while fixing the other, in an alternating manner. Zhang and Golub showed that the method converges linearly in a neighbourhood of the optimal solution [37].
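The alternating scheme can be sketched as follows. This is a simplified interpretation, not the authors' exact implementation: with one direction fixed, the three-way array collapses to an ordinary matrix, and the best unit vector for the other direction is that matrix's top right singular vector. A rank-1 test tensor, whose maximum is known, serves as a sanity check:

```python
import numpy as np

def tensor_sigma_max(B, iters=200, seed=0):
    """Approximate max over unit r, s of || sum_jk B[:, j, k] r_j s_k ||_2
    by alternating updates (a simplified ALS sketch)."""
    n, p, q = B.shape
    rng = np.random.default_rng(seed)
    r = rng.normal(size=p)
    r /= np.linalg.norm(r)
    s = rng.normal(size=q)
    s /= np.linalg.norm(s)
    for _ in range(iters):
        # With s fixed, M = sum_k B[:, :, k] s_k is n x p; the unit r
        # maximizing ||M r|| is the top right singular vector of M.
        M = np.einsum("ijk,k->ij", B, s)
        r = np.linalg.svd(M, full_matrices=False)[2][0]
        # Symmetrically update s with r fixed
        M2 = np.einsum("ijk,j->ik", B, r)
        s = np.linalg.svd(M2, full_matrices=False)[2][0]
    return np.linalg.norm(np.einsum("ijk,j,k->i", B, r, s))

# Sanity check on a rank-1 tensor sigma * (u outer v outer w): the maximum is sigma
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.6, 0.8])
w = np.array([0.8, 0.6])
B = 3.0 * np.einsum("i,j,k->ijk", u, v, w)
print(tensor_sigma_max(B))  # close to 3.0
```

For the curvature computation, B would be set to ρÄᵗ or ρÄⁿ, and the returned value would be γᵗ_max or γⁿ_max, respectively.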
In the case study, the MOO problem was solved using the non-dominated sorting genetic algorithm II (NSGA-II) in MATLAB, producing a Pareto frontier in the space of the objective functions [38]. We employed a population size of 300 and set the number of generations to 50 times the number of design parameters (i.e., 1450). We recast each maximization of an objective function as the minimization of its negative counterpart. The optimal design was selected from the Pareto frontier by balancing the trade-offs among the objective functions. More specifically, we first normalized the objective functions such that their values on the Pareto frontier ranged between 0 and 1. We then chose as the final design the solution on the Pareto frontier that minimized the Euclidean norm of the (normalized) objective functions.
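The Pareto-point selection step can be sketched as follows, on a small hypothetical frontier (the objective values below are made up for illustration; both objectives are written as minimizations, as in the text):

```python
import numpy as np

# Hypothetical Pareto frontier: each row is one design, columns are the
# (to-be-minimized) objective values, e.g. [-log det FIM, max curvature]
pareto = np.array([
    [-12.0, 0.9],
    [-10.0, 0.4],
    [ -8.0, 0.2],
    [ -5.0, 0.1],
])

# Normalize each objective to [0, 1] over the frontier
lo, hi = pareto.min(axis=0), pareto.max(axis=0)
normalized = (pareto - lo) / (hi - lo)

# Pick the point with the smallest Euclidean norm of the normalized objectives
best = np.argmin(np.linalg.norm(normalized, axis=1))
print(best, pareto[best])
```

This balances the objectives without requiring a priori weights; a practitioner could equally pick a different frontier point if one criterion mattered more.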

MBDOEs of the Baker's Yeast Fermentation Model
We evaluated the performance of the proposed MBDOE in an application to a kinetic model of the fed-batch fermentation of baker's yeast [30,31]. In addition to the D-optimal criterion, we also implemented the A-optimal, E-optimal and modified E-optimal criteria (see Table 1) with our MOO MBDOE. We compared the performance of our method to other MBDOEs, including (a) FIM-based MBDOEs, that is, D-optimal, A-optimal, E-optimal and modified E-optimal designs; (b) a D-optimal design with a curvature threshold [30]; (c) a Q-optimal MBDOE [29]; and (d) a MOO MBDOE using parameter correlation [15]. In total, we applied and compared 14 MBDOE methods. For the optimizations in (a), (b) and (c), we employed the enhanced scatter search metaheuristic (eSSm) algorithm [39][40][41]. For the MOO in (d), we used the optimization algorithm and optimal Pareto point selection described in the previous section.
In the fed-batch fermenter model, cellular growth and product formation are captured by the biomass variable x1, which is assumed to rely on a single substrate variable x2. The fermenter operates at a constant temperature, and the feed is free from product. The model equations are given by

dx1/dt = (r − u1 − θ4) x1,
dx2/dt = −(r/θ3) x1 + u1 (u2 − x2),
r = θ1 x2 / (θ2 + x2), (27)

where the input u1 is the dilution factor (in the range of 0.05-0.20 h−1) and the input u2 is the substrate concentration in the feed (in the range of 5-35 g/L). In the model, the biomass growth rate r follows Monod-type kinetics. The parameters θ1 and θ2 are the Monod kinetic parameters, θ3 is the yield coefficient, and θ4 is the cell death rate constant.
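A minimal simulation sketch of the fermenter follows. It assumes the standard Monod-type fed-batch benchmark form (growth minus dilution and death for biomass; consumption plus feed for substrate); the exact right-hand side should be taken from Equation (27) in the original references [30,31]. The integrator is a simple fixed-step RK4 with constant inputs:

```python
import numpy as np

def rhs(x, u, theta):
    """Fed-batch fermenter right-hand side (assumed Monod-type benchmark form)."""
    x1, x2 = x
    th1, th2, th3, th4 = theta
    r = th1 * x2 / (th2 + x2)                  # Monod specific growth rate
    dx1 = (r - u[0] - th4) * x1                # biomass: growth - dilution - death
    dx2 = -r * x1 / th3 + u[0] * (u[1] - x2)   # substrate: consumption + feed
    return np.array([dx1, dx2])

def simulate(x0, u, theta, t_end=20.0, dt=0.01):
    """Fixed-step RK4 integration with constant inputs u = (u1, u2)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(t_end / dt)):
        k1 = rhs(x, u, theta)
        k2 = rhs(x + 0.5 * dt * k1, u, theta)
        k3 = rhs(x + 0.5 * dt * k2, u, theta)
        k4 = rhs(x + dt * k3, u, theta)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

theta_d = [0.5, 0.5, 0.5, 0.5]   # nominal design parameters from the text
x_final = simulate(x0=[1.0, 0.1], u=(0.1, 20.0), theta=theta_d)
print(x_final)  # biomass and substrate after 20 h
```

In the actual design problem, the constant inputs would be replaced by the piecewise-constant CVP profiles of Figure 2, with the simulation restarted at each switching time.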
In the MBDOE, the design variables consisted of the initial condition of the biomass x1(0) in the range between 1 and 10 g/L, 10 measurement sampling times (t_sp), and the inputs u1(t) and u2(t). The piecewise-constant dynamic inputs were each parametrized using the CVP, as shown in Figure 2. Thus, the MOO was performed with 29 design parameters (x1(0), 10 t_sp's, 10 u_i,j's, and 8 t_sw's). The length of the time interval between two successive measurement sampling points was constrained to be between 1 and 20 h, while that between two input switching times was bounded between 2 and 20 h. The calculations of the Jacobian and Hessian matrices in the MBDOEs were made using the parameter values θ_d = [θ1, θ2, θ3, θ4] = [0.5, 0.5, 0.5, 0.5] [15,42], which were different from the "true" parameter values used for noisy data generation in the next section. The reason for using a parameter set in the MBDOE different from the true values was to emulate the typical scenario in practice, in which one would start only with an estimate or guess of the model parameters. Figures 3 and 4 show the optimal dynamic inputs and data sampling times resulting from all the MBDOE methods mentioned above (see also the Pareto frontiers in Figures S1 and S2 in the Supplementary Materials). Meanwhile, Table 2 gives the optimal initial biomass concentration x1(0).

Table 2. Optimal initial condition of the biomass x1(0) (g/L) from the model-based designs of experiments (MOO: multi-objective optimization).

Design Criterion | x1(0)
D-optimal | 10.0
MOO D-optimal and curvatures | 10.0
MOO D-optimal and correlation | 10.0
A-optimal | 10.0
MOO A-optimal and curvatures | 9.9
MOO A-optimal and correlation | 10.0
E-optimal | 10.0
MOO E-optimal and curvatures | 10.0
MOO E-optimal and correlation | 10.0
Modified E-optimal | 10.0
MOO modified E-optimal and curvatures | 10.0
MOO modified E-optimal and correlation | 10.0
Threshold curvature | 8.2
Q-optimal | 5.5

Performance Evaluation
For each of the optimal experimental designs above, we generated in silico datasets by simulating the ODE model using the parameter values θ* = [0.31, 0.18, 0.55, 0.05], as reported in previous publications [15,42]. We subsequently added independent and identically distributed (i.i.d.) Gaussian white noise to the model simulations, using a relative variance of 0.04 for both x1(t) and x2(t) [15,42]. For each in silico dataset, we then performed parameter estimation on the resulting data (y1 and y2) by maximum likelihood estimation, that is, by minimizing the objective function in Equation (4). We also employed the following constraints on θ in the optimization: 0.05 ≤ θ1, θ2, θ3 ≤ 0.98 and 0.01 ≤ θ4 ≤ 0.98. Finding the globally optimal solution to the parameter estimation in Equation (4) is challenging. Here, we solved the constrained parameter optimization problem using the interior-point algorithm (implemented by the fmincon function in MATLAB), with the true parameter values θ* as the initial guess. By employing the true values as the initial starting point of the optimization, we expected that the parameter accuracy would mainly be affected by the experimental design and not by the ability of the parameter optimization algorithm to find the globally optimal solution.
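The bounded maximum likelihood fit can be sketched as follows. This toy example fits only the two Monod parameters of a static rate curve (a stand-in, not the full ODE estimation), uses SciPy's L-BFGS-B bounded solver in place of MATLAB's fmincon interior-point algorithm, and starts from the true values as in the text:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Toy data: noisy Monod rate r = th1 * s / (th2 + s) at several substrate levels
theta_true = np.array([0.31, 0.18])
s = np.linspace(0.05, 2.0, 30)
model = lambda th: th[0] * s / (th[1] + s)
rel_var = 0.04
y = model(theta_true) * (1.0 + np.sqrt(rel_var) * rng.normal(size=s.size))

# Weighted least squares = MLE for Gaussian noise with constant relative variance
def objective(th):
    mu = model(th)
    return np.sum((y - mu) ** 2 / (rel_var * np.maximum(mu, 1e-12) ** 2))

# Bounded optimization (L-BFGS-B here stands in for fmincon's interior-point
# algorithm), started from the true values as in the text
res = minimize(objective, x0=theta_true,
               bounds=[(0.05, 0.98), (0.05, 0.98)], method="L-BFGS-B")
print(res.x)  # estimates near theta_true
```

Repeating this over many noise realizations, as done 100 times per design in Table 3, yields the distribution of parameter estimates from which the nMSE is computed.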

Discussion
As shown in Figure 3, the MBDOEs prescribed manipulating the input u1(t) mostly at the beginning of the experiment and the input u2(t) over the entire duration of the experiment. For the majority of the MBDOEs in this study, the optimal sampling times spread unevenly over the duration of the experiment (see Figure 4). A more detailed comparison between Figures 3 and 4 showed that the optimal sampling points were typically placed before and after a change in the dynamic inputs u1(t) and u2(t). The exception to this observation was the optimal design using the A-optimal criterion, which gave the worst parameter accuracy among the MBDOEs considered.
The consideration of model curvature using the proposed MOO MBDOE generally led to improved parameter accuracy over using only model curvature (i.e., the Q-optimal and threshold curvature designs) or only FIM-based criteria. The lowest nMSE came from the MOO MBDOE using the modified E-optimal criterion with model curvature. In comparison to the MOO MBDOE using parameter correlation, employing model curvature in the MOO framework gave better experimental designs with lower average nMSEs. Meanwhile, the Q-optimality and curvature-thresholding strategies provided better nMSEs than the majority of the FIM-based criteria, except the D-optimal design. Finally, the optimal experiments based on the A-optimal criterion, either alone or in the MOO MBDOE, performed poorly. The poor performance of the A-optimal design has also been reported in a previous publication [15].
The obvious drawback of curvature-based MBDOEs in comparison to FIM-based strategies is the higher computational cost associated with computing the Hessian matrix. While the number of first-order sensitivities (Jacobian) increases linearly with the number of parameters p, the number of second-order sensitivities scales with p². Fortunately, the calculation of the Hessian matrix can be easily parallelized and implemented using multiple computing cores. In practice, one often focuses on only a subset of the model parameters, and therefore the MBDOE is typically done for a handful of parameters.
We note that the MBDOE methods considered in this work consider only parametric uncertainty in the models and assume that uncertainty in the model equations, that is, structural uncertainty, is not significant. For certain types of models, such as generalized mass action and S-system models [43,44], model structural uncertainty can be treated as parametric uncertainty, and therefore the MBDOE strategies developed here could be applied. As mentioned in the Introduction, MBDOE methods for discriminating between model structures have been developed, many of which are based on the Bayesian approach. Furthermore, in applications for which there exists intrinsic parametric variability, for example, batch-to-batch variability in cell culture fermentation processes, Bayesian MBDOE methods would be more suitable than FIM-based strategies, as Bayesian methods are able to incorporate the prior (intrinsic) distribution of the parameter values in the design. Nevertheless, as demonstrated in the case study, even when the MOO MBDOEs were performed using model parameters that were quite different from the true values, the resulting optimal designs led to precise and accurate parameter estimates. Meanwhile, biological systems, like other complex systems, have been argued to be sloppy. In the context of our work, sloppy systems lead to mathematical models whose FIMs have eigenvalues that are spread evenly on a logarithmic scale over many orders of magnitude [45]. In other words, the system behavior is sensitive to, or controlled by, a small number of parameter combinations (along the FIM eigenvectors corresponding to large eigenvalues). At the same time, there exist many parameter combinations that can be varied without affecting the system behavior. Such sloppiness could arise in a system governed by processes that span large and evenly distributed length and/or time scales, such that there exists no clear separation between relevant and irrelevant mechanisms. A recent study demonstrated that in the case of sloppy systems, reducing the model parametric uncertainty by MBDOEs beyond a certain point might not necessarily translate into any improvement in model prediction accuracy [45]. However, it is possible to construct reduced-order models of sloppy systems whose parameters correspond to the important parameter combinations [46,47]. Parameter estimation and MBDOE strategies can then be applied to these reduced models.

Figure 1.
Figure 1. Iterative model identification cycle. The model building process involves the following key steps: experimental design, model structure formulation, parameter estimation, and model validation.

Figure 3.
Figure 3. Optimal dilution factor and feed substrate concentration. Optimal dilution factor (u1 in h−1, left panels) and feed substrate concentration (u2 in g/L, right panels). (A,B) D-optimal (blue). (C,D) A-optimal (red). (E,F) E-optimal (green). (G,H) Modified E-optimal (black). In panels (A-H), the optimal u1 and u2 using Fisher information matrix (FIM)-based criteria are shown by solid lines. Those using FIM-based criteria combined with curvatures are shown by dashed lines, while those using FIM-based criteria combined with parameter correlation are drawn with dash-dotted lines. (I,J) Threshold curvature (magenta, solid line) and Q-optimal design (magenta, dashed line).

Figure 4.
Figure 4. Optimal sampling grid from the model-based designs of experiments (MBDOEs). Simple Fisher information matrix (FIM)-based criteria are shown by solid lines, FIM-based criteria combined with curvatures by dashed lines, and FIM-based criteria combined with parameter correlation by dash-dotted lines. Dots indicate the sampling times.
Figure 2. Control vector parametrization of the input profiles. In the baker's yeast case study, we implemented piecewise-constant input profiles with u_i = [u_i,1, u_i,2, u_i,3, u_i,4, u_i,5] and four switching times: t_sw1, t_sw2, t_sw3, and t_sw4.

Table 3.
Model-based design of experiment (MBDOE) performance on the fed-batch fermentation of the baker's yeast model. The overall parameter accuracy is represented by the average of the normalized mean-square error (nMSE). The reported parameter values and errors are the averages and standard deviations from 100 repeated runs of parameter estimation.