Robust Model Selection: Flatness-Based Optimal Experimental Design for a Biocatalytic Reaction

Considering the competitive and strongly regulated pharmaceutical industry, mathematical modeling and process systems engineering might be useful tools for implementing quality by design (QbD) and quality by control (QbC) strategies for low-cost but high-quality drugs. However, a crucial task in modeling (bio)pharmaceutical manufacturing processes is the reliable identification of model candidates from a set of various model hypotheses. Identifying the best experimental design for reliable model selection and system identification is challenging for nonlinear (bio)pharmaceutical process models in general. This paper is the first to exploit differential flatness for model selection problems under uncertainty, and thus translates the model selection problem into advanced concepts of systems theory and controllability. Here, the optimal controls for improved model selection trajectories are expressed analytically with low computational costs. We further demonstrate the impact of parameter uncertainties on the differential flatness-based method and provide an effective robustification strategy with the point estimate method for uncertainty quantification. In a simulation study, we consider a biocatalytic reaction step simulating the carboligation of aldehydes, where we successfully derive optimal controls for improved model selection trajectories under uncertainty.


Introduction
Quality by design (QbD) and quality by control (QbC) are critical aspects in pharmaceutical manufacturing, aiming for low-cost but high-quality drugs [1]. Because research and development costs in the competitive and strongly regulated pharmaceutical industry have been rising over recent decades, leading to shrinking profit margins [2], QbD and QbC have attracted much notice. Mathematical models and process systems engineering might be useful tools for implementing effective QbD/QbC strategies, which is particularly true for the rapidly growing biopharmaceutical market [3][4][5]. A crucial task in modeling biopharmaceutical manufacturing processes is the reliable identification of model parameters and proper model candidates from a set of various model hypotheses. Model-based design of experiment (MBDoE) approaches, where the challenge of precise parameter identification and reliable model selection is formulated as an optimization problem, have proven beneficial over the last few decades [6][7][8][9]. Please note that the literature distinguishes between statistical design of experiment and MBDoE approaches. Methods of the former, such as response surface methods, are popular and commonly used because of their simplicity but, for the same reason, are not able to sufficiently handle complex dynamic systems [10]. On the other hand, the MBDoE approach frequently aims for optimal control actions, where commonly used numerical methods give rise to nonlinear programming problems that need substantial amounts of computational time and efficient solvers [11]. Moreover, robustification against model parameter uncertainties further complicates the problem, as statistical quantities must be considered when solving the underlying optimization problem. In addition to the actual MBDoE optimization framework and data acquisition, recent studies have shown that a well-posed parameter identification and model selection problem is equally important [12][13][14].
In this context, the consideration of input residuals in parameter estimation beyond the classical approach of output residuals has drawn attention in the literature [13][14][15][16]. For an effective solution to the robust model selection problem, we aim at implementing an inverse modeling technique by using the differential flatness concept [17]. In systems theory, differential flatness implies that analytical expressions of the system's states and controls exist that are functions of so-called flat outputs and their derivatives. Hence, by implementing a differential flatness strategy to solve an MBDoE problem, we can avoid solving differential and sensitivity equations numerically, as required by standard control vector parameterization strategies for model selection. Please also note that while optimizing the flat output trajectory, the number of continuous optimization variables is reduced to the number of flat outputs, i.e., an additional reduction of computational costs as compared to standard nonlinear programming techniques with control and state variable discretization [18]. Methods based on differential flatness have been heavily used in research and industry, primarily in the design of open-loop controllers and the planning of state trajectories in (electro)mechanical systems [19]. In the case of the MBDoE technique, strategies exploiting differential flatness have barely been considered [15,16,20,21] and, to our knowledge, have not been applied to model selection problems under uncertainty before. Thus, this study addresses two open research challenges: (i) to apply the flatness concept to multi-model problems, and (ii) to integrate parameter uncertainties into the flatness-based optimization problem for robust model selection. In detail, we aim at discarding inappropriate model candidates from a set of competing model hypotheses.
To this end, we show that the considered model candidates satisfy the differential flatness condition, and we solve an optimization problem under uncertainty that makes use of the flatness property. For the sake of illustration, we consider a biocatalytic process from a carboligation reaction system that forms an essential precursor in the synthesis of natural products and pharmaceuticals [22]. Please note that carbon-carbon bond-forming reactions are the backbone of numerous high-value molecules in industrial organic synthesis. However, biocatalytic production of the related bulk and fine chemicals and active pharmaceutical ingredients (APIs) is insufficiently explored today. Therefore, the application of enzymes in organic synthesis is currently an important research topic to exploit the significant potential of improving manufacturing processes in the chemical and pharmaceutical industry under the agenda of green chemistry [23,24]. An essential problem with all mathematical models and model-based design concepts, including MBDoE, is the uncertainty in model parameters. To ensure reliable inferences, parametric uncertainty should be taken into account in the MBDoE approach [6,14,25,26]. In general, probability-based concepts have attracted considerable attention in robust optimization. In the literature, a distinction can be made between classical and Bayesian settings for handling model prediction errors as a potential consequence of uncertain model parameters. In the former, the strategies have in common that the model output differences in the objective function are weighted by an error covariance matrix of the model prediction to avoid experimental regions where model predictions are poor [7,[27][28][29].
One drawback of these strategies is that the prediction error covariance matrix is calculated from local sensitivities, where designs of experiments for local sensitivities might provide misleading information compared to designs based on global sensitivities [30]. On the other hand, Bayesian model selection with consideration of parametric uncertainties requires the determination of computationally expensive integrals, in particular, if the recommended bias-free numerical evaluation path is chosen [31,32]. More broadly speaking, uncertainty propagation and quantification still pose considerable challenges in computational efficiency and numerical accuracy. In this context, we propose to diminish this computational burden by using an efficient and effective rule to determine statistical moments circumventing a local approximation, namely the point estimate method (PEM). The PEM has proven beneficial in various engineering problems [33], including robust optimization problems in complex (bio)chemical process design [3,34].
The paper is organized as follows. In Section 2, we cover the basics of the inverse modeling and differential flatness concept. In Section 3, we introduce the flatness-based MBDoE technique with the PEM for its robustification under parameter uncertainties. In Section 4, we apply the proposed MBDoE concept to a biocatalytic reaction network, and we discuss the results in Section 5. Conclusions can be found in Section 6.

Inverse Modeling
In process systems engineering, dynamic process models are often mathematically described with ordinary differential equations (ODEs). Their state-space formulation is

ẋ(t) = f(x(t), u(t), θ), x(t_0) = x_0, (1)

where t is the time, x(t) ∈ R^n denotes the system states, u(t) ∈ R^m the system inputs, and θ ∈ R^p the parameter vector. The nonlinear function f: R^n × R^m × R^p → R^n represents the vector field of the dynamic system. The output function is defined as

y(t) = h(x(t), u(t), θ), (2)

with the system output vector y(t) ∈ R^q and h: R^n × R^m × R^p → R^q as the (non-)linear output function. Please note that in many cases, the system outputs are a (linear) function of the system states only; that is, y(t) = h(x(t)). Equations (1) and (2) form the process model, M. Please note that in the following, time and parameter dependencies are specified only when the context requires it. As opposed to solving the ODE system (Equation (1)) in a forward manner to conventionally compute the outputs y given the inputs u, inverse modeling aims at reconstructing the inputs u for given outputs y [35][36][37]. Please note that the process of inverse simulation is not to be confused with inverse problems, where in the latter the goal is to estimate unknown model parameters θ̂ [35]. The different principles of forward simulation, inverse modeling, and the inverse problem of parameter identification are visualized in Figure 1.
Figure 1. Illustration of the different principles of a standard process simulation configuration (a), the inverse modeling setting to reconstruct system inputs (b), and the inverse problem for parameter identification (c).
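To make the inverse modeling idea concrete, the following minimal Python sketch (the paper's implementation is in Julia; Python is used here purely for illustration) assumes a hypothetical first-order model ẋ = −θx + u with output y = x, for which the input follows algebraically as u = ẏ + θy. A forward simulation driven by the reconstructed input then reproduces the prescribed output.

```python
# Minimal sketch of forward simulation vs. inverse modeling for a
# hypothetical first-order model dx/dt = -theta*x + u with output y = x.
import numpy as np
from scipy.integrate import solve_ivp

theta = 0.5

# Prescribed output trajectory and its analytic derivative.
y_des = lambda t: 1.0 - np.exp(-t)
dy_des = lambda t: np.exp(-t)

# Inverse modeling: reconstruct the input that produces y_des.
u_rec = lambda t: dy_des(t) + theta * y_des(t)

# Forward simulation with the reconstructed input, starting at x0 = y_des(0) = 0.
sol = solve_ivp(lambda t, x: -theta * x + u_rec(t), (0.0, 5.0), [0.0],
                dense_output=True, rtol=1e-9, atol=1e-12)

t_check = np.linspace(0.0, 5.0, 50)
err = np.max(np.abs(sol.sol(t_check)[0] - y_des(t_check)))
print(err)  # close to zero: the forward run reproduces the prescribed output
```

The key point is that no optimization or numerical inversion is needed once the input can be expressed algebraically in the output and its derivatives.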
In addition to numerical inversion approaches, such as model inversion- or inverse simulation-based techniques [38], algebraic methods have been proposed for inverse modeling [39], among which strategies based on differential flatness are important representatives.

Differential Flatness
In systems theory, within the context of algebraic model inversion techniques and aiming at feedforward control problems, differential flatness was introduced by Fliess et al. [17] in 1992. A (non-)linear process model (Equations (1) and (2)) is called differentially flat, or shortly flat, if there exists an output vector

y_flat = h_flat(x, u, u̇, ..., u^(s), θ), (3)

with a finite value s ∈ N and the smooth mapping function h_flat: R^n × (R^m)^(s+1) × R^p → R^q, referred to as a flat output, which fulfills the following conditions:
1. The states can be described as a function of the flat output and its derivatives:

x = Ψ_x(y_flat, ẏ_flat, ..., y_flat^(r)), (4)

with the smooth mapping function Ψ_x.
2. The controls can be described as a function of the flat output and its derivatives:

u = Ψ_u(y_flat, ẏ_flat, ..., y_flat^(r+1)), (5)

with the smooth mapping function Ψ_u.
3. The dimensions of the control and the flat output vector are equal:

dim(y_flat) = dim(u) = m. (6)

Here, r ∈ N specifies the number of occurring derivatives. Generally speaking, r is not known beforehand, apart from single-input single-output (SISO) systems, where r = n − 1 [40]. There is an infinite number of flat output candidates, and often, they are a function of the states only [40], similar to the fact that the real output function of a dynamic system is, in many cases, exclusively a function of the states.
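A textbook illustration of these conditions is the double integrator ẋ1 = x2, ẋ2 = u with flat output y_flat = x1 (a toy system, not one of the paper's model candidates). The sketch below verifies the reconstruction maps Ψ_x and Ψ_u symbolically with SymPy.

```python
# Symbolic check of the flatness conditions for the double integrator,
# a standard flat SISO system (n = 2, so r = n - 1 = 1).
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')(t)   # flat output candidate: y_flat = x1

x1 = y                    # Psi_x, first component
x2 = sp.diff(y, t)        # Psi_x, second component: x2 = dy/dt
u  = sp.diff(y, t, 2)     # Psi_u: u = d^2y/dt^2

# The reconstructed states/input satisfy the ODEs x1' = x2 and x2' = u identically.
assert sp.simplify(sp.diff(x1, t) - x2) == 0
assert sp.simplify(sp.diff(x2, t) - u) == 0
print(x2, u)
```

Any sufficiently smooth prescribed trajectory for y thus yields the required control by pure differentiation, which is exactly the mechanism the MBDoE approach below exploits.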
Flat outputs might have no direct physical meaning, or they might be identical to measurable quantities; in the latter case, they are also referred to as flat inputs [41]. Moreover, local and global flatness must be distinguished, where for the local form, the differential flatness is valid only in a restricted domain bound by occurring singularities [42]. For certain kinds of singularities, a globally flat system can be designed following a superposition concept. To this end, individual local model inversion domains are created from different local flat outputs, where the union of the local domains provides a global framework for the flatness-based model inversion [42]. Alternatively, a transformation approach exists to bypass points of a singularity [43]. In addition to differential flatness of ODE systems, flatness-based methods are also being used for discrete-time systems and partial differential equations (PDEs); see [44][45][46], and references within. Application scenarios based on differential flatness cover predominantly (electro)mechanical problems, but also process technologies like heat exchangers [47], crystallizers [48] or (bio)reactors [45,49], lithium-ion batteries [50], and more.

Flat Output Identification
To implement a flatness-based concept, two key challenges exist: first, determining whether a system is differentially flat, and second, constructing a flat output. There is no general method for either of these challenges. However, exceptions exist and include systems that are linearizable by static feedback control, which are flat by definition. For linear systems, controllability is equivalent to differential flatness [43]. Methods for determining flat outputs and the mapping function h_flat, respectively, are currently being researched, and the interested reader is referred to [19,51,52] and the references therein. Moreover, the construction of so-called flat inputs (i.e., reconstructed inputs based on the given output function (Equation (2))) was shown for SISO systems and a limited set of multi-input and multi-output (MIMO) cases [53].
Alternatively, in systems theory, valuable information about system characterization based on the interaction of inputs, system states, and outputs can also be extracted from graph theory [39,54,55]. Typically, observability or controllability measures are derived for (non)linear state-space models using directed graphs (digraphs) [56][57][58]. A digraph D = [V, E] with n + m + q vertices v_i ∈ V and edges e ∈ E can be constructed from the adjacency matrices A_u, A_x, and A_y of the process model for the inputs, system states, and outputs, respectively. The (i, j)-th element of A_x or A_u is set to 1 if the corresponding partial derivative does not vanish identically (i.e., ∂f_i/∂x_j ≠ 0 or ∂f_i/∂u_j ≠ 0), and to 0 otherwise. If such a derivative exists, there is an edge from vertex v_j to vertex v_i. Analogously, the adjacency matrix for the outputs y and the output function h(x, u) can be defined. Regarding the flatness property, in the paper by Csercsik et al. [55], an algorithm for flatness analysis of MIMO systems is introduced that uses an explicit expressibility graph. Please note that from a given digraph, the explicit expressibility graph is formed with reversed edge directions, with the outputs replaced by the flat outputs, and with the self-loop system state dependencies omitted. The following theorem then gives a necessary condition for differential flatness based on the adjacency matrices A_u, A_x, and A_y [55].
Theorem 1. For a prospective set of flat output-input pairs, m pairwise disjoint paths must exist in the explicit expressibility graph, the union of which covers each vertex of the graph.
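The adjacency matrices above can be generated mechanically from the sparsity pattern of the model equations. A short sketch, using a hypothetical three-state toy vector field (not the paper's case-study kinetics), illustrates the construction of A_x and A_u with SymPy:

```python
# Build adjacency matrices from Jacobian sparsity: a_{i,j} = 1 iff the
# partial derivative df_i/dx_j (or df_i/du_j) is not identically zero.
import sympy as sp

x1, x2, x3, u1 = sp.symbols('x1 x2 x3 u1')
states, inputs = [x1, x2, x3], [u1]

# Hypothetical toy vector field f(x, u) for illustration only.
f = [-x1*x3 + u1,
      x1*x3 - 0.2*x2,
     -x1*x3]

A_x = [[int(sp.diff(fi, xj) != 0) for xj in states] for fi in f]
A_u = [[int(sp.diff(fi, uj) != 0) for uj in inputs] for fi in f]
print(A_x)  # [[1, 0, 1], [1, 1, 1], [1, 0, 1]]
print(A_u)  # [[1], [0], [0]]
```

From these matrices, the digraph and the explicit expressibility graph of Theorem 1 follow directly by reversing the edge directions and dropping the diagonal self-loops.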
In summary, existing construction methods are often restrictive and require cumbersome calculations. In practice, for the identification of flat outputs (Equation (3)), a sequential strategy has proven favorable, in which an expert guesses a flat output candidate and subsequently validates it in terms of Equations (4)-(6). This trial-and-error approach is facilitated by two facts. First, numerous technical systems are indeed flat systems. Second, comparable to Lyapunov functions in control theory, informative flat outputs frequently have a physical significance [40]. Once the flat output configuration is identified, the flat output functions y_flat must be parameterized using, for instance, spline functions.

Basis Splines
The flat outputs y_flat have to be parameterized subject to the requirement that they correspond to solutions of the dynamic system (Equations (1) and (2)). For flat outputs, basis functions from a large set of possibilities, ranging from simple polynomials to more complex functions, can be chosen. Basis splines, i.e., piece-wise polynomial functions also known as B-splines, offer great freedom of action, are flexible and well-studied, and libraries for their usage are implemented in many different programming languages [59]. The classical application of splines is the approximation and interpolation of data points.
A flat output based on B-splines is defined by

y_flat(t) = Σ_{i=1}^{l} θ_i N_{i,k}(t), (7)

where the N_{i,k} are B-splines of order k on l collocation points, and θ = (θ_1, ..., θ_l) is a coefficient vector, whose elements are referred to as meta-parameters. Please note that the meta-parameters θ_i constitute the decision variables in the flatness-based optimal experimental design framework for model selection, which is introduced in the following section.
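A minimal sketch of such a parameterization with SciPy follows; the knot vector and the meta-parameter values are arbitrary illustrative choices (the paper's implementation uses Julia spline libraries). Note that the spline's derivative, needed by the inverse model, is available in closed form.

```python
# B-spline parameterization of a flat output: the coefficients theta act as
# meta-parameters, i.e., the decision variables of the design problem.
import numpy as np
from scipy.interpolate import BSpline

k = 3  # spline degree (cubic)
# Clamped knot vector on [0, 10] (endpoint multiplicity k + 1).
knots = np.concatenate(([0.0]*k, np.linspace(0.0, 10.0, 8), [10.0]*k))
# len(knots) - k - 1 = 10 coefficients: illustrative meta-parameter values.
theta = np.array([0.0, 0.2, 0.9, 1.5, 1.2, 0.7, 0.9, 1.4, 1.1, 1.0])

y_flat = BSpline(knots, theta, k)   # flat output trajectory y_flat(t)
dy_flat = y_flat.derivative()       # exact derivative, needed for Psi_u

t = np.linspace(0.0, 10.0, 201)
print(y_flat(t[:3]), dy_flat(5.0))
```

Because the clamped knot vector makes the spline interpolate its first and last coefficients, initial and terminal conditions on the flat output translate into simple bounds on the first and last meta-parameters.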

Robust Design of Experiments for Model Selection
To maximize the information provided by an experiment for the aim of discriminating between different model hypotheses, we incorporate the inverse modeling approach into the design of experiments setting. First, a generic scheme for a standard MBDoE strategy for model selection is given in Figure 2. Starting with the model-building process, several competing model hypotheses and candidates may exist. Thus, the MBDoE technique predicts operating conditions and experimental data that ensure better model selection and falsification. Next, the optimized experimental setup is implemented, and new informative data are recorded. With these new data, the model parameters of all model candidates are refined and updated. Finally, based on the recalculated mismatch of experimental data and simulation results, the difference in the competing model candidates may become more significant, and some model candidates might be excluded in terms of falsification. As indicated in Figure 2, the MBDoE workflow is an iterative process; that is, the prediction of improved experiments, the data acquisition, the adaption of the model parameters, and the assessment of the remaining model candidates are repeated until the most suitable model candidate is identified. Mathematically, the MBDoE approach can be stated as an optimal control problem (OCP) of the form

max_{φ(t) ∈ Φ} J(y(t))
s.t. process model: Equation (1),
     g_eq(x, u, θ) = 0,
     g_ineq(x, u, θ) ≤ 0,
     φ(t) ∈ [φ_min, φ_max], (8)

where J is the scalar objective function aiming to maximize the difference between the trajectories of the output functions of competing model candidates, and φ(t) = φ(u(t), x_0) is the design vector within the design space Φ, specified by the control u(t) and the initial states x_0 = x(t_0), where t_0 = 0 is the initial time and t_f is the time duration of the experiment. The objective function is given in Mayer form without loss of generality, as the Lagrange and Bolza problem types can be converted to the former [18].
Both equality constraints g_eq: R^(n×m×p) → R^(n_eq) and inequality constraints g_ineq: R^(n×m×p) → R^(n_ineq) can be considered in the optimization problem. For example, they might be introduced to avoid negative values of system states or to restrict a species' concentration to a certain limit. [φ_min, φ_max] are the lower and upper bounds of the design variables. Solving the dynamic optimization problem (Equation (8)) can be performed following two different discretization strategies for the time-dependent functions in Equation (8): Optimize then Discretize, also called the indirect or variational approach, or Discretize then Optimize, its direct counterpart. The first strategy focuses on the solution of the first-order optimality conditions of the OCP, resulting in a boundary value problem (BVP). The difficulties associated with solving BVPs led to research activities dealing with the conversion of the OCP to constrained, finite-dimensional problems to exploit state-of-the-art large-scale nonlinear program (NLP) solvers [18]. Methods applying NLP solvers can be classified further into sequential and simultaneous strategies. In the sequential approach, referred to as single shooting, the control vector is parameterized, and the resulting NLP problem is solved using control vector parameterization (CVP) methods [60,61]. In contrast, the simultaneous multiple shooting approach divides the OCP into smaller subproblems, which require, in addition to the CVP, the initial states of the individual subproblems, and serves as a bridge to the direct transcription approach, where all variables, i.e., states and controls, are discretized [18]. The discretization of the states and controls is usually realized using collocation and control parameterization techniques that lead to complex optimization problems for which efficient solvers and great computational power are needed [11].
Various scalar objective functions J for the MBDoE approach (Equation (8)) have been proposed in the literature [62][63][64][65][66][67]. The general goal of the optimal control problem for model selection (Equation (8)) is to maximize some measure of the distance between the different model candidates by searching for an optimal control profile while meeting the specified constraints, including solving the underlying ODE system (Equation (1)).
The proposed flatness-based approach for the MBDoE performs differently; it literally operates in an inverse manner. Optimized flat output trajectories maximize the difference between the model outputs of the competing model candidates. The related controls u and the initial states x_0 are reconstructed from the flat model M^(−1) and have to be identical for all model candidates and their optimized flat outputs y_flat. Using the Euclidean distance as a discrepancy measure between model outputs and assuming M model candidates, the original OCP given in Equation (8) becomes

max_{θ_1, ..., θ_M} Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} ||y_i(t) − y_j(t)||_2^2
s.t. Δ(u_i(t), u_j(t)) = 0, ∀ i ≠ j,
     Δ(x_{0,i}, x_{0,j}) = 0, ∀ i ≠ j,
     g_eq = 0, g_ineq ≤ 0, (9)

where the inputs, states, and outputs in the optimization formulation are expressed according to Equations (2), (4), and (5). The differences between the reconstructed inputs and the initial states for all M model candidates are measured by the delta function Δ(·), which might be the Euclidean distance alike. Because the flat outputs are of functional form, time-dependent empirical basis functions y_flat,i, i ∈ {1, ..., M}, with meta-parameters θ_i have to be specified; see also Section 2.3.
The optimal control problem stated in Equation (8) is thus readily transformed into an algebraic nonlinear optimization problem, as the system inputs and outputs are available in closed form, and the need for integrating the underlying ODE system and for control and/or state vector parameterization is dropped. As a last step, the resulting optimal control profiles and initial conditions are employed to conduct a new experiment that is expected to deliver additional informative data regarding model falsification.
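A stripped-down version of this NLP can be sketched as follows. Two hypothetical scalar candidates (not the paper's carboligation models) are assumed, M1: ẋ = −θ1·x + u and M2: ẋ = −θ2·x² + u, both flat with y_flat = x, so that u_i follows algebraically from each candidate's flat output. Each flat output is parameterized by a cubic polynomial, the reconstructed inputs are constrained to coincide at a few collocation points, and the output separation is maximized, all without integrating any ODE.

```python
# Toy flatness-based design NLP for two hypothetical scalar model candidates.
import numpy as np
from scipy.optimize import minimize

th1, th2 = 1.0, 1.0
tk = np.array([0.0, 0.5, 1.0])  # collocation points for the input-matching

def poly(c, t):  return c[0] + c[1]*t + c[2]*t**2 + c[3]*t**3
def dpoly(c, t): return c[1] + 2*c[2]*t + 3*c[3]*t**2
def split(z):    return z[:4], z[4:]

def neg_separation(z):  # maximize the output distance at the collocation points
    c1, c2 = split(z)
    return -np.sum((poly(c1, tk) - poly(c2, tk))**2)

def input_match(z):     # reconstructed inputs and initial states must coincide
    c1, c2 = split(z)
    u1 = dpoly(c1, tk) + th1 * poly(c1, tk)       # inverse of M1
    u2 = dpoly(c2, tk) + th2 * poly(c2, tk)**2    # inverse of M2
    return np.append(u1 - u2, poly(c1, 0.0) - poly(c2, 0.0))

z0 = np.array([0.5, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
res = minimize(neg_separation, z0, method='SLSQP', bounds=[(-2, 2)]*8,
               constraints={'type': 'eq', 'fun': input_match})
print(-res.fun, np.max(np.abs(input_match(res.x))))
```

All functions in this program are algebraic; no ODE solver and no sensitivity equations appear, which mirrors the computational advantage claimed above.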
After conducting the optimally designed experiment, gathering the corresponding data, and updating the model parameters, the model assessment is performed. Criteria for model assessment and selection span different dimensions: falsifiability, explanatory adequacy, interpretability, faithfulness, goodness-of-fit, complexity/simplicity, and generalizability [35].
For instance, a well-known and commonly used criterion is the Akaike information criterion (AIC) or its small-sample analog, the corrected Akaike information criterion AIC_c [68], which evaluates the model response residuals as well as the model complexity in terms of the dimension of the model parameter vector. In general, however, there is no master strategy for model discrimination and, therefore, no universal criterion. The method applied depends on the stage of modeling, the type of uncertainty expected, and the strategy chosen [69]. For instance, the so-called overlap approach emphasizes the fact that in reality, model parameters are uncertain, which is in turn not regarded in many common discrimination criteria such as the AIC or the Bayesian information criterion (BIC) [31]. Assuming additive measurement noise and according to the Doob-Dynkin lemma [70], the identified model parameters θ̂ can be considered to be random variables, where the probability space (Ω, F, P) contains the sample space Ω, the σ-algebra F, and the probability measure P. Thus, the overlap approach accounts for parameter uncertainties by considering the trajectory confidence intervals, in contrast to the AIC, where only the point estimate at the maximum likelihood and the corresponding model responses are used [69]. Adjusted to the requirements at hand, the overlap of two models considering parametric uncertainties is

O_{1,2} = Σ_k ∫ min( p(y; μ_1,k, σ_1,k), p(y; μ_2,k, σ_2,k) ) dy, (10)

where p(·; μ_i,k, σ_i,k) denotes the output density of model i at time point k, and μ_i,k and σ_i,k are the expected value and the variance of the output function of model i at time point k resulting from the stochastic nature of the parameters, respectively. In the next subsection, we illustrate how these statistical quantities in Equation (10) can be quantified efficiently with the point estimate method (PEM).
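One plausible realization of such an overlap measure at a single time point, assuming Gaussian output distributions built from the estimated moments (μ, σ²), is the integral of the pointwise minimum of the two densities: it equals 1 for indistinguishable predictions and approaches 0 for well-separated ones. A minimal numerical sketch:

```python
# Overlap of two Gaussian output densities at one time point, computed as
# the integral of the pointwise minimum of the two densities (a sketch,
# assuming Gaussian-distributed model outputs).
import numpy as np

def gauss(y, mu, var):
    return np.exp(-0.5 * (y - mu)**2 / var) / np.sqrt(2.0 * np.pi * var)

def overlap(mu1, var1, mu2, var2, n=20001):
    lo = min(mu1 - 6*np.sqrt(var1), mu2 - 6*np.sqrt(var2))
    hi = max(mu1 + 6*np.sqrt(var1), mu2 + 6*np.sqrt(var2))
    y, dy = np.linspace(lo, hi, n, retstep=True)
    return float(np.sum(np.minimum(gauss(y, mu1, var1),
                                   gauss(y, mu2, var2))) * dy)

print(overlap(1.0, 0.04, 1.0, 0.04))  # identical predictions -> ~1
print(overlap(1.0, 0.04, 2.5, 0.04))  # well-separated predictions -> ~0
```

Summing such per-time-point overlaps over the measurement grid yields a trajectory-level discrimination score that shrinks as the confidence bands separate.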

Point Estimate Method
The mean and variance occurring in the overlap formula (Equation (10)) are known as the first raw and second central moments of a continuous random variable and are defined as

E[k(θ)] = ∫ k(θ) P(θ) dθ, (11a)
V[k(θ)] = ∫ (k(θ) − E[k(θ)])² P(θ) dθ, (11b)

where P(θ) is the probability density function related to the random model parameter vector θ of a process model M (Equations (1) and (2)), and k(·) represents a generic nonlinear function. Usually, no closed-form solutions of the integrals on the right side of Equations (11a) and (11b) are available, and the integrals must be evaluated numerically. For accurate results, Monte Carlo simulations can be performed that draw many samples from the probability density functions. Thus, in particular, in the case of complex functions k(θ) and many model parameters, the evaluation of the integral becomes computationally expensive or even prohibitive due to the so-called curse of dimensionality. Alternatively, the PEM, initially developed for generic multi-dimensional integration problems over symmetrical regions, is a credible and practical method for uncertainty propagation with low computational cost; see [33] and references within. In process systems engineering, the PEM has been successfully applied to robustify various optimization problems, including non-symmetrical probability density functions, correlated model parameters, and imprecise parameter uncertainties [3,71,72]. The PEM approximates the integrals in Equation (11) by summing over n_PEM weighted sampling points:

∫ k(θ) P(θ) dθ ≈ Σ_{l=1}^{n_PEM} w_l k(θ_l), (12)

with the parameter samples

θ_l ∈ {θ_0, θ_0[i], θ_0[(i,j)]}. (13)

Here, θ_0[i] means that the ith element of the nominal parameter vector, θ_0, is permuted via the spreading parameter, ϑ, and θ_0[(i,j)] that the ith and the jth elements are modified, respectively. Note that the weight factors, w_l, as well as the spreading parameter, ϑ, are determined via a corresponding algebraic equation system; the interested reader is referred to [33] and references therein. Based on the parameter samples, θ_l, we can approximate the statistics of a given nonlinear function.
For instance, the approximations of the expected value and the variance as stated in Equations (11a) and (11b) read as

E[k(θ)] ≈ Σ_{l=1}^{n_PEM} w_l k(θ_l), (14a)
V[k(θ)] ≈ Σ_{l=1}^{n_PEM} w_l (k(θ_l) − E[k(θ)])². (14b)

Please note that the overall parameter sample number, n_PEM, used in Equations (14a) and (14b) scales quadratically with the dimension p of the uncertain model parameters:

n_PEM = 2p² + 1. (15)

At the cost of accuracy (i.e., an additional approximation error introduced by reducing the sample set in Equation (13) to {θ_0, θ_0[i]}), the required sample numbers in Equations (14a) and (14b) can be reduced according to

E[k(θ)] ≈ Σ_{l=1}^{n_PEM} w_l k(θ_l), (16)

where the overall parameter sample number, n_PEM, now scales linearly with the dimension of the uncertain model parameters:

n_PEM = 2p + 1. (17)

Equations (14) and (16) are exact for monomials of degree up to five and three, respectively [33]. Thus, in what follows, we use the terms PEM5 and PEM3 to distinguish between the two PEM approximation schemes. In Table 1, we summarize the values of the weights, w_l, and the spread parameter, ϑ, assuming a standard Gaussian distribution. In the case of PEM3, the spread parameter, ϑ, can be considered to be a design parameter, which is frequently set to the corresponding PEM5 value, that is, ϑ = √3. Please note that any parametric or non-parametric probability distribution of relevant model parameters can be considered via a (non)linear transformation step, including parameter correlations [71].
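The PEM5 scheme can be sketched compactly in Python. The weight values below follow the generator-function formulas commonly reported for this quadrature rule under a standard Gaussian assumption (θ ~ N(0, I)); they stand in for the entries of Table 1, which is not reproduced here.

```python
# PEM5 sampling scheme for standard Gaussian parameters (theta ~ N(0, I)):
# center point, single-axis permutations, and pairwise permutations.
import numpy as np

def pem5_samples(p):
    v = np.sqrt(3.0)                        # spreading parameter
    w0 = 1.0 + (p**2 - 7.0*p) / 18.0        # center weight
    w1, w2 = (4.0 - p) / 18.0, 1.0 / 36.0   # axis / pair weights
    pts, wts = [np.zeros(p)], [w0]
    for i in range(p):                      # single-axis permutations theta_0[i]
        for s in (+v, -v):
            th = np.zeros(p); th[i] = s
            pts.append(th); wts.append(w1)
    for i in range(p):                      # pairwise permutations theta_0[(i,j)]
        for j in range(i + 1, p):
            for si in (+v, -v):
                for sj in (+v, -v):
                    th = np.zeros(p); th[i] = si; th[j] = sj
                    pts.append(th); wts.append(w2)
    return np.array(pts), np.array(wts)

def pem_mean(k, pts, wts):
    return sum(w * k(th) for th, w in zip(pts, wts))

pts, wts = pem5_samples(2)
print(len(pts))                                            # 2*2**2 + 1 = 9
print(pem_mean(lambda th: th[0]**2 + th[1]**2, pts, wts))  # E = 2 (exact)
print(pem_mean(lambda th: th[0]**4, pts, wts))             # E = 3 (exact)
```

Dropping the pairwise permutations (and adjusting the weights accordingly) yields the PEM3 variant with its 2p + 1 samples; non-Gaussian or correlated parameters would first be mapped to the standard Gaussian space via the transformation step mentioned above.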

Robust Flatness-Based Objective Function
For the robust formulation of the nominal optimization problem (Equation (9)), we aim at introducing an additional term in the objective function that penalizes the propagated uncertainty, which is quantified by the variances V of the models' inputs u. In this vein, we expect the calculated overlap (Equation (10)) to decrease as the confidence bands of the trajectories tighten. The objective function is balanced by a weight factor α ∈ [0, 1], leading to Pareto-optimal points for its different values. The optimization problem for a flatness-based robust design of experiments for model selection is then stated as

max_{θ_1, ..., θ_M} (1 − α) Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} ||E[y_i(t)] − E[y_j(t)]||_2^2 − α Σ_{i=1}^{M} V[u_i(t)]
s.t. constraints of Equation (9). (18)

The decision variables of the optimization problem are the meta-parameters present in the basis functions for the flat outputs, i.e., the control vectors of the B-spline curves. As previously in the nominal optimization problem (Equation (9)), the robust formulation is a programming problem for which all occurring functions are algebraically available. Thus, the nominal version (Equation (9)) and the robust version (Equation (18)) are readily stated as an NLP without any use of parameterization methods. Furthermore, the possibility of deriving functions for exact gradients is given, as opposed to the application of automatic differentiation or finite difference methods. A flow chart for the proposed robust MBDoE strategy using differential flatness is given in Figure 3.

Computational Methods
The whole program is coded in Julia [73]. The derivation of flat systems is based on the comprehensive Julia suite for differential equations [74]. For symbolic calculation, the Python package SymPy is used, whose functions are accessed from Julia via an interface. As the modeling language for the optimization problems, the Julia-based JuMP was chosen [75]. The nonlinear programming problems were solved with the open-source large-scale interior-point solver IPOPT [76]. All calculations were run on a standard desktop machine and finished within an hour, where the bulk of the computational load was spent on symbolic calculations and the quantification of the propagated uncertainty.

Case Study
Biocatalytic reactions appear in numerous syntheses of natural products and active pharmaceutical ingredients. For example, chiral hydroxy ketones are important building blocks in the pharmaceutical industry and can be produced from aldehydes via enzymatic carboligation [22]. Addressing the lack of mechanistic kinetic models, the biocatalytic carboligation with simultaneous inactivation of the benzaldehyde lyase was studied in [77] using MBDoE, where the quality of the different model candidates was analyzed using the AIC. Motivated by the biocatalytic carboligation of ketones from aldehydes, in this simulation study, we consider two models simulating the enzymatic step from benzaldehyde to benzoin [22]. In the enzymatic reaction chain, the forward reaction forming the intermediate is significantly faster than its reverse counterpart, leading to the two model candidates in Tables 2 and 3.
where the indices are assigned according to {1, 2, 3, 4} = {S_1, C_1, E, P}, and substrate S_1 and enzyme E can be dosed over the course of the experiment. Please note that in model M_1, the differential equations of the second substrate S_2, which is assumed to be constant, and of the loss product C_2 are not specified, as information about their time behavior does not influence the other differential equations and, therefore, is not relevant to the problem at hand.

Results
For the case study given in Section 4, we analyzed the differential flatness property, and then, we designed discriminating input controls based on the proposed MBDoE approach, which makes use of the differential flatness concept. The effect of model parameter uncertainties was studied as well, and the robust MBDoE approach was applied as an appropriate countermeasure.

Differential Flatness Property
We derive the flat outputs for both models following heuristic methods: (1) obtain a flat output candidate according to Equation (3), and (2) show, with the help of graph theory, that the candidate fulfills the differential flatness conditions given in Equations (4)-(6). Drawing the directed graphs (digraphs) for model M1 and model M2, we observe that they look alike; the corresponding adjacency matrices of the two digraphs coincide. The control vector dimension in both models is 2. Therefore, to comply with the condition given in Equation (6), the dimension of the flat output vector must be 2 as well. Inspecting the system of differential equations in Equation (19), one might observe that x4 does not appear on the right-hand side of any of the other differential equations. The corresponding node in the digraph (see Figure 4a) represents a dead end (i.e., there is no edge from x4 ∈ V to {x1, x2, x3} ⊂ V), and therefore, x4 must be part of the flat output.
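This dead-end argument can be automated on the adjacency matrix: a state with no outgoing edge to any other state cannot be reconstructed from the remaining dynamics and must appear in the flat output. The sketch below illustrates the check; the matrix entries are hypothetical placeholders chosen only so that x4 is a dead end, not the actual matrices of the paper.

```python
import numpy as np

# Hypothetical adjacency matrix of the state digraph: A[i, j] = 1 if state
# x_{j+1} appears on the right-hand side of the ODE for x_{i+1}, i.e., there
# is an edge x_{j+1} -> x_{i+1}.  Placeholder entries for illustration only.
A = np.array([
    [1, 1, 1, 0],   # dx1/dt depends on x1, x2, x3
    [1, 1, 1, 0],   # dx2/dt depends on x1, x2, x3
    [0, 1, 1, 0],   # dx3/dt depends on x2, x3
    [0, 1, 1, 1],   # dx4/dt depends on x2, x3, x4
])

def dead_end_states(adj):
    """Return states (1-based) with no edge to any *other* state, i.e.,
    flat-output candidates; self-loops are ignored."""
    n = adj.shape[0]
    out = []
    for j in range(n):
        influences_others = any(adj[i, j] for i in range(n) if i != j)
        if not influences_others:
            out.append(j + 1)
    return out

print(dead_end_states(A))   # x4 influences no other state
```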
Obviously, the second element of the flat output vector should be chosen from the remaining concentrations, as they have direct physical meaning for the case study at hand. We consider the flat output candidate that evidently satisfies Equation (3). The explicit expressibility graph, which can be readily obtained from the digraph, is shown in Figure 4b. Please note that, in comparison to the digraph (Figure 4a), the self-loops are omitted and the edge directions are reversed. In Figure 4b, two disjoint paths are drawn that cover all vertices related to the inputs, system states, and flat outputs. Therefore, the flat output vector (Equation (21)) is supposed to form a differentially flat system, which we confirm using its definition in the following. From the ODE system, after several reformulations and substitutions, we obtain the inverse models given in Equations (22) and (23). Please note that, for readability reasons, we omit the equations for the second element of the input vector, u2, for model M1 and model M2. However, keep in mind that the functions F1 and F2 in Equations (22) and (23) differ. The inverse model (Equation (22)) with its flat output (Equation (21)) satisfies all three conditions for differential flatness (Equations (4)-(6)). Likewise, the inverse model of model M2 (Equation (23)) fulfills these conditions. Thus, both models are differentially flat. For prescribed trajectories γ*1 = x*1 and γ*2 = x*4, the necessary experimental conditions for the controls are readily available by differentiation; for an example, see [20].
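Since SymPy performs the symbolic calculations in this work, the mechanics of such an inversion can be sketched on a deliberately simple, hypothetical single-input system dx1/dt = -k x1 + u with flat output y = x1 (not one of the two models above): solving the state equation for u expresses the input through the flat output and its time derivative.

```python
import sympy as sp

t = sp.symbols("t")
k = sp.symbols("k", positive=True)   # hypothetical rate constant
y = sp.Function("y")                 # flat output, here y = x1
u = sp.symbols("u")                  # input to be recovered

# toy system: dx1/dt = -k*x1 + u, written in the flat output
ode = sp.Eq(sp.Derivative(y(t), t), -k * y(t) + u)

# inverse model: solve the state equation symbolically for the input
u_of_y = sp.solve(ode, u)[0]
print(u_of_y)   # u expressed via y and its derivative: k*y(t) + dy/dt
```

Prescribing a trajectory y*(t) then yields the necessary control analytically by substitution and differentiation, which is exactly why no numerical integration of the ODE system is needed in the flatness-based design.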

Non-Optimized Experimental Design
We first consider the non-optimized experimental design, i.e., the initial situation before any optimization iteration. Please note that a batch process is assumed, and thus, the controls are zero and not explicitly shown. Plots for the measured states and the measured states with confidence bands are shown in Figure 5 on the left side and on the right side, respectively.

For the nominal case in Figure 5a, it is impossible to distinguish the two models from each other, even if the experimental data were close to one of the models. In Figure 5b, it is well observable that the uncertainties in the measured states given as confidence intervals are low. To better compare the results with the expected results from the optimization part, we normalize the overlap measure (Equation (10)) to the overlap corresponding to Figure 5b; that is, OVL_N = 1.0.
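Equation (10) is not reproduced here. As a hedged illustration, one common way such an overlap measure is defined is the overlapping coefficient of two normal predictive densities, i.e., the integral of the pointwise minimum of the two pdfs, which is 1.0 for identical predictions and approaches 0 for well-separated ones:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

def ovl_normal(mu1, s1, mu2, s2, num=20001):
    """Overlapping coefficient of two normal densities: the integral of the
    pointwise minimum of the two pdfs (1.0 for identical densities)."""
    lo = min(mu1 - 8 * s1, mu2 - 8 * s2)
    hi = max(mu1 + 8 * s1, mu2 + 8 * s2)
    x = np.linspace(lo, hi, num)
    f = np.minimum(norm.pdf(x, mu1, s1), norm.pdf(x, mu2, s2))
    return trapezoid(f, x)

# identical predictive densities overlap completely ...
print(round(ovl_normal(0.0, 1.0, 0.0, 1.0), 3))   # 1.0
# ... while well-separated model predictions barely overlap
print(round(ovl_normal(0.0, 1.0, 6.0, 1.0), 3))   # 0.003
```

Normalizing such a value to the non-optimized design, as done above for OVL_N, then makes the discrimination gain of each optimized design directly comparable.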

Optimized Experimental Design without Uncertainty: The Nominal Case
Before optimizing, suitable B-spline types must be chosen. In contrast to the classical application of splines for the approximation or interpolation of available data, they are used here to express smooth control and state trajectories along the time axis and are created from their definition rather than through a fitting process. Thus, common problems experienced in spline approximation and interpolation, such as cusps and loops [78], are less likely to occur. From the different methods for splines, we choose the uniform method, where the knots are uniformly distributed over the domain range. Furthermore, for each element of the flat output vectors, a B-spline curve of order 6 with 12 control points and clamped end conditions was used, resulting in a 48-dimensional decision space for the optimization problem. In the first optimization step, we follow the classical deterministic optimization by setting α = 1 in the setting (Equation (18)). The results for the measured states of the nominal case are shown in Figure 6, again without uncertainty and with uncertainty vis-à-vis. The corresponding control curves are displayed in Figure 7 on the left side.
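A clamped B-spline curve with these settings (order 6, i.e., polynomial degree 5, with 12 control points on a uniform knot grid) can be sketched with scipy; note that the original implementation is in Julia and the control-point values below are placeholders for the decision variables of the optimization.

```python
import numpy as np
from scipy.interpolate import BSpline

order, n_ctrl = 6, 12            # B-spline order 6 -> polynomial degree 5
degree = order - 1

# clamped (open uniform) knot vector on [0, 1]: first and last knots repeated
# 'order' times, so the curve starts and ends at the end control points
knots = np.concatenate([
    np.zeros(degree),
    np.linspace(0.0, 1.0, n_ctrl - degree + 1),
    np.ones(degree),
])

rng = np.random.default_rng(1)
ctrl = rng.uniform(0.0, 2.0, n_ctrl)   # placeholder decision variables
spl = BSpline(knots, ctrl, degree)

tgrid = np.linspace(0.0, 1.0, 101)
gamma = spl(tgrid)                     # flat-output trajectory gamma(t)
dgamma = spl.derivative()(tgrid)       # derivative needed to recover inputs
```

Because the spline and all its derivatives are available in closed form, the flat-output trajectories and the reconstructed controls enter the nonlinear program analytically, with the control points as the only decision variables.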
In the plot without propagated uncertainties, it is clearly visible that the expected values of the measured states are significantly driven apart. However, a look at the plot on the right reveals that (1) the uncertainties have substantially enlarged compared to the initial situation, and (2) the confidence bands of both models overlap noticeably. The overlap values are given in Table 4. The overlap has decreased versus the initial situation; that is, it is 6 percentage points lower compared to before the optimization. However, we also came across simulation runs where the overlap worsened, in the sense of a larger overlap value, when the nominal optimized state was compared with the non-optimized one. In this context, the question arises whether we can perform better, i.e., further lower the overlap by inherently taking care of the parametric uncertainties in the optimization setting. Therefore, we robustify the optimization by considering the variances in the objective function as described in Section 3.1.


Robust Experimental Design
For the robust MBDoE setting (Equation (18)), we set α = 0.5; that is, the Euclidean distance of the flat outputs of model M1 and model M2 as well as the resulting uncertainties in the recalculated model inputs are equally weighted. Compared with the nominal optimization, the computational time required to reach satisfactory convergence increases multiple-fold despite the low number of sampling points used by the PEM. In particular, when PEM5 is employed, the computational time increases considerably due to the higher sample number; see Equations (15) and (17). The concentration curves for the resulting states of the robust flatness-based design are shown in Figure 8.
The trajectories for the expected values in the left subfigure clearly drift apart, but less strongly than in the nominal optimization case; see Figure 6. Considering 100 equally spaced points, the norms of the model output distances in the robust case are roughly half as large as in the nominal case. The presumption that the overlap has decreased can be confirmed by the overlap values in Table 4. The overlap value has decreased by 19 percentage points and by 13 percentage points when compared with the initial situation and the nominal optimization, respectively. The propagated uncertainties can be extracted from the graphs shown in Figure 9, where the variances for both outputs of model M1 are considerably lower when the nominal case on the left is compared with the robust one on the right. Furthermore, the approximation of the propagated uncertainty using the PEM is sufficiently good, as the results are in close agreement with the Monte Carlo results with 10^4 samples. Thus, the inclusion of the propagated uncertainty as a penalty in the objective function has delivered the desired outcome. Prospectively, a different choice of α or the drawing of a Pareto front might further decrease the overlap between the two model outcomes. The corresponding optimal controls for the substrate and enzyme are drawn in Figure 7 on the right side. Please note that the realization of the profiles in a laboratory setup is constrained by the available pump devices. These constraints could be included in the robust MBDoE framework (Equation (18)), but the technical realization of these input controls is beyond the scope of this study.
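The sample-efficiency of the PEM relative to Monte Carlo can be illustrated with a minimal sketch of a generic 2n+1 point estimate scheme for Gaussian parameters; the paper's PEM3/PEM5 schemes (Equations (15) and (17)) may use different points and weights, so this is an assumption-laden illustration, not the authors' implementation.

```python
import numpy as np

def pem_2n1(f, mu, sigma):
    """Approximate mean and variance of f(theta), theta ~ N(mu, diag(sigma^2)),
    with a 2n+1 point estimate scheme (minimal illustrative variant)."""
    n = len(mu)
    kappa = 3.0 - n                 # matches Gaussian 4th moments per axis;
                                    # turns negative for n > 3 (a known
                                    # limitation of this simple variant)
    spread = np.sqrt(n + kappa)
    pts = [np.asarray(mu, float)]
    wts = [kappa / (n + kappa)]
    for i in range(n):
        for s in (+1.0, -1.0):
            p = np.asarray(mu, float).copy()
            p[i] += s * spread * sigma[i]
            pts.append(p)
            wts.append(1.0 / (2.0 * (n + kappa)))
    vals = np.array([f(p) for p in pts])
    wts = np.array(wts)
    mean = wts @ vals
    var = wts @ (vals - mean) ** 2
    return mean, var

# f(theta) = theta^2 with theta ~ N(0, 1): exact mean 1, exact variance 2,
# recovered here from only 3 model evaluations instead of 10^4 MC samples
m, v = pem_2n1(lambda th: th[0] ** 2, np.array([0.0]), np.array([1.0]))
print(round(m, 6), round(v, 6))   # 1.0 2.0
```

With only 2n+1 deterministic model evaluations per uncertainty propagation, embedding such a scheme inside every iteration of the robust optimization remains tractable, which is the design choice exploited in this work.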

Conclusions
We have extended a novel approach to the model-based design of experiments for model selection that uses the differential flatness property of the systems by robustifying the optimization problem against parametric uncertainty. In comparison with commonly used methods for model discrimination, the resulting optimal control problem does not require approximation methods for solving the underlying system of differential equations, as both controls and states can be derived analytically in differentially flat systems without numerical integration. Likewise, the parameterization of the controls and potentially of the states is unnecessary, as we directly arrive at a nonlinear programming problem. Moreover, analytic gradients can be provided to the optimization solver, even for highly nonlinear systems. The robustification strategy comprised the consideration of propagated uncertainties; that is, the variances of the reconstructed model inputs were added as a penalty term to the objective function. The corresponding uncertainty quantification was performed using the point estimate method, which is computationally cheap but reasonably accurate compared with standard Monte Carlo simulations for uncertainty quantification. We successfully applied the strategy to a nonlinear enzymatic reaction using B-splines as the parameterization technique for expressing the flat outputs, and thus obtained optimal experimental input controls for the subsequent experiment. We showed that a nominal optimization without consideration of parameter uncertainties might result in unreliable optimal controls, i.e., controls that do not allow one model to be selected over the other after the subsequent experiment has been performed. In contrast, if parametric uncertainties were inherently considered in the optimal experimental design problem, more reliable MBDoE results were identified.
We emphasize the importance of including uncertainty quantification as opposed to classic deterministic optimization. In this context, the differential flatness concept shows promising characteristics for advanced system identification strategies in terms of precise parameter estimates and reliable model selection.