Analysis of the Parametric Correlation in Mathematical Modeling of In Vitro Glioblastoma Evolution Using Copulas

: Modeling and simulation are essential tools for better understanding complex biological processes, such as cancer evolution. However, the resulting mathematical models are often highly non-linear and include many parameters, which, in many cases, are difﬁcult to estimate and present strong correlations. Therefore, a proper parametric analysis is mandatory. Following a previous work in which we modeled the in vitro evolution of Glioblastoma Multiforme (GBM) under hypoxic conditions, we analyze and solve here the problem found of parametric correlation. With this aim, we develop a methodology based on copulas to approximate the multidimensional probability density function of the correlated parameters. Once the model is deﬁned, we analyze the experimental setting to optimize the utility of each conﬁguration in terms of gathered information. We prove that experimental conﬁgurations with oxygen gradient and high cell concentration have the highest utility when we want to separate correlated effects in our experimental design. We demonstrate that copulas are an adequate tool to analyze highly-correlated multiparametric mathematical models such as those appearing in Biology, with the added value of providing key information for the optimal design of experiments, reducing time and cost in in vivo and in vitro experimental campaigns, like those required in microﬂuidic models of GBM evolution.


Introduction
Biological processes usually involve several cell populations interacting in a complex, dynamic, and multiple interactive micro-environment [1]. Understanding these interactions between cells and microenvironment is crucial in many physiological and pathological processes [2]. However, progressing in this understanding with only in vivo experiments is difficult. Despite them being more realistic, isolating effects or achieving particular conditions is complex in such experiments due to technical and/or ethical reasons.
In vitro experiments permit better control of the variables, while reducing costly and ethically-questioned animal assays. Nonetheless, the predictive power of currently available in vitro models is still poor due to the strong difficulties that we face in reproducing the structure and distribution of the different cell populations as well as the particular environmental conditions in which cells live, adapt and react (e.g., three-dimensionality) [3]. Microfluidics is a new in vitro technique that allows more precise reproductions of the microenvironment and cell distribution [4,5], including three-dimensionality, thus making in vitro tests much closer to the actual in vivo conditions. This permits, for example, a more reliable and efficient drug testing [6,7].
Finally, mathematical models allow to separate and quantify the effects of each mechanism or parameter, as well as to predict the outcome in "what if" situations, which are sometimes impossible to achieve in in vivo or in vitro experiments [8,9]. Nevertheless, these models are mostly non-linear, involve highly-coupled multiphysic interactions, and include many parameters. In many occasions, those parameters are difficult to measure and have strong hidden correlations. Moreover, it is usual to have a lack of data both for quantification and validation of the parameters and results [10]. Therefore, they are fitted only for the results available, which usually correspond to very specific conditions. This may lead to trivial conclusions that could have been directly derived from the model assumptions, making the results only useful for those particular experiments, with the obtained conclusions impossible to generalize.
In a previous paper [11], we addressed this parametric analysis in a particular problem-the mathematical modeling of the in vitro (using microfluidic devices) evolution of glioblastoma multiforme (GBM), the most aggressive and lethal among primary glioma tumors [12]. In Ref. [11], we presented a general framework in which the main cell processes involved (proliferation, chemotaxis, random migration, apoptosis, and necrosis), in response to changes in the oxygen concentration, were mathematically formulated. We then analyzed three different experimental configurations, reproducing the main GBM migratory structures (pseudopalisade and necrotic core formation). An extensive analysis of all model parameters was performed, both from literature and by fitting the associated in silico results with those derived from the experiments. As main results of that work, we identified a unique set of parameters able to accurately reproduce the quantitative results for the three case-studies. However, we also found two model limitations: (i) the sensitivity analysis showed that the model is strongly affected by small variations in the oxygen cell consumption and diffusion and (ii) a strong correlation was found between the parameters associated with those two mechanisms.
The objective of the present work is to present the possibilities in this context offered by a methodology that is able to separate the correlated effects found in that study, and to get a more accurate and reliable representation of the experimental results in the parametric space. With that purpose, we approximate the multidimensional probability density function of the parameters by means of appropriate copulas. Copulas allow considering separately the marginal distributions and the dependence between variables in multivariate statistical problems, including those with high correlation. This permits using general models for the marginal distributions, while the variable dependence model can be different [13]. Copulas are today used in a wide range of areas in Economic sciences and Engineering. The most recent models have been successfully applied in portfolio management and optimization [14], actuarial analysis [15], quantitative finance and risk theory [16,17]. A particularly hot topic is the study of climate-agent time series [18,19], hydrology [20,21] and weather and climate research [22,23]. Some efforts have been made in transportation research [24] and traffic policy [25]. Recently, copulas have been successfully applied in reliability analysis in civil [26], mechanical and structural [27], offshore [28] and software [29] engineering. In Biology, copulas have been used in the field of genetics [30] to model gene dependencies.
Up to the authors' knowledge, there is no work using copulas for the parametric analysis of evolution processes in Biology, where, as commented, many of the parameters involved are unknown and uncontrolled, and high correlations between parameters are common. We prove here that copulas are an adequate tool to improve the analysis of highly-correlated multiparametric mathematical models such as those appearing in Biology, with the added value of providing key information for the optimal design of new experiments with the highest information possible, thus reducing time and cost not only in in vitro experiments but also in scarce and costly in vivo cases.

Deterministic and Stochastic Models
Let us suppose that our problem may be represented by the following mathematical relationship: with • u (an m-dimensional vector) the output variable, that is, the outcome of the experiments, that we measure. • λ the variables which we can control when performing the experiments (such as environmental variables, geometric parameters, or boundary conditions). • θ the model parameters, that we cannot control and whose values must be determined (θ ∈ Ω, with Ω the parametric space of dimension n). • F the mathematical model, that relates the experimental configuration λ with the output variables u in terms of the set of parameters θ.
In relation to the accuracy and precision of the model, it is possible to define three levels of analysis: (1) the model is perfect and the experimental measures are noise-free; (2) the model is perfect and the experimental measures are noisy; and (3) the model is not perfect and the measurements are noisy. Only the third case is, in general, realistic in complex problems as the one here analyzed.
In addition, it is difficult to define universal values for the parameters in biological problems, since they are highly-dependent on the particular experimental context.
As a consequence of all the previous observations, it is more appropriate to consider a stochastic approach, and reformulate Equation (1) as: where U and Θ are now random vectors of dimensions m and n respectively. The proposed approach is therefore suitable when the following conditions are satisfied: • Many coupled phenomena are present, being difficult to design experiments able to isolate each of them (complexity). • The measurement space is large and it is possible to perform a sufficiently big number of experiments N (data availability).
From a mathematical point of view, these two statements may be reformulated as: • The model F includes many parameters (n 1) and/or is non-separable. The separability of a model is evaluated by the possibility of approximating F as: The lower M, the easier to define a set of different experimental configurations S = {λ j } j=1, ..., k to isolate each of the parameters θ j by solving separately each equation u j = F M (λ, θ). Although this separability definition is not very rigorous, it is enlightening enough for our purposes. • The dimension of the measurement space is high (m 1) and/or the sample size is large enough (N 1). Without loss of generality, we consider that m is, actually, the reduced dimensionality of the space or in other words that all variables of the ambient space are independent.

Case Study: In Vitro GBM Evolution
There have been many attempts to develop mathematical models to describe how tumors grow and respond to therapies [10,31]. In particular, in previous works, we demon-strated the possibility of developing GBM pseudopalisades [32] and necrotic cores [33] in vitro. Figure 1 illustrates one of such experiments in which a high density cell culture is exposed to oxygen flow by two lateral channels but, due to self-induced hypoxia, the formation of a necrotic core in the central part of the chamber is observed. One of the main problems in these models is the lack of reliable values for the many parameters involved that forces many times to rely on values fitted from different situations, leading sometimes to unreliable conclusions. We recently proposed a mathematical model for GBM in vitro evolution [11], together with an extensive parameter discussion. This model enables the simulation of different stages of GBM evolution under several experimental conditions, showing robustness, while keeping a small uncertainty range in the results. It is established in terms of three advection-reaction-diffusion equations and the associated parameters that are expressed as: Equation (4) quantifies the evolution of the cell normoxic phenotype concentration, C a , with three terms: random diffusion, growth-death source, and chemotaxis. Equation (5) models the evolution of the necrotic phenotype concentration, C d , which contains only the dead cells derived from the normoxic phenotype. Finally, Equation (6) defines the O 2 concentration evolution in the hydrogel in which cells are embedded, considering both oxygen diffusion and cell consumption. Functions β a , G a , χ O 2 a , χ C a a , S ad and H a are nonlinear corrections accounting for cell metabolic behavior: a defines a chemotaxis correction accounting for the oxygen concentration. It has been shown that GBM cells present what is called the go or grow behavior [34]: cells spend resources in proliferating when they are enough oxygenated and activate migration mechanisms under hypoxia conditions, that is, when the oxygen concentration is under a certain hypoxia threshold O H 2 . Therefore, we state: χ C a a defines a chemotaxis correction accounting for the cell concentration. We assume that cellular motility is only possible when the cell concentration is below the saturation capacity of the hydrogel C M : β a accounts for the dependence of the proliferation activity on the oxygen concentration, in agreement with the go or grow paradigm [34]. Cell proliferation decreases when the oxygen concentration is under the hypoxia threshold, O H 2 , and is totally inactivated under total lack of oxygen: G a is a logistic growth correction accounting for space and nutrients availability [35]. Cell proliferation decreases when the cell concentration approaches the hydrogel saturation capacity, C M : S ad is a death activation function accounting for the oxygen concentration. Cell death is a complex phenomenon that can be due to two different cell mechanisms, necrosis, and apoptosis [36,37]. Cell necrosis is highly dependent on the oxygen concentration, while cell apoptosis is not. Therefore, we have chosen a soft transition function for S ad depending on two parameters-a location parameter, O A 2 , identifying the anoxia oxygen concentration and a spread parameter, ∆O A 2 , associated with the death stochastic nature: Finally, H a is the Michaelis-Menten correction factor in oxygen consumption, related to the oxidative phosphorylation kinetics [38]. The consumption rate is constant for high oxygen concentrations, but decreases to zero with a homographic shape. The value of the oxygen concentration for which the consumption rate is halved is the so-called Michaelis- The function H a is then stated as: Equations (4)-(6) are complemented with the boundary and initial conditions. For the experiments carried out in our microfluidic devices, we assume total impermeability (Neumann boundary conditions) for the cell populations and a fixed value for the oxygen concentration at both sides of the channel (Dirichlet boundary conditions). Therefore, if L is the chamber length, we may write: with O l 2 and O r 2 the oxygen levels at the left and right channels of the chip. The initial oxygen concentration is assumed to be homogeneous over the whole chamber and equal to the maximum of both lateral oxygen concentrations, that is . The resulting experimental parametric space consists, therefore, of three parameters, corresponding to the concentration at the boundaries of the chip, (O l 2 , O r 2 ), and the initial cell concentration, (C 0 ), assumed constant throughout the chip. That is: O H 2 , O A 2 , ∆O A 2 and O M 2 have a clear meaning in terms of cell metabolism and are assumed to be known and constant for all cell cultures used in our experiments, at least from an illustrative point of view. Besides, although C M is very dependent on the experimental conditions (hydrogel mechanical properties, nutrients, ...), we shall assume it is constant, for the sake of simplicity. The values for these parameters were taken from a previous work [11].
Previous research in computational biology has mainly focused on the value of the parameters or, in the best case, in their (individual) uncertainty. However, in many cases, the fitting process is very complex and the parameters are highly correlated due to, at least, two facts: • Samples variability: Different physical phenomena may have an inherent correlation supported by physical considerations, being this correlation independent of the experiments performed or the model used. For example, when working with GBM cellular models, cell motility is induced by the random motion inherent to any cell and several taxis effects driven by external physical or chemical stimuli. Mathematical parameters related to these phenomena (e.g., diffusion and chemotaxis coefficients) appearing in the model equations will present, therefore, a strong correlation in the different experimental samples. • Model complexity: The non-separability of the model and/or the experiments does not allow to isolate the particular mechanisms. For example, when working with GBM cellular models, without further measurements of cell oxygen consumption or oxygen flux, it is impossible to establish if a lack of oxygen in a certain region is due to high cell consumption or due to low oxygen diffusion. The mathematical parameters related to these phenomena (e.g., oxygen diffusion and cell oxygen consumption coefficients) should present a strong correlation, although this correlation does not have a physical meaning, being inherent to the model or to the experimental set-up.
Thanks to the flexibility, portability, automation, integration, and miniaturization of the microfluidic experiments, a huge amount of data may be generated. Accordingly, this type of experiments is a perfect domain of application for the framework presented herein.

Data Generation and Numerical Solution
As the methodology is based on the availability of sufficient data, the data set used for illustrating the methodology was generated synthetically using numerical simulation. For this purpose, the assumed values for the parameters were extracted from Ref. [11] and a data set of "experimental" measurements was generated by simulation, using randomly generated boundary and initial conditions.
The summary of the model parameters is shown in Table 1, together with the value used for data generation.

mmHg
With respect to the simulated virtual experiment, we set a chip length of L = 0.1 cm, a mesh size of ∆x = 0.0025 cm and a time step of ∆t = 1000 s. N = 400 different experiments, {λ i } i=1, ..., 400 , were simulated varying the boundary conditions: the left and right channel oxygen concentrations were set randomly between 0 and 7 mmHg using two independent uniform distributions while the initial oxygen concentration was set to the maximum of both values, as mentioned. The initial cell profile is supposed to be uniform and randomly sampled from a reciprocal distribution (to take into account both the exponential and saturated growth regimes) between 4 × 10 6 and 5 × 10 7 cell/mL. The numerical solutions are obtained for t m = 8 d and the output variable associated to the experiment i, u i = u s (x, t m ; λ i ), is the numerical solution of the model equations (the mathematical approach and numerical procedures and algorithms are detailed in Ref. [11]), with boundary and initial conditions defined by λ, at time t m and at points given by the defined mesh x. Here, x j = j∆x, j = 1, . . . , 41. The computed data were all perturbed with a uniform . . , 41 and i = 1, . . . , 400. Within the framework presented in Section 2.1, u = F(λ, θ) are the numerical solutions obtained, with λ the control parameters, θ the unknown parameters and F the mathematical model presented.

Concept of Copulas
In Probability and Statistics, a copula is an n-multivariate probability distribution function U whose marginals, U i , are uniform distributions on [0, 1] [39]. They were introduced by Sklar in 1959 [40]. As the marginal distributions are known, a copula describing the structural dependence between variables is enough to perfectly define the model.

Mathematical definition.
As mentioned, a copula is a function C : I n → I, where I = [0; 1] such that: • For u 1 , . . . , u n ∈ I, and if u i = 0 for some 1 ≤ i ≤ n: • For u j ∈ I, 1 ≤ j ≤ n: We can distinguish between parametric and non-parametric copulas. In this work, we use a hybrid approach, as we fit the marginal distributions by means of kernel estimators [41] of the probability density functions and use a parametric copula. With this approach, the required data-set grows as O(n) where n is the space dimension.

Fitting and Model Validation
Let us suppose we have a data-set of values for different experiments, λ i , characterized in terms of a resultant mean value µ i and a covariance matrix Σ i , i = 1, . . . , N, obtained from different measurements associated to the configuration i. As the assumed model F is known, it is possible, for each piece of data u i , to obtain the set of parameters θ i which best fits it.
In order to avoid pathological numerical convergence, we only take into account those sets of parameters θ i which lie inside the bibliography ranges considered in Ref. [11], amplified by 50% to avoid considering the parameters bounds as deterministic values, that, as shown in Table 2, are very large ranges. Therefore, the resulting intervals are [(1 − κ)x inf , (1 + κ)x sup ], being x inf and x sup the lower and upper bounds detailed in Ref. [11] and κ = 0.5, as summarized in Table 2. As a result of this process, we obtain a dataset with n = 6 (number of parameters), N = 111 (dataset size) and m = 41 (measurement space dimension), so we are under the scope of the presented framework: N × m n > 1. Once θ i , i = 1, . . . , N are obtained, the next step is the adjustment of the marginal distributions. The values θ i j , j = 1, . . . , n, are used for fitting the marginal random variable Θ j whose cumulative distribution is assumed to be G j . Here, we can follow either a parametric (that is, G j (x) = G j (x; α j )) or a non-parametric approach (which is the one followed in this work). The values θ i j are therefore transformed into uniformly distributed ones via the standard transformation y i j = G j (θ i j ). As y i are considered uniformly distributed with a joint dependence, it is possible to fit this structural dependence using parametric copulas.
To summarize, the steps of the training process are: 1. Problem minimization to obtain θ i . We have to minimize the residual function R i : where the Mahalanobis distance has been used to take into account the sample variability. Assuming that Σ i = σ i 2 I, Equation (18) can be rewritten as: 2. Kernel density estimation of the marginal distributions from the data θ i j . 3. Transformation into uniformly distributed values y i j . 4. Copula fitting of the y data to capture the joint dependence.
The presented sequence of steps allows moving from a dataset S = {θ i } i=1, ..., N to a probabilistic model for the random vector Θ (the marginal kernel densities and the copula parameters encoding the structural dependence), as it is the aim of statistical procedures.
To avoid overfitting, we follow a typical train-test approach: we divide the datasets λ i − u i (where u i includes µ i and Σ i ) in two separate subsets, one used for training and the other used for testing.
If we consider now the test data-set, the procedure is:

Model Analysis and Parameter Estimation
Once the distribution of the random vector Θ is learned, the model is known from a probabilistic point of view. The first straightforward application is parameter estimation It is important to emphasize that with "parameter estimation" we refer to the parameters of the mathematical model, not to the parameters of the distributions used in the statistical characterization (actually, the statistical characterization may be non-parametric), that may be estimated via common statistical inference techniques. A point estimate of the model parameters is given by: where P is a central tendency operator, for example, the expectation operator E, minimizing the L 2 squared norm dispersion (its minimum is the variance), or the geometric median operator M, minimizing the L 2 norm dispersion (its minimum is the mean absolute deviation). However, it is more interesting to perform a confidence region estimation. As suggested in Ref. [44], in this work, we use the so-called Highest Density Regions (HDR) because of their easy interpretation, straightforward generalization to multi-dimensional spaces and direct computation. Recall that, under some distributional assumptions (e.g., normality assumption), HDR computation is reduced to other standard confidence region computation techniques (e.g., χ 2 quantile tolerance ellipsoids). HDR computation enables reliable parameter estimation since, given a significant level threshold α, it is possible to define an HDR region in which the parameters are located with a p = 1 − α probability. This may be performed for single parameters, or, in general, k-tuples of parameters.
This methodology is also applicable to conditional distributions. Let us suppose that we know the value of a certain subset of parameters θ * and let us define θ = (θ , θ * ). Knowing the distribution Θ, that is obtained after the fitting-validation procedure, it is possible to define the conditioned distribution of Θ given Θ * = θ * by its density f defined in terms of the density f of θ: so all HDR computations are now applied to the distribution of Θ given Θ * = θ * by replacing f by f .

Design of Experiments
Design of experiments techniques aim to maximize the information obtained from each performed experiment, in order to reduce the number of them required [45]. In particular, in this work, we use the techniques within the Bayesian Experimental Design (BED), based on the Bayesian interpretation of probability.
BED aims to maximize the expected utility of the experiment outcome [46]. The utility function expresses how useful is the information provided by an experiment. Of course, the optimal experiment design depends on the chosen utility criterion. In this work, the definition of the utility function is based on the Shannon entropy or Information entropy [47].
Under these assumptions, the utility of an experiment λ is defined as the priorposterior gain in Shannon information. That is, the additional information that the experimental configuration λ provides about our model parameters. The utility U(λ) then writes: where u is the experimental observation and θ is a vector of parameters to be determined. f (u|θ, λ) is the probability density of obtaining an experimental outcome u given the experimental configuration λ and the model parameters θ and f (θ, u|λ) is obtained as follows, being f (θ) the prior PDF over the parameters θ: If we assume that u has a multivariate normal distribution (what is indeed not necessary but has been here considered for illustration purposes) with covariance matrix Σ = σ 2 I, and knowing that the entropy of a multivariate normal distribution of dimension n is only dependent on the standard deviation σ [48], we have the following expression for the utility: We assume that we measure the alive cell concentration at 5 given points: u k = C a (x = x k ), k = 1, . . . , 5, where x 1 = 0.015 cm, x 2 = 0.035 cm, x 3 = 0.050 cm, x 4 = 0.065 cm, x 5 = 0.085 cm. We work under the homoscedasticity and independence assumption so that each concentration measurement is assumed to be normally distributed with µ i = u i and σ i = σ, i = 1, . . . , 5. The uncertainty associated with the measurement of the cell concentration is assumed to be σ = 1 × 10 6 cell/mL.
As we work under the assumptions detailed above, Equation (24), representing the utility of an experimental configuration λ, may be computed via numerical integration. A convergence analysis was performed, justifying the use of a given value of N θ (number of sampling points for the model parameter) and N u (number of sampling points for the experimental outcome) for each computation in the numerical integration process.
In order to avoid numerical problems, in all simulations the uniform distributions of the parameters were sampled from = 0.01 to 1 − = 0.99.

Marginal Distributions
First of all, we obtain the fitting of the univariate marginal distributions. Figure 2 shows the kernel estimation of the marginal distribution of the different parameters. We have chosen a Gaussian kernel for all the estimations with variable bandwidths (w 1 = 7.46 × 10 −11 cm 2 /s , w 2 = 9.52 × 10 −10 cm 2 /(mmHg·s), w 3 = 1.66 × 10 −6 cm 2 /s, w 4 = 2.17 × 10 −10 mmHg·cm 3 /(cell·s), w 5 = 9.57 × 10 4 s and w 6 = 2.74 × 10 4 s). The values are generally concentrated around the one used for the data generation, although the distributions present a variable uncertainty, related to the model complexity and its influence on the minimization procedure. For example, it is interesting to observe that all distributions present a multimodal feature, surely related to the existence of several local minima in the minimization procedure.

Parametric Copula Structure
Then, the data are transformed into uniformly distributed values using the cumulative distribution function (CDF) associated to this kernel estimation and a t-Student copula fitted by means of maximum likelihood (ML) estimation. The use of a t-Student copula is justified as it allows a different structural dependence for each of the variable pairs considered [16] and, besides, it outperforms Gaussian copula when estimating the cooccurrence of extreme events [49]. We obtain a copula with ν = 1.8 degrees of freedom and a Pearson correlation matrix of: Note that the value obtained for ν is far from the Gaussian limit (ν → ∞), justifying the use of the t-Student model.

Complete Probabilistic Model and Bayesian a Posteriori Corrections
In order to briefly analyze the aspect of the whole model, we represent in Figure 3a the bivariate joint distribution of (D O 2 , α a ). Knowing the whole joint distribution function allows us to make a posteriori corrections using Bayesian theory and conditional probability as explained in Section 3.2.3. If we are interested in the joint distribution of two parameters (e.g., D O 2 and α a ), assuming that we know the rest (D a , K a , τ a , τ ad ), the uncertainty of the parameter estimation obviously decreases, as can be seen in Figure 3b. In order to compare the impact of setting a posteriori the rest of the parameters, contour plots of both distributions, absolute and conditional (normalized between 0 and 1 to compare them more easily) are depicted in Figure 3c. (a) Bivariate joint distribution of (D O 2 , α a ).
(b) Bivariate joint distribution of (D O 2 , α a ) assuming we know the rest of parameters.
(c) Comparison between the distribution shape of (a) and (b).

Validation of the Results Using Test Data
Over-fitting is one of the main problems in any statistical or numerical parametric fitting. In our methodology, this is avoided by using a sub-set of the data as test data for validating the models.

Marginal Distributions
Marginal distributions are validated as pointed out in Section 3.2. To do so, new "experimental" data are compared to the data generated from the multivariate model. It is important to note that the original data are not used, but, on the contrary, a new data-set is strictly generated from the parametric copula and marginal densities, using the same procedure described for the generation of the original data. The histogram of data, the ecdf of the test data (with 95% confident interval) compared to the model data, the boxplot of both test and model data and the Q-Q plot of the test data, when compared to the model, are shown in Figure 4 for D a as an illustrative example. The validation of the whole set of variables has been performed and good agreement was found between the model and test data except, if at all, for the extreme values, at the tail values of the distributions. In Figure 5, the ecdf of the test data for each model parameter is shown.

Joint Dependencies
Testing the structural dependence between parameters is not trivial. In Section 3.2, a multivariate statistical test was referenced. However, here we evaluate merely the differences in the correlation coefficients between the model-based and the test data. In Figure 6b, we represent the Kendall τ correlation index between the variables for the model and test data. We observe again a good agreement between the model values of the correlation coefficients ( Figure 6a) and those obtained from the sample of the test data ( Figure 6b), even though the test sample is finite, which can cause differences between the model and the statistical values.

Parameter Estimation
In Figure 7, we show p-confident HDR regions for p = 0.90, p = 0.95 and p = 0.99 for the pair of variables D O 2 − α a . We present the results for the absolute distribution and the conditional distribution when the rest of parameters are known. The results are compared with the classical ellipsoid estimation, which is based on the normality assumption. The differences, both in the shape and the size of the regions, are clear and are explained by the complex dependence structure between variables.

Estimation of the Output Variables
Once the multivariate distribution of the random vector Θ is characterized, we know the distribution of the random vectors U = F(λ, Θ). In Figure 8, we show the distribution of the vector U for three experiments, which illustrates completely different behaviors corresponding to the main histopathological features of GBM. For the first one, the oxygen flow is set to 2 mmHg in the left channel and 0 in the right channel and the initial concentration of cells is C 0 = 4 × 10 6 cell/mL (pseudopalisade experiment in Ref. [11]). For the second one, the oxygen flow is set to 7 mmHg in both channels and the initial concentration of cells is C 0 = 40 × 10 6 cell/mL (necrotic core experiment in Ref. [11]). Finally, for the third one, the oxygen flow is set to 7 mmHg in both channels and the initial concentration of cells is C 0 = 4 × 10 6 cell/mL (double pseudopalisade experiment in oxygenated conditions in Ref. [11]).

Design of Experiments
In this section, the aim is to determine the experimental configuration with the highest utility, that is, to choose both right and left oxygen flow levels and the initial cell concentration to get the maximum possible information from the new experiment. We focus here on the effect of coupling between parameters and how it affects the utility interpretation and model parameter estimation.
Two different families of simulations were carried out. In the first one, only one parameter dependence is analyzed at a time, leaving the rest fixed at the value set in Section 3.1. These figures show configurations where, if the rest of the parameters are assumed to be known, the unknown parameter will be estimated accurately. This is the case in Figure 9a,b. In the second family, two parameter dependencies are analyzed. They are considered as bivariate distributions in order to observe the effect that the parameter correlation has in characterizing these parameters, that is, how it modifies the utility values. Figure 9c, which belongs to this family, illustrates experimental configurations where the two-dimensional vector will be estimated accurately.
In Figure 9 we compare the iso-utility curves when analyzing one or two parameter dependencies for the pair of parameters related to oxygen, changes in the cell population and cell motility respectively. We assume for all figures C 0 = 5 × 10 7 cell/mL. In these figures we can see the most useful experiments (those configurations corresponding to the highest utility values) and those that lead to a poor adjustment of the model parameters.
This analysis may be performed for different parameter combinations, and for different degrees of knowledge. For instance, Table 3 summarizes all possibilities when exploring the relationship between D O 2 and α a , as we are interested in the estimation of these two parameters, both individually or jointly. The cases analyzed in this paper are reported in the third column. Table 3. Different possibilities when exploring the relationship between D O 2 and α a in the utility computation.  . Iso-utility curves for parameters related to oxygen for an initial concentration of C 0 = 5 × 10 7 cell/mL.

Discussion
The train-test methodology based on copulas followed in the fitting process has shown that it is possible to establish a gradation in the strength of the parameter dependencies. Figure 6a illustrates the strength of this relationship, showing that there are pairs of phenomena difficult to isolate from the experimental and/or computational points of view. For example, cell random motility and chemotaxis migration (τ = 0.77). Both phenomena have similar effects but in the opposite direction. Thus, it is difficult to isolate their individual effect on cell behavior if we have limited measurements available on the cell profiles. It is then only possible to evaluate, on the outcome, their combined resultant effect, that is, the average cell motility. This analysis may be done for each parameter couple, justifying the approach adopted in this work.
It is important to note that the high complexity of biological systems, resulting in coupling between pairs of variables, is moderated by the values of the rest, since the bivariate random distributions (shown for example in Figure 3a) are only a projection of the whole 6-dimensional joint distribution. Comparing Figure 3a,b, for example, we can observe the conditioning effect in location, spread, and directionality of the dependency.
Once the probabilistic model is fitted, predicting the actual value of the model parameters is easily carried out. As it is observed in Figure 7, the normality assumption for the confidence region estimation is not always a good starting hypothesis. First, it does not take into account the complexity of the relationship between the model parameters (i.e., physical phenomena) and may lead to non reliable values (meaningless physical magnitudes, such as negative oxygen diffusion). Secondly, it may mislead with respect to the uncertainty that we actually have for different significant levels. In any case, the confident region estimation using HDR and a proper probabilistic analysis are very informative about the degree of reliability of the mathematical model used for a biological explanation. These two observations become even more evident when the uncertainty of the model is reduced, as it can be seen when comparing Figure 7a,b: the chosen significant level has a major impact on the confidence region size and shape. In all the cases analyzed, this uncertainty reduction makes the confidence region to concentrate around the parameter values used in the data generation process. Knowledge of the model parameter variation (from a probabilistic point of view) allows to predict the outcome of a given experiment. This can be used not only for model calibration and validation, but for experimental planning (deciding the appropriate material, equipment or accuracy of the measuring devices and techniques to be used). For example, in Figure 8, it may be seen that the necrotic core experiment requires less accuracy in the measurement of the cell profile in the central part of the chamber for parameter estimation, while the pseudopalisade experiment requires a measurement technique able to detect extremely low alive cell concentrations. It can also be observed that the appearance of significant alive cells at the right side of the chamber in the pseudopalisade experiment would not be explained by the model parameter variability, but rather by a model limitation.
The probabilistic knowledge of the model can be further exploited in experimental planning and design by using BED theory. In the analysis performed in this work, there are several aspects important to remark. All graphics showing the utility function are symmetric with respect to the line O l = O r . This is coherent with the symmetrical configuration of the experimental set-up (geometry and properties). The utility value should therefore not be modified by flipping the boundary conditions. Besides, it can be seen that the level curves belonging to D O 2 and α a have similar shapes. This is due to the correlation between parameters, as it can be observed from the Kendall correlation coefficient τ for each pair of variables (Figure 6b). The coefficient corresponding to D O 2 and α a is high and, consequently, they are strongly correlated, so the experiments needed to characterize the value of one of them are similar to the ones needed to characterize the other.
Iso-utility curves give us a picture that may be interpreted biologically and is coherent with the different phenomena occurring in the microfluidic device. However, the coupling between them makes this interpretation difficult. In this work, the utility has been computed for four different initial cell concentrations, ranging from a low concentration C 0 = 1 × 10 6 cell/mL to the chip saturation concentration C 0 = C M = 5 × 10 7 cell/mL. The maximum utility is always reached for the highest initial concentration (5 × 10 7 cell/mL).
A summary of the analysis is presented in Table 4, where the best experimental configuration is presented for each of the parameters' calibration, together with the maximum utility value. For the analyzed family of experiments, the most useful experiments are always the ones performed for high concentrated cell cultures. As most phenomena are related to cell concentrations, the higher the concentration, the more quantifiable the different biological mechanisms. Besides, it results clear that configurations with oxygen gradient are more useful for accurately characterizing the parameters related to oxygen (D O 2 , α a ) and cell migration (D a , K a ), when the other parameters are assumed to be known. However, this gradient has to be moderate to avoid regions of total normoxia or total anoxia. When the aim is to perfectly discriminate between their effects, softer gradients are generally preferred (Figure 9c, Table 4). Finally, for high initial cell concentrations, growth and death parameters are also well characterized under gradient conditions: we need to induce localized hypoxic conditions in order to evaluate growth under saturation capacity and death.

Conclusions
Mathematical modeling of complex cell processes is very challenging due to its intrinsic non-linearity, highly-coupled multiphysic interactions, and the many correlated parameters which are difficult to measure or simply unknown. These parameters are most times obtained for a particular problem under specific conditions, leading in many cases to conclusions, directly derived from the modeling assumptions and therefore providing little new information. Also, they are difficult to generalize.
As a result, a proper and extensive parametric analysis is mandatory. This should include an extensive and detailed study of the values reported in the bibliography, a careful sensitivity analysis and a sufficient number of different experiments, not only for calibration but also for validation, avoiding parameter overfitting.
This analysis, although it allows the identification of the optimal set of parameters, is most times difficult to extend to other problems with reasonable accuracy and therefore with a certain validation of its actual physical character and its value range. It is also difficult to discriminate between correlated parameters associated to mechanisms that cannot be isolated in the experiments. Hence, we need additional information both to get a better discrimination between them, and to identify the optimal conditions for additional experiments to provide the maximum information possible in order to get such discrimination.
We have proved here that copulas are a simple and powerful tool to detect and improve highly-correlated multiparametric mathematical models such as those appearing in Biology, with the added value of providing key information for the optimal design of new experiments with the highest information possible for the problem in hands, thus reducing time and cost not only in our in vitro experiments but also in scarce and costly in vivo cases.