An Analysis of Uncertainty Propagation Methods Applied to Breakage Population Balance

In data-driven empirical or hybrid modeling, the experimental data influences the model parameters and thus also the model predictions. The experimental data has some variability due to measurement noise and due to the intrinsic stochastic nature of certain pharmaceutical processes such as aggregation or breakage. To use predictive models, it is imperative that the accuracy of the predictions is known. To this extent, various uncertainty propagation techniques applied to a predictive breakage population balance model are studied. Three uncertainty propagation techniques are studied: linearization, sigma point, and polynomial chaos. These are compared to the uncertainty obtained from Monte Carlo simulations. Linearization performs the worst in the given scenario, while sigma point and polynomial chaos methods have similar performance in terms of accuracy.


Introduction
In the pharmaceutical industry, mathematical models are an integral part of the quality by design (QbD) paradigm [1][2][3].Mathematical models may be white-box (mechanistic), black-box (empirical), or gray-box (hybrid).White-box models are based on a mechanistic understanding of the underlying physical, chemical phenomena that are well understood (e.g., the Arrhenius equation).Black-box models are data-driven models that do not consider any physics behind an operation and are valid only in a very specific region of operation.Gray-box models combine both a theoretical understanding and empirical data.As complete mechanistic models can be very expensive to build, most models used in the industry fall in the empirical or hybrid categories.
Inevitably, the experimental data used to build such a model affects the estimated values of the model parameters and hence the model predictions.A modeler must thus carefully design the experiment such that the information content in the data is maximized to obtain the most accurate parameter estimates [4].However, even with well designed and accurate experiments, some variability is inherent to the modeling process.This variability arises from a lack of measurement samples (both number and repetitions), the noise that corrupts measurements, and the intrinsic stochastic nature of certain processes such as breakage or aggregation [5].Uncertainty refers to the precision of the parameters estimated from given data.In many cases, this uncertainty is described by the confidence interval of the parameter estimate.If decisions must be made using models with uncertain parameters, it is important that how uncertainty affects model prediction is known.
This work describes how the accuracy of a model prediction can be described from the confidence intervals of the estimated parameters.The focus lies on model of conical screen mill developed by Barasso et al. [6].The conical screen mill is a popular size reduction equipment used to delump blended active pharmaceutical ingredients (APIs), break tablets for recovery, or deliver a specific reduced particle size.A classical way of modeling milling equipment is by using the population balance framework [7].Under the assumption of well-mixedness, population balance models (PBMs) are hybrid models which can describe the temporal change in the particle number density in a physical volume through ∂n(t, x) ∂t with initial condition n(0, x) = n in (x).
(2) n(t, x) describes the particle number density as a function of time, t, and the internal state vector, x.
The internal state vector defines the properties which are used to describe the number densities in the distribution, e.g., concentration, porosity, and particle size.The second term in the equation describes the continuous change over the internal state vector arising from processes such as crystal growth, consolidation, or evaporation.Processes involving the appearance or disappearance of particles at discrete points in the internal state vector (e.g., aggregation or breakage) are not taken into account by this term but by the birth and death terms on the right-hand side: B(t, x) and D(t, x), respectively.In many cases, a one-dimensional (1D) PBM is formulated using only the particle size (x) as the internal state vector.However, multidimensional PBMs considering properties such as concentration or porosity can easily be formulated to account for complex situations that can arise.In case of a pure breakage process, such as the conical screen mill, the above equation can be reduced to a one-dimensional PBM as The term B(t, x) on the right-hand side of the equation represents the birth of the particle by the breakage process.The breakage function b(x, y) is the probability function describing the formation of particles with size x by the breakage of the particles with size y.The selection function S(x) describes the rate of breakage of particles with size x.The term D(t, x) on the right describes the death of the particle because of breakage.The selection function and the breakage distribution function are normally empirical functions whose parameters need to be estimated from a given experimental dataset.
In this work, the cone mill model developed in [6] is used as a predictive model.The uncertainty in the parameters is propagated to the median particle size (d 50 ).The d 50 is a common size indicator used in the pharmaceutical industry to describe the size of an API.In many cases, design decisions are based on the d 50 , making it a critical quality attribute (CQA).As such, it is important that the accuracy of the d 50 value predicted from the model is known.In the pharmaceutical industry, the most common method used is the Monte Carlo method.This method works well for relatively simple models which are not computationally expensive.However, for complex expensive models, Monte Carlo quickly becomes unattractive due to the computational time required.The aim of this work is to present other methods that can be used to propagate uncertainty through a nonlinear model, without requiring excessive sampling as in the Monte Carlo method.Three uncertainty propagation techniques are considered for comparison: linearization, sigma points, and polynomial chaos expansion (PCE).As no analytical steady state solution is available for the PBM, these methods will be evaluated against the Monte Carlo method.In case the analytical solution is available, the accuracy of the techniques could be compared using an error norm (e.g., [8]).All four methods are described in detail in Section 2.1.The cone mill model and its parameters are described in Section 2.2.The numerical method used to solve the PBM is briefly described in Section 2.3.This study will not discuss or compare the various discretization schemes available to solve the PBMs numerically.

Uncertainty Propagation Methods
In this section, the four uncertainty propagation techniques are described.For brevity and ease, a dynamic model is represented in its general form ẋ = f (x, θ) (4) where x ∈ R n x is the state vector, y ∈ R n y is the output vector, and θ ∈ R n θ is the uncertain parameter vector.

Monte Carlo Method
In the Monte Carlo method, a large number of pseudo-random input parameters are drawn from the estimated parameter distribution [9].Based on these parameters, the model output is calculated, and the mean and the confidence interval is determined empirically through the law of large numbers.The Monte Carlo method is relatively easy to apply as there is a large availability of random number generators available.Moreover, as no assumption is made on the distributions, this method can be considered the most accurate.However, there is no general guidance on the number of parameters that should be drawn to obtain reliable results and as such tens of thousands of model simulations may be required, making it computationally very expensive.

Linearization
The linearization approach uses a linear approximation of the variance-covariance matrix of the model output.This approximation is obtained from a first-order Taylor series expansion of the model with respect to the uncertain parameter.However, the higher-order terms in the expansion can only be neglected if the uncertainty is smaller compared to the model curvature.If S = ∂ f /∂θ is the sensitivity matrix of the model output with respect to the uncertain parameters, and V θ is the variance-covariance matrix of the parameters, the variance-covariance matrix of the model output is given by From the variance-covariance matrix, the (1 − α)100% confidence can be found akin to the confidence bound on parameter estimates giving [10] However, as the variance on the estimated parameter is only the lower bound on its true variance [11], the actual uncertainty on the model output can be even higher.

The Sigma Point Method
The sigma point method (SP) is a sampling-based method for nonlinear transformation of Gaussian random variables [12].The parameter distribution is represented by a finite number of deterministically chosen sampling points called the sigma points.For n uncertain parameters, a set of 2n sigma points is chosen as A set of model outputs can be calculated from the sigma points as The mean can then be calculated as while the variance-covariance matrix is calculated as The uncertainty on the model output can the be predicted using Equation (7).When no information on output distribution is available, it is suggested that the value of κ be set to 3 − n.This ensures that the root mean squared error in the mean prediction is minimized up to the fourth order [12].In the sigma point approach, the model equations need to be solved 2n + 1 times.

The Polynomial Chaos Method
In the polynomial chaos expansion method (PCE), the model response is approximated by an infinite series of orthogonal basis functions [13].For practical applications, the infinite series is truncated to a limited number of terms M.
The basis function can be estimated using the Wiener-Askey scheme [14], which suggests the use of various orthogonal polynomials based on the probability distribution of the parameters.The number of terms in the series is determined by the number of uncertain parameters (n) and the order of the polynomials (m) used as Once the PCE has been formulated, the mean and the variance of the model output can be approximated from the PCE coefficients as ȳPCE = a 0 ( 16) Different methods exist to determine the coefficients of the PCE.Intrusive methods use Galerkin projection to compute the coefficients [15,16].These methods can be a complex set of equations that need to be derived and solved for each case.Non-intrusive methods are based on sampling by repeated model evaluations at the collocation points [13,17].The number of collocation points should be higher or equal to the number of coefficients in the PCE.The non-intrusive approach is used in this study.

The Mathematical Model for the Cone Mill
The cone mill consists of a rotating impeller that provides impact energy to the particles.The particles stay in the mill until they are reduced to a size smaller than the screen aperture.Different types of impeller shapes are available, along with a variety of screens with different shapes and a variety of aperture sizes.The screen size is the most significant parameter affecting the particle size of the milled product [18,19].Even then, the impeller speed, impeller shape, and the screen size have a statistically significant impact on the final size distribution [20].For the same impeller shape, either an increase in impeller speed or a decrease in screen size leads to a lower mean particle size.
In this work, the model developed by Barasso et al. [6] is utilized.The model describes the evolution of the number of particles over time with respect to the particle size represented by its volume.The model considers a cone mill operated in a starve feed mode, which is common for continuous operations.
Here, n(x, t) is the number of particles of volume x in the mill at any time t.This model is a simple extension of the batch breakage equation described in Equation ( 3) to a continuous system by including the feed inlet ( ṅin (x, t)) and the product outlet ( ṅout (x, t)).
ṅin (x, t), the number of particles being fed to the mill, is calculated from the mass flow rate with ρ being the density of the particles, and f in (x) being the volume-weighted distribution of the feed particles.ṅout (x, t) is the outlet flow based the following screen classification model.
where d(x) is the particle diameter associated with volume x. d screen is the screen opening, and δ is a parameter which determines the critical diameter.A particle with diameter larger than the screen opening will be retained in the mill, whereas a particle smaller than the critical diameter will exit the mill.The screen is non-ideal for particle diameters between the screen opening and the critical diameter.The terms B(x, t) and D(x, t) describe the birth and death of the particle of size x by breakage as described in Equation (3).
The breakage rate depends on the particle and process parameters and is represented as with α and γ as model parameters that need to be tuned, and v imp is the impeller speed of the cone mill shaft.The breakage distribution function is chosen to be a log-normal function The volume of the parent particle is represented by y, and C(y) is chosen to fulfill the mass conservation constraint The parameter values and the operating conditions for the model are given in Table 1.The parameter estimates and their confidence bounds computed by Barasso et al. [6] from the experimental data are used in this study.

The Numerical Method
As the analytical solution of the PBM described in Section 2.2 is impossible, the PBM must be solved numerically.A variety of other methods based on size grid discretization are available in the literature [7,[21][22][23][24].The fixed pivot technique (FPT) of Kumar and Ramakrishna [21] is used in this study.The general idea behind the FPT is to discretize the size range into small sections and to represent each section by a pivot.If a new particle is born at a size other than that of the pivot, it is divided between the neighboring pivots such that any two integral properties are conserved.
The continuous size domain is first discretized into I cells of size ∆x i = x i−1/2 − x i+1/2 , i = 1, . . ., I. Every individual cell [x i−1/2 , x i+1/2 ] is represented by a size x i .The particle distributions are considered to be point masses at these points.Thus, the entire size distribution can be represented by The equation for the FPT by conserving the number and mass are given as follows: where

Results
Four parameters-critical screen parameter δ, the two selection function parameters α and γ, and breakage distribution parameter σ, are considered uncertain for this study.These parameters are assumed to be Gaussian random variables, with the mean as the nominal value and the standard deviation described in Table 1.
The model is used to predict the d 50 of the API exiting the mill.The mill is operated with a 1575 µm sieve and an impeller speed of 4923 RPM.After 30 s of operation, the impeller speed is changed to 1500 RPM.This is done to highlight the benefits and drawbacks of the various methods considered here.All the computations are carried out in MATLAB 2017b (The MathWorks Inc., Natick, MA, USA).
The PBM is solved by discretizing the model using the fixed pivot technique described in Section 2.3.The discretization leads to a system of differential algebraic equations, which was solved using a variable order numerical difference formula (ode15s function).Normally, the choice of discretization also affects the solution.In this study, only the fixed pivot method with 100 grid points is considered.A comparison of various discretization schemes for the cone mill is provided in [25].
All the methods will be compared at time just after the impeller speed is changed.This region represents the maximum dynamic change in the model and, as such, can be used to critically evaluate the methods.As the set point change is induced after 30 s, the evaluation is carried out on the d 50 value at 31 s.The uncertainty bands in all cases refer to the 95% confidence intervals calculated based on the F-distribution.

The Monte Carlo Method
The Monte Carlo variance decays as 1/ √ N, with N being the number of samples.Thus, if a large number of samples are used, the Monte Carlo method can be considered the most accurate.However, the results of the Monte Carlo depend on the number of samples that are drawn from the given distribution.Some studies discuss the methods for determining the number of iterations required in the Monte Carlo method.However, these calculations are not always feasible in practice.Thus, the appropriate number of iterations must be determined empirically.Figure 1 illustrates the mean value of the d 50 for a different number of samples.It can be seen that the mean value stabilizes above 4000 samples.Thus, at least 5000 samples should be used to obtain reliable information from the Monte Carlo simulations.For further comparison, 12,000 samples are drawn from a normal distribution as depicted in Figure 2. The d 50 distribution at 31 s is depicted in Figure 3.This output is tested for normality using the Kolmogrov-Smirnov (KS) test.It should be noted that, even after 12,000 model evaluations, the KS test rejects the hypothesis that the output could be drawn from a normal distribution.Thus, more samples would be required to evaluate the true variance of the distribution.For this study, the distribution is fit to a normal curve.It is observed that, for the d 50 value at 31 s, the distribution fit has a mean of 208.366 ± 0.741 µm and a standard deviation of 41.7409 ± 0.5177 µm.This fit is considered good enough to use 12,000 Monte Carlo samples to quantify the uncertainty.The result of the Monte Carlo simulation over the entire time is shown in Figure 4.
As such, the three other methods will be compared to the solution from the Monte Carlo method.

Linearization Method
The main advantage of the linearization method is its ease of implementation and relatively low computational burden.The Jacobian matrix required can be calculated numerically.In this study, a simple finite difference scheme is used to calculate the Jacobian.
The deviation ∆θ was chosen to be 10 −3 times the nominal parameter value.Thus, for the current system, with four uncertain parameters, the linearization method is required to evaluate the model five times to determine the Jacobian.
The result of the linearization method is depicted in Figure 5.It can be observed that, in some regions of operation (after 100 s), the linearization method yields a good approximation of the uncertainty.However, it overpredicts the uncertainty in regions of high system dynamics.As the evaluation of Jacobian is extremely sensitive to model curvature, linearization completely fails in regions of high model dynamics.This is evident from Figure 5.At the moment the setpoint change induced (30 s), linearization grossly overpredicts the uncertainty associated with model prediction.Thus, linearization should be used with caution, especially when there are dynamic conditions to which the model is sensitive.

Sigma Point Method
With four uncertain parameters, nine sigma points need to be calculated, and the model is evaluated at these sampling points.As can be observed from Figure 6, the results of the sigma point method closely mimic the Monte Carlo simulations even in the region of the setpoint change.
This shows that the sigma point approach is more reliable than the linearization approach.The performance comes at a cost, as more function evaluations are required.However, the number of iterations is still orders of magnitude smaller than that for the Monte Carlo method.The major drawback of the sigma point method is that it requires the parameters to be normally distributed.

Polynomial Chaos Expansion
As the parameters in this study are assumed to be normally distributed, Hermite polynomials are used based on the Weiner-Askey scheme [14].PCEs of order 1 (PCE1) and 2 (PCE2) are studied, and the linear regression method is used to determine the coefficients of the PCEs.Based on Equation ( 15), PCE1 leads to 5 unknown coefficients, and PCE2 to 15 unknown coefficients.A major issue in application of PCE is sampling.To determine the coefficients, a set of K (K ≥ M + 1) samples from the random variables is sampled.Generally, K = 2(M + 1) samples are considered to be sufficient for a robust solution.However, the choice of samples highly affects the accuracy of the results.Thus, a variety of sampling techniques are proposed in the literature [26,27].In this study, we use sigma points from Section 3.3 augmented by a Latin-hypercube-based design [28] to sample in the feasible space denoted by the parameter confidence interval.The results of PCE1 (with nine sampling points chosen to be the sigma points) are depicted in Figure 7, and the results of PCE2 (with 32 sampling points) are depicted in Figure 8.In Figure 9, the accuracy of the PCE2 method is evaluated based on the number of samples used.The mean d 50 value (bars), and its variance (error bars) starts to converge towards the Monte Carlo value (solid & dashed horizontal line) with an increasing number of samples.Although not used here, PCE can easily be expanded to use third-order polynomials.However, in that case, the number of samples will increase to a minimum of 72.Generally, the increase in accuracy achieved by higher-order PCE is not enough to warrant the increased computational burden [13].In general, we can say that PCE methods with adequate sampling approximate the mean and variance accurately.In Figure 10, the four methods are compared using the d 50 evaluated at 31 s.It is clearly seen that linearization performs the worst of all the methods.As previously mentioned, this is due to the change in model curvature induced due to step change in the impeller speed at 30 s.The other techniques-sigma point, PCE1, and PCE2-result in values that are comparable to each other.The sigma point method slightly underestimates the variance, while the PCE1 method slightly overestimates the variance.The performance of the PCE methods can be improved by using more sampling points or using a higher-order polynomial.For all practical purposes, the performance of these three methods in terms of accuracy can be considered the same.Figure 11 compares the number of function evaluations required for each technique.Monte Carlo performs the worst in terms of computational time with 12,000 function evaluations.All other methods require a significantly lower number of evaluations.For the current case, SP methods seem the most suitable, as we know the parameters to be normally distributed.

Conclusions
Firstly, it can be concluded that the model in consideration is not adequate for any decision making, as the prediction uncertainty in terms of 95% confidence interval is high.The model needs to be improved either by using more experimental data to estimate the parameters or, if that fails, by improving the model structure.
However, the aim of the study is to discuss the selection of methods for uncertainty propagation to provide an accurate representation of the model prediction uncertainty in the breakage population balance models.The results demonstrate that, although computationally the cheapest, linearization does not provide a reliable estimate of the uncertainty.If it is known a priori that the model under consideration is smooth and not extremely nonlinear, linearization can be a good option to achieve a first estimate of the prediction uncertainty.However, in the case of highly nonlinear dynamics, other approaches are deemed necessary.
The Monte Carlo method is the most accurate way of predicting the uncertainty, but due to the difficulty in knowing the number of samples required, the method can quickly become very computationally demanding.Recently, Monte-Carlo-based methods such as multi-level Monte Carlo (MLMC) and quasi Monte Carlo (QMC) methods have gained popularity, as they reduce the computational time of the full Monte Carlo method.MLMC tries to minimize the computational time by approximating the final mean as a sum of predicted means obtained at lower accuracy (which in many cases means a lower computational time) [29].Thus, even though MLMC leads to a larger number of function evaluations, the total computational cost could be lower than a standard Monte Carlo method.The QMC, on the other hand, relies on smart sampling strategies to reduce the number of function evaluations, thus reducing the computational time [30].These methods, although interesting, would still lead to a fairly high computational cost and can be justified when there is no information on the distribution of uncertain parameters.However, as in the case under study, the parameters are known to be normally distributed, the other methods are more attractive.
The deterministic sampling of the model parameters in the sigma point method give it an advantage over the random sampling of the Monte Carlo method The sigma point method reduced the number of samples and function evaluations from more than 12,000 to only 9 and still provided an accurate representation of uncertainty.However, sigma point methods can only be applied when the distribution of the uncertain parameters is symmetric and unimodal.
PCE methods also lead to an extensive reduction of sampling points compared with the Monte Carlo method and provide accurate predictions on model uncertainty.PCE methods can be complex to implement.The choice of polynomials and collocation points for sampling plays an important role and, as such, is non-trivial.If the uncertain parameters are characterized by asymmetric and/or multimodal distributions, the PCE method has to be used to replace the Monte Carlo method.The choice of orthogonal polynomials depends on the distribution of the uncertain parameters and follows the Wiener-Askey scheme [14].
Thus, we can conclude by stating that the sigma point method is the most attractive method for applications such as the PBM studied here because of its ease of implementation, its accuracy in representing model uncertainty, and its computational efficiency.PCE methods become attractive when the parameter distributions are known a priori to be asymmetric or multimodal.However, when no information about parameter distributions is available, the Monte Carlo method has to be used.In such a situation, MLMC or QMC methods can reduce the computational burden.

Figure 1 .
Figure 1.Stability of the Monte Carlo methods with respect to the number of samples drawn.The mean value for the d 50 stabilizes after around 5000-6000 iterations.

Figure 2 .
Figure 2. Parameter distributions used for the Monte Carlo method.Twelve thousand samples are used for further comparison.

Figure 3 .
Figure 3. Distributions of d 50 value at 31 s simulated from the Monte Carlo method using 12,000 samples.The solid line shows the fit of a normal distribution to the histogram.

Figure 4 .
Figure 4. Model prediction (black solid) with 95% confidence range (shaded) using the Monte Carlo method with 12,000 samples.

Figure 6 .
Figure 6.Model prediction (black solid) with 95% confidence range (shaded) using the sigma point method.

Figure 7 .
Figure 7. Model prediction (black solid) with 95% confidence range (shaded) using first-order polynomial chaos.Twelve samples were drawn using the Latin hypercube method.

Figure 9 .
Figure 9.The number of samples used for PCE2 approximation based on the d 50 value at 31 s.

Figure 10 .Figure 11 .
Figure 10.The different techniques with results from the Monte Carlo simulation (horizontal lines) using the d 50 value evaluated at 31 s. Figure (a) shows all four methods, while Figure (b) is an inset which excludes linearization due to its high confidence interval.