Dynamic Optimisation of Beer Fermentation under Parametric Uncertainty

Abstract: Fermentation is one of the most important stages in the brewing process. During fermentation, the sugars are converted by the brewing yeast into alcohol, carbon dioxide, and a variety of by-products which affect the flavour of the beer. The fermentation temperature profile plays an essential role in the progression of fermentation and heavily influences the flavour. In this paper, the fermentation temperature profile is optimised. As every process model contains experimentally determined parameters, uncertainty on these parameters is unavoidable. This paper presents approaches to account for the effect of uncertain parameters in optimisation. Three methods for uncertainty propagation (linearisation, sigma points, and polynomial chaos expansion) are used to determine the influence of parametric uncertainty on the process model. Using these methods, an optimisation formulation considering parametric uncertainty is presented. It is shown that for the nonlinear beer fermentation model, the linearisation approach performed worst amongst the three methods, while second-order polynomial chaos worked best. Using the techniques described in this paper, a fermentation process can be optimised for high alcohol content or low fermentation time while satisfying the quality constraints. As the uncertainty in the process is explicitly considered, the solution, even though conservative, is more robust to parametric uncertainties in the model.


Introduction
Consumption of alcoholic beverages like beer, wine, and spirits has always played an important role in food security and health [1]. With its history extending to more than 7 millennia [2], beer is one of the most popular beverages in the world. The global beer consumption in 2019 was estimated at around 189.05 million kilolitres [3]. The beer sector is a major contributor to the European Union's economy with more than 10,300 active breweries providing over 130,000 jobs. In 2018, the beer sector contributed over €55 billion to the EU's economic growth [4]. With 340 breweries in Belgium producing over 1500 different beers [5], the beer culture of Belgium has been inscribed on UNESCO's Intangible Cultural Heritage of Humanity list [6]. According to Belgian law [7], beer is "the beverage obtained after alcoholic fermentation of a wort prepared primarily from starch and sugary raw materials, of which at least 60% barley or wheat malt, as well as hops (possibly in processed form) and brewing water". Thus, the main ingredients which form beer are only barley malt (i.e., a starch source), hops, yeast, and water.
Nevertheless, beer production is a complex process with a multitude of processing steps involved. The barley is converted into malt during the malting process. Malting converts the hard barley grains into friable malt by producing and activating various enzymes. The malt is then milled and mixed with water to convert the starch and proteins into fermentable sugars through a process known as mashing. The mashed product, known as wort, is then boiled with hops (although hops can be added at different stages of the beer production). The residual hops and the products coagulated during boiling are then separated from the wort before fermentation.
A common requirement for all the optimisation studies mentioned above is the process model. A process model is a mathematical abstraction of reality. Describing the plethora of chemical species produced during fermentation [12,22] would lead to an extremely complex mathematical model. It is thus common to use reduced-order models which only consider the key chemical compounds. A variety of process models have been proposed to describe the fermentation process [23][24][25][26]. The model developed by de Andrés-Toro et al. [23] is based on extensive experimentation on industrial-scale fermentation and has been widely used in optimisation studies. All models contain parameters which have to be estimated from experimental data.
As all experiments inherently have noise (either measurement noise, or in the case of yeast, biological variability), the parameters estimated from the data are uncertain. Although it is possible to reduce the uncertainty (i.e., improve the parameter accuracy) by performing more and better experiments, and by using better sensors, it is impossible to completely eliminate the uncertainty. Thus, any optimisation study based on a mathematical model must take this parametric uncertainty into consideration. Use of inaccurate parameters can lead to constraint violations, which in reality might affect the safety of the process or the quality of the product. In this paper, an optimisation strategy to determine the fermentation temperature profile by including the parametric uncertainty in the process model is presented. Following all the optimisation studies mentioned earlier, the model developed by de Andrés-Toro et al. [23] is used.
In the following sections, the dynamic optimisation formulation under parametric uncertainty is presented. The concept and techniques of uncertainty propagation are introduced. This is followed by the description of the de Andrés-Toro model for beer fermentation. The in-house tool used for the dynamic optimisation is briefly presented. The temperature profile obtained after the optimisation is then discussed in the results section. Finally, the conclusions section summarises the main findings of this paper.

Materials and Methods
In this section the fermentation model used for optimisation is described. Then, the robustified dynamic optimisation formulation is presented. The three techniques for uncertainty propagation are discussed. These are the linearisation, sigma point, and polynomial chaos expansion methods. Next, the approach to evaluate the solutions based on Monte Carlo simulation is presented. Finally, the software used is briefly discussed.

Fermentation Model
As mentioned, the fermentation model developed by de Andrés-Toro et al. [23] is used in this study. It is assumed that the yeast pitched to the wort is immediately suspended and consists of active, latent, and dead yeast cells. The latent yeast cells become active over time and eventually die. Only the active cells contribute to the fermentation. The fermentation is differentiated into two phases: a lag phase in which the latent cells convert to active cells and the active phase in which fermentation occurs. The active phase begins when 80% of the latent cells are activated. The total concentration of yeast cells suspended in the wort ($C_{X,sus}$) at any time is given by

$$C_{X,sus} = C_{X,act} + C_{X,lat} + C_{X,dead},$$

where the subscripts act, lat, and dead correspond to the concentration of active, latent, and dead yeast cells.
In the lag phase, the dead cells settle in the wort and the latent cells are activated. The rate of settling ($\mu_{SD}$) depends on the wort density and the concentration of carbon dioxide. The wort density is related to the initial sugar concentration ($C_{S0}$), while the carbon dioxide concentration is related to the ethanol concentration ($C_{eth}$). The rate of yeast activation ($\mu_L$) is related to the temperature via an Arrhenius-type equation. The lag phase can be modelled as

$$\frac{dC_{X,lat}}{dt} = -\mu_L C_{X,lat}, \qquad \frac{dC_{X,dead}}{dt} = -\mu_{SD} C_{X,dead}.$$

Once the active phase begins, the active yeast cells start reproducing and dying (with a rate $\mu_{DT}$), while the latent cells continue to be activated. The dead yeast cells continue to settle to the bottom of the fermentation tank. The biomass dynamics in the active phase are described as

$$\frac{dC_{X,act}}{dt} = \mu_X C_{X,act} - \mu_{DT} C_{X,act} + \mu_L C_{X,lat}, \qquad \frac{dC_{X,dead}}{dt} = \mu_{DT} C_{X,act} - \mu_{SD} C_{X,dead}.$$

The sugar consumption is proportional to the active yeast cell concentration, and the uptake rate $\mu_S$ follows Michaelis-Menten kinetics:

$$\frac{dC_S}{dt} = -\mu_S C_{X,act}.$$
Similarly, the rate of ethanol formation $\mu_{eth}$ also follows Michaelis-Menten kinetics. However, as experimental data showed a decrease in the rate with time, an inhibition factor ($f$) is included. The evolution of the ethanol concentration is expressed as

$$\frac{dC_{eth}}{dt} = f\, \mu_{eth}\, C_{X,act}.$$

Two by-products are considered: diacetyl ($C_{DY}$) and ethyl acetate ($C_{EA}$). Diacetyl, or 2,3-butanedione, is a carbonyl compound which is produced primarily in the early phases of fermentation; at a later stage, diacetyl is converted into other products. The rate expressions for the formation and subsequent conversion of diacetyl used in the model were obtained experimentally. Although an important compound for beer quality, it adversely affects the flavour at high concentrations: at concentrations over 0.1 ppm, it brings a buttery flavour and rancid mouthfeel to the beer [11]. Ethyl acetate is the ester with the highest concentration in beer. It is normally used as an indicator for all the esters present and has a buttery-solvent-like aroma [14]. The by-product evolution is described by

$$\frac{dC_{DY}}{dt} = \mu_{DY}\, C_S\, C_{X,act} - \mu_{AB}\, C_{DY}\, C_{eth}, \qquad \frac{dC_{EA}}{dt} = Y_{EA}\, \mu_X\, C_{X,act}.$$

All the rate expressions are described in Table 1. The rate constants depend on the temperature via an Arrhenius-type relation,

$$\mu_i(T) = A_i \exp\!\left(-\frac{E_{a,i}}{R\,T}\right).$$

The parameter values used are reported in Table 2.
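To make the structure of the model concrete, the following sketch integrates a simplified fermentation system of the kind described above (three biomass fractions, Michaelis-Menten sugar uptake and ethanol formation, an ethanol inhibition factor, and growth-coupled ester formation). All rate constants and initial conditions are hypothetical illustrative values at a fixed temperature, not the fitted de Andrés-Toro parameters.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical rate constants at a fixed temperature; these are NOT the
# fitted de Andres-Toro parameters, only illustrative values.
MU_X0, MU_DT, MU_L, MU_SD = 0.06, 0.008, 0.10, 0.01  # 1/h
MU_S0, K_S = 0.8, 10.0        # Michaelis-Menten sugar uptake (1/h, g/L)
MU_E0, K_E = 0.4, 10.0        # Michaelis-Menten ethanol formation
Y_EA = 5e-4                   # ethyl acetate yield (assumed)
K_X = 20.0                    # growth saturation constant (assumed)

def rhs(t, y):
    """States: [C_act, C_lat, C_dead, C_S, C_eth, C_EA] (all g/L)."""
    c_act, c_lat, c_dead, c_s, c_eth, c_ea = y
    mu_x = MU_X0 * c_s / (K_X + c_s)      # substrate-limited growth
    mu_s = MU_S0 * c_s / (K_S + c_s)      # sugar uptake rate
    mu_e = MU_E0 * c_s / (K_E + c_s)      # ethanol formation rate
    f = 1.0 / (1.0 + c_eth / 50.0)        # simple ethanol inhibition factor
    return [mu_x * c_act - MU_DT * c_act + MU_L * c_lat,  # active cells
            -MU_L * c_lat,                                # latent cells
            MU_DT * c_act - MU_SD * c_dead,               # dead cells settle
            -mu_s * c_act,                                # sugar consumption
            f * mu_e * c_act,                             # ethanol formation
            Y_EA * mu_x * c_act]                          # ethyl acetate

y0 = [2.0, 1.0, 0.5, 130.0, 0.0, 0.0]     # hypothetical initial wort
sol = solve_ivp(rhs, (0.0, 150.0), y0, rtol=1e-8, atol=1e-10)
```

Over a 150 h horizon with these values, the sugar is consumed and ethanol accumulates, mirroring the qualitative behaviour the model is built to capture.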

[Table 1. Rate expressions of the model, including the settling of dead cells.]

Dynamic Optimisation under Parametric Uncertainty
Any process model can be represented in a general form as $\dot{x} = f(x, u, \theta, t)$ with $x \in \mathbb{R}^n$ as the process state vector, $u \in \mathbb{R}^m$ as the control vector, and $\theta \in \mathbb{R}^p$ as the parameter vector. The general optimisation problem, which minimises an objective $J$ over the interval $t \in [0, t_f]$, is written as

$$\min_{u(t)} J(x, u, \theta, t_f) \quad \text{subject to} \quad \dot{x} = f(x, u, \theta, t), \qquad c(x, u, \theta, t) \leq 0. \qquad (12)$$

As mentioned in the introduction, the parameters $\theta$ are estimated from experimental data and are inherently uncertain. The uncertainty in the parameters is typically described by a (possibly unknown) probability distribution. The aim is now to ensure that critical constraints $c(x, u, \theta, t) \leq 0$ are not violated despite this uncertainty. Two approaches are possible. In the robust approach, the worst-case scenario is considered and the problem is reformulated in terms of this worst case [27]. However, the worst-case scenario may occur with a very low probability, and optimising for this scenario gives a very conservative solution [28]. In the stochastic approach, the constraints are reformulated as chance constraints which express a limit on the probability of constraint violation [29]. These constraints are expressed as

$$P\left[c(x, u, \theta, t) \leq 0\right] \geq 1 - \beta.$$

A computationally tractable formulation of such probabilistic chance constraints is difficult to obtain [30]. Thus, it is common to approximate them using deterministic constraints of the form

$$E\left[c(x, u, \theta, t)\right] + \alpha_c \sqrt{V\left[c(x, u, \theta, t)\right]} \leq 0,$$

where $E[\cdot]$ and $V[\cdot]$ are the expectation and variance operators, respectively, and $\alpha_c$ is termed the backoff parameter. The choice of the backoff parameter depends on the allowed probability of constraint violation $\beta$. The backoff parameter can be chosen via the Cantelli-Chebyshev inequality [31]. Although this choice is valid for any probability distribution, it leads to an excessively large backoff parameter, which can in turn render the optimisation problem infeasible. Another approach to compute the backoff parameter utilises the quantiles of the model output distribution [32].
However, this requires an assumption on the underlying distribution of the model output under uncertainty. It is common to assume a normal distribution [29,32,33]. Under the assumption of normality, a 5% allowance for constraint violation, for example, leads to a backoff parameter value of 1.65.
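The two choices of backoff parameter can be computed in a few lines. The sketch below contrasts the one-sided normal quantile with the distribution-free Cantelli-Chebyshev bound for the same allowed violation probability; function names are my own.

```python
import math
from scipy.stats import norm

def backoff_normal(beta: float) -> float:
    """Backoff from the one-sided quantile of a standard normal distribution."""
    return float(norm.ppf(1.0 - beta))

def backoff_cantelli(beta: float) -> float:
    """Distribution-free backoff via the Cantelli-Chebyshev inequality:
    P(y - E[y] >= alpha * sigma) <= 1 / (1 + alpha^2) <= beta."""
    return math.sqrt(1.0 / beta - 1.0)

alpha_normal = backoff_normal(0.05)      # ~1.645 under normality
alpha_cantelli = backoff_cantelli(0.05)  # ~4.359, much more conservative
```

For a 5% allowance, the Cantelli-Chebyshev bound is more than twice as large as the normal quantile, which illustrates why the distribution-free choice can make the optimisation problem infeasible.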
Similar to the constraints, the objective can also be reformulated by allowing for a backoff parameter. The reformulated optimisation problem can now be expressed as

$$\min_{u(t)} \; E[J] + \alpha \sqrt{V[J]} \quad \text{subject to} \quad \dot{x} = f(x, u, \theta, t), \qquad E[c] + \alpha_c \sqrt{V[c]} \leq 0. \qquad (15)$$

Note that it is not necessary to reformulate all constraints as chance constraints. Normal constraints and approximated chance constraints can both be present in the formulation.

Uncertainty Propagation Techniques
The formulation presented in Equation (15) requires the expectation and variance of the constraints and the objective to be computed efficiently. The uncertainty in the parameters propagates through the nonlinear model onto the model output. When the probability density function of the uncertain parameters $\rho_\theta(\theta)$ is known, the expectation and variance can, in principle, be computed exactly. For a hypothetical nonlinear function $y = f(\theta)$, they are defined as

$$E[y] = \int f(\theta)\, \rho_\theta(\theta)\, d\theta, \qquad V[y] = \int \left(f(\theta) - E[y]\right)^2 \rho_\theta(\theta)\, d\theta.$$

However, evaluating these integrals is numerically challenging. Thus, various approaches to estimate the expectation and the variance are used.
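For a single uncertain parameter, the defining integrals for the expectation and variance can be evaluated directly by numerical quadrature. The sketch below does this for a standard-normal parameter; the quadratic response function in the usage example is a toy choice, not the fermentation model.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def moments_by_quadrature(f):
    """E[y] and V[y] for y = f(theta) with theta ~ N(0, 1), obtained by
    direct numerical integration of the defining integrals."""
    mean, _ = quad(lambda t: f(t) * norm.pdf(t), -np.inf, np.inf)
    var, _ = quad(lambda t: (f(t) - mean) ** 2 * norm.pdf(t), -np.inf, np.inf)
    return mean, var

# Toy response y = theta^2: E[y] = 1, V[y] = 2 for a standard normal.
mean_q, var_q = moments_by_quadrature(lambda t: t ** 2)
```

In higher dimensions such quadrature becomes expensive, which is exactly why the approximate propagation techniques below are used instead.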
Linearisation is a popular approach based on a first-order approximation of the process model around the nominal parameter value $\theta_0$:

$$y \approx f(\theta_0) + S_{\theta_0} (\theta - \theta_0),$$

where $S_{\theta_0}$ is the sensitivity matrix ($S = \partial f / \partial \theta$) evaluated at $\theta_0$. Thus, when the parameters in the nonlinear model follow a normal distribution, the variance-covariance matrix of the model response can be approximated by $V_y = S_{\theta_0} V_\theta S_{\theta_0}^T$. Linearisation performs well when the uncertainty is small relative to the model curvature.
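This first-order propagation can be sketched in a few lines. The finite-difference approximation of the sensitivity matrix and the linear test model in the usage example are choices made here for illustration; in the study itself the sensitivities are obtained from augmented model states.

```python
import numpy as np

def linearised_variance(f, theta0, V_theta, eps=1e-6):
    """First-order propagation V_y ~ S V_theta S^T, with the sensitivity
    matrix S = df/dtheta approximated by central finite differences."""
    theta0 = np.asarray(theta0, dtype=float)
    y0 = np.atleast_1d(f(theta0))
    S = np.zeros((y0.size, theta0.size))
    for j in range(theta0.size):
        d = np.zeros_like(theta0)
        d[j] = eps
        S[:, j] = (np.atleast_1d(f(theta0 + d))
                   - np.atleast_1d(f(theta0 - d))) / (2.0 * eps)
    return S @ V_theta @ S.T

# Usage: for a linear map y = A theta the propagation is exact.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
V_theta = np.diag([0.1, 0.2])
Vy = linearised_variance(lambda th: A @ th, [0.0, 0.0], V_theta)
```

For a linear map the result coincides with the exact covariance $A V_\theta A^T$; the approximation degrades as the model curvature grows, as discussed below.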
When the model is highly nonlinear, the approach fails to provide a correct estimate of the variance [34,35]. The other two techniques utilised in this paper are the sigma point method [36] and polynomial chaos expansion method [30,37]. The sigma point method relies on a smart selection of sampling points (called the sigma points), which are then propagated through the nonlinear model to obtain a response set. The expectation and the variance are then estimated from this response set. The sigma points are computed based on the parameter variance V θ .
For $p$ uncertain parameters, the set of sigma points contains $2p$ elements. This set is then transformed by the nonlinear map (i.e., the nonlinear process model in this case) to obtain a set of model responses $y_i = f(\theta_i)$. The expectation and the variance-covariance matrix are computed from this response set:

$$\bar{y} = \frac{1}{2p} \sum_{i=1}^{2p} y_i, \qquad V_y = \frac{1}{2p} \sum_{i=1}^{2p} (y_i - \bar{y})(y_i - \bar{y})^T.$$

The polynomial chaos expansion aims to approximate the model response by a truncated sum of orthogonal polynomials. The model response is approximated as

$$y(\theta) \approx \sum_{i=0}^{M-1} a_i \Psi_i(\theta),$$

where $\Psi_i(\theta)$ are multivariate orthogonal polynomials. The total number of terms required in the approximation ($M$) depends on the number of uncertain parameters ($p$) and the order of the polynomials ($o$) used, $M = (p+o)!/(p!\,o!)$.
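The sigma-point propagation can be sketched as follows. The equal-weight variant with $2p$ points generated from the columns of a matrix square root of $p\,V_\theta$ is one common choice, adopted here as an assumption since weight definitions vary across formulations.

```python
import numpy as np
from scipy.linalg import sqrtm

def sigma_point_moments(f, theta_mean, V_theta):
    """Propagate 2p sigma points theta_mean +/- columns of sqrt(p * V_theta)
    through f; estimate the response mean and covariance with equal weights."""
    theta_mean = np.asarray(theta_mean, dtype=float)
    p = theta_mean.size
    L = np.real(sqrtm(p * V_theta))                # matrix square root
    points = ([theta_mean + L[:, i] for i in range(p)]
              + [theta_mean - L[:, i] for i in range(p)])
    Y = np.array([np.atleast_1d(f(t)) for t in points])
    y_mean = Y.mean(axis=0)
    Vy = (Y - y_mean).T @ (Y - y_mean) / (2 * p)
    return y_mean, Vy

# Usage: for the identity map the parameter covariance is recovered exactly.
y_mean, Vy = sigma_point_moments(lambda th: th, np.zeros(2),
                                 np.diag([0.1, 0.2]))
```

Like linearisation, this recovers linear maps exactly, but because the points are pushed through the full nonlinear model, curvature information is partially captured as well.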
If the parameters are assumed to be independent, the multivariate polynomials can be constructed from univariate orthogonal polynomials. The Wiener-Askey scheme [37] lists some commonly used probability distributions that are associated with certain univariate orthogonal polynomials. Other approaches based on statistical moments [38,39] or the Gram-Schmidt orthogonalisation [40,41] are available to generate the monovariate polynomials when the parameters are correlated or do not follow a distribution in the Wiener-Askey scheme.
Once the polynomials are generated, the coefficients in the expansion, a i s, need to be computed. Approaches to compute these coefficients are classified into intrusive and non-intrusive approaches. Intrusive approaches use Galerkin projections on the process model to generate the coefficients [42,43]. Non-intrusive methods, however, treat the model as a black-box and rely on sampling [44]. In this paper, the non-intrusive approach based on least-squares regression is used. Both Nimmegeers et al. [29] (supplementary file) and Bhonsale et al. [38] elaborate on this approach.
Due to the orthogonality properties of the polynomials [42], the expectation and variance can be estimated directly from the coefficients of the model response expansion: the expectation equals the zeroth coefficient, $E[y] = a_0$, while the variance follows from the remaining coefficients as $V[y] = \sum_{i=1}^{M-1} a_i^2 \langle \Psi_i^2 \rangle$, with $\langle \Psi_i^2 \rangle$ the norm of the $i$-th polynomial.
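The non-intrusive, regression-based route can be sketched for a single standard-normal parameter: sample the parameter, fit the expansion coefficients by least squares on probabilists' Hermite polynomials, and read the moments off the coefficients. The function name and the toy response in the usage example are mine, not the paper's implementation.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def pce_moments_1d(f, order=2, n_samples=200):
    """Non-intrusive PCE for one standard-normal parameter: least-squares
    regression on probabilists' Hermite polynomials He_0..He_order, then
    E[y] = a_0 and V[y] = sum_{i>0} a_i^2 * i! from the coefficients."""
    theta = rng.standard_normal(n_samples)
    # Build He_k via the recurrence He_k = t*He_{k-1} - (k-1)*He_{k-2}
    H = np.zeros((n_samples, order + 1))
    H[:, 0] = 1.0
    if order >= 1:
        H[:, 1] = theta
    for k in range(2, order + 1):
        H[:, k] = theta * H[:, k - 1] - (k - 1) * H[:, k - 2]
    a, *_ = np.linalg.lstsq(H, f(theta), rcond=None)
    norms = np.array([math.factorial(i) for i in range(order + 1)], dtype=float)
    return a[0], float(np.sum(a[1:] ** 2 * norms[1:]))

# Usage: y = theta^2 lies exactly in the span of He_0 and He_2,
# so a second-order expansion recovers E[y] = 1 and V[y] = 2.
mean_pce, var_pce = pce_moments_1d(lambda t: t ** 2)
```

Because the quadratic toy response lies exactly in the span of the second-order basis, the regression recovers its moments exactly; for the fermentation model the expansion is only an approximation whose quality grows with the order.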
Regardless of the method used for uncertainty propagation, the optimisation problem described in Equation (15) has to be augmented with extra states so that the expectation and variance can be estimated. The augmented states depend on the technique used. For linearisation, the augmented state is the sensitivity matrix, while for the sigma point and polynomial chaos methods, the augmented states are copies of the original states evaluated at the parameter sampling points.

Implementation
In all cases, the dynamic optimisation problem has to be solved numerically. In this paper, the direct collocation approach [45] is utilised. In this method, both the state and control vectors are fully discretised in time, leading to a nonlinear program (NLP). Within every discretised time interval, the states are represented by four collocation points, which are forced to obey the model equations through equality constraints. A cubic Lagrange polynomial with collocation points situated at the Radau roots is used on each interval. The control variable is discretised as piecewise constant on each time interval. The NLP is solved using the interior point method as implemented in IPOPT [46].
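A full direct-collocation transcription is too long to reproduce here, but the key idea of the piecewise-constant control discretisation can be illustrated with a minimal single-shooting sketch (a simpler transcription than the collocation used in the paper). The toy scalar dynamic, horizon, and effort penalty are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

T_F, N = 10.0, 5          # horizon and number of control intervals
dt = T_F / N

def simulate(u):
    """Piecewise-constant control on dx/dt = u_k (1 - x); each interval
    admits the exact update x <- 1 - (1 - x) * exp(-u_k * dt)."""
    x = 0.0
    for uk in u:
        x = 1.0 - (1.0 - x) * np.exp(-uk * dt)
    return x

def objective(u):
    # Maximise the final state with a small control-effort penalty.
    return -simulate(u) + 0.01 * float(np.sum(np.asarray(u) ** 2))

# One decision variable per control interval, with box bounds.
res = minimize(objective, x0=np.full(N, 0.5),
               bounds=[(0.0, 1.0)] * N, method="L-BFGS-B")
```

Collocation additionally discretises the states and enforces the dynamics as equality constraints, which keeps the NLP sparse; the shooting variant above trades that structure for brevity.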
All the dynamic optimisation problems are implemented in a Python-based tool developed within the BioTeC+ team called POMODORO [47]. POMODORO utilises CasADi [48] to obtain the gradient and Hessian information required for the optimisation.
CasADi can provide exact Hessian information with an automatic differentiation method. While the default NLP solver in POMODORO is IPOPT, other (commercial) solvers can also be used via their CasADi interface. POMODORO is available freely for academic use via the website https://cit.kuleuven.be/biotec/software/pomodoro (accessed on 12 November 2021).

Results and Discussion
The fermentation model described in Section 2.1 is optimised for two objectives: (i) maximising ethanol production and (ii) minimising batch time. A constraint on the minimum ethanol concentration is imposed: the concentration at the end of the batch must be at least 25 g/L. Similarly, an end-time constraint on the suspended active cells is imposed: their concentration must be below 0.5 g/L. The two major quality constraints are on the concentrations of diacetyl and ethyl acetate, whose maximum concentrations should be below 0.2 ppm and 2 ppm, respectively [11,21]. Without accounting for the uncertainty on any of the parameters, the optimised temperature trajectories are depicted in Figure 1a,b. When maximising the ethanol concentration at the end of the batch, the wort is immediately heated to a relatively high temperature to accelerate the conversion of latent cells to active cells and to enhance the growth of these active cells. The temperature profile then stays relatively constant to boost the production of ethanol. This also enhances by-product formation; the temperature is optimised such that the concentration of ethyl acetate does not exceed 2 ppm. As there is no limit on the batch time, the fermentation continues until the active yeast cells are depleted. A maximum ethanol concentration of 60.24 g/L is reached in 168 h.
When minimising the batch time, the wort is first heated to the maximum possible temperature (15 °C) to accelerate the conversion to, and the growth of, active yeast cells. However, this also accelerates the production of ethyl acetate. Thus, after some time, the wort is cooled down rapidly. The cooling occurs in steps because consecutive temperature jumps are constrained to 2.5 °C/h. This slows down the by-product formation while the ethanol production continues. The evolution of the concentration profiles is depicted in Figure 2b. Towards the end of the batch, the temperature increases again rapidly to ensure the active cells are killed and the concentration of suspended cells drops below the threshold of 0.5 g/L. The batch stops as soon as this threshold is reached. An ethanol concentration of 48.11 g/L is achieved in 100 h.

Influence of Uncertainty
The fermentation model used contains 18 parameters, each of which is determined experimentally. However, considering the uncertainty on every parameter would lead to a very large optimisation problem that is numerically intractable. Thus, based on a sensitivity analysis, the four parameters with the most influence on the states of interest (the quality variables) are considered: the maximum growth rate (µ X,0 ), the yeast death rate (µ DT ), the maximum sugar consumption rate (µ S,0 ), and the ethyl acetate production yield (Y EA ). The only quality variables considered are the by-products and the ethanol concentration. Figure 3 shows the trajectories of 500 Monte Carlo simulations, with parameter values sampled in silico from their uncertainty distribution, under the optimal temperature profile obtained for maximising the ethanol concentration. The diacetyl concentrations always remain below the threshold, while the ethyl acetate concentrations show a large variation and substantial constraint violation: in total, 50.4% of the Monte Carlo simulations violated the ethyl acetate constraint. This exemplifies the need to incorporate uncertainty into any optimisation study.
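This kind of Monte Carlo check reduces to a short routine: sample the uncertain parameters, evaluate the model response of interest, and count the fraction of samples exceeding the constraint limit. The helper below is a generic sketch with a toy scalar response in the usage example; the function name and sample count are my own choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def violation_percentage(g, theta_mean, theta_std, limit, n=500):
    """Sample parameters from independent normals, evaluate a scalar model
    response g(theta), and report the percentage of samples above `limit`."""
    thetas = rng.normal(theta_mean, theta_std, size=(n, np.size(theta_mean)))
    y = np.array([g(th) for th in thetas])
    return 100.0 * float(np.mean(y > limit))

# Usage with a toy response: for g(theta) = theta_0 ~ N(0, 1) and a limit
# of 0, roughly half of the samples violate the constraint.
pct = violation_percentage(lambda th: th[0], 0.0, 1.0, 0.0)
```

In the study, `g` would be the simulated end-of-batch by-product concentration under a fixed temperature profile, and the reported percentages in Tables 4 and 5 follow the same counting logic.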

Optimisation under Uncertainty
The uncertainty is incorporated in the optimisation study using the three approaches described earlier. For the polynomial chaos expansion, two polynomial orders are used: first order (PCE1) and second order (PCE2). As the parameter uncertainty is assumed to be Gaussian, Hermite polynomials are used. A desired confidence level for the constraints is set according to the quantiles of a normal distribution; the backoff parameters used in this study are reported in Table 3. The optimised control profiles are then tested via a Monte Carlo simulation with randomly generated parameter values. It is impossible to judge a priori which approach provides the best approximation. Thus, the results obtained are assessed using a Monte Carlo simulation, and the result with the lowest percentage of constraint violations is preferred. The constraint violations as a percentage of the total number of simulations are reported in Table 4. All methods lead to fewer constraint violations than the nominal case; however, second-order polynomial chaos performs the best. The probability densities of the two constrained by-products under the trajectories optimised with each method are depicted in Figure 5. The majority of constraint violations are caused by overproduction of ethyl acetate. This is expected from the process dynamics: as the concentration evolution of both ethanol and ethyl acetate is described by similar mathematical expressions, maximising ethanol boosts the production of ethyl acetate. However, the conversion/inhibition factors and the production rates differ and are influenced by the temperature differently. For the minimum-time objective, the constraint violation percentages are reported in Table 5. All the robustification approaches lead to a significant reduction in constraint violations. Again, the sigma point approach and PCE2 perform the best.
From the concentration density plots for ethyl acetate and diacetyl (Figure 7), it is observed that when the objective is to minimise the batch time, the majority of constraint violations are caused by excess diacetyl, whereas for the case of maximising ethanol, they are caused by excess ethyl acetate. This can be explained by the dynamics of the process: if the fermentation is allowed to proceed uninterrupted, diacetyl decomposes into a variety of other products. As minimising the batch time essentially interrupts the fermentation, the diacetyl has less time to decompose.

Comparison of Robustification Approaches
For both objectives, the sigma point and PCE2 approaches perform best in terms of constraint violation. The two first-order methods, linearisation and PCE1, both lead to relatively high constraint violation percentages. As both methods rely on local linear approximations of a nonlinear model, they can be expected to perform poorly when the underlying process model is strongly nonlinear. Nevertheless, due to its simplicity of implementation, linearisation is often used despite this poor performance. In the case of maximising ethanol, linearisation performs surprisingly well. This can be explained by the optimised profile, which consistently hovers around 14 °C and does not induce any abrupt changes in the process dynamics, leading to a smooth evolution of the concentration profiles. In such cases, linearisation is known to perform well. However, when an optimised profile excites the nonlinearity in the model through a large jump (as is the case when minimising time), linearisation drastically over- or underestimates the variances. This is corroborated by a comparison of the estimated concentration variances with the variances obtained from Monte Carlo simulations, reported in Table 6. For maximising ethanol, the variance is estimated with good accuracy, while for minimising the batch time, the accuracy deteriorates. This is consistent with the results of Bhonsale et al. [35], which demonstrate the failure of linearisation when a step change in the input excites the nonlinear dynamics of the process.
A polynomial chaos expansion is fundamentally a reduced-order model of the original process model. In PCE1, the expansion is truncated after the first-order terms, making the surrogate linear. The weak performance of PCE1 can be attributed to the local nonlinearity of the fermentation model, which makes the variance approximation poor. Adding the second-order terms in PCE2 significantly improves the variance approximation and, subsequently, the performance of the optimisation under uncertainty.

Effect of Backoff Parameter
As mentioned earlier, the backoff parameter is based on the quantiles of the output distribution. As expected, increasing the value of the backoff parameter (i.e., tightening the tolerated probability of constraint violation) leads to a reduction in the percentage of constraint violations for all the methods. This comes at the cost of a more conservative solution: for all methods, an increase in the backoff parameter leads to a reduction in the maximum ethanol concentration obtained, and to longer batch times when the batch time is minimised. It should be noted that in all cases, the tolerances set for the constraint violation are not exactly met. For example, a backoff parameter of 1.65 corresponds to a 5% tolerance for constraint violation; however, PCE2, which performs the best for both objectives, still leads to 6.7% and 8.2% violations for the maximising-ethanol and minimising-batch-time objectives, respectively. This is because the backoff parameter is based on the quantiles of a normal distribution, and it is evident from Figures 5 and 7 that the model output does not follow a Gaussian distribution. Hence, the quantiles used do not exactly correspond to the tolerances. As only two moments, the expectation and the variance, can be obtained from any of the three methods, further information on the distribution is impossible to obtain without Monte Carlo simulations. Nevertheless, the reduction in constraint violations with increasing backoff parameters under the assumption of Gaussianity is still significant.

Conclusions
Dynamic optimisation has tremendous potential to improve fermentation operation. However, models used in optimisation are inherently uncertain due to their estimation from experimental (noisy) data. It has been shown that if the parametric uncertainty is not included in the optimisation, the state constraints can be violated. Hence, to avoid bad fermentation batches it is necessary to include the uncertainty information available in the optimisation framework.
In this study, a beer fermentation process was optimised while accounting for parametric uncertainty using three techniques. The results obtained with the three techniques were compared for different backoff parameter values. Two objectives, maximising the ethanol concentration and minimising the fermentation time, were considered. While all the methods led to a significant reduction in constraint violations when compared to the nominal case, second-order PCE performed the best. The main advantage of the PCE approach is that no assumption on the distribution of the uncertain parameters needs to be made. For independent parameters with certain standard distributions, the polynomials involved in the PCE can be obtained through the Wiener-Askey scheme [37]. If the parameters are not independent, or do not follow a distribution from the Wiener-Askey scheme, the polynomials can be obtained by utilising statistical moments and Gram-Schmidt orthogonalisation. However, PCE is computationally expensive as it leads to a significantly larger optimisation problem. The sigma point approach leads to a smaller optimisation problem while still performing well; however, its direct application is limited to parameters with symmetric unimodal distributions. For asymmetric distributions, transformation functions need to be utilised [49]. Linearisation performs the worst amongst the three approaches, which is expected given the nonlinearity of the process. Nevertheless, when the objective is to maximise ethanol, linearisation performs similarly to the sigma point approach. The study has also emphasised the need for an appropriate selection of the backoff parameter.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: