Artificial Neural Networks and Gompertz Functions for Modelling and Prediction of Solvents Produced by the S. cerevisiae Safale S04 Yeast

The present work aims to develop a mathematical model, based on Gompertz equations and ANNs, to predict the concentration of four solvent compounds (isobutanol, ethyl acetate, amyl alcohol and n-propanol) produced by the yeast S. cerevisiae Safale S04, using only the fermentation temperature as input data. A beer wort was made, and daily samples were taken and analysed by GC-FID. The database was grouped into five datasets of fermentation at different setpoint temperatures (15.0, 16.5, 18.0, 19.0 and 21.0 °C). With these data, the Gompertz models were parameterized, and new virtual datasets were used to train the ANNs. The coefficient of determination (R²) and p-value were used to compare the results. The ANNs, trained with the virtual data generated with the Gompertz functions, were the models with the highest R² values (0.939 to 0.996), showing that the proposed methodology constitutes a useful tool to improve the quality (flavour and aroma) of beers through temperature control.


Introduction
Beer is a complex beverage that contains more than 450 different compounds, including higher alcohols, esters, acids and aldehydes, which have a great impact on its aroma and flavour. This impact can be positive or negative depending on the beer style and on the concentration of these compounds. It is therefore of interest to develop mathematical models that help predict their concentrations.
In general, when brewers refer to the compounds found in a beer, they do not usually do so by name but by the characteristics those compounds produce; these are known as "organoleptic descriptors", which can refer to a single compound, to several of them, or even to mouthfeel, flavours and aromas. One of these descriptors is solvents, described as "similar to acetone or lacquer thinner aromas" [1]. According to different authors, temperature influences the production of these compounds, although it should not be forgotten that the predominant factor is the genetics of the yeast strain. S. cerevisiae (ale) yeasts produce larger amounts of higher alcohols than S. pastorianus (lager) yeasts.
These solvents are desirable at certain levels, as they add complexity to the flavours and aromas. However, if they exceed certain limits, they can become major quality faults. Considering this, the obvious question is: how much solvent is desirable? This question does not have a clear answer; it is the brewmaster, drawing on experience, who decides what the optimal amount is. Thus, it is important to understand which variables influence the concentration of solvents and, at the same time, which of them brewmasters can manipulate in practice. As far as the authors know, there are no fermentation control systems that use temperature as a control variable to estimate these solvents, despite the important impact they have on the organoleptic quality of the final product.
Considering that brewers try to standardize their recipes and brewing methods, the temperature is practically the only variable that allows exercising control over the fermentation process. For this reason, the mathematical model must consider the temperature as the main input value and the solvent concentration as an output.
Due to the sigmoidal behaviour of the compound concentration growth curves, these can be modelled using Gompertz functions [6]. However, each temperature and each compound is represented by its own function, which makes it difficult to apply the model in practice across different working temperature ranges. ANNs have proven to be powerful tools for modelling different parameters in beers, predicting compound concentrations, classifying beers and recognising patterns in electronic noses and tongues, among other uses [7][8][9][10][11][12]. Neural networks have the advantage that, once trained, they can estimate all the values that may occur.
The objective of this work is to provide specific models to predict the final concentration of four solvent compounds: isobutanol, ethyl acetate, propanol and amyl alcohol, using only temperature and timestamp readings as control variables.
Five fermentations (datasets 1 to 5) of 10 days each were carried out at different target temperatures: 15.0 °C, 16.5 °C, 18.0 °C, 19.0 °C and 21.0 °C. Samples were taken daily (every 24 h) for 10 days for chromatographic analysis. For the quantification of the four compounds, three replicates of each sample were analysed, resulting in thirty experimental data points per fermentation temperature for each compound.
Beers were not filtered, nor was any type of clarifier added.
A Tilt™ hydrometer-thermometer (Tilt, Santa Rosa, CA, US) was installed to provide real-time temperature and density data. These data were sent through a Raspberry Pi 3B+ (Raspberry Pi Foundation, Cambridge, UK) using the Tilt™ Pi V2 Buster App to a ThingSpeak™ channel (MathWorks, Natick, MA, US). The temperature accuracy of the Tilt is ±0.5 °C.

Sample Description and Experimental Data and Analysis of Solvents Using GC-FID
This method was extensively described by Loira et al. [14]. Each day, samples were taken in triplicate and analysed with a gas chromatograph coupled with a flame ionisation detector (GC-FID). Samples were injected after filtration through 0.45 μm cellulose methyl ester membrane filters (Labbox, Madrid, Spain). The equipment was an Agilent Technologies 6850 gas chromatograph (Palo Alto, CA, USA). The injection temperature was 250 °C and the detector temperature was 300 °C. The column used was a DB-624 capillary column (60 m × 250 μm × 1.40 μm; stationary phase 6% cyanopropyl phenyl, 94% polydimethylsiloxane). The column temperature was held at 40 °C for the first five minutes, then increased linearly at 10 °C per minute to 250 °C. This temperature was maintained for 5 min. The total runtime of each sample was 40 min. The carrier gas was hydrogen, with a flow rate in the column of 2.2 L·min⁻¹. 100 μL of internal standard (4-methyl-2-pentanol, 500 mg/L) (Fluka Chemie GmbH, Buchs, Switzerland) was added to 1 mL test samples. This method is a variant of one recommended by the International Organisation of Vine and Wine (OIV) for the analysis of higher alcohols. The detection limit was 0.1 mg/L. The volatile compounds analysed were previously calibrated with five-point calibration curves (R²): 1-propanol (0.999), ethyl acetate (0.999), isobutanol (0.999), 2-methyl-1-butanol (0.999), and 3-methyl-1-butanol/isoamyl alcohol (0.999).

Mathematical Fitting Models: Gompertz Functions and ANNs
Matlab® R2021a (MathWorks, Natick, MA, US) was used for curve fitting, ANN training and data analysis.
In the Gompertz function (Equation (1)), a, b and c are mathematical parameters that describe the sigmoidal growth curve.
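For reference, the equations cited in this section can be sketched as follows, assuming the three-parameter Gompertz form and the reparameterization given by Zwietering et al. Here y(t) denotes the compound concentration at time t and e is Euler's number; the assignment of Equations (2)-(4) to A, kg and λ in that order is an assumption.

```latex
\begin{align}
y(t)    &= a \exp\!\left[-\exp(b - c\,t)\right] \tag{1}\\
A       &= a \tag{2}\\
k_g     &= \frac{a\,c}{e} \tag{3}\\
\lambda &= \frac{b - 1}{c} \tag{4}
\end{align}
```

Under this form, Equation (3) is the slope of Equation (1) at its inflexion point t = b/c, where y = a/e, and Equation (4) is the time at which the tangent through that point crosses the abscissa.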
From the fitted Gompertz parameters a, b and c, the values A, kg and λ can be calculated (Equations (2)-(4)). These have the advantage of a practical biological interpretation, as described by Zwietering et al.: A [mg L⁻¹] is the asymptote, representing the upper value of the compound concentration when time tends to infinity; kg [mg L⁻¹ h⁻¹] represents the maximum specific growth rate; finally, λ [h] represents the lag time from zero to the moment when exponential growth begins, i.e., the intercept on the abscissa axis of the tangent that passes through the inflexion point of the curve, as explained in detail by Singh et al. [16].
Once the fitting parameters of the model had been obtained for each compound and each temperature, the Gompertz functions were used to generate a database larger than the original one; the aim was to train the neural networks on it and thus obtain robust results. A matrix of 140625 × 4 values was generated for this purpose.
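The parameter conversion and the generation of virtual data can be illustrated with a minimal Python sketch. It assumes the three-parameter Gompertz form y(t) = a·exp(−exp(b − c·t)) of Zwietering et al.; the function names and the numerical values of a, b and c are hypothetical and are not taken from Table 2.

```python
import math


def gompertz(t, a, b, c):
    """Three-parameter Gompertz curve: y(t) = a * exp(-exp(b - c*t))."""
    return a * math.exp(-math.exp(b - c * t))


def biological_parameters(a, b, c):
    """Convert (a, b, c) into (A, kg, lam), following Zwietering et al."""
    A = a                  # upper asymptote [mg/L]
    kg = a * c / math.e    # maximum growth rate [mg/L/h], slope at t = b/c
    lam = (b - 1.0) / c    # lag time [h], where the tangent crosses y = 0
    return A, kg, lam


# Hypothetical parameters for illustration only (not fitted values).
a, b, c = 25.0, 2.0, 0.05
A, kg, lam = biological_parameters(a, b, c)

# Virtual dataset: one simulated concentration per hour over 9 days.
virtual = [(t, gompertz(t, a, b, c)) for t in range(0, 9 * 24 + 1)]
```

Sampling the fitted curve on a dense time grid in this way is what allows a database much larger than the thirty experimental points per temperature to be built.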
Four models were developed using ANNs. In all cases the inputs were temperature and time point. The outputs were n-propanol, ethyl acetate, amyl alcohol and isobutanol. The training algorithm was Bayesian regularization, chosen because it works well on small datasets [2], with ten neurons in the hidden layer for n-propanol, ethyl acetate and amyl alcohol and five for isobutanol (Figure 1). Random data division (70% training and 30% testing) was used. The mean squared error (MSE) and root mean squared error (RMSE) were used as performance measures.
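The data division and the two error measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the seed are hypothetical, and the sketch does not implement Bayesian regularization or the Matlab training routine itself.

```python
import math
import random


def split_70_30(samples, seed=0):
    """Shuffle the samples and return (training, testing) subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mse(observed, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(observed, predicted))


# Hypothetical stand-in for the rows of the virtual dataset.
samples = list(range(100))
train, test = split_70_30(samples)
```

RMSE is often easier to interpret than MSE because it is expressed in the units of the concentration (here mg L⁻¹).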

Statistical Analysis
The raw data were compared with the data simulated by the Gompertz functions and with the ANN results, using the F-test, p-value (p < 0.05) and R² value to assess the performance of the models.
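As an illustration of the comparison, the coefficient of determination can be computed as below. This sketch uses one common definition, R² = 1 − SS_res/SS_tot, which is assumed here and may differ from the routine used in Matlab; the data values are made up.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observed and simulated concentrations, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.9]
r2 = r_squared(observed, simulated)
```

A value close to one indicates that the simulated values reproduce most of the variance of the raw data.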

Results and Discussion
To examine the effects of temperature on the solvent concentrations, five experiments were carried out, giving five datasets (1, 2, 3, 4 and 5) that correspond to the setpoint temperatures 15.0, 16.5, 18.0, 19.0 and 21.0 °C, respectively. Although the system was set to those temperatures, the sensor measurements (°C) varied over time. In this work, all calculations were performed with the measured values. Statistics of the measured data are given in Table 1. A Gompertz function was fitted to each dataset; the estimated mathematical parameters are given in Table 2. From the Gompertz parameters a, b and c, the values of A, kg and λ were calculated, which correspond to the amplitude (maximum value), the maximum growth rate and the lag time of the compounds. These values are also given in Table 2. This information is useful for understanding and predicting the lag phase of the yeast and its relationship with the generation of the studied compounds.
Gompertz functions have been used before to model similar compounds, for example by Membré and Tholozan [6], although with other microorganisms, and by Benucci et al. [17], for different compounds but similar purposes.  [19]. It should be noted that different concentrations of the studied compounds can be found in the literature. Most of these studies do not specify the composition of the worts or the yeast strains used. Without this information, it is difficult to compare the results accurately because, as is well known, differences in the sugar and nitrogen composition directly affect the concentration of higher alcohols [5] and thus that of esters [4].
Riverol and Cooney reported that the rate constants of ethyl acetate do not change with temperature [21], which differs from the present work and from the results reported by Loviso et al. and García et al. [4,20]. Riverol and Cooney did not specify which yeast they modelled, but the fermentation temperatures were between 10 °C and 15 °C, which suggests that it was a strain of S. pastorianus.
In the cases of propanol and isobutanol, the odour thresholds are not exceeded at any of the temperatures, while amyl alcohol is close to the lower limit when fermented at 15 °C [22][23][24]. The values obtained in this work are within or very close to these ranges (20.99-29.95 [mg L⁻¹]); thus, we can say it is perceived in the aroma, even though its intensity varies.
Table 3 shows that all models fit the data reasonably well, as demonstrated by the R² values close to one for all the Gompertz functions. The function for n-propanol at 15 °C is the only one with an R² value below 0.95.
The analysis of variance between the experimental data and the Gompertz estimates, for the same fermentation time and temperature, indicates that there are no significant differences in any case. The statistical comparison with the raw data is given in Table 3. Among the F-values, the highest are those at 21 °C (where the p-values are the lowest). Because of this higher temperature, the experimental data for the first 24 h of fermentation are already close to the value of the asymptote A.
Using a dataset simulated from the twenty Gompertz functions developed (five fermentation temperatures for four compounds), a data mesh was generated with the Matlab ndgrid function (Figure 6), covering all values of temperature between the minimum (14.4 °C) and maximum (20.6 °C) reported by the temperature sensor and all times from 0 to 9 days. The neural networks were trained with this mesh.
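The meshing step can be sketched in plain Python as the set of all (temperature, time) pairs on a regular grid, similar in spirit to what ndgrid produces. The number of points per axis is an assumption: 375 was chosen only because 375 × 375 = 140625 matches the number of rows reported for the virtual database, and the helper name `linspace` is hypothetical.

```python
def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


N = 375  # assumed points per axis; 375 * 375 = 140625 grid nodes

temperatures = linspace(14.4, 20.6, N)  # sensor range [degC]
times = linspace(0.0, 9.0, N)           # fermentation time [days]

# Every combination of temperature and time: one row per grid node.
mesh = [(temp, t) for temp in temperatures for t in times]
```

Each row of the mesh would then be paired with a concentration simulated from the Gompertz functions to form the training set.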
The amyl alcohol, ethyl acetate and n-propanol ANNs were generated with ten neurons in the hidden layer, and the isobutanol ANN with five. The results are shown in Figure 6. Isobutanol is the compound on which temperature exerts the greatest influence, while n-propanol and amyl alcohol are the fastest to reach high values, with mean kg values of 1.142 and 2.324 [mg L⁻¹ h⁻¹], respectively.
When the data simulated with the neural networks are compared with the raw data, the R² values are similar to those obtained with the Gompertz functions; most are slightly lower, except for ethyl acetate in datasets 4 and 5 and amyl alcohol in dataset 1, which are slightly higher (Table 2).
The p-values are higher in 14 of the 20 ANN results than in those of the Gompertz functions. In the case of dataset 5, the values are considerably higher (on average 70.4% higher), which would indicate that in this case the ANN models are of significantly better quality than the Gompertz ones, even though they were developed with data simulated by these functions. With the data presented, all the ANN models fit well considering that the datasets are small, which is precisely one of the advantages provided by the Bayesian regularization algorithm. In addition, Gonzalez et al. [25] note that a model fits well without overfitting when the validation correlation coefficient is close to the training correlation coefficient and the MSE of the training stage is lower than that of the other phases. The first condition cannot be verified because the Bayesian regularization algorithm does not include a validation phase, while the second condition is fulfilled. All these values are shown in
It can be seen in Figure 6 that the initial values of the ANNs (t = 0) are not zero, as would be expected from the experimental results. This is a consequence of the use of the Gompertz functions: due to the mathematical nature of the parameter b (Equation (1)), the starting point is a relative value, a percentage of the asymptote A, and the curve does not pass through the point (0,0) [26].
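The non-zero starting value can be made explicit with a short worked equation, assuming the three-parameter Gompertz form y(t) = a exp[−exp(b − ct)] of Zwietering et al., with A = a:

```latex
\begin{align}
y(0) &= a \exp\!\left[-\exp(b)\right], &
\frac{y(0)}{A} &= \exp\!\left[-\exp(b)\right] > 0 .
\end{align}
```

For illustration only (these are not fitted values), b = 2 gives y(0)/A ≈ 6.2 × 10⁻⁴, whereas b = 0 gives y(0)/A = e⁻¹ ≈ 0.37. The curve therefore starts at a fraction of the asymptote fixed by b and cannot pass through (0,0).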

Conclusions
To the best of our knowledge, this is the first work to model and predict the concentration of solvents (isobutanol, n-propanol, ethyl acetate and amyl alcohol) produced by the commercial ale yeast S. cerevisiae Safale S04 using temperature as the only input value.
This work has shown that Gompertz functions can be a high-precision data source for ANNs. The high R² values (0.939 to 0.9962), the precision of the ANNs and the simplicity of implementing these models make them a useful tool to improve the quality (flavour and aroma) of beers through temperature control.