A Theoretical Analysis for Assessing the Variability of Secondary Model Thermal Inactivation Kinetic Parameters

Traditionally, for the determination of the kinetic parameters of thermal inactivation of a heat-labile substance, an appropriate index is selected and its change is measured over time at a series of constant temperatures. The rate of this change is described through an appropriate primary model, and a secondary model is applied to assess the impact of temperature. In this approach, the confidence intervals of the estimates of the rate constants are not taken into account. Consequently, the calculated variability of the secondary model parameters can be significantly lower than the actual variability. The aim of this study was to demonstrate the influence of the variability of the primary model parameters on establishing the confidence intervals of the secondary model parameters. Using a Monte Carlo technique and assuming normally distributed D_T values (the parameter associated with a primary inactivation model), the error propagated to the D_Tref and z values (the secondary model parameters) was assessed. When the D_T confidence intervals were broad, the variability of the secondary model parameters was appreciably high and could not be adequately estimated through the traditional deterministic approach, which does not take into account the variation in the D_T values. In such cases, the proposed methodology was essential for realistic estimations.


Introduction
Processing and storage induce changes in foods due to biological, chemical and physical reactions. Such changes proceed at a certain rate and with particular kinetics. Kinetic modeling enables us to describe these changes and their rates quantitatively and has become a very useful approach in relation to food processing and food quality. Significant published work regarding kinetic modeling has focused on microbiological safety and spoilage in a variety of food matrices [1] or on describing basic reaction mechanisms that can be important for quality evaluation and control [2].
Food kinetics is based on the thorough study of the rates at which physicochemical and biochemical reactions proceed. The area of kinetics in food systems has received a great deal of attention in past years, primarily due to efforts to optimize the quality of food products during processing and storage [3]. Chemical kinetics, the study of the rates and mechanisms of the reactions of interest, together with the mathematical relationships that best describe the influence of different external or internal factors, such as pH, water activity, pressure and, in particular, temperature, on the reaction rate constants, has been used to model changes in food quality. Kinetic parameters describing such changes are therefore needed.
When building mathematical equations to describe food kinetics, experimental data are used to estimate the relevant parameters. For both microbiological and chemical predictions, the choice of conditions, the experimental design, as well as the quality of the experimental data, are important for the validity of the parameters and, hence, for the reliability of the prediction [4]. The estimation of the model parameters per se, and the accurate calculation of their variation (expressed by their confidence intervals), considerably affect the final predictions. To obtain the best results, a careful choice of primary and secondary models, as well as appropriate statistics, is necessary.
The most widely applied procedure for food kinetic modeling (either for microbiological measurements or for chemical changes) is a stepwise one [5][6][7][8][9][10][11][12]. In such an approach, experiments are initially conducted under different constant temperature conditions, the most representative indices are selected, and their change is measured as a function of processing or storage time. From this point, the traditional two-stage methodology is based on the application of an appropriate primary model in order to describe the rate of these changes and to determine, through appropriate mathematical equations, the relevant kinetic parameters at each temperature. The second step is to select a secondary model that best describes the effect of temperature on the respective kinetic parameters. The ultimate goal is to build this secondary model and define not only the estimates of its parameters, but also information on their variability. Knowledge of all the parameters of the primary and secondary models would then allow the prediction of the quality or microbiological status of the food at any temperature of interest. If validated, mathematical models can be a useful tool to assess safety and quality after processing and at any stage of food storage. This conventional two-step procedure for food modeling is, however, very time-consuming and lacks reliability in the expression of the parameters' variability [4]. In its present form, the errors, or equivalently the confidence intervals, of the primary model parameters at each temperature are not taken into account when calculating the secondary model parameters. As a consequence, the confidence intervals of the estimates of the kinetic parameters of the secondary model can be falsely narrower than the actual confidence intervals implied by the experimental data. There are numerous published studies that ignore the variation in the primary model's kinetic parameters while reporting confidence intervals for the parameters appearing in the secondary model [6,[12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. In fact, this has become the standard procedure in the literature. In such cases, either a first- or an nth-order reaction equation or some other inactivation model, or a mathematical expression describing microbiological growth (for example Gompertz, Baranyi, logistic, etc.), is selected according to the best-fit criterion applied to each experimental data set. Thereafter, the Arrhenius equation, among the numerous models proposed in the current literature, is the one most often used as a secondary model to describe the temperature effect on the kinetic parameters. In all these studies, regardless of the primary and secondary model used, and no matter the index measured or the food matrix in question, the uncertainty of the secondary model kinetic parameters (expressed as confidence intervals of the parameters) does not incorporate the true variation of the primary model parameters.
A reliable alternative is deemed necessary to account for the real uncertainty of the model parameters, in order to be able to predict, in a more accurate way, chemical or microbiological changes at any selected temperature and time during processing and product shelf life (storage or distribution). The Monte Carlo simulation technique has recently been proposed in the literature for the probabilistic assessment of stochastic variability and uncertainty associated with microbiological [9,[29][30][31][32] or chemical/sensory [33][34][35][36][37][38] attributes of several food matrices.
The Monte Carlo technique is also used as a background tool in the bootstrap methodology described by [39] in 1993, first applied in scientific fields other than food science. Recently, a bootstrap Monte Carlo analysis has been proposed to compute the variation of the primary model's parameters based on the propagation of error artificially introduced into the experimental data (concentration vs. time data); it was assumed that the experimental data were normally distributed and the error introduced at each measurement was based on its standard deviation [40][41][42][43][44]. However, the variation of the primary model parameters has not been taken into account in determining the uncertainty of the parameters of the secondary model. Therefore, the scope of this work was to assess the propagation of the primary model parameters' variability into the estimation of the secondary model kinetic parameters. As a case study, thermal inactivation data of L-carnitine under isothermal conditions were used, along with the classical thermobacteriological approach [45,46], for the determination of the respective parameters of the primary (decimal reduction time, D_T) and secondary (z and D_Tref values) models. For this purpose, a Monte Carlo simulation, designed and implemented through a FORTRAN algorithm, was used.

Materials and Methods
According to the classical thermobacteriological approach for thermal inactivation kinetics [45,46], the inactivation rate is described by the decimal reduction time (D_T) through the following Equation (1):

$$\log\frac{C}{C_o} = -\frac{t}{D_T} \qquad (1)$$

where D_T is defined as the time, in minutes, at constant temperature, required to reduce the initial concentration of a thermolabile substance by 90%, C is the concentration at time t and C_o is the initial concentration of the thermolabile substance. Based on Equation (1) and the application of linear regression, the D_T value is determined at each temperature of the isothermal experiments from the negative reciprocal of the corresponding slope. The temperature dependence of D_T is characterized by the z value, that is, the increase in temperature necessary to induce a 10-fold reduction in D_T (Equation (2)):

$$\log\frac{D_T}{D_{T_{ref}}} = -\frac{T - T_{ref}}{z} \qquad (2)$$

where T_ref is a reference temperature and D_Tref is the decimal reduction time at T_ref. Parameter z is determined as the negative reciprocal of the slope of the log D_T vs. T regression line.
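Both regressions lend themselves to a compact implementation. The following is a minimal sketch in Python (the original work used FORTRAN), assuming numpy and scipy are available; the function names are ours, and the arrays passed in would be the experimental times, concentrations, temperatures and D_T values. Centering the second regression at T_ref makes the intercept equal to log D_Tref.

```python
# Minimal sketch of the two-step regression of Equations (1) and (2).
# Assumes numpy/scipy; function names and defaults are illustrative.
import numpy as np
from scipy import stats

def estimate_DT(t, C):
    """D_T (min) from the slope of log10(C) vs. t (Equation (1))."""
    res = stats.linregress(t, np.log10(C))
    return -1.0 / res.slope                    # negative reciprocal of the slope

def estimate_z_DTref(T, DT, T_ref=120.0):
    """z (°C) and D_Tref (min) from log10(D_T) vs. (T - T_ref) (Equation (2))."""
    res = stats.linregress(np.asarray(T) - T_ref, np.log10(DT))
    return -1.0 / res.slope, 10.0 ** res.intercept
```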

Monte Carlo Simulation
The proposed procedure was based on the assumption that the estimate of the main parameter D_T of Equation (1), which is determined by linear regression of the log C vs. t data at each temperature, can be described by a distribution of values rather than by a single value. The benefit of this approach is that it incorporates the error in D_T calculated by the linear regression of the primary model and, thus, the variability of the parameter [47,48]. For the development of this methodology, experimental data of L-carnitine thermal degradation, describing its inactivation at eight different constant temperatures, were used [49].
In applying the Monte Carlo simulation, a normal distribution was assumed to describe the D_T variability. A similar approach was used by [37] when assessing the confidence intervals of the E_a and k_ref parameters of the Arrhenius equation for the non-isothermal degradation of cyanidins. An example of such a normal distribution, constructed from the estimate of the mean value (µ = 896.3 min) and its standard deviation (σ = 22.5 min), is given in Figure 1; it refers to the D_80°C value and is based on the relative experimental data. The distribution was discretized into intervals of 0.5σ on the D_T scale, the central interval being located between µ − 0.25σ and µ + 0.25σ. Within each such interval, a single D_T value, equal to the D_T value at the middle of the particular interval, was assigned (Figure 1). The same procedure was followed to construct the corresponding distribution curves for all eight temperatures of the isothermal experiments.
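A sketch of this discretization, assuming Python with numpy/scipy; the truncation of the distribution at ±3.25σ is our assumption for illustration, since the text does not state where the tails were cut.

```python
# Discretize a normal distribution of D_T into 0.5*sigma-wide bins, the
# central bin spanning mu +/- 0.25*sigma; each bin is represented by its
# midpoint D_T value, weighted by the probability mass it contains.
import numpy as np
from scipy.stats import norm

def discretize_normal(mu, sigma, half_width=3.25):
    edges = mu + sigma * np.arange(-half_width, half_width + 0.5, 0.5)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    probs = np.diff(norm.cdf(edges, loc=mu, scale=sigma))
    return midpoints, probs / probs.sum()      # renormalize the truncated tails

DT_values, DT_probs = discretize_normal(896.3, 22.5)   # the 80 °C example
```

With µ = 896.3 min and σ = 22.5 min, the central bin carries about 20% of the probability mass, the bin centred near µ − σ ≈ 873.9 min about 12%, and the bin centred near µ − 2.5σ ≈ 840.1 min about 1%, consistent with the draw counts quoted further below for the 80 °C case.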
There are some (limited) publications that refer to the distributions that best describe the kinetic parameters of primary models. For example, Poschet et al. [42], in a Monte Carlo analysis, represented the parameter distributions of a microbial growth model by normal distributions. Results by Huang and Vinyard [9] showed that the distributions of the parameters of a primary model (a system of ordinary differential equations) were approximately normal. Pénicaud et al. [50] superimposed pseudo-random noise on the experimental data, performed 2000 runs using a Monte Carlo technique and showed that the primary model's parameter (the apparent reaction rate constant of ascorbic acid loss) was normally distributed. Our assumption of D_T having a normal distribution was checked by creating 1000 synthetic log C vs. time data sets at 80 °C, estimating D_80°C for each set, and finally plotting a histogram of the D_T values (Figure 2). The synthetic data sets were based on the experimentally measured C data and their variation, assuming that they were normally distributed and using a Monte Carlo technique implemented in a FORTRAN-coded program. Based on this plot, where the normal distribution line was also added (Figure 2), we concluded that the D_T values were approximately normally distributed. As a consequence, D_T values can be described by a mean value and a symmetric standard deviation.
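A sketch of this normality check, under stated assumptions: the sampling times, initial concentration and 3% relative measurement error below are hypothetical placeholders chosen so that the mean concentrations are consistent with D_80°C ≈ 896.3 min; the original check used the measured C data and their experimental variation.

```python
# Parametric-bootstrap style normality check for D_T at 80 °C: add normal
# noise to the (here hypothetical) mean concentrations, re-estimate D_80°C
# for each synthetic data set and inspect the histogram.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
t = np.array([0.0, 200.0, 400.0, 600.0, 800.0])    # heating times, min (placeholder)
C_mean = 100.0 * 10.0 ** (-t / 896.3)              # mean C consistent with D_80°C
C_sd = 0.03 * C_mean                               # assumed measurement error

D80 = []
for _ in range(1000):                              # 1000 synthetic data sets
    C = rng.normal(C_mean, C_sd)
    D80.append(-1.0 / stats.linregress(t, np.log10(C)).slope)
# A histogram of D80 (e.g., np.histogram(D80, bins=30)) should be close to
# the superimposed normal distribution, as in Figure 2.
```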
The Monte Carlo technique [37,51] was implemented through a FORTRAN-based algorithm. At each iteration, a random number was generated from a normal distribution and a value was assigned to D_T based on the discretization of the corresponding normal distribution curve. The correspondence between the random numbers and the respective values of D_T was based on the frequency determined by the corresponding distribution, for each of the eight temperatures studied. In our study, the Monte Carlo simulation was repeated 500 times, assigning 500 values to each D_T parameter, for all eight temperatures studied. For example, in the case of the distribution curve for 80 °C, out of the 500 randomly selected D_T values, 100 were assigned the mean value of 896.3 min, a percentage of 20%; 60 were given a value of 873.9 min, a percentage of 12%; five were given a value of 840.1 min, a percentage of approximately 1%, and so on, as shown in Figure 1. Thus, 500 sets of eight D_T values, one at each of the eight experimental temperatures, were created.
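The assignment of D_T values from random numbers according to the discretized frequencies amounts to sampling the bin midpoints with the bin probabilities. A sketch, reusing discretize_normal() from the earlier sketch; only the 80 °C (µ, σ) pair is taken from the text, the remaining pairs are hypothetical placeholders lying roughly on the fitted log D_T vs. T line.

```python
# Draw 500 D_T values per temperature from the discretized distributions,
# producing 500 sets of D_T values (one value per temperature in each set).
import numpy as np

DT_stats = {                     # temperature (°C): (mu, sigma) of D_T (min)
    80.0: (896.3, 22.5),         # from the text
    90.0: (498.0, 15.0),         # hypothetical placeholders standing in for
    100.0: (233.0, 9.0),         # the remaining experimental temperatures
    110.0: (108.0, 5.0),
}

rng = np.random.default_rng(seed=1)
n_sets = 500
DT_sets = {}
for T, (mu, sigma) in DT_stats.items():
    vals, probs = discretize_normal(mu, sigma)
    DT_sets[T] = rng.choice(vals, size=n_sets, p=probs)
# DT_sets[T][i] is the D_T value at temperature T in the i-th random set
```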
The ultimate goal was to determine the estimates and the corresponding errors (or confidence intervals) of D_Tref and z (Equation (2)), based on the 500 D_T data sets produced in the previous step of the Monte Carlo application. For each random set of eight D_T values, a pair of estimates of D_Tref and z, with their corresponding variations, was obtained by linear regression of the log D_T vs. T data. For the 500 pairs of D_Tref and z values and their assumed normal distributions (constructed from the estimated 95% confidence intervals of the regression), and using the Monte Carlo technique again, 500 D_Tref and z values were randomly drawn from each distribution. The final set comprised 500 × 500 values for each parameter of the secondary model, D_Tref and z. From these 500 × 500 values per parameter, the mean values and the corresponding 95% confidence intervals of D_Tref and z were calculated. It should be noted that a reference temperature of 120 °C was used.
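A sketch of the full two-level scheme, assuming DT_sets from the previous sketch. Regressing against (T − T_ref) makes the intercept equal to log D_Tref, so the standard errors of the slope and intercept can be used directly for the second-level draws; this particular way of constructing the second-level normal distributions is our simplification of the construction described above.

```python
# Two-level Monte Carlo: one regression per random D_T set, then 500 draws
# of (z, D_Tref) around each pair of estimates; the pooled 500 x 500 values
# give the reported means and ~95% CI.
import numpy as np
from scipy import stats

T_ref = 120.0
T = np.array(sorted(DT_sets))                 # the experimental temperatures
x = T - T_ref                                 # center: intercept = log10(D_Tref)
logDT = np.column_stack([np.log10(DT_sets[Ti]) for Ti in T])

rng = np.random.default_rng(seed=2)
z_pool, DTref_pool = [], []
for row in logDT:                             # one regression per D_T set
    res = stats.linregress(x, row)
    z_pool.extend(-1.0 / rng.normal(res.slope, res.stderr, 500))
    DTref_pool.extend(10.0 ** rng.normal(res.intercept, res.intercept_stderr, 500))

for name, pool in (("D_Tref (min)", DTref_pool), ("z (°C)", z_pool)):
    m, s = np.mean(pool), np.std(pool, ddof=1)
    print(f"{name}: {m:.1f} +/- {1.96 * s:.1f}")   # mean +/- ~95% CI
```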
Since the regression of Equation (2) is performed on log D_T, the error obtained for log(D_Tref) was converted into the error on D_Tref itself through:

$$\delta D_{T_{ref}} = \ln(10)\, D_{T_{ref}}\, \delta \log(D_{T_{ref}})$$

Results and Discussion
The D_T values, obtained through Equation (1) and the experimental data of L-carnitine thermal degradation, together with the corresponding 95% Confidence Intervals (CI), are presented in the second and third columns of Table 1, respectively. Based on these D_T values and linear regression through Equation (2), the following estimates, with their ±95% CI, were calculated for the secondary model parameters: z (in °C) = 30.2 ± 2.3 and D_Tref (in min) = 50.6 ± 7.3 for a chosen T_ref of 120 °C. The application of the proposed methodology resulted in the same mean D_Tref and z values and did not lead to significant changes in the 95% CI of the parameters D_Tref and z compared to their initial estimates. The 95% CI of D_Tref and z found through the proposed analysis were ±7.7 min and ±2.4 °C, respectively, comparable to the initial estimates of ±7.3 min and ±2.3 °C that were calculated without taking into account the uncertainty in the determination of each D_T value (at each temperature). This was attributed to the relatively low error (narrow 95% CI) of the D_T values at each experimental temperature, compared to the error of the linear regression (Figure 3a). Consequently, the described methodology was tested using a higher, arbitrarily chosen error on the same experimentally calculated mean D_T values (Table 1, fourth column), and the procedure previously described was repeated with the new, expanded ±95% CI of the D_T values. The legitimacy of the procedure, as far as the Monte Carlo scheme used and the randomness of the selections are concerned, was confirmed by estimating the mean values and the 95% CI of the 500 random sets of D_T values and observing their proximity to the initial values (Table 1).

In order to better visualize the importance of the initial variability of the D_T values for the secondary model kinetic parameters, a qualitative illustration is given in Figure 3. In this figure, the original data points (log(D_T) vs. T_ref − T), the linear regression line and the respective 95% confidence bands (CB, red lines) and prediction bands (PB, black solid lines) are indicated. Error bars represent the ±95% CI of the D_T data. Figure 3a refers to the data with the original, narrow 95% CI (columns 2 and 3 of Table 1), while Figure 3b uses the data with the intentionally expanded 95% CI (columns 2 and 4 of Table 1). As can be seen, the regression, ±95% CB and ±95% PB lines are identical in Figure 3a,b: the regression line uses only the mean D_T values, and the equations used for the construction of the ±95% CB and ±95% PB (Equations (7) and (8), respectively) do not include any D_T variation in their formulations.
For y = log(D_T) and x = (T_ref − T), the expected value of y for a given x_o (±95% CB) is given by [52]:

$$y_o \pm t_{\alpha/2}\, \sigma_y \sqrt{\frac{1}{N} + \frac{(x_o - \bar{x})^2}{\sigma_{xx}}} \qquad (7)$$

while the predicted value of y for a given x_o (±95% PB) is given by:

$$y_o \pm t_{\alpha/2}\, \sigma_y \sqrt{1 + \frac{1}{N} + \frac{(x_o - \bar{x})^2}{\sigma_{xx}}} \qquad (8)$$

where σ_y is the standard deviation of the residuals, σ_xx is the sum of the squared deviations of x from its mean, N is the number of data points (here equal to eight, the number of experimental temperatures), y_o is the value of y estimated from the linear regression at x_o, x̄ is the mean value of all x's and t_{α/2} is the value of the Student distribution for N − 2 degrees of freedom. In Figure 3a,b, the straight grey lines represent the linear regression lines for 250 (out of the 500) sets of D_T values, for the original narrow 95% CI (Figure 3a) and for the intentionally expanded 95% CI (Figure 3b) of our case study. It can be observed that the range of these log(D_T) vs. (T_ref − T) regression lines, as well as the error bars, are much wider in Figure 3b, confirming the expanded variability. Whereas in Figure 3a the grey lines lie well within the ±95% CB (due to the small experimental error), in Figure 3b their range not only exceeds the ±95% CB but almost covers the whole ±95% PB, owing to the intentionally expanded error of the experimental data.
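A sketch of Equations (7) and (8), in the same Python setting as the earlier sketches (variable names are ours):

```python
# Half-widths of the 95% confidence band (Equation (7)) and prediction band
# (Equation (8)) at a point x0, for y = log10(D_T) and x = (T_ref - T).
import numpy as np
from scipy import stats

def bands_95(x, y, x0):
    res = stats.linregress(x, y)
    N = len(x)
    resid = y - (res.intercept + res.slope * x)
    sigma_y = np.sqrt(np.sum(resid ** 2) / (N - 2))  # std. dev. of residuals
    sigma_xx = np.sum((x - np.mean(x)) ** 2)         # sum of squared x-deviations
    t_val = stats.t.ppf(0.975, N - 2)
    lever = 1.0 / N + (x0 - np.mean(x)) ** 2 / sigma_xx
    y0 = res.intercept + res.slope * x0              # regression estimate at x0
    cb = t_val * sigma_y * np.sqrt(lever)            # Equation (7)
    pb = t_val * sigma_y * np.sqrt(1.0 + lever)      # Equation (8)
    return y0, cb, pb
```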
To demonstrate the rationale of the proposed approach, let us consider an extreme scenario in which the experimentally determined mean values of the D_T parameter for the eight temperatures fell on a perfect straight line, giving a linear regression with R² = 1 and σ_y = 0. In that case, the regression, ±95% CB and ±95% PB lines would all coincide (the half-widths of Equations (7) and (8) vanish with σ_y), regardless of the variability of the primary model's parameter (D_T). This could easily lead to a false interpretation of the secondary model parameter estimates and to poor predictions. In this extreme case, as well as in the case of the expanded error studied in this work, the application of the proposed methodology can be of significant help in realistically estimating the variability of the secondary model's parameters.
Mean values and standard deviations of the D_120°C and z values (500 × 500 data points for each parameter) generated by the Monte Carlo simulations were calculated and used to create the corresponding frequency curves, both for the original, narrow, and for the intentionally expanded 95% CI, for both kinetic parameters, D_120°C and z (Figure 4a,b). The estimates (mean values) and the variation (represented by the standard deviation) of the secondary model kinetic parameters (D_Tref and z), as calculated by regression analysis of the initial experimental data, were used to build the corresponding frequency curve (Figure 4). This curve, based on the regression analysis estimates, was close (both in height and in breadth) to the results of the Monte Carlo simulation when the narrow 95% CI were used, as expected from the depiction in Figure 3a. In the case of the expanded error, however, the frequency curve based on the values generated by the Monte Carlo simulation differed significantly from the distribution based on the regression analysis calculations, demonstrating the necessity of the introduced approach.
The distributions in Figure 4a,b were constructed based on the mean values and the 95% CI of the parameters D_120°C and z, respectively, which were calculated to be 50.2 ± 13.9 min and 30.2 ± 4.6 °C for the expanded error, compared to 50.6 ± 7.7 min and 30.2 ± 2.4 °C for the small, original error.
It is common practice to express the estimates of model parameters as mean ± standard deviation, or mean ± 95% CI, which implicitly assumes a normal distribution of the corresponding parameter. In our case study, the D_Tref and z values out of the Monte Carlo iterations were presented in histograms (up to 50,000 iterations were used) and were shown to be adequately described by a normal distribution (Figure 5). Thus, the CI can be considered symmetric.

All the discussion up to this point was based on 500 × 500 data points for the Monte Carlo simulation. However, before selecting the appropriate number of trials, a sensitivity analysis was performed for both parameters of the secondary model (D_120°C and z) which, for the sake of clarity, was not presented earlier. This study, again realized through a FORTRAN code, is necessary in order to test the effectiveness of the Monte Carlo scheme and to find the number of repetitions required for a reliable approximation of the "real" confidence intervals and, thus, of the parameters' variability. In this context, four cases with significantly different numbers of trials were implemented, namely 25 × 25, 100 × 100, 250 × 250 and 500 × 500, following the procedure previously detailed. The estimated mean values and corresponding 95% CI for D_120°C (in min) and z (in °C) were 48.9 ± 15.2 and 29.8 ± 4.4, 50.1 ± 14.1 and 30.3 ± 4.8, 50.3 ± 14.0 and 30.4 ± 4.7, and 50.2 ± 13.8 and 30.2 ± 4.6, for the 25 × 25, 100 × 100, 250 × 250 and 500 × 500 data sets, respectively. These results (Figure 6) indicated that, excluding the 25 × 25 simulation, the use of only 100 points out of each of only 100 distributions of D_120°C and z (based on 100 random data sets of D_T values of the primary model) is sufficient to approximate the realistic variation of the kinetic parameters of the secondary model. However, since computing power is nowadays readily available at little cost, the application of at least 250 × 250 trials is preferable, to attain more accurate predictions.
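The sensitivity analysis is then a loop over the trial count n. The sketch below wraps the two-level scheme of the earlier sketches into a function (for brevity, the first level samples D_T directly from the continuous normal distributions rather than from the discretized bins); DT_stats is the dictionary from the sampling sketch.

```python
# Re-run the two-level Monte Carlo with n x n trials for several n and
# compare the resulting means and ~95% CI of D_Tref and z.
import numpy as np
from scipy import stats

def nested_monte_carlo(n, DT_stats, T_ref=120.0, seed=3):
    rng = np.random.default_rng(seed)
    T = np.array(sorted(DT_stats))
    x = T - T_ref
    z_pool, DTref_pool = [], []
    for _ in range(n):                       # first level: n random D_T sets
        logDT = np.log10([rng.normal(*DT_stats[Ti]) for Ti in T])
        res = stats.linregress(x, logDT)
        z_pool.extend(-1.0 / rng.normal(res.slope, res.stderr, n))
        DTref_pool.extend(10.0 ** rng.normal(res.intercept, res.intercept_stderr, n))
    def mean_ci(pool):
        return np.mean(pool), 1.96 * np.std(pool, ddof=1)
    return mean_ci(DTref_pool), mean_ci(z_pool)

for n in (25, 100, 250, 500):
    (Dm, Dci), (zm, zci) = nested_monte_carlo(n, DT_stats)
    print(f"{n} x {n}: D_Tref = {Dm:.1f} +/- {Dci:.1f} min, "
          f"z = {zm:.1f} +/- {zci:.1f} °C")
```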

Conclusions
An approach that takes into account the uncertainty of the primary model rate parameters of thermal inactivation (D_T values) and integrates it into the variability of the secondary model parameters (D_Tref and z) was implemented in a FORTRAN code through a Monte Carlo scheme. When the D_T confidence intervals were broad, the variability of the secondary model parameters was appreciably high and could not be adequately estimated through a traditional deterministic approach. In such cases, the proposed methodology was essential for a realistic estimation of the uncertainty of the secondary model parameters. For the example presented, the 95% CI of the mean D_Tref and z values estimated with the proposed procedure were ±13.9 min and ±4.6 °C, compared to ±7.3 min and ±2.3 °C, respectively, calculated with the traditional two-step linear regression approach based on the mean D_T values. A sensitivity analysis of the Monte Carlo trials indicated that a data set of at least 250 × 250 is recommended to ensure sufficient accuracy of the CI predictions. It should be noted that, although the preceding analysis referred to first-order inactivation data and the Bigelow model for the temperature dependence of the primary model kinetic parameters, the same procedure can be employed when a different primary or secondary model is used to describe the kinetic data.