Existing Empirical Kinetic Models in Biochemical Methane Potential (BMP) Testing, Their Selection and Numerical Solution

Biochemical Methane Potential (BMP) tests are a crucial part of feasibility studies to estimate energy recovery opportunities from organic wastes and wastewater. Despite the large number of publications dedicated to BMP testing and numerous attempts to standardize procedures, there is no “one size fits all” mathematical model to describe biomethane formation kinetic precisely. Importantly, the kinetics models are utilized for treatability estimation and modeling processes for the purpose of scale-up. A numerical computation approach is a widely used method to determine model coefficients, as a replacement for the previously used linearization approach. However, it requires more information for each model and some range of coefficients to iterate through. This study considers existing empirical models used to describe biomethane formation process in BMP testing, clarifies model nomenclature, presents equations usable for numerical computation of kinetic parameters as piece-wise defined functions, defines the limits for model coefficients, and collects and analyzes criteria to evaluate and compare model goodness of fit.


Introduction
Biochemical Methane Potential (BMP) tests are used to determine the anaerobic degradability of waste [1] or the possibility to recover energy from wastewater as a combustible gaseous mixture containing methane, typically referred to as biogas [2]. Both goals require the modeling and parametrization of the process for further scaling up or transition from a batch to a continuous flow process [3]. Models can be useful to determine the duration of the inoculum adaptation period, to estimate maximum biomethane yield and conversion rate of the substrate of interest into energy carrier, a biomethane [4].
Empirical kinetic models used in BMP tests vary by origin, including, but not limited to enzymatic (e.g., Monod [2,5], Michaelis-Menten [6], etc.) or chemical (e.g., first-order [7] or variable time dependency, etc.) kinetics, statistical distribution (e.g., Weilbull [8], Cauchy [8], Gaussian [9]) and microbial growth models (e.g., Gompertz [10], Logistic, etc.). Mathematical models usually refer to canonical algebraic form of equations with dimensionless coefficients or are based on enzymatic equations describing the dependence of rate of chemical reactions based on substrate concentration. Studies have attempted to re-express and bind some of the coefficients to physically meaningful values and utilize the experiment duration time as a variable instead [8,10]. More recent studies utilize modified versions of equations for batch testing, but there is no standard recommended equation to describe and fit all experimental data, and various models fit various experiments better [2,[11][12][13]. Some authors [14] suggest that results of properly performed BMP test should be well described by the 1.
Start with arbitrary, but reasonable values of all coefficients of an expected equation. It is an array, called a "seed", of all coefficients like V ∞ (biomethane potential), k (rate constant) and others if any existed in a model, represented as vector ( 2.
Calculate for each data point: a.
The theoretical estimation of the y value (either accumulated biomethane yield or production rate):ŷ b.
The value of error function based on experimental values (y i ) and theoretically estimated values (ŷ i ). Please note that this is not a Gauss error function: 3. Sum the error values across all measured datapoints, as an accumulated error which is also referred as an objective function: 4.
Based on the accumulated error value E, change the initial set of parameters to minimize E.

5.
Repeat steps 2 to 4 until E stops decreasing or decrease is negligible.
This set of instructions is a simplified description of the approach that is implemented in the majority of libraries for numerical and scientific calculations for either programing languages or data processing software. The details of implementation, as well as error function, depend on the specific utilized software or algorithm. However, the implementation itself can be optimized and deserve a special consideration as discussed in [15]. The Matlab (Mathworks Inc., Natick, MA, USA) software was previously used for fitting various BMP models to experimental data [2,[16][17][18][19], often using a squared difference of experimental and estimated values as an error function where summation resulted in Residual Sum of Squares error (RSS). For example, if we suspect the first-order kinetics to describe the process, applying these instructions result to: The main disadvantages of the numerical approach, based on the algorithm described above, are: • An indefinite number of iterations before achieving the solution; • It results in an approximate solution, rather than an analytically precise solution; • The precision of calculation improves with higher number of observations; • The calculated result may be dependent on utilized error function.
The substantial advantages of the numerical approach are the ability to operate simultaneously with a large number (set) of variables, but in order to find solutions several questions must be addressed, including: • The exact algebraic form (shape) of equation; • The range of variables, where models are mathematically possible (reasonable).
In addition, the issue of model selection among multiple candidates must be addressed: what criteria, what value of criteria is favorable, etc. Each of the above-mentioned questions is addressed in this analysis of the models applicable for BMP testing. All the models identified here were found in the literature to be applicable for BMP analysis. The objectives of present paper are collecting existing mathematical models for biomethane formation, adjusting them for the purpose of numerical computation, showing the process of such an adjustment, as well as the reasoning for it. This study will be useful for researchers who intend to implement the fitting of model(s) to experimental data from scratch, using general-purpose programming or scripting language, and fine tune the process of numerical computation.

Representation of Existing Empirical Biomethane Models for Numerical Computation
Regardless of the equation's origin and form, the biomethane formation model must be adjusted prior to use. Adjusting existing models to numerical solutions consist of several steps: (1) define the equation (or system of equations) to describe the model; (2) define the intervals where the model is valid and values for regions where it is not; (3) define the limits, where each model coefficient belongs.
The demonstration of each step using as an example a variable time dependence model, the equation of which is mentioned in [17]: where V t is the total volume of produced biomethane at the moment t of experiment; V ∞ is the ultimate amount of gas can be produced in this experiment, also referred to as the biomethane potential; k is the rate constant of gas production; t is the time since beginning of experiment; and γ is the order of time dependency. This equation assumes the start of the digestion process at the same time the experiment starts, which may happen in batch testing. However, gas formation can also start after a delay, used by microorganisms to acclimate to a substrate. This delay is usually referred to as a lag-phase: t lag . No gas is formed during the lag-phase. The correction to the equation is as follows: The V t must be confined to be equal to zero when t ≤ t lag since γ may be: • An even integer number, which will turn the V t to a negative number, which will have no physical value; • A fractional number, which will result in taking a root from negative number and lead to complex number calculations, which also have no physical value.
The V ∞ can only be greater than zero, as well as the kinetic rate constant k. Thus, the model transforms into a system of equations, which in mathematically appropriate terminology would be called a "piece-wise defined function": The system of Equation (9) represents the shape of the model where coefficients are constrained in ranges: Constraints like Equation (10) define the search area for calculations and are sometimes called "bounds", but mathematically appropriate term is the "domain of a function". We repeated this process of model adjustment for each empirical model identified in the literature and in presented study. Table 1 identifies the existing empirical models that are presented in the peer-reviewed publications attempting to describe biomethane accumulation over time. Equations defining those models have been transformed in this study into the piece-wise defined functions, and the domain of those functions were determined. Some models identified in different studies share a mathematical form but are referenced under different names. For each of such models with multiple names, we specified only one of the names in column "Model Name", while others are mentioned in the footer of Table 1. Table 1. Interpretation of biomethane formation models for numerical fitting for single-step digestion.

Model Name Model Equations Bounds Reference
The first-order kinetics model is also met under other names: Exponential, Malthus, Monomolecular, Transference function, Transfer function, Reaction-curve type, Simple Mitscherlich equation. Very often this model is used with no lag-phase, i.e., when t lag = 0. 2 The specific time model can be also found under name of first-order inverse time model, 3 also called a first-order kinetics with modified time dependency. 4 Cauchy distribution can be met under name of Lorentzian function. 5 Fitzhugh sometimes is used as simplified version, where n > 0, actually such modification is Bertalanffy model. 6 Autocatalytic function is considered to be a predecessor of logistic function. 7 Logistics function is sometimes considered as a special case of Richards functions, where d = 1. 8 The Gompertz equation for biomethane is also called as Zwietering modification of Gompertz equation, Zwietering-Gompertz equation or modified Gompertz equation.
Some clarification should be provided for parameter k, which can be found in various biomethane models. Traditionally the parameter k is referred to as the reaction rate constant or hydrolysis constant if hydrolysis is assumed to be a rate limiting reaction. Less often, k is re-expressed as ratio of maximum production rate (in units of amount of substance per units of time) to the gas potential (in amount units), resulting in: This representation is typical for the first-order kinetics to "convert" it to transference function, meanwhile in sigmoidal functions, a similar kinetic rate coefficient has a different calculation behind it [8,10]: For the scope of this study, such modifications were not considered as a separate model since they do not affect the calculation process but add an extra conversion step with obtained parameters.
Additional attention must be focused on the term "lag-phase". Models derived from microbial growth use the term lag-phase as a biologically similar value, which would be the time from the beginning of experiment needed by microbial consortia to adapt to the new substrate and achieve the exponential growth rate [39,40]. Thus, the gas formation is observed even before achieving t lag . Other models, derived from enzyme or chemical kinetics, use a value of lag-phase as time when the actual gas formations start, but no gas formed prior to t lag . This issue was explicitly explained in [8]. Mathematically, it results in various artefacts such as obtaining negative values in first-order kinetics at timepoints before the lag-phase or saddle-shaped gas production curve for the Quadratic Monod model. In both cases, the gas values for timepoints before t lag should be replaced with a zero.
Special mention should also be made of all models used with the powering of exponents, such as Gompertz, Logistics, etc. Based on their algebraic forms, those models cannot be equal to zero, which is irrational for a methane formation at the beginning of a BMP test. The reason for this is these models were originally used to describe the growth of biomass based on initial seeding amount, but not biomethane formation [10]. Thus, some authors [41] suggest a correction to formulas, introducing the bias for zero timepoint at beginning of experiment: However, the same authors states that the modified Equation (13) requires extra recalculation step for actual biomethane potential value as: The majority of models identified in literature follows representation suggested by [8], where gas yield at moment t is expressed as a fraction maximum gas yield as time dependent correction: This "fractional" representation is reasonable since the gas potential V ∞ is a parameter of interest, but not all models are possible to be expressed that way. One of such models is the Levi-Minzi model, originated from the composting and mineralization process [42,43], but is used in some studies for biomethane kinetics [25]. It cannot be used to determine the biomethane potential, since the equation is: Whereas biomethane potential should be equal to: However, under conditions, where k > 0 and m ∈ (0, 1): Since the BMP value in this case is undefined in numerical value, this model should be definitely avoided in modeling of BMP.

Optimization of Calculation Process
While the calculation of maximum biomethane potential yield and the determination of its formation kinetics are the goals of a BMP test itself, the optimization of calculation process to make it faster is not addressed in the BMP-related literature. Since the numerical approach is basically a continuously repeated calculation until achieving a certain precision, optimization can be performed by decreasing the amount of computation work to compute the solution, which also reduces the required analysis time. Thus, the optimization points are: • Minimize number of arithmetic operations in formulae; • Narrow the search area (minimize the intervals of equation coefficient existence).
While the minimization of the number of arithmetical operations would not be an incredible improvement to the majority of modern computers, it may assist in various embedded systems such as data loggers (like AMPTS II, OxitTop, YieldMaster, Nautilus BMP, etc.) to add the feature of gas kinetics prediction with the early estimation of gas potential, utilizing the approach from [17,26]. Such gas counting solutions vary in implementation. For example, Anaero Technology (Cambridge, UK) uses Arduino MEGA2560 (Arduino LLC, Boston, MA, USA) development boards in the core of their Gasflowmeter product to count gas [44]. ATmega2560 (Microchip Technology Inc., Chandler, Arizona, USA), as the core of Arduino MEGA2560, is an 8-bit microcontroller designed for fast data capturing, but not for numerical computation. Other manufacturers do not share the hardware specification of biomethane logging systems; however, the purpose of such system implies a fast data collection and sensor interfacing rather than performing multiple mathematical calculations. The AMPTS II (Bioprocess Control AG, Lund, Sweden) uses some ARM CPU [45] and function under control of Linux-based operating system for data acquisition and runs the web-server for ease of data access, but no particular model for kinetic computation is stated [46]. Logically, decreasing the number of computational steps for kinetics computation could allow to add this feature to already existing solutions as part of a firmware.

Minimization the Number of Arithmetic Operations
The minimization of number of arithmetic operations can be illustrated with first-order kinetics equation, which can be re-expressed in three mathematically equal shapes: Equations (19) and (21) return the result in six arithmetic steps, while Equation (20) returns the result in five. When the same equation is needed to be calculated several hundred or thousand times per one datapoint, it becomes a significant shortcut. Moving towards sigmoidal functions, the difference in arithmetic operations number between canonical algebraic form and modified form for biological meanings increases. Zwietering et al. [10] re-expressed the Gompertz equation as Equation (22), deriving it from its algebraic form as Equation (23) with a set of substitutions (24): Using Equation (22) provides the result in 11 arithmetic operations, while Equation (23) uses only seven operations. Thus, from a numerical computation perspective it is more efficient to use an algebraic form to determine coefficients and later recalculate abstract parameters a, b and c into chemically or physically meaningful parameters using a system of Equation (24).
Here, we consider it necessary to make extra emphasis that the number of mathematic operations could be the issue for embedded systems, which are limited in computational capabilities. For desktop systems, this optimization may not be issue, due to higher computational performance in general.

Narrowing the Search Area
The search area is a combination of existence intervals for each of the coefficients in an equation. Previously, we called these bounds or domain of function; however, the domain of function means the range of variables where the function can be calculated. Some coefficients can be "limited" from only one side, such as greater than zero, etc., but having a limit from another side will decrease the number of options for the coefficient value. Thus, it will decrease the number of steps before a solution is achieved. In addition, for BMP testing, we want to narrow such intervals to ranges where they have physical meaning.
Narrowing the intervals where each of the equation coefficients may belong may be conditionally divided in two categories: • Those based on the BMP experimental set-up; • Those based on the re-parametrization of model.
Based on the BMP experimental set-up, we can limit the search values of biomethane potential V ∞ and lag-phase t lag . Thus, t lag cannot be longer than the duration of the test: t lag ∈ 0, t experiment . The V ∞ can be expressed as yield (either volumetric or molar) per amount of loaded Chemical Oxygen Demand (COD) or Volatile Solids (VS). [20,47], which would be called as Specific Methane Production (SMP) curve [14]. In such cases, we can estimate upper limit of biomethane yield per loaded COD as: (25) where V limit is the maximum possible amount of gas yielded per gram of COD; R is the universal gas constant, in international unit system equals to 8.3145 J K·mole [48]; T is the absolute temperature in Kelvins (K); p is the absolute pressure in Pascals (Pa); COD gas−eq is the COD equivalent of 1 mole of biomethane as grams of COD per mole of methane; and 10 6 is a coefficient to convert resulting value from cubic meters to milliliters. gCOD for biomethane yield. For biodegradability tests or if an experiment involves the digestion of a sample with a known chemical composition, the approximate total biomethane yield can be roughly estimated according to stoichiometric equations [49][50][51]: The biomethane potential calculated according to Equation (26) is called theoretical methane potential and, alternatively, can be calculated based on organic fraction composition [52]. For instance, the theoretical biomethane potential for acetic acid or glucose is~370 ml CH 4 g , for casein~420 ml CH 4 g , etc.
However, such values do not account for the anaerobic degradability of substrate consumption for biomass production, and the real biomethane yield should be lower. The narrowing of coefficient existence intervals based on model re-parametrization could be the issue for models describing the digestion of mixture of compounds with different kinetic rates (rapid and slow) or even different kinetic models. Such situations are more relevant to real-world samples, such as animal manure with grass bedding, etc. The model re-parametrization approach can be illustrated by the following equation, assuming two first-order kinetics reactions without a lag-phase: The domain of function represented by Equation (27) is defined as: With some rearrangement expressing the total gas yield in experiment as V ∞ and a fraction of rapidly decomposing compounds via x as in Equations (29) and (30), we can obtain an equation which is mathematically equivalent to the original (27), but with a narrower existence range: After rearrangement, Equation (27) transforms into (32) with the existence domain of (33).
This will significantly narrow the search area, and with some experiment-based limitation it can be narrowed even further. Some models, already treated in similar para-representation way, are identified in studies by other authors and represented in Table 2.
However, the adaptation of multi-kinetics models to numerical solutions may become more complicated, since the same basic concept may be re-expressed differently based on assumptions of either two-step reaction, diauxic growth or co-metabolization.
A simplified case of single substrate two-step kinetics is the situation when some substrate (cellulose or starch, for example) are converted into methane though the formation of intermediate compounds such as volatile fatty acids (acetic acid, for example) through either hydrolysis or acidogenesis, and they are the substrate for actual methane formation: Such partial cases are described in the literature [16,57] where both steps are assumed to be running both at first-order kinetics, and are also included into Table 2 as first-order two-step reaction. Its adaptation for numerical solution will consist of introducing zero gas yield at timepoints before the lag-phase only. Table 2. Biomethane formation models for multi-step digestion.

Model Name Model Equation References
First-zero-order kinetic model [25,53] First-first-order kinetic model 2 [2,16,21,25,54] Modified Gompertz [6,21,25,55,56] First-order two-step reaction [16,57] First-order two substrate two -step reaction The first-zero-order kinetic model is also met under the names combined first-and zero-order kinetics; 2 The first-first-order kinetic model is also met under names: combination of two first-order kinetics, two-pool first-order and pseudo-parallel first-order.
Unlike two-step digestion, the diauxic growth is a phenomenon which describes possible consumption of two substrates when consortia first utilize one substrate and switches to the second one only after complete depletion of the first one [58,59]. For similar cases, Equation (32) can be re-expressed for numerical computation as: Another case could be a "co-metabolization", a phenomenon when two or more substrates are consumed simultaneously, but with different consumption rates: Each particular case of kinetics combination can have its own representation for numerical computation. Used software may cause extra complications in setting the bounds, due to their own implementation of search algorithms. Not all software or programming languages will support the range t Rlag < t < t Slag as bound for search, since there is extra need to "hardcode" that relationship or use dynamic bounds. Another approach would be splitting (re-expression) the lag-phase for later t Slag decomposed compounds as an arithmetic sum of lag-phase for rapid decomposing compounds t Rlag plus some incremental value which is equal to a delay between fast and slow decomposing compounds. Such system of equation can be expressed as: This system will be more universal across available data processing software; however, it requires extra recalculation to a set of coefficients from Equation (35):

Criteria to Compare Models "inter se"
The numerical approach simplifies the determination of equation coefficients but does not answer which model is the true one or which one is more trustable. Historically, the most often used way to describe how good a certain model describes the experimental data was the calculation of the coefficient of determination [17], also known as R 2 value, which is calculated as: Worth noticing is that the term "coefficient of determination" is applicable for linear regression [60], while for nonlinear models an applicable parameter would be "model efficiency coefficient" [61] with the same calculation formula. Despite this, a large number of BMP-related studies are still using the term "coefficient of determination".
An R 2 value that is closer to 1 implies a better model fit, but this parameter, in fact, compares the disturbances (scatter) of datapoints to differences of theory to actual measurements. Along with R 2 , one or more parameters are used to describe the actual difference between modeled data and experimental data. Some of parameters are collected in Table 3. Any of these should be minimized for better fit. The majority of implementations of numerical approaches utilize the RSS as the fitness function. Table 3. Fitness criteria used in various studies for BMP testing.

Criterion Calculation Used in
Residual Sum of Squares (RSS) |yi−ŷi| y i [17,64] Mean Square Percentage error (MSPE) [18] In the authors' opinion, the proper error function for numerical computation specifically in BMP testing should pay attention to how far the measurement is from the beginning of experiment. Such an approach would increase the price of error for later measurements and reinforcing the determination of plateau. However, the authors did not find any error functions for such correction in the literature for BMP data analysis, and therefore that function may be appropriate for future analysis.
In cases where it is not clear whether data is noisy or there is an overlap of several processes running simultaneously, the problem is still solvable numerically; however, solving it numerically also adds complexity, since it allows for splitting one equation into several smaller equations. Noisy data can falsely imply multiple process running simultaneously to split the model of one kinetics into several sub-kinetics. To prevent a false trail, criteria are needed which penalize excessive parametrization or "overfitting" and complexity. Some studies [12,65] use either a modification of R 2 called adjusted-R 2 and marked as R 2 or R 2 adj : where N is the number of datapoints (number of observations); RSS is the Residual Sum of Squares; and M is the number of parameters fit by the model. However, the primary tools purposed to avoid overfitting are the information criteria [47,53,62] collected in Table 4. The most typically used criteria are the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), which is less often referred to as the Schwarz Information Criterion. A lower value of a used criterion implies a better fit of the model to the experimental data. A special remark is required regarding the Akaike Information Criterion. It exists in several variations, which depends on sample size (number of datapoints). Table 4. Information criteria used in biomethane studies.

Conclusions
Based on this critical review, the major conclusion is that, despite the large number of publications dedicated to BMP testing and numerous attempts to standardize procedures, there is no "one size fits all" mathematical model to describe biomethane formation kinetics precisely. This conclusion is supported by the compiled and critical review of the existing empirical biomethane formation models found in peer-reviewed publications cited and the parameters used to estimate the fitness of model to experimental data.
The collected models addressed in this review were used in various studies referenced for BMP testing; however, the usefulness and practicality of such models should be reconsidered and verified on a wide variety of experimental curves. We questioned the viability of one model in this study, but insightful analysis of each model deserves a separate study. The criteria for selecting the most congruent model among several appropriate models, as well as the criteria for model fitness, present further concerns due to similarity of typical error functions for model fitness estimation. While information criteria intended to prevent the over-complication of calculation do exist, the usability of those criteria specifically in BMP experiments has not been addressed in existing BMP studies.
Since a single general model cannot be established for the best extraction of practically useful information from BMP testing, the analysis of each issue identified above should be undertaken. Such an investigation would require the collection and analysis of data for multiple experiments from different laboratories. The authors suggest that this review will lay the groundwork for such an investigation.
Author Contributions: Y.P. conceived the study, collected and analyzed models, wrote the manuscript under supervision and support from R.C.S. and C.D.M. All authors have read and agreed to the published version of the manuscript.  Specific initial growth rate on rapidly decomposing compounds