Not Just Numbers: Mathematical Modelling and its Contribution to Anaerobic Digestion Processes

: Mathematical modelling of bioprocesses has a long and notable history, with eminent contributions from ﬁelds including microbiology, ecology, biophysics, chemistry, statistics, control theory and mathematical theory. This richness of ideas and breadth of concepts provide great motivation for inquisitive engineers and intrepid scientists to try their hand at modelling, and this collaboration of disciplines has also delivered signiﬁcant milestones in the quality and application of models for both theoretical and practical interrogation of engineered biological systems. The focus of this review is the anaerobic digestion process, which, as a technology that has come in and out of fashion, still remains a fundamental process for addressing the global climate emergency. Whether with conventional anaerobic digestion systems, bioreﬁneries, or other anaerobic technologies, mathematical models are important tools that are used to design, monitor, control and optimise the process. Both highly structured, mechanistic models and data-driven approaches have been used extensively over half a decade, but recent advances in computational capacity, scientiﬁc understanding and diversity and quality of process data, presents an opportunity for the development of new modelling paradigms, augmentation of existing methods, or even incorporation of tools from other disciplines, to ensure that anaerobic digestion research can remain resilient and relevant in the face of emerging and future challenges.


Introduction
Mathematical modelling has an important and continued role to play in the design and operation of engineered biological systems (EBS), where the inherently complex interaction between biotic and abiotic components may result in unpredictable and undesirable process behaviour requiring operator intervention or advanced control architecture. Over the last four decades (circa 1980 onwards), much focus has been on the development and extension of classical models based on first-principles theory and engineering knowledge. Notably, the work of several research groups coalesced around a unified mechanistic model with the aim to represent, i.e. simulate, the dynamics of the activated sludge treatment process [1,2]. This reference model, the Activated Sludge Model No. 1 (ASM1) [3], initiated a highly productive period in wastewater engineering, where mathematical modelling became a distinct and important field within the bioprocess engineering community. Iterations of ASM1 to include phosphate accumulating organisms (ASM2 [4]) and subsequent consideration of their denitrifying ability (ASM2d [5]), facilitated a greater degree of granularity in the modelling of wastewater treatment processes. However, the increased model complexity due to additional requirements on parameterisation and model calibration highlighted the difficulty in developing rigorous mathematical models that are a trade-off between complexity and accuracy. Nevertheless, the framework of these models has remained in popular use, through refinement and augmentation, as the standard modelling tool used by wastewater engineering practitioners and researchers.
Given the relative ubiquity of the ASMs and a reawakening of anaerobic digestion research during the late 1990s, it was inevitable that engineers and scientists would seek to develop a similar modelling 3 of 31 background and, as such, we limit the use of mathematical equations to elementary examples that are qualified in the text. More thorough reviews of the mathematical derivations of the methods presented here are cited in the text.

A first-principles, engineering approach to modelling anaerobic digestion
The history of anaerobic digestion modelling from an engineering perspective, that is a mathematical description of the microbial dynamics and biochemical processes resulting in sequential biodegradation of complex organic material into biogas, begins over 50 years ago.
The pioneering work of John F. Andrews, an environmental engineer at Clemson University, South Carolina, formed the touchstone for incremental development of practical mathematical models that could be understood and used by both researchers and practitioners to augment their experimental studies. Andrews used the structural elements of Ordinary Differential Equations (ODEs), which had facilitated the dynamical study of microbiology using mathematical constructs by researchers such as A.J. Lotka, V. Volterra, J.B.S. Haldane, J. Monod, N. Rashevsky, the forefathers of a discipline now commonly referred to as mathematical biology [24]. It is interesting to note that these key players came from broad scientific disciplines; Haldane was a geneticist, Monod a biochemist, Volterra a mathematician, Lotka a biophysicist, and Rashevsky a theoretical physicist. 4

of 31
The early work by Andrews built on the empirical and theoretic research of these pioneers to derive mathematical expressions that were founded on first-principles chemistry encapsulated by the Michaelis-Menten enzymatic reaction rate equation [25] whereṖ is the rate of product formation, V max is the maximum achievable rate of the chemical system, K M is a constant representing the substrate concentration, [S], at a reaction rate half the V max . Through the early work of Lotka and Volterra, describing the dynamics of biological systems as a pair of nonlinear ODEs (i.e. the predator-prey equations) (see, for example [26]), and the empirical rationalisation of the Michaelis-Menten equation by Jacques Monod to derive an equation for microbial growth on a rate-limiting nutrient [27], the platform for mathematical modelling of bioengineering systems was established [28]. Andrews synthesised this structural and empirical knowledge to describe microorganisms growing on substrates that are self-inhibitory above a threshold concentration (e.g. those found in industrial wastewater treatment such as ammonia, phenols, and volatile fatty acids). Andrews coupled the Haldane function describing enzymatic inhibition with the Monod equation for microbial growth to develop Lotka-Volterra type nonlinear ODEs for simulating the continuous-culture dynamics of microorganisms under rate-limiting inhibition [29] dX 1 where X i , i ∈ [0, 1] are the microbial biomass concentrations in the reactor influent and effluent, respectively, S i , i ∈ [0, 1] are the substrate concentrations in the reactor influent and effluent,μ is the maximum specific growth rate, θ is the mean residence time in the reactor (NB 1/θ is commonly denoted as D, the dilution rate, in current modelling practice), K S is the half-saturation constant, K i is the inhibition constant, and Y is the yield coefficent of biomass growing on the substrate (for units, see original article [29]). What is notable about this study is that it is one of the first examples of the use of digital computing to obtain solutions to a system of equations describing microbial dynamics. A digital analog simulation program, PACTOLUS, ran the model on an IBM360 computer, employing a circuit-like block-diagram approach more commonly associated with electronic engineering. The importance of Andrews' seminal work in the late 1960s and early 1970s cannot be understated for coupling microbiological empiricism to engineering practice through the use of mathematical tenets. Integral to this was the understanding that process stability, and ultimately failure (e.g. through organic overloading, product inhibition), could be predicted by appropriate mechanistic models supported by experimental evidence. Andrews published the first mathematical model of the anaerobic digestion process in 1969 to address the increasing reports of digester failures [30]. Anaerobic digestion (AD) as an industrial process became prominent in Europe in response to energy shortages during World War II, and in the United States, anaerobic municipal solid waste treatment had existed as a technology from around the same period. Given the complexity of feedstocks that were being used in AD, and the relative immaturity in process engineering, the high prevalence of reactors experiencing process instability and failing would be significant. It was, therefore, serendipitous that Andrews initiated a first epoch in AD process modelling to understand stability just a few years before the 1973 OPEC oil embargo precipitated an oil crises that directly led to a drive towards alternative energy supplies, with AD at the forefront.
The dynamic modelling approach was extended further to include a method to incorporate changes in pH through the introduction of phase (gas, liquid, biological) interactions and alkaline buffering capacity (i.e. CO 2 -bicarbonate system) [31]. The time-dependent, predictive characteristics 5 of 31 of this extended model provided the ability to obtain a semi-quantitative assessment of the model uncertainty, that is the difference between observed (experimental) data and model prediction for the model variables of interest. Noting that model uncertainty comprises parametric uncertainty and structural uncertainty, and both model inputs and experimental observations are subject to measurement uncertainty, then it is generally accepted that the model predictions are not a perfect representation of reality [32]. In turn, this acknowledgment of uncertainty laid the foundation for model-based process control of bioengineering systems such as the activated sludge process and anaerobic digestion [32][33][34][35][36]. This was a momentous leap-forward in the early days of programmatic computing, especially as industry and academia were starting to coalesce around methods for incorporation of Instrumentation, Control and Automation (ICA) technology for the improvement of biological treatment processes [37,38].
Over subsequent years, other research groups began to build on the work of Andrews and further extend AD models to include additional stages of the process beyond acetoclastic methanogensis, the focus of the earlier models [30,31]. Fundamentally, these later models adopt the same mass-balanced, reaction kinetics approach as Andrews, adding more processes and components to describe an increasingly more complex, but also more complete conventional anaerobic digestion process. Hill and Barth included heterotrophic acidogens and the obligate methanogens previously studied by Andrews, and a function describing methanogen inhibition by VFA and ammonia [39]. As with previous models, the authors considered only the acetate metabolic pathway based on existing empirical knowledge at the time. Notation in this model differs minimally from that provided by Andrews in Eqs. (2,3), with the dilution term (1/θ) now given as the flow rate over the reactor volume (F/V) and the differential terms on the right-hand side stated implicitly.
A thorough mathematical description of the of the gas-liquid transfer of carbonate and ammonia, and the algebraic equations describing charge balancing to calculate temperature-corrected pH, is reported for the first time in this work. This detail invariably led to greater model complexity and, at the time, reliance on high specification computers and bespoke programming languages (Continuous Modelling Simulation Program, CMSP, op. cit. Hill & Barth). While the anaerobic digestion process was understood to be microbially diverse, process or population scale modelling could be simplified due to the highly structured and often compartmentalised nature of bioreactor systems. This structuring, based on agglomeration of microbial ecological functions, i.e. step-wise mass-balanced and stoichiometrically rational biochemical reactions performed by mixed, but functionally distinct microorganisms, provided a firm basis from which mathematical modelling of biologically engineered systems became established in the following two decades. Table 1 indicates a selection of the key mathematical models that provided necessary iterations towards a consensus dynamic model for anaerobic digestion, manifest as the Activated Sludge Model No. 1 (ASM1). For brevity, the list is by no means exhaustive and does not aim to present any detailed analysis of the models, which may instead be found in comprehensive reviews elsewhere [40][41][42].
Modelling efforts were bolstered by research conducted in empirical microbiology, which contributed to providing better characterisation (i.e. parameterisation) of anaerobic growth kinetics and understanding of the metabolic and biochemistry of the principal reaction pathways [43][44][45][46]. Experimental work provided the data and knowledge to determine the structure and stoichiometric properties of the anaerobic digestion process that, in turn, allowed for an emerging formalism in parallel model development. An important body of work that motivated a structural approach to AD modelling was Gujer and Zehnder's extensive study of anaerobic digestion conversion processes [44], which built on an earlier proposed structure describing overall substrate flux from particulate organic matter to methane and carbon dioxide end-products [47]. Their study describes anaerobic digestion as six distinct conversion processes, integrating established knowledge of the acetogenesis and methanogenesis steps with emerging bioprocess kinetics and microbial research on the mechanisms of organic matter hydrolysis and acidogenesis, thus providing a complete description of the canonical AD process as understood at that time. Defining the units of substrate flux in mass of theoretical Chemical Oxygen Table 1. Important AD mathematical modelling milestones towards ADM1. Key (Process): EH = Enzymatic hydrolysis, AG = Acidogenesis, AGp = propionate Acidogenesis, AGb = butyrate Acidogenesis, AGa = acetate Acidogenesis, AcG = Acetogenesis, pAcG = propionate-degrading Acetogenesis, bAcG = butyrate-degrading Acetogenesis, fAcG = fatty acid-degrading Acetogenesis, AMG = Acetoclastic Methanogenesis, HMG = Hydrogenotrophic Methanogenesis; (Industry): A = Agricultural, BW Biological (Activated) Sludge, G General AD, IW Industrial Wastewater; (Program): CSMP = Continuous Systems Modeling Program, NR = Dynamic solver/program not reported. NB Number of parameters include conventional operating/control parameters such as dilution rate and influent concentrations. * Parameter information not available. Demand (tCOD) as opposed to a molar basis, was a consideration replicated from activated sludge modelling, in its conventional use for measurement of wastewater characteristics and given its implicit function for calculating carbon oxidation states and, thus mass balancing of biochemical reactions [6,48]. NB Typically, for anaerobic modelling, concentrations of organic compounds are given in units kgCOD m −3 , and inorganic carbon and nitrogen in kmole m −3 , whereas ASMs generally use mg l −1 .

Model
The case for using molar units, however, was argued on the basis that the inclusion of inorganic carbon (CO 2 ) to close the mass-balance for anaerobic oxidation of VFAs invalidates the stoichiometric carbon balance of the biochemical reactions [49]. The structured approach was further formalised by J.D. Bryers [53] using simple matrix algebra to derive expressions for the biological reactions in the form of a stoichiometric matrix of (m × n) molar coefficients (ν i,j , for i = 1, . . . m reactions and j = 1, . . . n components [reactants, biomass, gases]) to ensure a completely mass-balanced system of process equations. In considering a unit basis of mass COD, normalised yield coefficients defining S j consumed (−) or formed (+) in the i th reaction are given by where M j and m j are the molecular weight and COD mass equivalence of component S j , respectively, and lim indicates the limiting reactant in the i th reaction. Now, using matrix algebra, we have the following condition under the principle of mass conservation where Y is an m × n matrix of yield coefficients and S is a vector of length n of components. The use of a structural matrix for deriving a system of equations representing biochemical reactions was first proposed by E.E. Petersen [59] and employed in the development of a mathematical model of activated sludge systems by the IAWPRC (now International Water Association, IWA) Task Group on Mathematical Modelling for Design and Operation of Biological Wastewater Treatment [3]. The Gujer matrix, as schematically summarised in Fig. 2, is partitioned into three sections comprising a matrix of stoichiometric coefficients and a process rate vector, as described above, and a composition matrix that includes conservative constituents of the state variables, e.g. biomass contains both COD and Nitrogen, which ensure proper mass and charge balancing as model resolution increases, i.e. the product of each row in Υ and each row in the composition matrix will be zero.
The conversion rates for each state variable can be simply calculated by the column-wise summation of the product of stoichiometric terms and process rateṡ where Λ is the matrix of m × n stoichiometric terms and ρ is a vector of n process specific kinetic rates, which take the form where µ j,lim is the substrate-limited growth rate of organism X, defined by some growth function (e.g., Monod saturating kinetics, Contois density dependent growth, etc), cf. the final term in Eq. 3. NB Hydrolysis in these early models are typically included as first-order reaction rates (i.e. r hyd = k hyd S).
The notationṘ ≡ dR j dt , for j = 1, . . . , n, is used to indicate that the resultant vector of rates are time-dependent and equivalent to a system of ODEs, with additional terms to mass balance the reactor and describe any liquid-gas transfer, which can be integrated to give transient dynamics of the process with respect to each state-variable.

Figure 2.
A general schematic of the Gujer/Petersen matrix. We use ν i,j to represent stoichiometric coefficients, ρ j for process rates and indicate the two commonly used constituents in biological treatment models, Chemical Oxygen Demand (COD i ) and Nitrogen (N i ).
This highly structured matrix formulation for constructing biochemical reaction models was adopted as standard within the engineering community, leading to a general framework for modelling the activated sludge process, specifically related to nitrogen removal (ASM1) [3]). It was particularly open to integration in simulation software, especially programs with superior abilities for rapidly solving matrix linear algebra problems, such as Matlab and FORTRAN. Naturally, this led to a number of bespoke simulation softwares to provide engineers with the tools to rapidly develop and interrogate the models through process simulation, often in a modular way (e.g., Biowin, STOAT, GPS-X, SIMBA, AQUASIM) [1]. Although the Anaerobic Digestion Model No. 1 was published over ten years after the first ASM, work on developing a general anaerobic digestion model employing a similar structural and notational form emerged earlier. Siegrist et al. used a Gujer matrix of 14 processes and 18 state variables to model the digestion of mesophilic sewage sludge [57] and Christ et al. used this approach with a better description of the hydrolysis constants for the digestion of municipal organic residues [60]. Other groups incorporated greater degrees of resolution in existing deterministic models of the process, for example by separation of the complex organic substrates (carbohydrates and proteins) for hydrolysis [56,61,62].
The Anaerobic Digestion Model No. 1 (ADM1), developed by the IWA Task Group for Mathematical Modelling of Anaerobic Digestion Processes [6], consolidates much of the preceding theory, empirical knowledge and mathematical structure referenced in this section. The original model represents a comprehensive five-stage anaerobic digestion process (with 19 sub-processes) comprising biochemical reactions (defined through the use of the Gujer matrix) incorporating degradable substrates and soluble/insoluble inerts, and physico-chemical reactions (non-biologically mediated processes such as ion association/disassociation, phase transfer). Biochemical processes are [1] an initial disintegration of composite material (e.g. lysed biomass) to particulates, which undergo [2] enzymatic hydrolysis (forming carbohydrates, proteins and lipids), [3] acidogenesis of simple sugars and amino acids to organic acids, CO 2 and H 2 , [4] acetogenesis of long-chain fatty acids and organic acids (butyrate, valerate, propionate) to acetate and hydrogen, and finally [5] acetoclastic and hydrogenotrophic methanogenesis. The model may be solved as a system of 32 ODEs, where some lumped variables are decoupled (e.g. inorganic carbon S ic ≡ [S CO2 , S HCO3− ]). Biochemical reactions are assumed to follow the 1 st -order rate law for extra-cellular processes and Monod-type substrate uptake for intra-cellular 9 of 31 kinetics. Alternatively, the model can be defined as a system of Differential Algebraic Equations (DAE), with the biochemical reactions integrated explicitly as a system of 26 ODEs (7 groups of bacteria and archaea, 19 catalysing processes), three liquid phase physico-chemical processes, and eight implicit algebraic variables (per reactor). This approach is a means to overcome the inherent 'stiffness' problem due to the fast hydrogen and pH dynamics (now explicitly calculated algebraically), which results in some numerical solvers encountering difficulty in solving the system of ODEs.
As noted earlier, the model is far from perfect but its flexibility and structure means it is suitable as a tool to be used at the design stage of AD plant development, for qualitative scenario testing, or for benchmarking control schemes [63][64][65]. As with the ASMs, modifications to the model have been implemented based on process requirements, such as operating mode [66,67], or to include new biochemical processes [68,69]. A more comprehensive transition to a consensus ADM No. 2 that addresses the main issues with the original model has yet to emerge, perhaps due to the lack of clear consensus or motivation, or that its inherent structure provides ample opportunity for modification of components for bespoke applications. Many of the suggested approaches relate to the lack of identifiability and heavy parameterisation of ADM1, which makes its use for process monitoring and model-based control difficult [70]. It is interesting to note that models that have sought to overcome the structural verbosity, are greatly simplified, relying on fewer state variables and lumped parameters. These models, such as the Bernard model [41,71] and AM2 [72], relate back to the pioneering early work of Andrews and Graef, allowing the process to be identifiable and enabling process control schemes to be designed to improve and optimise process performance.
In the next section, we take a closer look at how reduced modelling approaches and their mathematical analysis are being used to provide further insights into the dynamic and steady-state behaviour of anaerobic digestion.

Reduced Order Models Unlock the Power of Mathematics
It is common practice in modelling to start with a simple description of the system under investigation before building in complexity to achieve a more complete representation of reality. However, it should be noted that greater complexity does not necessarily equate to a higher degree of accuracy [73]. Terminology such as simplicity and complexity are often abstract and subjective, but in the frame of engineering models such as those discussed here, models with a greater number of parameters, state variables (dimension), and processes, or those that are highly non-linear, can be defined as complex. In mathematical sciences, this complexity is often referred to as model order and is the principle driver for the richness of dynamical behaviours captured in microbial community models [74].
The early anaerobic digestion models were low-order and aggregated, developed with the empirical knowledge of the biochemical reactions at that time, and constrained by the available computing resources and numerical solvers for integrating the system of differential equations. The dramatic advances in computational power and efficiency have provided engineers with an ever increasing ability to simulate bioprocesses at greater levels of complexity and speed. However, there are still significant trade-offs between model resolution, accuracy and performance. For example, consider a mathematical model describing biofilm growth in a biological reactor. There are several approaches that can be taken to represent the evolution of this process in time, where the simplest would be to assume a homogeneous, quasi steady-state biofilm and to develop a model as a system of ODEs, for which the solution is trivial (e.g. [75]). The underlying assumption of a completely mixed system is not correct in practice, as biofilms are subject to diffusive mass transfer limitation, requiring consideration of spatial heterogeneity. Describing biofilm processes at higher dimensions, using Partial Differential Equations (PDEs) to solve the diffusion-limited mass transfer of biomass, gas and liquid (i.e. using the second-order Navier-Stokes equations or via the simpler boundary layer approximation), is non-trivial and significantly increases the computational requirements to find solutions [76]. A trade-off between model complexity and empirical uncertainty, that is the extent and accuracy of Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 data to support model parameterisation, is one of the fundamental challenges that limits anaerobic digestion modelling [77].
Model order reduction methods such as decomposition, interpolation, low-order approximation and simplifying physics, are generally applied in engineering when high order models are computationally costly, or simulation become impractical. In EBS, mathematical biologists are often interested in the qualitative behaviour of the system rather than quantitative predictions, and the simpler models often allow for a tractable and complete analysis using techniques re-purposed from dynamical systems theory. It is important to ensure that the simplifying process retains conservation of the properties of the full-order model and has a low approximation error, otherwise analysis of the simple model becomes a study of a largely theoretical system with no further insight into the actual system. Anaerobic digestion is an ideal process for reduced order modelling as it is highly compartmentalised allowing for easy decoupling of system interactions into smaller sub-processes with fewer state variables and parameters. Physico-chemical processes that may operate at significantly different time-scales to the biochemical reactions may also be excluded.
The work of J.F. Andrews and colleagues (see Section 2) laid the early foundations for the application of dynamical analysis to anaerobic systems. They presented not only a template for mathematical modelling and dynamic simulation of the process, but were the first to discuss the opportunities for automatic process control to address system instability or failure caused by acidification or functional inhibition [31,34]. The analysis at this time involved exhaustive and time-consuming numerical simulation (by today's standards) of a model over hundreds of operating points and process conditions. Although ADM1 consolidated state-of-the-art modelling and provided a reliable, if imperfect, representation of the process, others drew on these earlier ideas to address specific challenges such as the effect of reactor start-up conditions on process stability [78] and reactor control under process disturbances [79] or shock loading events [80,81].
A major benefit of having representative and qualitatively reliable reduced models, which are tractable to mathematical interrogation, is their availability for deeper study of the nonlinear dynamics and the implications on system performance. In effect, they represent in silico tools that can complement time-consuming experimental studies, allowing for rapid investigation of local and global system behaviour to aid process or experimental design, derive new knowledge and understanding, and even suggest new operating protocols. It must be reiterated that models cannot completely replace practical experiments, nor prove comprehensively the underlying process mechanisms in all but the most simple of systems [82].
Anaerobic digestion models have been extensively studied by mathematical biologists due to their generally reducible structure, the rich dynamics emerging from relatively few biotic interactions, and the modularity of components facilitating an array of hypothesis driven studies. Mathematical analysis provides an ability to study the asymptotic behaviour of dynamical systems, such as AD. That is, we are able to observe whether the process tends to some steady-state (stable equilibrium) or undergo unforced cycling (functional periodicity), for a given set of parameters and initial conditions of the process [83].
A number of mathematical analyses of existing AD models appeared in the 2000s. The Bernard model (see Section 1), describes a two-step (acidogenesis-methanogenesis) process and is tractable to dynamical systems analysis of its steady-state behaviour and stability. One of the principal benefits of simpler models are their availability for model-based process control, where a perfect representation of the typically non-linear system is not required [84]. An initial bifurcation analysis of the model at equilibrium allows for identification of the key process variables that can significantly impact process behaviour and, thus, determine the main candidates for control variables and their critical values. Rincón et al. employed the model to design an adaptive controller to avoid biomass washout in an anaerobic digester through identification of the normal form bifurcation (i.e. local behaviour around the system equilibrium) with dilution rate as the bifurcating parameter [85]. An analytical determination of the model equilibria and their stability using general growth functions (i.e. only the Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 functional response and not the mathematical structure of biomass growth is assumed), showed the existence of bistability dependent on initial conditions, where the process may converge to the nominal (desired) operating point or undergo acidification and complete washout of the biomass [86]. Recently, the Bernard Model was again used and a rigorous analytical approach employed for open-loop control of Biochemical Oxygen Demand (BOD) within a target design interval [87]. The model has also been used to investigate the correct determination of overloading tolerance, defined as the distance between a stable interior equilibrium and the limit point of the system, for correct process monitoring [88,89].
To address issues of process representation with this reduced order model, especially at low microbial population sizes, a stochastic process has been suggested as a way to account for its structural and parametric uncertainty [90], demonstrating a promising, if not fully realised, potential.
Researchers have studied the dynamical behaviour of EBS more generally [91], analysing the role of ecological interactions (e.g. competition, mutualism), rather than the specific process, on the existence, uniqueness and stability of the system [92][93][94]. These methods have also been applied to specific processes, such as the ammonia removal (SHARON) process [95], Anammox systems [96], anaerobic chlorophenol mineralisation [97,98], and conventional anaerobic digestion [99,100].
While global stability analysis is challenging for models of increasing order, an initial numerical analysis of the full ADM1 (44 state variables, 29 ODEs, 15 DAEs) was achieved using a prediction-correction continuation algorithm for solving the differential equations and the Newton-Raphson method for simultaneous solution of the algebraic equations [101]. They showed, through performing a one parameter bifurcation, that a limit point exists when either the dilution rate increases or substrate inlet concentration increases above or decreases below critical threshold values. A positive feedback loop between acetic acid accumulation and acetate degrader washout due to inhibition was identified as the critical mechanism in all three cases. Interestingly, a reduced model that allowed for the derivation of analytical solutions matched the qualitative behaviour of the full ADM1, as long as key features (i.e. ammonia inhibition) are preserved in the simpler model). Further study of the Börnhoft model have shown the role of inhibition on multiplicity of equilibria [102], expanded knowledge on the richness of AD dynamics, and counter-intuitive result suggesting biogas production may not always be optimal at the co-existence equilibrium [103]. A recent study not only provides new theoretical results for the model, but examines the effects of stochasticity to account for both environmental and parameter uncertainty (as with [90] and the Bernard Model). Interestingly, the author's find that the process is more prone to failure at start-up and becomes resilient to stochastic effects as the microbial communities establish. This is perhaps an intuitive result given real-world examples of startup failure in biological reactors, for example due to acclimation issues (e.g. [104]).
A relatively simple example of bifurcation analysis, for illustrative purposes, is provided in Appendix A, using the model first developed by Andrews (See Section 2) [29].
Despite a significant body of work evidencing the power of mathematical analysis for enhancing AD process understanding and enabling practical controller design, there still remains a substantial barrier to the wider use of these techniques for engineering purposes. In part this is due to a dissonance between the level of mathematical acuity required to conduct the analysis and the substantial process knowledge in the domain of engineers and scientists. While there do exist groups who have core competency in both fields, more formal engagement through communication, collaboration and bottom-up empirical studies to support theoretical observations, will be necessary if it is to gain traction in the wider EBS community.

Empiricism, Data and the Digital Future
It may be argued that the mechanics of anaerobic digestion is well-understood and the maturity of phenomenological models facilitate their use for process design, operation and control. In data poor environments, where measurements are infrequent, erroneous or hard to obtain, mechanistic models are an obvious choice. Nevertheless, improvement in sensor technology and fast and cheap tools to characterise the microbial communities in digesters through multi-omic analyses, has gifted Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 scientists and engineers with a wealth of data and knowledge of their ecology [105][106][107]. When used effectively, this can provide a greater resolution of understanding, helping to derive models that link microbial community composition and function effectively [108,109]. For example, metagenomic data as microbial biomarkers have recently been used in combination with a data-driven model for prediction of ammonia and phenol inhibition [110]. Unlike data-rich industries that employ extensive process monitoring, there is a sense that data-driven models for systems that are open to high levels of uncertainty and transient or irregular variation in their operation is under-exploited. While process monitoring and automatic control have become standard, for example in most large wastewater treatment systems, there is still a tendency to rely heavily on existing knowledge and empirically driven decision making, where the plant operators are the experts, while much of the data gathered at the plant is unused. This has come to be known as 'dark data', that is data collected by an industry that is either archived or discarded in lieu of capacity or motivation for its use at that time [111]. It can be argued that for AD plants, dark data is largely irrelevant given the quantity of data typically acquired, but as industries are increasingly integrated, relying on automation and knowledge extracted from data, so the opportunities offered by digital technologies, which can manage large and diverse amounts of information, become apparent [112].
Machine learning and intelligent systems (i.e. Artificial intelligence) have become, seemingly, pervasive in industry and society in general. However, there are risks associated with their wholesale adoption, and not because they threaten to break Asimov's three laws of robotics, as popular media may like to imagine. The main issues for concern, especially in motivating uptake by the water and AD industries, are (i) the potential for application redundancy by using 'smart' tools that provide little or no benefit where conventional engineering practice is more than adequate (human vs. machine expertise), (ii) hastily appropriated technologies without undertaking due diligence on model relevance/adequacy or consideration of the long-term system behaviour (garbage in, garbage out / inherent non-linearity and uncertain dynamics), (iii) advanced models and algorithms may require significant cost and expertise to implement and maintain. Nevertheless, it is apparent that developments in data acquisition and computing capacity facilitates the cautious use of 'smart models' for improvement of EBS. Indeed, many of the tools used now are based on fundamental theory developed over a century of statistical and mathematical research. For example, popular dimension reduction and regression techniques such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) have been re-purposed for chemometric process monitoring [113,114] and control [115] of anaerobic digesters. These multivariate methods are used to transform high-dimensional data by matrix factorisation into a lower dimensional space that retains the maximal information (variance) present in the original dataset (X for PCA) or maximal covariance between predictor and response datasets (X and Y for PLS). The relationship between the original variables and samples projected into this reduced dimensional space represents a model of the system or process described by the original data where T and U are the projections (or scores matrices) of X and Y, respectively, P and Q are the weights (or loadings matrices). E and F are the residual matrices. If the purpose of these models is for process monitoring and fault or disturbance detection, it is vital that the data used to construct them, and the number of principal components (factors in PLS) to retain (T l , U l , where l < m, where m is the original number of variables in the X-block), represent its normal operating behaviour. New data acquired from the process is projected into the model and statistical measures (e.g. Hotelling's T 2 statistic) are used to identify samples that exceed some pre-determined threshold. The presumption is that these points represents non-normal process behaviour where variable contributions to the loadings may be examined to determine the source(s) of this exception.
PCA has been used as a data pre-processing step for extracting the correlation structure between variables. This variable selection method has been used as a preliminary step in the use of machine learning models to predict biogas yield from municipal AD [116]. A PCA model was used to detect disturbances in a sewage sludge fed AD process using PLS regression to chemical composition of the digestate from Near Infrared spectral data [117].
However, conventional multivariate models are limited by their stationarity [118], i.e. the model is a representation of the process over a fixed domain (e.g. time). In order to be useful for monitoring of dynamic processes, such as wastewater treatment and AD, adaptive/recursive methods have been developed for real-time monitoring and multivariate statistical process control, whereby the data model is updated with new information as it is acquired [119][120][121].
Unlike PCA, regression models, such as PLS, can be used to relate empirically derived data (measured explanatory variables) such as spectral data to one or more dependent (predicted) variables such as chemical concentrations, through their covariance structure. These regression models become useful in process operation when key performance indicators or control variables are difficult to measure. A model may be developed based on available and reliable measurements that are correlated with the target variable, acting as a soft-sensor for process monitoring and control. Like PCA methods, issues such as stationarity, multi-collinearity, and bias persist with ordinary PLS (OPLS). Robust method adaptations and rigorous data handling protocols are generally available (and required) to handle issues that may be encountered when using multivariate tools in practice [122,123]. Multivariate soft-sensors have be utilised quite widely by the wastewater community, but fewer examples relating to the anaerobic digestion process have appeared [114,124,125]. This may be reflective of the general preference for using mechanistic models in AD or, anecdotally, the suspicion by practitioners that data-driven models are intangible and unrepresentative of the process. This may be exacerbated by the varied quality and quantity of data historically recorded at the plants themselves.
Nevertheless, 'smart' AD systems as a sub-class of a wider smartening of industry through the so-called 4th Industrial Revolution is an interesting and potentially valuable direction that practitioners may explore. This is especially pertinent in relation to the larger 'Smart Water' initiatives and repurposing of AD technology in bio-based manufacturing and biorefineries.
A renewed interest in Artificial Neural Networks (ANN) and Fuzzy logic, methods that have a long association with the bioprocessing and manufacturing industries, as well as AD itself [126,127], may propagate research and debate on the role of advanced data-driven models for application to AD moving forward. However, at present most applications of machine learning models have focused on conventional purposes, such as methane prediction [128][129][130].
In the final section we discuss how modelling is evolving by incorporation of interdisciplinary ideas and methods and show that no single approach is a catch all for engineered biological systems and anaerobic digestion, specifically.

Multi-disciplinarity and the Future of Anaerobic Digestion Modelling
Mathematical modelling will play an increasingly important role in the design, monitoring and optimisation of EBS such as anaerobic digestion. As new challenges and opportunities emerge, technology develops and understanding increases, models will adapt and integrate the new knowledge acquired in the pursuit of smarter, cleaner and transformative processes. A recent review has highlighted possible trajectories for mathematical modelling of AD systems, both vertically (i.e. scale of models and connections across these scales) and horizontally (i.e. integration of processes, new anaerobic technologies) [22].

Hybrid models
In the previous sections we have seen how disparate modelling approaches have been employed in the design, monitoring and control of AD. The selection of the appropriate model to use is dependent on its purpose, the quality, quantity and type of available process knowledge and data, the cost of Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 model development and implementation versus the benefit of employing a model, among other considerations. It is a truism to state that no model is perfect and often no single model is uniquely suited to the task in hand. For example, when there is lack of confidence in process measurements or data for characterising a process is unobtainable, but a good mechanistic understanding has accrued, then a first-principles model seems logical. On the other hand, for many complex biological systems, only surface understanding of the system is known and no consensus exists on the precise structural properties and interaction mechanisms, then a data-driven approach may be suitable. There is still debate, intensified by the current digitalisation epoch, as to whether machine learning is ready to entirely replace mechanistic modelling. As previously alluded to, the rapid adoption of AI and machine learning tools across many industries brings to mind the gold rush era. Some may succeed and prosper, while others may inevitably discover only fool's gold. This is exemplified in the medical field, where early adoption of machine learning by clinicians is superseded by their acquisition in tackling more complex problems found in cell biology research. To quote a recent polemical opinion on this subject [131]: Fundamental biology should not choose between small-scale mechanistic understanding and large-scale prediction. It should embrace the complementary strengths of mechanistic modelling and machine learning approaches...
Hybrid modelling is not a new concept, in fact many applications of what would be considered 'pure modelling' are a combination of structured and unstructured methods, e.g. consider parameter estimation techniques, such as the Extended Kalman Filter for fitting mechanistic models. However, explicit use of distinct modelling approaches have rarely been considered for application to bioprocesses. For anaerobic digestion, it appears viable that combining mechanistic and data-driven methods, together with extant process knowledge and empirical data would provide a means to overcome some of the shortcomings of either modelling approach alone [129].
An early example of hybrid modelling for AD employed a neural network to model process kinetics in combination with the Bernard Model described in Section 3 to ensure the ANN is constrained by the law of mass conservation (see Eq. 5) [132]. A similar approach using a kinetic model for lignocellulose fermentation and a PLS soft-sensor showed reasonable prediction potential over long time horizons [133]. The study is notable in its claim that it represents a simple demonstration of the digital twin concept. Although we don't discuss digital twins in this review, the concept that physical systems from individual process units to whole cities may be replicated digitally is becoming fairly ubiquitous, in tandem with concepts such as Industry 4.0 mentioned previously. While it could be argued the use of buzzwords may attract attention to otherwise conventional research, all models may be classed as digital representations of real world phenomenon and some of these will feature in the establishment of digital twins in the future.
A neural network was used to predict AD performance and a Particle Swarm optimisation (PSO) algorithm is used to maximise methane yield, essentially employing a (meta)heuristic approach to evaluate a fitness function until a stopping criterion is reached (based on temperature and VSS, in this case) [134]. Nature-based methods, such as PSO, are not covered in this review, but more information on their emerging relevance to bioprocess monitoring, can be found in [135]. PCA was used as a feature extraction method to greatly reduce the dimensionality of ADM1 with relatively low loss of information, producing an identifiable model for supervisory control and optimisation of a winery effluent treatment plant [136]. A novel hybrid approach, augmenting ADM1 with information at the cellular level including intracellular fluxes and microbial activity, has recently been described. Flux Balance Analysis (FBA), a method to predict metabolic function from genomic information, was used to replace the Monod growth function in the ADM1 with a FBA growth prediction based on intracellular flux of metabolites [137]. The method allows for the dynamic prediction of optimal microbial composition, where conventional FBA provides steady-state analysis.

Thermodynamic modelling
There has been considerable debate within the scientific community on the aptitude of purely kinetic models to describe the behaviour of systems that are subject to energetic constraints [28], especially in systems where microbial growth is directly linked to free energy availability. In anaerobic digestion, omission of energetic considerations in the mathematical description of the process dynamics will lead to qualitative and quantitative errors in model predictions, significantly undermining their usefulness. Indeed, the interplay between microbial kinetics and energetic constraints becomes important when the system approaches thermodynamic equilibrium, that is moving from exergonic sufficiency (net release of free energy) towards an endergonic state (net absorption of free energy) where biochemical reactions are completely inhibited (See Fig. 3 for an example of the impact of hydrogen on thermodynamic inhibition of anaerobic propionate degradation). The rich microbial ecology that emerges in low energy environments, where thermodynamic inhibition is likely to constrain metabolic conversion, is of both interest and importance in the preservation of viable communities in AD [138,139] and the study of thermodynamic principles in biological systems has been an active area of research for decades [140,141].
Various strategies have been proposed for the incorporation of thermodynamics into existing kinetic models or for the development of a universal theory. The Monod growth function is purely an empirical construct. Despite showing that the equation can relate with thermodynamic principles through the substrate affinity constant, i.e. K S = e −∆G 0 /RT [142], it is done so by making the assumption that only the limiting substrate and not microbial concentrations impact the growth dynamics. The Gibbs free energy change as the thermodynamic driving force for biochemical reactions forward was first coupled with a Michaelis-Menten type irreversible kinetic model describing anaerobic propionate degradation by Hoh and Cord-Ruwisch [143]. They summised that reactions occurring close to thermodynamic equilibrium are reversible, driven by bidirectional reaction rates for the enzymatic conversion of substrates ([S]) to products ([P]), and the reverse [25,144], given by the Briggs-Haldane mechanism where E is the enzyme concentration, Eχ is the enzyme-substrate-product complex, and k 1 , k 2 , k −1 and k −2 are the forward (k + ) and reverse (k − ) rate constants, which are combined to give the overall equilibrium constant K eq = k 1 k 2 /k −1 k −2 . Under substrate saturation conditions, the Haldane kinetic scheme (shown in Eqs. 2 and 3, within square brackets) dominates any thermodynamic constraints. However, as the substrate is consumed, a trade-off between kinetics and thermodynamics becomes apparent [138], and a rate equation for the reversible reaction can be derived where k r = k 2 /k −1 , Γ is the mass-action ratio ([P]/[S]) at dynamic equilibrium). Given that the overall free energy change of reaction ∆ r G = RT ln Γ/K eq [145], then the general model proposed becomes [143] µ =μ In modelling the growth rate using the theory of reversible enzymatics, the energy required for driving the reaction and invested in maintaining other cellular functions not related to growth is overlooked, i.e. energy used for microbial growth is not the maximum free energy available in the system [146][147][148].
There has been significant developments in the understanding of thermodynamic principles for general Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 application to microbial life, notably refinements in the derivation of metabolic energy requirements [149,150] and attempts to develop general or universal models to address shortcomings in established models, e.g. limits on metabolic parameter identification [151], and even the use of Maxwell-Boltzmann statistical physics to derive a fundamental link between microbial growth and energy states at the molecular level of substrates (i.e. the Microbial Transition State) [152,153]. However, despite the considerable amount of research on the application of thermodynamics to microbial systems, there are relatively few examples of direct application to the anaerobic digestion process itself. Oh and Martin used principles of thermodynamic equilibrium with chemical activity and fugacity together with a stoichiometric model to study energetic requirements of methanogensis [154], and recently Delattre et al. have proposed a kinetic-thermodynamic model to characterise a synthetic anaerobic tri-culture community performing sulphate reduction and methanogenesis [155]. Despite this, it is certain that as more consistent approaches are developed and theory more widely understood, appropriate thermodynamic models will be developed and tested for application to AD systems with the aim to improve performance and understand energetic limitations of different metabolic pathways. A recent paper has demonstrated that, potentially, analytical methods from mathematics (see Section 3) can be utilisd, despite the increased complexity, to elucidate the role of thermodynamics in biological growth models [156].

Computational Fluid Dynamics
While ADM1 may be regarded as the most complete phenomenological description of the AD process from a bioprocess perspective, where the conditions of microbial growth are rigorously defined, there is an underlying assumption, as in many chemostat-type models, of spatial homogeneity, i.e. a completely mixed environment, or zero spatial dimensionality. This makes calculation of mass retention time and transfer rates relatively straightforward. However, in practice these ideal conditions are not observed and, at full-scale, the heterogeneous distribution of metabolites and biomass can lead to sub-optimal performance, dead-zones and unwanted reactor behaviour (e.g. foaming, physico-chemical gradients) [157,158].
Modelling the mass transport and hydrodynamic behaviour of fluids and particles in anaerobic digestion has been an emerging research field in recent years. Computational Fluid Dynamics (CFD) is an established modelling tool that uses the mathematics of fluid mechanics (e.g. Navier-Stokes, Euler equations) and numerical analysis to solve the governing equations on a 1 to 3 dimensional discretised grid [159]. As computational capabilities have increased (see Section 1), so the possibilities for using CFD to influence design and performance of AD reactors, especially related to mixing efficiency and identification of flow issues, have become a viable tool for multi-disciplinary research teams [160] and modelling consultancies (see Fig. 4 for an example of CFD employed at full-scale). This is reflected by the increasing number of studies and software platforms that use CFD for AD research, including the performance of gas-mixing [161], physical mixing [162,163], process design and optimisation [164].
The use of CFD in anaerobic digestion processes, whether for reactor mixing and design or process optimisation, is likely to become a valuable tool for process engineers. At present it is used as a complementary approach with models that describe biochemical reactions through dynamic integration. However, recent studies have looked at using CFD models with biological kinetics and advection-diffusion transport of soluble metabolites and biomass, in a similar way to the methods for modelling spatially realised biofilm growth [165,166].

Individual-based modelling
We have seen that modelling of anaerobic digestion, and bioprocesses more broadly, have evolved and diversified as knowledge has accrued at different scales and computational capacity has dramatically increased. Although mechanistic models are likely to persist as the method of choice for engineers and practitioners, other approaches will emerge that offer those working in AD with tools to enhance their understanding and management of these systems. One such method that is highly Figure 3. Example of the thermodynamic 'window of opportunity' for syntrophic degradation of propionate via two pathways (MMC is the 'classical' methylmalonyl-CoA pathway). Both reactions are susceptible to hydrogen inhibition, shown by their increasing endergonicity as H 2 partial pressure increases. Hydrogenotrophic methanogens mitigate this by transforming hydrogen to methane, above a certain partial pressure threshold required for growth. The red and blue boxes shows the hydrogen partial pressures at which propionate degradation by the classical and Smithella pathways are possible due to this syntrophic behaviour (Adapted from [139]). Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1 granular and explicitly resolved is Individual based modelling. These models (IbMs) and associated techniques such as agent based models and cellular automata (CA), are spatio-dynamical and discrete in time, space and states. While CA are typically used to predict geometric distributions based on local interactions, IbMs are highly resolved at the cell or individual level [74,167]. Unlike aggregated and continuous population models, IbMs proscribe behavioural physical and biological traits to individuals in a highly discretised manner, where the complex, global behaviour of the population emerges from the simpler, local properties and interactions amongst individuals. IbMs have historically been used in macro-ecology, but found significant purchase as a tool for modelling biofilms, building on knowledge gained from spatio-temporal, multi-species biomass based models, such as those briefly mentioned in Section 3 [168,169]. In addition to the microbial growth properties, physical interactions between individuals, motility, environmental gradients, fluid dynamics, thermodynamics, and even stochasticity may also be included as hybrid components to a more comprehensive process, sub-process or reactor scale model [170][171][172]. Such a multi-scale model is highly complex and requires a large degree of multi-disciplinarity to ensure successful implementation. However, the use of IbMs is becoming more ubiquitous and recent efforts to develop large scale simulators of EBS using hybrid IbM and statistical emulators have established a framework for studying microbial community behaviour at scale [69,173,174], opening the door for application to anaerobic digestion processes [172,175] (See Fig. 5 for an example application of an IbM applied to anaerobic processes).  Morphology is dependent on initial seeding. The relative density of acetoclastic methanogens (xac_ch4) and hydrogenotrophic methanogens (xco2_ch4) is indicative of the dominant acetoclastic pathway, and the syntrophic interaction with the acetogens (xglu_ac). The images were produced using the NUFEB simulator [176] by Bowen Li (Newcastle University, UK, June 2020) and used with permission.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: we now set the LHS of the equations to zero and solve the equalities for X 1 and S 1 . This results in the identification of three possible steady-states defined analytically by E 0 = S 0 1 , 0 (A3a) where λ i and X * 1,i , i = [1,2] are the steady-state concentrations of S 1 and X 1 , respectively (See Appendix A for details).
To calculate the stability of the equilibria, we first construct the Jacobian matrix of the system For each equilibria, we can substitute expressions for S 1 and X 1 at steady-state into the Jacobian and check its eigenvalues, knowing that |J − sI| = 0, where I is the n × n identity matrix and s are the eigenvalues that satisfy the equation. Taking, for example, the washout equilibria E 0 , the eigenvalues are determined analytically to be [−D, µ(S 0 1 ) − D]. Hence, we can now derive an explicit condition that states E 0 is meaningful and stable (s i < 0) for D > µ(S 0 1 ). In biological terms, this means if the dilution rate is higher than the growth rate of the organism, its washout is guaranteed, a common observation that links chemostat theory with empirical understanding. We can also find analytical expressions for the existence, uniqueness and stability of the other two equilibria but omit them here for brevity. Layperson definitions: Existence: Conditions for which an equilibrium exists; Uniqueness: Under the conditions for existence, is the criteria for one and only one equilibrium satisfied?; Stability: Given some arbitrarily small perturbation away from the equilibrium, does the system return to that equilibrium (local stability)? Does the system returns to an equilibrium from any permissible point in that system (global stability)? The Routh-Hurwitz criterion and Lyupanov functions are common methods used to test stability for systems of ODEs [177].
Unless these analytical expressions are transparent to interpretation, it is often difficult to extract practical knowledge from them. However, because they describe the general and immutable behaviour of the system, local bifurcation analysis can be used as a complementary method for visual interpretation of existence and stability of equilibria. For engineering purposes, bifurcation analysis is usually performed using free system parameters, i.e. the operating or control parameters that can be adjusted. However, bifurcation using fixed or unidentifiable parameters can provide useful knowledge for model fitting when there is a degree of parameter uncertainty, or potentially where microbial traits can be adapted or evolved (e.g., synthetic biology, bioengineering, bioaugmentation).
We conduct a numerical analysis of the model by assigning parameter values, shown in Table A1, to the model. A one-parameter bifurcation, using dilution rate as the bifurcating parameter, is shown in Fig. A1. Briefly, the analysis identifies stable (red) and unstable (dashed black) equilibria in the X 1 variable domain. Two branch points (BP) representing the switching between two equilibria, and a limit point (LP) where two equilibria collide (or emerge), are identified. Biological interpretation: LP represents the maximum dilution rate at which a positive X 1 concentration is possible. A two-parameter bifurcation diagram is shown in Fig. A2. This is sometimes referred to as an operating diagram as it allows for the identification of stable operating regions. In this case, maintaining a dilution rate below 0.8284 and a acetate influent concentration between the blue and red curves guarantees stable existence of the acetate degrader. Increasing this concentration, so that the process is operating above the red curve in a region of bistability, would require control of the initial conditions to avoid the washout case.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 3 July 2020 doi:10.20944/preprints202007.0032.v1  Figure A1. One parameter bifurcation in X 1 for the parameter set given in Table A1 indicating a limit point (saddle-node bifurcation), LP, at a dilution rate of 0.8284 at the collision of two equilibria (E 1 and E 2 ). The red line corresponds to stable fixed points (equilibria) and dashed black line to unstable fixed points. It can be noted that at D > 0.3279, the system is bistable in E 0 and E 2 . The branch point (BP) at D = 0, corresponds to the theoretical maximum X 1 , where no biomass or substrate leaves the reactor.