Abstract
This article deals with the inclusion of microbial ecology measurements such as abundances of operational taxonomic units in bioprocess modelling. The first part presents the mathematical analysis of a model that may be framed within the class of Lotka–Volterra models fitted to experimental data in a chemostat setting where a nitrification process was operated for over 500 days. The limitations and the insights of such an approach are discussed. In the second part, the use of an optimal tracking technique (developed within the framework of control theory) for the integration of data from genetic sequencing in chemostat models is presented. The optimal tracking revisits the data used in the aforementioned chemostat setting. The resulting model is an explanatory model, not a predictive one, it is able to reconstruct the different forms of nitrogen in the reactor by using the abundances of the operational taxonomic units, providing some insights into the growth rate of microbes in a complex community.
1. Introduction
Microbial communities and their interactions play a central role in the understanding of microbial ecosystems [], and a current challenge is integrating genetic sequencing data in a deterministic modelling framework [,]. Using the terminology from the thorough review in current methodologies on the deterministic modelling approaches of microbial community dynamics presented by Song et al. [], this articles deals with population-based approaches where species are taken as the interacting units.
The classical ecological concept of species and niche in the microbial world is an elusive one: in the macro world one can clearly differentiate one species from another for reproductive reasons and their ability to give birth to offspring. In the case of bacteria and archea, reproduction goes simply by binary fission and exchange of some functional genes (e.g., the ability to synthesize or metabolize substances) can be acquired in evolutionary scale through lateral gene transfer []. Therefore as an ecological problem is hard to define precisely the ‘niche’ of ‘microbial species’. These obstacles can be circumvented by considering the microbiologist concept of operational taxonomic unit (OTU) based on the clustering of organisms sharing similar sequences of the 16S rDNA marker gene. In the past years considerable efforts have been made to measure the bacterial community composition. Tests such as fluorescence in situ hybridization (FISH), polymerase chain reaction (PCR) dependent techniques, and PCR independent techniques for the analysis of DNA have become a standard tool for studying microbial diversity []. The contribution of this article is a new method to integrate the microbial community measurements in chemostat models, based on any sequencing or fingerprinting technique that can quantify the species abundances over time. In other words, while most models used in bioengineering are functional—in the sense they consider only one species per biological reaction considered—this work is an attempt to merge classical population-based models used in ecology and those used by engineers in biotechnology.
Interactions lie at the heart of ecology. Lotka [] and Volterra [], independently, presented a 2 dimensional dynamical system to model prey-predator relationships, now known as the Lotka–Volterra (LV) equations. The model is very rich from a mathematical standpoint, and is also a classic equation to study in Mathematical Ecology []. Extensions of the Lotka–Volterra model have derived what is now known as generalized Lotka–Volterra (gLV) models [] shown in Equation (1):
where represents the species abundance, the intrinsic growth rate of the species, and the terms reflect the effect of OTU j on the growth of OTU i. The equation states that the growth rate of is proportional to , but this proportionality constant depends on its intrinsic growth rate multiplied by the sum of all interactions affecting it. Note that if there are no type of interactions (), one recovers n uncoupled linear differential equations, and thus the solution becomes , that is exponential growth on time.
The diagonal terms are known as intraspecies interaction, while the off diagonal terms are known as the pairwise interspecies interactions. Noting the signs of pairs , the classical ecological relationships of mutualism or cooperation , commensalism , predation or parasitism , competition , and ammensalism can be recovered []. Model (1) has been thoroughly analysed, even when the coefficients and are time dependent and exhibit periodicity (which models seasonal traits) [,]. The gLV model has been used in microbial ecology to some degree of success to study the gut microbiome of mice infected with C. difficile []. However, the quadratically growing number of parameters to describe interactions naturally entails problems of identifiability if the data set is not large enough, or the system has not been sufficiently perturbed. On a more conceptual ground, the interaction coefficients of a gLV model do not represent mechanistically anything, so even if a model correctly predicts the microbial community dynamics, it might not add to the understanding of what could be physically or biologically taking place. These observations led us to develop what can be considered the core contribution of this article, which is to study the growth rate of each species in a mixed culture: we reconstruct the shape of their growth rate, instead of trying to fit a particular function (such as the gLV equations). As Monod himself commented when developing the growth law that bears their name is that any function with the same shape (monotone, concave, and bounded on the substrate) would have served [], in this spirit we formulate the question: what is the shape of the growth functions of multiple species developing together?
As a departing point, the work of Dumont et al. [] is presented in Section 2. They modelled a chemostat experiment where nitrification takes place by considering a glV model coupled with a substrate limited growth expression ( is no longer a constant, but the classic Monod expression) and fitted their model using absolute abundances of the major OTU identified by molecular fingerprints obtained by single-strand conformation polymorphism (SSCP). Section 3 inspects the model through a mathematical analysis. Some interesting outputs of this analysis are that the number of possible equilibrium points grows exponentially with the number of species, coexistence can be achieved within the same functional group, and bi-stability may arise. In Section 4, the concept of interaction function is developed such that it generalizes the gLV model. For approximating the interaction function a method of optimal control theory was adapted: The growth rate of each species is modulated by a constrained regular control of the system, thus the growth rate of each OTU is corrected in order to fit the experimental data. The regular control is composed of a feedback part on the species state variable, and a feed forward part, or tracking, on the measurement of abundances of each species; the method involves solving state-dependent Ricatti equations []. In Section 5, the methodology from Section 4 is applied to the data from the experiments performed by Dumont et al. [] and not just to the most abundant species as it was the case of the model analysed in Section 3 []. This approach explicitly assumes that dynamics of complex ecosystems are driven by interactions, that are the results of feedback loops of each species on the growth rate of others. The method shows that by following the community dynamics one can propose a growth rate that reconstructs the substrates dynamics, however this cannot be considered a predictive model, but rather a explicatory model. The article ends with a discussion on the scope of applicability and perspectives of the method.
2. Model Definition
Notations used throughout the article:
- n: the number of OTU considered.
- : the number of OTU in functional group . In the example corresponds to ammonia oxidizing bacteria (AOB) and corresponds to nitrite oxidizing bacteria (NOB).
- Let m be an interger then .
- : is the concentration of OTU i measured in . .
- x: vector .
- : concentration of substrate 1 in . In the example represents ammonium.
- : concentration of substrate 2 in . In the example represents nitrite.
- : concentration of substrate 3 in . In the example represents nitrate.
- : entry concentration of substrate 1 in . May depend on time .
- s: vector . Referred to as metabolites.
- : Interaction function of OTU .
- : growth function of OTU .
- vector containing the growth function of every OTU.
- D: dilution rate of the continuous reactor in . May depend on time .
- : yield of grams of OTU i formed per gram of substrate consumed.
- : yield of grams of substrate consumed/produced per gram of OTU j formed. If negative it represents consumption, if positive it represents production.
- Y: matrix containing all yields such that .
- For integers and and , represents a matrix of rows and columns with a in every entry.
- Let m be an integer then is the identity matrix of size m.
- Let M be a matrix, then represents the i-th row of matrix M.
- Let S be a finite set with elements. Then .
- Given a vector , the function stands for:
2.1. Stoichiometric Equations
A cascade (bio)reaction process is considered. Suppose n different OTU are present in the chemostat. A two step cascade reaction refers to the situation where a group of microorganisms () consumes a substrate and produces and biomass, while another group of microorganisms () consumes and produces and biomass. and are called functional groups. The number of organisms in each functional will be denoted and , respectively, that is and . This work treats the case when and are disjoint sets:
Hypothesis 1 (H1).
Sets and satisfy: and .
The situation is described as simplified Reactions (R1) and (R2). The reactions are simplified in the sense that they do not attempt to represent a balanced chemical reaction, rather they represent the direction of the bioprocess and the proportions of different consumed and formed compounds of interest. The terms are known as yields, they represent the quantity of g of biomass produced per g of substrate consumed by OTU i. For example, in the case of reaction (R1), one gram of is consumed, one gram of and grams of dry biomass of OTU i are produced.
However for expressing the system of differential equations further below, the terms are used. They express the grams of substrate consumed (negative sign) or produced (positive sign) per gram of OTU j formed. They are related to as seen in Table 1. This defines the stoichiometry matrix , such that .
Table 1.
Relationship of with .
Furthermore, for each , OTU i is characterized by its process rate (also known as growth function) . Notice that for being as generic as possible, the growth rate may be a function of the whole state in order to model the influence of all OTU on the growth rates of others.
An example of this process is the nitrification process where group is known as Ammonia oxidizing Bacteria (AOB), and group is known as Nitrite oxidizing Bacteria (NOB) [].
2.2. Mass Balance Equations
Consider the scenario of a continuous and homogeneous reactor: the input flow is the same as the output flow, with a dilution rate D. The input flow contains a concentration of substrate . Each OTU grows at a rate . System (3) represents this situation. A specific case of is given in the next subsection.
System (3) can also be written in a more compact form using the stoichiometric matrix Y and the diag operator.
2.3. Kinetic Equations
In the work of Dumont et al. [], the growth rates seen in Equation (6) were calibrated against experimental data for the two most abundant OTU of each functional group.
The term accounts for pairwise interactions affecting the growth rate of each OTU, while the term is a Monod growth expression, where represents the maximum growth rate, and the half saturation constant []. Note that if every , then one recovers a classic substrate limited growth. Let A denote the matrix with entries hereafter referred to as the interaction matrix. Dumont et al. did not analyse their model but simply provided several simulations using parameter values identified from experimental data. The following section of this article deals with the mathematical analysis of model (3) with growth rates given by (6).
3. Mathematical Analysis
The system of Equation (3) is defined in the region
First, sufficient conditions on the interaction matrix for the system to be well posed are established: meaning that solutions remain bounded and non-negative in time, this ultimately implies that the solution exists for every [].
Second, the equilibria of the system are derived. Possible equilibrium points for this system grow exponentially with the number of OTU considered (n). Stability is not analytically addressed, a numerical scheme calculating every equilibrium point and the system’s Jacobian eigenvalues at the equilibrium point was implemented for studying the system.
3.1. Properties of the System
A bound on the norm of the interaction matrix that depends on the initial conditions and parameters one establishes that solutions will remain positive and bounded.
Lemma 1.
For initial conditions , there exists positive scalars , , and such that solutions to (3) satisfy the following inequalities:
The proof can be seen in Appendix A. A bound on the norm of A is found such that every matrix A respecting the bound, guaranties that is a positively invariant set.
Lemma 2.
For initial conditions , there exists a constant such that for every matrix A satisfying , the solutions of system (3) with growth rates given by (6), remain in Ω and are bounded.
The proof can be seen in Appendix A.
The importance of Lemma 2 is that by restricting the norm of matrix A the system is well-posed, meaning that the solutions can have biological and physical sense (there is no such thing as negative concentrations). Particularly, one has that , which in this context implies that a bound on the sum of the absolute value of the interaction terms that affects each species allows the system to be well-posed. Note, however, this is a sufficient condition, thus the range of values matrix A can sustain for the system to remain well-posed may be considerably larger.
3.2. Equilibrium Points
In this section, analytical expressions for equilibrium points are shown. However, no analytic expression concerning the stability of such points is presented. In the following pages the reader will appreciate that the expressions of the equilibrium points are not simple, consequently replacing them in a block matrix and calculating eigenvalues resisted an algebraic treatment. To answer the question of stability a numeric scheme is used by evaluating the Jacobian at the equilibrium point. At the end of the section an algorithm is provided for exploring all the possible equilibria. All the computations for deriving the equations of this section can be found in Appendix B.
Let be such that,
Then
Thus, system (3) is rewritten as follows.
Definition 1.
An equilibrium point (or steady state) is a point so that the right hand side of Equations (11)–(14) equals zero.
Observe that equilibrium points are by definition non-negative so the state variables can have physical meaning. For studying the cases where contains zero valued entries, the set of non-active coordinates is defined as follows:
Definition 2.
Given an equilibrium point of system (3), then the set of non-active coordinates is defined as: . and denote the number of positive entries of of functional groups and , respectively. denotes the total number of positive entries of . The active point is defined by the positive entries of . Analogously, the functions and are defined by the positive entries of . The active interactions is defined as the matrix A without the rows and columns. The active stoichiometry matrix is the matrix Y without the columns.
In order to derive the equilibrium points, it is desirable an invertible matrix. Therefore in what follows of the work it is assumed that matrix A and some of its submatrices have an inverse, this is stated properly in Hypothesis 2.
Hypothesis 2 (H2).
Let A be the interaction matrix of size and S be a proper subset of with . Then the matrix defined by taking out the S rows and columns of matrix A is invertible.
Assuming Hypothesis 2 a formula for the active points is derived from Equation (11):
Note as well that at the equilibrium, can be defined in terms of , and . This is done by adding Equations (12)–(14) which gives:
3.2.1. Both Functional Groups Are Present
The case where in each functional group remains at least one OTU is represented by Hypothesis 3.
Hypothesis 3 (H3).
The set satisfies .
By replacing Equation (15) in Equation (12) can be written as a function of :
Then by replacing (17) in Equation (13), one gets a fourth degree polynomial for .
Formulae for coefficients can be found in Appendix B.
The equilibrium point can be calculated from the solutions of the system of Equations (15)–(18) with non negative coordinates. If the system only provides solutions with at least one negative entry then the set cannot define an equilibrium point.
3.2.2. Washout of
The washout of is equivalent to Hypothesis 4.
Hypothesis 4 (H4).
and .
Under this case note that depends only on . Therefore when Equation (15) is replaced in (12), one obtains a quadratic equation for :
where can be found in Appendix B.
Since then from Equation (14).
In this case the equilibrium point can be calculated from the solutions of the system of Equations (15), (16), (A44) and (21) with non-negative coordinates. If the system only provides solutions with at least one negative entry then the set cannot define an equilibrium point.
3.2.3. Washout of
The washout equilibria means for every . This is equivalent to Hypothesis 5. Note that the structure of a cascade reaction implies that if gets washed out, then so is .
Hypothesis 5 (H5).
.
From Equation (12), one gets
then (16) implies
The equilibrium is then given by .
All the former discussion leads to a potential number of different equilibria. Indeed:
3.3. Stability: Operating and Ecological Diagrams
In this subsection the stability of the equilibrium points is addressed. Operating and ecological diagrams are created from this stability analysis. Both are an illustrative way of representing the long term behaviour of a reactor depending on operating parameters, namely D and : In a plane different zones representing the stability properties of system (3) are identified [].
For checking local asymptotic stability of the equilibrium points, the Jacobian of the system is provided and evaluated at each of these points. The resulting matrix’s eigenvalues must have negative real part. A general formula for this Jacobian is presented in Appendix B (see (A70)).
Algorithm 1 summarizes this procedure.
| Algorithm 1: Algorithm for Evaluating the Possible Equilibrium Points of System (3). |
![]() |
Operating and ecological diagrams are created by running Algorithm 1 for different pairs . In the case of operating diagrams [] (OD) all the pairs are regrouped such that the points of the set represent when partial nitrification (PN), complete nitrification (CN), washout (WO), or a combination of them may arise []. PN refers to the state when nitrite () accumulates because the OTU of are washed out and thus no conversion from to takes place. On the contrary CN is when nitrate () accumulates because of the presence of OTU of .
In the case of ecological diagrams (ED) the pairs are regrouped such that the points in the set have the same non-active coordinates. In other words, instead of representing areas where either CN, PN or WO take place, ranges of pairs where species coexist are represented. ED provide more information than OD, in the sense that one can deduce the latter from the former.
A first example using operating diagrams is presented to illustrate how adding interactions in a model consisting of 1 OTU in and 1 OTU in may lead to very different outcomes.
The important question of the existence of limit cycles was not resolved in this work. In the numerical analysis of this model at least one stable equilibrium was found for any choice of parameters. This obviously does not exclude the existence of limit cycles, but to what concerns the authors’ intuitions there always seems to be at least one stable equilibrium point.
3.3.1. Case Study 1: 1 AOB and 1 NOB
Consider the case where and . The operating diagrams when no interactions take place () and a non-zero interaction matrix are presented. When Algorithm 1 is no longer valid (A is not invertible), nevertheless the stability analysis is much simpler and is given in the Appendix B section. The interaction matrices are shown in Figure 1a,b, the rational behind the second choice was to force a very strong interaction of on and observe its effects. The biological reason behind a negative microbial interaction might be the release of a toxin by that affects [], or in this case it might represent competition for oxygen; we stress the fact that gLV interactions do not explicitly account for mechanistically anything they just try to represent an ecological relationship taking place. The rest of parameters can be seen in Table 2.
Figure 1.
Interaction matrices. Note how the presence of x1 affects very negatively x2 in (b), with respect to other interactions. The terms in the diagonal entries of the matrix represent intraspecies interactions, while the terms off the diagonal represent the interspecies interactions. (a) Interaction matrix of model (3) with no interactions. (b) A non-zero interaction matrix of model (3).
Table 2.
A set of kinetic parameters of model (3).
The operating diagrams can be seen in Figure 2, note how partial nitrification (washout of ) of Figure 2b is much bigger when compared to Figure 2a. The shape of the PN region in Figure 2b is somewhat unintuitive, because at a constant dilution rate (0.24 day for example) and an increasing , one passes from a PN zone, to a CN zone, and then back again to a PN zone. The mathematical explanation lies in the fact that also increases with , and the affine part of the growth function of plays a bigger role than substrate limitation ().
Figure 2.
(a) Operating diagram of model (3) with no interactions (interaction matrix represented by Figure 1a). (b) Operating diagram of model (3) with interactions represented by Figure 1b. Note how (b) has a much larger zone where partial nitrification takes place. This is due to the negative interaction of x1 on x2.
3.3.2. Case Study 2: 2 AOB and 2 NOB
Case study 2 is based on Dumont et al. [] model parameters. They proposed a distribution of parameters obtained from a Bayesian estimation method. Their fit describes well the dynamics of the two most abundant OTU in each functional group, but it still fails to capture the measured substrates dynamics. Kinetic parameters of case study 2 can be seen in Table 3. The estimated interaction matrix is shown in Figure 3a. A second matrix is presented, which is obtained by the sign change of coefficient (Figure 3b), and finally a third one is obtained by using a positive value for (Figure 3c). The idea is to show that qualitatively different outcomes can be obtained by changing one interaction at a time.
Table 3.
Kinetic parameters of model (3) from Dumont et al. [].
Figure 3.
Interaction matrices for each case for a consortia of 4 bacterial species where x1 and x2 are AOB and x3 and x4 are NOB. Parameters a11 and a12 were modified in (b,c), respectively.
The ecological diagrams are presented in Figure 4, where the legend indicates the species surviving in the zone of the respective colour. In Figure 4b, the system exhibits bi-stability (it is represented in numbering as (1) and (2) of the different possible equilibriums). Note how every zone in Figure 4b has two stable equilibria, meaning that the outcome of the system is determined by its initial conditions, particularly interesting is the green zone where either coexist or only remains, because in operational terms this means that either PN or CN may take place. When compared to Figure 4a, one can see that this change in the interactions of the microbial community can dramatically change the outcome of the reactor in a large operating zone.
Figure 4.
Ecological diagrams (ED). The different zones represent the combination of surviving species in the steady state. PN takes place when neither x3 nor x4 are present. CN takes place if x3 or x4 are present. (a) ED from interaction matrix on Figure 3a. (b) ED from interaction matrix on Figure 3b. In the legend (1) and (2) represent the two different stable equilibria in each zone. (c) ED from interaction matrix on Figure 3c. Note that in (b) two stable equilibria exist for each zone.
One can see that coexistence in the same functional group is never attained in Figure 4a,b, whereas in Figure 4c and , both AOB, coexist in either partial or complete nitrification. That means that the competitive exclusion principle [] (CEP) does not hold. The CEP roughly states that if two species are growing on the same limiting resource, and their growth laws only depend non decreasingly on the resource, then only one of them will survive in the long run. This is interesting in light of reports on wastewater treatment plants where coexistence between species in nitrifying reactors has been shown [], thus implying that a more complex growth law (as shown in here) or model structure involving other biological processes is required to include microbial diversity in mathematical models.
Remarks
Model (3) serves to illustrate that by considering a more complex growth rate that tries to model ecological interactions one might explain differences in reactors operating under similar conditions. It also shows a new mechanism by which the CEP no longer holds and which explains how multiple stable equilibria may appear. Since the gLV model discussed fails to completely capture the dynamics observed in the chemostat experiments [], the next section proposes a new approach to study interactions.
4. Generalized Approach for Modelling Interactions
In the following an explanatory model (as opposed to a predictive model) is developed based on the hypothesis that interactions might be driving the nitrification process. In the previous sections interactions were modelled as an affine function of the OTU concentration that multiplies a substrate dependent growth equation. More generally the interaction function represents how the growth rate of species i is affected by the concentration of other species, x:
Given a vector the interaction function is denoted as:
Let be a bounded, positive, and continuous function of s (e.g., Monod, Haldane). The growth equation of OTU i becomes:
Note .
Since the growth of a single strain in batch experiments is driven by the substrate concentration, when no interactions are present one should recover expression . Therefore if there are no interactions then . From this hypothesis, note that since if there is minimal presence of OTU, interactions cannot exist. Furthermore for this study it is assumed that is a continuously differentiable function on x. For making explicit all of the former:
Hypothesis 6 (H6).
The interaction function previously defined satisfies:
- There is an open set such that .
Note the Jacobian matrix of function , then a first order approximation of centred at gives: . When one recovers the growth expression from the previous section (Equation (6)) implying that can be seen as the interaction matrix from model (3).
4.1. Unravelling the Interaction Function
Suppose that the functions , and the yields are well-known. By using experimental measurements of x, represented by , the objective is to reconstruct function . For doing so, the terms are replaced by controls , thus . A control law is obtained by solving a nonlinear optimal tracking problem.
Consider the observable system (25), with being the output, because we are observing measurements coming from genetic sequencing.
Consider the weighted norms defined by positive definite matrices Q and R, represented by and , respectively, and . The optimal tracking problem is defined as:
The control is intended to drive the system to be near a desired output , which in this context are the measurements of the concentrations of OTU. The term , was added for two reasons:
- First, because the interest is testing the idea that interactions could be driving the system. Therefore adding a penalization in the objective function for each control to remain near 1 can be seen as an attempt to explain data without any interaction. In other words, if the control terms are found to drift from 1, it means that interactions are necessary to explain the system dynamics.
- Second, to force a regularized control. Otherwise note that u is linear in (25), therefore if the integral cost does not have a non-linear expression of u the optimal control will be of a bang-bang type with possibly singular arcs []. Since the objective is to find a differentiable expression of the addition of the regularization term is deemed necessary.
The problem of approximating the solution of the system to a desired reference (z in this case) is called the optimal tracking problem. For solving such a problem the approach developed by Cimen et al. [,] was adapted to our problem. The method proposed involves the resolution of Approximating Sequences of Ricatti Equations (ASRE). It consists of iteratively calculating trajectories of System (25) with a certain control law to later feed a non-autonomous Ricatti differential equation with the resulting trajectory. Then, a new control law that uses the solution of the Ricatti equation is proposed and a new trajectory is calculated. The iteration is stopped when a convergence in the output or the control is observed.
The control term should remain positive for the system to be well posed (no negative states), and an upper bound was added to represent the fact that life cannot grow infinitely fast. The tracking problem does not consider a constrained control. Nevertheless, the methods of Cimen et al. [] were directly used with an explicit constraint in the synthesis of the control. Even though this is probably suboptimal when the control reaches its bounds, one at least is certain about its optimality when the control never reaches its constraints.
Another departure from their method is that in their formulation an approximation of the dynamics for calculating the trajectories is used. They proved that such a linearisation converges to the original dynamics. In the case here presented the linearisation of the dynamics was not necessary.
The change of variable is used for technical reasons explained in Appendix C.
This in turn implies . The feedback control , will be of the form . Where matrix and vector solve differential equations. In Appendix C it is proved that, thanks to the structure of the system, one only needs to calculate differential equations for the synthesis of the control, instead of that would imply the direct application of the method, which renders the method—at least theoretically—scalable for a growing number of OTU.
4.2. Proof of Concept
The approach was tested with data generated by simulating model (3) using the parameters of the case study 1 (Table 2) with interaction matrix given by (Figure 1b). In the operating diagram of the same case (Figure 2b) the red zone implies complete nitrification, while the green zone means partial nitrification. For integrating the former phenomena in the simulation, the system was simulated for 300 days, and perturbed at day 150 from the CN zone () to a PN zone (). Simulations can be seen in Figure 5. Note how from day 150 the NOB population (OTU 2) represented in Figure 5b decreases, which in turn implies a decrease in , as seen in Figure 5c. In the case where no interactions take place, the OD seen in Figure 2a implies that would have accumulated all along the trajectory, since the perturbation still remains in the CN zone.
Figure 5.
Synthetic data generated by model (3), with parameters from case study 1. Note the effects of the increased input sin generated in day 150.
For the tracking procedure the functions and the yields were the same as those used for simulating the synthetic data (parameters in Table 2) and the control is meant to account for the interaction term. The Q and R matrices were and , respectively, with and in order to better track the NOB trajectories, since they are less abundant. The values were obtained by trial and error, by using a single for both functional groups, beginning with , in which case one can see how the optimal control becomes , thus no tracking is performed. Further diminishing the value from adds too much noise to the control, without significant gains on the quality of the tracking.
The results of the procedure to identify interactions can be seen in Figure 6 and Figure 7. Figure 6a and Figure 7a,b show the total biomass concentration, and the trajectories for the OTU belonging to and , respectively. It can be seen that the method approaches well the trajectories of the OTU, with a better result for the AOB community, which can be explained by the one order of magnitude difference in their concentrations (which in turn is a consequence of the one order of magnitude difference in their yields). The metabolites concentration represented in Figure 6b are in accordance with the simulated: The method is able to reconstruct the metabolites trajectories from the community measurements.
Figure 6.
Asterisks represent the synthetic data, while the continuous lines represent the method’s output. The method is able to reconstruct the metabolites pattern, from the biomasses concentrations.
Figure 7.
Asterisks represent the synthetic data, while the continuous lines represent the method’s output. The method reconstructs a continuous trajectory from the synthetic data.
Figure 8 and Figure 9 show the controls and the corrected growth rate for each functional group. The control for each functional group can be seen in Figure 8a and Figure 9a. Note that from the structure of a quadratic regulator, since there is no cost in the final state, the end value is always 1. Figure 8b and Figure 9b show the resulting growth rates for and , respectively, without the control . Figure 8c and Figure 9c are the complete expression that determines growth rate, that is . Note how little the shape changes with respect to Figure 8b and Figure 9b, which might mislead the reader to conclude that the control had reduced effects in the dynamics. The way out of this conundrum is to remember that the control’s effects are already included in and , and thus in expression .
Figure 8.
Obtained control and reconstructed growth rate for OTU 1 (AOB).
Figure 9.
Control and reconstructed growth rate for OTU 2 (NOB).
A final comment on the identifiability of the interaction terms. Even though one might propose a growth rate with the tracking control that accurately replicates the OTU trajectory , retrieving the original interaction coefficients from the obtained control for this example was not possible. The former was tried by minimizing function with a non linear optimization solver for a 1000 initial random guesses for matrix A. If one also takes into account and as parameters to fit this adds even more degrees of freedom, thus suggesting that the identifiability of growth functions (6) in model (3) might be very low.
5. Application
The tracking problem was applied to data coming from a nitrification process with experimental conditions described in []. For exploring the hypothesis of interactions as drivers of bioreactors performance environmental conditions should be kept as constant as possible. Therefore only data from day 183 onwards was used because a change in the operating temperature happened at that point, which is known to have an effect on kinetics. For choosing which species belong in which functional group, the procedure described of Ugalde-Salas et al. [] was used. From day 183 to day 315, 31 OTU were identified in the group (AOB) and 5 in the functional group (NOB).
A first example of the procedure is performed when the classified OTU are regrouped in their assigned functional groups by adding their concentrations. A 5 dimensional dynamical system is obtained, thus there are only two interacting functional biomasses: this case is structurally the same as in the proof of concept, but here a real dataset is used. The same procedure is applied where no regrouping occurs and the system state grows to 39.
The knowledge of functions was based on a study of nitrification’s kinetic parameters []. Particularly given the system’s ammonium and nitrite concentration a Monod function (Equation (27)) was used for and with parameters given in Table 4 calculated from the equation of Table 2 of the same article. The yields were fitted to match the nitrogen mass balances. The Q and R matrices were the same as in the proof of concept section, that is and , respectively, with and , because data lie in the same order of magnitude than synthetic data.
Table 4.
A set of kinetic parameters of model (25).
For the reader to gain understanding of the situation, a simulation of the system using the experiments operating parameters (D and ) is presented without control (i.e., ) in Figure 10 nitrate () accumulates all along the trajectory, but when compared to data it is clear that stops accumulating after a while.
Figure 10.
Simulation of system (25) when u = 1, with functions as in (27). Data points are represented by a star. The continuous line represents the simulation.
When applying the tracking method one obtains the simulation that can be seen in Figure 11 and Figure 12. The method captures the tendencies of the measured substrates as seen in Figure 11b. The tracking of each functional group (AOB), and (NOB) can be seen in Figure 12a,b, respectively.
Figure 11.
Results on applying the tracking method to a nitrification experiment when regrouping OTU in their functional groups. Data points are represented by asterisks. The continuous line represents the tracking procedure results.
Figure 12.
The tracking procedure applied to the observed biomass (asterisks) regrouped in two functional groups.
The growth rates of each functional group are shown in Figure 13. Note in the case of AOB (Figure 13a) the resulting growth rate shows a noisy curve formed by pulses. The behaviour of the NOB community (Figure 13b) is qualitatively very similar with somewhat stronger pulses and less noise. The former is to be expected since more OTU were regrouped to compose the AOB biomass, therefore more noise sources were added.
Figure 13.
Obtained growth rates when regrouping OTU in their functional groups.
The same procedure is applied without regrouping. The results on total biomass and metabolites are shown in Figure 14a,b, respectively. Both patterns still fit the data, but to a lesser degree of precision when compared to Figure 11. This can be explained by inspecting Figure 15, Figure 16, Figure 17, Figure 18, Figure 19 and Figure 20. First note the absolute error of the tracking for each of the OTU in the AOB community (Figure 15b, Figure 16b, Figure 17b, Figure 18b and Figure 19b), almost every point lies below [g/L], implying that the method might not be able to track below that threshold for the members of the AOB community. The former ultimately implies that the most abundant OTU are better tracked, thus the information contained in the least abundant species is not integrated in the model. Notice that the error for the NOB community is lower (Figure 20b), almost every point lies below [g/L], this can be explained in the one order of difference in the entry of matrix Q for the AOB and the NOB community. It may be the case that using appropriate weight matrices that account for the difference between OTU abundances could help in this aspect; in that sense only one rational was tested (inverse of the mean abundance of each OTU in the diagonal entries of matrix Q) and did not improve the results. When looking at the growth rates (Figure 15c, Figure 16c, Figure 17c, Figure 18c, Figure 19c and Figure 20c) one again observes pulses for each OTU. Finally, note that most OTU were present only for a fraction of the experiment’s duration.
Figure 14.
Results on applying the tracking method to a nitrification experiment when all OTU are tracked independently. Data points are represented by a star. The continuous line represents the tracking procedure results.
Figure 15.
Results for OTU 1-7 (AOB).
Figure 16.
Results for OTU 8-14 (AOB).
Figure 17.
Results for OTU 15-22 (AOB).
Figure 18.
Results for OTU 23-28 (AOB).
Figure 19.
Results for OTU 29-31 (AOB).
Figure 20.
Results for OTU 32-36 (NOB).
In both cases, the regrouped and individual tracking, the growth rate varies strongly, raising the question whether the observed pulses are emerging from interactions within the microbial community. When growth rates are compared to the proof of concept section it seems doubtful that a linear pairwise interaction model such as the gLV model could capture the complexity of the particular chemostat analysed. Perhaps these interactions are not constant through time (as opposed to the gLV model) or a different interaction function should be thought of. However the former questions cannot be fully clarified here, because the quality of the genetic sequencing from molecular fingerprints might not be the best when compared to more recent techniques, thus it is unclear if the pulses are due to noise of the measurements.
The interpretation of the correction term as interactions is not the only possible reading. In other contexts the correction term might also be interpreted as a non accounted phenomena ranging from environmental factors (e.g., temperature, pH) to other biological factors (viruses, flock formation, pathogens). Alternative hypotheses for explaining the observed patterns in the microbial community should be considered as well.
6. Conclusions and Perspectives
Over the last decades, advances in genetic sequencing and microbial ecology have opened a gap for modellers in biochemical processes to integrate this valuable information. Considering the success of mass balance models to predict and pilot bioreactors, new models should be built upon them, or at least be compared against them. This article exposed what can be gained from combining population-based models as used in ecology with functional group based approaches as used in bioengineering. The analysis of the gLV model proposed by Dumont et al. [] already shows that such a combination can give way to models that include bi-stability, coexistence within a functional group, and unintuitive operational insights such as raising the input ammonium to achieve partial nitrification. The increased number of parameters of this particular model obviously hinders its potential application, but it surely helps to illustrate what can be gained by joining both types of models. The mathematical analysis focused on the particular case of pairwise interactions, which can be seen as a first order approximation of the introduced concept of interaction function. This opens the question for a broader class of interaction functions that could well represent complex microbial ecosystems, particularly bioreactors.
With that line of reasoning, in order to understand what this interaction function should look like, a data-driven approach was presented. It can be simply described as correcting the growth rate expression of each individual species in a mass balance model by explicitly assuming a control loop on the growth rate depending on the species state variables and the measured abundances. The reconstructed growth rates seem to consist of pulses, suggesting that a form of a possible interaction function should reproduce this behaviour. However, the former questions cannot be fully clarified here, because the identification and quantification of OTU from single-strand conformation polymorphism (SSCP) as performed by Dumont et al. [] might not be optimal. Indeed, the determination of OTU from SSCP profiles rely on the identification and quantification of peaks and miss many OTU that appear only as background noise []. This type of fingerprinting is thus less accurate than more recent methods such as the sequencing of the 16S ribosomal RNA. In spite of this, one is able to recover the substrates dynamics, implying that the hypothesis of microbial interactions as drivers of a bioprocess, in the form of feedback loops affecting each others growth rates, is not far-fetched.
The use of the tracking technique can be applied in a straightforward manner to already existing models in mixed homogeneous bioreactors, under the condition that the microbial species have already been identified with a particular functionality of the system. Even though the tracking model was proposed for a chemostat setting as a way to correct a substrate limited expression, the method could be used in contexts where less information on the growth function of microbes is known. If one supposes nothing on the growth expression but the fact that is bounded (life cannot grow infinitely fast) the model becomes a linear model and one recovers a classic quadratic regulator for linear systems. This was tested for the data presented in this article and, unfortunately, negative substrate appeared as an output, suggesting that the substrate limitation term is crucial for the model to be well-posed. Nevertheless, even in the former case, the synthesis of the optimal bounded control remains an open theoretical challenge. One might bypass this issue of the current control scheme by, for example, a very thorough use of the Pontryagin maximum principle for the synthesis of the control. In a more general view the reconstruction of the growth function in chemostat systems is already subject to problems of identifiability [], integrating genetic sequencing could provide a path for more certainty in model calibration.
Author Contributions
Conceptualization J.H., P.U.-S. and E.D.-L.Q., Formal analysis P.U.-S. and H.R.C., Supervision J.H. and E.D.-L.Q. Writing—review & editing J.H., P.U.-S. and E.D.-L.Q. Software P.U.-S. All authors have read and agreed to the published version of the manuscript.
Funding
This work was financed by project Thermomic ANR-16-CE04-0003 (France) and FONDECYT grant 1200355 and Basal Program CMM-AFB 170001, both from ANID (Chile).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data can be recovered from the following repository https://github.com/paus-5/Class-and-Track (accessed on 24 February 2021).
Acknowledgments
The authors would like to thank Alain Rapaport for the enriching exchanges and comments.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| AOB | Ammonia oxidizing bacteria |
| NOB | Nitrite oxidizing bacteria |
| OTU | Operational Taxonomic Unit |
Appendix A. Proofs of Properties of the System
Lemma A1.
For initial conditions , there exists positive scalars , , and such that solutions to (3) satisfy the following inequalities:
Proof.
Define . Computing one gets:
Define and consider the differential equation:
If there is a time interval H such that , then is constant and therefore bounded by the value of the solution in . If , then is a stable asymptotic equilibrium. Define . If then by the Picard-Lindeloff theorem , otherwise it would cross the solution of the initial value problem (A4) with starting point , therefore . The same reasoning may be applied if . One concludes that .
Now consider
From a comparison lemma (see chapter 3 []), the solution of (A5) is bounded by w and therefore .
Define , and and noting that and satisfies the same differential equations as the above reasoning may be applied and one has the desired bounds. □
Lemma A2.
For initial conditions , there exists a constant such that for every matrix A satisfying , the solutions of system (3) with growth rates given by (6), remain in Ω and are bounded.
Proof.
If any coordinate of the solution becomes zero, then it’s derivative is either zero, or positive. The bound will follow from Lemma 1.
Note that for any , if then , therefore all solutions of system (3) with initial condition remain in these planes. By the Picard Lindeloff theorem a solution starting in cannot cross these planes, therefore for . If , then . Therefore . If , let , by adding both inequalities from Lemma (1) and since one has , which in turn implies: . Define , and let A be a matrix such that . It follows that , and therefore:
Computing , and thus . Note that since , bound (A6) is valid for any time, and not only when . If then . Therefore . For the boundedness it suffices to notice from Lemma 1 that the sum of positive elements is bounded, therefore each element is bounded. □
Appendix B. Deduction of Equilibrium Points
Recall that , then thus, system (3) is rewritten as follows.
Recall the set of non-active coordinates. Let M be the matrix defined by taking out the columns of the identity matrix of size n. When M multiplies from the left it adds rows of zeros to the multiplied matrix in the coordinates. When matrix M multiplies from the right it takes out the columns of the multiplied matrix. When multiplies from the left it takes out the rows of the multiplied matrix. When multiplies from the right it adds columns of zeros to the multiplied matrix in the coordinates. This gives the following relationships:
From Equation (A7) equilibrium points satisfy:
note that therefore the coefficients play no role in Equation (A13) and one can rewrite Equation (A13) as
From Hypothesis 2 matrix has an inverse. This gives the following formula
Note as well that at the equilibrium, can be defined in terms of , and . This is done by adding Equations (A8)–(A10) which gives:
Appendix B.1. Both Functional Groups Are Present
The case where in each functional group remains at least one OTU is represented by Hypothesis 3. By replacing in Equation (A8) yields
For notation and indexing purposes it is useful to define (note ). Replacing (A15) in Equation (A8) reads as follows:
and so we get to formula (17) of the main text:
where
The same computations must be done with Equation (A9), which is structurally very similar to (A8). By replacing in Equation (A9) yields
It is again useful to define:
and one gets to expression:
where
Then by replacing (A21) in Equation (A30), and after some reordering, one gets a fifth degree polynomial for .
where
Appendix B.2. Washout of G2
The washout of is represented in Hypothesis 4.
Under this case note that depends only on . By replacing again in Equation (A7), one obtains the same computations as in the previous section, so retake Equation (A41)
Since then thus (A41) becomes:
Furthermore, so a quadratic equation for is obtained.
where
Appendix B.3. Jacobian of the System
Recall that is a function, then its derivative is where denotes the set of continuous linear mappings from X to Y, such that , where the linear mapping is evaluated at []. In the case where and , is the Jacobian matrix evaluated at point x, furthermore (where is the canonical j-th vector) is the j-th column of the Jacobian matrix evaluated at point x.
If f is linear then for any . In the case of the diag operator, we observe it is linear, therefore
In the case of bilinear mappings another formula holds, let be bilinear. Then . In the case of the function such that , one can see that .
The chain rule states that and , if then . Take the expression . It is clear that . By the chain rule:
Note finally that function h is symmetric, i.e., . Then going back to our system.
where For a fixed s, let then, . Let , then compute :
Again, for a fixed s, let , then compute :
Let the function containing the first components of function and the function containing the last components of function so one can write
One can see then that:
Now for a fixed x let , therefore . Let , then compute :
Again, for a fixed x let , then compute :
Finally note that:
Then the Jacobian of the system may be expressed as:
Appendix B.4. Stability Analysis with no Interactions
In this case the growth functions are
Coexistence:
Washout of
Washout
Jacobian of the system:
Appendix C. Tracking Problem Reformulation and Details
For applying the methods developed in []. Define the system state . Make the change of variables with are applied to system (25). The system may be rewritten then as:
Define:
with
and
Then the system (A71) can be rewritten as:
is the measured vector containing the OTU concentrations in time. The cost functional is given by
where and R are positive definite matrices. Since there is no interest in the final time . Q and R are taken as diagonal matrices, in that way the system can be reduced as shown below. Particularly after testing the model in the proof of concept and data, the Q and R matrices were and , respectively, with and .
Define the dynamic sequences for , as:
and for define as the solution of (A71) with . The control law is given by
where and are the solution to the differential equations:
Replacing the matrices of our problem
For certain entries of the dynamic the constantly zero function is a solution for them, implying by existence and uniqueness that they should be constantly zero. Then has non zero entries and has n non zero entries, explicitly:
where , and The equations for :
Inspecting the former equation one notices that if , is a solution of all non diagonal entries when R is diagonal. Furthermore, therefore, once again, by existence and uniqueness they should be constantly zero. Hence only the diagonal entries should be calculated.
For and the system reduces to:
Furthermore, the control law is given by
They were solved using standard backward numerical integration.
References
- West, S.A.; Diggle, S.P.; Buckling, A.; Gardner, A.; Griffin, A.S. The Social Lives of Microbes. Ann. Rev. Ecol. Evol. Syst. 2007. [Google Scholar] [CrossRef]
- Widder, S.; Allen, R.J.; Pfeiffer, T.; Curtis, T.P.; Wiuf, C.; Sloan, W.T.; Cordero, O.X.; Brown, S.P.; Momeni, B.; Shou, W.; et al. Challenges in microbial ecology: Building predictive understanding of community function and dynamics. ISME J. 2016, 10, 2557–2568. [Google Scholar] [CrossRef]
- Wade, M.; Harmand, J.; Benyahia, B.; Bouchez, T.; Chaillou, S.; Cloez, B.; Godon, J.J.; Boudjemaa, B.M.; Rapaport, A.; Sari, T.; et al. Perspectives in mathematical modelling for microbial ecology. Ecol. Model. 2016, 321, 64–74. [Google Scholar] [CrossRef]
- Song, H.S.; Cannon, W.; Beliaev, A.; Konopka, A. Mathematical Modeling of Microbial Community Dynamics: A Methodological Review. Processes 2014, 2, 711–752. [Google Scholar] [CrossRef]
- Boucher, Y.; Douady, C.J.; Papke, R.T.; Walsh, D.A.; Boudreau, M.E.R.; Nesbø, C.L.; Case, R.J.; Doolittle, W.F. Lateral Gene Transfer and the Origins of Prokaryotic Groups. Ann. Rev. Genet. 2003. [Google Scholar] [CrossRef]
- Ferrera, I.; Sánchez, O. Insights into microbial diversity in wastewater treatment systems: How far have we come? Biotechnol. Adv. 2016, 34, 790–802. [Google Scholar] [CrossRef] [PubMed]
- Lotka, A.J. Elements of Physical Biology. J. Am. Stat. Assoc. 1925. [Google Scholar] [CrossRef]
- Volterra, V. Fluctuations in the abundance of a species considered mathematically. Nature 1926. [Google Scholar] [CrossRef]
- Kot, M. Elements of Mathematical Ecology; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar] [CrossRef]
- Hernández-Bermejo, B.; Fairén, V. Lotka–Volterra representation of general nonlinear systems. Math. Biosci. 1997. [Google Scholar] [CrossRef]
- Perez-Garcia, O.; Lear, G.; Singhal, N. Metabolic network modeling of microbial interactions in natural and engineered environmental systems. Front. Microbiol. 2016, 7, 673. [Google Scholar] [CrossRef]
- Gopalsamy, K. Global asymptotic stability in a periodic Lotka–Volterra system. J. Aust. Math. Soc. Ser. B Appl. Math. 1985. [Google Scholar] [CrossRef]
- Aleksandrov, A.Y.; Aleksandrova, E. Convergence conditions for some classes of nonlinear systems. Syst. Control Lett. 2017, 104, 72–77. [Google Scholar] [CrossRef]
- Bucci, V.; Tzen, B.; Li, N.; Simmons, M.; Tanoue, T.; Bogart, E.; Deng, L.; Yeliseyev, V.; Delaney, M.L.; Liu, Q.; et al. MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses. Genome Biol. 2016, 17, 121. [Google Scholar] [CrossRef]
- Monod, J. Recherches sur la Croissance des Cultures Bacteriennes; Hermann: Paris, France, 1942. [Google Scholar]
- Dumont, M.; Godon, J.J.; Harmand, J. Species Coexistence in Nitrifying Chemostats: A Model of Microbial Interactions. Processes 2016, 4, 51. [Google Scholar] [CrossRef]
- Çimen, T. State-Dependent Riccati Equation (SDRE) control: A survey. IFAC Proc. Vol. (IFAC-PapersOnline) 2008, 41, 3761–3775. [Google Scholar] [CrossRef]
- Dumont, M.; Harmand, J.; Rapaport, A.; Godon, J.J. Towards functional molecular fingerprints. Environ. Microbiol. 2009, 11, 1717–1727. [Google Scholar] [CrossRef]
- Sharma, B.; Ahlert, R. Nitrification and nitrogen removal. Water Res. 1977, 11, 897–925. [Google Scholar] [CrossRef]
- Khalil, H.K. Nonlinear Systems; Prentice Hall: Upper Saddle River, NJ, USA, 1996. [Google Scholar] [CrossRef]
- Pavlou, S. Computing operating diagrams of bioreactors. J. Biotechnol. 1999, 71, 7–16. [Google Scholar] [CrossRef]
- Harmand, J.; Lobry, C.; Rapaport, A.; Sari, T. The Chemostat: Mathematical Theory of Microorganism Cultures; John Wiley & Sons: Hoboken, NJ, USA, 2017. [Google Scholar] [CrossRef]
- Khin, T.; Annachhatre, A.P. Novel microbial nitrogen removal processes. Biotechnol. Adv. 2004, 22, 519–532. [Google Scholar] [CrossRef]
- Wagner, M.; Loy, A.; Nogueira, R.; Purkhold, U.; Lee, N.; Daims, H. Microbial community composition and function in wastewater treatment plants. Antonie Van Leeuwenhoek 2002, 81, 665–680. [Google Scholar] [CrossRef]
- Harmand, J.; Lobry, C.; Rapaport, A.; Sari, T. Optimal Control in Bioprocesses: Pontryagin’s Maximum Principle in Practice; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar] [CrossRef]
- Çimen, T.; Banks, S.P. Nonlinear optimal tracking control with application to super-tankers for autopilot design. Automatica 2004, 40, 1845–1863. [Google Scholar] [CrossRef]
- Dumont, M.; Rapaport, A.; Harmand, J.; Benyahia, B.; Godon, J.J. Observers for microbial ecology-How including molecular data into bioprocess modeling? In Proceedings of the 2008 16th Mediterranean Conference on Control and Automation, Ajaccio, France, 25–27 June 2008; pp. 1381–1386. [Google Scholar]
- Ugalde-Salas, P.; Harmand, J.; Desmond-Le Quéméner, E. Asymptotic Observers and Integer Programming for Functional Classification of a Microbial Community in a Chemostat. In Proceedings of the 2019 18th European Control Conference (ECC), Naples, Italy, 25–28 June 2019; pp. 1665–1670. [Google Scholar]
- Wiesmann, U. Biological nitrogen removal from wastewater. In Biotechnics/Wastewater; Springer: Berlin/Heidelberg, Germany, 1994; pp. 113–154. [Google Scholar] [CrossRef]
- Loisel, P.; Harmand, J.; Zemb, O.; Latrille, E.; Lobry, C.; Delgenès, J.P.; Godon, J.J. Denaturing gradient electrophoresis (DGE) and single-strand conformation polymorphism (SSCP) molecular fingerprintings revisited by simulation and used as a tool to measure microbial diversity. Environ. Microbiol. 2006, 8, 720–731. [Google Scholar] [CrossRef]
- Dochain, D. State and parameter estimation in chemical and biochemical processes: A tutorial. J. Process Control 2003, 13, 801–818. [Google Scholar] [CrossRef]
- Cartan, H. Calcul Différentiel; Hermann: Paris, France, 1967; Volume 1. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
