Microbial Interactions as Drivers of a Nitrification Process in a Chemostat

This article deals with the inclusion of microbial ecology measurements such as abundances of operational taxonomic units in bioprocess modelling. The first part presents the mathematical analysis of a model that may be framed within the class of Lotka–Volterra models fitted to experimental data in a chemostat setting where a nitrification process was operated for over 500 days. The limitations and the insights of such an approach are discussed. In the second part, the use of an optimal tracking technique (developed within the framework of control theory) for the integration of data from genetic sequencing in chemostat models is presented. The optimal tracking revisits the data used in the aforementioned chemostat setting. The resulting model is an explanatory model, not a predictive one, it is able to reconstruct the different forms of nitrogen in the reactor by using the abundances of the operational taxonomic units, providing some insights into the growth rate of microbes in a complex community.


Introduction
Microbial communities and their interactions play a central role in the understanding of microbial ecosystems [1], and a current challenge is integrating genetic sequencing data in a deterministic modelling framework [2,3]. Using the terminology from the thorough review in current methodologies on the deterministic modelling approaches of microbial community dynamics presented by Song et al. [4], this articles deals with population-based approaches where species are taken as the interacting units.
The classical ecological concept of species and niche in the microbial world is an elusive one: in the macro world one can clearly differentiate one species from another for reproductive reasons and their ability to give birth to offspring. In the case of bacteria and archea, reproduction goes simply by binary fission and exchange of some functional genes (e.g., the ability to synthesize or metabolize substances) can be acquired in evolutionary scale through lateral gene transfer [5]. Therefore as an ecological problem is hard to define precisely the 'niche' of 'microbial species'. These obstacles can be circumvented by considering the microbiologist concept of operational taxonomic unit (OTU) based on the clustering of organisms sharing similar sequences of the 16S rDNA marker gene. In the past years considerable efforts have been made to measure the bacterial community composition. Tests such as fluorescence in situ hybridization (FISH), polymerase chain reaction (PCR) dependent techniques, and PCR independent techniques for the analysis of DNA have become a standard tool for studying microbial diversity [6]. The contribution of this article is a new method to integrate the microbial community measurements in chemostat models, based on any sequencing or fingerprinting technique that can quantify the species abundances over time. In other words, while most models used in bioengineering are functional-in the sense they consider only one species per biological reaction considered-this work is an attempt to merge classical population-based models used in ecology and those used by engineers in biotechnology.
Interactions lie at the heart of ecology. Lotka [7] and Volterra [8], independently, presented a 2 dimensional dynamical system to model prey-predator relationships, now known as the Lotka-Volterra (LV) equations. The model is very rich from a mathematical standpoint, and is also a classic equation to study in Mathematical Ecology [9]. Extensions of the Lotka-Volterra model have derived what is now known as generalized Lotka-Volterra (gLV) models [10] shown in Equation (1):ẋ where x i represents the species abundance, µ i the intrinsic growth rate of the species, and the terms a ij reflect the effect of OTU j on the growth of OTU i. The equation states that the growth rate of x i is proportional to x i , but this proportionality constant depends on its intrinsic growth rate multiplied by the sum of all interactions affecting it. Note that if there are no type of interactions (a ij = 0), one recovers n uncoupled linear differential equations, and thus the solution becomes x i (t) = x i (0) exp(µ i t), that is exponential growth on time.
The diagonal terms a ii are known as intraspecies interaction, while the off diagonal terms are known as the pairwise interspecies interactions. Noting the signs of pairs (a ij , a ji ), the classical ecological relationships of mutualism or cooperation (+, +), commensalism (+, 0), predation or parasitism (+, −), competition (−, −), and ammensalism (−, 0) can be recovered [11]. Model (1) has been thoroughly analysed, even when the coefficients µ i and a ij are time dependent and exhibit periodicity (which models seasonal traits) [12,13]. The gLV model has been used in microbial ecology to some degree of success to study the gut microbiome of mice infected with C. difficile [14]. However, the quadratically growing number of parameters to describe interactions naturally entails problems of identifiability if the data set is not large enough, or the system has not been sufficiently perturbed. On a more conceptual ground, the interaction coefficients of a gLV model do not represent mechanistically anything, so even if a model correctly predicts the microbial community dynamics, it might not add to the understanding of what could be physically or biologically taking place. These observations led us to develop what can be considered the core contribution of this article, which is to study the growth rate of each species in a mixed culture: we reconstruct the shape of their growth rate, instead of trying to fit a particular function (such as the gLV equations). As Monod himself commented when developing the growth law that bears their name is that any function with the same shape (monotone, concave, and bounded on the substrate) would have served [15], in this spirit we formulate the question: what is the shape of the growth functions of multiple species developing together?
As a departing point, the work of Dumont et al. [16] is presented in Section 2. They modelled a chemostat experiment where nitrification takes place by considering a glV model coupled with a substrate limited growth expression (µ i is no longer a constant, but the classic Monod expression) and fitted their model using absolute abundances of the major OTU identified by molecular fingerprints obtained by single-strand conformation polymorphism (SSCP). Section 3 inspects the model through a mathematical analysis. Some interesting outputs of this analysis are that the number of possible equilibrium points grows exponentially with the number of species, coexistence can be achieved within the same functional group, and bi-stability may arise. In Section 4, the concept of interaction function is developed such that it generalizes the gLV model. For approximating the interaction function a method of optimal control theory was adapted: The growth rate of each species is modulated by a constrained regular control of the system, thus the growth rate of each OTU is corrected in order to fit the experimental data. The regular control is composed of a feedback part on the species state variable, and a feed forward part, or tracking, on the measurement of abundances of each species; the method involves solving state-dependent Ricatti equations [17]. In Section 5, the methodology from Section 4 is applied to the data from the experiments performed by Dumont et al. [18] and not just to the most abundant species as it was the case of the model analysed in Section 3 [16]. This approach explicitly assumes that dynamics of complex ecosystems are driven by interactions, that are the results of feedback loops of each species on the growth rate of others. The method shows that by following the community dynamics one can propose a growth rate that reconstructs the substrates dynamics, however this cannot be considered a predictive model, but rather a explicatory model. The article ends with a discussion on the scope of applicability and perspectives of the method.

Model Definition
Notations used throughout the article: 13. µ = (µ 1 (s, x), . . . , µ n (s, x)) vector containing the growth function of every OTU. 14. D: dilution rate of the continuous reactor in [1/day]. May depend on time D = D(t). 15. y i : yield of grams of OTU i formed per gram of substrate consumed. 16. y s i /x j : yield of grams of substrate s i consumed/produced per gram of OTU j formed.
If negative it represents consumption, if positive it represents production. 17. Y: matrix containing all yields such that Y ij = y s i /x j . 18. For integers m 1 and m 2 and a ∈ R, a m 1 ×m 2 represents a matrix of m 1 rows and m 2 columns with a in every entry. 19. Let m be an integer then I m is the identity matrix of size m. 20. Let M be a matrix, then M i• represents the i-th row of matrix M. 21. Let S be a finite set with m ∈ N elements. Then |S| := m. 22. Given a vector v = (v 1 , . . . , v n ) ∈ R n , the function diag(v) stands for:

Stoichiometric Equations
A cascade (bio)reaction process is considered. Suppose n different OTU are present in the chemostat. A two step cascade reaction refers to the situation where a group of microorganisms (G 1 ⊂ [n]) consumes a substrate s 1 and produces s 2 and biomass, while another group of microorganisms (G 2 ⊂ [n]) consumes s 2 and produces s 3 and biomass. G 1 and G 2 are called functional groups. The number of organisms in each functional will be denoted n 1 and n 2 , respectively, that is |G 1 | = n 1 and |G 2 | = n 2 . This work treats the case when G 1 and G 2 are disjoint sets: Hypothesis 1 (H1). Sets G 1 and G 2 satisfy: The situation is described as simplified Reactions (R1) and (R2). The reactions are simplified in the sense that they do not attempt to represent a balanced chemical reaction, rather they represent the direction of the bioprocess and the proportions of different consumed and formed compounds of interest. The terms y i are known as yields, they represent the quantity of g of biomass produced per g of substrate consumed by OTU i. For example, in the case of reaction (R1), one gram of s 1 is consumed, one gram of s 2 and y x i /s 1 grams of dry biomass of OTU i are produced.
However for expressing the system of differential equations further below, the terms y s i /x j are used. They express the grams of substrate s i consumed (negative sign) or produced (positive sign) per gram of OTU j formed. They are related to y i as seen in Table 1. This defines the stoichiometry matrix Y ∈ R 3×n , such that Y ij = y s i /x j .

Yields per Biomass Formed
Furthermore, for each i ∈ [n], OTU i is characterized by its process rate (also known as growth function) µ i (s, x). Notice that for being as generic as possible, the growth rate may be a function of the whole state in order to model the influence of all OTU on the growth rates of others.
An example of this process is the nitrification process where group G 1 is known as Ammonia oxidizing Bacteria (AOB), and group G 2 is known as Nitrite oxidizing Bacteria (NOB) [19].

Mass Balance Equations
Consider the scenario of a continuous and homogeneous reactor: the input flow is the same as the output flow, with a dilution rate D. The input flow contains a concentration s in of substrate s 1 . Each OTU grows at a rate µ i (s, x). System (3) represents this situation. A specific case of µ i (s, x) is given in the next subsection.
System (3) can also be written in a more compact form using the stoichiometric matrix Y and the diag operator.

Kinetic Equations
In the work of Dumont et al. [16], the growth rates seen in Equation (6) were calibrated against experimental data for the two most abundant OTU of each functional group.
a ij x j ∀i ∈ G 2 The a ij x j accounts for pairwise interactions affecting the growth rate of each OTU, while the termμ i s j K i + s j is a Monod growth expression, whereμ i represents the maximum growth rate, and K i the half saturation constant [15]. Note that if every a ij = 0, then one recovers a classic substrate limited growth. Let A denote the matrix with entries a ij hereafter referred to as the interaction matrix. Dumont et al. did not analyse their model but simply provided several simulations using parameter values identified from experimental data. The following section of this article deals with the mathematical analysis of model (3) with growth rates given by (6).

Mathematical Analysis
The system of Equation (3) is defined in the region Ω := {(x 1 , . . . , x n , s 1 , s 2 , s 3 ) ∈ R n+3 |x 1 , . . . , x n , s 1 , s 2 , s 3 ≥ 0} First, sufficient conditions on the interaction matrix for the system to be well posed are established: meaning that solutions remain bounded and non-negative in time, this ultimately implies that the solution exists for every t ≥ 0 [20].
Second, the equilibria of the system are derived. Possible equilibrium points for this system grow exponentially with the number of OTU considered (n). Stability is not analytically addressed, a numerical scheme calculating every equilibrium point and the system's Jacobian eigenvalues at the equilibrium point was implemented for studying the system.

Properties of the System
A bound on the norm of the interaction matrix that depends on the initial conditions and parameters one establishes that solutions will remain positive and bounded. Lemma 1. For initial conditions (x 1 (0), . . . , x n (0), s 1 (0), s 2 (0), s 3 (0))) ∈ Ω, there exists positive scalars M 1 , M 2 , and M 3 such that solutions to (3) satisfy the following inequalities: The proof can be seen in Appendix A. A bound on the norm of A is found such that every matrix A respecting the bound, guaranties that Ω is a positively invariant set.
The proof can be seen in Appendix A. The importance of Lemma 2 is that by restricting the norm of matrix A the system is wellposed, meaning that the solutions can have biological and physical sense (there is no such thing as negative concentrations). Particularly, one has that A ∞ = max 1≤i≤m ∑ n j=1 |a ij |, which in this context implies that a bound on the sum of the absolute value of the interaction terms that affects each species allows the system to be well-posed. Note, however, this is a sufficient condition, thus the range of values matrix A can sustain for the system to remain well-posed may be considerably larger.

Equilibrium Points
In this section, analytical expressions for equilibrium points are shown. However, no analytic expression concerning the stability of such points is presented. In the following pages the reader will appreciate that the expressions of the equilibrium points are not simple, consequently replacing them in a 5 × 5 block matrix and calculating eigenvalues resisted an algebraic treatment. To answer the question of stability a numeric scheme is used by evaluating the Jacobian at the equilibrium point. At the end of the section an algorithm is provided for exploring all the possible equilibria. All the computations for deriving the equations of this section can be found in Appendix B.
Definition 1. An equilibrium point (or steady state) is a point (x eq , s eq ) ∈ Ω so that the right hand side of Equations (11)-(14) equals zero.
Observe that equilibrium points are by definition non-negative so the state variables can have physical meaning. For studying the cases where x eq contains zero valued entries, the set of non-active coordinates is defined as follows: Definition 2. Given an equilibrium point (x eq , s eq ) of system (3), then the set of non-active coordinates J ⊂ {1, . . . , n} is defined as: J = {j 1 , . . . , j m : x eq j i = 0, i ∈ [m]}. n act 1 and n act 2 denote the number of positive entries of x eq of functional groups G 1 and G 2 , respectively. n act = n − m denotes the total number of positive entries of x eq . The active point x act ∈ R n act is defined by the positive entries of x eq . Analogously, the functions f act (s) and µ act (x, s) are defined by the positive entries of x eq . The active interactions A act is defined as the matrix A without the J rows and columns. The active stoichiometry matrix Y act is the matrix Y without the J columns.
In order to derive the equilibrium points, it is desirable an invertible A act matrix. Therefore in what follows of the work it is assumed that matrix A and some of its submatrices have an inverse, this is stated properly in Hypothesis 2.
Hypothesis 2 (H2). Let A be the interaction matrix of size n ∈ N and S be a proper subset of [n] with |S| = m. Then the matrix B ∈ R (n−m)×(n−m) defined by taking out the S rows and columns of matrix A is invertible.
Assuming Hypothesis 2 a formula for the active points is derived from Equation (11): Note as well that at the equilibrium, s 3 can be defined in terms of s 1 , s 2 and s in . This is done by adding Equations (12)-(14) which gives:

Both Functional Groups Are Present
The case where in each functional group remains at least one OTU is represented by Hypothesis 3.
By replacing Equation (15) in Equation (12) s 2 can be written as a function of s 1 : Then by replacing (17) in Equation (13), one gets a fourth degree polynomial for s 1 .
Formulae for coefficients b 1 , b 2 , b 3 , a 0 , a 1 , a 2 , a 3 , a 4 can be found in Appendix B. The equilibrium point can be calculated from the solutions of the system of Equations (15)-(18) with non negative coordinates. If the system only provides solutions with at least one negative entry then the set J cannot define an equilibrium point.

Washout of G 2
The washout of G 2 is equivalent to Hypothesis 4.
Under this case note that f act (s) depends only on s 1 . Therefore when Equation (15) is replaced in (12), one obtains a quadratic equation for s 1 : where a i can be found in Appendix B. Since x i = 0 ∀i ∈ G 2 then from Equation (14).
In this case the equilibrium point can be calculated from the solutions of the system of Equations (15), (16), (A44) and (21) with non-negative coordinates. If the system only provides solutions with at least one negative entry then the set J cannot define an equilibrium point.

Washout
The washout equilibria means x i = 0 for every i ∈ {1, . . . , n 1 }. This is equivalent to Hypothesis 5. Note that the structure of a cascade reaction implies that if G 1 gets washed out, then so is G 2 .
From Equation (12), one gets s in = s 1 then (16) implies The equilibrium is then given by 0 1×n s in 0 0 . All the former discussion leads to a potential number of (2

Stability: Operating and Ecological Diagrams
In this subsection the stability of the equilibrium points is addressed. Operating and ecological diagrams are created from this stability analysis. Both are an illustrative way of representing the long term behaviour of a reactor depending on operating parameters, namely D and s in : In a D − s in plane different zones representing the stability properties of system (3) are identified [21].
For checking local asymptotic stability of the equilibrium points, the Jacobian of the system is provided and evaluated at each of these points. The resulting matrix's eigenvalues must have negative real part. A general formula for this Jacobian is presented in Appendix B (see (A70)).
Algorithm 1 summarizes this procedure.

Algorithm 1: Algorithm for Evaluating the Possible Equilibrium Points of System (3).
Data: Operating and ecological diagrams are created by running Algorithm 1 for different pairs (s in , D). In the case of operating diagrams [22] (OD) all the pairs (s in , D) are regrouped such that the points of the set P represent when partial nitrification (PN), complete nitrification (CN), washout (WO), or a combination of them may arise [23]. PN refers to the state when nitrite (s 2 ) accumulates because the OTU of G 2 are washed out and thus no conversion from s 2 to s 3 takes place. On the contrary CN is when nitrate (s 3 ) accumulates because of the presence of OTU of G 2 .
In the case of ecological diagrams (ED) the pairs are regrouped such that the points in the set P have the same non-active coordinates. In other words, instead of representing areas where either CN, PN or WO take place, ranges of pairs (s in , D) where species coexist are represented. ED provide more information than OD, in the sense that one can deduce the latter from the former.
A first example using operating diagrams is presented to illustrate how adding interactions in a model consisting of 1 OTU in G 1 and 1 OTU in G 2 may lead to very different outcomes.
The important question of the existence of limit cycles was not resolved in this work. In the numerical analysis of this model at least one stable equilibrium was found for any choice of parameters. This obviously does not exclude the existence of limit cycles, but to what concerns the authors' intuitions there always seems to be at least one stable equilibrium point.

Case Study 1: 1 AOB and 1 NOB
Consider the case where n 1 = 1 and n 2 = 1. The operating diagrams when no interactions take place (A = 0) and a non-zero interaction matrix are presented. When A = 0 Algorithm 1 is no longer valid (A is not invertible), nevertheless the stability analysis is much simpler and is given in the Appendix B section. The interaction matrices are shown in Figure 1a,b, the rational behind the second choice was to force a very strong interaction of x 1 on x 2 and observe its effects. The biological reason behind a negative microbial interaction might be the release of a toxin by x 1 that affects x 2 [1], or in this case it might represent competition for oxygen; we stress the fact that gLV interactions do not explicitly account for mechanistically anything they just try to represent an ecological relationship taking place. The rest of parameters can be seen in Table 2.
The operating diagrams can be seen in Figure 2, note how partial nitrification (washout of G 2 ) of Figure 2b is much bigger when compared to Figure 2a. The shape of the PN region in Figure 2b is somewhat unintuitive, because at a constant dilution rate (0.24 day −1 for example) and an increasing s in , one passes from a PN zone, to a CN zone, and then back again to a PN zone. The mathematical explanation lies in the fact that x 1 also increases with s in , and the affine part (1 + a 21 x 1 + a 22 x 2 ) of the growth function of x 2 plays a bigger role than substrate limitation ( s 2 s 2 +K 2 ).   (3).

Case Study 2: 2 AOB and 2 NOB
Case study 2 is based on Dumont et al. [16] model parameters. They proposed a distribution of parameters obtained from a Bayesian estimation method. Their fit describes well the dynamics of the two most abundant OTU in each functional group, but it still fails to capture the measured substrates dynamics. Kinetic parameters of case study 2 can be seen in Table 3. The estimated interaction matrix is shown in Figure 3a. A second matrix is presented, which is obtained by the sign change of coefficient a 11 (Figure 3b), and finally a third one is obtained by using a positive value for a 12 (Figure 3c). The idea is to show that qualitatively different outcomes can be obtained by changing one interaction at a time.
026 100 The ecological diagrams are presented in Figure 4, where the legend indicates the species surviving in the zone of the respective colour. In Figure 4b, the system exhibits bistability (it is represented in numbering as (1) and (2) of the different possible equilibriums). Note how every zone in Figure 4b has two stable equilibria, meaning that the outcome of the system is determined by its initial conditions, particularly interesting is the green zone where either x 1 , x 3 coexist or only x 2 remains, because in operational terms this means that either PN or CN may take place. When compared to Figure 4a, one can see that this change in the interactions of the microbial community can dramatically change the outcome of the reactor in a large operating zone.  One can see that coexistence in the same functional group is never attained in Figure 4a,b, whereas in Figure 4c x 1 and x 2 , both AOB, coexist in either partial or complete nitrification. That means that the competitive exclusion principle [22] (CEP) does not hold. The CEP roughly states that if two species are growing on the same limiting resource, and their growth laws only depend non decreasingly on the resource, then only one of them will survive in the long run. This is interesting in light of reports on wastewater treatment plants where coexistence between species in nitrifying reactors has been shown [24], thus implying that a more complex growth law (as shown in here) or model structure involving other biological processes is required to include microbial diversity in mathematical models.

Remarks
Model (3) serves to illustrate that by considering a more complex growth rate that tries to model ecological interactions one might explain differences in reactors operating under similar conditions. It also shows a new mechanism by which the CEP no longer holds and which explains how multiple stable equilibria may appear. Since the gLV model discussed fails to completely capture the dynamics observed in the chemostat experiments [16], the next section proposes a new approach to study interactions.

Generalized Approach for Modelling Interactions
In the following an explanatory model (as opposed to a predictive model) is developed based on the hypothesis that interactions might be driving the nitrification process. In the previous sections interactions were modelled as an affine function of the OTU concentration that multiplies a substrate dependent growth equation. More generally the interaction function represents how the growth rate of species i is affected by the concentration of other species, x: Given a vector (v 1 , . . . , v n ) the interaction function I is denoted as: Let f i (s) be a bounded, positive, and continuous function of s (e.g., Monod, Haldane). The growth equation of OTU i becomes: Note f (s) := ( f 1 (s), . . . , f n (s)) .
Since the growth of a single strain in batch experiments is driven by the substrate concentration, when no interactions are present one should recover expression f i (s). Therefore if there are no interactions then I i (x) = 1. From this hypothesis, note that lim There is an open set Ω ⊂ R n such that I ∈ C 1 (Ω).
Note J I (x) the Jacobian matrix of function I, then a first order approximation of I(·) centred atx gives: . Whenx = 0 one recovers the growth expression from the previous section (Equation (6)) implying that J I (0) can be seen as the interaction matrix from model (3).

Unravelling the Interaction Function
Suppose that the functions f i (s), and the yields y i are well-known. By using experimental measurements of x, represented by z(t), the objective is to reconstruct function I(x). For doing so, the terms I i (x) are replaced by controls u i (t), thus I i (x(t)) = u i (t). A control law is obtained by solving a nonlinear optimal tracking problem. Consider the observable system (25), with y(t) = x(t) being the output, because we are observing measurements coming from genetic sequencing.
Consider the weighted norms defined by positive definite matrices Q and R, represented by · Q and · R , respectively, andū > 0. The optimal tracking problem is defined as: The control u(t) is intended to drive the system to be near a desired output z(t), which in this context are the measurements of the concentrations of OTU. The term (u − 1) R , was added for two reasons: • First, because the interest is testing the idea that interactions could be driving the system. Therefore adding a penalization in the objective function for each control to remain near 1 can be seen as an attempt to explain data without any interaction. In other words, if the control terms are found to drift from 1, it means that interactions are necessary to explain the system dynamics. • Second, to force a regularized control. Otherwise note that u is linear in (25), therefore if the integral cost does not have a non-linear expression of u the optimal control will be of a bang-bang type with possibly singular arcs [25]. Since the objective is to find a differentiable expression of I(x) the addition of the regularization term is deemed necessary.
The problem of approximating the solution of the system to a desired reference (z in this case) is called the optimal tracking problem. For solving such a problem the approach developed by Cimen et al. [17,26] was adapted to our problem. The method proposed involves the resolution of Approximating Sequences of Ricatti Equations (ASRE). It consists of iteratively calculating trajectories of System (25) with a certain control law to later feed a non-autonomous Ricatti differential equation with the resulting trajectory. Then, a new control law that uses the solution of the Ricatti equation is proposed and a new trajectory is calculated. The iteration is stopped when a convergence in the output or the control is observed.
The control term should remain positive for the system to be well posed (no negative states), and an upper bound was added to represent the fact that life cannot grow infinitely fast. The tracking problem does not consider a constrained control. Nevertheless, the methods of Cimen et al. [26] were directly used with an explicit constraint in the synthesis of the control. Even though this is probably suboptimal when the control reaches its bounds, one at least is certain about its optimality when the control never reaches its constraints.
Another departure from their method is that in their formulation an approximation of the dynamics for calculating the trajectories is used. They proved that such a linearisation converges to the original dynamics. In the case here presented the linearisation of the dynamics was not necessary.
The change of variable u i (t) = v i (t) + 1 is used for technical reasons explained in Appendix C.
This in turn implies v i (t) ∈ [−1,ū − 1]. The feedback control v(t), will be of the form v(t) = −R −1B (t) P (t)x(t) − s f (t) . Where matrix P(t) and vector s f (t) solve differential equations. In Appendix C it is proved that, thanks to the structure of the system, one only needs to calculate 2n differential equations for the synthesis of the control, instead of (n + 3) 2 + (n + 3) that would imply the direct application of the method, which renders the method-at least theoretically-scalable for a growing number of OTU.

Proof of Concept
The approach was tested with data generated by simulating model (3) using the parameters of the case study 1 ( Table 2) with interaction matrix given by (Figure 1b). In the operating diagram of the same case ( Figure 2b) the red zone implies complete nitrification, while the green zone means partial nitrification. For integrating the former phenomena in the simulation, the system was simulated for 300 days, and perturbed at day 150 from the CN zone ((s in , D) = (1.25, 0.24)) to a PN zone ((s in , D) = (1.95, 0.24)). Simulations can be seen in Figure 5. Note how from day 150 the NOB population (OTU 2) represented in Figure 5b decreases, which in turn implies a decrease in s 3 , as seen in Figure 5c. In the case where no interactions take place, the OD seen in Figure 2a implies that s 3 would have accumulated all along the trajectory, since the perturbation still remains in the CN zone.  For the tracking procedure the functions f i (s) and the yields y i were the same as those used for simulating the synthetic data (parameters in Table 2) and the control is meant to account for the interaction term. The Q and R matrices were λ 1 I n 1 0 0 λ 2 I n 2 and I n , respectively, with λ 1 = 10 −4 and λ 2 = 10 −5 in order to better track the NOB trajectories, since they are less abundant. The values were obtained by trial and error, by using a single λ for both functional groups, beginning with λ = 1, in which case one can see how the optimal control becomes u = 1, thus no tracking is performed. Further diminishing the value from 10 −5 adds too much noise to the control, without significant gains on the quality of the tracking. The results of the procedure to identify interactions can be seen in Figures 6 and 7. Figures 6a and 7a,b show the total biomass concentration, and the trajectories for the OTU belonging to G 1 and G 2 , respectively. It can be seen that the method approaches well the trajectories of the OTU, with a better result for the AOB community, which can be explained by the one order of magnitude difference in their concentrations (which in turn is a consequence of the one order of magnitude difference in their yields). The metabolites concentration represented in Figure 6b are in accordance with the simulated: The method is able to reconstruct the metabolites trajectories from the community measurements. Figures 8 and 9 show the controls and the corrected growth rate for each functional group. The control for each functional group can be seen in Figures 8a and 9a. Note that from the structure of a quadratic regulator, since there is no cost in the final state, the end value is always 1. Figures 8b and 9b show the resulting growth rates for AOB and NOB, respectively, without the control u i (t). Figures 8c and 9c are the complete expression that determines growth rate, that is f i (s(t))u i (t)x i (t). Note how little the shape changes with respect to Figures 8b and 9b, which might mislead the reader to conclude that the control had reduced effects in the dynamics. The way out of this conundrum is to remember that the control's effects are already included in x i (t) and s i (t), and thus in expression f i (s(t))x i (t).     A final comment on the identifiability of the interaction terms. Even though one might propose a growth rate with the tracking control u(t) that accurately replicates the OTU trajectory x(t), retrieving the original interaction coefficients from the obtained control for this example was not possible. The former was tried by minimizing function dt with a non linear optimization solver for a 1000 initial random guesses for matrix A. If one also takes into accountμ i and K i as parameters to fit this adds even more degrees of freedom, thus suggesting that the identifiability of growth functions (6) in model (3) might be very low.

Application
The tracking problem was applied to data coming from a nitrification process with experimental conditions described in [27]. For exploring the hypothesis of interactions as drivers of bioreactors performance environmental conditions should be kept as constant as possible. Therefore only data from day 183 onwards was used because a change in the operating temperature happened at that point, which is known to have an effect on kinetics. For choosing which species belong in which functional group, the procedure described of Ugalde-Salas et al. [28] was used. From day 183 to day 315, 31 OTU were identified in the G 1 group (AOB) and 5 in the G 2 functional group (NOB).
A first example of the procedure is performed when the classified OTU are regrouped in their assigned functional groups by adding their concentrations. A 5 dimensional dynamical system is obtained, thus there are only two interacting functional biomasses: this case is structurally the same as in the proof of concept, but here a real dataset is used. The same procedure is applied where no regrouping occurs and the system state grows to 39.
The knowledge of functions f i (s) was based on a study of nitrification's kinetic parameters [29]. Particularly given the system's ammonium and nitrite concentration a Monod function (Equation (27)) was used for G 1 and G 2 with parameters given in Table 4 calculated from the equation of Table 2 of the same article. The yields were fitted to match the nitrogen mass balances. The Q and R matrices were the same as in the proof of concept section, that is λ 1 I n 1 0 0 λ 2 I n 2 and I n , respectively, with λ 1 = 10 −4 and λ 2 = 10 −5 , because data lie in the same order of magnitude than synthetic data. For the reader to gain understanding of the situation, a simulation of the system using the experiments operating parameters (D and s in ) is presented without control (i.e., u(t) = 1) in Figure 10 nitrate (s 3 ) accumulates all along the trajectory, but when compared to data it is clear that s 3 stops accumulating after a while.
When applying the tracking method one obtains the simulation that can be seen in Figures 11 and 12. The method captures the tendencies of the measured substrates as seen in Figure 11b. The tracking of each functional group G 1 (AOB), and G 2 (NOB) can be seen in Figure 12a,b, respectively.
The growth rates of each functional group are shown in Figure 13. Note in the case of AOB (Figure 13a) the resulting growth rate shows a noisy curve formed by pulses. The behaviour of the NOB community (Figure 13b) is qualitatively very similar with somewhat stronger pulses and less noise. The former is to be expected since more OTU were regrouped to compose the AOB biomass, therefore more noise sources were added.     The same procedure is applied without regrouping. The results on total biomass and metabolites are shown in Figure 14a,b, respectively. Both patterns still fit the data, but to a lesser degree of precision when compared to Figure 11. This can be explained by inspecting Figures 15-20. First note the absolute error of the tracking for each of the OTU in the AOB community (Figures 15b-19b), almost every point lies below 0.015 [g/L], implying that the method might not be able to track below that threshold for the members of the AOB community. The former ultimately implies that the most abundant OTU are better tracked, thus the information contained in the least abundant species is not integrated in the model. Notice that the error for the NOB community is lower (Figure 20b), almost every point lies below 0.005 [g/L], this can be explained in the one order of difference in the entry of matrix Q for the AOB and the NOB community. It may be the case that using appropriate weight matrices that account for the difference between OTU abundances could help in this aspect; in that sense only one rational was tested (inverse of the mean abundance of each OTU in the diagonal entries of matrix Q) and did not improve the results. When looking at the growth rates (Figures 15c-20c) one again observes pulses for each OTU. Finally, note that most OTU were present only for a fraction of the experiment's duration.
In both cases, the regrouped and individual tracking, the growth rate varies strongly, raising the question whether the observed pulses are emerging from interactions within the microbial community. When growth rates are compared to the proof of concept section it seems doubtful that a linear pairwise interaction model such as the gLV model could capture the complexity of the particular chemostat analysed. Perhaps these interactions are not constant through time (as opposed to the gLV model) or a different interaction function should be thought of. However the former questions cannot be fully clarified here, because the quality of the genetic sequencing from molecular fingerprints might not be the best when compared to more recent techniques, thus it is unclear if the pulses are due to noise of the measurements.
The interpretation of the correction term as interactions is not the only possible reading. In other contexts the correction term might also be interpreted as a non accounted phenomena ranging from environmental factors (e.g., temperature, pH) to other biological factors (viruses, flock formation, pathogens). Alternative hypotheses for explaining the observed patterns in the microbial community should be considered as well.

Conclusions and Perspectives
Over the last decades, advances in genetic sequencing and microbial ecology have opened a gap for modellers in biochemical processes to integrate this valuable information. Considering the success of mass balance models to predict and pilot bioreactors, new models should be built upon them, or at least be compared against them. This article exposed what can be gained from combining population-based models as used in ecology with functional group based approaches as used in bioengineering. The analysis of the gLV model proposed by Dumont et al. [16] already shows that such a combination can give way to models that include bi-stability, coexistence within a functional group, and unintuitive operational insights such as raising the input ammonium s in to achieve partial nitrification. The increased number of parameters of this particular model obviously hinders its potential application, but it surely helps to illustrate what can be gained by joining both types of models. The mathematical analysis focused on the particular case of pairwise interactions, which can be seen as a first order approximation of the introduced concept of interaction function. This opens the question for a broader class of interaction functions that could well represent complex microbial ecosystems, particularly bioreactors.
With that line of reasoning, in order to understand what this interaction function should look like, a data-driven approach was presented. It can be simply described as correcting the growth rate expression of each individual species in a mass balance model by explicitly assuming a control loop on the growth rate depending on the species state variables and the measured abundances. The reconstructed growth rates seem to consist of pulses, suggesting that a form of a possible interaction function should reproduce this behaviour. However, the former questions cannot be fully clarified here, because the identification and quantification of OTU from single-strand conformation polymorphism (SSCP) as performed by Dumont et al. [18] might not be optimal. Indeed, the determination of OTU from SSCP profiles rely on the identification and quantification of peaks and miss many OTU that appear only as background noise [30]. This type of fingerprinting is thus less accurate than more recent methods such as the sequencing of the 16S ribosomal RNA. In spite of this, one is able to recover the substrates dynamics, implying that the hypothesis of microbial interactions as drivers of a bioprocess, in the form of feedback loops affecting each others growth rates, is not far-fetched.
The use of the tracking technique can be applied in a straightforward manner to already existing models in mixed homogeneous bioreactors, under the condition that the microbial species have already been identified with a particular functionality of the system. Even though the tracking model was proposed for a chemostat setting as a way to correct a substrate limited expression, the method could be used in contexts where less information on the growth function of microbes is known. If one supposes nothing on the growth expression but the fact that is bounded (life cannot grow infinitely fast) the model becomes a linear model and one recovers a classic quadratic regulator for linear systems. This was tested for the data presented in this article and, unfortunately, negative substrate appeared as an output, suggesting that the substrate limitation term is crucial for the model to be wellposed. Nevertheless, even in the former case, the synthesis of the optimal bounded control remains an open theoretical challenge. One might bypass this issue of the current control scheme by, for example, a very thorough use of the Pontryagin maximum principle for the synthesis of the control. In a more general view the reconstruction of the growth function in chemostat systems is already subject to problems of identifiability [31], integrating genetic sequencing could provide a path for more certainty in model calibration.

Acknowledgments:
The authors would like to thank Alain Rapaport for the enriching exchanges and comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: Defines = max{s in (t)|t ≥ 0} and consider the differential equation: If there is a time interval H such that t * ∈ H ⇒ D(t * ) = 0, then z 1 is constant and therefore bounded by the value of the solution in z(t * ). If D(t) > 0, then w =s is a stable asymptotic equilibrium. Define M 1 := max{w(0),s}. If w(0) >s then by the Picard-Lindeloff theorem ∀tẇ(t) < 0, otherwise it would cross the solution of the initial value problem (A4) with starting points, therefore w(t) ≥s. The same reasoning may be applied if w(0) <s. One concludes that w(t) ≤ M 1 . Now considerż From a comparison lemma (see chapter 3 [20]), the solution of (A5) is bounded by w and therefore z 1 (t) ≤ M 1 .
,s , and M 3 = max{s 1 (0) + s 2 (0) + s 3 (0),s} and noting that z 2 (t) and z 3 (t) satisfies the same differential equations as z 1 the above reasoning may be applied and one has the desired bounds.
Proof. If any coordinate of the solution becomes zero, then it's derivative is either zero, or positive. The bound will follow from Lemma 1.
Note that for any i ∈ [n], if x i = 0 thenẋ i = 0, therefore all solutions of system (3) with initial condition x i (t 0 ) = 0 remain in these planes. By the Picard Lindeloff theorem a solution starting in int(Ω) cannot cross these planes, therefore x i ≥ 0, and thus s 2 (t) ≥ 0. Note that since s 2 ≥ 0, bound (A6) is valid for any time, and not only when s 2 = 0. If s 3 = 0 theṅ For the boundedness it suffices to notice from Lemma 1 that the sum of positive elements is bounded, therefore each element is bounded.

Appendix B. Deduction of Equilibrium Points
thus, system (3) is rewritten as follows.
Recall J the set of non-active coordinates. Let M be the matrix defined by taking out the J columns of the identity matrix of size n. When M multiplies from the left it adds rows of zeros to the multiplied matrix in the J coordinates. When matrix M multiplies from the right it takes out the J columns of the multiplied matrix. When M multiplies from the left it takes out the J rows of the multiplied matrix. When M multiplies from the right it adds columns of zeros to the multiplied matrix in the J coordinates. This gives the following relationships: x eq = Mx act ; f act (s) = M f (s); µ act (x, s) = M µ(x, s); Y act = YM; M M = I n act (A11) From Equation (A7) equilibrium points satisfy: note that ∀j ∈ J , x eq j = 0 therefore the coefficients a ij play no role in Equation (A13) and one can rewrite Equation (A13) as From Hypothesis 2 matrix A act has an inverse. This gives the following formula Note as well that at the equilibrium, s 3 can be defined in terms of s 1 , s 2 and s in . This is done by adding Equations (A8)-(A10) which gives:

. Both Functional Groups Are Present
The case where in each functional group remains at least one OTU is represented by Hypothesis 3. By replacing x eq = Mx act in Equation (A8) yields (s in − s 1 )D + Y act 1• diag(µ act (x, s))x act = 0 (A17) For notation and indexing purposes it is useful to define B := Y act 1• (A act ) −1 M (note B j = 0 ∀j ∈ J ). Replacing (A15) in Equation (A8) reads as follows: and so we get to formula (17) of the main text: The same computations must be done with Equation (A9), which is structurally very similar to (A8). By replacing x eq = Mx act in Equation (A9) yields It is again useful to define: and one gets to expression: where Then by replacing (A21) in Equation (A30), and after some reordering, one gets a fifth degree polynomial for s 1 .
The washout of G 2 is represented in Hypothesis 4. Under this case note that f act (s) depends only on s 1 . By replacing again x eq = Mx act in Equation (A7), one obtains the same computations as in the previous section, so retake Equation (A41) Since G 1 ⊂ J then B i = 0∀i ∈ G 2 thus (A41) becomes: Furthermore, so a quadratic equation for s 1 is obtained.

. Jacobian of the System
Recall that f : X → Y is a function, then its derivative is f : [32]. In the case where X = R n and Y = R m , f (x) is the Jacobian matrix evaluated at point x, furthermore f (x)[e j ] (where e j is the canonical j-th vector) is the j-th column of the Jacobian matrix evaluated at point x.
If f is linear then f (a) = f for any a ∈ X. In the case of the diag operator, we observe it is linear, therefore diag (a)[x] = diag(x) In the case of bilinear mappings another formula holds, let h : X 1 × X 2 → Y be bilinear. Then h (a 1 , a 2 )[x 1 , x 2 ] = h(x 1 , a 2 ) + h(a 1 , x 2 ). In the case of the function h 1 : M n×n (R) × R n → R n such that h 1 (M, x) = Mx, one can see that h 1 (A, x)[B, y] = Bx + Ay.
The chain rule states that f : X → Y and g : Y → Z, if h := g • f then h (a) = g ( f (a)) • f (a). Take the expression h : (x, y) ∈ R n × R n → diag(x)y. It is clear that h = h 1 (diag(x), y). By the chain rule: Note finally that function h is symmetric, i.e., h(x, y) = h(y, x). Then going back to our system.

Appendix C. Tracking Problem Reformulation and Details
For applying the methods developed in [26]. Define the system state X = (x, s). Make the change of variables v i = u i − 1 with v = (v 1 , . . . , v n ) are applied to system (25). The system may be rewritten then as: Define: and B 1 (X) = diag( f (s)) diag(x); B 2 (X) = Y diag( f (s)) diag(x); C(X) = diag(I n ) 0 n×3 (A75) Then the system (A71) can be rewritten as: X = A(X)X + B(X)v; y = C(X)X (A76) z(t) ∈ R n is the measured vector containing the OTU concentrations in time. The cost functional is given by where F, Q and R are positive definite matrices. Since there is no interest in the final time F = 0. Q and R are taken as diagonal matrices, in that way the system can be reduced as shown below. Particularly after testing the model in the proof of concept and data, the Q and R matrices were λ 1 I n 1 0 0 λ 2 I n 2 and I n , respectively, with λ 1 = 10 −4 and λ 2 = 10 −5 .
Define the dynamic sequences for i ∈ N,Ẋ [i] as: and for i = 0 define X [0] (t) as the solution of (A71) with v(t) ≡ 0. The control law is given by where P [i] (t) ∈ M n+3×n+3 (R) and s [i] f (t) ∈ R n+3 are the solution to the differential equations: Replacing the matrices of our probleṁ For certain entries of the dynamic the constantly zero function is a solution for them, implying by existence and uniqueness that they should be constantly zero. Then P [i] has n × n non zero entries and s [i] has n non zero entries, explicitly: Inspecting the former equation one notices that if i = j, P ij (t) = 0 is a solution of all non diagonal entries when R is diagonal. Furthermore, therefore, once again, by existence and uniqueness they should be constantly zero. Hence only the diagonal entries should be calculated.P    f (t f ) = 0 (A96) Furthermore, the control law is given by (A97) They were solved using standard backward numerical integration.