Multi-Objective Optimization of Microalgae Metabolism: An Evolutive Algorithm Based on FBA

Studies enabled by metabolic models of different species of microalgae have become significant since they allow us to understand changes in their metabolism and physiological stages. The most used method to study cell metabolism is FBA, which commonly focuses on optimizing a single objective function. However, recent studies have brought attention to the exploration of simultaneous optimization of multiple objectives. Such strategies have found application in optimizing biomass and several other bioproducts of interest; they usually use approaches such as multi-level models or enumerations schemes. This work proposes an alternative in silico multiobjective model based on an evolutionary algorithm that offers a broader approximation of the Pareto frontier, allowing a better angle for decision making in metabolic engineering. The proposed strategy is validated on a reduced metabolic network of the microalgae Chlamydomonas reinhardtii while optimizing for the production of protein, carbohydrates, and CO2 uptake. The results from the conducted experimental design show a favorable difference in the number of solutions achieved compared to a classic tool solving FBA.


Introduction
Microalgae are unicellular photosynthetic organisms. They are capable of capturing gases such as CO 2 from internal combustion engines and industries, and converting it into oxygen [1]. Furthermore, some strains of microalgae have the ability to thrive under stress conditions while removing oxygen peroxide, nitrates, and phosphates present in wastewater [2], making microalgae suitable for several bioremediation strategies. In addition, microalgae CO 2 capture through photosynthesis and its transformation into several industrial raw materials such as carbohydrates, lipids, proteins, pigments, aromatic compounds, etc., is a more economical and attractive renewable source [3,4].
A large number of strains of microalgae have been studied, finding several metabolic pathways involved in the synthesis of many secondary metabolites. However, the production rate of these metabolites varies from one species to another or even in the same species, according to different environmental and metabolic conditions. The production of the secondary metabolites depends on many factors, such as the type of microalgae and the culture conditions, temperature, pH, lighting, and nutrient sources [5].
So far, metabolic models built from genomic sequences allow a quantitative view of the transport and metabolism of compounds within a target organism. In addition, these metabolic models have long been used to generate optimized design strategies for an improved production process [6].
Most metabolic models of microalgae focus on studying a single objective function, e.g., biomass. For the particular case of metabolic networks in steady-state, Flux Balance Analysis (or FBA) is the most commonly used optimization method for their study [7,8]. Equation (1) defines the associated FBA linear optimization problem [8], where v is the flux vector across the reactions. The stoichiometric matrix S m×n represents the metabolic network, where there is a metabolite per row and a reaction per column. The value of the cell S ij is the stoichiometric coefficient of the metabolite i involved in reaction j [7], and the LB j , UB j are the lower and upper bounds for the fluxes allowed in the metabolic system. The steady-state assumption is established by Sv = 0 [9].
However, despite the widespread use of FBA to predict fluxes in large-scale networks, it is not always accurate in predicting fluxes in vivo [20]. Moreover, most metabolic models satisfy n > m, meaning that multiple solutions might be found. This situation becomes more complex in simultaneous bioproducts optimization. A recent trend that works in metabolic analysis involves optimizing several objectives to engage in the study of more than one bioproduct of interest [21][22][23]. In the past decade, this method can be traced back to the work of Zomorrodi and Maranas [24]. There, they developed the computational framework OptCom for FBA of microbial communities. The foundation of the framework is multi-level optimization; it optimizes problems embedded one another in a hierarchical structure for the sake of reaching optimum values for the final chosen bioproduct. Budinich et al. [21] extend FBA for microbial communities by defining a Multi-Objective FBA (MOFBA) in order to study multiple trade-offs between nutrients and growth rates. More recently, Andrade et al. [22] and Pelt-KleinJan [23] proposes a multi-objective formulation of FBA that considers nutrient limitations for metabolic analysis.
Multi-objective optimization has been exploited in a wide variety of fields in science and engineering [25,26]. MOFBA, in particular, appears in medicine, where Zhang and Boley [27] proposed a non-linear MOFBA to explain the impact of the objectives cells in the Warburg effect in different cell types. Moreover, the works [21,24,27] simulate genomescale metabolic models for microbial ecosystems as a single strain exchanging; they use multi-objective flux equilibrium analysis, and flux variability analysis (MO-FVA).
The main goal for multi-objective optimization is a good approximation of the Pareto frontier, cf. [28]; Figure 1 illustrates this within a metabolite context in a bi-objective function maximizing carbohydrates and proteins [29]. In this field, Multi-objective Evolutionary Algorithms (MOEAs) are widely recognized. Mainly, the algorithm NSGAII (Non-Dominated Sorting-based multi-objective EA) proposed by [30,31] has been quite effective when handling two or three objectives [32,33]. Based on the survey in [34], the only related work that uses NSGAII for FBA optimization is by Costanza et al. [35]. An overall view of the previous analysis indicates that the motivation for using multi-objective optimization in FBA lies in improving the prediction capability of FBA. However, the revised approaches do not adequately exploit the versatility of metaheuristics to approximate the Pareto frontier under a moderate consumption of computational resources. In other words, using a metaheuristic can better approximate the Pareto frontier, and provide a greater diversity of solutions than the previous approaches [25,26].
Hence, this work proposes a novel implementation of the metaheuristic algorithm NSGAII [30] for microalgae growth optimization. The novelty in the proposed NSGAII includes an original encoding scheme or genotype and an original fitness evaluation function. While in [35], NSGAII uses a knockout vector as genotype or encoding scheme, and OptKnock (cf. [36]) as fitness evaluation, the proposed NSGAII uses an original encoding scheme that generalizes the previous one, and an original fitness function evaluation based on FBA. The proposed encoding scheme is a generalization because the associated solutions' search space includes the knockouts. The use of FBA instead of OptKnock as a fitness function might significantly impact the performance of the algorithm because instead of solving a costly combinatorial optimization problem as in OptKnock, it solves a simpler linear equation system.
The conducted experimental design demonstrates the validity against a glyclolysis module of a reduced metabolic network for microalgae Chlamydomonas reinhardtii [20]. Moreover, the proposed NSGAII is compared against FBA, and the results show that while the quality of the solution remains, the proximity to an ideal point is improved statistically and it achieved a greater diversity of solutions. Hence, the main contributions are the novel multi-objective optimization problem for metabolic analysis and the metaheuristic algorithm to solve it. Table 1 summarizes the performance of NSGAII and FBA. Column 1 shows each configuration considered. Column 2 shows the quality of solution Q achieved by NSGAII. Columns 3 to 5 show the value of Q for FBA considering as objectives each of the bioproducts chosen in the associated configuration. Finally, Column 6 presents the number of solutions produced by NSGAII; this number denoted F 0 , is the number of non-dominated solutions reported by the algorithm.

Results
The solution quality was statistically compared between NSGAII and the distinct solutions reported by FBA for each configuration, and each objective took the lead. The null hypothesis H 0 was tested: the medians of the differences between the two group samples are equal. Using the Wilcoxon statistical test with a significance level set to 0.01, and as pairs of group samples (NSGAII MinEuclid , FBA Obj 1 ), (NSGAII MinEuclid , FBA Obj 2 ), (NSGAII MinEuclid , FBA Obj 3 ), the obtained p-values were 0.000018, 0.000026, and 0.000087, respectively. These results mean a rejection of H 0 , indicating a difference between the quality results of NSGAII and FBA, favoring NSGAII due to its lower values.  The results in Table 1 show that NSGAII improved FBA in terms of quality. Considering multiple objectives, NSGAII obatined closer solutions to the ideal point than those obtained by FBA. Moreover, the statistical test confirms that there is indeed a significant difference among these results. In addition, the good performance of NSGAII with respect to the diversity indicator D is also confirmed according to the values shown in Column 6, where the number of solutions ranges from a few dozens to several hundred, depending on the configuration, while classical FBA usually offers only one solution when Flux Variance Analysis (FVA) is not used.     A closer numerical look at the differences between NSGAII and FBA under configuration C 0 is shown in Table 2. This table compares the fluxes achieved by FBA in each case against some selected solutions reported by NSGAII.
Some additional insights arise from the previous results. Let us begin with the variety of configurations used in the comparison; this demonstrates the versatility of NSGAII to adapt to different circumstances and its capacity to improve the analysis of the metabolic network given the larger number of solutions produced for each of them.
After that, evolutionary approaches require fewer resources than FBA when dealing with multiple objectives; for example, it has the advantage of spending less time and memory. Approaches such as NSGAII allow a greater power of choice in the decisionmaking process also due to the variety and number of solutions, and the possibility of an easier recognition of the most important fluxes in a network and their influence and impact rather than not having a methodology.
Finally, by analyzing sets of several tens or hundreds of solutions simultaneously instead of just one through the classical approach, it is possible to have a better perspective of what is happening in cell metabolism.    Figure 6 is an oversimplifying of the Chlamydomonas reinhardtii metabolic network (detailed in Figure 7). The representation of Figure 6a are values obtained by optimizing the v 14 flux using FBA. Figure 6b-d correspond to results obtained from NSGAII. From these results, it can be seen that a cell always produces a compromise between the amount of biomass, carbon reserves, and its respiration process. Everything that occurs inside is distributed in distinct ways and proportions. The advantage of the proposed implementation of NSGAII is that one execution of the algorithm produces several fluxes distributions, in contrast with FBA which only produces one, the solutions' set from NSGAII improve a researcher's sight view of the physiological scenario under different conditions. The growth on acetate as a carbon source of Chlamydomonas reinhardtii synthesizes CO 2 as a product of metabolism, as seen in all fluxes distributions shown in Figure 6a-d, denoted v 18 . In this case, the metabolism of carbon acts as a source of energy for biomass growth. Figure 6c shows that the protein value of 6.60 is greater than the flux of carbohydrates that corresponds to 3.9, indicating a low accumulation of carbohydrates in favor of more significant protein production. This result concurs with the reported higher protein content observed when Chlamydomonas reinhardtii is cultured heterotrophically [15]. Figure 6d shows a slightly more uniform distribution of fluxes compared to the ones shown in Figure 6a, where all the fluxes are directed to the production of carbohydrates, compromising the entire flux of proteins v 10 , leading to a state lacking growth in the biomass.

Discussion
NSGAII can also produce solutions with similar fluxes' values to those obtained from FBA. For example, in Figure 6b, values are almost equal to those in Figure 6a, where the produced fluxes point to the generation of Carbohydrates and CO 2 . However, these solutions compromise protein production, such situation implies null growth and an undesirable condition for a real process, as discussed in previous experimental research [15,37].

Materials and Methods
This section presents the proposed implementation for NSGAII, and the experiment conducted to validate it in the solution of MOFBA. It is organized into three subsections. First, it shows the proposed approach. After that, it presents the case study. Finally, it ends with the description of the experimental design used to test NSGAII.

Proposed Evolutionary Approach Based on NSGAII
Traditionally, the optimization of metabolic networks considers a steady-state approximation in which the metabolite concentrations are assumed to be constant. Since in the majority of the real-world metabolic networks the number of reactions (denoted by n) is higher than the number of metabolites (denoted by m), there is a large number of combinations of reaction fluxes that satisfy such systems. This section presents the proposed multi-objective FBA and a novel evolutionary approach that solves it, efficiently producing a proper approximation of the Pareto frontier with a quite diverse set of Pareto solutions that can improve decision-making in metabolic engineering.
The novel features included in this research are: (1) a novel encoding scheme for an FBA solution; (2) an original fitness evaluation based on classic FBA that guides the approximation of the Pareto front on MOFBA; and (3) a novel adaptation of the NSGAII framework to solve the particular MOFBA defined under the previous characteristics, so that feasible solutions can be achieved.

Multi-Objective Optimization Model for FBA
Equation (1) shows the classic formulation for FBA. The solution of such an approach will yield a solution that maximizes the biomass. This work proposes the use of Equation (2) as the multi-objective alternative. It has a slight variation compared to Equation (1) which consists of the optimization of a set of bioproducts {v b 1 , . . . , v b m } instead of a single one. Equation (2) also considers the stoichiometric matrix S, the fluxes vector v, the steady-state condition Sv = 0, and w.l.o.g. that all objectives are to be maximized.
MOFBA, as defined in Equation (2), cannot be solved using traditional linear solvers. Instead, enumerative schemes or approximated approaches must be used to achieve the Pareto frontier. The following section shows the novel evolutionary approach proposed in this work to solve this problem.

Evolutionary Approach for MOFBA
This work proposes the use of a novel ensemble encoding the solution of MOFBA (as defined in Equation (2)) and its solution using NSGAII to solve the MOFBA. Given that NSGAII is an evolutive algorithm, it requires the definition of the following features:  Table 3 shows a graphical example of w applied to a case with |V | = 7, and |V M | = |V b | = 3. Having as initial bounds LB i = 0, and UB i = 10, and considering V M = {v 1 , v 2 , v 3 } and V b = {v 4 , v 5 , v 6 }, the use of w will results in the new bounds (LB, UB) = {(5, 7.5), (0, 7.5), (2, 8.4), (10, 10), (2.5), (6.25), (7.5, 8.75)}. Let us point out that any application of W over a reaction i uses its initial values for LB i , UB i .  Genetic Operators. These operators create new solutions by dynamically and randomly varying the values of decision variables on existing solutions. Their selection was due to their success in solving problems involving real-valued decision variables [38]. The chosen operators for mutation, crossover, and selection were Polynomial Mutation [39], SBXCrossover [40], and a simple yet reliable random selection, respectively. Constraint Handling Scheme. The ensemble for NSGAII considers a feasibility constraint. This constraint is violated whenever FBA in the fitness function reports unfeasibility. The latter arises because the bounds defined by an encoded solution might cause no feasible solution exists. This work uses the constraint handling method proposed in [41] to overcome this situation. As the generations evolve in NSGAII, the competition among solutions will always prefer feasible solutions, despite the non-domination status. In the long run, such a strategy tends to eradicate unfeasible solutions at the final report of the algorithm.
Initialization method. A randomly generated initial population was considered as the input for NSGAII.
In general, the novel idea behind the previous ensemble is that NSGAII exerts selective pressure toward the Pareto frontier using the genetic operators, the ranking based on nondomination, the diversity control established by the crowding distance, and an adequate constraint handling strategy. The solutions, which indirectly define the flux values for vector v in FBA, will dynamically evolve in the algorithm, and those with better aptitude (i.e., that are feasible and improve the best the chosen bioproducts of the reactions) are kept for future generations. The final set of solutions provided by NSGAII represents a wider, more spread, and more promising combination of the reaction fluxes values that a decision maker can take as an advantage for their research.
The following section details the validation process for this proposed ensemble of NSGAII.

Case of Study: Metabolic Network of Chlamydomonas reinhardtii
This work analyzes the FBA of an essential module of the microalgae Chlamydomonas reinhardtii. Figure 7 presents the related metabolic network, and Table 4 the corresponding reactions. This case presents three bioproducts of interest: proteins, carbohydrates, and CO 2 (denoted here as v 10 , v 14 , and v 18 ). Moreover, these networks show three primary substrates: acetate, E4P, and X5P (or v 1 , v 16 , and v 17 , respectively). Optimizing the three bioproducts (v 10 , v 14 , v 18 ) having as control decision variables (v 1 , v 16 , v 17 ) is the base to validate NSGAII. The next section presents specific details on the performed experiment.

Experimental Design
The experimental method evaluates the performance of NSGAII and compares it against the classical FBA. The hypothesis under consideration was that the decision-making on metabolic analysis can be improved using the evolutionary approach. The considered indicators that assess such improvement were the quality of the solutions, their diversity D, and their spread, or S concerning the Pareto frontier when considering the optimization of multiple fluxes. During the experiment, it was assumed that the study cases have the initial bounds limits in every reaction set to LB = 0, UB = 100.
Besides the set of bioproducts of interest, the configuration C 0 = {v 10 , v 14 , v 18 }, the experiment extends the analysis to the additional set of 36 configurations presented in Table 5. Each configuration represents a different set of bioproducts of interest. Demonstrating a good performance on such number of configurations would indicate that the evolutionary approach is versatile, and can easily be adapted to analyze metabolic networks under distinct contexts.
The quality of a solution, denoted Q, is measured as the Euclidean distance to an Ideal Point. Each configuration defines the ideal point and the best flux value resulting from FBA using each bioproduct of interest. For example, with the configuration C 0 the ideal point would be formed by the three optimal flux values reported by FBA, optimizing v 10 , v 14 , and v 18 , respectively.
The diversity of solutions, or D, is assessed by the number of distinct solutions achieved by the evolutionary approach. Let us recall that FBA only provides an optimal solution. Finally, the spread or S will be graphically demonstrated by showing a broad distribution of solutions approximating the Pareto frontier.
The development of the proposed NSGAII for MOFBA was done by combining the FBA implementation from COBRAPY [42], and the NSGAII implementation from PyMETAL [43]. Both technologies were integrated into a Python script, the following NSGAII values as parameters: (1) the population size was set to N = 100; (2) the stop criterion was defined as a maximum number of evaluations of 10, 000; (3) for the Polyno-mialMutation, a mutation probability of 0.083 and a distribution index of 20 was set; and (4) for the SBXCrossover a crossover probability of 1.0 with a distribution index of 20 was set. Given the stochastic nature of NSGAII, each configuration was solved 31 times in order to have a representative sample of the solutions.

Conclusions
This study carried out the maximization of multi-objective functions of the production of proteins, carbohydrates, and CO 2 as a basis to demonstrate the use of multi-objective optimization with the NSGAII algorithm in a part of a reduced metabolic network of the microalga Chlamydomonas reinhardtii. The results obtained were compared with FBA analysis, achieving a similarity between the maximization of FBA when only a single objective function is maximized, but it also shows a broader perspective of the study of metabolic fluxes when there are multiple objective functions, NSGAII provide not only solutions closer to the ideal point in quality but also a better diversity and spread of them. Moreover, as far as we know, this is the first approach using evolutionary algorithms for metabolic analysis.
The use of the NSGAII algorithm to facilitate the understanding of the behavior of the metabolism when using multi-objective analysis must take into consideration that a possible decision-maker in the case study is interested in a broader set of combinations of fluxes in the reaction, by maximizing or minimizing different objective functions simultaneously, to explore a wide set of solutions of which those that may be more convenient for the investigation of a species are more easy to extract; in addition, we obtained a broader perspective of the study of metabolic fluxes by knowing how the different settings assigned.
For future work, the NSGAII evolutionary algorithm will be applied to a much larger microalgae metabolic network, using different configurations by maximizing different objective functions and comparing the results with FBA, observing the different solutions according to the requirements, and taking the most suitable solutions according to the needs of the model. In addition, the use of the NSGAII algorithm compensates production between the fluxes of the metabolites of interest and the fluxes of biomass production.