An Enhanced Segment Particle Swarm Optimization Algorithm for Kinetic Parameters Estimation of the Main Metabolic Model of Escherichia Coli

: Building a biologic model that describes the behavior of a cell in biologic systems is aimed at understanding the physiology of the cell, predicting the production of enzymes and metabolites, and providing a suitable data that is valid for bio-products. In addition, building a kinetic model requires the estimation of the kinetic parameters, but kinetic parameters estimation in kinetic modeling is a di ﬃ cult task due to the nonlinearity of the model. As a result, kinetic parameters are mostly reported or estimated from di ﬀ erent laboratories in di ﬀ erent conditions and time consumption. Hence, based on the aforementioned problems, the optimization algorithm methods played an important role in addressing these problems. In this study, an Enhanced Segment Particle Swarm Optimization algorithm (ESe-PSO) was proposed for kinetic parameters estimation. This method was proposed to increase the exploration and the exploitation of the Segment Particle Swarm Optimization algorithm (Se-PSO). The main metabolic model of E. coli was used as a benchmark which contained 172 kinetic parameters distributed in ﬁve pathways. Seven kinetic parameters were well estimated based on the distance minimization between the simulation and the experimental results. The results revealed that the proposed method had the ability to deal with kinetic parameters estimation in terms of time consumption and distance minimization.


Introduction
Mathematical modeling is a compelling strategy for predicting and recognizing the biologic behavior of a cell's system. The mathematical model can create and predict the results of the empirical hypotheses that can be used to examine the process. In this regard, the biologic model studies the chemical metabolism and the pathways to construct a kinetic model that can give a clear understanding of a cell mechanism [1]. Therefore, building and developing a kinetic model requires a large number of parameters such as kinetic parameters, potential of hydrogen (pH) and initial enzymatic and metabolite concentration. In the same vein, the building and the developing of the kinetics model gives a clear picture of cell behavior when it is needed in biotechnology applications [2]. In addition, building and developing a kinetic model is a challenging task because it requires kinetic parameter estimation [3].
Kinetic parameters estimation is the finding of the nearest value that minimizes the distance between the simulated and the real experimental results [4]. Kinetic parameters estimation is a complex task because of the nonlinearity of the kinetic model, and it is usually measured in different conditions and time consumption [5]. Usually, the kinetic parameters stored in the database are insufficient for building accurate kinetic model due to the cell behavior in different measurements stated by different laboratories utilizing various in vitro models and conditions [6]. Normally, the model is built depending on time derivative expressions using Ordinary Differential Equations (ODEs) which describe the changes in a state or quantity of interest over time. Recently, several kinetic models of cell metabolism are based on ODEs which have been developed to detect time-dependent changes in metabolic concentration. For example, models of glycolysis and pentose phosphate (PP) pathways in E. coli [7,8] and the tri-carboxylic acid (TCA) cycle in Dictyostelium discoideum [9] have been constructed. Furthermore, large scale model integration has been performed. For example, the model of [7] has been integrated with the model of (TCA) cycle [8,10] and amino acid biosynthesis [11]. Researchers have studied the glycolysis pathway to clone a new host and observe its changes and effects [12].
Therefore, because of the difficulty in estimating kinetic parameters, many researchers lately have used the metaheuristic optimization algorithms' methods to estimate the kinetics of the E. coli model and other biologic models. Some of these metaheuristic algorithms were used in order to estimate the kinetic parameters employed by [13][14][15][16][17][18][19][20][21] and some of these algorithms have used experimental data taken from [7,22,23] to investigate their abilities in the kinetic parameters estimation.
In this regards, the biologic kinetic models contain hundreds to thousands of kinetic parameters which have caused serious challenges in parameter estimation due to the high dimension of search space that is required to be explored. Thus, in high dimensional kinetic models (containing hundreds to thousands of kinetic parameters), the procedures are considered to be expensive in terms of computational cost and the performance of the aforementioned algorithms tend to deteriorate, which may lead to low accuracy [6].
The Differential Evolution (DE) algorithm is an example that is widely used and studied in parameter estimation of metabolic model [6,17]. Time consumption is the main weakness of (DE) algorithm; this makes it difficult for the (DE) algorithm to tune its parameters when it involves a large number of processors and multiple local searches. In addition, the Genetic Algorithm (GA) belongs to the class of evolutionary algorithms which almost share the same idea as DE. This algorithm is a popular method that is used in parameter estimation of metabolic model. The main weakness of this algorithm is the high computational time which mostly does not perform effectively when it is compared to DE and Particle Swarm Optimization (PSO) algorithms [24][25][26].
Since 1995, the PSO algorithm has been making great impact in solving many problems in different fields such as [14,[27][28][29]. This method was proposed by [30] and it was adopted in simulating flock of birds and a school of fish in order to locate their food. The algorithm particles during their movement in the process of searching share the information with each other and then they search for their destination stochastically and independently. Therefore, the particles' movements in the search space move towards the optimum solution by taking different paths and directions.
Moreover, the method of Se-PSO was proposed and developed based on the PSO algorithm in order to increase the computational time, and the accuracy of PSO was presented by [31]. This development is based on the recognition of the local and global point problem of PSO [30]. In this regard, the segmentation divides the particles into groups so as to move together towards the optimal solution. This algorithm was adapted for large-scale kinetic parameters of E. coli model [32] and governor-turbine model identification of single area power plant [28,32] was used in a linear inertia weight. Due to the nonlinearity of the model which uses linear inertia weight (ω) by [32], this inertia weight affects the exploration and exploitation of the particles. Consequently, there is need for large exploration at the beginning and small exploitation at the end of the algorithm execution. The purpose for this is to avoid the local optima trap so as to increase the efficiency of the kinetic parameters estimation with a view to minimizing the model distances in reasonable time.
In this study, the method of Se-PSO algorithm was enhanced by adding a damping process which helped the particles to have a wide exploration at the beginning and small exploitation at the end during the algorithm execution. The aim was to investigate the amount of each solution to be found at the search space. Thus, the enhanced Se-PSO was adapted to estimate large-scale kinetic parameters of E. coli and to increase the efficiency of the Se-PSO algorithm in terms of exploration and exploitation.
The other sections of the study are organized as follows: Section 2 describes the problem formulation. Section 3 describes the materials and methods. Section 4 describes the obtained results. Section 5 gives the conclusion of the findings of this study.

Problem Formulation
Kinetic parameters estimation of the kinetic model was formulated to find the best set of kinetics that minimizes the model simulation differences with real experimental data. This formulation was set as a global optimization problem. In this regard, optimization algorithm methods can be adapted in order to estimate the kinetic parameters and minimize the differences between the model simulation and real experimental result. The formulation of this problem mathematically was set as a fitness function in Equation (1) [13]: where R is the number of metabolites i experimental data; so R = 12, y s,i is the estimated metabolites concentration s of metabolite i, y e,i is the real metabolites experimental data e of metabolite i.

Kinetic Model Structure
The dynamic model of the main metabolic of E. coli formulated by [10] was used as a benchmark. This model, which consists of (glycolysis, pentose phosphate, TCA cycle, gluconeogenesis and glyoxylate) pathways in addition to acetate formation with phosphotransferase system, is described in This model has 23 metabolites, 28 enzymatic reactions, 10 co-factors (e.g., Adenosine triphosphate (ATP), Coenzyme A (COA), Nicotinamide adenine dinucleotide phosphate (NADPH)) and 172 kinetic parameters. The rate at which the concentration of the metabolite in the considered model changes is given by Equation (2) as follows: where C i is the concentration of metabolite i, v i is the rate of reaction j, S i,j is the stoichiometric coefficient of the metabolite i in the reaction j and µ is the specific growth rate. Thus, the term µC i represents the dilution effect due to biomass growth. The model mass balance equations and the kinetic rate equations of the considered model are stated in Appendix A Tables A1 and A2, respectively. Figure 1. This model has 23 metabolites, 28 enzymatic reactions, 10 co-factors (e.g., Adenosine triphosphate (ATP), Coenzyme A (COA), Nicotinamide adenine dinucleotide phosphate (NADPH)) and 172 kinetic parameters. The rate at which the concentration of the metabolite in the considered model changes is given by Equation (2) as follows: Where is the concentration of metabolite , is the rate of reaction , , is the stoichiometric coefficient of the metabolite in the reaction and is the specific growth rate. Thus, the term represents the dilution effect due to biomass growth. The model mass balance equations and the kinetic rate equations of the considered model are stated in Appendix A Tables A1 and A2, respectively.

Parameter Estimation by (ESe-PSO) Algorithm
In this study, an enhanced (Se-PSO) algorithm was proposed. This method shares the same idea as Se-PSO, but it is different in terms of exploration and exploitation of the inertia weight [27]. Decreasing linear inertia weight ( = 0.9 0.4) [27] was used. The inertia weight controlled the effect which the last iteration speed had on the current speed, and thus allowed the particles to explore wider areas in the beginning and nearby areas in later stages with reduced speed for exploitation. Here, the main plan of ESe-PSO was to add a damping process to the inertia weight so as to support the exploration and exploitation of the particles towards the optimum solution. The development was done by initializing the inertia weight = 1 and damping parameter called

Parameter Estimation by (ESe-PSO) Algorithm
In this study, an enhanced (Se-PSO) algorithm was proposed. This method shares the same idea as Se-PSO, but it is different in terms of exploration and exploitation of the inertia weight [27]. Decreasing linear inertia weight (ω = 0.9 to 0.4) [27] was used. The inertia weight controlled the effect which the last iteration speed had on the current speed, and thus allowed the particles to explore wider areas in the beginning and nearby areas in later stages with reduced speed for exploitation. Here, the main plan of ESe-PSO was to add a damping process to the inertia weight so as to support the exploration and exploitation of the particles towards the optimum solution. The development was done by initializing the inertia weight ω = 1 and damping parameter called ωdamp = 0.99. This damp parameter was calculated in the iteration process to increase the controlled convergence and to support the particles for reaching the global solution. In Table 1 below, the (ESe-PSO) algorithm is described: Table 1. Pseudo-code of Enhanced Segment Particle Swarm Optimization Algorithm (ESe-PSO).

Algorithm ESe-PSO Adoption
1. Begin 2. Initialize B, S, D, ω, and ωdamp 3. Initialize v i , X i , c 1 , c 2 , r 1 , r 2 , number_segment 4. Segment length = initial value/number_segment 5. Adopt the b e1 : b en parameters boundaries with respect to D 6. For j = 1 to the number of the segment 7. Determine initial fitness for segment J 8. Assume Best fitness = initial fitness 9. End for 10. For m data account  Table 1 describes the steps of the enhanced segment particle swarm optimization algorithm used to estimate the kinetic parameters. These algorithms were started by initializing the particles that had the highest number of generation S = 30. The bird number B = 10 and b e1 : b en was the kinetic parameter boundaries b from e1 to en parameters. The problem dimension D = 7, inertia weight ω = 1 and damping process ωdamp = 0.99 were in step 2. Then step 3 initialized the velocity and position of particles with the parameters of PSO which are acceleration coefficients towards the best personal position p i and the global best position of the entire G i , respectively, where c 1 & c 2 are constants called cognitive and social scaling parameters, respectively, and a random number between 0 and 1 for (r 1 , r 2 ).
The calculation of the segment length was obtained by utilizing step 4. Each kinetic parameter of the boundaries, both of the upper and lower, regarding the dimension of the problem D was created in step 5. Based on the number of segments, the calculation of the initial fitness of the whole segments was done with the assumption that the initial fitness was the best fitness step for 6, 7 and 8.
Here, the new fitness was calculated based on the fitness equation stated in step 11. Where the fitness was greater than the best fitness set, the iteration updated the velocity and the position towards the new fitness steps of 15, 17 and 19. If the new fitness was smaller than or equal to the best fitness print, the global best particles position updated the kinetic parameter, based on the global best particles position which is supposed to be the new initial kinetic parameters step in 22 and 23. If the fitness was greater than the best fitness, it returned to step 2 until the program highly figured out a solution or the iteration was met in step 25. The global point of segments is set equal to the global best position which was then used to calculate the optimal segment by its Equations (28)- (30). After the optimum segments were identified, they were used as the new initial kinetic parameter; PSO algorithm in [13] was searched among them and ended the program in step 31.

Results
The estimation of large-scale kinetic parameters in the kinetic model was a difficult task due to the nonlinearity of the model. For this reason, the kinetic parameters were reported from different laboratories and time consumption [32]. It was stated that during the application of the local sensitivity analysis to [14] model result, each kinetic parameter increased up to 200% by step 0.5 to calculate its sensitivity. It was found out that 7 kinetic parameters were highly affected by the used model's response for estimating the proposed algorithm. These sensitive kinetic parameters are v pyk max ,n pk ,icdh,k f icdh ,k d icdhnap ,k m icdhnap and v icl max which were involved in this reaction rate of V pyk , V icdh , and V icl . In addition, the experimental data of [16] were taken. These experimental data are Glc, G6P, F6P, FDP, PEP, PYR, 6PG, Ru5P, Xu5P, S7P, R5P and E4P.
As a matter of fact, during the ESe-PSO algorithm execution, the segments of the kinetic parameters of [32] were increased by adding 1 segment to each kinetic to increase the possibility of searching the wide-area and finding an accurate solution which is depicted in Appendix A Table A3.
The ESe-PSO algorithm parameters such as c 1 and c 2 was initialized as mentioned in [32], ω = 1 and ωdamp = 0.99. In this regard, the kinetic boundaries with their upper and lower bounds values were initialized with small increase based on [32] to allow the proposed algorithm search large space. Here, the mechanism for selecting the boundaries was based on the original kinetic parameters values, which assumed that these values are ± with each kinetics' small value, so that the optimum result may be around it. These kinetic parameters boundaries are depicted in Table 2 with their estimation as follows: After the kinetic parameters were initialized, the kinetic parameters were segmented to find the optimal segment for each kinetics. Later on, the updating of position, velocity and the fitness function was done based on the initialization of ESe-PSO parameters such as the inertia weight, damping process and the acceleration coefficient (c 1 and c 2 ). Thus, the optimal segment was determined based on the fitness function. The optimal segments were sent to the PSO algorithm which provided a local best solution so that PSO would search around the optimal segments.
In this regard, the ESe-PSO algorithm estimated the kinetic parameters and minimized the model distance to 28.94%. The model simulation was moved closer to the real experimental data when compared to the considered model; the Se-PSO, GA and DE algorithms are described in Table 3. The GA parameters were set as follows: generation gap = 0.8, crossover = 0.85, mutation = 0.05 [33] and maximum generation = 500. The DE parameters were set as follows: scaling factor F = 0.7, crossover rate Cr = 0.8, search strategy was DE/rand/1/bin as stated in [6], population size = 100 and a number of generation = 500 [32].  Table 3 shows how the proposed method moved the model simulation towards the experimental data closely. This simulation increased PEP metabolites and this may be due to the other pathways engaged during the execution of the algorithm such as anaplerotic, TCA, with acetate formation. In addition, this increase may be the OAA,PEP and PYR metabolites that were affecting the cell's growth [10].
The analysis of the model distance minimization showed that 12 out of the 12 datasets were moving towards the real experimental data. These metabolites are Glc, G6P, F6P, FDP, PEP, PYR, 6PG, Ru5P, Xu5P, R5P, S7P and E4P; thus, the metabolites were still slightly far from the experiment, perhaps because of the model complexity, shortage of data and the portion of some metabolites. Furthermore, the G6P empirical data were 3.48 mM and the simulation outcome of G6P could be leading to the empirical data rather than the G6P result model under examination when related to the G6P model. This discrepancy may be because of the model resolution of G6P consistency in the presence of 5 mM MgSO 4 and 0.48 mM NADP + after the addition of 0.7 U ml −1 of G6PDH as reported by [10].
Moreover, the FDP model simulation was having a better movement towards the experimental data rather than the considered model. This may be due to the nonlinearity of the system or the lumping of GAP and DHAP metabolites in one equation. The changes in FDP may be due to the lumping of pykII and pykI as stated in [10].
As stated in Table 3 above, the simulation response of the distance model minimization was moving towards the experimental data in 12 metabolites. Moreover, the model under investigation had five pathways than the data of only two pathways where only one metabolite was not well minimized (PEP) due to the lack of experimental data. The estimated kinetic parameters were used in the model simulation to compare our model responses to [10] model responses. The result of the comparison is depicted in Figure 2 below: In order to prove that the result is statistically consistent, the STD, mean and F value were calculated for the considered model. The result of the simulated model is displayed in Table A3 and Equations (3)-(6).
In Appendix A Table A4, the mean and STD were calculated for the proposed algorithm results and the response of the considered model's result. The proposed algorithm showed 0.6855 mean compared to the considered model (0.5251), DE (0.6189) and GA (0.6216) which indicated that the proposed algorithm result moved to the experimental data mean of 0.9647. In addition, DE algorithms showed a better result than GA in the mean values.
In addition, the proposed algorithm showed 0.9320 STD compared to the considered model's 0.9198 STD which indicated that the proposed algorithm result moved to the experimental data STD of 1.2271. Thus, the hypothesis of the result in Table 4 is calculated as follows: The hypothesis of Table A4 is calculated and decided below:  In order to prove that the result is statistically consistent, the STD, mean and F value were calculated for the considered model. The result of the simulated model is displayed in Table A3 and Equations (3)-(6).
In Appendix A Table A4, the mean and STD were calculated for the proposed algorithm results and the response of the considered model's result. The proposed algorithm showed 0.6855 mean compared to the considered model (0.5251), DE (0.6189) and GA (0.6216) which indicated that the proposed algorithm result moved to the experimental data mean of 0.9647. In addition, DE algorithms showed a better result than GA in the mean values.
In addition, the proposed algorithm showed 0.9320 STD compared to the considered model's 0.9198 STD which indicated that the proposed algorithm result moved to the experimental data STD of 1.2271. Thus, the hypothesis of the result in Table 4 is calculated as follows: The hypothesis of Table A4 is calculated and decided below: where STD 2 E is the standard deviation of the optimized result E, STD 2 D is the standard deviation of the considered model D and the n E , n D are the number of variables for the optimized and model result. This hypothesis was locking for minimizing the considered model's errors. This suggests that we accepted H 0 and rejected H 1 as the accepted result. Thus, the F test showed that the proposed method had moved to the critical F value better than Se-PSO, DE and GA by indicating1.0482, 1.0497, 1.2881 and 1.3156, respectively.
The objective function and time consumption were calculated for the proposed algorithm, DE and GA as depicted in Table 4 below: Table 4 demonstrates a comparison between the proposed algorithm with DE and GA algorithms. Table indicates that

Conclusions
The proposed method of ESe-PSO was used in order to estimate large scale kinetic parameters based on a large kinetic model. The estimation was carried by minimizing the model response distances with real experimental data. Seven kinetic parameters were estimated using 12 metabolites and the method optimized the 12 metabolites. These metabolites are Glc, G6P, F6P, FDP, PEP, PYR, 6PG, Ru5P, Xu5P, R5P S7P and E4P. The metabolites were well minimized with a slightly small error which may be due to the lump of GAP/DAHP metabolites or tktb involvement in the pentose phosphate and glycolysis pathways. However, the proposed method of ESe-PSO algorithm revealed that it had the ability to perform kinetic parameters estimation. It is hereby suggested that further study can be carried out on the evaluation of large scale kinetic parameters segmentation of other algorithms like Firefly and Bat algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Mathematical proofs of results not central to the study can be added as an appendix.

The Parameters
In Table A5 the parameters of the proposed method are stated as follows: Where ω is the inertia weight, ω damp is the damping process, c 1 and c 2 are the cognitive and social scaling parameters, respectively, and r 1 and r 2 are random numbers drawn from a uniform distribution. The damping process was added and the c 1 = 1.2 and c 2 = 1.2 were equal due to the trial and errors which were performed [32].