Escherichia coli Cultivation Process Modelling Using ABC-GA Hybrid Algorithm

Abstract: In this paper, the artificial bee colony (ABC) algorithm is hybridized with the genetic algorithm (GA) for a model parameter identification problem. When dealing with real-world and large-scale problems, it becomes evident that concentrating on a sole metaheuristic algorithm is somewhat restrictive. A skilled combination of metaheuristics or other optimization techniques, a so-called hybrid metaheuristic, can provide more efficient behavior and greater flexibility. Hybrid metaheuristics combine the advantages of one algorithm with the strengths of another. ABC, based on the foraging behavior of honey bees, and GA, based on the mechanics of natural selection, are among the most efficient biologically inspired population-based algorithms. The performance of the proposed ABC-GA hybrid algorithm is examined on classic benchmark test functions. To demonstrate the effectiveness of ABC-GA for a real-world problem, parameter identification of an Escherichia coli MC4110 fed-batch cultivation process model is considered. The computational results of the designed algorithm are compared to the results of different hybridized biologically inspired techniques, namely the hybrid algorithms ACO-GA, GA-ACO and ACO-FA, built from ant colony optimization (ACO), GA and the firefly algorithm (FA). The algorithms are applied to the same problems: a set of benchmark test functions and the real nonlinear optimization problem. Taking into account the overall search ability and computational efficiency, the results clearly show that the proposed ABC-GA algorithm outperforms the considered hybrid algorithms.


Introduction
Although many different methods of global optimization have been developed, the efficiency of an optimization method is always determined by the specific nature of the particular problem. Parameter optimization of cellular dynamics models has become a field of particular interest, as the problem is widely applicable. Identification of the parameters of a nonlinear dynamic model is more difficult than the linear case since there are no general analytical results.
Typically, metaheuristic methods can successfully solve complex identification problems. Even more effective behaviors can be achieved through a combination of different metaheuristic techniques, the so-called hybrid metaheuristics [8].

The main contributions of this paper are:
1. Combining the features of ABC and GA, a new hybrid algorithm ABC-GA is proposed and tested on classic benchmark test functions and a real-world problem.
2. The superiority of the designed hybrid ABC-GA is shown based on the comparison of the simulation results with other hybrid metaheuristic algorithms for the problems of the two different test groups.
3. Using the hybrid metaheuristic algorithm ABC-GA, the optimal parameter values of a nonlinear mathematical model of an E. coli cultivation process are estimated.
The paper is organized as follows. In Section 2, the background of the ABC-GA hybrid algorithm is given. The performance of the designed hybrid on different unconstrained optimization problems is studied in Section 3. The considered real-world problem, a model parameter identification, is outlined in Section 4, and the numerical results from the identification are presented and discussed. Concluding remarks are given in Section 5.

ABC-GA Hybrid Algorithm
Based on the foraging behavior of the honey bees, the ABC algorithm was introduced by Karaboga for the purpose of numerical optimization problems [18].
ABC operates on a set of food sources representing potential solutions to the problem under consideration. The initial set of solutions is generated during the initialization phase of the algorithm when the initial values of the algorithm's parameters are set.
The ABC algorithm evolves in three phases on each iteration: employed bees' phase, onlookers' phase and scouts' phase [31], during which a new set of food sources is formed. The employed bees search for new food sources around those stored in their memory. Based on their evaluations, the newly found food sources may replace the old ones. Only certain food sources can be modified during the onlookers' phase. They are selected based on the probability values associated with them. The last phase is when an abandoned food source is replaced by a new random one found by a scout.
These three phases are repeated for a predetermined number of cycles called maximum cycle number (MCN). The best food source represents a reasonable solution to the considered problem.
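The three phases described above can be sketched in a few lines of Python. This is a minimal illustration for a minimization problem, not the authors' implementation: the function name `abc_minimize`, its parameters, the default values and the fitness transform used in the onlookers' phase are all assumptions made for the example.

```python
import random

def abc_minimize(f, lower, upper, n_sources=10, limit=20, mcn=25, seed=0):
    """Minimal ABC sketch (hypothetical helper): returns (best_solution, best_value)."""
    rng = random.Random(seed)
    dim = len(lower)

    def random_source():
        return [lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(dim)]

    def neighbour(i):
        # Perturb one random coordinate of source i relative to another source k.
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        v = sources[i][:]
        v[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        v[j] = min(max(v[j], lower[j]), upper[j])  # keep inside the bounds
        return v

    def try_improve(i):
        # Greedy selection: keep the candidate only if it is better.
        v = neighbour(i)
        fv = f(v)
        if fv < values[i]:
            sources[i], values[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources
    best_value = min(values)
    best = sources[values.index(best_value)][:]

    for _ in range(mcn):
        for i in range(n_sources):  # employed bees' phase
            try_improve(i)
        # Onlookers' phase: better sources are chosen more often.
        # Assumed fitness transform, commonly used with ABC for minimization.
        fitness = [1.0 / (1.0 + v) if v >= 0 else 1.0 + abs(v) for v in values]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_improve(i)
        # Scouts' phase: abandon the most exhausted source if it exceeds the limit.
        worst = max(range(n_sources), key=lambda s: trials[s])
        if trials[worst] > limit:
            sources[worst] = random_source()
            values[worst] = f(sources[worst])
            trials[worst] = 0
        # Memorize the best food source found so far.
        i = min(range(n_sources), key=lambda s: values[s])
        if values[i] < best_value:
            best, best_value = sources[i][:], values[i]
    return best, best_value
```

Calling `abc_minimize(lambda x: sum(v * v for v in x), [-5.0] * 3, [5.0] * 3)` returns the best food source found within MCN cycles together with its objective value.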
GA was developed to model adaptation processes using a recombination operator with a mutation in the background [32].
GA maintains a population of individuals. Each individual represents a potential solution to the problem. Each solution is evaluated, so a new population is formed by selecting more fit individuals. In order to form new solutions, some elements of the new population undergo transformations by means of "genetic" operators. These are unary transformations (mutation type) which create new individuals by a small change in a single individual, and higher-order transformations (crossover type), which create new individuals by combining parts from several individuals.
After a certain number of generations, the algorithm converges and is expected to best represent a near-optimum (reasonable) solution to the problem under consideration.
The proposed hybrid algorithm is a collaborative combination of the ABC algorithm and GA. The aim is to avoid the poor convergence rate of the ABC algorithm, a disadvantage reported in [26,32,33]. The population-based GA randomly generates an initial population that can be very far from the optimal solution and may require many iterations to draw close to it. To overcome these limitations, ABC is applied to the generated initial solutions for a few iterations. The ABC outcomes are then used as an initial population of GA. Thus, GA starts with a population much closer to the optimal one compared to a randomly generated initial population. The best solution is obtained by the genetic evolution of the ABC outcomes, and it is reached with fewer computational resources. This integration aims to provide a proper balance between exploration and exploitation (diversification and intensification).
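The seeding idea can be sketched as a small driver function. The names `hybrid_abc_ga`, `toy_abc` and `toy_ga` are hypothetical, and the two toy routines are deliberately trivial stand-ins for real ABC and GA implementations; only the control flow (n short ABC runs feeding the initial GA population) follows the description above.

```python
import random

def hybrid_abc_ga(objective, run_abc, run_ga, n):
    """Seed GA with the best solutions of n independent ABC runs."""
    # Each short ABC run contributes its best solution to the initial population.
    pop0 = [run_abc(objective, seed=s) for s in range(n)]
    # GA refines the seeded population and returns its best individual.
    return run_ga(objective, pop0)

def toy_abc(objective, seed):
    # Stand-in for a short ABC run: best of a few random samples.
    rng = random.Random(seed)
    samples = [[rng.uniform(-5.0, 5.0)] for _ in range(20)]
    return min(samples, key=objective)

def toy_ga(objective, population):
    # Stand-in for GA: here it only returns the best seeded individual.
    return min(population, key=objective)

best = hybrid_abc_ga(lambda x: x[0] ** 2, toy_abc, toy_ga, n=5)
```

Passing the two optimizers as callables keeps the sequential structure visible and makes it easy to swap in full implementations.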
The main steps of the ABC-GA hybrid algorithm are presented in Figure 1.
During the initialization phase of the algorithm, values of the input parameters of ABC and GA are set. The optimization problem is defined, as well as the problem parameters and their bounds. The initial solutions are generated using Equation (1). Each solution is a D-dimensional vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, limited by the lower and upper bounds of the corresponding parameter of the optimization problem:

$$x_{ij} = x_j^{\min} + \mathrm{rand}(0, 1)\,(x_j^{\max} - x_j^{\min}), \qquad (1)$$

where $x_j^{\min}$ and $x_j^{\max}$ are the lower and upper bounds of the j-th parameter and rand(0, 1) is a uniformly distributed random number.
ABC utilizes this set of initial solutions in an attempt to get closer to the optimal solution of the optimization problem. ABC runs for a predetermined MCN, and each cycle evolves in three phases, defined in [31].
The search for new food sources (solutions) during the employed bees' phase is based on Equation (2):

$$v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj}), \qquad (2)$$

where j is a random integer number in the range [1, D]; k is a randomly selected index, $k \neq i$; and $\phi_{ij}$ is a uniformly distributed random number in [−1, 1].
The onlookers' search around certain food sources is based on the same Equation (2). The food sources are selected taking into account a probability value $p_i$ associated with each food source, calculated by the roulette wheel selection:

$$p_i = \frac{f_i}{\sum_{m} f_m}, \qquad (3)$$

where $f_i$ is the fitness value of the solution $x_i$ and the sum runs over all food sources.
During the greedy selection in the employed bees' phase and the onlookers' phase, the number of trials for each food source is updated when it is not replaced by a better one. When the trials exceed a predefined limit, the corresponding food source is abandoned and replaced by a new food source, randomly generated using Equation (1). This new food source represents a food source found by a scout.
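The roulette wheel selection used by the onlookers can be illustrated with a short sketch. The helper names and the example fitness values are hypothetical; the only thing taken from the text is that each food source is chosen with a probability proportional to its fitness.

```python
import random

def selection_probabilities(fitness):
    """p_i = f_i divided by the sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_pick(probabilities, rng):
    """Return the index of the food source chosen by one spin of the wheel."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return i
    return len(probabilities) - 1  # guard against rounding at the upper end

# Illustrative fitness values for three food sources.
probs = selection_probabilities([1.0, 3.0, 6.0])
chosen = roulette_pick(probs, random.Random(0))
```

With these values the third source holds 60% of the wheel, so it attracts most of the onlookers while the weaker sources still keep a nonzero chance of being explored.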
The best solutions accumulated by ABC for n number of runs, where n is the population size of GA, are used as the initial population Pop0 of GA. GA iteratively improves the existing population by selecting and reproducing parents for a certain number of generations.
The selection of parents to produce the next generations is performed based on the fitness of the individuals. The selection method for this particular case is the roulette wheel selection. Two genetic operators, crossover and mutation, are then applied to generate new individuals. The extended intermediate recombination and real value mutation employed here are defined in the following way.
The crossover operator combines the genetic material of two parents. The recombination process is unconditional. Let x and y be two D-dimensional vectors denoting the parents from the current population. Let $z = (z_1, z_2, \ldots, z_D)$ be the result of the recombination. The elements of z are generated using Equation (4):

$$z_j = x_j + \alpha_j\,(y_j - x_j), \qquad (4)$$

where $j \in [1, D]$ and $\alpha_j \in [-\delta, 1 + \delta]$ is chosen with uniform probability. The value of $\delta$ indicates to what degree an offspring can be generated outside the parents' scope; usually, $\delta = 0.25$.
The mutation operator is applied to each element of $x = (x_1, x_2, \ldots, x_D)$ with a probability inversely proportional to the number of dimensions D. Let $z = (z_1, z_2, \ldots, z_D)$ be the result of the mutation. The elements of z are evaluated using Equation (5).
1} from a Bernoulli probability distribution and k ∈ {4, 5, . . . , 20} is the mutation precision related to the minimal step size and the distribution of mutation steps in the mutation range.
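The extended intermediate recombination described above can be sketched as follows. The sketch assumes the common form $z_j = x_j + \alpha_j (y_j - x_j)$ with a fresh $\alpha_j$ drawn for every coordinate; the function name and signature are hypothetical, and the mutation operator is intentionally left out.

```python
import random

def extended_intermediate_recombination(x, y, delta=0.25, rng=random):
    """Offspring z with z_j = x_j + alpha_j * (y_j - x_j), alpha_j ~ U[-delta, 1 + delta]."""
    z = []
    for xj, yj in zip(x, y):
        alpha = rng.uniform(-delta, 1.0 + delta)  # a fresh alpha for every coordinate
        z.append(xj + alpha * (yj - xj))
    return z

child = extended_intermediate_recombination([0.0, 0.0, 0.0], [1.0, 2.0, 4.0])
```

With delta = 0.25, every coordinate of the offspring lies in an interval that extends 25% of the parents' distance beyond each parent, which helps to keep the population from shrinking toward its centre over many generations.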
The fitness of each individual in the population is evaluated. Offspring with a better fitness evaluation move to the next generation until the maximum number of generations MaxGen is reached.
The optimal solution to the particular problem is the best individual from the last generation of GA. Since the initial population is not random but much closer to the optimal sought solution, GA converges to the optimal solution in much fewer iterations.
The hybrid ABC-GA proposed here applies the two algorithms, ABC and GA, sequentially. Therefore, the computational complexity of ABC and GA should be considered separately when the complexity of the hybrid is calculated.
For a given problem, let O(f) be the computational complexity of its fitness function evaluation. The complexity of the standard ABC can be expressed as O(MCN × NP × f). Since ABC is required to generate an initial population for GA, the algorithm is executed n times, where n is the population size of GA. The computational complexity of GA can be evaluated as O(MaxGen × n × f). The overall complexity of the ABC-GA hybrid algorithm can then be evaluated as

$$O(n \times MCN \times NP \times f + MaxGen \times n \times f).$$

It should be noted, however, that the values of the parameters MCN, NP and MaxGen used in the hybrid ABC-GA are far smaller than the values used in the standard case. The reason is that the ABC algorithm is used to give an initial push to GA, so that the search for the optimal solution of the problem starts from a closer position rather than a random one. This way, GA needs far fewer steps to complete the search.
The hybrid ABC-GA algorithm thus accepts some additional computational complexity in exchange for improved computation precision.
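As a rough worked example, the complexity bound can be read as a count of fitness evaluations, ignoring constant factors (an assumption of this sketch). Taking the settings reported later for the E. coli problem, and assuming NP denotes the ABC population of 10 individuals, with MCN = 25, n = 50 and MaxGen = 50:

```latex
\begin{aligned}
n \times MCN \times NP + MaxGen \times n
  &= 50 \times 25 \times 10 + 50 \times 50 \\
  &= 12\,500 + 2\,500 = 15\,000
\end{aligned}
```

Under this reading, most of the budget is spent on the repeated short ABC runs, while the GA refinement accounts for roughly one sixth of the evaluations.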

ABC-GA Hybrid Algorithm Performance on Different Unconstrained Optimization Problems
The proposed algorithm ABC-GA is tested on nine well-known benchmark functions listed in Table 1. All benchmark functions are considered with a dimension of 30 except for Wood's function (dimension of 4).

Table 1. Benchmark unconstrained optimization problems (columns: Function, Definition, Range).

To verify the performance of ABC-GA, the obtained results are compared with the hybrid algorithms ACO-GA and GA-ACO [34] and ACO-FA [35]. The common parameter settings of the hybrid algorithms used in all test sets are provided in Table 2. The parameter values were chosen so that each algorithm is configured adequately to find the optimal solution.
The performance of the four compared hybrid algorithms is evaluated by calculating the best value, the mean value and the standard deviation (SD). The obtained results are presented in Table 3. The best and mean values for each problem are highlighted in bold.
According to Table 3, ABC-GA produced the best value for all nine benchmark functions and the best mean value for eight of them. For Griewank, the best mean value was obtained by the hybrid ACO-GA.
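For readers who want to reproduce this kind of comparison, the sketch below gives the textbook definitions of three of the benchmark functions named in this section, together with a helper that computes the reported statistics. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which SD estimator was used.

```python
import math
import statistics

def sphere(x):
    return sum(v * v for v in x)

def rastrigin(x):
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)

def griewank(x):
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return s - p + 1.0

def summarize(final_values):
    """Best, mean and sample standard deviation over independent runs."""
    return min(final_values), statistics.mean(final_values), statistics.stdev(final_values)

# All three functions have their global minimum 0 at the origin.
origin = [0.0] * 30
minima = [sphere(origin), rastrigin(origin), griewank(origin)]
```

`summarize` is meant to be applied to the list of final objective values collected from repeated runs of one algorithm on one function.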
As can be seen for the unimodal Sphere function, the convergence accuracy and stability of ABC-GA are better than those of GA-ACO and ACO-FA. Only the results achieved by GA-ACO are close to the results of ABC-GA. The same results are obtained for the complex unimodal Rosenbrock function. However, for the multimodal Rastrigin, Ackley and Griewank functions and the unimodal Schaffer #1 function, it is mostly ACO-FA that does not produce good results. In the case of Griewank, GA-ACO and ACO-FA results are significantly worse, too. The mean values of GA-ACO and ACO-FA are $9.71 \times 10^{-3}$ and $8.70 \times 10^{-7}$, respectively, compared to the mean ABC-GA value of $1.08 \times 10^{-15}$. ACO-FA shows the worst performance for all considered benchmark test functions. The results obtained by ABC-GA demonstrate that a good balance between the algorithm's exploration and exploitation has been achieved.
To illustrate the efficiency of ABC-GA, the convergence of the compared hybrid algorithms towards the optima for 30 runs is presented in Figure 2. Two benchmark functions, Sphere and Powell, are chosen as the most indicative. For the rest of the functions, the superiority of ABC-GA is very clearly shown by the results in Table 3. ACO-FA is not included in the chart because it produces much worse results. The plots show the comparative performance of the ABC-GA hybrid algorithm. In terms of convergence, it can be noticed that ABC-GA has a relatively fast convergence toward its final optimal value compared to the other hybrid algorithms.

Problem Formulation
There is an increasing interest in technologies that maximize the production of various essential enzymes and therapeutic proteins based on E. coli cultivation [36][37][38][39]. The development of mathematical models for the description, monitoring and control of bioprocesses is a complicated task due to the inherent complexity and non-linearity of biological systems. An important part of building the model is the choice of a specific optimization procedure for parameter identification. The high accuracy of the estimations of the model parameters is essential for successful model development.
The application of the general state-space dynamical model to the fed-batch cultivation process of bacteria E. coli leads to the following nonlinear differential equation system [40]. The parameter identification problem considered here is solved using real experimental data of an E. coli MC4110 fed-batch cultivation process. Offline measurements of the biomass concentration and online data for glucose concentration are used. A detailed description of the cultivation conditions and experimental data are discussed in [40].
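For orientation, fed-batch cultivation models of this kind are commonly written with Monod growth kinetics. The form below is an assumption based on that common structure and on the three parameters identified in this paper ($\mu_{max}$, $k_S$, $Y_{S/X}$); it is not a transcription of the system given in [40]. The symbols $F_{in}$ (feeding rate), $V$ (bioreactor volume) and $S_{in}$ (substrate concentration in the feeding solution) are likewise assumed names.

```latex
\begin{aligned}
\frac{dX}{dt} &= \mu_{max}\,\frac{S}{k_S + S}\,X - \frac{F_{in}}{V}\,X \\
\frac{dS}{dt} &= -\frac{1}{Y_{S/X}}\,\mu_{max}\,\frac{S}{k_S + S}\,X + \frac{F_{in}}{V}\,(S_{in} - S) \\
\frac{dV}{dt} &= F_{in}
\end{aligned}
```

Here X is the biomass concentration, S is the substrate (glucose) concentration, $\mu_{max}$ is the maximum specific growth rate, $k_S$ is the saturation constant and $Y_{S/X}$ is the yield coefficient.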
The numerical optimization problem uses a standard objective function: a minimization of the distance measure J between the experimental data and the model-predicted values of the main process variables:

$$J = \sum_{i=1}^{n} \left(X_{exp}(i) - X_{mod}(i)\right)^2 + \sum_{i=1}^{n} \left(S_{exp}(i) - S_{mod}(i)\right)^2 \rightarrow \min, \qquad (9)$$

where n is the number of data points for each process variable (X, S); $X_{exp}$ and $S_{exp}$ are the biomass and substrate experimental data; $X_{mod}$ and $S_{mod}$ are the biomass and substrate model predictions for a given set of the model parameters.
Although the objective function seems bi-objective, the preliminary tests show that there is no need to use some weighted method to convert the function into a single objective.
The two process variables, X ($\frac{dX}{dt} = f(X, S)$) and S ($\frac{dS}{dt} = f(X, S)$), are deeply connected and interdependent. Moreover, due to the specificity of the experimental data, the error of the first part of Equation (9) is in the same range as the error of the second part of the equation. Therefore, the use of any weight coefficients in the objective function leads to the deterioration of the results.
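A minimal sketch of how such an unweighted objective could be evaluated is given below. It assumes a Monod-type fed-batch model, a constant feed rate and a simple explicit Euler integration; the function names, the model form and all numerical values in the usage note are hypothetical and serve only to show how the biomass and substrate errors are summed without weights.

```python
def simulate(params, x0, s0, v0, f_in, s_in, t_end, steps):
    """Euler integration of an assumed Monod fed-batch model; returns (X, S) samples."""
    mu_max, k_s, y_sx = params
    dt = t_end / steps
    x, s, v = x0, s0, v0
    xs, ss = [x], [s]
    for _ in range(steps):
        mu = mu_max * s / (k_s + s)                   # specific growth rate
        dx = mu * x - f_in / v * x                    # biomass balance
        ds = -mu * x / y_sx + f_in / v * (s_in - s)   # substrate balance
        # The substrate is clamped at zero to keep the explicit step physical.
        x, s, v = x + dt * dx, max(s + dt * ds, 0.0), v + dt * f_in
        xs.append(x)
        ss.append(s)
    return xs, ss

def objective(params, x_exp, s_exp, **setup):
    """J = sum of squared biomass errors + sum of squared substrate errors."""
    x_mod, s_mod = simulate(params, **setup)
    jx = sum((xe - xm) ** 2 for xe, xm in zip(x_exp, x_mod))
    js = sum((se - sm) ** 2 for se, sm in zip(s_exp, s_mod))
    return jx + js
```

For example, with `setup = dict(x0=1.25, s0=0.8, v0=1.35, f_in=0.05, s_in=100.0, t_end=1.0, steps=100)` (illustrative values only), data generated by `simulate((0.5, 0.01, 2.0), **setup)` give J = 0 for the same parameter vector and a positive J for any vector that produces a different trajectory.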
The upper and lower bounds of the model parameters $\mu_{max}$, $k_S$ and $Y_{S/X}$ (see Equation (1)) are considered as follows [41]: $x^{\max} = [0.8, 1, 10]$.
Processes 2021, 9, 1418
For the problem considered here, the constructed solution is a 3-dimensional vector $x = [\mu_{max}, k_S, Y_{S/X}]$.

Numerical Results and Discussion
All computations have been performed using a PC/Intel Core i7-8700 CPU @ 3.20GHz, 16 GB Memory (RAM), Windows 10 operating system and Matlab R2013a environment.
The basic parameters of both algorithms (ABC and GA) have been set based on a series of preliminary tests in accordance with the problem under consideration. The chosen algorithms' parameters and functions are summarized in Table 4 (only those parameters that obtain different values compared to those presented in Table 2 are listed here). The parameter settings of the other comparative algorithms are taken from the respective original papers. ABC algorithm starts with a small population of ten individuals. The initial solution is evaluated for only 25 iterations. The best 50 solutions are then used as an initial GA population. Again, for a low number of generations (50), GA looks for optimal model parameter estimates.
Because of the stochastic characteristics of the hybrid algorithm, a series of 30 runs is performed. The average, best and worst results of the 30 runs for the objective function value J are summarized in Table 5. The presented results show that for a small population size (10) and only 25 iterations, the ABC algorithm produces a very good initial solution for GA. To form the initial population for GA, ABC with these particular parameter settings is executed 50 times. Next, based on this initial population, GA converges to a near-optimal solution within 50 generations.
Estimated values of the model parameters ($\mu_{max}$, $k_S$ and $Y_{S/X}$), as well as statistical measures such as the population variance ($\sigma^2$) and standard deviation (SD), are presented in Table 6. The population variance is used since it is a parameter of a set of decisions (estimations) that does not depend on research methods or sampling practices. Similar results for the estimates of the yield of glucose per biomass ($Y_{S/X}$) are reported in [41,43]. According to [44,45], the values of the parameters $\mu_{max}$ and $k_S$ are also within acceptable boundaries.
The values obtained for σ 2 and SD show the good performance of the proposed ABC-GA hybrid algorithm.
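The two spread measures can be computed with Python's standard library, as in the following sketch. The helper name `spread` and the sample numbers are purely illustrative and are not taken from Table 6; taking the SD as the square root of the population variance is also an assumption.

```python
import statistics

def spread(estimates):
    """Population variance and its square root for a list of parameter estimates."""
    variance = statistics.pvariance(estimates)  # divides by N, not N - 1
    return variance, variance ** 0.5

# Illustrative values only.
variance, sd = spread([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The population variance divides by the number of runs N, whereas the sample variance divides by N − 1; for 30 runs the two differ by a factor of 30/29.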
A graphical representation of the modeled E. coli fed-batch cultivation process variables (biomass and substrate) and the measured ones (real experimental data), based on the model parameters listed in Table 6, is presented in Figures 3 and 4. The graphical results show that the ABC-GA hybrid algorithm achieves a very good correspondence between the measured and modeled process variables. The model obtained by the ABC-GA hybrid scheme predicts with a high degree of accuracy the biomass and substrate dynamics of the process.
The results presented here are compared to already published results from the application of different hybrid metaheuristics to the same problem. The performance of the proposed hybrid ABC-GA is compared to the performance of the hybrid algorithms ACO-GA and GA-ACO [34] and ACO-FA [35] (Table 7). The comparison of the estimated model parameters ($\mu_{max}$, $k_S$ and $Y_{S/X}$) is presented in Table 8. The comparison shows that the designed ABC-GA hybrid algorithm performs better than the other considered algorithms. The best result, J = 4.3391, is obtained by ABC-GA. A very close result is achieved by the ACO-FA hybrid, considering the observed average results.
The worst result (J = 4.7016) obtained by ABC-GA shows that some further improvement of the GA evolution is necessary in order to achieve better results. It has been observed in the literature that the sampling capability of GA is greatly affected by the population size [46]. Another problem of GA is the algorithm's parameters setting. The choice of appropriate control parameters is a tedious task. These will be directions for the process of improving the performance of GA in the ABC-GA hybrid algorithm proposed here.
In order to show the underlying frequency distribution of the results obtained by the compared hybrid algorithms, histograms are presented in Figure 5.
The observed distribution of the results of the ACO-GA hybrid algorithm is a plateau or multimodal distribution. GA-ACO and ACO-FA produce results with a left-skewed distribution. The objective function values of the 30 runs obtained by the ABC-GA hybrid algorithm considered here follow a normal distribution. The advantage of hybridizing the ABC algorithm with GA over the other considered hybrid algorithms is that the multi-modality or skewed distribution is avoided.
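Besides inspecting histograms, the shape of such a distribution can be checked numerically. The sketch below computes the population skewness of a list of objective function values; the helper name and the sample data are hypothetical, and this check is offered as a complement to the histograms rather than as the analysis used in the paper.

```python
def skewness(values):
    """Population skewness: negative for a left-skewed sample, near 0 if symmetric."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = skewness([1.0, 2.0, 3.0, 4.0, 5.0])
left_skewed = skewness([1.0, 8.0, 9.0, 9.0, 10.0])
```

A value close to zero is consistent with a symmetric, bell-shaped histogram, while a clearly negative value indicates a long tail toward smaller values.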

Conclusions
In this paper, an ABC-GA hybrid algorithm is designed and applied to the parameter identification of a cultivation model. A system of nonlinear ordinary differential equations is used to model bacteria growth and substrate utilization. Model parameter identification is performed using a real experimental data set from an E. coli MC4110 fed-batch cultivation process.
The ABC-GA algorithm starts with a small ABC population (10 individuals) and obtains the initial set of solutions for GA in only 25 iterations. GA then converges to the final solution in only 50 generations.
The proposed hybrid algorithm is further compared to other nature-inspired population-based hybrid metaheuristics known in the literature. As competing algorithms, GA, FA and ACO hybrids are chosen and applied to the same parameter identification problem. It is shown that the ABC-GA hybrid algorithm outperforms the other competitor algorithms.
Since the hybrid uses a low number of individuals, the known dependence of the sampling capability of GAs on the population size could be exhibited, for example, in the case when ABC-GA obtains its worst results (Table 7). Further improvements to the designed ABC-GA hybrid will be directed to overcoming this dependence. Another direction for improving the algorithm is the setting of the algorithm's parameters. A more in-depth study on the influence of the control parameters (both for ABC and GA) on the hybrid algorithm's performance could be made.