A Hybrid of Particle Swarm Optimization and Harmony Search to Estimate Kinetic Parameters in Arabidopsis thaliana

Abstract: Recently, modelling and simulation have been applied to understand biological systems better. Therefore, the development of precise computational models of a biological system is essential. Such a model is a mathematical expression derived from a series of parameters of the system. Measuring parameter values through experimentation is often expensive and time-consuming. In a simulation, however, computational parameters are easy to manipulate, and thus the behaviour of a biological system model can be altered for better understanding. The complexity and nonlinearity of a biological system make parameter estimation the most challenging task in modelling. Therefore, this paper proposes a hybrid of Particle Swarm Optimization (PSO) and Harmony Search (HS), known as PSOHS, designed to determine the kinetic parameter values of essential amino acids, mainly in aspartate metabolism, in Arabidopsis thaliana. Three performance measurements are used in this paper to evaluate the proposed PSOHS: the standard deviation, the nonlinear least squared error, and the computational time. The proposed algorithm outperformed the other two methods, namely Simulated Annealing and the downhill simplex method, showing that PSOHS is a more suitable algorithm for estimating kinetic parameter values.


Introduction
In silico optimization is a rapid and cost-effective method for finding optimal solutions in optimization problems. The two currently available approaches are local and global methods. Each, however, comes with inherent systematic problems that require troubleshooting. The local optimization method performs poorly when solving non-linear and dynamic biological problems [1]. In this type of problem, global search is a more effective method, as it is capable of finding an optimal solution that satisfies all requirements. In fact, the global search method has recently been the subject of much attention within the scientific community [1][2][3]. Parameter estimation is an essential phase in the simulation of a biological system because the values of the parameters determine the behaviour of the model.
In estimating parameter values, it is important to first determine the objective function that can then be minimized by using suitable optimization methods. The objective function must be minimized to obtain the ideal values of the parameters. Several optimization methods have, therefore, been applied to estimate parameter values [4]. After comparing several methods, Baker et al. [4] pointed out that while the Genetic Algorithm (GA) was less time-consuming than Simulated Annealing (SA), it faced the local minima problem. Markov chain Monte Carlo (MCMC) approaches are also often applied to determine the posterior distribution of rate parameters. However, developing a good MCMC sampler for multimodal and high-dimensional parameter distributions is challenging. Valderrama-Bahamóndez and Fröhlich [5] found that parallel adaptive MCMC performed best in parameter estimation after comparing the ability of different MCMC approaches to estimate the kinetic rate parameters of ordinary differential equation (ODE) systems.
In addition, global optimization algorithms, like Particle Swarm Optimization (PSO) [6,7], SA [8], and Scatter Search [9,10], have been successful in the estimation of parameters in different biological models. In a comparative study on computational intelligence methods, carried out by Tangherloni et al. [11], the performance of various meta-heuristics was compared; in particular, their ability to estimate parameters on a set of benchmark functions and synthetic biochemical models. Tangherloni et al. [11] concluded that classic benchmark functions lacked the complexity that characterizes real-world optimization problems. The hybridization of optimization methods has been proposed so that the best features of each method can be utilized. Convergence to global optima is a persistent problem in standard PSO for multimodal and high-dimensional functions [12]. Hence, Fu et al. [12] proposed a hybrid method that incorporates the evolutionary operations of the Differential Evolution (DE) algorithm to improve the conventional velocity updating strategy of PSO. Harmony Search (HS) is a simple yet efficient evolutionary algorithm. It has parameters such as the Harmony Memory Considering Rate (HMCR) and Pitch Adjusting Rate (PAR) that help escape local optima. Furthermore, HS is able to balance intensification and diversification, whereby promising regions and unexplored regions are thoroughly and evenly explored [13]. This paper introduces a hybrid of Particle Swarm Optimization [14] and Harmony Search [15] (PSOHS) and simulates essential amino acid metabolism for kinetic parameter estimation.
PSO estimates kinetic parameter values by using swarm intelligence. The candidate solutions are analogous to particles flying with specific velocities in specific directions in the search space, either alone or together with their companions. Each particle is expected to move toward its best positions. HS, on the other hand, is based on an analogy with natural musical performance processes: by improving the pitches of each music player's instrument, a better harmony is achieved and, hence, a better-quality performance [16]. Gao et al. [16] further discussed this method and its applications, and the historical development of the HS algorithm structure has been extensively reviewed by Zhang and Geem [17]. This paper introduces a new hybrid algorithm, PSOHS, into the SBToolbox to estimate kinetic parameters by simulating the aspartate metabolism of isoleucine, lysine, and threonine in a small plant of the mustard family, Arabidopsis thaliana. This model organism is often chosen in research because it has been demonstrated to render better results when analysing plant development and growth. This paper focuses on the biochemical reactions of essential amino acid production in Arabidopsis thaliana.

Materials and Methods
This section describes and discusses the problem formulation, and it details the hybridization of PSO and HS. A basic PSO is also described to differentiate between the basic PSO and the hybrid version.

Problem Formulation
This paper uses a biochemical systems framework, a mathematical modelling approach based on ordinary differential equations (ODEs), in which biochemical processes are represented by power-law expansions in the variables. For a biochemical process, a system of kinetic equations is formed according to the information regarding the underlying network structure of a pathway, and the parameters are either acquired from the literature or estimated from a data fit. Chemical kinetics describe the rates of chemical reactions. The parameter estimation problem is posed in Equation (1) below, where s(X) denotes a biological compound depending on a set of parameters X = (X1, X2, X3, ..., Xd), and d is the total number of parameters. Hence, the reaction rate of the compound s is presented by

ds/dt = g(s(X), t),   s(t0) = s(0),   y = g(s(X), t) + e.   (1)

In Equation (1), g represents a nonlinear function and t represents the sampling time. Then, y is a time series of simulated data, also known as the output of the model, and e represents noise that is randomly drawn from a Gaussian distribution. The purpose of the parameter estimation is to discover the optimal parameter set X such that the variance between the simulated time-series data, denoted as y, and the experimental time-series data, denoted as y_exp, is reduced. The variance is calculated by applying the nonlinear least squared error function f(X), which is shown in Equation (2):

f(X) = sum over i = 1 to n of (y_i − y_i^exp)^2.   (2)

In Equation (2), n is the total number of sampled data points and i is the index variable.
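The objective function of Equation (2) can be sketched in code. The kinetic model below is a hypothetical one-compound rate law chosen only for illustration; it is not the aspartate-pathway model, and the parameter names k1 and k2 are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-parameter kinetic model: ds/dt = g(s(X), t).
# The rate law here is illustrative, not the aspartate-pathway model.
def g(t, s, X):
    k1, k2 = X
    return k1 - k2 * s  # simple production/degradation kinetics

def simulate(X, t_eval, s0=0.0):
    sol = solve_ivp(g, (t_eval[0], t_eval[-1]), [s0], args=(X,), t_eval=t_eval)
    return sol.y[0]

def f(X, t_eval, y_exp):
    """Nonlinear least squared error between simulation and experiment (Eq. 2)."""
    y = simulate(X, t_eval)
    return float(np.sum((y - y_exp) ** 2))

# Synthetic "experimental" data: simulate known parameters, add Gaussian noise
t = np.linspace(0.0, 5.0, 20)
true_X = (2.0, 0.8)
rng = np.random.default_rng(0)
y_exp = simulate(true_X, t) + rng.normal(0.0, 0.01, size=t.size)

print(f(true_X, t, y_exp))      # small error near the true parameters
print(f((0.5, 0.1), t, y_exp))  # larger error for a poor guess
```

Any of the optimizers discussed below can then be applied to minimize f(X) over the parameter space.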

Particle Swarm Optimization (PSO)
PSO is a swarm intelligence-based optimization algorithm. The method was developed by Kennedy and Eberhart [6], who were inspired by the social behaviour of flocking birds and schooling fish when searching for food [6].
In PSO, each particle acts as a possible solution in the search space of a problem. Every particle is assigned a certain velocity that determines its movement through the search space. Each particle's movement is affected not only by its own best-known position but is also directed toward the best-known position found by the other particles.
Hence, there are two positions in the whole swarm, the local best-known position and the best-known position in the swarm. pbest represents the local best-known position while gbest represents the best-known positions in the swarm. Figure 1 shows the PSO flowchart. First, each particle is assigned a random position within the problem space to form an initial population. Then, the fitness of each particle is evaluated and compared with pbest. If the current value is better than pbest, then this value is updated to the new pbest. Then, the particle's best-known position and the swarm's best-known position are updated. The termination criterion is when the maximum number of iterations is performed, or a solution meets the adequate objective function value.
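The pbest/gbest bookkeeping and velocity update described above can be sketched as follows. The inertia weight and acceleration coefficients (w, c1, c2) and the search bounds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pso(fitness, d, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    """Minimal PSO sketch: pbest/gbest tracking with velocity updates."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, d))   # random initial positions
    v = np.zeros((n_particles, d))              # velocities
    pbest = x.copy()                            # local best-known positions
    pbest_val = np.array([fitness(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()        # swarm best-known position
    g_val = float(pbest_val.min())
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([fitness(p) for p in x])
        better = vals < pbest_val               # update pbest where improved
        pbest[better], pbest_val[better] = x[better], vals[better]
        if vals.min() < g_val:                  # update gbest if improved
            g, g_val = x[vals.argmin()].copy(), float(vals.min())
    return g, g_val

best, val = pso(lambda p: float(np.sum(p ** 2)), d=3)
print(best, val)  # converges near the zero vector on the sphere function
```

The sphere function stands in for the least-squares objective f(X) of Equation (2); in the actual estimation task, fitness would evaluate the ODE simulation.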

Harmony Search (HS)
Geem et al. [13] developed HS after being inspired by the music improvisation process. The algorithm achieves better harmony in the same way that music players improve the pitches of their instruments [18]. To escape from local minima, HS performs a stochastic random search instead of a gradient search. Three rules are used to generate a perfect harmony: Random Selection (RS), Harmony Memory Considering Rate (HMCR), and Pitch Adjusting Rate (PAR). Figure 2 shows the flowchart of HS. First, a population of random harmonies in the Harmony Memory (HM) is initialized; several randomly generated solutions are included in the initial HM. The next step is to improvise a new solution from the HM, followed by updating the HM. In each repetition, the algorithm improvises a new harmony, which is generated by the following three rules: (1) HMCR is used to select the variables of the new harmony from the harmonies stored in the HM; (2) PAR is responsible for local improvement; and (3) RS provides random elements for the new harmony. If the latest harmony is better than the worst existing one, it replaces the worst harmony in the HM. This process is iterated until the stopping criteria are met.
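The HMCR/PAR/RS improvisation loop can be sketched as below. The memory size, pitch-adjustment bandwidth (bw), and bounds are illustrative assumptions.

```python
import random

def harmony_search(fitness, d, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iters=2000, lo=-5.0, hi=5.0, seed=0):
    """Minimal HS sketch: improvise via HMCR/PAR/random selection and
    replace the worst harmony in memory when the new one is better."""
    rng = random.Random(seed)
    hm = [[rng.uniform(lo, hi) for _ in range(d)] for _ in range(hms)]
    vals = [fitness(h) for h in hm]
    for _ in range(iters):
        new = []
        for j in range(d):
            if rng.random() < hmcr:              # (1) memory consideration
                x = hm[rng.randrange(hms)][j]
                if rng.random() < par:           # (2) pitch adjustment
                    x += rng.uniform(-bw, bw)
            else:                                # (3) random selection
                x = rng.uniform(lo, hi)
            new.append(min(max(x, lo), hi))
        v = fitness(new)
        worst = max(range(hms), key=vals.__getitem__)
        if v < vals[worst]:                      # replace worst harmony
            hm[worst], vals[worst] = new, v
    best = min(range(hms), key=vals.__getitem__)
    return hm[best], vals[best]

best, val = harmony_search(lambda h: sum(x * x for x in h), d=3)
print(val)  # small value near the sphere minimum
```

As with the PSO sketch, the sphere function is only a placeholder for the least-squares objective of Equation (2).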

A Hybrid of PSO and HS (PSOHS)
This paper proposes PSOHS, a hybrid of PSO and HS. Figure 3 shows the steps taken by the proposed algorithm in order to acquire the optimal parameter values. In the initialization, the population size is 6, the number of iterations is 20, and the harmony consideration rate is 0.9. Then, the fitness of each particle in d variables is evaluated.

Iteration
The next step involves evaluating and sorting the fitness of the population. In each iteration, two values are updated, which are pbest and gbest. At the iteration step, each solution is evaluated by using the optimization fitness function. If the current value of the solutions in the d-dimensional space is better than pbest, then the pbest value is updated to the current value and the pbest location is the current location. Then, the fitness value is compared with the best-known positions in the swarm. If the current value is better than gbest, then gbest is updated to the current solution index.

Hybridization of Harmony Search
First, the parameters and the Harmony Memory (HM) are initialized. The parameters are (i) the size of the HM; (ii) the Harmony Memory Considering Rate (HMCR); and (iii) the Pitch Adjusting Rate (PAR). Next, the latest harmony is created based on three rules: (i) Random Selection (RS); (ii) HMCR; and (iii) PAR. Firstly, several randomly created solutions to the problem are included in the initial HM. Each component of a new solution is acquired on the basis of the HMCR: the probability of choosing a component from the HM members is HMCR and, therefore, 1-HMCR is the probability of creating it randomly. A component chosen from a random HM member is further mutated with probability PAR. The RS rule is responsible for providing random elements to the new harmony. If the latest harmony yields a better fitness, it replaces the worst member in the HM; otherwise, it is discarded. This process is iterated until a stopping condition is reached.

Termination
The iterations terminate if the stopping criterion is achieved. The two stopping criteria are when the fitness function can no longer improve or when the predefined maximum loop values are reached.
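The stages above (PSO update, HS improvisation, and the two termination criteria) can be combined into one loop. The coupling scheme below, in which an HS improvisation over the particle positions replaces the worst particle each iteration, is an illustrative assumption rather than the paper's exact design, as are all numeric settings.

```python
import random

def psohs(fitness, d, n=10, max_iters=500, stall_limit=50,
          w=0.7, c1=1.5, c2=1.5, hmcr=0.9, par=0.3, bw=0.1,
          lo=-5.0, hi=5.0, seed=0):
    """Sketch of a PSO/HS hybrid with both stopping criteria: a maximum
    iteration count and a no-improvement (stall) counter."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(d)] for _ in range(n)]
    v = [[0.0] * d for _ in range(n)]
    pbest = [row[:] for row in x]
    pval = [fitness(p) for p in x]
    gi = min(range(n), key=pval.__getitem__)
    gbest, gval = pbest[gi][:], pval[gi]
    stall = 0
    for _ in range(max_iters):
        for i in range(n):                      # PSO velocity/position update
            for j in range(d):
                v[i][j] = (w * v[i][j]
                           + c1 * rng.random() * (pbest[i][j] - x[i][j])
                           + c2 * rng.random() * (gbest[j] - x[i][j]))
                x[i][j] = min(max(x[i][j] + v[i][j], lo), hi)
        new = []                                # HS improvisation over swarm
        for j in range(d):
            if rng.random() < hmcr:
                val = x[rng.randrange(n)][j]
                if rng.random() < par:
                    val += rng.uniform(-bw, bw)
            else:
                val = rng.uniform(lo, hi)
            new.append(min(max(val, lo), hi))
        worst = max(range(n), key=lambda i: fitness(x[i]))
        if fitness(new) < fitness(x[worst]):    # replace worst particle
            x[worst] = new
        improved = False
        for i in range(n):                      # pbest / gbest bookkeeping
            fv = fitness(x[i])
            if fv < pval[i]:
                pbest[i], pval[i] = x[i][:], fv
            if fv < gval:
                gbest, gval, improved = x[i][:], fv, True
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:                # fitness no longer improves
            break
    return gbest, gval

best, val = psohs(lambda p: sum(t * t for t in p), d=3)
print(val)
```

The stall counter implements the first stopping criterion (no further improvement), while the loop bound implements the second (maximum iterations).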

Experimental Setup
The experiments have been carried out by means of computer simulation in which the performance of the selected algorithms has been assessed.

Experiment Setup (Computational Approach)
Three different algorithms, PSOHS, SA, and the downhill simplex method, have been used to estimate the parameter values. The algorithms were executed in MATLAB R2010a on a laptop with 1024 MB (1 GB) of RAM and an Intel Pentium 4 processor. The result of PSOHS was compared with SA and the downhill simplex method. A total of 50 individual runs were performed for estimating all the kinetic values. The accuracy, consistency, nonlinear least squared error, and standard deviation were calculated and compared with both algorithms to evaluate the performance of PSOHS. The formulas for calculating the nonlinear least squared error and standard deviation are given below:

e = sum over i = 1 to N of (y − y_i)^2.   (3)

In Equation (3), e is the sum of the squares of the errors, y is the measurement result, and y_i is the simulated result. Equation (4) is used to calculate the average squared error, where A represents the average squared error and N represents the number of samples:

A = e / N.   (4)

Equation (5) is used to calculate the standard deviation over the runs, where x_i is the result of run i and x-bar is the mean over the N runs:

sigma = sqrt( (1/N) * sum over i = 1 to N of (x_i − x-bar)^2 ).   (5)
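The three performance measures can be computed directly from their definitions. This is a minimal sketch with made-up sample numbers; the function names are assumptions.

```python
import math

def squared_error(y, y_sim):
    """Sum of squared errors between measured and simulated series (Eq. 3)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_sim))

def average_squared_error(e, n):
    """Average squared error A = e / N over N samples (Eq. 4)."""
    return e / n

def std_dev(values):
    """Population standard deviation over the individual runs (Eq. 5)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Made-up measured vs. simulated values, purely for illustration
y = [1.0, 2.0, 3.0]
y_sim = [1.1, 1.9, 3.2]
e = squared_error(y, y_sim)
print(round(e, 4))                        # 0.06
print(round(average_squared_error(e, len(y)), 4))  # 0.02
```

In the paper's protocol, std_dev would be applied to the best objective values from the 50 individual runs, so a small result indicates high precision.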
Aspartate metabolism from Arabidopsis thaliana is the dataset that has been used in this paper. Arabidopsis thaliana has been chosen as a model organism due to its advantages for genetic experiments, such as (i) a short generation time; (ii) its small size; and (iii) its prolific seed production. In microorganisms, aspartate acts as the precursor to several amino acids, including methionine, threonine, isoleucine, and lysine, which are essential for humans. Threonine, isoleucine, and lysine have been selected in this paper because all of them cannot be produced by the body, but they are important in almost all body functions. Dataset details are shown in Table 1. In this paper, the values of a total of 31 kinetic parameters are estimated using PSOHS.

Result and Discussion
The performance of PSOHS was compared with SA and the downhill simplex method. Tables 2-4 show the kinetic parameter values that are estimated using PSOHS, SA, and the downhill simplex method on the basis of the experimental values [19]. The parameter values may vary widely in scale, as they originate from previous work [19]. It should be noted that Equation (3) is used to calculate the distance between the experimental data and the model simulation for each reaction in each amino acid (isoleucine, lysine, and threonine). Many reactions, and hence ODEs, are involved in each amino acid in Arabidopsis thaliana. Tables 2-4 summarize the experimental results; the average squared error reported there averages over the reactions in the ODE system for each amino acid, each involving a number of kinetic parameters. Meanwhile, Tables 5-7 report the parameter values obtained from the experimental data, as well as those generated by the proposed PSOHS, the downhill simplex method, and SA. This paper focuses on parameter estimation using the proposed PSOHS. The performance of PSOHS was measured in terms of computational time, model accuracy, and precision of the algorithms. Model accuracy is measured by the distance value between the experimental data and model simulation using the nonlinear least squared method. In addition, the average squared error was calculated to obtain the average over all ODEs in each amino acid. To test the algorithm's precision, the standard deviation was computed over 50 individual runs: a high standard deviation indicates low precision, and a low standard deviation indicates high precision. The experiments were carried out in 50 individual runs for each algorithm, and the result shown is the best multivariate solution among the runs. The average squared error and standard deviation were calculated from the runs.
Tables 2-4 show the comparisons of computational time in seconds, average squared error, and standard deviation among PSOHS, SA, and the downhill simplex method. Table 2 shows that the execution time for PSOHS to estimate the six kinetic parameters of isoleucine is 100.23 s, which is the lowest compared to 130.56 s for the downhill simplex method and 778.00 s for SA. The standard deviation of PSOHS is 0.0002, the closest to zero compared to the downhill simplex method and SA, which are 0.0004 and 0.002, respectively. Based on these comparisons, PSOHS shows the lowest average squared error and a low standard deviation, which indicates that PSOHS is more consistent, precise, and reliable in estimating parameter values than SA and the downhill simplex method. Table 3 shows that the average nonlinear least squared errors for PSOHS, the downhill simplex method, and SA are 0.0211, 0.084, and 0.0406, respectively, and the corresponding standard deviations are 0.0133, 0.0998, and 0.0347. Moreover, the computational time is 184.03 s for PSOHS, 376.59 s for the downhill simplex method, and 1518.05 s for SA. Among the three algorithms for estimating the kinetic parameters of lysine, PSOHS shows the best result.
Threonine has the greatest number of kinetic parameters to be estimated among the selected amino acids. Table 4 presents the comparison among PSOHS, the downhill simplex method, and SA. The average nonlinear least squared error for PSOHS is smaller than those of the downhill simplex method and SA: 0.0024 for PSOHS, compared with 0.012 and 0.0066, respectively. Furthermore, to estimate all kinetic parameters, PSOHS takes 255.37 s, a considerably lower computational time than the downhill simplex method, which takes 362.32 s, and SA, which takes 1794.91 s. In terms of standard deviation, the downhill simplex method and SA show higher values than PSOHS: 0.017 and 0.0079, respectively, versus 0.0037 for PSOHS. The results show that PSOHS outperforms the downhill simplex method and SA in estimating the sixteen kinetic parameters of threonine.

Conclusions
In conclusion, when estimating kinetic parameter values, PSOHS performed better than the downhill simplex method and SA, as shown by the smaller standard deviation of PSOHS. Moreover, PSOHS is less time-consuming. The lower nonlinear least squared error of PSOHS also shows that this algorithm is more accurate than SA and the downhill simplex method. Future lines of research will focus on including different performance measurements and algorithms and on comparing how they affect the performance of PSOHS. Only one dataset has been used in this research due to an unavoidable constraint; however, more datasets can be used in future work. Large-scale metabolic parameter estimation is preferable, but the inclusion of more datasets poses a bigger challenge, since the parameters of every single gene and its product would need to be estimated, leading to large-scale metabolic parameter estimation [20]. PSOHS should be further expanded to that scale with the aim of resolving the problem. Besides, the shortcomings of the existing HS, such as the fine-tuning ability of the algorithm, can also be improved in future work [21].