Shaping and Dilating the Fitness Landscape for Parameter Estimation in Stochastic Biochemical Models

Featured Application: Potential applications of this work regard the parameter estimation and the resulting analyses of complex biochemical systems characterized by a stochastic behavior, which allows for elucidating the unknown underlying mechanisms. Abstract: The parameter estimation (PE) of biochemical reactions is one of the most challenging tasks in systems biology given the pivotal role of these kinetic constants in driving the behavior of biochemical systems. PE is a non-convex, multi-modal, and non-separable optimization problem with an unknown ﬁtness landscape; moreover, the quantities of the biochemical species appearing in the system can be low, making biological noise a non-negligible phenomenon and mandating the use of stochastic simulation. Finally, the values of the kinetic parameters typically follow a log-uniform distribution; thus, the optimal solutions are situated in the lowest orders of magnitude of the search space. In this work, we further elaborate on a novel approach to address the PE problem based on a combination of adaptive swarm intelligence and dilation functions (DFs). DFs require prior knowledge of the characteristics of the ﬁtness landscape; therefore, we leverage an alternative solution to evolve optimal DFs. On top of this approach, we introduce surrogate Fourier modeling to simplify the PE, by producing a smoother version of the ﬁtness landscape that excludes the high frequency components of the ﬁtness function. Our results show that the PE exploiting evolved DFs has a performance comparable with that of the PE run with a custom DF. Moreover, surrogate Fourier modeling allows for improving the convergence speed. Finally, we discuss some open problems related to the scalability of our methodology.


Introduction
The parameter estimation (PE) problem consists of estimating the reliable kinetic parameterization of a biochemical model and represents one of the most complex but fundamental tasks in systems biology, since kinetic constants are necessary to simulate the dynamics of the system under investigation. The inference of a good parameterization thus allows for validating the model and for formulating predictions on the emergent behavior of the system in perturbed conditions [1]. Similar to other real-world optimization problems, PE is typically characterized by many local optima that impede reaching the global optimum due to premature convergence of any optimization meta-heuristics. Moreover, when dealing with the PE of stochastic biochemical models, the multidimensional fitness landscape is rugged, making the optimization process even harder and more time-consuming [2][3][4].
Many works proposed solutions focused on the simplification of the optimization task by exploiting specific transformation functions to re-map the original candidate solutions into a modified search space or by using "smoothed" surrogate fitness landscapes [5]. The manipulation of the fitness landscape can be realized by exploiting the so-called dilation functions (DFs) [6], which can "compress" and "dilate" localized areas of the search space, or other approaches such as the Shrinking Space Technique [7][8][9], which alters the search space to concentrate the search on specific areas, and the Space Transformation Search, [10], which transforms the original search space and simultaneously evaluates the solutions in both the original and transformed spaces. The generation of surrogate models has been shown to be particularly advantageous when algorithms that generally require a huge number of fitness evaluations to converge to the optimal solution-e.g., evolutionary computation and swarm intelligence-are used to tackle the optimization problem. As a matter of fact, the surrogate model is based on an approximated fitness function that can be evaluated with a reduced computational effort [11]. Multi-modal, rugged, and noisy fitness landscapes-such as those typically associated with the PE problem of stochastic biochemical models-can be efficiently tackled with a surrogate modeling method called surF [12], which exploits the discrete Fourier transform. While surF is applied here on the landscape induced by a stochastic model, it can clearly be applied to a deterministic landscape, as carried out, for example, in [12,13]. A peculiarity of surF is that the number of low frequency spectral coefficients used for the inverse Fourier transform can be set to adjust the smoothness of the created surrogate model.
In [5], DFs and surF were coupled with FST-PSO [14], a settings-free version of particle swarm optimization (PSO) [15], to investigate the PE problem of biochemical systems. The transformation of the candidate solutions operated by the DFs together with the surrogate model obtained with surF allowed for improving the optimization process carried out with FST-PSO, which altogether provided good-quality results for the PE problem while reducing the number of fitness evaluations required.
In this paper, we extend the work presented in [5] by extensively investigating the impact of (1) DFs [6] and (2) the combination of DFs and surF [12] on the performance and the results of the PE of stochastic biochemical models. Moreover, we tackle a limitation of these approaches that typically requires specific knowledge about the features of the fitness landscape of the optimization problem under investigation. To this aim, we included in the PE problem an additional parameter to be estimated, which encodes the number of low-frequency Fourier coefficients retained to build the surrogate with surF, and we automatically derived an optimal DF utilizing a separate optimization process. Our results highlight the advantages of using these approaches when dealing with the PE problem and reveal some limitations regarding the optimization of the DFs.
The paper is structured as follows. In Section 2, we introduce the formalism used to define reaction-based models of biochemical systems, the algorithm employed for the stochastic simulations of the dynamics, the meta-heuristics exploited for the optimization process, the concept of DF, and the method used to create surrogates of the fitness landscapes. In Section 3, we present the results related to the investigation of the effect of using DF to solve the PE problem and the performance of DFs when coupled with surF. Section 4 provides some final remarks and directions for future works.

Reaction-Based Modeling and Stochastic Simulation Algorithm
Considering the stochastic formulation of chemical kinetics [16], we can define a reaction-based model (RBM) by specifying two disjoint sets:

•
The set S = {S 1 , . . . , S N } of molecular species; • The set R = {R 1 , . . . , R M } of the biochemical reactions describing all interactions among the species in S. In RBMs, a biochemical reaction is defined as follows: where α m,n , β m,n ∈ N indicate the stoichiometric coefficients associated with the n-th reactant and with the n-th product of the m-th reaction, respectively. The value c m ∈ R + , associated with reaction R m , is named the stochastic (kinetic) constant and encompasses all its chemical-physical properties. Although deterministic simulations of mechanistic RBMs allow for reproducing the dynamic behavior of biochemical systems [17], only stochastic simulations of RBMs can reproduce the emerging effects due to biological noise, which plays an important role when the molecular species occur in low amounts within the system [18].
The Stochastic Simulation Algorithm (SSA) [16] is a seminal procedure that allows for generating an exact realization of the temporal evolution of an RBM SSA requiring that (i ) all reactions involving the chemical species must take place in a single volume, characterized by constant physical conditions (e.g., pressure and temperature) throughout the simulation; (ii) the molecules must be uniformly distributed within the volume, which is then considered well-stirred; and (iii) the (discrete) quantity of each molecular species is denoted as an integer number x n ∈ N, for each n = 1, . . . , N.
The state of the system at time t is denoted in SSA by the vector x = x(t) ≡ (x 1 (t), . . . , x N (t)), and its content is updated according to the execution of reactions. To be more precise, at each step of the simulation, SSA identifies the reaction that will take place in the time interval [t, t + τ) according to the probability of each reaction to occur in the infinitesimal time step [t, t + dt), which is proportional to the so-called propensity function of each reaction. The propensity functions are calculated as a m (x) = c m d m (x), where d m (x) correspond to the number of distinct combinations of the reactant molecules involved in reaction R m , in the current state of the system x.
Successively, SSA calculates the waiting time τ before the reaction is executed, as follows: where ζ 1 is a random number sampled from the uniform distribution in (0, 1), and a 0 (x) = ∑ M m=1 a m (x). m denotes the index of the reaction to fire and corresponds to the smallest integer in the interval [1, M] where ζ 2 is a random number sampled in from the uniform distribution in [0, 1]. The interested reader is referred to [16] for more details about SSA.
It is worth noting that the vector c = (c 1 , . . . , c M ) of stochastic constants governs the behavior of the system, since it affects the calculation of the propensity functions of the reactions. Unfortunately, the values in c are hard (or even impossible) to measure with specific lab experiments; nonetheless, such fundamental parameters can be estimated by exploiting computational intelligence methods [19].
In this work, the PE problem for the inference of c is tackled by assuming that the sets S and R, and the vector x(0) that denotes the molecular amounts of the species present in the RBM at time t = 0 are known. We also assume that experimental data (e.g., the amounts of a subset of molecular species measured at some time points t = {t 1 , . . . , t K }) are available.
The PE consists of identifying the vector c of stochastic constants that allow for obtaining simulated dynamics that resembles the behavior of the system as observed in the experimental data. Formally, the PE problem can be re-stated as the minimization of the following fitness function: where X c n (t k ) is the simulated amount of species S n at time t k , obtained with the putative parameterization c and O n (t k ) is the experimental (target) amount of S n measured at time t k .
Note that, due to the stochasticity of SSA simulations, two or more evaluations of the fitness of the same parameterization c, considering the same initial state x(0), generally lead to different values. In Figure 1, we show the result of 100 independent SSA simulations of the model of Michaelis-Menten (MM) enzyme kinetics [20], which will be used as a test case in this paper. The MM model consists of the following reactions: Figure 1 highlights that different SSA runs of the MM model-executed using the stochastic constants associated with the reactions, and the initial condition given in Table 1generate simulation outcomes that are quantitatively different from each other (due to stochastic noise) but in agreement from a qualitative point of view, as denoted by the distributions of the final amount of the three species S (substrate), E (enzyme), and P (product).

Molecular Species Amount
The noise due to stochastic simulations affects the fitness evaluations and thus hampers the identification of the global optimum of the PE problem: any hill climbing or gradientbased approach would be misled by this issue and generally be unable to converge to the global optimum. This is the reason why computational intelligence meta-heuristics should be definitely preferred in this context [19]. In this paper, we exploit the swarm intelligence method named FST-PSO [14] that is briefly described in the next section.

Fuzzy Self-Tuning Particle Swarm Optimization
Particle swarm optimization (PSO) is a widespread meta-heuristics for global optimization [15]. PSO is based on a population (i.e., the swarm) of candidate solutions (i.e., the particles) that move inside a bounded D-dimensional search space. The particles cooperate with the aim of identifying the global best solution to the problem under investigation, and the strategy exploited during the optimization balances the global exploration and the local exploitation capabilities of the particles. In particular, the convergence process is driven by two specific settings called cognitive attractor C cog ∈ R + and the social attractor C soc ∈ R + of the particles. The performance of PSO is also influenced by the so-called inertia factor ω ∈ R + , which is used to prevent chaotic movement of the particles, and by the maximum and minimum velocity ( v max , v min ∈ R D ) allowed along each dimension of the search space, which can drastically affect the quality of the optimal solutions found. It is worth noting that the settings of this meta-heuristic must be accurately adjusted considering the characteristics of the optimization problem, a fine-tuning process that usually requires many trials, since these values cannot be determined analytically.
To overcome this limitation, fuzzy self-tuning PSO [14], which is an improved version of PSO that does not require any user settings, has been recently introduced. FST-PSO dynamically adjusts the settings of each particle independently from the rest of the swarm thanks to a fuzzy rule-based system (FRBS). The FRBS evaluates the performances of the particle and its distance from the position of the current global best particle to (possibly) modify its settings during the optimization process.
It has been shown in the literature that FST-PSO can outperform the classic PSO (and many competitor algorithms) in several benchmark [14] and real-world problems [21][22][23][24].

Dilation Functions
Dilation functions (DFs) are reversible transformations of a D-dimensional real space with the aim of "expanding" and "compressing" the regions of the search space in order to facilitate the optimization process and to improve the performances obtained by any optimization method (e.g., PSO). Formally, DFs are defined as mappings in the unitary interval W : [0, 1] → [0, 1]; these mappings can be re-scaled to any arbitrary interval by means of linear transformations. Such functions are applied individually to each dimension of the search space, and it is also possible to apply different DFs to different dimensions. In this work, we employed DFs generated by two different methods: (a) DFs based on control points [6]; (b) DFs automatically determined through evolutionary strategies [25].
A control point χ is a couple such that χ ∈ (0, 1) × (0, 1). A DF is defined by a set of Q + 2 control points χ = {(0, 0), χ 0 , . . . , χ Q−1 , (1, 1)}, where the control points χ i (blue square dots in Figure 2), i = 0, . . . , Q − 1, are in ascending order (i.e., χ i−1 ≤ χ i ≤ χ i+1 ) and the points (0, 0), (1, 1) are called boundary control points (which are represented by gray crosses in Figure 2). The control points map their values into the respective new values according to their definition (such values are represented in Figure 2 by red and yellow points). In order to map any given value of the unitary interval, a linear interpolation between the control points ∈ χ is required. In such a way, the original value is applied to the linear function that interpolates the control points and the point is mapped into the "dilated" value (green points in Figure 2).  In Figure 3, an example of DF based on control points is provided; in particular, an antilog function is represented. The selection of this DF is determined by the log-uniform distribution followed by the stochastic parameters in biochemical models (i.e., they present a uniform distribution when a logarithmic scale is considered). Due to this consideration, in [26], we proved that it is possible to improve the performances of PE using particle swarm optimization (PSO) by means of a particle initialization according to a log-uniform distribution. The anti-log DF can be used to initialize the particles in regions of the search space characterized by fitness values of lower orders of magnitude.
In general, the update rules of the positions of the particles performed in a linear fashion prevent the exploration of such regions. The use of the anti-log DF during the optimization process-and for the initialization of the particles-can overcome this problem by allowing the exploration of the search space in a logarithmic fashion by dilating the regions in the lowest orders of magnitudes. The DF is defined by the control points determined according to Equation (3): for i = 1, . . . , Q, with Q = 9.

Evolving Dilation Functions
Recently, Papetti et al. [25] showed that ad-hoc DFs for a fitness landscape can be evolved autonomously by means of an evolutionary algorithm. The DFs are defined by a composition of basis functions (BFs), which are bijective and monotonic increasing functions that map the unitary interval into the unitary interval. In this work, we used two classes of BFs: (i) a family of linear transformations (reported in Equation (4)); (ii) the folding operators (Equation (5)). The linear transformation class is characterized by the following function: The parameter p (p ∈ [0, ∞)) in the first class of BFs determines the magnitude of the distortion applied to the search space with values further from 1 (see Figure 4 on the left). The folding operator is based on a BF q p , its inverse q −1 p , and a point r ∈ [0, 1], which is the center where the points are moved away or toward (examples of folding operators with different values of r and p are shown in Figure 4 on the right).
In this work, q p = l p and r ∈ 1/4, 1/2, 3/4. We also introduced the identity (i.e., I p (x) = x) function as a BF to let the evolutionary process use fewer transformations than the maximum number allowed and to reduce the complexity of the DF.
The five BFs used in this work to compose the DFs are reported in Table 2.

ID Name Semantics
Folding Operator with l To effectively evolve a DF composed of an arbitrary number of BFs, a two-layered algorithm was developed by Papetti et al. in [25]. This approach leverages the computation of a fitness function f that is based on the computation of the original fitness function f -or on one of its surrogatesf ,f -in various dilated points. In particular, an operative definition of the fitness function f is the following: I points are randomly sampled from the dilated search space according to a uniform distribution; for each dilated point, the respective fitness value f is extracted; and f is the average of the fitness values of the 20% best points w.r.t. the respective fitness values. The outer layer is a (µ + λ)-evolutionary strategy [27] aiming to find the optimal individual representing the structure of the composition of the BFs, i.e., which BFs and in which order they are composed. The evolved individuals are strings of length ξ-with ξ > 1-belonging to the alphabet {0, 1, 2, 3, 4} ξ , where the integers are unique identifiers of the BFs considered for this work (see Table 2). The parameters of the BFs of each candidate DF are optimized by the inner layer by means of FST-PSO (please refer to Section 2.2 for the details) based on the computation of the fitness f . As soon as FST-PSO terminates, the best fitness value found g, together with the parameters listed in the particle, are provided to the outer layer and g is the actual fitness value of the candidate DF that drives the evolution process toward optimal DFs. When the overall evolutionary process ends, the best DF W * -that is the individual with the best value of f -is returned. The number of sampled points I depends on the dimensions of the search space D, and it can automatically be inferred through a heuristic I = 5 + 10 √ D .

Surrogate Fourier Modeling with surF
Periodic functions, as well as functions defined only on a limited domain, can be represented via a weighted sum of frequencies by computing their Fourier transform. Representing a function via frequencies allows for performing some interesting manipulations. For example, removing higher frequencies has a smoothing effect on the signal. This effect is being employed by the recently introduced Fitness Landscape Surrogate Modeling with Fourier Filtering (surF) technique [12]. There, a surrogate model of the fitness landscape is produced by "smoothing out" the original landscape with the intention of removing local optima where the optimization might stop.
Let D be the number of dimensions of the search, space and let the search space itself be the hypercube [ , u] D ⊂ R D . Let f : [ , u] D → R be the objective function, i.e., f represents the fitness landscape. Since f is usually not known in an analytical form, we work with samples of f and, consequently, we employ a discrete version of the Fourier transform, the discrete cosine transform (DCT) [28]. It is worth noting that the time and space complexity of surF mainly depends on the choice of method used to compute the Fourier transform; thus, different approaches would give different space and time complexity bounds.

Discrete Cosine Transform.
To recall the DCT, let us start with the simpler case of a single dimension, i.e., D = 1. Hence, we have a function f : [ , u] → R and ρ ∈ N equally spaced points z 0 , . . . , z ρ−1 in the interval [ , u], where f is the sample. That is, we know f (z 0 ), . . . , f (z ρ−1 ). The DFT is used to represent each of these points as a weighted sum of frequencies: Notice how the entire set of ρ points is entirely determined by the coefficients ψ 0 , . . . , ψ ρ−1 representing the amplitudes of the different frequencies. That is, given ψ 0 , . . . , ψ ρ−1 , it is possible to recover the values f (z 0 ), . . . , f (z ρ−1 ) via the inverse DFT.
Working with frequencies, however, has some advantages: since each ψ j is associated with a different frequency, we can manipulate it. One of the simplest modifications can be performed by zeroing out all but the first γ ∈ {0, . . . , ρ − 1} coefficients. The effect is that all of the ρ − γ higher frequencies are removed. When computing the inverse DFT on the coefficients ψ 0 , . . . , ψ γ−1 , 0, . . . , 0 we obtain ρ points y 0 , . . . , y ρ−1 as a "smoothed" version of f (z 0 ), . . . , f (z ρ−1 ). While we have only obtained ρ points, we can define a surrogate of f as a functionf : [ , u] → R performing a linear interpolation among the points y 0 , . . . , y ρ−1 . Sincef is a smoothed version of f , it might have a reduced amount of local optima, thus helping the optimization process.
The procedure can be extended in any number D of dimensions. Instead of having ρ points sampled from f , there will be ρ D points sampled from f in an equally spaced grid in [ , u] D . Consequently, ρ D coefficients are obtained via DFT and of those, only γ D are preserved (i.e., all ψ i 1 ,...,i D with i j < γ for all 1 ≤ j ≤ D). Hence, the fitness landscape reconstructed by surF has a tunable level of non-zero high-frequency components, which can be controlled by means of the γ hyperparameter.

Reducing the number of samples.
The major drawback of the previous approach is that sampling ρ D points from f can be extremely expensive, since the number of points grows exponentially with the number of dimensions. Hence, a way to reduce the number of sampling operations of f is employed by surF. The main idea is to use σ ρ D points of f to generate a first surrogate off , which is then used for the sampling and the remaining of the surF procedure.
To constructf , σ points z 0 , . . . z σ−1 are randomly selected in [ , u] D . The selection procedure can be, for example, a uniform selection in the search space. To evaluatef on a point z ∈ [ , u] D , the following procedure is employed: • If z is in the convex hull defined by the points z 0 , . . . z σ−1 , thenf ( z) is obtained by a linear interpolation; • otherwise, a linear interpolation is not possible andf ( z) is defined asf ( z ), where z is the point among z 0 , . . . z σ−1 nearest to z.
Oncef is constructed, we can sample ρ D points from it and proceed with the previously define construction of a smooth surrogate (off in this case).

Parameters of surF.
The surF algorithm requires the following settings: • σ, which is the number of samples from f used to buildf ; • ρ, which is the "density" of samples fromf to obtain the ρ D points used to calculate the DFT; and • γ, which controls the number of low frequencies preserved.
Concerning the first settings, in this work, we assume σ = 100, generated using quasi random sampling using Sobol sequences [29], one of the five methods natively supported by surF along with pseudo-random generators, chaotic sequences, quantum random numbers, and entropic point packing [13] . Quasi-random sampling is surF's default because it was shown to be the most effective approach when a few samples can be collected [13].
The ρ value should be selected carefully in order to not exceed the resources of the machine. In this work, we investigate the PE of a model with 3 missing parameters using ρ = 100, leading to the calculation of 1 × 10 6 interpolated values.
Finally, the γ parameter is particularly sensitive: as a matter of fact, it has a relevant impact on the optimization because a γ value too small can make the reconstruction of the original global minimum impossible. Since the shape and characteristics of the fitness landscape are unknown, the optimal γ cannot be determined using analytical approaches. One possible solution for this problem is trial and error: repeat the optimization with an increasing number of γ values, observing which setting leads to the best solution. Alternatively, we can co-evolve the optimal γ by extending the candidate solutions' representation, as described in the next section.

Results
In this section, we investigate the impact on the PE performances of both DFs and surF. In all tests that follow, we estimate the stochastic parameters of the MM model using the settings-free algorithm FST-PSO, running for 100 iterations. The γ parameter of surF is co-evolved with the rest of the parameters (the range of possible values was set to [2,10]). As baseline, we exploit the variant of FST-PSO with fuzzy rules for minimum velocity disengaged, denoted by FST-PSO no vmin , which was shown to be the most effective choice for this specific problem [22]. In the case of problems modified using a DF or a combination of DF and surF, we exploit the standard version of FST-PSO. All of the analyses were implemented in the Python programming language, exploiting the FST-PSO [14] and surF [13] libraries.

Effect of DFs on the PE Problem
As a first group of tests, we compared the performance of the PE using the analytical anti-log DF and an evolved DF. In the latter case, we automatically evolved DFs composed of up to 5 BFs by means of the two-layered algorithm presented in Section 2.3. The population of the evolutionary algorithm was formed by 10 individuals, and we evolved the population for 11 generations. Each process to determine the best parameterization of the BFs was performed with 10 particles for 10 iterations. The number of sampled points I to compute f was determined with the heuristic proposed in Section 2.3, which corresponds to I = 23 in the following experiments. The overall budget to evolve the DF with such settings is 253, 000 fitness evaluations.
The actual DFs used here-i.e., the anti-log and the evolved one-are shown in Figure 5. It is worth noting that the optimal DF evolved by our approach for the PE of the MM model, despite being "stronger" than the analytical DF, applies a similar dilation to the fitness landscape, i.e., the region corresponding to the lowest orders of magnitude is expanded. To be more specific, the optimal DF obtained for the PE is a composition of three BFs (two out of five BFs were identity functions that can be ignored), as follows: l 0.09 (F 1 2 (l 4.65 (l 0.02 (x d )))), for all d = 1, . . . , D. Figure 6 shows the fitness landscapes for the PE of the MM model (restricted to the parameters c 2 and c 3 in order to represent the 3D landscape) corresponding to the original problem (left), the landscape dilated with the analytical anti-log function (center), and the landscape dilated using the evolved DF (right). The effect of these DFs is to shift away the global optimum from the lowest orders of magnitude. The shift exerted by the evolved function seems to be stronger, revealing a larger region characterized by sub-optimal fitness values (yellow points), even though the intensity also depends on the magnitude of the original parameters.  We compared the performance of FST-PSO, executed considering the three aforementioned fitness landscapes to estimate the parameters of the MM model. In each case, we performed 30 independent runs using the default functioning settings that correspond to 14 individuals and 100 iterations. In the first case (i.e., the original fitness landscape), the fuzzy reasoning that governs the v min of FST-PSO is disabled because it is known to affect the performance in the original formulation of the PE problem [22]. Figure 7 shows the convergence plot (left) and the distributions of the best fitness values found over 30 runs of each methodology (right). These results show that the analytic DF (green dashed line) is the best option for the PE problem: the average best fitness (ABF) is indeed much lower with respect to the original landscape (coral solid line) and slightly lower than in the case of the evolved DF (blue dotted line). Although the best individual in the first iteration of the optimization with the evolved DF is, on average, worse than in the case of the analytical anti-log DF, all runs based on the evolved DF are characterized by a very fast convergence: as a matter of fact, after 20 iterations, the average quality of the identified solutions is comparable (Figure 7, left), as also confirmed by the boxplots (Figure 7, right). The distributions reported by means of the boxplot were compared by leveraging the Mann-Whitney U rank test. The p-values confirmed that the differences are statistically significant. It is worth noting that the original landscape induced by the PE problem is difficult to explore, even with a low number of dimensions (3,

Combining DFs and Fourier Surrogate Modeling
As a second group of tests, we applied the new surF algorithm to all the previous fitness landscapes to optimize a filtered and smoothed version of the optimization problem. Figure 8 shows the effect of surF using γ = 2: in the case of the original landscape, the low number of Fourier coefficients prevents the reconstruction of the high frequency optimum, a circumstance already discussed in [5,12]; in the case of the landscape dilated with the antilog DF, the smoothed landscape is denoted by a global optimum corresponding to the actual target parameterization; and finally, in the case of surF applied to the fitness landscape dilated using the evolved DF, the problem of the missing high frequency components again prevents a correct reconstruction of the landscape, as in the case of the original problem. To further investigate the effect of the parameter γ on the fitness landscape, we created different surrogate models starting from the fitness landscape obtained with the evolved DF (Figures 9). The plots highlight that, even in the case of γ = 100, the global optimum cannot be properly reconstructed, as the excessively strong dilation caused by the evolved DF is detrimental to the optimization results when coupled to Fourier surrogate modeling. Figure 10 confirms this insight by showing that FST-PSO cannot converge in the case of the surrogate model created with surF, starting from the fitness landscape dilated with the evolved DF (blue dotted line); on the contrary, FST-PSO using the surrogate model created with surF starting from the fitness landscape dilated with the analytical anti-log DF, is capable of reaching the global optimum. In any case, it is worth mentioning that, thanks to the strategy of co-evolving the γ value with the rest of the candidate solution, the methodology no longer requires domain knowledge to achieve accurate solutions for the PE problem of biochemical systems. ,

Discussion
In this work, we investigated the impact of two methods for the fitness landscape manipulation, namely surF [12] and DFs [6], on the PE problem of stochastic biochemical models. Following the results obtained in [5], we addressed a common limitation of these two methods, that is, the need for expert knowledge regarding the characteristics of the fitness landscape. This was achieved by modifying the surF algorithm to co-evolve the optimal value of the γ hyperparameter and by performing a separate optimization process to evolve an optimal DF for the problem at hand [25].
Our results show that the PE employing the evolved DF has a performance that is comparable with the one employing the analytical DF, further corroborating their application for the PE of biochemical models and in contexts where there is no knowledge about the fitness landscape [25]. The co-evolution of the γ hyperparameter in surF also proved to be an effective strategy by improving the convergence speed and the final results with respect to a PE not employing surF surrogates, notably with one less hyperparameter to be manually set by the user when compared with the original surF algorithm [12]. Figure 11 shows the distribution of the optimal γ values automatically identified by our approach. As can be observed from the plot, the mode of the distribution is γ = 7 and, overall, the optimization of this parameter led always to values between 5 and 8. This confirms the adequacy of the search space for the parameter γ (i.e., [2,10]) employed in our tests.
However, these improvements were observed only on fitness landscapes that were dilated with the analytical DF exploiting expert knowledge. In our tests, the application of surF prevented the convergence to optimal solutions on landscapes dilated with the evolved DF, suggesting that not all DFs are beneficial to the construction of surrogates by means of surF, even when a high value of γ is employed. As a future extension of this work, we plan to further investigate this issue by evolving DFs that take into account the transformations of the landscape applied by surF. Moreover, we plan to improve the evolution of DFs by identifying different dilations for each dimension of the considered PE problem.
Another limitation that affects surF regards the algorithm used to generate the surrogate models, which is characterized by a high time and space complexity. As already discussed in [13], surF relies on DFT, which can calculate the multi-dimensional spectra of the fitness landscape from equispaced samples. That is, surF interpolates the samples over a multi-dimensional regular lattice, an operation characterized by exponential complexity. In practical terms, surF has reasonable time and space complexity up to five or six variables, but the computation time required makes it unsuitable for more than four variables. Thus, we plan to investigate different approaches to computing the Fourier transform using, for example, sparse samples. This will greatly improve the performances of surF, in terms of computation time. We will also explore alternative strategies, not involving the computation of a Fourier transform, to generate surrogate models with a less-than-exponential complexity, thus allowing for the synergistic application of DFs and surF to high-dimensional optimization problems, such as the PE of large RBMs with several missing parameters [30].

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations and mathematical notation are used in this manuscript: the experimental (target) amount of S n measured at time t k p parameter of the linear basis function l p ψ ρ coefficient representing the amplitude of ρ-th frequency r parameter of the folding operator R set of biochemical reactions R m m-th biochemical reaction ρ number of equally spaced points to build the surrogate function S set of molecular species S i i-th molecular specie σ number of samples used to construct the surrogate t time of the system t vector of time points t k k-th time point τ waiting time v max maximum velocity of the FST-PSO particles v min minimum velocity of the FST-PSO particles ω inertia factor of FST-PSO X c n (t k ) simulated amount of the species S n at time t k x(t) vector representing the state of the system at time t x n amount of the n-th molecular specie χ control point χ Q Q-th control point ξ length of the individuals representing the DFs χ vector of control points y ρ fitness value of the ρ-th point of the surrogate z ρ ρ-th point of the search space to build the surrogate function ζ 1 random number sampled from an uniform distribution ζ 2 random number sampled from an uniform distribution