Simulation-Based EDAs for Stochastic Programming Problems

Abstract: With the rapid growth of simulation software packages, generating practical tools for simulation-based optimization has attracted much interest over the last decades. In this paper, a modified method of Estimation of Distribution Algorithms (EDAs) is constructed by combining EDAs with variable-sample techniques to deal with simulation-based optimization problems. Moreover, a new variable-sample technique is introduced to support the search process whenever the sample sizes are small, especially at the beginning of the search process. The proposed method shows efficient results on several numerical experiments.


Introduction
Realistic systems often lack a sufficient amount of real data for the purposes of output response evaluation. This fact represents a blocking obstacle to the design process of most optimization techniques. However, several processes require optimization at some stage, e.g., engineering design, medical treatment, supply chain management, finance, and manufacturing [1][2][3][4][5][6][7][8][9]. Therefore, alternatives to real data are investigated. For instance, data augmentation is used to expand small-sized sets by applying some transformations to the given real sets [10,11]. Nonetheless, the nature and size limitation of some real data sets do not guarantee sufficient diversity of the generated augmented data. Therefore, simulated data are a good choice in these cases and beyond. Recently, the combination of simulation and optimization has drawn much attention. The term "simulation-based optimization" is commonly used instead of "stochastic optimization," since almost all recent simulation software packages contain an optimization procedure for creating applicable simulation methods [12,13].
Simulation-based optimization has been used for the optimization of real-world problems accompanied by some kind of uncertainty in the form of randomness. These problems may need to deal with systems and models that are highly nonlinear and/or have large-scale dimensions. These kinds of problems can be classified as stochastic programming problems, in which a stochastic probability function is used to estimate the underlying uncertainty. A problem of such kind is defined mathematically as follows [14]:

min_{x ∈ X} E[f(x, ω)],  (1)

where f is a real-valued function defined on a search space X ⊆ R^n with objective variables x ∈ X, and ω is a random variable whose probability density function is F. The choice of the optimal simulation parameters is critical to performance. However, the optimal selection of these parameters is still a quite challenging task: the simulation process is complicated, and the objective function is subject to different noise levels and is computationally expensive to evaluate. The simulation complication is characterized by:
• the calculation and time cost of the objective function,
• the difficulty of computing the exact gradient of the objective function, as well as the high cost and time of computing numerical approximations of it, and
• the noise included in the objective function.
To deal with these issues, global search methods should be invoked in order to avoid using classical nonlinear programming methods, which fail on these types of problems with multiple local optima.
Recently, great interest has been given to using artificial tools in optimization. Metaheuristics play a great role in either real-life simulation or invoking intelligent learned procedures [15][16][17][18][19][20][21][22][23][24]. Metaheuristics exhibit good degrees of robustness for a wide range of problems. However, these methods suffer from slow convergence on complex problems, which leads to high computational cost. This slow convergence may come from the random structures such methods use to explore the global search space. Moreover, metaheuristics cannot utilize local information to infer promising search directions. On the contrary, a main characteristic of local search methods is their fast convergence. Local search methods exploit local information to estimate local mathematical or logical movements in order to determine promising search directions. However, a local search method can easily get stuck at local minima, i.e., it is highly probable to miss global solutions. Therefore, designing hybrid methods that combine metaheuristics and local search is highly needed to obtain practical, efficient solvers [25][26][27]. Estimation of Distribution Algorithms (EDAs) are promising metaheuristics, in which exploration for potential solutions in the search space depends on building and sampling explicit probabilistic models of promising candidate solutions [28][29][30][31].
There are several optimization and search techniques that have been developed to deal with the considered problem. The variable-sample method is one of these techniques [32]. The main idea of the variable-sample method is to convert a stochastic optimization problem into a deterministic one. This is achieved by estimating the objective function that contains random variables at a certain solution by sampling it with several trials. This type of Monte Carlo simulation can obtain approximate values of the objective function. The accuracy of such a simulation depends on the sample size; therefore, using a large enough size is recommended. The variable-sample method controls the variability of the sample size in order to maintain the convergence of search methods to optimal solutions under certain conditions. A random search algorithm called Sampling Pure Random Search (SPRS) was previously reported in [32] by invoking the variable-sample method. In SPRS, the objective function with random variables is estimated at a certain input by the average of variable-sized samples at that input. Under certain conditions, the SPRS algorithm can converge to local optimal solutions.
More accurate solutions could be obtained in many cases for stochastic programming problems, e.g., [15,18,33]. In this paper, we present a novel metaheuristic method of Estimation of Distribution Algorithm (EDA). The proposed method is combined with the SPRS method [32] and a modified sampling search method called Min-Max Sampling Search (MMSS) in order to deal with the stochastic programming problem. The main modification in the MMSS method is to use a function transformation instead of the average of the function values, especially when the sample size is not sufficiently large. Actually, the proposed EDA-based search method starts searching the space with relatively small sample sizes in order to achieve faster exploration. Subsequently, the EDA-based method increases the sample sizes as the search process progresses. The proposed modified MMSS method restricts the noisy values of the estimated function in small-size samples at the early stages of the search process. Several experiments with their technical discussion have been conducted in order to test the performance of the proposed methods.
The rest of the paper is organized as follows. The main structures of the estimation of distribution algorithms and variable-sample methods are highlighted in Sections 2 and 3, respectively. In Section 4, we highlight the main components of the proposed EDA-based methods. In Section 5, we investigate parameter tuning of the proposed methods and explain the experimentation work conducted for evaluation purposes. Detailed results and discussions are given in Section 6. Finally, the paper is concluded in Section 7.

Estimation of Distribution Algorithms
The EDAs have been widely studied in the context of global optimization [28][29][30][34]. The use of probabilistic models in optimization offers some significant advantages compared with other types of metaheuristics [35][36][37][38]. Firstly, the initial population is generated randomly to fill the search space. Then, the best individuals are used to build the probabilistic model. Usually, the best individuals are selected in order to push the search toward the promising regions. New candidate solutions replace the old solutions partly or as a whole, where the individual of highest quality is the target, and the quality of a solution is measured by its associated fitness value. Generally, EDAs execute a repeated process of evaluation, selection, model building, model sampling, and replacement.
The main objective of EDA is to improve the estimation of the solution space probability distribution by exploiting samples generated by the current estimated distribution. All generated samples are subjected to a selection method to pick certain points to be used for probability distribution estimation. Algorithm 1 explains the main steps of standard EDAs.

Algorithm 1
Pseudo code for the standard EDAs
1: g ← 0
2: D_g ← Generate and evaluate M random individuals (the initial population).
3: do
4:   D_g^s ← Select L ≤ M individuals from D_g according to a selection method.
5:   P_g(x) = P(x | D_g^s) ← Estimate the joint probability distribution of the selected individuals.
6:   D_{g+1} ← Sample and evaluate M individuals from P_g(x) (the new population).
7:   g ← g + 1.
8: until a stopping criterion is met.
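As an illustration, the loop of Algorithm 1 can be sketched in Python with a univariate Gaussian model; the bounds, population sizes, and the sphere fitness below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def standard_eda(fitness, dim, M=100, L=30, generations=50, seed=0):
    """Minimal sketch of Algorithm 1 with a univariate Gaussian model
    (minimization). M is the population size and L the number of selected
    individuals, as in Algorithm 1; the bounds [-5, 5] are illustrative."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(M, dim))        # D_0: initial population
    for g in range(generations):
        scores = np.apply_along_axis(fitness, 1, pop)  # evaluate D_g
        selected = pop[np.argsort(scores)[:L]]         # D_g^s: best L individuals
        mu = selected.mean(axis=0)                     # estimate P_g(x) ...
        sigma = selected.std(axis=0) + 1e-12           # ... as a diagonal Gaussian
        pop = rng.normal(mu, sigma, size=(M, dim))     # D_{g+1}: sample M individuals
    scores = np.apply_along_axis(fitness, 1, pop)
    return pop[np.argmin(scores)]

# Example: minimize the sphere function; the best point approaches the origin.
x_best = standard_eda(lambda x: float(np.sum(x**2)), dim=3)
```

The selection step shrinks the model variance from generation to generation, which is what concentrates the population around promising regions.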
Many studies have been done on continuous EDAs. Generally, there are two major branches in continuous EDAs: one widely used branch is based on the Gaussian distribution model [28,39], and the other major branch is based on the histogram model [40]. The Univariate Marginal Distribution Algorithm for continuous domains (UMDAc) is a simple real-valued EDA that uses a univariate factorization of normal densities in the structure of the probabilistic model. It was introduced by Larrañaga et al. [41,42]. Statistical tests are carried out in each generation and for each variable in order to determine the density function that best fits each variable. In this case, the factorization of the joint density function is represented as:

f_l(x; Θ^l) = ∏_{i=1}^{n} f_l(x_i; Θ_i^l),  (2)

where Θ^l is a set of local parameters. The learning process to obtain such a joint density function is shown in Algorithm 2.

Algorithm 2
Learning the joint density function
1: while the stopping criterion is not met do
2:   for i = 1 to n do
3:     Set D_{l−1}^{Se,X_i} to be the projection of the selected individuals over the i-th variable.
4:     Select, using a hypothesis test, the density function f_l(x_i; Θ_i^l) that best fits D_{l−1}^{Se,X_i}.
5:     Obtain the maximum likelihood estimates for Θ_i^l = (Θ_i^{l,k_1}, . . . , Θ_i^{l,k_i}).

At each generation, the form of the learnt joint density function is:

f_l(x; Θ^l) = ∏_{i=1}^{n} f_l(x_i; Θ̂_i^l),  (3)

where Θ̂_i^l are the maximum likelihood estimates obtained above.
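For the Gaussian case this learning step amounts to per-variable maximum-likelihood estimation of a mean and standard deviation; a minimal sketch follows, with synthetic data standing in for the selected individuals.

```python
import numpy as np

def learn_umdac_model(selected):
    """Gaussian instance of the learning step in Algorithm 2: the
    maximum-likelihood estimates of Theta_i = (mu_i, sigma_i) for each
    variable, computed from the selected individuals (one row each)."""
    mu = selected.mean(axis=0)     # MLE of the mean, per variable
    sigma = selected.std(axis=0)   # MLE of the standard deviation, per variable
    return mu, sigma

def sample_umdac_model(mu, sigma, M, rng):
    """Sample M new individuals from the factorized joint density."""
    return rng.normal(mu, sigma, size=(M, len(mu)))

# Synthetic "selected individuals" drawn from a known factorized Gaussian.
rng = np.random.default_rng(1)
selected = rng.normal([2.0, -1.0], [0.5, 0.1], size=(200, 2))
mu, sigma = learn_umdac_model(selected)
offspring = sample_umdac_model(mu, sigma, 50, rng)
```

With enough selected individuals the estimates recover the underlying per-variable parameters, so the sampled offspring follow the same factorized density.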

Variable Sampling Path
The variable-sample (VS) method [32] is a class of Monte Carlo methods used to deal with unconstrained stochastic optimization problems. Sampling Pure Random Search (SPRS) is a random search algorithm introduced by Homem-De-Mello [32] that invokes the VS method. A sample average approximation with a variable sample-size scheme replaces the objective function in each main iteration of the SPRS algorithm. Specifically, an estimation of the objective function can be computed as:

f̂(x) = (1/N) Σ_{j=1}^{N} f(x, ω_j),  (4)

where ω_1, . . . , ω_N are sample values of the random variable of the objective function. Under certain conditions, the SPRS can converge to a local optimal solution [32]. Algorithm 3 demonstrates the formal SPRS method.

Algorithm 3 Sampling Pure Random Search (SPRS) Algorithm
1: Generate a point x_0 ∈ X at random, set the initial sample size N_0, and set k := 1.
2: Generate a point y ∈ X at random.
3: Generate a sample ω_1^k, . . . , ω_{N_k}^k from the distribution of ω.
4: Compute the estimates f̂(x_k) and f̂(y) by the sample average over the generated sample.
5: If f̂(y) < f̂(x_k), then set x_{k+1} := y. Otherwise, set x_{k+1} := x_k.
6: If the stopping criteria are satisfied, stop. Otherwise, update N_k, set k := k + 1 and go to Step 2.
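The SPRS loop can be sketched as follows; the noisy objective, the bounds [-5, 5], and the linear growth of the sample size N_k are illustrative assumptions.

```python
import numpy as np

def sprs(F, dim, N0=10, N_max=200, iters=500, seed=0):
    """Sketch of Algorithm 3 (SPRS) for minimization.

    F(x, omega) is the noisy objective; the current point and the random
    candidate are compared through sample-average estimates over a shared
    variable-size sample."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=dim)       # Step 1: random starting point
    for k in range(iters):
        N_k = min(N0 + k, N_max)               # variable sample size, grows with k
        y = rng.uniform(-5.0, 5.0, size=dim)   # Step 2: random candidate
        ws = rng.standard_normal(N_k)          # Step 3: sample omega_1 .. omega_{N_k}
        f_x = float(np.mean([F(x, w) for w in ws]))
        f_y = float(np.mean([F(y, w) for w in ws]))
        if f_y < f_x:                          # Step 5: keep the better point
            x = y
    return x

# Sphere objective perturbed by additive Gaussian noise.
x_best = sprs(lambda x, w: float(np.sum(x**2)) + 0.2 * w, dim=2)
```

Growing N_k tightens the estimator's variance over time, so late-stage acceptances become increasingly reliable.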

Estimation of Distribution Algorithms for Simulation-Based Optimization
We propose an EDA-based method to deal with simulation-based global optimization problems. The proposed method combines the EDA with a modified version of the variable-sample method of Algorithm 3. We suggest a modification to the variable-sample method in order to avoid the low sample quality that results from small sample sizes. This modification depends on a function transformation, as shown in Algorithm 4.

Function Transformation
The search process is affected by sample sizes. Using large sizes is time consuming, while using small ones yields low-quality approximations of function values. A good search strategy is to start the search with relatively small-size samples and then increase the sample size as the search goes on. To avoid degradation of the search process when the sample sizes are small in the early stages of the search, a new function transformation is proposed by re-estimating the fitness function whenever the sample size N falls under a certain threshold, i.e., N < N_small for a predefined N_small. Specifically, the samples of size N at a solution x are evaluated and sorted. Then, the N/2 highest and N/2 lowest objective function values of these samples are averaged and denoted by f̂_max(x) and f̂_min(x), respectively. A new transformed estimation of the objective function at x is computed from these values:

f̂(x) = φ( f̂_max(x), f̂_min(x) ),

with a suitable transformation function φ(·). The choice of the transformation function is performed empirically by selecting the function with the best performance among several candidates. We found that the following function gives the best performance:

f̂(x) = µ f̂_max(x) + (1 − µ) f̂_min(x),  (5)

where µ is a weight parameter (with 0 < µ ≤ 1) that depends on the sample size N at the current iteration. Algorithm 4 uses the above-mentioned function transformation to modify the sampling search method of Algorithm 3.

Algorithm 4 Min-Max Sampling Search (MMSS) Algorithm
1: Generate a point x_0 ∈ X at random, set the initial sample size N_0, and set k := 1.
2: while the stopping criteria are not satisfied do
3:   Generate a point y ∈ X at random.
4:   Generate a sample ω_1^k, . . . , ω_{N_k}^k.
5:   If N_k < N_small, compute f̂(x_k) and f̂(y) by Equation (5); otherwise, use the sample average of Algorithm 3.
6:   If f̂(y) < f̂(x_k), set x_{k+1} := y. Otherwise, set x_{k+1} := x_k.
7:   Update N_k and set k := k + 1.
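The transformed estimator can be sketched as below. The weighting µ·f̂_max + (1 − µ)·f̂_min with µ = 0.5·N/N_max is our reading of the (partly garbled) source, chosen so that the estimate reduces to the plain sample average when N reaches N_max.

```python
import numpy as np

def mmss_estimate(samples, N_max, N_small=300):
    """Sketch of the min-max transformed estimator described above.

    For a small sample (N < N_small), the N/2 lowest and N/2 highest values
    are averaged into f_min and f_max, and the estimate is the combination
    mu*f_max + (1-mu)*f_min with mu = 0.5*N/N_max (an assumption, see the
    lead-in). Otherwise the plain SPRS sample average is used."""
    N = len(samples)
    if N >= N_small:
        return float(np.mean(samples))          # large sample: plain average
    s = np.sort(np.asarray(samples, dtype=float))
    f_min = float(np.mean(s[: N // 2]))         # average of the N/2 lowest values
    f_max = float(np.mean(s[-(N // 2):]))       # average of the N/2 highest values
    mu = 0.5 * N / N_max                        # weight grows with the sample size
    return mu * f_max + (1.0 - mu) * f_min

# With N = N_max the two half-averages get equal weight, recovering the sample
# mean; with small N the estimate leans toward f_min.
est = mmss_estimate([1.0, 2.0, 3.0, 4.0], N_max=4)
```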

The Proposed EDA-Based Method
The proposed EDA-based method is a combination of estimation of distribution algorithms with the UMDAc technique [42], in which the stochastic function values are estimated using sampling techniques. In the simulation-based optimization problem, the objective (fitness) function contains a random variable. Therefore, function values can be approximated using variable-sample techniques, which can be done in two different ways, as in the SPRS and MMSS techniques.
Algorithm 5 illustrates the main steps of the proposed EDA-based method. The initialization process in the EDA creates M diverse individuals within the search area using a diversification technique similar to those used in [24,43,44]. Besides this exploration process, a local search is applied to each individual to improve the characteristics of the initial population. The EDA selection process uses the standard EDA selection scheme, in which the L individuals with the best fitness function values are chosen from generation P_g to survive to the next generation P_{g+1}. In order to improve the performance of the search process, an intensification step is applied to the best solution in each generation. The main difference between EDA-SPRS and EDA-MMSS is in the function evaluation step. The sample path method SPRS depends on the sample average approximation in the function evaluation process. However, the search process is not sufficiently reliable with small sample sizes, which affects the quality of the final solution. The EDA-MMSS algorithm applies the same main steps as the EDA-SPRS method, except for the function evaluation method: a modified sample path method evaluates the function values of small-size samples by Equation (5), while the SPRS evaluation is retained when the sample size is sufficiently large. Algorithm 5 is used with Algorithm 4 for the evaluation function to deal with a stochastic optimization problem, in order to improve the performance of the SPRS method with small sample sizes.
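Putting the pieces together, a compact sketch of an EDA-MMSS-style loop might look as follows; population sizes, the sample-size schedule, the bounds, and the transformation weighting are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def eda_mmss(F, dim, M=60, L=20, generations=40, N_small=30, N_max=100, seed=0):
    """Compact sketch of an EDA-MMSS-style loop (minimization).

    Each individual is scored on a variable-size sample of the noisy
    objective F(x, omega): a min-max transformed estimate for small
    samples, the SPRS sample average otherwise."""
    rng = np.random.default_rng(seed)

    def estimate(x, N):
        s = np.sort([F(x, rng.standard_normal()) for _ in range(N)])
        if N >= N_small:
            return float(np.mean(s))            # large sample: SPRS average
        w = 0.5 * N / N_max                     # transformation weight (assumed form)
        return w * float(np.mean(s[-(N // 2):])) + (1.0 - w) * float(np.mean(s[: N // 2]))

    pop = rng.uniform(-5.0, 5.0, size=(M, dim))     # diverse initial population
    for g in range(generations):
        N_g = min(N_max, 10 + 5 * g)                # sample size grows with g
        scores = np.array([estimate(x, N_g) for x in pop])
        best = pop[np.argsort(scores)[:L]]          # standard EDA selection
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-12
        pop = rng.normal(mu, sigma, size=(M, dim))  # sample the UMDAc-style model
    scores = np.array([estimate(x, N_max) for x in pop])
    return pop[np.argmin(scores)]

# Noisy sphere: the minimizer of E[F] is the origin.
x_best = eda_mmss(lambda x, w: float(np.sum(x**2)) + 0.2 * w, dim=2)
```

Early generations thus explore cheaply with transformed small-sample estimates, and later generations refine with plain averages over larger samples.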

Numerical Experiments
Experiments are conducted to prove the efficiency of the proposed method and its versions. We use various benchmark data sets for evaluation purposes. In this section, we explain the details of the experimentation procedures.

Test Functions
The proposed algorithm is tested using several well-known benchmark functions [45][46][47][48] in order to verify the method's efficiency without the effect of noise. A set of classical test functions, listed in Table 1, is used to conduct initial tests for parameter setting and performance analysis of the proposed global optimization method. The characteristics of these test functions are diverse enough to cover many kinds of difficulties that usually arise in global optimization problems. For more professional comparisons, another set of benchmark functions (h_1-h_25) with higher degrees of difficulty was used. Specifically, the CEC 2005 test set [47,48] is also invoked in comparisons; see Appendix A.
For simulation-based optimization, two different benchmark test sets are used. The first one (SBO Test Set A) consists of four well-known classical test functions; the Goldstein and Price, Rosenbrock, Griewank, and Pinter functions are used to compose seven test problems (f_1-f_7) [49][50][51]. The details of these seven test functions are shown in Table 2. The mathematical definitions of the test functions are given in Appendix B. The other test set (SBO Test Set B) contains 13 test functions (g_1-g_13) with Gaussian noise (µ = 0, σ = 0.2); see Appendix C for the function definitions of this test set.

Parameter Settings
Table 3 shows the parameters used in the proposed EDA-based methods. An empirical parameter tuning process was followed to find the best values of the parameter set. All parameters were tested to obtain standard settings for the proposed algorithms. The maximum number of function evaluations maxI, which is used as a termination criterion, has two different values according to whether the problem is stochastic or deterministic. The EDA population size R and the number of selected individuals in each iteration S have a measurable effect on building an efficient algorithm. Since the main task is to balance the exploration process against high complexity, we use a large R to explore the search space, and then continue with a limited number of selected individuals S to control the complexity of building and sampling the model. The theoretical aspects of choosing the parameters of EDAs have been discussed in [29]. The sample size N_k, which is used in Algorithms 3 and 4, is adapted to start at N_small and then gradually increase to reach N_max. The threshold of the small sample size N_small and the µ parameter used in the function transformation in Algorithm 4 are chosen experimentally.

Table 3. Parameter settings of the proposed methods.
N_small   The threshold of small sample sizes            300
µ         The parameter of the function transformation   0.5 × N/N_max
maxI      Max no. of function evaluations for GO         10,000n
          Max no. of function evaluations for SBO        1,000,000
Runs      No. of independent runs in each experiment     25

Performance Analysis
In this section, we compare the performance of the EDA-SPRS and EDA-MMSS methods. The results are reported in Table 4; these results are the averages of errors in obtaining the global optima over 25 independent runs with 500,000 maximum objective function evaluations for each run. In the experimentation, the error of an obtained solution is denoted by f_Gap, which is defined as:

f_Gap = | f(x) − f(x*) |,  (6)

where x is the best solution obtained by the methods, and x* is the optimal solution. The results represented in Table 4 show the effect of the function transformation step on the final result. Table 4 reveals that the performance of the EDA-MMSS method is better than that of EDA-SPRS at obtaining better solutions on average. Actually, the EDA-MMSS code could reach better average solutions than the EDA-SPRS code in five out of seven test functions. This favors the efficiency of the proposed sampling technique.
The Wilcoxon rank-sum test [52][53][54] is used to measure the performance of all the compared methods. This test is a non-parametric test [55,56], which suits our experiments. The statistical measures used in this test are the sum of rankings obtained in each comparison and the associated p-value. Typically, the differences d_i between the performance values of the two methods on the i-th out of n results are calculated. Afterwards, these differences are ranked based on their absolute values. The rank sums R+ and R− are computed as follows:

R+ = Σ_{d_i > 0} rank(d_i) + (1/2) Σ_{d_i = 0} rank(d_i),
R− = Σ_{d_i < 0} rank(d_i) + (1/2) Σ_{d_i = 0} rank(d_i).

The statistics of the rank-sum test for an initial comparison between the proposed EDA-SPRS and EDA-MMSS methods are shown in Table 5. Although the EDA-MMSS method could obtain better solutions than the EDA-SPRS method in four out of seven cases, there is no significant difference between the two compared methods at significance level 0.05. Figure 1 shows the performance of EDA-SPRS and EDA-MMSS for one random run. The figure illustrates the significant difference in search behavior between the two methods, and how restricting the noisy part in the EDA-MMSS at the beginning of the search process affects the quality of the individuals later. The EDA-MMSS method manages to reach better solutions than the EDA-SPRS method in five out of seven cases shown in Figure 1. Therefore, the proposed MMSS technique could help the search method to perform better exploration and reach near-global optimal solutions.
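The ranking computation described above (paired differences, rank sums over positive and negative differences) can be sketched as follows; the error values are hypothetical.

```python
import numpy as np

def signed_rank_sums(errors_a, errors_b):
    """Sketch of the ranking step described above: the paired differences
    d_i are ranked by absolute value (average ranks over ties), and the
    rank sums R+ (where d_i > 0) and R- (where d_i < 0) are formed, with
    zero differences split evenly between the two sums."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    order = np.argsort(np.abs(d), kind="stable")
    ranks = np.empty(len(d))
    ranks[order] = np.arange(1, len(d) + 1)
    for v in np.unique(np.abs(d)):            # average the ranks of tied |d_i|
        mask = np.abs(d) == v
        ranks[mask] = ranks[mask].mean()
    r_plus = ranks[d > 0].sum() + 0.5 * ranks[d == 0].sum()
    r_minus = ranks[d < 0].sum() + 0.5 * ranks[d == 0].sum()
    return r_plus, r_minus

# Hypothetical errors: method B (second list) is better on three of four problems.
rp, rm = signed_rank_sums([9.0, 5.0, 3.0, 2.0], [1.0, 2.0, 4.0, 1.0])
```

By construction R+ + R− equals n(n + 1)/2, so a large imbalance between the two sums signals a systematic difference between the methods.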

Results and Discussion
The results of the proposed methods on global optimization and simulation-based optimization problems are discussed in this section.

Numerical Results on Global Optimization
For the global optimization problem, we applied the EDA-D algorithm to the ten test problems illustrated in Table 1 before applying it to stochastic optimization problems, in order to verify its efficiency. Then, it was compared with other metaheuristic methods. A heuristic solution x is said to be optimal if the gap defined in Equation (6) is less than or equal to 0.001. Table 6 reports the average f_Gap over 25 independent runs of the EDA method for each function, compared with the Directed Scatter Search (DSS) method introduced in [51], with a maximum of 5000 function evaluations for each method. The results shown in Table 6 indicate that the EDA-D code has promising performance, and its f_Gap values show its ability to obtain global minima for 6 of the 10 test problems. The comparison statistics are stated in Table 7, which indicates similar behavior of the two compared methods at the significance level 0.05.

Table 7. Wilcoxon rank-sum test for the results of Table 6.

Comparison Criteria   Compared Methods   R+   R−   p-Value   Best Method
Average Errors        DSS, EDA-D         22   33   0.7337    -

For a more professional comparison, more sophisticated functions were used to compare the results of the proposed method with those of some benchmark methods. The hard test set functions h_1-h_25, stated in Appendix A, are invoked in testing the performance of the proposed EDA-D method against seven benchmark differential evolution methods [57]. The results of the proposed method and the seven compared methods on the hard test functions h_1-h_25 with n = 30 are reported in Table 8. This table contains average errors over 25 independent trials for each method. Results of the methods used in the comparison are taken from [57]. These comparisons indicate that the proposed method is promising, and its results are similar to those of the methods used in the comparison. Comparative statistics for the results in Table 8 are presented in Table 9 using the rank-sum statistical test. The results of this statistical test indicate that there is no significant difference between the proposed method and the global optimization benchmark methods used in the comparison at the significance level 0.05. This indicates that the proposed method is promising in the field of deterministic global optimization, and motivates the idea of combining the EDA-D with sampling methods to solve stochastic global optimization problems.

Simulation Based Optimization Results
The proposed method has been compared with other metaheuristic methods to demonstrate its performance in terms of simulation optimization. The EDA-MMSS method has been compared with Evolution Strategies and Scatter Search for simulation-based global optimization problems, as introduced in [51]. Table 10 shows the comparison among the proposed method, Directed Evolution Strategies for Simulation-Based Problems (DESSP), and Scatter Search for Simulation-Based Optimization (DSSSP). The averages of the function values and the best values over 25 independent runs of EDA-MMSS, DESSP, and DSSSP on the seven test problems presented in Table 2, with 500,000 maximum function evaluations, are reported in Table 10; their processing times are shown in Table 11. From Table 10, the EDA-MMSS method has shown superior performance on these functions, especially on the high-dimensional function f_7.

Table 11. Averages of processing time (in seconds) for obtaining the results in Table 10.

Table 12 provides a statistical comparison of the results in Tables 10 and 11. Although there were no significant differences between the results of the proposed method and those of the methods used in the comparison in terms of solution quality, it is clear that the proposed method obtained better results on solution averages. This indicates the robustness of the proposed method. Moreover, the proposed method shows superior performance in saving processing time, as shown in Tables 11 and 12. The last experiment was conducted to compare the results of the proposed methods with those of recent benchmark methods in simulation-based optimization. Eight methods [58] are used in this final comparison, and their results are reported in Table 13 using SBO Test Set B. The proposed methods could obtain better results than seven out of the eight methods used in this comparison. Comparative statistics for these results are reported in Table 14 using the rank-sum statistical test.
These statistics indicate that there is no significant difference between the results of the proposed EDA-SPRS and EDA-MMSS methods. Moreover, the proposed methods outperformed 7 out of the 8 methods used in the comparison, as is clear from the p-values.

Conclusions
In this paper, a modified method of Estimation of Distribution Algorithms for simulation-based optimization is proposed to find the global optimum or a near-optimum of noisy objective functions. The proposed method combines a modified version of the Estimation of Distribution Algorithm with a new sampling technique. This technique is a variable-sample method, in which a function transformation is used with small-size samples in order to reduce the large dispersion resulting from random variables. The proposed sampling technique helps the search method to perform better exploration and hence to reach near-global optimal solutions. The obtained results indicate the promising performance of the proposed method versus some existing state-of-the-art methods, especially in terms of robustness and processing time.