Estimation of Distribution Algorithms with Fuzzy Sampling for Stochastic Programming Problems

Generating practical methods for simulation-based optimization has attracted a great deal of attention recently. In this paper, estimation of distribution algorithms (EDAs) are used to solve nonlinear continuous optimization problems that contain noise. One common approach to dealing with such problems is to combine sampling methods with optimal search methods. Sampling techniques suffer a serious drawback when the sample size is small: estimating the objective function values under noise is then inaccurate. In this research, a new sampling technique based on fuzzy logic is proposed to deal with small sample sizes. Simulation-based optimization methods are then designed by combining the estimation of distribution algorithms with the proposed sampling technique and other sampling techniques to solve stochastic programming problems. Moreover, additional versions of the proposed methods are developed to optimize functions without noise, in order to evaluate the different efficiency levels of the proposed methods. To test the performance of the proposed methods, numerical experiments were carried out using several benchmark test functions. Finally, three real-world applications are considered to assess the performance of the proposed methods.


Introduction
Several real-world applications can be formulated as continuous optimization problems in a wide range of scientific domains, such as engineering design, medical treatment, supply chain management, finance, and manufacturing [1][2][3][4][5][6][7][8][9]. Many of these optimization formulations involve some form of uncertainty, and their objective functions contain noise [10][11][12][13]. Moreover, it is sometimes necessary to deal with complex problems with high nonlinearity and/or dimensionality, and occasionally there is no analytical form for the objective function [14]. Even when the objective functions associated with such problems can be expressed mathematically, in most cases they are not differentiable. Therefore, classical optimization methods fail to handle them, since their gradients cannot be computed. The situation is much worse when these functions contain high noise levels.
Simulation and optimization have attracted much interest recently, since evaluating the output response of such real-world problems requires simulation techniques. Moreover, optimization in stochastic environments is realized by combining simulation-based estimation with an optimization process. Therefore, the term "simulation-based optimization" is commonly used instead of "stochastic optimization" [15,16].
Simulation-based optimization is used with certain types of uncertainty to optimize real-world problems. Four types of uncertainty are discussed in [14]: noise in objective function evaluations; approximation of computationally expensive objective functions with surrogate models; changes or disturbance of design parameters after determining the optimal solution; and problems with time-varying objective functions. We consider the first type of uncertainty, where the problem is defined mathematically as follows [17]:

min_{x ∈ X} f(x) = E_ω[ F(x, ω) ],  (1)

where f is a real-valued function defined on search space X ⊆ R^n with objective variables x ∈ X, and ω is a random variable whose probability density function is F. Problem (1) is also referred to as the stochastic programming problem, in which random variables appear in the formulation of the objective function.
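The expectation in this formulation is typically approximated by Monte Carlo simulation. The following minimal Python sketch (the sphere function and the additive Gaussian noise model are illustrative assumptions, not the paper's test setup) shows how a noisy oracle F(x, ω) is averaged to estimate f(x):

```python
import random
import statistics

def sphere(x):
    """Deterministic part of the objective (an illustrative test function)."""
    return sum(v * v for v in x)

def noisy_oracle(x, sigma=1.0):
    """One simulation call: F(x, w) = f(x) + w, with w ~ N(0, sigma^2)."""
    return sphere(x) + random.gauss(0.0, sigma)

def estimate_f(x, n_samples=1000, sigma=1.0):
    """Estimate f(x) = E[F(x, w)] by averaging independent simulation calls."""
    return statistics.mean(noisy_oracle(x, sigma) for _ in range(n_samples))
```

The accuracy of this average estimate degrades as the sample size shrinks, which is exactly the weakness the fuzzy sampling technique proposed later addresses.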
In spite of the importance of choosing optimal simulation parameters to improve operation, configuring them well remains a challenge. Because of the complicated simulation process, the objective function is subject to different noise levels, along with computationally expensive evaluations. These problems are characterized by the following:
• The complexity and time necessary to compute the objective function values;
• The difficulty of computing the exact gradient of the objective function, its numerical approximation also being very expensive;
• The noise values in the objective function.
To deal with these characterizations, global search methods should be invoked to avoid using classical nonlinear programming that fails to solve such problems with multiple local optima.
Recently, the use of artificial intelligence methods in optimization has been of great interest. Metaheuristics play a significant role in both real-life simulations and invoking smart methods [18][19][20][21][22][23][24]. Metaheuristics show strong validity rates across a wide variety of applications. These methods, however, suffer from slow convergence, especially in complex applications, which leads to high computational costs. This slow convergence may result from the exploration structures of such methods, since exploring the search space depends on random structures. On the other hand, metaheuristics cannot utilize local information to deduce promising search directions. Estimation of Distribution Algorithms (EDAs) comprise a class of evolutionary computation [25] and have been widely studied in the global optimization field [26][27][28][29]. Compared with traditional Evolutionary Algorithms (EAs), such as Genetic Algorithms (GAs), this type of algorithm has neither crossover nor mutation operators. Instead, an EDA explicitly builds a probabilistic model by learning and sampling the probability distribution of promising solutions in each generation. The probabilistic model captures statistical information about the search space, which is used to guide reproduction toward better solutions.
On the other hand, several optimal search techniques have been designed to tackle the stochastic programming problem. Some of these techniques are known as variable-sample methods [30]. The key aspect of the variable-sample approach is to reformulate the stochastic optimization problem as a deterministic one. A differential evolution variant is proposed in [12], equipped with three new algorithmic components: a central tendency-based mutation, an adapted blending crossover, and a new distance-based selection mechanism. To deal with the noise, their algorithm uses non-conventional mutation strategies. In [31], an extension of multi-objective optimization is proposed, based on a differential evolution algorithm, to manage the effect of noise in objective functions. Their method applies an adaptive range of sample sizes for estimating the fitness values. In [32], instead of using averages, the search policy considers the distribution of noisy samples during the fitness evaluation process. A number of different approaches to dealing with noise are presented in [33]. Most sampling methods are based on the use of averages, and this motivates us to use different sampling techniques. One possible alternative is the use of fuzzy logic, which is an important pillar of computational intelligence. The idea of the fuzzy set was first introduced in [34]; it allows a member to belong to a set partially, as opposed to definitely, as stated by classical set theory. In other words, membership can be assigned a value within the [0, 1] interval instead of the {0, 1} set. Over the past four decades, the theory of fuzzy random variables [35] has been developed through a large number of studies in the area of fuzzy stochastic optimization [36,37]. The noisy part of our problem can be considered as randomness or fuzziness, and it can be understood as a fuzzy stochastic problem, as found in the literature [38][39][40][41][42].
In this paper, EDAs are used to solve nonlinear optimization problems that contain noise. The proposed EDA-based methods follow the class of EDAs proposed in [43]. The designed EDA model is first combined with the variable-sample method (SPRS) [30]. Sampling techniques have a serious drawback when the sample size is small, because estimating the objective function values under noise is not accurate in such cases. Therefore, we propose a new sampling technique based on fuzzy systems to deal with small sample sizes, and another EDA-based method uses this proposed fuzzy sampling technique. Moreover, additional versions of the proposed methods are developed to optimize functions without noise, in order to evaluate the different efficiency levels of the proposed methods. To test the performance of the proposed methods, numerical experiments were carried out using several benchmark test functions. Moreover, three real-world applications are considered to assess the performance of the proposed methods.
The rest of the paper is structured as follows. In Section 2, we highlight the main structure and techniques for EDAs. The design elements and proposed methods are stated in Section 3. In Section 4, algorithmic implementations of the proposed methods and numerical experiments are discussed. The results for three stochastic programming applications are presented in Section 5. Finally, the paper is concluded in Section 6.

Estimation of Distribution Algorithms
EDAs were first introduced in [44] as a new population-based method and have been extensively studied in the field of global optimization [26,44]. Although EDAs were originally proposed for combinatorial optimization, many studies have applied them to continuous optimization. The primary difference among EDAs lies in how the probabilistic model is built. In continuous optimization there are generally two considerable branches: one based on the Gaussian distribution model [25,26,[45][46][47][48][49][50][51], and the other on the histogram model [47,[52][53][54][55][56][57][58]. The first is the most widely used and has been studied extensively. The main steps of general EDAs are stated in Algorithm 1.
Algorithm 1 Pseudo-code for the EDA approach
1: g ← 0.
2: P_g ← Generate and evaluate M random individuals (the initial population).
3: repeat
4:   P^s_g ← Select m (≤ M) individuals from P_g according to a selection method.
5:   D_g(x) = p_g(x | P^s_g) ← Estimate the joint probability distribution of the selected individuals.
6:   P_{g+1} ← Sample M new individuals from D_g(x), and evaluate them.
7:   g ← g + 1.
8: until a stopping criterion is met.
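The loop in Algorithm 1 can be sketched in Python with a simple univariate Gaussian model (a minimal illustration; the population sizes, truncation selection, and noise-free objective are assumptions):

```python
import random
import statistics

def eda_univariate(f, dim, bounds=(-5.0, 5.0), pop=100, sel=50, gens=50, seed=1):
    """Minimal EDA in the spirit of Algorithm 1 with a univariate Gaussian
    model: rank the population, fit a per-variable mean/std to the best
    individuals, and resample the next population from that model."""
    rng = random.Random(seed)
    P = [[rng.uniform(*bounds) for _ in range(dim)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=f)                                   # evaluate and rank
        elite = P[:sel]                                 # truncation selection of m individuals
        mu = [statistics.mean(ind[i] for ind in elite) for i in range(dim)]
        sd = [statistics.stdev(ind[i] for ind in elite) + 1e-12
              for i in range(dim)]
        # sample M new individuals from the estimated distribution D_g(x)
        P = [[rng.gauss(mu[i], sd[i]) for i in range(dim)] for _ in range(pop)]
    return min(P, key=f)
```

On a separable function such as the sphere, this loop contracts the per-variable distributions toward the optimum within a few dozen generations.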
In the case of adapting a Gaussian distribution model D_g(x) in Algorithm 1, it has the form of a normal density with a mean μ and a covariance matrix Σ. The earliest proposed EDAs were based on simple univariate Gaussian distributions, such as the Univariate Marginal Distribution Algorithm for continuous domains (UMDA_c^G) and Population-Based Incremental Learning for continuous domains (PBIL_c) [26,45]. In these, all variables are taken to be completely independent of each other, and the joint density function is

f(x) = ∏_{l=1}^{n} f_l(x_l; Θ_l),  (2)

where Θ_l is the set of local parameters of variable x_l. Such models are simple and easy to implement with a low computational cost, but they fail on problems with highly dependent variables. For such problems, many EDAs based on multivariate Gaussian models have been proposed, which adapt the conventional maximum-likelihood-estimated multivariate Gaussian distribution, such as Normal IDEA [46,47], EMNA_global [26], and EGNA [25,26]. These methods have the same performance, since they are based on the same multivariate Gaussian distribution, and there is no significant difference between them [26]. However, because these methods model the full dependence between variables, they have poor explorative ability and their computational cost increases exponentially with the problem size [59]. To address this problem, various extensions have been introduced that scale Σ after the maximum-likelihood estimation according to certain criteria, in order to improve the exploration quality. This has been done in methods such as EEDA [48], SDR-AVS-IDEA [50], and CT-AVS-IDEA [49].
The EDA with Model Complexity Control (EDA-MCC) method was introduced to control the high complexity of the multivariate Gaussian model without losing the dependence between variables [43].
Since the univariate Gaussian model has a simple structure and limited computational cost, it has difficulty solving nonseparable problems. On the other hand, the multivariate Gaussian model can solve nonseparable problems, but it usually struggles as a result of its complexity and cost. The EDA-MCC method combines the advantages of the univariate and multivariate Gaussian models according to certain criteria, applying two main strategies:
• Weakly Dependent Variable Identification (WI). In this strategy, the correlation coefficients between variables are calculated to measure how dependent they are. The observed linear dependencies are measured by the correlation coefficients of the variables with each other, as follows:

corr(x_i, x_j) = cov(x_i, x_j) / (σ_i σ_j),  (3)

where corr(x_i, x_j) is the linear correlation coefficient between x_i and x_j, cov(x_i, x_j) is their covariance, σ_i and σ_j are their respective standard deviations, and i, j = 1, . . . , n.
Briefly, all variables are divided into two sets, W and S, where W is the set of weakly dependent variables and S is the set of strongly dependent variables. These variable sets are defined as follows:

W = { x_i : |corr(x_i, x_j)| ≤ θ, ∀ j ≠ i },  (4)
S = { x_1, . . . , x_n } \ W,  (5)

where θ is a threshold (0 ≤ θ ≤ 1) that reflects how much the user trusts the univariate model for the problem. Algorithm 2 shows the main flow of the WI strategy.
• Subspace Modeling (SM). This strategy is applied to the set S. The multivariate model requires a large population size to perform well, and its computational complexity grows quickly. The SM strategy is applied to the variables in S, which preferably number only a few. If the size |S| of set S is not limited, the population points are projected onto several subspaces of the n-dimensional search space. A multivariate model is then built for each subspace, which means that dependence is considered only between variables in the same subspace. The main steps of the SM strategy are explained in Algorithm 3. The parameter c is predefined and controls the number of subspaces (1 ≤ c ≤ n).
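The WI split into W and S can be sketched as follows (an illustrative Python version; the threshold value and the pure-Python correlation computation are assumptions):

```python
import statistics

def split_weak_strong(pop, theta=0.3):
    """Sketch of the WI strategy: a variable index goes to W if all of its
    absolute correlations with the other variables stay below theta;
    otherwise it goes to S."""
    n = len(pop[0])
    cols = list(zip(*pop))                      # one tuple per variable
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]

    def corr(i, j):
        # sample linear correlation coefficient, as in Equation (3)
        cov = statistics.mean((a - means[i]) * (b - means[j])
                              for a, b in zip(cols[i], cols[j]))
        return cov / (sds[i] * sds[j] + 1e-12)

    W = [i for i in range(n)
         if all(abs(corr(i, j)) <= theta for j in range(n) if j != i)]
    S = [i for i in range(n) if i not in W]
    return W, S
```

Running this on a population where two coordinates move together and a third is independent sends the correlated pair to S and the independent coordinate to W.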
After carrying out the WI and SM strategies, the final joint probability distribution function (pdf) has the following form:

D(x) = ∏_{x_i ∈ W} φ_i(x_i) · ∏_{k=1}^{c} ψ_k(x_{S_k}),  (6)

where φ_i(·) is the univariate pdf of variable x_i ∈ W, and ψ_k(·) is the multivariate pdf of the variables in subspace S_k. The main steps of the EDA-MCC method are illustrated in Algorithm 4.

Algorithm 4 Pseudo-code for the EDA-MCC approach
1: P ← Generate and evaluate M random individuals.
2: repeat
3:   Select m ≤ M individuals from P.
4:   Call Algorithms 2 and 3 sequentially to build a model, as in Equation (6).
5:   Generate new individuals P': the variable values of an individual are generated independently from φ_i(·) and ψ_k(·), then all generated variable values are combined to produce an individual.
6:   P ← P + P'.
7: until a stopping criterion is met.

Estimation of Distribution Algorithms for Simulation-Based Optimization
In this section, new EDA-based methods are proposed in order to deal with nonlinear and stochastic programming problems. Moreover, a new sampling technique is introduced based on fuzzy logic. Before presenting the proposed EDA-based methods, we illustrate the sampling techniques used to deal with noise.

Sampling Techniques
Two different sampling techniques were used to build two EDA-based methods for stochastic programming problems. The first sampling technique is the variable sampling path [30], while the other is the proposed fuzzy sampling technique. The details of these sampling techniques are illustrated in the following sections.

Variable Sampling Path
The variable-sample (VS) method [30] defines a class of methods that use Monte Carlo simulation to solve the stochastic programming problem. This sampling technique invokes several simulations to estimate the objective function value at a single solution. Search methods can benefit from such sampling to convert the stochastic programming problem into a nonlinear programming one. Sampling Pure Random Search (SPRS) [30] is a random search algorithm that uses the VS process. In each objective function evaluation call, the SPRS algorithm replaces the objective function value with the average of a variable-size sample. Under certain conditions, the SPRS algorithm converges to a locally optimal solution. The formal SPRS algorithm is shown in Algorithm 5.
Algorithm 5 Sampling Pure Random Search (SPRS)
1: Generate a point x_0 ∈ X at random, set an initial sample size N_0, and k := 0.
2: Generate a point y ∈ X at random.
3: Generate a sample ω^k_1, . . . , ω^k_{N_k}.
4: Compute the sample averages f̂(x_k) = (1/N_k) ∑_{i=1}^{N_k} F(x_k, ω^k_i) and f̂(y) = (1/N_k) ∑_{i=1}^{N_k} F(y, ω^k_i).
5: If f̂(y) < f̂(x_k), then set x_{k+1} := y. Otherwise, set x_{k+1} := x_k.
6: If the stopping criterion is satisfied, stop. Otherwise, update N_k, set k := k + 1, and go to Step 2.
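A compact Python sketch of the SPRS idea follows (the sample-size schedule N_k and the noisy-oracle signature F(x, rng) are illustrative assumptions):

```python
import random
import statistics

def sprs(F, dim, bounds=(-5.0, 5.0), n0=10, iters=2000, seed=7):
    """Sketch of SPRS: estimate each objective value with the average of a
    variable-size sample and keep the candidate with the lower estimate."""
    rng = random.Random(seed)
    x = [rng.uniform(*bounds) for _ in range(dim)]
    for k in range(iters):
        N = n0 + k // 100                      # slowly growing sample size N_k
        y = [rng.uniform(*bounds) for _ in range(dim)]
        fx = statistics.mean(F(x, rng) for _ in range(N))
        fy = statistics.mean(F(y, rng) for _ in range(N))
        if fy < fx:                            # accept the better estimate
            x = y
    return x
```

Because the comparison relies on noisy averages, small N values early in the run can accept or reject the wrong candidate, which is the motivation for the fuzzy sampling technique below.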

Fuzzy Sampling
The basic study of possible definitions of a fuzzy number was proposed in [60]. In the case of a two-valued (Boolean) set, the membership of any element x ∈ X in the subset A(⊆ X) is given by

μ_A(x) = 1 if x ∈ A, and μ_A(x) = 0 otherwise.  (7)

In a fuzzy set, the membership values fall in the real interval [0, 1], as in [34], and μ_A measures the degree of membership of an element x in X, i.e., μ_A(x) : X → [0, 1]. Many definitions have been introduced for the membership function μ, depending on the problem's properties [39,61].
The average sampling in Algorithm 5 works well whenever the sample size N is sufficiently large. However, it fails to estimate the objective function values with small sample sizes, especially in the early stages of the search process, and promising solutions may be lost. For this reason, we propose a new sampling technique based on fuzzy sets for better estimation of function values, even with relatively small sample sizes. Specifically, suppose our target is to estimate f̂(x) using a sample of size N: F(x, ω_1), . . . , F(x, ω_N). The proposed fuzzy sampling technique defines that estimation as

f̂(x) = ( ∑_{i=1}^{N} μ_i F(x, ω_i) ) / ( ∑_{i=1}^{N} μ_i ),  (8)

where μ_i is the membership value associated with each simulated value F(x, ω_i), i = 1, . . . , N.
In order to compute the membership values, three featured sample values are stored. The first two are the maximum value F_max and the minimum value F_min among the current sample values F(x, ω_1), . . . , F(x, ω_N). The last feature value is called the guide value F_G and is selected within the interval [F_min, F_max]. Then, the membership values μ_i, i = 1, . . . , N, can be defined by the following formula:

μ_i = max{ 0, 1 − |F(x, ω_i) − F_G| / ρ },

where ρ is a radius value set based on the sample size N. The calculation of the membership values takes two main points into consideration:
• μ_i takes values between 0 and 1: its value is near 1 when the sample value is close to F_G, and it decreases, reaching 0 when the sample value lies at the end of the radius ρ, which is initialized to (F_max − F_min) when the sample size is small;
• As the sample size N increases during the search process, the values of μ_i approach 1, since the radius ρ is expanded to cover the whole interval [F_min, F_max].
Algorithm 6 contains the main steps of the proposed Fuzzy Sampling Random Search (FSRS) method to deal with the objective function noise.

Algorithm 6 Fuzzy Sampling Random Search (FSRS)
1: Generate a point x_0 ∈ X at random; set an initial sample size N_0, and k := 0.
2: Generate a point y ∈ X at random.
3: Generate a sample ω^k_1, . . . , ω^k_{N_k}.
4: for i = 1, . . . , N_k, do
5:   Compute F(x_k, ω^k_i) and F(y, ω^k_i).
6: end for
7: Sort the sample values to set F_max and F_min.
8: Set the guide value F_G and the radius ρ, and compute the membership values μ_i.
9: Compute the fuzzy estimates f̂(x_k) and f̂(y) using Equation (8).
10: If f̂(y) < f̂(x_k), then set x_{k+1} := y. Otherwise, set x_{k+1} := x_k.
11: If the stopping criterion is satisfied, stop. Otherwise, update N_k, set k := k + 1, and go to Step 2.
It is worth mentioning that several alternatives were tested in our experiments to find the best choice of F_G. The conclusion of those experiments is that the median value of F(x, ω_1), . . . , F(x, ω_N) gives the best algorithmic results.
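The fuzzy sampling estimate described above, with the sample median as F_G, can be sketched as follows (the triangular membership formula and the choice ρ = F_max − F_min are plausible reconstructions, not a verbatim transcription of the paper's formulas):

```python
import statistics

def fuzzy_estimate(samples):
    """Sketch of the fuzzy sampling estimate: weight each simulated value by
    a triangular membership centred on the guide value F_G (the sample
    median) with radius rho, then take the weighted average."""
    f_min, f_max = min(samples), max(samples)
    f_g = statistics.median(samples)
    rho = (f_max - f_min) or 1.0               # full interval as the radius
    mu = [max(0.0, 1.0 - abs(v - f_g) / rho) for v in samples]
    return sum(m * v for m, v in zip(mu, samples)) / sum(mu)
```

Compared with the plain average, this down-weights outlying simulation values, so a small sample containing one anomalous draw yields an estimate much closer to the bulk of the data.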

EDA-Based Methods for Simulation-Based Optimization
The proposed EDA-based methods are a combination of the EDA-MCC method, stated in Algorithm 4, and different sampling techniques for nonlinear and stochastic programming problems. In our proposal to build the EDA model (6), we used the UMDA_c^G model as the univariate Gaussian model [26], in which the joint density function is defined as

f(x; μ, σ²) = ∏_{i=1}^{n} (1 / (√(2π) σ_i)) exp( −(x_i − μ_i)² / (2σ²_i) ),  (9)

where μ = (μ_1, . . . , μ_n) is the mean and σ² = (σ²_1, . . . , σ²_n) is the variance. Furthermore, the EEDA model [48] was used as the multivariate Gaussian model, which is an extension of the EMNA_global model [26]. The multivariate joint density function is defined as

f(x; μ, Σ) = (2π)^{−n/2} |Σ|^{−1/2} exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) ),  (10)

where Σ is the covariance matrix. In the EEDA method, the covariance matrix is redefined in each iteration by expanding the original matrix in the direction of the eigenvector corresponding to the smallest eigenvalue. In other words, the minimum eigenvalue is reset to the value of the maximum eigenvalue. Algorithm 7 illustrates the main steps of the proposed EDA-based method.

Algorithm 7 The proposed EDA-based method
1: Set g := 0.
2: Generate M random individuals P_g and estimate their objective function values.
3: repeat
4:   Keep the best L individuals of P_g.
5:   Select the best m ≤ M individuals P^s_g.
6:   Compute the variable sets W and S using the WI strategy in Algorithm 2.
7:   Estimate the joint density function of the W variables using Equation (9).
8:   Apply the SM strategy to set S as in Algorithm 3.
9:   Estimate the multivariate joint density function for each variable subset using Equation (10).
10:  Generate new (M − L) individuals P^C_g by using Equations (9) and (10) independently.
11:  Estimate the objective function values at the P^C_g individuals.
12:  Set P_{g+1} := the L retained individuals combined with P^C_g.
13:  Apply a local search to each individual in P_{g+1}.
14:  Set g := g + 1.
15: until the stopping criterion is met.
Different EDA-based methods can be generated from Algorithm 7 according to the technique used to estimate the objective function values in Steps 2 and 11. Therefore, we have three versions:
• DEDA: the deterministic version, which evaluates the objective function directly for noise-free nonlinear programming problems;
• ASEDA: the version that estimates the objective function values with the average sampling technique of Algorithm 5;
• FSEDA: the version that estimates the objective function values with the proposed fuzzy sampling technique of Algorithm 6.

Numerical Experiments
The proposed methods were programmed in MATLAB (see the Supplementary Materials) and tested using different benchmark test functions to prove their efficiency. Four test sets (Sets A–D) are used to discuss the proposed methods' results. The versions of the main proposed method were tested on these sets as follows: the DEDA method was tested with Test Sets A and B, while the ASEDA and FSEDA methods were tested with Test Sets C and D. Besides these test sets, we discuss three real-world applications in the next section. To assess the statistical differences between the compared results, the nonparametric Wilcoxon rank-sum test [63][64][65][66][67] was used. This test provides the following statistical measures:

• The associated p-value;
• The ranks R+ and R−, which are computed as follows:

R+ = ∑_{d_i > 0} rank(d_i) + (1/2) ∑_{d_i = 0} rank(d_i),
R− = ∑_{d_i < 0} rank(d_i) + (1/2) ∑_{d_i = 0} rank(d_i),

where d_i is the difference between the i-th out of r results of the compared methods. Before discussing the main results, we illustrate the parameter tuning and settings.
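The rank sums R+ and R− above can be computed with a short helper (a sketch; averaging tied ranks and splitting the ranks of zero differences between the two sums follow common practice):

```python
def signed_ranks(diffs):
    """Compute R+ and R-: rank the absolute differences (averaging ties),
    then sum the ranks of positive and negative differences, splitting the
    ranks of zero differences equally between the two sums."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):                      # average ranks over ties
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 1) / 2.0                # mean of ranks i+1 .. j
        for k in order[i:j]:
            ranks[k] = avg
        i = j
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    zeros = sum(r for r, d in zip(ranks, diffs) if d == 0)
    return r_plus + zeros / 2.0, r_minus + zeros / 2.0
```

For example, differences (1, −2, 3) receive absolute ranks 1, 2, 3, giving R+ = 4 and R− = 2.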

Parameter Tuning and Setting
In order to complete the description of our algorithms, their parameters are discussed in this section. Table 1 contains the definitions of all parameters and their best values. Numerical experiments were run to find suitable values for these parameters, which were set to be as independent from the problem as possible. Although the theoretical aspects of EDA parameters have been studied before, for example in [27], the population size R is still a main factor that varies from problem to problem. In fact, trading off between the population size and the number of generations is a main issue.
Before choosing a suitable value for parameter R, a comparison between the values R = 100, R = 200, and R = 500 was carried out for both types of problems (global optimization and simulation-based optimization). The number of function evaluations was fixed at 500,000 in all of these experiments. Table 2 shows that, for the global optimization problems, increasing the population size does not have a positive effect on most problems. This means that the search process benefits more from an extra number of generations. For the simulation-based optimization problems, Figure 1 shows almost identical performance for R = 100, R = 200, R = 500, and R = 1000. Some functions, such as f_3 and f_5, need more exploration of the search space (a larger R), but R = 100 is still the best choice for the rest of the functions.
For parameters m and L, which scale with parameter R, higher values yield longer computational times without any significant improvement. For the sample size parameters, the settings follow the values recommended in [10,13].

Global Optimization Results
The proposed DEDA algorithm was tested on nonlinear programming problems using the test functions in Set A and Set B. These test functions have diverse characteristics representing the various difficulties that occur in global optimization problems. For all global optimization results, the records were obtained over 100 independent runs with a maximum of 20,000 function evaluations. First, Table 3 shows the average errors (Av.) and standard deviations (Std.) obtained by the DEDA method on Test Set B. It reached the global optima within error gaps of less than 10^−3 for 25 out of 40 test functions. The DEDA results were compared with those of the scatter search methods introduced in [10]:
• Scatter Search (SS): the standard scatter search method.
Table 4 shows the average errors (Av.), standard deviations (Std.), and success rates (Suc.) obtained by each method on Test Set A. In general, DEDA obtained lower average errors and higher success rates than the other two methods, as can be seen from the ranks R+ and R− in Table 5. However, there was no significant difference between these methods according to the p-values obtained by the Wilcoxon rank-sum test, as shown in Table 5.

Fuzzy Sampling Performance
The FSRS (Algorithm 6) results were compared with those of the standard SPRS (Algorithm 5) under high noise, i.e., N(0, 10²). These results are illustrated in Figure 2, which shows the average f̂(x) of the best solutions obtained by each method for every test function. Test functions with different dimensions were selected from Test Set C, with the sample size varying from 10 to 1000. The results in Figure 2 reveal that the FSRS algorithm consistently outperforms the SPRS algorithm for small sample sizes N, while there is no significant difference between them at larger sample sizes. Therefore, the proposed fuzzy sampling can efficiently deal with a wide range of noise, especially with small sample sizes.

Simulation-Based Optimization Results
In this section, we give more details about the experimental comparison between the proposed FSEDA and ASEDA algorithms; the results of the better method are then compared with other benchmark methods. Table 6 shows the best and average errors obtained by the two methods on the seven test functions in Set C. The FSEDA method obtained better best solutions than ASEDA on six out of seven test functions, and better average solutions on five out of seven. Therefore, the FSEDA results were used in the comparisons with the other benchmark methods. Two main comparison experiments were run to test the FSEDA performance against benchmark methods. The first experiment used Test Set C for comparisons with the methods in [10,68], while the other used Test Set D with the methods in [12,13]. Table 7 shows the best and average errors obtained by the proposed FSEDA method and the following evolutionary-based methods:
• DESSP: Directed evolution strategies for stochastic programming [10];
• DSSSP: Directed scatter search for stochastic programming [10,68].
The results were obtained over 25 independent runs with a maximum of 500,000 function evaluations. Moreover, Table 8 shows the statistical measures of the results compared in Table 7. These measures reveal no significant difference between the proposed method and the other two methods in terms of either the best solution found or the average errors. However, for the high-dimensional function f_7, the proposed method demonstrated the best performance. The other comparison experiment compared the FSEDA results with those of nine benchmark methods from [12,13], using Test Set D with dimensions 30 and 100.
The average errors over 30 independent runs are reported in the following tables, with computational budgets of 100,000 and 300,000 function evaluations for dimensions 30 and 100, respectively. Table 9 displays the average errors for the test functions with n = 30, and the statistical measures of these results are presented in Table 10. The FSEDA outperforms seven out of nine methods in terms of obtaining better average solutions. The results for the test functions with n = 100 are shown in Table 11, and their statistical measures are reported in Table 12. The FSEDA method outperformed six out of the nine compared methods in terms of average solution quality.

Stochastic Programming Applications
In this section, we investigate the strength of the proposed methods in solving real-world problems. Therefore, the FSEDA and ASEDA methods attempted to find the best solutions for three different real stochastic programming applications: • The product mix (PROD-MIX) problem [76,77]; • The modified production planning Kall and Wallace (KANDW3) problem [77,78]; • The two-stage optimal capacity investment Louveaux and Smeers (LANDS) problem [77,79].
These applications are constrained stochastic programming problems. Therefore, the penalty methodology [80] was used to transform these constrained problems into a series of unconstrained ones. These unconstrained solutions are assumed to converge to the solutions of the corresponding constrained problem.
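The penalty transformation can be sketched as follows (a minimal Python version; the quadratic penalty form and the g_j(x) ≤ 0 constraint convention are assumptions):

```python
def penalized(f, ineq_constraints, lam=1000.0):
    """Sketch of a penalty transformation: for constraints g_j(x) <= 0,
    add lam * sum(max(0, g_j(x))^2) to the objective, turning the
    constrained problem into an unconstrained one."""
    def fp(x):
        penalty = sum(max(0.0, g(x)) ** 2 for g in ineq_constraints)
        return f(x) + lam * penalty
    return fp
```

The resulting unconstrained function leaves feasible points untouched and penalizes constraint violations quadratically, so any of the EDA-based methods can minimize it directly.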
To solve these problems, the proposed EDA-based methods were used with the parameters in Table 1, except for the population size, which was adjusted to R = 300. The penalty parameter was set to λ = 1000. The algorithms were terminated when they reached 30,000 function evaluations.

PROD-MIX Problem
This problem assumes that a furniture shop has two workstations (j = 1, 2): the first for carpentry and the second for finishing. The shop makes four products (i = 1, . . . , 4). Each product i consumes a certain number of man-hours t_ij at workstation j, and the available man-hours h_j at each workstation j are limited. The shop must purchase man-hours v_j from outside for workstation j if the required man-hours exceed this limit. Each product earns a certain profit c_i. The objective is to maximize the shop's total profit while minimizing the cost of purchased man-hours.

The Mathematical Formulation of the PROD-MIX Problem
The formal description of the PROD-MIX Problem can be defined as follows [76,77], with the required values for parameters and constants:
i    The product class (i = 1, . . . , 4).
j    The workstation (j = 1, 2).
x_i  The quantities of product i (decision variables).
v_j  The outside purchased man-hours for workstation j.
The objective function for the PROD-MIX Problem maximizes the expected total profit minus the cost of purchased man-hours, subject to x_i, v_j ≥ 0, ∀ i, j.

Results of the PROD-MIX Problem
The FSEDA method found a new best solution with value f_max = 20,580.99 at the decision variable values x_max = (1356.2, 17.4, 88.1, 38.1). The best previously known value for this problem is f* = 17,730.3 [76,77]. Figure 3 compares the performance of the ASEDA and FSEDA methods; it shows that the FSEDA method performed best in terms of reaching the optimal solution.

The KANDW3 Problem
In the KANDW3 Problem [77,78], a refinery makes J different products by blending I raw materials. The refinery produces quantities x_it of raw material i in period t at cost c_i to meet the demands d_jt. Each product j requires a_ij units of raw material i. If the refinery does not satisfy the demands in period t, it must outsource the shortfall y_jt of product j at cost h. The main objective is to satisfy the demand completely at minimum cost.

The Mathematical Formulation of the KANDW3 Problem
The formal description of the KANDW3 Problem can be defined as follows [77,78], with the required values for parameters and constants:
i     The materials (i = 1, . . . , I).
j     The products (j = 1, . . . , J).
t     The time periods (t = 1, . . . , T).
x_it  The quantity of material i in period t (decision variables).
y_jt  The quantity of outsourced product j in period t.
The values for the demands can be obtained from Figure 4. The objective function for the KANDW3 Problem [77,78] minimizes the expected total cost, subject to x_it, y_jt ≥ 0, ∀ i, j, t.

Results of KANDW3 Problem
The FSEDA method found the objective function value f_min = 1558.9 at the decision variable values x_min = (2, 13, 10, 20). The best previously known value for the KANDW3 Problem is f* = 2613, as reported in [78]. Therefore, the proposed method found a new minimal value for the KANDW3 Problem. The comparison between the ASEDA and FSEDA methods is shown in Figure 5, where the FSEDA method provides better solutions than the ASEDA method.

The LANDS Problem
Power plants are the key issue in the LANDS Problem [77,79]. Assume that there are four types of power plants, which can be operated in three different modes to meet the electricity demands; the operating level y_ij of power plant i in mode j satisfies the demands d_j at cost h_ij. The budget b is a constraint that limits the total cost. The main objective is to determine the optimal capacity investment x_i in each power plant i.

The Mathematical Formulation of the LANDS Problem
The formal description of the LANDS Problem can be defined as follows [77,79], with the required values for parameters and constants:
i    The power plant type (i = 1, . . . , 4).
j    The operating mode (j = 1, . . . , 3).
m    The minimum total installed capacity, m = 12.0.
b    The available budget for capacity installation, b = 120.0.
The objective function for the LANDS Problem [77,79] minimizes the expected total cost, subject to
∑_i y_ij ≥ d_j, ∀ j,
x_i, y_ij ≥ 0, ∀ i, j.

Results of the LANDS Problem
The objective function value f_min = 413.616 was obtained by the FSEDA method at the decision variables x_min = (2.6, 2.7, 2.6, 4.3). The best known function value for this problem is f* = 381.85, as presented in [79]. Figure 6 compares the FSEDA and ASEDA performance on the LANDS Problem; the FSEDA method reached its best solution faster than the ASEDA method.

Conclusions
In this paper, four new algorithms are presented to deal with various problems and applications. The first is Fuzzy Sampling Random Search (FSRS), a new sampling-based search technique. The other three are EDA-based methods, denoted DEDA, ASEDA, and FSEDA. The DEDA method is proposed to deal with deterministic nonlinear programming problems, while the ASEDA and FSEDA methods use average and fuzzy sampling techniques, respectively, to deal with stochastic programming problems. Several sets of benchmark nonlinear and stochastic programming problems were tested, and the results demonstrate the promising performance of the new methods. In particular, the fuzzy membership function is effective at limiting the influence of anomalous function simulations resulting from small sample sizes. The numerical results show that the ASEDA and FSEDA methods are promising simulation-based optimization tools. Moreover, the FSEDA method obtained new optimal solutions for two of the three real-world applications. Finally, the experimental analysis suggests extending the present work with different metaheuristics to solve simulation-based optimization problems in both continuous and combinatorial domains.