## 1. Introduction

In real-world applications, a wide range of uncertainties has to be taken into account. Therefore, optimization problems under uncertainties have been studied for many years. Generally speaking, there are two types of formulations for handling uncertainties in optimization problems: the deterministic optimization problem [1] and the stochastic optimization problem [2]. The robust optimization problem is a well-known deterministic formulation [1]. It always considers the worst-case performance under uncertainties, so overestimating the uncertainties may lead to an overly conservative decision in practice.

The Chance Constrained Problem (CCP), which is also referred to as the probabilistic constrained problem [3], is one possible formulation of the stochastic optimization problem. CCP is a risk-averse formulation of a problem under uncertainties. Specifically, CCP ensures that the probability of meeting all constraints is above a certain level. Since the balance between optimality and reliability can be designated by CCP, many real-world applications have been formulated as CCPs [3,4,5].
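When a set of observed scenarios is available, the chance constraint can be checked empirically by counting how many scenarios a candidate solution satisfies. The following is a minimal Python sketch of this idea; the function names and the toy demand model are illustrative assumptions, not the formulation used in this paper:

```python
import random

def empirical_probability(g, x, scenarios):
    """Estimate Pr[g(x, theta) <= 0] from a set of observed scenarios."""
    satisfied = sum(1 for theta in scenarios if g(x, theta) <= 0)
    return satisfied / len(scenarios)

def is_feasible(g, x, scenarios, alpha):
    """A solution is CCP-feasible if the empirical probability of
    meeting the constraint is at least the target level alpha."""
    return empirical_probability(g, x, scenarios) >= alpha

# Toy example: constraint g(x, theta) = theta - x <= 0
# (the decision x must cover a random demand theta).
random.seed(0)
demands = [random.gauss(10.0, 2.0) for _ in range(10000)]
g = lambda x, theta: theta - x
print(is_feasible(g, 14.0, demands, 0.95))  # True: x = 14 covers ~97.7% of demands
```

The balance between optimality and reliability mentioned above corresponds to the choice of the level `alpha`: raising it forces more conservative (and typically more expensive) solutions.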

CCP has been studied in the field of stochastic programming for many years [2]. In stochastic programming, the optimization methods of nonlinear programming [6] have been used to solve CCP. Recently, Evolutionary Algorithms (EAs) have also been reported for solving CCPs [7,8,9]. However, in the conventional formulation of CCP, a well-known probability distribution such as the normal distribution is widely used as a mathematical model of the unknown uncertainties. Pseudo data generated randomly by the Monte Carlo method from this mathematical model are then used to represent the uncertainties [10]. In some cases, the mathematical model is used to derive a deterministic formulation of CCP [2,11]. Otherwise, only a few actually observed data, or scenarios, are used to represent the uncertainties. As a drawback of the conventional formulation of CCP, the estimation error of uncertainties is unavoidable in the evaluation of solutions. In other words, if CCP is defined incompletely, the solution is also defective, and we cannot enjoy the benefit of CCP.

In recent years, due to advanced information technologies such as Wireless Sensor Networks (WSN) and the Internet of Things (IoT) [12], huge data sets called “big data” have become easily obtainable in various fields including culture, science, and industry [13]. In many real-world applications, the variance of the observed data is caused by some uncertainties. Such applications can probably be formulated as CCPs more accurately by using a large data set instead of a mathematical model.

In this paper, CCP is formulated by using a large data set called a full data set. However, we assume that the full data set is too large to solve CCP practically. Therefore, in order to evaluate solutions of CCP, we have to reduce the size of the full data set. Clustering is a popular data reduction technique [14]. Clustering divides a data set into subsets so as to meet two requirements: “internal cohesion” and “external isolation”. As a drawback of clustering, the result depends on the structure of the data. Moreover, clustering is not good at dealing with a huge data set [15].

Sampling is another data reduction technique. In particular, Simple Random Sampling (SRS) is widely used due to its simplicity and ease of execution [16]. SRS selects a few samples randomly from a huge data set and discards most of the data. As a drawback of SRS, the key information contained in the discarded data is likely to be lost. Therefore, a new data reduction method called Weighted Stratified Sampling (WSS) has been proposed by the authors to use the full data set completely for solving CCP [17].
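The contrast between the two reduction methods can be sketched briefly. SRS simply keeps a random subset; a stratified scheme in the spirit of WSS partitions the full data set into strata, represents each stratum by one point, and attaches the fraction of data it covers as a weight, so no datum is ignored. The following Python sketch illustrates only this general idea on one-dimensional data; the equal-width stratification and all names are assumptions, not the actual WSS of this paper:

```python
import random
from statistics import mean

def simple_random_sampling(full_data, n, rng):
    """SRS: keep n randomly chosen points and discard the rest."""
    return rng.sample(full_data, n)

def weighted_stratified_sampling(full_data, n_strata):
    """Illustrative stratified sampling: split the 1-D data range into
    equal-width strata, represent each non-empty stratum by its centroid,
    and weight it by the fraction of data it contains."""
    lo, hi = min(full_data), max(full_data)
    width = (hi - lo) / n_strata
    if width == 0.0:
        width = 1.0
    strata = [[] for _ in range(n_strata)]
    for xi in full_data:
        idx = min(int((xi - lo) / width), n_strata - 1)
        strata[idx].append(xi)
    # (centroid, weight) pairs; empty strata contribute nothing.
    return [(mean(b), len(b) / len(full_data)) for b in strata if b]

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100000)]
wss = weighted_stratified_sampling(data, 50)
```

Because every datum is assigned to some stratum, the weights sum to one and the weighted centroids reproduce the mean of the full data set exactly, which is the sense in which the full information is used "completely" rather than discarded.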

By using the new data reduction method called WSS, the above CCP based on the full data set is converted into a relaxation problem of CCP. In order to solve the relaxation problem of CCP efficiently, a new optimization method based on Differential Evolution (DE) [18] is also contrived in this paper. In the new optimization method, a pruning technique is introduced into an adaptive DE [19] to reduce the number of candidate solutions that have to be examined during the optimization process.
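The actual pruning rule is given later by (24); as a rough illustration only, the following Python sketch shows where such a pruning step sits inside a DE loop. Here the assumed rule discards a trial vector whose objective value is no better than its parent's before the expensive feasibility test is run; all names, parameters, and the rule itself are illustrative assumptions, not the paper's algorithm:

```python
import random

def de_with_pruning(f, feasible, bounds, pop_size=20, gens=100,
                    F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin with an illustrative pruning step: a trial vector
    that cannot improve on its parent's objective value is discarded
    before the costly feasibility (chance constraint) evaluation."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) if feasible(x) else float("inf") for x in pop]
    pruned = 0
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jr = rng.randrange(dim)  # guaranteed crossover position
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (rng.random() < CR or j == jr) else pop[i][j]
                     for j in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            if f(trial) >= fit[i]:   # pruning: skip the feasibility test
                pruned += 1
                continue
            if feasible(trial):      # expensive empirical probability check
                pop[i], fit[i] = trial, f(trial)
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best], pruned

# Toy run: minimize the sphere function with a trivial feasibility test.
x_best, f_best, n_pruned = de_with_pruning(
    lambda x: sum(v * v for v in x), lambda x: True, [(-5, 5)] * 3)
```

The point of the sketch is the `continue` branch: every pruned trial vector saves one evaluation of the empirical probability over all samples, which is why the saving grows with the sample size.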

The proposed approach is applied to a real-world application, namely the flood control planning formulated as CCP [5]. In addition to the conventional reservoir, the water-retaining capacity of the forest is considered in the flood control planning. Incidentally, various reservoir systems have been studied for protecting a downstream area of a river from flood damage [20,21,22]. Even though historical data are used in these studies, many of them are limited to problems with a deterministic formulation. A stochastic formulation such as CCP is generally a more realistic representation of the flood control planning because stream flows are stochastic in nature.

This paper is an extended version of the paper presented at ICIST2019 [17] and differs from the conference paper in the following three points: (1) The necessary sample size for SRS is derived theoretically, and it is shown that this theoretical sample size is too large in practice; (2) By using larger data sets, the performance of WSS is examined more intensively in comparison with SRS, and it is shown that WSS outperforms SRS in the accuracy of the estimated probability; (3) The ability of the pruning technique to reduce the run time of the adaptive DE is evaluated, and it is shown that the effect of the pruning technique increases in proportion to the sample size of WSS.

The remainder of this paper is organized as follows. Section 2 formulates CCP from a full data set. Section 3 explains two data reduction methods, namely the conventional SRS and the proposed WSS; by using a data reduction method, a relaxation problem of CCP is also derived. Section 4 proposes an adaptive DE combined with a pruning technique for solving the relaxation problem of CCP efficiently. Section 5 examines the performance of WSS intensively by comparison with SRS. Section 6 applies the proposed approach to a real-world application, namely the flood control planning formulated as CCP. Section 7 evaluates the ability of the pruning technique to reduce the run time of the adaptive DE on a personal computer. Finally, Section 8 concludes this paper and provides future work.

## 7. Performance Evaluation of ADEP

For solving the relaxation problem of CCP efficiently, the pruning technique shown in (24) is introduced into an Adaptive DE (ADE), and ADEP is proposed. By comparing ADE with ADEP, the ability of the pruning technique to reduce the run time of ADE is evaluated. The flood control planning formulated as CCP in (37) is used to draw a comparison between ADE and ADEP. The control parameters of both algorithms are therefore given in Table 1, except for the sample size $N$. ADE and ADEP are executed on a personal computer (CPU: Intel(R) Core(TM) [email protected], Memory: 16.0 GB).

By changing the value of $\beta \in (0, 1)$ with a sample size $N \simeq 482$, ADE and ADEP are each applied to the relaxation problem of CCP 50 times. Table 3 shows the results of the experiments averaged over 50 runs, in which $f(\mathit{x}_b)$ is the objective function value of the best solution $\mathit{x}_b \in \mathit{X}$ and $\widehat{p}(\mathit{x}_b, \mathbf{\Theta})$ is the empirical probability provided by $\mathit{x}_b \in \mathit{X}$. The run time of each algorithm, excluding the generation of the full data set $\mathit{B} \subseteq \Re^3$, $|\mathit{B}| = 10^7$, is also shown in Table 3. Rate in Table 3 means the percentage of the trial vectors $\mathit{z}_i \in \mathit{X}$ that are discarded by the pruning technique used in ADEP. Furthermore, the numbers in parentheses indicate the standard deviations of the respective values in Table 3.

From Table 3, we confirm that the pruning technique works well for reducing the run time of ADEP. Besides, the high rate in Table 3 shows that more than half of the trial vectors $\mathit{z}_i \in \mathit{X}$ are eliminated by the pruning technique without evaluating the value of $h(\mathit{z}_i)$ in (23). From the values of $f(\mathit{x}_b)$ and $\widehat{p}(\mathit{x}_b, \mathbf{\Theta})$ in Table 3, we can also see that ADE and ADEP find the same solution $\mathit{x}_b \in \mathit{X}$. Therefore, the pruning technique does not harm the quality of the solution obtained by ADEP.

By using a larger sample size $N \simeq 1304$, ADE and ADEP are applied to the relaxation problem of CCP again 50 times. Table 4 shows the results of the experiments in the same way as Table 3. From Table 4, we can also confirm the effectiveness of the pruning technique used in ADEP.

From Table 3 and Table 4, the run times of ADE and ADEP depend on the sample size $N$ of WSS. The pruning technique of ADEP is more effective when a large sample size is required. We can also see that the sample size $N = 482$ is large enough for solving the flood control planning, because there is not much difference between the qualities of the solutions shown in Table 3 and Table 4.

The advantage of the pruning technique might not be demonstrated well enough due to the short run times of ADE shown in Table 3 and Table 4. The short run time of ADE is attributable to the simple forest mechanism model given by (35). If the inflow of water were estimated through a complex mathematical computation taking hours [36], or if the amount of rainfall were predicted from a huge weather data set [37], we would fully realize the advantage of the pruning technique, which reduces the run time of ADE without harming the quality of the obtained solution. In any case, we can confirm the expected performance of the pruning technique from the high rates shown in Table 3 and Table 4.

## 8. Conclusions

For solving CCP formulated from a huge data set, or a full data set, a new approach is proposed. By using the full data set instead of a mathematical model simulating uncertainties, the estimation error of uncertainties caused by the mathematical model can be eliminated. However, the full data set is usually too large to solve CCP practically. Therefore, a relaxation problem of CCP is derived by using a data reduction method. As a new data reduction method based on stratified sampling, WSS is proposed and evaluated in this paper. In contrast to the well-known SRS, WSS can use the information of the full data set completely. Besides, it is shown that WSS outperforms SRS in the accuracy of the estimated probability. In order to solve the relaxation problem of CCP efficiently, an Adaptive DE combined with a Pruning technique (ADEP) is also proposed. The proposed approach is demonstrated through a real-world application, namely the flood control planning formulated as CCP.

Since huge data sets are available in various fields nowadays, many real-world applications can be formulated as CCPs without making mathematical models. Therefore, the combination of ADEP and WSS seems to be a promising approach to CCP formulated by using a huge data set. In particular, ADEP is applicable to any CCP in which the probabilistic constraint has to be evaluated empirically from a set of samples. On the other hand, the following open problems remain for WSS.

How to properly make the strata from a full data set for WSS: The performance of WSS depends on the stratification method, such as the number of strata and the shape of each stratum. By improving the stratification method, the optimal sample size of WSS may also be found.

How to feed back the values of the functions ${g}_{m}(\mathit{x},{\mathit{\theta}}^{n})$ to the generation of samples ${\mathit{\theta}}^{n}\in \mathbf{\Theta}$: If we can use the function values effectively, we should be able to construct the strata for WSS adaptively.

How to cope with high-dimensional data sets: Since the similarity of the data ${\mathit{\xi}}^{\ell}\in {\mathit{B}}_{n}$ assigned to the same stratum ${\mathit{B}}_{n}\subseteq \mathit{B}$ decreases in proportion to the dimensionality of the full data set, it may be hard to represent all data ${\mathit{\xi}}^{\ell}\in {\mathit{B}}_{n}$ by only one sample ${\mathit{\theta}}^{n}\in \mathbf{\Theta}$.

In our future work, we will tackle the above open problems about WSS. Moreover, we would like to demonstrate the usefulness of the proposed approach through various real-world applications that are formulated as CCPs by using huge data sets. In particular, it is necessary that the proposed approach to CCP be evaluated by using real data sets [36]. We also need to compare ADEP with state-of-the-art optimization methods such as the Ant Colony Optimization (ACO) algorithm [38].