Solving Contextual Stochastic Optimization Problems through Contextual Distribution Estimation

: Stochastic optimization models always assume known probability distributions about uncertain parameters. However, it is unrealistic to know the true distributions. In the era of big data, with the knowledge of informative features related to uncertain parameters, this study aims to estimate the conditional distributions of uncertain parameters directly and solve the resulting contextual stochastic optimization problem by using a set of realizations drawn from estimated distributions, which is called the contextual distribution estimation method. We use an energy scheduling problem as the case study and conduct numerical experiments with real-world data. The results demonstrate that the proposed contextual distribution estimation method offers specific benefits in particular scenarios, resulting in improved decisions. This study contributes to the literature on contextual stochastic optimization problems by introducing the contextual distribution estimation method, which holds practical significance for addressing data-driven uncertain decision problems.


Introduction
Uncertainty is prevalent in various decision-making systems.For optimization problems with uncertain parameters in the objective function, traditional models typically assume that these uncertain parameters follow known probability distributions, thereby establishing stochastic optimization models to solve the problems [1].However, obtaining the true distributions of uncertain parameters is often challenging.Abundant historical data on uncertain parameters and their related features present new perspectives for addressing uncertainty [2], in fields like transportation [3,4] and logistics [5][6][7].
Stochastic problems, without knowing feature information, can use a sample average approximation (SAA) to generate approximate stochastic models [8].However, in the era of big data, we have access to informative features related to uncertain parameters, leading to contextual stochastic problems.When the objective function is linear in uncertain parameters, we can use predict-then-optimize and smart predict-then-optimize methods to predict the point values of uncertain parameters [9].Then, these point predictions are plugged into downstream optimization problems to derive solutions.However, when the objective function is nonlinear in uncertain parameters, good predictions do not necessarily mean good decisions for uncertain optimization problems [10].Various machine learning (ML) methods are used to approximate uncertain parameters as a function of features [2].To some extent, predictive methods lack the capability to consider the influence of prediction errors on downstream decisions.Replacing a random parameter with its mean is widely recognized as potentially resulting in suboptimal solutions for a stochastic optimization model [11].Therefore, we should estimate the conditional distributions of uncertain parameters [12].Weighted SAA (w-SAA) is an advanced method that uses the empirical distributions of uncertain parameters to derive solutions by solving contextual stochastic problems [13].A global predictive prescription method based on quantile regression is proposed to predict the distributions of the unknown parameters in the optimization model by Wang and Yan [14].However, the authors do not estimate the mean and variance of the parameter distribution directly.
From the above literature, we emphasize that the existing studies mainly use empirical distributions to solve contextual stochastic problems, which lack the estimation of the exact underlying distributions of uncertain parameters.Consequently, this study builds on w-SAA but goes a step further; that is, we estimate the conditional distributions of uncertain parameters accurately using ML models and then utilize the estimated distribution to generate a set of estimates, which are then used to obtain approximate solutions to contextual stochastic problems.We use an energy scheduling problem as the case study, build the contextual stochastic optimization model, and conduct numerical experiments with real-world data, as well as four specific ML models, including k-nearest-neighbors (kNN), classification and regression tree (CART), random forest (RF), and kernel regression (KR).The results demonstrate that the proposed contextual distribution estimation method offers specific benefits in particular scenarios, resulting in improved solutions compared to w-SAA and providing new tools for addressing contextual stochastic optimization problems in practice.
The remainder of this study is organized as follows.Section 2 introduces w-SAA and contextual distribution estimation methods.Section 3 presents the mathematical model for our studied case and the results of numerical experiments.Section 4 concludes this study.

Methodology
This section provides an overview of the general form of the contextual stochastic optimization problem, introduces w-SAA and contextual distribution estimation methods, and defines the decision loss as the evaluation metric.In addition, we present the specific steps for implementing the two methods by using four ML models.

Contextual Stochastic Optimization Problem
Consider: (i)  is a random variable ( ∈  ⊂ ℝ   , where   is the dimension of ), with the underlying true distribution   and historical observed data { 1 , … ,   }; (ii)  ( ∈  ⊂ ℝ   , where   is the dimension of ) represents the decision variable, where  is the feasible solution set subject to deterministic constraints; and (iii) (; ) denotes the uncertain cost function.The traditional stochastic optimization problem is defined as follows: If we have historical observed data of feature variables  ∈  ⊂ ℝ   (where   is the dimension of ) related to , denoted as { 1 , … ,   }, where   and   are observed simultaneously for  ∈ {1, … , }, we can establish a dataset: (2)

The w-SAA Method
When (; ) is linear in the random variable  and we have sufficient data, we can utilize an ML model to predict the expected value of the random variable , denoted as  ̂[| =  0 ] =  ̂( 0 ).By plugging this point estimate into the objective function, Problem (2) can be approximated as: However, when (; ) is nonlinear in the random variable  , a good prediction may not lead to a good solution.Given this case, the w-SAA method is an advanced alternative.This study focuses on stochastic problems whose objective functions are nonlinear in their uncertain parameters.
The w-SAA method assigns weights to each available data sample and then utilizes SAA for the solution [13].This method approximates Problem (2) as: where  , ( 0 ) represents the weight assigned to the data sample (  ,   ) based on dataset   , observation  =  0 , and a specific ML method (such as kNN).It is evident that the w-SAA method directly uses the historical data as an empirical distribution, which does not estimate the mean and the variance of the parameter distribution directly.

The Contextual Distribution Estimation Method
We now aim to go a step further beyond w-SAA by estimating the mean and the variance of the conditional distribution of the random variable  given an observation.Specifically, given  =  0 , we seek to estimate the conditional mean [| =  0 ] and the conditional variance [| =  0 ].To elaborate, the estimated conditional mean of  is equivalent to the point prediction of the ML model, denoted as  ̂[| =  0 ] =  ̂( 0 ).The prediction error of the model on the training dataset is defined as   =  ̂[| =   ] −   ,  ∈ {1, … , } [15].Consequently, the estimated conditional variance of  can be calculated as: Then, we plot the frequency distribution of { 1 , … ,   } to fit the random variable  with a suitable distribution (e.g., a Gaussian distribution).Consequently, we obtain an estimate of the conditional distribution given  =  0 , denoted as ̂ |= 0 .Based on this estimated distribution, we can generate a total of  estimates of  , represented as  ̂( 0 ),  ∈ {1, 2, … , }.Subsequently, Problem (2) can be approximated as: Finally, Algorithm 1 shows the procedures of the contextual distribution estimation method.
Algorithm 1.The pseudo-code of the contextual distribution estimation method.
Step 1. Plot the frequency distribution of { 1 , … ,   } and determine the approximate distribution type of the random variable .
Step 5. Solve the following model and obtain the approximate solution:

The Evaluation Metric
To evaluate the effectiveness of the above methods in solving Problem (2), we introduce the decision loss.We define the test dataset   as   = {( +1 ,  +1 ), … , ( + ,  + ):   ∈ ,   ∈ ,  ∈ { + 1, … ,  + }} , where  denotes the number of test data points.Based on the dataset   , we can obtain the decision ̂(  ) for the observation  =   using various methods.The optimal objective function value under complete information is  * (  ) = min ∈ (;   ) ,  ∈ { + 1, … ,  + } .Therefore, the decision loss is defined as: where [̂(  );   ] denotes the actual cost resulting from decision ̂(  ) for the observation  =   , with the difference between [̂(  );   ] and  * (  ) representing the decision loss for the observation  =   .The decision loss   , of a certain method based on   , is defined as the average of all decision losses on the test dataset   .

ML Methods
Subsequently, we present the specific steps for implementing w-SAA and contextual distribution estimation methods by utilizing four ML models: kNN, CART, RF, and KR.

kNN
In the kNN regression model, this study adopts the Euclidean distance as the metric for measuring distances.The Euclidean distance between two data samples   = ( 1  ,  2  , … ,     ) and   = ( 1  ,  2  , … ,     ) can be represented as: In a trained kNN model based on dataset   , when  =   ,  ∈ { + 1, … ,  + } , the predicted value of  is given by: , where ≤ } represents the index set of the k-nearest neighbors of   ; here,  is the hyperparameter of the kNN regression model.
In the w-SAA method, the weights assigned to each training data sample (  ,   ) for   are as follows:

CART
During the training process of a CART, we input the training dataset   and set various hyperparameters of the tree model, including the maximum depth, ℎ max , the minimum number of samples for splitting, t min , and the minimum number of samples for leaf nodes,  min .By tuning these parameters, we obtain the final CART.
The construction process of a CART is as follows: 1. Input training dataset   , hyperparameters ℎ max , t min , and  min .2. For the training dataset  = {( (1) ,  (1) ), … , ( () ,  () )} of the current node: 3.If the number of samples is less than  min :  <  min , 4. or if the tree depth is greater than or equal to ℎ max : depth ≥ ℎ max , 5. return a decision subtree and stop recursion at the current node.where () ∈ {1, … ,   } denotes the leaf node corresponding to input  and   is the number of leaf nodes in the CART.
In the w-SAA method, the weights assigned to each training data sample (  ,   ) for   are as follows: ,  ∈ {1, … , }.

RF
RF is an algorithm that integrates multiple CARTs based on the concept of ensemble learning, with each CART as its basic unit.The hyperparameters of an RF include the forest size  (i.e., how many trees are constructed), the maximum depth of each tree, ℎ max , the minimum number of samples for splitting, t min , and the minimum number of samples for leaf nodes,  min .We first input the training dataset   , then set various hyperparameters, and finally obtain the final model through hyperparameter tuning.
The process of building an RF is as follows: 1. Input training dataset   with feature dimension   and set the forest size .2. For each tree ,  training samples are randomly drawn from   with replacement to form the training dataset for that tree.3. Build each decision tree  using the CART algorithm.4. For each new observation, obtain the final prediction result by averaging the prediction results of all decision trees considered.
The prediction function for decision tree  corresponding to input  can be represented as: where   represents the number of leaf nodes in decision tree  , () ∈ {1, … ,   } denotes the leaf node assigned to input , and  ̂ represents the predicted value of leaf node  of decision tree ,  ∈ {1, … ,   }.Therefore, the prediction function of RF can be expressed as: .
In the w-SAA method, the weights assigned to each training data sample (  ,   ) for   are as follows: ,  ∈ {1, … , }.

KR
KR, as a method for fitting nonlinear models, essentially utilizes kernel functions as weight functions to establish nonlinear regression models.In this study, a Gaussian kernel function is employed to fit the data, shown as follows: where  is a new observation,   is a historical data sample, ‖ −   ‖ is the Euclidean distance (L2 norm) between  and   , and ℎ is the bandwidth, serving as a hyperparameter of the KR model.
Thus, the prediction value of  at   is the weighted sum of all   in the training dataset: ̂KR (  ) = ∑  , KR (  )   =1 .

Case Study
This section begins with an energy scheduling problem and its stochastic optimization model.Subsequently, we present the real-world data used in the numerical experiments and preprocess the data.Finally, we train four ML models and calculate the decision losses for w-SAA and contextual distribution estimation methods.

Energy Scheduling Problem and Its Mathematical Model
This section presents an energy production scheduling problem, where the future 24hour energy prices are uncertain, requiring the company to plan corresponding energy production schedules and maximize the profit obtained from energy sales on the future day.The definitions of parameters and variables are shown in Table 1.The settings of the parameters in Table 1 are designed based on the real situation of an energy company.For example, for energy-consuming enterprises, the larger the production quantity, the higher the unit production cost; therefore, we design a piecewise cost function with the gradually increasing unit production cost.This experimental setup can reflect the essence of a class of decision-making problems that enterprises face in reality.As the energy price per hour is an uncertain parameter, this study establishes the following stochastic optimization model: ≥ 0,   ≥ 0,  ≥ 0,  ∈ .
Objective function ( 6) minimizes the expected total negative profits from energy sales on a future day.Constraint (7) represents the production capacity constraint per unit time.Constraints ( 8)-( 11) denote the cost calculation formulas per unit time.Constraint ( 12) is the formula for total cost calculation.Constraint (13) specifies the variable domain constraints.

Introduction of Datasets
The energy price dataset used in this study consists of 14,592 records [16].We divide it into a training dataset   ( = 11640) and a test dataset   ( = 2952).Each record contains historical values of a random variable  and feature variables .Specifically, the feature vector  = ( 1 , … ,  9 ) ∈  ⊂ ℝ 9 and the random variable  ∈  ⊂ ℝ , along with their practical meanings and ranges, are detailed in Table 2.  Histograms and density curves of the frequency distribution of the random variable  based on { 1 , … ,   } are plotted, as shown in Figure 1.From the graph, it can be observed that, without considering some extreme values, the distribution of the random variable  approximates a normal distribution.Therefore, in the contextual distribution estimation method, this study chooses a normal distribution to fit the random variable .Since the mathematical model in this study is aimed at scheduling energy production within a 24-hour period in the future, during testing on the test dataset   , each of the 24 data samples form an input of a decision problem.Therefore, we construct a set of decision problems {  }, where  ∈ {1, … , } and  = ⌊ 24 ⁄ ⌋ = ⌊2952 24 ⁄ ⌋ = 123.For problem   ,  ∈ {1, … , }, its input data is defined as: where In the w-SAA method, for   ,  ∈ {1, … , 24}, the weight  , (  ) can be obtained through an ML model, where  ∈ {1, … , }.Therefore, there are  24 potential combinations of energy prices in one day.The weight for each parameter combination is ∏  ,  (  )

𝑙=1
, where   ∈ {1, … , }.This study randomly selects  scenarios along with their weights to be inputted into the objective function of the mathematical model for solving, and obtains the decision ̂ w−SAA (   ) .In the contextual distribution estimation method, for   ,  ∈ {1, … , 24}, the estimated distribution ̂ |=  can be obtained.Based on ̂ |=  , we obtain  estimates of , denoted as  ̂(  ),  ∈ {1, 2, … , }.Therefore, there are  24 potential combinations of energy prices in one day, with equal weight for each parameter combination.This study also randomly selects  scenarios to be inputted into the objective function of the mathematical model for solving, and obtains the decision ̂ distr_esti (   ).
The optimal objective function value of problem   under complete information is  * (   ) = min ∈ (;    ); hence, the decision loss is: Based on the training dataset   , this study adopts four ML models, i.e., kNN, CART, RF, and KR, to compare the effectiveness of the w-SAA and contextual distribution estimation methods.
The kNN and KR models involve calculating distances between data samples, thus requiring data standardization before training the models, while the CART and RF models can be trained directly using the original dataset.The standardization method is used as follows: features  1 to  4 represent periodic features related to time; therefore, we employ Sine-Cosine encoding, where  ′ = sin (2 ), with  as the original value,  ′ as the standardized value, and   as the total periods of .For example, the standardized data of the feature  1 = ( 1 1 ,  The data after Z-score standardization have a mean of 0 and a standard deviation of 1.

Model Training
During the model training process, this study randomly splits the training dataset   into   train and   valid in a 4:1 ratio, with   train containing 9312 data samples and   valid containing 2328 data samples.
In the w-SAA method, the study trains ML models based on   train and tunes hyperparameters based on   valid , with decision loss as the training metric.In the process of validation and testing, this study randomly selects  = 200 scenarios of energy prices to be inputted into the model for solving, yielding approximate solutions ̂ w−SAA and calculating the corresponding decision losses   w−SAA .The hyperparameter settings and tuning results for the four models are shown in Table 3.
The line charts of decision loss during the hyperparameter tuning process of the four models for w-SAA are depicted in Figure 2.  4.
The line charts of decision loss during the hyperparameter tuning process of the four models for contextual distribution estimation are depicted in Figure 3.

Discussion
The experiments are conducted on a computer with AMD Ryzen 5 4600U and 16 GB (3200 MHz) RAM under the Windows 10 operating system.The mathematical model in Section 3.1 is implemented in Python programming language using Gurobi 9.5.2 as the solver.This study trains four ML models, i.e., kNN, CART, RF, and KR, based on the dataset   to implement two methods, w-SAA and contextual distribution estimation.The models are tested on the test dataset   , and the corresponding decision losses   are calculated and summarized in Table 5. Figure 4 illustrates the results.From Table 5 and Figure 4, we can see that compared to the traditional w-SAA method, the proposed contextual distribution estimation method has similar decision loss under the kNN, CART, and RF models, and exhibits certain advantages under the KR models.Specifically, under the kNN model, the decision loss obtained by w-SAA is 1.78% lower than that of our proposed method; under the CART model, the decision loss obtained by w-SAA is 0.63% lower than that of our proposed method; under the RF model, the decision loss obtained by w-SAA is 3.33% lower than that of our proposed method; however, under the KR model, the decision loss obtained by our proposed method is 30.36%lower than that of w-SAA.
The results of the numerical experiments demonstrate that our proposed method, i.e., contextual distribution estimation, exhibits certain advantages under some particular scenarios, leading to a significant reduction in decision loss.

Conclusions
For contextual stochastic optimization problems whose objective functions are nonlinear in their uncertain parameters, this study builds on w-SAA and further estimates the conditional distributions of uncertain parameters accurately, thereby obtaining approximate solutions to the problems.Specifically, we use the point prediction of an ML model as an estimate of the conditional mean, the estimated variance of the differences between predicted values and actual values on the training dataset as estimates of the conditional variance, and fit the uncertain parameters with an appropriate distribution based on historical data.By generating a set of estimates from the estimated distribution and inputting them into the model, an approximate solution to the stochastic optimization problem can be obtained.
This study uses the energy scheduling problem as the case study.Four ML models, i.e., kNN, CART, RF, and KR, are trained based on a real-world energy price dataset to implement w-SAA and contextual distribution estimation methods, and the performance of the two methods is tested.From the experimental results, it is shown that the proposed contextual distribution estimation method in this study exhibits advantages in certain scenarios compared to w-SAA, significantly reducing decision losses.
This study introduces the contextual distribution estimation method for contextual stochastic optimization problems, which can be applied to address data-driven uncertain decision problems in the field of operations research and management science, such as transportation and logistics.In future research, extensive computational experiments with data from different fields, such as transportation, manufacturing, and logistics, should be conducted to validate the effectiveness of the contextual distribution estimation method.

1 𝑉.
Therefore, the approximation of Model A is shown as follows:

Specifically, 𝑥 1 = 1
indicates January of the current year,  2 = 1 indicates the first week of the current year,  3 = 1 indicates Monday,  4 = 1 indicates the first hour of the day, and so on. 5 = 1 indicates that the day is a holiday.

Figure 2 .
Figure 2. Decision loss on   valid of the four models for w-SAA: (a) kNN; (b) CART; (c) RF; and (d) KR.In the contextual distribution estimation method, this study trains ML models based on   train and tunes hyperparameters based on   valid , with decision loss as the training metric.In the process of validation and testing, this study generates  = 200 estimates of energy price based on the estimated distribution ̂ |=  ,  ∈ {1, … ,24} and randomly selects  = 200 scenarios of energy prices to be inputted into the model for solving, yielding approximate solutions ̂ distr_esti and calculating the corresponding decision losses   distr_esti .The hyperparameter settings and tuning results for the four models are shown in Table4.

Figure 3 .
Figure 3. Decision loss on   valid of the four models for contextual distribution estimation: (a) kNN; (b) CART; (c) RF; and (d) KR.

Figure 4 .
Figure 4. Decision loss of w-SAA and the contextual distribution estimation on the test dataset   .

Table 1 .
The definitions of the parameters and variables.

Table 2 .
Description of the data features.

Table 3 .
Hyperparameter settings and tuning results based on decision loss for w-SAA.

Table 4 .
Hyperparameter settings and tuning results based on decision loss for contextual distribution estimation.

Table 5 .
Decision loss of w-SAA and contextual distribution estimation on the test dataset   .