A risk-aversion approach for the Multiobjective Stochastic Programming problem

Multiobjective stochastic programming is well suited to tackle problems arising in emergencies, since uncertainty and multiple objectives are usually both present in such problems. In this work a new solution concept is proposed, designed especially for risk-averse solutions, and a linear programming model is presented to obtain it.


Introduction
Decision making is never easy, yet we often have to make decisions. Emergencies and disaster management are fields in which many difficulties arise, such as high uncertainty and multiple conflicting objectives. To overcome these difficulties, risk-averse decisions are usually sought. Risk aversion is the attitude of preferring to lower uncertainty rather than gamble on extreme outcomes (positive or negative).
Risk aversion, although typically studied in problems with uncertainty, can also be considered when making decisions with multiple criteria. For instance, in the field of disaster management, solutions that are sufficiently good for all criteria are usually preferred to others that perform exceptionally well on some criteria but inadequately on the rest.
This situation, in which multiple conflicting objectives have to be optimized, has led to the definition of different solution concepts and methodologies. Depending on the problem and the type of solution considered, a specific methodology should be applied.
The concept of efficiency reflects the intuition that, for a solution to be acceptable, no other solution may exist improving it in every objective. Multiple notions of efficiency are available. The notation that this paper follows is that of Ehrgott (2005). A feasible solution x̄ ∈ X is called:
• Efficient or Pareto optimal if there is no x ∈ X such that f_k(x) ≤ f_k(x̄) for all k = 1, . . . , K and f_i(x) < f_i(x̄) for some i ∈ {1, . . . , K}.
• Strictly efficient if there is no x ∈ X, x ≠ x̄, such that f(x) ≤ f(x̄).
Furthermore, the set of efficient solutions is called the efficient set, and the image under f of this set is the nondominated set. It is reasonable to assume that the solution given to any problem must lie in the efficient set.
Uncertainty is another feature of the problems studied here, in which risk-averse decisions are preferred. The most common ways of dealing with uncertainty are stochastic programming and robust optimization, the latter of which also includes fuzzy optimization (Rommelfanger, 2004).
The different approaches for treating uncertainty do not respond to the desires of the modeller; instead, they reflect the nature of the uncertainty. If the uncertainty comes with an underlying known or estimated probability distribution, then stochastic programming is used. If, on the other hand, the uncertainty comes from a lack of precision or semantic uncertainty, then robust optimization is used. Robust optimization does not assume a known (or even existing) distribution (Ben-Tal and Nemirovski, 1999; Chen et al., 2007; Klamroth et al., 2017). A recent review of robust optimization is given in Gabrel et al. (2014). For an introduction to stochastic programming the reader is referred to Birge and Louveaux (2011).
Stochastic programming is the most widely used technique when there are historical data or information from which to infer a probability distribution. Moreover, discrete distributions are usually employed, the different realizations being called scenarios. The concepts of value-at-risk (VaR) and conditional value-at-risk (CVaR) are widely used for quantifying risk (see for instance Yao et al. (2013); Mansini et al. (2015); Liu et al. (2017); Dixit and Tiwari (2019); Fernández et al. (2019)). They are typically defined for loss distributions in finance, where the right tail of the distribution is of interest.
Consider now the following problem, in which multiple objectives to be minimized and uncertainty appear simultaneously:

min_{x ∈ X} (f_1(x, ω), . . . , f_K(x, ω))

The above problem is typically called a multiobjective stochastic programming (MSP) problem, especially if ω, the source of uncertainty, has a known probability distribution.
In this paper we introduce a new solution concept for multiobjective stochastic programming based on risk-aversion preferences. This concept is complemented with a mathematical programming model to compute it efficiently, and computational experiments are performed to assess its strengths.

Structure of paper
The remainder of this paper is organized as follows. Section 2 defines a novel solution concept for MSP problems and studies its properties. Section 2.4 illustrates with a basic example how this solution can be found when the decision space is finite and small.
Section 3 shows how to obtain such a solution with a linear programming model. An application to the multicriteria knapsack problem is developed in Section 4.
2 Multiobjective stochastic programming

2.1 Literature review
Goicoechea (1980) develops the PROTRADE method, in which utility functions are defined to aggregate the objectives into a single-objective stochastic problem. The resulting problem is solved with an interactive method, where the decision-maker defines an expected solution and a feasibility probability. Leclercq (1982) reduces the stochasticity by adding suitable measures to the list of objectives, such as the mean, the variance, or the probability of being above/below a threshold. The resulting deterministic multiobjective problem is then solved by aggregating the objectives, although it could be solved via other techniques. Caballero et al. (2004) compare the stochastic approach with the multiobjective approach under different techniques. The stochastic approach transforms the MSP into a single-objective stochastic problem, while the multiobjective approach first reduces the stochasticity, transforming the MSP into a deterministic multiobjective problem. They highlight that "the multiobjective approach forgets the possible existence of stochastic dependencies between objectives." Aouni et al. (2005) study stochastic goal programming, where the deviation of the objective functions from goals set beforehand at stochastic values is minimized.
In Ben Abdelaziz and Masri (2010) a chance-constrained compromise approach is proposed, with an example presented in Ben Abdelaziz et al. (2007). In Muñoz et al. (2010) the INTEREST method, an interactive reference point method, is proposed. The decision-maker gives reference levels u_i and probabilities β_i, hoping to achieve a solution x* such that P(f_i(x*) ≤ u_i) ≥ β_i. If this is infeasible, the decision-maker should either increase the reference levels or decrease the probabilities of achievement. Ben Abdelaziz (2012) reviews different solution methods for the MSP problem, categorizing them as stochastic approaches or multiobjective approaches. Some fields in which MSP models have been developed are: forest management (Álvarez-Miranda et al., 2018), multiple response optimization (Díaz-García and Bashiri, 2014), energy generation (Teghem et al., 1986; Bath et al., 2004), energy exchange (Gazijahani et al., 2018), and capacity investment (Claro and De Sousa, 2010).

Definitions and dominance relationship
The concept of CVaR makes it possible to aggregate several scenarios by looking only at what happens in the worst ones. The ordered weighted averaging (OWA) operators were defined in Yager (1988), and independently in the field of locational analysis (Carrizosa et al., 1994; Nickel and Puerto, 1999) under the name of ordered median function. These concepts will allow us to aggregate different criteria by looking at the least desirable ones, as a risk-aversion measure. Given a_1, . . . , a_n ∈ R and weights λ_1, . . . , λ_n with λ_i ≥ 0 and Σ_i λ_i = 1, the OWA is Σ_i λ_i a_(i), where a_(1) ≥ a_(2) ≥ . . . ≥ a_(n) is the vector a sorted in decreasing order.
Remark 1. For certain weights, the OWA represents a known quantity:
• If λ_i = 1/n for all i, the resulting OWA is the average of a.
• If λ_1 = 1 and λ_j = 0 for j > 1, the OWA is the maximum of a.
• If λ_n = 1 and λ_j = 0 for j < n, the OWA is the minimum of a.
Yager and Alajlan (2016) later study how to assign weights for an OWA when the criteria have different importances.
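The special cases in Remark 1 can be checked with a small sketch (the function name `owa` and the sample vector are ours, introduced for illustration):

```python
def owa(a, lam):
    """OWA: sort a in decreasing order and take the weighted sum with lam."""
    a_sorted = sorted(a, reverse=True)
    return sum(l * v for l, v in zip(lam, a_sorted))

a = [3.0, 1.0, 4.0, 2.0]
n = len(a)
print(owa(a, [1 / n] * n))   # average: 2.5
print(owa(a, [1, 0, 0, 0]))  # maximum: 4.0
print(owa(a, [0, 0, 0, 1]))  # minimum: 1.0
```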
Definition 4 (OWA with importances, Yager and Alajlan (2016)). Given a_1, . . . , a_n ∈ R with importances u_1, . . . , u_n such that Σ_i u_i = 1, the weights λ_j for the OWA can be calculated from f, the weight generating function, in the following manner:
1. Sort the vector a so that a_(1) ≥ a_(2) ≥ . . . ≥ a_(n), and let u_(j) denote the importance associated with a_(j).
2. Compute the accumulated importances T_j = Σ_{k=1}^{j} u_(k), with T_0 = 0.
3. Obtain the weights as λ_j = f(T_j) − f(T_{j−1}).
Example 1 (of Definition 4). Consider the following weight generating function, for a given r ∈ (0, 1]: f(y) = min(y, r)/r. Let (·) be the order such that a_(1) ≥ · · · ≥ a_(n), let u_(j) be the importance associated with a_(j), and let T_j = Σ_{k=1}^{j} u_(k). We shall now see how the weights are obtained from f. Let j* be such that T_{j*−1} < r ≤ T_{j*}. Then λ_j = u_(j)/r for j < j*, λ_{j*} = (r − T_{j*−1})/r, and λ_j = 0 for j > j*. Consequently, the OWA of a_1, . . . , a_n with importances u_1, . . . , u_n is the average of the worst a_j, weighted by their importances, with total importance adding up to r.

The starting point of this paper is the recurrent idea of representing ordered weighted or ordered median operators by means of k-sums. k-sums (or k-centra in the location analysis literature) are sums of the k largest terms of a vector (Puerto et al., 2017). The use of k-sums to represent ordered median objectives can be traced back at least to Kalcsics et al. (2002). More recent references are, for instance, Blanco et al. (2013, 2014); Ponce et al. (2018) and Filippi et al. (2019). This last reference introduces a normalized version of the k-centrum, named the β-average, which will be used in our paper.
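Under our reading of the weight generating function f(y) = min(y, r)/r, the weight computation of Definition 4 can be sketched as follows (the helper name and the sample importances are ours):

```python
def owa_weights(u_sorted, r):
    """lambda_j = f(T_j) - f(T_{j-1}) with f(y) = min(y, r) / r, where
    u_sorted lists the importances in the order of the sorted a-values."""
    f = lambda y: min(y, r) / r
    lam, t_prev = [], 0.0
    for u in u_sorted:
        t = t_prev + u
        lam.append(f(t) - f(t_prev))
        t_prev = t
    return lam

# importances already sorted by decreasing a-value; r = 0.5
print(owa_weights([0.3, 0.4, 0.3], 0.5))  # approximately [0.6, 0.4, 0.0]
```

The first value enters fully (0.3/0.5 = 0.6), the second only up to the remaining importance (0.2/0.5 = 0.4), and the rest get zero weight, matching the "worst values up to total importance r" description.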
Throughout the remainder of the paper, consider that f^j_k(x) are functions to be minimized within a feasible set X, with k = 1, . . . , K indexing K different objectives with importances w_k, and j = 1, . . . , J indexing J different scenarios with probabilities π_j.
Definition 5 (β-average, Filippi et al. (2019)). Given β ∈ (0, 1], for each criterion k the function g^β_k(x) can be defined, which measures the average of f over the worst scenarios, with accumulated probability equal to β.
Remark 2 (Filippi et al. (2019)). Given a value β, if the sum of the probabilities of the worst scenarios is exactly β, then the β-average is exactly the (1 − β)-CVaR.
Example 2. Consider a point x, a fixed criterion k, and 5 different scenarios with given probabilities π_j and values of f^j_k. Table 1 shows the β-averages for different values of β, with the scenarios ordered from the largest value of f to the smallest.
• For β = 0.2, scenario j = 1 is the only one needed to cover the worst scenarios with probability 0.2, and hence g^β_k(x) equals the value of that scenario.
• Finally, if β = 0.5, scenario 3 needs to be added as well, but only with the probability needed to reach 0.5.
When using the β-average, the functions f^j_k(x) are transformed into g^β_k(x), a collection of K functions not depending on the scenario. An OWA will now be defined, via its weight generating function, that reduces the K β-averages to a scalar function.
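The β-average computation illustrated in Example 2 can be sketched in general form as follows (the function name and the toy numbers are ours, not those of Table 1):

```python
def beta_average(values, probs, beta):
    """Average of the worst (largest) values whose probabilities add up to beta;
    the last scenario included may contribute only part of its probability."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    total, acc = 0.0, 0.0
    for j in order:
        take = min(probs[j], beta - acc)
        total += take * values[j]
        acc += take
        if acc >= beta:
            break
    return total / beta

vals, probs = [10.0, 8.0, 6.0, 4.0, 2.0], [0.1, 0.2, 0.3, 0.2, 0.2]
print(beta_average(vals, probs, 0.1))  # ~10.0: only the worst scenario
print(beta_average(vals, probs, 0.2))  # ~9.0: worst fully, second with prob. 0.1
```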
Definition 6 (r-OWA, O_r(x)). Given x_i ∈ R with importances w_i (i = 1, . . . , K, w_i ≥ 0, Σ_i w_i = 1) and r ∈ (0, 1], the function O_r(x) is defined as the OWA with the weight generating function f(y) = min(y, r)/r. The definition is made in a manner similar to that of the β-average (Definition 5), but in a context with importances rather than probabilities. Example 3 shows the similarities between both approaches.
Example 3. Consider a point x and let g_k(x) be the evaluation of x under 5 different criteria with importances w_k. Table 2 shows the r-OWAs for different values of r, with the criteria ordered from the largest value of g_k(x) to the smallest. Consider the case r = 0.5:
1. As the g_k(x) are already ordered from largest to smallest, the accumulated importances T_k can be computed directly.
2. The weights of the OWA are obtained as λ_k = f(T_k) − f(T_{k−1}).
3. Consequently, the r-OWA is the average of the worst g_k(x), weighted by their importances, with total importance adding up to r.
Remark 4. Given x_1, . . . , x_K and associated importances w_1, . . . , w_K, the λ_k of the r-OWA are determined in such a way that the worst values receive all the weight, up to a total importance of r.
Definition 7 (h^β_r(x)). Given r, β ∈ (0, 1] and x ∈ X, let us introduce the function h^β_r(x) as the r-OWA of the β-averages. That is, h^β_r(x) = O_r(g^β_1(x), . . . , g^β_K(x)).
Remark 5. If the importance of all criteria is the same (w_k = 1/K for all k) and r = n/K with n ∈ {1, . . . , K}, then h^β_r(x) is the average of the n worst β-averages. Recall that this is called the n-centrum (Nickel and Puerto, 2005).
Definition 7 induces a dominance relation ⪯, with x ⪯ y whenever h^β_r(x) ≤ h^β_r(y). This relation has the following properties:
• h^β_r(x) ≤ h^β_r(x), and then x ⪯ x, so ⪯ is reflexive.
• If x ⪯ y and y ⪯ z, then h^β_r(x) ≤ h^β_r(y) ≤ h^β_r(z), which leads to x ⪯ z, and we conclude that ⪯ is transitive.
• If x ⪯ y and y ⪯ x, it cannot be guaranteed that x = y, and hence ⪯ is not antisymmetric.

Idea of solution and dominance properties
Consider the multiobjective stochastic programming problem introduced above. The previously defined concepts of β-average and r-OWA transform the MSP problem first into a deterministic multiobjective problem, and then into a deterministic single-objective problem:
1. The problem starts with functions f^j_k(x) to be minimized, which depend on the scenario j and the criterion k.
2. The problem is transformed into a deterministic one with multiple objectives (MOP) using the β-average concept.
3. Computing the r-OWA, each x ∈ X is assigned a scalar. The problem consists of finding the x which minimizes h^β_r(x).
This solution procedure falls within what is usually called a scalarization approach. When obtaining a minimizer of h^β_r(x), it is also desirable that the optimal solution be efficient for the associated MOP problem.
Proposition 1. Given x*, a minimizer of h^β_r(x), the following statements hold:
1. x* is not necessarily efficient for the MOP problem.
2. x* is weakly efficient for the MOP problem.
3. If x* is the only minimizer of h^β_r(x), then x* is efficient.
4. Given x* not efficient, an alternative y* can be found in a second phase such that y* is efficient and h^β_r(x*) = h^β_r(y*).
These properties are well known for scalarization techniques (Ehrgott, 2005). Hence only an example of the first statement will be shown.
Example 4 (x* is not necessarily efficient). Consider the example displayed in Table 3, in which there are only two feasible solutions, two equiprobable scenarios (π_1 = π_2 = 1/2), three equally important criteria (w_1 = w_2 = w_3 = 1/3), and the values β = 1/2 and r = 2/3 are taken. The β-averages are (0.8, 0.4, 0.65) for the first alternative and (0.8, 0.45, 0.65) for the second. When computing the function h^β_r, both alternatives attain the same objective value, since their two worst β-averages (0.8 and 0.65) coincide; yet the first alternative dominates the second, so a minimizer of h^β_r need not be efficient.
Table 3: Values of the two alternatives for each scenario j and criterion k, together with their β-averages (β = 1/2) and r-OWAs (r = 2/3)
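Example 4 can be checked numerically from the β-averages alone; a minimal sketch (the helper name is ours) computes the r-OWA of the two β-average vectors and finds the same value, 0.725, for both alternatives:

```python
def r_owa(g, w, r):
    """Average of the worst g-values, weighted by importances w totalling r."""
    order = sorted(range(len(g)), key=lambda k: -g[k])
    total, acc = 0.0, 0.0
    for k in order:
        take = min(w[k], r - acc)
        total += take * g[k]
        acc += take
        if acc >= r:
            break
    return total / r

w, r = [1 / 3] * 3, 2 / 3
h1 = r_owa([0.8, 0.40, 0.65], w, r)  # first alternative
h2 = r_owa([0.8, 0.45, 0.65], w, r)  # second alternative
print(h1, h2)  # both ~0.725, yet the first alternative dominates the second
```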

An illustrative example
The proposed solution concept will now be applied, first to a discrete (and small) case. When the solution space is discrete, and all feasible solutions can be explicitly enumerated, the steps are as follows:
Step 0 Normalize all objective functions f^j_k(x).
Step 1 Set values for β, r ∈ (0, 1].
Step 2 For every x ∈ X and every criterion k, define g^β_k(x) as the average of the worst scenarios for criterion k, with probabilities adding up to β.
Step 3 Define h^β_r(x) as the average of the worst g^β_k(x) values, with importances adding up to r.
Step 4 Search for the x ∈ X minimizing h^β_r(x).
Assume a decision space with only four alternatives, evaluated under five different scenarios and six criteria. For each of these alternatives, the values of the functions f^j_k(x) to be minimized can be computed. Table 6 shows the values of f, evaluated at the feasible point x_1, for each of the scenarios and criteria considered.
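Steps 0-4 can be sketched end to end on a toy instance (all numbers below are ours, not those of Table 6, and the values are assumed already normalized):

```python
def worst_avg(vals, weights, cap):
    """Average of the largest vals, with weights totalling cap (partial weight allowed)."""
    order = sorted(range(len(vals)), key=lambda i: -vals[i])
    total, acc = 0.0, 0.0
    for i in order:
        take = min(weights[i], cap - acc)
        total += take * vals[i]
        acc += take
        if acc >= cap:
            break
    return total / cap

# f[x][j][k]: normalized objective values of alternative x in scenario j, criterion k
f = {"x1": [[0.2, 0.9], [0.4, 0.1], [0.6, 0.3]],
     "x2": [[0.5, 0.5], [0.5, 0.4], [0.5, 0.6]]}
pi = [0.2, 0.5, 0.3]   # scenario probabilities
w = [0.5, 0.5]         # criteria importances
beta, r = 0.3, 0.5

def h(x):  # Steps 2-3: beta-averages per criterion, then their r-OWA
    g = [worst_avg([f[x][j][k] for j in range(3)], pi, beta) for k in range(2)]
    return worst_avg(g, w, r)

best = min(f, key=h)   # Step 4
print(best, h("x1"), h("x2"))
```

Here the steadier alternative "x2" wins under the risk-averse aggregation, even though "x1" is better in several individual cells.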
The first step consists of calculating the β-averages. Let us assume a value of β = 0.3:
1. For the first criterion, the worst scenario is j_5, which has probability 0.1. The second worst is j_4, with probability 0.25. As the sum of these probabilities exceeds the fixed β, for computing the β-average only a probability of 0.2 of scenario j_4 is considered.
The last step is calculating the function h^β_r(x), that is, the r-OWA of the β-averages. Table 7 calculates the r-OWA, and also shows the information of the previously calculated β-averages, for the value r = 0.17.

Results
The values of the functions for the other alternatives, as well as their β-averages and r-OWAs, are shown in Tables 13, 14 and 15, starting on Page 27. A summary of the results can be seen in Table 8, where all the β-averages and r-OWAs are shown, determining that the optimal alternative for the given values of β and r is Alternative 1.
Variations of β and r yield very different results. Figure 1a shows which of the four alternatives has the lowest h value, depending on the values of β and r. Figure 1b shows the optimal objective value when varying the parameters β and r. It can be appreciated how h decreases when β and r increase. This is due to the fact that the original f^j_k functions are to be minimized: the larger the parameters β and r are, the more favourable scenarios/criteria take part in the computation of h^β_r(x), hence decreasing its optimal value.

Computing the minimum: continuous case
A solution concept was proposed with Definition 7. When the functions f^j_k(x) to be minimized are given, a new function h^β_r(x) to be minimized is defined, with parameters β and r, such that h^β_r(x) is the r-OWA of the β-averages. If the decision space is sufficiently small, the procedure shown in the above example obtains such a solution.
In this section, a mathematical programming model is developed to obtain the minimum of h^β_r(x), which allows one to obtain the proposed solution for larger decision spaces, including continuous ones.

Mathematical programming model
Given β ∈ (0, 1], let be the ordered scenario such that: The definition ofπ is made in such a way that jπ j = β. In this way, the average of the β worst values can be computed as 1 k (x), which coincides with the definition of β-average (Definition 4). This computation can be written as the following optimization problem:

. , J
A more natural approach would be to consider u j =ũ j β . These u j represent the proportion in which scenario j plays a part on the aggregated β-average. Introducing that change, the model is: The dual formulation is: And hence finding the x ∈ X which minimizes the average of the worst β scenarios for a given k is: min Or alternatively: Which is equivalent to: x ∈ X Remark 6. Models (2) and (3) are equivalent, as for any x ∈ X chosen in (3) the values z and y j will get as small as allowed by constraint (3b), as this improves the objective function (3a). Consequently for every x, its β-average will be computed appropriately, and thus (3) obtains the x ∈ X with smallest β-average, as desired on (2).
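The equivalence between the sorting definition of the β-average and the dual (minimization) formulation can be checked numerically. For fixed x, the dual objective z + (1/β) Σ_j π_j max(0, f^j_k(x) − z) is piecewise linear in z, so its minimum is attained at a breakpoint z = f^j_k(x); the sketch below (names and data ours) exploits this fact instead of calling an LP solver:

```python
def beta_average_sort(values, probs, beta):
    """beta-average via sorting: average of the worst values with total probability beta."""
    order = sorted(range(len(values)), key=lambda j: -values[j])
    total, acc = 0.0, 0.0
    for j in order:
        take = min(probs[j], beta - acc)
        total += take * values[j]
        acc += take
        if acc >= beta:
            break
    return total / beta

def beta_average_dual(values, probs, beta):
    """beta-average via the dual LP: min_z z + (1/beta) * sum_j pi_j * max(0, f_j - z)."""
    def obj(z):
        return z + sum(p * max(0.0, v - z) for v, p in zip(values, probs)) / beta
    return min(obj(z) for z in values)  # the optimum lies at a breakpoint z = f_j

vals, probs = [10.0, 8.0, 6.0, 4.0, 2.0], [0.2] * 5
for beta in (0.2, 0.3, 0.5):
    print(beta_average_sort(vals, probs, beta), beta_average_dual(vals, probs, beta))
```

This is the same z/y reformulation used for CVaR in the literature: z plays the role of a threshold (the VaR at the optimum) and y_j captures the excess of each scenario over it.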
For every k ∈ {1, . . . , K}, thanks to problem (1), the function g^β_k(x) can be defined, measuring for each x ∈ X the β-average for that criterion:

g^β_k(x) = min { z_k + (1/β) Σ_j π_j y_kj : z_k + y_kj ≥ f^j_k(x), y_kj ≥ 0, j = 1, . . . , J }     (4)

The already known approach for single-criterion problems ends here. The next step is finding a "good" solution for all k simultaneously. Given r ∈ (0, 1], the r-OWA of the β-averages will now be computed (in accordance with the definition given in Section 2). That is, the solution of the following problem is sought:

max Σ_k v_k g^β_k(x)
s.t. Σ_k v_k = 1
     0 ≤ v_k ≤ w_k/r,  k = 1, . . . , K

Its dual formulation is:

min z + (1/r) Σ_k w_k v_k
s.t. z + v_k ≥ g^β_k(x),  k = 1, . . . , K
     v_k ≥ 0,  k = 1, . . . , K

Replacing the value of g^β_k(x) given in (4), the next model is obtained:

min z + (1/r) Σ_k w_k v_k     (5a)
s.t. z + v_k ≥ min { z_k + (1/β) Σ_j π_j y_kj : z_k + y_kj ≥ f^j_k(x), y_kj ≥ 0 },  k = 1, . . . , K     (5b)
     v_k ≥ 0,  k = 1, . . . , K     (5)

Model (5) calculates, for a given x ∈ X, the r-OWA of its β-averages, which coincides with the function h^β_r(x) given in Section 2. This problem is not explicit, in that it contains nested optimization problems in the constraints. For that reason, we propose a single-level alternative for fixed x ∈ X.
Consider the following linear programming model:

min z + (1/r) Σ_k w_k v_k     (6a)
s.t. z + v_k ≥ z_k + (1/β) Σ_j π_j y_kj,  k = 1, . . . , K     (6b)
     z_k + y_kj ≥ f^j_k(x),  k = 1, . . . , K, j = 1, . . . , J     (6c)
     y_kj ≥ 0, v_k ≥ 0     (6d)     (6)

Proposition 2. The transformation from model (5) to model (6) is valid, in that their optimal solutions and objective values coincide.
Proof. Let (z*, v*_k, z*_k, y*_kj) be the optimal solution of model (6). Then (z*, v*_k) is feasible for model (5), and it will be shown that it is also optimal for that model. Assume there exists (z′, v′_k) feasible for model (5) with a better objective value. This and constraint (6b) imply that there exists k_0 such that z′ + v′_{k_0} < z*_{k_0} + (1/β) Σ_j π_j y*_{k_0 j}, since otherwise (z′, v′_k, z*_k, y*_kj) would be optimal for model (6). Since z*_{k_0} and y*_{k_0 j} are feasible for model (6), they are also feasible for the model on the RHS of constraint (5b), and thus z′ and v′_{k_0} violate constraint (5b), a contradiction.
Proposition 2 showed that the optimal solutions of models (5) and (6) coincide. Proposition 3 goes further showing the connection between their feasible sets.

Proposition 3. The feasible set of model (5) is a projection of the feasible set of model (6).
Proof.
1. For each feasible solution (z, v_k) of model (5), there is at least one feasible solution of model (6) with the same values (z, v_k), and hence the same objective value.
2. For each feasible solution (z, v_k, z_k, y_kj) of model (6), (z, v_k) is a feasible solution of model (5), again with the same objective value. Indeed, let (z², v²_k, z²_k, y²_kj) be a feasible solution of model (6). Since constraints (6b), (6c) and (6d) hold, (z²_k, y²_kj) is feasible for the model on the RHS of constraint (5b), and therefore its objective value is greater than or equal to the minimum of that model, verifying z² + v²_k ≥ g^β_k(x), and so (z², v²_k) is feasible for model (5).
Finally, after proving the validity of model (6), it is possible to let x ∈ X be free, with the purpose of finding the point minimizing the function h^β_r(x):

min z + (1/r) Σ_k w_k v_k
s.t. z + v_k ≥ z_k + (1/β) Σ_j π_j y_kj,  k = 1, . . . , K
     z_k + y_kj ≥ f^j_k(x),  k = 1, . . . , K, j = 1, . . . , J
     y_kj ≥ 0, v_k ≥ 0
     x ∈ X

Knapsack problem
The multiobjective stochastic knapsack problem is used to illustrate the usefulness of the previously defined concept.
Definition 8 (Multiobjective stochastic knapsack problem). Let I be a collection of objects with weights v_i, which can be selected as members of a knapsack with maximum weight V. There is a set of scenarios J, each with probability π_j, and a set of criteria K, with importances w_k. For every scenario-criterion pair, there is a benefit associated with selecting object i, denoted by b^i_{jk}. Which objects should be taken in order to maximize the benefit?
The above problem differs from the well-known knapsack problem in that there is stochasticity and there are multiple objectives to be maximized.
The following MSP model can be used to analyze the problem. Note that, to preserve the sense of the optimization, rather than maximizing the benefit of the carried objects, the value of the objects not chosen is minimized:

f^j_k(x) = Σ_i b^i_{jk} (1 − x_i),  with X = { x ∈ {0, 1}^{|I|} : Σ_i v_i x_i ≤ V }     (7)

When using the methodology developed in the previous sections, problem (7) is transformed into a mixed-integer linear programming model, (MSP): for given r, β ∈ (0, 1], model (MSP) obtains the x* minimizing the r-OWA of the β-averages. In order to illustrate the benefits of using model (MSP), a naive way of solving problem (7) is also considered: model (MIP), which minimizes the average of the f^j_k, using the importances of the criteria and the probabilities of the scenarios. It is clear that in "average" criteria-scenarios x*_MIP, the optimal solution of model (MIP), outperforms x*_MSP, the optimal solution of model (MSP). Conversely, x*_MSP will improve upon x*_MIP in unfavourable criteria-scenarios, as expected of a risk-averse solution.
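The contrast between the (MSP) and (MIP) solutions can be illustrated by brute force on a toy knapsack (all data below are ours, and the enumeration stands in for the mixed-integer models of the paper):

```python
from itertools import product

V, v = 5, [2, 2, 3, 1]                 # capacity and object weights
b = [[[4, 1], [1, 4]],                 # b[i][j][k]: benefit of object i
     [[2, 3], [3, 2]],                 #   in scenario j under criterion k
     [[5, 1], [1, 5]],
     [[1, 1], [2, 2]]]
pi, w = [0.5, 0.5], [0.5, 0.5]         # scenario probs, criteria importances
beta, r = 0.5, 0.5

def worst_avg(vals, weights, cap):
    """Average of the largest vals with weights totalling cap."""
    order = sorted(range(len(vals)), key=lambda i: -vals[i])
    total, acc = 0.0, 0.0
    for i in order:
        take = min(weights[i], cap - acc)
        total += take * vals[i]
        acc += take
        if acc >= cap:
            break
    return total / cap

def f(x, j, k):  # value of the objects left out, to be minimized
    return sum(b[i][j][k] * (1 - x[i]) for i in range(len(x)))

def h(x):        # risk-averse objective: r-OWA of the beta-averages
    g = [worst_avg([f(x, j, k) for j in range(2)], pi, beta) for k in range(2)]
    return worst_avg(g, w, r)

def avg(x):      # naive objective: probability- and importance-weighted average
    return sum(pi[j] * w[k] * f(x, j, k) for j in range(2) for k in range(2))

feasible = [x for x in product((0, 1), repeat=4)
            if sum(vi * xi for vi, xi in zip(v, x)) <= V]
x_msp = min(feasible, key=h)
x_mip = min(feasible, key=avg)
print(x_msp, x_mip)
```

By optimality of each minimizer, h(x_msp) ≤ h(x_mip) and avg(x_mip) ≤ avg(x_msp); on real instances the gaps between these quantities are what the deteriorating and improvement rates of the computational experiments measure.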

Computational experiments
The following sections show computational experiments, for different values of r and β and different numbers of objects, scenarios and criteria. Algorithm 1 shows how the random instances are created, given a number of objects, scenarios and criteria.
For each solved instance the following are recorded:
• t_MSP, t_MIP: solution times in seconds of models (MSP) and (MIP). From them the time penalty factor t_MSP/t_MIP is calculated.
• To grasp the difference between the MSP and the naive approach, two quantities are calculated, called the deteriorating rate ∆_avg and the improvement rate ∆_tail. They reflect the effect of making decision x*_MSP instead of x*_MIP: large values of ∆_avg indicate high penalties for choosing x*_MSP instead of x*_MIP in average scenarios-criteria, while the larger ∆_tail is, the higher the benefit obtained from choosing x*_MSP in tail events.
Models are solved in GAMS 26.1.0 with the solver IBM ILOG CPLEX 12.8.0.0, on a personal computer with an Intel Core i7 processor and 16 GB of RAM.

Experiment 1
The first experiment consists of a full factorial design, in which the values of |I|, |J|, |K|, r, β fall in the following sets:
• |I| ∈ {50, 100, 200}
• |J| ∈ {5, 25, 100}
• |K| ∈ {3, 6, 9}
• r ∈ {0.33, 0.5, 0.67}
• β ∈ {0.05, 0.1, 0.5}
For each tuple (I, J, K), random data are generated using Algorithm 1, and the resulting instance is solved for every pair (r, β). All criteria and scenarios are given the same importances and probabilities, that is, w_k = 1/|K| and π_j = 1/|J|. The time limit was set to two hours per instance, within which all but three of the 3^5 = 243 configurations were solved to optimality.

Experiment 2
For the next experiment, 100 random instances are created, keeping the values of |I|, |J|, |K|, r, β constant and equal to the median values of the previous experiment, that is, |I| = 100, |J| = 25, |K| = 6, r = 0.5, β = 0.1. All criteria and scenarios are given the same importances and probabilities. All 100 instances were solved to optimality.

Results
Experiment 1. Table 16 shows, for each of the 243 instances, the solution times of the MSP and MIP models, and the deteriorating and improvement rates of using the MSP solution instead of the MIP solution (measured as deviations from the MIP solution). Table 9 shows the correlations of times and rates with the parameters of the instances. It can be seen that the MSP solution has a higher impact when fewer scenarios are considered. In addition, the MSP solution times decrease when β increases, that is, when more scenarios are included in the β-average computation. This observation is confirmed by Table 10, which shows that the median time penalty factor (how many times longer the MSP model takes to solve than the MIP one) is much smaller when β = 0.5 than when β = 0.05. Solution times of the MSP model are alarmingly high for some instances, because the admissible integrality gap was set to zero. If this is relaxed, all of the 243 instances reach an integrality gap smaller than 5% in under 3 seconds, 2% in under 5 seconds, and 1% in under 88 seconds. Table 11 groups the instances by r and β, and shows the deteriorating and improvement rates. It can be seen that the improvement rate (in the tail) is generally higher than the deteriorating rate (in the average), especially in cases with small r and β.
This claim is also supported by Figure 2, where each of the 243 instances is plotted according to its values of ∆_avg and ∆_tail, grouped by the values of (r, β). Almost all of the instances are above the imaginary line ∆_avg = ∆_tail, which shows that the MSP solution improves in the tail more than it loses in average situations. In addition, the largest improvements in the tail occur in instances with β = 0.05 (one of the usual values taken for CVaR), and especially with the smallest values of r. When r and β grow, the differences between the MIP and MSP solutions shrink.

Experiment 2. Table 17 contains the results for each of the 100 instances, all with constant parameters |I| = 100, |J| = 25, |K| = 6, r = 0.5, β = 0.1. Table 12 summarizes the results, where it is again seen that the improvements in the tail outweigh the losses in average situations. Although single instances might take a long computing time, the median MSP solution time (3.74s) is clearly satisfactory. It is worth mentioning that the models were implemented without providing any extra bounds or known cuts that could reduce solution times. Finally, Figure 3 shows the values of f^j_k(x) for one of the created instances, with x = x*_MIP in blue squares and x = x*_MSP in orange circles. It can be appreciated that x*_MIP performs better than x*_MSP in average criteria-scenarios, but x*_MSP is better in unfavourable situations.

Conclusions
In this paper a new solution concept has been proposed for multiobjective stochastic programming problems, focused on risk aversion. As such, this concept can be particularly useful in real-life situations where there is great concern about unfavourable outcomes, such as emergency management.
The solution concept is supported by an efficient way to compute it through a mathematical programming model. This model is linear provided that the underlying problem is linearly representable. Numerical experiments were conducted to validate this approach, solving a multiobjective stochastic knapsack problem.
The research has also shown that the improvements in the tail (unfavourable situations) are consistently higher than the losses in average situations, especially when small values of the parameters β and r are chosen. These differences, although clearly noticeable, are not as large as one could expect. This is possibly due to the randomness of the data. It is reasonable to assume that in actual real-life problems there are choices that are more conservative for every scenario and criterion, and thus preferable under risk-averse attitudes.
Results showed a clear increase in computational time; this is, however, arguably an acceptable price to pay for being risk-averse, and could also be due to the random nature of the data. Nevertheless, it was also shown that allowing even rather small integrality gaps (1%) leads to drastic improvements in computing times.
Table 15: Values of alternative 4 by scenario (j) and criterion (k)