1. Introduction
In recent years, renewable energy sources, particularly wind and photovoltaic (PV) systems, have grown significantly and are increasingly connected to power systems [1,2,3]. However, as the share of wind and PV generation continues to grow, the forecast of renewable energy output becomes increasingly uncertain, which poses great challenges to the secure and cost-effective operation of power systems [4]. Addressing the negative impacts of uncertainties in renewable energy output and load, while limiting the risk of elevated operation costs, is essential for ensuring the secure and cost-effective operation of power systems.
The optimal dispatch of power systems under uncertainty has been studied extensively. Plytaria et al. [5] proposed an energy dispatch model of a microgrid with solar and stationary battery systems, based on the stochastic optimization (SO) method. Zhang et al. [6] developed a multi-stage SO model that offers greater flexibility than the traditional two-stage approach and can better handle the uncertainties associated with renewable energy outputs. Using the quasi-Monte Carlo simulation technique to generate wind farm output scenarios, Chen et al. [7] proposed an SO model for economic dispatch that incorporates the trading of flexible ramping services. Nguyen et al. [8] used a robust optimization (RO) model for the energy management of energy storage systems (ESSs) and renewable energy sources in DC microgrids, together with a convex reformulation to handle the nonlinear, non-convex equations. Chen and Wei [9] proposed a novel two-stage RO model for power systems that reduces operational risks based on decision-related uncertainty sets.
The aforementioned optimal dispatch methods that consider uncertainties fall into two categories: RO and SO methods. The RO method ignores the probability distribution of the random variables, potentially leading to overly conservative decisions [10]. The SO method, by contrast, uses the probability distribution information and minimizes the expected operation cost, but it ignores the extreme risk scenarios in the tails of the probability distributions. These scenarios have a low probability of occurrence, but when they do occur they result in high operating costs for the grid [11]. To mitigate this shortcoming of risk-neutral SO decisions, risk metrics have been introduced into the SO method with the aim of achieving more economic and secure decisions by hedging against probabilistic tail risks in advance.
Daneshvar et al. [12] considered the risk arising from uncertainties during system operation by means of a robust function based on information gap decision theory and obtained risk-averse strategies. Li et al. [13] and Farzan et al. [14] proposed risk-averse SO methods for multi-energy grids based on a scenario approach and introduced the conditional value at risk (CVaR) metric to quantify tail risks within a specific probability distribution, thereby facilitating risk-averse decisions. More recently, Belles-Sampera et al. [15] proposed a new risk metric in finance, named GlueVaR (Glued Value-at-Risk), which admits more flexible parameter combinations than CVaR and can take into account multiple risk needs of decision makers [16]. Furthermore, the stochastic approximate dynamic programming (SADP) algorithm has been widely used to address the low solution efficiency of SO problems; it transforms a complex multi-period model into a series of single-period models that are solved recursively [17]. Lin et al. [18] developed an SO model for an islanded microgrid considering PV ancillary services and solved the model with the SADP algorithm. Lin et al. [19] used an improved SADP algorithm with piecewise linear approximate value functions (AVFs) to solve the SO model of power system economic dispatch considering pumped storage plants. Das et al. [20] proposed an SADP algorithm accelerated by a strategy function and applied it to the SO dispatch problem. Motivated by these developments, this paper focuses on how to design an improved SADP algorithm that can efficiently formulate risk-averse strategies based on the new risk metric.
The major contributions can be summarized as follows:
(1) A risk-averse stochastic optimization model is constructed, which introduces the advanced financial risk measure GlueVaR into power system dispatch. This model employs second-order cone programming formulations to capture complex interdependencies and enables simultaneous optimization across multiple risk horizons with adjustable conservatism parameters, thereby explicitly quantifying and mitigating tail risks often overlooked by conventional methods.
(2) An SADP algorithm based on risk-averse approximate value functions (RAVFs) is proposed to solve the established model efficiently. The training process of the RAVFs employs machine learning principles and integrates the GlueVaR metric, so that risk preferences are encoded directly into operational decisions.
(3) A parallel computing architecture is embedded within the SADP algorithm framework, decomposing the multi-period optimization problem into coordinated single-period sub-problems. This design enables the parallel computation of all probabilistic scenarios during offline training, drastically reducing the training time and ensuring solution efficiency for online risk-aware decision-making.
The rest of this paper is structured as follows. Section 2 introduces the risk-averse SO model of the new power system. Section 3 introduces the improved SADP algorithm. Section 4 presents the case studies in the power system. Section 5 presents the detailed discussion and further study. Section 6 presents the conclusions.
3. Solution Methodology
3.1. Model Reformulation
Define the state of the ESSs at period $t$ as $R_t = \{R_{b,t}\}$, and let $W_t$ and $X_t$ collect the external random variables and the decision variables at period $t$, respectively. Introduce $S_t = (W_t, R_t)$ as the pre-decision state, together with the corresponding post-decision state. According to the risk-averse Bellman recursive equation [30], when the model uses a consistent risk metric to calculate the cost of risk, the multi-period model can be transformed into a recursive sequence of single-period models, given by (22) and (23). These equations involve the value functions of the pre-decision and post-decision states and the feasible domain of the decisions. When the analytic expression of the post-decision value function is known, (22) is a deterministic optimization model. However, the risk-averse cost in (23) depends on distributional information about the random variables that is difficult to obtain accurately in practice, which makes the recursive solution difficult. Thus, this paper proposes a risk-averse SADP algorithm based on the GlueVaR metric to obtain a computationally efficient near-optimal strategy.
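For orientation, a risk-averse Bellman recursion of the kind referred to above is commonly written as shown below. This is a generic sketch in assumed notation: the symbols $V_t$, $V_t^{x}$, $S_t^{x}$, $C_t$, $\mathcal{X}_t$, and the one-step risk measure $\rho$ are placeholders chosen here and may differ from those used in (22) and (23).

```latex
\begin{align}
V_t(S_t) &= \min_{X_t \in \mathcal{X}_t} \Big\{ C_t(S_t, X_t) + V_t^{x}(S_t^{x}) \Big\}, \\
V_t^{x}(S_t^{x}) &= \rho\big[\, V_{t+1}(S_{t+1}) \,\big|\, S_t^{x} \big].
\end{align}
```

Read this way, the first line is a deterministic problem once $V_t^{x}$ is available, while the second line carries the dependence on the distribution of the random variables.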
3.2. GlueVaR-Based Risk Metric
Choosing the right risk metric is critical to accurately measuring the cost of risk. The most popular risk metrics in power systems are VaR and CVaR, which are calculated as (24) and (25), where $\alpha$ is the confidence level; $u$ is the VaR value; $P(Y \le u) \ge \alpha$ means that the probability of $Y \le u$ is at least $\alpha$; and $(Y - u)^{+}$ equals $Y - u$ when $Y \ge u$ and 0 otherwise.
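For reference, the standard textbook forms of the two metrics, which match the symbol descriptions above, can be written as below. The random cost is denoted $Y$; this is the usual minimization form and is offered as a reading aid, not as a verbatim copy of (24) and (25).

```latex
\begin{align}
\mathrm{VaR}_{\alpha}(Y) &= \min \{\, u : P(Y \le u) \ge \alpha \,\}, \\
\mathrm{CVaR}_{\alpha}(Y) &= \min_{u} \Big\{ u + \frac{1}{1-\alpha}\, \mathbb{E}\big[(Y-u)^{+}\big] \Big\}.
\end{align}
```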
The conservatism of VaR and CVaR hinges on the given confidence level, which is often set high to limit the risk costs of system operation. However, decisions at higher confidence levels can degrade the economics of system operation, so decision-makers must balance risk avoidance against economic considerations. GlueVaR is a flexible risk metric that has been increasingly used in finance. It offers adjustable parameters to suit diverse risk needs and includes VaR and CVaR as special cases. Detailed definitions of GlueVaR can be found in [15]. By introducing the distortion function, the GlueVaR risk metric can be expressed as a linear combination of CVaR and VaR, as given in (26).
While VaR and CVaR can only measure risk at a single given confidence level, GlueVaR measures risk at two given confidence levels α and β simultaneously, and by adjusting its parameters it can represent any cost of risk between VaRα and CVaRβ. The GlueVaR metric introduces two key parameters, k1 and k2, which are not merely abstract coefficients but direct levers for specifying operational risk preferences. Their practical interpretation is as follows:
(1) The parameter k1 governs aversion to extreme “disaster” scenarios. It applies a risk weight to the very far tail of the loss distribution (associated with a high-confidence VaR). A high k1 value directs the optimization to prioritize hedging against low-probability, high-impact events (e.g., a compound failure during a once-in-a-decade storm).
(2) The parameter k2 governs aversion to severe “stress” scenarios. It weighs the conditional expectation of losses beyond a specified VaR threshold (aligned with CVaR). A high k2 value focuses the strategy on mitigating more frequent, severe stress periods (e.g., a week of consecutively low renewable generation and high demand).
This structure allows system operators to move beyond a single risk measure. For practical tuning, the following guidelines are proposed:
(1) Set (k1, k2) = (0, 1) to emulate a pure CVaR-like policy focused on “stress” management; set (1, 0) to emulate a pure high-confidence VaR-like policy focused on “disaster” prevention.
(2) Adjust k1 and k2 based on the specific reliability mandates, observed volatility of renewable assets, and the perceived trade-off between preparing for extreme tail events versus managing severe but more common conditions. For instance, a balanced profile like (0.5, 0.5) explicitly tells the algorithm to weigh both concerns equally.
Therefore, the GlueVaR metric provides flexible adjustment of risk preferences, enabling a tailored balance among diverse decision-making needs in practical applications.
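The role of (k1, k2) can be illustrated numerically with a minimal sketch. It assumes the combination GlueVaR = k1·CVaRβ + k2·CVaRα + (1 − k1 − k2)·VaRα with α = 0.80 and β = 0.95, a form chosen here only because it reproduces the corner cases reported in Section 4.4; the exact expression (26) may be parameterized differently. The function names are hypothetical, and the estimators are simple empirical ones over equally likely samples.

```python
import math


def empirical_var(costs, level):
    """Smallest sample value u with P(Y <= u) >= level (empirical VaR)."""
    ordered = sorted(costs)
    # Small tolerance guards against floating-point noise in level * n.
    idx = max(math.ceil(level * len(ordered) - 1e-9) - 1, 0)
    return ordered[idx]


def empirical_cvar(costs, level):
    """CVaR via u + E[(Y - u)^+] / (1 - level), evaluated at u = VaR."""
    u = empirical_var(costs, level)
    mean_excess = sum(max(y - u, 0.0) for y in costs) / len(costs)
    return u + mean_excess / (1.0 - level)


def glue_var(costs, k1, k2, alpha=0.80, beta=0.95):
    """Assumed weighted form: k1*CVaR_beta + k2*CVaR_alpha + (1-k1-k2)*VaR_alpha."""
    return (k1 * empirical_cvar(costs, beta)
            + k2 * empirical_cvar(costs, alpha)
            + (1.0 - k1 - k2) * empirical_var(costs, alpha))


if __name__ == "__main__":
    sample_costs = [float(i) for i in range(1, 101)]  # toy cost samples
    for weights in [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]:
        print(weights, glue_var(sample_costs, *weights))
```

Under this assumed form, moving weight from k2 to k1 shifts the metric from the average of the worst 20% of costs toward the average of the worst 5%, which is the sense in which the parameters trade off "stress" against "disaster" aversion.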
3.3. Steps of Improved SADP Algorithm Based on GlueVaR
The core of the proposed algorithm lies in designing and training a set of explicit, parameterized risk-averse AVFs to encapsulate economic risks and overcome the computational “curse of dimensionality”. To preserve the physical interpretability of the power system dispatch strategy, piecewise linear functions are employed to parameterize the value functions. This formulation not only effectively captures the marginal effects of the value functions as the system state changes but also ensures that its parameters directly correspond to the risk-adjusted marginal value of dispatchable resources under specific system conditions.
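To make the piecewise linear parameterization concrete, the sketch below evaluates a value function from a list of segment slopes. It assumes equal segment widths over the storage range and a value of zero at the lower bound; these choices, and the names `avf_value`, `is_convex`, `slopes`, `r_min`, and `r_max`, are illustrative assumptions rather than the paper's exact construction.

```python
def avf_value(r, slopes, r_min, r_max):
    """Evaluate a piecewise linear function of the stored energy r.

    The range [r_min, r_max] is split into len(slopes) equal segments,
    slopes[i] is the marginal value on segment i, and the function is
    zero at r_min.
    """
    if not r_min <= r <= r_max:
        raise ValueError("r lies outside the storage range")
    width = (r_max - r_min) / len(slopes)
    value = 0.0
    remaining = r - r_min
    for slope in slopes:
        step = min(remaining, width)  # portion of this segment that is covered
        value += slope * step
        remaining -= step
        if remaining <= 0.0:
            break
    return value


def is_convex(slopes):
    """A piecewise linear function is convex when its slopes are nondecreasing."""
    return all(a <= b for a, b in zip(slopes, slopes[1:]))


if __name__ == "__main__":
    # Three segments over 100-1000 kWh; negative slopes mean stored energy lowers future cost.
    print(avf_value(550.0, [-3.0, -2.0, -1.0], 100.0, 1000.0))
```

Each slope can be read as the marginal future cost of one additional unit of stored energy within its segment, which is what keeps the trained parameters interpretable.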
To determine the slopes of the piecewise linear functions, a large number of samples must be generated for the offline training of the AVFs. The proposed algorithm then calculates the risk-averse cost of each sample batch using GlueVaR and updates the slopes with the risk information of that batch. Once the AVFs have converged, they can be applied online together with the operating state of the power system. The proposed algorithm consists of the following steps:
3.3.1. Initialization
The Monte Carlo method is used to generate a large number of forecast-error samples of the random variables as the training sample set Φ, which is then randomly divided into M equal batches Φm. Set the initial slopes v = 0 and the batch index m = 1.
3.3.2. Training Approximation Value Functions
(i) For each error scenario φ in the batch Φm, solve the corresponding problem (22) with the piecewise linear AVFs to obtain the optimal decision Xφ,t and the objective value Cφ,t, and update the state accordingly. To improve computational efficiency, the error scenarios within the same batch can be solved in parallel.
(ii) So that the AVFs capture the decision-maker’s risk appetite, the training objective is not to minimize the traditional expected cost but to minimize the risk-adjusted future cost distribution measured by GlueVaR. Thus, the risk metric cost is calculated according to (27). Combining (24)–(26), this risk metric can be obtained by solving a linear optimization problem with four auxiliary variables. Its objective function indicates that the direction of parameter updates is predominantly determined by the most adverse (i.e., high-cost) scenarios in the batch, with the degree of adversity calibrated by the GlueVaR parameters (k1, k2). In this way, the risk measure is encoded directly into the parameters of the AVFs, rather than being used to evaluate the strategy afterward, as in deep reinforcement learning methods.
(iii) Apply a small perturbation to each post-decision state variable at period t − 1 (t = 2, 3, …, T) and calculate the post-perturbation risk metric cost in the same way as in steps (i) and (ii).
(iv) Obtain the observed slope of the post-decision state at period t − 1 (t = 2, 3, …, T) from the risk metric cost calculated after perturbing the b-th state variable at period t in the m-th batch and from the size of the given state perturbation. The provisional slope of the AVFs is then updated using a slope update step size. After the provisional slope is obtained, the successive projection approximation algorithm [31] is used to perform a projection that maintains the convexity of the AVFs and yields the updated slope.
(v) Let m = m + 1. If m ≤ M, return to step (i); otherwise, the training process ends and the trained AVFs are output.
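Steps (iii) and (iv) can be sketched as follows for a single storage unit. The sketch assumes a finite-difference observed slope, a convex-combination smoothing rule with step size `step`, and a simple leveling projection that restores nondecreasing slopes; this is one common variant of successive projective approximation, and the exact rule in [31] may differ. All function and variable names are hypothetical.

```python
def observed_slope(risk_base, risk_perturbed, delta):
    """Finite-difference estimate of the marginal risk-adjusted cost.

    risk_base and risk_perturbed are the batch risk metric costs before and
    after perturbing the post-decision state by delta.
    """
    return (risk_perturbed - risk_base) / delta


def update_slopes(slopes, seg, v_obs, step):
    """Smooth the slope of segment `seg` toward v_obs, then restore convexity."""
    new = list(slopes)
    # Provisional slope: convex combination of the old slope and the observation.
    new[seg] = (1.0 - step) * slopes[seg] + step * v_obs
    # Leveling projection: slopes must be nondecreasing from left to right.
    for i in range(seg):                 # segments to the left may not exceed it
        if new[i] > new[seg]:
            new[i] = new[seg]
    for i in range(seg + 1, len(new)):   # segments to the right may not fall below it
        if new[i] < new[seg]:
            new[i] = new[seg]
    return new


if __name__ == "__main__":
    v_hat = observed_slope(risk_base=1000.0, risk_perturbed=990.0, delta=10.0)
    print(update_slopes([-4.0, -3.0, -2.0, -1.0], seg=1, v_obs=v_hat, step=0.5))
```

A smaller `step` makes the AVFs change more slowly from batch to batch, which trades speed of learning for stability of the trained slopes.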
3.3.3. Online Applications of Approximate Value Functions
Based on the trained AVFs and the real-time operating state of the grid, the established model (22) is solved for each period t = 1, 2, …, T to obtain the online optimal dispatch decision, and the decision commands are sent to each unit of the grid.
4. Case Studies
4.1. System Parameters
The case studies are conducted on the IEEE 33-bus system, whose topology is depicted in Figure 1. The numbers of GTs, ESSs, PV plants, and WTs in the system are 4, 2, 9, and 4, respectively. Take Rb,min = 100 kWh, Rb,max = 1000 kWh,
=
= 240 kW. The forecast data of load, PV, and WT are detailed in
Figure 2. The time-of-use electricity prices are listed in Table 1. Incentive-type demand response loads are connected at buses 8 and 24, with their loads and maximum reductions shown in Figure 3, and price-based demand response loads are connected at buses 18 and 31, with their loads and maximum reductions shown in Figure 4. The forecast errors of active load and renewable generation are modeled with normal distributions, specifically N(0, 0.03²) for load and N(0, 0.15²) for wind/PV output, with output errors truncated at ±10% of capacity. This modeling choice is a common simplification for capturing forecast errors in SO studies [19], focusing on the mean and variance as the primary moments of uncertainty. Then, 5000 error scenarios are drawn as the training sample set using the Monte Carlo method and randomly divided into 100 equal batches of 50 error scenarios each. All simulations are run in MATLAB R2019a, and each problem is solved with the CPLEX solver.
4.2. Analysis of Training Results of Improved SADP Algorithm
To assess the convergence and learning performance of the improved SADP algorithm incorporating the GlueVaR risk metric, the offline optimal solution, which minimizes the total cost under complete knowledge of the uncertainties, is used as a reference benchmark. The average operation costs of the improved risk-averse SADP algorithm, the risk-neutral SADP algorithm, and the offline optimal solution for each batch of samples during training are compared in Figure 5. The VaR values at the 95% confidence level of each algorithm for each batch of samples are compared in Figure 6. The improved SADP algorithm demonstrates good convergence during training. Initially, convergence is hampered by inaccurate AVFs; however, performance improves as more samples refine the model. After about 30 iterations, the system’s operation cost decreases and stabilizes. The efficiency of the improved SADP algorithm stems from exploiting the convexity of the value function to find the optimal strategy quickly without exhaustively searching the state space. Compared with the risk-neutral SADP algorithm, the risk-averse SADP algorithm shows higher average costs but offers more stability and lower 95%-VaR risk costs in extreme scenarios, suggesting that the risk-averse decisions of the proposed algorithm lead to more consistent performance under uncertainty.
4.3. Analysis of the Algorithm’s Solution Results in the Prediction Scenario
Results are presented for a representative prediction scenario. The statistical performance of the proposed policy, including variance analysis across multiple uncertainty realizations, is provided in Section 4.4 (Table 2). The trained AVFs are applied to decision-making under the predicted scenario of the random variables, as illustrated in Figure 7, Figure 8 and Figure 9. Figure 7 shows the output of each type of unit in each period under the predicted scenario, and Figure 8 and Figure 9 show the residual power of the ESSs and the load demand response, respectively. The GTs, as traditional energy sources, remain almost unchanged and act as the base supply. In periods 1 to 13, the WTs and PV plants provide a large proportion of the system power, and the excess power is used to recharge the ESSs, so their residual power increases. Meanwhile, the system maintains its power balance without purchasing electricity from the external system. The load reductions at the demand response buses are negative, as shown in Figure 9, indicating that system power is abundant and part of the load can be increased.
During periods 14–24, the output of the PV plants gradually decreases because of insufficient light intensity. The decision maker begins to buy electricity from external systems, the ESSs gradually discharge, and the demand response buses begin to reduce load so that the system power balance is met. The whole decision process reflects the flexibility of source-grid-load-storage coordination, which adapts the decisions to the characteristics of renewable energy output.
4.4. Comparison of Test Results Under Different Risk Preferences
The proposed risk-averse SO model uses GlueVaR as its risk metric, so the decision at given confidence levels hinges on the parameters (k1, k2). Ref. [15] establishes that GlueVaR is a coherent risk metric, with essential properties such as sub-additivity and positive homogeneity, within a specific range of parameter values. By adjusting (k1, k2) within this range and training the AVFs for each parameter setting, the performance of the AVFs is tested on an independently sampled set of 1000 groups of scenarios. The results, including the 95%-VaR and the average cost, are depicted in Figure 10. As (k1, k2) approaches (1, 0), the degree of risk aversion increases: the average cost on the test set becomes larger, while the 95%-VaR cost becomes lower. As (k1, k2) moves farther from (1, 0), the degree of risk aversion decreases and the decision becomes more economical, i.e., the average cost on the test set is lower but the extreme cost is higher. Selecting more conservative values for the risk parameters, such as (1, 0), therefore mitigates the risk costs associated with extreme scenarios and, to some extent, narrows the fluctuation range of operation costs, although it increases the average cost.
The statistical comparison of test results under different risk preferences is presented in Table 2. Based on 50 independent Monte Carlo runs, the results are expressed as “mean ± standard deviation”, demonstrating the statistical stability of the algorithm’s performance. The results for CVaR0.80 are nearly identical to those of the GlueVaR metric with parameters (k1, k2) = (0, 1), while the results for CVaR0.95 are very close to those of GlueVaR with (k1, k2) = (1, 0). This correspondence occurs because, with these parameter sets, the GlueVaR measure simplifies to CVaR0.80 and CVaR0.95, respectively, as shown in (26).
More importantly, the parameters (k1, k2) serve as practical levers for tuning risk attitudes beyond these fixed corner cases. As outlined in Section 3.2, k1 primarily governs aversion to extreme, low-probability “disaster” scenarios, whereas k2 controls aversion to more frequent, severe “stress” scenarios. The performance spectrum in Table 2 directly reflects this operational interpretation. For instance, the configuration (0.5, 0.5), which explicitly weighs disaster mitigation and stress management equally, achieves a balanced trade-off: it yields a stable reduction of 8.1% (with a standard deviation of ±0.8%) in extreme risk (95%-VaR) at the cost of an 8.1% increase in the average total cost. In contrast, the (0.0, 1.0) configuration provides less extreme risk reduction for a lower cost penalty, while the (1.0, 0.0) configuration achieves the highest risk reduction at the highest economic cost. This parametric flexibility allows the GlueVaR metric to adapt to specific, practical decision-making needs, demonstrating a clear advantage over single risk measures with fixed confidence levels.
4.5. Comparison of Test Results Under Different Methods
To demonstrate the advantages of the proposed method, a comparison with a wider range of state-of-the-art risk-averse and AI-based methods on the modified IEEE 33-bus system is shown in Table 3. The five representative methods are as follows:
(1) Risk-neutral stochastic optimal power flow (S-OPF) [5]: serves as the economic performance lower bound.
(2) Information-gap decision theory (IGDT) based robust method [12]: the non-probabilistic, extreme uncertainty-averse method.
(3) CVaR based S-OPF [13]: the standard single-metric probabilistic risk-averse method.
(4) Robust OPF (R-OPF) [8]: the conservative worst-case optimization method.
(5) Deep Reinforcement Learning (DRL) [32]: the data-driven AI baseline method.
The proposed GlueVaR-SADP method achieves a favorable balance between risk mitigation, economic efficiency, and computational tractability. In terms of risk management, it achieves a 7.5% reduction in 95% VaR relative to the risk-neutral benchmark, effectively curbing extreme event costs. While its average operational cost is moderately higher than the risk-neutral lower bound, it remains more economical than all other risk-averse benchmarks, including IGDT, CVaR-SOPF, and R-OPF, highlighting its efficient trade-off. In terms of computational performance, the proposed method requires less than 8500 s for offline training, which is nearly an order of magnitude faster than the DRL baseline (exceeding 65,000 s). This efficiency stems from the parallelizable architecture of the SADP framework. During online application, it delivers a solution in approximately 160 s, striking a favorable balance between the prolonged solving times of scenario-based methods (including S-OPF and CVaR-SOPF) and the simpler but more conservative robust approach. Moreover, unlike the black-box DRL policy, the proposed method provides explicit and interpretable risk quantification, a crucial feature for operational acceptance.
4.6. Test Results in the Modified IEEE 69-Bus System
To thoroughly evaluate the scalability and generalization capability of the proposed method, comprehensive tests are conducted on a larger and more complex modified IEEE 69-bus distribution system. This system integrates a higher penetration of renewable resources and distributed storage, and its topology is depicted in Figure 11. Under the same probabilistic scenario set, the proposed GlueVaR-SADP method is systematically compared with the five benchmark methods described in Section 4.5.
As shown in Table 4, the proposed method maintains its advantages in the 69-bus system. In terms of risk control, its 95% VaR is significantly lower than those of the risk-neutral, IGDT, and robust optimization benchmarks and is comparable to that of the CVaR method, while offering more flexible risk preference tuning. Economically, its average cost remains lower than that of every benchmark except the risk-neutral one, demonstrating a good balance between risk mitigation and economic efficiency.
The comparison of computational performance is particularly notable. The offline training time of the proposed method is far less than that of the DRL method. Its online solution time achieves the best trade-off between accuracy and speed: it is an order of magnitude faster than scenario-based stochastic optimization methods (S-OPF, CVaR-SOPF) while avoiding the excessive conservatism of robust optimization (R-OPF). Compared with the results from the 33-bus system, the growth in online computation time for the proposed method is much lower than the increase in problem complexity as the system scale increases, which supports the scalability of its parallel computing architecture.
5. Discussion
This section provides an in-depth interpretation of the core findings, mechanisms, positioning, and significance of the proposed risk-averse SADP algorithm. To ensure a clear presentation, the discussion is organized into the following subsections.
5.1. Key Findings and Core Validation
The empirical evidence from our case studies reveals that the risk-averse SADP algorithm can greatly reduce the operation risk costs under extreme uncertainty scenarios while maintaining computational tractability for real-time grid operations. This achievement directly validates our hypothesis that integrating GlueVaR risk metrics into value function approximation enables more robust decision-making without sacrificing the economic efficiency that system operators require.
The mechanism underlying this improvement warrants careful examination. Traditional stochastic optimization approaches in power systems have struggled with what we might call the “uncertainty paradox”—the need to account for rare but catastrophic events without becoming paralyzed by over-conservative strategies. Our algorithm addresses this through its dual-layer risk assessment architecture. During offline training, the parallel computation framework processes multiple uncertainty scenarios, but rather than treating each scenario equally, the GlueVaR metric assigns differential weights based on tail risk characteristics. This selective attention to extreme events, combined with the value function’s ability to encode risk preferences directly into state transition mappings, creates a decision framework that remains vigilant against high-impact events without sacrificing day-to-day operational efficiency.
5.2. Multi-Scenario Performance Analysis and Scalability
Power system operators face increasingly complex decision environments where renewable penetration levels exceed 40% in many regions. Our framework’s ability to coordinate source-grid-load-storage resources under such conditions represents a fundamental shift in operational philosophy. Rather than treating uncertainty as a constraint to be minimized, the algorithm leverages uncertainty information to create adaptive strategies that exploit favorable conditions while protecting against adverse scenarios. This paradigm shift becomes particularly relevant when considering the evolving regulatory landscape, where grid operators face both reliability mandates and renewable integration targets that often seem contradictory.
To comprehensively evaluate the broader applicability of our approach, we conducted extended sensitivity analyses across diverse operational contexts, comparing algorithm performance under varying grid configurations and uncertainty levels. The data in Table 5 reveal several relationships that merit deeper investigation. Most strikingly, the relationship between renewable penetration level and algorithm performance is non-linear, which challenges conventional assumptions about renewable integration. Systems with higher renewable penetration, particularly the island system scenario with 80% renewables, show the greatest improvements in both average cost reduction and tail risk mitigation. This counterintuitive result stems from the algorithm’s handling of uncertainty: as renewable variability increases, the value of intelligent coordination between storage, demand response, and conventional generation becomes more pronounced, creating larger optimization spaces in which the risk-aware approach can identify superior solutions.
The urban microgrid scenario presents particularly interesting dynamics. Despite moderate renewable penetration at 45%, it achieves convergence faster than any other configuration while maintaining strong performance metrics. This efficiency stems from the dense interconnection topology typical of urban grids, where multiple pathways for power flow create redundancy that the algorithm exploits through its network-aware optimization structure. The algorithm recognizes these topological advantages during the training phase, encoding them into value functions that prioritize flexible routing strategies during high-uncertainty periods.
To statistically validate the algorithm’s performance, a Monte Carlo analysis comprising 50 independent runs is conducted for the high-renewable penetration scenario in Table 6. The proposed GlueVaR-SADP algorithm achieves a mean reduction of 31.2% (with a standard deviation of ±1.7%) in extreme-event costs (95%-VaR) compared to the risk-neutral benchmark. Concurrently, it maintains the average operational cost within 7.5% (mean, ±0.9% standard deviation) of the risk-neutral solution.
5.3. Limitations and Future Research Directions
This section comprehensively discusses the limitations of the current study and, based on these limitations as well as the research findings, proposes clear directions for future work.
5.3.1. Limitations of the Current Study
Although the proposed framework has achieved positive results, several limitations constrain its generalizability:
(1) Firstly, the assumption of normally distributed forecast errors, adopted for computational convenience, may not accurately capture the heavy-tailed distribution characteristics often present in extreme weather events. Secondly, the current model assumes perfect compliance from demand response resources, which does not align with real-world phenomena such as user behavior degradation and fatigue effects. Furthermore, the treatment of communication delays and measurement errors as negligible may not hold in deployment scenarios with limited infrastructure.
(2) The current framework is primarily designed for operational optimization at the distribution network and microgrid level. It does not consider N − 1 contingency constraints or cascading failure scenarios, which are crucial for transmission system security analysis. Additionally, the model does not incorporate large-scale electric vehicle (EV) fleets as spatiotemporally coupled mobile storage resources, representing a gap in modeling storage flexibility.
5.3.2. Future Research Directions
Addressing the above limitations and building upon the technical pathway established in this research, future work will focus on the following extensions:
(1) A primary direction is the integration of electric vehicle (EV) fleets. This involves designing novel state representation methods to characterize their spatiotemporal uncertainty and probabilistic flexibility, thereby enhancing system economics and demand response capabilities. Concurrently, there is a need to develop refined demand response models capable of capturing dynamic user behaviors to improve the practicality of generated strategies.
(2) To address information uncertainty, developing robust versions of the algorithm is necessary to maintain performance under higher-order uncertain information. Furthermore, the intrinsic relationship between algorithm solving efficiency and decision quality should be systematically investigated, striving to improve computational performance while ensuring the interpretability of the physical models.
(3) The current framework, focused on short-term operations, should be extended to long-term investment decision problems such as generation expansion planning, investigating methods for embedding risk preferences under long-term uncertainty. Exploring the potential for incorporating emerging technologies like hydrogen energy storage and peer-to-peer energy trading into the modeling framework is also promising.
6. Conclusions
This study proposes a risk-averse SO model for new power systems and develops an improved SADP algorithm to solve it efficiently. The method incorporates the GlueVaR risk metric into the training of risk-aware approximate value functions. Compared with the risk-neutral SADP algorithm, the improved algorithm increases the total average cost under uncertainty, but it significantly mitigates the costs associated with extreme events and narrows the fluctuation range of operation costs. In addition, the GlueVaR metric is more flexible than the CVaR metric for decision-making: it not only covers CVaR-based decisions at two different confidence levels but also allows the risk preference to be adjusted between the two through appropriate parameter values, making it more suitable for complex and variable practical applications. Moreover, numerical results indicate a strong relationship between conservatism parameters and reserve capacity allocation, offering practical support for risk-adaptive decision-making.
Furthermore, a key advantage of the proposed algorithm is its computational performance and scalability. Comprehensive tests on both IEEE 33-bus and larger 69-bus systems demonstrate that the algorithm enables fast online decision-making suitable for real-time operations. Its offline training is more efficient and stable than data-driven deep reinforcement learning approaches. Crucially, the parallel computing architecture ensures excellent scalability, as the increase in online solution time remains lower than the growth in problem complexity when moving to larger systems. Therefore, this work provides not only a flexible risk-management paradigm but also a computationally efficient and scalable decision-support tool, offering a practical solution for real-time, risk-aware dispatch in power systems with high renewable penetration.