Physics-Informed Smart Grid Dispatch Under Renewable Uncertainty: Dynamic Graph Learning, Privacy-Aware Multi-Agent Reinforcement Learning, and Causal Intervention Analysis

Liu, Yue; Cheng, Qinglin; Li, Yuchun; Yang, Jinwei; Zhao, Shaosong; Huang, Zhengsong

doi:10.3390/pr14081274


by Yue Liu ¹, Qinglin Cheng ¹,*, Yuchun Li ², Jinwei Yang ³, Shaosong Zhao ¹ and Zhengsong Huang ¹

¹ Key Laboratory of Ministry of Education for Enhancing Oil and Gas Recovery Ratio, Northeast Petroleum University, Daqing 163318, China

² Daqing Oilfield Design Institute Co., Ltd., Daqing 163712, China

³ Sino-Pipeline International Co., Ltd., Beijing 100029, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(8), 1274; https://doi.org/10.3390/pr14081274

Submission received: 14 March 2026 / Revised: 8 April 2026 / Accepted: 14 April 2026 / Published: 16 April 2026

(This article belongs to the Section Energy Systems)


Abstract

High-penetration renewable energy significantly increases uncertainty, dynamic network coupling, and the need for secure and coordinated smart-grid dispatch. To address the limitations of conventional forecasting-based and static graph-based methods, this paper proposes a unified dispatch framework that integrates topology-informed dynamic graph learning, privacy-aware multi-agent symbiotic reinforcement learning, and structural causal intervention analysis. The dispatch problem is formulated as a constrained partially observable stochastic game, in which multiple agents coordinate generation adjustment, reserve allocation, and congestion-aware corrective actions under engineering constraints. A physics-informed dynamic graph convolutional module captures both fixed physical topology and stress-dependent operational couplings, while a KL-regularized multi-agent reinforcement learning scheme improves cooperative task allocation under renewable fluctuations. Federated optimization with Rényi differential privacy is introduced to protect sensitive local operational information during training. In addition, a structural causal module provides intervention-based interpretation of how wind variation, load escalation, and line stress affect dispatch cost, congestion risk, and renewable curtailment. Experiments on a public-trace-driven benchmark based on a modified IEEE 30-bus system show that the proposed method achieves the best overall performance among the compared baselines, reducing dispatch-cost RMSE to 3.82, locational-price MAE to 2.95, renewable curtailment to 4.8%, and the constraint-violation rate to 0.30%. Overall, the framework shows favorable performance on the test benchmark, provides post hoc intervention-based interpretation of dispatch outcomes, and is evaluated under a reproducible benchmark construction and assessment protocol.

Keywords: smart grid dispatch; renewable uncertainty; dynamic graph learning; multi-agent reinforcement learning; differential privacy; federated learning; causal intervention analysis

1. Introduction

The rapid integration of renewable generation, distributed energy resources, flexible demand, and pervasive sensing has fundamentally changed the operational paradigm of modern power systems. In high-renewable smart grids, dispatch is no longer a static scheduling task under mild uncertainty. Instead, it has become a sequential decision problem in which multiple control entities must react to volatile renewable output, changing net load, congestion propagation, reserve shortages, and incomplete system visibility under physical and operational constraints [1,2]. Under these conditions, both state representation and control coordination become strongly time dependent.

Recent studies have advanced uncertainty-aware dispatch and security-constrained scheduling from different perspectives. Optimization-based methods, including stochastic and uncertainty-aware dispatch, remain important because they preserve engineering interpretability and explicit feasibility constraints [3]. In parallel, reinforcement learning has shown promise for adaptive dispatch and rolling corrective control under renewable variability [4,5]. However, these methods often focus on either optimization or sequential control, while the interaction between topology-aware representation, multi-agent coordination, and corrective feasibility is still insufficiently addressed.

Graph learning provides another important direction because power grids are naturally structured systems. Recent graph neural network studies have improved state estimation, operational risk assessment, and topology-aware system analysis by capturing spatial dependence more effectively than flat feature encodings [6,7]. Nevertheless, the dispatch setting is more demanding than representation-only tasks. Although the physical network topology is relatively stable, the effective operational coupling among buses and controllable resources changes over time with congestion, reserve activation, and renewable ramps. Existing studies rarely examine whether learned couplings remain physically meaningful and security-relevant when they are embedded in a closed-loop dispatch policy [8].

Multi-agent reinforcement learning (MARL) is also attractive for smart-grid dispatch because different controllable entities operate with partial observations and heterogeneous responsibilities. Recent graph-based multi-agent learning and adjacent-domain resource-allocation studies have shown that adaptive coordination can improve distributed decision making under uncertainty [9,10,11]. However, smart-grid dispatch differs from generic resource-allocation problems because it must simultaneously satisfy power balance, ramping limits, reserve adequacy, and transmission-security constraints. As a result, coordination should not only improve reward, but also remain interpretable in terms of stressed corridors, balancing responsibility, and corrective reserve deployment.

A further challenge arises from data governance and operator trust. In practical dispatch environments, local operational data are often commercially sensitive or infrastructure-sensitive, which limits direct information sharing across agents or regions. Federated learning and differential privacy offer a natural way to enable collaborative training without exposing raw data [12,13]. At the same time, dispatch support also requires more than predictive accuracy: operators need to understand how changes in wind support, load escalation, and line stress affect cost, curtailment, and congestion. Yet most existing interpretability analyses remain correlational and are not explicitly linked to intervention-oriented operational questions [14].

Against this background, three research gaps remain. First, existing GNN- and RL-based dispatch studies usually improve either representation or control, but rarely validate whether learned couplings remain physically meaningful under stressed dispatch conditions. Second, existing MARL dispatch studies often omit privacy-preserving collaborative training, or they introduce privacy without quantifying the utility-security trade-off under an explicit threat model. Third, most interpretability analyses remain descriptive and are not connected to intervention-based operator questions. These gaps limit the practical value of current methods because renewable-rich smart-grid dispatch inherently involves a closed loop of state representation, coordinated control, privacy-constrained training, and operator-facing explanation.

To address these issues, this paper proposes a unified framework for uncertainty-aware smart-grid dispatch that integrates topology-anchored dynamic graph learning, privacy-aware multi-agent symbiotic reinforcement learning, and structural causal intervention analysis. The purpose of this integration is not to combine multiple techniques for their own sake, but to organize them around the actual logic of renewable-rich dispatch. The graph module captures fixed physical topology together with stress-dependent residual coupling. The MARL module learns coordinated corrective policies under partial observability and heterogeneous agent roles. Federated optimization with Rényi differential privacy supports collaborative training while limiting local information leakage. The causal module provides post hoc intervention-based interpretation of how wind variation, load growth, and line stress influence dispatch outcomes.

The main contributions of this study are summarized as follows:

(1) The adaptive graph is decomposed into a fixed physical-topology component and a learned residual component, enabling the model to capture time-varying operational coupling while preserving topological grounding.

(2) The dispatch task is formulated as a constrained partially observable stochastic game, and a task-masked coordination regularization strategy is adopted to improve collaboration among tightly coupled agents.

(3) Rather than claiming novelty in the privacy mechanism itself, this study focuses on integrating Rényi differential privacy into collaborative dispatch learning and quantifying the resulting privacy-utility trade-off.

(4) The proposed framework estimates intervention effects associated with wind support, load escalation, and line stress, thereby linking learned dispatch behavior to operationally meaningful questions.

(5) The empirical study is conducted on a public-trace-grounded benchmark based on a modified IEEE 30-bus system, with evaluation covering graph behavior, coordination performance, privacy diagnostics, causal robustness, and comparison with strong learning-based and optimization-based baselines.

2. Methodology

Figure 1 summarizes the overall framework. Multimodal operational data are first encoded by a topology-informed dynamic graph convolutional network (DGCN) and a temporal encoder. The resulting latent representation is then consumed by the multi-agent symbiotic reinforcement learning (MASRL) module for dispatch optimization. During training, clients participate in federated optimization under a Rényi differential privacy (RDP) guarantee, and the final policy is analyzed using a structural causal model.

2.1. Problem Formulation as a Constrained Partially Observable Stochastic Game

Consider a smart grid with $N$ controllable entities and a scheduling horizon of $T$ intervals. Agent $i$ may represent a thermal unit, gas unit, wind-farm aggregator, storage device, or regional load-serving controller. Because each agent observes only local measurements and delayed neighborhood information, the dispatch problem is naturally modeled as a constrained partially observable stochastic game [15]. Under centralized training, the same system can be viewed as a constrained partially observable Markov decision process with decentralized execution.

Let $s(t)$ denote the global state at interval $t$. The state includes bus-level injections, line-loading ratios, reserve margin, locational prices, meteorological variables, renewable forecasts, realized renewable outputs, unit operating points, and topology descriptors. Agent $i$ receives a local observation $o_{i}(t)$ that is a partial function of the state. The joint observation is $o(t) = \{o_{1}(t), \dots, o_{N}(t)\}$, and the joint action is $a(t) = \{a_{1}(t), \dots, a_{N}(t)\}$.

The action of each agent depends on its physical role. For dispatchable generation units, actions include active-power adjustment, reserve participation, and upward or downward ramp decisions. For storage units, actions include charging, discharging, and reserve allocation. For load-serving entities, actions may include demand response activation, flexible load shifting, and interruption scheduling. These actions are coupled through network and market constraints.

The global state vector stacks 30 nodal net injections, 41 branch-loading ratios, six thermal-unit outputs, three wind realizations, three wind forecasts, two system reserve indicators, 30 locational-price proxies, four calendar markers, and six weather-derived drivers, giving 125 continuous features per interval. Each agent receives a role-specific observation of 18–32 features formed from local device states, neighboring line stress, and delayed regional measurements. The action dimension is three for dispatchable generators (active-power adjustment, upward reserve, downward reserve), two for storage devices (charge or discharge command and reserve share), and two for load-serving entities (flexible-load shift and curtailment request). The base reward uses $\lambda_{\mathrm{cost}} = 1.0$, $\lambda_{\mathrm{shed}} = 40.0$, $\lambda_{\mathrm{curt}} = 6.0$, $\lambda_{\mathrm{carbon}} = 2.0$, $\lambda_{\mathrm{over}} = 18.0$, $\lambda_{\mathrm{ramp}} = 12.0$, and $\lambda_{\mathrm{reserve}} = 10.0$ before adaptive penalty rescaling, so that security and reliability violations remain more expensive than marginal energy-redispatch changes.
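The feature tally and base weights above can be checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are hypothetical labels chosen here, while the counts and weight values are those listed in the text.

```python
# Feature groups of the global state vector, with counts taken from the text.
# The key names are illustrative labels, not identifiers from the paper.
STATE_FEATURES = {
    "nodal_net_injections": 30,
    "branch_loading_ratios": 41,
    "thermal_unit_outputs": 6,
    "wind_realizations": 3,
    "wind_forecasts": 3,
    "reserve_indicators": 2,
    "locational_price_proxies": 30,
    "calendar_markers": 4,
    "weather_drivers": 6,
}

# Base reward weights before adaptive penalty rescaling, as listed in the text.
BASE_WEIGHTS = {
    "cost": 1.0,
    "shed": 40.0,
    "curt": 6.0,
    "carbon": 2.0,
    "over": 18.0,
    "ramp": 12.0,
    "reserve": 10.0,
}


def state_dim(groups):
    """Total number of continuous features per interval."""
    return sum(groups.values())


print(state_dim(STATE_FEATURES))
```

Every penalty weight exceeds the unit cost weight, which is what keeps violations more expensive than marginal redispatch in the base reward.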

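The game is specified by the tuple

$$\mathrm{CPOSG} = (\mathcal{S}, \mathcal{O}, \mathcal{A}, P, r, g, \mathcal{N}, T) \tag{1}$$

where $\mathcal{S}$ is the global state space, $\mathcal{O}$ is the local observation space, $\mathcal{A}$ is the joint action space, $P$ is the stochastic transition kernel, $r$ is the reward function, $g$ denotes the system constraints, $\mathcal{N}$ is the agent set, and $T$ is the horizon. The joint policy $\pi$ is chosen to solve

$$\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t = 0}^{T - 1} \gamma^{t} r(t)\right] \tag{2}$$

$$\text{subject to} \quad g_{k}(s(t), a(t)) \leq 0, \quad k = 1, \dots, K \tag{3}$$

where $\gamma$ is the discount factor and $K$ is the number of constraints.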

The instantaneous reward is designed to reflect the multi-objective nature of practical dispatch:

$$r(t) = -\left[C_{\mathrm{op}}(t) + \lambda_{1} C_{\mathrm{shed}}(t) + \lambda_{2} C_{\mathrm{curt}}(t) + \lambda_{3} C_{\mathrm{em}}(t) + \lambda_{4} C_{\mathrm{viol}}(t)\right] \tag{4}$$

where $C_{\mathrm{op}}$ is the operating cost, $C_{\mathrm{shed}}$ is the load-shedding penalty, $C_{\mathrm{curt}}$ is the renewable-curtailment penalty, $C_{\mathrm{em}}$ is the carbon-emission cost, and $C_{\mathrm{viol}}$ aggregates security-constraint violations such as line overloads, reserve shortage, and ramp breaches.

The main physical constraints are listed below:

$$\sum \text{dispatchable generation} + \text{renewable output} + \text{storage discharge} - \text{storage charge} = \text{demand} + \text{losses} \tag{5}$$

$$P_{i,\min} \leq P_{i}(t) \leq P_{i,\max} \tag{6}$$

$$-R_{i,\mathrm{down}} \leq P_{i}(t) - P_{i}(t - 1) \leq R_{i,\mathrm{up}} \tag{7}$$

$$|F_{l}(t)| \leq F_{l,\max} \quad \text{for every line } l \tag{8}$$

Equation (5) enforces system-wide power balance; (6) enforces generator capacity limits; (7) enforces ramping limits; and (8) enforces line-flow security. The proposed learning framework is motivated by the fact that these constraints operate under stochastic renewable output, partial observability, and distributed actions (Table 1 and Table 2).
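As a rough illustration of how Equations (5)-(8) can be screened for a single interval, the following NumPy sketch returns one feasibility flag per constraint family. The function name, argument names, and numerical tolerance are assumptions made for this example; the paper does not specify how feasibility is checked in code.

```python
import numpy as np


def check_constraints(p, p_prev, p_min, p_max, r_up, r_down,
                      renewable, discharge, charge, demand, losses,
                      flows, f_max, tol=1e-6):
    """Return one boolean per constraint family in Eqs. (5)-(8).

    All names and the tolerance `tol` are illustrative assumptions.
    """
    p = np.asarray(p, dtype=float)
    p_prev = np.asarray(p_prev, dtype=float)
    p_min = np.asarray(p_min, dtype=float)
    p_max = np.asarray(p_max, dtype=float)
    r_up = np.asarray(r_up, dtype=float)
    r_down = np.asarray(r_down, dtype=float)
    flows = np.asarray(flows, dtype=float)
    f_max = np.asarray(f_max, dtype=float)

    # Eq. (5): system-wide power balance.
    mismatch = p.sum() + renewable + discharge - charge - demand - losses
    balance = abs(mismatch) <= tol
    # Eq. (6): generator capacity limits.
    capacity = np.all((p >= p_min - tol) & (p <= p_max + tol))
    # Eq. (7): ramping limits between consecutive intervals.
    step = p - p_prev
    ramp = np.all((step >= -r_down - tol) & (step <= r_up + tol))
    # Eq. (8): line-flow security limits.
    line_flow = np.all(np.abs(flows) <= f_max + tol)

    return {
        "balance": bool(balance),
        "capacity": bool(capacity),
        "ramp": bool(ramp),
        "line_flow": bool(line_flow),
    }
```

A per-family result, rather than a single flag, matches the way the reward separates overload, reserve, and ramp penalties.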

2.2. Topology-Informed Adaptive Dynamic Graph Representation

A central challenge in renewable-rich grids is that node dependence is both structured and time-varying. The physical network topology is relatively fixed, but the effective operational coupling among buses changes with congestion, redispatch, reserve activation, and weather-driven renewable patterns. Therefore, a static adjacency matrix is often insufficient.

To capture this behavior, we define the adaptive adjacency matrix as the sum of a physically grounded base matrix and a learned residual matrix (Figure 2):

$$A(t) = A_{\mathrm{phys}} + \Delta A(t) \tag{9}$$

Here $A_{\mathrm{phys}}$ is derived from the electrical network using bus connectivity and electrical distance, while $\Delta A(t)$ is a data-driven residual that reflects time-varying couplings. This decomposition constrains the learned graph to remain physically plausible while preserving adaptability.

Given input feature tensor $X$ over a rolling window, the graph representation is written as:

$$H^{(l + 1)}(t) = \sigma\left(\sum_{k = 0}^{K} \hat{A}(t)^{k} H^{(l)}(t) W_{k}^{(l)}\right) \tag{10}$$

$$z(t) = \mathrm{STCN}\left(H^{(L)}(t - w + 1 : t)\right) \tag{11}$$

where $\hat{A}(t)$ is the normalized adaptive adjacency matrix, $H^{(l)}(t)$ is the $l$-th graph-convolutional feature map, $W_{k}^{(l)}$ are learnable parameters, $\sigma(\cdot)$ is a nonlinear activation, and STCN is the temporal encoder that aggregates historical dynamics over window length $w$. The final output $z(t)$ is the latent spatio-temporal state representation that feeds the reinforcement learner.
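The adjacency decomposition and the polynomial graph convolution can be sketched in NumPy as follows. Several choices here are assumptions, because the text does not fix them: symmetric normalization with self-loops, ReLU as the activation, and clipping the combined adjacency at zero so that degrees stay positive. The function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    a = a + np.eye(a.shape[0])          # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def dgcn_layer(a_phys, delta_a, h, weights):
    """One layer in the spirit of Eqs. (9)-(10).

    a_phys  : (n, n) fixed physical adjacency
    delta_a : (n, n) learned residual for the current interval
    h       : (n, f_in) node features
    weights : list of K + 1 arrays, each (f_in, f_out), one per power of A_hat
    """
    # Eq. (9): physical base plus residual; clipping at zero is an assumption.
    a_t = np.clip(a_phys + delta_a, 0.0, None)
    a_hat = normalize_adjacency(a_t)

    out = np.zeros((h.shape[0], weights[0].shape[1]))
    power = np.eye(a_hat.shape[0])      # A_hat^0
    for w_k in weights:
        out += power @ h @ w_k          # k-th term of the sum in Eq. (10)
        power = power @ a_hat           # advance to A_hat^(k+1)
    return np.maximum(out, 0.0)         # ReLU stands in for sigma
```

Passing a zero residual recovers a static graph convolution on the physical topology, which is a convenient baseline when inspecting how much the residual changes the representation.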

The DGCN and the dispatch policy are optimized in a coupled fashion. The DGCN reduces observation noise and compresses spatial dependencies into $z(t)$, thereby improving state identifiability for policy learning. Conversely, the downstream actor-critic gradients encourage the representation to emphasize dispatch-relevant patterns such as congestion precursors, renewable ramps, and reserve shortfalls. The result is not a loose pipeline but a representation-control co-optimization mechanism.

The learned graph is further regularized so that adaptation remains physically plausible. Specifically, the residual component is stress-gated according to line loading, reserve shortage, and renewable ramp intensity, which prevents the model from inventing strong couplings in electrically quiet periods. A complementary plausibility penalty discourages electrically implausible long-range edges, aligns the learned adjacency with electrical–distance structure, and bounds spectral instability under stressed operation. This addition is intended to make dynamic graph learning dispatch-relevant and safety-aware rather than to claim a new generic graph-learning family. During evaluation, the learned graph is judged by topology consistency, electrical–distance correlation, congestion–edge recall, and N-1 secure decision ratio. These diagnostics provide direct evidence of whether the residual graph highlights physically meaningful stress paths instead of arbitrary correlations.

2.3. Multi-Agent Symbiotic Reinforcement Learning

Each controllable entity learns a policy $\pi_{i}$ that maps local observations and graph-enhanced latent states to dispatch actions. Since multiple agents jointly affect the same network-level objective, independent learning may lead to unstable or redundant decisions. To address this issue, we introduce a symbiotic regularization term that encourages coordination only among agents with coupled operating responsibilities.

The training objective is defined as follows:

$$J(\theta) = \mathbb{E}\left[\sum_{t} \gamma^{t} \sum_{i} r_{i}(t)\right] - \beta \sum_{i < j} M_{ij} D_{\mathrm{KL}}(\pi_{i} \,\|\, \pi_{j}) \tag{12}$$

where $\beta$ weights the coordination penalty and $M_{ij}$ is a task-interaction mask. If two agents share a congested corridor, co-provide balancing reserves, or serve the same regional load pocket, then $M_{ij} = 1$; otherwise $M_{ij} = 0$. Thus, the KL regularization term is not used to force global policy homogeneity. Instead, it reduces inconsistent reactions among tightly coupled agents and improves task-allocation smoothness in shared operating regions.
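The masked penalty in Equation (12) can be illustrated with a small NumPy sketch. It assumes diagonal-Gaussian policies whose action vectors have a common dimension for every coupled pair; the paper does not state the policy family or how agents with different action dimensions are compared, so both points are assumptions, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    var_p, var_q = std_p ** 2, std_q ** 2
    terms = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return float(np.sum(terms))


def symbiotic_penalty(mus, stds, mask, beta):
    """beta * sum_{i<j} M_ij * KL(pi_i || pi_j), the second term of Eq. (12).

    mus, stds : per-agent mean and standard-deviation vectors
    mask      : square 0/1 array; only entries with i < j are read
    beta      : penalty weight (its value is not given in the text)
    """
    total = 0.0
    n_agents = len(mus)
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if mask[i][j]:
                total += gaussian_kl(mus[i], stds[i], mus[j], stds[j])
    return beta * total
```

Because uncoupled pairs contribute nothing, two agents in unrelated regions can hold very different policies without being penalized, which is the behavior the mask is meant to allow.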

$$a_{i}(t) = \mathrm{Actor}_{i}(z_{i}(t), o_{i}(t)), \quad Q(t) = \mathrm{Critic}(z(t), a(t)) \tag{13}$$

The actor uses local information enhanced by topology-aware features, while the critic evaluates the joint action against global constraints and reward. This centralized-training, decentralized-execution design is suitable for dispatch problems where local agents have partial observations but system-level performance must remain coherent (Figure 3).

Agent coordination is further modulated by a topology- and stress-aware coupling signal derived from the adaptive graph and system-security indicators. Under normal operating conditions, the coordination mask follows a static task-coupling pattern determined by resource roles and regional responsibilities. When congestion intensity, reserve shortage, or renewable-ramp stress exceeds predefined thresholds, the coordination mask is updated to emphasize agents located in the affected corridor or balancing region. This event-driven update concentrates coordination on reserve sharing, congestion relief, and ramp balancing during stressed intervals, thereby allowing the collaboration mechanism to adapt to evolving operational couplings rather than relying on a fixed interaction structure.

To improve training stability under heterogeneous operating regimes, the penalty weights associated with overload, reserve shortage, and ramp violations are adjusted online according to recent constraint-violation statistics. Let $v_{k}^{(t)}$ denote the observed violation rate of constraint type $k$ over the latest $H$ episodes and let $\bar{v}_{k}$ denote its target rate. The corresponding penalty weight is updated as

$$\lambda_{k}^{(t + 1)} = \lambda_{k}^{(t)} \left(1 + \eta \, \frac{v_{k}^{(t)} - \bar{v}_{k}}{\bar{v}_{k} + \epsilon}\right) \tag{14}$$

where $\eta$ is the adaptation rate and $\epsilon$ is a small stabilizing constant. In this way, constraints that are violated more frequently receive stronger penalties, whereas consistently satisfied constraints do not dominate the reward signal. The adaptive scheme reduces manual retuning across scenarios, stabilizes critic learning, and improves policy transferability under different volatility conditions.
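The update in Equation (14) is a one-line multiplicative rule. The sketch below uses hypothetical default values for the adaptation rate and the stabilizing constant, since the text does not report them.

```python
def update_penalty(lam, v_obs, v_target, eta=0.1, eps=1e-8):
    """Multiplicative penalty update of Eq. (14).

    lam      : current penalty weight lambda_k
    v_obs    : observed violation rate over the latest H episodes
    v_target : target violation rate
    eta, eps : adaptation rate and stabilizer (default values are assumptions)
    """
    return lam * (1.0 + eta * (v_obs - v_target) / (v_target + eps))


# Example: the overload weight grows when violations are twice the target rate.
lam_over = update_penalty(18.0, v_obs=0.02, v_target=0.01)
```

With an observed rate of zero, the factor approaches `1 - eta`, so the weight stays positive as long as `eta` is below one.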

2.4. Federated Learning with Rényi Differential Privacy

Distributed smart-grid environments often impose strict data-sharing limitations. Operational records may contain commercially sensitive, geographically sensitive, or infrastructure-sensitive information. The proposed framework therefore adopts federated learning so that local training remains on each client while only clipped and perturbed model updates are transmitted, following the broader direction of privacy-preserving renewable and multi-energy learning.

Let $D_{i}$ be the local dataset on client $i$ and $F_{i}(\theta)$ be its empirical loss. The federated objective is

$$F(\theta) = \sum_{i} \frac{n_{i}}{\sum_{j} n_{j}} F_{i}(\theta) \tag{15}$$

where $n_{i}$ is the local sample size. Before upload, each client clips its gradient and perturbs it with Gaussian noise:

$$\tilde{g}_{i}(t) = \mathrm{Clip}(g_{i}(t), C) + \mathrm{Normal}(0, \sigma^{2} C^{2} I) \tag{16}$$

where $C$ is the clipping norm and $\sigma$ is the Gaussian noise multiplier. Here $\varepsilon$ denotes the privacy budget and $\delta$ denotes the failure probability, consistent with standard differential privacy definitions. The pair $(\varepsilon, \delta)$ is obtained from a Rényi differential privacy accountant after composition across communication rounds, with $\delta = 10^{-5}$.

Server aggregation uses sample-size-weighted averaging after local clipping and perturbation: each round combines client parameters in proportion to local sample count, so that each client's influence matches its data volume. This design makes the aggregation rule, privacy budget, and leakage evaluation directly comparable across privacy settings.

$$\theta(t + 1) = \sum_{i} w_{i} \theta_{i}(t + 1), \quad w_{i} = \frac{n_{i}}{\sum_{j} n_{j}} \tag{17}$$

This design clarifies what is shared among agents under privacy protection: clipped and perturbed parameter updates, not raw measurements, labels, market records, or local grid traces. RDP is adopted because it yields tighter privacy accounting over repeated communication rounds and therefore offers better utility under the same target privacy guarantee. In addition, the final operating point is selected jointly from utility and attack-resistance curves rather than from accuracy alone.
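A minimal NumPy sketch of the client-side perturbation in Equation (16) and the server-side averaging in Equation (17) is given below. It assumes L2-norm clipping, which is the usual convention but is not stated explicitly in the text, and it does not implement the RDP accountant. Function names are hypothetical.

```python
import numpy as np


def clip_and_perturb(grad, clip_norm, noise_multiplier, rng):
    """Eq. (16): clip the update to L2 norm C, then add N(0, sigma^2 C^2 I) noise.

    L2 clipping is an assumption; the text only says the gradient norm is clipped.
    """
    norm = np.linalg.norm(grad)
    scale = min(1.0, clip_norm / (norm + 1e-12))   # leave small updates unchanged
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return grad * scale + noise


def aggregate(client_params, sample_counts):
    """Eq. (17): sample-size-weighted average of client parameters."""
    w = np.asarray(sample_counts, dtype=float)
    w = w / w.sum()
    return sum(w_i * p for w_i, p in zip(w, client_params))


# Example with the reported settings C = 1.0 and sigma = 0.8.
rng = np.random.default_rng(0)
noisy_update = clip_and_perturb(np.array([3.0, 4.0]), 1.0, 0.8, rng)
```

The noise standard deviation is the product of the noise multiplier and the clipping norm, so tightening the clip also shrinks the absolute noise added to each update.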

We assume an honest-but-curious server or external eavesdropper that can observe model updates, per-round losses, or output confidence scores, but cannot access raw client data. The primary attack target is membership inference: the attacker attempts to decide whether a given local trajectory participated in training by exploiting confidence or update statistics [16].

2.5. Structural Causal Modeling and Intervention-Based Interpretation

Correlation-based analysis is not sufficient for smart-grid decision support because many operational variables are confounded by weather, temporal effects, and control reactions. To increase interpretability, we construct a structural causal model that links weather, renewable generation, net load, dispatch actions, congestion, and dispatch outcomes.

For modality $m$, the causal graph $G(m)$ is obtained through a hybrid procedure. First, we define a prior skeleton based on power-system knowledge and time precedence. For example, weather variables may influence renewable output, renewable output influences net load and reserve requirements, and dispatch actions influence congestion and emissions. Second, admissible edges are refined using data-driven structure search and conditional-independence testing under temporal-order constraints. Edges that violate physical logic or time precedence are removed.

$$Y = f(X, U; G), \quad \text{with the causal effect evaluated by } P(Y \mid \mathrm{do}(X = x^{*})) \tag{18}$$

where $X$ denotes observed variables, $U$ denotes exogenous disturbances, $G$ is the causal graph, and $Y$ is an outcome such as dispatch cost, curtailment, or congestion probability. The final causal analysis reports intervention effects together with bootstrap confidence intervals, placebo checks, and sensitivity-to-confounding diagnostics so that the interpretation is more rigorous than descriptive association alone. Explicit intervention design is required for rigorous evaluation. For example, the intervention $\mathrm{do}(\mathrm{WindSpeed} = q_{0.9})$ can be compared with the control setting $\mathrm{WindSpeed} = q_{0.5}$, where $q_{0.9}$ and $q_{0.5}$ are the 90th and 50th percentiles of wind speed. The resulting change in renewable output, dispatch cost, or congestion risk can then be estimated under the structural causal model (Figure 4).

The structural causal analysis is conducted under four identification assumptions. First, temporal precedence is imposed within each 15 min interval so that exogenous drivers precede dispatch responses and outcome variables. Second, weather and calendar variables are treated as pre-treatment covariates. Third, dispatch actions are modeled as mediators that transmit the effects of exogenous variation to congestion, curtailment, and cost outcomes. Fourth, no unmodeled fast-timescale controller with simultaneous feedback is assumed to operate within the analysis window. Under these assumptions, average treatment effects are estimated by g-computation with bootstrap confidence intervals. Robustness is evaluated through placebo interventions on temporally shifted variables, leave-one-covariate-out sensitivity analysis, and bounded hidden-confounding stress tests.

The structural causal model includes the variables {weather, wind generation, net load, reserve margin, dispatch action, line stress, operating cost, curtailment, congestion risk}. Candidate edges are first screened according to temporal precedence and engineering feasibility, and are then retained only when they satisfy conditional-independence screening and remain stable across bootstrap structure-search runs. Intervention effects are estimated by g-computation on the fitted causal graph. Placebo interventions are generated by permuting treatment labels within the same diurnal block, and sensitivity is summarized by the minimum latent-confounding strength required to reverse the sign of the estimated effect.
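The g-computation step with bootstrap confidence intervals can be sketched as follows. The sketch assumes a single continuous treatment, a single pre-treatment covariate, and a linear outcome model with one interaction term; the paper does not specify the outcome model, so these are illustrative simplifications, and all function names are hypothetical.

```python
import numpy as np


def _design(x, z):
    """Design matrix for the assumed outcome model: 1, x, z, and x*z."""
    return np.column_stack([np.ones_like(x), x, z, x * z])


def g_computation_ate(x, z, y, x_treat, x_ctrl):
    """Average effect of setting x to x_treat versus x_ctrl.

    Fits the outcome model, predicts every unit's outcome under both
    interventions while keeping its covariate z, and averages the difference.
    """
    coef, *_ = np.linalg.lstsq(_design(x, z), y, rcond=None)
    y_treat = _design(np.full_like(x, x_treat), z) @ coef
    y_ctrl = _design(np.full_like(x, x_ctrl), z) @ coef
    return float(np.mean(y_treat - y_ctrl))


def bootstrap_ci(x, z, y, x_treat, x_ctrl, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the g-computation estimate."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample units with replacement
        estimates.append(g_computation_ate(x[idx], z[idx], y[idx], x_treat, x_ctrl))
    lo, hi = np.quantile(estimates, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

For the wind-speed example, `x_treat` and `x_ctrl` would be the 90th and 50th percentiles of the wind-speed series, for instance `np.quantile(x, 0.9)` and `np.quantile(x, 0.5)`. Resampling individual intervals ignores temporal dependence; a block bootstrap may be more appropriate for 15 min data.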

3. Experimental Design

3.1. Test System, Data, and Scenario Definition

Experiments are conducted on a Texas/ERCOT-grounded public-trace dispatch dataset mapped to a modified IEEE 30-bus system with 41 transmission connections, six thermal units, three wind-farm injection points, and heterogeneous load centers. Historical ERCOT load and market-price records are combined with wind-generation traces derived from the NREL WIND Toolkit, while the MATPOWER-based 30-bus network serves as the physical dispatch backbone [17,18,19,20].

The remapped study window spans 90 consecutive days at 15 min resolution, corresponding to 8640 intervals. Public ERCOT load and market-price traces are temporally aligned with remapped wind trajectories so that the resulting dataset preserves realistic variability patterns while remaining fully reproducible. It records unit dispatch, bus-level load, reserve requirement, congestion proxy, locational marginal price, renewable output, and operating cost. The thermal fleet covers base-load, mid-merit, and flexible operating roles with unit limits ranging from 6 to 82 MW and ramp limits from 8 to 16 MW per 15 min, while the three wind plants are connected to buses 7, 19, and 27 with installed capacities of 58, 42, and 38 MW.

Three evaluation scenarios are defined using reproducible thresholds rather than qualitative labels only. The normal scenario contains intervals whose 1 h wind-ramp magnitude does not exceed 18.0 MW and whose congestion index remains below 12.45. The high-wind-volatility scenario contains intervals whose 1 h wind-ramp magnitude exceeds 32.9 MW, whereas the congestion-sensitive scenario contains intervals whose congestion index exceeds 13.93. For all methods, the first 60 days are used for training, the next 15 days for validation, and the final 15 days for testing to avoid temporal leakage.
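The chronological split and the threshold-based scenario labels can be written down directly. The sketch below uses the thresholds and day counts from the text; the function names and the use of the absolute ramp value are assumptions.

```python
import numpy as np

INTERVALS_PER_DAY = 96  # 24 h at 15 min resolution


def chronological_split(n_days=90, train_days=60, val_days=15):
    """Index arrays for the train, validation, and test partitions, in time order."""
    n_total = n_days * INTERVALS_PER_DAY
    train_end = train_days * INTERVALS_PER_DAY
    val_end = train_end + val_days * INTERVALS_PER_DAY
    idx = np.arange(n_total)
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]


def label_scenarios(wind_ramp_1h, congestion_index):
    """Boolean masks for the three scenarios, using the thresholds in the text.

    wind_ramp_1h     : 1 h wind-ramp values in MW (magnitude is taken here)
    congestion_index : congestion index per interval
    """
    ramp = np.abs(np.asarray(wind_ramp_1h, dtype=float))
    cong = np.asarray(congestion_index, dtype=float)
    return {
        "normal": (ramp <= 18.0) & (cong < 12.45),
        "high_wind_volatility": ramp > 32.9,
        "congestion_sensitive": cong > 13.93,
    }
```

Under these thresholds the scenarios are not a partition: an interval can be both high-volatility and congestion-sensitive, and intervals between the thresholds belong to none of the three.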

3.2. Data Cleaning, Missing-Value Handling, and Feature Engineering

Engineering range checks are first applied to identify impossible values in generator output, bus load, reserve requirement, and congestion variables. Short isolated spikes are detected through robust filtering and temporal consistency checks on 15 min ramp increments. Missing gaps shorter than four intervals are imputed by linear interpolation for electrical variables and seasonally matched interpolation for exogenous weather-derived drivers, whereas longer gaps are masked during training and excluded from metric aggregation. All continuous variables are normalized using statistics computed from the training partition only.
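The gap-handling rule for electrical variables can be sketched as below: gaps shorter than four intervals are linearly interpolated and longer gaps are flagged for masking. Treating gaps at the very start or end of a series as masked is an assumption, and the seasonally matched interpolation used for weather-derived drivers is not covered. Function names are hypothetical.

```python
import numpy as np


def impute_short_gaps(series, max_gap=3):
    """Fill NaN runs of at most `max_gap` points by linear interpolation.

    Returns the filled series and a boolean mask marking points that stay
    missing (long gaps, or gaps touching either end of the series).
    """
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    masked = np.zeros(len(x), dtype=bool)
    n = len(x)
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < n and missing[j]:
            j += 1                       # j is the first valid index after the gap
        interior = i > 0 and j < n
        if (j - i) <= max_gap and interior:
            x[i:j] = np.interp(np.arange(i, j), [i - 1, j], [x[i - 1], x[j]])
        else:
            masked[i:j] = True
        i = j
    return x, masked


def fit_normalizer(train_values):
    """Mean and standard deviation from the training partition only."""
    mu = np.nanmean(train_values, axis=0)
    sd = np.nanstd(train_values, axis=0)
    return mu, np.where(sd > 0, sd, 1.0)   # guard against constant features
```

The returned mask can be used both to drop intervals from the training loss and to exclude them from metric aggregation.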

Feature engineering then transforms the cleaned series into model inputs. The final tensor includes lagged load and wind features, net injection, reserve margin, rolling price volatility, line-loading proxy, bus-type embedding, electrical–distance encoding, and calendar–weather interaction features. The look-back window is fixed to 16 intervals, identical feature definitions are used for all learning-based baselines, and every experiment is repeated with five random seeds (Table 3).

3.3. Baselines and Implementation Details

The baseline set spans classical forecasting-driven dispatch, graph-learning baselines, strong MARL baselines, and advanced uncertainty-aware optimization comparators. ARIMA and LSTM represent forecasting-driven baselines, while stochastic dispatch and adaptive look-ahead economic dispatch represent stronger engineering dispatch comparators [21,22]. DeepOPF is included as a learning-based optimization comparator, and MADDPG, QMIX, and MAPPO serve as strong MARL baselines [23,24,25,26]. Topology-aware graph comparators are represented through recent GNN studies for operational risk assessment and state estimation [27,28].

To ensure fairness, all methods use the same chronological split, identical scenario definitions, and the same evaluation horizon (Table 4). Hyperparameters are tuned on the validation set only. The DGCN contains two graph-convolutional layers with hidden dimension 64 followed by a temporal encoder with kernel size 3, while the actor and critic each use two fully connected layers of width 128. Optimization uses Adam with learning rate $10^{-4}$, batch size 256, discount factor $\gamma = 0.99$, early stopping on validation reward, and 1200 episodes. Federated optimization uses 50 communication rounds, clipping norm $C = 1.0$, noise multiplier $\sigma = 0.8$, and an RDP accountant to track privacy loss. All stochastic baselines are trained with the same observation space, action bounds, engineering constraints, and five-seed protocol so that mean values, standard deviations, and significance checks remain comparable.

3.4. Evaluation Metrics

The evaluation reports operating cost, renewable curtailment rate, load shedding, average overload rate, ramp-violation rate, carbon-emission intensity, cumulative reward, topology consistency, electrical–distance correlation, congestion–edge recall, and N-1 secure decision ratio. Prediction indicators, such as RMSE and MAE, are retained only to support the representation-learning discussion.

In addition, the following coordination, privacy, and causal indicators are reported: collaboration-efficiency score, reserve mismatch, convergence speed, policy adaptation lag, privacy budget (ε, δ), membership-inference AUC, TPR at 10% FPR, attack accuracy, average treatment effect, placebo effect, and bootstrap confidence intervals. All stochastic results are reported as averages over five runs, and the extended comparator tables report mean ± standard deviation whenever repeated training is applicable.
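Two of the leakage indicators listed above, membership-inference AUC and TPR at 10% FPR, can be computed directly from attack scores. The sketch below assumes the attacker outputs one score per sample, with higher scores meaning "more likely a training member". The scores are invented for illustration, and the function names are hypothetical rather than the paper's evaluation code.

```python
def roc_auc(member_scores, nonmember_scores):
    """AUC as the probability that a member outscores a non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.10):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr."""
    best = 0.0
    for thr in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= thr for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= thr for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Hypothetical attack scores (higher = "more likely a training member").
members = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.2]
nonmembers = [0.85, 0.65, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.15, 0.1]

auc = roc_auc(members, nonmembers)
tpr10 = tpr_at_fpr(members, nonmembers)
```

An AUC near 0.5 means the attacker does little better than chance, which is why values such as 0.57 are read as limited leakage.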

4. Results and Discussion

The first question is whether the dynamic adjacency learned by the DGCN reflects actual grid structure rather than arbitrary latent connectivity. Figure 5 addresses this issue by jointly reporting topology consistency and the N-1 secure decision ratio under varying privacy budgets. Under normal conditions, the learned adjacency remains close to the physical topology, which indicates that the model does not invent unnecessary long-range couplings.

4.1. Adaptive Graph Behavior and Coordination Under Stressed Operation

A central issue in the proposed framework is whether the learned adaptive adjacency reflects meaningful grid couplings rather than arbitrary latent similarity. The value of the graph module does not lie in dynamic graph learning alone, but in learning a dispatch-oriented residual graph that remains anchored to physical structure while adapting to stressed operating conditions. Figure 5 therefore reports graph-side diagnostics, whereas Figure 6 shows the corresponding coordination behavior under the same stressed operating conditions.

Figure 5 indicates that the adaptive residual graph remains physically interpretable across the practical privacy range. Topology consistency rises from 0.72 at ε = 0.9 to 0.92 at ε = 3.1, after which the gain becomes marginal. The N-1 secure decision ratio follows the same pattern and reaches 0.956 at ε = 5.8, while the electrical–distance alignment metric remains above 0.80 throughout the recommended operating region. These results show that the learned residual couplings do not drift toward an unconstrained latent graph. Instead, they strengthen dependencies that remain plausible from the viewpoints of electrical proximity, congestion propagation, and stressed transfer corridors. The graph is therefore acting as a topology-anchored stress amplifier rather than as a free-form similarity matrix.

The stressed-interval behavior supports the same interpretation. When congestion intensifies or reserve deployment becomes necessary after renewable ramps, the residual edge weights increase mainly around congestion corridors and reserve-coupled subregions rather than across electrically distant buses. This behavior is important because it demonstrates that graph adaptability and engineering plausibility improve together instead of trading off against one another. In other words, the learned graph is not only adaptive, but also operationally meaningful under the same situations in which coordination pressure actually increases.

The operational effect of this graph adaptation becomes clear in Figure 6. Removing the KL-based symbiotic coordination term increases unserved energy from 5.8 to 15.1 MWh/day, reduces collaboration efficiency from 0.97 to 0.81, and lengthens policy-adaptation lag from 2.1 to 3.8 dispatch intervals. Table 11 further shows that removing KL coordination increases reserve mismatch from 1.2 to 2.9 MW even though topology consistency remains close to the full-model level. This difference clarifies the complementary roles of the two modules: the adaptive graph identifies changing dependencies, whereas the coordination term translates those dependencies into smoother task allocation and faster corrective alignment among coupled agents.

These results also explain how multi-agent collaboration adapts to topology changes and stressed operation. In the proposed framework, changes in congestion pressure, reserve coupling, and renewable ramps alter the residual graph and, through the latent state, affect the effective interaction pattern among agents. The KL-based coordination term is therefore selective rather than homogenizing. It acts only on agents that share stressed corridors, balancing responsibilities, or regional support tasks. The resulting coordination gain reflects more coherent dispatch behavior under pressure, not a loss of policy diversity.
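The selectivity described above can be illustrated with a coupling-weighted KL penalty. The sketch below is only one plausible form, under loud assumptions: policies are diagonal Gaussians over a shared action parameterization, coupling weights come from the residual graph, and a cutoff threshold decides which pairs are regularized. None of these choices is confirmed by the text, and all names and values are hypothetical.

```python
import math


def kl_diag_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians given per-dimension means and stds."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += (math.log(sq / sp)
                  + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2)
                  - 0.5)
    return total


def coordination_penalty(policies, coupling, threshold=0.1):
    """Coupling-weighted KL summed over agent pairs above a coupling threshold.

    policies: {agent: (means, stds)}; coupling: {(i, j): weight}.
    Pairs with weight below the (hypothetical) threshold contribute nothing,
    so weakly coupled agents keep their own policies.
    """
    penalty = 0.0
    for (i, j), w in coupling.items():
        if w >= threshold:
            penalty += w * kl_diag_gaussian(*policies[i], *policies[j])
    return penalty


# Illustrative values only: agents A and B share a stressed corridor, C does not.
policies = {
    "A": ([0.5, 0.2], [0.1, 0.1]),
    "B": ([0.4, 0.2], [0.1, 0.1]),
    "C": ([-0.8, 0.9], [0.3, 0.3]),
}
coupling = {("A", "B"): 0.9, ("A", "C"): 0.02, ("B", "C"): 0.01}
penalty = coordination_penalty(policies, coupling)
```

In this toy case only the A-B pair is penalized, so agent C can keep a very different policy without cost, which mirrors the claim that coordination does not remove policy diversity.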

4.2. Dispatch Performance, Strong-Baseline Comparison, and Strategy Refinement Under Renewable Uncertainty

The benchmark results evaluate whether the proposed closed-loop design improves dispatch performance under renewable uncertainty relative to both learning-based and optimization-based comparators. Figure 7, Table 5, Table 6 and Table 7 provide complementary evidence through event-level strategy refinement, core benchmark comparison, strong-baseline comparison, and external validation.

Figure 7 illustrates a representative wind-ramp day. The proposed controller reallocates thermal generation and reserve promptly after the strongest renewable ramps, while the reserve trajectory remains feasible over the full 24 h horizon. The dispatch response is not dominated by one delayed redispatch action. Instead, the policy refines its strategy over the following dispatch intervals, which is consistent with the shorter adaptation lag reported in Section 4.1. This event-level behavior is important because it shows how the learned policy responds to stochastic renewable conditions in time, rather than only through average performance summaries.

The benchmark results in Table 5 show that the proposed method outperforms forecasting-based, static-graph, and initial sequential-decision baselines across all core metrics. Relative to the strongest model in Table 5, namely the hybrid GNN-RL baseline, the proposed framework lowers daily operating cost from 482.7 to 468.9 × 10³ USD, reduces renewable curtailment from 5.9% to 4.8%, decreases overload rate from 0.60% to 0.30%, and improves reward from −41.2 to −36.4. Load shedding is almost eliminated, decreasing from 0.3% to 0.1%, while carbon intensity falls from 0.58 to 0.55 tCO₂/MWh. These gains indicate that the advantage of the full framework is not confined to one economic indicator. It improves economy, security, flexibility, and emissions-related performance simultaneously.
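To put these absolute changes on a common scale, the short calculation below converts the Table 5 values into relative reductions. It is a reader-side arithmetic check rather than part of the reported evaluation. Reward is left out because it is negative, so a percentage reduction would be ambiguous.

```python
# Values transcribed from Table 5 (hybrid GNN-RL baseline vs. proposed method).
baseline = {"operating_cost": 482.7, "curtailment": 5.9, "load_shedding": 0.3,
            "overload_rate": 0.6, "carbon_intensity": 0.58}
proposed = {"operating_cost": 468.9, "curtailment": 4.8, "load_shedding": 0.1,
            "overload_rate": 0.3, "carbon_intensity": 0.55}


def relative_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return 100.0 * (before - after) / before


reductions = {k: relative_reduction(baseline[k], proposed[k]) for k in baseline}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower")
```

The operating-cost reduction is about 2.9%, whereas the overload rate is halved, so the relative gains are considerably larger on the security indicators than on the economic one.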

Table 6 shows that the performance advantage persists when stronger baselines are included. Relative to MAPPO, operating cost falls from 474.6 to 468.9 × 10³ USD/day and overload rate drops from 0.41% to 0.30%. Compared with interval-uncertainty OPF and chance-constrained SCED, the proposed method also achieves lower curtailment and a better reward while retaining engineering feasibility. This comparison is particularly important because it shows that the proposed framework is not merely outperforming weak forecasting baselines. Its gains remain visible against both mature MARL coordination schemes and stronger uncertainty-aware dispatch optimizers.

The constraint-handling design is further clarified by the penalty-sensitivity study. Sweeping the overload penalty over {12, 18, 24} and the reserve-shortage penalty over {6, 10, 14} shows that the selected pair (18, 10) yields the best compromise between economy and security. Lower overload penalties allow more frequent corridor violations, whereas higher penalties slightly reduce overload risk but increase curtailment and operating cost. The selected operating point therefore represents a balanced constrained-RL configuration rather than an aggressive cost-minimizing setting.

External validation leads to the same overall conclusion. Table 7 shows that the performance ranking remains stable under public-trace remapping across the base remapped case, the high-volatility remapped case, and the winter-peak remapped case. Although operating cost rises under the harder scenarios, overload rate remains below 0.50% and reward degradation remains moderate. This result suggests that the learned policy generalizes beyond a single benchmark realization and remains effective under more challenging public-trace conditions.

4.3. Privacy-Utility Trade-Off and Leakage Evaluation

The privacy results evaluate whether RDP-protected federated training can be incorporated into the dispatch loop with limited utility degradation and whether the selected privacy budget is supported by explicit leakage evaluation. Table 8 summarizes the joint behavior of privacy budget, dispatch utility, security performance, and membership-inference success.

The results reveal a clear privacy-utility frontier. Moving from the non-private model to ε = 3.1 increases daily operating cost only from 467.6 to 468.9 × 10³ USD/day and raises curtailment only from 4.7% to 4.8%, while the N-1 secure ratio remains nearly unchanged, decreasing only from 0.957 to 0.956. At the same time, membership-inference attack AUC drops from 0.73 to 0.57. This trade-off explains the selection of the mid-range privacy setting, which preserves most of the dispatch value while substantially reducing leakage risk.

The two extremes of the privacy sweep are less attractive. Very strong privacy protection at ε = 0.9 suppresses attack AUC further to 0.54, but operating cost increases to 476.8 × 10³ USD/day, curtailment rises to 5.8%, and the N-1 secure ratio drops to 0.921. By contrast, very weak protection at ε = 5.8 recovers almost all utility, but attack AUC rises to 0.63. The recommended operating point is therefore not chosen because it optimizes one metric in isolation. It is selected because it occupies the most balanced region of the privacy-utility curve [29,30].
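One way to read the "most balanced region" is to ask how much additional operating cost each further reduction in attack AUC requires. The sketch below computes that marginal cost between neighboring rows of Table 8. The metric itself is an illustrative heuristic introduced here, not a selection rule stated by the authors.

```python
# (epsilon label, operating cost in 10^3 USD/day, attack AUC) from Table 8,
# ordered from weakest to strongest privacy protection.
sweep = [
    ("non-private", 467.6, 0.73),
    ("5.8", 468.2, 0.63),
    ("3.1", 468.9, 0.57),
    ("1.7", 472.9, 0.56),
    ("0.9", 476.8, 0.54),
]


def marginal_costs(points):
    """Extra operating cost per 0.01 of attack-AUC reduction between neighbors."""
    out = []
    for (_, c0, a0), (eps, c1, a1) in zip(points, points[1:]):
        out.append((eps, (c1 - c0) / ((a0 - a1) * 100.0)))
    return out


for eps, cost_per_point in marginal_costs(sweep):
    print(f"tightening to eps={eps}: {cost_per_point:.2f} x 10^3 USD/day per 0.01 AUC")
```

Under this reading, each 0.01 of AUC reduction costs roughly 0.06 to 0.12 × 10³ USD/day down to ε = 3.1 and roughly 2 to 4 × 10³ USD/day beyond it, which is consistent with placing the knee of the curve at ε = 3.1.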

These results also clarify the contribution of the privacy module. The paper does not claim novelty from proposing a new differential-privacy mechanism by itself. Rather, the contribution lies in integrating RDP-protected federated training into a renewable-rich dispatch loop and quantifying the resulting trade-off between privacy, dispatch performance, and leakage resistance under an explicit threat model. From a deployment perspective, this is the relevant question: not whether privacy improves reward, but whether privacy can be added at an acceptable operational cost.

4.4. Causal Intervention Analysis and Module-Level Validation

The causal and ablation results assess whether the structural causal module provides evidence beyond descriptive association and whether the overall gain of the framework can be attributed to the intended components. Figure 8, Table 9, Table 10 and Table 11 provide the corresponding evidence from intervention analysis and module-level diagnostics.

Figure 8 and Table 9 show that the structural causal model yields stable intervention-oriented results that are consistent with system intuition. The intervention do(Wind high) reduces expected operating cost from 63.9 to 56.2 USD/MWh and lowers congestion risk from 0.31 to 0.24. Table 9 quantifies these effects: the average treatment effect of do(Wind high) is −7.7 USD/MWh for operating cost and −0.07 for congestion risk, with narrow bootstrap confidence intervals and small placebo effects. By contrast, do(Load high) increases operating cost by +6.4 USD/MWh, whereas do(Line stress) increases curtailment by +1.9%. The reported sensitivity statistics remain moderate, which suggests that the intervention direction is not driven by unstable model specification alone.
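The reported average treatment effect is the difference between the two interventional means, 56.2 − 63.9 = −7.7 USD/MWh. The sketch below shows how such an estimate and a percentile bootstrap interval could be computed from outcomes simulated under the intervention and under the reference setting. The samples are synthetic: only their means follow the text, while the spread, sample size, and function names are assumptions made for illustration.

```python
import random
import statistics


def ate(treated, control):
    """Average treatment effect as a difference of outcome means."""
    return statistics.fmean(treated) - statistics.fmean(control)


def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the ATE."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        draws.append(ate(t, c))
    draws.sort()
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic outcomes (USD/MWh) centred on the reported means 56.2 and 63.9;
# the spread and sample size are invented for illustration.
rng = random.Random(42)
cost_wind_high = [rng.gauss(56.2, 3.0) for _ in range(200)]
cost_reference = [rng.gauss(63.9, 3.0) for _ in range(200)]

effect = ate(cost_wind_high, cost_reference)
low, high = bootstrap_ci(cost_wind_high, cost_reference)
```

An interval that excludes zero, as in Table 9, indicates that the sign of the effect is stable under resampling. A placebo check applies the same procedure to an intervention that should have no effect.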

The ablation results strengthen this interpretation. Table 10 shows that removing the dynamic residual graph causes a substantial deterioration in RMSE, operating cost, curtailment, overload rate, and reward. Table 11 explains why: topology consistency drops from 0.92 to 0.81 and the N-1 secure ratio falls from 0.956 to 0.921. This finding indicates that the adaptive residual graph is not a cosmetic component. It is the main source of topology-aware situational awareness under stressed dispatch conditions.

Removing KL symbiosis causes the largest coordination-side deterioration. Collaboration efficiency falls from 0.97 to 0.81 and reserve mismatch rises from 1.2 to 2.9 MW, even though topology consistency remains close to the full-model level. This confirms that the graph module and the coordination module serve different roles. The graph identifies changing dependencies, whereas the symbiotic objective translates those dependencies into more coherent dispatch and reserve allocation under pressure.

Removing federated RDP yields only a small utility gain but sharply increases privacy exposure, with attack AUC rising from 0.57 to 0.73. This result reinforces the conclusion that privacy protection is introduced at limited operational cost. Finally, removing the causal module changes the point metrics only moderately, but wind-cost stability declines from 0.86 to 0.61. This indicates that the causal component contributes primarily to the robustness of intervention-oriented interpretation rather than to large gains in average reward.

Taken together, these results support the modular logic of the framework. The superiority of the full model does not come from one oversized component. It arises from the complementary interaction of topology-anchored graph adaptation, selective multi-agent coordination, privacy-aware collaborative training, and intervention-based analysis.

4.5. External Validity, Practical Scope, and Study Limitations

Although the proposed framework performs consistently across the benchmark, strong baselines, and public-trace remapping cases, several limitations remain. First, the study is conducted on a public-trace-driven benchmark mapped to a benchmark network rather than on confidential utility-grade dispatch logs. The present evidence therefore supports algorithmic validity and engineering plausibility, but it should not yet be interpreted as a direct claim of utility-scale deployment readiness.

Second, privacy evaluation is strengthened by membership-inference experiments, but the attack space is not exhausted. In particular, broader reconstruction-style attacks are discussed at the threat-model level rather than benchmarked as extensively as membership inference. Third, the causal conclusions depend on structural assumptions, temporal-order restrictions, and intervention design choices. For this reason, the causal module should be interpreted as an operator-oriented decision-support layer rather than as unrestricted causal truth.

These limitations also help define the practical scope of the present study. The current results suggest that dispatch intelligence for renewable-rich grids should be judged not only by average cost reduction, but also by physical consistency of learned couplings, coordination quality under stressed operation, privacy-compliant collaborative training, and intervention-oriented interpretability. Within that scope, the framework shows favorable performance on the test benchmark and provides a structured basis for future evaluation in larger security-constrained and utility-facing environments.

5. Conclusions

This paper presents a unified framework for uncertainty-aware smart-grid dispatch that integrates a topology-informed residual graph, multi-agent symbiotic reinforcement learning, RDP-protected federated training, and structural causal intervention analysis within a closed-loop decision architecture. By formulating the dispatch task as a constrained partially observable stochastic game, the proposed method connects state representation, distributed coordination, privacy-preserving collaborative training, and operator-facing interpretation under renewable uncertainty.

The empirical results show that the proposed framework achieves the best overall performance among the compared baselines on the public-trace-driven benchmark. Daily operating cost decreases to 468.9 × 10³ USD, renewable curtailment falls to 4.8%, load shedding is reduced to 0.1%, and overload rate declines to 0.30%. The graph-side diagnostics indicate that the learned residual graph remains physically anchored while adapting to stressed operation, and the coordination results confirm the value of KL-based symbiotic coordination in reducing unserved energy and shortening adaptation lag. The privacy sweep further shows that the selected mid-range privacy budget preserves most dispatch utility while reducing membership-inference attack AUC from 0.73 to 0.57. In addition, the causal analysis provides intervention-oriented evidence showing that higher wind support improves dispatch outcomes, whereas load escalation and line stress worsen cost, congestion, and curtailment.

Overall, these results suggest that dispatch intelligence for renewable-rich power systems should be evaluated not only by economic efficiency, but also by physical consistency, coordination quality, privacy compliance, and operational interpretability. Although the present study is conducted on a public-trace-driven benchmark, it provides a structured basis for future evaluation in larger security-constrained and utility-facing environments. Future work will extend the framework to richer multi-energy settings, larger-scale dispatch systems, and more realistic communication and cyber-physical disturbances.

Author Contributions

Conceptualization, Y.L. (Yue Liu) and Q.C.; methodology, Y.L. (Yue Liu) and S.Z.; software, Y.L. (Yue Liu); validation, Y.L. (Yuchun Li), J.Y. and S.Z.; formal analysis, Y.L. (Yue Liu) and S.Z.; investigation, Y.L. (Yuchun Li) and J.Y.; resources, Y.L. (Yuchun Li), J.Y. and Z.H.; data curation, J.Y. and Z.H.; writing—original draft preparation, Y.L. (Yue Liu); writing—review and editing, Q.C., Y.L. (Yuchun Li), J.Y., S.Z. and Z.H.; visualization, Y.L. (Yue Liu) and Z.H.; supervision, Q.C.; project administration, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (52574083).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Yuchun Li was employed by the company Daqing Oilfield Design Institute Co., Ltd., and author Jinwei Yang was employed by the company Sino-Pipeline International Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

1. Bai, Y.; Chen, S.; Zhang, J.; Xu, J.; Gao, T.; Wang, X.; Gao, D.W. An adaptive active power rolling dispatch strategy for high proportion of renewable energy based on distributed deep reinforcement learning. Appl. Energy 2023, 330, 120294.
2. Zhang, C.; Wang, D.; Zhang, W.; He, L.; Zhou, K.; Li, J.; Zhu, L.; Zhou, B.; Zhou, Q.; Shuai, Z. A novel optimal power flow method considering interval uncertainties under high renewable penetration based on security limits definition. IEEE Trans. Sustain. Energy 2025, 17, 926–938.
3. Liao, W.; Bak-Jensen, B.; Pillai, J.R.; Wang, Y.; Wang, Y. A review of graph neural networks and their applications in power systems. J. Mod. Power Syst. Clean Energy 2022, 10, 345–360.
4. Authier, J.; Haider, R.; Annaswamy, A.; Dorfler, F. Physics-informed graph neural network for dynamic reconfiguration of power systems. Electr. Power Syst. Res. 2024, 235, 110817.
5. Liu, J.; Jiang, G.; Chu, C.; Li, Y.; Wang, Z.; Hu, S. A formal model for multiagent Q-learning on graphs. Sci. China Inf. Sci. 2025, 68, 192206.
6. Li, P.; Tang, X.-W.; Huang, X.-L.; Hu, F. Dynamic spectrum access for C-V2X via imitating Indian buffet process. IEEE Internet Things J. 2025, 24, 19849–19860.
7. Tang, X.-W.; Huang, Y.; Shi, Y.; Wu, Q. MUL-VR: Multi-UAV collaborative layered visual perception and transmission for virtual reality. IEEE Trans. Wirel. Commun. 2025, 74, 2734–2749.
8. Jiang, J.; Zhang, Y.; Wan, A.; AL-Bukhaiti, K.; Huang, J.; Cheng, X. Multi-target detection for safety monitoring in complex substation environments using YOLO-DySE. Signal Image Video Process. 2025, 19, 804.
9. Cao, B.; Liu, K.; Wu, G.; He, Z.; Xin, D.; Chen, K.; Gao, G. A self-supervised evaluation approach of insulation condition for vehicle cable terminals using hypergraph neural network with dynamic features. IEEE Trans. Ind. Inform. 2025, 74, 9330–9340.
10. Grataloup, A.; Jonas, S.; Meyer, A. A review of federated learning in renewable energy applications: Potential, challenges, and future directions. Energy AI 2024, 17, 100375.
11. Zhang, Y.; Ren, Y.; Liu, Z.; Li, H.; Jiang, H.; Xue, Y.; Ou, J.; Hu, R.; Zhang, J.; Gao, D.W. Federated deep reinforcement learning for varying-scale multi-energy microgrids energy management considering comprehensive security. Appl. Energy 2025, 380, 125072.
12. Shen, B.; Yang, S.; Hu, J.; Zhang, Y.; Zhang, L.; Ye, S.; Yang, Z.; Yu, J.; Gao, X.; Zhao, E. Interpretable causal-based temporal graph convolutional network framework in complex spatio-temporal systems for CCUS-EOR. Energy 2024, 309, 133129.
13. Kilembe, A.B.; Hamilton, R.I.; Papadopoulos, P.N. Explainable machine learning: A SHAP value-based approach to locational frequency stability. Int. J. Electr. Power Energy Syst. 2025, 170, 110885.
14. Ebrie, A.S.; Kim, Y.J. Reinforcement learning-based optimization for power scheduling in a renewable energy connected grid. Renew. Energy 2024, 230, 120886.
15. Liu, D.; Cheng, P.; Cheng, J.; Liu, J.; Lu, M.; Jiang, F. Improved reinforcement learning-based real-time energy scheduling for prosumer with elastic loads in smart grid. Knowl.-Based Syst. 2023, 280, 111004.
16. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 22–26 May 2017; pp. 3–18.
17. Draxl, C.; Hodge, B.M.; Clifton, A.; McCaa, J. The wind integration national dataset (WIND) toolkit. Appl. Energy 2015, 151, 355–366.
18. ERCOT. Hourly Load Data Archives. Available online: https://www.ercot.com/gridinfo/load/load_hist (accessed on 7 April 2026).
19. ERCOT. Market Prices. Available online: https://www.ercot.com/mktinfo/prices/index.html (accessed on 7 April 2026).
20. Zimmerman, R.D.; Murillo-Sanchez, C.E.; Thomas, R.J. MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 2011, 26, 12–19.
21. Zhao, P.; Li, Z.; Bai, X.; Su, J.; Chang, X. Stochastic real-time dispatch considering AGC and electric-gas dynamic interaction: Fine-grained modeling and noniterative decentralized solutions. Appl. Energy 2024, 375, 123976.
22. Wang, X.; Zhong, H.; Zhang, G.; Ruan, G.; He, Y.; Yu, Z. Adaptive look-ahead economic dispatch based on deep reinforcement learning. Appl. Energy 2024, 353, 122121.
23. Pan, X.; Zhao, T.; Chen, M.; Zhang, S. DeepOPF: A deep neural network approach for security-constrained DC optimal power flow. IEEE Trans. Power Syst. 2021, 36, 1725–1735.
24. Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6382–6393.
25. Rashid, T.; Samvelyan, M.; Schroeder de Witt, C.; Farquhar, G.; Foerster, J.N.; Whiteson, S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4295–4304.
26. Yu, C.; Velu, A.; Vinitsky, E.; Wang, Y.; Bayen, A.; Wu, Y.; Gao, J. The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inf. Process. Syst. 2022, 35, 24611–24624.
27. Zhang, Y.; Karve, P.M.; Mahadevan, S. Graph neural networks for power grid operational risk assessment under evolving unit commitment. Appl. Energy 2025, 380, 124793.
28. Ngo, Q.-H.; Nguyen, B.L.H.; Vu, T.V.; Zhang, J.; Ngo, T. Physics-informed graphical neural network for power system state estimation. Appl. Energy 2024, 358, 122602.
29. Tang, Y.; Zhang, S.; Zhang, Z. A privacy-preserving framework integrating federated learning and transfer learning for wind power forecasting. Energy 2024, 286, 129639.
30. Morcillo-Jimenez, R.; Rivas, J.; Ruiz, M.D.; Martin-Bautista, M.J.; Fernandez Basso, C. Privacy-preserving energy analytics in smart offices via container-based federated learning. Internet Things 2025, 34, 101782.

Figure 1. Overview of the revised dispatch framework.

Figure 2. Physical-topological and dynamic residual decomposition of the adaptive adjacency matrix.

Figure 3. Constrained partially observable stochastic game for distributed grid dispatch.

Figure 4. Structural causal graph and example intervention path.

Figure 5. Interpretability and security under differential privacy.

Figure 6. Effect of the KL coordination term on resilience and collaboration.

Figure 7. Dispatch trajectories under renewable uncertainty.

Figure 8. Causal intervention outcomes with respect to cost, congestion, and curtailment.

Table 1. Representative state, observation, action, and reward configuration used in training.

Item	Representative Content	Dimension/Range	Role in Training
State	Injections, branch loading, reserve, prices, weather, renewable realization	125 continuous features	Shared environment description for critic and SCM
Local observation	Device status, neighboring stress, delayed regional signals	18–32 features per agent	Decentralized policy input
Action	Generator adjustment, reserve share, storage command, load shift	2–3 controls per agent	Engineering-feasible dispatch decision
Reward	Cost, shedding, curtailment, carbon, overload, ramp, reserve penalties	Adaptive weighted sum	Balances economy, security, and service continuity

Table 2. Mapping of the CPOSG formulation to the physical dispatch problem.

Component	Symbol	Power-System Meaning
Global state	$s(t)$	Network status, nodal injections, prices, renewable realization, reserve margin, operating conditions
Local observation	$o_{i}(t)$	Local measurements and delayed neighborhood information available to agent $i$
Action	$a_{i}(t)$	Generation adjustment, reserve, charge/discharge, demand response, curtailment control
Transition	$P$	Grid evolution under renewable uncertainty, demand fluctuation, and control actions
Reward	$r(t)$	Economic, reliability, curtailment, carbon, and security objectives combined
Constraint set	$g_{k}$	Power balance, capacity, ramping, transmission, reserve, and feasibility constraints

Table 3. Test-system and benchmark configuration used in all experiments.

Item	Configuration
Network backbone	Modified IEEE 30-bus; 41 transmission branches
Thermal fleet	6 units; Pmin 6–18 MW; Pmax 32–82 MW; ramp 8–16 MW/15 min
Wind plants	Buses 7, 19, and 27; capacities 58, 42, and 38 MW
System demand	Mean 224.8 MW; peak 299.8 MW; trough 175.0 MW
Reserve formulation	0.085 × load + 0.12 × wind output
Time horizon	90 days; 15 min resolution; 8640 intervals
Market proxy	Average LMP proxy 56.6 USD/MWh
Trace grounding	ERCOT hourly load archive + ERCOT market-price traces + WIND Toolkit wind power series remapped to the 30-bus backbone
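The reserve formulation and time horizon in Table 3 can be checked with a short calculation. The sketch below evaluates the stated rule at the mean and peak demand. Pairing peak demand with all three wind plants at rated output is an assumed coincidence used only to bound the requirement, and the function name is illustrative.

```python
LOAD_COEFF = 0.085  # share of load held as reserve (Table 3)
WIND_COEFF = 0.12   # share of wind output held as reserve (Table 3)
INTERVALS_PER_DAY = 24 * 60 // 15  # 15 min resolution


def reserve_requirement_mw(load_mw, wind_mw):
    """Reserve requirement in MW from the Table 3 formulation."""
    return LOAD_COEFF * load_mw + WIND_COEFF * wind_mw


total_wind_capacity = 58 + 42 + 38  # MW, buses 7, 19, and 27
mean_case = reserve_requirement_mw(224.8, 0.0)
peak_case = reserve_requirement_mw(299.8, total_wind_capacity)
horizon_intervals = 90 * INTERVALS_PER_DAY
```

With no wind output, mean demand implies about 19.1 MW of reserve. The bounding case gives about 42.0 MW, and 90 days at 96 intervals per day reproduces the 8640 intervals listed in the table.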

Table 4. Scenario definition and chronological data split.

Element	Quantitative Definition
Normal scenario	|1 h wind ramp| ≤ 18.0 MW and congestion index ≤ 12.45
High-wind-volatility scenario	|1 h wind ramp| > 32.9 MW (85%)
Congestion-sensitive scenario	Congestion index > 13.93 (90%)
Training partition	Days 1–60 (5760 intervals)
Validation partition	Days 61–75 (1440 intervals)
Test partition	Days 76–90 (1440 intervals)
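The partition sizes and scenario thresholds in Table 4 translate into a few lines of code. The sketch below is a hedged reading of the table: the function names are hypothetical, and treating the three scenario tags as independent, non-exclusive labels is an assumption, since the table does not say how overlapping or intermediate intervals are handled.

```python
INTERVALS_PER_DAY = 96  # 15 min resolution


def partition(day):
    """Chronological partition for a 1-based day index (Table 4)."""
    if 1 <= day <= 60:
        return "train"
    if 61 <= day <= 75:
        return "validation"
    if 76 <= day <= 90:
        return "test"
    raise ValueError("day must be in 1..90")


def scenario_labels(wind_ramp_1h_mw, congestion_index):
    """Scenario tags for one interval; the tags are not mutually exclusive."""
    labels = []
    ramp = abs(wind_ramp_1h_mw)
    if ramp <= 18.0 and congestion_index <= 12.45:
        labels.append("normal")
    if ramp > 32.9:
        labels.append("high_wind_volatility")
    if congestion_index > 13.93:
        labels.append("congestion_sensitive")
    return labels


sizes = {}
for day in range(1, 91):
    name = partition(day)
    sizes[name] = sizes.get(name, 0) + INTERVALS_PER_DAY
```

Under these thresholds, an interval with a ramp between 18.0 and 32.9 MW and a congestion index at or below 13.93 receives no tag, so the three scenarios do not partition the full horizon.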

Table 5. Dispatch-oriented comparison across baseline methods.

Method	Operating Cost (10³ USD/Day)	Curtailment Rate (%)	Load Shedding (%)	Overload Rate (%)	Ramp Violation Rate	Carbon Intensity (tCO₂/MWh)	Reward
ARIMA + rule-based dispatch	528.4	8.9	1.4	2.1	1.8	0.69	−64.3
LSTM + rule-based dispatch	507.6 ± 5.5	7.7 ± 0.3	0.9 ± 0.1	1.6 ± 0.2	1.2 ± 0.1	0.65 ± 0.01	−55.8 ± 1.4
ST-GCN	494.2 ± 4.1	6.6 ± 0.2	0.5 ± 0.1	0.9 ± 0.1	0.8 ± 0.1	0.61 ± 0.01	−47.5 ± 1.2
Transformer	499.8 ± 4.6	6.9 ± 0.3	0.6 ± 0.1	1.0 ± 0.1	0.9 ± 0.1	0.62 ± 0.01	−49.1 ± 1.3
Hybrid GNN-RL baseline	482.7 ± 3.3	5.9 ± 0.2	0.3 ± 0.1	0.6 ± 0.1	0.5 ± 0.1	0.58 ± 0.01	−41.2 ± 0.9
Proposed method	468.9 ± 2.8	4.8 ± 0.2	0.1 ± 0.0	0.3 ± 0.0	0.2 ± 0.0	0.55 ± 0.01	−36.4 ± 0.8

Table 6. Comparison with strong MARL and uncertainty-aware optimization baselines.

Method	Category	Operating Cost (10³ USD/Day)	Curtailment Rate (%)	Overload Rate (%)	Reward
MADDPG	MARL	479.8 ± 3.9	5.6 ± 0.2	0.51 ± 0.04	−40.1 ± 1.1
QMIX	MARL	476.2 ± 3.4	5.4 ± 0.2	0.44 ± 0.03	−39.2 ± 0.9
MAPPO	MARL	474.6 ± 3.1	5.2 ± 0.2	0.41 ± 0.03	−38.4 ± 0.9
Interval-uncertainty OPF	Optimization	486.9	5.8	0.38	−41.6
Chance-constrained SCED	Optimization	489.7	6.1	0.46	−42.8
Proposed method	Unified framework	468.9 ± 2.8	4.8 ± 0.2	0.30 ± 0.02	−36.4 ± 0.8

Table 7. External validation under public-trace remapping scenarios.

Scenario	Source-Trace Mix	Operating Cost (10³ USD/Day)	Curtailment (%)	Overload Rate (%)	Reward
Base remapped case	ERCOT load + ERCOT RTM price + WIND Toolkit wind	472.1 ± 3.0	5.1 ± 0.2	0.36 ± 0.03	−37.5 ± 0.8
High-volatility remapped case	ERCOT load + high-ramp wind subset + RTM price	481.3 ± 4.1	5.9 ± 0.3	0.48 ± 0.04	−40.2 ± 1.0
Winter-peak remapped case	ERCOT winter-peak load + RTM price + WIND Toolkit wind	478.7 ± 3.6	5.5 ± 0.2	0.42 ± 0.03	−39.1 ± 0.9

Table 8. Privacy-utility trade-off and membership-inference results across privacy budgets.

Noise Multiplier	Epsilon	Delta	Operating Cost (10³ USD/Day)	Curtailment Rate (%)	N-1 Secure Ratio	Attack AUC
0.0 (non-private)	∞	0	467.6	4.7	0.957	0.73
1.2	0.9	10⁻⁵	476.8	5.8	0.921	0.54
1.0	1.7	10⁻⁵	472.9	5.2	0.944	0.56
0.8	3.1	10⁻⁵	468.9	4.8	0.956	0.57
0.6	5.8	10⁻⁵	468.2	4.7	0.957	0.63

Table 9. SCM-based intervention estimates with bootstrap confidence intervals, placebo checks, and sensitivity statistics.

Intervention	Outcome	ATE	95% CI	Placebo Effect	Sensitivity Rho
do(Wind high)	Operating cost	−7.7 USD/MWh	[−8.5, −6.8]	−0.4	0.18
do(Wind high)	Congestion risk	−0.07	[−0.09, −0.05]	−0.01	0.15
do(Load high)	Operating cost	+6.4 USD/MWh	[+5.6, +7.2]	+0.3	0.21
do(Line stress)	Curtailment	+1.9%	[+1.4, +2.3]	+0.1	0.19

Table 10. Ablation study of core dispatch performance on the modified IEEE 30-bus benchmark.

Variant	RMSE	Operating Cost (10³ USD/Day)	Curtailment (%)	Overload Rate (%)	Reward
Full model	3.82 ± 0.05	468.9 ± 2.8	4.8 ± 0.2	0.30 ± 0.02	−36.4 ± 0.8
w/o dynamic residual graph	4.37 ± 0.08	479.5 ± 3.1	5.7 ± 0.2	0.52 ± 0.03	−40.7 ± 0.9
w/o KL symbiosis	4.58 ± 0.09	486.1 ± 3.5	6.4 ± 0.3	0.73 ± 0.04	−43.9 ± 1.0
w/o federated RDP	3.78 ± 0.05	467.6 ± 2.6	4.7 ± 0.2	0.29 ± 0.02	−36.0 ± 0.7
w/o causal module	3.89 ± 0.06	471.4 ± 2.9	5.0 ± 0.2	0.34 ± 0.02	−37.2 ± 0.8

Table 11. Module-specific diagnostics for graph validity, coordination quality, privacy leakage, and intervention stability in the ablation study.

Variant	Topology Consistency	N-1 Secure Ratio	Collab. Efficiency	Reserve Mismatch (MW)	Attack AUC	Wind-Cost Stability	Placebo Effect
Full model	0.92	0.956	0.97	1.2	0.57	0.86	−0.3
w/o dynamic residual graph	0.81	0.921	0.94	1.8	0.57	0.80	−0.4
w/o KL symbiosis	0.91	0.948	0.81	2.9	0.57	0.78	−0.3
w/o federated RDP	0.92	0.957	0.97	1.2	0.73	0.86	−0.3
w/o causal module	0.92	0.955	0.96	1.3	0.57	0.61	−0.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

