Optimization Method for Distribution Networks with High Penetration of Renewable Energy Based on Deep Scenario Generation and Data-Driven Approaches

Ma, Guozhen; Pang, Ning; Hu, Shiyao; Wang, Yunjia; Han, Chong; Liao, Siyang

doi:10.3390/en19133070


by Guozhen Ma ¹, Ning Pang ¹, Shiyao Hu ¹, Yunjia Wang ¹, Chong Han ²,* and Siyang Liao ²

¹ Economic and Technological Research Institute of State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050023, China

² Wuhan Longde Control Technology Co., Ltd., Wuhan 430010, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(13), 3070; https://doi.org/10.3390/en19133070 (registering DOI)

Submission received: 25 May 2026 / Revised: 17 June 2026 / Accepted: 24 June 2026 / Published: 29 June 2026


Abstract

With the increasing penetration of distributed renewable energy sources, such as photovoltaic and wind power, their strong randomness and volatility pose significant challenges to distribution network operation and control. Simultaneously, missing and noisy source-load data in practical distribution network operation further constrain the accuracy of optimization decisions. To address these issues, this paper proposes a data-driven optimization method that integrates low-rank limited-information reconstruction, WGAN-GP-based scenario generation, and source–storage–load coordinated dispatch. Firstly, a low-rank matrix completion model solved by singular value thresholding (SVT) is used to reconstruct incomplete photovoltaic and load profiles. Secondly, a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) is trained on the reconstructed dataset to generate renewable-output scenarios, and five representative scenarios are retained through conditional scenario matching and averaging. Finally, a mixed-integer linear programming (MILP) dispatch model is established by considering energy-storage operating constraints, demand response constraints, and time-of-use electricity prices. The numerical case uses 60 daily profiles with 24 hourly points per day and a 20% random missing-data setting. Case study results show that the proposed reconstruction method reduces the overall RMSE from 177.15 kW to 52.40 kW compared with zero-fill processing. The coordinated dispatch decreases the daily operating cost from 10,060.36 CNY to 9414.67 CNY, corresponding to a 6.42% cost reduction. The limitations of the single-test-day benchmark and simplified active-power dispatch validation are also discussed.

Keywords: distribution network with high penetration of renewable energy; data-driven optimization; low-rank matrix completion; WGAN-GP; source–storage–load coordinated optimization; mixed-integer linear programming

1. Introduction

Driven by the “dual carbon” policy, environmental demands, and energy strategies, the installed capacity of distributed generation (DG), particularly photovoltaic and wind power, in distribution networks has grown rapidly [1,2,3]. The high penetration of renewable energy alters the power flow distribution in distribution networks, and its inherent intermittency and volatility further complicate the operating boundaries [4,5]. How to accurately characterize source-load uncertainties and fully exploit the flexible resources within the system for coordinated control is the key to ensuring the secure and economic operation of distribution networks with a high share of renewables.

Distribution network optimization is an essential approach to ensuring the secure and economic operation of distribution networks. Reference [6] proposed a reactive power optimization model for distribution networks that considers the disorderly charging of electric vehicles, achieving loss reduction and voltage regulation; however, it did not fully account for the synergistic effects of other distributed resources such as energy storage. Currently, researchers have conducted studies on the optimal operation of photovoltaic (PV) systems and energy storage in distribution networks. Reference [7] developed a two-stage robust optimization model incorporating PV integration, enabling centralized optimal dispatch of PV inverters. Reference [8] analyzed the current margin of PV inverters and proposed a voltage support strategy for distribution networks, effectively enhancing voltage stability. Reference [9] established active–reactive power output models of PV inverters under different control strategies and verified that the optimal dispatch strategy provides the maximum regulation range, thereby reducing distribution network losses and PV curtailment. However, the aforementioned studies have not yet thoroughly investigated the operational methods based on the coordinated optimization of the entire source–storage–load chain. References [10,11] studied voltage optimization control in distribution networks with PV and energy storage coordination, ensuring secure and stable operation, but they lacked in-depth exploitation of demand response (DR) resources. Reference [12] proposed a multi-time-scale optimal dispatch model incorporating multiple types of distributed resources, effectively coordinating the operation of active and reactive power resources; nevertheless, it failed to adequately consider the impact of measurement data quality on dispatch decisions.

Data serve as the foundation for optimal operation. However, in actual distribution networks, multi-source heterogeneous data (such as PV output, load, and meteorological data) frequently suffer from missing or abnormal values due to measurement equipment failures, communication delays, and other issues, directly compromising the accuracy of model solutions [13,14]. Reference [15] points out that data quality problems have become a bottleneck restricting the refined regulation of distribution networks. For uncertainty modeling, data-driven methods, particularly those based on scenario generation techniques, have been widely adopted. Reference [16] employed the K-means clustering method to generate operational scenarios; however, conventional clustering methods typically assume complete input data and lack robust handling of missing data, and they struggle to accurately capture the complex probabilistic distribution characteristics and extreme fluctuation features of renewable energy output. In recent years, deep generative models represented by generative adversarial networks (GANs) have shown significant potential in simulating complex data distributions [17,18]. Reference [19] attempted to use GANs to generate wind and solar power output scenarios, but the original GAN suffers from issues such as training instability and mode collapse, and it seldom considers the spatiotemporal correlations among multiple variables. Although the improved Wasserstein generative adversarial network (WGAN) enhances training stability, its application in power system scenario generation still requires further exploration [20].

Regarding model solving, the coordinated optimization of distribution networks constitutes a non-convex mixed-integer nonlinear programming problem due to the presence of power flow constraints and integer variables, making it difficult to solve directly with off-the-shelf solvers. Current solution methods for this problem mainly fall into two categories: heuristic algorithms and numerical analysis methods. Reference [21] adopted an improved particle swarm optimization algorithm to solve the active distribution network optimization model; however, heuristic algorithms cannot guarantee the global optimality of the solution. Numerical analysis methods based on second-order cone relaxation (SOCR) or linearization (LinDistFlow) have been more widely applied because of their efficiency and stability [22]. Reference [23] verified the effectiveness of linearization methods in handling optimal power flow problems involving discrete adjustable devices.

In view of this, this paper proposes an optimal operation method for distribution networks with high penetration of renewable energy based on deep scenario generation and a data-driven approach. The scientific novelty of this work lies in the integrated treatment of three coupled issues that are usually handled separately: incomplete measurement data, renewable/load scenario uncertainty, and source–storage–load coordinated operation. The main contributions are summarized as follows: (1) a low-rank limited-information reconstruction model is embedded before scenario generation to reduce the impact of missing and abnormal measurements on downstream dispatch; (2) a conditional WGAN-GP model is used to capture nonlinear spatiotemporal correlations of source-load data, and quantitative scenario-quality metrics are introduced rather than relying only on visual inspection; and (3) the generated typical scenarios are directly incorporated into a MILP-based coordinated dispatch model with explicit energy-storage, demand-response, voltage, thermal-limit, and piecewise-linear loss constraints.

Recent studies have also explored alternative uncertainty modeling approaches. VAE and VAE-GAN models can learn latent representations of renewable-output distributions, Transformer-based structures can strengthen long-range temporal feature extraction, and diffusion models have recently been introduced to reproduce multi-scale renewable fluctuation patterns [24,25,26,27,28]. Meanwhile, robust optimization and stochastic programming provide different ways to hedge uncertainty in dispatch decisions [29], but their performance still depends heavily on the quality of input scenarios and measurement data. Therefore, the key issue addressed in this work is not to replace these mature methods individually, but to form a unified workflow that links data-quality enhancement, uncertainty scenario generation, and operational dispatch decision-making under a consistent distribution-network optimization framework.

2. Data Preprocessing and Deep Scenario Generation Based on Limited Information Reconstruction

To address the challenges of source-load uncertainty modeling and missing measurement data in distribution networks with high penetration of renewable energy, this section proposes a two-stage data processing and scenario generation framework. The framework first employs low-rank matrix completion to reconstruct historical multi-source heterogeneous data. Then, based on the completed dataset, it uses an improved Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to generate source-load power scenarios that are consistent with actual physical characteristics. The specific process is illustrated in Figure 1.

2.1. Construction of Multi-Source Heterogeneous Data Matrix

The operating state of a distribution network is subject to the coupled influences of multidimensional factors, including meteorological conditions, user behavior, and equipment characteristics. To comprehensively capture the operational features of the system, it is first necessary to acquire multi-source data from SCADA systems and meteorological stations.

2.1.1. Data Acquisition

The data sources selected in this paper include distributed generation output data, i.e., historical active power sequences of distributed photovoltaic (PV) and wind power (WT) at each node of the distribution network; load power data, i.e., historical active and reactive load sequences at each load node; and meteorological data, i.e., key meteorological factors such as solar irradiance, ambient temperature, and wind speed at the corresponding time intervals.

For the numerical study, a reproducible benchmark dataset is constructed to emulate source-load operation in a renewable-rich distribution feeder. The photovoltaic output profile is generated from a daily sinusoidal irradiance-like curve with random cloud attenuation and Gaussian noise, and the load profile is generated from single- and double-frequency daily components with random load fluctuations. A total of 60 daily profiles are generated, and each day contains 24 hourly sampling points. Random missing masks are imposed on the PV matrix to simulate communication packet loss and sensor abnormalities. This setting is used to verify the proposed data reconstruction, scenario generation, and dispatch workflow. The generalized framework can also be applied to field SCADA and meteorological datasets when such data are available. The dataset description and validation protocol are summarized in Table 1.
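The benchmark construction described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `make_benchmark`, the 06:00 to 18:00 daylight window, the cloud-attenuation range, the noise levels, and the 3000 kW load base are assumptions chosen for readability, not the values used in the paper.

```python
import numpy as np

def make_benchmark(n_days=60, n_hours=24, pv_peak_kw=800.0,
                   missing_rate=0.2, seed=0):
    """Return (pv, load, pv_missing), each of shape (n_days, n_hours)."""
    rng = np.random.default_rng(seed)
    hours = np.arange(n_hours)

    # Irradiance-like half-sine between 06:00 and 18:00, zero at night (assumed window).
    clear_sky = np.clip(np.sin(np.pi * (hours - 6) / 12.0), 0.0, None)
    # One random cloud-attenuation factor per day plus hourly Gaussian noise.
    cloud = rng.uniform(0.5, 1.0, size=(n_days, 1))
    pv = pv_peak_kw * cloud * clear_sky
    pv += rng.normal(0.0, 0.02 * pv_peak_kw, size=pv.shape) * (clear_sky > 0)
    pv = np.clip(pv, 0.0, None)

    # Load: base level with single- and double-frequency daily components.
    w = 2.0 * np.pi * hours / n_hours
    shape = 1.0 + 0.25 * np.sin(w - 2.0) + 0.10 * np.sin(2.0 * w)
    load = 3000.0 * shape * (1.0 + rng.normal(0.0, 0.03, size=(n_days, n_hours)))

    # Random missing mask on the PV matrix: NaN marks a lost sample.
    lost = rng.random(pv.shape) < missing_rate
    pv_missing = np.where(lost, np.nan, pv)
    return pv, load, pv_missing

pv, load, pv_missing = make_benchmark()
```

Fixing the random seed keeps the generated profiles and the missing mask reproducible across runs.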

2.1.2. Matrix Construction and Missing Data Identification

Assume that the data collection spans $T$ time steps and that $N$ characteristic variables are collected. After normalizing the aforementioned multi-source heterogeneous data, a high-dimensional spatiotemporal data matrix is constructed:

$$X = [x_{1}, x_{2}, \dots, x_{T}] \tag{1}$$

where $X \in \mathbb{R}^{N \times T}$ represents the high-dimensional spatiotemporal data matrix, and $x_{t} = [x_{1,t}, x_{2,t}, \dots, x_{N,t}]^{\mathrm{T}}$ denotes the multi-dimensional data vector collected at time $t$. Due to communication packet loss, sensor faults, or transmission delays, some elements in $X$ are often missing. A mask matrix $\Omega \in \{0, 1\}^{N \times T}$ is defined to indicate the observation status of the data:

$$\Omega_{i,j} = \begin{cases} 1, & X_{i,j} \text{ is observed} \\ 0, & X_{i,j} \text{ is missing} \end{cases} \tag{2}$$
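As a minimal sketch of Equations (1) and (2), the snippet below builds the normalized matrix and the observation mask from a raw array in which lost samples are stored as NaN. The helper name `build_matrix_and_mask` and the per-variable max normalization are assumptions; the paper does not state which normalization it applies.

```python
import numpy as np

def build_matrix_and_mask(raw):
    """raw: (N, T) array with NaN for lost samples -> (X_norm, mask, scale)."""
    raw = np.asarray(raw, dtype=float)
    mask = (~np.isnan(raw)).astype(float)          # Eq. (2): 1 observed, 0 missing
    # Per-variable maximum over observed samples; fall back to 1 for empty/zero rows.
    row_max = np.nanmax(np.abs(raw), axis=1, keepdims=True)
    scale = np.where(row_max > 0, row_max, 1.0)
    X_norm = np.where(mask > 0, raw / scale, 0.0)  # missing entries are parked at 0
    return X_norm, mask, scale

raw = np.array([[1.0, np.nan, 4.0],
                [2.0, 2.0, np.nan]])
X_norm, mask, scale = build_matrix_and_mask(raw)
```

The value stored at a missing position is irrelevant to the reconstruction, because the mask removes it from the fitting term.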

2.2. Limited Information Reconstruction Based on Low-Rank Matrix Factorization

Operational data of power systems, such as PV output and load profiles, exhibit significant periodicity and spatiotemporal correlations. For example, load variation trends at adjacent nodes tend to be similar, and PV output within the same area is influenced by identical meteorological conditions. This strong correlation gives the data matrix $X$ a low-rank property in the mathematical sense.

2.2.1. Optimization Reconstruction Model

Based on this property, the problem of missing data completion is transformed into a low-rank matrix completion problem [30]. The objective is to find a complete matrix $L$ with a low-rank structure that approximates the original measurement matrix $X$ as closely as possible at the observed positions.

However, whether an actual multi-source heterogeneous data matrix strictly satisfies the global low-rank assumption depends on the degree of internal physical coupling within the data. Through singular value decomposition (SVD) of historical data matrices from actual distribution networks, it can be observed that the singular value distribution typically exhibits a pronounced “long-tail effect”—that is, the first few larger singular values contain the vast majority of the principal component energy of the data matrix, while the numerous small singular values in the tail are primarily caused by random noise and local anomalous disturbances. This highly concentrated energy phenomenon provides an objective theoretical basis for low-rank matrix factorization.
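The energy concentration described above can be checked numerically before applying the low-rank model. The sketch below computes the cumulative share of squared singular values; the function name `energy_rank` and the 95% threshold are illustrative assumptions.

```python
import numpy as np

def energy_rank(X, energy=0.95):
    """Smallest r whose leading singular values hold `energy` of sum(s**2)."""
    s = np.linalg.svd(X, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1), cum

# A rank-1 matrix concentrates all of its energy in the first singular value.
r, cum = energy_rank(np.outer(np.arange(1.0, 6.0), np.arange(1.0, 9.0)))
```

A small `r` relative to the matrix dimensions supports the low-rank assumption for that dataset.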

Moreover, considering that in addition to random missing values, actual measurements may also contain sparse gross errors induced by sensor faults or communication interference, the nuclear norm minimization combined with the singular value thresholding (SVT) algorithm adopted in this model is inherently consistent with the philosophy of robust principal component analysis (Robust PCA). By applying soft-thresholding shrinkage to the singular values, the model can not only infer missing entries by exploiting the low-rank structure, but also effectively filter out the small singular values representing random noise, thereby suppressing to a certain extent the interference of non-globally low-rank anomalous noise on the reconstruction results. The following convex optimization model is constructed:

$$\min_{L} \; \frac{1}{2} \| \Omega \circ (X - L) \|_{F}^{2} + \lambda \| L \|_{*} \tag{3}$$

where $L \in \mathbb{R}^{N \times T}$ denotes the complete data matrix to be reconstructed; $\circ$ represents the Hadamard product, i.e., element-wise multiplication of matrices, used to select the known data entries; $\| \cdot \|_{F}$ is the Frobenius norm, and the first term measures the fitting error between the reconstructed matrix and the original data at the observed positions; $\| L \|_{*}$ is the nuclear norm, i.e., the sum of all singular values of $L$, which serves as a convex relaxation of the rank function and promotes a low-rank $L$; and $\lambda > 0$ is the regularization coefficient, which balances the data-fitting accuracy against the rank (model complexity) of $L$.

2.2.2. Model Solution Method

The above model is solved by the singular value thresholding (SVT) algorithm. The normalized missing-data matrix is iteratively updated using the observed-entry mask, followed by singular value shrinkage. The shrinkage threshold is set to τ = 0.5, the update step is δ = 1.2, the maximum number of iterations is 100, and the stopping tolerance is 1.0 × 10⁻⁴ for the residual on observed entries. These parameter settings are kept fixed in the numerical study to ensure the reproducibility of the reconstruction process, and the detailed reconstruction parameters are listed in Table 2.
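A minimal NumPy sketch of one possible SVT-style iteration is given below. The paper does not state its exact update rule, so the proximal-gradient form used here (a masked gradient step of size δ followed by soft-thresholding of the singular values at τ, which corresponds to λ = τ/δ in Equation (3)) is an assumption, as are the function name `svt_complete` and the rank-2 test matrix.

```python
import numpy as np

def svt_complete(X_obs, mask, tau=0.5, delta=1.2, max_iter=100, tol=1e-4):
    """Fill entries of X_obs where mask == 0 with a low-rank estimate."""
    mask = np.asarray(mask, dtype=float)
    X0 = np.where(mask > 0, X_obs, 0.0)        # ignore whatever sits at missing cells
    L = np.zeros_like(X0)
    norm_obs = np.linalg.norm(X0) + 1e-12
    for _ in range(max_iter):
        # Gradient step on the data-fit term, restricted to observed entries.
        Y = L + delta * mask * (X0 - L)
        # Soft-threshold the singular values (proximal step of the nuclear norm).
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        L = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Stop on the relative residual over observed entries.
        if np.linalg.norm(mask * (X0 - L)) / norm_obs < tol:
            break
    return L

rng = np.random.default_rng(0)
truth = rng.random((24, 2)) @ rng.random((2, 60))    # synthetic rank-2 matrix
truth /= truth.max()
mask = (rng.random(truth.shape) > 0.2).astype(float)  # about 20% missing
L_hat = svt_complete(truth * mask, mask)

missing = mask == 0
rmse_zero = np.sqrt(np.mean(truth[missing] ** 2))
rmse_svt = np.sqrt(np.mean((truth - L_hat)[missing] ** 2))
```

Because the threshold is positive, the residual on observed entries does not vanish, so with these settings the loop typically runs for the full iteration budget rather than stopping on the tolerance.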

2.3. Deep Scenario Generation Based on WGAN-GP

To incorporate source-load uncertainties into the optimization model, it is essential to generate typical scenarios that can reflect their stochastic fluctuation characteristics. Traditional probabilistic modeling methods struggle to capture the nonlinear coupling among multiple variables, and the original generative adversarial network (GAN) suffers from issues such as training instability and mode collapse. To this end, this paper establishes an improved Wasserstein generative adversarial network with gradient penalty (WGAN-GP) model.

2.3.1. Model Architecture Design

This paper employs the PyTorch deep learning framework to construct the WGAN-GP model, and its training structure is shown in Figure 2. The model consists of two adversarial neural networks: the generator (G) and the discriminator (D). To enable the model to capture the spatial coupling among multiple nodes and the multi-timescale temporal dependencies even though its only stochastic input is random noise, the network structure is designed as follows:

Generator (G) network structure design: The input to the generator is a random noise vector $z$ following a standard normal distribution together with a conditional label vector $c$ (such as season, weather type, and time information). The generator adopts a hybrid architecture of multiple fully connected layers and one-dimensional deconvolutional (1D-Deconv) layers. First, the low-dimensional noise is concatenated with the conditional labels and mapped to a high-dimensional latent variable through multiple fully connected layers. Because this joint mapping processes the hidden states of all physical nodes (source and load) simultaneously, it implicitly extracts the spatial cross-correlations among the nodes in the distribution network. Subsequently, the latent variable is reshaped into time-series features and fed into multiple one-dimensional deconvolutional layers with prescribed strides. Because the convolutional kernel slides along the temporal dimension and the receptive field grows layer by layer, the model can capture the autocorrelation and temporal inertia of PV output and load fluctuations. Through this structure, the generator maps noise into simulated source-load power samples $x_{\mathrm{fake}}$ with spatiotemporal correlation features:

$$x_{\mathrm{fake}} = G(z, c; \theta_{G}) \tag{4}$$

where $\theta_{G}$ denotes the network parameters of the generator.

Discriminator (D) network structure design: The input to the discriminator is either the real spatiotemporal sample $x_{\mathrm{real}}$ or the generated sample $x_{\mathrm{fake}}$, together with the corresponding conditional label $c$. Mirroring the generator, the discriminator combines one-dimensional convolutional neural network (1D-CNN) layers with fully connected layers as the feature extractor. The input multi-node spatiotemporal power matrix first passes through the 1D-CNN layers, where convolutional kernels with large receptive fields capture global temporal trends, and subsequent layers progressively extract detailed features of sharp local fluctuations. Finally, the output is flattened and passed through a fully connected layer to yield a scalar score $y$:

$$y = D(x, c; \theta_{D}) \tag{5}$$

where $\theta_{D}$ denotes the network parameters of the discriminator. This structure enables the discriminator to evaluate the similarity between the generated scenarios and the real historical data jointly across the spatial and temporal dimensions.

2.3.2. Loss Function and Gradient Penalty

WGAN replaces the JS divergence used in traditional GANs with the Wasserstein distance (also known as the Earth-Mover distance), which mitigates the vanishing-gradient problem. To enforce the 1-Lipschitz continuity constraint required for computing the Wasserstein distance, a gradient penalty term is introduced. The overall objective function is formulated as follows:

$$L = L_{1} + L_{2} \tag{6}$$

$$L_{1} = \mathbb{E}_{\tilde{x} \sim P_{g}} [D(\tilde{x})] - \mathbb{E}_{x \sim P_{r}} [D(x)] \tag{7}$$

$$L_{2} = \beta \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}} \left[ \left( \| \nabla_{\hat{x}} D(\hat{x}) \|_{2} - 1 \right)^{2} \right] \tag{8}$$

The overall objective function comprises the Wasserstein distance term $L_{1}$ and the gradient penalty term $L_{2}$, where $P_{r}$ denotes the probability distribution of real data; $P_{g}$ denotes the distribution of the generated data; $\hat{x}$ denotes a randomly interpolated sample point on the line between real and generated samples; $\nabla_{\hat{x}} D(\hat{x})$ denotes the gradient of the discriminator at the interpolated point; and $\beta$ denotes the gradient penalty coefficient. The detailed WGAN-GP training hyperparameters are summarized in Table 3.
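The objective in Equations (6)–(8) can be sketched in PyTorch as follows. This is a hedged illustration rather than the authors' training code: the function name `critic_objective`, the flat `(batch, features)` input layout, the small fully connected critic, and the value β = 10 are assumptions, and the conditional label c is omitted for brevity.

```python
import torch
import torch.nn as nn

def critic_objective(critic, x_real, x_fake, beta=10.0):
    """Return (L, L1, L2) of Eqs. (6)-(8) for 2-D batches of shape (B, F)."""
    x_fake = x_fake.detach()            # critic step: no gradient into the generator
    l1 = critic(x_fake).mean() - critic(x_real).mean()            # Eq. (7)

    # Random point on the segment between each real/generated pair.
    eps = torch.rand(x_real.size(0), 1, device=x_real.device)
    x_hat = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)
    d_hat = critic(x_hat)
    grad = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                               grad_outputs=torch.ones_like(d_hat),
                               create_graph=True)[0]
    l2 = beta * ((grad.norm(2, dim=1) - 1.0) ** 2).mean()         # Eq. (8)
    return l1 + l2, l1, l2

torch.manual_seed(0)
critic = nn.Sequential(nn.Linear(24, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))
x_real, x_fake = torch.rand(8, 24), torch.rand(8, 24)
loss, l1, l2 = critic_objective(critic, x_real, x_fake)
loss.backward()                         # gradients flow through the penalty term
```

Passing `create_graph=True` is what allows the penalty, which itself depends on a gradient, to be differentiated with respect to the critic parameters.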

2.3.3. Scenario Generation and Reduction

The reconstructed complete dataset is used to train the WGAN-GP model. After training, 1000 candidate PV scenarios are generated from random noise. For the tested day, the generated scenario pool is matched to the reconstructed target-day PV profile by the Euclidean distance, and the five closest generated scenarios are averaged to obtain the representative WGAN-GP scenario used in the dispatch calculation. In addition, a K-value sensitivity script evaluates the trade-off between scenario-reduction representation error and the estimated variable scale of the IEEE 33-bus MILP model for K = 1, 3, 5, 10, 20, and 30.
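The matching-and-averaging step can be sketched as below, assuming the generated pool is stored as an array with one 24-point scenario per row. The function name `representative_scenario` and the toy pool are illustrative assumptions.

```python
import numpy as np

def representative_scenario(pool, target, k=5):
    """pool: (S, 24) generated scenarios; target: (24,) reconstructed profile."""
    dist = np.linalg.norm(pool - target, axis=1)   # Euclidean distance per scenario
    nearest = np.argsort(dist)[:k]                 # indices of the k closest scenarios
    return pool[nearest].mean(axis=0), nearest, dist[nearest]

# Toy pool: scenario i is a flat profile at level i, and the target is flat at zero.
pool = np.arange(10.0)[:, None] * np.ones((10, 24))
rep, nearest, d = representative_scenario(pool, np.zeros(24), k=5)
```

In the toy case the five closest scenarios are levels 0 to 4, so the averaged representative profile is flat at 2.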

3. Source–Storage–Load Coordinated Optimization Model

Based on the K typical source-load uncertainty scenarios and their occurrence probabilities generated in Section 2, this section establishes a stochastic optimization model for distribution networks considering source–storage–load coordination.

3.1. Objective Function

With the distribution system operator (DSO) as the decision-making entity, the objective function F is formulated to minimize the expected value of the total system operating cost over the optimization period T. The cost components include the cost of purchasing electricity from the upstream grid, the cost of network losses, the operating degradation cost of energy storage systems, and the demand response compensation cost.

To maintain MILP tractability, the network-loss term is not optimized as a quadratic expression of branch active and reactive power. Instead, branch losses are represented by an auxiliary variable and a piecewise-linear envelope around the LinDistFlow operating range. In this way, the loss-related cost remains linear in the optimization variables while still penalizing high-flow branches. The quadratic expression in the physical loss definition is therefore used only to define the loss proxy and to calibrate the piecewise-linear coefficients.

$$\min F = \sum_{k=1}^{K} \pi_{k} \left[ \sum_{t=1}^{T} \left( C_{\mathrm{grid},t,k} + C_{\mathrm{loss},t,k} + C_{\mathrm{ess},t,k} + C_{\mathrm{dr},t,k} \right) \right] \tag{9}$$

where $k$ is the scenario index, and $\pi_{k}$ denotes the occurrence probability of scenario $k$. The detailed mathematical expressions of each cost component are as follows:

Electricity purchase cost:

$$C_{\mathrm{grid},t,k} = \lambda_{t}^{\mathrm{TOU}} \cdot P_{\mathrm{grid},t,k} \cdot \Delta t \tag{10}$$

where $\lambda_{t}^{\mathrm{TOU}}$ denotes the time-of-use electricity price at period $t$ (yuan/kWh); $P_{\mathrm{grid},t,k}$ denotes the active power exchanged between the root node of the distribution network and the upstream grid; and $\Delta t$ denotes the scheduling time interval.

Network loss cost:

$$C_{\mathrm{loss},t,k} = \lambda^{\mathrm{loss}} \sum_{(i,j) \in \Omega_{L}} \frac{R_{ij} \left( P_{ij,t,k}^{2} + Q_{ij,t,k}^{2} \right)}{U_{0}^{2}} \approx \lambda^{\mathrm{loss}} \sum_{(i,j) \in \Omega_{L}} P_{\mathrm{loss},ij,t,k} \tag{11}$$

where $\lambda^{\mathrm{loss}}$ denotes the network-loss penalty coefficient, $\Omega_{L}$ denotes the set of branches, and $P_{\mathrm{loss},ij,t,k}$ denotes the piecewise-linear approximation of the active power loss on branch $(i,j)$ at period $t$ under scenario $k$. In the implementation, a four-segment piecewise-linear envelope is adopted for each branch, so only the linear right-hand side of Equation (11) enters the MILP and no quadratic decision terms are introduced.

Energy storage degradation cost:

$$C_{\mathrm{ess},t,k} = \lambda^{\mathrm{bat}} \left( P_{\mathrm{ess},t,k}^{\mathrm{ch}} + P_{\mathrm{ess},t,k}^{\mathrm{dis}} \right) \Delta t \tag{12}$$

where $\lambda^{\mathrm{bat}}$ denotes the depreciation cost coefficient of energy storage per unit of energy throughput (yuan/kWh); $P_{\mathrm{ess},t,k}^{\mathrm{ch}}$ and $P_{\mathrm{ess},t,k}^{\mathrm{dis}}$ denote the charging and discharging power of the energy storage, respectively.

Demand response cost:

$$C_{\mathrm{dr},t,k} = \lambda^{\mathrm{dr}} \cdot P_{\mathrm{dr},t,k}^{\mathrm{cut}} \cdot \Delta t \tag{13}$$

where $\lambda^{\mathrm{dr}}$ denotes the compensation price per unit of load curtailment, and $P_{\mathrm{dr},t,k}^{\mathrm{cut}}$ denotes the demand response curtailment power implemented during period $t$.
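To make the cost structure of Equations (9)–(13) concrete, the sketch below evaluates the expected cost for given dispatch arrays. The function name `expected_cost`, the array layout, and all numerical values in the example are illustrative assumptions; the loss input is taken as the per-period total of the linearized branch losses.

```python
import numpy as np

def expected_cost(pi, p_grid, p_loss, p_ch, p_dis, p_cut,
                  tou, lam_loss, lam_bat, lam_dr, dt=1.0):
    """Power arrays have shape (K, T); tou has shape (T,); pi has shape (K,)."""
    c_grid = tou * p_grid * dt              # Eq. (10)
    c_loss = lam_loss * p_loss              # Eq. (11), branch losses already summed
    c_ess = lam_bat * (p_ch + p_dis) * dt   # Eq. (12)
    c_dr = lam_dr * p_cut * dt              # Eq. (13)
    per_scenario = (c_grid + c_loss + c_ess + c_dr).sum(axis=1)
    return float(np.dot(pi, per_scenario))  # Eq. (9)

cost = expected_cost(
    pi=np.array([1.0]),
    p_grid=np.array([[100.0, 200.0]]), p_loss=np.array([[1.0, 2.0]]),
    p_ch=np.array([[10.0, 0.0]]), p_dis=np.array([[0.0, 10.0]]),
    p_cut=np.array([[0.0, 5.0]]),
    tou=np.array([0.5, 1.0]), lam_loss=0.5, lam_bat=0.05, lam_dr=0.8,
)
```

For this toy input the four components are 250, 1.5, 1.0, and 4.0, which sum to 256.5.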

3.2. Constraints

3.2.1. Linearized Power Flow Constraints

To avoid the computational difficulty caused by the non-convex and nonlinear AC power flow constraints, this paper adopts a linearized power flow model to describe the physical constraints of the distribution network, thereby transforming the problem into a mixed-integer linear programming (MILP) problem. For each branch $(i, j)$:

$$\begin{cases} \sum_{k \in \delta(j)} P_{jk,t} - \sum_{i \in \pi(j)} \left( P_{ij,t} - P_{\mathrm{loss},ij,t} \right) = P_{\mathrm{inj},j,t} \\ \sum_{k \in \delta(j)} Q_{jk,t} - \sum_{i \in \pi(j)} \left( Q_{ij,t} - Q_{\mathrm{loss},ij,t} \right) = Q_{\mathrm{inj},j,t} \\ U_{j,t} = U_{i,t} - \dfrac{R_{ij} P_{ij,t} + X_{ij} Q_{ij,t}}{U_{0}} \end{cases} \tag{14}$$

The node injection power balance equation is

$$\begin{cases} P_{\mathrm{inj},j,t} = P_{\mathrm{PV},j,t} + P_{\mathrm{ess},j,t}^{\mathrm{dis}} - P_{\mathrm{ess},j,t}^{\mathrm{ch}} - \left( P_{\mathrm{load},j,t} - P_{\mathrm{dr},j,t}^{\mathrm{cut}} \right) \\ Q_{\mathrm{inj},j,t} = Q_{\mathrm{PV},j,t} - Q_{\mathrm{load},j,t} \end{cases} \tag{15}$$

where $P_{ij,t}$ and $Q_{ij,t}$ are the branch active and reactive power flows, respectively; $R_{ij}$ and $X_{ij}$ are the branch resistance and reactance; $U_{j,t}$ is the node voltage magnitude; $\delta(j)$ and $\pi(j)$ are the sets of child and parent nodes of node $j$, respectively; and $U_{0}$ is the base voltage. In Equations (14)–(18), the scenario index is omitted for brevity, and $k$ in the summation over $\delta(j)$ indexes child nodes rather than scenarios.
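The voltage-drop relation in the third line of Equation (14) can be illustrated on a small radial feeder. The sketch below neglects the loss terms, works in per unit, and assumes nodes are numbered so that every parent has a smaller index than its children; the function name `lindistflow_voltages` and the three-node data are hypothetical.

```python
import numpy as np

def lindistflow_voltages(parent, r, x, p_inj, q_inj, u0=1.0):
    """Radial feeder in per unit. Node 0 is the root; parent[j] < j for j >= 1.
    r[j], x[j] belong to the branch parent[j] -> j. Losses are neglected."""
    n = len(parent)
    # Net demand at each node is the negative of its injection.
    p_flow = -np.asarray(p_inj, dtype=float)
    q_flow = -np.asarray(q_inj, dtype=float)
    # Backward sweep: flow into node j = own demand + flows into its children.
    for j in range(n - 1, 0, -1):
        p_flow[parent[j]] += p_flow[j]
        q_flow[parent[j]] += q_flow[j]
    # Forward sweep: linearized voltage drop along each branch.
    u = np.full(n, u0, dtype=float)
    for j in range(1, n):
        u[j] = u[parent[j]] - (r[j] * p_flow[j] + x[j] * q_flow[j]) / u0
    return u, p_flow, q_flow

u, p_flow, q_flow = lindistflow_voltages(
    parent=[-1, 0, 1], r=[0.0, 0.01, 0.02], x=[0.0, 0.01, 0.02],
    p_inj=[0.0, -0.1, -0.2], q_inj=[0.0, -0.05, -0.1],
)
```

In this example the branch into node 1 carries 0.3 + j0.15 p.u., giving voltages of 0.9955 p.u. at node 1 and 0.9895 p.u. at node 2.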

3.2.2. Security Operation Constraints

Node voltages and branch transmission power must satisfy the security limits:

$$\begin{cases} U_{\min} \leq U_{j,t} \leq U_{\max} \\ - S_{ij}^{\max} \leq P_{ij,t} \leq S_{ij}^{\max} \end{cases} \tag{16}$$

3.2.3. Energy Storage System Operation Constraints

Binary state variables $u_{\mathrm{ch},t}$ and $u_{\mathrm{dis},t}$ are introduced to prevent simultaneous charging and discharging of the energy storage:

$$\begin{cases} 0 \leq P_{\mathrm{ess},t}^{\mathrm{ch}} \leq u_{\mathrm{ch},t} P_{\mathrm{ess}}^{\max} \\ 0 \leq P_{\mathrm{ess},t}^{\mathrm{dis}} \leq u_{\mathrm{dis},t} P_{\mathrm{ess}}^{\max} \\ u_{\mathrm{ch},t} + u_{\mathrm{dis},t} \leq 1, \quad u_{\mathrm{ch},t}, u_{\mathrm{dis},t} \in \{0, 1\} \\ E_{\mathrm{ess},t+1} = E_{\mathrm{ess},t} + \left( P_{\mathrm{ess},t}^{\mathrm{ch}} \eta_{\mathrm{ch}} - P_{\mathrm{ess},t}^{\mathrm{dis}} / \eta_{\mathrm{dis}} \right) \Delta t \\ E_{\mathrm{ess}}^{\min} \leq E_{\mathrm{ess},t} \leq E_{\mathrm{ess}}^{\max} \\ E_{\mathrm{ess},0} = E_{\mathrm{ess},T} \end{cases} \tag{17}$$

3.2.4. Demand Response (DR) Constraints

Demand response must satisfy the maximum curtailment ratio and the maximum daily curtailed-energy constraints:

$$\begin{cases} 0 \leq P_{\mathrm{dr},j,t}^{\mathrm{cut}} \leq \alpha_{\mathrm{dr}}^{\max} P_{\mathrm{load},j,t} \\ \sum_{t=1}^{T} P_{\mathrm{dr},j,t}^{\mathrm{cut}} \leq E_{\mathrm{dr},j}^{\max} \end{cases} \tag{18}$$

where $\alpha_{\mathrm{dr}}^{\max}$ denotes the maximum load curtailment rate, and $E_{\mathrm{dr},j}^{\max}$ denotes the maximum allowable daily curtailed energy.

The DR formulation assumes a single category of curtailable load with a uniform compensation price. This assumption simplifies the dispatch model and reflects the aggregated response contract available in the case study. However, it cannot distinguish industrial, commercial, and residential response preferences or different interruption discomfort costs. This limitation is further discussed in the Discussion section, and future work will consider multi-type DR resources with differentiated compensation prices and response-duration constraints.

3.2.5. Distributed Generation Output Constraints

The actual output of distributed generation (such as PV) must not exceed the predicted maximum value under the current scenario:

$$0 \leq P_{\mathrm{PV},j,t} \leq P_{\mathrm{PV},j,t,k}^{\mathrm{forecast}} \tag{19}$$

3.3. Model Transformation and Solution

The above model contains continuous variables (power, voltage, and energy) and binary variables (charging/discharging states of energy storage), and all constraints are either linear or have been linearized (LinDistFlow power flow and piecewise-linear losses). Therefore, the model is a mixed-integer linear programming (MILP) problem.

In this paper, a complete coordinated optimization solution workflow is built on the Python platform. The implementation uses NumPy, Pandas, SciPy, scikit-learn, PyTorch, PuLP, CBC, and Matplotlib. The specific steps are as follows:

Data processing: The NumPy and Pandas libraries are used to process historical load, PV, and meteorological data, performing data cleaning and feature engineering. PyTorch is employed to construct and train the WGAN-GP network, generating K typical scenarios that conform to the probability distribution characteristics.

Optimization modeling: The active-power source–storage–load dispatch model is implemented using PuLP with explicit algebraic definitions of the power-balance, storage, and demand-response constraints.

Model solving: The CBC MILP solver is called through PuLP to obtain the optimal dispatch strategy for the baseline and coordinated scenarios.

Result analysis: Matplotlib is used for result visualization, and the total operating cost, voltage profile, branch loading, network losses, scenario-quality metrics, and computation time are calculated. The reported simulation platform and minimum implementation requirements are summarized in Table 4.
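The modeling and solving steps above can be illustrated with a deliberately reduced PuLP model. The sketch below is a single-bus, single-scenario simplification under stated assumptions: network, voltage, and loss constraints are left out, the daily DR energy cap is omitted, and the function name `dispatch`, the four-period horizon, the prices, and the degradation coefficient are hypothetical. The storage rating, SOC range, efficiency, DR share, and DR price follow the case-study values.

```python
import pulp

def dispatch(load, pv, tou, p_max=200.0, e_cap=800.0, soc=(0.1, 0.9, 0.3),
             eta=0.95, lam_bat=0.05, lam_dr=0.8, alpha_dr=0.10, dt=1.0):
    T = len(load)
    e_min, e_max, e0 = (s * e_cap for s in soc)
    m = pulp.LpProblem("source_storage_load", pulp.LpMinimize)
    grid = [pulp.LpVariable(f"grid_{t}", lowBound=0) for t in range(T)]
    pv_use = [pulp.LpVariable(f"pv_{t}", lowBound=0, upBound=pv[t]) for t in range(T)]
    ch = [pulp.LpVariable(f"ch_{t}", lowBound=0) for t in range(T)]
    dis = [pulp.LpVariable(f"dis_{t}", lowBound=0) for t in range(T)]
    cut = [pulp.LpVariable(f"cut_{t}", lowBound=0, upBound=alpha_dr * load[t])
           for t in range(T)]
    u_ch = [pulp.LpVariable(f"uch_{t}", cat="Binary") for t in range(T)]
    u_dis = [pulp.LpVariable(f"udis_{t}", cat="Binary") for t in range(T)]
    e = [pulp.LpVariable(f"e_{t}", lowBound=e_min, upBound=e_max)
         for t in range(T + 1)]

    # Objective: purchase + storage degradation + DR compensation (no loss term here).
    m += pulp.lpSum(tou[t] * grid[t] * dt + lam_bat * (ch[t] + dis[t]) * dt
                    + lam_dr * cut[t] * dt for t in range(T))
    m += e[0] == e0
    m += e[T] == e0                       # end-of-day energy equals the initial energy
    for t in range(T):
        m += grid[t] + pv_use[t] + dis[t] - ch[t] == load[t] - cut[t]
        m += ch[t] <= p_max * u_ch[t]
        m += dis[t] <= p_max * u_dis[t]
        m += u_ch[t] + u_dis[t] <= 1      # no simultaneous charging and discharging
        m += e[t + 1] == e[t] + (eta * ch[t] - (1.0 / eta) * dis[t]) * dt
    m.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.LpStatus[m.status], pulp.value(m.objective)

load, pv, tou = [100.0] * 4, [0.0] * 4, [0.3, 0.3, 1.0, 1.0]
baseline = sum(p * l for p, l in zip(tou, load))   # grid-only cost without flexibility
status, cost = dispatch(load, pv, tou)
```

With cheap early periods and expensive late periods, the solver shifts energy through the storage and curtails some peak load, so the coordinated cost falls below the grid-only baseline.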

4. Case Study Analysis

4.1. Case Study Setup and Parameter Description

To verify the effectiveness of the proposed method, a modified IEEE 33-bus distribution system is employed for simulation analysis. The system voltage base is 12.66 kV and the power base is 10 MVA. Distributed resources are connected at key nodes of the system. The total installed renewable capacity is 1.6 MW, corresponding to 37.2% of the daily peak load in the test feeder and 31.6% of the daily energy demand; therefore, the case represents a high-renewable-penetration distribution-network operating condition.

Nodes 18 and 33 are connected to distributed photovoltaic (PV) systems, each with a rated capacity of 800 kW.

Nodes 21 and 30 are connected to energy storage systems (ESS), each with a rated energy capacity of 800 kWh and a rated power of 200 kW. The charging/discharging efficiency is set to 95%, the state-of-charge (SOC) operating range is [0.1, 0.9], and the initial SOC is set to 0.3.

It is assumed that 10% of the total system load has demand response capability, and the compensation price per unit of curtailment is set at 0.8 yuan/kWh.

Synthetic PV and load profiles are selected as the base dataset, with a time resolution of 1 h. To simulate common measurement anomalies in actual distribution networks, 20% of PV data points are randomly removed in the main case, and missing-data rates of 5%, 10%, 20%, and 40% are tested for reconstruction sensitivity. The time-of-use electricity price schedule used in the dispatch model is listed in Table 5. The simulation is implemented in Python: the WGAN-GP model is constructed using PyTorch, and the CBC solver is invoked through PuLP to solve the active-power MILP dispatch model. The topology of the modified IEEE 33-bus test system and the locations of flexible resources are shown in Figure 3.

4.2. Analysis of Data Reconstruction and Scenario Generation Effects

4.2.1. Comparison of Data Reconstruction Accuracy

First, the effectiveness of the limited information reconstruction technique based on low-rank matrix factorization is validated. The proposed method is compared with the traditional zero-fill/simple interpolation method (Raw/Zero-fill). The data reconstruction results are shown in Figure 4.

As shown in Figure 4, the original observed data contain evident missing points marked in red. The traditional simple filling method cannot cope with consecutive missing data segments, resulting in unnatural breaks in the curves. In contrast, the low-rank reconstruction method proposed in this paper exploits the spatiotemporal correlation of PV output and satisfactorily restores the fluctuation trends of the true curve.

The reconstruction sensitivity results in Table 6 show that the low-rank reconstruction method reduces the RMSE at every tested missing-data rate, although at the 5% rate its MAE (17.30 kW) is higher than that of zero-fill (11.78 kW). According to the error comparison in Table 7, under the 20% random missing-data setting, the overall RMSE of zero-fill processing is 177.15 kW, whereas the proposed low-rank reconstruction reduces the RMSE to 52.40 kW. This corresponds to an RMSE reduction of approximately 70.42%.
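The improvement percentages can be checked directly from the tabulated RMSE values. The short sketch below recomputes them; the helper names are hypothetical, and the relative-reduction formula is the usual definition, assumed here to be the one used in Table 6.

```python
import math


def rmse(truth, estimate):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def rmse_improvement(rmse_reference, rmse_new):
    """Relative RMSE reduction in percent."""
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference


# (zero-fill RMSE, low-rank RMSE) in kW, keyed by missing-data rate in percent.
TABLE6_RMSE = {5: (86.37, 31.80), 10: (126.17, 39.37),
               20: (177.15, 52.40), 40: (233.89, 66.90)}

improvements = {rate: round(rmse_improvement(zero_fill, low_rank), 2)
                for rate, (zero_fill, low_rank) in TABLE6_RMSE.items()}
```

The recomputed values agree with the last column of Table 6, including the 70.42% figure for the 20% missing-data case.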

4.2.2. Analysis of Scenario Generation Quality

Based on the reconstructed complete dataset, the WGAN-GP is employed to generate future operation scenarios. A comparison between the generated typical scenarios and those obtained from K-means clustering is shown in Figure 5.

As can be seen from Figure 5, the scenario obtained by direct K-means matching tends to be smoother and more dependent on existing historical samples, whereas the WGAN-GP procedure generates a candidate scenario pool and then selects the five candidates closest to the reconstructed target-day condition. Table 8 reports the scenario-representation error and the estimated model size for different numbers of representative scenarios K, and Table 9 summarizes the K-means and WGAN-GP scenario treatments. The results indicate that increasing K improves scenario representation but enlarges the optimization model roughly in proportion to K.
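The selection step described above (pick the candidates closest to the target-day profile and average them) can be sketched as follows. The candidate pool here is random noise around a hypothetical PV shape rather than WGAN-GP output, and the RMSE distance and the function name `select_representative` are assumptions made for illustration.

```python
import numpy as np


def select_representative(candidates, target, k=5):
    """Return the indices of the k candidates closest to the target
    (by RMSE over the horizon) and the average of those candidates."""
    candidates = np.asarray(candidates, dtype=float)
    target = np.asarray(target, dtype=float)
    distance = np.sqrt(np.mean((candidates - target) ** 2, axis=1))
    nearest = np.argsort(distance)[:k]
    return nearest, candidates[nearest].mean(axis=0)


# Hypothetical normalized target-day PV profile (24 hourly points).
hours = np.arange(24)
target_day = np.clip(np.sin(np.pi * (hours - 6) / 12), 0.0, None)

# Stand-in for the 1000 generated candidates: noisy copies of the target.
rng = np.random.default_rng(0)
pool = np.clip(target_day + 0.1 * rng.standard_normal((1000, 24)), 0.0, 1.0)

indices, scenario = select_representative(pool, target_day, k=5)
```

Averaging the retained candidates yields a single 24-point profile that can be passed to the dispatch model.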

4.3. Analysis of Coordinated Optimal Dispatch Results

To quantify the economic benefits of the source–storage–load coordinated optimization strategy, the following two comparative scenarios are established:

Scenario 1 (Baseline): No energy storage or demand response is configured; the system only passively receives power from the main grid to meet the load demand.

Scenario 2 (Coordinated): The proposed strategy is adopted, implementing full source–storage–load coordinated MILP optimization.

4.3.1. Analysis of Operating Cost and Grid Interaction

Figure 6 presents a comparison of the interactive power between the distribution network and the upstream main grid under the two scenarios.

As can be seen from Figure 6 and Table 10, in Scenario 1 (black dashed line), the grid interactive power completely follows the net load fluctuation, resulting in a system operating cost as high as 10,060.36 CNY. In Scenario 2 (red solid line), by introducing coordinated optimization, the purchased power from the main grid is significantly reduced during peak electricity price periods, especially around 10:00–14:00 and 19:00–21:00. In addition, reverse power flow occurs during renewable-output surplus periods, as indicated by the shaded region below zero in Figure 6.

The final results show that after adopting the proposed coordinated optimization strategy, the total system operating cost is reduced to 9414.67 CNY, representing a decrease of 6.42% compared with the baseline cost of 10,060.36 CNY. The electricity-purchase cost is reduced to 9094.92 CNY, while the ESS degradation cost and demand-response compensation cost are explicitly included as 119.75 CNY and 200.00 CNY, respectively. This result indicates that the economic benefit in the tested case mainly comes from energy-storage time shifting, limited demand-response peak shaving, and renewable scenario matching.
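The cost figures quoted above are internally consistent, as the following arithmetic check shows. The variable names are hypothetical; the numbers are those reported in Table 10.

```python
# Reported cost components of the coordinated scenario (CNY).
purchase_cost = 9094.92
degradation_cost = 119.75
dr_cost = 200.00
baseline_cost = 10060.36

total_cost = purchase_cost + degradation_cost + dr_cost
reduction_pct = 100.0 * (baseline_cost - total_cost) / baseline_cost

# The purchase-cost saving is partly offset by the two added cost terms.
purchase_saving = baseline_cost - purchase_cost
net_saving = purchase_saving - degradation_cost - dr_cost
```

The purchase cost alone falls by 965.44 CNY; after subtracting the degradation and demand-response costs, the net saving is 645.69 CNY, which is the 6.42% reduction.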

To examine the network-security implication of the dispatch strategy, a post-dispatch voltage check is performed for all 33 nodes using the linearized voltage-drop relationship in Equation (14). Figure 7 compares the nodal voltage profiles under the baseline and coordinated cases, and the corresponding voltage and network-loss indicators are summarized in Table 11. Both profiles remain within the allowable range of 0.95–1.05 p.u., and the coordinated case maintains a higher downstream voltage margin. The minimum voltage increases from approximately 0.952 p.u. in the baseline case to approximately 0.964 p.u. under coordinated dispatch. Therefore, the economic cost reduction is achieved without violating the prescribed voltage-security limits in the tested case.
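A post-dispatch voltage check of this kind can be sketched for a small radial feeder. The sketch assumes one common LinDistFlow form, V_j = V_i − (r·P + x·Q)/V_0 in per unit, which may differ in detail from Equation (14); the four-node feeder, its impedances, and the function name are hypothetical.

```python
def lindistflow_voltages(parent, r, x, p_net, q_net, v_source=1.0):
    """Linearized voltage magnitudes on a radial feeder (per unit).

    parent[i] is the upstream node of node i and must satisfy parent[i] < i;
    node 0 is the source. r[i] and x[i] belong to the branch feeding node i.
    p_net and q_net are net nodal demands (load minus local generation).
    """
    n = len(parent)
    p_flow = list(p_net)
    q_flow = list(q_net)
    # Backward sweep: the flow into a node is its own demand plus all
    # downstream demand (losses are neglected in the linearized model).
    for node in range(n - 1, 0, -1):
        p_flow[parent[node]] += p_flow[node]
        q_flow[parent[node]] += q_flow[node]
    # Forward sweep: accumulate the linearized voltage drop from the source.
    v = [v_source] * n
    for node in range(1, n):
        drop = (r[node] * p_flow[node] + x[node] * q_flow[node]) / v_source
        v[node] = v[parent[node]] - drop
    return v


parent = [-1, 0, 1, 2]
r = [0.0, 0.05, 0.05, 0.05]
x = [0.0, 0.03, 0.03, 0.03]
q = [0.0, 0.005, 0.005, 0.005]

v_load_only = lindistflow_voltages(parent, r, x, [0.0, 0.01, 0.01, 0.01], q)
# Local generation at the feeder end turns node 3 into a net injector.
v_with_pv = lindistflow_voltages(parent, r, x, [0.0, 0.01, 0.01, -0.02], q)

within_limits = all(0.95 <= u <= 1.05 for u in v_load_only + v_with_pv)
```

In this toy feeder, end-of-line injection raises the minimum voltage, which is the same qualitative effect as the higher downstream margin reported for the coordinated case.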

4.3.2. Analysis of Flexible Resource Dispatch Strategy

Figure 8 further illustrates the specific dispatch actions of the energy storage system (SOC) and demand response (DR) in Scenario 2.

Based on the dispatch curves in Figure 8 and the detailed simulation data, the analysis is as follows:

Time-shifting effect of energy storage:

During the early morning valley period (04:00–06:00), electricity prices are low, and the energy storage system charges at high power, causing the SOC to rise rapidly from its initial state and storing energy for daytime operation.

During the evening peak period (19:00–21:00), which coincides with high load and high electricity prices, the energy storage system discharges intensively, and the SOC decreases rapidly. This action effectively replaces expensive power purchases from the main grid, fulfilling the arbitrage and support role of “charging at low prices and discharging at high prices.”

Peak shaving effect of demand response:

The orange bars in Figure 8 show that demand response is triggered at 10:00–11:00 and 19:00–20:00, with load-curtailment amounts of approximately 120 kW and 130 kW, respectively. These periods correspond to local peaks in the system net load and high electricity-price intervals.

With a compensation price of 0.8 CNY/kWh, the total demand-response compensation cost is 200.00 CNY. The limited curtailment of non-critical load alleviates peak-period power-supply pressure and works together with the energy storage system to reduce the system peak-valley difference.
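The reported compensation cost follows from the two curtailment events, each lasting one hour. Writing the compensation price as c_DR (a symbol introduced here for illustration), the worked arithmetic is:

```latex
C_{\mathrm{DR}}
  = c_{\mathrm{DR}} \sum_{t} P_{\mathrm{dr,cut},t}\,\Delta t
  = 0.8\ \tfrac{\mathrm{CNY}}{\mathrm{kWh}}
    \times \left(120\ \mathrm{kW} + 130\ \mathrm{kW}\right) \times 1\ \mathrm{h}
  = 200\ \mathrm{CNY}
```

The approximate curtailment amounts read from Figure 8 are therefore consistent with the 200.00 CNY listed in Table 10.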

5. Discussion

The case study results indicate that the proposed workflow can improve the reliability of dispatch decisions when measurement data are incomplete and renewable generation is uncertain. The reconstruction module mainly improves the quality of input data, the WGAN-GP module improves the representativeness of source-load uncertainty scenarios, and the MILP module converts the obtained scenarios into economically and operationally feasible dispatch decisions. Therefore, the contribution of this paper should be understood as an integrated data-to-decision framework rather than as a standalone replacement for all existing reconstruction, generative modeling, or optimization methods.

The supplementary sensitivity analysis focuses on the scenario number K. As K increases, the scenario-reduction representation error decreases, while the estimated number of variables in the IEEE 33-bus MILP grows roughly in proportion to K (from 3432 at K = 1 to 94,608 at K = 30). Therefore, K = 5 is adopted in the main numerical case as a compromise between scenario coverage and computational burden. The computational implications of K are summarized in Table 12, and the status of the individual sensitivity factors is given in Table 13. The missing-data rate and K have been tested, whereas the sensitivities to electricity price, ESS capacity, renewable penetration, and additional feeders rely on base-case settings only and are retained as future validation tasks to further evaluate the generality of the proposed framework.

Several limitations remain. First, the validation is based on synthetic PV/load profiles and a single dispatch test day; therefore, the numerical results should be interpreted as reproducible benchmark evidence rather than universal field validation. Second, the DR model uses one aggregated curtailable-load category and a uniform compensation price, which does not fully reflect the heterogeneity of industrial, commercial, and residential response resources. Third, the present study mainly verifies the active-power source–storage–load dispatch framework, while more detailed branch-level voltage, thermal-loading, and network-loss evaluation should be further strengthened. Future work will extend the validation to multi-season field datasets, larger unbalanced feeders, heterogeneous DR contracts, and full LinDistFlow/AC power-flow checking.

6. Conclusions

To address the challenges of low data quality and high uncertainty in distribution networks with a high penetration of renewable energy, this paper developed a data-driven operation optimization framework combining low-rank matrix completion, WGAN-GP scenario generation, and MILP-based source–storage–load coordinated dispatch. The main findings from the modified IEEE 33-bus case study are as follows.

(1): The limited-information reconstruction technique mitigates the missing-data problem in distribution networks. By exploiting the low-rank property of the data, it reduces the overall RMSE from 177.15 kW with zero-fill processing to 52.40 kW under the 20% random missing-data setting, an improvement of 70.42%, which provides more reliable input data for the subsequent optimization.
(2): The WGAN-GP module provides a generated PV scenario pool for dispatch analysis. In the numerical case, 1000 candidate scenarios are generated and the five closest scenarios to the target-day condition are averaged. The K-sensitivity analysis shows the trade-off between scenario representation error and optimization model size.
(3): The source–storage–load coordinated optimization strategy improves economic performance in the tested active-power dispatch case. The daily operating cost decreases from 10,060.36 CNY to 9414.67 CNY, corresponding to a 6.42% reduction. The result is mainly obtained through energy-storage time shifting, limited demand-response peak shaving, and reduced electricity-purchase cost.

Nevertheless, the current results should be interpreted as evidence of feasibility rather than universal superiority. The main limitations are the synthetic dataset, the single test day, the simplified single-category DR model, and the simplified voltage/loss assessment under the present experimental configuration. Future work will extend the validation to field datasets, larger feeders, heterogeneous DR resources, and comparisons with Transformer, diffusion, robust optimization, and stochastic programming methods.

Author Contributions

Conceptualization, G.M.; methodology, C.H. and S.L.; software, C.H. and S.L.; validation, N.P., S.H., Y.W. and C.H.; formal analysis, N.P., Y.W., C.H. and S.L.; investigation, N.P., S.H. and Y.W.; resources, G.M. and S.H.; data curation, N.P., S.H. and Y.W.; writing—original draft preparation, C.H. and S.L.; writing—review and editing, G.M., N.P., S.H. and C.H.; visualization, Y.W. and S.L.; supervision, G.M.; project administration, G.M.; funding acquisition, G.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Hebei Electric Power Co., Ltd. (SGTYHT/23-JS-001).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Guozhen Ma, Ning Pang, Shiyao Hu and Yunjia Wang were employed by the Economic and Technological Research Institute of State Grid Hebei Electric Power Co., Ltd. Authors Chong Han and Siyang Liao were employed by Wuhan Longde Control Technology Co., Ltd. The authors declare that this study received funding from the Science and Technology Project of State Grid Hebei Electric Power Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Abbreviations

Abbreviations used in this paper
Abbreviation	Meaning
ADMM	Alternating direction method of multipliers
CBC	COIN-OR Branch and Cut solver
DG	Distributed generation
DR	Demand response
DSO	Distribution system operator
ESS	Energy storage system
GAN	Generative adversarial network
KNN	K-nearest neighbor
LinDistFlow	Linearized distribution power flow
MAE	Mean absolute error
MAPE	Mean absolute percentage error
MILP	Mixed-integer linear programming
PV	Photovoltaic
RMSE	Root mean square error
SCADA	Supervisory control and data acquisition
SVT	Singular value thresholding
TOU	Time-of-use
VAE	Variational autoencoder
WGAN-GP	Wasserstein generative adversarial network with gradient penalty
Symbols
Main mathematical symbols
Symbol	Description
X	Original multi-source spatiotemporal data matrix
L	Completed low-rank data matrix
Ω	Observation mask matrix
λ	Regularization coefficient in low-rank reconstruction
k, K	Scenario index and number of representative scenarios
πk	Probability of scenario k
T, Δt	Scheduling horizon and time interval
Pij,k,t, Qij,k,t	Active and reactive branch power flow
Ui,t	Voltage magnitude of node i
Pgrid,k,t	Power exchanged with the upstream grid
Pess,ch, Pess,dis	ESS charging and discharging power
Eess,t	ESS state of energy
Pdr,cut	Curtailable demand-response power
λTOU	Time-of-use electricity price
λloss	Network-loss penalty coefficient


Figure 1. Data preprocessing and deep scenario generation framework.

Figure 2. WGAN-GP model training structure.

Figure 3. Schematic diagram of the modified IEEE 33-bus test system and flexible resource locations.

Figure 4. Data reconstruction effect comparison.

Figure 5. Deep generative model scenario demonstration.

Figure 6. Grid interaction power optimization comparison.

Figure 7. Voltage profile across the 33 nodes under baseline and coordinated dispatch.

Figure 8. ESS state of charge and demand-response curtailment power under coordinated dispatch.

Table 1. Dataset description and validation protocol.

Item	Setting
Data source	Synthetic PV and load profiles generated by the executable Python simulation
Variables used in code	PV active power and active load power
Resolution and period	1 h; 60 days; 1440 time stamps in total
Training/evaluation usage	The reconstructed PV matrix is used for WGAN-GP training; day 10 is selected as the dispatch test day
Missing-data tests	Main case: 20% random missing rate; additional reconstruction tests: 5%, 10%, 20%, and 40%
Dispatch test system	Modified IEEE 33-bus benchmark context; active-power source–storage–load dispatch implemented in code

Table 2. Low-rank reconstruction parameters.

Parameter	Value/Setting
Algorithm	Singular value thresholding (SVT)
Shrinkage threshold τ	0.5 after Min-Max normalization
Update step δ	1.2
Maximum iterations	100
Convergence tolerance	Observed-entry residual ≤ 1.0 × 10⁻⁴
Missing-data rates tested	5%, 10%, 20%, and 40%
Implementation	Python 3.9.13, NumPy 1.23.5, SciPy 1.9.3, scikit-learn 1.2.2, MinMaxScaler
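The parameters above map onto the standard singular value thresholding iteration roughly as sketched below, using NumPy only. This is an illustrative reading of the table and not the authors' code: the function names, the relative-residual stopping rule, and the synthetic rank-one test matrix are assumptions, and the snippet makes no claim to reproduce the reported RMSE values.

```python
import numpy as np


def shrink(matrix, tau):
    """Soft-threshold the singular values of a matrix by tau."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def svt_complete(observed, mask, tau=0.5, delta=1.2, max_iter=100, tol=1e-4):
    """Standard SVT iteration on a normalized matrix with missing entries.

    observed: 2-D array (values at missing positions are ignored)
    mask:     boolean array, True where an entry was observed
    Returns the completed matrix and the relative observed-entry residual.
    """
    mask = np.asarray(mask, dtype=bool)
    m_obs = np.where(mask, np.asarray(observed, dtype=float), 0.0)
    norm_obs = np.linalg.norm(m_obs)
    y = np.zeros_like(m_obs)          # auxiliary (dual) matrix
    x = np.zeros_like(m_obs)
    residual = 1.0
    for _ in range(max_iter):
        x = shrink(y, tau)                        # low-rank estimate
        gap = np.where(mask, m_obs - x, 0.0)      # mismatch on observed entries
        residual = np.linalg.norm(gap) / norm_obs
        if residual <= tol:
            break
        y = y + delta * gap                       # step of size delta
    return x, residual


# Hypothetical 60-day x 24-hour matrix: one daily shape scaled per day.
rng = np.random.default_rng(0)
hours = np.arange(24)
daily_shape = np.clip(np.sin(np.pi * (hours - 6) / 12), 0.0, None)
truth = np.outer(rng.uniform(0.5, 1.0, 60), daily_shape)
observed_mask = rng.random(truth.shape) > 0.2     # about 20% missing
recon, rel_residual = svt_complete(np.where(observed_mask, truth, np.nan),
                                   observed_mask)
```

The quality of the filled-in entries depends on how the threshold compares with the singular values of the normalized data, so the threshold would normally be tuned together with the normalization.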

Table 3. WGAN-GP training hyperparameters.

Hyperparameter	Setting in Executable Code
Framework	PyTorch on Python
Input sequence length	24 h
Latent noise dimension	10
Conditional treatment	Scenario matching to the reconstructed target-day PV profile after generation
Generator	Fully connected network: 10-128-256-24 with LeakyReLU, BatchNorm, and Sigmoid output
Discriminator/Critic	Fully connected network: 24-256-128-1 with LeakyReLU
Batch setting	Full 60-day matrix per training update
Optimizer	Adam
Learning rate	2.0 × 10⁻⁴ for generator and critic
Adam β parameters	β1 = 0.5, β2 = 0.999
Gradient penalty coefficient	10
Critic/generator update ratio	Critic updated every epoch; generator updated every two epochs
Training epochs	1500
Candidate scenarios generated	1000
Representative scenarios retained	K = 5 in the main dispatch case

Table 4. Simulation platform and minimum implementation requirements.

Item	Reported Simulation Platform/Code Setting	Minimum Practical Requirement
Operating system	64-bit Windows 10 environment	64-bit Windows 10/11 or 64-bit Linux OS
CPU	CPU-only computation was adopted in the uploaded code	4-core x86-64 CPU or above
Memory	The case uses a 60 × 24 PV/load dataset and has low memory demand	8 GB RAM or above
GPU	GPU acceleration was not used in the uploaded code	Not required; optional for larger WGAN-GP training tasks
Software	Python 3.9; NumPy; Pandas; SciPy; scikit-learn; PyTorch; PuLP/CBC; Matplotlib; Seaborn	Python 3.9 or above with the same package stack

Table 5. Time-of-use electricity price schedule used in the dispatch model.

Period	Hours	Purchase Price (CNY/kWh)	Selling Price Treatment
Valley	00:00–07:00	0.30	Same coefficient used for negative grid exchange in the simplified code
Flat	07:00–11:00 and 15:00–19:00	0.60	Same coefficient used for negative grid exchange in the simplified code
Peak	11:00–15:00 and 19:00–24:00	1.00	Same coefficient used for negative grid exchange in the simplified code
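The schedule above can be encoded as a simple lookup, which is convenient when building the objective of the dispatch model. In this sketch the hour index h denotes the interval from h:00 to (h+1):00, and the function names are hypothetical.

```python
def tou_price(hour):
    """Purchase price in CNY/kWh for the interval starting at the given hour."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    if hour < 7:
        return 0.30                       # valley: 00:00-07:00
    if 11 <= hour < 15 or hour >= 19:
        return 1.00                       # peak: 11:00-15:00 and 19:00-24:00
    return 0.60                           # flat: 07:00-11:00 and 15:00-19:00


def purchase_cost(grid_kw, dt_h=1.0):
    """Daily energy cost of an hourly grid-exchange profile in kW.

    Negative values are priced with the same coefficient, matching the
    simplified selling-price treatment stated in the table.
    """
    return sum(tou_price(h) * p * dt_h for h, p in enumerate(grid_kw))


flat_profile_cost = purchase_cost([100.0] * 24)
```

For a constant 100 kW purchase, the seven valley hours, eight flat hours, and nine peak hours give 100 × (7 × 0.30 + 8 × 0.60 + 9 × 1.00) = 1590 CNY.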

Table 6. Reconstruction sensitivity under different missing-data rates.

Missing Rate	Method	Overall RMSE (kW)	Overall MAE (kW)	RMSE Improvement
5%	Zero-fill	86.37	11.78	-
5%	Low-rank reconstruction	31.80	17.30	63.18%
10%	Zero-fill	126.17	27.23	-
10%	Low-rank reconstruction	39.37	20.20	68.80%
20%	Zero-fill	177.15	51.82	-
20%	Low-rank reconstruction	52.40	25.63	70.42%
40%	Zero-fill	233.89	91.37	-
40%	Low-rank reconstruction	66.90	35.06	71.40%

Table 7. Data reconstruction error comparison.

Method	RMSE (kW)
Traditional Filling Method (Raw/Zero-fill)	177.15
Proposed Low-Rank Reconstruction (Low-Rank Recon)	52.40

Table 8. Scenario-representation error and estimated model size for different numbers of representative scenarios K.

K	Scenario-Reduction Representation Error (Inertia)	Estimated IEEE 33-Bus Variables	Comment
1	540.12	3432	Lowest model size, weakest scenario representation
3	508.85	9720	Representation improves with moderate model growth
5	492.09	16,008	Adopted in the main dispatch case
10	461.67	31,728	Lower representation error but nearly doubled variable scale vs. K = 5
20	424.35	63,168	Higher accuracy but large model size
30	404.28	94,608	Marginal improvement with high computational burden
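The variable counts tabulated above lie exactly on a straight line in K, which can be verified with a two-point fit. The decomposition into a fixed part and a per-scenario part is inferred from the tabulated numbers only; it is not a formula stated by the authors, and the names below are hypothetical.

```python
# Estimated IEEE 33-bus variable counts, keyed by scenario number K.
VARIABLE_COUNTS = {1: 3432, 3: 9720, 5: 16008, 10: 31728, 20: 63168, 30: 94608}


def fit_line(points):
    """Fit count = intercept + slope * K through the smallest and largest K."""
    ks = sorted(points)
    k_low, k_high = ks[0], ks[-1]
    slope = (points[k_high] - points[k_low]) / (k_high - k_low)
    intercept = points[k_low] - slope * k_low
    return slope, intercept


slope, intercept = fit_line(VARIABLE_COUNTS)
max_deviation = max(abs(intercept + slope * k - n)
                    for k, n in VARIABLE_COUNTS.items())
```

The fit gives 3144 additional variables per scenario on top of 288 scenario-independent ones, with zero deviation at the intermediate K values, so the model size grows linearly rather than faster than linearly.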

Table 9. Comparison of the K-means and WGAN-GP scenario-generation/reduction treatments.

Method	Scenario-Generation/Reduction Treatment	Quantitative Indicator Reported	Role in the Case Study	Generation Time (s)
K-means matching	Nearest historical profile or cluster center selected from existing samples	Scenario-reduction inertia used in K sensitivity	Baseline scenario-reduction reference	<1
WGAN-GP	1000 generated candidates; five closest candidates averaged for the target day	RMSE-based dispatch scenario and K-sensitivity analysis	Main generated scenario used in coordinated dispatch	Implementation-dependent

Table 10. Daily operation cost comparison for distribution networks under different scenarios.

Scenario	Power Purchase Cost (CNY)	Energy Storage/Degradation Cost (CNY)	Demand Response Cost (CNY)	Total Operating Cost (CNY)	Cost Reduction Rate
Scenario 1: Baseline	10,060.36	0.00	0.00	10,060.36	-
Scenario 2: Coordinated	9094.92	119.75	200.00	9414.67	6.42%

Table 11. Voltage and network-loss comparison.

Indicator	Evaluation Basis	Result in the Tested Case	Extended Validation
Voltage profile	Post-dispatch LinDistFlow voltage check across 33 nodes	All nodes remain within 0.95–1.05 p.u.; the minimum voltage improves from about 0.952 to 0.964 p.u.	Detailed AC power-flow verification can be further conducted with complete feeder parameters
Branch thermal loading	Branch apparent-power limits included in the model formulation	No thermal-limit violation is indicated under the tested dispatch setting	Future work will further evaluate unbalanced feeder current limits
Network loss	Linearized loss-penalty component in the operating-cost model	Loss impact is reflected through the penalty-based dispatch objective	Piecewise-linear loss calibration can be added for detailed loss accounting

Table 12. Model size and computational implications of the representative scenario number K.

K	Estimated IEEE 33-Bus Variables	Interpretation	Peak Memory/CPU Implication
1	3432	Smallest model, insufficient uncertainty coverage	Low computational burden
5	16,008	Balanced setting used in the main numerical case	Suitable for ordinary desktop execution
10	31,728	Better scenario representation	Longer MILP solution time expected
20	63,168	High scenario coverage	Memory and CPU demand increase significantly
30	94,608	Marginal representation improvement	Recommended only with stronger hardware

Table 13. Implementation status of the sensitivity factors considered in this study.

Sensitivity Factor	Implemented Status	Result Used in this Study	Where Addressed	Observation
Missing-data rate	Implemented through reconstruction sensitivity	RMSE decreases consistently after low-rank reconstruction	Table 6	Supports data-quality module
Scenario number K	Implemented through K-means inertia and variable-count estimation	Inertia decreases from 540.12 at K = 1 to 404.28 at K = 30	K-sensitivity table	Trade-off between accuracy and model size
Electricity price	Base-case parameter adopted in this study	TOU schedule is tabulated	Discussed as an extension	Requires re-optimization under multiple price cases
ESS capacity	Base-case parameter adopted in this study	Base case uses 800 kWh capacity and 200 kW rated power	Discussed as an extension	Requires additional runs
Renewable penetration/additional feeder	Benchmark validation adopted in this study	High-renewable benchmark setting is tested using the modified IEEE 33-bus system	Scope limitation stated in the conclusion	Requires field or larger-network validation


