1. Introduction
Driven by the “dual carbon” policy, environmental demands, and energy strategies, the installed capacity of distributed generation (DG), particularly photovoltaic and wind power, in distribution networks has grown rapidly [1,2,3]. The high penetration of renewable energy alters the power flow distribution in distribution networks, and its inherent intermittency and volatility further complicate the operating boundaries [4,5]. Accurately characterizing source-load uncertainties and fully exploiting the flexible resources within the system for coordinated control are therefore key to ensuring the secure and economic operation of distribution networks with a high share of renewables.
Distribution network optimization is an essential approach to ensuring the secure and economic operation of distribution networks. Reference [6] proposed a reactive power optimization model for distribution networks that considers the disorderly charging of electric vehicles, achieving loss reduction and voltage regulation; however, it did not fully account for the synergistic effects of other distributed resources such as energy storage. Researchers have also studied the optimal operation of photovoltaic (PV) systems and energy storage in distribution networks. Reference [7] developed a two-stage robust optimization model incorporating PV integration, enabling centralized optimal dispatch of PV inverters. Reference [8] analyzed the current margin of PV inverters and proposed a voltage support strategy for distribution networks, effectively enhancing voltage stability. Reference [9] established active–reactive power output models of PV inverters under different control strategies and verified that the optimal dispatch strategy provides the maximum regulation range, thereby reducing distribution network losses and PV curtailment. However, these studies did not thoroughly investigate operation based on the coordinated optimization of the entire source–storage–load chain. References [10,11] studied voltage optimization control in distribution networks with PV and energy storage coordination, ensuring secure and stable operation, but they did not exploit demand response (DR) resources in depth. Reference [12] proposed a multi-time-scale optimal dispatch model incorporating multiple types of distributed resources, effectively coordinating the operation of active and reactive power resources; nevertheless, it did not adequately consider the impact of measurement data quality on dispatch decisions.
Data serve as the foundation for optimal operation. However, in actual distribution networks, multi-source heterogeneous data (such as PV output, load, and meteorological data) frequently suffer from missing or abnormal values due to measurement equipment failures, communication delays, and other issues, directly compromising the accuracy of model solutions [13,14]. Reference [15] points out that data quality problems have become a bottleneck restricting the refined regulation of distribution networks. For uncertainty modeling, data-driven methods, particularly those based on scenario generation techniques, have been widely adopted. Reference [16] employed the K-means clustering method to generate operational scenarios; however, conventional clustering methods typically assume complete input data and lack robust handling of missing data, and they struggle to accurately capture the complex probabilistic distribution characteristics and extreme fluctuation features of renewable energy output. In recent years, deep generative models represented by generative adversarial networks (GANs) have shown significant potential in simulating complex data distributions [17,18]. Reference [19] used GANs to generate wind and solar power output scenarios, but the original GAN suffers from issues such as training instability and mode collapse, and it seldom considers the spatiotemporal correlations among multiple variables. Although the improved Wasserstein generative adversarial network (WGAN) enhances training stability, its application in power system scenario generation still requires further exploration [20].
Regarding model solving, the coordinated optimization of distribution networks constitutes a non-convex nonlinear programming problem due to the presence of power flow constraints and integer variables, making it difficult to solve directly with off-the-shelf solvers. Current solution methods for this problem mainly fall into two categories: heuristic algorithms and numerical analysis methods. Reference [21] adopted an improved particle swarm optimization algorithm to solve the active distribution network optimization model; however, heuristic algorithms cannot guarantee the global optimality of the solution. Numerical analysis methods based on second-order cone relaxation (SOCR) or linearization (LinDistFlow) have been more widely applied because of their efficiency and stability [22]. Reference [23] verified the effectiveness of linearization methods in handling optimal power flow problems involving discrete adjustable devices.
In view of this, this paper proposes an optimal operation method for distribution networks with high penetration of renewable energy based on deep scenario generation and a data-driven approach. The scientific novelty of this work lies in the integrated treatment of three coupled issues that are usually handled separately: incomplete measurement data, renewable/load scenario uncertainty, and source–storage–load coordinated operation. The main contributions are summarized as follows: (1) a low-rank limited-information reconstruction model is embedded before scenario generation to reduce the impact of missing and abnormal measurements on downstream dispatch; (2) a conditional WGAN-GP model is used to capture nonlinear spatiotemporal correlations of source-load data, and quantitative scenario-quality metrics are introduced rather than relying only on visual inspection; and (3) the generated typical scenarios are directly incorporated into a MILP-based coordinated dispatch model with explicit energy-storage, demand-response, voltage, thermal-limit, and piecewise-linear loss constraints.
Recent studies have also explored alternative uncertainty modeling approaches. VAE and VAE-GAN models can learn latent representations of renewable-output distributions, Transformer-based structures can strengthen long-range temporal feature extraction, and diffusion models have recently been introduced to reproduce multi-scale renewable fluctuation patterns [24,25,26,27,28]. Meanwhile, robust optimization and stochastic programming provide different ways to hedge uncertainty in dispatch decisions [29], but their performance still depends heavily on the quality of input scenarios and measurement data. Therefore, the key issue addressed in this work is not to replace these mature methods individually, but to form a unified workflow that links data-quality enhancement, uncertainty scenario generation, and operational dispatch decision-making under a consistent distribution-network optimization framework.
2. Data Preprocessing and Deep Scenario Generation Based on Limited Information Reconstruction
To address the challenges of source-load uncertainty modeling and missing measurement data in distribution networks with high penetration of renewable energy, this section proposes a two-stage data processing and scenario generation framework. The framework first employs low-rank matrix factorization to reconstruct and complete historical multi-source heterogeneous data, and then, based on the completed high-quality dataset, uses an improved Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to generate source-load power scenarios that are consistent with actual physical characteristics. The specific process is illustrated in Figure 1.
2.1. Construction of Multi-Source Heterogeneous Data Matrix
The operating state of a distribution network is subject to the coupled influences of multidimensional factors, including meteorological conditions, user behavior, and equipment characteristics. To comprehensively capture the operational features of the system, it is first necessary to acquire multi-source data from SCADA systems and meteorological stations.
2.1.1. Data Acquisition
The data sources selected in this paper include distributed generation output data, i.e., historical active power sequences of distributed photovoltaic (PV) and wind power (WT) at each node of the distribution network; load power data, i.e., historical active and reactive load sequences at each load node; and meteorological data, i.e., key meteorological factors such as solar irradiance, ambient temperature, and wind speed at the corresponding time intervals.
For the numerical study, a reproducible benchmark dataset is constructed to emulate source-load operation in a renewable-rich distribution feeder. The photovoltaic output profile is generated from a daily sinusoidal irradiance-like curve with random cloud attenuation and Gaussian noise, and the load profile is generated from single- and double-frequency daily components with random load fluctuations. A total of 60 daily profiles are generated, and each day contains 24 hourly sampling points. Random missing masks are imposed on the PV matrix to simulate communication packet loss and sensor abnormalities. This setting is used to verify the proposed data reconstruction, scenario generation, and dispatch workflow. The generalized framework can also be applied to field SCADA and meteorological datasets when such data are available. The dataset description and validation protocol are summarized in Table 1.
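The benchmark construction described above can be sketched as follows. This is a minimal illustration only: the function name `make_synthetic_dataset`, the cloud-attenuation range, the noise levels, the 3000 kW load base, and the load-shape coefficients are assumptions chosen for the example, not the values used in the paper (those are given in Table 1).

```python
import numpy as np

def make_synthetic_dataset(n_days=60, n_hours=24, pv_peak_kw=800.0,
                           missing_rate=0.2, seed=0):
    """Return (pv, load, mask), each of shape (n_days, n_hours).

    All numeric settings except 60 days x 24 hours are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    hours = np.arange(n_hours)

    # Irradiance-like daily curve: half sine between 06:00 and 18:00, zero at night.
    clear_sky = np.clip(np.sin(np.pi * (hours - 6) / 12.0), 0.0, None)

    # One random cloud-attenuation factor per day plus Gaussian noise in daylight hours.
    cloud = rng.uniform(0.5, 1.0, size=(n_days, 1))
    pv = pv_peak_kw * cloud * clear_sky
    pv = pv + rng.normal(0.0, 0.02 * pv_peak_kw, size=pv.shape) * (clear_sky > 0)
    pv = np.clip(pv, 0.0, pv_peak_kw)

    # Load: single- and double-frequency daily components with random fluctuation.
    omega = 2.0 * np.pi * hours / n_hours
    shape = 1.0 + 0.25 * np.sin(omega - 2.0) + 0.10 * np.sin(2.0 * omega)
    load = 3000.0 * shape * (1.0 + rng.normal(0.0, 0.03, size=(n_days, n_hours)))

    # Random missing mask: 1 = observed, 0 = missing.
    mask = (rng.random((n_days, n_hours)) >= missing_rate).astype(float)
    return pv, load, mask

pv, load, mask = make_synthetic_dataset()
pv_observed = pv * mask  # missing entries are zero-filled here
print(pv.shape, round(1.0 - mask.mean(), 3))
```

Fixing the random seed keeps the dataset reproducible, which matters when reconstruction errors at different missing-data rates are compared.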
2.1.2. Matrix Construction and Missing Data Identification
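Assume the data collection spans $T$ time intervals and the total number of characteristic variables collected is $N$. After normalizing the aforementioned multi-source heterogeneous data, a high-dimensional spatiotemporal data matrix is constructed:

$$X = [x_1, x_2, \ldots, x_T]^{\mathsf{T}} \in \mathbb{R}^{T \times N}$$

where $X$ represents the high-dimensional spatiotemporal data matrix, and $x_t \in \mathbb{R}^{N}$ denotes the multi-dimensional data vector collected at time $t$. Due to communication packet loss, sensor faults, or transmission delays, some elements in $X$ are often missing. A mask matrix $M \in \{0, 1\}^{T \times N}$ is defined to indicate the observation status of the data:

$$M_{t,n} = \begin{cases} 1, & \text{if } X_{t,n} \text{ is observed} \\ 0, & \text{if } X_{t,n} \text{ is missing} \end{cases}$$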
2.2. Limited Information Reconstruction Based on Low-Rank Matrix Factorization
Operational data of power systems, such as PV output and load profiles, exhibit significant periodicity and spatiotemporal correlations. For example, load variation trends at adjacent nodes tend to be similar, and PV output within the same area is influenced by identical meteorological conditions. This strong correlation imparts a low-rank property to the data matrix in a mathematical sense.
2.2.1. Optimization Reconstruction Model
Based on this property, the problem of missing data completion is transformed into a low-rank matrix completion problem [30]. The objective is to find a complete matrix $Z$ with a low-rank structure that approximates the original measurement matrix $X$ as closely as possible at the observed positions.
However, whether an actual multi-source heterogeneous data matrix strictly satisfies the global low-rank assumption depends on the degree of internal physical coupling within the data. Through singular value decomposition (SVD) of historical data matrices from actual distribution networks, it can be observed that the singular value distribution typically exhibits a pronounced “long-tail effect”—that is, the first few larger singular values contain the vast majority of the principal component energy of the data matrix, while the numerous small singular values in the tail are primarily caused by random noise and local anomalous disturbances. This highly concentrated energy phenomenon provides an objective theoretical basis for low-rank matrix factorization.
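The concentration of singular-value energy can be checked numerically before low-rank completion is applied. The sketch below is illustrative: the helper name `singular_energy_ratio` and the synthetic rank-one-plus-noise matrix are assumptions for the example, not data from the paper.

```python
import numpy as np

def singular_energy_ratio(X, r):
    """Share of squared singular-value energy captured by the r largest singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = s ** 2
    return float(energy[:r].sum() / energy.sum())

# Synthetic example: 60 days x 24 hours, one shared daily shape plus small noise.
rng = np.random.default_rng(1)
hours = np.arange(24)
daily_shape = np.clip(np.sin(np.pi * (hours - 6) / 12.0), 0.0, None)
scale = rng.uniform(0.5, 1.0, size=(60, 1))
X_demo = scale * daily_shape + rng.normal(0.0, 0.02, size=(60, 24))

ratio_top1 = singular_energy_ratio(X_demo, 1)
print(f"energy share of the largest singular value: {ratio_top1:.4f}")
```

A ratio close to one for a small `r` supports the low-rank assumption; a slowly rising ratio suggests that the matrix is not well approximated by a few components.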
Moreover, considering that in addition to random missing values, actual measurements may also contain sparse gross errors induced by sensor faults or communication interference, the nuclear norm minimization combined with the singular value thresholding (SVT) algorithm adopted in this model is inherently consistent with the philosophy of robust principal component analysis (Robust PCA). By applying soft-thresholding shrinkage to the singular values, the model can not only infer missing entries by exploiting the low-rank structure, but also effectively filter out the small singular values representing random noise, thereby suppressing to a certain extent the interference of non-globally low-rank anomalous noise on the reconstruction results. The following convex optimization model is constructed:
$$\min_{Z} \; \frac{1}{2} \left\| M \circ (Z - X) \right\|_F^2 + \lambda \left\| Z \right\|_*$$

where $Z$ denotes the complete data matrix to be reconstructed, $X$ is the measurement matrix, and $M$ is the observation mask; $\circ$ represents the Hadamard product, i.e., element-wise multiplication of matrices, used to select the known data entries; $\|\cdot\|_F$ is the Frobenius norm, and the first term measures the fitting error between the reconstructed matrix and the original data at the observed positions; $\|Z\|_*$ is the nuclear norm, the sum of all singular values of the matrix $Z$, which serves as a convex relaxation of the rank function and enforces the low-rank property of the matrix $Z$; and $\lambda$ is the regularization coefficient, which balances the data fitting accuracy and the rank (model complexity) of the matrix $Z$.
2.2.2. Model Solution Method
The above model is solved by the singular value thresholding (SVT) algorithm. The normalized missing-data matrix is iteratively updated using the observed-entry mask, followed by singular value shrinkage. The shrinkage threshold is set to τ = 0.5, the update step is δ = 1.2, the maximum number of iterations is 100, and the stopping tolerance is 1.0 × 10⁻⁴ for the residual on observed entries. These parameter settings are kept fixed in the numerical study to ensure the reproducibility of the reconstruction process, and the detailed reconstruction parameters are listed in Table 2.
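One way to read the iteration described above is as a masked update followed by singular-value soft-thresholding. The sketch below follows that reading with the stated settings (τ = 0.5, δ = 1.2, 100 iterations, tolerance 1.0 × 10⁻⁴). The function names, the zero initialization, the relative form of the residual test, and the synthetic test matrix are assumptions; the paper's exact update order may differ.

```python
import numpy as np

def svd_shrink(A, tau):
    """Soft-threshold the singular values of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def low_rank_complete(X_obs, mask, tau=0.5, delta=1.2, max_iter=100, tol=1e-4):
    """Masked update plus singular-value shrinkage (assumed form of the iteration).

    X_obs: normalized data with missing entries set to zero.
    mask:  1 where an entry is observed, 0 where it is missing.
    """
    Z = np.zeros_like(X_obs, dtype=float)
    obs_norm = np.linalg.norm(mask * X_obs) + 1e-12
    for _ in range(max_iter):
        residual = mask * (X_obs - Z)            # error on observed entries only
        if np.linalg.norm(residual) / obs_norm <= tol:
            break                                # with tau > 0 the cap may be reached first
        Z = svd_shrink(Z + delta * residual, tau)
    return Z

# Illustrative check on a synthetic normalized PV-like matrix with 20% missing entries.
rng = np.random.default_rng(0)
hours = np.arange(24)
shape = np.clip(np.sin(np.pi * (hours - 6) / 12.0), 0.0, None)
X_true = rng.uniform(0.5, 1.0, size=(60, 1)) * shape
mask = (rng.random(X_true.shape) >= 0.2).astype(float)
X_hat = low_rank_complete(mask * X_true, mask)

missing = mask == 0
rmse_zero = np.sqrt(np.mean(X_true[missing] ** 2))
rmse_lowrank = np.sqrt(np.mean((X_hat[missing] - X_true[missing]) ** 2))
print(f"zero-fill RMSE: {rmse_zero:.4f}, low-rank RMSE: {rmse_lowrank:.4f}")
```

The errors are evaluated on the missing entries only, since those are the entries the completion step is meant to recover.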
2.3. Deep Scenario Generation Based on WGAN-GP
To incorporate source-load uncertainties into the optimization model, it is essential to generate typical scenarios that can reflect their stochastic fluctuation characteristics. Traditional probabilistic modeling methods struggle to capture the nonlinear coupling among multiple variables, and the original generative adversarial network (GAN) suffers from issues such as training instability and mode collapse. To this end, this paper establishes an improved Wasserstein generative adversarial network with gradient penalty (WGAN-GP) model.
2.3.1. Model Architecture Design
This paper employs the PyTorch deep learning framework to construct the WGAN-GP model, and its training structure is shown in Figure 2. The model consists of two adversarial neural networks: the generator (G) and the discriminator (D). To enable the model to capture the spatial coupling among multiple nodes and the multi-timescale temporal dependencies when only random noise and conditional labels are fed as input, the network structure is designed as follows:
Generator (G) network structure design: The input to the generator is a random noise vector following a standard normal distribution and a conditional label vector (such as season, weather type, and time information). The generator adopts a hybrid architecture of multiple fully connected layers and one-dimensional deconvolutional (1D-Deconv) layers. First, after concatenating the input low-dimensional noise with the conditional labels, they are mapped to a high-dimensional latent variable through multiple fully connected layers. This fully connected joint mapping mechanism forces the network to simultaneously process the hidden states of all physical nodes (source and load), thereby implicitly extracting the spatial cross-correlations among the nodes in the distribution network. Subsequently, the latent variable is reshaped into time-series features and fed into multiple one-dimensional deconvolutional layers with prescribed strides. Because the convolutional kernel slides along the temporal dimension, coupled with the progressively enlarged receptive field, the model can explicitly extract the auto-correlation and temporal inertia (temporal characteristics) of PV output and load fluctuations. Through this structure, the generator maps noise into simulated source-load power samples
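$\tilde{x}$ with spatiotemporal correlation features:

$$\tilde{x} = G(z, c; \theta_G)$$

where $z$ is the input noise vector, $c$ is the conditional label vector, and $\theta_G$ denotes the network parameters of the generator.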
Discriminator (D) network structure design: The input to the discriminator is either the real spatiotemporal sample
$x$ or the generated sample $\tilde{x}$
, together with the corresponding conditional label c. Corresponding to the generator, the discriminator adopts a structure that combines one-dimensional convolutional neural networks (1D-CNN) with fully connected layers as the feature extractor. The input multi-node spatiotemporal power matrix first passes through 1D-CNN layers, where convolutional kernels with large receptive fields capture global temporal trends, and subsequent layers progressively extract detailed features of local drastic fluctuations. Finally, the output is flattened and passed through a fully connected layer to yield a scalar score y:
$$y = D(x, c; \theta_D)$$

where $\theta_D$ denotes the network parameters of the discriminator; the same mapping is applied when the input is a generated sample $\tilde{x}$. This structure enables the discriminator to evaluate the similarity between the generated scenarios and the real historical data across both the spatial and the temporal dimensions.
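The generator and discriminator structure described above can be sketched in PyTorch as follows. This is a hedged illustration: the class names, layer widths, kernel sizes, noise and label dimensions, the two-variable (PV and load) output, and the sigmoid output for normalized data are all assumptions, not the configuration reported in Table 3.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully connected layers followed by 1D transposed convolutions (assumed sizes)."""
    def __init__(self, noise_dim=32, cond_dim=4, n_vars=2, horizon=24):
        super().__init__()
        assert horizon % 4 == 0, "two stride-2 deconvolutions upsample by a factor of 4"
        self.base_len = horizon // 4
        self.fc = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 128), nn.ReLU(),
            nn.Linear(128, 64 * self.base_len), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, n_vars, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z, c):
        h = self.fc(torch.cat([z, c], dim=1))          # joint mapping of noise and labels
        h = h.view(h.size(0), 64, self.base_len)       # reshape into time-series features
        return self.deconv(h)                          # (batch, n_vars, horizon)

class Critic(nn.Module):
    """1D-CNN feature extractor plus a linear scoring layer, without batch normalization."""
    def __init__(self, cond_dim=4, n_vars=2, horizon=24):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_vars, 32, kernel_size=7, padding=3), nn.LeakyReLU(0.2),  # wide kernel first
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),      # local detail
        )
        self.fc = nn.Linear(64 * horizon + cond_dim, 1)

    def forward(self, x, c):
        h = self.conv(x).flatten(start_dim=1)
        return self.fc(torch.cat([h, c], dim=1))       # unbounded scalar score per sample

gen, critic = Generator(), Critic()
z = torch.randn(8, 32)
c = torch.zeros(8, 4)
x_fake = gen(z, c)
score = critic(x_fake, c)
print(tuple(x_fake.shape), tuple(score.shape))
```

The critic ends in a plain linear layer rather than a sigmoid because the Wasserstein formulation uses an unbounded score instead of a probability.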
2.3.2. Loss Function and Gradient Penalty
WGAN replaces the Jensen–Shannon (JS) divergence used in traditional GANs with the Wasserstein distance (also known as the Earth-Mover distance), which mitigates the vanishing-gradient problem. To satisfy the 1-Lipschitz continuity constraint required for computing the Wasserstein distance, a gradient penalty term is introduced. The overall objective function is formulated as follows:

$$L = L_1 + L_2$$

$$L_1 = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}, c) \right] - \mathbb{E}_{x \sim P_r}\left[ D(x, c) \right]$$

$$L_2 = \lambda_{\mathrm{GP}} \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[ \left( \left\| \nabla_{\hat{x}} D(\hat{x}, c) \right\|_2 - 1 \right)^2 \right]$$

The overall objective function comprises the Wasserstein distance term $L_1$ and the gradient penalty term $L_2$, where $P_r$ denotes the probability distribution of real data; $P_g$ denotes the distribution of the generated data; $\hat{x}$ denotes a random interpolated sample point on the line between real and generated samples; $\nabla_{\hat{x}} D(\hat{x}, c)$ denotes the gradient of the discriminator at the interpolated point; and $\lambda_{\mathrm{GP}}$ denotes the gradient penalty coefficient. The detailed WGAN-GP training hyperparameters are summarized in Table 3.
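To make the two loss terms concrete, the sketch below evaluates them for a toy linear critic D(x) = w·x + b, whose gradient with respect to the input is simply w, so no automatic differentiation is needed. The function name, the linear critic, and the penalty coefficient of 10 are assumptions for illustration; in the actual PyTorch model the gradient at the interpolated points would be obtained by automatic differentiation.

```python
import numpy as np

def critic_loss_linear(w, b, x_real, x_fake, lam=10.0, rng=None):
    """Return (L1, L2) of the WGAN-GP critic loss for a linear critic D(x) = x @ w + b."""
    rng = np.random.default_rng(0) if rng is None else rng
    d_real = x_real @ w + b
    d_fake = x_fake @ w + b
    l1 = d_fake.mean() - d_real.mean()                  # Wasserstein term

    # Random points on the segments between paired real and generated samples.
    eps = rng.random((x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake

    # For a linear critic the input gradient equals w at every x_hat.
    grad = np.tile(w, (x_hat.shape[0], 1))
    grad_norm = np.linalg.norm(grad, axis=1)
    l2 = lam * np.mean((grad_norm - 1.0) ** 2)          # gradient penalty term
    return float(l1), float(l2)

rng = np.random.default_rng(0)
x_real = rng.normal(1.0, 0.1, size=(16, 24))
x_fake = rng.normal(0.0, 0.1, size=(16, 24))
w_unit = np.ones(24) / np.sqrt(24.0)                    # gradient norm exactly 1

l1_unit, l2_unit = critic_loss_linear(w_unit, 0.0, x_real, x_fake)
l1_big, l2_big = critic_loss_linear(2.0 * w_unit, 0.0, x_real, x_fake)
print(l1_unit, l2_unit, l2_big)
```

The penalty vanishes when the gradient norm equals one and grows quadratically as the norm departs from one, which is how the 1-Lipschitz condition is softly enforced.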
2.3.3. Scenario Generation and Reduction
The reconstructed complete dataset is used to train the WGAN-GP model. After training, 1000 candidate PV scenarios are generated from random noise. For the tested day, the generated scenario pool is matched to the reconstructed target-day PV profile by the Euclidean distance, and the five closest generated scenarios are averaged to obtain the representative WGAN-GP scenario used in the dispatch calculation. In addition, a K-value sensitivity script evaluates the trade-off between scenario-reduction representation error and the estimated variable scale of the IEEE 33-bus MILP model for K = 1, 3, 5, 10, 20, and 30.
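The matching step described above (Euclidean distance to the target-day profile, then averaging the five closest candidates) can be sketched as follows. The function name and the random candidate pool are illustrative assumptions; in the workflow the pool would hold the 1000 WGAN-GP samples.

```python
import numpy as np

def representative_scenario(pool, target, n_closest=5):
    """Average the n_closest pool profiles nearest to target in Euclidean distance.

    pool:   array of shape (n_scenarios, horizon)
    target: array of shape (horizon,)
    """
    dist = np.linalg.norm(pool - target, axis=1)
    idx = np.argsort(dist)[:n_closest]
    return pool[idx].mean(axis=0), idx

# Illustrative pool standing in for 1000 generated 24-hour PV scenarios.
rng = np.random.default_rng(0)
pool = rng.random((1000, 24))
target = rng.random(24)
scenario, chosen = representative_scenario(pool, target)
print(scenario.shape, chosen)
```

Averaging several near neighbours smooths sample-specific noise, at the cost of slightly flattening sharp ramps present in any single candidate.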
3. Source–Storage–Load Coordinated Optimization Model
Based on the K typical source-load uncertainty scenarios and their probability distributions generated in Section 2, this section establishes a stochastic optimization model for distribution networks considering source–storage–load coordination.
3.1. Objective Function
With the distribution system operator (DSO) as the decision-making entity, the objective function F is formulated to minimize the expected value of the total system operating cost over the optimization period T. The cost components include the cost of purchasing electricity from the upstream grid, the cost of network losses, the operating degradation cost of energy storage systems, and the demand response compensation cost.
To maintain MILP tractability, the network-loss term is not optimized as a quadratic expression of branch active and reactive power. Instead, branch losses are represented by an auxiliary variable and a piecewise-linear envelope around the LinDistFlow operating range. In this way, the loss-related cost remains linear in the optimization variables while still penalizing high-flow branches. The quadratic expression in the physical loss definition is therefore used only to define the loss proxy and to calibrate the piecewise-linear coefficients.
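$$\min F = \sum_{k=1}^{K} \pi_k \left( C_{\mathrm{buy},k} + C_{\mathrm{loss},k} + C_{\mathrm{ESS},k} + C_{\mathrm{DR},k} \right)$$

where $k$ is the scenario index, and $\pi_k$ denotes the occurrence probability of scenario $k$. The detailed mathematical expressions of each cost component are as follows: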
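Electricity purchase cost:

$$C_{\mathrm{buy},k} = \sum_{t=1}^{T} c_{t} \, P_{\mathrm{grid},t,k} \, \Delta t$$

where $c_t$ denotes the time-of-use electricity price at period $t$ (CNY/kWh); $P_{\mathrm{grid},t,k}$ denotes the active power exchanged between the root node of the distribution network and the upstream grid; and $\Delta t$ denotes the scheduling time interval.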
Network loss cost:

$$C_{\mathrm{loss},k} = \sum_{t=1}^{T} \sum_{(i,j)} \lambda_{\mathrm{loss}} \, P_{\mathrm{loss},ij,k,t} \, \Delta t \tag{11}$$

where $\lambda_{\mathrm{loss}}$ denotes the network-loss penalty coefficient and $P_{\mathrm{loss},ij,k,t}$ denotes the piecewise-linear approximation of the active power loss on branch $(i,j)$ under scenario $k$ in period $t$. In the implementation, a four-segment piecewise-linear envelope is adopted for each branch, so Equation (11) is embedded in the MILP without introducing quadratic decision terms.
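Energy storage degradation cost:

$$C_{\mathrm{ESS},k} = \sum_{t=1}^{T} c_{\mathrm{deg}} \left( P_{\mathrm{ch},t,k} + P_{\mathrm{dis},t,k} \right) \Delta t$$

where $c_{\mathrm{deg}}$ denotes the depreciation cost coefficient of energy storage per unit of energy throughput (CNY/kWh); $P_{\mathrm{ch},t,k}$ and $P_{\mathrm{dis},t,k}$ denote the charging and discharging power of the energy storage, respectively.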
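Demand response cost:

$$C_{\mathrm{DR},k} = \sum_{t=1}^{T} c_{\mathrm{DR}} \, P_{\mathrm{DR},t,k} \, \Delta t$$

where $c_{\mathrm{DR}}$ denotes the compensation price per unit of load curtailment; $P_{\mathrm{DR},t,k}$ denotes the demand response curtailment power implemented during period $t$.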
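The four-segment piecewise-linear loss envelope used in the network-loss cost can be illustrated with chord coefficients of a quadratic loss proxy. The sketch below is an assumption-laden example: it uses an active-power-only proxy r·P²/U₀² in per unit, equally spaced breakpoints, and hypothetical function names; the paper's calibration of the segments may differ.

```python
import numpy as np

def chord_coefficients(r_pu, p_max, n_seg=4, u0=1.0):
    """Slopes and intercepts of the chords of f(p) = r * p**2 / u0**2 on [0, p_max]."""
    bp = np.linspace(0.0, p_max, n_seg + 1)              # equally spaced breakpoints
    f = r_pu * bp ** 2 / u0 ** 2
    slopes = (f[1:] - f[:-1]) / (bp[1:] - bp[:-1])
    intercepts = f[:-1] - slopes * bp[:-1]
    return slopes, intercepts

def pwl_loss(p_abs, slopes, intercepts):
    """Envelope value: the largest chord at p_abs (valid because f is convex)."""
    return float(np.max(slopes * p_abs + intercepts))

r_pu, p_max = 0.05, 0.4                                  # hypothetical branch data in p.u.
a, b = chord_coefficients(r_pu, p_max)
for p in (0.0, 0.15, 0.2, 0.4):
    print(p, pwl_loss(p, a, b), r_pu * p ** 2)
```

In a MILP, the same envelope is obtained with linear constraints of the form P_loss ≥ a_s·|P| + b_s for every segment s, with |P| modeled by two inequalities; because the loss is penalized in the objective, the tightest constraint becomes active.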
3.2. Constraints
3.2.1. Linearized Power Flow Constraints
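To avoid the computational difficulty caused by the non-convex and nonlinear AC power flow constraints, this paper adopts a linearized power flow model to describe the physical constraints of the distribution network, thereby transforming the problem into a mixed-integer linear programming (MILP) problem. In the standard LinDistFlow form, for branch $(i,j)$:

$$U_j = U_i - \frac{r_{ij} P_{ij} + x_{ij} Q_{ij}}{U_0} \tag{14}$$

The node injection power balance equation is

$$\sum_{i \in \Omega_p(j)} P_{ij} - \sum_{l \in \Omega_c(j)} P_{jl} = P_j, \qquad \sum_{i \in \Omega_p(j)} Q_{ij} - \sum_{l \in \Omega_c(j)} Q_{jl} = Q_j$$

where $P_{ij}$ and $Q_{ij}$ are the branch active and reactive power flows, respectively; $r_{ij}$ and $x_{ij}$ are the branch resistance and reactance; $P_j$ and $Q_j$ are the net active and reactive power withdrawn at node $j$; $U_j$ is the node voltage magnitude; $\Omega_c(j)$ and $\Omega_p(j)$ are the sets of child and parent nodes of node $j$, respectively; and $U_0$ is the base voltage.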
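The linearized voltage-drop relationship can be evaluated on a radial feeder with one backward sweep for branch flows and one forward sweep for voltages. The sketch below is illustrative: the function name, the per-unit convention, the node numbering with each parent index smaller than its child, and the four-node feeder data are assumptions, not the IEEE 33-bus parameters.

```python
import numpy as np

def lindistflow_voltages(parent, r, x, p_net, q_net, u_root=1.0, u0=1.0):
    """Lossless linearized power flow on a radial feeder (all values in p.u.).

    parent[j] is the parent of node j (node 0 is the root, parent[0] is unused);
    r[j], x[j] are the impedance of the branch feeding node j;
    p_net[j], q_net[j] are the net power withdrawn at node j.
    Assumes parent[j] < j for every j >= 1.
    """
    n = len(parent)
    P = np.array(p_net, dtype=float)
    Q = np.array(q_net, dtype=float)
    for j in range(n - 1, 0, -1):            # backward sweep: accumulate subtree demand
        P[parent[j]] += P[j]
        Q[parent[j]] += Q[j]
    U = np.empty(n)
    U[0] = u_root
    for j in range(1, n):                    # forward sweep: linearized voltage drop
        U[j] = U[parent[j]] - (r[j] * P[j] + x[j] * Q[j]) / u0
    return U, P, Q

# Hypothetical four-node line feeder: 0 - 1 - 2 - 3.
parent = [-1, 0, 1, 2]
r = [0.0, 0.01, 0.01, 0.01]
x = [0.0, 0.005, 0.005, 0.005]
p_net = [0.0, 0.1, 0.1, 0.1]
q_net = [0.0, 0.05, 0.05, 0.05]
U, P, Q = lindistflow_voltages(parent, r, x, p_net, q_net)
print(U)
```

After the backward sweep, `P[j]` and `Q[j]` hold the flow on the branch feeding node `j`, so a negative net withdrawal (local generation exceeding load) raises downstream voltages, which is the mechanism behind the improved voltage margin under coordinated dispatch.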
3.2.2. Security Operation Constraints
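Node voltages and branch transmission power must satisfy the security limits:

$$U_{\min} \le U_{j,t,k} \le U_{\max}, \qquad -P_{ij}^{\max} \le P_{ij,t,k} \le P_{ij}^{\max}$$

where $U_{\min}$ and $U_{\max}$ are the lower and upper voltage limits, and $P_{ij}^{\max}$ is the thermal limit of branch $(i,j)$.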
3.2.3. Energy Storage System Operation Constraints
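Binary state variables $u_{\mathrm{ch},t}$ and $u_{\mathrm{dis},t}$ are introduced to prevent simultaneous charging and discharging of the energy storage:

$$0 \le P_{\mathrm{ch},t} \le u_{\mathrm{ch},t} P_{\mathrm{ESS}}^{\max}, \qquad 0 \le P_{\mathrm{dis},t} \le u_{\mathrm{dis},t} P_{\mathrm{ESS}}^{\max}, \qquad u_{\mathrm{ch},t} + u_{\mathrm{dis},t} \le 1$$

$$SOC_{t+1} = SOC_t + \frac{\left( \eta P_{\mathrm{ch},t} - P_{\mathrm{dis},t} / \eta \right) \Delta t}{E_{\mathrm{ESS}}}, \qquad SOC_{\min} \le SOC_t \le SOC_{\max}$$

where $P_{\mathrm{ESS}}^{\max}$ and $E_{\mathrm{ESS}}$ are the rated power and rated energy capacity of the energy storage, $\eta$ is the charging/discharging efficiency, and $SOC_{\min}$ and $SOC_{\max}$ bound the state of charge.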
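A small simulation helps check a candidate charging schedule against the storage constraints. The sketch below uses the case-study ratings (800 kWh, 200 kW, 95% efficiency, SOC range [0.1, 0.9], initial SOC 0.3); the function name and the assumption that the 95% efficiency applies separately to charging and to discharging are illustrative choices.

```python
def simulate_soc(p_ch, p_dis, e_rated=800.0, p_max=200.0, eta=0.95,
                 soc0=0.3, soc_min=0.1, soc_max=0.9, dt=1.0):
    """Propagate the state of charge and check the storage constraints.

    p_ch, p_dis: charging and discharging power per period in kW.
    Returns the SOC trajectory (length len(p_ch) + 1).
    """
    soc = [soc0]
    for pc, pd in zip(p_ch, p_dis):
        if pc > 0 and pd > 0:
            raise ValueError("simultaneous charging and discharging")
        if not (0 <= pc <= p_max and 0 <= pd <= p_max):
            raise ValueError("power limit violated")
        nxt = soc[-1] + (eta * pc - pd / eta) * dt / e_rated
        if not (soc_min - 1e-9 <= nxt <= soc_max + 1e-9):
            raise ValueError("SOC limit violated")
        soc.append(nxt)
    return soc

# Two hours of full-power charging followed by one hour of full-power discharging.
trajectory = simulate_soc(p_ch=[200.0, 200.0, 0.0], p_dis=[0.0, 0.0, 200.0])
print([round(s, 4) for s in trajectory])
```

In the MILP, the mutual-exclusion check in the first `if` is what the binary state variables enforce.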
3.2.4. Demand Response (DR) Constraints
Demand response must satisfy the maximum curtailment ratio and the daily curtailed-energy limit:

$$0 \le P_{\mathrm{DR},t,k} \le \alpha_{\max} P_{\mathrm{L},t,k}, \qquad \sum_{t=1}^{T} P_{\mathrm{DR},t,k} \, \Delta t \le E_{\mathrm{DR}}^{\max}$$

where $P_{\mathrm{L},t,k}$ denotes the load power in period $t$; $\alpha_{\max}$ denotes the maximum load curtailment rate; $E_{\mathrm{DR}}^{\max}$ denotes the maximum allowable daily curtailed energy.

The DR formulation assumes a single category of curtailable load with a uniform compensation price. This assumption simplifies the dispatch model and reflects the aggregated response contract available in the case study. However, it cannot distinguish industrial, commercial, and residential response preferences or different interruption discomfort costs. This limitation is further discussed in the Discussion section, and future work will consider multi-type DR resources with differentiated compensation prices and response-duration constraints.
3.2.5. Distributed Generation Output Constraints
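The actual output of distributed generation (such as PV) must not exceed the predicted maximum value under the current scenario:

$$0 \le P_{\mathrm{PV},i,t,k} \le \bar{P}_{\mathrm{PV},i,t,k}$$

where $\bar{P}_{\mathrm{PV},i,t,k}$ denotes the predicted maximum PV output at node $i$ in period $t$ under scenario $k$.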
3.3. Model Transformation and Solution
The above model contains continuous variables (power, voltage, and energy) and discrete integer variables (charging/discharging states of energy storage), and all constraints are either linear or have been linearized (LinDistFlow). Therefore, the model is a typical mixed-integer linear programming (MILP) problem.
In this paper, a complete coordinated optimization solution workflow is built on the Python platform. The implementation uses NumPy, Pandas, SciPy, scikit-learn, PyTorch, PuLP, CBC, and Matplotlib. The specific steps are as follows:
Data processing: The NumPy and Pandas libraries are used to process historical load, PV, and meteorological data, performing data cleaning and feature engineering. PyTorch is employed to construct and train the WGAN-GP network, generating K typical scenarios that conform to the probability distribution characteristics.
Optimization modeling: The active-power source–storage–load dispatch model is implemented using PuLP with explicit algebraic definitions of the power-balance, storage, and demand-response constraints.
Model solving: The CBC MILP solver is called through PuLP to obtain the optimal dispatch strategy for the baseline and coordinated scenarios.
Result analysis: Matplotlib is used for result visualization, and the total operating cost, voltage profile, branch loading, network losses, scenario-quality metrics, and computation time are calculated. The reported simulation platform and minimum implementation requirements are summarized in Table 4.
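The optimization-modeling and solving steps above can be sketched with PuLP and CBC for a single-bus, single-storage, single-scenario case. This is a simplified illustration, not the paper's full model: the function name, the price schedule, the load and PV profiles, the degradation coefficient, and the terminal SOC condition are hypothetical, and network, loss, and demand-response constraints are omitted.

```python
import pulp

def dispatch_with_storage(load_kw, pv_kw, price, p_max=200.0, e_rated=800.0,
                          eta=0.95, soc0=0.3, soc_min=0.1, soc_max=0.9,
                          c_deg=0.05, dt=1.0):
    """Minimal active-power dispatch MILP with one storage unit (illustrative)."""
    n = len(load_kw)
    T = list(range(n))
    prob = pulp.LpProblem("source_storage_load_dispatch", pulp.LpMinimize)

    p_grid = pulp.LpVariable.dicts("p_grid", T)                     # free: reverse flow allowed
    p_ch = pulp.LpVariable.dicts("p_ch", T, lowBound=0, upBound=p_max)
    p_dis = pulp.LpVariable.dicts("p_dis", T, lowBound=0, upBound=p_max)
    u_ch = pulp.LpVariable.dicts("u_ch", T, cat="Binary")
    u_dis = pulp.LpVariable.dicts("u_dis", T, cat="Binary")
    soc = pulp.LpVariable.dicts("soc", list(range(n + 1)),
                                lowBound=soc_min, upBound=soc_max)

    # Objective: purchase cost plus throughput-based degradation cost.
    prob += pulp.lpSum((price[t] * p_grid[t] + c_deg * (p_ch[t] + p_dis[t])) * dt
                       for t in T)

    prob += soc[0] == soc0
    prob += soc[n] >= soc0                                          # assumed terminal condition
    for t in T:
        prob += p_grid[t] == load_kw[t] - pv_kw[t] + p_ch[t] - p_dis[t]
        prob += p_ch[t] <= p_max * u_ch[t]
        prob += p_dis[t] <= p_max * u_dis[t]
        prob += u_ch[t] + u_dis[t] <= 1                             # no simultaneous charge/discharge
        prob += soc[t + 1] == soc[t] + (eta * p_ch[t]
                                        - (1.0 / eta) * p_dis[t]) * (dt / e_rated)

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.LpStatus[prob.status], pulp.value(prob.objective)

# Hypothetical 24-hour data (kW and CNY/kWh), for illustration only.
price = [0.35] * 8 + [0.65] * 2 + [1.0] * 4 + [0.65] * 4 + [1.0] * 3 + [0.65] * 2 + [0.35]
load_kw = [300.0] * 6 + [400.0] * 4 + [500.0] * 4 + [450.0] * 4 + [550.0] * 3 + [350.0] * 3
pv_kw = [0.0] * 7 + [100.0, 250.0, 400.0, 500.0, 550.0, 500.0, 400.0, 250.0, 100.0] + [0.0] * 8

status, cost = dispatch_with_storage(load_kw, pv_kw, price)
baseline = sum(c * (l - g) for c, l, g in zip(price, load_kw, pv_kw))
print(status, round(cost, 2), round(baseline, 2))
```

Because leaving the storage idle is always feasible in this sketch, the optimized cost cannot exceed the no-storage baseline, which gives a simple sanity check on the model.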
4. Case Study Analysis
4.1. Case Study Setup and Parameter Description
To verify the effectiveness and advancement of the proposed method, a modified IEEE 33-bus distribution system is employed for simulation analysis. The system voltage base is 12.66 kV and the power base is 10 MVA. Distributed resources are connected at key nodes of the system. The total installed renewable capacity is 1.6 MW, corresponding to 37.2% of the daily peak load in the test feeder and 31.6% of the daily energy demand; therefore, the case represents a high-renewable-penetration distribution-network operating condition.
Nodes 18 and 33 are connected to distributed photovoltaic (PV) systems, each with a rated capacity of 800 kW.
Nodes 21 and 30 are connected to energy storage systems (ESS), each with a rated energy capacity of 800 kWh and a rated power of 200 kW. The charging/discharging efficiency is set to 95%, the state-of-charge (SOC) operating range is [0.1, 0.9], and the initial SOC is set to 0.3.
It is assumed that 10% of the total system load has demand response capability, and the compensation price per unit of curtailment is set at 0.8 CNY/kWh.
Synthetic PV and load profiles are selected as the base dataset, with a time resolution of 1 h. To simulate common measurement anomalies in actual distribution networks, 20% of PV data points are randomly removed in the main case. Additional missing-data rates of 5%, 10%, 20%, and 40% are tested for reconstruction sensitivity. The time-of-use electricity price schedule used in the dispatch model is listed in Table 5. The simulation platform adopts Python. The WGAN-GP model is constructed using PyTorch, and the CBC solver is invoked through PuLP to solve the active-power MILP dispatch model. The topology of the modified IEEE 33-bus test system and the locations of flexible resources are shown in Figure 3.
4.2. Analysis of Data Reconstruction and Scenario Generation Effects
4.2.1. Comparison of Data Reconstruction Accuracy
First, the effectiveness of the limited information reconstruction technique based on low-rank matrix factorization is validated. The proposed method is compared with the traditional zero-fill/simple interpolation method (Raw/Zero-fill). The data reconstruction results are shown in Figure 4.

As shown in Figure 4, the original observed data contain evident missing points marked in red. The traditional simple filling method cannot cope with consecutive missing data segments, resulting in unnatural breaks in the curves. In contrast, the low-rank reconstruction method proposed in this paper exploits the spatiotemporal correlation of PV output and closely restores the fluctuation trends of the true curve.

The reconstruction sensitivity results in Table 6 show that the low-rank reconstruction method consistently decreases the reconstruction RMSE at different missing-data rates. According to the error comparison in Table 7, under the 20% random missing-data setting, the overall RMSE of zero-fill processing is 177.15 kW, whereas the proposed low-rank reconstruction reduces the RMSE to 52.40 kW. This corresponds to an RMSE improvement of approximately 70.42%.
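The reported improvement follows directly from the two RMSE values. The short sketch below reproduces the arithmetic; the helper names are illustrative.

```python
import numpy as np

def rmse(estimate, truth):
    """Root-mean-square error between two arrays of equal shape."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

def improvement_pct(rmse_reference, rmse_new):
    """Relative RMSE reduction in percent."""
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference

# Values reported for the 20% missing-data case (kW).
gain = improvement_pct(177.15, 52.40)
print(round(gain, 2))  # 70.42
```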
4.2.2. Analysis of Scenario Generation Quality
Based on the reconstructed complete dataset, the WGAN-GP is employed to generate future operation scenarios. A comparison between the generated typical scenarios and those obtained from K-means clustering is shown in Figure 5.

As can be seen from Figure 5, the scenario obtained by direct K-means matching tends to be smoother and more dependent on existing historical samples, whereas the WGAN-GP procedure generates a candidate scenario pool and then selects the five candidates closest to the reconstructed target-day condition. Table 8 further quantifies the scenario-generation quality, while Table 9 evaluates the effect of the representative scenario number K. The results indicate that increasing K improves scenario representation but rapidly enlarges the optimization model size.
4.3. Analysis of Coordinated Optimal Dispatch Results
To quantify the economic benefits of the source–storage–load coordinated optimization strategy, the following two comparative scenarios are established:
Scenario 1 (Baseline): No energy storage or demand response is configured; the system only passively receives power from the main grid to meet the load demand.
Scenario 2 (Coordinated): The proposed strategy is adopted, implementing full source–storage–load coordinated MILP optimization.
4.3.1. Analysis of Operating Cost and Grid Interaction
Figure 6 presents a comparison of the interactive power between the distribution network and the upstream main grid under the two scenarios.
As can be seen from Figure 6 and Table 10, in Scenario 1 (black dashed line), the grid interactive power completely follows the net load fluctuation, resulting in a system operating cost as high as 10,060.36 CNY. In Scenario 2 (red solid line), by introducing coordinated optimization, the purchased power from the main grid is significantly reduced during peak electricity price periods, especially around 10:00–14:00 and 19:00–21:00. In addition, reverse power flow occurs during renewable-output surplus periods, as indicated by the shaded region below zero in Figure 6.
The final results show that after adopting the proposed coordinated optimization strategy, the total system operating cost is reduced to 9414.67 CNY, representing a decrease of 6.42% compared with the baseline cost of 10,060.36 CNY. The electricity-purchase cost is reduced to 9094.92 CNY, while the ESS degradation cost and demand-response compensation cost are explicitly included as 119.75 CNY and 200.00 CNY, respectively. This result indicates that the economic benefit in the tested case mainly comes from energy-storage time shifting, limited demand-response peak shaving, and renewable scenario matching.
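The reported cost breakdown and reduction can be cross-checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those stated in the text.

```python
# Reported cost components for the coordinated case (CNY).
purchase_cost = 9094.92
ess_degradation_cost = 119.75
dr_compensation_cost = 200.00
baseline_cost = 10060.36

coordinated_cost = purchase_cost + ess_degradation_cost + dr_compensation_cost
reduction_pct = 100.0 * (baseline_cost - coordinated_cost) / baseline_cost

print(round(coordinated_cost, 2))  # 9414.67
print(round(reduction_pct, 2))     # 6.42
```

The three components sum to the reported total, so the network-loss penalty does not appear as a separate item in this breakdown.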
To examine the network-security implication of the dispatch strategy, a post-dispatch voltage check is performed for all 33 nodes using the linearized voltage-drop relationship in Equation (14). Figure 7 compares the nodal voltage profiles under the baseline and coordinated cases, and the corresponding voltage and network-loss indicators are summarized in Table 11. Both profiles remain within the allowable range of 0.95–1.05 p.u., and the coordinated case maintains a higher downstream voltage margin. The minimum voltage increases from approximately 0.952 p.u. in the baseline case to approximately 0.964 p.u. under coordinated dispatch. Therefore, the economic cost reduction is achieved without violating the prescribed voltage-security limits in the tested case.
4.3.2. Analysis of Flexible Resource Dispatch Strategy
Figure 8 further illustrates the specific dispatch actions of the energy storage system (SOC) and demand response (DR) in Scenario 2.

Based on the dispatch curves in Figure 8 and the detailed simulation data, the analysis is as follows:
Time-shifting effect of energy storage:
During the early morning valley period (04:00–06:00), electricity prices are low, and the energy storage system charges at high power, causing the SOC to rise rapidly from its initial state and storing energy for daytime operation.
During the evening peak period (19:00–21:00), which coincides with high load and high electricity prices, the energy storage system discharges intensively, and the SOC decreases rapidly. This action replaces expensive power purchases from the main grid and fulfills the arbitrage and support role of “charging at low prices and discharging at high prices.”
Peak shaving effect of demand response:
The orange bars in
Figure 8 show that demand response is triggered at 10:00–11:00 and 19:00–20:00, with load-curtailment amounts of approximately 120 kW and 130 kW, respectively. These periods correspond to local peaks in the system net load and high electricity-price intervals.
With a compensation price of 0.8 CNY/kWh, the total demand-response compensation cost is 200.00 CNY. The limited curtailment of non-critical load alleviates peak-period power-supply pressure and works together with the energy storage system to reduce the system peak-valley difference.
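The demand-response compensation cost follows directly from the curtailed energy and the compensation price. The sketch below assumes that each event lasts exactly one hour at the approximate curtailment levels quoted above; the event list and function name are illustrative.

```python
DR_PRICE = 0.8  # compensation price from the text (CNY/kWh)

# (time window, curtailed power in kW, duration in h); one-hour durations are assumed.
dr_events = [
    ("10:00-11:00", 120.0, 1.0),
    ("19:00-20:00", 130.0, 1.0),
]


def dr_compensation(events, price):
    """Return the curtailed energy (kWh) and the compensation cost (CNY)."""
    energy_kwh = sum(power_kw * hours for _, power_kw, hours in events)
    return energy_kwh, round(energy_kwh * price, 2)


curtailed_kwh, dr_cost = dr_compensation(dr_events, DR_PRICE)
print(curtailed_kwh, dr_cost)
```

Under these assumptions the curtailed energy is 250 kWh, which reproduces the reported compensation cost of 200.00 CNY.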
5. Discussion
The case study results indicate that the proposed workflow can improve the reliability of dispatch decisions when measurement data are incomplete and renewable generation is uncertain. The reconstruction module mainly improves the quality of input data, the WGAN-GP module improves the representativeness of source-load uncertainty scenarios, and the MILP module converts the obtained scenarios into economically and operationally feasible dispatch decisions. Therefore, the contribution of this paper should be understood as an integrated data-to-decision framework rather than as a standalone replacement for all existing reconstruction, generative modeling, or optimization methods.
The supplementary sensitivity analysis focuses on the scenario number K. As K increases, the scenario-reduction representation error decreases, but the estimated number of variables in the IEEE 33-bus MILP grows rapidly. Therefore, K = 5 is adopted in the main numerical case as a compromise between scenario coverage and computational burden. The cost sensitivity under different prices, ESS, and renewable-penetration settings is summarized in
Table 12, while the scalability assessment on different distribution networks is given in
Table 13. These tests are retained as future validation tasks to further evaluate the generality of the proposed framework.
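The role of K can be illustrated with a minimal sketch of selecting the K generated scenarios closest to a target-day profile and averaging them. The Euclidean distance, the function name, and the small three-point scenario pool are assumptions made for illustration; the paper's own similarity measure and data may differ.

```python
import math


def k_nearest_average(pool, target, k):
    """Average the k scenarios in the pool that are closest to the target.

    pool: list of scenarios, each a list of floats with the same length as target.
    Closeness is measured by Euclidean distance (an assumption of this sketch).
    """
    ranked = sorted(pool, key=lambda scenario: math.dist(scenario, target))
    chosen = ranked[:k]
    return [sum(s[t] for s in chosen) / k for t in range(len(target))]


# Hypothetical pool of four three-point scenarios and a target-day profile.
pool = [
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 5.0, 0.0],
]
target = [0.0, 2.2, 0.0]

representative = k_nearest_average(pool, target, k=2)
print(representative)
```

In the numerical case described in the paper, the pool would hold the 1000 generated scenarios and k would be set to 5.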
Several limitations remain. First, the validation is based on synthetic PV/load profiles and a single dispatch test day; therefore, the numerical results should be interpreted as reproducible benchmark evidence rather than universal field validation. Second, the DR model uses one aggregated curtailable-load category and a uniform compensation price, which does not fully reflect the heterogeneity of industrial, commercial, and residential response resources. Third, the present study mainly verifies the active-power source–storage–load dispatch framework; the branch-level voltage, thermal-loading, and network-loss evaluation is simplified and requires more detailed treatment. Future work will extend the validation to multi-season field datasets, larger unbalanced feeders, heterogeneous DR contracts, and full LinDistFlow/AC power-flow checking.
6. Conclusions
To address the challenges of low data quality and high uncertainty in distribution networks with a high penetration of renewable energy, this paper developed a data-driven operation optimization framework combining low-rank matrix completion, WGAN-GP scenario generation, and MILP-based source–storage–load coordinated dispatch. The main findings from the modified IEEE 33-bus case study are as follows.
- (1)
The limited-information reconstruction technique addresses the problem of missing data in distribution networks. By exploiting the low-rank property of the data, it improves the reconstruction accuracy by 70.42% compared with traditional methods, which provides more reliable input data for the subsequent optimization.
- (2)
The WGAN-GP module provides a pool of generated PV scenarios for dispatch analysis. In the numerical case, 1000 candidate scenarios are generated, and the five scenarios closest to the target-day condition are averaged. The K-sensitivity analysis shows the trade-off between scenario representation error and optimization model size.
- (3)
The source–storage–load coordinated optimization strategy improves economic performance in the tested active-power dispatch case. The daily operating cost decreases from 10,060.36 CNY to 9414.67 CNY, corresponding to a 6.42% reduction. The result is mainly obtained through energy-storage time shifting, limited demand-response peak shaving, and reduced electricity-purchase cost.
Nevertheless, the current results should be interpreted as evidence of feasibility rather than universal superiority. The main limitations are the synthetic dataset, the single test day, the simplified single-category DR model, and the simplified voltage/loss assessment under the present experimental configuration. Future work will extend the validation to field datasets, larger feeders, heterogeneous DR resources, and comparisons with Transformer, diffusion, robust optimization, and stochastic programming methods.