Article

A Small-Sample Scenario Optimization Scheduling Method Based on Multidimensional Data Expansion

by Yaoxian Liu 1, Kaixin Zhang 1, Yue Sun 2, Jingwen Chen 1 and Junshuo Chen 3,*

1 School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi’an 710021, China
2 State Grid Jibei Electric Power Co., Ltd., Research Institute, Beijing 100045, China
3 School of Energy and Electrical Engineering, Chang’an University, Xi’an 710064, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(6), 373; https://doi.org/10.3390/a18060373
Submission received: 5 May 2025 / Revised: 14 June 2025 / Accepted: 17 June 2025 / Published: 19 June 2025
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

Deep reinforcement learning (DRL) has been widely applied to energy system optimization and scheduling, but DRL methods rely heavily on historical data. Newly built integrated energy systems lack historical operation data, so DRL training samples are insufficient, which easily leads to underfitting and inadequate exploration of the decision space and thus reduces the accuracy of the scheduling plan. In addition, with insufficient training data, conventional data-driven methods also struggle to accurately predict renewable energy output, which further degrades the scheduling results. Therefore, this paper proposes a small-sample scenario optimization scheduling method based on multidimensional data expansion. Firstly, based on spatial correlation, the daily power curves of PV power plants with measured power are screened, meteorological similarity is calculated using the multi-kernel maximum mean discrepancy (MK-MMD), and historical output data for the target distributed PV system are generated through the capacity conversion method. Secondly, based on the existing daily load data of different day types, historical load data are generated by random sampling at the same time of day to construct a complete historical dataset. Subsequently, to address the sample imbalance in the small-sample scenario, an oversampling method is used to augment the scarce real samples, and an XGBoost PV output prediction model is established. Finally, the optimal scheduling model is formulated as a Markov decision process and solved with the Deep Deterministic Policy Gradient (DDPG) algorithm. The effectiveness of the proposed method is verified by case studies.

1. Introduction

With the growing prominence of energy crises and environmental issues, countries worldwide are actively restructuring their energy systems to reduce dependence on fossil fuels [1,2]. Integrated energy systems (IES), through multi-energy coupling, have emerged as an effective solution to enhance energy utilization efficiency and address these challenges [3,4]. The optimal scheduling of IES currently represents a key research focus in this field.
A large number of studies have addressed integrated energy system dispatch, mostly as offline optimization problems for day-ahead scheduling [5,6,7,8,9,10]. Stochastic programming [11,12] and robust optimization [13] have been used to handle the uncertainties caused by fluctuations in renewable energy sources, loads, and real-time tariffs [14]. Traditional optimization methods for dealing with uncertainty rely on forecasts of day-ahead renewable energy output, load, and other data: by modeling the uncertainty, the problem is transformed into a deterministic one that can be solved to obtain a good dispatch solution. These methods are mainly based on physical modeling approaches and require an accurate model of the integrated energy system to extract sufficient dispatch information.
This paper considers another class of optimal scheduling methods: data-driven optimal scheduling. Methods such as reinforcement learning (RL) and deep RL (DRL) [15] have gained widespread attention since the success of AlphaGo [16]. RL learns through interactive trial-and-error [17], allowing model-free algorithms to handle system factors that are difficult to accurately model, and exhibits better real-time decision-making performance, which can be utilized in IES online real-time scheduling.
Several studies have applied DRL to IES optimal scheduling. Reference [18] proposed a real-time battery storage control model based on reinforcement learning by describing the problem as a Markov decision process (MDP), solving it with the Q-learning algorithm, and analyzing the effect of the discretization of the action space on the algorithm’s performance. However, for continuous-space problems such as integrated energy system scheduling, discretizing continuous variables significantly increases the computation time and error. Current mainstream DRL algorithms include the Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm, among others [19,20,21,22]. Reference [23] utilized the DDPG algorithm to solve continuous-space problems, verifying the effectiveness and robustness of the algorithm. There is extensive literature on improving these methods and enhancing algorithm performance with respect to the convergence difficulty and instability of deep reinforcement learning. Reference [24] tuned the proportional-integral-derivative controller parameters with an improved reinforcement learning agent based on Double-Delayed Deep Deterministic Policy Gradients and trained multiple deep reinforcement learning agents under stochastic load perturbations and nonlinear power generation models to obtain the optimal controller gains for a multi-area interconnected system.
Deep reinforcement learning has been widely applied to energy system optimization and scheduling. However, DRL methods rely heavily on historical data, and few existing studies have considered scenarios with insufficient data. The lack of historical operation data in a newly built integrated energy system leads to insufficient DRL training samples, which makes the agent prone to underfitting and inadequate exploration of the decision space and thus reduces the accuracy of the scheduling plan. At the same time, the volatility and uncertainty of renewable energy output in the integrated energy system seriously affect the energy supply-demand balance [25], the energy management of the storage system, and demand response management, and accurate prediction of this output can effectively improve the quality of real-time intraday scheduling in IESs [26]. Many newly built IESs have been in operation for only a short period, and the lack of historical renewable energy data makes it difficult to establish a high-precision output prediction model. Reference [27] proposed a hybrid prediction model based on gray correlation analysis (GRA) combined with a sparrow search algorithm (SSA) and a gray neural network model (GNNM) to address the low accuracy of short-term PV power prediction with small sample sizes. Reference [28] proposed a PV power prediction framework based on data enhancement, which recovers PV power data by filtering erroneous data and utilizing an autoencoder network; experiments show that this method can effectively discover relevant features and repair erroneous data. However, these methods remain insufficient for comprehensive small-sample PV prediction.
Therefore, to address the problem of missing data in small-sample scenarios of DRL-based integrated energy system optimization scheduling, this paper proposes a data-driven XGBoost-DDPG optimization scheduling framework based on multidimensional data expansion for integrated energy systems in small-sample scenarios, which comprises data expansion, photovoltaic prediction, and real-time optimal scheduling. The experimental results demonstrate the effectiveness of the approach in addressing data scarcity through multidimensional data augmentation and improved prediction-scheduling coordination. The main contributions of this paper are:
1. A small-sample data expansion method based on spatiotemporal meteorological synergistic enhancement is proposed to address the scarcity of historical data for newly built integrated energy systems. The method integrates a dual screening mechanism of geospatial correlation (Haversine formula) and meteorological feature matching (MK-MMD algorithm) and migrates cross-station PV data through a capacity conversion formula. Dynamic load enhancement based on day-type segmentation generates, via uniform random sampling, synthetic load data that accurately reflect the distribution characteristics, effectively addressing data scarcity in small-sample scenarios.
2. For the optimization and scheduling of a newly built integrated energy system, an optimization and scheduling framework based on multidimensional data expansion in small-sample scenarios is proposed, which includes the expansion of PV and load data in small-sample scenarios, PV output prediction considering sample imbalance, and real-time optimal scheduling based on the DRL method. The framework implements a closed-loop solution spanning data expansion, prediction modeling, and optimal scheduling, and provides a new technical path for integrated energy system optimal scheduling in small-sample scenarios.
The rest of the paper is organized as follows. Section 2 presents a method for expanding small-sample data based on spatiotemporal meteorological synergistic enhancement. Section 3 describes the XGBoost PV power prediction method considering data imbalance in small-sample scenarios. Section 4 develops the optimal scheduling model of an integrated energy system based on deep reinforcement learning in small-sample scenarios. Case studies are presented in Section 5. Section 6 concludes.

2. A Small-Sample Data Expansion Method Based on Spatiotemporal Meteorological Synergistic Enhancement

The optimized scheduling of newly built integrated energy systems (IESs) faces the challenge of data scarcity in small-sample scenarios, particularly when renewable energy output is highly uncertain. Traditional data enhancement methods (e.g., interpolation and generative adversarial networks) are prone to overfitting or mode collapse when the sample size is too small.
Since closely spaced PV power plants in the same region tend to experience similar meteorological conditions, the output of PV equipment with similar capacity, efficiency, etc. exhibits a high degree of similarity. Therefore, when historical data for a new PV power plant are lacking, they can be expanded by utilizing the historical data of similar PV power plants. Although the power load also has a certain degree of uncertainty, its overall distribution characteristics are more stable, making dynamic data enhancement methods appropriate for its expansion.
Therefore, this paper proposes a small-sample data expansion method based on Spatiotemporal Meteorological Augmentation (STMA), which replaces the daily power curves of the target distributed photovoltaic (PV) system, on the basis of meteorological similarity, with the daily power curves of surrounding PV stations that have measured power. The electricity load data employ a dynamic enhancement strategy based on day-type division, which fully considers the differences in load patterns between weekdays and weekends; a uniform random sampling method generates synthetic data that conform to the real distribution characteristics, thereby constructing an enhanced dataset applicable to the target area. The data expansion is divided into PV output data expansion and electricity load data expansion.
The complete data expansion method flow is shown in Figure 1 and contains the following core steps:
1. Spatial correlation screening: Select neighboring PV plants based on geographic distance.
2. Meteorological similarity calculation: Quantify the difference in the distribution of meteorological features between the source and target domains using the multi-kernel maximum mean discrepancy (MK-MMD) method.
3. Data conversion and fusion: Retain the source PV plant data with the highest similarity, convert them to the target PV plant according to the capacity ratio, and obtain complete PV output data.
4. Day type classification: Classify the electricity load data according to day type.
5. Same-day data filling: Fill in the missing historical load data by random sampling within the same day type and hour.
6. Data merging: Merge the generated PV output and load data with the existing data, aligning them in time to obtain a more comprehensive historical electricity consumption dataset.
The basic idea of reconstructing the missing power data of stations with no measured output is to replace the daily power curve of the target distributed PV system, on the basis of meteorological similarity, with the daily power curves of surrounding multi-station PV plants that have measured power. The specific process is as follows:
(1) Spatial correlation screening
The smaller the spatial distance between PV power stations, the stronger the correlation between their meteorological conditions and power patterns. Therefore, PV power stations with stronger correlation are first screened by spatial correlation. Define the coordinates of the target area power plant as (lat_t, lon_t) and the coordinates of a candidate source domain power plant as (lat_s, lon_s); the Haversine formula gives the geographic distance d:
$$d = 2R \arcsin\left( \sqrt{ \sin^2\frac{\Delta lat}{2} + \cos(lat_t)\cos(lat_s)\sin^2\frac{\Delta lon}{2} } \right) \tag{1}$$
where $R = 6371$ km is the radius of the Earth, $\Delta lat = lat_t - lat_s$, and $\Delta lon = lon_t - lon_s$.
Screening strategy: Retain stations with $d \le D_{\max}$ as candidate source domains, with $D_{\max}$ set according to regional climate characteristics (e.g., $D_{\max}$ = 50 km for plains), or select the nearest $n$ stations according to the number of stations.
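As a minimal illustration of this screening step, the sketch below implements the Haversine distance and the distance-based filter; the station names, coordinates, and the $D_{\max}$ = 50 km threshold in the example are illustrative assumptions, not the actual plants used in the case study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R in the Haversine formula

def haversine_km(lat_t, lon_t, lat_s, lon_s):
    """Geographic distance d between the target and a candidate source station."""
    dlat = math.radians(lat_t - lat_s)
    dlon = math.radians(lon_t - lon_s)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat_t)) * math.cos(math.radians(lat_s))
         * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def screen_candidates(target, candidates, d_max=50.0, n_nearest=None):
    """Keep stations with d <= D_max, or the n nearest stations."""
    scored = sorted(
        (haversine_km(target[0], target[1], lat, lon), name)
        for name, (lat, lon) in candidates.items()
    )
    if n_nearest is not None:
        return [name for _, name in scored[:n_nearest]]
    return [name for d, name in scored if d <= d_max]

# Hypothetical coordinates for illustration only.
target_plant = (34.37, 108.95)
candidate_plants = {"S1": (34.40, 109.02), "S2": (34.10, 108.60), "S3": (35.20, 110.00)}
print(screen_candidates(target_plant, candidate_plants, d_max=50.0))
```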
(2) Calculation of meteorological similarity
The multi-kernel maximum mean discrepancy (MK-MMD) is employed to compute meteorological similarity. A temporal feature matrix $X \in \mathbb{R}^{T \times F}$ is constructed for each candidate power station in the source domain, where $T$ denotes the number of time steps and $F$ the number of meteorological features such as irradiance, temperature, and wind speed. The source domain $S = \{x_i^s\}_{i=1}^{n}$ and target domain $T = \{x_j^t\}_{j=1}^{m}$ are defined, and MK-MMD quantifies their distribution discrepancy using multi-scale Gaussian kernels:
$$\text{MK-MMD}^2(S, T) = \frac{1}{n^2}\sum_{i,j=1}^{n} K(x_i^s, x_j^s) + \frac{1}{m^2}\sum_{i,j=1}^{m} K(x_i^t, x_j^t) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} K(x_i^s, x_j^t) \tag{2}$$
The multi-kernel function is defined as:
$$K(x, y) = \sum_{k=1}^{K} \exp\left( -\frac{\lVert x - y \rVert^2}{2\sigma_k^2} \right) \tag{3}$$
where σk represents adaptive bandwidths.
The similarity metric is computed by:
$$\text{Similarity} = \frac{1}{1 + \text{MK-MMD}} \tag{4}$$
For station data comparison, a 24-h (1-day) cycle is adopted as the temporal unit. Daily similarity between each candidate station and the target station is calculated. PV output data are then converted proportionally based on date-matched similarity results.
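A minimal numpy sketch of this similarity computation is given below. The bandwidth set $\sigma_k$ and the daily feature matrices are illustrative assumptions; in practice the adaptive bandwidths would be derived from the data (e.g., multiples of the median pairwise distance), and the non-squared MK-MMD is taken as the square root of Equation (2) before applying Equation (4).

```python
import numpy as np

def multi_kernel(x, y, sigmas):
    """Sum of Gaussian kernels with multi-scale bandwidths sigma_k (Equation (3))."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    return sum(np.exp(-d2 / (2.0 * s ** 2)) for s in sigmas)

def mk_mmd_squared(source, target, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Squared MK-MMD between source and target daily feature matrices (T x F)."""
    n, m = len(source), len(target)
    k_ss = multi_kernel(source, source, sigmas).sum() / (n * n)
    k_tt = multi_kernel(target, target, sigmas).sum() / (m * m)
    k_st = multi_kernel(source, target, sigmas).sum() / (n * m)
    return k_ss + k_tt - 2.0 * k_st

def similarity(source, target):
    """Similarity = 1 / (1 + MK-MMD), Equation (4)."""
    return 1.0 / (1.0 + np.sqrt(max(mk_mmd_squared(source, target), 0.0)))

# Illustrative 24-step daily feature matrices (a single feature such as irradiance).
rng = np.random.default_rng(0)
day_source = rng.random((24, 1))
day_target = rng.random((24, 1))
print(similarity(day_source, day_target))
```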
(3) PV Output Data Conversion:
The historical daily power profiles $P_{CG}^{t}$ from the selected source domain PV stations are scaled according to the capacity ratio between the well-documented station capacity $S_{CG}$ and the target distributed PV capacity $S_{DG}$, yielding the reconstructed daily power curve $P_{DG}^{t}$:
$$P_{DG}^{t} = P_{CG}^{t} \cdot \frac{S_{DG}}{S_{CG}} \tag{5}$$
For the filtered source domain PV stations $S_1, S_2, \ldots, S_k$, their output data $P_s$ are temporally aligned with the target domain data $P_t$ and concatenated:
$$P_{\text{fused}} = \text{Concat}(P_t, P_{s_1}, P_{s_2}, \ldots, P_{s_k}) \tag{6}$$
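The capacity conversion and concatenation steps of Equations (5) and (6) reduce to a simple scaling and a time-aligned merge; a pandas sketch under assumed capacities and column layout is shown below.

```python
import pandas as pd

def convert_capacity(p_cg: pd.Series, s_cg_kw: float, s_dg_kw: float) -> pd.Series:
    """Scale a source-station daily power curve by the capacity ratio S_DG / S_CG."""
    return p_cg * (s_dg_kw / s_cg_kw)

# Hypothetical example: two source-station days scaled to a 500 kW target plant.
p_target = pd.Series([0.0, 120.0, 310.0], name="P_DG")          # existing target data (kW)
p_s1 = convert_capacity(pd.Series([0.0, 400.0, 900.0]), 2000.0, 500.0)
p_s2 = convert_capacity(pd.Series([0.0, 350.0, 880.0]), 1500.0, 500.0)

# Time-aligned concatenation of the target curve and the converted source curves (P_fused).
p_fused = pd.concat([p_target, p_s1, p_s2], ignore_index=True)
print(p_fused.head())
```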
(4) Day Type Classification:
To address the critical challenge of limited historical data in newly established energy systems, this study intelligently extracts and restructures load characteristics from existing weekday and weekend data. The original dataset is first partitioned into two distinct temporal categories:
Let the original dataset be $D \in \mathbb{R}^{p \times 24}$, containing $p$ days of load profiles:
  • Weekday subset: $D_{\text{weekday}} \in \mathbb{R}^{n \times 24}$ (comprising $n$ days);
  • Weekend subset: $D_{\text{weekend}} \in \mathbb{R}^{m \times 24}$ (containing $m$ days);
where $p = n + m$.
(5) Same-Day Data Imputation:
This module retains the periodic characteristics of the power load by conditioning on the day type, ensuring that the generated data conform to the actual power consumption pattern (such as the difference between weekday and weekend peak loads) and avoiding the feature distortion caused by purely random generation. A uniform random sampling strategy is adopted to maintain the distribution characteristics of the original data while ensuring data diversity, so that the enhanced dataset not only covers the decision space adequately but also accurately reflects the uncertainty of the power load demand, thereby improving the generalization ability and robustness of the subsequent dispatching model.
Let there be an existing weekday subset $D_{\text{weekday}}$, a weekend subset $D_{\text{weekend}}$, and a number of generation days $K$. Let $D_\tau$ be the sampling subset, where $\tau$ denotes the day type, $\tau \in \{\text{weekday}, \text{weekend}\}$, and let the data to be generated be $\hat{D}$, expressed as:
$$\hat{D} = [\hat{d}_{k,0}, \hat{d}_{k,1}, \ldots, \hat{d}_{k,23}] \in \mathbb{R}^{K \times 24}, \quad k \in [1, K] \tag{7}$$
$$D_\tau = \begin{cases} D_{\text{weekday}}, & \tau = \text{weekday} \\ D_{\text{weekend}}, & \tau = \text{weekend} \end{cases} \tag{8}$$
For the target moment $t \in \{0, 1, \ldots, 23\}$, a value is sampled independently from the corresponding dataset:
$$\hat{d}_t = \begin{cases} D_\tau(k_t), & k_t \sim U\{1, n\}, & \text{if weekday data are generated} \\ D_\tau(s_t), & s_t \sim U\{1, m\}, & \text{if weekend data are generated} \end{cases} \tag{9}$$
where U { · } denotes a uniform discrete distribution.
The specific process of data augmentation is as follows: First, the corresponding original dataset ( D weekday or D weekend ) is automatically selected based on the type of date (weekday/weekend), and then, the synthetic data are generated by a double-loop structure: the outer loop controls the generation of K time series, and the inner loop randomly samples the load values from the selected dataset on an hour-by-hour basis (0–23 h). A uniform distribution is used for each sampling to ensure data diversity, and the constructed complete 24-h sequence is finally added to the augmented dataset D ^ . This method effectively addresses the lack of historical data for new energy systems by preserving the temporal characteristics of the original data (e.g., the difference in load patterns between weekdays and weekends). Algorithm 1 is the pseudo-code for this process.
Algorithm 1 Power load data expansion algorithm
Input: Raw datasets D weekday , D weekend ; Number of synthetic sequences K .
Output: Augmented dataset D ^ K × 24
1: Data Preparation:
2: Initialize the day-type selector τ
3: Iterative Generation:
4: for k = 1 to K do
5:    Determine the sequence type:
6:    if τ is weekday then
7:       D_τ ← D_weekday
8:    else
9:       D_τ ← D_weekend
10:    Initialize an empty sequence D̂^(k) ← ∅
11:    for t = 0 to 23 do
12:       Sample an index i ~ U{1, |D_τ|}
13:       Extract the data point d̂_t ← D_τ(i)(t)
14:       Append d̂_t to D̂^(k)
15:    Add D̂^(k) to D̂
16: Return D̂
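A direct Python transcription of Algorithm 1 could look like the following sketch; the weekday/weekend arrays are assumed to be hour-by-hour load matrices of shape (days, 24), and the sample sizes in the usage example are illustrative.

```python
import numpy as np

def expand_load_data(d_weekday: np.ndarray, d_weekend: np.ndarray,
                     day_types: list, seed: int = 0) -> np.ndarray:
    """Generate synthetic 24-h load profiles by hour-wise uniform sampling (Algorithm 1).

    d_weekday: (n, 24) weekday subset; d_weekend: (m, 24) weekend subset.
    day_types: sequence of 'weekday'/'weekend' labels, one per day to generate.
    """
    rng = np.random.default_rng(seed)
    synthetic = np.empty((len(day_types), 24))
    for k, tau in enumerate(day_types):              # outer loop: K sequences
        d_tau = d_weekday if tau == "weekday" else d_weekend
        for t in range(24):                          # inner loop: hours 0..23
            i = rng.integers(0, len(d_tau))          # i ~ U{1, |D_tau|}
            synthetic[k, t] = d_tau[i, t]            # sample the load value at hour t
    return synthetic

# Illustrative call: 15 weekdays + 6 weekend days of history, 10 synthetic days.
weekday_hist = np.random.rand(15, 24) * 300
weekend_hist = np.random.rand(6, 24) * 200
labels = ["weekday"] * 7 + ["weekend"] * 3
print(expand_load_data(weekday_hist, weekend_hist, labels).shape)  # (10, 24)
```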

3. XGBoost PV Power Prediction Method Considering Data Imbalance in Small-Sample Scenarios

This paper employs the mainstream XGBoost algorithm for photovoltaic power prediction [29,30,31]. Because the amount of actual operation data in the expanded dataset is limited, the samples are imbalanced. Therefore, the original real data are oversampled: by duplicating the small amount of real data, the ratio of real data to expanded data becomes 1:1, which prioritizes recent patterns and improves prediction accuracy when the weather is stable in the short term. The specific process is shown in Figure 2, where the orange part represents the actual data and the blue part the expanded data. Let the sample sizes of the original real data and the expanded data be $N_{real}$ and $N_{syn}$, respectively. The real portion of the completed dataset obtained from the expansion is replicated $k$ times so that the amount of real data after oversampling equals the amount of expanded data, i.e., $N'_{real} = k \cdot N_{real} = N_{syn}$. If $N_{syn}$ is not an integer multiple of $N_{real}$, the balance can be achieved by random sampling or partial replication.
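The 1:1 rebalancing described above amounts to replicating the real samples k times and topping up the remainder by random sampling; a short numpy sketch under assumed array shapes and sample counts is given below.

```python
import numpy as np

def oversample_real(x_real, y_real, n_syn, seed=0):
    """Replicate the real samples until their count matches the synthetic sample count."""
    rng = np.random.default_rng(seed)
    n_real = len(x_real)
    k, remainder = divmod(n_syn, n_real)             # whole copies + leftover
    idx = np.concatenate([np.tile(np.arange(n_real), k),
                          rng.choice(n_real, size=remainder, replace=False)])
    return x_real[idx], y_real[idx]

# Hypothetical sizes: 21 days of real data vs. 344 synthetic days (hourly detail omitted).
x_real, y_real = np.random.rand(21, 5), np.random.rand(21)
x_bal, y_bal = oversample_real(x_real, y_real, n_syn=344)
print(len(x_bal))  # 344, giving a 1:1 ratio with the expanded data
```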
Based on the prediction principle of XGBoost, all features are fed into the model as the input of Equation (10). Let the time series dataset, including PV output data and influencing factors, be $D = \{(X_t, y_t)\}_{t=1}^{T}$, where $X_t = (x_1, x_2, \ldots, x_n)$ denotes the input feature vector at time $t$, comprising historical PV output features and meteorological observation features (the input quantity), and $y_t$ denotes the PV output value at time $t$ (the output quantity). The historical PV output features considered in this paper are the power of the previous hour, the power two hours earlier, and the sliding average power over the previous three hours. The meteorological features are the main factors influencing PV output, determined by the correlation coefficient method.
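Under assumed column names, the lag and sliding-average features described above can be built with pandas as in the sketch below; 'pv_power' and 'irradiance' are placeholder names for the hourly output and the dominant meteorological feature.

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Construct historical-output and meteorological features for the XGBoost model.

    df is an hourly frame with columns 'pv_power' and 'irradiance' (assumed names).
    """
    feats = pd.DataFrame(index=df.index)
    feats["p_lag1"] = df["pv_power"].shift(1)                   # power of the previous hour
    feats["p_lag2"] = df["pv_power"].shift(2)                   # power two hours earlier
    feats["p_ma3"] = df["pv_power"].shift(1).rolling(3).mean()  # 3-h sliding average power
    feats["irradiance"] = df["irradiance"]                      # dominant meteorological feature
    feats["target"] = df["pv_power"]
    return feats.dropna()
```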
The XGBoost model is based on the Gradient Boosting Decision Tree (GBDT) algorithm; gradient boosting is an ensemble learning method. Figure 3 shows the framework of the XGBoost algorithm, which iteratively trains weak learners (usually CART decision trees), corrects the residuals of the previous round in each round, and uses gradient descent to minimize the loss function while introducing a learning rate and regularization terms to control the model complexity, finally building a powerful ensemble model.
In XGBoost, each leaf node has a weight, also known as the leaf weight, which is the predicted value for the data falling on that leaf node; it represents the regression output of that tree for all samples assigned to the leaf. When there are multiple trees, the predictions of all trees are summed to obtain the final prediction, expressed by the following equation:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(X_i) \tag{10}$$
where $f_k$ is the prediction function of the $k$-th tree, $X_i$ denotes the input features at time $i$, and $\hat{y}_i$ denotes the final prediction obtained by summing the outputs of all $K$ trees.
The XGBoost model can discover the complex nonlinear statistical relationship between the target model and the observed features by combining weak learners and complexity control iteratively and gradually optimizing the model to improve the performance during the training process. Combined with the prediction results in Equation (6), the objective function of the XGBoost model is:
$$Obj = \sum_{i=1}^{M} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{11}$$
$$\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2 \tag{12}$$
where $M$ denotes the total number of samples in the dataset; $y_i$ denotes the actual value; $l(y_i, \hat{y}_i)$ is the loss function, which measures the difference between the predicted and actual values; $\Omega(f_k)$ is the regularization term describing the model complexity, which helps prevent overfitting; $T$ is the number of leaf nodes; $\omega_j$ are the leaf node weights; and $\gamma$ and $\lambda$ are pre-specified hyperparameters penalizing the number of leaf nodes and the magnitude of the leaf weights, respectively.
XGBoost follows a forward stage-wise additive training process: in each iteration, a new tree is trained and added to the ensemble of previous trees so as to gradually reduce the loss function.
$$\hat{y}_i^{(q)} = \sum_{k=1}^{q} f_k(x_i) = \hat{y}_i^{(q-1)} + f_q(x_i) \tag{13}$$
$$\sum_{k=1}^{K} \Omega(f_k) = \sum_{k=1}^{q-1} \Omega(f_k) + \Omega(f_q) \tag{14}$$
$$Obj^{(q)} \approx \sum_{i=1}^{M} \left[ g_i f_q(x_i) + \frac{1}{2} h_i f_q(x_i)^2 \right] + \Omega(f_q) \tag{15}$$
where $Q$ is the number of iterations; $y_i$ is the actual value; $\hat{y}_i^{(q)}$ is the predicted value after the $q$-th iteration; $f_q$ is the tree fitted in the $q$-th iteration; and $g_i$ and $h_i$ are the first-order and second-order derivatives of the loss function $l(y_i, \hat{y}_i^{(q-1)})$ with respect to $\hat{y}_i^{(q-1)}$, obtained from its second-order Taylor expansion.
The XGBoost model is designed to gradually reduce the value of the loss function by using gradient boosting and combining information about the first-order and second-order derivatives. Thus, optimizing the loss function can be transformed into approximating the minimum of a quadratic function.
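A hedged training sketch with the xgboost library is given below; the hyperparameters map onto the learning rate, γ (penalty on the number of leaves), and λ (L2 penalty on the leaf weights) in the objective above, but the concrete values and the random data are placeholders rather than the tuned configuration used in the paper.

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder training data: rows = (lag features + irradiance), target = PV power.
X = np.random.rand(500, 4)
y = np.random.rand(500)

model = xgb.XGBRegressor(
    n_estimators=300,      # number of boosted trees K
    learning_rate=0.05,    # shrinkage applied to each f_k
    max_depth=4,
    gamma=0.1,             # gamma: penalty on the number of leaves T
    reg_lambda=1.0,        # lambda: L2 penalty on the leaf weights omega
    objective="reg:squarederror",
)
model.fit(X[:400], y[:400])

pred = model.predict(X[400:])
rmse = np.sqrt(mean_squared_error(y[400:], pred))
mae = mean_absolute_error(y[400:], pred)
print(f"RMSE={rmse:.3f}, MAE={mae:.3f}")  # the accuracy metrics used in Section 5.3
```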

4. Optimal Scheduling Model of Integrated Energy System Based on Deep Reinforcement Learning in Small-Sample Scenarios

4.1. Optimized Scheduling Objective Functions for Integrated Energy Systems

The optimal energy system dispatch problem can be modeled using the MINLP model given in Equations (16)–(27). The objective function in Equation (16) aims to minimize the operating cost over the entire time horizon $T$, including the operating cost of the DG units in Equation (17) and the cost of buying/selling power from/to the main grid in Equation (18). Given the output power $P_{i,t}^G$ of DG unit $i$, its operating cost can be estimated using the quadratic model in Equation (17). Transactions between the energy system and the main grid are settled at a time-of-use tariff, where the selling price is assumed to be lower than the purchase price. In Equation (18), $\rho_t$ is the time-of-use tariff in period $t$, and $P_t^N$ is the power imported from (positive) or exported to (negative) the main grid.
$$\min \sum_{t \in T} \left( \sum_{i \in G} C_{i,t}^G + C_t^E \right) \Delta t \tag{16}$$
$$C_{i,t}^G = a_i \left(P_{i,t}^G\right)^2 + b_i P_{i,t}^G + c_i, \quad \forall i \in G \tag{17}$$
$$C_t^E = \begin{cases} \rho_t P_t^N, & P_t^N > 0 \\ \beta \rho_t P_t^N, & P_t^N < 0 \end{cases} \tag{18}$$
$$\sum_{i \in G} P_{i,t}^G + \sum_{m \in V} P_{m,t}^V + P_t^N + \sum_{j \in B} P_{j,t}^B = \sum_{k \in L} P_{k,t}^L, \quad \forall t \in T \tag{19}$$
$$\underline{P}_i^G \, u_{i,t} \le P_{i,t}^G \le \overline{P}_i^G \, u_{i,t}, \quad \forall i \in G, \ t \in T \tag{20}$$
$$P_{i,t}^G - P_{i,t-1}^G \le RU_i, \quad \forall i \in G, \ t \in T \tag{21}$$
$$P_{i,t}^G - P_{i,t+1}^G \le RD_i, \quad \forall i \in G, \ t \in T \tag{22}$$
$$\underline{P}_j^B \le P_{j,t}^B \le \overline{P}_j^B, \quad \forall j \in B, \ t \in T \tag{23}$$
$$SOC_{j,t}^B = SOC_{j,t-1}^B + \eta^B P_{j,t}^B \Delta t, \quad \forall j \in B, \ t \in T \tag{24}$$
$$\underline{E}_j^B \le SOC_{j,t}^B \le \overline{E}_j^B, \quad \forall j \in B, \ t \in T \tag{25}$$
$$-\overline{P}^C \le P_t^N \le \overline{P}^C, \quad \forall t \in T \tag{26}$$
$$u_{i,t} \in \{0, 1\}, \quad \forall i \in G, \ t \in T \tag{27}$$
The power balance constraint is formulated in Equation (19). The commitment state of the DG units is modeled with a binary variable, i.e., $u_{i,t} = 1$ indicates that the $i$-th DG unit is in operation. The generation capacity limits of the DG units are specified in Equation (20), while Equations (21) and (22) impose the ramp-up and ramp-down constraints, respectively. For the energy storage system (ESS), Equations (23)–(25) define its operational model. Note that this formulation does not account for the operating cost of the ESS, while charging and discharging operations can still be planned in advance. Equation (23) defines the charging and discharging power limits, and Equation (24) models the state of charge (SOC) as a function of the charging and discharging power. Equation (25) limits the energy stored in the battery and avoids the effects of overcharging and over-discharging. Finally, the import/export power limit of the main grid is modeled by Equation (26). Constraints (20)–(25) are enforced as hard constraints, while the power balance in Equation (19) and the limit in Equation (26) are handled by adding penalty terms.
In the equations, $a_i$, $b_i$, and $c_i$ represent the quadratic, linear, and constant cost coefficients of DG unit $i$, respectively; $\Delta t$ is the length of the discretized operating interval; $\lambda$ is the discount factor; $\overline{P}_i^G$ and $\underline{P}_i^G$ denote the maximum and minimum power generation limits of distributed generator (DG) unit $i$; $RU_i$ and $RD_i$ are the ramp-up and ramp-down capabilities of DG unit $i$; $\overline{P}_j^B$ and $\underline{P}_j^B$ represent the maximum charging and discharging power limits of energy storage system (ESS) $j$; $\overline{E}_j^B$ and $\underline{E}_j^B$ are the maximum and minimum levels of its state of charge (SOC); $\overline{P}^C$ is the maximum import/export power limit of the main grid; $\beta$ is the electricity sales coefficient; $\eta^B$ is the energy exchange efficiency of the ESS; $\sigma_1$ and $\sigma_2$ are the reward adjustment coefficient and constraint penalty coefficient, respectively; $\rho_t$ is the electricity price at time $t$; $P_{m,t}^V$ is the active power output of photovoltaic (PV) system $m$ at time $t$; and $P_{k,t}^L$ is the active power demand of load $k$ at time $t$. Among the continuous variables, $P_{i,t}^G$ is the active power output of DG unit $i$ at time $t$; $P_{j,t}^B$ is the charging/discharging power of ESS $j$ at time $t$; $SOC_{j,t}^B$ is the state of charge of ESS $j$ at time $t$; $P_t^N$ is the power imported from the main grid at time $t$; and $\Delta P_t$ is the active power imbalance at time $t$. The binary variable $u_{i,t}$ indicates the operating state of DG unit $i$ at time $t$ (1 for running, 0 for shutdown).
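To make the cost terms concrete, the sketch below evaluates the quadratic DG operating cost of Equation (17) and the buy/sell transaction cost of Equation (18) for a single time step; the coefficient and price values are illustrative only, not those of Table 4.

```python
def dg_cost(p_g: float, a: float, b: float, c: float) -> float:
    """Quadratic operating cost of a DG unit, Equation (17)."""
    return a * p_g ** 2 + b * p_g + c

def grid_cost(p_n: float, rho_t: float, beta: float = 0.5) -> float:
    """Transaction cost with the main grid, Equation (18).

    p_n > 0: power imported (bought) at price rho_t;
    p_n < 0: power exported (sold) at the discounted price beta * rho_t.
    """
    return rho_t * p_n if p_n >= 0 else beta * rho_t * p_n

# Illustrative coefficients and prices.
print(dg_cost(p_g=60.0, a=0.002, b=0.25, c=3.0))
print(grid_cost(p_n=-40.0, rho_t=0.8))  # negative value: revenue from selling to the grid
```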

4.2. Markov Decision Modeling and Solution Methods

During the operation of an integrated energy system (IES), the state of each power component in each scheduling period is determined solely by its state in the previous scheduling period and by changes in the external environment. Therefore, the problem can be modeled as a Markov decision process (MDP). An MDP can be represented by the quintuple $(S, A, P, R, \lambda)$, where $S$ denotes the system state variables, $A$ the action space, $P$ the state transition probability, $R$ the reward function, and $\lambda$ the discount factor.
State Variables: In this formulation, the energy system operator is modeled as a reinforcement learning (RL) agent. The state information provides the basis on which the operator schedules the generation units. The state is therefore defined as $s_t = (P_t^V, P_t^L, P_{t-1}^G, SOC_t)$, where $s_t \in S$.
Action Variables: The actions for scheduling the DG units and ESSs are defined as $a_t = (P_{i,t}^G, P_t^B)$, where $a_t \in A$. The RL agent does not directly control transactions between the energy system and the main grid. Instead, after any action is executed, power is imported from or exported to the main grid to maintain the power balance. Therefore, the maximum power capacity constraint in Equation (26) must be enforced.
Given the state st and the action at time step t, the energy system transitions to the next state st+1, defined as
$$P_{ss'}^a = \Pr\left( s_{t+1} = s' \mid s_t = s, a_t = a \right) \tag{28}$$
where $P_{ss'}^a$ is the state transition probability, which models the dynamics and uncertainty of the energy system. In model-based algorithms, uncertainty is handled by using deterministic forecasts or by sampling from prior probability distributions. In contrast, DRL is a model-free approach that can learn the uncertainty from historical data and interactions.
Reward function: The environment provides a reward $r_t$ that guides the direction of policy updates. In the optimal scheduling problem of the energy system, the reward function should guide the RL agent to take actions that minimize the operating cost while enforcing the power balance constraint. This can be achieved with the following reward function:
$$r_t(s_t, a_t) = -\sigma_1 \left( \sum_{i \in G} C_{i,t}^G + C_t^E \right) - \sigma_2 \Delta P_t, \tag{29}$$
where σ1 and σ2 control the trade-off between cost minimization and the penalties incurred when there is a power imbalance. ∆Pt corresponds to the power imbalance at time step t and is defined as
$$\Delta P_t = \left| \sum_{i \in G} P_{i,t}^G + \sum_{m \in V} P_{m,t}^V + P_t^N + \sum_{j \in B} P_{j,t}^B - \sum_{k \in L} P_{k,t}^L \right| \tag{30}$$
The goal of the RL algorithm is to find an optimal scheduling policy π* (stochastic or deterministic) that maximizes the total expected discounted rewards of the formulated MDPs.
$$\pi^* = \arg\max_{\pi} \ \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ \sum_{t \in T} R(s_t, a_t) \right] \tag{31}$$
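A sketch of the single-step reward computation is shown below, assuming (as the maximization objective implies) that the reward is the negative weighted cost minus the imbalance penalty of Equation (29); σ1 = 1 and σ2 = 100 follow the default values reported in Section 5, and the power values in the example are illustrative.

```python
def power_imbalance(p_g_list, p_pv_list, p_n, p_b_list, p_load_list):
    """|generation + PV + grid import + storage power - total load| at one step, Equation (30)."""
    return abs(sum(p_g_list) + sum(p_pv_list) + p_n + sum(p_b_list) - sum(p_load_list))

def reward(total_dg_cost, grid_cost_t, imbalance, sigma1=1.0, sigma2=100.0):
    """Reward r_t: negative operating cost minus the power-imbalance penalty (assumed signs)."""
    return -sigma1 * (total_dg_cost + grid_cost_t) - sigma2 * imbalance

# Illustrative single-step evaluation: two DG units, one PV system, one ESS, one load.
dp = power_imbalance([50.0, 30.0], [20.0], p_n=10.0, p_b_list=[-5.0], p_load_list=[105.0])
print(reward(total_dg_cost=18.0, grid_cost_t=8.0, imbalance=dp))
```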

4.3. DDPG Algorithm

The DDPG algorithm is currently one of the more mainstream approaches; its advantage lies in its ability to handle high-dimensional and continuous action spaces, making it suitable for complex control tasks. This paper applies the above concept to the Markov decision process, solving for the optimal scheduling of the integrated energy system to mitigate the power imbalance in real-time scheduling, as well as to reduce costs through the optimization of charging and discharging actions for the integrated energy storage system. DDPG is an algorithm based on the structure of Actor–Critic [23], and its basic framework is shown in Figure 4.
Deep neural networks approximate the policy network and the action value function. The parameters of the policy network and value network are trained using stochastic gradient methods, which allows the algorithm to handle continuous action spaces. The policy is represented as a parameterized function that maps a state directly to a definite action, i.e., a deterministic policy, and the corresponding Q-value satisfies the Bellman equation:
$$Q_\alpha(s_t, a_t) = \mathbb{E}\left[ r(s_t, a_t) + \gamma Q_\alpha(s_{t+1}, a_{t+1}) \right] \tag{32}$$
Maximizing the Bellman equation yields the optimal action value function, from which the optimal policy can be obtained. The DDPG algorithm approximates the action value function with the Critic network and the policy function with the Actor network. To keep the training process stable, the Critic is split into two independent networks, the current Critic network and the target Critic network, and the Actor is likewise split into the current Actor network and the target Actor network. The principle of the DDPG algorithm is shown in Figure 5.
Current Critic Network Parameter Update
The current Critic network $Q$ updates its parameters using the temporal difference (TD) error by minimizing the loss function $L$:
$$L = \frac{1}{N} \sum_{i} \left[ y_i - Q(s_i, a_i \mid \theta^Q) \right]^2 \tag{33}$$
where $y_i = r_i + \gamma Q'\!\left(s_{i+1}, \alpha'(s_{i+1} \mid \theta^{\alpha'}) \mid \theta^{Q'}\right)$ is computed by the target Critic network and the target Actor network. Based on this loss function, the gradient of $L$ with respect to $\theta^Q$, denoted $\nabla_{\theta^Q} L$, is calculated, and the parameters $\theta^Q$ are updated along the gradient direction.
Current Actor Network Parameter Update
The current Actor network updates the parameters $\theta^\alpha$. Its goal is to obtain a larger Q-value, so the gradient $\nabla_a Q$ of the current Critic network with respect to the current Actor's action is computed first, followed by the gradient $\nabla_{\theta^\alpha} \alpha$ of the current Actor network with respect to its parameters $\theta^\alpha$:
$$\nabla_{\theta^\alpha} J \approx \frac{1}{N} \sum_{i} \nabla_a Q(s, a \mid \theta^Q) \big|_{s = s_i,\, a = \alpha(s_i)} \; \nabla_{\theta^\alpha} \alpha(s \mid \theta^\alpha) \big|_{s = s_i} \tag{34}$$
Target Network Parameter Update
The parameters $\theta^{\alpha'}$ of the target Actor network $\alpha'$ and the parameters $\theta^{Q'}$ of the target Critic network $Q'$ are updated using a soft update method, expressed as
$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\theta^{Q'} \tag{35}$$
$$\theta^{\alpha'} \leftarrow \tau \theta^{\alpha} + (1 - \tau)\theta^{\alpha'} \tag{36}$$
where $\tau \in (0, 1)$ is the soft update coefficient; each target network parameter is a weighted average of the current network parameter and the previous target value.
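A PyTorch sketch of the target-network soft update in Equations (35) and (36), together with the Critic's TD-error loss of Equation (33) that it stabilizes, is shown below. The network sizes, state/action dimensions, and τ value mirror settings reported in Section 5.5 but are otherwise illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.01) -> None:
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter pair."""
    with torch.no_grad():
        for p_t, p_s in zip(target.parameters(), source.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_s)

def critic_loss(critic, target_critic, target_actor, batch, gamma=0.995):
    """TD-error loss L = mean (y_i - Q(s_i, a_i))^2 with y_i from the target networks."""
    s, a, r, s_next = batch
    with torch.no_grad():
        y = r + gamma * target_critic(torch.cat([s_next, target_actor(s_next)], dim=1))
    return nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)

# Illustrative networks: 4-dim state, 2-dim action, 256 hidden units as in Section 5.5.
actor = nn.Sequential(nn.Linear(4, 256), nn.ReLU(), nn.Linear(256, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(6, 256), nn.ReLU(), nn.Linear(256, 1))
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)

soft_update(target_critic, critic, tau=0.01)
soft_update(target_actor, actor, tau=0.01)
```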

4.4. Algorithm Flow

The framework of the XGBoost-DDPG optimal scheduling method for small-sample scenarios based on multidimensional data extension proposed in this paper is shown in Figure 6. The specific processes are as follows:
(1) First, to address the scarcity of historical data for the new integrated energy system, the geographic distances between neighboring PV power plants are calculated using the Haversine formula, and candidate power plants with strong spatial correlation are screened out. On this basis, the multi-kernel maximum mean discrepancy (MK-MMD) algorithm is used to quantitatively match the distribution of meteorological features, ensuring that the selected source domain stations have meteorological conditions highly similar to those of the target station. The screened high-quality source domain data are then transformed into output data applicable to the target station through the capacity conversion formula. Meanwhile, for the power load data, a dynamic enhancement strategy based on day-type division is used, fully considering the difference in load patterns between weekdays and weekends, and a uniform random sampling method generates synthetic data that conform to the characteristics of the real distribution.
(2) Secondly, the expanded PV output dataset is processed to address sample imbalance: the real data in the dataset are oversampled to increase their proportion. A PV output prediction model is then established using XGBoost and trained on the expanded dataset. The ensemble learning framework combines meteorological features, historical output, and time-cycle characteristics, utilizing a regularized objective function and second-order gradient optimization to accurately predict ultra-short-term PV power. The prediction result is then used as a key input for the optimal scheduling.
(3) Finally, the integrated energy system optimal scheduling problem is transformed into a Markov decision process, which is solved using the Deep Deterministic Policy Gradient (DDPG) algorithm. The Actor–Critic framework of the DDPG algorithm can effectively handle optimization problems in continuous action spaces, approximating the optimal policy function and action value function with deep neural networks. During training, target networks and a soft update mechanism are used to ensure the algorithm's stability, and a reasonable reward function is designed to guide the agent toward minimizing operating costs while satisfying the system constraints.

5. Case Studies

5.1. Data Description and Simulation Settings

The experiments were performed on a computer equipped with an AMD Ryzen 5 5600 6-core processor, an NVIDIA GeForce RTX 4070 graphics card, and 16.0 GB of memory (RAM). The software simulation was implemented in the Python 3.11.4 environment, the DDPG algorithm was implemented using PyTorch 2.6.0, and the XGBoost algorithm was implemented using Scikit-learn 1.6.1.
The data used in this paper were obtained from the open-source website renewables.ninja, which provides data on radiation intensity, temperature, precipitation, air density, wind speed, and the corresponding PV output power. In the optimized scheduling section of the integrated energy system, the PV output data are predicted by the method proposed in this paper, and the load and real-time tariffs are obtained from the real operating data of a region.

5.2. Data Sample Expansion

To select the features most strongly correlated with renewable energy output and to increase the calculation speed, Pearson correlation coefficients were used to quantify the correlation between each meteorological feature and the renewable energy output. The results are shown in Table 1. As can be seen from the table, the correlation between radiation intensity and PV output is strong and much higher than that of the other climatic factors, followed by humidity, while the remaining factors have low correlation with the PV output.
To verify the effectiveness of the proposed method, three weeks of historical PV data and electric load data are extracted for each of the summer, transition season, and winter to simulate the small-sample scenarios. Firstly, the three nearest stations are selected as candidate source domains based on their geographic locations; the locations of the target PV plant and the selected PV plants are listed in Table 2. The multi-kernel maximum mean discrepancy (MK-MMD) is then used to calculate the meteorological similarity between the source domain data and the target domain data. The similarity is computed separately for each day of missing data, the most similar station for that day is identified, and its output is converted to the target PV power station according to the capacity ratio. The correlation analysis between meteorological characteristics and renewable energy output shows that the main meteorological influencing factor is radiation intensity. To avoid a significant imbalance in the weights of multiple influencing factors, only radiation intensity is selected as the basis for the meteorological similarity calculation.
The screened daily PV output data from the source domain are converted according to their respective capacities. The PV output data of the source and target domains, along with their corresponding meteorological features, are merged based on time alignment to construct a fusion dataset, which is used for model training. The electricity load data employs a dynamic enhancement strategy based on day type division, which fully considers the differences in load patterns between weekdays and weekends. It utilizes the uniform random sampling method to generate synthetic data that conform to the real distribution characteristics, thereby constructing an enhanced dataset applicable to the target region. The specific process is described in Section 2.
Taking the summer scenario as an example, the newly built integrated energy system contains only 3 weeks (21 days) of historical PV and electricity load data. After applying the small-sample data expansion method based on Spatiotemporal Meteorological Augmentation (STMA), the data volume is expanded from 3 weeks to 1 year (365 days) for both the PV output and the electricity load: the PV output data are converted from the three surrounding PV power stations based on daily meteorological similarity, and the electricity load data are synthesized using the day-type-based dynamic enhancement strategy with uniform random sampling.

5.3. PV Output Forecast Results

In the small-sample scenarios of the summer, winter, and transition seasons, data from a typical day in each season are selected as the test set. The remaining data are used for training and validation, with an 8:2 split between the training and validation sets. Three settings are compared: no data expansion, data expansion only, and data expansion with oversampling. For the XGBoost PV output prediction model, a grid search is employed to determine the optimal parameters for training and prediction: the search ranges and step sizes are set, the candidate parameter combinations are evaluated, and the combination with the smallest error is selected as the optimal model configuration. RMSE and MAE are used as indicators to evaluate prediction accuracy. Table 3 shows the prediction errors in different seasons.
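The grid search described above can be realized with scikit-learn's GridSearchCV over the xgboost regressor, as in the sketch below; the parameter grid, data, and cross-validation setting are illustrative assumptions rather than the exact search space used in the paper.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Placeholder training matrix (features from Section 3) and PV power targets.
X_train, y_train = np.random.rand(400, 4), np.random.rand(400)

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 4, 6],
    "learning_rate": [0.01, 0.05, 0.1],
}
search = GridSearchCV(
    xgb.XGBRegressor(objective="reg:squarederror"),
    param_grid,
    scoring="neg_root_mean_squared_error",  # select the combination with the smallest RMSE
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
```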
Figure 7 compares the predicted and actual values of the photovoltaic power station without dataset expansion; the corresponding accuracy indicators are shown in Figure 7d. Because the newly built microgrid severely lacks historical output data, the training set available to traditional methods is too small, which makes them prone to underfitting and leads to poor prediction accuracy. In terms of the RMSE index, the prediction error of the proposed method is reduced by 49.36%, 14.75%, and 30.49% in the summer, winter, and transition seasons, respectively, compared with the method without data expansion. In the transition season, the prediction error decreased by 25.53% after adding data expansion and by a further 6.66% after adding oversampling. In the summer scenario, the error decreased by 34.03% after data expansion and by another 23.24% after oversampling. In the winter scenario, the error decreased by 9.29% after data expansion and by a further 6.02% after oversampling.
Figure 7a–c compare the photovoltaic prediction curves for typical days in the summer, winter, and transition seasons. Without sufficient training data, the data-driven prediction model performs poorly, but the data expansion and oversampling techniques proposed in this paper effectively alleviate this problem. This demonstrates that the proposed method can improve the accuracy of renewable energy output prediction under small-sample conditions.

5.4. Analysis of Data Expansion Effectiveness

Figure 8a presents a heat map showing the total annual output of photovoltaic power, which shows that solar power has a clear time-varying character, varying within a day and between days. Its output is concentrated in the daytime, especially between 9:00 A.M. and 5:00 P.M., with peaks around noon. In addition, the intensity and pattern of the output vary considerably from day to day, reflecting the effects of seasonal changes and weather conditions. This strong time variation means that the small amount of PV output data from newly built integrated energy systems cannot cover the diversity of the operating scenarios, especially those rare but critical situations (e.g., cloudy days in the winter or peak hours in the summer). This lack of diversity results in an inability to provide sufficient state space coverage for downstream tasks such as PV output prediction or reinforcement learning-based scheduling. Therefore, it is necessary to supplement the limited data with spatial correlation-based sample transfer and meteorological similarity metrics to construct a more representative and comprehensive training dataset, ensuring the learned models are highly stable and applicable to a wider range of real-world situations.
The kernel density estimation (KDE) curves of the PV data before and after expansion are shown in Figure 8b, which illustrates the probability density distribution of the data. The figure shows that the distribution of the small amount of data differs from that of the complete data: there is a significant difference between the blue curve (the original small-sample data) and the orange curve (the year-round distribution). The original data distribution is wider, its main peak is lower, and its tail is longer, indicating that, when only a small amount of data is used, the distribution deviates significantly from the year-round data and cannot represent the overall characteristics of the year. The figure also shows that the data expansion method can effectively restore the missing data: the green curve (after expansion) and the orange curve (year-round distribution) overlap closely, with almost identical main-peak locations and distribution shapes. This indicates that the data expansion method can effectively recover the year-round distribution characteristics, making the sample distribution closer to the real year-round situation and providing a more reliable data basis for subsequent modeling and reinforcement learning.
The oversampling method proposed in this paper is mainly used to improve the accuracy of PV prediction. Its effectiveness is demonstrated by the ablation experiments in Section 5.3: compared with the prediction method without oversampling, it improves the accuracy to a certain extent.
When combined with time-of-use tariffs and electricity load demand, the expanded PV output data provide a richer and more diverse state-action space for reinforcement learning. This diversity exposes the RL agents to more scenarios and possibilities during training, which enhances their exploration ability and prevents them from falling into local optima. The richer data distribution can simulate various operating conditions, enabling the agent to learn policies with better generalization and improving its adaptability and robustness in practical applications.

5.5. Optimal Dispatching Results of the Integrated Energy System

For the optimal scheduling task of the integrated energy system (IES), a rolling-horizon optimization strategy is used: every hour, based on the historical operating status of the system, the PV output prediction for the next hour (generated by the prediction model proposed in this paper), and the fine-tuned load demand (determined by the equipment operation plan), the output allocation of each energy unit is dynamically coordinated with the energy storage system to minimize the total system operating cost. This paper adopts an offline reinforcement learning approach and learns from the expanded large-scale dataset. In the experiments, the simulation step size is 1 h, the batch size is 512, the experience replay capacity is 50,000, the reward discount factor is 0.995, the number of neurons in the hidden layers of the Actor and Critic networks is 256, the learning rate of the Actor network is 0.0001, the learning rate of the Critic network is 0.001, and the soft update parameter of the target networks is 0.01.
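For reference, the training settings listed above can be collected into a configuration dictionary, as in the sketch below; the dictionary name and structure are purely illustrative, and the values simply restate the reported hyperparameters with the step length expressed in hours.

```python
ddpg_config = {
    "step_hours": 1,            # simulation step size
    "batch_size": 512,
    "replay_capacity": 50_000,  # experience replay buffer size
    "gamma": 0.995,             # reward discount factor
    "hidden_units": 256,        # Actor and Critic hidden-layer width
    "actor_lr": 1e-4,
    "critic_lr": 1e-3,
    "tau": 0.01,                # target-network soft update coefficient
}
```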
The hyperparameters σ1 and σ2 in Equation (29) jointly affect the average operating cost and the power imbalance. To better understand the effect of σ2, σ1 is fixed at 1 and σ2 is tested over the range [20, 50, 100, 150]. Figure 9 shows the average running cost and power imbalance of the algorithm during training. With σ1 fixed, a larger σ2 makes the power imbalance decrease more rapidly and the algorithm converge faster, while a lower σ2 speeds up the convergence of the operating cost but slows down the convergence of the power imbalance. Overall, the deep reinforcement learning algorithm converges gradually within 300 episodes in the tests conducted.
Reference [32] was used for some parameter settings. Three DG units are defined using the parameters shown in Table 4. These settings are based on typical configurations of diesel generator sets reported in the literature and were verified through simulation of the actual scenarios; they effectively reflect the operating cost characteristics of the units. The charging/discharging limit of the ESS is set to 100 kW, the nominal capacity to 500 kW, and the energy efficiency to ηB = 0.9. The grid's maximum export/import limit is assumed to be 100 kW. To encourage the use of renewable energy, the selling price is set to half of the current electricity price, i.e., β = 0.5. Using data at one-hour resolution, the complete expanded dataset, which includes one year of PV power generation, electrical load demand, and electricity prices, is divided into a training set and a test set. In the scenario without data expansion, the training set contains only 3 weeks of data; in the scenario with data expansion, it contains the first 3 weeks of each month. The test sets all cover a future week of scheduling, and the experiments are carried out for the summer, winter, and transition seasons.
The initial SOC of the ESS was randomly set during training. All algorithms were implemented in Python with PyTorch and trained for 1000 episodes; training took about 1.5 h per 1000 episodes on average. Default settings were used unless otherwise stated, with the hyperparameters σ1 and σ2 set to 1 and 100 by default, respectively. The total running cost and the total power imbalance were used as metrics to evaluate the performance of the DRL algorithm. The experiments without data augmentation use only the existing data for training, with all other model parameters identical to those of the model with data augmentation. After offline training, the trained model can directly generate online scheduling decisions based on the real-time status of the IES.
Table 5 and Table 6 show the total power imbalance and the total cost during weekdays and weekends in the summer, winter, and transition seasons, respectively. After data expansion, thanks to the sufficient sample size, the agents are better trained and maintain a low power imbalance most of the time while achieving a lower running cost, whereas the method without data expansion exhibits a higher power imbalance in previously unseen scenarios. These results matter because a large power imbalance can lead to blackouts, particularly when the main grid imposes output limits. In Table 6, the method without data expansion has a lower total cost than the method with data expansion on some test days; this is because it does not respect the power balance and purchases a large amount of power from the grid, which reduces the total cost but is not a justified strategy.
To further analyze the reasons for this phenomenon, a typical weekday in the summer week is taken as an example. Figure 10a,b show the scheduling results for this day with and without data expansion. As can be seen from the figures, the method with data expansion tends to draw power from the grid and charge the battery at night during the off-peak hours of electricity consumption. In contrast, the method without data expansion generates power with unit 1 and sells it to the grid during a period when the electricity price is low, which is not economical; it also unreasonably dispatches DG3 at 8:00 A.M. and 11:00 A.M., resulting in higher costs. In the second half of the scheduling cycle, the method with data expansion, having been sufficiently trained, effectively utilizes the energy storage: it prioritizes storing the excess PV generation, sells the remaining surplus to the grid, and prioritizes discharging the battery when the PV output is insufficient. The method without data expansion, by contrast, does not sufficiently explore the decision space due to inadequate training samples and therefore fails to dispatch rationally during certain periods.
Figure 11 shows the reward curves with and without data expansion, and both methods start converging at 300 episodes. When training with the original small amount of data, the reward value increases rapidly at first and then converges to a certain range. The method with data expansion converges more slowly at first but eventually achieves higher reward values.
Figure 12 shows the time-of-day tariffs and SOC curves for the day. Both methods choose to charge when tariffs are low and discharge when tariffs are high; however, the method with data expansion is more efficient in terms of battery utilization, especially in the second half of the scheduling cycle.
Figure 13a shows the running cost at each time step. The proposed method has a lower running cost at most time steps, and the few periods with relatively high cost occur because the comparison method does not comply with the power balance constraint. Figure 13b shows the grid interaction comparison: with sufficient training after data expansion, the agent can effectively comply with the power balance constraints, whereas training with only a small amount of data yields a model that cannot generalize and struggles to handle previously unseen scenarios.

6. Conclusions and Prospects

This paper proposes a data-driven optimal scheduling framework for newly built integrated energy systems with limited historical data. Considering spatial correlation and meteorological similarity, the multi-kernel maximum mean discrepancy method is used to find power stations with similar output, and the capacity conversion method generates historical renewable output data. Based on the existing electric load data for different day types, historical load data are generated by randomly sampling the load at the same time of day, and the generated load data are combined with the PV output data to construct a comprehensive historical dataset. For the sample imbalance problem in the small-sample scenario, an oversampling method is employed to augment the scarce samples, and an XGBoost PV output prediction model is established to predict the PV output in real time. Finally, the optimal scheduling model is transformed into a Markov decision process and solved using the Deep Deterministic Policy Gradient (DDPG) algorithm. The framework effectively alleviates the problem of insufficient DRL training samples by integrating spatial correlation analysis, meteorological feature transfer, and dynamic data enhancement. The case study shows that, through multidimensional data enhancement and improved prediction-scheduling coordination, the proposed framework improves the accuracy of PV output prediction and the training of the deep reinforcement learning agent in data-scarce scenarios, and the method satisfies the power balance constraints and reduces the operating cost compared with the scenario without data expansion.
However, the current framework focuses on electricity-centered dispatch. Extending the model to heat, hydrogen, and gas multi-energy dynamics and to demand response mechanisms will further enhance its practical applicability.

Author Contributions

Conceptualization, Y.L. and J.C. (Junshuo Chen); Data curation, J.C. (Junshuo Chen); Formal analysis, K.Z.; Investigation, J.C. (Jingwen Chen); Methodology, Y.L.; Project administration, Y.S.; Resources, J.C. (Junshuo Chen); Software, Y.L. and J.C. (Jingwen Chen); Supervision, J.C. (Junshuo Chen); Validation, Y.L., K.Z., Y.S. and J.C. (Jingwen Chen); Writing—original draft, Y.L. and K.Z.; Writing—review and editing, Y.L. and K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Corporation of China, grant number 5400-202321572A-3-2-ZN.

Data Availability Statement

Data are available on request from the authors.

Conflicts of Interest

Author Yue Sun was employed by the company State Grid Jibei Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, L.; Lin, J.; Wu, N.; Xie, S.; Meng, C.; Zheng, Y.; Wang, X.; Zhao, Y. Review and Outlook on the International Renewable Energy Development. Energy Built Environ. 2022, 3, 139–157. [Google Scholar] [CrossRef]
  2. Lee, C.-C.; Zhang, J.; Hou, S. The Impact of Regional Renewable Energy Development on Environmental Sustainability in China. Resour. Policy 2023, 80, 103245. [Google Scholar] [CrossRef]
  3. Berjawi, A.E.H.; Walker, S.L.; Patsios, C.; Hosseini, S.H.R. An Evaluation Framework for Future Integrated Energy Systems: A Whole Energy Systems Approach. Renew. Sustain. Energy Rev. 2021, 145, 111163. [Google Scholar] [CrossRef]
  4. El-Emam, R.S.; Constantin, A.; Bhattacharyya, R.; Ishaq, H.; Ricotti, M.E. Nuclear and Renewables in Multipurpose Integrated Energy Systems: A Critical Review. Renew. Sustain. Energy Rev. 2024, 192, 114157. [Google Scholar] [CrossRef]
  5. Zhang, J.; Chen, J.; Ji, X.; Sun, H.; Liu, J. Low-Carbon Economic Dispatch of Integrated Energy System Based on Liquid Carbon Dioxide Energy Storage. Front. Energy Res. 2023, 10, 1051630. [Google Scholar] [CrossRef]
  6. Ma, T.; Peng, L.; Wu, G.; Chen, D.; Zou, X. Optimized Operation of Integrated Cooling-Electricity-Heat Energy Systems with AA-CAES and Integrated Demand Response. Energies 2024, 17, 6000. [Google Scholar] [CrossRef]
  7. Yu, P.; Wang, Z.; Guo, Y.; Tai, N.; Jun, W. Application Prospect and Key Technologies of Digital Twin Technology in the Integrated Port Energy System. Front. Energy Res. 2023, 10, 1044978. [Google Scholar] [CrossRef]
  8. Jia, J.; Li, H.; Wu, D.; Guo, J.; Jiang, L.; Fan, Z. Multi-Objective Optimization Study of Regional Integrated Energy Systems Coupled with Renewable Energy, Energy Storage, and Inter-Station Energy Sharing. Renew. Energy 2024, 225, 120328. [Google Scholar] [CrossRef]
  9. Wang, C.; Lv, C.; Li, P.; Song, G.; Li, S.; Xu, X.; Wu, J. Modeling and Optimal Operation of Community Integrated Energy Systems: A Case Study from China. Appl. Energy 2018, 230, 1242–1254. [Google Scholar] [CrossRef]
  10. Wang, Y.; Zhang, N.; Kang, C.; Kirschen, D.S.; Yang, J.; Xia, Q. Standardized Matrix Modeling of Multiple Energy Systems. IEEE Trans. Smart Grid 2019, 10, 257–270. [Google Scholar] [CrossRef]
  11. Qadrdan, M.; Wu, J.; Jenkins, N.; Ekanayake, J. Operating Strategies for a GB Integrated Gas and Electricity Network Considering the Uncertainty in Wind Power Forecasts. IEEE Trans. Sustain. Energy 2014, 5, 128–138. [Google Scholar] [CrossRef]
  12. Su, W.; Wang, J.; Roh, J. Stochastic Energy Scheduling in Microgrids With Intermittent Renewable Energy Resources. IEEE Trans. Smart Grid 2014, 5, 1876–1883. [Google Scholar] [CrossRef]
  13. Martinez-Mares, A.; Fuerte-Esquivel, C.R. A Robust Optimization Approach for the Interdependency Analysis of Integrated Energy Systems Considering Wind Power Uncertainty. IEEE Trans. Power Syst. 2013, 28, 3964–3976. [Google Scholar] [CrossRef]
  14. Shi, W.; Li, N.; Chu, C.-C.; Gadh, R. Real-Time Energy Management in Microgrids. IEEE Trans. Smart Grid 2017, 8, 228–238. [Google Scholar] [CrossRef]
  15. Zhang, Z.; Zhang, D.; Qiu, R.C. Deep Reinforcement Learning for Power System Applications: An Overview. CSEE J. Power Energy Syst. 2020, 6, 213–225. [Google Scholar] [CrossRef]
  16. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the Game of Go without Human Knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
  17. Wang, C.; Ju, P.; Lei, S.; Wang, Z.; Wu, F.; Hou, Y. Markov Decision Process-Based Resilience Enhancement for Distribution Systems: An Approximate Dynamic Programming Approach. IEEE Trans. Smart Grid 2020, 11, 2498–2510. [Google Scholar] [CrossRef]
  18. Abedi, S.; Yoon, S.W.; Kwon, S. Battery Energy Storage Control Using a Reinforcement Learning Approach with Cyclic Time-Dependent Markov Process. Int. J. Electr. Power Energy Syst. 2022, 134, 107368. [Google Scholar] [CrossRef]
  19. Shengren, H.; Vergara, P.P.; Duque, E.M.S.; Palensky, P. Optimal Energy System Scheduling Using a Constraint-Aware Reinforcement Learning Algorithm. Int. J. Electr. Power Energy Syst. 2023, 152, 109230. [Google Scholar] [CrossRef]
  20. Zhang, Y.; Han, Y.; Liu, D.; Dong, X. Low-Carbon Economic Dispatch of Electricity-Heat-Gas Integrated Energy Systems Based on Deep Reinforcement Learning. J. Mod. Power Syst. Clean Energy 2023, 11, 1827–1841. [Google Scholar] [CrossRef]
  21. Dou, J.; Wang, X.; Liu, Z.; Han, Y.; Ma, W.; He, J. MAPIRL: A Hyperbolic Tangent-Enforced Physical-Informed RL for Multi-IESs Optimal Dispatch. IEEE Trans. Ind. Appl. 2025, 61, 2549–2564. [Google Scholar] [CrossRef]
  22. Yi, Z.; Luo, Y.; Westover, T.; Katikaneni, S.; Ponkiya, B.; Sah, S.; Mahmud, S.; Raker, D.; Javaid, A.; Heben, M.J. Deep Reinforcement Learning Based Optimization for a Tightly Coupled Nuclear Renewable Integrated Energy System. Appl. Energy 2022, 328, 120113. [Google Scholar] [CrossRef]
  23. Yu, L.; Xie, W.; Xie, D.; Zou, Y.; Zhang, D.; Sun, Z.; Zhang, L.; Zhang, Y.; Jiang, T. Deep Reinforcement Learning for Smart Home Energy Management. IEEE Internet Things J. 2020, 7, 2751–2762. [Google Scholar] [CrossRef]
  24. Khalid, J.; Ramli, M.A.M.; Khan, M.S.; Hidayat, T. Efficient Load Frequency Control of Renewable Integrated Power System: A Twin Delayed DDPG-Based Deep Reinforcement Learning Approach. IEEE Access 2022, 10, 51561–51574. [Google Scholar] [CrossRef]
  25. Aslam, S.; Herodotou, H.; Mohsin, S.M.; Javaid, N.; Ashraf, N.; Aslam, S. A Survey on Deep Learning Methods for Power Load and Renewable Energy Forecasting in Smart Microgrids. Renew. Sustain. Energy Rev. 2021, 144, 110992. [Google Scholar] [CrossRef]
  26. Zhu, J.; Dong, H.; Zheng, W.; Li, S.; Huang, Y.; Xi, L. Review and Prospect of Data-Driven Techniques for Load Forecasting in Integrated Energy Systems. Appl. Energy 2022, 321, 119269. [Google Scholar] [CrossRef]
  27. Wang, Q.; Mutailipu, M.; Xiong, Q.; Jing, X.; Yang, Y. Small-Sample Short-Term Photovoltaic Output Prediction Model Based on GRA-SSA-GNNM Method. Processes 2024, 12, 2485. [Google Scholar] [CrossRef]
  28. Wang, X.; Shen, Y.; Song, H.; Liu, S. Data Augmentation-Based Photovoltaic Power Prediction. Energies 2025, 18, 747. [Google Scholar] [CrossRef]
  29. Zhu, J.; Li, M.; Luo, L.; Zhang, B.; Cui, M.; Yu, L. Short-Term PV Power Forecast Methodology Based on Multi-Scale Fluctuation Characteristics Extraction. Renew. Energy 2023, 208, 141–151. [Google Scholar] [CrossRef]
  30. Xu, Y.; Zheng, S.; Zhu, Q.; Wong, K.; Wang, X.; Lin, Q. A Complementary Fused Method Using GRU and XGBoost Models for Long-Term Solar Energy Hourly Forecasting. Expert Syst. Appl. 2024, 254, 124286. [Google Scholar] [CrossRef]
  31. Singh, U.; Singh, S.; Gupta, S.; Alotaibi, M.A.; Malik, H. Forecasting Rooftop Photovoltaic Solar Power Using Machine Learning Techniques. Energy Rep. 2025, 13, 3616–3630. [Google Scholar] [CrossRef]
  32. Fan, C.; Wang, Y.; Zhang, Y.; Ouyang, W. Interpretable Multi-Scale Neural Network for Granger Causality Discovery. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
Figure 1. Framework diagram of the STMA-based small-sample data augmentation approach.
Figure 2. Schematic diagram of oversampling.
Figure 3. XGBoost algorithm framework.
Figure 4. Actor–Critic architecture.
Figure 5. Flowchart of the DDPG algorithm.
Figure 6. The framework diagram of the XGBoost-DDPG optimal scheduling method for IES small-sample scenarios based on multidimensional data extension.
Figure 7. Comparison of PV power prediction results by different methods in different seasons. (a) Summer PV power prediction results. (b) Winter PV power prediction results. (c) Transitional season PV power prediction results. (d) Error comparison of each method (MAE).
Figure 8. Heat map and kernel density estimation curves: (a) Heat map of the annual PV output. (b) Kernel density estimation curves.
Figure 9. The total cost and power imbalance situations under different σ2: (a) Total cost curves under different parameters. (b) Power imbalance curves under different parameters.
Figure 10. The scheduling results of typical days within the summer week: (a) Scheduling results with data expansion. (b) No data expansion scheduling results.
Figure 11. Comparison of the reward curves.
Figure 12. Time-of-day tariffs and SOC curves.
Figure 13. Single-step operating cost and grid interaction power. (a) Comparison of operating costs. (b) Comparison of power interaction with the power grid.
Table 1. Results of the correlation analysis between PV output and meteorological characteristics.

Characteristics            Temperature    Air Density    Radiation Intensity    Wind Speed    Precipitation
Correlation coefficient    0.52           0.19           0.98                   −0.22         −0.09
Table 2. The geographical locations of photovoltaic power stations.

Station No.             Longitude    Latitude
Target photovoltaics    −94.0427     44.7910
No. 1 Photovoltaic      −94.7264     44.3268
No. 2 Photovoltaic      −93.8454     44.5473
No. 3 Photovoltaic      −94.3427     45.0465
Table 3. Prediction errors in different seasons.

Prediction Error    Without Data Expansion       With Data Expansion          Oversampling
                    Trans.    Sum.     Win.      Trans.    Sum.     Win.      Trans.    Sum.     Win.
RMSE (kW)           17.35     20.42    20.13     12.92     13.47    18.26     12.06     10.34    17.16
MAE (kW)            9.05      11.47    9.88      7.82      6.74     8.74      6.06      4.97     8.28
Table 4. DG units information.

Units    a [$/kW²]    b [$/kW]    c [$]    P_G^min [kW]    P_G^max [kW]    R^U [kW]    R^D [kW]
DG1      0.0034       3           30       10              150             100         100
DG2      0.001        10          40       50              375             100         100
DG3      0.001        15          70       100             500             200         200
Table 5. Power imbalance in different scenarios (kW).

Test Day    Without Data Expansion         With Data Expansion
            Trans.     Sum.      Win.      Trans.    Sum.     Win.
Weekday     259.84     141.22    361.65    156.08    4.31     0.00
Weekend     252.52     354.40    124.81    3.84      0.00     1.70
Table 6. Operating costs in different scenarios ($).

Test Day    Without Data Expansion                With Data Expansion
            Trans.       Sum.         Win.        Trans.       Sum.        Win.
Weekday     10,894.05    6389.72      11,192.66   6583.06      7557.53     10,805.92
Weekend     13,270.10    10,140.78    9138.18     12,796.00    5626.50     9502.25
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
