Research on Long-Term Scheduling Optimization of Water–Wind–Solar Multi-Energy Complementary System Based on DDPG

Wan, Zixing; Li, Wenwu; He, Mu; Zhang, Taotao; Chen, Shengzhe; Guan, Weiwei; Hua, Xiaojun; Zheng, Shang

doi:10.3390/en18153983

by Zixing Wan ¹,²,*, Wenwu Li ¹,², Mu He ², Taotao Zhang ¹,³, Shengzhe Chen ⁴, Weiwei Guan ³, Xiaojun Hua ³ and Shang Zheng ⁵

¹ Hubei Technology Innovation Center for Smart Hydropower, Wuhan 430000, China

² School of Electrical Engineering and New Energy, China Three Gorges University, Yichang 443002, China

³ China Yangtze Power Co., Ltd., Yichang 443000, China

⁴ Science and Technology Research Institute, China Three Gorges Corporation, Beijing 101117, China

⁵ Three Gorges Renewables Offshore Wind Power Operation and Maintenance Jiangsu Co., Ltd., Yancheng 224000, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(15), 3983; https://doi.org/10.3390/en18153983

Submission received: 7 June 2025 / Revised: 13 July 2025 / Accepted: 23 July 2025 / Published: 25 July 2025

(This article belongs to the Section B: Energy and Environment)

Abstract

To address the challenges of high complexity in modeling the correlation of multi-dimensional stochastic variables and the difficulty of solving long-term scheduling models in continuous action spaces in multi-energy complementary systems, this paper proposes a long-term optimization scheduling method based on Deep Deterministic Policy Gradient (DDPG). First, an improved C-Vine Copula model is used to construct the multi-dimensional joint probability distribution of water, wind, and solar energy, and Latin Hypercube Sampling (LHS) is employed to generate a large number of water–wind–solar coupling scenarios, effectively reducing the model’s complexity. Then, a long-term optimization scheduling model is established with the goal of maximizing the absorption of clean energy, and it is converted into a Markov Decision Process (MDP). Next, the DDPG algorithm is employed with a noise dynamic adjustment mechanism to optimize the policy in continuous action spaces, yielding the optimal long-term scheduling strategy for the water–wind–solar multi-energy complementary system. Finally, using a water–wind–solar integrated energy base as a case study, comparative analysis demonstrates that the proposed method can improve the renewable energy absorption capacity and the system’s power generation efficiency by accurately quantifying the uncertainties of water, wind, and solar energy and precisely controlling the continuous action space during the scheduling process.

Keywords:

multi-energy complementarity; clean energy absorption; optimal scheduling; deep reinforcement learning; uncertainty

1. Introduction

The water–wind–solar multi-energy complementary system based on large-scale hydropower bases is an important area for promoting the green transformation of energy and the high-quality development of clean energy. Long-term optimization scheduling in multi-energy complementary systems not only improves the absorption capacity of renewable energy but also enhances the stability and economic efficiency of system operation. However, on a long-term scale, both runoff and wind-solar power generation exhibit significant stochastic fluctuations and uncertainties, posing the following two challenges for the long-term scheduling of water–wind–solar integration systems: (1) how to accurately quantify the uncertainty of multi-dimensional stochastic variables to ensure the effectiveness of scheduling strategies; (2) how to efficiently solve the optimization scheduling strategy for multi-dimensional scenario sets to address the complexity and high dimensionality of multi-energy complementary systems. Solving these challenges is crucial for improving the operational efficiency of water–wind–solar complementary systems and promoting renewable energy absorption.

In response to these issues, numerous scholars have conducted extensive research on modeling and solving methods. In terms of uncertainty quantification, scenario analysis methods have been widely applied. Reference [1] uses a scenario tree method to generate scenario sets for long-term scales and inputs them into the long-term optimization scheduling model of reservoirs to improve the adaptability of scheduling decisions. Reference [2] develops a hybrid model to estimate the joint probability distribution of water and wind resources, generating numerous scenarios to promote scheduling optimization. Reference [3] uses Ensemble Kalman Filtering (EnKF) technology to address the uncertainty of runoff and photovoltaic power generation and successfully applies it to long-term stochastic optimization. References [1,2,3] consider only the uncertainty quantification in reservoir operation optimization involving a single type of renewable energy. As the number of clean energy sources participating in the scheduling increases, the complexity of uncertainty modeling in multi-energy complementary systems will grow exponentially. For example, References [4,5] employ Markov chains and a Copula-based C-Vine model to characterize the spatiotemporal correlations among heterogeneous energy sources such as hydropower, wind, and solar. These methods generate a large number of uncertainty scenarios to enhance the accuracy of dispatch optimization. Nevertheless, during the uncertainty quantification process, it is necessary to comprehensively account for the pairwise dependencies among random variables, which significantly increases model complexity and may result in overfitting. In terms of long-term scheduling solution methods, Reference [6] uses dynamic programming, discrete differential dynamic programming, and step-by-step optimization algorithms to efficiently solve multi-objective scheduling models. 
Reference [7] develops medium- and long-term optimization scheduling rules for water–solar complementary systems based on stochastic optimization methods, effectively improving the overall system benefits. Reference [8] proposes a two-stage stochastic optimization method based on residual benefit functions, providing guidance for long-term scheduling of water–solar complementary systems. Despite these advancements, traditional optimization algorithms still face the “curse of dimensionality” when dealing with long-term scheduling of multi-energy complementary systems, making it difficult to effectively handle large-scale, multi-constraint optimization tasks. With the advancement of artificial intelligence technologies, deep reinforcement learning algorithms based on Deep Q-Networks (DQN) have been gradually introduced into long-term optimal scheduling of cascade reservoirs and short-term dispatch of multi-energy complementary systems [9,10]. Reference [11] proposed a real-time scheduling model for multi-energy systems using a deep reinforcement learning algorithm based on Soft Actor-Critic (SAC), which alleviates the “curse of dimensionality”. However, when solving the scheduling optimization problem of hydro–wind–solar multi-energy complementary systems, the continuous nature of reservoir regulation in practical operations is not fully considered, resulting in limited optimization performance of the strategies and poor applicability in real-world implementation.

Therefore, the main contributions of this paper are as follows:

1. Regarding the complexity of high-dimensional uncertainty modeling:

An improved C-Vine Copula model is introduced to more effectively capture the non-linear dependencies among multi-dimensional stochastic variables in hydro–wind–solar systems, overcoming the limitations of traditional modeling approaches in balancing model complexity and the accurate depiction of inter-variable correlations under high-dimensional uncertainty.

Combined with LHS, the method significantly reduces the dimensionality and quantity of required samples while preserving scenario representativeness, thereby mitigating the computational complexity caused by high-dimensional uncertainty and providing a more realistic scenario set as input to the long-term scheduling model.

2. Regarding the difficulty of solving in continuous action space:

The long-term scheduling problem is formulated as an MDP, and the DDPG algorithm is introduced to handle continuous control variables. This overcomes the limitations of traditional reinforcement learning algorithms designed for discrete action spaces, and enables the output of continuous dispatch instructions with control capability, improving engineering applicability.

A noise dynamic adjustment mechanism is designed to enhance exploration while ensuring convergence stability, achieving stable policy optimization in high-dimensional continuous action space.

In summary, this paper proposes a long-term scheduling optimization method for the water–wind–solar multi-energy complementary system based on the DDPG algorithm, aiming to achieve synergistic optimization of resource utilization and improvements in both economic efficiency and system stability.

2. Overall Framework

The long-term optimization scheduling method for the water–wind–solar multi-energy complementary system based on DDPG mainly consists of the following three steps, as shown in Figure 1.

Step 1: Generation of water–wind–solar scenarios. First, the correlation features between multiple energy sources in the water–wind–solar system are constructed based on the Copula function’s C-tree structure and hierarchical levels. Truncation is performed at a certain level to obtain the joint distribution function of the random variables. Then, the LHS technique is used for stratified sampling to generate a large number of coupled water–wind–solar scenario sets. Detailed descriptions are provided in Section 3.

Step 2: Mathematical Modeling of Long-Term Scheduling. A long-term scheduling model for the hydro-wind-solar multi-energy system is constructed, with the objective of maximizing clean energy consumption. The model incorporates several realistic operational constraints, including water balance, outflow discharge limits, water level–reservoir volume relationships, and upper bounds on wind and photovoltaic power outputs, to accurately simulate the system’s real-world operation. Detailed modeling is presented in Section 4.

Step 3: The scheduling model is transformed into an MDP and solved using the DDPG algorithm. The agent interacts with the environment to generate scheduling actions. Ornstein–Uhlenbeck (OU) noise is introduced to promote policy exploration in continuous action spaces, and experience samples are collected and stored in the experience pool for replay. Meanwhile, gradient optimization and soft update mechanisms are used to update network parameters to obtain the optimal scheduling strategy, achieving long-term scheduling optimization of the water–wind–solar multi-energy complementary system. A detailed explanation of this process is provided in Section 5.
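The two mechanisms named in Step 3, OU exploration noise and soft target updates, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `DecayingOUNoise`, the per-episode multiplicative decay of `sigma`, and all numeric defaults are assumptions, since the exact form of the noise dynamic adjustment mechanism is not given in this part of the text.

```python
import math
import random


class DecayingOUNoise:
    """Ornstein-Uhlenbeck noise whose scale shrinks each episode (assumed schedule)."""

    def __init__(self, dim, theta=0.15, sigma=0.2, sigma_min=0.02, decay=0.99, dt=1.0, seed=0):
        self.dim, self.theta, self.dt = dim, theta, dt
        self.sigma, self.sigma_min, self.decay = sigma, sigma_min, decay
        self.rng = random.Random(seed)
        self.state = [0.0] * dim

    def reset(self):
        """Restart the process at zero and shrink sigma (call once per episode)."""
        self.state = [0.0] * self.dim
        self.sigma = max(self.sigma_min, self.sigma * self.decay)

    def sample(self):
        """One Euler step of dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1)."""
        self.state = [
            x - self.theta * x * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def soft_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in target}


noise = DecayingOUNoise(dim=3)  # three action components
for _ in range(5):              # five episodes: sigma decays five times
    noise.reset()
print(round(noise.sigma, 4), noise.sample())
print(soft_update({"w": 0.0}, {"w": 1.0}, tau=0.1))
```

Large noise early in training favors exploration, while the floor `sigma_min` keeps a small amount of exploration once the policy has stabilized.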

3. Hydro–Wind–Solar Scenario Generation

Runoff, as well as wind and solar power output, exhibit distinct seasonal characteristics. Within the same year, the output of the three sources varies significantly from month to month. Given that all three are influenced by meteorological factors, the random variables not only display high randomness and uncertainty but also show certain correlations. To accurately capture these uncertainties and the correlations between heterogeneous energy sources, a correlation modeling method based on the improved C-Vine-Copula model is proposed. This is combined with the Latin Hypercube Sampling technique to generate a coupled watershed water–wind–solar scenario set. The specific modeling process is shown in Figure 2.

3.1. Hydro–Wind–Solar Complementary System Correlation Modeling

This paper introduces a star-shaped C-Vine Copula model to hierarchically describe the spatiotemporal correlation between runoff, wind, and solar power. However, as the dimensionality of the variables increases, the complexity of the model significantly rises, and the dependencies between variables are not only direct but also exhibit conditional correlations. Therefore, an improved C-Vine-Copula model is proposed to accurately capture the correlations between the random variables of the water–wind–solar multi-energy complementary system while reducing the model complexity. The following are the specific steps for spatiotemporal correlation modeling of the multi-energy complementary system based on the improved C-Vine-Copula.

(1) Let the hydropower, wind power, and solar power outputs be represented by the random variables

X_{h}, X_{w}, X_{s}

, respectively. By applying non-parametric kernel density estimation (KDE), we obtain the smoothed marginal probability density function (PDF) for each random variable. Taking wind power as an example, the random variable

X_{w} = [x_{1w}, x_{2w}, \dots, x_{nw}]

, where

x_{1w}, x_{2w}, \dots, x_{nw}

are

n

historical observations of wind power output independently and identically distributed according to

F

. The kernel density estimation is given as follows:

\hat{f}_{h}(x_{w}) = \frac{1}{n} \sum_{i=1}^{n} K_{h}(x_{w} - x_{iw}) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x_{w} - x_{iw}}{h}\right)

(1)

In the formula,

f

is the probability density function;

h > 0

is a smoothing parameter, called the bandwidth; and

K (\cdot)

is the kernel function;

x_{w}

represents the target wind power output value for which the probability density is to be estimated.
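Equation (1) can be evaluated directly. The sketch below assumes a Gaussian kernel and a fixed bandwidth, neither of which is specified in the text; the wind output values are hypothetical and serve only as an illustration.

```python
import math


def gaussian_kde(samples, h):
    """Return f_hat(x) = 1/(n*h) * sum K((x - x_i)/h) with a Gaussian kernel K."""
    n = len(samples)

    def f_hat(x):
        total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
        return total / (n * h * math.sqrt(2.0 * math.pi))

    return f_hat


# Hypothetical monthly wind outputs (MW), for illustration only.
wind = [120.0, 135.0, 150.0, 160.0, 142.0, 128.0]
f_w = gaussian_kde(wind, h=10.0)

# Crude Riemann-sum check that the estimated density integrates to about 1.
area = sum(f_w(50.0 + 0.5 * k) * 0.5 for k in range(400))
print(round(area, 3))
```

A larger bandwidth `h` gives a smoother but flatter density; in practice it would be chosen by a rule of thumb or cross-validation.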

(2) The C-vine Copula model is used to build the joint distribution between wind-solar power outputs and current flow, capturing the temporal dependency. The Kendall correlation coefficient is selected to measure the dependency between variables. The calculation formula for Kendall’s correlation coefficient is as follows:

τ = P [(X - X^{'}) (Y - Y^{'}) > 0] - P [(X - X^{'}) (Y - Y^{'}) < 0]

(2)

In the formula,

P (\cdot)

denotes probability, and

(X, Y)

and

(X^{'}, Y^{'})

are independent random vectors drawn from the same joint distribution.

(3) The method of selecting the optimal root node using the largest eigenvalue is adopted to construct the weight matrix, which is given as follows:

T = \begin{pmatrix} 0 & τ_{12} & τ_{13} \\ τ_{21} & 0 & τ_{23} \\ τ_{31} & τ_{32} & 0 \end{pmatrix}

(3)

In the formula,

τ_{x y}

is the Kendall correlation coefficient.
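A small sketch of Equations (2) and (3) follows. The sample estimator counts concordant and discordant pairs. The root-selection rule in `pick_root` (largest column sum of absolute Kendall coefficients) is a common heuristic that is assumed here for illustration; it is not necessarily the eigenvalue-based rule used by the authors. All data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample version of Eq. (2): (concordant - discordant) / number of pairs."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (len(x) * (len(x) - 1) / 2)


def tau_matrix(series):
    """Symmetric matrix T of Eq. (3) with zeros on the diagonal."""
    d = len(series)
    T = [[0.0] * d for _ in range(d)]
    for a, b in combinations(range(d), 2):
        T[a][b] = T[b][a] = kendall_tau(series[a], series[b])
    return T


def pick_root(T):
    """Assumed heuristic: the variable with the largest sum of |tau| in its column."""
    scores = [sum(abs(row[c]) for row in T) for c in range(len(T))]
    return scores.index(max(scores))


# Hypothetical data: runoff, wind, solar (arbitrary units).
runoff = [1.0, 2.0, 3.0, 4.0, 5.0]
wind = [5.0, 3.0, 4.0, 2.0, 1.0]
solar = [2.0, 1.0, 3.0, 5.0, 4.0]
T = tau_matrix([runoff, wind, solar])
print(T, pick_root(T))
```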

By evaluating the columns of the weight matrix and selecting the root node, the dependencies between the root node variable and the other node variables form the first layer of the C-Vine tree structure. The second layer, conditioned on the root node variable of the first layer, constructs the dependencies between the remaining variables. When selecting the optimal pair Copula function for each edge in the tree structure, a mixed Copula function is introduced, and the EM algorithm is used to estimate its weight parameters and dependence coefficients. The mixed Copula form is as follows:

C(u, v) = \sum_{n} λ_{n} C_{n}(u, v; θ_{n})

(4)

In the formula,

u, v

are the marginal distribution values of the two random variables;

λ_{n} \in [0, 1]

is the weight parameter of the model; and

\sum_{n} λ_{n} = 1

.

θ_{n}

is the dependency parameter of the model.
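To make the mixed Copula of Equation (4) concrete, the sketch below combines a Clayton and a Gumbel component. The choice of these two families, the weights, and the dependence parameters are all assumptions for illustration; the text does not state which component families are mixed, and the EM estimation step is not shown.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0 (lower-tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1 (upper-tail dependence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def mixed_copula_cdf(u, v, weights, thetas):
    """Convex combination of component copulas; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    parts = (clayton_cdf(u, v, thetas[0]), gumbel_cdf(u, v, thetas[1]))
    return sum(w * c for w, c in zip(weights, parts))


# Hypothetical weights and dependence parameters.
c_mix = mixed_copula_cdf(0.3, 0.7, weights=(0.4, 0.6), thetas=(2.0, 1.5))
print(c_mix)
```

Because each component is a valid copula and the weights are non-negative and sum to one, the mixture is itself a copula, so it can capture lower-tail and upper-tail dependence at the same time.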

(4) The information criterion method is used to select the optimal truncation level. First, different truncation levels are assumed, and the overall likelihood estimate of the model under each truncation level is calculated. The formula is as follows:

L = \prod_{i = 1}^{m} L_{i}

(5)

In the formula,

L_{i}

represents the likelihood estimate of model layer

i

,

m

is the total number of layers in the C-vine Copula model, and

L

is the overall likelihood estimate of the model.

Based on the overall likelihood estimate obtained above, the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are calculated to determine the optimal truncation level. The formulas are as follows:

AIC = 2k - 2\ln(L)

(6)

BIC = k\ln(n) - 2\ln(L)

(7)

In the formula,

k

is the number of parameters, and

n

is the sample size.
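Equations (5) to (7) translate into a few lines of code. Since L is a product of layer likelihoods, ln(L) is the sum of the per-layer log-likelihoods. The candidate fits below are hypothetical numbers, and selecting by BIC in `best_truncation` is one possible choice made for this sketch.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2.0 * log_lik


def best_truncation(candidates, n):
    """candidates maps truncation level -> (total log-likelihood, parameter count)."""
    return min(candidates, key=lambda lvl: bic(*candidates[lvl], n))


# Hypothetical fits: level -> (sum of per-layer log-likelihoods, number of parameters).
fits = {1: (152.0, 4), 2: (155.0, 6)}
print({lvl: (aic(*v), round(bic(*v, 120), 2)) for lvl, v in fits.items()})
print(best_truncation(fits, n=120))
```

In this made-up example AIC slightly prefers the deeper model while BIC prefers truncation after the first layer, which illustrates why both criteria are usually reported.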

Based on the principles of AIC and BIC, the optimal model is selected by minimizing the corresponding values for the second layer of the C-vine Copula structure. Suppose there are

n

random variables

X = (x_{1}, x_{2}, \dots, x_{n})

,

F (x_{1}, x_{2}, \dots, x_{n})

and

f (x_{1}, x_{2}, \dots, x_{n})

represent the joint cumulative distribution function and joint probability density function, respectively. The marginal distributions

F_{i} (x_{i})

and

f_{i} (x_{i})

represent the marginal cumulative distribution functions and marginal probability density functions of the individual variables. Then, the C-vine Copula joint distribution function of

X

is as follows:

f(x_{1}, \dots, x_{n}) = \left[\prod_{k=1}^{n} f_{k}(x_{k})\right] \cdot \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{j, j+i | 1, \dots, j-1}\left[F(x_{j} | x_{1}, \dots, x_{j-1}), F(x_{j+i} | x_{1}, \dots, x_{j-1})\right]

(8)

In the formula,

c_{j, j+i | 1, \dots, j-1}(\cdot)

represents the Copula probability density function formed by the variables

x_{j}

and

x_{j + i}

, conditional on knowing

x_{1}, x_{2}, \dots, x_{j - 1}

;

F (x_{j} | x_{1}, \dots, x_{j - 1})

represents the distribution function of the variable

x_{j}

, conditional on knowing

x_{1}, x_{2}, \dots, x_{j - 1}

.
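As a worked instance of Equation (8), consider the three-variable case with variable 1 as the root node. The labeling of the variables as 1, 2, 3 is an assumption for illustration. Setting n = 3 gives two unconditional pair Copulas in the first layer and one conditional pair Copula in the second:

```latex
f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
  \cdot c_{1,2}\bigl[F_1(x_1), F_2(x_2)\bigr]
  \cdot c_{1,3}\bigl[F_1(x_1), F_3(x_3)\bigr]
  \cdot c_{2,3|1}\bigl[F(x_2 | x_1), F(x_3 | x_1)\bigr]
```

Under the usual convention for truncated vines, truncating after the first layer corresponds to setting the conditional density c_{2,3|1} to 1, which removes the parameters of the second layer.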

An improved C-Vine Copula hierarchical modeling approach is used to effectively characterize the spatiotemporal correlation between random variables. Truncation is performed at appropriate levels [12] to reduce the model’s complexity, simplify the computational process, and achieve an effective balance between model complexity and fitting accuracy.

3.2. Scenario Generation for Multi-Energy Complementary Systems

Based on the joint output probability distribution of water, wind, and solar power constructed using the improved C-vine Copula model, Latin Hypercube Sampling is used for stratified sampling. The sample space of each dimension is evenly divided into several sub-intervals, and sample points are randomly selected within each sub-interval to generate a set of water–wind–solar coupling scenarios.

Assume the sample size is

N

. The

n

-th sample can be represented as

s_{n} = [s_{n h}, s_{n w}, s_{n s}]

, with the specific sampling procedure as follows:

(1) Let

x_{w}

be the random variable corresponding to the root node, where

x_{w} \in [x_{w d}, x_{w u}]

, and the marginal distribution function is

f_{w} (x_{w})

. The range of the distribution function,

[f_{w} (x_{w d}), f_{w} (x_{w u})]

, is partitioned into

N

equal-probability subregions, where a random value

q_{i}

is selected from each region, yielding a new random variable

Z_{1}

.

(2) Let

U_{1}, U_{2}, U_{3}

be the three desired variables, whose samples are obtained step by step from the values generated above. First,

U_{1} = Z_{1}

, i.e.,

Z_{1}

is taken directly as the first desired random variable.

(3) Similarly, following the procedure in Step 1, random variables

Z_{2}, Z_{3}

are generated from the uniform distribution. The second desired variable

U_{2}

is obtained from the conditional distribution function

Z_{2} = F (x_{2} | x_{1}) = \partial C (U_{1}, U_{2}) / \partial U_{1}

. Since

Z_{2}

and

U_{1}

are known, this non-linear equation reduces to a one-dimensional root-finding problem, which is solved using the bisection method; the solution is the desired random variable

U_{2}

.

(4) Similarly, as

Z_{2}, Z_{3}

have been defined, the following can be derived:

Z_{3} = F (x_{3} | x_{1}, x_{2}) = \partial C_{x_{3}, x_{2} | x_{1}} (F (x_{3} | x_{1}), Z_{2}) / \partial Z_{2}

(9)

The conditional distribution function

F (x_{3} | x_{1})

is obtained, and

F (x_{3} | x_{1}) = \partial C (U_{1}, U_{3}) / \partial U_{1}

is solved. The resulting outcome is the sample of the desired variable

U_{3}

.

(5) Finally, the inverse transform sampling for

U_{1}, U_{2}, U_{3}

is performed, yielding the deterministic hydro–wind–solar scenario set

S = [S_{h}, S_{w}, S_{s}]

.
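Steps (1) to (4) can be sketched as below. The sketch assumes Clayton pair Copulas on all three edges with hypothetical parameters, whereas the paper fits mixed Copulas; the function names are illustrative. The final inverse transform of step (5), which maps the uniform values back to physical outputs through the marginal distributions, is omitted.

```python
import random


def lhs(n, d, rng):
    """Latin Hypercube Sample: one point per equal-probability stratum in each dimension."""
    cols = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        cols.append([(k + rng.random()) / n for k in strata])
    return list(zip(*cols))


def h_clayton(v, u, theta):
    """Conditional CDF F(v | u) = dC(u, v)/du for a Clayton pair copula."""
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)


def h_inverse(z, u, theta, tol=1e-10):
    """Solve h_clayton(v, u) = z for v by bisection (h is increasing in v)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_clayton(mid, u, theta) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_cvine(n, th12, th13, th23_1, seed=0):
    """Draw n dependent triples (U1, U2, U3) from a 3-variable C-vine with root 1."""
    rng = random.Random(seed)
    out = []
    for z1, z2, z3 in lhs(n, 3, rng):
        u1 = z1                                 # step (2): U1 = Z1
        u2 = h_inverse(z2, u1, th12)            # step (3): invert F(x2 | x1)
        f3_given_1 = h_inverse(z3, z2, th23_1)  # step (4): invert Eq. (9) for F(x3 | x1)
        u3 = h_inverse(f3_given_1, u1, th13)    # then invert F(x3 | x1) for U3
        out.append((u1, u2, u3))
    return out


# Hypothetical Clayton parameters for the three pair copulas.
scenarios = sample_cvine(200, th12=2.0, th13=1.0, th23_1=0.5)
print(scenarios[:3])
```

The stratification applies to the independent uniforms Z; the dependence structure is then imposed by the successive inversions, so the first coordinate keeps exactly one point per stratum.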

LHS is used for stratified sampling of the high-dimensional joint distribution. This not only ensures the uniformity of the sample distribution in space but also effectively improves sampling efficiency and sample coverage. The large number of generated coupling scenarios can accurately reflect the correlation characteristics between the random variables in the water–wind–solar multi-energy complementary system, providing reliable data support for system scheduling.

3.3. Scenario Quality Evaluation

To comprehensively assess the quality of scenarios generated for the hydro–wind–solar multi-energy complementary system, this study evaluates three aspects of the associated stochastic variables: correlation, randomness, and volatility. Corresponding metrics are constructed to quantify the reliability and rationality of the long-term scenarios.

3.3.1. Correlation Metric

To measure the structural consistency of generated scenarios in the spatial domain, Kendall’s Tau coefficient (

τ

) is selected as the statistical measure of inter-variable correlation strength. The Mean Kendall correlation coefficient Absolute Error (MKAE) between the wind power and photovoltaic output series is adopted as the evaluation metric. This indicator quantifies the deviation of the generated scenarios from historical observations in terms of correlation. A smaller MKAE value indicates that the generated scenarios better reproduce the spatial correlation structure observed in real data. The specific calculation formula is given as follows:

W = \frac{1}{N} \sum_{n = 1}^{N} | τ_{n} - \tilde{τ} |

(10)

In the formula,

τ_{n}

is the Kendall’s Tau value of the n-th scenario sample;

\tilde{τ}

is the Kendall’s Tau value derived from historical data; and

N

denotes the total number of generated scenarios.

3.3.2. Randomness Metric

To evaluate the goodness-of-fit between the overall statistical distribution of the generated scenarios and that of the historical data, the Mean Euclidean Distance (MED) is adopted as the randomness metric. This indicator calculates the Euclidean distance between the historical observation sequences of runoff, wind power, and photovoltaic output and their corresponding generated samples. A smaller MED value indicates that the generated scenarios better preserve the stochastic nature of the original data, i.e., they are closer to the real scenarios in the multidimensional feature space. The calculation formula is as follows:

E = \frac{1}{W} \sum_{w = 1}^{W} {‖R_{w} - R‖}_{2}

(11)

In the formula,

R_{w}

represents the

w

-th scenario sequence;

R

denotes the historical actual sequence;

{‖\cdot‖}_{2}

is the L2 norm of the difference between the two sequences.

3.3.3. Volatility Metric

The generated scenarios should be capable of capturing the full range of fluctuations observed in historical power output data, thereby reflecting the system’s realistic operational behavior under uncertainty. To this end, the Coverage Rate (CR) is introduced as the volatility evaluation metric. It measures the proportion of historical observations that fall within the range spanned by the generated scenarios. A higher coverage rate indicates that the generated scenarios better encompass the variability of historical data, implying stronger representativeness and robustness. The calculation is defined as follows:

\begin{cases} C = \frac{1}{T} \sum_{t=1}^{T} 1(r_{\min, t} \leq r_{h, t} \leq r_{\max, t}) \\ r_{\min, t} = \min(r_{w, t}), \quad w = 1, 2, \dots, W \\ r_{\max, t} = \max(r_{w, t}), \quad w = 1, 2, \dots, W \end{cases}

(12)

In the formula,

C

represents the coverage rate;

1 (\cdot)

is an indicator function that equals 1 if the condition in parentheses is satisfied, and 0 otherwise;

r_{h, t}

is the historical actual output value in month

t

;

r_{\min, t}

and

r_{\max, t}

denote the minimum and maximum values, respectively, of the generated scenarios for month

t

.
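The three metrics of Equations (10) to (12) can be computed as in the sketch below. The function names and the toy series are illustrative assumptions; each scenario is represented as a list of values with one entry per period.

```python
import math


def mkae(scenario_taus, tau_hist):
    """Eq. (10): mean absolute deviation of scenario Kendall taus from the historical tau."""
    return sum(abs(t - tau_hist) for t in scenario_taus) / len(scenario_taus)


def med(scenarios, history):
    """Eq. (11): mean Euclidean (L2) distance between each scenario and the history."""
    return sum(math.dist(s, history) for s in scenarios) / len(scenarios)


def coverage_rate(scenarios, history):
    """Eq. (12): share of periods whose historical value lies inside the scenario envelope."""
    hits = 0
    for t, r_hist in enumerate(history):
        values = [s[t] for s in scenarios]
        hits += min(values) <= r_hist <= max(values)
    return hits / len(history)


# Hypothetical 4-month history and three generated scenarios.
history = [10.0, 20.0, 30.0, 40.0]
scenarios = [
    [9.0, 22.0, 28.0, 35.0],
    [12.0, 18.0, 33.0, 38.0],
    [11.0, 21.0, 29.0, 39.0],
]
print(mkae([0.30, 0.42, 0.38], tau_hist=0.35))
print(med(scenarios, history))
print(coverage_rate(scenarios, history))
```

In this toy case the last month is not covered because every scenario lies below the historical value, so the coverage rate is three out of four periods.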

4. Model Construction

4.1. Objective Function

From the perspective of clean energy consumption and power generation efficiency, the scheduling objective is to simulate the operation decisions of the multi-energy complementary system under constraints such as channel capacity, based on the maximum available power generation of each unit in different scenarios, while ensuring that the installed capacity of hydropower and wind-solar units is fixed. Through optimization scheduling, the optimal discharge flow, photovoltaic power generation, and wind power generation are selected, maximizing the integrated operation power generation of water, wind, and solar, while reducing wind and solar curtailment, ultimately achieving the maximization of clean energy consumption. The expression is as follows:

F = \max \sum_{t=1}^{T} \left(P_{t} - λ_{1} P_{t}^{w,q} - λ_{2} P_{t}^{pv,q} - λ_{3} P_{t}^{h,q}\right) Δt

(13)

P_{t} = P_{t}^{h} + P_{t}^{w} + P_{t}^{p v}

(14)

In the formula,

P_{t}

represents the total output of the water–wind–solar hybrid system at time

t

;

P_{t}^{h}, P_{t}^{w}, P_{t}^{p v}

represent the outputs of hydroelectric, wind, and photovoltaic power, respectively.

P_{t}^{w,q}, P_{t}^{pv,q}, P_{t}^{h,q}

denote the curtailed wind, solar, and hydro power, respectively.

λ_{1}, λ_{2}, λ_{3}

denote the curtailment penalty coefficients for wind, solar, and hydro power, respectively.

t

is the time period

(t = 1, \dots, T)

, and

T

is set to 12;

Δ t

is the time interval for the time step.
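For a given dispatch, the value inside the maximization of Equation (13) is a simple weighted sum. The sketch below evaluates it for a hypothetical two-period dispatch; the penalty coefficients, the power values (MW), and the period length (hours) are assumptions chosen only to show the arithmetic.

```python
def objective(p_h, p_w, p_pv, cur_w, cur_pv, cur_h, lambdas, dt):
    """Eq. (13)-(14): sum over periods of (total output - weighted curtailment) * dt."""
    l1, l2, l3 = lambdas
    total = 0.0
    for t in range(len(p_h)):
        p_t = p_h[t] + p_w[t] + p_pv[t]  # Eq. (14): total output
        penalty = l1 * cur_w[t] + l2 * cur_pv[t] + l3 * cur_h[t]
        total += (p_t - penalty) * dt
    return total


# Hypothetical two-period dispatch: outputs and curtailments in MW, dt in hours.
value = objective(
    p_h=[500.0, 600.0], p_w=[80.0, 60.0], p_pv=[40.0, 50.0],
    cur_w=[5.0, 0.0], cur_pv=[0.0, 10.0], cur_h=[0.0, 20.0],
    lambdas=(1.0, 1.0, 0.5), dt=730.0,
)
print(value)  # MWh-equivalent objective value
```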

4.2. Constraints

(1) Water balance constraint:

V_{t} = V_{t - 1} + [Q^{i n}_{t} - Q^{o u t}_{t}] \times 3600 Δ t

(15)

In the formula,

V_{t}

is the reservoir volume at the end of time interval

t

(in

m^{3}

). The time interval

Δ t

is measured in hours, while the inflow and outflow rates

Q^{i n}_{t}

,

Q^{o u t}_{t}

(in

m^{3} / s

) represent the water flow into and out of the reservoir during interval t, respectively. Since the flow rate

Q

is expressed per second, the term

Δ t

must be multiplied by 3600 to ensure consistent units when computing the change in reservoir volume.
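The unit conversion in Equation (15) is easy to check numerically. The storage, flows, and period length below are hypothetical.

```python
def next_volume(v_prev, q_in, q_out, dt_hours):
    """Eq. (15): V_t = V_{t-1} + (Q_in - Q_out) * 3600 * dt, with Q in m^3/s and dt in hours."""
    return v_prev + (q_in - q_out) * 3600.0 * dt_hours


# Hypothetical month of 720 h with a net inflow of 200 m^3/s.
v1 = next_volume(v_prev=2.0e9, q_in=1000.0, q_out=800.0, dt_hours=720.0)
print(v1)  # storage in m^3 at the end of the period
```

A net inflow of 200 m³/s over 720 hours adds 200 × 3600 × 720 = 5.184 × 10⁸ m³ to the reservoir.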

(2) Water level range constraint:

L_{m i n} \leq L_{t} \leq L_{m a x}

(16)

In the formula,

L_{m i n}, L_{t}, L_{m a x}

represent the lower limit, the value at time t, and the upper limit of the water level of the hydropower station, respectively.

(3) Discharge flow limit:

0 \leq Q^{o u t}_{t} \leq Q^{o u t}_{m a x}

(17)

In the formula,

Q^{o u t}_{m a x}

represents the maximum outflow from the hydropower station at time

t

.

(4) Power generation flow limit:

0 \leq Q^{u}_{t} \leq Q^{u}_{m a x}

(18)

In the formula,

Q^{u}_{m a x}

represents the maximum power generation flow from the hydropower station at time

t

.

(5) Discharge flow and power generation flow relationship:

Q^{o u t}_{t} = Q^{u}_{t} + Q^{q}_{t}

(19)

In the formula,

Q^{q}_{t}

represents the spill (flood discharge) flow of the hydropower station, released when the flood-control limit is reached.

(6) Power generation flow and output relationship:

P_{t}^{h} = K H Q^{u}_{t}

(20)

In the formula,

K = η ρ g

is a constant, where

η

is the overall efficiency of the generating units, taken as 0.83;

ρ

is the density of water, approximately 1000 kg/m³;

g

is the acceleration due to gravity, approximately 9.81 m/s²;

H

is the effective head, defined as the difference in height between the water surface upstream of the turbine and the water surface downstream of the turbine (considering the head loss after the turbine).
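With the constants given above, Equation (20) yields power in watts when H is in meters and the generation flow is in m³/s. The sketch below uses a hypothetical head and flow and converts the result to MW.

```python
ETA, RHO, G = 0.83, 1000.0, 9.81  # efficiency, kg/m^3, m/s^2 (values from the text)


def hydro_power_mw(head_m, q_gen):
    """Eq. (20): P = K * H * Q with K = eta * rho * g; converted from W to MW."""
    return ETA * RHO * G * head_m * q_gen / 1e6


# Hypothetical operating point: 100 m head, 500 m^3/s through the turbines.
p_h = hydro_power_mw(head_m=100.0, q_gen=500.0)
print(round(p_h, 3))
```

Here K = 0.83 × 1000 × 9.81 = 8142.3 W per (m · m³/s), so 100 m of head and 500 m³/s give roughly 407 MW.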

(7) Water level—storage capacity relationship:

V_{t} = g (L_{t})

(21)

In the formula,

g (x)

represents the non-linear relationship between the water level and storage capacity of the hydropower station reservoir.

(8) Power generation transmission capacity constraint:

P_{t} \leq P_{L, \max}

(22)

In the formula,

P_{L, \max}

represents the maximum capacity of the transmission channel of the water–wind–solar hybrid system.

(9) Wind power and photovoltaic output upper limit constraint:

Affected by factors such as wind speed and irradiance, the actual output of wind power and photovoltaic systems under specific environmental conditions does not exceed the maximum available output. This can be expressed as follows:

0 \leq P_{t}^{w} \leq P_{t, \max}^{w}

(23)

0 \leq P_{t}^{pv} \leq P_{t, \max}^{pv}

(24)

In the formula,

P_{t, \max}^{w}, P_{t, \max}^{pv}

represent the maximum available outputs of wind and photovoltaic power at time

t

. When the actual power output is lower than the maximum available output, the unused part is counted as curtailed power:

P_{t}^{w,q} = P_{t, \max}^{w} - P_{t}^{w}

(25)

P_{t}^{pv,q} = P_{t, \max}^{pv} - P_{t}^{pv}

(26)
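The output limits and the curtailment definitions of Equations (23) to (26) can be combined into one helper. The function name and the numbers are illustrative assumptions.

```python
def dispatch_with_curtailment(p_req, p_max):
    """Clip a requested output to [0, p_max] (Eq. 23-24) and return (output, curtailed) (Eq. 25-26)."""
    p = min(max(p_req, 0.0), p_max)
    return p, p_max - p


# Hypothetical month with 100 MW of available wind power.
print(dispatch_with_curtailment(80.0, 100.0))   # request below the available output
print(dispatch_with_curtailment(120.0, 100.0))  # request above the available output
```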

5. Model Solution

The integrated scheduling problem of hydro, wind, and solar energy is a typical multi-stage, multi-state decision problem. Given that the actual scheduling variables exhibit continuity characteristics, this paper proposes a long-term optimization scheduling method for the hydro-wind-solar multi-energy complementary system based on DDPG. This method can effectively solve the multi-agent model in a continuous action space and achieve long-term optimization scheduling for the system.

5.1. Markov Decision Process Model Transformation

For the long-term scheduling problem of the hydro-wind-solar multi-energy complementary system, it is first modeled as a discrete finite Markov Decision Process (MDP). Based on the existing mathematical optimization model and constraints, the problem is described as a process where four elements—state, action, transition, and reward—interact with each other. Considering the periodicity of the system, one year is taken as a complete cycle, which is divided into multiple decision steps, where the actions in each state correspond to scheduling decisions for a specific time period. After completing all decision steps (i.e., covering a full annual cycle), the total reward is calculated, and an episode ends. At the same time, considering the seasonal characteristics, differentiated penalties for water, wind, and solar curtailment are set during the flood and drought periods of each year to further optimize the scheduling strategy.

State: The state vector

s_{t} = [V_{t}, t]

is the state at the current time, which includes sufficient information for the agent to make decisions.

V_{t}

is the storage capacity of the water reservoir at time

t

, and the time information

t

helps the agent identify seasonal characteristics and thus optimize the control strategy.

Action: The action vector

a_{t} = [P_{t}^{w}, P_{t}^{p v}, Q^{o u t}_{t}]

is the decision variable for the current month, which includes the following: the decision on water discharge flow

Q^{o u t}_{t}

, wind power generation

P_{t}^{w}

, and photovoltaic power generation

P_{t}^{p v}

. The model uses a continuous action space: the action has three continuous components, each normalized to the range [0, 1] and then mapped back to its actual physical range, ensuring that the control strategy is physically feasible.
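The mapping from a normalized action to physical quantities can be sketched as follows. It assumes a linear scaling with lower bounds of zero, consistent with constraints (17), (23), and (24); the function name and the clipping of out-of-range values (for example after exploration noise is added) are assumptions.

```python
def to_physical(action, p_w_max, p_pv_max, q_out_max):
    """Map a normalized action in [0, 1]^3 to (P_w, P_pv, Q_out) in physical units."""
    a = [min(max(x, 0.0), 1.0) for x in action]  # guard against noise pushing outside [0, 1]
    return a[0] * p_w_max, a[1] * p_pv_max, a[2] * q_out_max


# Hypothetical limits: 100 MW wind, 40 MW PV, 3000 m^3/s maximum outflow.
print(to_physical((0.5, 1.2, -0.1), p_w_max=100.0, p_pv_max=40.0, q_out_max=3000.0))
```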

Transition: The state transition function describes how the system moves from the current state to the next state under the selected action. The randomness in the hydro-wind-solar complementary system primarily arises from the uncertainty in wind resources, solar resources, and runoff, while the state variables only include deterministic characteristics of the system (such as reservoir capacity). The randomness in state transitions therefore originates from external environmental inputs rather than from the state variables themselves. The external environmental inputs, namely the maximum available wind and photovoltaic power and the inflow during the month, are represented as the environment vector

e_{t} = [P_{t, \max}^{w}, P_{t, \max}^{pv}, Q^{in}_{t}]

, which is combined with the action to determine the state transition:

s_{t + 1} = s_{t} + f (a_{t}, e_{t})

(27)

In the formula,

s_{t + 1}

represents the state transition variable, which describes the state change

s_{t}

after action

a_{t}

, transitioning to the next state

s_{t + 1}

. The function

f (a_{t}, e_{t})

represents the action and environmental input affecting the next state. The state transition is calculated by the reservoir storage capacity formula. The environmental transition remains fixed under the same environmental scenario.
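As an illustration of Equation (27), the sketch below advances the state with a simple water balance. It is an assumption-laden example rather than the model's actual storage formula: the balance $V_{t+1} = V_{t} + (Q_{t}^{in} - Q_{t}^{out}) \Delta t$, the uniform 30-day month, the clamping at the storage limits, and the function name `step_state` are all hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed uniform month length


def step_state(v, month, q_in, q_out, v_min, v_max):
    """Return (next_storage, next_month, spill) from a simple water balance.

    v, v_min, v_max are storages in m^3; q_in and q_out are flows in m^3/s.
    """
    v_next = v + (q_in - q_out) * SECONDS_PER_MONTH
    spill = max(0.0, v_next - v_max)          # volume above the upper limit is spilled
    v_next = min(v_max, max(v_min, v_next))   # simplification: clamp to the storage limits
    next_month = month % 12 + 1               # months run 1..12 and wrap around
    return v_next, next_month, spill
```

The inflow `q_in` plays the role of the environmental input and `q_out` the role of the action, so for a fixed scenario the transition is deterministic, as described above.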

Reward: To reflect the scheduling demands of different seasons, the reward function uses season-dependent parameters: the current month determines whether the system is in the flood season or the dry season, and the penalty coefficients of the corresponding season are applied. The reward function is as follows:

$$r_{t} = P_{t} - \lambda_{1}(t) P_{t}^{w,q} - \lambda_{2}(t) P_{t}^{pv,q} - \lambda_{3}(t) P_{t}^{h,q} \tag{28}$$

In the formula, $t$ is the time period index; $P_{t}$ is the total generation; and $\lambda_{1}(\cdot), \lambda_{2}(\cdot), \lambda_{3}(\cdot)$ are the penalty coefficients applied to the wind, photovoltaic, and hydropower curtailment terms $P_{t}^{w,q}$, $P_{t}^{pv,q}$, and $P_{t}^{h,q}$, respectively.
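The seasonal reward of Equation (28) can be sketched as follows. The base coefficients are the values quoted in Section 5.2 and the flood months follow the July to October period mentioned in Section 6.3.2; the seasonal scaling factors and all function names are hypothetical, since the season-specific coefficients are not listed here.

```python
FLOOD_MONTHS = frozenset({7, 8, 9, 10})  # flood season as described in Section 6.3.2

# Base penalty coefficients quoted in Section 5.2.
BASE_LAMBDA = {"wind": 1.5, "pv": 2.0, "hydro": 2.0}

# Hypothetical seasonal scaling; the paper's season-specific values are not given here.
SEASON_SCALE = {"flood": 1.0, "dry": 1.5}


def penalty_coefficients(month):
    """Return lambda_1..lambda_3 for the season that contains `month` (1..12)."""
    season = "flood" if month in FLOOD_MONTHS else "dry"
    scale = SEASON_SCALE[season]
    return {name: value * scale for name, value in BASE_LAMBDA.items()}


def reward(month, total_generation, curtailed_wind, curtailed_pv, curtailed_hydro):
    """Eq. (28): total generation minus season-weighted curtailment penalties."""
    lam = penalty_coefficients(month)
    return (total_generation
            - lam["wind"] * curtailed_wind
            - lam["pv"] * curtailed_pv
            - lam["hydro"] * curtailed_hydro)
```

Keeping the coefficients in a lookup keyed by season makes the time dependence $\lambda_{k}(t)$ explicit without changing the form of the reward.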

5.2. DDPG Algorithm Solution

Based on the above Markov Decision Process transformation for the scheduling model, this paper uses the DDPG (Deep Deterministic Policy Gradient) algorithm to construct a policy network and Q-value network for the hydro-wind-solar complementary system. These networks are used to generate scheduling decisions and evaluate the value of the scheduling strategy. The specific steps are as follows:

(1) The policy network (Actor) $\mu(s \mid \theta^{\mu})$ and the Q-value network (Critic) $Q(s, a \mid \theta^{Q})$ are initialized. The target networks are initialized with the same parameters, $\theta^{\mu'} = \theta^{\mu}$ and $\theta^{Q'} = \theta^{Q}$, and an experience replay buffer $D$ of size 5000 is created to store the data generated by interaction with the environment. The initial reservoir storage $V_{t}$, the reward function parameters $\lambda_{1} = 1.5$, $\lambda_{2} = 2$, $\lambda_{3} = 2$, the discount factor $\gamma = 0.9$, and the other parameters are also defined. The learning rate was determined through multiple experimental trials.

(2) The initial month $t = \mathrm{randint}(1, 12)$ is randomly selected, the environmental input is determined as $e_{t} = [P_{t,\max}^{w}, P_{t,\max}^{pv}, Q_{t}^{in}]$, and the initial state variable is set to $s_{t} = [V_{t}, t]$. The cumulative reward of the episode is initialized to $r = 0$ for use in the subsequent evaluation and optimization of the policy.

(3) Obtain the current state $s_{t} = [V_{t}, t]$, input it into the policy network, and output the action $a_{t} = \mu(s_{t} \mid \theta^{\mu}) + N_{t}$, where $N_{t}$ is the exploration noise used to ensure sufficient exploration.

(4) Update the environment based on the current state and action: given the action $a_{t}$, the current state $s_{t}$, and the external environment inputs $e_{t} = [P_{t,\max}^{w}, P_{t,\max}^{pv}, Q_{t}^{in}]$, update the reservoir volume accordingly. Then advance the month index, $t \to t + 1$, and use the new month to update the wind and PV generation upper limits and the inflow data: $e_{t+1} = [P_{t+1,\max}^{w}, P_{t+1,\max}^{pv}, Q_{t+1}^{in}]$. The next state is formed as $s_{t+1} = [V_{t+1}, t + 1]$, along with the next-step environment input.

(5) Compute the immediate reward from the current decision, including total generation and curtailment penalties: $r_{t} = P_{t} - \lambda_{1}(t) P_{t}^{w,q} - \lambda_{2}(t) P_{t}^{pv,q} - \lambda_{3}(t) P_{t}^{h,q}$. Then store the transition tuple $b_{t} = (s_{t}, a_{t}, r_{t}, s_{t+1})$ in the experience replay buffer $D$.

(6) Network update: After each decision step, the networks are updated using samples from the experience replay buffer. A mini-batch $B = \{(s_{i}, a_{i}, r_{i}, s_{i+1})\}_{i=1}^{N}$ is randomly sampled from the replay buffer.

(1) Q-network update: First, the current Q-values $Q(s_{i}, a_{i} \mid \theta^{Q})$ are computed by feeding $s_{i}, a_{i}$ into the Q-network. Then, the target Q-values are computed from the samples $s_{i+1}, r_{i}$ using the target networks:

$$y_{i} = r_{i} + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}) \tag{29}$$

Here, $r_{i}$ is the reward generated by the action in sample $i$, and $\gamma$ is the discount factor representing the influence of future rewards on current decision making. The Q-network parameters are then updated by minimizing the mean squared error between the current and target Q-values:

$$L(\theta^{Q}) = \frac{1}{N} \sum_{i} [Q(s_{i}, a_{i} \mid \theta^{Q}) - y_{i}]^{2} \tag{30}$$

The gradient of this loss with respect to all parameters of the Q-network is computed, and the parameters are updated along the negative gradient to reduce the loss. The update rule is as follows:

$$\theta^{Q} \leftarrow \theta^{Q} - \alpha_{C} \cdot \nabla_{\theta^{Q}} L(\theta^{Q}) \tag{31}$$

where $\alpha_{C}$ is the learning rate of the Q-network, a small hyperparameter less than 1.

(2) Policy network update: The gradient of the Q-network output with respect to the action, $\nabla_{a} Q(s, a \mid \theta^{Q})|_{a = \mu(s \mid \theta^{\mu})}$, and the gradient of the policy network output with respect to its parameters, $\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})$, are combined to maximize the Q-value estimated by the Q-network. Because the objective $J$ is maximized, the parameters are updated by gradient ascent:

$$\theta^{\mu} \leftarrow \theta^{\mu} + \alpha_{A} \cdot \nabla_{\theta^{\mu}} J \tag{32}$$

Here, $\alpha_{A}$ is the learning rate of the policy network, a hyperparameter less than 1, and

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})|_{s = s_{i}, a = \mu(s_{i} \mid \theta^{\mu})} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})|_{s = s_{i}}.$$

To improve the stability and effectiveness of the training process, the policy network is optimized through the policy gradient, and a soft update mechanism is used to adjust the target network parameters gradually, ensuring stability and convergence during training. The soft update formula is as follows:

$$\theta^{Q'} \leftarrow \lambda \theta^{Q} + (1 - \lambda) \theta^{Q'}, \quad \theta^{\mu'} \leftarrow \lambda \theta^{\mu} + (1 - \lambda) \theta^{\mu'} \tag{33}$$

In the formula, $\lambda$ is the soft update coefficient (distinct from the penalty coefficients $\lambda_{1}$ to $\lambda_{3}$), a hyperparameter much smaller than 1, with a value of 0.00004.

(7) The above steps are executed repeatedly during training until the maximum number of episodes is reached. If the termination condition is satisfied, the final optimized Actor network $\mu(s \mid \theta^{\mu})$ is output as the long-term optimal dispatch strategy of the hydro–wind–solar complementary system; otherwise, training continues until convergence. Figure 3 presents the complete process of parameter transmission and update in the DDPG algorithm.
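The update rules in step (6) can be made concrete on a toy problem. The sketch below replaces the neural networks with scalar linear models so that the gradients can be written by hand; this is an assumption made purely for illustration and is not the PyTorch implementation used in the paper. The class and function names are hypothetical, and each function corresponds to one of Equations (29) to (33).

```python
from dataclasses import dataclass


@dataclass
class Critic:
    """Toy linear critic: Q(s, a) = ws * s + wa * a + b."""
    ws: float
    wa: float
    b: float

    def q(self, s, a):
        return self.ws * s + self.wa * a + self.b


@dataclass
class Actor:
    """Toy linear actor: mu(s) = k * s + c."""
    k: float
    c: float

    def act(self, s):
        return self.k * s + self.c


def td_targets(batch, gamma, target_actor, target_critic):
    """Eq. (29): y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return [r + gamma * target_critic.q(s2, target_actor.act(s2))
            for (_, _, r, s2) in batch]


def critic_loss(critic, batch, targets):
    """Eq. (30): mean squared error between Q(s_i, a_i) and y_i."""
    return sum((critic.q(s, a) - y) ** 2
               for (s, a, _, _), y in zip(batch, targets)) / len(batch)


def critic_step(critic, batch, targets, lr):
    """Eq. (31): one gradient-descent step on the critic loss."""
    n = len(batch)
    g_ws = g_wa = g_b = 0.0
    for (s, a, _, _), y in zip(batch, targets):
        err = 2.0 * (critic.q(s, a) - y) / n  # derivative of the squared error
        g_ws += err * s
        g_wa += err * a
        g_b += err
    critic.ws -= lr * g_ws
    critic.wa -= lr * g_wa
    critic.b -= lr * g_b


def actor_step(actor, critic, batch, lr):
    """Eq. (32): gradient ascent using dQ/da * dmu/dtheta averaged over the batch."""
    n = len(batch)
    dq_da = critic.wa                                    # constant for a linear critic
    g_k = sum(dq_da * s for (s, _, _, _) in batch) / n   # dmu/dk = s
    g_c = dq_da                                          # dmu/dc = 1
    actor.k += lr * g_k
    actor.c += lr * g_c


def soft_update(target, online, lam):
    """Eq. (33): theta' <- lam * theta + (1 - lam) * theta'."""
    for name in vars(online):
        mixed = lam * getattr(online, name) + (1.0 - lam) * getattr(target, name)
        setattr(target, name, mixed)
```

A training loop would call `td_targets`, `critic_step`, `actor_step`, and `soft_update` in that order for every sampled mini-batch of `(s, a, r, s_next)` tuples. With real networks, the hand-written gradients would be replaced by automatic differentiation.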

The DDPG-based long-term scheduling optimization method for the hydro–wind–solar multi-energy complementary system enhances the exploration capability of the model by introducing Ornstein–Uhlenbeck (OU) noise [13]. The generation formula is as follows:

$$N_{t+1} = N_{t} + \theta (\mu - N_{t}) + \sigma \mathcal{N}(0, 1) \tag{34}$$

In the formula, $\theta$ is the decay rate, which controls how quickly the noise reverts to its mean $\mu$; $\sigma$ is the noise width, which controls the magnitude of the exploration randomness; and $\mathcal{N}(0, 1)$ is a standard normal random variable.
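Equation (34) translates almost directly into code. The following minimal sketch uses only the Python standard library; the class name `OUNoise`, the default mean of zero, and the optional seed are assumptions made for illustration.

```python
import random


class OUNoise:
    """Discrete Ornstein-Uhlenbeck process following Eq. (34)."""

    def __init__(self, theta, sigma, mu=0.0, x0=0.0, seed=None):
        self.theta = theta    # decay rate: speed of reversion to the mean
        self.sigma = sigma    # noise width: magnitude of the random term
        self.mu = mu          # long-run mean of the noise
        self.x = x0           # current noise value N_t
        self._rng = random.Random(seed)

    def sample(self):
        """Advance one step: N_{t+1} = N_t + theta * (mu - N_t) + sigma * N(0, 1)."""
        self.x = (self.x
                  + self.theta * (self.mu - self.x)
                  + self.sigma * self._rng.gauss(0.0, 1.0))
        return self.x
```

Because each sample depends on the previous one, consecutive noise values are correlated, which tends to produce smoother exploration than independent Gaussian noise. One `OUNoise` instance per action component would allow different parameters for discharge flow, wind power, and photovoltaic power.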

Given the differences among the scheduling variables of hydroelectric discharge flow, wind power, and photovoltaic generation in the hydro-wind-solar energy system, the exploration parameters also vary. To accommodate the different stages of model training, this paper introduces a time-dependent noise dynamic adjustment mechanism to explore the scheduling strategy in the continuous action space. During the early stages of model training, larger noise amplitudes are used to fully explore the potential of various scheduling schemes. In the later stages, smaller noise amplitudes are employed to enhance the stability of the scheduling strategy. This mechanism improves the convergence of the reinforcement learning algorithm and the optimality of the solution, overcoming the issue of insufficient continuous action optimization in traditional methods. Furthermore, through experience replay and policy gradient optimization, the algorithm’s stability is further enhanced, effectively addressing the uncertainty challenges in the hydro–wind–solar multi-energy complementary system.
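One possible form of such a time-dependent adjustment is an exponential decay of the noise amplitude over training episodes. The sketch below is an assumption for illustration only: the paper does not state the decay law in this section, and the function names, the floor value, and the use of a per-episode decay factor are hypothetical.

```python
def noise_scale(episode, sigma_start, sigma_end, decay):
    """Exponentially decay the noise amplitude from sigma_start down to a floor sigma_end."""
    return max(sigma_end, sigma_start * decay ** episode)


def explore(action, noise, scale):
    """Add scaled noise to each normalized action component and clip to [0, 1]."""
    return [min(1.0, max(0.0, a + scale * n)) for a, n in zip(action, noise)]
```

Early episodes then use an amplitude close to `sigma_start`, which favors broad exploration, while late episodes approach `sigma_end`, which stabilizes the learned strategy.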

6. Case Analysis

6.1. Dataset Analysis and Parameter Settings

This paper analyzes a case study of a large hydropower station and its associated wind and solar power plants. The wind and solar output data are fitted using the reanalysis meteorological dataset provided by NASA, while the runoff data are provided by the power grid company. The analysis is conducted with a monthly time step, covering the period from 1990 to 2021.

The deep learning framework used in the experiment is PyTorch under Python 3.12. The dataset for the DDPG algorithm consists of 10,000 runoff and wind–solar output samples generated from the historical data using the scenario generation method described in this paper. The training, validation, and test sets are split in a ratio of 70%, 20%, and 10%, respectively. In the DDPG model adopted in this study, both the Actor and Critic networks have two hidden layers, which balances the model’s representational capacity and training efficiency. The discount factor is set to 0.99, and the batch size is chosen empirically as 32. The replay buffer capacity is 100,000. The experiments were conducted on a system equipped with an AMD (Santa Clara, CA, USA) Ryzen 7 3750H CPU, an NVIDIA (Santa Clara, CA, USA) GeForce GTX 1050 GPU, and 8.0 GB of RAM.
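For the stated 70%/20%/10% split, 10,000 scenarios yield 7000 training, 2000 validation, and 1000 test scenarios. A minimal sketch of such a split is shown below; the shuffling, the fixed seed, and the function name `split_indices` are assumptions, since the paper does not describe how scenarios were assigned to the subsets.

```python
import random


def split_indices(n_samples, percents=(70, 20, 10), seed=0):
    """Shuffle scenario indices and split them into train/validation/test subsets."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = n_samples * percents[0] // 100  # integer arithmetic avoids rounding issues
    n_val = n_samples * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # the remainder goes to the test set
    return train, val, test
```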

To further improve the training performance of the DDPG algorithm, we conducted sensitivity analysis and parameter optimization based on an initial parameter set. The learning rates of the Actor and Critic networks (denoted $\alpha_{A}$ and $\alpha_{C}$, respectively), along with the parameters $\theta$ and $\sigma$ of the OU noise, were systematically adjusted. Several representative parameter combinations (as shown in Table 1) were tested, and their training performance was evaluated based on cumulative reward and training time. Finally, based on training performance, the optimal parameter configuration was determined as $\alpha_{A} = \alpha_{C} = 0.0001$, $\theta = 0.9995$, $\sigma = 0.5$ (group 9 in Table 1).
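The selection among the candidate configurations can be reproduced from the values in Table 1. The sketch below assumes a simple rule, highest cumulative reward with ties broken by shorter training time; the paper states only that both criteria were considered, so the exact rule and the names used here are hypothetical.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    group: int
    alpha_a: float
    alpha_c: float
    theta: float
    sigma: float
    reward: float   # cumulative reward, in units of 10^3
    time_s: float   # training time in seconds


# Values transcribed from Table 1.
TABLE_1 = [
    Trial(1, 0.005, 0.001, 0.9999, 0.1, 0.78, 65.12),
    Trial(2, 0.0001, 0.001, 0.9999, 0.1, 1.02, 43.65),
    Trial(3, 0.0003, 0.001, 0.9999, 0.1, 1.69, 21.06),
    Trial(4, 0.0001, 0.005, 0.9999, 0.1, 1.98, 15.43),
    Trial(5, 0.0001, 0.0001, 0.9999, 0.1, 2.10, 12.40),
    Trial(6, 0.0001, 0.0003, 0.9999, 0.1, 1.72, 18.72),
    Trial(7, 0.0001, 0.0001, 0.9995, 0.1, 2.08, 16.32),
    Trial(8, 0.0001, 0.0001, 0.999, 0.1, 2.10, 11.25),
    Trial(9, 0.0001, 0.0001, 0.9995, 0.5, 2.12, 3.15),
    Trial(10, 0.0001, 0.0001, 0.9995, 0.9, 2.08, 13.79),
]


def best_configuration(trials):
    """Pick the trial with the highest reward; ties go to the shorter training time."""
    return max(trials, key=lambda t: (t.reward, -t.time_s))
```

Under this rule, group 9 is selected, which matches the configuration reported above.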

6.2. Hydro–Wind Coupling Scenario Analysis

Based on the given runoff and wind–solar output data, 10,000 sets of long-term uncertainty scenarios for the water–wind–solar multi-energy complementarity system were generated using the correlation modeling and sampling methods of the multi-energy complementarity system mentioned in Section 3.1 and Section 3.2.

As shown in Figure 4, the proposed scenario generation method effectively captures the annual variation trends in each variable in the multi-energy complementarity system. The generated scenarios generally align with the climate characteristics of the region: May to October is the rainy season, during which inflow is relatively high; November to April is the dry season, with a significant decline in runoff. During the spring and winter, wind speeds are higher, and solar radiation intensity is moderate, leading to abundant wind and solar resources. In contrast, during the summer and autumn, factors such as the rainy season and cloud cover lead to weaker wind and solar output. Additionally, the generated scenarios are closely distributed around the multi-year average, showing a certain degree of continuity, which proves that the method can accurately quantify the high-dimensional uncertainty of the water–wind–solar multi-energy complementarity system.

To validate the effectiveness of the proposed improved C-Vine Copula modeling approach in capturing multidimensional stochastic dependencies, a comparative analysis was conducted against the traditional C-Vine Copula method. Table 2 presents the correlation metrics derived during scenario generation using both methods. As shown, the absolute errors of Kendall’s Tau coefficients between variable pairs generated by the proposed method are consistently smaller than those of the traditional model, indicating higher accuracy in preserving the intervariable dependence structure.
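The correlation metric in Table 2 can be illustrated with a small, dependency-free sketch. It assumes the tau-a form of Kendall's coefficient (tied pairs contribute zero) and defines the absolute error as the difference between the coefficient of the generated scenarios and that of the historical data; the exact estimator of Section 3.3.1 may differ, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    n = len(x)
    return score / (n * (n - 1) / 2)


def tau_abs_error(hist_x, hist_y, gen_x, gen_y):
    """Absolute difference between generated and historical Kendall's tau."""
    return abs(kendall_tau(gen_x, gen_y) - kendall_tau(hist_x, hist_y))
```

For example, `tau_abs_error(runoff_hist, wind_hist, runoff_gen, wind_gen)` would give the runoff–wind entry for one model, where the four argument names stand for hypothetical monthly series.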

Furthermore, Table 3 compares the two methods in terms of randomness (measured by the average Euclidean distance, AED) and volatility (measured by the coverage rate, CR). The results demonstrate that the proposed method yields significantly lower randomness metrics across runoff, wind power, and photovoltaic output, suggesting that the generated scenarios are more closely aligned with historical data. Additionally, a higher coverage rate indicates that the proposed method more comprehensively captures the fluctuation range of historical outputs.
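The two metrics can be sketched as follows under explicitly assumed definitions, since the formulas of Sections 3.3.2 and 3.3.3 are not reproduced in this part of the paper: the Euclidean distance is averaged over all generated scenarios relative to one reference series, and the coverage rate is the share of historical values lying inside the per-period envelope of the generated scenarios. The function names are hypothetical.

```python
import math


def average_euclidean_distance(scenarios, reference):
    """Mean Euclidean distance between each generated scenario and a reference series."""
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(scenario, reference)))
        for scenario in scenarios
    ]
    return sum(distances) / len(distances)


def coverage_rate(scenarios, observations):
    """Share of observed values inside the [min, max] envelope of the scenarios."""
    lows = [min(column) for column in zip(*scenarios)]    # per-period lower envelope
    highs = [max(column) for column in zip(*scenarios)]   # per-period upper envelope
    covered = sum(
        low <= value <= high
        for series in observations
        for value, low, high in zip(series, lows, highs)
    )
    total = sum(len(series) for series in observations)
    return covered / total
```

A lower distance indicates scenarios that stay closer to the reference, and a higher coverage rate indicates an envelope that contains more of the historical fluctuation range, which is the interpretation used above.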

In summary, the improved C-Vine Copula model outperforms the traditional method in maintaining correlation, approximating randomness, and covering volatility, thus exhibiting stronger overall modeling capability.

6.3. Result Analysis

6.3.1. Convergence Analysis

Based on the DDPG algorithm, we trained the agent systematically and compared its training performance with that of the DQN algorithm on the long-term scheduling model of the multi-energy complementary system. The key hyperparameters of the DQN algorithm, such as the learning rate and the ε-greedy exploration rate, were tuned with the same strategy used for the DDPG algorithm: training performance was compared under different parameter combinations, and the values best suited to this scenario were determined as $\alpha' = 0.003$ and $\varepsilon = 0.85$. As shown in Figure 5, the convergence characteristics of the DDPG and DQN algorithms differ significantly. Specifically, the DQN algorithm converges after approximately 6500 iterations, with its reward value showing considerable random fluctuations throughout the training process. In contrast, the DDPG algorithm reaches convergence after about 3000 iterations. Although its reward value fluctuates significantly in the early stages of training, the fluctuation range gradually decreases and stabilizes as the number of iterations increases, demonstrating a faster convergence rate.

The performance difference mainly arises from the fundamental distinction in exploration strategies between the two algorithms. The DQN algorithm relies on a relatively random action selection mechanism and an experience replay strategy to obtain better reward values, which leads to a slow and inefficient training process. In contrast, the DDPG algorithm expands the exploration space by introducing high-intensity noise during the early stages of training. Although this causes significant fluctuations in reward values, it ensures a thorough search for the globally optimal strategy. As training progresses, the exploration noise gradually decays, which not only ensures the stability of model convergence but also significantly improves the efficiency of continuous space exploration, resulting in faster convergence.

6.3.2. Scheduling Result Analysis

To ensure the comprehensive benefits of the multi-energy complementary system, this study aims to maximize the consumption of clean energy. The dispatch performance of the hydro–wind–solar multi-energy complementary system is verified in typical wet-year and dry-year scenarios, with initial water levels of 970 m and 945 m, respectively. Figure 6 compares the system output characteristics and reservoir water level processes obtained with the DDPG and DQN algorithms.

Figure 6a shows the scheduling process for the dry year. Under both algorithms, clean energy output increases from January to June and peaks from July to September; it does not exceed the channel capacity limit, and no wind or solar power is curtailed. However, the scheduling strategies differ significantly. Because of its insufficient control accuracy in a continuous action space, the DQN algorithm frequently stores water during the non-flood season (January to June), when inflow is low, and continues to store water during the flood season (July to October) despite the increasing inflow, which limits hydropower output. In contrast, the DDPG algorithm strictly follows the real-time response principle, holding the water level at the lower limit and dynamically adjusting hydropower output to compensate for the intermittency of wind and solar power. Although both algorithms generate similar total power in the dry year, the DDPG model significantly improves the reliability of power supply during periods of wind and solar fluctuation through refined water resource scheduling.

Figure 6b shows the scheduling process for the wet year. Because of the high inflow between July and September, the total clean energy output of the system reaches the channel capacity limit, leading to curtailment. The DQN algorithm stores water during non-peak load conditions before the flood season (June) and after it (September), and it misjudges the curtailment threshold in July and September while the water level is still below the upper limit, which increases curtailment. In contrast, the DDPG algorithm holds the water level at the lower limit during the non-flood season to ensure immediate absorption of hydropower; during the flood season, it dynamically adjusts the water storage strategy according to the runoff changes and stabilizes the water level near the upper limit. It then makes curtailment decisions proactively based on the capacity constraints, thereby minimizing curtailment.

In terms of solution quality for a typical wet year, as shown in Table 4, the DDPG algorithm increases the actual generation of the hydro–wind–solar multi-energy complementary system compared with the DQN algorithm. The actual power generation increased by 208.8 million kWh, and the curtailment of wind and solar power decreased by 180 million kWh. Both algorithms had zero water curtailment. This indicates that in the wet year scenario, the DDPG model achieves better generation efficiency than the DQN model and coordinates the utilization of wind and solar resources more rationally.

7. Conclusions

This study establishes a long-term optimal scheduling model for the hydro–wind–solar multi-energy complementary system with the goal of maximizing clean energy consumption. The DDPG algorithm is used for optimization and scheduling, leading to the following conclusions:

(1) The improved C-Vine Copula model is used to model the correlations within the hydro–wind–solar multi-energy complementary system. Combined with Latin Hypercube Sampling, it quantifies the uncertainty of the system while reducing model complexity. The resulting scenarios effectively reflect the historical runoff and the characteristic patterns of wind and solar power generation in the region.
(2) Compared to the DQN algorithm, the DDPG algorithm improves the model’s convergence speed and policy reward through an adaptive noise exploration mechanism and continuous action space optimization during the training phase. Additionally, through scheduling validation in typical hydrological scenarios, DDPG demonstrates stronger environmental adaptability and decision robustness in handling the complex continuous control problems of multi-energy complementary systems. Whether in the dry year scenario, where hydropower precisely compensates for wind and solar fluctuations, or in the wet year scenario, where curtailment and water level thresholds are actively optimized, DDPG maximizes the consumption of clean energy and system benefits.

8. Future Work

In this study, a long-term joint hydro–wind–solar power output scenario framework was constructed based on historical measurements of inflow, wind power, and photovoltaic output, capturing the interdependencies among multiple stochastic variables. On this basis, a deep reinforcement learning algorithm was introduced to solve the long-term scheduling model of a multi-energy complementary system, effectively addressing some of the challenges in handling complex uncertainties and high-dimensional decision making problems that exist in current research.

However, several limitations remain, which warrant further investigation and improvement in future work:

(1) The current study does not fully account for scheduling costs and economic factors. Future research may incorporate the influence of market mechanisms to enhance realism.
(2) This work primarily focuses on a local system and does not consider the coupling characteristics of multi-regional power grid interconnections, which could be a valuable direction for model extension.
(3) Future research may explore adaptive strategies for real-time dynamic scheduling to enhance the model’s practical applicability and responsiveness.

Author Contributions

Conceptualization, W.L., Z.W. and M.H.; methodology, Z.W., W.L. and T.Z.; validation, Z.W., S.C. and M.H.; formal analysis, W.L. and W.G.; investigation, M.H., S.Z., S.C. and X.H.; resources, T.Z.; data curation, M.H., T.Z. and W.G.; writing—original draft preparation, Z.W.; writing—review and editing, W.L. and S.Z.; visualization, Z.W. and X.H.; supervision, S.C. and W.G.; project administration, X.H. and S.Z.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Research Fund of Hubei Technology Innovation Center for Smart Hydropower (SDCXZX-JJ-2023-23).

Data Availability Statement

The availability of the data used in this study is explained in Section 6.

Conflicts of Interest

Authors Taotao Zhang, Weiwei Guan and Xiaojun Hua were employed by the company China Yangtze Power Co., Ltd. Author Shengzhe Chen was employed by the China Three Gorges Corporation. Author Shang Zheng was employed by the company Three Gorges Renewables Offshore Wind Power Operation and Maintenance Jiangsu Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Cao, H.; Qiu, J.; Zuo, H.-M.; Li, F.-F. A Long-Term Operational Scheme for Hybrid Hydro-Photovoltaic (PV) Systems That Considers the Uncertainties in Reservoir Inflow and Solar Radiation Based on Scenario Trees. Water Resour. Manag. 2023, 37, 5379–5398.
2. Zhang, Y.; Cheng, C.; Cao, R.; Li, G.; Shen, J.; Wu, X. Multivariate Probabilistic Forecasting and Its Performance’s Impacts on Long-Term Dispatch of Hydro-Wind Hybrid Systems. Appl. Energy 2021, 283, 116243.
3. Li, H.; Liu, P.; Guo, S.; Cheng, L.; Huang, K.; Feng, M.; He, S.; Ming, B. Deriving Adaptive Long-Term Complementary Operating Rules for a Large-Scale Hydro-Photovoltaic Hybrid Power Plant Using Ensemble Kalman Filter. Appl. Energy 2021, 301, 117482.
4. Mu, C.X.; Zhang, J.T.; Cheng, C.T.; Xu, Y.; Yang, Y.Q. High-dimensional Uncertainty Scenario Generation Method for Hydro-wind-solar Multi-energy Complementary System Considering Spatio-temporal Correlation. Power Syst. Technol. 2024, 48, 3614–3623.
5. Cao, H.; Mou, C.X.; Yang, Y.Q.; Xu, Y.; Zhang, Z.; Cheng, C.T. Long-term optimization scheduling method for hydro-wind-PV multi energy complementary systems considering multi uncertainty. Yangtze River 2024, 55, 26–34.
6. Zhao, Z.P.; Yu, Z.H.; Cheng, C.T.; Wang, J.; Feng, Z.K. Multi-risk Quantification and Long-term Multi-objective Coordinative Optimal Dispatch Method for Hydro-Wind-Solar Integrated Energy Base. Autom. Electr. Power Syst. 2024, 48, 118–130.
7. Yang, Z.; Liu, P.; Cheng, L.; Wang, H.; Ming, B.; Gong, W. Deriving Operating Rules for a Large-Scale Hydro-Photovoltaic Power System Using Implicit Stochastic Optimization. J. Clean. Prod. 2018, 195, 562–572.
8. Wen, X.; Qin, J.S.; Tan, Q.F.; Zhang, Z.Y.; Wang, Y.L. Research on stochastic optimal operation of hydro-photovoltaic complementary based on utility function of carryover stage. Water Resour. Prot. 2023, 39, 23–31+62.
9. Li, W.W.; Zhou, J.N.; Pei, B.L.; Zhang, Y.F. Study on long-term stochastic optimal operation of cascade reservoirs by deep reinforcement learning. J. Hydroelectr. Eng. 2023, 42, 21–32.
10. Jiang, W.; Liu, Y.; Fang, G.; Ding, Z. Research on Short-Term Optimal Scheduling of Hydro-Wind-Solar Multi-Energy Power System Based on Deep Reinforcement Learning. J. Clean. Prod. 2023, 385, 135704.
11. Ge, Y.; Xie, J.; Fu, D.; Feng, S.; Chang, J.; Song, Y. Real-Time Scheduling of Wind-Solar-Hydro Complementary System Based on Deep Reinforcement Learning. In Proceedings of the 2024 IEEE PES 16th Asia-Pacific Power and Energy Engineering Conference (APPEEC), Nanjing, China, 25–27 October 2024; pp. 1–4.
12. Zhang, H.; Lu, Z.; Hu, W.; Wang, Y.; Dong, L.; Zhang, J. Coordinated Optimal Operation of Hydro–Wind–Solar Integrated Systems. Appl. Energy 2019, 242, 883–896.
13. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.

Figure 1. Overall framework of long-term optimization scheduling for water–wind–solar multi-energy complementary system based on DDPG Algorithm.

Figure 2. Flowchart of watershed water–wind–solar scenario generation process.

Figure 3. DDPG algorithm parameter transmission and update process.

Figure 4. Hydro-wind–solar scenario set. (a) Runoff scenario set; (b) wind power output scenario set; (c) photovoltaic power output scenario set.

Figure 5. Iteration process of DQN and DDPG algorithm.

Figure 6. Scheduling decision processes of DQN and DDPG models in typical dry and wet years. (a) Dry year scheduling decision process; (b) wet year scheduling decision process.

Table 1. Learning rate of the DDPG algorithm and parameter selection of OU noise.

Group	$\alpha_{A}$	$\alpha_{C}$	$\theta$	$\sigma$	Reward (10³)	Time (s)
1	0.005	0.001	0.9999	0.1	0.78	65.12
2	0.0001	0.001	0.9999	0.1	1.02	43.65
3	0.0003	0.001	0.9999	0.1	1.69	21.06
4	0.0001	0.005	0.9999	0.1	1.98	15.43
5	0.0001	0.0001	0.9999	0.1	2.10	12.40
6	0.0001	0.0003	0.9999	0.1	1.72	18.72
7	0.0001	0.0001	0.9995	0.1	2.08	16.32
8	0.0001	0.0001	0.999	0.1	2.10	11.25
9	0.0001	0.0001	0.9995	0.5	2.12	3.15
10	0.0001	0.0001	0.9995	0.9	2.08	13.79

Table 2. Comparison of absolute errors in Kendall’s Tau coefficients for generated scenarios.

Model	Runoff–Wind	Runoff–PV	Wind–PV
C-Vine Copula	0.031	0.056	0.038
The proposed model	0.020	0.003	0.009

Table 3. Calculation results of AED and coverage rate for generated scenarios.

Model	Runoff AED	Runoff CR	Wind AED	Wind CR	PV AED	PV CR
C-Vine Copula	1.7425	85.4%	0.6582	80.5%	0.2419	88.5%
The proposed model	1.1296	92.5%	0.3715	93.8%	0.0976	90.2%

Table 4. Performance comparison of different models.

Model	Actual Power Generation (GWh)	Curtailed Power (GWh)	Curtailed Water (10⁸ m³)	Optimization Time (s)
DDPG	641.036	2.376	0	3.75
DQN	638.948	4.176	0	6.97

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
