Article

Short-Term Optimal Scheduling of Pumped-Storage Units via DDPG with AOS-LSTM Flow-Curve Fitting

1 School of Electrical and Power Engineering, Hohai University, Nanjing 211100, China
2 NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China
* Authors to whom correspondence should be addressed.
Water 2025, 17(13), 1842; https://doi.org/10.3390/w17131842
Submission received: 8 May 2025 / Revised: 15 June 2025 / Accepted: 18 June 2025 / Published: 20 June 2025

Abstract

The short-term scheduling of pumped-storage hydropower plants is characterised by high dimensionality and nonlinearity and is subject to multiple operational constraints. This study proposes an intelligent scheduling framework that integrates an Atomic Orbital Search (AOS)-optimised Long Short-Term Memory (LSTM) network with the Deep Deterministic Policy Gradient (DDPG) algorithm to minimise water consumption during the generation period while satisfying constraints such as system load and safety states. Firstly, the AOS-LSTM model simultaneously optimises the number of hidden neurons, batch size, and training epochs to achieve high-precision fitting of unit flow–efficiency characteristic curves, reducing the fitting error by more than 65.35% compared with traditional methods. Subsequently, the high-precision fitted curves are embedded into a Markov decision process to guide DDPG in performing constraint-aware load scheduling. Under a typical daily load scenario, the proposed scheduling framework achieves fast inference decisions within 1 s, reducing water consumption by 0.85%, 1.78%, and 2.36% compared to standard DDPG, Particle Swarm Optimisation, and Dynamic Programming methods, respectively. In addition, only two vibration-zone operations and two vibration-zone crossings are recorded, representing a reduction of more than 90% compared with the above two traditional optimisation methods, significantly improving scheduling safety and operational stability. The results validate the proposed method’s economic efficiency and reliability in high-dimensional, multi-constraint pumped-storage scheduling problems and provide strong technical support for intelligent scheduling systems.

1. Introduction

With global carbon emissions continually rising, climate change impacts are increasingly pronounced. Achieving carbon peaking and carbon neutrality has thus become a shared international objective [1]. Traditional fossil fuel combustion emits substantial greenhouse gases, exacerbating global warming and associated environmental issues such as rising sea levels [2,3]. Consequently, countries are pivoting to clean energy, emphasising renewables such as hydropower, wind, and solar [4,5].
Pumped-storage hydropower plants (PSHPs) are vital to grids dominated by renewables. By pumping water from the lower reservoir to the upper reservoir during off-peak electricity demand periods, PSHPs convert electrical energy into potential energy. At peak demand, or when renewables fall short, stored water is released through turbines, improving grid flexibility and stability [6,7]. PSHPs effectively mitigate the intermittency of renewable energy sources and improve overall power system reliability. Compared to fossil fuel generation, PSHPs offer higher round-trip efficiency (typically exceeding 70%), longer service life, lower maintenance costs, and near-zero direct carbon emissions, making them an environmentally friendly energy solution.
Short-term PSHP scheduling distributes load across units to minimise water use while respecting constraints such as vibration zones [8]. The process includes unit commitment (startup/shutdown status) [9] and load dispatch (assigned power output) [10]. The strong nonlinearity of PSHPs makes scheduling complex, so researchers rely on the Equal Incremental Rate method, Dynamic Programming (DP), and various intelligent algorithms.
The Equal Incremental Rate method [11] is relatively easy to implement and has found limited application in low-dimensional problems. In hydropower load allocation, ref. [12] proposed an equal-incremental-rate algorithm that identifies multiple solution regions. The method was validated with a hybrid model and applied to Geheyan Hydropower Station. Ref. [13] developed a short-term scheduling model for large hydropower stations and solved it with multi-core parallel DP. Ref. [14] merged progressive-structure and progressive-step DP into a hybrid approach and validated it on hydropower load allocation. Although DP is popular for PSHP scheduling, state-space discretisation leads to a curse of dimensionality in high-dimensional cases.
In recent years, optimisation algorithms have proliferated because of their simplicity and versatility [15,16]. Efficient methods such as Particle Swarm Optimisation [17] and Simulated Annealing [18] have demonstrated outstanding performance. Meanwhile, novel algorithms continue to emerge. Ref. [19] proposed a two-stage firefly optimisation algorithm with distinct encoding schemes for the economic load-dispatch problem. The authors added a dynamic patching mechanism to escape local optima, and numerical simulations confirmed its effectiveness. Ref. [20] introduced a two-layer optimisation framework that utilised the Cuckoo Search algorithm to solve a hydropower short-term scheduling model considering photovoltaic generation uncertainty. Compared with actual operations, the proposed deterministic and stochastic schemes reduced water consumption by 1.5% and 1.0%, respectively.
Of the three categories, the Equal Incremental Rate method is simple but increasingly unsuitable for nonlinear, high-dimensional problems. The DP method requires a high degree of discretisation, with its accuracy increasing with finer granularity. However, this also leads to a significant increase in computational demand and may result in the “curse of dimensionality”. Although intelligent algorithms offer relatively high solving efficiency, their accuracy often depends on manually tuned parameters and prior experience. Moreover, these methods tend to exhibit instability, and their results may not be reliably reproducible.
Recent advances in artificial intelligence, particularly Deep Reinforcement Learning (DRL), have offered promising tools for tackling complex, nonlinear optimisation problems. DRL fuses deep learning’s representation strength with reinforcement learning’s search efficiency, yielding strong results in energy scheduling. Ref. [21] employed the Deep Q-Network (DQN) approach to construct a short-term optimal scheduling framework for a multi-energy power system integrating hydropower, wind, and solar energy. Using real wind- and solar-power data, the framework markedly improved generation efficiency and decision quality. Ref. [22] integrated the twin-delayed deep deterministic policy gradient (TD3) algorithm with learning-rate annealing and hindsight prioritised replay to build a battery-degradation model. The proposed system reduced costs and proved more adaptable. Nonetheless, the application of DRL to PSHP load optimisation has so far been limited. Widely used DRL variants such as DQN, actor–critic schemes, and Proximal Policy Optimisation (PPO) [23] have encountered convergence difficulties, significant implementation complexity, or constraints in continuous-action spaces. The Deep Deterministic Policy Gradient (DDPG) algorithm, expressly devised for continuous, high-dimensional state–action domains, therefore emerges as a promising alternative [24].
To address these challenges, this study presents an intelligent scheduling framework for pumped-storage hydropower units by combining an Atomic Orbital Search-optimised Long Short-Term Memory network with the Deep Deterministic Policy Gradient algorithm (AOS-LSTM-DDPG). The AOS-LSTM model accurately fits unit flow characteristics using power output, water head, and discharge as inputs, significantly improving prediction accuracy and response efficiency. These fitted curves are embedded into a Markov decision process to guide the DDPG agent in learning water-efficient scheduling strategies. Operational constraints, including vibration zone avoidance, are incorporated into the reward design to enhance real-world applicability. Under a representative daily load scenario, the proposed method exhibits stable convergence and effectively reduces both water consumption and vibration events.
The main innovations and contributions of this study are as follows:
  • High-precision flow-efficiency curve fitting: AOS tunes LSTM hyperparameters to model nonlinear flow-efficiency curves more accurately than traditional methods, giving reliable input for scheduling.
  • Constraint-aware DRL embedded with fitted flow curves: Embedding the fitted curves into a DDPG-based Markov process yields a policy that minimises water use while respecting constraints such as vibration zone avoidance. The model offers fast inference, stable convergence, and superior water-saving and economic outcomes.
The remainder of this paper is organised as follows: Section 2 models the short-term scheduling problem, detailing the objective and constraints. Section 3 describes the AOS-LSTM-DDPG framework. Section 4 presents case studies and comparative analysis. Finally, Section 5 concludes with key findings, limitations, and future work.

2. The Problem Description

This study investigates a short-term load optimisation scheduling model for pumped-storage hydropower units.

2.1. Objective Function

A PSHP must meet grid demand by adjusting output for peak-shaving and frequency regulation. At the same time, total water use should be minimised to improve efficiency. Accordingly, the optimisation objective is to minimise water consumption over the scheduling horizon:
$$Q_{all} = \min \sum_{t=1}^{T} \sum_{n=1}^{N} pre\_q_{n,t}(H_t, P_{n,t})$$
where $Q_{all}$ is the total water consumption over the scheduling horizon (m³); $T$ is the total number of time intervals within the scheduling period; $N$ represents the number of operable units in the pumped-storage power station; and $pre\_q_{n,t}(H_t, P_{n,t})$ is the predicted water consumption obtained from the prediction model, taking the operational water head $H_t$ and power output $P_{n,t}$ of the $n$th unit at time interval $t$ as inputs.
The plant under study uses fixed-speed pump turbines whose pumping power is either rated or zero; pumping-mode water use is therefore excluded, and only generation-mode consumption is optimised.
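As a minimal sketch of how this objective could be evaluated for a candidate schedule, the double sum over intervals and units can be written as a plain loop. The function `toy_predictor` is a hypothetical stand-in for the fitted prediction model (its constant and functional form are assumptions for illustration only), and the head and output values are invented.

```python
from typing import Callable, Sequence


def total_water_consumption(
    heads: Sequence[float],
    outputs: Sequence[Sequence[float]],
    predict_flow: Callable[[float, float], float],
) -> float:
    """Sum predicted water use over T intervals and N units.

    heads[t] is the water head H_t; outputs[t][n] is the output P_{n,t}.
    """
    total = 0.0
    for h_t, unit_outputs in zip(heads, outputs):
        for p_nt in unit_outputs:
            total += predict_flow(h_t, p_nt)
    return total


def toy_predictor(head: float, power: float) -> float:
    """Hypothetical placeholder: water use proportional to power / head."""
    return 0.0 if power == 0 else 120.0 * power / head


# Two intervals, two units (illustrative numbers only).
demo_total = total_water_consumption(
    heads=[400.0, 400.0],
    outputs=[[200.0, 0.0], [100.0, 100.0]],
    predict_flow=toy_predictor,
)
```

In the framework described here, the trained AOS-LSTM model would take the place of `toy_predictor`.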

2.2. Constraints

When optimising PSHP generation schedules, multiple operational constraints must be considered. These constraints ensure safe and stable system operation, meet grid load requirements, optimise operational costs, and improve economic performance. They are described below.

2.2.1. Load Balance Constraint

The sum of power outputs from all units must satisfy the grid load demand at any given time [25]:
$$\sum_{n=1}^{N} (1 - K_{n,t}) P_{n,t} - \sum_{n=1}^{N} K_{n,t}\, Pump_{n,t} - Load_t = 0$$
where $P_{n,t}$ is the power output of the $n$th PSHP unit at time $t$ (MW); $Pump_{n,t}$ is the pumping power of the $n$th PSHP unit at time $t$ (MW); $Load_t$ is the grid load demand at time $t$ (MW); and $K_{n,t}$ is a binary variable representing the operating status of the $n$th PSHP unit at time $t$, defined as follows:
$$K_{n,t} = \begin{cases} 0, & \text{for generating action} \\ 1, & \text{for pumping action} \end{cases}$$
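The load balance condition can be checked numerically with a short residual function. This is a sketch under an assumed sign convention: generation counts as positive, pumping as negative, and the load demand is negative during pumping periods. The unit values are illustrative only.

```python
from typing import Sequence


def load_balance_residual(
    outputs: Sequence[float],
    pumping_flags: Sequence[int],
    pump_powers: Sequence[float],
    load: float,
) -> float:
    """Return generation minus pumping minus load; zero means balanced.

    pumping_flags[n] plays the role of K_{n,t}: 0 = generating, 1 = pumping.
    Assumed convention: load is negative when the station is pumping.
    """
    generated = sum((1 - k) * p for k, p in zip(pumping_flags, outputs))
    pumped = sum(k * q for k, q in zip(pumping_flags, pump_powers))
    return generated - pumped - load


# Generating: two units at 250 MW against a 500 MW demand.
gen_residual = load_balance_residual(
    [250.0, 250.0, 0.0, 0.0], [0, 0, 0, 0], [0.0, 0.0, 0.0, 0.0], 500.0
)
# Pumping: two units at 300 MW against a -600 MW "demand".
pump_residual = load_balance_residual(
    [0.0, 0.0, 0.0, 0.0], [1, 1, 0, 0], [300.0, 300.0, 0.0, 0.0], -600.0
)
```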

2.2.2. Unit Output Constraints

The power output of each unit must remain within its permissible operating range:
$$P_n^{\min} \le P_{n,t} \le P_n^{\max}$$
$$Pump_{n,t} = \begin{cases} 0, & \text{for generating action} \\ P_n^{\max}, & \text{for pumping action} \end{cases}$$
where $P_n^{\min}$ and $P_n^{\max}$ are the minimum and maximum allowable power outputs for unit $n$ (MW).

2.2.3. Generating Flow Rate Constraints

The generating flow rate for each unit must be maintained within its allowable operational range:
$$Q_n^{\min} \le Q_{n,t} \le Q_n^{\max}$$
where $Q_n^{\min}$ and $Q_n^{\max}$ are the minimum and maximum allowable generating flow rates for unit $n$ (m³/s).

2.2.4. Unit Vibration Zone Constraints

Units should avoid operating within vibration zones to the greatest extent possible:
$$\left(P_{n,t} - \overline{P_{H,n,i}}\right)\left(P_{n,t} - \underline{P_{H,n,i}}\right) \ge 0$$
where $\overline{P_{H,n,i}}$ and $\underline{P_{H,n,i}}$ represent the upper and lower power output limits (MW), respectively, for the $i$th vibration zone of unit $n$ under water head $H$.
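A compact way to test this condition, assuming it is read as "the product of the deviations from the upper and lower zone limits is non-negative", is sketched below. The zone limits of 90 MW and 150 MW are hypothetical and not taken from the plant data.

```python
from typing import Iterable, Tuple


def satisfies_zone_constraint(power: float, lower: float, upper: float) -> bool:
    """True when the output lies outside the zone or exactly on a boundary."""
    return (power - upper) * (power - lower) >= 0


def in_any_vibration_zone(power: float, zones: Iterable[Tuple[float, float]]) -> bool:
    """True when the output falls strictly inside at least one (lower, upper) zone."""
    return any(not satisfies_zone_constraint(power, lo, hi) for lo, hi in zones)


# Hypothetical zone for one unit at one head: 90-150 MW.
example_zones = [(90.0, 150.0)]
```

An output of 120 MW would violate the constraint for this hypothetical zone, whereas 200 MW or exactly 90 MW would satisfy it.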

2.2.5. Vibration Zone Crossing Risk Constraint

To reduce risks associated with frequent transitions through vibration zones during unit operation, the frequency of units crossing vibration zones must be constrained. This constraint can be expressed as follows:
$$\eta_t \le \overline{\eta_t}$$
where $\eta_t$ represents the vibration zone crossing risk index for the units at time $t$, and $\overline{\eta_t}$ denotes the maximum allowable vibration zone crossing risk index at time $t$.
Given a determined load allocation plan, the specific calculation of the risk index is as follows:
For $t \ge 2$:
$$\eta_t = \sum_{n=1}^{N} \frac{1}{N}\, \Delta C_n^{\kappa}(t-1, t) \left[ 1 - P\left(A \le P_{n,t} \le B\right) \right]$$
where $\Delta C_n^{\kappa}(t-1, t)$ is the number of vibration zone crossings for unit $n$ within the time interval $(t-1, t)$; $P(A \le P_{n,t} \le B)$ is the probability that unit $n$ continuously operates within the recommended power range $[A, B]$ during the interval $(t-1, t)$; and $[A, B]$ is the recommended operational power output range (MW).
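The risk index can be illustrated with a small calculation. This sketch assumes one reading of the formula: each unit contributes its crossing count weighted by the probability of leaving the recommended range, averaged over the units. The crossing-counting rule and all numbers are assumptions for illustration.

```python
from typing import Sequence, Tuple


def count_crossings(
    prev_power: float, curr_power: float, zones: Sequence[Tuple[float, float]]
) -> int:
    """Count zones lying entirely between two consecutive outputs (assumed rule)."""
    low, high = sorted((prev_power, curr_power))
    return sum(1 for lower, upper in zones if low <= lower and high >= upper)


def crossing_risk_index(
    crossings: Sequence[int], prob_in_range: Sequence[float]
) -> float:
    """Average over units of crossings * (1 - P(output stays in [A, B]))."""
    n_units = len(crossings)
    return sum(c * (1.0 - p) for c, p in zip(crossings, prob_in_range)) / n_units


# Four units with illustrative crossing counts and in-range probabilities.
demo_eta = crossing_risk_index([1, 0, 2, 0], [0.5, 1.0, 0.75, 0.9])
```

The resulting value would then be compared with the maximum allowable index for that interval.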

3. Materials and Methods

3.1. Refined Fitting Strategy for Unit Flow Characteristic Curves Based on AOS-LSTM

To accurately calculate water consumption for the load allocation of pumped-storage hydropower (PSHP) units, this study proposed a refined fitting strategy using an Atomic Orbital Search (AOS)-optimised Long Short-Term Memory (LSTM) neural network, abbreviated as AOS-LSTM. The trained model enhanced computational precision and response speed when integrated into the load optimisation scheduling model.
Flow characteristic curves of pump-turbine units are essential for PSHP operation and design, describing relationships between power output and flow rate under various conditions. Traditional approaches for constructing these curves involve fitting discrete measurement data [26] or using specialised simulation software [27]. However, these methods have limitations such as dependency on measured data, limited fitting accuracy, and complex modelling processes. Recent advances in artificial intelligence have explored intelligent algorithms and deep learning [28] to enhance accuracy and generalisation performance.
In this work, the AOS algorithm was employed to optimise key hyperparameters of an LSTM network, including hidden-layer width, batch size, and iteration count. The AOS-LSTM approach significantly improved prediction accuracy and response efficiency, thereby providing reliable inputs for PSHP load-optimisation scheduling.

3.1.1. LSTM Neural Network Structure

Recurrent Neural Networks (RNNs) are widely used for processing sequential data but often suffer from gradient vanishing and limited long-term memory. Long Short-Term Memory (LSTM) networks use input, forget, and output gates to overcome these problems and have proven effective in power-system analysis and fault diagnosis [29,30]. In this study, the LSTM is used to fit the flow characteristic curves of hydropower units. The LSTM memory cell (Figure 1) comprises three gates.

3.1.2. Atomic Orbital Search (AOS) Algorithm

To tune the LSTM’s hyperparameters (hidden neurons, batch size, training epochs), we adopt AOS—a new population-based metaheuristic inspired by electron motion in quantum orbitals [31]. AOS offers robust global search with few parameters and has proved effective in feature selection [32], photovoltaic parameter estimation [33], and path planning.

Basic Principle

AOS models each candidate solution as an electron moving within hypothetical concentric shells around an atomic nucleus. Electron transitions follow quantum rules: absorbing limited energy moves the electron to a higher shell; otherwise, it drops to a lower shell and emits a photon. This process guides the exploration and exploitation dynamics in the search space.

Initialisation

The initial positions of solution candidates (electrons) are randomly assigned based on the following:
$$x_s^j(0) = x_{s,\min}^j + rand \cdot \left(x_{s,\max}^j - x_{s,\min}^j\right)$$
where $x_s^j(0)$ is the position of the $s$th candidate in the $j$th dimension, and $rand \in [0, 1]$ is a uniformly distributed random number. The candidate distribution is further shaped by a probability density function (PDF), which defines the likelihood of electrons appearing at different radial distances from the nucleus (see Figure 2) [31].
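The random initialisation can be sketched in a few lines. The search bounds below (hidden neurons, batch size, training epochs) are hypothetical, and the PDF-based shell assignment mentioned above is not modelled here.

```python
import random
from typing import List, Sequence


def init_candidates(
    n_candidates: int,
    lower: Sequence[float],
    upper: Sequence[float],
    rng: random.Random,
) -> List[List[float]]:
    """Draw each coordinate uniformly between its lower and upper bound."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_candidates)
    ]


# Hypothetical bounds: hidden neurons, batch size, training epochs.
lower_bounds = [8.0, 16.0, 50.0]
upper_bounds = [128.0, 128.0, 300.0]
population = init_candidates(10, lower_bounds, upper_bounds, random.Random(42))
```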

Binding State and Energy

To simulate atomic behaviour, the algorithm evaluates the average position and energy (fitness value) of candidates within each shell:
$$B_{state}^{z} = \frac{1}{k} \sum_{r=1}^{k} A_r^{z}, \qquad B_{energy}^{z} = \frac{1}{k} \sum_{r=1}^{k} f_r^{z}$$
The global binding state and energy of the population are given by the following:
$$B_{state} = \frac{1}{m} \sum_{r=1}^{m} A_r, \qquad B_{energy} = \frac{1}{m} \sum_{r=1}^{m} f_r$$
where $k$ and $m$ denote the number of candidates in a shell and the total population size, respectively.

Search and Update Mechanism

The search process is controlled by a photon rate parameter, $PR$, representing the probability of photon–electron interaction. For each candidate, a random number $\theta \in [0, 1]$ is generated:
If $\theta < PR$, the electron performs exploratory motion:
$$x_r^{z+1} = x_r^{z} + \mu_r$$
where $\mu_r$ is a random direction vector in $[0, 1]$.
If $\theta \ge PR$, the electron either emits or absorbs a photon depending on its fitness:
Photon emission ($f_r^{z} \ge B_{energy}^{z}$):
$$x_r^{z+1} = x_r^{z} + \eta_r \cdot \frac{\omega_r L_{energy}^{z} - \delta_r B_{state}^{z}}{z}$$
Photon absorption ($f_r^{z} < B_{energy}^{z}$):
$$x_r^{z+1} = x_r^{z} + \eta_r \left(\omega_r L_{energy}^{z} - \delta_r B_{state}^{z}\right)$$
Here, $L_{energy}^{z}$ is the position of the candidate with the lowest energy in the $z$th layer, and $\eta_r$, $\omega_r$, $\delta_r$ are randomly generated coefficients in $[0, 1]$.
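One possible reading of the update rules is sketched below. It assumes scalar random coefficients (the original may use vectors), a shell index starting at 1, and that the emission step is the absorption step divided by the shell index; these are interpretations rather than a confirmed implementation.

```python
import random
from typing import List, Sequence


def update_position(
    x: Sequence[float],
    fitness: float,
    shell_index: int,
    binding_state: Sequence[float],
    binding_energy: float,
    lowest_energy_pos: Sequence[float],
    photon_rate: float,
    rng: random.Random,
) -> List[float]:
    """Return the next position of one candidate (electron)."""
    theta = rng.random()
    if theta < photon_rate:
        # Exploratory motion: add a random vector with entries in [0, 1).
        return [xi + rng.random() for xi in x]
    # Assumed scalar coefficients eta, omega, delta in [0, 1).
    eta, omega, delta = rng.random(), rng.random(), rng.random()
    step = [
        eta * (omega * le - delta * bs)
        for le, bs in zip(lowest_energy_pos, binding_state)
    ]
    if fitness >= binding_energy:
        # Photon emission: step scaled down by the shell index (assumed reading).
        step = [s / shell_index for s in step]
    # Otherwise photon absorption: the step is used unscaled.
    return [xi + s for xi, s in zip(x, step)]
```

With `photon_rate=1.0` every call takes the exploratory branch, and with `photon_rate=0.0` every call takes the emission or absorption branch, which is convenient for checking each path separately.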

Advantages and Integration of AOS in LSTM Training

Compared with traditional optimisation algorithms such as Particle Swarm Optimisation (PSO), the AOS algorithm offers several notable advantages for hyperparameter optimisation:
  • Strong global search capability: the probabilistic modelling of electron motion enables effective exploration of the search space;
  • Ability to escape local optima: the energy-level transition mechanism allows the algorithm to overcome local minima;
  • Well-balanced exploration and exploitation: the multi-layer energy structure inherently maintains a balance between global exploration and local exploitation;
  • Low parameter dependency: AOS requires fewer control parameters, making it easy to integrate with deep learning models.
Therefore, this study integrates AOS into the LSTM training framework, forming the AOS-LSTM method to enhance the predictive accuracy and training stability in modelling the flow characteristics of pumped-storage hydropower units.

3.1.3. LSTM Neural Network Model Optimised by AOS

The structural hyperparameters of LSTM neural networks have a significant influence on their prediction accuracy. AOS tunes three hyperparameters—hidden neurons, batch size, and training epochs. Each hyperparameter set is viewed as a candidate solution and updated via the electron-orbital transition model [34]. In each iteration, the positions of the candidate solutions are adjusted to minimise the fitness function (prediction error), thereby obtaining the optimal hyperparameter combination. The dependent variables refer to the LSTM model’s prediction performance under a given set of hyperparameters. The primary indicators are the mean absolute error (MAE), the root mean square error (RMSE), and the convergence speed of training. These metrics constitute the fitness function used to guide the AOS search process. The detailed optimisation procedure of AOS-LSTM is described as follows:
  1. Initialisation: generate an initial set of candidate LSTM hyperparameter solutions randomly, analogous to electron positions within an atomic system.
  2. Orbital updating: reassign electrons to orbitals according to fitness (validation loss) and inter-electron spacing.
  3. Position updating: perturb each electron’s position (hyperparameters) by a stochastic step drawn from its current orbital.
  4. Energy (fitness) evaluation: re-compute fitness; retain the new position if loss decreases, otherwise revert or downgrade its energy level.
  5. Iterative optimisation: repeat steps 2 through 4 until the predefined maximum number of iterations is reached or a convergence criterion is satisfied.
The specific algorithm flowchart is illustrated in Figure 3.
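Two small pieces of the procedure above can be sketched directly: mapping a continuous candidate position to integer LSTM hyperparameters, and the "retain if loss decreases" rule from the fitness-evaluation step. The bounds and the rounding scheme are assumptions; the source does not state how positions are discretised.

```python
from typing import Dict, List, Sequence, Tuple


def decode_hyperparameters(
    position: Sequence[float], bounds: Sequence[Tuple[int, int]]
) -> Dict[str, int]:
    """Round and clip a continuous position to integer hyperparameters."""
    names = ("hidden_units", "batch_size", "epochs")
    return {
        name: int(min(max(round(value), lo), hi))
        for name, value, (lo, hi) in zip(names, position, bounds)
    }


def accept_if_better(
    old: List[float], old_loss: float, new: List[float], new_loss: float
) -> Tuple[List[float], float]:
    """Keep the new position only when its validation loss is lower."""
    return (new, new_loss) if new_loss < old_loss else (old, old_loss)


# Hypothetical search bounds for the three tuned hyperparameters.
search_bounds = [(8, 128), (16, 128), (50, 300)]
decoded = decode_hyperparameters([56.3, 95.8, 279.4], search_bounds)
```

In a full loop, each decoded set would be used to train an LSTM and the validation error would be fed back as the candidate's fitness.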

3.1.4. Model Evaluation Metrics

Model accuracy is quantified with the mean absolute error (MAE) and the root mean square error (RMSE) [35]; computational efficiency is gauged by training time. Smaller MAE/RMSE values imply higher predictive accuracy, whereas shorter training time indicates better computational efficiency. The error measures are defined as follows:
  • Mean absolute error (MAE):
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
  • Root mean square error (RMSE):
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$
where $n$ is the number of samples, and $\hat{y}_i$ and $y_i$ denote the predicted and actual values for the $i$th sample, respectively.
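Both error measures are straightforward to compute; a minimal sketch using only the standard library follows. The sample values are illustrative.

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predictions and observations."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Errors of 3 and 4 give MAE = 3.5 and RMSE = sqrt(12.5).
demo_mae = mae([3.0, 5.0], [0.0, 1.0])
demo_rmse = rmse([3.0, 5.0], [0.0, 1.0])
```

Because RMSE squares the errors before averaging, it weights large deviations more heavily than MAE, so it is never smaller than MAE on the same data.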

3.2. DDPG-Based Load Optimisation Scheduling Model for Pumped-Storage Units

3.2.1. DDPG Model

In reinforcement learning, a Markov decision process (MDP) describes how an agent interacts with its environment [36].
An MDP consists of five key components: state space $S$, action space $A$, state transition probabilities $P$, reward function $R$, and discount factor $\gamma$. At each time step $t$, the agent observes the current environment state $s_t$ and selects an action $a_t$. The environment then returns a reward $r_t$ and transitions to the next state $s_{t+1}$. Through continuous interaction, the agent gradually learns the optimal policy to maximise cumulative long-term rewards.
In this study, the load optimisation scheduling problem for pumped-storage units is formulated as a Markov decision process (MDP) and solved using the DDPG algorithm to handle continuous action spaces. The actor network outputs each unit’s power set-point at every step. Policy gradients, guided by the critic, continually refine these set-points.
The dependent variables include feedback from the environment, specifically the following:
  • Reward: a composite metric reflecting unit water consumption, load balance, and vibration zone avoidance;
  • System State: information such as the current water head, load demand, and previous outputs.
The reward and state feedback evaluate actions and drive policy updates via back-propagation.

Basic Principle

The action space is defined as the continuous output power range of pumped-storage units:
$$a_t = P_{n,t}, \quad a_t \in \left[0, P_n^{\max}\right]$$
where $P_{n,t}$ denotes the output power of unit $n$ at time $t$ (MW), and $P_n^{\max}$ represents the maximum permissible output power of unit $n$ (MW).

State Space Design

The state space should fully capture environmental information to enhance the decision-making efficiency and stability of the agent. At any given time step t, the state space is defined as follows:
$$s_t = \left\{ F_t,\ H_t,\ \overline{P_{H,n,i}},\ \underline{P_{H,n,i}},\ Q_n^{\min},\ Q_n^{\max} \right\}$$
where each parameter is consistent with the definitions provided earlier, representing grid load demand, water head, the upper and lower power bounds of vibration zones, and allowable minimum and maximum flow rates, respectively.

Reward Function Design

The design of the reward function directly influences the agent’s learning performance in load optimisation scheduling for pumped-storage units. In this study, operational constraints are transformed into penalty terms within the reward function to achieve the optimisation objectives:
Vibration Zone Penalty Term:
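$$\phi_{1,t} = \begin{cases} \varphi_1 \left| P_{n,t} - \dfrac{\overline{P_{H,n,i}} + \underline{P_{H,n,i}}}{2} \right|, & \underline{P_{H,n,i}} \le P_{n,t} \le \overline{P_{H,n,i}} \\ 0, & \text{else} \end{cases}$$
Vibration Zone Crossing Risk Penalty Term:
$$\phi_{2,t} = \begin{cases} \varphi_2\, e^{\eta_t}, & \eta_t > \overline{\eta_t} \\ \varphi_3, & \eta_t \le \overline{\eta_t} \end{cases}$$
where $\varphi_1$, $\varphi_2$, and $\varphi_3$ are positive penalty coefficients.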
Consequently, the reward function for the pumped-storage unit load optimisation scheduling model is defined as follows:
$$r(a_t \mid s_t) = \alpha \left( Q_{all,t}^{Max} - Q_{all,t} \right) - \phi_{1,t} - \phi_{2,t}$$
where $Q_{all,t}$ denotes the water consumption index at the current time step $t$, and $Q_{all,t}^{Max}$ refers to a theoretical baseline of maximum water consumption (the total water usage when all units operate at full load). Subtracting the actual water consumption from this baseline gives a water-saving index, $(Q_{all,t}^{Max} - Q_{all,t})$, which is incorporated into the reward function. To balance the magnitudes between the water-saving term and the penalty terms, the reward function was scaled by introducing a normalisation factor $\alpha = 0.01$. Maximising this reward function thus corresponds to minimising both water consumption and operational risk. As shown in Figure 4, the overall framework of the DDPG algorithm for pumped-storage unit load optimisation scheduling integrates the plant environment with the agent learning process.
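The structure of the reward, a scaled water-saving term minus the two penalty terms, can be sketched as follows. The penalties are passed in as already-computed numbers, and the consumption values are invented for illustration; only the scaling factor of 0.01 comes from the text.

```python
def step_reward(
    q_max: float,
    q_actual: float,
    vibration_penalty: float,
    crossing_penalty: float,
    alpha: float = 0.01,
) -> float:
    """Scaled water saving minus the vibration and crossing-risk penalties."""
    water_saving = q_max - q_actual
    return alpha * water_saving - vibration_penalty - crossing_penalty


# Illustrative step: 200 units of water saved, small penalties applied.
demo_reward = step_reward(1000.0, 800.0, vibration_penalty=0.5, crossing_penalty=0.25)
```

With equal penalties, a schedule that uses less water always earns a higher reward, which is the behaviour the agent is trained to exploit.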

4. Results and Discussion

To validate the applicability and effectiveness of the proposed model, a case study was conducted using real operational data from a pumped-storage hydropower station in China. The plant has a total installed capacity of 1200 MW, consisting of four Francis-type reversible pump-turbine units, each rated at 300 MW.

4.1. Fitting of Unit Flow Characteristic Curves

This study proposes a high-precision method for fitting pump-turbine flow curves using an AOS-optimised LSTM network. First, the collected NHQ data under all operating conditions were preprocessed as follows:
  • All data features were normalised to eliminate the influence of scale differences;
  • Missing values in the dataset were imputed using interpolation;
  • The time-series data were transformed into a sliding-window format, facilitating their input into the LSTM model for subsequent prediction.
These steps improved training efficiency and helped the model capture key patterns. After preprocessing, the data dimensions were (3135, 3).
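The three preprocessing steps can be sketched with standard-library Python. The specific choices here (linear interpolation, min-max scaling, and filling gaps before scaling so that the minimum and maximum are defined) are assumptions; the source does not state which variants were used.

```python
from typing import List, Optional, Sequence


def interpolate_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries linearly; edge gaps take the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def sliding_windows(rows: Sequence, window: int) -> List[list]:
    """Return every run of `window` consecutive rows."""
    return [list(rows[i:i + window]) for i in range(len(rows) - window + 1)]
```

For example, a series with one gap can be filled, scaled, and then cut into overlapping windows of two samples each before being passed to the network.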
The preprocessed data were fed into the established LSTM neural network model. This study then applies AOS to tune three key hyperparameters: hidden-layer size, batch size, and training epochs. Figure 5 compares the fitness-value convergence of AOS-LSTM and PSO-LSTM.
Figure 5 shows both algorithms’ fitness values drop with each epoch and then plateau, confirming convergence. AOS-LSTM stabilises by the seventh epoch, whereas PSO-LSTM needs about twenty, indicating faster convergence. These results highlight AOS’s stronger global search and local refinement. Moreover, in terms of final converged values, the fitness value of AOS-LSTM stabilised around 0.0245, slightly lower than that of PSO-LSTM. The corresponding optimal network parameters obtained by the AOS-LSTM were determined to be 56 hidden-layer neurons, a batch size of 96, and 279 training epochs.
To ensure robust evaluation under the inherent uncertainty of intelligent algorithms, ten independent experiments were conducted with identical initial conditions. For additional validation, the model was compared with Back Propagation Neural Network (BPNN), LSTM, and PSO-LSTM on MAE, RMSE, and computation time (Table 1). The results confirm the method’s superior performance.
The prediction results of the four models are illustrated in Figure 6.
Table 1 shows that the AOS-LSTM outperforms traditional BPNN and LSTM in accuracy while keeping computation time reasonable. Specifically, the MAE values of the proposed model decreased by 87.11% and 84.25%, and the minimum RMSE values decreased by 86.68% and 81.49%, respectively. Against PSO-LSTM, AOS-LSTM cuts runtime and improves MAE and minimum RMSE by 69% and 65%, respectively. Figure 6 confirms that AOS-LSTM tracks both overall trends and extremes. Furthermore, the deviation in results across ten independent experiments did not exceed 0.6%, confirming the stability of the proposed approach. Therefore, the AOS-LSTM method demonstrates excellent adaptability and substantial potential for future precise predictions.

4.2. Training Procedure of the DDPG Model

In deep reinforcement learning, the proper selection and tuning of parameters significantly influence algorithm performance and convergence speed, among which the learning rate is particularly critical. An excessive learning rate makes the reward oscillate and prevents stable convergence. Conversely, a learning rate that is too small can slow down the improvement of rewards, prolonging the training process. Therefore, the learning rate often requires careful, repeated tuning to obtain good results. After conducting extensive parameter optimisation experiments, the final parameter values employed in this study are summarised in Table 2.
In addition to network-related hyperparameters, the reward function incorporates several penalty terms governed by corresponding coefficients, including the vibration zone penalty ($\varphi_1$) and the vibration zone crossing penalties ($\varphi_2$ and $\varphi_3$). A sensitivity analysis assessed how these penalties affect training stability and performance. For efficiency, sensitivity runs were capped at 80,000 iterations, enough to reveal clear performance trends. Figure 7 plots cumulative rewards for three representative penalty sets.
A small penalty ($\varphi_1 = 10$, $\varphi_2 = 20$, $\varphi_3 = 5$) speeds early convergence but ends with a low cumulative reward, showing weak constraint enforcement and many infeasible schedules. A large penalty ($\varphi_1 = 500$, $\varphi_2 = 1000$, $\varphi_3 = 200$) slows convergence and produces volatile rewards; the agent over-penalises exploration, becoming conservative and less generalisable. A moderate penalty ($\varphi_1 = 100$, $\varphi_2 = 200$, $\varphi_3 = 50$) gives stable learning and the highest final reward, balancing water-saving and operational constraints.
Penalty choice is critical: coefficients that are too small weaken the constraints, while coefficients that are too large hamper learning. Therefore, the penalty coefficient combination ($\varphi_1 = 100$, $\varphi_2 = 200$, $\varphi_3 = 50$) is adopted for the final model configuration in the subsequent experiments.
Training used stochastic exploration of the action space. Random noise added to actions encouraged early exploration and avoided local optima. The model underwent 150,000 training iterations, and the results are illustrated in Figure 8. As depicted, the agent’s learning process can be clearly divided into several phases: an initial exploration phase (iterations 0 to approximately 30,000), during which the agent continuously adapted to and explored the environment; a subsequent learning and optimisation phase (approximately 30,000–100,000 iterations), in which the agent progressively improved its decision-making policy; and finally, after about 100,000 iterations, the cumulative reward began to stabilise, indicating that the agent had effectively learned to make optimal decisions in the stochastic environment, thus maximising cumulative rewards. However, slight fluctuations in the stabilised cumulative reward values were still observable due to the introduction of random noise during action selection, which enhanced the model’s capability to avoid local optima and thereby improved its generalisation and adaptability.
The model was developed using the Python programming language (version 3.10.13) and the PyTorch deep learning framework (version 2.3.1) and was implemented in the PyCharm (2024.1.4) development environment. All training and testing processes were executed using an NVIDIA (Santa Clara, CA, USA) GeForce RTX 4060 GPU.

4.3. Decision Analysis of the Proposed Model

A real-world case study on a Chinese PSHP station tests the proposed strategy. The selected daily scenario represents a typical summer weekday in the southern power grid, where the pumped-storage station operates in a “one pumping and two generating” mode. Specifically, electricity is generated during the morning and evening peak hours to meet high grid demand, and water is pumped during the night when electricity consumption is low.
Figure 9 plots the daily load across 96 fifteen-minute intervals. The time axis (0–96) spans the full 24 h cycle. Positive values denote generation; negative values denote pumping. Figure 10 illustrates the head variation under the selected representative daily scheduling scenario.
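As a reading aid for the 96-interval time axis and the sign convention, the helper below maps an interval index to its start time and a load value to an operating mode. The 0-based indexing and the function names are assumptions; the text only fixes the interval length and the meaning of the sign.

```python
def interval_to_clock(k, minutes_per_interval=15):
    """Return the start time 'HH:MM' of interval k (0-based, assumed)."""
    total = k * minutes_per_interval
    return f"{total // 60:02d}:{total % 60:02d}"


def operating_mode(load):
    """Positive load means generation, negative means pumping."""
    if load > 0:
        return "generating"
    if load < 0:
        return "pumping"
    return "idle"


last_start = interval_to_clock(95)  # final interval of the 24 h cycle
```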
To benchmark AOS-LSTM-DDPG, three methods are compared: Dynamic Programming (DP), Particle Swarm Optimisation (PSO), and standard DDPG. Figure 11 illustrates the unit power output distributions of the pumped-storage hydropower units under the daily load curve for three optimisation strategies (DP, PSO, and AOS-LSTM-DDPG), where Unit1# through Unit4# denote the four pumped-storage units. Table 3 compares economic performance and operational risk for each strategy. Results are discussed in terms of economy and operational security.
  • Economic Performance Evaluation
As shown in Figure 11, both AOS-LSTM-DDPG and DP achieve stepwise load allocation across four generating units. During the morning and evening peaks, all units run steadily at 70–100% of rated output, boosting overall efficiency. In contrast, the output of PSO appears disordered, exhibiting significant imbalance; some units are either overloaded or underloaded, resulting in energy efficiency losses.
Quantitatively (Table 3), the water consumption of AOS-LSTM-DDPG is 1.983 × 10⁷ m³, which is approximately 0.85%, 1.78%, and 2.36% lower than that of DDPG, PSO, and DP, respectively, indicating superior water-saving performance. Moreover, the inference time for both AOS-LSTM-DDPG and standard DDPG is within 1 s, satisfying the requirements for fast-response scheduling. By comparison, DP and PSO take 206.59 s and 10.82 s, respectively; DP’s discrete search struggles with real-time, continuous inputs.
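The quoted savings can be checked directly from the water-consumption column of Table 3. The short sketch below reproduces the PSO and DP figures; the helper name `relative_saving` is illustrative, and only values that are unambiguous in the table are used.

```python
def relative_saving(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Water consumption in units of 10^7 m^3, taken from Table 3.
water = {"DP": 2.031, "PSO": 2.019, "AOS-LSTM-DDPG": 1.983}

savings = {
    name: round(relative_saving(value, water["AOS-LSTM-DDPG"]), 2)
    for name, value in water.items()
    if name != "AOS-LSTM-DDPG"
}
# savings -> {"DP": 2.36, "PSO": 1.78}
```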
  • Operational Safety and Risk Assessment
From an operational risk perspective, the AOS-LSTM-DDPG method clearly outperforms the other approaches. Table 3 reports only two vibration-zone operations and two crossings for AOS-LSTM-DDPG. In contrast, DP resulted in 29 such operations and 14 crossings, which are approximately 1350% and 600% higher, respectively. PSO yielded 22 operations and 18 crossings, about 1000% and 800% higher. Even the standard DDPG, which lacks flow-efficiency guidance, experienced six and five occurrences, respectively, around 200% and 150% higher. These results show that flow-efficiency guidance markedly improves constraint awareness and risk avoidance.
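The same event counts can be expressed either as "percent higher than the proposed method" or as "percent reduction achieved by the proposed method". The sketch below computes both views from the Table 3 counts; the function names are illustrative.

```python
# (in-zone operations, zone crossings) from Table 3.
events = {
    "DP": (29, 14),
    "PSO": (22, 18),
    "DDPG": (6, 5),
    "AOS-LSTM-DDPG": (2, 2),
}


def pct_higher(other, reference):
    """How much larger `other` is, relative to `reference`."""
    return 100.0 * (other - reference) / reference


def pct_reduction(other, reference):
    """How much smaller `reference` is, relative to `other`."""
    return 100.0 * (other - reference) / other


ref_ops, ref_cross = events["AOS-LSTM-DDPG"]
higher = {m: (pct_higher(o, ref_ops), pct_higher(c, ref_cross))
          for m, (o, c) in events.items() if m != "AOS-LSTM-DDPG"}
reduction = {m: (round(pct_reduction(o, ref_ops), 1),
                 round(pct_reduction(c, ref_cross), 1))
             for m, (o, c) in events.items() if m != "AOS-LSTM-DDPG"}
```

The two views describe the same data: a count that is 1350% higher than the reference corresponds to a reduction of about 93.1% when measured against the larger count.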
As shown in Figure 11a, AOS-LSTM-DDPG ensures that all units operate steadily within the high-efficiency range, avoiding inefficient low-load conditions. It also minimises the number of startups, thereby reducing equipment wear. In contrast, the PSO strategy illustrated in Figure 11c exhibits frequent switching and load fluctuations, causing abrupt changes in operational states, significantly increasing vibration risk, and reducing scheduling stability.
In summary, the proposed AOS-LSTM-DDPG framework offers superior economic efficiency and operational robustness. By combining flow-curve-informed state representation with deep reinforcement learning, the model effectively balances optimal scheduling with constraint satisfaction. It not only achieves lower water consumption and faster decision speed but also greatly mitigates vibration-zone risk, affirming its practical value for the intelligent dispatch of pumped-storage hydropower units.

5. Conclusions

This study proposes AOS-LSTM-DDPG, which combines high-precision flow-curve fitting with a constraint-aware DRL scheduler for pumped-storage units. The method accounts for real-world operational constraints, including system load, vibration zone avoidance, unit startup/shutdown duration, and operational state transitions. The main conclusions are as follows:
  • High-accuracy flow-curve fitting: The proposed AOS-optimised LSTM model accurately captures the flow-efficiency characteristics of PSH units. Compared with traditional fitting methods, it reduces the fitting error by at least 65.35%, providing physical guidance and enhancing the agent’s constraint-awareness during dispatch.
  • Efficient DRL-based scheduling: With the inclusion of flow-feature guidance, the AOS-LSTM-DDPG model demonstrates stable convergence over the 150,000 training iterations and supports real-time inference within 1 s. Under a representative daily load scenario, it achieves the lowest water consumption (1.983 × 10⁷ m³), outperforming standard DDPG (−0.85%), PSO (−1.78%), and DP (−2.36%) in economic efficiency.
  • Significant improvement in operational safety: The proposed method records only two vibration-zone operations and two crossings, representing reductions of about 93.1%/85.7% relative to DP (29/14 events) and 90.9%/88.9% relative to PSO (22/18 events). This highlights its superior capability in constraint compliance and operational stability.
In terms of limitations, the tests covered a single representative day; future work will extend the method to multi-day and seasonal cases using multi-timescale scheduling frameworks. In addition, fixed penalty weights may limit adaptability, so dynamic weighting and multi-objective DRL frameworks will be explored.
In conclusion, the proposed AOS-LSTM-DDPG method demonstrates significant advantages in economic performance, safety, and decision-making efficiency, making it a promising tool for intelligent and real-time scheduling in complex PSHP systems.

Author Contributions

Software, X.M. and C.H.; writing—original draft, X.M. and C.H.; writing—review and editing, H.P., Y.Z. and X.W.; data curation, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the State Key Program of National Natural Science Foundation of China (52339006); the Jiangsu Innovation Support Programme for International Science and Technology Cooperation (Grant No. BZ2023047); and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Project No. KYCX24_0829).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Chenyang Hang was employed by the company NARI Group Corporation (State Grid Electric Power Research Institute). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Zhang, F.; Wang, X.; Liu, G. Allocation of carbon emission quotas based on global equality perspective. Environ. Sci. Pollut. Res. 2022, 29, 53553–53568. [Google Scholar] [CrossRef]
  2. Guo, X.; Huang, K.; Li, L.; Wang, X. Renewable energy for balancing carbon emissions and reducing carbon transfer under global value chains: A way forward. Sustainability 2022, 15, 234. [Google Scholar] [CrossRef]
  3. Han, X.; Ding, L.; Chen, G.; Liu, J.; Lin, J. Key technologies and research prospects for cascaded hydro-photovoltaic-pumped storage hybrid power generation system. Trans. China Electrotech. Soc. 2020, 35, 2711–2722. [Google Scholar]
  4. Azarova, E.; Jun, H. Investigating determinants of international clean energy investments in emerging markets. Sustainability 2021, 13, 11843. [Google Scholar] [CrossRef]
  5. Chai, R.; Li, G. Renewable clean energy and clean utilization of traditional energy: An evolutionary game model of energy structure transformation of power enterprises. Syst. Eng. Theory Pract. 2022, 42, 184–197. [Google Scholar]
  6. Han, M.; Chang, X.; Li, J.; Yang, G.; Shang, T. Application and development of pumped storage technology. Sci. Technol. Rev. 2016, 34, 57–67. [Google Scholar]
  7. Xu, R.; Zhang, J.; Liu, M.; Cao, C.; Chao, X. Life cycle cost of electrochemical energy storage and pumped storage. Adv. Technol. Electr. Eng. Energy 2021, 40, 10–18. [Google Scholar]
  8. Zhao, Z.; Jin, C.C.X.; Liu, L.; Yan, L. A MILP model for hydro unit commitment with irregular vibration zones based on the constrained Delaunay triangulation method. Int. J. Electr. Power Energy Syst. 2020, 123, 106241. [Google Scholar] [CrossRef]
  9. Vieira, D.A.G.; Costa, E.E.; Campos, P.H.F.; Mendonça, M.O.; Silva, G.R.L. A real-time nonlinear method for a single hydropower plant unit commitment based on analytical results of dual decomposition optimization. Renew. Energy 2022, 192, 513–525. [Google Scholar] [CrossRef]
  10. Cheng, X.; Feng, S.; Zheng, H.; Wang, J.; Liu, S. A hierarchical model in short-term hydro scheduling with unit commitment and head-dependency. Energy 2022, 251, 123908. [Google Scholar] [CrossRef]
  11. Shi, C.; Wei, T.; Tang, X.; Zhou, L.; Zhang, T. Charging–discharging control strategy for a flywheel array energy storage system based on the equal incremental principle. Energies 2019, 12, 2844. [Google Scholar] [CrossRef]
  12. Liao, S.; Liu, J.; Liu, B.; Cheng, C.; Zhou, L.; Wu, H. Multicore parallel dynamic programming algorithm for short-term hydro-unit load dispatching of huge hydropower stations serving multiple power grids. Water Resour. Manag. 2020, 34, 359–376. [Google Scholar] [CrossRef]
  13. Li, J.; Moe Saw, M.M.; Chen, S.; Yu, H. Short-term optimal operation of Baluchaung II hydropower plant in Myanmar. Water 2020, 12, 504. [Google Scholar] [CrossRef]
  14. Wang, W.; Wang, P.; Dong, Y. Modified dynamic programming algorithm and its application in distribution of power plant load. In E3S Web of Conferences, Proceedings of the 2019 International Conference on Building Energy Conservation, Thermal Safety and Environmental Pollution Control (ICBTE 2019), Hefei, China, 1–3 November 2019; EDP Sciences: Les Ulis, France, 2019; Volume 136, p. 01005. [Google Scholar]
  15. Hashim, F.A.; Houssein, E.H.; Hussain, K.; Mabrouk, M.S.; Al-Atabany, W. Honey badger algorithm: New metaheuristic algorithm for solving optimization problems, Math. Comput. Simul. 2022, 192, 84–110. [Google Scholar] [CrossRef]
  16. MiarNaeimi, F.; Azizyan, G.; Rashki, M. Horse herd optimization algorithm: A nature-inspired algorithm for high-dimensional optimization problems. Knowl. Based Syst. 2021, 213, 106711. [Google Scholar] [CrossRef]
  17. Zhang, X.; Wang, Z.; Lu, Z. Multi-objective load dispatch for microgrid with electric vehicles using modified gravitational search and particle swarm optimization algorithm. Appl. Energy 2022, 306, 118018. [Google Scholar] [CrossRef]
  18. Shang, Y.; Fan, Q.; Shang, L.; Sun, Z.; Xiao, G. Modified genetic algorithm with simulated annealing applied to optimal load dispatch of the Three Gorges Hydropower Plant in China. Hydrol. Sci. J. 2019, 64, 1129–1139. [Google Scholar] [CrossRef]
  19. Wang, X.; Yang, K.; Zhou, X. Two-stage glowworm swarm optimization for economical operation of hydropower station. IET Renew. Power Gener. 2018, 12, 992–1003. [Google Scholar] [CrossRef]
  20. Ming, B.; Liu, P.; Guo, S.; Cheng, L.; Zhou, Y.; Gao, S.; He, L. Robust hydroelectric unit commitment considering integration of large-scale photovoltaic power: A case study in China. Appl. Energy 2018, 228, 1341–1352. [Google Scholar] [CrossRef]
  21. Jiang, W.; Liu, Y.; Fang, G.; Ding, Z. Research on short-term optimal scheduling of hydro-wind-solar multi-energy power system based on deep reinforcement learning. J. Clean. Prod. 2023, 385, 135704. [Google Scholar] [CrossRef]
  22. Zhou, Y.; Huang, Y.; Mao, X.; Kang, Z.; Huang, X.; Xuan, D. Research on energy management strategy of fuel cell hybrid power via an improved TD3 deep reinforcement learning. Energy 2024, 293, 130564. [Google Scholar] [CrossRef]
  23. Liang, T.; Sun, B.; Tan, J.; Cao, X.; Sun, H. Scheduling scheme of wind-solar complementary renewable energy hydrogen production system based on deep reinforcement learning. High Volt. Eng. 2023, 49, 2264–2275. [Google Scholar]
  24. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  25. Ha, P.T.; Tran, D.T.; Nguyen, T.T. Electricity Generation Cost Reduction for Hydrothermal Systems with the Presence of Pumped Storage Hydroelectric Plants. Neural Comput. Appl. 2022, 34, 9931–9953. [Google Scholar] [CrossRef]
  26. Novara, D.; McNabola, A. A model for the extrapolation of the characteristic curves of pumps as turbines from a datum best efficiency point. Energy Convers. Manag. 2018, 174, 1–7. [Google Scholar] [CrossRef]
  27. Cavazzini, G.; Zanetti, G.; Santolin, A.; Ardizzon, G. Characterization of the hydrodynamic instabilities in a pump-turbine operating at part load in turbine mode. In IOP Conference Series: Earth and Environmental Science, Proceedings of the 31st IAHR Symposium on Hydraulic Machinery and Systems, Trondheim, Norway, 26 June 2022–1 July 2022; IOP Publishing: New York, NY, USA, 2022; Volume 1079, p. 012033. [Google Scholar]
  28. Pan, H.; Hang, C.; Feng, F.; Zheng, Y.; Li, F. Improved neural network algorithm based flow characteristic curve fitting for hydraulic turbines. Sustainability 2022, 14, 10757. [Google Scholar] [CrossRef]
  29. Fang, Z.; Yang, Z.; Peng, H.; Chen, G. Prediction of Ultra-Short-Term power system based on LSTM-Random forest combination model. In Journal of Physics: Conference Series, Proceedings of the 2nd International Conference on Electronics, Electrical and Information Engineering, Changsha, China, 11–14 August 2022; IOP Publishing: New York, NY, USA, 2022; Volume 2387, p. 012033. [Google Scholar]
  30. Zhou, Y.; Kumar, A.; Gandhi, C.P.; Vashishtha, G.; Tang, H.; Kundu, P.; Xiang, J. Discrete entropy-based health indicator and LSTM for the forecasting of bearing health. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 120. [Google Scholar] [CrossRef]
  31. Azizi, M. Atomic orbital search: A novel meta-heuristic algorithm. Appl. Math. Model. 2021, 93, 657–683. [Google Scholar] [CrossRef]
  32. Abd Elaziz, M.; Ouadfel, S.; Abd El-Latif, A.A.; Ali lbrahim, R. Feature selection based on modified bio-inspired atomic orbital search using arithmetic optimization and opposite-based learning. Cogn. Comput. 2022, 14, 2274–2295. [Google Scholar] [CrossRef]
  33. Ali, F.; Sarwar, A.; Bakhsh, F.I.; Ahmad, S.; Shah, A.A.; Ahmed, H. Parameter extraction of photovoltaic models using atomic orbital search algorithm on a decent basis for novel accurate RMSE calculation. Energy Convers. Manag. 2023, 277, 116613. [Google Scholar] [CrossRef]
  34. Ha, P.T.; Tran, D.T.; Phan, T.M.; Nguyen, T.T. Maximization of Total Profit for Hybrid Hydro-Thermal-Wind-Solar Power Systems Considering Pumped Storage, Cascaded Systems, and Renewable Energy Uncertainty in a Real Zone, Vietnam. Sustainability 2024, 16, 6581. [Google Scholar] [CrossRef]
  35. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 2022, 1–10. [Google Scholar] [CrossRef]
  36. Papadimitriou, C.H.; Tsitsiklis, J.N. The complexity of Markov decision processes. Math. Oper. Res. 1987, 12, 441–450. [Google Scholar] [CrossRef]
Figure 1. Structure of an LSTM memory cell.
Figure 2. Schematic diagram of PDF determining the distribution of the candidate solutions.
Figure 3. Flowchart of the AOS-LSTM algorithm.
Figure 4. The framework of the DDPG algorithm for pumped-storage load optimisation scheduling.
Figure 5. The fitness-value convergence curve.
Figure 6. Comparison of prediction results among four models.
Figure 7. Cumulative reward curves under different penalty coefficient combinations.
Figure 8. Cumulative reward evolution during agent training.
Figure 9. Daily load curve in “one pumping and two generating” modes.
Figure 10. Head variation curve during the scheduling period.
Figure 11. Comparison of unit power-dispatch results obtained with three optimisation methods.
Table 1. Comparison of prediction performance among four models.
| Model | MAE | Minimum RMSE | Average RMSE | Computation Time (s) |
|---|---|---|---|---|
| BPNN | 1.017 | 1.187 | 1.954 | 12.859 |
| LSTM | 0.832 | 0.854 | 1.246 | 13.032 |
| PSO-LSTM | 0.422 | 0.456 | 1.086 | 68.452 |
| AOS-LSTM | 0.131 | 0.158 | 0.625 | 51.365 |
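The error-reduction figure quoted for AOS-LSTM can be related to Table 1 with a short calculation. The sketch below uses the MAE and minimum RMSE columns; reading the 65.35% figure as the minimum-RMSE reduction relative to PSO-LSTM is an inference from the numbers, not something the text states explicitly.

```python
# (MAE, minimum RMSE) for each model, copied from Table 1.
errors = {
    "BPNN": (1.017, 1.187),
    "LSTM": (0.832, 0.854),
    "PSO-LSTM": (0.422, 0.456),
    "AOS-LSTM": (0.131, 0.158),
}


def error_reduction(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


ref_mae, ref_rmse = errors["AOS-LSTM"]
reductions = {
    name: (round(error_reduction(mae, ref_mae), 2),
           round(error_reduction(rmse, ref_rmse), 2))
    for name, (mae, rmse) in errors.items()
    if name != "AOS-LSTM"
}
smallest = min(min(pair) for pair in reductions.values())
```

On these two metrics the smallest reduction is the minimum RMSE against PSO-LSTM, which matches the 65.35% lower bound; the average RMSE column is not included in this check.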
Table 2. Specific parameters of the DDPG model.
| Parameter | Critic-Network | Actor-Network |
|---|---|---|
| Learning rate | 0.00004 | 0.00003 |
| Soft update coefficient | 0.01 | 0.01 |
| Number of network layers | 3 | 3 |
| Neurons per layer | 64 | 64 |
| Hidden-layer activation | ReLU | ReLU |
| Output-layer activation | / | Tanh |
| Training episodes | 1500 | 1500 |
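Table 2 lists a soft update coefficient of 0.01 for both networks. In standard DDPG this coefficient (often written τ) controls how quickly target-network parameters track the online network. The sketch below shows that rule on plain Python lists standing in for parameter tensors; it illustrates the standard update, not the authors' code, and `soft_update` is a hypothetical name.

```python
def soft_update(target, online, tau=0.01):
    """In place: target <- tau * online + (1 - tau) * target."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * o + (1.0 - tau) * t
    return target


# Toy parameters: the target starts at 0 and the online network sits at 1.
target_params = [0.0, 0.0]
online_params = [1.0, 1.0]
soft_update(target_params, online_params)       # one step moves 1% of the gap
for _ in range(499):
    soft_update(target_params, online_params)   # many steps close most of it
```

A small τ such as 0.01 makes the target network change slowly, which is the usual way DDPG stabilises the critic's bootstrapped targets.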
Table 3. Comparison of calculation indicators for four models.
| Model | Model Training Time (h) | Decision Time (s) | Water Consumption (×10⁷ m³) | In-Zone Operations | Zone Crossings |
|---|---|---|---|---|---|
| DP | - | 206.59 | 2.031 | 29 | 14 |
| PSO | - | 10.82 | 2.019 | 22 | 18 |
| DDPG | 2.8 | 0.7 | 2 | 6 | 5 |
| AOS-LSTM-DDPG | 2.9 | 0.74 | 1.983 | 2 | 2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Ma, X.; Pan, H.; Zheng, Y.; Hang, C.; Wu, X.; Li, L. Short-Term Optimal Scheduling of Pumped-Storage Units via DDPG with AOS-LSTM Flow-Curve Fitting. Water 2025, 17, 1842. https://doi.org/10.3390/w17131842