Article

Multi-Source Energy Storage Day-Ahead and Intra-Day Scheduling Based on Deep Reinforcement Learning with Attention Mechanism

1 State Grid Shandong Electric Power Research Institute, Jinan 250003, China
2 School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
3 School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(18), 10031; https://doi.org/10.3390/app151810031
Submission received: 9 July 2025 / Revised: 4 September 2025 / Accepted: 8 September 2025 / Published: 14 September 2025
(This article belongs to the Special Issue Control and Security of Industrial Cyber–Physical Systems)

Abstract

With the rapid integration of high-penetration renewable energy, its inherent uncertainty complicates power system day-ahead/intra-day scheduling, leading to challenges like wind curtailment and high operational costs. Existing methods either rely on inflexible physical models or use deep reinforcement learning (DRL) without prioritizing critical variables or synergizing multi-source energy storage and demand response (DR). This study develops a multi-time scale coordination scheduling framework to balance cost minimization and renewable energy utilization, with strong adaptability to real-time uncertainties. The framework integrates a day-ahead optimization model and an intra-day rolling model powered by an attention-enhanced DRL Actor–Critic network—where the attention mechanism dynamically focuses on critical variables to correct real-time deviations. Validated on an East China regional grid, the framework significantly enhances renewable energy absorption and system flexibility, providing a robust technical solution for the economical and stable operation of high-renewable power systems.

1. Introduction

Against the backdrop of global energy transition and the urgent need to address environmental challenges, accelerating the integration of renewable energy has become a critical strategy for sustainable development. However, the inherent uncertainty of renewable energy sources such as wind power introduces significant complexities to power system scheduling, posing challenges to the stable and economic operation of the grid [1]. Effectively accommodating fluctuations in renewable energy output while coordinating multiple resources—including conventional generators, energy storage systems, and demand response programs—to achieve optimal interaction remains a key issue in modern power system operation.
To tackle the aforementioned challenges, artificial intelligence (AI) methods—especially deep reinforcement learning (DRL)—have emerged as promising alternatives to traditional model-based approaches, thanks to their data-driven adaptability in handling dynamic uncertainties without over-reliance on precise physical models [2]. Recent explorations of AI in real-time optimal power flow [3] not only expand its application scope in power systems but also provide technical references for AI-driven scheduling optimization.
Research on optimal power system scheduling has received extensive attention. For microgrid scenarios involving multi-energy resources, reference [4] proposed a day-ahead economic optimal scheduling method considering time-of-use electricity prices. At the building scale, reference [5] developed a day-ahead optimization model for smart homes with photovoltaic–thermal (PVT) systems, integrating battery/boiler storage and load scheduling to balance cost and comfort. Extending to integrated energy systems, reference [6] proposed a data-driven distributed robust self-scheduling method for integrated energy production units participating in day-ahead electricity–gas joint markets. Reference [7] constructed a battery charging–discharging model and derived an economic scheduling strategy for systems with energy storage using dynamic programming. Reference [8] characterized the uncertainty of new energy generation based on fuzzy sets, adopted a distributed robust optimization method to handle chance constraints, and managed the energy of islanded microgrids. Reference [9] addressed the uncertainty in renewable energy and load forecasting by building a two-stage robust optimization model with uncertainty adjustment parameters, which was solved using the column-and-constraint generation algorithm. Reference [10] considered the uncertainty of wind power in the uncertainty set, iteratively solved the adaptive robust optimization model using the column constraint generation algorithm, and ensured the feasibility of the scheduling strategy through a two-level robustness solution. However, most of these studies are confined to day-ahead scheduling and fail to consider real-time fluctuations or prediction errors; more critically, they rely heavily on precise physical models—computational efficiency declines sharply as system scale expands (e.g., multi-source energy storage integration), making them unsuitable for dynamic intra-day scenarios.
To better adapt to the dynamic characteristics of renewable energy and load fluctuations across different time horizons, research has increasingly focused on multi-time scale scheduling frameworks. Reference [11] proposed a two-stage scheduling model for integrated energy systems (IESs) based on distributionally robust adaptive MPC, improving day-ahead–intra-day coordination via dual-loop feedback. Reference [12] considered the participation of distributed renewable energy and distributed energy storage in microgrid scheduling, establishing an intra-day optimal scheduling model for active distribution networks based on improved deep reinforcement learning. Reference [13] analyzed the power and energy balance mechanism of multi-interactive distribution networks with source-grid–load-storage coordination, and constructed a source-grid–load-storage coordinated optimal scheduling model. Reference [14] proposed a four-stage scheduling model (day-ahead, intra-day, ultra-short-term, and ultra-ultra-short-term) optimized via Gurobi, incorporating buffer boundaries for global coordination. Reference [15] developed a multi-time scale rolling scheduling model based on “robust plans” and “robust operation zones” using robust optimization. While these models improve temporal coordination, they still depend on predefined physical constraints and neglect demand-side flexibility, limiting their ability to handle high-renewable penetration.
With the advancement of artificial intelligence, DRL has emerged as a promising solution for uncertain environments [16,17], leveraging its adaptability and flexibility in dynamic scenarios [18,19]. Reference [20] adopted the deep Q-learning algorithm for optimal scheduling of energy management, and the simulation results showed that the use of reinforcement learning algorithms can improve energy utilization and scheduling performance. Reference [21] employed a two-layer RL model, where the upper layer considers the energy state of the entire scheduling cycle to generate charging and discharging strategies for energy storage, and the lower layer only considers the current operating cost, using mixed-integer linear programming to solve the current scheduling actions and operating costs. Reference [22] considered energy storage systems and realized robust economic scheduling of virtual power plants using the deep deterministic policy gradient algorithm based on scenario datasets generated by generative adversarial networks. Reference [23] evaluated seven DRL algorithms for energy management, highlighting their potential in complex scheduling scenarios.
Beyond electrochemical and pumped hydro storage, reference [24] integrated tidal range power stations into day-ahead scheduling, demonstrating flexible coordination of renewable energy with unique storage characteristics. Reference [25] further explored gravity energy storage (GES) in day-ahead scheduling, using distributionally robust optimization to handle price uncertainty. However, the above studies have two critical limitations: (1) they rarely consider the synergistic coordination of multi-source energy storage (e.g., pumped hydro storage + electrochemical storage) and dual-type demand response (price-based + incentive-based); (2) they lack mechanisms to prioritize critical state variables (e.g., wind power forecasts, energy storage state of charge), leading to suboptimal decision-making in multi-variable, multi-time scale scenarios.
The limitations of existing methods directly affect practical operation of high-renewable power systems: traditional day-ahead models cannot adjust to sudden wind surges; over-reliance on conventional generator ramping increases fuel/penalty costs; and poor multi-source coordination raises grid instability risks. For grids like East China’s (plagued by renewable integration challenges), these issues hinder carbon reduction and grid stability—urgently requiring a more adaptive framework.
To address the above gaps, this study develops a multi-time scale coordination scheduling framework integrated with attention-enhanced DRL, balancing cost minimization and renewable utilization while improving real-time adaptability. The proposed framework combines two core stages: Day-ahead scheduling (hourly resolution), which optimizes conventional generators, pumped hydro storage, electrochemical energy storage, and price-based demand response (PDR) to minimize total operational costs; and intra-day rolling scheduling (15 min resolution, 4 h horizon), which adjusts wind power, battery operation, and incentive-based demand response (IDR) in real time. Notably, the intra-day stage uses an attention-based Actor–Critic DRL algorithm—dynamically weighting critical variables (e.g., wind forecast errors, storage SOC) to correct deviations, overcoming the lack of targeted decision-making in existing DRL methods.
The rest of the paper is organized as follows. The multi-time scale coordination scheduling model is proposed in Section 2. Section 3 presents the reinforcement learning-based optimization framework. Section 4 verifies the effectiveness and advantages of the proposed method with simulation results. Finally, the conclusion of this paper is presented in Section 5.

2. Multi-Time Scale Coordination Scheduling Model

Figure 1 systematically outlines the two core stages of the framework—day-ahead scheduling and intra-day rolling scheduling—including their respective input, core optimization models, and key decision outputs, while also reflecting the integration of the attention-enhanced DRL mechanism in the intra-day dynamic adjustment process.

2.1. Day-Ahead Scheduling Optimization Model

The developed day-ahead optimization framework synergistically combines wind power forecasts with load demand predictions to achieve dual objectives of cost minimization and renewable energy utilization maximization. By coordinating the dispatch of conventional generators, pumped hydro storage, electrochemical energy storage systems, and demand response (DR), the model achieves the lowest total system operating cost while enhancing renewable energy utilization.

2.1.1. Objective Function

The proposed day-ahead scheduling model addresses two critical challenges in modern power systems: maximizing renewable energy integration and maintaining grid reliability under contingency conditions. Specifically, the optimization framework simultaneously minimizes total system operating costs while accounting for penalty costs associated with wind curtailment and emergency load shedding. By incorporating these penalty mechanisms, the model achieves a threefold optimization objective: economic efficiency through optimal resource dispatch, enhanced utilization of wind power generation by minimizing curtailment, and guaranteed power supply reliability during emergency operations through controlled load shedding strategies. This comprehensive approach effectively balances cost-effectiveness with operational resilience in renewable-rich power systems.
$$\min f_{DA} = \sum_{t=1}^{T_1} \left( C_{G,t} + C_{erss,t} + C_{wind,t} + C_{load,t} \right)$$

$$\begin{aligned}
C_{G,t} &= \sum_{i=1}^{N_G} \left( a_i P_{G,i,t}^2 + b_i P_{G,i,t} + c_i \right) \\
C_{erss,t} &= \sum_{i=1}^{N_{water}} \left[ f_{c,water}(P_{water,i,t}) + f_{m,water}(P_{water,i,t}) \right] + \sum_{i=1}^{N_{battery}} \left[ f_{c,battery}(P_{battery,i,t}) + f_{m,battery}(P_{battery,i,t}) \right] \\
C_{wind,t} &= \sum_{i=1}^{N_{wind}} \left[ f_{c}(P_{wind,i,t}) + k_{c,wind} \left( P_{wind,i,t}^{pre} - P_{wind,i,t} \right) \right] \\
C_{load,t} &= k_{PDR} \, \Delta |P_{PDR,t}| + k_{c,load} \, P_{loss,t}
\end{aligned}$$
where $f_{DA}$ is the objective function of the day-ahead scheduling optimization model, representing the system operation cost; $T_1$ is the number of time intervals in a day (24 h, hourly resolution); $C_{G,t}$, $C_{erss,t}$, $C_{wind,t}$, and $C_{load,t}$ are the cost functions of conventional units, energy storage power stations (ERSS, including pumped hydro storage and electrochemical energy storage), wind turbines, and user loads, respectively; $N_G$ is the number of conventional units; $P_{G,i,t}$ is the power generation of the $i$-th conventional unit at time $t$; $a_i$, $b_i$, and $c_i$ are the quadratic, linear, and constant cost coefficients of the $i$-th conventional unit, respectively; $N_{water}$ and $N_{battery}$ are the numbers of pumped hydro storage stations and electrochemical energy storage stations, respectively; $P_{water,i,t}$ and $P_{battery,i,t}$ are the outputs of pumped hydro storage station $i$ and electrochemical energy storage station $i$ at time $t$, respectively; $f_{c,water}(\cdot)$ and $f_{c,battery}(\cdot)$ are the operating cost functions of the pumped hydro and electrochemical storage stations, respectively; $f_{m,water}(\cdot)$ and $f_{m,battery}(\cdot)$ are their maintenance cost functions; $N_{wind}$ is the number of distributed new energy units; $P_{wind,i,t}$ is the output of the $i$-th wind turbine at time $t$; $f_c(P_{wind,i,t})$ is the generation cost function of the wind turbine at time $t$; $k_{c,wind}$ is the wind curtailment penalty cost coefficient; $P_{wind,i,t}^{pre}$ is the predicted wind power output at time $t$; $k_{PDR}$ is the cost coefficient of PDR; $\Delta|P_{PDR,t}|$ is the call volume of PDR at time $t$; $k_{c,load}$ is the load loss penalty coefficient; and $P_{loss,t}$ is the lost load power at time $t$.
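To make the cost structure concrete, the per-interval objective above can be evaluated numerically. The sketch below is illustrative only: the function name, argument layout, and the lumping of the ERSS operating/maintenance terms into a single `storage_cost` argument are assumptions, not the paper's implementation.

```python
import numpy as np

def day_ahead_cost(P_G, a, b, c, P_wind, P_wind_pre, k_c_wind,
                   dP_PDR, k_PDR, P_loss, k_c_load, storage_cost=0.0):
    """Evaluate one interval of the day-ahead objective f_DA.

    C_G: quadratic fuel cost of conventional units.
    C_wind: curtailment penalty k_c_wind * (forecast - dispatched).
    C_load: PDR call cost plus lost-load penalty.
    The ERSS operating/maintenance cost is passed in as a lump sum.
    """
    C_G = float(np.sum(a * P_G**2 + b * P_G + c))
    C_wind = k_c_wind * float(np.sum(P_wind_pre - P_wind))
    C_load = k_PDR * abs(dP_PDR) + k_c_load * P_loss
    return C_G + storage_cost + C_wind + C_load
```

For example, a 100 MW unit with cost coefficients (0.01, 1, 0), 10 MW of curtailed wind at penalty 2, a 5 MW PDR call at unit cost 1, and 1 MW of lost load at penalty 10 yields a total of 235.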

2.1.2. Constraints

The day-ahead scheduling model incorporates comprehensive operational constraints to ensure physical feasibility and system reliability. These constraints govern the balanced operation of conventional generation units, renewable energy sources, energy storage systems, and demand response resources while maintaining network security.
The power balance constraint ensures generation–load equilibrium at each time interval, accounting for both fixed and price-responsive demand components:
$$L_{R,t} + \Delta P_{PDR,t} + \Delta P_{IDR,t} = \sum_{i \in N_G} P_{G,i,t} + \sum_{i \in N_{wind}} P_{wind,i,t} + \sum_{i \in N_{water}} P_{water,i,t} + \sum_{i \in N_{battery}} P_{battery,i,t} + P_{loss,t}$$
where $L_{R,t}$ is the part of the load that does not change with the electricity price; $\Delta P_{PDR,t}$ is the change in PDR at time $t$; and $\Delta P_{IDR,t}$ is the change in IDR at time $t$.
Conventional generation units are subject to technical operating limits. The output of each thermal unit must remain within its designated capacity range:
$$P_{G,i}^{min} \le P_{G,i,t} \le P_{G,i}^{max}$$
where $P_{G,i}^{min}$ and $P_{G,i}^{max}$ are the lower and upper limits of the output of the $i$-th conventional unit, respectively.
Additionally, ramping capabilities constrain inter-temporal output variations to ensure feasible dispatch transitions:
$$-R_i^{dn} \le P_{G,i,t} - P_{G,i,t-1} \le R_i^{up}$$
where $R_i^{up}$ and $R_i^{dn}$ represent the maximum allowable upward and downward power adjustments per time interval.
Renewable energy integration follows forecast availability constraints, requiring wind power output to not exceed predicted generation potential:
$$0 \le P_{wind,i,t} \le P_{wind,i,t}^{pre}$$
That is, the dispatched output of each renewable unit must not exceed its predicted value.
Energy storage systems, including both pumped hydro and electrochemical storage, operate under distinct technical constraints. Pumped hydro storage must maintain reservoir water volume within operational limits while respecting pumping/discharge rate constraints:
$$\begin{aligned}
P_{water,t}^{min} &\le P_{water,t} \le P_{water,t}^{max} \\
V_{Pump}^{min} &\le V_{water} \le V_{Pump}^{max} \\
\left| P_{water,t} - P_{water,t-1} \right| &\le \Delta P_R
\end{aligned}$$
where $P_{water,t}^{min}$ and $P_{water,t}^{max}$ are the lower and upper limits of the power input/output of the pumped storage power station, respectively; $V_{Pump}^{min}$ and $V_{Pump}^{max}$ are the lower and upper limits of its stored water volume; and $\Delta P_R$ is the ramp rate of the pumped storage power station.
Electrochemical storage systems are constrained by power converter ratings and state-of-charge (SOC) boundaries:
$$-P_{elec,t}^{cha} \le P_{elec,t} \le P_{elec,t}^{dis}, \qquad S_{SOC}^{min} \le S_{SOC,t} \le S_{SOC}^{max}$$
where $P_{elec,t}^{cha}$ and $P_{elec,t}^{dis}$ are the rated charging power and rated discharging power of the inverter, respectively; $S_{SOC,t}$ is the state of charge of the energy storage power station; and $S_{SOC}^{max}$ and $S_{SOC}^{min}$ are the upper and lower limits of the state of charge.
Transmission network security is enforced through line flow limits derived from DC power flow assumptions:
$$-P_{ij}^{max} \le B_{ij} \left( \theta_{i,t} - \theta_{j,t} \right) \le P_{ij}^{max}$$
where $P_{ij}^{max}$ is the maximum transmission power of the line between nodes $i$ and $j$; $B_{ij}$ is the susceptance of that line; and $\theta_{i,t}$ is the phase angle of node $i$ at time $t$.
Flexibility requirements ensure adequate adjustment capabilities from both conventional and storage resources:
$$\left| P_{G,i,t} - P_{G,i,t,base} \right| \le \psi_i, \qquad \left| P_{erss,t} - P_{erss,t,base} \right| \le \psi_{erss}$$
where $\psi_i$ and $\psi_{erss}$ are the flexible adjustment capabilities of conventional units and ERSS, respectively, and $P_{G,i,t,base}$ and $P_{erss,t,base}$ are the baseline scheduled outputs of conventional units and ERSS, respectively.
Demand response programs operate within predefined participation limits. Price-based DR follows continuous adjustment bounds:
$$P_{PDR}^{min} \le P_{PDR,t} \le P_{PDR}^{max}$$
where $P_{PDR}^{min}$ and $P_{PDR}^{max}$ are the lower and upper limits of the PDR load call volume, respectively.
The day-ahead scheduling solution provides key operational parameters for subsequent intra-day coordination, including committed conventional unit outputs, pumped storage schedules, and activated demand response quantities. These predetermined values serve as boundary conditions for finer-time scale operational decisions while maintaining overall system optimization objectives.
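A quick way to sanity-check a candidate day-ahead dispatch against the generator capacity and ramping constraints above is a vectorized feasibility test. This is a sketch under the assumption that all quantities are NumPy arrays of per-unit outputs; the function name is illustrative and it is not part of the paper's solver.

```python
import numpy as np

def generator_dispatch_feasible(P_G, P_prev, P_min, P_max, R_up, R_dn):
    """Check P_min <= P_G <= P_max and -R_dn <= P_G - P_prev <= R_up elementwise."""
    delta = P_G - P_prev
    within_caps = np.all((P_min <= P_G) & (P_G <= P_max))
    within_ramp = np.all((-R_dn <= delta) & (delta <= R_up))
    return bool(within_caps and within_ramp)
```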

2.2. Intra-Day Rolling Scheduling Optimization Model

The intra-day rolling optimization framework establishes a dynamic decision-making process to address real-time operational challenges in renewable-integrated power systems. Building upon the day-ahead scheduling results, this approach combines ultra-short-term forecasting with adaptive optimization to maintain system balance while accommodating renewable generation variability. The model solves for optimal control sequences using 15 min resolution forecast data for wind power and load demand across a rolling 4 h horizon.
The objective function maintains consistency with the day-ahead optimization while focusing on adjustable operational costs:
$$\min f_{ID} = \sum_{t=1}^{T_2} \left( C_{G,t} + C_{erss,t} + C_{wind,t} + C_{load,t} \right)$$
where $T_2$ represents the number of time intervals in the 4 h rolling horizon (15 min resolution, 16 intervals) and the load cost component now exclusively considers IDR activation:
$$C_{load,t} = k_{IDR} \, \Delta |P_{IDR,t}| + k_{c,load} \, P_{loss,t}$$
where $k_{IDR}$ is the cost coefficient of IDR and $\Delta |P_{IDR,t}|$ is the call volume of IDR at time $t$. IDR refers to a demand-side management mechanism in which users receive direct financial incentives (e.g., subsidies or rebates) to adjust their electricity consumption in response to grid operational needs, such as peak shaving or renewable energy absorption.
The constraint set largely mirrors the day-ahead model with additional provisions for IDR flexibility. IDR maintains separate capacity limits for load increase and decrease:
$$0 \le P_{IDR,t}^{+} \le P_{IDR}^{+,max}, \qquad 0 \le P_{IDR,t}^{-} \le P_{IDR}^{-,max}$$
where $P_{IDR,t}^{+}$ and $P_{IDR,t}^{-}$ are the increased load volume and decreased load volume of IDR, respectively.
The optimization process takes the day-ahead scheduled states as fixed parameters, including conventional unit commitments, pumped storage operations, and PDR allocations. Within this framework, the model determines optimal wind farm dispatch schedules, battery storage operations, and IDR activation levels to maintain system balance.
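The receding-horizon logic described above (optimize over the next 4 h, commit only the first 15 min decision, then roll forward) can be sketched generically. Here `solve_step` stands in for the DRL policy or any window optimizer; both the name and the interface are assumptions for illustration.

```python
def rolling_schedule(forecasts, solve_step, horizon=16):
    """Receding-horizon loop: at each step, plan over the next `horizon`
    intervals (16 x 15 min = 4 h) and commit only the first action."""
    applied = []
    for t in range(len(forecasts) - horizon + 1):
        window = forecasts[t:t + horizon]   # latest 4 h forecast window
        plan = solve_step(window)           # full plan for the window
        applied.append(plan[0])             # commit only the first interval
    return applied
```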

3. Reinforcement Learning-Based Optimization Framework

This study proposes a novel hierarchical optimization framework for power system dispatch that combines conventional optimization with reinforcement learning. The framework operates across two distinct time scales to address both planning and real-time operational challenges. In the day-ahead stage, a deterministic optimization problem is solved using hourly forecast data to establish baseline schedules for thermal units, pumped hydro storage, and PDR resources. These day-ahead commitments then serve as fixed parameters for the subsequent intra-day optimization phase.
The intra-day scheduling operates on a rolling 4 h horizon with 15 min resolution, employing reinforcement learning to adjust for forecast errors and real-time system conditions. This stage focuses on optimizing three key adjustable parameters: wind power output adjustments, battery storage charge/discharge commands, and IDR activations. The reinforcement learning agent interacts with a simulated power system environment, receiving state observations and reward signals to continuously improve its dispatch policy.

3.1. State Space and Action Set

The state space covers six categories of variables: system load, wind power, the current output of conventional units, the current output of pumped storage units, the state of charge of electrochemical energy storage at the previous moment, and the reducible load. The system state at the current time $t$ is as follows:
$$s_t = \left[ L_{R,t}^{forecast},\, P_{water,t-1,i},\, P_{water,t,i},\, P_{wind,t-1,i},\, P_{wind,t,i}^{forecast},\, P_{G,t-1,i},\, P_{G,t,i},\, P_{battery,t-1,i},\, S_{soc,t-1,i},\, P_{PDR,t-1},\, P_{PDR,t},\, P_{IDR,t-1} \right]$$
where $L_{R,t}^{forecast}$ is the load forecast value at the current time; $P_{water,t-1,i}$ and $P_{water,t,i}$ are the power of the $i$-th pumped storage unit at the previous moment and the current time, respectively; $P_{wind,t-1,i}$ and $P_{wind,t,i}^{forecast}$ are the actual wind power output at the previous moment and its forecast value at the current moment, respectively; $P_{G,t-1,i}$ and $P_{G,t,i}$ are the power of the $i$-th conventional unit at the previous moment and the current time, respectively; $P_{battery,t-1,i}$ is the power of the $i$-th electrochemical energy storage at the previous moment; $S_{soc,t-1,i}$ is its actual state of charge at the previous moment; $P_{PDR,t-1}$ and $P_{PDR,t}$ are the PDR power at the previous moment and the current time, respectively; and $P_{IDR,t-1}$ is the IDR call volume at the previous moment.
The action space includes three dimensions: wind turbine output adjustment, charge/discharge power of electrochemical energy storage, and reducible load reduction. The action space at the current time t is as follows:
$$a_t = \left[ P_{wind,t,i},\, P_{battery,t,i},\, P_{IDR,t} \right]$$
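Assembling the flat observation vector and projecting a raw action onto its physical bounds can be sketched as follows; the helper names and the flat-vector layout are illustrative assumptions consistent with the state and action definitions above.

```python
import numpy as np

def build_state(*components):
    """Concatenate the state components in the order of the s_t definition
    (load forecast, pumped-storage powers, wind powers, unit powers,
    battery power, SOC, PDR and IDR quantities) into one flat vector."""
    return np.concatenate([np.atleast_1d(np.asarray(c, dtype=float))
                           for c in components])

def clip_action(a, a_min, a_max):
    """Project a raw action [P_wind, P_battery, P_IDR] onto its box bounds."""
    return np.clip(a, a_min, a_max)
```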

3.2. Reward Function

The reward function evaluates actions during Actor network updates. During training, the agent applies its action to the power system; the system simulates operation under the resulting scheduling strategy and feeds back the reward and the next state. The reward thus reflects the quality of the agent's actions and guides the direction of policy updates. It captures two aspects: reducing the system operating cost and penalizing operation beyond system limits. The penalty function is as follows:
$$F_{all,t} = -\left( \alpha_5 F_{unb,t} + \alpha_6 F_{wind,t} + \alpha_7 F_{elec,t} + \alpha_8 F_{soc,t} + \alpha_9 F_{line,t} + \alpha_{10} F_{adjust,t} + \alpha_{11} F_{IDR,t} \right)$$

$$R_t = \begin{cases} F_{all,t}, & F_{all,t} < 0 \\ \alpha^{+} - \alpha_1 C_{G,t} - \alpha_2 C_{erss,t} - \alpha_3 C_{wind,t} - \alpha_4 C_{load,t}, & F_{all,t} = 0 \end{cases}$$
where $F_{unb,t}$ is the power imbalance penalty quantifying deviation from generation–load equilibrium; $F_{wind,t}$ is the wind curtailment penalty for underutilized renewable generation; $F_{elec,t}$ is the battery storage violation penalty for exceeding power ratings; $F_{soc,t}$ is the SOC boundary violation penalty for energy storage; $F_{line,t}$ is the transmission line overload penalty for exceeding thermal limits; $F_{adjust,t}$ is the regulation capacity violation penalty for exceeding ramping capabilities; and $F_{IDR,t}$ is the IDR deviation penalty. The coefficients $\alpha_1$ through $\alpha_{11}$ normalize the dimensional units of all reward and penalty terms and set the proportional relationships between the different components. The parameter $\alpha^{+} = 100$ ensures that the reward under non-violation conditions is always greater than that under violation conditions.
$$F_{unb,t} = \left( L_{R,t} + \Delta P_{PDR,t} + \Delta P_{IDR,t} - \sum_{i} P_{G,i,t} - \sum_{i} P_{wind,i,t} - \sum_{i} P_{water,i,t} - \sum_{i} P_{battery,i,t} \right)^2$$
$$F_{wind,t} = \sum_{i=1}^{N_{wind}} \max \left( 0,\, P_{wind,i,t}^{pre} - P_{wind,i,t} \right)$$
$$F_{elec,t} = \sum_{i=1}^{N_{battery}} \left[ \max \left( 0,\, P_{battery,i,t} - P_{elec}^{dis} \right) + \max \left( 0,\, -P_{elec}^{cha} - P_{battery,i,t} \right) \right]$$
$$F_{soc,t} = \sum_{i=1}^{N_{battery}} \left[ \max \left( 0,\, S_{SOC,i,t} - S_{SOC}^{max} \right) + \max \left( 0,\, S_{SOC}^{min} - S_{SOC,i,t} \right) \right]$$
$$F_{line,t} = \sum_{ij \in N_l} \max \left( 0,\, \frac{\left| B_{ij} \left( \theta_{i,t} - \theta_{j,t} \right) \right|}{P_{ij}^{max}} - 1 \right)$$
$$F_{adjust,t} = \sum_{i=1}^{N_G} \max \left( 0,\, \left| P_{G,i,t} - P_{G,i,t,base} \right| - \psi_i \right) + \sum_{j=1}^{N_{erss}} \max \left( 0,\, \left| P_{erss,j,t} - P_{erss,j,t,base} \right| - \psi_{erss} \right)$$
$$F_{IDR,t} = \max \left( 0,\, P_{IDR,t}^{+} - P_{IDR}^{+,max} \right) + \max \left( 0,\, P_{IDR,t}^{-} - P_{IDR}^{-,max} \right)$$
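Each penalty term above is a sum of rectified constraint violations. As one representative example, the SOC boundary penalty $F_{soc,t}$ can be computed as below; the function name and array interface are illustrative assumptions.

```python
import numpy as np

def soc_penalty(soc, soc_min, soc_max):
    """F_soc: total upper/lower SOC boundary violation over all batteries,
    sum of max(0, SOC - SOC_max) + max(0, SOC_min - SOC)."""
    soc = np.asarray(soc, dtype=float)
    return float(np.sum(np.maximum(0.0, soc - soc_max) +
                        np.maximum(0.0, soc_min - soc)))
```

A battery at 95% SOC against a 90% ceiling contributes 0.05, and one at 5% against a 10% floor contributes another 0.05; batteries inside the band contribute nothing.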

3.3. Attention-Based Actor Network

The Actor–Critic framework is a powerful reinforcement learning approach that combines the strengths of policy-based and value-based methods [26]. In this paper, we adopt the Actor–Critic framework to solve the intra-day rolling scheduling problem, where the Actor network is responsible for generating control actions, and the Critic network evaluates the quality of these actions, as shown in Figure 2.
Actor–Critic methods learn a value function $Q(S,a)$ that approximates the expected discounted return:
$$Q(S,a) \approx \mathbb{E} \left[ \sum_{t'=t}^{T} \gamma^{\,t'-t} R_{t'}(S,a) \right]$$
The Critic network is updated to minimize the mean squared error between the predicted Q-value and the target Q-value:
$$L(\psi) = \mathbb{E}_{(s,a,r,s') \sim D} \left[ \left( Q_{\psi}(s,a) - y_i \right)^2 \right]$$
where the target value $y_i$ is computed using the target networks:
$$y_i = r + \gamma \, Q_{\bar{\psi}} \left( s', \pi_{\bar{\theta}}(s') \right)$$
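The Critic update pair above (target computation and MSE loss) can be sketched as follows. Here `q_next` stands for the target-network value $Q_{\bar{\psi}}(s', \pi_{\bar{\theta}}(s'))$ already evaluated elsewhere, and the optional terminal-state mask is an added convenience not stated in the text; $\gamma = 0.95$ matches the discount factor reported in Section 4.

```python
import numpy as np

def critic_targets(r, q_next, gamma=0.95, done=None):
    """y = r + gamma * Q_target(s', pi_target(s')).
    `done` optionally zeroes the bootstrap term at terminal states."""
    r = np.asarray(r, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    mask = 1.0 - np.asarray(done, dtype=float) if done is not None else 1.0
    return r + gamma * mask * q_next

def critic_loss(q_pred, y):
    """Mean squared error L(psi) between predicted and target Q-values."""
    diff = np.asarray(q_pred, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(diff**2))
```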
To enhance the Actor network’s ability to focus on critical state variables, such as wind power output, energy storage state, and reducible load, this paper introduces a multi-head attention mechanism [27] into the Actor network. The attention mechanism adaptively weights these key inputs based on their relevance to the current decision-making process. The proposed architecture enhances the Actor network’s state processing capability through an attention mechanism. This design focuses computational resources on critical power system variables while maintaining efficient operation. The complete state representation combines temporal system measurements and forecast data in Equation (15). The attention mechanism operates on carefully selected critical variables that most significantly impact dispatch decisions:
$$s_t^{critical} = \left[ P_{wind,t-1,i},\, P_{wind,t,i}^{forecast},\, P_{battery,t-1,i},\, S_{soc,t-1,i},\, P_{IDR,t-1} \right]$$
while the remaining variables form the context set:
$$s_t^{context} = \left[ L_{R,t}^{forecast},\, P_{water,t-1,i},\, P_{water,t,i},\, P_{G,t-1,i},\, P_{G,t,i},\, P_{PDR,t-1},\, P_{PDR,t} \right]$$
The attention transformation follows the standard scaled dot-product formulation:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax} \left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V$$
with the query, key, and value matrices derived through learned linear projections:
$$Q = W_q s_t^{critical}, \qquad K = W_k s_t^{critical}, \qquad V = W_v s_t^{critical}$$
where $W_q \in \mathbb{R}^{d \times d_k}$, $W_k \in \mathbb{R}^{d \times d_k}$, and $W_v \in \mathbb{R}^{d \times d_v}$ are trainable weight matrices. The network architecture then combines the processed representations:
$$a_t = \pi_{\theta}(s_t) = \tanh \left( W_2 \cdot \mathrm{ReLU} \left( W_1 \cdot \left[ s_t^{attn};\, s_t^{context} \right] + b_1 \right) + b_2 \right)$$
The complete system automatically learns to allocate attention weights to variables like wind power forecast errors and storage SOC levels that prove most relevant for optimal dispatch decisions. The Tanh activation ensures generated actions remain within normalized bounds for subsequent scaling to actual control signals. Through end-to-end training, the attention mechanism develops specialized sensitivity to critical operational constraints without requiring manual feature engineering.
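The two building blocks above, scaled dot-product attention and the ReLU/Tanh fusion layer, can be sketched in NumPy as follows. This is a simplified single-head sketch with untrained placeholder weights; the matrix shapes and function names are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # rows sum to 1
    return weights @ V

def actor_forward(s_attn, s_context, W1, b1, W2, b2):
    """a_t = tanh(W2 @ ReLU(W1 @ [s_attn; s_context] + b1) + b2).
    The tanh keeps actions inside normalized bounds (-1, 1)."""
    x = np.concatenate([s_attn, s_context])
    h = np.maximum(0.0, W1 @ x + b1)
    return np.tanh(W2 @ h + b2)
```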
The Actor network is updated using the policy gradient:
$$\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim D} \left[ \nabla_{\theta} \log \pi_{\theta}(a|s) \cdot A^{\pi}(s,a) \right]$$
where the advantage function A π ( s , a ) is approximated as:
$$A^{\pi}(s,a) = \alpha \log \pi_{\theta}(a|s) - Q_{\psi}(s,a) + b(s, a_i)$$
where $b(s, a_i)$ is a baseline function that reduces variance in the policy gradient estimate, computed as:
$$b(s, a_i) = \mathbb{E}_{a \sim \pi(s)} \left[ Q_{\psi} \left( s, (a, a_i) \right) \right]$$

4. Case Study

Tackling the critical issue of constrained renewable energy absorption, this study selects a regional power grid in East China—plagued by significant new energy integration hurdles—as a real-world testbed to validate its proposed scheduling framework. The grid’s infrastructure includes six traditional thermal units stationed at nodes 1, 2, 5, 8, 11, and 13, with detailed parameters documented in Table 1. Node 2 hosts a 400 MW wind farm alongside a 50 MW/200 MW·h electrochemical storage system, while node 8 accommodates a 100 MW/400 MW·h pumped storage facility [28].
Operational assumptions are set as follows: PDR adjustments are capped at 10% of total load, and IDR deployment does not exceed 5% of total load. For streamlined calculations, fixed compensation cost coefficients for IDR are employed, as specified in Table 2. Load and wind power dynamics are visualized in Figure 3, which captures load fluctuations alongside wind power variations under both scenarios. To ensure the reproducibility of the DRL implementation, the training and computational setup is as follows. The learning rate is set to 0.001 for both the Actor and Critic networks to balance convergence speed and stability. The Adam optimizer is adopted, with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.9$ to adaptively adjust the learning rate for different parameters. The discount factor $\gamma = 0.95$ emphasizes short-term operational rewards while retaining consideration for long-term system stability. Training terminates when the average reward of 20 consecutive episodes stabilizes within a 2% fluctuation range, ensuring the learned policy is robust and convergent. Experiments are conducted on a server equipped with an Intel Core i5-11400F CPU and 16 GB RAM, with the software environment built on Python 3.8 and PyTorch 1.10, providing sufficient computational resources for DRL model training and inference. The DRL model undergoes offline training only once, using historical operational data that covers typical seasonal, daily, and load fluctuation scenarios.
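The stopping rule above ("average reward of 20 consecutive episodes stabilizes within a 2% fluctuation range") can be read as the following check. Interpreting "fluctuation range" as every recent reward lying within ±2% of the window mean is an assumption; the paper does not spell out the exact formula.

```python
import numpy as np

def has_converged(episode_rewards, window=20, tol=0.02):
    """True when every reward in the last `window` episodes lies within
    tol (2%) of the window mean, i.e., training has stabilized."""
    if len(episode_rewards) < window:
        return False
    recent = np.asarray(episode_rewards[-window:], dtype=float)
    mean = recent.mean()
    return bool(np.all(np.abs(recent - mean) <= tol * abs(mean)))
```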
Scheduling schemes for both scenarios are detailed in Figure 4 and Figure 5, where each curve represents the cumulative output of preceding components plus the contribution of the current unit or DR resource. Notably, the red curve (denoting load fluctuations) maintains balance with the combined outputs of conventional units, storage systems, and wind power across all periods. Meanwhile, Figure 6 and Figure 7 illustrate the operational plans for pumped storage, electrochemical storage, and demand response resources across the two scenarios. A closer look at DR resource behavior reveals distinct diurnal roles: daytime usage focuses on peak shaving and damping wind power volatility, while nighttime operations shift to valley filling. This division aligns with the contrasting patterns observed in wind power and load interactions. In reverse peak shaving scenarios, wind power generation surges during early morning (2:00–6:00) and evening (16:00–21:00)—periods when non-DR loads are low. Here, active DR resource activation and storage charging effectively boost wind power absorption. Conversely, during peak shaving, wind power output aligns more closely with load curves, peaking at midday (10:00–14:00) and afternoon (16:00–19:00). The high non-DR load during these windows reduces the need for DR resources compared to reverse peak shaving at the same times.
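The balance that the stacked curves in Figures 4 and 5 maintain in every period, load equal to conventional generation plus wind plus net storage discharge plus DR load reduction, can be verified with a minimal sketch. All profile numbers below are hypothetical and not taken from the case study.

```python
def power_balance_ok(load, gen, wind, storage_discharge, dr_reduction, tol=1e-6):
    """Check the per-period balance: load equals conventional generation
    plus wind plus net storage discharge plus DR load reduction.
    Positive storage_discharge means discharging; negative means charging."""
    return all(
        abs(l - (g + w + s + d)) <= tol
        for l, g, w, s, d in zip(load, gen, wind, storage_discharge, dr_reduction)
    )

# Hypothetical 4-period profiles in MW (illustrative only):
load = [300, 280, 350, 320]
gen = [200, 150, 250, 240]
wind = [120, 150, 80, 60]
storage = [-30, -40, 10, 10]   # charging during wind surplus, discharging at peak
dr = [10, 20, 10, 10]
print(power_balance_ok(load, gen, wind, storage, dr))  # True
```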
From the perspective of practical applicability, the proposed framework aligns well with the operational characteristics of the East China regional grid—where wind power is concentrated in coastal areas and load peak–valley differences are high. As shown in Figure 5 and Figure 7, the framework activates IDR and charges electrochemical storage to absorb surplus wind power. Regarding computational scalability, the attention mechanism’s ability to prioritize critical variables reduces redundant calculations.
The framework’s design allows straightforward extension to solar PV integration. PV’s uncertainty can be incorporated by adding “PV forecast error” to the attention mechanism’s critical variable set (Equation (28) in Section 3.3). For dynamic demand response, replacing the fixed DR participation rate with a “user response elasticity model” would further enhance performance.
To explore the practical effectiveness of dual energy storage systems in improving wind power utilization, reducing curtailment rates, and saving costs, this study designed comparative experiments targeting two typical scenarios: wind power peak shaving and reverse peak shaving. By constructing three differentiated operational schemes, the research systematically analyzed the impact mechanisms of energy storage integration forms and scheduling time scales on overall performance: The first scheme is the “no-storage benchmark mode”, which relies entirely on day-ahead scheduling plans without incorporating any energy storage equipment or implementing dynamic adjustments across multiple time scales such as intra-day, serving as a basic reference. The second scheme is the “single pumped storage mode”, which, while retaining the day-ahead scheduling framework, only introduces pumped storage stations for regulation, focusing on examining the operational boundaries of a single energy storage type. The third scheme, the “hybrid energy storage multi-scale mode”, represents the core strategy proposed in this study. It combines pumped storage with electrochemical energy storage and embeds a day-ahead-intra-day coordinated multi-time scale scheduling architecture to achieve complementary advantages of different energy storage characteristics. The DRL architecture, specifically the attention-based Actor–Critic network, is applied to all three schemes.
The experimental results (as shown in Table 3) clearly demonstrate the performance differences among the three schemes. The “no-storage benchmark mode” has obvious shortcomings in wind power absorption, especially in reverse peak shaving scenarios. During the high wind power periods, the limited call volume of demand response resources fails to match the large-scale wind power output, leading to severe curtailment and insufficient system flexibility to meet fluctuations. The “single pumped storage mode”, although an improvement over the benchmark scheme, is constrained by the inherent slow adjustment speed of pumped storage. It struggles to quickly respond to instantaneous fluctuations in wind power and load during reverse peak shaving, resulting in nearly identical curtailment rates in peak shaving and reverse peak shaving scenarios, with limited optimization effects.
In contrast, the “hybrid energy storage multi-scale mode” exhibits significant advantages: the rapid response capability of electrochemical energy storage compensates for the lag in pumped storage adjustment, while the large capacity of pumped storage supports energy balance over long time scales, forming efficient synergy. Combined with the refined regulation of demand response resources, this mode not only drastically reduces curtailment rates in both scenarios but also achieves a moderate reduction in system operation costs by minimizing unnecessary power redundancy and adjustment losses. To isolate the impact of the attention mechanism, a head-to-head comparison with the “DRL without attention” model is performed. The attention-enhanced model reduces wind curtailment and lowers operational costs. This improvement is attributed to the attention mechanism’s ability to weight wind power forecast errors and energy storage SOC more heavily than other variables, enabling proactive correction of prediction deviations that would otherwise lead to curtailment or load shedding.
To evaluate the robustness of the framework under parameter fluctuations, sensitivity tests are conducted on key variables affecting scheduling outcomes. Focusing on IDR compensation costs, a critical parameter in demand response programs, we simulate adjustments within a ±10% range (consistent with practical policy and subsidy variation ranges of 8–12%), modifying the baseline IDR cost of 150 CNY/(MW·h) to 135 CNY/(MW·h) and 165 CNY/(MW·h). The results indicate that even with 10% cost variations, the total operational cost changes by only 0.8–1.2%. This stability arises from the framework’s “baseline + dynamic supplement” DR coordination: when IDR costs increase, the system automatically reduces IDR activation and increases battery discharge to maintain balance, avoiding over-reliance on expensive demand response. Conversely, lower IDR costs trigger increased IDR utilization, reducing energy storage cycling losses. This adaptive adjustment mechanism eliminates the need for precise parameter calibration, enhancing practical applicability.
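The “baseline + dynamic supplement” substitution described above can be illustrated as a simplified merit-order split between IDR and battery discharge. Only the 150 CNY/(MW·h) baseline IDR cost comes from the study; the battery marginal cost, capacities, and deficit below are hypothetical.

```python
def dispatch_flexibility(deficit_mwh, idr_cost, battery_cost, idr_cap, battery_cap):
    """Split a power deficit between IDR and battery discharge,
    filling the cheaper flexibility resource first (simplified merit order)."""
    order = sorted([("idr", idr_cost, idr_cap),
                    ("battery", battery_cost, battery_cap)],
                   key=lambda x: x[1])  # cheapest resource first
    plan, remaining = {}, deficit_mwh
    for name, _cost, cap in order:
        used = min(remaining, cap)
        plan[name] = used
        remaining -= used
    return plan

# Baseline IDR cost of 150 CNY/(MW.h); battery cost of 160 is hypothetical.
print(dispatch_flexibility(50, idr_cost=150, battery_cost=160,
                           idr_cap=40, battery_cap=60))
# With IDR cost raised 10% to 165, the dispatch shifts toward the battery.
print(dispatch_flexibility(50, idr_cost=165, battery_cost=160,
                           idr_cap=40, battery_cap=60))
```

Because each resource simply absorbs the other's share when its relative cost changes, the total served deficit, and hence the total cost, moves far less than the 10% parameter swing, mirroring the 0.8–1.2% result reported above.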
The superiority of the proposed framework stems from three synergistic mechanisms: (1) the attention mechanism dynamically prioritizes critical variables (e.g., wind power forecast errors and energy storage SOC), enabling the agent to allocate computational focus to high-impact factors during decision-making; (2) hybrid energy storage (pumped hydro + electrochemical) leverages the high capacity of pumped storage and rapid response of batteries, reducing reliance on conventional generator ramping; and (3) coordinated DR scheduling (day-ahead PDR + intra-day IDR) enhances demand-side flexibility, with PDR establishing cost-effective baselines and IDR addressing real-time mismatches.
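Mechanism (1) can be sketched minimally, assuming softmax attention over scalar relevance scores for each state variable; the scores below are hypothetical and stand in for the learned query-key products of the actual network.

```python
import math

def attention_weights(scores):
    """Softmax over per-variable relevance scores: variables with higher
    scores (e.g. wind forecast error, storage SOC) receive larger weights."""
    m = max(scores.values())  # subtract max for numerical stability
    exp = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exp.values())
    return {k: e / total for k, e in exp.items()}

# Hypothetical relevance scores for four state variables:
scores = {"wind_forecast_error": 2.0, "storage_soc": 1.5,
          "load_level": 0.5, "grid_price": 0.2}
w = attention_weights(scores)
print(max(w, key=w.get))  # the wind forecast error dominates the weighting
```

In the full Actor–Critic network, these weights rescale the state features before the policy and value heads, which is what lets the agent concentrate on high-impact factors.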
In summary, integrating dual energy storage technologies with a multi-time scale scheduling framework can more effectively address the uncertainty in new energy generation forecasts compared to a single pumped storage scheme. It establishes a more flexible and reliable balancing mechanism for the “source-grid-load-storage” system, thereby significantly enhancing new energy absorption capacity and providing strong support for the economical and stable operation of power grids with high proportions of renewable energy.

5. Conclusions

This study develops a multi-time scale coordination scheduling framework integrated with reinforcement learning to address the challenges of renewable energy integration in power systems. The framework combines day-ahead and intra-day rolling scheduling to balance cost efficiency and adaptability: the day-ahead model establishes optimal baselines for conventional generators, pumped hydro storage, and PDR to minimize operational costs, while the intra-day model adjusts to real-time fluctuations using electrochemical storage and IDR. The reinforcement learning-based intra-day optimization, featuring an attention-based Actor–Critic network, effectively handles wind power uncertainty by prioritizing critical variables such as wind forecasts and storage states, enhancing decision precision. Integrating hybrid energy storage (pumped hydro + electrochemical) and coordinated demand response significantly improves performance: compared to traditional schemes, it reduces wind curtailment rates and lowers operational costs. Case studies on an East China regional grid validate the framework’s practicality across typical scenarios, demonstrating its ability to mitigate source–load mismatches, especially during reverse peak conditions.
This study still has limitations that point to clear opportunities for future research: it currently focuses solely on wind power without incorporating solar PV, and uses a fixed demand response participation rate that fails to reflect the dynamic response characteristics of users in practice. Future work will integrate solar PV into the framework—by adding PV forecast error to the attention mechanism’s critical variables to optimize decision-making—and construct a dynamic demand response model based on user response elasticity, while further verifying the framework on large-scale power grids to enhance its universality and practical value.

Author Contributions

Conceptualization, E.L. and S.G.; methodology, E.L.; software, E.L. and X.C.; validation, S.G. and X.C.; formal analysis, E.L.; investigation, J.L. and E.L.; resources, J.L. and Y.S.; data curation, X.C.; writing—original draft preparation, Y.S.; writing—review and editing, X.C.; visualization, X.C.; supervision, S.G.; project administration, E.L. and M.Z.; funding acquisition, S.G. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Science and Technology Project of State Grid Shandong Electric Power Research Institute “Research and Application of Collaborative Control Technology for Multi-type Energy Storage and Temporal Complementarity” (No. 52062624000V).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhenbang, L. Analysis of Optimal Dispatching Technology for Electric Vehicles Participating in Day-ahead Wind Power Consumption. Shandong Electr. Power 2021, 48, 8–15+36.
2. Wu, Z.; Zhang, M.; Fan, B.; Shi, Y.; Guan, X. Deep Synchronization Control of Grid-Forming Converters: A Reinforcement Learning Approach. IEEE/CAA J. Autom. Sin. 2025, 12, 273–275.
3. Wu, Z.; Zhang, M.; Gao, S.; Wu, Z.G.; Guan, X. Physics-informed reinforcement learning for real-time optimal power flow with renewable energy resources. IEEE Trans. Sustain. Energy 2024, 16, 216–226.
4. Guo, S.; Guo, Y.; Liu, X.; Dong, W. Research on the Day-ahead Economic Dispatch Method of Microgrid Including Solar-ESS-EV. Shandong Electr. Power 2025, 52, 1–11.
5. Fiorotti, R.; Fardin, J.F.; Rocha, H.R.O.; Rua, D.; Lopes, J.A.P. Day-ahead optimal scheduling considering thermal and electrical energy management in smart homes with photovoltaic–thermal systems. Appl. Energy 2024, 74, 124070.
6. Li, Y.; Li, Z.; Liu, H. Distributed Robust Self-scheduling Method for Integrated Energy Production Unit Participating in Day-ahead Electricity-gas Joint Market. Shandong Electr. Power 2024, 51, 47–55.
7. Chen, W.; Wu, N.; Huang, Y.-L.; Ma, X.; Guo, X.; Lin, D. Auxiliary decision-making method of optimal dispatching for microgrid based on deep learning. South. Power Syst. Technol. 2022, 16, 117–126.
8. Shi, Z.; Liang, H.; Huang, S.; Dinavahi, V. Distributionally robust chance-constrained energy management for islanded microgrids. IEEE Trans. Smart Grid 2018, 10, 2234–2244.
9. Liu, Y.; Guo, L.; Wang, C. Economic dispatch of microgrid based on two stage robust optimization. Proc. CSEE 2018, 38, 4013–4022.
10. Tan, J.; Wu, Q.; Hu, Q.; Wei, W.; Liu, F. Adaptive robust energy and reserve co-optimization of integrated electricity and heating system considering wind uncertainty. Appl. Energy 2020, 260, 114230.
11. Fan, G.; Peng, C.; Wang, X.; Wu, P.; Yang, Y.; Sun, H. Optimal scheduling of integrated energy system considering renewable energy uncertainties based on distributionally robust adaptive MPC. Renew. Energy 2024, 226, 120457.
12. Li, X.; Han, X.; Yang, M. Day-ahead optimal dispatch strategy for active distribution network based on improved deep reinforcement learning. IEEE Access 2022, 10, 9357–9370.
13. Liu, Z.; Zhang, M.; Huang, J.; Hu, Z.; Dai, P. Power and energy balance method of multivariate interactive distribution network based on source network load coordination. In Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics, Nanjing, China, 24–26 June 2022; pp. 685–689.
14. Dou, X.; Xu, M.; Dong, J.; Quan, X.; Wu, Z.; Sun, J. Multi-time scale based improved energy management model for microgrid. Autom. Electr. Power Syst. 2016, 40, 48–55.
15. Yang, J.; Li, Y.; Yang, P.; Duan, Z.; Yan, J. A robust scheduling method of AC/DC distribution network based on diamond-shaped cutting convex hull set. In Proceedings of the 2024 4th International Conference on Intelligent Power and Systems (ICIPS), Yichang, China, 1–3 November 2024; IEEE: New York, NY, USA; pp. 1038–1047.
16. Chen, X.; Zhang, M.; Wu, Z.; Wu, L.; Guan, X. Model-free load frequency control of nonlinear power systems based on deep reinforcement learning. IEEE Trans. Ind. Inform. 2024, 20, 6825–6833.
17. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274.
18. Peng, L.; Sun, Y.; Xu, J.; Liao, S.; Yang, L. Self-adaptive uncertainty economic dispatch based on deep reinforcement learning. Autom. Electr. Power Syst. 2020, 44, 33–42.
19. Hua, H.C.; Qin, Y.C.; Hao, C.T.; Cao, J.W. Optimal energy management strategies for energy Internet via deep reinforcement learning approach. Appl. Energy 2019, 239, 598–609.
20. Bui, V.-H.; Hussain, A.; Kim, H.-M. Q-learning-based operation strategy for community battery energy storage system (CBESS) in microgrid system. Energies 2019, 12, 1789.
21. Nie, H.; Zhang, J.; Chen, Y.; Xiao, T. Real-time economic dispatch of community integrated energy system based on a double-layer reinforcement learning method. Power Syst. Technol. 2021, 45, 1330–1336.
22. Fang, D.; Guan, X.; Hu, B.; Peng, Y.; Chen, M.; Hwang, K. Deep reinforcement learning for scenario-based robust economic dispatch strategy in internet of energy. IEEE Internet Things J. 2020, 8, 9654–9663.
23. Nakabi, T.A.; Toivanen, P. Deep reinforcement learning for energy management in a microgrid with flexible demand. Sustain. Energy Grids Netw. 2021, 25, 100413.
24. Zhang, T.; Hanousek, N.; Qadrdan, M.; Ahmadian, R. A day-ahead scheduling model of power systems incorporating multiple tidal range power stations. IEEE Trans. Sustain. Energy 2022, 14, 826–836.
25. Wu, X.; Li, N.; Wang, X.; Kuang, Y.; Zhao, W.; Qian, T.; Zhao, H.; Hu, J. Day-ahead scheduling of a gravity energy storage system considering the uncertainty. IEEE Trans. Sustain. Energy 2020, 12, 1020–1031.
26. Chen, X.; Zhang, M.; Wu, Z.; Yu, L.; Hatziargyriou, N.D.; Guan, X. Load Frequency Control of Multi-microgrids Based on Deep Deterministic Policy Gradient Integrated with Online Learning. IEEE Trans. Smart Grid 2025, 16, 4266–4278.
27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
28. Jin, L.; Fang, X.; Cai, Z.; Chen, D.; Li, Y. Multiple time-scales source-storage-load coordination scheduling strategy of grid connected to energy storage power station considering characteristic distribution. Power Syst. Technol. 2020, 44, 3641–3650.
Figure 1. Workflow diagram of the multi-time scale coordinated scheduling framework.
Figure 2. The framework of the attention-based Actor–Critic network.
Figure 3. Load and wind power output in different scenarios.
Figure 4. The scheduling situation of wind power peak shaving.
Figure 5. The scheduling situation of wind power reverse peak shaving.
Figure 6. Power profiles in peak shaving scenario.
Figure 7. Power profiles in reverse peak shaving scenario.
Table 1. Parameters of conventional units.

Unit Number | Node | Pmax/MW | Pmin/MW | a/(Yuan/MW²)
1 | 1 | 200 | 50 | 0.0375
2 | 2 | 80 | 20 | 0.175
3 | 5 | 50 | 15 | 0.625
4 | 8 | 35 | 10 | 0.0834
5 | 11 | 30 | 10 | 0.25
6 | 13 | 40 | 12 | 0.25

Unit Number | b/(Yuan/MW) | c/Yuan | Ru/Rd/(MW/h) | TS/TD/h
1 | 20 | 372.5 | 72 | 2
2 | 17.5 | 352.3 | 48 | 2
3 | 10 | 316.5 | 30 | 2
4 | 32.5 | 329.2 | 21 | 2
5 | 30 | 276.4 | 18 | 2
6 | 30 | 232.2 | 24 | 2
Table 2. Compensation cost coefficients of DR.

DR Type | Compensation Cost Coefficient/(Yuan/(MW·h))
PDR | 225
IDR | 150
Table 3. Comparison of different scheduling schemes.

Scenario | Scheduling Scheme | Cost/CNY 10,000 | Curtailment Rate/%
Scenario 1 | Scheme 1 | 123.95 | 12.40
Scenario 1 | Scheme 2 | 118.27 | 8.09
Scenario 1 | Scheme Without Attention | 116.63 | 6.53
Scenario 1 | Proposed Scheme | 115.79 | 5.76
Scenario 2 | Scheme 1 | 120.54 | 7.91
Scenario 2 | Scheme 2 | 116.35 | 2.05
Scenario 2 | Scheme Without Attention | 113.59 | 1.30
Scenario 2 | Proposed Scheme | 112.46 | 1.12

Share and Cite

Liu, E.; Gao, S.; Chen, X.; Li, J.; Sun, Y.; Zhang, M. Multi-Source Energy Storage Day-Ahead and Intra-Day Scheduling Based on Deep Reinforcement Learning with Attention Mechanism. Appl. Sci. 2025, 15, 10031. https://doi.org/10.3390/app151810031