Multi-Time-Scale Optimal Scheduling Strategy for Marine Renewable Energy Based on Deep Reinforcement Learning Algorithm

The Bohai Sea and Yellow Sea, which border the Shandong Peninsula, possess vast marine energy resources. An analysis of actual meteorological data from these regions indicates significant seasonality and intra-day uncertainty in wind and photovoltaic power generation. Scheduling these sources so that their complementary characteristics maintain grid stability is a substantial challenge. In response, we integrate wave energy with offshore photovoltaic and wind power generation and propose a day-ahead and intra-day multi-time-scale rolling optimization scheduling strategy for the complementary dispatch of these three energy sources. Using real meteorological data from this maritime area, we employ a CNN-LSTM neural network to predict the power generation and load demand of the area on both day-ahead 24 h and intra-day 1 h time scales, and apply the DDPG algorithm to the forecast data for refined electricity management through rolling optimization scheduling. Simulation results demonstrate that the proposed strategy effectively meets load demands through complementary scheduling of wave power, wind power, and photovoltaic power generation based on the climatic characteristics of the Bohai and Yellow Sea regions, reducing the negative impacts of the seasonality and intra-day uncertainty of these three energy sources on the grid. Additionally, compared to the day-ahead scheduling strategy alone, the day-ahead and intra-day rolling optimization scheduling strategy reduced system costs by 16.1% and 22% for a typical winter day and a typical summer day, respectively.


Introduction
In recent years, with the increasingly severe problem of energy depletion, the development of renewable energy has gradually become the key to solving the energy crisis. Due to the volatility of renewable energy, the power system needs sufficient flexibility to balance the difference between power supply and demand [1]. According to research, the proportion of renewable energy is expected to increase to 31% by the year 2035 [2]. Therefore, effective dispatch of renewable energy sources, particularly taking into account their complementarities, is essential to ensure distribution network stability.
Numerous achievements have been made in research on the optimal scheduling of renewable energy resources. Day-ahead scheduling involves energy forecasting, with the authors of [3-7] focusing on wind power prediction and the authors of [8-11] on photovoltaic power prediction. In [12], the authors utilized actual operational data from the load system to perform load forecasting on the integrated energy system of industrial parks using deep learning algorithms, and [13] also concentrates on load forecasting. However, these studies predominantly forecast from a single perspective, and simultaneous predictions of both the generation and consumer sides in the system could further enhance grid stability. Dahmani et al. [14] analyzed the reliability of offshore wind power, providing valuable insights. An earlier work [15] introduces an integrated multi-energy complementary coordination scheduling method. An energy storage system control model (ESSCM) is proposed in [16] for the wind and solar hybrid combined storage system scenario to facilitate the synergistic operation of wind and photovoltaic (PV) power generation in a combined system, thus maximizing the benefits of the combined system in the electricity market. Zhang et al. [17] present a day-ahead scheduling model for an industrial power system integrating wind power and multiple types of storage, showing that the introduction of storage devices can reduce the occurrence of wind curtailment and enhance system flexibility. The authors of [18] offer a solution for coordinated optimal day-ahead scheduling in a hybrid thermal-wind-photovoltaic power generation system including an energy storage system (ESS), aimed at minimizing the total generation cost and suppressing frequent changes in the charging and discharging states of the ESS. Reddy et al. [19] propose an optimized scheduling strategy for a battery-thermal-wind-photovoltaic generation system considering the impact of uncertainties in wind, solar photovoltaic, and load forecasting.
The scheduling schemes in [14-19] focus on single energy sources or combinations of two. The authors of [17,18] do not consider the cost of storage systems. Moreover, different maritime areas have distinct climatic characteristics, and relying solely on photovoltaic and wind energy may not always meet users' electricity demands. As an emerging energy source, wave energy, with its high predictability and abundant kinetic energy conversion potential, is attracting increasing attention. This paper aims to propose a multi-energy complementary system suitable for the Bohai and Yellow Sea areas.
The Bohai Sea and Yellow Sea, located in the eastern maritime region of China, are geographically connected, and their climatic characteristics are significantly influenced by the monsoon climate. For photovoltaic power generation, this area enjoys long sunshine hours and high solar radiation intensity in the summer, while the winter has relatively shorter sunshine hours and less intense solar radiation. Regarding wind power generation, influenced by monsoons, the region is dominated by northerly winds in winter, characterized by stable and strong wind conditions; in summer, it shifts to predominantly southerly winds, which are less stable and weaker. Wave energy in this area is primarily driven by wind waves, exhibiting distribution characteristics similar to the wind field, with higher waves and northerly directions in winter, and lower waves with southerly directions in summer. During the day, the peak production period of photovoltaic power generation is utilized to meet the high electricity demand during peak daytime hours, with wind and wave power generation serving as supplements. Since photovoltaic power generation does not produce electricity at night, wind and wave power generation can provide power, especially on nights with high wind speeds, when wind power generation can play a more significant role. An earlier work [20] assessed the complementary system of wind, photovoltaic, and wave energies in the Yellow Sea and Bohai Sea areas, demonstrating the rationality of their complementarity. Based on the rationale of wind-photovoltaic-wave complementarity, this paper proposes a day-ahead and intra-day multi-time-scale rolling optimization scheduling model that considers the integration of offshore wind energy, wave energy, and offshore photovoltaic energy. This multi-energy integration approach can more effectively utilize the monsoon characteristics of the area to enhance grid stability.
The main work is as follows:
• Based on meteorological data from the Bohai Sea and Yellow Sea areas, we conducted an analysis of annual electricity production, which included the temporal distribution, Kendall coefficient, and entropy value. We discovered that the integration of wave energy could effectively complement wind energy, enhancing the complementarity between wind and solar photovoltaic energy sources.
• We utilized a CNN-LSTM neural network to predict the power generation of three types of renewable energy sources 24 h and 1 h in advance, with time scales of 1 h and 10 min, respectively. Employing CNN-LSTM for predictions at 1 h and 10 min scales captures the short-term and long-term patterns of renewable energy output. This multi-time-scale approach aids in more accurately understanding and predicting the power generation of renewable energy sources, thereby facilitating more effective planning and scheduling of resources.
• We formulated the day-ahead and intra-day energy scheduling problem as a Markov Decision Process (MDP) model. Within this MDP model, we used a DRL algorithm to find the optimal scheduling strategy, adjusting the reward function and state space to better accommodate the rolling optimization scheduling problem discussed in this paper.
In Section 2, we introduce the system models and the cost functions of each unit. In Section 3, we present the day-ahead and intra-day rolling optimization scheduling strategy and the CNNLSTM-DDPG algorithm. Section 4 is dedicated to simulation analysis, and Section 5 concludes the discussion.

Energy Complementarity Analysis
We collected meteorological data from the maritime areas surrounding Weihai City in Shandong Province, China, for the period from 1 January to 31 December 2019. Through the use of power conversion formulas, we calculated the annual power generation. For ease of analysis, the calculated data were normalized, as shown in Figure 1.

The generating power of a wind turbine (WT) is directly related to the wind speed, and the output power of a wind turbine is expressed as [21]

$$P_{WT} = \begin{cases} 0, & v_i < v_{in} \ \text{or} \ v_i > v_{out} \\ P_r \dfrac{v_i^3 - v_{in}^3}{v_r^3 - v_{in}^3}, & v_{in} \le v_i < v_r \\ P_r, & v_r \le v_i \le v_{out} \end{cases}$$

where $P_r$ is the rated power, $v_{in}$ is the cut-in wind speed, and $v_{out}$ is the cut-out wind speed. When the wind speed exceeds the cut-out wind speed, the wind turbine stops generating electricity to protect the turbine blades. When the wind speed ($v_i$) is between the rated wind speed ($v_r$) and the cut-out speed ($v_{out}$), the wind turbine outputs its rated power.
When calculating the power output of a wind turbine ($P_{WT}$) based on wind speed, it is necessary to convert the measured wind speed to the hub height of the wind turbine, using the following conversion formula [22]:

$$v = v_m \left(\frac{h}{h_m}\right)^{z}$$

where $v$ is the wind speed at the hub height ($h$) of the WT, $v_m$ is the wind speed measured at the height $h_m$, and $z$ is a function of atmospheric stability and ocean surface characteristics, with a value of 0.11 [20].
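As a rough illustration of the two wind formulas, the following Python sketch combines the power-law height conversion with a piecewise power curve. It assumes a cubic ramp between the cut-in and rated speeds, which is one common form of the curve and may differ in detail from [21]. The function names and the turbine parameters used in the usage note are hypothetical.

```python
def hub_height_speed(v_m: float, h: float, h_m: float, z: float = 0.11) -> float:
    """Extrapolate the speed v_m measured at height h_m to hub height h (power law)."""
    return v_m * (h / h_m) ** z


def wind_power(v: float, v_in: float, v_r: float, v_out: float, p_rated: float) -> float:
    """Piecewise WT output for hub-height wind speed v.

    Assumption: cubic ramp between cut-in and rated speed.
    """
    if v < v_in or v > v_out:
        return 0.0          # below cut-in, or shut down above cut-out
    if v < v_r:
        # Partial output on the ramp between cut-in and rated speed.
        return p_rated * (v**3 - v_in**3) / (v_r**3 - v_in**3)
    return p_rated          # rated output between v_r and v_out
```

For instance, with hypothetical values `v_in=3`, `v_r=12`, `v_out=25` (m/s), a speed measured at 10 m would first be converted with `hub_height_speed(v_m, h=100, h_m=10)` and then passed to `wind_power`.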
The performance of photovoltaic (PV) power generation is primarily influenced by solar radiation and ambient temperature. The calculation formula is as follows [22]:

$$P_{PV} = P_{Rated}\,\frac{R_i}{R_{STC}}\left[1 + k\left(T_{cell} - T_{STC}\right)\right]$$

In the formula, $T_{cell}$ represents the working temperature of the photovoltaic panel, $T_{STC}$ and $R_{STC}$ are the temperature and radiation values measured under standard conditions, $R_i$ is the current radiation, $P_{Rated}$ is the rated power, and $k$ is the temperature conversion coefficient, set at −3.7%.
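The PV relation can be sketched in a few lines of Python. This is only an illustration: it assumes the conventional standard test conditions of 1000 W/m² and 25 °C as defaults, and it takes the temperature coefficient as a per-degree fraction that the caller must supply. The value used in the usage note is illustrative and not taken from the paper.

```python
def pv_power(r_i: float, t_cell: float, p_rated: float, k: float,
             r_stc: float = 1000.0, t_stc: float = 25.0) -> float:
    """PV output scaled by irradiance and derated (k < 0) by cell temperature.

    r_i, r_stc : irradiance in W/m^2 (r_stc default is an assumption)
    t_cell, t_stc : temperatures in deg C (t_stc default is an assumption)
    k : temperature coefficient as a fraction per deg C
    """
    return p_rated * (r_i / r_stc) * (1.0 + k * (t_cell - t_stc))
```

With an illustrative `k = -0.0037`, `pv_power(1000, 25, 500, k)` returns the rated 500, and any cell temperature above 25 °C reduces the output.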
For the wave energy output model, this paper employs an oscillating buoy as the wave energy converter (WEC), represented as [23,24]

$$P_{WEC} = \eta\, L_{width}\, \frac{\rho g^{2}}{64\pi}\, H_s^{2}\, T_e$$

In the formula, $\eta$ represents the energy conversion efficiency, $L_{width}$ represents the width of the wave captured, $H_s$ and $T_e$, respectively, represent the effective wave height and effective wave period, and $\rho$ and $g$ are the density of seawater and the acceleration due to gravity, respectively.
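A minimal Python sketch of the wave model follows, assuming the standard deep-water expression for wave energy flux per metre of crest, ρg²H_s²T_e/(64π), multiplied by the capture width and the conversion efficiency. The default seawater density of 1025 kg/m³ is a typical value chosen for illustration, not a parameter reported in the paper.

```python
import math


def wave_power(h_s: float, t_e: float, l_width: float, eta: float,
               rho: float = 1025.0, g: float = 9.81) -> float:
    """WEC output in watts from wave height h_s (m) and wave period t_e (s)."""
    # Energy flux per metre of wave crest (W/m), deep-water approximation.
    flux = rho * g**2 * h_s**2 * t_e / (64.0 * math.pi)
    # Scale by the captured width (m) and the conversion efficiency (0..1).
    return eta * l_width * flux
```

Because the flux depends on the square of the wave height, doubling `h_s` quadruples the output, which is one reason wave output tracks the stronger winter wind field so closely.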
The Kendall coefficient ($\tau$) is an indicator used to evaluate the correlation between two sets of data. Through the Kendall coefficient, the complementary potential between different energy sources can be analyzed [19]. When $0 < \tau < 1$, the two energy sources are positively correlated, indicating that they rise and fall together and lack complementarity; when $-1 < \tau < 0$, the two energy sources are negatively correlated, indicating that one tends to rise when the other falls, so they possess complementarity. Table 1 shows the complementarity among photovoltaic, wave energy, and wind energy sources.
By evaluating the annual electricity production entropy of the three energy sources, their uncertainty and variability can be quantified. Periods with higher entropy values indicate significant fluctuations in electricity production for the corresponding energy source, signifying increased uncertainty; conversely, lower entropy values suggest more stable production levels. As illustrated in Figure 2, photovoltaic (PV) and wave energy conversion (WEC) exhibit an approximately inverse relationship in entropy values throughout the year, suggesting that these two energy sources can complement each other in the face of change and instability. Meanwhile, wind turbine (WT) entropy values remain relatively stable over the year, contributing positively to the stability of grid operations.
From Figure 1 and Table 1, it is evident that on a seasonal time scale, wind energy and photovoltaic energy exhibit good complementarity, while wave energy is correlated with wind energy and also has good complementary potential with photovoltaic energy. $\tau_{(WT+WEC)-PV}$ in Table 1 shows that after integrating the WEC into the WT-PV system, the Kendall coefficient changes from −0.3649 to −0.4106, indicating an enhancement in complementarity. Therefore, the inclusion of wave energy can effectively compensate for the shortcomings of wind energy.
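The two indicators used in this analysis can be sketched in plain Python as follows. The Kendall coefficient is computed here in its simplest form (tau-a, counting concordant minus discordant pairs), and the entropy is a Shannon entropy over a histogram of the normalized output. The paper does not state its exact entropy definition or bin count, so both are assumptions made for illustration.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


def shannon_entropy(values, bins=10):
    """Shannon entropy (bits) of a histogram of values normalized to [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1   # v == 1.0 goes to last bin
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts if c)
```

A combined coefficient such as the WT+WEC versus PV entry could then be estimated as `kendall_tau([w + e for w, e in zip(wt, wec)], pv)` on the normalized series, where `wt`, `wec`, and `pv` are hypothetical lists of equal length.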
Based on a comprehensive analysis that considers the temporal distribution, Kendall coefficient, and entropy values of the three energy sources, we conclude that it is reasonable to implement complementary scheduling of wind, solar, and wave energy in the test marine area.

Microgrid Generation Model
Based on the aforementioned analysis, we considered a wind-solar-wave multi-energy complementary system, as shown in Figure 3. This system consists of three microgrids, each equipped with renewable energy generation facilities and Distributed Generation (DG) devices as backup power sources. In this system, the generation of various energy sources is subject to environmental conditions, leading to uncertainties and volatility. To balance power generation with load demand, surplus electricity can be stored in the energy storage system for future use. Moreover, when the renewable energy generation exceeds load demand, this surplus electricity can also be sold through energy transactions with the main grid, thereby achieving optimal energy allocation and maximizing economic benefits.
For electricity pricing, this paper adopts time-of-use (TOU) pricing. The year is divided into heating and non-heating periods, and the day is divided into peak and off-peak hours. The non-heating period spans from 1 April to 31 October, and the heating period from 1 November to 31 March of the following year. During the non-heating period, peak hours are from 8:00 to 22:00, and off-peak hours are from 22:00 to 8:00 (the next day); during the heating period, peak hours are from 8:00 to 20:00, and off-peak hours are from 20:00 to 8:00 (the next day). The electricity price data were obtained from the official website of the State Grid Shandong Electric Power Company, and are shown in Table 2.
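The TOU rules above translate directly into a small lookup. The sketch below, in Python, only classifies a timestamp into a season and a price band; the actual prices from Table 2 are not reproduced here and would be looked up from the returned pair. Treating each band as a half-open interval (for example, 22:00 itself counts as off-peak) is an assumption, and the function name is hypothetical.

```python
from datetime import datetime


def tou_period(ts: datetime) -> tuple:
    """Return (season, band) for a timestamp under the TOU rules in the text."""
    # Heating period: 1 November to 31 March of the following year.
    heating = ts.month >= 11 or ts.month <= 3
    # Peak starts at 8:00; it ends at 20:00 (heating) or 22:00 (non-heating).
    peak_end = 20 if heating else 22
    band = "peak" if 8 <= ts.hour < peak_end else "off-peak"
    return ("heating" if heating else "non-heating", band)
```

For example, 21:00 on a July day falls in the non-heating peak band, whereas 21:00 on a January day is already off-peak.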

Cost Function
The goal of this paper is to achieve system cost minimization through microgrid scheduling optimization while fulfilling load demands.Consequently, we take into account the operational costs of each microgrid unit, transaction costs, and penalty costs simultaneously.To encourage the system to prioritize the use of renewable energy to satisfy the load demand, we set the selling price of electricity from the microgrid at 50% of the current electricity rate.
The operating costs of DG units can typically be approximated by a quadratic equation [19] due to the non-linear relationship between the cost and the generated electricity.This relationship encompasses fixed costs, costs that change linearly, and costs that are proportional to the square of the generated electricity.
The coefficients a, b, and c are chosen so that the operating costs of DG units are lower than the cost of purchasing electricity; their specific values are shown in Table 3.
The cost of trading electricity in the system is determined by $p_t$, the electricity price at the current moment, and by $P^{Buy}_t$ and $P^{Sell}_t$, the amounts of electricity to be bought and sold, respectively.
The operating cost of renewable energy equipment is linear in the generated power, with linear coefficient $k$.
The operating cost of the energy storage system (ESS) is a function of $P^{swap}_t$, the charging and discharging power of the ESS. If $P^{swap}_t$ is less than 0, the device is discharging; otherwise, the device is charging.
When the energy in the system during time period $t$ overflows, curtailment occurs, and we set a penalty cost for this curtailment. The detailed parameters are shown in Table 4.
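To make the cost structure concrete, here is a hedged Python sketch of two of the cost terms. It assumes the conventional quadratic form a·P² + b·P + c for the DG cost (the assignment of a, b, and c to the quadratic, linear, and fixed terms is an assumption), and a trading cost in which purchases are charged at the full price while sales are credited at 50% of it, as stated for the selling price above. The function names and the numbers in the usage note are hypothetical, not the values in Tables 3 and 4.

```python
def dg_cost(p_dg: float, a: float, b: float, c: float) -> float:
    """Quadratic DG operating cost (coefficient order is an assumption)."""
    return a * p_dg**2 + b * p_dg + c


def trade_cost(price: float, p_buy: float, p_sell: float,
               sell_ratio: float = 0.5) -> float:
    """Net trading cost: buy at full price, sell at sell_ratio * price."""
    return price * p_buy - sell_ratio * price * p_sell
```

With a hypothetical price of 1.0 per unit, buying 10 units costs 10.0, while selling 10 units yields a credit of 5.0 (a cost of −5.0), which illustrates why consuming renewable output locally is favoured over selling it.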

Day-Ahead and Intra-Day Rolling Optimization Scheduling Strategy
The day-ahead and intra-day rolling optimization scheduling strategy is shown in Figure 4.
The foundation of rolling optimization is the day-ahead and intra-day forecasting. Day-ahead forecasting utilizes the CNN-LSTM network with a 1 h time step to predict the power generation from photovoltaics, wave energy, and wind turbines, as well as load demand, 24 h in advance. Intra-day forecasting, with a 10 min time step, predicts renewable energy generation and load demand 1 h ahead.
In the day-ahead optimization phase, based on the day-ahead forecast results, a deep reinforcement learning algorithm is used to formulate a scheduling plan for the ESS, DG, and the main grid at 1 h intervals, 24 h in advance. The day-ahead objective (11) is to minimize the total system cost subject to constraints (11a)-(11d), where (11a) ensures the balance of system power; (11b) indicates the maximum charging and discharging power of the energy storage system; (11c) limits the State of Charge (SOC) of the energy storage system; and (11d) represents the SOC update strategy.
Intra-day optimization is based on the output of day-ahead optimization, adjusting the dispatch plan according to the deviations between intra-day forecasts and day-ahead forecasts. To maximize system economy, intra-day optimization tries to avoid changes to the scheduling plans of DG, ESS, and other units, focusing instead on re-dispatching the deviations in wind, solar, and wave generation to perform peak shaving and valley filling. Adjustments to other units' scheduling plans are made only when these operations cannot meet the load demand. The cost of intra-day optimization accounts for the impact of day-ahead forecast deviations on system costs. Additionally, to encourage intra-day optimization to schedule only the energy with forecast deviations without altering the day-ahead decisions, we introduce an additional constraint, and the objective function is updated accordingly.
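The ESS constraints described for (11b)-(11d) can be illustrated with a short Python sketch that projects a requested charge/discharge power onto the feasible set and then updates the SOC. This is a simplified illustration, not the paper's model: it ignores charging and discharging efficiency, and the SOC window of 0.1 to 0.9, the parameter names, and the energy units are all assumptions.

```python
def ess_step(soc: float, p_swap: float, p_max: float, capacity: float,
             soc_min: float = 0.1, soc_max: float = 0.9, dt: float = 1.0):
    """Clip the requested ESS power to the limits, then update the SOC.

    p_swap > 0 charges and p_swap < 0 discharges (sign convention of the text).
    Returns (new_soc, applied_power).
    """
    p = max(-p_max, min(p_max, p_swap))            # power limit, cf. (11b)
    p = min(p, (soc_max - soc) * capacity / dt)    # stay below soc_max, cf. (11c)
    p = max(p, (soc_min - soc) * capacity / dt)    # stay above soc_min, cf. (11c)
    return soc + p * dt / capacity, p              # SOC update, cf. (11d)
```

In a learning setting, such a projection can be applied to the raw action before the environment step so that the agent never leaves the feasible region; whether the paper enforces the constraints this way or through penalties is not stated.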

CNNLSTM-DDPG Algorithm
Despite the discernible patterns exhibited by renewable energy sources, they also possess characteristics of uncertainty and volatility.The fundamental cause of these phenomena is the variability of weather conditions.Sudden changes in weather can increase the challenges of forecasting, as simple prediction models may fail to adequately capture such abrupt changes, leading to decreased forecasting accuracy.In the scenario considered in this paper, high accuracy is required to support the stability of the power supply in the scheduling system.Component-based forecasting is an excellent prediction method, as it allows for a better understanding of the factors behind renewable energy forecast results, but it inevitably increases the complexity of the process by requiring the integration of predictions from various components.Regarding neural network prediction models, a vast body of research has validated their reliability.Compared to component-based forecasting, neural network models may seem more complex at first glance, but they can autonomously capture and learn from complex relationships within the data, avoiding the need for result integration, and offering considerable accuracy in predictions.Considering both efficiency and accuracy, we have chosen to use neural network models for forecasting.For such long-term sequential data, CNN-LSTM stands out as an excellent choice.
The CNN-LSTM network is a hybrid neural network model that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. This structure leverages the capability of CNNs in extracting local features along with the advantages of LSTM in handling time-series data, making it particularly suitable for the complex, temporally correlated task of predicting renewable energy generation in this paper.
Wind, wave, and photovoltaic energies in the Bohai and Yellow Sea regions are influenced by weather conditions, seasonal changes, and other factors. CNNs can effectively extract features from the input data, while LSTM excels at capturing the dynamic changes in these data over time. Utilizing both CNNs and LSTM allows for a more comprehensive understanding and use of these temporal correlation features, thereby enhancing prediction accuracy. The CNN-LSTM model also processes datasets efficiently by extracting important features automatically, which saves time.
This paper employs a CNN-LSTM network for 24 h day-ahead forecasting and 1 h intra-day forecasting of renewable energy generation data. When forecasting 24 h in advance, the CNN-LSTM network analyzes the daily variation trends of energy production based on past data and outputs predictions for the next 24 h on an hourly time scale. This provides crucial information for the energy dispatch and management of the power grid. For predictions within an hour, the CNN-LSTM model can capture short-term fluctuations in energy production more accurately, outputting predictions for the next hour on a 10 min time scale. This is vital for adjusting the grid's immediate load and optimizing energy distribution. The combination of day-ahead forecasting and intra-day forecasting can effectively reduce the impact of renewable energy instability on the distribution grid.
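Before a sequence model such as CNN-LSTM can be trained for either horizon, the historical series has to be cut into input/target pairs. The Python sketch below shows one common way to do this with a sliding window; it does not implement the network itself. The history length `n_in` is an assumption, since the paper does not state how many past steps are fed to the model.

```python
def make_windows(series, n_in: int, n_out: int):
    """Split a 1-D series into (history, target) pairs for multi-step forecasting."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        history = series[start:start + n_in]                 # model input
        target = series[start + n_in:start + n_in + n_out]   # values to predict
        pairs.append((history, target))
    return pairs
```

Under these assumptions, day-ahead samples would use hourly data with `n_out=24`, and intra-day samples would use 10 min data with `n_out=6`, so that each target covers the next 24 h or the next 1 h, respectively.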
Based on the forecast data from the CNN-LSTM network, the energy scheduling optimization problem can be formulated as a Markov Decision Process (MDP), which can then be solved using deep reinforcement learning algorithms. This model is represented by a quintuple $(S, A, P, R, \gamma)$, where $S$ represents the state space, $A$ represents the action space, $P$ represents the state transition probability, $R$ represents the reward function, and $\gamma$ represents the discount factor. In this paper, the state consists of the renewable power generation, the power demand, the state of charge of the batteries, and the electricity price, that is, $s_t = (P^{R}_t, P^{Load}_t, SOC_t, p_t)$. Based on the state of the microgrid system at time $t$, the action of the microgrid is defined as $a_t = (P^{DG}_{i,t}, P^{swap}_t, P^{R}_{i,t}, P^{Sell}_t, P^{Buy}_t)$. $P$ is the probability of transitioning from the current state, $s_t$, to the next state, $s_{t+1}$, after executing the current action, $a_t$. The policy, $\pi: S \to P(A)$, maps each received state to a probability distribution over actions; a deterministic policy is the special case that maps each state to a single action. Different actions explored in the environment receive different rewards. The goal of reinforcement learning is to use the reward function as a guide to discover the actions that maximize the cumulative reward, which then constitute the solution to the optimization problem. The objective of this paper is to minimize the operational costs of the system, and the day-ahead scheduling reward function is constructed accordingly. For intra-day optimization, the reward function is adjusted to reflect the intra-day cost (Equation (16)).
The authors of [25,26] have effectively addressed the issue of cost reduction by incorporating deep reinforcement learning (DRL) into the energy dispatch problem of energy systems. The Deep Deterministic Policy Gradient (DDPG) algorithm has garnered attention for its effective handling of continuous action space problems [27,28]. This algorithm combines the representational capabilities of deep learning with the decision optimization techniques of policy gradient methods, adopting a variant of the Actor-Critic architecture, implemented through deep neural networks to approximate both policy and value functions [29]. The Actor network directly maps states to deterministic actions, while the Critic network evaluates the expected return for given states and actions [30]. Additionally, DDPG incorporates an experience replay mechanism, storing past transitions (states, actions, rewards, and new states) for reuse during training, thus reducing correlations between samples and enhancing learning stability. To further stabilize the learning process, DDPG also employs target networks for both the Actor and the Critic and updates their parameters slowly, which helps to mitigate the issue of moving targets.
The flow of the algorithm is shown in Figure 5 and Algorithm 1.

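1. Initialize: the Critic network $Q(s, a|\theta^{Q})$ and Actor network $\mu(s|\theta^{\mu})$ with weights $\theta^{Q}$ and $\theta^{\mu}$; the Critic target network $Q'$ and Actor target network $\mu'$ with weights $\theta^{Q'} \leftarrow \theta^{Q}$ and $\theta^{\mu'} \leftarrow \theta^{\mu}$; an empty experience replay buffer $R$ of size $n$.
2. for each episode do
3. Reset the simulation parameters of the energy dispatch system to obtain the initial observation state, $s_1$.
4. for each time step $i$ do
5. Normalize state $s_i$ to $s_i'$.
6. Obtain the Actor network action with exploration noise $n_i$: $a_i = \mu(s_i'|\theta^{\mu}) + n_i$.
7. Execute action $a_i$, obtain the reward, $r_i$, and observe the new state, $s_{i+1}$.
8. Store transition $(s_i', a_i, r_i, s_{i+1}')$ in the replay buffer $R$.
9. Sample a random minibatch of transitions $(s_j', a_j, r_j, s_{j+1}')$ from $R$.
10. Compute the target value $y_j = r_j + \gamma Q'(s_{j+1}', \mu'(s_{j+1}'|\theta^{\mu'})|\theta^{Q'})$.
11. Update the Critic network parameters $\theta^{Q}$ by minimizing the mean square loss function (18).
12. Update the Actor network using the sampled deterministic policy gradient (19).
13. Update the target network parameters using the soft updates (20) and (21).
14. end for
15. end for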
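The storage and sampling steps of the algorithm rely on an experience replay buffer. A minimal Python sketch of such a buffer is given below; it uses only the standard library, the class and method names are hypothetical, and the optional seed exists only to make the illustration reproducible.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity: int, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out first
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Draw a uniform random minibatch without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from past transitions, rather than training on consecutive steps, is what breaks the temporal correlation between samples mentioned in the description of DDPG.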
The initial stage involves day-ahead scheduling optimization, wherein the CNN-LSTM network is utilized to obtain initial value s Day−ahead i for the future 24 h renewable energy generation, power demand, battery State of Charge (SOC), and current electricity price.The Actor network gives the power scheduling plan a energy systems.The Deep Deterministic Policy Gradient (DDPG) algorithm has garnered attention for its effective handling of continuous action space problems [27,28].This algorithm combines the representational capabilities of deep learning with the decision optimization techniques of policy gradient methods, adopting a variant of the Actor-Critic architecture, implemented through deep neural networks to approximate both policy and value functions [29].The actor network directly maps states to deterministic actions, while the Critic network evaluates the expected return for given states and actions [30].Additionally, DDPG incorporates an experience replay mechanism, storing past transitions (states, actions, rewards, and new states) for reuse during training, thus reducing correlations between samples and enhancing learning stability.To further stabilize the learning process, DDPG also employs target network technology, setting up target networks for both the Actor and Critic and slowly updating their parameters, which helps to mitigate the issue of moving targets.The flow of the algorithm is shown in Figure 5 and Algorithm 1.  ) by minimizing the mean squared error loss function (18) to update the Critic network parameters. 
The Actor network employs the gradient ascent algorithm, aiming to maximize the expected return estimated by the Critic network; its parameters are updated along the resulting policy gradient. Lastly, the target network parameters are updated through a soft update mechanism, with Equations (20) and (21) controlling the update rate of the target network parameters, thereby enhancing learning stability.
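For reference, the updates described above take the following form in the standard DDPG formulation. This is a sketch rather than a reproduction of the paper's Equations (17)-(21), whose exact notation may differ. The mini-batch size $N$, the discount factor $\gamma$, the soft-update rate $\tau$, and the target value $y_i$ are symbols assumed here.

```latex
\begin{align}
y_i &= r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1}\mid\theta^{\mu'}) \,\big|\, \theta^{Q'}\big) \\
L(\theta^{Q}) &= \frac{1}{N}\sum_{i=1}^{N}\big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N}\sum_{i=1}^{N}
  \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \\
\theta^{Q'} &\leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'} \\
\theta^{\mu'} &\leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}
\end{align}
```

A small $\tau$ makes the target networks track the online networks slowly, which is what keeps the regression target $y_i$ from moving too quickly.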
Thus, the DDPG day-ahead optimization algorithm completes one training iteration. It is important to note that during each training session, DDPG randomly draws a batch of experiences from the replay pool (R) to update the networks. This process is repeated N times to produce the optimal 24 h day-ahead scheduling plan.
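The mini-batch sampling and soft target update in one training iteration can be sketched as follows. This is an illustrative sketch only: the one-dimensional stand-in networks, the discount factor `gamma`, the update rate `tau`, and the function names are assumptions rather than values or code from this study.

```python
import random


def td_targets(batch, target_actor, target_critic, gamma):
    """Bootstrapped targets y = r + gamma * Q'(s_next, mu'(s_next))."""
    return [r + gamma * target_critic(s_next, target_actor(s_next))
            for (_s, _a, r, s_next) in batch]


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online value."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


# Hypothetical one-dimensional stand-ins for the target networks.
target_actor = lambda s: 0.5 * s
target_critic = lambda s, a: s + a

# Replay pool of (s, a, r, s_next) transitions with made-up values.
replay = [(0.0, 0.1, 1.0, 2.0), (2.0, 0.3, -1.0, 4.0), (4.0, 0.2, 0.5, 6.0)]
batch = random.Random(0).sample(replay, 2)  # random mini-batch from the pool
targets = td_targets(batch, target_actor, target_critic, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.25)
```

Sampling at random from the pool, rather than replaying transitions in order, is what breaks the correlation between consecutive samples.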
In intra-day scheduling optimization, one hour prior to the actual operation, power generation and demand forecasts for the coming hour are obtained from the CNN-LSTM network. Subsequently, the differences between these intra-day forecast results and the pre-established day-ahead scheduling plan are calculated to determine the renewable energy forecast deviation, $P^{R-}_{t,i}$, and the load demand forecast deviation, $P^{L-}_{t,i}$. The state space, $s_i^{\text{Intra-day}}$, for intra-day scheduling comprises a tuple ($P^{R-}_{t,i}$, $P^{L-}_{t,i}$, $SOC^{\text{Day-ahead}}$, $p_t$) that includes these two forecast deviations, the State of Charge (SOC) coefficient of the energy storage system from the day-ahead plan, and the current electricity price. The network updating process thereafter follows the same procedure as for day-ahead optimization scheduling, with the exception that the reward function is replaced with Equation (16). After N training sessions, the intra-day optimized scheduling plan is output.
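The construction of the intra-day state described above can be sketched as follows. This is a minimal illustration: the function name, the argument names, the sample values, and the sign convention (intra-day forecast minus day-ahead plan) are assumptions made for the example.

```python
def intraday_state(renewable_intraday, renewable_dayahead,
                   load_intraday, load_dayahead, soc_dayahead, price):
    """Build the intra-day state (P_R_dev, P_L_dev, SOC_day_ahead, price).

    Deviations are taken as intra-day forecast minus day-ahead plan
    (the sign convention is an assumption of this sketch).
    """
    p_r_dev = renewable_intraday - renewable_dayahead
    p_l_dev = load_intraday - load_dayahead
    return (p_r_dev, p_l_dev, soc_dayahead, price)


# Hypothetical values: power in kW, a unitless SOC, and an arbitrary price.
state = intraday_state(renewable_intraday=950.0, renewable_dayahead=1000.0,
                       load_intraday=1250.0, load_dayahead=1200.0,
                       soc_dayahead=0.5, price=0.75)
```

Working with deviations rather than absolute forecasts means the intra-day agent only has to learn corrections to the day-ahead plan.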

Simulation Analysis
Using the collected data, we created an original dataset. We designated the first three weeks of each month as the training set and the remaining time as the test set. This approach allows the trained algorithm to consider the seasonal variations in renewable energy generation and electricity demand.
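The split described above can be sketched as follows. This is an illustration under assumptions: "the first three weeks" is interpreted as days 1-21 of each month, the hourly timestamps are synthetic, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def split_by_week_of_month(timestamps):
    """Days 1-21 of each month go to training, the remaining days to testing."""
    train, test = [], []
    for ts in timestamps:
        (train if ts.day <= 21 else test).append(ts)
    return train, test


# Hypothetical hourly timestamps covering one non-leap year.
start = datetime(2023, 1, 1)
hours = [start + timedelta(hours=h) for h in range(365 * 24)]
train, test = split_by_week_of_month(hours)
```

Because every month contributes to both sets, each season is represented in training and in testing.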

Power Generation Forecast
We compared the day-ahead and intra-day forecast results for a typical day and found discrepancies between them. This is because the forecasting error tends to decrease as the forecast time approaches the actual operation. The day-ahead forecast is conducted 24 h before the actual operation, while the intra-day forecast is carried out 1 h prior, utilizing the latest meteorological data and system status information, which may be less accurate at the time of the day-ahead forecast.
The forecast results of a typical day are shown in Figure 6a-d. From Figure 6, it can be observed that the accuracy of the day-ahead forecasts is significantly lower than that of the intra-day forecasts. This discrepancy can be attributed to two main reasons: (1) The dataset for intra-day forecasting is updated more frequently, with a finer time scale, making it more responsive to fluctuations in the data. (2) The day-ahead forecast involves making predictions 24 steps ahead for a 24 h period, whereas intra-day forecasting involves only 6 steps for predicting 1 h ahead. Forecasts involving fewer steps tend to be more accurate.

Scheduling Results
The DDPG algorithm was implemented in Python using PyTorch and trained 500 times. We simulated the system's performance under various monsoon conditions, with three renewable energy microgrids set up to provide power output continuously. The energy storage system (ESS) has a rated capacity of 5000 kW with an efficiency of 0.9. The installed capacities for WEC, WT, and PV are 600 kW, 1000 kW, and 1100 kW, respectively.
The forecast data were input into both the day-ahead optimization DRL model and the day-ahead and intra-day rolling optimization DRL model, with the reward value convergence curve shown in Figure 7. It can be observed that under the optimization of the DDPG algorithm, the reward gradually increases and stabilizes quickly. The reward for rolling optimization scheduling is greater than that for day-ahead optimization scheduling. This is because the rolling optimization scheduling has a more precise scheduling plan, which reduces penalty costs.
As indicated in the hypothetical Figure 8, the intra-day adjustment phase allows for the precise allocation of electricity plans every 10 min, which is not feasible with day-ahead scheduling. It is important to note that the intra-day adjustment phase of rolling optimization scheduling outputs scheduling plans on a 10 min timescale, whereas day-ahead scheduling operates on a 1 h basis. To analyze the changes and costs between rolling optimization scheduling and day-ahead optimization scheduling more intuitively, we standardized the time scale: we aggregated the intra-day adjustment plans over each 1 h period, converting the unit of measurement from power (kW) to electricity generation and consumption (kWh).
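The time-scale standardization described above amounts to integrating 10 min power set-points over each hour. A minimal sketch follows, assuming each value is the average power over its 10 min interval; the function name and sample values are hypothetical.

```python
def hourly_energy_kwh(power_kw_10min):
    """Sum 10 min average power (kW) into hourly energy (kWh).

    Each 10 min interval lasts 1/6 h, so its energy is power * (1/6) h.
    """
    if len(power_kw_10min) % 6 != 0:
        raise ValueError("expected a whole number of hours (6 samples each)")
    return [sum(power_kw_10min[i:i + 6]) / 6.0
            for i in range(0, len(power_kw_10min), 6)]


# Hypothetical two hours of 10 min dispatch set-points in kW.
plan_kw = [600, 600, 660, 660, 540, 540, 300, 300, 300, 360, 360, 180]
energy_kwh = hourly_energy_kwh(plan_kw)
```

After this conversion the intra-day plan has one value per hour and can be compared with the day-ahead plan directly.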
The scheduling results obtained are shown in Figure 9. The rolling optimization scheduling results are analyzed horizontally. As shown in Figure 9a,b, during a typical summer day, the sunlight conditions are favorable, allowing for stable photovoltaic power generation. Wind resources are concentrated between 01:00 and 07:00 and between 13:00 and 17:00, with the generated surplus energy being stored in the ESS or sold to the main grid at appropriate times. A peak load occurs between 09:00 and 13:00, with the ESS output compensating for the shortage in generation. Notably, on this day, there is almost no renewable energy generation between 19:00 and 22:00, with the ESS playing a key role in maintaining the power supply. Unlike the summer day, the typical winter day experiences poor sunlight conditions, resulting in significantly insufficient energy from the photovoltaic system. However, abundant wind resources on this day provide ample wave and wind energy, with the ESS again playing a crucial role during the evening peak load period. The presence of the ESS allows the power supply system to achieve peak shaving and valley filling from the generation side, enhancing system supply stability while avoiding energy waste.
Vertically comparing rolling optimization scheduling with day-ahead optimization scheduling, Figure 9a,c and Figure 9b,d show their scheduling plans on a typical summer monsoon day and a typical winter monsoon day, respectively. In Figure 9b,d, from 11:00 to 19:00, there is a significant difference in the predicted power generation from wave and wind energy. Upon receiving more accurate prediction results, the rolling optimization scheduling algorithm adjusted the scheduling plan in a timely manner, reducing system costs. In Figure 9a,c at 11:00-13:00 and 21:00-23:00, and in Figure 9b,d at 7:00-11:00 and 19:00-22:00, there are significant load power fluctuations. The rolling optimization scheduling balanced the power fluctuations by adjusting the renewable energy generation based on prediction deviations and utilizing the energy storage system.
The analysis of climate conditions showed that the output proportion of the various renewable energy sources varies under different monsoon conditions. During the summer monsoon, the region experiences long sunshine hours and high-intensity solar radiation, leading to a high proportion of photovoltaic power generation; wind resources are relatively scarce, resulting in lower proportions of wind and wave energy. During the winter monsoon, when the region experiences high wind speeds and stable wind directions, wind and wave generation contribute more to the output.
An analysis of electricity prices shows that the inclusion of the energy storage system significantly contributes to reducing the system's electricity purchasing costs. Observing the State of Charge (SOC) of the energy storage system, it is evident that when electricity prices are high during the day, the system minimizes electricity purchases from the main grid by releasing the stored energy from the ESS to the maximum extent.
The subsequent analysis focuses on how intra-day optimization adjustments address the issue of power fluctuations. Here, "power curtailment" and "power deficit" relate specifically to the dispatch plan, not the overall power supply system. This distinction is crucial because, although the integration of energy storage systems and the main grid can dynamically balance power fluctuations to prevent system-wide power curtailment and power deficit, significant deviations in the dispatch plan can lead to substantial short-term overload stress on the equipment. The power curtailment and power deficit in question result from power fluctuations caused by dispatch deviations due to forecast inaccuracies. The objective of the proposed rolling optimization scheduling algorithm is to significantly reduce the occurrence of these issues, thereby lowering costs and enhancing the stability of the system. Figure 10 illustrates the role of the rolling optimization scheduling strategy in mitigating power fluctuations within the day-ahead scheduling plan, detailing how intra-day adjustment plans counteract the effects of power fluctuations, thereby mitigating instances of power curtailment and power deficit.
In Figure 10a, during the 9:00 time slot, the power fluctuation is less than 0, indicating that a power deficit occurred in the day-ahead scheduling plan for that period. In the 7:00-10:00 time slot in Figure 10b, the power fluctuation is greater than 0, indicating surplus electricity, hence a power curtailment event. Rolling scheduling optimization mitigates power fluctuations by adjusting the supply conditions of each generation unit and the charging/discharging states of storage units in a timely manner.
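The sign convention used when reading the power fluctuation curves can be expressed as a small helper. This is only a sketch: the function name, the optional tolerance, and the sample values are hypothetical, while the mapping of positive values to curtailment and negative values to deficit follows the text.

```python
def classify_fluctuation(fluctuation_kw, tolerance_kw=0.0):
    """Label a dispatch-plan power fluctuation by its sign."""
    if fluctuation_kw > tolerance_kw:
        return "curtailment"  # scheduled supply exceeds demand
    if fluctuation_kw < -tolerance_kw:
        return "deficit"      # scheduled supply falls short of demand
    return "balanced"


# Hypothetical hourly fluctuations (kW) of a day-ahead plan.
fluctuations = {"07:00": 35.0, "09:00": -42.0, "12:00": 0.0}
labels = {hour: classify_fluctuation(p) for hour, p in fluctuations.items()}
```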

Cost Analysis
Earlier, we mentioned the potential of wave energy to enhance the complementary nature of the system's energy resources, with a theoretical analysis provided. Next, we aim to validate the impact of wave energy generation on the system through simulation experiments. In these experiments, we excluded the wave energy generation microgrid. To isolate the variable, the installed capacity of the WT was increased to 1600 kW, with all other equipment specifications remaining as before.
Figure 11 presents the scheduling plans in the two scenarios. As observed from the figure, despite the consistency in the installed capacity of the power generation system, ESS capacity, etc., across both simulations, the inclusion of WEC resulted in a greater reserve of electricity. In Figure 11a, there is surplus electricity that is sold back to the main grid, further reducing costs. Comparing the period 21:00-23:00 in Figure 11a,b, the system with WEC still has enough energy in the ESS to cover the day's energy shortfall, whereas the system without WEC has to purchase more expensive electricity from the main grid. Table 5 compares the operating costs from the two simulation experiments, demonstrating that the system incorporating WT, PV, and WEC has an operating cost approximately 3.8% lower than the system solely comprising WT and PV (7888.49 versus 8198.15).

Scenarios     Operating Cost
WT-PV-WEC     7888.49
WT-PV         8198.15

Then, we compared the system costs of using only day-ahead optimization with those of employing both day-ahead and intra-day rolling optimization. According to the description in the previous section, the system costs are represented in two forms, (a) and (b), which are the day-ahead optimization and rolling optimization system costs, respectively. Figure 12 illustrates the curtailment and deficit situations within the same typical day for the two optimization strategies, where values greater than 0 represent power curtailment and values less than 0 indicate power deficit. Figure 12a represents the scheduling results for a typical summer monsoon day; Figure 12b represents the scheduling results for a typical winter monsoon day. The system employing the day-ahead and intra-day rolling optimization strategy exhibits significantly lower power curtailment and deficits compared to the system that only uses day-ahead optimization. As a result, the costs associated with $C_t^{\text{abandon}}$ and $C_t^{\text{deficit}}$ are significantly higher for the day-ahead optimization alone, leading to increased overall costs. As shown in Table 6, the system costs using rolling optimization scheduling are reduced by 16.1% and 22% on typical winter and summer days, respectively, compared to using day-ahead scheduling.
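The percentage reductions quoted in this section are relative differences between two cost figures. A minimal sketch using the Table 5 values is given below; the function name is hypothetical, and the choice of the higher-cost scenario as the baseline is an assumption.

```python
def relative_reduction(baseline_cost, new_cost):
    """Fractional cost reduction of new_cost relative to baseline_cost."""
    return (baseline_cost - new_cost) / baseline_cost


# Operating costs reported in Table 5.
cost_wt_pv = 8198.15
cost_wt_pv_wec = 7888.49
saving = relative_reduction(cost_wt_pv, cost_wt_pv_wec)
print(f"{100 * saving:.1f}%")
```

The same helper applies to the Table 6 comparison once the day-ahead and rolling optimization costs are substituted.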

Conclusions
This paper proposes a multi-time-scale rolling optimization scheduling model that integrates offshore wind, wave, and photovoltaic energy sources in the Bohai and Yellow Sea regions. This model can enhance grid stability and mitigate the impacts of renewable energy variability on the distribution system by leveraging the complementary characteristics of these energy sources.
Through an in-depth analysis of the collected dataset on renewable energy generation in the test maritime area, we confirmed the complementarity between wind, photovoltaic, and wave energies. Using a CNN-LSTM network, we predicted the power generation for both the day ahead (24 h) and intra-day (1 h), capturing short-term and long-term trends effectively. Subsequently, the DDPG algorithm was employed to explore the predicted state space and identify the optimal scheduling strategy.
Simulation experiments evaluated the system's performance in various monsoonal conditions. The results demonstrate that day-ahead and intra-day rolling optimization can effectively balance power fluctuations through timely intra-day adjustments. The application of energy storage systems (ESSs) also bolstered the system's capability to cope with renewable energy fluctuations, reducing electricity purchase costs from the grid. Horizontal and vertical analyses showed that rolling optimization scheduling reduces curtailment and power deficits more effectively than traditional day-ahead scheduling, thereby lowering the overall system costs. On typical winter and summer days, the costs of systems using rolling optimization scheduling decreased by 16.1% and 22%, respectively. This study offers valuable insights for the efficient management and optimization scheduling of renewable energy grids in other maritime regions.
sources, F.L.; data curation, H.W. and F.M.; writing-original draft preparation, R.X.; writing-review and editing, F.L. and J.L.; visualization, R.X. and H.W.; supervision, F.L.; project administration, F.L.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number U2006222.
Institutional Review Board Statement: Not applicable.

Figure 1. The annual power generation from marine renewable energy sources.

relatively stable over the year, contributing positively to enhancing the stability of grid operations.

Figure 2. The entropy values from marine renewable energy sources.

Figure 4. Day-ahead and intra-day optimization scheduling strategy.

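The scheduling plan is obtained by adding the noise $n_i$ to the Actor output and clipping the result to [−1, 1]: $a_i^{\text{Day-ahead}} = \min(\max(\mu(s_i^{\text{Day-ahead}} \mid \theta^{\mu}) + n_i, -1), 1)$. The agent executes this plan and receives a reward; the reward function for day-ahead scheduling is detailed in Equation (15). At this juncture, the Critic network assesses the action value function, $Q(s_i^{\text{Day-ahead}}, a_i^{\text{Day-ahead}})$, of the current scheduling plan, $a_i^{\text{Day-ahead}}$, evaluating the plan's value. Then, $s_{i+1}^{\text{Day-ahead}}$ is fed into both the Target Actor and Target Critic networks. The Target Actor proposes a scheduling plan, $a_{i+1}^{\text{Day-ahead}}$, for state $s_{i+1}^{\text{Day-ahead}}$, and the Target Critic calculates the maximum Q-value, $Q_{\text{target}}^{\text{Day-ahead}}$, for the optimal action of the next state via the Bellman optimality in Equation (17), which is relayed back to the Critic network. The quadruplet ($s_i^{\text{Day-ahead}}$, $a_i^{\text{Day-ahead}}$, $r_i^{\text{Day-ahead}}$, $s_{i+1}^{\text{Day-ahead}}$) is stored in the experience replay pool (R) for subsequent training.
$$Q_{\text{target},i}^{\text{Day-ahead}} = r_i^{\text{Day-ahead}} + \gamma\, Q'\big(s_{i+1}^{\text{Day-ahead}}, \mu'(s_{i+1}^{\text{Day-ahead}} \mid \theta^{\mu'}) \,\big|\, \theta^{Q'}\big) \quad (17)$$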

Figure 5. CNNLSTM-DDPG algorithm.

Following this, the parameters of both the Actor and Critic networks are updated. The Critic receives the evaluation $Q_{\text{target},i}^{\text{Day-ahead}}$ and compares it against its own estimate, $Q(s_i^{\text{Day-ahead}}, a_i^{\text{Day-ahead}})$.

Figure 6. Power forecast results: (a) the PV forecast result; (b) the WT forecast result; (c) the Netload forecast result; (d) the WEC forecast result.

Figure 7. The reward of the DDPG algorithm.

Figure 9. The scheduling result: (a) the rolling optimization scheduling with summer monsoon days; (b) the rolling optimization scheduling with winter monsoon days; (c) the day-ahead optimization scheduling with summer monsoon days; (d) the day-ahead optimization scheduling with winter monsoon days.

Figure 10. Power fluctuation on different typical days: (a) the result of summer monsoon days; (b) the result of winter monsoon days.

Figure 12. Power deficit and power curtailment scenarios on different typical days: (a) the result of summer monsoon days; (b) the result of winter monsoon days.

Table 3. The specific values of a, b, and c.

occurs and the power demand of users cannot be met during time period t, this is referred to as a power deficit situation. The sum on the left side of the inequality represents the scheduled power P

Table 4. The values of parameters.
WEC k 0.2 abandon k 0.8 k WT (P

Table 5. The costs of WT-PV-WEC and WT-PV.

Table 6. The costs of optimization scheduling.