1. Introduction
In recent years, with the rapid development of renewable energy resources, a new type of electric power system characterized by cleanliness, low carbon emissions, intelligent friendliness, and interactive openness has gradually taken shape. In this new power system, the intermittency, volatility, and uncertainty of a high proportion of renewable energy sources have led to a “trilemma” of energy security, economy, and sustainability. Navigating this predicament has become a significant challenge for grid dispatch departments [1]. Grid dispatching strategies can be divided by temporal scale into day-ahead, intra-day, and real-time scheduling. Among these, real-time scheduling demands the highest computational timeliness and further corrects the outcomes of day-ahead and intra-day scheduling based on accurate ultra-short-term forecasts of renewable energy sources and loads. However, real-time dispatch has limited capability to mobilize the flexible resources of units within a short timeframe [2], and it is challenging to precisely achieve power balance and meet the N-1 operating requirements [3], especially in scenarios with a high proportion of renewable energy. Therefore, designing a secure and efficient real-time dispatch method for modern power systems is of crucial importance.
Traditional power grid scheduling primarily employs model-driven approaches, which construct power system models for typical operational scenarios and use them to analyze and optimize grid operation strategies subject to various system constraints [4,5,6,7,8]. Scheduling methods based on grid expert strategies [4] typically use models incorporating a limited number of grid forecast scenarios for offline fault analysis, identify system vulnerabilities, and then formulate scheduling plans based on dispatchers’ personal experience [5]. However, grid expert strategy methods struggle to promptly address real-time safety issues in scenarios with a high proportion of renewable energy sources and are insufficiently adaptable; moreover, they have a low level of intelligence, and their decision-making is prone to human error [5]. Scheduling methods based on mathematical optimization algorithms primarily rely on robust optimization [6], stochastic programming [7], and chance-constrained programming [8]. Reference [6] uses robust optimization to establish a grid scheduling model that considers the uncertainty of wind power output, yet this method generates overly conservative scheduling decisions because it strictly adheres to operational safety constraints during the solution process, failing to fully exploit the system’s potential to absorb renewable energy. Reference [7] applies stochastic programming to transform a grid scheduling model with wind power output uncertainty into a deterministic scheduling model that can be solved, but this method faces challenges due to the uncertainty of renewable energy outputs and the large scale of probabilistic modeling. Reference [8] describes wind power output uncertainty using a Gaussian mixture model and proposes a chance-constrained programming scheduling method that converts hard constraints in the solution model into soft constraints and allows a certain degree of safety constraint violation; however, this method requires the probability distribution of wind power output to be determined, which introduces errors into the scheduling results. Overall, traditional model-driven scheduling methods are essentially a static, deterministic decision-making paradigm. As the penetration rate of renewable energy continues to increase, their fundamental deficiencies are twofold: the decision-making mechanism is rigid and lacks adaptability, and there is a fundamental conflict between computational complexity and the requirement for real-time performance.
These systematic deficiencies make traditional methods difficult to apply in new-type power systems dominated by renewable energy sources. Specifically, the system’s flexibility must be improved to accommodate fluctuating power sources without reducing the safety margin. Consequently, traditional methods can no longer support the sustainable development requirements of modern power grids.
The abundance of data in power systems lays a foundation for data-driven artificial intelligence methods for real-time scheduling. Data-driven methods such as Reinforcement Learning (RL) have emerged as promising solutions, and RL offers a powerful framework for solving sequential decision problems in power system scheduling [9]. Despite this potential, RL methods face certain challenges. Reference [10] introduces the double deep Q-learning method, which uses deep neural networks as function approximators to improve the efficiency of solving large grid state spaces. Nevertheless, this method relies on discrete state and action spaces, which suffer from the curse of dimensionality and lead to increased computational complexity. Although the deep deterministic policy gradient algorithm discussed in reference [11] mitigates the dimensionality problem by converting discrete variables into continuous ones, its training time remains long (62.2 h). Additionally, the Asynchronous Advantage Actor-Critic (A3C) algorithm also reduces the dimensionality of the scheduling model but does not address the N-1 operational risk in its reward function design [12]. These RL methods often rely on random search to obtain rewards; when applied to large power systems, the large dimensionality of the action space results in lengthy training times and makes it difficult to converge to the optimal policy [13].
To address the real-time performance issues inherent in RL, Behavioral Cloning (BC) methods have been proposed as an alternative. Unlike RL, which requires extensive exploration, BC learns by imitating existing scheduling strategies, thus achieving faster and more accurate decision-making with less data [14,15,16]. As a result, BC methods offer a significant improvement in solving real-time scheduling problems, with exponentially lower sample complexity and higher computational efficiency. Reference [17] shows that BC can improve the training efficiency of optimized agents by approximately 37.5% compared to RL algorithms in cloud resource scheduling. Reference [17] demonstrates that BC can achieve an energy-saving rate of more than 50% in online vehicle edge computing task scheduling compared to baseline models.
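The idea of learning by imitation can be illustrated with a minimal sketch. It treats BC as plain supervised regression on expert (state, action) pairs and is not the GES-GNNBC model: the scalar state, the linear policy, the learning rate, and the `expert_policy` rule are all assumptions made for illustration.

```python
# Toy behavioral cloning: fit a linear policy a = w * s + b to expert pairs.
# Everything here is illustrative; the paper's model is a GNN, not a line fit.


def expert_policy(state):
    """Hypothetical expert rule standing in for an existing scheduling strategy."""
    return 0.5 * state + 1.0


def clone_policy(states, actions, lr=0.5, epochs=2000):
    """Minimise the mean squared imitation error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(states)
    for _ in range(epochs):
        # Gradients of mean((w*s + b - a)^2) with respect to w and b.
        grad_w = sum(2 * (w * s + b - a) * s for s, a in zip(states, actions)) / n
        grad_b = sum(2 * (w * s + b - a) for s, a in zip(states, actions)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


states = [0.0, 0.25, 0.5, 0.75, 1.0]
actions = [expert_policy(s) for s in states]   # expert demonstrations
w, b = clone_policy(states, actions)
print(round(w, 3), round(b, 3))  # 0.5 1.0
```

No environment interaction or reward signal is needed here, which is why training of this kind tends to be much cheaper than exploration-based RL.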
However, the application of BC in real-time power system scheduling remains underexplored. Despite its advantages in real-time performance, BC has limited ability to maintain high scheduling precision because it often neglects the inherent topological structure of power systems, which leads to a loss of critical information and reduced scheduling accuracy [18]. To overcome this, Graph Neural Networks (GNN), an active research area in recent years, can effectively capture the topological information of power systems and model the complex nonlinear relationships between nodes, significantly improving scheduling accuracy. Reference [19] indicates that GNN outperforms traditional multilayer perceptrons (MLP) in reactive power optimization problems in power systems. Thus, combining GNN with BC methods to learn optimal scheduling strategies can better utilize the topological structure of the system, compensate for the limitations of traditional methods in handling network structural information, and ultimately improve both the accuracy and efficiency of scheduling strategies.
This paper introduces a Grid Expert Strategy-Based Graph Neural Networks Behavioral Cloning (GES-GNNBC) method for real-time power system dispatch. Its main features are as follows: (1) A grid model based on graph theory is designed that provides real-time information on nodes, upstream and downstream unit information on branches, and unit-to-branch connectivity, which can be used to optimize grid operation. (2) A Grid Expert Strategy (GES) is proposed that takes into account grid operation optimization and power balance control. This strategy adjusts the active power output of units across the entire grid based on unit-to-branch association information and the full dispatch criteria for renewable energy sources, thereby relieving grid overloads and optimizing the real-time load rate. Moreover, the strategy allows the combination of flexible units to be adjusted in real time, balancing system power in real time and enhancing the safety margin of balancing units. (3) A GNN is incorporated to effectively capture the topological information of the power system, which significantly improves scheduling accuracy while retaining the efficient training process of the BC approach. The GNN-based BC model is fused with GES, realizing a model-data-driven GES-GNNBC method for real-time dispatch.
The remainder of this paper is organized as follows:
Section 2 introduces the operational rules of the power grid and the foundational technologies related to GES-GNNBC;
Section 3 proposes the GES design scheme, including grid modeling based on graph theory and the grid expert strategy;
Section 4 presents the GES-GNNBC implementation framework, comprising an introduction to the GNN architecture, the specific design scheme for BC, and the integration and application of GES-GNNBC in real-time power system dispatch;
Section 5 validates the proposed method and compares it with other methods on a modified IEEE 33-bus system;
Section 6 concludes the paper.
5. Case Study
To verify the effectiveness of the GES-GNNBC method, this paper conducts a case study based on the IEEE 33-bus standard system. The IEEE 33-bus test system, taken from the MATPOWER 7.1 toolbox, comprises 1 generator unit, 32 transmission lines, and 32 load nodes. To simulate the output of renewable energy units, this paper modifies the 33-bus system by replacing the unit at node 29 with a photovoltaic power station and the unit at node 15 with a wind farm, as illustrated in Figure 3. The rated capacity of the renewable energy units is 1.3 times that of the original synchronous units, and the modified system reaches a high renewable share of 28.1% of installed capacity. The renewable energy output and load data are based on publicly available data from the California power grid in March, June, September, and December 2021 [25]. To match the scale of the IEEE 33-bus system, the renewable energy output data are scaled to 25% of the actual values, and the load data are scaled to 20% of the actual values.
This paper utilizes PYPOWER 5.1.17, a Python 3.9 version of the widely used MATPOWER, as a grid environment simulator. Setting in Equation (4), in Equation (7), in Equation (16), in Equation (22), and , in Equation (28) are 0.57 pu, 0.43 pu, respectively. The loss function required for GES-GNNBC training is shown in Equation (34) where is set to be 2 and is 0.02. The reward function is shown in Equation (15), where and , , and are 1, 2, and 1, respectively.
To verify the training efficiency and generalization ability of GES-GNNBC, this study compares three approaches, namely the traditional expert method (TEM) [4], the A3C-based grid scheduling method [12], and the proposed GES-GNNBC method, in terms of their grid operation optimization and power balance control capabilities.
5.1. Experimental Settings
For expert experience data collection, each group of expert experience data includes the real-time state of the grid and the scheduling policy, following the state space S and action space A in Equations (32) and (33). One scheduling day consists of 288 groups of GES input-output data at 5-min intervals. We randomly select 25 scheduling days in each of March, June, September, and December and process them with GES, obtaining a total of 28,800 groups of expert experience (288 × 25 × 4).
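The size of the expert data set follows directly from the sampling scheme, as the short sketch below shows. The helper names, the random seed, and the use of calendar day numbers are assumptions for illustration; the paper does not specify how the days were drawn.

```python
import random

STEPS_PER_DAY = 24 * 60 // 5          # 288 five-minute dispatch steps per day
DAYS_SAMPLED_PER_MONTH = 25
MONTH_LENGTHS = {"March": 31, "June": 30, "September": 30, "December": 31}


def sample_scheduling_days(seed=0):
    """Pick 25 distinct days from each of the four months (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        month: sorted(rng.sample(range(1, n_days + 1), DAYS_SAMPLED_PER_MONTH))
        for month, n_days in MONTH_LENGTHS.items()
    }


def count_expert_samples(days_by_month):
    """Each sampled day contributes one (state, action) pair per 5-minute step."""
    return sum(len(days) for days in days_by_month.values()) * STEPS_PER_DAY


selected = sample_scheduling_days()
total = count_expert_samples(selected)
print(total)  # 28800
```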
First, GES-GNNBC is trained offline using expert experience data and the BC algorithm for 10,000 iterations; second, to demonstrate the stability of GES-GNNBC during the training process, A3C is trained for the same training period as GES-GNNBC. Since the TEM scheme does not require training and has no convergence problems, this section only compares the training convergence of A3C with that of GES-GNNBC in its BC training phase.
5.2. Training Convergence Comparison
The results of the training convergence comparison between the GES-GNNBC and A3C algorithms are shown in Figure 4. The reward value is the average reward of a single decision within a scheduling day; because the scheduling scenarios vary, the reward value oscillates to a reasonable degree under both algorithms. In terms of training results, GES-GNNBC reaches a high reward value of 1.96 at the beginning of training, improves further through BC training, and converges stably after 2000 scheduling cycles with an average value of 2.02, whereas the A3C algorithm still has a lower reward value than GES-GNNBC even after 7500 scheduling cycles, with an average value of 1.54. In terms of training time, GES-GNNBC takes 1.2 h to reach stable convergence (2000 scheduling cycles), while A3C takes 22.4 h to train, which is roughly 18.7 times as long. In conclusion, the training efficiency of GES-GNNBC is better than that of the A3C algorithm, and GES-GNNBC has more potential for application in the real-time scheduling of power systems.
5.3. Comparison of Generalization Capability
To verify the generalization capability of the proposed GES-GNNBC, this paper selects six large grid disturbance events as test events to compare the three scheduling schemes: (1) three branch N-1 fault scenarios, namely faults on lines 5, 12, and 19; (2) two scenarios with a steep rise or steep fall in renewable energy output, with a magnitude of ±15% of total generation; and (3) one load surge scenario with a surge magnitude of 10%.
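The six test events can be written down as a small scenario table, sketched below. The `Disturbance` class, the `apply_event` helper, and the toy state dictionary are hypothetical, and reading the two renewable scenarios as one +15% rise and one -15% fall is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Disturbance:
    kind: str                       # "line_outage", "renewable_ramp" or "load_surge"
    line: Optional[int] = None      # branch index for N-1 faults
    change: Optional[float] = None  # fractional change for ramps and surges


TEST_EVENTS = [
    Disturbance("line_outage", line=5),
    Disturbance("line_outage", line=12),
    Disturbance("line_outage", line=19),
    Disturbance("renewable_ramp", change=+0.15),  # steep rise (assumed split)
    Disturbance("renewable_ramp", change=-0.15),  # steep fall (assumed split)
    Disturbance("load_surge", change=+0.10),
]


def apply_event(event, state):
    """Return a copy of a toy grid state with the disturbance applied."""
    new_state = dict(state)
    if event.kind == "line_outage":
        new_state["lines_in_service"] = state["lines_in_service"] - {event.line}
    elif event.kind == "renewable_ramp":
        # Ramp magnitude is expressed as a share of total generation.
        delta = event.change * state["total_generation_mw"]
        new_state["renewable_mw"] = state["renewable_mw"] + delta
    elif event.kind == "load_surge":
        new_state["load_mw"] = state["load_mw"] * (1 + event.change)
    return new_state


base = {
    "lines_in_service": set(range(1, 33)),  # 32 branches
    "total_generation_mw": 100.0,           # illustrative numbers only
    "renewable_mw": 30.0,
    "load_mw": 95.0,
}
after_outage = apply_event(TEST_EVENTS[0], base)
print(len(after_outage["lines_in_service"]))  # 31
```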
The grid simulator runs and solves the grid power flow based on the scheduling decisions returned by the agent; if the power flow does not converge, the current scheduling scenario is judged to be non-convergent. The comparison results are shown in Table 1, where the average reward value is the total reward value in a scheduling cycle. As shown in Table 1, GES-GNNBC converges in all 6 scheduling scenarios, while the TEM and A3C algorithms converge in 4 and 3 scenarios, respectively. These results indicate that, as previously discussed, the TEM approach is inherently rigid and has limited adaptability, so it fails to accommodate power grids with high penetration of renewable energy sources. Meanwhile, the A3C results show that conventional RL methods encounter significant training challenges, which compromises their performance under such operating conditions. The average reward value of GES-GNNBC is 173.26, while those of the TEM and A3C algorithms are 128.52 and 88.47, respectively, so GES-GNNBC obtains a much higher reward value than the other two algorithms. In terms of renewable energy utilization, both GES-GNNBC and TEM perform well, with 99.98% and 99.96%, respectively, while the A3C algorithm is lower at 91.68%. This comparison shows that the grid operation optimization capability of GES-GNNBC yields higher branch optimization rewards and renewable energy utilization rewards; moreover, the power balance control strategy of GES-GNNBC makes the scheduling decisions more robust when coping with steep rises or falls in renewable energy output, load surges, and generator outages. In addition, because GES-GNNBC learns online on the basis of the GES, its agent model converges more reliably and is more stable than that of the A3C algorithm. Therefore, GES-GNNBC has better generalization ability than TEM and A3C.
5.4. Decision Time Comparison
To analyze the solution efficiency of the scheduling methods, the single-step decision-making time of the three schemes is compared. The computer platform used is a 13th Gen Intel(R) Core(TM) i5-13490F at 2.50 GHz with 16 GB of RAM, and the average single-step decision-making time is shown in Figure 5. The traditional TEM method takes 1.8 s, while GES-GNNBC and A3C take 0.12 s and 0.14 s, respectively, so GES-GNNBC makes decisions 15 times faster than TEM. Although the decision times of the GES-GNNBC and A3C algorithms are close, the A3C algorithm is inferior to GES-GNNBC in training efficiency and generalization ability in the analysis of this paper; therefore, GES-GNNBC is highly efficient when applied to the real-time security scheduling of electric power systems.
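The 15-fold figure is simply the ratio of the reported decision times. The sketch below reproduces that ratio and shows one way an average single-step decision time could be measured; `mean_step_time` and its arguments are hypothetical and do not describe the authors' benchmarking procedure.

```python
import time


def mean_step_time(policy, states, repeats=5):
    """Average wall-clock seconds per single-step decision (hypothetical harness)."""
    start = time.perf_counter()
    for _ in range(repeats):
        for state in states:
            policy(state)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(states))


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


# Average single-step decision times reported in the text, in seconds.
reported = {"TEM": 1.8, "GES-GNNBC": 0.12, "A3C": 0.14}
print(round(speedup(reported["TEM"], reported["GES-GNNBC"]), 1))  # 15.0
```

Any callable can be passed as `policy`, for example `mean_step_time(lambda s: s, [0, 1, 2])`.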
5.5. Grid Load Optimization Comparison
To demonstrate the advantages of GES-GNNBC real-time scheduling, this section compares the grid load rate optimization achieved by the grid operation optimization strategy of GES-GNNBC with that of the other methods.
The data of March 5 are selected as the test scenario to compare the grid load optimization capability of GES-GNNBC, TEM, and A3C, and the results are shown in Figure 6.
Figure 6 compares the average branch load factor of the whole network at multiple time points in the scheduling cycle; the average branch load factor of GES-GNNBC is significantly lower than that of the TEM and A3C algorithms because GES-GNNBC makes the branches with large transmission capacity carry more of the power load.
Figure 6 also compares the variance of the branch load factor, which represents the uniformity of the branch load, and the variance for GES-GNNBC is significantly lower than that for the TEM and A3C algorithms. Therefore, GES-GNNBC is able to reduce the grid loading rate, improve the uniformity of the power flow distribution, and enhance grid transmission utilization.
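The two indicators compared here, the mean and the variance of the branch load factor, can be computed as in the sketch below. It assumes that the load factor of a branch is its absolute flow divided by its rating and uses the population variance; the four-branch flows and ratings are made-up numbers, not results from the paper.

```python
from statistics import fmean, pvariance


def branch_load_factors(flows_mw, ratings_mw):
    """Load factor of each branch: |flow| / rating (assumed definition)."""
    return [abs(flow) / rating for flow, rating in zip(flows_mw, ratings_mw)]


def load_summary(flows_mw, ratings_mw):
    """Mean and population variance of the branch load factors."""
    factors = branch_load_factors(flows_mw, ratings_mw)
    return fmean(factors), pvariance(factors)


ratings = [10.0, 10.0, 5.0, 5.0]   # two large and two small branches (toy data)
even_flows = [4.0, 4.0, 4.0, 4.0]  # same flow on every branch
shifted = [6.0, 6.0, 2.0, 2.0]     # more flow moved onto the large branches

print(tuple(round(x, 3) for x in load_summary(even_flows, ratings)))  # (0.6, 0.04)
print(tuple(round(x, 3) for x in load_summary(shifted, ratings)))     # (0.5, 0.01)
```

In this toy case, moving flow onto the higher-rated branches lowers both the mean load factor and its variance, which is the qualitative behaviour described for GES-GNNBC.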
The data of March 17 are selected as the test scenario, GES-GNNBC is used for real-time scheduling, and the scheduling results are shown in Figure 7. During 00:00–06:00, when the renewable energy output is high and the load demand is low, GES-GNNBC prioritizes meeting the load with renewable energy sources and reduces the thermal unit output. During 08:00–10:00, 11:00–12:00, and 20:00–21:00, when the load demand is high and the renewable energy output is low, GES-GNNBC maintains the power balance by increasing the output of thermal power units. At the same time, GES-GNNBC keeps the balancing unit’s output close to the midpoint of 395 MW (the balancing unit’s operating range is 338–452 MW), maximizing its real-time safety regulation margin.
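The 395 MW set point is the midpoint of the 338–452 MW operating range, which is where the smaller of the upward and downward regulation margins is largest. A short check, with hypothetical helper names:

```python
def midpoint(p_min_mw, p_max_mw):
    """Centre of the operating range."""
    return (p_min_mw + p_max_mw) / 2


def regulation_margins(p_mw, p_min_mw, p_max_mw):
    """Upward and downward headroom of a unit at output p_mw."""
    return p_max_mw - p_mw, p_mw - p_min_mw


P_MIN, P_MAX = 338.0, 452.0
p_mid = midpoint(P_MIN, P_MAX)
print(p_mid)                                    # 395.0
print(regulation_margins(p_mid, P_MIN, P_MAX))  # (57.0, 57.0)

# Away from the midpoint, one of the two margins shrinks.
print(min(regulation_margins(420.0, P_MIN, P_MAX)))  # 32.0
```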
In summary, GES-GNNBC improves the robustness of power system dispatching decisions by fully considering grid operation optimization and power balance control. At the same time, the online learning method of GES-GNNBC greatly improves the training efficiency and computational efficiency of intelligent algorithms and ensures that scheduling decisions are made in real time.