Article

Research on Power Flow Prediction Based on Physics-Informed Graph Attention Network

1 Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
2 New Energy and Power System Research Center, Ningbo Polytechnic University, Ningbo 315700, China
3 Macao Polytechnic University, Macao 999078, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10555; https://doi.org/10.3390/app151910555
Submission received: 28 August 2025 / Revised: 23 September 2025 / Accepted: 23 September 2025 / Published: 29 September 2025

Abstract

Microgrids are an emerging form of distributed energy system, and power flow prediction for microgrids plays a crucial role in optimizing energy dispatch and power grid operation. Traditional power flow prediction methods mainly rely on statistical and time series models and neglect the spatial relationships among different nodes within the microgrid. To overcome this limitation, a Physics-Informed Graph Attention Network (PI-GAT) is proposed to capture the spatial structure of the microgrid, and an attention mechanism is introduced to measure the importance weights between nodes. In this study, we constructed a representative 14-node microgrid power flow dataset. After collecting the data, we preprocessed and transformed it into a format suitable for graph neural networks. Next, an autoencoder was employed for pre-training, providing unsupervised dimensionality expansion to enhance the expressive power of the data. The augmented data is then fed into a graph convolution module with an attention mechanism, allowing adaptive weight learning and capturing relationships between nodes, and the physical state equation is integrated into the loss function to achieve high-precision power flow prediction. Finally, simulation verification was conducted, comparing the PI-GAT method with traditional approaches. The results indicate that the proposed model outperforms the other latest models across various evaluation indicators; specifically, it achieves a 46.9% improvement in MSE and a 14.08% improvement in MAE.

1. Introduction

In recent years, the rapid development of the social economy has led to an increase in energy demand and utilization rates. Nevertheless, a range of issues such as environmental pollution, energy imbalance, and the low utilization rate of new energy sources persist [1]. Consequently, there is a growing focus on establishing a novel energy utilization system known as the energy internet. Supported by artificial intelligence technologies like big data and machine learning, the energy internet integrates various datasets including power grid operation data, weather forecasts, and power market information for predictive purposes [2]. This enables real-time dynamic adjustments by all machines and systems, thereby ensuring optimal operational efficiency of the power grid. Microgrids, as a vital component of the energy internet, play a significant role in power system analysis and calculation. Power flow calculation within microgrids is of utmost importance, serving as the foundation of the power system’s computing environment and a prerequisite for its stability and reliability [3].
Classic power flow calculations are typically implemented using algorithms that prioritize convergence, speed, accuracy, and memory usage. These calculations often involve methods such as Gauss–Seidel iteration, Newton–Raphson method, and forward–backward iteration [4]. While these methods have been refined to some extent, they still possess certain limitations. Primarily relying on statistics and time series analysis, these methods utilize historical data for modeling and forecasting. However, they often overlook the impact of spatial relationships and fail to capture the propagation and interaction of power flows between different regions [5].
The rapid development of deep learning across various domains has sparked significant interest in power flow prediction methods based on neural networks. Neural networks can automatically learn feature representations from data and exhibit strong nonlinear modeling capabilities [6]. Consequently, researchers have extensively explored the potential of artificial intelligence in power flow calculation. Pan proposed a power flow prediction model based on a multi-layer perceptron, which effectively handles grid power flow values under simple contingency conditions [7]. Mahela, on the other hand, introduced a complex-valued neural network combined with a fuzzy clustering algorithm to estimate bus voltage [8]. This approach yields more reliable results while requiring less training data and reduced training time. In the context of distributed power grids, Li devised a radial basis function neural network to calculate the power flow of a microgrid incorporating wind and solar energy sources [9]. The Graph Convolutional Network (GCN) has proven to be an effective method for processing data with complex spatial structures. By performing convolution operations on the graph structure, GCN captures the relationships and interactions between nodes, combining the feature information of a node with that of its neighboring nodes. However, the traditional GCN method overlooks the attention mechanism in spatial relationships, which limits its performance in power flow prediction. In power flow prediction tasks, the interactions between nodes carry different levels of importance. For instance, in grid power flow forecasting, certain nodes may exhibit stronger interactions and hold greater significance, while others may have only limited influence [10]. The traditional GCN method fails to discern these differences in importance between nodes, thereby underutilizing the information from key nodes for accurate prediction. Researchers have proposed using a weighted adjacency matrix so that the strength of the relationships between nodes is reflected through weights, but this approach still faces issues such as weight selection and increased complexity. In addition, it lacks optimization for the specific task of power flow prediction.
Aiming at the limitations of the traditional GCN method, a power flow prediction method based on a physics-informed graph attention network (PI-GAT) is proposed. The PI-GAT method uses a graph convolutional network to model the spatial relationships and combines an attention mechanism to obtain the importance weights between nodes. By effectively learning the interactions between nodes, future power flow changes can be predicted more accurately. In addition, physical knowledge is integrated into the model design to ensure that the prediction results conform to physical laws: the state equation function is incorporated into the optimization process of the model as a physical constraint, and the power flow equations are integrated into the calculation of the model loss function. Gradient descent and related optimization algorithms are adopted to minimize the comprehensive loss function, thereby achieving consistency between the state equation and the prediction results. The main contributions of this paper are as follows:
1. We introduce a novel spatial attention graph convolution module designed to learn the weights between nodes. This module effectively adapts to modeling spatial relationships and dynamically adjusts the level of interaction between nodes based on their respective importance. By incorporating the attention mechanism, we can more accurately capture the intricate relationships and interactions between nodes.
2. We construct a comprehensive end-to-end power flow forecasting model, which integrates multiple spatial attention graph convolution modules with traditional neural network components. The model effectively combines spatial relationships with temporal information and can more robustly capture the inherent dynamic characteristics of power flow changes. In addition, the state equation of the power flow calculation is built into the loss function, so that the physical constraints are integrated into the optimization process of the model. Gradient descent and related optimization algorithms are used to minimize the comprehensive loss function, achieving consistency between the state equation and the prediction results.
3. We use the PYPOWER software package combined with VSCODE to build a 14-node AC microgrid and complete the simulation and data acquisition. After data construction, extensive functional tests and comparative analyses were carried out on the proposed model. The experimental results demonstrate the significant advantages of the PI-GAT method in power flow prediction tasks. Compared to other models, our method achieves higher accuracy and improved predictive capability. By comparing it with other commonly used power flow prediction methods, we validate the effectiveness and superiority of our approach.
The rest of the paper is organized as follows: Section 2 presents the basic theory of the approach adopted by the PI-GAT model. Section 3 introduces the 14-node microgrid topology and data characteristics. Section 4 introduces the structure and pipeline of the model. Section 5 presents the experimental setup, results, and discussion. Section 6 concludes the paper.

2. Basic Theory

2.1. Graph Neural Network Theory

A graph neural network (GNN) is a deep learning model specifically designed for processing graph-structured data. It can learn the characteristics of a graph through the relationships between nodes and edges. To elaborate, consider a graph network (GN) block as an example, which operates as a module transforming one set of interconnected data points (referred to as a graph) into another set in a similar format. Simply put, when we provide a graph as input, the GNN processes it, producing a modified graph as output. It is important to clarify that in this context, "graph" signifies a dataset organized in a specific interconnected structure, not necessarily a visual image. A graph can be defined as a triplet $G = (\mu, V, E)$, where $\mu$ denotes the global representation of the graph, and $V$ and $E$ represent the set of nodes and the set of edges, respectively. An edge $E_{ij} = (V_i, V_j)$ in the edge set $E$ connects a pair of nodes $V_i$ and $V_j$, indicating that the two nodes are related.
If the relationship is symmetric, the edge is undirected; if it is asymmetric, the edge is directed. In a microgrid, because current flow is directional, the edges in the graph are directed. Each block contains three update functions $\phi$ and three aggregate functions $\rho$. The update functions update the data of each edge, each node, and the whole graph, respectively, while the aggregate functions integrate and process the corresponding input data [11]. The working principle is shown in Figure 1 and the calculation formula is given in Equation (1).
$$H^{(l+1)} = \sigma\left((D+I)^{-0.5}(A+I)(D+I)^{-0.5} H^{(l)} W^{(l)}\right)$$
where $H^{(l)}$ stands for the node representation at the $l$-th layer of the network. For the nonlinear activation function $\sigma$, ReLU is used here. $D$, $A$, and $I$ represent the degree matrix, the adjacency matrix, and the identity matrix, respectively. $W^{(l)}$ is the weight matrix of the $l$-th network layer.
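For illustration, a minimal sketch of the propagation rule in Equation (1) is given below, implemented in PyTorch on a dense adjacency matrix. The function and tensor names are assumptions for exposition, not the authors' implementation.

```python
import torch

def gcn_layer(H, A, W, activation=torch.relu):
    # One propagation step of Equation (1):
    # H^{l+1} = sigma((D+I)^{-1/2} (A+I) (D+I)^{-1/2} H^l W^l)
    # H: (N, F) node features, A: (N, N) adjacency matrix, W: (F, F_out) layer weights.
    A_tilde = A + torch.eye(A.shape[0], dtype=A.dtype)     # add self-loops: A + I
    d_inv_sqrt = torch.diag(A_tilde.sum(dim=1).pow(-0.5))  # (D + I)^{-1/2}
    return activation(d_inv_sqrt @ A_tilde @ d_inv_sqrt @ H @ W)
```

Stacking several such layers, each with its own learned weight matrix, yields the multi-layer GCN described above.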

2.2. Autoencoder Theory

The autoencoder is a widely used artificial neural network in self-supervised learning tasks. Its primary objective is to perform representation learning on input data by using the input itself as the learning target. Through this process, it achieves feature extraction from the input information [12]. It is important to note that the traditional autoencoder conducts dimensionality reduction by transforming data from high-dimensional space to low-dimensional space, thus extracting the main features. However, in this research, the autoencoder employed follows a reverse process, transforming data from low-dimensional space to high-dimensional space and then back to low-dimensional space [13]. The main purpose of this approach is to enhance the details of the original data and serve the functions of data generation and sample expansion. The basic structure of the autoencoder used in this study is illustrated in Figure 2.
It is important to note that in the autoencoder, various types of layers such as linear layers, convolutional layers, and RNN layers can be employed in each hidden layer. Each type of layer can have different effects on data reconstruction, and the number and size of layers can also impact the network’s expressive capacity. Therefore, it is crucial to design an appropriate autoencoder structure to facilitate efficient feature extraction.

2.3. Spatial Attention Mechanism

The Transformer is a neural network model built primarily from linear layers that adopts a multi-head attention mechanism. Through parallel attention computation, it captures the weights of mutual influence between distant elements in long sequences. It mainly consists of an Encoder and a Decoder [14]. In practice, using only the Encoder structure not only preserves the accuracy of the prediction task but also offers benefits such as a straightforward design, fast training speed, and low computational complexity. The specific structure is shown in Figure 3a. The input in the figure is a waveform signal, which is encoded and processed through the Embedding and Position Embedding layers. The multi-head Attention layer captures complex patterns and relationships. The Add&Norm layer is a residual connection combined with normalization, allowing the network to focus only on the differences and accelerating convergence. The Feed Forward block is composed of two fully connected layers and plays a role in dimensional adjustment [15].
The multi-head attention mechanism is the core of the Transformer model and can be seen as the mapping of single-layer attention onto different subspaces [16]. The principle of single-layer attention is shown in Figure 3b. The input data Q = K = V is mapped to different spaces after passing through three different linear layers. The Q and K matrices are combined through dot-product multiplication, and the attention matrix is obtained through scaling and a Softmax layer. The attention map is then multiplied by V, and the output is obtained through a linear transformation. The formula for calculating attention is shown in Equation (2).
$$A(Q,K,V) = f_4\left(\mathrm{softmax}\left(\frac{f_1(Q)\, f_2(K^{T})}{\sqrt{L}}\right) f_3(V)\right)$$
where $A$ is the attention output, $f_{1\text{–}4}$ are the mapping functions of four different linear layers, $L$ is the sequence length, and softmax is the normalized activation function.
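As a sketch of Equation (2), the following single-head attention module applies four linear maps $f_1$–$f_4$ around a scaled dot-product attention. The scaling by $\sqrt{L}$ and the layer sizes are assumptions based on the formula above and the standard Transformer; this is not the authors' code.

```python
import math
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    # Single-head attention following Equation (2): four linear maps f1..f4,
    # scaled dot-product scores, softmax normalization.
    def __init__(self, d_model):
        super().__init__()
        self.f1 = nn.Linear(d_model, d_model)   # query projection
        self.f2 = nn.Linear(d_model, d_model)   # key projection
        self.f3 = nn.Linear(d_model, d_model)   # value projection
        self.f4 = nn.Linear(d_model, d_model)   # output projection

    def forward(self, x):
        # x: (batch, L, d_model); here Q = K = V = x, as described in the text
        q, k, v = self.f1(x), self.f2(x), self.f3(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.shape[1])  # scale by sqrt(L)
        attn = torch.softmax(scores, dim=-1)                      # attention matrix
        return self.f4(attn @ v)
```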

3. Microgrid Analysis

3.1. Microgrid Topology Structure

With the integration of new energy, the structure of microgrids has become more complex, requiring frequent adjustments to maintain voltage and frequency stability. As each device serves as a data node, its interrelationships vary with structural changes, making graph neural networks (GNNs) suitable for data mining. Figure 4 shows a typical 14-node AC microgrid structure diagram.
The typical 14-node AC microgrid structure used in this paper follows the IEEE standard test model, which is widely used in power system analysis and research. The model defines in detail the type of each node (slack node, PQ node, PV node), the node injection powers, the branch parameters (resistance, reactance, susceptance), and other information, and it can simulate the basic operating characteristics of a power system. Building a composite microgrid on this model makes use of its mature network structure and parameter system, providing the basis for power flow calculation and stability analysis of the microgrid. The microgrid structure consists of a total of 14 busbars, which serve as crucial nodes within the power grid. Among these, there are five equivalent power supplies, including new energy sources, as well as five transformers. Local loads of various sizes are distributed across the busbars. This 14-node microgrid structure effectively represents the typical microgrid structure found in most real-world projects. Furthermore, it is worth noting that the power grid topology can be adjusted based on specific engineering and maintenance requirements. In this regard, three different graph network models representing varying levels of topological complexity are introduced below, as illustrated in Figure 5.
Figure 5a represents a comprehensive mapping diagram of the complete 14-node microgrid topology. This diagram depicts the nodes within the power grid and their interconnections. The topology has five sub-ring networks, which means that there are the most complex connections between different nodes. On the other hand, Figure 5b illustrates a modified topology mapping diagram where the connections between nodes 13–14, 9–11, and 3–4 have been disconnected. This diagram has three broken edges and a total of two sub-ring networks. The decrease in the sub-ring networks means that the complexity of the association between different nodes decreases. Lastly, Figure 5c further disconnects the tie lines between 12–13 and 8–9, resulting in a simplified feeder network. This network lacks a ring structure and possesses a straightforward configuration. It also means that the association between different nodes is relatively simple.
After completing the graph mapping of various microgrid topologies, the relationship between nodes and edges in the graph network is established. Taking Figure 5a as an example, the adjacency matrix A R 14 × 14 can be calculated to capture the connection relationships between different nodes in the graph. This matrix provides a clear representation of the links between nodes in the microgrid network and as shown in Equation (3).
$$A = \begin{bmatrix}
0&1&0&0&1&0&0&0&0&0&0&0&0&0\\
1&0&1&0&0&0&0&0&0&0&0&0&0&0\\
0&1&0&1&0&0&0&0&0&0&0&0&0&0\\
0&0&1&0&1&0&1&0&0&0&0&0&0&0\\
1&0&0&1&0&1&0&0&0&0&0&0&0&0\\
0&0&0&0&1&0&0&0&0&1&0&1&1&0\\
0&0&0&1&0&0&0&1&1&0&0&0&0&0\\
0&0&0&0&0&0&1&0&1&0&0&0&0&0\\
0&0&0&0&0&0&1&1&0&0&1&0&0&1\\
0&0&0&0&0&1&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&0&0&1&1&0&0&0&0\\
0&0&0&0&0&1&0&0&0&0&0&0&1&0\\
0&0&0&0&0&1&0&0&0&0&0&1&0&1\\
0&0&0&0&0&0&0&0&1&0&0&0&1&0
\end{bmatrix}$$
In practical applications, considering the sparsity of the matrix and to simplify calculation and storage, only the existing node-to-node connections are retained. Therefore, the adjacency matrix is reconstructed to reflect these connections, and an adjacency list is derived from it, providing a more concise representation of the relationships between nodes. The adjacency list is shown in Equation (4).
A 1 = { ( 1 , 2 ) , ( 1 , 5 ) , ( 2 , 3 ) , ( 3 , 4 ) , ( 4 , 5 ) , ( 4 , 7 ) , ( 5 , 6 ) , ( 6 , 10 ) , ( 6 , 12 ) , ( 6 , 13 ) , ( 7 , 8 ) , ( 7 , 9 ) , ( 8 , 9 ) , ( 9 , 11 ) , ( 9 , 14 ) , ( 10 , 11 ) , ( 12 , 13 ) , ( 13 , 14 ) }
Similarly, the adjacency lists for the other two topologies are obtained as in Equations (5) and (6).
A 2 = { ( 1 , 2 ) , ( 1 , 5 ) , ( 2 , 3 ) , ( 3 , 4 ) , ( 4 , 7 ) , ( 5 , 6 ) , ( 6 , 10 ) , ( 6 , 12 ) , ( 6 , 13 ) , ( 7 , 8 ) , ( 7 , 9 ) , ( 9 , 14 ) , ( 10 , 11 ) }
A 3 = { ( 1 , 2 ) , ( 1 , 5 ) , ( 2 , 3 ) , ( 3 , 4 ) , ( 4 , 7 ) , ( 5 , 6 ) , ( 6 , 10 ) , ( 6 , 12 ) , ( 6 , 13 ) , ( 7 , 8 ) , ( 7 , 9 ) , ( 8 , 9 ) , ( 9 , 14 ) , ( 10 , 11 ) , ( 12 , 13 ) }
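As a small worked example, the sketch below expands the adjacency list of Equation (4) into the symmetric $14 \times 14$ adjacency matrix of Equation (3), together with the $2 \times E$ edge-index layout typically fed to graph neural networks. It is illustrative only; variable names are not taken from the paper.

```python
import numpy as np

# Adjacency list A1 of Equation (4), 1-based node numbering as in the paper
A1 = [(1, 2), (1, 5), (2, 3), (3, 4), (4, 5), (4, 7), (5, 6), (6, 10), (6, 12),
      (6, 13), (7, 8), (7, 9), (8, 9), (9, 11), (9, 14), (10, 11), (12, 13), (13, 14)]

A = np.zeros((14, 14), dtype=int)
for i, j in A1:
    A[i - 1, j - 1] = 1      # convert to 0-based indices
    A[j - 1, i - 1] = 1      # symmetric entry for the undirected representation

# edge_index in the 2 x E "adjacency list" layout commonly used by graph networks
edge_index = np.array(A1).T - 1
```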

3.2. Data Analysis

Within a power grid bus, there are four measurable variables: active power (P), reactive power (Q), voltage magnitude (V), and voltage angle ($\theta$). These power flow measurements are crucial for analyzing power systems in steady-state or transient conditions [17]. Each state of the microgrid, considering various topologies and load conditions, can be represented as a distinct graph. Each graph consists of 14 nodes, and each node is associated with four data features; the specifics are presented in Table 1. Consequently, a dataset of $N$ graphs corresponds to data of size $\mathbb{R}^{N \times 14 \times 4}$.
In the system power flow calculation, the network nodes listed in Table 2 can be categorized into PQ nodes, PV nodes, and slack nodes.
Among them, the slack node, also known as the reference node, is primarily responsible for maintaining the balance of active and reactive power within the system. Each microgrid typically has only one slack bus. This bus is often referred to as the V$\theta$ bus, which means that the voltage magnitude and voltage angle at this bus are known and specified [18].
PV nodes are typically associated with generator buses, where both the power generation and voltage are known and can be controlled. In this case, the PV bus is specified with the P and V.
On the other hand, PQ nodes, also known as load buses, are the most common type of bus nodes in microgrid. For PQ nodes, the P and Q are specified [19].

3.3. Flow Calculation Method

Grid power flow calculation is a method used to analyze and calculate parameters such as voltage, power, and current of each node in the power system. The classic power flow calculation method mainly includes Gauss–Seidel Iteration Method, Newton–Raphson Method and Fast Decoupled Method [20]. We take the Gauss–Seidel Iteration Method as an example to introduce the power flow calculation process. First, the power flow model in standard polar coordinates is shown in Equations (7) and (8).
$$P_i = U_i \sum_{j \in N(i)} U_j \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right), \quad i, j = 1, \ldots, n$$
$$Q_i = U_i \sum_{j \in N(i)} U_j \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right), \quad i, j = 1, \ldots, n$$
where n is the node number in microgrid. P i , Q i are the i-th elements of the active and reactive power. U i , θ i are the i-th elements of the voltage magnitude and phase angle. G i j , B i j are the conductance and susceptance value at the corresponding matrix position. N ( i ) represents the set of adjacent nodes.
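A compact sketch of Equations (7) and (8) is given below; it evaluates the nodal active and reactive power injections from the voltage magnitudes, phase angles, and the conductance/susceptance matrices. Here the sum runs over all buses $j$, which matches the formulation above when entries for non-neighboring pairs are zero; whether the self term is included depends on how $G$ and $B$ are defined. Names and shapes are assumptions.

```python
import numpy as np

def nodal_power(U, theta, G, B):
    # Equations (7)-(8):
    # P_i = U_i * sum_j U_j (G_ij cos(theta_ij) + B_ij sin(theta_ij))
    # Q_i = U_i * sum_j U_j (G_ij sin(theta_ij) - B_ij cos(theta_ij))
    # U, theta: (n,) magnitudes and angles; G, B: (n, n) conductance/susceptance.
    theta_ij = theta[:, None] - theta[None, :]   # angle differences theta_i - theta_j
    P = U * ((G * np.cos(theta_ij) + B * np.sin(theta_ij)) @ U)
    Q = U * ((G * np.sin(theta_ij) - B * np.cos(theta_ij)) @ U)
    return P, Q
```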
The calculation results obtained by solving these equations represent the mathematical solution to the power flow problem. However, the practical significance of this set of solutions in engineering still needs to be verified, because certain technical and economic requirements must be met when the microgrid is operating. These requirements constitute constraints on certain variables in the power flow problem [21]. Common constraints include voltage constraints, as shown in Equation (9): all electrical equipment must operate near its rated voltage, and this constraint mainly applies to PV nodes. Power constraints are shown in Equations (10) and (11): since the active power of PQ and PV nodes is a disturbance variable and therefore uncontrollable, this constraint mainly targets the reactive power of slack nodes and PV nodes. Phase difference constraints are shown in Equation (12): to ensure the stability of the system, the voltage phase difference across both ends of a line must not exceed a certain value [22].
$$V_{i\,\min} \le V_i \le V_{i\,\max}$$
$$P_{i\,\min} \le P_i \le P_{i\,\max}$$
$$Q_{i\,\min} \le Q_i \le Q_{i\,\max}$$
$$\left| \theta_i - \theta_j \right| < \left| \theta_i - \theta_j \right|_{\max}$$
Under the conditions that satisfy the above constraints, the power flow calculation formula can be obtained as shown in Equation (13).
$$\begin{bmatrix} \Delta P_i \\ \Delta Q_i \end{bmatrix} = \begin{bmatrix} \dfrac{\partial P_i}{\partial \theta_j} & \dfrac{\partial P_i}{\partial U_j} U_j \\[4pt] \dfrac{\partial Q_i}{\partial \theta_j} & \dfrac{\partial Q_i}{\partial U_j} U_j \end{bmatrix} \begin{bmatrix} \Delta \theta_i \\ \Delta U_i / U_i \end{bmatrix}$$

4. Model Structure

To enhance the accuracy and robustness of microgrid power flow prediction, this paper proposes a graph neural network-based prediction model. The model comprises several key modules: a data preprocessing module, a feature augmentation module, a PI-GAT module, and a prediction module.
The data processing and reconstruction module initially performs data preprocessing on the raw input data. This involves improving data quality through methods such as standardization and data repair. Subsequently, the point and edge data features are integrated to reconstruct the dataset into a format that can be understood by the graph neural network.
The reconstructed data is then fed into a pre-trained autoencoder within the feature enhancement module. This step aims to enhance the data features through the autoencoder’s capabilities.
The enhanced data is further processed through the spatial graph convolution model. This model employs spatial attention extraction and convolution operations to deeply analyze the data’s characteristics.
Finally, the output module generates specific prediction results based on the high-level data, ensuring result interpretability and performance measurability. Figure 6 illustrates the model structure, and each module’s internal structure is detailed below.

4.1. Data Reconstructing

The data processing module includes data standardization, data repair, and data restructuring. The input data is $x \in \mathbb{R}^{B \times L \times F}$ and the output data are $x_x \in \mathbb{R}^{(B \times L) \times F}$ and $x_e \in \mathbb{R}^{B \times 2 \times F}$.

4.1.1. Data Standardization

Because the input data is collected by devices under different loads, power grid topologies, and fault types, it often follows different distributions. In order to improve data quality and accelerate training, the data must be standardized. Generally, a standard normal distribution with a mean of 0 and a standard deviation of 1 is used [23]. A mean of 0 centers the data and maintains balance across all dimensions. A standard deviation of 1 scales the data in each dimension to unit length, which means that data with different ranges are uniformly rescaled to a common scale for processing. This paper adopts this mean-variance (z-score) normalization method. The specific formula is given in Equation (14).
$$X_1 = \frac{X_0 - u_{X_0}}{\sigma_{X_0}}$$
where $X_0$ is the initial data, $u_{X_0}$ is the data mean, and $\sigma_{X_0}$ is the standard deviation.
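A minimal sketch of the z-score normalization of Equation (14), computed per feature column, is shown below; the small epsilon guard against zero variance is an added assumption.

```python
import numpy as np

def standardize(X0):
    # Mean-variance (z-score) normalization of Equation (14): X1 = (X0 - mean) / std.
    # Statistics are computed per feature column; epsilon avoids division by zero.
    mean = X0.mean(axis=0)
    std = X0.std(axis=0) + 1e-8
    return (X0 - mean) / std
```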

4.1.2. Data Repairing

Deep learning models require complete temporal data for modeling; therefore, the reliability of the dataset is crucial. However, in practice, due to limitations in collection capacity and network transmission, temporal data often contains missing values and breakpoints, so a certain degree of data restoration is necessary. Data filling is generally done by finding the variation patterns of the time series in order to fill in the missing values, which is similar to time series prediction. The difference is that in the prediction task, the data after the prediction point is unknown, whereas in the repair task, data before and after the missing points is available, so repair can be more effective [24].
This paper adopts the K Nearest Neighbor (KNN) data repair method. The main idea is to fill each missing point using its K neighbors: the distance matrix between data points with missing values and data points without missing values is calculated, the k data points with the smallest Euclidean distance are selected, and the missing values are filled with the mean of the corresponding fields of these k nearest neighbors [25].
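One possible realization of this KNN repair step uses scikit-learn's KNNImputer, which fills each missing entry with the mean of the corresponding field of the k nearest complete samples under Euclidean distance. The value of k and the toy data below are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy example: one sample has a missing reactive-power value (NaN)
X_raw = np.array([[1.00, 0.20],
                  [1.10, np.nan],
                  [0.90, 0.25],
                  [1.20, 0.30]])

# Fill the missing value with the mean of its 2 nearest complete neighbors
X_repaired = KNNImputer(n_neighbors=2).fit_transform(X_raw)
```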

4.1.3. Data Reconstructing

Data reconstruction plays a crucial role in the data processing phase. When using a graph neural network, the input format typically consists of two main components: node features and the graph's topology. The node features refer to the feature vector matrix associated with each node, representing the known feature quantities of each bus. By reconstructing the graph data, traditional sequence data is transformed into new graph node data $x_x \in \mathbb{R}^{(B \times L) \times F}$. It is important to note that for efficient training of neural networks, the dataset needs to be processed in batches. Each batch of data can be considered as one graph, implying that each graph contains $B \times L$ nodes. In addition to node features, the topology of the graph is also an essential input component. The topological structure reflects the connection relationships between nodes and is commonly represented using a simplified adjacency list $x_e \in \mathbb{R}^{B \times 2 \times F}$. This adjacency list provides information about the connectivity between different nodes in the graph.
In summary, the data processing and reconstruction module organizes the node characteristics and topological structure information into suitable data structures. This allows the data to be effectively input into the graph neural network model for training and prediction purposes. By appropriately preparing the data, the module ensures that the GNN can effectively learn and utilize the features and connectivity patterns of the microgrid system during the training and prediction processes [26].
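The following sketch illustrates the data layout described above: sequence batches of shape $(B, L, F)$ are flattened into a node-feature matrix with $B \times L$ rows, and the topology is tiled per graph as a simplified adjacency (edge) list. Variable names are illustrative and not the authors' code.

```python
import numpy as np

def reconstruct_batch(x, edge_index):
    # x: (B, L, F) batch of sequence data; edge_index: (2, E) topology of one graph.
    # Returns the node-feature matrix x_x with B*L rows and the per-graph topology x_e.
    B, L, F = x.shape
    x_x = x.reshape(B * L, F)                               # graph node features
    x_e = np.stack([edge_index for _ in range(B)], axis=0)  # (B, 2, E) adjacency lists
    return x_x, x_e
```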

4.2. Feature Augmentation Module

As noted above, the dimensionality of the microgrid graph dataset is $F = 2$, which is low. Such low dimensionality may lead to information loss, preventing the model from fully utilizing the data features. The resulting data distribution is relatively dense, which may cause the trained model to overfit easily and generalize poorly. Feature redundancy, i.e., high correlation between features, may also cause the model to rely too heavily on some features and ignore others [27].
To enhance the predictive capability of the model, this paper employs an autoencoder module to perform dimensionality expansion on the data. After the pre-training phase is completed, the Encoder module is used to lift the low-dimensional data $x_x \in \mathbb{R}^{(B \times L) \times F}$ to a higher-dimensional representation $x_x' \in \mathbb{R}^{(B \times L) \times F'}$, with $F' = 128$.
Through this module, the dimensionality of the data is increased, which in turn improves the sparsity of the samples. Additionally, it enables the model to capture the correlations between different features, enhancing its expressive power. Moreover, this dimensionality enhancement helps to mitigate the risk of overfitting in the model, leading to improved prediction performance.
The structure of the autoencoder module in the model can be classified into nine categories based on the type and number of hidden layers. Table 3 presents these categories; during the experiments, the best configuration is selected based on the evaluation results.
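As an illustrative sketch, a single-convolutional-layer encoder that lifts the $F = 2$ node features to $F' = 128$ might look as follows. The kernel size and the use of a 1D convolution over the node axis are assumptions, chosen to match the single-convolutional-layer configuration later reported as best in Section 5.2; this is not the authors' exact architecture.

```python
import torch.nn as nn

class ConvEncoder(nn.Module):
    # Sketch: lift F = 2 node features to F' = 128 with one convolutional layer.
    def __init__(self, f_in=2, f_out=128):
        super().__init__()
        self.conv = nn.Conv1d(f_in, f_out, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (num_nodes, F) -> (1, F, num_nodes) for Conv1d, then back to (num_nodes, F')
        h = self.act(self.conv(x.t().unsqueeze(0)))
        return h.squeeze(0).t()
```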

4.3. PI-GAT Module

The attention graph convolution module is an extension of the GCN. It introduces an attention mechanism to enhance information transfer and weight distribution between nodes in the graph. This module has broad applications in various graph data analysis and prediction tasks, including microgrid power flow prediction. By incorporating attention mechanisms, the module enables the model to focus on important nodes or edges in the graph, capturing their relative importance and improving the overall prediction performance.
The core idea behind the attention graph convolution module is to dynamically adjust the process of information transmission and aggregation among nodes by learning the relationship weights between them. In traditional graph convolution models, these weights are typically fixed, treating all nodes equally. However, the attention graph convolution module introduces the capability to adaptively adjust the weights between nodes based on the specific requirements of the task at hand. By doing so, the module allows the model to focus on and prioritize the most relevant nodes when performing information aggregation and transmission, thereby improving the model’s ability to capture important patterns and make accurate predictions in microgrid power flow and other graph-based prediction tasks.
The attention graph convolution module consists of two key steps: feature propagation and attention mechanism. Feature propagation involves the transfer and aggregation of information between nodes in a graph. During this process, each node combines its own features with those of its neighboring nodes to create a new representation of its features. Traditional graph convolution models typically rely on simple weighted average or concatenation operations for feature propagation. In contrast, the attention graph convolution module introduces an attention mechanism to dynamically adjust the weights assigned to neighboring nodes. This attention mechanism allows the module to focus on the most relevant neighbor nodes and adaptively adjust their influence on the feature aggregation process [28].
The attention mechanism plays a vital role in the attention graph convolution module. It computes the similarity or correlation between nodes and utilizes a normalization function to produce weights that determine the importance of connections between nodes. These weights are then applied to adjust the information aggregation during feature propagation, allowing for a more accurate representation of the relationships and significance between nodes. The attention mechanism generally comprises two essential components: the calculation of attention weights and the utilization of these weights in the feature propagation process  [29].
Note that the attention mechanism is, in essence, a linear computation and cannot by itself represent spatiotemporal order, so positional encoding needs to be added. The given input sequence is encoded according to even and odd dimension indices: even indices are encoded with sine functions and odd indices with cosine functions. After the encoding is completed, the positional information is added to the data sequence. The formulas are shown in Equations (15) and (16).
$$PE(pos, 2i) = \sin\left(pos / 10000^{2i/D}\right)$$
$$PE(pos, 2i+1) = \cos\left(pos / 10000^{2i/D}\right)$$
where $pos$ is the position in the input sequence, $i$ is the dimension index, and $D$ is the model dimension.
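A short sketch of the sinusoidal positional encoding of Equations (15) and (16) is given below; it assumes an even model dimension $D$ and returns a matrix that is added to the input sequence.

```python
import numpy as np

def positional_encoding(length, d_model):
    # Equations (15)-(16): even dimensions use sin(pos / 10000^(2i/D)),
    # odd dimensions use cos(pos / 10000^(2i/D)). Assumes d_model is even.
    pos = np.arange(length)[:, None]              # (length, 1) positions
    i = np.arange(0, d_model, 2)[None, :]         # even dimension indices (the "2i" term)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe                                     # added to the input sequence
```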
Having described the GAT block, the relevant physical information must now be introduced into it. In this paper, the state equation function is incorporated into the optimization process of the model as a physical constraint. The prediction loss and the physical loss are combined to form a comprehensive loss function, which is used for backpropagation and optimization [30]. During optimization, the model parameters are therefore affected by two objectives simultaneously: the model strives to make the predicted values close to the real values while also ensuring that its output conforms to the basic physical laws of the power system. The active and reactive power calculated from the physical state equation are given in Equations (17) and (18).
$$\tilde{P} = \hat{U} \left( Y \hat{U}^{*} \right)^{*} \times \cos(\hat{\theta})$$
$$\tilde{Q} = \hat{U} \left( Y \hat{U}^{*} \right)^{*} \times \sin(\hat{\theta})$$
where $*$ denotes the complex conjugate, $Y$ is the admittance matrix, $\hat{U}$ and $\hat{\theta}$ are the outputs of the model prediction, and $\tilde{P}$ and $\tilde{Q}$ are the active and reactive power calculated from the physical equations.
After completing the power flow calculation, the comprehensive loss function is shown in Equation (19).
$$loss_{total} = \sum_{i=1}^{N} \left( U_i - \hat{U}_i \right)^2 + \sum_{i=1}^{N} \left( \theta_i - \hat{\theta}_i \right)^2 + \sum_{i=1}^{N} \left( P_i - \tilde{P}_i \right)^2 + \sum_{i=1}^{N} \left( Q_i - \tilde{Q}_i \right)^2$$
where $P_i$, $U_i$, $Q_i$, and $\theta_i$ are the actual values.
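The sketch below shows one way the composite loss of Equation (19) can be assembled: the physical term evaluates the injected power implied by the predicted voltages through the complex bus equation $S = V(YV)^{*}$, which is a common realization of the physical state equation and may differ in detail from Equations (17) and (18). Tensor names and shapes are assumptions.

```python
import torch

def physics_informed_loss(U_hat, theta_hat, U_true, theta_true, P_true, Q_true, Y):
    # Composite loss in the spirit of Equation (19): prediction error on (U, theta)
    # plus the mismatch between measured (P, Q) and the power implied by the
    # predicted state. Y is the complex bus admittance matrix.
    V_hat = U_hat * torch.exp(1j * theta_hat)     # complex bus voltages
    S_hat = V_hat * torch.conj(Y @ V_hat)         # complex injected power S = V (Y V)^*
    P_tilde, Q_tilde = S_hat.real, S_hat.imag
    pred_loss = ((U_true - U_hat) ** 2).sum() + ((theta_true - theta_hat) ** 2).sum()
    phys_loss = ((P_true - P_tilde) ** 2).sum() + ((Q_true - Q_tilde) ** 2).sum()
    return pred_loss + phys_loss
```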

4.4. Prediction Module

This module is the last layer of the entire network model and outputs the final prediction. It consists of two fully connected (FC) layers and converts the data $X_x \in \mathbb{R}^{B \times L \times F'}$ into the output $\hat{X} \in \mathbb{R}^{B \times L \times F}$.

5. Experiments

5.1. Experimental Setting

Data processing and computation were carried out on a computer equipped with a 600Ada GPU (NVIDIA, Santa Clara, CA, USA). The program was implemented in the Python (version 3.1.1) language within the PyTorch (version 2.8.0) framework.
We used a typical 14-node AC power grid structure and the PYPOWER (version 5.1.19) software package combined with VSCODE (version 1.103) for simulation. PYPOWER is a power flow and optimal power flow (OPF) solver. Microgrid data were simulated under different operating states and different topology structures: a total of 10 topologies were covered, and nearly 20,000 data collections were completed. In each collection, a total of 14 × 4 = 56 data points are gathered, where 14 is the number of nodes in the power grid and 4 is the number of quantities $P, Q, V, \theta$ per node. $P$ and $Q$ are known data, while $V$ and $\theta$ are the quantities to be predicted.
The proportions of the training, validation, and testing sets are 60%, 20%, and 20%, respectively. The loss function is MSE loss and the Adam optimizer is used. The model hyperparameters are set as follows: batch size = 64, epochs = 100, dropout = 0.1, learning rate = $1 \times 10^{-4}$. All hyperparameters in this manuscript were tuned gradually according to computational efficiency and training behavior. The batch size is selected to ensure training efficiency; the number of epochs ensures sufficient model training; the dropout rate is selected empirically and helps the model learn the data characteristics without losing too much information. The learning rate, the most important hyperparameter in deep learning, is selected for best performance through grid search. Kaiming weight initialization is used.
The steps of autoencoder calculation are shown in the pseudocode Algorithm 1. The steps of the calculation of the proposed model are shown in the pseudocode Algorithm 2.
Algorithm 1 Autoencoder module calculation algorithm
1: C is MSELoss.
2: O is the Adam optimizer.
3: model_E is the auto-encoder's encoder and model_D is the auto-encoder's decoder.
4: for X in TrainLoader do
5:    output_e ← model_E(X)
6:    output ← model_D(output_e)
7:    loss ← C(X, output)
8:    model_D and model_E backpropagation and parameter update
9: end for
10: Run the validation data set and calculate the validation loss.
11: Run the test data set and calculate the test loss.
Algorithm 2 Power flow prediction algorithm
1: C is MSELoss.
2: O is the Adam optimizer.
3: model is the proposed calculation model.
4: for X_y in TrainLoader do
5:    output ← model(X_y)
6:    loss_1 ← C(output, X_y.y)
7:    Calculate X̃ by the physical state equation.
8:    loss_2 ← C(X̃, X_y.x)
9:    loss ← loss_1 + loss_2
10:   model backpropagation and parameter update
11: end for
12: Run the validation data set and calculate the validation loss.
13: Run the test data set and calculate the test loss.
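For concreteness, a minimal PyTorch training loop in the spirit of Algorithm 2 is sketched below. `model`, `train_loader`, and the hypothetical helper `physics_loss_fn` (which evaluates the state-equation mismatch, e.g. along the lines of the sketch in Section 4.3) are assumed to exist; hyperparameter values follow Section 5.1. This is an illustrative sketch, not the authors' implementation.

```python
import torch

def train(model, train_loader, physics_loss_fn, Y, epochs=100, lr=1e-4):
    # physics_loss_fn is a hypothetical helper: it compares the power implied by the
    # predicted (V, theta) with the measured (P, Q) using the admittance matrix Y.
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_loader:
            output = model(batch)                        # predicted V and theta
            loss1 = criterion(output, batch.y)           # data (prediction) loss
            loss2 = physics_loss_fn(output, batch.x, Y)  # physical state-equation loss
            loss = loss1 + loss2                         # composite loss, Equation (19)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```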

5.2. Autoencoding Pre-Training

This study employs an autoencoder for pre-training the data and utilizes the encoding function to increase the dimensionality of the data from $d = 2$ to $d = 128$. Increasing the dimensionality helps enrich the information content, thereby improving the model's expressive capacity. Table 3 shows that the encoding and decoding sections of the autoencoder can be classified into nine types based on the layer types and numbers of the network model. Different hidden layer structures have varying effects on capturing data features. To determine the structure with the strongest expressive ability, pre-training tests were conducted on the nine configurations. The results, depicted in Figure 7a, indicate that the configuration with a single convolutional layer yielded the best pre-training performance. Moreover, this configuration has a smaller number of model parameters, which reduces computational cost and improves speed. Consequently, the pre-training model in this paper adopts a single convolutional layer.
To further validate the feasibility of the proposed pre-training solution, the data representation and restoration abilities of the autoencoder were examined. Figure 7b demonstrates that during the testing and verification phases, the model exhibits low loss and successfully achieves the objectives of feature retention and feature expansion. This indicates that the autoencoder effectively captures and preserves essential features of the data while also enabling the expansion of its expressive capacity. The results provide evidence of the model’s capability to retain and restore important information during the pre-training process.

5.3. Experimental Results

5.3.1. Model Comparison

In order to verify the superiority of the proposed scheme, this paper compares its performance with classical models. The models used in the experiments are described as follows.
DNN: Deep Neural Network. A deep linear network layer composed of residual structures [31].
CNN: Convolutional Neural Network. Its core idea is to extract image features using convolution and pooling operations. It can adaptively adjust network weights, thereby improving the accuracy and generalization ability of the model [32].
Bi-LSTM: Bidirectional Long Short-Term Memory. It is a variant of the recurrent neural network (RNN) that introduces a reverse LSTM layer and considers both forward and backward information flow, which alleviates problems such as information loss and order-dependency limitations in sequence data [33].
ResNet: Residual Network. It is a deep convolutional neural network architecture that constructs the network by introducing residual connections, allowing information to jump over in the network to alleviate the problem of gradient vanishing in deep networks [34].
Transformer: It is a neural network architecture based on self-attention mechanism, which can better capture the global relationship of sequences and enhance the model’s expression ability and training effectiveness [35].
GCN: Graph Convolutional Network. It is a classic deep learning model based on graph structure. Graph convolution layers update node representations by convolving each node’s features with those of its neighbors, effectively capturing the graph’s structural information [36].
PI-GAT (Ours): It is the proposed Physics-Informed Graph Attention Network.
This article presents a comparison of the testing performance of different models across three typical topology structures. The results are summarized in Table 4.
It is evident that PI-GAT consistently achieves the best performance across different topologies based on the MSE and MAE metrics, with mean $r_{MSE} = 0.01\text{–}0.03$ and $r_{MAE} = 0.09\text{–}0.13$. Taking the full-scale topology 1 as an example, PI-GAT demonstrates a remarkable 46.9% improvement in MSE and an approximately 14.08% improvement in MAE compared to the second-ranked solution, Transformer. This highlights the beneficial impact of graph convolution on feature extraction. Similar notable improvements are observed across the other topology sizes. Furthermore, when compared to the classic GCN model, it becomes apparent that the classic model exhibits poor performance in power flow prediction, whereas the incorporation of self-attention and the model enhancements significantly improve the predictive accuracy of our PI-GAT model.
Therefore, this experiment proves that the proposed scheme has a greater improvement in prediction accuracy compared with the traditional schemes.

5.3.2. Visualization

To better understand and explain the PI-GAT model, as well as to facilitate data observation and analysis, the validation loss of the seven aforementioned methods during a 100-epoch training cycle is plotted in Figure 8. The subplots (a–c) in the figure correspond to the three typical topological structures, respectively. It is evident that as the number of training epochs increases, the loss gradually decreases, indicating the effectiveness of model training without overfitting or underfitting. Around the 60th epoch, the loss curve tends to flatten, suggesting that the model has reached an optimal state. Furthermore, the proposed scheme consistently achieves the lowest loss compared to the other schemes within the same number of training epochs. This further confirms the superiority of the PI-GAT model in terms of prediction performance.
To further evaluate the prediction performance of the various proposed schemes on different nodes, separate testing was conducted to measure the loss values at different node positions. The results are visualized in Figure 9, where subplots (a–c) represent three typical topologies. For instance, in subplot (a), which corresponds to the complete structure, the proposed scheme demonstrates the smallest predicted loss across the 14 nodes of the microgrid, with a relatively balanced distribution and no significant fluctuations. This experiment provides further evidence of the stability and reliability of the predictions generated by the proposed scheme.

5.3.3. Attention Map

The PI-GAT model utilizes a spatial attention-based algorithm, which enables it to dynamically capture global spatial features and contextual understanding. To gain further insight into the attention mechanism and understand the model's dependencies in generating representations for each position, attention maps are generated and visualized in Figure 10. Subplots (a,b) show attention maps under different head mapping spaces for a single sample, while subplots (c,d) show attention maps for randomly selected samples and heads. In the figure, brighter colors represent more focused attention, while darker colors represent less focused attention. It can be seen that the model's attention is largely continuous and concentrated along the time series. The attention maps clearly demonstrate the model's ability to focus on key information and capture important dependencies within the input sequence.

5.4. Transfer Learning

The aforementioned comparative experiments have validated the accuracy of the proposed scheme. This section aims to assess its generalization capability. Transfer learning is a valuable technique that can enhance a model's ability to generalize to new tasks, particularly in scenarios with limited data. By conducting transfer learning experiments, we can evaluate the model's generalization performance on novel tasks and uncover its capacity to share and transfer knowledge between different tasks. The details are shown in Table 5.
In the performance comparison experiment, both the training and validation sets are derived from a single dataset, with a proportional split. Although they are not directly related, they tend to follow the same distribution. In the context of transfer learning, this article conducts a total of 10 experiments. The first five experiments use fixed training sample sets (Set 1), while the test sets are selected sequentially with an interval of 1. The remaining five experiments use random training and testing sample sets. The performance of the PI-GAT model is observed across three typical topologies. It is evident that the prediction performance remains stable regardless of the specific experimental setup. This experiment further highlights the generalization ability of the PI-GAT model, as it consistently performs well across different training and testing scenarios.

5.5. Few-Shot Learning

In practical projects, challenges such as limited data availability, rapid iterations, and data migration are often encountered, making it difficult to obtain a sufficient amount of high-quality data. Therefore, it is crucial to assess the predictive quality of the model when dealing with small data samples. To investigate this, the proportion of training data was adjusted to 5 % , 10 % , 25 % , 50 % , 100 % , and experiments were conducted across three typical microgrid topologies. The results are presented in Table 6.
It can be observed that when the data exceeds 25 % , the prediction performance remains largely unaffected. When the data is reduced to 10 % , only the prediction performance of topology 1 is compromised, while the performance of other topologies remains strong. However, at a mere 5 % data availability, the prediction performance is significantly disrupted. In practical engineering scenarios, such data adaptability is generally sufficient to meet the required needs.

5.6. Performance Verification Under Different Topologies

The experiments conducted in this article have primarily focused on three typical topological structures. However, it is important to verify whether the proposed scheme maintains higher accuracy under other topologies as well. Therefore, experiments were performed on 10 different topologies to observe the prediction performance. Among these, topologies No. 1–3 represent the typical structures, while topologies 4–10 represent additional topologies sorted by increasing structural complexity. The results are presented in Table 7. It can be observed that regardless of the topology configuration, the prediction performance of the PI-GAT model remains stable and consistently high. This indicates the robustness and effectiveness of the proposed scheme in dealing with various topological variations.

6. Conclusions

This study focuses on enhancing the accuracy and stability of microgrid power flow prediction by introducing a method based on a novel Physics-Informed Graph Attention Network. This method takes into account the spatial relationships and dependencies between nodes in microgrids. By conducting comparative experiments and analyzing the results, the following conclusions can be drawn.
First, the power flow prediction method, which is based on PI-GAT, exhibits significant advantages over traditional statistical models, time series models, and other commonly used methods. It consistently outperforms these approaches across various performance indicators, yielding superior prediction results. This demonstrates that the incorporation of spatial attention graph convolution enables better capturing of the relationships and influences between nodes in microgrids, ultimately enhancing the accuracy of power flow prediction.
In addition, our method effectively simulates the spatial relationships between nodes and introduces a spatial attention mechanism to learn the weights of nodes. This enables the model to more accurately distinguish the importance and contribution of individual nodes. Therefore, our model can make more accurate predictions about changes in node power flow. The integration of physical information helps the model achieve better generalization ability, limits the complexity of the model, reduces overfitting, guides the optimization process, and accelerates the convergence speed of the neural network.
Lastly, our approach utilizes an end-to-end training framework that integrates physical information, attention mechanisms, and graph neural networks together. This holistic approach takes into account both the temporal information and spatial relationships within the microgrid system, thereby enhancing the predictive power and stability of our model. Through rigorous experimental validation, we successfully demonstrate the effectiveness and feasibility of our method in accurately predicting power flow in microgrid systems. The combination of spatial attention and traditional neural network components in our model enables a comprehensive understanding of the underlying dynamics and dependencies, leading to improved prediction performance.
In conclusion, the proposed PI-GAT power flow prediction method holds great application potential in microgrid systems. Our research offers a novel solution to the power flow prediction problem in microgrids, providing valuable insights for optimizing microgrid system dispatch and advancing power grid operations. Future research endeavors can focus on further refining the model structure and algorithms to handle more complex and large-scale microgrid systems. Additionally, extending this method to prediction problems in other related fields would be an interesting avenue to explore. By continuously advancing and refining these predictive models, we can enhance the efficiency and reliability of microgrid systems and contribute to the broader field of energy management.

Author Contributions

Conceptualization, Q.H. and Y.W.; methodology, Q.H. and Y.W.; software, S.-K.I.; validation, Q.H.; formal analysis, S.-K.I.; investigation, S.-K.I. and X.Y.; resources, Y.W.; data curation, Q.H.; writing—original draft preparation, Q.H. and J.C.; writing—review and editing, Q.H. and Y.W.; visualization, Q.H. and J.C.; supervision, Y.W.; project administration, Y.W. and X.Y.; funding acquisition, Y.W. and S.-K.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to company confidentiality.

Acknowledgments

As part of the thesis work of QIYUE HUANG, this paper can be referred by s/c fca.d80d.8954.7.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, W.; Yang, Z.; Yu, J.; Bao, S.; Dai, W. Toward fast calculation of probabilistic optimal power flow. IEEE Trans. Power Syst. 2019, 34, 3286–3288. [Google Scholar] [CrossRef]
  2. Santos, M.; Huo, D.; Wade, N.; Greenwood, D.; Sarantakos, I. Reliability assessment of island multi-energy microgrids. Energy Convers. Econ. 2021, 2, 169–182. [Google Scholar] [CrossRef]
  3. Liao, W.; Bak-Jensen, B.; Pillai, J.R.; Wang, Y.; Wang, Y. A review of graph neural networks and their applications in power systems. J. Mod. Power Syst. Clean Energy 2021, 10, 345–360. [Google Scholar] [CrossRef]
  4. Gao, H.; Sun, L.; Wang, J.X. PhyGeoNet: Physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. 2021, 428, 110079. [Google Scholar] [CrossRef]
  5. Wang, D.; Zheng, K.; Chen, Q.; Zhang, X.; Luo, G. A data-driven probabilistic power flow method based on convolutional neural networks. Int. Trans. Electr. Energy Syst. 2020, 30, e12367. [Google Scholar] [CrossRef]
  6. Donon, B.; Donnot, B.; Guyon, I.; Marot, A. Graph neural solver for power systems. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar]
  7. Pan, X.; Zhao, T.; Chen, M.; Zhang, S. Deepopf: A deep neural network approach for security-constrained dc optimal power flow. IEEE Trans. Power Syst. 2020, 36, 1725–1735. [Google Scholar] [CrossRef]
  8. Mahela, O.P.; Khan, B.; Alhelou, H.H.; Siano, P. Power quality assessment and event detection in distribution network with wind energy penetration using stockwell transform and fuzzy clustering. IEEE Trans. Ind. Informatics 2020, 16, 6922–6932. [Google Scholar] [CrossRef]
  9. Li, S.; Gong, W.; Wang, L.; Gu, Q. Multi-objective optimal power flow with stochastic wind and solar power. Appl. Soft Comput. 2022, 114, 108045. [Google Scholar] [CrossRef]
  10. Wu, H.; Wang, M.; Xu, Z.; Jia, Y. Graph attention enabled convolutional network for distribution system probabilistic power flow. IEEE Trans. Ind. Appl. 2022, 58, 7068–7078. [Google Scholar] [CrossRef]
  11. Zhang, S.; James, J. Bayesian deep learning for dynamic power system state prediction considering renewable energy uncertainty. J. Mod. Power Syst. Clean Energy 2021, 10, 913–922. [Google Scholar] [CrossRef]
  12. Chen, C.; Liang, H.; Zhai, X.; Zhang, J.; Liu, S.; Lin, Z.; Yang, L. Review of restoration technology for renewable-dominated electric power systems. Energy Convers. Econ. 2022, 3, 287–303. [Google Scholar] [CrossRef]
  13. Gao, Q.; Yang, Z.; Yu, J.; Dai, W.; Lei, X.; Tang, B.; Xie, K.; Li, W. Model-driven architecture of extreme learning machine to extract power flow features. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4680–4690. [Google Scholar] [CrossRef]
  14. Huang, B.; Wang, J. Applications of physics-informed neural networks in power systems-a review. IEEE Trans. Power Syst. 2022, 38, 572–588. [Google Scholar] [CrossRef]
  15. Yang, Y.; Yang, Z.; Yu, J.; Zhang, B.; Zhang, Y.; Yu, H. Fast calculation of probabilistic power flow: A model-based deep learning approach. IEEE Trans. Smart Grid 2019, 11, 2235–2244. [Google Scholar] [CrossRef]
  16. Yuan, J.; Weng, Y. Physics interpretable shallow-deep neural networks for physical system identification with unobservability. In Proceedings of the 2021 IEEE International Conference on Data Mining (ICDM), Auckland, New Zealand, 7–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 847–856. [Google Scholar]
  17. Singh, M.K.; Kekatos, V.; Giannakis, G.B. Learning to solve the AC-OPF using sensitivity-informed deep neural networks. IEEE Trans. Power Syst. 2021, 37, 2833–2846. [Google Scholar] [CrossRef]
  18. Hossain, R.R.; Huang, Q.; Huang, R. Graph convolutional network-based topology embedded deep reinforcement learning for voltage stability control. IEEE Trans. Power Syst. 2021, 36, 4848–4851. [Google Scholar] [CrossRef]
  19. Liu, Y.; Wang, Y.; Yong, P.; Zhang, N.; Kang, C.; Lu, D. Fast power system cascading failure path searching with high wind power penetration. IEEE Trans. Sustain. Energy 2019, 11, 2274–2283. [Google Scholar] [CrossRef]
  20. Spinelli, I.; Scardapane, S.; Uncini, A. Adaptive propagation graph convolutional network. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4755–4760. [Google Scholar] [CrossRef]
  21. Wang, D.; Zheng, K.; Chen, Q.; Luo, G.; Zhang, X. Probabilistic power flow solution with graph convolutional network. In Proceedings of the 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe), The Hague, The Netherlands, 26–28 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 650–654. [Google Scholar]
  22. Lei, X.; Yang, Z.; Yu, J.; Zhao, J.; Gao, Q.; Yu, H. Data-driven optimal power flow: A physics-informed machine learning approach. IEEE Trans. Power Syst. 2020, 36, 346–354. [Google Scholar] [CrossRef]
  23. Tanvir, R.B.; Islam, M.M.; Sobhan, M.; Luo, D.; Mondal, A.M. Mogat: A multi-omics integration framework using graph attention networks for cancer subtype prediction. Int. J. Mol. Sci. 2024, 25, 2788. [Google Scholar] [CrossRef]
  24. Hu, C.; Liu, X.; Wu, S.; Yu, F.; Song, Y.; Zhang, J. Dynamic Graph Convolutional Crowd Flow Prediction Model Based on Residual Network Structure. Appl. Sci. 2023, 13, 7271. [Google Scholar] [CrossRef]
  25. Sui, J.; Chen, P.; Gu, H. Deep Spatio-Temporal Graph Attention Network for Street-Level 110 Call Incident Prediction. Appl. Sci. 2024, 14, 9334. [Google Scholar] [CrossRef]
  26. Ding, Z.; He, Z.; Huang, Z.; Wang, J.; Yin, H. Traffic flow prediction research based on an interactive dynamic spatial–temporal graph convolutional probabilistic sparse attention mechanism (IDG-PSAtt). Atmosphere 2024, 15, 413. [Google Scholar] [CrossRef]
  27. Song, Y.; Luo, R.; Zhou, T.; Zhou, C.; Su, R. Graph attention informer for long-term traffic flow prediction under the impact of sports events. Sensors 2024, 24, 4796. [Google Scholar] [CrossRef]
  28. Gao, M.; Yu, J.; Yang, Z.; Zhao, J. Physics embedded graph convolution neural network for power flow calculation considering uncertain injections and topology. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 15467–15478. [Google Scholar] [CrossRef]
  29. Beinert, D.; Holzhüter, C.; Thomas, J.; Vogt, S. Power flow forecasts at transmission grid nodes using graph neural networks. Energy AI 2023, 14, 100262. [Google Scholar] [CrossRef]
  30. Zhu, Y.; Zhou, Y.; Wei, W.; Wang, N. Cascading failure analysis based on a physics-informed graph neural network. IEEE Trans. Power Syst. 2022, 38, 3632–3641. [Google Scholar] [CrossRef]
  31. Tiwari, D.; Zideh, M.J.; Talreja, V.; Verma, V.; Solanki, S.K.; Solanki, J. Power flow analysis using deep neural networks in three-phase unbalanced smart distribution grids. IEEE Access 2024, 12, 29959–29970. [Google Scholar] [CrossRef]
  32. Tang, C.; Zhang, Y.; Wu, F.; Tang, Z. An improved CNN-BiLSTM model for power load prediction in uncertain power systems. Energies 2024, 17, 2312. [Google Scholar] [CrossRef]
  33. Balasubramani, K.; Natarajan, U.M. Improving bus passenger flow prediction using Bi-LSTM fusion model and SMO algorithm. Babylon. J. Artif. Intell. 2024, 2024, 73–82. [Google Scholar] [CrossRef]
  34. Yin, L.; Ge, W. Mobileception-ResNet for transient stability prediction of novel power systems. Energy 2024, 309, 133163. [Google Scholar] [CrossRef]
  35. Zhang, J.; Yang, Y.; Wu, X.; Li, S. Spatio-temporal transformer and graph convolutional networks based traffic flow prediction. Sci. Rep. 2025, 15, 24299. [Google Scholar] [CrossRef]
  36. Hu, X.; Yang, J.; Gao, Y.; Zhu, M.; Zhang, Q.; Chen, H.; Zhao, J. Adaptive power flow analysis for power system operation based on graph deep learning. Int. J. Electr. Power Energy Syst. 2024, 161, 110166. [Google Scholar] [CrossRef]
Figure 1. Working principle diagram of GNN.
Figure 2. Working principle diagram of autoencoder.
Figure 3. Schematic diagram of spatial attention.
Figure 4. Diagram of the 14-node AC microgrid structure.
Figure 5. Mapping diagram of the AC microgrid under different topologies. Different numbers on the graph represent different buses.
Figure 6. The pipeline of the proposed Physics-Informed Graph Attention Network model.
Figure 7. Visualization of autoencoder pre-training.
Figure 8. Performance test results under different models.
Figure 9. A 14-node performance test comparison chart. The radial coordinate represents the MSE loss and the angular coordinate represents the node index.
Figure 10. Mapping diagram of a simple microgrid topology in PI-GAT. The brighter the color, the more focused the attention.
Table 1. Node data feature table.
Node | Data
1 | P1, Q1, V1, θ1
2 | P2, Q2, V2, θ2
3 | P3, Q3, V3, θ3
… | …
12 | P12, Q12, V12, θ12
13 | P13, Q13, V13, θ13
14 | P14, Q14, V14, θ14
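To make the layout of Table 1 concrete, the following minimal sketch (not the authors' code) packs per-node features [P, Q, V, θ] for a 14-node grid into a torch_geometric graph object; the branch list shown is a hypothetical subset of the topology and the zero-valued features are placeholders.

# Illustrative sketch: per-node features [P, Q, V, theta] as a graph (assumes PyTorch Geometric).
import torch
from torch_geometric.data import Data

num_nodes = 14
x = torch.zeros((num_nodes, 4), dtype=torch.float32)  # one row per bus: [P, Q, V, theta]

# Undirected branches stored as two directed edges each (0-based indices, hypothetical subset).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
edge_index = torch.tensor(
    [[u for u, v in edges] + [v for u, v in edges],
     [v for u, v in edges] + [u for u, v in edges]],
    dtype=torch.long,
)

graph = Data(x=x, edge_index=edge_index)
print(graph)  # Data(x=[14, 4], edge_index=[2, 8])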
Table 2. Different node types in the power grid.
Bus Type | Known | Unknown
Slack node | V, θ | P, Q
PV node | P, V | Q, θ
PQ node | P, Q | V, θ
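For context on Table 2, the known/unknown split per bus type follows from the standard AC nodal power-balance equations (the textbook formulation; the exact physics residual embedded in the paper's loss function may differ in form):

P_i = V_i \sum_{j=1}^{N} V_j ( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} ),
Q_i = V_i \sum_{j=1}^{N} V_j ( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} ),

where \theta_{ij} = \theta_i - \theta_j and G_{ij} + jB_{ij} is the (i, j) element of the bus admittance matrix. At a PQ node, P_i and Q_i are specified and (V_i, θ_i) are solved for; at a PV node, P_i and V_i are specified; the slack node fixes V and θ and absorbs the remaining power mismatch.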
Table 3. Autoencoder structure table.
Type | Hidden Layer | Number of Layers | Parameters
AE1 | Linear | 1 | 1 K
AE2 | Linear | 2 | 33 K
AE3 | Linear | 3 | 66 K
AE4 | Conv | 1 | 2 K
AE5 | Conv | 2 | 100 K
AE6 | Conv | 3 | 198 K
AE7 | RNN | 1 | 17 K
AE8 | RNN | 2 | 50 K
AE9 | RNN | 3 | 83 K
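As an illustration of the linear variants in Table 3, the sketch below is a small fully connected autoencoder with two hidden layers in the encoder (cf. AE2), written in PyTorch; the input width (14 nodes x 4 features = 56), hidden sizes, and activation are assumptions for illustration, not values reported in the paper.

# Illustrative two-hidden-layer linear autoencoder (cf. AE2 in Table 3); sizes are assumed.
import torch
import torch.nn as nn

class LinearAutoencoder(nn.Module):
    def __init__(self, in_dim=56, hidden_dim=32, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)        # compressed (dimensionality-reduced) representation
        return self.decoder(z), z  # reconstruction and latent code

model = LinearAutoencoder()
batch = torch.randn(8, 56)                    # dummy batch of flattened node features
recon, z = model(batch)
loss = nn.functional.mse_loss(recon, batch)   # unsupervised reconstruction objective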
Table 4. Performance testing table under different models. The best results are highlighted in bold and underlined.
Model | Topology-1 MSE | Topology-1 MAE | Topology-2 MSE | Topology-2 MAE | Topology-3 MSE | Topology-3 MAE | Rank
DNN | 0.8557 | 0.7477 | 0.8555 | 0.7447 | 0.8556 | 0.7477 | 7
CNN | 0.6932 | 0.6706 | 0.6971 | 0.6706 | 0.7001 | 0.6746 | 6
Bi-LSTM | 0.0451 | 0.1601 | 0.0397 | 0.1578 | 0.0457 | 0.1610 | 3
ResNet | 0.0942 | 0.2416 | 0.0988 | 0.2510 | 0.1006 | 0.2520 | 4
Transformer | 0.0432 | 0.1511 | 0.0530 | 0.1689 | 0.0437 | 0.1538 | 2
GCN | 0.6929 | 0.6719 | 0.7290 | 0.6883 | 0.7564 | 0.7016 | 5
Proposed | 0.0182 | 0.0988 | 0.0182 | 0.1027 | 0.0253 | 0.1211 | 1
Table 5. Performance testing under different training data sources.
Train | Test | Topology-1 MSE | Topology-1 MAE | Topology-2 MSE | Topology-2 MAE | Topology-3 MSE | Topology-3 MAE
1 | 2 | 0.0314 | 0.1332 | 0.0283 | 0.1256 | 0.0278 | 0.1227
  | 4 | 0.0272 | 0.1332 | 0.0283 | 0.1243 | 0.0272 | 0.1223
  | 6 | 0.0259 | 0.1170 | 0.0247 | 0.1138 | 0.0254 | 0.1170
  | 8 | 0.0239 | 0.1108 | 0.0283 | 0.1258 | 0.0293 | 0.1255
  | 10 | 0.0247 | 0.1147 | 0.0242 | 0.1146 | 0.0260 | 0.1190
4 | 8 | 0.0234 | 0.1086 | 0.026 | 0.117 | 0.0236 | 0.1112
3 | 7 | 0.0234 | 0.1165 | 0.0269 | 0.1233 | 0.026 | 0.1191
3 | 2 | 0.0294 | 0.1281 | 0.0248 | 0.1141 | 0.0266 | 0.1208
6 | 1 | 0.0246 | 0.1155 | 0.0257 | 0.1178 | 0.0239 | 0.1126
5 | 9 | 0.0278 | 0.1247 | 0.0316 | 0.1324 | 0.0266 | 0.1209
Mean | | 0.0262 | 0.1190 | 0.0269 | 0.1209 | 0.0262 | 0.1191
Table 6. Performance testing under different proportions of training data.
Proportion | Topology | PI-GAT MSE | PI-GAT MAE
5% | 1 | 0.4246 | 0.5510
   | 2 | 0.6033 | 0.5981
   | 3 | 0.2510 | 0.3880
10% | 1 | 0.0877 | 0.2432
    | 2 | 0.0912 | 0.2267
    | 3 | 0.0689 | 0.2056
25% | 1 | 0.0370 | 0.1472
    | 2 | 0.0362 | 0.1447
    | 3 | 0.0475 | 0.1630
50% | 1 | 0.0336 | 0.1381
    | 2 | 0.0339 | 0.1384
    | 3 | 0.0363 | 0.1459
100% | 1 | 0.0305 | 0.1301
     | 2 | 0.0282 | 0.1254
     | 3 | 0.0324 | 0.1364
Table 7. Test performance table under different topology maps.
Topology | Edge | PI-GAT MSE | PI-GAT MAE
1 | None | 0.0280 | 0.1234
2 | [4,5], [9,11], [13,14] | 0.0282 | 0.1249
3 | [4,5], [8,9], [9,11], [12,13], [13,14] | 0.0277 | 0.1236
4 | [4,7] | 0.0306 | 0.1309
5 | [6,13] | 0.0285 | 0.1247
6 | [6,10], [9,14] | 0.0280 | 0.1246
7 | [1,2], [10,11] | 0.0329 | 0.1361
8 | [1,5], [6,12], [9,14] | 0.0278 | 0.1237
9 | [4,5], [4,7], [8,9] | 0.0273 | 0.1217
10 | [4,5], [6,12], [9,11], [9,14] | 0.0285 | 0.1254
Mean | | 0.0288 | 0.1259
