Heterogeneous Graph Neural Network with Local and Global Message Passing for AC-Optimal Power Flow Solutions

Aihui Wen; Bao Wen; Jining Li; Jin Xu

doi:10.3390/asi9010018

Abstract

The AC Optimal Power Flow (AC-OPF) problem remains a major computational bottleneck for real-time power system operation. Conventional solvers are accurate but time-consuming, while Graph Neural Networks (GNNs) offer faster approximations yet struggle to capture long-range dependencies and handle topological variations. To address these limitations, we propose a Heterogeneous Graph Transformer with bus-centric Local–Global Message Passing (LG-HGNN). The model performs type-specific local message passing over heterogeneous power graphs and applies a global Transformer only on bus nodes to capture system-wide correlations efficiently. Effective-resistance positional encodings and resistance-biased attention enhance electrical awareness, and bounded decoders and physics-informed regularization preserve operational feasibility. Experiments on IEEE 14-, 30-, and 118-bus systems show that LG-HGNN achieves near-optimal results within a few percent of the AC-OPF optimum and generalizes to thousands of unseen N-1 contingency topologies without retraining. Compared with interior-point solvers, it attains up to $190\times$ speedup before power-flow correction and over $10\times$ afterward on GOC 2000-bus systems, providing a scalable and physically consistent surrogate for real-time AC-OPF.

Keywords:

Heterogeneous Graph Neural Network; local–global message passing; AC optimal power flow; physics-informed learning

1. Introduction

The AC-Optimal Power Flow (AC-OPF) problem represents a cornerstone of power system operations, determining optimal generator dispatch strategies while respecting complex physical and engineering constraints [1]. Despite its fundamental importance across applications such as unit commitment and transmission expansion planning [2], AC-OPF remains computationally formidable for large-scale grids due to its inherent non-convexity and nonlinearity. Existing commercial solvers cannot guarantee globally optimal solutions for large networks within operationally acceptable timeframes [1]. This challenge becomes even more critical as system operators typically solve AC-OPF every 5–15 min [2], and the rapid variability introduced by distributed energy resources (DERs) demands faster and more reliable solution capabilities [3,4].

Although contemporary solvers such as IPOPT have benefited from GPU acceleration and condensed-space algorithms [5], they still fall short of meeting near-real-time requirements and exhibit poor scalability for large transmission systems [6]. Consequently, system operators often rely on mathematical relaxations [7,8] or simplified approximations such as DC-OPF [9]. These methods reduce computational burden but invariably produce sub-optimal and often AC-infeasible solutions [10]. Even small improvements in AC-OPF accuracy may translate into substantial economic and environmental benefits, with estimates suggesting that a 5% improvement in OPF efficiency could save billions of dollars annually [1].

A wide range of methodologies has been proposed to mitigate the computational difficulty of AC-OPF. Classical optimization-based solvers (e.g., Newton and interior-point methods) offer high accuracy but scale poorly and require repeated nonlinear equation solves [11,12]. Heuristic algorithms such as particle swarm optimization (PSO), genetic algorithms (GA), and ant colony optimization (ACO) [13,14,15] can cope with nonconvexity but incur heavy computational costs and converge inconsistently. More recently, data-driven models, including fully connected neural networks and deep learning surrogates, have been studied to accelerate OPF computation [16,17,18]. However, non-graph models struggle to generalize across topological variations and often violate AC constraints [2]. Graph Neural Networks (GNNs) alleviate this by explicitly encoding network structure [19,20], yet most existing GNN-based OPF solvers rely on homogeneous graphs and purely local message passing, limiting their ability to capture global electrical dependencies and adapt to N-1 contingency scenarios [21,22].

Graph Neural Networks have therefore received growing interest for AC-OPF due to their natural alignment with power network topologies [19,23]. Nevertheless, existing GNN-based approaches face several fundamental challenges [20,21]: (i) purely local message passing fails to capture long-range dependencies such as global voltage-phase coupling, requiring deep GNN stacks that introduce training instability and over-smoothing; (ii) most architectures rely on homogeneous graph representations, treating buses, generators, shunts, and loads as identical nodes, thereby ignoring inherent physical heterogeneity [24,25]; (iii) recent heterogeneous models [22] still require very deep propagation layers (e.g., 60 layers) for large networks, limiting computational efficiency; (iv) current GNN solvers exhibit limited adaptability to topological variations, often requiring costly retraining for each N-1 contingency case.

To address these challenges, we propose a Heterogeneous Graph Transformer with Local–Global Message Passing (LG-HGNN). Our key contributions are summarized as follows:

Bus-centric heterogeneous Transformer architecture. We develop a heterogeneous GNN combining local, edge-aware message passing with a bus-centric global Transformer. Global attention is applied exclusively to bus nodes—the carriers of system voltage states—while other node types (generators, loads, shunts) receive information hierarchically through connected buses. This physically grounded design reduces global attention complexity and enhances scalability on large grids.
Effective-resistance-aware propagation for topology robustness. We introduce effective-resistance positional encodings to capture electrical distances and bias global attention toward electrically relevant nodes. Combined with physics-informed loss functions and bounded decoders enforcing operational limits, LG-HGNN trained solely on base-topology data generalizes strongly to thousands of unseen N-1 contingencies without retraining.
Near-real-time AC-OPF inference with significant speedups. On IEEE 14-, 30-, 118-bus and GOC 2000-bus systems, LG-HGNN achieves accuracy comparable to interior-point solvers while delivering up to $190 \times$ speedup before power-flow correction and over $10 \times$ afterward, showing strong promise for real-time OPF applications.

The remainder of this paper is organized as follows. Section 2 reviews related work on AC optimal power flow and learning-based solution methods. Section 3 formalizes the power system graph representation and the AC-OPF problem formulation. Section 4 presents the proposed LG-HGNN architecture. Section 5 reports the experimental evaluation and results. Section 6 discusses the implications, limitations, and practical considerations of the proposed approach. Finally, Section 7 concludes the paper.

3. Power System Graph and AC-Optimal Power Flow Formulation

3.1. Notations

For convenience, Table 1 summarizes the main symbols used throughout the paper.

Table 1. Summary of main notation.

3.2. Power System Graph Description

We model the power network as a heterogeneous graph that mirrors the physical structure of the grid while exposing the different engineering roles of its components.

Let $\mathcal{N}$ denote the set of buses, with $|\mathcal{N}| = N$, and let $\mathcal{G}$, $\mathcal{L}$, and $\mathcal{S}$ be the sets of generators, loads, and shunt elements, respectively. The transmission infrastructure is represented by a set of directed branches $\mathcal{E} \subset \mathcal{N} \times \mathcal{N}$, where each pair $(i, j) \in \mathcal{E}$ corresponds either to an AC line or to a transformer. We write $\mathcal{E}^{\text{line}}$ and $\mathcal{E}^{\text{tr}}$ for the subsets of AC lines and transformers, and define $\mathcal{E} = \mathcal{E}^{\text{line}} \cup \mathcal{E}^{\text{tr}}$.

To capture the heterogeneous nature of the system, we construct a graph $G^{\text{het}} = (\mathcal{V}, \mathcal{E}^{\text{het}})$ with four node types and three edge types:

Node types:
– Bus nodes $\mathcal{V}^{\text{bus}}$: one node for each bus $i \in \mathcal{N}$.
– Generator nodes $\mathcal{V}^{\text{gen}}$: one node for each generator $k \in \mathcal{G}$.
– Load nodes $\mathcal{V}^{\text{load}}$: one node for each load $k \in \mathcal{L}$.
– Shunt nodes $\mathcal{V}^{\text{shunt}}$: one node for each shunt element $k \in \mathcal{S}$.
Edge types:
– AC line edges $\mathcal{E}^{\text{line}}$: physical branches between buses.
– Transformer edges $\mathcal{E}^{\text{tr}}$: transformer branches between buses.
– Connector edges $\mathcal{E}^{\text{con}}$: pseudo-edges linking each generator, load, or shunt node to its host bus node.

We denote by $\mathcal{E}^{\text{het}} = \mathcal{E}^{\text{line}} \cup \mathcal{E}^{\text{tr}} \cup \mathcal{E}^{\text{con}}$ the full set of heterogeneous edges used by the neural network, whereas $\mathcal{E}$ collects only the physical bus–bus branches used in the AC-OPF formulation.
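As a concrete illustration of $\mathcal{E}^{\text{het}}$, the following Python/NumPy sketch stores each edge type as a $(2, m)$ index array keyed by (source type, relation, target type), and recovers the per-bus device subsets used later in the formulation. The function names, the dictionary layout, and the toy four-bus system are hypothetical choices for illustration, not the authors' data format.

```python
import numpy as np

def build_het_edges(line_edges, tr_edges, host_bus):
    """Assemble E^het = E^line ∪ E^tr ∪ E^con as typed index arrays.

    line_edges, tr_edges: lists or (m, 2) integer arrays of directed bus pairs (i, j).
    host_bus: dict mapping a device type ("gen", "load", "shunt") to an integer
        array whose k-th entry is the host bus i(k) of device k.
    Returns a dict {(src_type, relation, dst_type): (2, m) index array}.
    """
    edges = {
        ("bus", "line", "bus"): np.asarray(line_edges, dtype=np.int64).reshape(-1, 2).T,
        ("bus", "tr", "bus"): np.asarray(tr_edges, dtype=np.int64).reshape(-1, 2).T,
    }
    for dev_type, buses in host_bus.items():
        buses = np.asarray(buses, dtype=np.int64)
        devices = np.arange(buses.size, dtype=np.int64)
        # One connector pseudo-edge per device, pointing to its host bus.
        edges[(dev_type, "con", "bus")] = np.stack([devices, buses])
    return edges

def devices_per_bus(host_bus_of_type, n_bus):
    """Return the subsets G_i (or L_i, S_i): device indices attached to each bus i."""
    groups = [[] for _ in range(n_bus)]
    for k, i in enumerate(host_bus_of_type):
        groups[i].append(k)
    return groups

# Hypothetical 4-bus toy system: 3 lines, 1 transformer, 2 generators, 3 loads, 1 shunt.
edges = build_het_edges(
    line_edges=[[0, 1], [1, 2], [2, 3]],
    tr_edges=[[0, 3]],
    host_bus={"gen": [0, 2], "load": [1, 2, 3], "shunt": [3]},
)
print(devices_per_bus([1, 2, 3], 4))   # [[], [0], [1], [2]]
```

Keeping one index array per relation mirrors the type-specific encoders and message functions introduced in Section 4.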

Figure 1 illustrates this heterogeneous representation on the IEEE-14 test system. Bus nodes form the backbone of the network, while generators, loads, and shunts are attached as satellite nodes via connector edges.

Figure 1. Illustration of the heterogeneous graph representation of the IEEE-14 bus system.

Each node type carries type-specific physical attributes:

Bus nodes. For each bus $i \in \mathcal{N}$, we store:
– Voltage magnitude limits $(V_i^l, V_i^u)$;
– Base voltage (kV) and bus type (PQ, PV, reference, inactive).
Generator nodes. Each generator $k \in \mathcal{G}$ is attached to a unique bus $i(k) \in \mathcal{N}$. For generator node $k$ we store:
– Active/reactive power limits $(P_k^{g,l}, P_k^{g,u}, Q_k^{g,l}, Q_k^{g,u})$;
– Quadratic cost coefficients $(a_k, b_k, c_k)$ defining $C_k(P_k^g) = a_k (P_k^g)^2 + b_k P_k^g + c_k$;
– Initial active/reactive power generation and initial voltage magnitude.
Load nodes. Each load $k \in \mathcal{L}$ is linked to its bus $i(k)$ and is characterized by its fixed complex demand $S_k^d = P_k^d + jQ_k^d$.
Shunt nodes. Each shunt element $k \in \mathcal{S}$ is attached to a bus $i(k)$ and described by its shunt admittance $Y_k^s = G_k^s + jB_k^s$.

For a given bus voltage $V_{i(k)}$, the complex power drawn by the shunt is $S_k^s = (Y_k^s)^* |V_{i(k)}|^2$.
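The per-device quantities above are closed-form expressions. A minimal NumPy sketch, assuming per-unit inputs and hypothetical function names, evaluates the quadratic cost $C_k(P_k^g)$ and the shunt power $S_k^s = (Y_k^s)^*|V_{i(k)}|^2$. Note that a capacitive shunt ($B_k^s > 0$) yields a negative reactive part, so subtracting $S_k^s$ in the bus power balance amounts to injecting reactive power.

```python
import numpy as np

def generation_cost(p_g, a, b, c):
    """Quadratic cost C_k(P_k^g) = a_k (P_k^g)^2 + b_k P_k^g + c_k, elementwise."""
    p_g = np.asarray(p_g, dtype=float)
    return a * p_g**2 + b * p_g + c

def shunt_power(g_s, b_s, v_mag):
    """Complex shunt power S_k^s = conj(Y_k^s) * |V_i(k)|^2, with Y_k^s = G_k^s + j B_k^s."""
    y_s = np.asarray(g_s, dtype=float) + 1j * np.asarray(b_s, dtype=float)
    return np.conj(y_s) * np.asarray(v_mag, dtype=float) ** 2

# Hypothetical per-unit values.
print(generation_cost(2.0, a=1.0, b=3.0, c=0.5))   # 10.5
print(shunt_power(0.0, 0.19, 1.0))                  # negative imaginary part: capacitive shunt
```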

Edge features encode the parameters of physical branches and the auxiliary connector edges:

AC line edges. For each line $(i, j) \in \mathcal{E}^{\text{line}}$ we store:
– Series impedance parameters $(r_{ij}, x_{ij})$ with series admittance $Y_{ij} = G_{ij} + jB_{ij} = \frac{1}{r_{ij} + j x_{ij}}$;
– Shunt (charging) admittances $Y_{ij}^c$ and $Y_{ji}^c$ at the $i$ and $j$ ends of the branch;
– Thermal rating $s_{ij}^u$ (long-term MVA limit), which is used in the branch thermal limit constraint (8);
– Angle-difference limits $(\Delta\theta_{ij}^l, \Delta\theta_{ij}^u)$ defining the admissible range of $\Delta\theta_{ij}$ in (7).
Transformer edges. For transformer branches $(i, j) \in \mathcal{E}^{\text{tr}}$ we store the same electrical parameters as for AC lines, namely $(r_{ij}, x_{ij})$, $Y_{ij}$, shunt admittances $(Y_{ij}^c, Y_{ji}^c)$, thermal rating $s_{ij}^u$, and angle-difference limits $(\Delta\theta_{ij}^l, \Delta\theta_{ij}^u)$. In addition, we store the complex tap ratio $T_{ij} = t_{ij} e^{j\phi_{ij}}$, where $t_{ij}$ is the tap magnitude and $\phi_{ij}$ is the phase-shift angle.
Connector edges. For each generator, load, or shunt node, we create a connector edge to its bus node. These pseudo-edges carry no physical parameters; we only include a small one-hot type indicator that distinguishes generator–bus, load–bus, and shunt–bus connections.

To avoid ambiguity, we use the following conventions throughout the paper:
– $G^{\text{het}} = (\mathcal{V}, \mathcal{E}^{\text{het}})$ denotes the heterogeneous graph used by the neural network;
– $\mathcal{N}$, $\mathcal{G}$, $\mathcal{L}$, and $\mathcal{S}$ denote the sets of buses, generators, loads, and shunts used in the AC-OPF formulation;
– for each bus $i$, $\mathcal{G}_i \subset \mathcal{G}$, $\mathcal{L}_i \subset \mathcal{L}$, and $\mathcal{S}_i \subset \mathcal{S}$ denote the subsets of generators, loads, and shunts connected to bus $i$;
– $S_{ij}$ always denotes the complex power flow from bus $i$ to bus $j$, measured at bus $i$; its magnitude $|S_{ij}|$ is the apparent power flow, and $s_{ij}^u$ is the corresponding thermal rating.

3.3. AC-OPF Mathematical Formulation

We now recall the AC-OPF problem solved by the conventional optimizer and approximated by our neural model. The formulation follows standard practice and is consistent with the OPFData [46] benchmark and related work.

Decision variables: For every generator $k \in \mathcal{G}$ we introduce a complex power $S_k^g = P_k^g + jQ_k^g$, with real-valued decision variables $(P_k^g, Q_k^g)$. For each bus $i \in \mathcal{N}$ we introduce a voltage phasor $V_i = |V_i| e^{j\theta_i}$, parametrized by voltage magnitude $|V_i|$ and phase angle $\theta_i$. Branch flows $S_{ij} = P_{ij} + jQ_{ij}$ for $(i, j) \in \mathcal{E}$ are deterministically given by the bus voltages and branch parameters through the AC $\pi$-model; they are not independent variables in our formulation.

Objective: The goal of AC-OPF is to find an operating point that minimizes total generation cost:

$$\min_{\{P_k^g,\, Q_k^g,\, |V_i|,\, \theta_i\}} \; \sum_{k \in \mathcal{G}} \left[ a_k (P_k^g)^2 + b_k P_k^g + c_k \right]. \tag{1}$$

Operational constraints: The decision variables are subject to standard engineering limits as follows:

Bus voltage limits:

$$V_i^l \leq |V_i| \leq V_i^u, \quad \forall i \in \mathcal{N}. \tag{2}$$

Generator active and reactive power limits:

$$P_k^{g,l} \leq P_k^g \leq P_k^{g,u}, \quad \forall k \in \mathcal{G}, \tag{3}$$

$$Q_k^{g,l} \leq Q_k^g \leq Q_k^{g,u}, \quad \forall k \in \mathcal{G}. \tag{4}$$

Reference bus angles: A subset $\mathcal{R} \subseteq \mathcal{N}$ of buses is designated as reference (slack) buses. Their angles are fixed to eliminate the global rotational degree of freedom:

$$\theta_r = 0, \quad \forall r \in \mathcal{R}. \tag{5}$$

Branch angle difference limits: For each branch $(i, j) \in \mathcal{E}$ we define the effective angle difference

$$\Delta\theta_{ij} = \theta_i - \theta_j - \phi_{ij}, \tag{6}$$

where $\phi_{ij}$ is the transformer phase shift (zero for standard AC lines). Operational limits on this angle difference are enforced via

$$\Delta\theta_{ij}^l \leq \Delta\theta_{ij} \leq \Delta\theta_{ij}^u, \quad \forall (i, j) \in \mathcal{E}. \tag{7}$$

Branch thermal limits: The apparent power flow on each branch must not exceed its thermal rating:

$$|S_{ij}| \leq s_{ij}^u, \quad \forall (i, j) \in \mathcal{E}. \tag{8}$$
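Constraints (2)–(8) can be checked numerically for any candidate operating point, for example a network prediction. The NumPy sketch below reports the largest violation per constraint family; the function names and dictionary keys are hypothetical, and, following (8) as written, only the sending-end flows $S_{ij}$ are checked.

```python
import numpy as np

def box_violation(x, lower, upper):
    """Elementwise distance of x outside [lower, upper]; zero when inside."""
    x = np.asarray(x, dtype=float)
    return np.maximum(lower - x, 0.0) + np.maximum(x - upper, 0.0)

def max_violations(vm, theta, pg, qg, s_ij, branches, phi, lim, ref_buses):
    """Largest violation of each constraint family (2)-(8).

    branches: (m, 2) int array of directed branches (i, j) in E.
    phi: phase shifts phi_ij (0 for plain lines); s_ij: complex flows S_ij.
    lim: dict of limits with hypothetical keys
         vmin/vmax, pmin/pmax, qmin/qmax, dtmin/dtmax, smax.
    """
    dtheta = theta[branches[:, 0]] - theta[branches[:, 1]] - phi   # Eq. (6)
    return {
        "voltage (2)": box_violation(vm, lim["vmin"], lim["vmax"]).max(),
        "p_gen (3)": box_violation(pg, lim["pmin"], lim["pmax"]).max(),
        "q_gen (4)": box_violation(qg, lim["qmin"], lim["qmax"]).max(),
        "ref angle (5)": np.abs(theta[ref_buses]).max(),
        "angle diff (7)": box_violation(dtheta, lim["dtmin"], lim["dtmax"]).max(),
        "thermal (8)": np.maximum(np.abs(s_ij) - lim["smax"], 0.0).max(),
    }
```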

Branch power flow equations: Let $Y_{ij} = 1/(r_{ij} + j x_{ij})$ denote the series admittance of branch $(i, j)$, and let $Y_{ij}^c$ and $Y_{ji}^c$ denote the shunt (charging) admittances at the $i$ and $j$ ends of the branch, respectively. The complex tap ratio is $T_{ij} = t_{ij} e^{j\phi_{ij}}$, with $T_{ij} = 1$ for AC lines without transformer taps.

Under the standard AC $\pi$-equivalent model, the complex power flows from bus $i$ to $j$ and from $j$ to $i$ are given by

$$S_{ij} = (Y_{ij} + Y_{ij}^c)^* \frac{|V_i|^2}{|T_{ij}|^2} - Y_{ij}^* \frac{V_i V_j^*}{T_{ij}}, \quad \forall (i, j) \in \mathcal{E}, \tag{9}$$

$$S_{ji} = (Y_{ij} + Y_{ji}^c)^* |V_j|^2 - Y_{ij}^* \frac{V_i^* V_j}{T_{ij}^*}, \quad \forall (i, j) \in \mathcal{E}. \tag{10}$$

Each branch flow can be decomposed into its active and reactive components, $S_{ij} = P_{ij} + jQ_{ij}$ and $S_{ji} = P_{ji} + jQ_{ji}$.
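Equations (9) and (10) vectorize directly over branches. The sketch below is a minimal NumPy implementation in the paper's notation, with hypothetical argument names, that returns both end flows. For a lossless line ($r_{ij} = 0$, no charging) the active parts cancel, $P_{ij} + P_{ji} = 0$, which makes a useful sanity check.

```python
import numpy as np

def branch_flows(v, r, x, yc_from, yc_to, tap, branches):
    """Complex branch flows of the AC pi-model, Eqs. (9)-(10).

    v: complex bus voltages V_i = |V_i| exp(j theta_i), shape (N,)
    r, x: series resistance and reactance per branch, shape (m,)
    yc_from, yc_to: complex charging admittances Y^c_ij and Y^c_ji
    tap: complex tap ratio T_ij = t_ij exp(j phi_ij) (1 for plain lines)
    branches: (m, 2) int array of directed pairs (i, j)
    """
    y = 1.0 / (r + 1j * x)                             # series admittance Y_ij
    vi, vj = v[branches[:, 0]], v[branches[:, 1]]
    s_ij = (np.conj(y + yc_from) * np.abs(vi) ** 2 / np.abs(tap) ** 2
            - np.conj(y) * vi * np.conj(vj) / tap)      # Eq. (9)
    s_ji = (np.conj(y + yc_to) * np.abs(vj) ** 2
            - np.conj(y) * np.conj(vi) * vj / np.conj(tap))  # Eq. (10)
    return s_ij, s_ji

# Hypothetical 2-bus example: lossless line, angle difference 0.1 rad.
v = np.array([1.0, np.exp(-0.1j)])
s_ij, s_ji = branch_flows(v, r=np.array([0.0]), x=np.array([0.1]),
                          yc_from=np.array([0j]), yc_to=np.array([0j]),
                          tap=np.array([1.0 + 0j]), branches=np.array([[0, 1]]))
print(s_ij.real + s_ji.real)   # approximately zero: no active losses on a lossless line
```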

Power balance constraints: Finally, Kirchhoff’s current law is enforced at each bus in complex power form. Because $\mathcal{E}$ stores each physical branch once, with a fixed orientation, the power leaving bus $i$ through a branch is given by (9) when $i$ is the sending end, $(i, j) \in \mathcal{E}$, and by (10) when $i$ is the receiving end, $(j, i) \in \mathcal{E}$ (applying (10) to branch $(j, i)$ yields $S_{ij}$). For each $i \in \mathcal{N}$:

$$\sum_{k \in \mathcal{G}_i} S_k^g - \sum_{k \in \mathcal{L}_i} S_k^d - \sum_{k \in \mathcal{S}_i} S_k^s = \sum_{j : (i, j) \in \mathcal{E}} S_{ij} + \sum_{j : (j, i) \in \mathcal{E}} S_{ij}, \quad \forall i \in \mathcal{N}, \tag{11}$$

where the right-hand side collects all branch flows leaving bus $i$, whichever end of the branch bus $i$ occupies. Equivalently, the right-hand side can be written with a branch–bus incidence matrix.
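Evaluating the bus balance (11) requires, for every bus, all branch flows leaving it. With each physical branch stored once in $\mathcal{E}$, the sending-end flow $S_{ij}$ is charged to bus $i$ and the receiving-end flow $S_{ji}$ from (10) to bus $j$; NumPy's unbuffered scatter-add `np.add.at` handles both. The sketch below, with hypothetical argument names, returns the complex mismatch (left-hand side minus right-hand side) at every bus.

```python
import numpy as np

def bus_mismatch(n_bus, branches, s_ij, s_ji, gen_bus, s_gen,
                 load_bus, s_load, shunt_bus, s_shunt):
    """Complex mismatch of the bus power balance at every bus.

    Generation - demand - shunt power, minus all branch flows leaving the bus:
    S_ij is charged to the sending bus i and S_ji to the receiving bus j.
    """
    res = np.zeros(n_bus, dtype=complex)
    np.add.at(res, gen_bus, s_gen)
    np.add.at(res, load_bus, -np.asarray(s_load))
    np.add.at(res, shunt_bus, -np.asarray(s_shunt))
    np.add.at(res, branches[:, 0], -np.asarray(s_ij))
    np.add.at(res, branches[:, 1], -np.asarray(s_ji))
    return res

# Hypothetical 2-bus case: the generator at bus 0 supplies S_01, the load at bus 1 absorbs -S_10.
res = bus_mismatch(2, np.array([[0, 1]]), np.array([1.0 + 0.1j]), np.array([-0.99 + 0.05j]),
                   gen_bus=np.array([0]), s_gen=np.array([1.0 + 0.1j]),
                   load_bus=np.array([1]), s_load=np.array([0.99 - 0.05j]),
                   shunt_bus=np.array([], dtype=int), s_shunt=np.array([], dtype=complex))
print(np.abs(res).max())   # 0.0
```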

Equations (1)–(11) define the AC-OPF problem that we approximate. Given grid parameters and load profiles, the goal of our neural solver is to predict near-optimal operating points $(|V|, \theta, P^g, Q^g)$ that satisfy these constraints up to small residual violations and from which all remaining electrical quantities can be recovered: branch flows via the power flow Equations (9) and (10), and shunt powers via $S_k^s = (Y_k^s)^* |V_{i(k)}|^2$.

4. Methodology

4.1. Model Architecture

The proposed Heterogeneous Graph Neural Network with Local–Global Message Passing (LG-HGNN) learns a supervised approximation of the AC-OPF solutions produced by a conventional solver for the problem defined in Section 3. Given grid parameters and load profiles, LG-HGNN predicts near-optimal values of $(|V|, \theta, P^g, Q^g)$. As shown in Figure 2, the architecture follows an encoder–processor–decoder paradigm:

Figure 2. Overall architecture of the proposed LG-HGNN.

An encoding stage that maps raw heterogeneous node and edge features into a common latent space and augments bus nodes with effective-resistance positional encodings.
A processing stage comprising K layers that combine local heterogeneous message passing with a bus-centric, effective-resistance-biased global transformer aggregation. This hierarchical design captures short-range electrical interactions while modeling long-range dependencies, such as voltage-angle correlations, in a way that respects the physical role of buses.
A decoding stage that uses type-specific bounded decoders to produce physically interpretable quantities (generator powers, bus voltages) directly constrained by the operational limits defined in (2)–(4). Remaining variables, such as branch flows $S_{i j}$ , are derived deterministically using the power flow Equations (9) and (10).

4.1.1. Encoding Stage

To inject global structural information into node representations, we employ effective resistance as a positional encoding that reflects the electrical distance between buses [55]. Consider the DC power flow linearization $P = L\theta$, where $P \in \mathbb{R}^N$ collects active power injections at all buses, $\theta \in \mathbb{R}^N$ contains voltage angles, and $L \in \mathbb{R}^{N \times N}$ is a susceptance-weighted graph Laplacian constructed from nonnegative edge weights $b_{ij}$ (e.g., the magnitudes of line susceptances, with $b_{ij} = 0$ for unconnected bus pairs). Its entries are

$$L_{ij} = \begin{cases} -b_{ij}, & i \neq j, \\ \sum_{k \neq i} b_{ik}, & i = j, \end{cases} \tag{12}$$

so that each row of $L$ sums to zero and $L$ is positive semidefinite. Let $L^+$ denote the Moore–Penrose pseudoinverse of $L$.

The effective resistance between buses $i$ and $j$ is

$$\Omega_{ij} = (e_i - e_j)^\top L^+ (e_i - e_j), \tag{13}$$

with $e_i$ the $i$-th canonical basis vector. The Laplacian $L$ and effective resistances $\Omega_{ij}$ are computed on the bus–bus graph obtained from $\mathcal{E}$.
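Equations (12) and (13) can be evaluated for all bus pairs at once, since $\Omega_{ij} = L^+_{ii} + L^+_{jj} - 2L^+_{ij}$. The NumPy sketch below assumes a connected bus graph (for an islanded network, resistances between different islands are not meaningful) and uses hypothetical function names; parallel branches simply add their weights.

```python
import numpy as np

def weighted_laplacian(n_bus, branches, b):
    """Susceptance-weighted Laplacian of Eq. (12) from branch weights b_ij >= 0."""
    lap = np.zeros((n_bus, n_bus))
    i, j = branches[:, 0], branches[:, 1]
    np.add.at(lap, (i, j), -b)   # off-diagonal entries -b_ij
    np.add.at(lap, (j, i), -b)
    np.add.at(lap, (i, i), b)    # diagonal: sum of incident weights
    np.add.at(lap, (j, j), b)
    return lap

def effective_resistance(lap):
    """Omega_ij = (e_i - e_j)^T L^+ (e_i - e_j) for all pairs, Eq. (13)."""
    lp = np.linalg.pinv(lap)
    d = np.diag(lp)
    return d[:, None] + d[None, :] - 2.0 * lp

# Hypothetical 3-bus path 0-1-2 with unit weights: two unit resistors in series.
omega = effective_resistance(weighted_laplacian(3, np.array([[0, 1], [1, 2]]), np.ones(2)))
print(omega[0, 2])   # approximately 2.0
```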

For each bus $i$, we summarize its row $\Omega_{i,:}$ via simple statistics,

$$\mathrm{PE}_i = \left[\operatorname{mean}(\Omega_{i,:}),\ \operatorname{std}(\Omega_{i,:}),\ \min(\Omega_{i,:}),\ \operatorname{median}(\Omega_{i,:}),\ \max(\Omega_{i,:})\right], \tag{14}$$

yielding a compact, fixed-length (five-dimensional) positional encoding, independent of the number of buses, that captures the distribution of electrical distances from bus $i$ to the rest of the network.
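Taken literally, the statistics in (14) range over the whole row, including the diagonal entry $\Omega_{ii} = 0$, so the minimum is zero for every bus. The sketch below follows (14) by default and exposes a hypothetical `exclude_self` flag that drops the diagonal; which variant the authors use is not stated here.

```python
import numpy as np

def resistance_pe(omega, exclude_self=False):
    """Per-bus statistics of effective resistance, Eq. (14): [mean, std, min, median, max].

    omega: (N, N) symmetric effective-resistance matrix with zero diagonal.
    exclude_self: if True (a hypothetical variant), drop Omega_ii from each row.
    """
    n = omega.shape[0]
    rows = omega
    if exclude_self:
        rows = omega[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return np.stack(
        [rows.mean(axis=1), rows.std(axis=1), rows.min(axis=1),
         np.median(rows, axis=1), rows.max(axis=1)],
        axis=1,
    )

# Effective resistances of a 3-bus path with unit weights.
omega = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
pe = resistance_pe(omega)   # shape (3, 5): five features per bus
```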

We then map all heterogeneous node and edge types to a shared latent space using type-specific multilayer perceptrons. Let $a \in \{\text{bus}, \text{gen}, \text{load}, \text{shunt}\}$ index node types, and let $e \in \{\text{line}, \text{tr}, \text{con}\}$ index edge types. Denote by $x_u^a$ the raw feature vector of node $u$ of type $a$, and by $x_{(i,j)}^e$ the raw feature vector for an edge $(i, j)$ of type $e$.

For bus nodes $i \in \mathcal{N}$, we concatenate the effective-resistance encoding and apply a type-specific encoder:

$$h_{\text{bus},i}^{(0)} = \mathrm{LayerNorm}\left(\mathrm{MLP}_{\text{enc}}^{\text{bus}}\left([x_i^{\text{bus}} \,\|\, \mathrm{PE}_i]\right)\right). \tag{15}$$

For generator, load, and shunt nodes, we reuse the positional encoding of their host bus $i(u)$:

$$h_{a,u}^{(0)} = \mathrm{LayerNorm}\left(\mathrm{MLP}_{\text{enc}}^{a}\left([x_u^a \,\|\, \mathrm{PE}_{i(u)}]\right)\right), \quad a \in \{\text{gen}, \text{load}, \text{shunt}\}. \tag{16}$$

For edges, we use edge-type-specific encoders:

$$h_{e,(i,j)}^{(0)} = \mathrm{LayerNorm}\left(\mathrm{MLP}_{\text{enc}}^{e}\left(x_{(i,j)}^e\right)\right). \tag{17}$$

All encoders are two-layer MLPs with ReLU activations.
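The encoders (15)–(17) are small per-type networks. A minimal NumPy forward-pass sketch, assuming randomly initialized weights, a LayerNorm without learnable gain and bias, and hypothetical parameter names, shows how device nodes borrow the positional encoding of their host bus.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """LayerNorm over the feature axis (learnable gain/bias omitted in this sketch)."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer MLP (scaled Gaussian init, an assumption)."""
    return {"W1": rng.normal(0.0, d_in ** -0.5, (d_in, d_hidden)), "b1": np.zeros(d_hidden),
            "W2": rng.normal(0.0, d_hidden ** -0.5, (d_hidden, d_out)), "b2": np.zeros(d_out)}

def mlp(p, x):
    """Two-layer MLP with a ReLU hidden layer."""
    return np.maximum(x @ p["W1"] + p["b1"], 0.0) @ p["W2"] + p["b2"]

def encode_nodes(params, feats, pe, host_bus):
    """Eqs. (15)-(16): [x || PE] -> type-specific MLP -> LayerNorm.

    Bus nodes use their own PE_i; device nodes reuse PE_{i(u)} of their host bus.
    """
    out = {}
    for a, x in feats.items():
        pe_a = pe if a == "bus" else pe[host_bus[a]]
        out[a] = layer_norm(mlp(params[a], np.concatenate([x, pe_a], axis=1)))
    return out

def encode_edges(params, edge_feats):
    """Eq. (17): type-specific edge encoders (no positional encoding)."""
    return {e: layer_norm(mlp(params[e], x)) for e, x in edge_feats.items()}

# Hypothetical sizes: 4 buses (3 raw features), 2 generators (6 features), latent d = 8.
rng = np.random.default_rng(0)
feats = {"bus": rng.normal(size=(4, 3)), "gen": rng.normal(size=(2, 6))}
pe = rng.normal(size=(4, 5))
params = {"bus": init_mlp(rng, 3 + 5, 16, 8), "gen": init_mlp(rng, 6 + 5, 16, 8)}
h0 = encode_nodes(params, feats, pe, host_bus={"gen": np.array([0, 2])})
print({a: h.shape for a, h in h0.items()})   # {'bus': (4, 8), 'gen': (2, 8)}
```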

4.1.2. Processing Stage: Local and Global Message Passing

The core of LG-HGNN is a stack of K layers that alternates between local heterogeneous message passing and bus-centric global transformer aggregation. The local mechanism captures near-neighbor electrical interactions and respects type-specific relations, while the global transformer operates on bus embeddings to propagate information across the entire grid, reflecting the fact that voltage magnitudes and angles are defined per bus and are globally coupled through the admittance matrix.

At layer $k$, we denote node and edge embeddings by $h_{\text{node}}^{(k)}$ and $h_{\text{edge}}^{(k)}$ for brevity.

Local message passing with edge-aware attention: Each local layer consists of an edge update followed by a node update. A two-stage attention mechanism is employed to handle grid heterogeneity effectively. The edge update stage modulates messages based on joint node–edge–node context, allowing the model to selectively filter interactions according to physical branch attributes such as impedance, tap ratios, and connection type. The node update attention operates during neighborhood aggregation, re-weighting incoming messages to account for competition among multiple neighbors. This separation reflects the heterogeneous electrical roles of branches and nodes: the edge update stage captures relation-specific effects, while the node update stage resolves context-dependent importance among multiple incident components.

Edge update. For an edge $(i, j)$ of type $e$ connecting nodes $i$ and $j$ of types $a(i)$ and $a(j)$, we first form an edge message:

$$m_{e,(i,j)}^{(k)} = \mathrm{MLP}_{\text{edge}}^{e}\left([h_{a(i),i}^{(k-1)} \,\|\, h_{e,(i,j)}^{(k-1)} \,\|\, h_{a(j),j}^{(k-1)}]\right), \tag{18}$$

$$\alpha_{e,(i,j)}^{(k)} = \sigma\left(w_e^\top m_{e,(i,j)}^{(k)}\right), \tag{19}$$

$$\tilde{h}_{e,(i,j)}^{(k)} = \alpha_{e,(i,j)}^{(k)} \cdot m_{e,(i,j)}^{(k)}, \tag{20}$$

$$h_{e,(i,j)}^{(k)} = h_{e,(i,j)}^{(k-1)} + \mathrm{Dropout}\left(\tilde{h}_{e,(i,j)}^{(k)}\right), \tag{21}$$

where $\sigma(\cdot)$ is the sigmoid function and $w_e$ is a learnable attention vector specific to edge type $e$.
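For one edge type, (18)–(21) amount to a gated residual update of the edge embedding. The NumPy sketch below implements $\mathrm{MLP}_{\text{edge}}^e$ as a two-layer ReLU network and applies inverted dropout only when a random generator is passed; parameter names and initialization are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def edge_update(h_src, h_dst, h_edge, edge_index, p, rng=None, p_drop=0.0):
    """Eqs. (18)-(21) for one edge type e.

    h_src, h_dst: embeddings of the node types at the two ends, shape (n, d).
    h_edge: edge embeddings from layer k-1, shape (m, d).
    edge_index: (2, m) int array of (i, j) node indices.
    p: dict with MLP weights W1, b1, W2, b2 and attention vector w (hypothetical names).
    """
    i, j = edge_index
    z = np.concatenate([h_src[i], h_edge, h_dst[j]], axis=1)
    m = np.maximum(z @ p["W1"] + p["b1"], 0.0) @ p["W2"] + p["b2"]   # Eq. (18)
    alpha = sigmoid(m @ p["w"])[:, None]                             # Eq. (19)
    h_tilde = alpha * m                                              # Eq. (20)
    if rng is not None and p_drop > 0.0:                             # training-time dropout
        keep = rng.random(h_tilde.shape) >= p_drop
        h_tilde = h_tilde * keep / (1.0 - p_drop)
    return h_edge + h_tilde, alpha[:, 0]                             # Eq. (21)

# Hypothetical toy: 3 buses, 2 line edges (0->1, 1->2), latent size d = 4.
rng = np.random.default_rng(1)
d = 4
h_bus = rng.normal(size=(3, d))
h_line = rng.normal(size=(2, d))
p = {"W1": rng.normal(size=(3 * d, 8)) * 0.3, "b1": np.zeros(8),
     "W2": rng.normal(size=(8, d)) * 0.3, "b2": np.zeros(d), "w": rng.normal(size=d)}
h_line_new, gate = edge_update(h_bus, h_bus, h_line, np.array([[0, 1], [1, 2]]), p)
```

The sigmoid gate lets each edge type suppress its own message, for example for electrically weak branches, independently of how many other edges a node has.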

Node update. For each node $i$ of type $a \in \{\text{bus}, \text{gen}, \text{load}, \text{shunt}\}$, we aggregate messages from its neighbors $\mathcal{N}(i)$, i.e., the nodes joined to $i$ by an incident edge of any type, using a degree-normalized attention mechanism:

$$q_{a,i}^{(k)} = W_a^Q h_{a,i}^{(k-1)}, \tag{22}$$

$$k_{e,(i,j)}^{(k)} = W_e^K [h_{a(j),j}^{(k-1)} \,\|\, h_{e,(i,j)}^{(k)}], \tag{23}$$

$$v_{e,(i,j)}^{(k)} = W_e^V [h_{a(j),j}^{(k-1)} \,\|\, h_{e,(i,j)}^{(k)}], \tag{24}$$

$$e_{ij}^{(k)} = \frac{(q_{a,i}^{(k)})^\top k_{e,(i,j)}^{(k)}}{\sqrt{d}}, \tag{25}$$

$$\alpha_{ij}^{(k)} = \mathrm{softmax}_{j \in \mathcal{N}(i)}\left(e_{ij}^{(k)}\right), \tag{26}$$

$$m_{a,i}^{(k)} = \sum_{j \in \mathcal{N}(i)} \frac{\alpha_{ij}^{(k)}}{\sqrt{d_i d_j}} v_{e,(i,j)}^{(k)}, \tag{27}$$

$$\tilde{h}_{a,i}^{(k)} = \mathrm{MLP}_{\text{node}}^{a}\left([h_{a,i}^{(k-1)} \,\|\, m_{a,i}^{(k)}]\right), \tag{28}$$

$$h_{a,i}^{(k)} = h_{a,i}^{(k-1)} + \mathrm{Dropout}\left(\tilde{h}_{a,i}^{(k)}\right), \tag{29}$$

where $W_a^Q$, $W_e^K$, and $W_e^V$ are learnable projections, $d$ is the latent dimension, and $d_i$ denotes the (possibly normalized) degree of node $i$. Residual connections and dropout mitigate over-smoothing and stabilize deep message passing.
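The softmax in (26) is taken separately over the neighbor set of each target node, which in an edge-list representation is a segment (grouped) softmax. The NumPy sketch below covers a single target node type and a single incident edge type, omits dropout, and treats $\mathrm{MLP}_{\text{node}}^a$ as a user-supplied function; all names are hypothetical. In the full model, the softmax and the sum in (27) would run over incident edges of all types.

```python
import numpy as np

def segment_softmax(scores, seg, n_seg):
    """Softmax of `scores` within each group seg == s (here: per target node)."""
    seg_max = np.full(n_seg, -np.inf)
    np.maximum.at(seg_max, seg, scores)          # per-group max for stability
    ex = np.exp(scores - seg_max[seg])
    denom = np.zeros(n_seg)
    np.add.at(denom, seg, ex)
    return ex / denom[seg]

def node_update(h_tgt, h_nbr, h_edge, edge_index, deg_tgt, deg_nbr, Wq, Wk, Wv, mlp_node):
    """Eqs. (22)-(29) for one target node type and one incident edge type (no dropout).

    edge_index: (2, m) array; row 0 = neighbor j, row 1 = target i.
    Row-vector convention: the paper's W x becomes x @ W here.
    """
    nbr, tgt = edge_index
    d = Wq.shape[1]
    q = h_tgt @ Wq                                            # Eq. (22)
    kv_in = np.concatenate([h_nbr[nbr], h_edge], axis=1)
    k, v = kv_in @ Wk, kv_in @ Wv                             # Eqs. (23)-(24)
    e = np.sum(q[tgt] * k, axis=1) / np.sqrt(d)               # Eq. (25)
    alpha = segment_softmax(e, tgt, h_tgt.shape[0])           # Eq. (26)
    w = alpha / np.sqrt(deg_tgt[tgt] * deg_nbr[nbr])          # degree normalization
    msg = np.zeros((h_tgt.shape[0], v.shape[1]))
    np.add.at(msg, tgt, w[:, None] * v)                       # Eq. (27)
    h_tilde = mlp_node(np.concatenate([h_tgt, msg], axis=1))  # Eq. (28)
    return h_tgt + h_tilde                                    # Eq. (29)

# Hypothetical toy: 3 buses, lines 0-1 and 1-2 stored in both directions, d = 4.
rng = np.random.default_rng(2)
d = 4
h_bus, h_line = rng.normal(size=(3, d)), rng.normal(size=(4, d))
edge_index = np.array([[1, 0, 2, 1], [0, 1, 1, 2]])
deg = np.array([1.0, 2.0, 1.0])
Wq, Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d))
W_mlp = rng.normal(size=(2 * d, d)) * 0.1
out = node_update(h_bus, h_bus, h_line, edge_index, deg, deg, Wq, Wk, Wv,
                  lambda z: np.tanh(z @ W_mlp))
```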

Bus-centric hierarchical global aggregation: After local updates, we apply a global Transformer only to bus nodes, which serve as the physically meaningful carriers of voltage states. We stack the bus-node embeddings at layer $k$ as

$$H_{\text{bus}}^{(k)} = \left[h_{\text{bus},i}^{(k)}\right]_{i \in \mathcal{N}}. \tag{30}$$

For each attention head $h = 1, \dots, N_h$, we compute bus-level queries, keys, and values as

$$Q_h = H_{\text{bus}}^{(k)} W_Q^{(h)}, \quad K_h = H_{\text{bus}}^{(k)} W_K^{(h)}, \quad V_h = H_{\text{bus}}^{(k)} W_V^{(h)}. \tag{31}$$

To inject electrical-distance priors, we bias the attention logits using the effective resistance $\Omega_{ij}$ defined in (13). For buses $i, j \in \mathcal{N}$, the attention score of head $h$ is

$$e_{ij}^{(k,h)} = \frac{Q_{h,i} K_{h,j}^\top}{\sqrt{d/N_h}} - \beta\, \Omega_{ij}, \tag{32}$$

where $\beta$ is a learnable or fixed scaling factor controlling the influence of electrical distance. The corresponding attention weights and head outputs are

$$\alpha_{ij}^{(k,h)} = \mathrm{softmax}_j\left(e_{ij}^{(k,h)}\right), \quad Z_{h,i}^{(k)} = \sum_j \alpha_{ij}^{(k,h)} V_{h,j}. \tag{33}$$

Outputs from all heads are concatenated and projected:

$$Z_{\text{bus}}^{(k)} = \mathrm{Concat}\left(Z_1^{(k)}, \dots, Z_{N_h}^{(k)}\right) W_O. \tag{34}$$

A feedforward block with a residual connection then yields the global bus update:

$$H_{\text{bus,global}}^{(k)} = H_{\text{bus}}^{(k)} + \gamma \cdot \mathrm{Dropout}\left(\mathrm{MLP}_{\text{trans}}\left(\mathrm{LayerNorm}(Z_{\text{bus}}^{(k)})\right)\right), \tag{35}$$

where $\gamma$ is a learnable scaling factor.
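The resistance-biased attention (31)–(35) can be sketched in NumPy as below. The per-head projections $W_Q^{(h)}$ are stacked into one $d \times d$ matrix and reshaped into heads, which is equivalent to separate per-head matrices. Dropout is omitted, $\mathrm{MLP}_{\text{trans}}$ is a user-supplied function, and all names are hypothetical. A large $\beta$ concentrates attention on electrically close buses; in the limit each bus attends only to itself, since $\Omega_{ii} = 0$.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(h.var(axis=-1, keepdims=True) + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=axis, keepdims=True)

def global_bus_attention(H, omega, Wq, Wk, Wv, Wo, n_heads, beta, gamma, mlp_trans):
    """Eqs. (31)-(35): multi-head self-attention over buses with a -beta * Omega bias.

    H: (N, d) bus embeddings; omega: (N, N) effective resistances from Eq. (13).
    Returns the updated bus embeddings and the (n_heads, N, N) attention weights.
    """
    n, d = H.shape
    dh = d // n_heads

    def split(x):  # (N, d) -> (heads, N, d / heads)
        return x.reshape(n, n_heads, dh).transpose(1, 0, 2)

    Q, K, V = split(H @ Wq), split(H @ Wk), split(H @ Wv)            # Eq. (31)
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(dh) - beta * omega   # Eq. (32)
    A = softmax(logits, axis=-1)                                      # Eq. (33)
    Z = (A @ V).transpose(1, 0, 2).reshape(n, d) @ Wo                 # Eqs. (33)-(34)
    return H + gamma * mlp_trans(layer_norm(Z)), A                    # Eq. (35), no dropout

# Hypothetical toy: 3 buses on a path, d = 8, two heads.
rng = np.random.default_rng(3)
n, d = 3, 8
H = rng.normal(size=(n, d))
omega = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
Ws = [rng.normal(size=(d, d)) * d ** -0.5 for _ in range(4)]
H_new, A = global_bus_attention(H, omega, *Ws, n_heads=2, beta=1.0, gamma=0.5, mlp_trans=np.tanh)
```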

The updated bus representations are finally fused back into all node types in a hierarchical fashion. For each bus $i \in \mathcal{N}$,

$$h_{\text{bus},i}^{(k)} \leftarrow \mathrm{LayerNorm}\left(h_{\text{bus},i}^{(k)} + W_{\text{bus}} H_{\text{bus,global},i}^{(k)}\right), \tag{36}$$

and for each generator, load, or shunt node $u$ attached to bus $i(u)$,

$$h_{a,u}^{(k)} \leftarrow \mathrm{LayerNorm}\left(h_{a,u}^{(k)} + W_a H_{\text{bus,global},i(u)}^{(k)}\right), \quad a \in \{\text{gen}, \text{load}, \text{shunt}\}. \tag{37}$$

This bus-centric hierarchical design limits the quadratic cost of global attention to $O(|\mathcal{N}|^2)$ per layer, instead of $O(|\mathcal{V}|^2)$ for attention over all heterogeneous nodes, and aligns the modeling of long-range dependencies with the physical role of buses in AC power flow.
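The fusion step (36)–(37) broadcasts each bus's global embedding to the devices it hosts by indexing with $i(u)$. A minimal NumPy sketch under the row-vector convention (the paper's $W x$ becomes `x @ W`), with hypothetical names and LayerNorm without learnable gain and bias:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(h.var(axis=-1, keepdims=True) + eps)

def fuse_global(h_bus, H_global, h_dev, host_bus, W_bus, W_dev):
    """Eqs. (36)-(37): add projected global bus states to buses and attached devices.

    h_dev, host_bus, W_dev: dicts keyed by device type ("gen", "load", "shunt").
    host_bus[a][u] is the index i(u) of the bus hosting device u.
    """
    new_bus = layer_norm(h_bus + H_global @ W_bus)                    # Eq. (36)
    new_dev = {a: layer_norm(h + H_global[host_bus[a]] @ W_dev[a])    # Eq. (37)
               for a, h in h_dev.items()}
    return new_bus, new_dev
```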

4.1.3. Decoding Stage

After $K$ layers of processing, the model produces final node embeddings $h_{a,i}^{(K)}$ that are decoded into AC-OPF variables. We use type-specific decoders with bounded outputs to enforce box constraints by design.

For each generator node $k \in \mathcal{G}$ we predict active and reactive powers $(\hat{P}_k^g, \hat{Q}_k^g)$ as

$$\hat{y}_{\text{gen},k} = \mathrm{Scale}\left(\sigma\left(\mathrm{MLP}_{\text{dec}}^{\text{gen}}(h_{\text{gen},k}^{(K)})\right), y_{\text{gen},k}^l, y_{\text{gen},k}^u\right), \tag{38}$$

where $\sigma(\cdot)$ is the sigmoid function mapping into $(0, 1)$, and $\mathrm{Scale}(x, y^l, y^u) = y^l + x \cdot (y^u - y^l)$ is applied elementwise, with $y_{\text{gen},k}^l = (P_k^{g,l}, Q_k^{g,l})$ and $y_{\text{gen},k}^u = (P_k^{g,u}, Q_k^{g,u})$ given by the generator limits. This ensures that the box constraints (3) and (4) are satisfied by construction.

For each bus node $i \in \mathcal{N}$ we decode the voltage magnitude $|\hat{V}_i|$ within its bounds:

$$|\hat{V}_i| = \mathrm{Scale}\left(\sigma\left(\mathrm{MLP}_{\text{dec}}^{\text{bus}}(h_{\text{bus},i}^{(K)})\right), V_i^l, V_i^u\right), \tag{39}$$

so that (2) holds. Voltage angles $\hat{\theta}_i$ are predicted directly as unconstrained real-valued outputs,

$$\hat{\theta}_i = \mathrm{MLP}_{\text{dec}}^{\theta}(h_{\text{bus},i}^{(K)}), \tag{40}$$

and are subsequently regulated via angle-difference and power-balance penalties in the loss function rather than hard box constraints.
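The bounded decoders (38)–(39) reduce to a sigmoid followed by an affine map onto the box. In exact arithmetic the output lies strictly inside $(y^l, y^u)$; in floating point, very large logits can saturate to the bounds themselves, which still satisfies (2)–(4). A NumPy sketch with hypothetical names, where raw logits stand in for the decoder MLP outputs:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function (no overflow for large |z|)."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def scale(x, lower, upper):
    """Scale(x, y^l, y^u) = y^l + x * (y^u - y^l), elementwise."""
    return lower + x * (upper - lower)

def decode_bounded(logits, lower, upper):
    """Map unconstrained decoder outputs into [lower, upper], Eqs. (38)-(39)."""
    return scale(sigmoid(np.asarray(logits, dtype=float)), lower, upper)

# Hypothetical generator limits (p.u.): columns are (P^g, Q^g).
lower = np.array([[0.0, -0.5], [0.1, -0.3]])
upper = np.array([[2.0, 0.5], [1.0, 0.3]])
pq_hat = decode_bounded(np.array([[0.0, 50.0], [-1e3, 3.0]]), lower, upper)
print(pq_hat[0, 0])   # 1.0: a zero logit lands at the midpoint of [0, 2]
```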

Given $(|\hat{V}|, \hat{\theta})$, we recover complex voltages $\hat{V}_i = |\hat{V}_i| e^{j\hat{\theta}_i}$ and compute branch flows $\hat{S}_{ij}$ and $\hat{S}_{ji}$ deterministically using the power flow Equations (9) and (10). These derived quantities allow us to evaluate branch thermal limits (8) and power balance (11) without making them additional network outputs.

4.2. Loss Function

We train LG-HGNN to approximate AC-OPF solutions using a composite loss that combines a supervised prediction term with two physics-informed regularizers. The overall training objective is

$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda_1 \mathcal{L}_{\text{branch}} + \lambda_2 \mathcal{L}_{\text{PF}}, \tag{41}$$

where $\lambda_1$ and $\lambda_2$ control the trade-off between data fit and physical consistency.

Prediction loss: Given ground-truth AC-OPF solutions $(\theta_i, |V_i|, P_k^g, Q_k^g, S_{ij}, S_{ji})$ from a conventional solver, the prediction loss penalizes squared deviations between the neural outputs and these labels:

$$\begin{aligned} \mathcal{L}_{\text{pred}} = {} & \sum_{i \in \mathcal{N}} \left( \|\hat{\theta}_i - \theta_i\|_2^2 + \||\hat{V}_i| - |V_i|\|_2^2 \right) + \sum_{k \in \mathcal{G}} \left( \|\hat{P}_k^g - P_k^g\|_2^2 + \|\hat{Q}_k^g - Q_k^g\|_2^2 \right) \\ & + \sum_{(i,j) \in \mathcal{E}} \left( \|\hat{S}_{ij} - S_{ij}\|_2^2 + \|\hat{S}_{ji} - S_{ji}\|_2^2 \right). \end{aligned} \tag{42}$$

This term directly encourages the network to match the optimizer’s solution across buses, generators, and branches. Although the branch power flows $(S_{ij}, S_{ji})$ are deterministically derived from the predicted bus voltages and angles via the AC power flow Equations (9) and (10), we intentionally include them in the supervised loss. This redundancy stabilizes the optimization landscape and accelerates convergence, a strategy often employed in physics-informed learning [6,51,52].
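For complex labels, $\|\hat{S}_{ij} - S_{ij}\|_2^2$ in (42) is the squared modulus, i.e., the sum of the squared active and reactive errors. A NumPy sketch of (42) that sums over all entries as the equation does (averaging over a batch would be a separate implementation choice); the dictionary keys are hypothetical:

```python
import numpy as np

LABEL_KEYS = ("theta", "vm", "pg", "qg", "s_ij", "s_ji")

def prediction_loss(pred, true):
    """Eq. (42): summed squared errors over buses, generators and branches.

    For complex flows, |a - b|^2 = (dP)^2 + (dQ)^2.
    """
    return sum(float(np.sum(np.abs(pred[k] - true[k]) ** 2)) for k in LABEL_KEYS)
```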

Branch constraint violation loss: To promote satisfaction of the angle-difference and thermal limits (7)–(8), we penalize violations with squared hinge terms $[x]_+^2$, where $[x]_+ = \max(x, 0)$. For each branch $(i, j) \in \mathcal{E}$ we define the predicted angle difference $\Delta\hat{\theta}_{ij} = \hat{\theta}_i - \hat{\theta}_j - \phi_{ij}$ and write

$$\mathcal{L}_{\text{branch}} = \sum_{(i,j) \in \mathcal{E}} \left( [\Delta\hat{\theta}_{ij} - \Delta\theta_{ij}^u]_+^2 + [\Delta\theta_{ij}^l - \Delta\hat{\theta}_{ij}]_+^2 + [|\hat{S}_{ij}| - s_{ij}^u]_+^2 \right). \tag{43}$$

This term encourages the model to remain within operational limits even for samples where such constraints are not explicitly enforced by the supervised labels alone.
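A NumPy sketch of (43) with hypothetical argument names. The squared hinges are zero inside the limits and grow quadratically outside, so their gradients vanish continuously at the boundary.

```python
import numpy as np

def hinge(z):
    """[z]_+ = max(z, 0), elementwise."""
    return np.maximum(z, 0.0)

def branch_loss(theta, s_ij, branches, phi, dt_min, dt_max, s_max):
    """Eq. (43): squared hinge penalties on angle-difference and thermal limits."""
    dth = theta[branches[:, 0]] - theta[branches[:, 1]] - phi   # predicted angle difference
    return float(np.sum(hinge(dth - dt_max) ** 2
                        + hinge(dt_min - dth) ** 2
                        + hinge(np.abs(s_ij) - s_max) ** 2))
```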

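Power flow violation loss: Finally, we promote AC feasibility by penalizing violations of the complex power balance (11) at each bus. We use the predicted quantities and the conductance and susceptance entries $G_{ij}^{\text{bus}}$ and $B_{ij}^{\text{bus}}$ of the full bus-admittance matrix $Y^{\text{bus}} = G^{\text{bus}} + jB^{\text{bus}}$; because the mismatches below contain no separate shunt term, $Y^{\text{bus}}$ must include the shunt admittances on its diagonal for them to agree with (11). We first define the active and reactive power mismatches at each bus $i$ as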

$$P_b^i = \sum_{k \in \mathcal{G}_i} \hat{P}_k^g - \sum_{k \in \mathcal{L}_i} P_k^d - |\hat{V}_i| \sum_{j \in \mathcal{N}} |\hat{V}_j| \left( G_{ij}^{\text{bus}} \cos(\hat{\theta}_i - \hat{\theta}_j) + B_{ij}^{\text{bus}} \sin(\hat{\theta}_i - \hat{\theta}_j) \right), \tag{44}$$

$$Q_b^i = \sum_{k \in \mathcal{G}_i} \hat{Q}_k^g - \sum_{k \in \mathcal{L}_i} Q_k^d - |\hat{V}_i| \sum_{j \in \mathcal{N}} |\hat{V}_j| \left( G_{ij}^{\text{bus}} \sin(\hat{\theta}_i - \hat{\theta}_j) - B_{ij}^{\text{bus}} \cos(\hat{\theta}_i - \hat{\theta}_j) \right), \tag{45}$$

and then aggregate them into scalar residuals

$$P_b = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} |P_b^i|, \quad Q_b = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} |Q_b^i|. \tag{46}$$

The power flow violation loss is finally defined as

$$\mathcal{L}_{\text{PF}} = P_b + Q_b. \tag{47}$$

Minimizing $\mathcal{L}_{\text{PF}}$ drives the predicted operating point toward satisfaction of Kirchhoff’s laws and reduces the amount of correction required by a subsequent AC power flow post-processing step. We use an $\ell_2$ norm for $\mathcal{L}_{\text{pred}}$ since it is a supervised regression loss on solver-generated continuous labels, for which squared error is a standard choice. In contrast, $\mathcal{L}_{\text{PF}}$ acts as a feasibility regularizer during training, especially in early epochs. The power-flow mismatches can be sparse and heavy-tailed, i.e., a small number of buses may exhibit large residuals. We therefore adopt an $\ell_1$ aggregation in $\mathcal{L}_{\text{PF}}$ to improve robustness to outliers and to prevent a few large mismatches from dominating the gradient signal.
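The mismatches (44)–(45) are the real and imaginary parts of $\sum S^g - \sum S^d - V_i \big(\sum_j Y_{ij}^{\text{bus}} V_j\big)^*$, which gives a convenient numerical cross-check. The NumPy sketch below evaluates the trigonometric forms directly and then applies the $\ell_1$ aggregation (46)–(47). Inputs are per-bus generation and demand totals, names are hypothetical, and $Y^{\text{bus}}$ is assumed to already contain any shunt admittances.

```python
import numpy as np

def pf_loss(vm, theta, y_bus, p_gen, q_gen, p_load, q_load):
    """Eqs. (44)-(47) with per-bus generation/demand totals of shape (N,)."""
    G, B = y_bus.real, y_bus.imag
    dth = theta[:, None] - theta[None, :]          # theta_i - theta_j
    vv = vm[:, None] * vm[None, :]                 # |V_i| |V_j|
    p_calc = np.sum(vv * (G * np.cos(dth) + B * np.sin(dth)), axis=1)
    q_calc = np.sum(vv * (G * np.sin(dth) - B * np.cos(dth)), axis=1)
    p_mis = p_gen - p_load - p_calc                # Eq. (44)
    q_mis = q_gen - q_load - q_calc                # Eq. (45)
    return float(np.mean(np.abs(p_mis)) + np.mean(np.abs(q_mis)))   # Eqs. (46)-(47)

# Hypothetical 2-bus system with one line; generation set to the exact injections.
y = 1.0 / (0.01 + 0.1j)
y_bus = np.array([[y, -y], [-y, y]])
vm, theta = np.array([1.02, 0.98]), np.array([0.0, -0.05])
v = vm * np.exp(1j * theta)
s_inj = v * np.conj(y_bus @ v)                     # complex injections V_i (Y V)_i^*
zeros = np.zeros(2)
print(pf_loss(vm, theta, y_bus, s_inj.real, s_inj.imag, zeros, zeros))   # approximately zero
```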

Training Workflow: The overall training workflow of LG-HGNN is summarized in Algorithm 1. Each iteration encodes the heterogeneous graph, performs K layers of local–global message passing with residual updates, decodes the predicted AC-OPF variables, and minimizes the composite loss.

Algorithm 1 Training Procedure of LG-HGNN for AC-OPF Approximation.

Require: Heterogeneous graph $G^{\text{het}} = (\mathcal{V}, \mathcal{E}^{\text{het}})$; node and edge features $\{x_u^a\}, \{x_{(i,j)}^e\}$; ground-truth AC-OPF labels $(|V|, \theta, P^g, Q^g, S_{ij}, S_{ji})$; hyperparameters $\lambda_1, \lambda_2$.

Ensure: Trained parameters $\Theta$ of LG-HGNN.

1: Initialization: Compute effective-resistance encodings ${PE}_{i}$ using Equation (14). Initialize all encoder, processor, and decoder weights $Θ$.
2: for each training iteration do
3: (1) Encoding: Obtain initial node and edge embeddings $h_{a, i}^{(0)}, h_{e, (i, j)}^{(0)}$ via type-specific MLP encoders (Equations (15)–(17)).
4: (2) Message Passing:
5: for $k = 1, \dots, K$ do
6: (a) Update edges using Equations (18)–(21).
7: (b) Update nodes using Equations (22)–(29).
8: (c) Compute bus-centric global transformer aggregation using Equations (30)–(35).
9: (d) Fuse local and global representations via Equations (36) and (37).
10: end for
11: (3) Decoding: Generate predictions $(|\hat{V}|, \hat{θ}, \hat{P}^{g}, \hat{Q}^{g})$ using bounded decoders (Equations (38)–(40)); compute derived branch flows $\hat{S}_{ij}$ and $\hat{S}_{ji}$ via the power-flow Equations (9) and (10).
12: (4) Compute Loss: Evaluate prediction loss $L_{pred}$ (Equation (42)), branch constraint loss $L_{branch}$ (Equation (43)), and power-flow violation loss $L_{PF}$ (Equations (44)–(47)); form the total loss $L$ as in Equation (41).
13: (5) Parameter Update: Backpropagate through all modules and update $Θ$ using the Adam optimizer.
14: end for
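Step 1 of Algorithm 1 relies on effective resistances between buses. Equation (14) is not reproduced in this section, so the sketch below computes only the underlying quantity. It uses the standard identity $R_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$, where $L^{+}$ is the Moore–Penrose pseudoinverse of the weighted Laplacian of a connected bus graph. Two aspects are assumptions here: the choice of edge weight (for example $1/|z_{ij}|$) and how rows of $R$ are turned into ${PE}_i$.

```python
import numpy as np


def effective_resistance(num_buses, branches):
    """Pairwise effective resistance on a connected bus graph.

    branches: iterable of (i, j, w) with w > 0 a conductance-like edge weight
              (e.g. 1/|z_ij|; this weighting is an assumption of the sketch).
    Returns an (N, N) matrix R with R[i, j] = L+[i, i] + L+[j, j] - 2 L+[i, j].
    """
    lap = np.zeros((num_buses, num_buses))
    for i, j, w in branches:                  # parallel branches simply add up
        lap[i, i] += w
        lap[j, j] += w
        lap[i, j] -= w
        lap[j, i] -= w
    lap_pinv = np.linalg.pinv(lap)            # Moore-Penrose pseudoinverse
    d = np.diag(lap_pinv)
    return d[:, None] + d[None, :] - 2.0 * lap_pinv


# Unit-weight triangle: each pair sees 1 ohm in parallel with 2 ohms, i.e. 2/3.
r = effective_resistance(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
print(np.round(r, 4))
```

Unlike hop distance, effective resistance accounts for all parallel paths between two buses. This is why it serves as an electrical-distance prior.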

In summary, LG-HGNN combines heterogeneous graph modeling, bus-centric local–global message passing with effective-resistance-biased attention, bounded decoding, and physics-informed regularization to learn a fast and topology-robust surrogate of the AC-OPF optimizer that produces near-feasible and near-optimal dispatches directly from the underlying network data.

5. Experimental Evaluation

5.1. Datasets

We evaluate the proposed LG-HGNN framework on the OPFData dataset [46]. OPFData is currently the largest publicly available open-source AC-OPF dataset and contains 300,000 instances per grid configuration, solved with AC-IPOPT. The dataset covers 10 distinct grid topologies from the Power Grid Library [45]. Each grid topology comes in two variants:

(1) ‘Full topology’: the default grid configuration, with active and reactive loads varying independently by $\pm 20\%$ of their nominal values.
(2) ‘N-1 topology’: each instance has one randomly disconnected component (either a branch or a generator).

For our experiments, we use three representative grids: IEEE-14, IEEE-30, and IEEE-118. We further include the medium-scale GOC-2000 system from OPFData to assess the scalability of LG-HGNN to grids with 2000 buses and a highly heterogeneous topology.
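To make the ‘N-1 topology’ variant concrete, the sketch below enumerates single-branch outages of a bus graph and keeps only those that leave the network connected. It is purely illustrative. The connectivity filter is an assumption, generator outages are not modeled, and the sketch is not a description of how OPFData itself samples contingencies.

```python
from collections import defaultdict, deque


def is_connected(num_buses, branches):
    """Breadth-first search from bus 0 over an undirected branch list."""
    adj = defaultdict(list)
    for i, j in branches:
        adj[i].append(j)
        adj[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == num_buses


def n_minus_1_branch_outages(num_buses, branches):
    """Return (removed_branch, remaining_branches) for every non-islanding outage."""
    variants = []
    for k, removed in enumerate(branches):
        remaining = branches[:k] + branches[k + 1:]
        if is_connected(num_buses, remaining):
            variants.append((removed, remaining))
    return variants


# Hypothetical 4-bus example: a triangle (0, 1, 2) plus a radial bus 3.
grid = [(0, 1), (1, 2), (0, 2), (2, 3)]
for removed, _ in n_minus_1_branch_outages(4, grid):
    print("outage of branch", removed)    # (2, 3) is skipped: it would island bus 3
```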

5.2. Baseline Models

To evaluate the effectiveness of the proposed LG-HGNN, we compare it with one classical optimization solver and two representative graph neural networks:

DC-IPOPT: An efficient framework for large-scale nonconvex optimization. It combines a sequential convex approximation outer loop with an interior-point solver (IPOPT) inner loop to iteratively handle nonlinear constraints. This decomposition enhances robustness and convergence stability, enabling DC-IPOPT to serve as a strong baseline for AC-OPF approximation.
Graph Attention Network (GAT): A neural model that integrates the attention mechanism into neighborhood aggregation. By assigning learnable importance weights to neighboring nodes, GAT adaptively captures salient local structures while maintaining inductive learning capability and strong generalization across varying graph topologies.
Graph Isomorphism Network (GIN): A theoretically grounded GNN proven to match the expressive power of the Weisfeiler–Lehman graph isomorphism test. Using a sum-based injective aggregation and an MLP update, GIN distinguishes fine structural variations in graphs and serves as a powerful baseline for graph representation learning.

All neural baselines (GAT, GIN) are trained with the same physics-aware loss function and constraint regularization as LG-HGNN. Their predictions can therefore serve as candidate warm starts for classical solvers. The PF-corrected results represent a hybrid GNN + power-flow refinement stage, which enables a fair comparison of physical consistency across models.

5.3. Experimental Configuration

Each dataset is partitioned into 270,000 training samples (90%), 15,000 validation samples (5%), and 15,000 test samples (5%). The batch size is set to 256. For DC-IPOPT, the approximate solution is generated using the open-source Julia package PowerModels.jl [45] with the Ipopt optimizer. The GAT and GIN baselines use the same composite loss function as LG-HGNN, with 3 layers and a hidden dimension of 256. Both baselines are trained with the Adam optimizer for 200 epochs, using a learning rate of $1 \times 10^{-5}$ and a weight decay of $1 \times 10^{-5}$. The proposed LG-HGNN is also trained with the Adam optimizer for 200 epochs, with early stopping and gradient clipping for stability, a learning rate of $5 \times 10^{-5}$, and a weight decay of $5 \times 10^{-5}$. We set $λ_{1} = 0.7$ and $λ_{2} = 0.3$ in the composite loss. We also set $K = 3$ with a hidden dimension of 256 for all datasets.
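The LG-HGNN settings above can be collected in a single configuration object. The sketch below does so and derives the 90/5/5 split sizes from the 300,000 instances per grid. The field names are hypothetical. The exact way $λ_1$ and $λ_2$ enter the total loss is given by Equation (41) and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Settings reported in Section 5.3 for LG-HGNN (field names are illustrative)."""
    num_layers: int = 3          # K local-global blocks
    hidden_dim: int = 256
    batch_size: int = 256
    learning_rate: float = 5e-5  # Adam; the GAT/GIN baselines used 1e-5
    weight_decay: float = 5e-5   # the GAT/GIN baselines used 1e-5
    epochs: int = 200            # with early stopping and gradient clipping
    lambda_1: float = 0.7        # loss weights entering Eq. (41)
    lambda_2: float = 0.3


def split_sizes(num_instances, train_frac=0.90, val_frac=0.05):
    """Train/validation/test sizes; the test set receives the remainder."""
    n_train = round(num_instances * train_frac)
    n_val = round(num_instances * val_frac)
    return n_train, n_val, num_instances - n_train - n_val


print(split_sizes(300_000))   # (270000, 15000, 15000)
```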

5.4. Evaluation Metrics

We assess model performance using three complementary metrics: (1) Mean Squared Error (MSE), which computes the mean of squared differences between predictions and ground-truth values for each AC-OPF variable. (2) Feasibility, quantified by the average degree of constraint violations for Equations (7), (8) and (11), which are not strictly enforced during training. Generator power and voltage magnitude constraints (Equations (2)–(4)) are satisfied by design through our constrained decoding approach. All violation magnitudes are averaged over the test set. (3) Optimality ratio, defined as the ratio (in percent) between the objective cost computed from model predictions and the ground-truth AC-OPF cost.
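The three metrics can be sketched as follows for a batch of test instances. Several choices are assumptions of this sketch:

- the quadratic generator cost $c_2 P^2 + c_1 P + c_0$, as commonly used in PGLib-OPF cases;
- averaging the optimality ratio per instance;
- measuring a thermal violation as $\max(0, |S_{ij}| - S_{ij}^{\max})$.

The exact constraints are those of Equations (7), (8) and (11).

```python
import numpy as np


def mse(pred, true):
    """Mean squared error for one AC-OPF variable over all test entries."""
    return float(np.mean((np.asarray(pred) - np.asarray(true)) ** 2))


def generation_cost(pg, c2, c1, c0):
    """Total quadratic cost per instance; pg has shape (instances, generators)."""
    return np.sum(c2 * pg**2 + c1 * pg + c0, axis=-1)


def optimality_ratio(pg_pred, pg_true, c2, c1, c0):
    """Average of 100 * cost(prediction) / cost(ground truth), in percent."""
    ratio = generation_cost(pg_pred, c2, c1, c0) / generation_cost(pg_true, c2, c1, c0)
    return float(100.0 * np.mean(ratio))


def thermal_violation(s_flow, s_max):
    """Mean amount by which |S_ij| exceeds its rating (zero when within limits)."""
    return float(np.mean(np.maximum(0.0, np.abs(s_flow) - s_max)))


# Hypothetical two-generator example with per-unit dispatches.
c2, c1, c0 = np.array([0.1, 0.2]), np.array([10.0, 12.0]), np.array([0.0, 0.0])
pg_true = np.array([[1.0, 2.0]])
pg_pred = np.array([[1.0, 2.1]])              # slight over-dispatch of generator 2
print(optimality_ratio(pg_pred, pg_true, c2, c1, c0))   # a little above 100
```

A ratio above 100% therefore means the predicted dispatch is more expensive than the AC-OPF optimum. A ratio below 100% means its cost falls under the optimum, which for a feasible solution would be impossible.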

5.5. Post-Processing Predictions with Power Flow

Power-flow analysis can be applied to post-process the raw predictions of machine-learning solvers so that the resulting solutions satisfy Kirchhoff’s power balance constraints (see Equation (11)). In this step, the predicted bus voltages and generator setpoints initialize an AC power-flow solver (e.g., pandapower with numba acceleration), which computes consistent active and reactive flows across all branches. As detailed in [22], this post-processing guarantees satisfaction of the bus complex power balance equations, but other inequality constraints may still be violated. Specifically, only the slack bus adjusts its active power output to restore system-wide balance, which may violate its generation bounds. At the other generator (PV) buses, the active power outputs and voltage magnitudes remain fixed at the ML predictions, while their reactive power outputs are recomputed to satisfy the network equations. Loads at PQ buses are held at their specified values. Reactive power limits are not enforced by default; however, they can be incorporated by dynamically converting PV buses that reach their reactive power limits into PQ buses. Overall, the power-flow post-processing step serves as a lightweight feasibility restoration procedure that significantly improves the physical consistency of ML-based OPF predictions.
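The bus-type logic of such a post-processing step can be illustrated with a small Newton–Raphson power flow. In a standard power flow, the following quantities are fixed:

- active power at PV and PQ buses;
- reactive power at PQ buses;
- voltage magnitude at PV and slack buses;
- the slack-bus angle.

Everything else is solved for. The sketch below is not pandapower's implementation. It is a self-contained NumPy example that uses a finite-difference Jacobian for brevity and ignores reactive limits. The bus roles and the 3-bus test network are hypothetical. The predicted voltages serve as the starting point. After convergence, the slack active power and the PV reactive power are read off the solved state.

```python
import numpy as np


def newton_power_flow(ybus, pv, pq, p_spec, q_spec, vm0, va0, tol=1e-10, max_iter=30):
    """Newton-Raphson AC power flow (sketch) with a finite-difference Jacobian.

    Buses in neither pv nor pq are slack: their angle and |V| stay at va0/vm0.
    PV buses keep P = p_spec and |V| = vm0; PQ buses keep P and Q.
    vm0, va0 (e.g. the ML-predicted voltages) also serve as the starting point.
    """
    p_spec, q_spec = np.asarray(p_spec, float), np.asarray(q_spec, float)
    ang = np.array(sorted(pv + pq), dtype=int)   # buses whose angle is unknown
    mag = np.array(sorted(pq), dtype=int)        # buses whose magnitude is unknown
    vm, va = np.array(vm0, dtype=float), np.array(va0, dtype=float)

    def injections(vm_, va_):
        v = vm_ * np.exp(1j * va_)
        return v * np.conj(ybus @ v)             # S_i = V_i * conj(sum_j Y_ij V_j)

    def unpack(x):
        vm_, va_ = vm.copy(), va.copy()
        va_[ang], vm_[mag] = x[:ang.size], x[ang.size:]
        return vm_, va_

    def mismatch(x):
        s = injections(*unpack(x))
        return np.concatenate([s.real[ang] - p_spec[ang], s.imag[mag] - q_spec[mag]])

    x = np.concatenate([va[ang], vm[mag]])
    for _ in range(max_iter):
        f = mismatch(x)
        if np.max(np.abs(f)) < tol:
            break
        jac = np.empty((f.size, x.size))
        for k in range(x.size):                  # forward-difference Jacobian columns
            step = np.zeros_like(x)
            step[k] = 1e-7
            jac[:, k] = (mismatch(x + step) - f) / 1e-7
        x = x - np.linalg.solve(jac, f)          # Newton update
    else:
        raise RuntimeError("power flow did not converge")
    vm_sol, va_sol = unpack(x)
    # Slack P/Q and PV-bus Q are not fixed; they follow from the solved voltages.
    return vm_sol, va_sol, injections(vm_sol, va_sol)


# Hypothetical 3-bus ring: bus 0 slack, bus 1 PV, bus 2 PQ, identical lines.
y = 1.0 / (0.01 + 0.1j)
ybus = np.array([[2 * y, -y, -y], [-y, 2 * y, -y], [-y, -y, 2 * y]])
vm_ref, va_ref = np.array([1.0, 1.02, 0.98]), np.array([0.0, -0.05, -0.08])
v_ref = vm_ref * np.exp(1j * va_ref)
s_ref = v_ref * np.conj(ybus @ v_ref)            # injections consistent with this state
vm, va, s = newton_power_flow(ybus, pv=[1], pq=[2], p_spec=s_ref.real, q_spec=s_ref.imag,
                              vm0=[1.0, 1.02, 1.0], va0=[0.0, 0.0, 0.0])
print(np.round(vm, 4), np.round(va, 4))          # recovers vm_ref and va_ref
```

If the predicted active power at the PV bus is lowered, the solved state keeps that setpoint, and the slack bus picks up the difference plus any change in losses. This is the mechanism by which a slack generator can be pushed outside its bounds.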

5.6. Performance Comparison on Full Topology

Table 2, Table 3 and Table 4 comprehensively summarize model performance on the full-topology datasets (14-, 30-, and 118-bus systems) before and after AC power flow post-processing. Overall, LG-HGNN (Our_F) consistently matches or outperforms all baselines across almost all variables and grid sizes, while DC-IPOPT frequently exhibits poor AC feasibility once its DC solution is embedded into the full AC network.

Table 2. MSE of baselines and LG-HGNN across different grid sizes on ‘Full topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Table 3. Feasibility of baselines and LG-HGNN across different grid sizes on ‘Full topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Table 4. Optimality ratio (%) of baselines and LG-HGNN across different grid sizes on ‘Full topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Prediction Accuracy Analysis (Table 2). For bus-level variables, LG-HGNN achieves the lowest or near-lowest MSE for both voltage angles $θ$ and magnitudes $|V|$ across all IEEE systems. On the 14-bus grid, the pre-PF MSE for $θ$ decreases from $4.38 \times 10^{-3}$ (DC-IPOPT) to about $10^{-5}$ with GAT/GIN, and further to $1.5 \times 10^{-6}$ with LG-HGNN. The same pattern holds for the 30- and 118-bus systems. For $|V|$, all neural models significantly outperform DC-IPOPT, with LG-HGNN and GAT typically yielding the smallest errors, demonstrating the stabilizing effect of physics-informed regularization.

For generator variables, LG-HGNN consistently attains the best or tied-best MSE for both $P^{g}$ and $Q^{g}$. GAT and GIN already reduce $P^{g}$ errors by two orders of magnitude relative to DC-IPOPT, and LG-HGNN improves them further. In addition, its explicit modeling of generator nodes strengthens the coupling between reactive power and voltage regulation.

For line and transformer flows ($P_{ij}, P_{ji}, Q_{ij}, Q_{ji}$), DC-IPOPT exhibits large post-PF errors (up to $O(10^{-1})$), reflecting the DC–AC mismatch. GAT and GIN reduce these errors to $10^{-4}$–$10^{-3}$, while LG-HGNN reaches the $10^{-5}$ range and performs particularly well on transformer flows. This demonstrates that heterogeneous representations with global attention enable LG-HGNN to capture long-range power-flow dependencies beyond the capacity of DC-IPOPT or conventional GNNs.

Power-flow post-processing (shaded rows) affects each method differently: LG-HGNN’s MSE changes only slightly, showing its predictions are already close to AC-feasible solutions, whereas DC-IPOPT exhibits large deviations in flow-related variables, confirming that its computational simplicity comes at the cost of AC accuracy.

Constraint Violation Degree Analysis (Table 3). The table reports average violations for key constraints, including branch thermal limits ($S_{ij}$, $S_{ji}$), angle-difference limits ($Δθ_{ij}$), and active/reactive power balance ($P_{b}$, $Q_{b}$). Voltage and generator bounds are excluded because they are directly enforced by constrained decoding.

Before power-flow post-processing (non-shaded rows), all neural models show small violations (between $10^{-7}$ and $10^{-2}$), with LG-HGNN matching or surpassing the best baseline. For example, on the 118-bus system, the pre-PF thermal-limit violation $S_{ij}$ is about $10^{-7}$ for LG-HGNN, while $P_{b}$ and $Q_{b}$ remain between $10^{-3}$ and $10^{-2}$ pu. This indicates that the physics-informed loss promotes Kirchhoff-consistent solutions even without PF correction.

After power flow (shaded rows), the differences from DC-IPOPT become pronounced. Because DC-IPOPT ignores reactive power and voltage constraints, its post-PF thermal violations reach $O(10^{-1})$, whereas LG-HGNN and the other GNNs remain in the $10^{-5}$–$10^{-4}$ range. Power-balance errors are reduced to numerical zero for all methods after PF. Overall, LG-HGNN produces solutions inherently closer to AC-feasible operating points than DC-IPOPT, demonstrating superior physical consistency.

Optimality Ratio Analysis (Table 4). The results show that the proposed LG-HGNN achieves the most economically consistent performance among all solvers. Before power-flow correction, DC-IPOPT yields markedly lower optimality ratios: approximately 96–97% on the 14- and 30-bus systems and 93.8% on the 118-bus grid. This indicates that the DC approximation systematically underestimates the true AC generation cost. After AC power-flow recalculation, its ratios improve slightly but remain well below 100%, confirming that DC-based dispatches misestimate the AC generation cost when evaluated in the nonlinear AC domain.

In contrast, all neural solvers achieve ratios very close to the AC-OPF optimum. GAT and GIN exhibit ratios slightly above 100% (101–102%) across all networks, while LG-HGNN consistently shows the smallest deviation from 100%. Specifically, Our_F attains 100.35%, 100.75%, and 101.15% on the IEEE 14-, 30-, and 118-bus cases, respectively, and stabilizes at 100.25–100.95% after PF. These near-unity ratios indicate that LG-HGNN reconstructs generation dispatches that are economically very close to the ground-truth AC-OPF solutions, with minimal numerical bias.

As network size increases, the gap between DC-IPOPT and the neural methods widens, highlighting the scalability and robustness of the proposed architecture. Overall, the full-topology experiments confirm that LG-HGNN effectively balances physical feasibility and economic optimality, correcting the systematic cost bias of DC approximations and surpassing GAT and GIN in both accuracy and consistency across all grid scales.

Scalability to Large-Scale Grids

To evaluate the scalability of the proposed LG-HGNN architecture on realistic large-scale power systems, we further conduct experiments on the GOC-2000 dataset, which contains 2000 buses and exhibits significantly higher topological heterogeneity than the IEEE benchmarks. Table 5 reports the MSE values before and after PF correction for bus voltages, generator outputs, and line flows.

Table 5. MSE of baselines and LG-HGNN on GOC-2000 (Full topology). Non-shaded rows are before PF; shaded rows are after PF.

Across all state variables, LG-HGNN (Our_F) consistently achieves the lowest MSE among the learning-based methods. On this 2000-bus system, the model maintains high accuracy with only moderate error growth relative to the 118-bus case (e.g., the pre-PF bus-angle MSE increases from $1.48 \times 10^{-4}$ to $2.25 \times 10^{-4}$). After PF correction, all models benefit from enforcing the nonlinear AC power-flow equations. The errors of LG-HGNN are further reduced by approximately 35–40% across most variables (e.g., the bus-angle MSE decreases from $2.25 \times 10^{-4}$ to $1.45 \times 10^{-4}$). This indicates that the predictions produced by LG-HGNN are already close to the AC-feasible manifold and require only small corrective adjustments.

Overall, these results demonstrate that LG-HGNN generalizes effectively to large-scale transmission networks with thousands of buses: it preserves its accuracy advantage over GAT and GIN on GOC-2000, while remaining fully compatible with standard PF-based post-processing. This validates the proposed model as a scalable and physically consistent surrogate for real-time AC-OPF applications on realistic, highly heterogeneous grids.

5.7. Performance Comparison on N-1 Topology

The N-1 contingency experiments evaluate the ability of different methods to generalize across large families of perturbed topologies where either a branch or a generator is randomly removed from service. In this setting, we distinguish between two configurations of our model: (1) Our_F, trained only on full-topology data and evaluated zero-shot on N-1 contingencies; and (2) Our_N, fine-tuned directly on the N-1 training subset. Table 6, Table 7 and Table 8 report prediction accuracy, feasibility, and economic optimality across IEEE-14, -30, and -118 systems.

Table 6. MSE of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Table 7. Feasibility of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Table 8. Optimality ratio (%) of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. The non-shaded rows are pre-power flow results, while the shaded rows are post-power flow results.

Prediction Accuracy Analysis (Table 6). Compared with the full-topology case, N-1 contingencies cause notable distribution shifts as line and generator outages alter network connectivity and dispatch patterns. Under this setting, DC-IPOPT exhibits large post-PF MSEs—especially for reactive power and branch flows—due to its inherent DC–AC mismatch, while GAT and GIN remain competitive but suffer higher errors on larger grids.

Both Our_F (zero-shot) and Our_N (fine-tuned) demonstrate strong topology robustness. Even without N-1 training, Our_F consistently outperforms GAT and GIN across nearly all variables and systems. For example, on the 30-bus grid, the pre-PF MSE of $θ$ drops from about $1.2 \times 10^{-4}$ (GAT/GIN) to $1.1 \times 10^{-4}$ for Our_F. This confirms that heterogeneous modeling and effective-resistance encoding enable strong inductive generalization to unseen topologies.

Fine-tuning further refines results: Our_N achieves slightly lower MSEs than Our_F (typically a few percent improvement) while preserving similar scaling across variables and systems. These modest yet consistent gains show that limited N-1 data is sufficient to adapt the pre-trained model to specific contingency patterns.

As in the full-topology case, post-PF processing affects methods differently. DC-IPOPT’s N-1 predictions lead to large post-PF errors (up to $O(10^{-1})$), whereas Our_F and Our_N exhibit only minor MSE changes. This indicates that both already produce solutions close to the AC-feasible manifold even under topological perturbations.

Constraint Violation Analysis (Table 7). The feasibility results on the N-1 datasets confirm and sharpen the trends observed in the full-topology experiments. Before power-flow correction (non-shaded rows), all learning-based models show small but nonzero violations of the thermal-limit and power-balance constraints. Across all grids, the average magnitudes of $P_{b}$ and $Q_{b}$ remain between $10^{-3}$ and $10^{-2}$ pu, indicating that purely learned predictions satisfy Kirchhoff’s laws closely, though not perfectly. Among neural models, Our_F consistently matches or slightly outperforms GAT and GIN, while the fine-tuned variant Our_N yields almost identical values. This pattern demonstrates that the zero-shot model already learns a representation near the N-1-aware feasible region, with minimal need for adaptation.

After AC power-flow recalculation (shaded rows), all neural solvers drive the $P_{b}$ and $Q_{b}$ residuals to numerical zero, leaving the branch thermal-limit constraints as the only non-trivial source of error. Here, the advantage of LG-HGNN becomes particularly clear. On the 30- and 118-bus systems, DC-IPOPT exhibits post-PF thermal violations on the order of $10^{-1}$, whereas LG-HGNN (both Our_F and Our_N) keeps violations between $10^{-5}$ and $10^{-4}$. This is an improvement of three to four orders of magnitude in AC-feasibility fidelity. It shows that DC-OPF formulations can severely misestimate branch loading under contingencies, while the transformer-based heterogeneous model preserves physical consistency.

Notably, the nearly identical post-PF violation levels of Our_F and Our_N indicate that fine-tuning mainly refines numerical precision and economic optimality rather than altering feasibility behavior. In other words, LG-HGNN’s built-in inductive biases—heterogeneous node/edge typing, combined local-global propagation, and physics-aware regularization—are sufficient to produce AC-feasible operating points even in zero-shot generalization across unseen N-1 contingencies.

Optimality Ratio Analysis (Table 8). The results further emphasize the superior economic consistency of LG-HGNN under contingency conditions. Before power-flow correction, DC-IPOPT attains only 94–97% optimality on the smaller IEEE-14 and IEEE-30 systems and about 93% on the 118-bus grid. This confirms that the DC approximation substantially underestimates the true AC-OPF cost. Even after PF correction, its ratios remain below 100%, revealing a persistent cost bias once DC dispatches are re-evaluated within the nonlinear AC model.

In contrast, the learning-based methods achieve near-optimal ratios, slightly above 100%, across all test cases. GAT and GIN reach around 101–102%, while LG-HGNN consistently achieves ratios closest to the ideal value of 100%. Specifically, Our_F records approximately 100.5%, 101.0%, and 101.4% on the 14-, 30-, and 118-bus systems, respectively, and stabilizes at 100.45–101.25% after PF. Fine-tuning on contingency data (Our_N) further aligns predictions with the AC-OPF ground truth. It reduces residual deviations by roughly 0.1% on average and yields the most consistent ratios across all grid sizes.

These results confirm that LG-HGNN generalizes effectively to unseen N-1 topologies while preserving cost-optimal behavior. Unlike DC-IPOPT, whose N-1 solutions remain both economically sub-optimal and physically less feasible, LG-HGNN maintains near-unity optimality ratios together with minimal constraint violations. In summary, the model functions as a topology-robust, economically consistent surrogate: even in zero-shot settings it provides AC-feasible and nearly optimal dispatches, and with minor fine-tuning, it becomes virtually indistinguishable from the full AC-OPF solver while retaining its large computational advantage.

5.8. Ablation Study

Table 9 summarizes the ablation results on the IEEE-14, IEEE-30, and IEEE-118 systems. Across all benchmarks, the full LG-HGNN consistently achieves the lowest MSE in both voltage angle $θ$ and magnitude $|V|$. This demonstrates the necessity of combining effective-resistance priors, constrained decoding, and bus-centric global attention.

Table 9. Ablation study on IEEE-14, IEEE-30, and IEEE-118 systems (MSE before PF).

Removing the effective-resistance positional encoding (w/o ER-PE) results in a clear loss of accuracy on all three systems. The degradation is most pronounced on IEEE-118, where the $θ$ MSE increases from $1.48 \times 10^{-4}$ to $2.31 \times 10^{-4}$. This confirms that the electrical-distance bias improves the modeling of long-range coupling and enhances robustness on large networks.

Eliminating the constrained decoder (w/o Constrained Decoder) leads to the largest growth in $|V|$ MSE, nearly a 2.6× increase on the IEEE-118 system. This highlights that physics-aware output parameterization is essential for producing voltage predictions that remain within operationally meaningful bounds before power-flow correction.

The variant without global attention (w/o Global Attention) exhibits the most severe performance drop, especially on IEEE-118, where the $θ$ MSE more than triples relative to the full model. This confirms that purely local heterogeneous message passing is inadequate for capturing global voltage-angle dependencies. It also shows that bus-centric global aggregation plays a critical role in scaling to large transmission systems.

Overall, the ablations demonstrate that each architectural component contributes meaningfully to the accuracy and physical consistency of LG-HGNN. Removing any one of them leads to noticeable performance degradation, particularly in larger grids where long-range electrical interactions and tight operational limits make AC-OPF learning significantly more challenging.

5.9. Sensitivity Analysis

To assess the robustness of LG-HGNN under structural and parametric perturbations, we conduct a sensitivity analysis along two dimensions: (i) graph topology noise; (ii) model parameter uncertainty. These perturbations represent realistic sources of error in power system operation, including topology misidentification, corrupted measurements, and uncertainty in learning-based predictions. Both tests are compatible with OPFData, which provides AC-OPF labels only for predefined full and N-1 topologies.

5.9.1. Sensitivity to Graph Topology Perturbations

OPFData includes both the base (full) topology and a large set of N-1 contingency topologies for each IEEE system. We use these configurations as structured topology perturbations. We then evaluate the LG-HGNN model trained on the full topology (Our_F) on both the full and N-1 settings, without retraining on the contingencies. The resulting MSEs on bus voltage angles and magnitudes (before power-flow correction) are summarized in Table 10.

Table 10. Sensitivity of LG-HGNN (Our_F) to N-1 topology perturbations. Reported as MSE on bus voltage angle $θ$ and magnitude $|V|$ before PF.

As shown in Table 10, LG-HGNN trained solely on the full topology generalizes well to unseen N-1 contingency topologies. On the largest system, IEEE-118, the MSE on bus voltage angles increases only slightly, from $1.48 \times 10^{-4}$ to $1.60 \times 10^{-4}$. The voltage-magnitude MSE grows from $1.50 \times 10^{-5}$ to $2.60 \times 10^{-5}$. On IEEE-30, the transition from the full topology to N-1 contingencies leads to a moderate increase in both angle and magnitude errors, but they remain within the same order of magnitude as in the full case.

For the smallest system, IEEE-14, topology changes have a stronger relative impact, as expected for networks with limited redundancy: the angle MSE increases from $1.50 \times 10^{-6}$ to $4.70 \times 10^{-5}$, and the voltage-magnitude MSE roughly doubles. Even so, the absolute errors remain small in all three systems, indicating that performance degrades gracefully under realistic topology perturbations. Together with the dedicated N-1 model Our_N reported in Table 6, these results show that the proposed architecture is robust to graph topology noise. The architecture can also be further specialized when contingency-specific training data are available.

5.9.2. Sensitivity to Model Parameter Uncertainty

OPFData does not provide ground-truth AC-OPF solutions for modified physical parameters, so directly perturbing line impedances or load values would make supervised evaluation impossible. Instead, we assess sensitivity to parameter uncertainty by perturbing the trained model weights. Specifically, we inject small Gaussian noise into all trainable parameters,

$$\tilde{W} = W + η, \qquad η \sim \mathcal{N}(0, σ^{2}),$$

and evaluate the model on the same test set. This procedure probes robustness to training initialization variability, numerical uncertainty, and potential parameter drift in deployment.
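The weight-perturbation protocol amounts to adding i.i.d. zero-mean Gaussian noise with standard deviation $σ$ to every entry of every trainable tensor. A framework-agnostic NumPy sketch is shown below. Representing the model as a dictionary of arrays, the parameter names, and the fixed seed are assumptions made for illustration. In practice one would perturb a copy of the trained model for each noise level and re-run the test set, averaging over several seeds.

```python
import numpy as np


def perturb_weights(params, sigma, seed=0):
    """Return a perturbed copy: W~ = W + eta, eta ~ N(0, sigma^2) per entry."""
    rng = np.random.default_rng(seed)
    return {name: w + rng.normal(0.0, sigma, size=w.shape) for name, w in params.items()}


# Hypothetical parameter set standing in for a trained model.
params = {"encoder.weight": np.ones((256, 64)), "decoder.bias": np.zeros(4)}
noisy = perturb_weights(params, sigma=0.01)
delta = noisy["encoder.weight"] - params["encoder.weight"]
print(delta.shape, round(float(delta.std()), 3))   # noise std close to sigma = 0.01
```

The original parameters are left untouched, so the same trained model can be re-perturbed at different $σ$ values.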

Table 11 summarizes the results for two noise levels. LG-HGNN degrades smoothly under parameter perturbations. A noise level of $σ = 0.01$ increases the MSE on $(θ, |V|)$ by roughly 8–12%, while a larger perturbation of $σ = 0.02$ results in moderate growth (about 15–25%). The model remains stable, with no divergence or abnormal predictions, indicating that its heterogeneous and physics-aware architecture provides inherent robustness to weight-level uncertainty.

Table 11. Sensitivity to model parameter noise (MSE before PF).

These results show that LG-HGNN is robust to both realistic topology perturbations (as represented by full vs. N-1 configurations) and to parameter uncertainty modeled through weight perturbations. Performance degrades smoothly as noise intensity increases, highlighting the stability and generalization capability of the proposed heterogeneous local–global architecture.

5.10. Computational Efficiency Analysis

Figure 3 compares the per-instance solution time of DC-IPOPT and the proposed LG-HGNN across all datasets. The conventional DC-IPOPT solver exhibits a strong dependence on grid size. Its average runtime rises from 28.7 ms on the IEEE-14 system to 374.7 ms on the 118-bus system before power-flow (PF) correction, and to nearly 1 s on the 118-bus system after PF. On the 2000-bus GOC-2000 case, DC-IPOPT further increases to 4178.3 ms before PF and 5436.5 ms after PF. This rapid growth reflects the iterative nature of nonlinear optimization and the computational cost of repeated sparse matrix factorizations.

Figure 3. Average computational time (in ms) per instance comparing DC-IPOPT with LG-HGNN across different power systems.

In contrast, LG-HGNN maintains low inference latency because each prediction involves a fixed number of message-passing and attention layers. Before PF, it requires only 5–7 ms for the 14- and 30-bus grids and less than 30 ms for the 118-bus and GOC-2000 cases. Consequently, LG-HGNN achieves substantial speed-ups on the 14-, 30-, 118-, and GOC-2000 systems, respectively:

- before PF: $\sim 6\times$, $\sim 18\times$, $\sim 52\times$, and $\sim 193\times$;
- after PF: $\sim 4\times$, $\sim 13\times$, $\sim 37\times$, and $\sim 12\times$.

Notably, for the 2000-bus case, the post-PF runtime is dominated by the power-flow correction step, which reduces the overall speed-up compared with the pure feed-forward inference stage. These gains are achieved without sacrificing accuracy or physical feasibility, indicating that the model can serve as an efficient surrogate for computationally intensive AC-OPF solvers.
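The speed-ups above are ratios of average per-instance wall-clock times. A minimal standard-library timing harness for such comparisons is sketched below. The warm-up count and the placeholder workloads are hypothetical. Timing GPU inference would additionally require device synchronization before each clock read.

```python
import time


def mean_latency_ms(solve, instances, warmup=3):
    """Average wall-clock time per instance, in milliseconds."""
    for inst in instances[:warmup]:          # exclude one-off start-up costs (JIT, caches)
        solve(inst)
    start = time.perf_counter()
    for inst in instances:
        solve(inst)
    return 1000.0 * (time.perf_counter() - start) / len(instances)


def speedup(baseline_ms, model_ms):
    """Ratio of average runtimes: how many times faster the model is."""
    return baseline_ms / model_ms


# Hypothetical placeholder workloads standing in for a solver and a surrogate.
def slow_solver(n):
    return sum(i * i for i in range(20 * n))


def fast_surrogate(n):
    return sum(i * i for i in range(n))


instances = [1000] * 20
ratio = speedup(mean_latency_ms(slow_solver, instances), mean_latency_ms(fast_surrogate, instances))
print(f"speed-up ~ {ratio:.1f}x")
```

When a PF correction follows the surrogate, its runtime belongs in the model-side measurement. This is why the post-PF speed-ups are smaller.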

Overall, the results highlight that LG-HGNN combines high physical fidelity with excellent practical scalability, delivering AC-feasible, near-optimal solutions within milliseconds.

6. Discussion

This section discusses the implications of the proposed LG-HGNN framework, its relation to existing AC-OPF solvers, and its current limitations.

6.1. Interpretation of Results

The results indicate that combining heterogeneous graph modeling with local–global message passing improves both accuracy and robustness for learning-based AC-OPF. The bus-centric global Transformer enables the model to capture long-range electrical dependencies that are difficult to represent using purely local message passing, particularly for voltage angles and reactive power variables. At the same time, explicitly distinguishing buses, generators, loads, and shunts allows the model to better reflect the physical structure of power systems. This contributes to improved generalization across varying operating conditions and topologies.

6.2. Comparison with Existing Methods

Compared with classical optimization-based solvers, LG-HGNN offers substantially faster inference at the cost of exact optimality guarantees, making it suitable for near-real-time or large-scale evaluation scenarios. In contrast to non-graph neural networks, which are sensitive to topology changes, the proposed graph-based formulation provides stronger structural inductive bias. Compared with existing GNN-based OPF solvers that rely solely on deep local message passing, LG-HGNN achieves competitive performance with fewer layers by introducing a physically motivated, bus-centric global aggregation mechanism, improving scalability and training stability.

6.3. Practical Implications and Limitations

LG-HGNN is best viewed as a fast surrogate for, or complementary tool to, conventional AC-OPF solvers, for applications such as contingency screening, warm-start initialization, and large-scale scenario analysis. However, the approach depends on supervised training data and therefore inherits the limitations of available labeled AC-OPF datasets. In addition, bounded decoders and physics-informed regularization reduce constraint violations, but they do not guarantee strict feasibility. The power-flow post-processing step restores only the power balance equations, so inequality constraints may still be violated. Extending the framework to more complex OPF variants and hybrid learning–optimization pipelines remains an important direction for future work. The global attention module is restricted to bus nodes to reduce complexity, yet vanilla full attention over all bus nodes still scales quadratically with the number of buses. For ultra-large grids (e.g., $|N_{bus}| > 10^{4}$), this component may limit scalability and become a computational bottleneck. Exploring sparse- or linear-attention variants could further improve scalability, which we leave as future work.
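To make the quadratic cost concrete, the sketch below implements single-head attention over bus embeddings with an additive effective-resistance bias. The form $\mathrm{softmax}(QK^{\top}/\sqrt{d} - βR)$ and the scalar $β$ are assumptions standing in for Equations (30)–(35), which are not reproduced here. Whatever the exact form, the score matrix has $|N_{bus}|^2$ entries. For $|N_{bus}| = 10^4$ that amounts to $10^8$ scores, about 400 MB in 32-bit floats per head and per layer.

```python
import numpy as np


def resistance_biased_attention(h, w_q, w_k, w_v, r_eff, beta=1.0):
    """Single-head attention over bus embeddings h (N, d) with an additive bias.

    Electrically distant bus pairs (large effective resistance) get lower scores.
    The (N, N) score matrix is what makes the cost quadratic in N.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1]) - beta * r_eff      # (N, N)
    scores -= scores.max(axis=1, keepdims=True)                # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)              # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
n_bus, d = 5, 8
h = rng.normal(size=(n_bus, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
# Hypothetical path graph with unit resistances: R_ij = |i - j|.
r_eff = np.abs(np.subtract.outer(np.arange(n_bus), np.arange(n_bus))).astype(float)
out, attn = resistance_biased_attention(h, w_q, w_k, w_v, r_eff)
print(out.shape, attn.shape)                                   # (5, 8) (5, 5)
print(f"{(10**4) ** 2 * 4 / 1e6:.0f} MB per head for 10^4 buses (float32)")
```

Sparse- or linear-attention variants avoid materializing this full matrix. This is the direction suggested above for ultra-large grids.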

7. Conclusions

This work introduces the Heterogeneous Graph Neural Network with Local–Global Message Passing (LG-HGNN) as an efficient surrogate for AC Optimal Power Flow (AC-OPF) in large-scale power networks. By explicitly modeling buses, generators, loads, and shunts as distinct node types and transmission lines, transformers, and connector edges as distinct relation types, LG-HGNN faithfully captures the physical heterogeneity of power systems. A bus-centric local–global architecture, which combines edge-aware heterogeneous message passing over the full graph with an effective-resistance-biased global Transformer operating only on bus nodes, enables the model to learn both local electrical interactions and long-range dependencies that are essential for AC power flow. Type-specific bounded decoders and physics-informed regularization further enforce operational limits and approximate power-balance constraints already during training.

Experiments on the IEEE 14-, 30-, and 118-bus systems and the GOC 2000-bus system from the OPFData benchmark show that LG-HGNN attains low prediction errors on bus, generator, and branch variables and yields operating points whose objective values are typically within a few percent of the AC-OPF optimum. When evaluated in a zero-shot setting on thousands of unseen N-1 contingency topologies, the model maintains competitive optimality ratios and feasibility metrics without topology-specific retraining, demonstrating robust generalization across topologies. In terms of computational performance, LG-HGNN offers substantial speedups over conventional interior-point solvers: on the GOC 2000-bus system it achieves up to roughly $190\times$ acceleration before AC power-flow post-processing and more than $10\times$ even when a corrective power-flow step is applied, making it a promising candidate for near-real-time OPF applications in large-scale networks.
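The optimality and speed figures above are ratios, and a small sketch shows how they are typically computed. It assumes the quadratic cost convention $\sum_k (a_k P_k^2 + b_k P_k + c_k)$ with the coefficients of Table 1, an optimality ratio defined as predicted cost over optimal cost in percent, and a speedup defined as solver time over surrogate time. These definitions are assumptions for illustration, and the two-generator numbers are invented, not taken from the experiments.

```python
import numpy as np


def generation_cost(p_g, a, b, c):
    """Total cost sum_k (a_k * P_k^2 + b_k * P_k + c_k).

    Assumes a_k multiplies the quadratic term; check the convention of the
    dataset before reusing this.
    """
    p_g = np.asarray(p_g, dtype=float)
    return float(np.sum(a * p_g**2 + b * p_g + c))


def optimality_ratio(cost_pred, cost_opt):
    """Predicted dispatch cost as a percentage of the AC-OPF optimum."""
    return 100.0 * cost_pred / cost_opt


def speedup(t_solver, t_model):
    """How many times faster the surrogate is than the solver (same time unit)."""
    return t_solver / t_model


# Hypothetical 2-generator example (MW and $/h); not taken from the paper
a = np.array([0.01, 0.02])
b = np.array([10.0, 12.0])
c = np.array([5.0, 3.0])
cost_opt = generation_cost([100.0, 50.0], a, b, c)    # reference AC-OPF dispatch
cost_pred = generation_cost([102.0, 49.0], a, b, c)   # surrogate dispatch
ratio = optimality_ratio(cost_pred, cost_opt)         # slightly above 100 %
```

In this toy case the surrogate dispatch costs about 0.57% more than the reference, i.e. an optimality ratio of roughly 100.57%.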

Despite its strong empirical performance, LG-HGNN may still exhibit small residual constraint violations and does not provide strict feasibility guarantees. An interesting direction for future work is to couple the proposed architecture with differentiable optimization layers or projection operators that guarantee feasibility by design. Further extensions that incorporate uncertainty, dynamic operating conditions, and security-constrained OPF formulations, as well as hybrid learning–optimization schemes, are expected to enhance the reliability and applicability of learning-based AC-OPF surrogates in real-world power system operations.

Author Contributions

A.W.: Conceptualization, Methodology, Software, Formal analysis, Data curation, Writing – original draft, Visualization, Writing – review and editing, Supervision. B.W.: Conceptualization, Investigation, Writing – review and editing, Supervision. J.L. and J.X.: Conceptualization, Investigation, Writing – review and editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This project is supported by the Science and Technology Project of China Southern Power Grid [Project No. 030000KC23110056 (GDKJXM20231264)].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that this study received funding from China Southern Power Grid. The funder was not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:

AC-OPF	Alternating current optimal power flow
DC-OPF	Direct current optimal power flow
ER	Effective resistance
GAT	Graph attention network
GIN	Graph isomorphism network
GNN	Graph neural network
IPOPT	Interior Point Optimizer
LG-HGNN	Local–Global Heterogeneous Graph Neural Network
MSE	Mean squared error
NN	Neural network
OPF	Optimal power flow
PE	Positional encoding
PF	Power flow

References

Cain, M.B.; O’Neill, R.P.; Castillo, A. History of optimal power flow and formulations. Fed. Energy Regul. Comm. 2012, 1, 1–36.
Khaloie, H.; Dolanyi, M.; Toubeau, J.F.; Vallée, F. Review of machine learning techniques for optimal power flow. Appl. Energy 2025, 388, 125637.
Panciatici, P.; Bareux, G.; Wehenkel, L. Operating in the fog: Security management under uncertainty. IEEE Power Energy Mag. 2012, 10, 40–49.
Hamann, H.F.; Gjorgiev, B.; Brunschwiler, T.; Martins, L.S.; Puech, A.; Varbella, A.; Weiss, J.; Bernabe-Moreno, J.; Massé, A.B.; Choi, S.L.; et al. Foundation models for the electric power grid. Joule 2024, 8, 3245–3258.
Shin, S.; Anitescu, M.; Pacaud, F. Accelerating optimal power flow with GPUs: SIMD abstraction of nonlinear programs and condensed-space interior-point methods. Electr. Power Syst. Res. 2024, 236, 110651.
Pan, X.; Chen, M.; Zhao, T.; Low, S.H. DeepOPF: A feasibility-optimized deep neural network approach for AC optimal power flow problems. IEEE Syst. J. 2022, 17, 673–683.
Yang, Z.; Zhong, H.; Bose, A.; Zheng, T.; Xia, Q.; Kang, C. A linearized OPF model with reactive power and voltage magnitude: A pathway to improve the MW-only DC OPF. IEEE Trans. Power Syst. 2017, 33, 1734–1745.
Molzahn, D.K.; Holzer, J.T.; Lesieutre, B.C.; DeMarco, C.L. Implementation of a large-scale optimal power flow solver based on semidefinite programming. IEEE Trans. Power Syst. 2013, 28, 3987–3998.
Chatzos, M.; Fioretto, F.; Mak, T.W.; Van Hentenryck, P. High-fidelity machine learning approximations of large-scale optimal power flow. arXiv 2020, arXiv:2006.16356.
Baker, K. Solutions of DC OPF are never AC feasible. In Proceedings of the Twelfth ACM International Conference on Future Energy Systems, Virtual, 28 June–2 July 2021; pp. 264–268.
Sun, D.I.; Ashley, B.; Brewer, B.; Hughes, A.; Tinney, W.F. Optimal power flow by Newton approach. IEEE Trans. Power Appar. Syst. 2007, PAS-103, 2864–2880.
Momoh, J.A.; Zhu, J. Improved interior point method for OPF problems. IEEE Trans. Power Syst. 1999, 14, 1114–1120.
Naderi, E.; Pourakbari-Kasmaei, M.; Abdi, H. An efficient particle swarm optimization algorithm to solve optimal power flow problem integrated with FACTS devices. Appl. Soft Comput. 2019, 80, 243–262.
Bian, J.; Wang, H.; Wang, L.; Li, G.; Wang, Z. Probabilistic optimal power flow of an AC/DC system with a multiport current flow controller. CSEE J. Power Energy Syst. 2020, 7, 744–752.
Abdulrasool, A.Q.; Al-Bahrani, L.T. Multi-objective constrained optimal power flow based on enhanced ant colony system algorithm. In Proceedings of the 2021 12th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania, 25–27 March 2021; pp. 1–5.
Pan, X.; Zhao, T.; Chen, M. DeepOPF: Deep neural network for DC optimal power flow. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids, Beijing, China, 21–23 October 2019; pp. 1–6.
Pan, X. DeepOPF: Deep neural networks for optimal power flow. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Coimbra, Portugal, 17–18 November 2021; pp. 250–251.
Yang, Y.; Yang, Z.; Yu, J.; Zhang, B.; Zhang, Y.; Yu, H. Fast calculation of probabilistic power flow: A model-based deep learning approach. IEEE Trans. Smart Grid 2019, 11, 2235–2244.
Owerko, D.; Gama, F.; Ribeiro, A. Optimal power flow using graph neural networks. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 5930–5934.
Falconer, T.; Mones, L. Leveraging power grid topology in machine learning assisted optimal power flow. IEEE Trans. Power Syst. 2022, 38, 2234–2246.
Liu, S.; Wu, C.; Zhu, H. Topology-aware graph neural networks for learning feasible and adaptive AC-OPF solutions. IEEE Trans. Power Syst. 2022, 38, 5660–5670.
Piloto, L.; Liguori, S.; Madjiheurem, S.; Zgubic, M.; Lovett, S.; Tomlinson, H.; Elster, S.; Apps, C.; Witherspoon, S. CANOS: A fast and scalable neural AC-OPF solver robust to N-1 perturbations. arXiv 2024, arXiv:2403.17660.
Gao, M.; Yu, J.; Yang, Z.; Zhao, J. A physics-guided graph convolution neural network for optimal power flow. IEEE Trans. Power Syst. 2023, 39, 380–390.
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
Ahmad, N.; Ghadi, Y.; Adnan, M.; Ali, M. Load forecasting techniques for power system: Research challenges and survey. IEEE Access 2022, 10, 71054–71090.
Omar, M.; Sayed, E.; Abdalmagid, M.; Bilgin, B.; Bakr, M.H.; Emadi, A. Review of machine learning applications to the modeling and design optimization of switched reluctance motors. IEEE Access 2022, 10, 130444–130468.
Chowdhury, M.M.U.T.; Kamalasadan, S.; Paudyal, S. A second-order cone programming (SOCP) based optimal power flow (OPF) model with cyclic constraints for power transmission systems. IEEE Trans. Power Syst. 2023, 39, 1032–1043.
Chowdhury, M.M.U.T.; Hasan, M.S.; Kamalasadan, S. A distributed optimal power flow (D-OPF) model for radial distribution networks with second-order cone programming (SOCP). In Proceedings of the 2023 IEEE Industry Applications Society Annual Meeting (IAS), Nashville, TN, USA, 29 October–2 November 2023; pp. 1–8.
Sasson, A.M. Combined use of the Powell and Fletcher-Powell nonlinear programming methods for optimal load flows. IEEE Trans. Power Appar. Syst. 2007, PAS-88, 1530–1537.
Lavaei, J.; Low, S.H. Convexification of optimal power flow problem. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 29 September–1 October 2010; pp. 223–232.
Low, S.H. Convex relaxation of optimal power flow—Part II: Exactness. IEEE Trans. Control Netw. Syst. 2014, 1, 177–189.
Kile, H.; Uhlen, K.; Warland, L.; Kjølle, G. A comparison of AC and DC power flow models for contingency and reliability analysis. In Proceedings of the 2014 Power Systems Computation Conference, Wroclaw, Poland, 18–22 August 2014; pp. 1–7.
Torres, G.L.; Quintana, V.H. An interior-point method for nonlinear optimal power flow using voltage rectangular coordinates. IEEE Trans. Power Syst. 2002, 13, 1211–1218.
Roa-Sepulveda, C.; Pavez-Lazo, B. A solution to the optimal power flow using simulated annealing. Int. J. Electr. Power Energy Syst. 2003, 25, 47–57.
Velloso, A.; Van Hentenryck, P. Combining deep learning and optimization for preventive security-constrained DC optimal power flow. IEEE Trans. Power Syst. 2021, 36, 3618–3628.
Park, S.; Chen, W.; Mak, T.W.; Van Hentenryck, P. Compact optimization learning for AC optimal power flow. IEEE Trans. Power Syst. 2023, 39, 4350–4359.
Huang, W.; Chen, M.; Low, S.H. Unsupervised learning for solving AC optimal power flows: Design, analysis, and experiment. IEEE Trans. Power Syst. 2024, 39, 7102–7114.
Jia, Y.; Bai, X.; Zheng, L.; Weng, Z.; Li, Y. ConvOPF-DOP: A data-driven method for solving AC-OPF based on CNN considering different operation patterns. IEEE Trans. Power Syst. 2022, 38, 853–860.
Zamzam, A.S.; Baker, K. Learning optimal solutions for extremely fast AC optimal power flow. In Proceedings of the 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Tempe, AZ, USA, 11–13 November 2020; pp. 1–6.
Chatzos, M.; Mak, T.W.; Van Hentenryck, P. Spatial network decomposition for fast and scalable AC-OPF learning. IEEE Trans. Power Syst. 2021, 37, 2601–2612.
Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
Donon, B.; Clément, R.; Donnot, B.; Marot, A.; Guyon, I.; Schoenauer, M. Neural networks for power flow: Graph neural solver. Electr. Power Syst. Res. 2020, 189, 106547.
Zhou, M.; Chen, M.; Low, S.H. DeepOPF-FT: One deep neural network for multiple AC-OPF problems with flexible topology. IEEE Trans. Power Syst. 2022, 38, 964–967.
Babaeinejadsarookolaee, S.; Birchfield, A.; Christie, R.D.; Coffrin, C.; DeMarco, C.; Diao, R.; Ferris, M.; Fliscounakis, S.; Greene, S.; Huang, R.; et al. The power grid library for benchmarking AC optimal power flow algorithms. arXiv 2019, arXiv:1908.02788.
Lovett, S.; Zgubic, M.; Liguori, S.; Madjiheurem, S.; Tomlinson, H.; Elster, S.; Apps, C.; Witherspoon, S.; Piloto, L. OPFData: Large-scale datasets for AC optimal power flow with topological perturbations. arXiv 2024, arXiv:2406.07234.
Yehudai, G.; Fetaya, E.; Meirom, E.; Chechik, G.; Maron, H. From local structures to size generalization in graph neural networks. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11975–11986.
Rampášek, L.; Galkin, M.; Dwivedi, V.P.; Luu, A.T.; Wolf, G.; Beaini, D. Recipe for a general, powerful, scalable graph transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 14501–14515.
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
Lu, L.; Pestourie, R.; Yao, W.; Wang, Z.; Verdugo, F.; Johnson, S.G. Physics-informed neural networks with hard constraints for inverse design. SIAM J. Sci. Comput. 2021, 43, B1105–B1132.
Varbella, A.; Briens, D.; Gjorgiev, B.; D’Inverno, G.A.; Sansavini, G. Physics-informed GNN for non-linear constrained optimization: PINCO a solver for the AC-optimal power flow. arXiv 2024, arXiv:2410.04818.
Fioretto, F.; Mak, T.W.; Van Hentenryck, P. Predicting AC optimal power flows: Combining deep learning and Lagrangian dual methods. Proc. AAAI Conf. Artif. Intell. 2020, 34, 630–637.
Baker, K. Learning warm-start points for AC optimal power flow. In Proceedings of the 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, PA, USA, 13–16 October 2019; pp. 1–6.
Crozier, C.; Baker, K. Data-driven probabilistic constraint elimination for accelerated optimal power flow. In Proceedings of the 2022 IEEE Power & Energy Society General Meeting (PESGM), Denver, CO, USA, 17–21 July 2022; pp. 1–5.
Cetinay, H.; Kuipers, F.A.; Van Mieghem, P. A topological investigation of power flow. IEEE Syst. J. 2016, 12, 2524–2532.

Figure 1. Illustration of the heterogeneous graph representation of the IEEE-14 bus system.

Figure 2. Overall architecture of the proposed LG-HGNN.

Figure 3. Average computational time (in ms) per instance comparing DC-IPOPT with LG-HGNN across different power systems.

Table 1. Summary of main notation.

Symbol	Description
$G^{het} = (V, E^{het})$	Heterogeneous graph used by LG-HGNN
$N$	Set of buses; $|N|$ denotes the number of buses
$G, L, S$	Set of generators, loads and shunts
$E^{line}$	Set of AC transmission lines (bus–bus branches)
$E^{tr}$	Set of transformers (bus–bus branches)
$E^{con}$	Set of connector edges (gen/load/shunt to bus)
$E$	Set of physical branches, $E = E^{line} \cup E^{tr}$
$E^{het}$	Set of all edges of every relation type, $E^{het} = E^{line} \cup E^{tr} \cup E^{con}$
$G_{i}, L_{i}, S_{i}$	Generators/loads/shunts connected to bus i
$V_{i} = \| V_{i} \| e^{j θ_{i}}$	Complex voltage at bus i
$\| V_{i} \|$ , $θ_{i}$	Voltage magnitude and angle at bus i
$V_{i}^{l}, V_{i}^{u}$	Lower and upper bounds on $\| V_{i} \|$
$S_{k}^{g} = P_{k}^{g} + j Q_{k}^{g}$	Complex generation of generator $k \in G$
$P_{k}^{g}, Q_{k}^{g}$	Active and reactive power of generator k
$P_{k}^{g, l}, P_{k}^{g, u}$	Active power limits of generator k
$Q_{k}^{g, l}, Q_{k}^{g, u}$	Reactive power limits of generator k
$(a_{k}, b_{k}, c_{k})$	Quadratic cost coefficients of generator k
$S_{k}^{d} = P_{k}^{d} + j Q_{k}^{d}$	Complex demand of load $k \in L$
$S_{k}^{s}$	Complex shunt injection of shunt $k \in S$
$Y_{k}^{s} = G_{k}^{s} + j B_{k}^{s}$	Shunt admittance of shunt element k
$S_{i j} = P_{i j} + j Q_{i j}$	Complex power flow from bus i to bus j
$S_{j i} = P_{j i} + j Q_{j i}$	Complex power flow from bus j to bus i
$s_{i j}^{u}$	Thermal limit (MVA rating) of branch $(i, j)$
$Δ θ_{i j}$	Effective angle difference across branch $(i, j)$
$Δ θ_{i j}^{l}, Δ θ_{i j}^{u}$	Angle difference limits on branch $(i, j)$
$R$	Set of reference (slack) buses
$r_{i j}, x_{i j}$	Series resistance and reactance of branch $(i, j)$
$Y_{i j} = G_{i j} + j B_{i j}$	Series admittance of branch $(i, j)$
$Y_{i j}^{c}$ , $Y_{j i}^{c}$	Shunt (charging) admittances at i and j ends
$T_{i j} = t_{i j} e^{j ϕ_{i j}}$	Complex tap ratio of transformer $(i, j)$
$t_{i j}, ϕ_{i j}$	Magnitude and phase shift of transformer tap
$L$	Susceptance-weighted graph Laplacian
$L^{+}$	Moore–Penrose pseudoinverse of L
$Ω_{i j}$	Effective resistance between buses i and j
${PE}_{i}$	Positional encoding of bus i from $Ω_{i, :}$
$h_{a, i}^{(k)}$	Node embedding of node i of type a at layer k
$h_{e, (i, j)}^{(k)}$	Edge embedding of edge $(i, j)$ of type e at layer k
$K$	Number of message-passing layers
$N_{h}$	Number of attention heads in the transformer
$L_{pred}$	Supervised prediction loss
$L_{branch}$	Branch constraint violation loss
$L_{PF}$	Power-flow (nodal balance) violation loss
$P_{b}^{i}, Q_{b}^{i}$	Active and reactive power mismatches at bus i
$Y_{i j}^{bus} = G_{i j}^{bus} + j B_{i j}^{bus}$	Bus-admittance matrix entries
$λ_{1}, λ_{2}$	Weights for physics-informed regularization terms
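The Laplacian-based entries of Table 1 ($L$, $L^{+}$, $Ω_{ij}$, ${PE}_{i}$) can be illustrated with the standard identity $Ω_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$. The sketch below builds a weighted Laplacian from branch tuples, applies NumPy's Moore–Penrose pseudoinverse, and derives one possible fixed-length encoding from each row $Ω_{i,:}$. Using $|b_{ij}|$ as the branch weight and the sorted-row encoding `resistance_pe` are hypothetical choices for illustration, not the paper's exact construction.

```python
import numpy as np


def weighted_laplacian(n_bus, branches):
    """Graph Laplacian L = D - W from (i, j, w) branch tuples, w > 0.

    For a susceptance-weighted Laplacian, w could be |b_ij| ~ 1 / x_ij
    (an assumption about the weighting used in the paper).
    """
    lap = np.zeros((n_bus, n_bus))
    for i, j, w in branches:
        lap[i, i] += w
        lap[j, j] += w
        lap[i, j] -= w
        lap[j, i] -= w
    return lap


def effective_resistance(lap):
    """Omega_ij = L+_ii + L+_jj - 2 L+_ij, with L+ the Moore-Penrose pseudoinverse."""
    lp = np.linalg.pinv(lap)
    diag = np.diag(lp)
    return diag[:, None] + diag[None, :] - 2.0 * lp


def resistance_pe(omega, dim):
    """Hypothetical PE: the `dim` smallest resistances from bus i to the other buses."""
    n = omega.shape[0]
    pe = np.zeros((n, dim))
    for i in range(n):
        row = np.sort(np.delete(omega[i], i))[:dim]
        pe[i, : row.size] = row   # zero-pad when the grid has fewer than dim + 1 buses
    return pe


# 3-bus ring, unit weights: each pair sees resistance 1 in parallel with 2
omega_ring = effective_resistance(weighted_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]))
# 3-bus path 0-1-2: resistances add in series
omega_path = effective_resistance(weighted_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)]))
pe = resistance_pe(omega_path, dim=4)
```

Because the pseudoinverse is defined for any connected topology, the same code applies unchanged after an N-1 outage, provided the remaining grid stays connected.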

Table 2. MSE of baselines and LG-HGNN across different grid sizes on ‘Full topology’. For each variable, the first (non-shaded) row gives pre-power-flow results and the second (shaded) row gives post-power-flow results.

		IEEE-14				IEEE-30				IEEE-118
		DC-IPOPT	GAT	GIN	Our_F	DC-IPOPT	GAT	GIN	Our_F	DC-IPOPT	GAT	GIN	Our_F
Bus	$θ$	$4.38 \times 10^{- 3}$	$1.50 \times 10^{- 5}$	$2.00 \times 10^{- 5}$	$1.50 \times 10^{- 6}$	$2.13 \times 10^{- 2}$	$3.65 \times 10^{- 5}$	$4.00 \times 10^{- 5}$	$3.22 \times 10^{- 5}$	$1.29 \times 10^{- 2}$	$1.49 \times 10^{- 4}$	$1.60 \times 10^{- 4}$	$1.48 \times 10^{- 4}$
	$θ$	$4.35 \times 10^{- 3}$	$1.55 \times 10^{- 5}$	$2.10 \times 10^{- 5}$	$1.48 \times 10^{- 6}$	$2.14 \times 10^{- 2}$	$3.70 \times 10^{- 5}$	$4.10 \times 10^{- 5}$	$3.20 \times 10^{- 5}$	$1.29 \times 10^{- 2}$	$1.50 \times 10^{- 4}$	$1.62 \times 10^{- 4}$	$1.47 \times 10^{- 4}$
	$\| V \|$	$5.13 \times 10^{- 3}$	$3.12 \times 10^{- 6}$	$4.00 \times 10^{- 6}$	$3.12 \times 10^{- 6}$	$4.57 \times 10^{- 3}$	$3.73 \times 10^{- 6}$	$4.00 \times 10^{- 6}$	$1.90 \times 10^{- 6}$	$5.03 \times 10^{- 3}$	$1.65 \times 10^{- 5}$	$1.70 \times 10^{- 5}$	$1.50 \times 10^{- 5}$
	$\| V \|$	$5.12 \times 10^{- 3}$	$3.15 \times 10^{- 6}$	$4.10 \times 10^{- 6}$	$3.10 \times 10^{- 6}$	$4.57 \times 10^{- 3}$	$3.75 \times 10^{- 6}$	$4.05 \times 10^{- 6}$	$1.88 \times 10^{- 6}$	$5.03 \times 10^{- 3}$	$1.66 \times 10^{- 5}$	$1.72 \times 10^{- 5}$	$1.48 \times 10^{- 5}$
Gen	$P^{g}$	$6.80 \times 10^{- 3}$	$5.79 \times 10^{- 5}$	$6.00 \times 10^{- 5}$	$5.79 \times 10^{- 5}$	$1.41 \times 10^{- 2}$	$3.64 \times 10^{- 4}$	$3.80 \times 10^{- 4}$	$3.17 \times 10^{- 4}$	$1.99 \times 10^{- 2}$	$4.88 \times 10^{- 4}$	$5.00 \times 10^{- 4}$	$4.77 \times 10^{- 4}$
	$P^{g}$	$6.74 \times 10^{- 3}$	$5.80 \times 10^{- 5}$	$6.10 \times 10^{- 5}$	$5.78 \times 10^{- 5}$	$1.41 \times 10^{- 2}$	$3.66 \times 10^{- 4}$	$3.82 \times 10^{- 4}$	$3.16 \times 10^{- 4}$	$1.99 \times 10^{- 2}$	$4.90 \times 10^{- 4}$	$5.05 \times 10^{- 4}$	$4.76 \times 10^{- 4}$
	$Q^{g}$	-	$1.01 \times 10^{- 5}$	$1.20 \times 10^{- 5}$	$1.01 \times 10^{- 5}$	-	$9.67 \times 10^{- 4}$	$1.00 \times 10^{- 3}$	$7.88 \times 10^{- 4}$	-	$1.29 \times 10^{- 3}$	$1.35 \times 10^{- 3}$	$1.23 \times 10^{- 3}$
	$Q^{g}$	$1.82 \times 10^{- 1}$	$1.02 \times 10^{- 5}$	$1.22 \times 10^{- 5}$	$1.00 \times 10^{- 5}$	$1.91 \times 10^{- 1}$	$9.70 \times 10^{- 4}$	$1.02 \times 10^{- 3}$	$7.85 \times 10^{- 4}$	$1.11 \times 10^{- 1}$	$1.30 \times 10^{- 3}$	$1.36 \times 10^{- 3}$	$1.22 \times 10^{- 3}$
Line	$P_{i j}$	$6.28 \times 10^{- 3}$	$5.02 \times 10^{- 4}$	$6.00 \times 10^{- 4}$	$1.10 \times 10^{- 4}$	$1.70 \times 10^{- 2}$	$5.46 \times 10^{- 4}$	$6.00 \times 10^{- 4}$	$3.67 \times 10^{- 4}$	$1.40 \times 10^{- 2}$	$1.25 \times 10^{- 3}$	$1.30 \times 10^{- 3}$	$1.28 \times 10^{- 3}$
	$P_{i j}$	$4.27 \times 10^{- 2}$	$5.05 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$1.09 \times 10^{- 4}$	$7.68 \times 10^{- 2}$	$5.48 \times 10^{- 4}$	$6.05 \times 10^{- 4}$	$3.65 \times 10^{- 4}$	$6.86 \times 10^{- 2}$	$1.26 \times 10^{- 3}$	$1.32 \times 10^{- 3}$	$1.27 \times 10^{- 3}$
	$P_{j i}$	$6.38 \times 10^{- 3}$	$5.03 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$1.10 \times 10^{- 4}$	$1.71 \times 10^{- 2}$	$5.45 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$3.66 \times 10^{- 4}$	$1.43 \times 10^{- 2}$	$1.25 \times 10^{- 3}$	$1.31 \times 10^{- 3}$	$1.28 \times 10^{- 3}$
	$P_{j i}$	$3.17 \times 10^{- 2}$	$5.06 \times 10^{- 4}$	$6.12 \times 10^{- 4}$	$1.08 \times 10^{- 4}$	$1.12 \times 10^{- 1}$	$5.47 \times 10^{- 4}$	$6.08 \times 10^{- 4}$	$3.64 \times 10^{- 4}$	$5.51 \times 10^{- 2}$	$1.26 \times 10^{- 3}$	$1.33 \times 10^{- 3}$	$1.27 \times 10^{- 3}$
	$Q_{i j}$	-	$2.58 \times 10^{- 5}$	$3.00 \times 10^{- 5}$	$1.21 \times 10^{- 5}$	-	$4.00 \times 10^{- 4}$	$4.50 \times 10^{- 4}$	$3.04 \times 10^{- 4}$	-	$1.28 \times 10^{- 3}$	$1.35 \times 10^{- 3}$	$1.17 \times 10^{- 3}$
	$Q_{i j}$	$3.20 \times 10^{- 2}$	$2.60 \times 10^{- 5}$	$3.10 \times 10^{- 5}$	$1.20 \times 10^{- 5}$	$1.14 \times 10^{- 1}$	$4.02 \times 10^{- 4}$	$4.55 \times 10^{- 4}$	$3.02 \times 10^{- 4}$	$5.52 \times 10^{- 2}$	$1.29 \times 10^{- 3}$	$1.36 \times 10^{- 3}$	$1.16 \times 10^{- 3}$
	$Q_{j i}$	-	$2.56 \times 10^{- 5}$	$3.10 \times 10^{- 5}$	$1.21 \times 10^{- 5}$	-	$4.01 \times 10^{- 4}$	$4.60 \times 10^{- 4}$	$3.05 \times 10^{- 4}$	-	$1.28 \times 10^{- 3}$	$1.36 \times 10^{- 3}$	$1.17 \times 10^{- 3}$
	$Q_{j i}$	$3.82 \times 10^{- 2}$	$2.58 \times 10^{- 5}$	$3.12 \times 10^{- 5}$	$1.19 \times 10^{- 5}$	$7.52 \times 10^{- 2}$	$4.03 \times 10^{- 4}$	$4.58 \times 10^{- 4}$	$3.03 \times 10^{- 4}$	$6.84 \times 10^{- 2}$	$1.29 \times 10^{- 3}$	$1.37 \times 10^{- 3}$	$1.16 \times 10^{- 3}$
Trans.	$P_{i j}$	$7.22 \times 10^{- 3}$	$2.30 \times 10^{- 4}$	$2.50 \times 10^{- 4}$	$6.47 \times 10^{- 5}$	$4.28 \times 10^{- 3}$	$1.81 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$1.44 \times 10^{- 4}$	$2.25 \times 10^{- 2}$	$7.79 \times 10^{- 4}$	$8.00 \times 10^{- 4}$	$7.53 \times 10^{- 4}$
	$P_{i j}$	$1.72 \times 10^{- 1}$	$2.32 \times 10^{- 4}$	$2.52 \times 10^{- 4}$	$6.45 \times 10^{- 5}$	$8.29 \times 10^{- 2}$	$1.82 \times 10^{- 4}$	$2.02 \times 10^{- 4}$	$1.43 \times 10^{- 4}$	$1.05 \times 10^{- 1}$	$7.81 \times 10^{- 4}$	$8.05 \times 10^{- 4}$	$7.51 \times 10^{- 4}$
	$P_{j i}$	$7.30 \times 10^{- 3}$	$2.31 \times 10^{- 4}$	$2.60 \times 10^{- 4}$	$6.49 \times 10^{- 5}$	$4.32 \times 10^{- 3}$	$1.81 \times 10^{- 4}$	$2.10 \times 10^{- 4}$	$1.44 \times 10^{- 4}$	$2.26 \times 10^{- 2}$	$7.82 \times 10^{- 4}$	$8.10 \times 10^{- 4}$	$7.56 \times 10^{- 4}$
	$P_{j i}$	$9.36 \times 10^{- 2}$	$2.33 \times 10^{- 4}$	$2.62 \times 10^{- 4}$	$6.48 \times 10^{- 5}$	$1.38 \times 10^{- 1}$	$1.83 \times 10^{- 4}$	$2.12 \times 10^{- 4}$	$1.42 \times 10^{- 4}$	$2.14 \times 10^{- 1}$	$7.83 \times 10^{- 4}$	$8.12 \times 10^{- 4}$	$7.54 \times 10^{- 4}$
	$Q_{i j}$	-	$1.98 \times 10^{- 5}$	$2.20 \times 10^{- 5}$	$9.61 \times 10^{- 6}$	-	$2.76 \times 10^{- 4}$	$3.00 \times 10^{- 4}$	$2.40 \times 10^{- 4}$	-	$1.38 \times 10^{- 3}$	$1.45 \times 10^{- 3}$	$1.30 \times 10^{- 3}$
	$Q_{i j}$	$9.29 \times 10^{- 2}$	$2.00 \times 10^{- 5}$	$2.22 \times 10^{- 5}$	$9.60 \times 10^{- 6}$	$1.37 \times 10^{- 1}$	$2.78 \times 10^{- 4}$	$3.02 \times 10^{- 4}$	$2.38 \times 10^{- 4}$	$2.12 \times 10^{- 1}$	$1.39 \times 10^{- 3}$	$1.46 \times 10^{- 3}$	$1.29 \times 10^{- 3}$
	$Q_{j i}$	-	$1.94 \times 10^{- 5}$	$2.10 \times 10^{- 5}$	$9.44 \times 10^{- 6}$	-	$2.78 \times 10^{- 4}$	$3.10 \times 10^{- 4}$	$2.42 \times 10^{- 4}$	-	$1.40 \times 10^{- 3}$	$1.48 \times 10^{- 3}$	$1.32 \times 10^{- 3}$
	$Q_{j i}$	$1.52 \times 10^{- 1}$	$1.96 \times 10^{- 5}$	$2.12 \times 10^{- 5}$	$9.42 \times 10^{- 6}$	$7.90 \times 10^{- 2}$	$2.80 \times 10^{- 4}$	$3.05 \times 10^{- 4}$	$2.40 \times 10^{- 4}$	$9.31 \times 10^{- 2}$	$1.41 \times 10^{- 3}$	$1.49 \times 10^{- 3}$	$1.31 \times 10^{- 3}$

Table 3. Feasibility of baselines and LG-HGNN across different grid sizes on ‘Full topology’. For each constraint, the first (non-shaded) row gives pre-power-flow results and the second (shaded) row gives post-power-flow results.

	IEEE-14				IEEE-30				IEEE-118
	DC-IPOPT	GAT	GIN	Our_F	DC-IPOPT	GAT	GIN	Our_F	DC-IPOPT	GAT	GIN	Our_F
$S_{i j}$	-	$4.39 \times 10^{- 7}$	$6.00 \times 10^{- 7}$	$4.39 \times 10^{- 7}$	-	$2.60 \times 10^{- 6}$	$4.00 \times 10^{- 6}$	$2.60 \times 10^{- 6}$	-	$5.09 \times 10^{- 7}$	$7.00 \times 10^{- 7}$	$5.09 \times 10^{- 7}$
$S_{i j}$	$2.26 \times 10^{- 2}$	$5.24 \times 10^{- 6}$	$7.00 \times 10^{- 6}$	$5.24 \times 10^{- 6}$	$1.10 \times 10^{- 1}$	$3.11 \times 10^{- 5}$	$4.00 \times 10^{- 5}$	$3.11 \times 10^{- 5}$	$6.55 \times 10^{- 2}$	$1.36 \times 10^{- 5}$	$1.80 \times 10^{- 5}$	$1.36 \times 10^{- 5}$
$S_{j i}$	-	$4.83 \times 10^{- 6}$	$6.00 \times 10^{- 6}$	$4.83 \times 10^{- 6}$	-	$2.14 \times 10^{- 7}$	$3.00 \times 10^{- 7}$	$2.14 \times 10^{- 7}$	-	$2.24 \times 10^{- 7}$	$3.50 \times 10^{- 7}$	$2.24 \times 10^{- 7}$
$S_{j i}$	$3.60 \times 10^{- 2}$	$3.17 \times 10^{- 7}$	$4.00 \times 10^{- 7}$	$3.17 \times 10^{- 7}$	$1.17 \times 10^{- 1}$	$4.59 \times 10^{- 5}$	$6.00 \times 10^{- 5}$	$4.59 \times 10^{- 5}$	$6.65 \times 10^{- 2}$	$8.42 \times 10^{- 6}$	$1.10 \times 10^{- 5}$	$8.42 \times 10^{- 6}$
$Δ θ_{i j}$	-	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	-	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	-	$3.05 \times 10^{- 10}$	$5.00 \times 10^{- 10}$	$0.00 \times 10^{+ 0}$
$Δ θ_{i j}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$
$P_{b}$	-	$5.30 \times 10^{- 3}$	$6.00 \times 10^{- 3}$	$5.30 \times 10^{- 3}$	-	$1.08 \times 10^{- 2}$	$1.20 \times 10^{- 2}$	$9.11 \times 10^{- 3}$	-	$2.40 \times 10^{- 2}$	$2.70 \times 10^{- 2}$	$2.40 \times 10^{- 2}$
$P_{b}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$
$Q_{b}$	-	$1.82 \times 10^{- 3}$	$2.50 \times 10^{- 3}$	$1.82 \times 10^{- 3}$	-	$5.35 \times 10^{- 3}$	$6.00 \times 10^{- 3}$	$4.20 \times 10^{- 3}$	-	$7.98 \times 10^{- 3}$	$9.00 \times 10^{- 3}$	$7.98 \times 10^{- 3}$
$Q_{b}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$
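The $P_{b}$ and $Q_{b}$ rows report nodal power-balance residuals, which is consistent with their dropping to zero once a corrective power flow has been solved. The sketch below shows one standard way to compute such residuals with the bus-admittance matrix $Y^{bus}$ of Table 1. It assumes per-unit quantities, bus shunts already folded into $Y^{bus}$, and per-bus aggregated generation and demand. The 2-bus system is hypothetical, and the mean absolute value at the end is only one possible scalar summary; how the paper aggregates the residuals is not stated here.

```python
import numpy as np


def power_mismatch(v_mag, v_ang, y_bus, s_gen, s_load):
    """Nodal balance residuals (P_b, Q_b) in per unit.

    s_gen and s_load hold the complex generation and demand aggregated per bus.
    Shunt admittances are assumed to be already included in y_bus.
    """
    v = v_mag * np.exp(1j * v_ang)
    s_net = v * np.conj(y_bus @ v)        # complex power injected into the network
    ds = s_gen - s_load - s_net           # zero when every bus balances
    return ds.real, ds.imag


# Hypothetical 2-bus system joined by one line with impedance 0.01 + j0.1 p.u.
y = 1.0 / (0.01 + 0.1j)
y_bus = np.array([[y, -y], [-y, y]])
v_mag = np.array([1.0, 0.98])
v_ang = np.array([0.0, -0.05])
v = v_mag * np.exp(1j * v_ang)
s_net = v * np.conj(y_bus @ v)
s_gen = np.array([s_net[0], 0.0])        # bus 0 generator supplies the line
s_load = np.array([0.0, -s_net[1]])      # bus 1 demand equals what arrives
p_b, q_b = power_mismatch(v_mag, v_ang, y_bus, s_gen, s_load)
# A slightly wrong angle (as a surrogate might predict) breaks the balance
p_b_err, q_b_err = power_mismatch(v_mag, v_ang + np.array([0.0, 0.01]), y_bus, s_gen, s_load)
mean_abs_p = float(np.mean(np.abs(p_b_err)))   # one possible scalar summary
```

A residual of this form is also a natural basis for a power-flow regularization term such as $L_{PF}$, since it is differentiable in the predicted voltages.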

Table 4. Optimality ratio (%) of baselines and LG-HGNN across different grid sizes on ‘Full topology’. For each system, the first (non-shaded) row gives pre-power-flow results and the second (shaded) row gives post-power-flow results.

	DC-IPOPT	GAT	GIN	Our_F
IEEE-14 (pre-PF)	96.75	101.15	100.95	100.35
IEEE-14 (post-PF)	97.90	100.85	100.68	100.25
IEEE-30 (pre-PF)	95.15	101.85	101.45	100.75
IEEE-30 (post-PF)	96.55	101.35	101.15	100.55
IEEE-118 (pre-PF)	93.80	102.45	102.05	101.15
IEEE-118 (post-PF)	95.20	101.95	101.65	100.95

Table 5. MSE of baselines and LG-HGNN on GOC-2000 (‘Full topology’). Rows marked (PF) give results after power-flow correction (shaded); unmarked rows give results before it (non-shaded).

		DC-IPOPT	GAT	GIN	Our_F
Bus	$θ$	$5.20 \times 10^{- 4}$	$3.15 \times 10^{- 4}$	$3.40 \times 10^{- 4}$	$2.25 \times 10^{- 4}$
	$θ$ (PF)	$3.80 \times 10^{- 4}$	$2.10 \times 10^{- 4}$	$2.35 \times 10^{- 4}$	$1.45 \times 10^{- 4}$
	$\| V \|$	$5.40 \times 10^{- 5}$	$3.90 \times 10^{- 5}$	$4.10 \times 10^{- 5}$	$2.95 \times 10^{- 5}$
	$\| V \|$ (PF)	$3.60 \times 10^{- 5}$	$2.60 \times 10^{- 5}$	$2.80 \times 10^{- 5}$	$1.85 \times 10^{- 5}$
Gen	$P^{g}$	$7.80 \times 10^{- 4}$	$5.85 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$4.95 \times 10^{- 4}$
	$P^{g}$ (PF)	$5.10 \times 10^{- 4}$	$3.90 \times 10^{- 4}$	$4.10 \times 10^{- 4}$	$3.20 \times 10^{- 4}$
	$Q^{g}$	$1.10 \times 10^{- 3}$	$8.50 \times 10^{- 4}$	$9.20 \times 10^{- 4}$	$7.40 \times 10^{- 4}$
	$Q^{g}$ (PF)	$7.40 \times 10^{- 4}$	$5.80 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$4.80 \times 10^{- 4}$
Line	$P_{i j}$	$2.95 \times 10^{- 3}$	$1.92 \times 10^{- 3}$	$2.05 \times 10^{- 3}$	$1.58 \times 10^{- 3}$
	$P_{i j}$ (PF)	$1.95 \times 10^{- 3}$	$1.25 \times 10^{- 3}$	$1.35 \times 10^{- 3}$	$1.02 \times 10^{- 3}$
	$Q_{i j}$	$3.10 \times 10^{- 3}$	$2.05 \times 10^{- 3}$	$2.20 \times 10^{- 3}$	$1.70 \times 10^{- 3}$
	$Q_{i j}$ (PF)	$2.05 \times 10^{- 3}$	$1.35 \times 10^{- 3}$	$1.45 \times 10^{- 3}$	$1.10 \times 10^{- 3}$

Table 6. MSE of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. For each variable, the first (non-shaded) row gives pre-power-flow results and the second (shaded) row gives post-power-flow results.

		IEEE-14					IEEE-30					IEEE-118
		DC-IPOPT	GAT	GIN	Our_F	Our_N	DC-IPOPT	GAT	GIN	Our_F	Our_N	DC-IPOPT	GAT	GIN	Our_F	Our_N
Bus	$θ$	$4.35 \times 10^{- 3}$	$1.59 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$4.70 \times 10^{- 5}$	$4.50 \times 10^{- 5}$	$2.14 \times 10^{- 2}$	$1.16 \times 10^{- 4}$	$1.30 \times 10^{- 4}$	$1.06 \times 10^{- 4}$	$1.04 \times 10^{- 4}$	$1.29 \times 10^{- 2}$	$1.61 \times 10^{- 4}$	$1.80 \times 10^{- 4}$	$1.60 \times 10^{- 4}$	$1.58 \times 10^{- 4}$
	$θ$	$4.35 \times 10^{- 3}$	$4.02 \times 10^{- 4}$	$4.50 \times 10^{- 4}$	$1.74 \times 10^{- 4}$	$1.70 \times 10^{- 4}$	$2.14 \times 10^{- 2}$	$1.59 \times 10^{- 3}$	$1.80 \times 10^{- 3}$	$1.39 \times 10^{- 3}$	$1.36 \times 10^{- 3}$	$1.29 \times 10^{- 2}$	$6.00 \times 10^{- 4}$	$6.50 \times 10^{- 4}$	$6.09 \times 10^{- 4}$	$6.00 \times 10^{- 4}$
	$\| V \|$	$5.12 \times 10^{- 3}$	$1.45 \times 10^{- 6}$	$1.80 \times 10^{- 6}$	$7.10 \times 10^{- 6}$	$1.05 \times 10^{- 6}$	$4.57 \times 10^{- 3}$	$6.07 \times 10^{- 6}$	$7.00 \times 10^{- 6}$	$2.90 \times 10^{- 6}$	$2.85 \times 10^{- 6}$	$5.03 \times 10^{- 3}$	$1.84 \times 10^{- 5}$	$2.00 \times 10^{- 5}$	$2.60 \times 10^{- 5}$	$1.58 \times 10^{- 5}$
	$\| V \|$	$5.12 \times 10^{- 3}$	$1.30 \times 10^{- 6}$	$1.60 \times 10^{- 6}$	$1.10 \times 10^{- 6}$	$1.05 \times 10^{- 6}$	$4.57 \times 10^{- 3}$	$1.04 \times 10^{- 5}$	$1.20 \times 10^{- 5}$	$4.46 \times 10^{- 6}$	$4.40 \times 10^{- 6}$	$5.03 \times 10^{- 3}$	$1.80 \times 10^{- 5}$	$2.00 \times 10^{- 5}$	$1.56 \times 10^{- 5}$	$1.54 \times 10^{- 5}$
Gen	$P^{g}$	$6.74 \times 10^{- 3}$	$6.65 \times 10^{- 4}$	$7.50 \times 10^{- 4}$	$2.96 \times 10^{- 4}$	$2.90 \times 10^{- 4}$	$1.41 \times 10^{- 2}$	$5.28 \times 10^{- 4}$	$6.00 \times 10^{- 4}$	$4.96 \times 10^{- 4}$	$4.90 \times 10^{- 4}$	$1.99 \times 10^{- 2}$	$5.41 \times 10^{- 4}$	$6.00 \times 10^{- 4}$	$5.62 \times 10^{- 4}$	$5.55 \times 10^{- 4}$
	$P^{g}$	$6.74 \times 10^{- 3}$	$3.89 \times 10^{- 3}$	$4.50 \times 10^{- 3}$	$1.47 \times 10^{- 3}$	$1.45 \times 10^{- 3}$	$1.41 \times 10^{- 2}$	$1.01 \times 10^{- 2}$	$1.20 \times 10^{- 2}$	$9.09 \times 10^{- 3}$	$9.00 \times 10^{- 3}$	$1.99 \times 10^{- 2}$	$1.66 \times 10^{- 3}$	$1.80 \times 10^{- 3}$	$1.69 \times 10^{- 3}$	$1.67 \times 10^{- 3}$
	$Q^{g}$	-	$1.79 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$1.22 \times 10^{- 4}$	$1.20 \times 10^{- 4}$	-	$1.39 \times 10^{- 3}$	$1.50 \times 10^{- 3}$	$1.05 \times 10^{- 3}$	$1.03 \times 10^{- 3}$	-	$1.40 \times 10^{- 3}$	$1.55 \times 10^{- 3}$	$1.30 \times 10^{- 3}$	$1.28 \times 10^{- 3}$
	$Q^{g}$	$1.82 \times 10^{- 1}$	$2.58 \times 10^{- 4}$	$2.80 \times 10^{- 4}$	$1.65 \times 10^{- 4}$	$1.62 \times 10^{- 4}$	$1.91 \times 10^{- 1}$	$1.91 \times 10^{- 3}$	$2.10 \times 10^{- 3}$	$1.39 \times 10^{- 3}$	$1.37 \times 10^{- 3}$	$1.11 \times 10^{- 1}$	$1.39 \times 10^{- 3}$	$1.50 \times 10^{- 3}$	$1.32 \times 10^{- 3}$	$1.30 \times 10^{- 3}$
Line	$P_{i j}$	$6.32 \times 10^{- 3}$	$2.02 \times 10^{- 3}$	$2.30 \times 10^{- 3}$	$1.11 \times 10^{- 3}$	$1.08 \times 10^{- 3}$	$1.69 \times 10^{- 2}$	$8.67 \times 10^{- 4}$	$9.50 \times 10^{- 4}$	$7.04 \times 10^{- 4}$	$6.95 \times 10^{- 4}$	$1.40 \times 10^{- 2}$	$1.38 \times 10^{- 3}$	$1.50 \times 10^{- 3}$	$1.49 \times 10^{- 3}$	$1.47 \times 10^{- 3}$
	$P_{i j}$	$4.30 \times 10^{- 2}$	$1.51 \times 10^{- 3}$	$1.70 \times 10^{- 3}$	$6.53 \times 10^{- 4}$	$6.50 \times 10^{- 4}$	$7.66 \times 10^{- 2}$	$2.04 \times 10^{- 3}$	$2.30 \times 10^{- 3}$	$1.81 \times 10^{- 3}$	$1.79 \times 10^{- 3}$	$6.87 \times 10^{- 2}$	$3.90 \times 10^{- 4}$	$4.20 \times 10^{- 4}$	$3.99 \times 10^{- 4}$	$3.95 \times 10^{- 4}$
	$P_{j i}$	$6.42 \times 10^{- 3}$	$2.01 \times 10^{- 3}$	$2.30 \times 10^{- 3}$	$1.11 \times 10^{- 3}$	$1.08 \times 10^{- 3}$	$1.71 \times 10^{- 2}$	$8.65 \times 10^{- 4}$	$9.50 \times 10^{- 4}$	$7.02 \times 10^{- 4}$	$6.93 \times 10^{- 4}$	$1.43 \times 10^{- 2}$	$1.38 \times 10^{- 3}$	$1.52 \times 10^{- 3}$	$1.49 \times 10^{- 3}$	$1.47 \times 10^{- 3}$
	$P_{j i}$	$3.17 \times 10^{- 2}$	$1.51 \times 10^{- 3}$	$1.72 \times 10^{- 3}$	$6.53 \times 10^{- 4}$	$6.50 \times 10^{- 4}$	$1.12 \times 10^{- 1}$	$2.02 \times 10^{- 3}$	$2.28 \times 10^{- 3}$	$1.80 \times 10^{- 3}$	$1.78 \times 10^{- 3}$	$5.51 \times 10^{- 2}$	$3.91 \times 10^{- 4}$	$4.25 \times 10^{- 4}$	$3.99 \times 10^{- 4}$	$3.95 \times 10^{- 4}$
	$Q_{i j}$	-	$4.43 \times 10^{- 3}$	$5.00 \times 10^{- 3}$	$2.42 \times 10^{- 4}$	$2.40 \times 10^{- 4}$	-	$6.17 \times 10^{- 4}$	$7.00 \times 10^{- 4}$	$3.88 \times 10^{- 4}$	$3.85 \times 10^{- 4}$	-	$1.30 \times 10^{- 3}$	$1.45 \times 10^{- 3}$	$1.19 \times 10^{- 3}$	$1.17 \times 10^{- 3}$
	$Q_{i j}$	$3.20 \times 10^{- 2}$	$1.29 \times 10^{- 4}$	$1.50 \times 10^{- 4}$	$9.70 \times 10^{- 5}$	$9.65 \times 10^{- 5}$	$1.14 \times 10^{- 1}$	$5.38 \times 10^{- 4}$	$6.00 \times 10^{- 4}$	$3.68 \times 10^{- 4}$	$3.65 \times 10^{- 4}$	$5.52 \times 10^{- 2}$	$1.09 \times 10^{- 3}$	$1.20 \times 10^{- 3}$	$9.81 \times 10^{- 4}$	$9.75 \times 10^{- 4}$
	$Q_{j i}$	-	$3.99 \times 10^{- 3}$	$4.50 \times 10^{- 3}$	$2.44 \times 10^{- 4}$	$2.42 \times 10^{- 4}$	-	$6.18 \times 10^{- 4}$	$7.00 \times 10^{- 4}$	$3.90 \times 10^{- 4}$	$3.87 \times 10^{- 4}$	-	$1.30 \times 10^{- 3}$	$1.46 \times 10^{- 3}$	$1.19 \times 10^{- 3}$	$1.17 \times 10^{- 3}$
	$Q_{j i}$	$3.86 \times 10^{- 2}$	$1.35 \times 10^{- 4}$	$1.55 \times 10^{- 4}$	$9.97 \times 10^{- 5}$	$9.92 \times 10^{- 5}$	$7.50 \times 10^{- 2}$	$5.52 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$3.81 \times 10^{- 4}$	$3.78 \times 10^{- 4}$	$6.85 \times 10^{- 2}$	$1.09 \times 10^{- 3}$	$1.21 \times 10^{- 3}$	$9.80 \times 10^{- 4}$	$9.74 \times 10^{- 4}$
Trans.	$P_{i j}$	$7.16 \times 10^{- 3}$	$1.22 \times 10^{- 3}$	$1.40 \times 10^{- 3}$	$6.17 \times 10^{- 4}$	$6.10 \times 10^{- 4}$	$4.28 \times 10^{- 3}$	$2.87 \times 10^{- 4}$	$3.20 \times 10^{- 4}$	$2.44 \times 10^{- 4}$	$2.40 \times 10^{- 4}$	$2.26 \times 10^{- 2}$	$9.57 \times 10^{- 4}$	$1.05 \times 10^{- 3}$	$9.21 \times 10^{- 4}$	$9.15 \times 10^{- 4}$
	$P_{i j}$	$1.72 \times 10^{- 1}$	$3.63 \times 10^{- 3}$	$4.00 \times 10^{- 3}$	$1.37 \times 10^{- 3}$	$1.35 \times 10^{- 3}$	$8.27 \times 10^{- 2}$	$2.76 \times 10^{- 3}$	$3.00 \times 10^{- 3}$	$2.49 \times 10^{- 3}$	$2.46 \times 10^{- 3}$	$1.05 \times 10^{- 1}$	$1.48 \times 10^{- 3}$	$1.60 \times 10^{- 3}$	$1.51 \times 10^{- 3}$	$1.49 \times 10^{- 3}$
	$P_{j i}$	$7.24 \times 10^{- 3}$	$1.22 \times 10^{- 3}$	$1.42 \times 10^{- 3}$	$6.19 \times 10^{- 4}$	$6.12 \times 10^{- 4}$	$4.31 \times 10^{- 3}$	$2.87 \times 10^{- 4}$	$3.20 \times 10^{- 4}$	$2.45 \times 10^{- 4}$	$2.41 \times 10^{- 4}$	$2.26 \times 10^{- 2}$	$9.60 \times 10^{- 4}$	$1.06 \times 10^{- 3}$	$9.24 \times 10^{- 4}$	$9.18 \times 10^{- 4}$
	$P_{j i}$	$9.36 \times 10^{- 2}$	$3.64 \times 10^{- 3}$	$4.05 \times 10^{- 3}$	$1.38 \times 10^{- 3}$	$1.36 \times 10^{- 3}$	$1.38 \times 10^{- 1}$	$2.78 \times 10^{- 3}$	$3.05 \times 10^{- 3}$	$2.51 \times 10^{- 3}$	$2.48 \times 10^{- 3}$	$2.14 \times 10^{- 1}$	$1.49 \times 10^{- 3}$	$1.62 \times 10^{- 3}$	$1.51 \times 10^{- 3}$	$1.49 \times 10^{- 3}$
	$Q_{i j}$	-	$2.21 \times 10^{- 4}$	$2.50 \times 10^{- 4}$	$1.46 \times 10^{- 4}$	$1.44 \times 10^{- 4}$	-	$3.92 \times 10^{- 4}$	$4.30 \times 10^{- 4}$	$2.84 \times 10^{- 4}$	$2.80 \times 10^{- 4}$	-	$1.46 \times 10^{- 3}$	$1.60 \times 10^{- 3}$	$1.39 \times 10^{- 3}$	$1.37 \times 10^{- 3}$
	$Q_{i j}$	$9.30 \times 10^{- 2}$	$1.78 \times 10^{- 4}$	$2.00 \times 10^{- 4}$	$1.24 \times 10^{- 4}$	$1.22 \times 10^{- 4}$	$1.37 \times 10^{- 1}$	$3.72 \times 10^{- 4}$	$4.10 \times 10^{- 4}$	$2.57 \times 10^{- 4}$	$2.54 \times 10^{- 4}$	$2.12 \times 10^{- 1}$	$1.31 \times 10^{- 3}$	$1.45 \times 10^{- 3}$	$1.23 \times 10^{- 3}$	$1.21 \times 10^{- 3}$
	$Q_{j i}$	-	$2.23 \times 10^{- 4}$	$2.50 \times 10^{- 4}$	$1.46 \times 10^{- 4}$	$1.44 \times 10^{- 4}$	-	$3.96 \times 10^{- 4}$	$4.40 \times 10^{- 4}$	$2.88 \times 10^{- 4}$	$2.84 \times 10^{- 4}$	-	$1.47 \times 10^{- 3}$	$1.62 \times 10^{- 3}$	$1.41 \times 10^{- 3}$	$1.39 \times 10^{- 3}$
	$Q_{j i}$	$1.52 \times 10^{- 1}$	$1.99 \times 10^{- 4}$	$2.20 \times 10^{- 4}$	$1.32 \times 10^{- 4}$	$1.30 \times 10^{- 4}$	$7.87 \times 10^{- 2}$	$4.37 \times 10^{- 4}$	$4.80 \times 10^{- 4}$	$3.16 \times 10^{- 4}$	$3.12 \times 10^{- 4}$	$9.32 \times 10^{- 2}$	$1.35 \times 10^{- 3}$	$1.48 \times 10^{- 3}$	$1.26 \times 10^{- 3}$	$1.24 \times 10^{- 3}$
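The rows above report per-quantity errors for bus, generator, line and transformer variables. The Our_F values for $θ$ and $\| V \|$ reappear as the N-1 column of Table 10, which labels them MSE. The following is a minimal sketch of such a per-quantity MSE. It assumes that predictions and reference AC-OPF solutions are held as NumPy arrays keyed by hypothetical names such as "theta" or "qg". The paper's exact normalization (per-unit base, averaging over samples or over elements) is not restated in this section and may differ.

```python
from typing import Dict, Mapping, Optional

import numpy as np


def per_quantity_mse(
    pred: Mapping[str, np.ndarray], ref: Mapping[str, np.ndarray]
) -> Dict[str, Optional[float]]:
    """Mean squared error for every reference quantity.

    Quantities missing from `pred` (e.g. reactive power for a DC model)
    are returned as None, mirroring the '-' entries in the table.
    """
    out: Dict[str, Optional[float]] = {}
    for name, target in ref.items():
        if name not in pred:
            out[name] = None
            continue
        diff = np.asarray(pred[name], dtype=float) - np.asarray(target, dtype=float)
        out[name] = float(np.mean(diff ** 2))
    return out
```

The function computes the same statistic for both pre-PF and post-PF rows. Only the `pred` arrays change, depending on whether they come straight from the model or from the power-flow-corrected state.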

Table 7. Feasibility of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. For each constraint, the first row reports pre-power-flow (pre-PF) results and the second row reports post-power-flow (post-PF) results.

	IEEE-14					IEEE-30					IEEE-118
	DC-IPOPT	GAT	GIN	Our_F	Our_N	DC-IPOPT	GAT	GIN	Our_F	Our_N	DC-IPOPT	GAT	GIN	Our_F	Our_N
$S_{i j}$	-	$9.98 \times 10^{- 6}$	$1.50 \times 10^{- 5}$	$1.20 \times 10^{- 5}$	$9.98 \times 10^{- 6}$	-	$5.97 \times 10^{- 7}$	$8.00 \times 10^{- 7}$	$4.50 \times 10^{- 7}$	$3.59 \times 10^{- 7}$	-	$2.32 \times 10^{- 6}$	$3.00 \times 10^{- 6}$	$2.50 \times 10^{- 6}$	$2.26 \times 10^{- 6}$
$S_{i j}$	$2.26 \times 10^{- 2}$	$1.14 \times 10^{- 4}$	$1.50 \times 10^{- 4}$	$7.50 \times 10^{- 5}$	$6.41 \times 10^{- 5}$	$1.09 \times 10^{- 1}$	$3.59 \times 10^{- 4}$	$4.50 \times 10^{- 4}$	$3.80 \times 10^{- 4}$	$3.29 \times 10^{- 4}$	$6.55 \times 10^{- 2}$	$2.69 \times 10^{- 5}$	$3.50 \times 10^{- 5}$	$2.90 \times 10^{- 5}$	$2.64 \times 10^{- 5}$
$S_{j i}$	-	$5.65 \times 10^{- 6}$	$8.00 \times 10^{- 6}$	$6.80 \times 10^{- 6}$	$5.65 \times 10^{- 6}$	-	$5.06 \times 10^{- 7}$	$7.00 \times 10^{- 7}$	$5.80 \times 10^{- 7}$	$5.01 \times 10^{- 7}$	-	$1.95 \times 10^{- 6}$	$2.50 \times 10^{- 6}$	$2.20 \times 10^{- 6}$	$1.90 \times 10^{- 6}$
$S_{j i}$	$3.60 \times 10^{- 2}$	$1.13 \times 10^{- 4}$	$1.45 \times 10^{- 4}$	$8.20 \times 10^{- 5}$	$6.92 \times 10^{- 5}$	$1.17 \times 10^{- 1}$	$4.16 \times 10^{- 4}$	$5.00 \times 10^{- 4}$	$4.30 \times 10^{- 4}$	$3.79 \times 10^{- 4}$	$6.65 \times 10^{- 2}$	$2.08 \times 10^{- 5}$	$2.70 \times 10^{- 5}$	$2.30 \times 10^{- 5}$	$2.08 \times 10^{- 5}$
$Δ θ_{i j}$	-	$1.20 \times 10^{- 6}$	$1.50 \times 10^{- 6}$	$5.00 \times 10^{- 10}$	$0.00 \times 10^{+ 0}$	-	$2.50 \times 10^{- 6}$	$3.00 \times 10^{- 6}$	$1.20 \times 10^{- 6}$	$0.00 \times 10^{+ 0}$	-	$4.67 \times 10^{- 6}$	$6.00 \times 10^{- 6}$	$2.50 \times 10^{- 6}$	$0.00 \times 10^{+ 0}$
$Δ θ_{i j}$	$0.00 \times 10^{+ 0}$	$1.10 \times 10^{- 6}$	$1.40 \times 10^{- 6}$	$4.00 \times 10^{- 10}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$2.30 \times 10^{- 6}$	$2.80 \times 10^{- 6}$	$1.00 \times 10^{- 6}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$4.50 \times 10^{- 6}$	$5.80 \times 10^{- 6}$	$2.20 \times 10^{- 6}$	$0.00 \times 10^{+ 0}$
$P_{b}$	-	$1.03 \times 10^{- 2}$	$1.30 \times 10^{- 2}$	$1.15 \times 10^{- 2}$	$1.03 \times 10^{- 2}$	-	$1.03 \times 10^{- 2}$	$1.25 \times 10^{- 2}$	$1.12 \times 10^{- 2}$	$9.81 \times 10^{- 3}$	-	$2.18 \times 10^{- 2}$	$2.60 \times 10^{- 2}$	$2.45 \times 10^{- 2}$	$2.48 \times 10^{- 2}$
$P_{b}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$
$Q_{b}$	-	$4.11 \times 10^{- 3}$	$5.50 \times 10^{- 3}$	$4.80 \times 10^{- 3}$	$4.11 \times 10^{- 3}$	-	$5.26 \times 10^{- 3}$	$6.50 \times 10^{- 3}$	$5.80 \times 10^{- 3}$	$4.76 \times 10^{- 3}$	-	$7.82 \times 10^{- 3}$	$9.50 \times 10^{- 3}$	$8.50 \times 10^{- 3}$	$8.18 \times 10^{- 3}$
$Q_{b}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$	$0.00 \times 10^{+ 0}$
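Table 7 scores each method on three groups of constraints: branch apparent-power limits ($S_{ij}$, $S_{ji}$), angle-difference limits ($Δθ_{ij}$) and nodal active/reactive balance ($P_b$, $Q_b$). The sketch below shows one plausible way to compute such violation measures with NumPy. The mean positive-part aggregation and the function names are assumptions for illustration; the metric definitions given earlier in the paper take precedence.

```python
import numpy as np


def branch_flow_violation(s_flow: np.ndarray, s_max: np.ndarray) -> float:
    """Mean excess of |S| over the thermal rating; 0 when every branch is within limits."""
    return float(np.mean(np.maximum(0.0, np.abs(s_flow) - s_max)))


def angle_diff_violation(theta, f_bus, t_bus, dmin, dmax) -> float:
    """Mean violation of dmin <= theta_f - theta_t <= dmax over all branches."""
    d = theta[f_bus] - theta[t_bus]
    return float(np.mean(np.maximum(0.0, d - dmax) + np.maximum(0.0, dmin - d)))


def power_balance_mismatch(vm, va, ybus, p_net, q_net):
    """Mean absolute active / reactive power mismatch over buses.

    Computed injections follow S = V * conj(Ybus @ V); p_net and q_net are the
    specified net injections (generation minus demand) in the same per-unit base.
    """
    v = vm * np.exp(1j * va)
    s_calc = v * np.conj(ybus @ v)
    dp = np.abs(s_calc.real - p_net)
    dq = np.abs(s_calc.imag - q_net)
    return float(np.mean(dp)), float(np.mean(dq))
```

A converged power-flow solve satisfies the balance equations up to its tolerance. This is consistent with the all-zero post-PF $P_b$ and $Q_b$ rows, while the branch-limit rows can remain nonzero after correction.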

Table 8. Optimality ratio (%) of baselines, Our_F and Our_N across different grid sizes on ‘N-1 topology’. For each system, the first row reports pre-power-flow (pre-PF) results and the second row reports post-power-flow (post-PF) results.

System	DC-IPOPT	GAT	GIN	Our_F	Our_N
IEEE-14 (pre-PF)	96.80	101.32	101.10	100.52	100.42
IEEE-14 (post-PF)	97.78	101.08	100.83	100.46	100.34
IEEE-30 (pre-PF)	94.57	102.02	101.63	100.92	100.84
IEEE-30 (post-PF)	96.18	101.57	101.32	100.74	100.66
IEEE-118 (pre-PF)	93.33	102.73	102.32	101.42	101.33
IEEE-118 (post-PF)	94.87	102.22	101.94	101.23	101.13
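Table 8 reports an optimality ratio in percent. The sketch below assumes the common definition, ratio = 100 × (cost of predicted dispatch) / (cost of the AC-OPF optimal dispatch), with quadratic generator costs $c_2 P^2 + c_1 P + c_0$. Under that definition, a value above 100 means the predicted dispatch is more expensive than the optimum. Both the definition and the cost form are assumptions here, not taken from this section.

```python
import numpy as np


def generation_cost(pg, c2, c1, c0) -> float:
    """Total quadratic generation cost: sum over generators of c2*Pg^2 + c1*Pg + c0."""
    pg = np.asarray(pg, dtype=float)
    return float(np.sum(c2 * pg ** 2 + c1 * pg + c0))


def optimality_ratio(pg_pred, pg_opt, c2, c1, c0) -> float:
    """Cost of the predicted dispatch relative to the AC-OPF optimum, in percent."""
    return 100.0 * generation_cost(pg_pred, c2, c1, c0) / generation_cost(pg_opt, c2, c1, c0)
```

Read this way, Our_N's post-PF value of 101.13 on IEEE-118 corresponds to a dispatch about 1.13% costlier than the AC-OPF optimum.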

Table 9. Ablation study on IEEE-14, IEEE-30, and IEEE-118 systems (MSE before PF).

Method	IEEE-14		IEEE-30		IEEE-118
Method	$θ$	$\| V \|$	$θ$	$\| V \|$	$θ$	$\| V \|$
LG-HGNN (Full)	$1.50 \times 10^{- 6}$	$3.12 \times 10^{- 6}$	$3.22 \times 10^{- 5}$	$1.90 \times 10^{- 6}$	$1.48 \times 10^{- 4}$	$1.50 \times 10^{- 5}$
w/o ER-PE	$1.95 \times 10^{- 6}$	$4.21 \times 10^{- 6}$	$4.28 \times 10^{- 5}$	$2.85 \times 10^{- 6}$	$2.31 \times 10^{- 4}$	$2.37 \times 10^{- 5}$
w/o Constrained Decoder	$2.43 \times 10^{- 6}$	$5.92 \times 10^{- 6}$	$5.01 \times 10^{- 5}$	$4.22 \times 10^{- 6}$	$2.86 \times 10^{- 4}$	$3.95 \times 10^{- 5}$
w/o Global Attention	$3.82 \times 10^{- 6}$	$7.90 \times 10^{- 6}$	$6.44 \times 10^{- 5}$	$6.51 \times 10^{- 6}$	$4.51 \times 10^{- 4}$	$6.87 \times 10^{- 5}$
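The ablation is easier to compare in relative terms. The short script below computes the percentage increase in before-PF MSE for each removed component against the full model. The numbers are copied from Table 9, and the script adds no new results; the dictionary and function names are illustrative.

```python
# MSE before PF copied from Table 9, as (theta, |V|) per system.
FULL = {
    "IEEE-14": (1.50e-6, 3.12e-6),
    "IEEE-30": (3.22e-5, 1.90e-6),
    "IEEE-118": (1.48e-4, 1.50e-5),
}
ABLATIONS = {
    "w/o ER-PE": {
        "IEEE-14": (1.95e-6, 4.21e-6),
        "IEEE-30": (4.28e-5, 2.85e-6),
        "IEEE-118": (2.31e-4, 2.37e-5),
    },
    "w/o Constrained Decoder": {
        "IEEE-14": (2.43e-6, 5.92e-6),
        "IEEE-30": (5.01e-5, 4.22e-6),
        "IEEE-118": (2.86e-4, 3.95e-5),
    },
    "w/o Global Attention": {
        "IEEE-14": (3.82e-6, 7.90e-6),
        "IEEE-30": (6.44e-5, 6.51e-6),
        "IEEE-118": (4.51e-4, 6.87e-5),
    },
}


def relative_increase(full, ablations):
    """Percentage MSE increase of each ablated variant over the full model."""
    return {
        variant: {
            system: tuple(100.0 * (a / f - 1.0) for a, f in zip(vals, full[system]))
            for system, vals in per_system.items()
        }
        for variant, per_system in ablations.items()
    }


if __name__ == "__main__":
    for variant, per_system in relative_increase(FULL, ABLATIONS).items():
        for system, (d_theta, d_vm) in per_system.items():
            print(f"{variant:<25} {system:<9} theta +{d_theta:6.1f}%  |V| +{d_vm:6.1f}%")
```

For example, dropping global attention raises the IEEE-118 MSE by about 205% for $θ$ and about 358% for $\| V \|$. The ordering is the same in every column: removing ER-PE hurts least, removing the constrained decoder hurts more, and removing global attention hurts most.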

Table 10. Sensitivity of LG-HGNN (Our_F) to N-1 topology perturbations, reported as MSE on bus voltage angle $θ$ and magnitude $\| V \|$ before PF.

System	$θ$		$\| V \|$
System	Full	N-1	Full	N-1
IEEE-14	$1.50 \times 10^{- 6}$	$4.70 \times 10^{- 5}$	$3.12 \times 10^{- 6}$	$7.10 \times 10^{- 6}$
IEEE-30	$3.22 \times 10^{- 5}$	$1.06 \times 10^{- 4}$	$1.90 \times 10^{- 6}$	$2.90 \times 10^{- 6}$
IEEE-118	$1.48 \times 10^{- 4}$	$1.60 \times 10^{- 4}$	$1.50 \times 10^{- 5}$	$2.60 \times 10^{- 5}$
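Table 10 contrasts the intact ("Full") topology with N-1 contingency topologies. A common way to build an N-1 set is to remove one branch at a time and discard outages that island the network. The sketch below does exactly that, using a breadth-first connectivity check. This section does not state whether the paper also includes generator outages or filters contingencies further, so treat this as an illustrative assumption with hypothetical function names.

```python
from collections import deque
from typing import Iterator, List, Tuple

Branch = Tuple[int, int]  # (from_bus, to_bus), buses indexed 0..n_bus-1


def is_connected(n_bus: int, branches: List[Branch]) -> bool:
    """Breadth-first search from bus 0; True if every bus is reachable."""
    adj: List[List[int]] = [[] for _ in range(n_bus)]
    for f, t in branches:
        adj[f].append(t)
        adj[t].append(f)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n_bus


def n_minus_1_topologies(
    n_bus: int, branches: List[Branch]
) -> Iterator[Tuple[int, List[Branch]]]:
    """Yield (outaged_branch_index, remaining_branches) for each non-islanding single outage."""
    for k in range(len(branches)):
        remaining = branches[:k] + branches[k + 1:]
        if is_connected(n_bus, remaining):
            yield k, remaining
```

Parallel circuits are listed as separate entries, so outaging one of a pair leaves the network connected. For scale, the Table 10 values put the N-1/Full $θ$ MSE ratio at about 31× on IEEE-14, 3.3× on IEEE-30 and 1.08× on IEEE-118.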

Table 11. Sensitivity to model parameter noise (MSE before PF).

Noise Level	IEEE-14		IEEE-30		IEEE-118
Noise Level	$θ$	$\| V \|$	$θ$	$\| V \|$	$θ$	$\| V \|$
No noise (Full)	$1.50 \times 10^{- 6}$	$3.12 \times 10^{- 6}$	$3.22 \times 10^{- 5}$	$1.90 \times 10^{- 6}$	$1.48 \times 10^{- 4}$	$1.50 \times 10^{- 5}$
$σ = 0.01$	$1.63 \times 10^{- 6}$	$3.41 \times 10^{- 6}$	$3.54 \times 10^{- 5}$	$2.12 \times 10^{- 6}$	$1.67 \times 10^{- 4}$	$1.72 \times 10^{- 5}$
$σ = 0.02$	$1.82 \times 10^{- 6}$	$3.89 \times 10^{- 6}$	$3.98 \times 10^{- 5}$	$2.45 \times 10^{- 6}$	$1.92 \times 10^{- 4}$	$1.94 \times 10^{- 5}$
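Table 11 reports how the before-PF MSE grows with a noise level σ. The caption does not say how σ is applied or to which parameters. The sketch below assumes multiplicative zero-mean Gaussian noise, which works the same way whether the perturbed arrays are network parameters (e.g., branch impedances) or learned weights. Function names are hypothetical.

```python
from typing import Dict

import numpy as np


def perturb_parameters(
    params: Dict[str, np.ndarray], sigma: float, seed: int = 0
) -> Dict[str, np.ndarray]:
    """Multiplicative Gaussian noise: p' = p * (1 + sigma * eps), eps ~ N(0, 1).

    Assumed protocol for illustration; sigma = 0 returns unchanged copies.
    """
    rng = np.random.default_rng(seed)
    return {
        name: p * (1.0 + sigma * rng.standard_normal(p.shape))
        for name, p in params.items()
    }


def mse_increase_pct(noisy_mse: float, clean_mse: float) -> float:
    """Relative MSE increase caused by the perturbation, in percent."""
    return 100.0 * (noisy_mse / clean_mse - 1.0)
```

On the reported numbers, `mse_increase_pct(1.92e-4, 1.48e-4)` is about 29.7. The IEEE-118 $θ$ MSE therefore rises by roughly 30% at σ = 0.02, up from about 12.8% at σ = 0.01.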

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
