Review Reports - Uncertainty-Aware Multi-Branch Graph Attention Network for Transient Stability Assessment of Power Systems Under Disturbances

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study develops a robust and accurate transient stability assessment (TSA) framework for modern power systems with high renewable integration. By employing a multi-branch Graph Attention Network (GAT) with feature aggregation and uncertainty-aware fusion, the study seeks to enhance system robustness, improve prediction reliability under disturbances, and effectively identify potential instability risks in complex and large-scale power grids.

While the manuscript offers valuable insights, several concerns must be addressed before it is considered for publication. These comments are intended to enhance the clarity, rigor, practical relevance, and reproducibility of the research. The reviewer recommends the following revisions:

Provide a clearer link between the challenges of transient stability assessment (TSA) in renewable-integrated systems and the specific need for the proposed multi-branch GAT framework.
Clearly highlight how the proposed approach differs from existing GAT-based TSA models, emphasizing unique contributions such as the learnable mask mechanism and uncertainty-aware aggregation.
Discuss the computational complexity and scalability of the proposed model, especially for real-time TSA applications in large-scale systems.
Incorporate additional test cases with varying levels of noise and missing data to further demonstrate robustness under extreme or realistic grid conditions.
Compare the proposed framework with a wider range of baseline methods, including other deep learning architectures such as Graph Convolutional Networks (GCN) and Temporal GNNs.
Discuss how the model’s learned features or attention weights can be interpreted in terms of power system physical properties or operating conditions.
Outline how the proposed model could be integrated into existing TSA or energy management platforms for real-time grid monitoring.
Expand the literature review to include recent studies from the past three years (2023-2025). Consider adding the following papers to the literature review:

https://www.mdpi.com/1996-1073/17/17/4330

https://www.sciencedirect.com/science/article/pii/S0196890425003929

Validate the framework on unseen grid configurations or dynamic topology changes to demonstrate adaptability to evolving power systems.
Elaborate on how uncertainty is modeled, quantified, and used in the aggregation process, potentially including visualization of uncertainty levels.
Strengthen the discussion of future work by prioritizing practical validation steps, such as deployment in real-time dispatch systems and integration with physics-based stability indicators.

Comments on the Quality of English Language

N/A

Author Response

Comments 1: Provide a clearer link between the challenges of transient stability assessment (TSA) in renewable-integrated systems and the specific need for the proposed multi-branch GAT framework.

Response 1: Thank you for your valuable comments. We have revised the motivation for the multi-branch GAT framework. Specifically, high renewable energy source (RES) penetration makes TSA inputs non-stationary and noisy (sensor errors, dropouts, rapid dispatch changes). As a result, (i) models that rely on many features are easily destabilized by small, scattered perturbations, while (ii) models that rely on a few key features can fail catastrophically when those features are corrupted. Our multi-branch GAT targets both failure modes: each branch learns a sparse, distinct mask, so dependence is spread across limited, partly disjoint feature subsets (reducing the probability that a perturbation hits all branches simultaneously), and an uncertainty-aware aggregation down-weights branches that become unreliable under current operating conditions. The architecture is thus tailored to RES-driven TSA: diversify what matters, and trust only what is stable, supporting robust, real-time assessment.

Comments 2: Clearly highlight how the proposed approach differs from existing GAT-based TSA models, emphasizing unique contributions such as the learnable mask mechanism and uncertainty-aware aggregation.

Response 2: Compared with prior GAT-based TSA models, our approach introduces three distinctive components: (1) a learnable dual-mask mechanism (on node and edge features) within a multi-branch architecture, where each branch learns a sparse, distinct feature mask via sparsity and inter-branch discrepancy losses (no hard-coded feature groups); (2) a prediction-consistency constraint across branches to stabilize learning while preserving diversity; and (3) an uncertainty-aware aggregation that assigns branch weights using an entropy-based reliability score modulated by mask coverage (the mask-reliability factor), thereby down-weighting unreliable branches under noisy/missing inputs. Together, these designs specifically target robust TSA in disturbance-prone, renewable-integrated settings, which existing GAT-based methods do not explicitly address.
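The learnable dual-mask mechanism can be illustrated with a minimal NumPy sketch. It assumes (hypothetically) that each branch holds real-valued mask logits for node and edge features, squashed by a sigmoid into soft masks in (0, 1), and that a branch's coverage is its mean mask value. The array sizes are placeholders, and the GAT layers and mask losses are omitted; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def apply_branch_masks(node_x, edge_x, node_logits, edge_logits):
    """Apply per-branch soft masks to node and edge features.

    node_x: (N, F_n) node features; edge_x: (E, F_e) edge features.
    node_logits: (B, F_n), edge_logits: (B, F_e) learnable mask logits (hypothetical).
    Returns masked node features (B, N, F_n), masked edge features (B, E, F_e),
    and per-branch mask coverage (B,).
    """
    node_masks = sigmoid(np.asarray(node_logits, dtype=float))  # values in (0, 1)
    edge_masks = sigmoid(np.asarray(edge_logits, dtype=float))
    # Broadcast each branch's mask over all nodes / all edges
    masked_nodes = node_x[None, :, :] * node_masks[:, None, :]
    masked_edges = edge_x[None, :, :] * edge_masks[:, None, :]
    # Coverage: average fraction of features kept, pooled over node and edge masks
    coverage = 0.5 * (node_masks.mean(axis=1) + edge_masks.mean(axis=1))
    return masked_nodes, masked_edges, coverage

rng = np.random.default_rng(0)
B = 8                               # number of branches reported in the implementation details
node_x = rng.normal(size=(39, 6))   # placeholder: 39 buses, 6 node features
edge_x = rng.normal(size=(46, 4))   # placeholder: 46 edges, 4 edge features
nodes_b, edges_b, cov = apply_branch_masks(
    node_x, edge_x, rng.normal(size=(B, 6)), rng.normal(size=(B, 4)))
print(nodes_b.shape, edges_b.shape, cov.shape)  # (8, 39, 6) (8, 46, 4) (8,)
```

Because the masks are soft and differentiable, gradients from the training loss can reach the logits, which is what allows masks to be learned end-to-end rather than hard-coded.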

Comments 3: Discuss the computational complexity and scalability of the proposed model, especially for real-time TSA applications in large-scale systems.

Response 3: We have clarified in the Discussion that our work targets moderate-size benchmarks (IEEE 39- and 118-bus systems), on which the proposed multi-branch GAT with MC dropout trains and infers efficiently on commodity hardware. Even in the worst case of a fully connected graph topology, the overall computational cost scales as \(\mathcal{O}(B|V|^{2}d)\), where \(B\) is the number of GAT branches, \(|V|\) is the number of nodes (buses), and \(d\) is the node feature dimension. Given the modest node counts (39 or 118), this complexity remains manageable. We acknowledge that scaling to much larger grids (e.g., thousands of buses) can increase both latency and memory, and that obtaining high-fidelity large-grid data is non-trivial. Future work will explore zonal/regional partitioning, hierarchical or cluster-wise GNNs, sparse/mini-batch kernels, branch sharing or pruning, and distillation into compact single-branch models to maintain real-time performance at scale.
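To make the \(\mathcal{O}(B|V|^{2}d)\) bound concrete, the arithmetic can be worked through for the two benchmarks and a hypothetical 2000-bus grid. This sketch assumes \(B = 8\) and a feature dimension \(d = 64\) (an illustrative value, not taken from the paper), ignores constant factors, and treats MC dropout as repeating the forward pass `mc_samples` times.

```python
def gat_cost(branches, num_nodes, feat_dim, mc_samples=1):
    """Worst-case attention cost B * |V|^2 * d for a fully connected graph,
    times the number of stochastic MC-dropout passes (constant factors ignored)."""
    return mc_samples * branches * num_nodes ** 2 * feat_dim

# B = 8 branches; d = 64 is an assumed feature dimension
for n in (39, 118, 2000):
    print(n, gat_cost(branches=8, num_nodes=n, feat_dim=64))
# 39 778752
# 118 7129088
# 2000 2048000000
```

Moving from the 39-bus to the 118-bus system raises the worst-case count by roughly 9x, while a 2000-bus grid is roughly 2,600x the 39-bus cost. Each additional MC-dropout pass multiplies the inference cost linearly, which is why the quadratic term in \(|V|\) dominates the scalability concern.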

Comments 4: Incorporate additional test cases with varying levels of noise and missing data to further demonstrate robustness under extreme or realistic grid conditions.

Response 4: We have added a more extreme test setting with 50% noise/missing data on both datasets (in addition to the 0/5/10/20% settings). The results, reported in the revised table, show that our method maintains clear gains over the baselines under this high-stress condition, further supporting its robustness.
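The response does not specify how the noise and missing-data levels are generated. A minimal NumPy sketch of one plausible protocol, under two stated assumptions: "x% noise" adds Gaussian noise whose standard deviation is x% of each feature's standard deviation, and "x% missing" sets a random x% of entries to zero. Both definitions and the function name are hypothetical.

```python
import numpy as np

def perturb_features(x, level, rng, mode="noise"):
    """Return a corrupted copy of feature matrix x (rows = samples or nodes).

    level: corruption ratio in [0, 1], e.g. 0.05, 0.1, 0.2, 0.5.
    mode="noise":   add Gaussian noise with std = level * per-feature std (assumed definition).
    mode="missing": set a random fraction `level` of entries to 0 (assumed zero imputation).
    """
    x = np.asarray(x, dtype=float)
    if mode == "noise":
        scale = level * x.std(axis=0, keepdims=True)   # (1, F), broadcast over rows
        return x + rng.normal(size=x.shape) * scale
    if mode == "missing":
        drop = rng.random(x.shape) < level             # Bernoulli(level) drop mask
        return np.where(drop, 0.0, x)
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(42)
x = rng.normal(size=(118, 6))   # placeholder features for a 118-bus system
for level in (0.0, 0.05, 0.1, 0.2, 0.5):
    x_missing = perturb_features(x, level, rng, mode="missing")
    print(level, round(float(np.mean(x_missing == 0.0)), 3))
```

Reporting the exact corruption protocol (noise model, imputation rule, random seeds) alongside the table would make the robustness comparison reproducible.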

Comments 5: Compare the proposed framework with a wider range of baseline methods, including other deep learning architectures such as Graph Convolutional Networks (GCN) and Temporal GNNs.

Response 5: Our paper already reports GCN results as a non-attentional GNN baseline. Temporal GNNs are out of scope here because our setting uses transient inputs without explicit time-series features. We note this limitation and will evaluate temporal baselines in future work when sequence data (e.g., PMU time series) are incorporated.

Comments 6: Discuss how the model’s learned features or attention weights can be interpreted in terms of power system physical properties or operating conditions.

Response 6: We have added a short interpretability analysis in the results section. On the IEEE 39-bus system, a representative branch’s learned mask prioritizes generator-centric cues—bus voltage and nearby generators’ active/reactive power—while largely suppressing load variables. These signals are directly tied to TSA (e.g., voltage depression and abrupt P/Q changes preceding angle separation and loss of synchronism), so the selection is physically plausible and aligns with power-system engineering intuition.
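An interpretability analysis of this kind amounts to ranking input features by their learned mask values for a given branch. A minimal sketch follows; the feature names and mask values are toy placeholders chosen for illustration, not the learned masks reported in the paper.

```python
import numpy as np

def top_features(mask_values, feature_names, k=3):
    """Return the k (name, value) pairs with the largest mask values, in descending order."""
    mask_values = np.asarray(mask_values, dtype=float)
    order = np.argsort(mask_values)[::-1][:k]
    return [(feature_names[i], float(mask_values[i])) for i in order]

# Hypothetical feature names and toy mask values, for illustration only
names = ["V_mag", "V_angle", "P_gen", "Q_gen", "P_load", "Q_load"]
toy_mask = [0.9, 0.4, 0.8, 0.7, 0.1, 0.05]
print(top_features(toy_mask, names, k=3))
# [('V_mag', 0.9), ('P_gen', 0.8), ('Q_gen', 0.7)]
```

Applying the same ranking to every branch would also show whether different branches consistently specialize on different feature types.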

Comments 7&11: Outline how the proposed model could be integrated into existing TSA or energy management platforms for real-time grid monitoring. Strengthen the discussion of future work by prioritizing practical validation steps, such as deployment in real-time dispatch systems and integration with physics-based stability indicators.

Response 7&11: We expanded Future Work to prioritize practical validation and integration: (i) a lightweight deployment path for TSA/EMS (streaming PMU/SCADA adapters, single-pass online inference under a latency budget, zonal partitioning, and a gRPC/REST scoring service with confidence outputs), and (ii) real-time dispatch pilots that combine our predictions with physics-based stability indicators (e.g., energy margin/ROCOF/angle-separation proxies) and report both accuracy and UQ calibration in operator-in-the-loop evaluations. This clarifies how our model will be embedded and validated in practice.

Comments 8: Expand the literature review to include recent studies from the past three years (2023-2025). Consider adding the following papers to the literature review: https://www.mdpi.com/1996-1073/17/17/4330; https://www.sciencedirect.com/science/article/pii/S0196890425003929

Response 8: We expanded the literature review (2023–2025) by (i) citing the recent TSA survey that systematizes simulation, direct, data-driven, and analytical methods in our “Traditional Methods” section, and (ii) adding the EMS study for RES-rich microgrids, which highlights real-time, non-stationary operation, in the deep-learning context.

Comments 9: Validate the framework on unseen grid configurations or dynamic topology changes to demonstrate adaptability to evolving power systems.

Response 9: We added to Future Work an explicit plan to validate adaptability on unseen grid configurations and dynamic topology changes (inductive cross-system tests, topology perturbations, and scenario generators), and to study topology-aware augmentation/domain adaptation to sustain performance in evolving power systems.

Comments 10: Elaborate on how uncertainty is modeled, quantified, and used in the aggregation process, potentially including visualization of uncertainty levels.

Response 10: We model branch-level uncertainty with two signals: (i) predictive entropy from MC-dropout to capture confidence, and (ii) a mask-coverage–based reliability factor to penalize branches that depend on too many inputs (more noise exposure). These are combined into a single weight per branch so that uncertain and over-wide branches are down-weighted during aggregation; weights are then normalized across branches. We do not provide uncertainty visualizations because the quantity is a scalar per branch (not a spatial map).
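A minimal NumPy sketch of this aggregation, assuming binary stable/unstable outputs and a multiplicative combination \(w_b \propto \exp(-\alpha H_b)\, r(c_b)\) with a hypothetical reliability factor \(r(c) = \exp(-\beta c)\). The exact formula used in the paper is not stated here; \(\alpha = 1.2\) and \(\beta = 0.8\) are the aggregation values reported in the implementation details, and the toy probabilities are illustrative.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def aggregate_branches(mc_probs, coverage, alpha=1.2, beta=0.8):
    """Uncertainty-aware fusion of B branch predictions.

    mc_probs: (B, T) instability probabilities from T MC-dropout passes per branch.
    coverage: (B,) fraction of input features each branch's mask keeps, in [0, 1].
    Assumed weight form (illustrative): w_b ∝ exp(-alpha * H_b) * exp(-beta * c_b).
    Returns normalized branch weights (B,) and the fused probability.
    """
    mc_probs = np.asarray(mc_probs, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    p_mean = mc_probs.mean(axis=1)            # MC-averaged probability per branch
    entropy = binary_entropy(p_mean)          # predictive entropy, in [0, ln 2]
    reliability = np.exp(-beta * coverage)    # hypothetical mask-reliability factor
    raw = np.exp(-alpha * entropy) * reliability
    weights = raw / raw.sum()                 # normalize across branches
    return weights, float(weights @ p_mean)

# Branch 0: confident and sparse; branch 1: uncertain and wide (toy numbers)
mc = [[0.94, 0.95, 0.96], [0.30, 0.50, 0.70]]
w, p = aggregate_branches(mc, coverage=[0.2, 0.8])
print(w, p)
```

The confident, sparse branch receives most of the weight, so the fused probability stays close to its prediction; two branches with equal entropy and coverage receive equal weights.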

Reviewer 2 Report

Comments and Suggestions for Authors

The proposed architecture involves running multiple GATs in parallel and performing Monte Carlo dropout for uncertainty estimation. This is inherently more computationally expensive than a single GAT model, yet the paper does not mention how much longer the multi-branch model takes to train compared with the baselines. What is the latency for a single prediction? This is critical for any method intended for "real-time" or "online" assessment.
A discussion on the computational overhead and the model's scalability to much larger power grids (e.g., thousands of buses) is a significant omission.

The model introduces several new hyperparameters. The paper does not discuss how these parameters were selected or how sensitive the model's performance is to their values. A sensitivity analysis would strengthen the paper's claims and provide better guidance for practical implementation.

Do different branches consistently learn to focus on specific types of features (e.g., one branch on generator outputs, another on voltage phase angles)?

Please note that the dropout approximation is a convenient yet low-fidelity UQ approach, as thoroughly discussed in uncertainty-aware papers (e.g., https://pubs.acs.org/doi/10.1021/acsomega.1c00975). A dedicated discussion of future work or extensions to advanced UQ techniques should be included.

Do the features selected by the model align with the intuition of power systems engineers?
Visualizing or analyzing the learned masks would provide deeper insight into the model's decision-making process and significantly enhance its interpretability.

Some design choices could benefit from further justification. For example, the formula for the mask reliability factor is presented without explanation. While it intuitively penalizes high coverage, the rationale for this specific mathematical form is not discussed. A brief justification would improve the paper's rigor.

Author Response

Comments 1: The proposed architecture involves running multiple GATs in parallel and performing Monte Carlo dropout for uncertainty estimation. This is inherently more computationally expensive than a single GAT model, yet the paper does not mention how much longer the multi-branch model takes to train compared with the baselines. What is the latency for a single prediction? This is critical for any method intended for "real-time" or "online" assessment. A discussion on the computational overhead and the model's scalability to much larger power grids (e.g., thousands of buses) is a significant omission.

Response 1: Thank you for your valuable comments. We have clarified in the Discussion that our work targets moderate-size benchmarks (IEEE 39- and 118-bus systems), on which the proposed multi-branch GAT with MC dropout trains and infers efficiently on commodity hardware. Even in the worst case of a fully connected graph topology, the overall computational cost scales as \(\mathcal{O}(B|V|^{2}d)\), where \(B\) is the number of GAT branches, \(|V|\) is the number of nodes (buses), and \(d\) is the node feature dimension. Given the modest node counts (39 or 118), this complexity remains manageable. We acknowledge that scaling to much larger grids (e.g., thousands of buses) can increase both latency and memory, and that obtaining high-fidelity large-grid data is non-trivial. Future work will explore zonal/regional partitioning, hierarchical or cluster-wise GNNs, sparse/mini-batch kernels, branch sharing or pruning, and distillation into compact single-branch models to maintain real-time performance at scale.
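Because the reviewer asks specifically for single-prediction latency, a minimal sketch of how it could be measured may help: time one prediction consisting of `mc_samples` stochastic forward passes, repeat, and report the median. `dummy_predict` is a hypothetical stand-in (random weights, inverted dropout), not the authors' model, and any timings it produces depend on hardware and are not results from the paper.

```python
import time
import numpy as np

def measure_latency(predict_fn, x, mc_samples=20, repeats=20):
    """Median wall-clock latency (ms) of one prediction made of `mc_samples`
    stochastic forward passes (MC dropout). predict_fn is any callable x -> probability."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(mc_samples):
            predict_fn(x)
        timings.append((time.perf_counter() - start) * 1e3)
    return float(np.median(timings))

# Hypothetical stand-in for an 8-branch model: random weights, new dropout mask per pass
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64, 64)) / 8.0

def dummy_predict(x, p_drop=0.1):
    keep = rng.random(x.shape) > p_drop
    h = np.tanh(np.einsum("nd,bde->bne", x * keep / (1.0 - p_drop), W))  # (8, N, 64)
    return 1.0 / (1.0 + np.exp(-h.mean()))                              # scalar probability

x = rng.normal(size=(118, 64))   # placeholder: 118 buses, 64 features
print(f"median latency: {measure_latency(dummy_predict, x):.2f} ms")
```

Reporting the median over repeats (rather than a single run) reduces the influence of warm-up and scheduling noise on the reported latency.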

Comments 2: The model introduces several new hyperparameters. The paper does not discuss how these parameters were selected or how sensitive the model's performance is to their values. A sensitivity analysis would strengthen the paper's claims and provide better guidance for practical implementation.

Response 2: We updated the Implementation Details to explicitly state the chosen hyperparameters. We use \(B = 8\) GAT branches to extract features from the power grid topology. For the losses, the mask-loss weights are set to \(\alpha = 0.3\) and \(\beta = 0.2\), and the alignment-loss weight is set to 0.2. For the uncertainty-aware aggregation, we use \(\alpha = 1.2\) as the entropy weight and \(\beta = 0.8\) as the coverage penalty.
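The reported values can be collected into a single configuration and combined into a training objective. A minimal NumPy sketch, assuming (hypothetically) that the mask-loss weight 0.3 applies to a sparsity term, 0.2 to an inter-branch discrepancy term, and the alignment weight 0.2 to a prediction-consistency term; the individual loss definitions below (L1 sparsity, pairwise mask overlap, across-branch variance) are illustrative choices, not the paper's equations.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HParams:
    num_branches: int = 8       # B
    w_sparsity: float = 0.3     # mask-loss weight alpha (assumed: sparsity term)
    w_discrepancy: float = 0.2  # mask-loss weight beta (assumed: inter-branch discrepancy term)
    w_align: float = 0.2        # alignment (prediction-consistency) loss weight
    agg_entropy: float = 1.2    # entropy weight used in aggregation
    agg_coverage: float = 0.8   # coverage penalty used in aggregation

def bce(p, y, eps=1e-12):
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def total_loss(branch_probs, fused_probs, labels, masks, hp):
    """branch_probs: (B, N) per-branch instability probabilities; fused_probs: (N,);
    labels: (N,) in {0, 1}; masks: (B, F) soft mask values in [0, 1]."""
    masks = np.asarray(masks, dtype=float)
    cls = bce(fused_probs, labels)                        # classification loss
    sparsity = float(np.mean(masks))                      # L1 on non-negative masks
    B = masks.shape[0]
    pairs = [np.mean(masks[i] * masks[j]) for i in range(B) for j in range(i + 1, B)]
    overlap = float(np.mean(pairs)) if pairs else 0.0     # penalize shared features
    align = float(np.mean(np.var(branch_probs, axis=0)))  # branch disagreement
    return cls + hp.w_sparsity * sparsity + hp.w_discrepancy * overlap + hp.w_align * align

hp = HParams()
rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=(hp.num_branches, 16))
loss = total_loss(probs, probs.mean(axis=0), rng.integers(0, 2, size=16),
                  rng.uniform(size=(hp.num_branches, 10)), hp)
print(round(loss, 4))
```

Keeping the hyperparameters in one dataclass also makes a sensitivity sweep straightforward: vary one field at a time and retrain.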

Comments 3: Do different branches consistently learn to focus on specific types of features (e.g., one branch on generator outputs, another on voltage phase angles)?

Response 3: In our design, each branch receives inputs filtered by a learnable feature mask that is updated end-to-end by the training loss. We do not predefine or hard-assign a branch to a specific feature type (e.g., generators vs. phase angles); instead, specialization emerges adaptively from the data and objective. In practice, branches often settle on distinct feature subsets, but the exact partition is data-dependent rather than fixed by design.

Comments 4: Please note that the dropout approximation is a convenient yet low-fidelity UQ approach, as thoroughly discussed in uncertainty-aware papers (e.g., https://pubs.acs.org/doi/10.1021/acsomega.1c00975, IF: 4.3, Q2). A dedicated discussion of future work or extensions to advanced UQ techniques should be included.

Response 4: We agree that MC dropout is a convenient but limited UQ proxy. In the revised manuscript, we added a Future Work note explicitly committing to higher-fidelity UQ (e.g., deep ensembles, Bayesian/variational GNN layers), along with calibration assessments (e.g., ECE/NLL and coverage).
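The calibration metrics mentioned here can be computed directly from predicted instability probabilities. A minimal NumPy sketch for the binary stable/unstable setting, assuming confidence is \(\max(p, 1-p)\) and equal-width confidence bins on [0.5, 1] (one common ECE variant among several); the function names and toy inputs are illustrative.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: confidence = max(p, 1 - p), equal-width bins on [0.5, 1]."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)
    # Bin index 0..n_bins-1; conf == 1.0 is placed in the last bin
    idx = np.minimum(((conf - 0.5) / 0.5 * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            # |accuracy - mean confidence| in the bin, weighted by the bin's share of samples
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

def negative_log_likelihood(probs, labels, eps=1e-12):
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

probs = [0.95, 0.80, 0.30, 0.60, 0.10]   # toy predicted instability probabilities
labels = [1, 1, 0, 0, 0]
print(expected_calibration_error(probs, labels), negative_log_likelihood(probs, labels))
```

Lower values are better for both metrics; comparing them between MC dropout and, for example, a deep ensemble would quantify the fidelity gap the reviewer raises.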

Comments 5: Do the features selected by the model align with the intuition of power systems engineers? Visualizing or analyzing the learned masks would provide deeper insight into the model's decision-making process and significantly enhance its interpretability.

Response 5: We have added a short interpretability analysis in the results section. On the IEEE 39-bus system, a representative branch’s learned mask prioritizes generator-centric cues—bus voltage and nearby generators’ active/reactive power—while largely suppressing load variables. These signals are directly tied to TSA (e.g., voltage depression and abrupt P/Q changes preceding angle separation and loss of synchronism), so the selection is physically plausible and aligns with power-system engineering intuition.

Comments 6: Some design choices could benefit from further justification. For example, the formula for the mask reliability factor is presented without explanation. While it intuitively penalizes high coverage, the rationale for this specific mathematical form is not discussed. A brief justification would improve the paper's rigor.

Response 6: We revised the Motivation to clarify that our design seeks sparse, distinct masks so that each branch relies on a small, partly disjoint feature subset, reducing the chance that minor, scattered perturbations simultaneously degrade all branches. Accordingly, the mask reliability factor penalizes high coverage because wide masks are more exposed to noise and missing data, which empirically raises branch uncertainty and destabilizes aggregation. We now briefly justify the chosen form as bounded, monotone, near-linear for small coverages, and numerically stable, providing a simple, interpretable way to down-weight over-wide branches.
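The four stated properties can be checked on a concrete candidate. One form that satisfies all of them, shown purely as an illustration (the paper's actual formula may differ), is an exponential penalty in the mask coverage \(c\) with penalty weight \(\beta\):

```latex
\begin{aligned}
r(c) &= e^{-\beta c}, && c \in [0,1],\ \beta > 0,\\
e^{-\beta} \le r(c) &\le 1 && \text{(bounded)},\\
r'(c) &= -\beta e^{-\beta c} < 0 && \text{(monotone decreasing)},\\
r(c) &= 1 - \beta c + \tfrac{1}{2}\beta^{2}c^{2} - \cdots \approx 1 - \beta c && \text{(near-linear for small } c\text{)}.
\end{aligned}
```

The exponent \(-\beta c\) lies in \([-\beta, 0]\), so the factor cannot overflow or underflow. With the reported coverage penalty \(\beta = 0.8\), \(r\) ranges over roughly \([0.449, 1]\), so even a branch with full coverage keeps a non-zero weight.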

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have effectively addressed my concerns. This revised version of the manuscript shows sufficient improvement and addresses all the reviewers' comments. Therefore, I recommend that it be accepted for publication.