Alarm Event Prediction Based on Structural Causal Model in Smart Substation

Lu, Xiang; Chen, Youwei; Fu, Yijia; Ren, Fang; Ma, Zhonggui

doi:10.3390/en19102296


by Xiang Lu ¹, Youwei Chen ², Yijia Fu ², Fang Ren ² and Zhonggui Ma ²,*

¹ Information Center, Guizhou Power Grid Co., Ltd., Guiyang 563000, China

² School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(10), 2296; https://doi.org/10.3390/en19102296

Submission received: 27 March 2026 / Revised: 29 April 2026 / Accepted: 30 April 2026 / Published: 10 May 2026

(This article belongs to the Section A1: Smart Grids and Microgrids)


Abstract

In smart substations, long-term operation and environmental disturbances accelerate equipment aging, often leading to abnormal operating states and frequent alarm events. These alarms provide important early indications of potential equipment faults. Situational awareness technologies offer effective means for real-time monitoring and early warning in substations. Meanwhile, Structural Causal Models (SCMs) can uncover underlying causal relationships in operational data, improving prediction stability and interpretability compared with conventional correlation-based methods. This study proposes a novel situational awareness framework for smart substations that integrates deep learning-based causal inference with expert domain knowledge. By guiding the model with the causal diagram derived from substation alarm data as a strong prior, our method learns causal relationships that are statistically significant. Compared with traditional correlation-based statistical approaches, causal inference enables the explicit modeling and adjustment of potential confounding effects under given assumptions, leading to more reliable relationship estimation and a more interpretable model structure. Finally, a case study using real substation data shows improved predictive performance of the proposed method relative to conventional correlation analysis.

Keywords:

smart substation; situational awareness; causal inference; structural causal model; density clustering

1. Introduction

Substations are essential to the reliable operation of power systems, and their increasing scale and complexity have led to frequent alarms caused by equipment aging, environmental disturbances, and operational uncertainties. These alarms provide critical information for situational awareness (SA), which aims to perceive current operating states, interpret their significance, and predict future trends. Effective substation situational awareness relies on the analysis of historical alarm data and real-time operational data to capture complex dependencies among multiple variables.

With the rapid growth of data acquisition technologies, substation operational data have become increasingly high-dimensional and heterogeneous. Redundant or irrelevant variables may obscure critical information, while strong temporal dependencies pose challenges to model interpretability and computational efficiency. Consequently, extracting relevant information from multivariate time series and constructing accurate situational awareness models has become a major research focus. Traditional correlation and statistical methods, such as Pearson correlation, mutual information, regression, and time-series analysis, are widely used but often struggle to characterize complex relationships in high-dimensional substation data.

Reference [1] employs Pearson correlation and other statistical methods to analyze substation equipment inspection data and establishes a Logistic Regression model for advanced equipment situational awareness. Reference [2] presents a big-data-driven approach that constructs a multi-source, multi-dimensional diagnostic model for the power grid, enabling real-time situational awareness through correlation analysis among data nodes. Reference [3] proposes a spatiotemporal attention-based dynamic graph neural network, in which a time-varying correlation matrix is used to model and update relationships among network nodes, thereby improving system state perception and prediction. With the rapid development of artificial intelligence, machine learning methods such as Support Vector Machines [4] and Random Forests [5] have been increasingly applied to situational awareness by learning patterns from large-scale historical data. However, these models still rely heavily on traditional correlation analysis; for example, Reference [6] uses the Pearson correlation coefficient to analyze the relationships between power loss and its influencing factors before applying a machine learning model for prediction.

However, correlation-based analytical methods, even when combined with advanced machine learning models, remain inadequate for revealing the underlying evolution mechanisms of substation alarms. Purely correlation-driven analysis cannot distinguish direct causal relationships from indirect effects or effectively address confounding factors caused by environmental and operational variations, leading to unstable predictions and limited interpretability. Therefore, constructing causal graph models that explicitly describe causal relationships between alarms and equipment states is essential. By modeling the propagation of abnormal states using directed graphs, causal graphs enable the identification of alarm root causes rather than mere statistical associations.

Causal inference [7] is a fundamental research topic across multiple disciplines, providing not only explanatory insights but also principled means for decision making through causal effects. In recent years, causal inference has been increasingly applied in computer science, with Judea Pearl introducing key concepts such as Structural Causal Models (SCMs), backdoor adjustment, and the do-operator. Reference [8] develops a causal inference framework to identify direct causal effects of input samples, reducing bias in deep learning image classification caused by long-tail distributions. Reference [9] proposes a decision-level fusion method based on SCMs, where a causal graph models diagnostic variables and their relationships, significantly improving fault detection performance compared with correlation-based fusion. Reference [10] constructs causal graphs in weakly supervised semantic segmentation to remove confounding effects, achieving notable performance gains. Similarly, Reference [11] integrates causal inference into the Faster R-CNN framework by introducing a confounder dictionary and performing backdoor adjustment, effectively mitigating confounding interference in object detection.

This study proposes a substation situational awareness method based on causal inference. First, a density clustering algorithm is used to cluster the alarm data based on the time difference between alarms. Then, correlation analysis is performed on the alarms within each cluster, and a causal graph is initialized based on the results of the correlation analysis. Next, a Structural Causal Model is employed to identify the causal relationships between alarms. Finally, the initially constructed causal graph is refined by removing weaker causal relationships and retaining only alarms and paths with strong causal connections, generating the final causal relationship graph. Experimental results demonstrate that the simplified model achieves higher prediction accuracy.

Furthermore, this paper introduces a hybrid approach that injects expert domain knowledge directly into the causal inference process as a strong prior. This novel methodology aims to produce a more robust, interpretable, and accurate causal model for enhancing substation situational awareness.

The main contributions of this work are threefold:

(1) We propose a novel hybrid framework for substation situational awareness that integrates data-driven causal inference with expert knowledge, enabling the identification of reliable causal structures beyond spurious correlations.

(2) Within this framework, we explicitly leverage causal analysis for informative variable selection and graph pruning, by identifying and removing redundant or weakly relevant variables. This mechanism not only improves the interpretability of the learned causal graph but also reduces model complexity while preserving essential system dynamics. The causal graph we construct may have captured stable dependency patterns among variables that are informative for fault prediction.

(3) Through a case study on real-world alarm data, we demonstrate that the proposed approach significantly improves downstream fault prediction performance. In particular, the causality-guided variable selection leads to a more compact model with enhanced predictive accuracy, validating its effectiveness for real-time substation deployment.

2. Methodology

2.1. Causal Model

Correlation alone does not establish causality, and drawing causal inferences purely from observed correlations can be misleading. In the context of machine learning, treating correlations as causal relationships may result in models with reduced interpretability and questionable reliability of the conclusions derived. The definition of a causal effect [12] is as follows: If there exist two different values $x$ and $x'$ such that the distribution of the random variable $Y$ under the intervention $do(X = x)$ differs from that under the intervention $do(X = x')$, then the random variable $X$ is said to have a causal effect on the random variable $Y$.

Three main frameworks have been developed for causal modeling: Causal Graphical Models (CGMs), potential outcomes (POs), and Structural Causal Models (SCMs).

Causal Graphical Models are used to describe structures and causal relationships. They use directed acyclic graphs (DAGs) to represent these relationships, where $G = (v, \varepsilon)$, with $v$ representing the set of nodes and $\varepsilon$ representing the set of edges. The nodes represent random variables in the causal graph, and a directed edge $X \to Y$ indicates that $X$ is a cause of $Y$, meaning that $X$ has a causal effect on $Y$.

The Potential Outcome Model typically connects causality to interventions and study subjects, where the intervention represents the cause, and the result caused by the intervention represents the effect. Imbens and Rubin [13] define potential outcomes as follows: given a study subject and a series of interventions, each “intervention–outcome” pair is defined as a potential outcome. The potential outcome model defines causal effects as the difference between the potential outcomes of the same study subject.

The Structural Causal Model is a framework used to describe causal relationships between variables. It is based on directed acyclic graphs, where variables are represented as nodes in the graph, and are related to their parent nodes and unknown random variables through deterministic functions. The SCM allows algorithms to understand and predict system behaviors while supporting causal inference and intervention. In the theoretical framework of the SCM, causal inference relies on three basic path structures in a DAG: the chain structure, fork structure, and collider structure. These structures facilitate different ways of information flow, and all causal graphs can be decomposed into combinations of these three structures. In the analysis of complex causal models, all causal paths must be considered to accurately infer causal relationships.

In this study, we adopt a Structural Causal Model (SCM) to identify potential confounders and support intervention analysis. After identifying these confounders, targeted interventions are applied to estimate the corresponding causal effects.

2.2. Pearl’s Do-Calculus

Pearl introduced do-calculus to formalize the notion of external intervention in causal systems. By intentionally manipulating a variable and observing the resultant changes in other variables, one can infer causal directions and relationships. Such interventions help disentangle causation from mere association, since only true causal links yield observable changes in response to interventions.

Within the framework of SCMs, potential confounders can be identified using causal graphs. A variable $Z$ is considered a confounder if it affects both the treatment $X$ and the outcome $Y$ (e.g., via paths $Z \to X$ and $Z \to Y$).

Intervention, denoted as $do(X = x)$, represents an external operation that forcibly sets a variable to a specific value, removing the influence of its parent variables and enabling the estimation of true causal effects.

In practical applications, the causal effect of a variable $X$ on an outcome $Y$ is defined as the change in $Y$ induced by an intervention on $X$, which is often measured using the average causal effect (ACE), defined as

$$\mathrm{ACE} = P(Y = y \mid do(X = 1)) - P(Y = y \mid do(X = 0)) \tag{1}$$

For a binary outcome with $y = 1$, this represents the expected change in $Y$ when $X$ shifts from 0 to 1 at the population level, serving as a core quantitative metric in causal inference.

In real-world substation systems, however, direct intervention is typically infeasible due to operational complexity and practical constraints. Therefore, it is necessary to approximate the effects of intervention using observational data.

To isolate the true causal effect of $X$ on $Y$, Pearl proposed the technique of backdoor adjustment, which aims to remove the influence of such confounders.

Consider the causal structure depicted in Figure 1a. If one directly estimates the causal effect using $P(Y \mid X)$, the resulting value will be confounded by $Z$. The essence of backdoor adjustment lies in conditioning on $Z$, thereby simulating the pure causal effect of an intervention $do(X = x)$.

If a set of variables $Z$ satisfies the backdoor criterion, i.e., it contains no descendant of $X$ and blocks all backdoor (non-causal) paths from $X$ to $Y$, then the causal effect can be identified via the following backdoor adjustment formula (Formula (2)):

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \cdot P(Z = z) \tag{2}$$

where $P(Z)$ denotes the marginal distribution of the confounder in the natural state; $P(Y \mid X, Z)$ represents the conditional probability of $Y$ given $X$ and $Z$; and $P(Y \mid do(X))$ is the post-intervention distribution of $Y$, which we seek to estimate.
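As a minimal sketch of how Formula (2) can be evaluated on discrete observational records, the following Python snippet stratifies on $Z$ and reweights by $P(Z = z)$. The function names (`make_records`, `naive`, `backdoor_adjust`) and the toy cell counts are illustrative assumptions and are not taken from the data used in this study.

```python
from collections import Counter


def make_records(cells):
    """Expand {(x, y, z): count} into a flat list of (x, y, z) tuples."""
    return [key for key, count in cells.items() for _ in range(count)]


def naive(records, x, y=1):
    """Observational estimate P(Y = y | X = x), ignoring the confounder."""
    rows = [r for r in records if r[0] == x]
    return sum(1 for r in rows if r[1] == y) / len(rows)


def backdoor_adjust(records, x, y=1):
    """Estimate P(Y = y | do(X = x)) with Formula (2) by stratifying on Z."""
    n = len(records)
    z_counts = Counter(z for _, _, z in records)
    total = 0.0
    for z, n_z in z_counts.items():
        stratum = [r for r in records if r[0] == x and r[2] == z]
        if not stratum:
            # Formula (2) needs P(Y | X = x, Z = z) for every observed z.
            raise ValueError(f"no observations with X={x}, Z={z}")
        p_y_given_xz = sum(1 for r in stratum if r[1] == y) / len(stratum)
        total += p_y_given_xz * (n_z / n)  # weight by P(Z = z)
    return total


# Hypothetical toy counts keyed by (x, y, z); Z raises both X and Y.
cells = {
    (0, 1, 0): 8, (0, 0, 0): 72,
    (1, 1, 0): 6, (1, 0, 0): 14,
    (0, 1, 1): 8, (0, 0, 1): 12,
    (1, 1, 1): 48, (1, 0, 1): 32,
}
records = make_records(cells)

naive_diff = naive(records, 1) - naive(records, 0)
ace = backdoor_adjust(records, 1) - backdoor_adjust(records, 0)
print(f"naive difference: {naive_diff:.2f}, adjusted ACE: {ace:.2f}")
```

With these toy counts the naive difference is 0.38, whereas the adjusted ACE is 0.20, which illustrates how an uncontrolled confounder can distort the observational contrast in either direction.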

To illustrate the role of do-calculus in causal intervention analysis, the following examples are based on simulated data generated from counterfactual models. Under known causal structures and confounding conditions, the results of correlation-based observational analysis are compared with do-calculus-based intervention inference to demonstrate the effectiveness of do-calculus in identifying confounding effects and true causal relationships.

The following example demonstrates how the backdoor criterion can uncover hidden causal effects with the simulated data in Table 1:

Example 1. $X$: transformer overload; $Y$: transformer fault; confounder ($Z$): ambient temperature.

Step 1: Naive Conditional Probability Estimation from observed data:

$$P(Y = 1 \mid X = 0) \approx 10.11\%, \quad P(Y = 1 \mid X = 1) \approx 44.71\%$$

At face value, it appears that transformer overload increases fault probability significantly. However, this estimate is likely biased due to the influence of $Z$, which has not been controlled.

Step 2: Apply backdoor adjustment by using the backdoor formula (Formula (2)):

$$\begin{array}{l} P(Y = 1 \mid do(X = 1)) = 0.8896 \\ P(Y = 1 \mid do(X = 0)) = 0.1200 \end{array}$$

This adjustment yields a more accurate estimate of the causal effect of $X$ on $Y$, isolating the influence of ambient temperature.

Step 3: Comparison:

$$\begin{array}{l} \mathrm{ACE} = P(Y = 1 \mid do(X = 1)) - P(Y = 1 \mid do(X = 0)) \\ = 0.8896 - 0.1200 = 0.7696 \end{array}$$

The resulting ACE from do-calculus shows a more substantial and unbiased causal effect, demonstrating the superiority of causal inference over naive correlation-based methods. In this case, $Z$ serves as a confounder influencing both $X$ and $Y$. By conditioning on $Z$, the backdoor path is blocked, enabling valid causal effect estimation.

However, in scenarios with unobserved confounders or complex path structures, the backdoor criterion may not be directly applicable. In such cases, the front-door adjustment provides an alternative approach by leveraging mediator variables that lie on the causal path between $X$ and $Y$ but are not influenced by unobserved confounders. The front-door adjustment formula involves a two-stage estimation process: first estimating the effect of the treatment on the mediator, and then the effect of the mediator on the outcome, while accounting for the treatment’s distribution. This method circumvents the need to measure unobserved confounders by focusing on the mediated pathway.

The causal effect of $X$ on $Y$ can still be identified using the following front-door adjustment formula (Formula (3)), which applies to the causal structure shown in Figure 1b:

$$P(Y = y \mid do(X = x)) = \sum_{z} P(Z = z \mid X = x) \cdot \sum_{x'} P(Y = y \mid X = x', Z = z) \cdot P(X = x') \tag{3}$$

where $P(y \mid do(x))$ denotes the causal distribution of $Y$ when an external intervention sets $X$ to $x$; $P(z \mid x)$ represents the conditional distribution of the mediator $Z$ given $X = x$ in observational data; $P(y \mid x', z)$ denotes the conditional distribution of $Y$ given $Z = z$ and $X = x'$; and $P(x')$ represents the marginal distribution of $X$ in the observational data, which is used to eliminate non-causal associations.
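To make Formula (3) concrete, the sketch below evaluates it on the observed joint distribution of $(X, Z, Y)$ produced by a small hypothetical SCM with a hidden confounder $U$. The structural probabilities, the function names `observed_joint` and `front_door`, and the binary encoding are assumptions chosen for illustration. They are not the simulated data of this study.

```python
from itertools import product


def observed_joint():
    """Joint P(x, z, y) of a toy SCM after marginalizing the hidden U.

    Assumed mechanism: U -> X, U -> Y, X -> Z, Z -> Y (no direct X -> Y).
    """
    joint = {}
    for u, x, z, y in product((0, 1), repeat=4):
        p_u = 0.5
        p_x = 0.8 if x == u else 0.2        # P(X = x | U = u)
        p_z = 0.8 if z == x else 0.2        # P(Z = z | X = x)
        p_y1 = 0.1 + 0.4 * z + 0.4 * u      # P(Y = 1 | Z = z, U = u)
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        key = (x, z, y)
        joint[key] = joint.get(key, 0.0) + p_u * p_x * p_z * p_y
    return joint


def front_door(joint, x, y=1):
    """P(Y = y | do(X = x)) via Formula (3); joint maps (x, z, y) -> prob."""
    xs = sorted({k[0] for k in joint})
    zs = sorted({k[1] for k in joint})

    def prob(pred):
        return sum(v for k, v in joint.items() if pred(k))

    p_x = prob(lambda k: k[0] == x)
    total = 0.0
    for z in zs:
        p_z_given_x = prob(lambda k: k[0] == x and k[1] == z) / p_x
        inner = 0.0
        for xp in xs:
            p_xp = prob(lambda k: k[0] == xp)
            p_xp_z = prob(lambda k: k[0] == xp and k[1] == z)
            if p_xp_z == 0:
                raise ValueError(f"no mass at X={xp}, Z={z}")
            p_y_given = prob(
                lambda k: k[0] == xp and k[1] == z and k[2] == y
            ) / p_xp_z
            inner += p_y_given * p_xp       # sum over x' in Formula (3)
        total += p_z_given_x * inner        # sum over z in Formula (3)
    return total


joint = observed_joint()
ace = front_door(joint, 1) - front_door(joint, 0)
print(f"front-door ACE: {ace:.2f}")
```

In this toy model the true interventional probabilities are 0.62 and 0.38, so the ACE is 0.24, and the front-door estimate recovers it without observing $U$. The naive contrast $P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)$ in the same model is 0.48.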

The following example demonstrates how the front-door criterion can uncover hidden causal effects with the simulated data in Table 2:

Example 2. $X$: overvoltage; $Y$: transformer failure; unobserved confounder ($U$): initial equipment defect; mediator ($Z$): insulation degradation.

Step 1: Naive Conditional Probability Estimation from observed data:

$$P(Y = 1 \mid X = 0) = 40\%, \quad P(Y = 1 \mid X = 1) = 50\%$$

A direct estimate suggests an increase of 10 percentage points in failure probability due to overvoltage, but this ignores the hidden confounder $U$.

Step 2: Apply front-door adjustment using the front-door formula (Formula (3)):

$$\begin{array}{l} P(Y = 1 \mid do(X = 1)) = 0.2 \cdot 0.2 + 0.8 \cdot 0.7 = 0.6 \\ P(Y = 1 \mid do(X = 0)) = 0.8 \cdot 0.2 + 0.2 \cdot 0.7 = 0.3 \end{array}$$

Step 3: Comparison:

$$\mathrm{ACE} = P(Y = 1 \mid do(X = 1)) - P(Y = 1 \mid do(X = 0)) = 0.6 - 0.3 = 0.3$$

After applying the front-door adjustment, the estimated causal effect indicates that overvoltage increases fault probability by 30 percentage points, revealing a stronger causal link than the naive estimate suggested. This reinforces the value of do-calculus in uncovering hidden causal mechanisms. In this case, an unobserved variable $U$ acts as a confounder that simultaneously influences both $X$ and $Y$, making direct backdoor adjustment infeasible. However, $Z$ serves as a mediator satisfying the front-door criterion: (1) there is no unblocked backdoor path from $X$ to $Z$; (2) all backdoor paths from $Z$ to $Y$ are blocked by conditioning on $X$; and (3) $Z$ intercepts all directed paths from $X$ to $Y$. Therefore, the causal effect of $X$ on $Y$ can be identified and estimated via the mediator $Z$.

It should be noted that the SCM does not automatically eliminate confounders. The identifiability of causal effects depends on the structural assumptions encoded in the causal graph. In particular, causal effects can be identified only when appropriate conditions, such as the backdoor or front-door criteria, are satisfied.

In this study, causal relationships are analyzed under the assumption that the constructed causal graph adequately captures the dependencies among alarm variables. The identifiability is ensured by examining the paths between variables and verifying that confounding paths can be blocked through observed variables or mediated structures. Therefore, do-calculus is applied as a formal tool for causal effect estimation under these assumptions, rather than as an automatic mechanism for removing confounding bias.

2.3. Density Clustering Algorithm

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm designed to find clusters of arbitrary shape in large datasets that contain outliers and noise [14]. The DBSCAN algorithm requires the user to set two parameters: ε (the radius parameter) and MinPts (the density threshold for the neighborhood). These two parameters are generally difficult to determine in high-dimensional datasets and are typically set empirically based on a preliminary understanding of the clustered data [15,16].

In this study, we employ a density-based clustering algorithm to preprocess the collected alarm data.

3. SCM-Based Situational Awareness Model

3.1. Situational Awareness Architecture

The proposed system architecture is designed to collect, preprocess, analyze correlations, and infer causal relationships from historical substation alarm data, ultimately generating a causal relationship diagram to enhance substation situational awareness.

(1) Data Collection and Preprocessing Layer: This layer is responsible for collecting historical alarm data from multiple substations. Initially, the collected alarm data is categorized by substation and chronologically ordered to construct alarm sequences. Subsequently, these alarm sequences are stored in a centralized alarm database, serving as the foundational dataset for further analysis.

(2) Alarm Clustering Layer: This layer employs the DBSCAN density-based clustering algorithm to group alarm sequences. DBSCAN groups alarms with dense distributions into clusters based on alarm time intervals, ε and MinPts, forming distinct alarm clusters. Alarms within the same cluster exhibit a certain degree of correlation, whereas alarms across different clusters are assumed to be uncorrelated.

(3) Correlation Analysis Layer: This layer extracts equipment operating parameters from the alarm clusters and conducts correlation analysis on the contained alarm data. Various methods, such as correlation coefficient calculations, are employed to identify alarm correlations, forming the basis for constructing the causal relationship diagram.

(4) Causal Inference Layer: Utilizing the correlation analysis results, the system constructs an initial causal relationship diagram, where nodes denote different alarm types, edges signify alarm correlations, and edge weights reflect the strength of causal relationships.

(5) Causal Diagram Refinement and Validation Layer: This layer refines the initial causal relationship diagram by using causal inference tools, including the SCM and do-calculus. Through pruning operations, alarms and pathways lacking strong causal relationships are removed, ensuring that only significant causal connections remain in the final causal relationship diagram. The resulting diagram facilitates situational awareness and decision making, thereby enhancing substation safety and stability.

3.2. Cluster Analysis of Alarm Events in Substation

The core process of the DBSCAN algorithm is as follows:

Input Parameters:

(1) Alarm Sequence $X = \{x_1, x_2, x_3, \dots, x_n\}$: This sequence contains multiple alarm points, each recording the timestamp and related information of the alarm.

(2) Distance Function $d(x_i, x_j) = |t_i - t_j|$: The distance between two alarm points is measured by the time difference between their timestamps.

(3) Neighborhood Radius $\varepsilon$: This defines the maximum allowed distance between alarm points to determine if they are within the same density region.

(4) Minimum Core Points $MinPts$: This is the threshold for the minimum number of neighboring points required for a point to be considered a core point.

Output: Cluster Division $C = \{c_1, c_2, \dots, c_k\}$: The alarm points are divided into several clusters, where each cluster contains alarm points that are density-connected.

Algorithm Steps:

(1) Core Point Identification: For each alarm point $x_i$, calculate the set of its neighbors within a distance of $\varepsilon$: $N_{\varepsilon}(x_i) = \{x_j \mid d(x_i, x_j) \leq \varepsilon\}$. If the size of the neighborhood set satisfies $|N_{\varepsilon}(x_i)| \geq MinPts$, then $x_i$ is identified as a core point and added to the core point set $\Omega$.

(2) Cluster Construction: If the core point set $\Omega$ is empty, the algorithm outputs an empty cluster division $C = \{\}$, indicating no dense clusters were found. Otherwise, randomly select a core point $o$ from the core point set $\Omega$ and initialize a new cluster $C_k = \{o\}$. Remove $o$ from $\Omega$, and add all points within the neighborhood of $o$ to cluster $C_k$. These points are removed from the unvisited point set.

(3) Cluster Expansion: For each point $x_j$ added to cluster $C_k$, if $x_j$ is also a core point, expand the neighborhood of $x_j$ into cluster $C_k$. This process continues iteratively, adding points that are density-connected until no further expansion is possible. In this way, the cluster $C_k$ will gradually absorb all points that are density-connected, forming a complete cluster.

(4) Output: Once all core points have been processed, the algorithm outputs the final cluster set $C = \{c_1, c_2, \dots, c_k\}$. These clusters contain alarm points that are density-connected, while those points that do not belong to any cluster are considered noise.
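The steps above can be sketched in plain Python for one-dimensional timestamps. This is an illustrative implementation under two simplifying assumptions: core points are visited in index order rather than picked at random, and a border point is assigned to the first cluster that reaches it. The function name `dbscan_1d` and the example timestamps are hypothetical.

```python
def dbscan_1d(times, eps, min_pts):
    """Cluster timestamps with DBSCAN using d(x_i, x_j) = |t_i - t_j|.

    Returns one label per point: 0, 1, ... for clusters and -1 for noise.
    """
    n = len(times)
    # Step (1): eps-neighborhoods (a point counts itself) and core points.
    neighbours = [
        [j for j in range(n) if abs(times[i] - times[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}

    labels = [-1] * n
    cluster_id = 0
    for seed in sorted(core):  # deterministic order instead of a random pick
        if labels[seed] != -1:
            continue  # already absorbed by an earlier cluster
        # Step (2): start a new cluster from an unassigned core point.
        labels[seed] = cluster_id
        frontier = [seed]
        # Step (3): expand through core points until nothing new is reached.
        while frontier:
            point = frontier.pop()
            if point not in core:
                continue  # border points join a cluster but do not expand it
            for nb in neighbours[point]:
                if labels[nb] == -1:
                    labels[nb] = cluster_id
                    frontier.append(nb)
        cluster_id += 1
    # Step (4): points still labelled -1 are noise.
    return labels


# Hypothetical alarm timestamps (arbitrary units): two bursts and one stray.
times = [0, 1, 2, 3, 50, 100, 101, 102, 103, 104]
labels = dbscan_1d(times, eps=2, min_pts=3)
print(labels)
```

For these timestamps the two bursts form clusters 0 and 1, and the isolated alarm at time 50 is labelled as noise. The quadratic neighborhood search is kept for readability; an alarm log of realistic size would normally be sorted first so that neighbors can be found with a sliding pointer.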

It is important to note that substation alarm data typically exhibit irregular arrival patterns and bursty behaviors. Compared with sliding window methods, DBSCAN does not require predefined window sizes and can adaptively identify clusters of varying durations. Compared with model-based approaches such as Hawkes processes or Hidden Markov Models (HMMs), DBSCAN is a non-parametric method that does not rely on strong distributional assumptions, making it more suitable for heterogeneous alarm data with unknown generation mechanisms.

In this study, the distance between two alarm events is defined based on their temporal difference. To ensure comparability and avoid unit dependency, the time differences are normalized to the range [0,1] using min–max normalization. Therefore, the neighborhood radius parameter ε is unitless.
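One way to obtain normalized time differences, sketched below, is to rescale the timestamps themselves. Since the smallest pairwise difference is zero and the largest is $t_{\max} - t_{\min}$, the difference between two rescaled timestamps equals the min–max normalized time difference. The helper name `min_max_normalize` is an assumption for illustration, and the exact normalization procedure used in the study may differ.

```python
def min_max_normalize(times):
    """Rescale timestamps to [0, 1] so pairwise differences are unitless."""
    lo, hi = min(times), max(times)
    if hi == lo:
        # All alarms share one timestamp; every difference is zero.
        return [0.0 for _ in times]
    span = hi - lo
    return [(t - lo) / span for t in times]


# Hypothetical timestamps in seconds.
scaled = min_max_normalize([10, 20, 30])
print(scaled)
```

After this rescaling, a neighborhood radius such as ε = 0.3 is interpreted as a fraction of the total time span covered by the alarm sequence.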

Multiple combinations of ε and MinPts were tested, and ε = 0.3 and MinPts = 4 were selected as they produced stable clustering results without excessive fragmentation or over-merging of alarm sequences.

Additionally, the clustering results are sensitive to parameter choices, and different datasets may require re-tuning. Future work will explore more systematic parameter selection methods.

3.3. Causal Diagram Construction for Alarm Events Using Correlation Analysis and Causal Inference

The process of constructing the causal diagram is depicted in Figure 2. The causal graph is constructed in two stages. In the first stage, strongly correlated variable pairs are identified, and edge directions are assigned by distinguishing independent variables from dependent variables based on domain knowledge. In the second stage, the initial graph is refined using intervention-based causal effect estimation. The do-operator is employed to estimate the average causal effect (ACE), and edges with weak causal effects are pruned.

To construct the initial structure of the dependency graph, we employ Pearson correlation to quantify pairwise statistical relationships between system variables and alarm signals.

Formally, given variables X and Y, the Pearson correlation coefficient measures their linear dependence and is used to construct an initial adjacency matrix. A threshold is then applied to retain only statistically significant relationships, resulting in a sparse dependency graph.

This strategy is consistent with existing causal discovery frameworks, where correlation- or dependence-based measures are commonly used to reduce the search space prior to causal structure learning. Recent studies on causal discovery from small-scale time series, such as PCAC [17], also utilize correlation-based measures to construct initial candidate structures before applying causal inference procedures, demonstrating the effectiveness of this two-stage strategy.

The Pearson correlation coefficient is determined using the following equation:

$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \sum_{i} (y_i - \bar{y})^2}} \tag{4}$$

where $x_i$ and $y_i$ represent the observed values of two variables, while $\bar{x}$ and $\bar{y}$ are their respective means. The Pearson correlation coefficient $r$ ranges between −1 and 1, describing the degree and direction of the linear relationship between the two variables. A positive $r$ ($r > 0$) indicates a positive correlation, while a negative $r$ ($r < 0$) indicates a negative correlation. An $r$ value of 1 or −1 signifies a perfect correlation and $r = 0$ indicates no linear correlation. Based on empirical guidelines, the strength of the linear correlation is classified as follows: $|r| \geq 0.8$, strong correlation; $0.5 \leq |r| < 0.8$, moderate correlation; $0.3 \leq |r| < 0.5$, weak correlation; and $|r| < 0.3$, very weak or negligible correlation, often considered uncorrelated. These empirical criteria are also used in the subsequent correlation analysis to ensure consistent interpretation of correlation strength. The correlation coefficient is used to determine the correlation strength between variables. Variables that are not significantly correlated are removed, and an initial causal diagram is constructed using the remaining variables.
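A minimal sketch of Formula (4) and the empirical strength bands is given below. The function names `pearson_r` and `correlation_strength` are illustrative, and the sketch assumes that the boundary value $|r| = 0.3$ falls in the "weak" band.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def correlation_strength(r):
    """Map r to the empirical bands (assumption: |r| = 0.3 counts as weak)."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.3:
        return "weak"
    return "negligible"


# Hypothetical readings of two perfectly linearly related variables.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, correlation_strength(r))
```

Pairs classified as "negligible" would be dropped before the initial diagram is built, while the remaining pairs become candidate edges weighted by $|r|$.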

While Pearson correlation is used to identify statistical dependencies between variables, it does not indicate whether a given variable $x_i$ acts as an independent or a dependent variable. To construct a directed graph, we assign independent and dependent variables according to the following principles:

(1) Temporal precedence: Within a correlated pair, the variable that consistently occurs earlier in alarm sequences is treated as the independent variable.

(2) Functional relationships of variables: State variables are more likely to influence alarm indicators, so they are treated as independent variables.

(3) Stability under pruning: When the orientation is otherwise ambiguous, edges surviving threshold-based pruning are preferentially directed from nodes that are more stable under threshold variation toward less stable ones. This is a useful heuristic to obtain a definite directed acyclic graph, not a demonstration of valid adjustment sets or identified effects for every arc.
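Principle (1) can be illustrated with a simple counting rule: within each alarm cluster, record which of two alarm types appears first, and orient the edge from the type that leads more often. The majority rule, the function name `orient_by_precedence`, and the alarm names are assumptions for this sketch; the text only requires that one variable "consistently" precede the other.

```python
def orient_by_precedence(clusters, a, b):
    """Return (source, target) for alarm types a and b, or None on a tie.

    clusters: list of clusters, each a list of (timestamp, alarm_type).
    """
    a_first = b_first = 0
    for events in clusters:
        first_seen = {}
        for t, kind in sorted(events, key=lambda e: e[0]):
            first_seen.setdefault(kind, t)  # keep earliest time per type
        if a in first_seen and b in first_seen:
            if first_seen[a] < first_seen[b]:
                a_first += 1
            elif first_seen[b] < first_seen[a]:
                b_first += 1
    if a_first > b_first:
        return (a, b)
    if b_first > a_first:
        return (b, a)
    return None  # undecided; fall back to principles (2) and (3)


# Hypothetical clusters of (timestamp, alarm_type) events.
clusters = [
    [(1, "overload"), (3, "overtemp")],
    [(10, "overload"), (12, "overtemp")],
    [(20, "overtemp"), (25, "overload")],
]
edge = orient_by_precedence(clusters, "overload", "overtemp")
print(edge)
```

Here "overload" leads in two of three clusters, so the edge is oriented from "overload" to "overtemp". A stricter reading of "consistently" could require a minimum lead ratio before an orientation is accepted.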

To ensure a valid directed acyclic graph, we further enforce an acyclicity constraint during graph construction. When cycles are detected, edges with lower correlation strength are iteratively removed until all cycles are eliminated.
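The acyclicity constraint can be sketched as a loop that finds one directed cycle with depth-first search and deletes its lowest-weight edge, repeating until no cycle remains. Choosing the weakest edge on the detected cycle is one possible reading of the rule above; the function names and edge weights are hypothetical.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is a DAG."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    state = {node: 0 for node in graph}  # 0 unvisited, 1 on stack, 2 done
    path = []

    def visit(node):
        state[node] = 1
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == 1:  # back edge closes a cycle
                cyc = path[path.index(nxt):] + [nxt]
                return list(zip(cyc, cyc[1:]))
            if state[nxt] == 0:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = 2
        return None

    for node in graph:
        if state[node] == 0:
            found = visit(node)
            if found:
                return found
    return None


def break_cycles(weights):
    """weights: {(u, v): |r|}. Drop the weakest edge of each cycle found."""
    kept = dict(weights)
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept
        weakest = min(cycle, key=lambda e: kept[e])
        del kept[weakest]


# Hypothetical correlation-weighted edges containing the cycle A -> B -> C -> A.
weights = {("A", "B"): 0.9, ("B", "C"): 0.85, ("C", "A"): 0.6, ("A", "D"): 0.7}
dag = break_cycles(weights)
print(sorted(dag))
```

In this example the edge from C to A has the lowest correlation on the cycle and is removed, leaving a directed acyclic graph with three edges.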

After constructing the initial graph, we further refine it using the SCM. Specifically, spurious correlations are removed through causal reasoning, and directional dependencies are identified. Finally, the average causal effect (ACE) is estimated to quantify the strength of causal relationships, which is subsequently used for alarm prediction.

For causal inference, the main steps involve identifying causal relationships using the “intervention” operation from SCM. For instance, the ACE of $Z$ on $Y$ is defined as

$$\mathrm{ACE}(Z \to Y) = E[Y \mid do(Z = 1)] - E[Y \mid do(Z = 0)] \tag{5}$$

where $E[Y \mid do(Z = 1)]$ and $E[Y \mid do(Z = 0)]$ represent the expected values of $Y$ when the intervention variable $Z$ is set to 1 and 0, respectively. A larger absolute difference between these expectations indicates a stronger causal effect. A causal edge between alarms is retained only if the absolute value of ACE exceeds a predefined threshold, which is determined through empirical validation on historical data to balance model sensitivity and specificity. This threshold ensures that only statistically and practically significant causal relationships are included in the final causal graph. Finally, non-causal relationships are pruned from the diagram, retaining only alarms and paths with strong causal relationships.
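The pruning rule can be sketched as a filter on estimated ACE values. The function names, the alarm names, and the threshold of 0.1 are placeholders chosen for illustration; the threshold used in practice is the one obtained from empirical validation.

```python
def prune_by_ace(ace_by_edge, threshold):
    """Keep only edges whose absolute ACE strictly exceeds the threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return {
        edge: ace for edge, ace in ace_by_edge.items() if abs(ace) > threshold
    }


def remaining_alarms(edges):
    """Alarm types that still take part in at least one retained edge."""
    return sorted({node for edge in edges for node in edge})


# Hypothetical ACE estimates for three candidate edges.
ace_by_edge = {
    ("overload", "overtemp"): 0.42,
    ("overtemp", "trip"): -0.31,
    ("door_open", "trip"): 0.03,
}
kept = prune_by_ace(ace_by_edge, threshold=0.1)
print(sorted(kept), remaining_alarms(kept))
```

Because the absolute value is used, a strong negative effect such as the −0.31 above is retained, while the weak edge and the alarm that appears only on it are dropped from the final diagram.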

We therefore treat the resulting diagram as a candidate causal structure for downstream analysis and decision support, pending further validation. Retained edges indicate statistically supported dependencies and should not be read as individually confirmed causal relationships.
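The thresholding rule can be illustrated with a short sketch. The ACE values below are hypothetical and are not results from this study; the function name `prune_by_ace` is likewise an assumption made for illustration.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def prune_by_ace(ace: Dict[Edge, float], threshold: float = 0.1) -> Dict[Edge, float]:
    """Keep only edges whose absolute ACE strictly exceeds the threshold."""
    return {edge: value for edge, value in ace.items() if abs(value) > threshold}


# Hypothetical ACE estimates, for illustration only (not values from the study).
example_ace = {
    ("OLI", "OLI_T"): 0.42,
    ("OLI_T", "FAULT"): 0.27,
    ("TVH", "CB_T"): -0.18,   # a negative effect is kept if its magnitude is large
    ("WTI", "TWF"): 0.12,
    ("ATI", "FAULT"): 0.07,
}

# Compare how many edges survive under several candidate thresholds.
for t in (0.05, 0.10, 0.15):
    kept = prune_by_ace(example_ace, t)
    print(f"threshold={t:.2f}: {len(kept)} edges kept")
```

Because the rule uses the absolute value, an edge with a strong negative effect is retained in the same way as one with a strong positive effect.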

3.4. Causal Diagram Construction for Alarm Events Using Guided Causal Inference

Although the framework described in Section 3.3 provides a rigorous mathematical foundation for identifying causal effects, applying these formulas to every pair of variables in a complex system is very demanding in terms of data processing. To overcome this challenge, our causal inference method is inspired by SCOTCH [18], a structure learning method that infers causal relationships between variables from time-series data. SCOTCH is based on a continuous-time stochastic differential equation (SDE) framework, which removes the dependence of existing methods on discrete time and regular sampling intervals. It can therefore handle the irregularly sampled data commonly found in practice more naturally, and it is better suited to irregular sensor data in substations. However, although such a model can ultimately generate causal diagrams, the decision-making process within deep learning models is complex and lacks intuitive physical explanations. For power systems that require high safety and reliability, maintenance personnel may find it difficult to fully trust a model that cannot clearly explain the basis for its judgements.

Therefore, to enhance the model’s interpretability and ensure the learning process is anchored in a plausible initial structure, we adopt a guided learning strategy. Instead of allowing the algorithm to learn from a completely random or uniform starting point, we leverage the initial causal graph derived in Section 3.3 as a strong prior.

By using this data-derived graph to guide the training of a more sophisticated end-to-end model, we ensure that the final result refines an interpretable initial structure, rather than learning an entirely unconstrained and potentially unstable graph.

Specifically, we employ a deep end-to-end causal inference (DECI) model, which learns both the graph structure and the functional relationships simultaneously through variational inference. We implement a soft constraint by modifying the model’s prior distribution over graph structures. We augment this prior with an expert-informed term, creating an expert-enhanced prior. This is achieved by adding a bonus to the log-probability for any edge that is present in both the model’s proposed graph ($G_{\mathrm{model}}$) and the expert knowledge graph derived from Section 3.3 ($A_{\mathrm{expert}}$). The objective function’s prior term is thus modified as follows:

$$\log P(G_{\mathrm{model}}) = \log P_{\mathrm{Gibbs}}(G_{\mathrm{model}}) + \lambda_{\mathrm{expert}} \sum_{i,j} (G_{\mathrm{model}})_{ij} \cdot (A_{\mathrm{expert}})_{ij} \tag{6}$$

where $\log P_{\mathrm{Gibbs}}(G_{\mathrm{model}})$ is the base sparsity-inducing prior, and the second term provides a weighted bonus for edges aligning with the expert knowledge. The hyperparameter $\lambda_{\mathrm{expert}}$ controls the strength of this guidance.

This soft constraint mechanism provides a crucial balance. It strongly encourages the model to learn the relationships already identified as plausible by our causal analysis framework, thus preserving interpretability and accelerating convergence. Simultaneously, it retains the flexibility for the model to discover novel causal pathways not present in the initial graph if the observational data provides strong evidence for them. It can also prune edges from the initial graph if they are not supported by the data likelihood. This allows the final model to refine, correct, and enrich the initial structure, resulting in a more accurate and comprehensive causal diagram.
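To make the effect of the expert bonus concrete, the following sketch evaluates an unnormalized version of the log-prior for two small candidate graphs. It assumes that the Gibbs term reduces to an L1 sparsity penalty on a binary adjacency matrix and omits the normalizing constant and any acyclicity term. The function name and the three-node graphs are hypothetical; the weights 30 and 50 are the sparsity and expert-prior settings reported for the implementation.

```python
from typing import List

Matrix = List[List[int]]


def log_prior_expert(graph: Matrix, expert: Matrix,
                     lambda_sparse: float, lambda_expert: float) -> float:
    """Unnormalized log-prior: sparsity penalty plus a bonus per expert-aligned edge."""
    n_edges = sum(sum(row) for row in graph)
    n_shared = sum(g * a
                   for g_row, a_row in zip(graph, expert)
                   for g, a in zip(g_row, a_row))
    return -lambda_sparse * n_edges + lambda_expert * n_shared


# Hypothetical 3-node example: the expert graph is the chain 0 -> 1 -> 2.
expert_graph = [[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]]
aligned = [[0, 1, 0],      # both edges agree with the expert graph
           [0, 0, 1],
           [0, 0, 0]]
partly_aligned = [[0, 1, 1],   # edge 0 -> 2 is not in the expert graph
                  [0, 0, 0],
                  [0, 0, 0]]

score_aligned = log_prior_expert(aligned, expert_graph, 30.0, 50.0)
score_partial = log_prior_expert(partly_aligned, expert_graph, 30.0, 50.0)
print(score_aligned, score_partial)
```

Under these assumptions, both graphs have the same number of edges, so the difference in their scores comes entirely from the expert term. An edge outside the expert graph is still admissible; it simply has to be justified by the data likelihood without the help of the bonus.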

The core process of the guided causal inference algorithm with expert-enhanced prior is as follows.

Input Parameters:

Time-Series Data $D$: The observational time-series data collected from the system under study (e.g., substation sensor or alarm data).
Initial Expert Knowledge Graph $A_{\mathrm{expert}}$: An initial causal graph constructed based on domain expertise, representing prior knowledge about potential causal relationships.
Base Causal Inference Model $G_{\mathrm{model}}$: A deep end-to-end causal inference model used to jointly learn causal graph structure and functional relationships.
Hyperparameters: Guidance strength $\lambda$, sparsity coefficient $\alpha$, training epochs $T$, and batch size $B$.

Output:

Refined causal graph $A^{*}$ .
Trained model parameters for structural and functional relationships.

Algorithm Steps:

1. Model Initialization and Prior Construction

Initialize the DECI model.

Define the base sparsity-inducing prior:

$$p_{\mathrm{base}}(A) \propto \exp(-\lambda_{s} \lVert A \rVert_{1})$$

Based on the expert knowledge graph, construct an expert-enhanced prior that rewards candidate causal graphs sharing edges with the expert-defined relationships:

$$p_{\mathrm{enhanced}}(A) = p_{\mathrm{base}}(A) \cdot \exp(\lambda \cdot I(A, A_{\mathrm{exp}}))$$

where $I(A, A_{\mathrm{exp}})$ counts the edges shared between $A$ and $A_{\mathrm{exp}}$, so that overlap with the expert graph is rewarded. For binary adjacency matrices this count equals $\sum_{i,j} A_{ij} (A_{\mathrm{exp}})_{ij}$, the overlap term of Equation (6), with $\lambda$ playing the role of $\lambda_{\mathrm{expert}}$.

2. Variational Inference-Based Training

In each training iteration, a mini-batch of time-series data is sampled, and the posterior distribution over the causal graph and functional parameters is approximated as $q(A, f \mid D_{\mathrm{batch}})$.

Optimize the objective with the expert prior. Maximize the evidence lower bound (ELBO) incorporating $p_{\mathrm{enhanced}}(A)$:

$$\mathrm{ELBO} = E_{q}[\log p(D_{\mathrm{batch}} \mid A, f)] - \mathrm{KL}(q(A, f) \,\|\, p_{\mathrm{enhanced}}(A) \cdot p(f))$$

This objective balances data fit against prior adherence; $p(f)$ is the prior over functional forms.

Update the model parameters (graph structure and functional relationships) via backpropagation to maximize the ELBO.

3. Causal Graph Extraction and Refinement

After training, the most probable causal graph is extracted from the learned posterior distribution. Edges that are weakly supported by observational data may be pruned, while edges consistent with both data evidence and expert knowledge are retained to form the refined causal graph.
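The extraction step can be sketched as follows, under the assumption that the learned posterior factorizes into independent per-edge probabilities, in which case the most probable graph is obtained by keeping edges with probability above 0.5. The posterior values, function names, and the three-way comparison with the expert graph are hypothetical illustrations. Thresholding alone does not guarantee acyclicity, which in the model is handled by the DAG constraint during training.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def extract_graph(edge_probs: Dict[Edge, float], threshold: float = 0.5) -> Set[Edge]:
    """Keep directed edges (no self-loops) whose posterior probability exceeds the threshold."""
    return {edge for edge, p in edge_probs.items()
            if edge[0] != edge[1] and p > threshold}


def compare_with_expert(learned: Set[Edge], expert: Set[Edge]) -> Dict[str, List[Edge]]:
    """Split edges into those confirmed, newly discovered, and pruned relative to the expert graph."""
    return {
        "confirmed": sorted(learned & expert),
        "novel": sorted(learned - expert),
        "pruned": sorted(expert - learned),
    }


# Hypothetical posterior edge probabilities, for illustration only.
posterior = {
    ("OLI", "OLI_T"): 0.97,
    ("OLI_T", "FAULT"): 0.88,
    ("ATI", "FAULT"): 0.12,
    ("OTI", "WTI"): 0.74,
}
expert_edges = {("OLI", "OLI_T"), ("OLI_T", "FAULT"), ("ATI", "FAULT")}

refined = extract_graph(posterior)
report = compare_with_expert(refined, expert_edges)
for label, edges in report.items():
    print(label, edges)
```

The three groups correspond to the behaviours described for the soft constraint: expert edges supported by the data are retained, unsupported expert edges are pruned, and well-supported edges outside the expert graph can still be added.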

The model is implemented based on the DECI framework with the following settings:

Training epochs: 2000.
Batch size: 128.
Optimizer: Adam.
Random seed: 42.
Gumbel temperature: 0.5.
Prior sparsity coefficient (λ_s): 30.0.

For the expert-guided prior, we introduce a soft constraint mechanism with the following parameters:

Expert prior weight (λ_expert): 50.0.
Non-expert penalty weight: 30.0.
Expert alignment constraint weight: 30.0.

The functional relationships are modeled using a neural network with:

Embedding size: 32.
Number of hidden layers: 2.

The DAG constraint is enforced using an augmented Lagrangian method with adaptive parameters.
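For readers unfamiliar with this type of constraint, the sketch below evaluates the acyclicity measure $h(A) = \mathrm{tr}(e^{A \circ A}) - d$ that is commonly used in continuous DAG learning, together with a generic augmented Lagrangian penalty of the form $\alpha h + \frac{\rho}{2} h^{2}$. Both the specific form of the penalty and the truncated power series used for the matrix exponential are assumptions made for illustration; the exact formulation and the adaptive update schedule used in the study are not specified here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dag_penalty(adj: Matrix, terms: int = 30) -> float:
    """h(A) = tr(exp(A o A)) - d, computed with a truncated power series.

    The k = 0 term of the series contributes tr(I) = d, which cancels the -d,
    so only the terms k >= 1 are accumulated.
    """
    d = len(adj)
    squared = [[v * v for v in row] for row in adj]  # element-wise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    h = 0.0
    factorial = 1.0
    for k in range(1, terms + 1):
        power = matmul(power, squared)
        factorial *= k
        h += sum(power[i][i] for i in range(d)) / factorial
    return h


def augmented_lagrangian(h: float, alpha: float, rho: float) -> float:
    """Generic penalty alpha * h + (rho / 2) * h^2 added to the training loss."""
    return alpha * h + 0.5 * rho * h * h


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # acyclic: h is exactly zero
loop = [[0, 1], [1, 0]]                     # two-node cycle: h is positive
print(dag_penalty(chain), dag_penalty(loop))
```

The measure is zero exactly when the graph has no directed cycle, because every closed walk contributes a positive amount to the trace of some matrix power. During training, $\alpha$ and $\rho$ are typically increased until $h$ is driven close to zero.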

4. Case Study and Numerical Results

4.1. Dataset and Experimental Setup

The dataset used in this study is the AI Transformer Monitoring dataset, which is publicly available on the Kaggle platform [19]. This dataset contains operational and environmental measurements collected from real-world power substation transformers, including three-phase voltages, three-phase currents, ambient temperature indicators, oil temperature indicators, oil level indicators, and other related sensor readings. These measurements are acquired via Internet of Things (IoT) devices at regular intervals of approximately 15 min over an extended monitoring period.

To evaluate the performance of the causal inference-based situational awareness method relative to correlation-based approaches under specific substation operating conditions, pre-collected historical alarm data from substations are selected. The alarm types in the dataset fall into two categories: internal transformer failures and external electrical disturbances. Internal transformer failures mainly reflect the health condition of the transformer itself and include Transformer Ambient Temperature High, Transformer Oil Temperature High, Transformer Oil Level Low, Transformer Winding Temperature High, Transformer Winding Fault, Transformer Oil Level Indicator Tripping, and Transformer Fault. External electrical disturbances are related to operating conditions or upstream/downstream grid behavior and include Transformer Overload, Busbar Current Abnormal, Circuit Breaker Tripping (CB_T, an aggregated tripping event on either the high-voltage or low-voltage side of the transformer), Grid Frequency Abnormal, and Transformer Voltage High.

Alarm thresholds were determined according to commonly used transformer operating limits defined in international standards such as IEC 60076 [20] (power transformers), IEC 60038 [21] (standard voltages), and IEC 60255 [22] (protection relays). These standards provide reference limits for transformer temperature rise, loading capability, voltage deviation, and protection operation conditions. The abbreviations and thresholds for the alarms are shown in Table 3.

To ensure a fair evaluation, the dataset is split chronologically: the first 80% of the alarm records are used for training and the last 20% for testing. This design avoids temporal leakage and ensures that the model is evaluated on data that lie in the future relative to the training set.
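A chronological split of this kind can be sketched in a few lines. The function name, the record labels, and the timestamps are hypothetical and serve only to illustrate the idea of sorting by time before cutting.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T], timestamps: Sequence[float],
                        train_fraction: float = 0.8) -> Tuple[List[T], List[T]]:
    """Sort records by timestamp, then cut once so all test records follow all training records."""
    order = sorted(range(len(records)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    train = [records[i] for i in order[:cut]]
    test = [records[i] for i in order[cut:]]
    return train, test


# Hypothetical, unordered alarm records identified by their timestamp.
example_times = [5, 1, 9, 3, 7, 0, 8, 2, 6, 4]
example_records = [f"alarm@{t}" for t in example_times]

train_set, test_set = chronological_split(example_records, example_times)
print(len(train_set), len(test_set), test_set)
```

Unlike a random split, no record in the test set precedes any record in the training set, so information from the future cannot leak into training.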

4.2. Causal Diagram Construction

Based on the dataset described in Section 4.1, the next step is to construct the causal analysis framework for substation alarm events. Since the raw alarm data contain complex temporal correlations and potential noise, it is necessary to first identify meaningful alarm patterns and relationships. Therefore, a clustering method is employed to analyze the alarm data, followed by the construction of a causal graph to model the potential causal relationships among alarms.

The following clusters were obtained by clustering based on time using the DBSCAN algorithm:

Cluster 1: {TO, ATI, OTI, OLI, WTI, TWF, BCA, CB_T, TVH, FAULT, OLI_T}.

Cluster 2: {GFA}.

Since Cluster 1 includes most alarm variables and reflects the main interactions among transformer alarms, it is selected as the focus of the subsequent causal analysis.

The initial causal graph, constructed from the correlation analysis of the aforementioned data using the previously defined correlation strength classification, is shown in Figure 3.

The second step is to perform causal identification using do-calculus.

The ACE is used as an approximate measure to quantify the strength of causal relationships between variables. In this study, ACE is treated as a relative indicator for ranking and pruning causal edges, rather than an exact or uncertainty-free estimate.

To further justify the selection of the ACE threshold, we conducted a sensitivity analysis by evaluating model performance under different thresholds (0.05, 0.1, and 0.15). The results show that a threshold of 0.1 provides a good balance between model sparsity and predictive performance. Lower thresholds (e.g., 0.05) retain more edges but introduce redundant or weak causal relationships, while higher thresholds (e.g., 0.15) may remove informative connections and slightly degrade prediction accuracy. We emphasize that ACE values estimated from observational data are approximate, and the threshold is chosen to optimize predictive performance, not to confirm causal validity.

Due to data and experimental constraints, confidence intervals or bootstrap-based uncertainty estimation are not included in the current study. Therefore, the selected threshold should be interpreted as an empirical choice based on observed performance rather than a statistically optimal value. A more rigorous uncertainty analysis will be considered in future work.

To better illustrate the process of revising the causal graph, a representative case is presented below.

Case 1.

Calculating the causal effect of OLI_T on FAULT.

The method we used estimates causal effects from observational data by approximating intervention using statistical adjustment, rather than performing actual physical interventions. The specific steps have been explained in the two cases in Section 2.2. By using the backdoor or front-door formula to estimate the probability distribution of the dependent variable under intervention, the causal effects between variables can be calculated without actual physical intervention. To compute the causal effect between two variables, Formula (1) is applied, which requires estimating the probability distribution of the dependent variable under different interventions ($do(X = 1)$ or $do(X = 0)$).

In this case, the independent variable $X$ is OLI_T, the dependent variable $Y$ is FAULT, and the confounding variable $Z$ is OLI. As shown in Figure 3, there exists a backdoor path between variable X (OLI_T) and Y (FAULT): X ← OLI → Y. According to the backdoor criterion, this non-causal path can be blocked by conditioning on (i.e., controlling for) the variable set $Z = \{\mathrm{OLI}\}$. Since OLI is a non-collider on this path, and conditioning on it does not open any new spurious paths, the set $\{\mathrm{OLI}\}$ constitutes a valid adjustment set.

The average causal effect (ACE) is therefore identifiable. According to the backdoor criterion, after adjusting for $Z = \{\mathrm{OLI}\}$, the interventional distribution can be expressed in terms of observational quantities as

$$P(\mathrm{FAULT} \mid do(\mathrm{OLI\_T})) = \sum_{\mathrm{OLI}} P(\mathrm{FAULT} \mid \mathrm{OLI\_T}, \mathrm{OLI}) \, P(\mathrm{OLI}) \tag{7}$$

The right-hand side of this equation contains no do-operator and depends only on the observational data distribution, which confirms that the causal effect is identifiable under the assumed graph. Estimates obtained in this way may capture stable dependency patterns among variables that are informative for fault prediction.

We use Formula (7) to estimate the probability distribution of the dependent variable under intervention, $P(\mathrm{FAULT} \mid do(\mathrm{OLI\_T}))$, and Formula (5) to calculate the causal effect between the two variables. If the absolute value of the calculated ACE does not exceed 0.1, we consider the causal effect between the two to be insignificant and remove the edge.
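The backdoor computation can be reproduced on the simulated counts of Table 1. The sketch below assumes that, in that table, $Z$ is the confounder, $X$ the intervened variable, and $Y$ the outcome; it is a worked illustration of the adjustment formula on simulated data, not the OLI_T/FAULT estimate from the case study. The function names are hypothetical.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[int, int, int], int]

# Sample sizes keyed by (z, x, y), copied from Table 1 (backdoor simulated data).
TABLE1: Counts = {
    (0, 0, 0): 35, (0, 0, 1): 15, (0, 1, 0): 570, (0, 1, 1): 380,
    (1, 0, 0): 8019, (1, 0, 1): 891, (1, 1, 0): 5, (1, 1, 1): 85,
}


def backdoor_expectation(counts: Counts, x: int) -> float:
    """E[Y | do(X = x)] = sum_z P(Y = 1 | X = x, Z = z) * P(Z = z)."""
    total = sum(counts.values())
    value = 0.0
    for z in (0, 1):
        n_z = sum(c for (zz, _, _), c in counts.items() if zz == z)
        n_zx = counts[(z, x, 0)] + counts[(z, x, 1)]
        value += counts[(z, x, 1)] / n_zx * n_z / total
    return value


def naive_expectation(counts: Counts, x: int) -> float:
    """P(Y = 1 | X = x) without any adjustment, for comparison."""
    n_x = sum(c for (_, xx, _), c in counts.items() if xx == x)
    n_xy = sum(c for (_, xx, y), c in counts.items() if xx == x and y == 1)
    return n_xy / n_x


ace = backdoor_expectation(TABLE1, 1) - backdoor_expectation(TABLE1, 0)
naive = naive_expectation(TABLE1, 1) - naive_expectation(TABLE1, 0)
print(f"adjusted ACE = {ace:.3f}, unadjusted difference = {naive:.3f}")
```

On these counts the adjusted ACE is about 0.77, whereas the unadjusted difference in conditional probabilities is about 0.35, which shows how strongly the confounder distorts the naive comparison. With a threshold of 0.1, the corresponding edge would be retained.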

Similarly, in cases where the backdoor criterion is not satisfied, the front-door adjustment provides an alternative approach for estimating causal effects between variables.
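The front-door adjustment can be illustrated in the same way on the simulated counts of Table 2. The sketch assumes that $Z$ in that table is a mediator lying on the only directed path from $X$ to $Y$, so that $P(y \mid do(x)) = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z) P(x')$ applies. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Counts = Dict[Tuple[int, int, int], int]

# Sample sizes keyed by (x, z, y), copied from Table 2 (front-door simulated data).
TABLE2: Counts = {
    (0, 0, 0): 280, (0, 0, 1): 120, (0, 1, 0): 20, (0, 1, 1): 80,
    (1, 0, 0): 90, (1, 0, 1): 10, (1, 1, 0): 160, (1, 1, 1): 240,
}


def count(counts: Counts, x: Optional[int] = None, z: Optional[int] = None,
          y: Optional[int] = None) -> int:
    """Number of samples matching the given values (None means 'any')."""
    return sum(c for (xx, zz, yy), c in counts.items()
               if (x is None or xx == x)
               and (z is None or zz == z)
               and (y is None or yy == y))


def frontdoor_expectation(counts: Counts, x: int) -> float:
    """E[Y | do(X = x)] = sum_z P(z | x) * sum_x' P(Y = 1 | x', z) * P(x')."""
    total = count(counts)
    value = 0.0
    for z in (0, 1):
        p_z_given_x = count(counts, x=x, z=z) / count(counts, x=x)
        inner = sum(
            count(counts, x=xp, z=z, y=1) / count(counts, x=xp, z=z)
            * count(counts, x=xp) / total
            for xp in (0, 1)
        )
        value += p_z_given_x * inner
    return value


ace_frontdoor = frontdoor_expectation(TABLE2, 1) - frontdoor_expectation(TABLE2, 0)
print(f"front-door ACE = {ace_frontdoor:.3f}")
```

On these counts the interventional expectations are 0.60 and 0.30, giving an ACE of 0.30. No confounder between $X$ and $Y$ needs to be observed for this computation, which is what makes the front-door formula useful when the backdoor criterion cannot be satisfied.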

The revised causal diagram based on the causal diagram construction process is shown in Figure 4. Based on the initial causal diagram in Figure 3, causal effect estimation is performed using the do-operator to compute the ACE between variable pairs. Paths with weak causal influence (ACE ≤ 0.1) are removed, while paths with stronger causal effects are retained, resulting in a more concise and reliable causal graph.

4.3. Results of Constructing Causal Graphs Based on Guided Causal Inference

Following the guided causal inference algorithm described in Section 3.4, we applied the framework to the clustered substation alarm data. We further compared the causal graphs generated with and without expert-guided constraints. Figure 5 shows the resulting structures. The unconstrained model (baseline DECI) produced a denser graph containing several spurious edges that lack plausibility, while the guided model yielded a more compact and interpretable structure consistent with substation operational mechanisms.

By leveraging the expert-informed prior, the guided model effectively restricts unreasonable causal directions and removes redundant or unsupported dependencies, while retaining flexibility to discover novel causal relationships supported by the observational data. As a result, the final graph exhibits clearer causal hierarchies, more coherent propagation paths, and stronger alignment with practical transformer operation mechanisms, demonstrating the practical improvement brought by the guided framework.

4.4. Prediction Results Based on Causal Analysis

To ensure a realistic and unbiased evaluation for time-series data, a time-aware cross-validation strategy is adopted. Specifically, the dataset is divided based on chronological order using a forward-chaining scheme, where earlier data are used for training and later data for testing.

The data are split into multiple sequential folds. In each fold, the training set consists of all observations up to a certain time point, while the testing set contains subsequent unseen data. This process is repeated across the entire timeline to obtain multiple evaluation results. During each fold, normalization is performed using only the training data and then applied to the testing data to further prevent data leakage.
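The forward-chaining scheme with training-only normalization can be sketched as follows. The equal-sized blocks, the function names, and the use of z-score standardization are assumptions made for illustration; the number of folds and the normalization method used in the study are not stated here.

```python
from statistics import mean, pstdev
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_folds(n_samples: int, n_folds: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Assumes n_samples >= n_folds + 1 so that every block is non-empty.
    """
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = block * k
        test_end = n_samples if k == n_folds else block * (k + 1)
        yield range(0, train_end), range(train_end, test_end)


def standardize(train: Sequence[float], test: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Scale both sets using the mean and standard deviation of the training set only."""
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant training feature
    return [(v - mu) / sigma for v in train], [(v - mu) / sigma for v in test]


# Hypothetical single feature observed at 10 consecutive time steps.
series = [float(v) for v in range(10)]
for train_idx, test_idx in forward_chaining_folds(len(series), 4):
    train_scaled, test_scaled = standardize(
        [series[i] for i in train_idx], [series[i] for i in test_idx]
    )
    print(list(train_idx), list(test_idx))
```

Each fold trains on everything before its test block, and the scaling statistics never see the test block, which mirrors the two leakage safeguards described above.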

Considering the real-time performance requirements, smart substations generally employ classical machine learning models. Therefore, this study adopts Random Forest, XGBoost, and Logistic Regression for validation.

Due to the imbalance between normal and failure samples, class weighting is applied during model training. For Logistic Regression and Random Forest, the class_weight='balanced' option is used, which automatically assigns larger weights to minority classes. For XGBoost, the imbalance is handled by setting the scale_pos_weight parameter to the ratio between negative and positive samples.
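The two weighting rules can be written out explicitly. The sketch below computes the weights by hand, using the formula that scikit-learn documents for the 'balanced' option, n_samples / (n_classes * class count), and the negative-to-positive ratio used for scale_pos_weight. The label vector is hypothetical and the helper names are introduced only for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def negative_positive_ratio(labels: Sequence[int]) -> float:
    """Ratio of negative (0) to positive (1) samples, as used for scale_pos_weight."""
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return negatives / positives


# Hypothetical imbalanced labels: 90 normal samples and 10 failure samples.
example_labels = [0] * 90 + [1] * 10

weights = balanced_class_weights(example_labels)
ratio = negative_positive_ratio(example_labels)
print(weights, ratio)
```

With these labels the failure class receives a weight of 5.0 against roughly 0.56 for the normal class, and the ratio is 9.0, so both rules up-weight the minority class by the same relative factor.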

The evaluation in this study primarily reports aggregated performance metrics to reflect the overall predictive capability and robustness of the compared models. The main objective of this work is not to optimize classification performance for individual classes, but to investigate whether statistical inference and causal inference frameworks can capture stable and informative dependency patterns for fault prediction.

Therefore, the tables below report the accuracy, precision, recall, and F1-score, as well as the computational time, of the three algorithms when modeling with features derived from correlation analysis and from causal relationship analysis. Results for features derived from correlation analysis are given in Table 4, and results for features obtained through causal relationship analysis are given in Table 5.
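For reference, the four metrics can be computed from binary predictions as in the sketch below. The label vectors and function names are hypothetical; the final lines check the internal consistency of one reported row, with the caveat that the tabulated values are fold averages, so the harmonic-mean relation holds only approximately.

```python
from typing import Dict, Sequence


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical labels and predictions, for illustration only.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(metrics)

# Consistency check on the Logistic Regression row of Table 4
# (precision 52.4%, recall 78.3%, reported F1 62.8%).
table4_lr_f1 = f1_from(0.524, 0.783)
print(round(table4_lr_f1, 3))
```

The harmonic mean of the reported precision and recall for Logistic Regression in Table 4 is close to the reported F1-score, which suggests that the tabulated metrics are mutually consistent.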

The comparison of model performance before and after using causal analysis is shown in Figure 6.

The comparison shows that, in our experimental setup, modeling with features derived from causal relationship analysis yields better predictive performance than correlation-based feature modeling. The gain is largest for Logistic Regression (accuracy rising from 71.8% to 87.7%) and is about one percentage point for Random Forest and XGBoost. Computational time decreases for Logistic Regression and Random Forest and is essentially unchanged for XGBoost (Tables 4 and 5). A plausible explanation is that causal analysis removes confounding variables that provide limited information and have minimal impact on model prediction, thus enhancing predictive accuracy and reducing model complexity. We therefore conclude that the constructed causal graph may have captured stable dependency patterns among variables that are informative for fault prediction.

While class-wise metrics (e.g., precision, recall, and F1-score for minority classes) could provide additional insights in imbalanced settings, such fine-grained evaluation is beyond the current scope. We will incorporate more detailed per-class analysis in future work to further assess model behavior under class imbalance.

5. Conclusions

Correlation and causality are related but distinct: a correlation may be spurious, reflecting a common cause rather than a direct effect. The substation situational awareness method proposed in this paper uses Structural Causal Models (SCMs), a causal inference tool, to analyze real-world substation alarm data. Unlike purely correlation-based statistical inference, SCMs provide a framework to identify potential confounding variables that may interfere with alarm prediction, which can improve predictive accuracy by removing spurious associations.

The proposed approach leverages causal analysis for informative variable selection and pruning, which contributes to improved predictive performance. By identifying and removing variables that carry limited information or have negligible impact on prediction, the model achieves a better balance between complexity and accuracy. Experimental results on a real-world substation alarm dataset indicate that causal graph pruning reduces model complexity by 35% while maintaining approximately 95% prediction accuracy for the best-performing model in the case study (Table 5), which supports the suitability of the approach for real-time deployment in substations.

Additionally, integrating expert knowledge as a soft prior in guided causal inference appeared to yield graphs that are more compact and interpretable than an unconstrained baseline, while retaining data-driven flexibility. We view the data + knowledge design as a practical direction for trustworthy situational awareness in critical infrastructure.

It is critical to note that improved predictive performance of causal-feature-based models does not equal full validation of causality. While causal pruning removes irrelevant variables, the observed gains may still arise from unmeasured confounders or residual statistical associations not captured by the current causal graph. Future work will incorporate bootstrap uncertainty estimation for ACE values, sensitivity analyses for unmeasured confounders, and robustness checks across multiple substation datasets to further strengthen causal claims.

Author Contributions

Conceptualization, X.L. and Z.M.; methodology, X.L. and Z.M.; software, X.L.; validation, X.L., Y.C. and Y.F.; formal analysis, X.L.; investigation, X.L., Y.C. and Y.F.; resources, Y.C., Y.F. and F.R.; data curation, X.L., Y.C., Y.F. and F.R.; writing—original draft preparation, X.L.; writing—review and editing, Y.C., Y.F., F.R. and Z.M.; visualization, X.L.; supervision, F.R. and Z.M.; project administration, F.R. and Z.M.; funding acquisition, F.R. and Z.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Fundamental Research Funds for the Central Universities (grant number FRF-TP-19-016A2).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed towards the corresponding author.

Conflicts of Interest

Author Xiang Lu was employed by the company, Information Center, Guizhou Power Grid Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Fan, L.; Li, J.; Pan, Y.; Wang, S.; Yan, C.; Yao, D. Research and Application of Smart Grid Early Warning Decision Platform Based on Big Data Analysis. In Proceedings of the 2019 4th International Conference on Intelligent Green Building and Smart Grid (IGBSG), Hubei, China, 6–9 September 2019; pp. 645–648.
2. Guo, Y.; Feng, S.; Li, K.; Mo, W.; Liu, Y.; Wang, Y. Big Data Processing and Analysis Platform for Condition Monitoring of Electric Power System. In Proceedings of the 2016 UKACC 11th International Conference on Control (CONTROL), Belfast, UK, 31 August–2 September 2016; pp. 1–6.
3. Qiu, X.; Huang, Y.; Liu, G.; Yan, J.; Chen, S. Distribution Network Situational Awareness Prediction Based on Spatio-Temporal Attention Dynamic Graph Neural Network. Energies 2025, 18, 4402.
4. Fei, S.-W.; Sun, Y. Forecasting Dissolved Gases Content in Power Transformer Oil Based on Support Vector Machine with Genetic Algorithm. Electr. Power Syst. Res. 2008, 78, 507–514.
5. Zhang, L.; Wang, G.; Cao, L.; Dai, Z.; Kou, B. Smart Status Evaluation and Early Warning Approach for Highly Reliable Protection Systems Based on GAN Model and Random Forest Algorithm. J. Electr. Power Sci. Technol. 2022, 36, 104–112.
6. Pearl, J. Causal Inference in Statistics: An Overview. Stat. Surv. 2009, 3, 96–146.
7. Yao, L.; Chu, Z.; Li, S.; Li, Y.; Gao, J.; Zhang, A. A Survey on Causal Inference. ACM Trans. Knowl. Discov. Data 2021, 15, 74.
8. Tang, K.; Huang, J.; Zhang, H. Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 1513–1524.
9. Pu, H.; Chen, Z.; Liu, J.; Yang, X.; Ren, C.; Liu, H.; Jian, Y. Research on Decision-Level Fusion Method Based on Structural Causal Model in System-Level Fault Detection and Diagnosis. Eng. Appl. Artif. Intell. 2023, 126, 107095.
10. Zhang, D.; Zhang, H.; Tang, J.; Hua, X.; Sun, Q. Causal Intervention for Weakly-Supervised Semantic Segmentation. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 655–666.
11. Wang, T.; Huang, J.; Zhang, H.; Sun, Q. Visual Commonsense R-CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10757–10767.
12. Prosperi, M.; Guo, Y.; Sperrin, M.; Koopman, J.S.; Min, J.S.; He, X.; Rich, S.; Wang, M.; Buchan, I.E.; Bian, J. Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare. Nat. Mach. Intell. 2020, 2, 369–375.
13. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Cambridge University Press: Cambridge, UK, 2015.
14. Zhang, L.; Deng, S.; Li, S. Analysis of Power Consumer Behavior Based on the Complementation of K-Means and DBSCAN. In Proceedings of the 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 26–28 November 2017; pp. 1–5.
15. Singh, H.V.; Girdhar, A.; Dahiya, S. A Literature Survey Based on DBSCAN Algorithms. In Proceedings of the 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 25–27 May 2022; pp. 751–758.
16. Deng, D. DBSCAN Clustering Algorithm Based on Density. In Proceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, China, 25–27 September 2020; pp. 949–953.
17. Sun, W.; Zhang, Y.; Liu, J.; Cai, B. PCAC: Causal Discovery from Low-Dimensional Small-Scale Time Series. Knowl.-Based Syst. 2025, 327, 114135.
18. Wang, B.; Jennings, J.; Gong, W. Neural Structure Learning with Stochastic Differential Equations. arXiv 2024, arXiv:2311.03309.
19. Distributed Transformer Monitoring. Available online: https://www.kaggle.com/datasets/sreshta140/ai-transformer-monitoring (accessed on 18 February 2026).
20. IEC 60076-1; Power Transformers, Part 1: General. International Electrotechnical Commission: Geneva, Switzerland, 2011.
21. IEC 60038; Standard Voltages. International Electrotechnical Commission: Geneva, Switzerland, 2009.
22. IEC 60255-1; Measuring Relays and Protection Equipment, Part 1: Common Requirements. International Electrotechnical Commission: Geneva, Switzerland, 2022.

Figure 1. Example of backdoor/front-door adjustment: (a) backdoor adjustment; (b) front-door adjustment.

Figure 2. The process of constructing the causal diagram.

Figure 3. Initial causal diagram.

Figure 4. Corrected causal diagram.

Figure 5. Causal graph refinement through guided causal inference: (a) expert knowledge graph; (b) causal diagram learned by DECI; (c) causal diagram obtained from guided learning.

Figure 6. Comparison of model performance before and after using causal analysis.

Table 1. Backdoor simulated data.

Z	X	Y	Sample Size
0	0	0	35
0	0	1	15
0	1	0	570
0	1	1	380
1	0	0	8019
1	0	1	891
1	1	0	5
1	1	1	85

Table 2. Front-door simulated data.

X	Z	Y	Sample Size
0	0	0	280
0	0	1	120
0	1	0	20
0	1	1	80
1	0	0	90
1	0	1	10
1	1	0	160
1	1	1	240

Table 3. Data abbreviations in graph.

Abbreviations	Data	Alarm Threshold
TO	Transformer Overload	Load > 1.2 pu
ATI	Transformer Ambient Temperature High	Ambient temperature > 40 °C
OTI	Transformer Oil Temperature High	Oil temperature > 90 °C
OLI	Transformer Oil Level Low	Oil level < 70%
WTI	Transformer Winding Temperature High	Winding temperature > 110 °C
TWF	Transformer Winding Fault	Winding fault protection
BCA	Busbar Current Abnormal	Current > 1.2 pu
CB_T	Circuit Breaker Tripping	Breaker trip event
GFA	Grid Frequency Abnormal	f < 49.5 Hz or f > 50.5 Hz
TVH	Transformer Voltage High	Voltage > 1.1 pu
FAULT	Transformer Fault	Composite fault flag
OLI_T	Transformer Oil Level Indicator Tripping	Oil level trip

Table 4. Model performance using correlation analysis.

Model	Accuracy	Precision	Recall	F1	Time
Logistic Regression	71.8 ± 0.6%	52.4 ± 0.7%	78.3 ± 1.3%	62.8 ± 0.7%	0.037 ± 0.006 s
Random Forest	94.1 ± 0.3%	91.8 ± 0.7%	88.3 ± 0.9%	90.0 ± 0.6%	1.861 ± 0.026 s
XGBoost	93.6 ± 0.4%	87.9 ± 0.7%	91.6 ± 0.9%	89.7 ± 0.6%	0.103 ± 0.039 s

Table 5. Model performance using causal analysis.

Model	Accuracy	Precision	Recall	F1	Time
Logistic Regression	87.7 ± 0.5%	74.5 ± 0.9%	90.6 ± 0.9%	81.8 ± 0.6%	0.031 ± 0.003 s
Random Forest	95.0 ± 0.3%	93.9 ± 0.5%	89.3 ± 0.6%	91.5 ± 0.5%	1.524 ± 0.030 s
XGBoost	94.6 ± 0.3%	90.2 ± 0.9%	92.2 ± 0.6%	91.2 ± 0.5%	0.101 ± 0.038 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
