Knowledge Graph-Augmented ERNIE-CNN Method for Risk Assessment in Secondary Power System Operations

Huang, Xiang; Li, Ping; Wang, Ye; Ren, Xuchao; Zhao, Zhenbing; Li, Gang

doi:10.3390/en18082104


by Xiang Huang ¹, Ping Li ², Ye Wang ¹, Xuchao Ren ¹,*, Zhenbing Zhao ³ and Gang Li ⁴

¹ State Grid Jiangsu Electric Power Company, Nanjing 200024, China

² State Grid Jiangsu Electric Power Co., Ltd. Huaian Power Supply Branch, Huaian 223000, China

³ School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China

⁴ Department of Computer, North China Electric Power University, Baoding 071003, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(8), 2104; https://doi.org/10.3390/en18082104

Submission received: 5 March 2025 / Revised: 1 April 2025 / Accepted: 2 April 2025 / Published: 18 April 2025

(This article belongs to the Section F: Electrical Engineering)


Abstract

With the increasing complexity of modern power systems, traditional risk assessment methods relying on expert experience and historical data face challenges in accuracy and adaptability. This study proposes a knowledge graph-augmented ERNIE-CNN method to enhance risk assessment in secondary power system operations. First, we construct a domain-specific knowledge graph by integrating expert knowledge and operational standards, which enhances semantic understanding and logical reasoning capabilities. Second, an improved ERNIE-CNN model is developed, incorporating an attention mechanism to effectively fuse semantic features and spatial patterns from operational texts. The experimental results on a dataset of 3240 secondary operation records demonstrate the model’s superior performance, achieving precision, recall, and F1-scores of 0.878, 0.861, and 0.869, respectively, outperforming benchmarks like BERT. Furthermore, a visualization of the knowledge graph is implemented, providing interpretable decision support for risk management. The proposed method offers a robust framework for automating risk assessment in power systems, with potential applications in smart grid maintenance and safety-critical operational planning.

Keywords: knowledge graph; smart grid; secondary power operations; risk assessment; ERNIE

1. Introduction

Driven by the accelerated implementation of China’s “Four Modernizations and Dual Carbon” strategic objectives in power system development, regional power grids have witnessed a substantial integration of renewable energy sources, particularly wind and photovoltaic generation, alongside expanded long-distance cross-regional power transmission infrastructure. These developments have precipitated significant structural transformations in grid architecture, characterized by dual operational paradigms: high-penetration distributed renewable energy integration within regional networks and increased reliance on extra-regional DC transmission inputs. The growing complexity of fault mechanisms in modern AC–DC hybrid transmission systems, compounded by the globalization of grid interconnections, has imposed stringent operational reliability requirements on secondary protection systems, demanding near-zero error tolerance in maintenance and control operations.

Recently, artificial intelligence-based technologies have begun to be applied in the risk assessment of secondary operations in the power sector. These include machine learning algorithms such as decision trees [1], random forests [2], support vector machines (SVM) [3], and neural networks [4], which are used to assess risks in secondary operations. These algorithms can identify and analyze potential risk factors by learning from historical data, thereby helping to predict possible issues in secondary operations. In addition, deep learning techniques, such as those based on neural networks, can handle complex data structures and possess strong feature learning capabilities. In risk assessments for secondary operations in the power sector, deep learning is utilized to extract key features from sensor data, image data, and text data, enabling risk prediction and assessment. Furthermore, Natural Language Processing (NLP) technology is employed to analyze and understand text data related to power equipment [5], such as maintenance records, fault reports, and operation manuals. Through NLP, critical information can be automatically extracted for risk assessment and decision support [6].

While artificial intelligence-driven analytical frameworks have demonstrated preliminary success in secondary power system risk evaluation, two fundamental limitations persist in practical implementations. Primary constraints emerge from inherent data integrity challenges, where conventional data acquisition methodologies in power utilities frequently yield datasets with deficiencies in three critical dimensions: missing temporal continuity in operational records, incomplete metadata annotation for heterogeneous devices, and inconsistent labeling protocols across regional maintenance teams. Such data quality impairments directly compromise algorithmic robustness, particularly affecting deep neural architectures requiring high-dimensional feature learning. Secondary limitations stem from the opacity of decision mechanisms in multilayer nonlinear models—a critical vulnerability given the operational safety mandates in power infrastructure management. The industry’s stringent reliability requirements demand rigorous interpretability frameworks that currently remain underdeveloped for complex temporal convolutional network configurations.

To address these issues, this paper presents a risk assessment method that integrates knowledge graphs with the ERNIE-CNN model. The contributions of this paper are primarily reflected in three aspects. First, by combining expert knowledge from the field of secondary power operations and existing nomenclature standards in the power sector, we define domain concepts, entities, and relationships within the knowledge graph. Based on this, we have constructed an annotated corpus for power secondary operations, sourced from grid companies, to enhance the model’s interpretability and logical reasoning capabilities. Second, through the integration of an attention mechanism into the ERNIE-CNN deep learning model, we effectively combine semantic understanding with feature extraction to better uncover implicit features and patterns of equipment. Experiments conducted on text datasets in secondary operation scenarios demonstrate the model’s superior performance in the field of power secondary operations. Third, leveraging the model’s excellent performance, we propose a comprehensive process for constructing a knowledge graph tailored to power secondary operations, which includes ontology construction, data preprocessing, knowledge extraction, and graph visualization.

2. Related Work

Over the years, power companies have accumulated a substantial number of electrical equipment defect cases. Traditionally, the documentation and analysis of these cases have been largely manual, lacking full automation. However, with the significant advancement of artificial intelligence technology in areas such as text assessment and information retrieval, an increasing number of scholars have recently explored the use of these cutting-edge techniques for the in-depth analysis of textual data within the power system domain. For instance, in reference [7], researchers effectively combined the Hidden Markov Model and the Vector Space Model to mine alarm signal texts, thereby aiding fault diagnosis in power dispatch. Similarly, reference [8] introduced a defect text assessment model based on Convolutional Neural Networks (CNN) to delineate the characteristics of power equipment defects. This model demonstrated superior assessment accuracy and processing efficiency, highlighting its effectiveness in handling power defect texts. Furthermore, in reference [9], the BiLSTM model was utilized to extract comprehensive semantic information from defect texts, successfully elucidating the causal relationships between defect issues and their underlying causes, and achieving effective assessment of power equipment defect texts. In another study [10], an enhanced DBNet network model was employed to automate the reading of power equipment nameplate parameters, significantly simplifying complex verification tasks and reducing the workload for staff. Additionally, the research in reference [11] utilized a pre-trained model for entity recognition in power equipment defect texts, effectively addressing the challenges posed by multi-source heterogeneity, content ambiguity, and redundancy in vast amounts of defect text data. 
Nonetheless, despite the notable progress made by the aforementioned studies in the assessment, information extraction, and management of power defect texts, there is still considerable scope for further exploration and investigation to address the intricacies of power text information interrelationships. For example, while the Deep Structured Semantic Model (DSSM) proposed in reference [12] facilitates text matching by mapping texts into a common semantic space and computing cosine similarity between text vectors, it neglects text semantic features. Conversely, the Dense Interactive Inference Network (DIIN) model, introduced in reference [13], comprehensively captures semantic information and relationships between texts through multilayer interaction and attention mechanisms, albeit with a relatively complex model structure.

In recent years, pre-trained models have shown considerable potential in text processing tasks. The method proposed by Ref. [14] for text similarity calculation, utilizing a variant of the pre-trained BERT model, effectively extracts feature information from input texts but struggles to capture additional contextual information. Ref. [15] enriched textual background knowledge by incorporating relationships between external entities through the K-BERT model, achieving notable results across various text tasks. Meanwhile, Ref. [16] demonstrated that the ERNIE model outperforms traditional models in medical text assessment. However, the application of the ERNIE model to power system risk assessment is still in its infancy, and its full potential remains to be explored.

In 2012, Google officially introduced the concept of a knowledge graph to represent the connection characteristics of entity relationships, which has gradually evolved into a large-scale database. Subsequently, targeted applied research emerged across various fields. For instance, Ref. [17] constructed a risk assessment model for power systems based on knowledge graphs, significantly enhancing the accuracy and reliability of risk assessments by deeply mining the relationships between power equipment. Ref. [18] developed an application framework for power knowledge graphs using power texts, providing valuable decision support for staff addressing power grid defects.

To address the limitations of single models, methods that integrate multiple models are increasingly gaining attention. Ref. [19] introduced the ERNIE model, which incorporates both professional medical knowledge and unique textual knowledge for recognizing irregular clinical short texts, thereby improving model accuracy. Ref. [20] presented a process quality prediction method that integrates a temporal knowledge graph with a CNN-LSTM. This method enhances temporal features through feature fusion, constructs a combined neural network model based on an attention mechanism to extract significant temporal features, and ultimately achieves quality prediction for production processes. Wang et al. [21] proposed a risk assessment method for power systems that combines LSTM and CNN. By integrating temporal and spatial features, this method significantly improves assessment accuracy. Kong et al. [22] explored the integration of knowledge graphs with BERT-CNN for book text assessment. They constructed a book domain knowledge graph to extend the semantic information of book texts and used deep learning to extract deep semantic features, classifying books using TextCNN.

The current research on power system defect text analysis faces several critical challenges. First, while pre-trained models excel in general semantic understanding, their ability to model domain-specific terminology and complex contextual logic in power systems remains limited, leading to semantic biases in risk assessment. Second, knowledge graph construction and deep learning model training often operate independently, failing to achieve dynamic synergistic optimization, which restricts the complementary benefits of knowledge reasoning and data-driven approaches. Furthermore, existing knowledge graph workflows predominantly focus on isolated stages, lacking an integrated framework from ontology design and multi-source data integration to knowledge visualization. This results in inefficiencies in cross-team collaboration and difficulties in adapting to the rapid evolution of secondary power operation scenarios. These issues collectively hinder the accuracy and practical applicability of power text analysis, necessitating systematic breakthroughs through deep domain knowledge embedding, bidirectional model–knowledge base interaction mechanisms, and end-to-end standardized methodologies.

3. Materials and Methods

This paper proposes a risk assessment method that integrates knowledge graphs with ERNIE-CNN. As illustrated in Figure 1, the model framework consists primarily of an input embedding layer, a text encoding layer, a semantic alignment layer, and a knowledge encoding layer. The input s to the network is a preprocessed sequence of words representing the secondary task risk content, while the output is the risk level label for that content. We represent the probability that the secondary task risk content belongs to level y by the distribution $p(y \mid s, ϕ)$, where $ϕ$ denotes the network parameters. The input embedding layer processes the preprocessed text through a CNN for feature extraction, generating text vectors. The text encoding layer uses the ERNIE model to encode these text vectors, producing the vector representation q of the secondary task risk content. The semantic alignment layer aligns non-standard secondary task risk content to standard representations by retrieving information from the knowledge graph. The knowledge encoding layer further processes the aligned content through a CNN for feature extraction and employs an attention mechanism to obtain the conceptual representation p defined in Equation (11). Finally, the text representation q and the conceptual representation p of the risk content are fused and input into a fully connected layer to obtain the probability of each risk level.
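The final fusion step can be illustrated with a minimal sketch. It assumes that fusion is plain concatenation of q and the concept vector, followed by a single linear layer and a softmax; the number of risk levels, the vector sizes, and the random weights are hypothetical and are not taken from the paper.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(q, p, weights, bias):
    """Concatenate text vector q with concept vector p, apply one linear layer, then softmax."""
    fused = q + p  # list concatenation, i.e. [q; p] (assumed fusion operator)
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

random.seed(0)
NUM_LEVELS = 4            # hypothetical number of risk levels
q = [0.2, -0.1, 0.7]      # toy text representation
p = [0.5, 0.3]            # toy concept representation
weights = [[random.uniform(-1, 1) for _ in range(len(q) + len(p))]
           for _ in range(NUM_LEVELS)]   # untrained, illustrative weights
bias = [0.0] * NUM_LEVELS

probs = classify(q, p, weights, bias)    # one probability per risk level
predicted_level = max(range(NUM_LEVELS), key=lambda k: probs[k])
```

In a trained model the weights would be learned jointly with the encoders; here they only show the shape of the computation.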

3.1. Input Embedding Layer

The input is the secondary operation risk content, denoted as s, with a length of n. In this module, we employ two types of embeddings, character embedding and concept embedding, as illustrated in Figure 2. The character embedding layer maps each word to a high-dimensional vector space.

We utilize Convolutional Neural Networks (CNNs) to obtain character-level embeddings for each word. The characters are embedded in vectors that can be viewed as one-dimensional inputs to the CNN, with the vector size corresponding to the number of input channels in the CNN. The output of the CNN is then max pooled over its entire width to obtain a fixed-size vector for each word. For concept embeddings, we map each concept to a high-dimensional vector space using pre-trained word vectors. Both the character vectors and concept vectors have a dimensionality of $d / 2$. We concatenate the character embedding vectors with the concept embedding vectors to form conceptual representations of dimension d.
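The embedding step described above can be sketched in plain Python. This is an illustration only: the sizes D, CHAR_DIM, and KERNEL, the random filters, and the one-entry concept table are hypothetical stand-ins for trained parameters and pre-trained vectors.

```python
import random

random.seed(1)
D = 8          # hypothetical total embedding size d
CHAR_DIM = 3   # hypothetical character-vector size (CNN input channels)
KERNEL = 2     # hypothetical convolution width

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

char_table = {}  # character -> vector, filled lazily with random values
filters = [[rand_vec(CHAR_DIM) for _ in range(KERNEL)] for _ in range(D // 2)]

def char_cnn(word):
    """Convolve over the word's characters, then max-pool over the full width."""
    chars = [char_table.setdefault(ch, rand_vec(CHAR_DIM)) for ch in word]
    while len(chars) < KERNEL:               # zero-pad very short words
        chars.append([0.0] * CHAR_DIM)
    pooled = []
    for filt in filters:                     # one output channel per filter
        responses = [
            sum(w * x for k in range(KERNEL)
                for w, x in zip(filt[k], chars[start + k]))
            for start in range(len(chars) - KERNEL + 1)
        ]
        pooled.append(max(responses))        # max over all positions
    return pooled                            # length d / 2

# Stands in for pre-trained concept vectors (assumption).
concept_table = {"line protection": rand_vec(D // 2)}

def embed(word, concept):
    """Concatenate the d/2 character vector with the d/2 concept vector."""
    return char_cnn(word) + concept_table[concept]

vector = embed("relay", "line protection")   # length d
```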

3.2. Semantic Alignment Layer

The semantic alignment layer is designed to map colloquial and non-standard secondary operation risk content into a standardized secondary operation risk knowledge graph, thereby retrieving the corresponding standardized content. Specifically, given a piece of secondary operation risk content s, our goal is to find its standardized counterpart C within the knowledge graph. Initially, we obtain a set of entities $ε$ related to s through entity linking. Subsequently, for each entity $e \in ε$, we retrieve its conceptual information from the secondary operation risk knowledge graph via a conceptualization process. For instance, given the content “Maintenance of the 220 kV Hongbei-Shabian Hongke 4978 line in Nantong, replacement and debugging of the Hongke 4978 line protection, verification, modification of settings, and renaming of related circuits for 220 kV double busbar differential protection”, we first link to the entity set $ε$ = {220 kV, line maintenance, line protection, double busbar differential protection circuit verification}. Then, by conceptualizing “line maintenance”, we obtain the concept set C = {line protection, line protection modification, corresponding primary equipment outage, 220 kV, risk level III}.
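The two-step lookup (entity linking, then conceptualization) can be sketched with dictionaries. The concepts for “line maintenance” are copied from the example above; the alias lists, the substring matching, and the dictionary layout are assumptions made for illustration and do not describe the actual linking method.

```python
# Toy knowledge graph: entity -> concepts (entries from the worked example).
KG_CONCEPTS = {
    "line maintenance": [
        "line protection",
        "line protection modification",
        "corresponding primary equipment outage",
        "220 kV",
        "risk level III",
    ],
}

# Surface forms that link to each entity (hypothetical alias lists).
ENTITY_ALIASES = {
    "220 kV": ["220 kV"],
    "line maintenance": ["Maintenance of the", "line maintenance"],
    "line protection": ["line protection"],
}

def link_entities(text):
    """Return entities whose aliases occur in the text (naive substring linking)."""
    lowered = text.lower()
    return [entity for entity, aliases in ENTITY_ALIASES.items()
            if any(alias.lower() in lowered for alias in aliases)]

def conceptualize(entities):
    """Collect the concepts of every linked entity, without duplicates."""
    concepts = []
    for entity in entities:
        for concept in KG_CONCEPTS.get(entity, []):
            if concept not in concepts:
                concepts.append(concept)
    return concepts

content = ("Maintenance of the 220 kV Hongbei-Shabian Hongke 4978 line in Nantong, "
           "replacement and debugging of the Hongke 4978 line protection")
entities = link_entities(content)
concepts = conceptualize(entities)
```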

3.3. Text Encoding Layer

The objective of this module is to generate a textual representation q for a given secondary operation risk content of length n, where this short text is represented as a sequence of D-dimensional word vectors $\{x_{1}, x_{2}, \dots, x_{n}\}$. As illustrated in Figure 3, ERNIE employs the Transformer Encoder as the foundational framework for semantic representation.

The encoder is composed of multiple stacked identical layers, each consisting of two sublayers: a Self-Attention layer and a Feed-Forward Neural Network, both incorporating residual connections and normalization processes. In ERNIE 3.0, the self-attention mechanism comprises 12 attention heads. The computation process is as follows:

Q = W_{q} X

(1)

K = W_{k} X

(2)

V = W_{v} X

(3)

where X represents the vector corresponding to each word, and W is a randomly initialized parameter that is optimized during training. The attention score for each word is then calculated using the following formula:

A = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(4)

where the dot product $Q K^{T}$ measures the similarity between pairs of elements (vectors), $d_{k}$ is the dimension of the key vectors, and the softmax function normalizes the scaled dot products. The multi-head attention mechanism uses multiple queries computed in parallel to select different parts of the input information. Each attention head focuses on different portions of the input, and their outputs are then concatenated. The computation formula is as follows:

Q_{i} = Q W_{i}^{Q}, K_{i} = K W_{i}^{K}, V_{i} = V W_{i}^{V}, i = 1, 2, \dots, 12

(5)

\mathrm{head}_{i} = \mathrm{Attention}(Q_{i}, K_{i}, V_{i}), i = 1, 2, \dots, 12

(6)

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \mathrm{head}_{2}, \dots, \mathrm{head}_{12}) W^{O}

(7)

Here, $W_{i}^{Q} \in \mathbb{R}^{\frac{d}{2} \times d_{k}}$, $W_{i}^{K} \in \mathbb{R}^{\frac{d}{2} \times d_{k}}$, $W_{i}^{V} \in \mathbb{R}^{\frac{d}{2} \times d_{k}}$, and $W^{O} \in \mathbb{R}^{h d_{v} \times \frac{d}{2}}$, where h is the number of attention heads.

We apply a max pooling layer on A to obtain the text representation q. The idea is to select the maximum value in each dimension of the vector to capture the most significant features.
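Equation (4) and the pooling step can be checked on a toy input. The sketch below is a single-head illustration that stores each token as a row and uses identity projections (Q = K = V = X), which is a simplification of Equations (1) to (3); the three 2-dimensional token vectors are invented for the example.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with one token vector per row."""
    d_k = len(K[0])
    output = []
    for query in Q:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d_k)
                  for key in K]
        weights = softmax(scores)            # one weight per token
        output.append([sum(w * value[j] for w, value in zip(weights, V))
                       for j in range(len(V[0]))])
    return output

def max_pool(A):
    """Take the maximum of each dimension across all tokens."""
    return [max(column) for column in zip(*A)]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # toy token vectors
A = attention(X, X, X)                       # identity projections (assumption)
q = max_pool(A)                              # text representation
```

A multi-head version would run `attention` once per head on separately projected inputs and concatenate the results before pooling.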

3.4. Knowledge Encoding Layer

Prior knowledge obtained from external resources, such as knowledge bases, provides richer information to assist in determining the risk level of the content of a given secondary task. We use conceptual information as an example to illustrate knowledge encoding. Given a concept set C of size m, denoted $(c_{1}, c_{2}, \dots, c_{m})$, where $c_{i}$ is the vector of the i-th concept, the goal is to generate its vector representation p. As shown in Figure 4, we introduce two attention mechanisms to focus more on important concepts.

To mitigate the adverse effects of inappropriate concepts introduced due to the ambiguity of entities or noise in the knowledge graph, we introduce the Concept-Short Text (C-ST) attention mechanism based on vanilla attention to measure the semantic similarity between each concept and the text representation. We calculate the C-ST attention using the following formula:

α_{i} = \mathrm{softmax}(w_{1}^{T} f (W_{1} [c_{i}; q] + b_{1}))

(8)

Here, $α_{i}$ represents the attention weight of the i-th concept with respect to the text. A larger $α_{i}$ indicates that the i-th concept is semantically more similar to the text. $f(\cdot)$ is a nonlinear activation function, and softmax is used to normalize the attention weights of all concepts. $W_{1}$ denotes the weight matrix, $w_{1}$ denotes the weight vector, and $b_{1}$ represents the bias term.

Furthermore, to account for the relative importance of concepts, we implement Concept-to-Concept Set (C-CS) attention based on the self-attention mechanism, which measures the significance of each concept relative to the entire concept set. We define the C-CS attention for each concept as follows:

β_{i} = \mathrm{softmax}(w_{2}^{T} f (W_{2} [c_{i}; C] + b_{2}))

(9)

Here, $β_{i}$ represents the attention weight of the i-th concept with respect to the entire set of concepts. $W_{2}$ denotes the weight matrix, $w_{2}$ the weight vector, and $b_{2}$ the bias term.

The effect of C-CS attention is analogous to that of feature selection. It acts as a soft feature selection mechanism, assigning larger weights to important concepts and smaller weights (close to zero) to irrelevant ones. We combine the weights $α_{i}$ and $β_{i}$ to obtain the final attention weight for each concept using the following formula:

a_{i} = \mathrm{softmax}(γ α_{i} + (1 - γ) β_{i})

(10)

Here, $a_{i}$ represents the final attention weight assigned to the i-th concept for the short text, and $γ \in [0, 1]$ adjusts the relative importance of the two attention weights $α_{i}$ and $β_{i}$.

Finally, the concept vectors are summed, weighted by the final attention weights, to give a semantic vector p that represents the concept set:

p = \sum_{i = 1}^{m} a_{i} c_{i}

(11)
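Equations (8) to (11) can be traced end to end with a small sketch. Several choices here are assumptions not stated in the text: tanh is used for the activation f, the concept set C inside Equation (9) is summarized by the mean concept vector, and all parameters are random rather than trained.

```python
import math
import random

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score(w, W, b, vec):
    """w^T f(W vec + b) with f = tanh (assumed activation)."""
    hidden = [math.tanh(sum(wij * x for wij, x in zip(row, vec)) + bi)
              for row, bi in zip(W, b)]
    return sum(wi * h for wi, h in zip(w, hidden))

def concept_vector(concepts, q, params, gamma=0.5):
    # Assumption: the concept set C is summarized by the mean concept vector.
    set_summary = [sum(col) / len(concepts) for col in zip(*concepts)]
    alpha = softmax([score(params["w1"], params["W1"], params["b1"], c + q)
                     for c in concepts])                          # Eq. (8)
    beta = softmax([score(params["w2"], params["W2"], params["b2"], c + set_summary)
                    for c in concepts])                           # Eq. (9)
    a = softmax([gamma * x + (1 - gamma) * y
                 for x, y in zip(alpha, beta)])                   # Eq. (10)
    p = [sum(ai * c[j] for ai, c in zip(a, concepts))
         for j in range(len(concepts[0]))]                        # Eq. (11)
    return a, p

random.seed(2)
DIM, HIDDEN = 3, 4   # hypothetical sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

params = {
    "W1": rand_matrix(HIDDEN, 2 * DIM), "w1": rand_matrix(1, HIDDEN)[0], "b1": [0.0] * HIDDEN,
    "W2": rand_matrix(HIDDEN, 2 * DIM), "w2": rand_matrix(1, HIDDEN)[0], "b2": [0.0] * HIDDEN,
}
concepts = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.1, 0.7]]   # toy concept vectors
q = [1.0, 0.0, 0.0]                                              # toy text representation
a, p = concept_vector(concepts, q, params, gamma=0.5)
```

Setting `gamma` to 1 uses only the C-ST weights, and setting it to 0 uses only the C-CS weights.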

4. Results

This paper designs relevant comparative experiments to evaluate the extraction accuracy of the two-stage extraction model based on the aforementioned knowledge extraction model and graph construction strategy. The aim is to validate the model’s optimization effect in the field of power secondary operation. Furthermore, using the extraction results from the power secondary operation dataset, we implement a visualization of the power secondary operation knowledge graph.

4.1. Experimental Environment

To validate the performance of the proposed model in the task of power secondary operation risk assessment, the experimental environment is set up as shown in Table 1.

4.2. Data Description for Power Secondary Operation Risk Assessment

This study conducted operational risk analysis based on 3240 secondary equipment maintenance work permits obtained from a provincial power company under State Grid Corporation. By integrating the relay protection operation risk assessment table with 12 industry standards, including Q/GDW 11613-2016 [23], we developed a knowledge graph for secondary operation risks encompassing voltage levels, high-risk operations, and risk assessment. The cases of secondary operation risk assessment are illustrated in Table 2.

The research established a data standardization framework strictly adhering to the “Risk Assessment Table for Relay Protection Professional Operations”. For risk level annotation conflicts identified in the raw data, a three-phase expert review process was implemented. The recurrent invalid markers (“/”) in the power supply company risk assessment fields were systematically removed after confirming their inapplicability in ultra-high voltage exclusive operation scenarios.

When performing high-risk operations, operators need to identify the high-risk job contents and their corresponding risk levels to adopt appropriate safety measures. Therefore, this study aligns the high-risk job texts collected from the field with the texts in the job assessment table to assign risk levels.

4.3. Evaluation Metrics

In this experiment, the evaluation metrics for risk assessment of secondary operations are Precision (P), Recall (R), and the $F_{1}$ score. The three metrics are calculated as follows:

P = \frac{T_{P}}{T_{P} + F_{P}}

(12)

R = \frac{T_{P}}{T_{P} + F_{N}}

(13)

F_{1} = \frac{2 P R}{P + R}

(14)

Here, $T_{P}$ is the number of correctly predicted instances (true positives), $F_{P}$ is the number of instances that were predicted but incorrectly (false positives), and $F_{N}$ is the number of instances that were not predicted (false negatives). $F_{1}$ is the harmonic mean of P and R and is used to comprehensively reflect the prediction performance. In the experiment on risk assessment for secondary operations, the instances refer to the content of secondary operation risks, and the prediction targets are the risk levels.
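As a quick consistency check, the three metrics can be computed directly from counts, and the reported precision and recall can be plugged into Equation (14). The counts in the first call are invented for illustration; only the values 0.878 and 0.861 come from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Equations (12)-(14), guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

toy_p, toy_r, toy_f1 = precision_recall_f1(tp=8, fp=2, fn=2)   # hypothetical counts
reported_f1 = f1_from(0.878, 0.861)   # rounds to 0.869, matching the reported F1
```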

4.4. Risk Assessment Experiment Based on ERNIE-CNN

Considering the experimental environment and dataset characteristics, we first set the training parameters for the model. The specific training parameters for the model are presented in Table 3. The experimental parameters were optimized through systematic validation; the batch size (16) balanced GPU memory constraints and gradient stability, while the learning rate (5 × 10⁻⁵) was selected via progressive tests across four orders of magnitude to optimize convergence. The 80-token sequence length preserved 95% of semantic content based on dataset statistics. A dropout rate of 0.5 showed optimal regularization in ablation studies, reducing overfitting by 34%. The training duration was controlled by early stopping (five-epoch patience, 1 × 10⁻⁴ loss threshold), ensuring convergence without overtraining. These settings achieved reproducible results with <1.5% cross-validation variance.
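The early stopping rule described above can be sketched as follows. The sketch assumes that the monitored quantity is a validation loss and that the 1 × 10⁻⁴ threshold is the minimum improvement that counts as progress; the loss sequence is invented, and the `CONFIG` keys are hypothetical names for the reported settings.

```python
class EarlyStopping:
    """Stop when the loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=5, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if self.best - val_loss > self.min_delta:
            self.best = val_loss          # meaningful improvement
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # no meaningful improvement
        return self.bad_epochs >= self.patience

# Reported settings; the key names are illustrative.
CONFIG = {"batch_size": 16, "learning_rate": 5e-5, "max_seq_len": 80, "dropout": 0.5}

losses = [0.90, 0.70, 0.60, 0.59995, 0.59993, 0.60, 0.61, 0.60, 0.62, 0.63]  # toy values
stopper = EarlyStopping(patience=5, min_delta=1e-4)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch                # training would halt here
        break
```

With this toy sequence the best loss is reached at epoch 3, the next five epochs improve by less than the threshold, and the loop stops at epoch 8.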

To validate the performance of our model for secondary task risk assessment, we conducted a comparative experiment against multiple baseline models. As shown in Table 4, our method achieves an F1-score of 0.869, representing a 4.0% absolute improvement over the ERNIE baseline. Notably, our model outperforms graph-based approaches such as GraphSAGE (F1 = 0.760) and GAT (F1 = 0.868), demonstrating the superiority of integrating knowledge graphs with ERNIE-CNN over pure graph architectures. Additionally, our method maintains a balanced precision–recall tradeoff (P = 0.878, R = 0.861), surpassing both traditional text models (e.g., TextCNN F1 = 0.720) and advanced pre-trained models (e.g., BERT F1 = 0.809). The results validate that domain-specific knowledge guidance enhances risk pattern capture in power systems compared to general purpose graph/text models.

To validate the effectiveness of the text encoding module, the knowledge encoding module, and the attention mechanism, this paper conducted ablation experiments. As shown in Table 5, the results demonstrate that the model achieves optimal performance when all three modules are present. The component contribution analysis reveals three critical insights: (1) The standalone text encoder demonstrates superior feature extraction capability over the knowledge encoder; (2) multi-modal integration exhibits nonlinear synergy, where combined text-knowledge encoding surpasses theoretical additive performance by 2.3%; (3) attention mechanisms optimize feature interaction, elevating the complete architecture to peak performance through targeted precision enhancement. Precision–recall dynamics further indicate text encoding’s recall bias versus knowledge-enhanced configurations’ precision dominance, empirically validating contextual weighting efficacy in false positive suppression.

4.5. Construction of the Knowledge Graph for Electric Power Secondary Operations

Based on the entity-relation triples obtained through knowledge extraction, we performed data cleaning to standardize the triples. Subsequently, we imported the entire dataset into the Neo4j graph database by first creating nodes and then matching relationship edges according to the triples. This process resulted in the formation of a knowledge graph for electric power secondary operations. Neo4j, a user-friendly and highly scalable graph database, represents entities as nodes and relationships as edges between nodes. It enables the visualization of user-built knowledge graphs and provides the Cypher query language for convenient searching, deleting, and supplementing of entity relationships within the graph. Figure 5 presents a partial example of the knowledge graph formed by importing electric power secondary operation triples into the graph database.
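The import step (nodes first, then edges) can be sketched by turning each triple into a parameterized Cypher statement. The node label `Entity`, the `name` property, the relation names, and the helper functions are hypothetical; the entity names are borrowed from the example in Section 3.2. No database connection is made here.

```python
import re

def rel_type(name):
    """Cypher relationship types cannot be passed as parameters, so sanitize them."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").upper()
    return cleaned or "RELATED_TO"

def to_cypher(triples):
    """Yield (statement, parameters) pairs: both nodes are merged, then the edge."""
    for head, relation, tail in triples:
        statement = (
            "MERGE (h:Entity {name: $head}) "
            "MERGE (t:Entity {name: $tail}) "
            f"MERGE (h)-[:{rel_type(relation)}]->(t)"
        )
        yield statement, {"head": head, "tail": tail}

# Illustrative triples; the relation names are assumptions.
triples = [
    ("line maintenance", "has risk level", "risk level III"),
    ("line maintenance", "requires", "corresponding primary equipment outage"),
]
statements = list(to_cypher(triples))
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-running it does not duplicate nodes or edges. Each pair could then be handed to a Neo4j driver session for execution.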

5. Discussion

The current study presents a novel approach for risk assessment in secondary power operations by integrating knowledge graphs with the ERNIE-CNN model. The results obtained through rigorous experimental validation demonstrate the effectiveness and superiority of the proposed method over several baseline models. From the perspective of previous studies and the working hypotheses, the following discussion unfolds.

Firstly, our findings align well with the existing research that highlights the limitations of traditional risk assessment methods relying heavily on expert experience and historical data. These methods often suffer from subjectivity and difficulty in adapting to complex and dynamic scenarios. By incorporating knowledge graphs, our approach enriches the model input with semantic understanding and relational insights, thereby improving its generalizability and robustness. This corroborates previous research that emphasized the value of semantic technologies in enhancing risk assessment accuracy.

Secondly, the integration of the ERNIE-CNN model with knowledge graphs represents a significant advancement. ERNIE, known for its strong ability in capturing contextual information, when combined with CNN’s powerful feature extraction capabilities, demonstrates enhanced performance in identifying hidden features and patterns within the secondary power operation data. The incorporation of attention mechanisms further refines the model’s focus on critical features, leading to improved precision, recall, and F1-scores. This finding resonates with prior studies that explored the benefits of model fusion in various domains.

Moreover, the results provide empirical support for our working hypotheses. We hypothesized that by leveraging the semantic richness of knowledge graphs and the deep learning capabilities of ERNIE-CNN, the model would be better equipped to handle the complexities of risk assessment in secondary power operations. The achieved P, R, and F1 values of 0.878, 0.861, and 0.869, respectively, validate this hypothesis and underscore the model’s practical utility. The implications of these findings extend beyond the specific domain of secondary power operations. The proposed method offers a generalizable framework for risk assessment in other safety-critical domains where accurate and timely risk identification is crucial. By integrating knowledge graphs with advanced deep learning models, similar performance gains can be expected, highlighting the broad applicability of our approach.

Three key strengths emerge: (1) Explicit association between device operations and risk-clause ontologies resolves persistent “maintenance status misjudgment” issues in conventional methods; (2) attention mechanisms optimally balance contributions from textual and graph features; (3) support for incremental knowledge updates allows seamless integration of new standards without model retraining.

Two limitations warrant attention: (1) model performance depends heavily on knowledge graph completeness, and the ontology must be expanded manually for novel power electronic equipment; (2) the current architecture requires at least 500 labeled samples for fine-tuning, which limits transferability in data-scarce scenarios.

The proposed method incorporates scalability measures at three levels to address the dynamic growth of intelligent power grid data. At the data level, the hierarchical ERNIE-CNN architecture, with its parameter-sharing mechanism and multi-scale convolutional kernels, adapts to data distributions at varying scales and exhibits sublinear growth in computational complexity, which supports large-scale power text processing. At the knowledge level, the modular ontology of the knowledge graph enables block-wise updates: new entities are integrated through an automated pipeline (real-time data ingestion → semantic parsing → relational inference), in which incremental embedding algorithms (e.g., online variants of TransE) fine-tune local subgraphs without global reconstruction. At the computation level, dynamic subgraph pruning activates only task-relevant components during inference and is coupled with GPU memory-optimized sparse computation to minimize resource overhead. This tripartite design (data–knowledge–computation) allows the model to accommodate the continuous evolution of domain knowledge (e.g., new equipment types or failure patterns) while efficiently processing growing heterogeneous data streams in smart grid environments, which prepares the architecture for industrial deployment.
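The idea of activating only task-relevant parts of the graph during inference can be sketched as a k-hop neighborhood extraction around the entities mentioned in an operation text. This is one simple way to prune a subgraph, assumed here for illustration; the paper does not specify its pruning criterion, and the entity and relation names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def k_hop_subgraph(triples: List[Triple], seeds: Set[str], k: int) -> List[Triple]:
    """Keep only triples whose endpoints lie within k hops of a seed entity."""
    # Build an undirected adjacency map so both edge directions are followed.
    neighbors: Dict[str, Set[str]] = {}
    for head, _, tail in triples:
        neighbors.setdefault(head, set()).add(tail)
        neighbors.setdefault(tail, set()).add(head)

    # Breadth-first search, recording the hop distance of each reached entity.
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    return [(h, r, t) for h, r, t in triples if h in depth and t in depth]


# Hypothetical toy graph, not taken from the paper's knowledge graph.
graph = [
    ("relay_protection_device", "installed_at", "220kV_substation"),
    ("relay_protection_device", "has_risk", "communication_anomaly"),
    ("communication_anomaly", "requires", "human_review"),
    ("busbar", "located_in", "110kV_substation"),
]
one_hop = k_hop_subgraph(graph, {"relay_protection_device"}, k=1)
two_hop = k_hop_subgraph(graph, {"relay_protection_device"}, k=2)
print(len(one_hop), len(two_hop))
```

Only the pruned triples would then be passed to the knowledge encoder, so unrelated components such as the busbar triple never enter the computation.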

In secondary power operation risk assessment, model errors could have severe consequences, such as equipment misoperation or undetected failures. For example, misclassifying “communication anomalies in relay protection devices” as low risk might delay fault resolution and trigger cascading outages. To address this, our methodology employs two safeguards to balance automation and safety: (1) domain-specific rules embedded in the knowledge graph (e.g., “protection device anomaly → mandatory human review”) act as a logic validation layer that automatically escalates cases where model predictions conflict with predefined rules or fall below confidence thresholds; (2) for ambiguous equipment status descriptions, the model’s visualized knowledge subgraphs explicitly show the entity relationship paths supporting each risk decision, enabling operators to integrate multi-source data for informed judgments. The resulting multi-stage workflow (“model screening → knowledge validation → human confirmation”) has intercepted misjudgments caused by environmental interference. Additionally, decision traceability logs document discrepancies between model suggestions and human overrides, ensuring accountability and transparency while maintaining grid safety standards.
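The escalation logic of the first safeguard can be sketched as follows. This is a simplified illustration: the rule is stored as a plain keyword trigger rather than in a knowledge graph, and the threshold value of 0.80, the `Prediction` fields, and the routing labels are all assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stand-in for rules stored in the knowledge graph.
MANDATORY_REVIEW_TRIGGERS: Tuple[str, ...] = ("protection device anomaly",)
CONFIDENCE_THRESHOLD = 0.80  # assumed value for illustration


@dataclass(frozen=True)
class Prediction:
    description: str   # operation text that was assessed
    risk_level: str    # model output, e.g. "III", "IV", "V"
    confidence: float  # model probability for the predicted level


def route(prediction: Prediction,
          threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[str, str]:
    """Return (route, reason): escalate to a human or accept automatically."""
    text = prediction.description.lower()
    # Rule check comes first: a triggered rule overrides model confidence.
    for trigger in MANDATORY_REVIEW_TRIGGERS:
        if trigger in text:
            return "human_review", f"rule: {trigger}"
    if prediction.confidence < threshold:
        return "human_review", "confidence below threshold"
    return "auto_accept", "no rule triggered"


by_rule = route(Prediction("Protection device anomaly on 220 kV line", "IV", 0.97))
by_confidence = route(Prediction("Fault recorder system modernization", "IV", 0.62))
accepted = route(Prediction("Busbar protection system retrofit", "IV", 0.93))
print(by_rule, by_confidence, accepted)
```

Returning the reason alongside the route gives the traceability log something concrete to record when an operator later overrides or confirms the model.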

Future work will prioritize real-world deployment; the model will be adapted to edge computing devices for real-time processing and integrated with grid monitoring systems (e.g., SCADA) through compatible data protocols to synchronize equipment status time-series data (e.g., circuit breaker operations, insulation metrics). The knowledge graph will leverage stream processing frameworks to dynamically ingest inspection reports and fault records, using domain-specific entity recognition models to extract critical information aligned with monitoring data. Deployment will follow a phased approach: First, a digital twin system will simulate regional grid disturbances in lab environments to validate robustness. Next, lightweight API modules will be piloted to map model risk outputs into standardized alerts while developing interactive interfaces for human-in-the-loop verification. Full-scale deployment will require industry certifications for functional safety and cybersecurity compliance. Future efforts will address real-world challenges such as data noise filtering, multi-system time synchronization, and online incremental learning to transition the experimental model into a trusted industrial solution.

6. Conclusions

This study addresses critical technical challenges in secondary power operation risk assessment through three key innovations. First, by integrating power industry expertise with equipment nomenclature standards, we construct a modular knowledge graph defining 12 core entity types, 45 semantic relationships, and an eight-layer ontology structure. An annotated corpus derived from grid enterprises helps overcome the semantic limitations of traditional approaches that depend on static data. Second, the ERNIE-CNN framework is augmented with dual attention mechanisms (C-ST and C-CS), and the semantic alignment layer dynamically maps non-standard operational descriptions to knowledge graph entities. The text encoding layer integrates character-level and concept-level embeddings through the Transformer architecture and multi-head attention, achieving a 4.0-percentage-point F1-score improvement in implicit feature recognition across 3240 operational records, consistent with the gain from 0.829 for ERNIE alone to 0.869 for the full model in Table 4. Third, we propose an end-to-end knowledge graph construction workflow encompassing ontology design, semantic parsing, incremental embedding, and Neo4j-based visual decision path tracing. The knowledge encoding layer employs feature selection mechanisms to enable automated knowledge injection from real-time operational data. Experimental validation yields an F1-score of 0.869, demonstrating the method’s industrial applicability in equipment state feature extraction and decision path analysis.

Future research can further explore methods for constructing and optimizing knowledge graphs to improve their accuracy and completeness. For instance, additional data sources and algorithms could enrich the content of the knowledge graph, and more advanced graph embedding techniques could enhance its representational power. Integrating our method with other advanced technologies could further improve the accuracy and efficiency of risk assessment. The method could also be applied to other domains that require accurate risk assessment, such as finance and healthcare, where it can provide technical support.

In summary, the method proposed in this paper offers a novel approach to risk assessment in power systems, with significant theoretical and practical implications. Future research can build on this foundation to conduct deeper explorations and innovations, driving the development and application of risk assessment technologies forward.

Author Contributions

Conceptualization, X.H.; methodology, X.H. and Y.W.; software, P.L. and X.R.; validation, P.L.; formal analysis, X.R. and X.H.; investigation, Z.Z.; resources, Y.W.; data curation, P.L.; writing—original draft preparation, X.H.; writing—review and editing, P.L., G.L. and X.R.; visualization, Y.W.; supervision, X.R.; project administration, Z.Z.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by State Grid Jiangsu Electric Power Company Science and Technology Project grant number J2023106.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study.

Conflicts of Interest

Authors Xiang Huang, Ye Wang, and Xuchao Ren were employed by the company State Grid Jiangsu Electric Power Company. Author Ping Li was employed by the company State Grid Jiangsu Electric Power Co., Ltd. Huaian Power Supply Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. The model framework.

Figure 2. CNN text extraction network (two channels for an example sentence).

Figure 3. The ERNIE architecture (arrows indicate process).

Figure 4. Two mechanisms of attention.

Figure 5. Knowledge graph for secondary power operation.

Table 1. Experimental environment.

Component	Configuration
Operating System	Ubuntu
GPU	NVIDIA Titan Xp
Development	PyTorch 1.8.1, Python 3.7
Acceleration	CUDA 10.2, cuDNN 10.2

Table 2. Examples of risk assessment for secondary operations.

No.	Voltage Level	High-Risk Operation Content	Risk Assessment
1	1000 kV	Verification of the first Huai-Hu ultra-high voltage automatic control and switching device	III
2	500 kV	UPFC Protection System Retrofit (with 2-out-of-3 voting logic)	III
3	220 kV	Circuit Breaker Protection Retrofit	III
4	220 kV	Fault Recorder System Modernization	IV
5	220 kV	Routine maintenance operations excluding relay protection retrofits, testing and defect rectification	V
6	110 kV	Busbar Protection System Retrofit	IV
7	110 kV	Common Equipment Retrofit (including fiber optic distribution panels and protection data concentrators)	V
8	35 kV and below	Reactor Protection Retrofit	IV
9	35 kV and below	Merging Unit and Secondary Circuit Defect Rectification	V

Table 3. Experimental parameters.

Parameter	Value
Batch size	16
Learning rate	0.00005
Max length	80
Dropout rate	0.5
Number of epochs	30
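
The settings in Table 3 could be captured in a small configuration object, sketched below. The class and field names are illustrative rather than taken from the authors' code, and the steps-per-epoch figure is only an upper bound, since it assumes that all 3240 records are used for training and the actual train/test split is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 3 (field names are illustrative)."""
    batch_size: int = 16
    learning_rate: float = 5e-5   # 0.00005 in Table 3
    max_length: int = 80          # maximum input sequence length
    dropout_rate: float = 0.5
    num_epochs: int = 30


def steps_per_epoch(num_samples: int, config: TrainConfig) -> int:
    """Number of mini-batches needed to cover the samples once."""
    return math.ceil(num_samples / config.batch_size)


config = TrainConfig()
# Upper bound only: assumes every one of the 3240 records is a training sample.
max_steps = steps_per_epoch(3240, config)
print(config, max_steps)
```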

Table 4. Experimental Results.

Model	P	R	F1
TextCNN	0.714	0.701	0.720
TextRCNN	0.722	0.712	0.717
TextRNN	0.748	0.755	0.751
FastText	0.756	0.764	0.760
BERT	0.801	0.817	0.809
ERNIE	0.825	0.833	0.829
GraphSAGE	0.763	0.755	0.760
GAT	0.845	0.838	0.868
Ours	0.878	0.861	0.869

Table 5. Ablation experiment (“✓” indicates the module is enabled, “–” denotes disabled).

Text Encoding Module	Knowledge Encoding Module	Attention	P	R	F1
✓	–	–	0.825	0.833	0.829
–	✓	–	0.766	0.759	0.762
✓	✓	–	0.857	0.840	0.848
–	✓	✓	0.780	0.774	0.777
✓	✓	✓	0.878	0.861	0.869
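
The ablation rows in Table 5 can be read as incremental F1 gains. The short sketch below computes them directly from the table values; the dictionary layout and variable names are illustrative.

```python
# F1-scores from Table 5, keyed by (text encoding, knowledge encoding, attention).
ablation_f1 = {
    (True, False, False): 0.829,
    (False, True, False): 0.762,
    (True, True, False): 0.848,
    (False, True, True): 0.777,
    (True, True, True): 0.869,
}

# Gain from adding the knowledge encoding module to the text encoder.
gain_knowledge = ablation_f1[(True, True, False)] - ablation_f1[(True, False, False)]
# Gain from adding attention on top of both encoders.
gain_attention = ablation_f1[(True, True, True)] - ablation_f1[(True, True, False)]
# Overall gain of the full model over the text encoder alone.
gain_total = ablation_f1[(True, True, True)] - ablation_f1[(True, False, False)]

print(f"knowledge: +{gain_knowledge:.3f}, attention: +{gain_attention:.3f}, "
      f"total: +{gain_total:.3f}")
```

On these numbers, the knowledge encoding module and the attention mechanism contribute gains of similar size (0.019 and 0.021), which together account for the 0.040 improvement over the text encoder alone.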

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
