A Knowledge Graph Method towards Power System Fault Diagnosis and Classiﬁcation

: As the scale and complexity of electrical grids continue to expand, the necessity for robust fault detection techniques becomes increasingly urgent. This paper seeks to address the limitations in traditional fault detection approaches, such as the dependence on human experience, low efﬁciency, and a lack of logical relationships. In response, this study presents a cascaded model that leverages the Random Forest classiﬁer in combination with knowledge reasoning. The proposed method exhibits a high efﬁciency and accuracy in identifying six basic fault types. This approach not only simpliﬁes fault detection and handling processes but also improves their interpretability. The paper begins by constructing a power fault simulation model, which is based on the IEEE 14-bus system. Subsequently, a Random Forest classiﬁcation model is developed and compared with other commonly used models such as Support Vector Machines (SVMs), k-Nearest Neighbor (KNN), and Naïve Bayes, using metrics such as the F1-score, accuracy, and confusion matrices. Our results reveal that the Random Forest classiﬁer outperforms the other models, particularly in small-sample datasets, with an accuracy of 90%. Then, we apply knowledge mining technology to create a comprehensive knowledge graph of power faults. At last, we use the transE model for knowledge reasoning to enhance the interpretability to assist decision making and to validate its reliability.


Introduction
The modern world is heavily reliant on electrical power, making it a critical component of our infrastructure [1].However, power systems are susceptible to various types of faults, including short circuits, voltage fluctuations, equipment failures, and environmental disturbances [2].These faults can lead to power outages and equipment damage, and even pose risks to public safety [3].The economic and social consequences of power faults are significant, emphasizing the need for robust fault detection and diagnosis methods.
Efficient fault detection techniques are essential to mitigate the impact of power faults.Rapid identification and localization of faults allow power grid operators to take corrective actions promptly, reducing downtime and minimizing economic losses [4].Furthermore, early fault detection can prevent cascading failures and ensure the stability of the entire power system.Therefore, the development of accurate and reliable fault detection methods is of utmost importance in the field of power system engineering.
Various methods have been proposed for power fault detection over the years [5], encompassing a wide spectrum of approaches.These range from traditional techniques like rule-based systems, which rely on predefined heuristics and thresholds, to more advanced methodologies such as machine learning and data-driven methods [6].Expert systems, for instance, are designed to harness human knowledge by encoding expert rules, enabling them to detect faults with a certain degree of expertise [7].
Support Vector Machines (SVMs), on the other hand, provide a potent tool for classification tasks, including fault detection, by identifying optimal hyperplanes for separating different classes [8].Recent years have witnessed the ascendancy of Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs), particularly in the realms of imageand graph-based data, where their capabilities have shone in power fault detection [9,10].
Naive Bayes, rooted in the principles of Bayes' theorem, employs probabilistic reasoning to make predictions based on statistical probabilities, making it a valuable addition to the arsenal of fault detection methods [11].On the other hand, the simplicity and effectiveness of k-Nearest Neighbor (KNN) lie in its ability to classify data points based on the majority class among their nearest neighbors, a feature that makes it an attractive option for many applications, including power fault detection [12].
Despite their promise, each of these methods comes with its own set of limitations that can impede their effectiveness.Traditional rule-based systems, characterized by their reliance on predefined heuristics, can struggle to adapt to dynamic conditions and are often ill suited to the management of complex, interconnected power systems.Deep learning techniques like CNNs and GNNs frequently require large quantities of labeled data for training, posing significant challenges in the context of power system fault detection.Furthermore, the performance of KNN can be highly sensitive to the choice of distance metric and the number of neighbors selected, making it important to fine-tune these parameters for optimal results.Additionally, one common drawback shared by many of these methods is their lack of interpretability, making it challenging to gain insights into the underlying causes of detected faults.
While these methods have advanced the field of power fault detection, the need for fault detection techniques that are both interpretable and adaptable to the complex, evolving nature of modern power systems remains unmet.The inability to gain a deep understanding of why certain faults are detected and the challenges in adapting existing methods to new conditions underscore the necessity for innovative approaches.
This paper introduces a novel approach to power fault detection that aims to overcome the limitations of existing methods.As shown in Figure 1, our proposed method leverages the power of knowledge graphs, which allows us to represent and model the complex relationships and dependencies within power systems.By capturing the knowledge of experts and historical data, we create a structured representation of the power grid, making it easier to identify and analyze faults.
To enhance the accuracy and robustness of our approach, we integrate knowledge graphs with the power of random forests, a machine learning ensemble method.This combination allows us to harness the strengths of both approaches, leveraging the structured knowledge encoded in the graph while benefiting from the predictive power of random forests.The result is a powerful and adaptable fault detection model that can handle the intricacies of modern power systems.
In the subsequent sections of this paper, we will delve into the details of our proposed knowledge-graph-based fault detection method, its implementation, and its performance evaluation.We believe that this approach holds the potential to significantly improve the reliability and efficiency of power fault diagnosis, ultimately contributing to the stability and resilience of our electrical power systems.

Power Fault Model
With the continuous development of the power industry, the scale of power systems is getting increasingly larger.In order to avoid the limitations of the scale and complexity of the power system, and for the safety and reliability of the experiment, MATLAB simulations are employed for modeling and analyzing the actual operation of the power system.The IEEE 14-bus system is a commonly used standard test system in power systems for simulating and analyzing various phenomena, such as stability, power flow, and power faults [13].It consists of 14 nodes, including 4 generator nodes, 3 load nodes, and 7 switch nodes.Through simulation, we can more intuitively understand the operation of the power system and can easily measure data under various fault conditions [14].

Structures and Elements
The various components of the power grid include generators, transformers, switchgear, and transmission lines [15].Generators are devices used to convert other energy sources into electricity, transformers are devices used to enhance or reduce current and voltage, switchgear is used to control the flow of electricity and maintain the proper operation of the equipment, and transmission lines are devices used to transport electricity from the power plant to the substation and then through the distribution grid to the consumer.All of these components need to maintain a high degree of stability and reliability during operation to ensure the normal operation of the power system.
In this paper, we have extracted the constituent elements of the grid, including the generators, transformers, transmission lines, low-voltage lines, and power-using elements, and then modelled these structures.

Fault Classification
A power failure refers to a problem or malfunction in the power supply system that results in the inability to transmit or provide electricity effectively.As shown in Figure 2, power failures can typically be categorized into two aspects: short circuit faults and open circuit faults [16].

Short Circuit Fault
A short circuit fault is an electrical fault that occurs when there is an unintended low-resistance pathway for the electrical current to flow, leading to overheating, sparks, fire, or even electrical shock hazards.Short circuit faults can be classified into two main types.
Symmetrical Faults: Also known as balanced faults, symmetrical faults occur when all three phases of an electrical system experience the same level of fault.This can happen due to reasons such as conductor insulation failure, direct contact between conductors of different phases, or faulty equipment.
Unsymmetrical Faults: Unsymmetrical faults, also known as unbalanced faults, occur when the fault affects one or more phases of the electrical system differently.This can occur due to reasons such as phase-to-phase or phase-to-ground faults, faulty insulation, or equipment failure.

Open Circuit Fault
An open circuit fault refers to a discontinuity or break in an electrical circuit which prevents the flow of current.This interruption can occur in one or multiple conductors within the circuit, leading to different types of open circuit faults.
One-conductor open fault: In this type of fault, a single conductor in the electrical circuit is broken or disconnected, causing a gap in the flow of electricity.As a result, current cannot pass through the affected conductor, and the circuit becomes incomplete.This fault could be due to a broken wire, a loose connection, or a faulty component.
Two-conductor open fault: Unlike the previous fault, a two-conductor open fault involves the simultaneous breakage or disconnection of two conductors in the circuit.As a result, two separate gaps are created, preventing the flow of current through both conductors.This fault could occur due to multiple causes, such as physical damage, incorrect wiring, or faulty components.

Model Construction
The power system model contains a three-phase source, connected to an RLC load (three-phase series RLC load), connected to a three-phase transformer (three-phase transformer) via a three-phase PI section Line, and finally connected to a three-phase transformer by a three-phase PI section Line.The three-phase PI section line is connected to the threephase transformer.Finally, the three-phase fault and three-phase breaker simulate various faults, and the real-time parameters are measured by an oscilloscope to analyze the voltage of each phase sequence of the fault [17].

Power Fault Graph
The knowledge graph of a power fault is a comprehensive representation of various aspects related to electricity disruptions [18].It encompasses the causes, consequences, and preventive measures associated with power outages.Through interconnected nodes and links, the graph organizes and visualizes the intricate network of factors, such as equipment malfunctions, natural disasters, and human errors, that can result in power failures.This knowledge graph provides a valuable resource for understanding and analyzing power disruptions, aiding in their prevention and facilitating prompt resolution when they occur [19].

Graph Construction
Constructing a power fault knowledge graph involves four essential steps: data acquisition, knowledge extraction, integration, and quality improvement [20,21].The process is shown in Figure 3.
Data acquisition involves gathering information from diverse sources such as books, the internet, and expert experience to obtain domain-specific knowledge on power faults.
Knowledge extraction employs techniques like crawling, parsing, and fact extraction.Crawling systematically browses sources, parsing organizes and structures data, and fact extraction identifies specific factual information related to power faults.
Integration combines the extracted information to construct a coherent and interconnected knowledge graph.This process includes knowledge linking, establishing relationships between information, and knowledge fusion, merging diverse knowledge sources to ensure consistency and avoid redundancy.
Quality improvement enhances the constructed graph's overall quality.It includes knowledge completion and correction.Knowledge completion fills gaps or missing information by leveraging additional sources or expert input, while knowledge correction rectifies inaccuracies or errors for reliable information.

Power Fault Knowledge Graph Framework
The electric power domain's knowledge graph framework consists of four indispensable components: the equipment entity graph, the concept graph, the business logic graph, and the fault case graph [22].The equipment entity graph comprehensively captures detailed information about physical assets and devices within the power system, thus enabling a holistic view of the infrastructure.
Incorporating semantic knowledge, the concept graph augments the power fault knowledge graph by defining relationships and associations between equipment entities, significantly enhancing reasoning capabilities.Meanwhile, the business logic graph integrates operational rules and regulations, providing guidance for the power system's operation, monitoring, and maintenance practices.
Furthermore, the fault case graph serves as a repository of historical and simulated data, housing fault records, potential causes, and diagnostic outcomes.This repository facilitates timely fault detection, diagnosis, and resolution.
Collectively, these components synergize to create a comprehensive and dynamic power fault knowledge graph.This knowledge graph is pivotal in optimizing system performance and bolstering the reliability of power systems.

Random Forest Algorithm
In recent years, machine learning techniques have gained significant attention in the field of power fault detection due to their ability to handle complex and non-linear patterns in power system data.The Random Forest (RF) classifier, a popular ensemble learning algorithm, has shown promise in this regard for its ability to handle high-dimensional power fault data [23].This section explores the application of RF classifiers for power fault detection.We delve into the principle of the algorithm, discuss parameter settings, and present experimental results, including comparative experiments with other algorithms, along with the evaluation of their performance.

Principle of the Algorithm
Random Forest is an ensemble learning algorithm that is widely used for classification tasks.It is based on the idea of decision trees and combines multiple trees to make robust predictions.The key principles of the Random Forest algorithm are as follows.
Decision trees: Random Forest is built upon decision trees, which are simple models that partition data into subsets based on feature values.Each tree learns from a random subset of the training data and features, making them less prone to overfitting [24].
Bootstrap aggregating (bagging): Random Forest employs a technique known as bagging, where multiple decision trees are trained independently on different subsets of the training data with replacements.This diversity helps reduce variance and improve overall accuracy.
Random feature selection: Another crucial aspect of Random Forest is the random selection of a subset of features at each node of the tree.This randomness further reduces the correlation among the individual trees and leads to better generalization.
Voting or averaging: In the classification task, Random Forest combines the predictions of individual trees by either majority voting (for classification) or averaging (for regression), resulting in a final prediction.

Parameter Settings
To effectively apply Random Forest to power fault detection, appropriate parameter settings must be chosen.Some key parameters are listed below (Table 1).

Experimental Design
To assess the effectiveness of Random Forest in power fault detection, we conducted experiments and compared its performance with other algorithms commonly used in this domain, including Support Vector Machines (SVMs), k-Nearest Neighbor (KNN) and the Bayesian classifier (BC).Due to the small sample size, a four-fold cross-validation technique was employed to maximize data utilization and mitigate overfitting.The performance of each algorithm was evaluated using various metrics such as the accuracy, precision, recall, F1-score, and confusion matrix (shown in Figures 4-7

Results
We obtained 66 sets of fault data after simulation using a power fault model.We used 30% of the data as a test set and the rest as a training set.The results are as follows (Table 2).Our experimental results indicate that Random Forest classifiers exhibit superior performance in power fault detection compared to the alternative algorithms.In conclusion, the application of Random Forest classifiers in power fault detection has shown promising results, owing to its robustness, generalization capabilities, and ease of parameter tuning.

Statistical Analysis
In order to test whether the RF algorithm is statistically significantly different from the other algorithms, we employed the Friedman test to assess the statistical significance.Subsequently, Nemenyi's follow-up test was conducted to determine significant differences between algorithm pairs.Table 3 presents the results of Friedman's test for the accuracy, recall, precision, and F1-score metrics.
The p-values in Table 3 indicate significant differences (p < 0.05) among the algorithms in terms of classification performance on the four data subsets.Based on this observed significance, the Nemenyi follow-up test examines specific algorithm pairs.Figures 8-11 display Nemenyi test results, revealing that RF significantly outperforms other algorithms in accuracy, recall, and the F1-score.Notably, RF exhibits superior performance compared to SVMs and the other two algorithms across various metrics.
In conclusion, the statistical analyses confirm the superiority of the proposed RF algorithm over alternative methods, demonstrating an enhanced classification performance.

Knowledge-Based Decision Making
In this chapter, we employed the transE algorithm, a knowledge reasoning technique, on the power fault knowledge graph that was previously constructed.Through the training process, we acquired embedded representations of diverse entities and relationships present in the knowledge graph.These embedded representations encapsulate essential features and semantics of the entities and relationships, enabling effective reasoning and analysis [25,26].

TransE Algorithm
The widely used transE [27] algorithm maps entities and relations to low-dimensional vector representations, enabling semantic representation and inference in knowledge graphs.Inspired by translation invariance in word vectors, transE represents the relationship between entities as the vector difference between them, resulting in a simple and efficient algorithm.During model training, the algorithm learns certain semantic information.Given a triad (h, r, t), the goal of transE is to make the vector h + r approximate the vector t, representing the transfer of h to r.This relationship is depicted in the following Figure 12.

Knowledge Reasoning
To utilize the obtained embeddings, we use the scoring function, denoted as f (h, r, t), which takes a fault type as the head entity [28].The relationship, denoted as r, can be selected from fault characteristics, suggested solutions, or historical cases, depending on the analysis requirements.By inputting the fault type and relationship into the scoring function, we obtain a score for each potential tail entity.
The selected tail based on the obtained scores entity offers abundant information pertaining to the specific fault being investigated.By leveraging this approach, we enhance the comprehensive understanding of power faults, enabling more effective fault diagnosis, analysis, and decision making [29].

Result
After training with the transE algorithm, we obtained the embedding representations of the power fault knowledge graph.While receiving the fault types passed by the Ran-dom Forest classifier, the scoring function calculates the scores for the possible outcomes and takes the highest scoring outcome as the output.We will use MR, MRR, HITS@1, and HITS@3 as evaluation metrics, and the following Table 4 displays the results of this approach [30].The results show that this method performs well.The model achieved a Mean Rank (MR) of 1.672, signifying the model's ability to rank correct answers higher within a list of possibilities.Furthermore, the HITS@3 score of 0.867 reflects the model's strong performance in correctly predicting relationships, with the correct answer frequently ranking within the top three.This is due to the fact that power fault knowledge graphs are mostly one-to-one relationships well suited for transE.

Power Fault Diagnosis and Decision-Making Model
These diverse model's results were integrated into a robust and all-encompassing framework for fault classification, detection, and decision support within the intricate domain of electric power systems.The primary dataset comprises power fault data, meticulously collected and preprocessed, which are then subjected to an advanced machine learning approach, namely the Random Forest classifier.This classification step serves as a crucial initial gatekeeper, as it effectively discerns the specific category or nature of the fault occurrence, allowing for precise diagnosis.
Subsequently, the identified fault type becomes the linchpin of a more elaborate knowledge reasoning model.This model acts as an intellectual cornerstone, facilitating operators and engineers to access many domain-specific insights and recommendations.These insights include historical patterns, potential root causes, recommended mitigation strategies, and even expert guidance on handling similar incidents.This dynamic fusion of data-driven classification and knowledge-driven reasoning empowers decision makers to make well-informed choices and respond swiftly to electrical power faults.
This comprehensive end-to-end solution optimizes the entire workflow, streamlining the process from raw fault data to effective solutions.In doing so, it significantly elevates the operational efficiency, reduces downtime, and ultimately contributes to the reliability and sustainability of the electric power infrastructure.

Discussion
We propose a combined model.It enables automatic output of the corresponding fault type and various information related to the fault based on the observed time-series voltage data.One of the primary contributions of our research is the substantial improvement in fault diagnosis accuracy.Our integrated model, which employs a Random Forest classifier as a fault detection algorithm, has demonstrated a substantial enhancement in the precision of fault identification.Through extensive testing and validation, we have consistently achieved accuracy rates exceeding 90%, outperforming modern machinelearning-based approaches.
Another important result is the successful incorporation of domain knowledge into our fault diagnosis process.Our model's knowledge base includes a comprehensive repository of historical data, industry standards, and expert-generated rules.This ensures that our system not only detects faults but also provides a more contextually relevant understanding of the issues at hand.We believe this integration of domain knowledge sets our approach apart from other purely data-driven methods and significantly contributes to the interpretability and reliability of our results.
Our method also excels in terms of interpretability, which is essential for power grid operators who need to understand the reasoning behind fault diagnoses.The system provides detailed explanations for its decisions, showing how it arrived at a particular diagnosis.This transparency allows operators to trust and act upon the provided insights, reducing the risk of human-machine miscommunication.
Furthermore, our solution's scalability is a key feature.We have designed the system architecture to be modular and flexible, allowing it to seamlessly incorporate improvements in fault detection and knowledge-reasoning algorithms.As these algorithms evolve and improve, our model's accuracy and effectiveness will continue to increase.This scalability ensures that our system can adapt to the evolving demands of the power industry, which is characterized by an ever-expanding power grid and the constant need for efficient fault diagnosis and handling.It also makes our solution future-proof, positioning it to play a pivotal role in the ongoing transformation of the power sector.

Limitations
Limited by experimental conditions, the fault data in this paper are mainly from the simulation of power system faults in the IEEE 14-bus system, and thus the data quality is not high enough.However, due to the good robustness and generalization ability of the Random Forest classifier, the performance would be maintained if trained with better quality data.In addition, we have only explored the voltage data observed under the six basic fault types, and although this essentially covers most fault situations, other fault types, such as complex faults caused by combinations of several basic fault types, still exist.There are limitations under some certain situations.

Conclusions
In conclusion, our study has introduced a novel combined model that leverages the synergy between knowledge graphs and machine learning algorithms, achieving a higher level of integration and automation compared to prior research efforts.This combined model integrates two specialized sub-models for fault detection and knowledge reasoning, both of which have demonstrated a strong performance.
The key contributions of our research lie in providing end-to-end solutions for the classification, diagnosis, and handling of most basic fault types.As a modular model, it excels in terms of modularity, specialization, and flexibility, allowing for easy adaptability to various fault scenarios.
While our model showcases advantages in interpretability and scalability, there is room for improvement.Future research directions include enhancing the quality and scale of our dataset, improving the knowledge reasoning model to accommodate complex relationships, refining the fault detection algorithm for greater accuracy, and extending the model's capabilities to diagnose and handle more complex fault types.Addressing these areas in future work will further bolster the model's performance and expand its applicability to a broader range of fault scenarios, contributing to the ongoing advancement of fault detection and handling in power systems.
).All simulation experiments were conducted in the Python-PyCharm Community, Edition 2022, and the MATLAB 2022b environment on a computer featuring a 5.4 GHz Intel (R) Core (TM) i9-13900 CPU, 16.0 GB RAM, and 64-bit Windows 11.

Figure 4 .
Figure 4. Confusion matrix of the RF model.

Figure 5 .
Figure 5. Confusion matrix of the SVM model.

Figure 6 .
Figure 6.Confusion matrix of the KNN model.

Figure 7 .
Figure 7. Confusion matrix of the BC model.

Table 1 .
Parameters settings of random forest classifier.

Table 2 .
Experimental results of four methods.

Table 3 .
The Friedman's test of the classification capability of all algorithms.

Table 4 .
Experimental results of transE.