1. Introduction
As the terminal link in the power system, the distribution network directly affects the reliability of power supply and the continuity of electricity use for users [1]. As distribution networks continue to expand, their complexity and vulnerability have increased significantly. Currently, over 80% of grid faults occur in distribution networks [2,3]. These faults not only disrupt power supply and economic activities but also pose serious threats to public safety and daily life. As a result, many countries and regions have introduced policies and measures to enhance the resilience of distribution systems, including promoting smart grid development, facilitating the integration of distributed renewable energy, and imposing stricter requirements on power supply reliability indicators such as the System Average Interruption Frequency Index (SAIFI). In addition, the objective of fault handling has shifted from ‘sustainable operation for 2 h’ to ‘local isolation of permanent small-current ground faults,’ which imposes higher demands on rapidly locating fault areas. These policy directions highlight the urgency of developing efficient and accurate fault location methods to ensure the stable operation of power systems and meet modern reliability standards. Therefore, rapidly and accurately locating fault sections has become a critical issue, especially in large-scale, complex distribution grids where various types of faults coexist and further complicate identification [4].
Existing methods for fault section localization in distribution networks can be broadly classified into two categories: knowledge-driven and data-driven approaches [
5]. Knowledge-driven methods rely on diagnostic rules, physical principles, and fault models to infer fault locations. These techniques typically analyze network topology, equipment status, and historical fault data, in combination with operational experience and fault mechanisms, to determine fault segments [
6]. Several representative knowledge-driven methods have been proposed. For example, reference [
7] extracts the volt-ampere characteristics and zero-sequence power vectors based on multiple transient features and uses fuzzy C-means clustering to locate the fault section, effectively avoiding threshold dependency issues. It achieves 100% discrimination accuracy under high-resistance grounding and different compensation conditions, but its adaptability still needs to be verified in multi-feeder structures. In [
8], a method based on the centroid frequency of zero-sequence current and K-means clustering is proposed, demonstrating certain fault tolerance but risking identification failure in special scenarios. Reference [
9] utilizes the discrete Fréchet distance metric to measure the differences between transient and steady-state waveforms, and achieves segment localization through FCM algorithm clustering, with a localization accuracy rate of 97.2%. It demonstrates strong robustness under complex operating conditions, particularly exhibiting good adaptability to high-impedance ground faults. Furthermore, [
10] introduces an improved admittance model accounting for three-phase zero-sequence asymmetry and proposes a phase-angle difference-based line selection criterion, validated through both simulations and field experiments. In [
11], variational mode decomposition and a peak–trough detection algorithm are used to extract concavity features of zero-sequence current, forming kurtosis and concavity-based criteria applicable to various fault types. Reference [
12] combines closed–open difference operations with graph theory and shortest path theory and uses an improved binary ocean predator algorithm to search for faulty feeders in the candidate solution space, achieving 100% accuracy in a variety of fault scenarios. In summary, knowledge-driven methods offer strong interpretability and fast localization for conventional fault scenarios, leveraging physical mechanisms and domain expertise. However, their adaptability and accuracy are limited in handling complex and diverse fault conditions, restricting their applicability in real-world distribution networks.
Data-driven methods aim to localize faults in distribution networks by leveraging real-time measurements and applying machine learning and data mining techniques. These approaches do not depend on expert knowledge or predefined diagnostic rules; instead, they rely on large volumes of historical data to train models that automatically identify and classify fault types, thereby partially overcoming the limitations of knowledge-driven methods [
13]. Several representative studies demonstrate the application of deep learning in this context. In [
14], a spatiotemporal graph convolutional network incorporating attention mechanisms is developed by combining a graph attention module with a one-dimensional convolutional neural network (1D-CNN). This model enables the fusion of multi-source telemetry data and node mapping, exhibiting strong adaptability to diverse operational scenarios and generalization across network topologies. Reference [
15] constructs an image feature matrix using wavelet packet transformation, extracts image features using a fine-tuned AlexNet network, and combines multiple classifiers for fault segment identification, fully utilizing the advantages of convolutional networks in extracting image information. The algorithm achieves an accuracy rate of 99.92%. Reference [
16] uses an edge computing architecture to construct a multi-objective optimization model and trains a deep neural network based on steady-state changes in phase current to achieve rapid fault location and improve model deployment flexibility. In multiple experiments, the accuracy rate reached over 95%. Reference [
17] designed an attention network structure to enhance the model’s adaptability to the distribution network topology, introduced a consistency risk control strategy to suppress inference bias, and improved the reliability and robustness of predictions, maintaining an accuracy rate of over 98% in a variety of scenarios. Reference [
18] constructs a two-dimensional image based on the Gram angle, and through image fusion and a dual-channel convolutional neural network, jointly extracts time-domain and image features to achieve collaborative learning of high-dimensional features and fault line screening, with an accuracy rate of 96.04%. In [
19], complete ensemble empirical mode decomposition is applied to obtain intrinsic mode functions of zero-sequence currents from each line, which are concatenated into a time–frequency matrix. A convolutional neural network is then used to automatically extract discriminative fault features. Reference [
20] employs discrete wavelet transform to extract instantaneous-voltage statistical features and integrates multiple artificial neural networks to improve fault phase identification. However, this method faces limitations in accurately determining fault propagation paths. Overall, data-driven methods offer strong automation and adaptability, allowing for the detection of subtle fault features in complex power system environments and improving localization accuracy. Nevertheless, their performance heavily relies on data quality and can be vulnerable to interference, which may hinder their robustness in real-world scenarios.
In summary, although both knowledge-driven and data-driven methods have demonstrated effectiveness in fault section localization within distribution networks, notable limitations persist in real-world applications. Knowledge-driven approaches are built upon expert heuristics and physical mechanism models, offering strong theoretical interpretability. However, these methods often struggle to generalize to the highly dynamic and complex operating conditions of distribution networks. Their rule-based frameworks typically fail to comprehensively address novel or compound fault scenarios, leading to insufficient adaptability.
Conversely, data-driven approaches leverage large volumes of operational data to perform fault localization through data modeling and algorithmic inference. These methods exhibit considerable flexibility and are capable of capturing intricate fault characteristics. Nevertheless, they often lack an in-depth understanding of the underlying electrical mechanisms. Their decision-making is based solely on patterns learned from data, which increases the risk of misclassification, especially when training datasets are incomplete or biased. Furthermore, model parameters in data-driven approaches generally lack physical interpretability [
21,
22], rendering the inference process opaque and undermining trust in high-stakes applications, such as those demanding stringent safety and reliability in power systems. To address these challenges, knowledge–data hybrid approaches have emerged as a promising direction. These methods integrate the explanatory strength of physics-based models with the learning capacity of data-driven algorithms [
23,
24]. By embedding expert knowledge and system mechanisms into the diagnostic framework, they enhance the credibility of data interpretation and improve the controllability of the inference process. This fusion not only strengthens model adaptability under boundary conditions and extreme fault cases but also improves interpretability and practical relevance, establishing it as a key research trend in intelligent fault localization for distribution networks.
In practice, the reliable operation of distribution networks requires not only accurate fault localization but also models with strong robustness and interpretability. Existing knowledge-driven approaches often fail to adapt to complex and dynamic operating conditions, while purely data-driven methods may lack physical interpretability and be sensitive to noise and incomplete training data. Therefore, there is a practical necessity to develop hybrid approaches that integrate domain knowledge with data-driven learning. In response to the aforementioned issues, this study proposes a hybrid fault localization method that integrates physics-based knowledge with deep learning architectures. Transient zero-sequence current waveforms are used as inputs to construct dual-channel image representations via transfer entropy and Markov Transition Fields (MTFs). A knowledge-guided attention mechanism is incorporated alongside a similarity-weighted strategy to design a physically constrained deep neural network model. Simulation experiments on distribution network systems are conducted to verify the model’s advantages in terms of accuracy, robustness, and interpretability. These results provide both theoretical and engineering insights for the development of safe, efficient, and intelligent fault localization systems. The proposed method provides tangible value by significantly improving fault localization accuracy, reducing misclassification under noisy conditions, and enhancing model interpretability. These advantages demonstrate its potential application in real-world distribution network operation and maintenance, offering both theoretical significance and engineering practicality. The remainder of this paper is structured as follows:
Section 2 introduces the fault mechanism of distribution networks, transfer entropy and the MTF method, and the AlexNet architecture with its fully connected output layer;
Section 3 describes the improved AlexNet and the knowledge-guided feature extraction and decision module;
Section 4 presents the experimental setup, results, and robustness analysis; and
Section 5 summarizes the paper and discusses future research directions.
2. Distribution Network Fault Mechanism and Convolutional Neural Networks
2.1. Transient Zero-Sequence Current Direction Analysis
When a single-phase ground fault occurs, a grounding path is formed between the fault point and the earth, giving rise to a fault current. In systems with an ungrounded neutral, the fault current propagates through an asymmetric current distribution across the network, thereby inducing transient zero-sequence currents. The propagation of the fault current follows the characteristics of a transient process, rather than reaching a steady state instantaneously. During this evolution, the magnitude and direction of the zero-sequence currents exhibit noticeable fluctuations. The distribution of zero-sequence current during a single-phase ground fault in a low-current grounding system is illustrated in
Figure 1. Here, A, B, and C denote the three phase lines. When a fault occurs on phase C, the voltage increase on the non-faulty phases causes charging of the capacitance to earth, while the voltage drop on the faulty phase induces discharging of this capacitance. The transient current is the sum of these two components. Due to the uneven impedance distribution across the grid, the current propagation paths are affected by varying impedances, resulting in opposite directions of zero-sequence currents at the nodes upstream and downstream of the fault point.
Figure 2 shows the transient zero-sequence current waveforms extracted from the upstream and downstream measurement nodes of a neutral-point ungrounded system during a single-phase ground fault. It can be observed that the directions of the transient zero-sequence currents on both sides of the fault point are always opposite, while the phases of the transient zero-sequence currents on the same side of the fault point are consistent, and the waveforms are relatively similar. Zero-sequence currents exhibit significant distortion between 0.02 and 0.03 s, corresponding to the occurrence of the single-phase ground fault and the propagation of electromagnetic transients. The rapid rise in current is caused by sudden voltage imbalance, while the subsequent oscillatory distortion reflects interactions between distributed capacitances. This distortion typically persists for several cycles until the system enters a new quasi-steady state. This type of distortion contains rich information about the fault direction and segment characteristics. Therefore, by calculating and extracting the zero-sequence current phase data at each measurement node in the system at the initial moment of the fault or during the transient process, this information can be comprehensively used as criteria to determine the fault segment.
2.2. Transfer Entropy
In the transient stage of single-phase grounding faults in distribution networks, zero-sequence current signals at different nodes exhibit explicit causal propagation characteristics. To quantitatively characterize the signal propagation paths and their directional properties, the concept of transfer entropy (TE) from information theory is introduced to model the temporal dependencies among measurement nodes. Originally proposed by Schreiber in 2000, transfer entropy is a non-symmetric measure of conditional mutual information [
25]. It quantifies the information gain obtained from one time series in improving the predictive capability of another, thereby indicating the presence of potential causal influence between variables. In practical applications to distribution networks, let nodes $X$ and $Y$ record the transient zero-sequence current sequences $\{x_t\}$ and $\{y_t\}$, respectively. Then, the transfer entropy from $Y$ to $X$ is defined as
$$T_{Y \to X} = \sum p\left(x_{t+1}, x_t^{(k)}, y_t^{(l)}\right) \log \frac{p\left(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\right)}{p\left(x_{t+1} \mid x_t^{(k)}\right)} \quad (1)$$
Here, $y_t^{(l)}$ and $x_t^{(k)}$ denote the lagged states of $Y$ and $X$, respectively, with $l$ and $k$ the embedding dimensions of the historical information, while $p(\cdot)$ denotes the joint probability density function. Unlike symmetric measures such as mutual information, transfer entropy possesses explicit directionality, enabling the identification of both the direction and intensity of transient fault information propagation. This, in turn, reveals the underlying channels of information flow within the system.
In practice, the probability distributions in Equation (1) are estimated using a k-nearest-neighbor conditional entropy estimator:
Embedding parameters: .
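Transfer entropy calculation: given in the form of a conditional entropy difference: $T_{Y \to X} = H\left(x_{t+1} \mid x_t^{(k)}\right) - H\left(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\right)$, where $H(\cdot \mid \cdot)$ denotes conditional entropy.
Estimator: the conditional entropies $H\left(x_{t+1} \mid x_t^{(k)}\right)$ and $H\left(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\right)$ are estimated using kNN with .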
Normalization: all matrix entries are normalized to [0,1] for cross-node comparison.
Complexity: the computational cost is , with N nodes, M samples, and neighbor number k. This remains computationally tractable in our experiments.
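The directional behavior of transfer entropy can be illustrated with a minimal sketch. Note that this is not the kNN estimator described above: for brevity it swaps in a simple histogram (plug-in) estimator with history length 1 for both series, and the function name `transfer_entropy`, the bin count, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np

def transfer_entropy(source, target, bins=4):
    """Plug-in estimate (in bits) of the transfer entropy source -> target.

    Assumes history length 1 for both series and equal-width amplitude bins.
    """
    def discretize(series):
        series = np.asarray(series, dtype=float)
        # Interior bin edges; np.digitize then returns labels 0 .. bins-1.
        edges = np.linspace(series.min(), series.max(), bins + 1)[1:-1]
        return np.digitize(series, edges)

    y = discretize(source)
    x = discretize(target)
    x_next, x_past, y_past = x[1:], x[:-1], y[:-1]

    # Empirical joint distribution p(x_{t+1}, x_t, y_t).
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (x_next, x_past, y_past), 1.0)
    joint /= joint.sum()

    p_past = joint.sum(axis=0)       # p(x_t, y_t)
    p_self = joint.sum(axis=2)       # p(x_{t+1}, x_t)
    p_x = joint.sum(axis=(0, 2))     # p(x_t)

    te = 0.0
    for a, b, c in zip(*np.nonzero(joint)):
        with_source = joint[a, b, c] / p_past[b, c]    # p(x_{t+1} | x_t, y_t)
        without_source = p_self[a, b] / p_x[b]         # p(x_{t+1} | x_t)
        te += joint[a, b, c] * np.log2(with_source / without_source)
    return te

# Synthetic check: x is a one-step delayed copy of y, so information flows y -> x.
rng = np.random.default_rng(0)
y = rng.normal(size=2000)
x = np.roll(y, 1)                    # x[t + 1] = y[t]
te_y_to_x = transfer_entropy(y, x)
te_x_to_y = transfer_entropy(x, y)
print(te_y_to_x, te_x_to_y)
```

In this synthetic case the estimate in the driving direction is much larger than in the reverse direction, which is the asymmetry the method relies on. A histogram estimator is biased for short records, which is one reason a kNN estimator may be preferred in practice.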
Based on the aforementioned definition, a transfer entropy matrix can be constructed to characterize both the causal strength and directional features of zero-sequence current propagation among nodes in the distribution network. Specifically, under a given fault scenario, transient zero-sequence current data are collected from $N$ measurement nodes, and the transfer entropy $T_{i \to j}$ is computed for each node pair $(i, j)$, resulting in an $N \times N$ transfer entropy matrix $T$. Each element $T_{ij} = T_{i \to j}$ represents the strength of information transfer from node $i$ to node $j$. Typically, entropy values from upstream to downstream of the fault are higher than those in the reverse direction. As illustrated in
Figure 3a, the normalized transfer entropy matrix can be visualized as a two-dimensional heat map, where “high-gradient transition points” closely correspond to the boundaries of fault sections, providing an effective criterion for fault localization. Furthermore, as shown in
Figure 3b, when the transfer entropy matrix is overlaid onto the distribution network topology, it effectively forms a directed weighted graph, where the edge weights indicate the direction and magnitude of information propagation paths. Highlighting the segments with higher weights in the diagram will identify the faulty sections.
When a single-phase ground fault occurs in a distribution network, the fault point serves as the origin of current disturbance, triggering significant transient electromagnetic perturbations. These signals propagate outward from the fault point throughout the system along the network topology. Because the propagation path of current is closely associated with the electrical distance between nodes and the impedance structure of the grid, high-value elements in the transfer entropy matrix tend to cluster between upstream and downstream node pairs along the fault propagation path. In the vicinity of the fault point—particularly at upstream nodes that receive more direct disturbances—the ability to disseminate information across the network is markedly enhanced. This is reflected in significantly higher outgoing transfer entropy values from these nodes compared to the entropy they receive. In contrast, downstream nodes, located near the termination of the disturbance path, tend to receive substantially more information than they transmit, resulting in a state where incoming entropy exceeds outgoing entropy. This phenomenon—where the transfer entropy matrix naturally reveals “information sources” and “information sinks”—is closely aligned with the physical propagation characteristics of the fault point and its adjacent upstream and downstream nodes.
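The source and sink behavior described above can be made concrete with a small sketch. It assumes a radial feeder whose measurement nodes are indexed from upstream to downstream, and it uses an invented 4-node matrix rather than data from this study; the helper names `net_information_flow` and `candidate_boundary` are hypothetical.

```python
import numpy as np

def net_information_flow(te_matrix):
    """Outgoing minus incoming transfer entropy per node.

    Entry [i, j] is read as the transfer entropy from node i to node j.
    """
    te = np.asarray(te_matrix, dtype=float).copy()
    np.fill_diagonal(te, 0.0)        # ignore self-transfer
    outgoing = te.sum(axis=1)        # what node i sends to the others
    incoming = te.sum(axis=0)        # what node i receives from the others
    return outgoing - incoming

def candidate_boundary(net_flow):
    """Index k such that the sharpest source-to-sink drop lies between nodes k and k+1."""
    drops = net_flow[:-1] - net_flow[1:]
    return int(np.argmax(drops))

# Illustrative normalized matrix: nodes 0 and 1 act as sources, nodes 2 and 3 as sinks.
te = np.array([
    [0.0, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.7],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.1, 0.0],
])
net = net_information_flow(te)
boundary = candidate_boundary(net)
print(net, boundary)
```

Positive net flow marks an information source and negative net flow marks a sink, so the largest drop between consecutive nodes is one simple way to read a candidate fault section off the matrix.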
2.3. Markov Transition Field
The Markov Transition Field (MTF) is a technique for transforming one-dimensional time series into two-dimensional images [
26]. Its core principle involves leveraging the state transition probability matrix of a Markov process to explicitly encode temporal dependencies within the time series into a 2D spatial representation. Compared with traditional time-domain or frequency-domain methods, MTF captures the dynamic structure of time series while expressing the probabilistic transitions between time points in image form, thereby offering a natural input format for convolutional neural network (CNN)-based temporal modeling. In the context of fault section localization in distribution networks, transient zero-sequence current waveforms exhibit pronounced dynamic characteristics, with variations jointly influenced by node location, electrical topology, and fault type. Converting these waveform sequences into MTF images enables a global perspective on the evolving system states during fault propagation and provides more discriminative features for intelligent diagnostic models.
Let the one-dimensional time series of the zero-sequence current associated with a fault be denoted as $X = \{x_1, x_2, \ldots, x_T\}$. To enhance the model’s generalizability and stability, the sequence is first normalized to the interval [0,1], then sorted by amplitude and partitioned into $Q$ quantile bins. Common binning strategies include uniform binning, where each bin spans an equal amplitude range; quantile binning, where each bin contains an equal number of data points; and normal binning, where the number of points in each bin follows a normal distribution. The discretized state sequence is denoted as $S = \{s_1, s_2, \ldots, s_T\}$, where
$$s_t = q \quad \text{if } x_t \text{ falls in the } q\text{-th bin}, \qquad q \in \{1, 2, \ldots, Q\}$$
This operation maps continuous values to a finite set of states, forming the basis of Markov modeling. Based on the state sequence $S$, the first-order Markov state transition probability matrix $W \in \mathbb{R}^{Q \times Q}$ is calculated, where each element $w_{ij}$ represents the probability that the system transitions from state $i$ to state $j$:
$$w_{ij} = P\left(s_{t+1} = j \mid s_t = i\right), \qquad \sum_{j=1}^{Q} w_{ij} = 1$$
This matrix can be interpreted as a compressed representation of evolutionary trends in the state space, characterizing the global dynamic properties of the system. By mapping the state transition probabilities in the transition matrix onto the time point pairs of the original time series, a Markov Transition Field matrix $M \in \mathbb{R}^{T \times T}$ is formed:
$$M_{ab} = w_{s_a s_b}, \qquad a, b = 1, 2, \ldots, T$$
where $s_a$ and $s_b$ are the discretized states at times $a$ and $b$. Each element of the matrix $M$ thus represents the probability of a transition from the state at time $a$ to the state at time $b$. In the MTF image, each pixel encodes the probability of a state transition between two time points in the series, thereby offering a joint representation that integrates both state and temporal dimensions. For transient processes induced by single-phase ground faults in distribution networks, it is typically necessary to collect zero-sequence current waveforms from multiple measurement nodes. The transient current signals from all nodes under the same fault event are concatenated into a longer one-dimensional time series according to the node index order. This means that the complete time series effectively constitutes a sequential concatenation of observations from multiple nodes. Consequently, once the MTF matrix is constructed, the resulting image can be interpreted as comprising several sub-regions, each corresponding to specific node data. Both the row and column indices in the image align one-to-one with time points in the series, and the waveform segment associated with a particular node can be identified based on the concatenation order.
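The construction steps above (normalization, quantile binning, transition matrix, field) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `markov_transition_field`, the choice of four bins, and the damped sinusoid standing in for a transient zero-sequence current are not taken from this study, and the input is assumed to be non-constant.

```python
import numpy as np

def markov_transition_field(series, n_bins=4):
    """Return (M, W, states) for a 1D series using quantile binning."""
    x = np.asarray(series, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())          # normalize to [0, 1]

    # Interior quantile edges give bins with roughly equal point counts.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)                   # labels 0 .. n_bins-1

    # First-order transition counts between consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (states[:-1], states[1:]), 1.0)
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Field entry (a, b) is the transition probability from state s_a to state s_b.
    M = W[states[:, None], states[None, :]]
    return M, W, states

# Synthetic damped oscillation as a stand-in for one node's transient waveform.
t = np.linspace(0.0, 1.0, 64)
waveform = np.exp(-3.0 * t) * np.sin(2.0 * np.pi * 5.0 * t)
M, W, states = markov_transition_field(waveform, n_bins=4)
print(M.shape, W.round(2))
```

For multi-node input, the per-node waveforms would be concatenated in node order before calling the function, so that the diagonal and off-diagonal sub-blocks of `M` correspond to within-node and between-node transitions.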
In distribution networks, zero-sequence current waveforms caused by faults often exhibit abrupt variations and localized propagation characteristics. MTF enables these non-stationary transition patterns to be mapped into spatially salient regions within the image, thereby facilitating the distinction between normal and fault states.
Figure 4 presents an MTF image generated from a concatenated time series of multi-node zero-sequence currents. The high-probability bands near the diagonal reflect short-term stationarity in the signal, while regions farther from the diagonal reveal long-term dynamic behavior.
It is worth noting that once the transfer entropy matrix is transformed into a two-dimensional image, its high-value regions exhibit clear physical correspondence with the critical state-transition areas in the MTF image. Specifically, in the MTF representation, cross-node temporal transition patterns are encoded in the off-diagonal sub-blocks between waveform segments from different nodes. Meanwhile, the high-gradient entries in the transfer entropy matrix reflect the intensity of causal relationships between node pairs. When both representations are jointly fed into the model in the image domain, the highlighted regions consistently indicate the primary propagation path of fault disturbances and the key upstream and downstream nodes. These two modalities thus corroborate and complement each other. This multi-perspective information fusion not only enhances the expressiveness of spatiotemporal dependencies in the input data but also provides a physically interpretable and intuitive feature basis for subsequent convolutional kernels to automatically attend to the “fault-source-to-propagation-chain” structure.
2.4. AlexNet
AlexNet is a deep convolutional neural network (CNN) architecture proposed by Krizhevsky et al. in 2012, which achieved a groundbreaking performance improvement in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [
27]. The network consists of five convolutional layers, three max-pooling layers, and three fully connected layers [
28], offering powerful capabilities in visual feature extraction and representation. The model incorporates the ReLU activation function to accelerate the convergence of nonlinear mappings, employs dropout regularization to mitigate overfitting, and leverages GPU-based parallel computation to enhance training efficiency.
The final three layers of AlexNet are fully connected layers, with the last layer, FC8, mathematically formulated as a typical combination of linear transformation and normalization. If the output feature vector of the previous layer is $h \in \mathbb{R}^{d}$, then the final output layer is
$$y = \mathrm{Softmax}(W h + b)$$
Here, $W \in \mathbb{R}^{N \times d}$ is the weight matrix, $b \in \mathbb{R}^{N}$ is the bias term, and $N$ is the number of target categories. The Softmax function normalizes the linear response of each category into a probability distribution vector $y \in \mathbb{R}^{N}$, thereby realizing the estimation of category probabilities and the final decision output. This structure not only gives the model strong discriminative ability but also effectively compresses and normalizes high-dimensional abstract feature vectors, resulting in good numerical stability of the output.
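As a small numerical illustration of this output layer, the sketch below applies a linear map followed by Softmax to a toy feature vector. The dimensions, weights, and function names are assumptions chosen for readability, not values from the network used in this study.

```python
import numpy as np

def softmax(z):
    """Numerically stable Softmax: subtracting the maximum avoids overflow."""
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / exp.sum()

def output_layer(h, W, b):
    """Linear transformation followed by Softmax normalization."""
    return softmax(W @ h + b)

# Toy example: d = 2 features, N = 3 candidate sections.
h = np.array([2.0, 1.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.zeros(3)
probs = output_layer(h, W, b)        # linear responses are [2, 1, 3]
print(probs, int(np.argmax(probs)))
```

The predicted section is the index with the largest probability, which here is the third category because it has the largest linear response.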
Although AlexNet performs well on general image tasks, it has certain limitations when processing power system fault images that carry temporal causal structures and physical constraints:
The input layer is not designed for images that encode topological causal structures and physical propagation laws.
The convolutional layers cannot actively focus on the critical polarity-reversal information between upstream and downstream waveforms in the fault propagation chain.
The output layer does not account for the impact of waveform differences on classification decisions and lacks a targeted weighting mechanism.
3. Knowledge-Driven Optimization of AlexNet
3.1. Knowledge-Guided Feature Extraction Layer
In conventional data-driven approaches, neural networks rely on large-scale training datasets to automatically learn feature representations. Although such models exhibit strong adaptability, their decision-making processes often lack physical interpretability, are vulnerable to noise, and are sensitive to boundary samples. To enhance both the accuracy and interpretability of fault section identification in distribution networks, domain knowledge of fault mechanisms is incorporated into the neural network framework. A knowledge-guided feature extraction layer is thus designed to help the model focus on image regions that are strongly associated with underlying physical characteristics.
As discussed in Section 2.1, under single-phase ground fault conditions, the zero-sequence current waveforms of upstream and downstream nodes exhibit significant phase differences and amplitude discontinuities. These differences are particularly pronounced during the initial transient stage, where the polarity inversion of waveforms is especially evident. Based on this physical insight, one-dimensional transient zero-sequence current time series from multiple measurement nodes are concatenated in node order to form a unified input sequence. This sequence is then transformed into a two-dimensional image using the Markov Transition Field (MTF) method, which serves as the input to the neural network. Given the node-wise concatenation, the resulting MTF matrix can be interpreted as consisting of multiple sub-blocks: diagonal sub-blocks represent local state transitions within individual nodes, while off-diagonal sub-blocks capture inter-node (i.e., upstream–downstream) transition patterns. Due to the opposite initial phases of upstream and downstream waveforms, the state transition directions in these inter-node regions typically present sharper local gradient changes than those observed within a single node. These regions form strong anti-correlated high-probability zones in the off-diagonal sub-blocks. In the MTF image, the sub-blocks immediately adjacent to the diagonal are the most informative spatial regions that encode phase-opposite features between upstream and downstream nodes.
To explicitly extract such critical image structures, a set of Knowledge-Guided Modules (KGMs) is integrated into the AlexNet architecture. These modules are designed as spatially aware attention mechanisms informed by node topology, aimed at selectively modeling the regions in the MTF image that correspond to waveform inversion phenomena. Specifically, let $M \in \mathbb{R}^{T \times T}$ denote the input MTF image, where $T$ is the temporal length, and $M_{ij}$ represents the transition probability from the state at time $i$ to the state at time $j$.
Based on the physical mechanism of polarity reversal between upstream and downstream node waveforms, a difference-sensitive function $D(i, j)$ is introduced to measure whether the element $M_{ij}$ is located in a potential “waveform inversion propagation region.” The function is defined as follows:
$$D(i, j) = \begin{cases} 1, & |i - j| > \delta \ \text{and} \ M_{ij} < \epsilon \\ 0, & \text{otherwise} \end{cases}$$
Here, $\delta$ is the time interval threshold, which is used to exclude autocorrelation regions near the diagonal, and $\epsilon$ is the low transition probability threshold, which is used to identify negative correlation regions. By constructing a mask matrix $A \in \mathbb{R}^{T \times T}$, where $A_{ij} = D(i, j)$, a knowledge-aware image attention map can be obtained. This attention map is embedded into the network as a guiding signal to perform weighted fusion with the original image features:
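$$\tilde{F} = F + \lambda \, (A \odot F)$$
where $F$ is the original feature representation of the MTF image, $A$ is the knowledge-aware attention map, $\odot$ represents element-wise multiplication, and $\lambda$ is an adjustable hyperparameter used to control the intensity of knowledge enhancement.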
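A minimal sketch of this masking and fusion step is given below. It rests on two assumptions that should be checked against the actual implementation: the mask marks entries that are more than δ steps away from the diagonal and whose transition probability is below ϵ, and the fusion is additive, so the masked features are scaled by the enhancement weight and added back to the original features. The function names and the 4 × 4 toy image are hypothetical, and the feature map is taken to be the MTF image itself so that shapes match.

```python
import numpy as np

def knowledge_mask(M, delta, eps):
    """Binary mask: 1 where |i - j| > delta and M[i, j] < eps (assumed rule)."""
    T = M.shape[0]
    i, j = np.indices((T, T))
    return ((np.abs(i - j) > delta) & (M < eps)).astype(float)

def fuse(F, A, lam):
    """Assumed additive fusion: original features plus lam-weighted masked features."""
    return F + lam * (A * F)

# Toy MTF image: high probabilities near the diagonal, low ones far from it.
M = np.array([
    [0.90, 0.50, 0.10, 0.05],
    [0.50, 0.90, 0.50, 0.10],
    [0.10, 0.50, 0.90, 0.50],
    [0.05, 0.10, 0.50, 0.90],
])
A = knowledge_mask(M, delta=1, eps=0.2)
F_enhanced = fuse(M, A, lam=0.5)
print(A)
print(F_enhanced.round(3))
```

Entries outside the mask pass through unchanged, while masked entries are scaled by a factor of one plus the enhancement weight, so setting the weight to zero recovers the unmodified features.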
The knowledge-guided feature extraction layer enhances the representation of critical regions in distribution network fault localization tasks by incorporating domain-specific physical mechanisms into the neural network. By assigning differentiated weights to spatial positions, this module enables the network to more accurately preserve salient fault-related features in the image—such as phase and amplitude deviations—while suppressing or disregarding irrelevant information. Specifically, the introduction of physical knowledge guides the network to focus more effectively on regions that are of high diagnostic importance during training, thereby significantly improving its sensitivity to key fault patterns. This design not only improves the model’s performance in fault localization tasks within distribution networks but also provides a theoretical foundation and technical support for fault diagnosis in intelligent power systems based on deep learning.
3.2. Knowledge-Driven Fully Connected Layer
In the event of a single-phase ground fault in a distribution network, the zero-sequence current waveforms of nodes located upstream and downstream of the fault point typically exhibit notable discrepancies, characterized by transient features such as amplitude jumps, phase reversals, and frequency deviations. To effectively incorporate this domain-specific physical knowledge into the neural network, a difference-based weight adjustment mechanism is integrated into the final fully connected layer (FC8) of the AlexNet architecture. Specifically, the waveform discrepancies between adjacent nodes are quantified using the Hausdorff distance, which is embedded as prior knowledge during the network’s inference phase to enhance its fault section discrimination capability.
Assume that a given fault simulation involves $n$ measurement nodes. Let nodes $i$ and $i+1$ be adjacent upstream and downstream nodes, respectively, and let their corresponding normalized zero-sequence current time series be denoted as $x_i, x_{i+1} \in \mathbb{R}^{T}$, where $T$ is the sequence length. To assess the overall waveform shape difference between these sequences, the Hausdorff distance $H(x_i, x_{i+1})$ is defined as follows:
$$H(x_i, x_{i+1}) = \max\left\{ \max_{a \in x_i} \min_{b \in x_{i+1}} \lVert a - b \rVert,\ \max_{b \in x_{i+1}} \min_{a \in x_i} \lVert a - b \rVert \right\}$$
where $\lVert a - b \rVert$ represents the Euclidean distance between two points. This metric extracts the maximum local deviation between a waveform pair, which makes it highly sensitive to anomalies. The Hausdorff distance is calculated for each pair of adjacent nodes to form a discrepancy vector:
$$D = [d_1, d_2, \ldots, d_{n-1}], \quad d_i = H(x_i, x_{i+1})$$
To facilitate fusion with neural network parameters, the discrepancy vector is normalized, defined as
$$\tilde{D} = \frac{D}{\lVert D \rVert_2}$$
Here, $\lVert \cdot \rVert_2$ denotes the 2-norm of the vector; the normalization keeps its numerical scale consistent with the network training weights.
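The Hausdorff-based discrepancy vector can be sketched as follows. This is a minimal illustration rather than the authors' implementation: treating each waveform as a set of 2-D points (normalized time, value) is an assumption, and the function names are hypothetical.

```python
import numpy as np

def hausdorff_distance(x, y):
    """Symmetric Hausdorff distance between two waveforms.

    Each waveform is treated as a set of 2-D points (t / T, value); this
    point-set construction is an assumption of the sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    px = np.column_stack([np.linspace(0.0, 1.0, len(x)), x])
    py = np.column_stack([np.linspace(0.0, 1.0, len(y)), y])
    # Pairwise Euclidean distances, shape (len(x), len(y))
    dist = np.linalg.norm(px[:, None, :] - py[None, :, :], axis=-1)
    forward = dist.min(axis=1).max()   # farthest point of x from the set y
    backward = dist.min(axis=0).max()  # farthest point of y from the set x
    return max(forward, backward)

def discrepancy_vector(waveforms):
    """2-norm-normalized Hausdorff distances between consecutive waveforms."""
    d = np.array([hausdorff_distance(waveforms[i], waveforms[i + 1])
                  for i in range(len(waveforms) - 1)])
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d

# Four hypothetical nodes; the polarity reverses between the 2nd and 3rd node
t = np.linspace(0.0, 1.0, 200)
wave = np.sin(2 * np.pi * 5 * t)
nodes = [wave, wave, -wave, -wave]
d_tilde = discrepancy_vector(nodes)
```

With identical waveforms on either side and a polarity reversal in the middle, the only non-zero entry of `d_tilde` is the one for the middle node pair, which mimics the upstream/downstream contrast around a faulted section.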
Based on the above analysis, a discrepancy-guided fully connected layer correction mechanism is constructed. To incorporate the discrepancy prior information, a correction factor vector
is constructed, defined as follows:
where
is the node pair discrepancy corresponding to the i-th segment, and α is an adjustable hyperparameter controlling the prior guidance strength. The finally corrected weight matrix is expressed as
In the formula,
represents the linear discriminant weight of the i-th segment. Therefore, the corrected network output can be expressed as
In essence, this mechanism amplifies or suppresses the weight vector of each section class according to its degree of difference, so that the network output favors sections with significant waveform differences, that is, those more likely to contain the fault. This improves the accuracy and interpretability of the classification results. By introducing the Hausdorff distance to measure waveform differences and implementing a difference-guided weight correction mechanism in the fully connected layer, fault mechanism knowledge is integrated with the neural network structure, yielding a discrimination mechanism driven jointly by knowledge and data.
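The correction mechanism can be illustrated with a small sketch. The exact form of the correction factor is not recoverable from the text, so the factor γ_i = 1 + α·d_i used below is an assumption, as are the function names and the toy weights.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def corrected_fc_output(W, b, features, d_norm, alpha=1.0):
    """Class probabilities after discrepancy-guided weight correction.

    Assumed correction factor: gamma_i = 1 + alpha * d_i, where d_i is the
    normalized discrepancy of section i. Row i of W is the weight vector
    of section i and is scaled by gamma_i.
    """
    gamma = 1.0 + alpha * np.asarray(d_norm, dtype=float)
    W_corrected = gamma[:, None] * W
    return softmax(W_corrected @ features + b)

# Toy example: three sections whose uncorrected logits are identical
W = np.ones((3, 4))
b = np.zeros(3)
feat = np.ones(4)
p_plain = corrected_fc_output(W, b, feat, [0.0, 0.0, 0.0])
p_guided = corrected_fc_output(W, b, feat, [0.0, 1.0, 0.0], alpha=0.5)
```

Here the uncorrected output is uniform, while the corrected output favors the section with the largest discrepancy. Scaling a row of the weight matrix multiplies that class's pre-bias logit, so under this assumed form the amplification acts in the intended direction only when that logit is positive.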
3.3. Knowledge-Data Joint Driving Model Design
To effectively improve the location accuracy of single-phase grounding fault sections in distribution networks and the physical interpretability of the model, a knowledge–data jointly driven deep learning model structure is proposed. This method integrates the end-to-end feature expression capability of data-driven learning with a structure guidance mechanism based on fault mechanism knowledge. By combining the signal differences and waveform propagation laws of transient zero-sequence current waveforms, it realizes a model decision-making process that is more in line with physical reality. As shown in Figure 5, the entire model consists of three core modules: the image expression enhancement module, the knowledge-driven feature extraction module, and the similarity adjustment-based determination module. These three modules work together to construct a fault section location process with a clear structure and reasonable decision-making.
(1) Image Expression Enhancement Module
To further enhance the temporal–spatial–causal expressive capability of the input features, a transfer entropy matrix is introduced as an auxiliary channel alongside the original MTF image, jointly forming a dual-channel image input; this constitutes the image expression enhancement module. The transient zero-sequence current sequences collected at each node under a given fault condition are first concatenated in order of node number to form a one-dimensional time series. Using the Markov Transition Field (MTF) method, a state transition probability matrix is constructed and a two-dimensional image is generated to capture the dynamic evolution patterns of the time-series signals. The resulting MTF image has a size of NT × NT, where N is the number of nodes and T is the number of sampling points per node. The entire MTF matrix can therefore be viewed as an N × N grid of T × T sub-blocks, each of which has a clear physical correspondence with the node data. Specifically, the row and column indices in the image correspond one-to-one to the time points in the time series, and the node and time point behind any pixel can be located directly from the concatenation order. Meanwhile, the transfer entropy between measurement nodes is calculated from the same time-series data to construct an N × N transfer entropy matrix for node pairs. After normalizing this matrix, the transfer entropy value of each node pair is expanded to the size of an MTF sub-block, i.e., T × T. This ensures that the expanded transfer entropy matrix matches the size of the MTF matrix and that each sub-block has a consistent physical meaning, representing the relationship between a pair of nodes. Finally, the MTF matrix and the transfer entropy matrix are stacked along the channel dimension, forming an MTF+TE dual-channel input.
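The construction of the dual-channel input can be sketched as below. This is an illustrative sketch, not the authors' code: it uses uniform (equal-width) bins, takes the N × N transfer entropy matrix as a given input, and assumes min-max scaling for the unspecified normalization step. All function names are hypothetical.

```python
import numpy as np

def markov_transition_field(series, n_bins=10):
    """MTF with equal-width bins: M[i, j] = P(bin(x_i) -> bin(x_j))."""
    x = np.asarray(series, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]  # interior edges
    states = np.digitize(x, edges)                            # values in 0..n_bins-1
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)         # first-order transitions
    row_sums = counts.sum(axis=1, keepdims=True)
    trans = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return trans[states[:, None], states[None, :]]

def dual_channel_image(node_series, te_matrix, n_bins=10):
    """Stack the MTF of the concatenated series with the block-expanded TE matrix."""
    node_series = np.asarray(node_series, dtype=float)  # shape (N, T)
    n_nodes, n_samples = node_series.shape
    # Row-major flattening concatenates the node waveforms in node order
    mtf = markov_transition_field(node_series.reshape(-1), n_bins)  # (NT, NT)
    te = np.asarray(te_matrix, dtype=float)
    span = te.max() - te.min()
    te = (te - te.min()) / span if span > 0 else np.zeros_like(te)  # assumed min-max scaling
    te_image = np.kron(te, np.ones((n_samples, n_samples)))         # each entry -> T x T block
    return np.stack([mtf, te_image], axis=0)                        # (2, NT, NT)

# Hypothetical data: 3 nodes, 50 samples each, random stand-in TE matrix
rng = np.random.default_rng(0)
series = rng.standard_normal((3, 50))
te_values = rng.random((3, 3))
image = dual_channel_image(series, te_values)
```

The Kronecker product with a block of ones is one simple way to give every node pair a constant T × T sub-block, so that block (i, j) of the second channel lines up with block (i, j) of the MTF channel.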
The MTF image focuses on representing the state transition probabilities and long-term/short-term dynamic relationships of individual node waveforms, while the transfer entropy image emphasizes the causal transmission strength between nodes and the directionality of upstream and downstream relationships. The two complement each other, forming a more complete spatiotemporal and causal joint representation that reveals the patterns of fault propagation. This fusion approach allows the AlexNet network to perceive temporal structural patterns and node coupling information simultaneously during feature learning, and makes it more sensitive to high-gradient regions (i.e., sudden disturbances and prominent information flow channels), thereby activating the convolutional response in the ‘fault path’ region and improving the separability and accuracy of fault section localization.
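The directional character of transfer entropy can be illustrated with a simple histogram (plug-in) estimator. The estimator used in the paper is not specified, so the history length of 1, the equal-width binning, and the function name below are assumptions of this sketch.

```python
import numpy as np

def transfer_entropy(source, target, n_bins=4):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_now, x_now)
             * log2[ p(y_next | y_now, x_now) / p(y_next | y_now) ]
    """
    def discretise(x):
        x = np.asarray(x, dtype=float)
        edges = np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]
        return np.digitize(x, edges)

    s, t = discretise(source), discretise(target)
    joint = np.zeros((n_bins, n_bins, n_bins))  # axes: y_next, y_now, x_now
    np.add.at(joint, (t[1:], t[:-1], s[:-1]), 1.0)
    joint /= joint.sum()
    p_yx = joint.sum(axis=0)       # p(y_now, x_now)
    p_yy = joint.sum(axis=2)       # p(y_next, y_now)
    p_y = joint.sum(axis=(0, 2))   # p(y_now)
    te = 0.0
    for a, b, c in zip(*np.nonzero(joint)):
        te += joint[a, b, c] * np.log2(
            joint[a, b, c] * p_y[b] / (p_yy[a, b] * p_yx[b, c]))
    return te

# Hypothetical pair of signals: y is x delayed by one sample
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 1)
te_xy = transfer_entropy(x, y)  # information flows from x to y
te_yx = transfer_entropy(y, x)  # little information flows back
```

Because `y` simply repeats `x` one step later, the estimate from `x` to `y` is much larger than in the reverse direction. Evaluating the function for every ordered node pair would give the asymmetric N × N matrix that the second image channel is built from.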
(2) Feature Extraction Module
Based on the original convolution layer, this paper designs and introduces the Knowledge-Guided Module (KGM). This module integrates fault mechanisms into the shallow convolutional network and sets higher convolution weights for regions in the network that may be key paths of fault propagation according to the physical law of waveform reverse propagation in the distribution network. Specifically, by analyzing the inversion rules of zero-sequence current amplitude and phase at upstream and downstream nodes of the fault section, a knowledge-guided feature attention module is constructed to explicitly identify regions in the MTF image that may contain key fault information. Through spatial attention weight adjustment, this module guides the network to pay more attention to regions in the image that exhibit structural features such as waveform polarity inversion and mutation boundaries. This process not only improves the effectiveness of network feature extraction but also enhances the physical interpretability of model decisions. In addition, historical fault data is used to conduct offline training on the AlexNet model with the introduced knowledge-enhanced structure, learning the implicit mapping relationship between images and segments, and thereby achieving efficient optimization of feature layer parameters.
(3) Determination Module
In the classification output stage, the original AlexNet uses standard fully connected layers combined with the Softmax function for classification. Although this provides a certain discrimination ability, it cannot actively model the similarities or differences between the waveforms of different nodes. To address this, this paper introduces the Hausdorff distance as a quantitative indicator of waveform differences. By calculating the Hausdorff distance of waveform pairs between adjacent nodes, a difference vector is constructed and used to adjust the corresponding weights of the output layer. This mechanism enables the model to focus more on regions with significant waveform differences in classification decisions, thereby enhancing the discriminative sensitivity to fault sections. The specific weight correction method adopts the fully connected layer weight matrix obtained from Equation (12), based on the normalized Hausdorff difference index. This method not only enhances the discriminative ability of the output layer but also improves the overall physical interpretability and engineering adaptability of the model.
In summary, the proposed model overcomes the black-box and static nature of AlexNet in its structural design, integrates a modular modeling strategy driven by distribution network knowledge, and offers good interpretability, scalability, and engineering applicability. The fault section location implementation process is shown in
Figure 6. Let N be the number of measurement nodes, T the per-node samples, L = NT the concatenated length, Q the MTF bins, and H×W the final dual-channel image size. The offline cost is dominated by naïve mapping
(MTF) and
(TE). The online inference per event is dominated by the AlexNet forward and image-sized operations
.
In practice, a zero-sequence current can be obtained in two standard ways: (i) by residual (vector) summation of the three-phase CT secondaries in protection/measurement IEDs (i.e., computing zero-sequence current from existing-phase CTs), or (ii) via a core-balance (zero-sequence) current transformer (CBCT) or Rogowski coil installed around the three conductors to directly measure zero-sequence current.
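Option (i) above amounts to a sample-by-sample sum of the three phase currents. The sketch below uses the symmetrical-component convention i0 = (ia + ib + ic) / 3, whereas a residual CT connection physically carries 3·i0. The 50 Hz power frequency and the test waveforms are assumptions of the example.

```python
import math

def zero_sequence_current(ia, ib, ic):
    """i0(t) = (ia(t) + ib(t) + ic(t)) / 3, computed sample by sample."""
    return [(a + b + c) / 3.0 for a, b, c in zip(ia, ib, ic)]

fs = 10_000                       # sampling frequency in Hz (10 kHz)
f0 = 50.0                         # assumed power frequency in Hz
t = [k / fs for k in range(200)]  # one 50 Hz cycle
ia = [math.sin(2 * math.pi * f0 * tk) for tk in t]
ib = [math.sin(2 * math.pi * f0 * tk - 2 * math.pi / 3) for tk in t]
ic = [math.sin(2 * math.pi * f0 * tk + 2 * math.pi / 3) for tk in t]

# Balanced phases cancel, so the zero-sequence current is numerically zero
i0_balanced = zero_sequence_current(ia, ib, ic)

# A hypothetical extra component on phase A appears in i0 at one third of its size
ia_unbalanced = [a + 0.3 for a in ia]
i0_unbalanced = zero_sequence_current(ia_unbalanced, ib, ic)
```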
To enable multi-node causal analysis (transfer entropy and MTF), time-synchronized acquisition (e.g., GPS/PTP) is considered so that transients are aligned across nodes. The MTF+TE processing is lightweight and can run on a substation gateway/edge device or offline.
Cost impact. Because most distribution feeders already deploy phase CTs and IEDs, zero-sequence current is often available without adding new primary sensors. The incremental cost mainly consists of (a) enabling oscillography/export at ≥10 kHz, (b) providing a time-sync source at a few key locations, and (c) a compact data acquisition/edge gateway if not already present. Where coverage is missing, CBCT/Rogowski sensors can be added selectively at feeder heads and sectionalizing points rather than every span. Hence, the overall system cost scales with the number of instrumented nodes and is typically modest because it reuses existing substation assets. The next section will verify the performance of the above modules in actual scenarios through simulation experiments.
4. Experiment and Analysis
4.1. Simulation Configuration
The experiments and model training in this paper were conducted in the following hardware and software environment: a Windows 10 64-bit operating system, an Intel Core i7-8550U CPU (1.80 GHz), an NVIDIA GeForce RTX 4060 GPU, and MATLAB R2022a. A simulation model of a 10 kV ungrounded-neutral distribution network was developed in MATLAB/Simulink, based on the IEEE 14-node system. The topology of the simulated network is shown in Figure 7. The model includes four feeders, designated Line1–Line4, whose nodes are numbered 1–14. The entire network is divided into 10 fault sections, i.e., the intervals between measurement nodes, and each section serves as a target for fault location identification. The circled numbers (①–⑩) in the figure are the predefined single-phase ground fault section numbers. Solid lines indicate overhead lines, while dashed lines indicate cable lines.
Among them, sections 6–9 are cable lines, and sections 1–5 and 10 are overhead lines. Overhead line parameters: positive-sequence resistance r1 = 0.531 Ω/km, positive-sequence inductance l1 = 0.096 × 10⁻³ H/km, positive-sequence capacitance c1 = 0.938 × 10⁻⁶ F/km; zero-sequence resistance r0 = 0.234 Ω/km, zero-sequence inductance l0 = 0.355 × 10⁻³ H/km, zero-sequence capacitance c0 = 0.965 × 10⁻⁶ F/km. Cable line parameters: positive-sequence resistance r1 = 0.01273 Ω/km, positive-sequence inductance l1 = 0.9337 × 10⁻³ H/km, positive-sequence capacitance c1 = 12.74 × 10⁻⁹ F/km; zero-sequence resistance r0 = 0.3864 Ω/km, zero-sequence inductance l0 = 4.1264 × 10⁻³ H/km, zero-sequence capacitance c0 = 7.751 × 10⁻⁹ F/km. Line lengths: L1 = 7 km, L2 = 13 km, L3 = 11 km, L4 = 10 km, L5 = 10 km, L6 = 12 km, L7 = 8 km, L8 = 15 km, L9 = 12 km, L10 = 25 km. The rated transformation ratio of the main transformer is 110/10 kV. According to the data requirements of the model input, simulation experiments were conducted under different fault conditions. The fault parameters are shown in Table 1. The zero-sequence current of each measurement node was collected with a data window length of 0.15 s and a sampling frequency of 10 kHz. For each faulted section, 540 fault samples were obtained, giving a total sample set of 5400 simulated fault samples.
4.2. Comparison of the Effects of Different MTF Parameters
When applying the Markov Transition Field (MTF) to convert transient zero-sequence current waveforms from distribution network nodes into image representations, the number of quantile bins (Q) and the chosen discretization strategy significantly influence the resulting image’s texture and structural features. These, in turn, play a critical role in determining the model’s discriminative performance. From a theoretical perspective, the value of Q dictates the granularity of state space partitioning. A smaller Q yields coarser partitions, enabling the MTF image to emphasize global structural features but at the expense of local detail resolution. Conversely, a larger Q enhances textural richness, allowing the model to capture fine-grained distinctions, though potentially at the cost of increased noise and risk of overfitting. To evaluate the impact of Q and the discretization strategy, experiments were conducted using several values of Q under three common partitioning methods: normal, quantile, and uniform.
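The three partitioning methods differ only in where the bin edges are placed. The sketch below assumes the conventions common in time-series imaging toolkits (uniform: equal-width bins; quantile: equal-count bins; normal: edges at standard normal quantiles). The paper does not spell these out, so the definitions and the function name are assumptions.

```python
import numpy as np
from statistics import NormalDist

def bin_edges(x, n_bins, strategy="uniform"):
    """Interior bin edges (n_bins - 1 values) for the three strategies."""
    x = np.asarray(x, dtype=float)
    probs = np.arange(1, n_bins) / n_bins
    if strategy == "uniform":    # equal-width bins over the data range
        return x.min() + probs * (x.max() - x.min())
    if strategy == "quantile":   # equal-count bins from empirical quantiles
        return np.quantile(x, probs)
    if strategy == "normal":     # quantiles of the standard normal distribution
        return np.array([NormalDist().inv_cdf(p) for p in probs])
    raise ValueError(f"unknown strategy: {strategy}")

# Evenly spread data: uniform and quantile edges coincide
ramp = np.linspace(-1.0, 1.0, 101)
edges_uniform = bin_edges(ramp, 4, "uniform")
edges_quantile = bin_edges(ramp, 4, "quantile")
edges_normal = bin_edges(ramp, 4, "normal")

# Hypothetical decaying transient: most samples sit near zero, so quantile
# edges crowd around zero while uniform edges still span the full amplitude range
t = np.arange(1500) / 10_000
transient = np.exp(-40.0 * t) * np.sin(2 * np.pi * 300.0 * t)
transient_uniform = bin_edges(transient, 10, "uniform")
transient_quantile = bin_edges(transient, 10, "quantile")
```

For a decaying transient, quantile bins spend most of their resolution on the low-amplitude tail, whereas uniform bins keep resolution on the large initial swings. This is one plausible reason the strategies can behave differently on fault waveforms, although the paper does not state it.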
For each fault section, 54 fault waveform samples were randomly selected from the dataset. Each parameter configuration was used to preprocess the original zero-sequence current time series into corresponding MTF images. All image samples were then split into training, validation, and test sets in a ratio of 0.7:0.15:0.15. A unified network architecture and fixed hyperparameter settings were used for training and evaluation across all configurations.
Figure 8 presents the validation accuracy curves for the MTF-based models under various parameter settings. As training progressed, models using the normal and uniform strategies achieved consistently higher validation accuracy. In contrast, the quantile strategy yielded inferior performance, with noticeably lower accuracy levels.
To further distinguish the performance of the MTF parameter settings, Table 2 summarizes the final validation accuracy under each configuration. It again shows that the quantile strategy performs poorly overall, with low accuracy. When the number of quantile bins Q is set to 5, the validation accuracy of all three strategies reaches its maximum, with the normal and uniform strategies both reaching 100%.
Figure 9 shows the test-set accuracy under the various parameter combinations. The Q = 10 + uniform strategy combination achieved the highest test accuracy of 99.75%. For larger Q values, the test accuracy of both the uniform and normal strategies decreased, which aligns with the trend observed on the validation set. Therefore, based on these comparative experiments on the key MTF construction hyperparameters, the Q = 10 + uniform strategy combination is identified as the optimal setting for the current research problem.
4.3. Visualization-Based Validation of Feature Extraction Capability
To further validate the feature extraction capability of the proposed model for fault section location in distribution networks, the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm was employed to visualize the feature representations at different layers of the neural network. A total of 810 MTF image samples were randomly selected from the test set. The feature representations at the input layer and the fully connected layer 7 (fc7) were extracted and reduced to two dimensions using t-SNE for visualization, as illustrated in Figure 10. The input layer features, corresponding to the raw pixel distributions, mainly reflect the low-order statistical structure of the images. In the two-dimensional embedded space, these features display a high degree of overlap and irregular scattering, with blurred class boundaries and a lack of clear inter-class separability, indicating significant limitations in the discriminative power of the raw inputs. By contrast, the output features at the fc7 layer exhibit pronounced intra-class compactness and inter-class separability. The t-SNE results show that samples corresponding to the same fault section form relatively compact clusters, while those from different sections are clearly separated in the embedded space. This demonstrates that the model effectively captures the semantic distinctions between fault patterns in deeper layers. Overall, these results indicate that the knowledge-driven model possesses strong feature extraction capabilities, successfully transforming unstructured raw inputs into high-dimensional representations with enhanced discriminative characteristics, thereby providing a solid foundation for accurate fault section classification.
4.4. Model Performance
In order to validate the effectiveness of the proposed method and evaluate its performance in the faulty segment localization task, the following three different configurations of AlexNet models are comprehensively compared:
Model 1: The traditional AlexNet model, i.e., the standard AlexNet network architecture without any knowledge-driven modifications, serves as the benchmark for comparison;
Model 2: Knowledge-driven AlexNet, which adds a knowledge-driven feature extraction layer to the base model to improve the feature extraction capability and training efficiency during the training process;
Model 3: Joint-driven model, which is based on the knowledge-driven AlexNet model, and further optimizes the fully connected layer by adding a knowledge-driven mechanism to improve the classification performance.
All three models use the MTF image set generated with the Q = 10 + uniform strategy combination as experimental samples, randomly divided into training, validation, and test sets in a ratio of 0.7:0.15:0.15. The network is trained with the Adam optimizer, an initial learning rate of 0.0001, a batch size of 64, and a maximum of 10 training epochs.
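The data split and training settings above can be written down as a short sketch. The split helper, its seed, and the dictionary keys are hypothetical; only the numerical values come from the text. The experiments themselves were run in MATLAB, so this is an illustration rather than the authors' script.

```python
import random

def split_indices(n_samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 10 sections x 540 samples = 5400 samples in total
train_idx, val_idx, test_idx = split_indices(10 * 540)

# Training settings as stated in the text (key names are illustrative)
training_config = {
    "optimizer": "adam",
    "initial_learning_rate": 1e-4,
    "batch_size": 64,
    "max_epochs": 10,
}
```

With 5400 samples this split yields 3780 training, 810 validation, and 810 test samples, which matches the 810 test samples used in the evaluation.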
Figure 11 shows the training-set accuracy and loss curves of Model 1 and Model 2. The training accuracy of Model 1 rises gradually but fluctuates considerably. Both models eventually reach a stable training accuracy of 100%, but Model 2 converges faster. Model 2 also trains faster: the training times of Model 1 and Model 2 are 3 min 45 s and 2 min 58 s, respectively.
The 810 test set samples that were not involved in training were classified with Model 1 and Model 2, respectively, and the prediction results were recorded as confusion matrices, shown in Figure 12. Each confusion matrix gives the number of correct and incorrect classifications per class: the diagonal entries are correct localization results and the off-diagonal entries are misclassifications. Figure 12 shows that Model 2 outperforms Model 1 on the test set, with fewer misclassified samples. To explicitly quantify the false-alarm rate, we computed the misclassification rate from the confusion matrices in Figure 12. The rate decreases from 0.864% (Model 1) to 0.370% (Model 2), confirming that the knowledge-guided design suppresses off-diagonal errors and generalizes better.
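The misclassification rate is the off-diagonal share of the confusion matrix. The reported rates correspond to 7 and 3 errors out of 810 test samples (7/810 ≈ 0.864%, 3/810 ≈ 0.370%). The matrices below are hypothetical stand-ins with those error counts: the balanced class sizes and the placement of the errors are assumptions, not the contents of Figure 12.

```python
def misclassification_rate(confusion):
    """Fraction of samples that fall off the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def toy_confusion(n_classes, per_class, errors):
    """Build a confusion matrix from a list of (true, predicted) error pairs."""
    m = [[per_class if i == j else 0 for j in range(n_classes)]
         for i in range(n_classes)]
    for true_c, pred_c in errors:
        m[true_c][true_c] -= 1
        m[true_c][pred_c] += 1
    return m

# Hypothetical matrices: 10 sections x 81 test samples = 810 samples
cm_model1 = toy_confusion(10, 81, [(3, 5), (3, 5), (5, 3), (5, 3), (0, 1), (7, 8), (8, 7)])
cm_model2 = toy_confusion(10, 81, [(3, 5), (5, 3), (5, 3)])

rate1 = 100 * misclassification_rate(cm_model1)  # about 0.864 %
rate2 = 100 * misclassification_rate(cm_model2)  # about 0.370 %
```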
For the misclassified samples in the test set of the knowledge-driven AlexNet model, we selected the zero-sequence current data and image dataset from segments 4 and 6 to optimize the weight parameters of the last fully connected layer, resulting in Model 3.
Table 3 shows the test accuracy of the three models on the dataset containing misclassified samples. It can be seen that the proposed method achieves higher accuracy and generalization capability compared to traditional models, and it can correct misclassifications based on knowledge mechanisms. The localization accuracy on the misclassified test set samples reached 99.53%, demonstrating a significant advantage.
To further validate the adaptability of the proposed method under complex fault conditions, this paper introduces high-resistance grounding scenarios and phase-to-phase fault scenarios. For high-resistance grounding faults, the grounding resistance was set to 2000 Ω, 3000 Ω, 4000 Ω, and 5000 Ω in the simulation, with all other parameters remaining consistent with the previous experiments. In this case, the amplitude of the transient zero-sequence current at the fault point decreases significantly and the signal-to-noise ratio of the current waveform is low, posing greater challenges for fault section localization. The method proposed in this paper was compared with the methods in References [14,15]. Table 4 presents a comparison of the localization results of the three methods under different grounding resistances. As shown in the table, as the grounding resistance increases, the methods in References [14,15] misjudge the fault section under high-impedance fault conditions, while the method in this paper still correctly identifies it.
Further consideration is given to phase-to-phase short-circuit fault scenarios, with two-phase short-circuits (AB, AC, BC) set up in the simulation. Under such fault conditions, the current waveform contains stronger symmetrical components and weaker zero-sequence components, posing new challenges to the feature extraction capabilities of fault location methods. The method proposed in this paper is compared with the methods described in References [14,15]. Table 5 lists the comparison of fault location results among the three methods under phase-to-phase fault scenarios. It can be observed that the methods described in References [14,15] both produce false alarms in phase-to-phase fault scenarios, whereas the method described in this paper is still able to accurately locate the fault section, further verifying the applicability of this method in phase-to-phase short-circuit faults.
In addition to conducting experiments on the proposed model to achieve quantitative comparisons, we also conducted qualitative analyses and comparisons between our work and several representative methods in the literature. For example, wavelet packet + AlexNet-based methods [15] and image-fusion dual-channel CNN approaches [19] have achieved promising fault localization accuracy, yet their robustness and interpretability remain limited. Graph neural network-based approaches [14] demonstrate adaptability to network topology but often require large-scale training data and involve higher computational costs. In contrast, our method integrates knowledge-driven mechanisms, which enhances interpretability and noise immunity without significantly increasing model complexity. The conceptual and methodological comparison suggests that the proposed approach provides a valuable addition to the current state of the art, with both theoretical and engineering significance.
4.5. Interpretability Analyses
The Grad-CAM method is used to analyze the interpretability of the joint-driven model. Grad-CAM identifies the feature regions that the model attends to when making a decision and generates a class activation map, in which regions with smaller weights are shown in blue and regions with larger weights in red. Overlaying this map on the original image as a heat map visualizes how much attention the model pays to different regions.
Figure 13 shows the Grad-CAM feature heat maps for randomly selected image samples of faults in each of the 10 sections. The heat maps indicate that the joint-driven model makes full use of the image elements corresponding to each section: for different sections, the knowledge-driven approach guides the model to attend to the spatial–temporal features in the image that represent the section containing the fault point. Fault features are therefore exploited more completely, and the interpretability of the data-driven model is enhanced.
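The core Grad-CAM computation behind such heat maps is short enough to sketch in isolation. The function below takes the activations of one convolutional layer and the gradients of the class score with respect to them as given arrays; obtaining these from a trained network depends on the framework and is not shown. The toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from conv-layer activations and class-score gradients.

    Both inputs have shape (K, H, W): K feature maps of size H x W.
    """
    weights = gradients.mean(axis=(1, 2))              # global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over the K maps
    cam = np.maximum(cam, 0.0)                         # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1] for display

# Toy example with two 3x3 feature maps
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 1.0        # map 0 responds at the centre
acts[1, 0, 0] = 1.0        # map 1 responds in the top-left corner
grads = np.zeros((2, 3, 3))
grads[0] = 1.0             # map 0 supports the class
grads[1] = -1.0            # map 1 counts against the class
heat = grad_cam(acts, grads)
```

In the toy case the centre pixel receives the maximum value and the corner, whose feature map has a negative weight, is zeroed by the ReLU. Upsampling `heat` to the input size and applying a blue-to-red colormap gives an overlay of the kind described above.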
4.6. Noise Immunity
In real operating environments, the collected time-series data are often corrupted by noise, so a practical model must handle noisy inputs while remaining general and stable. To simulate such interference, Gaussian noise at four signal-to-noise ratio (SNR) levels, 30 dB, 40 dB, 50 dB, and 60 dB, is superimposed on the original waveform data. Table 6 shows the localization results for some sections under different SNRs; the proposed method is still able to locate the faulty sections when noise is superimposed.
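Superimposing Gaussian noise at a prescribed SNR amounts to scaling the noise power to the signal power, with SNR(dB) = 10·log10(P_signal / P_noise). A minimal sketch follows; the test waveform is a hypothetical decaying oscillation, not simulation data from the paper.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# 0.15 s window at 10 kHz gives 1500 samples per node
fs = 10_000
t = np.arange(1500) / fs
clean = np.exp(-40.0 * t) * np.sin(2 * np.pi * 300.0 * t)  # hypothetical transient

rng = np.random.default_rng(0)
measured_snr = {}
for snr_db in (30, 40, 50, 60):
    noisy = add_gaussian_noise(clean, snr_db, rng)
    measured_snr[snr_db] = 10 * np.log10(
        np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

The measured SNR of each noisy copy should sit close to the requested level, with a small deviation due to the finite number of samples.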
Considering that disturbances in the field environment are more complex, to further enhance the robustness evaluation of the proposed method, we extended our experiments to include various noise and perturbation types, such as impulsive noise, sampling jitter/desynchronization, CT saturation, missing channels, and amplitude/frequency drift. The strength and parameters for each perturbation were set as follows:
Impulsive noise: pulse occurrence probability set to 1%, pulse amplitude set to 10 times the signal standard deviation.
Sampling asynchrony: asynchrony offset set to ±2 sample points.
CT saturation: saturation threshold set to 3σ, simulating the saturation distortion of a current transformer.
Missing channels: missing channel ratio set to 10%.
Amplitude and frequency drift: amplitude drift set to ±5%, frequency drift set to ±0.2%, simulating slow changes in signal amplitude and frequency.
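The perturbations listed above can be mimicked with a few small helpers. The paper gives only the strengths, so the concrete models here are assumptions: random-sign spikes, an integer sample shift with edge padding, hard clipping for saturation, zeroed channels, and a linear gain ramp for amplitude drift. Frequency drift is not covered by this sketch.

```python
import numpy as np

def impulsive_noise(x, rng, prob=0.01, scale=10.0):
    """Add spikes of +/- scale * std at randomly chosen samples."""
    hits = rng.random(x.shape) < prob
    signs = rng.choice([-1.0, 1.0], size=x.shape)
    return x + hits * signs * scale * x.std()

def sampling_offset(x, shift):
    """Shift a channel by an integer number of samples, repeating edge values."""
    if shift == 0:
        return x.copy()
    if shift > 0:
        return np.concatenate([np.full(shift, x[0]), x[:-shift]])
    return np.concatenate([x[-shift:], np.full(-shift, x[-1])])

def ct_saturation(x, k=3.0):
    """Hard-clip at +/- k * std, a crude stand-in for CT saturation."""
    limit = k * x.std()
    return np.clip(x, -limit, limit)

def drop_channels(X, rng, ratio=0.1):
    """Zero out a fraction of the channels (rows); at least one row is dropped."""
    X = X.copy()
    n_drop = max(1, round(ratio * X.shape[0]))
    X[rng.choice(X.shape[0], size=n_drop, replace=False)] = 0.0
    return X

def amplitude_drift(x, max_drift=0.05):
    """Apply a linear gain ramp from 1 - max_drift to 1 + max_drift."""
    return x * np.linspace(1.0 - max_drift, 1.0 + max_drift, len(x))

# Hypothetical recording: 14 nodes x 1500 samples of stand-in data
rng = np.random.default_rng(0)
X = rng.standard_normal((14, 1500))
x0 = X[0]
spiky = impulsive_noise(x0, rng)
shifted = sampling_offset(x0, 2)
clipped = ct_saturation(x0)
X_missing = drop_channels(X, rng)
drifted = amplitude_drift(np.ones(5))
```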
Under different perturbations, the method proposed in this paper was compared with the traditional AlexNet method.
Table 7 lists the comparison of localization results between the traditional AlexNet method and the method proposed in this paper under different perturbations. It can be observed that under various disturbance scenarios, the method proposed in this paper can accurately locate the fault section, while the traditional AlexNet method may produce misjudgments.
The joint-driven model shows strong noise resistance, which is mainly due to the knowledge guidance mechanism introduced in the model. By embedding fault features in the distribution network into the model, the model is able to focus on these key fault features even when the data part is disturbed by noise, thus effectively suppressing the impact of noise. The knowledge-driven part helps the model to prioritize features with clear physical meaning and reduce the interference of noise. Meanwhile, the data-driven part of the model further enhances the model’s generalization ability under noisy inputs by training on clean data. The fusion of these two components makes the model naturally robust, thus reducing the impact of noise on recognition accuracy.
5. Conclusions
To address the issues of low accuracy in locating single-phase ground fault sections in distribution grids and poor model interpretability, we propose a deep learning method that integrates knowledge-driven and data-driven approaches. This method uses transient zero-sequence current waveforms as core features to construct a dual-channel image representation combining Markov transition fields and transfer entropy. It introduces a knowledge-guided attention mechanism and a similarity-weighted strategy based on the Hausdorff distance, integrating electrical physical principles into the neural network structure to enhance the model’s ability to perceive and distinguish critical fault features. Experimental results on a simulation system based on a modified IEEE 14-node system demonstrate that the proposed method achieves a fault location accuracy of 99.53%. The proposed joint-driven model outperforms the traditional AlexNet model in terms of fault section location accuracy, noise resistance, and generalization capability.
The main contributions of this paper include the following:
Proposing a knowledge-driven and data-driven combined framework to improve the interpretability and performance of fault location models.
Introducing transfer entropy to characterize the direction of information transmission and combining it with MTF to achieve image representation of signals.
Designing an improved AlexNet for fault segment classification, achieving advanced levels of accuracy.
However, this paper still has certain limitations: its results are mainly validated using simulation data, and further testing using actual fault recording data is still needed. Additionally, while the computational overhead of this method is acceptable for offline analysis, it still needs to be optimized for real-time application in protection devices.
Future work will focus on incorporating actual operational measurement data; optimizing the model structure to reduce computational burden; and exploring adaptive solutions for online fault detection to enhance the method’s engineering applicability in smart distribution systems.