1. Introduction
Accurate and fast fault location is essential for improving distribution network reliability and service restoration capability. With the increasing penetration of distributed generation, power electronic devices, and flexible loads, modern distribution networks exhibit more complex operating characteristics, such as bidirectional power flow, variable fault current contribution, and changing topology [1, 2, 3]. These factors make fault transient characteristics increasingly nonlinear and nonstationary, thereby increasing the difficulty of locating faults rapidly and accurately under diverse operating conditions.
Traditional fault location approaches can be broadly divided into impedance-based and traveling-wave-based categories. Impedance-based techniques estimate the fault distance or section from measured voltage/current quantities and line impedance parameters. Recent studies have enhanced such strategies by combining impedance models with metaheuristic optimization, direct load-flow calculation, fault current constraints, and special grounding-network models [4, 5, 6, 7]. These methods are attractive because their physical meaning is clear and their implementation cost is relatively low. However, their location accuracy remains sensitive to fault resistance, load uncertainty, distributed generation output, parameter errors, and equivalent-source modeling assumptions [8]. Traveling-wave-based methods utilize the arrival time, polarity, or waveform characteristics of fault-generated traveling waves to locate faults [9]. Time-matrix modeling, wide-scale time-window operators, and wavefront-distortion compensation have recently been investigated to improve traveling-wave fault location in active or multi-branch distribution networks [10, 11, 12]. Although traveling-wave-based methods can achieve high location accuracy, they usually require high-frequency measurement devices, reliable wavefront detection, accurate time synchronization, and precise wave velocity estimation, which may limit their practical deployment in complex distribution networks.
In recent years, artificial intelligence-based methods have been widely investigated for fault diagnosis and location in distribution networks. Machine learning and deep learning models can establish nonlinear mappings between fault measurements and fault location labels, thereby reducing the dependence on explicit system modeling. Learning-based strategies have also been increasingly applied to adaptive operation and control in modern power electronic energy systems, such as electric-vehicle-based fast frequency response with deep reinforcement learning [13]. For example, sparse-meter-based location, sparse overcomplete representation with Bayesian learning, and learning-based identification methods have been used for faulted section location under limited measurements or inverter-interfaced distributed generators [14, 15, 16]. Deep convolutional neural networks and wavelet scattering networks have also been introduced to learn discriminative features from synchrophasor or transient signals [17, 18, 19]. Nevertheless, many existing intelligent methods still rely on manually designed features or external signal preprocessing. In particular, when wavelet transform is used only as a preprocessing tool, multi-scale time-frequency information is separated from the deep feature learning process, which may weaken the end-to-end representation ability and feature fusion capability of the model. This indicates a specific research gap: conventional “wavelet preprocessing + CNN” schemes usually generate wavelet coefficients before network training, and the subsequent CNN only learns features from the preprocessed outputs. As a result, the interaction between multi-scale decomposition and deep feature extraction is limited, and some fault-sensitive transient information may not be sufficiently fused during model learning. Therefore, it is necessary to integrate wavelet decomposition more closely into the feature-learning process so that multi-scale fault information can be represented and fused within the network.
To address these issues, this paper proposes a wavelet-embedded residual attention convolutional neural network for distribution network fault location. The fault location task is formulated as a multi-class classification problem, where each predefined fault section is regarded as one candidate class. Unlike conventional wavelet-CNN schemes, the proposed method does not treat discrete wavelet decomposition as an isolated preprocessing step. Instead, the decomposition is embedded in the CNN feature extraction pipeline, so that low-frequency trend components and high-frequency transient components are jointly processed, fused, and refined by the subsequent trainable convolutional layers. This design preserves both global trend information and local transient disturbances, thereby improving the discriminative representation of fault sections. Furthermore, residual connections are introduced to improve the stability of deep feature learning, and an attention mechanism is employed to enhance fault-sensitive channels and frequency components.
The remainder of the paper is organized as follows. Section 2 describes the proposed wavelet-embedded residual attention CNN for distribution network fault location, including the wavelet-embedded convolution layer, wavelet residual attention feature extraction network, and fault location classification objective. Section 3 presents the experimental setup, dataset construction, cross-validation strategy, training settings, and evaluation metrics. Section 4 reports the experimental verification results, including fault location performance comparison, computational-effort analysis, and robustness tests. Finally, Section 5 concludes the paper and discusses future work.
3. Experimental Setup
Case studies are conducted on the IEEE 33-bus distribution system, whose topology is shown in Figure 2. The system is a radial distribution feeder with one source bus and 32 load buses, and it is commonly used to verify distribution network protection and fault location methods. Its main feeder and lateral branches provide different electrical distances and branch relationships, making it suitable for examining section-level fault location. In this study, faults are assigned to candidate line sections, and the corresponding transient measurement signals are used as model inputs.
The fault location task is formulated as a multi-class classification problem, where each predefined line section is regarded as one candidate fault location class. Therefore, the output of the model is the faulted section label rather than a continuous distance value, which is consistent with section-level fault isolation and maintenance in distribution networks. This setting also allows the classification result to be directly compared with the actual faulted section, so that both overall accuracy and class-wise location behavior can be analyzed.
Before being fed into the network, each input signal is normalized to reduce the influence of amplitude-scale differences among samples. The dataset contains 5000 fault samples. Each sample consists of 33 node signals, and each node signal contains 50 sampling points. The 32 line sections are used as fault location classes. Different operating and fault conditions are considered in the simulation. The fault inception angle is varied from 0° to 330° in 30° increments. The load level is set to 0.8, 1.0, and 1.2 times the nominal value to represent light-load, nominal-load, and heavy-load operating conditions, respectively. Distributed generation is modeled as inverter-interfaced sources operating at a fixed power factor, with penetration levels of 0%, 10%, and 20%. The 5000 samples are therefore generated from different combinations of fault locations, fault types, inception angles, load levels, and distributed-generation conditions, rather than from a single fixed operating condition.
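The text does not state which normalization is applied. As a minimal sketch, assuming per-signal z-score normalization (zero mean, unit variance for each node signal), the step could look as follows. The function names and the synthetic sample are illustrative only; the 33 × 50 sample shape is taken from the dataset description.

```python
import math

def zscore_normalize(signal):
    """Return a zero-mean, unit-variance copy of one node signal (assumed scheme)."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    if std == 0.0:
        # A flat signal carries no shape information; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in signal]

def normalize_sample(sample):
    """Normalize every node signal of one fault sample independently."""
    return [zscore_normalize(sig) for sig in sample]

# Synthetic sample: 33 node signals x 50 sampling points. Every node has the
# same waveform shape but a different amplitude.
sample = [[(node + 1) * math.sin(2 * math.pi * t / 50) for t in range(50)]
          for node in range(33)]
normalized = normalize_sample(sample)
```

After normalization the 33 signals of this synthetic sample coincide, because they differ only in amplitude. This is the amplitude-scale effect the normalization is meant to remove.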
To avoid information leakage across highly similar fault scenarios, the dataset is divided at the scenario level rather than the individual-sample level. Specifically, samples sharing the same fault location, fault type, fault inception angle, load level, and distributed-generation condition are assigned to the same data subset. This prevents nearly identical fault cases from appearing simultaneously in the training and test sets, so that the reported performance better reflects the generalization ability of the model rather than memorization of highly similar samples.
To evaluate the generalization ability of the proposed model more reliably and to reduce the selection bias caused by a single train–test split, stratified 5-fold cross-validation is adopted. The whole dataset is divided into five folds while preserving the class distribution of the 32 fault location classes in each fold, and the scenario-level grouping described above is maintained during fold construction. In each round, four folds are used for model training and the remaining fold is used for testing. When a validation set is required for model selection, it is drawn only from the training folds to avoid any information leakage into the test fold. This process is repeated five times so that each fold is used once as the test set, and the final performance is obtained by averaging the results over the five folds.
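The combination of scenario-level grouping and class stratification can be sketched as below. This is an illustrative, dependency-free version and not the authors' implementation. The function name, the scenario key layout, and the synthetic sample list are assumptions. Because the fault location is part of the scenario definition, every group belongs to exactly one class, so stratification can be done by dealing the groups of each class across the folds.

```python
from collections import Counter, defaultdict

def scenario_stratified_folds(samples, n_folds=5):
    """Assign a fold index to every sample.

    `samples` is a list of (section, scenario_key) pairs. Samples sharing
    (section, scenario_key) form one group and always land in the same fold.
    The groups of each section are dealt out round-robin, so every fold
    receives a similar share of each class.
    """
    groups_by_section = defaultdict(lambda: defaultdict(list))
    for idx, (section, scenario_key) in enumerate(samples):
        groups_by_section[section][scenario_key].append(idx)

    fold_of = [None] * len(samples)
    for section in sorted(groups_by_section):
        groups = groups_by_section[section]
        # Largest groups first keeps fold sizes balanced within a class.
        ordered = sorted(groups, key=lambda k: (-len(groups[k]), k))
        for position, key in enumerate(ordered):
            for idx in groups[key]:
                fold_of[idx] = position % n_folds
    return fold_of

# Synthetic example: 32 sections, 10 scenarios per section, 2 samples per scenario.
scenarios = [(ftype, angle) for ftype in ("AG", "ABC") for angle in (0, 30, 60, 90, 120)]
samples = [(sec, sc) for sec in range(1, 33) for sc in scenarios for _ in range(2)]
folds = scenario_stratified_folds(samples)
per_fold_class = Counter((f, sec) for (sec, _), f in zip(samples, folds))
```

Where scikit-learn is available, its `StratifiedGroupKFold` splitter addresses the same requirement and may be preferable to a hand-written routine.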
In the implementation of the proposed wavelet-embedded convolution layer, the Daubechies 4 (db4) wavelet basis is used. The corresponding low-pass and high-pass wavelet filters are fixed and non-trainable during network optimization, rather than being updated by back-propagation. Therefore, the wavelet decomposition provides deterministic multi-scale signal representation, while the subsequent convolutional, residual, and attention modules perform adaptive feature learning and feature fusion. This setting makes the role of the wavelet-embedded layer explicit and clarifies that the performance improvement comes from the joint use of fixed wavelet decomposition and trainable deep feature extraction.
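A fixed, non-trainable decomposition of this kind can be sketched as a stride-2 filtering step with constant coefficients. The sketch below assumes the PyWavelets naming convention, in which db4 has eight taps, and it assumes periodic boundary handling; neither detail is stated in the text. It covers only the deterministic decomposition and not the trainable layers that follow it.

```python
import math

# Daubechies-4 decomposition low-pass filter (8 taps, PyWavelets "db4" convention).
DB4_LO = [
    -0.010597401784997278, 0.032883011666982945,
    0.030841381835986965, -0.18703481171888114,
    -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
]
# Quadrature-mirror high-pass filter derived from the low-pass taps.
# The overall sign is a convention and may differ between libraries.
DB4_HI = [(-1) ** j * DB4_LO[len(DB4_LO) - 1 - j] for j in range(len(DB4_LO))]

def dwt_level1(signal):
    """One-level DWT with fixed filters, stride 2, and periodic boundaries."""
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for k in range(n // 2):
        window = [signal[(2 * k + j) % n] for j in range(len(DB4_LO))]
        approx.append(sum(h * x for h, x in zip(DB4_LO, window)))  # low-frequency trend
        detail.append(sum(g * x for g, x in zip(DB4_HI, window)))  # high-frequency transient
    return approx, detail

# 50-point synthetic signal: a slow trend plus a short disturbance at t = 30.
signal = [math.sin(2 * math.pi * t / 50) for t in range(50)]
signal[30] += 0.5
approx, detail = dwt_level1(signal)
```

In this example the smooth trend ends up almost entirely in `approx`, while the largest `detail` coefficients sit around the disturbance. Because the filter taps are constants, nothing in this step is updated by back-propagation.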
The network is trained using cross-entropy loss, and AdamW is adopted for parameter optimization. To ensure a fair comparison, the same data-processing and evaluation protocol is used for all compared methods. Thus, differences in the final results mainly come from the model structures rather than from inconsistent training settings. The key training parameters of the proposed method are listed in Table 1.
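For reference, the cross-entropy objective for a single sample can be written in a few lines. The sketch below is a plain-Python illustration with hypothetical logits, not the training code. The optimizer is omitted, since AdamW would normally be supplied by the deep learning framework in use.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]

K = 32  # number of fault-section classes stated in the text

# An untrained model that scores all sections equally.
uniform_loss = cross_entropy([0.0] * K, target=0)

# Hypothetical logits that strongly favor the correct section (index 10).
confident_logits = [0.0] * K
confident_logits[10] = 8.0
confident_loss = cross_entropy(confident_logits, target=10)
```

With 32 equally scored classes the loss equals ln 32, which is a useful sanity check for the initial loss value at the start of training.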
The performance is evaluated by accuracy, precision, recall, and F1-score. Accuracy reflects the overall proportion of correctly located samples, while precision, recall, and F1-score describe the class-wise recognition quality from different perspectives. Since the fault location task involves multiple candidate sections, relying only on accuracy may obscure uneven performance among classes. Therefore, precision and recall are calculated in a one-vs-rest manner and then averaged over all classes:
$$\mathrm{Accuracy} = \frac{N_{\mathrm{correct}}}{N}, \qquad \mathrm{Precision} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FP_k},$$
$$\mathrm{Recall} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FN_k}, \qquad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where $N_{\mathrm{correct}}$ is the number of correctly located samples, $N$ is the total number of test samples, $K$ is the number of fault location classes, and $TP_k$, $FP_k$, and $FN_k$ denote true positives, false positives, and false negatives of the $k$th class, respectively. For the stratified 5-fold cross-validation, these metrics are first calculated on the test fold of each round and then averaged over the five rounds to obtain the final reported performance.
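The one-vs-rest averaging can be illustrated with a small sketch. Two points are assumptions rather than statements from the text: the F1-score is taken as the harmonic mean of the class-averaged precision and recall, which is one common convention, and a class with an empty denominator contributes zero. The labels are a toy example with three classes.

```python
def macro_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged one-vs-rest precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    precisions, recalls = [], []
    for k in classes:
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        # Classes with an empty denominator contribute 0 (assumed convention).
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / len(classes)
    recall = sum(recalls) / len(classes)
    # Assumed F1 definition: harmonic mean of the macro-averaged values.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for three sections.
y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 2, 2, 3, 3, 2]
scores = macro_metrics(y_true, y_pred, classes=[1, 2, 3])
```

In this toy case the accuracy is 0.75, while the macro-averaged precision (5/6) and recall (7/9) differ from it, which illustrates why accuracy alone can hide uneven class-wise behavior.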
4. Experimental Verification and Discussion
4.1. Fault Location Performance and Comparison
This subsection first examines the class-wise location behavior of the proposed method and then compares it with five baselines: multi-layer perceptron (MLP), support vector machine (SVM), convolutional neural network (CNN), ResNet, and Attention-CNN. These baselines were selected because they are representative and widely used in fault location studies. MLP and SVM serve as classical machine learning baselines, CNN is the basic deep learning baseline with standard convolutional feature learning, ResNet is used to examine the effect of residual feature propagation, and Attention-CNN is used to assess the effect of attention-based feature enhancement. All methods use the same data representation and training protocol, so the comparison evaluates whether the joint use of wavelet embedding, residual learning, and attention enhancement improves fault location performance over these representative baseline structures.
The class-wise correct location rate is illustrated in Figure 3. Across the five-fold evaluation, the proposed method achieves an average test accuracy of 98.27% and a class-averaged correct location rate of 98.29%. In the representative test fold used for class-wise visualization, most candidate sections are located with very high accuracy. The lowest class-wise rate appears at F11, where the correct location rate is 90.91%. The remaining non-perfect sections, such as F9, F10, and F16, still maintain correct location rates above 91%. These results show that the proposed model does not rely only on several easily identified sections, but preserves high recognition ability across most candidate fault location classes.
To further examine the location error pattern, Figure 4 gives the misclassification distribution of each class in the representative test fold. The stacked bars show only the misclassification categories with nonzero values, including previous-section errors and other-section errors, while the black curve denotes the total error rate. The misclassified samples are mainly concentrated in a small number of sections, and the maximum class-wise error rate is 9.09% at F11. Most wrong predictions are assigned to the previous adjacent section, whereas non-neighboring errors occur only rarely. In this representative test fold, no next-section or second-neighbor-section errors are observed. This error pattern indicates that the proposed method seldom produces large location deviations, which is useful for practical fault isolation because localized errors are easier to inspect and correct.
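The error categories used in this analysis can be computed from the predicted and true section indices. The sketch below assumes that sections are numbered consecutively, so that an index offset of −1 means the previous adjacent section. In a radial feeder with laterals, topological neighbors do not always have consecutive indices, so a real analysis would need the network adjacency. The labels are synthetic and are not taken from the reported experiments.

```python
from collections import Counter

def error_category(true_section, predicted_section):
    """Classify one prediction by its index offset from the true section."""
    offset = predicted_section - true_section
    if offset == 0:
        return "correct"
    if offset == -1:
        return "previous-section"
    if offset == 1:
        return "next-section"
    if abs(offset) == 2:
        return "second-neighbor"
    return "other-section"

def error_rates(y_true, y_pred, section):
    """Share of each error category among the test samples of one section."""
    preds = [p for t, p in zip(y_true, y_pred) if t == section]
    counts = Counter(error_category(section, p) for p in preds)
    return {cat: n / len(preds) for cat, n in counts.items() if cat != "correct"}

# Synthetic example: 20 samples of section 11, of which 18 are located
# correctly, one is assigned to section 10, and one to section 25.
y_true = [11] * 20
y_pred = [11] * 18 + [10, 25]
rates = error_rates(y_true, y_pred, section=11)
```

Summing the returned shares gives the total error rate of the section, which corresponds to the curve drawn over the stacked bars.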
The comparison results are summarized in Table 2. MLP is used as a basic nonlinear classifier, and SVM is used as a traditional machine learning baseline. CNN directly uses standard convolution and serves as the basic deep learning benchmark. ResNet introduces residual connections to improve feature propagation, whereas Attention-CNN adds an attention module to emphasize informative features. These methods provide a progressive comparison for evaluating the contribution of the proposed integrated structure.
Among all compared methods, the proposed method obtains the highest accuracy, precision, recall, and F1-score. Its accuracy is 7.59, 3.33, 1.46, 0.67, and 1.20 percentage points higher than those of MLP, SVM, CNN, ResNet, and Attention-CNN, respectively. The large gains over MLP and SVM indicate that direct classification based on flattened or pooled features is insufficient to fully capture the spatial–temporal fault patterns. The improvement over CNN shows the benefit of multi-scale wavelet feature extraction. In addition, the gains over ResNet and Attention-CNN suggest that residual propagation and attention weighting become more effective when they are combined with wavelet-based multi-scale representations. Although the numerical accuracy improvement over ResNet and Attention-CNN is moderate, the advantage of the proposed method is not limited to the overall fault-classification metric. Since the classification labels correspond to physical line sections, improved classification performance directly contributes to more reliable section-level fault location. Moreover, the proposed method provides more balanced class-wise recognition, reduces large location deviations, and maintains stable performance under different transition resistances, measurement noise levels, and fault types. Therefore, the proposed model improves not only the overall correctness but also the practical reliability of fault section identification.
To further evaluate the computational effort of different methods, the average inference time was measured on the same evaluation samples under the same hardware environment and batch-size setting. The results are listed in Table 3.
As shown in Table 3, the proposed method requires a slightly longer inference time than the compared baselines because the wavelet-embedded feature extraction, residual propagation, and attention enhancement introduce additional computational operations. However, the average inference time is still only 1.05 ms/sample, which remains within an acceptable range for online fault location applications. Compared with ResNet and Attention-CNN, the proposed method introduces only modest computational overhead while achieving higher accuracy, more balanced class-wise performance, and stronger robustness. Therefore, the proposed method provides a favorable trade-off between location performance and computational cost. Overall, these results confirm that the proposed method improves both overall correctness and class-wise recognition balance.
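A per-sample inference time of this kind is typically obtained by timing batched predictions and dividing by the number of samples. The following sketch shows one way to do this. The `predict` callable, the batch size, and the number of repeats are hypothetical placeholders, since the measurement settings are not given in the text.

```python
import time

def average_inference_ms(predict, samples, batch_size=32, repeats=3):
    """Mean wall-clock inference time per sample, in milliseconds."""
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    predict(batches[0])  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        for batch in batches:
            predict(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))

def dummy_predict(batch):
    """Hypothetical stand-in for a trained model: predicts class 0 for every sample."""
    return [0 for _ in batch]

samples = [[0.0] * 50 for _ in range(256)]
ms_per_sample = average_inference_ms(dummy_predict, samples)
```

Using the same sample set, batch size, and hardware for every model keeps the timings comparable across methods.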
4.2. Ablation Study
To further verify the contribution of each component in the proposed framework, an ablation study is conducted. Four model variants are compared: CNN, CNN + Wavelet, CNN + Wavelet + Residual, and CNN + Wavelet + Residual + Attention. The CNN variant is the basic convolutional baseline and is kept identical to the CNN used in the comparison experiment. The CNN + Wavelet variant introduces the wavelet-embedded feature extraction module to evaluate the effect of multi-scale transient representation. The CNN + Wavelet + Residual variant further adds residual connections to examine the benefit of improved feature propagation. Finally, CNN + Wavelet + Residual + Attention corresponds to the complete proposed model, in which the attention mechanism is used to enhance fault-sensitive representations. The ablation results are summarized in Table 4.
As shown in Table 4, the performance improves progressively as each module is added. Compared with the basic CNN, CNN + Wavelet improves the accuracy from 96.80% to 97.42%, indicating that the wavelet-embedded module can enrich multi-scale transient features and improve fault location representation. After adding residual connections, the accuracy further increases to 97.98%, which shows that residual learning helps improve feature propagation and stabilize deeper feature extraction. When the attention mechanism is further introduced, the complete model achieves the best performance, with 98.27% accuracy and a 98.33% F1-score. This confirms that the attention mechanism can further enhance fault-sensitive information and improve classification reliability. Overall, the ablation study demonstrates that wavelet embedding, residual learning, and attention enhancement all contribute positively to the final fault location performance.
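The step-by-step gains implied by the accuracies quoted in this paragraph can be worked out directly. The short script below uses only those four values. The variable names are illustrative.

```python
# Accuracies (%) quoted in the ablation discussion, in the order modules are added.
ablation = [
    ("CNN", 96.80),
    ("CNN + Wavelet", 97.42),
    ("CNN + Wavelet + Residual", 97.98),
    ("CNN + Wavelet + Residual + Attention", 98.27),
]

# Gain of each variant over the one before it, in percentage points.
gains = [
    (later[0], round(later[1] - earlier[1], 2))
    for earlier, later in zip(ablation, ablation[1:])
]

for name, gain in gains:
    print(f"{name}: +{gain:.2f} percentage points")
```

The wavelet-embedded module accounts for the largest single step (0.62 percentage points), followed by the residual connections (0.56) and the attention mechanism (0.29).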
4.3. Robustness Analysis
The robustness of the proposed method is evaluated under different transition resistances, noise levels, and fault types. These factors are selected because they directly affect the waveform amplitude, transient components, and phase relationships of fault signals. In practical distribution networks, such variations are difficult to avoid, and a practical fault location model should maintain stable performance when signal characteristics change. Therefore, the following tests provide a more comprehensive evaluation of the adaptability of the proposed method.
4.3.1. Influence of Transition Resistance
The influence of transition resistance is investigated first, and the corresponding results are listed in Table 5.
With the increase in transition resistance, the accuracy decreases from 98.09% at 0.01 Ω to 91.84% at 100 Ω. This trend is expected because a larger transition resistance weakens the fault current and reduces the distinction between different fault sections. In particular, when the transition resistance increases to 100 Ω, the fault current amplitude becomes significantly smaller, and the transient signatures associated with different line sections become less distinguishable. As a result, the feature separability among adjacent fault sections is reduced, which makes the classification task more difficult and leads to a more obvious performance drop. Even under the highest tested resistance, however, the accuracy remains above 90%. Compared with the 0.01 Ω case, the accuracy drop at 100 Ω is 6.25 percentage points. Meanwhile, the F1-score remains above 90.63% in all tested resistance cases, indicating that the proposed method can still preserve effective location features under high-resistance faults. Nevertheless, high-resistance faults remain more challenging than low-resistance faults because their fault-induced transients are weaker and more easily affected by load variation and measurement noise. Therefore, additional feature enhancement, high-resistance-fault-oriented data augmentation, or adaptive sample reweighting may further improve the location performance under high-resistance fault conditions, which will be considered in future work.
4.3.2. Influence of Measurement Noise
The influence of measurement noise is evaluated by varying the signal-to-noise ratio (SNR), and the results are given in Table 6.
When the SNR decreases from 40 dB to 10 dB, the accuracy decreases from 97.96% to 93.86%. The degradation is gradual rather than abrupt, indicating that the model remains stable as measurement interference increases. At 10 dB, the input signal is strongly disturbed, but the F1-score still reaches 93.97%. This result suggests that the proposed model has a degree of noise tolerance, which can be attributed to the combined use of low-frequency trend information and high-frequency transient information.
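A noise test of this kind requires scaling the injected noise to a target SNR. The sketch below assumes additive zero-mean white Gaussian noise, which is a common choice but is not specified in the text. The function names and the synthetic sine signal are illustrative.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus zero-mean Gaussian noise scaled to the target SNR (dB)."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy signal relative to its clean version."""
    signal_power = sum(v * v for v in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed for repeatability
clean = [math.sin(2 * math.pi * t / 50) for t in range(5000)]
results = {}
for target in (40, 10):  # the two end points quoted in the text
    noisy = add_noise_at_snr(clean, target, rng)
    results[target] = measured_snr_db(clean, noisy)
```

The long synthetic signal is used only so that the measured SNR settles close to the target. On a 50-point sample the realized SNR fluctuates more from one noise draw to the next.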
4.3.3. Influence of Fault Type
Finally, the adaptability of the proposed method to different fault categories is examined. Table 7 lists the results for single-phase-to-ground, phase-to-phase, two-phase-to-ground, and three-phase faults.
For the four fault types, the accuracy ranges from 98.07% to 98.52%. Among them, the two-phase-to-ground fault yields the lowest accuracy, while the three-phase fault yields the highest accuracy. This result is consistent with the fact that different fault types produce different transient characteristics and phase couplings. Three-phase faults usually have more pronounced and balanced waveform changes, whereas grounding-related faults may exhibit more asymmetric phase responses and weaker transient differences among phases. Even so, the F1-score remains above 98.11% for all fault categories, showing that the proposed method maintains stable performance when the fault type changes.
5. Conclusions
In this paper, a wavelet-embedded residual attention convolutional neural network has been proposed for fault location in distribution networks. The fault location task is formulated as a multi-class classification problem, in which each predefined line section corresponds to one candidate class. By embedding fixed discrete wavelet decomposition into the convolutional feature extraction process, the proposed method obtains low-frequency trend information and high-frequency transient information, which are then adaptively fused by trainable convolutional, residual, and attention modules. Residual connections and the attention mechanism are further introduced to improve feature propagation and enhance fault-sensitive representations. Simulation experiments on the IEEE 33-bus distribution system show that the proposed method outperforms representative classical machine learning and deep learning baselines, achieving an average accuracy of 98.27% and an average F1-score of 98.33%. The class-wise results are also stable, with most misclassifications limited to adjacent sections rather than large location deviations. In addition, the robustness tests under different transition resistances, noise levels, and fault types indicate that the proposed method has good adaptability to varying fault conditions. The present study is evaluated primarily on the IEEE 33-bus distribution system, and more field-measured data from practical distribution networks should be incorporated in future work. The effects of topology changes, distributed-generation operating modes, load variations, and measurement errors also warrant further analysis.
Future research will focus on improving the generalization ability, computational efficiency, and lightweight deployment of the proposed method for practical online fault location.