1. Introduction
Power cables are critical components of modern power transmission and distribution systems and are widely deployed in urban distribution networks, industrial power supply systems, offshore wind farms, and transportation infrastructure. With the rapid growth of electricity demand and the continuous expansion of power grids, the scale and complexity of cable networks have increased significantly. Consequently, the operational reliability of power cables directly affects the stability and safety of the entire power system. However, during long-term operation, cables are inevitably exposed to insulation aging, thermal stress, mechanical damage, and environmental corrosion, which may gradually degrade their insulation performance and lead to various types of faults. Among these failures, many permanent faults are preceded by incipient faults, which often manifest as intermittent discharge phenomena or weak insulation degradation before evolving into catastrophic failures [
1,
2]. Early detection of such faults is therefore of great importance for improving power system reliability and reducing maintenance costs.
In the past decades, numerous studies have investigated cable fault detection and location techniques. Traditional methods mainly include bridge-based techniques, impedance analysis, and traveling-wave-based fault location methods. These methods have been widely applied to locate permanent faults in power cables and distribution networks [
3,
4,
5]. For example, traveling-wave-based approaches utilize high-frequency transient signals generated during fault occurrence to determine fault location with high accuracy [
6]. Other approaches rely on voltage distribution characteristics or distributed parameter models to estimate fault distances in long-distance cables [
7]. Although these techniques are effective for locating permanent faults, they are generally less sensitive to high-impedance or incipient faults, which often exhibit weak transient features.
To overcome this limitation, researchers have introduced advanced signal processing techniques to extract weak fault signatures from cable monitoring signals. Time–frequency analysis methods such as wavelet transform and empirical mode decomposition (EMD) have been widely applied to analyze transient signals in power systems [
8,
9]. However, EMD often suffers from mode mixing problems when processing complex non-stationary signals. To address this issue, improved signal decomposition algorithms such as Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Variational Mode Decomposition (VMD) have been proposed to enhance signal decomposition quality and improve feature extraction capability [
10,
11,
12]. These methods have demonstrated strong potential in power equipment fault diagnosis by effectively separating signal components in different frequency bands.
With the rapid development of artificial intelligence technologies, machine learning and deep learning methods have become increasingly important in the field of power system fault diagnosis. Traditional machine learning algorithms such as Support Vector Machines (SVM) and Random Forests have been used to identify fault patterns and distinguish abnormal events from normal operating conditions [
13,
14]. In recent years, deep learning approaches including Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and attention-based models have achieved significant progress in fault diagnosis tasks due to their strong feature learning capability [
15,
16,
17]. For instance, deep convolutional networks have been successfully applied to cable fault classification tasks with improved diagnostic accuracy [
18]. In addition, recent studies have integrated attention mechanisms and Transformer architectures into fault diagnosis frameworks, enabling models to capture long-range temporal dependencies and enhance feature representation [
19,
20].
In the context of cable fault diagnosis, several recent studies have explored the integration of deep learning and advanced signal processing techniques. For example, deep learning models combined with wavelet packet transform have been used to identify faults in transmission lines with improved robustness [
21]. Incremental learning-based convolutional networks have also been proposed to enhance the generalization capability of cable fault diagnosis systems when new fault types appear [
22]. Moreover, graph neural networks and self-attention mechanisms have been introduced to analyze time-series signals for cable defect detection, achieving promising results in practical monitoring scenarios [
23]. Recent review studies further indicate that artificial intelligence techniques provide powerful tools for cable fault detection and localization, especially in complex power networks with large-scale monitoring data [
24].
Recent studies in adjacent sensing and diagnostic fields also indicate several methodological directions that are relevant to the present problem. For example, Taguchi- and ANOVA-based optimization frameworks have been used to statistically validate parameter effects and hyperparameter settings in AI-assisted monitoring systems [
25]. Such methods provide structured factor analysis and stronger statistical interpretability, particularly when the objective is to quantify the influence of a limited number of design variables. Gaussian-process-based data-driven models have further shown advantages in surrogate modelling and uncertainty-aware regression for electromagnetic characterization problems [
26]. In addition, integrated sensor/electronic-circuit monitoring architectures combined with artificial intelligence have demonstrated the practical value of coupling embedded acquisition hardware with host-side intelligent diagnosis in noisy industrial environments [
27]. In contrast to orthogonal-design-based statistical optimisation strategies, the present study addresses a continuous and nonlinear CEEMDAN parameter-search problem in which the decomposition quality depends on the coupled interaction between noise amplitude and ensemble number. For this reason, a population-based meta-heuristic search strategy, namely WOA, is adopted to provide adaptive global optimisation for the decomposition stage.
Despite these advances, detecting incipient cable faults remains challenging due to the weak and intermittent characteristics of early fault signals, strong environmental noise, and the limited availability of labeled fault data. Traditional signal decomposition methods often rely on manually selected parameters, which may affect decomposition performance and diagnostic accuracy. In addition, many deep learning models struggle to simultaneously capture both local transient features and long-range temporal dependencies in fault signals. Therefore, it is necessary to develop an integrated framework that combines robust signal processing techniques with advanced deep learning architectures to improve the reliability of incipient fault diagnosis.
To address these challenges, this paper proposes a diagnostic framework for incipient cable faults based on WOA-optimized CEEMDAN and a TCN-BiLSTM-Multihead-Attention network. First, the Whale Optimization Algorithm (WOA) is employed to adaptively optimize the key parameters of CEEMDAN, thereby improving decomposition quality and enhancing the extraction of weak fault features. Subsequently, the decomposed intrinsic mode functions (IMFs) are reconstructed as multi-channel inputs and fed into a hybrid deep learning architecture that integrates Temporal Convolutional Networks, Bidirectional Long Short-Term Memory, and Multihead Attention for fault identification. Experimental results on both simulation data and real cable-monitoring data demonstrate that the proposed method achieves superior diagnostic accuracy, robustness, and practical applicability compared with several representative approaches. The overall workflow of the proposed research framework for incipient cable fault diagnosis is shown in
Figure 1.
The main contributions of this study are summarized as follows:
(1) An adaptive WOA-guided CEEMDAN framework is developed to optimize the noise amplitude and ensemble number automatically, which improves decomposition quality and enhances the extraction of weak incipient-fault components from noisy cable signals.
(2) A hybrid TCN-BiLSTM-Multihead-Attention network is constructed to jointly learn local transients, bidirectional temporal dependencies, and globally important feature interactions, thereby improving diagnostic discriminability for subtle fault patterns.
(3) An integrated signal-processing-to-diagnosis pipeline is established by explicitly mapping the optimized CEEMDAN outputs to multi-channel deep features, while the computational burden is controlled through offline parameter optimization, limited IMF retention, and a compact sequential architecture.
(4) Extensive experiments on both simulated and real measured datasets, together with ablation, small-sample, noise-robustness, ROC/AUC, and confusion-matrix analyses, verify the superiority and engineering applicability of the proposed method for incipient cable fault diagnosis.
The offline phase includes signal acquisition from both simulation and real cable-monitoring systems, signal segmentation and normalization, WOA-based optimization of CEEMDAN parameters, adaptive signal decomposition, IMF selection and multi-channel feature construction, model training, and performance validation. The subsequent stage illustrates the intended diagnosis procedure for future engineering implementation, including new signal acquisition, fixed-parameter CEEMDAN decomposition, IMF-based feature reconstruction, TCN-based local feature extraction, BiLSTM-based temporal modeling, multi-head-attention-based feature reweighting, Softmax-based fault classification, and alarm output. It should be noted that, in the present study, this procedure was validated as an offline diagnosis framework based on field-acquired data rather than as a fully online embedded deployment.
3. TCN-BiLSTM-Multihead-Attention Model
To effectively capture both local and global temporal dependencies in cable incipient fault signals, this paper proposes a hybrid deep learning model that integrates a Temporal Convolutional Network (TCN) [32], Bidirectional Long Short-Term Memory (BiLSTM), and a Multihead Attention mechanism. This architecture leverages the strengths of each component to enhance feature extraction and sequence modeling capabilities.
3.1. Temporal Convolutional Network (TCN)
TCN is a variant of convolutional networks designed for sequence modeling tasks. It employs causal convolutions to ensure that no future information leaks into the past, making it suitable for time-series analysis. The architecture also incorporates dilated convolutions to exponentially increase the receptive field without significantly increasing the number of parameters. This allows TCN to capture long-range dependencies with high computational efficiency.
Given an input sequence $x = (x_1, x_2, \ldots, x_T)$, the output of a dilated causal convolution at time $t$ is defined as:
$$y_t = \sum_{i=0}^{k-1} f(i)\, x_{t - d \cdot i}$$
where $k$ is the kernel size, $d$ is the dilation factor, and $f$ is the convolution filter. By stacking multiple layers with increasing dilation rates, TCN captures multi-scale temporal features critical for identifying incipient fault patterns.
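As a minimal illustration of the dilated causal convolution described above, the following pure-Python sketch computes each output sample from k input samples spaced d steps apart, looking only at the present and the past. The function name `dilated_causal_conv`, the zero padding of the left edge, and the toy filter values are assumptions made for this example and are not taken from the paper's implementation.

```python
def dilated_causal_conv(x, f, d):
    """Return y with y[t] = sum_{i=0}^{k-1} f[i] * x[t - d*i].

    Samples before the start of the sequence are treated as zero
    (left zero padding, an assumption of this sketch).
    """
    k = len(f)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - d * i
            if j >= 0:  # causal: only the current and earlier samples contribute
                acc += f[i] * x[j]
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f = [0.5, 0.25, 0.25]          # toy kernel of size k = 3
y_d1 = dilated_causal_conv(x, f, d=1)
y_d2 = dilated_causal_conv(x, f, d=2)

# Causality check: changing the last input must not change earlier outputs.
x_changed = x[:-1] + [100.0]
y_changed = dilated_causal_conv(x_changed, f, d=2)
```

With d = 2 the same three-tap kernel spans five input samples instead of three, which is how stacking layers with growing dilation enlarges the receptive field without adding parameters.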
3.2. Bidirectional Long Short-Term Memory
While TCN excels at capturing local patterns, BiLSTM is introduced to model bidirectional temporal dependencies [33]. LSTM units mitigate the vanishing gradient problem through gating mechanisms, enabling the network to retain long-term memory. The bidirectional extension processes the sequence in both forward and backward directions, allowing the model to access past and future context simultaneously. The structural diagram is shown in Figure 3.
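For each time step $t$, the forward LSTM generates a hidden state $\overrightarrow{h}_t$, and the backward LSTM generates $\overleftarrow{h}_t$. These are concatenated to form the final hidden representation:
$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$$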
This structure enhances the model’s sensitivity to subtle variations in fault signals that may precede or follow a fault event.
3.3. Multihead Attention Mechanism
To further improve the model’s ability to focus on informative time steps, a Multihead Attention mechanism is applied to the output of the BiLSTM layer. The attention mechanism assigns different weights to different time steps, enabling the model to emphasize fault-related features while suppressing irrelevant fluctuations and residual noise, as shown in Equation (15):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{15}$$
Here, $Q$, $K$ and $V$ denote the query, key, and value matrices derived from the BiLSTM outputs, respectively, and $d_k$ denotes the dimension of the key vectors [34]. Multihead Attention performs multiple attention operations in parallel, each of which learns a different representation subspace, and then concatenates the outputs, as shown in Equations (16) and (17):
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \tag{16}$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \tag{17}$$
where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ and $W^{O}$ are trainable projection matrices and $h$ is the number of heads. By performing attention operations in multiple representation subspaces, the Multihead Attention mechanism enhances the model’s ability to capture complex fault patterns at different temporal scales. For clarity, the main symbols used in Equations (15)–(17) are summarized in Table 2.
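The attention computation described in this subsection can be sketched in a few lines of pure Python: scaled dot-product attention for one head, followed by several heads run in parallel whose outputs are concatenated and projected. The helper names (`matmul`, `attention`, `multi_head`), the random projection matrices, and the toy sizes (5 time steps, model dimension 8, 2 heads) are illustrative assumptions and do not reproduce the trained network.

```python
import math
import random


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


def multi_head(X, heads, W_o):
    """Run one attention per (W_q, W_k, W_v) triple, concatenate, project with W_o."""
    outs = []
    for W_q, W_k, W_v in heads:
        out, _ = attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v))
        outs.append(out)
    concat = [sum((o[t] for o in outs), []) for t in range(len(X))]
    return matmul(concat, W_o)


rng = random.Random(0)
rand = lambda r, c: [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

T, d_model, n_heads = 5, 8, 2          # toy sizes, not the paper's settings
d_head = d_model // n_heads
X = rand(T, d_model)                   # stands in for the BiLSTM output sequence
heads = [(rand(d_model, d_head), rand(d_model, d_head), rand(d_model, d_head))
         for _ in range(n_heads)]
W_o = rand(n_heads * d_head, d_model)

Y = multi_head(X, heads, W_o)
_, attn_weights = attention(X, X, X)
```

Each row of `attn_weights` sums to one, so every output time step is a weighted average of the value vectors, with larger weights on the time steps the model treats as most relevant.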
3.4. TCN-BiLSTM-Multihead-Attention
The TCN-BiLSTM-Multihead-Attention hybrid neural network model proposed in this paper is an end-to-end deep learning architecture designed specifically for cable incipient fault signal characteristics. Its overall structure adopts a hierarchical progressive design as shown in
Figure 4, primarily consisting of five core modules: the input layer receives multi-dimensional time-series data reconstructed by WOA-CEEMDAN (feature matrix composed of IMF1-IMF6); the TCN feature extraction module serves as the front end, capturing local temporal patterns and multi-scale features of fault signals through multi-layer dilated causal convolutions; the BiLSTM temporal modeling module, composed of forward and backward LSTMs, performs bidirectional modeling on the feature sequences output by TCN, simultaneously capturing past and future contextual dependencies; the Multihead-Attention feature enhancement module maps the BiLSTM outputs into query, key, and value matrices, highlighting critical time steps of fault features from different representation subspaces through parallel computation of multiple attention heads; the output layer, through global average pooling, fully connected layers, and a Softmax classifier, ultimately outputs the cable incipient fault identification results. This hybrid architecture fully leverages the local feature extraction capability of TCN, the temporal modeling capability of BiLSTM, and the feature enhancement capability of the attention mechanism, achieving efficient extraction and accurate recognition of weak cable incipient fault characteristics.
As illustrated in
Figure 4, dropout layers with a rate of 0.3 were inserted after the BiLSTM and Multihead Attention modules to reduce overfitting and improve the generalization capability of the proposed network. In the overall framework, WOA-CEEMDAN serves as the signal preprocessing module and is responsible for adaptively decomposing noisy cable current signals into informative intrinsic mode functions, thereby enhancing weak fault-related components. The TCN module acts as the front-end feature extractor, capturing local transient patterns and multi-scale temporal characteristics through dilated causal convolutions. The BiLSTM module further models bidirectional temporal dependencies, enabling the network to exploit both past and future contextual information in the reconstructed sequences. The Multihead Attention module adaptively reweights the learned temporal features and emphasizes fault-sensitive representations from different subspaces, which helps suppress irrelevant fluctuations and residual noise. Finally, the fully connected layer and Softmax classifier map the high-level features into fault categories and generate the final diagnosis results.
3.5. Computational Complexity and Practical Deployment Considerations
Although the proposed model combines TCN, BiLSTM, and Multi-Head Attention, its computational burden is controlled from both the preprocessing and network-design perspectives. First, WOA is used offline to determine the CEEMDAN parameters only once for a given dataset and therefore does not increase the online inference cost. Second, only the most informative IMF components are retained after decomposition, which reduces redundant input channels and suppresses noise propagation into the classifier. Third, the TCN front end extracts local features using shared convolution kernels and dilated convolutions, allowing the receptive field to be enlarged without substantially increasing the model complexity. The subsequent BiLSTM and Multi-Head Attention modules operate on compressed sequential features rather than on raw waveforms, which further reduces the sequence-modeling burden. In the present study, the framework was validated as an offline diagnosis method based on field-acquired data rather than as a fully deployed embedded system. Therefore, the current complexity discussion is intended to demonstrate methodological feasibility, whereas dedicated runtime benchmarking and real-time embedded implementation will be investigated in future work.
Formally, for an input sequence of length $T$, the dominant costs of the TCN, BiLSTM, and Multihead Attention modules can be expressed as $O(kC^{2}T)$, $O(T'H^{2})$, and $O(hT'^{2}d_k)$, respectively, where $k$ is the kernel size, $C$ is the channel dimension, $H$ is the hidden size, $h$ is the number of attention heads, $d_k$ is the per-head key dimension, and $T'$ denotes the reduced sequence length after TCN feature extraction. Therefore, the proposed hybrid architecture improves representation capability without causing an unacceptable increase in online diagnostic complexity.
4. Experimental Results and Analysis
4.1. Detailed Configuration Parameters
For signal preprocessing, both simulated and measured current signals were segmented into samples of 2048 points and normalized using z-score standardization. The CEEMDAN search ranges were set to α ∈ [0.05, 0.30] and n ∈ [50, 150]. The WOA population size was set to 30, the maximum iteration number was 100, and the spiral constant b was fixed to 1.0. According to the optimization results, the optimal CEEMDAN parameters were α = 0.13 and n = 126, and the maximum decomposition level was set to 7. IMF1–IMF6 were retained to construct the multi-channel input features for the diagnosis model.
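The segmentation and z-score step can be sketched as follows. The paper does not state whether windows overlap or whether the statistics are computed per window or over the whole record, so this example assumes non-overlapping 2048-point windows, per-window population statistics, and a small `eps` guard against constant windows; the function names are hypothetical.

```python
import math


def segment(signal, length=2048):
    """Split a record into non-overlapping windows, dropping an incomplete tail."""
    n_windows = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n_windows)]


def zscore(window, eps=1e-12):
    """Standardize one window to zero mean and unit (population) standard deviation."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in window]


# Toy record: an offset sine, standing in for a measured current trace.
record = [math.sin(0.01 * i) + 0.5 for i in range(5000)]
windows = segment(record)
normalized = [zscore(w) for w in windows]

first = normalized[0]
first_mean = sum(first) / len(first)
first_std = math.sqrt(sum(v * v for v in first) / len(first))
```

Per-window normalization removes amplitude and offset differences between samples, which is one reasonable choice when simulated and measured signals have different scales.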
The TCN module comprised three residual blocks with a kernel size of 3 and dilation factors of 1, 2, and 4, respectively. The numbers of convolution channels were set to 32, 64, and 64. The BiLSTM hidden size was 64 in each direction. The Multihead Attention module used 4 heads with a model dimension of 128. A dropout rate of 0.3 was applied after the BiLSTM and attention layers to alleviate overfitting. The classifier consisted of a fully connected layer with 64 neurons and a Softmax output layer.
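To make these settings concrete, the short sketch below derives two quantities from them: the receptive field of the stacked dilated convolutions and the feature widths passed between modules. The number of convolutions per residual block is not given in the text, so both one and two convolutions per block are evaluated as assumptions; the 6 input channels correspond to IMF1–IMF6.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block):
    """Receptive field of stacked dilated causal convolutions.

    Each convolution with dilation d extends the field by (kernel_size - 1) * d.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


KERNEL_SIZE = 3
DILATIONS = [1, 2, 4]

rf_one_conv = tcn_receptive_field(KERNEL_SIZE, DILATIONS, convs_per_block=1)
rf_two_convs = tcn_receptive_field(KERNEL_SIZE, DILATIONS, convs_per_block=2)

# Feature width per time step as the signal moves through the network.
widths = {
    "input (IMF1-IMF6)": 6,
    "TCN block 1": 32,
    "TCN block 2": 64,
    "TCN block 3": 64,
    "BiLSTM (2 x 64)": 2 * 64,
    "attention model dim": 128,
    "fully connected": 64,
}
head_dim = widths["attention model dim"] // 4   # 4 heads

for name, width in widths.items():
    print(f"{name:22s} -> {width}")
print("receptive field (1 conv/block):", rf_one_conv)
print("receptive field (2 convs/block):", rf_two_convs)
print("per-head dimension:", head_dim)
```

The concatenated BiLSTM output (2 × 64 = 128) matches the stated attention model dimension, and four heads give 32 dimensions per head.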
For network training, the Adam optimizer was employed with an initial learning rate of 1 × 10⁻³ and a batch size of 64. The maximum training epoch was 100, and early stopping with a patience of 10 epochs was adopted according to the validation loss. The loss function was categorical cross-entropy. All experiments were conducted under the same training/validation/testing split ratio of 70%/15%/15% to ensure a fair comparison.
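The early-stopping rule can be illustrated with a framework-independent sketch that scans a sequence of validation losses. The function name, the strict-improvement criterion, and the synthetic loss curve are assumptions for illustration; the paper specifies only the patience (10) and the epoch limit (100).

```python
def early_stopping_epoch(val_losses, patience=10, max_epochs=100):
    """Return (stop_epoch, best_epoch, best_loss) for a validation-loss curve.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    losses = val_losses[:max_epochs]
    for epoch, loss in enumerate(losses):
        if loss < best_loss:            # strict improvement resets the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_loss
    return len(losses) - 1, best_epoch, best_loss


# Synthetic curve: improves for 20 epochs, then plateaus at a worse value.
curve = [1.0 / (e + 1) for e in range(20)] + [0.06] * 30
stop_epoch, best_epoch, best_loss = early_stopping_epoch(curve)
```

In practice the weights from `best_epoch` would be restored before evaluating on the test split.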
4.2. Dataset and Experimental Setup
To validate the effectiveness of the proposed TCN-BiLSTM-Multihead-Attention model under controlled conditions, a simulation model of a 10 kV cable distribution network was constructed using PSCAD/EMTDC as shown in
Figure 5 and
Figure 6. The cable model was based on the YJV42-8.7/10 kV parameters, with a total length of 5 km. Incipient faults were simulated by introducing intermittent single-phase-to-ground arcs with fault resistance ranging from 100 Ω to 500 Ω (step 100 Ω), and fault inception angles varying from 0° to 90° (step 15°). The sampling frequency was set to 10 MHz to match the real acquisition system. A total of 12,000 simulated samples were generated, each containing 2048 time points (approximately 0.2 ms duration) covering the pre-fault, fault inception, and post-fault transient periods.
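The stated simulation settings can be checked with simple arithmetic. The sketch below computes the window duration implied by 2048 points at 10 MHz and enumerates the fault-resistance and inception-angle grid; how the 12,000 samples are distributed over this grid is not stated in the text and is not assumed here.

```python
SAMPLING_RATE_HZ = 10e6
POINTS_PER_SAMPLE = 2048

# Duration of one sample window in milliseconds.
window_ms = POINTS_PER_SAMPLE / SAMPLING_RATE_HZ * 1e3

fault_resistances_ohm = list(range(100, 501, 100))   # 100, 200, ..., 500
inception_angles_deg = list(range(0, 91, 15))        # 0, 15, ..., 90

grid = [(r, a) for r in fault_resistances_ohm for a in inception_angles_deg]

print(f"window duration: {window_ms:.4f} ms")
print(f"resistance values: {len(fault_resistances_ohm)}")
print(f"inception angles: {len(inception_angles_deg)}")
print(f"resistance/angle combinations: {len(grid)}")
```

The computed duration of about 0.2048 ms agrees with the "approximately 0.2 ms" quoted for each sample.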
To illustrate the behavior of incipient faults under controlled simulation conditions,
Figure 7 presents the current waveform during a typical discharge event. The waveform exhibits high-frequency transient components at the fault inception moment, followed by damped oscillations, which are characteristic of intermittent arcing faults in cable systems. Correspondingly,
Figure 8 shows the voltage waveform during the same discharge event. The voltage signal exhibits a sudden drop at the fault instant, accompanied by high-frequency distortion and subsequent recovery, which are important indicators of incipient fault behavior.
To further validate the practical effectiveness of the proposed algorithm, real-world cable data were also collected and employed in this study. Experimental data were collected from an online cable fault acquisition system comprising high-frequency current sensors, a 14-bit ADC with 10 MHz sampling rate, and an MCU control unit. The test object was YJV42-8.7/10 kV cable. As shown in
Figure 9 and
Figure 10, a total of 14,000 samples were collected over 2 h, randomly partitioned into training (70%), validation (15%), and testing (15%) sets.
It should be noted that the present study focuses on offline fault diagnosis rather than fully online embedded inference. In the experimental setup, the MCU control unit was mainly used for signal acquisition, control, and data transmission, while the subsequent CEEMDAN-based preprocessing and TCN-BiLSTM-Multihead-Attention diagnosis were performed offline on the host-side computing platform. Therefore, the purpose of the real measured dataset in this work is to validate the diagnostic effectiveness, robustness, and engineering applicability of the proposed method under practical acquisition conditions, rather than to demonstrate a fully deployed real-time edge implementation. Quantitative benchmarking of end-to-end execution time, memory usage, and embedded real-time deployment efficiency will be investigated in future work.
4.3. WOA-CEEMDAN Optimization
To further investigate the parameter sensitivity and optimization effectiveness of the proposed WOA-CEEMDAN framework, additional experiments were conducted focusing on parameter selection, decomposition performance comparison, and robustness analysis.
4.3.1. Parameter Sensitivity Analysis of WOA-CEEMDAN
The core of the WOA lies in simulating the intelligent foraging behavior of whales. To find the optimal parameter combination for CEEMDAN, this paper designs experiments to demonstrate the optimization process of the proposed scheme.
Figure 11 shows the convergence behavior of the WOA optimization process. The fitness value decreases rapidly during the early iterations and gradually stabilizes after approximately 50 iterations, reaching F(P) = 0.124. This result indicates that the proposed optimization procedure converges efficiently to a stable solution within the preset iteration range.
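For readers unfamiliar with WOA, the following is a compact sketch of one common variant of the algorithm applied to a two-parameter search over (α, n) with the ranges, population size, iteration limit, and spiral constant quoted in this paper. The fitness function here is a made-up quadratic surrogate with an arbitrary minimum, used only so that the example runs; in the actual framework the fitness would be obtained by running CEEMDAN with each candidate and scoring the decomposition. Scalar coefficients per whale and immediate best-solution updates are simplifying assumptions.

```python
import math
import random


def woa_minimize(fitness, bounds, n_whales=30, max_iter=100, b=1.0, seed=0):
    """Minimal Whale Optimization Algorithm; returns (best, best_fit, history)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=fitness)[:]
    best_fit = fitness(best)
    history = [best_fit]

    for it in range(max_iter):
        a = 2.0 * (1.0 - it / max_iter)          # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1.0 else rng.choice(whales)
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # Spiral (bubble-net) update around the best whale.
                l = rng.uniform(-1.0, 1.0)
                new = [abs(r - v) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + r
                       for r, v in zip(best, x)]
            whales[i] = clip(new)
            f = fitness(whales[i])
            if f < best_fit:
                best, best_fit = whales[i][:], f
        history.append(best_fit)
    return best, best_fit, history


BOUNDS = [(0.05, 0.30), (50.0, 150.0)]           # (alpha, n) search ranges


def toy_fitness(p):
    """Hypothetical surrogate, NOT the spectral-entropy fitness used in the paper."""
    alpha, n = p
    return ((alpha - 0.20) / 0.25) ** 2 + ((n - 100.0) / 100.0) ** 2


best, best_fit, history = woa_minimize(toy_fitness, BOUNDS)
alpha_opt, n_opt = best[0], round(best[1])       # ensemble number must be an integer
```

The `history` list records the best fitness after each iteration and is the kind of quantity a convergence curve would plot.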
The performance of CEEMDAN is highly dependent on two critical parameters: noise amplitude (α) and ensemble number (n). In order to verify the effectiveness of the WOA optimization process, multiple combinations of these parameters were tested before optimization.
The noise amplitude α was varied from 0.05 to 0.30, while the ensemble number
n ranged from 50 to 150. RMSE, SNR, and information entropy were used as evaluation indicators.
Table 3 presents the influence of different parameter combinations on decomposition quality.
To avoid ambiguity, the search range of α in both the WOA optimization and the pre-optimization sensitivity analysis was fixed to [0.05, 0.30]. Therefore, the α = 0.05 rows in
Table 3 are not additional baseline tests outside the optimization boundary, but part of the same candidate interval used to verify parameter sensitivity before selecting the optimal solution.
The results demonstrate that excessive noise amplitude leads to noise amplification in the decomposition results, while too small noise amplitude fails to eliminate mode mixing effectively. The optimal parameter combination (α = 0.13, n = 126) identified by WOA achieves the lowest RMSE and highest SNR, indicating superior signal reconstruction and noise suppression capability.
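Two of the evaluation indicators used above, RMSE and SNR, can be sketched as follows. The exact definitions used in the paper are not given in this section, so the formulas below are the conventional ones and should be read as assumptions: RMSE between a reference signal and its reconstruction, and SNR as the ratio of reference energy to residual energy in decibels. The information-entropy indicator is omitted because its definition is not stated here.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equal-length sequences."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


def snr_db(reference, estimate):
    """SNR in dB: reference energy over residual energy."""
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy example: a square-like signal with a constant 0.1 reconstruction error.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 1.1, -0.9]

toy_rmse = rmse(reference, estimate)
toy_snr = snr_db(reference, estimate)
```

Lower RMSE and higher SNR both indicate that the reconstruction stays closer to the reference, which is the sense in which the parameter combinations are ranked.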
4.3.2. Comparison of Different Objective Functions
To further examine whether the spectral-entropy-based objective function used in
Section 2.3 is appropriate for CEEMDAN parameter optimisation, additional comparative experiments were conducted using several alternative objective criteria. Under the same WOA settings (population size = 30, maximum iterations = 100, and identical search ranges for α and
n), four objective functions were tested: (1) spectral entropy (SE), (2) reconstruction error (RE), (3) orthogonality index (OI), and (4) a composite objective combining spectral entropy and reconstruction error. For each objective function, WOA was used to determine the optimal CEEMDAN parameters, and the resulting decomposition performance was evaluated using RMSE, SNR, and information entropy.
As shown in
Table 4, different objective functions lead to different optimal CEEMDAN parameter combinations and noticeably different decomposition results. Among the tested criteria, the spectral-entropy-based objective function achieves the best overall performance, with the lowest RMSE (0.097), the highest SNR (8.42 dB), and the lowest information entropy (12.80). This indicates that spectral entropy is more effective in suppressing noisy IMF components while preserving weak fault-related transient features. By contrast, the reconstruction-error-based objective and the orthogonality-index-based objective result in higher RMSE values, lower SNR values, and larger information entropy, suggesting that optimizing only signal fidelity or IMF independence is insufficient to achieve the best fault-oriented decomposition quality.
The composite objective (SE + RE) provides intermediate performance, yielding better results than RE and OI alone, but still remaining inferior to the spectral-entropy-based objective in all three evaluation indices. This suggests that introducing reconstruction-error information can improve the balance of the optimisation to some extent, but it does not outperform spectral entropy alone for the present problem. Overall,
Table 4 demonstrates that the spectral-entropy-based objective function offers the most suitable trade-off between noise suppression, decomposition compactness, and weak-feature preservation, and is therefore adopted as the default fitness function in the proposed WOA-CEEMDAN framework.
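As background for the spectral-entropy criterion, the sketch below computes a normalized spectral entropy of a single sequence: the Shannon entropy of its power spectrum treated as a probability distribution. This is the standard textbook definition and is an assumption here, since the precise objective (for example, how the entropies of individual IMFs are aggregated) is defined in Section 2.3 and is not reproduced. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math
import random


def power_spectrum(x):
    """One-sided power spectrum via a naive DFT (adequate for short sequences)."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    p = power_spectrum(x)
    total = sum(p)
    probs = [v / total for v in p]
    h = -sum(q * math.log(q) for q in probs if q > 0.0)
    return h / math.log(len(p))


N = 128
tone = [math.sin(2.0 * math.pi * 8 * t / N) for t in range(N)]   # single spectral line
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]                   # broadband noise

se_tone = spectral_entropy(tone)
se_noise = spectral_entropy(noise)
```

A component whose energy is concentrated in a few frequencies scores close to zero, while broadband noise scores close to one, which is why minimizing this quantity favors compact, less noisy modes.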
4.3.3. Comparison with Other Signal Decomposition Methods
To evaluate the effectiveness of the proposed method in incipient cable fault diagnosis, this study compares WOA-CEEMDAN with several widely used signal decomposition methods, including EMD, CEEMDAN, VMD, PSO-CEEMDAN [35], GWO-CEEMDAN, and SSA-CEEMDAN, focusing on feature extraction capability, noise suppression performance, and decomposition accuracy. To provide a comprehensive evaluation, two groups of comparative experiments are designed based on simulation data and real cable monitoring data, respectively. Three quantitative evaluation metrics, namely Root Mean Square Error (RMSE), Signal-to-Noise Ratio (SNR), and total information entropy, are employed to assess the performance of different decomposition methods. The detailed results are presented in Table 5.
The results in
Table 5 show that the proposed WOA-CEEMDAN method achieves the best overall decomposition performance on both the simulation and real-world datasets. On the simulation dataset, WOA-CEEMDAN obtains the lowest RMSE (0.093), the highest SNR (8.42 dB), and the lowest information entropy (12.80), indicating superior reconstruction accuracy, stronger noise suppression, and better concentration of fault-related information. Among the other meta-heuristic optimization methods, GWO-CEEMDAN ranks second, followed by SSA-CEEMDAN and PSO-CEEMDAN, while the conventional decomposition methods, including EEMD, CEEMDAN, and VMD, show relatively inferior performance.
A similar trend can be observed on the real-world dataset. WOA-CEEMDAN again achieves the best performance, with the lowest RMSE (0.109), the highest SNR (7.58 dB), and the lowest IE (13.61). Compared with PSO-CEEMDAN, the proposed WOA-CEEMDAN further reduces the reconstruction error and improves the signal-to-noise ratio, while also producing more concentrated and informative intrinsic mode components. GWO-CEEMDAN provides competitive results on the real dataset and ranks second overall, whereas SSA-CEEMDAN also improves upon conventional CEEMDAN but remains inferior to both GWO-CEEMDAN and WOA-CEEMDAN. Although VMD achieves a relatively high SNR on both datasets, its RMSE and information entropy remain less favorable than those of the meta-heuristically optimized CEEMDAN variants, indicating that noise suppression alone is insufficient if fault-related information is not preserved effectively.
Overall,
Table 5 confirms that introducing meta-heuristic optimization into CEEMDAN is beneficial for cable incipient fault signal decomposition, and that WOA provides the most effective parameter optimization strategy among the methods compared in this study. This result validates the suitability of WOA for adaptively tuning the CEEMDAN parameters (α,
n), thereby enhancing the extraction of weak fault characteristics from noisy monitoring signals.
The decomposition level L also plays an important role in CEEMDAN performance. Different decomposition levels affect the number of intrinsic mode functions (IMFs) and the depth of signal decomposition. The results are shown in
Figure 12.
When the number of IMF layers L increases from 3 to 7, the spectral entropy decreases from 0.245 to 0.121 (a reduction of 50.6%), the orthogonality index drops from 0.234 to 0.082 (a reduction of 64.9%), and the reconstruction error declines from 0.0187 to 0.0079 (a reduction of 57.7%). This indicates that the first seven IMF layers contain the main characteristic information of incipient cable fault signals, including high-frequency partial discharge pulses, intermittent arc features, and power frequency fundamental components. When L exceeds 7, the improvement in each performance metric becomes extremely limited, while the increased feature dimensionality may interfere with the subsequent classification model. Therefore, this paper selects L = 7 as the optimal maximum number of IMF layers.
After determining the optimal decomposition parameters (α = 0.13, n = 126, L = 7), the WOA-CEEMDAN method is applied to the cable incipient fault signals, and the resulting decomposition is shown in Figure 13. IMF1–IMF3 capture high-frequency transient features corresponding to initial fault reflections and subsequent oscillations, serving as primary carriers of early discharge information. IMF4–IMF6 represent medium-to-low frequency components associated with power frequency coupling and load current.
Figure 13 indicate that the decomposed IMFs have clear physical meanings and distinct time–frequency characteristics. IMF1 represents high-frequency random noise (2–5 kHz) with a small amplitude (±0.15 A), high kurtosis (8.2), and an energy contribution of 2.34%. IMF2 is a damped oscillatory mode centered around 800 Hz, corresponding to the electromagnetic transient at arc breakdown, with an amplitude of ±0.35 kA and an energy ratio of 5.67%. IMF3 and IMF4 capture the main fault-related information as regular trapezoidal pulse sequences, associated with the principal arc discharge and its slow modulation, respectively; together they contribute 30.68% of the total energy, with amplitudes up to ±0.65 kA. IMF5 corresponds to the 50 Hz fundamental component and is the dominant energy carrier, accounting for 58.76% of the total energy with an amplitude of ±0.25 kA. IMF6 reflects the low-frequency trend (0–10 Hz) related to the fault-induced DC offset, with an amplitude of ±0.15 kA and an energy contribution of 2.15%.
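The per-IMF energy contribution and kurtosis quoted above can be reproduced with a few lines of code. The following is a minimal sketch under stated assumptions: the energy ratio is taken as each component's sum of squares divided by the summed energy of all components, kurtosis is the Pearson (non-excess) form, and the two synthetic sinusoids, the sampling rate `fs`, and the function names are hypothetical stand-ins rather than the IMFs or code used in this study.

```python
import math


def energy_ratios(imfs):
    """Return each component's share of the summed component energy."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    return [e / total for e in energies]


def kurtosis(x):
    """Pearson kurtosis m4 / m2^2 (a Gaussian signal gives about 3)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


# Synthetic stand-ins for two IMFs (assumption, not measured data):
# a 50 Hz fundamental and a weaker 800 Hz oscillatory mode.
fs = 10_000  # hypothetical sampling rate in Hz
t = [k / fs for k in range(fs)]  # one second of samples
imf_50hz = [1.0 * math.sin(2 * math.pi * 50 * tk) for tk in t]
imf_800hz = [0.5 * math.sin(2 * math.pi * 800 * tk) for tk in t]

ratios = energy_ratios([imf_50hz, imf_800hz])
for name, r in zip(("50 Hz", "800 Hz"), ratios):
    print(f"{name}: {100 * r:.2f}% of total energy")
print(f"kurtosis of the 50 Hz component: {kurtosis(imf_50hz):.2f}")
```

An impulsive component such as a partial discharge pulse train would give a kurtosis well above 3, whereas a pure sinusoid gives 1.5, which is why kurtosis is a convenient indicator for separating pulse-like IMFs from the fundamental.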
4.4. Ablation Experiment
To verify the necessity of each component in the proposed hybrid architecture, an ablation study was conducted by removing one module at a time while keeping the data preprocessing procedure, train/validation/test split, CEEMDAN decomposition settings, optimizer, learning rate, batch size, and stopping criteria unchanged. In this way, the contribution of each block can be evaluated under a fair and controlled experimental setting. As shown in
Table 6, four representative variants were considered: (1) TCN only, where only temporal convolutional feature extraction was retained; (2) BiLSTM only, where the model relied only on bidirectional recurrent temporal modeling; (3) TCN-BiLSTM, where the attention module was removed; and (4) the full TCN-BiLSTM-Multihead-Attention model, which corresponds to the proposed framework.
The purpose of this experiment is to determine whether the performance improvement of the proposed model comes from the combination of all modules rather than from a single dominant block. In particular, TCN is expected to capture local multi-scale transient patterns, BiLSTM is used to model forward and backward temporal dependencies, and the Multihead Attention module is introduced to further emphasize fault-related informative features while suppressing irrelevant fluctuations and residual noise.
The ablation results indicate that the full TCN-BiLSTM-Multihead-Attention model achieves the best overall performance on both datasets. The TCN-only variant performs better than the BiLSTM-only variant, indicating that local multi-scale transient feature extraction is highly important for capturing weak incipient fault signatures. However, without bidirectional temporal modeling, the TCN-only model cannot fully capture long-range sequential dependencies. In contrast, the BiLSTM-only model benefits from temporal context modeling, but its limited local feature extraction capability makes it less effective under noisy and nonstationary signal conditions.
When TCN and BiLSTM are combined, the performance improves substantially, which confirms that local convolutional feature extraction and bidirectional temporal dependency modeling are complementary. After further introducing the Multihead Attention mechanism, all metrics are improved again, indicating that attention-based reweighting helps the model focus on the most fault-sensitive time steps while suppressing irrelevant fluctuations. Therefore, the ablation study verifies that each component contributes positively to the final diagnosis performance, and the superiority of the proposed model arises from the coordinated interaction of all three modules rather than from a single dominant block.
4.5. Comparative Analysis of Different Models
To comprehensively evaluate the effectiveness of the proposed diagnostic framework, two groups of experiments—simulation and real-world—were designed to compare the WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model with several representative methods, including Random Forest, 1D-CNN, Bi-LSTM [
36], Transformer, and CNN-BiLSTM-Attention [
37,
38]. The simulation dataset was generated using the PSCAD/EMTDC cable model described in
Section 4.1, incorporating controllable fault conditions such as varying fault resistance, inception angles, and discharge magnitudes. The real-world dataset was collected from an online cable fault acquisition system, containing environmental noise and disturbances that can significantly affect diagnostic performance. Based on these two datasets, each method was comprehensively evaluated in terms of accuracy, precision, recall, and F1-score. The main parameter settings of the compared models are listed in Table 7, and the corresponding results are presented in
Table 8.
Table 7.
Main parameter settings of the compared models in
Table 8.
| Model | Main Parameter Settings |
|---|---|
| Random Forest | Number of trees = 200; maximum depth = 20; minimum samples split = 2; minimum samples leaf = 1; criterion = Gini impurity. |
| 1D-CNN | Three 1D convolutional layers with kernel size = 3; channel numbers = 32, 64, and 64; ReLU activation; global average pooling; fully connected layer with 64 neurons; Softmax output layer. |
| Bi-LSTM | Two-layer Bi-LSTM; hidden size = 64 in each direction; dropout = 0.3; fully connected layer with 64 neurons; Softmax output layer. |
| Transformer | Input embedding dimension = 128; number of attention heads = 4; number of encoder layers = 2; feed-forward dimension = 256; dropout = 0.3; fully connected layer with 64 neurons; Softmax output layer. |
| CNN-BiLSTM-Attention | CNN front-end with convolution channels = 32 and 64; kernel size = 3; Bi-LSTM hidden size = 64 in each direction; attention layer with 4 heads; dropout = 0.3; fully connected layer with 64 neurons; Softmax output layer. |
| WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention | WOA search for the CEEMDAN parameters: population size = 30; maximum iterations = 100; spiral constant b = 1.0; optimal CEEMDAN parameters: α = 0.13, n = 126; maximum decomposition level = 7; retained IMFs = IMF1–IMF6; TCN with three residual blocks (kernel size = 3; dilation = 1, 2, 4; channels = 32, 64, 64); Bi-LSTM hidden size = 64 in each direction; Multihead Attention with 4 heads and model dimension = 128; dropout = 0.3; fully connected layer with 64 neurons; Softmax output layer. |
Table 8.
Overall model comparison on simulation data and real-world data.
| Group | Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score |
|---|---|---|---|---|---|
| Sim data 1 | Random Forest | 83.21 | 82.17 | 81.74 | 0.819 |
| | 1D-CNN | 88.72 | 87.43 | 86.98 | 0.872 |
| | Bi-LSTM | 86.25 | 84.76 | 85.12 | 0.849 |
| | Transformer | 92.38 | 91.47 | 92.05 | 0.917 |
| | CNN-BiLSTM-Attention | 94.21 | 93.78 | 93.64 | 0.937 |
| | WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention | 96.43 | 97.05 | 96.88 | 0.969 |
| Real data 2 | Random Forest | 78.92 | 77.15 | 76.42 | 0.768 |
| | 1D-CNN | 85.36 | 84.18 | 83.72 | 0.839 |
| | Bi-LSTM | 83.21 | 81.90 | 82.35 | 0.821 |
| | Transformer | 89.74 | 88.63 | 89.05 | 0.887 |
| | CNN-BiLSTM-Attention | 92.56 | 92.14 | 91.88 | 0.920 |
| | WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention | 95.59 | 96.82 | 96.50 | 0.954 |
The proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model demonstrates superior and robust performance on both the simulation and real-world datasets, outperforming the other methods in accuracy, precision, recall, and F1-score. By integrating WOA-optimized CEEMDAN decomposition with the TCN-BiLSTM-Multihead-Attention architecture, the model achieves enhanced feature extraction and global temporal modeling capabilities. Although real-world environmental noise causes a slight decrease in accuracy (96.43% → 95.59%, i.e., 0.84 percentage points) compared with idealized simulation conditions, the model maintains strong practical applicability and generalization capability in realistic scenarios.
4.6. Additional Cross-Cable Validation for Generalizability
To further validate the generalizability of the proposed framework across different cable configurations, additional experiments were conducted on YJV22-8.7/10 kV cables with a length of 8 km and YJV-8.7/10 kV cables with a length of 3 km, in addition to the original YJV42-8.7/10 kV cable used in the main experiments. These supplementary experiments were designed to examine whether the proposed method can maintain stable diagnostic performance under changes in cable structure, cable length, and electrical characteristics. As shown in
Table 9, the proposed method maintains high accuracy across all three cable types (95.73–96.85%), with only limited performance variation. This indicates satisfactory cross-cable generalizability rather than dependence on a single experimental setup, which is attributed to the adaptive parameter optimization of the proposed method capturing fault-related features that are independent of cable-specific characteristics.
4.7. Small-Sample Experiment
Due to the rarity of incipient fault events in real-world cable monitoring, obtaining large labeled fault datasets is challenging. To evaluate the model’s small-sample learning capability, experiments were conducted with four training data proportions (20%, 40%, 60%, and 80%), while keeping the testing dataset unchanged for fair comparison. This approach assesses diagnostic performance under limited data conditions. The results are shown in
Table 10.
The results demonstrate that the proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model maintains superior performance even when the available training data are limited. When only 20% of the training data are used, the proposed method still achieves 86.92% accuracy, outperforming the Transformer model by 8.29%. This indicates that the proposed model has strong small-sample learning capability, which is particularly beneficial for early cable fault diagnosis where labeled data are scarce.
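The reduced-training-set protocol described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the subsets are drawn per class so that the class proportions of the full training set are preserved, a fixed seed is used for repeatability, and the label vector and the name `stratified_subset` are hypothetical. The test set is not touched by this function.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted training indices that keep the original class proportions."""
    rng = random.Random(seed)  # fixed seed so every model sees the same subset
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = max(1, round(fraction * len(indices)))  # keep at least one sample per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical training labels: 0 = normal, 1 = incipient fault.
train_labels = [0] * 300 + [1] * 700
subsets = {p: stratified_subset(train_labels, p) for p in (0.2, 0.4, 0.6, 0.8)}

for p, idx in subsets.items():
    n_fault = sum(train_labels[i] for i in idx)
    print(f"{int(p * 100)}% of training data: {len(idx)} samples, {n_fault} fault samples")
```

Drawing each subset with the same seed for every compared model keeps the comparison controlled, since all models are then trained on identical samples at each proportion.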
4.8. Noise Robustness Experiment Under Gaussian, EMI, and Impulse Noise
To evaluate the model’s robustness against real-world disturbances such as electromagnetic interference and environmental noise, Gaussian white noise was first added to the original signals at signal-to-noise ratios (SNRs) of 5 dB, 10 dB, 15 dB, and 20 dB, and the proposed method was tested at each noise level. The results are shown in
Table 11.
The results show that the proposed model consistently achieves the highest diagnostic accuracy under all noise levels. Even under severe noise conditions (SNR = 5 dB), the proposed method maintains an accuracy of 81.54%, significantly outperforming the comparison models.
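Adding Gaussian white noise at a prescribed SNR amounts to scaling the noise variance to the measured signal power. The sketch below shows one way to do this; the test waveform, the sampling rate `fs`, the seed, and the function names are illustrative assumptions and not the signals or code used in the experiments.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Add zero-mean white Gaussian noise so that the result has the requested SNR."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR_dB = 10*log10(Ps/Pn)
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy copy, used here only as a sanity check."""
    p_signal = sum(c * c for c in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical test signal: 1 s of a 50 Hz current sampled at 10 kHz.
fs = 10_000
clean = [math.sin(2 * math.pi * 50 * k / fs) for k in range(fs)]
noisy_by_snr = {snr: add_gaussian_noise(clean, snr) for snr in (5, 10, 15, 20)}

for snr, noisy in noisy_by_snr.items():
    print(f"target {snr} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Because the noise is random, the measured SNR matches the target only approximately, with the deviation shrinking as the record length grows.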
To further evaluate the robustness of the proposed method under practical non-Gaussian interference, additional experiments were conducted under electromagnetic interference (EMI) conditions. In this study, EMI was modeled as sinusoidal narrowband interference superimposed on the original cable current signal. For clarity, three interference severity levels were defined according to the interference amplitude relative to the peak value of the original signal: low EMI corresponds to an interference amplitude of 5% of the signal peak, medium EMI corresponds to 10%, and high EMI corresponds to 15%. The interference frequency was selected from typical power-related electromagnetic components, including 50 Hz, 150 Hz, and 250 Hz, in order to simulate realistic electromagnetic coupling in cable-monitoring environments. The corresponding robustness results under EMI conditions are shown in
Figure 14.
As shown in
Figure 14, all compared models exhibit a gradual decrease in diagnostic accuracy with increasing EMI intensity. The proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention framework achieves accuracies of 96.21%, 94.37%, and 91.68% under low-, medium-, and high-level EMI, respectively, which are consistently higher than those of CNN-BiLSTM-Attention (95.06%, 92.41%, and 89.85%), Transformer (93.18%, 90.52%, and 87.33%), 1D-CNN (90.42%, 87.15%, and 83.76%), and Bi-LSTM (88.63%, 84.97%, and 81.24%). This demonstrates that the proposed method has stronger robustness against narrowband electromagnetic interference.
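The EMI model described above (a sinusoid whose amplitude is 5%, 10%, or 15% of the signal peak) can be sketched as follows. The zero initial phase, the default interference frequency of 150 Hz, the test waveform, and the names `EMI_LEVELS` and `add_emi` are assumptions made for illustration only.

```python
import math

# Interference amplitude as a fraction of the signal peak, as defined in the text.
EMI_LEVELS = {"low": 0.05, "medium": 0.10, "high": 0.15}


def add_emi(signal, fs, level, freq_hz=150.0):
    """Superimpose a narrowband sinusoidal interference on the signal.

    freq_hz would be chosen from 50, 150, or 250 Hz; the phase is assumed zero.
    """
    peak = max(abs(v) for v in signal)
    amplitude = EMI_LEVELS[level] * peak
    return [
        v + amplitude * math.sin(2 * math.pi * freq_hz * k / fs)
        for k, v in enumerate(signal)
    ]


# Hypothetical cable current: 1 s of a 50 Hz wave with a 2.0 peak, sampled at 10 kHz.
fs = 10_000
clean = [2.0 * math.sin(2 * math.pi * 50 * k / fs) for k in range(fs)]
emi_signals = {level: add_emi(clean, fs, level) for level in EMI_LEVELS}

for level, noisy in emi_signals.items():
    worst = max(abs(n - c) for n, c in zip(noisy, clean))
    print(f"{level} EMI: largest added deviation = {worst:.3f}")
```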
In addition to EMI, impulse noise was further introduced to simulate abrupt transient disturbances that may arise from switching operations, external electrical shocks, or sensor spikes in industrial environments. The impulse noise was modeled as sparse high-amplitude perturbations randomly occurring in the signal sequence. The disturbance severity was classified into three levels using both impulse probability and impulse amplitude. Specifically, low impulse noise corresponds to an impulse occurrence probability of 0.5% with an amplitude of 2 times the original signal peak, medium impulse noise corresponds to a probability of 1% with an amplitude of 3 times the signal peak, and high impulse noise corresponds to a probability of 2% with an amplitude of 5 times the signal peak. The robustness results under these impulse-noise conditions are presented in
Figure 15.
As illustrated in
Figure 15, all compared models exhibit a clear reduction in diagnostic accuracy as the severity of impulse noise increases, reflecting the strong influence of sparse high-amplitude transients on fault-feature extraction and classification. The proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention framework achieves accuracies of 95.93%, 93.14%, and 89.47% under low-, medium-, and high-level impulse-noise conditions, respectively, which are consistently higher than those of CNN-BiLSTM-Attention (94.58%, 90.86%, and 87.24%), Transformer (92.64%, 88.73%, and 84.91%), 1D-CNN (89.75%, 85.94%, and 81.68%), and Bi-LSTM (87.82%, 83.46%, and 79.35%). This confirms that the proposed method maintains superior robustness under abrupt nonstationary interference.
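The three impulse-noise levels defined above can be generated with a simple per-sample Bernoulli draw. In this sketch the impulses are assumed to be additive with random polarity, which the text does not specify; the seed, the test waveform, and the names `IMPULSE_LEVELS` and `add_impulse_noise` are likewise hypothetical.

```python
import math
import random

# (occurrence probability per sample, amplitude as a multiple of the signal peak)
IMPULSE_LEVELS = {"low": (0.005, 2.0), "medium": (0.01, 3.0), "high": (0.02, 5.0)}


def add_impulse_noise(signal, level, seed=0):
    """Add sparse high-amplitude spikes; returns the noisy signal and the spike count."""
    prob, gain = IMPULSE_LEVELS[level]
    rng = random.Random(seed)
    peak = max(abs(v) for v in signal)
    noisy = list(signal)
    hits = 0
    for k in range(len(noisy)):
        if rng.random() < prob:
            # Random polarity is an assumption; a real spike source may be one-sided.
            noisy[k] += rng.choice((-1.0, 1.0)) * gain * peak
            hits += 1
    return noisy, hits


# Hypothetical cable current: 2 s of a 50 Hz wave sampled at 10 kHz.
fs = 10_000
clean = [math.sin(2 * math.pi * 50 * k / fs) for k in range(2 * fs)]

for level in IMPULSE_LEVELS:
    noisy, hits = add_impulse_noise(clean, level)
    print(f"{level}: {hits} impulses in {len(clean)} samples")
```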
4.9. ROC Curve and AUC Analysis
Receiver Operating Characteristic (ROC) curves were used to evaluate the classification performance of different models. The ROC curve illustrates the relationship between the True Positive Rate (TPR) and False Positive Rate (FPR) under varying decision thresholds. The Area Under the Curve (AUC) quantitatively reflects the classifier’s discrimination ability.
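For readers who wish to reproduce this kind of analysis, the ROC curve and its AUC can be computed directly from the predicted fault scores. The sketch below sweeps the decision threshold over the sorted scores and integrates with the trapezoidal rule; the six labels and scores are a toy example (1 denotes a fault sample), not data from this study, and the function names are hypothetical.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold over the scores."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting one ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy example: one non-fault sample (score 0.7) outranks one fault sample (score 0.6).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(labels, scores)
print(f"AUC = {auc(fpr, tpr):.3f}")
```

The AUC equals the probability that a randomly chosen fault sample receives a higher score than a randomly chosen non-fault sample; in the toy example 8 of the 9 fault/non-fault pairs are ordered correctly.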
Figure 16 shows the ROC curves of several representative models, including 1D-CNN, Bi-LSTM, Transformer, and the proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model. The corresponding AUC values are summarized in
Table 12.
Figure 16 shows that the ROC curve of the proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model remains consistently closer to the upper-left corner on both the simulation and real-world datasets, indicating a more favorable balance between fault sensitivity and false alarm rate across a wide range of thresholds. According to
Table 12, on the simulation dataset the proposed model achieves the highest AUC of 0.97, improving upon CNN-BiLSTM-Attention by 0.03, Transformer by 0.05, 1D-CNN by 0.09, and Bi-LSTM by 0.11. A similar trend is observed on the real-world dataset, where the proposed model also attains an AUC of 0.97, which is 0.04 higher than CNN-BiLSTM-Attention and 0.05 higher than Transformer.
These results indicate that the proposed framework does not merely perform well at a single operating point; rather, it preserves strong separability between incipient-fault and non-fault patterns over a broad range of decision thresholds. This property is particularly important for engineering deployment, because alarm thresholds in online monitoring systems often need to be adjusted according to different operational risk preferences. The high AUC on the real-world dataset further confirms that the combination of WOA-CEEMDAN preprocessing and hybrid sequence modeling effectively suppresses the influence of noise and distribution variability.
It is also worth noting that the AUC gap between the simulation and real-world datasets is negligible for the proposed method, whereas several comparison models exhibit a more obvious performance drop under real measurement conditions. This observation suggests that the proposed framework has better cross-scenario generalization and more stable probability discrimination capability. Therefore, the ROC/AUC analysis, together with the accuracy, F1-score, and confusion-matrix results, provides consistent evidence that the proposed method offers superior diagnostic reliability for practical incipient cable fault monitoring.
4.10. Model Performance Analysis
To more clearly illustrate the recognition performance of the proposed WOA-CEEMDAN-TCN-BiLSTM-Multihead-Attention model on real-world collected data, this section presents its confusion matrix in
Figure 17. On the real-world test set, the model achieves 1380 true positives (TP) and 636 true negatives (TN), along with 39 false positives (FP) and 45 false negatives (FN). Accordingly, the false negative rate, FNR = FN/(TP + FN), is approximately 3.16%, while the false positive rate, FPR = FP/(FP + TN), is about 5.78%. Because the FNR is lower than the FPR, the model exhibits a slight preference for issuing conservative alarms rather than missing actual faults, which is a desirable characteristic for early warning applications in power systems. The low FNR is important for early fault diagnosis, since missed alarms may allow weak insulation defects to develop into more severe failures; in other words, the proposed method maintains a high probability of detecting incipient cable faults at an early stage. Meanwhile, the relatively limited FPR suggests that the framework also has good practicality for online monitoring scenarios, where excessive false alarms may increase maintenance burden.
In addition, the Matthews correlation coefficient (MCC) calculated from the confusion matrix is 0.909, which further confirms the balanced classification capability of the proposed model. Unlike accuracy alone, MCC jointly considers true positives, true negatives, false positives, and false negatives, and therefore provides a more comprehensive evaluation of diagnostic reliability.
Overall, the confusion-matrix-based analysis demonstrates that the proposed framework achieves strong and balanced performance on real-world cable fault data: it is not only accurate in terms of global metrics but also practically meaningful in engineering applications where fault sensitivity is more critical than nominal-state conservatism.
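The rates quoted in this section follow directly from the four confusion-matrix counts. As a short worked check using the standard definitions (the function name `binary_metrics` is only illustrative):

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute FNR, FPR, and MCC from the four confusion-matrix counts."""
    fnr = fn / (tp + fn)  # share of actual faults that were missed
    fpr = fp / (fp + tn)  # share of non-fault samples that raised an alarm
    mcc_num = tp * tn - fp * fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"fnr": fnr, "fpr": fpr, "mcc": mcc_num / mcc_den}


# Counts reported for the real-world test set.
m = binary_metrics(tp=1380, tn=636, fp=39, fn=45)
print(f"FNR = {100 * m['fnr']:.2f}%")
print(f"FPR = {100 * m['fpr']:.2f}%")
print(f"MCC = {m['mcc']:.3f}")
```

With these counts the computation gives an FNR of 3.16%, an FPR of 5.78%, and an MCC of 0.909, matching the values stated in the text.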
To further assess the balance between precision and recall for different diagnostic models, the precision–recall (PR) curves and the corresponding AUC values are shown in
Figure 18.
As shown in
Figure 18, the proposed model achieves a more favorable precision–recall trade-off than the comparison methods. Its PR curve remains closer to the upper-right region, indicating that high fault recall can still be maintained without a substantial loss of precision. This result is consistent with the previously reported accuracy, F1-score, ROC-AUC, and confusion-matrix analysis, and further confirms the superior reliability of the proposed framework for practical incipient cable fault diagnosis.