1. Introduction
With the growing penetration and scale of renewable energy in power systems, the distribution network has shifted from the traditional unidirectional, single-source mode to a multi-directional, multi-source mode. Numerous distributed generation units and nonlinear power electronic devices have been incorporated into the distribution grid, leading to power quality issues such as voltage fluctuations, harmonic distortion, and three-phase imbalance. These issues are becoming increasingly prominent and pose severe challenges to power supply quality [1,2]. To maintain the proper functioning of power equipment and prevent system faults from spreading, real-time monitoring of power quality data is essential for promptly analyzing and addressing anomalies. This helps minimize power equipment damage and financial losses while enhancing the power system’s operational efficiency. However, electromagnetic interference within distribution terminals and the weak noise resistance of microprocessor-embedded chips often cause power quality data waveforms to exhibit burrs, distortion, and disturbances, which makes it more difficult to extract and identify fault features [3,4]. Therefore, it is urgent to investigate power quality data enhancement and processing techniques for distribution terminals in the complex noise environment of the distribution network. Such techniques enhance data feature extraction, improve the robustness and accuracy of disturbance identification, and strongly support power quality monitoring and fault warning.
Due to electromagnetic interference and complex noise in the distribution terminal and microprocessor during high-frequency acquisition, the collected electrical waveform exhibits burrs, which increases the difficulty of power quality detection and analysis [5]. In traditional research, low-pass filtering or wavelet transform is used to remove the spike-like burr components and clean the data. A method for mitigating large burr signals, derived from the established theory of nonlinear systems and the burr threshold boundary, was proposed in [6]. With the same hardware, the influence of the burr phenomenon can be reduced by combining burr jitter of sufficient amplitude with low-pass filtering. In [7], Tian et al. applied wavelet-domain filtering and threshold processing to bus load data, based on the wavelet threshold denoising method, to remove burrs in the waveform. However, when processing high-frequency signal sequences, the traditional low-pass filtering method may cause signal distortion or degradation. This is particularly problematic for high-frequency components that contain key power quality features: it is often difficult to balance denoising against the retention of useful information, which affects the subsequent analysis of power quality data.
Renewable energy power systems and the grid connection of nonlinear loads can lead to Power Quality Disturbance (PQD), resulting in signal degradation and waveform distortion. These issues significantly impact the stable and secure operation of precision computers and microprocessors within the distribution network, potentially leading to anomalies or severe accidents [8]. Consequently, accurately identifying and classifying PQD is essential for enhancing power supply quality, monitoring power equipment conditions, and preventing grid faults. Traditional PQD classification methods rely on techniques such as Short-Time Fourier Transform (STFT), S-transform (ST), and wavelet transform for feature extraction. The manually selected features then serve as classifier inputs, and the classifier maps them to disturbance categories [9,10]. In [11], Tang et al. introduced a PQD detection and classification approach utilizing ST and a kernel Support Vector Machine (SVM) to mitigate the signal distortion caused by nonlinear load integration into the distribution network. By optimizing window parameters and employing multi-feature classification with kernel functions, effective PQD classification under varying noise levels was achieved. Similarly, an improved wavelet transform-based PQD identification method was proposed in [12]. This approach leveraged a dual-tree wavelet transform to extract complex disturbance features and employed a directed acyclic graph SVM for PQD classification. Additionally, SVM parameters were fine-tuned using the locust optimization algorithm, improving both classification accuracy and processing speed. In [13], Abdelsalam et al. introduced a PQD identification method that uses a Kalman filter for automatic detection and burr removal. The method then classifies disturbances based on three factors: the three consecutive maximum values of instantaneous total harmonic distortion, the standard deviation, and the energy difference between the distorted signal and its fundamental frequency. In [14], Zhao et al. combined time–frequency domain multi-features with a decision tree for disturbance identification. Their method uses energy spectrum features from the wavelet transform and other time–frequency features from the S-transform to create identification rules.

However, as noise complexity increases within distribution networks, traditional methods that rely on statistical analysis and experience-based feature extraction become less effective. Moreover, conventional classifiers generalize poorly and process large-scale datasets slowly. Deep learning, with its multi-layered structure, offers robust nonlinear feature extraction capabilities. Architectures such as Convolutional Neural Networks (CNN) and autoencoders enable autonomous learning from extensive PQD datasets. These models convert one-dimensional PQD signal sequences into two-dimensional representations, allowing image-processing techniques to directly extract features, recognize disturbance patterns, and classify PQD signals [15]. In [16], Cui et al. proposed a CNN-based method for multi-class PQD classification, where a one-dimensional feature matrix is extracted using ST. After the contour parameters are optimized to generate a two-dimensional image, the image is input into the CNN for training to complete the PQD classification task under different interference conditions. In [17], an automatic PQD identification and classification method based on deep learning was designed. Multidimensional spectral characteristics of the fractional domain were extracted through spatial Fourier transform and then converted into spatial feature maps. The multiple inputs were sent to a deep spectrum convolution fusion network for multidimensional feature fusion and PQD classification, which improves the detection and anti-noise capability for complex PQD signals. However, the large number of network parameters, the high computational complexity, the inability to parallelize computation, and the difficulty of capturing long-distance dependencies limit the application of these structures in large-scale power quality data processing.
As a highly capable deep learning model, the Transformer effectively captures long-range dependencies and global context in time series data via its self-attention mechanism. This addresses the limitation of neural networks, which struggle to perform parallel and balanced computation simultaneously [18]. In [19], Yoon et al. introduced a deep PQD identification technique based on the Transformer. By segmenting voltage signal inputs and embedding them into Transformer encoders, the model accurately classifies PQD types and locations under different fault conditions using a multi-head attention mechanism and a multi-layer perceptron. Similarly, an end-to-end disturbance identification framework integrating a deep CNN and the Transformer was proposed in [20]. This approach employs a deep CNN for one-dimensional feature extraction, while a Transformer encoder and attention mechanism facilitate sequence-based autonomous learning and feature fusion, enhancing the speed and accuracy of fault detection and prediction. In [21], Chen et al. constructed an optimization algorithm based on the STFT and CNN. By converting signal samples into time–frequency matrices as inputs for the CNN, the algorithm leverages the powerful two-dimensional data processing capability of CNNs to achieve more efficient interference recognition. In [22], Zhou et al. adopted a dual-branch structure combining CNN and Long Short-Term Memory (LSTM) to process raw time series data and wavelet transform coefficients. They used the Continuous Wavelet Transform (CWT) to extract time–frequency features and employed an LSTM layer with a self-attention mechanism to capture temporal dynamics. Despite their effectiveness in identifying PQD, these CNN-based models typically process only single-mode sequence features, overlooking the interrelation between the time and frequency domains as well as the complementary nature of multi-dimensional information. Consequently, under complex noise conditions in the distribution network, achieving precise signal characterization remains challenging, which ultimately affects detection performance.
In summary, the following challenges remain in power quality data processing. Firstly, due to the electromagnetic interference inside the distribution terminal and the weak anti-noise ability of the chip embedded in the microprocessor, the collected signal sequence exhibits burrs. The traditional low-pass filtering method easily distorts high-frequency information, affecting the subsequent analysis of power quality data. Secondly, traditional PQD identification methods are often based on single-frequency-domain signal processing or deep learning algorithms. These methods do not jointly consider the dual characteristics of the time domain and frequency domain, and cannot accurately capture the long-distance dependence between different sequences. Additionally, the large number of network parameters makes parallel computation impossible, resulting in insufficient robustness and accuracy of PQD identification in the complex noise environment of the distribution network.
Therefore, to solve the above problems, we propose a power quality data augmentation and processing method for distribution terminals considering high-frequency sampling, which improves the robustness and accuracy of PQD identification in complex noise environments. Firstly, a burr removal method based on a high-frequency filter operator is proposed to address the burrs caused by electromagnetic interference, noise, and related faults. Only power quality sampling sequences that exhibit burr characteristics are suppressed, while key information from the normal bumps of the original waveform is preserved through additional constraints. The filtered signal sequence is then analyzed and processed in the time–frequency domain. Using S-transform and signal period reconstruction, the original one-dimensional signal is transformed into a two-dimensional matrix image, enabling the extraction of multi-modal PQD signal features. Secondly, a PQD identification approach based on a dual-channel time–frequency feature fusion network is proposed. The time–frequency domain matrix image is simultaneously fed into a CNN and a Transformer encoder to extract globally coupled features. A cross-attention mechanism is employed for feature fusion, enhancing the feature representation and ultimately enabling accurate PQD classification. Table 1 compares the proposed work with other works in terms of operator complexity, preservation of transients, dual characteristics of the time–frequency domains, long-term dependency capture capability, and burr removal without high-frequency information distortion. The main contributions of this paper are summarized as follows.
The burr removal method of sampling waveforms based on high-frequency filter operator: The method uses high-frequency filtering operators and takes into account the small bumps that may appear in normal waveforms. By adding different restrictions or parameters, only the sampled signals that meet the burr characteristics are processed. While eliminating burrs, it can accurately retain the key features of high-frequency information, avoid the distortion of high-frequency fault information after filtering, and improve the data quality of the distribution terminal.
The PQD identification method based on a dual-channel time–frequency feature fusion network: First, the PQD signal undergoes an S-transform and signal period reconstruction to generate a matrix image, forming the dual-channel time–frequency domain input features. Next, two types of matrix image features are fed into a CNN to extract global time-domain and local frequency-domain features. Additionally, a Transformer encoder captures highly coupled global time–frequency features, while the cross-attention mechanism enhances deep fusion by modeling the long-term dependencies between the sequences. Finally, the classification layer outputs the PQD identification results, improving robustness and accuracy in complex distribution network noise conditions.
4. Experimental Results and Analysis
According to the IEEE Std 1159-2019 standard [23], a dataset was constructed considering the normal operating condition (C0) and eight types of single PQDs. The eight types of single PQDs are as follows: voltage spike (C1), voltage notch (C2), voltage flicker (C3), harmonics (C4), voltage sag (C5), voltage swell (C6), voltage interruption (C7), and transient oscillation (C8). The mathematical models for each disturbance are listed in Table 2. Based on the mathematical models provided in Table 2, the disturbances were generated with a sampling frequency of 12.8 kHz and a sampling length of 20 cycles, resulting in 640 sampling points per signal. Each type of disturbance included 300 samples. The dataset was divided into training (60%), validation (20%), and test (20%) sets according to standard proportions to ensure effective model learning and performance evaluation across different data subsets. The hardware and software settings are as follows. The CPU is an Intel Core i7-1360P (2.20 GHz). The GPU is an NVIDIA GeForce RTX 4060 with 8 GB of video memory. The system memory is 16 GB. The software version used is MATLAB 2019. The operating system is Windows 11 (64-bit). In the simulation of burr removal, the current waveform is used as an example.
Table 3 lists the specific parameters [24,25].
Table 4 lists the key hyperparameters of the CNN and Transformer components [16,26].
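The 60/20/20 partition described above can be sketched as follows. This is a minimal illustration, assuming the split is stratified per class (the text does not state this); the function name `stratified_split` and the fixed seed are hypothetical choices made for the example.

```python
import random

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices per class into train/validation/test index lists."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # shuffle within each class only
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Nine classes (C0..C8) with 300 samples each, as described in the text.
labels = [f"C{c}" for c in range(9) for _ in range(300)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 1620 540 540
```

With 9 classes of 300 samples, each class contributes 180 training, 60 validation, and 60 test samples under this assumption.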
Four baseline algorithms are used for comparison. The first two baselines represent traditional, widely used approaches. Baseline 1 [13] is a PQD identification method that uses a Kalman filter for automatic burr detection and removal. For this method, the filter gain was set to 0.05, the state noise covariance Q was diag(1 × 10⁻⁴), the observation noise covariance R was 0.01, and the iteration termination threshold was 1 × 10⁻⁶. Baseline 2 [14] combines time–frequency domain multi-features with a decision tree for disturbance identification. The decision tree was configured with a maximum depth of 12, a minimum of eight samples for a split, a feature sampling ratio of 0.8, and a pruning coefficient of 0.01. Its time–frequency features were extracted using a 200 ms window with 50% overlap. While these methods are valued for their low complexity and ease of deployment, they have notable limitations. Baseline 1’s reliance on single-domain signal processing compromises its robustness in complex noise environments. Baseline 2 does not address the crucial pre-processing step of burr removal, which impacts the quality of its input data. We also compare our work against two state-of-the-art deep learning models: STFT-CNN [20] (Baseline 3) and CNN-LSTM [21] (Baseline 4). The key hyperparameters for the CNN and Transformer components of these models, such as the number of heads and learning rate, are detailed in Table 4. Although these models leverage robust 2D data processing capabilities, they primarily operate on single-mode sequence features, neglecting the comprehensive integration of time–frequency dual characteristics that our method introduces.
To ensure a fair and rigorous comparison, all methods were evaluated under a unified training and testing framework. For the deep learning models, we employed the Adam optimizer and a uniform training schedule of 100 epochs, with early stopping at a patience of 10 epochs to prevent overfitting. This standardized procedure ensures that any observed performance differences are attributable to the architectural merits of the models rather than to variations in the training process. Moreover, all baselines use the same input/output configurations and simulation parameter settings as the proposed algorithm.
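The training schedule above (at most 100 epochs, early stopping with a patience of 10) can be illustrated with a small sketch. It assumes that the monitored quantity is the validation loss, which the text does not specify; the function name and the synthetic loss curve are hypothetical.

```python
def run_early_stopping(val_losses, max_epochs=100, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # patience exhausted
    return min(len(val_losses), max_epochs), best_epoch

# Hypothetical curve: the loss improves for 30 epochs, then plateaus at a worse value.
losses = [1.0 / e for e in range(1, 31)] + [0.05] * 70
stop_epoch, best_epoch = run_early_stopping(losses)
print(stop_epoch, best_epoch)  # 40 30
```

In practice the weights from `best_epoch` would be restored before evaluation on the test set.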
Figure 4 shows the comparison of disturbance removal effects in the power quality data of different algorithms. The pre-filter power quality signal (C0) is affected by harmonics (C4), sag (C5), and transient oscillation (C8). Compared with Baseline 1 and 2, the proposed algorithm reduces the mean absolute error (MAE) by 62.73% and 75.26%, respectively. The reason for this is that the proposed algorithm uses high-frequency filtering operators for filtering. Based on a dual-channel time–frequency feature fusion network for deep feature analysis, it can eliminate disturbance while avoiding the signal distortion caused by filtering normal signals. As a result, it is able to improve the power quality data. However, Baseline 1 and 2 overlook the distinct disturbance characteristics in the signals and lack deep dual-domain feature fusion analysis, leading to poor high-frequency fault identification accuracy and degraded disturbance removal performance.
Let
be the sampling value of the extreme point in the data after the current filtering point
. The extreme point does not align with the morphological characteristics of the unrestricted conditional shock operator.
is the sampling value with the smaller absolute value among the two sampling points with different values from
before and after the extreme value point. Thus, different restriction conditions can be adopted and expressed as follows:
where
is restriction condition 1 and
is restriction condition 2.
Figure 5 compares the disturbance removal effects of the proposed algorithm on the PQD under the two restriction conditions. Condition 2 achieves a better disturbance removal effect than condition 1. The reason is that condition 2, which is adopted by the proposed algorithm, only processes the sampled signals that conform to the disturbance characteristics. At the same time, a high-frequency filtering operator is applied to further filter the signal, thus preserving the effective information. In contrast, condition 1 does not take into account the differential characteristics of the disturbance, leading to the loss of some effective information. This, in turn, distorts the waveform of the power quality data and degrades the disturbance removal effect.
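The idea of filtering only those samples that match burr characteristics can be illustrated with a deliberately simple stand-in. The sketch below is not the paper's high-frequency filter operator or its restriction conditions; it assumes a burr is an isolated local extremum that deviates from both neighbours by more than a threshold `lam` (a hypothetical parameter), and replaces only such points.

```python
import math

def remove_burrs(samples, lam):
    """Replace isolated spikes with the mean of their neighbours.

    A point is treated as a burr only if it is a local extremum AND it
    deviates from both neighbours by more than `lam`; all other points,
    including small normal bumps, are left untouched.
    """
    cleaned = list(samples)
    for i in range(1, len(samples) - 1):
        left = samples[i] - samples[i - 1]
        right = samples[i] - samples[i + 1]
        is_extremum = left * right > 0            # local maximum or minimum
        is_large = min(abs(left), abs(right)) > lam
        if is_extremum and is_large:
            cleaned[i] = 0.5 * (samples[i - 1] + samples[i + 1])
    return cleaned

# Hypothetical test waveform: one cycle of a unit sine with a single injected spike.
clean = [math.sin(2 * math.pi * n / 64) for n in range(64)]
noisy = list(clean)
noisy[10] += 0.8
restored = remove_burrs(noisy, lam=0.2)
```

Because the natural peaks of the sine differ from their neighbours by far less than `lam`, they pass through unchanged, while the injected spike is suppressed.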
PQD identification is performed using the dataset, and the results are categorized into True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). TP denotes correctly predicted positive samples, FP represents negative samples incorrectly predicted as positive, TN refers to correctly predicted negative samples, and FN indicates positive samples incorrectly predicted as negative. Accuracy, precision, and recall are used to assess algorithm performance, which can be expressed as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Besides precision and recall, we introduce F1, which is a comprehensive evaluation metric for classification models that balances precision and recall. F1 is used to measure the accuracy and stability of the model in classification tasks, especially when the categories are imbalanced. F1 is expressed as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
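As a worked example of these metrics, the following sketch computes accuracy, precision, recall, and F1 from confusion counts using their standard definitions. The counts are hypothetical (one class treated one-vs-rest on a 540-sample test set) and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Hypothetical one-vs-rest counts for a single disturbance class.
accuracy, precision, recall, f1 = classification_metrics(tp=57, fp=3, tn=477, fn=3)
print(f"{accuracy:.4f} {precision:.4f} {recall:.4f} {f1:.4f}")
```

For multi-class PQD identification, such per-class values are typically averaged across classes; which averaging scheme the paper uses is not stated in this section.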
Figure 6 shows the comparison of PQD identification accuracy over training iterations. Compared with Baseline 1, Baseline 2, Baseline 3, and Baseline 4, the identification accuracy of the proposed algorithm is improved by 21.33%, 16.98%, 11.37%, and 8.84%, respectively. The reason for this is that the proposed algorithm constructs the time–frequency domain dual-input features and processes them separately. It then couples the two features to enhance the PQD signal processing effect. However, Baseline 1 neglects the dual-channel time–frequency feature fusion network and fails to fully leverage disturbance signal characteristics, leading to reduced identification accuracy under the complex noise conditions in the distribution network. Baseline 2 lacks high-frequency filter operators, causing the distortion of high-frequency fault information and impacting the accuracy of disturbance identification. Baseline 3 and Baseline 4 process only single-mode sequence features, neglecting the interrelation between time and frequency domains as well as the complementary nature of multi-dimensional information. Consequently, under complex noise conditions in the distribution network, achieving precise signal characterization remains challenging, ultimately affecting detection performance.
Figure 7 illustrates the PQD identification accuracy of the algorithms against the signal-to-noise ratio (SNR). When SNR = 0 dB, the proposed algorithm improves identification accuracy by 36.26%, 31.74%, 15.85%, and 16.22% compared to Baselines 1, 2, 3, and 4, respectively. The proposed algorithm also exhibits smaller error bars than all four baselines. This improvement is due to the proposed algorithm’s use of the S-transform and signal period reconstruction to generate matrix image features, which are then input into the CNN to extract global time-domain and local frequency-domain features. Transformer encoders further extract highly coupled global features from the time–frequency sequences. The cross-attention mechanism captures long-term dependencies between sequences, enabling deep feature fusion. As a result, the identification accuracy and robustness of PQD are significantly enhanced in the complex noise environment of the distribution network.
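To make the SNR axis concrete, the sketch below corrupts a clean waveform with noise at a target SNR and then measures the SNR actually obtained. It assumes additive white Gaussian noise and a 50 Hz fundamental, neither of which is stated in the text; the function names are hypothetical.

```python
import math
import random

def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the expected SNR equals snr_db."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

# Assumed test signal: 10 cycles of a 50 Hz unit sine sampled at 12.8 kHz.
fs, f0 = 12800, 50
clean = [math.sin(2 * math.pi * f0 * n / fs) for n in range(2560)]
noisy_0db = add_awgn(clean, snr_db=0)
noisy_30db = add_awgn(clean, snr_db=30)
print(round(measured_snr_db(clean, noisy_0db), 2),
      round(measured_snr_db(clean, noisy_30db), 2))
```

At 0 dB the noise power equals the signal power, which is why this operating point is the most demanding one in the comparison.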
Table 5 presents the PQD identification and classification results of the proposed algorithm at different noise levels. As the SNR decreases (i.e., as the noise level increases), the classification accuracy for the 17 power quality events remains mostly at 100%, with the accuracy for interruption and transient oscillation disturbances dropping to 97% and 98% at SNR = 30 dB. Additionally, the classification accuracy for some mixed disturbances (e.g., sag + harmonics + transient oscillation) falls to 95% at SNR = 30 dB. This demonstrates that the proposed algorithm maintains robust performance and high accuracy even in high-noise-level environments, enabling effective PQD identification and classification. The algorithm preserves the key features of the original waveform while eliminating burrs through additional restrictions, and then extracts time–frequency domain features, which allows the classification layer to output accurate results even under high-noise-level conditions.
To further confirm the robustness and statistical significance of the proposed algorithm’s performance, we conducted five independent repeated experiments on the test set. For our proposed method, the mean accuracy was 97.9% with a 95% confidence interval of ±1.1%, and the mean F1-score was 97.7% with a 95% confidence interval of ±1.2%. A two-sided independent samples t-test was performed to compare the proposed algorithm against each of the four baseline methods, for both accuracy and the F1-score. Specifically, for accuracy, the t-statistics were 4.13 when compared to Baseline 1, 3.66 compared to Baseline 2, 2.55 compared to Baseline 3, and 2.87 compared to Baseline 4. For the F1-score, the t-statistics were 4.11 when compared to Baseline 1, 3.68 compared to Baseline 2, 2.54 compared to Baseline 3, and 2.89 compared to Baseline 4. The results consistently revealed statistically significant differences with , favoring our proposed method. These findings strongly underscore the proposed algorithm’s superior and robust performance compared to existing methods, even under varying experimental conditions.
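The significance test can be sketched as follows. The example assumes an equal-variance (pooled) Student's t-test with five runs per method, hence 8 degrees of freedom, and a significance level of 0.05; these choices are assumptions, and the per-run accuracies are hypothetical values, not the paper's data. The two-sided critical value for 8 degrees of freedom at that level is 2.306.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Two-sample t-statistic with pooled variance (equal-variance Student's t-test)."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Two-sided critical value at alpha = 0.05 for 5 + 5 - 2 = 8 degrees of freedom.
T_CRIT_DF8 = 2.306

# Hypothetical per-run accuracies (%), for illustration only.
proposed_runs = [97.0, 98.5, 97.5, 98.8, 97.7]
baseline_runs = [95.0, 96.4, 94.8, 96.9, 95.4]
t_stat = pooled_t_statistic(proposed_runs, baseline_runs)
print(t_stat > T_CRIT_DF8)

# The accuracy t-statistics reported in the text, checked against the same threshold.
reported_t = [4.13, 3.66, 2.55, 2.87]
print(all(t > T_CRIT_DF8 for t in reported_t))
```

Under these assumptions, every reported t-statistic exceeds the critical value, which is consistent with the stated conclusion of significance.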
Figure 8 shows the relationship between PQD identification accuracy and the limiting parameter λ. As λ increases, the accuracy gradually declines because looser high-frequency filtering leaves more residual noise and weakens the extraction of critical fault-related components, which in turn affects feature representation and classification. To balance noise suppression and feature preservation in practical settings, λ should be adapted to the precision requirements of different operational scenarios. A practical workflow first initializes λ based on the noise statistics of the signal (e.g., λ_initial ≈ 1.2σ, where σ is the noise RMS) and then fine-tunes it according to the application. For transient-sensitive tasks such as detecting lightning-induced overvoltages or switching events, λ is typically tuned within a smaller range (0.10–0.25) to retain high-frequency fault signatures while suppressing noise. For routine periodic monitoring, where computational efficiency and signal smoothness are more important, a larger λ in the 0.25–0.45 range is preferable. This adaptive tuning strategy enables an effective trade-off between denoising performance and information integrity across diverse operational conditions.
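The initialization step of this workflow can be sketched as below. The 1.2σ rule and the two ranges come from the text; clamping the initial value into the task-specific range is an assumption made for this example, and the names `TASK_RANGES` and `initial_lambda` are hypothetical.

```python
import math

# Tuning ranges quoted in the text for the two kinds of task.
TASK_RANGES = {"transient": (0.10, 0.25), "routine": (0.25, 0.45)}

def initial_lambda(noise_samples, task):
    """Start from 1.2 * noise RMS, then clamp into the range for the given task."""
    sigma = math.sqrt(sum(x * x for x in noise_samples) / len(noise_samples))
    low, high = TASK_RANGES[task]
    return min(max(1.2 * sigma, low), high)

# Hypothetical noise segment with an RMS of 0.1.
noise = [0.1, -0.1, 0.1, -0.1]
lam_transient = initial_lambda(noise, "transient")
lam_routine = initial_lambda(noise, "routine")
print(round(lam_transient, 3), round(lam_routine, 3))  # 0.12 0.25
```

The returned value is only a starting point; the fine-tuning described above would still be carried out against the accuracy requirements of the application.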
Real-world disturbance data were used to validate the reliability of the proposed algorithm. The real-world PQD dataset used in this study consists of 1200 samples collected from the operational monitoring records of three regional 10 kV distribution substations, covering four mainstream types of distribution terminal units operating under different service conditions. The dataset includes eight typical PQD classes (voltage spike, notch, harmonics, sag, swell, interruption, flicker, and transient oscillation) along with a normal sinusoidal signal, ensuring sufficient diversity and representativeness for practical validation. To ensure that the measured PQD data provide an effective baseline, we clean the data with the hybrid data cleaning framework based on Markov Logic Networks (MLN) [27]. First, a set of possible instantiation rules is inferred based on the MLN, and a two-layer MLN index structure is constructed to generate multiple data versions, which facilitates the cleaning process. In the subsequent two-stage cleaning step, a reliability score is first used to clean up errors in each data version separately, and a fusion score is then used to eliminate conflicting values between different data versions.
Figure 9 presents the performance analysis of different algorithms under real-world disturbance data. The simulation results demonstrate that the proposed algorithm achieves the highest accuracy, precision, recall, and F1-score compared to the four baselines. This superiority stems from the algorithm’s innovative dual-channel time–frequency feature fusion network design and its optimization strategy for complex noise environments. By incorporating the dual-channel input and a cross-attention mechanism, the proposed method enhances both feature richness and noise robustness, enabling more accurate identification of various types of disturbances.
To further validate the model’s practical applicability and statistical robustness, we performed 20 independent evaluations on the real-world disturbance dataset, and the results are visualized in Figure 10. The proposed algorithm consistently outperforms all baselines, achieving a superior median accuracy of 97.8%, which is a 3.9% improvement over the strongest baseline (Baseline 3) and a significant 16.4% gain over the weakest baseline (Baseline 1). Crucially, the proposed method also demonstrates the highest stability with the narrowest interquartile range, indicating reliable and consistent performance on unseen, real-world data. This consistency is attributed to the dual-channel network’s ability to extract rich, complementary features and the cross-attention mechanism’s effectiveness in enhancing noise robustness, allowing the model to generalize effectively from synthetic training to complex, practical scenarios.
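The two summary statistics used here, the median and the interquartile range over repeated evaluations, can be computed as in the following sketch. The accuracy lists are hypothetical placeholders for 20 runs and are not the paper's measurements; `statistics.quantiles` is used with its default "exclusive" method, so the exact quartile values may differ slightly from those of other plotting tools.

```python
import statistics

def median_and_iqr(values):
    """Return (median, interquartile range) using the quartiles of `values`."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2, q3 - q1

# Hypothetical accuracies (%) from 20 independent evaluations per method.
proposed = [97.8 + 0.1 * ((i % 5) - 2) for i in range(20)]
baseline = [93.9 + 0.5 * ((i % 5) - 2) for i in range(20)]

median_p, iqr_p = median_and_iqr(proposed)
median_b, iqr_b = median_and_iqr(baseline)
print(round(median_p, 1), round(iqr_p, 1))  # 97.8 0.2
print(round(median_b, 1), round(iqr_b, 1))  # 93.9 1.0
```

A narrower interquartile range, as in the first list, is what the text refers to as higher stability across runs.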
The computing complexity of the proposed algorithm is composed of three parts, i.e., CNN feature extraction, a dual-path Transformer encoder, and cross-attention feature fusion. Specifically, the computing complexity of the CNN feature extraction part is , where is the number of power distribution terminals. is the length of the sampled sequence. is the feature dimension. The dual-path Transformer encode part has a computing complexity of . The cross-attention feature fusion part has a computing complexity of . Therefore, the total computing complexity is . The computing complexities of Baseline 3 and Baseline 4 are also , which indicates the proposed algorithm has low complexity.
Ablation experiments were conducted to isolate the contribution of each component under five settings: no burr filter, single channel, CNN only, Transformer only, and fusion without cross-attention. The results are summarized in Table 6 and underscore the importance of our key architectural innovations. The analysis reveals that the burr removal, the dual-channel feature fusion strategy, and the cross-attention mechanism play crucial roles, contributing 5.39%, 1.38%, and 1.16% to the identification accuracy improvement, respectively.