1. Introduction
With the rapid development and large-scale integration of renewable energy, multi-energy complementary systems have emerged as an important approach in modern power system evolution [1]. However, this architecture also presents considerable challenges, particularly power quality disturbances caused by widespread power electronic integration and changing load characteristics [2]. Power quality disturbances typically manifest as deviations in voltage, current, or frequency from nominal values [3]. These disturbances are varied, ranging from isolated events to complex disturbance combinations [4]. On the power source side, large-scale integration of power electronic devices, such as distributed photovoltaics and energy storage, causes serious issues, including harmonics and voltage fluctuations [5,6]. On the load side, emerging high-energy-consuming industries, such as Power-to-Hydrogen, have rendered large-scale rectifiers major sources of grid harmonics and disturbances [7]. These disturbances threaten grid stability, cause distributed source disconnections, damage equipment, interrupt production lines, and result in substantial economic losses [8]. Therefore, efficient and accurate power quality disturbance detection technologies are essential for stable smart grid operation and reliable modern industrial systems.
The study of power quality disturbance identification methods primarily involves two components: feature extraction and disturbance classification. For feature extraction, signal processing techniques, including the Fourier Transform (FT) [9], Short-Time Fourier Transform (STFT) [10], Continuous Wavelet Transform (CWT) [11,12], and S-transform (ST) [13,14,15,16,17], are widely applied. While the traditional FT analyzes signal spectra and harmonics, it struggles with non-stationary power quality disturbances due to inherent limitations [9]. Although windowing techniques partially improve FT performance for non-stationary signals, the fixed resolution imposed by the window function remains restrictive [10]. The Continuous Wavelet Transform excels at analyzing non-stationary signals with abrupt changes but suffers from poor noise immunity and high computational complexity [11,12]. In contrast, the S-transform employs a Gaussian window whose width scales inversely with frequency. This approach overcomes fixed-window limitations and provides superior noise suppression, offering clear benefits for power quality disturbance feature extraction [14]. However, a single window function in the S-transform cannot simultaneously achieve optimal resolution in both the time and frequency domains. To address this limitation, Liu et al. proposed time–frequency segmentation with dynamic correction of the S-transform window function, enabling accurate power quality disturbance localization [15]. Subsequent studies further enhanced the S-transform by integrating singular value decomposition (SVD) to improve feature extraction performance [16]. Moreover, feature characterization was improved by directly matching the Gaussian window to the signal spectrum, accelerating the S-transform optimization process [17]. However, although these advanced time–frequency methods generate high-dimensional feature sets with rich information, they inevitably introduce substantial feature redundancy.
In the study of disturbance classification, classical methods primarily include decision trees (DT) [18], partial least squares (PLS) [19,20], extreme learning machines (ELM) [21,22], and random forests (RF) [23,24,25]. Duc et al. modeled relationships among power quality efficiency, safety, and reliability using partial least squares structural equation modeling (SEM), with artificial neural networks (ANN) employed for analysis [20]. Subudhi optimized extreme learning machine parameters using a Grey Wolf Optimizer (GWO) to categorize seventeen power quality disturbance events [21]; experimental results demonstrated accurate detection and classification performance. Random forests have attracted increasing attention due to their high classification performance and parallelized training. Ravi et al. implemented power quality disturbance classification by constructing a random forest based on decision trees and recursive feature importance partitioning [23]. Jampana further introduced gradient boosting to select key features, enhancing the random forest's ability to recognize complex grid disturbances [24]. Ismael combined continuous wavelet transform time–frequency localization with random forest classifiers for effective classification [25]. Jiang et al. detected principal frequency points using an iterative circular filtering envelope extremum algorithm and optimized Gaussian window parameters for resolution enhancement [26]. However, random forest classification performance strongly depends on input feature quality, and high-dimensional transformed features introduce redundancy and substantial computational costs. To address these challenges, research has shifted toward multi-dimensional feature fusion. A mainstream approach converts signals into images and applies deep learning techniques [27]. Li et al. integrated statistical features with convolutional neural network (CNN)-based image features [28]. Liu et al. and subsequent studies fused one-dimensional signals with two-dimensional images generated using the continuous wavelet transform or Gramian angular field (GAF), processing these representations with CNN or transfer learning models [29,30]. Another research direction focuses on multi-sensor data fusion and graphical analysis. Izadi et al. proposed a synchronized Lissajous figure method, converting multi-sensor waveform differences into images for CNN-based identification, demonstrating strong robustness [31,32].
In summary, advanced time–frequency analysis methods generate high-dimensional feature sets with comprehensive signal characterization but inevitably introduce substantial redundancy. The performance of classifiers such as RF largely depends on input feature quality; when processing high-dimensional features derived from signal transformations, redundancy reduces classification accuracy and increases computational burden. Meanwhile, deep learning methods based on image fusion exhibit strong feature learning capability but suffer from high computational complexity, long training times, and strict hardware requirements. Consequently, neither traditional feature extraction–classification approaches nor deep learning-based image fusion methods effectively balance performance, efficiency, and engineering practicality. To address these limitations, this paper proposes a multi-scale power quality disturbance (PQD) identification model based on time–frequency feature fusion. The proposed model forgoes image-based recognition and complex deep learning architectures; by integrating time-domain and frequency-domain numerical features, it constructs a lightweight and efficient recognition framework. The Maximum Relevance Minimum Redundancy (mRMR) algorithm is incorporated into the feature extraction–classification workflow: by selecting an optimal feature subset with high discriminative ability and low redundancy, the model reduces training time and computational overhead. To validate effectiveness and robustness, experiments were conducted on a dataset containing twenty-one typical power quality disturbance types, with Gaussian white noise at varying signal-to-noise ratios and varying signal loss rates introduced to simulate real-world noise and data incompleteness. The main contributions of this study are summarized as follows.
1. Construct a multi-scale time–frequency feature set with complementary information by selectively extracting spectral, low-, mid-, and high-frequency components from the S-transform matrix and fusing them with time-domain features. This approach overcomes single-feature limitations by forming an informative multi-scale feature set that provides high-quality input for high-precision power quality disturbance recognition.
2. Propose a power quality identification framework integrating mRMR feature selection to address high dimensionality and redundancy in multi-scale feature sets. The mRMR algorithm selects an optimal feature subset with strong discriminative power and minimal mutual information, improving classification efficiency while preserving core discriminative information.
3. Construct perturbation datasets reflecting real-world scenarios by generating twenty-one standard perturbation waveforms based on IEEE Std 1159-2019 [33]. This work addresses the limitation that most existing studies focus on white noise environments, which inadequately represent practical detection conditions. To overcome this limitation, signal loss mechanisms with varying proportions are introduced. This design enables comprehensive and rigorous evaluation of power quality disturbance recognition robustness under complex real-world scenarios.
4. Conduct systematic comparative experiments to evaluate the proposed framework under different feature extraction methods, feature selection strategies, and varying noise and data incompleteness conditions.
The remainder of this paper is organized as follows. Section 2 presents the proposed methodology. Section 3 presents a case study validating the effectiveness of the ST-mRMR-RF method. Section 4 discusses the findings. Finally, Section 5 concludes the paper.
3. Results and Discussion
3.1. Dataset Description
In this paper, twenty-one power quality disturbances are studied and labeled from C1 to C21. These disturbances include seven typical single disturbances (C1–C7) and fourteen composite disturbances (C8–C21). Based on IEEE Std 1159-2019, a total of 21,000 waveform samples are generated using MATLAB 2021a, with 1000 samples per power quality disturbance class. The twenty-one power quality disturbance signals are summarized in
Table 1.
The simulation experiments were conducted on a computer with an AMD Ryzen 7 7840H CPU (Santa Clara, CA, USA), 32 GiB Kingston DDR4-3200 RAM (Fountain Valley, CA, USA), and an AMD Radeon 780M integrated GPU. The software environment comprised MATLAB 2021a running on Windows 11.
Robustness Evaluation Under Noisy Conditions
To evaluate model robustness in noisy environments, additive white Gaussian noise (AWGN) is introduced at different signal-to-noise ratio (SNR) levels. The SNR is defined as the ratio of clean signal power to noise power, expressed in decibels (dB):

$$\mathrm{SNR} = 10\log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right)$$

where $P_{\mathrm{signal}}$ and $P_{\mathrm{noise}}$ represent the power of the clean signal and the noise, respectively. Given a desired SNR level and the original clean signal $x(t)$, the power of the required additive noise $P_{\mathrm{noise}}$ can be derived from the equation above. A white Gaussian noise sequence $n(t)$, with the same length as the original signal $x(t)$ and a power of $P_{\mathrm{noise}}$, is generated as

$$n(t) = \sqrt{P_{\mathrm{noise}}}\,\varepsilon(t)$$

where $\varepsilon(t)$ is a random number drawn from a standard Gaussian distribution with a mean of 0 and a variance of 1. Finally, the generated noise sequence is added to the original clean signal to obtain the noisy signal $\tilde{x}(t)$:

$$\tilde{x}(t) = x(t) + n(t)$$
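The noise-injection procedure above can be sketched as follows (a minimal illustration; the function and parameter names are ours, not the paper's implementation):

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add white Gaussian noise to signal x to reach a target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(x**2)                  # clean-signal power
    p_noise = p_signal / 10**(snr_db / 10)    # required noise power from the SNR definition
    noise = np.sqrt(p_noise) * rng.standard_normal(len(x))
    return x + noise
```

Calling `add_awgn(x, 20)` yields a signal whose measured SNR is close to 20 dB, with the discrepancy shrinking as the signal length grows.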
To verify that the added noise simulates uncertainty in real measurement environments,
Figure 4 compares the original signal with signals contaminated by noise at 20 dB, 30 dB, and 40 dB SNRs. As
Figure 4 shows, as the SNR decreases and noise intensity increases, characteristic disturbances in the waveform are progressively obscured by noise, increasing waveform roughness and blurring local details. Such behavior reflects interference commonly encountered during signal acquisition in practical industrial environments.
3.2. Comparison of Feature Extraction Methods and Performance Evaluation
The feature datasets obtained through three different feature extraction methods are fed into the classification models—RF, PLS, ELM, and CNN—for training. To comprehensively assess a classifier’s effectiveness, four primary metrics are commonly used: accuracy, precision, recall, and F1 score. The resulting classification performances are presented in
Table 2.
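For concreteness, the four metrics can be computed from a confusion matrix as in the following sketch (macro averaging across classes is assumed here; the exact averaging convention used in the experiments is not stated):

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # rows: true, cols: predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)      # per-class; guard empty columns
    recall = tp / np.maximum(cm.sum(axis=1), 1)         # per-class; guard empty rows
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),                  # macro average
        "recall": recall.mean(),
        "f1": f1.mean(),
    }
```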
From
Table 2, it can be observed that the features extracted using the ST method yield the best performance in the classification model, while the STFT extraction method demonstrates the poorest performance. This limitation arises because the STFT method requires preset fixed-length windows, resulting in fixed time-frequency resolution. Consequently, it is unsuitable for non-stationary signals. In contrast, the ST method employs adaptive windows that dynamically adjust their size based on signal frequency, achieving high resolution in both time and frequency domains. The RF model achieves the highest classification accuracy, reaching 99.98%, 99.68%, and 99.90% for the ST, STFT, and CWT extracted feature sets, respectively.
Power quality disturbances include seven typical single disturbances (C1–C7) and fourteen composite disturbances (C8–C21). From the confusion matrix in
Figure 5, most samples are correctly classified. Specifically, the RF model shown in
Figure 5a correctly classifies all single disturbances, with only one composite disturbance misclassified, achieving 99.98% accuracy. The CNN model in
Figure 5d exhibits more misclassifications among single disturbances but still attains 99.81% accuracy. In contrast, the PLS model in
Figure 5b performs worst, particularly for category C7, yielding an overall accuracy of 91.84%.
To identify power quality disturbance types that are difficult to classify using the RF model, classification tests were conducted under noise-free conditions and noise levels of 20 dB, 30 dB, and 40 dB. The corresponding classification results are presented in
Figure 6.
Under noisy conditions, classification accuracy decreases. Confusion matrix analysis reveals a systematic misclassification trend from C2 (voltage sag) to C4 (voltage interruption). Specifically, six misclassifications occur in
Figure 6d, seven in
Figure 6c, and fifteen in
Figure 6b. This behavior arises because voltage sags and interruptions both exhibit decreasing voltage magnitudes over time. Such similarities lead to closely aligned feature representations, differing mainly in voltage drop magnitude. Increasing noise progressively compromises feature extraction precision. As shown in
Figure 6b at 20 dB and
Figure 6c at 30 dB, C2-to-C4 misclassifications increase markedly. This trend occurs because noise obscures critical distinguishing features, thereby degrading overall classification accuracy.
In summary, the ST-RF method demonstrates superior overall performance. However, a key question remains whether the S-transform improvement arises from time-domain features, frequency-domain features, or their interaction effect. To address this issue, ablation studies are conducted in the following section to provide definitive verification.
3.3. Ablation Experiment
To validate the selected frequency band partitioning strategy, ablation experiments were conducted. The proposed scheme was compared with two control approaches: a three-part division (0–166 Hz, 166–333 Hz, 333–500 Hz) and a bisection method (0–250 Hz, 250–500 Hz). All experiments used the same dataset and a unified experimental framework to ensure fair comparisons. The experimental results are presented as follows:
Table 3 compares Random Forest performance under different frequency band allocation strategies. The results show that the proposed segmentation strategy achieves optimal performance, with accuracy, precision, recall, and F1 score all reaching 99.98%. This surpasses the uniform three-band partitioning strategy (99.95%) and the two-band partitioning strategy (99.89%). These results indicate that fundamental frequency-based partitioning is more effective than uniform frequency partitioning for extracting highly discriminative features.
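A minimal sketch of band-wise feature extraction from an S-transform magnitude matrix is given below. The band edges and the particular statistics (band energy, envelope mean and spread, peak magnitude) are illustrative assumptions, not the paper's exact feature list:

```python
import numpy as np

def band_features(S_mag, freqs, bands):
    """Per-band statistics from an S-transform magnitude matrix.

    S_mag : (n_freqs, n_samples) magnitude of the S-transform
    freqs : frequency in Hz of each row of S_mag
    bands : list of (f_lo, f_hi) tuples defining the partitioning
    """
    feats = []
    for f_lo, f_hi in bands:
        rows = S_mag[(freqs >= f_lo) & (freqs < f_hi), :]
        env = rows.max(axis=0)             # band envelope over time
        feats += [rows.sum(),              # band energy proxy
                  env.mean(), env.std(),   # envelope level and variability
                  rows.max()]              # peak magnitude in the band
    return np.array(feats)
```

Applying this with, e.g., the uniform three-band split `[(0, 166), (166, 333), (333, 500)]` from the ablation study yields four features per band, twelve in total.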
To validate the effectiveness of time–frequency fusion features on classification performance, three ablation experiments were conducted. These experiments evaluated four classifiers (RF, PLS, ELM, and CNN) using time-domain features, frequency-domain features, and time–frequency fusion features, respectively. Model parameters were configured as follows. For CNN, a kernel size of [2 × 1] with 16 and 32 kernels was used, optimized by the Adam optimizer with a 0.001 learning rate and 0.0001 L2 regularization. For ELM, the hidden layer contained 1850 nodes with a Sigmoid activation function. For PLS, 100 principal components were retained. The experimental results are presented in
Table 4.
As shown in
Table 4, the time–frequency fusion feature set achieves superior overall classification performance among RF, PLS, ELM, and CNN models. The corresponding accuracies are 99.98%, 91.89%, 97.84%, and 99.81%, respectively. When comparing single feature sets, RF, PLS, and ELM achieve higher accuracy using frequency-domain features. Conversely, CNN performs better with time-domain features than with frequency-domain features. Further analysis of RF performance shows that the fused feature set achieves the highest accuracy at 99.98%. This is followed by time-domain features at 99.25%, while frequency-domain features yield the lowest accuracy at 98.36%. This is because time-frequency fusion is not merely a simple concatenation of features. Rather, it involves the effective integration of time-domain and frequency-domain feature vectors to construct a more comprehensive joint representation space. This process enables the model to simultaneously capture the temporal dynamics and spectral structural information of the signal, thereby overcoming the informational limitations inherent in features from either domain alone.
3.4. Dimensionality Reduction Analysis
While the effectiveness of S-transform-based time–frequency fusion features was validated previously, the resulting 3600-dimensional feature set exhibits substantial redundancy. To address this issue, this study employs three feature reduction techniques: Maximum Relevance Minimum Redundancy (mRMR), an Autoencoder (AE), and Lasso regression. The autoencoder uses a single hidden layer with a Sigmoid activation function, a regularization coefficient of 0.004, and 100 training iterations. In the Lasso method, the regularization penalty coefficient is set to 0.01, and features are selected based on the largest absolute regression coefficients. These approaches produce feature subsets with 100, 120, and 200 dimensions, respectively.
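The mRMR selection step can be sketched as a greedy search that maximizes relevance to the class label while penalizing redundancy with already-selected features. The following is a minimal sketch using a histogram-based mutual information estimate and the additive (MID) criterion; the implementation used in the study may differ in estimator and criterion:

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram-based mutual information estimate (in nats) between two 1-D arrays."""
    c_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = c_ab / c_ab.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: pick k feature columns of X with max relevance, min redundancy."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]      # start with the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # mean redundancy with the features chosen so far
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy   # additive MID criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

A near-duplicate of an already-selected feature scores poorly even if its relevance is high, which is exactly the redundancy-suppression behavior exploited here to shrink the 3600-dimensional set.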
Table 5 summarizes the parameter configurations for the mRMR algorithm and the Random Forest classifier. The mRMR feature selection algorithm employs 100 selected features, determined through feature selection experiments. This strategy reduces the original 3600-dimensional feature space while maintaining high recognition accuracy and improving computational efficiency. The Random Forest classifier is configured with 50 decision trees. Validation indicates that further increasing the tree number yields no noticeable performance improvement. This configuration optimizes computational overhead while preserving effective ensemble performance. Gini impurity is adopted as the splitting criterion due to its suitability and efficiency for classification tasks. The maximum tree depth is unrestricted, and the feature number follows the square-root rule to capture data characteristics effectively.
As shown in
Figure 7, the CNN model achieves its highest accuracy of 99.83% using the Lasso-selected feature set. On the AE-selected feature set, PLS demonstrates superior performance. For the mRMR-selected feature set, the ST-mRMR-RF model performs best, achieving 100% accuracy. The mRMR-selected feature set not only reduces Random Forest training time but also improves classification accuracy. Notably, feature selection does not always enhance performance. For example, after AE-based reduction, the ST-AE-RF model accuracy decreases to 80.48%.
Computational efficiency is a critical metric for evaluating the performance of dimensionality reduction methods.
Table 6 compares experimental results of RF models using feature sets with and without the mRMR feature selection algorithm.
As shown in
Table 6, the mRMR algorithm reduces feature dimensionality from 3600 to 100, lowering feature count, parameter scale, and overall computational complexity. RF model training time decreases from 95.15 s to 10.56 s, corresponding to an 88.91% reduction, while batch inference time decreases by 88.32%. Although mRMR introduces additional one-time computational overhead, the sustained acceleration during iterative training outweighs this cost, improving deployment feasibility on edge devices.
To determine the minimum dataset size required for acceptable accuracy, data ablation experiments were conducted. By progressively reducing training data size, model accuracy dependency on data volume was quantified, as shown in
Figure 8. In noise-free conditions, the model requires only 30% of the original data to maintain near-optimal performance. Under 20 dB noise, accuracy gradually decreases as data volume reduces, declining from 99.24% to 98.10%. In low-to-moderate noise environments of 30 dB and 40 dB, accuracy remains at 98.99% and 99.63%, even when data volume decreases to 300 samples. These results indicate that under low-to-moderate noise, the model effectively suppresses noise and retains efficient learning with limited data.
3.5. Comprehensive Assessment of Model Robustness
To ensure practical applicability of the proposed ST-mRMR-RF model in real-world power systems, this section evaluates robustness across two dimensions: additive noise interference and data loss. These scenarios simulate common signal transmission challenges, including random interference and sensor or data acquisition equipment failures.
To evaluate model noise robustness, additive noise at signal-to-noise ratios of 20 dB, 30 dB, and 40 dB is added to the original signal. Classification training is then performed using both the original feature set and the mRMR-selected feature set, with results shown in
Figure 9.
As illustrated in
Figure 9, the ST-mRMR-RF method achieves 100% accuracy using the optimized feature set under noise-free conditions. Under noisy conditions with SNRs of 20 dB, 30 dB, and 40 dB, it maintains accuracies of 99.24%, 99.75%, and 99.84%, respectively. In contrast, the ST-mRMR-PLS and ST-mRMR-ELM methods exhibit performance degradation at 20 dB SNR, with accuracies decreasing to 84.86% and 94.87%, respectively. This result indicates their limited noise robustness. The ST-mRMR-RF method consistently demonstrates the highest overall performance, regardless of noise presence. Overall, although classification accuracy decreases with increasing noise, the ST-mRMR-RF method remains robust, maintaining 99.24% accuracy even under severe 20 dB noise.
As shown in
Figure 10, under noise-free conditions, ROC curves for all classes approach the top-left region, yielding an AUC of 1.000. This result indicates near-perfect classification capability. Even under 20 dB noise interference, AUC values for most classes remain high, such as 0.999 for Class 2. This performance demonstrates the strong noise robustness of the proposed model’s feature representation.
Figure 11 shows that classification accuracy increases and gradually converges as the number of decision trees increases. Under noise-free conditions, accuracy reaches 100% when the number of trees equals 21. With SNRs of 20 dB, 30 dB, and 40 dB, accuracies reach 99.24%, 99.75%, and 99.84% at 52, 33, and 46 trees, respectively.
Although prior experiments confirm ST-mRMR-RF robustness under noise, real industrial scenarios often include simultaneous signal interference and data loss. To evaluate practicality and stability under adverse conditions, a two-factor experiment was designed. The experiment examines the combined effects of Signal-to-Noise Ratios (no-noise, 30 dB, 40 dB) and data loss ratios (0%, 5%, 10%, 15%). Classification accuracy results are reported in
Table 7.
As presented in
Table 7, the model demonstrates strong robustness against data loss under noise-free or high-SNR (40 dB) conditions, maintaining an accuracy above 96% even with a 15% data loss ratio. However, under a 30 dB noise environment, the impact of data loss is markedly amplified, leading to an approximately 6 percentage-point drop in accuracy at the same loss ratio. This result reveals a significant coupling effect between noise interference and data loss.
To validate the proposed method,
Table 8 presents comparative results against existing approaches. The ST-mRMR-RF method shows strong performance in noisy environments, especially under challenging 20 dB signal-to-noise ratio conditions. Under noise-free conditions, advanced methods such as SMST-DCNN and DWT-CNN achieve classification accuracies of 99.5% or above. As noise intensity increases, the performance of all methods degrades. At a 20 dB signal-to-noise ratio, ST-mRMR-RF achieves 99.24% accuracy, outperforming all comparison methods. These results indicate that compact, discriminative feature subsets constructed using mRMR provide strong noise robustness.
4. Discussion
This study proposes an improved method integrating mRMR feature selection with the ST-RF framework, enhancing computational efficiency and classification performance. However, the ST-mRMR-RF framework shows limitations in discriminating feature-similar categories and handling extreme composite scenarios.
For feature-similar categories, this study improves discrimination by constructing a multi-source feature representation, applying mRMR for low-redundancy selection, and leveraging ensemble learning. Experiments show perfect separation of voltage sags and short interruptions under noise-free conditions. However, as noise increases (
Figure 6b–d), misclassifications rise, indicating strong noise interference as a major contributor to category confusion. Future work should enhance feature extraction robustness under strong noise conditions.
Regarding robustness in extreme composite scenarios, experiments evaluated the method under Gaussian white noise and partial data loss. However, real-world power grids often involve colored noise and high-proportion data loss simultaneously. As shown in
Table 7, under 40 dB Gaussian white noise, accuracy decreases to 97.62% when the sampling omission rate reaches 10% and degrades further as the omission rate increases. This indicates that severe data missingness, compounded by noise, challenges feature stability and class boundary identification. This study has not yet sufficiently explored highly complex real-world power grid environments. Future research should test more realistic noise and missing-data models and incorporate adaptive time–frequency analysis methods, such as Gaussian window optimization or spectral peakedness statistics, enabling systematic evaluation and enhancement of theoretical performance limits and practical robustness under demanding conditions.
In summary, the limitations of this study mainly lie in insufficient adaptability to complex real-world environments. First, feature discriminability decreases under strong noise and compound interference, reducing the ability to distinguish feature-similar disturbances. Second, the study relies on synthetic datasets and simplified noise models, and robustness under extreme conditions, including non-stationary colored noise and high data loss rates, remains unverified. Engineering deployment also faces challenges from embedded device computational constraints and multi-module integration. To address these limitations, future work will follow several directions. First, the model will be validated using real-world data, and feature enhancement strategies resistant to non-stationary noise will be developed. Second, algorithm lightweighting and edge optimization will be implemented for deployment. Finally, an intelligent monitoring framework integrating online recognition and adaptive learning will be constructed.
5. Conclusions
This study proposes an S-transform-based multi-scale feature fusion method, termed ST-mRMR-RF, for power quality disturbance classification. The method extracts spectral, low-, mid-, and high-frequency components from the S-transform time–frequency matrix as frequency-domain features. These features are fused with time-domain features to construct a comprehensive power quality disturbance feature set. To address feature redundancy and computational inefficiency, the Maximum Relevance Minimum Redundancy (mRMR) algorithm selects discriminative feature subsets. This strategy reduces feature dimensionality while preserving critical information, improving computational efficiency and classification accuracy. Experimental results verify the effectiveness and robustness of the proposed ST-mRMR-RF framework.
Feature sets extracted using the S-transform consistently outperform those derived from STFT and CWT across different classifiers. Using the S-transform multi-scale fusion feature set, RF, PLS, ELM, and CNN achieve accuracies of 99.98%, 91.89%, 97.84%, and 99.81%, respectively. These accuracies exceed those obtained using single feature sets. In particular, RF shows accuracy improvements of 0.73% over time-domain features and 1.62% over frequency-domain features.
Following mRMR-based feature selection, the Random Forest classifier achieves 100% accuracy. Model training time decreases from 95.15 s to 10.59 s, representing an 88.91% reduction. Inference time decreases from 6.48 s to 0.69 s, corresponding to an 88.32% reduction, demonstrating the effectiveness of mRMR within this framework.
Under noise conditions with signal-to-noise ratios of 20 dB, 30 dB, and 40 dB, ST-mRMR-RF maintains high performance, achieving accuracies of 99.24%, 99.75%, and 99.84%, respectively. Additionally, under 40 dB noise with signal loss rates of 5%, 10%, and 15%, ST-mRMR-RF achieves accuracies of 98.27%, 97.62%, and 96.00%, respectively. These results demonstrate robustness against data loss.
In summary, compared with traditional methods, ST-mRMR-RF enhances feature representation through multi-scale fusion and reduces feature dimensionality using mRMR, improving computational efficiency. The method maintains high classification accuracy under noise and signal loss, making it suitable for real-world power grids with complex interference and incomplete data.