Power Quality Disturbance Classification Based on Parallel Fusion of CNN and GRU

Abstract: Effective identification of complex power quality disturbances (PQDs) is the premise and key to addressing power quality issues in the current complex power grid environment. However, with the increasing application of solid-state switches, nonlinear devices, and multi-energy generation, power grid disturbance signals are becoming distorted and complicated, which increases the difficulty of PQD identification. To address this issue, this paper presents a novel method for power quality disturbance classification using a convolutional neural network (CNN) and a gated recurrent unit (GRU). The CNN consists of convolutional blocks, some of which include a squeeze-and-excitation (SE) block, and is used to extract short-term features from PQDs: the convolutional blocks capture the spatial information of the PQDs, and the SE blocks enhance the feature extraction capability of the network. The GRU network is designed to capture long-term features from PQDs, and an attention mechanism connected to the GRU's hidden states at different times is proposed to improve the GRU's feature capture ability over long sequences. The CNN and GRU are arranged in parallel to perceive the same PQDs from two different views, and the feature information extracted from them is fused and transmitted to the Softmax activation layer for classification. Based on MATLAB-Simulink, a typical multi-energy-source system is constructed to analyze PQDs, and twelve PQDs are simulated to validate the proposed method. The simulation results show that the proposed method has higher classification accuracy for both single and hybrid disturbances and significant advantages in noise immunity.


Introduction
With the development of smart grids, the large-scale integration of advanced power electronic devices, electric vehicles, and multi-energy systems has brought new challenges to the operation of distribution networks [1]. The high penetration level of renewable energies requires the use of operation and management strategies to maintain and enhance the reliability, efficiency, and safety of the power grid [2]. Fundamentally, the extensive use of power electronics equipment, combined with a tremendous number of nonlinear loads in the power system, has resulted in an increase in both single PQDs and compound PQDs. This situation poses enormous challenges to the power grid, ultimately affecting the reliability of power system operation [3]. Therefore, detecting and classifying PQDs accurately and efficiently is key to resolving power quality problems [4].
Traditionally, the classification of PQDs is divided into three stages: feature extraction, feature selection, and feature classification [5]. There are many methods based on signal processing for feature extraction and feature selection, such as the short-time Fourier transform [6], wavelet transform [7], wavelet packet transform [8], S-transform [9], and Hilbert-Huang transform [10]. All of these methods can extract spatial-temporal features of PQDs, enabling the identification of PQDs. Feature classification is responsible for associating the PQDs with selected feature types, and machine learning classifiers such as decision tree [11], support vector machine [12], Bayesian decision [13], artificial neural network (ANN) [14], and random forest [15] classifiers are commonly used for this purpose. It is important to recognize that the performance of the classifier is directly affected by the choice of features. However, there are still two issues that need to be addressed. The first issue is that the features extracted from PQDs are often artificially selected and heavily dependent on expert experience. The second issue is that the classifier may encounter difficulties in accurately classifying complex compound PQDs [16].
Deep learning is widely used in various fields such as image, signal, and information processing. In recent years, researchers have started exploring the application of deep learning to the classification of PQDs. One of the key advantages of a deep learning network is its ability to automatically select features, since it consists of both feature extraction and classifier components that update simultaneously during the learning process. This represents a significant improvement over traditional machine learning classifiers that typically rely on manually selected features. Garcia et al. [17] utilized a CNN [18] to classify PQDs. A CNN has a very efficient ability to obtain short-term information from PQDs. In [19], a multi-fusion convolutional neural network is utilized for the classification of PQDs with a focus on automatic extraction and fusion of features from multiple sources. It combines time and frequency domain information to enable the automatic classification of complex PQDs. In [20], a sequence-to-sequence deep learning model based on the GRU [21] is proposed for the recognition of power quality disturbance types and their corresponding time locations. The model is capable of recognizing the type of each element in the sequence and subsequently locating the starting and ending times of the disturbances. Junior et al. [22] and Mohan et al. [23] utilized a CNN combined with long short-term memory in series (CNN-LSTM) to classify PQDs. The CNN-LSTM model applies a CNN to extract features and long short-term memory (LSTM) [24] to filter and update these features. By combining these layers, CNN-LSTM performs the automated extraction, selection, and classification of PQDs, unifying the problem into a single task. Kumar et al. [25] employed the S-transform, an ANN, and rule-based decision trees to classify PQDs. S-matrix contours such as maximum amplitude versus time and amplitude versus frequency from the S-transform matrix clearly depict the disturbance patterns of the power system. Features extracted from the S-transform were utilized to train the ANN. Decision rules were then used to map observations about an item to determine its target value, representing the decision-making process. This method effectively classifies both single and compound disturbances, with the S-transform matrix clearly showing the disturbance modes of the power system. However, the feature acquisition and network training processes are separate, resulting in decreased training efficiency and classification accuracy compared to deep learning networks that integrate both processes.
The application of deep learning to PQD problems can not only improve classification accuracy but also save manpower and simplify the process. However, the methods mentioned above, such as CNNs or recurrent neural networks (RNNs), have some limitations. For example, although a CNN can improve its feature extraction ability through multilayer stacking, it may not be able to fully capture the temporal correlation of the feature information extracted from PQDs. Similarly, while an RNN can extract temporal features from time series data, it may struggle to extract the full feature information from long time series data such as PQDs. Although the CNN-LSTM method combines the advantages of the CNN and LSTM, the temporal features extracted by the CNN may not be comprehensive enough, which could limit further improvement of the classification accuracy.
Given the above difficulties, the main contributions of this paper are as follows: (1) This paper proposes a novel parallel network (called CNN-GRU-P) composed of a CNN block and a GRU network block for classifying PQDs. By transmitting PQDs into both network blocks simultaneously, the proposed method provides a more comprehensive understanding of the PQDs from two different views. The output of the two networks is fused through the fully connected layer and then transmitted to the Softmax activation layer for classification to obtain a more accurate classification result. (2) A CNN is utilized to extract short-term features from input PQDs. To further improve the classification accuracy, a squeeze-and-excitation operation is incorporated into the convolutional block. The squeeze operation is responsible for extracting contextual information, while the excitation operation captures channel-wise dependencies. By incorporating SE blocks, the weights of feature channels can be recalibrated, adaptively enhancing feature channels that contain important information and suppressing irrelevant feature channels. (3) A GRU network with an attention mechanism is utilized to extract long-term features from PQDs. The attention mechanism assigns correlation coefficients between memory units, thereby highlighting the impact of important information. This approach significantly enhances the feature extraction ability of the GRU network for PQDs. (4) In order to further analyze the key factors that lead to PQDs in microgrids and validate the effectiveness of the proposed method, a simulation model based on MATLAB-Simulink was established to simulate twelve different types of PQDs. These PQDs are generated through three-phase faults, switching of heavy loads and capacitor banks, and connecting nonlinear loads.

Convolutional Neural Network with Squeeze-and-Excitation
The CNN architecture comprises several layers, including convolutional layers, pooling layers, batch normalization layers, and activation function layers [26]. A deep convolutional neural network is composed of multiple convolutional layers stacked on top of one another. Each layer of the network extracts output information from the previous layer and transmits it to the next layer. This process facilitates the precise and efficient extraction of deep signal features in the flow of data [27].
The convolutional layer utilizes convolutional kernels of a specific size to extract features from input signals. The pooling layer employs max pooling to decrease computation, prevent overfitting, and enhance the neural network's ability to resist noise in PQDs. By normalizing the input data for each layer during the training process, batch normalization (BN) guarantees that the input data remain consistent in distribution. This can enhance the training speed and reduce overfitting [28].
The convolutional kernel is the key component of the convolutional neural network; it combines the local receptive field and channel information from each layer's convolutional kernels to create feature information. However, the interdependence between the channels of the convolutional kernel is not taken into account. To improve the feature extraction capability of the convolutional neural network, this paper incorporates the SE block. The SE block obtains global information from the convolutional layer through two operations: "squeeze" and "excitation". The SE block employs a lightweight gating mechanism to learn the interdependence between convolutional kernel channels, selectively emphasizing informative features and suppressing irrelevant features. This enhances the network's representation ability [29].
The "squeeze" operation in the SE block is performed through global average pooling, which compresses the global spatial information into channel statistics. The statistical information of the convolution channels represents the input information and is a collection of convolution kernel information. This operation reduces the dimensionality of the global spatial information to a single value for each channel, which is then used to compute the channel-wise attention weights. The formula for the "squeeze" operation is as follows [29]:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (1)

where u_c is the c-th feature map of the input tensor U and Z = [z_1, z_2, ..., z_C] is the output. F_sq denotes the global average pooling operation, which obtains the channel statistics by compressing the spatial dimensions W × H.
The "excitation" operation is shown in Formula (2) [29]. The channel dependency is obtained through a simple gating mechanism:

s = F_ex(z, W) = σ(W_2 δ(W_1 z))    (2)

where F_ex symbolizes the fully connected layer operation, which reduces the computational load by reducing the number of channels; δ is the ReLU function; and W_1 and W_2 are learnable parameters.
The output of the SE block is shown in Formula (3) [29]:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (3)

where F_scale represents the channel-wise multiplication operation, and s = [s_1, s_2, ..., s_C] describes the weights of the C feature maps in tensor U.
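As an illustration, the squeeze, excitation, and scale steps of Formulas (1)-(3) can be sketched in plain NumPy. The channel sizes and the weights W_1 and W_2 below are toy values chosen for the example; in the real network these parameters are learned during training, and for 1-D PQD features the spatial dimensions W × H reduce to the sequence length:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(U, W1, W2):
    """Squeeze-and-excitation over a 1-D feature map U of shape (L, C).

    Squeeze (Formula 1): global average pooling over the spatial axis.
    Excitation (Formula 2): bottleneck FC layers with ReLU, then sigmoid gating.
    Scale (Formula 3): channel-wise multiplication of U by the weights s.
    """
    z = U.mean(axis=0)                          # squeeze: (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))   # excitation: (C,), each in (0, 1)
    return U * s                                # scale: broadcast over spatial axis

# Toy dimensions: L = 6 spatial points, C = 4 channels, reduction ratio r = 2.
rng = np.random.default_rng(0)
L, C, r = 6, 4, 2
U = rng.normal(size=(L, C))
W1 = rng.normal(size=(C // r, C))   # C -> C/r
W2 = rng.normal(size=(C, C // r))   # C/r -> C
X = se_block(U, W1, W2)
print(X.shape)  # (6, 4)
```

Each output channel is the input channel rescaled by its learned weight, which is how informative channels are emphasized and irrelevant ones suppressed.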

Gate Recurrent Unit with Attention Mechanism
The GRU neural network is a version of the recurrent neural network that excels in feature extraction from time series signals [30]. Compared to LSTM, the GRU requires the training of fewer parameters. The GRU neural network can learn to extract relevant features from a sequence of PQD samples by selectively retaining and forgetting certain information from the previous samples, based on the current input and the previous hidden state. As a result, we can achieve similar or even better training loss with fewer training iterations [30,31]. The GRU consists of two gates, namely the update gate z_t and the reset gate r_t. The combination of the new input and the previous memory is adjusted by r_t. The preservation of the previous memory is controlled by z_t. The architecture of the GRU is shown in Figure 1, and its updating equations [30] are given as follows:

z_t = σ(W_z x_t + U_z h_{t-1} + b_z)    (4)
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)    (5)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t-1}) + b_h)    (6)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (7)

where x_t, h_t, and h̃_t are the input data, output hidden state, and candidate hidden state at time t. W_z, W_r, and W_h are the input data's weight coefficient matrices of the update gate, reset gate, and candidate hidden state. U_z, U_r, and U_h are the hidden states' weight coefficient matrices of the update gate, reset gate, and candidate hidden state. The parameters in the weight coefficient matrices are the data that the network needs to train. b_z, b_r, and b_h are the corresponding biases. A bias is a constant that helps adjust the output value of a neuron, enabling it to make better decisions for a given input. ⊙ denotes the Hadamard product, σ is the Sigmoid function, and tanh is the hyperbolic tangent function.

The attention mechanism is inspired by the human brain's ability to selectively focus on certain stimuli while ignoring others. Its core idea is to allocate computational resources to key areas of the input while downplaying the significance of less critical areas. This approach effectively removes the noise and irrelevant factors that could otherwise interfere with the processing of important information. In essence, the attention mechanism enables more efficient use of computing resources and better performance in complex tasks [32]. PQDs, being long time series signals, benefit from the use of an attention mechanism that allocates a distribution coefficient to each data point based on its contribution to the classification results in the series. This allows the neural network model to assign greater weight to the data with a more significant impact while ignoring redundant or irrelevant information. By incorporating the attention mechanism in the network, we can overcome the challenges of long-term dependence, such as information redundancy and loss, and improve the feature extraction capability of the model, ultimately enhancing the classifier's performance.
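A minimal NumPy sketch of one GRU update step following Formulas (4)-(7) is given below. The input/hidden sizes and the randomly initialized weight matrices are toy values for illustration only, not the trained 64-unit GRU layer used later in the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, P):
    """One GRU update following Formulas (4)-(7).

    z_t: update gate, r_t: reset gate, h_cand: candidate hidden state.
    The new state blends the previous state and the candidate via z_t.
    """
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev + P["bz"])
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev + P["br"])
    h_cand = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1 - z) * h_prev + z * h_cand

rng = np.random.default_rng(1)
n_in, n_hid = 1, 3   # one sample point in, 3 hidden units (toy sizes)
P = {k: rng.normal(scale=0.5, size=(n_hid, n_in)) for k in ("Wz", "Wr", "Wh")}
P.update({k: rng.normal(scale=0.5, size=(n_hid, n_hid)) for k in ("Uz", "Ur", "Uh")})
P.update({k: np.zeros(n_hid) for k in ("bz", "br", "bh")})

h = np.zeros(n_hid)
for x in np.sin(2 * np.pi * 50 * np.arange(8) / 3200):  # a few 50 Hz samples
    h = gru_step(np.array([x]), h, P)
print(h.shape)  # (3,)
```

Because h_t is a convex combination of the previous state and a tanh-bounded candidate, the hidden state stays bounded, which is part of why the GRU trains stably on long PQD sequences.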
The model architecture of the attention mechanism in the GRU network is shown in Figure 2, where x_1, x_2, ..., x_l is the input sequence, h_1, h_2, ..., h_l are the state values of the GRU network, α_ki is the attention distribution coefficient of the hidden-layer state of the historical input information with respect to the last state of the hidden layer, β is the weight of the whole hidden-layer state, and h_k is the hidden-layer state of the last output node.

The core of the attention mechanism is to calculate the attention distribution coefficient α_ki, as shown in Formulas (8) and (9) [32]:

e_ki = V tanh(W h_k + U h_i)    (8)
α_ki = exp(e_ki) / Σ_{j=1}^{l} exp(e_kj)    (9)

where e_ki is the energy value of the hidden-layer state at time i, l is the length of the input sequence, and V, W, and U are the weight coefficient matrices that need to be trained in the network.
The semantic encoding and output feature vector that generate the attention distribution are shown in Formulas (10) and (11):

S_k = Σ_{i=1}^{l} α_ki h_i    (10)
h_k* = tanh(β S_k)    (11)
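The attention computation of Formulas (8) and (9), together with the semantic encoding of Formula (10), can be sketched as follows. The sequence length, hidden size, and the matrices V, W, and U are toy values here; in the network they would be learned during training:

```python
import numpy as np

def attention_weights(H, h_k, V, W, U):
    """Energies and attention coefficients following Formulas (8)-(9).

    e_ki = V . tanh(W h_k + U h_i); alpha_ki = softmax of e over i.
    """
    e = np.array([V @ np.tanh(W @ h_k + U @ h_i) for h_i in H])
    a = np.exp(e - e.max())          # shift for numerical stability
    return a / a.sum()

rng = np.random.default_rng(2)
l, d = 5, 3                        # sequence length and hidden size (toy values)
H = rng.normal(size=(l, d))        # hidden states h_1 .. h_l from the GRU
V = rng.normal(size=d)
W = rng.normal(size=(d, d))
U = rng.normal(size=(d, d))

alpha = attention_weights(H, H[-1], V, W, U)     # coefficients w.r.t. last state
context = (alpha[:, None] * H).sum(axis=0)       # semantic encoding, Formula (10)
print(alpha.sum())  # 1.0
```

The softmax guarantees the coefficients are positive and sum to one, so the encoding is a weighted average of hidden states that emphasizes the most relevant time steps.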

PQD Classification Based on CNN-GRU-P
A "parallel" architecture network, referred to as CNN-GRU-P, which combines the advantages of the CNN and GRU neural networks, has been designed to extract both short-term and long-term features. The architecture of this network is illustrated in Figure 3. The proposed network consists of three layers: the signal processing layer, the CNN-GRU-P layer, and the output layer.

To mitigate overfitting, we normalize each input PQD set by scaling it to a range of [0, 1]; we also assign corresponding labels to each PQD set and divide all PQDs into batches of size 128. The preprocessed PQDs are then fed into both the CNN and GRU networks for subsequent analysis.
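This per-sample min-max normalization step can be sketched as follows (the sag waveform below is a hypothetical example constructed for illustration):

```python
import numpy as np

def minmax_01(x):
    """Scale one PQD sample to [0, 1], as in the preprocessing step."""
    return (x - x.min()) / (x.max() - x.min())

# A toy 640-point sample at 3200 Hz: a 50 Hz sinusoid with a crude voltage sag.
t = np.arange(640) / 3200.0
x = np.sin(2 * np.pi * 50 * t)
x[200:400] *= 0.5                 # amplitude drop imitating a sag
y = minmax_01(x)
print(y.min(), y.max())  # 0.0 1.0
```

Scaling each sample independently keeps all inputs in the same numeric range regardless of the disturbance magnitude, which stabilizes training.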
Before inputting the PQD dataset into the CNN layer, the PQD dataset undergoes a reshaping process to convert its shape from (batch size, sequence length, 1) to (batch size, m, n), where n is the number of sampling periods in which the PQDs were obtained and m is the number of sampling points per cycle.This allows for the long PQD sequences to be divided into n shorter sequences of length m.This reshaping process improves the efficiency of network training.
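With the 3200 Hz sampling rate and 640-point samples described in the simulation section, one 50 Hz cycle is m = 64 points and each sample holds n = 10 cycles; these sizes are inferred from the sampling setup, so treat them as illustrative. The reshape can then be sketched as:

```python
import numpy as np

# One batch of PQD samples: 640 points = n = 10 cycles x m = 64 points per cycle.
batch, m, n = 128, 64, 10
x = np.random.default_rng(3).normal(size=(batch, m * n, 1))

# Split each long sequence into n cycle-length segments, then arrange as (batch, m, n).
x_cnn = x.reshape(batch, n, m).transpose(0, 2, 1)
print(x_cnn.shape)  # (128, 64, 10)

# Column j of a sample is the j-th cycle of the original waveform.
assert np.array_equal(x_cnn[0, :, 0], x[0, :m, 0])
```

Arranging the cycles side by side lets the 1-D convolutions see one short cycle-length sequence per channel position instead of one very long sequence.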
The CNN comprises a stack of three convolutional blocks, two SE blocks, and a global average pooling layer. Each convolutional block includes a one-dimensional convolutional layer with kernel sizes of 8, 5, and 3 and numbers of filters of 128, 256, and 128, respectively. Each one-dimensional convolutional layer is followed by batch normalization with a momentum of 0.99 and an epsilon of 0.001, and then the ReLU activation function. The first two convolutional blocks also include an SE block. The final convolutional block is followed by the global average pooling layer.
The GRU consists of a stack of GRU blocks and a dropout layer.The GRU block includes a GRU layer with 64 units and an attention embedding dimension of 128.The dropout rate is set to 80% to mitigate overfitting.
After the feature extraction, the short-term features from the CNN and the long-term features from the GRU are concatenated using the Concat function and then input to a fully connected layer. Each concatenated feature vector is used as an input to the fully connected layer, which applies the Softmax function to predict the PQD labels. The Adam optimizer [33] is used to minimize the loss between the model's predictions and the true labels, with a learning rate of 1 × 10^-4.
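The fusion and classification step can be sketched in NumPy as follows. The feature sizes come from the layer descriptions above (128 channels after the CNN's global average pooling, 64 GRU units, 12 PQD classes); the fully connected weights are randomly initialized here purely for illustration, whereas the real network trains them with Adam:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(4)
f_cnn = rng.normal(size=(32, 128))   # short-term features from the CNN branch
f_gru = rng.normal(size=(32, 64))    # long-term features from the GRU branch

fused = np.concatenate([f_cnn, f_gru], axis=1)   # Concat step: (32, 192)

W = rng.normal(scale=0.1, size=(192, 12))        # fully connected layer (toy init)
b = np.zeros(12)
probs = softmax(fused @ W + b)                   # class probabilities per sample
pred = probs.argmax(axis=1)                      # predicted PQD labels
print(probs.shape)  # (32, 12)
```

Because both branches see the same input in parallel, concatenation preserves each view intact before the classifier weighs them jointly.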

Simulation and Analysis
In this study, MATLAB-Simulink [34] was used to simulate the power grid. Figure 4 shows the microgrid model, which simulates the generation of PQDs in the power grid by setting the state of the components in the model [35]. Figure 5 illustrates twelve types of PQDs: six single PQDs, namely sag, swell, interrupt, transient, harmonic, and flicker, and six compound PQDs, namely harmonic with sag, harmonic with swell, harmonic with interrupt, flicker with sag, flicker with swell, and flicker with harmonic. Sag can be caused by a three-phase fault or switching of a large load. Swell can occur when large loads are suddenly removed from the system. Interrupt happens when there is a permanent three-phase fault. Transient can be created by switching a large capacitor bank. Harmonic is caused by three-phase nonlinear loads, and flicker can be created by connecting three-phase dynamic loads; flicker is generated by changing the formula of the input power supply [36]. Compound PQDs can be simulated by combining single PQDs. The sampling frequency is set to 3200 Hz, and the data length of a single sample is 640 points. In total, 36,000 samples have been generated and divided into three sets, with 80% of the samples used for training, 10% for validation, and 10% for testing. The proposed method was trained and evaluated using a computer with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz, 16 GB DDR4 RAM, and an NVIDIA GeForce GTX 1650 graphics card.
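For reference, the 80/10/10 split of the 36,000 simulated samples works out as follows:

```python
# 80% training, 10% validation, 10% testing of the 36,000 simulated samples.
n_total = 36_000
n_train = int(0.8 * n_total)
n_val = int(0.1 * n_total)
n_test = n_total - n_train - n_val
print(n_train, n_val, n_test)  # 28800 3600 3600
```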
Figure 6 illustrates the accuracy and loss of CNN-GRU-P on both the training set and the validation set. The training and validation accuracy of CNN-GRU-P increased, while the training and validation loss decreased. The training accuracy increased significantly from 52.7% to 98.7%, while the validation accuracy rose from 65.1% to 98.4%. In addition, the training loss decreased from 1.34 to 0.03, and the validation loss decreased from 0.87 to 0.04. These results suggest that the parameters of the network model are reasonable and that the training process is not overfitting the data.
The proposed network CNN-GRU-P and the CNN and GRU connected in series (called CNN-GRU-S) were first simulated and compared. Under the zero-noise condition, the training time for CNN-GRU-P is 34 min and 42 s, while for CNN-GRU-S it is 24 min and 32 s. Despite the difference in training time, both networks have similar classification accuracy for single PQDs. However, there are differences in accuracy for compound PQDs between the two networks. For harmonic with sag, flicker with sag, and harmonic with swell, the accuracy of CNN-GRU-P is 99.9%, 99.0%, and 99.9%, respectively, while that of CNN-GRU-S is 89.5%, 87.6%, and 88.4%, respectively. While CNN-GRU-S has a shorter training time, CNN-GRU-P has demonstrated better performance in classifying compound PQDs, which is more representative of the real situation in a power grid. In order to further reflect the advantages of the parallel connection, CNN-GRU-P is compared with CNN [17], GRU [19], and CNN-LSTM [20].

For the 12 PQDs mentioned above, three groups of data were sampled. One group of PQDs had no noise, while the other two groups had Gaussian noise added at 30 dB and 20 dB, respectively. The same dataset was used to train the CNN, GRU, CNN-LSTM, and CNN-GRU-P neural networks. The training times for these networks were 7 min and 46 s for CNN, 35 min and 29 s for GRU, 15 min and 4 s for CNN-LSTM, and 34 min and 42 s for CNN-GRU-P.
Table 1 displays the classification accuracy results of the CNN, GRU, CNN-LSTM, and CNN-GRU-P neural networks for the three datasets. Among the four neural networks, the CNN has the lowest overall performance, but it demonstrates excellent recognition ability for oscillatory transients, achieving 100% classification accuracy for this type of data. This is due to the strong feature extraction capability of the CNN. However, the CNN is limited in its ability to extract temporal features from PQDs, which ultimately restricts further improvements in classification accuracy.
The GRU network achieved an average accuracy of 89.2% on the 20 dB dataset, which is 4.7% higher than the 84.5% average accuracy of the CNN. These results demonstrate that the GRU has strong anti-noise performance. Unlike the CNN, the GRU is capable of obtaining temporal features from PQDs, which contributes to its performance.
The CNN-LSTM network achieved an average accuracy of 97.0% on the 20 dB dataset, which is 12.5% higher than that of the CNN and 7.8% higher than that of the GRU. These results suggest that CNN-LSTM has excellent noise immunity. By combining a CNN for feature extraction and LSTM for filtering and updating features, CNN-LSTM demonstrates a stronger feature extraction ability than the CNN and LSTM individually.
The CNN-GRU-P network outperformed the other three neural networks on all three datasets, achieving a higher average classification accuracy. Moreover, the CNN-GRU-P network achieved a remarkable accuracy of 99.6% on the noise-free dataset. For the single PQDs, the CNN-GRU-P network demonstrated high accuracies, achieving 99.4%, 99.8%, 94.8%, 100%, 98.5%, and 99.9% for sag, swell, interrupt, oscillatory transient, harmonic, and flicker, respectively. For compound PQDs, the CNN-GRU-P network achieved accuracies of 98.6%, 98.8%, 95.9%, 97.2%, 97.5%, and 95.9% for harmonic with sag, harmonic with swell, harmonic with interrupt, flicker with sag, flicker with swell, and flicker with harmonic, respectively. The results clearly demonstrate the effectiveness of the CNN-GRU-P network in detecting and classifying PQDs. Compared to the other three neural networks, the CNN-GRU-P network achieved a higher overall classification accuracy, and it also demonstrated better performance on compound disturbances. In particular, the CNN-GRU-P network achieved average accuracies of 98.3% and 99.1% on the 20 dB and 30 dB noise datasets, respectively, indicating its strong anti-noise interference ability. These findings suggest that the CNN-GRU-P network is a promising approach for detecting and classifying PQDs in power systems.
The CNN-GRU-P network combines the strong feature extraction capability of the CNN with the ability of the GRU to extract time series features, making it a powerful approach for detecting and classifying PQDs. Unlike CNN-LSTM, which uses a "series" combination of a CNN and LSTM, CNN-GRU-P can simultaneously extract features with both the CNN and the GRU, enabling it to obtain complete time series features. Moreover, the SE block enhances the network's ability to extract characteristic signals on different convolution channels, while the attention mechanism strengthens the GRU's ability to extract features from long timing signals such as PQDs. These features make CNN-GRU-P a highly effective approach for detecting and classifying PQDs in power systems.

Figure 2. The attention mechanism in the GRU network.

Conclusions
(1) This article proposes a novel parallel neural network (CNN-GRU-P) that combines a CNN with SE blocks and a GRU network with an attention mechanism for the classification of PQDs. The CNN-GRU-P method leverages the short-term feature extraction ability of a CNN and the long-term feature extraction ability of a GRU, while adding SE blocks and attention mechanisms to enhance the network's training efficiency. The end-to-end training process of the network enables automatic feature extraction and selection, merging and replacing the separate feature extraction, selection, and classification stages. Experimental results demonstrate that the classification accuracy of the CNN-GRU-P network on single and compound disturbances outperforms the other networks and exhibits good noise resistance. The application of this classification network to power quality classification can improve power quality and ensure the operational reliability of multi-energy systems. It should be noted that the proposed network structure is relatively complex, resulting in longer training times and requiring good training equipment. Nonetheless, the CNN-GRU-P method demonstrates promising results and has the potential to contribute significantly to the field of power quality classification.
(2) A simulation model was established to analyze the factors leading to microgrid PQDs, such as three-phase faults, nonlinear loads, and large capacitor banks. The simulation results show that this method is suitable for accurately identifying single and compound power quality disturbances and provides a feasible solution for solving serious power quality problems in microgrids.