Automatic Power Quality Disturbance Diagnosis Based on Residual Denoising Convolutional Auto-Encoder

Abstract: With the increasing integration of non-linear electronic loads, the diagnosis and classification of power quality are becoming crucial for power grid signal management. This paper presents a novel diagnosis strategy based on unsupervised learning, namely the residual denoising convolutional auto-encoder (RDCA), which automatically extracts features from complex power quality disturbances (PQDs). Firstly, time-frequency analysis is applied to isolate the frequency domain information. Then, the RDCA, equipped with a weight residual structure, is utilized to extract useful features from contaminated PQD data, where the residual structure improves performance. A single-layer convolutional neural network (SCNN) with an added batch normalization layer is proposed to classify the features. Combining the RDCA and SCNN, we further propose a classification framework for complex PQDs. To provide a reasonable interpretation of the RDCA, visual analysis is employed to gain insight into the model, leading to a better understanding of the features from different layers. Simulation and experimental tests are conducted to verify the practicability and robustness of the RDCA.


Introduction
The primary purpose of the power system is to provide industry and individuals with a stable and clean electricity signal [1]. However, the power grid suffers from power quality disturbances (PQDs) due to the increasing involvement of non-linear inductive and capacitive loads in the distribution network. These disturbances that pollute the original clean AC sinusoidal waveform have a negative impact on other consumers and even the stability of the power grid. Moreover, different kinds of PQDs, such as flicker, harmonics, swells, transients, etc., result in not only economic losses, but also an unreasonable waste of energy [2,3]. Unfortunately, the combination of different types of disturbances increases the difficulties of accurate detection. Therefore, to improve the accuracy of power quality diagnosis, it is necessary to take automatic and effective measures to cope with more complex situations.
Taking into account the requirements for accurate diagnosis of power quality, a large number of studies have been proposed. In general, PQD diagnosis and classification can be approximately divided into three steps, i.e., time-frequency analysis, feature extraction, and classification [4]. Various time-frequency analysis algorithms, such as the short-time Fourier transform (STFT), Gabor transform (GT), wavelet transform (WT), Hilbert-Huang transform (HHT), the Stockwell transform (ST), and some optimized methods, have been proposed for digital signal processing [5]. However, fixed parameter settings reduce their applicability, especially for complex PQ disturbances.

The main contributions of this paper are summarized as follows.

1. The time-frequency information of PQDs is initially separated by the double resolution S-transform (DRST). To refine the feature extraction capability, a noise-reduction, weight residual structure referred to as the residual denoising convolutional auto-encoder (RDCA) is proposed to automatically extract features from the time-frequency information.

2. In terms of classification, a single-layer CNN (SCNN) classifier is used to fulfill the classification task. A batch normalization (BN) layer and a dynamic learning rate are integrated to speed up convergence.

3. We also propose a classification framework based on the RDCA and SCNN. In particular, we perform two visual verification methods to investigate how the RDCA benefits the diagnosis of PQDs. Features extracted from different layers of the RDCA are visualized and analyzed to verify the meaning of the extracted features.

4. To validate the framework, simulations and experiments on both single and combined PQD data sets are conducted under different noise conditions.

The rest of the paper is structured as follows. In Section 2, the proposed RDCA and SCNN are introduced. The PQD diagnostic framework is presented in Section 3. Section 4 explores the interpretability of the features extracted by the RDCA. Thereafter, Section 5 elucidates the effectiveness of the proposed method through a large number of experiments. Finally, Section 6 concludes this paper.

Time-Frequency Analysis of Disturbance Signals
Single and combined PQDs contain abundant time domain and frequency domain information. Hence, different manually selected features are traditionally applied to extract feature information that is as complete as possible from the time-frequency analysis, since the classification accuracy of PQDs is highly affected by the performance of the time-frequency analysis [22].
The advantage of the DRST is its high energy concentration at both low and high frequencies. In our work, we use the DRST, which has a double resolution, as the time-frequency decomposition algorithm for PQDs [16]. The DRST of an input PQD signal x(t) is defined in [16] with adjustable parameters β1,2, where f denotes the signal frequency and µ is the time shift factor. Essentially, the time-frequency spectrum of PQDs is divided into two parts, where f ≤ 1.5f0 and f > 1.5f0 represent the low-frequency part and the high-frequency part, respectively, with f0 the fundamental frequency. The separation value is set to 1.5 to avoid spectrum leakage and provide a more accurate signal analysis result. Besides, to improve the frequency resolution and noise reduction capability, the parameters β1,2 are set to 121 and 10, respectively, according to [16]. To verify the effectiveness of this choice, we compared the analysis results of a disturbance signal under different values of β1,2. The test signal is a harmonic disturbance containing the third and the fifth harmonics, and Figure 1a-c shows the analysis results with different values of β1,2. As Figure 1 shows, the parameters β1,2 directly affect the time-frequency result. When β1,2 are set to 121 and 10, respectively, the third and the fifth harmonics can be identified accurately. When β1,2 are set to 150 and 40, respectively, the time-frequency resolution is lower and a spectrum leakage problem also appears. Thus, to improve the frequency resolution and noise reduction capability, β1,2 are set to 121 and 10, respectively.
To further verify the detection performance, we compared the analysis results of a transient disturbance under different values of β1,2. Figure 2a shows the transient input signal. As Figure 2 shows, when β1,2 are set to 121 and 10, respectively, the transient disturbance can be identified accurately, indicating a suitable frequency resolution. When β1,2 are set to 150 and 40, respectively, the frequency resolution is lower and spectrum leakage also appears. Thus, to improve the time-frequency resolution, β1,2 are set to 121 and 10, respectively.
In contrast to the manually selected feature perspective, the output of the DRST, an amplitude time-frequency matrix, can be regarded as an image rather than a conventional set of single amplitude value features. In this view, the amplitude and time of the time-frequency result correspond to the pixel value and location of an image, respectively. In this work, the sampling frequency is set to 2.56 kHz, the fundamental frequency is f0 = 50 Hz, and the length of the sampling period is 10 cycles of the fundamental component. The size of the amplitude time-frequency matrix is then 256 × 512 (height × width), where 256 and 512 are the lengths of the frequency axis and the time axis, respectively. For the 50 Hz fundamental frequency, there are 512 sampling points in 10 periods, and the frequency axis of the time-frequency matrix is half the length of the time axis since the results of the FFT are symmetric. The equations of the discrete DRST are listed as follows.
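The sizing above can be checked numerically. This is a minimal sketch using only the sampling values stated in the text (variable names are illustrative):

```python
# Sampling setup from the text: 2.56 kHz sampling rate, 50 Hz fundamental,
# a window of 10 fundamental cycles, and an FFT whose symmetric half is kept.
fs = 2560        # sampling frequency (Hz)
f0 = 50          # fundamental frequency (Hz)
cycles = 10      # sampling period length in fundamental cycles

time_points = fs * cycles // f0      # width of the time-frequency matrix
freq_points = time_points // 2       # height: half of the time axis (FFT symmetry)

print(freq_points, time_points)      # prints: 256 512
```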
Consider the input signal x(t) sampled at a fixed sampling frequency f_s, corresponding to the time interval T_s = 1/f_s. The discrete Fourier transform (DFT) of the sampled signal is

X(k) = (1/N) Σ_{l=0}^{N-1} x(l) e^{-j2πkl/N},

where x(l) represents the discrete-time sampling sequence of x(t) and k = 0, 1, ..., N − 1. Applying this DFT to the sampled signal yields an N-point vector X_N of frequency samples. Define a 2-D matrix P_{N×M} that collects the frequency-shifted spectrum samples, where M = 2N owing to the symmetry property of the DRST. Another 2-D matrix G_{N×M} is defined by G(n, m) = e^{−2π²n²/(mλ1,2)}, the FFT of the redefined Gaussian window, with n = 0, 1, ..., N − 1 and m = 0, 1, ..., M. Thus, by utilizing the convolution theorem and the frequency-shifting property of the Fourier transform, the N-point discrete DRST can be described in matrix form as

DRST(n, m) = IDFT[P(n, m) ◦ G(n, m)],

where ◦ represents the Hadamard product of matrices and IDFT the inverse DFT. In applications, the FFT and inverse FFT (IFFT) are used instead of the DFT and IDFT, respectively, to improve the computational efficiency. DRST(n, m) is, in general, a complex-valued matrix and can therefore be expressed as DRST(n, m) = |DRST(n, m)| e^{jφ(n,m)}, where |DRST(n, m)| is the magnitude and φ(n, m) = ∠DRST(n, m) is the phase. The amplitude matrix associated with the DRST is thus |DRST(n, m)|, and this amplitude time-frequency matrix is fed to the RDCA.
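The FFT route above (shift the spectrum, window it, invert row by row) can be sketched with the classic S-transform. This is illustrative only: a single window-width parameter `beta` is used, whereas the DRST of [16] switches between two widths (β1, β2) at 1.5 f0; the function name and scaling conventions are assumptions.

```python
import numpy as np

def stockwell_like(x, beta=1.0):
    """Illustrative discrete S-transform via the FFT: frequency-shift the
    spectrum (matrix P), multiply by a sampled Gaussian window (matrix G,
    Hadamard product), then apply the IFFT row by row."""
    N = len(x)
    X = np.fft.fft(x) / N                        # DFT of the sampled signal
    m = np.arange(N)
    m_sym = ((m + N // 2) % N) - N // 2          # symmetric frequency offsets
    S = np.zeros((N // 2, N), dtype=complex)     # rows: frequency, cols: time
    for n in range(1, N // 2):                   # n = 0 (DC) has no window
        P = X[(n + m) % N]                       # frequency-shifted spectrum
        G = np.exp(-2.0 * np.pi**2 * m_sym**2 * beta**2 / n**2)  # Gaussian window
        S[n] = np.fft.ifft(P * G) * N
    return np.abs(S)                             # amplitude time-frequency matrix

# A 16-cycle cosine should concentrate energy in frequency row 16:
t = np.arange(128)
A = stockwell_like(np.cos(2 * np.pi * 16 * t / 128))
print(int(np.argmax(A[:, 64])))                  # prints: 16
```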

Regular Convolutional Auto-Encoder
The auto-encoder (AE) serves as a data dimensionality reduction technique [23]. Traditionally, the AE framework consists of two components, the encoder and the decoder. The encoder is mainly responsible for compressing the input image and extracting valid features, while the decoder reconstructs the input data from the feature information [24]. The number of nodes in the hidden layer is much lower than in the input layer, so that the encoder can provide an effective representation of the input data.
Given an input amplitude time-frequency data set x = {x_1, x_2, ..., x_l} with shape m × n for each element, where x_l denotes the lth sample of x. Note that x is a set of two-dimensional matrices, namely the time-frequency matrices produced by the DRST, and m and n are the height and width of the time-frequency matrix, i.e., 256 and 512, respectively. In the auto-encoder, the data can be viewed as multiple two-dimensional images stacked on top of each other, and each image can be viewed as a feature map. The output map z_i^r of the encoder in the rth layer can be expressed as

z_i^r = f(W_i * x + b_i),

where W and b are the d × d × h dimensional convolution kernel (also called the convolution filter) weights and the offset vector, respectively; h is the depth of the convolution layer, i = 0, 1, ..., h; d is the size of the convolution filter; * is the convolution operation; and f(·) is the activation function. We choose the rectified linear unit as the activation function to improve the convergence speed [25]. The size of z_i^r becomes m × n × h, with zeros padded into the blank space.
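One encoder feature map of this form can be sketched as below, assuming a single 2-D filter (stacking h such maps yields the m × n × h output; shapes and the function name are illustrative):

```python
import numpy as np

def conv2d_relu(x, W, b=0.0):
    """One encoder feature map z = f(W * x + b): stride 1, zero padding so
    the output keeps the m x n input size, ReLU activation."""
    d = W.shape[0]                 # filter size d x d
    pad = d // 2
    xp = np.pad(x, pad)            # zeros padded into the blank space
    m, n = x.shape
    z = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            z[i, j] = np.sum(xp[i:i + d, j:j + d] * W) + b
    return np.maximum(z, 0.0)      # rectified linear unit

z = conv2d_relu(np.ones((4, 4)), np.ones((3, 3)))
print(z[1, 1], z[0, 0])            # prints: 9.0 4.0 (border sums see zero padding)
```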
Following the convolutional layer in the encoder, a max-pooling layer, which performs a downsampling operation, is applied to reduce the input dimension [20]. The output feature z_pi^r of the max-pooling layer is defined as z_pi^r = max(z_i(p × p × h)_s), where p × p × h is the size of the pooling layer with stride length s. That is, the maximum value of each p × p window in z_i is the output of max-pooling. The size of the pooled feature vectors z_pi^r is (⌊(m − p)/s⌋ + 1) × (⌊(n − p)/s⌋ + 1) × h, where ⌊·⌋ denotes rounding down. The target features z_pi^r extracted by the AE are fed to the classifier.
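The pooling step can be sketched per channel as follows, using the standard output-size formula ⌊(m − p)/s⌋ + 1 per axis (function name illustrative):

```python
import numpy as np

def max_pool(z, p=2, s=2):
    """Max-pooling sketch: each output value is the maximum over a p x p
    window moved with stride s."""
    m, n = z.shape
    om, on = (m - p) // s + 1, (n - p) // s + 1
    out = np.empty((om, on))
    for i in range(om):
        for j in range(on):
            out[i, j] = z[i * s:i * s + p, j * s:j * s + p].max()
    return out

print(max_pool(np.arange(16.0).reshape(4, 4)))   # [[5. 7.] [13. 15.]]
print(max_pool(np.zeros((128, 256))).shape)      # (64, 128), halving each axis
```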
To reconstruct the signal, the decoder forms a structure symmetric to the encoder by upsampling. The hidden representation z_pi^r is reversely mapped back to the input size; this inverse mapping process is called the decoder, and the reconstructed output x′ in nonlinear form is

x′ = f(W′ * z̃ + b′),

where z̃ is the up-sampled feature, and W′ and b′ are the corresponding weights and offset vector. The dimension of the feature is expanded during up-sampling by interpolation so that W′ and b′ have the same dimensions as their counterparts in the encoder. The regular AE can be used to compress complex PQDs and obtain representative features. However, it is difficult for the AE to remain effective with shallow layers when real-time diagnosis of PQDs is taken into account.

Residual Denoising Convolutional Auto-Encoder
One competitive method to improve optimization and feature expression ability is the residual structure with shortcut connections [26]. However, the effect of the residual structure is limited, especially for shallow AEs. Therefore, we introduce a weighting strategy into the residual structure and design a residual denoising convolutional auto-encoder.
The weight residual structure is shown in Figure 3. It is an identity mapping in which the feature z_i^r, scaled by a weight parameter λ, and f(z_i^r) are directly added; the red dotted line represents the original residual structure. It is worth mentioning that the feature f(z_i^r) is more separable than z_i^r after passing through multiple convolutional layers, so it is reasonable to assign a lower weight to z_i^r. Thus we introduce the weight parameter λ for the feature z_i^r, as shown in Figure 3. The output of the weight residual structure is

z_i^{r+1} = f(z_i^r) + λ z_i^r,

where z_i^r and z_i^{r+1} are the input and output features of the weight residual structure. Additionally, we add a 1 × 1 convolution layer and max-pooling on the shortcut path of z_i^r to fit the data dimensions. Based on the original residual structure, we simply constrain the weight parameter λ to (0, 1); the weight residual structure is equivalent to the original residual structure when λ = 1. The detailed value of λ is determined experimentally.
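The combine step can be sketched as follows. This is purely illustrative: the dimension-matching 1 × 1 convolution plus max-pooling on the shortcut is stood in for by a plain 2 × 2 max-pool, and the function name is an assumption.

```python
import numpy as np

def weight_residual(z_prev, f_z, lam=0.4):
    """Weight residual combine from Figure 3: output = f(z) + lam * shortcut(z)."""
    m, n = z_prev.shape
    shortcut = z_prev.reshape(m // 2, 2, n // 2, 2).max(axis=(1, 3))  # 2x2 pool
    return f_z + lam * shortcut

z_prev = np.arange(16.0).reshape(4, 4)
out = weight_residual(z_prev, f_z=np.zeros((2, 2)), lam=0.5)
print(out)   # 0.5 * pooled shortcut: [[2.5 3.5] [6.5 7.5]]
```

With lam = 1.0 this reduces to the original residual addition, and with lam = 0.0 the shortcut disappears entirely, matching the constraint discussed above.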
To learn useful features, a training criterion that improves feature performance, namely the denoising auto-encoder, is introduced into the RDCA [23]. The key step is to partially corrupt the input, or equivalently add noise to it, during training. For instance, isotropic Gaussian noise (GN) σ is superimposed on the input x; the input of the RDCA then becomes x + σ while the reconstruction target is still x.
Thereafter, the proposed RDCA generates PQD features by stacking multiple convolutional layers. Finally, the model parameters are selected by minimizing the mean square error between the input x and the reconstructed output x′.

Single-Layer Convolutional Neural Network
After the RDCA is established, an effective classifier is required to classify the encoded features. A common choice is the softmax classifier with a fully connected (FC) layer [27]. However, this classifier struggles with high-dimensional data, which matters here because the output of the RDCA is high-dimensional.
In this section, we propose a single-layer CNN, which consists of a convolutional layer, a pooling layer, a BN layer, and an FC layer. The single convolutional layer also enhances the flexibility of the auto-encoder, without regard to the depth of the convolution filter in the encoder. Following the pooling layer, the BN layer is used to speed up training. The output of the BN layer, denoted z_c, is reshaped into a sequence to connect to the FC layer.
The feature vector z_c, an image containing abstract features, is then fed to the softmax classifier. Let k be the total number of PQD classes. The output of the softmax classifier is

P(y = j | z_c; θ) = e^{θ_j^T z_c} / Σ_{u=1}^{k} e^{θ_u^T z_c}, j = 1, ..., k,

where θ and y denote the parameters of the softmax classifier and the output class, respectively. Normally, the PQD category with the highest probability is taken as the final class. To further reduce over-fitting under multiple combined disturbances, dropout is applied to the softmax classifier, which randomly omits neural nodes with a certain probability [27].
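The softmax decision above can be sketched as follows (the function name and the shapes of z_c and θ are illustrative):

```python
import numpy as np

def softmax_classify(z_c, theta):
    """Softmax over the k PQD classes: z_c is the flattened feature vector
    after the BN layer, theta a (k x len(z_c)) weight matrix.  Returns class
    probabilities and the final (highest-probability) class."""
    logits = theta @ z_c
    p = np.exp(logits - logits.max())    # subtract max for numerical stability
    p /= p.sum()
    return p, int(np.argmax(p))

theta = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
p, cls = softmax_classify(np.array([0.5, 1.0, 0.2]), theta)
print(cls, round(float(p.sum()), 6))     # prints: 1 1.0
```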

Classification Framework
Using the proposed RDCA and SCNN, this section presents a new framework for automatic PQD diagnosis and classification, illustrated in Figure 4. The detailed parameter settings are as follows: the sizes of C1, C2, C3, and C4 are 11 × 11 × 8, 9 × 9 × 16, 9 × 9 × 24, and 7 × 7 × 4, respectively; P1, P2, and P3 are all set to 2 × 2; the strides s are all set to 1 × 1. We use Adam as the learning algorithm, with a learning rate of 0.001, a dropout rate of 0.5, and a batch size of 50. The symbols C_i and P_i are the sizes of the convolution filter and max-pooling in the ith layer, respectively, and F_i denotes the number of nodes in the FC layer. For example, when the input size m × n is set to 128 × 256, it can be inferred from Figure 4 that the output of the first max-pooling layer is 64 × 128 × 8 according to Section 2.2.
It is worth noting that the original shape of the amplitude matrix x is 256 × 512. However, with an RDCA output of shape 64 × 128 × 8, the number of RDCA model parameters can exceed millions. Thus, one compromise to reduce the RDCA model parameters is to down-sample the DRST output, i.e., the size of the amplitude time-frequency matrix decreases from 256 × 512 to 128 × 256.
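The shape bookkeeping implied by the framework can be sketched as a minimal check: a 128 × 256 input passed through three 2 × 2 max-pooling stages, each halving both axes.

```python
# Encoder shape chain for a 128 x 256 down-sampled input and three 2x2 pools.
shape = (128, 256)
for stage in range(1, 4):
    shape = (shape[0] // 2, shape[1] // 2)
    print(f"after pool P{stage}: {shape}")
# after pool P1: (64, 128)
# after pool P2: (32, 64)
# after pool P3: (16, 32)
```

The final (16, 32) agrees with the 16 × 32 third-layer feature maps reported in the visualization section.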
Schematically, the classification of PQDs can be divided into three steps.

1. Time-frequency analysis: obtain the amplitude matrix data set x based on the DRST. Each amplitude matrix is normalized to [0, 1] for better training.

2. Automatic feature extraction: a multilayer RDCA is designed and trained using the amplitude time-frequency matrices; the output features z_pi^r of the encoder are then obtained.

3. Classification: remove the decoder structure of the RDCA, and then feed the features z_pi^r to the well-designed SCNN.
To reduce over-fitting, a dynamic learning rate is used in the training of the RDCA and SCNN. Specifically, the learning rate is halved when the training loss has not decreased for five consecutive iterations.
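The dynamic learning-rate rule can be sketched as follows (a minimal stand-alone version; the function name is an assumption, and in practice a framework callback such as a reduce-on-plateau scheduler would play this role):

```python
def halve_on_plateau(lr, losses, patience=5):
    """Halve the learning rate when the training loss has not improved for
    `patience` consecutive iterations; otherwise keep it unchanged."""
    if len(losses) > patience and min(losses[-patience:]) >= min(losses[:-patience]):
        return lr / 2.0
    return lr

history = [1.00, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55]   # loss stalled for 5 steps
print(halve_on_plateau(0.001, history))                 # prints: 0.0005
```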

PQD Signal Model and Parameter Setting
In this work, a total of sixteen types of PQDs, including different single and combined PQDs as shown in Table 1, are considered to verify the performance of the RDCA. It should be noted that each single-disturbance model is established based on the IEEE Standard [8,28], and the combined PQDs are built from these models. It is worth mentioning that all harmonic levels are combinations of the third and the fifth harmonic.
To help the RDCA separate the useful information, the PQDs are randomly generated with MATLAB, and GN at different signal-to-noise ratios (SNRs) is superimposed on the original sequence signals. The SNR of the reconstruction target is 10 dB higher than that of the input, e.g., the target reconstructed signal is 30 dB when the input is set to 20 dB.
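Superimposing GN at a target SNR can be sketched as below (the function name is illustrative; it scales white Gaussian noise so that 10·log10(Ps/Pn) matches the requested dB value):

```python
import numpy as np

def add_noise_snr(x, snr_db, rng=None):
    """Superimpose Gaussian noise on a clean sequence at a target SNR in dB."""
    rng = rng or np.random.default_rng(0)
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))    # SNR = 10*log10(Ps/Pn)
    return x + rng.normal(0.0, np.sqrt(p_noise), x.shape)

t = np.arange(2048) / 2560.0
x = np.sin(2 * np.pi * 50 * t)                        # clean 50 Hz component
y = add_noise_snr(x, 20.0)                            # 20 dB noisy input
snr_meas = 10 * np.log10(np.mean(x**2) / np.mean((y - x)**2))
print(round(float(snr_meas), 1))                      # close to 20.0
```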

Convolution Filter and Extracted Features Visualization
It is particularly crucial to have a reliable interpretation of the features extracted by the black box of the RDCA. Moreover, from the perspective of PQD diagnosis structure optimization, such interpretation can also guide model design.
In this work, we sought to establish the connection between PQD diagnosis and the extracted features. Concretely, we use two methods to visualize the features in different layers, from the convolution filter and feature output aspects: gradient ascent and the de-convolutional network (Deconvnet) [25,29].
The effect of the filters on a disturbance image of size 128 × 256 × 1 is illustrated in Figure 5. Here we use L13 as the input of the RDCA-3. Two feature maps of each convolutional layer are picked, namely those with the maximum loss value. The orange line at the top of the image represents the amplitude of the fundamental component. The second layer shows some preliminary and uncertain shapes. The texture is clearer in the third layer and can be regarded as a complex combination of the former layers. In a CNN model, the shallower the convolutional layer, the closer its features are to the original signal, which means little useful information has been learned; the deeper the layer, the higher-order and more abstract the features, from which many typical local features can be extracted. Therefore, the features in Layer 3 are typical local features of L13, indicating that information has been learned.
The details of the L13 disturbance image become increasingly abstract and high level when the layer of RDCA goes deeper. Particularly, the harmonic component becomes blurred while the characteristics of the fundamental wave are retained in the third layer. Figure 5 manifests that the convolution operation has the benefit to extract edge information and amplitude information of PQDs through a combination of features.
To further explore the feature information, we focus on the output of the feature maps in the encoder. Taking the single PQDs L1, L4, L5, L6, and L9, and combined PQDs L13, L14, and L15 as examples, the output features of the third layer in RDCA-3 are illustrated in Figure 6. The size of feature maps in the third layer is 16 × 32.
Comparing the automatically extracted features in Figure 6, it can be seen that different PQD signals have unique characteristics. For instance, there is a distinct recessed area in class L4 and a small block in class L6; these high-level features indicate voltage interruption and transient disturbances, respectively. Furthermore, the alternating amplitude of L7 indicates that this feature is closest to the characteristic of the flicker signal. Both L5 and L13 show that harmonics produce more than one line in the feature visualization. Thus, the RDCA can accurately determine both the location and the category of the PQDs. Nevertheless, some features are hard to recognize visually. For example, it is difficult to judge intuitively whether L13 contains a sag or a swell signal in Figure 6. Besides, since the amplitude of the spike is small, the difference in L9 is not obvious to the eye. To investigate the meaning of the unintuitive features, we utilize the Deconvnet approach proposed by Zeiler [29]. The Deconvnet gives insight into the intermediate feature layers by mapping the extracted features back into the input pixel space.
The Deconvnet strategy comprises deconvolution and unpooling operations. Typically, the Deconvnet is obtained by stacking deconvolution and unpooling operations with the weight parameters already trained in the RDCA. Following this technique, an exemplary plot of four extracted features and the corresponding Deconvnet analysis results is presented in Figure 7. The symbols L11-i and L11-d-i represent the feature of RDCA-3 in the ith layer of L11 and the corresponding Deconvnet feature, respectively. It can be observed from Figure 7 that each feature corresponds to a part of the original PQD input image. For example, the harmonic part in L11-2 is hard to identify; however, the flicker and harmonic features become clear after the Deconvnet, as shown in L11-d-2. Furthermore, the visually illegible abstract features, especially the high-level features in the third layer such as L6-3 and L11-3, all have a clear and explicable meaning. Even when the features are illegible, each of them corresponds to a portion of the PQD signal.
Overall, the automatically extracted features differ greatly from the traditional manually selected features, and the visualization results show that these differences make automatic feature extraction possible. However, it is still unknown whether these abstract features can be separated by class. Therefore, a suitable visual analysis of the feature classification ability is necessary.

Visual Analysis of Feature Classification Ability
To verify the separability of the features z_pi, t-distributed stochastic neighbor embedding (t-SNE) is utilized to analyze the high-dimensional feature data [31]. In particular, t-SNE converts the affinities of data points into probabilities under a Student-t distribution, making it suitable for visualizing high-dimensional PQD feature images.
For a clearer view of the feature distribution, six types of PQDs, L1, L5, L10, L13, L14, and L15, are selected and fed into t-SNE. Three hundred feature samples are selected per type. Two parameters of t-SNE, i.e., the perplexity and the number of iterations, are set to 40 and 3000, respectively. Figure 8 shows the t-SNE results on the feature data z_pi^3 generated by the third convolutional layer of the RDCA-3; the purpose is to explain the change of disturbance features through visual analysis. Figure 8 indicates that the distances and locations of the features differ across PQD types. For instance, classes L5 and L14 are distributed on the two sides of the graph. Conversely, some features overlap each other, such as L14 and L15, indicating similar characteristics. L1 and L5 are gathered together, indicating that the corresponding original signals can be reconstructed with fewer features. This comparison reveals that the features extracted by the RDCA-3 are separable, which provides a guarantee for PQD classification.
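The Student-t conversion at the heart of t-SNE can be sketched as below. This computes only the low-dimensional affinity step described above, q_ij ∝ 1/(1 + ||y_i − y_j||²), not the full embedding optimization (function name illustrative; in practice a library implementation such as scikit-learn's TSNE with perplexity 40 would be used):

```python
import numpy as np

def student_t_affinities(Y):
    """Pairwise similarities under a Student-t kernel with one degree of
    freedom, normalized over all pairs, with self-affinities excluded."""
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # squared dists
    W = 1.0 / (1.0 + D2)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

Y = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
Q = student_t_affinities(Y)
print(bool(Q[0, 1] > Q[0, 2]))     # prints: True (near points get higher affinity)
```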

Experimental Analysis and Verification
In this section, extensive experiments are conducted to verify the RDCA and the diagnosis and classification results for PQDs. Concretely, 1600 samples per type are generated. Cross-validation is utilized to average the output results, where 1040 samples (65%) of each disturbance type are used for training and the remaining 560 samples (35%) for validation and testing. To limit the shrinkage of the feature size and allow multilayer stacking in the encoder, the size and stride of max-pooling are set to P_i = 2 × 2 and s = 2 throughout, and all strides of the convolution filters in the RDCA are set to 1. Grid search is used to select the remaining model parameters. Training is implemented on a laptop with a GTX 1060 GPU. In the present study, 20 epochs are used for RDCA training based on the Keras framework [32].

Sensitivity of RDCA Model Parameters
The structure of the model determines the classification performance of the extracted features. In this light, we conducted many experiments to test the performance of different numbers of layers, convolution filter sizes, and convergence speeds, as shown in Figure 9. In this test, 20 dB GN is added to the PQD data set, the depths of the layers in the encoder are set to 8, 16, 24, and 32, and the convolution filters share the same size per layer in each experiment. First of all, it is clear from Figure 9a that the classification accuracy increases significantly as the convolution filter size grows up to 5 × 5, and the performance from size 7 × 7 to 13 × 13 remains at the same level for a given number of layers. Secondly, for fewer than 3 layers, accuracy increases with the number of layers. The RDCA-3 performs best, and the RDCA-4 is worse, indicating that the RDCA-4 may overfit due to its large number of parameters. Thus a three-layer RDCA with an 11 × 11 convolution filter in the first layer and 9 × 9 filters in the next two layers is selected.
Based on the RDCA-3, we further verify the validity of the weight residual structure, as shown in Figure 9b. The RDCA performs better than the regular AE (λ = 0), and we set λ = 0.4 because the RDCA-3 then achieves the highest accuracy. We also verify the convergence speed of the SCNN: Figure 9c shows that the SCNN with BN reaches stability after about 15 iterations, and its loss is also smaller, indicating that the SCNN has learned more information.

Experiments on the RDCA with Noise
To verify the noise immunity of the model, we trained both the three-layer RDCA (RDCA-3) and a three-layer regular convolutional auto-encoder (CAE-3) without the denoising strategy. Note that for the CAE-3 the reconstruction target in the decoder is exactly the same as the input; the filter size is optimally set to 11 × 11 for the first convolutional layer and 9 × 9 for the remaining layers. Further, the depths of the RDCA and CAE layers are sequentially 8, 16, and 24. The input signal SNR ranges from 10 dB to 30 dB with a step of 5 dB. The classification results for different layers and noise levels are depicted in Figure 10, where the error bars denote the standard deviation.
It is clear from Figure 10 that the classification accuracy increases as the noise level decreases for both the RDCA and the CAE. This result is in line with expectations, since a lower noise level means a cleaner signal. Specifically, when the SNR is below 20 dB, the advantage of the RDCA-3 over the CAE-3 is more obvious, while the performance of the RDCA-3 is close to that of the CAE-3 when the SNR is above 20 dB. Given the difference in reconstruction goals, the classification results indicate that the RDCA has stronger PQD feature extraction capabilities.

Figure 10. The classification accuracies with varying noise.

Comparison of RDCA, CAE, and CNN
The proposed RDCA integrates the advantages of the CAE and CNN, and all three methods can extract input features automatically. To verify the validity of the RDCA, we therefore compare it with several advanced feature extraction methods, including the auto-encoder (AE), CAE [17], and DCNN [33]. The detailed structures of the different methods are as follows.
For the traditional AE, the input matrix x is directly flattened, i.e., the input layer has 32,768 neuron nodes. In this test, a single-layer AE (AE-1) and a three-layer AE (AE-3) are used for comparison. The structure of the AE-1 is 32,768-2000, meaning there are 2000 nodes in the second layer of the encoder. Similarly, the structure of the AE-3 is 32,768-5000-1000-512.
For the CAE [17], an optimal single-layer CAE and a three-layer CAE with L2 regularization are used for comparison. The softmax classifier is used for classification in both the AE and the CAE.
For the DCNN, a five-layer AlexNet with fewer parameters is used to classify the time-frequency information directly [33]. The depths of its layers are set to 8, 16, 32, 64, and 128. In this comparison, the PQD signals are superimposed with GN at an SNR of 20 dB. The quantitative performance and test times are listed in Table 2. The test time covers feature extraction and classification for the full test set (8960 samples), ignoring the time-frequency transform.
Table 2 shows that the proposed RDCA outperforms the AE, CAE, and DCNN. The performance of the RDCA-3 with the softmax classifier is lower than that with the SCNN. Notably, the accuracy of the RDCA-1 is better than that of the DCNN, which indicates that the reconstruction operation contributes to the extraction of effective PQD features. The performance of the AE is not satisfactory due to the lack of convolution operations. On the other hand, the three-layer networks always classify better than the single-layer models. Regarding the test time, the AE and CAE take less time because their structures are simpler with fewer parameters, while the total time of the RDCA models in Table 2 is similar to that of the CNN and CAE.

Comparison with Other PQD Diagnosis Methods
To further verify the validity of the proposed method, the RDCA is compared with some recent research results, as listed in Table 3. Different methods use different types and quantities of manually selected features: for instance, the method proposed in [9] uses 62 kinds of features, while the proposal in [36] uses 4. However, the accuracy of manually selected feature extraction methods is still lower than that of automatic feature extraction. Moreover, sixteen types of disturbances are reported in [30], and it is obviously difficult to distinguish all the PQDs with nine features. Note that only 30 dB noise conditions are reported for the PQDs in [34,36]. It can be seen from Table 3 that the proposed method achieves the highest classification accuracy under a 20 dB noise level.

Experimental Verification of Proposed Method
It is worthwhile to verify the performance of the RDCA on an experimental data set, which is closer to a real power grid environment. We designed a hardware device to capture the PQD signals from a signal source, as shown in Figure 11. The platform consists of three parts: a signal generating unit, a measurement unit, and a monitoring unit. A high-precision three-phase power quality source, the Fluke 6100A, is utilized for signal generation; it can synthesize sinusoidal and non-sinusoidal power with harmonics, fluctuating harmonics, flicker, interruptions, sags, and swells. A hardware circuit with a 16-bit analog-to-digital converter (ADC), the ADS8556, samples the real-time PQD signal, and a digital signal processor (DSP), the TMS320C6748 with a clock frequency of 375 MHz, processes the sampled signal via the DRST. The features of the real-time data can then be extracted by the RDCA as soon as the DSP sends the DRST results to the host PC through a USB serial interface. Owing to the functional limitations of the Fluke 6100A, we generate eight types of PQDs: L1, L2, L3, L4, L5, L7, L10, and L13.

Table 4 presents the performance on the experimental data generated by the hardware platform, with a total of 50 samples generated per type of disturbance. The test time consists of the DRST and RDCA parts, where the DRST is executed on the DSP and the RDCA on the host PC over all the samples. As listed in Table 4, the performance is slightly lower than in the simulation because the experimental signals are more complicated than the simulation model. However, the average accuracy still reaches 95.75%. On the other hand, the test times show that the combined disturbances are more time-consuming than the single disturbances. In general, this is because the combined PQDs have more key frequency points in the DRST analysis.
The longest detection time, namely that of L10, is 4.76 ms per sample once training of the RDCA model is completed. The results also reveal that real-time power quality diagnosis can be realized effectively on a high-frequency computing unit.
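As a back-of-the-envelope check on the real-time claim, the worst-case 4.76 ms figure can be turned into a throughput estimate. This sketch uses only the reported detection time; the ten-cycle analysis window is an assumption for illustration, not a value taken from the measurements.

```python
# Worst-case detection time reported for the L10 disturbance (seconds)
worst_case_s = 4.76e-3

# Maximum diagnosis throughput if samples are processed back to back
throughput = 1.0 / worst_case_s  # diagnoses per second

# Hypothetical analysis window of ten 50 Hz cycles (0.2 s; an assumed
# window length). Detection being much faster than acquisition leaves
# ample headroom for real-time operation.
window_s = 10.0 / 50.0
realtime_margin = window_s / worst_case_s
```

Under these assumptions, roughly 210 diagnoses per second are possible, giving more than 40x headroom relative to the assumed acquisition window.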

Conclusions
In this paper, the RDCA and SCNN methodologies have been proposed for automatic feature extraction and classification of both single and combined PQD signals. The automatically extracted features, which build on the novel idea of treating the time–frequency matrix as a whole, remove the uncertainty inherent in manually selected features. For PQD diagnosis and classification, the parameter sensitivity experiments show that the weighted residual structure of the RDCA helps improve the accuracy. Furthermore, the convergence experiment reveals that the BN layer of the SCNN accelerates convergence and makes the structure of the RDCA more flexible. The visualization results show that the feature maps attain higher abstractness as the number of layers increases. A variety of comparative experiments demonstrate that the proposed method has superior precision as well as stronger anti-noise capability than the existing approaches. The designed hardware test platform demonstrates real-time PQD diagnosis using the proposed method, and the implementation performance confirms its practicability and robustness. To further improve this methodology, future work will focus on optimizing high-level feature extraction to simplify the model structure.

Conflicts of Interest:
The authors declare no conflict of interest.