Classifying Power Quality Disturbances Based on Phase Space Reconstruction and a Convolutional Neural Network

Abstract: This paper presents a hybrid approach combining phase space reconstruction (PSR) with a convolutional neural network (CNN) for power quality disturbance (PQD) classification. Firstly, a PSR technique is developed to transform a 1D voltage disturbance signal into a 2D image file. Then, a CNN model is developed for the image classification. The feature maps are extracted automatically from the image file and different patterns are derived from variables in the CNN. A set of synthetic signals, as well as operational measurements, are used to validate the proposed method. Moreover, the test results are also compared with existing methods, including empirical mode decomposition (EMD) with balanced neural tree (BNT), S-transform (ST) with neural network (NN) and decision tree (DT), hybrid ST with DT, adaptive linear neuron (ADALINE) with feedforward neural network (FFNN), and variational mode decomposition (VMD) with deep stochastic configuration network (DSCN). Based on deep learning algorithms, the proposed method is capable of providing more accurate results without any human intervention for PQDs. It also enables the planning of PQ remedy actions.


Introduction
In the last three decades, the penetration of renewable energy into the power grid network has increased. This, together with the increasing uptake of wind turbines, solar energy, and energy storage systems, gives rise to power quality (PQ) issues [1,2]. These disturbances greatly affect the safe and economical operation of smart grid networks and decrease the lifetime, power conversion efficiency, and reliability of grid-connected renewable energy systems. Therefore, PQ analysis, including disturbance recognition and classification, is a crucial task to provide adequate information for further remedial actions. For instance, faulty power equipment can be discovered so that predictive maintenance can be scheduled in time. If power quality disturbance signals are detected, the control strategy of power converters can also be optimized to avoid catastrophic failures [3,4].
In essence, PQ refers to multifarious electromagnetic phenomena that cause deviations in voltage and current from their ideal waveforms, which are known as PQ disturbances (PQDs). PQDs can be divided into sags, swells, interruptions, oscillations, flickers, harmonics, notches, spikes, and their combinations, as per international standards such as EN 50160 [5], IEC 61000 [6], and IEEE-1159 [7]. The conventional methods of PQD recognition and classification contain two steps: feature extraction and classification. The feature extraction methods are mostly based on signal processing.

In PSR, a time series $x_1, x_2, \ldots, x_N$ is embedded with dimension $m$ and delay time $\tau$ as the delay vectors

$$X_i = \left[x_i,\ x_{i+\tau},\ \ldots,\ x_{i+(m-1)\tau}\right] \tag{1}$$

where $i = 1, 2, \ldots, L$, $L = N - (m-1)\tau$, and $N$ is the number of sample points. Then, a phase space matrix can be obtained, which represents the coordinates of the signal trajectory:

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_L \end{bmatrix} = \begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(m-1)\tau} \\ x_2 & x_{2+\tau} & \cdots & x_{2+(m-1)\tau} \\ \vdots & \vdots & & \vdots \\ x_L & x_{L+\tau} & \cdots & x_{L+(m-1)\tau} \end{bmatrix} \tag{2}$$

In essence, the phase space reconstruction trajectory is the strange attractor of time series data in a chaotic system [36], e.g., electroencephalogram, short-term power load, and stock exchange data. In this paper, the disturbance components carried by periodical voltage signals are considered and an $x(t)$ − $x(t+\tau)$ phase space is constructed to describe the signal trajectory. Hence, the 1D time series data of PQDs are mapped into 2D images, which serve as the input data to the CNN model for PQD classification. Compared with traditional time-frequency-based feature extraction methods, the PQDs can be identified from a graphic perspective in the time domain, which overcomes the spectrum aliasing problem in the time-frequency transforming process.
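The mapping from a 1D signal to a 2D trajectory image can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `delay_embed` and `trajectory_image`, the fixed axis range `lim`, and the binary rasterisation are assumptions, and the test signal is an undisturbed sine with 200 samples per cycle over 10 cycles.

```python
import numpy as np

def delay_embed(x, m=2, tau=20):
    """Return the L x m phase space matrix, with L = N - (m - 1) * tau."""
    x = np.asarray(x, dtype=float)
    L = x.size - (m - 1) * tau
    if L <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    # Column k holds the signal delayed by k * tau samples.
    return np.column_stack([x[k * tau : k * tau + L] for k in range(m)])

def trajectory_image(X, size=200, lim=2.0):
    """Rasterise a 2D trajectory into a size x size binary image.

    lim is an assumed fixed axis range [-lim, lim] in p.u.; a fixed range
    keeps amplitude information comparable between images.
    """
    img = np.zeros((size, size), dtype=np.uint8)
    cols = ((X[:, 0] + lim) / (2 * lim) * (size - 1)).round().astype(int)
    rows = ((lim - X[:, 1]) / (2 * lim) * (size - 1)).round().astype(int)
    img[np.clip(rows, 0, size - 1), np.clip(cols, 0, size - 1)] = 1
    return img

# Undisturbed 1 p.u. sine: 200 samples per cycle, 10 cycles.
n = np.arange(2000)
x = np.sin(2 * np.pi * n / 200)

X = delay_embed(x, m=2, tau=20)   # x(t) against x(t + tau)
img = trajectory_image(X)         # 200 x 200 image for the classifier
```

For a pure sine the trajectory is a closed ellipse; a disturbance changes the shape or size of this curve, which is what the image classifier is meant to pick up.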

The Theory of the Convolutional Neural Network
The CNN is a biologically inspired feedforward artificial neural network (ANN) that presents a simple model for the mammalian visual cortex. It has been widely used in the visual field, such as image recognition [37,38] and video classification [39]. A fundamental framework of the CNN model is illustrated in Figure 1. Typically, CNN architectures consist of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer.

The convolution layer uses the mathematical 2D convolution operation to transform low-level local features into high-level global features. It consists of multiple filters $W_k$, each giving rise to an output feature map. This map $h_k$ corresponds to the weight matrix $W_k$ of the $k$th filter and can be obtained by:

$$h_k = f\left(W_k * x + b_k\right) \tag{3}$$

where the sign $*$ is the mathematical 2D convolution operation, $x$ is the input data of this layer, $b_k$ is a bias term, and $f(\cdot)$ is a nonlinear activation function. The weight-sharing technique adopted between neurons in different layers helps the process of feedforward and backpropagation (BP) [40] to reduce the number of parameters under consideration. Through the convolution layer, the hidden invariant features in the data are extracted automatically and effectively.

The pooling layer implements nonlinear downsampling after the convolution layer through a max-pooling (or average-pooling) method. The output of the convolution layer is divided into a set of nonoverlapping rectangles, and the maximum (or average) of each subregion is output. The aim of this layer is to reduce the spatial size of the representation progressively, decrease the number of parameters, and avoid overfitting. A fully connected layer combines all the feature maps to produce the final classification vector for the output layer.

The Proposed Algorithm of PQDs Detection and Classification Based on PSR and CNN
In this section, the mathematical models of the PQD signals are presented first; then, the framework and training process of the proposed method are described.

PQ Disturbance Model
To illustrate the use of the PSR-CNN method for PQD classification, ten types of single and mixed voltage disturbance signals have been synthesized in MATLAB according to IEEE-1159. The labels and numerical models are presented in Table 1. The amplitudes of the simulated signals are normalized to 1 p.u. and the fundamental frequency is 50 Hz. These PQDs are all simulated over the defined parameter ranges shown in Table 1.
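As a rough illustration of how such signals can be synthesized, the sketch below generates a sag, a swell, and a harmonic-distorted waveform. The numerical models and parameter ranges of Table 1 are not reproduced here, so the formulas, the default parameter values, and the function names are assumptions based on commonly used parametric disturbance models; Python is used instead of MATLAB.

```python
import numpy as np

POINTS_PER_CYCLE = 200   # samples per fundamental cycle
CYCLES = 10              # cycles per synthetic signal

def base_wave():
    """Sample index and a 1 p.u. fundamental sine wave."""
    n = np.arange(POINTS_PER_CYCLE * CYCLES)
    return n, np.sin(2 * np.pi * n / POINTS_PER_CYCLE)

def window(n, start_cycle, end_cycle):
    """1.0 between start_cycle and end_cycle, 0.0 elsewhere."""
    lo = start_cycle * POINTS_PER_CYCLE
    hi = end_cycle * POINTS_PER_CYCLE
    return ((n >= lo) & (n < hi)).astype(float)

def sag(alpha=0.5, start=3, end=7):
    """Amplitude reduced by alpha p.u. during the disturbance window (assumed model)."""
    n, v = base_wave()
    return (1.0 - alpha * window(n, start, end)) * v

def swell(alpha=0.5, start=3, end=7):
    """Amplitude increased by alpha p.u. during the disturbance window (assumed model)."""
    n, v = base_wave()
    return (1.0 + alpha * window(n, start, end)) * v

def harmonic(a3=0.1, a5=0.05):
    """Fundamental plus third and fifth harmonics (assumed amplitudes)."""
    n, v = base_wave()
    theta = 2 * np.pi * n / POINTS_PER_CYCLE
    return v + a3 * np.sin(3 * theta) + a5 * np.sin(5 * theta)

v_sag, v_swell, v_harm = sag(), swell(), harmonic()
```

Drawing `alpha`, `start`, and `end` at random within the permitted ranges would give the many distinct samples per class that training requires.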

Next, a 2D image is generated by the PSR method. The delay time $\tau$ is an important parameter of the PSR method, and a good choice of $\tau$ facilitates the performance analysis. If the value is low, the adjacent successive elements $x_{i+\tau}$ and $x_{i+2\tau}$ of the vectors are strongly correlated; if it is high, the adjacent elements are almost independent. In the phase plane, $\tau$ determines how far the attractor expands around the diagonal, and an appropriate value expands the trajectory around the diagonal as much as possible. To choose an appropriate delay time $\tau$, a mutual information function of the system $[s, q] = [x(t), x(t+\tau)]$ [35] is used in this paper:

$$I(S, Q) = \sum_{i}\sum_{j} P_{sq}(s_i, q_j) \log \frac{P_{sq}(s_i, q_j)}{P_s(s_i)\, P_q(q_j)} \tag{4}$$

where $P_s(s_i)$, $P_q(q_j)$, and $P_{sq}(s_i, q_j)$ are the probabilities and the joint probabilities of the events $s_i$, $q_j$ [39], respectively. The delay time $\tau$ is determined as the value at which $I(S, Q)$ drops to a sufficiently small value, $1/e$ [35].

Specifically, the sample frequency $f_s$ is 6.4 kHz, the number of sample points in one cycle is 200, and there are 10 cycles in every synthetic signal. In addition, the dimension parameter $m$ of the PSR method is 2 and the delay time $\tau$ is 20 according to (4). Then, the 2D images of the PQD signals are generated by the PSR method and illustrated in Figure 2. The size of every image is 200 × 200 pixels.
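The delay selection based on (4) can be sketched with a histogram estimate of the probabilities. Several choices here are assumptions rather than details from the paper: the number of histogram bins, the natural logarithm, the normalisation of the threshold by the value at zero delay, and the fallback to the minimum when the threshold is never reached. The helper names are hypothetical.

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(S, Q) for s = x(t), q = x(t + tau), in nats."""
    x = np.asarray(x, dtype=float)
    s, q = x[: x.size - tau], x[tau:]
    joint, _, _ = np.histogram2d(s, q, bins=bins)
    p_sq = joint / joint.sum()                 # joint probabilities
    p_s = p_sq.sum(axis=1, keepdims=True)      # marginal of s
    p_q = p_sq.sum(axis=0, keepdims=True)      # marginal of q
    mask = p_sq > 0                            # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_sq[mask] * np.log(p_sq[mask] / (p_s @ p_q)[mask])))

def choose_delay(x, max_tau=100, bins=16):
    """First tau whose mutual information falls to 1/e of the zero-delay value.

    Assumption: the 1/e threshold is taken relative to I at tau = 0.
    If it is never reached, the tau with the smallest I is returned.
    """
    i0 = mutual_information(x, 0, bins)
    values = [mutual_information(x, tau, bins) for tau in range(1, max_tau + 1)]
    for tau, value in enumerate(values, start=1):
        if value <= i0 / np.e:
            return tau
    return int(np.argmin(values)) + 1

rng = np.random.default_rng(0)
n = np.arange(2000)
x = np.sin(2 * np.pi * n / 200) + 0.05 * rng.standard_normal(n.size)
tau = choose_delay(x)
```

The estimate depends on the binning and on the noise level, so the delay returned by this sketch need not coincide with the value of 20 used in the paper.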

The Proposed Method
The main framework of the proposed method is illustrated in Figure 3. A CNN model composed of three convolution layers, two pooling layers, a fully connected layer, and an output layer is established. The numbers of convolution kernels in the three convolution layers are 32, 48, and 64, respectively.

Table 1. Results of variational mode decomposition (VMD) for power quality disturbances (PQDs) with harmonics and interharmonics extraction. Columns: PQ Disturbance, Label, Numerical Model, Parameters.
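To get a feel for the size of such a network, the sketch below tracks the feature map shapes and parameter counts through the layers. Only the kernel counts (32, 48, 64), the 200 × 200 single-channel input, the stride of 1 with zero padding, and the ten output classes come from the text; the assignment of 5 × 5 and 3 × 3 kernels to particular layers, the 2 × 2 pooling size, and the position of the pooling layers are assumptions.

```python
def conv_same(shape, kernel, n_filters):
    """Shape and parameter count of a stride-1, zero-padded convolution layer."""
    h, w, c = shape
    params = kernel * kernel * c * n_filters + n_filters  # weights + biases
    return (h, w, n_filters), params

def pool(shape, size=2):
    """Shape after non-overlapping size x size pooling (no parameters)."""
    h, w, c = shape
    return (h // size, w // size, c)

shape = (200, 200, 1)                      # grayscale PSR image
shape, p_conv1 = conv_same(shape, 5, 32)   # assumed 5 x 5 kernels
shape = pool(shape)
shape, p_conv2 = conv_same(shape, 3, 48)   # assumed 3 x 3 kernels
shape = pool(shape)
shape, p_conv3 = conv_same(shape, 3, 64)   # assumed 3 x 3 kernels

n_features = shape[0] * shape[1] * shape[2]
p_fc = n_features * 10 + 10                # fully connected layer to 10 classes

print(shape, p_conv1, p_conv2, p_conv3, p_fc)
```

Under these assumptions almost all parameters sit in the fully connected layer, which is one reason pooling before that layer matters for training time.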
Herein, the image features are extracted by the convolution operator, which preserves the spatial relationship between pixels in the image matrix. A feature map is produced by a kernel based on the input image. In this paper, three kinds of convolution kernels with two different sizes (5 × 5 and 3 × 3) are used. For the sake of simplicity, a 5 × 5 convolution kernel is taken as an example, and the convolution operation is implemented by:

$$c^{k}_{i,j} = \sum_{m=0}^{4}\sum_{n=0}^{4} w^{k}_{m,n}\, x_{i+m,\,j+n} + b^{k} \tag{5}$$

where $w$ is the weight of the kernel, $x$ is the input data of this layer, $b$ is a bias term, $c$ is the result of the convolution operation, $k$ is the index of the kernel, and $i, j$ and $m, n$ are the location labels of the original image and convolution kernel matrices, respectively.

The implementation of the convolution operator is illustrated in Figure 4. A kernel slides over the input image by using (5) to produce a feature map, and the stride size of the sliding is 1. To guarantee the process of feature extraction, the zero padding method is used to preserve the information of the input volume. The convolution of another kernel over the same image gives a different feature map. In effect, the convolution operation captures the local dependencies in the input image, and different feature maps are generated by different kernels.

An additional operation, called activation, is performed after every convolution layer. The purpose of this process is to introduce nonlinearity into the CNN model so that it can better learn the nonlinear PQD data. Herein, the rectified linear unit (ReLU) activation function is adopted because it trains faster and alleviates the vanishing gradient problem. The ReLU function is:

$$f(x) = \max(0, x) \tag{6}$$

Commonly, a pooling layer is added after the ReLU function to continuously reduce the dimensionality and number of parameters of the network. It shortens the training time and effectively controls overfitting. The average pooling method is used instead of max pooling because it retains the true features of the sparse PQD image matrix. Finally, the fully connected layer, which is a traditional multilayer perceptron, uses a softmax function to estimate the classification vector for the output layer.
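The three operations described above (zero-padded convolution with stride 1, ReLU, and average pooling) can be sketched for a single channel and a single kernel as follows. This is an illustrative NumPy version with hypothetical function names, not the implementation used in the paper; the padding shifts the kernel so that each output pixel is centred on the corresponding input pixel, and the 2 × 2 pooling size is an assumption.

```python
import numpy as np

def conv2d_same(x, w, b=0.0):
    """Slide kernel w over zero-padded x with stride 1 (sum of w * patch + b)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))       # zero padding keeps the size
    out = np.empty(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(w * xp[i:i + kh, j:j + kw]) + b
    return out

def relu(x):
    """Rectified linear unit: max(0, x) element-wise."""
    return np.maximum(0.0, x)

def avg_pool(x, size=2):
    """Average over non-overlapping size x size blocks."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.mean(axis=(1, 3))

image = np.ones((6, 6))
kernel = np.ones((5, 5)) / 25.0                # simple averaging kernel
feature_map = avg_pool(relu(conv2d_same(image, kernel)))
```

Near the image border the kernel overlaps the zero padding, so the response there is smaller than in the interior even for a constant image.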


The Training of the Model
A training process is needed to improve the effectiveness of the model. The overall training process is carried out in six steps:
Step 1: Initialize all the parameters of the kernels with random values.
Step 2: Divide the original images into training and testing sets. The model goes through the forward propagation step (convolution, ReLU, pooling, and fully connected layers) and determines the output probabilities for each class with a training image.
Step 3: A cross entropy function is used as a cost function to calculate the error at the output layer:

$$e_i = -y_i \log \hat{y}_i \tag{7}$$

where $e_i$ is the error of the $i$th type of PQDs, $y_i$ is the target probability, and $\hat{y}_i$ is the output probability.
Step 4: Calculate the gradients of the error with respect to all weights and parameters in the model by using the BP technique. Then, the gradient descent is used to update all the parameter values of the kernels to minimize the output error.
Step 5: Repeat Steps 2-4 with all images in the training set until the error is within the preset value.
Step 6: After the training process, the testing set is used to validate the accuracy of the model.
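The six steps can be sketched on a much smaller model. In the example below a single softmax layer stands in for the full CNN, and the data are two synthetic clusters rather than PSR images, so it only illustrates the training loop itself; the learning rate, tolerance, and all names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_onehot, y_hat):
    """Mean over samples of the summed per-class errors -y_i * log(y_hat_i)."""
    return float(-np.mean(np.sum(y_onehot * np.log(y_hat + 1e-12), axis=1)))

def train(X, y, n_classes, lr=0.5, tol=0.05, max_epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], n_classes))  # Step 1: random init
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(max_epochs):
        P = softmax(X @ W + b)                 # Step 2: forward propagation
        loss = cross_entropy(Y, P)             # Step 3: cross entropy error
        if loss < tol:                         # Step 5: stop at the preset value
            break
        G = (P - Y) / X.shape[0]               # Step 4: gradient via backpropagation
        W -= lr * X.T @ G                      #         gradient descent update
        b -= lr * G.sum(axis=0)
    return W, b, loss

# Toy data: two well-separated clusters, split into training and testing sets.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
order = rng.permutation(100)
train_idx, test_idx = order[:80], order[80:]

W, b, loss = train(X[train_idx], y[train_idx], n_classes=2)
pred = np.argmax(softmax(X[test_idx] @ W + b), axis=1)
accuracy = float(np.mean(pred == y[test_idx]))  # Step 6: validate on the test set
```

For the full CNN the same loop applies, with the gradient in Step 4 propagated back through the fully connected, pooling, and convolution layers.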

Results and Discussion
The effectiveness of the proposed method is verified in this paper for PQD detection and classification. The test data are generated through both synthetic and operational signals in a smart grid system.

Synthetic Signals
As illustrated in Section 3, ten voltage disturbances were established in MATLAB. For each disturbance, there are 400 sample signals, where 320 are for training and 80 are for validation.
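A per-class split of this kind can be sketched as below. The random assignment and the function name `split_per_class` are assumptions, since the text does not state how the 320 training and 80 validation signals were selected.

```python
import numpy as np

def split_per_class(labels, n_train=320, seed=0):
    """Randomly pick n_train samples of every class for training, the rest for validation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(val_idx)

labels = np.repeat(np.arange(10), 400)     # 10 disturbance types, 400 signals each
train_idx, val_idx = split_per_class(labels)
```

Splitting within each class keeps the validation set balanced at 80 signals per disturbance type.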
A CNN model is established for the PQD 2D image training. In total, ten epochs are adopted in the training process to obtain optimal parameters of the model. The hardware for the model training is based on an Intel(R) Core(TM) i7-6700HQ CPU @ 2.6 GHz, 16 GB RAM, and an NVIDIA GeForce GTX 970M GPU with 192-bit 4 GB GDDR5 memory. The training process lasts 111 min and 52 s. The training progress and the confusion matrices of the training and validation results are shown in Figures 5 and 6, respectively.
As shown in Figure 5, a high classification rate is obtained after only five epochs, and the loss of the cost function is almost zero. In Figure 6a, the classification rate in the training confusion matrix is 100% for both single and mixed synthetic disturbances. With 80 events per disturbance used for validation, the total classification rate is 99.8%, as shown in Figure 6b. The results demonstrate that the proposed method based on the PSR and CNN can achieve an excellent classification rate for PQDs.

Additionally, Table 2 compares the accuracy of the proposed PQ assessment framework with five existing methods: EMD with balanced neural tree [41], ST with NN and DT [25], hybrid ST with DT [42], ADALINE with FFNN [43], and VMD with DSCN [22]. In Table 2, the proposed method is shown to be as good as the current best methods in terms of accuracy, as the 99.67-99.90% accuracy range is well within the measurement uncertainty. The novelty of this work lies in the deep learning-based classification method and automatic feature extraction, while the existing methods rely on handcrafted features.

The features extracted by the convolution layers are shown in Figure 7. It can be seen that the feature maps differ even within one layer. Across the convolution layers, the extracted features become more specific at deeper levels. Moreover, the weights of the convolution kernels also differ, leading to different pattern calculations for different variables. The evaluation results show that the proposed framework compares favorably with the existing methods.
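The total classification rate is the share of validation events on the diagonal of the confusion matrix. The sketch below shows the calculation on made-up predictions: the two misclassified events are hypothetical and are chosen only because 798 correct events out of 800 gives 99.75%, which rounds to the reported 99.8%.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def classification_rate(cm):
    """Fraction of events on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()

y_true = np.repeat(np.arange(10), 80)      # 10 classes x 80 validation events
y_pred = y_true.copy()
y_pred[0] = 1                              # hypothetical error: class 0 taken for class 1
y_pred[85] = 0                             # hypothetical error: class 1 taken for class 0

cm = confusion_matrix(y_true, y_pred, 10)
rate = classification_rate(cm)
print(f"{rate:.2%}")
```

The off-diagonal entries of `cm` show which disturbance types are confused with one another, which is the information a figure such as Figure 6b conveys.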


Real-World Signals
Following the synthetic signals, four types of operational voltage disturbance signals, measured and provided online by the IEEE Working Group on Power Quality Data Analytics [44], were tested: sag, swell, sag with harmonics, and swell with harmonics. For each disturbance, there are 20 sample signals, of which the fundamental frequency is 60 Hz and which are sampled at 7.6 kHz. The number of sample points in one cycle is 128 and there are eight cycles in every signal.

Several operational signals need to participate in the training process to fine-tune the parameters of the existing model. To choose an optimal number of operational signals, different numbers of synthetic signals of each disturbance in the training data used in the previous section were randomly replaced by real-world signals. Then, the new training data were used to fine-tune the CNN model. Finally, the fine-tuned model was validated by using the operational data. The results are shown in Figure 8. It can be seen that a low classification rate (20%) is obtained without the fine-tuning process. The classification rate improves to 97.5% when five real-world signals are used and reaches its highest point (98.8%) when the number of operational signals is seven. Hence, in order to improve the classification of operational voltage disturbances, seven synthetic signals of each disturbance in the training data were randomly replaced with real-world signals.

The validation confusion matrix is shown in Figure 9. The features extracted by the convolution layers are presented in Figure 10. In total, 20 events for each disturbance were used for validation and the total classification rate was 98.8%, as shown in Figure 9. It is observed that the proposed method combining PSR and CNN performs well for both synthetic and operational PQD classification.
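The replacement of synthetic training signals by real-world ones can be sketched as follows. The dictionary layout, the function name `replace_with_real`, and the string placeholders standing in for signals are assumptions for illustration; classes without real-world recordings are left unchanged.

```python
import numpy as np

def replace_with_real(synth, real, k, seed=0):
    """Swap k randomly chosen synthetic signals per class for k real-world signals.

    synth, real: dicts mapping a disturbance label to a list of signals.
    """
    rng = np.random.default_rng(seed)
    mixed = {}
    for label, signals in synth.items():
        signals = list(signals)                # copy, the input stays untouched
        if k > 0 and label in real:
            out_idx = rng.choice(len(signals), size=k, replace=False)
            in_idx = rng.choice(len(real[label]), size=k, replace=False)
            for o, i in zip(out_idx, in_idx):
                signals[o] = real[label][i]
        mixed[label] = signals
    return mixed

# Placeholders: 320 synthetic training signals and 20 real signals per class.
synth = {c: [f"synthetic-{c}-{i}" for i in range(320)] for c in ("sag", "swell", "flicker")}
real = {c: [f"real-{c}-{i}" for i in range(20)] for c in ("sag", "swell")}

mixed = replace_with_real(synth, real, k=7)
```

Repeating this for several values of `k` and fine-tuning the model on each mixed set gives the kind of comparison summarised in Figure 8.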

Discussion
It can be seen from the above results that once the trained model is available, the PQDs can be easily classified with a high classification rate. The feature maps can be extracted from the disturbance signals automatically without human intervention. Besides, the novelties of this paper are listed as follows:
(1) The PQD classification can transform a complicated 1D signal processing problem into a simpler 2D space image classification problem. A CNN model-based method was established to achieve this. This idea may be applied to similar research fields.
(2) The 2D images transformed from 1D voltage disturbance signals are in grayscale, which has only one color channel. That is, the input data used in this paper are much simpler than those of traditional image classification methods, which use color images with three color channels.
(5) For real-world signals, the classification rate can be improved by adding a small amount of real-world data into the training process to fine-tune the parameters of the model. This shows that the proposed method has an excellent capability of learning and adaptation.
(6) In addition, sag, swell, and interruption were considered in this paper. These kinds of disturbances have similar shapes, but different amplitudes. For accurate classification, the coordinate information is preserved in their 2D images. Through this operation, the three types of voltage disturbance can be effectively distinguished.
(7) The proposed method can be implemented very quickly for classification after the training process. It is convenient for end users and does not require specialist knowledge. Whilst this work is based on offline tests, it can be applied in online tests. This will be implemented in future work.


Conclusions
In this paper, a new algorithm based on PSR and CNN is developed for the detection and classification of PQDs. Firstly, the PSR method is used to transform the 1D voltage disturbance signals into 2D images. Through this mapping into 2D space, the complicated 1D signal processing problem becomes a simple image classification problem. Then, a CNN-based model is established and trained with image data to obtain optimal parameters for PQD classification. Compared with current state-of-the-art methods, this algorithm is shown to be better and more accurate in terms of feature extraction. The feature maps are extracted automatically without human intervention. Finally, real-world and simulated PQ events are used to confirm the effectiveness of the proposed method. This will help guide subsequent remedial actions.

Conflicts of Interest:
The authors declare that there are no conflicts of interest regarding the publication of this paper.