Power Quality Disturbance Recognition Using VMD-Based Feature Extraction and Heuristic Feature Selection

: Power quality disturbances (PQDs) have a large negative impact on electric power systems with the increasing use of sensitive electrical loads. This paper presents a novel hybrid algorithm for PQD detection and classiﬁcation. The proposed method is constructed while using the following main steps: computer simulation of PQD signals, signal decomposition, feature extraction, heuristic selection of feature selection, and classiﬁcation. First, di ﬀ erent types of PQD signals are generated by computer simulation. Second, variational mode decomposition (VMD) is used to decompose the signals into several instinct mode functions (IMFs). Third, the statistical features are calculated in the time series for each IMF. Next, a two-stage feature selection method is imported to eliminate the redundant features by utilizing permutation entropy and the Fisher score algorithm. Finally, the selected feature vectors are fed into a multiclass support vector machine (SVM) model to classify the PQDs. Several experimental investigations are performed to verify the performance and e ﬀ ectiveness of the proposed method in a noisy environment. Moreover, the results demonstrate that the start and end points of the PQD can be e ﬃ ciently detected.


Introduction
Currently, renewable energy, such as photovoltaic and wind energy, is increasingly integrated with power systems to respond to the global energy crisis.Distribution generation systems with high penetration and power electronic equipment with nonlinear loads cause serious power quality disturbances (PQDs).The uncertain behavior of wind energy and PV systems occurs in nonstationary disturbances due to the random variation of environmental factors [1].Smart transmission systems that are equipped with modern power systems increase the applications of nonlinear switched devices, which exaggerate the power quality problems in distribution systems [2,3].Furthermore, nonlinear electronic loads, switching phenomena, short circuit faults, and lightning strikes can cause power quality disturbances [4].These disturbances are harmful to electrical equipment and they cause system faults, computer data loss, and programmable logic controller failures or malfunctions [5].For instance, a short circuit fault or transformer energizing may result in variations of the magnitude of voltage and current waveforms, which create sag, swell, and interruption events [6].The application of an Appl.Sci.2019, 9, 4901 3 of 22 reviewed studies, this proposed approach takes advantage of VMD, permutation entropy, and Fisher scoring to obtain the relevant features for classification.Specifically, VMD is utilized to decompose the obtained signal into a cluster of sub-intrinsic mode functions.Afterwards, the statistical features, such as RMS, standard deviation, and multiscale moments, are extracted from each IMF under noise-free and noisy environments.Furthermore, the permutation entropy and Fisher scoring are used to effectively identify the relevant features.Both of the two analysis methods are essential and constitute a two-stage heuristic feature selection method.Finally, an SVM classifier is implemented to identify the different types of the PQ disturbance signals through the optimized feature vectors.Experimental investigations are performed while using the proposed methodology in both a simulation model and real microgrid platform.
The main contributions of the proposed approach include the following aspects: (1) VMD is more capable of PQD signal decomposition when compared with wavelet packet decomposition due to its adaptive and robust characteristics, especially in dealing with recursive calculation and mode mixing problems.Additionally, the start and end points of a PQD event can be detected by using VMD and permutation entropy.(2) The two-stage heuristic feature selection technique consists of permutation entropy, and the Fisher score is utilized to eliminate any irrelevant features, which not only enhances the generalization capability and detection accuracy of the classifier, but also releases the computational complexity in terms of efficiency.(3) The proposed algorithm has sufficient flexibility under different levels of noisy environments, either for the detected sensitivity or specificity of the PQD.

Variational Mode Decomposition
Different from EMD, VMD is an adaptive and non-recursive method that can analyze both nonstationary and nonlinear signals.The scope of VMD decomposes a signal into band-limited sub-signals by using a non-recursive calculation method.Each decomposed sub-signal has certain sparse properties in the frequency domain.Additionally, the decomposed sub-signal is defined as the mode as well as EMD.Moreover, each sub-signal is assumed to be concentrated around a corresponding frequency centre.Thus, the bandwidth of each sub-signal can be chosen by utilizing the H1 Gaussian smoothness for the transformed signal.
VMD first utilizes the Hilbert transform to convert each mode u k into an analytical expression u k + in a single-sided spectral domain to obtain the bandwidth of each mode function: After the Hilbert transformation, the frequency spectrum of each mode is shifted to the baseband and the corresponding estimated centre frequency ω k is adjusted by using an exponential tuned term.Subsequently, the bandwidth is estimated according to the Gaussian smoothness of the demodulated signal by utilizing the squared L2-norm of the gradient [25].Thus, the VMD process is realized by solving a constrained variational problem [26]: where f (t) is the target signal, {u k }: = {u 1 , . . ., u K } represents the set of the decomposed modes, and {ω k }: = {ω 1 , . . ., ω K } represents the respective centre frequencies, respectively.Subsequently, the constraint optimization problem in Equation ( 2) can be converted into an unconstrained problem via a quadratic penalty term and Lagrangian multipliers, as below: To solve the original minimization problem, the alternate direction method of multipliers (ADMM) is adopted to determine the saddle point of the augmented Lagrangian in a sequence of iterative suboptimizations.[21] summarizes the details of the VMD algorithm.By this process, all of the mode functions are obtained and updated by Wiener filtering to tune the centre frequency in the spectral domain: (ω > 0) The centre frequency ω k n is calculated from the weighted centre of each mode in the spectral domain: the centre represents the frequency of the least squares linear regression of the instantaneous phase.Figure 1 presents the VMD algorithm as well as Appendix A.
To solve the original minimization problem, the alternate direction method of multipliers (ADMM) is adopted to determine the saddle point of the augmented Lagrangian in a sequence of iterative suboptimizations.[21] summarizes the details of the VMD algorithm.By this process, all of the mode functions are obtained and updated by Wiener filtering to tune the centre frequency in the spectral domain: The centre frequency ωk n is calculated from the weighted centre of each mode in the spectral domain: the centre represents the frequency of the least squares linear regression of the instantaneous phase.Figure 1 presents the VMD algorithm as well as Appendix A.

Permutation Entropy
Entropy is a conventional approach to indicate the degree of uncertainty for a random system in the field of information theory.As entropy increases with the degree of randomness for an observed system, it can be utilized in a quantitative analysis for feature extraction in the field of PQD detection.Permutation entropy is one of most popular entropies in signal processing [22].The temporal information of the monitored object is recognized according to permutation entropy by counting the ordinal patterns.Permutation entropy can estimate the complexity of a time-series after comparing the neighbouring values.By definition, permutation entropy is calculated by the time-series probability density function that is based on Shannon entropy.For a time domain series with a finite length of N, {x(t)} (t = 1, 2, …, N), a segment can be constructed by an m-order dimension.Each m-order segment is then sorted in an ascending sequence that is defined as

Permutation Entropy
Entropy is a conventional approach to indicate the degree of uncertainty for a random system in the field of information theory.As entropy increases with the degree of randomness for an observed system, it can be utilized in a quantitative analysis for feature extraction in the field of PQD detection.Permutation entropy is one of most popular entropies in signal processing [22].The temporal information of the monitored object is recognized according to permutation entropy by counting the ordinal patterns.Permutation entropy can estimate the complexity of a time-series after comparing the neighbouring values.By definition, permutation entropy is calculated by the time-series probability density function that is based on Shannon entropy.For a time domain series with a finite length of N, {x(t)} (t = 1, 2, . . ., N), a segment can be constructed by an m-order dimension.Each m-order segment is then sorted in an ascending sequence that is defined as permutation pattern π.The m-order segment X is defined in Equation ( 5) while considering the time delay α: From practical experience, the time lag α is selected as 1 and the value m is from 3 to 7. Hence, the number of m-order permutation patterns π is the factorial of m.The relative frequency of each permutation pattern π is calculated in Equation ( 6): where H represents a number.Subsequently, permutation entropy PE(m) is calculated while using the probability density function p(π) for the relative frequency in Equation (7):

Fisher Score-Based Feature Selection Method
As mentioned above, a number of features can be extracted according to the behavior of the statistical method.However, the extracted features are not all useful for the classification, which might be redundant or irrelevant.It is essential to select the significant features and discard irrelevant features to improve the accuracy of the classification.Meanwhile, low dimensionality is beneficial to simultaneously reduce the computation burden and improve the efficiency of the classification algorithm.Various methods, such as Laplacian Score, Fisher score, ReliefF, Wilcoxon rank, and Gain Ratio, are valid for feature selection.
The Fisher score is one of the most popular algorithms for ranking the priority of features [27,28].As a supervised algorithm, the Fisher score ranks extracted features by the Fisher criterion.In this method, the extracted features can be quantified in discriminative scores for different classes.For instance, the data set {f ik (1), f ik (2), . . .f ik (N)} is defined as the kth feature in the ith class.First, the mean value u ik and standard deviation σ 2 ik of the kth feature are calculated, as follows: Afterwards, the Fisher score value FS ij between the ith and the jth class can be quantified, as follows: As observed from the Fisher score expression, a large value represents a highly relevant correlation of the kth feature between the ith and the jth classes.A larger Fisher score indicates that the corresponding feature reveals more relevant distinction between the classes.The feature has a high priority to be chosen for the training model.

Proposed Method
Figure 2 presents the flowchart of the proposed method.The main scheme is implemented in four steps: PQD generation in a computer simulation, feature extraction, optimization of feature selection, and pattern recognition.In this paper, the proposed algorithm is designed to classify 13 classes of different signals, which are generated as a pure sine signal, sag, swell, interrupt, harmonics, sag with harmonics, swell with harmonics, interrupt with harmonics, flicker, oscillatory transient, impulsive transient, periodic notch, and spike.Second, by utilizing VMD to decompose the target signal into several IMFs, the feature vector is extracted in the statistical analysis method.Third, permutation entropy is imported to filter the invalid IMFs.Subsequently, the Fisher score is adopted to eliminate the redundant features to optimize feature selection.Finally, the classifier is realized by SVM to recognize different PQD patterns.
Appl.Sci.2019, 9, x FOR PEER REVIEW 6 of 22 transient, impulsive transient, periodic notch, and spike.Second, by utilizing VMD to decompose the target signal into several IMFs, the feature vector is extracted in the statistical analysis method.Third, permutation entropy is imported to filter the invalid IMFs.Subsequently, the Fisher score is adopted to eliminate the redundant features to optimize feature selection.Finally, the classifier is realized by SVM to recognize different PQD patterns.
The training data is generated from computer simulation with the 13 classes mentioned above.The dataset size of each class is 1000.After training, several analysis methods, such as confusion matrix and receiver operating characteristic curve, are imported to evaluate the performance of the training model, which is presented in Section 4.4.In another word, the artificial dataset of the simulation signals is utilized for the training and developed set.Additionally, the trained model will be tested on the real experimental setup to verify the effectiveness of the classifier, which is presented in Section 4.5.
. As specific implementation, the data generation and preprocessing, including the VMD algorithm, is realized on the Python script in the training stage.Subsequently, the public feature extraction library, LibXtract, is integrated with the console application to extract features.Afterwards, the feature heuristic selection and classifier model is trained on the server setup in Anaconda Python.In the training stage, the fault simulator system and the data acquisition system are both integrated on the LabVIEW environment.In particular, the communication is realized by the data-logging and supervisory control (DSC) packet.Additionally, the DAQmx driver packet realizes the data acquisition system.The experimental voltage/current data in time domain is obtained and saved as csv format in files.As an alternative method, the proposed algorithm can be realized on other platforms, such as C++ and Matlab.The training data is generated from computer simulation with the 13 classes mentioned above.The dataset size of each class is 1000.After training, several analysis methods, such as confusion matrix and receiver operating characteristic curve, are imported to evaluate the performance of the training model, which is presented in Section 4.4.In another word, the artificial dataset of the simulation signals is utilized for the training and developed set.Additionally, the trained model will be tested on the real experimental setup to verify the effectiveness of the classifier, which is presented in Section 4.5.

Computer Simulation of PQDs
As specific implementation, the data generation and preprocessing, including the VMD algorithm, is realized on the Python script in the training stage.Subsequently, the public feature extraction library, LibXtract, is integrated with the console application to extract features.Afterwards, the feature heuristic selection and classifier model is trained on the server setup in Anaconda Python.In the training stage, the fault simulator system and the data acquisition system are both integrated on the LabVIEW environment.In particular, the communication is realized by the data-logging and supervisory control (DSC) packet.Additionally, the DAQmx driver packet realizes the data acquisition system.The experimental voltage/current data in time domain is obtained and saved as csv format in files.As an alternative method, the proposed algorithm can be realized on other platforms, such as C++ and Matlab.

Computer Simulation of PQDs
In this work, a set of signals for training is artificially obtained through computer models.The artificial PQD signals consist of 10 single types, including pure a sine signal, sag, swell, interrupt, harmonics, flicker, oscillatory transient, impulsive transient, periodic notch, and spike.The multiple types consist of sag with harmonics, swell with harmonics, and interrupt with harmonics.The definition of the PQD signals with the parameter variations is based on the IEEE-1159 standard [29], as presented in Table 1.Waveform signals with 10 cycles are generated for 2000 points at a 10 kHz sampling frequency.The parameter A represents the normalized amplitude at a constant value.The parameter α represents the intensity levels of the sag, swell, and interruption.The expression u(t) denotes the time step function.
Table 1.Mathematical model of the power quality disturbances.

C1
Pure Sine In this work, all of the studied cases are based on pure normal-class sinusoidal signals that can vary their amplitude within 10% of their ideal magnitude.Additionally, the probability of a signal containing noise must be considered.Therefore, the simulated signal is coupled with Gaussian noise contamination at different signal-to-noise ratios (SNRs) in this paper.The SNR is defined, as follows, where P s represents the original signal power and P n represents the noise power correspondingly.SNR = 10 * log P s P n (10)

PQD Feature Extraction
For the recognition of different PQD patterns, it is essential to utilize a classifier to analyze the signal that is obtained from the voltage or current sensors.However, it is difficult to identify the time-series signal directly for common classification methods.As feature extraction plays an important role in the pattern recognition schemes, VMD is utilized to decompose the obtained signal into several IMFs for potential feature extraction in this paper.The sub-mode decomposed by VMD contains a specific spectrum, which can accurately trace the signal changes of the power quality.Thus, signal decomposition can effectively eliminate the impact of noise and separate the useful components in high level modes.As mentioned above, the decomposition results depend on the mode number, which is defined as K.When considering the harmonic components that exist in the PQD signal, the value of K is selected as 5. Additionally, parameter α of the data-fidelity constraint is selected as 2500 based on the experimental trial.Subsequently, by generating artificial PQD signals, the IMF is decomposed by VMD.After the decomposition, the statistical features are calculated and extracted from each mode in the time domain.Table 2 marks and lists all of the features and the corresponding labels.

No.
Statistical Feature Expression With the proposed feature extraction method, the PQD signals are decomposed into five IMFs.Subsequently, the statistical feature vectors are given as the following equations, where F 1 , F 2 , . . ., F 9 Appl.Sci.2019, 9, 4901 9 of 22 represent the feature vectors of RMS, standard deviation, variance, range, skewness, kurtosis, mean, average deviation, and permutation entropy, respectively.
The data must be normalized between 0 and 1 by using the min-max normalization method in Equation (12), where Z i is the normalized data, F max is the maximum data of the vector, and F min is the minimum data, respectively, since the elements of the extracted feature sets have different units.
After normalization, the feature set F is obtained while using Equation (13).

PQD Feature Heuristic Selection
The original feature vectors in high dimensions contain not only useful information, but also redundant messages.As a result, the original high dimensional feature vectors would lead to the waste of computer resources and a decrease in accuracy.In this paper, a two-stage heuristic feature selection algorithm is adopted to filter relevant features for the classification of PQDs.When considering the function of the VMD method, some decomposed IMFs may be the separated noise, which contains little useful information.As a result, it is obligatory to remove these feature vectors extracted from the useless IMFs.Further, permutation entropy is imported by estimating the time-series complexity of the obtained IMF to select IMFs.In this work, a hard threshold of permutation entropy is computed to select valid IMFs, which is set as 0.6.In other words, if the permutation entropy of the IMF is larger than 0.6, the decomposed IMF is treated as chaotic with less useful information [30].
In the second stage, the Fisher score is adopted to filter the relevant features.As described above, if the feature is highly relevant, the corresponding Fisher score is relatively large.Hence, the classification accuracy would achieve perfection.On the contrary, if the feature shows irrelevance for the classification result, the Fisher score is near zero.In this circumstance, the classifier would have poor performance.In addition, one point should be noted: the threshold selection of the Fisher's score is very important.In this study, the threshold value of the Fisher score is set as 0.25 and Section 4.3 discusses the reason of the threshold selection.

Multiclass Classification for PQDs
After feature selection, a support vector machine (SVM) is utilized as the classifier to identify the pattern of PQDs.It is essential to adjust the strategies that are used for multiclass classification since the original SVM was designed for binary classification.When compared with binary classification problems, multi classification is more complex and it contains large datasets [31].In this paper, the strategy "one-versus-rest" (OVR) is adopted to expand the SVM for multiclass classification.For an N-class problem, N hyperplanes are generated for N classes.When generating the ith hyperplane of the ith class, SVM treats the ith class as the positive case.Meanwhile, the rest are treated as a negative class.During the classifying process, all of the samples are required in each quadratic programming problem.The new sample is assigned to the corresponding class by calculating the closest distant to the hyperplane.Figure 3 gives an example of the OVR strategy.When considering a data set with four classes marked C1, C2, C3, and C4, the OVR strategy treats each class as the positive example and the rest as negative examples.Therefore, four classifiers will be trained to give the predicted result.If there is only one classifier like f 3 predicts the sample as positive, the final result is determined.Otherwise, it is usually referred the forecast confidence given by each classifier.Subsequently, choose the maximum value as the final result.
Appl.Sci.2019, 9, x FOR PEER REVIEW 10 of 22 Figure 3 gives an example of the OVR strategy.When considering a data set with four classes marked C1, C2, C3, and C4, the OVR strategy treats each class as the positive example and the rest as negative examples.Therefore, four classifiers will be trained to give the predicted result.If there is only one classifier like f3 predicts the sample as positive, the final result is determined.Otherwise, it is usually referred the forecast confidence given by each classifier.Subsequently, choose the maximum value as the final result.In this paper, the radial basis function (RBF) is chosen as the kernel function of the SVM, because the selected features are nonlinear and the quadratic programming problem is not linearly separable.By choosing a nonlinear kernel, the SVM model will build a nonlinear classification boundary.Furthermore, the penalty parameter C and the adjustable parameter γ in the SVM with an RBF kernel are optimized by conducting the grid search algorithm.In particular, C is the penalty coefficient, which is representative of the tolerance for error.The higher C is, the less error is tolerated, which makes the classifier model over-fit.The smaller C is, the easier it is to under-fit.If C is too large or too small, the generalization ability of the classifier would be poor.Parameter γ comes with the RBF function when it is selected as kernel.It implicitly determines the distribution of data after mapping to the new eigenspace.The larger the gamma is, the less the support vector is, the smaller the gamma value is, and the more the support vector is.

Experimental Setup
A microgrid platform is built in the laboratory environment to verify the proposed method, as shown in Figures 4 and 5.The microgrid platform is designed in a double layer structure with a flexible topology that contains flexible line connection, distributed generations like PV array and wind turbine, line impedance simulators, including resistance, and inductive and capacitive loads (100 kW RLC Load).To be specific, the distributed generations consist of centralized inverters for the photovoltaic roof, microinverters for the photovoltaic roof, a photovoltaic simulator, a diesel generator simulator, and a wind turbine simulator.Moreover, the load of the microgrid platform consists of lighting systems, an induction motor, charging piles for electric vehicles, and a 100 kW programmable PLC load.The microgrid platform is designed to operate at 400 V three phases with 50 Hz.Two sub-microgrid layers are coupled with bidirectional converters, which allow for the electrical energy to be transmitted in bidirectional flows.These sub-microgrid layers are connected to the bulk power system by deploying the three phases AC bus.Moreover, the microgrid platform is able to simulate different line faults to recur by utilizing the line impedance simulator under the logical program control.In this paper, the radial basis function (RBF) is chosen as the kernel function of the SVM, because the selected features are nonlinear and the quadratic programming problem is not linearly separable.By choosing a nonlinear kernel, the SVM model will build a nonlinear classification boundary.Furthermore, the penalty parameter C and the adjustable parameter γ in the SVM with an RBF kernel are optimized by conducting the grid search algorithm.In particular, C is the penalty coefficient, which is representative of the tolerance for error.The higher C is, the less error is tolerated, which makes the classifier model over-fit.The smaller C is, the easier it is to under-fit.If C is too large or too small, the generalization ability of the classifier would be poor.Parameter γ comes with the RBF function when it is selected as kernel.It implicitly determines the distribution of data after mapping to the new eigenspace.The larger the gamma is, the less the support vector is, the smaller the gamma value is, and the more the support vector is.

Experimental Setup
A microgrid platform is built in the laboratory environment to verify the proposed method, as shown in Figures 4 and 5.The microgrid platform is designed in a double layer structure with a flexible topology that contains flexible line connection, distributed generations like PV array and wind turbine, line impedance simulators, including resistance, and inductive and capacitive loads (100 kW RLC Load).To be specific, the distributed generations consist of centralized inverters for the photovoltaic roof, microinverters for the photovoltaic roof, a photovoltaic simulator, a diesel generator simulator, and a wind turbine simulator.Moreover, the load of the microgrid platform consists of lighting systems, an induction motor, charging piles for electric vehicles, and a 100 kW programmable PLC load.The microgrid platform is designed to operate at 400 V three phases with 50 Hz.Two sub-microgrid layers are coupled with bidirectional converters, which allow for the electrical energy to be transmitted in bidirectional flows.These sub-microgrid layers are connected to the bulk power system by deploying the three phases AC bus.Moreover, the microgrid platform is able to simulate different line faults to recur by utilizing the line impedance simulator under the logical program control.

PQD Signal Decomposition and Detection
An artificial signal that represents an interruption with harmonics will be discussed, as follows, in order to verify the effectiveness of VMD. Figure 6 shows the PQD type of the interruption with third and fifth harmonics in the time series.As seen, the electric power is interrupted from 0.035 s to 0.117 s.Meanwhile, the obtained signal is contaminated with the third and fifth harmonics.Additionally, white noise is added with 25 dB SNR to represent the real signal.

PQD Signal Decomposition and Detection
An artificial signal that represents an interruption with harmonics will be discussed, as follows, in order to verify the effectiveness of VMD. Figure 6 shows the PQD type of the interruption with third and fifth harmonics in the time series.As seen, the electric power is interrupted from 0.035 s to 0.117 s.Meanwhile, the obtained signal is contaminated with the third and fifth harmonics.Additionally, white noise is added with 25 dB SNR to represent the real signal.

PQD Signal Decomposition and Detection
An artificial signal that represents an interruption with harmonics will be discussed, as follows, in order to verify the effectiveness of VMD. Figure 6 shows the PQD type of the interruption with third and fifth harmonics in the time series.As seen, the electric power is interrupted from 0.035 s to 0.117 s.Meanwhile, the obtained signal is contaminated with the third and fifth harmonics.Additionally, white noise is added with 25 dB SNR to represent the real signal.7a, it appears that the PQD signal is under-decomposed.Although the fundamental component is obtained in the first IMF, the third and fifth harmonics mix in the second IMF. Figure 7b shows the same result.When K = 5 in Figure 7c, the first three IMFs are decomposed as fundamental components, the third and fifth harmonics, effectively.Interestingly, the fourth IMF presents two mutation pulses for detecting the start and the end of the PQD.By setting an appropriate threshold, the start and the end of the PQD can be easily detected and implemented on the hardware with less computational cost.Meanwhile, Figure 7d also shows a similar result when K = 6, which reveals that the PQD signal is over-decomposed, because the other two IMFs share the noise component.Nevertheless, this situation is less harmful than the under-decomposed situation, since feature selection in the next step will help to filter redundant IMFs.Moreover, from Figure 7c, it is apparent that the first four IMFs carry valid information and the remaining IMFs are invalid.Therefore, it is obligatory to consider these four IMFs for feature extraction.Meanwhile, the permutation entropy of each IMF is 0.231, 0.357, 0.442, 0.553, and 0.689, which verifies the observation results.
(a) (b) Figure 7 presents the different decomposition result of VMD, and Figure 7a-d represent the decomposed number K = 3, 4, 5 and 6, respectively.When K = 3 in Figure 7a, it appears that the PQD signal is under-decomposed.Although the fundamental component is obtained in the first IMF, the third and fifth harmonics mix in the second IMF. Figure 7b shows the same result.When K = 5 in Figure 7c, the first three IMFs are decomposed as fundamental components, the third and fifth harmonics, effectively.Interestingly, the fourth IMF presents two mutation pulses for detecting the start and the end of the PQD.By setting an appropriate threshold, the start and the end of the PQD can be easily detected and implemented on the hardware with less computational cost.Meanwhile, Figure 7d also shows a similar result when K = 6, which reveals that the PQD signal is over-decomposed, because the other two IMFs share the noise component.Nevertheless, this situation is less harmful than the under-decomposed situation, since feature selection in the next step will help to filter redundant IMFs.Moreover, from Figure 7c, it is apparent that the first four IMFs carry valid information and the remaining IMFs are invalid.Therefore, it is obligatory to consider these four IMFs for feature extraction.Meanwhile, the permutation entropy of each IMF is 0.231, 0.357, 0.442, 0.553, and 0.689, which verifies the observation results.
Additionally, as referred to in Ref. [24], the wavelet packet decomposition (WPD) is imported for comparison.Figure 8a,b present the decomposition result by WPD when the decomposition level is 2 and 3, respectively.In Figure 8a, since the decomposition level is 2, the corresponding wavelet node is 4, which reveals that WPD can only filter noise, but the first node still retains the main component of the original signal.Though the number of the wavelet nodes increases to 8, it still reveals a similar result in Figure 8b.By contrast, VMD performs better, even if the decomposed number K is not suitable for decomposition.The reason for this phenomenon is that each node of the wavelet packet can be approximately treated as a band-pass filter at the specific frequency range.As mentioned in Section 3.1, the sampling frequency is selected as 10 kHz.For the two levels of WPD that are shown in Figure 8a, the band width of each node is approximately 1.25 kHz.Meanwhile, for the three levels of WPD that are shown in Figure 8b, the band width of each node is approximately 0.625 kHz.However, the PQD signal is defined as the interruption of the third and fifth harmonics, where sinusoidal components with frequencies of 50 Hz, 150 Hz, and 250 Hz exist.Hence, all of these components are retained and mixed in the first node, so the WPD does not seem to work effectively.Based on the decomposition result, VMD is adaptive and robust for PQD signal decomposition, especially for recursive calculation and mode mixing problems.
over-decomposed, because the other two IMFs share the noise component.Nevertheless, this situation is less harmful than the under-decomposed situation, since feature selection in the next step will help to filter redundant IMFs.Moreover, from Figure 7c, it is apparent that the first four IMFs carry valid information and the remaining IMFs are invalid.Therefore, it is obligatory to consider these four IMFs for feature extraction.Meanwhile, the permutation entropy of each IMF is 0.231, 0.357, 0.442, 0.553, and 0.689, which verifies the observation results.Additionally, as referred to in Ref. [24], the wavelet packet decomposition (WPD) is imported for comparison.Figure 8a,b present the decomposition result by WPD when the decomposition level is 2 and 3, respectively.In Figure 8a, since the decomposition level is 2, the corresponding wavelet node is 4, which reveals that WPD can only filter noise, but the first node still retains the main component of the original signal.Though the number of the wavelet nodes increases to 8, it still reveals a similar result in Figure 8b.By contrast, VMD performs better, even if the decomposed number K is not suitable for decomposition.The reason for this phenomenon is that each node of the wavelet packet can be approximately treated as a band-pass filter at the specific frequency range.As mentioned in Section 3.1, the sampling frequency is selected as 10 kHz.For the two levels of WPD that are shown in Figure 8a, the band width of each node is approximately 1.25 kHz.Meanwhile, for the three levels of WPD that are shown in Figure 8b, the band width of each node is approximately 0.625 kHz.However, the PQD signal is defined as the interruption of the third and fifth harmonics, where sinusoidal components with frequencies of 50 Hz, 150 Hz, and 250 Hz exist.Hence, all of these components are retained and mixed in the first node, so the WPD does not seem to work effectively.Based on the decomposition result, VMD is adaptive and robust for PQD signal decomposition, especially for recursive calculation and mode mixing problems.

Feature Selection Analysis
The Fisher score with permutation entropy, which is described in Section 3.3, is utilized to filter redundant features as a feature heuristic selection method.Specifically, the embedded dimension parameter of permutation entropy is set to 6. Based on the experimental trial, the time delay parameter of permutation entropy is set to 1.The hard threshold of permutation entropy is set to 0.6.Figure 9

Feature Selection Analysis
The Fisher score with permutation entropy, which is described in Section 3.3, is utilized to filter redundant features as a feature heuristic selection method.Specifically, the embedded dimension parameter of permutation entropy is set to 6. Based on the experimental trial, the time delay parameter of permutation entropy is set to 1.The hard threshold of permutation entropy is set to 0.6.Figure 9 Appl.Sci.2019, 9, 4901 14 of 22 presents the correlation of the top 16 selected features, which represent the IMF2 RMS, IMF2 standard deviation, IMF2 variance, IMF2 average deviation, IMF2 kurtosis, IMF2 skewness, IMF2 range, IMF2 permutation entropy, IMF1 permutation entropy, IMF1 variance, IMF1 standard deviation, IMF1 RMS, IMF1 average deviation, IMF1 kurtosis, IMF1 range, and IMF4 permutation entropy.As shown, each feature is imported into the training dataset to determine the classification accuracy, with the aim of showing the effectiveness of the feature selection method.The experiments are conducted 50 times to avoid errors and stochastic disturbances.Subsequently, the accuracy results are generated by averaging the calculations.If the feature is irrelevant to the target class, the corresponding Fisher's score is nearly zero.Hence, the average accuracy of the classification would be poor.Conversely, if the feature is highly relevant, the corresponding Fisher score is relatively large.Thus, the classifier that is constructed by relevant features would achieve perfect accuracy.Moreover, the threshold value of the selected feature is very important.Threshold selection is discussed in the following paragraph.In this case, the SVM classifier is adopted to focus on the effects of the selected features.To be specific, the sample data sets are partitioned into 10 equal folds, and the tests were performed in 10 iterations.The number of each test instance is 1000.For the SVM classifier, the cost values C and γ of the SVM model are set to 100 and 0.02, respectively, and the linear kernel is selected as the kernel function.The classification threshold is set to 0.5.For each PQD class, the data sets were split into the training data sets and the test data sets.The ratio of the samples for the training and test is 4:1.
Two feature ranking methods, Fisher's score only and Fisher's score with the permutation algorithm, were used for feature selection for comparison.Figure 10a,b show the precision and recalls results by extracting different features according to the Fisher's score only and Fisher's score with the permutation algorithm, respectively.The average precision and average recall ratio reached 96.5% when the top 24 ranked features were selected by the Fisher's scoring algorithm.Similarly, the average precision and the average recall ratio according to the Fisher's score with the permutation algorithm achieved 97.6% when the SVM classifier was trained by feeding the 16 top-ranked features.However, the SVM classifier that was trained by the Fisher's score with the permutation selection algorithm showed better performance.Moreover, it cost less dimensions for feature vectors.In this case, the SVM classifier is adopted to focus on the effects of the selected features.To be specific, the sample data sets are partitioned into 10 equal folds, and the tests were performed in 10 iterations.The number of each test instance is 1000.For the SVM classifier, the cost values C and γ of the SVM model are set to 100 and 0.02, respectively, and the linear kernel is selected as the kernel function.The classification threshold is set to 0.5.For each PQD class, the data sets were split into the training data sets and the test data sets.The ratio of the samples for the training and test is 4:1.
Two feature ranking methods, Fisher's score only and Fisher's score with the permutation algorithm, were used for feature selection for comparison.Figure 10a,b show the precision and recalls results by extracting different features according to the Fisher's score only and Fisher's score with the permutation algorithm, respectively.The average precision and average recall ratio reached 96.5% when the top 24 ranked features were selected by the Fisher's scoring algorithm.Similarly, the average precision and the average recall ratio according to the Fisher's score with the permutation algorithm achieved 97.6% when the SVM classifier was trained by feeding the 16 top-ranked features.However, the SVM classifier that was trained by the Fisher's score with the permutation selection algorithm showed better performance.Moreover, it cost less dimensions for feature vectors.
Interestingly, as observed from the average precision and the average recall ratio, the accuracy was initially increased with the increased number of top-ranked features, because the decision boundary of the SVM hyperplane is more clearly separated by increasing the relevant feature vectors.However, once the feature dimension reached the proper ranking feature point, the performance of the classifier continued to decrease due to redundant features.After a decrease, the average precision and average recall ratio increased again, because the SVM imported the slack variable and the penalty factor to construct the decision boundary in soft intervals, which improved the robustness and generalization of the classifier.
96.5% when the top 24 ranked features were selected by the Fisher's scoring algorithm.Similarly, the average precision and the average recall ratio according to the Fisher's score with the permutation algorithm achieved 97.6% when the SVM classifier was trained by feeding the 16 top-ranked features.However, the SVM classifier that was trained by the Fisher's score with the permutation selection algorithm showed better performance.Moreover, it cost less dimensions for feature vectors.Interestingly, as observed from the average precision and the average recall ratio, the accuracy was initially increased with the increased number of top-ranked features, because the decision The accuracy of the SVM classifier is compared under different ratios of training and testing sets to further investigate the effectiveness of the heuristic selection method.The percentages of the training set were increased by 5%, 10%, 15%, 20%, 25%, 30%, and 35%.1000 trials were performed for each training set percentage to reduce random effects.Figure 11a,b presents the bar graphs of the average accuracy for different percentages of training and testing sets.The error standard deviations are also marked in figures.As discussed in Figure 11, the top 24 features are selected during training and testing.Overall, the training and test accuracies with heuristic selection were both better than the values without selection.When the training percentage is set as 5%, the training accuracy is larger than the testing accuracy, as shown in Figure 11a as well as Figure 11b, which reveals that the training model is underfitting.With the increased percentage of the training sets, the testing accuracy is enhanced.Meanwhile, the classification error is decreased.When the training percentage increases to 30%, the training and test accuracies approach one another when the feature heuristic selection method is used.Nevertheless, neither the training nor the test accuracy are consistent, regardless of the increasing percentages of the training set.boundary of the SVM hyperplane is more clearly separated by increasing the relevant feature vectors.However, once the feature dimension reached the proper ranking feature point, the performance of the classifier continued to decrease due to redundant features.After a decrease, the average precision and average recall ratio increased again, because the SVM imported the slack variable and the penalty factor to construct the decision boundary in soft intervals, which improved the robustness and generalization of the classifier.
The accuracy of the SVM classifier is compared under different ratios of training and testing sets to further investigate the effectiveness of the heuristic selection method.The percentages of the training set were increased by 5%, 10%, 15%, 20%, 25%, 30%, and 35%.1000 trials were performed for each training set percentage to reduce random effects.Figure 11a

Performance Analysis
The performance of the proposed method can be depicted by the following values: precision for the correctly identified type of PQD, Poccur; recall for the correctly identified type of PQD, Roccur; precision for the incorrectly identified type of PQD, Pother; and, recall for the incorrectly identified type of PQD, Rother.

Performance Analysis
The performance of the proposed method can be depicted by the following values: precision for the correctly identified type of PQD, P occur ; recall for the correctly identified type of PQD, R occur ; precision for the incorrectly identified type of PQD, P other ; and, recall for the incorrectly identified type of PQD, R other .
As shown in Equation ( 14), TP represents the number of true positive instances, which means that the estimated PQD type is correctly classified.FP represents the number of false positive instances, which means that another PQD is classified incorrectly as the estimated PQD type.FN represents the number of false negative instances, which means that the estimated PQD type is incorrectly classified as another PQD type.TN represents the number of true negative instances, which means that another PQD type is correctly identified as the corresponding PQD.Table 3 presents the confusion matrix of the PQD classification for the validation data sets.The classification results achieve high accuracy, as observed from the confusion matrix.However, there are still some mistakes due to the similar characteristics of some PQD types.For instance, two C4 (Interrupt Disturbances) samples are incorrectly assigned to C2 (Sag).Meanwhile, three C2 samples are also incorrectly assigned to C4, because the formula definitions of C2 and C4 are the same, and the only difference between these two classes is parameter α.As referred to above in Table 2, parameter α randomly varies over a range.To be specific, for C2 (sag), the range of parameter α is from 0.1 to 0.9.For C4 (interrupt) and the range of parameter α is from 0.9 to 1.When parameter α is selected to be near for both classes, the features that were extracted from the signal are also similar.In this circumstance, it would lead to an incorrect classification.The results for C11 (Sag with harmonics) and C13 (Interruption with harmonics) also verify this situation.The receiver operating characteristic (ROC) curve is adopted to depict the tradeoff relationship between the true positive rate and the false positive rate to evaluate the performance of the proposed classifier, as shown in Figure 12.Ten OVR classifiers were constructed due to the ten PQD types since the "one-versus-rest" (OVR) structure was utilized for multiclass classification in this paper.Subsequently, ten ROC curves were drawn to represent the true positive rate versus the false positive rate at various threshold values of θ.To be specific, the threshold varied from 0 to 1 with a step of 0.02.By giving the threshold θ, the average values of the true positive rate and false positive rate were calculated.The areas of the four ROC curves from the C1 class to the C10 class were 0.979, 0.958, 0.949, 0.963, 0.957, 0.978, 0.947, 0.965, 0.986, and 0.987.For the PQD classification, a large false positive rate would cause false alarms, which would result in unexpected or even mistaken strategy responses for power grids.More seriously, a large false positive rate would lead to worse consequences in the distribution generation systems with high penetration.By contrast, a small true positive rate would degrade the usability of the PQD detecting system.Being observed from the ROC curves, the proposed method has enough flexibility to satisfy various requirements, either for the disturbance-detected sensitivity or specificity.

Accuracy Under a Real Noisy Environment
The signals in the real power systems are contaminated with the high probability of containing noise.Therefore, an artificial test signal is constructed with different intensities of white Gaussian noise (AWGN).The intensity of the white Gaussian noise is given as the signal to noise ratio (SNR), which is defined in Equation (10), where Ps and Pn are the power of the signal and noise, respectively.
Table 4 shows the performance of the proposed classifier under different SNRs between 20 and 50 dB, which is conducted on the real microgrid platform.The robustness of the proposed algorithm is tested for 500 test samples for each PQD class in the presence of noise.The type of PQD events and the number of PQD events are selected in random series during the test stage.Overall, the classification accuracy decreases as the SNR level decreases.Nevertheless, the classification accuracy still reaches greater than 94%, even if the SNR drops to 20 dB.The results demonstrate that the SVM classifier of the proposed algorithm is less sensitive in a noisy environment.The reason that the classifier is insensitive to noise is attributed to the satisfactory performance of the algorithm.Each mode has a central frequency with a narrow bandwidth since VMD decomposes the signal into band-limited modes by updating with Wiener filtering.In this method, the disturbance signal is effectively separated from noise, which benefits the procedure of feature extraction for the SVM classifier.

Accuracy Under a Real Noisy Environment
The signals in the real power systems are contaminated with the high probability of containing noise.Therefore, an artificial test signal is constructed with different intensities of white Gaussian noise (AWGN).The intensity of the white Gaussian noise is given as the signal to noise ratio (SNR), which is defined in Equation (10), where Ps and Pn are the power of the signal and noise, respectively.
Table 4 shows the performance of the proposed classifier under different SNRs between 20 and 50 dB, which is conducted on the real microgrid platform.The robustness of the proposed algorithm is tested for 500 test samples for each PQD class in the presence of noise.The type of PQD events and the number of PQD events are selected in random series during the test stage.Overall, the classification accuracy decreases as the SNR level decreases.Nevertheless, the classification accuracy still reaches greater than 94%, even if the SNR drops to 20 dB.The results demonstrate that the SVM classifier of the proposed algorithm is less sensitive in a noisy environment.The reason that the classifier is insensitive to noise is attributed to the satisfactory performance of the algorithm.Each mode has a central frequency with a narrow bandwidth since VMD decomposes the signal into band-limited modes by updating with Wiener filtering.In this method, the disturbance signal is effectively separated from noise, which benefits the procedure of feature extraction for the SVM classifier.Then we would like to discuss the value choice of K.If K is too small (K = 2), the signal is underbinning.Then the decomposed results are shown in Figure A2.It is apparent that each mode is either shared by neighboring modes.Then we would like to discuss the value choice of K.If K is too small (K = 2), the signal is underbinning.Then the decomposed results are shown in Figure A2.It is apparent that each mode is either shared by neighboring modes.Then we would like to discuss the value choice of K.If K is too small (K = 2), the signal is underbinning.Then the decomposed results are shown in Figure A2.It is apparent that each mode is either shared by neighboring modes.If K is too small (K = 4), the signal is overbinning.Then the decomposed results are shown in Figure A3.It seems that each mode is separated clearly as pure sine signal.However, compared with Figure A3a,b, the frequencies of two modes are same, which is like mode duplication.
Based on what discussed above, we can see that the VMD algorithm is able to decomposed signal in an adaptive method.As same as a classical shortcoming of many segmentation algorithms, it is vital to preset the number of clusters in initial stage.For the VMD algorithm, the choice of K is important.Moreover, the parameter α of the data-fidelity constraint, which influences the tightness of the band-limits, is also important in the decomposition.The details of the VMD parameters can be referred in [19].If K is too small (K = 4), the signal is overbinning.Then the decomposed results are shown in Figure A3.It seems that each mode is separated clearly as pure sine signal.However, compared with Figure A3a,b, the frequencies of two modes are same, which is like mode duplication.
Based on what discussed above, we can see that the VMD algorithm is able to decomposed signal in an adaptive method.As same as a classical shortcoming of many segmentation algorithms, it is vital to preset the number of clusters in initial stage.For the VMD algorithm, the choice of K is important.Moreover, the parameter α of the data-fidelity constraint, which influences the tightness of the band-limits, is also important in the decomposition.The details of the VMD parameters can be referred in [19].

Figure 1 .
Figure 1.Flow chart of the VMD algorithm.

Figure 1 .
Figure 1.Flow chart of the VMD algorithm.

Figure 2 .
Figure 2. Main flow chart of the proposed method.

Figure 2 .
Figure 2. Main flow chart of the proposed method.

Figure 4 .
Figure 4. Experimental setup for power quality disturbances.

Figure 5 .
Figure 5.The architecture diagram of the experimental setup.

Figure 6 .
Figure 6.Original signal of interruption with harmonics.

Figure 7
Figure 7 presents the different decomposition result of VMD, and Figure 7a-d represent the decomposed number K = 3, 4, 5 and 6, respectively.When K = 3 in Figure7a, it appears that the PQD signal is under-decomposed.Although the fundamental component is obtained in the first IMF, the third and fifth harmonics mix in the second IMF.Figure7bshows the same result.When K = 5 in Figure7c, the first three IMFs are decomposed as fundamental components, the third and fifth harmonics, effectively.Interestingly, the fourth IMF presents two mutation pulses for detecting the start and the end of the PQD.By setting an appropriate threshold, the start and the end of the PQD can be easily detected and implemented on the hardware with less computational cost.Meanwhile, Figure7dalso shows a similar result when K = 6, which reveals that the PQD signal is over-decomposed, because the other two IMFs share the noise component.Nevertheless, this situation is less harmful than the under-decomposed situation, since feature selection in the next step will help to filter redundant IMFs.Moreover, from Figure7c, it is apparent that the first four IMFs carry valid information and the remaining IMFs are invalid.Therefore, it is obligatory to consider these four IMFs for feature extraction.Meanwhile, the permutation entropy of each IMF is 0.231, 0.357, 0.442, 0.553, and 0.689, which verifies the observation results.

Figure 6 .
Figure 6.Original signal of interruption with harmonics.

Figure 8 .
Figure 8. PQD signal decomposition results by WPD.(a) When the decomposition level is 2 by WPD; and, (b) when the decomposition level is 3 by WPD.

Figure 8 .
Figure 8. PQD signal decomposition results by WPD.(a) When the decomposition level is 2 by WPD; and, (b) when the decomposition level is 3 by WPD.
Appl.Sci.2019, 9, x FOR PEER REVIEW 14 of 22Moreover, the threshold value of the selected feature is very important.Threshold selection is discussed in the following paragraph.

Figure 9 .
Figure 9. Correlation between features and classification accuracy.

Figure 9 .
Figure 9. Correlation between features and classification accuracy.

Figure 10 .
Figure 10.Identification accuracy results with different numbers of selected features.(a) Training result only by Fisher's scoring method; and, (b) training result by Fisher's score and the permutation entropy method.

Figure 10 .
Figure 10.Identification accuracy results with different numbers of selected features.(a) Training result only by Fisher's scoring method; and, (b) training result by Fisher's score and the permutation entropy method.

Figure 11 .
Figure 11.Accuracy comparison with different training percentages.(a) classification accuracy with feature heuristic selection; and, (b) classification accuracy without feature selection.

Figure 11 .
Figure 11.Accuracy comparison with different training percentages.(a) classification accuracy with feature heuristic selection; and, (b) classification accuracy without feature selection.

Figure 12 .
Figure 12.Receiver operating characteristic curve for each PQD class.

Figure 12 .
Figure 12.Receiver operating characteristic curve for each PQD class.

Table 2 .
Statistical parameters for feature extraction.

Table 3 .
Confusion matrix of PQD classification.

Table 4 .
Performance of PQD classification accuracy under a noisy environment.

Table 4 .
Performance of PQD classification accuracy under a noisy environment.