Simulation-based comparison experiments are used to analyze and validate the new method in terms of feature selection, classifier performance, and signal processing method.

#### 4.1. Feature Extraction of PQ Signals

Following [13,15], 15 kinds of PQ signals are generated by simulation: normal (C0), sag (C1), swell (C2), interruption (C3), flicker (C4), transient (C5), harmonic (C6), notch (C7), spike (C8), harmonic with sag (C9), harmonic with swell (C10), harmonic with flicker (C11), sag with transient (C12), swell with transient (C13) and flicker with transient (C14). The sampling frequency is 3.2 kHz, and the fundamental frequency is 50 Hz. To improve the discriminative power of the features extracted from the ST, the window width factor is assigned different values in different frequency bands, as in [25]. The original features extracted from the ST modular matrix (STMM) are as follows [13]:

Feature 1 (F1): the maximum value of the maximum amplitude of each column in STMM (${A}_{\mathrm{max}}$).

Feature 2 (F2): the minimum value of the maximum amplitude of each column in STMM (${A}_{\mathrm{min}}$).

Feature 3 (F3): the mean value of the maximum amplitude of each column in STMM (Mean).

Feature 4 (F4): the standard deviation (STD) of the maximum amplitude of each column in STMM (STD).

Feature 5 (F5): the amplitude factor (${A}_{f}$) of the maximum amplitude of each column in STMM, defined as ${A}_{f}=\frac{{A}_{\mathrm{max}}+{A}_{\mathrm{min}}-1}{2}$ in the range $0<{A}_{f}<1$.

Feature 6 (F6): the STD of the maximum amplitude in the high frequency area above 100 Hz.

Feature 7 (F7): the maximum value of the maximum amplitude in the high frequency area above 100 Hz (${A}_{HF\mathrm{max}}$).

Feature 8 (F8): the minimum value of the maximum amplitude in the high frequency area above 100 Hz (${A}_{HF\mathrm{min}}$).

Feature 9 (F9): ${A}_{HF\mathrm{max}}-{A}_{HF\mathrm{min}}$.

Feature 10 (F10): the skewness of the high frequency area.

Feature 11 (F11): the kurtosis of the high frequency area.

Feature 12 (F12): the standard deviation of the maximum amplitude of each frequency.

Feature 13 (F13): the mean value of the maximum amplitude of each frequency.

Feature 14 (F14): the mean value of the standard deviation of the amplitude of each frequency.

Feature 15 (F15): the STD of the STD of the amplitude of each frequency.

Feature 16 (F16): the STD of the STD of the amplitude of the low frequency area below 100 Hz.

Feature 17 (F17): the STD of the STD of the amplitude of the high frequency area above 100 Hz.

Feature 18 (F18): the total harmonic distortion (THD).

Feature 19 (F19): the energy drop amplitude of 1/4 cycle of the original signal.

Feature 20 (F20): the energy rising amplitude of 1/4 cycle of the original signal.

Feature 21 (F21): the standard deviation of the amplitude of fundamental frequency.

Feature 22 (F22): the maximum value of the intermediate frequency area.

Feature 23 (F23): energy of the high frequency area from 700 Hz to 1000 Hz.

Feature 24 (F24): energy of the high frequency area after morphological de-noising.

Feature 25 (F25): energy of local matrix.

Feature 26 (F26): the summation of maximum value and minimum value of the amplitude of STMM.

Feature 27 (F27): the summation of the maximum value and minimum value of the maximum amplitude of each column in STMM.

Feature 28 (F28): the root mean square of the mean value of the amplitude of each column in STMM.

Feature 29 (F29): the summation of the maximum value and minimum value of the standard deviation of the amplitude of each column in STMM.

Feature 30 (F30): the STD of the STD of the amplitude of each column in STMM.

Feature 31 (F31): the mean value of the minimum value of the amplitude of each line in STMM.

Feature 32 (F32): the STD of the minimum value of the amplitude of each line in STMM.

Feature 33 (F33): the root mean square of the minimum value of the amplitude of each line in STMM.

Feature 34 (F34): the STD of the STD of the amplitude of each line in STMM.

Feature 35 (F35): the root mean square of the standard deviation of the amplitude of each line in STMM.

Let ${x}_{i}$ denote the voltage amplitude of the $i$-th sampling point, where $1\le i\le M$ and $M$ is the total number of sampling points. The relevant formulas are as follows:

Mean: $\overline{x}=\frac{1}{M}{\displaystyle \sum _{i=1}^{M}{x}_{i}}$.

STD: ${\sigma}_{STD}=\sqrt{\frac{1}{M}{\displaystyle \sum _{i=1}^{M}{({x}_{i}-\overline{x})}^{2}}}$.

Skewness: ${\sigma}_{skewness}=\frac{1}{(M-1){\sigma}_{STD}^{3}}{\displaystyle \sum _{i=1}^{M}{({x}_{i}-\overline{x})}^{3}}$.

Kurtosis: ${\sigma}_{kurtosis}=\frac{1}{(M-1){\sigma}_{STD}^{4}}{\displaystyle \sum _{i=1}^{M}{({x}_{i}-\overline{x})}^{4}}$.
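The four statistics above can be sketched in plain Python as follows. This is a minimal illustration that follows the normalizations exactly as written in the formulas ($1/M$ for the STD, $1/(M-1)$ for skewness and kurtosis); the function names and the sample values are illustrative and are not taken from the paper.

```python
import math

def mean(x):
    """Arithmetic mean of the samples."""
    return sum(x) / len(x)

def std(x):
    """Standard deviation with 1/M normalization, as in the STD formula."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def skewness(x):
    """Third central moment scaled by (M - 1) * sigma^3."""
    m, s, n = mean(x), std(x), len(x)
    return sum((v - m) ** 3 for v in x) / ((n - 1) * s ** 3)

def kurtosis(x):
    """Fourth central moment scaled by (M - 1) * sigma^4."""
    m, s, n = mean(x), std(x), len(x)
    return sum((v - m) ** 4 for v in x) / ((n - 1) * s ** 4)

# Illustrative samples only (not data from the paper).
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = {
    "mean": mean(samples),
    "std": std(samples),
    "skewness": skewness(samples),
    "kurtosis": kurtosis(samples),
}
```

A constant input gives a zero STD, so the skewness and kurtosis functions would divide by zero; a practical implementation would need to guard against that case.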

F19 and F20 are calculated as:

$F19=\frac{\mathrm{min}[Rms(m)]}{{R}_{0}}$.

$F20=\frac{\mathrm{max}[Rms(m)]}{{R}_{0}}$.

where $Rms(m)$ is the root mean square (RMS) of the $m$-th 1/4 cycle of the original signal, and ${R}_{0}$ is the RMS of the standard PQ signal with no noise.
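As a rough sketch of F19 and F20, the code below computes the RMS of each 1/4 cycle and normalizes the minimum and maximum by ${R}_{0}$. The sampling and fundamental frequencies come from the text (64 samples per cycle, 16 per quarter cycle). The use of consecutive, non-overlapping windows, the choice of ${R}_{0}=1/\sqrt{2}$ for a unit-amplitude sine, and the synthetic sag signal are assumptions made for illustration.

```python
import math

FS = 3200                    # sampling frequency in Hz (from the text)
F0 = 50                      # fundamental frequency in Hz (from the text)
QUARTER = FS // F0 // 4      # 16 samples per quarter cycle

def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def quarter_cycle_rms(signal, window=QUARTER):
    """RMS of consecutive, non-overlapping quarter-cycle windows (assumed)."""
    last_start = len(signal) - window
    return [rms(signal[k:k + window]) for k in range(0, last_start + 1, window)]

def f19_f20(signal, r0):
    """Return (F19, F20): min and max quarter-cycle RMS divided by R0."""
    values = quarter_cycle_rms(signal)
    return min(values) / r0, max(values) / r0

# Synthetic example (assumed): 10 cycles, sag to 0.5 pu during cycles 4 and 5.
n_samples = 10 * FS // F0
sag = []
for n in range(n_samples):
    amplitude = 0.5 if 256 <= n < 448 else 1.0
    sag.append(amplitude * math.sin(2 * math.pi * F0 * n / FS))

R0 = 1 / math.sqrt(2)        # RMS of a clean unit-amplitude sine (assumed)
f19, f20 = f19_f20(sag, R0)
```

For this sag signal F19 falls well below 1 while F20 stays close to 1; a swell would push F20 above 1 instead.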

Moreover, let ${x}_{ij}$ denote the element in the $i$-th row and $j$-th column of the matrix, where ${N}_{1}\le i\le {N}_{2}$ and ${M}_{1}\le j\le {M}_{2}$, and ${N}_{1}$, ${N}_{2}$, ${M}_{1}$ and ${M}_{2}$ are the starting row, ending row, starting column and ending column, respectively, of the submatrix used to calculate the energy-related features. The energy-related features are calculated as follows:

Energy: ${\sigma}_{energy}={\displaystyle \sum _{i={N}_{1}}^{{N}_{2}}{\displaystyle \sum _{j={M}_{1}}^{{M}_{2}}{\left|{x}_{ij}\right|}^{2}}}$.

The calculation methods of these features mainly follow [13,15]: features F1 to F24 follow [13], and features F26 to F35 follow [15]. Moreover, six kinds of complex disturbances need to be classified, and the classification of complex disturbances containing a transient is easily disturbed by noise and by the time-frequency energy at the starting and ending points of a voltage sag. Therefore, F25 is introduced to identify transient oscillation components.

The calculation method of F25 is described as follows:

- (1) The maximum of the row sums of amplitudes in the oscillation frequency band and the maximum of the column sums of amplitudes over the full time range are used to locate the possible time-frequency center point of the oscillation.

- (2) The local energy within the final 1/4 cycle and the $\pm $150 Hz range of this time-frequency center point is calculated as F25.
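The two steps can be sketched as below, reusing the energy definition ${\sigma}_{energy}$ given earlier. This is only one possible reading of the procedure: it assumes that STMM rows index frequency and columns index time, that the column sums are taken over the oscillation-band rows only, and that the 1/4-cycle window starts at the center column. The parameters `band_rows`, `hz_per_row` and `quarter_cycle_cols` are hypothetical names introduced for this example.

```python
def local_energy(stmm, n1, n2, m1, m2):
    """Sum of |x_ij|^2 over rows n1..n2 and columns m1..m2 (inclusive)."""
    return sum(abs(stmm[i][j]) ** 2
               for i in range(n1, n2 + 1)
               for j in range(m1, m2 + 1))

def f25(stmm, band_rows, hz_per_row, quarter_cycle_cols, half_band_hz=150.0):
    """Local energy around the assumed time-frequency center of an oscillation."""
    lo, hi = band_rows                      # inclusive rows of the oscillation band
    n_rows, n_cols = len(stmm), len(stmm[0])

    # Step (1): the row and the column with the largest summed amplitude.
    row_sums = {i: sum(abs(v) for v in stmm[i]) for i in range(lo, hi + 1)}
    center_row = max(row_sums, key=row_sums.get)
    col_sums = [sum(abs(stmm[i][j]) for i in range(lo, hi + 1))
                for j in range(n_cols)]
    center_col = max(range(n_cols), key=col_sums.__getitem__)

    # Step (2): energy in a +/-150 Hz, quarter-cycle window at the center.
    half_rows = int(half_band_hz / hz_per_row)
    n1 = max(0, center_row - half_rows)
    n2 = min(n_rows - 1, center_row + half_rows)
    m1 = center_col
    m2 = min(n_cols - 1, center_col + quarter_cycle_cols - 1)
    return local_energy(stmm, n1, n2, m1, m2)

# Toy 10 x 20 matrix (not real ST output) with a small blob of energy.
toy = [[0.0] * 20 for _ in range(10)]
toy[6][8], toy[6][9], toy[5][8] = 2.0, 1.0, 1.0
toy_f25 = f25(toy, band_rows=(4, 8), hz_per_row=100.0, quarter_cycle_cols=2)
```

In the toy matrix the center is located at row 6, column 8, and the window covers rows 5 to 7 and columns 8 to 9.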

The above features reflect the characteristics of different types of PQ disturbances from four aspects: disturbance amplitude, disturbance frequency, high-frequency energy, and abrupt changes in the energy of the original signal. When a disturbance occurs, the values of some features differ greatly between disturbance types, so features that reflect these disturbance indices can be used to recognize disturbances. Eleven features distinguish disturbances according to disturbance amplitude: F1 to F5, F21 and F26 to F30. Nineteen features distinguish disturbances according to disturbance frequency: F6 to F18, F22 and F31 to F35. These features reflect the main frequency components of the disturbances and the differences in their amplitude spectra. Three features, F23 to F25, distinguish higher harmonics from transient oscillations according to the energy in the high frequency area. Finally, because the amplitude of the original signal changes abruptly when a sag, interruption or swell occurs, two features, F19 and F20, distinguish these three kinds of disturbances by calculating the energy of each 1/4 cycle of the original signal.

#### 4.2. Feature Selection and Classification Effect Analysis of the New Method

Fifteen types of PQ disturbances with random disturbance parameters and a signal-to-noise ratio (SNR) between 20 dB and 50 dB were simulated in Matlab 7.2. Five hundred samples of each type were generated to train the RF classifier for feature selection. In addition, 100 samples of each type, with random disturbance parameters, were generated at each SNR of 50, 40, 30 and 20 dB to verify the feature selection effect and the classification ability of the new method under different noise environments.

In the new method, features with a non-zero EnI value are added to the selected feature subset one at a time, in descending order of EnI. Each time a feature is added, RF is used to verify the classification performance of the resulting subset. The feature importances obtained with information gain and with the Gini index as the node-splitting criterion are shown in Figure 3a,b, respectively. Figure 3a shows that 20 features have an EnI value of 0, which means that they have little or no effect on node splitting. Consequently, the new method needs only 15 iterations to search the feature space, whereas the GiI method needs 35. The new method is therefore more efficient in feature selection than the GiI-based method.
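The search described above can be sketched as a short loop: discard features whose importance is zero, sort the rest in descending order, and score each growing subset. In this sketch `evaluate` is a hypothetical callback standing in for training and validating the RF classifier on a subset, and the importance values are placeholders rather than the values plotted in Figure 3.

```python
def rank_nonzero(importance):
    """Return feature names with non-zero importance, largest first."""
    kept = [(name, value) for name, value in importance.items() if value > 0]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]

def forward_evaluate(importance, evaluate):
    """Add ranked features one at a time and record each subset's score."""
    subset, history = [], []
    for name in rank_nonzero(importance):
        subset.append(name)
        history.append((tuple(subset), evaluate(subset)))
    return history

# Placeholder importances (NOT the values of Figure 3).
toy_importance = {"F4": 0.30, "F5": 0.25, "F22": 0.20, "F25": 0.15,
                  "F1": 0.0, "F2": 0.0}

# Placeholder scorer: a real one would train and validate an RF classifier.
history = forward_evaluate(toy_importance, evaluate=lambda s: len(s))
```

The number of classifier evaluations equals the number of features with non-zero importance, which is why dropping zero-importance features shortens the search.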

**Figure 3.**
(**a**) EnI value of features; (**b**) GiI value of features.


According to Figure 3a, F4, F5, F22 and F25 have the highest EnI values. As explained in Section 4.1, F4 is the standard deviation of the maximum amplitude of each column in STMM. Its value is large for disturbances such as sag, swell and interruption, and small for steady-state signals such as normal voltage, flicker and spike, so F4 can divide the disturbances into two categories. F5 is the amplitude factor of the maximum amplitude of each column in STMM. Because the F5 values of swell, sag and the other disturbance types fall in different intervals, F5 can distinguish swell and sag from the others. F22 is the maximum value of the intermediate frequency area, and it can distinguish harmonics from other disturbances. F25 is the energy of the local matrix. Because the disturbance frequency of a transient is high, F25 can distinguish transients from other disturbances.

Figure 4a–c illustrates the classification performance of combinations of the four top-ranked features in Figure 3a under the condition SNR = $\infty $. Figure 4a shows the scatter plot of F5, F22 and F25. Samples of C1 and C5, C2 and C4, C7 and C12, and C6 and C15 overlap, while the other disturbance types are clearly separated. F4 and F5 are then used for further separation, as shown in Figure 4b. Although C2 and C4 still overlap in Figure 4b, the number of overlapping samples is sharply reduced, and C7, C12, C6 and C15 are completely separated. As shown in Figure 4c, C1 and C5 can be clearly separated by the combination of F4 and F22. Therefore, the four features with the highest EnI values can distinguish the 15 types of PQ signal effectively, which supports the validity of the new method.

**Figure 4.**
(**a**) Scatter plot of F5, F22 and F25; (**b**) Scatter plot of F4 and F5; (**c**) Scatter plot of F4 and F22.


Figure 5 and Figure 6 present the classification accuracy and the training error, respectively, of different feature subsets under different SNRs. As features are added one by one, the classification accuracy increases and the training error decreases. As shown in Figure 5 and Figure 6, the classification accuracy and the training error of the new method become stable once the feature subset dimension exceeds four, while the GiI method needs at least ten features to achieve satisfactory classification results.

The detailed classification results of the EnI method and the GiI method with 4 and 10 selected features are listed in Table 1, Table 2, Table 3 and Table 4. These four tables show that, for the same number of selected features, the EnI method achieves higher classification accuracy in a high-noise environment (the SNR of the PQ signals is 20 dB).

**Figure 5.**
(**a**) Classification accuracy of different feature subsets obtained from EnI method; (**b**) Classification accuracy of different feature subsets obtained from GiI method.


**Figure 6.**
(**a**) Training error of different feature subsets obtained from EnI method; (**b**) Training error of different feature subsets obtained from GiI method.


**Table 1.**
Classification results of the new method (4 features, SNR = 20 dB).

| Class | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **C0** | 86 | 0 | 0 | 0 | 1 | 9 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C1** | 0 | 87 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 |
| **C2** | 0 | 0 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 |
| **C3** | 0 | 5 | 0 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| **C4** | 0 | 0 | 0 | 0 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
| **C5** | 0 | 0 | 0 | 0 | 0 | 99 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C6** | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 0 |
| **C7** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C8** | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 3 | 0 | 0 | 0 | 0 |
| **C9** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 |
| **C10** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| **C11** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 |
| **C12** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 |
| **C13** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 |
| **C14** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |

Comprehensive accuracy: 95.9%
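The comprehensive accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples. The short sketch below reproduces the figure for Table 1; the counts are copied from the table, while the sparse encoding of the off-diagonal entries and the variable names are choices made for this example.

```python
N_CLASSES = 15

# Diagonal of Table 1 (correctly classified samples for C0..C14).
diagonal = [86, 87, 94, 94, 86, 99, 96, 100, 96, 100, 100, 100, 100, 100, 100]

# Non-zero off-diagonal entries of Table 1 as (row, column): count.
off_diagonal = {
    (0, 4): 1, (0, 5): 9, (0, 8): 4,
    (1, 3): 5, (1, 12): 8,
    (2, 13): 6,
    (3, 1): 5, (3, 12): 1,
    (4, 14): 14,
    (5, 7): 1,
    (6, 8): 3, (6, 11): 1,
    (8, 0): 1, (8, 10): 3,
}

# Rebuild the full 15 x 15 matrix.
matrix = [[0] * N_CLASSES for _ in range(N_CLASSES)]
for k, count in enumerate(diagonal):
    matrix[k][k] = count
for (row, col), count in off_diagonal.items():
    matrix[row][col] = count

def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

accuracy = overall_accuracy(matrix)
print(f"{100 * accuracy:.1f}%")  # prints 95.9%
```

Here 1438 of the 1500 test samples lie on the diagonal, which gives the 95.9% reported under the table.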

**Table 2.**
Classification results of the GiI method (4 features, SNR = 20 dB).

| Class | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **C0** | 37 | 0 | 57 | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 |
| **C1** | 0 | 63 | 0 | 22 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 |
| **C2** | 10 | 0 | 82 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 5 | 0 |
| **C3** | 0 | 1 | 0 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| **C4** | 0 | 1 | 0 | 0 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 |
| **C5** | 0 | 0 | 0 | 0 | 0 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 0 |
| **C6** | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 3 | 0 | 0 | 32 | 32 | 0 | 0 | 0 |
| **C7** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 2 | 0 | 17 | 0 | 1 | 0 |
| **C8** | 1 | 0 | 1 | 0 | 0 | 0 | 4 | 64 | 28 | 0 | 0 | 2 | 0 | 0 | 0 |
| **C9** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 86 | 0 | 14 | 0 | 0 | 0 |
| **C10** | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 6 | 0 | 0 | 29 | 35 | 0 | 0 | 0 |
| **C11** | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 16 | 6 | 74 | 0 | 0 | 0 |
| **C12** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 85 | 0 | 15 |
| **C13** | 0 | 0 | 0 | 0 | 0 | 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 0 |
| **C14** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 98 |

Comprehensive accuracy: 66.3%

**Table 3.**
Classification results of the new method (10 features, SNR = 20 dB).

| Class | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **C0** | 91 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C1** | 0 | 87 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 |
| **C2** | 0 | 0 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 |
| **C3** | 0 | 1 | 0 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| **C4** | 0 | 0 | 0 | 0 | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
| **C5** | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C6** | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C7** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C8** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C9** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 |
| **C10** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| **C11** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 |
| **C12** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 |
| **C13** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 |
| **C14** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |

Comprehensive accuracy: 97.1%

**Table 4.**
Classification results of the GiI method (10 features, SNR = 20 dB).

| Class | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **C0** | 90 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C1** | 0 | 90 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 |
| **C2** | 0 | 0 | 94 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 |
| **C3** | 0 | 3 | 0 | 97 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C4** | 0 | 0 | 0 | 0 | 91 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |
| **C5** | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C6** | 0 | 0 | 0 | 0 | 0 | 0 | 94 | 0 | 1 | 0 | 0 | 5 | 0 | 0 | 0 |
| **C7** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C8** | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 0 | 0 | 0 | 0 | 0 | 0 |
| **C9** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 99 | 0 | 0 | 0 | 0 | 0 |
| **C10** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| **C11** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 98 | 0 | 0 | 0 |
| **C12** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 |
| **C13** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 |
| **C14** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |

Comprehensive accuracy: 96.8%

#### 4.3. Comparison Experiment and Analysis

The feature selection result of the new method is compared with those of the GiI method, the SFS algorithm [35] and sequential backward search (SBS) [36] to verify the validity of the new approach. The numbers of features selected by the GiI, SFS and SBS methods are 10, 13 and 15, respectively. Two cases are considered for the new method, with feature subset dimensions of 4 and 10. The original feature set is also used for comparison.

The features selected by the new method are {F4, F5, F22, F25} and {F1, F3, F4, F5, F18, F21, F22, F25, F26, F33}, respectively.

The features selected by the GiI method are {F5, F9, F10, F11, F18, F19, F22, F25, F27, F31}.

The features selected by the SFS method are {F2, F4, F5, F7, F10, F16, F18, F19, F22, F26, F27, F29, F31}.

The features selected by the SBS method are {F1, F3, F4, F6, F11, F13, F18, F22, F23, F25, F27, F28, F29, F31, F33}.

To verify the validity of the feature selection results of the new method, four kinds of classifier, RF, SVM [14], PNN [13] and DT, are used to classify the 15 kinds of PQ signals under different noise environments and with different feature subsets. The DT classifier is constructed with the rpart software package in R. The classification results are shown in Table 5.

**Table 5.**
Comparison of feature selection methods (classification accuracy in %).

| SNR | Feature Selection Method | Number of Features | RF | SVM | NN | DT |
|---|---|---|---|---|---|---|
| 50 dB | EnI + SFS | 4 | 99.7 | 95.5 | 98.9 | 98.1 |
| | GiI + SFS | 4 | 82.6 | 74.6 | 76.1 | 75.3 |
| | EnI + SFS | 10 | 99.9 | 98.6 | 99.6 | 99.0 |
| | GiI + SFS | 10 | 99.9 | 98.5 | 99.7 | 98.9 |
| | SFS | 13 | 99.4 | 98.3 | 99.5 | 98.3 |
| | SBS | 15 | 99.8 | 98.7 | 99.5 | 99.2 |
| | ALL | 35 | 99.9 | 98.9 | 97.6 | 99.5 |
| 40 dB | EnI + SFS | 4 | 99.9 | 96.1 | 99.2 | 99.4 |
| | GiI + SFS | 4 | 84.7 | 72.1 | 77.2 | 76.7 |
| | EnI + SFS | 10 | 100 | 96.8 | 99.8 | 99.7 |
| | GiI + SFS | 10 | 100 | 98.4 | 99.8 | 99.4 |
| | SFS | 13 | 99.6 | 98.4 | 99.6 | 98.7 |
| | SBS | 15 | 99.9 | 98.5 | 99.7 | 99.6 |
| | ALL | 35 | 100 | 99.3 | 98.2 | 99.9 |
| 30 dB | EnI + SFS | 4 | 99.7 | 95.8 | 99.1 | 98.5 |
| | GiI + SFS | 4 | 79.3 | 70.1 | 71.9 | 72.1 |
| | EnI + SFS | 10 | 99.7 | 96.2 | 99.6 | 99.0 |
| | GiI + SFS | 10 | 99.7 | 97.9 | 99.5 | 99.0 |
| | SFS | 13 | 98.8 | 97.7 | 99.1 | 98.0 |
| | SBS | 15 | 99.7 | 97.9 | 99.5 | 99.1 |
| | ALL | 35 | 99.7 | 98.2 | 97.6 | 99.6 |
| 20 dB | EnI + SFS | 4 | 95.9 | 94.8 | 94.2 | 92.5 |
| | GiI + SFS | 4 | 66.3 | 59.5 | 63.5 | 60.9 |
| | EnI + SFS | 10 | 97.1 | 95.9 | 95.2 | 93.9 |
| | GiI + SFS | 10 | 96.8 | 90.3 | 95.0 | 85.5 |
| | SFS | 13 | 90.3 | 90.7 | 88.7 | 80.5 |
| | SBS | 15 | 98.5 | 88.6 | 94.8 | 94.2 |
| | ALL | 35 | 97.6 | 90.9 | 94.5 | 95.0 |

The feature selection methods based on EnI and GiI are compared using Table 5. With RF as the classifier, the EnI method with 4 selected features achieves a classification accuracy close to that of the GiI method with 10 features. When both methods select 10 features, their accuracies are identical for SNRs of 30 dB and above, while at an SNR of 20 dB the accuracy of the EnI method is 0.3% higher than that of the GiI method. This indicates that, with an RF-based classifier, the new EnI-based method performs better than the GiI-based method. When the SNR is 20 dB, the SBS method achieves a classification accuracy of 98.5%. However, when the classification accuracy under all conditions and the efficiency of feature selection and extraction are taken into consideration, the EnI method is still considered better than the SBS method. It can also be seen that the new method achieves satisfactory classification accuracy under different noise environments with the same feature subset. This overcomes a disadvantage of existing research [15], which needs to select different feature subsets for different noise environments. Meanwhile, when RF is used as the classifier and the dimension of the selected feature subset increases from 4 to 10, the classification accuracy at SNRs of 30 dB and above improves by at most 0.2%, whereas the accuracy at an SNR of 20 dB improves by 1.2%. Therefore, different feature subsets can be selected in practice according to the required classification accuracy and efficiency.

The classification ability of the different classifiers can also be analyzed using Table 5. Compared with the other three classifiers, RF performs better on the new test sets, and the best classification accuracy at every noise level is achieved with RF as the classifier. When the SNR is 50 dB, RF achieves a classification accuracy of 99.9% with EnI + SFS (10 features), GiI + SFS (10 features) and ALL. When the SNR is 40 dB, RF achieves a classification accuracy of 100% with EnI + SFS (10 features) and GiI + SFS (10 features). When the SNR is 30 dB, RF achieves a classification accuracy of 99.7% with EnI + SFS (4 features), EnI + SFS (10 features), GiI + SFS (10 features) and ALL. When the noise level is high (the SNR is 20 dB) and the feature selection method is SBS, the classification accuracy of RF is 9.9% higher than that of SVM, and 3.7% and 4.3% higher than those of the other two classifiers, respectively. These results indicate that RF has better anti-noise ability and is more suitable for applications in high-noise environments. Moreover, the classification accuracy of RF is higher than that of DT under every condition, which indicates that RF has better generalization ability than DT.
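The accuracy gaps quoted for the SBS feature subset at an SNR of 20 dB follow directly from the corresponding row of Table 5. The snippet below is just that arithmetic, with the four accuracies copied from the table; the dictionary layout is an arbitrary choice for this example.

```python
# Accuracies (%) of the SBS row at 20 dB, copied from Table 5.
sbs_20db = {"RF": 98.5, "SVM": 88.6, "NN": 94.8, "DT": 94.2}

# Gap between RF and each other classifier, rounded to one decimal place.
gaps = {
    name: round(sbs_20db["RF"] - accuracy, 1)
    for name, accuracy in sbs_20db.items()
    if name != "RF"
}
```

The resulting gaps are 9.9 for SVM, 3.7 for NN and 4.3 for DT, in percentage points.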

Besides classification accuracy, the impact of feature selection on classification efficiency is also analyzed. In practical applications, the original PQ signals are first processed by the ST after they are collected. The corresponding features are then extracted from the ST results. Finally, the extracted features are used as the input of the trained classifier, which outputs the disturbance type. Feature selection can therefore effectively reduce the computing time of the features and the complexity of the classifier. For 4, 10, 13, 15 and 35 selected features, the normalized time that 50 new test sets of original disturbance signals take from the ST process to the output of the disturbance type is shown in Figure 7. The total time needed to recognize the signals with 35 features is taken as the standard time (1 pu).

Figure 7 shows that the total classification time decreases significantly as the number of features decreases. When the number of selected features decreases from 35 to 4, the total classification time is reduced by 39.3%. When the number of selected features decreases from 35 to 10, the total classification time is reduced by 27.3%. This shows that feature selection effectively improves the classification efficiency of the classifier.

**Figure 7.**
The normalized time of different selected feature number.


#### 4.5. Effect of Signal Processing Method on Classification Accuracy

The influence of the signal processing method is also considered, because different signal processing methods affect the classification accuracy of PQ disturbance signals. Therefore, after the new feature selection method and the RF classifier have been shown to be effective, the classification accuracies obtained with the discrete wavelet transform (DWT) [37] and the wavelet packet transform (WPT) [38] are compared with that obtained with the ST. The new method is used for feature selection and classification in all cases.

In the comparison experiment, the features of the DWT-based method are extracted as described in [37]. The fourth-order Daubechies wavelet (db-4) is chosen as the mother wavelet function, and a 9-level multiresolution decomposition is applied to the original signals. From the detail coefficients at each level and the approximation coefficients at the last level, 90 features are extracted. The feature extraction methods of DWT are shown in Table 6.

**Table 6.**
Feature extraction methods based on DWT [37].

| Feature | Formula | Feature | Formula |
|---|---|---|---|
| Mean | ${\mu}_{i}=\frac{1}{N}{\displaystyle {\sum}_{j=1}^{N}{C}_{ij}}$ | Energy | ${E}_{i}={\displaystyle {\sum}_{j=1}^{N}{\lvert {C}_{ij}\rvert}^{2}}$ |
| Standard deviation | ${\sigma}_{i}={\left(\frac{1}{N}{\displaystyle {\sum}_{j=1}^{N}{({C}_{ij}-{\mu}_{i})}^{2}}\right)}^{\frac{1}{2}}$ | Shannon entropy | $S{E}_{i}=-{\displaystyle {\sum}_{j=1}^{N}{C}_{ij}^{2}\mathrm{log}({C}_{ij}^{2})}$ |
| Skewness | $S{K}_{i}=\sqrt{\frac{1}{6N}}{\displaystyle {\sum}_{j=1}^{N}{\left(\frac{{C}_{ij}-{\mu}_{i}}{{\sigma}_{i}}\right)}^{3}}$ | Log energy entropy | $LO{E}_{i}={\displaystyle {\sum}_{j=1}^{N}\mathrm{log}({C}_{ij}^{2})}$ |
| Kurtosis | $KR{T}_{i}=\sqrt{\frac{N}{24}}\left(\frac{1}{N}{\displaystyle {\sum}_{j=1}^{N}{\left(\frac{{C}_{ij}-{\mu}_{i}}{{\sigma}_{i}}\right)}^{4}}-3\right)$ | Norm entropy | $N{E}_{i}={\displaystyle {\sum}_{j=1}^{N}{({C}_{ij})}^{P}}$, $1\le P$ |
| RMS | $rm{s}_{i}=\sqrt{\frac{1}{N}{\displaystyle {\sum}_{j=1}^{N}{C}_{ij}^{2}}}$ | | |

In Table 6, $i=1,2,\dots ,l$ denotes the multiresolution level, and $N$ is the number of detail or approximation coefficients at each multiresolution level.
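The nine statistics of Table 6 can be sketched for a single set of wavelet coefficients as below. The coefficients themselves would come from a DWT implementation, which is not shown. Several details are assumptions for illustration: the small constant `eps` that guards against $\mathrm{log}(0)$, the use of the absolute value in the norm entropy so that a non-integer $P$ stays real, the example value `p=1.5`, and the toy coefficient list.

```python
import math

def dwt_level_features(c, p=1.5, eps=1e-12):
    """Nine Table 6 statistics for one list of wavelet coefficients `c`."""
    n = len(c)
    mu = sum(c) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in c) / n)
    z = [(v - mu) / sigma for v in c]       # standardized coefficients
    return {
        "mean": mu,
        "std": sigma,
        "skewness": math.sqrt(1 / (6 * n)) * sum(v ** 3 for v in z),
        "kurtosis": math.sqrt(n / 24) * (sum(v ** 4 for v in z) / n - 3),
        "rms": math.sqrt(sum(v * v for v in c) / n),
        "energy": sum(abs(v) ** 2 for v in c),
        # eps avoids log(0); it is not part of Table 6.
        "shannon_entropy": -sum(v * v * math.log(v * v + eps) for v in c),
        "log_energy_entropy": sum(math.log(v * v + eps) for v in c),
        # abs() is an assumption so that a non-integer P gives a real value.
        "norm_entropy": sum(abs(v) ** p for v in c),
    }

# Toy coefficients (not real DWT output).
features = dwt_level_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applying the nine statistics to the nine sets of detail coefficients and the one set of approximation coefficients gives 9 × 10 = 90 features, which matches the count stated in the text.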

The features of the WPT-based method are extracted as described in [38]. The fourth-order Daubechies wavelet (db-4) is again chosen as the mother wavelet function. A 4-level decomposition yields 16 sets of wavelet coefficients, from which 96 features are extracted. The feature extraction methods of WPT are shown in Table 7.

**Table 7.**
Feature extraction methods based on WPT [38].

| Feature | Formula | Feature | Formula |
|---|---|---|---|
| Mean | ${\mu}_{j}=\frac{1}{M}{\displaystyle {\sum}_{l=1}^{M}{C}_{jl}}$ | Kurtosis | $KR{T}_{j}=\frac{E\left[{({C}_{jl}-{\mu}_{j})}^{4}\right]}{{\sigma}_{j}^{4}}$ |
| Standard deviation | ${\sigma}_{j}={\left(\frac{1}{M-1}{\displaystyle {\sum}_{l=1}^{M}{({C}_{jl}-{\mu}_{j})}^{2}}\right)}^{\frac{1}{2}}$ | Energy | $E{D}_{j}={\displaystyle {\sum}_{l=1}^{M}{\lvert {C}_{jl}\rvert}^{2}}$ |
| Skewness | $S{K}_{j}=\frac{E\left[{({C}_{jl}-{\mu}_{j})}^{3}\right]}{{\sigma}_{j}^{3}}$ | Entropy | $EN{T}_{j}=-{\displaystyle {\sum}_{l=1}^{M}{C}_{jl}^{2}\mathrm{log}({C}_{jl}^{2})}$ |

In Table 7, $j=1,2,\dots ,k$, where $k$ is the number of nodes at the fourth decomposition level, and $M$ is the number of coefficients in each decomposed data set.

After the original feature sets are obtained, the new feature selection strategy proposed in this paper is applied to them as well. The numbers of features selected from the original feature sets of DWT and WPT are 23 and 27, respectively, and these two optimal feature subsets are described in Table 8 and Table 9, respectively. Finally, the two optimal feature subsets are used as the input of the RF to train the classifier. The classification accuracy of the classifier is shown in Table 10.

**Table 8.**
The selected features extracted from DWT method.

| No. | Feature | No. | Feature | No. | Feature |
|---|---|---|---|---|---|
| 7 | 7th level of mean | 37 | 7th level of kurtosis | 65 | 5th level of Shannon entropy |
| 9 | 9th level of mean | 44 | 4th level of RMS | 67 | 7th level of Shannon entropy |
| 14 | 4th level of std. deviation | 45 | 5th level of RMS | 84 | 4th level of norm entropy |
| 15 | 5th level of std. deviation | 48 | 8th level of RMS | 85 | 5th level of norm entropy |
| 20 | App. level of std. deviation | 54 | 4th level of energy | 86 | 6th level of norm entropy |
| 27 | 7th level of skewness | 55 | 5th level of energy | 87 | 7th level of norm entropy |
| 32 | 2nd level of kurtosis | 58 | 8th level of energy | 90 | App. level of norm entropy |
| 35 | 5th level of kurtosis | 64 | 4th level of Shannon entropy | | |

**Table 9.**
The selected features extracted from WPT method.

| No. | Feature | No. | Feature | No. | Feature |
|---|---|---|---|---|---|
| 1 | Mean of 1st node | 49 | Kurtosis of 1st node | 61 | Kurtosis of 13th node |
| 2 | Mean of 2nd node | 50 | Kurtosis of 2nd node | 62 | Kurtosis of 14th node |
| 4 | Mean of 4th node | 51 | Kurtosis of 3rd node | 64 | Kurtosis of 16th node |
| 7 | Mean of 7th node | 52 | Kurtosis of 4th node | 65 | Energy of 1st node |
| 17 | Std. deviation of 1st node | 53 | Kurtosis of 5th node | 66 | Energy of 2nd node |
| 18 | Std. deviation of 2nd node | 54 | Kurtosis of 6th node | 68 | Energy of 4th node |
| 20 | Std. deviation of 4th node | 55 | Kurtosis of 7th node | 81 | Entropy of 1st node |
| 33 | Skewness of 1st node | 56 | Kurtosis of 8th node | 82 | Entropy of 2nd node |
| 34 | Skewness of 2nd node | 58 | Kurtosis of 10th node | 84 | Entropy of 4th node |

**Table 10.**
Effect of different signal processing methods on PQ classification (classification accuracy in %).

| SNR | Feature Selection | ST | DWT | WPT |
|---|---|---|---|---|
| 50 dB | No | 99.7 | 98.4 | 95.5 |
| | Yes | 99.9 | 97.5 | 94.2 |
| 40 dB | No | 100 | 98.8 | 96.4 |
| | Yes | 100 | 98.9 | 94.8 |
| 30 dB | No | 99.7 | 97.1 | 94.0 |
| | Yes | 99.7 | 96.7 | 91.5 |
| 20 dB | No | 97.6 | 83.5 | 82.9 |
| | Yes | 97.1 | 85.8 | 82.6 |

Table 10 shows that the ST-based method achieves higher classification accuracy than the other signal processing methods under all conditions. When the SNR is 20 dB and no feature selection is performed, the classification accuracy of the ST-based method is 14.1% and 14.7% higher than those of DWT and WPT, respectively. With feature selection, the classification accuracy of the ST-based method is 11.3% and 14.5% higher than those of DWT and WPT, respectively. These results indicate that ST has good anti-noise ability, so it is reasonable to use ST as the signal processing method in the new approach.