Power Quality Disturbance Classification Based on DWT and Multilayer Perceptron Extreme Learning Machine

In order to effectively identify complex power quality disturbances, a power quality disturbance classification method based on the discrete wavelet transform (DWT) and a multilayer perceptron extreme learning machine (ELM) is proposed. The model uses the DWT multi-resolution method to extract classification features. Taking the characteristics of the hierarchical ELM (H-ELM) into account, a single-objective particle swarm optimization (PSO) feature selection method is used to select the optimal feature set. The hidden layers of the H-ELM classifier are trained in a forward manner: once the previous layer is established, the weights of the current layer are fixed without fine-tuning. As a result, training is fast, the recognition accuracy is largely insensitive to parameter tuning, and the model is robust. To address the data imbalance found in actual power systems, a data enhancement method is proposed that reduces the impact of the imbalance and improves the generalization performance of the network. Simulation results show that the proposed method identifies 15 signal classes efficiently and accurately under different noise conditions, and its robustness is verified with measured data.


Introduction
Due to the large-scale use of power electronic devices, grid-connected distributed generation and non-linear loads have increased. Concurrently, the proliferation of reactive power devices and solid-state switches causes the power grid to suffer frequently from various interferences. All these factors result in the emergence of various power quality disturbances [1]. Accurate localization and identification of power quality disturbances is the premise of power quality analysis and mitigation. Therefore, pattern recognition of power quality disturbances has become a top priority [2]. The classification of power quality disturbances (PQDs) involves three stages: feature extraction, feature selection [3] and classifier design.
At present, power quality disturbance feature extraction [4][5][6][7][8] is mainly based on experience and statistics. In [9], the S transform is used for feature extraction, but the width of its Gaussian window varies with frequency according to a fixed rule, which limits its adaptability to different signals. In [10], the authors applied the multi-resolution S transform to extract features, but the signal analysis was cumbersome. In [6], features were extracted with the short-time Fourier transform (STFT); however, its window function is fixed, so its time-frequency resolution is the same at all frequencies and the extracted features lack multi-resolution information. Researchers [11] have proposed using the discrete wavelet transform (DWT) to overcome this fixed-resolution problem of the STFT when analyzing PQD signals. DWT is particularly suitable for the automatic detection and feature extraction of PQDs, especially transient disturbances. Moreover, its multi-resolution nature allows the initial feature set to be determined more accurately.
In recent years, rapid developments in data mining, machine learning algorithms and hardware computing capability have provided powerful tools to many fields. In terms of classifier design, decision trees (DT), probabilistic neural networks (PNN), support vector machines (SVM) and deep neural networks (DNN) have achieved good results in PQD classification. However, the classification thresholds of DT [12] depend on the training samples, and DT generalizes poorly. PNN [13] is generally faster and more accurate than DT, but it is slower when classifying new cases and requires more storage space. SVM [14,15] requires many parameters to be set and is prone to overfitting. Although neural network models [16,17] achieve high classification accuracy, their training and classification are slower, and training requires a large amount of data. In this study, we extend the extreme learning machine (ELM) to PQD classification through the hierarchical ELM (H-ELM) framework [20], an ELM-based multilayer perceptron. H-ELM combines the ability to classify small samples with the high accuracy of deep learning. When processing large amounts of data, it classifies quickly, and its self-learned feature extraction module helps prevent over-fitting.
Most existing studies have aimed to optimize the classifiers and feature extraction but have not considered the actual operating conditions under which power quality disturbance data are collected. Artificial intelligence applications should, however, account for the characteristics of actual grid data. The power quality disturbance data collected from the power grid are imbalanced, and disturbances of the same type can differ greatly. Most existing research focuses on balanced simulated data and does not address these problems. In this paper, a data enhancement method is used to handle the data imbalance, and the H-ELM classifier, which has strong generalization ability, is used to cope with the variability within each disturbance type. We propose a power quality disturbance recognition method based on DWT and a multilayer perceptron extreme learning machine. The major contributions of this paper are as follows:
(1) For the first time, the H-ELM algorithm is applied to PQD classification, and a comprehensive experimental exploration of H-ELM for PQD classification is performed.
(2) The feature selection algorithm is combined with the H-ELM algorithm to improve classification accuracy and speed. Simulation results show that the method is more accurate than traditional methods, and both its speed and its ability to process large data sets are significantly improved.
(3) We consider the problem of PQD data imbalance and use data enhancement to balance the data. The same method is also used to expand the data set, which addresses the shortage of labeled measured data.
The rest of the paper is organized as follows. Section 2 describes power quality disturbance feature extraction. Section 3 discusses PQD classification based on H-ELM. In Section 4, simulation experiments verify the feasibility of the algorithm. Section 5 verifies the algorithm with measured data. Section 6 concludes the paper.

Feature Extraction Based on Discrete Wavelet Transform
The wavelet transform [18] is used to analyze both stationary and non-stationary signals in various scenarios and can capture local discontinuities in a signal. Mathematically, the continuous wavelet transform (CWT) of a continuous signal with respect to the wavelet function ψ(t) is given by (1).
Parameters a and b are the scale and translation parameters, and f(t) is the original signal. In practice, the CWT produces redundant information and is not well suited to computer analysis. The study in [11] found that the DWT, given in Equation (2), is more suitable for the analysis of PQDs.
The scaling and translation parameters are replaced by functions of the integer variables m and n, i.e., $a = a_0^{m}$ and $b = k b_0 a_0^{m}$, respectively, where f(k) is the sequence of discrete samples of the continuous-time signal f(t).
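For reference, Equations (1) and (2) are usually written in the following standard forms in the PQD literature. They are given here as an assumption rather than copied from the original equations; ψ* denotes the complex conjugate of the wavelet, and the common dyadic choice is a_0 = 2 and b_0 = 1.

```latex
\mathrm{CWT}_f(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) \mathrm{d}t

\mathrm{DWT}_f(m,n) = \frac{1}{\sqrt{a_0^{m}}} \sum_{k} f(k)\, \psi^{*}\!\left(\frac{n - k\, b_0 a_0^{m}}{a_0^{m}}\right)
```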
In the feature extraction process, the PQD signal is decomposed with the DWT. Wavelet analysis is essentially a measure of the similarity between the mother wavelet and the input signal, so choosing the right mother wavelet is one of the main problems in DWT applications. In this paper, the widely used Daubechies-4 (db4) wavelet is used as the mother wavelet [19]. The number of decomposition levels l is also important: a higher l brings more information into the system. Here, the PQD signal is decomposed into eight levels for feature extraction.
The statistical parameters used for feature extraction were taken from [17]. The seven statistical features are entropy (Ent), standard deviation (σ), mean (µ), kurtosis (KT), skewness (SK), root mean square (RMS) and range (RG), calculated with Equations (3) through (9). Decomposing each waveform into eight levels yields eight sets of detail coefficients and one set of approximation coefficients, giving 9 × 7 = 63 features per signal; this is done for every PQD in the 4500 × 1280 signal matrix. The resulting 4500 × 63 feature matrix is normalized for classification. The original feature set derived from the DWT statistics is shown in Table 1.
The subscript indicates the decomposition level; for example, µ_1, σ_1, RMS_1, KT_1, Ent_1, SK_1 and RG_1 are the mean, standard deviation, root mean square, kurtosis, entropy, skewness and range of the first-level detail coefficients. The range, entropy, standard deviation, mean, kurtosis, skewness and root mean square are given by Equations (3) to (9), respectively, where i = 1, 2, …, l indexes the decomposition level and N is the number of coefficients at each level. The PQD waveforms are decomposed into eight levels, which provide eight sets of detail coefficients (D1, D2, D3, D4, D5, D6, D7, D8) and one set of approximation coefficients (A8).
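The 63-feature extraction described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code. A Haar filter pair stands in for db4 so that no wavelet library is needed; with PyWavelets installed, `pywt.wavedec(x, 'db4', level=8)` would give the db4 decomposition in the same coefficient order. The entropy is assumed to be the non-normalized Shannon form −Σ c² log c², kurtosis and skewness use population moments, and the normalization is assumed to be min-max scaling. All function names are hypothetical.

```python
import numpy as np

INV_SQRT2 = 1.0 / np.sqrt(2.0)


def haar_wavedec(x, level):
    """Multi-level Haar DWT, returned as [A_level, D_level, ..., D_1] (pywt.wavedec order)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        if approx.size % 2:                        # pad odd-length bands with the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) * INV_SQRT2)   # high-pass: detail coefficients
        approx = (even + odd) * INV_SQRT2          # low-pass: approximation coefficients
    return [approx] + details[::-1]


def band_features(c):
    """Ent, sigma, mu, KT, SK, RMS, RG of one coefficient band (assumed definitions)."""
    c = np.asarray(c, dtype=float)
    mu = c.mean()
    sigma = c.std()                                # population standard deviation
    centered = c - mu
    energy = c ** 2
    nz = energy[energy > 0]                        # skip zeros to avoid log(0)
    ent = -np.sum(nz * np.log(nz))                 # non-normalized Shannon entropy
    sk = np.mean(centered ** 3) / sigma ** 3 if sigma > 0 else 0.0
    kt = np.mean(centered ** 4) / sigma ** 4 if sigma > 0 else 0.0
    rms = np.sqrt(np.mean(energy))
    rg = c.max() - c.min()
    return [ent, sigma, mu, kt, sk, rms, rg]


def dwt_features(signal, level=8):
    """(level detail bands + 1 approximation band) x 7 statistics; 63 for level=8."""
    coeffs = haar_wavedec(signal, level)           # [A8, D8, ..., D1]
    bands = coeffs[:0:-1] + [coeffs[0]]            # reorder to D1, ..., D8, A8
    return np.array([f for band in bands for f in band_features(band)])


def feature_matrix(signals, level=8):
    """Stack per-signal features (e.g. 4500 x 63) and min-max scale each column to [0, 1]."""
    F = np.vstack([dwt_features(s, level) for s in signals])
    lo = F.min(axis=0)
    span = F.max(axis=0) - lo
    span[span == 0] = 1.0                          # constant columns map to 0
    return (F - lo) / span
```

With 1280-sample signals, eight halvings leave 5 approximation coefficients, so no padding is triggered in the setting used here.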

Feature Selection to Select the Best Feature
Different feature combinations affect the classifier differently. To verify that the proposed features are effective and to find the best feature combination, existing research typically adopts multi-objective feature selection, which balances accuracy against the number of features. The H-ELM classifier proposed in this paper computes quickly, and a difference of fewer than 10 features has little effect on its classification speed, so a single-objective method that optimizes classification accuracy alone is sufficient.
In this paper, particle swarm optimization (PSO) is used to minimize the classifier error and thereby select the best feature combination. The features selected by PSO-SVM meet the classification accuracy requirements of H-ELM. The main idea is to treat feature subset selection as a search optimization problem: different feature combinations are generated, evaluated and compared with one another, so that finding the best subset becomes an optimization problem. The algorithm uses the SVM classification accuracy as the objective function, as shown in Equation (10), where a_i is the SVM classification accuracy.
For the fitness definition, the classification accuracy a, i.e., the percentage of correctly classified examples, is computed by Equation (11), $a = \frac{c}{c+u} \times 100\%$, where c and u are the numbers of correctly and incorrectly classified examples, respectively.
With SVM classification accuracy as the objective function, the feature combination with the lowest classification error is selected. The best feature combination selected by single-objective PSO feature selection is shown in Table 2. The feature selection in this paper is performed offline: the best feature combination is found first, and afterwards only the selected features are extracted, which reduces the computational cost of feature extraction and the running time of the algorithm. The selected features depend on the data set and on the parameters of the feature selection, but in general the combinations selected by this method give better classification accuracy.
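The offline PSO feature selection can be illustrated with a binary PSO over feature masks. In this sketch, which is not the authors' implementation, a nearest-centroid classifier on a validation split stands in for the SVM so that only NumPy is needed; in practice the fitness would be the SVM accuracy a of Equation (11). The sigmoid transfer function, inertia weight `w` and acceleration constants `c1`, `c2` are common binary-PSO choices assumed here, and all function names are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Equation (11): a = c / (c + u) * 100, with c correct and u incorrect predictions."""
    c = int(np.sum(y_true == y_pred))
    u = int(y_true.size - c)
    return 100.0 * c / (c + u)


def centroid_fitness(mask, X_tr, y_tr, X_va, y_va):
    """Stand-in for SVM accuracy: nearest-centroid accuracy on a validation split."""
    if not mask.any():
        return 0.0
    cols = np.flatnonzero(mask)
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == k][:, cols].mean(axis=0) for k in classes])
    d = ((X_va[:, cols][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return accuracy(y_va, classes[d.argmin(axis=1)])


def binary_pso(fitness, n_features, n_particles=20, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO (sigmoid transfer) that maximizes fitness(mask) over boolean masks."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features)) < 0.5      # True = feature selected
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(iters):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        x = pos.astype(float)
        vel = w * vel + c1 * r1 * (pbest.astype(float) - x) + c2 * r2 * (gbest.astype(float) - x)
        vel = np.clip(vel, -4.0, 4.0)                      # keep bit-flip probabilities away from 0/1
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit                           # update personal bests
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if pbest_fit.max() > gbest_fit:                    # update the global best
            g = pbest_fit.argmax()
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit
```

The returned mask plays the role of Table 2: once it is found, only those feature columns need to be computed at run time.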

Classification of Power Quality Disturbance Based on H-ELM
The essence of machine learning is to use given input data and target values to build a network with specific weights and biases that classifies the data; when new data arrive, the trained network determines their output category. The H-ELM framework [20] is a multilayer perceptron extreme learning machine that consists of two independent phases: (1) unsupervised hierarchical feature representation, which automatically extracts features from the input data and transforms the original input features into a higher-level representation, and (2) supervised feature classification. Because H-ELM builds the whole network from ELM-based feature extraction and classification, its parameters need no fine-tuning, and its features are learned by ELM sparse autoencoders; it therefore offers fast training and high classification accuracy.

ELM Learning Algorithm
An ELM can be built with randomly initialized hidden nodes. Given power quality disturbance data $(x_i, t_i)$, $i = 1, \ldots, N$, where $x_i$ is a training vector, $t_i$ is its disturbance type, and L is the number of hidden nodes, ELM seeks to minimize both the training error and the norm of the output weights, as shown in Equation (12): $\min_{\beta} \; \|\beta\|_{\sigma_1}^{u} + \lambda \|H\beta - T\|_{\sigma_2}^{v}$, where $\sigma_1 > 0$, $\sigma_2 > 0$, $u, v = 0, \frac{1}{2}, 1, 2, \ldots, +\infty$, H is the hidden layer output matrix given in Equation (13), and β is the output weight. λ is a user-specified parameter that trades off the distance of the separating margin against the training error [21].
T is the training label matrix, given in Equation (14). The ELM training algorithm consists of three steps: (1) randomly assign the hidden node parameters; (2) calculate the hidden layer output matrix H; (3) obtain the output weight vector with Equation (15), $\beta = H^{\dagger} T$,
where $T = [t_1, \ldots, t_N]^T$ and $H^{\dagger}$ is the Moore-Penrose generalized inverse of H. Following ridge regression theory, it has been suggested that a positive value $1/\lambda$ be added to the diagonal of $HH^T$ when calculating the output weights β. This improves the robustness of ELM, and the output weight vector is then obtained with Equation (16), $\beta = H^T \left( \frac{I}{\lambda} + HH^T \right)^{-1} T$.
The output function of ELM is given by Equation (17), $f(x) = h(x)\beta$, where $h(x)$ is the hidden layer output for input x.
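The three ELM training steps and the ridge solution of Equation (16) fit in a few lines of NumPy. This is a minimal sketch, assuming a sigmoid activation, uniform random hidden weights and one-hot label vectors for T; the class name `RidgeELM` and its parameters are hypothetical.

```python
import numpy as np


def one_hot(idx, n_classes):
    """Label matrix T: one row per sample, 1 in the column of its class."""
    T = np.zeros((idx.size, n_classes))
    T[np.arange(idx.size), idx] = 1.0
    return T


class RidgeELM:
    """Single-hidden-layer ELM with ridge-regularized output weights (Equation (16))."""

    def __init__(self, n_hidden=200, lam=1e3, seed=0):
        self.n_hidden, self.lam = n_hidden, lam
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # h(x) = g(x W + b) with a sigmoid activation g (assumed)
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes = np.unique(y)
        T = one_hot(np.searchsorted(self.classes, y), self.classes.size)
        # Step 1: random hidden node parameters, never tuned afterwards
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        # Step 2: hidden layer output matrix H (N x L)
        H = self._hidden(X)
        # Step 3: beta = H^T (I / lambda + H H^T)^{-1} T, solved without an explicit inverse
        N = H.shape[0]
        self.beta = H.T @ np.linalg.solve(np.eye(N) / self.lam + H @ H.T, T)
        return self

    def predict(self, X):
        # f(x) = h(x) beta; the predicted class is the largest output
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Solving the N × N system follows Equation (16) directly; when the number of hidden nodes L is smaller than N, the equivalent form $\beta = (I/\lambda + H^T H)^{-1} H^T T$ needs only an L × L solve.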

ELM-Based Sparse Autoencoder
The universal approximation capability of ELM is exploited in the design of autoencoders, and sparsity constraints are added to the autoencoder optimization [20]. The optimization model of the ELM sparse autoencoder can be expressed as Equation (18), $O_{\beta} = \arg\min_{\beta} \left\{ \|H\beta - X\|^2 + \|\beta\|_{\ell_1} \right\}$, where X is the input data, H is the random mapping output, and β is the hidden layer weight to be obtained. To generate sparser and more compact features of the inputs, $\ell_1$ optimization is used when building the ELM autoencoder [20].
Problem (18) is solved with the fast iterative shrinkage-thresholding algorithm (FISTA). The implementation process is as follows: (1) calculate the Lipschitz constant γ of the gradient ∇p of the smooth convex function p.
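Only the first FISTA step appears in the text above; the following sketch shows the complete standard FISTA iteration for Equation (18), with smooth part p(β) = ‖Hβ − X‖² and an ℓ1 weight `l1` added as a tuning knob (`l1 = 1` gives Equation (18) as written). The tanh random mapping, the weight range and the iteration count are assumptions, and `elm_sparse_autoencoder` is a hypothetical name.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (the shrinkage step)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def elm_sparse_autoencoder(X, n_hidden, l1=1e-3, iters=50, seed=0):
    """Solve min_beta ||H beta - X||^2 + l1 * ||beta||_1 with FISTA, H = tanh(X A + b) random."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = np.tanh(X @ A + b)                          # random mapping, fixed after initialization
    HtH, HtX = H.T @ H, H.T @ X
    gamma = 2.0 * np.linalg.eigvalsh(HtH).max()     # Lipschitz constant of grad p = 2(H^T H beta - H^T X)
    beta = np.zeros((n_hidden, X.shape[1]))
    y, t = beta.copy(), 1.0
    for _ in range(iters):
        grad = 2.0 * (HtH @ y - HtX)
        beta_next = soft_threshold(y - grad / gamma, l1 / gamma)   # gradient step + shrinkage
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = beta_next + ((t - 1.0) / t_next) * (beta_next - beta)  # momentum (extrapolation) step
        beta, t = beta_next, t_next
    return beta                                     # (n_hidden, n_features): maps hidden space back to X
```

A larger `l1` drives more entries of β to exactly zero, which is the source of the sparse, compact features mentioned above.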

H-ELM Framework
H-ELM is built from multiple layers. As shown in Figure 1, unlike the greedy layer-wise training of traditional deep learning frameworks, H-ELM training is structurally divided into two separate phases: (1) unsupervised hierarchical feature representation and (2) supervised feature classification [20].
The autoencoder in the H-ELM framework is an autoencoder with sparsity constraints; its implementation is shown in Figure 1b. Unlike the autoencoders used in deep learning, the input weights of the ELM sparse autoencoder are established by searching the path back from a random space. ELM theory shows that an ELM whose input weights are random mappings can approximate the input data. Hence, if the autoencoder is trained in the ELM manner, its parameters do not need to be fine-tuned once it has been initialized.

Classification Process
The classification process of this method is shown in Figure 2. The classification prediction model used in this paper is based on hierarchical learning. Most hierarchical learning research is built on deep learning, but deep learning is challenging to train, requires a large amount of data and needs data pre-processing, which makes it difficult to apply to PQD classification. The H-ELM framework used here is a hierarchical structure consisting mainly of two parts: unsupervised feature extraction and supervised feature classification. H-ELM produces a more compact and more meaningful feature representation than the original ELM. Exploiting ELM random feature mapping, the hierarchically encoded output is randomly projected before the final decision, which yields better classification results and faster learning. The hidden layers of H-ELM are trained in a forward manner: once the previous layer is established, the weights of the current layer are fixed without fine-tuning. These properties give the proposed algorithm high accuracy and fast classification.
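Putting the pieces together, the forward, layer-by-layer training described above can be sketched as follows. This is a simplified illustration, not the reference H-ELM code of [20]: each layer is an ELM sparse autoencoder solved with FISTA, its encoding tanh(Z βᵀ) becomes the next layer's input and is never revisited, and the final encoding is randomly projected before a ridge-regression output layer. The tanh activations, layer sizes and regularization values are assumptions, and `HELM` is a hypothetical class name.

```python
import numpy as np


def _fista_l1(H, X, l1=1e-3, iters=50):
    """min_beta ||H beta - X||^2 + l1 ||beta||_1 via FISTA (ELM sparse autoencoder)."""
    HtH, HtX = H.T @ H, H.T @ X
    gamma = 2.0 * np.linalg.eigvalsh(HtH).max()
    beta = np.zeros((H.shape[1], X.shape[1]))
    y, t = beta.copy(), 1.0
    for _ in range(iters):
        z = y - 2.0 * (HtH @ y - HtX) / gamma
        nxt = np.sign(z) * np.maximum(np.abs(z) - l1 / gamma, 0.0)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = nxt + ((t - 1.0) / t_next) * (nxt - beta)
        beta, t = nxt, t_next
    return beta


class HELM:
    """Stacked ELM sparse autoencoders + random projection + ridge ELM output layer."""

    def __init__(self, layer_sizes=(64, 64), n_final=500, lam=1e3, seed=0):
        self.layer_sizes, self.n_final, self.lam = layer_sizes, n_final, lam
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)   # one-hot labels
        self.betas = []
        Z = X
        for size in self.layer_sizes:              # forward training, one layer at a time
            A = self.rng.uniform(-1.0, 1.0, (Z.shape[1], size))
            beta = _fista_l1(np.tanh(Z @ A), Z)    # autoencoder: tanh(Z A) beta ~ Z
            self.betas.append(beta)
            Z = np.tanh(Z @ beta.T)                # encoded features; this layer is now fixed
        # supervised stage: random projection of the final encoding, then ridge regression
        self.W = self.rng.uniform(-1.0, 1.0, (Z.shape[1], self.n_final))
        H = np.tanh(Z @ self.W)
        L = H.shape[1]
        self.out = np.linalg.solve(np.eye(L) / self.lam + H.T @ H, H.T @ T)
        return self

    def _encode(self, X):
        Z = X
        for beta in self.betas:
            Z = np.tanh(Z @ beta.T)
        return Z

    def predict(self, X):
        H = np.tanh(self._encode(X) @ self.W)
        return self.classes[np.argmax(H @ self.out, axis=1)]
```

No step revisits an earlier layer, which is why training cost grows roughly linearly with the number of layers.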

Simulation Analysis and Result Verification
Parametric equations of 15 PQD signal classes, including the pure sine wave, are used to evaluate the classification performance of the proposed algorithm. The simulation data set contains nine single types, namely pure sinusoid, sag, swell, interruption, harmonics, oscillatory transient, flicker, notch and spike, and six complex types: sag with harmonics, swell with harmonics, interruption with harmonics, harmonics with flicker, flicker with sag and flicker with swell. The parameter ranges of the disturbance equations conform to the Institute of Electrical and Electronics Engineers (IEEE) 1159 standard [22].
The power quality disturbance signals have an amplitude of 1 pu, a duration of 0.2 s (10 fundamental cycles), 1280 sampling points and a sampling frequency of 6.4 kHz. For each disturbance type, 300 signals are simulated, giving 4500 signals stored in a 4500 × 1280 matrix. Similar matrices are generated with Gaussian white noise added at signal-to-noise ratios (SNRs) of 50, 40, 30 and 20 dB. Some of the simulated disturbance signals are shown in Figure 3.
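The simulated data set described above can be generated from parametric models. The sketch below covers one class only, a voltage sag modeled as v(t) = [1 − α(u(t − t1) − u(t − t2))] sin(ωt), together with white-noise injection at a target SNR. The 50 Hz fundamental is inferred from 10 cycles in 0.2 s, and the parameter ranges (0.1 ≤ α ≤ 0.9, a duration of one to nine cycles) are typical values from the PQD literature assumed here rather than taken from this paper; function names are hypothetical.

```python
import numpy as np

FS = 6400.0            # sampling frequency (Hz)
N_SAMPLES = 1280       # 0.2 s window
F0 = 50.0              # assumed fundamental: 10 cycles in 0.2 s
T0 = 1.0 / F0


def voltage_sag(rng):
    """Sag model v(t) = [1 - alpha (u(t - t1) - u(t - t2))] sin(w t), 1 pu amplitude."""
    t = np.arange(N_SAMPLES) / FS
    alpha = rng.uniform(0.1, 0.9)                  # sag depth (pu), assumed range
    duration = rng.uniform(T0, 9 * T0)             # one to nine cycles, assumed range
    t1 = rng.uniform(0.0, t[-1] - duration)
    step = ((t >= t1) & (t < t1 + duration)).astype(float)
    return (1.0 - alpha * step) * np.sin(2 * np.pi * F0 * t)


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(p_noise), x.shape)


rng = np.random.default_rng(0)
clean = np.vstack([voltage_sag(rng) for _ in range(300)])      # one class: 300 x 1280
noisy = {snr: np.vstack([add_noise(s, snr, rng) for s in clean]) for snr in (50, 40, 30, 20)}
```

Repeating this for all 15 class models and stacking the results gives the 4500 × 1280 matrices described above.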
An important property of the multilayer perceptron extreme learning machine is fast classification with a short running time; compared with other machine learning algorithms, it offers both high classification accuracy and high speed. To verify that it retains this property in PQD classification, this paper compares the time taken by three algorithms to classify the same data set. Table 3 shows the total running time of the three algorithms. All simulations were run in MATLAB 2016a on a laptop with an Intel Core i5-M-520 processor at 2.40 GHz and 8 GB of RAM [18]. As Table 3 shows, H-ELM runs faster than PNN.
Appl.Sci.2019, 9, x FOR PEER REVIEW 9 of 16
To verify the classification effect of the multilayer perceptron extreme learning machine, we compared it with five existing methods. The results are shown in Table 4 and Figure 4. Figure 4 shows that our method classifies better than the other methods. On the original feature set, H-ELM outperforms the other three machine learning algorithms, and with the best feature set its classification at 20 dB SNR improves significantly. The principal component analysis and support vector machine (PCA-SVM) algorithm also recognizes PQDs accurately, but it trains slowly: for the same amount of data, its running time is 30 times that of H-ELM. This illustrates the efficiency of H-ELM.
Table 4 shows that H-ELM performs best when it uses the best feature set selected by PSO, and that even plain H-ELM classifies power quality disturbances better than the other machine learning algorithms. The first four sets of experiments show that, with the same amount of data and the same feature set, the H-ELM classifier performs better. Compared with existing methods, the proposed model clearly improves classification under all tested signal-to-noise ratios. These results indicate that H-ELM classifies power quality disturbance data better.
To verify that the features selected by PSO improve the classification results, experiments using the original 63 features and the best feature set were compared; the results are shown in Table 5. In terms of training time, training accuracy, test time and test accuracy, classification with the PSO-selected features is both more accurate and faster. Feature selection only determines the initial feature set and is not embedded in the classification program: like an expert choosing features, it fixes them once, but does so through optimization, yielding better classification.
Figure 5 compares the training and test times when classifying with the PSO-selected best feature set and with the original feature set. Both training and testing are significantly faster with the best feature set.
Figure 6 compares the classification accuracy obtained with the best feature set and with the original feature set. The blue bars are clearly higher than the orange bars, showing that classification accuracy improves significantly with the best feature set. Table 6 lists the classification accuracy for each disturbance; the overall accuracy exceeds 95%, which meets practical classification needs. Misclassified samples are concentrated mainly in the 20 dB signals and between disturbances with and without harmonics, indicating that strong noise interferes more with the identification of harmonics.
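Per-class accuracies of the kind reported in Table 6 follow from a confusion matrix by applying Equation (11) to each class separately. A minimal sketch, with hypothetical function names and toy labels (every class is assumed to have at least one sample, otherwise the division is undefined):

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_accuracy(cm):
    """Percentage of each class classified correctly (Equation (11) applied per class)."""
    return 100.0 * np.diag(cm) / cm.sum(axis=1)


y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1])
cm = confusion_matrix(y_true, y_pred, 2)
acc = per_class_accuracy(cm)              # 75.0 for class 0, 100.0 for class 1
overall = 100.0 * np.trace(cm) / cm.sum() # 87.5
```

The off-diagonal entries of `cm` show which disturbance pairs are confused, for example classes that differ only by the presence of harmonics.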

Real Signal Classification Verification
To further verify the feasibility of the proposed method on actual signals, this section tests the effectiveness of H-ELM on a set of measured signals. The data set, provided by the IEEE Power Engineering Society database [23,24] for PQD classification, has been used for power quality classification in [25] and meets the needs of this experiment. The signals are sampled at 256 points per cycle, and each signal is 1536 samples long. The waveforms are labeled one by one, and the data set is processed with the data enhancement method. The actual disturbance signals are shown in Figure 7.
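The data enhancement applied to the measured data set can be sketched as minority-class oversampling with the operations listed in this section (random cropping, moderate added noise and signal reversal). This is a sketch under assumptions, not the authors' procedure: cropping takes a random fixed-length window (1280 of the 1536 samples in the example, so the fundamental frequency is preserved), noise is added at 40 dB SNR, "reversal" is read as polarity inversion (time reversal, `x[::-1]`, is the other possible reading), and original signals are cropped to their first `win` samples so all rows have equal length. Function names are hypothetical.

```python
import numpy as np


def random_crop(x, win, rng):
    """Random fixed-length window; keeps the sampling rate and fundamental frequency."""
    start = rng.integers(0, x.size - win + 1)
    return x[start:start + win]


def augment(x, win, rng, noise_snr_db=40.0):
    """One augmented copy: random crop, optionally plus moderate noise and/or reversal."""
    seg = random_crop(x, win, rng)
    if rng.random() < 0.5:                          # moderate white noise at noise_snr_db
        p_noise = np.mean(seg ** 2) / 10.0 ** (noise_snr_db / 10.0)
        seg = seg + rng.normal(0.0, np.sqrt(p_noise), win)
    if rng.random() < 0.5:                          # reversal, read here as polarity inversion
        seg = -seg
    return seg


def balance(X, y, win, seed=0):
    """Oversample every class with augmented copies up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X[:, :win]], [y]                # originals cropped to the same length
    for k, c in zip(classes, counts):
        members = X[y == k]
        extra = [augment(members[rng.integers(c)], win, rng) for _ in range(target - c)]
        if extra:
            X_out.append(np.vstack(extra))
            y_out.append(np.full(len(extra), k))
    return np.vstack(X_out), np.concatenate(y_out)
```

For example, with 20 sag records and 5 flicker records, `balance(X, y, win=1280)` returns 20 signals of each class, 15 of the flicker signals being augmented copies.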

The actual power quality disturbance data are imbalanced. For example, voltage sags account for more than 80% of all disturbances. Because no previous research has addressed this problem, this paper proposes a data enhancement method to preprocess the data. In computer vision, data enhancement is often used to increase the number of training samples and thereby improve the generalization performance of the classifier. Here, to address the data imbalance, data enhancement is applied to disturbance types with little data, such as flicker, so that every disturbance type has an equal amount of data. The enhancement operations mainly include random cropping, adding moderate random noise, and reversing the signal. Randomly selected enhanced signals are then checked to ensure that they still belong to their disturbance type. The data enhancement operation is shown in Figure 8.

The measured data set is classified and verified according to the method in this paper. The optimal feature combination and classifier parameters are shown in Table 7.
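The enhancement operations described above, together with class equalization, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure:

- "Random cropping" keeps a random contiguous window of `win` samples.
- "Moderate random noise" is white Gaussian noise at a random SNR between 30 and 50 dB.
- "Reversing the signal" is taken to mean polarity inversion.
- Minority classes are topped up with augmented copies until every class matches the largest one.

The helper names `random_crop`, `add_noise`, `invert`, `augment` and `balance` are hypothetical. The random verification step is omitted.

```python
import math
import random


def random_crop(x, win, rng):
    """Keep a random contiguous window of `win` samples."""
    start = rng.randint(0, len(x) - win)
    return x[start:start + win]


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise at the requested SNR in dB."""
    p_signal = sum(v * v for v in x) / len(x)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in x]


def invert(x):
    """Polarity inversion (assumed meaning of 'reversing the signal')."""
    return [-v for v in x]


def augment(x, win, rng, snr_range=(30, 50)):
    """Crop, add moderate noise, and invert with probability 0.5."""
    y = random_crop(x, win, rng)
    y = add_noise(y, rng.uniform(*snr_range), rng)
    if rng.random() < 0.5:
        y = invert(y)
    return y


def balance(dataset, win, seed=0):
    """dataset: dict label -> list of equal-length signals (length >= win).

    Each class is filled with augmented copies until it matches the largest
    class; originals are only cropped, so every output has length `win`.
    """
    rng = random.Random(seed)
    target = max(len(sigs) for sigs in dataset.values())
    out = {}
    for label, sigs in dataset.items():
        samples = [random_crop(s, win, rng) for s in sigs]
        while len(samples) < target:
            samples.append(augment(rng.choice(sigs), win, rng))
        out[label] = samples
    return out


def sine(n, cycles, amp=1.0):
    return [amp * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]


# Toy imbalanced set (placeholder waveforms): 8 "sag" records vs 2 "flicker" records
data = {"sag": [sine(1536, 6, 0.5) for _ in range(8)],
        "flicker": [sine(1536, 6) for _ in range(2)]}
balanced = balance(data, win=1280)
print({k: len(v) for k, v in balanced.items()})  # {'sag': 8, 'flicker': 8}
```

Cropping every sample, originals included, keeps all outputs the same length, which a fixed-size feature extractor requires.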
Table 7. The H-ELM algorithm parameter setting.

Parameter | Numerical Value
Best feature set | µ_3, µ_7, σ_2, σ_7, RMS_4, RMS_6, KT_6, KT_7, KT_9, Ent_2, Ent_6, Ent_7, Ent_8, RG_1
H-ELM hidden layer nodes | N1 = N2 = 10, N3 = 290
L2 penalty P on the last ELM layer | 2^-30
Scale factor S | 0.8

The real signals were classified with five methods. The classification accuracy and algorithm running time of the five methods are shown in Table 8. The table shows that the proposed method achieves better classification performance on the measured data. The real-signal data set is small, so each disturbance type has few samples. Even after data enhancement, the data set contains only 1000 sets, which limits the improvement that the best feature set can bring to classification performance. In addition, real signals are more complex than simulated signals, so the accuracy on real signals is lower than on simulated signals.

The best feature set is selected by the feature selection algorithm. The disturbance classification results obtained by the proposed method are shown in Table 9. Without data imbalance treatment, the classification accuracy is 92%. After the data enhancement process eliminates the influence of data imbalance, the average recognition accuracy of the disturbances is 93%. The real-data classification results are lower than the simulation results. The main reasons are that the training data are limited, some records contain multiple disturbances, and the labeling is inaccurate. Overall, the method classifies disturbances well and can be adapted to disturbance classification in actual power grids.
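The role of the last-layer parameters in Table 7 can be illustrated with a minimal ELM classifier whose output weights are solved in closed form with an L2 penalty. This is a sketch, not the authors' H-ELM. Only the final ELM layer is shown; the stacked ELM autoencoder layers (N1, N2) are omitted. The toy uses 20 hidden nodes instead of N3 = 290, and the interpretation of both parameters is an assumption:

- P is treated as a ridge term added to the diagonal of HᵀH.
- S rescales the hidden pre-activations so that their largest magnitude equals S before a tanh activation.

This follows our understanding of commonly used H-ELM reference code. The helpers `solve`, `pre_activation`, `train_elm` and `predict` are hypothetical names.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + brow[:] for row, brow in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pre_activation(x, w):
    """x @ W + bias for every sample; the last row of w holds the biases."""
    d = len(w) - 1
    return [[sum(xi[k] * w[k][j] for k in range(d)) + w[d][j]
             for j in range(len(w[0]))] for xi in x]


def train_elm(x, y, n_hidden=20, penalty=2 ** -30, scale=0.8, seed=0):
    """Solve (H^T H + penalty * I) beta = H^T Y; the random input weights stay fixed."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(len(x[0]) + 1)]
    pre = pre_activation(x, w)
    s = scale / max(abs(v) for row in pre for v in row)  # largest |s * pre| equals `scale`
    h = [[math.tanh(s * v) for v in row] for row in pre]
    hth = [[sum(r[i] * r[j] for r in h) + (penalty if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    hty = [[sum(r[i] * t[c] for r, t in zip(h, y)) for c in range(len(y[0]))]
           for i in range(n_hidden)]
    return w, s, solve(hth, hty)


def predict(model, x):
    """Return the class index with the largest output score for each sample."""
    w, s, beta = model
    preds = []
    for row in pre_activation(x, w):
        h = [math.tanh(s * v) for v in row]
        scores = [sum(h[j] * beta[j][c] for j in range(len(h))) for c in range(len(beta[0]))]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds


# Toy two-class problem: two well-separated Gaussian blobs in 2-D
rng = random.Random(1)
x = ([[rng.gauss(-1, 0.3), rng.gauss(-1, 0.3)] for _ in range(50)]
     + [[rng.gauss(1, 0.3), rng.gauss(1, 0.3)] for _ in range(50)])
labels = [0] * 50 + [1] * 50
y = [[1.0, 0.0] if c == 0 else [0.0, 1.0] for c in labels]
model = train_elm(x, y)
acc = sum(p == c for p, c in zip(predict(model, x), labels)) / len(labels)
print(f"training accuracy: {acc:.2f}")
```

Training consists of one random projection and one linear solve, with no iterative fine-tuning. This is the property the paper relies on for its speed advantage.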

Conclusions
To identify complex power quality disturbances, a method for fast and accurate identification of complex power quality disturbances based on DWT and H-ELM is proposed. The simulation results lead to the following conclusions.
(1) Feature extraction is performed by DWT, and the PSO feature selection algorithm selects the feature combination with the best classification performance. A network with good generalization performance can be trained on the selected features.
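The DWT feature-extraction step in conclusion (1) can be sketched as a multi-level decomposition followed by per-band statistics. The sketch is illustrative. It uses a Haar wavelet so it needs no external library, while the paper's wavelet family and decomposition depth are not given here. The depth of 8 levels, giving bands 1 to 9 with band 9 as the final approximation, is an assumption.

The mapping of Table 7's feature names to statistics is also our reading, not the paper's definition:

- µ → mean (keyed `mu`)
- σ → standard deviation (keyed `sigma`)
- RMS → root mean square
- KT → kurtosis
- Ent → Shannon entropy of the normalized coefficient energies
- RG → range

```python
import math


def haar_dwt(x, levels):
    """Multi-level Haar DWT. Returns [d1, ..., dL, aL]: detail bands, then the final approximation."""
    coeffs = []
    a = list(x)
    r = 1 / math.sqrt(2)
    for _ in range(levels):
        if len(a) % 2:
            a = a + [a[-1]]  # pad an odd-length input by repeating the last sample
        half = len(a) // 2
        d = [(a[2 * i] - a[2 * i + 1]) * r for i in range(half)]
        a = [(a[2 * i] + a[2 * i + 1]) * r for i in range(half)]
        coeffs.append(d)
    coeffs.append(a)
    return coeffs


def band_features(c):
    """Statistics of one coefficient band (assumed meanings of the Table 7 names)."""
    n = len(c)
    mu = sum(c) / n
    var = sum((v - mu) ** 2 for v in c) / n
    rms = math.sqrt(sum(v * v for v in c) / n)
    kt = sum((v - mu) ** 4 for v in c) / n / var ** 2 if var > 0 else 0.0
    energy = [v * v for v in c]
    total = sum(energy)
    ent = (-sum(e / total * math.log(e / total) for e in energy if e > 0)
           if total > 0 else 0.0)
    return {"mu": mu, "sigma": math.sqrt(var), "RMS": rms,
            "KT": kt, "Ent": ent, "RG": max(c) - min(c)}


def dwt_features(x, levels=8):
    """Flatten per-band statistics into names such as 'RMS_4' (band 1 = d1)."""
    feats = {}
    for j, band in enumerate(haar_dwt(x, levels), start=1):
        for name, value in band_features(band).items():
            feats[f"{name}_{j}"] = value
    return feats


n = 1536
x = [math.sin(2 * math.pi * 6 * k / n) for k in range(n)]  # six cycles of a clean sine
f = dwt_features(x)
print(len(f), f["RMS_4"], f["KT_9"])
```

Each record yields 9 bands × 6 statistics = 54 candidate features. A selector such as PSO then reduces this pool to a compact subset like the 14 features listed in Table 7.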

Figure 2. Classification flowchart based on the H-ELM algorithm.

Figure 4. Classification accuracy histogram of the five methods. PCASVM, principal component analysis and support vector machines.

Figure 5. Test and training time histogram.

Figure 6. Test and training accuracy histogram.

Table 1. Original feature set.

Table 3. Algorithm running time comparison. SNR, signal-to-noise ratio; PNN, probabilistic neural networks.

To verify the classification performance of the multilayer perceptron extreme learning machine, we compared it with five other existing methods. The comparison results are shown in Table

Table 4. Comparison of classification accuracy of five methods.

The principal component analysis and support vector machines (PCASVM) algorithm also achieves good PQD recognition accuracy, but it trains slowly: for the same amount of data, its running time is 30 times that of the H-ELM algorithm. This demonstrates the efficiency of H-ELM.

Table 5. Comparison between the PSO-H-ELM and H-ELM algorithms.

Table 6. Classification results of PSO-H-ELM for each disturbance type.

Table 8. Comparison of classification accuracy and algorithm running time of five methods.

Table 9. Real signal classification results.