Power Quality Disturbances Recognition Based on a Multiresolution Generalized S-Transform and a PSO-Improved Decision Tree

In a microgrid, the distributed generators (DG) can power the user loads directly. As a result, power quality (PQ) events are more likely to affect the users. This paper proposes a Multiresolution Generalized S-transform (MGST) approach to improve the ability of analyzing and monitoring the power quality in a microgrid. Firstly, the time-frequency distribution characteristics of different types of disturbances are analyzed. Based on the characteristics, the frequency domain is segmented into three frequency areas. After that, the width factor of the window function in the S-transform is set in different frequency areas. MGST has different time-frequency resolution in each frequency area to satisfy the recognition requirements of different disturbances in each frequency area. Then, a rule-based decision tree classifier is designed. In addition, particle swarm optimization (PSO) is applied to extract the applicable features. Finally, the proposed method is compared with some others. The simulation experiments show that the new approach has better accuracy and noise immunity.


Introduction
Recently, the fast growing microgrid technology provides a good solution to solve the problem of large scale distributed generator coupling.However, compared with the traditional grid, power quality in microgrids has become the subject of interest for utilities as well as energy producers or prosumers [1].In a traditional grid, power quality can be improved by centralized treatments in substations and it has little influence on the end users, but in microgrids, especially islanded ones, the distributed generators directly power the user loads.Furthermore, the capability of microgrids is relatively small and there exist high proportions of nonlinear and unbalanced loads, so microgrids are more likely to be troubled by PQ (power quality) events, such as harmonics, transient disturbances and so on [2].These disturbances can lead to serious impacts on power equipment and users' important production processes, which may cause economic damage [3], so monitoring and detailed analysis for power quality are necessary to manage power quality and monitor the status of the devices in microgrids.
Power quality disturbance classification is a basis of analysis and control of power quality, which plays an important role in transient analysis and monitoring of power electronic devices.What's more, through the analysis of PQ disturbances, a large amount of information can be mined to contribute to locating faults, detecting the operation state of systems and troubleshooting [4][5][6].Along with electric power technology advances, the monitoring and the analysis of the power quality status will be developed gradually from generation and transmission system to distribution system.As a result, the accuracy and instantaneity of disturbance recognition is much more important now.Additionally, the research now focuses on complex power quality disturbances recognition instead of simplex disturbances [7].
The power quality disturbance classification procedure is always composed of two parts.The first part is signal processing and the second part is pattern recognition [4,5].In an actual operating process, more than one type of disturbance may take place at the same time.That is what we call a complex disturbance.This results in a higher need for signal processing.Common approaches include short-time Fourier transform (STFT) [8], wavelet transform (WT) [8], S-transform (ST) [9], Generalized S-transform (GST) [10] and Hilbert-Huang Transform (HHT) [11].WT is a well-developed signal processing method.However, its effect in practical application relies on the choice of base functions and decomposition scale.ST and GST, which are conditioned by the Heisenberg uncertainty principle, have nevertheless been widely used in power quality disturbance classification.When disposing complex disturbances, time resolution and frequency resolution can't be satisfied at the same time.HHT has recently become a popular method because of its full self-adaptability.There exist many problems to solve in HHT, although some progresses have been reached in power quality disturbance classification.For example, end effects, mode mixing problem and bad real-time feature caused by large amount of computation.Pattern recognition techniques include Support Vector Machine (SVM) [12], Artificial Neural Network (ANN) [12], k-Nearest Neighbor (KNN) [13] and decision tree (DT) [14], etc.This article chooses DT to design the classifier.Compared with the other approaches, DT can be carried out more easily.Meanwhile, it also has high efficiency that can satisfy the demands of situations when real-time performance is highly needed, although it's worth mentioning that DT's classification effect depends on the feature selected.In the considered situation the classification ability of the feature is great.As a result, DT can be applied.This paper extracts six features from the original signal and S-matrix.In order to strengthen the classification ability of the features, particle swarm optimization (PSO) is presented to optimize the parameter.PSO is a swarm intelligence method which originates from birds flock's behavior when looking for food [15,16].The problem discussed in this paper is a one-dimensional continuous optimization problem.Since the traditional methods, analysis methods and methods of exhaustion are not suitable here because the calculation process is complicated, PSO can effectively decrease the computation time, and achieve a solution with a high fitness value.

Generalized S-Transform
Stockwell proposed the S-transform in 1996 [17].As an extension to short-time Fourier transform and continuous wavelet transform, it can be expressed as follows: The input signal is () ht , which after the S-transform will become ( , ) where ( , ) w t f is the Gaussian window function, and is the width of window.
The inverse S-transform can be obtained by utilizing the inverse Fourier transform, which is expressed as: As shown above, considering a discrete signal, T is the sampling interval, N is the total number of sample points.As , f n NT kT    , the discrete form of S-transform can be expressed as: With Equation (1), we can see that the S-transform is different from STFT, as the height and the width of gauss window vary with changing frequency, thus it overcomes the defect that the window height and width of the short-time Fourier transform are constant.The S-transform result is a two-dimension matrix, which is defined as the S-matrix.The S modulus matrix can be obtained by modulus arithmetic, whose columns reflect amplitude frequency characteristics of the signal in some certain time, and rows describe time-domain distribution of the signal in the particular frequency.Different frequency components of the non-stationary signal with distortion will produce different time-frequency distribution characteristics, whereby the high frequency area of a signal changes dramatically, and the low frequency area has a relatively stable change [18].It is necessary to adjust the window width of the Gaussian window according to the signal analysis requirement, that is, a wider time window in the high frequency area while are more narrow time window can be used in the low frequency area.The width of the S-transform window will be invariant after confirmation of the window function, thus its application is limited and some scholars have thus put forward a generalized S-transform [10], with the width factor of window function  , that is, making 1 () f f   .With the adjustment of the value  , the variation rate of width of window varies and the Gaussian window function is expressed as: Then generalized S-transform is obtained as: ( , ) ( ) 2 According to the Equation ( 8), the discrete form of the generalized S-transform is expressed as: , 0 where: G(m,n) is obtained from the Gaussian window function by FFT. is a constant.When =1  we have a standard form of the S-transform, in other words, the S-transform is a special case of the generalized S-transform.In the generalized S-transform, the time (or frequency) resolution of the time-frequency spectrum can be improved by changing the value of  , whereby small values of  correspond to high frequency resolution and large values of  corresponds to high time resolution.Considering the contrariety between the time resolution and the frequency resolution, in the application of the generalized S-transform, we should choose an appropriate  value according to the actual situation.
The existing methods to recognize PQ disturbance by using the generalized S-transform generally adjust whether the harmonic components exist.When analyzing the signal with harmonic components, a smaller  is chosen to obtain a higher frequency resolution.Otherwise, when analyzing the signal without harmonic components, a larger  is chosen to obtain a higher time resolution, but, when we utilize GST to analyze complex disturbances with harmonic components as well as fundamental frequency disturbances (such as harmonics with voltage sag, harmonics with voltage swell, etc.), the GST adopts a smaller  value.For sag or swell in the complex disturbance signals, GST performs poorly, which affects the classification accuracy.

Multiresolution Generalized S-Transform
General power quality disturbance signals include voltage sag, voltage swell, voltage interruption, flicker, impulse, notch, harmonics and oscillatory transients, etc.According to the time-frequency distribution characteristics of different disturbance signals by S-transform, voltage sag, voltage swell, voltage interruption and flicker are concentrated in the low frequency area, which can be analyzed by the change of the fundamental frequency amplitude, utilizing higher time resolution.The energy of harmonic signals are distributed in the harmonic frequency area (generally considering odd harmonics under the 13th), which should adopt higher frequency resolution to overcome sidelobe effects.The disturbance energy of the transient oscillation is distributed in the frequency area higher than the harmonic frequency.An adaptively adjusted  is needed to avoid different kinds of influence including the high frequency energy of other types of disturbance signal and noise.The impulse and notch disturbances occur near the fundamental frequency as well as in the harmonic frequency area, which can be separated from other types by the features of the original signal.
In order to improve the complex disturbance classification ability of the generalized S-transform, the paper proposes MGST.The GST modulus matrix is divided into three areas of low, middle and high frequency.According to the need of disturbance classification in different frequency areas, different  values are used.The low frequency area ranges from 1 to 100 Hz, which is used to analyze disturbances including sag, swell, interruption, flicker, impulse and notch.The middle frequency area ranges from 101 to 700 Hz, and is used to classify the harmonic components of the disturbance.The part above 700 Hz is the high frequency area, which is used to classify transient components.In order to meet the requirement of the complex disturbance analysis, we need to set different  values in different frequency areas.

The Setting of the Width Factor  in the Low Frequency Area
Because signals are processed with different width factors in different frequency areas, there is no need to consider the conflict between time and frequency resolution in different frequency areas.Thus the width of a low frequency window can be narrower than GST.Generally, the amplitude factor of fundamental frequency is used to describe the change of the fundamental [19].Figure 1 compares the situations of different width factors acting on the amplitude factor of fundamental frequency between sag (black star) and notch (red dot).As shown in Figure 1, when the width factor is increased appropriately, the number of the cross samples declines significantly.Through statistical experiments, the width factor in low frequency area is set as LF 5  ( LF  represents  of low frequency).The window width in middle frequency area can be wider than GST so that the classification ability of the harmonic component can be improved to overcome the sidelobe effect.Using the same harmonic disturbance signal, Figure 2 shows the maximum spectrum amplitude curves with different width factors.As shown in Figure 2, the 5th harmonic, which can't be recognized by traditional ST, can be recognized when the width factor is reduced suitably.Through the statistical experiments, the width factor is selected as MF 1/ 3  ( MF  represents  of middle frequency).The setting of the width factor in the high frequency area aims to reflect the transient component feature.Interruption and sag distribute in the whole frequency range, whose energy changes dramatically near the edge of distortion, so they may be mistakenly identified as the signal containing a transient disturbance.At the same time, a signal with noise may be identified as the signal containing a transient component.However, the requirements in these two cases are different obviously.On the one hand, we should maintain the high frequency energy of the transient to distinguish between sag, interruption and the transient disturbances.In this case, MGST need a higher time resolution.On the other hand, higher frequency resolution is needed to suppress any noise.Therefore, for the width factor in the high frequency area, MGST gives an adaptive setting.Based on the analysis of the fundamental frequency through the Fourier spectrum ( F A ), we can judge whether the signal contains a fundamental frequency disturbance (sag or interruption).In this paper, when 0.997pu 1.003pu ( pu represents per unit), we decide that there is no fundamental frequency disturbance and make .In this way, by adaptively adjusting the width factor in the high frequency area we can meet different requirements of the window width in different situations.Figure 3 is the time-frequency contour in the high frequency area, when the same transient signal without disturbance in fundamental frequency is analyzed with different width factors.As is shown, when the window width increases appropriately, the anti-noise performance is improved significantly.

PQ Disturbance Classifier Based on Decision Tree
DT recognizes different PQ disturbances by turning a complex classification problem into some binary classification problems.It can be carried out easily and has a high classification efficiency [14].As the disturbance samples concerned in this paper at each decision node can be differentiated by merely one feature, DT is undoubtedly the best choice.
As a matter of fact, the noise level in power system is not fixed.In [20,21], waveforms are generated through power system simulation which is significant.However, taking various circumstances into consideration, this paper analyzes the PQ disturbance signals simulated by MATLAB 7.0 software with SNR (signal to noise ratio) evenly distributed from 30 to 50 dB.The model references [13,22,23].The models are given in Table 1.In real power systems, the frequency may change with the fluctuating active power.However, the frequency variation is generally restricted no more than 0.2 Hz (when the capacity is small, the value is 0.5 Hz).The S-transform in this paper divides the frequency domain by 1 Hz so the method is reasonable.What's more, the power quality disturbances researched in this paper usually refer to voltage disturbances occurring in a short time, so there is a negligible effect on the experiment especially after the S-transform method and the assumption is acceptable.The sampling rate is 3.2 kHz.This paper analyzes 13 kinds of disturbances.The types of disturbances and the corresponding class labels considered are denoted in Table 2.For each kind of disturbance 2000 samples are generated.Half of them are used to train the DT, the others are used to test the DT's accuracy.

Disturbance Class Modeling Equations Equations' Parameters
Pure Signal

The Structure of DT
The structure of DT classifier is shown in Figure 6.
Features 1 and 2 are both selected from original signal, by which we can distinguish whether a sample signal's energy increases or decreases.
Feature 3: Standard deviation of amplitude of fundamental frequency, we call it Fstd σ : T " in the high frequency area.s T is utilized to reduce the impact of noise, and in this paper we call it "cut-off threshold".By setting the cut-off threshold " s T ", the noise resistant ability of classifier is improved without increasing the system complexity, however, it is difficult to determine a reasonable value.In next part, the paper proposes a modified PSO method to solve this problem.

Background
In order to improve the noise resistance ability of the feature, this paper proposes a modified high frequency area energy as Feature 6 to judge whether there exists a transient component in the high frequency area in a sample signal.The main principle is that a cut-off threshold is presented to easily filter the noise in the high frequency area.It is proved that this method effectively improves the accuracy of recognition.Meanwhile, it can be carried out easily without influencing the reliability.
Nevertheless, there is certain difficulty to determine the cut-off threshold.On one hand, choosing too large a value may give rise to excessive energy reduction in some slight transient samples that results in failure.On the other hand, too small a value can't achieve the expected targets of resisting noise.What's more, large amount of samples are needed to enumerate kinds of disturbance signals with different parameters.When a certain " s T " is selected, every sample in sample set should be processed by MGST.As a result, the amount of computations is very huge.Optimization problem about the cut-off threshold in this paper is a one-dimensional continuous optimization problem whose objective function can be expressed as s () y f T  .Thereinto, y represents false recognition rate that is a function depending on independent variable " s T ".The function can be realized through programming.So the problem can be expressed as follows: S is the value range of cut-off threshold " s T ". s () y f T  can be achieved through programming basing on MGST and feature calculation.On account of the complex computation process, an untraditional computational intelligence technology must be chosen to solve the problem.

The Search of Optimized Threshold via Modified PSO
Particle swarm optimization (PSO) was proposed in 1995 by Kennedy and Eberhart [15,16].It is a bionic algorithm which imitates birds flock's looking for food.PSO shows a strong charm in various kinds of problems because of easy realization, less parameters and simple concept.The problem concerned in this paper can be solved effectively by PSO without wasting too much time.

Basic Fundamentals of PSO
The basic fundamentals of PSO can be described as follows: A swarm constituted by m particles is flown through the D-dimensional search space of the problem.Each particle is regarded as a unit without volume whose velocity is adjusted dynamically based on its own experience and its companions' experience.On that basis, its position (that is, the value of cut-off threshold " s T ") changes.Thereinto, the position of the i th particle is expressed as 12 ( , , , ) x x x  and the velocity of the i th particle is expressed as 12 ( , , , ) ,1 im  .The best position of the i th particle so far is represented as In the k th iteration, the position and velocity in the d th dimension (1 dD  ) of the particles are updated according to the formulas as follows: The parameter w represents inertia weight, whose value is always set to 1 in the basic PSO.The parameters 1 c and 2 c represent acceleration coefficient.In the basic PSO, it's often the case that 1 c and 2 c are all equal to 2. In the formula, 1 rand and 2 rand are two pseudo-random numbers between 0 and 1.The velocity of particles are limited to max v .After several iterations, the value of " s T " is refreshed again and again with the moving of the particles.Along with the changing of " s T ", the false recognition rate is reduced at the same time.

The Improvement of PSO
The above-mentioned PSO is a basic PSO.Many scholars have done a lot of work on different parts of PSO that include population size [24], inertia weight [25], neighborhood topology [26] acceleration coefficient [27,28] and so on.According to the real conditions in this paper, we make improvements of PSO as follows: Improvement of the inertia weight w Inertia weight is a very important parameter in PSO that directly affects the balance of global and local searching ability.A large inertia weight promotes global searching ability while a small inertia weight promotes local searching ability.
According to the requirements of the problem concerned in this paper, the authors designed a fuzzy control system to adaptively adjust the inertia weight.The best fitness value at present, known as   f y, and the inertia weight at present are selected as inputs.The percentage change of the inertia weight is selected as output.Because different problems have different ranges of fitness value, Equation ( 20) is In Equation ( 20  In this way, the inertia weight is adjusted adaptively on the basis of the current situation during each iteration.The adjusting strategy is shown in Table 3 below.HIGH HIGH LOW Take RULE1 as an example.It expresses that percentage change of the inertia weight should be attached to the MEDIUM fuzzy set (that is, there is no need to change it too much) if the best fitness value and inertia weight at present relatively belong to the LOW fuzzy set.In engineering, there is always too much noise in PQ disturbance signals.Through this method, the local best solution caused by noise is avoided by adjusting the inertia weight.The anti-noise effects and robust performance are enhanced and that is good for improving accuracy.

Improvement of the acceleration coefficient
The acceleration coefficients 1 c and 2 c reflect the communication between the particles.A large 1 c leads to the particles relying on their own experience.As a result, the particles hover in the area near themselves.However, a large 2 c leads to all particles rapidly moving towards the best unit at present.
In this case, the particles may converge on local best solution at the beginning of the algorithm.
To balance the contradiction, 1 c and 2 c are usually set to be the same value.Nevertheless, this can't satisfy the actual needs sometimes.For the purpose of having a larger 1 c and smaller 2 c at the beginning of the algorithm, decreasing 1 c and increasing 2 c , adjustment as follows is utilized: _ max Hence, at the beginning of the algorithm, the particles tend to search for a better s T in the whole search space.At the end of the algorithm, the particles tend to move towards the best unit of the swarm.By this means, computation time is reduced, but no solutions would be left out.
In the formula, 1i c and 1 f c are the initial and final value of 1 c .Analogously, 2i c and 2 f c are similar to 1i c and 1 f c .I is the current iteration time and _ max I is the maximum number of iteration.In [27], they change symmetrically, that is 1 c decreases linearly from 2.5 to 0.5 and 2 c increases linearly from 0.5 to 2.5.This method behaves well concerning single hump functions but not ideally concerning multi-peak functions, and it may lead to premature convergence.To solve the problem, [28] proposes asymmetric changing.It's found that when 1 c decreases linearly from 2.75 to 1.25 and 2 c increases linearly from 0.5 to 2.25, most of the functions can be solved well, so this paper selects this method to adjust the acceleration coefficient.In this paper, the false recognition rate of different cut-off thresholds s T is chosen as the fitness function.The simulation disturbances are generated by MATLAB and then white Gaussian noise is added to make the SNR distribute from 30 to 50 dB.When a certain cut-off threshold s T is input, the values of the features are calculated by MGST, so the false recognition rate corresponding to the certain s T is acquired.The fitness function is realized through programming on the basis of the abovementioned way and then the modified PSO is utilized to optimize the cut-off threshold s T .
After multiple tests, 0.0192 is taken for the optimal s T .

Simulation and Experiment
The simulation signals of different SNR are generated to test the new method.There are 1000 signals in each group whose SNR are 30~50 dB, 30 dB, 40 dB and 50 dB.Decision trees based on ST [9] and GST [10] are also made for comparison with MGST.The result for 30~50 dB is shown in Table 4 and the results of 30 dB, 40 dB and 50 dB are shown in Figure 8.As shown in Table 4 and Figure 8, MGST does better than ST and GST when recognizing all kinds of disturbances in different noise levels.What's more, consistent with the theory, complex disturbances are recognized much better by MGST.Future works will focus on further optimizing the width factor of the window function in different frequency areas and reducing the computational burden of MGST.

Figure 2 .
Figure 2. The width factor's impact on the feature in the middle frequency area.

HF 1 / 3
 ( HF  represents  of high frequency); otherwise, there exists a fundamental frequency disturbance and HF 2 

3. 1 .
The Analysis of Disturbance Signal via MGST The time-frequency contour plots of different disturbance signals via MGST are shown in Figure 5.The first part of each disturbance shows the plots of the disturbance signal.The second part is the time-frequency contour that presents frequency values versus time values for the S-matrix.The third part, called fundamental frequency amplitude plot, presents amplitude of fundamental frequency versus time values.The fourth part, called frequency-mean amplitude plot, presents mean amplitudes versus normalized frequency values, and the values in these plots are obtained by calculating the mean value of each row of the S-matrix.The fifth part, called frequency-maximum amplitude plot, presents maximum amplitudes versus normalized frequency values, and the values in these plots are obtained by searching the maximum value of the rows of the S-matrix at every frequency.The sixth part, called frequency-standard deviation plot, shows standard deviations (Std) versus normalized frequency values, and the values in these plots are obtained by searching the rows of the S-matrix at every frequency.As far as simplex disturbances concerned, features of voltage sag, swell, interruption, flicker, notch and impulse are mainly in the low frequency area.The distortion characteristics of the fundamental frequency are different from the others.Harmonics are mainly in the middle frequency area.Transients are mainly in the high frequency area.Complex disturbances have different features in all three frequency areas.Considering the characteristics of MGST, features are extracted from different frequency areas to recognize disturbances, and in this way more targeted features can be acquired.
), max f and min f are decided by the specific problem.It is necessary to confirm or estimate their values.Three membership functions (left, middle and right) are presented as Equations (21)-(23) to provide the inputs corresponding to the three fuzzy sets: LOW, MEDIUM and HIGH where a and b are parameters determined by the practical problems.The curves of the membership functions are shown in Figure 7:

Figure 7 .
Figure 7.The curves of three membership functions.

Table 2 .
Types of PQ disturbances.

Table 3 .
The adjusting strategy of inertia weight.

Table 4 .
Accuracy in percentage using different techniques in 30 to 50 dB.