Tsallis Wavelet Entropy and Its Application in Power Signal Analysis

As a novel data mining approach, a wavelet entropy algorithm performs entropy statistics on wavelet coefficients (or reconstructed signals) at various wavelet scales, on the basis of wavelet decomposition and entropy statistics theory. Shannon wavelet energy entropy, one kind of wavelet entropy algorithm, has been widely studied and applied in many areas since it was introduced. However, because wavelet aliasing occurs after wavelet decomposition, the information set of wavelet decomposition coefficients (or reconstructed signals) at different scales is non-additive to a certain extent, and Shannon entropy, which is better suited to extensive systems, cannot give accurate uncertainty statistics on the wavelet decomposition results. As a result, transient signal features can be extracted incorrectly when Shannon wavelet energy entropy is used. The problems in the feature extraction of transient signals by Shannon wavelet energy entropy are discussed in depth from two aspects: its theoretical limitations and the negative effects of wavelet aliasing on extraction accuracy. To address these defects, a novel wavelet entropy named Tsallis wavelet energy entropy is proposed, which uses Tsallis entropy instead of Shannon entropy, and it is applied to the feature extraction of transient signals in power systems. Theoretical derivation and experimental results show that, compared with Shannon wavelet energy entropy, Tsallis wavelet energy entropy can reduce the negative effects of wavelet aliasing on the accuracy of feature extraction and extract the transient signal features of power systems accurately.


Introduction
Transient feature extraction is an important part of signal analysis, and as a new feature extraction algorithm, wavelet entropy has attracted the attention of experts and scholars all over the world. Wavelet entropy combines wavelet decomposition and entropy statistics theory, and it offers both multi-resolution analysis and complexity evaluation for time-varying signals, which means that the macro and micro aspects of some special signals can be studied in the time-frequency domain. For these reasons, wavelet entropy has gradually been adopted in engineering signal analysis such as electroencephalography (EEG) testing, machinery vibration detection, power system fault diagnosis and so on [1][2][3]. Various studies have also demonstrated that wavelet entropy performs better than traditional methods in analyzing the variability and complexity of climate processes [4][5][6]. For estimating the cognitive workload of human brains, wavelet entropy is used to extract variation features from EEG signals, and for a subject-independent multi-channel classification scheme, the entropy features achieved high accuracy, up to 98% for channels from the frontal lobes, in the delta frequency band [7]. Bing et al. [8] presented a sensor fault detection method based on wavelet entropy and applied it to fault diagnosis of micro-gas turbine engine sensors. Notably, Zhenyou et al. [9,10] introduced wavelet entropy to the signal analysis of power systems, which makes it possible for some special power signals, including voltage dips, voltage interruptions, voltage flicker, voltage pulses and so on, to be analyzed more extensively and deeply than before.
However, most research on wavelet entropy concerns engineering applications of Shannon wavelet energy entropy (SWEE), while its physical meaning, working mechanism, and application principles have not yet been well discussed. In fact, Shannon wavelet energy entropy has some disadvantages in processing non-stationary signals, which can lead to inaccurate or wrong results.
In view of the above, this paper analyzes the theoretical basis of SWEE, with emphasis on its working mechanism for feature extraction of transient signals. The negative effect of wavelet aliasing on feature extraction with Shannon wavelet entropy is discussed to find the primary reason for the problem.
Because Tsallis entropy is good at expressing the uncertainty of generalized systems, a wavelet entropy that combines Tsallis entropy with wavelet decomposition, Tsallis wavelet energy entropy (TWEE), is constructed, and the physical meaning of its algorithm is explained. Starting with an analysis of wavelet aliasing and its effects on feature extraction, this paper studies the working mechanism and suitable scope of TWEE, analyzes the relationship and difference between TWEE and SWEE, and provides an initial principle for selecting the nonextensivity index. Finally, to analyze transient voltage fluctuations caused by non-faulty indirect lightning strikes, TWEE is applied to transient feature extraction of lightning strikes. The voltage data recorded when a non-faulty indirect lightning strike took place on a 110 kV overhead distribution line in Guangdong, China, are collected and analyzed using TWEE, and feature extraction based on TWEE is compared with that based on SWEE. The experimental results show that TWEE is better than SWEE in analyzing transient signals.

The Definition of SWEE
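Shannon wavelet entropy (SWE) is the joint product of wavelet decomposition and Shannon entropy theory. By changing the segment of wavelet coefficients taken at the corresponding scales, and by applying Shannon entropy to different statistical objects, several derived algorithms have been proposed based on SWE theory. Among these, SWEE is one of the most widely used [10]. The definition of SWEE is given as follows. First, a sliding window of width $w \in \mathbb{N}$ is defined upon the wavelet coefficients $D = \{d_j(k),\ k = 1, \ldots, L;\ j = 1, \ldots, S\}$. Then

$$E(m) = \sum_{j=1}^{S} E_j(m)$$

represents the energy sum of $D$ in the sliding window, where

$$E_j(m) = \sum_{k=m\delta+1}^{m\delta+w} |d_j(k)|^2$$

represents the energy branch at scale $j$. At the moment $w/2 + m\delta$, SWEE is defined as

$$W_{SWEE}(m) = -\sum_{j=1}^{S} p_j(m) \log p_j(m), \qquad p_j(m) = \frac{E_j(m)}{E(m)} \tag{1}$$

where $L$ is the length of $d_j(k)$, $\delta$ is the sliding factor, $S$ is the maximum scale number, and $m$ is the number of sliding steps.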
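As a rough illustration of the definition above, the following sketch computes SWEE for a single window position. It assumes the wavelet coefficients are already available as a list of rows `coeffs[j][k]` (scale `j`, sample `k`, 0-based), uses the natural logarithm, and takes the window to start at sample `m * delta`; the function names are hypothetical and the wavelet decomposition itself is not shown.

```python
import math

def window_energies(coeffs, m, w, delta):
    """Energy E_j(m) of every scale inside the window that starts at m*delta."""
    start = m * delta
    return [sum(d * d for d in row[start:start + w]) for row in coeffs]

def swee(coeffs, m, w, delta):
    """Shannon wavelet energy entropy (in nats) for window position m."""
    energies = window_energies(coeffs, m, w, delta)
    total = sum(energies)
    if total == 0.0:
        return 0.0  # assumption: an empty window carries no uncertainty
    probs = [e / total for e in energies]
    # Terms with p = 0 are skipped, following the convention p*log(p) = 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Toy coefficient matrices (4 scales, 8 samples each), not real wavelet output.
flat = [[1.0] * 8 for _ in range(4)]                    # energy spread evenly
single = [[1.0] * 8] + [[0.0] * 8 for _ in range(3)]    # energy in one scale

swee_flat = swee(flat, m=0, w=4, delta=1)
swee_single = swee(single, m=0, w=4, delta=1)
```

With the energy spread evenly over four scales the value reaches its maximum, log 4, and with all energy in one scale it is zero, which matches the reading of SWEE as a complexity index.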

The Theoretical Defects of SWEE
From a macro perspective, SWEE is data mining of wavelet coefficients (or reconstructed signals) based on Shannon entropy statistics, so Shannon wavelet entropy inevitably inherits the statistical properties of Shannon entropy. As is well known, Shannon entropy is the extension of Boltzmann-Gibbs (B-G) entropy to information theory. As an important index in thermodynamics, B-G entropy measures the uncertainty of an N-energy-level system, and is defined as

$$S = -k \sum_{i=1}^{N} p_i \ln p_i \tag{2}$$

where $p_i$ represents the probability of the $i$-th system state and $k$ is the Boltzmann constant. Inspired by this, Shannon introduced B-G entropy into information theory to measure information loss, and defined the information entropy (Shannon entropy) as follows: suppose that a probability measure on $X$, as an $N$-tuple of numbers, is $p_i$ ($i = 1, 2, \ldots, N$) satisfying $p_i \ge 0$ and $\sum_{i=1}^{N} p_i = 1$; then

$$H = -\sum_{i=1}^{N} p_i \log_a p_i \tag{3}$$

with the convention that $p_i \log_a p_i = 0$ when $p_i = 0$ [11,12]. In addition, when $a$ = 2, e, or 10, the unit of $H$ is the bit, nat or hart, respectively. Equations (2) and (3) show a strong connection between Shannon entropy and B-G entropy. In some ways, Shannon entropy naturally inherits the statistical properties of B-G entropy, such as convex linearity, extensivity, continuity, and so on, so Shannon entropy also satisfies the following Equation (4):

$$H(A + B) = H(A) + H(B) \tag{4}$$

where $A$ and $B$ are two subspaces of one information set. In other words, the statistical range of Shannon entropy is restricted to additive signals, just as B-G entropy only fits extensive systems in thermodynamics [13].
It is well known that Boltzmann-Gibbs statistics (BGS) are inadequate for treating some complex systems [14]. These are systems with complex or long-term interactions and correlations, systems whose distribution laws often differ from the usual ones (Gauss, Poisson), and systems in chaotic or fractal states, which are often related to nonextensive phenomena in energy, entropy, heat, and other quantities [15]. In the process of wavelet decomposition, energy leakage exists at each scale, and the leaked energy invades neighboring scales. At the same time, the extent of interaction among neighboring scales changes as the components of the signal change. The wavelet decomposition result therefore belongs to the complex systems mentioned above. In other words, Equation (4) does not hold when Shannon entropy is used for statistics on a wavelet decomposition, which means SWEE is not entirely applicable to transient signals whose components change in the time-frequency domain, so it is necessary to discuss this issue in detail and solve it.
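The additivity property can be checked numerically. The sketch below assumes that the two subspaces A and B are statistically independent, so that their joint distribution is the product of the marginals; this reading of Equation (4) is an interpretation, and the helper names are illustrative. It also shows a fully correlated pair, for which the joint entropy falls short of the sum.

```python
import math

def shannon(probs, base=2.0):
    """Shannon entropy; zero-probability terms are skipped by convention."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

def joint_independent(pa, pb):
    """Joint distribution of two independent sources (outer product)."""
    return [x * y for x in pa for y in pb]

pa = [0.5, 0.25, 0.25]
pb = [0.9, 0.1]

# Independent subsystems: the joint entropy equals the sum of the parts.
h_joint = shannon(joint_independent(pa, pb))
h_sum = shannon(pa) + shannon(pb)

# Fully correlated binary pair: both marginals are (0.5, 0.5), i.e. 1 bit each,
# but the joint distribution only has two equally likely outcomes.
h_corr = shannon([0.5, 0.0, 0.0, 0.5])
h_corr_marginals = shannon([0.5, 0.5]) + shannon([0.5, 0.5])
```

For the correlated pair the joint entropy is 1 bit while the marginals add up to 2 bits, which is the kind of interaction between parts that the additive form cannot account for.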

Energy Leakage and Wavelet Aliasing
The result of analyzing a signal with SWEE depends on the consistency between the wavelet decomposition result and the corresponding components of the signal, so selecting a proper orthogonal wavelet basis to obtain accurate coefficients or reconstructed signals is key to the accuracy of SWEE. However, for most mother wavelets, some wavelet aliasing still exists between neighboring scales, which means that the information and energy of the signal are incomplete at the decomposition scales; wavelet aliasing is thus the main reason for energy leakage. To better understand energy leakage, the mechanism of wavelet aliasing is examined as follows. Wavelet aliasing occurs when the frequency bands corresponding to the wavelet scales overlap each other after wavelet decomposition. The continuous wavelet transform is defined as follows. For $x(t) \in L^2(\mathbb{R})$, the continuous wavelet transform is

$$WT_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{5}$$

where $\psi(t)$ represents the mother wavelet function, $\psi^{*}(t)$ represents the complex conjugate of $\psi(t)$, $a$ is the scale operator, and $b$ is the shift operator.
From Equation (5), the frequency-domain expression of the correlation function is obtained as

$$WT_x(a, b) = \frac{\sqrt{a}}{2\pi} \int_{-\infty}^{+\infty} X(\omega)\, \hat{\psi}^{*}(a\omega)\, e^{j\omega b}\, d\omega \tag{6}$$

where $\hat{\psi}(a\omega)$ is the frequency-domain expression of $\psi(t/a)$, and $X(\omega)$ is the frequency-domain expression of $x(t)$. In addition, the center frequency is given as

$$\omega_a = \frac{\omega_0}{a} \tag{7}$$

where $\omega_0$ is the center frequency of $\hat{\psi}(\omega)$, and $\omega_a$ is the center frequency of $\hat{\psi}(a\omega)$. With the Db4 wavelet taken as $\psi(t)$, the wavelet function curves in the frequency domain are drawn according to Equation (6), as in Figure 1. From Figure 1 and Equation (7), $\omega_a$ changes with the scale $a$, and the wavelet coefficient $WT_x(a, b)$ reaches its maximum value when the frequency of the signal is equal to $\omega_a$. However, the curve of $\hat{\psi}(a\omega)$ is not rectangular, so even if the Mallat algorithm is used in discrete wavelet decomposition, the frequency overlap of $\hat{\psi}(a\omega)$ still exists between neighboring scales, and the energy of a signal in the frequency-domain window corresponding to one scale leaks into the neighboring scales. In conclusion, wavelet aliasing is the main reason for energy leakage, and it is almost inevitable for most mother wavelets.
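The non-rectangular frequency response can be made concrete with a small numerical sketch. For brevity it uses the two-tap Haar filter pair rather than the Db4 wavelet discussed above, and it looks only at a single analysis stage without downsampling; both are simplifying assumptions, and the function names are hypothetical. It estimates what share of a pure tone's filtered power ends up in the half-band that the tone does not belong to.

```python
import cmath
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)
LOW = [SQRT_HALF, SQRT_HALF]      # Haar scaling (low-pass) filter taps
HIGH = [SQRT_HALF, -SQRT_HALF]    # Haar wavelet (high-pass) filter taps

def power_response(taps, omega):
    """|sum_n h[n] exp(-j*omega*n)|^2 at angular frequency omega (rad/sample)."""
    h = sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))
    return abs(h) ** 2

def leaked_fraction(omega):
    """Share of a tone's filtered power that lands in the 'wrong' half-band."""
    low = power_response(LOW, omega)
    high = power_response(HIGH, omega)
    # A tone below pi/2 ideally belongs to the low band, otherwise to the high band.
    wrong = high if omega < math.pi / 2 else low
    return wrong / (low + high)

leak_quarter = leaked_fraction(math.pi / 4)   # tone in the middle of the low band
leak_dc = leaked_fraction(0.0)                # tone at zero frequency
```

For a tone at a quarter of the Nyquist band, roughly 15% of the power appears in the high-pass branch. Longer filters such as Db4 have sharper transitions, but their responses still overlap around the band edge.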

The Negative Effect of Wavelet Aliasing on SWEE
Based on the above analysis, wavelet aliasing can have negative effects on the accuracy of SWEE, so we further study the relationship between SWEE and wavelet aliasing.
First, the time sequence of the signal x(t) is decomposed into N scales on an orthogonal wavelet. Supposing that no wavelet aliasing takes place between neighboring scales, the expected SWEE is

$$W_{SWEE} = -\sum_{j=1}^{N} p_j \log p_j \tag{8}$$

where $p_j$ ($j = 1, \ldots, N$) is the relative wavelet energy at scale $j$ and $\sum_{j=1}^{N} p_j = 1$. Similarly, supposing that wavelet aliasing takes place between neighboring scales $k$ and $k + 1$, with $\varepsilon$ representing the energy loss from $k$ to $k + 1$, SWEE becomes

$$W'_{SWEE} = -\sum_{j=1}^{N} p'_j \log p'_j \tag{9}$$

where $p'_k = p_k - \varepsilon$, $p'_{k+1} = p_{k+1} + \varepsilon$, and $p'_j = p_j$ for all other $j$ ($j = 1, \ldots, N$). The difference function $f(\varepsilon)$ is then constructed as

$$f(\varepsilon) = W'_{SWEE} - W_{SWEE} = -(p_k - \varepsilon)\log(p_k - \varepsilon) - (p_{k+1} + \varepsilon)\log(p_{k+1} + \varepsilon) + p_k \log p_k + p_{k+1} \log p_{k+1} \tag{10}$$

Furthermore, to study the analytic properties of the difference function, Equation (10) is transformed into Equation (11), which is expressed through $H(P)$, the Shannon information entropy of a two-channel information source. According to the concavity of the Shannon information entropy function, $H(P)$ meets the conditions of Equation (12). From Equations (11) and (12), it can be deduced that $f(p_k)$ is concave and has maximum and minimum values.
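The effect of an energy leak between two neighboring scales on the Shannon entropy can be explored numerically. The sketch below assumes that ε is expressed as a share of the total (normalized) energy moved from scale k to scale k + 1, and that the entropy shift is measured as the value with aliasing minus the value without; the relative energies and helper names are invented for illustration.

```python
import math

def shannon(probs):
    """Shannon entropy in nats; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def leak(probs, k, eps):
    """Move a share eps of the total energy from scale k to scale k+1."""
    if not 0.0 <= eps <= probs[k]:
        raise ValueError("eps must lie in [0, p_k]")
    out = list(probs)
    out[k] -= eps
    out[k + 1] += eps
    return out

def entropy_shift(probs, k, eps):
    """Entropy with the leak minus entropy without it."""
    return shannon(leak(probs, k, eps)) - shannon(probs)

# Hypothetical relative energies over four scales, dominated by the first scale.
p = [0.7, 0.1, 0.1, 0.1]
shifts = [entropy_shift(p, 0, eps) for eps in (0.0, 0.1, 0.3, 0.6)]
```

For this distribution the shift is zero without leakage, grows as the two affected scales become more balanced, and returns to zero when the leak merely swaps their energies, which is consistent with a concave dependence on the leaked share.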
According to the above analysis, the curve of $f(p_k)$ is drawn as shown in Figure 2. From the above, it is only without wavelet aliasing between the neighboring scales that SWEE is guaranteed to match its expected value. However, according to the analysis in Section 3.1, wavelet aliasing inevitably exists to varying degrees, so the energy leakage caused by wavelet aliasing has detrimental effects on the accuracy of SWEE. If the SWEE algorithm is used directly without considering the signal characteristics, the accuracy of feature extraction is sometimes unreliable. To demonstrate the negative effect of wavelet aliasing on SWEE, we propose a numerical example as follows. First, a signal x(t) is constructed whose number of frequency components increases over the intervals 0-0.4 s, 0.4-0.8 s and 0.8-1 s. According to the definition of SWEE, the value of SWEE has to increase with the complexity of the signal in the time-frequency domain. For this reason, as the number of frequency components increases, the value of SWEE from 0 s to 0.4 s should be the minimum over the range from 0 s to 1 s, and the value from 0.4 s to 0.8 s should be less than that from 0.8 s to 1 s. However, Figure 3 shows that SWEE does not represent the complexity change of the signal accurately, and the ripple disturbance is also serious, which does not correspond with the facts. Wavelet aliasing therefore does have a negative effect on the accuracy of SWEE, which also agrees with the above theoretical deduction.

Tsallis Entropy
Nonextensive statistical mechanics, pioneered by Tsallis, offers a consistent theoretical framework for the study of complex systems with long-range interactions, long-time memories, multifractal and self-similar structures, or anomalous diffusion phenomena. As a nonextensive entropy, Tsallis entropy is the extension and development of extensive entropy (B-G entropy) in statistical physics [16]. It can explain some abnormal experimental phenomena, such as the complexity of non-additive systems, which cannot be explained by the theory of extensive entropy. Tsallis entropy in discrete form is defined as follows:

$$S_q = \frac{1 - \sum_{i=1}^{N} p_i^{\,q}}{q - 1} \tag{13}$$

where $p_i$ is the probability of the $i$-th state, $\sum_{i=1}^{N} p_i = 1$, and $q \in \mathbb{R}$. Unlike extensive entropy, Tsallis entropy introduces $q$ as a parameter; it is called the nonextensivity index and represents the extent of nonextensivity of the system. The non-additivity of the Tsallis entropy of a complex system composed of subsystems A and B is expressed as follows:

$$S_q(A + B) = S_q(A) + S_q(B) + (1 - q)\, S_q(A)\, S_q(B) \tag{14}$$

Note that q < 1 and q > 1 correspond respectively to the ultra-extensivity and sub-extensivity of the system. To study the concavity and convexity of Tsallis entropy, let P and Q be probability distributions and $0 \le \lambda \le 1$; for q > 0 they satisfy

$$S_q(\lambda P + (1 - \lambda) Q) \ge \lambda S_q(P) + (1 - \lambda) S_q(Q) \tag{15}$$

From Equation (15), the Tsallis entropy function is concave. At the same time, the Tsallis entropy function has a definite concavity for each q value ($S_q$ is concave for q > 0 and convex for q < 0). Furthermore, we consider the statistical characteristics of Tsallis entropy for a two-level system. According to Equation (13), the curves in Figure 4 show how Tsallis entropy changes with the probability distribution under different q values. From Figure 4, the variation of q has considerable effects on the statistical characteristics of Tsallis entropy. When q > 0 and q ≠ 1, the function curves are concave, and each has a corresponding maximum value. When q → 1, Tsallis entropy tends to B-G entropy and can describe the complexity of additive systems when considering probabilistically independent events [17,18]. Moreover, when q < 0, the function curves are the opposite of those for q > 0, and each has a corresponding minimum value. Based on the above analysis, Tsallis entropy with an appropriate q value is not only more flexible in information measurement but also covers a wider statistical range of entropy.
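The behavior described above can be checked with a few lines of code. This is a minimal sketch that assumes the dimensionless form of Tsallis entropy (constant factor set to 1) and independent subsystems A and B; the distributions are made up for illustration. It verifies the non-additive composition rule and the approach to the Shannon (B-G) value as q tends to 1.

```python
import math

def tsallis(probs, q):
    """Tsallis entropy (1 - sum p_i^q) / (q - 1); Shannon entropy in nats at q = 1."""
    support = [p for p in probs if p > 0.0]   # zero-probability states are skipped
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)

def joint_independent(pa, pb):
    """Joint distribution of two independent subsystems."""
    return [x * y for x in pa for y in pb]

pa, pb, q = [0.6, 0.4], [0.7, 0.2, 0.1], 0.5

# Composition rule: S_q(A+B) = S_q(A) + S_q(B) + (1 - q) * S_q(A) * S_q(B)
lhs = tsallis(joint_independent(pa, pb), q)
rhs = tsallis(pa, q) + tsallis(pb, q) + (1.0 - q) * tsallis(pa, q) * tsallis(pb, q)

# Near q = 1 the value approaches the Shannon entropy of the same distribution.
near_one = tsallis(pa, 1.000001)
shannon_value = tsallis(pa, 1.0)
```

With q = 0.5 the cross term is positive, so the entropy of the composite system exceeds the sum of the parts, in line with the remark that q < 1 corresponds to the ultra-extensive case.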
For the definition of TWEE, $E(n) = \sum_{j} E_j(n)$ is defined as the energy sum of the high-frequency part of the test signal in the sliding window $W(n; w, \delta)$ over the $j$ scales, where

$$E_j(n) = \sum_{k \in W(n;\, w,\, \delta)} |d_j(k)|^2$$

With the relative energy $p_j(n) = E_j(n)/E(n)$, TWEE is defined as

$$W_{TWEE}(n) = \frac{1}{q - 1}\left(1 - \sum_{j} p_j(n)^{q}\right)$$

As the sliding window moves, the trend of TWEE over time can be obtained and plotted. In the above expression, the scale space corresponds to the frequency space, and TWEE indicates the energy distribution of signals. A wavelet function does not have pulse selectivity in either the frequency or the time domain; rather, it has a support region, so the partition of the original signal in scale space also indicates the energy distribution of the signal in the time-frequency domain. The more complex the analyzed signal is, the more modes its energy is spread over and the larger the TWEE is. TWEE is therefore an index for evaluating signal complexity or uncertainty.
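The sliding-window procedure can be sketched as follows. The code assumes that the wavelet coefficients are given as rows `coeffs[j][k]` (scale `j`, sample `k`), that consecutive windows start `delta` samples apart, and that an all-zero window is assigned the value 0; these choices and the function names are illustrative rather than taken from the paper.

```python
import math

def tsallis(probs, q):
    """Tsallis entropy of a distribution; Shannon entropy in nats at q = 1."""
    support = [p for p in probs if p > 0.0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)

def twee_trend(coeffs, w, delta, q):
    """TWEE value at every window position along the coefficient rows."""
    length = min(len(row) for row in coeffs)
    trend = []
    for start in range(0, length - w + 1, delta):
        energies = [sum(d * d for d in row[start:start + w]) for row in coeffs]
        total = sum(energies)
        if total == 0.0:
            trend.append(0.0)   # assumption: no energy means no uncertainty
            continue
        trend.append(tsallis([e / total for e in energies], q))
    return trend

# Toy coefficients: one active scale in the first half, three in the second half.
coeffs = [
    [1.0] * 40,
    [0.0] * 20 + [1.0] * 20,
    [0.0] * 20 + [1.0] * 20,
]
trend = twee_trend(coeffs, w=10, delta=5, q=0.1)
```

In this toy case the trend starts at zero, rises while the window straddles the change at sample 20, and settles at the three-state maximum once the energy is spread evenly over all scales.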

The Inhibition on Wavelet Aliasing Using TWEE
According to Section 3, wavelet aliasing causes energy loss between neighboring scales, and the information set of wavelet coefficients (or reconstructed signals) at different frequency bands is nonadditive, so SWEE is unsuited for extracting the features of complex signals, especially transient signals. In contrast, Tsallis entropy, the theoretical foundation of TWEE, is good at characterizing the uncertainty of nonextensive systems because of its nonadditivity. The extent of energy loss is closely correlated with the degree of nonadditivity of the information set; therefore, if an appropriate value of q is chosen, TWEE should be able to extract the features accurately despite wavelet aliasing.
To support this point, the example proposed in Section 3.2 is taken again as an analysis case. For the time sequence of x(t), with q = 0.1, the feature of frequency complexity extracted using TWEE is shown in Figure 6.
Comparing Figure 6 with Figure 3, the value of TWEE from 0 s to 0.4 s is less than that from 0.4 s to 0.8 s, and the value from 0.4 s to 0.8 s is less than that from 0.8 s to 1 s, which is consistent with the frequency variation of x(t). The simulation results show that, at an appropriate value of q, TWEE can inhibit the negative effect of wavelet aliasing and extract the features of the transient signal accurately.
Apart from the statistical characteristics of entropy, we derive the root reason why TWEE is not affected by wavelet aliasing and can characterize the features of transient signals accurately. Taking a three-level system as the analysis object, the statistical results are calculated according to Equation (13), and the relation between Tsallis entropy, q, and the probability distribution is shown in Figure 7. From Figure 7, when q > 0 and q ≠ 1, for the small probability event, Tsallis entropy is concave, and as q increases it extends the statistical range but reduces sensitivity. Conversely, as q decreases, the statistical range is narrowed and sensitivity is enhanced. Therefore, when the energy loss between neighboring scales is too serious to be ignored, it is suggested to reduce the value of q for TWEE, which enhances its ability to inhibit wavelet aliasing. Of course, as q decreases, TWEE has less resolution to distinguish changes of time-frequency complexity for the large probability events in transient signals.
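The dependence of sensitivity on q can be illustrated with a three-level system in which a third state with small probability appears. The probabilities below (two equally likely states, then a rare third state with probability 0.01) are invented for illustration, and the helper names are hypothetical; the sketch only compares how strongly the Tsallis entropy responds for different q.

```python
import math

def tsallis(probs, q):
    """Tsallis entropy of a distribution; Shannon entropy in nats at q = 1."""
    support = [p for p in probs if p > 0.0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in support)
    return (1.0 - sum(p ** q for p in support)) / (q - 1.0)

def response_to_rare_event(q, rare=0.01):
    """Entropy change when a third state with small probability appears."""
    before = [0.5, 0.5, 0.0]
    after = [0.5 - rare / 2, 0.5 - rare / 2, rare]
    return tsallis(after, q) - tsallis(before, q)

# Compare the response for a small, a moderate and a large nonextensivity index.
responses = {q: response_to_rare_event(q) for q in (0.1, 0.5, 2.0)}
```

The response to the rare state is largest for q = 0.1 and almost vanishes for q = 2, which matches the observation that lowering q increases sensitivity to small probability events.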

Conclusions
Entropy statistics theory and wavelet decomposition algorithms are the theoretical foundation of wavelet entropy, so the choice of entropy has a direct impact on the signal analysis results. Because wavelet aliasing exists between neighboring scales, the information set of wavelet coefficients (or reconstructed signals) is nonadditive, and the extent of nonadditivity is much higher for information sets corresponding to transient signals with varying frequency. In this case, using the SWEE algorithm can produce an analysis result that contradicts the facts. In this paper, considering the advantage of Tsallis entropy in measuring the uncertainty of generalized systems, TWEE is proposed to extract the features of transient signals by adjusting the value of q. We apply the TWEE algorithm to the analysis of transient power signals and find that it can not only indicate the occurrence time of lightning strikes, but also characterize the decay trend of lightning energy without the effect of wavelet aliasing, which means the TWEE algorithm has a stronger ability to analyze transient signals than SWEE. Of course, more in-depth research on the theoretical basis and application range of wavelet entropy is needed, especially on the choice of the nonextensivity index for different signals.
In Equation (1), $d_j(k)$ denotes the wavelet coefficients, $\delta$ is the sliding factor, $S$ is the maximum scale number, and $m$ is the number of sliding steps.

Figure 1. Db4 wavelet function curves at different scales in the frequency domain.
Considering that wavelet aliasing and energy leakage exist between the neighboring scales, on the basic premise of $0 \le p_k \le p$, the difference function is discussed as follows:

In the constructed signal x(t), $f_1$ = 200 Hz, $f_2$ = 390 Hz, $f_3$ = 450 Hz, $f_4$ = 625 Hz, and $f_5$ = 1,260 Hz. In addition, the sampling frequency is $f_s$ = 5,000 Hz, and $W_n$ represents white noise. A four-level wavelet decomposition based on the Db4 wavelet is then applied to the signal x(t). With $w$ = 30 and $\delta$ = 1, the SWEE of x(t) is obtained according to Equation (1), and the curve is drawn as in Figure 3.

Figure 4. Relations between entropy with q and probability distribution for a two-level system.

Figure 5. Wavelet coefficient matrix in the sliding window.
In the field of power quality, research on transient signals such as transient overvoltages, voltage dips, voltage interruptions, voltage flicker and voltage pulses has attracted great attention. As one kind of transient signal in power systems, the voltage fluctuations caused by indirect lightning strikes often occur on overhead distribution lines in South China, where most indirect lightning strikes are low-energy and unlikely to cause faults directly. However, the safety of power transmission and the quality of power supply can be negatively affected by the high incidence of lightning strikes. In addition to electromagnetic interference in PT secondary circuits, the short duration and low energy level of non-faulty indirect lightning strikes are another major reason why feature extraction can fail for this kind of lightning strike. The phase voltages disturbed by indirect lightning strikes are therefore taken as the analysis objects, and TWEE and SWEE are used separately to extract the transient features of lightning strikes. A set of phase voltage signals of a 110 kV transmission line with lightning strike interference was obtained from the Guangzhou EPRI (Figure 8). The details are as follows: Fault line: the 110 kV Jiaji transmission line in Guangdong. Fault type: lightning strike interference. Recording site: the 110 kV side of the main transformer of the 220 kV Jiahe substation. Occurrence time: 15:57:7:480, 23 March 2010.