Enhanced Partial Discharge Signal Denoising Using Dispersion Entropy Optimized Variational Mode Decomposition

This paper presents a new approach for denoising Partial Discharge (PD) signals using a hybrid algorithm that combines an adaptive decomposition technique with entropy measures and Group-Sparse Total Variation (GSTV). Initially, the Empirical Mode Decomposition (EMD) technique is applied to decompose the noisy sensor data into Intrinsic Mode Functions (IMFs), and Mutual Information (MI) analysis between the IMFs is carried out to set the number of modes K. Then, the Variational Mode Decomposition (VMD) technique decomposes the noisy sensor data into K Band Limited IMFs (BLIMFs). The BLIMFs are separated into noise, noise-dominant, and signal-dominant BLIMFs by calculating the MI between BLIMFs. Finally, the noise BLIMFs are discarded, the noise-dominant BLIMFs are denoised using GSTV, and the signal BLIMFs are added to reconstruct the output signal. The regularization parameter λ for GSTV is selected automatically based on the Dispersion Entropy of the noise-dominant BLIMFs. The effectiveness of the proposed denoising method is evaluated in terms of Signal-to-Noise Ratio, Root Mean Square Error, and Correlation Coefficient, and compared with EMD variants. The results demonstrate that the proposed approach effectively denoises the synthetic Blocks, Bumps, Doppler, Heavy Sine, and PD pulse signals as well as real PD signals.


Introduction
High Voltage (HV) equipment uses a variety of insulating materials to protect the system; the dielectric medium is solid, liquid, or gas depending on the design requirements of the equipment. Insulation deterioration can advance to physical and chemical degradation of the local and adjacent insulation, which may result in failure of the entire insulation and, in turn, of the power system equipment [1]. Under high voltage stress, localized dielectric breakdown of a small portion of the insulation occurs, resulting in a Partial Discharge (PD) signal. As part of the condition-based maintenance plan for HV equipment, PD monitoring is an effective and non-destructive diagnostic tool for assessing the condition of the equipment. Hence, PD measurements in generators, HV cables, motors, switchgear, transformers, etc., are carried out under operating (on-line measurement) conditions. The most common issue faced during on-line PD measurements is interference from external signals, referred to as noise, which sometimes has a very high amplitude compared to the PD signal [2,3]. The most common on-site noise signals reported during PD measurements are noise from corona discharge, white Gaussian noise, thermal noise, pink noise, and high-frequency signal interference from communication equipment, commonly referred to as Discrete Spectral Interference (DSI) [4,5].
In the past few years, many researchers have proposed signal-decomposition-based denoising algorithms in various engineering disciplines. Such techniques have been applied to PD signals, including the Wavelet Transform (WT) [4][5][6][7][8][9], Empirical Mode Decomposition (EMD) and its variants [10,11], Adaptive Local Iterative Filtering (ALIF) [12], and Variational Mode Decomposition (VMD) [13][14][15]. WT is one of the most widely used decomposition methods in various domains. In [4,5], the authors applied hard and soft thresholding to the coefficients of the wavelet transform; this was then extended by Ghorat et al. [9] as the Adaptive Dual Tree Complex Wavelet Transform (ADTCWT), with an automatic threshold obtained by applying adaptive singular value decomposition. Wavelet denoising methods rely on the type and order of the wavelet, the level of decomposition, and the threshold type; these parameters need tuning, and the performance depends on their effective selection. The selection of the mother wavelet is critical in wavelet denoising, as seen in the literature [5,7], and has been addressed by the Energy Based Wavelet Selection (EBWS) method [6], the Correlation Based Wavelet Selection (CBWS) method [6,8], and the Signal-to-Noise Ratio (SNR) Based Wavelet Selection (SNRBWS) method [8]. The Daubechies mother wavelet is effective in denoising PD signals mixed with DSI and white noise. However, denoising by thresholding of wavelet coefficients often introduces artifacts such as spurious noise spikes and pseudo-Gibbs oscillations, which hinder the performance of these methods.
Apart from WT, many adaptive signal decomposition techniques have been used in denoising applications [16]. Empirical Mode Decomposition (EMD), proposed by Huang et al. [17], has been used in PD denoising [10], and its variant Novel Adaptive Ensemble EMD (NAEEMD) has been applied to denoising PD signals [11]. The EMD method has proved to be powerful in extracting IMFs from a given signal; however, issues such as high computation time, error accumulation, mode mixing, and end effects are reported in [18]. Ensemble EMD (EEMD) [19] and Complete Ensemble EMD with Adaptive Noise (CEEMDAN) [20] have been developed based on EMD, but they do not fully address all the issues mentioned above. To avoid such issues, Dragomiretskiy and Zosso [21] proposed VMD, a non-recursive decomposition method that decomposes a multi-component input signal into a set of Band Limited IMFs (BLIMFs). The PD signal acquired through a sensor circuit contains PD pulses with different frequency levels and various noise sources. The VMD method can decompose the sensor signal into a set of BLIMFs, which is beneficial for analysing the decomposed PD signal. A recent and comprehensive review of the EMD, EEMD, CEEMDAN, and VMD methods presented in [22] lists several advantages of VMD and suggests it for vibration-based condition monitoring of mechanical systems. VMD has also been utilised in PD signal denoising [13][14][15]: an optimised VMD and wavelet (OVMDW) method was applied to denoising UHF PD signals [13], and a hybrid of VMD and the Wavelet Packet Transform (WPT) was applied to denoising synthetic and real-time PD signals [14]. However, both methods addressed only white noise. The PD fault diagnosis procedure proposed in [15] uses VMD as the decomposition technique, followed by feature extraction from the selected IMFs to train a classifier.
The entropy proposed by Shannon [23] is an effective and widely used measure for studying the randomness or uncertainty of time series data. Many forms of entropy measures are used in denoising methods. In adaptive denoising methods, Mutual Information (MI) is commonly used to analyze the dependence between IMFs in order to select specific IMFs for further processing [24][25][26]. The MI of the phase spectra between consecutive IMFs was proposed in [27] to decide whether the components present in the IMFs are stochastic or deterministic. The spectral density of the initial IMFs of EMD is spread moderately across the frequency spectrum. However, to the best of the authors' knowledge, no experimental studies have been carried out to compare it against VMD.
Permutation Entropy (PE) is a measure for arbitrary time series based on the analysis of permutation patterns [28]. It reflects the complexity of the signal and has been used in PD denoising [29]; however, it considers only the order of the amplitude values and not the differences between them. Dispersion Entropy (DE) is an improved version of PE [30], which is used for analyzing various signal properties such as amplitude, frequency, and noise power [29]. In this work, DE is used along with MI to identify noise, noise-dominant, and signal-dominant IMFs for further processing.
Recently, sparse representation has become widely used in signal processing applications. Rudin et al. proposed a total variation denoising method based on an optimization problem for the removal of noise in 2-dimensional data, i.e., images [31], and Figueiredo et al. introduced a Majorization-Minimization (MM) algorithm for denoising an image [32]. Selesnick and Chen, and Condat [33,34] proposed one-dimensional Total Variation Denoising (TVD) algorithms, which have been applied to vibration signal denoising [35] and partial discharge signal denoising [3,36]. Selesnick and Chen also proposed an iterative Group-Sparse Total Variation (GSTV) algorithm derived using the MM optimization method, which is suitable when the derivative of the signal to be estimated is group sparse. GSTV is designed to alleviate the staircase artifact that often arises in TVD. It is computationally efficient due to the use of fast solvers for banded systems, and it was applied to PD signal denoising in [37].
In this paper, a new denoising method, VMD-GSTV, is presented to remove the various contaminating noise sources from partial discharge signals. A comparative analysis of EMD-Detrended Fluctuation Analysis (EMD-DFA) [38], Complete Ensemble EMD with Adaptive Noise (CEEMDAN) [20], EMD-GSTV, and VMD-GSTV is conducted, and the algorithms are applied to simulated synthetic signals and real PD data. The performance indices of these algorithms are computed and presented.
The rest of this paper is organised as follows. The PD measurement setup and the sources of disturbances in PD measurement are discussed in Section 2. The proposed method and the techniques it uses, namely VMD, MI, DE, and GSTV, are briefly introduced in Section 3. Applications of VMD-GSTV to synthetic and real-world signals are presented in Section 4. The simulation results and testing are presented in Section 5, and the discussion and conclusion are given in Sections 6 and 7, respectively.

PD Measurement
According to the IEC60270 standard [39], the most common techniques used for PD measurement are electrical, ultra-high frequency measurement, acoustic emission, and the high frequency current transformer (HFCT) sensor method. Each type exhibits a different behavior in terms of pulse type, width, rise, and decay time. A typical electrical PD measurement setup with the test object is shown in Figure 1; it is one of the most popular methods used in controlled areas such as laboratories. In the PD measurement setup, $U_\sim$ is the high-voltage power supply, $Z$ is the filter, and $C_a$ is the test object. The coupling capacitor ($C_k$) allows the flow of short PD current pulses, and the matching impedance ($Z_m$) of the Coupling Device (CD) converts the PD current pulses into voltage pulses. The matching impedance $Z_m$ is either an RC circuit for wide-band PD detection or an RLC circuit for narrow-band detection, as shown in Figure 2. The detector outputs different pulse shapes depending on the type of detection circuit, realized as the natural response of either a parallel RC or RLC circuit [7,40]. These are the Damped Exponential Pulse (DEP) and the Damped Oscillatory Pulse (DOP).

Sources of Disturbances in PD Measurement
A typical PD measurement system contains a sensor, an amplifier, an oscilloscope, and a computer for data acquisition and processing. HFCTs are often used as sensors in non-invasive PD measurement systems; the HFCT is clamped around the conductor that connects the cable to the ground. According to the IEC60270 standard [39], the main source of noise in PD measurement is background noise, which does not originate in the test object. Background noise comes in the form of white noise in the measurement system, high-frequency signal interference from radio broadcasts, or disturbances due to switching operations in other circuits, commutating machines, etc. [39]. External interference in PD measurements can lead to the wrong detection of PD signals because its magnitude can be larger than that of the PD signal. The on-site noise and disturbance can be classified as:
• White noise: a random noise signal that has the same power at all frequencies. The thermal noise generated by the detection system and noise sources such as ambient noise and amplifier noise are considered white noise. A detailed discussion can be found in [2,5,41];
• DSI: the interference from radio transmission such as communication and amplitude modulation/frequency modulation radio emissions. A detailed discussion can be found in [7];
• Pink (or 1/f) noise: a signal with a power spectral density that is inversely proportional to the frequency of the signal. A detailed discussion can be found in [24].
In this work, the DOP and DEP models along with white noise, DSI, and color noise are considered for implementation. In addition, the proposed denoising method is evaluated using standard test signals such as Blocks, Bumps, Doppler, and Heavy Sine, as well as real PD data. The simulation models and their parameter settings are discussed later in this paper.
Figure 1. Standard setup used for PD measurement [39].
Figure 3 outlines the proposed denoising method, VMD-GSTV. The proposed VMD-GSTV inherits the advantages of both the VMD and GSTV methods, in addition to those of mutual information and dispersion entropy. The steps of the algorithm are as follows:

Review of Algorithms
1. Decompose the input signal by EMD to obtain IMFs;
2. Calculate the MI of the phase spectra of the IMFs and determine the number of modes (K) for VMD;
3. Decompose the input signal by VMD using the mode parameter K to obtain BLIMFs;
4. Calculate the MI of the phase spectra of the BLIMFs, and draw the boundary between noise and noise-dominant BLIMFs;
5. Compute the DE of the noise-dominant BLIMFs and set the value of λ for GSTV;
6. Denoise the noise-dominant BLIMFs with GSTV to create an output vector and discard the noise BLIMFs;
7. Add the signal BLIMFs to the output vector directly without denoising to reconstruct the signal.
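As an illustration of steps 6 and 7 only, the reconstruction stage can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `reconstruct`, the string labels, and the `denoise` callable (a stand-in for GSTV) are hypothetical, and the modes are assumed to have been labeled already by the MI and DE analysis.

```python
from typing import Callable, List, Sequence


def reconstruct(
    modes: Sequence[Sequence[float]],
    labels: Sequence[str],
    denoise: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Rebuild a signal from labeled modes (hypothetical helper).

    labels[i] is one of "noise", "noise-dominant" or "signal".
    Noise modes are discarded, noise-dominant modes are passed through
    `denoise`, and signal modes are added unchanged.
    """
    out = [0.0] * len(modes[0])
    for mode, label in zip(modes, labels):
        if label == "noise":
            continue  # step 6: discard noise modes
        part = denoise(mode) if label == "noise-dominant" else mode
        for i, value in enumerate(part):
            out[i] += value  # step 7: accumulate the output vector
    return out
```

For example, calling `reconstruct` with three modes labeled noise, noise-dominant, and signal returns the sum of the denoised second mode and the untouched third mode.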
In the following sections, the techniques that formulate the proposed denoising method are reviewed and discussed.

Variational Mode Decomposition (VMD)
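VMD decomposes a real-valued input signal $f$ into a predefined number $K$ of BLIMFs, referred to as $u_k$, each of which is compact around a center frequency $\omega_k$. The decomposition is posed as the constrained variational problem [21]:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f \tag{1}$$
where $\{u_k\} = \{u_1, \ldots, u_K\}$ and $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are the mode components and their center frequencies, respectively, $K$ is the total number of modes to be recovered, $\delta(t)$ denotes the impulse function, and $f$ is the input signal. To solve Equation (1), the constrained variational problem is transformed into an unconstrained one by introducing the Lagrangian multiplier $\lambda$ and the quadratic penalty term $\alpha$. The new unconstrained problem is as follows:
$$\mathcal{L}(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t), f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
The solution of Equation (2) is the saddle point of the augmented Lagrangian, found by a sequence of iterative sub-optimizations referred to as the Alternate Direction Method of Multipliers (ADMM) [42]. Following [21], the optimization procedure of VMD includes the following steps:
1. Initialize the modes $\hat{u}_k^1$, the center frequencies $\omega_k^1$, and $\hat{\lambda}^1$. Set $n = 0$.
2. Update the modes $\hat{u}_k$ for all $\omega \geq 0$:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{n+1}(\omega) - \sum_{i>k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^n(\omega)/2}{1 + 2\alpha (\omega - \omega_k^n)^2}$$
3. Update the center frequencies:
$$\omega_k^{n+1} = \frac{\int_0^\infty \omega \, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^\infty |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}$$
4. Update the multiplier for all $\omega \geq 0$:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)$$
5. Repeat steps 2 to 4 until $\sum_k \|\hat{u}_k^{n+1} - \hat{u}_k^{n}\|_2^2 / \|\hat{u}_k^{n}\|_2^2 < \varepsilon$, where $\varepsilon$ is the tolerance.
During the optimization procedure, the VMD method follows a non-recursive approach to obtain the BLIMFs, and the quadratic data fidelity term in Equation (2) speeds up convergence. Further details and the mathematical description of the VMD algorithm can be found in [21]. In this paper, the input signal is decomposed into K predefined modes with the parameters $\alpha = 2000$ and $\tau = 0$ and a tolerance of $1 \times 10^{-6}$, as described in [21]. An improper choice of the number of modes K results in over- or under-decomposition. In this work, this potential issue is avoided by selecting K with an EMD-based algorithm that uses mutual information analysis.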
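To make the frequency-domain updates of the VMD iteration in [21] concrete, the mode update (a Wiener-type filter centered at $\omega_k$) and the center-frequency update (the power-weighted mean frequency of the mode) can be sketched as follows. This is an illustrative sketch, not a full VMD implementation: the function names are hypothetical, the spectra are assumed to be given as lists sampled on the non-negative frequency grid `omega`, and the spectra of all other modes are assumed to be pre-summed into `others_hat`.

```python
from typing import List, Sequence


def update_mode(
    f_hat: Sequence[complex],
    others_hat: Sequence[complex],
    lam_hat: Sequence[complex],
    omega: Sequence[float],
    omega_k: float,
    alpha: float,
) -> List[complex]:
    """One mode update: residual spectrum filtered around omega_k.

    others_hat holds the summed spectra of all modes except mode k.
    """
    return [
        (f - o + lam / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for f, o, lam, w in zip(f_hat, others_hat, lam_hat, omega)
    ]


def update_center_frequency(u_hat: Sequence[complex], omega: Sequence[float]) -> float:
    """Center of gravity of the mode's power spectrum (assumes nonzero power)."""
    power = [abs(u) ** 2 for u in u_hat]
    return sum(w * p for w, p in zip(omega, power)) / sum(power)
```

A larger `alpha` narrows the filter around `omega_k`, which is why this parameter controls the bandwidth of each BLIMF.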
To demonstrate the VMD decomposition, a noisy input signal $f$ is decomposed using VMD into a set of BLIMFs ($u_k$), and the frequency spectra ($|F(u_k)|$) and phase spectra ($\theta(u_k)$) of the BLIMFs are then calculated. As an example, a synthetic 'Bumps' signal with added white Gaussian noise of 10 dB is shown in Figure 4, and its decomposition using VMD is shown in Figure 5. In line with the literature [27], the noisy 'Bumps' signal is also decomposed using EMD, as shown in Figure 6.

Mutual Information (MI)
Shannon [23] developed MI to measure the dependency between two random variables. For example, if two random variables are strictly independent, their MI is zero. Let $x$ and $y$ be two random variables defined on the sample spaces $X$ and $Y$, respectively. The MI can then be defined as:
$$I(x; y) = \int_Y \int_X p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right) dx \, dy \tag{3}$$
where $p(x, y)$ is the joint Probability Density Function (PDF) of $x$ and $y$, and $p(x)$ and $p(y)$ are the marginal PDFs of $x$ and $y$, respectively. Moreover, the MI can be expressed as
$$I(x; y) = H(x) + H(y) - H(x, y) \tag{4}$$
where $H(x)$ and $H(y)$ are the information entropies and $H(x, y)$ is the joint entropy of $x$ and $y$. According to Rios and De Mello [27], whether the components present in the IMFs are stochastic or deterministic can be decided by finding the MI of the phase spectra between consecutive IMFs. A noisy signal decomposed by EMD yields a set of IMFs ordered from high frequency to low frequency. Two consecutive high-frequency IMFs exhibit a lower level of mutual information and are considered stochastic. Two consecutive low-frequency IMFs share more information, resulting in higher MI values, and are hence considered deterministic. As observed in Figure 6, the frequency decreases as new IMFs are extracted in the EMD method. In Figure 5, the frequency increases as new BLIMFs are extracted in the VMD method. The mean frequency and the MI of the phase spectra of the IMFs and BLIMFs are listed in Tables 1 and 2, respectively.
The threshold value selected for dividing noise and noise-dominant IMFs is 0.9 for EMD and 0.6 for VMD. These values are selected based on the mean MI values between the IMFs of the various noise sources discussed in Section 4.4. As shown in Table 2, the first value above the threshold is used to draw the boundary. The MI value 2.3165 of $IMF_{4-5}$ is above the threshold value 0.9 for EMD, and the MI value 1.4764 of $IMF_{1-2}$ is above the threshold value 0.6 for VMD; therefore, $IMF_5$ and $IMF_2$ are selected as the boundary between noise and signal-dominant IMFs in the EMD and VMD methods, respectively.
Then, each IMF is analyzed using the DE as summarized in the following section.
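The MI-based boundary selection of this section can be illustrated with a short sketch. It rests on several assumptions: MI is estimated with a simple equal-width histogram (the estimator used in this work is not specified here), the computation of the phase spectra is omitted, and the function names and the first three MI values in the usage note are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def _bin(values: Sequence[float], bins: int) -> List[int]:
    """Map values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant input: single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 8) -> float:
    """Histogram estimate of I(x; y) in nats (assumed estimator)."""
    bx, by = _bin(x, bins), _bin(y, bins)
    n = len(bx)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in pxy.items()
    )


def boundary_mode(mi_values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based index of the later mode of the first consecutive
    pair whose MI exceeds the threshold; mi_values[0] is the pair (1, 2)."""
    for idx, value in enumerate(mi_values):
        if value > threshold:
            return idx + 2
    return None
```

With the EMD threshold of 0.9, a list such as `[0.2, 0.4, 0.7, 2.3165]` (only the last value is taken from the text) gives `boundary_mode(...) == 5`, and with the VMD threshold of 0.6 the list `[1.4764]` gives 2, in agreement with the boundaries stated above.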

Dispersion Entropy (DE)
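In this work, the DE measure is used to classify whether a particular IMF is a noise, noise-dominant, or signal IMF. PE is a measure for arbitrary time series based on the analysis of permutation patterns, and DE is an improved version of PE that quantifies the regularity of a time series [30]. For a given time series $x$ of length $N$, the DE is calculated as follows [30]. Initially, $x = x_1, x_2, \cdots, x_N$ is mapped to $y = y_1, y_2, \cdots, y_N$ in the range from 0 to 1 using the Normal Cumulative Distribution Function (NCDF), which is defined as:
$$y_j = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x_j} e^{-\frac{(t - \mu)^2}{2\sigma^2}} \, dt \tag{5}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the signal $x$, and $y_j$ is the probability that a normally distributed random variable with this mean and standard deviation is less than or equal to $x_j$. Then each $y_j$, where $j = 1, 2, \ldots, N$, is assigned a class from 1 to $c$ by a linear mapping as follows:
$$z_j^c = \mathrm{round}(c \cdot y_j + 0.5) \tag{6}$$
where $z_j^c$ denotes the $j$-th member of the classified time series. Then, the embedding vectors $z_i^{m,c}$ with dimension $m$ and time delay $d$ are generated using the following equation:
$$z_i^{m,c} = \left\{ z_i^c, z_{i+d}^c, \ldots, z_{i+(m-1)d}^c \right\}, \quad i = 1, 2, \ldots, N - (m-1)d \tag{7}$$
Each vector $z_i^{m,c}$ is then mapped to a dispersion pattern $\pi_{v_0 v_1 \cdots v_{m-1}}$, where $z_i^c = v_0$, $z_{i+d}^c = v_1$, $\ldots$, $z_{i+(m-1)d}^c = v_{m-1}$. The total possible number of dispersion patterns is $c^m$. The relative frequency of each potential dispersion pattern is given by:
$$p(\pi_{v_0 v_1 \cdots v_{m-1}}) = \frac{\#\left\{ i \mid i \leq N - (m-1)d, \; z_i^{m,c} \text{ has type } \pi_{v_0 v_1 \cdots v_{m-1}} \right\}}{N - (m-1)d} \tag{8}$$
where $\#$ denotes the number of embedding vectors $z_i^{m,c}$ that are assigned to the dispersion pattern $\pi_{v_0 v_1 \cdots v_{m-1}}$.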
Lastly, according to Shannon's definition of entropy, the DE value with embedding dimension $m$, number of classes $c$, and time delay $d$ is computed as follows:
$$DE(x, m, c, d) = -\sum_{\pi = 1}^{c^m} p(\pi_{v_0 v_1 \cdots v_{m-1}}) \ln p(\pi_{v_0 v_1 \cdots v_{m-1}}) \tag{9}$$
As suggested in [30], the following parameters are selected: the embedding dimension $m = 2$, the number of classes $c = 5$, and the time delay $d = 1$. Each IMF obtained through the EMD and VMD methods is segmented into frames of 128 samples, and the DE value of each segment is calculated. The segment size is chosen as a power of 2 such that $c^m < N$. For example, the 'Bumps' signal of 1024 sample points decomposed through EMD and VMD has eight IMFs; each IMF is further segmented into eight frames, and the DE of each frame is calculated. The computed DE values of the EMD and VMD frames are shown in Figure 7a,b. Three noise sources described in Section 4.4, namely Additive White Gaussian Noise (AWGN), DSI, and color noise, each of 8 K sample points, are segmented into frames of 128 samples; the DE value of each frame is then calculated and plotted in Figure 7c. It is observed that the DE values of the various noise sources are higher than 3 for all segments. As observed in Figure 7a,b, the DE values of the segments of noise IMFs are higher than 2.75, and in some cases above 3 as well. On the other hand, the DE values of the segments of signal IMFs are in the range of 1 to 2, and those of noise-dominant IMFs are in the range of 2 to 2.75. The noise-dominant IMFs need to be filtered according to their noise content; hence, the key parameter of the denoising method is set based on the DE values of each IMF.
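The DE computation and the frame-wise classification described above can be sketched as follows. This is a minimal sketch rather than the code used in this work: the function names are hypothetical, the population standard deviation is assumed, the rounding in the class mapping is implemented as round-half-up, and a constant segment is assigned a DE of zero by assumption. The thresholds 2 and 2.75 are the ranges quoted above from Figure 7.

```python
import math
from collections import Counter
from typing import List, Sequence


def dispersion_entropy(x: Sequence[float], m: int = 2, c: int = 5, d: int = 1) -> float:
    """Dispersion entropy in nats, following the steps of [30]."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sigma == 0.0:
        return 0.0  # assumption: a constant segment has a single pattern
    # NCDF mapping to (0, 1), then linear mapping to classes 1..c
    y = [0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0)))) for v in x]
    z = [min(c, max(1, math.floor(c * yj + 1.0))) for yj in y]
    # Embedding vectors and relative frequencies of dispersion patterns
    count = n - (m - 1) * d
    patterns = Counter(tuple(z[i + k * d] for k in range(m)) for i in range(count))
    return -sum((k / count) * math.log(k / count) for k in patterns.values())


def segment_de(x: Sequence[float], frame: int = 128) -> List[float]:
    """DE of each non-overlapping frame of `frame` samples."""
    return [dispersion_entropy(x[i:i + frame])
            for i in range(0, len(x) - frame + 1, frame)]


def classify(de: float, signal_max: float = 2.0, noise_min: float = 2.75) -> str:
    """Label a segment by its DE value using the ranges read from Figure 7."""
    if de > noise_min:
        return "noise"
    if de < signal_max:
        return "signal"
    return "noise-dominant"
```

With $m = 2$ and $c = 5$ the DE is bounded by $\ln 25 \approx 3.22$, which is consistent with noise segments lying above 3, and a 1024-sample mode yields eight frames of 128 samples.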

Group-Sparse Total Variation (GSTV) Denoising
GSTV is an extension of total variation denoising, designed to reduce the staircase artifact that occurs in the total variation denoising method. A suitable penalty function is used to promote the group sparsity of the signal derivative. A computationally efficient and fast-converging MM algorithm is used to minimize the cost function F(x) without requiring any additional parameters [33].
Assume that the noisy signal $y \in \mathbb{R}^N$ is modeled as
$$y_n = x_n + w_n, \quad n = 1, 2, \ldots, N \tag{10}$$
where $x \in \mathbb{R}^N$ and $w \in \mathbb{R}^N$ are the clean signal and the noise, respectively. The clean signal can be estimated by solving the optimization problem:
$$x^{*} = \arg\min_{x} \left\{ F(x) = \frac{1}{2} \|y - x\|_2^2 + \lambda \, \phi(Dx) \right\} \tag{11}$$
where $\phi$ is a penalty function that promotes group sparsity, $\lambda$ is the regularization parameter, and $Dx$ is the first-order difference of the $N$-point signal $x$, with $D$ a matrix of size $(N - 1) \times N$ represented as
$$D = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{bmatrix} \tag{12}$$
The penalty function described in [33] is used in this work:
$$\phi(v) = \sum_{n} \left[ \sum_{p=0}^{P-1} |v(n + p)|^2 \right]^{1/2} \tag{13}$$
where $P$ denotes the group size, with values in the range of 1 to 10. If $P = 1$, then $\phi(v) = \|v\|_1$ and Equation (11) is the standard 1D total variation denoising problem. In this work, $P$ is set to 3; the function $\phi(v)$ is a convex measure of group sparsity. The parameter $\lambda$ has a strong influence on denoising, as it changes the total variation of the signal. A positive value of $\lambda$ is selected for denoising the noise-dominant IMFs based on the DE value, as described in the previous section, and the value of $P$ is selected according to the 1D denoising illustration given in [43].
Figure 7. DE values of the segments of (a) the EMD IMFs and (b) the VMD BLIMFs of the noisy 'Bumps' signal shown in Figure 4, and (c) three noise sources: AWGN, DSI, and 1/f noise. In (a,b), the segments above the 'Noise' line are considered noise IMFs, the segments below the 'Signal' line are considered signal IMFs, and the segments in between are referred to as noise-dominant IMFs.
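For readers who wish to experiment with group-sparse total variation, a majorization-minimization iteration of the kind described in [33] can be sketched in plain Python. This is a sketch under stated assumptions and not the implementation used in this work: the function names are hypothetical, the iteration count is arbitrary, a small constant `eps` is added to the group norms to avoid division by zero, and the tridiagonal system is solved with the Thomas algorithm in place of a library banded solver.

```python
import math
from typing import List, Sequence


def _diff(s: Sequence[float]) -> List[float]:
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]


def _group_norms(dx: Sequence[float], group: int, eps: float = 0.0) -> List[float]:
    """l2 norm of every length-`group` window of dx (zero-padded at both ends)."""
    n = len(dx)
    sq = [v * v for v in dx]
    return [math.sqrt(sum(sq[max(0, j - group + 1):min(n - 1, j) + 1])) + eps
            for j in range(n + group - 1)]


def gstv_cost(x: Sequence[float], y: Sequence[float], lam: float, group: int = 3) -> float:
    """F(x) = 0.5 * ||y - x||^2 + lam * phi(Dx)."""
    fidelity = 0.5 * sum((a - b) ** 2 for a, b in zip(y, x))
    return fidelity + lam * sum(_group_norms(_diff(x), group))


def _solve_tridiagonal(diag: Sequence[float], rhs: Sequence[float]) -> List[float]:
    """Thomas algorithm for a system with off-diagonals equal to -1."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = -1.0 / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    out = [0.0] * n
    out[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        out[i] = d[i] - c[i] * out[i + 1]
    return out


def gstv_denoise(y: Sequence[float], lam: float, group: int = 3,
                 iterations: int = 25, eps: float = 1e-10) -> List[float]:
    """MM iteration: x = y - D^T (diag(1 / (lam * v)) + D D^T)^{-1} D y."""
    n = len(y)
    dy = _diff(y)
    x = list(y)
    for _ in range(iterations):
        r = _group_norms(_diff(x), group, eps)
        v = [sum(1.0 / r[j + k] for k in range(group)) for j in range(n - 1)]
        u = _solve_tridiagonal([1.0 / (lam * vj) + 2.0 for vj in v], dy)
        # Apply D^T: (D^T u)_i = u_{i-1} - u_i with zero boundary terms
        x = [y[i] - ((u[i - 1] if i > 0 else 0.0) - (u[i] if i < n - 1 else 0.0))
             for i in range(n)]
    return x
```

Setting `group=1` reduces the weights to those of standard 1D total variation denoising, and `gstv_cost` can be used to check that the objective decreases relative to the noisy input.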

Applications of the VMD-GSTV to Synthetic and Real-World Signals
To validate the performance of the proposed method, six signal types are used, as shown in Figure 8. As per the literature [24,26], four of the six signals are standard test signals used for evaluating denoising methods. The other two signals are DOP and DEP, the types of PD signals discussed in Section 2. These signals are corrupted by artificially generated noise signals. Apart from these signals, three real PD data sets from [44] are used to validate the proposed method. In this section, the standard test signals, noise models, and real PD data sets used in this work are presented.

Synthetic Test Signal Models
The test signal functions in MATLAB ® such as Blocks, Bumps, Doppler, and Heavy Sine are used to model the synthetic signals, as shown in Figure 8a-d. The signal size is varied from 1 K ($2^{10}$ = 1024) to 16 K ($2^{14}$ = 16,384) samples in order to analyze the performance of the denoising algorithms for different signal sizes.

Synthetic PD Models
Two types of partial discharge models were considered for the simulation. The PD signals $PD_{DOP}(t)$ and $PD_{DEP}(t)$ are modeled using Equations (14) and (15).
$$PD_{DEP}(t) = A \left( e^{-\alpha t} - e^{-\beta t} \right) \tag{15}$$
where $A$ represents the amplitude of the pulse, $\alpha$ and $\beta$ are the damping factors, and $\omega$ is the damping frequency. Figure 8e,f shows the $PD_{DOP}(t)$ and $PD_{DEP}(t)$ pulses with an amplitude between 3 V and 7 V; $\alpha$ is set to $7.5 \times 10^5$, $\beta$ to $16 \times 10^6$, and $\omega$ to 150 kHz. The sampling frequency $f_s$ is set to 20 MHz.
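The pulse models can be sampled with a few lines of Python, shown here as a sketch under stated assumptions. The DEP function follows Equation (15). The oscillatory form used for the DOP, the double-exponential envelope multiplied by a sine, is an assumption made for illustration only and may differ from Equation (14); interpreting the 150 kHz value as $\omega = 2\pi \cdot 150 \times 10^3$ rad/s is likewise an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def pd_dep(t: float, amplitude: float, alpha: float, beta: float) -> float:
    """Damped exponential pulse, Equation (15)."""
    return amplitude * (math.exp(-alpha * t) - math.exp(-beta * t))


def pd_dop(t: float, amplitude: float, alpha: float, beta: float, omega: float) -> float:
    """Damped oscillatory pulse.

    ASSUMPTION: DEP envelope multiplied by sin(omega * t); this form is
    illustrative and not confirmed by the text.
    """
    return pd_dep(t, amplitude, alpha, beta) * math.sin(omega * t)


def sample_pulse(n_samples: int, fs: float, amplitude: float = 5.0,
                 alpha: float = 7.5e5, beta: float = 16e6) -> List[float]:
    """Sample a DEP pulse at sampling frequency fs using the stated parameters."""
    return [pd_dep(n / fs, amplitude, alpha, beta) for n in range(n_samples)]
```

For instance, `sample_pulse(1024, 20e6)` gives a 5 V DEP pulse (an amplitude within the stated 3 V to 7 V range) over about 51 µs, by which time the pulse has decayed to a negligible level.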

Real PD Data
The real PD data available at [44] of the void, surface, and corona discharge signals were used to test the algorithm and the outcomes are reported.

Common Noise Models in Measurement System
The noise sources are modeled independently as described in the following sections and are added together to generate different noise signals, as shown in Table 3. The noise signal $Sx_n$ is modeled with the presence and absence of AWGN, DSI, and pink noise, where $x = 1, \cdots, 11$ and $N$ is the length of the signal. Five levels of AWGN from −5 dB to 20 dB, with and without DSI and pink noise, are added to the six test signals, making a total of 66 signals.
A common random noise process is white noise, which possesses uniform power at all frequencies and whose noise voltage amplitude has a Gaussian (normal) distribution. The MATLAB ® built-in function 'awgn' is used to add white Gaussian noise with a Signal-to-Noise Ratio (SNR) of −5 dB, 0 dB, 5 dB, 10 dB, and 20 dB. The signal power is measured before the noise is added. The noise signals $Sa_n$, denoted S1 to S5, correspond to the five levels of AWGN.
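The procedure of measuring the signal power and then adding white Gaussian noise at a target SNR can be sketched in Python as follows. This is an illustrative equivalent and not the MATLAB 'awgn' function itself; the function names are hypothetical, and the random generator is passed in explicitly so that results are reproducible.

```python
import math
import random
from typing import List, Sequence


def add_awgn(signal: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add zero-mean white Gaussian noise at the requested SNR (in dB).

    The signal power is measured first, and the noise variance is set to
    signal_power / 10^(snr_db / 10).
    """
    power = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [v + rng.gauss(0.0, sigma) for v in signal]


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Measured SNR of `noisy` with respect to `clean`, in dB."""
    signal_energy = sum(v * v for v in clean)
    noise_energy = sum((a - b) ** 2 for a, b in zip(noisy, clean))
    return 10.0 * math.log10(signal_energy / noise_energy)
```

Because the noise is random, the measured SNR of a finite record only approximates the target; the deviation shrinks as the signal length grows.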

Discrete Spectral Interference (DSI)
The noise due to the interference of communication equipment in the form of Amplitude Modulation (AM) radio, Frequency Modulation (FM) radio, and mobile communication emissions is referred to as DSI. The frequency band of FM and mobile communications systems indicates that these systems have very minimal impact on PD measurements. The presence of continuous sinusoidal noise from the communication systems is represented in the form of a combination of AM signals given by Equation (17).
where $A_c$ is the carrier amplitude, $f_c$ is the frequency of the carrier signal, $m$ is the modulation depth, and $f_m$ is the frequency of the modulating wave. In the simulation, the following values were used:

Pink Noise
A third noise model is pink noise, simulated using the MATLAB ® function ColoredNoise for the required length and represented as $Sp_n$.

Numerical Experiments
The simulation is carried out using MATLAB ® installed on a Windows 8.1 operating system running on an Intel(R) Core i7-4700MQ CPU @ 2.40 GHz processor with 8 GB RAM. Various noise signals such as AWGN, DSI, and color noise are added to the simulated signal, which is then denoised using various algorithms; the filter evaluation parameters are calculated for each method and compared with those of the proposed method.

Denoising Algorithms
Using simulated data, the following methods are considered for comparison with the proposed method:
• EMD-DFA [38]: an EMD-based denoising technique that uses detrended fluctuation analysis (DFA) to define the threshold for rejecting noisy IMFs and reconstructing the signal;
• CEEMDAN [20]: a complete ensemble EMD with adaptive noise, a variation of EEMD that provides better spectral separation of the IMFs. EEMD, proposed in [19], is an extension of EMD developed to overcome the mode mixing issue;
• EMD-GSTV: the classical EMD method [17] with the proposed framework for the selection of IMFs to reconstruct the signal.

Filter Evaluation Parameters
In order to assess the performance of the denoising algorithms, the following parameters were computed and analyzed. Let $x_n$ be the input signal, $xr_n$ the reconstructed signal, and $y_n$ the noisy signal, each of length $N$.
A. Root Mean Square Error (RMSE): The RMSE between the input signal and the reconstructed signal is given in Equation (18). A lower RMSE indicates that the reconstructed signal is closer to the simulated signal, and hence a better denoising algorithm.
$$RMSE = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (x_n - xr_n)^2} \tag{18}$$
B. Signal to Noise Ratio (SNR): The SNR is calculated in order to test the effectiveness of the denoising techniques. The SNR is given as
$$SNR = 10 \log_{10} \left( \frac{\sum_{n=1}^{N} x_n^2}{\sum_{n=1}^{N} (x_n - xr_n)^2} \right) \tag{19}$$
A positive SNR indicates that the signal power is higher than the noise power, and vice versa.
C. Correlation Coefficient (CC): The CC is computed using Equation (20), where $\bar{x}$ is the mean value of $x_n$ and $\overline{xr}$ is the mean value of $xr_n$:
$$CC = \frac{\sum_{n=1}^{N} (x_n - \bar{x})(xr_n - \overline{xr})}{\sqrt{\sum_{n=1}^{N} (x_n - \bar{x})^2 \, \sum_{n=1}^{N} (xr_n - \overline{xr})^2}} \tag{20}$$
CC = 1 indicates the highest shape similarity, whereas CC = −1 indicates total asymmetry between the signals.
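The three evaluation parameters can be computed with the following Python sketch, which assumes the standard definitions of RMSE, output SNR in dB, and the Pearson correlation coefficient; the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(x: Sequence[float], xr: Sequence[float]) -> float:
    """Root mean square error between input x and reconstruction xr."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xr)) / len(x))


def output_snr_db(x: Sequence[float], xr: Sequence[float]) -> float:
    """Output SNR in dB: signal energy over residual energy.

    Undefined (division by zero) for a perfect reconstruction.
    """
    residual = sum((a - b) ** 2 for a, b in zip(x, xr))
    return 10.0 * math.log10(sum(a * a for a in x) / residual)


def correlation_coefficient(x: Sequence[float], xr: Sequence[float]) -> float:
    """Pearson correlation coefficient between x and xr."""
    mx, mr = sum(x) / len(x), sum(xr) / len(xr)
    num = sum((a - mx) * (b - mr) for a, b in zip(x, xr))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - mr) ** 2 for b in xr))
    return num / den
```

As a small worked example, for x = [1, 2, 3, 4] and xr = [1, 2, 3, 5] the residual energy is 1 and the signal energy is 30, so the RMSE is 0.5 and the output SNR is $10 \log_{10} 30 \approx 14.77$ dB.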

Results
The performance of the denoising methods is verified using standard filter evaluation parameters such as SNR, RMSE, and CC. The results presented in Tables 4-7 show the mean values of the parameters measured over 10 iterations of the denoised signals for the various denoising algorithms. An effective filter should remove the unwanted noise components, which have no relationship to the signal of interest. The best evaluation parameters in the results tables are highlighted in bold for each method and noise level. The analysis is carried out based on the output SNR values, signal length, RMSE, and CC. Filters with high SNR, high CC, and low RMSE values are considered to be the best.
Noise sources such as DSI and color noise are also added to the test signals, and the performance of the algorithms is evaluated. The impact of these noises on the test signals is minimal, and in most cases the ranking of the methods remains the same.

Illustration of Denoising Simulated Signals Corrupted by White Noise
Four of the six signals shown in Figure 8 are standard test signals, and the other two are PD signals. Table 4 presents the SNR, RMSE, and CC values of the denoised 'Blocks' signal with 16 K sample points corrupted with AWGN for the various denoising methods used in this paper. Each test signal, with signal lengths varying from N = $2^{10}$ to $2^{14}$, is corrupted by AWGN with input SNRs of −5 dB, 0 dB, 5 dB, 10 dB, and 20 dB. VMD-GSTV performs better than the other techniques. It is also observed from the plots of signal length versus output SNR, shown in Figures 9 and 10, that VMD-GSTV provides better output SNR values for most of the synthetic test signals. Sample denoised 'Bumps' and 'Blocks' signals obtained from the various methods are shown in Figure 11.
Generally, EMD- and VMD-based denoising methods exhibit lower performance on piece-wise constant signals [24,26]. Referring to Table 4, the proposed VMD-GSTV attains the highest output SNR for the 'Blocks' signal at all input SNR levels, followed by CEEMDAN and EMD-GSTV. The output SNR values are better for the 'Blocks', 'Bumps', 'Doppler', and 'PD DOP' signals across the different signal lengths (N = $2^{10}$ to $2^{14}$). These results indicate the efficacy of the proposed denoising method.
Signal length plays an important role also in denoising algorithms, the computation time is directly related to the signal length. From the observation of SNR and RMSE plots, most of the algorithms perform better in higher signal length, notably VMD-GSTV is good in all synthetic test signals except PD DEP signal.
The output signals are analyzed qualitatively by plotting the original signal and denoised signals obtained from various methods. Figure 11a,f has the original 'Bumps' and 'Block' signal shown in black color and noisy 'Bumps' and 'Block' signal corrupted with input SNR = 5 dB AWGN is shown in gray color. Figure 11b-e,g-j has a denoised signal of a different algorithm with the original input signal. The figures demonstrate that the proposed VMD-GSTV method has better reconstructed output signals under extreme noise conditions as compared to EMD-DFA, CEEMDAN, and EMD-GSTV methods.    Tables 4-7 show the RMSE value computed between the denoised signal input signal, Figures 12 and 13 show that in most of the cases the RMSE value of proposed VMD-GSTV is better than other methods. In certain cases the RMSE values of CEEMDAN is better than the proposed method, however, in the higher signal lengths the proposed method performs better. The proposed methods result in lower RMSE value for PD DOP signal, on the other hand PD DEP in lower noise levels with a higher signal length.

Illustration of Denoising PD Signals
The PD signals described in Section 4.2, with different amplitudes, damping factors, and damping frequencies, were subjected to the denoising algorithms. The PD DOP signal corrupted with 10 dB AWGN and the denoised signals obtained using the various algorithms are shown in Figure 14. The most common noise sources described in Section 4.4 are mixed with the different PD signals, which adversely affects the original pulse shape and the PD footprints. Hence, the proposed algorithm is subjected to these noise models and the outcome is analyzed. Figure 15 shows the performance of VMD-GSTV on the various test signals with different signal lengths, corrupted with AWGN of 5 dB, AWGN of 5 dB + DSI, or AWGN of 5 dB + DSI + pink noise. The four standard test signals give almost similar results, whereas the output SNR of the PD DOP and PD DEP signals trends downward when pink noise is added. However, the numerical values show that VMD-GSTV still performs better than the other methods, although the results of those methods are not included in Figure 15.
Apart from simulated PD signals, real signals such as 'PD Surface', 'PD Void', and 'PD Corona', taken from the dataset in [44], are also used in this work. The recordings in the dataset are longer, so only 2 K sample points are used so that the PD pulses are visible on the plot. The output of the VMD-GSTV method is presented in Figure 16. The proposed method can eliminate the noise from the PD signal and retain the PD pulses.
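The combined noise scenarios (AWGN, AWGN + DSI, AWGN + DSI + pink noise) can be illustrated with the sketch below. The noise models of Section 4.4 are not reproduced in this section, so everything here is an assumption: DSI is modeled as a sum of fixed-frequency sinusoids with hypothetical amplitudes and frequencies, pink noise is approximated with the Voss algorithm, and the helper names `dsi`, `pink_noise`, and `mix` are illustrative only.

```python
import math
import random


def dsi(n, fs, tones):
    """Assumed DSI model: a sum of sinusoids. `tones` is a list of (amplitude, frequency_hz)."""
    return [sum(a * math.sin(2.0 * math.pi * f * i / fs) for a, f in tones)
            for i in range(n)]


def pink_noise(n, rng, rows=8):
    """Approximate 1/f noise with the Voss algorithm.

    Row k holds a Gaussian value that is redrawn every 2**k samples;
    the output is the sum of all rows, scaled to roughly unit variance.
    """
    state = [0.0] * rows
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (1 << k) == 0:
                state[k] = rng.gauss(0.0, 1.0)
        out.append(sum(state) / math.sqrt(rows))
    return out


def mix(*components):
    """Sample-wise sum of equally long sequences (signal plus noise components)."""
    return [sum(values) for values in zip(*components)]
```

A test signal corrupted by all three sources would then be `mix(signal, awgn, dsi(...), pink_noise(...))`, with each component scaled to whatever level the experiment prescribes; the relative scaling is not specified in this section.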

Conclusions
In this paper, a new denoising method is proposed that combines VMD, statistical features such as MI and DE, and the GSTV denoising algorithm. The approach decomposes the signal using VMD, with the number of modes set from the EMD IMFs by MI analysis. The VMD IMFs are then also analyzed using MI to group them into noise-dominant and signal-dominant IMFs. Based on their DE values, the noise-dominant IMFs are filtered using GSTV with an appropriate λ value and added to the output vector. The signal-dominant IMFs are added directly to the output vector to create the denoised signal.
Tests on simulated and real data show that the proposed VMD-GSTV method can remove the noise from a given signal while retaining the signal features. The proposed method has better output SNR, RMSE, and CC values for most signals than the other methods considered in this paper. For an input SNR of −5 dB, the output SNR of the proposed method is 4 to 13% higher than that of the other methods, which indicates that the proposed method performs better under extremely noisy conditions. Furthermore, the method is applied to denoise the synthetic PD DOP and PD DEP signals and real PD data from corona, surface, and void discharges. These observations indicate that the proposed method can denoise low-amplitude PD signals buried in white noise, DSI, and pink noise.
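The final reconstruction step summarized above can be sketched as follows. This is only an outline under stated assumptions: the modes are taken as already decomposed and already labeled by the MI analysis, the label strings are hypothetical, and `denoise` is a caller-supplied stand-in for the GSTV filter with its DE-selected λ, which is not implemented here.

```python
def reconstruct(modes, labels, denoise):
    """Rebuild the output signal from labeled modes.

    modes   : list of equally long sequences (the band-limited IMFs)
    labels  : one of "noise", "noise-dominant", "signal-dominant" per mode
              (hypothetical label names)
    denoise : callable applied to each noise-dominant mode; a placeholder
              for GSTV filtering, which this sketch does not implement
    """
    output = [0.0] * len(modes[0])
    for mode, label in zip(modes, labels):
        if label == "noise":
            continue  # pure-noise modes are discarded
        if label == "noise-dominant":
            part = denoise(mode)
        elif label == "signal-dominant":
            part = mode  # added to the output unchanged
        else:
            raise ValueError(f"unknown label: {label!r}")
        output = [o + p for o, p in zip(output, part)]
    return output
```

Keeping the denoiser as a parameter separates the grouping logic from the filtering step, so a different filter can be swapped in for comparison without touching the reconstruction.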
The application of other entropy methods such as Multiscale Dispersion Entropy (MDE), Refined Composite Multiscale Dispersion Entropy (RCMDE), and other decomposition methods such as ALIF and Empirical Wavelet Transform (EWT) will be considered in future work.