Block-Adaptive Rényi Entropy-Based Denoising for Non-Stationary Signals

This paper approaches the problem of signal denoising in time-variable noise conditions. Non-stationary noise results in variable degradation of the signal’s useful information content over time. In order to maximize the correct recovery of the useful part of the signal, this paper proposes a denoising method that uses a criterion based on amplitude segmentation and local Rényi entropy estimation which are limited over short time blocks of the signal spectrogram. Local estimation of the signal features reduces the denoising problem to the stationary noise case. Results, presented for synthetic and real data, show consistently better performance gained by the proposed adaptive method compared to denoising driven by global criteria.


Introduction
Non-stationary signals, produced by numerous phenomena that are the focus of various disciplines of engineering, demand particular techniques of representation and analysis due to their frequency content, which varies over time [1].
Two-dimensional energy time-frequency distributions (TFDs) have been found to be particularly useful tools when dealing with challenging features, such as multiple components presenting variable instantaneous frequencies (IFs), or unwanted presence of noise.
In various engineering applications, what is often considered the useful information content of a signal consists of signal components represented in the time-frequency (TF) plane by continuous energy regions, peaks of which correspond to the IFs of the noise-free signal.
However, degradation of the signal quality, which may occur due to noise present during the signal acquisition or transmission, makes the extraction of the signal components from the background noise a challenging task.
On the other hand, methods that are not limited to particular types of signals have been proposed, but these often put constraints on the signal structure (a constant or known number of components is required in the entire measurement time, or signals components are not allowed to intersect) [16][17][18][19].
Selection of the TFD regions that support the signal components have also been proposed by thresholding methods (usually as a percentage of the maximal TFD value) [2,4,7]. However, if the choice of the threshold is not driven by trial-and-error procedures, it can produce significant errors in the extracted information.
New insights into the TFD structure have been provided by local TF entropy analysis. In [20], a TFD amplitude segmentation is proposed to partition the TFD into classes, whose local Rényi entropies (LRE) are evaluated and compared in order to select only those classes that contain the signal components. Compared to methods that make use of 2-D entropy maps for signal denoising [21], the LRE approach, which is a 1-D estimation, drastically reduces the computational cost.
In this paper, we propose a 1-D entropy-based method that, unlike the one in [20], is adapted to perform in conditions of variable noise intensity. This is possible due to multiple partitioning of the signal TFD; the TFD is initially divided into time building blocks, and each block is then subject to amplitude segmentation. The method is not limited to certain types of signals and does not require any previous knowledge of the signal.

Background Theory
A non-stationary multi-component signal can be written in the form [1] with L being the total number of signal components, A l (t) the amplitude of each of the individual components x l (t), and Φ l (t) the instantaneous phase. The IF corresponds to the derivative of the instantaneous phase as f l (t) = Φ l (t)/2π.
Such signals, however, may be distorted at the time of their collection from noisy environments, as well as during the transmission process.
An additive signal model that takes into account the presence of independent additive white Gaussian noise (AWGN), ν(t) ∼ N (0, 1), reads On the other hand, more complex noise conditions may occur as in the case of nonstationary noise, which may be the result of noise intensity modulation. In this case, the noise is of the form w(t)ν(t), with ν(t) being unit normal distribution variance and w(t) non-constant positive. For any t 1 , t 2 , the random variable is the unit normal distribution, hence the process is Gaussian, but, since the amplitude of the process w(t) varies, the process is not stationary in either a strict or weak sense. Thus, even in an ostensibly Gaussian case, non-trivial effects may occur. The simultaneous presence of multiple spectral components and noise privileges representations of such signals in the joint time-frequency domain.
The general class of quadratic time-frequency distributions (TFDs) is defined in terms of the noisy signal instantaneous autocorrelation function, and the time-lag kernel G(t, τ), as [1] A discrete model of the noisy signal reads where The discrete quadratic class TFD is computed as the discrete Fourier transform (DFT) of the convolution of the time-lag kernel filter G(n, i) and the instantaneous autocorrelation function as [1] ρ(n, m) = DFT i→m {G(n, i) * n y(n + i)y (n − i) }, (6) for i in an interval of integers.
In the attempt to select the TFD regions that are considered useful for further analysis, i.e., signal component, data amplitude can be considered as a discriminant. In fact, signal components, when represented in the TF domain, tend to be prominent energy ridges emerging from the noisy background. However, establishing a threshold value to discriminate useful data from noise appears far from practicable.
In this sense, machine learning methods can be considered useful tools for initial data segmentation.
Amplitude discrimination of data in the TFD can be achieved by application of the K-means algorithm [22]. If the TFD, ρ(n, m), is considered as an N×M-dimensional set of observations, the K-means algorithm partitions these N×M observations into K sets by minimizing the within-cluster sum of squares as: where P k is the mean of each set C k .
Thus K classes ρ k (n, m), k ∈ N, 1 ≤ k ≤ K, are obtained from the TFD as collection of coefficients ρ(n, m) satisfying ρ k (n, m) = 1 C k (n, m)ρ(n, m), with the set indicator function defined by Yet, the segmentation procedure itself can not be considered a predictor of the origins of data contained in a particular class. Namely, classes containing mainly noise-originated data should be discarded from further analysis, while classes containing data signal components should be preserved.
The white noise assumption predicts noise power spectral density evenly spread over the signal bandwidth. In fact, a TFD of dimension N computed over M frequency bins, with L components supported over 1-D trajectories, with L N, imposes a sparse distribution on the TF plane [23].
In light of the above, classes containing mainly noise coefficients are expected to present large frequency supports (intended as the subset of the frequency domain containing non-zero elements), in comparison to classes containing signal components.
As a result, the segmentation procedure, ruled by Exp. (8), assigns densely distributed classes to components, due to their fast amplitude changes. Accordingly, data in the resulting classes will present fractional frequency supports in comparison to classes populated by noise-originated data.

A LRE-Based Criterion for Useful Class Selection
In [20], the properties of the LRE, well-known measures of signal supports in the TF plane, have been exploited in order to identify structural differences between the noise classes and signal component classes.
The LRE is obtained by estimating the Rényi entropy over a short time slice of duration ∆n of one class of a positive TFD, as: (11) with n constrained to the interval [p − ∆n/2, p + ∆n/2], m going over all frequency bins, and ρ(m, n) normalized to 1 across the computation domain. The LRE measure, highly indicative of TFD supports, is then used as input to a selection procedure that provides classes containing signal components. This procedure produces highly reliable discrimination of classes containing noise from classes containing signal components.
The TFD of a two-component noisy signal and LRE estimates H k (p) for K-means segmentation into five classes are reported in Figure 1a,b.
The selection procedure reported in the steps below is based on a relative distance criterion of the LRE functions H k (p): 1.
The first class ρ 1 (n, m), consisting of the smallest coefficients obtained from expression Exp. (8), associated to the LRE function H 1 (p), is discarded as noise.

2.
Starting from the second class, ρ 2 (n, m), all consecutive classes for which for at least one instant p H k (p) ≥ H 1 (p) holds are classified as noise, and thus discarded.

3.
Considering that in the previous steps a total of r classes have been discarded, for the remaining classes, ρ k (p), r < k ≤ K we introduce a threshold i as follows. We first introduce a closeness value d k on the remaining classes by We now define i as the least index of The remaining classes, with indexes k = i, ..., K, are added up to obtain the useful information content of the signal, i.e., the signal components.
The criterion for class selection requires, according to Exp. (8), that the first class ρ 1 (n, m) is populated by coefficients with the smallest amplitudes. By this assumption, the first class is discarded. The algorithm then relies on a class's structural affinity criterion: if a class, ρ k (n, m), presents larger frequency supports than ρ 1 (n, m), it will be also considered populated by noise coefficients. In other words, all the classes that for at least one instant p satisfy H k (p) ≥ H 1 (p) are also discarded.
The remaining classes, starting from the two consecutive classes whose LRE presents the smallest difference H k (p) − H k+1 (p), are classified as useful classes, i.e., containing coefficients generated from signal components.
The useful classes summed up, representing the extracted signal components, are shown in Figure 1c.
However, the method has been initially formulated for noisy signals in the form of Equation (4), which implies constant noise intensity over the measurement time. This fact then puts into question the suitability of the method for a more generalized signal of the form with non-constant noise intensity w(n) = const.
In fact, in the case of noise with variable intensity, the region of the signal presenting the largest local signal-to-noise ratio (SNR) will be the one dictating the number of classes to be discarded. This will cause an unjustified loss of useful information in the regions of the TF plane where the components' structure is better preserved.
The described scenario can be observed in Figure 1: the two-component non-stationary signal is embedded in AWGN modulated by a Gaussian window of the form w(n) = e −2β 2 (n−N/2) 2 (N−1) 2 , β = 2.5. As can be observed, the intense noise level in the central part of the signal forces classes C 1 , C 2 , and C 3 to be discarded, since in the central part of the signal noise content is significant. On the other hand, in the initial and final part of the measurement interval, the presence of noise is negligible, which is ignored by the selection criterion. This results in severe loss of signal components in the regions of the TF plane where the level of noise is low.

A Short-Term LRE Approach for Variable Noise Intensity Conditions
In order to address the problem of component extraction in the case of noise with variable intensity over time, we propose a local approach, which accounts for variable degradation of useful content over the temporal axis. To achieve a local insight into the TFD structure, we split the TFD into temporal segments, representing the TFD as the sum of N ∆t ∈ N blocks of duration ∆t. The amplitude segmentation based on the K-means over N/∆t blocks will result in K classes for each block: Again, the within-cluster sum of squares minimization is applied to produce these K classes for each building block, as where P k,t is the mean of each set C k,t .
Thus we obtain K classes ρ k,t (n, m), k ∈ N, 1 ≤ k ≤ K, derived form one TFD block ρ t (n, m), as follows For a non-negative TDF, the LRE is now well-defined and dependent both on t, representing the block index, and p, the temporal index inside one block: with n constrained to the interval [p − ∆n/2, p + ∆n/2], m going over all frequency bins, and where ρ(m, n) is normalized across the summation domain.
Since the local Rényi entropy of Equation (18) is frequency-shift invariant, by shifting different energy regions inside one time slice ∆n we can obtain a single energy region with compact frequency support ∆m, centered around an arbitrary frequency µ t (p).
Taking a non-zero region inside a short time interval ∆n and frequency m, with frequency support ∆m, we approximate all non-zero elements in one class ρ k,t (n, m) by A k . This approximation is justified since the K-means clustering is done across amplitudes.
The adopted approximation makes the amplitude A k homogenous of order α in the expression for the Rényi entropy, so it cancels out.
The local Rény entropy of ρ k,t (n, m) now becomes Thus, regardless of the choice of α, the LRE describes a normalized weight of the TFD domain belonging to a certain amplitude class and point in time dependent on p and t.
The criterion for selecting classes containing useful information is now applied to each time block t ranging from 1 to N/∆t. Thus, the algorithm can be summarized as follows:

1.
Initially t is set to one. 2.
The first class ρ 1,t (n, f ), consisting of the smallest coefficients obtained from Exp. (15) and associated with the LRE function H 1,t (p), is discarded as noise.

3.
Starting from the second class, ρ 2,t (n, f ), all consecutive classes for which for at least one instant p holds H k,t (p) ≥ H 1,t (p) are discarded as noise.

4.
Assuming that in steps 2 and 3 a total of r t classes has been discarded, for the remaining classes ρ k,t (p), r t < k ≤ K we introduce a threshold i t as follows. We first introduce a closeness value d k,t on the remaining classes by We now define i t as the least index of d i t ,t such that d k,t < d i,t for k < i t and d k,t ≥ d i,t for k > i t .

5.
The classes with indices k = i t , . . . , K are summed together to obtain the TFD building block with the useful information content of the signal at block t, 6. t is incremented by one and the procedure is repeated from step 2, until t = N/∆t.

7.
The total useful information of the signal is obtained by summing the extracted information over all the building blocks as The performance of the proposed method, applied over ∆t = N/5 TFD blocks with N = 500 s, is reported in Figure 2. The noisy TFD is partitioned in five blocks of duration ∆t = 100 s (Figure 2a). The LRE is estimated for five classes of each of the TFD blocks (Figure 2b), and the output of the algorithm according to the selection criterion applied to each of the TFD building blocks is shown in Figure 2c.
Compared to the extracted components when the LRE is estimated over the entire time axis (Figure 1c), the presented approach, which estimates the LRE over individual time blocks, clearly preserves useful information (as visible from the first and last time blocks) by adapting the number of removed coefficients over the measurement time.

Real Data
As illustrative examples of real-life signals, acoustic waves propagated in air, namely a flute sound signal and a bird song signal, are considered (the omnidirectional microphone used for data collection has a sample/bit rate of 48 kHz/16-bit, frequency response of 20 Hz-20 kHz, and sensitivity −36 dB (1 V/Pa at 1 kHz)).
The flute signal has been collected by a bit depth of 16 bits per sample, with a sampling rate of 8 kHz degraded by transient television static.
The noisy TFD, together with results produced by the non-adaptive LRE-based algorithm [20] and the proposed block-adaptive LRE algorithm is reported in Figure 3a,c,e.
The bird song signal has been collected in an environment of howling wind and rustling leaves (psithurism), by a bit depth of 16 bits per sample, with a sampling rate of 12 kHz. The noisy TFD, the outputs of the non-adaptive LRE-based algorithm [20] and the proposed block-adaptive LRE algorithm are reported in Figure 3b,d,f.
For both reported real-life signals, the non-adaptive LRE approach results in a significant loss of useful information compared to the proposed block-adaptive LRE algorithm.
In fact, for both considered signals, the non-adaptive LRE estimation applies an excessively strict criterion on signal components, as shown in Figure 3c,d.
The non-adaptive LRE criterion is determined by the worst noise conditions encountered over the entire measurement time. As a result, the non-adaptive LRE provides near-to-optimal denoising only for the most compromised parts of the signal TFD, with unnecessary useful information loss elsewhere. On the other hand, the block-partitioning treats sub-regions with less variable noise levels, which assures locally adequate K-means segmentation and efficient noise removal, with a contained loss of useful information (Figure 3e,f). For real-life signals, an ideal, noise-free counterpart of the signal is not available, as well as a numerical assessment of the algorithms' performance. However, the visual assessment suggests the method provides a more favorable trade-off between noise suppression and useful information loss when compared to the non-adaptive LRE approach.
This, however, needs to be corroborated by numerical evidence on synthetic data.

Simulation Results
The results provided by the proposed block-adaptive LRE algorithm are compared to those obtained by the non-adaptive LRE-based algorithm presented in [20] and the well-performing K-means ICI algorithm, which combines TFD K-means segmentation with an intersection of confidence intervals (ICI) statistical criterion on classes supports to recover useful information [8].
The performance quality is evaluated by means of the error-rate measure. The error rate is calculated by subtracting the denoised TFDs from the reference, noise-free TFD.
The number of residual non-zero elements represents the error rate as a percentage of the N × M-dimensional set of observations.
The results are presented for two multi-component test signals for different SNRs.
The first of the test signals, which will be referred to as Sig 1, consists of two linearly frequency modulated components, with the addition of non-stationary white Gaussian noise.
The second signal, which will be referred to as Sig 2, presents two components, with sinusoidal and parabolic frequency modulations, respectively. Sig 2, which is embedded in additive non-stationary uniform white noise, and presents intersecting components. The signals' parameters are reported in Table 1.
The spectrograms of the noise-free and noisy signals (computed with a Hamming window with a duration of round (N/7) the information content extracted by the nonadaptive LRE algorithm, the K-means/ICI method, and the block-adaptive LRE algorithm are shown in Figure 4.   (g,h), and extracted useful information by the proposed block-adaptive LRE algorithm (i,j). Signals are referred to as Sig 1 (left column) and Sig 2 (right column). Table 1. Parameters of the test signals from Figure 4. y(n) = ∑ L l=1 x l (n) + w(n)ν(n), x l (n) = A l (n)e jΦ l (n) 2π(195 × 10 −6 n 2 + 249 × 10 −3 n − 76) 2π(283 × 10 −6 n 2 + 9 × 10 −3 n − 21)  Table 2 reports the obtained results in terms of error rate (ER), and false negative (FN) estimates, representing the useful information damage. Results are based on 1000 independent noise realizations for the two test signals and four different SNRs. For the proposed algorithm, results are reported for three different values of ∆t (building-block duration). Table 2. Performance comparison for signals in Figure 4. Values are averaged from 1000 simulations of the signal with different noise realizations.

Discussion
As visible from Figure 4 and corroborated by numerical results in Table 2, the proposed adaptive method convincingly outperforms the non-adaptive LRE-based algorithm and the ICI method in terms of ER; even better results are obtained for both the test signals over the entire simulation set while maintaining stable performance with respect to the parameter ∆t.
In terms of ER performance, improvement compared to the non-adaptive LRE-based algorithm spans from 10.76% (∆t = N/5, 6 dB) to 19.10% (∆t = N/9, 0 dB) in the case of Sig 1.
When compared to the ICI method, the proposed algorithm reduces the ER in the range of 4.19 (∆t = N/9, −3 dB) to 9.51% (∆t = N/9, 6 dB) in the case of Sig 1.
Considering the FN component of the ER brings detailed insight into the performance of the compared methods.
Compared to the proposed method, the non-adaptive LRE-based algorithm presents significantly larger ER and FN estimates, suggesting that it provides a too strict criterion in the case of variable noise conditions, causing unjustified loss of useful information. In terms of FN estimates, the ICI method performs similarly to the proposed algorithm, while maintaining a substantially larger ER. This is to indicate that the criterion applied by the ICI method is too concessive for signals embedded in non-stationary noise.
Partitioning of the TFD allows the proposed block-adaptive algorithm to enhance the sensitivity of the K-means amplitude segmentation since inside one time block nearto-stationary noise conditions are achieved. As a result, time blocks present lower noise levels and better-preserved components are spared from the strict criterion that is applied to blocks severely degraded by noise presence. This results in a generally better recovery of signal components while maintaining accurate denoising over different time blocks.
The partitioning does not add a significant computational burden. The computation of local entropy is in effect done here across the whole TFD, so the total computational cost is O(N MK), the same as the non-adaptive LRE approach. The block-based method presented here can be seen as iterative applications of non-adaptive LRE over smaller segments, so K-means is done multiple times but over smaller segments. The approximate K-means algorithm's cost is controlled by an iteration count I and is linear in the size of the blocks. Thus, we apply K-means N/∆n times on M × ∆n blocks with individual complexity O(M∆nIK), resulting in O(MNKI), asymptotically the same as for non-adaptive LRE.
The complexity of the 2-D entropy estimation [21], on the other hand, is O(N 5 ), which is impeditive for application on real-life signals and extensive simulations.
Concerning parameter tuning, the selection of K, ∆t, and ∆n should be discussed. The number of classes K, analyzed in [24], is recommended to be in the range K = 6 . . . 12. If chosen from the reported interval, the influence of the number of classes on the segmentation results is marginal.
As for the size of the building block length ∆t, the obvious relation N > ∆t > ∆n is imposed, where ∆n, is a constant in Equation (19) [20]. The size of the building block length ∆t determines the degree of the algorithm's adaptability. Values that are too large would result in the algorithm's poor adjustment to the signal's local conditions. For values that are too small, the windows would cause a loss of the TFD's structural features, which are the core of the denoising process. However, both these extremes can be intended as loose constraints, since results are stable for independent ranges N/10 < ∆t < N/5 and N/50 < ∆n < N/25.
Thus, none of the parameters can be considered critical.

Conclusions
This paper presents a method for signal denoising in time-variable noise conditions. The denoising criterion is based on amplitude segmentation and LRE estimation over individual time building blocks of the signal spectrogram. By introducing the blockadaptive approach, the effects of noise non-stationarity are minimized.
In view of these considerations, and based on the reported results, the local estimation approach of the TFD's structural features results in being beneficial for efficient denoising in the case of non-stationary noise.
Furthermore, in this work, we have considered only non-stationary signals with variable noise intensity over time. On the other hand, if signals were corrupted by colored noise with uneven frequency distribution, an extension of the methods' adaptivity should be considered. Band-limited block LRE estimation, representing a multi-dimensional adaptive denoising method, will be the focus of our future work.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: