(cid:96) p -STFT: A Robust Parameter Estimator of a Frequency Hopping Signal for Impulsive Noise

: Impulsive noise is commonly present in many applications of actual communication networks, leading to algorithms based on the Gaussian model no longer being applicable. A robust parameter estimator of frequency-hopping (FH) signals suitable for various impulsive noise environments, referred to as (cid:96) p -STFT, is proposed. The (cid:96) p -STFT estimator replaces the (cid:96) 2 -norm by using the generalized version (cid:96) p -norm where 1 < p < 2 for the derivation of the short-time Fourier transform (STFT) as an objective function. It combines impulsive noise processing with any time-frequency analysis algorithm based on STFT. Considering the accuracy of parameter estimation, the double-window spectrogram di ﬀ erence (DWSD) algorithm is used to illustrate the suitability of (cid:96) p -STFT. Computer simulations are mainly conducted in α -stable noise to compare the performance of (cid:96) p -STFT with STFT and fractional low-order STFT (FLOSTFT), Cauchy noise, and Gaussian mixture noise as supplements of di ﬀ erent background noises to better demonstrate the robustness and accuracy of (cid:96) p -STFT.


Introduction
The outstanding advantages of strong anti-jamming, a low interception rate, and a high-frequency spectrum utilization rate [1][2][3] have led to the rapid development of frequency hopping communication technology in military radio, military wireless communication, and civil wireless local area networks. Under the condition of less prior information, the blind parameter estimation of captured frequency-hopping (FH) signals is the core problem of FH communication. Accurate estimation parameters are helpful to monitor and interfere with enemy communications [4] and then steal enemy signals precisely and effectively, which is a crucial part of communication countermeasures.
The popular parameter estimation methods for FH signals mostly utilize a Gaussian model as background noise. However, a large amount of impulsive noise often occurs in the practical intricate noise environment, such as low-frequency atmospheric noise, low-voltage power communication, seismic signals, and radar clutter [5][6][7], which are characterized by remarkable impulses [8] and heavier tails. Since they have no second-order or higher moments, the performance of the existing time-frequency analysis methods based on the Gaussian noise model is usually unacceptable or even invalid: for example, the commonly used short-time Fourier transform (STFT), which is easy to calculate for simple principles [9]; Wigner-Ville distribution (WVD), with high time-frequency resolution; and pseudo-Wigner-Ville distribution (PWVD), with fewer cross-terms. Even improved algorithms that have been proposed in recent years will exhibit undesirable performance degradation, such as the spectrogram method, double-window spectrogram difference (DWSD) [10] method, and STFT-WVD joint algorithm.

Impulsive Noise Distribution
The α-stable distribution and Gaussian mixture distribution (GMD) are two basic and widely used empirical models for non-Gaussian noise.
The α-stable distribution is completely determined by four unique parameters: α, β, γ, µ. The characteristic exponent α reflects the degree of the impulsive characteristics, which is embodied as the smaller the value of α, the stronger the impulse characteristics. The symmetry β determines the symmetry of the distribution. The dispersion γ is used to measure the extent to which the sample deviates from the mean, and finally, µ is the location parameter. If β = 0, γ = 1, and µ = 0, the distribution corresponds to the standard symmetric α-stable distribution (SαS). For α = 1 with β = 0 or α = 2, they correspond to two special cases of Cauchy distribution and Gaussian distribution [21], which is consistent with the implication of α.
The second-order GMD can be expressed as a weighted summation of two independent and identically distributed (IID) Gaussian distributions [22]; that is: where 0 < ε 1 < 1 is the weighting coefficient, which is used to control the ratio of impulsive noise to Gaussian noise; and µ 1 , µ 2 and σ 2 1 ,σ 2 2 are the mean and variance of the Gaussian distribution, and the impulse characteristics are represented by the large difference between the two variances; that is, σ 2 2 /σ 2 1 . Figure 1 shows the magnitude of the Cauchy noise, the standard SαS noise, and mixed Gaussian noise in the time domain.
The α-stable distribution is completely determined by four unique parameters: α, β, γ, μ. The characteristic exponent α reflects the degree of the impulsive characteristics, which is embodied as the smaller the value of α, the stronger the impulse characteristics. The symmetry β determines the symmetry of the distribution. The dispersion γ is used to measure the extent to which the sample deviates from the mean, and finally, μ is the location parameter. If β = 0, γ = 1, and μ = 0, the distribution corresponds to the standard symmetric α-stable distribution (ЅαЅ). For α = 1 with β = 0 or α = 2, they correspond to two special cases of Cauchy distribution and Gaussian distribution [21], which is consistent with the implication of α.
The second-order GMD can be expressed as a weighted summation of two independent and identically distributed (IID) Gaussian distributions [22]; that is: where 1 0 1 ε < < is the weighting coefficient, which is used to control the ratio of impulsive noise to Gaussian noise; and 1 μ , 2 μ and 2 1 σ , 2 2 σ are the mean and variance of the Gaussian distribution, and the impulse characteristics are represented by the large difference between the two variances; that is, 2 2 2 1 σ σ . Figure 1 shows the magnitude of the Cauchy noise, the standard ЅαЅ noise, and mixed Gaussian noise in the time domain.

Proposed Algorithm
The received FH signal is formulated as:

Proposed Algorithm
The received FH signal is formulated as: with 0 < t ≤ T; A is a numerical constant denoting the amplitude of the FH signal; rect T H = 1 t ∈ (−T H /2, T H /2) 0 others denotes a rectangular window; T H and f k are the hopping period and hopping frequency to be estimated separately; and n(t) represents the impulsive noise.

Implementation of p -STFT
The definition of STFT indicates that a long non-stationary signal is transformed into a short stationary signal matrix by adding a window function in the signal time domain, then perform DFT (Discrete Fourier Transform) on the windowed signal matrix to obtain the time-frequency spectrum matrix. Therefore, the 1D signal model is innovatively represented as a 2D signal matrix: where Y = [y 1 , y 2 , . . . , y N ] denotes a K × N reconstructed data matrix and has the form . , x N ] ∈ C K×N denotes the time-frequency spectrum matrix that needs to be estimated.
denotes a steering vector with w k = 2πk/K, k = 0, 1, . . . , K − 1. y N = [y 1 , y 2 , . . . , y m , . . . , y k ] is the short discrete stationary signal that is achieved by moving the position of the window function on the observed signal at a different time. Since the window function length M is less than the estimated spectrum length K, that is M < K, the zero-padding on signal matrix is required. Z = [z 1 , z 2 , . . . , z N ] ∈ C K×N is the impulsive noise. The estimation task is to construct the above STFT cost function to obtain time-frequency spectrum X from signal Y mixed with impulsive noise.
In conventional estimators, the cost function of spectrum estimation by DFT is in form of 2 -norm, which is exploited in reference [18]. Considering the relationship between STFT and DFT, we know that the cost function of STFT has the same form of 2 -norm, that means the estimation of X, denoted byX, can be obtained byX = argmin which corresponds to 2 -norm minimization, and · F represents the 2 -norm. Note that X is corresponding to the spectrum obtained by directly performing STFT on the FH signal. To illustrate this point visually, we verify this result in the first part of simulation. It is well known that the LS solution is equivalent to the maximum likelihood (ML) estimate when noise is a zero-mean white Gaussian process. In fact, it was also verified in reference [23] for LS-based parameter estimation in the presence of white Gaussian noise via several representative signal processing examples. Nevertheless, the validity of the Gaussian assumption is approximate at best, while the occurrence of non-Gaussian noise is common in many fields. In particular, if the noise is impulsive noise, unreliable parameter estimation will result, since the performance of the 2 -norm minimizer is very sensitive to outliers. To achieve robust estimation, p -norm minimization with 1 < p < 2 is widely used since it is less sensitive to outliers than the square function. In this work, we focus on the cost function of p -norm · p . In light of the p -STFT model, it first calculates the initial spectrum of X without filtering the outliers. Then, the powerful iterative reweighted least squares (IRLS) algorithm [24] is employed to eliminate the noise interference. One thing to point out is that the cost function of p -STFT is a matrix that needs to be converted to column vectors before processing noise by IRLS [25]. IRLS will continuously iterate the weight matrix to obtain a satisfactory solution that converges to optimal accuracy. Finally, a time-frequency analysis algorithm via p -STFT is constructed.
Since each column in matrix Y is independent, we can convert a big equation of the cost function matrix into small questions of N vectors [26] to tackle. That is, the matrix objective function of Equation (9) can be further expressed as where y n and x n are the nth column of Y and X, respectively. From there we can separately process the p -norm of each column vector and then arrange the processed N sub-problems into a matrix in order, which is the time-frequency spectrum we ultimately need. Now the vector data model can be written as: where (11) is solved by minimizing the vector objective function as follows:x withx K = [x 1 ,x 2 , . . . ,x K ]. Employing (12) yields: To solve Equation (13), IRLS is applicable to settle such minimizing cost function problem of p -norm. It starts with a rough estimate of vector x K and then adjusts the weight coefficient to gradually approximate the optimal value. The objective function f (x) can be reformulated as It is not difficult to obtain the matrix form of Equation (14), and it can be expressed as where is a diagonal weighting matrix represented by a residual vector; that is, r K = y K − a K (w k )x k . We further analyze Equation (16) to get the estimate ofx k aŝ However, here x k is taken as a function of W K , and W K is also taken as a residual function of x k , and we cannot get the optimal estimate of x k in one step. Usually, all elements in the weighted matrix are initially set to W (0) K = 1, and the initial value of vector x K can be given by The core of the IRLS method lies in continuously updating the estimated vector by updating weighted coefficients that utilize the previous residual vector. After each iteration of vector x K , IRLS will automatically judge whether it converges. If the accuracy requirement does not meet the tolerance ε, which is an extremely small positive number, it will continue iterating to guarantee convergence.
Afterwards, following the order of the previous matrix vectorization, we rearrange the series of vectors processed by p -STFT into a time-frequency matrix in order: At this point, we choose an appropriate time-frequency analysis algorithm and construct it via the previously generated p -STFT. The complete p -STFT algorithm is summarized in Algorithm 1.

Algorithm 1: For p -norm short-time Fourier transform ( p -STFT)
1. Reconstruct the original signal into a signal matrix and build the STFT cost function. 2. Simplify the matrix p -norm optimization into sequential vector p -norm optimization.

Initialize the weighted matrix
6. Repeat steps 4 and 5 until k = K to get the updatedx k (n) . If go to step 4. 7. Arrange the vectors into a time-frequency matrix in order. 8. Construct time-frequency analysis algorithm using a p -STFT spectrum matrix.

Parameter Estimation Based on p -STFT
As usual, whether it is directly acquiring the characteristics of the FH signal for tracking interference or further constructing the local oscillator signal to steal enemy information, the blind estimation of parameters is a necessary condition for the next step. The double-window spectrogram difference (DWSD) method has a superior time-frequency representation that can accurately estimate important parameters such as hopping period, hopping frequency, hopping time set, and hopping rate. More specifically, the spectrogram (SPEC) method should be performed using a long window function h l (t) and short window function h s (t), respectively on signal to obtain the time-frequency transform results. Combining the two sets of results with better time and frequency resolution will provide the time-frequency analysis of DWSD. The definition of DWSD is given as where using the p -STFT method to eliminate the presence of outliers and acquire the p − STFT l (t, w) and p − STFT s (t, w). The normalized Stankovic entropy measure is an effective criterion used for evaluating the time-frequency concentration in the spectrogram [10], which has the expression where SP(n, m) denotes the signal entry of time-frequency distribution that is affected by the length of the window function. The smaller the value of R 3 , the better the time-frequency performance and the more optimal the length of the window function. Suppose N equals the length of signal, Figure 2 shows the curve of entropy measure R 3 obtained by simultaneously varying the length of long window and the short window between 0~N/2 and N/8~5N/8 in a same sampling step. When the short window ranges from N/20 to 9N/40 and the long window ranges from N/4 to 9N/20, we can obtain a better time-frequency performance.
where ( ) ( ) using the ℓp-STFT method to eliminate the presence of outliers and acquire the . The normalized Stankovic entropy measure is an effective criterion used for evaluating the time-frequency concentration in the spectrogram [10], which has the expression where ( , ) SP n m denotes the signal entry of time-frequency distribution that is affected by the length of the window function. The smaller the value of 3 R , the better the time-frequency performance and the more optimal the length of the window function. Suppose N equals the length of signal, Figure 2 shows the curve of entropy measure 3 R obtained by simultaneously varying the length of long window and the short window between 0~N/2 and N/8~5N/8 in a same sampling step. When the short window ranges from N/20 to 9N/40 and the long window ranges from N/4 to 9N/20, we can obtain a better time-frequency performance. Afterwards, we can apply the treated ℓp-STFT spectrum to construct the crucial time-frequency diagram even in an impulsive noise environment. The To more conveniently and clearly estimate parameters, we can take the maximum value of the time-frequency diagram to obtain the time-frequency ridge line, as shown in Figure 5b.
Performing first-order difference on the time-frequency ridge line will produce a frequency difference sequence 1 d . Afterwards, we can apply the treated p -STFT spectrum to construct the crucial time-frequency diagram even in an impulsive noise environment. The p -DWSD can be expressed as To more conveniently and clearly estimate parameters, we can take the maximum value of the time-frequency diagram to obtain the time-frequency ridge line, as shown in Figure 5b.
Performing first-order difference on the time-frequency ridge line will produce a frequency difference sequence d 1 .
The time domain position of the value in d 1 is an equally spaced difference sequence d 2 , which corresponds to the hopping time set. The estimated hopping period is the mean value of a set of the hopping period approximations, which are acquired by performing first-order difference on d 2 . Then we can easily estimate the hopping rate, which is the reciprocal of the hopping period and the hopping time set, because of the changeless hopping period.
The estimation of hopping frequency is also based on the time-frequency ridge line, and can be acquired by averaging the instantaneous frequency in each hopping period.
After obtaining the above parameters, it is not difficult to determine that the hopping period and hopping frequency are two basic parameters. The estimation accuracy of these two parameters is an important criterion to evaluate the performance of the p -STFT algorithm.

Simulation Result and Analysis
Computer simulations were conducted to investigate the performance of the derived estimator. The signal is generated by Equation (4), where the parameters in the FH signal are set as follows: eight hopping periods with T H = 0.1s, the sampling rate f s = 10 kHz, the sampling point N = 800, the hopping frequencies f k = {0.5, 2.5, 1, 3.5, 2, 3, 1.5, 4} kHz, and the noise n(t) being standard SαS noise with α = 1.5, β = 0, γ = 1. Since impulsive noise has no finite second-order moment, the generalized signal-to-noise ratio (GSNR) [27] will be more applicable than the traditional signal-to-noise ratio, and is set to 3 dB. The order of p -STFT satisfies 1 < p < 2, p = 1.2, the explanation for p is given in Appendix A, and the tolerance is taken as ε = 1 × 10 −4 . All results were simulated using MATLAB running on Intel(R) Core(TM) i7-4790 CPU@3.60 GHz and Windows 10 for observations. First, we verified the equivalence between the spectrogram obtained by STFT and the spectrogram estimated from the cost function of STFT based on 2 -norm. We select the length of the window function is 100 and carry out the experiment in non-noise environment to removal interference. Figure 3 shows that the time-frequency spectrograms produced by these two ways are almost consistent. It serves well to prove that the cost function of STFT is effective and the spectrogram obtained by STFT is equivalent to the spectrogram estimated by the cost function base on 2 -norm.  Second, we studied the performance of the proposed method applied to the DWSD algorithm in α-stable noise. The lengths of the long and short windows are 251 and 51, respectively. This means they are within the range we have illustrated and the selection of window function length is suitable for the DWSD method. The time-frequency diagram of STFT with a narrow window function is shown in Figure 4a, which is also the initial spectrogram of ℓp-STFT. We can see that there are several impulse interferences that are maintained for a short time but with a high amplitude, as well as many small noise points, both of which seriously disturb the signal estimation. In contrast, the spectrogram in Figure 4b, which was obtained by ℓp-STFT, effectively extracts the energy of signal submerged in  Second, we studied the performance of the proposed method applied to the DWSD algorithm in α-stable noise. The lengths of the long and short windows are 251 and 51, respectively. This means they are within the range we have illustrated and the selection of window function length is suitable for the DWSD method. The time-frequency diagram of STFT with a narrow window function is shown in Figure 4a, which is also the initial spectrogram of p -STFT. We can see that there are several impulse interferences that are maintained for a short time but with a high amplitude, as well as many small noise points, both of which seriously disturb the signal estimation. In contrast, the spectrogram in Figure 4b, which was obtained by p -STFT, effectively extracts the energy of signal submerged in impulsive noise while removing noise outliers. In the same way, the result comparison of time-frequency diagrams using a wide window function are presented in Figure 4c,d. The result is consistent with the effect of p -STFT adding the narrow window function. Therefore, p -STFT provides excellent robustness that appears to be hardly affected by the characteristics of noise outliers. Then, Figure 5a,b shows the time-frequency diagram and the time-frequency ridge line of p -DWSD, which combines the above two sets of optimal results processed by p -STFT. p -DWSD exhibits extraordinarily powerful time-frequency concentration and accurately describes the parameter information of FH signals, even in an impulsive noise environment.
Third, we selected the algorithms related to the STFT to compare the performance of parameter estimation: STFT, FLOSTFT, and p -STFT. The FLOSTFT method represents a class of algorithms based on fractional low-order statistics, and the fractional lower moments assumes p = 0.4. After repeating 100 experiments on these three methods in α-stable noise with α = 1.2, β = 0, γ = 1, we compared the relative error of hopping period and the mean squared error (MSE) of hopping frequency to evaluate their parameter estimation performance.  Second, we studied the performance of the proposed method applied to the DWSD algorithm in α-stable noise. The lengths of the long and short windows are 251 and 51, respectively. This means they are within the range we have illustrated and the selection of window function length is suitable for the DWSD method. The time-frequency diagram of STFT with a narrow window function is shown in Figure 4a, which is also the initial spectrogram of ℓp-STFT. We can see that there are several impulse interferences that are maintained for a short time but with a high amplitude, as well as many small noise points, both of which seriously disturb the signal estimation. In contrast, the spectrogram in Figure 4b, which was obtained by ℓp-STFT, effectively extracts the energy of signal submerged in impulsive noise while removing noise outliers. In the same way, the result comparison of timefrequency diagrams using a wide window function are presented in Figure 4c,d. The result is consistent with the effect of ℓp-STFT adding the narrow window function. Therefore, ℓp-STFT provides excellent robustness that appears to be hardly affected by the characteristics of noise outliers. Then, Figure 5a,b shows the time-frequency diagram and the time-frequency ridge line of ℓp-DWSD, which combines the above two sets of optimal results processed by ℓp-STFT. ℓp-DWSD exhibits extraordinarily powerful time-frequency concentration and accurately describes the parameter information of FH signals, even in an impulsive noise environment.    (c) (d) Third, we selected the algorithms related to the STFT to compare the performance of parameter estimation: STFT, FLOSTFT, and ℓp-STFT. The FLOSTFT method represents a class of algorithms based on fractional low-order statistics, and the fractional lower moments assumes p = 0.4. After repeating 100 experiments on these three methods in α-stable noise with α = 1.2, β = 0, γ = 1, we compared the relative error of hopping period and the mean squared error (MSE) of hopping frequency to evaluate their parameter estimation performance.
On the one hand, in terms of the estimated performance of the hopping period that is shown in Figure 6a, the accuracy of STFT without any processing is the worst among the three methods but is also the most primitive. STFT can be used as an intuitive indicator of performance comparison. From the holistic perspective of the curve, the larger the impulse amplitude, the lower the fault tolerance of the algorithm and the worse the performance becomes. When GSNR > 3 dB, both FLOSTFT and ℓp-STFT methods almost accurately estimate the hopping period and maintain stable estimation under high GSNR, and for STFT, the same result requires a higher GSNR to be completed, while the robustness of the algorithm under low GSNR can better prove the superiority of the algorithm. From Figure 4a we can find that, first, at low GSNR, the overall performance of the ℓp-STFT algorithm is superior to FLOSTFT and FLOSTFT is better than STFT. Second, the periodic relative error curve of the ℓp-STFT algorithm has the fastest rate of decline compared with the other two methods. On the one hand, in terms of the estimated performance of the hopping period that is shown in Figure 6a, the accuracy of STFT without any processing is the worst among the three methods but is also the most primitive. STFT can be used as an intuitive indicator of performance comparison. From the holistic perspective of the curve, the larger the impulse amplitude, the lower the fault tolerance of the algorithm and the worse the performance becomes. When GSNR > 3 dB, both FLOSTFT and p -STFT methods almost accurately estimate the hopping period and maintain stable estimation under high GSNR, and for STFT, the same result requires a higher GSNR to be completed, while the robustness of the algorithm under low GSNR can better prove the superiority of the algorithm. From Figure 4a we can find that, first, at low GSNR, the overall performance of the p -STFT algorithm is superior to FLOSTFT and FLOSTFT is better than STFT. Second, the periodic relative error curve of the p -STFT algorithm has the fastest rate of decline compared with the other two methods. Furthermore, at GSNR = 1 dB, the relative error value of p -STFT is almost equal to zero. This effectively demonstrates that the p -STFT algorithm not only maintains stable and accurate estimation at high GSNR, but also quickly achieves relatively high parameter estimation accuracy at low GSNR.
On the other hand, from the perspective of estimating the performance of the hopping frequency, in addition to α-stable noise, Cauchy noise with α = 1 and GWD noise with ε = 0.03, µ = 0, σ 1 = 2, σ 2 2 /σ 2 1 = 3000 are also used as impulsive noise environments to better illustrate the robustness of the p -STFT algorithm. Comparing the MSE curve of the three methods presented in Figure 6b-d in any kind of noise environment, p -STFT has higher accuracy for frequency estimation than the other two methods. For Figure 6b, with smaller impulsive amplitudes, when GSNR < 0 dB, the estimated performance of FLOSTFT is seriously degraded, whereas p -STFT can still estimate the hopping frequency comparatively accurately, and this point is more evident in the Cauchy noise environment of Figure 6c. The strong impulsive noise features further highlight the advantages of p -STFT, and the estimation precision of p -STFT is nearly 10 dB higher than FLOSTFT when GSNR < 0 dB. Figure 6d showing GWD noise represents the case where the impulse appears more frequently, and the noise removal effect of the p -STFT algorithm is still the best under this scenario, at about 8 dB higher than FLOSTFT in low GSNR. Even when GSNR > 0 dB, although the estimated performance of both algorithms is improved, p -STFT is also maintained at nearly 3 dB higher than FLOSTFT for all impulsive noise environments. This result convincingly proves the robustness of p -STFT to outliers and the accuracy of parameter estimation in the face of strong impulsive noise. and the noise removal effect of the ℓp-STFT algorithm is still the best under this scenario, at about 8 dB higher than FLOSTFT in low GSNR. Even when GSNR > 0 dB, although the estimated performance of both algorithms is improved, ℓp-STFT is also maintained at nearly 3 dB higher than FLOSTFT for all impulsive noise environments. This result convincingly proves the robustness of ℓp-STFT to outliers and the accuracy of parameter estimation in the face of strong impulsive noise.

Conclusions
Almost all algorithms approximately applicable to the Gaussian distribution model have varying degrees of performance degradation in an impulsive noise environment. To solve such a knotty problem, a new algorithm called p -STFT is proposed for robust and accurate parameter estimation of FH signals. It takes STFT as an associated bond to combine the p -norm algorithm with the time-frequency analysis algorithm. The simulation results demonstrate that the p -STFT method has strong robustness and parameter estimation accuracy, especially in low-GSNR environments. However, the computational complexity caused by iteration requires further research in the future.
The convergence rate of ℓp-STFT method in α-stable noise, Cauchy noise and GWD noise are also compared. In fact, we primarily care about the number of iterations of ℓp-STFT method under different tolerances. Seven tolerances are taken for verification, that are ε = 1 × 10 -6 , 1 × 10 -5 , 1 × 10 -4 , 1 × 10 -3 , 1 × 10 -2 , 1 × 10 -1 and 1 × 10 -0 . From the Figure A1b we observed that the proposed algorithm has similar convergence rate under different impulsive noises, which has a linear approximation relationship to the tolerance accuracy. In addition, it can be seen that for the tolerance ε = 1 × 10 -4 we provided, the relative error is less than the tolerance after 9 iterations.