Speciﬁc Emitter Identiﬁcation Based on Synchrosqueezing Transform for Civil Radar

: Time-frequency (TF) signal features are widely used in speciﬁc emitter identiﬁcation (SEI) which commonly arises in many applications, especially for radar signals. Due to data scale and algorithm complexity, it is difﬁcult to obtain an informative representation for SEI with existing TF features. In this paper, a feature extraction method is proposed based on synchrosqueezing transform (SST). The SST feature has an equivalent dimension to Fourier transform, and retains the most relevant information of the signal, leading to on average approximately 20 percent improvement in SEI for complex frequency modulation signals compared with existing handcrafted features. Numerous results demonstrate that the synchrosqueezing TF feature can offer a better recognition accuracy, especially for the signals with intricate time-varying information. Moreover, a linear relevance propagation algorithm is employed to attest to the SST feature importance from the perspective of deep learning.


Introduction
Specific emitter identification (SEI) has always been a challenging problem in signal recognition. With the support of big data and high computing capability, technologies such as deep neural networks (DNN) can perform the identification task efficiently. However, the insufficiency of the training data set for DNN limits the depth of network and hinders performance yielding inaccurate classifications. Furthermore, understanding and interpreting network learning is always a challenging problem. Therefore, an effective feature extraction strategy is needed for SEI tasks. The ideal feature, before the classifier, should have a complete but low dimensional representation of the signal. Neither time or frequency alone can adequately fulfill this requirement when dealing with non-stationary signals. As a powerful non-stationary signal processing tool, TF analysis can reveal the signal local frequency characteristics. However, the mapping from one-dimensional signal to two-dimensional representation can introduce a large dimensionality problem. Attempts to reduce the redundancy in time-frequency (TF) feature have become an important subject [1][2][3]. Towards this end, some works [4][5][6] treat the TF spectrum as images. Then, downsampling of both variables is applied, and the result is sent to the classifier for identification. It is noted, however, that fine details of the emitter signature and modulation profile can be missed by classification confusion between closely resembling emitters.
Two effective methods have been recently introduced to tackle the above problem [7,8], namely the ambiguity function representative slice (AF-RS) and the compressed sensing mask (CS-MASK). These two features not only reveal physical characteristics of measured radar signal, but also alleviate the computational costs. However, when it comes to complex frequency modulation signals, these two methods may perform unstably. For AF-RS, taking a single AF slice near the origin may only capture a part of the modulation information. For CS-MASK, it is complicated, and rather an art, to set the parameter of mask. Both methods face challenges when dealing with complex modulation signals, giving inadequate identification.
To overcome the limitations above, the synchrosqueezing transform (SST) [9], as an empirical mode decomposition-like tool, can be applied since it squeezes the energy from a wide frequency range into a narrow bandwidth near instantaneous frequency points, which can greatly reduce the scale of feature, and retains most modulation information. Hence, in this paper, a feature extraction method based on SST is proposed. This feature shows a good recognition rate, meanwhile, due to flavor and philosophy of feature extraction, it can probably facilitate the interpretation of the deep networks with linear relevance propagation (LRP) algorithm [10,11]. The data from civil aviation meteorological radar and radar signal generators are used to verify the validity of the proposed feature.
The paper is arranged as follows. Two TF feature extraction methods are discussed in the next section. Section 3 illustrates LRP algorithm. The SST method and the corresponding properties are introduced in Section 4; Section 5 proposes the SST feature extraction method. Numerous simulation results based on measured data and the interpretation of deep neural networks are given in Section 6. Finally, we conclude the paper in Section 7.

Time-Frequency Feature Extraction Methods
In this section, we briefly introduce the two TF feature extraction methods: Ambiguity Function Representative Slice (AF-RS) and Compressed Sensing Mask (CS-MASK) [7,8]. Ambiguity function (AF) concentrates most signal energy at and near the range-Doppler origin. Because of this property, it is possible to extract a feature with a low dimension by organizing the information near the origin in the ambiguity domain (AD).

AF-RS
The AF is the inverse Fourier transform of the instantaneous autocorrelation function of a signal over the time variable. In essence, it transforms signals to delay-frequency offset domain, whereas common methods depict signatures in joint time-frequency domain (TFD). Owing to the fact that typical targets do not rapidly transient over a short time period, most of the backscattering energy is not highly offset away from the AD origin. The AF of signal f is given by: in which τ is time delay and ν is frequency offset. It is clear from (1) that when frequency offset is 0, the AF degenerates into the integral of autocorrelation function of the signal with respect to time.
According to the definition of autocorrelation function, the maximum of AF is near the origin. For illustration, consider a single-frequency signal and an LFM signal. The AF of the former is shown in Figure 1. At zero frequency offset, the signal energy reaches the maximum value. The cross-terms arising from the two Euler representations of a real sinusoid are shown parallel to frequency axis. These terms are considered artifacts and do not aid signal identification. On the other hand, the AF of LFM signal is shown in Figure 2. It can be seen that although the energy is mainly concentrated at zero frequency offset, the time-varying frequency of the signal cannot be fully captured in only one slice of zero frequency offset.
Since the energy of AF exhibits a peak at shift frequency being zero, the slices of ambiguity function with shift frequency near zero (including zero-slice) could be considered to be the major representative feature sets of radar emitter pulse which is named representative slice. This kind of feature not only coincides with the characteristics of real radar data but also can reduce the computation complexity.  Figure 3 shows the slices of the LFM signal near the zero-frequency offset, which is from the frequency offset from 0 (ν = 0) to 5 (ν = 5): A class-dependent method to classify radar emitter is proposed by Gillespie and Atlas [12,13], which is used to extract initial features of radar signals, followed by a kernel optimization scheme in ambiguity function plane. Paper [8] applied Direct Discriminant Ratio (DRR) as a criterion to rank the kernel, defined as: here,Ā i means the "average" auto-ambiguity function of Class i. Frequency offset ν is usually set near zero in order to rank the points along the major direction of ambiguity function distribution. By this norm, AF-RS extracts the slices near the origin in delay-frequency offset domain as a feature set. The most representative slice is selected as the feature, resulting in a great reduction in data dimension and an obvious enhancement of identification accuracy. However, there are still some shortcomings, mainly in: (1) Only the slice at a fixed frequency is extracted as a feature, while other slices may contribute to identification in other offsets.
(2) Compared with the zero-frequency-offset slice, the representative slice has more feature fusion optimization process, which probably makes it difficult to achieve real-time optimization. It must be based on recognition rate feedback, resulting in an increase in the computational complexity.

CS-MASK
A compressed sensing mask is based on that ambiguity function and Wigner-Ville distribution is a two-dimensional Fourier transform pair relationship. After transforming the signal into the ambiguity domain, the compressed sensing theory is used to optimize the sparse ambiguity function mask feature.
According to the expression of ambiguity function, the two-dimensional inverse Fourier transform of the Wigner-Ville distribution is the ambiguity function, shown below: here WVD f (t, f ) is Winger-Ville Distribution (WVD) of signal f (t), a common quadratic time frequency representation tool, defined as: Assuming WVD and ambiguity as length N points sequences, the matrix correlation of them as follows, which are related by two-dimensional Fourier transform: where Ψ is 2D Fourier transform matrix in size of N × N, and W D is the matrix form of Winger-Ville Distribution of the processed signal f (t). To reconstruct Wigner-Ville distribution from ambiguity function with high accuracy and low data dimension, a measurement matrix Φ in size of M × N (M N) used in the compressed sensing is defined. Feature A Mask f (M×1) , an M sparse measurement of AF, is obtained as follows: An ideal feature should include as much information as possible with a low dimension. The sparse feature can be obtained according to compressed sensing theory, which can be optimized as: where F −1 2d is 2D Fourier inverse transform, ε represents a user-specified bound to constrain the error in an acceptable range. Because the 2D Fourier transform of ambiguity function is WVD, the efficacy of the feature CS-MASK can be examined by the error between its 2D Fourier inverse transform and WVD according to Equation (8). Figure 4 represents the ambiguity function and CS-MASK feature of an LFM signal. The signal length is 256, the size of the original ambiguity function is 256 × 256, while the size of CS-MASK extracted according to Equations (7) and (8) is 17 × 17, meaning the feature dimension has been greatly reduced.  The amount of information contained in CS-MASK can be measured with its recovered WVD. Figure 5 is a comparison of WVD of the original signal and WVD of the signal reconstructed by CS-MASK.
Nevertheless, there are still two points which can be improved in this method: (1) The selection of the mask size is based on the optimization criterion, which needs to be selected by the reconstructed signal WVD, so the calculation amount is large, although the rapid optimization method in two projects is proposed [7]; (2) The size of the optimized feature is not fixed but will change with certain factors such as signal modulation and noise interference.

Linear Relevance Propagation
The LRP method is generally used in the middle layer visualization and feature inversion in image processing with DNN. LRP is based on the layer-wise conservation principle which states that relevance R is conserved in process of back-propagating from one layer to the next, i.e., [11], The relevance of the last layer is equivalent to the classification function f (x) of the input data. In each layer, relevance R is defined as below: where r (l) is the relevance of the neurons in each layer. The heatmap h n = r 1 n , the relevance in the first hidden layer, is obtained by the following rule: where α and β are two fixed parameters during iterations, and z + here a (l) n is the activation of neuron n in layer l, and ω (l,l+1) n,m is the weighs from nth neuron in layer l to mth neuro in layer l + 1.
In this paper, LRP method is used to create a heatmap for the recognition result of each method that reflects which parts of the original data contribute positively to recognition of network and which contributes negatively. Combined with this heatmap, we can understand this network, perhaps even manage to interpret how it works. This research is conducted in the simulation part.

Synchrosqueezing Transform
SST, proposed by Daubechies et al., is based on wavelet transform and empirical mode decomposition (EMD) [9]. SST is a post TF processing tool which compresses the spectrum in TFD by squeezing the energy distributed in a frequency range into some certain instantaneous frequency points. SST has been widely applied in many fields of biomedical signals, geographic signals, sonar signals, etc. [14,15] In this paper, Short Time Fourier Transform (STFT), a commonly used time-frequency analysis tool, is adopted as time frequency transform. STFT converts the signals into a 2D time-frequency domain by performing Fourier Transform in a fixed window traversing the time domain. Considering a signal f (t), the STFT of f (t) with a window g(t) in Schwartz class, is defined as [16][17][18]: where g(t) * is the complex conjugate of g(t), and f is frequency.
Signal f (t) to be considered in synchrosqueezing techniques in the STFT is defined by: where K is a constant, a k > 0 is a continuously differentiable function, φ k is a two times continuously differentiable function satisfying φ k (t) > 0 and φ k+1 (t) > φ k (t) for all t. In this context, ideal T-F (ITF) representations can be defined as: where q is a positive integer depending on the chosen T-F distribution (TFD), and δ refers to Dirac Distribution. The STFT being is associated with q = 1. Base on this theory, the synchrosqueezing operator reassigns the STFT as follows: is a time reassignment factor, defined as [17]: here W g and W f are WVD of g(t) and f (t). By squeezing the frequency components within a certain range of instantaneous frequency, an energy-concentrated TF representation can be obtained. Then, the reconstruction of the original signal is computed by where C h = Re(C ϕ C ψ ), C ϕ = e −i2π f t f 2 and C ψ = 1 2 +∞ 0ψ (ξ) 1 ξ dξ.ψ(ξ) is the complex conjugate of the Fourier transform of the mother wavelet ψ(t) = 1 √ ωπ e − t 2 2 e i2πt (ω = 2π f ). More details can be found in [19].
Because of the property that the energy of a signal is mainly concentrated near the instantaneous frequency points, it is only necessary to extract vital frequency, energy, and time-varying information. The effect of first-order SST is better only when the second derivative of the phase with respect to time is approximately zero or much smaller than its first derivative. However, for complex frequency modulation signals, such as secondary frequency hopping signals, the second derivative of phase is not negligible. Hence, vertical synchrosqueezing transform (VSST) is proposed [20]. In this paper, VSST is used as the TF analysis tool because the experimental datasets contain numerous complex modulation signals.

The Proposed SST Feature Extraction Algorithm
In this section, the SST feature extraction is proposed to achieve an accurate recognition rate. The property of VSST is elaborated and some experimental results of VSST are shown in Section 5.1. Additionally, the multipath effect, a common phenomenon in radar signal processing, usually leads to an adverse impact on feature representation. VSST can mitigate the multipath effect, which is discussed in Section 5.2 and the SST feature extraction method is presented in Section 5.3

Vertical Synchrosqueezing Transform
The latest research results [20] have achieved vertical synchrosqueezing transform (VSST) and even high-order form. For nonlinear FM signals, the concentration of the transform is further improved. D.H.Pham gives a VSST expression based on STFT: Here,ω f (t, f ) andτ f (t, f ) are TF rearrangement factors,ω f (t, f ) is the second-order estimate of the instantaneous frequency of signal. Compared with the first-order SST, VSST obtains more accurate instantaneous frequency estimation by adding the adjustment factorq f (t, f ), meaning the TF concentration can be further improved. The original signal can be quickly recovered using the following formula [20]: (20) Figure 6 shows the VSST and the inverse transform of synthetic linear frequency modulation pulse signal. The signal information is shown below: The normalized fs =256, the total length of the signal is 512 and the signal to noise ratio (SNR) is 0 dB.  Figure 6, VSST retains the time-varying information of the original signal and has a lossless inverse transformation, which brings great convenience for feature extraction and identification. Figure 7 is an enlarged partial view of VSST. It can be seen that most energy is concentrated near instantaneous frequency points. Compared to STFT and WVD, SST is a more accurate TF representation because of its perfect TF concentration.

Suppression on Multipath Effect by VSST
The multipath effect refers to a phenomenon that after electromagnetic wave propagates through different paths, each component arrives at the receiver in different time, which causes great interference due to the superposition of each phase [20]. The mulipath effect denotes as: where f m is the signal with multipath effect, E m is the ratio of amplitude of multipath effect to that of signal. Figure 8 shows the SST of the simulation signal sig(t) in equation (21) with multipath effect under different E m . Here, time delay has two conditions: τ = 16 and 128 sampling points. The SNR is 0 dB. From Figure 8a,d,g,j, it can be seen that multipath effect can cause a great adverse impact on SST feature. Figure 8c,f,i,l manifest that VSST can separate the multipath component from the signal and suppress it effectively.

Feature Extraction
It is clear from the various experimental results of SST that energy from the signal in TFD is mainly concentrated within a limited range, while some redundant information is distributed around the rest area. Based on this property, SST feature will be derived as follows: First, STFT together with VSST are performed on one signal to obtain the TF spectrum. Second, the energy sequence E(n) and the phase trajectory sequence f (n) are extracted from the SST domain. It is worth noting that in order to embody more detailed information, the compression bandwidth of the energy sequence can be appropriately extended. Finally, the two sequences are merged into one new feature matrix and then sent to the classifier for identification. The feature matrix is shown below: Take the SST feature of the real measured signal from DatasetII as an example (The description of DatasetII is given in the next section), see in Figure 9, the length of the signal is N = 500, and the size of original spectrum matrix is 250 × 500. In comparison, the size of the feature matrix extracted by SST is reduced to 2 × 500. The process of the proposed algorithm (including feature extraction and LRP interpretation) is given as follows and Figure 10: Step1: Signal Preprocessing.
Preprocess the signals to obtain invalid and neat signals including demean, aligning and normalization. (Denoise is not considered in this paper since it may remove important details by accident, while SST can keep the main part of the signal within the narrow band and remove the rest of the signal with noise.) The obtained data usually contain abnormal value, noise, and interference which influence the identification in this algorithm. In addition, normalization is required in order to avoid the adverse impact of different signal values for identification.
Step2: Feature Extraction. In this step, firstly it is required to perform STFT on the processed signals. Then SST is used to suppress the energy of STFT. In SST spectrum, instantaneous frequency and suppressed energy are extracted as a new TF feature matrix.
Step3: Identification. Finally, the extracted SST feature in Step2 is delivered into DNN for training and identification. It should be noted that the structure of DNN for each type of data is identical. A better recognition rate means that the feature may include more useful information for training the network.
Combined with the LRP algorithm, heatmaps of various signals can be obtained. Based on these heatmaps, we can manage to understand how the networks work, and even try to give a reasonable interpretation of its identification mechanism.

Experimental Results
In this section, the supervised classifier is applied to identify the four sets of measured radar data, and compare SST feature with other three features to verify the effectiveness of the proposed method. DatasetI contains 10 types of signals from 10 identical types of civil aviation meteorological radars. It includes 500 approximate single-frequency signal samples; the next three data sets are all collected from radar signal generators. DatasetII contains 10 types of 500 single frequency signal samples; DatasetIII contains 10 types of 500 LFM signal samples; DatasetIV contains 10 types of 500 LFM signal samples. The reason for choosing these four data sets is that they are very typical and representative real measured signals of civil radars. We randomly select a signal from each data set to perform STFT and SST in order to show the TF characteristics of each data set, shown in Figure 11. Table 1 shows the processing time to obtain SST feature of each data set.
The extracted SST features of each data set are then sent to the classifier for training and identification. The training rate is set from 10% to 70%, and deep neural network (DNN) with fully connected layers is used as a classifier. The recognition rates under different features are shown in Tables 2-5 and Figure 12. It can be seen that for DatasetI, SST feature has only an inapparent advantage over the power spectrum (PSE), AF-RS and CS-MASK [7,8]. For the single-frequency signal in DatasetII, the effect of SST features is even slightly worse than PSE and CS-MASK. It is because the form of single-frequency signals is simple, and the energy is highly concentrated at the origin of AD domain, hence the information extracted by the mask is very complete. From the perspective of PSE, single-frequency signals are time-invariant signals, so one-dimensional energy information can represent signals enough, while SST feature may introduce redundant information which reduces the recognition rate.  In contrast, for DatasetIII and IV, the SST feature obtains the best recognition rate among these four features. This is because the modulation form of LFM signal is more complicated than single-frequency signal, and theoretically it needs more accurate information to identify. SST feature contains more detailed time-varying information of signal than other three features, so it can obtain a better representation of LFM signal.  To further verify the effectiveness of the SST feature, the LRP algorithm [10,11] is used to obtain the TF heatmaps in reverse. The parameters α and β in Equation (11) are set as α = 3, and β = −1.5. The number of the hidden layers in auto-coder is 3, and relu is selected as the activation function of each layer. Figure 13 shows heatmaps of deep network for LFM signals from DatasetIII and DatasetIV. The brightness of pixels represents the contribution to identification of deep network. It is apparent from Figure 13 that the brightest region is concentrated near the SST feature, meaning SST feature is akin to the part of spectrum which contributes most to the identification of deep network. In other words, the decisive information in deep network is not disarray of some pixels in spectrum, but some special parts of the spectrum make the most contribution. It can be observed that there exists a horizontal stripe and two vertical stripes in Figure 13a,b. The horizontal stripe may be caused by harmonic components in the signal, and the vertical stripes may be caused by phase hopping. In Figure 13c,d, a dim horizontal stripe appears within the first quarter of the entire time, which is probably due to the noise at the beginning from signal generators. The specific causes of this phenomenon need to be further studied. These experimental results demonstrate the validity of the SST feature, and also reflect that recognition of the deep network is possibly interpretable.

Conclusions
In this paper, we proposed a time-frequency feature extraction method based on SST. The energy sequence and phase trajectory sequence are merged into a feature matrix, which retains almost complete time-frequency information, and holds clear physical meaning with a small dimension. The proposed feature can obtain better recognition rates especially for complex modulation signals due to its high energy concentration and precise instantaneous frequency which can contain more beneficial information, meanwhile, suppress redundant or even negative information, such as noise and multipath effect, to classifier. The superiority of the method in comparison to PSE, AF-RS, and CS-MASK [7,8] is verified by the identification of the measured radar data. Furthermore, combined with the proposed feature and LRP algorithm [11], it is possible to understand and interpret the inner mechanism of the identification by deep networks.