LPI Radar Waveform Recognition Based on Time-Frequency Distribution

In this paper, an automatic radar waveform recognition system in a high noise environment is proposed. Signal waveform recognition techniques are widely applied in the field of cognitive radio, spectrum management and radar applications, etc. We devise a system to classify the modulating signals widely used in low probability of intercept (LPI) radar detection systems. The radar signals are divided into eight types of classifications, including linear frequency modulation (LFM), BPSK (Barker code modulation), Costas codes and polyphase codes (comprising Frank, P1, P2, P3 and P4). The classifier is Elman neural network (ENN), and it is a supervised classification based on features extracted from the system. Through the techniques of image filtering, image opening operation, skeleton extraction, principal component analysis (PCA), image binarization algorithm and Pseudo–Zernike moments, etc., the features are extracted from the Choi–Williams time-frequency distribution (CWD) image of the received data. In order to reduce the redundant features and simplify calculation, the features selection algorithm based on mutual information between classes and features vectors are applied. The superiority of the proposed classification system is demonstrated by the simulations and analysis. Simulation results show that the overall ratio of successful recognition (RSR) is 94.7% at signal-to-noise ratio (SNR) of −2 dB.


Introduction
There are considerable interests in radar systems that "to see and not be seen" can commonly called low probability of intercept (LPI) radar [1]. LPI radar is used more and more widely. Meanwhile, the number of modulating waveforms is increasing quickly for the LPI radar system. However, it is no longer easy, solely relying on human operators, to recognize the LPI radar of interest. That is, automatic recognition of radar waveform is becoming increasingly important for some critical applications in military-like radar identification, threat analysis, electronic warfare (EW), surveillance, etc. Moreover, it can be applied in civilian areas including signal recognition, spectrum management and cognitive radio, etc.
In the past, many scholars have devoted themselves to exploring the automatic recognition system of the radiation sources in the applications. They have proposed several feasible approaches, making the system more intelligent, more robust, and more like a real human operator. These achievements push forward the development in the field of the signal recognition. In [2] and [3] , the statistical characteristics of the radiation signals is estimated by a maximum likelihood algorithm, and the ratio of successful recognition (RSR) is greater than 90% in the case of signal-to-noise ratio (SNR) ≥ 10 dB. However, the classification of the signals are not extensive, as several kinds of LPI radar waveforms such as Frank code are not mentioned. LPI radar waveform usually has lower signal power, which makes it difficult to classify the LPI radar waveforms directly. Time-frequency (T-F) techniques are The paper is organized as follows. The automatic waveform recognition system and signal model are introduced in Section 2. Section 3 describes the composition of the classifier. Section 4 explains the features that the system required and how to calculate them. After that, a great number of features are calculated. In order to improve the efficiency of the system, the features selection algorithm is given in Section 5. Section 6 shows the simulation results and makes discussions. Finally, Section 7 draws the conclusions.

System Overview
As shown in Figure 1, the system is composed of four modules: time-frequency processing, signal feature estimation, image feature extraction and classifiers. In order to obtain the signal power accurately, it is necessary to estimate the adding of white noise (σ 2 ε ) in the samples. Different from majority of the existing systems, the carrier frequency estimation is not required in local systems because the carrier frequency shows the shift of the position in the CWD image. However, the signal structure is not changed at a different frequency, and the advantage of image features is not being sensitive to where the object lies in the image. The classifier consists of two classifier cells, network1 and network2. It can be identified including LFM, Costas codes, BPSK, as well as Frank code and P1-P4 polyphase codes. According to all of signal features and several image features, four types of classifications are classified in network1. They are LFM, Costas codes, BPSK and polyphase codes (including Frank and P1-P4). After that, the polyphase codes are classified in network2. At the same time, network2 is controlled by network1. As the signal is considered as polyphase codes by network1, network2 will begin to work. The reason to design two subsidiary cell structures is to reduce the number of input features and to improve accuracy of the signal classification. For more details, see Figure 2.
In this paper, the center frequency of the signal bandwidth is considered as carrier frequency. In addition, the signal is disturbed by additive white Gaussian noise, which means that the intercepted discrete time signal model is given by where n is an integer, T is a sampling interval, s(nT) is the complex of the transmitted signals, and m(nT) is complex white Gaussian noise with the variance σ 2 ε . A is amplitude, for simplicity and without loss of generality, so we assume that A is an invariant constant. φ is the instantaneous phase of the complex signal. If the input signal is real, the Hilbert transform is used: where x(k) is the real signal, and H[·] is Hilbert transform. For more details, see [15]. When the recognition result is regarded as polyphase by network1, network2 will start to work. In addition, the input features are classified into five types of polyphase codes directly.

Waveform Classifier
The three-layer ENN is used for the signal classification in the paper. There are connections from the hidden layer to these context units fixed with a weight of one [16]. At each time step, the input is propagated in a standard feed-forward fashion, and then the error back propagation (BP) learning rule is applied [17]. The fixed back connections result in context units always maintaining a copy of previous values of hidden units (since they propagate over the connections before the BP learning rule is applied). Thus, the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard multi-layer perceptron [18,19]. In Figure 3, the hidden layer contains 46 neurons, sigmoid activation f (x) = 1/(1 + e −x ) is selected in each layer, and the number of input and output layers' neurons are determined by the vector dimension, respectively.  For ENN, a method to fix the number of hidden layer neurons H num is discussed in [20]. A simple calculation formula is formulated as: where C is the number of classification, and X is the dimension of a feature vector obtained. A slight adjustment of the number of hidden layer neurons needs to be done, until the network achieves the optimal situation. A simple suggestion is to re-enter the training data as the test data, and the correct recognition rate should be greater than 99%. Otherwise, the number of hidden layer neurons should be adjusted. Forty-six neurons are used in this paper.

Features Extraction
In this section, the feature extraction is described in details. Feature extraction plays a key role in the automatic recognition system, the algorithm of which determines not only the RSR but also the robustness of the system. The features include signal features and image features. The section is organized as follows. First, the signal features that based on the second order statistics, PSD, and instantaneous properties are illuminated. Then, the Choi-Williams distribution is introduced, and a further eight types of radar waveforms are shown in CWD images, respectively. After that, image preprocessing based on image morphology is addressed. Finally, the image features of the waveforms are estimated and extracted from the CWD image. Table 1 lists the necessary features that are presented in network1 and network2. Meanwhile, in order to keep the classifiers as concise as possible, other features are considered but not listed. For example, higher order moments (up to the fourth order) and cumulants (up to the fourth order), Pseudo-Zernike moments (up to the eighth order) and other instantaneous properties, etc. However, these features are not found discriminative enough in the system recognition. How to select the final features will be described in Section 5. The nth order moment of the complex signal is given bŷ where N is the number of samples, and ( * ) is conjugated components.M 10 andM 20 are calculated from this formula. The absolute value is ensured that the estimate value is an invariant constant when the signal phase rotated.
The nth order cumulant is given by [21,22] whereM 10 is given in Equation (4). In the system,Ĉ 20 is calculated. Additive independent complex second order circular noise does not affect moment features and cumulant features.

Based on Power Spectral Density (PSD)
PSD describes how the signal power distributes in frequency domain. Two features based on PSD are utilized. The signal should be estimated as an invariant scaling before PSD is calculated. Therefore, invariant scaling is estimated as [23]:ỹ where y(k) is the kth of samples, andM 21 is the three-order moment obtained in Equation (4). The variance of the additive noise σ 2 ε can be obtained in [24]. The features about PSD are given by whereỹ(k) is given in Equation (6). γ 1 is a nice feature to distinguish between binary phase and Costas codes. In fact, the square of the complex envelope is constant for binary phase modulation signals.
The feature γ 2 makes an outstanding performance in distinguishing binary phase modulation signals from others.

Based on Instantaneous Properties
Instantaneous properties, which include enormous information, work well in distinguishing frequency modulation signal from phase modulation signal. In this paper, instantaneous frequency and instantaneous phase need to be estimated. The standard deviation of the absolute value of instantaneous phase is estimated as [25]. For the sake of simplicity, φ(k) = tan −1 [Im(y(k))/Re(y(k))] is utilized, where Im and Re are the imaginary and real parts of the signal, respectively: where N is the number of samples, φ is the instantaneous phase of the complex signal, and the range of φ is between −π and π. For more details, see [3].
In order to estimate the instantaneous frequency clearly, the method is decomposed into several steps: f. the absolute value of the normalized centered instantaneous frequency is given by; * φ u (k) corrects the radian phase angles in φ(k) by adding multiples of ±2π when absolute jumps between consecutive element of φ(k) is greater than or equal to the jump tolerance of π radians. ** There are some spikes in the instantaneous frequency estimation in the vicinity of phase discontinuity of some waveforms. In order to smooth the f (k) and to remove the spikes, a median-filter with window size 5 is used.

Choi-Williams Distribution (CWD)
Choi-Williams distribution is a member in Cohen Classes [26], which can reduce image interference from cross terms effectively: where t and ω are time and frequency axes, and f (ξ, τ) is the kernel function given by The kernel function is a low-pass filter to eliminate cross terms. σ refers to controllable factor. The bigger σ is, the more obvious cross terms are. Meanwhile, σ = 1 is used in this paper to balance the cross terms and resolution. Eight types of waveforms of CWD transformation are shown in Figure 4. A method for fast calculation of CWD is found in [10]. The structure of a fast CWD method is based on the standard fast Fourier transformation (FFT). Therefore, the number of sampling points is recommended to be a small power of two, such as 128, 256, 512, etc. In this paper, 1024 × 1024 points are selected. sequentially. There are significant differences among the Choi-Williams time-frequency distribution (CWD) images. The controllable factor σ = 1 also be used.

Image Preprocessing
In the following parts, the real parts of CWD results of waveforms is treated as a 2D image. Digital image processing is explored to gain interested features. In this part, CWD image is processed into a binary image with three operations.
First, the length of detected signal, however, is N < 1024 points in most cases. Zero padding is utilized because we select the CWD transformation of 1024 points. Then, the CWD image is resized to N × N to reduce the computation load. Finally, the resized image is converted to a binary image, based on global thresholding algorithm [27]. The operation steps are as following: a. transform the resized image to gray image between [0, 1], i.e.; b. estimate the initial threshold T. It can be obtained from the average of the minimum and maximum from the image G(x, y); c. divide the image into two pixel groups G 1 and G 2 after the comparison with the threshold T.
G 1 includes all pixels in the image that the values > T, and G 2 includes all pixels in the image that the values ≤ T ; d. calculate the average value µ 1 and µ 2 of two pixel groups G 1 and G 2 , respectively; e. update the threshold value; , and calculate δT, i.e.; δT = T now − T be f ore g. until the δT is smaller than a predefined convergence value, 0.001 is used in the paper; h. calculate B(x, y); After the image binarization, however, there is some isolated noise and processing noise in the binary images. Isolated noise is generated because the signal is transmitted in the noisy environment. In addition, processing noise is generated in the kernel of CWD itself. It is a kind of special straight line, thin but long. The width of the line is less than three pixels, whereas the majority of lines in the CWD image are longer than half of the image length. In order to remove the noise, a morphological opening is applied (erosion followed by dilation). Erosion and dilation are the basic operations in morphological image processing. Morphological techniques probe an image with a small shape or template called a structuring element. The structuring element is positioned at all possible locations in the image. Furthermore, it is compared with the corresponding neighbourhood of pixels. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, at least for one of its pixels set to 1, the corresponding image pixel is also 1. The opening is so called because it can open up a gap between objects connected by a thin bridge of pixels. Any regions that have survived from the erosion are restored to their original size through the dilation. In the paper, the structuring element in the size of 3 × 3 pixels is used. After the opening operation, the groups as small as a minimum 10% of the size of the largest group are removed. The image process is introduced in Figure 5.

Binary Image
Opening Processing

Noise Removed
Original Image 1024u1024 NuN Figure 5. In this figure, P3 code is selected at signal-to-noise ratio (SNR) of -6 dB as an example. Original image is resized by N × N, and the bicubic interpolation is adopted. That is, the output pixel value is a weighted average of pixels in the nearest 4-by-4 neighborhood. By using the opening processing, the binary image is smoother and the noise is removed in the last.

Image Features
The number of objects in the binary image (N obj ) is a useful feature. For example, Frank code and P3 have two objects, respectively, while LFM and P1 only have one. However, Costas codes have many objects in different location. In order to distinguish different waveforms, two features N obj1 and N obj2 are used in the paper. N obj1 and N obj2 are the number of the objects that represent larger than 20% and 50% of the size of the largest object, respectively.
A feature is also found in the location of the maximum energy in time domain of the image, i.e., where C N×N (t, ω) is a resized version of the CWD image, and N is the length of the sampling data. The standard deviation of the width of the signal objects (σ obj ) and the rotate angle of the largest object (θ max ) are appropriate for polyphase codes discrimination. Namely, the featureσ obj is suitable to classify two kinds of waveforms such as "stepped waveform" (including Frank code, P1) and "linear waveform" (including LFM, P3 and P4). In eight types of waveforms, only P2 has a negative slope. Therefore, P2 can be picked out by the parameterθ max from others easily. The features are calculated as follows: • for each object, k = 1, 2, ..., N obj1 ; a. decide the k and mask the other objects away from the binary image; b. calculate the principal components of the binary image; c. rotate * the image until the principal component of the object is parallel to the vertical axis, recorded as B r (x, y); d. calculate the row sum, i.e.,r(x) = ∑ N−1 y=0 B r (x, y), x = 0, 1, 2, ..., N − 1; e. normalize r(x), i.e.,r(x) = r(x)/ max{r(x)}; f. calculate the standard deviation ofr(x), i.e., where N is the number of samples.
• output the rotation degree of the maximum of objectsθ max ; • output the average of theσ k,obj1 , i.e.; * nearest neighbor interpolation is used in rotation processing.
Next, we select the maximum object with others removed, extracting the skeleton of the maximum object and estimating the linear trend of it. In the estimation, minimizing the square errors method is applied. The linear trend is subtracted from skeleton of the object to acquire the resulting vector f n . The standard deviation of f n is given bŷ where M is the length of f n .
In order to express the randomness of f n , a statistical characteristics test is proposed. The details of the test are as follows: * where Φ(·) is the standard normal cumulative distribution function. The value of p R is between 0 and 1. ** note that p R is no longer a probability. It is a measure of the similarity with Gaussian distribution.
The standard deviation Var(·) value is too small for machine precision. Therefore, it is replaced by variance Var(·) in (f).
When the p R is closer to 1, the test signal is more similar to Gaussian distribution, and the number of N is more than 50, the distribution of R is similar to the standard normal distribution, i.e., Z = (R − E(R))/ Var(R) ∼ N (0, 1). However, the values of Var(R) are quite small for machine precision for the P4 codes. Therefore, the value of R is normalized by using the equation Y = (R − E(R))/Var(R). The feature p R is worked in network2 only.
The other three features are obtained from the autocorrelation of f n , i.e., c(m) = ∑ k f n (k) f n (k − m), and m = 0, 1, ..., N − 1. Figure 6 indicates the differences of P1 and P4. In order to characterize these differences, the features are introduced.
where, m 0 is the location of the minimum of the lag value, and m 1 is the location of the sidelobe maximum of the lag value in the [m 0 , N − 1]. The FFT result is selected to characterize the power of oscillation of c(m), and the feature is estimated as: where c(m) is normalized by using the maximum value of itself. Namely, the range of c(m) is [−1, 1]. The final features are the members of Pseudo-Zernike moments. Pseudo-Zernike moments are invariant to translation, rotation, scaling and mirroring. It is suitable for the problems about pattern recognition [28][29][30]. These invariant features can reduce the amount of data used in training, which makes the recognition easy. The p + q order of the image geometric moments are defined as: m pq = ∑ x ∑ y B(x, y)x p y q (18) where B(x, y) is the binary image. The scaling and translation invariant central geometric moments are given by wherex = m 10 /m 00 andȳ = m 01 /m 00 . The scaling and translation invariant radial geometric moments are given by The Pseudo-Zernike moments are defined as: where k = (n − s − m)/2, d = (n − s − m − 1)/2 and The dynamic range can be reduced by calculating the logarithm, i.e.,Ẑ nm = ln|Z nm |. The members of the Pseudo-Zernike moments, includingẐ 20 ,Ẑ 22 ,Ẑ 30 ,Ẑ 31 ,Ẑ 32 ,Ẑ 33 andẐ 43 , are selected. The features are used in network2 only because the features in the network1 can distinguish between LFM, Costas codes, BPSK and polyphase codes very well.

Features Selection
In this section, the features selection algorithm for final feature vectors is exploited. With the increase of dimension of feature vector, the redundant information should be removed. The algorithm can select excellent features from a large number of them. Thus, the training time is reduced [31]. Based on the mutual information in the features, greedy selection algorithm is selected. It means that a new feature that has the maximum of mutual information is joined to the feature vector (which includes the previously selected features) in each cycle. The procedure is continued until the number of selected features reaches our design. Mutual information is estimated between class C and the feature vectors X :Î where c k is the kth class and c is the number of classes. H(·) denotes the entropy, N is the total number of training data and x j is the jth training vector. Probability density functionp(c k |x j ) is estimated by using Parzen windows. Entropy of the class H(C) is calculated by using the training data, i.e., p(c k ) = N k /N, where N k is the total number of training data from class c k . Thep(c k |x j ) is given byp where ∆ is a covariance matrix of a d-dimensional vector, and h is the window width parameter, (i.e., h = 1/log 2 (n)), in which, n is the number of samples. For more details, see [31].
In this stage, a total of 23 features for network1 and network2 are selected. The mutual information between each individual feature and class are estimated. In addition, there are 10 and 19 features remaining for network1 and network2, respectively. The mutual information estimation at each step of the features selection algorithm for network1 and network2 are shown in Figure 7. However, features selection would be stopped, when the mutual information less than 10 −4 because the estimation of mutual information is not very accurate especially in a high dimension [32]. See Figure 7 and Table 1 for the list of selected features.

Network2
Index of selected features according Table.1 Estimate of I(x;C) Figure 7. In this figure, the features were selected by the mutual information. The abscissa was the index of selected features at each cycle. There was one-to-one correspondence between the index and the Table 1. For example, the 5th feature was selected at the second cycle in network1. It indicated that the 5th feature (γ 2 ) had the greatest mutual information in all of the features except the 8th feature (N obj1 ). Its ordinate was the sum mutual information of N obj1 and γ 2 . In network1, 11 features were used, and the H(C) = 1.5488. In addition, in network2, 19 features were used, and H(C) = 2.3219.

Simulation and Discussion
In this section, the automatic waveform recognition system is measured by simulation signals. The purpose is to test accurate rate of the identification results, robustness and computational complexity in different conditions.

Create Simulation Signals
In this part, we create the simulation signals for experiments. In order to expedite the system recognition, the continuous sampling points are intercepted from the observation window. Supposing radar sources are disturbed by additive white Gaussian noise (AWGN) in propagation channel. In addition, SNR is defined as SNR = 10 log 10 (σ 2 s )/(σ 2 ε ), where σ 2 s and σ 2 ε are variances of original signal and noise respectively. U(·) denotes a uniform frequency. For example, there is a original frequency f 0 = 1000, and the sample rate f s = 8000. The uniform result is f 0 = U( f 0 / f s ) = U(1/8). For every waveform, there are different parameters that need to be set. For LFM, the initial frequency is a random number between U(1/16) and U (1/8). Bandwidth ∆ f ranges from U(1/16) to U(1/8) as well. For BPSK, the random value of the carrier frequency is U(1/8, 1/4), and the length of Barker codes selects from {7, 11, 13}. The number of code periods (N p ) and the number of cycles per phase code (cpp) are range of [100, 300] and [1,5], respectively. For Costas codes, the number of frequency change is a random value between 3 and 6. The fundamental frequency f min is U(1/24). For instance, once the Costas codes are selected, the number is 4, namely, a no-repeating sequence {3, 2, 1, 4} is generated. Then, the Costas codes {3 f min , 2 f min , f min , 4 f min } is decided. For Frank code, the carrier frequency is U (1/8, 1/4). In addition, cpp and M are range of [1,5] and [4,8], respectively. For P1-P4 polyphase codes, the parameters are similar to Frank code. For more details, see Table 2.

Experiment With SNR
The experiment research shows the relationship between the RSR and SNR. One thousand groups are provided for every type of radar waveform, from which 90% of the groups data are used for training and 10% for testing. The SNR would be increased from -8 dB to 14 dB with a step of 2 dB. The system will compare with Lundén's [11] because his system has similar classifications to us, and also is considered as the satisfactory automatic radar classification system at present. The simulation results are shown in Figure 8. Figure 8 plots the classification probabilities as a function of the SNR. The probabilities are measured by the testing data of each kind of waveforms, and the overall probabilities are also calculated. As shown in Figure 8, our system performs better on the classification of each kind of waveform than Lundén's, especially at low SNR. At SNR > 8 dB, RSR of our system approaches 100%. With the decrease of SNR, the successful ratio can still be kept at a high level. However, the performance of Lundén's is far from satisfactory. At SNR of -2 dB, most waveforms of RSR reach 90%. Moreover, the overall RSR of the waveforms approach 95%. Nevertheless, the overall RSR of Lundén's is less than 70% in the same conditions. Costas codes and BPSK are not sensitive with SNR, and P4 code has low RSR in the simulations. Table 3 shows the confusion table at SNR of -2 dB, while the overall RSR is 94.7%. As the table shows, the probabilities of error recognition of Frank code and P4 code are higher than others. Some of Frank codes are identified as P3 codes by the system, while P4 codes are identified as LFM. In fact, the two pairs are very similar, which can be seen in Figure 4. When the SNR is low, the probabilities of error recognition is also reasonable.

Experiment with Robustness
In this part, we explore the robustness of approach through a different number of training samples. There are 200 groups of testing samples for each of the radar waveform, and the number of training samples would be increased from 100 to 900 with a step of 200. To be more adequate, the experiment would be repeated three times with samples from -8 dB, -2 dB and 10 dB.
As shown in Figure 9, with the increase in number of training samples, the successful recognition is also increased. Meanwhile, for all groups of SNR, the change of successful recognition is not obvious with the number over 500. It means that as long as the number is over 500, the classifier will be able to work in the best state. The approach works well in the small number of samples, which has important significance for radar waveform analysis.

Experiment with Computation
Computational complexity issue is an important indicator to measure the the performance of classification system. We reproduce Lundén's method [11], and compare it with this paper in the same conditions. All eight kinds of waveforms are tested under three different SNRs: -8 dB, -2 dB and 10 dB, and each test repeats 10 times on average. The testing environment and testing results are demonstrated as Tables 4 and 5, respectively. As shown in Table 5, the proposed method spends about 55 s and Lundén's about 85 s, respectively. However, Lundén's method has several approaches which do not apply in this paper, such as Wigner-Ville distribution, data driven and peak search, etc. These approaches consume extra time. In each waveform, there is the trace of reduction in time, when SNR is increasing. BPSK spends the least time and P3 spends the most time, but in the overall range, the difference is not obvious. The recipe in this paper is stable, and the variety of SNR and waveforms have little effect on the computational complexity of system.

Conclusions
In this paper, an automatic radar waveform recognition system is explored. The recognition system can identify eight types of LPI radar waveforms (including LFM, Costas codes, BPSK (Barker modulation), Frank code and P1-P4) in a highly noisy environment. The RSR relies more on the choice of the features extracted from an intercepted radar pulse as well. The features include two categories, one is extracted from waveforms directly (signal features), the other is from time-frequency image (image features). To reduce cross terms, the CWD transformation is adopted in the time-frequency image. The final feature vectors have been selected by using the information theoretic features selection algorithm.
Elman neural network, a typical global network, is applied as the classifier cell of the recognition system. It is a robust network that has a memory function. The structure of the classifier is composed of two cells, network1 and network2. In network1, the waveforms are classified into four types (including LFM, Costas codes, BPSK and polyphase codes), through the use of all signal features and a part of image features. The polyphase codes are further classified into five types (including Frank code, P1-P4) in network2.
New features have been proposed from the time-frequency image. By adopting translation, scaling and rotation features, we can focus more on the structure of the waveform itself. It is important to classify different kinds of polyphase codes and improve the RSR of polyphase codes. The simulation results show when SNR ≥ -2 dB, and the RSR is more than 94%.