Neural Networks for Radar Waveform Recognition

Abstract: For passive radar detection systems, radar waveform recognition is an important research area. In this paper, we explore an automatic radar waveform recognition system to detect, track and locate low probability of intercept (LPI) radars. The system can classify (but not identify) 12 kinds of signals at a low signal-to-noise ratio (SNR): binary phase shift keying (BPSK, Barker-code modulated), linear frequency modulation (LFM), Costas codes, Frank code, P1-P4 codes and T1-T4 codes. It is one of the most extensive classification systems in the open literature. A hybrid classifier is proposed, which includes two relatively independent subsidiary networks, a convolutional neural network (CNN) and an Elman neural network (ENN). We determine the parameters of the architecture to make the networks work more effectively. Specifically, we focus on how the networks are designed, what the best set of features for classification is and what the best classification strategy is. In particular, we propose several key features for the classifier based on the Choi-Williams time-frequency distribution (CWD). Finally, the recognition system is simulated with experimental data. The experiments show an overall successful recognition ratio of 94.5% at an SNR of −2 dB.


Introduction
Modern radars usually have low instantaneous power and are called low probability of intercept (LPI) radars; they are widely used in electronic warfare (EW). For a radar electronic intelligence (ELINT) system (anti-radar system), analyzing and classifying the waveforms of LPI radars is one of the most effective methods to detect, track and locate them [1,2]. Early work used second-order statistics and the power spectral density to classify phase shift keying (PSK), frequency shift keying (FSK) and amplitude shift keying (ASK) [3]. Dudczyk uses parameters such as the pulse repetition interval (PRI) and pulse width (PW) to identify different radar signals [4-7]. Nandi introduces a decision-theoretic approach to classify different types of modulated signals [8], with a ratio of successful recognition (RSR) over 94% at a signal-to-noise ratio (SNR) ≥ 15 dB. Artificial neural networks have also been applied to the recognition problem: a multi-layer perceptron (MLP) recognizer reaches more than 99% recognition performance at SNR ≥ 0 dB [9]. Atomic decomposition (AD) has been addressed for the detection and classification of complex radar signals; the receiver realizes the interception of four signals, namely linear frequency modulation (LFM), PSK, FSK and continuous wave (CW) [10]. Time-frequency techniques can increase the signal processing gain for low-power signals [11]. In [12], López analyzes the differences among LFM, PSK and FSK based on the short-time Fourier transform (STFT), achieving RSR ≥ 90% at SNR ≥ 0 dB. Lundén [13] introduces a wide classification system for intercepted pulse compression waveforms that achieves an overall RSR ≥ 98% at SNR ≥ 6 dB. Ming improves Lundén's system and shows the results in [14]. Sparse classification (SC) based on random projections is proposed in [15]; the approach improves efficiency, noise robustness and information completeness, recognizing LFM, FSK and PSK with RSR ≥ 90% at SNR ≥ 0 dB.
We investigate the convolutional neural network (CNN) for radar waveform recognition. The CNN was first proposed in the image recognition field [16,17]. Recently, it has been applied to speech recognition [18-21], computer vision [22,23] and handwriting recognition [24-26], among other tasks. Abdel-Hamid introduces approaches to further reduce the error rate by utilizing CNNs in [27]; experimental results show that CNNs reduce the error rate by 6%-10% compared with deep neural networks (DNNs) on speech recognition tasks. In [26], a hybrid model of two superior classifiers, CNN and the support vector machine (SVM), is discussed. The RSR of the model exceeds 99.81%, with the SVM acting as the classifier and the CNN working as a feature extractor.
In this paper, we explore a wide radar waveform recognition system to classify, but not identify, radar waveforms (here, to "classify" means to distinguish the different types of waveforms, while to "identify" means to distinguish different individuals of the same type). Twelve types of waveforms (LFM, BPSK, Costas codes, polyphase codes and polytime codes) are discussed, classified using a CNN and an Elman neural network (ENN). We propose time-frequency and statistical characteristic approaches to process detected signals, which are transmitted in a highly noisy environment. The detected signals are processed into time-frequency images with the Choi-Williams distribution (CWD). The CWD, a member of the Cohen class, has few cross terms for these signals [28]. Time-frequency images show three main pieces of information about a signal: time location, frequency location and amplitude. In the images, the time and frequency information is more robust than the amplitude. To make the images more suitable for the classifier, a thresholding method is investigated that converts the time-frequency images into binary images. The binary images are then cleaned by noise-removing approaches, and the final images are used for classification and feature extraction. However, the polyphase codes (including Frank code and the P1-P4 codes) and LFM are similar to each other, and it is difficult to classify them by shape alone. Therefore, we extract some effective features for their further classification. Feature extraction operates on the binary images through digital image processing (such as skeleton extraction, Zernike moments [29] and principal component analysis (PCA)). The set of features is the input of the ENN, and the output of the ENN is the classification result. The entire classifier consists of two networks, a CNN and an ENN; the CNN is the primary cell of the classifier, and the ENN is auxiliary. Binary images are resized for the CNN, which separates the polytime codes (T1-T4) from the other eight kinds of waveforms. Additionally, we extract features for the ENN that clearly indicate the eight remaining codes. Only if "others" is selected by the CNN does the ENN start to work (see Figure 2). In the experiments, the recognition system achieves an overall RSR ≥ 94% at SNR ≥ −2 dB.
The major contributions of this paper can be summarized as follows: (1) we build the framework of signal processing and establish the labeled data for testing the system; (2) the proposed recognition system can classify as many as 12 kinds of waveforms, a range that previous articles seldom reach; in particular, four kinds of polytime codes are classified together for the first time in the published literature; (3) almost all parameters of interest and all features are estimated from the received data without a priori knowledge; (4) we propose a hybrid classifier that combines two different networks (CNN and ENN).
The paper is organized as follows. The structure of the recognition system is exhibited in Section 2. Section 3 proposes the signal model and preprocessing. Section 4 explores the feature extraction, including signal features and image features, and lists all of the features that we need. After that, Section 5 presents the structure of the classifier and describes it in detail. Section 6 shows the experiments. Section 7 draws the conclusions.

System View
The entire classification system mainly consists of three components: preprocessing, feature estimation and recognition; see Figure 1. It is an automatic process from the preprocessing part to the recognition part. In the preprocessing part, the received data are transformed into time-frequency images by utilizing the CWD transformation. Then, the time-frequency images are converted into binary images through image binarization, an image opening operation and noise-removing algorithms.
In the feature extraction part, we extract effective features to train and test the classifier. Different kinds of waveforms have different shapes in the images, and after image processing, the differences in shape become more significant. The CNN has a powerful classification ability, which distinguishes the polytime codes from the others. To classify the remaining similar waveforms (such as the polyphase codes), we extract features from the detected signals and the binary images. In the recognition part, all of the waveforms are classified via the proposed classifier based on the extracted features. The hybrid classifier consists of two networks, network1 and network2; see Figure 2. The entire classifier can distinguish the 12 different kinds of radar waveforms mentioned above. Network1 is the main network, composed of the CNN. Its input is a binary image after preprocessing, and its outputs are five different classification results: the four kinds of polytime codes (T1-T4) and "others" (waveforms not belonging to the polytime class). Network2 is the ENN, an auxiliary network that assists the main network in classifying the eight remaining waveforms that do not belong to the polytime codes. When a waveform is labeled "others" by network1, network2 classifies it into one of the eight remaining kinds. The proposed structure of the classifier improves the classification power.

Preprocessing
In this section, detected signals are processed into binary images with the Choi-Williams time-frequency distribution.

Signal Model
We assume the signal is contaminated by additive white Gaussian noise (AWGN) and that the amplitude is constant over time. In summary, the signal model is formulated as:

y(nT) = s(nT) + m(nT) = Ae^{jφ(nT)} + m(nT)

where s(nT) is the n-th sample of the complex signal and m(nT) is the n-th sample of complex white Gaussian noise (WGN) with variance σ²ε. A is the amplitude; for the sake of simplicity, we suppose A = 1. φ is the instantaneous phase of the complex signal. To convert the detected signals from real to complex, the Hilbert transform is applied [30].
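As a minimal sketch of this signal model (not the authors' code), the following generates samples y(n) = A·e^{jφ(n)} + m(n) with complex WGN scaled to a target SNR; the LFM sweep rate used in the example is a hypothetical value chosen only for illustration.

```python
import numpy as np

def simulate_signal(phase, A=1.0, snr_db=0.0, rng=None):
    """Generate y(n) = A*exp(j*phase(n)) + m(n), with m complex WGN at the given SNR."""
    rng = np.random.default_rng() if rng is None else rng
    s = A * np.exp(1j * phase)                      # noise-free complex envelope
    p_signal = np.mean(np.abs(s) ** 2)
    sigma2 = p_signal / (10 ** (snr_db / 10))       # noise variance sigma_eps^2
    m = np.sqrt(sigma2 / 2) * (rng.standard_normal(len(phase))
                               + 1j * rng.standard_normal(len(phase)))
    return s + m

# Example: LFM-like quadratic phase phi(n) = pi * k * n^2 (k is a hypothetical sweep rate)
n = np.arange(1024)
phi = np.pi * 1e-4 * n ** 2
y = simulate_signal(phi, snr_db=-2.0)
```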

Choi-Williams Distribution
The Choi-Williams distribution is a time-frequency distribution that expresses the details of detected signals while markedly reducing the cross terms. As a member of the Cohen class, it can be written as:

CWD(t, ω) = ∭ y(u + τ/2) y*(u − τ/2) f(ξ, τ) e^{j2πξ(u−t) − jωτ} du dξ dτ

where ω and t are the axes of frequency and time, respectively, and f(ξ, τ) is a two-dimensional low-pass filter that balances cross terms and resolution. The kernel function is formulated as:

f(ξ, τ) = exp(−ξ²τ²/σ)

where σ is a controllable factor; the cross terms become more obvious as σ increases. In this paper, σ = 1 is applied. In Figure 3, the 12 kinds of signals are transformed into time-frequency images through the CWD transformation. The work in [31] proposes a fast calculation of the CWD based on the fast Fourier transform (FFT). We recommend that the number of samples be a power of two, such as 256, 512, etc.; in this paper, 1024 sampling points are used. However, the length of the signal is usually N < 1024, so zero padding is utilized in the process.
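A discrete CWD can be sketched as follows. This is an illustrative integer-lag approximation (not the fast FFT-based method of [31]): the instantaneous autocorrelation x(t+u+τ)x*(t+u−τ) is smoothed in time by the Choi-Williams Gaussian kernel and then Fourier-transformed over the lag; with integer lags the frequency axis is scaled by two.

```python
import numpy as np

def cwd(x, sigma=1.0, n_lags=64, half_win=16):
    """Discrete Choi-Williams distribution (integer-lag approximation).

    W(t, tau) smooths the instantaneous autocorrelation x(t+u+tau)*conj(x(t+u-tau))
    with the kernel exp(-sigma*u^2 / (4*tau^2)); an FFT over tau gives the
    time-frequency image (a tone at normalized frequency f0 peaks at bin
    2*f0*n_lags because of the integer-lag convention).
    """
    N = len(x)
    W = np.zeros((N, n_lags), dtype=complex)
    for t in range(N):
        W[t, 0] = abs(x[t]) ** 2                       # tau = 0: no smoothing needed
        for tau in range(1, n_lags):
            u = np.arange(-half_win, half_win + 1)
            w = np.exp(-sigma * u.astype(float) ** 2 / (4.0 * tau ** 2))
            i1, i2 = t + u + tau, t + u - tau
            ok = (i1 >= 0) & (i1 < N) & (i2 >= 0) & (i2 < N)
            if not ok.any():
                continue
            acf = x[i1[ok]] * np.conj(x[i2[ok]])
            W[t, tau] = np.sum(w[ok] * acf) / np.sum(w[ok])
    return np.abs(np.fft.fft(W, axis=1))               # time x frequency image

# Example: a pure tone at normalized frequency 0.125 peaks at bin 2*0.125*64 = 16
x = np.exp(2j * np.pi * 0.125 * np.arange(128))
tfd = cwd(x)
```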

Binary Image
In this part, the detected signals are processed into binary images with the global thresholding algorithm [32]. Before that, we resize the time-frequency images to N × N to reduce the computational load. The algorithm is organized as follows:
a. Select an initial threshold T (e.g., the mean intensity of the image);
b. Compare every pixel value with T;
c. Separate the image into two pixel groups G1 and G2: G1 includes all pixels with values > T, and G2 includes the others;
d. Calculate the average values µ1 and µ2 of the two pixel groups G1 and G2, respectively;
e. Update the threshold, i.e., T = (µ1 + µ2)/2;
f. Repeat (b)-(e) until δT = |T_now − T_before| is smaller than 0.001;
g. Compute B(x, y) as follows: B(x, y) = 1 for pixels with values > T, and B(x, y) = 0 otherwise.
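The iterative global-thresholding loop can be sketched as follows; this is a minimal NumPy sketch of the standard iteration described by [32], not the authors' implementation.

```python
import numpy as np

def global_threshold(img, eps=1e-3):
    """Iterative global thresholding: split pixels around T, average each
    group, move T to the midpoint of the group means, repeat until T
    changes by less than eps, then binarize."""
    T = img.mean()                         # (a) initial estimate
    while True:
        g1 = img[img > T]                  # (c) bright group
        g2 = img[img <= T]                 #     dark group
        mu1 = g1.mean() if g1.size else T  # (d) group means
        mu2 = g2.mean() if g2.size else T
        T_new = 0.5 * (mu1 + mu2)          # (e) updated threshold
        if abs(T_new - T) < eps:           # (f) convergence check
            break
        T = T_new
    return (img > T).astype(np.uint8)      # (g) binary image B(x, y)

# Example: two well-separated intensity clusters
img = np.concatenate([np.full(50, 0.1), np.full(50, 0.9)]).reshape(10, 10)
B = global_threshold(img)
```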

Noise Removal
After binarization, some isolated noise and processed noise remain in B(x, y). Isolated noise comes from the noisy environment and appears in the binary image as groups of pixels with no fixed shape. Processed noise is generated by the CWD kernel in particular and appears as straight lines, long but thin: more than 50 pixels long but less than three pixels wide. An image opening operation (erosion followed by dilation) is proposed to remove the processed noise, with an operational kernel of size 3 × 3, which effectively removes shapes less than three pixels wide. For the isolated noise, we count the number of pixels in each group of signal or noise; the groups whose sizes are smaller than 10% of the largest one are removed. The noise-removal process is illustrated in Figure 4.
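The 10% small-object rule can be sketched with plain NumPy as follows (an illustrative sketch, not the authors' code; the morphological opening step for line-shaped processed noise is omitted here, and 4-connectivity is an assumption).

```python
import numpy as np

def remove_small_objects(B, keep_ratio=0.10):
    """Label 4-connected components in a binary image and drop every
    component smaller than keep_ratio of the largest one (the 10% rule)."""
    B = B.astype(bool)
    labels = np.zeros(B.shape, dtype=int)
    sizes, cur = [], 0
    for seed in zip(*np.nonzero(B)):
        if labels[seed]:
            continue
        cur += 1
        stack, count = [seed], 0
        labels[seed] = cur
        while stack:                               # iterative flood fill
            x, y = stack.pop()
            count += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if (0 <= nx < B.shape[0] and 0 <= ny < B.shape[1]
                        and B[nx, ny] and not labels[nx, ny]):
                    labels[nx, ny] = cur
                    stack.append((nx, ny))
        sizes.append(count)
    if not sizes:
        return B.astype(np.uint8)
    threshold = keep_ratio * max(sizes)
    keep = [i + 1 for i, s in enumerate(sizes) if s >= threshold]
    return np.isin(labels, keep).astype(np.uint8)

# Example: one 5x5 block (25 px) plus one isolated pixel (1 px < 10% of 25)
B = np.zeros((10, 10), dtype=np.uint8)
B[1:6, 1:6] = 1
B[8, 8] = 1
cleaned = remove_small_objects(B)
```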

Feature Extraction
In this section, we extract some useful features and build a feature vector for the ENN in order to assist the CNN in completing the recognition. The section consists of two parts, covering signal features and image features. The features that we can estimate or calculate directly from the detected signals belong to the signal features; similarly, the image features are those extracted from the binary images. Table 1 lists the signal features and image features that are used in network2.

Signal Features
In this part, the features are extracted from signals based on signal processing approaches.

Based on the Statistics
We estimate the higher-order moments of the complex signal as follows:

M̂_pq = |(1/N) Σ_{k=1}^{N} y(k)^{p−q} (y*(k))^{q}|   (4)

where (·)* denotes conjugation and N is the sample number. We utilize absolute values to ensure that the estimated values are invariant when the signal phase rotates. M10 and M20 are calculated by Equation (4).
The higher-order cumulants are given by [33,34], e.g.:

Ĉ20 = M̂20 − M̂10²

where, as in the context above, M̂10 is obtained from Equation (4).
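A minimal sketch of these estimators (assuming the moment convention M_pq = E[y^{p−q}(y*)^q] common in modulation-recognition work; not the authors' code):

```python
import numpy as np

def moment(y, p, q):
    """Estimate M_pq = |(1/N) * sum y(k)^(p-q) * conj(y(k))^q|;
    the absolute value keeps the estimate invariant to phase rotation."""
    return abs(np.mean(y ** (p - q) * np.conj(y) ** q))

def cumulant_c20(y):
    """Sample second-order cumulant C20 = M20 - M10^2 (Swami-Sadler style),
    built here from the phase-invariant moment estimates above."""
    return moment(y, 2, 0) - moment(y, 1, 0) ** 2

# Example: a unit-power complex tone has M21 = E|y|^2 = 1
y = np.exp(2j * np.pi * 0.1 * np.arange(1000))
m21 = moment(y, 2, 1)
```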

Based on the Power Spectral Density
Before the estimation of the power spectral density (PSD), the detected signals should be normalized as follows:

ỹ(k) = y(k) / √(M̂21 − σ̂²ε)   (6)

where M̂21 is obtained from Equation (4) and y(k) is the k-th sample. The variance of the additive noise, σ²ε, can be estimated as in [35].
The PSD is then calculated as:

P(f) = (1/N) |Σ_{k=1}^{N} ỹ(k) e^{−j2πfk}|²

where ỹ(k) is from Equation (6).
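A sketch of this normalization and a simple periodogram PSD estimate (assumed forms; the noise-variance estimator of [35] is replaced here by a supplied value):

```python
import numpy as np

def normalize(y, sigma2_noise):
    """Scale to unit signal power: y_tilde = y / sqrt(M21 - sigma_eps^2),
    where M21 = (1/N) * sum |y(k)|^2 estimates the total received power."""
    m21 = np.mean(np.abs(y) ** 2)
    return y / np.sqrt(m21 - sigma2_noise)

def psd(y_tilde):
    """Periodogram PSD estimate: P = |FFT(y_tilde)|^2 / N."""
    N = len(y_tilde)
    return np.abs(np.fft.fft(y_tilde)) ** 2 / N

# Example: a noise-free tone (sigma_eps^2 = 0); the PSD peaks at bin 100
y = np.exp(2j * np.pi * (100 / 1024) * np.arange(1024))
P = psd(normalize(y, 0.0))
```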

Based on the Instantaneous Properties
Instantaneous properties are essential characteristics of detected signals; they can effectively distinguish frequency-modulated signals from phase-modulated signals. In this paper, we estimate the instantaneous frequency and the instantaneous phase from the samples. The standard deviation of the instantaneous phase is addressed in [9]. For brevity, φ(k) = tan⁻¹[Im(y(k))/Re(y(k))] is applied, where Re and Im are the real and imaginary parts of the complex signal, respectively. The standard deviation of the instantaneous phase is given by:

σ̂φ = √[(1/N) Σ_k φ(k)² − ((1/N) Σ_k φ(k))²]

where N is the sample number and φ is the instantaneous phase in the range [−π, π]. Instantaneous frequency estimation is more involved than instantaneous phase estimation; we describe the method in several steps to make it clear:
a. Compute the wrapped instantaneous phase φ(k);
b. Unwrap the phase: when there is an absolute jump in φ(k), add ±2π to recover the consecutive phase φu(k);
c. Differentiate the unwrapped phase to obtain the instantaneous frequency f(k);
d. Smooth the sequence of f(k): some spikes are created by the processing, so we use a median filter with a window size of five;
e. Normalize the instantaneous frequency f(k).
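Steps (a)-(d) above can be sketched as follows (a minimal NumPy sketch of the standard unwrap-differentiate-median-filter chain; not the authors' code, and edge padding of the median filter is an assumption):

```python
import numpy as np

def inst_phase(y):
    """Wrapped instantaneous phase in [-pi, pi]."""
    return np.arctan2(y.imag, y.real)

def phase_std(y):
    """Standard deviation of the wrapped instantaneous phase."""
    return np.std(inst_phase(y))

def inst_freq(y, window=5):
    """Unwrap the phase, differentiate to get normalized frequency,
    then median-filter (window of five) to suppress processing spikes."""
    phi_u = np.unwrap(inst_phase(y))              # add +/-2*pi at jumps
    f = np.diff(phi_u) / (2 * np.pi)              # cycles per sample
    half = window // 2
    padded = np.pad(f, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(f))])

# Example: a tone at normalized frequency 0.05
y = np.exp(2j * np.pi * 0.05 * np.arange(200))
f_hat = inst_freq(y)
```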

Image Features
In this part, we extract features based on the binary images. The number of objects (N_obj) is a key feature. For instance, the Costas codes have more than three objects, while Frank code and P2 have two, and P1, P4 and LFM have only one. We estimate two such features, N_obj1 and N_obj2: N_obj1 is the number of objects whose pixel counts are more than 20% of that of the largest object; likewise, N_obj2 uses a 50% threshold.
The maximum energy location in the time domain is also a feature, i.e.,

t̂max = argmax_t Σ_ω C_{N×N}(t, ω)

where C_{N×N}(t, ω) is the resized time-frequency image and N is the sample number.
The standard deviation of the width of the signal objects (σ̂obj) describes the concentration of the signal energy. The feature is estimated as follows.
• For every object, do k = 1, 2, ..., N_obj1:
1. Retain the k-th object and remove the others, giving B_k(x, y);
2. Estimate the principal components of B_k(x, y);
3. Rotate B_k(x, y) until the principal components are vertical; record the result as B̃_k(x, y);
4. Sum along the vertical axis, i.e., v(x) = Σ_y B̃_k(x, y);
5. Estimate the standard deviation of v(x), i.e., σ̂_{k,obj1}, where N is the sample number;
• Output the rotation degree θ̂max, obtained by performing Step 3 on the maximum object.
Among the five types of polyphase codes, P2 has a negative slope; therefore, the feature θ̂max can easily separate P2 from the others. The feature gives the angle between the maximum object and the vertical direction and is obtained as a by-product of the calculation of σ̂obj.
Next, we retain the maximum object in the binary image and remove the others. The skeleton of the object is extracted by utilizing image morphology methods, and the linear trend of the object is estimated at the same time by minimizing the squared errors. Subtracting the linear trend from the skeleton yields the difference vector f_n. The standard deviation of f_n is estimated as:

σ̂fn = √[(1/M) Σ_k f_n(k)²]

where M is the sample number of f_n. Some features are extracted using the autocorrelation of f_n, i.e., c(m) = Σ_k f_n(k) f_n(k − m), m = 0, 1, ..., N − 1. The autocorrelation makes the differences between the stepped waveforms (P1, Frank code) and the linear waveforms (P3, P4, LFM) more significant; see Figure 3 for more details.
The ratio of the maximum value of c(m) to its maximum sidelobe value is formulated as:

R̂ = c(0) / c(m₁)

where m₀ is the location of the minimum of c(m) and m₁ is the location of the maximum of c(m) in the interval [m₀, N − 1].
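This feature can be sketched as follows (an illustrative sketch assuming the ratio c(0)/c(m1), consistent with the description of m0 and m1; not the authors' code). A stepped skeleton residual is used as the example because its periodic staircase produces strong autocorrelation sidelobes.

```python
import numpy as np

def autocorr(f):
    """One-sided autocorrelation c(m) = sum_k f(k) * f(k - m), m = 0..N-1."""
    N = len(f)
    return np.array([np.sum(f[m:] * f[:N - m]) for m in range(N)])

def mainlobe_sidelobe_ratio(f):
    """Ratio of the autocorrelation peak c(0) to the largest sidelobe:
    m0 is the minimum location of c(m), m1 the maximum location in [m0, N-1]."""
    c = autocorr(f)
    m0 = int(np.argmin(c))
    m1 = m0 + int(np.argmax(c[m0:]))
    return c[0] / c[m1]

# Example: a +/-1 staircase residual of period 16 (stepped-waveform-like)
f = np.tile(np.concatenate([np.ones(8), -np.ones(8)]), 8)
r = mainlobe_sidelobe_ratio(f)
```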
We also estimate the maximum of the absolute value of the FFT of f_n, i.e.,

âmax = max |FFT(f_n)| / M

Pseudo-Zernike moments are invariant under topological transformations [36], such as rotation, translation, mirroring and scaling, and they are widely applied in pattern recognition [37-39]. The image geometric moments of order (p + q) are calculated as:

m_pq = Σ_x Σ_y x^p y^q B(x, y)

where B(x, y) is from Section 3.3. The central geometric moments, which provide scale and translation invariance, are given by:

µ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q B(x, y)

where x̄ = m10/m00 and ȳ = m01/m00. The scale- and translation-invariant radial geometric moments are defined analogously with x̃ = x − x̄ and ỹ = y − ȳ. The pseudo-Zernike moments Z_nm are then estimated from these moments with k = (n − s − m)/2 and d = (n − s − m − 1)/2, following [36]. At last, Ẑnm = ln|Z_nm| is computed. The pseudo-Zernike features used are Ẑ20, Ẑ22, Ẑ30, Ẑ31, Ẑ32, Ẑ33 and Ẑ43.
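As a rough illustration of the geometric and central moments that feed the pseudo-Zernike features (a NumPy sketch under the conventions above; the full pseudo-Zernike radial polynomials of [36] are not reproduced here):

```python
import numpy as np

def geometric_moment(B, p, q):
    """m_pq = sum_x sum_y x^p * y^q * B(x, y) over the binary image."""
    x, y = np.nonzero(B)
    return float(np.sum((x ** p) * (y ** q)))

def central_moment(B, p, q):
    """mu_pq = sum (x - x_bar)^p * (y - y_bar)^q * B(x, y), translation invariant."""
    m00 = geometric_moment(B, 0, 0)
    xb = geometric_moment(B, 1, 0) / m00
    yb = geometric_moment(B, 0, 1) / m00
    x, y = np.nonzero(B)
    return float(np.sum(((x - xb) ** p) * ((y - yb) ** q)))

# Example: a centered 3x3 square; its first-order central moments vanish
B = np.zeros((9, 9), dtype=np.uint8)
B[3:6, 3:6] = 1
```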

Classifier
In Section 4, we completed the resized binary image labels for the CNN and the extraction of the 22-element feature vector for the ENN. In this section, we describe the structure of the two networks in detail.

CNN
A CNN is a neural network with a special structure for image feature extraction. Different from a traditional network, the input of a CNN is a two-dimensional feature map (an image). The convolutional layers extract information, and the pooling layers effectively reduce the computational load. A CNN is not a fully connected network, which makes it similar to the cerebral cortex. The architecture of the CNN model is shown in Figure 5. The CNN has a hierarchical architecture [40], which we describe as follows.
a. The input layer is a binary image, which is from Section 3.4. To reduce the computational load, we resize the image to 32 × 32 with the nearest-neighbor interpolation algorithm.
b. The first hidden layer C1 is a convolutional layer with six feature maps. Each feature map requires its own convolutional kernel, so C1 has six convolutional kernels of size 5 × 5. We use C1(m, n, k) to denote the value of the k-th feature map at position (m, n) in the C1 layer.
c. The second hidden layer S1 is a down-sampling layer with six feature maps. In S1, every feature value is the average of four adjacent elements in C1; denoting the values by S1(m, n, k) as above, we have:

S1(m, n, k) = (1/4) Σ_{i=0}^{1} Σ_{j=0}^{1} C1(2m + i, 2n + j, k)

The size of the feature maps in S1 is thus reduced to 1/4 of that of the feature maps of C1.
d. C2 is a convolutional layer with 16 different kernels. It is not fully connected with the S1 layer [41]. The connection details are described in Table 2; C2(m, n, k) again denotes the neurons in this layer. For the α-th column in Table 2, we mark the row indices by β_{α,0}, β_{α,1}, ..., β_{α,p−1}; for instance, if α = 7, then p = 4. The size of the convolutional kernel is then p × 5 × 5, and K_α is the α-th kernel, so that:

C2(m, n, α) = Σ_{q=0}^{p−1} Σ_i Σ_j K_α(q, i, j) S1(m + i, n + j, β_{α,q})

For example, for the zeroth column, p = 3 and β_{0,0} = 0, β_{0,1} = 1, β_{0,2} = 2.
e. Similar to S1, the next layer is a down-sampling layer, called S2, with 16 feature maps, computed in the same way as S1.
f. The connection between S2 and N1 is a full connection: each kernel in N1 is connected with all of the feature maps in S2. There are 120 kernels in this layer, each of size 5 × 5, so the output is a column vector of size 120 × 1. We write N1(λ) for the λ-th feature map of N1 and K_λ for the λ-th kernel.
g. Finally, the connection between N1 and the output layer is also full. There are five neurons (defined by the classes we want to classify) in the output layer, with the sigmoid activation function.
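The convolution and 2 × 2 average-pooling operations used by the C and S layers can be sketched as follows (an illustrative NumPy sketch, not the authors' code; the uniform 5 × 5 kernel in the example is hypothetical).

```python
import numpy as np

def avg_pool_2x2(C):
    """Down-sampling layer: each output value is the mean of a 2x2 block,
    so an (M, N) feature map shrinks to (M/2, N/2) - a quarter of the size."""
    M, N = C.shape
    return C.reshape(M // 2, 2, N // 2, 2).mean(axis=(1, 3))

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D convolution used by the C layers (no padding)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 32x32 input with a 5x5 kernel gives a 28x28 map, pooled to 14x14
img = np.random.default_rng(0).random((32, 32))
fmap = avg_pool_2x2(conv2d_valid(img, np.ones((5, 5)) / 25.0))
```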

ENN
A three-layer ENN is utilized in this paper for signal classification. The connections between the hidden layers and the output layer have different weights [42]. At every time step, the input is propagated in a feed-forward fashion together with the feedback of the hidden state, and the error back-propagation (BP) learning algorithm is utilized for training [43]. The recurrent connection means that the context units always maintain a copy of the previous values of the hidden units; thus, the network can keep its past state, which is useful for applications such as sequence prediction [44-46]. In Figure 6, there are 46 neurons in the hidden layer. For the input and output layers, the number of neurons is determined by the dimensions of the input and output vectors. The sigmoid function f(x) = 1/(1 + e^{−x}) is used in every layer. In [47], Sheela discusses different methods to fix the number of neurons H_num in the hidden layer. In this paper, a simple rule based on the dimension X of the feature vector and the number of categories C is applied. Such a rule cannot completely determine the optimal number of hidden neurons, and we may fine-tune the number in some situations; 46 hidden neurons are applied in this paper.
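One Elman time step can be sketched as follows (an illustrative forward pass with randomly initialized, untrained weights; the layer sizes 22-46-8 follow the paper, while the weight scaling is a hypothetical choice).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elman_step(x, h_prev, Wx, Wh, Wy, bh, by):
    """One Elman step: the context units feed the previous hidden state back,
    so h = sigmoid(Wx x + Wh h_prev + bh) and y = sigmoid(Wy h + by)."""
    h = sigmoid(Wx @ x + Wh @ h_prev + bh)
    return h, sigmoid(Wy @ h + by)

# Example sizes from the paper: 22 features in, 46 hidden, 8 classes out
rng = np.random.default_rng(0)
X_DIM, H_NUM, C = 22, 46, 8
Wx = rng.standard_normal((H_NUM, X_DIM)) * 0.1
Wh = rng.standard_normal((H_NUM, H_NUM)) * 0.1
Wy = rng.standard_normal((C, H_NUM)) * 0.1
h, y = elman_step(rng.standard_normal(X_DIM), np.zeros(H_NUM),
                  Wx, Wh, Wy, np.zeros(H_NUM), np.zeros(C))
```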

Simulation Results and Discussion
In this section, the performance of the proposed recognition system is analyzed by utilizing simulated data. The section covers creating the simulated data, discussing the relationship between SNR and RSR, examining the robustness and computational burden, and summarizing the experiments.

Production of Simulated Signals
In this part, simulated signals are created.

Experiment with SNR
In this part, we depict the relation between SNR and RSR in Figure 7.There are 1000 labels in each waveform class.Twenty percent of the labels are utilized for testing and 80% for training.The result is compared with Lundén's system [13] and our previous work [14], both of which are wide systems in waveform classification.
Figure 7 plots the experimental RSR at different SNRs for the 12 kinds of waveforms and the overall rate. The solid lines show the proposed system, and the dotted lines represent the others. For LFM and P4, the proposed approach performs better than Lundén's, especially at low SNR, but slightly worse than our previous work. For BPSK and the Costas codes, the three systems achieve similar, uniformly high RSRs. For Frank and P2, the results of the proposed method and the previous work are alike and higher than Lundén's. In the simulation of P1, the proposed method is the best when the SNR is more than −2 dB; the results for P3 are similar, with the proposed method performing well at high SNR. For the polytime codes, the proposed approach also achieves excellent RSRs, benefiting from the design of the preprocessing and the high RSR of the classifier. Finally, the overall RSR of the proposed approach is raised by 20% compared to Lundén's system and the previous work; at an SNR of −2 dB, the overall probability is still more than 90%. Table 4 exhibits the confusion matrix of the 12 kinds of waveforms at an SNR of −2 dB. As Table 4 shows, the P3 and P4 waveforms are not always classified correctly: most of the P3 errors are classified as Frank code, and most of the P4 errors as LFM. However, these two pairs are indeed very similar; see Figure 3.

Experiment with Robustness
The robustness of the proposed approach is explored with different numbers of training samples. There are 900 labeled samples per waveform for training and 100 for testing. The number of training samples is increased from 100 to 900 in steps of 200, and the experiment is repeated three times, at SNR = −4 dB, 0 dB and 6 dB.
Figure 8 plots the impact of the number of training samples on successful recognition under the three SNR conditions. In general, training samples and successful recognition are positively correlated. When there are fewer than 500 samples, the successful recognition increases noticeably; beyond 500 samples, it is substantially retained. This means that the proposed approach is able to work well with a small number of samples.

Experiment with Computational Burden
The computational burden is also an important issue for a classification system. We measure the running time of the proposed method and compare it with [13,14] under the same conditions. Three different SNRs, −4 dB, 0 dB and 6 dB, are tested, and each test is repeated 50 times to calculate the average. Table 5 shows the testing environment, and Table 6 demonstrates the testing results. In Table 6, the proposed method and the previous work take less than 60 s, while Lundén's takes more than 80 s, because Lundén's method requires computations that we do not need, such as the Wigner-Ville distribution, peak search and data-driven steps. We also improve the effectiveness of the system and reduce the time consumption compared with the previous work. For the same type of waveform, the highest SNR takes the least time. Among the different types of waveforms, BPSK is the easiest to compute, while the P3 code is the hardest; overall, however, the variation in cost is not large. The proposed method is stable, and different waveforms or SNRs have little effect on the computational burden of the classification system.

Conclusions
In this paper, an automatic system for the recognition of radar signal waveforms is proposed. We build the processing flow for the detected signals by utilizing signal and image processing algorithms. Using these methods, the signal waveforms are fully represented via sets of feature vectors and binary images, which the classifier sorts into 12 types. The simulation results show that the overall RSR is more than 94% at SNR ≥ −2 dB. Additionally, the feature extraction and noise-removal processes make the system robust: when there are more than 500 labeled samples, the successful recognition is substantially retained. Finally, the computational burden is tested; the proposed method is stable across different waveforms and SNRs and takes less time than Lundén's method and our previous work.

Figure 2. This figure shows the details of the classifier. Network1 is the main network, composed of the CNN, and network2 is the ENN. Network2 assists the main network (network1) in completing the classification.

Figure 4. This figure exhibits the processing of the P3 code at a signal-to-noise ratio (SNR) of −4 dB.

Figure 5. The figure shows the structure of the convolutional neural network (CNN). The input image is processed in the hidden layers and classified in the output layer.

Figure 6. This figure shows the structure of the Elman neural network. It is a three-layer network with feedback loops, so the network can keep its past state, which is useful for waveform classification.

Table 4. Confusion matrix for the system at an SNR of −2 dB. The overall ratio of successful recognition (RSR) is 94.5%.

Figure 8. The figure shows the successful recognition ratio for different numbers of training samples.
Figure 1. The figure shows the system components. The received data are processed in the preprocessing and feature estimation parts to extract features, and the data are then classified in the classifier part.

Table 1. List of the features for network2.

Table 2. Connection details between S1 and C2.

Table 3. List of the parameters of the simulated signals.
Figure 7. This figure depicts the different recognition probabilities of the 12 types of radar waveforms with the testing data.

Table 5. The testing environment. CPU: Central Processing Unit. GPU: Graphics Processing Unit. MATLAB is software produced by The MathWorks, Inc., located in Natick, Massachusetts, United States.