1. Introduction
Intra-pulse modulation classification of radar emitter signals is a key technology for analyzing radar systems. It plays an important role in electronic support measure (ESM) systems, electronic intelligence (ELINT) systems and radar warning receivers (RWRs) [1,2,3]. Accurate classification of the intra-pulse modulation of radar emitter signals increases the reliability of estimating a radar's function and indicates the presence of a potential threat, so that the necessary measures or countermeasures against enemy radars can be taken by the ELINT system.
Traditional methods of intra-pulse modulation classification rely on features that are usually extracted manually. For example, in [4], Yang et al. calculated the higher-order cumulants (HOC) of radar emitter signals and trained support vector machines (SVMs) to classify different modulations automatically. In [5], Park et al. used wavelet features and an SVM to classify eight different digital modulations. However, these traditional methods require a great deal of prior knowledge, and their performance is poor when the radar emitter signals have a low signal-to-noise ratio (SNR).
In recent years, deep learning [6] has attracted great attention in the field of artificial intelligence. Deep learning-based methods, especially convolutional neural networks (CNNs) [7,8,9,10,11], have been applied to classification problems, and a large amount of research on intra-pulse modulation classification of radar emitter signals has been published. In [12], a CNN model that takes time-frequency images extracted by a Cohen class time-frequency distribution as its input was used to recognize the intra-pulse modulation of radar signals. Kong et al. [13] used Choi-Williams distribution (CWD) images of low probability of intercept (LPI) radar signals to recognize the intra-pulse modulations. In [14], the authors proposed a blind modulation classification method based on the time-frequency distribution and a CNN; their experimental results show that the method is efficient and robust and enables a high degree of automation in extracting features, training weights and making decisions. Liu et al. [15] proposed a radar emitter signal recognition algorithm that uses time-frequency images as the input of a CNN. In [16], a joint feature map, which combines a time-frequency image and an instantaneous autocorrelation image, was used as the input of a CNN to classify the modulation of radar emitter signals. In [17], the data were first preprocessed by the short-time Fourier transform, and then a CNN model was trained to classify the intra-pulse modulation of radar signals.
However, the above methods are mainly based on 2-D CNNs and a time-frequency transformation of the original sampled radar emitter signals. In a real environment, the number of radar emitters is huge, so the pulse widths of the received radar emitter signals usually vary over a range. For a given sampling frequency, the length of the sampled data is determined by the pulse width. Although these 2-D CNN-based methods use time-frequency images to circumvent the problem that the sampled radar signals differ in length, the preprocessing stage, especially the dimensional transformation of the radar signals, consumes considerable time and storage space. Moreover, as the length of the sampled radar signals varies, it is hard to choose a time-frequency image (TFI) shape for the CNN’s input that balances training and testing speed against classification accuracy, which limits the feasibility of these methods.
Considering these limitations and inspired by [18], which employed a multi-branch convolutional network and a dynamic selection mechanism in CNNs that allows each neuron to adaptively select its receptive field size based on multiple scales of input information, this paper proposes a 1-D selective kernel convolutional neural network (1-D SKCNN) for intra-pulse modulation classification of radar emitter signals. In the data preprocessing stage, the sampled signals are processed by zero-padding, the fast Fourier transform (FFT) and amplitude normalization, which are much faster than the dimensional transformation in the 2-D CNN-based methods. The results of the data preprocessing, the normalized frequency-domain sequences, are then used to train the proposed 1-D SKCNN. The proposed method can classify eleven different intra-pulse modulations of radar emitter signals whose durations and bandwidths span relatively wide intervals. The experimental results show that this method achieves higher classification accuracy than the baseline methods.
This paper is organized as follows. In Section 2, the proposed method, including the structure of the 1-D SKCNN and the preprocessing of data, is introduced in detail. The dataset, parameter settings and experimental results of the proposed method are shown in Section 3. The comparisons with other methods and discussions are given in Section 4. The conclusion is presented in Section 5.
2. The Proposed Method
Intra-pulse modulation classification of radar emitter signals refers to assigning each pulse of a radar emitter signal to a certain modulation type. In this paper, we propose a 1-D selective kernel convolutional neural network, named 1-D SKCNN, for intra-pulse modulation classification. The method consists of the following parts: (1) Preprocess the 1-D raw data of radar emitter signals, which includes zero-padding, the fast Fourier transform and amplitude normalization. (2) Design the 1-D SKCNN model to extract features and conduct per-pulse classification. (3) Train the 1-D SKCNN.
2.1. The Structure of Proposed 1-D SKCNN
Traditional CNN models are usually designed to process 2-D data. However, radar emitter signals are usually in 1-D form, and a dimensional transformation such as a time-frequency transformation is costly in both time and storage. To improve timeliness, we propose the 1-D SKCNN to classify the intra-pulse modulation of radar emitter signals. The overall architecture of the proposed 1-D SKCNN is shown in Figure 1.
The input of the 1-D SKCNN is the frequency-domain sequence, which is described in Section 2.2. The 1-D SKCNN has four main blocks, each of which contains a selective kernel convolutional block, a max-pooling layer and a batch-normalization layer [19].
Figure 2 shows the structure of the selective kernel convolutional block.
In the selective kernel convolutional block, two convolutional layers extract features in parallel. Inspired by the finding in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, the kernel sizes of these two convolutional layers are set to be different, so that features at different scales can be extracted adaptively. Assuming that $X$ is the input feature map, the two convolutional layers with padding compute:
$$X_{upper} = f_{upper}(X) \tag{1}$$
$$X_{lower} = f_{lower}(X) \tag{2}$$
where $f_{upper}$ and $f_{lower}$ are each composed of a normal convolution, with a different kernel size in each branch, followed by the “ReLU” activation function [20]. We then choose addition as the fusing operation and fuse the two feature maps channel by channel. The fused result at this stage, $X_{fused}$, is:
$$X_{fused} = X_{upper} + X_{lower} \tag{3}$$
Like [21], we use channel-wise global-average pooling to obtain the global information. As the channel number of $X_{fused}$ is $C$, the output of the global-average pooling is a vector $x_{gap} \in \mathbb{R}^{C}$. This process is:
$$x_{gap}(c) = \frac{1}{L}\sum_{i=1}^{L} X_{fused}(i, c), \quad c = 1, \ldots, C \tag{4}$$
where $L$ is the length of $X_{fused}$ along the sequence dimension.
Then $x_{gap}$ is sent to a multi-layer perceptron (MLP) that contains a shared-weight hidden layer and two independent output layers. The activation function in the shared-weight hidden layer is “ReLU”. Each output of this MLP has the same shape as $x_{gap}$. Therefore, we obtain two output vectors, $x_{upper}$ and $x_{lower}$:
$$x_{upper} = W_{upper}\,\delta(W_{shared}\,x_{gap}) \tag{5}$$
$$x_{lower} = W_{lower}\,\delta(W_{shared}\,x_{gap}) \tag{6}$$
where $W_{shared}$ is the weight of the shared-weight hidden layer, $\delta$ is the “ReLU” function, and $W_{upper}$ and $W_{lower}$ are the weights of the two independent output layers.
Next, we use soft attention to weight the importance of the convolutional results $X_{upper}$ and $X_{lower}$. A SoftMax operation is therefore applied to $x_{upper}$ and $x_{lower}$. This smoothing operation is:
$$z_{upper}(c) = \frac{e^{x_{upper}(c)}}{e^{x_{upper}(c)} + e^{x_{lower}(c)}} \tag{7}$$
$$z_{lower}(c) = \frac{e^{x_{lower}(c)}}{e^{x_{upper}(c)} + e^{x_{lower}(c)}} \tag{8}$$
where $z_{upper}$ and $z_{lower}$ are the corresponding smoothing results of $x_{upper}$ and $x_{lower}$. The final output feature map of the selective kernel convolutional block, $X_{output}$, which can be regarded as the reweighted feature map, is obtained from the attention weights and the original $X_{upper}$ and $X_{lower}$:
$$X_{output} = z_{upper} \otimes X_{upper} + z_{lower} \otimes X_{lower} \tag{9}$$
where “$\otimes$” stands for channel-wise multiplication. Specifically, Equation (9) can be expressed as:
$$X_{output}(i, c) = z_{upper}(c)\,X_{upper}(i, c) + z_{lower}(c)\,X_{lower}(i, c) \tag{10}$$
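The data flow through one selective kernel convolutional block can be sketched in NumPy as follows. This is only a minimal illustration under stated assumptions, not the authors' Keras implementation: the helper names (`conv1d_same`, `selective_kernel_block`), the random weights, the channel count, the hidden-layer width and the omission of bias terms are all hypothetical choices made for the example. Only the two kernel sizes (9 for the upper branch, 16 for the lower branch) follow the paper.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def conv1d_same(x, w):
    """1-D convolution (cross-correlation, as in CNNs) with zero padding.

    x: (L, C_in) feature map, w: (K, C_in, C_out) kernel -> (L, C_out).
    """
    length, k = x.shape[0], w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    # windows[i] holds the K input rows seen by output position i.
    windows = np.stack([xp[j:j + length] for j in range(k)], axis=1)
    return np.tensordot(windows, w, axes=([1, 2], [0, 1]))


def selective_kernel_block(x, p):
    x_upper = relu(conv1d_same(x, p["k_upper"]))   # Eq. (1)
    x_lower = relu(conv1d_same(x, p["k_lower"]))   # Eq. (2)
    x_fused = x_upper + x_lower                    # Eq. (3)
    x_gap = x_fused.mean(axis=0)                   # Eq. (4), shape (C,)
    hidden = relu(p["w_shared"] @ x_gap)           # shared-weight hidden layer
    s_upper = p["w_upper"] @ hidden                # Eq. (5)
    s_lower = p["w_lower"] @ hidden                # Eq. (6)
    # Two-way SoftMax per channel, Eqs. (7)-(8); the max is subtracted
    # only for numerical stability and does not change the result.
    m = np.maximum(s_upper, s_lower)
    e_upper, e_lower = np.exp(s_upper - m), np.exp(s_lower - m)
    z_upper = e_upper / (e_upper + e_lower)
    z_lower = e_lower / (e_upper + e_lower)
    # Channel-wise reweighting, Eqs. (9)-(10), broadcast over the length axis.
    x_out = z_upper * x_upper + z_lower * x_lower
    return x_out, z_upper, z_lower


rng = np.random.default_rng(0)
length, c_in, c, hidden_units = 64, 1, 8, 4   # illustrative sizes only
params = {
    "k_upper": rng.standard_normal((9, c_in, c)) * 0.1,    # kernel size 9
    "k_lower": rng.standard_normal((16, c_in, c)) * 0.1,   # kernel size 16
    "w_shared": rng.standard_normal((hidden_units, c)),
    "w_upper": rng.standard_normal((c, hidden_units)),
    "w_lower": rng.standard_normal((c, hidden_units)),
}
x = rng.standard_normal((length, c_in))
x_out, z_upper, z_lower = selective_kernel_block(x, params)
print(x_out.shape, z_upper + z_lower)
```

Because the two attention vectors sum to one in every channel, each output channel is a convex combination of the corresponding channels of the two branches.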
2.2. Preprocessing of Data
In the task of intra-pulse modulation classification of radar emitter signals, the radars may have multiple wave modes in which the pulse widths range from a microsecond to hundreds of microseconds. However, the commonly used wave mode of radars is the short pulse width mode, and the pulses can be collected separately according to their pulse widths.
Therefore, assuming that the pulse widths of the radar emitter signals vary in a certain range, the length of the sampled radar signals is determined once the sampling frequency is given. Unlike IQ sampling, we sample the time-domain radar emitter signals in only one channel, in accordance with the Nyquist sampling theorem. Each pulse of a radar emitter signal in the analog domain is thus converted to a 1-D time-domain sequence in the digital domain.
A CNN requires a fixed-shape input because of its fully connected layer. As the pulse width varies in a certain range, we need to choose a suitable transformation to ensure that the input shape of each sample is the same. The easiest way to preprocess the data is to pad each sample with enough zeros that all preprocessed samples have the same length, but too many useless zeros may have a negative impact, and the classification performance based on these time-domain sequences is not good. An experiment with a time-domain baseline that illustrates this is described in Section 3.2.
In the 1-D SKCNN, we use the amplitude sequence of the signal in the frequency domain as the input. First, we find the upper limit of the pulse width and set a pulse-width value that covers all the samples. Then, based on the sampling frequency, the corresponding length is calculated. Next, we pad zeros at the end of the sampled time-domain sequence so that all padded samples have the same length. This process is:
$$x_{padded} = [x, 0, \ldots, 0] \tag{11}$$
where the length of $x_{padded}$ equals the product of the sampling frequency and the chosen pulse-width value, and $x$ refers to one of the 1-D sampled signals in the time domain.
Then, we use the FFT to transform $x_{padded}$ and calculate the modulus of the result:
$$x_{FFT} = \sqrt{\mathcal{F}(x_{padded}) \cdot \mathcal{F}^{*}(x_{padded})} \tag{12}$$
where $\mathcal{F}(\cdot)$ denotes the FFT and $\mathcal{F}^{*}(x_{padded})$ refers to the conjugate of $\mathcal{F}(x_{padded})$, with the product and the square root taken element by element. As a result, $x_{FFT}$ is a real sequence whose elements are all greater than or equal to zero.
To reduce the influence of different amplitudes on classification, data normalization is needed:
$$x_{input} = \frac{x_{FFT}}{\max(x_{FFT})} \tag{13}$$
where $x_{input}$ is the input data for the 1-D SKCNN and $\max(\cdot)$ returns the maximum value in the sequence. Therefore, the value of each element in $x_{input}$ ranges from 0 to 1.
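The three preprocessing steps (zero-padding, FFT modulus and amplitude normalization) can be sketched as below. The function name `preprocess_pulse` and the example pulse (a 2 μs single-carrier tone at a hypothetical 100 MHz) are assumptions for illustration. The sampling frequency of 1 GHz and the 50 μs padding length are the values used later in the experiments.

```python
import numpy as np


def preprocess_pulse(x, fs_hz, max_pulse_width_s):
    """Zero-pad, take the FFT modulus and normalize to [0, 1]."""
    n = int(round(fs_hz * max_pulse_width_s))   # common padded length
    x = np.asarray(x, dtype=float)
    if x.size > n:
        raise ValueError("pulse is longer than the chosen padded length")
    x_padded = np.zeros(n)
    x_padded[:x.size] = x                       # zeros appended at the end
    x_fft = np.abs(np.fft.fft(x_padded))        # modulus of the FFT
    peak = x_fft.max()
    if peak == 0.0:
        raise ValueError("all-zero pulse cannot be normalized")
    return x_fft / peak                         # amplitude normalization


fs = 1e9                                        # 1 GHz sampling frequency
t = np.arange(2000) / fs                        # a 2 us pulse (2000 samples)
pulse = np.cos(2 * np.pi * 100e6 * t)           # hypothetical 100 MHz carrier
x_input = preprocess_pulse(pulse, fs, 50e-6)
print(x_input.shape, x_input.min(), x_input.max())
```

Every pulse, whatever its width, is mapped to a sequence of the same length, which is what allows a network with a fully connected layer to be applied directly.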
3. Dataset and Experiments
In this section, a simulation dataset is used to train and test the proposed method and the baseline methods. All experiments were run on a computer equipped with an Intel 10900K CPU, 64 GB of RAM and an RTX 3090 GPU, using MATLAB 2021a, Keras and the Python programming language.
3.1. Dataset and Parameters Setting
Typically, the carrier frequency of a radar lies between 300 MHz and 300 GHz, and it is impractical to sample such high-frequency signals directly at the rate that the Nyquist sampling theorem requires. Receivers therefore usually have adaptive local oscillators that down-convert the received signals and output lower-frequency signals after a low-pass filter. These lower-frequency output signals are the ones that are sampled and analyzed. In addition, the relatively short pulse width mode is the typical, commonly used wave mode of radars.
For the above reasons, in this paper we use a simulation dataset to train and test the proposed method. The dataset contains eleven types of radar emitter signals whose pulse widths vary from 2 μs to 50 μs: single-carrier frequency (SCF) signals, linear frequency modulation (LFM) signals, sinusoidal frequency modulation (SFM) signals, binary frequency shift keying (BFSK) signals, quadrature frequency shift keying (QFSK) signals, even quadratic frequency modulation (EQFM) signals, dual frequency modulation (DLFM) signals, multiple linear frequency modulation (MLFM) signals, binary phase shift keying (BPSK) signals, Frank phase-coded (Frank) signals, and composite modulation (LFM-BPSK) signals. The sampling frequency is 1 GHz and the parameters of the signals are shown in Table 1.
The SNR is controlled as the ratio of the signal power to the noise power, defined as:
$$\mathrm{SNR} = 10\log_{10}\left(\frac{P_{signal}}{P_{noise}}\right) \tag{14}$$
where $P_{signal}$ is the power of the pure radar signal and $P_{noise}$ is the power of the noise. The signal power is calculated as:
$$P = \frac{1}{N}\sum_{n=1}^{N} x^{2}(n) \tag{15}$$
where $P$ is the power of the signal, $x(n)$ is the sampled sequence in the time domain and $N$ is its length. In the simulation, the noise is additive white Gaussian noise (AWGN) [22] and the SNR ranges from −14 dB to 0 dB in 2 dB increments. The sampled radar emitter signal with AWGN can be written as:
$$y(n) = s(n) + w(n) \tag{16}$$
where $s(n)$ is the pure sampled intra-pulse radar emitter signal sequence without noise, $w(n)$ is the sampled AWGN, and $y(n)$ is the sampled radar emitter signal with AWGN.
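A noisy pulse at a prescribed SNR could be generated along the following lines. This is a sketch that assumes real-valued samples and zero-mean Gaussian noise whose variance is set from the measured signal power. The function names and the example tone are hypothetical, and the paper's own simulation was carried out in MATLAB.

```python
import numpy as np


def signal_power(x):
    """Mean power of a real-valued sampled sequence."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))


def add_awgn(s, snr_db, rng):
    """Return (noisy signal, noise) so that P_signal / P_noise matches snr_db."""
    s = np.asarray(s, dtype=float)
    p_noise = signal_power(s) / (10.0 ** (snr_db / 10.0))
    w = rng.normal(0.0, np.sqrt(p_noise), size=s.shape)   # zero-mean AWGN
    return s + w, w


rng = np.random.default_rng(0)
s = np.cos(2 * np.pi * 0.05 * np.arange(20000))   # illustrative clean tone
y, w = add_awgn(s, -6.0, rng)
measured_snr_db = 10.0 * np.log10(signal_power(s) / signal_power(w))
print(round(measured_snr_db, 2))
```

The measured SNR of a single realization differs slightly from the target because the noise power is itself estimated from a finite number of samples.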
Figure 3 shows the original time-domain waveforms of samples of the eleven intra-pulse modulations when the SNR is 0 dB.
At each value of SNR, there are 1800 samples of each intra-pulse modulation, of which 800 are used for training, 200 for validation and 800 for testing. That is, there are 70,400 samples in the training dataset, 17,600 samples in the validation dataset and 70,400 samples in the testing dataset.
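The dataset sizes follow directly from the number of modulation types, the number of SNR values and the per-class split, as this short check shows (the variable names are illustrative):

```python
n_classes = 11                                  # intra-pulse modulation types
snr_db = list(range(-14, 1, 2))                 # -14, -12, ..., 0 dB
per_class_per_snr = {"training": 800, "validation": 200, "testing": 800}

# Total samples per subset = samples per class per SNR x classes x SNR values.
totals = {
    name: count * n_classes * len(snr_db)
    for name, count in per_class_per_snr.items()
}
print(len(snr_db), totals)
```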
The testing dataset is much larger than the validation dataset, and as large as the training dataset, because in real applications the number of samples to be tested can far exceed the number in the validation dataset. As parameters such as carrier frequency, pulse width and bandwidth vary over a wide range, a testing dataset with a large number of samples covers as many situations as possible. Besides, the average classification accuracy of the well-trained model in [23] on a testing dataset containing 385,000 samples is almost 4% lower than that on a validation dataset containing 30,800 samples. Therefore, to increase the reliability of the classification and evaluate the real performance of the methods, we use a large testing dataset. The influence of the amount of training data is discussed in Section 3.3.4.
Given the sampling frequency and the maximum pulse width, we set 50 μs as the pulse-width value used for padding. Therefore, the length of $x_{padded}$ in the experiment is 50,000.
Figure 4 shows the frequency-domain amplitude spectra of the same samples as in Figure 3 after the data preprocessing stage.
3.2. Baseline Methods
To provide evidence for not choosing the time-domain sequence as the input, we run an experiment with the same 1-D SKCNN structure, which is named 1-D SKCNN-time. For this baseline method, the data preprocessing consists of the same zero-padding (see Equation (11)) and amplitude normalization. The amplitude normalization for 1-D SKCNN-time is:
$$x_{time} = \frac{x_{padded}}{\max\left(\sqrt{x_{padded} \cdot x_{padded}^{*}}\right)} \tag{17}$$
where $x_{time}$ is the input sequence for 1-D SKCNN-time and $x_{padded}^{*}$ refers to the conjugate of $x_{padded}$.
Besides, to show the effectiveness of the selective kernel and attention mechanism, we construct three further models, all of which have the same fully connected layers. The first two, named CNN-kernel_size16 and CNN-kernel_size9, are single-branch models with fixed kernel sizes (16 and 9, respectively); each includes four blocks in which a convolutional layer, a max-pooling layer and a batch-normalization layer are connected in sequence. The third model, named CNN-nonAP, is obtained from the 1-D SKCNN by deleting the attention part.
In addition, we employ some representative methods as baselines, including CNN-Qu [12], CNN-Kong [13], CNN-Zhang [14] and GoogLeNet [17]. These methods are based on time-frequency transformation and have been shown to achieve good accuracy on intra-pulse modulation classification.
3.3. Experiments on 1-D SKCNN
3.3.1. Experimental Settings of 1-D SKCNN
The two kernel sizes in the selective kernel block are 9 and 16, respectively. The pooling size and stride in each max-pooling layer are both set to 7. The fully connected unit contains one hidden layer with 512 neurons and the “ReLU” activation function. The activation function for the output layer is “SoftMax”.
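To give a feel for how quickly a pooling size and stride of 7 shrink the sequence, the sketch below tracks the feature-map length through the four main blocks for the 50,000-point input used in the experiments. It assumes that the padded convolutions preserve the length and that each max-pooling layer discards an incomplete trailing window (the default behaviour of unpadded pooling in Keras). Neither detail is stated explicitly in the paper, and the function name is hypothetical.

```python
def pooled_lengths(input_length, n_blocks=4, pool_size=7):
    """Sequence length after each block, assuming length-preserving
    convolutions and unpadded pooling with stride equal to pool_size."""
    lengths = [input_length]
    for _ in range(n_blocks):
        lengths.append(lengths[-1] // pool_size)
    return lengths


print(pooled_lengths(50_000))
```

Under these assumptions the sequence is only a few tens of points long when it reaches the fully connected unit, which keeps the number of weights in that unit modest despite the long input.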
When training the 1-D SKCNN, the cross-entropy function is selected as the loss function. The optimization algorithm for the proposed 1-D SKCNN is adaptive moment estimation (ADAM) [24]. The batch size is 64 and the model is trained for 20 epochs, with a learning rate of 0.001 in the first 15 epochs and 0.0001 in the last five epochs. The weights used for testing are those saved at the epoch where the accuracy on the validation dataset is highest.
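The step learning-rate schedule and the rule for choosing which weights to keep can be written down compactly. The sketch below is framework-independent; the function names and the validation accuracies are made up for illustration and are not results from the paper.

```python
def learning_rate(epoch):
    """Learning rate for a 0-based epoch index (20 epochs in total)."""
    return 1e-3 if epoch < 15 else 1e-4


def best_epoch(val_accuracy):
    """Index of the epoch whose weights are kept (highest validation accuracy)."""
    return max(range(len(val_accuracy)), key=val_accuracy.__getitem__)


schedule = [learning_rate(epoch) for epoch in range(20)]
val_accuracy = [0.62, 0.81, 0.90, 0.93, 0.92]   # made-up values for illustration
print(schedule.count(1e-3), schedule.count(1e-4), best_epoch(val_accuracy))
```

In Keras, a function of this kind is what the `LearningRateScheduler` callback accepts, and `ModelCheckpoint` with `save_best_only=True` implements the same keep-the-best rule; whether the authors used these particular callbacks is not stated.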
3.3.2. Experimental Results of 1-D SKCNN
The proposed 1-D SKCNN model was trained on the preprocessed data described in Section 3.1. The average accuracy during training is shown in Figure 5.
Figure 5 shows that, after several epochs of training, the accuracy of the model on the validation dataset became stable, which indicates that the model had converged. Next, we test the classification performance of the proposed model with the weights for which the accuracy on the validation dataset is highest.
The classification performance under different SNR of the proposed 1-D SKCNN has been tested.
Table 2 gives the classification accuracy of eleven intra-pulse modulations based on the 1-D SKCNN. When SNR is greater than or equal to −10 dB, the average classification accuracy is above 97%, and the accuracy increases as the value of SNR rises. When SNR is greater than or equal to −6 dB, the classification accuracy of each intra-pulse modulation is over 99%.
To analyze the classification results in detail, the confusion matrix is given in Figure 6. Together with the average accuracy in Table 2, it shows that the main errors come from LFM signals and LFM-BPSK signals being mistaken for each other. Besides, some DLFM signals are classified as MLFM signals, and some MLFM signals are classified as LFM signals. For the other types of intra-pulse modulation signals, the 1-D SKCNN classifies them accurately even when the SNR is extremely low.
3.3.3. Learned Features
We investigate the features extracted by the proposed 1-D SKCNN. Some features obtained after the first, second and third main blocks for one SFM sample are shown in Figure 7a–c, respectively.
Figure 7 shows that, as the depth of the layer increases, the extracted features become sparser and more abstract, which indicates some intrinsic patterns in that SFM signal.
Besides, to visualize the attention part in the selective kernel convolutional blocks, we analyze, for the same SFM sample, the values of $z_{upper}$, the attention weights for kernel size 9, and $z_{lower}$, the attention weights for kernel size 16. The values of the attention part in the first, second and third selective kernel convolutional blocks are shown in Figure 8a–c, respectively. They indicate that the features extracted by the convolutional layers with different kernel sizes are weighted by the soft-attention part. In other words, each output channel of the selective kernel convolutional block is fused from its two original channels, which receive different weights from the attention part.
3.3.4. The Influence of Training Data Size
We investigated the performance of the proposed method with different training data sizes. The validation and testing datasets are the same as before, and 0%, 5%, 10%, 25%, 50%, 75% and 100% of the whole training dataset are used to train the 1-D SKCNN. The weights were saved when the accuracy on the validation dataset was highest and were then used for testing.
Figure 9 shows the average accuracy of the 1-D SKCNN on the testing dataset for each of these training data sizes.
As Figure 9 shows, the average accuracy of the 1-D SKCNN increases roughly logarithmically with the training data size, which agrees with the conclusions in [25]. It can also be concluded that the original training dataset of 70,400 samples is large enough to train the proposed 1-D SKCNN.
4. Comparisons with the Baselines and Discussion
To show the superiority of the proposed method, in this section we compare it with the baseline methods. For the task of intra-pulse modulation classification of radar emitter signals, two aspects of a method matter most: its time and storage usage, and its classification performance. The comparisons are therefore divided into two parts: (1) comparisons among the methods in time and storage space usage, and (2) comparisons among the methods in classification performance.
4.1. Comparisons with the Baselines in the Time and Storage Space Usage
The number of parameters of a model is a typical measure of its storage space usage. We therefore list the number of parameters of each CNN model in Table 3. As the table shows, the 1-D SKCNN is a lightweight design with fewer parameters than the 2-D CNN-based models.
Then, we evaluate the time usage of the methods. The time usage can be divided into two parts: (1) the data preprocessing stage and (2) the model training stage. The data preprocessing for all methods was done in MATLAB 2021a. We randomly selected 500 samples from the dataset and ran the data preprocessing of every method on them; the time usage is shown in Table 4.
The 2-D CNN-based methods, which require a time-frequency transformation, clearly need much more time to prepare the data for training. Compared with the baselines, the FFT-based data preprocessing is also easier to implement.
Next, we evaluate the time usage in the model training stage. Table 5 gives the time usage per training epoch, which includes the validation part, and the number of epochs used for training. It can be seen that the 1-D CNN-based methods require more time than the 2-D CNN-based methods. The main reason is that the input of a 1-D CNN model is a vector with the shape of 1 × 50,000 (50,000 elements), whereas the input of the 2-D CNN models that require less training time is a matrix with the shape of 64 × 64 (4096 elements) or 128 × 128 (16,384 elements). As a result, the 1-D CNN-based methods involve more multiplication and addition operations.
4.2. Comparisons with the Baselines in the Classification Performance
In this section, we evaluate the classification performance of the methods with different evaluation metrics, including average accuracy (AA), kappa coefficient (KC), recall, precision and F1-score. Table 6 gives the classification accuracy of the different methods on the testing dataset, and the KC, recall, precision and F1-score are given in Table 7.
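For reference, the metrics named above can all be computed from a confusion matrix, as in the sketch below. It assumes that rows index the true class and columns the predicted class, and that recall, precision and F1-score are macro-averaged over classes; the paper does not state its averaging convention. The function name and the 3 × 3 matrix are hypothetical illustration data, not results from the paper.

```python
import numpy as np


def classification_metrics(cm):
    """Accuracy, Cohen's kappa and macro recall/precision/F1 from a
    confusion matrix (rows: true class, columns: predicted class)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    true_counts = cm.sum(axis=1)
    pred_counts = cm.sum(axis=0)
    accuracy = tp.sum() / total
    recall = tp / true_counts
    precision = tp / pred_counts            # assumes every class is predicted
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, used by the kappa coefficient.
    expected = float(true_counts @ pred_counts) / total ** 2
    kappa = (accuracy - expected) / (1.0 - expected)
    return {
        "accuracy": float(accuracy),
        "kappa": float(kappa),
        "recall": float(recall.mean()),
        "precision": float(precision.mean()),
        "f1": float(f1.mean()),
    }


toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 10]]                        # made-up counts for illustration
metrics = classification_metrics(toy_cm)
print(metrics)
```

With a balanced testing set such as the one used here, the overall accuracy and the macro-averaged recall coincide.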
The comparison between 1-D SKCNN-time and 1-D SKCNN shows that, although the model structure is the same, using the frequency-domain sequence instead of the time-domain sequence as the input yields a much better classification result. Apart from 1-D SKCNN-time, the 1-D CNN-based methods outperform the 2-D TFI-based CNN methods on the given metrics. Although GoogLeNet performs best among the 2-D CNN methods and its classification accuracy approaches 100% when the SNR is high, its accuracy falls faster as the SNR decreases than that of the 1-D CNN-based methods other than 1-D SKCNN-time. When the SNR is −10 dB, all of the 1-D CNN models except 1-D SKCNN-time maintain a classification accuracy of over 90%.
Besides, the comparisons among 1-D SKCNN, CNN-kernel_size16, CNN-kernel_size9 and CNN-nonAP show that 1-D SKCNN performs best not only in overall average accuracy but also at every SNR. Based on the analysis of time and storage usage in Section 4.1, the selective kernel convolutional blocks, which contain both selective kernel convolutional layers and a soft-attention part, cost few additional computational resources and provide a better classification result.