Radar Signal Intrapulse Modulation Recognition Based on a Denoising-Guided Disentangled Network

Abstract: Accurate recognition of the radar modulation mode helps to better estimate radar echo parameters and thereby gain an advantage in radar electronic warfare (EW). However, in low signal-to-noise ratio (SNR) environments, recent deep-learning-based radar signal recognition methods often perform poorly because of unsuitable denoising preprocessing. In this paper, a denoising-guided disentangled network based on an inception structure is proposed to simultaneously complete the denoising and recognition of radar signals in an end-to-end manner. The pure radar signal representation (PSR) is disentangled from the noise signal representation (NSR) through a feature disentangler and used to learn a radar signal modulation recognizer under low-SNR environments. A signal noise mutual information loss is proposed to enlarge the gap between the PSR and the NSR. Experimental results demonstrate that our method obtains a recognition accuracy of 98.75% at −8 dB SNR and 89.25% at −10 dB SNR for 12 modulation formats.


Introduction
Accurate identification of the radar signal intrapulse modulation helps to estimate the function of the radar transmitter and improve the accuracy of radar signal parameter estimation, which is critical in electronic intelligence systems, modern electronic support measure systems, and radar early warning receivers [1][2][3][4][5]. Nevertheless, the widely used pulse compression technique, although it improves the range resolution of pulse radar, greatly reduces the power spectral density of the radar signal. Therefore, under normal radar operating environments, the signal-to-noise ratio (SNR) of the received radar signal is significantly reduced, which seriously affects the recognition accuracy of the radar signal modulation type [6,7]. How to accurately identify the modulation type of radar signals in a low-SNR environment remains an urgent open problem [6].
Traditional intrapulse modulation recognition (IPMR) methods consist of a feature extraction stage and a classifier [8]. The accuracy of classic recognition techniques in low-SNR environments depends mainly on the feature extraction algorithm, such as high-order cumulants (HOCs), cyclostationary spectrum, instantaneous frequency features, wavelet transform features, and Wigner-Ville distribution (WVD) features [9,10]. Handcrafted feature extraction is a highly skilled task that requires researchers with extensive experience, and these approaches are difficult to generalize to new modulation formats. Recently, the works in [4,6,10,11] have proposed automatically learning discriminative feature representations with deep convolutional neural networks (DCNNs) to identify the radar signal modulation format. However, when the SNR is low, heavy noise degrades the feature learning process of the DCNN, so high IPMR performance cannot be guaranteed. Therefore, antinoise embedding is crucial for deep radar IPMR models to automatically extract discriminative feature representations.
Some studies have proposed denoising low-SNR radar signals before identifying modulation categories through deep convolutional networks [3,12-14]. However, these models use denoising as a preprocessing step independent of the modulation recognition step, which is unsuitable for radar signal identification, since the useful signal is inevitably suppressed by the noise filters. The residual noise of noncooperative radar signals adversely influences the feature learning of the DCNN when the SNR is below −6 dB [14].
This paper proposes a denoising-guided disentangled network (DGDNet) to recognize the intrapulse modulation mode of radar signals at low SNR. We convert the modulated radar signal to a time-frequency image (TFI) by using the Cohen class time-frequency distribution (CTFD). At low SNR, TFIs generated directly from the CTFD usually contain strong noise. We then use the DGDNet to disentangle the pure radar signal representations (PSR) from the noise signal representations (NSR). The signal noise mutual information (SNMI) loss is proposed to broaden the gap between the PSR and NSR. The PSR is used to perform radar modulation mode recognition and to reduce the effect of noise on classification performance. The DGDNet completes denoising and recognition of noisy TFIs simultaneously in an end-to-end manner, and automatically learns a discriminative radar signal feature representation in a low-SNR environment.
The contributions of this paper are the following: (1) We propose the DGDNet to simultaneously complete the denoising and recognition of noisy TFIs in an end-to-end manner; (2) We propose a feature disentangler to extract PSR from NSR and design the SNMI loss to obtain discriminative radar signal feature representation; (3) The experimental results demonstrate that the proposed method can obtain a recognition accuracy of 98.75% in the −8 dB SNR and 89.25% in the −10 dB environment of 12 modulation formats.

Related Work
Many studies have been conducted on the IPMR of radar signals in recent years. Our work focuses on identifying signal modulations in low-SNR environments.

Conventional IPMR under Low SNR
Radar intrapulse modulated signals are difficult to detect and identify due to their extremely low peak power, high duty cycle, and broad spectrum. Many studies on intrapulse feature extraction use signal statistics, such as high-order cumulants (HOCs), spectrum, and time-frequency features, as discriminant features to recognize the format of radar signals [15,16]. In [15][16][17], composite cumulants such as phase jitter, phase offset, and frequency offset are used as extracted features to identify radar intrapulse modulations due to their robustness to noise and model mismatch. Ravi K. and Lunden J. used spectral analysis and instantaneous time-domain features to classify digital modulation signals [4,18]. In [11], features extracted from the wavelet transform are used for the identification of multiphase-shift keying and multifrequency-shift keying. In [19,20], an autocorrelation estimator is employed to analyze radar signal features. In [21], radar signal features are analyzed by the Wigner-Hough-Radon transform technique. Chen et al. [22] proposed a feature selection algorithm based on the mutual information between the class and the feature vector.
Traditional radar modulation recognition methods can correctly recognize radar modulation formats in normal SNR environments. However, the difficulty of extracting the characteristic parameters within the radar signal pulse increases as these parameters become more diverse and fragile. Traditional recognition methods may suffer from low identification accuracy and high computational complexity under ultralow-SNR conditions.

Deep-Learning-Based IPMR in Low-SNR Conditions
Unlike handcrafted feature extraction methods, deep-learning-based models have the capability to automatically capture discriminative feature representations to identify different radar modulation formats.
Artificial neural networks were first used as a method of modulation recognition in [23]. Most deep-learning-based IPMR approaches consist of two steps, denoising and modulation classification, to improve system performance in low-SNR environments. The method proposed in [24] designs an eight-layer CNN classifier to identify TFIs, which are preprocessed by a series of 2D Wiener filters, bilinear interpolation, and the Otsu method to remove background noise. Qu [14] proposed the convolutional denoising autoencoder (CDAE) to effectively reduce the interference of low SNR on IPMR and improve the classification performance. In [10], a deep autoencoder network for modulation classification was proposed. The network is trained with a non-negative constraint algorithm that constrains negative weights and infers more meaningful hidden structures. In [25], three network compression and acceleration strategies are proposed to reduce the CNN's dependence on computational and memory resources, so that it can be applied to radar platforms with lower computational power while maintaining good accuracy. In [26], a neural network applied in the postprocessing of radar information is introduced.
Some scholars employ multi-task learning methods for radar modulation pattern recognition. Zhu et al. propose a deep multilabel-based AMC framework to classify the compound radar signals [3]. Wang et al. proposed a multitask learning (MTL)-based method [6] for generalized modulation classification in different noise scenarios.
All the above methods improve the recognition accuracy in low-SNR environments. However, disentangling the pure signal from the noise in deep feature space is a more straightforward solution when the background noise is extremely strong. The denoising and classification tasks can be synchronously completed in an end-to-end way through disentangled learning, and the two tasks can even supervise and promote each other.

Disentangled Learning
As a method of feature decomposition, disentangled learning aims to correctly reveal a set of independent factors that produce the current observation [23], which has been demonstrated as effective in tasks of image translation and image classification [27]. Interestingly, in [28], Han et al. proposed a disentangled-learning-based network for exploring disentangled general representations in biosignal processing.
In our method, we propose a disentangled framework that not only reduces noise but also disentangles a low-SNR radar representation into PSRs and NSRs. This approach has great potential for bridging the gap between radar signal denoising and classification through the DGDNet. It enhances the useful signal by correctly uncovering two independent feature representations in the modulated radar signals.

Signal Model
In order to emphasize the disentangling and denoising effects of the DGDNet, we suppose that the modulation parameters of the target signal are time-invariant. Additionally, we suppose that only one signal of unknown modulation format enters the network at a time. The model of the target signal is

$$r(t) = s(t) + n(t),$$

where $r(t)$ is the received signal containing noise, $s(t)$ is the useful signal, and $n(t)$ is the noise. The modulated signal $s(t)$ can be represented as

$$s(t) = A\,\mathrm{rect}\!\left(\frac{t}{T}\right) \exp\left\{ j\left[ 2\pi f_c t + \varphi(t) + \varphi_0 \right] \right\},$$

where $A$ is the signal amplitude, $T$ is the pulse width, and $t$ indicates the time. $f_c$ and $\varphi_0$ are the carrier frequency and the initial phase, respectively. $\varphi(t)$ is the phase function that determines the modulation of the signal. $\mathrm{rect}(\cdot)$ is the rectangular function

$$\mathrm{rect}\!\left(\frac{t}{T}\right) = \begin{cases} 1, & |t| \le T/2 \\ 0, & \text{otherwise,} \end{cases}$$

and $|t|$ indicates the absolute value of $t$.
We focus on the features of the modulation phase $\varphi(t)$. $\varphi(t)$ is a quadratic function of time for a linear frequency modulation (LFM) signal, whereas it is a sinusoidal function of time for a sinusoidal frequency modulation (SFM) signal.
In this paper, $n(t)$ is modeled as additive white Gaussian noise. Assuming zero mean and variance $\sigma^2$, the probability density function of $n(t)$ can be expressed as

$$p(n) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{n^2}{2\sigma^2} \right).$$

The SNR is defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_s}{\sigma_n},$$

where $\sigma_s$ is the signal power and $\sigma_n$ is the noise power.
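The signal model can be illustrated with a minimal sketch that builds a complex LFM pulse and adds white Gaussian noise at a chosen SNR. The signal length (1024) and sampling rate (200 MHz) follow the dataset description later in the paper; the start frequency, bandwidth, the use of complex noise, and the function names `lfm_pulse` and `add_awgn` are assumptions made only for this illustration.

```python
import numpy as np

def lfm_pulse(n_samples=1024, fs=200e6, f_start=10e6, bandwidth=40e6, amplitude=1.0):
    """Complex LFM pulse: the phase is a quadratic function of time.

    f_start and bandwidth are hypothetical values chosen for illustration.
    """
    t = np.arange(n_samples) / fs
    duration = n_samples / fs
    chirp_rate = bandwidth / duration
    phase = 2 * np.pi * (f_start * t + 0.5 * chirp_rate * t**2)
    return amplitude * np.exp(1j * phase)

def add_awgn(signal, snr_db, rng):
    """Return (noisy signal, noise) with SNR = 10*log10(P_signal / P_noise)."""
    signal_power = np.mean(np.abs(signal) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    # Complex noise (an assumption): split the power equally between I and Q.
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(signal.shape) + 1j * rng.standard_normal(signal.shape)
    )
    return signal + noise, noise

rng = np.random.default_rng(0)
s = lfm_pulse()
r, n = add_awgn(s, snr_db=-8.0, rng=rng)

# Empirical SNR of this realization; it should be close to the -8 dB target.
measured_snr_db = 10 * np.log10(np.mean(np.abs(s) ** 2) / np.mean(np.abs(n) ** 2))
```

Because the noise power is derived from the measured signal power, the same helper can be reused for any modulation format without knowing its amplitude in advance.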

System Overview
The system can be divided into two main parts, the radar signal transform module and the denoising-guided disentangled network, as shown in Figure 1. The radar signal transform module uses the CTFD to transform the modulated radar signal into TFIs. Compared with other widely used time-frequency analysis methods, CTFD has desirable characteristics such as higher resolution and non-negativity. In addition, CTFD can eliminate the cross terms of the radar signal through a reasonable design of the kernel function. However, the TFI obtained through the CTFD contains a high noise level at very low SNR, which seriously affects the recognition accuracy of IPMR.
We propose the DGDNet to strengthen the representation of the modulated radar signal and improve the IPMR recognition accuracy. The PSRs and the NSRs are disentangled to reduce the influence of noise on the identification results. In the meantime, the SNMI loss is proposed to reduce the correlation between the NSR and the PSR and thereby improve the purity of the PSR. The recognition module is used to obtain discriminative radar features from the PSR and perform the radar modulation recognition. The reconstruction loss, cross-entropy loss, and SNMI loss are jointly used in training the DGDNet to improve the IPMR performance for radar signals at low SNR.

Radar Signal Transform Module
To extract TFIs from noisy radar signals in low-SNR environments, CTFD is a more suitable time-frequency transform than alternatives such as the short-time Fourier transform (STFT) and the WVD. STFT cuts the signal into short time slices of a fixed length and is therefore unsuitable for unknown signals. WVD has cross terms that seriously affect the TFIs of nonlinear FM signals. CTFD can obtain the desired properties, such as higher resolution, non-negativity, and removal of cross terms, by smoothing the WVD through time and frequency shifting with a kernel function [28]. The CTFD is defined as

$$\mathrm{CTFD}(t, \omega) = \iint AF(\tau, \upsilon)\, \varphi(\tau, \upsilon)\, e^{-j(\upsilon t + \omega \tau)}\, d\tau\, d\upsilon,$$

$$AF(\tau, \upsilon) = \int r\!\left(u + \frac{\tau}{2}\right) r^{*}\!\left(u - \frac{\tau}{2}\right) e^{j \upsilon u}\, du,$$

where $r(u)$ and $r^{*}(u)$ are the received signal and its conjugate, respectively. $AF(\tau, \upsilon)$ is the ambiguity function, $\tau$ is the time delay, and $\upsilon$ is the frequency shift. $\mathrm{CTFD}(t, \omega)$ is the time-frequency output, and $\varphi(\tau, \upsilon)$ is the kernel function. $\mathrm{CTFD}(t, \omega)$ is the 2D Fourier transform of the kernel-weighted ambiguity function. $AF(\tau, \upsilon)$ can be divided into auto-terms and cross terms. The auto-terms are centered at $(\tau, \upsilon) = (0, 0)$, and the center position of a cross term directly reflects how strongly the signal components oscillate against each other: the farther the center is from the origin of the plane, the stronger the oscillation.
Modulated signals such as nonlinear FM signals, and especially PSK and FSK signals, have cross terms that severely degrade the accuracy of modulation recognition. These cross terms are susceptible to noise, which makes them unstable. When using the TFIs to identify the signals, we aim to minimize the cross terms of the signal while preserving its hidden characteristics.
The kernel function $\varphi(\tau, \upsilon)$ is in effect a flat 2D low-pass filter that filters the ambiguity function to suppress the cross terms; its parameters $\gamma$ and $\xi$ are used to shape and resize it. In Figure 2, Figure 2a presents the contour diagram of the kernel function at $\gamma = 0.0005$ and $\xi = 0.025$, and Figure 2b is the local amplification diagram (with labels) of the kernel function. The kernel function is elliptical and distributed along the axes. This shape suits most frequency-modulated signals, such as SFM and quadratic FM (EQFM) signals, as well as phase-modulated PSK signals, whose cross terms in the ambiguity domain lie on or around the axes. This property gives the CTFD with the new kernel function better noise resistance.
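A minimal discrete sketch of a kernel-smoothed Cohen class distribution follows. It computes the instantaneous autocorrelation, moves to the ambiguity domain, applies a low-pass kernel, and transforms back to the time-frequency plane. The Gaussian kernel shape exp(−(γτ² + ξυ²)) is an assumption made here for illustration, as is the mapping of γ and ξ onto integer lag and Doppler indices; the function name `cohen_tfd` is hypothetical.

```python
import numpy as np

def cohen_tfd(x, gamma=0.0005, xi=0.025):
    """Kernel-smoothed time-frequency distribution (illustrative sketch).

    Rows are time samples, columns are frequency bins. Because the lag
    product uses x[t+m] * conj(x[t-m]), bin k corresponds to a normalized
    frequency of k / (2 * n) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    lags = np.arange(-(n // 2), n // 2)
    t = np.arange(n)

    # Instantaneous autocorrelation R[t, m] = x[t+m] * conj(x[t-m]), zero outside the record.
    R = np.zeros((n, lags.size), dtype=complex)
    for j, m in enumerate(lags):
        valid = (t + m >= 0) & (t + m < n) & (t - m >= 0) & (t - m < n)
        R[valid, j] = x[t[valid] + m] * np.conj(x[t[valid] - m])

    # Ambiguity function: Fourier transform over time for each lag.
    ambiguity = np.fft.fft(R, axis=0)

    # Assumed Gaussian low-pass kernel over (Doppler, lag); not the paper's stated formula.
    doppler = np.fft.fftfreq(n) * n
    kernel = np.exp(-(gamma * lags[None, :] ** 2 + xi * doppler[:, None] ** 2))

    # Back to (time, lag), then Fourier transform over lag to get (time, frequency).
    smoothed = np.fft.ifft(ambiguity * kernel, axis=0)
    tfd = np.fft.fft(np.fft.ifftshift(smoothed, axes=1), axis=1)
    return tfd.real

n = 256
tone = np.exp(2j * np.pi * 0.125 * np.arange(n))  # constant-frequency test signal
tfd = cohen_tfd(tone)
peak_bin = int(np.argmax(tfd[n // 2]))  # frequency bin of the ridge at mid-pulse
```

Setting the kernel to all ones would reduce this to a plain (zero-padded) Wigner-Ville distribution, which makes the effect of the smoothing easy to compare on multi-component signals.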

Structure of The Network
The DGDNet is divided into three parts, namely, the global feature extractor (backbone), the feature disentangler, and the modulation mode recognizer. We directly use an Inception_v4-like backbone as the global feature extractor to capture the integrated features of the TFIs, including both the useful signal and the noise. The feature disentangler includes a pure radar feature extractor and a noise feature extractor that are used to obtain the PSR and NSR, respectively. A cosine distance loss between the ideal images and the reconstructed images is proposed to supervise the extraction of the PSR and NSR. The SNMI loss between the PSR and NSR is proposed to increase the independence between the pure radar feature extraction process and the noise feature extraction process. Discriminative features are automatically extracted by the modulation mode recognizer to perform the radar signal modulation format classification, as shown in Figure 3.

Global Feature Extractor
The global feature extractor is a stem module (Inception_v4-like backbone) that aims to obtain the deep features of the input TFIs. The network consists of roughly nine layers, including three filter concatenation layers. Correspondingly, the two small branches in front of each concat layer are used to automatically extract discriminative features at different scales. The initial size of the TFIs is 1024 × 1024. The input TFIs become a tensor with a size of 299 × 299 through image resizing and normalization. The output simultaneously contains the useful information and the noise. Its structure is shown in Figure 4.
The module applies convolution kernels of different sizes to extract image features and achieve feature fusion. The 1 × 1 convolution layer is used to adjust the dimensionality of the feature maps. After the global feature extractor, the network output is a tensor of 35 × 35 × 384, which is the input of the pure radar feature extractor and the noise feature extractor.
Figure 4. Structure of the stem module.

Feature Disentangler
The output of the global feature extractor is a set of feature maps containing both the PSR and the NSR. The feature disentangler is devised to progressively disentangle the PSR from the NSR by using the pure radar feature extractor and the noise feature extractor.
• Pure Radar Feature Extractor
The pure radar feature extractor includes four Inception_A modules, one Reduction_A module, seven Inception_B modules, and one deconvolution module. The Inception modules are used to extract the useful signal features hidden in the TFIs. The reduction layer is applied to reduce the image size. The output of the pure radar feature extractor is the PSR, which can be used to classify different modulation formats. The PSR can also be used to reconstruct the denoised TFIs through the deconvolution module. This motivates the radar signal reconstruction loss $L_{PSR\_cos}$, defined as the cosine distance between the ideal denoising image $y_{idl}$ and the pure image $x_{recn}$ reconstructed by the deconvolution module. Unfortunately, ideal denoising images cannot be obtained in real scenes (confrontation scenes, blind reception scenes). Therefore, we directly use the TFIs transformed from the radar signal at an SNR of 16 dB as the ideal denoising images. The details of the modules are shown in Figure 5. Figure 5a presents Inception_A, Figure 5b is Inception_B, Figure 5c is Reduction_A (k = 192, l = 224, m = 256, n = 384), and Figure 5d presents the deconvolution modules (Deconv-modules).
• Noise Feature Extractor
Similar to the pure radar feature extractor, the noise feature extractor is based on the Inception structure. It contains one Inception_A module, one Reduction_A module, two Inception_B modules, and one deconvolution module. The output of the noise feature extractor is the NSR, which can be used to reconstruct the noise images through the deconvolution module.
As in the pure radar feature extraction process, the TFIs transformed from the radar signal at an SNR of 16 dB are used as the ideal denoising images. The ideal noise images can therefore be calculated as the difference between the input noisy TFIs and the ideal denoising images, as shown in Figure 2. A cosine distance loss $L_{NSR\_cos}$ is designed to measure the gap between the reconstructed noise image $p_{recn}$ and the ideal noise image $x_{in} - y_{idl}$, where $x_{in}$ is the input image, $p_{recn}$ is the noise image reconstructed by the noise feature extractor, and $y_{idl}$ is the ideal pure image.
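The two reconstruction losses can be sketched as follows. This is an illustrative version that assumes the common definition of cosine distance as one minus the cosine similarity of the flattened images; the exact normalization used by the authors is not recoverable from the text, and the helper names are hypothetical.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity of two images, flattened to vectors (assumed definition)."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def reconstruction_losses(x_in, y_idl, x_recn, p_recn):
    """Return (PSR loss, NSR loss) for one image.

    x_in   : noisy input TFI
    y_idl  : ideal denoising image (TFI at 16 dB SNR)
    x_recn : pure image reconstructed from the PSR
    p_recn : noise image reconstructed from the NSR
    """
    l_psr_cos = cosine_distance(x_recn, y_idl)
    l_nsr_cos = cosine_distance(p_recn, x_in - y_idl)  # target is the ideal noise image
    return l_psr_cos, l_nsr_cos

rng = np.random.default_rng(0)
y_idl = rng.random((8, 8))                    # stand-in for a clean TFI
noise = 0.5 * rng.standard_normal((8, 8))     # stand-in for the noise image
x_in = y_idl + noise

# With perfect reconstructions both losses vanish (up to rounding).
l_psr, l_nsr = reconstruction_losses(x_in, y_idl, x_recn=y_idl, p_recn=noise)
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Because cosine distance ignores overall scale, a reconstruction that is correct up to a brightness factor still yields a loss near zero, which may be one reason to prefer it over a pixelwise error for normalized TFIs.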

• SNMI Loss
To improve the independence between the PSR and the NSR, we propose the SNMI loss $I_{SN}$ to reduce the correlation between the pure radar feature extraction process and the noise feature extraction process. $I_{SN}$ is defined as

$$I_{SN}(X_{PSR}, X_{NSR}) = \int_{X_{PSR} \times X_{NSR}} \log\left( \frac{dP_{p\_n}}{dP_{psr}\, dP_{nsr}} \right) dP_{p\_n}, \qquad P_{psr} = \int_{X_{NSR}} dP_{p\_n}, \qquad P_{nsr} = \int_{X_{PSR}} dP_{p\_n},$$

where $X_{PSR}$ and $X_{NSR}$ denote the PSR output by the pure radar feature extractor and the NSR output by the noise feature extractor. $P_{p\_n}$ indicates the joint probability distribution of $X_{PSR}$ and $X_{NSR}$. $P_{psr}$ and $P_{nsr}$ are the marginal distributions. Minimizing $I_{SN}$ promotes the independence of $X_{PSR}$ and $X_{NSR}$.
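For intuition, the discrete analogue of this mutual information can be computed directly from a joint probability table. The sketch below is only a toy illustration of the quantity being minimized; the paper does not state how $I_{SN}$ is estimated on high-dimensional feature maps, and the function name is hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    joint[i, j] is the probability of the i-th PSR state together with the
    j-th NSR state; the table is renormalized to sum to one.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p_psr = joint.sum(axis=1, keepdims=True)  # marginal: sum over NSR states
    p_nsr = joint.sum(axis=0, keepdims=True)  # marginal: sum over PSR states
    product = p_psr @ p_nsr                   # what the joint would be if independent
    mask = joint > 0                          # 0 * log(0) terms contribute nothing
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))

# Independent representations: the joint is the product of its marginals.
mi_independent = mutual_information(np.outer([0.3, 0.7], [0.6, 0.4]))

# Fully dependent representations: knowing one determines the other.
mi_dependent = mutual_information(np.diag([0.5, 0.5]))
```

The two extremes show why the quantity works as a loss: it is zero exactly when the two representations are independent and grows as one becomes predictable from the other.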

Modulation Mode Recognizer
The modulation mode recognizer contains one Reduction_B module, three Inception_C modules, and one FC layer. The FC layer is followed by a softmax layer that converts the feature representation to probability values. Placed after the pure radar feature extractor, the recognizer is designed to extract the discriminative feature representation from the PSR to perform the classification of 12 modulation formats.
The Reduction_B and Inception_C module structures are shown in Figure 6. The recognizer loss is the cross entropy between the predicted result $x_{pre}$ and the given modulation mode label $y_{lab}$. The total loss $L_{total}$ combines the cross-entropy loss, the reconstruction losses, and the SNMI loss $I_{SN}$ weighted by $\alpha$, where $\alpha$ is used to normalize $I_{SN}$ to match the other losses and is set to 0.25 in the experiments.
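One way the loss terms could be combined is sketched below. Only the weight α = 0.25 on the SNMI term comes from the text; the unit weights on the cross-entropy and reconstruction terms, the single-label form of the cross entropy, and the function names are assumptions of this sketch.

```python
import numpy as np

def cross_entropy(x_pre, y_lab, eps=1e-12):
    """Cross entropy for one sample.

    x_pre : predicted class probabilities (softmax output), length 12
    y_lab : integer index of the true modulation format
    """
    return float(-np.log(x_pre[y_lab] + eps))

def total_loss(l_ce, l_psr_cos, l_nsr_cos, i_sn, alpha=0.25):
    """Sum of the loss terms; unit weights on the first three are assumed."""
    return l_ce + l_psr_cos + l_nsr_cos + alpha * i_sn

# Toy prediction over 12 classes that favors class 3.
x_pre = np.full(12, 0.02)
x_pre[3] = 0.78
l_ce = cross_entropy(x_pre, y_lab=3)

# Hypothetical values for the other terms, chosen only to show the arithmetic.
loss = total_loss(l_ce, l_psr_cos=0.1, l_nsr_cos=0.2, i_sn=0.4)
```

With these numbers the SNMI term contributes 0.25 × 0.4 = 0.1, which illustrates how α keeps the mutual information estimate on a scale comparable to the other terms.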

Dataset Types
Our method identifies 12 types of radar modulation signals, namely, monopulse signals, LFM signals, SFM signals, bilinear FM signals, multiple linear FM signals, EQFM, binary PSK (BPSK), binary frequency-shift keying (2FSK), quaternary frequency-shift keying (4FSK), polyphase coded signals (Frank), and composite modulations (LFM-BPSK, 2FSK-BPSK). The corresponding TFIs are shown in Figure 7. The normalized frequency is used, and the number of sampling points is 1000. Figure 7a-l show the TFIs of the 12 types of radar modulation signals, respectively. The signals in Figure 7a-e,l have much clearer time-frequency characteristics. Such time-frequency images are used as datasets for network training.

Construction of Datasets
Due to the secrecy of radar signals, especially for military radars, it is difficult to obtain real datasets. Therefore, we use a Matlab simulation platform to generate simulation datasets with a large dynamic parameter range (details are shown in Table 1). We take the signal at SNR = 16 dB as the pure signal. The signal length is N = 1024, and the sampling rate is 200 MHz. −10 dB is a typical low SNR in practical applications, and most current radar modulation identification methods still cannot significantly improve the recognition accuracy at −10 dB, especially when the number of radar modulation types reaches 12. Additionally, 8 dB represents a relatively high SNR level achievable in practical applications. Therefore, in this paper, the evaluation SNR ranges from −10 dB to 8 dB. For every radar modulation type, the numbers of training and test samples are 700 and 100, respectively, at each SNR level (2 dB steps). Therefore, 84,000 training and 12,000 test samples are used altogether. Gaussian white noise is added to each signal.
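The dataset totals follow from the SNR grid, as the short check below shows. It assumes that both endpoints of the −10 dB to 8 dB range are included, which is the reading that reproduces the stated totals.

```python
# SNR grid from -10 dB to 8 dB in 2 dB steps (endpoints assumed inclusive).
snr_levels_db = list(range(-10, 9, 2))

num_modulation_types = 12
train_per_type_per_snr = 700
test_per_type_per_snr = 100

num_train = num_modulation_types * len(snr_levels_db) * train_per_type_per_snr
num_test = num_modulation_types * len(snr_levels_db) * test_per_type_per_snr
test_per_snr = num_modulation_types * test_per_type_per_snr
```

This gives 10 SNR levels, 84,000 training samples, 12,000 test samples, and 1200 test samples at each individual SNR level.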

Baseline Methods
We use six methods as performance baselines to demonstrate the superiority of our approach for the IPMR task.
kNN: This model is a classic machine learning method [29] used for data mining classification.
SVM: This model is another machine learning method [30]. It is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis.
ADGOONet: This model is a two-stage IPMR method that includes a denoising step and a recognition step. ADNet [24], a state-of-the-art image denoising network proposed in 2020, is used as the denoising filter. In the recognition step, GoogleNet [23] is used to classify the denoised TFIs generated by ADNet.
ADVGGNet: Similar to ADGOONet, the denoising filter is ADNet, and the modulation recognizer is Vgg16.
ADRESNet: This model is also a two-stage IPMR. ADNet is used as a noise filter, and ResNet50 [27] is applied as a modulation format identifier.
INCEPTIONv4 [28]: This model uses classic CNNs for image recognition. It autonomously learns features that are most conducive to image classification and continuously optimizes itself to achieve modulation format recognition.

Simulation Result and Analysis
We train our networks with stochastic gradient descent on the PyTorch platform with a 3080 Ti GPU. The initial learning rate is 0.01 and is decayed by a factor of 0.1 at the 30th, 50th, and 70th epochs. The total number of epochs is 100. The initial image size of the datasets is 1024 × 1024. Thus, we first use cubic interpolation to resize the images to 64 × 64, and then divide each pixel value by 255 to normalize it to the range between 0 and 1. We add a dropout layer before the FC layer in the modulation mode recognizer to reduce overfitting.
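The step schedule can be written out as a small function. This sketch assumes zero-based epoch counting and that the decay takes effect from the milestone epoch onward, which mirrors the behavior of PyTorch's `MultiStepLR` scheduler with milestones 30, 50, 70 and gamma 0.1; the function name is hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, milestones=(30, 50, 70), gamma=0.1):
    """Learning rate at a given epoch under a multi-step decay schedule."""
    num_decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** num_decays

# Learning rate for each of the 100 training epochs.
schedule = [learning_rate(epoch) for epoch in range(100)]
```

Under these assumptions the rate is 0.01 for epochs 0 to 29, 0.001 for epochs 30 to 49, 0.0001 for epochs 50 to 69, and 0.00001 for the remaining epochs.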
The baseline methods are compared in Figure 8 to better illustrate the performance of the DGDNet. Table 2 lists the detailed indicators. We define the Overall Probability of Successful Recognition (OPSR) to describe the recognition accuracy of IPMR. OPSR is the ratio of the model's correct predictions to the total number of test samples:

$$\mathrm{OPSR} = \frac{NT_{pre}}{NS_{test}} \times 100\%,$$

where $NT_{pre}$ is the number of correct predictions on the test dataset and $NS_{test}$ is the total number of test samples. Above an SNR of −6 dB, the OPSR mostly reaches 100%, especially for the DGDNet. Below −6 dB, the OPSR of the DGDNet exceeds that of the three two-stage IPMR models, which indicates that the end-to-end strategy is effective.
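The metric and the reported figures can be related with a short calculation. The sketch assumes that the accuracy at a single SNR level is computed over the 1200 test samples at that level (12 modulation types with 100 test samples each); the function name `opsr` is hypothetical.

```python
def opsr(predictions, labels):
    """Overall probability of successful recognition, in percent."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Toy example: three of four predictions are correct.
toy_opsr = opsr([0, 1, 2, 2], [0, 1, 2, 3])

# Number of correct predictions implied by the reported accuracies,
# assuming 1200 test samples per SNR level.
tests_per_snr = 12 * 100
correct_at_minus_8_db = round(0.9875 * tests_per_snr)
correct_at_minus_10_db = round(0.8925 * tests_per_snr)
```

Under that assumption, 98.75% at −8 dB corresponds to 1185 correct predictions out of 1200, and 89.25% at −10 dB corresponds to 1071 out of 1200.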
To demonstrate the effectiveness of the feature disentangler and the SNMI loss, we compared the baseline Inceptionv4 and the DGDNet without SNMI loss in Figure 9. The OPSR is greatly improved under low SNR after adding the feature disentangler. Under the SNR of −10 dB, the OPSR increases by approximately 0.7 percentage points. As shown in Table 3, the OPSR at SNRs of −6 and −8 dB is further improved after the independence of the PSR and NSR is increased through the SNMI loss. NSL indicates "No SNMI Loss". Therefore, the feature disentangler and the SNMI loss are extremely effective for improving the recognition of radar signal modulation formats in low-SNR environments.
Among the compared methods, classical machine learning algorithms such as SVM and kNN have unsatisfactory accuracy, while deep-learning-based methods are more competitive in accuracy at both low and high SNR.
Figure 9. Comparison of the DGDNet with different parameters and Inception_v4 [28]. As can be seen in the figure, the performance of Inception_v4 below −4 dB is inferior to that of the proposed DGDNet. Meanwhile, the DGDNet performs better with the SNMI loss.

Conclusions
In this paper, a novel network called DGDNet is proposed to recognize the intrapulse modulation mode of radar signals. The noisy TFIs under low-SNR environments are obtained through the CTFD. The DGDNet is used to simultaneously complete the denoising and recognition of noisy TFIs in an end-to-end manner. Meanwhile, the PSR and NSR are automatically extracted by the feature disentangler to improve the radar signal modulation identification performance in a low-SNR environment. The experimental results show that the proposed method obtains a recognition accuracy of 98.75% at −8 dB SNR and 89.25% at −10 dB SNR for 12 modulation formats.