Recognition of Noisy Radar Emitter Signals Using a One-Dimensional Deep Residual Shrinkage Network

Signal features can be obscured in noisy environments, resulting in low accuracy of radar emitter signal recognition based on traditional methods. To improve the ability of learning features from noisy signals, a new radar emitter signal recognition method based on one-dimensional (1D) deep residual shrinkage network (DRSN) is proposed, which offers the following advantages: (i) Unimportant features are eliminated using the soft thresholding function, and the thresholds are automatically set based on the attention mechanism; (ii) without any professional knowledge of signal processing or dimension conversion of data, the 1D DRSN can automatically learn the features characterizing the signal directly from the 1D data and achieve a high recognition rate for noisy signals. The effectiveness of the 1D DRSN was experimentally verified under different types of noise. In addition, comparison with other deep learning methods revealed the superior performance of the DRSN. Last, the mechanism of eliminating redundant features using the soft thresholding function was analyzed.


Introduction
One of the most important functions of radar countermeasure systems is that radar emitter signal recognition, in which classification and recognition of intercepted radar signals are carried out to determine the radar type, purpose, carrier, threat level, and recognition credibility of the radar [1]. Therefore, accurate radar emitter signal recognition is essential for subsequent radar analysis and action preparation.
The existing radar emitter signal recognition methods can be divided into two categories. The first category includes traditional signal analysis methods, including gray correlation analysis [2], template matching [3], fuzzy matching [4], and attribute measurement [5]. However, there are various deficiencies in the traditional methods, the recognition performance is dependent on the richness of prior knowledge, tolerance rate and robustness are poor, and they do not have automatic learning abilities. The second category includes deep learning methods, which are usually based on time-frequency transform. For example, the classification and recognition of signals were achieved using the Choi-Williams time-frequency distribution with convolutional neural network (CNN) [6]. In a study by Zhao et al., the Margenau-Hill time-frequency distribution and smooth pseudo-Wigner-Ville distribution (SPWVD) were used as signal features, and then a classifier was built for radar emitter signal recognition based on an automatic encoder (AE), a deep belief network (DBN), and a CNN [7]. Based on the deep Q-learning network (DQN) [8], the Cohen's class time-frequency distributions were used for signal recognition. Wu et al. [9] used one-dimensional (1D) CNN for radar signal recognition. However, the parameter optimization of traditional deep learning methods is a difficult task, the error function gradient may gradually become inaccurate in the process of reverse propagation. When the network layer is too deep, the parameters in the initial layers cannot be optimized well. Moreover, the 1D raw signal needs to be converted to a two-dimensional signal, consuming extra time and calculation resources.
With the development of radar technology, the electromagnetic environment is becoming increasingly complex, and received radar signals are inevitably accompanied by noise [10,11]. What is more tricky is that different types of noise have different impacts on radar signals. Traditional signal analysis and processing methods are effective only for a certain type of noise. Besides, the recognition accuracy is low under strong noise conditions. For deep learning methods, the learned high-level features are likely to be less discriminative under the interference of noise. Therefore, it is necessary to develop a new radar emitter signal recognition model that not only has the ability to process different types of noise, but can also achieve a high recognition rate under strong noise conditions. To achieve the above requirements, in this study, a radar emitter signal recognition method based on a 1D deep residual shrinkage network (DRSN) is proposed.
The main contribution of this study are as follows: (1) Important features could be directly extracted from a time-sequential sequence using a 1D DRSN without dimension conversion. Compared with traditional deep learning methods, the recognition accuracy was improved. (2) The rectified linear unit (ReLU) was replaced by a soft thresholding function to eliminate unimportant features. Moreover, the attention mechanism was used to adaptively set the threshold to achieve recognition of noisy radar emitter signals. The mechanism of elimination of redundant features using the soft thresholding function was analyzed. (3) Radar emitter signals containing different types of noise were recognized using the proposed method, showing excellent results.
The rest of this article is organized as follows. Four noise models are proposed in Section 2, and the structure of the 1D DRSN is proposed in Section 3. Section 4 presents tests and data analysis, followed by the elucidation of the denoising mechanism of the soft thresholding function in Section 5. Finally, the study is concluded in Section 6.

Signal Noise
We do not know what types of noise it is when intercepting a radar emitter signal. Thus, the ability to process different types of noise is essential for a radar emitter signal recognition model. The following four representative types of noise were used as the background noise of the radar emitter signal, and then the 1D DRSN was used for recognition.

Gaussian Noise
With strong randomness, gaussian noise widely exists in the environment [12]. Under a low signal-to-noise ratio (SNR), Gaussian noise can severely impact the time-domain waveform of the signal and cover up useful information, resulting in difficulty in signal recognition. As a type of Gaussian noise, white Gaussian noise is often added to a signal to form additive white Gaussian noise in communication channels for testing and modeling, where the probability density function is as follows: where x is a random variable, and µ, σ are the mean and standard deviation of the Gaussian distribution, respectively.

Laplacian Noise
Laplacian noise is a non-Gaussian noise. There are often pulse noise and co-channel interference in the actual communication environment [13], causing the Gaussian noise model to no longer be applicable, so it is essential to recognize radar emitter signals with non-Gaussian noise. In this study, Laplacian noise was used as a non-Gaussian noise for testing the performance of the proposed model. The Laplacian probability density function is as follows: where x is a random variable and λ, µ are constants.

Poisson Noise
Poisson noise is a signal-dependent noise. Different from the distribution of white Gaussian noise, which is independent of the signal, the distribution of Poisson noise is closely related to the signal; i.e., there is a strong correlation between the noise intensity and signal intensity [14]. The traditional methods for processing additive Gaussian noise are not applicable to Poisson noise. Considering the advantage of deep learning-based methods with self-learning discriminant features, in this study, radar emitter signals with Poisson noise were recognized using the 1D DRSN method. The probability density function of Poisson noise is as follows: where λ is the average frequency of a random event per unit time.

Cauchy Noise
Cauchy noise is a kind of Lévy noise, which has a heavy-tailed probability distribution [15]. In the field of radar emitter signal recognition, Cauchy distributions can be used to simulate burr noise environment. The probability density function of Cauchy noise is as follows: where µ is the location parameter, and σ is the scale parameter.

1D Convolution
In the proposed 1D DRSN method, 1D convolution will be used. The convolutional layer is the key difference between a CNN and a fully connected neural network (FCNN), the number of training parameters is greatly reduced by the convolutional layer via weight sharing. Since the radar emitter signal was 1D, 1D convolution was used in this study. Compared with traditional two-dimensional convolution, 1D convolution requires fewer parameters, and there is no need for signal dimension conversion, which reduces the time cost and calculation resources necessary. The process of 1D convolution is as follows: where y j is the jth channel of the output feature map, k is the convolutional kernel, b is the bias, and C is the number of input channels. The 1D convolution is shown in Figure 1, and the height in Figure 1 is 1.

1D DRSN
The difference between the DRSN [16] and general deep learning methods, like residual network (ResNet), is that the traditional ReLU is replaced by the soft thresholding function as a nonlinear activation function. A soft threshold is often used as a key step in signal denoising [17]. In traditional denoising algorithms, the signal is converted into a domain with unimportant features near zero, then these features near zero are set to 0 by the soft thresholding function. For instance, the key to the wavelet denoising algorithm is to design a filter that amplifies useful information in the signal and reduces the noise information to near zero, then the soft threshold function is used to filter the noise. However, extensive knowledge is required to design this kind of filter in reality; thus, it is often a difficult task. By integrating the soft thresholding function and the deep learning method, the DRSN can effectively improve the recognition accuracy while overcoming the difficulties associated with manually designed filters. The soft threshold function is as follows:

1D DRSN
The difference between the DRSN [16] and general deep learning methods, like residual network (ResNet), is that the traditional ReLU is replaced by the soft thresholding function as a nonlinear activation function. A soft threshold is often used as a key step in signal denoising [17]. In traditional denoising algorithms, the signal is converted into a domain with unimportant features near zero, then these features near zero are set to 0 by the soft thresholding function. For instance, the key to the wavelet denoising algorithm is to design a filter that amplifies useful information in the signal and reduces the noise information to near zero, then the soft threshold function is used to filter the noise. However, extensive knowledge is required to design this kind of filter in reality; thus, it is often a difficult task. By integrating the soft thresholding function and the deep learning method, the DRSN can effectively improve the recognition accuracy while overcoming the difficulties associated with manually designed filters. The soft threshold function is as follows: where is a threshold with a positive value. Different from ReLU, which sets the negative values to 0, the soft thresholding function sets the values near zero to 0, and the negative useful features are retained. The partial derivative of the soft thresholding function is as follows: The partial derivative of the soft thresholding function is either 1 or 0; thus, vanishing gradient and exploding gradient can be avoided. The diagrams of the soft thresholding where τ is a threshold with a positive value. Different from ReLU, which sets the negative values to 0, the soft thresholding function sets the values near zero to 0, and the negative useful features are retained. The partial derivative of the soft thresholding function is as follows: The partial derivative of the soft thresholding function is either 1 or 0; thus, vanishing gradient and exploding gradient can be avoided. The diagrams of the soft thresholding function and ReLU function are shown in Figure 2.
The residual shrinkage units of the 1D DRSN are shown in Figure 3a, which include two batch normalization units, two ReLU activation functions, two 1D convolutional layers, one identity shortcut and one attention mechanism unit. The purpose of batch normalization is to reduce the training difficulty and to increase the training speed, and the equations are as follows: where x n and y n are the input features and output features of a certain batch, γ and β are two parameters that can be trained, and ε is a near-zero positive number used to prevent zero division. In batch normalization, the samples are converted to a distribution with a mean of 0 and standard deviation of 1 to reduce the training difficulty. The residual shrinkage units of the 1D DRSN are shown in Figure 3a, which include two batch normalization units, two ReLU activation functions, two 1D convolutional layers, one identity shortcut and one attention mechanism unit. The purpose of batch normalization is to reduce the training difficulty and to increase the training speed, and the equations are as follows: where and are the input features and output features of a certain batch, and are two parameters that can be trained, and is a near-zero positive number used to prevent zero division. In batch normalization, the samples are converted to a distribution with a mean of 0 and standard deviation of 1 to reduce the training difficulty.
As one of the common activation functions in existing deep learning networks [18], the ReLU function can effectively prevent vanishing gradient and increase the training speed. The equation is as follows: The soft threshold is determined by the attention mechanism. As shown in Figure 3a, assuming that the output after two 1D convolutional layers is , and the output after taking the absolute value and global average pooling is: where is a vector with a length of , , and c are the index numbers of the width, height and channel of the feature map. The output is obtained through two fully connected layers and is then converted to a number between 0 and 1 using the sigmoid function.
where is also a vector with a length of , and the threshold is obtained after multiplication of Equations (12) and (13).
where is the threshold of channel . The aim of Equation (14) is to emphasize that different channels may have different thresholds in the feature map with channels. In this way, useful features can be flexibly retained, and useless features can be deleted.  Figure 3b is the overall structure of the 1D DRSN, which is similar to that of ResNet [19], the only difference is that the residual module is replaced by the 1D residual shrinkage building unit (RSBU) module, which integrates soft thresholding function and attention mechanism. Note that the DRSN consists of a series of RSBU modules and a global As one of the common activation functions in existing deep learning networks [18], the ReLU function can effectively prevent vanishing gradient and increase the training speed. The equation is as follows: The soft threshold is determined by the attention mechanism. As shown in Figure 3a, assuming that the output after two 1D convolutional layers is x, and the output after taking the absolute value and global average pooling is: where y is a vector with a length of C, i, j and c are the index numbers of the width, height and channel of the feature map. The output z is obtained through two fully connected layers and is then converted to a number between 0 and 1 using the sigmoid function.
where a is also a vector with a length of C, and the threshold is obtained after multiplication of Equations (12) and (13).
where τ c is the threshold of channel c. The aim of Equation (14) is to emphasize that different channels may have different thresholds in the feature map with C channels. In this way, useful features can be flexibly retained, and useless features can be deleted. Figure 3b is the overall structure of the 1D DRSN, which is similar to that of ResNet [19], the only difference is that the residual module is replaced by the 1D residual shrinkage building unit (RSBU) module, which integrates soft thresholding function and attention mechanism. Note that the DRSN consists of a series of RSBU modules and a global average pooling layer. The global average pooling layer is obtained by the mean of each channel, whose function is to reduce the number of training weights and the risk of overfitting.
where x c is the output features of channel c, and N is the number of features. The crossentropy function is used as the error propagation function: where y is the data label,ŷ is the predicted category, g(θ, x) is the output of the model, x and θ are the input and parameters of the network model, and k is the number of categories to be classified. After the cross-entropy error is calculated, the gradient descent algorithm is used for parameter optimization. To achieve fast convergence, the adaptive moment estimation (ADAM) method is used in this study, and the parameters proposed in previous studies are used [20,21].

Network Construction
The structure of the proposed network is shown in Figure 4. The first and second numbers in the bracket in residual shrinkage building units (RSBUs) are the number and width of the convolutional kernel, respectively. "/2" represents a step length of 2. In the RSBUs, three RSBUs were followed when the previous DRSN's step length was 2, and there were a total of 12 RSBUs. The number of iterations was 160. To accelerate the training speed while ensuring the training accuracy, the original learning rate was set to 0.1. The

Network Construction
The structure of the proposed network is shown in Figure 4. The first and second numbers in the bracket in residual shrinkage building units (RSBUs) are the number and width of the convolutional kernel, respectively. "/2" represents a step length of 2. In the RSBUs, three RSBUs were followed when the previous DRSN's step length was 2, and there were a total of 12 RSBUs. The number of iterations was 160. To accelerate the training speed while ensuring the training accuracy, the original learning rate was set to 0.1. The learning rate decreased by 10-fold every 40 cycles until the number of iterations was 120, lastly, the learning rate of the final 40 cycles decreased by 2-fold. The number of batch samples was set to 128. To prevent overfitting of the model, we used L2 regularization, and the penalty coefficient was set to 0.0001 according to the recommendations of [22].

Results
The test platforms and parameters used in this study are shown in Table 1.

Datasets
Seven types of representative radar emitter signals were used, including 13-bit barker codes, frequency coding signals, frequency diversity signals, linear frequency-modulated signals (LFM), nonlinear frequency-modulated signals (NLFM), single-carrier frequency signals (CW) and barker-lfm mixed modulation signals. The SNR of the signals ranged from −8 dB to 4 dB with an interval of 2 dB, i.e., a total of seven SNRs. The sampling

Results
The test platforms and parameters used in this study are shown in Table 1.

Datasets
Seven types of representative radar emitter signals were used, including 13-bit barker codes, frequency coding signals, frequency diversity signals, linear frequency-modulated signals (LFM), nonlinear frequency-modulated signals (NLFM), single-carrier frequency signals (CW) and barker-lfm mixed modulation signals. The SNR of the signals ranged from −8 dB to 4 dB with an interval of 2 dB, i.e., a total of seven SNRs. The sampling frequency was 512 MHz. All radar emitter signal data contained only one pulse, and the data length was 512, if the signal is less than 512 points, it shall be supplemented completely by zero-filling method. Note that the recognition of short monopulse data was more challenging. Four types of noise were generated, including Gaussian noise, Laplacian noise, Poisson noise, and Cauchy noise. Under each noise conditions: Training set and validation set: 300 samples were generated under each SNR and signal type, a total of 7×7×300 = 14700 samples, which are randomly divided into 80% as the training set and 20% as the validation set.
Testing set: 100 samples were generated under each SNR and signal type, a total of 7×7×100 = 4900 samples.
The parameters of the signals are shown in Table 2.
It is worth noting that the intervals of the carrier frequency and modulation parameters with different types of signals are overlapped deliberately to increase the recognition difficulty.

Recognition Results of Radar Signals with the Four Types of Noise
The 1D DRSN model was used for recognition of radar emitter signals with four different types of noise, i.e., Gaussian noise, Laplacian noise, Poisson noise and Cauchy noise. The average recognition rate of the training set and validation set varied with the number of iterations, as shown in Figure 5.  As Figure 5 shows, the accuracy of the training set and validation set of the four types of noise reaches a high level as the number of iterations increases, and then they were both stable after 120 iterations, indicating that the model has converged, and finally the training accuracy nearly stabilized at 96.51%, 99.00%, 99.90%, and 99.99%, respectively.
The trained model was then evaluated using the testing set. The variation in the recognition rate with SNR under the four noise conditions is shown in Figure 6. As Figure 5 shows, the accuracy of the training set and validation set of the four types of noise reaches a high level as the number of iterations increases, and then they were both stable after 120 iterations, indicating that the model has converged, and finally the training accuracy nearly stabilized at 96.51%, 99.00%, 99.90%, and 99.99%, respectively.
The trained model was then evaluated using the testing set. The variation in the recognition rate with SNR under the four noise conditions is shown in Figure 6.  Figure 6a shows that under Gaussian noise, when the SNR is above −2 dB, the recognition rate reaches over 86%. Note that, except for linear frequency modulated (LFM) and nonlinear frequency modulated (NLFM) signals, the recognition rate of all signals is above 85% when the SNR is over −6 dB. The reason for the low recognition rate of LFM and NLFM signals with low SNR is that both LFM and NLFM signals are frequency modulated signals, the raw time-domain waveforms of which are very similar, and the difference in the frequency domain cannot reflected. Moreover, under the condition of strong Gaussian noise, the features of the two signals are covered, which increases the difficulty of recognition. However, the recognition accuracy of LFM and NLFM achieve 99% and 94%, respectively, when SNR is 4 dB shows that 1D DRSN is still better than the general deep learning methods. If only the frequency modulation signal needs to be recognized, the recognition rate would be improved. Under Laplacian noise, when the SNR is greater than −8 dB, the recognition rate of the 7 types of radar emitter signals is greater than 69%. Figure 6a,b show that the overall recognition rate of radar emitter signals with Laplacian noise is higher than that of the signals with Gaussian noise, indicating that the 1D DRSN model has stronger adaptability for signal recognition under non-Gaussian noise condi-  Figure 6a shows that under Gaussian noise, when the SNR is above −2 dB, the recognition rate reaches over 86%. Note that, except for linear frequency modulated (LFM) and nonlinear frequency modulated (NLFM) signals, the recognition rate of all signals is above 85% when the SNR is over −6 dB. The reason for the low recognition rate of LFM and NLFM signals with low SNR is that both LFM and NLFM signals are frequency modulated signals, the raw time-domain waveforms of which are very similar, and the difference in the frequency domain cannot reflected. Moreover, under the condition of strong Gaussian noise, the features of the two signals are covered, which increases the difficulty of recognition. However, the recognition accuracy of LFM and NLFM achieve 99% and 94%, respectively, when SNR is 4 dB shows that 1D DRSN is still better than the general deep learning methods. If only the frequency modulation signal needs to be recognized, the recognition rate would be improved. Under Laplacian noise, when the SNR is greater than −8 dB, the recognition rate of the 7 types of radar emitter signals is greater than 69%. Figure 6a,b show that the overall recognition rate of radar emitter signals with Laplacian noise is higher than that of the signals with Gaussian noise, indicating that the 1D DRSN model has stronger adaptability for signal recognition under non-Gaussian noise conditions. Under the condition of Poisson noise and Cauchy noise, the average recognition rate is greater than 98%, 97%, respectively, as shown in Figure 6c,d. This is because Poisson noise and Cauchy noise appear as a spike in the time-domain signal. Compared with the signal amplitude, the Poisson noise and Cauchy noise amplitude are much larger. Unlike the situation of the first two types of noise, where the time-domain waveform is submerged by the noise, the time-domain waveform is basically retained under the background of Poisson noise and Cauchy noise. Thus, the 1D DRSN model can accurately filter out the Poisson noise and Cauchy noise, and the learned features have strong discriminative ability, leading to a high average recognition rate. The recognition rate under each condition is described in detail in the Appendix A.

Analysis of Learned Features
We analyzed the features learned by the 1D DRSN model. Specifically, the test samples were input into the trained 1D DRSN model to extract the features after the global average pooling layer, and the t-distributed stochastic neighbor embedding (t-SNE) was used to reduce the dimensionality to two-dimensional space for visual analysis [23]. Although there was information loss during the dimensionality reduction process, t-SNE, as a nonlinear unsupervised dimensionality reduction method, can intuitively determine whether the learned features are distinguishable. Figure 7 shows the two-dimensional t-distribution diagrams of the radar emitter signals at 4 dB under four types of noise.

Analysis of Learned Features
We analyzed the features learned by the 1D DRSN model. Specifically, the test samples were input into the trained 1D DRSN model to extract the features after the global average pooling layer, and the t-distributed stochastic neighbor embedding (t-SNE) was used to reduce the dimensionality to two-dimensional space for visual analysis [23]. Although there was information loss during the dimensionality reduction process, t-SNE, as a nonlinear unsupervised dimensionality reduction method, can intuitively determine whether the learned features are distinguishable. Figure 7 shows the two-dimensional tdistribution diagrams of the radar emitter signals at 4 dB under four types of noise.   As Figure 7a,b show that, except for some of the LFM and NLFM signals being aliased, the most samples of the same type of radar emitter signal are concentrated in the same area and away from other types of samples, indicating that the features learned by the 1D DRSN have high discrimination ability. Some LFM and NLFM samples are aliased, the reason why is that the only difference in terms of amplitude is that the changing rate in the time domain. Against the noisy background, the Gaussian noise or Laplacian noise covers this unique distinguishing feature, thereby causing aliasing of LFM and NLFM samples. This also explains why the recognition rate of LFM and NLFM signals is low when the SNR is low. If the LFM and NLFM signals are regarded as frequency modulation signals, the recognition rate would be improved. In addition, Figure 7c,d show that the distance between different types of samples is large and that the same types of samples are concentrated in the same areas, indicating the high discrimination of the learned features. As Poisson noise and Cauchy noise appear as a sudden spike in the time domain, it does not affect the signal waveform, the DRSN filters out the noise well and is able to retain useful features, resulting in high signal recognition rate.

Comparison with Other Models
To further verify the effectiveness of the 1D DRSN, we compared the proposed method with some of the state-of-the-art deep learning networks, i.e., 1D ResNet and 1D ConvNet. The same network structure was used, as shown in Table 3. In Table 3, RBU stands for ResNet residual module unit. Unlike RSBU, RBU uses ReLU as the activation function without the attention mechanism, so the only difference between DRSN and ResNet is the activation function, which one uses the soft thresholding function and the other uses the ReLU function. CBU represents convolution module unit, which is different from RBU and RSBU in that it has no identity connection.
The average recognition rates of the four models under Gaussian noise, Laplacian noise, Poisson noise and Cauchy noise are shown in Figure 8. Under all noise conditions, the average recognition rate of the 1D DRSN is higher than those of 1D ResNet and 1D ConvNet, indicating that the 1D DRSN model has a better performance, which is because 1D DRSN uses a soft thresholding activation function to retain useful features of the signal while eliminate the useless features of noise. Tables 4-7 show the average recognition accuracies of DRSN, ResNet and ConvNet under four kinds of noise conditions on the test set, which indicated that the DRSN model has a better performance for radar emitter signal recognition under noisy condition. Table 8 shows the number of parameters and training time per epoch for DRSN, ResNet and ConvNet models, respectively. It can be seen that the DRSN improves recognition accuracy without significantly increasing the consuming of time and computing resources.

Comparison with Different Sampling Frequencies
Under the condition of Nyquist sampling theorem, the data sets with sampling frequencies of 460 MHz, 512 MHz and 1024 MHz were used to compare the effects of different sampling frequencies on the performance of 1D DRSN. The results are shown in Figure 9.

Comparison between the Soft Thresholding Function and ReLU Function
In order to elaborate the mechanism of eliminating redundant features of soft thresholding function, the soft thresholding function and ReLU function are compared and analyzed in this section. Both of them can set the features of part of the interval to 0 and delete useless features. As mentioned earlier, the gradient of the soft thresholding function and ReLU is either 1 or 0, both of which are conducive to error back propagation. Compared with ReLU, the advantage of the soft thresholding function is that the threshold can be set flexibly, thereby more accurately deleting useless feature intervals while retaining useful features. The soft thresholding function is expressed as follows: Considering the bias when training 1D DRSN model, when the bias is = 0: The ReLU function sets all negative features to 0, as shown in Figure 10, the red shaded part indicates that the feature of the part is set to 0.   As can be seen from Figure 9, the greater sampling frequency, the higher recognition rate, in particular, the lower SNR, the more obvious improvement of accuracy. This is because the larger the sampling frequency, the more information obtained, and the more effective discriminant features the 1D DRSN can learn, but it also consumes more time and computing resources.
The average accuracy and training time at different sampling frequencies are shown in Table 9.

Comparison between the Soft Thresholding Function and ReLU Function
In order to elaborate the mechanism of eliminating redundant features of soft thresholding function, the soft thresholding function and ReLU function are compared and analyzed in this section. Both of them can set the features of part of the interval to 0 and delete useless features. As mentioned earlier, the gradient of the soft thresholding function and ReLU is either 1 or 0, both of which are conducive to error back propagation. Compared with ReLU, the advantage of the soft thresholding function is that the threshold can be set flexibly, thereby more accurately deleting useless feature intervals while retaining useful features. The soft thresholding function is expressed as follows: Considering the bias when training 1D DRSN model, when the bias is b = 0: The ReLU function sets all negative features to 0, as shown in Figure 10, the red shaded part indicates that the feature of the part is set to 0. Similarly, the soft thresholding function is shown as Figure 11 when the bias is = 0, the features in the interval [− , ] are deleted, and the rest are retained and move to the 0 by . Similarly, the soft thresholding function is shown as Figure 11 when the bias is b = 0, the features in the interval [−τ, τ] are deleted, and the rest are retained and move to the 0 by τ. Similarly, the soft thresholding function is shown as Figure 11 when the bias is = 0, the features in the interval [− , ] are deleted, and the rest are retained and move to the 0 by .  When the bias b = 0, here, we analyze only the situation when b > 0 in consideration of the length of the paper. When b > 0: The ReLU function becomes y = max(x + b, 0), and the feature values are shifted up by b; then the negative features are set to 0, as shown Figure 12: When the bias ≠ 0, here, we analyze only the situation when b > 0 in consideration of the length of the paper. When > 0: The ReLU function becomes = ( + , 0) , and the feature values are shifted up by ; then the negative features are set to 0, as shown Figure 12:  When the bias ≠ 0, here, we analyze only the situation when b > 0 in consideration of the length of the paper. When > 0: The ReLU function becomes = ( + , 0) , and the feature values are shifted up by ; then the negative features are set to 0, as shown Figure 12: Next, we analyze the reason why soft thresholding function is better than ReLU function as the activation function. The explanation is that the former can achieve the same function as ReLU when given proper values of b and τ, but the converse is not. In fact, in 1D DRSN, the bias b and threshold τ are both trainable parameters. Due to the data of radar signals are finite, the feature values are distributed in a certain interval. When the soft thresholding function is equivalent to the ReLU function with a bias of 0, as shown Figure 14: Next, we analyze the reason why soft thresholding function is better than ReLU function as the activation function. The explanation is that the former can achieve the same function as ReLU when given proper values of and , but the converse is not. In fact, in 1D DRSN, the bias and threshold are both trainable parameters. Due to the data of radar signals are finite, the feature values are distributed in a certain interval. When the soft thresholding function is equivalent to the ReLU function with a bias of 0, as shown Figure 14: Similarly, the soft thresholding function is equivalent to the ReLU function with a non-zero bias when the following equation is satisfied: Instead, no matter how ReLU + bias combined, it cannot achieve the same functions as the soft thresholding function. Therefore, the soft thresholding function has a more flexible deletion interval than ReLU, which makes it more flexible and reliable when removing useless features caused by noise, moreover, the 1D DRSN uses an attention mechanism to adaptively set an appropriate threshold for each sample. Thus, the 1D DRSN is suitable for situations where the noise level of each sample is different. As a result, the 1D DRSN achieves a better result in the recognition of noisy radar emitter signals. Similarly, the soft thresholding function is equivalent to the ReLU function with a non-zero bias when the following equation is satisfied:

Conclusions
Instead, no matter how ReLU + bias combined, it cannot achieve the same functions as the soft thresholding function. Therefore, the soft thresholding function has a more flexible deletion interval than ReLU, which makes it more flexible and reliable when removing useless features caused by noise, moreover, the 1D DRSN uses an attention mechanism to adaptively set an appropriate threshold for each sample. Thus, the 1D DRSN is suitable for situations where the noise level of each sample is different. As a result, the 1D DRSN achieves a better result in the recognition of noisy radar emitter signals.

Conclusions
In this study, we proposed a radar emitter signal recognition method based on 1D DRSN and experimentally showed the following advantages of this method: (i) by using the soft thresholding function with attention mechanism, a high recognition rate can be achieved for different types of strong noise; and (ii) no professional knowledge of signal processing or dimension conversion of data is needed, and the 1D DRSN can automatically learn the features of the signal directly from the 1D data.
The 1D DRSN outperformed traditional deep learning methods, improving the average recognition rate by 6.18% and 1.00% compared with the 1D ConvNet and ResNet, respectively. It shows that 1D DRSN can effectively improve the recognition rate of noisy radar emitter signals.
Lastly, the mechanism of the soft thresholding function was analyzed, and the reason why the soft thresholding function outperforms ReLU was discussed. The result suggested that the soft thresholding function is suitable for the recognition of noisy highly noised radar emitter signals.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The recognition results in detail for Figure 6a-d are shown in Tables A1-A4, respectively.