Automatic Modulation Recognition Based on a DCN-BiLSTM Network

Automatic modulation recognition (AMR) is a significant technology in noncooperative wireless communication systems. This paper proposes a deep complex network cascaded with a bidirectional long short-term memory network (DCN-BiLSTM) for AMR. Because the convolution operation of a traditional convolutional neural network (CNN) loses part of the phase information of the modulated signal, resulting in low recognition accuracy, we first apply a deep complex network (DCN) to extract features of the modulated signal that contain both phase and amplitude information. Then, we cascade bidirectional long short-term memory (BiLSTM) layers to build a bidirectional long short-term memory model over the extracted features. The BiLSTM layers extract the contextual information of signals well and address long-term dependence problems. Next, we feed the features into a fully connected layer. Finally, a softmax classifier performs the classification. Simulation experiments show that the performance of the proposed algorithm is better than that of other neural network recognition algorithms. When the signal-to-noise ratio (SNR) exceeds 4 dB, our model's recognition rate for the 11 modulation signals can reach 90%.


Introduction
As a significant technology for noncooperative wireless communication systems, automatic modulation recognition (AMR) plays an important role in practical civil and military applications, such as cognitive radio, interference recognition and spectrum monitoring [1]. In the absence of prior knowledge, it can identify the modulation type of an intercepted signal, providing parameter information for subsequent demodulation [2].
Traditional AMR algorithms can be divided into two categories. One is based on maximum likelihood (ML) theory [3], and the other is a feature-based (FB) method [4]. The first approach uses probability theory, hypothesis test theory and an appropriate decision strategy to solve the AMR problem. The feature-based approach first extracts the modulated signal characteristics and then completes the recognition using classifiers. In feature-based methods, the number of selected features influences the recognition performance. The main features used for modulation signal identification include instantaneous amplitude, phase, frequency, high-order cumulant [5], cyclic spectrum [6] and wavelet characteristics [7]. Many of the classifiers used are based on machine learning algorithms; these include decision trees, support vector machines (SVMs) [8] and artificial neural networks (ANNs) [9].
In recent years, deep learning (DL), a powerful machine learning approach, has achieved great success in diverse fields such as image classification [10] and speech recognition [11]. The concept of DL comes from research on ANNs: a multilayer perceptron with multiple hidden layers is a DL structure. DL forms more abstract high-level representations of attribute categories or features by combining low-level features. The main contributions of this paper are as follows: (1) We propose the DCN-BiLSTM network for AMR, which jointly exploits the phase and contextual information of modulated signals. (2) We demonstrate the effectiveness of the DCN-BiLSTM network through experiments.
The experimental results show that the performance of the proposed algorithm is better than that of other neural network recognition algorithms. When the signal-to-noise ratio (SNR) exceeds 4 dB, the recognition rate of our proposed model on the 11 modulation signals can reach 90%.
The remainder of this paper is organized as follows. Section 2 shows the signal model. Section 3 proposes the DCN-BiLSTM network model and introduces its components. In Section 4, we report the results of an experiment conducted to evaluate the proposed method and provide the optimal parameter configuration for the DCN-BiLSTM model. In addition, we use a cross-validation method to evaluate the network. Finally, Section 5 concludes this paper.

Signal Model
This paper uses the open dataset named RML2016.10a [16]. Figure 1 illustrates the dataset generation technique used in [16]. For the Rayleigh fading channel, the received signal can be expressed as

r(n) = Σ_{l=1}^{L} h_l(n) e^{j2π f_dl n} s(n − n_l) + ω(n),

where s(n) is the modulated signal sent by the communication transmitter; L is the number of multipaths; h_l(n) is the Rayleigh fading factor of the lth path; n_l is the delay of the lth path; f_dl represents the Doppler frequency; and ω(n) is additive white Gaussian noise. In addition, to ensure that the channel model is similar to a real channel, the channel model of this dataset includes sampling rate and carrier rate offsets. The specific modulation types and parameters are shown in Table 1. For a received signal, if we know the modulation type range, sampling frequency, sampling rate offset range, carrier rate offset range and the other signal parameters listed in Table 1, we can use the recognition system model proposed in Section 3.
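As an illustration, the multipath Rayleigh fading model above can be sketched in a few lines of NumPy. This is a simplified toy channel, not the actual GNU Radio dynamic-channel model used to generate RML2016.10a; the function name, number of taps, and delay range are assumptions made purely for illustration.

```python
import numpy as np

def rayleigh_multipath(s, L=3, fd=1.0, fs=200e3, snr_db=10, rng=None):
    """Pass a complex baseband signal s(n) through a toy L-path Rayleigh
    fading channel with Doppler rotation and AWGN (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n = np.arange(len(s))
    r = np.zeros(len(s), dtype=complex)
    for l in range(L):
        h_l = (rng.normal() + 1j * rng.normal()) / np.sqrt(2)  # Rayleigh tap
        n_l = rng.integers(0, 4)                               # path delay in samples
        doppler = np.exp(2j * np.pi * fd * n / fs)             # Doppler rotation
        r += h_l * doppler * np.roll(s, n_l)
    # add white Gaussian noise at the requested SNR
    p_sig = np.mean(np.abs(r) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    w = np.sqrt(p_noise / 2) * (rng.normal(size=len(s)) + 1j * rng.normal(size=len(s)))
    return r + w
```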

DCN-BiLSTM Network Model
As shown in Figure 2, we propose the DCN-BiLSTM network model for AMR. First, we preprocess the received signal and divide it into I and Q components. Then, we send I and Q to the DCN-BiLSTM network, which is designed for identification. Finally, we obtain the identified signal type. The network has four parts: an input layer, DCN layers, BiLSTM layers and a fully connected layer. In the DCN layers, the I component is convolved with the I-channel convolution kernel, and the Q component is convolved with the Q-channel convolution kernel. After convolution, the real features and imaginary features are output. The activation function for the complex-valued convolution is a rectified linear unit (ReLU) function, which is defined as follows:

f(x) = max(0, x),

where x is the input. When x > 0, the activation function has a linear relationship with the input. The BiLSTM layers connect the contextual information among signals and build a bidirectional long short-term memory model for the extracted features. The fully connected layer uses the softmax activation function to output the predicted probability of the modulation information.
The softmax function is defined as

y_i = e^{z_i} / Σ_{j=1}^{C} e^{z_j},  i = 1, 2, . . . , C,

where z is the output of the previous layer and eventually forms the input to the fully connected layer; C is the input dimension and the number of modulation types; and y_i is the probability of an unknown signal being predicted as category i.
The algorithm uses the cross-entropy loss function to calculate the gradient in reverse to update the bias and weight values. The back-propagation update process is as follows:

θ_{n+1} = θ_n − η ∂ϕ/∂θ_n,

where θ_n is the bias or weight at the last moment; η is the learning rate; and ϕ is the loss function.
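The softmax output, cross-entropy loss, and gradient update above can be sketched numerically. For softmax combined with cross-entropy, the gradient of the loss with respect to the logits takes the simple form y − one_hot(label); the specific logits, true class, and learning rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(z):
    """Map logits z (length C) to class probabilities y_i."""
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y_pred, label):
    """Loss for true class index `label`."""
    return -np.log(y_pred[label])

# One gradient-descent step on the logits: d(loss)/dz = y_pred - one_hot(label),
# so theta_new = theta - eta * grad, matching the update rule above.
z = np.array([2.0, 0.5, -1.0])       # illustrative logits
y = softmax(z)
grad = y.copy(); grad[0] -= 1.0      # assume the true class is index 0
eta = 0.1                            # illustrative learning rate
z_new = z - eta * grad
```

A single step along this gradient increases the predicted probability of the true class, so the loss decreases.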

Deep Complex-Valued Network Module (DCN)
The DCN [25] layers are composed of many complex-valued convolution kernels. These complex-valued convolution kernels of different scales are stacked together to perform a hierarchical convolution operation on the input signal.
In the complex-valued convolution operation, the real and imaginary parts are convolved separately. In Cartesian notation, the complex input matrix is defined as M = M_R + iM_I. Similarly, the complex-valued convolution kernel matrix is defined as K = K_R + iK_I, where M_R, M_I, K_R and K_I are all real-valued matrices. The complex-valued convolution expression is

M * K = (M_R + iM_I) * (K_R + iK_I),

where * denotes the convolution operation. The above formula can be expanded to

M * K = (M_R * K_R − M_I * K_I) + i(M_R * K_I + M_I * K_R).

Figure 3 shows a schematic diagram of the complex-valued convolution operation. The real and imaginary convolutions of the complex-valued signal are expressed as follows:

Re{M * K} = M_R * K_R − M_I * K_I,
Im{M * K} = M_R * K_I + M_I * K_R,

where Re{M * K} is the real part of the signal and Im{M * K} is the imaginary part of the signal. The outputs of the DCN layers carry phase information and are used as the input to the next layer.
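The expanded formula can be checked directly in NumPy: a complex convolution built from four real convolutions matches NumPy's native complex np.convolve. The function name complex_conv1d is ours, and a one-dimensional case is shown for simplicity, whereas the DCN layers operate on matrices.

```python
import numpy as np

def complex_conv1d(M, K):
    """Complex-valued convolution assembled from four real convolutions:
    Re{M*K} = M_R*K_R - M_I*K_I,  Im{M*K} = M_R*K_I + M_I*K_R."""
    MR, MI = M.real, M.imag
    KR, KI = K.real, K.imag
    real = np.convolve(MR, KR) - np.convolve(MI, KI)
    imag = np.convolve(MR, KI) + np.convolve(MI, KR)
    return real + 1j * imag
```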

Bidirectional Long Short-Term Memory Module (BiLSTM)
We cascade a BiLSTM behind the DCN to facilitate the extraction of contextual information from the features. The BiLSTM network is composed of forward and reverse LSTM networks [26]. As shown in Figure 4, the LSTM network contains many LSTM memory cells [27], each of which includes three control units: an input gate, a forget gate and an output gate. Figure 5 shows a diagram of an LSTM memory cell [27].
In Figure 5, t represents the current moment. The input feature sequence x_t and the output sequence of the previous time h_{t−1} are input to the memory cell. The forgetting factor f_t is obtained via the forget gate and is expressed as follows:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f),

where W_f is the connection matrix of x_t and h_{t−1}, b_f is the offset matrix, and σ is the sigmoid activation function, which is used to control the information-passing rate:

σ(x) = 1 / (1 + e^{−x}),

where the output value of σ is between 0 and 1. The input gate and memory-state update information are

i_t = σ(W_i · [h_{t−1}, x_t] + b_i),
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C),
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t,

where i_t is the output of the input gate and tanh is an activation function that generates the candidate values C̃_t. C̃_t then participates in the calculation of the memory state C_t.
Among these components, the memory state C_t is the most important because it allows information to flow through the entire link while remaining essentially unchanged, preserving the integrity of the information over a long time. The output gate control factor o_t determines whether to output information h_t and is expressed as follows:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o),
h_t = o_t ⊙ tanh(C_t),

where W_o is the output gate weight matrix and b_o is the offset matrix. Compared with C_t, h_t contains more information about the current moment; therefore, h_t represents short-term memory, while C_t represents long-term memory. However, a classical LSTM considers only information from the previous moment. To consider both the former moment and the next moment together, the BiLSTM [26] adds reverse operations to the LSTM model in [28,29]. Figure 6 shows a structural operation graph of the BiLSTM. As shown in Figure 6, the BiLSTM reverses the input sequence and computes the output again in the same way as an LSTM. The final result is a stack of the forward and reverse LSTM outputs, which achieves the goal of considering the contextual information. The final outputs of the BiLSTM, h_t with t = 1, 2, . . . , n, can be expressed as shown in Figure 7:

h_t = [h_t^f, h_t^b],

where h_t^f and h_t^b are the forward and reverse LSTM outputs at time t. The output features of the BiLSTM are mapped into a sparse space by the fully connected layer. After the network is trained, the algorithm outputs the classification probability of the corresponding modulation modes.
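The gate equations above can be sketched as a minimal forward pass, together with a bidirectional wrapper that stacks the forward and reverse outputs. This is an illustrative NumPy sketch rather than the paper's actual implementation; the function names and weight shapes are our assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM memory-cell step following the gate equations above.
    Each weight matrix maps the concatenated [h_prev, x_t] to one gate."""
    z = np.concatenate([h_prev, x_t])
    Wf, Wi, Wc, Wo = W
    bf, bi, bc, bo = b
    f_t = sigmoid(Wf @ z + bf)              # forget gate
    i_t = sigmoid(Wi @ z + bi)              # input gate
    C_tilde = np.tanh(Wc @ z + bc)          # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde      # long-term memory update
    o_t = sigmoid(Wo @ z + bo)              # output gate
    h_t = o_t * np.tanh(C_t)                # short-term output
    return h_t, C_t

def bilstm(xs, step, h0, C0):
    """Run the cell forward and backward and stack the two output streams."""
    fwd, h, C = [], h0, C0
    for x in xs:
        h, C = step(x, h, C); fwd.append(h)
    bwd, h, C = [], h0, C0
    for x in reversed(xs):
        h, C = step(x, h, C); bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```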

Experiment Results and Discussions
The relevant platform and software settings for this experiment are shown in Table 2. There are 1000 samples of each modulated signal at each SNR, for a total of 220,000 samples. The ratio of training sets to test sets is 8:2. We use the np.random.choice function to select samples proportionally from the dataset to obtain the training sets and the test sets. In this experiment, we performed the following steps.
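The 8:2 proportional split described above might be sketched as follows; we use NumPy's Generator.choice (the modern counterpart of the np.random.choice call mentioned in the text), and the function name and seed are illustrative assumptions.

```python
import numpy as np

def split_indices(n_samples, train_ratio=0.8, seed=0):
    """8:2 train/test split of sample indices via random choice
    without replacement, as described above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_train = int(n_samples * train_ratio)
    train_idx = rng.choice(n_samples, size=n_train, replace=False)
    test_idx = np.setdiff1d(np.arange(n_samples), train_idx)
    return train_idx, test_idx
```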
Step 1. Initialize the DCN-BiLSTM network randomly, extract a specified number of samples from the training sets and input them into the network for training.
Step 2. Compare the classification result obtained in the last layer of the network with the actual type, use the cross-entropy function to calculate the network loss value, and adjust the network weights through the optimization algorithm.
Step 3. Before the next training round starts, use the loss value as a standard to measure the network performance. When it does not drop within 10 iterations, training is stopped.
Step 4. Repeat Steps 2 and 3 until the maximum number of training epochs is reached or the conditions for early termination of training are met. The maximum number of training epochs in this article is 150. After training, the weights are saved and the classification model is output.
Step 5. Input the test sets into the trained model to obtain the recognition result.
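The training procedure above, including the 10-iteration early-stopping rule and the 150-epoch cap, can be sketched as a plain loop. The callback train_epoch is a hypothetical stand-in for one pass of network training that returns the loss; the real implementation would also save the best weights.

```python
def train_with_early_stopping(train_epoch, max_epochs=150, patience=10):
    """Training-loop skeleton for the steps above: run epochs until the
    loss has not dropped for `patience` epochs or `max_epochs` is hit."""
    best_loss, wait = float("inf"), 0
    for epoch in range(max_epochs):
        loss = train_epoch(epoch)
        if loss < best_loss:
            best_loss, wait = loss, 0    # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:         # no drop within `patience` epochs
                break
    return best_loss, epoch + 1
```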

Algorithm Performance Comparison
We selected recognition algorithms based on a CNN [17], Resnet [18], Inception [18], CLDNN [18], MTL-CNN [20], CVC [23] and CNN-LSTM [24] as benchmark models. A performance comparison chart for these eight recognition algorithms is shown in Figure 8, which shows that, from −2 to 18 dB, the accuracy of the DCN-BiLSTM network is substantially higher than that of the other seven recognition algorithms. When the SNR exceeds 4 dB, the recognition accuracy of the DCN-BiLSTM network on the 11 modulation signals can reach 90%.
To fully consider the phase information, we replaced the CNN with a DCN. The accuracy confusion matrix is shown in Figure 10b. Clearly, the accuracies on the QPSK and 8PSK modulation types are much better than in Figure 10a, and the accuracy on the 16QAM and 64QAM modulation types has improved as well. However, the accuracy on the 16QAM and 64QAM types is still not good enough for practical applications; therefore, we still need to improve their recognition accuracy.
Considering the connections among data points, we cascaded the BiLSTM after the DCN to extract the contextual information of signals. An accuracy confusion matrix for the DCN-BiLSTM is shown in Figure 10c, showing that the recognition accuracy on 16QAM and 64QAM is greatly improved compared with the results in Figure 10a,b. This result indicates that the BiLSTM is useful for extracting the contextual features of signals.
As shown in Figure 10a-c, it is quite difficult to recognize wide band frequency modulation (WBFM). The reason is that the dataset uses voice signals to generate the analog signals, and people's voices contain silent periods during speaking, leaving only a single carrier during those periods. Thus, WBFM signals can easily be misclassified as AM-DSB (amplitude modulation-double sideband) signals.
To fully compare the dataset recognition capabilities of the above networks, we used an online platform that can perform statistical analysis [30]. First, we uploaded the file representing the recognition results in CSV format, as shown in Table 3. Table 3 shows the recognition error rates for five types of datasets; the error rates of the first four datasets correspond to the corresponding modulated signals in the fifth dataset. Following Rodríguez-Fdez et al. [30] and the test situation in this paper, we selected the Friedman test [31] as the test type to be applied. We chose the widely used Holm procedure [32] as the post-hoc test with a control method, and we set the significance level α to 0.05.
After the experiments, the algorithm rankings obtained are shown in Table 4. As can be seen, DCN-BiLSTM has the highest ranking, and its performance is better than that of the other seven networks. Moreover, the ranking in Table 4 is consistent with the ranking of the recognition performance in Figure 8, which further supports the validity of the experiment. Table 5 summarizes the comparison between DCN-BiLSTM and the other seven algorithms using post-hoc tests with control methods. By comparing each p-value with α, it can be seen that the DCN-BiLSTM network is significantly different from Inception, CNN and Resnet, indicating that the proposed network represents a significant improvement over them. At the same time, there are no significant differences between DCN-BiLSTM and CLDNN, CVC, CNN-LSTM or MTL-CNN, which suggests that the proposed algorithm inherits the excellent performance of these four networks and can replace them in the field of modulation recognition.
To obtain the best parameter configuration for the DCN-BiLSTM network, this section studies the influence of each parameter configuration on the algorithm's performance.
First, we change only the number of DCN layers to find the best number of DCN layers. The different recognition results are shown in Figure 11. As shown in Figure 11, the overall recognition rate is the highest with six DCN layers when the SNR is greater than 2 dB. With fewer than six layers, the network's ability to extract phase features is not strong enough. With more than six layers, the network extracts redundant features, and the recognition rate no longer improves, which wastes memory. Therefore, the best number of DCN layers for this algorithm is six.
Changing the number of BiLSTM layers also affects the recognition performance. Figure 12 shows a comparison of the recognition performance under different numbers of BiLSTM layers. With fewer than two BiLSTM layers, the algorithm's ability to process feature information is poor. With more than two layers, while the accuracy is equivalent to that of a two-layer BiLSTM network, the added layers reduce speed and waste memory. Therefore, it is best to set the number of BiLSTM layers to two.
Based on the previous analysis, the final DCN-BiLSTM network parameters are shown in Table 6.

Five-Fold Cross Validation
To evaluate the performance of the proposed network, we used the five-fold cross-validation method to train and test the network. We divided the dataset into five equal parts, using one part as the test set each time and the remaining four parts as the training set. Finally, we obtained five training and test results, as shown in Figure 13. The average recognition rate curve over these five runs is shown in Figure 14. As shown in Figures 13 and 14, when the SNR is greater than 4 dB, the recognition accuracy of the proposed network exceeds 90% across all five runs, which together cover the entire dataset, illustrating the rationality and stability of the designed network.
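The five-fold partitioning described above can be sketched as an index generator; the function name and seed are illustrative, and in practice the shuffle would be stratified to preserve the per-modulation and per-SNR balance (an assumption not detailed here).

```python
import numpy as np

def five_fold_splits(n_samples, k=5, seed=0):
    """Split sample indices into k equal folds; each fold serves once as
    the test set, with the remaining folds as the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```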

Conclusions
In this paper, we propose a classification algorithm based on the DCN-BiLSTM network that achieves direct recognition of 11 different types of modulated signals. First, DCN layers are used to extract the phase features of the modulated signal. Then, BiLSTM layers are used to extract the contextual information and construct a bidirectional long short-term memory model for the features. Compared with previous network recognition algorithms based on a CNN, Inception, Resnet, CLDNN, MTL-CNN, CVC and CNN-LSTM, the recognition accuracy of the DCN-BiLSTM network is significantly higher at high SNR. Moreover, even when the SNR is as low as 4 dB, the recognition accuracy of the DCN-BiLSTM network can still reach 90%.
However, the DCN-BiLSTM network has a slow training speed, and the method works satisfactorily only for signals whose frequency offset and sampling frequency offset are within a certain range. In addition, the identified signal must belong to one of the 11 specified types. In future work, we plan to address these issues. On the one hand, for the training speed issue, we can use GPUs with stronger computing capabilities to accelerate training. On the other hand, in terms of network structure, each LSTM unit in the BiLSTM layers contains three gate functions, resulting in many parameters, which is the main reason for the slow training speed. Therefore, we could try to optimize the LSTM unit to reduce the number of parameters and increase the training speed, for example by reducing or simplifying the gate functions.
In addition to the phase information and contextual information discussed in this paper, many other characteristics of modulated signals could be considered, such as time-frequency domain characteristics and constellation characteristics. In future work, we plan to use other features in combination with the DCN-BiLSTM network to improve the modulation signal identification performance.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: