Modulation Recognition of Communication Signal Based on Convolutional Neural Network

Abstract: In noncooperative communication scenarios, digital signal modulation recognition helps to identify communication targets and to manage them better. To address the high complexity, low accuracy and cumbersome manual feature extraction of traditional machine learning algorithms, a communication signal modulation recognition model based on a convolutional neural network (CNN) is proposed. In this paper, a convolutional neural network is combined with bidirectional long short-term memory (BiLSTM), which has a symmetrical structure, to successively extract the frequency-domain features and timing features of signals; importance weights are then assigned by an attention mechanism to complete the recognition task. Seven typical digital modulation schemes, namely 2ASK, 4ASK, 4FSK, BPSK, QPSK, 8PSK and 64QAM, are used in the simulation test. The results show that, compared with classical machine learning algorithms, the proposed algorithm has higher recognition accuracy at low signal-to-noise ratio (SNR), which confirms that the proposed modulation recognition method is effective in noncooperative communication systems.


Introduction
Modulation recognition is a technology to judge the modulation mode of a received signal when the content of modulation information is unknown. It is widely used in radio signal monitoring, electronic countermeasures, intelligent communication and other fields. In a real environment, due to the interference of noncooperative communication and background noise, some features of the received signal will be blurred, which will affect the recognition result. How to obtain higher recognition accuracy of the modulation mode under low SNR is an important research topic. Automatic modulation recognition is the main modulation recognition method in wireless communication.
In 1969, Alarabi published an article entitled "Automatic Classification of Modulation Types by Pattern Recognition Technology", in which he proposed for the first time that the recognition of signal modulation mode is essentially a pattern recognition problem [1]. At present, the methods of automatic modulation recognition can be divided into three categories: modulation recognition based on hypothesis testing, modulation recognition based on feature extraction and modulation recognition based on deep learning. The research status of these three methods at home and abroad is analyzed as follows.
Modulation recognition based on hypothesis testing uses probability theory and hypothesis testing to derive the likelihood function of the signal and the most appropriate decision threshold from the signal characteristics; the test statistic of the signal under test is then compared with the decision threshold to recognize the modulation mode. In 1988, Kim and Polydoros first proposed a recognition method based on the average likelihood ratio test (ALRT) to distinguish BPSK from QPSK [2]. In 1997, Schreyogg et al. proposed a robust method for asynchronous modulation recognition that requires no a priori knowledge of the modulation and transmission parameters, and demonstrated the robustness of the classifier on simulated ASK, BPSK, QPSK, 2FSK, MSK and CW signals [3]. In 2000, Panagiotou et al. proposed a recognition method based on a hybrid likelihood ratio test (HLRT), which combines the two methods above and their advantages to improve recognition [4]. Hypothesis-testing methods require a great deal of prior information, such as the mean and variance of the modulated signal, which is difficult to obtain accurately in noncooperative communication. In addition, their recognition accuracy is low at low SNR because noise has a strong influence. Therefore, these traditional methods are gradually being replaced by new ones.
The method of modulation recognition based on feature extraction is to extract the most representative and reflective features in the time domain or frequency domain of signals of different modulation types, so as to accurately identify signals of different modulation types. Nandi proposed a recognition method based on instantaneous statistical features and used the decision tree as a classifier to identify 13 kinds of communication signals. When the SNR is 10 dB, the recognition accuracy reaches more than 90% [5]. Tan proposed a recognition method based on time-domain characteristics. Feature parameters are composed into feature vectors, and the classifier selects random forest, which can realize automatic recognition of six modulation modes of low-order digital signals such as 2ASK and 2FSK [6]. Swami used fourth-order cumulants to identify MPSK and MQAM signals [7]. Wang combined the fourth-order and sixth-order cumulants and used the support vector machine (SVM) as the classifier to realize the modulation identification of MPSK [8]. Vladimir proposed a recognition method based on normalized sixth-order cumulants. This method can identify BPSK, QPSK, 16QAM and 64QAM efficiently [9].
With the popularity of big data, deep learning has become a research hotspot in the field of artificial intelligence. With its powerful ability to automatically extract features, it has been widely used in computer vision and natural language processing. As a result, more and more researchers are applying deep learning to their field of study. In recent years, deep learning has been gradually introduced into the research of digital signal modulation recognition. Automatic modulation recognition based on deep learning does not require prior information. It fully utilizes the autonomic learning mechanism of the neural network to take the original signal or the converted signal data as the input of the neural network to train the label. Then the characteristics of the nonlinear data can be extracted through the nonlinear function in the network. Then the features are sent to the output layer to realize the recognition of the modulation mode of the communication signal.
Tu proposed a modulation recognition method for digital signals based on deep autoencoder networks. Because different modulated signals have different cyclic spectrum and wavelet characteristics, these original characteristics of the signals are extracted to identify the modulation mode of an unknown signal [10]. O'Shea took the baseband complex signal directly as input and used a CNN to extract the signal characteristics and identify the modulation of 11 signal types; at an SNR of 0 dB, the recognition accuracy reaches 80% [11]. Peng proposed a modulation recognition method based on a deep neural network (DNN), which preprocesses the signal to be detected, generates its constellation diagram and takes the constellation diagram as the input of the neural network. The trained network then recognizes the modulated signal, and when the SNR is greater than 4 dB, the recognition accuracy exceeds 95% [12]. Hou proposed a modulation recognition method for communication signals based on deep learning, which solves the end-to-end signal recognition problem by designing a deep neural network model and thus avoids the tedious process of manual feature extraction [13]. Peng proposed representing data in the form of a grid topology and combining it with a CNN to recognize modulated signals [14].
Wang proposed a convolutional neural network recognition algorithm based on the constellation diagram to identify the modulation modes of different signals [15]. Xie proposed a new modulation identification method in which high-order cumulant features are extracted first; on this basis, features of signals of different modulation types are further extracted and taken as the input of the DNN to improve the recognition accuracy [16]. Tang proposed a modulation recognition method for communication signals based on generative adversarial networks. First, an auxiliary classifier generative adversarial network (ACGAN) is used to augment the data. The classic AlexNet model is then used as the classifier. Compared with the original data set, the augmented data set gives a significant improvement in recognition accuracy [17]. Tu used pruning to reduce the convolution parameters in the CNN and the number of floating-point operations per second, and used the pruned CNN to identify the modulation of digital signals. Compared with the original CNN, this lightweight CNN can reduce the training time to 33-35% while maintaining the recognition accuracy [18]. Shi proposed a particle swarm optimization algorithm to address the problem of the DNN falling into local minima. The number of hidden-layer nodes of the DNN was optimized, and the extracted characteristics of the signal to be detected were used as the network input, which improved the recognition accuracy at low SNR [19]. Liu [20] proposed a deep complex network cascaded with a bidirectional long short-term memory network (DCN-BiLSTM) for automatic modulation recognition; its recognition rate for 11 modulation signals can reach 90% when the SNR exceeds 4 dB.
In this paper, a novel modulation recognition method of the digital signal based on CNN is proposed, called CNN + BiLSTM + attention (C-BiLSTM-A). The algorithm includes two steps of feature extraction and recognition. In the feature extraction operation, the convolution neural network was first used to extract the frequency-domain features of the signal, and then the feature was taken as input. BiLSTM was used to extract the timing features of the signal. Finally, the feature parameters were weighted and summed through the attention mechanism, and a SoftMax classifier was used to identify the modulation mode of the signal.

Algorithm Design
The structure of the C-BiLSTM-A network model proposed in this paper is shown in Figure 1. The digital signal data set was used as the input of C-BiLSTM-A. First, two one-dimensional convolution layers were used to extract coarse features of the signal; the ReLU activation function then mapped the features to a nonlinear space, and a fully connected layer mapped the nonlinear features to the sample label space. After that, the signal features were fed into the BiLSTM, whose bidirectional gating structure filters the features and retains part of the feature information to further extract the timing features of the digital signals. Because each short subsequence in a long sequence carries different feature information, its importance also differs. Therefore, an attention mechanism was introduced after the BiLSTM: the attention distribution over the input feature information is calculated, and a weighted sum gives the attention value of the different short subsequences. Finally, the result was fed to a fully connected layer and combined with the classifier to recognize the modulation mode of the digital signal.
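To make the first stage of this pipeline concrete, the following is a minimal pure-Python sketch of a one-dimensional convolution followed by ReLU. It is an illustration only, not the authors' implementation: the function names `conv1d` and `relu`, the input samples, the kernel values and the stride are all assumptions chosen for the example.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1-D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


def relu(values):
    """Element-wise rectified linear unit: max(0, v)."""
    return [max(0.0, v) for v in values]


# Hypothetical input samples and a hypothetical length-3 kernel.
samples = [1.0, -1.0, 2.0, 0.5, -0.5, 1.5, -2.0]
kernel = [0.5, 0.0, -0.5]

# Stride 2 halves the temporal resolution; ReLU zeroes negative responses.
features = relu(conv1d(samples, kernel, stride=2))
print(features)  # [0.0, 1.25, 0.75]
```

In a real network the kernel values are learned and many kernels run in parallel, so each time step yields a feature vector rather than a single number.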

Bi-Directional Long Short-Term Memory
At present, deep learning structures are mainly applied to image and natural language processing, and few network structures have been applied to the classification of sequence signals. The data processed in this paper are one-dimensional time-series signals. With reference to the convolutional neural network designed by Li, the two-dimensional convolutional layer was changed to a one-dimensional convolutional layer [21]. The rectified linear unit (ReLU) was selected as the activation function, which effectively alleviates the vanishing-gradient problem in the network and thus makes the training of the deep learning model more stable.
Digital modulation signals can be regarded as a kind of sequence, and the long short-term memory network (LSTM), a powerful structure for processing sequential signals, is a special kind of recurrent neural network (RNN) [22]. Compared with a plain RNN, LSTM can express the long-term dependencies in the input well and therefore performs better on longer sequences [23,24]. Whereas an RNN has only one transferred state, $h_t$, LSTM has two: a cell state, $c_t$, and a hidden state, $h_t$. The cell state, $c_t$, provides time dependence and time characteristics for the input data, and the LSTM realizes long-term control through it, mainly by means of three gate structures: the forget gate, the input gate and the output gate. The structure of the LSTM is shown in Figure 2.
The three gate structures in Figure 2 are the forget gate, f; the input gate, i; and the output gate, o. The candidate value, g, adds information to the cell state. The cell state filters and updates the input time-series signal and gives it time dependence, which benefits the classification and prediction of time-series signals.
For the structure in Figure 2, the learnable weights of the LSTM layer are the input weights, W; the recurrent weights, R; and the bias, b. These matrices are concatenations of the input weights, recurrent weights and biases of each component, as shown in Equation (1):
$$W = \begin{bmatrix} W_i \\ W_f \\ W_g \\ W_o \end{bmatrix}, \quad R = \begin{bmatrix} R_i \\ R_f \\ R_g \\ R_o \end{bmatrix}, \quad b = \begin{bmatrix} b_i \\ b_f \\ b_g \\ b_o \end{bmatrix} \tag{1}$$
The cell state at time t is
$$c_t = f_t \otimes c_{t-1} + i_t \otimes g_t$$
where $\otimes$ denotes the Hadamard product (element-wise product of vectors). The hidden state at time t is
$$h_t = o_t \otimes \sigma_c(c_t)$$
where $\sigma_c$ is the state activation function; in general, LSTM layers use the hyperbolic tangent function (tanh).
The working principle is as follows. The forget gate belongs to the forgetting stage, which selectively forgets the input passed in from the previous node. Specifically, $f_t$ controls which parts of the previous cell state, $c_{t-1}$, are retained and which are forgotten:
$$f_t = \sigma(W_f x_t + R_f h_{t-1} + b_f)$$
where $f_t$ is the forget gate, $h_{t-1}$ is the output of the previous hidden state, $x_t$ is the input at the current moment and $\sigma$ is the sigmoid activation function. The input gate belongs to the input stage, which restricts the input information of this stage and learns new information to replace what was forgotten. It mainly processes the input, $x_t$, at the current moment, which is controlled through the input gate $i_t$ (i stands for information):
$$i_t = \sigma(W_i x_t + R_i h_{t-1} + b_i)$$
The output gate belongs to the output stage, which determines which feature information becomes the output of the current state; this is controlled by the output gate, $o_t$:
$$o_t = \sigma(W_o x_t + R_o h_{t-1} + b_o)$$
The role of the cell state in Figure 2 is to let the information of past moments run directly along the chain, with only a few linear interactions. The candidate value that enters its update is
$$g_t = \sigma_c(W_g x_t + R_g h_{t-1} + b_g)$$
BiLSTM follows the same calculation process as LSTM and adds a reverse pass on top of it: the input sequence is reversed and the output is calculated again in the manner of LSTM. The final result is the stack of the forward LSTM and reverse LSTM outputs, which is a symmetrical structure [25]. The structure of BiLSTM is shown in Figure 3.
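The gate equations and the forward/reverse stacking described above can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the network used in the paper: the helper names (`affine`, `lstm_step`, `lstm`, `bilstm`, `constant_params`), the constant weight values, the input size of 1 and the hidden size of 2 are all hypothetical and chosen only so that the example is small.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, R, h, b):
    """Return W x + R h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(r * hi for r, hi in zip(R_row, h))
        + b_k
        for W_row, R_row, b_k in zip(W, R, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, R, b)."""
    def pre(gate):
        W, R, b = params[gate]
        return affine(W, x, R, h_prev, b)

    f = [sigmoid(v) for v in pre("f")]    # forget gate
    i = [sigmoid(v) for v in pre("i")]    # input gate
    g = [math.tanh(v) for v in pre("g")]  # candidate value
    o = [sigmoid(v) for v in pre("o")]    # output gate
    # Cell state: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # Hidden state: gated, squashed view of the cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def lstm(sequence, params, hidden_size):
    """Run the cell over a sequence and return the hidden state per step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


def bilstm(sequence, fwd_params, bwd_params, hidden_size):
    """Stack a forward pass and a pass over the reversed sequence."""
    forward = lstm(sequence, fwd_params, hidden_size)
    backward = lstm(sequence[::-1], bwd_params, hidden_size)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def constant_params(value, input_size, hidden_size):
    """Hypothetical weights: every entry equals `value`, biases are zero."""
    W = [[value] * input_size for _ in range(hidden_size)]
    R = [[value] * hidden_size for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    return {gate: (W, R, b) for gate in "figo"}


sequence = [[1.0], [-1.0], [0.5]]  # three time steps, one feature each
outputs = bilstm(sequence, constant_params(0.1, 1, 2),
                 constant_params(-0.1, 1, 2), hidden_size=2)
print(len(outputs), len(outputs[0]))  # 3 4
```

Each time step ends up with the forward and reverse hidden states side by side, which is why the output width is twice the hidden size.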

Attention Mechanism
The attention mechanism stems from the study of human vision. It can be expressed as a resource allocation model: at a given moment, attention focuses on one part of the target while ignoring the rest. Mathematically, it is a weighted transformation of the target data, and its core objective is to select, from a large amount of data, the information that is most critical to the current task [26,27]. Because the importance of different short subsequences in a long time series varies, and the salient features usually contain more information, an attention mechanism is introduced to assess the importance of each part of the target sequence.
The attention mechanism can be divided into hard attention and soft attention according to how task-related information is selected from the input. Hard attention selects the information at a certain position in the input sequence as input, for example a randomly selected piece of information or the one with the highest probability of occurrence. Soft attention does not select a single position; all pieces of information are used as input. Since soft attention is more common in practical applications, this paper uses it as the weighting method. The input information is represented as n + 1 groups, $[x_0, x_1, \ldots, x_n]$, where $x_i$ ($i = 0, 1, \ldots, n$) denotes one group of input information. From the perspective of computing resources, it is not necessary to feed all the information into the network with equal weight; it is better to emphasize the information that is more relevant to the task.
The computation of the attention mechanism can be summarized in two stages: first, the attention distribution is calculated over all input information; then, the input information is weighted according to that distribution. In detail, the calculation has three steps. The first step introduces a scoring function that calculates the correlation between the query and each input:
$$\mathrm{simil}_i = \mathrm{similarity}(\mathrm{query}, x_i)$$
The second step applies softmax to the scores from the first step. This normalization transforms the original scores into a probability distribution in which the weights of all elements sum to 1, and the internal mechanism of softmax also highlights the weights of the important elements in the sequence:
$$a_i = \mathrm{softmax}(\mathrm{simil}_i) = \frac{\exp(\mathrm{simil}_i)}{\sum_{j=0}^{n} \exp(\mathrm{simil}_j)}$$
where $a_i$ represents the attention distribution and $\mathrm{simil}_i$ refers to $\mathrm{similarity}(\mathrm{query}, x_i)$, the attention scoring mechanism. The third step uses the weight coefficients from the second step to form a weighted sum of the inputs, $x_i$, which gives the attention value:
$$\mathrm{attention} = \sum_{i=0}^{n} a_i x_i$$
The attention value can then be used for the classification prediction task. In view of the characteristics of the digital modulation signal itself, a new deep learning model is proposed in this paper. The corresponding network structure of the proposed C-BiLSTM-A model is shown in Figure 4.
In this paper, a CNN was first applied to the training data set to extract spatial features. Then, BiLSTM was introduced to further extract the temporal features of the signals, and training on both types of features improves the recognition accuracy. Although BiLSTM is suitable for the classification and prediction of sequences, the signals to be identified are all long sequences in which different short subsequences differ in importance, so BiLSTM alone cannot accurately distinguish the differences between subsequences. Therefore, the attention mechanism was introduced after BiLSTM to distinguish their importance by calculating the attention distribution of the different short subsequences in the long sequence. This method can further improve the recognition accuracy of digital signal modulation at low SNR.
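The three steps of soft attention can be illustrated with a short pure-Python sketch. The text does not specify the similarity function, so a dot product is assumed here purely for illustration; the names `softmax`, `dot` and `soft_attention`, the query vector and the input vectors are likewise hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_attention(query, inputs):
    # Step 1: score each input against the query (dot product is an assumption).
    scores = [dot(query, x) for x in inputs]
    # Step 2: turn the scores into an attention distribution.
    weights = softmax(scores)
    # Step 3: weighted sum of the inputs gives the attention value.
    dim = len(inputs[0])
    value = [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(dim)]
    return weights, value


# Hypothetical query and three hypothetical input groups x_0, x_1, x_2.
query = [1.0, 0.0]
inputs = [[0.0, 1.0], [2.0, 0.0], [0.0, -1.0]]

weights, value = soft_attention(query, inputs)
print(weights)
print(value)
```

Here the second input has the largest score, so it receives the largest weight and dominates the attention value, while the other two inputs still contribute.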

Experimental Environment and Hyperparameter Selection
To verify the generality and effectiveness of the proposed algorithm, MATLAB simulation was used to generate seven signal sets: 2ASK, 4ASK, 4FSK, BPSK, QPSK, 8PSK and 64QAM. Each signal was obtained by pulse shaping of the baseband signal, shaping filtering, carrier modulation and the addition of simulated Gaussian noise. The CNN + BiLSTM + attention (C-BiLSTM-A) network was trained with the TensorFlow 1.8.0 + Keras 2.2.4 framework. The data set required for the experiment was divided into a training set and a test set in a ratio of 9:1, and k-fold cross-validation was used to evaluate the model.
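As a rough illustration of the last step of signal generation, the sketch below adds Gaussian noise to a signal so that a target SNR in dB is met. It is not the MATLAB procedure used in the paper: pulse shaping, filtering and carrier modulation are omitted, and the function names, the random seed and the BPSK symbol mapping are assumptions made for the example.

```python
import math
import random


def average_power(signal):
    """Mean squared amplitude of a real-valued signal."""
    return sum(s * s for s in signal) / len(signal)


def noise_std_for_snr(signal_power, snr_db):
    """Noise standard deviation so that signal_power / noise_power matches snr_db."""
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power)


def add_awgn(signal, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to the requested SNR."""
    sigma = noise_std_for_snr(average_power(signal), snr_db)
    return [s + rng.gauss(0.0, sigma) for s in signal]


rng = random.Random(0)  # fixed seed so the example is repeatable
bits = [rng.randint(0, 1) for _ in range(1000)]
bpsk = [1.0 if bit else -1.0 for bit in bits]  # baseband BPSK symbols
noisy = add_awgn(bpsk, snr_db=0.0, rng=rng)
print(average_power(bpsk), noise_std_for_snr(1.0, 0.0))  # 1.0 1.0
```

At 0 dB the noise power equals the signal power, which is why the low-SNR end of the test range is so demanding for any classifier.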
Because the signal data used for training were not long, the stride of the convolution kernels was set to 2, and the kernel sizes were set to 1 × 1 and 1 × 3. The spatial features of the digital signals were extracted using the convolutional neural network. Cross-entropy was selected as the loss function:
$$H(p, q) = -\sum_{k} p(k) \log q(k)$$
where p(k) represents the true probability distribution over the classes and q(k) represents the probability distribution predicted by the trained model. The cross-entropy loss measures the similarity between two distributions and is often used in multiclass classification problems. The optimizer reduces the value of the loss function by updating the hidden-layer parameters. Adam was selected as the optimizer because of its low memory requirement, simple implementation and high efficiency. The learning rate was fixed at 0.001, and dropout was set to 0.1 to reduce the overfitting that the large number of parameters in the fully connected layer would otherwise cause.
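A small worked example may help to show how the cross-entropy loss behaves for a seven-class problem. The sketch below is illustrative only: the function name `cross_entropy`, the clipping constant `eps` and the predicted probabilities are assumptions, not values from the experiment.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p(k) * log q(k); q is clipped to avoid log(0)."""
    return -sum(pk * math.log(max(qk, eps)) for pk, qk in zip(p, q))


# One-hot true distribution over seven classes (hypothetical sample whose
# true class is the fourth one) and a hypothetical predicted distribution.
p = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
q = [0.05, 0.05, 0.05, 0.70, 0.05, 0.05, 0.05]

loss = cross_entropy(p, q)
print(loss)
```

With a one-hot target the sum collapses to the negative log of the probability assigned to the true class, so the loss falls toward zero as that probability approaches 1.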

Recognition Performance Comparison
The C-BiLSTM-A algorithm was used to identify the modulation mode of the digital signals, and CNN, C-BiLSTM, SVM [28], random forest (RF) [29] and KNN [30] were taken as comparison algorithms. The evaluation index was the recognition accuracy of the seven typical signals at SNRs from −20 dB to 18 dB. The experimental results are shown in Figure 5, which gives the recognition rate of each of the seven signals at different SNRs. As can be seen from the figure, the recognition rate of each digital signal increases with the SNR, and the recognition rate of the proposed algorithm was clearly higher than that of the comparison algorithms. When the SNR was 0 dB, the recognition accuracy of the proposed algorithm was 3-10% higher than that of the comparison algorithms. Figure 5b shows the recognition accuracy of 4ASK at different SNRs. The recognition accuracy of the C-BiLSTM-A model increased rapidly from SNR = −14 dB; at SNR = 2 dB it leveled off and approached 100%. Especially at low SNR, its recognition accuracy was better than that of the comparison models. Figure 5d shows that the recognition rate of the C-BiLSTM-A model for BPSK increased rapidly from SNR = −6 dB, was close to 90% at SNR = 6 dB and became stable at SNR = 10 dB, whereas the comparison algorithms only became stable at about SNR = 14 dB. Tables 1-7 list the recognition rates of the seven signals obtained with the different modulation recognition algorithms: C-BiLSTM-A, C-BiLSTM, CNN, random forest, KNN and SVM. The tables show that the C-BiLSTM-A algorithm had the highest recognition accuracy for all seven signals.

Table 1 shows the recognition accuracy for BPSK. The recognition accuracy of the C-BiLSTM-A model increased gradually from SNR = −6 dB and reached 0.914870 at SNR = 8 dB, whereas CNN and C-BiLSTM reached only 0.873964 and 0.830846, respectively, at the same SNR. The recognition accuracy of C-BiLSTM-A was thus more than 4 percentage points higher than that of CNN and C-BiLSTM. Among the three traditional machine learning algorithms, SVM had the highest recognition rate, 0.826976 at SNR = 8 dB. The results show that the C-BiLSTM-A model recognized BPSK better than the comparison models.
Table 2 shows the recognition accuracy for QPSK. The recognition accuracy of the C-BiLSTM-A model increased rapidly from SNR = −4 dB and was significantly better than that of the other models. At SNR = 8 dB it reached 0.829187, whereas CNN and C-BiLSTM reached only 0.784964 and 0.819237, respectively. This shows that the C-BiLSTM-A model recognized QPSK better than the comparison models.
Table 7 shows the recognition accuracy for 64QAM. The recognition accuracy of the C-BiLSTM-A model increased rapidly from SNR = −4 dB and was significantly better than that of the other models. At SNR = 10 dB it reached 0.933112 and leveled off, whereas CNN and C-BiLSTM achieved only 0.883914 and 0.878386, respectively, which is about 4.9 and 5.5 percentage points lower. Among the three traditional machine learning algorithms, SVM had the highest recognition rate, 0.883361 at SNR = 10 dB. The results show that the C-BiLSTM-A model recognized 64QAM better than the comparison models.

It can be seen from the table that the recognition accuracy of the method proposed in this paper was higher than that of other comparison algorithms at low SNR.
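Per-SNR recognition rates of the kind tabulated here are typically obtained by grouping test samples by their SNR and computing the fraction classified correctly in each group. The sketch below illustrates this under stated assumptions: the function name `accuracy_per_snr`, the integer label encoding and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy_per_snr(y_true, y_pred, snr_db):
    """Map each SNR value to the fraction of samples at that SNR classified correctly."""
    y_true, y_pred, snr_db = map(np.asarray, (y_true, y_pred, snr_db))
    return {
        int(snr): float(np.mean(y_pred[snr_db == snr] == y_true[snr_db == snr]))
        for snr in np.unique(snr_db)
    }


# Toy labels (hypothetical, not data from the paper): 0 = BPSK, 1 = QPSK.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]
snr_db = [-6, -6, -6, -6, 8, 8, 8, 8]

acc = accuracy_per_snr(y_true, y_pred, snr_db)
print(acc)
```

Restricting `y_true` and `y_pred` to a single modulation type before calling the function would give a per-signal curve like those in Figure 5.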
The confusion matrices of the algorithm models obtained experimentally at SNR = 0 dB are shown in Figure 6. The rightmost column of each chart shows, for all samples predicted to belong to each category, the percentages correctly and incorrectly classified; the former is called precision. The bottom row shows, for all samples that actually belong to each category, the percentages correctly and incorrectly classified; the former is called the recall rate. The lower right cell shows the overall accuracy. The confusion matrix thus makes the misclassifications of the different algorithms visible at a glance, and it shows that the error rate of the proposed algorithm was smaller than that of the comparison algorithms. The C-BiLSTM-A model easily confused the [QPSK, 8PSK] modulation types, which means that its recognition performance on this pair was only moderate. In terms of accuracy, the recognition accuracy for 4ASK, 4FSK, BPSK and 64QAM was high, exceeding 90%, while the recognition accuracy for QPSK, 2ASK and 8PSK was more moderate but still exceeded 70%. In terms of recall rate, 4ASK exceeded 90%, 2ASK reached 70%, and the result for BPSK was moderate. Overall, however, the deep learning algorithm designed in this paper achieved a very good recognition effect for digital signal modulation.
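The precision, recall rate and overall accuracy read from a confusion matrix can be computed as in the following minimal sketch. It assumes that rows index the true class and columns the predicted class (the layout of Figure 6 may be transposed), and the function name `summarize_confusion` and the three-class counts are hypothetical, not results from the paper.

```python
import numpy as np


def summarize_confusion(cm):
    """Return per-class precision, per-class recall and overall accuracy.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    predicted_totals = cm.sum(axis=0)  # column sums: samples predicted as each class
    true_totals = cm.sum(axis=1)       # row sums: samples that belong to each class
    precision = np.divide(correct, predicted_totals,
                          out=np.zeros_like(correct), where=predicted_totals > 0)
    recall = np.divide(correct, true_totals,
                       out=np.zeros_like(correct), where=true_totals > 0)
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy


# Hypothetical counts for three classes, with QPSK and 8PSK often confused.
labels = ["BPSK", "QPSK", "8PSK"]
cm = [[90, 5, 5],
      [5, 70, 25],
      [5, 35, 60]]

precision, recall, accuracy = summarize_confusion(cm)
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

In this toy example the two easily confused classes lower each other's precision and recall, while the well-separated class is barely affected.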

Conclusions
This paper proposed the C-BiLSTM-A algorithm, which improves the recognition accuracy of digital signal modulation by combining CNN, BiLSTM and the attention mechanism. Extensive experimental results show that the proposed C-BiLSTM-A model achieves good recognition results for the modulation types 2ASK, 4ASK, 4FSK, BPSK, QPSK, 8PSK and 64QAM, and that its recognition accuracy is about 5% higher than that of the comparison algorithms at low signal-to-noise ratio. This indicates that adding BiLSTM and the attention mechanism to CNN substantially improves the recognition performance of the model. The results also indicate that the deep learning model generalizes better than traditional machine learning methods in digital signal modulation recognition. Many aspects of this work can still be improved; for example, the network structure and hyperparameters leave room for optimization, and follow-up work will continue to refine them.
Author Contributions: Conceptualization, writing-review and editing, A.W. and K.J.; methodology, software and validation, X.Q. and J.Z. All authors have read and agreed to the published version of the manuscript.