Automatic Modulation Classification of Digital Communication Signals Using SVM Based on Hybrid Features, Cyclostationary, and Information Entropy

Since digital communication signals are widely used in radio and underwater acoustic systems, the modulation classification of these signals has become increasingly significant in various military and civilian applications. However, adverse channel transmission characteristics and a low signal-to-noise ratio (SNR) make the modulation classification of communication signals extremely challenging. In this paper, a novel method for automatic modulation classification of digital communication signals using a support vector machine (SVM) based on hybrid features, cyclostationary, and information entropy is proposed. In the proposed method, cyclostationary theory and entropy are combined with the existing signal features to obtain three new features that assist the classification of digital communication signals: the maximum value of the normalized cyclic spectrum when the cyclic frequency is not zero, the Shannon entropy of the cyclic spectrum, and the Renyi entropy of the cyclic spectrum. Because these new features do not require any prior information and have a strong anti-noise ability, they are well suited to the identification of communication signals. Finally, a one-against-one SVM is designed as the classifier. Simulation results show that the proposed method outperforms the existing methods in terms of classification performance and noise tolerance.


Introduction
Automatic modulation classification (AMC) of digital communication signals has become an established research area [1]. It plays an important role in many applications. Some of these applications are civilian, such as signal confirmation and spectrum management. Others are military, such as surveillance, electronic warfare, and threat analysis. In the military case, recognizing the modulation type of an adversary's signals is of great significance for analyzing and interfering with the information they carry.
Many methods for the modulation recognition of communication signals have been published in recent years. In general, these methods can be divided into two categories: one is based on the decision-theoretic framework and the other is based on statistical pattern recognition. In the decision-theoretic approach, the decision is made by maximizing the probability of a certain modulation being sent given the received signal. The maximum likelihood algorithm is the most popular algorithm used in this approach. In the pattern recognition approach, which is widely used in practical engineering, the decision is made based on a set of features extracted from the intercepted signal. Feature extraction is usually followed by a pattern recognizer that determines the signal modulation. The following is an overview of some of these modulation recognition algorithms.
In this paper, three new features are proposed to assist the classification of digital communication signals: the maximum value of the normalized cyclic spectrum when the cyclic frequency is not zero, the Shannon entropy of the cyclic spectrum, and the Renyi entropy of the cyclic spectrum. Since these new features do not require any prior information and have a strong anti-noise ability, they are well suited to the identification of communication signals. Finally, a one-against-one SVM is used as the classifier. Simulation results show that the proposed method performs well in low SNR conditions.
The rest of the paper is arranged as follows: In Section 2, the mathematical model of the communication signals to be identified is given. In Section 3, the proposed features for signal classification in this paper are described in detail. The proposed SVM classifier is given in Section 4. The simulation results are displayed in Section 5. Finally, conclusions are addressed in Section 6.

Signal Model
In this work, the mathematical model of the signals to be recognized is expressed as
$$y(n) = x(n) + v(n), \tag{1}$$
where $n = 0, 1, \cdots, N-1$ and $N$ represents the signal length. $x(n)$, $y(n)$, and $v(n)$ are respectively the transmitted modulated signal, the intercepted signal, and the noise sample at discrete time $n$.
The transmitted signal $\{x(n), n = 0, 1, \cdots, N-1\}$ is drawn from an unknown constellation set $\Psi$, which in turn belongs to a set of possible modulation formats $\{\Psi_1, \Psi_2, \cdots, \Psi_K\}$. The modulation classification problem refers to the determination of the constellation set $\Psi$ to which the transmitted signal belongs, based on the intercepted signal $\{y(n), n = 0, 1, \cdots, N-1\}$. In this paper, we consider the following digital communication signals for classification: BPSK, QPSK, 2FSK, 4FSK, and MSK.
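The additive-noise model above can be illustrated with a minimal sketch. The sampling frequency and signal length follow the simulation settings given later in the paper; the carrier frequency, the symbol rate, and the helper names `psk_signal` and `add_awgn` are assumptions introduced only for illustration.

```python
import numpy as np

def psk_signal(M, N, fs, fc, symbol_rate, rng):
    """Real M-PSK passband signal with rectangular pulses (illustrative parameters)."""
    samples_per_symbol = int(fs / symbol_rate)
    num_symbols = int(np.ceil(N / samples_per_symbol))
    # One random phase from {2*k*pi/M} per symbol, held for the symbol period.
    phases = 2.0 * np.pi * rng.integers(0, M, size=num_symbols) / M
    theta = np.repeat(phases, samples_per_symbol)[:N]
    n = np.arange(N)
    return np.cos(2.0 * np.pi * fc * n / fs + theta)

def add_awgn(x, snr_db, rng):
    """Return y(n) = x(n) + v(n), with white Gaussian v(n) scaled to the requested SNR."""
    signal_power = np.mean(np.abs(x) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    v = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + v

rng = np.random.default_rng(0)
# fs and N as in the paper's simulations; fc and symbol_rate are assumed values.
x = psk_signal(M=2, N=4096, fs=10_000.0, fc=2_000.0, symbol_rate=500.0, rng=rng)
y = add_awgn(x, snr_db=0.0, rng=rng)
```

Setting `M=4` in the same call would give a QPSK realization; the FSK and MSK signals would need their own generators.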

Instantaneous Features
In [3], instantaneous features, which contain hidden modulation information in a single domain, were demonstrated to be suitable for signal classification. According to the considered digital communication signals in this paper, the following instantaneous features are selected.
(1) $\sigma_{ap}$: the standard deviation of the absolute value of the nonlinear component of the instantaneous phase in the non-weak segments of the intercepted signal:
$$\sigma_{ap} = \sqrt{\frac{1}{C}\sum_{A_n(n) > a_t} \phi_{NL}^2(n) - \left(\frac{1}{C}\sum_{A_n(n) > a_t} \left|\phi_{NL}(n)\right|\right)^2}, \tag{2}$$
where $A(n)$ denotes the instantaneous amplitude and $\phi(n)$ denotes the instantaneous phase of the intercepted signal, both at time instants $t = n/f_s$. $A_n(n) = A(n)/m_a$, where $m_a$ is the average value of the instantaneous amplitude over one frame, that is,
$$m_a = \frac{1}{N}\sum_{n=0}^{N-1} A(n). \tag{3}$$
$\phi_{NL}(n)$ is the value of the nonlinear component of the instantaneous phase at time instants $t = n/f_s$, $a_t$ is a threshold for $A_n(n)$ below which the estimation of the instantaneous phase is very sensitive to the noise, and $C$ is the number of samples in $\{\phi_{NL}(n)\}$ for which $A_n(n) > a_t$. $\sigma_{ap}$ is mainly used to distinguish the MPSK signals, and it can also differentiate the modulation schemes of MFSK signals to some extent.
(2) $\sigma_{af}$: the standard deviation of the absolute value of the normalized-centered instantaneous frequency over the non-weak segments of the intercepted signal:
$$\sigma_{af} = \sqrt{\frac{1}{C}\sum_{A_n(n) > a_t} f_N^2(n) - \left(\frac{1}{C}\sum_{A_n(n) > a_t} \left|f_N(n)\right|\right)^2}, \tag{4}$$
where $f_N(n)$ denotes the normalized-centered instantaneous frequency. $\sigma_{af}$ can differentiate between the modulation types without frequency information and the FSK modulation types, and also between 2FSK and 4FSK.
Figures 1 and 2 show the relationship between the features $\sigma_{ap}$ and $\sigma_{af}$ of different modulation signals and the SNR. In this simulation, the sampling frequency is $f_s = 10$ kHz, the signal length is $N = 4096$, and the noise $v(n)$ is white Gaussian noise. According to Figures 1 and 2, although the features $\sigma_{ap}$ and $\sigma_{af}$ differ between signals, the difference is not obvious when the SNR is low. Therefore, we need to extract other new features to assist the identification of the modulation types of these signals in low SNR environments, and these features are introduced in the next section.
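As a rough illustration of how the two instantaneous features could be computed, the following sketch derives the instantaneous amplitude, phase, and frequency from an FFT-based analytic signal. Several choices here are assumptions not stated in the paper: the linear phase is removed by a least-squares line fit, the centered instantaneous frequency is normalized by $f_s$, the default threshold is `a_t = 1.0`, and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(y):
    """FFT-based analytic signal (discrete Hilbert-transform construction)."""
    N = len(y)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(y) * h)

def abs_std(values):
    """sqrt(mean(v^2) - mean(|v|)^2), i.e. the standard deviation of |v|."""
    spread = np.mean(values ** 2) - np.mean(np.abs(values)) ** 2
    return float(np.sqrt(max(spread, 0.0)))  # guard against tiny negative round-off

def instantaneous_features(y, fs, a_t=1.0):
    """Return (sigma_ap, sigma_af) over the non-weak samples A_n(n) > a_t."""
    z = analytic_signal(np.asarray(y, dtype=float))
    amplitude = np.abs(z)
    a_n = amplitude / np.mean(amplitude)          # A_n(n) = A(n) / m_a
    phase = np.unwrap(np.angle(z))
    n = np.arange(len(phase))
    # Assumption: the linear (carrier) phase is removed by a least-squares line fit.
    slope, intercept = np.polyfit(n, phase, 1)
    phi_nl = phase - (slope * n + intercept)
    # Instantaneous frequency from the phase difference, centered and divided by fs.
    f_inst = np.diff(phase) * fs / (2.0 * np.pi)
    f_n = (f_inst - np.mean(f_inst)) / fs
    strong = a_n > a_t
    sigma_ap = abs_std(phi_nl[strong])
    sigma_af = abs_std(f_n[strong[1:]])
    return sigma_ap, sigma_af
```

For an unmodulated tone both values are close to zero, because the phase is purely linear and the instantaneous frequency is constant; modulated signals produce larger values.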

The Cyclostationary of Communication Signals
A discrete-time process $x(n)$ is cyclostationary if the discrete-time cyclic autocorrelation function
$$R_x^{\alpha}(k) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n+k)\, x^{*}(n)\, e^{-i2\pi\alpha n}\, e^{-i\pi\alpha k} \tag{5}$$
exists and is not identically zero for some cyclic frequency $\alpha \neq 0$. A particularly convenient characterization of cyclostationary is the cyclic spectrum, which is the Fourier transform of $R_x^{\alpha}(k)$:
$$S_x^{\alpha}(f) = \sum_{k=-\infty}^{\infty} R_x^{\alpha}(k)\, e^{-i2\pi f k}. \tag{6}$$
In (5) and (6), $i = \sqrt{-1}$, $*$ denotes the conjugation operation, $\alpha$ is called the cyclic frequency, and $f$ is the spectral frequency. Moreover, the cyclic autocorrelation function and the cyclic spectrum reduce to the conventional autocorrelation function and the power spectral density when the cyclic frequency $\alpha = 0$.
The cyclic spectrum can be calculated directly through the double limit of the time-smoothed cyclic periodogram, where ∆ f and ∆t are called the frequency and time resolutions of the estimation. The time-smoothed cyclic periodogram is used to estimate the cyclic spectrum point by point. When x(n) is the MPSK signal, the mathematical model of x(n) can be described as and where f c denotes the carrier frequency, A denotes the signal amplitude, θ m ∈ {2kπ/M, k = 0, 1, · · · , M − 1} is the phase of the transmitted MPSK symbol, and T 0 denotes the symbol period. Here, q (n) is a rectangular pulse, and therefore When M = 2, x (n) is a BPSK signal, then based on (6) the cyclic spectrum of the BPSK signal can be obtained for all integers p. Similarly, when M ≥ 4, the cyclic spectrum of the MPSK signal can be written as According to (14) and (15), we can see that the cyclic spectra of the BPSK signals have large values when α = p/T 0 and α = ±2 f c + p/T 0 . However, the cyclic spectra of the MPSK (M ≥ 4) signals only have nonzero values at α = p/T 0 .
Similarly, the type of FSK signals can be expressed as where the M frequencies { f c + f m , m = 1, 2, · · · , M} are keyed randomly. When the phase sequences θ m are constant, the FSK is called clock phase coherent FSK, and the MSK signal belongs to this type of signal. Simultaneously, (16) can be written as where and therefore then based on (6) the cyclic spectrum of (18) can be given by Letting its maximum values at f = ± f m , and if f m T 0 are integers, there are additional peaks at α = ± f m and f = 0. There are also secondary maxima, down by the factor M − 1 from the primary maximum, at ±α = f m ± f n and ± f = ( f m ∓ f n )/2.
When the phase sequence {θ m (r) , m = 1, 2, · · · , M} = θ n is independent and identically distributed with uniform fraction of time distribution over (−π, π], x (n) in (16) is called phase incoherent FSK, and the 2FSK and 4FSK signals belong to this type of signal. In this case, (16) can be re-written as and where For a purely stationary f (r) with discrete M-ary fraction of time distribution {P m } M 1 , the cyclic spectrum of (22) can be expressed as Comparing (21) with (25), we can see that there are no impulses in (18), and there are no peaks at α = ±2 f m for phase incoherent FSK.
Figure 3 shows the cyclic spectra of different communication signals under different SNR environments. The noise is additive white Gaussian noise (AWGN). From Figure 3, it is clear that the cyclic spectra (α > 0) of these communication signals are not only distinct but also have strong noise immunity, which means the cyclic spectrum is a very good tool for identifying these signals.
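The point-by-point estimation with a time-smoothed cyclic periodogram described in this section can be sketched as follows. This is a simplified variant under stated assumptions rather than the estimator used in the paper: segments do not overlap, the cyclic frequency is restricted to integer multiples of the bin spacing $f_s/L$ (so no per-segment phase correction is needed), and the asymmetric product $X(f)X^{*}(f-\alpha)$ is used, which equals the symmetric form shifted by $\alpha/2$ in $f$. The name `cyclic_spectrum` and its parameters are hypothetical.

```python
import numpy as np

def cyclic_spectrum(x, seg_len, max_shift):
    """Time-smoothed cyclic periodogram on the FFT bin grid.

    Row m holds the estimate at cyclic frequency alpha = m * fs / seg_len;
    column k corresponds to spectral bin k. Row 0 is the ordinary
    (Welch-style) power spectrum estimate.
    """
    x = np.asarray(x)
    num_segments = len(x) // seg_len
    segments = x[:num_segments * seg_len].reshape(num_segments, seg_len)
    # Data-tapered short-time spectra; seg_len sets the frequency resolution.
    X = np.fft.fft(segments * np.hanning(seg_len), axis=1)
    S = np.empty((max_shift + 1, seg_len), dtype=complex)
    for m in range(max_shift + 1):
        # np.roll(X, m)[k] = X[k - m], so this is X(f_k) * conj(X(f_k - alpha_m)),
        # averaged over segments (the time smoothing).
        S[m] = np.mean(X * np.conj(np.roll(X, m, axis=1)), axis=0) / seg_len
    return S
```

A real sinusoid at bin 8 of a 64-sample segment, for example, produces a cyclic feature at a shift of 16 bins (twice its frequency), which mirrors the $\alpha = \pm 2 f_c$ features noted for BPSK.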

The Feature of the Cyclic Spectrum
From Figure 4 and the theory mentioned in Section 3.2.1, we can see that the maximum $H_c$ of the normalized cyclic spectrum at nonzero cyclic frequency is a good characteristic for distinguishing the communication signals. $H_c$ is defined as
$$H_c = \max_{\alpha \neq 0} \left\{ \left| \tilde{S}_x^{\alpha}(f) \right| \right\}, \tag{26}$$
where $\tilde{S}_x^{\alpha}(f)$ denotes the normalized cyclic spectrum, $\max\{\cdot\}$ denotes taking the maximum value, and $|\cdot|$ represents taking the amplitude. The relationship between the feature $H_c$ of different modulation signals and the SNR is shown in Figure 4. The signal propagation channel is AWGN. It can be seen from Figure 4 that the feature $H_c$ varies significantly between modulation signals, which means $H_c$ is a good feature to distinguish them.
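A minimal sketch of this feature is given below. It assumes the cyclic spectrum is stored as a two-dimensional array whose first row is the $\alpha = 0$ slice, and it assumes that normalization means dividing by the largest magnitude over the whole array; the paper's exact normalization is not recoverable from the text, so this choice and the name `max_cyclic_feature` are illustrative only.

```python
import numpy as np

def max_cyclic_feature(S):
    """H_c: largest normalized cyclic-spectrum magnitude over alpha != 0.

    S[0] is taken to be the alpha = 0 slice (the power spectrum).
    Assumption: magnitudes are normalized by the global maximum of |S|.
    """
    magnitude = np.abs(np.asarray(S))
    return float(magnitude[1:].max() / magnitude.max())
```

With this normalization the feature lies in [0, 1], and a value near 1 indicates a cyclic feature as strong as the strongest spectral line.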

The Information Entropy Features of the Cyclic Spectrum
Information entropy was first proposed by Shannon to measure the uncertainty of a signal distribution, and it represents the degree of complexity of the signal. Therefore, information entropy provides a theoretical basis for describing signal characteristics. Presently, entropy is applied in many fields [22-28]. Because of the symmetry of the cyclic spectrum, and according to [17], the Shannon entropy of the cyclic spectrum can be defined as
$$H_s = -\sum_{\alpha}\sum_{f} P_x^{\alpha}(f)\, \log_2 P_x^{\alpha}(f), \tag{28}$$
where $P_x^{\alpha}(f)$ is the normalized cyclic spectrum component at $(\alpha, f)$, so that the components sum to one. From (28), the entropy $H_s$ has several important properties:
(I) Symmetry: when the order of the components $P_x^{\alpha}(f)$ is changed, $H_s$ does not change, which means the entropy depends only on the overall statistical properties of the data set. According to this property, the entropy $H_s$ is robust to signal modulation parameters such as carrier frequency, code rate, etc.
(II) Non-negative property: the entropy $H_s$ is a non-negative value, that is, $H_s \geq 0$.
(III) Extreme property: when all components in the data set occur with equal probability, the entropy $H_s$ reaches its maximum value, that is, $H_s \leq \log_2 \Omega$, where $\Omega$ represents the number of components $P_x^{\alpha}(f)$.
Similarly, according to [17], the two-dimensional Renyi entropy of the cyclic spectrum can be defined as follows:
$$H_{\beta} = \frac{1}{1-\beta} \log_2 \sum_{\alpha}\sum_{f} \left( P_x^{\alpha}(f) \right)^{\beta}, \tag{32}$$
where $\beta$ is the order of the Renyi entropy of the cyclic spectrum, with $\beta \geq 0$ and $\beta \neq 1$. Compared with the Shannon entropy, the Renyi entropy can better reflect the difference between two different distributions [29]. The relationship between the information entropies $H_s$ and $H_{\beta}$ of different modulation signals and the SNR is shown in Figures 5 and 6. In these simulations, the sampling frequency is $f_s = 10$ kHz, the signal length is $N = 4096$, and, without loss of generality, the order $\beta$ is set to 5. The noise $v(n)$ is white Gaussian noise. From Figure 5, it is clear that the entropy $H_s$ is a good feature to distinguish 2FSK, 4FSK, and MSK in a low SNR environment. Similarly, the entropy $H_{\beta}$ is a good feature to distinguish between BPSK and QPSK, 4FSK and 2FSK, and 4FSK and MSK when the SNR is low.
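The two entropy features can be sketched in a few lines. The sketch assumes that the probabilities are obtained by dividing each cyclic-spectrum magnitude by the sum of all magnitudes, which is one plausible normalization and not necessarily the one used in [17]; the function names are hypothetical.

```python
import numpy as np

def spectrum_probabilities(S):
    """Normalize |S| so that all components sum to one (assumed normalization)."""
    magnitude = np.abs(np.asarray(S))
    return magnitude / magnitude.sum()

def shannon_entropy(P):
    """H_s = -sum P log2 P, with 0 * log2(0) taken as 0."""
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

def renyi_entropy(P, beta):
    """H_beta = log2(sum P^beta) / (1 - beta), for beta >= 0 and beta != 1."""
    if beta < 0 or beta == 1:
        raise ValueError("beta must satisfy beta >= 0 and beta != 1")
    p = P[P > 0]
    return float(np.log2(np.sum(p ** beta)) / (1.0 - beta))
```

For a uniform distribution over $\Omega$ components both functions return $\log_2 \Omega$, which is the extreme property stated above; for any other distribution the order-5 Renyi entropy is smaller than the Shannon entropy, since it weights the dominant components more heavily.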

The Proposed SVM Classifier
Traditional artificial neural networks (ANNs) often encounter problems such as overfitting and convergence to local minima. Meanwhile, the large amount of sample data needed to fully train an ANN cannot be guaranteed in practical applications [30][31][32]. The SVM, which is based on the structural risk minimization criterion, can not only minimize the classification error but also improve the generalization ability, and it has an outstanding ability to learn from small samples. Therefore, this paper uses the SVM to design a classifier that automatically identifies the types of the modulation signals.
Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \cdots, l$, where $x_i \in R^n$ is the input vector and $y_i \in \{1, -1\}$ is the class label, the mathematical model of the two-class SVM classifier can be defined as follows:
$$\min_{w, b, \xi}\ \frac{1}{2} w^T w + D \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i \left( w^T \Phi(x_i) + b \right) \geq 1 - \xi_i,\ \ \xi_i \geq 0, \tag{33}$$
where $i = 1, 2, \cdots, l$, $w$ is the vector of weight coefficients, $\xi_i \geq 0$ is the slack variable for the errors, and $D > 0$ is the penalty parameter of the error term; a larger $D$ corresponds to assigning a higher penalty to errors. Each $x_i$ is mapped to $\Phi(x_i)$ in the kernel-induced feature space, which is related to the kernel function
$$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j). \tag{34}$$
The standard SVM then tries to find a hyperplane $w^T \Phi(x) + b$ that has a large margin and a small training error. The kernel function has many types, such as the linear function, the polynomial function, the radial basis function (RBF), and the sigmoid function. The expressions of these functions are given as follows [33]:
(I) The linear kernel function:
$$K(x_i, x_j) = x_i^T x_j. \tag{35}$$
(II) The polynomial kernel function:
$$K(x_i, x_j) = \left( \gamma x_i^T x_j + r \right)^d. \tag{36}$$
(III) The RBF kernel function:
$$K(x_i, x_j) = \exp\left( -\gamma \left\| x_i - x_j \right\|^2 \right). \tag{37}$$
(IV) The sigmoid kernel function:
$$K(x_i, x_j) = \tanh\left( \gamma x_i^T x_j + r \right), \tag{38}$$
where $\gamma$ is the reciprocal of the number of signal types to be classified, and $r$ and $d$ are kernel parameters. In this paper, $\gamma = 1/5$. The effect of these kernel functions on the classification performance of the SVM is discussed in detail in the next section.
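For concreteness, the four kernel types listed above can be written as small functions. The sketch uses the commonly used parameterized forms of these kernels with $\gamma = 1/5$ as stated in the text; the offset `r = 0` and degree `d = 3` are assumed defaults that the paper does not specify, and the function names are hypothetical.

```python
import numpy as np

GAMMA = 1.0 / 5.0  # gamma = 1/5, as stated in the text

def linear_kernel(x, z):
    return float(np.dot(x, z))

def polynomial_kernel(x, z, gamma=GAMMA, r=0.0, d=3):
    # r and d are assumed values, not taken from the paper.
    return float((gamma * np.dot(x, z) + r) ** d)

def rbf_kernel(x, z, gamma=GAMMA):
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(x, z, gamma=GAMMA, r=0.0):
    # r is an assumed value, not taken from the paper.
    return float(np.tanh(gamma * np.dot(x, z) + r))

def gram_matrix(X, kernel):
    """Kernel (Gram) matrix K[i, j] = kernel(X[i], X[j]) for a set of feature vectors."""
    X = np.asarray(X, dtype=float)
    return np.array([[kernel(a, b) for b in X] for a in X])
```

The RBF kernel depends only on the distance between two feature vectors and equals 1 when they coincide, whereas the other three depend on their inner product.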
The SVM was originally designed only for two-class problems. To handle the multi-class problem, a multiclass SVM comprising ten two-class sub-SVMs is designed using the one-versus-one algorithm. The number of sub-SVMs is $U(U-1)/2$, where $U$ is the number of signal types; with $U = 5$ this gives ten sub-SVMs. Figure 7 shows the classification procedure structure of the multiclass SVM proposed in this paper.
Figure 7. The structure of multiclass SVM classifier based on ten sub-SVMs using the one-versus-one algorithm.
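The one-versus-one structure can be illustrated by enumerating the $U(U-1)/2$ class pairs and taking a majority vote over the pairwise decisions. In this sketch each sub-classifier is a stand-in nearest-centroid rule rather than a trained SVM, and the centroid values are invented purely so the example runs; only the pairing and voting logic reflects the structure described above.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["BPSK", "QPSK", "2FSK", "4FSK", "MSK"]

def make_pairwise_classifiers(centroids):
    """Build one two-class decision rule per unordered class pair."""
    classifiers = {}
    for a, b in combinations(CLASSES, 2):
        def decide(x, a=a, b=b):
            # Stand-in for a trained two-class sub-SVM: pick the nearer centroid.
            dist_a = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[a]))
            dist_b = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[b]))
            return a if dist_a <= dist_b else b
        classifiers[(a, b)] = decide
    return classifiers

def one_vs_one_predict(x, classifiers):
    """Each sub-classifier casts one vote; the class with the most votes wins."""
    votes = Counter(decide(x) for decide in classifiers.values())
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature centroids, for illustration only.
toy_centroids = {name: (float(k), float(k * k)) for k, name in enumerate(CLASSES)}
sub_classifiers = make_pairwise_classifiers(toy_centroids)
```

Because the ten sub-classifiers are independent of one another, they can be evaluated in parallel before the votes are counted.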

Simulation Results
This section shows the simulation results of the proposed method for the classification of the considered digital modulation signals {BPSK, QPSK, 2FSK, 4FSK, MSK}. The feature set adopted in these tests is $\{\sigma_{ap}, \sigma_{af}, H_c, H_s, H_{\beta}\}$. The sampling frequency is $f_s = 10$ kHz and the signal length is $N = 4096$. The noise is additive white Gaussian noise, added according to SNRs {−5 dB, −4 dB, · · · , 20 dB}. Each modulation type has 2000 realizations, and half of the realizations with SNRs of −5 dB, 0 dB, 5 dB, 15 dB, and 20 dB are used for training. Simulation results are given in figures and tables, and we use the accuracy metric to assess the recognition performance. Furthermore, confusion matrices are given for selected experiments. Tables 1-3 show the confusion matrices of the proposed method over the AWGN channel under different SNRs. The kernel function used here is the RBF function. From these tables, the overall accuracy of the proposed method for the different modulation signals reaches 85.92% when the SNR = 0 dB, and it is greater than 99% when the SNR ≥ 6 dB. To evaluate the performance of different kernel functions for the multiclass SVM, Table 4 shows the overall accuracy of the proposed method when using different kernel functions at SNR = 6 dB. According to Table 4, for the method proposed in this paper, the RBF function has the best performance and the sigmoid function has the worst performance. Therefore, the RBF function is recommended as the kernel function of the SVM classifier designed in this paper.
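The overall accuracy reported from a confusion matrix is the fraction of realizations on the main diagonal. A minimal sketch follows, assuming rows index the true class and columns the predicted class; the small matrix is a made-up two-class example and does not reproduce any table in the paper.

```python
import numpy as np

def overall_accuracy(confusion):
    """Correctly classified realizations divided by all realizations."""
    c = np.asarray(confusion, dtype=float)
    return float(np.trace(c) / c.sum())

def per_class_accuracy(confusion):
    """Diagonal count divided by the row total (rows = true class, assumed)."""
    c = np.asarray(confusion, dtype=float)
    return np.diag(c) / c.sum(axis=1)

# Hypothetical counts for two classes, for illustration only.
toy_confusion = np.array([[9, 1],
                          [2, 8]])
```

Here `overall_accuracy(toy_confusion)` evaluates to 17/20, and `per_class_accuracy` returns the two row-wise rates 9/10 and 8/10.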

Classification in AWGN Channel
In practical applications, the complexity of the algorithm is an important consideration. To measure the computational complexity of the proposed method, the recognition time of each sub-SVM is shown in Table 5. The simulation is implemented on a computer with a 2.6 GHz Intel Core i5-3230M CPU and 4 GB of RAM, under the 64-bit Windows 7 system (Microsoft, Redmond, WA, USA). The multiclass SVM is implemented in MATLAB R2011b (MathWorks, Natick, MA, USA). In practice, it is easy to find a DSP with similar performance, such as the TMS320C6678. Since the proposed SVM classifier uses a parallel structure, the time spent by the SVM classifier is equal to the maximum time spent by any one of the sub-SVMs. From Table 5, the maximum time of the sub-SVMs is 35.708 µs, which is acceptable in practical applications.
To show the superiority of the method proposed in this paper, its performance is compared with that of the existing methods in [7,17]. Figure 8 shows the overall accuracy of the different methods under different SNRs. The test uses 1000 Monte Carlo experiments. According to Figure 8, when the SNR < 5 dB, the recognition performance of the proposed method is better than that of the methods in [7,17], and when the SNR ≥ 5 dB, the recognition performance of the proposed method is comparable to that of the method in [17].

Classification in Fading Channel
In practical environments, the propagation of signals is often affected by the channel. Table 6 shows the performance of the proposed method when the channel is a Rayleigh channel and the SNR = 6 dB. It is assumed that there are two multipath components, with delays of 0.005 s and 0.01 s and frequency deviations of 5 Hz and 10 Hz, respectively. From Table 6 we can see that the overall accuracy is 99.22% in this case, which is comparable to that shown in Table 2. The reason is that the multipath effect changes the amplitude of the cyclic spectrum but not its shape, so it has little influence on the entropy characteristics proposed in this paper, and the performance of the proposed method is not seriously affected.

Conclusions
Since digital communication signals are widely used in various military and civilian applications, it is important to improve the recognition rate of digital communication signals. In this paper, a novel signal classification method using SVM based on hybrid features, cyclostationary, and information entropy is proposed. The method combines cyclostationary theory and entropy and uses a one-against-one SVM as the classifier. Simulation results show that the proposed method achieves good recognition performance for the signals considered in this paper in low SNR environments and fading channels.