Cooperative Spectrum Sensing Based on Convolutional Neural Networks

Cooperative spectrum sensing (CSS) is an important topic because it can mitigate the hidden terminal problem. However, the sensing performance of CSS is still poor, especially in low signal-to-noise ratio (SNR) situations. In this paper, convolutional neural networks (CNNs) are used to extract features of the observed signal and thereby improve the sensing performance. More specifically, a novel two-dimensional dataset of the received signal is established, and three CSS schemes based on classical CNNs (LeNet, AlexNet and VGG-16) are trained and analyzed on the proposed dataset. In addition, the sensing performance of the proposed CNN-based CSS schemes is compared with that of the AND, OR and majority voting-based CSS schemes. The simulation results show that the proposed schemes greatly improve the sensing accuracy and that greater network depth contributes to this improvement.


Introduction
The rapid development of wireless communication technology has led to more and more wireless network services. The radio spectrum, as the most valuable resource in wireless networks, cannot meet the requirements of wireless services at present and in the future [1]. The existing fixed spectrum allocation method makes the spectrum utilization low and seriously uneven. According to the investigation, the average spectrum utilization is less than 5% at any time or place [2]. Dynamic spectrum access (DSA) is considered to be the main technical solution to the contradiction between supply and demand [3]. As the basis of DSA, cognitive radio (CR) technology has become one of the most cutting-edge research topics in the field of wireless networking.
CR has the ability to interact with its communication environment and change its transmission parameters according to the result. The cognitive radio network (CRN) [4][5][6] is a network with CR as the service terminal. In the CRN, licensed and unlicensed users coexist, where the licensed user is named the primary user (PU) and the unlicensed user is called the secondary user (SU). Users equipped with CR can sense idle spectrum resources and maximize their utilization without interfering with the PU. Research on the CRN is mainly focused on spectrum sensing (SS), spectrum sharing, spectrum access and spectrum mobility [7]. Unlike the channel allocation of traditional wireless networks, the channel allocation of the CRN is based on real-time sensing of channel conditions [8]. As a result, SS is the foundation of CR.
The main function of SS is to detect the spectrum holes available to the SU and to monitor the signal activity of the PU, so that when the PU uses the spectrum again [9], the SU can quickly vacate the corresponding frequency band. Two metrics characterize the performance of SS, namely the false alarm probability and the detection probability. A higher false alarm probability lowers the throughput of the CR, while the detection probability reflects how well the PU is protected. Classical SS schemes contain single node-based

1. A novel two-dimensional dataset of the received signal is established for CSS, and the signal-to-noise ratio (SNR) of the dataset varies from −8 dB to 5 dB in steps of 1 dB.
2. The classical LeNet, AlexNet and VGG-16 networks are trained on the proposed dataset. The corresponding false alarm probability and detection probability of the three networks are analyzed and compared.
3. The AND rule-based CSS, the OR rule-based CSS and the majority voting (MV)-based CSS are compared with the proposed CNN-based CSS scheme. The experimental results validate the effectiveness of the proposed scheme.
The rest of this paper is organized as follows: Section 2 gives the related work of this paper. In Section 3, the sensing scenario and sensing model are discussed. The main contribution of this paper is shown in Section 4. Simulation experiments and result analysis are presented in Section 5. Finally, Section 6 concludes this paper.

Related Work
In this section, the machine learning-based SS schemes are reviewed and discussed.

Reference | Main Contribution | Advantages
[30] | An STFT-CNN method is proposed for SS based on the short-term Fourier transform (STFT) and CNN. | The signal feature in the frequency domain is considered.
[31] | The application of machine learning (ML) models in cooperative spectrum sensing of cognitive radio networks (CRNs) is considered, including multilayer perceptron (MLP), support vector machine and Naive Bayes. | The sensing complexity is much lower compared with the ML-based schemes.
[32] | A CNN-LSTM detector is proposed, which first uses the CNN to extract the energy correlation features from the covariance matrices generated by the sensing data; then, the series of energy correlation features corresponding to multiple sensing periods are input into the LSTM so that the PU activity pattern can be learned. | The time series features are considered for the possible improvement of the sensing performance.
[33] | The distributed deep reinforcement learning method is adopted to learn the optimal CSS strategy. | The distributed CSS is discussed and analyzed.
[34] | Multiple machine learning-enabled solutions are adopted to tackle the challenges of the complex sensing model in CSS for a non-orthogonal multiple access transmission mechanism, including unsupervised learning algorithms (K-means clustering and Gaussian mixture model) as well as supervised learning algorithms (directed acyclic graph-support vector machine, K-nearest-neighbor and back-propagation neural network). | The sensing accuracy is greatly improved with a moderate complexity.
[35] | A deep reinforcement learning (DRL)-based CSS algorithm is proposed, which is employed to decrease the signaling in the network of SUs. | The sensing accuracy of CSS is improved as far as possible.
[36] | Based on the fact that pulse radar signals can be modeled as a train of rectangular pulses with two amplitude levels, a generalized likelihood ratio test spectrum sensing scheme, as well as its less complex sub-optimal variations, is developed to detect the presence of such signals. | The closed-form solutions can be obtained with a lower complexity.
[37] | A deep compressive spectrum sensing GAN (DCSS-GAN) is proposed, where two neural networks are trained to compete with each other to recover the spectrum from undersampled samples in the time domain. | The sensing performance with the undersampled samples can be greatly improved.
From Table 1, it can be seen that considerable effort has been devoted to improving the performance of SS. In [31,34,36], machine learning (ML)-based schemes are considered for SS, including multilayer perceptron (MLP), support vector machine, Naive Bayes, directed acyclic graph-support vector machine, K-nearest-neighbor and generalized likelihood ratio test. Meanwhile, deep learning (DL) is utilized for SS in [30,32,33,35,37], where CNN with STFT, CNN with LSTM, deep reinforcement learning and a generative adversarial network (GAN) are considered.
Although the above-mentioned schemes greatly improve the sensing performance, the ML-based schemes are limited to the extraction of shallow features, while the DL-based schemes extract frequency-domain and other complex features at the expense of computational complexity. In this paper, three basic CNNs are considered to improve the performance of SS, based on a novel SS dataset. The LeNet, AlexNet and VGG-16 networks are trained and adjusted on the proposed dataset. The simulation experiments show that high sensing performance is obtained.
Note that the proposed schemes in this paper balance sensing performance and sensing complexity. Although the considered CNNs are classical, they are widely used and therefore easy to implement. In addition, the dataset considered in this paper is based on the energy value of the local SS node. Both the data acquisition and the network structure have low complexity, yet the sensing performance remains high according to the results of the simulation experiments. In summary, the proposed schemes offer a practical way to improve the performance of SS.

Sensing Scenario
As shown in Figure 1, in centralized CSS each SU transmits its local sensing information directly to the fusion center (FC), and the FC makes the final decision and then sends it back to each SU. The details of centralized CSS are as follows:
1. The energy vector of each sensing node is obtained by sampling and signal processing.
2. The obtained energy vector of each node is sent to the FC over the reporting channel, where the reporting channel obeys the Rayleigh distribution.
3. After the reporting channel, a two-dimensional matrix is formed from the energy vectors of the sensing nodes. Then, the mean value of the covariance matrix of this two-dimensional matrix is subtracted from each element of the covariance matrix. Finally, the updated two-dimensional matrix is input to the CNN module for the final decision.
4. The final decision result at the FC is sent to each local sensing node.

Sensing Model
Assume that the received signal at the ith local sensing node can be formulated as:
where s(n) denotes the primary signal, x(n) denotes the background noise with Gaussian distribution and h(n) represents the Rayleigh channel. The sampling frequency is f_s and the sensing duration is τ.
The energy value of the ith local sensing node can be written as shown in (2) and (3):
where r_1(n) denotes the real part of r(n) while r_2(n) denotes its imaginary part. Then, the energy vector can be obtained:
where N denotes the number of sampling points of the ith local sensing node. After the reporting channel, the energy vector can be updated as:
where h_1(n) denotes the reporting channel with the Rayleigh distribution and x_1(n) denotes the background noise. At the FC, the two-dimensional matrix M is obtained based on each local energy vector:
Based on (8), the covariance matrix of M can be denoted as:
The updated covariance matrix with the mean value removed is shown as:
Finally, R_M1 is input to the CNN module for the final decision and the decision result can be described as:
where ψ(·) denotes the CNN operations.
Appl. Sci. 2021, 11, 4440
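For orientation, the chain of operations described in this subsection can be written out as below. This is a sketch under stated assumptions, not the original formulas: the hypothesis labels H_0 and H_1 follow the dataset description, while K (the number of sensing nodes), the primed symbols, the 1/N normalization of the covariance matrix, the scalar mean μ and the decision symbol D are notation assumed here for illustration. Equation numbers are deliberately omitted.

```latex
\begin{align*}
r_i(n) &=
  \begin{cases}
    x(n), & H_0 \\
    h(n)\,s(n) + x(n), & H_1
  \end{cases} \\
E_i(n) &= |r_i(n)|^2 = r_1(n)^2 + r_2(n)^2 \\
\mathbf{E}_i &= \bigl[E_i(1),\, E_i(2),\, \ldots,\, E_i(N)\bigr] \\
E_i'(n) &= h_1(n)\,E_i(n) + x_1(n) \\
\mathbf{M} &= \bigl[\mathbf{E}_1'^{\,T},\, \mathbf{E}_2'^{\,T},\, \ldots,\, \mathbf{E}_K'^{\,T}\bigr]^{T}
  \in \mathbb{R}^{K \times N} \\
\mathbf{R}_M &= \frac{1}{N}\,\mathbf{M}\mathbf{M}^{T} \\
\mathbf{R}_{M1} &= \mathbf{R}_M - \mu\,\mathbf{1}\mathbf{1}^{T},
  \qquad \mu = \frac{1}{K^2}\sum_{p=1}^{K}\sum_{q=1}^{K} \mathbf{R}_M(p,q) \\
D &= \psi(\mathbf{R}_{M1})
\end{align*}
```

Under these assumptions, R_M1 is a K × K matrix whose entries sum to zero, which is the two-dimensional input passed to the CNN module.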

Main Contribution
In this section, the main contribution of this paper is discussed, including the dataset construction, the CNN module considered in this paper and the training of the CNNs.

Dataset Construction
The simulation is conducted in the very high frequency (VHF) band, where the carrier frequency is set to f_c = 1.0 × 10^8 Hz and the sampling frequency is f_s = 3.0 × 10^8 Hz. An orthogonal frequency division multiplexing (OFDM) signal is chosen as the test signal (PU) for its generality and popularity in wireless communication networks. In the simulation, the OFDM signal is first generated, and then white Gaussian noise with zero mean and unit variance is added to it.
In this paper, the number of sensing nodes is fixed to 5 unless otherwise stated, because the number of sensing nodes determines the dataset construction. If the number of sensing nodes were not fixed, the required dataset would be very large. In addition, the number of sensing nodes determines the size of the two-dimensional matrix, and for the same CNN the sensing performance is positively correlated with the number of sensing nodes. As a result, fixing the number of sensing nodes does not affect the performance validation of the proposed network.
For each sensing node, a random integer between −8 and 5 is used as the signal-to-noise ratio (SNR) in dB. There are two kinds of samples in the constructed dataset, namely the H_0 sample and the H_1 sample, where the H_0 sample denotes the absence of the PU and the H_1 sample denotes the presence of the PU. The H_0 and H_1 samples are in equal proportion. For each kind of sample (H_0 or H_1), 6000 sets of the complex signal are obtained, each with 1000 sampling points, and the modulus is then calculated at each sampling point. As a result, a modulus sequence matrix of size 5 × 1000 is obtained. According to the operations in (9) and (10), the updated covariance matrix is obtained. This completes the dataset construction; the first 2000 groups are taken as the test set and the rest serve as the training set. Figure 2 shows the covariance matrix obtained at SNR = −8 dB, where the left panel corresponds to the H_0 case and the right panel to the H_1 case (presence of the PU). The two cases differ clearly in Figure 2, which helps to determine the presence or absence of the PU.
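The construction of one dataset entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the OFDM waveform is replaced by a hypothetical complex Gaussian stand-in with the requested power, the Rayleigh channels are omitted, the covariance is taken as the K × K matrix M Mᵀ / N, and all function names are made up for this example.

```python
import math
import random

NUM_NODES = 5       # number of sensing nodes, as fixed in the text
NUM_POINTS = 1000   # sampling points per node, as in the text


def complex_gaussian(rng, variance):
    """Draw one zero-mean circular complex Gaussian sample."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def modulus_matrix(rng, pu_present):
    """Return a NUM_NODES x NUM_POINTS matrix of |r(n)| for one sample."""
    rows = []
    for _ in range(NUM_NODES):
        snr_db = rng.randint(-8, 5)              # random integer SNR per node
        signal_power = 10.0 ** (snr_db / 10.0)   # noise variance is one
        row = []
        for _ in range(NUM_POINTS):
            r = complex_gaussian(rng, 1.0)       # background noise
            if pu_present:
                # Hypothetical stand-in for the OFDM signal (assumption).
                r += complex_gaussian(rng, signal_power)
            row.append(abs(r))
        rows.append(row)
    return rows


def mean_removed_covariance(matrix):
    """Compute M M^T / N and subtract its overall mean from every element."""
    k, n = len(matrix), len(matrix[0])
    cov = [[sum(a * b for a, b in zip(matrix[p], matrix[q])) / n
            for q in range(k)] for p in range(k)]
    mean = sum(sum(row) for row in cov) / (k * k)
    return [[value - mean for value in row] for row in cov]


def build_dataset(num_per_class, seed=0):
    """Return (matrix, label) pairs with equal numbers of H0 (0) and H1 (1)."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(num_per_class):
            m = modulus_matrix(rng, pu_present=bool(label))
            data.append((mean_removed_covariance(m), label))
    return data
```

With `num_per_class=6000` this would mirror the sample counts in the text; a much smaller value is enough to check the shapes. How the groups are ordered before the first 2000 are taken as the test set is not specified in the text, so the split is left out of the sketch.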

LeNet was proposed by LeCun, the founder of the convolutional neural network, in 1994 to solve the visual task of handwritten digit recognition [38]. As shown in Figure 3, LeNet contains 2 convolutional layers, 2 pooling layers and 2 fully connected layers.
AlexNet carries forward the idea of LeNet and applies the basic principles of CNN to a much deeper and wider network [39], as shown in Figure 4. The advantages of AlexNet can be described in five aspects. (1) ReLU is successfully used as the activation function of the CNN, and it is shown to perform better than sigmoid in deeper networks; therefore, the gradient dispersion (vanishing gradient) problem of sigmoid in deeper networks is solved. (2) During training, Dropout is used to randomly ignore some neurons to avoid over-fitting of the model.
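To make the LeNet layer structure concrete, the following sketch traces the feature-map size through two convolution and two pooling stages for a 36 × 36 input (the input size used for LeNet in the training setup). The 5 × 5 kernels, stride 1 without padding, 2 × 2 pooling and the channel counts 6 and 16 are the classical LeNet-5 values and are assumptions here, since the text does not state them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a square convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window, stride=None):
    """Output width of a square pooling layer (stride defaults to window)."""
    stride = window if stride is None else stride
    return (size - window) // stride + 1


def lenet_feature_sizes(input_size=36, kernel=5, window=2, channels=(6, 16)):
    """Trace feature-map sizes through conv/pool pairs (assumed LeNet-5 values)."""
    trace = []
    size = input_size
    for ch in channels:
        size = conv_out(size, kernel)
        trace.append(("conv", ch, size))
        size = pool_out(size, window)
        trace.append(("pool", ch, size))
    flattened = channels[-1] * size * size   # input width of the first FC layer
    return trace, flattened


trace, flattened = lenet_feature_sizes()
for kind, ch, size in trace:
    print(f"{kind}: {ch} x {size} x {size}")
print("flattened features:", flattened)
```

Under these assumed values the spatial size goes 36, 32, 16, 12, 6, so the first fully connected layer would receive 16 × 6 × 6 = 576 features.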

The Training of the CNNs
The output layer of the LeNet network in this paper is activated by the sigmoid function. The Adam optimizer is used because it adjusts the learning rate adaptively. The loss function is the binary cross-entropy. In addition, to suit the shallow CNN and keep the model running stably, the image size is reset to 36 × 36 when the picture (two-dimensional data) is read. Pictures are processed in batches of 100.
Batch normalization (BN) is used for the AlexNet network instead of local response normalization (LRN). Unlike the LeNet network, the stochastic gradient descent (SGD) optimizer is selected for the AlexNet network. As with the LeNet network, the cross-entropy function is used as the loss function, and the dropout probability is 0.5. The image is resized to 100 × 100 when the picture is read and then center-cropped to 99 × 99. The batch size is 100.
For VGG-16, BN and the Adam optimizer are used. The loss function is again the binary cross-entropy, and dropout is applied in the fully connected layers with probability 0.2. The image size is reset to 100 × 100 when the picture is read. The input data are processed in batches, but because of limited GPU memory, the batch size is reduced to 50.
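The training settings above can be collected into a small configuration table, which makes the differences between the three networks easy to compare. The sketch below records only values stated in the text; the class and field names are hypothetical, and settings the text does not give (learning rate, number of epochs, whether LeNet uses BN or dropout) are left unset rather than guessed.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str
    loss: str
    resize: Tuple[int, int]
    batch_size: int
    center_crop: Optional[Tuple[int, int]] = None
    dropout: Optional[float] = None      # None means "not stated in the text"
    batch_norm: Optional[bool] = None    # None means "not stated in the text"


CONFIGS = {
    "lenet": TrainConfig("adam", "binary_cross_entropy", (36, 36), 100),
    "alexnet": TrainConfig("sgd", "cross_entropy", (100, 100), 100,
                           center_crop=(99, 99), dropout=0.5, batch_norm=True),
    "vgg16": TrainConfig("adam", "binary_cross_entropy", (100, 100), 50,
                         dropout=0.2, batch_norm=True),
}


def network_input_size(name):
    """Size actually fed to the network: the crop if present, else the resize."""
    cfg = CONFIGS[name]
    return cfg.center_crop or cfg.resize


def steps_per_epoch(name, num_train):
    """Number of batches needed to pass once over num_train samples."""
    return math.ceil(num_train / CONFIGS[name].batch_size)
```

For example, `network_input_size("alexnet")` gives `(99, 99)`, and for a hypothetical training set of 10000 samples, `steps_per_epoch("vgg16", 10000)` gives 200 batches per epoch.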

Simulations and Discussion
In this section, the test results of the proposed CNN-based CSS schemes are presented and discussed. First, the detection probability and false alarm probability of the proposed schemes on the test set are given, where the SNR varies from −8 dB to 5 dB in steps of 1 dB. Then, the sensing performance of the proposed schemes is compared with that of the AND rule, the OR rule and the MV rule. Figures 6 and 7 show the detection probability and the false alarm probability, respectively, of the proposed CNN-based CSS schemes. In Figure 6, the detection probabilities of the three CNN-based CSS schemes are all above 0.9, which indicates that the proposed schemes protect the PU well. In addition, the detection probability improves from LeNet to VGG-16. Consistent with the analysis in Section 4.2, the detection probability of the proposed CNN-based CSS schemes improves gradually as the network depth increases. This indicates that greater network depth helps to capture the features of the primary signal, which in turn yields higher sensing performance. In Figure 7, the false alarm probability of the proposed CNN-based CSS schemes is always below 0.1, and even below 0.05 for AlexNet and VGG-16, which indicates that the proposed schemes can keep the throughput of the cognitive system at a high level. In addition, the false alarm probability decreases gradually from LeNet to VGG-16. As with the detection probability, this shows that greater network depth contributes to a lower false alarm probability.
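The two metrics discussed here can be estimated from test-set decisions as sketched below. This is a minimal illustration, assuming labels and decisions are encoded as 1 for "PU present" (H_1) and 0 for "PU absent" (H_0); the function name is hypothetical.

```python
def detection_and_false_alarm(labels, decisions):
    """Estimate (Pd, Pf) from true labels and binary decisions.

    Pd is the fraction of H1 samples decided as 1;
    Pf is the fraction of H0 samples decided as 1.
    """
    h1 = [d for y, d in zip(labels, decisions) if y == 1]
    h0 = [d for y, d in zip(labels, decisions) if y == 0]
    if not h1 or not h0:
        raise ValueError("both H0 and H1 samples are required")
    return sum(h1) / len(h1), sum(h0) / len(h0)


# Toy example: four H1 samples (three detected), four H0 samples (one false alarm).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
pd, pf = detection_and_false_alarm(labels, decisions)
print(pd, pf)
```

Repeating this per SNR value on the corresponding subset of the test set would give curves of the kind described for Figures 6 and 7.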
In Figure 8, the sensing performance of various CSS schemes is compared, including the AND-, OR- and MV-based CSS schemes along with the proposed CNN-based schemes. Five sensing nodes are considered in the experiments for convenience. In Figure 8, the detection probability of the MV-based CSS scheme is higher than that of the AND rule-based and OR rule-based CSS schemes at the same false alarm probability, which supports the soundness of the simulation experiments. Moreover, the sensing performance of the proposed CNN-based CSS schemes is clearly higher than that of the classical schemes, which indicates that the proposed schemes are more effective for CSS. In addition, the detection probability of the VGG-16-based CSS scheme is the highest, while that of the LeNet-based scheme is the lowest among the CNN-based schemes at the same false alarm probability. This shows that network depth is positively related to the sensing performance of the proposed CNN-based CSS schemes, which agrees with the analysis of Figures 6 and 7.
As a supplement, the influence of the number of sensing nodes on the sensing performance is shown in Figure 9 for the proposed AlexNet-based CSS scheme, where the number of sensing nodes increases from 5 to 8 and from 8 to 10. In Figure 9, the sensing performance of the AlexNet-based CSS scheme increases with the number of sensing nodes, which indicates that the sensing performance of the proposed CSS scheme is positively correlated with the number of sensing nodes.
Note that the receiver operating characteristic (ROC) curve is widely used to exhibit the performance of SS, where the X axis denotes the false alarm probability and the Y axis denotes the detection probability. In the simulations, the false alarm probability is swept from 0 to 1 in steps of 0.1, and the corresponding detection probability is obtained statistically. The slope changes sharply at a false alarm probability of 0.1 because the false alarm probability is sampled with a step of 0.1: it is 0 at the first point and jumps to 0.1 at the next point for all the considered CSS schemes. As a result, all these plots show a similar pattern.
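The classical hard-decision fusion rules used as baselines can be sketched as follows. This is an illustration under assumptions rather than the simulation code behind Figure 8: majority voting is taken to mean "more than half of the nodes report 1", and the closed-form fused probabilities assume independent nodes that all share the same local probability p.

```python
from math import comb


def fuse(local_decisions, rule):
    """Fuse binary local decisions (1 = PU present) at the fusion center."""
    votes = sum(local_decisions)
    n = len(local_decisions)
    if rule == "and":
        return int(votes == n)        # all nodes must report the PU
    if rule == "or":
        return int(votes >= 1)        # one reporting node is enough
    if rule == "mv":
        return int(votes > n / 2)     # more than half must report the PU
    raise ValueError(f"unknown rule: {rule}")


def fused_probability(p, n, rule):
    """Fused detection (or false alarm) probability for n independent nodes,
    each with local probability p (independence is an assumption)."""
    if rule == "and":
        return p ** n
    if rule == "or":
        return 1.0 - (1.0 - p) ** n
    if rule == "mv":
        k_min = n // 2 + 1
        return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k)
                   for k in range(k_min, n + 1))
    raise ValueError(f"unknown rule: {rule}")


for rule in ("and", "or", "mv"):
    print(rule, fuse([1, 0, 1, 1, 0], rule), fused_probability(0.9, 5, rule))
```

Because the same formulas apply to the local false alarm probability, the OR rule raises both fused probabilities and the AND rule lowers both, which is why the rules are compared at a common false alarm probability on the ROC curve.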
To further evaluate the performance of the proposed CNN-based CSS schemes, the average computation time is analyzed in Figure 10. Note that the computation time denotes the time needed to process one picture with a CNN that has been trained in advance. In the simulations, many tests are conducted, and the average computation time is the mean processing time per picture. In Figure 10, the average computation time of the LeNet-based CSS scheme is the lowest, while that of the VGG-16-based CSS scheme is the highest. This means that the sensing accuracy of the VGG-16-based CSS scheme comes at the expense of system overhead. As a result, the selection of the appropriate CNN-based CSS scheme depends on the system requirements for sensing accuracy and sensing speed.
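An average per-picture processing time of this kind can be measured along the following lines. This is a generic sketch: `predict` stands for any already-trained model's inference function and is a hypothetical placeholder, and the number of repeats is an arbitrary choice, not a value from the paper.

```python
import time


def average_inference_time(predict, inputs, repeats=10):
    """Return the mean wall-clock time (seconds) of predict(x) per input."""
    if not inputs or repeats < 1:
        raise ValueError("need at least one input and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(inputs))


# Usage with a trivial stand-in for a trained network.
def dummy_predict(matrix):
    return int(sum(sum(row) for row in matrix) > 0)


pictures = [[[0.1, -0.2], [0.3, 0.4]] for _ in range(20)]
print(average_inference_time(dummy_predict, pictures))
```

Timing the three trained networks on the same pictures and hardware is what makes the resulting averages comparable.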

Conclusions
In this paper, three CNN-based CSS schemes are proposed to further improve the sensing performance of CSS. First, the two-dimensional dataset is established based on the covariance matrix of the observed signal. Then, the LeNet-, AlexNet- and VGG-16-based CSS schemes are trained. Finally, the sensing performance of the proposed schemes is compared with that of the classical CSS schemes, including the AND, OR and majority voting rules. In addition, the average computation times of the proposed CNN-based CSS schemes are discussed. The simulation results show that the sensing performance of the proposed schemes is clearly higher than that of the classical schemes.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available for commercial reasons.

Conflicts of Interest:
The authors declare no conflict of interest.
