*Information* **2019**, *10*(11), 331; https://doi.org/10.3390/info10110331

Article

Wideband Spectrum Sensing Method Based on Channels Clustering and Hidden Markov Model Prediction

^{1} Institute of Electronic Engineering, China Academy of Engineering Physics, Mianyang 621900, China

^{2} School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China

^{*} Author to whom correspondence should be addressed.

Received: 23 September 2019 / Accepted: 21 October 2019 / Published: 25 October 2019

## Abstract

Spectrum sensing is a prerequisite for implementing cognitive radio technology. Conventional wideband spectrum sensing methods mainly rely on frequency sweeping and still face major challenges in performance and efficiency. This paper introduces a new wideband spectrum sensing method based on channel clustering and prediction. The method divides the wideband spectrum into uniform sub-channels and employs a density-based clustering algorithm, Ordering Points to Identify the Clustering Structure (OPTICS), to cluster the channels according to the correlation between them. A detection channel (DC) is selected and detected for each cluster, and the states of the other channels (estimated channels, ECs) in the cluster are then predicted with a Hidden Markov Model (HMM), so that the states of all channels in the wideband spectrum are finally obtained. The simulation results show that the proposed method can effectively improve wideband spectrum sensing performance.

Keywords: channels clustering; cognitive radio; HMM prediction; OPTICS algorithm; wideband spectrum sensing

## 1. Introduction

Due to the fixed allocation mechanism, a large share of spectrum resources suffers from very low utilization. The demand for high-rate wireless communication and the shortage of spectrum resources are increasingly difficult to reconcile. Cognitive Radio (CR) [1] has attracted researchers because it can detect and utilize unoccupied spectrum. In CR, the Secondary User (SU) detects unoccupied frequency bands through spectrum sensing. SUs then dynamically access the unoccupied licensed band without interfering with the Primary User (PU), thereby improving spectrum utilization.

With the rapid growth in spectrum demand from wireless devices and services, it is urgent to detect and utilize more spectrum opportunities quickly and efficiently. Since the entire radio spectrum allocation spans 6 kHz to 300 GHz [2], the band sensed by CR will extend to several GHz. Common wideband spectrum sensing demands high-speed sampling and signal processing according to the Nyquist theorem. The conventional wideband spectrum sensing methods are mainly multiband spectrum sensing techniques [3], including serial sensing, parallel sensing, and Sub-Nyquist wideband spectrum sensing [4] such as compressive sensing [5]. In serial sensing, each channel is detected sequentially using a reconfigurable bandpass filter, a tunable oscillator, or two-stage sensing [6], which may cause a long delay that is unsuitable for fast processing. Parallel sensing is carried out mainly by filter banks [7], which can detect all channels at once but need many radio frequency components and have high hardware implementation complexity. Compressive sensing is a popular Sub-Nyquist wideband spectrum sensing method that reduces the sampling rate, but its reconstruction algorithm is highly complex and it requires the signal to be sparse. These limitations motivate us to study more effective wideband spectrum sensing methods.

By comparison, wideband spectrum sensing based on estimation or prediction obtains the future occupancy status of channels instead of directly detecting multiple channels [8], and it has attracted growing attention from researchers. Early spectrum prediction was mainly carried out in the time domain, where the channel state of the next time slot is predicted. In the frequency domain, the states of adjacent or related channels are inferred from the correlation between channels. To make the best use of the spectrum data, more and more spectrum prediction methods work in the joint time-frequency domain or even in multiple dimensions [9].

In the time domain, one of the commonly used prediction methods is the Hidden Markov Model (HMM). When an HMM is used for prediction, the channel state cannot be observed directly. Instead, it is observed through the sequence of observation vectors obtained by spectrum sensing; these observations are used to train the HMM parameters, from which the channel state is predicted [10]. The computational complexity depends on the number of states and the length of the observation sequence. The measured data in [11] show that the spectrum occupancy state of PUs follows a Markov chain, and the occupancy state is predicted with an HMM. In [12], two HMMs are adopted to detect honest and malicious users, respectively, in collaborative spectrum sensing, and they are estimated jointly with an effective inference algorithm. In [13], decision fusion based on an HMM exploits the time correlation of the unknown binary source. In [14], a hidden bivariate Markov chain is applied to model the received signal in a Gaussian channel, and it can predict the state in the time domain. With regard to regression analysis, ARMA performs better in predicting cyclostationary time series than non-stationary time series, while the ARIMA model performs better for non-stationary time series [15]. However, the above prediction methods are all directed at a narrowband spectrum.

In the frequency domain, Li et al. [16] applied Bayesian networks to model joint multi-channel spectrum prediction, but the application areas of the model are limited. Because the spectrum states of different channels are highly correlated, as evaluated with real-world data [17], frequency-domain spectrum prediction methods are mostly extended to the time-frequency domain. Yin et al. [18] designed a two-dimensional frequent pattern mining method. However, determining the frequent pattern requires a lot of computation, and the pattern may vary between environments. Sun et al. [19] converted the spectrum data to a two-dimensional image space for time-frequency spectrum prediction. This method uses historical multi-day data to predict spectrum use on the next day, with limited prediction accuracy.

In wideband or multiband spectrum sensing based on estimation or prediction in the time-frequency domain, highly correlated channels are clustered and only one or a few channels are selected for detection, while the states of the other channels (ECs) in the same cluster are predicted or estimated [20,21]. Gao et al. [20] combined the minimum entropy increment (MEI) algorithm or the greedy clustering (GC) algorithm with a Markov process for prediction, but the computational complexity of the MEI algorithm is too high, while the GC algorithm has low accuracy. Huang et al. [21] applied an HMM to estimate the states of the other channels (ECs) in a group clustered by a multi-centers clustering (MCC) algorithm, but each clustered group contains many detection channels (DCs), which cost much time and energy to detect. Wideband spectrum sensing based on channel clustering and prediction therefore saves time and energy by reducing the number of DCs, but the main difficulty is finding proper clustering and prediction algorithms. Also exploiting the high correlation among samples, [22] proposes oversampling at the receiver, which introduces a lot of computation. In the spatial domain, maximal ratio combining (MRC) and equal gain combining (EGC) are employed to combine spatial diversity when studying the statistical properties of the capacity of Nakagami-m channels [23], and a spatial correlation coefficient is proposed to express the correlation characteristics of mobile cognitive radio users in different environments [24].

In view of the above discussion, this paper proposes a novel wideband spectrum sensing method based on OPTICS channel clustering and HMM prediction in the time-frequency domain. The approach first divides the wideband spectrum into uniform sub-channels and then applies the density-based OPTICS clustering algorithm to cluster the highly correlated channels. For each cluster, one DC is selected based on the minimum information entropy principle, and this DC is detected directly. Subsequently, the ECs are predicted by an HMM according to their correlation with the DC and their dependence on their own past states. Finally, the performance is evaluated by simulation and compared with the GC and MEI algorithms.

The rest of this paper is organized as follows. Section 2 describes the problem. Section 3 briefly discusses the density-based clustering algorithm OPTICS and applies it to cluster channels based on channel correlation. Section 4 presents the HMM-based channel prediction method. Section 5 demonstrates the performance of the proposed wideband spectrum sensing method through simulation. Section 6 concludes the paper.

In this paper, the wideband channel set is denoted by an upper-case letter (e.g., $C$) and the state of a sub-channel by a lower-case letter with an index (e.g., ${c}_{i}$), while the state of a sub-channel in time slot $m$ is represented by ${c}_{i}^{m}$. A matrix is denoted by an upper-case letter (e.g., $A$) and an element of the matrix by a lower-case letter with a double index (e.g., ${a}_{ij}$).

## 2. System Model

The wideband spectrum can be evenly divided into $N$ sub-channels. As shown in Figure 1, the sub-channels are numbered 1, 2, 3, ..., $N$. Conventional sensing methods need to detect each of these sub-channels serially or in parallel, which costs a large amount of time and energy. Real-world measured data show that the channels are highly correlated [25]. Sub-channels of the same color in Figure 1 are highly correlated with one another and are clustered together. The clustered sub-channels are distributed as shown in Figure 2, where sub-channels of the same color form a cluster. A DC is selected for each channel cluster based on the principle of minimum information entropy, and the states of the other channels in the same cluster are predicted by the HMM to ultimately obtain the channel states of the wideband spectrum.

## 3. Correlation-Based Channels Clustering

#### 3.1. Channel Correlation Metrics

In cognitive radio, Channel State Information (CSI) is divided into two types: “unoccupied (idle)” and “occupied (busy)”, which are represented by binary "0" and "1" in this paper. The channel correlation indicates the probability of consistent states of two channels, i.e., the higher the correlation, the greater the probability that the two channels will be in the same or opposite state (depending on sign of correlation) [26]. In this work, Channel Correlation Factor (CCF) is used to represent channel correlation, which is defined as:

$${\rho}_{ij}=\frac{\sum_{m=1}^{M}I(c{s}_{i}^{m}=c{s}_{j}^{m})}{\sum_{m=1}^{M}I(c{s}_{i}^{m}=c{s}_{j}^{m})+\sum_{m=1}^{M}I(c{s}_{i}^{m}\ne c{s}_{j}^{m})}$$

where $M$ is the total number of time slots, ${\rho}_{ij}$ indicates the CCF between the $i$-th channel and the $j$-th channel, $c{s}_{i}^{m}$, $c{s}_{j}^{m}$ respectively represent the state of the channel $i$ and the channel $j$ in the time slot $m$, $I(A)$ is a discriminant function, if the value of $A$ is true, then $I(A)=1$, otherwise $I(A)=0$.

${\rho}_{ij}$ takes values from 0 to 1. When ${\rho}_{ij}$ approaches 0, the two channels are negatively correlated: because the channel state is either 0 or 1, when one channel is in a given state the other is in the opposite state. When ${\rho}_{ij}$ approaches 1, the two channels are positively correlated, that is, their states are almost always the same. When ${\rho}_{ij}$ approaches 0.5, there is no obvious correlation between the two channels: when one channel is in a given state, the other is equally likely (50% each) to be in the same or the opposite state.
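The CCF is simply the fraction of time slots in which two binary state sequences agree, since the two sums in the denominator always add up to $M$. A minimal sketch in Python, where the function name `ccf` and the sample sequences are illustrative assumptions rather than part of the original work:

```python
from typing import Sequence

def ccf(states_i: Sequence[int], states_j: Sequence[int]) -> float:
    """Fraction of time slots in which two binary state sequences agree."""
    if len(states_i) != len(states_j) or len(states_i) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    agree = sum(1 for a, b in zip(states_i, states_j) if a == b)
    return agree / len(states_i)

# Hypothetical 8-slot occupancy histories (0 = idle, 1 = busy)
ch1 = [0, 1, 1, 0, 1, 0, 0, 1]
ch2 = [0, 1, 1, 0, 1, 0, 1, 1]   # differs from ch1 in exactly one slot
ch3 = [1 - s for s in ch1]       # always opposite to ch1

rho_12 = ccf(ch1, ch2)   # 7 of 8 slots agree
rho_13 = ccf(ch1, ch3)   # no slot agrees: negative correlation
```

Here `rho_12` is close to 1 (positive correlation) and `rho_13` is 0 (negative correlation), matching the interpretation given above.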

#### 3.2. Channel Correlation Verification

The channel correlation is verified with the real-world Power Spectral Density (PSD) dataset measured by RWTH Aachen University in 2007, which covers the frequency range from 20 MHz to 6 GHz over one week [25]. Based on the measured PSD values of the NE GSM1800 DownLink (1820.2–1875.4 MHz) frequency band and the NE TV (614–698 MHz) frequency band, the channel occupancy and channel correlation distribution maps are shown in Figure 3, where the spectrum resolution is 200 kHz, the average sampling interval is about 1.8 s, and the power unit is dBm. Since the actual amount of data is large, one sample is taken for every 60 measured values. The detection threshold is set to −107 dBm/200 kHz, the lowest threshold in an earlier requirements document of the IEEE 802.22 standardization committee. When the PSD value is greater than the detection threshold, the CSI is “1”; otherwise, the CSI is “0”.
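The preprocessing described above (subsampling followed by thresholding) can be sketched as follows. The function name `psd_to_csi`, the choice of keeping the first value of every group of 60, and the sample trace are assumptions made for illustration; the text does not state which value of each group is retained.

```python
def psd_to_csi(psd_dbm, threshold_dbm=-107.0, step=60):
    """Subsample a PSD trace and threshold it into binary channel states."""
    sampled = psd_dbm[::step]   # assumption: keep the first value of every `step`
    # CSI is 1 (busy) only when the PSD is strictly above the threshold
    return [1 if p > threshold_dbm else 0 for p in sampled]

# Hypothetical PSD trace in dBm per 200 kHz; step=2 keeps the example short
trace = [-110.2, -95.0, -101.3, -108.1, -107.0, -90.4]
states = psd_to_csi(trace, step=2)   # uses samples -110.2, -101.3, -107.0
```

A value exactly equal to the threshold is mapped to "0" here, following the "greater than" wording of the text.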

As shown in Figure 3, a block-correlation pattern is evident, which confirms the motivation for considering contiguous aggregation (clustering) of frequency bins.

#### 3.3. OPTICS Algorithm for Channel Clustering

As noted in Section 1, GC, MEI and MCC have limitations for channel clustering. In this paper, the sub-channels of the wideband spectrum are regarded as objects in the spectrum space, and the density-based OPTICS [27] clustering algorithm, an extension of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [28], is applied to cluster them. DBSCAN is a density-based clustering algorithm that looks for high-density areas separated by low-density areas; each point is marked as a core point, border point or noise point depending on the density of the area in which it is located. In this algorithm, an edge is placed between any two core points within the predefined distance of each other, each group of connected core points forms a cluster, and each border point is then assigned to the cluster of an associated core point. DBSCAN is relatively noise-resistant and can handle clusters of any shape and size; one of its disadvantages is its high dependence on the input parameters, which is improved in the OPTICS algorithm.

The OPTICS algorithm starts from a randomly selected core object and preferentially expands toward high-density data areas. At the same time, the neighborhood objects are output in an ordered list according to density, and clusters of different densities appear in the reachability-distance curve of the channels. The algorithm does not generate the clusters explicitly; the final clustering result is obtained from the ordered list or the reachability-distance curve according to the actual application. In this work, the clustering results of the channels are obtained from the ordered list.

The wideband spectrum is evenly divided into $N$ sub-channels, that is, $C=\left\{{c}_{1},{c}_{2},\cdots ,{c}_{N}\right\}$, where $C$ is the channel set. The information entropy is used to represent the distance between the channels [29], which is defined as:

$$\begin{array}{ll}r({c}_{j},{c}_{i})& =H({c}_{j}|{c}_{i})\\ & =-{\rho}_{ij}\mathrm{log}{\rho}_{ij}-(1-{\rho}_{ij})\mathrm{log}(1-{\rho}_{ij}),i,j\in N\end{array}$$

where ${c}_{i}$ is the DC, ${c}_{j}$ is the predicted channel, and $H({c}_{j}|{c}_{i})$ is the entropy function, indicating the uncertainty introduced by the DC for predicting other channels (ECs). The smaller the entropy, the smaller the prediction error. The basic definitions in the OPTICS algorithm are as follows.

**Definition 1.** (ε-neighborhood) The set of channels within a radius ε (ε > 0) of a selected channel object is called the ε-neighborhood of the object, where ε is a preset parameter.

**Definition 2.** (MinPts) The ε-neighborhood of each object in a channel cluster needs to contain at least a minimum number of channels, MinPts, also called the neighborhood density threshold; MinPts is another preset parameter.

**Definition 3.** (core object) An object whose ε-neighborhood contains at least MinPts objects.

**Definition 4.** (core-distance) The smallest distance that makes an object a core object, which is equal to the distance from the object to its MinPts-th nearest neighbor. If the object is not a core object, its core-distance is undefined.

**Definition 5.** (reachability-distance) For an object ${c}_{j}$ and a core object ${c}_{i}$, if the distance $r({c}_{j},{c}_{i})$ is greater than the core-distance of ${c}_{i}$, the reachability-distance is $r({c}_{j},{c}_{i})$. Otherwise, the reachability-distance is the core-distance of ${c}_{i}$. If ${c}_{i}$ is not a core object, the reachability-distance is undefined.
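The entropy-based distance and Definitions 4 and 5 can be illustrated with a short Python sketch. All function names are hypothetical. Two assumptions are made: the logarithm is taken to base 2 (the base is not stated, but base 2 maps a CCF of 0.985 to a distance of about 0.11, in line with the ε values quoted in Section 5.1), and an object is not counted as its own neighbor.

```python
import math

def entropy_distance(rho):
    """Binary entropy of the CCF rho, used as the channel distance r(c_j, c_i)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if rho in (0.0, 1.0):
        return 0.0   # limit of -x*log(x) as x -> 0
    return -rho * math.log2(rho) - (1.0 - rho) * math.log2(1.0 - rho)

def core_distance(distances, eps, min_pts):
    """Distance to the MinPts-th nearest neighbor, or None if not a core object.

    `distances` holds the distances from one object to all *other* objects.
    """
    in_neighborhood = sorted(d for d in distances if d <= eps)
    if len(in_neighborhood) < min_pts:
        return None
    return in_neighborhood[min_pts - 1]

def reachability_distance(dist_to_core, core_dist):
    """Reachability-distance of an object with respect to a core object."""
    if core_dist is None:
        return None   # undefined when the reference object is not a core object
    return max(core_dist, dist_to_core)

# Hypothetical CCFs between one channel and four others
rhos = [0.99, 0.985, 0.97, 0.80]
dists = [entropy_distance(r) for r in rhos]
cd = core_distance(dists, eps=0.21, min_pts=2)   # second-nearest neighbor
```

Because the binary entropy is symmetric about 0.5, a strongly negative correlation (CCF near 0) yields the same small distance as a strongly positive one (CCF near 1).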

In the channel clustering process based on the OPTICS algorithm, a seed queue $Q$ is generated and a result queue $R$ is the final output; the elements of $Q$ are arranged in ascending order of reachability-distance. After each expansion step, $Q$ must be updated in ascending order of reachability-distance so that the element with the minimum reachability-distance can be taken from $Q$. To simplify the algorithm, $Q$ is no longer sorted in this work; instead, a function simply picks the point with the smallest reachability-distance in $Q$. As a result, the average time complexity of selecting the minimum reachability-distance is reduced from $O(N\mathrm{log}N)$ to $O(N)$, and the time complexity of the presented OPTICS is $\mathrm{O}({N}^{2})$. The OPTICS algorithm is described as follows (Algorithm 1).

**Algorithm 1** Channels clustering algorithm of OPTICS

Input: channel set $C$ to be clustered

Parameters: ε and MinPts

Output: result queue $R$

1. Create a result queue $R$ and a seed queue $Q$.
2. Randomly select an unprocessed core object $p$ from the channel set $C$ and add $p$ to $R$, then take the ε-neighborhood channels of object $p$, denoted ${C}_{1}\subset C$. If the objects of ${C}_{1}$ are not in $R$, add ${C}_{1}$ to $Q$, and arrange $Q$ in ascending order of reachability-distance.
3. Select the core object $q$ with the smallest reachability-distance in $Q$ and add $q$ to $R$, then take the ε-neighborhood channels of object $q$, denoted ${C}_{2}\subset C$. If the objects of ${C}_{2}$ are in $R$, repeat step 3. Otherwise, go to step 4.
4. If the objects of ${C}_{2}$ are already in $Q$, rearrange them in $Q$ in ascending order of reachability-distance, and then return to step 3. Otherwise, add ${C}_{2}$ to $Q$, arrange $Q$ in ascending order of reachability-distance, and return to step 3.
5. If $Q$ is empty, return to step 2.
6. If the channel set $C$ is empty, the algorithm ends, and the ordered sample objects in the result queue $R$ are output.
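The ordering procedure can be sketched in Python as below. This follows the textbook OPTICS expansion loop rather than the authors' exact implementation: starting objects are taken in index order instead of at random, non-core objects are also appended to the ordering, and all names are hypothetical. As described in the text, the seed set is scanned linearly for the minimum reachability-distance instead of being kept sorted.

```python
def optics_order(dist, eps, min_pts):
    """Return (order, reach, core) for a symmetric distance matrix `dist`."""
    n = len(dist)
    reach = [None] * n        # reachability-distance (None = undefined)
    core = [None] * n         # core-distance (None = not a core object)
    processed = [False] * n
    order = []                # plays the role of the result queue R

    def neighbors(p):
        return [q for q in range(n) if q != p and dist[p][q] <= eps]

    def core_dist(p, nbrs):
        if len(nbrs) < min_pts:
            return None
        return sorted(dist[p][q] for q in nbrs)[min_pts - 1]

    def update(p, nbrs, seeds):
        # Lower the reachability-distance of unprocessed neighbors of core p
        for q in nbrs:
            if processed[q]:
                continue
            new_r = max(core[p], dist[p][q])
            if reach[q] is None or new_r < reach[q]:
                reach[q] = new_r
            seeds.add(q)

    for start in range(n):
        if processed[start]:
            continue
        nbrs = neighbors(start)
        processed[start] = True
        order.append(start)
        core[start] = core_dist(start, nbrs)
        if core[start] is None:
            continue
        seeds = set()         # plays the role of the seed queue Q
        update(start, nbrs, seeds)
        while seeds:
            q = min(seeds, key=lambda s: reach[s])   # linear scan, no sorting
            seeds.remove(q)
            q_nbrs = neighbors(q)
            processed[q] = True
            order.append(q)
            core[q] = core_dist(q, q_nbrs)
            if core[q] is not None:
                update(q, q_nbrs, seeds)
    return order, reach, core

# Toy example: two tight groups of three channels, far from each other
groups = [0, 0, 0, 1, 1, 1]
dist = [[0.0 if i == j else (0.05 if groups[i] == groups[j] else 0.9)
         for j in range(6)] for i in range(6)]
order, reach, core = optics_order(dist, eps=0.2, min_pts=2)
```

In the toy example, channels 0 to 2 are visited first, and channel 3 starts a new expansion with an undefined reachability-distance, which would show up as a peak separating two valleys in the reachability-distance curve.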

#### 3.4. DC Selection and Clustering Results

It follows from the OPTICS algorithm that only a core object may become the DC. Therefore, the DC is selected from the core objects of each cluster according to the principle of minimum uncertainty introduced by the DC, defined as:

$${d}_{k}=\underset{{d}_{i},{c}_{j}\in {g}_{k}}{\mathrm{arg}\,\mathrm{min}}\sum_{j=1,j\ne i}^{J}H({c}_{j}|{d}_{i})$$

where ${g}_{k}$ represents the $k$-th channel group after clustering, ${d}_{i}$ and ${c}_{j}$ are channels in ${g}_{k}$, ${d}_{k}$ indicates the detection channel selected from the $k$-th group, and $J$ is the number of channels in the cluster. The above equation shows that the selected DC introduces the smallest uncertainty in predicting the other channel states, that is, the prediction accuracy is the highest. Once a DC is selected for each group, the detection channel set ${C}_{d}=\left\{{d}_{1},{d}_{2},\cdots ,{d}_{n}\right\}$ is obtained.
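The selection rule amounts to an arg min over candidate channels of the summed conditional entropy. A minimal sketch, assuming a base-2 logarithm and a CCF matrix `rho` indexed by channel; the names `select_dc`, `cluster` and `candidates` are illustrative only, with `candidates` standing for the core objects of the cluster:

```python
import math

def binary_entropy(rho):
    """H(c_j | d_i) expressed through the CCF rho between the two channels."""
    if rho <= 0.0 or rho >= 1.0:
        return 0.0
    return -rho * math.log2(rho) - (1.0 - rho) * math.log2(1.0 - rho)

def select_dc(cluster, candidates, rho):
    """Return the candidate with the smallest total uncertainty over the cluster."""
    def total_uncertainty(d):
        return sum(binary_entropy(rho[d][c]) for c in cluster if c != d)
    return min(candidates, key=total_uncertainty)

# Hypothetical CCF matrix for a three-channel cluster
rho = [[1.00, 0.98, 0.97],
       [0.98, 1.00, 0.80],
       [0.97, 0.80, 1.00]]
dc = select_dc(cluster=[0, 1, 2], candidates=[0, 1, 2], rho=rho)
```

Channel 0 is chosen in this example because it is highly correlated with both other channels, whereas channels 1 and 2 are only weakly correlated with each other.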

Because the OPTICS algorithm is a density-based greedy search, sparse points may not be clustered effectively; that is, sparse points close to the DC may be judged as “noise”. To solve this problem, after ${C}_{d}$ is determined, it is checked again whether all ε-neighborhood points of each DC are in the result queue, and any that are missing are added to it.

Let the threshold radius be ε1, with 0 < ε1 < ε. The steps to obtain the clustered channels from the updated result queue are as follows (Algorithm 2).

**Algorithm 2** Obtaining the clustering channels

Input: result queue $R$

Output: clustered channels

1. Set the threshold radius ε1.
2. Take out the head element $P$ of queue $R$. If the reachability-distance of $P$ is not greater than ε1, add $P$ to the current cluster; otherwise, if the core-distance of $P$ is less than ε1, add $P$ to a new cluster.
3. Repeat step 2 until $R$ is empty.
4. The algorithm ends when $R$ is empty.
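A sketch of the extraction step in Python is given below. It follows the standard OPTICS extraction rule, in which an object joins the current cluster when its reachability-distance does not exceed ε1, starts a new cluster when it is a core object at radius ε1, and is otherwise labeled as noise. The function name, the label `-1` for noise and the sample values are assumptions for illustration.

```python
def extract_clusters(order, reach, core, eps1):
    """Assign a cluster label to each object in the OPTICS ordering.

    `reach` and `core` use None for undefined distances; label -1 means noise.
    """
    labels = {}
    cluster_id = -1
    for p in order:
        r = reach[p]
        if r is None or r > eps1:
            # Not reachable from the current cluster at radius eps1
            if core[p] is not None and core[p] <= eps1:
                cluster_id += 1          # p opens a new cluster
                labels[p] = cluster_id
            else:
                labels[p] = -1           # sparse point treated as noise
        else:
            labels[p] = cluster_id       # p extends the current cluster
    return labels

# Hypothetical ordering output for six channels plus one isolated channel
order = [0, 1, 2, 3, 4, 5, 6]
reach = [None, 0.05, 0.05, None, 0.05, 0.05, None]
core = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, None]
labels = extract_clusters(order, reach, core, eps1=0.1)
```

The example yields two clusters, {0, 1, 2} and {3, 4, 5}, while channel 6 has neither a defined reachability-distance nor a core-distance and is marked as noise.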

## 4. HMM-Based Channel Prediction

#### 4.1. Estimation Model

Because the channels in a cluster are highly correlated, the states of the other channels (ECs) in the cluster can be predicted directly from the detected channel state. However, clustering errors, and the fact that statistically similar channels can still differ in individual time slots, lead to false alarms or missed detections; missed detections in particular cause SU interference to the PU. Therefore, it is necessary to improve the prediction of the other channels (ECs) in the cluster. In this work, the HMM is applied to predict the states of the other channels (ECs) in the cluster, so that the state of the entire wideband spectrum can be obtained quickly.

The HMM may be viewed as a mixed model consisting of a hidden Markov process and an observable stochastic process that depends on the hidden state, represented by a triple $\lambda =(\pi ,A,B)$, where the detected channel state and the predicted channel state are treated as the observable state $\theta =\left\{{\theta}_{1},{\theta}_{2},\cdots ,{\theta}_{K}\right\}$ and the hidden state $S=\left\{{s}_{1},{s}_{2},\cdots ,{s}_{D}\right\}$, respectively. The elements of $\lambda $ are defined as follows:

- $\pi =\left\{{\pi}_{i}\right\}$ represents the initial probability distribution of the state, where ${\pi}_{i}=P\left({q}_{1}={s}_{i}\right)$, $1\le i\le D$.
- The state transition probability matrix is ${A}_{D\times D}=\left\{{a}_{ij}\right\}$, where ${a}_{ij}=P\left({q}_{m+1}={s}_{j}|{q}_{m}={s}_{i}\right)$, $1\le i,j\le D$, and $\sum_{j=1}^{D}{a}_{ij}=1$; ${q}_{m}$ indicates the state of the model in the time slot $m$.
- The observation matrix is ${B}_{D\times K}=\left\{{b}_{i}(k)\right\}=\left\{{b}_{ik}\right\}$, where ${b}_{ik}=P\left({o}_{m}={\theta}_{k}|{q}_{m}={s}_{i}\right)$, $1\le i\le D$, $1\le k\le K$, and $\sum_{k=1}^{K}{b}_{ik}=1$; ${o}_{m}$ indicates the observation state of the model in the time slot $m$.

In the channel state prediction considered here, both the observable state and the hidden state take two values, that is, $K=D=2$, so $S=\left\{{s}_{1},{s}_{2}\right\}=\left\{0,1\right\}$ and $\theta =\left\{{\theta}_{1},{\theta}_{2}\right\}=\left\{0,1\right\}$. Figure 4 shows the two-state transition diagram of the HMM, where ECS is the predicted channel state and DCS is the detected channel state.

#### 4.2. HMM-Based Channel Prediction

Given the observation data $O=\left({o}_{1},{o}_{2},\cdots ,{o}_{M}\right)\in \theta $ and the history data of the predicted channel $Q=\left({q}_{1},{q}_{2},\cdots ,{q}_{M}\right)\in S$, where $M$ is the data length, the parameters of the HMM are estimated directly by the maximum likelihood estimation method. Using the discriminant function $I(A)$, the initial state probability distribution $\pi $ and the state transition probability matrix $A$ are estimated as:

$${\widehat{\pi}}_{i}=I({q}_{1}={s}_{i})$$

$${\widehat{a}}_{ij}=\frac{\sum_{m=1}^{M-1}I({q}_{m}={s}_{i})\times I({q}_{m+1}={s}_{j})}{\sum_{m=1}^{M-1}I({q}_{m}={s}_{i})}$$
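These two estimators reduce to counting: the initial distribution is an indicator of the first state, and each transition probability is the number of observed $i\to j$ transitions divided by the number of times state $i$ is left. A minimal sketch, where the function name and the uniform fallback for a state that never occurs before the last slot (for which the formula is undefined) are assumptions:

```python
def estimate_pi_a(q, n_states=2):
    """Maximum-likelihood estimates of pi and A from a state history q."""
    pi = [1.0 if q[0] == s else 0.0 for s in range(n_states)]
    counts = [[0] * n_states for _ in range(n_states)]
    for cur, nxt in zip(q, q[1:]):        # pairs (q_m, q_{m+1}), m = 1..M-1
        counts[cur][nxt] += 1
    a = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: fall back to a uniform row if the state never occurs
            a.append([1.0 / n_states] * n_states)
        else:
            a.append([c / total for c in row])
    return pi, a

# Hypothetical state history of an estimated channel (0 = idle, 1 = busy)
history = [0, 0, 1, 1, 1, 0]
pi_hat, a_hat = estimate_pi_a(history)
```

For this history, state 0 is left twice (once to 0, once to 1) and state 1 three times (twice to 1, once to 0), which gives the rows of `a_hat`.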

The observation matrix is obtained from the correlation coefficient matrix. When the correlation coefficient of the two channels satisfies ${\rho}_{ij}\ge 0.5$, which indicates that they are positively correlated, the observation matrix is:

$${b}_{i}({\theta}_{j})=p({o}_{m}={\theta}_{j}|{q}_{m}={s}_{i})=\begin{cases}{\rho}_{ij}, & {s}_{i}={\theta}_{j}\\ 1-{\rho}_{ij}, & {s}_{i}\ne {\theta}_{j}\end{cases}$$

The HMM parameters λ = (π, A, B) are estimated by the above method, and the other channel states in the channel cluster are predicted by the Viterbi algorithm, thereby obtaining the occupancy of the wideband spectrum after clustering. The time complexity of the HMM prediction mainly depends on the Viterbi algorithm, whose complexity is $\mathrm{O}({D}^{2}M)$, where $D$ is the number of hidden states and $M$ is the length of the observation series.
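The decoding step can be sketched with a generic log-domain Viterbi implementation. This is a textbook version, not the authors' code: the function names, the parameter values and the observation sequences are hypothetical, and the observation matrix is built for the positively correlated case only.

```python
import math

def observation_matrix(rho):
    """B for two states when the DC and the EC are positively correlated."""
    return [[rho, 1.0 - rho],
            [1.0 - rho, rho]]

def viterbi(obs, pi, a, b):
    """Most likely hidden state sequence given observations `obs`."""
    n = len(pi)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    delta = [log(pi[s]) + log(b[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            # Best predecessor of state j at this time slot
            best = max(range(n), key=lambda i: prev[i] + log(a[i][j]))
            ptr.append(best)
            delta.append(prev[best] + log(a[best][j]) + log(b[j][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):           # backtrack through stored pointers
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Strong correlation: the EC estimate follows the detected DC states
dc_states = [0, 0, 1, 1, 1, 0]
ec_strong = viterbi(dc_states, [1.0, 0.0],
                    [[0.9, 0.1], [0.1, 0.9]], observation_matrix(0.95))

# Weaker correlation and a "sticky" EC: an isolated DC flip is smoothed out
ec_smooth = viterbi([0, 0, 1, 0, 0], [1.0, 0.0],
                    [[0.95, 0.05], [0.05, 0.95]], observation_matrix(0.7))
```

The loop over time slots contains two nested loops over the $D$ states, so this sketch performs on the order of ${D}^{2}M$ operations. The second example shows how the transition matrix lets the HMM differ from a direct copy of the DC state when the correlation is weaker.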

## 5. Simulation Results and Discussion

#### 5.1. Simulation Results of Channels Clustering

Based on the OPTICS algorithm, the reachability-distance curves and the channel clustering results of the NE TV frequency band with different values of ε and MinPts are shown in Figure 5. The values of ε are 0.11 and 0.21, and the corresponding channel correlation coefficients are 0.985 and 0.965. In the reachability-distance curve, the abscissa is the processing order of the points in the result queue of the OPTICS algorithm, and the ordinate is the reachability-distance. A cluster appears as a valley in the curve, and the deeper the valley, the denser the cluster. A point that does not form a valley is a sparse point treated as "noise".

It can be seen from the reachability-distance curve that the smaller ε is, the higher the correlation required of the clustered channels, that is, the denser the clustered channels, and low-density channels cannot be clustered. The smaller MinPts is, the more jagged the curve, and the larger MinPts is, the smoother the curve. The empirical value of MinPts is 10 to 20, which is applied in this work. The channel clustering results show that different values of ε may lead to different numbers of channel clusters, and the smaller ε is, the fewer channels each cluster contains. Some clusters in the figure show no channels, because only clusters with more than 3 channels are displayed.

There are about 420 sub-channels in the NE TV frequency band, but no more than 180 of them are clustered, because the OPTICS parameters are set strictly, so that only highly correlated sub-channels are clustered. If more sub-channels need to be clustered, the parameters ε and MinPts can be changed.

For a cluster, the DC determined by Equation 3 is declared the center of the cluster because it introduces the least uncertainty for the other predicted channels in the cluster. As shown in Figure 5, the clustered NE TV band channels fall into two groups under the different parameter settings, so only two DCs need to be detected, which saves a lot of time compared with detecting all channels one by one. Meanwhile, the DC of each cluster remains unchanged although the parameters ε and MinPts differ. The state of each DC is assumed to be obtained by detection.

OPTICS is an unsupervised learning algorithm. This paper therefore evaluates the clustering performance through the prediction results for the other channels (ECs) in each channel cluster.

#### 5.2. Performance Analysis

The wideband spectrum sensing performance is evaluated by the false alarm probability (${P}_{f}$) and the missed detection probability (${P}_{m}$), which are calculated as follows [20]:

$$\begin{cases}{P}_{f}=\dfrac{N(c\widehat{s}=1|cs=0)}{N(cs=0)}\\ {P}_{m}=\dfrac{N(c\widehat{s}=0|cs=1)}{N(cs=1)}\end{cases}$$

where $c\widehat{s}$ and $cs$ are the estimated and actual channel states, respectively, and $N(\cdot )$ is the number of occurrences of an event. ${P}_{f}$ is the probability that an idle channel is estimated as busy, which reduces the spectrum efficiency, and ${P}_{m}$ is the probability that a busy channel is estimated as idle, which increases the interference to PUs.
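Both metrics are ratios of counts over the evaluated time slots and can be computed as in the sketch below. The function name, the sample sequences and the convention of returning 0.0 when a state never occurs are assumptions for illustration.

```python
def sensing_errors(estimated, actual):
    """Return (P_f, P_m) from estimated and actual binary channel states."""
    if len(estimated) != len(actual):
        raise ValueError("sequences must have equal length")
    idle = sum(1 for cs in actual if cs == 0)
    busy = sum(1 for cs in actual if cs == 1)
    false_alarms = sum(1 for e, cs in zip(estimated, actual) if cs == 0 and e == 1)
    missed = sum(1 for e, cs in zip(estimated, actual) if cs == 1 and e == 0)
    p_f = false_alarms / idle if idle else 0.0   # assumption: 0.0 if never idle
    p_m = missed / busy if busy else 0.0         # assumption: 0.0 if never busy
    return p_f, p_m

# Hypothetical actual and estimated states over eight time slots
actual = [0, 0, 0, 0, 1, 1, 1, 1]
estimated = [0, 1, 0, 0, 1, 1, 0, 0]
p_f, p_m = sensing_errors(estimated, actual)
```

In this example one of four idle slots is estimated as busy and two of four busy slots are estimated as idle.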

In this paper, the effects of different OPTICS parameters (ε and MinPts) on HMM prediction for different frequency bands (NE GSM1800 DL band and NE TV band) are analyzed, and the results are shown in Figure 6 and Figure 7.

The simulation results show that, for channel clusters with a CCF greater than 0.95, the ${P}_{m}$ of HMM prediction is less than 0.035 and the ${P}_{f}$ is less than 0.03. In the OPTICS algorithm, the smaller the neighborhood threshold, that is, the higher the correlation of the clustered channels, the lower the sum of the false alarm probability and the missed detection probability of the HMM prediction. When the neighborhood threshold becomes larger (that is, the channel correlation is weakened) and the density threshold remains unchanged, the probability of missed detection decreases and the false alarm probability increases. When the neighborhood threshold is constant and the density threshold becomes larger, the probability of missed detection increases and the probability of false alarm decreases.

#### 5.3. Performance Comparison

To compare with the GC and MEI algorithms in [20], the data of the NE TV band channels are employed in our work. In the OPTICS algorithm, some sparse channels regarded as “noise” are not clustered into any group, so the efficiency gain (EG) in this paper also represents the detection time saved compared with detecting all channels. It is calculated as follows:

$$EG=\frac{{N}_{c}}{N}$$

where ${N}_{c}$ is the number of channels clustered by the OPTICS algorithm, and $N$ is the total number of candidate channels for clustering. EG is the ratio of clustered channels to the total channels.

The EGs in our work are calculated from the OPTICS clustering results with different ε and MinPts parameters. The maximum ${P}_{m}$ and ${P}_{f}$ obtained from the simulation of the NE TV band channels are compared with the ${P}_{m}$ and ${P}_{f}$ reported in [20] in Figure 8, while the computational complexities are compared in Table 1. As can be seen from Figure 8 and Table 1, the OPTICS-based method greatly outperforms the GC algorithm, even though its computational complexity is slightly higher, which is tolerable. The OPTICS-based method outperforms the MEI algorithm in both prediction performance and computational complexity.

## 6. Conclusions

In this paper, a wideband spectrum sensing method based on channel clustering and HMM prediction is presented. The highly correlated channels are clustered by the OPTICS algorithm. A DC is selected in each channel cluster according to the principle of minimum information entropy, and the states of the other channels (ECs) in the cluster are predicted by an HMM. The simulation results show that the proposed method can effectively improve the sensing performance, and that HMM-based prediction performs better than direct estimation. If a wider band needs to be detected, the spectrum resolution can be reduced, which correspondingly reduces the amount of data processing. Meanwhile, only one channel needs to be detected in each cluster, so the method can be used for fast wideband spectrum sensing. However, the accuracy of statistics-based spectrum prediction may not meet the needs of cognitive users in some applications with strict precision requirements, where it serves only as a rough sensing method. In future work, channels will be selected according to the rough spectrum sensing results and the communication requirements, and a more accurate and faster spectrum sensing method will then be applied to the selected channels. For shadowed or fading channels, cooperative spectrum sensing is essential to improve the detection performance [30], and decision fusion rules are an important topic of our future research [31,32].

## Author Contributions

Conceptualization, H.W. and B.W.; methodology, Y.Y.; software, M.Q.; validation, H.W., B.W. and Y.Y.; formal analysis, M.Q.; investigation, H.W.; resources, Y.Y.; data curation, M.Q.; writing—original draft preparation, H.W.; writing—review and editing, M.Q.; visualization, H.W.; supervision, B.W.; project administration, M.Q.; funding acquisition, Y.Y.

## Funding

This research was funded in part by the National Key Research and Development Project under Grant 2016YFF0104000, in part by the Science and Technology Planning Project of Sichuan Province under Grant 2019YJ0309, in part by the Sichuan Education Department Funded Project under Grant 18ZB0611, and in part by the Longshan Academic Research Support Team of Southwest University of Science and Technology under Grant 17LZXT12.

## Acknowledgments

We are grateful to the Spectrum Data Archive of the Institute of Networked Systems at RWTH Aachen University for providing the wireless data.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Mitola, J.; Maguire, G.Q., Jr. Cognitive Radio: Making Software Radios More Personal. IEEE Pers. Commun. **1999**, 6, 13–18.
2. Chen, Y.F.; Oh, H.S. A Survey of Measurement-based Spectrum Occupancy Modeling for Cognitive Radios. IEEE Commun. Surv. Tutor. **2016**, 18, 848–859.
3. Hattab, G.; Ibnkahla, M. Multiband Spectrum Access: Great Promises for Future Cognitive Radio Networks. Proc. IEEE **2014**, 102, 282–306.
4. Ma, Y.; Gao, Y.; Cavallaro, A.; Parini, C.G.; Zhang, W.; Liang, Y.C. Sparsity Independent Sub-Nyquist Rate Wideband Spectrum Sensing on Real-Time TV White Space. IEEE Trans. Veh. Technol. **2017**, 66, 8784–8794.
5. Mohsen, G.; Ali, S. Adaptive data-driven wideband compressive spectrum sensing for cognitive radio networks. J. Commun. Inf. Netw. **2018**, 3, 75–83.
6. Luo, L.; Neihart, N.M.; Roy, S.; Allstot, D.J. A two-stage sensing technique for dynamic spectrum access. IEEE Trans. Wirel. Commun. **2009**, 8, 3028–3037.
7. Samuel, C.P.; Kankar, D.S. A Low-Complexity Multistage Polyphase Filter Bank for Wireless Microphone Detection in CR. Circuits Syst. Signal Process. **2017**, 36, 1671–1685.
8. Xing, X.S.; Jing, T.; Cheng, W.; Huo, Y.; Cheng, X.Z. Spectrum Prediction in Cognitive Radio Networks. IEEE Wirel. Commun. **2013**, 20, 90–96.
9. Ding, G.R.; Jiao, Y.T.; Wang, J.L.; Zou, Y.L.; Wu, Q.H.; Yao, Y.D.; Hanzo, L. Spectrum Inference in Cognitive Radio Networks: Algorithms and Applications. IEEE Commun. Surv. Tutor. **2018**, 20, 150–182.
10. Melián-Gutiérrez, L.; Zazo, S.; Blanco-Murillo, J.L.; Pérez-Álvarez, I.; García-Rodríguez, A.; Pérez-Díaz, B. HF Spectrum Activity Prediction Model Based on HMM for Cognitive Radio Applications. Phys. Commun. **2013**, 9, 199–211.
11. Ghosh, C.; Cordeiro, C.; Agrawal, D.P.; Rao, M.B. Markov Chain Existence and Hidden Markov Models in Spectrum Sensing. In Proceedings of the 2009 IEEE International Conference on Pervasive Computing and Communications, Galveston, TX, USA, 9–13 March 2009; pp. 1–6.
12. He, X.F.; Dai, H.Y.; Ning, P. HMM-Based Malicious User Detection for Robust Collaborative Spectrum Sensing. IEEE J. Sel. Areas Commun. **2013**, 31, 2196–2208.
13. Rossi, P.S.; Ciuonzo, D.; Ekman, T. HMM-Based Decision Fusion in Wireless Sensor Networks with Noncoherent Multiple Access. IEEE Commun. Lett. **2015**, 19, 871–874.
14. Nguyen, T.; Mark, B.L.; Ephraim, Y. Spectrum Sensing Using a Hidden Bivariate Markov Model. IEEE Trans. Wirel. Commun. **2013**, 12, 4582–4591.
15. Chakraborty, D.; Sanyal, S.K. Performance analysis of different autoregressive methods for spectrum estimation along with their real time implementations. In Proceedings of the Second International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, 23–25 September 2016.
16. Li, H.S.; Qiu, R.C. A Graphical Framework for Spectrum Modeling and Decision Making in Cognitive Radio Networks. In Proceedings of the 2010 IEEE Global Telecommunications Conference (GLOBECOM), Miami, FL, USA, 6–10 December 2010; pp. 1–6.
17. Hossain, K.; Champagne, B.; Assra, A. Cooperative multiband joint detection with correlated spectral occupancy in cognitive radio networks. IEEE Trans. Signal Process. **2012**, 60, 2682–2687.
18. Yin, S.X.; Chen, D.W.; Zhang, Q.; Liu, M.Y.; Li, S.F. Mining Spectrum Usage Data: A Large-Scale Spectrum Measurement Study. IEEE Trans. Mob. Comput. **2012**, 11, 1033–1046.
19. Sun, J.C.; Wang, J.L.; Ding, G.R.; Shen, L.; Yang, J.; Wu, Q.H.; Yu, L. Long-Term Spectrum State Prediction: An Image Inference Perspective. IEEE Access **2018**, 6, 43489–43498.
20. Gao, M.F.; Yan, X.; Zhang, Y.C.; Liu, C.Y.; Zhang, Y.F.; Feng, Z.Y. Fast Spectrum Sensing: A Combination of Channel Correlation and Markov Model. In Proceedings of the 2014 IEEE Military Communications Conference, Baltimore, MD, USA, 6–8 October 2014; pp. 405–410.
21. Huang, S.; Feng, Z.Y.; Yao, Y.Y.; Zhang, Y.F.; Zhang, P. Multi-centers cooperative estimation based fast spectrum sensing. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016; pp. 1–6.
22. Han, W.J.; Huang, C.; Li, J.D.; Li, Z.; Cui, S.G. Correlation-Based Spectrum Sensing with Oversampling in Cognitive Radio. IEEE J. Sel. Areas Commun. **2015**, 33, 788–802.
23. Juboori, S.A.; Fernando, X. Multiantenna Spectrum Sensing Over Correlated Nakagami-m Channels with MRC and EGC Diversity Receptions. IEEE Trans. Veh. Technol. **2018**, 67, 2155–2164.
24. Cacciapuoti, A.S.; Akyildiz, I.F.; Paura, L. Correlation-aware user selection for cooperative spectrum sensing in cognitive radio ad hoc networks. IEEE J. Sel. Areas Commun. **2012**, 30, 297–306.
25. Wellens, M.; Mahonen, P. Lessons learned from an extensive spectrum occupancy measurement campaign and a stochastic duty cycle model. In Proceedings of the 2009 5th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities and Workshops, Washington, DC, USA, 6–8 April 2009; pp. 1–9.
26. Gao, M.F.; Yan, X.; Zhu, Y.; Zhang, Q.X.; Feng, Z.Y.; Liu, B.L. Channel Correlation Assisted Fast Spectrum Sensing. In Proceedings of the 2013 IEEE 78th Vehicular Technology Conference (VTC Fall), Las Vegas, NV, USA; 2013; pp. 1–5.
27. Omrani, A.; Santhisree, K.; Damodaram. Clustering sequential data with OPTICS. In Proceedings of the 2011 IEEE 3rd International Conference on Communication Software and Networks, Xi’an, China, 27–29 May 2011; pp. 49–60.
28. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X.W. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, 2–4 August 1996; pp. 226–231.
29. Sun, J.C.; Shen, L.; Ding, G.R.; Li, R.P.; Wu, Q.H. Predictability Analysis of Spectrum State Evolution: Performance Bounds and Real-World Data Analytics. IEEE Access **2017**, 5, 22760–22774.
30. Ganesan, G.; Li, Y. Cooperative Spectrum Sensing in Cognitive Radio, Part II: Multiuser Networks. IEEE Trans. Wirel. Commun. **2007**, 6, 2214–2222.
31. Rossi, P.S.; Ciuonzo, D.; Romano, G. Orthogonality and Cooperation in Collaborative Spectrum Sensing through MIMO Decision Fusion. IEEE Trans. Wirel. Commun. **2013**, 12, 5826–5836.
32. Mohammad, F.R.; Ciuonzo, D.; Mohammed, Z.A.K. Mean-Based Blind Hard Decision Fusion Rules. IEEE Signal Process. Lett. **2018**, 25, 630–634.

**Figure 3.** Channel occupancy and channel correlation distribution. (**a**) NE GSM1800 DL channel occupancy distribution. (**b**) NE GSM1800 DL channel correlation distribution. (**c**) NE TV channel occupancy distribution. (**d**) NE TV channel correlation distribution.

**Figure 5.** OPTICS algorithm clustering NE TV band channels. (**a**) Reachability-distance curve (ε = 0.11, MinPts = 10). (**b**) Channels clustering result (ε = 0.11, MinPts = 10). (**c**) Reachability-distance curve (ε = 0.21, MinPts = 10). (**d**) Channels clustering result (ε = 0.21, MinPts = 10). (**e**) Reachability-distance curve (ε = 0.21, MinPts = 20). (**f**) Channels clustering result (ε = 0.21, MinPts = 20).

**Figure 6.** ${P}_{m}$ and ${P}_{f}$ predicted by HMM for NE GSM1800 DL band. (**a**) The relational graph between ${P}_{m}$ and CCF with different ε and MinPts. (**b**) The relational graph between ${P}_{f}$ and CCF with different ε and MinPts.

**Figure 7.** ${P}_{m}$ and ${P}_{f}$ predicted by HMM for NE TV (614–698 MHz) frequency band. (**a**) The relational graph between ${P}_{m}$ and CCF with different ε and MinPts. (**b**) The relational graph between ${P}_{f}$ and CCF with different ε and MinPts.

**Table 1.** Comparison of computational complexity.

| | OPTICS Algorithm | MEI Algorithm | GC Algorithm |
|---|---|---|---|
| Time Complexity | $\mathrm{O}({N}^{2})$ | $\mathrm{O}({N}^{5})$ | $\mathrm{O}(\mathrm{ln}N+1)$ |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).