1. Introduction
Today, drones, or unmanned aerial vehicles (UAVs), are used to improve our daily lives. Due to the rapid development of embedded devices and wireless communications, drones have become cheaper and more powerful. For example, drones contribute to civil areas such as logistics and agriculture and play important roles in search and rescue during emergency disasters [1,2]. Drones have become an integral part of society's rapid development [3,4]. However, the widespread use of drones without government regulation may pose risks to people's security and privacy. For example, some people use drones to eavesdrop on wireless communication data from long distances [5]. Therefore, the relevant authorities have considered the safety and privacy issues involved in drones, and efficient identification and detection of drone signals need to be adopted [6].
Radiofrequency (RF) fingerprinting-based recognition technology [7,8,9] is a classification technology based on physical-layer measurements. RF fingerprinting plays an important role in the recognition and detection of drones and can accurately identify a variety of Internet of Things (IoT) devices [10]. Because the inherent characteristics and specifications of different IoT devices (e.g., their radio frequencies) are not completely consistent, RF fingerprinting technology detects and identifies different devices by extracting these subtle differences. The process of RF fingerprinting recognition usually includes two steps, training and classification [11,12,13], which are shown in Figure 1. First, we use an RF data receiver to collect RF data from different IoT devices, such as nonlinear phase changes and frequency offsets; the RF fingerprinting characteristics of each device are then extracted and stored in a database [14]. Second, we can identify and classify the signals of unknown devices according to the RF fingerprinting database prepared in the first step. M. Ezuma et al. [15] used the k-nearest neighbor (KNN) classifier to detect and classify RF signals from different UAV controllers. However, there is an upper limit to the recognition accuracy of RF fingerprinting based on traditional algorithms, and better technology with higher recognition performance is urgently needed [16].
Deep learning (DL)-aided algorithms [17,18,19,20,21] are widely used in wireless communications and other fields [22,23,24] due to their efficient feature extraction and recognition capabilities [20,25,26,27,28,29,30,31,32]. Convolutional neural networks (CNNs) have improved the classification accuracies of automatic modulation classification (AMC) [33]. Y. Wang et al. [34] proposed a new AMC method that combines two CNNs trained on different datasets and achieved higher identification accuracy. The use of DL algorithms in RF fingerprint recognition technology has also led to many outstanding results. For example, L. Peng et al. [35] utilized the differential constellation trace figure (DCTF) and a CNN to achieve 99% recognition accuracy on 54 Zigbee devices.
RF fingerprinting technology based on DL has achieved high classification accuracy with deep real-valued networks. However, the RF signals sent by wireless devices combine in-phase and quadrature components. Compared with a deep real-valued network, a deep complex-valued network [36] extracts more abstract features from drone RF signals (the RF signals transmitted by drones), which helps to achieve higher classification accuracy. Inspired by this, this paper proposes a deep complex-valued CNN (DC-CNN)-based RF recognition technology to detect different drone signals. Unlike most RF technologies that use real-valued CNN models, our proposed model is based on a deep complex-valued network, which extracts the hidden features of drone signals with higher accuracy than a real-valued CNN. The main contributions of this article are summarized as follows:
We propose a drone recognition technology based on a DC-CNN model with improved classification performance on two independent drone signal datasets.
Our study used recently published drone datasets [37] in which drone RF data (measured under different operating modes) and background activities were captured in a laboratory setting at Qatar University. These datasets used two RF signal receivers to capture the high- and low-frequency signal data of each drone, and the entire RF spectrum was obtained by performing a discrete Fourier transform (DFT) on these signals.
We present nine different models and compare and evaluate their classification performances. We comprehensively evaluated each algorithm and found that the proposed DC-CNN model is superior to the other algorithm models.
The remainder of the paper is organized as follows: Section 2 is an overview of the related work. Section 3 introduces our proposed drone recognition system design and some basic theory of deep complex-valued networks. Section 4 provides the architectures of the two algorithm models and their training steps. The dataset implementations and simulation results of our drone recognition methods are described in Section 5. Finally, the paper is concluded in Section 6.
4. Algorithm Model and Implementation
In this section, we focus on two DL algorithm models. We first introduce an existing classification model, the convolutional, long short-term memory, fully connected deep neural network (CLDNN), which incorporates three classic DL models. Next, we elaborate on the DC-CNN algorithm proposed in this paper for drone detection, which extends a deep real-valued CNN (DR-CNN). Finally, the architectures of the other DL models and the specific training steps are described.
4.1. Architecture of CLDNN
CLDNN combines the advantages of three DL models: the CNN, long short-term memory (LSTM), and the deep neural network (DNN). The architecture of the CLDNN model is depicted in Figure 6. In addition to the input and output layers, the CLDNN model includes a convolutional part, an LSTM part, and a fully connected (FC) part. The input layer size equals the sample size of the drone dataset used in this paper.
This model has two convolutional layers, one LSTM layer, and three FC layers. In order to better extract features, the convolutional kernel numbers in the two convolutional layers are 128 and 64, respectively, and the LSTM layer has 256 neurons. A dropout layer follows each convolutional and LSTM layer in order to reduce overfitting and accelerate network convergence. Moreover, the neuron numbers in the three FC layers are 256, 128, and M (the number of drone classes to be distinguished). By connecting the FC layers with activation functions, we can output the predicted probability of the target drone and realize drone recognition.
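A minimal PyTorch sketch of the CLDNN structure described above. The channel counts (128, 64), LSTM width (256), and FC widths (256, 128, M) follow the text; the kernel sizes (7 and 5), dropout rate, and input length are assumptions, since those values are not recoverable here:

```python
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    """Conv -> LSTM -> FC pipeline; layer widths follow the description,
    kernel sizes and dropout rate are illustrative assumptions."""
    def __init__(self, num_classes: int, in_channels: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=7), nn.ReLU(), nn.Dropout(0.5),
            nn.Conv1d(128, 64, kernel_size=5), nn.ReLU(), nn.Dropout(0.5),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=256, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),      # softmax applied at inference
        )

    def forward(self, x):                     # x: (batch, 2, length)
        x = self.conv(x)                      # (batch, 64, length')
        x = x.permute(0, 2, 1)                # (batch, length', 64) for the LSTM
        _, (h, _) = self.lstm(x)              # h: (1, batch, 256)
        return self.fc(h.squeeze(0))          # class logits

model = CLDNN(num_classes=4)                  # M = 4 classes, as in dataset 1
logits = model(torch.randn(8, 2, 256))        # batch of 8 IQ samples
```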
4.2. Architecture of DC-CNN
The DC-CNN is used to extract hidden features from drone signals because it has a more powerful and effective feature representation and classification capability than the DR-CNN in signal identification. Unlike the CLDNN model, the proposed DC-CNN model is based on the fusion of a traditional CNN and a deep complex-valued network. The architecture of the DC-CNN model used for drone detection is depicted in Figure 7. In addition to the input and output layers, our proposed DC-CNN model includes a complex-valued convolutional part and a complex-valued FC part, which use the complex-valued convolution operation and complex-valued weight initialization described in Section 3. The input size of our proposed model equals the sample size of the drone dataset. Moreover, the specific parameter settings are based on our training experience and were adjusted over multiple experiments to achieve the best recognition ability.
Our proposed DC-CNN model has two complex-valued convolutional layers and three complex-valued FC layers. The complex-valued convolutional kernel size is 16 in the first layer and 8 in the second. In order to better extract features, the complex-valued convolutional kernel numbers in the two layers are 128 and 64, respectively, and all complex-valued convolutional layers use one-dimensional complex-valued convolution. Additionally, the activation functions of the complex-valued convolutional layers are complex-valued, and a dropout layer follows each of them in order to reduce overfitting and accelerate network convergence. Moreover, the neuron numbers in the three complex-valued FC layers are 256, 128, and M (the number of drone classes to be distinguished). By connecting the complex-valued FC layers with an activation function, we can output the predicted probability of the target drone and realize drone detection.
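The core operation of a complex-valued convolutional layer can be realized with four real-valued convolutions, since (W_r + jW_i) * (x_r + jx_i) = (W_r * x_r − W_i * x_i) + j(W_r * x_i + W_i * x_r). A NumPy sketch of one channel (kernel length 8, matching the second layer; the signal length is illustrative):

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """One complex-valued 1-D convolution channel, realized with
    four real-valued convolutions:
    (W_r + j W_i) * (x_r + j x_i)
      = (W_r * x_r - W_i * x_i) + j (W_r * x_i + W_i * x_r)."""
    re = np.convolve(x_re, w_re, mode="valid") - np.convolve(x_im, w_im, mode="valid")
    im = np.convolve(x_re, w_im, mode="valid") + np.convolve(x_im, w_re, mode="valid")
    return re, im

# Sanity check against NumPy's native complex convolution:
rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)   # IQ sample
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)     # complex kernel
re, im = complex_conv1d(x.real, x.imag, w.real, w.imag)
ref = np.convolve(x, w, mode="valid")
assert np.allclose(re + 1j * im, ref)
```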
4.3. Architecture of Other DL Models
As mentioned above, in reference [40], the authors used a real-valued convolutional neural network to identify RF data, and in [37], the authors used a three-layer fully connected neural network to identify drone signals. We also used these two DL models in comparative experiments to show the superiority of our approach. The remaining DL models used in this paper are the DR-CNN (Conv2D) and DR-CNN (Conv1D) from [40], a fully connected neural network (FCN) from [37], and an LSTM model; their architectures are shown in Table 1.
4.4. Training Process of DL-Based Drone Recognition Method
The DL-based drone RF fingerprinting recognition method is modeled as a multi-class classification problem, and cross-entropy (CE) is generally applied as its loss function. Suppose that the training samples and labels are $\{(x_i, y_i)\}_{i=1}^{N}$; the CE loss function can then be expressed as

$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} y_{i,m} \log f_m(x_i; \theta),$$

where $\mathcal{L}(\theta)$ is the CE loss, which indicates the classification performance, $f(\cdot; \theta)$ represents the mapping between the input and output of the DL model, and $\theta$ is its weight vector. Considering that we have thousands of samples for each training run, it is impossible to feed them all into the network simultaneously. Our training batch size is 64, which means that each loss value is calculated from 64 drone samples.
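The batch CE computation can be sketched in NumPy as follows (the probabilities and labels are illustrative, not values from the paper):

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy over a batch.
    probs:  (batch, M) softmax outputs of the network
    labels: (batch, M) one-hot ground truth"""
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

# Batch of 2 samples, M = 4 classes (values illustrative):
probs = np.array([[0.90, 0.05, 0.03, 0.02],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.eye(4)[[0, 2]]          # true classes are 0 and 2
loss = cross_entropy(probs, labels) # -(ln 0.9 + ln 0.25) / 2
```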
Additionally, Adam is used as the optimizer for the DL-based methods, which can be written as

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t,$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2,$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t},$$
$$\theta_t = \theta_{t-1} - \frac{\alpha \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

where $m_t$ and $v_t$ are the biased first- and second-moment estimates; $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first- and second-moment estimates; $g_t$ is the gradient; $\beta_1$ and $\beta_2$ are the decaying factors; $\alpha$ is the learning rate; and $\epsilon$ is a small constant that prevents division by zero.
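One optimizer update step can be sketched in NumPy as follows; the hyperparameter defaults below are the commonly used choices, not values specified in this paper:

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update following the equations above."""
    m = beta1 * m + (1 - beta1) * g           # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2      # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(3)                           # toy 3-parameter model
m, v = np.zeros(3), np.zeros(3)
g = np.array([0.1, -0.2, 0.3])                # illustrative gradient
theta, m, v = adam_step(theta, g, m, v, t=1)  # first step moves ~alpha per weight
```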
Finally, the steps of the entire proposed intelligent drone recognition algorithm are presented. Algorithm 1 lists the pseudo-code of the DL-based drone RF fingerprinting recognition method.
Algorithm 1 The proposed DL-based drone recognition method.
Require: IQ samples;
Ensure: The best algorithm model for the drone detection method;
1: Select IQ samples from the drone datasets and mix them randomly;
2: Initialize the complex-valued neural network weight parameters;
3: Divide all drone datasets into training and testing sets at a 7:3 ratio;
4: Send the IQ samples to the different DL models for training; the structure and parameters of the CLDNN model are shown in Figure 6 and those of the DC-CNN in Figure 7;
5: Test and verify the data multiple times in order to obtain the average recognition accuracy and the GPU running time per sample;
6: Evaluate the various indicators of each algorithm and select the best algorithm for the drone recognition method;
7: return The best algorithm model.
4.5. Comparison Method: TD Feature with ML Recognizers
Here, we adopt TD feature- and ML-based drone recognition as the comparison method, which is described as follows.
4.5.1. Pre-Processing & Conversion
First, the complex baseband signal is converted into other forms. For instance, the complex baseband signal is defined as $s(k) = I(k) + jQ(k)$, $k = 1, 2, \ldots, K$, where $I(k)$ and $Q(k)$ are the in-phase and quadrature components, respectively, and $K$ is the number of sampling points. In the time domain, the baseband signal can be converted into the instantaneous amplitude $a(k) = \sqrt{I^2(k) + Q^2(k)}$, the instantaneous phase $\phi(k) = \arctan(Q(k)/I(k))$, and the instantaneous frequency $f(k) = \frac{1}{2\pi}\frac{d\phi(k)}{dk}$. In addition, the conversion can also be performed by the wavelet transform, the Hilbert–Huang transform, or differential constellation trace figures.
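These time-domain conversions can be sketched in NumPy as follows (the sampling rate and the test tone are illustrative):

```python
import numpy as np

def td_conversions(iq: np.ndarray, fs: float = 1.0):
    """Instantaneous amplitude, phase, and frequency of a complex
    baseband signal s(k) = I(k) + jQ(k)."""
    amp = np.abs(iq)                          # a(k) = sqrt(I^2 + Q^2)
    phase = np.unwrap(np.angle(iq))           # phi(k) = arctan(Q/I), unwrapped
    freq = np.diff(phase) * fs / (2 * np.pi)  # f(k) ~ (1/2pi) * dphi/dk
    return amp, phase, freq

# A pure tone at 0.05 * fs should yield unit amplitude and a constant
# instantaneous frequency of 0.05:
k = np.arange(256)
tone = np.exp(2j * np.pi * 0.05 * k)
amp, phase, freq = td_conversions(tone)
```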
4.5.2. Feature Extraction
Here, we mainly introduce time-domain (TD) features [42]. In detail, the first step of feature extraction is to evenly divide the converted signal component into multiple slices. Assuming that the converted signal component is $x$, the length of each slice is $L$, and the number of slices is $N$, the $i$-th component slice can be written as

$$x_i = [x((i-1)L + 1), x((i-1)L + 2), \ldots, x(iL)], \quad i = 1, 2, \ldots, N.$$

Features are extracted from each slice $x_i$ and from the total signal component $x$. Typical features are the standard deviation $\sigma$, variance $\sigma^2$, skewness $\gamma$, and kurtosis $\kappa$. The four features extracted from the $i$-th slice form a feature vector $F_i = [\sigma_i, \sigma_i^2, \gamma_i, \kappa_i]$. Next, the feature vectors extracted from each slice and from the total signal component are integrated into the final feature vector for traditional transmitter device recognition.
In addition to the above TD features, the energy entropy, first-order moments, and second-order moments are useful temporal features, while the spectral flatness, spectral brightness, and spectral roll-off are effective spectral features.
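The slicing and per-slice feature extraction described above can be sketched as follows (the signal length and slice count are illustrative):

```python
import numpy as np

def _skewness(s):
    m, sd = s.mean(), s.std()
    return np.mean((s - m) ** 3) / sd ** 3

def _kurtosis(s):
    m, sd = s.mean(), s.std()
    return np.mean((s - m) ** 4) / sd ** 4

def td_features(component: np.ndarray, num_slices: int) -> np.ndarray:
    """Concatenate {std, variance, skewness, kurtosis} computed on every
    slice and on the whole component."""
    feats = []
    for s in np.split(component, num_slices) + [component]:
        feats.extend([np.std(s), np.var(s), _skewness(s), _kurtosis(s)])
    return np.array(feats)

rng = np.random.default_rng(0)
x = rng.standard_normal(400)          # one converted signal component
fv = td_features(x, num_slices=4)     # (4 slices + whole signal) x 4 features
```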
4.5.3. Recognition-Based on ML
With the development of ML, ML recognizers such as the support vector machine (SVM), random forest (RF), and decision tree (DT) have been applied to identify various specific emitters based on the above extracted features.
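This recognition step can be sketched with scikit-learn using synthetic feature vectors for two hypothetical emitters; the data and hyperparameters below are illustrative, not those used in the paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy 4-D feature vectors for two well-separated synthetic "emitters":
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Fit each ML recognizer on half the data and score on the other half:
accs = {}
for name, clf in [("SVM", SVC()),
                  ("RF", RandomForestClassifier(n_estimators=50, random_state=0)),
                  ("DT", DecisionTreeClassifier(random_state=0))]:
    accs[name] = clf.fit(X[::2], y[::2]).score(X[1::2], y[1::2])
```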
5. Results and Discussion
In this section, the performance of our proposed DC-CNN model is demonstrated and analyzed using two different drone datasets. We compared a total of nine different machine learning and DL models, all trained on independent training and testing datasets. We comprehensively evaluated all algorithms in terms of classification accuracy, per-sample running time, model size, and so on. The details are given below.
5.1. Dataset Description and Experimental Setup
Similar to most DL-based RF fingerprinting studies, we used two RF receivers to capture high-frequency and low-frequency signals while multiple drones were running. A portable computer was used to apply the DFT to the RF data collected from the two receivers and to connect the results to form the entire drone RF spectrum. We used two independent RF fingerprint-based drone datasets for training and testing. In Table 2, we provide all details of the drone datasets used in our simulation experiments. Dataset 1 contained four classes of drone data: background activities (noise signals in the space without drones), drone 1 activities, drone 2 activities, and drone 3 activities. The three drone activities were in flight modes without any instructions or operations. This dataset was used to verify whether our proposed DC-CNN could identify different drones accurately. Moreover, dataset 2 contained RF data collected from two kinds of drones in four different operating modes: connecting mode (connecting to the controller), automatic hovering mode (without other instructions), straight flight mode (flying without video recording), and recording mode (flying with video recording). Two different drones were tested and data were collected in the four operating modes, totaling eight classes of drone data. This dataset was used to verify whether our proposed DC-CNN could perform better in identifying the different operating modes of one drone. Each class contains 1100 RF samples, which means that there are 4400 samples in dataset 1 and 8800 samples in dataset 2.
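The spectrum construction described above (DFT of the low- and high-band recordings, then concatenation) can be sketched as follows; the segment and FFT lengths are assumptions, not the values used for these datasets:

```python
import numpy as np

def rf_spectrum(low_band: np.ndarray, high_band: np.ndarray, n_fft: int = 2048) -> np.ndarray:
    """Concatenate the DFT magnitude spectra of the low- and high-band
    time-domain recordings into one combined RF spectrum sample."""
    low = np.abs(np.fft.rfft(low_band, n=n_fft))    # lower half of the band
    high = np.abs(np.fft.rfft(high_band, n=n_fft))  # upper half of the band
    return np.concatenate([low, high])

# Illustrative: one raw segment split between the two receivers.
rng = np.random.default_rng(0)
segment = rng.standard_normal(4096)
spec = rf_spectrum(segment[:2048], segment[2048:])  # one spectrum sample
```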
Moreover, extensive experiments were performed in order to evaluate the classification performance of our proposed DC-CNN model. We used one RF signal collector and one computer with two operating systems: Windows 10 and Ubuntu 16.04.1 Linux. The computer contained 8 Intel Xeon E3 (x86_64) central processing units (CPUs) and 4 NVIDIA GTX 1080 Ti graphics processing units (GPUs), which could efficiently handle various matrix multiplication and convolution operations. On the one hand, the Windows system mainly ran MATLAB R2019a to preprocess the RF drone data. On the other hand, the Linux system was mainly used for training and testing the DL models. In addition, we used a DL software library to build the DL models.
5.2. Accuracy of DL and Traditional Algorithm Methods within Two Datasets
The accuracies of the compared DL and traditional algorithms are shown in Figure 8; all nine algorithm models were used to classify dataset 1, which contained 4400 samples, and dataset 2, which contained 8800 samples. As the number of classes increased, the recognition accuracy decreased, as expected, and the downward trend of each algorithm was basically the same. We first compared the accuracy of our proposed DC-CNN algorithm with traditional signal processing algorithms, such as random forest with time-domain features (TD-RF), decision tree with time-domain features (TD-DT), and support vector machine with time-domain features (TD-SVM). Compared with the DL models, the accuracies of the traditional signal processing algorithms dropped faster.
In order to further compare the performance of our proposed algorithm, we selected popular DL algorithms for training, such as CLDNN, DR-CNN (Conv2D), and LSTM. The recognition accuracy of LSTM was not ideal, meaning that it is not suitable for recognizing drone signals. The accuracies of the other DL algorithms were very close, which indicates that they could all effectively identify the signals. On dataset 2, the CLDNN model had a more than 30% higher recognition accuracy than DR-CNN (Conv2D), and the DR-CNN (Conv1D) model had a more than 30% higher recognition accuracy than the FCN model. In addition, comparing FCN and DR-CNN (Conv1D) with our model, the DC-CNN was more capable of identifying the different drone RF signals. The accuracies of most algorithms decreased in a nonlinear manner.
As shown in Table 3, we compared the classification accuracies of nine algorithms in total. The recognition accuracy of the DC-CNN model was 99% for four classes and dropped to 74% for eight classes. When training on dataset 1, the recognition accuracies of the first four algorithms in Table 1 were all higher than 90%. In addition, compared to the classic DL models (CNN, FCN, LSTM), the DC-CNN and CLDNN models were more capable of identifying the different drone RF signals. The DC-CNN model proposed in this paper achieved the best recognition accuracies on the two datasets, 99.5% and 74.1%, respectively.
Additionally, it can be clearly seen from Table 3 that DR-CNN (Conv2D) performed better than DR-CNN (Conv1D) on dataset 1 but had lower accuracy than DR-CNN (Conv1D) on dataset 2. The input drone RF data are two-dimensional; considering the excellent performance of CNNs in image recognition, we expanded the two-dimensional data into three-dimensional data at the DR-CNN (Conv2D) input layer in order to extract features better. In this case, DR-CNN (Conv2D) was well suited to processing three-dimensional data, and its recognition accuracy should have been better than that of DR-CNN (Conv1D), as reflected in dataset 1. However, on dataset 2, DR-CNN (Conv2D) had a 7% lower recognition accuracy than DR-CNN (Conv1D). We believe that DR-CNN (Conv2D) overfitted during training, which led to the low test results.
5.3. Learning Curves of Different DL Models in Different Datasets
This section illustrates the training processes of the models that obtained the top three recognition accuracies and analyzes the loss convergence speeds of the different models. Figure 9 shows the convergence curves on dataset 1; these three algorithms converged quickly, with losses all approaching 0. Figure 10 shows the convergence curves on dataset 2. Due to the increased identification difficulty, the convergence speed slowed down. In particular, compared with CLDNN and DR-CNN, the DC-CNN model designed in this paper was always the first to converge, and its loss value was closest to 0. In addition, although the convergence speed of CLDNN was slower than that of DR-CNN, its final loss value was lower than that of the CNN, and its recognition performance was better.
5.4. Algorithm System Comparison
Figure 11 depicts the GPU running time required by each algorithm to identify a single drone sample and the total number of parameters contained in each model. It can be seen from the figure that the number of parameters in a model is not directly related to its GPU time. We can clearly see that the total parameter counts of the DC-CNN, DR-CNN (Conv1D), and FCN models were roughly the same, but the GPU time of the DC-CNN model was much lower on both datasets. In addition, the CLDNN model's GPU time is similar to that of the DR-CNN model, but its total parameter count is four times that of the DR-CNN model. Further, we can conclude that the higher recognition accuracy of CLDNN compared with most other algorithms is obtained at the cost of many more model parameters and longer running times. In summary, we conclude that even when a complex-valued network has the same number of parameters as a real-valued network, the complex-valued network has a lower processing time and faster recognition speed.
5.5. Confusion Matrix of the DC-CNN Model in Different Datasets
Subsequently, we show the confusion matrices of our proposed DC-CNN model on the different datasets in Figure 12 and Figure 13. These percentage matrices reveal details that cannot be seen above. Figure 12 represents dataset 1 (four classes) and Figure 13 represents dataset 2 (eight classes). By observing the diagonals of the confusion matrices, we can see the recognition performance of our model for each class. First, we find that the DC-CNN model can accurately identify each type of drone signal in dataset 1; even in the worst case, the classification accuracy of the background activity signals reaches 98.5%. For dataset 2, the name of each class is composed of the letter M and two numbers, the first indicating the drone model and the second indicating the drone's running mode (for example, the first drone in mode 2). We can therefore see that the DC-CNN model has difficulty identifying the signal of the second drone in mode 3, whose identification accuracy is only 4.88%.
From four to eight classes, the accuracy of our proposed DC-CNN model drops by about 25%. In dataset 1, the signals come from different drone models, whose transmitters have very different physical properties, which makes them easy to recognize. In dataset 2, however, we mainly focus on distinguishing the signals of the same drone in different operating modes, which is much more difficult. Two kinds of errors dominate. First, there are identification errors between different operating-mode signals from the same drone (e.g., 53.28% of D2 record samples are recognized as D2 connect). Second, there are identification errors between the same operating-mode signals from different drones (e.g., 65.92% of D2 fly samples are recognized as D1 fly). These are the main reasons for the decline in the accuracy of the DC-CNN model on dataset 2.
5.6. Additional Evaluation of the DC-CNN Model in Different Datasets
Finally, we evaluate the performance of the DC-CNN model on the two datasets from other aspects. The additional performance evaluation indicators used in this paper are the precision, recall, and F1 score. The definitions of precision and recall are as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN},$$

where TP, FP, and FN indicate the numbers of true positives, false positives, and false negatives, respectively. The F1 score is determined by the precision and recall values and is defined as:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$
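These per-class indicators can be checked with a small NumPy sketch (the labels below are illustrative):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and F1 score from TP/FP/FN counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
    fn = np.sum((y_pred != cls) & (y_true == cls))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative labels: for class 1, TP = 2, FP = 1, FN = 1.
p, r, f1 = precision_recall_f1([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], cls=1)
```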
As shown in Figure 14 and Figure 15, blue represents the precision value, green the recall value, and the red curve the F1 score for each class. The values of all three evaluation indicators in Figure 14 are above 98%, showing that the DC-CNN model performs excellently in identifying drone dataset 1. In addition, we can clearly see from Figure 15 that the three indicators of the DC-CNN on drone dataset 2 are very good for most classes. However, the recognition performance for the class of the second drone in mode 3 is much worse, with two of its indicator values not even reaching 10%. Moreover, two further per-class indicator values are only about 50%, which will be the focus of our next research study.
6. Conclusions
In this paper, we proposed a new drone recognition method based on the DC-CNN. Unlike conventional DR-CNN methods, our proposed DC-CNN method can extract more hidden features from drone RF signals, which have in-phase and quadrature parts. We trained nine different drone recognition models separately on two independent datasets and evaluated their classification performance, GPU times, and parameter counts. The experimental results show that the classification accuracy of the DC-CNN model was 99.5% on dataset 1 and 74.1% on dataset 2. The simulation results prove that our proposed algorithm performs well compared to other existing DL-based drone recognition algorithms. In future work, we will consider further optimizing the network parameters and running times.