PERDET: Machine-Learning-Based UAV GPS Spoofing Detection Using Perception Data

Abstract: To ensure that unmanned aerial vehicle (UAV) positioning is not affected by GPS spoofing signals, we propose PERDET, a perception-data-based UAV GPS spoofing detection approach utilizing machine learning algorithms. Based on the principles of the position estimation and attitude estimation processes, we choose the data gathered by the accelerometer, gyroscope, magnetometer, GPS and barometer as features. Although each of these sensors has its own shortcomings, their variety ensures that the selected perception data can compensate for each other. We collect the experimental data through real flights, which makes PERDET more practical. Furthermore, we run various machine learning algorithms on our dataset and select the most effective classifier as the detector. For a fair comparison, we reproduce an existing method and run it on our dataset to compare its performance with that of PERDET. Through the performance evaluation and comparison, we demonstrate that PERDET is better than existing methods and is an effective method with a detection rate of 99.69%.


Introduction
Unmanned aerial vehicles (UAVs) are widely used in various domains, such as agriculture, the military and recreation. For any application, positioning is important. UAV outdoor positioning primarily depends on GPS. However, it is easy for an attacker to deceive a GPS receiver. A GPS spoofing attack can be implemented with a low-cost device [1-3]. There exist a variety of GPS spoofing techniques targeting UAVs [4,5]. An attacker can use a device (for example, a software-defined radio device) to generate fake GPS signals or record genuine ones. These signals are then sent to deceive a UAV with fake positions, which may lead to incorrect flight paths. The effects of GPS spoofing have been analyzed and demonstrated on UAVs [1,6-9]. An attacker can even capture and control a UAV through GPS spoofing [8,9].
To detect GPS spoofing attacks, some researchers [10-14] apply machine-learning-based methods to sensor data. Wang et al. [10] applied long short-term memory (LSTM) to design a GPS spoofing attack detection method. Their training data contained the velocity, acceleration, latitude and longitude of UAVs in the x axis and y axis. The LSTM algorithm was trained so that it could predict UAV positions along a given flight path that included the feature data. Consequently, a UAV had to fly along a given path for GPS spoofing attacks to be detected. The detection was implemented by measuring the difference between the predicted position and the position provided by a GPS receiver, and the decision was made based on a given threshold value. They evaluated the method through a MATLAB simulation. Panice et al. [11] used a support vector machine (SVM) to propose a detection approach for UAV GPS spoofing attacks. They trained the SVM model with the error between the positions calculated from GPS signals and the positions measured by the inertial navigation system (INS). Their dataset was collected through simulation. Feng et al. [12] proposed a detection method for GPS spoofing attacks on UAVs using the XGBoost model. Their features consisted of the angular velocity, acceleration and distance, where the distance was calculated from the longitude and latitude at two time points. Their binary classification method could directly determine whether a UAV was under a GPS spoofing attack. Calvo-Palomino et al. [13] trained an LSTM model with GPS Doppler shift measurements at a granularity of 5 seconds, so that GPS spoofing was detected by comparing the observed Doppler effect with its predictable pattern. The effectiveness was demonstrated through simulation experiments. Kim et al. [14] proposed a detection method based on the multi-layer perceptron (MLP) model, which was trained with the data from the gyroscope, accelerometer and GPS. The performance was evaluated with a software-in-the-loop (SITL) simulation. Although these methods apply machine learning algorithms to the perception data, only a few types of sensor data are selected. They neglect the shortcomings of sensors, for example, the influence of vibration on the accelerometer, the cumulative error of the gyroscope, the electromagnetic interference on the magnetometer and the effect of wind on the barometer. Thus, the error of a sensor itself may be misjudged as a spoofing attack. The complementarity among the perception data, which can effectively mitigate such errors, is not considered.
In addition, some researchers [15-18] detect GPS spoofing attacks by comparing the results calculated by different sensors. Feng et al. [15] leveraged the data from the gyroscope and accelerometer to calculate the acceleration in the body-fixed coordinate, and compared it with the position measured by the GPS to detect GPS spoofing attacks. They also proposed another detection method that compares the yaw calculated using the angular velocity from the gyroscope with the angle calculated from the GPS data [16]. Keum and Duk [17] detected GPS spoofing attacks by directly comparing the acceleration estimated from the GPS receiver with the acceleration measured by the accelerometer. Meng et al. [18] proposed a detection method for GPS spoofing attacks on UAVs based on linear regression (LR), which was used to predict the longitude and latitude for comparison with the GPS-based position. Its performance evaluation was based on simulation. However, these methods share the problem of the aforementioned approaches: they consider only a small number of sensor data types. Meanwhile, the gyroscope has cumulative error, the accelerometer is easily affected by vibration and the GPS localization has errors. These shortcomings can be made up for by other sensors.
Moreover, some works [19,20] require collaboration with other UAVs to detect GPS spoofing attacks. Liang et al. [19] used the ground control station (GCS) to calculate the position from the positions received from multiple UAVs. Then, the GCS sent the position to the UAVs for the detection of GPS spoofing attacks. The evaluation was carried out through simulation experiments. Jansen et al. [20] detected GPS spoofing attacks by leveraging crowdsourcing to monitor the position data derived from the GPS data periodically broadcast by UAVs or aircraft. They evaluated this method with both real-world data and simulated data. These approaches are not suitable for a single UAV, since they need to work together with other UAVs to detect GPS spoofing attacks.
Furthermore, other techniques are used to detect GPS spoofing attacks, such as image matching [21], video-based location [22] and Kullback-Leibler divergence [23]. Xue et al. [21] proposed a satellite imagery matching approach to detect GPS spoofing attacks on UAVs. The detection was implemented by comparing historical satellite images (collected from Google Earth) at the GPS-based position with real-time aerial images (taken by cameras). This method requires a UAV to have a large storage capacity to pre-store many satellite images. Moreover, to detect GPS spoofing attacks, it needs to interrupt the filming mission to take pictures of the ground. Barak et al. [22] proposed a UAV GPS spoofing detection method using frames collected from the camera's video stream and locations obtained from a GPS. They calculated the similarity correlation between frames and computed the distance between the GPS-based positions corresponding to those frames. To predict the position for comparison with the GPS-based position, they graphed the correlation against the distance, fitted a linear regression function to the graph, and learned how a UAV behaved. Their method was evaluated with simulation data and real data. However, this method could be affected by different terrain, ambient light and altitude, which makes it impractical. Elena et al. [23] proposed a technique to detect GPS spoofing attacks on UAVs using Kullback-Leibler divergence. They first used the Poisson distribution to describe the random variation of UAV parameters, including the flight altitude, the number of satellites that the UAV sees, GPS speed, flight angle, latitude and longitude. Then, they calculated the entropy value using Kullback-Leibler divergence. They evaluated the approach through simulation. However, abnormal behavior caused by either an attack or environmental influences could produce higher entropy values, so the two causes are hard to tell apart.
By analyzing existing works, we summarize their weaknesses as follows.
• Most existing machine-learning-based detection approaches only consider a few types of perception data. They cannot sufficiently represent the relationship among perception data since each sensor has shortcomings; for example, an accelerometer is sensitive to vibrations.
• Most detection methods are proposed and evaluated with simulation data, which cannot reflect real scenarios.
• The comparison between different detection techniques is difficult. Some approaches directly compare accuracy values (detection rates). However, these techniques are evaluated on different datasets, which reduces the credibility of the comparison.
In this paper, we propose a perception-data-based UAV GPS spoofing detection approach utilizing machine learning algorithms. This approach is called PERDET. We take 15 kinds of perception data into consideration. Through the analysis of the position estimation and attitude estimation processes [24], the perception data collected by the accelerometer, gyroscope, magnetometer, GPS and barometer are selected as representative features for model training. These perception data can compensate for the shortcomings of each other. Since at least two kinds of sensors support the calculation of the same result, each sensor can confirm the results calculated by other sensors. For example, the yaw can be calculated from the angular velocity of a gyroscope or from the magnetic forces of a magnetometer. To make PERDET more practical, we build a dataset using real flights, including normal flight scenarios and attacked flight scenarios. Furthermore, we run various machine learning algorithms on the selected feature data. Through the performance evaluation, we select the most effective classifier as the detector in our PERDET approach. Finally, we implement an existing UAV GPS spoofing detection approach and run it on our dataset to compare its performance with that of PERDET. Because this comparison is based on the same dataset, it persuasively demonstrates the effectiveness of PERDET.
The contributions of this paper are summarized as follows.
• Feature selection is based on the complementarity among the perception data. The relationship among these data is analyzed from the view of the position estimation and attitude estimation processes. The selected features can compensate for the weaknesses of different kinds of sensors. This method of feature selection is not considered by existing works.
• We build a dataset through real flights, including normal and attacked scenarios. Machine-learning-based detection approaches depend heavily on the dataset, which affects the effectiveness and feasibility of the classifier. Most existing works are proposed and evaluated through simulation.
• We have implemented an existing method, run it on our dataset and compared its performance with that of PERDET. The experimental comparison demonstrates that PERDET is better than existing methods.
The rest of this paper is organized as follows. In Section 2, we introduce the preliminaries of this study. In Section 3, we propose our detection method, PERDET, and analyze features to select appropriate data types for representing the perception data. Section 4 explains the experimental results, including the data collection, feature selection, model selection and evaluation. Section 5 compares our PERDET approach with existing detection methods and discusses the results. Section 6 concludes this paper.

Preliminaries
This section focuses specifically on the background of the UAV perception data, primarily discussing the UAV architecture and sensors used to collect the perception data, including the GPS, accelerometer, gyroscope, magnetometer and barometer.

The UAV Architecture
The architecture of the UAV is presented in Figure 1. Perception is an indispensable ability of a UAV. The UAV perceives its motion and the external environment with the GPS receiver, accelerometer, gyroscope, magnetometer and barometer, as shown in the sensors part of the figure. The perception data are transmitted to the position and attitude estimator. The estimator fuses the raw perception data to obtain the estimated attitude, angle, position and speed using data fusion and filtering algorithms. The estimation process relies on the complementarity between sensors to make up for their shortcomings; for example, the attitude can be measured by an accelerometer together with a magnetometer, and also by a gyroscope. For our PERDET approach, we use the raw perception data for the detection of GPS spoofing attacks. The navigation system is responsible for mission planning and path planning. The control system mainly contains the position, altitude and attitude controllers.

GPS
GPS is one of the Global Navigation Satellite Systems (GNSSs). China's Beidou, Russia's Glonass and the European Union's Galileo are also common GNSSs. A GNSS is a space-based radio navigation and positioning system. It provides users with all-weather 3D coordinates, speed and time information at any location on the Earth's surface or in near-Earth space. A satellite navigation system has three parts: the space constellation part, the ground monitoring part and the user equipment part. The user equipment is the GPS receiver, which can be installed in the UAV. This receiver can be a simple GPS module or a complex multi-frequency multi-system receiver. Positioning is achieved by processing the satellite signals, which contain the satellite orbit information.
All GNSSs work in the same way, using the distance between satellites at known locations and the user receiver to obtain the specific position of the receiver. In brief, each satellite has a high-precision atomic clock to ensure the time synchronization between the on-orbit satellite and the ground. A satellite constantly sends signals carrying its location and time data to the ground. The receiver compares the sending time with the time at which it receives the data, and combines the resulting travel time with the satellite location to calculate the distance between the receiver and the satellite. In principle, three satellites are needed to compute the receiver's three-dimensional position. Firstly, the three satellites are used as sphere centers to build spheres whose radii are the distances between the receiver and the satellites. Then, the intersection of the three spheres is determined as the position of the receiver. However, since there is a time error between the clocks on the satellites and the clock on the ground, an extra unknown is introduced. There are 4 unknowns in total, which require 4 equations established by 4 mutually independent satellites. Therefore, at least four visible satellites are needed to determine a position anywhere on Earth.
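The four-unknown formulation above can be written out explicitly. The block below is the standard pseudorange model, not an equation taken from this paper; the symbols $(x_i, y_i, z_i)$ for the position of satellite $i$, $\rho_i$ for the measured pseudorange, $c$ for the speed of light and $\delta t$ for the receiver clock error are notation introduced here for illustration.

```latex
\rho_i = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} + c\,\delta t,
\qquad i = 1, 2, 3, 4
```

The unknowns are the receiver coordinates $(x, y, z)$ and $\delta t$, so four independent satellites give four equations for four unknowns.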

Triaxial Accelerometer
A triaxial accelerometer is an inertial sensor. It senses the specific force on an object, that is, the overall acceleration with gravity removed, or equivalently the non-gravitational force applied per unit mass. When the accelerometer is stationary, its overall acceleration is zero, yet it still senses the supporting force that balances gravity. When the accelerometer is in free fall, its overall acceleration equals the acceleration of gravity, but no non-gravitational force acts on it, so the three-axis accelerometer reads zero in this case.
A triaxial accelerometer can be used to measure angles based on this principle. Intuitively, the angle of the accelerometer to the ground decides the amount of compression on the spring, as shown in Figure 2. The specific force is measured by the amount of compression on the spring. Therefore, if there is no external force, the accelerometer can accurately measure the pitch angle and roll angle with no cumulative error. However, it cannot measure the yaw angle. The Micro-Electro-Mechanical System (MEMS) three-axis accelerometer is widely used in UAVs. It is based on piezoresistive, piezoelectric or capacitive working principles, in which the produced pressure or displacement is proportional to the change in resistance, voltage or capacitance, respectively. Specially designed amplifier and filter circuits measure this change. The disadvantage of the MEMS three-axis accelerometer is that it is susceptible to vibration.
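As a rough illustration of how pitch and roll follow from the specific force when no external force acts, the sketch below assumes a body frame with x forward, y right and z down, and a Z-Y-X Euler angle convention. These conventions and the function names are assumptions made for this example and are not specified in the paper.

```python
import math

G = 9.80665  # standard gravity in m/s^2


def specific_force_at_rest(roll, pitch):
    """Specific force (m/s^2) sensed by a stationary accelerometer.

    Assumed body frame: x forward, y right, z down. Angles in radians.
    """
    return (
        G * math.sin(pitch),
        -G * math.sin(roll) * math.cos(pitch),
        -G * math.cos(roll) * math.cos(pitch),
    )


def roll_pitch_from_accel(fx, fy, fz):
    """Recover roll and pitch (radians) from a static specific-force reading.

    Yaw cannot be recovered: rotating about the vertical axis leaves the
    sensed gravity direction unchanged.
    """
    roll = math.atan2(-fy, -fz)
    pitch = math.atan2(fx, math.hypot(fy, fz))
    return roll, pitch


# Round trip for a hypothetical attitude of 0.3 rad roll and -0.2 rad pitch.
roll, pitch = roll_pitch_from_accel(*specific_force_at_rest(0.3, -0.2))
```

Any vibration adds to the sensed specific force and therefore corrupts both angles, which is the weakness noted above.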

Triaxial Gyroscope
A triaxial gyroscope is also an inertial sensor. In UAVs, it is designed to measure the angular velocity, from which the angle is calculated by integration. To understand how a three-axis gyroscope works, it is necessary to first understand the Coriolis force.
Coriolis force. When a particle follows a straight line with respect to an inertial frame, it moves along a curve with respect to a rotating frame because of inertia. It can be considered that there is a force that bends the particle trajectory into a curve in the rotating system, as shown in Figure 3. The Coriolis force describes this shift, as shown in the following expression:

$$F_{\mathrm{coriolis}} = -2m\,(\omega \times v) \qquad (1)$$

where $F_{\mathrm{coriolis}}$ is the Coriolis force; $m$ is the mass; $\omega$ is the angular velocity; and $v$ is the velocity. In other words, when straight-line motion is observed in a rotating system, the trajectory appears shifted. In fact, no real force acts on the straight-line motion. This virtual force is called the Coriolis force.
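To make the relationship among the mass, angular velocity and velocity concrete, the following minimal sketch evaluates the Coriolis force in its standard form, F = -2m(ω × v). The function names and the numeric values are illustrative assumptions only.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def coriolis_force(mass, omega, velocity):
    """Coriolis force -2 m (omega x v) in the rotating frame.

    mass in kg, omega in rad/s, velocity in m/s; the result is in newtons.
    """
    return tuple(-2.0 * mass * c for c in cross(omega, velocity))


# Hypothetical values: a 1 kg mass moving along x at 1 m/s while the frame
# rotates about z at 1 rad/s. The force points along negative y.
force = coriolis_force(1.0, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```

For a fixed mass and drive velocity, the force scales linearly with the angular velocity, which is what allows a MEMS gyroscope to sense rotation.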
Therefore, two mass blocks are used in the gyroscope, as shown in Figure 4. They are in constant motion with a phase difference of −180 degrees; that is, the two mass blocks are the same size and move in opposite directions. They generate opposite Coriolis forces, so that the two corresponding capacitor plates are forced to move, which changes the capacitance difference. This change in the capacitance difference is proportional to the angular velocity, so the angular velocity of the rotation can be obtained from the capacitance. By integrating the angular velocity, the gyroscope can therefore measure the pitch angle, roll angle and yaw angle. Its disadvantage is the cumulative error that arises from the integration of the angular velocity.
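The cumulative error can be illustrated with a short sketch. It assumes a stationary gyroscope whose output has a small constant bias; the bias value, sampling rate and duration are hypothetical numbers chosen only for the example.

```python
def integrate_rate(rates, dt):
    """Integrate angular-rate samples (rad/s) into an angle (rad).

    Uses simple rectangular integration with a fixed sample period dt (s).
    """
    angle = 0.0
    for rate in rates:
        angle += rate * dt
    return angle


# Hypothetical stationary gyroscope: true rate 0, constant bias 0.01 rad/s,
# sampled at 100 Hz for 60 s. The integrated angle drifts to about 0.6 rad
# even though the vehicle never rotated.
bias = 0.01
dt = 0.01
drift = integrate_rate([bias] * 6000, dt)
```

Because the drift grows with time, a second sensor such as the accelerometer or magnetometer is needed to bound it, which is the complementarity PERDET relies on.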

Triaxial Magnetometer
In this study, the magnetometer provides the geomagnetic field data along the X, Y and Z axes to the UAV. The data are then fed into the microcontroller's algorithm to calculate the heading angle relative to magnetic north.
The magnetometer consists of three mutually perpendicular magnetoresistive sensors along the X, Y and Z axes. Each sensor measures the strength of the geomagnetic field in its direction.
On the one hand, an alloy material with a crystalline structure can be used to design a triaxial magnetometer. This material is very sensitive to the external magnetic field. The strength of the magnetic field can be measured by the changes in the resistance value of the magnetoresistive sensor.
On the other hand, the triaxial magnetometer can be designed based on the Lorentz force. A current flowing through the magnetic field generates a force, which produces changes, for example in capacitance, that can be used to measure the strength of the magnetic field.
The triaxial magnetometer can be used to measure the yaw angle. However, it is susceptible to electromagnetic interference.

Barometer
A barometer is usually used to measure the atmospheric pressure and the corresponding absolute altitude, or to obtain a relative altitude by subtracting two altitude values. Piezoelectric barometers are often used in multi-rotor aircraft.
The relationship between the altitude and the air pressure is expressed as follows [25]:

$$\mathrm{altitude} = 44330 \times \left(1 - \left(\frac{P}{P_0}\right)^{\frac{1}{5.255}}\right) \qquad (2)$$

where $P_0$ is the standard atmospheric pressure, that is, 1013.25 mbar (1 mbar = 1 hPa); altitude is the altitude in meters; and $P$ is the air pressure in mbar at a certain altitude.
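The pressure-to-altitude conversion can be sketched as follows. The constants 44330 and 5.255 come from the widely used international barometric formula and are an assumption here, as are the function names; only the reference pressure of 1013.25 mbar is stated in the text.

```python
P0_MBAR = 1013.25  # standard atmospheric pressure at sea level, in mbar


def pressure_to_altitude(p_mbar, p0_mbar=P0_MBAR):
    """Convert air pressure (mbar) to altitude (m).

    Assumes the international barometric formula with the commonly
    quoted constants 44330 and 5.255.
    """
    return 44330.0 * (1.0 - (p_mbar / p0_mbar) ** (1.0 / 5.255))


def relative_altitude(p_now_mbar, p_ref_mbar):
    """Relative altitude (m) as the difference of two absolute altitudes."""
    return pressure_to_altitude(p_now_mbar) - pressure_to_altitude(p_ref_mbar)


# A reading of 1000 mbar corresponds to roughly 110 m above the reference level.
altitude_m = pressure_to_altitude(1000.0)
```

Wind changes the local pressure at the sensor and therefore shifts the computed altitude, which is why the barometer is cross-checked against the GPS altitude.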

Framework
The framework of our PERDET approach is presented in Figure 5. Firstly, we collect perception data through real flight experiments. These data are divided into two classes: normal and attacked. Secondly, the feature analysis is performed on various sensor data based on the principles of the attitude estimation and position estimation, so that we can select relevant sensor data types as features. Thirdly, we carry out the feature selection to extract significant features and discard the irrelevant or weakly correlated ones, so that the selected features are the most representative and the training time is reduced. After that, multiple machine learning algorithms are run on our dataset with the selected features for model training. Then, we evaluate these trained models and choose an appropriate classifier as our PERDET detector. PERDET reads the unknown flight data, determines whether the UAV has been attacked and produces a detection report that records the detection process and results.

Feature Analysis
GPS spoofing attacks aim to deceive a UAV with fake positions, which contain the latitude, longitude and altitude. To detect GPS spoofing attacks, the latitude and longitude should be the focus. However, the position of a UAV is calculated using not only the GPS data but also the data from other sensors, such as an accelerometer and a barometer. Thus, the data from an accelerometer and a barometer should be considered to check the correctness of the GPS data. Moreover, the position changes along with the UAV attitude. In other words, the attitude determines how the position changes. Therefore, the attitude-related perception data, including the data from an accelerometer, a magnetometer and a gyroscope, should also be taken into consideration for the detection of GPS spoofing attacks.
To select the GPS-related data types as features, this section mainly analyzes the relationship among the perception data obtained by the GPS, accelerometer, gyroscope, magnetometer and barometer. Based on the principles of the position estimation and attitude estimation, the feature analysis process is carried out from two aspects, position analysis and attitude analysis, as follows.

Position Analysis
The UAV position is a 3D coordinate, including the horizontal two-dimensional position and the vertical position. Therefore, we discuss the position relationship from the view of the horizontal position and the vertical position, as follows.
Horizontal position. For the analysis of the horizontal position, we focus on the GPS and the accelerometer, as shown in Figure 6. An accelerometer gathers the XYZ-axis acceleration. Integrating the XY-axis acceleration gives the horizontal speed, which can be integrated again to give the horizontal position, as shown in Figure 6a. The GPS receiver receives satellite positioning signals and generates the horizontal position as the latitude and longitude, which can be differentiated to obtain the horizontal speed and further the horizontal acceleration, as shown in Figure 6b. Both the accelerometer and the GPS can thus provide the horizontal position, speed and acceleration. Hence, the acceleration from the accelerometer and the latitude, longitude and horizontal speed from the GPS receiver are relevant to the horizontal position.
Vertical position. The vertical position is related to the barometer and GPS, as shown in Figure 7. The GPS receiver provides the altitude, which can be differentiated to obtain the vertical speed and further the vertical acceleration, as shown in Figure 7a. The barometer measures the air pressure, which can be used to calculate the altitude using Formula (2). This altitude can also be differentiated to obtain the vertical speed and further the vertical acceleration, as shown in Figure 7b. Hence, the altitude and vertical speed offered by the GPS receiver and the altitude provided by the barometer are relevant to the vertical position.
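The two directions of the analysis, integrating acceleration towards position and differentiating position towards acceleration, can be sketched numerically. The example assumes one axis, a fixed 5 Hz sample period and a constant acceleration of 2 m/s²; these values and the helper names are hypothetical and only illustrate that both routes describe the same motion.

```python
def integrate(samples, dt, initial=0.0):
    """Cumulative trapezoidal integration; output has the same length as input."""
    out = [initial]
    for prev, cur in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (prev + cur) * dt)
    return out


def differentiate(samples, dt):
    """Forward differences; output is one element shorter than input."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


dt = 0.2                 # 5 Hz sampling
accel = [2.0] * 26       # constant 2 m/s^2 over 5 s

# Accelerometer route: acceleration -> speed -> position.
speed = integrate(accel, dt)
position = integrate(speed, dt)

# GPS route: position -> speed -> acceleration.
gps_speed = differentiate(position, dt)
gps_accel = differentiate(gps_speed, dt)
```

In a real flight the two routes disagree slightly because of sensor noise; a large and persistent disagreement is the kind of inconsistency a spoofed GPS position would introduce.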

Attitude Analysis
For the attitude analysis, the data provided by the accelerometer, magnetometer and gyroscope are discussed, as shown in Figure 8. The XY-axis acceleration can be used to calculate the pitch angle and roll angle, as shown in Figure 8a. The three-axis magnetic force (i.e., $m_x^b$, $m_y^b$ and $m_z^b$) is first transformed from the body-fixed coordinate to the geographic coordinate (i.e., $m_x^e$, $m_y^e$ and $m_z^e$), and then the magnetic force is mapped to the horizontal plane for the calculation of the yaw angle, as shown in Figure 8b. The three-axis angular velocity is also first transformed from the body-fixed coordinate to the geographic coordinate, and then the angular velocity can be integrated to obtain the pitch angle, roll angle and yaw angle, as shown in Figure 8c.
Based on the above analysis, the attitude, including the pitch angle, roll angle and yaw angle, can be generated either by the gyroscope or by the accelerometer together with the magnetometer. Therefore, the three-axis angular velocity provided by the gyroscope, the XY-axis acceleration offered by the accelerometer and the three-axis magnetic force supplied by the magnetometer are relevant.
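The magnetometer branch of this analysis can be sketched as a tilt-compensated heading computation: the body-frame field is rotated back through roll and pitch, projected onto the horizontal plane, and the yaw is read from the horizontal components. The frame (x forward, y right, z down), the Z-Y-X Euler convention and the function names are assumptions for this example, not the paper's implementation.

```python
import math


def earth_to_body(vec, roll, pitch, yaw):
    """Rotate an Earth-frame vector into the body frame (Z-Y-X Euler angles)."""
    x, y, z = vec
    # Undo yaw about the vertical axis.
    x, y = math.cos(yaw) * x + math.sin(yaw) * y, -math.sin(yaw) * x + math.cos(yaw) * y
    # Undo pitch about the y axis.
    x, z = math.cos(pitch) * x - math.sin(pitch) * z, math.sin(pitch) * x + math.cos(pitch) * z
    # Undo roll about the x axis.
    y, z = math.cos(roll) * y + math.sin(roll) * z, -math.sin(roll) * y + math.cos(roll) * z
    return x, y, z


def yaw_from_mag(mag_body, roll, pitch):
    """Tilt-compensated yaw (rad) relative to magnetic north."""
    mx, my, mz = mag_body
    # Horizontal components of the field after removing roll and pitch.
    xh = (mx * math.cos(pitch)
          + my * math.sin(roll) * math.sin(pitch)
          + mz * math.cos(roll) * math.sin(pitch))
    yh = my * math.cos(roll) - mz * math.sin(roll)
    return math.atan2(-yh, xh)


# Hypothetical Earth field pointing to magnetic north and downwards.
field_earth = (0.2, 0.0, 0.4)
reading = earth_to_body(field_earth, roll=0.2, pitch=-0.1, yaw=1.0)
yaw = yaw_from_mag(reading, roll=0.2, pitch=-0.1)
```

The roll and pitch used for the compensation would come from the accelerometer, which is why the text treats the accelerometer and magnetometer together as one source of attitude.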

Summary of Data Features
As discussed above, the relevant data contain the three-axis angular velocity, three-axis acceleration, latitude, longitude, GPS-based altitude, horizontal speed, vertical speed, three-axis magnetic force and barometer-based altitude. To further reflect the feature relationship, we select the variance, standard deviation and mean of each relevant sensor data type discussed above as features for model training.

Results
The experiments and evaluation of our PERDET approach are presented in this section. We first describe how we collect the dataset in real flights and preprocess these data. We have built a wide range of features in Section 3.2. The effectiveness of these features is then evaluated to select the ones best suited to constructing the classifier. After that, different ML algorithms are trained with the selected features, and we explain how to choose the best classifier for GPS spoofing detection.

Data Collection and Preprocessing
Researchers who used real datasets have not shared experimental data that could support our experimental setup. Thus, we perform real UAV flights to collect the flight data. The normal flight data are easy to gather. For attacked scenarios, we implement random attacks by injecting a piece of software into the flight control software. It modifies the normal latitude and longitude with a random value. The detailed working process is as follows:

• For the longitude:
  - Firstly, we select a value to change the longitude. The injected software generates a random value RanVal and computes the remainder RanVal mod 50. If the remainder is not greater than 20, the injected software generates a new random value RanVal. The threshold value 20 was determined through a number of experiments, and it ensures that a UAV will be deceived into deviating from the planned path.
  - Then, we determine whether to decrease or increase the longitude. The injected software computes the remainder RanVal mod 10. If the remainder is greater than 5, RanVal is subtracted from the GPS longitude; if the remainder is less than or equal to 5, RanVal is added to the GPS longitude.
  In the calculation process, the longitude is scaled up by a factor of $10^7$ for the convenience of calculation. Thus, RanVal is directly added to or subtracted from the scaled longitude.
• For the latitude, the injected software performs a similar operation.
The injected software runs at 5 Hz, the same frequency as the GPS data. A change of 20 corresponds to about 0.2 m in longitude and 0.22 m in latitude.
In our experiment, we use a random value rather than a constant value to modify the longitude and latitude, since a linear change of the latitude or longitude can be easily detected, whereas a random change that represents all kinds of situations is more difficult to detect. Since we have constrained the random value to be greater than 20, the UAV can be deceived by our injected software. The real flights demonstrate that the aforementioned process is able to divert the UAV to a wrong destination along an incorrect path. The experimental flight paths contain not only straight lines and curves, but also ascending and descending paths.
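The injection procedure described above can be sketched as follows. The text does not state the range from which RanVal is drawn, so the `upper` bound below is a hypothetical choice, as are the function names; the mod 50 and mod 10 rules and the scaling by 10^7 follow the description.

```python
import random

SCALE = 10 ** 7  # coordinates are handled as degrees multiplied by 1e7


def draw_offset(rng, upper=1000):
    """Draw RanVal, redrawing until RanVal mod 50 is greater than 20.

    `upper` is a hypothetical bound; the source does not specify the range.
    """
    while True:
        ran_val = rng.randrange(upper)
        if ran_val % 50 > 20:
            return ran_val


def spoof_coordinate(coord_e7, rng):
    """Return a spoofed scaled longitude or latitude.

    RanVal is subtracted when RanVal mod 10 is greater than 5, otherwise added.
    """
    ran_val = draw_offset(rng)
    if ran_val % 10 > 5:
        return coord_e7 - ran_val
    return coord_e7 + ran_val


# Example: spoof a hypothetical longitude of 120.1234567 degrees once.
rng = random.Random(0)
spoofed = spoof_coordinate(int(round(120.1234567 * SCALE)), rng)
```

At 5 Hz, one such call per GPS update would produce the random walk around the true position that the attacked flights are described as following.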
The experimental UAV uses a Pixhawk 2.4.8 [26] as its flight controller. In our experiments, only some of its sensors are used, including a GPS receiver, an MPU-6000, an HMC5883 and an MS5611. The GPS receiver receives satellite positioning signals and provides the location and time information to the flight control software. The MPU-6000 is a 6-axis motion processing sensor that includes an accelerometer and a gyroscope. The HMC5883 is a magnetometer. The MS5611 is a barometer.
After collecting the flight data, we need to preprocess them. Each sensor has a different sampling frequency, so the numbers of samples from different sensors differ. To unify the data, we use 1 second as the time period for handling the original flight data. The final UAV dataset is provided in Table 1. We calculate the variance, standard deviation and mean of each sensor data type to build our dataset for model training. The dataset contains 5303 records in total (4232 normal records and 1071 attacked records).
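The per-second aggregation can be sketched as below. The paper does not say whether the population or sample variance is used; the population form is assumed here, and the function name and input layout are illustrative.

```python
import statistics
from collections import defaultdict


def window_features(samples, window=1.0):
    """Group (timestamp_s, value) samples into fixed windows.

    Returns {window_index: (variance, standard_deviation, mean)} using
    population statistics (an assumption, not stated in the source).
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp // window)].append(value)
    return {
        index: (
            statistics.pvariance(values),
            statistics.pstdev(values),
            statistics.mean(values),
        )
        for index, values in sorted(buckets.items())
    }


# Two samples fall into the first second and one into the next.
features = window_features([(0.0, 1.0), (0.5, 3.0), (1.2, 5.0)])
```

Applying this to each of the 15 sensor data types yields the 45 features per one-second record, regardless of how many raw samples each sensor produced in that second.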

Feature Selection
We analyze the relationship among the perception data of a UAV in Section 3.2 to select a total of 45 features, as shown in Table 2. They are the variance, standard deviation and mean of the original data. The selected data contain 9 gyroscope features, 9 accelerometer features, 9 magnetometer features, 15 GPS-measured features (covering the horizontal position, altitude, horizontal speed and vertical speed) and 3 barometer-measured features. However, we are not sure whether each feature has an equivalent ability to represent the UAV position. Thus, there is a need to select the more informative features and discard the weakly correlated or even irrelevant ones. This is the goal of the feature selection process, which reduces both training and prediction time. To evaluate the 45 features, a random forest (RF) algorithm is applied to our dataset. The RF consists of a number of decision trees. In the forest, the importance of each feature is calculated based on the information gain it provides [27]. For each node of a tree, the information gain represents the change in information entropy of the node from the current state to the proposed state.
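As an illustration of the information gain at a single tree node, the sketch below computes the entropy before and after a threshold split on one feature. It is a simplified stand-in for what a random forest accumulates over all nodes and trees; the function names and toy data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    total = len(labels)
    children = sum(
        len(part) / total * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - children


# Toy feature that separates normal (0) from attacked (1) records perfectly.
gain = information_gain([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], threshold=2.5)
```

A feature whose splits repeatedly yield high gains receives a high importance score, while a feature that never separates the classes contributes nothing.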
To construct an effective and efficient RF model for evaluating features, it is first necessary to choose an appropriate number of trees. Too few trees may make the RF model inaccurate, while too many may increase the training time and lead to overfitting. The out-of-bag (OOB) error can be used to optimize this number. The OOB error estimation process computes the misclassification probability. In this paper, the OOB error is calculated while varying the number of trees from 15 to 500, as shown in Figure 9. We can see that the OOB error rate becomes stable and close to its minimum value (i.e., 0.0041) when the number of trees is approximately 368. Thus, the number of trees in the RF model is set to 368, which is used as a parameter for model training. After that, to compute the importance scores of the 45 features, the RF model is run with 368 trees based on information gain theory. Furthermore, we pick the most informative features, namely those that yield a stable and lowest mean absolute error (MAE). The remaining feature selection process contains the following three steps:
1. Compute the importance score for each of the 45 features;
2. Sort the features by importance score in descending order;
3. Train the RF model with the top k features, for k = 1, ..., n, and determine the number of features for which the RF model produces a stable and lowest MAE.
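Steps 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: `evaluate_mae` is a hypothetical callback that would train the RF model on the given feature subset and return its MAE, and "stable and lowest" is simplified to "the smallest k whose MAE lies within a tolerance of the lowest MAE observed". The toy importance scores and the toy MAE function are made up.

```python
def rank_features(importances):
    """Step 2: return feature indices sorted by importance, highest first."""
    return sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)


def choose_feature_count(ranked, evaluate_mae, tolerance=0.001):
    """Step 3: evaluate the top-k features for every k and return the smallest k
    whose MAE is within `tolerance` of the lowest MAE observed, plus all MAEs."""
    maes = [evaluate_mae(ranked[:k]) for k in range(1, len(ranked) + 1)]
    best = min(maes)
    k = next(k for k, mae in enumerate(maes, start=1) if mae - best <= tolerance)
    return k, maes


# Made-up importance scores for four features (Step 1 would produce these).
importances = [0.05, 0.30, 0.10, 0.55]
ranked = rank_features(importances)


def toy_mae(feature_subset):
    """Hypothetical stand-in for training and scoring an RF model:
    the error drops only when both informative features (3 and 1) are present."""
    informative = {3, 1}
    missing = len(informative - set(feature_subset))
    return 0.02 + 0.1 * missing


k, maes = choose_feature_count(ranked, toy_mae)
```

In this toy case the error stops improving once the two informative features are included, so two features are kept. If the MAE curve fluctuates strongly, a stricter stability check than the single tolerance used here would be needed.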
Step 1. We run the RF model with 368 trees to calculate the importance scores. The 45 features and their corresponding importance scores are provided in Table 2.
Step 2. All features are sorted by importance score in descending order in Figure 10. We can see that the first three features have noticeably higher scores than the others. Their feature numbers are 35 (the mean of the magnetometer x-axis), 26 (the mean of the GPS-measured altitude) and 20 (the mean of the latitude), which can be looked up in Table 2. From the 4th to the 13th feature, the importance score decreases from about 0.06 to 0.014. From the 14th to the 21st feature, the importance score decreases gradually, from about 0.013 to 0.01. The importance scores of the remaining features decrease steadily and are all less than 0.01.
Step 3. To compute the MAE, we run the RF model with the number of features varying from 1 to 45. The MAE and the corresponding number of features are shown in Figure 11. It can be observed that the MAE is basically stable when the number of features is more than 21. This also means that the MAE will not decrease significantly when we choose more than 21 features. Thus, the first 21 features in Figure 10

Model Selection and Evaluation
The aim of this section is to choose a proper classifier and demonstrate the effectiveness of the selected features. Six machine learning algorithms are applied to our dataset with the selected features: support vector machine (SVM) with the linear kernel, SVM with the rbf kernel, K-nearest neighbor (KNN), random forest (RF), gradient boosting decision tree (GBDT) and extreme gradient boosting (XGBoost). By comparing the results of these models, the best classifier is selected as the GPS spoofing detection model, that is, the core of our PERDET approach.
The receiver operating characteristic (ROC) curve is a common approach for assessing the performance of classifiers. An ROC curve illustrates the relationship between the false positive rate (FPR) and the true positive rate (TPR) over a range of thresholds for a binary classification decision. We run the 6 models with default parameters and 21 features on our dataset. The generated ROC curves are presented in Figure 12. This figure provides the area under the curve (AUC) to compare the effectiveness of the six classifiers. The AUC value is between 0 and 1; a value of 0.5 corresponds to the random predictions of an uninformative classifier, and a higher value (>0.5) represents better classifier performance. We can see in Figure 12 that RF has the best AUC value, 0.999859. The AUC values of XGBoost, GBDT, SVM with the rbf kernel, KNN and SVM with the linear kernel are 0.999452, 0.999159, 0.932036, 0.919147 and 0.910132, respectively. All AUC values exceed 0.91, which indicates that every classifier has detection capability and demonstrates the effectiveness of the features selected in Section 4.2. It also shows that the RF model is the best classifier for GPS spoofing detection and can be used as the detection model in our PERDET approach. For the classification, the prediction model must calculate the score of a sample. It determines that the sample is a malicious GPS spoofing attack if the score is bigger than the threshold; otherwise, the sample is seen as a normal benign scenario. This threshold is a probability and its value ranges between 0 and 1. With different thresholds, the classifier shows different performance.
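The construction of an ROC curve and its AUC can be illustrated with a short sketch. It assumes binary labels (1 for spoofing, 0 for benign), uses every distinct score as a candidate threshold, classifies a sample as positive when its score is strictly bigger than the threshold, and integrates with the trapezoidal rule. The function names and the four toy samples are illustrative only.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) and the thresholds that produce them.
    A sample is predicted positive when its score is strictly above the threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sweep from the highest score down; -inf makes every sample positive.
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s > t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s > t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points, thresholds


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: two benign samples followed by two spoofed samples.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points, thresholds = roc_points(labels, scores)
area = auc(points)
```

The curve starts at (0, 0), where nothing is flagged, and ends at (1, 1), where everything is flagged; a classifier whose curve bends toward the top left corner obtains an AUC close to 1.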
This process allows us to select classifiers with different thresholds for different types of classification requirements. In this paper, the threshold value should maximize the TPR while minimizing the FPR. The optimal cut-off point on the ROC curve provides this value. The optimal cut-off points of all classifiers are labeled in Figure 13, which is a zoomed-in ROC curve. In this figure, it can be observed that the TPRs at the optimal cut-off points are, in descending order: 100% (XGBoost), 99.69% (RF), 98.44% (GBDT), 95.94% (KNN), 92.50% (SVM with the rbf kernel) and 85.63% (SVM with the linear kernel). The FPR, TPR and corresponding threshold are summarized in Table 3. We can see that the RF model has the lowest FPR, 0.24%, and XGBoost, GBDT, SVM with the linear kernel, SVM with the rbf kernel and KNN come next with 0.31%, 0.71%, 14.24%, 14.63% and 19.75%, respectively. After that, using the optimal threshold of each classifier, we apply several classification assessment criteria to evaluate the performance of the classifiers, including the accuracy, the precision, the recall and the F1-measure. The formulae of these criteria are presented in Formulae (3)-(6), where TP is the true positive, TN is the true negative, FN is the false negative and FP is the false positive. The assessment results are provided in Table 4. The explanation of these criteria is presented in the following.
• F1-measure: F1-measure is the harmonic mean of the precision value and the recall value, as defined in Formula (6). Its value ranges between 0 and 1; the higher, the better. Thus, the RF and XGBoost models, with an F1-measure value of 99.22%, have the best performance compared with the other models.
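The threshold selection and the four assessment criteria can be sketched as follows. Two assumptions are made here: the optimal cut-off is taken as the candidate threshold that maximizes Youden's J statistic (TPR minus FPR), which is one common way to trade off a high TPR against a low FPR and is not necessarily the rule used to produce Table 3; and the criteria follow their standard definitions in terms of TP, TN, FP and FN. All names and the toy data are illustrative.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, TN, FP, FN when a score above the threshold means spoofing (1)."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def best_threshold(labels, scores, candidates):
    """Pick the candidate threshold that maximizes Youden's J = TPR - FPR."""
    def youden_j(t):
        tp, tn, fp, fn = confusion_counts(labels, scores, t)
        return tp / (tp + fn) - fp / (fp + tn)
    return max(candidates, key=youden_j)


def assessment(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-measure from the confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data: four benign samples (0) followed by five spoofed samples (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.05, 0.1, 0.2, 0.6, 0.4, 0.7, 0.8, 0.9, 0.95]

threshold = best_threshold(labels, scores, candidates=[0.3, 0.5, 0.65])
counts = confusion_counts(labels, scores, threshold)
results = assessment(*counts)
```

On this toy data the highest candidate threshold wins because it removes the only false positive at the cost of one missed spoofed sample, which shows how the cut-off shifts the balance between precision and recall.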
Through the above comparison of the performance, it can be concluded that the RF and XGBoost models are the best among the six classifiers in terms of comprehensive classification results. This demonstrates that the RF and XGBoost models are the optimal classifiers for our dataset. Thus, the RF model or the XGBoost model can be selected as the detection model of our PERDET approach.

Comparison and Discussion
Some existing works have applied machine learning algorithms to detect GPS spoofing attacks on UAVs. However, it is difficult to compare PERDET with other methods, since they were evaluated on different datasets. For machine learning algorithms, the data itself is an important factor that can significantly affect performance.
By analyzing the existing works, we try to select a suitable method for the comparison. Feng et al. [12] used the XGBoost and SVM models to analyze flight data, including the angular velocity, the acceleration and the distance between two time points, for the detection of UAV GPS spoofing attacks. Their method is abbreviated as the JSA method. In the JSA method, they only selected the angular velocity, the acceleration and the GPS-based positions that were used to calculate the distance, and these data could be obtained from the flight log. In our experiment, we already have the flight log, which supports the performance assessment of the JSA method. Therefore, we choose the JSA method as the object of the comparison.
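For illustration, the distance between the GPS-based positions at two time points can be computed from latitude and longitude as sketched below. This is an assumption on our side: the haversine formula with a mean Earth radius of 6,371 km is one common choice, and the exact distance computation of the JSA method is not restated here. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, and a zero-length step.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
no_movement = haversine_m(10.0, 20.0, 10.0, 20.0)
```

Such a distance only captures horizontal movement; altitude changes between the two time points do not affect it.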
However, the implementation of the JSA method is not open source and its experimental data are not shared either. We therefore first implement the JSA method and then run it on our dataset for the comparison. The detailed description of the JSA method was provided in the paper of Feng et al. [12]. The comparison of experimental results between the JSA method and our PERDET approach is shown in Table 5. In the JSA method, the parameters were optimized for the XGBoost model. Even though the JSA method ran with optimized parameters, the performance of our PERDET approach is better in terms of the accuracy, the precision, the recall and the F1-measure. Furthermore, we can see that, when both our PERDET method and the JSA method apply the XGBoost model to our dataset, PERDET has a much better performance. The core difference between these two methods is the feature set. Thus, we further demonstrate that the selected features of the PERDET method are more effective.
Table 5. Comparison of experimental results between the JSA method [12] and our PERDET approach. We have implemented the JSA method. For the JSA method, SVM-linear, SVM-rbf, XGBoost (with default parameters) and XGBoost (with optimized parameters provided by the JSA method) models are trained and tested using our dataset. The last row is the performance of our PERDET approach using default parameters.

Both the JSA method and our PERDET are based on machine learning algorithms. The main reason why the JSA method cannot work as well as PERDET is that the JSA method did not comprehensively consider the relationships among the flight data and only considered the horizontal distance, whereas a UAV performs both horizontal and vertical movements. Moreover, a UAV has various correlated data types, such as the acceleration, angular velocity, position, yaw, pitch and roll. The relevance among the acceleration, angular velocity and position cannot sufficiently reflect the GPS-based position calculated from the latitude and the longitude. It is necessary to consider the relevance between the GPS position information and the other perception data. Because the acceleration and the angular velocity have measurement errors that may be misjudged as spoofing attacks, it is not appropriate to use only them to detect GPS spoofing attacks.

In addition, Feng et al. [12] asserted that the JSA method was better than their two earlier detection methods, called the TECS method [16] and the DATE method [15]. Since PERDET is better than the JSA method, this suggests that PERDET is also better than the TECS method [16] and the DATE method [15].

Conclusions
We propose the PERDET approach, which is a machine-learning-based UAV GPS spoofing detection method using the perception data. We comprehensively analyze the perception data of a UAV using the position estimation process and the attitude estimation process. We choose data from the accelerometer, gyroscope, magnetometer, GPS and barometer as features. These data can compensate for each other. Each of these sensors is able to verify the results measured by the other sensors. The selection process of the perception data is the significant difference between our PERDET and existing approaches. Moreover, we collect the dataset through real flight experiments, whereas most existing perception-data-based detection methods are evaluated through simulation. The experiment and comparison presented here demonstrate that PERDET is effective and can work better than existing methods.

Figure 1. The architecture of the UAV.

Figure 2. The measurement principle of the MEMS accelerometer.

Figure 4. The principle of the gyroscope.

Figure 5. Framework of the PERDET detection approach.

Figure 6. The relationship of the perception data in the horizontal position. (a) Accelerometer-based horizontal position; (b) GPS-based horizontal position.

Figure 7. The relationship of the perception data in the vertical position. (a) GPS-based altitude; (b) Barometer-based altitude.

Figure 8. The relationship of the perception data in the attitude.

Figure 9. OOB error rate versus the number of trees.
are selected for the model training and classification. By checking the feature number in

Figure 13. Zoomed-in ROC curve at top left.

Table 1. Our UAV dataset for model training.

Table 2. All features and their importance scores.

Table 3. Optimal thresholds of each model.

Table 4. Assessment results of different models with default model parameters.

• Accuracy: Accuracy represents the proportion of all data, including positive and negative classes, that the classifier classifies correctly. The calculation formula is provided in Formula (3). The RF and XGBoost models have the highest accuracy value (99.69%) among the six models. However, the accuracy criterion alone cannot sufficiently reflect the efficiency. It is necessary to take other criteria into consideration.
• Precision: Precision represents the number of truly positive samples among the samples that the classifier considers positive, as defined in Formula (4). It can be observed that the RF model has the highest value (99.07%).
• Recall: Recall represents the number of positive samples identified by the classifier among all positive samples, as defined in Formula (5). It shows the capacity to correctly classify the positive samples. It can be observed that the XGBoost model has a higher recall value (99.69%) than the other classifiers.