Aircraft Track Anomaly Detection Based on MOD-Bi-LSTM

Abstract: To ensure flight safety and eliminate hidden dangers, it is important to detect aircraft track anomalies, which include track deviations and track outliers. Many existing track anomaly detection methods cannot make full use of the multidimensional information of the relevant track. To address this problem, this paper proposes an aircraft track anomaly detection method that combines the Multidimensional Outlier Descriptor (MOD) with the Bi-directional Long Short-Term Memory network (Bi-LSTM). First, track deviation detection is transformed into a track density classification problem, and a multidimensional outlier descriptor is designed to detect track deviation. Second, track outlier detection is transformed into a prediction problem, and a Bi-LSTM model is designed to detect track outliers. Experimental results on real aircraft track data indicate that the accuracy of the proposed method is 96% and the recall rate is 97.36%. The method can detect both track deviation and track outliers effectively.


Introduction
With the rapid development of satellite communication technology, the ADS-B data reflecting aircraft tracks is becoming more abundant, and the track data contains a lot of valuable information. In order to realize multi-source information fusion and the tracking of aircraft flights, Chinese and other scholars have carried out extensive research on track data mining methods, such as the ADS-B anomaly data detection model based on VAE-SVDD [1], track clustering [2], track correlation analysis [3,4], object motion pattern recognition [5], path planning [6] and anomalous trajectory detection [7]. In the field of information fusion, the abnormal behavior of a target can be mined through its multidimensional track characteristics, which is of great significance for situation assessment, threat assessment and command decision-making [8]. In a real environment, track data contains many unreasonable sampling points whose motion features differ greatly from those of their neighboring track points. These points are outliers in the track data [9]. Outliers of all kinds degrade the data quality of the track and affect subsequent information mining. Therefore, anomaly detection based on track data is an important basis of track data analysis and mining.
The track data of an aircraft flight is usually a sequence composed of multidimensional data points. The abnormal behavior of the target can be mined through anomaly detection on multidimensional tracks. Scholars at home and abroad have done a lot of research on anomaly detection and track anomaly detection. Chandola et al. summarized anomaly detection techniques, their classification, application scope, advantages and disadvantages [10,11]. Existing anomaly detection methods can be divided into statistics-based [12], distance-based [13], density-based [14], depth-based [15] and deviation-based [16] anomaly detection. Existing track anomaly detection methods include methods based on the Multidimensional Local Outlier Factor [8], global track feature extraction [17], classifiers [18,19] and track segment similarity detection [20,21]. In recent years, with the advancement of deep learning methods in the big data detection field, deep learning anomaly detection methods have come to include an anomaly detection algorithm based on multi-layer convolutional neural network interactive visualization [22], a Long Short-Term Memory network anomaly detection algorithm based on the encoder-decoder framework [23], a novel intrusion detector based on deep learning hybrid methods [24], forecasting and anomaly detection approaches using LSTM and LSTM autoencoder techniques with applications in supply chain management [25], a new method for anomaly detection of seismic precursor data based on LSTM-RNN [26], a deep learning-based hybrid intelligent intrusion detection system [27], deep learning for anomaly detection [28] and a track anomaly detection algorithm based on the Bidirectional Long Short-Term Memory network [9]. In addition, Ruff et al. published a review on deep and shallow anomaly detection [29].
Some of the above methods can only detect the position anomaly of the target track, some do not make full use of the multidimensional features of the target, some rely too heavily on parameter thresholds, some cannot automatically learn the difference between abnormal and normal track points, and some apply poorly to complex types of data. In short, these methods have limitations in mining the abnormal behavior of a target.
In view of the above problems, this paper proposes a track anomaly detection algorithm that combines the Multidimensional Outlier Descriptor (MOD) with the Bi-directional Long Short-Term Memory network (Bi-LSTM). First, the MOD algorithm is used to detect track deviation. Then a 5-dimensional motion feature vector is built for each track point, and the feature vectors of the track data within a time interval are selected as the input of the Bi-LSTM network to detect various abnormal behaviors of an aircraft.

Track Anomaly Detection Based on MOD-Bi-LSTM
Aircraft track anomalies include track deviations and track outliers. A track deviation means that the flight of the aircraft deviates from the fixed flight route. Track outliers refer to abnormal height, speed or heading of the aircraft at some track points. The flowchart of the algorithm for track anomaly detection based on MOD-Bi-LSTM is shown in Figure 1.

Data Preprocessing
The flight data of an aircraft may be missing at some time steps. So that the detection of track anomalies is not affected, the missing data need to be filled. Depending on the distribution of the data, either the mean or the median can be used as the fill value: if the data is uniformly distributed, the mean is used, and if the data is non-uniformly distributed, the median is used.
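The filling rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fill_missing`, the use of `None` to mark a gap, and the sample altitude values are all assumptions. The source does not say how uniformity is tested, so the choice between mean and median is left to the caller.

```python
from statistics import mean, median

def fill_missing(values, use_mean):
    """Replace None entries with the mean or the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a fill value from")
    fill = mean(observed) if use_mean else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical altitude readings (m) with two gaps and one extreme value.
altitude = [9800.0, None, 9810.0, 9805.0, None, 12000.0]
filled_mean = fill_missing(altitude, use_mean=True)     # gaps become 10353.75
filled_median = fill_missing(altitude, use_mean=False)  # gaps become 9807.5
```

In this example the single extreme reading pulls the mean far from the typical altitude, while the median stays close to it, which is one reason to prefer the median for skewed data.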

Track Deviation Detection
The MOD algorithm is used to detect track deviation, which requires a threshold to be determined. Since the MOD algorithm operates on static data, the number of nearest neighbors k and the anomaly threshold T can be determined through experimental verification. In the same scene, once the values of k and T are determined, they can be reused in subsequent track deviation detection. The MOD value of each track is calculated, tracks with a MOD value higher than the threshold are detected as deviating tracks, and the deviating tracks are then removed.

For track outliers detection, the predicted sequence values and the real sequence values are differenced to obtain the difference sequence, and the SVDD algorithm is applied to the difference sequence to obtain the anomaly detection threshold. In the test phase, whether the flight state of the aircraft is abnormal is judged by comparing the differences with the threshold.

MOD for Track Deviation Detection
In anomaly detection, the Local Outlier Factor (LOF) can handle data with inconsistent local density [30] and can measure the position anomaly reflected by the Euclidean distance. For multidimensional target track data, however, LOF cannot measure the position, speed and heading anomalies of the target according to different requirements. Therefore, the MOD, which is based on the Dynamic Time Warping (DTW) algorithm, is used as the anomaly measure for multidimensional tracks. MOD can measure not only the position anomaly of the target but also its height, speed and heading anomalies simultaneously.
Track data generally consists of multidimensional sequences composed of multidimensional data points. In civil air traffic monitoring and processing systems, track data usually includes multidimensional characteristics such as the batch number, attribute, category, quantity, model, aircraft number, time, longitude, latitude, altitude, speed and heading of the target. The tracks can be expressed as the following set:

$$TD = \{TR_1, TR_2, \ldots, TR_i, \ldots, TR_n\}$$

where $TD$ is the track set, $i \in [1, n]$ is the track number and $n$ is the total number of tracks. The track $TR_i$ is a sequence of multidimensional track points in time order:

$$TR_i = \{P_{i1}, P_{i2}, \ldots, P_{ij}, \ldots, P_{im}\}$$

where $P_{ij}$ stands for the $j$th multidimensional track point in the $i$th track, $j \in [1, m]$ is the track point number, and $m$ is the total number of track points. For different tracks $TR_i$, $m$ is not necessarily the same. The track point $P_{ij}$ is a vector with multidimensional characteristics, i.e., the target number, attribute, type, time, longitude, latitude, height, speed and heading of the $j$th multidimensional track point in the $i$th track.
The DTW similarity function replaces the traditional Euclidean distance as the distance measure between sequences, because it is difficult to calculate the distance between time series accurately with the Euclidean distance. The principle of the DTW algorithm is as follows. Given two sequences $TR_A = \{TR_{A1}, TR_{A2}, \ldots, TR_{Ai}, \ldots, TR_{An}\}$ and $TR_B = \{TR_{B1}, TR_{B2}, \ldots, TR_{Bj}, \ldots, TR_{Bm}\}$ of lengths $n$ and $m$, an $n \times m$ matrix grid is constructed whose element $(i, j)$ is the Euclidean distance $D(TR_{Ai}, TR_{Bj})$ between the point $TR_{Ai}$ and the point $TR_{Bj}$. The warping path starts at the matrix corner corresponding to the beginning of both sequences (the boundary condition) and satisfies the constraints of continuity and monotonicity. The path with the smallest accumulated distance, calculated by dynamic programming, is the best path, and the accumulated distance of the best path is the DTW similarity of the two sequences.
When the DTW distance between two tracks is calculated with the Euclidean distance between two points, only position characteristics are considered, and dynamic characteristics such as speed, heading and acceleration are ignored. The multi-factor directional DTW distance $\delta_M$ of the track $TR_A$ and the track $TR_B$ is therefore defined through the multi-factor distance $\mathrm{mfdist}(P_a, P_b)$ between two points. Considering the position, speed, heading and acceleration features of the two points, the multi-factor distance between them is defined as follows:

$$\mathrm{mfdist}(P_a, P_b) = w_d \cdot \mathrm{dist}(P_a, P_b) + w_v \cdot \mathrm{dist}(v_{P_a}, v_{P_b}) + w_\theta \cdot \mathrm{dist}(\theta_{P_a}, \theta_{P_b}) + w_\alpha \cdot \mathrm{dist}(\alpha_{P_a}, \alpha_{P_b})$$

where $v_{P_a}$ and $v_{P_b}$ are the speeds of the points $P_a$ and $P_b$; $\theta_{P_a}$ and $\theta_{P_b}$ are the headings of $P_a$ and $P_b$; and $\alpha_{P_a}$ and $\alpha_{P_b}$ are the accelerations of $P_a$ and $P_b$. The Euclidean distance $\mathrm{dist}(P_a, P_b)$ is the position characteristic between $P_a$ and $P_b$; $\mathrm{dist}(v_{P_a}, v_{P_b})$ is the speed characteristic; $\mathrm{dist}(\theta_{P_a}, \theta_{P_b})$ is the heading characteristic; and $\mathrm{dist}(\alpha_{P_a}, \alpha_{P_b})$ is the acceleration characteristic. $w_d$, $w_v$, $w_\theta$ and $w_\alpha$ are the weight factors of the position, speed, heading and acceleration characteristics, respectively. The weight factors satisfy:

$$w_d + w_v + w_\theta + w_\alpha = 1$$

In the anomaly detection of multidimensional tracks, the values of the four feature weights can be set proportionally according to the needs of different detection tasks. When only the position anomaly is considered, $w_d = 1$, $w_v = 0$, $w_\theta = 0$, $w_\alpha = 0$. Based on the multi-factor distance $\mathrm{mfdist}(P_a, P_b)$, the directional DTW distance is extended to multiple dimensions: the multi-factor directional DTW distance $\delta_M(TR_A, TR_B)$ between the multidimensional tracks $TR_A$ and $TR_B$ is obtained by using $\mathrm{mfdist}(P_a, P_b)$ as the point distance in the DTW accumulation, and it represents the similarity of the multidimensional tracks $TR_A$ and $TR_B$.
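The following sketch illustrates the multi-factor distance and its use inside a standard DTW dynamic program. It is an illustration under stated assumptions rather than the paper's code: the point layout `(x, y, z, speed, heading, acceleration)`, the function names `mf_dist` and `dtw`, the default equal weights, and the use of plain absolute differences for the scalar features (with no heading wrap-around at 360 degrees) are all hypothetical choices. The recurrence used is the textbook DTW recurrence, acc(i, j) = cost(i, j) + min(acc(i-1, j-1), acc(i-1, j), acc(i, j-1)).

```python
import math

def mf_dist(p, q, w=(0.25, 0.25, 0.25, 0.25)):
    """Weighted multi-factor distance between two track points.

    Each point is (x, y, z, speed, heading, acceleration);
    w = (w_d, w_v, w_theta, w_alpha) and is expected to sum to 1.
    """
    w_d, w_v, w_t, w_a = w
    pos = math.dist(p[:3], q[:3])  # Euclidean distance between positions
    return (w_d * pos + w_v * abs(p[3] - q[3])
            + w_t * abs(p[4] - q[4]) + w_a * abs(p[5] - q[5]))

def dtw(track_a, track_b, dist):
    """Accumulated distance of the best warping path between two tracks."""
    n, m = len(track_a), len(track_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0  # boundary condition: the path starts at the first points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(track_a[i - 1], track_b[j - 1])
            # continuity and monotonicity: only three predecessor cells allowed
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[n][m]

def position_only(p, q):
    """Multi-factor distance with w_d = 1 and all other weights 0."""
    return mf_dist(p, q, (1.0, 0.0, 0.0, 0.0))

# Hypothetical tracks: same route, different sampling and different speed.
track_a = [(0, 0, 0, 10, 90, 0), (1, 0, 0, 10, 90, 0), (2, 0, 0, 10, 90, 0)]
track_b = [(0, 0, 0, 10, 90, 0), (2, 0, 0, 10, 90, 0)]
track_c = [(0, 0, 0, 20, 90, 0), (1, 0, 0, 20, 90, 0), (2, 0, 0, 20, 90, 0)]

d_ab = dtw(track_a, track_b, position_only)  # tracks of different length
d_ac_pos = dtw(track_a, track_c, position_only)  # speed difference is invisible
d_ac_multi = dtw(track_a, track_c, mf_dist)  # speed difference contributes
```

With position-only weights, `track_a` and `track_c` are indistinguishable, while the equal-weight multi-factor distance separates them through the speed term. This is the motivation the text gives for extending DTW beyond position.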
The MOD of a multidimensional track $TR_i \in TD$ is calculated as follows:
(1) Calculate the multi-factor neighbor boundary distance of the track $TR_i$.
(2) Calculate the neighboring tracks of the track $TR_i$.
(3) Calculate the multi-factor reachable distance $\delta_M^{reach}$ from the track $TR_i$ to the track $TR_j$.
(4) Calculate the nearest neighbor density of the track $TR_i$. In this step, replacing $\delta_M$ with $\delta_M^{reach}$ smooths the neighbor density. The larger the value of $k$, the stronger the smoothing effect.
(5) Calculate the Multidimensional Outlier Descriptor of the track $TR_i$. The greater the ratio of the neighbor density of the neighbors of $TR_i$ to the neighbor density of $TR_i$ itself, the greater the degree of anomaly of the track $TR_i$.
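The five steps can be sketched under the assumption that MOD follows the classical LOF construction with the multi-factor DTW distance in place of the Euclidean distance. The exact formulas are not reproduced in this text, so the definitions below (k-distance, reachability as the maximum of the neighbor's k-distance and the pairwise distance, density as the inverse mean reachability) are the standard LOF ones and should be read as an assumption. The function name `mod_scores`, the 1-D toy data and the threshold value are hypothetical.

```python
def mod_scores(dist, k):
    """LOF-style outlier score for each track, given a symmetric distance matrix.

    dist[i][j] stands in for the multi-factor DTW distance between tracks i and j.
    Assumes no k+1 tracks are identical (otherwise a density would be infinite).
    """
    n = len(dist)
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                            # (1) neighbor boundary distance
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])  # (2) neighboring tracks
    density = []
    for i in range(n):
        # (3) reachable distance from track i to each neighbor j
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        density.append(len(reach) / sum(reach))          # (4) neighbor density
    # (5) mean ratio of the neighbors' density to the track's own density
    return [sum(density[j] for j in neigh[i]) / (len(neigh[i]) * density[i])
            for i in range(n)]

# Toy stand-in: five "tracks" summarized by one number each; the last is far away.
positions = [0.0, 1.0, 2.0, 3.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
scores = mod_scores(dist, k=2)

T = 2.0  # hypothetical anomaly threshold, chosen by experiment as in the text
deviating = [i for i, s in enumerate(scores) if s > T]
```

In this toy example the four clustered items score about 1 and the distant one scores about 5, so only the last index exceeds the threshold. In practice `dist` would be filled with pairwise multi-factor DTW distances between tracks.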

Bi-LSTM for Track Outliers Detection
A recurrent neural network (RNN) is a neural network whose input is sequence data. An RNN has memory and shares parameters across time steps, so it can learn the nonlinear features of sequences efficiently. Although an RNN can deal with nonlinear time sequences effectively, it suffers from exploding and vanishing gradients on time sequences with long delays. The Long Short-Term Memory network (LSTM) is an evolution of the traditional recurrent neural network that alleviates the exploding and vanishing gradient problem. However, the predicted output of an LSTM is determined only by the inputs at previous moments, which may lead to the loss of useful information when extracting data features. In many cases, the prediction is affected by the inputs at both previous and subsequent moments. Therefore, this paper adopts the Bi-LSTM network with a forward and backward structure for track outliers detection [31].

Introduction of Bi-LSTM Neural Network Model
Bi-LSTM consists of a forward Long Short-Term Memory network (LSTM) and a backward LSTM. Because track outliers do not occur in isolation, the forward and backward structure of the Bi-LSTM model is well suited to track outliers detection. The LSTM neural network unit is the basic unit of the Bi-LSTM neural network, and its cell structure is shown in Figure 2. The LSTM model can be regarded as an optimized recurrent neural network model, which mainly addresses the gradient problem in long sequence training. The model is composed of the input signal $x_t$ at time $t$, cell state $C_t$, temporary cell state $\hat{C}_t$, hidden layer state $h_t$, forget gate $f_t$, memory gate $i_t$ and output gate $o_t$ [32]. The working principle is as follows:
(1) The forget gate screens out weakly correlated information and deletes it:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $h_{t-1}$ represents the hidden layer information at the previous moment; $x_t$ represents the current input; $W_f$ and $b_f$ are training parameters; $\sigma$ is the sigmoid function (a neural network activation function); and $f_t$ represents the weight of the retained information.
(2) The input gate screens the strongly correlated information, and the sigmoid layer and the hidden layer jointly update the information in the cell state:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\hat{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
$$C_t = f_t * C_{t-1} + i_t * \hat{C}_t$$

where $W_i$, $b_i$, $W_c$ and $b_c$ represent training parameters; $C_{t-1}$ represents the information of the previous cell state; and $C_t$ represents the information of the current cell state.
(3) The output gate determines the final output information:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$

where $W_o$ is the weight matrix. The weight matrix used in this paper is randomly initialized at the beginning of model training and is iteratively updated through the back propagation of the training loss. $b_o$ is the offset and $\sigma$ is the activation function.
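The gate updates described above can be traced with a small NumPy sketch of one LSTM step and a bidirectional pass. It uses the standard LSTM equations. The parameter names (`W_f`, `W_i`, `W_c`, `W_o` and the matching biases), the helper names, the hidden size of 4 and the random input are assumptions for illustration, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell update; p holds weight matrices and biases per gate."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])      # input (memory) gate
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])    # temporary cell state
    c_t = f_t * c_prev + i_t * c_hat            # new cell state
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

def run_lstm(seq, p, hidden):
    """Run the cell over a sequence and return the hidden state at every step."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = []
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, p)
        out.append(h)
    return np.stack(out)

def bilstm(seq, fwd, bwd, hidden):
    """Concatenate a forward pass with a time-reversed backward pass."""
    h_fwd = run_lstm(seq, fwd, hidden)
    h_bwd = run_lstm(seq[::-1], bwd, hidden)[::-1]  # re-align to original order
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, n_in, hidden):
    """Random initial weights and zero biases for the four gates f, i, c, o."""
    p = {}
    for g in "fico":
        p["W_" + g] = rng.normal(scale=0.1, size=(hidden, hidden + n_in))
        p["b_" + g] = np.zeros(hidden)
    return p

rng = np.random.default_rng(0)
seq = rng.normal(size=(6, 5))  # 6 time steps of 5-dimensional track features
out = bilstm(seq, init_params(rng, 5, 4), init_params(rng, 5, 4), hidden=4)
```

Each row of `out` combines what the forward pass has seen up to that time step with what the backward pass has seen after it, which is the property the text relies on for outliers that do not occur in isolation.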
Combining a forward LSTM and a backward LSTM yields the Bi-LSTM model. The Bi-LSTM structure is shown in Figure 3.

Bi-LSTM Neural Network Model Construction
(1) In flight data, each track point is represented by a 5-dimensional vector: $F_t$ = {longitude, latitude, height, heading, speed}. Feature extraction is carried out on the data to form the training set of the model.
(2) Data standardization: $X = \{X_1, X_2, \ldots, X_n\}$ represents the processed training set, which is used as the model input, and $Y = \{Y_1, Y_2, \ldots, Y_n\}$ represents the model output.
(5) Reverse error calculation: the characteristics are transferred through the forward and backward LSTMs, and the results are used to calculate the loss between the real value and the predicted value. According to the loss, the error is propagated back through the whole network and the parameters are updated. To improve the generalization ability of the model, a dropout mechanism is added between the Bi-LSTM layer and the first fully connected layer to prevent overfitting [33].
(6) When the number of training iterations and the error values meet the set requirements, model training is stopped and predictions are made on the test set.
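Steps (1) and (2) can be illustrated with a short sketch that standardizes 5-dimensional track features and slices them into input windows with next-point targets. Several details are assumptions because the text does not state them: z-score standardization is used (the paper may use a different scheme), the window length of 3 is arbitrary, the target is taken to be the point immediately after each window, and the synthetic `track` array merely stands in for real flight data.

```python
import numpy as np

def standardize(features):
    """Z-score each feature column; constant columns are left at zero."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (features - mu) / sigma, mu, sigma

def make_windows(features, window):
    """Pair each run of `window` consecutive points with the point that follows."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    Y = features[window:]
    return X, Y

# Synthetic track: 10 points x (longitude, latitude, height, heading, speed).
track = np.arange(50, dtype=float).reshape(10, 5)
Z, mu, sigma = standardize(track)
X, Y = make_windows(Z, window=3)  # X: (7, 3, 5) model inputs, Y: (7, 5) targets
```

The statistics `mu` and `sigma` would normally be computed on the training set only and reused for the test set, so that no test information leaks into training.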

SVDD Evaluates Threshold
To make the detection threshold for abnormal aircraft behavior adaptive, the difference sequence is obtained by taking the difference between the sequence predicted by the model in this paper and the real sequence. Based on the difference sequence, a Support Vector Domain Description (SVDD) classifier [34,35] is built to identify anomalous flight data. The SVDD classifier is designed in three steps. First, the difference sequence is mapped to a high-dimensional space by a nonlinear mapping. Then, the smallest hypersphere containing all or most of the difference sequence samples is found in the high-dimensional space. Finally, the obtained hypersphere serves as the discrimination boundary for anomaly detection: if a sample point of the difference sequence falls inside the hypersphere in the high-dimensional space, it is judged to be a normal point; if it falls outside the hypersphere, it is judged to be an abnormal point.
Based on the difference sequence between the predicted sequence and the real sequence, the radius r and the center a of the hypersphere can be obtained by SVDD solution, thus obtaining the classifier.
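The decision rule (inside the hypersphere is normal, outside is abnormal) can be illustrated with a deliberately simplified stand-in. The sketch below is not an SVDD solver: it skips the nonlinear mapping and the optimization, takes the sample centroid as the center a, and takes a quantile of the distances as the radius r. The function names, the `keep` fraction and the sample differences are hypothetical.

```python
import math

def fit_sphere(samples, keep=0.95):
    """Centroid-and-quantile approximation of a data-enclosing hypersphere.

    A real SVDD would solve a kernelized optimization problem instead;
    this only mimics the resulting (center, radius) decision boundary.
    """
    dim = len(samples[0])
    center = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    dists = sorted(math.dist(s, center) for s in samples)
    idx = min(len(dists) - 1, math.ceil(keep * len(dists)) - 1)
    return center, dists[idx]

def is_abnormal(sample, center, radius):
    """A difference sample is abnormal if it falls outside the hypersphere."""
    return math.dist(sample, center) > radius

# Hypothetical 2-D prediction differences from normal flights.
diffs = [(0.1 * i - 0.5, 0.0) for i in range(11)]
center, radius = fit_sphere(diffs)

small_error = is_abnormal((0.2, 0.0), center, radius)  # inside the sphere
large_error = is_abnormal((3.0, 0.0), center, radius)  # outside the sphere
```

The point of the construction is that the radius is derived from the data rather than set by hand, which is what the text means by an adaptive threshold.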

Experimental Setup
In this paper, the MOD-Bi-LSTM model is built with the TensorFlow framework and trained on an NVIDIA Titan XP graphics card. Because the test data in this study contains 5 features, the input size of the MOD-Bi-LSTM model is set to 5. The number of hidden units is 128, the dropout rate is 0.5, the batch size is 1000 and the number of epochs is 256. The ReLU activation function and the Adam optimizer are used in the model, and the initial learning rate is set to 0.001.

Data Set
In this paper, real aircraft track data is used to verify the performance of the proposed method. The flight data of CCA1315, CCA1369, CCA1883 and CCA1803 within a month are extracted, and each record includes the flight time, height, speed, heading, longitude and latitude. The distribution of positive and negative samples in the data set is unbalanced. The data set is randomly divided into a training set and a test set, and a combination of undersampling and oversampling is used to balance the class distribution of the training data. The Synthetic Minority Oversampling Technique (SMOTE) is an oversampling method whose main idea is to form new minority samples by interpolating between several minority samples. However, SMOTE produces noise samples during the interpolation process. This problem can be solved by using the Edited Nearest Neighbor (ENN) method to clean up the interpolation results: any sample whose class differs from that of its k nearest neighbors is removed, thus generating a class-balanced training set.
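The interpolation idea behind SMOTE can be sketched in a few lines. This is a minimal illustration of the interpolation step only: the ENN cleaning pass is omitted, and the function name `smote`, the neighbor count, the fixed seed and the sample points are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen base sample
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical minority-class samples in a 2-D feature space.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote(minority, n_new=6)
```

Every synthetic point lies on a segment between two real minority samples, which is why points generated near the class boundary can be noisy and why a cleaning step such as ENN is applied afterwards.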

Loss Function
The loss function is used to measure the inconsistency between the predicted value f (x) and the real value y of the model. It is usually expressed by L(y, f (x)) and can also be called the Cost Function. The smaller the value of loss function, the better the fitting of the model.
The loss function used in this experiment is the Mean Absolute Error (MAE), also known as L1 loss. MAE is the mean of the absolute values of the differences between the target values and the predicted values, and it measures the difference between the predicted value and the real value. The calculation formula is as follows:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $y_i$ is the target value, $\hat{y}_i$ is the predicted value and $n$ is the number of values.

Evaluation Metrics
A variety of commonly used machine learning evaluation metrics are adopted, including Accuracy, Precision, Recall, F1-score, Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE).
For a binary classification problem, samples can be divided into true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) according to the combination of the real situation and the model prediction. The confusion matrix is shown in Table 1. Accuracy is defined as the ratio of the number of correctly predicted samples to the total number of samples. Precision is defined as the ratio of correctly predicted positive samples to all predicted positive samples. The recall rate is defined as the ratio of correctly predicted positive samples to the actual total number of positive samples. The F1-score is defined as the harmonic mean of precision and recall rate. The calculation formulas of these four indicators are as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

The Root Mean Squared Error (RMSE) is the square root of the mean of the squared differences between the predicted values and the real values. When the predicted values are completely consistent with the real values, RMSE is equal to 0. The greater the error, the greater the RMSE value. The calculation formula is as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

where $y_i$ is the real value and $\hat{y}_i$ is the predicted value, with $n$ indicating the number of values.
The Mean Absolute Percentage Error (MAPE) is a percentage value. The smaller the MAPE value, the better the accuracy of the prediction model. The calculation formula is as follows:

$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where $y_i$ is the real value and $\hat{y}_i$ is the predicted value, with $n$ indicating the number of values.
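As a worked example of the metrics above, the following sketch computes them from hypothetical counts and values. The function names and all input numbers are illustrative and are not results from the paper.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Mean absolute percentage error in percent; real values must be nonzero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

# Hypothetical confusion-matrix counts.
accuracy, precision, recall, f1 = classification_metrics(tp=8, fp=2, tn=85, fn=5)

# Hypothetical real and predicted values (e.g., speed).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
errors = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

With these inputs the accuracy is 0.93, the precision 0.8 and the recall 8/13, and the MAPE is 5%. Because MAPE divides by the real value, it is undefined when a real value is zero, which matters for features such as heading that can legitimately be 0.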

Track Deviation Detection
The MOD algorithm can be used to preprocess the track data and detect the tracks that deviate from the route. Taking longitude, latitude and height as an example, Figure 4 shows the track deviation detection result based on MOD, in which the red dotted line indicates a track that deviates from the route and the green solid line indicates a normal track. Figure 4a-d show the flight tracks of flights CCA1315, CCA1369, CCA1883 and CCA1803 within a month, respectively. According to the detection results, the tracks that obviously deviate from the fixed flight route are removed from the data set. In order to verify the effectiveness of the MOD algorithm, the MOD algorithm was compared with the LOF algorithm, and the results are shown in Table 2.

Track Outliers Detection
After the tracks that deviate from the route are removed from the data set, the remaining data is divided into a training set and a test set, which account for 70% and 30% of the data, respectively, and the training set is fed into the Bi-LSTM network for training. In order to verify the effectiveness of the proposed method, the model in this paper is compared with the LSTM model, Bi-LSTM model, BP model, LR model and CNN model. The results are shown in Table 3. To make the threshold of aircraft anomaly detection adaptive, the SVDD classifier is obtained from the difference sequence between the predicted sequence and the real sequence, which realizes the automatic detection of track outliers.
Next, the data of 100 tracks is randomly extracted from the data set. For a certain period of time in each track, the flight speed is changed to 0.5 times the original speed, the flight height is increased by 50 m, and the heading is reduced by 10 degrees. The modified track data and the normal track data are fed into the SVDD classifier for testing. Of the 100 abnormal test samples, 96 had a difference between the predicted value and the true value whose distance to the center of the hypersphere was greater than the threshold r, so the detection accuracy is 96%. Of the 265 normal test samples, 258 had a difference between the predicted value and the true value whose distance to the center of the hypersphere was less than the threshold r, so the recall rate of detection is 97.36%. Table 4 compares the detection performance of the model in this paper with that of the LSTM model, Bi-LSTM model, BP model, LR model and CNN model on aircraft track outliers. As shown in Table 4, the accuracy and recall of the algorithm in this paper are both higher than those of the other methods, and the recall rate in particular is much higher. A high recall rate means that the model can detect more real outliers, which is very important in anomaly detection.
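The anomaly injection and the two reported ratios can be reproduced with a short sketch. The point layout `(longitude, latitude, height, heading, speed)` follows the 5-dimensional feature vector used earlier. The function name `inject_anomaly`, the sample track, the choice of perturbed interval and the wrap-around of heading at 360 degrees are assumptions.

```python
def inject_anomaly(track, start, end):
    """Return a copy of the track with points in [start, end) perturbed."""
    out = []
    for idx, (lon, lat, height, heading, speed) in enumerate(track):
        if start <= idx < end:
            height += 50.0                        # height increased by 50 m
            heading = (heading - 10.0) % 360.0    # heading reduced by 10 degrees
            speed *= 0.5                          # speed halved
        out.append((lon, lat, height, heading, speed))
    return out

# Hypothetical normal track of four identical points.
normal = [(116.0, 40.0, 9800.0, 90.0, 250.0)] * 4
modified = inject_anomaly(normal, start=1, end=3)

# Ratios reported in the text, recomputed from the stated counts.
abnormal_detected_rate = 96 / 100   # abnormal samples flagged as abnormal
normal_accepted_rate = 258 / 265    # normal samples judged as normal
reported = (round(100 * abnormal_detected_rate, 2),
            round(100 * normal_accepted_rate, 2))
```

The two ratios are computed on different groups of samples: the first on the 100 modified tracks and the second on the 265 unmodified ones. They correspond to the 96% and 97.36% figures quoted in the text.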

Conclusions
In this paper, an aircraft track anomaly detection method based on the combination of MOD and Bi-LSTM is proposed, and the effectiveness of the method is verified by using real aircraft track data. The experimental results show that the proposed method can detect both track deviation and track outliers well. The main conclusions of this article are as follows:
(1) The track deviation detection problem is transformed into a track density classification problem, and the MOD is designed to detect track deviation. The accuracy and recall of the MOD algorithm are improved compared to the LOF algorithm.
(2) The track outliers detection problem is transformed into a prediction problem, and the Bi-LSTM model is used to detect the track outliers. Compared with the traditional methods, the accuracy is improved.
(3) The anomaly detection algorithm considers not only the track density information, but also the features of the track points. More comprehensive factors are taken into account and the accuracy is improved.
In future studies, more attention should be paid to the application of the model, and the response time of the model should be reduced to ensure real-time detection. In addition, optimization algorithms will be used to further tune the parameters of the MOD-Bi-LSTM model to improve the accuracy of anomaly detection.
Data Availability Statement: Data were obtained from a third party and are available from the authors with the permission of the third party.