Flight Anomaly Detection via a Deep Hybrid Model

: In the civil aviation industry, security risk management has shifted from post-accident investigations and analyses to pre-accident warnings in an attempt to reduce ﬂight risks by identifying currently untracked ﬂight events and their trends and effectively preventing risks before they occur. The use of ﬂight monitoring data for ﬂight anomaly detection is effective in discovering unknown and potential ﬂight incidents. In this paper, we propose a time-feature attention mechanism and construct a deep hybrid model for ﬂight anomaly detection. The hybrid model combines a time-feature attention-based convolutional autoencoder with the HDBSCAN clustering algorithm, where the autoencoder is constructed and trained to extract ﬂight features while the HDBSCAN works as an anomaly detector. Quick access record (QAR) ﬂight data containing information of aircraft landing at Kunming Changshui International and Chengdu Shuangliu International airports are used as the experimental data, and the results show that (1) the time-feature-based convolutional autoencoder proposed in this paper can better extract the ﬂight features and further discover the different landing patterns; (2) in the representation space of the ﬂights, anomalous ﬂight objects are better separated from normal objects to provide a quality database for subsequent anomaly detection; and (3) the discovered ﬂight patterns are consistent with those at the airports, resulting in anomalies that could be interpreted with the corresponding pattern. Moreover, several examples of anomalous ﬂights at each airport are presented to analyze the characteristics of anomalies. P


Introduction
Flight safety is one of the most important topics in civil aviation. In recent years, safety risk management in civil aviation has shifted from post-accident investigations and analyses to pre-accident warnings. With the expectation of more departing flights in the future, civil aviation aims to effectively prevent potential accidents before they occur and thus to remain at a historically low level of accidents per year. For this, there is a need to innovatively and proactively identify operationally significant safety events that are currently untracked and then implement risk mitigation in the form of revising safety requirements to address the newly identified vulnerabilities. Identifying vulnerabilities or hazards is a key step in the process of risk reduction, for which machine learning can provide assistance [1].
Quick access record (QAR) equipment, installed on aircraft, is equipment used to quickly record various flight parameters during flight. QAR data contain up to 2000 flight parameters, including all kinds of attitude parameters, dynamic parameters, external environment parameters, and flight operation parameters. Each flight parameter is recorded once a second, and some parameters are recorded up to eight times a second. QAR data provide comprehensive information on the status and details of aircraft during flight and can be used for flight safety research and analysis. At the end of 2013, the Civil Aviation An AE [24], which mainly contains an encoder and a decoder, is a feed-forward multilayer neural network. It uses an encoder to compress the original data in high-dimensional space into hidden representations in low-dimensional latent space and a decoder to reconstruct the original data from the hidden representations [24]. An AE is trained by minimizing the reconstruction error between its input and its output. Under the assumption that there is a higher prevalence of normal instances than abnormal data instances, AE-based methods use the reconstruction error as an anomaly score. Reddy et al. [16] applied an AE to raw time series data from multiple flight sensors by using sliding overlapping time windows to form input vectors. Zhou et al. [17] implemented an AE with a regularization term (called the "robust AE") to eliminate outliers in the case of a lack of clean training data.
VAEs [25,26], based on traditional AEs, are a type of deep generative model and have also been extensively used for anomaly detection [18][19][20]. VAEs model high-dimensional distributions by casting learning representations as a variational inference [27] problem. The aim of a VAE is to learn a mechanism that can reveal the probability distribution of the input and generate new samples from random variables that take values in the latent space following the distribution. The optimization process takes into account the quality of autoencoded samples with respect to their reconstruction probability and the Kullback-Leibler (KL) divergence between the prior distribution and the transformed posterior distribution through the encoding process [28]. An anomaly will be reconstructed poorly through the generative process, and its encoding falls outside the distribution.
GANs [29] are another type of generative model that consist of two competing networks, a generator and a discriminator. The generator learns to approximate the distribution of a given dataset, and the discriminator learns to distinguish between real data samples and the generator's synthetic output samples [29]. GANs have recently been used for anomaly detection [21][22][23]28] and for more advanced variants [30].
Previous studies have achieved good performance and have greatly contributed to our understanding of unsupervised DAD. However, a well-known drawback of neural networks is the lack of interpretability [28]. Moreover, for multi-feature time series data, the inter-time and inter-feature relationships are rarely considered. To avoid these shortcomings in flight anomaly detection, in this paper, we propose a time-feature attention mechanism to capture the inter-time and inter-feature relationships within QAR time series data, and develop an unsupervised deep hybrid model for flight anomaly detection, as shown in Figure 1. Deep hybrid models for anomaly detection use deep neural networks as feature extractors, and the features learned within the hidden representations of autoencoders are input to traditional anomaly detection algorithms [9]. Deep hybrid model in this study combines a time-feature attention-based convolutional AE (TFA-CAE) neural network model with a hierarchical density-based spatial clustering of applications with noise (HDBSCAN) [31] algorithm, where TFA-CAE is built and trained for the extraction of flight features while HDBSCAN employs the extracted flight features to detect anomalous flights. With the size of latent space is set 2, anomalous flights detected with the unsupervised deep hybrid model are more intuitively interpreted and comprehensible.
The remainder of this paper is organized as follows. Section 2 presents the methodology used in our research, and the results of the case study experiment are presented in Section 3. Section 4 contains a summary and conclusion of the paper and an outline of further research.

Data Preprocessing
Generally, aircraft in flight are affected by many factors, such as the external atmospheric environment (wind direction, wind speed, temperature, etc.), the aircraft itself (the position of all control surfaces, engine status, etc.), the pilot's basic capabilities and skills (cognitive reliability, flight operations skills, etc.), and the pilot's mental state (fatigue, emotional status, etc.) [32]. During flight, these factors are in constant flux and have a continuous and complex impact on the aircraft. Regardless of how these factors change, however, their effects are ultimately reflected in the change in aircraft attitude and kinematic parameters, including the attitude angle, speed, and acceleration in the three longitudinal, vertical and lateral dimensions [33]. Figure 2 shows a kinematic analysis of a flight. In our research, the above two kinds of flight parameters are selected for the extraction of flight features. Table 1 shows the details of the specific parameters.

Data Preprocessing
Generally, aircraft in flight are affected by many factors, such as the external atmospheric environment (wind direction, wind speed, temperature, etc.), the aircraft itself (the position of all control surfaces, engine status, etc.), the pilot's basic capabilities and skills (cognitive reliability, flight operations skills, etc.), and the pilot's mental state (fatigue, emotional status, etc.) [32]. During flight, these factors are in constant flux and have a continuous and complex impact on the aircraft. Regardless of how these factors change, however, their effects are ultimately reflected in the change in aircraft attitude and kinematic parameters, including the attitude angle, speed, and acceleration in the three longitudinal, vertical and lateral dimensions [33]. Figure 2 shows a kinematic analysis of a flight. In our research, the above two kinds of flight parameters are selected for the extraction of flight features. Table 1 shows the details of the specific parameters.

Data Preprocessing
Generally, aircraft in flight are affected by many factors, such as the external atmospheric environment (wind direction, wind speed, temperature, etc.), the aircraft itself (the position of all control surfaces, engine status, etc.), the pilot's basic capabilities and skills (cognitive reliability, flight operations skills, etc.), and the pilot's mental state (fatigue, emotional status, etc.) [32]. During flight, these factors are in constant flux and have a continuous and complex impact on the aircraft. Regardless of how these factors change, however, their effects are ultimately reflected in the change in aircraft attitude and kinematic parameters, including the attitude angle, speed, and acceleration in the three longitudinal, vertical and lateral dimensions [33]. Figure 2 shows a kinematic analysis of a flight. In our research, the above two kinds of flight parameters are selected for the extraction of flight features. Table 1 shows the details of the specific parameters.   In this article, we focus on the landing phase of a flight. Compared to the entire profile of a normal flight, this phase of a flight is relatively short but has a high percentage of flight accidents. A review of accident statistics indicates that over 45% of all general aviation accidents occur during the approach and landing phases of a flight [34]. Thus, a fixed duration of 90 s before flight touchdown was chosen as the study flight phase for this paper, as shown in Figure 3. For each flight, every flight parameter in Table 1 is sampled at a fixed interval.  In this article, we focus on the landing phase of a flight. Compared to the entire profile of a normal flight, this phase of a flight is relatively short but has a high percentage of flight accidents. A review of accident statistics indicates that over 45% of all general aviation accidents occur during the approach and landing phases of a flight [34]. Thus, a fixed duration of 90 s before flight touchdown was chosen as the study flight phase for this paper, as shown in Figure 3. For each flight, every flight parameter in Table 1 is sampled at a fixed interval.

Time-Feature Attention Module
Previous literature [35][36][37][38][39][40][41][42] has extensively shown the significance of attention. Attention not only tells where to focus but also improves the representation of interests [42]. As during a certain period, the deviations of anomalous flights occur to one or more features, our goal is to know when (the time of deviation occurs) and which (the features with deviation) to focus on and improve the corresponding values by using an attention mechanism. Thus, we propose a new network module, named the "Time-Feature Attention Module (TFAM)". To emphasize meaningful features along the time and feature axes, a time module and a feature module are sequentially applied (as shown in Figure 4) so that TFAM can learn "when" and "which" to attend to on the time and feature axes, respectively. As the output of the TFAM module, anomalous flights will be differentiated from the common flights; meanwhile, flights within the same common flight pattern will

Time-Feature Attention Module
Previous literature [35][36][37][38][39][40][41][42] has extensively shown the significance of attention. Attention not only tells where to focus but also improves the representation of interests [42]. As during a certain period, the deviations of anomalous flights occur to one or more features, our goal is to know when (the time of deviation occurs) and which (the features with deviation) to focus on and improve the corresponding values by using an attention mechanism. Thus, we propose a new network module, named the "Time-Feature Attention Module (TFAM)". To emphasize meaningful features along the time and feature axes, a time module and a feature module are sequentially applied (as shown in Figure 4) so that TFAM can learn "when" and "which" to attend to on the time and feature axes, respectively. As the output of the TFAM module, anomalous flights will be differentiated from the common flights; meanwhile, flights within the same common flight pattern will be more tightly aggregated and apparently separated from the ones within other common patterns. where ⨀ denotes the Hadamard product operation. In performing the multiplication operation, the time attention values are propagated (copied) along the dimension of the feature parameters, and the feature attention values are propagated along the time dimension. Figures 5 and 6 depict the computation processes of the time attention and feature attention modules, respectively.  Figure 5. To efficiently compute the time attention, we squeeze the feature dimension of the input QAR time series. Given a QAR time series ∈ ℝ as input, the feature information of ∈ ℝ is first aggregated by using a single-layer perception (SLP) along the time axis to generate a context descriptor of the feature parameters . To produce the time attention map ∈ ℝ , a sigmoid function is applied to the context descriptor . Finally, we multiply time attention ∈ ℝ and ∈ ℝ by the Hadamard product to obtain the time-refined ∈ ℝ . The time attention ∈ ℝ and time-refined ∈ ℝ are computed as where denotes the sigmoid function, and ⨀ denotes the Hadamard product operation.
∈ ℝ . Given a QAR sequence S ∈ R F×T as input, where F means the size of feature dimension and T denotes the size of time dimension, the time attention module first infers a 1D time attention map A t ∈ R 1×T and produces the time-refined output S ∈ R F×T . Afterward, the feature attention module takes S ∈ R F×T as input to infer a 1D feature attention map A f ∈ R F×1 and generates the final refined output S ∈ R F×T . The computation process of overall attention can be summarized as follows: ∈ ℝ along the feature axis to generate a temporal context descriptor . A feature attention map ∈ ℝ is produced after is forwarded to a sigmoid function. To obtain the final refined output ∈ ℝ , we multiply ∈ where denotes the sigmoid function, and ⨀ denotes the Hadamard product operation.
∈ ℝ , and is the matrix transpose of .

Time-Feature Attention-Based Convolutional Autoencoder (TFA-CAE)
An AE network is a powerful nonlinear model structure for data reduction and feature extraction. For feature extraction, the AE is built and trained with the goal of representing meaningful attributes of the original data input with the latent space representation as much as possible. Once training is completed, the encoder part can be used as a powerful automatic feature extractor. The architecture of the AE network model, such as convolutional neural network AEs (CNN-AEs) and recurrent neural network AEs (RNN-  Figure 5. To efficiently compute the time attention, we squeeze the feature dimension of the input QAR time series. Given a QAR time series S ∈ R F×T as input, the feature information of S ∈ R F×T is first aggregated by using a singlelayer perception (SLP) along the time axis to generate a context descriptor of the feature parameters C t . To produce the time attention map A t (S) ∈ R 1×T , a sigmoid function is applied to the context descriptor C t . Finally, we multiply time attention A t (S) ∈ R 1×T and S ∈ R F×T by the Hadamard product to obtain the time-refined S ∈ R F×T . The time attention A t (S) ∈ R 1×T and time-refined S ∈ R F×T are computed as where σ denotes the sigmoid function, and denotes the Hadamard product operation. W t ∈ R 1×F . Feature Attention Module. Different from time attention, feature attention pays more attention to key features and is complementary to time attention. We produce a feature attention map by probing the inter-feature relationship of a QAR time series. The computation of feature attention is very close to the computation process of time attention except that we squeeze the time dimension of the time-refined sequence, as shown in Figure 6. First, a single layer perception (SLP) is used to aggregate the temporal information of the time-refined S ∈ R F×T along the feature axis to generate a temporal context descriptor C f . A feature attention map A f S ∈ R F×1 is produced after C f is forwarded to a sigmoid function. To obtain the final refined output S ∈ R F×T , we multiply A f S ∈ R F×1 and S ∈ R F×T by the Hadamard product. In short, the feature attention A f S ∈ R F×1 and S ∈ R F×T are computed as where σ denotes the sigmoid function, and denotes the Hadamard product operation. W f ∈ R 1×T , and S T is the matrix transpose of S .

Time-Feature Attention-Based Convolutional Autoencoder (TFA-CAE)
An AE network is a powerful nonlinear model structure for data reduction and feature extraction. For feature extraction, the AE is built and trained with the goal of representing meaningful attributes of the original data input with the latent space representation as much as possible. Once training is completed, the encoder part can be used as a powerful automatic feature extractor. The architecture of the AE network model, such as convolutional neural network AEs (CNN-AEs) and recurrent neural network AEs (RNN-AEs), is flexible and diverse, and its choice is dependent on the nature of the data. Previously, RNN-based models were considered and preferred for sequential data. However, recent studies have shown that CNN-based models perform better than general RNN-based models [43]. The main advantage of CNNs is their ability to extract complicated hidden features from high-dimensional data with complex structures [44], and they can also be adopted to extract features from sequential data.
In this paper, we construct a TFA-CAE network model for feature extraction of flight data. Figure 7 shows the structure and parameters of TFA-CAE, in which the TFAM is placed in front of the AE. An original QAR time series is first passed through the timefeature module to generate a refined time series. Within the encoder, multi-convolutional and max-pooled layers are stacked on the refined time series to extract hierarchical features, and all units in the last convolutional layer are then flattened to form a vector followed by two fully connected layers, which are called embedded layers. As the second fully connected layer has two units, the input QAR time series is thus transformed into a twodimensional feature space (latent space). Designed as a symmetrical form, the decoder then reconstructs the original QAR sequence using the features in the latent space. AEs), is flexible and diverse, and its choice is dependent on the nature of the data. Previously, RNN-based models were considered and preferred for sequential data. However, recent studies have shown that CNN-based models perform better than general RNNbased models [43]. The main advantage of CNNs is their ability to extract complicated hidden features from high-dimensional data with complex structures [44], and they can also be adopted to extract features from sequential data.
In this paper, we construct a TFA-CAE network model for feature extraction of flight data. Figure 7 shows the structure and parameters of TFA-CAE, in which the TFAM is placed in front of the AE. An original QAR time series is first passed through the timefeature module to generate a refined time series. Within the encoder, multi-convolutional and max-pooled layers are stacked on the refined time series to extract hierarchical features, and all units in the last convolutional layer are then flattened to form a vector followed by two fully connected layers, which are called embedded layers. As the second fully connected layer has two units, the input QAR time series is thus transformed into a two-dimensional feature space (latent space). Designed as a symmetrical form, the decoder then reconstructs the original QAR sequence using the features in the latent space.
In the training of TFA-CAE, indexes of each max-pooling layer within the encoder are fed into the symmetrical unmax-pooling layers of the decoder. The error between the original QAR input and the reconstructed output is backpropagated to optimize the parameters of the model. Because our model is trained under the assumption that a small amount of anomalous data exists in the dataset, we use the Huber loss function [45], which is less sensitive to anomalies, to reduce the distortion of the model by anomalous flights.

HDBSCAN
Clustering is considered to play a significant role in unsupervised learning and deals with the data structure partition in unknown areas. In data science, clustering is used to find patterns or groupings in large datasets and is a general technique with broad applicability across scientific domains. For anomaly detection, objects deviating from patterns or far away from groupings are considered anomalies. As a classic density-based algorithm, density-based spatial clustering of applications with noise (DBSCAN) [46] defines the density at which the number of objects within a neighborhood of a certain radius must In the training of TFA-CAE, indexes of each max-pooling layer within the encoder are fed into the symmetrical unmax-pooling layers of the decoder. The error between the original QAR input and the reconstructed output is backpropagated to optimize the parameters of the model. Because our model is trained under the assumption that a small amount of anomalous data exists in the dataset, we use the Huber loss function [45], which is less sensitive to anomalies, to reduce the distortion of the model by anomalous flights.

HDBSCAN
Clustering is considered to play a significant role in unsupervised learning and deals with the data structure partition in unknown areas. In data science, clustering is used to find patterns or groupings in large datasets and is a general technique with broad Aerospace 2022, 9, 329 9 of 19 applicability across scientific domains. For anomaly detection, objects deviating from patterns or far away from groupings are considered anomalies. As a classic density-based algorithm, density-based spatial clustering of applications with noise (DBSCAN) [46] defines the density at which the number of objects within a neighborhood of a certain radius must reach the specified size (minPts). The size of the radius is specified by the distance threshold parameter (epsilon). All objects within the connected subsets that fulfill this density threshold are regarded as clusters, while the others are discarded as outliers. This method can detect clusters of different shapes and does not need to specify the number of clusters in advance. However, the major weakness of DBSCAN is that its epsilon parameter serves as a globe density threshold and, therefore, it cannot discover clusters of variable densities [47]. To overcome this problem, many variations of DBSCAN, such as HDBSCAN [31], OPTICS [48], AUTO-HDS [49], and DECODE [50], have been proposed. HDBSCAN has been shown to outperform both AUTO-HDS and the combination of OPTICS with Sander et al.'s [51] cluster extraction method [52]. The main implementation steps of HDBSCAN are shown as follows [53]: (1) Transform the space. In the HDBSCAN algorithm, a new distance metric between points called the mutual reachability distance is first defined to spread apart points with low densities: where d mreach−k x p , x q is the mutual reachability distance; core k x p and core k x q are the core distances defined for parameter k for points x p and x q , respectively; and d x p , x q is the original metric distance between x p and x q . With this metric, points with high density maintain the same distance from each other, while the sparse points are separated from other points by at least their core distance. (2) Build the minimum spanning tree. The dataset is then represented by a mutual reachability graph with the data points as vertices and a weighted edge between any two points with weights equal to the mutual reachability distances of those points. A minimum spanning tree of the graph, in which all the weights of all edges are the smallest, is constructed so that removing any edges of the tree will split the graph. The minimum spanning tree can be built quickly and efficiently with Prim's algorithm [54]. (3) Build the cluster hierarchy. Given the above minimum spanning tree, the next step is to convert it into a hierarchy of connected components. This is most easily done in the reverse order: sort the edges of the tree by the distances in increasing order and then iterate through them, creating a new merged cluster for each edge. The key here is to identify the two clusters for each edge to join together, which is easily achieved with a union-find data structure [55]. (4) Condense the cluster tree. This step condenses down the large and complicated cluster hierarchy into a smaller tree with slightly more data attached to each node. As the most important parameter of HDBSCAN, min_cluster_size is needed to make this concrete. Starting from the root, when each cluster is split, the samples of subclusters with sizes less than min_cluster_size are detected and marked as "outliers". If all the subclusters contain fewer than min_cluster_size objects, the cluster is considered to have disappeared at this density level. After walking through the whole hierarchy and completing this process, a tree with a small number of nodes, the condensed cluster tree, is obtained. (5) Extract the clusters. HDBSCAN uses λ/distance to measure the persistence of clusters and introduces the stability indicator. For each cluster, the stability is computed as: λ birth : the lambda value when the cluster is formed; λ death : the lambda value when the cluster is split into two subclusters; λ p : the lambda value when point p falls out of the cluster. where λ birth < λ p < λ death and s cluster is the stability of the cluster. As a solution to cluster extraction, HDBSCAN's selection algorithm traverses the condensed cluster tree from bottom to top and selects the cluster with the highest stability on each path.
The HDBSCAN class has a large number of parameters that can be set during initialization, but in practice, few parameters have a significant practical impact on clustering. min_cluster_size (the minimum size of clusters) and min_samples (the number of samples in a neighborhood for a point to be considered a core point) are the two main parameters that affect the anomaly detection results. Increasing min_cluster_size will reduce the number of clusters. The larger min_samples is, the more points there are that are considered outliers, and clusters will be restricted to denser areas.

Experimental Data
Flight landing data from Kunming Changshui International Airport (ICAO: ZPPP, hereafter) and Chengdu Shuangliu International Airport (ICAO: ZUUU, hereafter) in 2018 are taken as our experimental data in this paper. The dataset contains 14,195 flights, and the data are extracted as described in Section 2.2 and standardized by min-max normalization after the absolute operation for negative parameters. We split the dataset into a training set and a validation set. The training dataset is used to train the model, while the validation dataset is used to determine when to stop the training of the model. The details of the two datasets are shown in Table 2. Table 2. The details of the two datasets.

Name of Dataset Size of Dataset
Training set 9936 Validation set 4259

Model Training
In the experiment of this article, a GRU-based AE (GRU-AE) and a self-attentionbased [52] convolutional AE (SA-CAE) are also constructed for comparison with the TFA-CAE. All the models are built and trained using the PyTorch deep learning framework version 1.10. In the network training, early stopping is simultaneously adopted with the patience set to 15, which means model training stops when the loss error of the validation set no longer decreases after 15 epochs. In addition, the "Adam" optimizer is also used to build both models. Table 3 shows the comparison of the average loss values of the models. Because the decoder reconstructs the original QAR time series with the latent space representation mapped by the encoder, a smaller loss error between the original QAR input and the reconstructed outputs indicates a better feature representation of the input. Therefore, TFA-CAE is able to extract more representative features of the flight due to its smaller average loss error, as demonstrated in Table 3.

Visualization of the Extracted Flight Features
As the size of the latent representation is set to two, we are able to visualize the data of the extracted flight features. Figure 8 shows the visualization results of the flight features extracted by each model separately, as well as the PCA. By comparison, the flight features extracted by TFA-CAE are divided into two clusters (shown in Figure 8a), while the others show an irregular pattern of distribution (shown in Figure 8c,e,g). Flight feature objects extracted by TFA-CAE in the same cluster are more tightly aggregated together, while the two clusters are more clearly separated. Considering the fact that an aircraft landing at different airports follows the specific landing procedure for each airport, these two clusters are supposed to indicate the two kinds of landing patterns of the ZPPP and ZUUU airports. To verify the two landing patterns, we further divide the extracted flight features by these two airports separately. The division results of the flight features extracted by each model are separately shown in Figure 8b,d,f,h. In Figure 8b, the two clusters of flight features extracted by the TFA-CAE are consistent with the division according to the airports. For the other division results, the features of these two categories show an intertwined distribution. Based on the above, we summarize that the TFA-CAE model is able to more accurately extract features of flight and further discover different flight patterns, which provides a better feature basis for anomaly detection and analyses of anomalous characteristics.
Aerospace 2022, 9,329 11 of 20 the others show an irregular pattern of distribution (shown in Figure 8c,e,g). Flight feature objects extracted by TFA-CAE in the same cluster are more tightly aggregated together, while the two clusters are more clearly separated. Considering the fact that an aircraft landing at different airports follows the specific landing procedure for each airport, these two clusters are supposed to indicate the two kinds of landing patterns of the ZPPP and ZUUU airports. To verify the two landing patterns, we further divide the extracted flight features by these two airports separately. The division results of the flight features extracted by each model are separately shown in Figure 8b,d,f,h. In Figure 8b, the two clusters of flight features extracted by the TFA-CAE are consistent with the division according to the airports. For the other division results, the features of these two categories show an intertwined distribution. Based on the above, we summarize that the TFA-CAE model is able to more accurately extract features of flight and further discover different flight patterns, which provides a better feature basis for anomaly detection and analyses of anomalous characteristics.

Result of HDBSCAN Clustering
Flight features extracted by TFA-CAE are fed into the HBSCAN cluster algorithm for flight anomaly detection. From Section 2.4, we know that the input parameters of HDB-SCAN are min_cluster_size and min_samples. Therefore, the two parameters are used to determine the results of clustering and need to be specified artificially. To ensure the objectivity of clustering by minimizing manual intervention, we introduce a quantitative measure called the density-based clustering validation (DBCV) index [56] to evaluate and then determine the final clustering result. The index assesses the clustering quality based on the relative density connection between pairs of objects and is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within-and between-cluster density connectedness of the clustering results [57]. Unlike other metrics, such as the silhouette coefficient (SC) [58] and the Davies-Bouldin index (DBI) [59], DBCV takes noise into account and captures the shape property of clusters via densities but not distances and works for density-based clustering

Result of HDBSCAN Clustering
Flight features extracted by TFA-CAE are fed into the HBSCAN cluster algorithm for flight anomaly detection. From Section 2.4, we know that the input parameters of HDBSCAN are min_cluster_size and min_samples. Therefore, the two parameters are used to determine the results of clustering and need to be specified artificially. To ensure the objectivity of clustering by minimizing manual intervention, we introduce a quantitative measure called the density-based clustering validation (DBCV) index [56] to evaluate and then determine the final clustering result. The index assesses the clustering quality based on the relative density connection between pairs of objects and is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within-and between-cluster density connectedness of the clustering results [57]. Unlike other metrics, such as the silhouette coefficient (SC) [58] and the Davies-Bouldin index (DBI) [59], DBCV takes noise into account and captures the shape property of clusters via densities but not distances and works for density-based clustering algorithms precisely. As explained in their paper, DBCV produces a score between −1 and 1, with the larger the value indicating a better clustering solution.
The optimization goal of HDBSCAN is to find the optimal min_cluster_size and min_samples values that yield the maximum DBCV score. Figure 9 illustrates the process of selecting these two parameters. To avoid a loss of generality, we first fixed min_samples to the default setting and iterated min_cluster_size from 2 to 200 while calculating the number of clusters, as shown in Figure 9a. The number of clusters remained at 2 when min_cluster_size was 26 or greater. Then, we fixed min_cluster_size = 26 and iterated min_samples from 2 to 500 to obtain the maximum DBCV score, as shown in Figure 9b. Finally, we chose min_cluster_size = 26 and min_samples = 499 for our case study.
Aerospace 2022, 9,329 13 of 20 algorithms precisely. As explained in their paper, DBCV produces a score between -1 and 1, with the larger the value indicating a better clustering solution.
The optimization goal of HDBSCAN is to find the optimal min_cluster_size and min_samples values that yield the maximum DBCV score. Figure 9 illustrates the process of selecting these two parameters. To avoid a loss of generality, we first fixed min_samples to the default setting and iterated min_cluster_size from 2 to 200 while calculating the number of clusters, as shown in Figure 9a. The number of clusters remained at 2 when min_cluster_size was 26 or greater. Then, we fixed min_cluster_size = 26 and iterated min_samples from 2 to 500 to obtain the maximum DBCV score, as shown in Figure 9b. Finally, we chose min_cluster_size = 26 and min_samples = 499 for our case study. The result of HDBSCAN clustering of flight features is shown in Figure 10a, in which the red points indicate outliers, while the blue and green points represent two clusters, and , respectively. In the case study, a total of 825 outliers were detected. Furthermore, we matched the clustering result of the extracted flight features with the two airports, as shown in Figure 10b. In detail, is a cluster of flights that represents the common landing pattern of the ZPPP airport, while represents the landing pattern of the ZUUU airport. A total of 553 anomalous flights (red points) landed at the ZPPP airport, and 272 anomalous flights (purple points) landed at the ZUUU airport. and are two aggregations of anomalous flights that deviate far from the landing pattern of the ZPPP airport; in each cluster, the flights have similar anomalous characteristics. The point represents a flight with the farthest deviation from the common pattern of the ZPPP airport. One drawback is that six flights (orange points) at the ZUUU airport are clustered into the pattern of the ZPPP airport. The result of HDBSCAN clustering of flight features is shown in Figure 10a, in which the red points indicate outliers, while the blue and green points represent two clusters, C 1 and C 2 , respectively. In the case study, a total of 825 outliers were detected. Furthermore, we matched the clustering result of the extracted flight features with the two airports, as shown in Figure 10b. In detail, C 1 is a cluster of flights that represents the common landing pattern of the ZPPP airport, while C 2 represents the landing pattern of the ZUUU airport. A total of 553 anomalous flights (red points) landed at the ZPPP airport, and 272 anomalous flights (purple points) landed at the ZUUU airport. O 1 and O 2 are two aggregations of anomalous flights that deviate far from the landing pattern of the ZPPP airport; in each cluster, the flights have similar anomalous characteristics. The point P 1 represents a flight with the farthest deviation from the common pattern of the ZPPP airport. One drawback is that six flights (orange points) at the ZUUU airport are clustered into the pattern of the ZPPP airport.

Anomalous Flights Detected during the Landing Phase
In this section, the characteristics of anomalous flights with operational significance for each airport are further analyzed in detail. As mentioned above, an aircraft landing at a certain airport will follow the specific landing procedure for that airport, and an anomalous flight landing at a given airport may be normal according to the flight procedure of another airport. Therefore, the assessment and analysis of anomalous flights should be performed in the airport where a flight lands. For this, all the flights are first divided according to the two airports, and the example anomalous flights of each airport are separately presented in detail.
For each example, the most distinctive flight parameters are presented in graphs with the same format, where the red lines denote anomalous flight patterns, and the light green band depicts the common flight landing pattern of each airport (the normal part in the clustering result).

Examples of Anomalous Flights That Landed at the ZPPP Airport
In this section, flights within the and collections that landed at the ZPPP airport are taken as examples to explain in detail the characteristics of the anomalous flights and the flight marked with point . For the collection, in which all the flights have the same unusual behavior, a flight is plotted as a representative to show the details of the collection anomaly, as shown in Figure 11a. To be specific, is a collection of anomalous flights with flight altitude error records. The calibrated radar altitudes of all the flights were recorded as 0 throughout the landing phase. Figure 11b shows the anomalous flight marked with point , which is the furthest deviation from the landing pattern at the ZPPP airport. During the entire landing phase of the flight, the lateral acceleration G-Force was much lower than the common pattern, with values below −0.5.

Anomalous Flights Detected during the Landing Phase
In this section, the characteristics of anomalous flights with operational significance for each airport are further analyzed in detail. As mentioned above, an aircraft landing at a certain airport will follow the specific landing procedure for that airport, and an anomalous flight landing at a given airport may be normal according to the flight procedure of another airport. Therefore, the assessment and analysis of anomalous flights should be performed in the airport where a flight lands. For this, all the flights are first divided according to the two airports, and the example anomalous flights of each airport are separately presented in detail.
For each example, the most distinctive flight parameters are presented in graphs with the same format, where the red lines denote anomalous flight patterns, and the light green band depicts the common flight landing pattern of each airport (the normal part in the clustering result).

Examples of Anomalous Flights That Landed at the ZPPP Airport
In this section, flights within the O 1 and O 2 collections that landed at the ZPPP airport are taken as examples to explain in detail the characteristics of the anomalous flights and the flight marked with point P 1 . For the O 1 collection, in which all the flights have the same unusual behavior, a flight is plotted as a representative to show the details of the collection anomaly, as shown in Figure 11a. To be specific, O 1 is a collection of anomalous flights with flight altitude error records. The calibrated radar altitudes of all the flights were recorded as 0 throughout the landing phase. Figure 11b shows the anomalous flight marked with point P 1 , which is the furthest deviation from the landing pattern at the ZPPP airport. During the entire landing phase of the flight, the lateral acceleration G-Force was much lower than the common pattern, with values below −0.5.
The two flights within anomaly collection O 2 have extremely similar anomalous behavior as presented in Figure 12. They had been landing with a lower power; more specifically, the longitudinal acceleration G-forces of the two flights were much lower than the common pattern with values below −0.4 during the entire landing phase. The two flights within anomaly collection have extremely similar anomalous behavior as presented in Figure 12. They had been landing with a lower power; more specifically, the longitudinal acceleration G-forces of the two flights were much lower than the common pattern with values below −0.4 during the entire landing phase.

Examples of Anomalous Flights That Landed at the ZUUU Airport
Two examples of anomalous flights that landed at the ZUUU airport are separately displayed in Figures 13 and 14. Figure 13 shows a flight landing with a higher elevation and a larger longitudinal acceleration. The pitch was always much larger than the common elevation angle, although it returned to normal between approximately 8 and 5 s before touchdown. From approximately 80 s before touchdown, the longitudinal acceleration was greater than the common pattern until approximately 10 s before touchdown; it briefly returned to the common range from approximately 10 s to approximately 5 s before touchdown but then increased above the normal pattern until touchdown.  The two flights within anomaly collection have extremely similar anomalous behavior as presented in Figure 12. They had been landing with a lower power; more specifically, the longitudinal acceleration G-forces of the two flights were much lower than the common pattern with values below −0.4 during the entire landing phase.

Examples of Anomalous Flights That Landed at the ZUUU Airport
Two examples of anomalous flights that landed at the ZUUU airport are separately displayed in Figures 13 and 14. Figure 13 shows a flight landing with a higher elevation and a larger longitudinal acceleration. The pitch was always much larger than the common elevation angle, although it returned to normal between approximately 8 and 5 s before touchdown. From approximately 80 s before touchdown, the longitudinal acceleration was greater than the common pattern until approximately 10 s before touchdown; it briefly returned to the common range from approximately 10 s to approximately 5 s before touchdown but then increased above the normal pattern until touchdown.

Examples of Anomalous Flights That Landed at the ZUUU Airport
Two examples of anomalous flights that landed at the ZUUU airport are separately displayed in Figures 13 and 14. Figure 13 shows a flight landing with a higher elevation and a larger longitudinal acceleration. The pitch was always much larger than the common elevation angle, although it returned to normal between approximately 8 and 5 s before touchdown. From approximately 80 s before touchdown, the longitudinal acceleration was greater than the common pattern until approximately 10 s before touchdown; it briefly returned to the common range from approximately 10 s to approximately 5 s before touchdown but then increased above the normal pattern until touchdown. As opposed to the flight mentioned above, a flight landing with a fluctuating attitude and dynamics that are lower than the common pattern is shown in Figure 14. To be specific, the AOA, PITCH, LONG, and PITCH_RATE fluctuated throughout the landing phase, with PITCH_RATE being the most pronounced. Separately, the angle of attack was lower than the common pattern from approximately 75 s to approximately 45 s before touchdown and fluctuated back and forth at the lower limit of the normal range during the subsequent landing. As the result of the larger change rate, the flight pitched up and down with a larger range in the two time periods of 85 to 65 s and 45 s to touchdown. Moreover, the flight landed with the head of the aircraft below the horizontal line from 70 to 45 s before touchdown because its angle of pitch was less than 0. In addition, the longitudinal acceleration was always smaller than the common pattern from approximately 62 s to approximately 42 s before touchdown.   As opposed to the flight mentioned above, a flight landing with a fluctuating attitude and dynamics that are lower than the common pattern is shown in Figure 14. To be specific, the AOA, PITCH, LONG, and PITCH_RATE fluctuated throughout the landing phase, with PITCH_RATE being the most pronounced. Separately, the angle of attack was lower than the common pattern from approximately 75 s to approximately 45 s before touchdown and fluctuated back and forth at the lower limit of the normal range during the subsequent landing. As the result of the larger change rate, the flight pitched up and down with a larger range in the two time periods of 85 to 65 s and 45 s to touchdown. Moreover, the flight landed with the head of the aircraft below the horizontal line from 70 to 45 s before touchdown because its angle of pitch was less than 0. In addition, the longitudinal acceleration was always smaller than the common pattern from approximately 62 s to approximately 42 s before touchdown.

Conclusions
In the field of civil aviation, safety management has shifted from post-event investigation and analysis toward advanced warning with the aim of effectively preventing potential accidents before they occur. For this, civil aviation strives to innovate and proactively identify operationally significant safety events that are untracked and then implement risk mitigation in the form of revising safety requirements to address the newly identified vulnerability. To improve and automate the identification of unknown vulnerabilities in flight operations, we proposed a time-feature attention mechanism that focuses on the key parameters and times and constructed a hybrid model for identifying operationally significant anomalies. The hybrid model combines TFA-CAE, which extracts flight features, and an HDBSCAN clustering algorithm to automatically detect anomalies based on the extracted features. The time-feature attention mechanism is demonstrated to improve the performance of flight feature extraction by comparing the TFA-CAE with SA-CAE (self-attention-based CAE), GRU-AE and PCA. In the subsequent detection and analysis of anomalies, flight patterns that are consistent with those at each airport are discovered, resulting in the anomalies being interpreted according to the corresponding pattern. Moreover, the hybrid model can also discover the aggregation of anomalous flights with similar unusual behaviors.
Future work: In this study, the flight patterns at the airport level and the corresponding anomalous flights were well discovered and detected, respectively; the next steps will potentially focus on developing an architecture to further discover the flight patterns at the runway level and detect the corresponding anomalous flights for airports with multiple runways. In addition, the automatic capture and classification of the same anomaly characteristics is another area of research that can enable better-targeted risk prevention.