Air Traffic Flow Management Delay Prediction Based on Feature Extraction and an Optimization Algorithm

Abstract: Air Traffic Flow Management (ATFM) delay can quantitatively reflect the congestion caused by the imbalance between capacity and demand in an airspace network. Furthermore, it is an important parameter for the ex-post analysis of airspace congestion and of the effectiveness of ATFM strategy implementation. If ATFM delays can be predicted in advance, the predictability and effectiveness of ATFM strategies can be improved. In this paper, a short-term ATFM delay regression prediction method is proposed to address the multi-source, high-dimensional, and complex nature of ATFM delay prediction data. The method first constructs an ATFM delay prediction network model, specifies the prediction object


Introduction
It is difficult to match the continuous growth in air transportation demand with improvements in the support capabilities and management levels of airports and airspace networks. ATFM is facing unprecedented challenges; moreover, China's ATFM system is in its initial stage, and its implementation has not yet met expectations. As a key index for evaluating the implementation effect of ATFM strategies, ATFM delay is very important for reducing the number of flight delays and improving the efficiency of an ATFM system. Therefore, this paper focuses on ATFM delay prediction and carries it out from a network perspective, so as to grasp delay trends and bottleneck points in the system more accurately and to enable the ATFM department to anticipate possible delay situations and adjust the ATFM strategy to reduce potential delay losses.
An ATFM delay is the time difference between the Target Take-Off Time (TTOT) requested by the aircraft operator and the Calculated Take-Off Time (CTOT) first assigned by the ATFM function [1]. The capacity of the nodes (airports or waypoints) changes dynamically for various reasons, such as weather and other airspace users. When there is a mismatch between node capacity and demand, the ATFM sector applies a traffic management strategy that takes the TTOT as input and calculates and assigns a CTOT to each flight subject to traffic control. ATFM delays reflect flight delays due to airspace cell capacity constraints that result from the adoption of ATFM strategies. The International Civil Aviation Organization (ICAO), in its ATFM manual, suggests an average ATFM delay of 1 min for each flight en route [2]. The European Organization for the Safety of Air Navigation (EUROCONTROL) suggests an average ATFM delay of 0.5 min [3]. Therefore, predictions of ATFM delay durations and the number of delayed flights can provide a decision-making reference for the selection and application of ATFM strategies.
ATFM delay prediction research mainly covers two aspects: ATFM delay causes and applications, and ATFM delay prediction models. In the research on ATFM delay causes and applications, Delgado et al. first divided ATFM delay into ground delay and airborne delay, and they proposed a deceleration strategy so that airborne ATFM delays could replace a small portion of ground ATFM delays [4]. At present, many studies have been carried out at home and abroad in the field of airport surface operation optimization, mainly focusing on airport surface traffic operation modeling, airport surface performance index analysis, and airport surface resource optimization scheduling. To assess the validity of existing ATFM regulations, Delgado et al. integrated historical data from the past five years and concluded that ATFM delays caused by airspace capacity account for 50-60% of the total number of ATFM delays; they also concluded that airspace capacity constraints are mainly due to air traffic control capacity and staffing issues. In addition, the currently available airspace capacity is lower than the expected traffic demand [5]. Bolic et al. optimized flight plans by shifting the times of the flights causing ATFM delays [6]. Post et al. used Bayesian networks to evaluate the operating conditions that increase the probability of an airport ATFM delay, and the results showed that the predicted arrival congestion index and the actual arrival congestion index were the indicators with the highest impact on airport ATFM delays [7]. Ramon et al. proposed three indicators to predict the evolution of ATFM delays: the expectation of an actual ATFM delay, the probability distribution of an ATFM delay, and the trend of ATFM delays [8]. Sergi et al. categorized the causes of ATFM delays into airport traffic, airport capacity, network capacity, and operating slots. They also proposed two classification models to predict the probability of ATFM delay occurrence and the delay causes [9].
In addition, space weather events are also an important factor affecting ATFM delays. Space weather refers to the process by which the sun disturbs the space environment, the geomagnetic field, and the Earth's ionosphere and thermosphere. Wiliams [10] and James [11] et al. have carried out a number of studies on coronal mass ejections (CMEs), showing that CMEs are the largest rapid ejection phenomenon in the solar atmosphere and the main source of space weather disturbances. These disturbances may affect the high-frequency radio wave communications used by the aviation industry [12], affect the normal operation of global navigation satellite systems [13,14], and even cause increased radiation that endangers the health of flight crews and passengers [15,16]. When encountering unusual space weather, airlines respond to these threats with measures such as cancelling flight plans, lowering flight altitudes, and changing flight routes, thereby incurring additional fuel consumption [15,16]. In addition, when space weather disrupts the normal operation of satellite navigation, aircraft must use ground navigation instead, which leads to larger required aircraft separation and lower airspace capacity, resulting in increased flight delays, increased costs, and other problems [12][13][14]. To deal with the effects of unusual space weather, Robyn et al. examined the moderate and severe thresholds adopted to identify events where space weather is likely to affect high-frequency radio communication and evaluated the frequency and duration of such events [12]. Xue et al. simulated a satellite navigation failure scenario and evaluated the potential economic impact of upcoming space weather on flight operations from the ATFM perspective [13]. Xue et al. created a hypothetical scenario by simulating the predicted flight data of Hong Kong International Airport during a geomagnetic storm to explore the impact of GNSS positioning error on ATFM [14]. Xue et al. proposed a multi-objective optimization model to assign flight altitude and speed [15]. Hands developed a new model to predict the effects of the airborne radiation environment and to provide real-time information about atmospheric radiation [16].
In the research on ATFM delay prediction models, traditional machine learning can no longer meet the needs of complex, large-volume prediction tasks that require high-precision ATFM delay regression, and scholars prefer to use deep learning and its variants. The concept of deep learning originates from research on artificial neural networks. Multi-layer perceptrons with multiple hidden layers are one kind of deep learning structure. Aiming at the low computational efficiency and numerous parameters of deep learning algorithms, Jingyi Qu et al. have successively proposed an airport delay prediction model based on regional residuals and an LSTM network [17], a flight delay prediction model based on the spatiotemporal sequence of Conv-LSTM [18], and a flight delay prediction model based on MobileNetV2 [19]. In addition, Jingyi Qu et al. The results showed that air traffic regional control centers are one of the main influencing factors [21]. Chen et al. developed a deep residual neural network (ResNet) for nonlinear functional regression, replacing convolutional and pooling layers with a fully connected layer so that the deep residual network can achieve high-precision prediction for complex nonlinear regression problems [22]. Qu et al. proposed two flight delay prediction models based on meteorological data, namely the DCNN model and the SE-DenseNet model. In the DCNN model, both a linear channel and a convolutional channel are designed to enhance the patency of the deep network. In the SE-DenseNet model, an SE module is added after the convolution layer of each DenseNet block to realize feature recalibration in the feature extraction process [23]. Chen et al. extended the traditional idea of the FC-LSTM network to the Conv-LSTM network and used the Conv-LSTM network to extract spatial and temporal features to achieve short-term prediction of delay in the network structure [24]. Micha et al. combined a mixture density network with the random forest algorithm to produce probabilistic flight delay predictions and integrated these predictions into the flight gate allocation problem, improving the robustness of gate allocation [25]. Hu et al. proposed a traffic flow prediction model based on a multi-attention mechanism Spatiotemporal Graph Convolution network to realize dynamic adjustment of spatiotemporal features [26]. Ma et al. proposed a traffic flow prediction method based on a multi-head self-attention mechanism spatiotemporal Infographic Convolutional network [27]. Aiming at the problem that the DenseNet model loses the basic information obtained from independent input features, Jiang et al. proposed an improved DenseNet regression model in which the convolution and pooling layers are replaced by fully connected layers and the original shortcut connections are maintained to reuse features [28]. Sergi et al. proposed an RNN-CNN cascade architecture to predict en-route traffic capacity [29]. Jiang Yu et al. regularized the airport network graph structure by means of spectral convolution, used GCN and GLU to capture the spatiotemporal correlation in the network and form spatiotemporal convolution blocks, and proposed a flight delay prediction model based on a spatiotemporal graph convolution neural network [30]. Wu Chen et al. obtained the dynamic characteristics of the airport ground support process by using a Petri Net model and integrated the CNN, LSTM, and ATT algorithms to propose a CNN-LSTM-ATT flight delay prediction model [31]. Deep learning is widely used in the field of transportation and performs very well in delay prediction, but it suffers from numerous and complex parameters and a high dependence on raw data. At the same time, according to the classification of ATFM delay causes by EUROCONTROL [32], ATFM delay data are characterized by multiple data sources, multiple variables, and imbalanced data, which makes prediction more difficult.
A summary of ATFM delay application and prediction methods is shown in Tables 1 and 2. The ATFM system in China did not start operation until May 2021. Compared to regions with developed aviation systems, such as Europe and the United States, there is still a research gap in the study of ATFM delays in China. Further research is needed, particularly in the areas of prediction and post-analysis. In the face of increasingly saturated airspace resources, in-depth research on ATFM delay indicators is crucial to reduce delays caused by current capacity shortages and to provide references and preparations for effectively managing available capacity. Additionally, existing ATFM delay prediction algorithms primarily rely on traditional machine learning and deep learning. Traditional machine learning methods have low prediction accuracy, while deep learning methods perform well in delay prediction but suffer from numerous and complex parameters and a high dependency on raw data. Moreover, ATFM delay prediction datasets exhibit characteristics such as multiple data sources, multiple variables, and imbalanced data, which make prediction more difficult. Algorithm optimization or the use of joint algorithms is therefore necessary to achieve accurate ATFM delay prediction. To obtain reliable and high-precision ATFM delay prediction results, this paper combines feature extraction algorithms, a deep learning prediction model, and a parameter optimization algorithm, and proposes two ATFM delay prediction models with higher robustness, which can achieve short-term prediction of ATFM delay duration and the number of delayed flights in the tactical stage.

ATFM Delay Prediction Method Design

ATFM Delay Prediction Network Model
This paper determines whether an ATFM delay occurs on a flight according to the difference between the CTOT and the TTOT of the flight. The calculation method is shown in Equation (1).

$$\text{ATFM delay} = \text{CTOT} - \text{TTOT} \quad (1)$$

The mean value calculation method of ATFM delay is shown as follows. According to this method, the ATFM delay of departure, the ATFM delay of arrival, the ATFM delay of an airport, and other dimensions can be calculated.

$$\text{Average ATFM delay} = \frac{\text{Total ATFM delay}}{\text{Total flight volume}}$$

The average ATFM delay per unit time is:

$$\text{Average ATFM delay per unit time} = \frac{1}{N} \sum_{i=1}^{N} D_i$$

where $N$ indicates the total number of flights per unit of time and $D_i$ indicates the delay coefficient of the $i$th flight.
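The delay definition and mean value calculation above can be illustrated with a minimal Python sketch. The function names and the three sample flights are hypothetical and serve only to make the arithmetic concrete; the delay is taken as CTOT minus TTOT in minutes, following the definition given in the Introduction.

```python
from datetime import datetime


def atfm_delay_minutes(ttot: datetime, ctot: datetime) -> float:
    """ATFM delay of one flight in minutes: CTOT minus TTOT."""
    return (ctot - ttot).total_seconds() / 60.0


def average_atfm_delay(delays: list[float]) -> float:
    """Total ATFM delay divided by total flight volume."""
    if not delays:
        return 0.0
    return sum(delays) / len(delays)


# Hypothetical (TTOT, CTOT) pairs for three flights in one unit of time.
flights = [
    (datetime(2023, 1, 1, 8, 0), datetime(2023, 1, 1, 8, 0)),    # no delay
    (datetime(2023, 1, 1, 8, 10), datetime(2023, 1, 1, 8, 25)),  # 15 min
    (datetime(2023, 1, 1, 8, 20), datetime(2023, 1, 1, 8, 50)),  # 30 min
]

delays = [atfm_delay_minutes(ttot, ctot) for ttot, ctot in flights]
mean_delay = average_atfm_delay(delays)          # (0 + 15 + 30) / 3
delayed_count = sum(1 for d in delays if d > 0)  # number of delayed flights
```

The same two quantities, mean delay and number of delayed flights, are the prediction targets named in the Introduction.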
Flights affected by congestion nodes generate ATFM delays due to flow control. The location where an ATFM delay occurs can be the departure airport, the destination airport, a waypoint on the route, etc. In this paper, we do not take the ATFM delay of an individual flight or airport as the prediction object; instead, we predict the ATFM delay duration and the delayed flight volume from a systematic point of view.
According to the basic concept of a network graph and the ATFM delay generation process, the dynamic ATFM delay prediction network graph is constructed by integrating time information, with the airports and waypoints as nodes and the routes as edges. The ATFM delay prediction network graph G can be expressed as G = (V, E, T), where V denotes the set of nodes; E is the set of edges; and T is the set of time points, an ordered time sequence that represents the time points in the dynamic network graph. (V1, V2) denotes a directed edge from node V1 to node V2. In order to simplify the network graph, two key waypoints are selected as nodes in the network graph for each route. Following the running direction of the route, the departure airport node, the two key waypoints, and the destination airport node are connected in turn, and the connecting line constitutes a complete directed edge. A brief schematic of the ATFM delay prediction network is shown in Figure 1, in which the ATFM delays are predicted for the AC-edge, BD-edge, CA-edge, and DB-edge.
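One possible in-memory representation of G = (V, E, T) is sketched below. The class name `DelayNetwork`, the method `add_route`, the waypoint labels `R1` and `R2`, and the hourly time points are assumptions made for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class DelayNetwork:
    """Dynamic ATFM delay prediction network G = (V, E, T)."""
    nodes: set = field(default_factory=set)    # V: airports and key waypoints
    edges: dict = field(default_factory=dict)  # E: (origin, destination) -> node path
    times: list = field(default_factory=list)  # T: ordered time points

    def add_route(self, origin, waypoint_1, waypoint_2, destination):
        """Connect departure airport, two key waypoints, and destination in turn."""
        path = [origin, waypoint_1, waypoint_2, destination]
        self.nodes.update(path)
        self.edges[(origin, destination)] = path


# Hypothetical example: airports A and C linked in both directions.
net = DelayNetwork(times=["08:00", "09:00", "10:00"])
net.add_route("A", "R1", "R2", "C")  # AC-edge
net.add_route("C", "R2", "R1", "A")  # CA-edge
```

Keying each edge by its airport pair keeps one directed edge per route direction, which matches the AC-edge and CA-edge being predicted separately.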

ATFM Delay Prediction Process
There are two major steps in ATFM delay prediction research, as shown in Figure 2.

The first step is to build the ATFM delay prediction network and dataset, including the following three steps.
(1) Data collection and preprocessing: Weather forecast data, flow control release data, flight schedule data, and route data are collected and matched, and the corresponding ATFM delays are calculated. Meanwhile, variables with many outliers and missing values are eliminated to form the original temporal ATFM delay prediction dataset.
(2) Establish the ATFM delay prediction network model: Select the key elements in the network and construct the network model. According to the original prediction dataset, define the scope of the network and construct the ATFM delay prediction network diagram.
(3) Construct the ATFM delay prediction index system: When data are scarce and delay causes are unclear, mine the factors affecting ATFM delay from the perspectives of the departure airport, the destination airport, the airspace network, etc., to form a high-quality and diversified ATFM delay prediction dataset. This includes the mining of common flow control information, key node identification and flow statistics methods, and the calculation of dynamic weighted PageRank values of nodes.
Step 2 is the construction of the ATFM delay prediction model and example validation, including the following two steps.
(2) Instance validation: Four typical busy airports in East China and their main route points are selected as nodes of the ATFM delay prediction network for instance validation. The combinations of different models are tested for effectiveness, and the importance of prediction features and the prediction results are analyzed in depth.

ATFM Delay Prediction Index System
EUROCONTROL classifies ATFM delay causes into six main categories: route disturbance events, airport disturbance events, waypoint capacity, airport capacity, airport weather, and control staffing. Among them, route and airport disturbance events are the main causes of ATFM delay. Disturbance events are usually random and unavoidable. Therefore, when constructing the factors affecting ATFM delay, some new indicators should be proposed according to how the prediction network is constructed and which data can be collected.

Common Flow Control Information Mining
In China, the reasons for flow control are categorized into six main groups: public safety, flight schedules, airports, ATC, traffic, and other airspace users. There is a close relationship between flow control information and ATFM delays. Flow control information conveys the flow control measures and adjustments that are applied to flights experiencing ATFM delays. By analyzing the occurrence patterns and trends of historical flow control, some of the more stable flow control information can be mined and used as ATFM delay predictors to improve ATFM delay predictability.
Taking the flow control information received by the Shanghai Approach from 1 January 2023 to 20 June 2023 as an example, there are 9841 flow control messages in total. The statistics of the top ten historical flow control messages received are shown in Table 3. Among them, the flow control named Message-MIT-OVTAN was published 1256 times, accounting for 12.76% of the total number of releases, and the average duration of its flow control measures was 637 min. Each of the top ten flow control measures lasts more than 280 min, showing a pattern in flow control reasons and time distribution. Therefore, flow control content that is issued frequently and lasts a long time is selected as a key categorical index for ATFM delay prediction. The frequency statistics of controlled waypoints are shown in Figure 3, in which the OVTAN waypoint was controlled 4009 times, much more often than other waypoints, accounting for 7.05% of the total controlled waypoint frequency. In addition, of the 749 controlled waypoints, 115 were controlled more than 100 times, and these high-frequency controlled waypoints accounted for 85% of the total controlled waypoint frequency. Therefore, the waypoints controlled more than 100 times are regarded as common controlled waypoints and are used as ATFM delay predictors. Routes containing common controlled waypoints have a higher probability of generating ATFM delay.
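The selection of common controlled waypoints can be sketched as a simple frequency count with a threshold. The function name, the waypoint labels other than OVTAN, and the toy log below are hypothetical; only the "more than 100 instances" rule comes from the text.

```python
from collections import Counter


def common_controlled_waypoints(controlled, threshold=100):
    """Return waypoints controlled more than `threshold` times and their share
    of the total controlled waypoint frequency."""
    counts = Counter(controlled)
    common = {wp for wp, n in counts.items() if n > threshold}
    total = sum(counts.values())
    share = sum(counts[wp] for wp in common) / total if total else 0.0
    return common, share


# Toy log: one entry per time a waypoint appears in a flow control message.
log = ["OVTAN"] * 150 + ["WPT_B"] * 101 + ["WPT_C"] * 100 + ["WPT_D"] * 49
common, share = common_controlled_waypoints(log)
```

In this toy log, `WPT_C` is excluded because it was controlled exactly 100 times, not more than 100. A route could then carry a binary feature indicating whether it contains any waypoint in `common`.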

Key Node Identification and Flow Counting Method
There may be one or more routes between a city pair. In order to simplify the experiment, this paper selects the single most frequently used route between each city pair. Multiple waypoints exist on a route, and the higher the flow of a waypoint, the more likely the waypoint is to become a capacity bottleneck and thus generate ATFM delay. Based on historical data statistics, the top two waypoints account for approximately 20% of the total traffic. Therefore, these two waypoints are considered the key waypoints on the route. Usually, the key waypoints are the ones carrying high flow pressure or located at the intersection of multiple routes. The flow statistics for the key waypoints are shown in Figure 4.
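Picking the two key waypoints of a route amounts to ranking its waypoints by flow. The sketch below assumes a plain dictionary of per-waypoint flow counts; the function name and the sample numbers are hypothetical and do not reproduce the statistics reported above.

```python
def key_waypoints(flow_by_waypoint, k=2):
    """Return the k highest-flow waypoints on a route and their share of the flow."""
    ranked = sorted(flow_by_waypoint.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    total = sum(flow_by_waypoint.values())
    share = sum(flow for _, flow in top) / total if total else 0.0
    return [name for name, _ in top], share


# Hypothetical flow counts for the waypoints of one route.
route_flow = {"WPT_1": 120, "WPT_2": 80, "WPT_3": 300, "WPT_4": 260, "WPT_5": 240}
keys, share = key_waypoints(route_flow)
```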

Dynamic Weighted PageRank Calculation Method
The PageRank algorithm can be defined on any directed network graph and describes the behavior of a random walker visiting each node along the directed graph. Under certain conditions, the probability of visiting each node converges in the limit to a stationary distribution, and the value of this probability is the PageRank value, which indicates the importance of the node [33]. The more important a node (airport or key waypoint) is in the network, the higher the probability of ATFM delay, so the PageRank value of the nodes in the ATFM delay prediction network can be used as an indicator for ATFM delay prediction. The traditional PageRank algorithm distributes a node's rank equally among its outgoing edges, which is unreasonable here, so this paper uses a dynamic weighted PageRank algorithm to calculate the importance of the nodes in the network. The dynamic weighted PageRank value more accurately reflects the influence of edge weights and time on the importance of nodes.
(1) Dynamic weighted PageRank value calculation for airport nodes
The higher the waypoint flow on a route departing from an airport, the higher the likelihood that the route will be subject to flow control, and the more important the airport node is in the ATFM delay prediction network. According to the statistical process of key waypoints, two key waypoints are selected for the route between two airport nodes. These two key waypoints reflect how heavily the route passes through busy nodes, so the sum of the flows at the two key waypoints can be used as the weight in the weighted PageRank value calculation for airport nodes. The formula for calculating the dynamic weighted PageRank value of an airport node in the airport network is as follows:
where N is the total number of airport nodes in the airport network and the other symbols are defined as shown in Table 4.
Table 4. Formula symbol definition for dynamically weighted PageRank values.

| Symbol | Definition |
| --- | --- |
| PR_{t+1}(V_i) | PageRank value of node V_i at moment t + 1. |
| ∂ | Damping coefficient, a parameter that controls the probability of randomly visiting a node. |
| β | Attenuation factor that controls the effect of time. The value of β ranges from 0 to 1, indicating the degree of decline in the importance of the page. |
| PR_t(V_i), PR_t(V_j) | PageRank value of node V_i, node V_j at moment t. |
| Count(V_j) | The number of outgoing links of node V_j. |
| W(V_j, V_i) | The weights of node V_j and node V_i. |
| µ | Weight coefficient, ranging from 0 to 1, that controls the influence of the input weight on the PageRank value. |
| | At moment t, the flow at the two key waypoints R_1 and R_2 on a route with node V_j as the departure airport and node V_i as the destination airport. |
(2) Dynamic PageRank value calculation method for waypoints
Because it is difficult to obtain the relevant data for waypoints, the unweighted dynamic PageRank value calculation method is used. The formula for calculating the dynamic PageRank value of waypoints in the airspace network is as follows:
where N is the total number of selected waypoints in the airspace network, and the other symbols are defined as shown in Table 4.
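To illustrate the role of edge weights, the following sketch implements a generic, static edge-weighted PageRank by power iteration. It is not the dynamic formula of this paper: the attenuation factor β and the weight coefficient µ are omitted, and the damping value 0.85, the function name, and the sample flows are assumptions. Each edge weight stands for the sum of the flows at the two key waypoints of a route.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Generic edge-weighted PageRank; weights[(j, i)] is the weight of edge j -> i."""
    nodes = sorted({node for edge in weights for node in edge})
    n = len(nodes)
    out_sum = {v: 0.0 for v in nodes}
    for (j, _), w in weights.items():
        out_sum[j] += w

    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(pr[v] for v in nodes if out_sum[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (j, i), w in weights.items():
            if out_sum[j] > 0.0:
                # Node j passes rank to i in proportion to the edge weight.
                new[i] += damping * pr[j] * w / out_sum[j]
        pr = new
    return pr


# Hypothetical key-waypoint flow sums on directed airport-to-airport routes.
flows = {
    ("A", "C"): 300.0, ("C", "A"): 300.0,
    ("B", "C"): 100.0, ("C", "B"): 50.0, ("B", "A"): 20.0,
}
ranks = weighted_pagerank(flows)
```

Setting every weight to the same value reduces this to the unweighted case used for waypoints, where each outgoing link receives an equal share.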
In summary, the ATFM delay prediction index system is constructed as shown in Table 5. The ATFM delay prediction indexes fall into four major categories: departure airport, arrival airport, airspace network, and others.

ATFM Delay Prediction Model
The ATFM delay prediction task is a time series regression problem and involves multiple factors and complex relationships between variables in the prediction data. In this paper, a feature extraction module and a heuristic parameter optimization algorithm are used to improve the feature extraction ability, long-term dependence modeling ability, and computational efficiency of the prediction model, so as to obtain better performance.

Feature Extraction Module
In this paper, CNN, TCN, and the attention mechanism are used to extract features from the ATFM delay prediction dataset from the spatial and temporal perspectives. By extracting the most representative features from the prediction data and mining the hidden information, the performance of the model is improved. In ATFM delay prediction, CNN performs multilayer convolution and pooling operations on the received multidimensional ATFM delay prediction data to extract spatial features with local perceptual ability. These features can capture structures and patterns in the input data, such as the airspace distribution structure in air traffic and flight density. TCN can effectively capture long-term dependencies and temporal correlations in time series data, as well as model complex nonlinear relationships, to improve the accuracy of the prediction model.
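The core operation that lets a TCN capture temporal dependencies is the causal dilated convolution, in which each output depends only on the current and earlier inputs. The pure-Python sketch below shows that operation on a single series; the kernel values, dilation rates, and sample series are arbitrary illustrative choices, and a full TCN would add residual connections, nonlinearities, and learned kernels.

```python
def causal_dilated_conv(series, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * series[t - k * dilation], with zero left-padding."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # inputs before the start of the series count as zero
                acc += weight * series[idx]
        out.append(acc)
    return out


# Hypothetical delay series; stacking layers with growing dilation widens
# the receptive field without looking into the future.
delays = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
layer1 = causal_dilated_conv(delays, [0.5, 0.5], dilation=1)
layer2 = causal_dilated_conv(layer1, [0.5, 0.5], dilation=2)
```

After the two layers, each value in `layer2` depends on up to four consecutive time steps of the input.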
The attention mechanism is a technique used to enhance the performance of neural network models by dynamically assigning weights so that the model pays more attention to the useful information in the input, improving the performance and expressiveness of the model. In LSTM, the attention mechanism can be applied to the input, the hidden state, or the output. Because CNN and TCN already pre-process the input data of the LSTM, the attention mechanism is applied to the output of the LSTM.
The basic structure of the attention mechanism applied to the output is shown in Figure 5. First, the attention score is obtained by computing the similarity between the query and each key; then, the attention scores are normalized to obtain attention weights; finally, the attention weights are multiplied by the corresponding values, and all the weighted values are summed to obtain the final weighted representation. The result of the weighted summation can be used directly as the prediction output or passed to subsequent layers for further processing.
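The three steps of scoring, normalizing, and weighted summation can be written out in a few lines. The sketch below assumes scaled dot-product similarity and softmax normalization, which are common choices but are not specified in the text; the function names and the sample hidden states are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (shifted for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Score each key against the query, normalize, and sum the weighted values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical LSTM outputs for three time steps; the last one acts as the query.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(hidden_states[-1], hidden_states, hidden_states)
```

Here the third time step receives the largest weight because it is most similar to the query.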

LSTM Model
Long Short-Term Memory (LSTM) alleviates the vanishing and exploding gradient problems of the traditional RNN by introducing a gating mechanism. The structure of LSTM is shown in Figure 6, where x, h, and C represent the input, hidden state, and memory state, respectively. The LSTM selectively updates, saves, and passes information through the interaction of x and C and the interaction of h and C. The LSTM contains three key gating mechanisms: the forget gate, the input gate, and the output gate. Through gating operations and state updates, the sigmoid and tanh functions help the LSTM model handle long-term dependencies, the memorized information, and the hidden state of the output. Here, the sigmoid function maps the input value to a range between 0 and 1, and the tanh function maps the input value to a range between −1 and 1.

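For reference, the gating operations described above are commonly written as follows. This is the standard textbook formulation of an LSTM cell; the weight matrices $W$ and biases $b$ are notation assumed here and are not taken from Figure 6.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{(candidate memory)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(memory state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```

Because $\sigma$ outputs values between 0 and 1, each gate acts as a soft switch on the quantity it multiplies, while $\tanh$ keeps the candidate memory and the hidden state between −1 and 1.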

LSTM Model Based on Feature Extraction Optimization
Building on the CNN-LSTM [17,18] and TCN-LSTM [34] models and combining them with the attention mechanism, this paper proposes two improved LSTM models: the CNN-LSTM-ATT model and the TCN-LSTM-ATT model. As shown in Figure 7, the steps of ATFM delay prediction are as follows:
(1) Input the ATFM delay time series and prediction index data into the feature extraction module. Here, CNN mainly extracts the spatial characteristics of the data, and TCN mainly extracts the temporal characteristics of the data. The input data are convolved and pooled in the feature extraction module to obtain the feature-mapped


ATFM Delay Prediction Model Based on Sparrow Search Algorithm
The sparrow search algorithm (SSA) is a heuristic optimization algorithm based on the foraging and migratory behavior of bird flocks. SSA finds the optimal solution by simulating the interaction, cooperation, and competition behaviors of sparrows during the foraging process. During the optimization process, each sparrow represents a solution, and its quality is evaluated based on its fitness value. By simulating the searching, following, and competing among individual sparrows, the algorithm gradually adjusts their positions to approximate the optimal solution. SSA has the advantages of fast convergence, strong global search capability, and high adaptivity, and it can be applied to a wide range of optimization problems.
The CNN-LSTM-ATT and TCN-LSTM-ATT models have complex structures, and their performance depends largely on the selection of parameters. In recent years, many scholars have used SSA to optimize the parameters of LSTM and improved LSTM models in order to improve prediction accuracy [35][36][37]. Therefore, in this paper, SSA is used to automatically search for parameter combinations in the ATFM delay prediction model. Each parameter combination is regarded as a sparrow individual, and the model performance is determined by the location of that individual in the search space. After several rounds of testing, we determined the important parameters that affect the prediction performance of the CNN-LSTM-ATT and TCN-LSTM-ATT models. For the CNN-LSTM-ATT model, the number of hidden layers, the number of neurons, and the learning rate of the LSTM are the parameters to be optimized. For the TCN-LSTM-ATT model, the number of filters of the convolutional layer in the TCN module, the number of neurons in the hidden layer, and the learning rate are the parameters to be optimized. The parameter definitions are shown in Table 6.

Parameter | Definition
Number of hidden layers (n_hidden) | In an LSTM network, more hidden layers make the model more complex and strengthen its learning ability, but also make it easier to overfit.
Number of neurons (n_neuron) | Determines the capacity and expressive power of the model. A higher number of neurons increases the complexity of the model, allowing it to better capture long-term dependencies and complex patterns in the input sequence.
Learning rate | Controls the learning speed of the network. If it is set too small, convergence slows down; if it is set too large, oscillations may occur and the network may fail to converge.
Number of filters in the convolutional layer (n_filter) | Determines the expressiveness and learning ability of the model. A larger number of filters can capture more local features and increase the receptive field of the model, which may lead to overfitting.
The optimization process of the SSA algorithm for the ATFM delay prediction model is shown in Figure 8.
(1) Determine the parameters to be optimized and set the range of each parameter. Within the constraint range, randomly generate the position and speed of the initial individuals to construct the sparrow population. At the same time, initialize parameters such as the population size, dimension, and initial position.
(2) According to the current position of each sparrow individual, pass the corresponding parameters to the ATFM delay prediction model. Then, train the ATFM delay prediction model on the training set and evaluate its performance on the validation set.
(3) Calculate the fitness function value from the performance metrics (accuracy, loss function) to measure the performance of each sparrow individual.
(4) Based on the fitness function value, update the speed and position of each sparrow individual so that it moves toward a better position. The sparrow individual with the best fitness is selected as the globally optimal position in the population.
(5) Repeat steps (2)-(4) until a predetermined number of iterations is reached.
(6) At the end of the iterations, select the sparrow individual with the best fitness function value; its corresponding ATFM delay prediction model parameter combination is the best parameter combination.
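The six steps above can be illustrated with a minimal Python sketch. It is a simplified, sparrow-search-style loop and not the paper's implementation: the producer and scrounger update rules are reduced stand-ins for the full SSA equations, the function and parameter names (`ssa_search`, `producer_frac`, `toy_validation_error`) are hypothetical, and the toy objective is a made-up surface that replaces "train the model and measure validation error".

```python
import random


def ssa_search(fitness, bounds, n_sparrows=10, n_iter=30, producer_frac=0.2, seed=0):
    """Simplified sparrow-search-style minimiser (illustrative assumption only).

    fitness: maps a parameter list to a score where lower is better.
    bounds:  list of (low, high) ranges, one per parameter.
    """
    rng = random.Random(seed)

    def clip(position):
        # Keep every coordinate inside its allowed range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(position, bounds)]

    # Step 1: random initial population inside the parameter ranges.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_sparrows)]
    # Steps 2-3: evaluate every individual (here: call the fitness function).
    scores = [fitness(p) for p in population]
    best = min(range(n_sparrows), key=scores.__getitem__)
    best_pos, best_score = list(population[best]), scores[best]
    n_producers = max(1, int(producer_frac * n_sparrows))

    for _ in range(n_iter):  # Step 5: fixed number of iterations.
        ranking = sorted(range(n_sparrows), key=scores.__getitem__)
        moved = []
        for rank, idx in enumerate(ranking):
            current = population[idx]
            if rank < n_producers:
                # Producers (assumed rule): random local step scaled by range width.
                cand = [v + rng.gauss(0.0, 0.1) * (hi - lo)
                        for v, (lo, hi) in zip(current, bounds)]
            else:
                # Scroungers (assumed rule): move part of the way toward the best position.
                cand = [v + rng.random() * (b - v) for v, b in zip(current, best_pos)]
            moved.append(clip(cand))
        # Step 4: positions are updated, then re-evaluated.
        population = moved
        scores = [fitness(p) for p in population]
        best = min(range(n_sparrows), key=scores.__getitem__)
        if scores[best] < best_score:
            best_pos, best_score = list(population[best]), scores[best]

    # Step 6: return the best parameter combination seen during the search.
    return best_pos, best_score


def toy_validation_error(params):
    """Made-up error surface standing in for model training and validation."""
    n_hidden, n_neuron, log10_lr = params
    return (n_hidden - 3.0) ** 2 + ((n_neuron - 100.0) / 64.0) ** 2 + (log10_lr + 2.0) ** 2


# Hypothetical search ranges: hidden layers, neurons, log10(learning rate).
bounds = [(1.0, 4.0), (16.0, 128.0), (-4.0, -1.0)]
best_params, best_error = ssa_search(toy_validation_error, bounds)
```

In a real run, integer-valued parameters such as the number of layers or neurons would need to be rounded before being passed to the model, and each fitness call would involve a full training run, so the population size and iteration count dominate the cost.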

Experimental Environment
According to the authors' previous study on congestion discrimination and prediction in air traffic networks [38], a region with a high congestion level in Chinese airspace (East China) is selected as the ATFM delay prediction network. In addition, the flow control during the data collection period mainly occurred on domestic routes, so only domestic routes are selected for example validation. Four airports in East China are selected as the departure airports to construct the ATFM delay prediction network, as shown in Figure 9. The four airports are denoted by their ICAO codes: ZSSS (Shanghai Hongqiao International Airport), ZSPD (Shanghai Pudong International Airport), ZSHC (Hangzhou Xiaoshan International Airport), and ZSNJ (Nanjing Lukou International Airport).
From 1 May to 31 May 2023, we select the ATFM delay prediction data of the four departure airports in East China, with a total of 43,964 valid data points, 20,654 data points with a CTOT assigned, and 15,994 data points in which an ATFM delay was actually generated. The data from 1 May to 21 May are used as the training set, the data from 22 May to 24 May as the validation set, and the data from 25 May to 31 May as the test set.
ATFM delays are generated for a variety of complex reasons. In order to improve the accuracy of ATFM delay prediction and make the prediction indicators as close as possible to the real situation, the time window of the relevant indicators is selected as shown in Table 7. From a tactical point of view, this paper makes short-term predictions of ATFM delay duration and delayed flight volume from one day to several hours ahead.
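The chronological split described above can be expressed as a small helper. This is only a sketch: the function name `assign_split` and the constant names are hypothetical, and it assumes each record carries a calendar date within May 2023.

```python
from datetime import date

# Boundaries taken from the text; the constant names are illustrative.
DATA_START = date(2023, 5, 1)
TRAIN_END = date(2023, 5, 21)
VALID_END = date(2023, 5, 24)
DATA_END = date(2023, 5, 31)


def assign_split(day):
    """Return 'train', 'validation', or 'test' for a record dated `day`."""
    if not (DATA_START <= day <= DATA_END):
        raise ValueError(f"{day} is outside the collection period")
    if day <= TRAIN_END:
        return "train"
    if day <= VALID_END:
        return "validation"
    return "test"
```

Splitting by date rather than at random keeps later days out of the training set, which matters for a time-series prediction task.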

Comparison of Prediction Effect
The experiments are conducted in the PyTorch framework to build and train the models, and after several rounds of testing, the random seed is set to 221.In this paper, we propose the CNN-LSTM-ATT model and TCN-LSTM-ATT model based on SSA optimization (denoted as SSA-LSTM-1 and SSA-LSTM-2, respectively), and use CNN-LSTM, TCN-LSTM, CNN-LSTM-ATT (denoted as LSTM-1), and TCN-LSTM-ATT (denoted as LSTM-2) models as comparison experiments.
The parameter optimization results of the SSA-LSTM-1 and SSA-LSTM-2 models for the ATFM delay prediction data in East China are shown in Tables 8 and 9, respectively. For the SSA-LSTM-1 model, the Mean Absolute Error (MAE) and R² results of combination 10 are optimal. Therefore, combination 10 is set as the optimal combination of SSA-LSTM-1, with two hidden layers, 64 neurons, and a learning rate of 0.001. For the SSA-LSTM-2 model, the evaluation parameters of combination 10 are also optimal, with an MAE of about 4.4 min and an R² of about 0.87. Combination 10 is set as the optimal combination of SSA-LSTM-2, with 32 filters, 64 neurons, and a learning rate of 0.01.
The ATFM delay prediction data are input into the six prediction models, and the prediction performance of each model is evaluated using the loss function, MAE, and R². As shown in Figure 10, the SSA-LSTM-1 and SSA-LSTM-2 models outperform the other four models in terms of convergence speed and loss values. Both models reach the converged state after only 22 iterations, and the loss value of SSA-LSTM-1 is lower than that of SSA-LSTM-2. As shown in Table 10, the CNN-LSTM and TCN-LSTM models perform poorly, with low R². The LSTM-1 and LSTM-2 models have higher R² and fit the data better, but their MAE values are high. The SSA-LSTM-1 and SSA-LSTM-2 models perform best in terms of both MAE and R². This indicates that optimizing LSTM-1 and LSTM-2 with the SSA algorithm can improve the accuracy and reliability of prediction. In summary, SSA-LSTM-1 and SSA-LSTM-2 outperform the other four models in prediction performance, and SSA-LSTM-1 is slightly better than SSA-LSTM-2 in prediction accuracy.
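For readers who want to reproduce the two evaluation metrics, a minimal sketch using their standard definitions is given below. The function names are illustrative, and the paper does not state which implementation it used.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

MAE is expressed in the unit of the target (minutes of ATFM delay here), while R² is unitless: a model that always predicts the mean of the actual values scores 0, and a perfect model scores 1.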

Analysis of Prediction Result
We output the optimal prediction results of SSA-LSTM-1 for ATFM delay and compare them with the actual ATFM delay values, as shown in Figure 11. On the whole, the predicted ATFM delay values are lower than the actual values. When the actual ATFM delay is low, the prediction accuracy is high; when the actual ATFM delay is high, the prediction results deviate more. In addition, the prediction results of SSA-LSTM-1 for ZSNJ and ZSPD are better than those for ZSHC and ZSSS. Across the four airports, ATFM delays of more than 60 min account for less than 10% of the data, but the MAE of this part of the data is much higher than that of ATFM delays of less than 60 min. Therefore, in order to further compare the prediction effect under different delay values, we set 60 min as the ATFM delay threshold and divide the prediction data into two groups, as shown in Figure 12. The most obvious difference is at ZSHC, where the MAE for ATFM delays over 60 min is 22.6 min, while the MAE for ATFM delays under 60 min is only 3 min. High-delay samples are scarce in the prediction data, and the factors that lead to high ATFM delays in practice are more complex, which limits the ability of the model to predict high ATFM delays.
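The grouped comparison above amounts to computing MAE separately on each side of the threshold. The sketch below shows one way to do this; the function name is hypothetical, and assigning a delay of exactly 60 min to the lower group is an assumption, since the text only distinguishes "more than" and "less than" 60 min.

```python
def mae_by_threshold(actual, predicted, threshold=60.0):
    """Return (MAE for actual <= threshold, MAE for actual > threshold).

    A group with no samples yields None instead of a number.
    """
    low_errors, high_errors = [], []
    for a, p in zip(actual, predicted):
        # Group by the actual delay, not by the predicted one.
        (high_errors if a > threshold else low_errors).append(abs(a - p))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(low_errors), mean(high_errors)
```

Grouping on the actual value keeps the two groups fixed across models, so the per-group MAE values of different models remain directly comparable.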
From 27 May 2023 to 31 May 2023, the prediction results of SSA-LSTM-1 on ATFM delayed flight volume are output and compared with MAE. As shown in Figure 13, there is a certain pattern in the time distribution, characterized by low values at both ends and high values in the middle. In addition, when the ATFM delayed flight volume is low, the ATFM delay prediction accuracy is high; when the ATFM delayed flight volume increases, the MAE of ATFM delay prediction is high.
In SSA-LSTM-1 and SSA-LSTM-2, we calculate the absolute mean of the gradient and normalize it to obtain the importance of the prediction features, selecting the features with importance greater than 0.01 for comparison. As shown in Figure 14, the normalized flow control content contributes the most to SSA-LSTM-1 and SSA-LSTM-2, with feature importance of 0.19 and 0.21, respectively. This is followed by the common controlled waypoints, with feature importance of 0.13 and 0.16, respectively. In addition, the weather type at the departure airport, the estimated flow-to-capacity ratio of the departure airport, and the estimated flow at key waypoints are also important features affecting ATFM delay prediction, each with a contribution rate of more than 5%. In summary, the common flow control information has a greater impact on ATFM delay prediction.
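One plausible reading of the gradient-based importance described above is sketched below. It assumes that the gradients of the model output with respect to each input feature are already available as one row per sample, that "absolute mean" means the mean of absolute values, and that normalization divides by the sum so the scores add up to 1. None of these choices is confirmed by the text, and the function name is hypothetical.

```python
def gradient_importance(gradients, feature_names, min_importance=0.01):
    """Normalised mean-absolute-gradient importance per feature.

    gradients: list of rows, one per sample, each with one value per feature.
    Returns only the features whose importance exceeds `min_importance`.
    """
    n_samples = len(gradients)
    mean_abs = [
        sum(abs(row[j]) for row in gradients) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    scores = {name: value / total for name, value in zip(feature_names, mean_abs)}
    return {name: score for name, score in scores.items() if score > min_importance}
```

Taking absolute values before averaging prevents positive and negative gradients from cancelling, so a feature that pushes predictions strongly in both directions still registers as important.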

Discussions and Implications
In order to further improve the predictability of ATFM delays, this paper adds normalized flow control content and normalized controlled waypoint indicators to construct a more comprehensive ATFM delay regression prediction indicator system. Meanwhile, this paper proposes two ATFM delay regression prediction models based on the improved LSTM model, which realize the short-term prediction of ATFM delay duration and delayed flight volume.
Delays often lead to the waste and loss of resources because of untimely and unreasonable remedial measures and lagging information communication, thus hindering economic development. Therefore, accurately grasping ATFM delays and their development trends during congested hours can give airlines room to take countermeasures and reduce economic losses such as additional fuel costs, wasted human resources, and the loss of passengers caused by delays. At the same time, ATFM can better plan and manage air traffic flow to improve overall air operation efficiency, thus attracting more passengers and increasing the economic contribution of air transportation.
In addition, the purpose of demand and capacity management of air traffic is not only to control demand in order to ensure and improve flight quality and passenger satisfaction but, more importantly, to identify the key factors affecting the quality of aviation networks and airports. This paper explores the key factors affecting ATFM delays by calculating the contribution rate of the ATFM delay prediction indicators to the model, and it provides a scientific basis for addressing delays caused by traffic management at their root.

Conclusions
In order to solve the problems of multi-source, high-dimensional, and unbalanced ATFM delay prediction data, this paper proposes two ATFM delay prediction models based on improved deep learning algorithms to realize the short-term prediction of ATFM delay. The main results are as follows:
(1) Construct the ATFM delay prediction network model. Taking the points of imbalance between capacity and demand (airports and waypoints) that flights may pass through on routes as nodes in the ATFM delay prediction network, and routes as edges, the dynamic ATFM delay prediction network model is constructed in terms of days. In order to avoid the inconsistency of ATFM delay generation and occurrence locations, the edges in the ATFM delay prediction network are used as the prediction objects.
(2) Construct the ATFM delay prediction index system and propose innovative indicators by mining historical flow control data, combing the common flow control information, and selecting common flow control contents and common controlled waypoints as the key prediction indicators. In addition, this system of prediction indicators includes estimated traffic and dynamically weighted PageRank values for key nodes.
(3) Construct the ATFM delay prediction model. Combining the feature extraction module, prediction model, and parameter optimization algorithm, we construct the SSA-LSTM-1 and SSA-LSTM-2 prediction models. The model prediction results show that the MAE of SSA-LSTM-1 and SSA-LSTM-2 for ATFM delay duration prediction is 4.25 min and 4.38 min, respectively. The prediction MAE of the SSA-LSTM-1 model is reduced by 2.71 min, 3.68 min, 1.28 min, and 1.05 min compared to CNN-LSTM, TCN-LSTM, CNN-LSTM-ATT, and TCN-LSTM-ATT, respectively. To exclude the effect of higher delay values, 60 min was set as the ATFM delay threshold; the predicted MAE of SSA-LSTM-1 for ZSHC with ATFM delays of more than 60 min is 22.6 min, while the predicted MAE for ATFM delays of less than 60 min is only 2.9 min. In addition, the calculation of the contribution ratio of the prediction indicators shows that the normalized flow control content and normalized waypoints contribute the most to the prediction results of SSA-LSTM-1 and SSA-LSTM-2, with a significance of more than 0.03.
This paper focuses on mining the factors influencing ATFM delay and on ATFM delay regression prediction. The accuracy of model prediction decreases when there are more delayed flights and higher ATFM delay values. In order to further improve the predictability of ATFM delays and provide support for deploying ATFM strategies in advance, the next phase of research will consider adding more reliable influencing factors and introducing an algorithm for handling data imbalance to optimize the model. In addition, data collection in this paper requires integrating flight information, flow control data, weather preparation data, etc., and a large amount of data is lost due to the data matching problem, which reduces the number of data samples. This is also a problem to be considered in the next phase.
proposed a flight delay prediction model based on NR-DenseNet, which simultaneously realizes delay class classification prediction and regression prediction by establishing a shared layer of multi-task learning feature extraction and a loss weighting method [20].Yu et al. applied deep belief network to mine the internal and deep patterns of flight delay, and proposed the DBN-SVR flight delay prediction model.
), and D is the delay coefficient of the flight: D = 0 indicates that the flight does not experience an ATFM delay, and D = 1 indicates that the flight experiences an ATFM delay.

( 1 )
Constructing ATFM delay prediction model: Joint feature extraction module, prediction module, and parameter optimization module are used to construct different combinations of ATFM delay prediction network models, including CNN-LSTM-ATT, TCN-LSTM-ATT, and CNN-LSTM-ATT based on SSA optimization.

Figure 3. Statistics chart of controlled waypoint frequency.
Usually, the key waypoints are the ones carrying large flow pressure or the intersection of multiple routes. The flow statistics for the key waypoints are shown in Figure 4.

Figure 4. Flow statistics chart of key waypoints.

Figure 5. First, the attention score is obtained by performing similarity computation between query and key; then, the attention score is normalized to obtain attention weights; finally, the attention weights are multiplied by the corresponding values, and all the weighted values are summed up to obtain the final weighted representation. The result of the weighted summation can be used as a direct output of the prediction result or passed to the subsequent layers for further processing.
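The three operations described above (score, normalize, weighted sum) can be sketched in a few lines of Python. The sketch assumes dot-product similarity and softmax normalization, neither of which is specified in the text, and it uses the LSTM hidden states as both keys and values; the function name `attention_pool` is hypothetical.

```python
import math


def attention_pool(hidden_states, query):
    """Weight each time step by its similarity to `query` and sum the result.

    hidden_states: list of vectors (one per time step), used as keys and values.
    query:         vector of the same length as each hidden state.
    Returns (attention weights, weighted summary vector).
    """
    # 1) Similarity score between the query and each key (dot product assumed).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # 2) Normalise the scores into weights with a numerically stable softmax.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # 3) Weighted sum of the values.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context
```

With an all-zero query, every time step receives the same weight and the output reduces to the plain average of the hidden states, which is a convenient sanity check.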

Figure 5. Flow statistics chart of key waypoints.

(1) Input the ATFM delay time series and prediction index data into the feature extraction module. CNN mainly extracts the spatial characteristics of the data, and TCN mainly extracts the temporal characteristics of the data. The input data are convolved and pooled in the feature extraction module to obtain the feature-mapped data, which are then passed to the LSTM layer through the fully connected layer.
(2) At each time step, the LSTM receives an input vector from the feature extraction module and gradually updates its internal state and memory, calculating the value of the hidden state or memory cell for the current time step. The value of this hidden state or memory cell is regarded as the result of the LSTM's processing of the feature data and is passed to the Attention module.
(3) The Attention module accepts the output of the LSTM and the attention weight vector. By calculating the similarity between each time step output and the attention weights, Attention obtains a weighted output vector that measures the importance of each time step output. Attention outputs a weighted aggregated feature vector.
(4) The output of Attention is fed into the fully connected layer, where it is further nonlinearly transformed and mapped by the activation function. The final output is then produced.
Aerospace 2024, 11, x FOR PEER REVIEW 14 of 24

Figure 8. Process of ATFM delay prediction model based on SSA optimization (iteration).

Figure 9. ATFM delay prediction network diagram of four airports in East China.

Figure 10. Comparison diagram of loss function curve.

Figure 12. Comparison of ATFM delay prediction results under different values.

Figure 13. Comparison of ATFM delay prediction flight volume and MAE.

Figure 14. Weight comparison of ATFM delay prediction index.

Table 1. Summary of previous research on ATFM delay applications.

Table 2. Summary of previous research on ATFM delay prediction methods.

Table 3. Historical Statistics Table of Flow Control Measures Accepted by Shanghai Approach.

Table 5. ATFM delay prediction index system.

Table 7. Time window of ATFM delay prediction index.

Table 10. Comparison table of evaluation parameters in ATFM delay prediction models.