1. Introduction
The aviation industry plays a crucial role in modern transportation systems, with its efficiency profoundly influencing national and regional economic activities and societal well-being [1,2,3,4]. Modern aviation systems are under considerable strain due to multiple factors, including constrained flight space resources amidst the ever-increasing demand for air transportation, as well as other operational challenges such as aircraft maintenance issues, weather conditions, and aircraft availability [5,6,7]. These factors collectively contribute to widespread and frequent flight delays. The duration of flight delay is considered a critical indicator of service quality within the aviation industry. As per the regulation of the Civil Aviation Administration of China (CAAC), flights arriving more than 15 min later than the scheduled time are viewed as delayed. Following this rule, in 2019, on-time ratios for arrival flights were only 78.96% in the United States and 81.66% in China [8,9]. This implies that the proportion of delayed flights during this period was around 20.5% in the United States and 17.7% in China, posing consequences that chronically plague both airlines and passengers, such as economic losses and dissatisfied passenger experiences [10,11,12]. According to the Federal Aviation Administration (FAA), in 2019 alone, flight delays resulted in an estimated $32.9 billion in costs to airlines, airports, and passengers [13]. Such high economic losses and negative passenger experiences motivate the research community to investigate flight delay duration for aviation management.
In aviation systems, the estimation of flight delay duration is a necessary ingredient for airport regulators, aviation authorities, and airlines prior to designing strategies for air traffic control (ATC) and operational management. Consider arrival delays, one of the most common types of flight delay, as an illustration. Figure 1 depicts flight arrival delay by contrasting the scheduled flight trajectory flow with the real one for a single sector (a one-way flight from origin to destination). A sector usually consists of five parts: the taxi-out stage (the aircraft moves away from the gate at the origin airport to prepare for take-off), the airborne stage (the aircraft is completely off the ground and flying), the taxi-in stage (the aircraft lands at the destination airport and moves towards the gate), the turn-around stage (the period between taxi-in and the next taxi-out for a subsequent sector, involving activities such as passenger disembarkation and boarding), and taxi-out again for the next flight. As shown in Figure 1, arrival delay refers to the gap between the real gate-in time (when the aircraft arrives at the gate of the destination airport) and the scheduled gate-in time. Based on a poll, 80% of respondents expressed annoyance with flight arrival delays and believed that the accuracy of delay prediction should be improved [14]. While accurate delay prediction is already practiced by airlines for day-of-operations management (e.g., prioritizing flights with many connecting passengers), this study provides a network-level perspective for stakeholders such as aviation authorities and multi-airport coordinators, who can use these predictions to inform long-term strategic planning and resource allocation across multiple airports and airlines. Furthermore, enhanced accuracy in delay estimation can indirectly contribute to delay prevention by informing more realistic flight scheduling and proactive traffic management. Consequently, it is imperative to develop an accurate method for estimating the duration of flight arrival delay to alleviate public anxiety, enable efficient air traffic control strategies, and ultimately improve airline service satisfaction while reducing unnecessary economic losses.
To date, numerous efforts have been dedicated to investigating the duration of flight delay, which can be divided into two streams: single-airport-level research [15,16,17] and single-airline-level research [18,19]. For instance, Yu et al. [17] established a model framework using a deep belief network to evaluate flight delays at Beijing Capital International Airport, achieving satisfactory results. Similarly, Chakrabarty et al. [19] evaluated the flight arrival delays of American Airlines by analyzing the performance of random forest (RF), K-Nearest Neighbor (KNN), support vector machine (SVM), and gradient boosting classifier (GBC) models. These studies presented beneficial methods for exploring flight delay duration at a single airport or for a single airline. However, different airports and airlines operate under varying conditions, and these existing methods are not directly applicable to estimating flight delay duration at the network level, which involves interactions across multiple airports and airlines. In this context, "network level" refers to an interconnected system where airports represent nodes (points of operation) and flight routes represent edges (connections between nodes). Therefore, an accurate method that incorporates aviation network features from a multi-airport, multi-airline perspective is needed.
Several factors potentially influence flight delay at the multi-airport, multi-airline level. The U.S. Bureau of Transportation Statistics classifies these factors into five groups, namely air carrier, security, late-arriving aircraft, national aviation system, and extreme weather [8,20,21]. To accurately estimate the duration of flight delay at the network level, two main challenges need to be addressed. First, historical flight delay data are generally high-volume and high-dimensional, exhibiting diverse possible causes of the resulting delay. At the same time, the data may contain redundant information, which not only increases the difficulty of extracting aviation network features with high explanatory power but also degrades estimation accuracy. Second, relationships between causal factors and delay are, by and large, intertwined, and the effects of correlations among causal factors are highly non-linear. As a result, investigating the underlying influencing mechanism of flight delay is a non-trivial task.
To surmount the above two challenges, this study proposes a novel end-to-end methodological framework tailored to network-level flight arrival delay duration estimation across multiple airports and airlines that synergizes complex network theory with deep learning. The proposed framework follows a selection–abstraction–prediction pipeline designed for high-dimensional and non-linear aviation data. First, in the selection phase, based on complex network theory, we adopt the XGBoost algorithm to select aviation network features at the node and edge levels and extract explanatory variables. Bayesian optimization (BO) is utilized to fine-tune the hyperparameters of XGBoost, forming the BO–XGBoost algorithm. BO is chosen for its ability to efficiently optimize expensive black-box functions: by modeling the objective function with a Gaussian process [22], it strategically selects hyperparameters that balance exploration of unknown regions and exploitation of promising areas, thereby accelerating convergence and improving model performance compared with conventional search techniques. Second, the abstraction phase employs a modified multi-layer extreme learning machine (ML-ELM) to compress the selected features into high-level latent representations, effectively denoising the inputs. Finally, the prediction phase utilizes a deep kernel extreme learning machine (DKELM) to perform the regression. This architecture specifically addresses the non-trivial challenge of capturing complex, multi-scale features in network delay data, which standard baselines lacking this hierarchical abstraction capability fail to resolve. The main contributions of this study are summarized as follows:
We develop a novel end-to-end methodological framework that uniquely fuses complex network-derived topological features (node degree and edge betweenness centrality) with the BO–XGBoost for feature selection and a customized DKELM for delay estimation. Their synergistic integration, specifically designed to capture network-level delay propagation mechanisms in a multi-airport, multi-airline system, constitutes the core methodological novelty of this study. The proposed method considers a broad scope of influencing factors associated with flight delays and targets stakeholders such as aviation authorities and multi-airport coordinators who can leverage these predictions for strategic decision-making.
In the feature selection module, complex network theory is utilized to reveal spatial features of the aviation network topology at both nodes (airports) and edges (flight routes). By utilizing Bayesian optimization, hyperparameters of the XGBoost algorithm are fine-tuned to extract explanatory features. In the deep learning module, a DKELM built upon a modified multi-layer extreme learning machine is designed, in which extreme learning machine auto-encoders are stacked to form a deep architecture for robust and reliable estimation.
The case studies demonstrate the superior performance of the proposed integrated framework when compared with several benchmark methods on a five-year U.S. flight dataset. Furthermore, the results of this study provide valuable insights on proactive measures for the aviation administration, thereby ensuring sustainable aviation development.
The remainder of this study is organized as follows. Section 2 reviews the literature pertaining to flight delay. Section 3 presents the integrated framework for estimating the duration of flight delay in detail. In Section 4, we conduct numerical experiments using a high-dimensional flight database from the U.S. Bureau of Transportation Statistics. Finally, conclusions and future research directions are given in Section 5.
2. Literature Review
Accurate estimation of flight delay duration is increasingly recognized as a fundamental aspect of aviation management, which can help aviation authorities achieve efficient resource scheduling, improve airline service, and enhance passenger satisfaction [8,13]. A number of studies have investigated flight delay using various methodologies, including operational research, probabilistic models, and statistical analysis [23]. Operational research employs optimization-based techniques, mathematical simulation, or queueing theory to model air traffic operations. For instance, Hansen [24] focused on the congestion problem at airport runways and developed a deterministic queuing model to analyze the delay propagation effects of a specific arriving flight on subsequent flights at Los Angeles International Airport. Pyrgiotis et al. [10] proposed a probabilistic model to estimate the probability of flight delay based on historical data or by fitting a suitable distribution to describe delay patterns. For analyzing interactions between delays at different airports, Hao et al. [25] created a regression model combining econometric and simulation methods to quantify the economic and operational impact of New York's three major commercial airports on regional air traffic and local economies. Additionally, Pathomsiri et al. [26] introduced a non-parametric function to assess undesirable outputs of airports, including delayed flights, based upon data from 56 U.S. airports during 2000–2003. While these methods provide valuable insights into delay prediction, they often focus on single airports or specific operational aspects and may not fully capture network-wide interactions or achieve high predictive accuracy in multi-airport, multi-airline contexts.
More recently, data-driven machine learning methods have been put forward for flight delay estimation, building on the extensive set of features identified in prior work [27,28]. Many features, such as weather conditions, air traffic volume, airport capacity, and historical delay patterns, have been tested in the extant literature. This paper adopts a feature selection approach over a similarly large set of features, ensuring that the selected variables align with those commonly explored in prior studies (e.g., weather, traffic data, and departure delay) while also incorporating network-level attributes (e.g., node-specific airport characteristics and edge-specific route features) to address multi-airport, multi-airline dynamics. For example, Choi et al. [29] integrated historical weather and traffic data to examine whether a scheduled flight would be delayed using various machine learning methods, including decision trees (DT) and K-Nearest Neighbor (KNN). Guo et al. [30] explored flight delay by employing the maximal information coefficient and random forest regression. Güvercin et al. [31] investigated representative time series for each airport using a clustered airport modeling method to estimate flight delay. Additionally, Cai et al. [32] developed a graph convolutional neural network model aimed at estimating the duration of flight departure delay. These studies inform the methodological choices in this paper, particularly the adoption of advanced machine learning techniques (e.g., deep learning) combined with complex network theory to capture both local and global dependencies in the aviation system, which are often underexplored in single-airport or single-airline studies. Importantly, our end-to-end framework goes beyond these approaches by explicitly modeling the aviation system as a weighted directed graph and extracting topological features such as node degree and edge betweenness centrality. These network-level indicators, when combined with a deep kernel extreme learning machine, enable more accurate delay estimation across the entire network, as demonstrated by our experimental results showing a 7.8% improvement over benchmark methods.
Prior studies can be further separated into two branches according to the research object: departure delay and arrival delay. The first branch primarily concerns flight delay before flights take off. Flight delay models in this branch consider the utilization of gates, runways, and airspace resources, which play a crucial role in aircraft scheduling, gate management, and the punctuality of subsequent flights. The second branch, as illustrated in Figure 1, encompasses delays throughout the flight lifecycle, including departure delay at the originating airport. This aspect is vital for managing the receiving airport's capacity and coordinating subsequent ground services. Furthermore, arrival delay is more closely related to passenger satisfaction, as passengers may need to update their subsequent itineraries according to the duration of arrival delay. The factors considered in flight delay models differ between these two branches. Several scholars have investigated flight arrival delay. For instance, Mueller and Chatterji [33] utilized normal distributions to obtain the probability of arrival delay. Abdel-Aty et al. [15] detected the periodic patterns and associated factors of arrival delay with frequency-related statistical analysis techniques. Zoutendijk and Mitici [34] tailored a probabilistic delay estimation model for individual flights by employing random forest regression. Nonetheless, as different airports and airlines face different practical circumstances, most existing flight arrival delay models cannot be directly applied to the entire aviation network. Meanwhile, influencing factors associated with flight arrival delay from multiple dimensions are not sufficiently taken into account, although arrival delay may be induced by these varied and likely correlated factors at the network level. To fill these gaps, this study proposes an integrated framework incorporating deep learning and complex network theory to extract aviation network features at nodes and edges and further estimate the duration of flight arrival delay from a multi-airport, multi-airline perspective.
3. Methodology for Estimating the Duration of Flight Delay
3.1. Aviation Network Feature Extraction
The aviation network is a large-scale and highly interconnected system comprising numerous airports and flight routes. Capturing the spatial and operational characteristics of such a system solely through airport-level data is insufficient, as it overlooks the intricate interdependencies between airports and routes. To address this, we model the aviation network as a weighted directed graph $G = (N, E)$, where $N$ represents the set of airports (nodes) and $E$ denotes the set of flight routes (edges). The weight of each edge reflects the number of flights operating on that route during a specific time period. Formally, we define an adjacency matrix $A = [a_{ij}]$ for the weighted directed graph, where $a_{ij} = m > 0$ indicates that there are $m$ scheduled flights from airport $i$ to airport $j$; otherwise, $a_{ij} = 0$. This representation allows us to account for both the structural and operational aspects of the network, including potential parallel edges (multiple flights on the same route).
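The adjacency construction described above can be sketched in a few lines. The airport codes and flight list below are hypothetical; a real implementation would read them from a flight schedule table for the chosen time window.

```python
from collections import Counter

def build_adjacency(flights):
    """Weighted adjacency: a_ij = number of scheduled flights from i to j.
    Absent (i, j) pairs are implicitly zero."""
    return Counter((origin, dest) for origin, dest in flights)

# Hypothetical one-day flight list as (origin, destination) pairs.
flights = [("JFK", "LAX"), ("JFK", "LAX"), ("LAX", "ORD"), ("ORD", "JFK")]
A = build_adjacency(flights)
# a_{JFK,LAX} = 2 scheduled flights; parallel flights accumulate as weight.
```

Storing the matrix sparsely (as a counter keyed by route) keeps memory proportional to the number of active routes rather than the square of the number of airports.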
To extract meaningful spatial and operational features from this aviation network, we employ complex network theory, which offers a robust toolbox of metrics for analyzing the structure and dynamics of interconnected systems. Given the wide array of available metrics, our selection of specific measures—namely, node degree and edge betweenness centrality—is driven by their relevance to capturing key characteristics of delay propagation and network congestion, which are central to flight delay prediction. Below, we elaborate on the rationale behind these choices and address their implementation details.
Degree Centrality as a Measure of Airport Congestion. At the node level, we use the degree of an airport to quantify its activity level and potential for congestion, which is closely tied to delay initiation and propagation. The degree centrality [35,36] of airport $i$, denoted as $k_i$, is defined as the sum of incoming and outgoing flights: $k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}}$, where $k_i^{\mathrm{in}}$ is the number of flights arriving at airport $i$ and $k_i^{\mathrm{out}}$ is the number of departing flights. We interpret $k_i$ as an indicator of the congestion level at airport $i$, as higher degree values often correlate with increased operational complexity, resource constraints, and susceptibility to delays. This metric was chosen over other node-level measures (e.g., clustering coefficient or eigenvector centrality) because it directly reflects the volume of traffic, an immediate driver of operational bottlenecks in aviation networks, and is computationally efficient for large-scale analysis.
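Given the weighted adjacency structure, each airport's degree is one pass over the routes, summing flight counts into the in- and out-components. The mini-network below is hypothetical.

```python
from collections import Counter

def degree_centrality(adj):
    """k_i = k_i^in + k_i^out, summing flight counts over all routes."""
    deg = Counter()
    for (i, j), m in adj.items():
        deg[i] += m   # m outgoing flights of airport i
        deg[j] += m   # m incoming flights of airport j
    return deg

# Hypothetical weighted routes: (origin, destination) -> flight count.
adj = {("JFK", "LAX"): 2, ("LAX", "ORD"): 1, ("ORD", "JFK"): 1}
k = degree_centrality(adj)
# k_JFK = 2 outgoing + 1 incoming = 3
```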
Edge Betweenness Centrality as a Measure of Route Criticality. At the edge level, we adopt edge betweenness centrality [37] to assess the importance of specific flight routes in facilitating network connectivity and influencing delay propagation. Edge betweenness centrality measures the extent to which a given route lies on the shortest paths between all pairs of airports in the network. For a route $e \in E$, it is defined as
$$ C_B(e) = \sum_{s \neq d} \frac{\sigma_{sd}(e)}{\sigma_{sd}}, $$
where $s$ and $d$ represent source and destination airports, $\sigma_{sd}$ is the total number of shortest paths from $s$ to $d$, and $\sigma_{sd}(e)$ is the number of those shortest paths passing through route $e$. Here, shortest paths are computed based on topological distance (i.e., number of edges), unweighted by factors such as passenger volume or connecting times, as our focus is on structural connectivity rather than passenger-specific travel patterns. This choice aligns with our objective of identifying critical routes for delay propagation across the entire network during a given time window.
We acknowledge several nuances in this definition that merit clarification. First, since E represents individual flight routes and multiple flights may operate between the same pair of airports (resulting in parallel edges), each distinct route e may have its own betweenness value. However, in our implementation, we aggregate parallel edges into a single weighted edge for computational efficiency when calculating shortest paths, with weights inversely proportional to the number of flights (to reflect higher capacity). This aggregation avoids redundancy while preserving the structural importance of high-frequency routes. Second, we do not incorporate passenger ticket data or actual connection feasibility (e.g., minimum connecting times) into path calculations. While these factors are relevant for passenger-centric analyses, they are beyond the scope of this study, which prioritizes network-wide structural properties over individual travel itineraries. Lastly, edge betweenness centrality is computed within discrete time windows to capture temporal variations in network dynamics, ensuring that $C_B(e)$ reflects route criticality under specific operational conditions.
Our decision to use edge betweenness centrality over other edge-level metrics (e.g., edge clustering or load centrality) stems from its established ability to identify critical links in transportation networks (e.g., [37,38]). Routes with high betweenness centrality often act as bottlenecks or conduits for delay propagation, making this metric particularly suitable for understanding how local disruptions impact global network performance. By focusing on degree and betweenness centrality, we balance interpretability, computational feasibility, and relevance to delay prediction objectives.
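As a concrete illustration of the betweenness computation, the dependency-free sketch below counts shortest paths with BFS from every source (and, on the reversed graph, to every destination) and accumulates, for each ordered pair, the fraction of shortest paths crossing each directed edge. It is brute force and suited only to small graphs; the airport codes are hypothetical, and a production system would use an optimized algorithm such as Brandes' (or a graph library).

```python
from collections import defaultdict, deque

def bfs_counts(adj, s):
    """BFS from s: hop distance and number of shortest paths to each node."""
    dist, sigma = {s: 0}, defaultdict(int)
    sigma[s] = 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # another shortest path reaches v
    return dist, sigma

def edge_betweenness(adj):
    """C_B(e) = sum over s != d of sigma_sd(e) / sigma_sd (directed, unweighted)."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    radj = defaultdict(set)                         # reversed graph
    for u, vs in adj.items():
        for v in vs:
            radj[v].add(u)
    fwd = {s: bfs_counts(adj, s) for s in nodes}
    bwd = {d: bfs_counts(radj, d) for d in nodes}   # distances measured *to* d
    eb = defaultdict(float)
    for s in nodes:
        dist_s, sig_s = fwd[s]
        for d in nodes:
            if d == s or d not in dist_s:
                continue
            dist_d, sig_d = bwd[d]
            for u, vs in adj.items():
                for v in vs:
                    # edge (u, v) lies on an s-d shortest path iff the hops add up
                    if (u in dist_s and v in dist_d
                            and dist_s[u] + 1 + dist_d[v] == dist_s[d]):
                        eb[(u, v)] += sig_s[u] * sig_d[v] / sig_s[d]
    return dict(eb)

# Hypothetical mini-network: two one-stop routings from JFK to LAX.
adj = {"JFK": {"ORD", "ATL"}, "ORD": {"LAX"}, "ATL": {"LAX"}, "LAX": {"JFK"}}
eb = edge_betweenness(adj)
```

When a pair of airports is connected by several equally short paths, each path contributes fractionally, so competing routes share the betweenness mass.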
Through this complex network analysis, we obtain two spatial characteristics of the aviation network, the airport degree (a measure of crowdedness) and the edge betweenness centrality, which are then fed into the feature selection module described in the next section.
3.2. Influencing Factors Selection
A large number of factors increases model complexity, consumes excessive computational resources, and reduces generalization ability [39]. To resolve the dilemma posed by high-dimensional datasets, it is essential to perform feature selection before training the model. The feature selection module discovers and keeps crucial features with high explanatory power for the response variable. Hence, the dimension of the dataset is reduced by eliminating redundant and irrelevant attributes.
3.2.1. Extreme Gradient Boosting Machine (XGBoost)
XGBoost is an extensible tree boosting system that can be utilized as an embedded method to select the optimal feature subset. The performance of XGBoost is widely recognized in practice due to its fast learning and excellent accuracy [40]. In this study, we adopt XGBoost to perform feature selection. The procedure of the XGBoost algorithm is described in the following.
Given a dataset with $n$ samples and $m$ features, $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $\mathbf{x}_i \in \mathbb{R}^m$, the XGBoost prediction is the sum of the predictions of $P$ independent trees:
$$ \hat{y}_i = \sum_{p=1}^{P} f_p(\mathbf{x}_i), \quad f_p \in \mathcal{F}, $$
where $\mathcal{F}$ is the space of regression trees. Each $f_p$ is matched with an independent tree structure and leaf weights. To learn the set of functions in the tree ensemble, we solve the following regularized objective minimization problem:
$$ \mathcal{L} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{p=1}^{P} \Omega(f_p), $$
where $l(\cdot,\cdot)$ represents a differentiable convex loss function that measures the difference between the actual label $y_i$ and the prediction $\hat{y}_i$, and the additional regularization term $\Omega(\cdot)$ is used to prevent overfitting. The model in training round $t$ is fitted by creating a new tree while keeping the results of previous rounds, i.e., $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)$. The objective function in round $t$ can be denoted by
$$ \mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\big) + \Omega(f_t), $$
where $\Omega(f_t)$ reflects the model complexity and can be represented as $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \lVert \mathbf{w} \rVert^2$, with $T$ the number of leaves and $\mathbf{w}$ the leaf weights.
To find the optimal solution quickly, we take a second-order Taylor expansion of the loss function, which gives
$$ \mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i f_t^2(\mathbf{x}_i) \Big] + \Omega(f_t). $$
For simplicity, the term unrelated to the fit in the $t$-th training round, $l(y_i, \hat{y}_i^{(t-1)})$, can be ignored since its value is fixed. Here, $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ are the gradient and Hessian of the loss function, respectively.
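To make the second-order machinery concrete, the sketch below computes the gradient and Hessian for the squared loss $l = \tfrac{1}{2}(\hat{y} - y)^2$ and the resulting optimal weight of a single leaf, $w^* = -\sum g_i / (\sum h_i + \lambda)$, which follows from minimizing the quadratic per-leaf objective. The numbers are purely illustrative.

```python
import numpy as np

# Squared loss l(y, yhat) = 0.5 * (yhat - y)^2, so with respect to the
# current prediction yhat:
def grad_hess(y, yhat):
    g = yhat - y          # first derivative of the loss
    h = np.ones_like(y)   # second derivative (constant for squared loss)
    return g, h

# Optimal weight of a leaf containing a sample set, from the
# second-order objective: w* = -sum(g) / (sum(h) + lambda).
def leaf_weight(g, h, lam=1.0):
    return -g.sum() / (h.sum() + lam)

y = np.array([3.0, 5.0, 4.0])
yhat = np.zeros(3)               # round-0 predictions
g, h = grad_hess(y, yhat)
w = leaf_weight(g, h, lam=1.0)   # = 12 / (3 + 1) = 3.0
```

The same $g$ and $h$ statistics drive the split-gain formula that XGBoost reports as feature importance, which is what the selection step consumes.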
XGBoost is an advanced ensemble learning method with the following multiple hyperparameters to be tuned.
Number of estimators: the number of base estimators contained in the model. In general, too few weak estimators result in model underfitting, but too many weak estimators greatly increase the computational complexity.
Maximum depth: the maximum depth of the tree, or the number of splits from root node to leaf node. If the tree is deep, the tree-based model consumes large amounts of memory space and model overfitting is likely to occur.
Minimum child weight: the minimum sum of sample weights in the child node. The tree splitting process continues until the sum of sample weights in a leaf node is less than this value. XGBoost behaves overly conservative when the value is set to be too large.
Subsample: the proportion of samples used to train the model.
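BO–XGBoost ranks features by XGBoost's gain-based importance scores. As a dependency-free illustration of the same embedded-selection idea, the sketch below scores each feature by the best single-split (decision stump) gain in sum-of-squared-errors and keeps the top-$k$; the data, the stump proxy, and the helper names are ours, not the paper's implementation.

```python
import numpy as np

def stump_gain(x, y):
    """Best SSE reduction achievable by one split on feature x
    (a crude stand-in for tree-based gain importance)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    base = ((y - y.mean()) ** 2).sum()
    best = 0.0
    for t in range(1, len(y)):
        if xs[t] == xs[t - 1]:
            continue                      # cannot split between ties
        left, right = ys[:t], ys[t:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = max(best, base - sse)
    return best

def select_top_k(X, y, k):
    gains = [stump_gain(X[:, j], y) for j in range(X.shape[1])]
    return sorted(np.argsort(gains)[::-1][:k].tolist())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=200)   # only feature 1 matters
selected = select_top_k(X, y, 1)
```

In the actual pipeline, the importance scores come from the tuned XGBoost model rather than from stumps, but the keep-the-top-ranked-features step is the same.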
3.2.2. Bayesian Optimization
To efficiently seek the optimal hyperparameters of XGBoost, Bayesian optimization (BO) is employed in the feature selection module; BO has proved highly capable of handling unknown objective functions and computational complexity [41]. BO combines prior knowledge with current sampling points to approximate the posterior distribution of the unknown objective function by applying Bayes' theorem. Then, the next sampling point is selected on the basis of the posterior distribution [40,42].
Performing BO involves two main phases [41]. First, we select a prior over functions. The Gaussian process (GP) prior is often used to model the distribution of the objective function because of its flexibility and tractability, denoted as $f(\mathbf{x}) \sim \mathcal{GP}\big(\mu(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big)$. A GP has the property that any finite set of $Z$ points $\{\mathbf{x}_z\}_{z=1}^{Z}$ induces a multivariate Gaussian distribution on $\{f(\mathbf{x}_z)\}_{z=1}^{Z}$. Owing to the elegant marginalization properties of the Gaussian distribution, marginals and conditionals can be computed in closed form when taking the $z$-th point as the function value $f(\mathbf{x}_z)$. Second, the next assessment point is decided by choosing an acquisition function (AC). Based on the first stage, we suppose that $f$ is drawn from a GP prior and that the noisy observations are $\{(\mathbf{x}_z, y_z)\}_{z=1}^{Z}$ with $y_z = f(\mathbf{x}_z) + \varepsilon_z$, where $\varepsilon_z$ obeys the Gaussian distribution $\mathcal{N}(0, \sigma^2)$ with mean $0$ and noise variance $\sigma^2$. The prior information provided by the GP and the previous observations induce a posterior over functions. We then employ an acquisition function, denoted $\alpha(\mathbf{x})$, to determine at which point the objective should be evaluated next through the proxy optimization $\mathbf{x}_{\text{next}} = \arg\max_{\mathbf{x}} \alpha(\mathbf{x})$.
Herein, the GP upper confidence bound (GP-UCB) is chosen as our acquisition function. GP-UCB either exploits the region of the current optimal value (high-mean region) or explores unsampled regions (high-variance regions) to obtain the next sampling point. The GP-UCB function is written as
$$ \alpha_{\mathrm{UCB}}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa\, \sigma(\mathbf{x}), $$
where $\kappa$ makes the trade-off between exploitation and exploration, and $\mu(\mathbf{x})$ and $\sigma(\mathbf{x})$ are the posterior mean and standard deviation of the estimated function, respectively.
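The GP posterior and the UCB rule can be sketched in a few lines of numpy. This is a minimal illustration assuming a unit-variance RBF kernel, a fixed length scale, and a one-dimensional candidate grid; a real BO loop would maximize the marginal likelihood over kernel hyperparameters and search a multi-dimensional hyperparameter space.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Unit-variance RBF kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Gaussian-process posterior mean and standard deviation."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 0.0, None)
    return mu, np.sqrt(var)

def next_point_ucb(x_train, y_train, candidates, kappa=2.0):
    """GP-UCB: maximize mu(x) + kappa * sigma(x) over the candidates."""
    mu, sd = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mu + kappa * sd)]

# Three observed (hyperparameter, score) pairs; pick where to sample next.
x = np.array([0.1, 0.5, 0.9])
y = np.array([0.2, 1.0, 0.3])
grid = np.linspace(0.0, 1.0, 101)
x_next = next_point_ucb(x, y, grid)
```

Larger `kappa` pushes the rule toward unexplored, high-variance regions; smaller values keep it near the incumbent optimum.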
3.2.3. The Hybrid BO–XGBoost Method
As a hybrid feature selection approach, key explanatory variables are detected by calculating the feature importance scores of XGBoost. BO is embedded in the XGBoost method to find the optimal hyperparameter set, which enhances the feature selection performance. In the process of optimization, the most likely hyperparameter set given by the posterior distribution is used to evaluate the objective function [43]. In this study, we choose the mean squared error (MSE) as the objective function; the hyperparameter set that attains the minimum MSE is taken as the best configuration of the XGBoost method. The flowchart of the hybrid BO–XGBoost feature selection method is shown in Figure 2.
3.3. Deep Learning
While BO–XGBoost effectively reduces the feature dimension by eliminating irrelevant noise, it primarily captures the axis-aligned, piecewise decision boundaries characteristic of trees. Flight delays in a large-scale network, however, exhibit highly complex, deep non-linear patterns driven by the topological structure of the aviation network. To capture these dynamics, simple regression is insufficient. Instead, we couple BO–XGBoost end-to-end with a deep learning module for arrival delay estimation, which is designed by stacking multiple extreme learning machine auto-encoders (ELM-AE) and a kernel extreme learning machine (KELM).
3.3.1. Kernel Extreme Learning Machine (KELM)
Extreme learning machine (ELM) is a single-hidden-layer feed-forward network [44]. In the conventional ELM, the input weights $\mathbf{W}$ and biases $\mathbf{b}$ are randomly generated based on the number of neurons $l$ in the hidden layer, and the output weights $\boldsymbol{\beta}$ are estimated by the least squares method [45]. Let $\mathbf{x}$ be the vector of input variables; the output variable $y$ is formulated as
$$ y = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = g(\mathbf{W}\mathbf{x} + \mathbf{b})^{\top}\boldsymbol{\beta}, \quad (7) $$
where $\top$ represents the transpose, $\mathbf{h}(\mathbf{x})$ denotes the output of the hidden layer, $g(\cdot)$ represents the activation function, and $\boldsymbol{\beta}$ denotes the output weights; $\mathbf{W}$ and $\mathbf{b}$ are the input weights and biases, respectively. Suppose there are $n$ samples in the dataset; Equation (7) can be expressed in matrix form as
$$ \mathbf{H}\boldsymbol{\beta} = \mathbf{Y}. \quad (8) $$
The solution for $\boldsymbol{\beta}$ is obtained by minimizing the following constrained optimization problem:
$$ \min_{\boldsymbol{\beta},\, \boldsymbol{\xi}} \ \frac{1}{2}\lVert\boldsymbol{\beta}\rVert^2 + \frac{C}{2}\lVert\boldsymbol{\xi}\rVert^2 \quad \text{s.t.} \quad \mathbf{H}\boldsymbol{\beta} = \mathbf{Y} - \boldsymbol{\xi}, \quad (9) $$
where $\boldsymbol{\xi}$ is the vector of errors of the ELM output and $C$ is a regularization parameter. ELM improves the generalization capability and avoids data overfitting by jointly reducing the norm of the output weights and the training errors [46]. By the KKT theorem, Equation (9) is denoted equivalently as
$$ L = \frac{1}{2}\lVert\boldsymbol{\beta}\rVert^2 + \frac{C}{2}\lVert\boldsymbol{\xi}\rVert^2 - \boldsymbol{\alpha}^{\top}\big(\mathbf{H}\boldsymbol{\beta} - \mathbf{Y} + \boldsymbol{\xi}\big), \quad (10) $$
where $\boldsymbol{\alpha}$ indicates the Lagrange multipliers. The value of $\boldsymbol{\beta}$ is obtained from the optimality conditions as
$$ \boldsymbol{\beta} = \mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}, \quad (11) $$
where $\mathbf{I}$ is an $n$-dimensional identity matrix. Then, the output function of ELM corresponding to Equation (7) is
$$ f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}. \quad (13) $$
Therefore, the total output vector for the $n$ input data points can be written in either of two equivalent forms:
$$ \mathbf{F} = \mathbf{H}\mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}, \quad (12a) $$
$$ \mathbf{F} = \mathbf{H}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\Big)^{-1}\mathbf{H}^{\top}\mathbf{Y}. \quad (12b) $$
Theoretically, there is no strict rule on which of the alternative forms (Equations (12a) and (12b)) should be used for a given dataset size; both can be applied in different circumstances, but their computational costs and efficiency differ. According to the study in [47], the number of hidden neurons $l$ has little effect on the generalization ability of ELM, and ELM exhibits favorable performance as long as $l$ is large enough. When the size of the input data is quite large ($n \gg l$), one may be inclined to use Equation (12b) to decrease computational costs, since it inverts an $l \times l$ rather than an $n \times n$ matrix. Nevertheless, when the explicit feature mapping $\mathbf{h}(\mathbf{x})$ is unknown, one may tend to use Equation (12a).
To cope with an unknown feature mapping $\mathbf{h}(\mathbf{x})$, KELM is presented by adding a kernel function to the ELM [47]. KELM uses the kernel function to represent the non-linear feature mapping of the hidden layer. The kernel function satisfies Mercer's conditions and is convenient to use; common kernel functions are the radial basis function (RBF), the Sigmoid kernel, and the Polynomial kernel (Poly). We define the kernel matrix for ELM as
$$ \boldsymbol{\Omega}_{\mathrm{ELM}} = \mathbf{H}\mathbf{H}^{\top}, \qquad \Omega_{i,j} = \mathbf{h}(\mathbf{x}_i) \cdot \mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j). \quad (14) $$
Therefore, the estimated output of KELM based on Equation (14) is obtained as
$$ f(\mathbf{x}) = \big[K(\mathbf{x}, \mathbf{x}_1), \ldots, K(\mathbf{x}, \mathbf{x}_n)\big]\Big(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\Big)^{-1}\mathbf{Y}. \quad (15) $$
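The KELM training and prediction rules above reduce to one linear solve and one kernel product, which the following numpy sketch implements with an RBF kernel. The class name, hyperparameter values, and synthetic data are illustrative, not the paper's configuration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2) for row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel ELM: alpha = (I/C + Omega)^(-1) Y, f(x) = [K(x, x_i)] alpha."""
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        omega = rbf_kernel(X, X, self.gamma)          # Omega_ELM
        n = len(X)
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, y)
        return self

    def predict(self, X_new):
        return rbf_kernel(X_new, self.X, self.gamma) @ self.alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2                    # smooth toy target
model = KELM(C=1000.0, gamma=2.0).fit(X, y)
```

The regularizer $\mathbf{I}/C$ keeps the solve well conditioned; larger $C$ moves the fit toward interpolation of the training data.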
3.3.2. Multi-Layer Extreme Learning Machine (ML-ELM)
ELM, due to its single-hidden-layer structure, has limited modeling accuracy and stability, especially for the evaluation of high-volume and high-dimensional data [48]. To explore the relationship between input and output in complex cases, ML-ELM is introduced; its structure is shown in Figure 3. ML-ELM is a multi-layer feed-forward neural network that combines the high efficiency of ELM with the deep structure of the auto-encoder (ELM-AE for short). The input dimension of ELM-AE equals its output dimension, so the unsupervised learning mode is naturally transformed into a supervised learning mode [49]. ELM-AE projects the input data to the hidden layer, with orthogonal random input weights and biases ($\mathbf{W}^{\top}\mathbf{W} = \mathbf{I}$, $\mathbf{b}^{\top}\mathbf{b} = 1$). The output weights $\boldsymbol{\beta}$ are computed by
$$ \boldsymbol{\beta} = \Big(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\Big)^{-1}\mathbf{H}^{\top}\mathbf{X}, \quad (16) $$
where $\mathbf{X}$ is the input matrix. The learned feature map can be inversely restored to the raw input data via the output weights $\boldsymbol{\beta}$; conversely, the transpose of $\boldsymbol{\beta}$ maps the input data to the compressed features [50]. Using this principle, ML-ELM stacks multiple ELM-AEs and obtains the weight matrices of the hidden layers through layer-by-layer training from bottom to top [51]. The output of ML-ELM at the $i$-th ($i \geq 2$) layer is expressed as
$$ \mathbf{H}_i = g\big(\mathbf{H}_{i-1}\boldsymbol{\beta}_i^{\top}\big), \quad (17) $$
where $\mathbf{H}_{i-1}$ is the output of the previous layer. From Equation (17), random weights and biases are no longer required from the second hidden layer onward. Hence, ML-ELM addresses the issue of random weights and biases caused by the single-hidden-layer network in ELM. Meanwhile, the deep structure of ML-ELM ensures that features are fully learned and key information is effectively extracted.
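One ELM-AE and the layer-wise stacking can be sketched as follows. The layer sizes, regularization value, and random data are illustrative; the sketch assumes each layer's width does not exceed its input dimension so that the random input weights can be orthogonalized by QR.

```python
import numpy as np

def elm_ae_weights(X, n_hidden, C=1e3, seed=0):
    """Train one ELM auto-encoder: orthogonalized random input weights,
    then beta = (I/C + H^T H)^(-1) H^T X. beta^T maps inputs to the
    compressed feature space."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    W, _ = np.linalg.qr(W)                 # orthogonal input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

def ml_elm_features(X, layer_sizes, C=1e3):
    """Stack ELM-AEs layer by layer: H_i = g(H_{i-1} beta_i^T)."""
    H = X
    for i, size in enumerate(layer_sizes):
        beta = elm_ae_weights(H, size, C=C, seed=i)
        H = np.tanh(H @ beta.T)            # no new random weights needed
    return H

X = np.random.default_rng(2).normal(size=(50, 12))
Z = ml_elm_features(X, layer_sizes=[8, 4])
# Z: (50, 4) compressed, layer-wise learned representation of X
```

Each layer is solved in closed form, so the whole stack trains without backpropagation; only the first projection per layer is random.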
3.3.3. Deep Kernel Extreme Learning Machine (DKELM)
Although DKELM has been applied in other domains (e.g., [50]), its adaptation to flight delay estimation, fed with aviation network features selected via BO–XGBoost, represents a novel application architecture that addresses the unique challenges of high-dimensional, spatially interdependent delay data in large-scale aviation systems. The deep kernel extreme learning machine combines two modules: ML-ELM for feature learning and KELM for decision making [50]. ML-ELM employs ELM-AE to train the parameters of the hidden layers and obtains high-level features by stacking multiple ELM-AEs. The output of ML-ELM is then utilized as the input of KELM to generate the predictions, and KELM yields accurate estimates with little user intervention. The training principle of the DKELM method is illustrated intuitively in Figure 4. The input features are abstracted and compressed by k hidden layers to obtain more compact features; then, the kernel function represents the non-linear mapping of the compressed features.
Based on the elaboration of the above techniques, we are now in a position to present the process of flight arrival delay estimation in intelligent aviation systems. Our proposed method couples the high efficiency of BO–XGBoost in feature selection with the superior estimation capability of DKELM on high-volume, high-dimensional data. Algorithm 1 depicts the main procedure of the proposed BO–XGBoost–DKELM method for estimating flight arrival delay.
The proposed model estimates flight arrival delay based on historical patterns learned from real operational data, which naturally embed physical and operational constraints (e.g., maximum turnaround times, airport curfews, and air traffic control regulations). While the model itself does not impose explicit upper bounds on predicted delays, its regularized structure and kernel-based smoothing minimize the risk of generating infeasible estimates. In practice, should a predicted delay exceed a reasonable operational threshold, a post-processing step can be applied to align the output with airline-specific or airport-specific feasibility limits.
| Algorithm 1: The main procedures of the BO–XGBoost–DKELM method |
1: Input: Raw flight dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ denotes the feature vector of flight $i$ and $y_i$ denotes its actual arrival delay (in minutes); search space $\Theta$ for XGBoost hyperparameters; maximum BO iterations $T_{max}$; DKELM hyperparameters: number of hidden layers $k$, kernel type $K(\cdot,\cdot)$, and regularization parameter $C$.
2: Output: Trained DKELM model $f(\cdot)$.
3: Step 1. Preprocessing:
4:   Encode nominal features in $D$ into numerical format.
5:   Apply standard normalization: $\tilde{x} = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the empirical mean and standard deviation.
6:   Split $D$ into a training set $D_{train}$ (70%) and a test set $D_{test}$ (30%).
7: Step 2. Feature Selection via BO–XGBoost:
8:   Initialize a Gaussian Process (GP) prior over $\Theta$.
9:   for $t = 1$ to $T_{max}$ do
10:    Sample a hyperparameter configuration $\theta_t$ using the GP-UCB acquisition function.
11:    Train an XGBoost model on $D_{train}$ with hyperparameters $\theta_t$.
12:    Compute the validation MSE $L(\theta_t)$.
13:    Update the GP posterior with $(\theta_t, L(\theta_t))$.
14:  end for
15:  Select the optimal $\theta^{*} = \arg\min_{t} L(\theta_t)$.
16:  Train the final XGBoost model with $\theta^{*}$ on the full $D_{train}$.
17:  Compute feature importance scores for all features.
18:  Select the top-$m$ features $F^{*}$, where $m$ is chosen to minimize the validation MSE.
19:  Retain only the features in $F^{*}$.
20: Step 3. Estimation:
21:  Construct the compressed representation using $k$ stacked ELM-AE layers.
22:  Train the KELM regressor on the compressed features with kernel $K$ and parameter $C$.
23:  Return the final model $f(\cdot)$.
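The Bayesian-optimization loop of Step 2 in Algorithm 1 can be illustrated on a toy one-dimensional objective. The NumPy sketch below is not the authors' implementation: it replaces the XGBoost validation-MSE objective with a simple quadratic stand-in and uses a lower-confidence-bound acquisition (the minimization form of GP-UCB); all names, kernel length scales, and constants are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel over scalar hyperparameter values
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian Process posterior mean and variance at query points Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.einsum('ij,ij->j', Ks, v), 1e-12, None)
    return mu, var

def bo_minimize(f, bounds=(0.0, 1.0), n_init=3, n_iter=12, kappa=2.0, seed=0):
    # GP-UCB style loop: evaluate where the lower confidence bound
    # mu - kappa*sigma is smallest, update the posterior, return the best point.
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[0], bounds[1], size=n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(bounds[0], bounds[1], 200)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        acq = mu - kappa * np.sqrt(var)        # lower confidence bound
        x_next = grid[np.argmin(acq)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
```

In the actual pipeline, each evaluation of `f` would correspond to lines 11–12 of Algorithm 1: training an XGBoost model with the sampled configuration and returning its validation MSE.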
5. Conclusions
The growing concern about aviation operational efficiency and passenger satisfaction makes accurate estimation of the duration of flight delay vital for sustainable aviation management. In this paper, we propose an integrated architecture that combines deep learning with complex network theory for accurately estimating the duration of flight delay. Drawing on complex network theory, the feature selection module adopts the XGBoost algorithm to select aviation network features at the node and edge levels. To extract explanatory features in a more objective and accurate manner, Bayesian optimization is utilized to fine-tune the XGBoost algorithm's hyperparameters. Then, the modified multi-layer extreme learning machine, DKELM, uses these extracted features as input indicators, incorporating extreme learning machine autoencoders into the deep network to form a stacked architecture for robust and reliable estimation. Case studies are conducted based on a high-dimensional flight database from the U.S. Bureau of Transportation Statistics. Upon comparison with commonly used benchmark methods, the integrated framework demonstrates clear advantages in estimating the duration of flight delays. The conclusions are drawn as follows.
We conduct quantitative analysis of explanatory factors that affect the duration of flight delay using the XGBoost algorithm with Bayesian optimization. Our findings indicate that flight arrival delay is closely related to features such as departure delay, delay by national aviation system, crowdedness degree of airport, edge betweenness centrality, cancelled code, and delay by carrier. Afterwards, DKELM can accurately estimate flight arrival delay by utilizing these explanatory factors as input indicators.
By employing the proposed integrated framework, we can estimate the duration of flight delay from a multi-airport, multi-airline perspective based on the five-year U.S. flight dataset. According to our case studies, the proposed method achieves an average delay error of 3.36 min. Compared with benchmark methods, the integrated method demonstrates the highest precision, with an average improvement of 7.8%.
As per the findings of this study, the aviation administration could consider several proactive measures, such as iteratively upgrading the aviation navigation system with advanced algorithms and data analytics, expanding key infrastructure at hub airports, implementing appropriate priority access measures for airlines with higher edge betweenness centrality, and enhancing real-time passenger information systems. These managerial insights are beneficial for enhancing operational efficiency and management capacity, thereby ensuring sustainable aviation development in the future.
Admittedly, this study comes with some limitations. Though the BTS dataset includes aircraft tail numbers, it lacks explicit labels identifying adjacent flights, which would directly map the sequence of flights operated by the same aircraft. As a result, the current model forecasts arrival delays using only flight-specific features and does not examine whether predicted delays render subsequent flight schedules operationally infeasible. In other words, the data-driven approach predicts delay duration from network features but does not explicitly enforce physical constraints on delay propagation to the subsequent flight (e.g., when the predicted arrival delay exceeds the turnaround buffer). When datasets with explicit labels identifying adjacent flights become available, future work could explore methods such as physics-informed machine learning to embed such upper limits and mitigate potential infeasibilities in turnaround scheduling between connected flights. In addition, future work could explore the applicability of this framework to other modes of transport, such as railway and road transport, which could yield further insights for the broader field of multi-modal transportation management.
On the other hand, a persistent challenge in deep learning is ensuring robustness against out-of-distribution (OOD) scenarios and zero-sample cases, where the model encounters patterns not present during training. In complex network systems, enhancing the model's ability to generalize to unseen data regimes is critical, particularly for black swan events (e.g., unprecedented extreme weather or new route openings). Recent methodological advancements have begun to address these limitations through novel architectures and data augmentation strategies, such as the studies in [59,60,61,62], which employ OOD methods or meta-learning to extract robust features even in the absence of target samples. Although our proposed framework relies on supervised learning from historical aviation data, acknowledging these advancements in OOD robustness and zero-sample learning offers a valuable perspective on handling the sporadic and extreme nature of rare delay events that fall outside standard statistical distributions. To this end, our future research could integrate OOD robustness and zero-sample learning capabilities into the methodology, enabling it to efficiently address unexpected events in aviation systems.