1. Introduction
The aviation industry plays a crucial role in modern transportation systems, with its efficiency profoundly influencing national and regional economic activities and societal well-being [1,2,3,4]. Modern aviation systems are under considerable strain due to multiple factors, including constrained flight space resources amidst the ever-increasing demand for air transportation, as well as other operational challenges such as aircraft maintenance issues, weather conditions, and aircraft availability [5,6,7]. These factors collectively contribute to widespread and frequent flight delays. The duration of flight delay is considered a critical indicator of service quality within the aviation industry. As per the regulation of the Civil Aviation Administration of China (CAAC), flights arriving more than 15 min later than the scheduled time are viewed as delayed. Following this rule, in 2019, on-time ratios for arrival flights were only 78.96% in the United States and 81.66% in China [8,9]. This implies that the proportion of delayed flights during this period was around 20.5% in the United States and 17.7% in China, posing consequences that chronically plague both airlines and passengers, such as economic losses and dissatisfied passenger experiences [10,11,12]. According to the Federal Aviation Administration (FAA), in 2019 alone, flight delays resulted in an estimated $32.9 billion in costs to airlines, airports, and passengers [13]. Such high economic losses and negative passenger experiences motivate the research community to investigate flight delay duration for aviation management.
In aviation systems, the estimation of flight delay duration is a necessary ingredient for airport regulators, aviation authorities, and airlines prior to designing strategies for air traffic control (ATC) and operational management. Consider arrival delays, one of the most common types of flight delay, as an illustration. Figure 1 depicts flight arrival delay by contrasting the scheduled flight trajectory flow with the real one for a single sector (a one-way flight from origin to destination). A sector usually consists of five parts: the taxi-out stage (the aircraft moves away from the gate at the origin airport to prepare for take-off), the airborne stage (the aircraft is completely off the ground and flying), the taxi-in stage (the aircraft lands at the destination airport and moves towards the gate), the turn-around stage (the period between taxi-in and the next taxi-out for a subsequent sector, involving activities such as passenger disembarkation and boarding), and taxi-out again for the next flight. As shown in Figure 1, arrival delay refers to the gap between the real gate-in time (when the aircraft arrives at the gate of the destination airport) and the scheduled gate-in time. Based on a poll, 80% of respondents expressed annoyance with flight arrival delays and believed that the accuracy of delay prediction should be improved [14]. While accurate delay prediction is already practiced by airlines for day-of-operations management (e.g., prioritizing flights with many connecting passengers), this study provides a network-level perspective for stakeholders such as aviation authorities and multi-airport coordinators, who can use these predictions to inform long-term strategic planning and resource allocation across multiple airports and airlines. Furthermore, enhanced accuracy in delay estimation can indirectly contribute to delay prevention by informing more realistic flight scheduling and proactive traffic management. Consequently, it is imperative to develop an accurate method for estimating the duration of flight arrival delay to alleviate public anxiety, enable efficient air traffic control strategies, and ultimately improve airline service satisfaction while reducing unnecessary economic losses.
To date, numerous efforts have been dedicated to investigating the duration of flight delay, which can be divided into two streams: single-airport-level research [15,16,17] and single-airline-level research [18,19]. For instance, Yu et al. [17] established a model framework using a deep belief network to evaluate flight delays at Beijing Capital International Airport, achieving satisfactory results. Similarly, Chakrabarty et al. [19] evaluated the flight arrival delays of American Airlines by analyzing the performance of random forest (RF), K-Nearest Neighbor (KNN), support vector machine (SVM), and gradient boosting classifier (GBC) models. These studies presented beneficial methods for exploring flight delay duration at a single airport or for a single airline. However, different airports and airlines operate under varying conditions, and these existing methods are not directly applicable to estimating flight delay duration at the network level, which involves interactions across multiple airports and airlines. In this context, "network level" refers to an interconnected system where airports represent nodes (points of operation) and flight routes represent edges (connections between nodes). Therefore, an accurate method that incorporates aviation network features from a multi-airport, multi-airline perspective is needed.
Several factors potentially influence flight delay at the multi-airport, multi-airline level. The U.S. Bureau of Transportation Statistics classifies these factors into five groups, namely air carrier, security, late-arriving aircraft, national aviation system, and extreme weather [8,20,21]. To accurately estimate the duration of flight delay at the network level, two main challenges need to be addressed. First, historical flight delay data are generally high-volume and high-dimensional, exhibiting diverse possible causes of the resulting delay. At the same time, the data may contain redundant information, which not only increases the difficulty of extracting aviation network features with high explanatory power but also degrades estimation accuracy. Second, relationships between causal factors and delay are, by and large, intertwined, and the effects of correlations among causal factors are highly non-linear. As a result, investigating the underlying influencing mechanism of flight delay is a non-trivial task.
To surmount the above two challenges, this study proposes a novel end-to-end methodological framework tailored to network-level flight arrival delay duration estimation across multiple airports and airlines that synergizes complex network theory with deep learning. The proposed framework follows a selection–abstraction–prediction pipeline designed for high-dimensional and non-linear aviation data. First, in the selection phase, based on complex network theory, we adopt the XGBoost algorithm to select aviation network features at the node and edge levels and extract explanatory variables. Bayesian optimization (BO) is utilized to fine-tune the hyperparameters of XGBoost, forming the BO–XGBoost algorithm. BO is chosen for its ability to efficiently optimize expensive black-box functions: by modeling the objective function with a Gaussian process [22], it strategically selects hyperparameters that balance exploration of unknown regions and exploitation of promising areas, thereby accelerating convergence and improving model performance compared with conventional search techniques. Second, the abstraction phase employs a modified multi-layer extreme learning machine (ML-ELM) to compress the selected features into high-level latent representations, effectively denoising the inputs. Finally, the prediction phase utilizes a deep kernel extreme learning machine (DKELM) to perform the regression. This architecture specifically addresses the non-trivial challenge of capturing complex, multi-scale features in network delay data, which standard baselines lacking this hierarchical abstraction capability fail to resolve. The main contributions of this study are summarized as follows:
We develop a novel end-to-end methodological framework that uniquely fuses complex network-derived topological features (node degree and edge betweenness centrality) with the BO–XGBoost for feature selection and a customized DKELM for delay estimation. Their synergistic integration, specifically designed to capture network-level delay propagation mechanisms in a multi-airport, multi-airline system, constitutes the core methodological novelty of this study. The proposed method considers a broad scope of influencing factors associated with flight delays and targets stakeholders such as aviation authorities and multi-airport coordinators who can leverage these predictions for strategic decision-making.
In the feature selection module, complex network theory is utilized to reveal spatial features of the aviation network topology at both nodes (airports) and edges (flight routes). By utilizing Bayesian optimization, hyperparameters of the XGBoost algorithm are fine-tuned to extract explanatory features. In the deep learning module, a DKELM built upon a modified multi-layer extreme learning machine is designed, in which extreme learning machine auto-encoders are stacked to form a deep architecture for robust and reliable estimation.
The case studies demonstrate the superior performance of the proposed integrated framework when compared with several benchmark methods on a five-year U.S. flight dataset. Furthermore, the results of this study provide valuable insights on proactive measures for the aviation administration, thereby ensuring sustainable aviation development.
The remainder of this study is organized as follows. Section 2 reviews the literature pertaining to flight delay. Section 3 presents the integrated framework for estimating the duration of flight delay in detail. In Section 4, we conduct numerical experiments using a high-dimensional flight database from the U.S. Bureau of Transportation Statistics. Finally, conclusions and future research directions are given in Section 5.
2. Literature Review
Accurate estimation of flight delay duration is increasingly recognized as a fundamental aspect of aviation management, which can help aviation authorities achieve efficient resource scheduling, improve airline service, and enhance passenger satisfaction [8,13]. A number of studies have investigated flight delay using various methodologies, including operational research, probabilistic models, and statistical analysis [23]. Operational research employs optimization-based techniques, mathematical simulation, or queueing theory to model air traffic operations. For instance, Hansen [24] focused on the congestion problem at airport runways and developed a deterministic queuing model to analyze the delay propagation effects of a specific arriving flight on subsequent flights at Los Angeles International Airport. Pyrgiotis et al. [10] proposed a probabilistic model to estimate the probability of flight delay based on historical data or by fitting a suitable distribution to describe delay patterns. For analyzing interactions between delays at different airports, Hao et al. [25] created a regression model combining econometric and simulation methods to quantify the economic and operational impact of New York's three major commercial airports on regional air traffic and local economies. Additionally, Pathomsiri et al. [26] introduced a non-parametric function to assess undesirable outputs of airports, including delayed flights, based upon data from 56 U.S. airports during 2000–2003. While these methods provide valuable insights into delay prediction, they often focus on single airports or specific operational aspects and may not fully capture network-wide interactions or achieve high predictive accuracy in multi-airport, multi-airline contexts.
More recently, data-driven machine learning methods have been put forward for flight delay estimation, building on the extensive set of features identified in prior work [27,28]. Many features, such as weather conditions, air traffic volume, airport capacity, and historical delay patterns, have been tested in the extant literature. This paper adopts a feature selection approach over a similarly large set of features, ensuring that the selected variables align with those commonly explored in prior studies (e.g., weather, traffic data, and departure delay) while also incorporating network-level attributes (e.g., node-specific airport characteristics and edge-specific route features) to address multi-airport, multi-airline dynamics. For example, Choi et al. [29] integrated historical weather and traffic data to examine whether a scheduled flight would be delayed using various machine learning methods, including decision trees (DT) and K-Nearest Neighbor (KNN). Guo et al. [30] explored flight delay by employing the maximal information coefficient and random forest regression. Güvercin et al. [31] investigated representative time series for each airport using a clustered airport modeling method to estimate flight delay. Additionally, Cai et al. [32] developed a graph convolutional neural network model aimed at estimating the duration of flight departure delay. These studies inform the methodological choices in this paper, particularly the adoption of advanced machine learning techniques (e.g., deep learning) combined with complex network theory to capture both local and global dependencies in the aviation system, which are often underexplored in single-airport or single-airline studies. Importantly, our end-to-end framework goes beyond these approaches by explicitly modeling the aviation system as a weighted directed graph and extracting topological features such as node degree and edge betweenness centrality. These network-level indicators, when combined with a deep kernel extreme learning machine, enable more accurate delay estimation across the entire network, as demonstrated by our experimental results showing a 7.8% improvement over benchmark methods.
Prior studies can be further separated into two branches according to the research object: departure delay and arrival delay. The first branch primarily concerns flight delay before flights take off. Flight delay models in this branch consider the utilization of gates, runways, and airspace resources, which play a crucial role in aircraft scheduling, gate management, and the punctuality of subsequent flights. The second branch, as illustrated in Figure 1, encompasses delays throughout the flight lifecycle, including departure delay at the originating airport. This aspect is vital for managing the receiving airport's capacity and coordinating subsequent ground services. Furthermore, arrival delay is more closely related to passenger satisfaction, as passengers may need to update their subsequent itineraries according to the duration of arrival delay. The factors considered in flight delay models differ between these two branches. Several scholars have investigated flight arrival delay. For instance, Mueller and Chatterji [33] utilized normal distributions to obtain the probability of arrival delay. Abdel-Aty et al. [15] detected the periodic patterns and associated factors of arrival delay with frequency-related statistical analysis techniques. Zoutendijk and Mitici [34] tailored a probabilistic delay estimation model for individual flights by employing random forest regression. Nonetheless, as different airports and airlines face different practical circumstances, most existing flight arrival delay models cannot be directly applied to the entire aviation network. Meanwhile, influencing factors associated with flight arrival delay from multiple dimensions are not sufficiently taken into account, although arrival delay may be induced by these varied and likely correlated factors at the network level. To fill these gaps, this study proposes an integrated framework incorporating deep learning and complex network theory to extract aviation network features at nodes and edges and further estimate the duration of flight arrival delay from a multi-airport, multi-airline perspective.
3. Methodology for Estimating the Duration of Flight Delay
3.1. Aviation Network Feature Extraction
The aviation network is a large-scale and highly interconnected system comprising numerous airports and flight routes. Capturing the spatial and operational characteristics of such a system solely through airport-level data is insufficient, as it overlooks the intricate interdependencies between airports and routes. To address this, we model the aviation network as a weighted directed graph $G = (N, E)$, where $N$ represents the set of airports (nodes) and $E$ denotes the set of flight routes (edges). The weight of each edge reflects the number of flights operating on that route during a specific time period. Formally, we define an adjacency matrix $A = [a_{ij}]$ for the weighted directed graph, where $a_{ij} = m > 0$ indicates that there are $m$ scheduled flights from airport $i$ to airport $j$; otherwise, $a_{ij} = 0$. This representation allows us to account for both the structural and operational aspects of the network, including potential parallel edges (multiple flights on the same route).
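The adjacency construction described above can be sketched in a few lines. The airport codes and flight list below are hypothetical; a real implementation would read them from a flight schedule table for the chosen time window.

```python
from collections import Counter

def build_adjacency(flights):
    """Weighted adjacency: a_ij = number of scheduled flights from i to j.
    Absent (i, j) pairs are implicitly zero."""
    return Counter((origin, dest) for origin, dest in flights)

# Hypothetical one-day flight list as (origin, destination) pairs.
flights = [("JFK", "LAX"), ("JFK", "LAX"), ("LAX", "ORD"), ("ORD", "JFK")]
A = build_adjacency(flights)
# a_{JFK,LAX} = 2 scheduled flights; parallel flights accumulate as weight.
```

Storing the matrix sparsely (as a counter keyed by route) keeps memory proportional to the number of active routes rather than the square of the number of airports.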
To extract meaningful spatial and operational features from this aviation network, we employ complex network theory, which offers a robust toolbox of metrics for analyzing the structure and dynamics of interconnected systems. Given the wide array of available metrics, our selection of specific measures—namely, node degree and edge betweenness centrality—is driven by their relevance to capturing key characteristics of delay propagation and network congestion, which are central to flight delay prediction. Below, we elaborate on the rationale behind these choices and address their implementation details.
Degree Centrality as a Measure of Airport Congestion. At the node level, we use the degree of an airport to quantify its activity level and potential for congestion, which is closely tied to delay initiation and propagation. The degree centrality [35,36] of airport $i$, denoted as $k_i$, is defined as the sum of incoming and outgoing flights: $k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}}$, where $k_i^{\mathrm{in}}$ is the number of flights arriving at airport $i$ and $k_i^{\mathrm{out}}$ is the number of departing flights. We interpret $k_i$ as an indicator of the congestion level at airport $i$, as higher degree values often correlate with increased operational complexity, resource constraints, and susceptibility to delays. This metric was chosen over other node-level measures (e.g., clustering coefficient or eigenvector centrality) because it directly reflects the volume of traffic, an immediate driver of operational bottlenecks in aviation networks, and is computationally efficient for large-scale analysis.
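Given the weighted adjacency structure, each airport's degree is one pass over the routes, summing flight counts into the in- and out-components. The mini-network below is hypothetical.

```python
from collections import Counter

def degree_centrality(adj):
    """k_i = k_i^in + k_i^out, summing flight counts over all routes."""
    deg = Counter()
    for (i, j), m in adj.items():
        deg[i] += m   # m outgoing flights of airport i
        deg[j] += m   # m incoming flights of airport j
    return deg

# Hypothetical weighted routes: (origin, destination) -> flight count.
adj = {("JFK", "LAX"): 2, ("LAX", "ORD"): 1, ("ORD", "JFK"): 1}
k = degree_centrality(adj)
# k_JFK = 2 outgoing + 1 incoming = 3
```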
Edge Betweenness Centrality as a Measure of Route Criticality. At the edge level, we adopt edge betweenness centrality [37] to assess the importance of specific flight routes in facilitating network connectivity and influencing delay propagation. Edge betweenness centrality measures the extent to which a given route lies on the shortest paths between all pairs of airports in the network. For a route $e \in E$, it is defined as
$$ C_B(e) = \sum_{s \neq d} \frac{\sigma_{sd}(e)}{\sigma_{sd}}, $$
where $s$ and $d$ represent source and destination airports, $\sigma_{sd}$ is the total number of shortest paths from $s$ to $d$, and $\sigma_{sd}(e)$ is the number of those shortest paths passing through route $e$. Here, shortest paths are computed based on topological distance (i.e., number of edges), unweighted by factors such as passenger volume or connecting times, as our focus is on structural connectivity rather than passenger-specific travel patterns. This choice aligns with our objective of identifying critical routes for delay propagation across the entire network during a given time window.
We acknowledge several nuances in this definition that merit clarification. First, since E represents individual flight routes and multiple flights may operate between the same pair of airports (resulting in parallel edges), each distinct route e may have its own betweenness value. However, in our implementation, we aggregate parallel edges into a single weighted edge for computational efficiency when calculating shortest paths, with weights inversely proportional to the number of flights (to reflect higher capacity). This aggregation avoids redundancy while preserving the structural importance of high-frequency routes. Second, we do not incorporate passenger ticket data or actual connection feasibility (e.g., minimum connecting times) into path calculations. While these factors are relevant for passenger-centric analyses, they are beyond the scope of this study, which prioritizes network-wide structural properties over individual travel itineraries. Lastly, edge betweenness centrality is computed within discrete time windows to capture temporal variations in network dynamics, ensuring that $C_B(e)$ reflects route criticality under specific operational conditions.
Our decision to use edge betweenness centrality over other edge-level metrics (e.g., edge clustering or load centrality) stems from its established ability to identify critical links in transportation networks (e.g., [37,38]). Routes with high betweenness centrality often act as bottlenecks or conduits for delay propagation, making this metric particularly suitable for understanding how local disruptions impact global network performance. By focusing on degree and betweenness centrality, we balance interpretability, computational feasibility, and relevance to delay prediction objectives.
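As a concrete illustration of the betweenness computation, the dependency-free sketch below counts shortest paths with BFS from every source (and, on the reversed graph, to every destination) and accumulates, for each ordered pair, the fraction of shortest paths crossing each directed edge. It is brute force and suited only to small graphs; the airport codes are hypothetical, and a production system would use an optimized algorithm such as Brandes' (or a graph library).

```python
from collections import defaultdict, deque

def bfs_counts(adj, s):
    """BFS from s: hop distance and number of shortest paths to each node."""
    dist, sigma = {s: 0}, defaultdict(int)
    sigma[s] = 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # another shortest path reaches v
    return dist, sigma

def edge_betweenness(adj):
    """C_B(e) = sum over s != d of sigma_sd(e) / sigma_sd (directed, unweighted)."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    radj = defaultdict(set)                         # reversed graph
    for u, vs in adj.items():
        for v in vs:
            radj[v].add(u)
    fwd = {s: bfs_counts(adj, s) for s in nodes}
    bwd = {d: bfs_counts(radj, d) for d in nodes}   # distances measured *to* d
    eb = defaultdict(float)
    for s in nodes:
        dist_s, sig_s = fwd[s]
        for d in nodes:
            if d == s or d not in dist_s:
                continue
            dist_d, sig_d = bwd[d]
            for u, vs in adj.items():
                for v in vs:
                    # edge (u, v) lies on an s-d shortest path iff the hops add up
                    if (u in dist_s and v in dist_d
                            and dist_s[u] + 1 + dist_d[v] == dist_s[d]):
                        eb[(u, v)] += sig_s[u] * sig_d[v] / sig_s[d]
    return dict(eb)

# Hypothetical mini-network: two one-stop routings from JFK to LAX.
adj = {"JFK": {"ORD", "ATL"}, "ORD": {"LAX"}, "ATL": {"LAX"}, "LAX": {"JFK"}}
eb = edge_betweenness(adj)
```

When a pair of airports is connected by several equally short paths, each path contributes fractionally, so competing routes share the betweenness mass.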
Through this complex network analysis, we obtain two spatial characteristics of the aviation network, the airport degree (a measure of crowdedness) and the edge betweenness centrality, which are then fed into the feature selection module described in the next section.
3.2. Influencing Factors Selection
A large number of factors increases model complexity, consumes excessive computational resources, and reduces generalization ability [39]. To resolve the dilemma posed by high-dimensional datasets, it is essential to perform feature selection before training the model. The feature selection module discovers and keeps crucial features with high explanatory power for the response variable. Hence, the dimension of the dataset is reduced by eliminating redundant and irrelevant attributes.
3.2.1. Extreme Gradient Boosting Machine (XGBoost)
XGBoost is an extensible tree boosting system that can be utilized as an embedded method to select the optimal feature subset. The performance of XGBoost is widely recognized in practice due to its fast learning and excellent accuracy [40]. In this study, we adopt XGBoost to perform feature selection. The procedure of the XGBoost algorithm is described in the following.
Given a dataset with $n$ samples and $m$ features, $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $\mathbf{x}_i \in \mathbb{R}^m$, the XGBoost prediction is the sum of the predictions of $P$ independent trees:
$$ \hat{y}_i = \sum_{p=1}^{P} f_p(\mathbf{x}_i), \quad f_p \in \mathcal{F}, $$
where $\mathcal{F}$ is the space of regression trees. Each $f_p$ is matched with an independent tree structure and leaf weights. To learn the set of functions in the tree ensemble, we solve the following regularized objective minimization problem:
$$ \mathcal{L} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{p=1}^{P} \Omega(f_p), $$
where $l(\cdot,\cdot)$ represents a differentiable convex loss function that measures the difference between the actual label $y_i$ and the prediction $\hat{y}_i$, and the additional regularization term $\Omega(\cdot)$ is used to prevent overfitting. The model in training round $t$ is fitted by creating a new tree while keeping the results of previous rounds, i.e., $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)$. The objective function in round $t$ can be denoted by
$$ \mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\big) + \Omega(f_t), $$
where $\Omega(f_t)$ reflects the model complexity and can be represented as $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \lVert \mathbf{w} \rVert^2$, with $T$ the number of leaves and $\mathbf{w}$ the leaf weights.
To find the optimal solution quickly, we take a second-order Taylor expansion of the loss function, which gives
$$ \mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i f_t^2(\mathbf{x}_i) \Big] + \Omega(f_t). $$
For simplicity, the term unrelated to the fit in the $t$-th training round, $l(y_i, \hat{y}_i^{(t-1)})$, can be ignored since its value is fixed. Here, $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ are the gradient and Hessian of the loss function, respectively.
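To make the second-order machinery concrete, the sketch below computes the gradient and Hessian for the squared loss $l = \tfrac{1}{2}(\hat{y} - y)^2$ and the resulting optimal weight of a single leaf, $w^* = -\sum g_i / (\sum h_i + \lambda)$, which follows from minimizing the quadratic per-leaf objective. The numbers are purely illustrative.

```python
import numpy as np

# Squared loss l(y, yhat) = 0.5 * (yhat - y)^2, so with respect to the
# current prediction yhat:
def grad_hess(y, yhat):
    g = yhat - y          # first derivative of the loss
    h = np.ones_like(y)   # second derivative (constant for squared loss)
    return g, h

# Optimal weight of a leaf containing a sample set, from the
# second-order objective: w* = -sum(g) / (sum(h) + lambda).
def leaf_weight(g, h, lam=1.0):
    return -g.sum() / (h.sum() + lam)

y = np.array([3.0, 5.0, 4.0])
yhat = np.zeros(3)               # round-0 predictions
g, h = grad_hess(y, yhat)
w = leaf_weight(g, h, lam=1.0)   # = 12 / (3 + 1) = 3.0
```

The same $g$ and $h$ statistics drive the split-gain formula that XGBoost reports as feature importance, which is what the selection step consumes.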
XGBoost is an advanced ensemble learning method with the following multiple hyperparameters to be tuned.
Number of estimators: the number of base estimators contained in the model. In general, too few weak estimators result in model underfitting, but too many weak estimators greatly increase the computational complexity.
Maximum depth: the maximum depth of the tree, or the number of splits from root node to leaf node. If the tree is deep, the tree-based model consumes large amounts of memory space and model overfitting is likely to occur.
Minimum child weight: the minimum sum of sample weights in the child node. The tree splitting process continues until the sum of sample weights in a leaf node is less than this value. XGBoost behaves overly conservative when the value is set to be too large.
Subsample: the proportion of samples used to train the model.
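BO–XGBoost ranks features by XGBoost's gain-based importance scores. As a dependency-free illustration of the same embedded-selection idea, the sketch below scores each feature by the best single-split (decision stump) gain in sum-of-squared-errors and keeps the top-$k$; the data, the stump proxy, and the helper names are ours, not the paper's implementation.

```python
import numpy as np

def stump_gain(x, y):
    """Best SSE reduction achievable by one split on feature x
    (a crude stand-in for tree-based gain importance)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    base = ((y - y.mean()) ** 2).sum()
    best = 0.0
    for t in range(1, len(y)):
        if xs[t] == xs[t - 1]:
            continue                      # cannot split between ties
        left, right = ys[:t], ys[t:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = max(best, base - sse)
    return best

def select_top_k(X, y, k):
    gains = [stump_gain(X[:, j], y) for j in range(X.shape[1])]
    return sorted(np.argsort(gains)[::-1][:k].tolist())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=200)   # only feature 1 matters
selected = select_top_k(X, y, 1)
```

In the actual pipeline, the importance scores come from the tuned XGBoost model rather than from stumps, but the keep-the-top-ranked-features step is the same.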
3.2.2. Bayesian Optimization
To efficiently seek the optimal hyperparameters of XGBoost, Bayesian optimization (BO) is employed in the feature selection module; BO has proved highly capable of handling unknown objective functions and computational complexity [41]. BO combines prior knowledge with current sampling points to approximate the posterior distribution of the unknown objective function by applying Bayes' theorem. Then, the next sampling point is selected on the basis of the posterior distribution [40,42].
Performing BO involves two main phases [41]. First, we select a prior over functions. The Gaussian process (GP) prior is often used to model the distribution of the objective function because of its flexibility and tractability, denoted as $f(\mathbf{x}) \sim \mathcal{GP}\big(\mu(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big)$. A GP has the property that any finite set of $Z$ points $\{\mathbf{x}_z\}_{z=1}^{Z}$ induces a multivariate Gaussian distribution on $\{f(\mathbf{x}_z)\}_{z=1}^{Z}$. Owing to the elegant marginalization properties of the Gaussian distribution, marginals and conditionals can be computed in closed form when taking the $z$-th point as the function value $f(\mathbf{x}_z)$. Second, the next assessment point is decided by choosing an acquisition function (AC). Based on the first stage, we suppose that $f$ is drawn from a GP prior and that the noisy observations are $\{(\mathbf{x}_z, y_z)\}_{z=1}^{Z}$ with $y_z = f(\mathbf{x}_z) + \varepsilon_z$, where $\varepsilon_z$ obeys the Gaussian distribution $\mathcal{N}(0, \sigma^2)$ with mean $0$ and noise variance $\sigma^2$. The prior information provided by the GP and the previous observations induce a posterior over functions. We then employ an acquisition function, denoted $\alpha(\mathbf{x})$, to determine at which point the objective should be evaluated next through the proxy optimization $\mathbf{x}_{\text{next}} = \arg\max_{\mathbf{x}} \alpha(\mathbf{x})$.
Herein, the GP upper confidence bound (GP-UCB) is chosen as our acquisition function. GP-UCB either exploits the region of the current optimal value (high-mean region) or explores unsampled regions (high-variance regions) to obtain the next sampling point. The GP-UCB function is written as
$$ \alpha_{\mathrm{UCB}}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa\, \sigma(\mathbf{x}), $$
where $\kappa$ makes the trade-off between exploitation and exploration, and $\mu(\mathbf{x})$ and $\sigma(\mathbf{x})$ are the posterior mean and standard deviation of the estimated function, respectively.
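The GP posterior and the UCB rule can be sketched in a few lines of numpy. This is a minimal illustration assuming a unit-variance RBF kernel, a fixed length scale, and a one-dimensional candidate grid; a real BO loop would maximize the marginal likelihood over kernel hyperparameters and search a multi-dimensional hyperparameter space.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Unit-variance RBF kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Gaussian-process posterior mean and standard deviation."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 0.0, None)
    return mu, np.sqrt(var)

def next_point_ucb(x_train, y_train, candidates, kappa=2.0):
    """GP-UCB: maximize mu(x) + kappa * sigma(x) over the candidates."""
    mu, sd = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mu + kappa * sd)]

# Three observed (hyperparameter, score) pairs; pick where to sample next.
x = np.array([0.1, 0.5, 0.9])
y = np.array([0.2, 1.0, 0.3])
grid = np.linspace(0.0, 1.0, 101)
x_next = next_point_ucb(x, y, grid)
```

Larger `kappa` pushes the rule toward unexplored, high-variance regions; smaller values keep it near the incumbent optimum.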
3.2.3. The Hybrid BO–XGBoost Method
As a hybrid feature selection approach, key explanatory variables are detected by calculating the feature importance scores of XGBoost. BO is embedded in the XGBoost method to find the optimal hyperparameter set, which enhances the feature selection performance. In the process of optimization, the most likely hyperparameter set given by the posterior distribution is used to evaluate the objective function [43]. In this study, we choose the mean squared error (MSE) as the objective function; the hyperparameter set that attains the minimum MSE is taken as the best configuration of the XGBoost method. The flowchart of the hybrid BO–XGBoost feature selection method is shown in Figure 2.
3.3. Deep Learning
While BO–XGBoost effectively reduces the feature dimension by eliminating irrelevant noise, it primarily captures the axis-aligned, piecewise decision boundaries characteristic of trees. Flight delays in a large-scale network, however, exhibit highly complex, deep non-linear patterns driven by the topological structure of the aviation network. To capture these dynamics, simple regression is insufficient. Instead, we couple BO–XGBoost end-to-end with a deep learning module for arrival delay estimation, which is designed by stacking multiple extreme learning machine auto-encoders (ELM-AE) and a kernel extreme learning machine (KELM).
3.3.1. Kernel Extreme Learning Machine (KELM)
Extreme learning machine (ELM) is a single-hidden-layer feed-forward network [44]. In the conventional ELM, the input weights $\mathbf{W}$ and biases $\mathbf{b}$ are randomly generated based on the number of neurons $l$ in the hidden layer, and the output weights $\boldsymbol{\beta}$ are estimated by the least squares method [45]. Let $\mathbf{x}$ be the vector of input variables; the output variable $y$ is formulated as
$$ y = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = g(\mathbf{W}\mathbf{x} + \mathbf{b})^{\top}\boldsymbol{\beta}, \quad (7) $$
where $\top$ represents the transpose, $\mathbf{h}(\mathbf{x})$ denotes the output of the hidden layer, $g(\cdot)$ represents the activation function, and $\boldsymbol{\beta}$ denotes the output weights; $\mathbf{W}$ and $\mathbf{b}$ are the input weights and biases, respectively. Suppose there are $n$ samples in the dataset; Equation (7) can be expressed in matrix form as
$$ \mathbf{H}\boldsymbol{\beta} = \mathbf{Y}. \quad (8) $$
The solution for $\boldsymbol{\beta}$ is obtained by minimizing the following constrained optimization problem:
$$ \min_{\boldsymbol{\beta},\, \boldsymbol{\xi}} \ \frac{1}{2}\lVert\boldsymbol{\beta}\rVert^2 + \frac{C}{2}\lVert\boldsymbol{\xi}\rVert^2 \quad \text{s.t.} \quad \mathbf{H}\boldsymbol{\beta} = \mathbf{Y} - \boldsymbol{\xi}, \quad (9) $$
where $\boldsymbol{\xi}$ is the vector of errors of the ELM output and $C$ is a regularization parameter. ELM improves the generalization capability and avoids data overfitting by jointly reducing the norm of the output weights and the training errors [46]. By the KKT theorem, Equation (9) is denoted equivalently as
$$ L = \frac{1}{2}\lVert\boldsymbol{\beta}\rVert^2 + \frac{C}{2}\lVert\boldsymbol{\xi}\rVert^2 - \boldsymbol{\alpha}^{\top}\big(\mathbf{H}\boldsymbol{\beta} - \mathbf{Y} + \boldsymbol{\xi}\big), \quad (10) $$
where $\boldsymbol{\alpha}$ indicates the Lagrange multipliers. The value of $\boldsymbol{\beta}$ is obtained from the optimality conditions as
$$ \boldsymbol{\beta} = \mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}, \quad (11) $$
where $\mathbf{I}$ is an $n$-dimensional identity matrix. Then, the output function of ELM corresponding to Equation (7) is
$$ f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}. \quad (13) $$
Therefore, the total output vector for the $n$ input data points can be written in either of two equivalent forms:
$$ \mathbf{F} = \mathbf{H}\mathbf{H}^{\top}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{\top}\Big)^{-1}\mathbf{Y}, \quad (12a) $$
$$ \mathbf{F} = \mathbf{H}\Big(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\Big)^{-1}\mathbf{H}^{\top}\mathbf{Y}. \quad (12b) $$
Theoretically, there is no strict rule on which of the alternative forms (Equations (12a) and (12b)) should be used for a given dataset size; both can be applied in different circumstances, but their computational costs and efficiency differ. According to the study in [47], the number of hidden neurons $l$ has little effect on the generalization ability of ELM, and ELM exhibits favorable performance as long as $l$ is large enough. When the size of the input data is quite large ($n \gg l$), one may be inclined to use Equation (12b) to decrease computational costs, since it inverts an $l \times l$ rather than an $n \times n$ matrix. Nevertheless, when the explicit feature mapping $\mathbf{h}(\mathbf{x})$ is unknown, one may tend to use Equation (12a).
To cope with an unknown feature mapping $\mathbf{h}(\mathbf{x})$, KELM is presented by adding a kernel function to the ELM [47]. KELM uses the kernel function to represent the non-linear feature mapping of the hidden layer. The kernel function satisfies Mercer's conditions and is convenient to use; common kernel functions are the radial basis function (RBF), the Sigmoid kernel, and the Polynomial kernel (Poly). We define the kernel matrix for ELM as
$$ \boldsymbol{\Omega}_{\mathrm{ELM}} = \mathbf{H}\mathbf{H}^{\top}, \qquad \Omega_{i,j} = \mathbf{h}(\mathbf{x}_i) \cdot \mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j). \quad (14) $$
Therefore, the estimated output of KELM based on Equation (14) is obtained as
$$ f(\mathbf{x}) = \big[K(\mathbf{x}, \mathbf{x}_1), \ldots, K(\mathbf{x}, \mathbf{x}_n)\big]\Big(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\Big)^{-1}\mathbf{Y}. \quad (15) $$
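The KELM training and prediction rules above reduce to one linear solve and one kernel product, which the following numpy sketch implements with an RBF kernel. The class name, hyperparameter values, and synthetic data are illustrative, not the paper's configuration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2) for row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel ELM: alpha = (I/C + Omega)^(-1) Y, f(x) = [K(x, x_i)] alpha."""
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        omega = rbf_kernel(X, X, self.gamma)          # Omega_ELM
        n = len(X)
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, y)
        return self

    def predict(self, X_new):
        return rbf_kernel(X_new, self.X, self.gamma) @ self.alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(80, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2                    # smooth toy target
model = KELM(C=1000.0, gamma=2.0).fit(X, y)
```

The regularizer $\mathbf{I}/C$ keeps the solve well conditioned; larger $C$ moves the fit toward interpolation of the training data.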
3.3.2. Multi-Layer Extreme Learning Machine (ML-ELM)
ELM, due to its single-hidden-layer structure, has limited modeling accuracy and stability, especially for the evaluation of high-volume and high-dimensional data [48]. To explore the relationship between input and output in complex cases, ML-ELM is introduced; its structure is shown in Figure 3. ML-ELM is a multi-layer feed-forward neural network that combines the high efficiency of ELM with the deep structure of the auto-encoder (ELM-AE for short). The input dimension of ELM-AE equals its output dimension, so the unsupervised learning mode is naturally transformed into a supervised learning mode [49]. ELM-AE projects the input data to the hidden layer, with orthogonal random input weights and biases ($\mathbf{W}^{\top}\mathbf{W} = \mathbf{I}$, $\mathbf{b}^{\top}\mathbf{b} = 1$). The output weights $\boldsymbol{\beta}$ are computed by
$$ \boldsymbol{\beta} = \Big(\frac{\mathbf{I}}{C} + \mathbf{H}^{\top}\mathbf{H}\Big)^{-1}\mathbf{H}^{\top}\mathbf{X}, \quad (16) $$
where $\mathbf{X}$ is the input matrix. The learned feature map can be inversely restored to the raw input data via the output weights $\boldsymbol{\beta}$; conversely, the transpose of $\boldsymbol{\beta}$ maps the input data to the compressed features [50]. Using this principle, ML-ELM stacks multiple ELM-AEs and obtains the weight matrices of the hidden layers through layer-by-layer training from bottom to top [51]. The output of ML-ELM at the $i$-th ($i \geq 2$) layer is expressed as
$$ \mathbf{H}_i = g\big(\mathbf{H}_{i-1}\boldsymbol{\beta}_i^{\top}\big), \quad (17) $$
where $\mathbf{H}_{i-1}$ is the output of the previous layer. From Equation (17), random weights and biases are no longer required from the second hidden layer onward. Hence, ML-ELM addresses the issue of random weights and biases caused by the single-hidden-layer network in ELM. Meanwhile, the deep structure of ML-ELM ensures that features are fully learned and key information is effectively extracted.
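One ELM-AE and the layer-wise stacking can be sketched as follows. The layer sizes, regularization value, and random data are illustrative; the sketch assumes each layer's width does not exceed its input dimension so that the random input weights can be orthogonalized by QR.

```python
import numpy as np

def elm_ae_weights(X, n_hidden, C=1e3, seed=0):
    """Train one ELM auto-encoder: orthogonalized random input weights,
    then beta = (I/C + H^T H)^(-1) H^T X. beta^T maps inputs to the
    compressed feature space."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    W, _ = np.linalg.qr(W)                 # orthogonal input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

def ml_elm_features(X, layer_sizes, C=1e3):
    """Stack ELM-AEs layer by layer: H_i = g(H_{i-1} beta_i^T)."""
    H = X
    for i, size in enumerate(layer_sizes):
        beta = elm_ae_weights(H, size, C=C, seed=i)
        H = np.tanh(H @ beta.T)            # no new random weights needed
    return H

X = np.random.default_rng(2).normal(size=(50, 12))
Z = ml_elm_features(X, layer_sizes=[8, 4])
# Z: (50, 4) compressed, layer-wise learned representation of X
```

Each layer is solved in closed form, so the whole stack trains without backpropagation; only the first projection per layer is random.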
3.3.3. Deep Kernel Extreme Learning Machine (DKELM)
Although DKELM has been applied in other domains (e.g., [50]), its adaptation to flight delay estimation, fed with aviation network features selected via BO–XGBoost, represents a novel application architecture that addresses the unique challenges of high-dimensional, spatially interdependent delay data in large-scale aviation systems. The deep kernel extreme learning machine combines two modules: ML-ELM for feature learning and KELM for decision making [50]. ML-ELM employs ELM-AE to train the parameters of the hidden layers and obtains high-level features by stacking multiple ELM-AEs. The output of ML-ELM is then utilized as the input of KELM to generate the predictions, and KELM yields accurate estimates with little user intervention. The training principle of the DKELM method is illustrated intuitively in Figure 4. The input features are abstracted and compressed by k hidden layers to obtain more compact features; then, the kernel function represents the non-linear mapping of the compressed features.
Based on the elaboration of the above techniques, we are now in a position to present the process of flight arrival delay estimation in intelligent aviation systems. Our proposed method couples the high efficiency of BO–XGBoost in feature selection with the superior estimation capability of DKELM on high-volume, high-dimensional data. Algorithm 1 depicts the main procedure of the proposed BO–XGBoost–DKELM method for estimating flight arrival delay.
The proposed model estimates flight arrival delay based on historical patterns learned from real operational data, which naturally embed physical and operational constraints (e.g., maximum turnaround times, airport curfews, and air traffic control regulations). While the model itself does not impose explicit upper bounds on predicted delays, its regularized structure and kernel-based smoothing minimize the risk of generating infeasible estimates. In practice, should a predicted delay exceed a reasonable operational threshold, a post-processing step can be applied to align the output with airline-specific or airport-specific feasibility limits.
| Algorithm 1: The main procedures of the BO–XGBoost–DKELM method |
1: Input: Raw flight dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ denotes the feature vector of flight $i$ and $y_i$ denotes its actual arrival delay (in minutes); search space $\Theta$ for XGBoost hyperparameters; maximum BO iterations $T_{max}$; DKELM hyperparameters: number of hidden layers $k$, kernel type $K(\cdot,\cdot)$, and regularization parameter $C$.
2: Output: Trained DKELM model $f(\cdot)$.
3: Step 1. Preprocessing:
4:   Encode nominal features in $D$ into numerical format.
5:   Apply standard normalization: $\tilde{x} = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the empirical mean and standard deviation.
6:   Split $D$ into a training set $D_{train}$ (70%) and a test set $D_{test}$ (30%).
7: Step 2. Feature Selection via BO–XGBoost:
8:   Initialize a Gaussian Process (GP) prior over $\Theta$.
9:   for $t = 1$ to $T_{max}$ do
10:    Sample a hyperparameter configuration $\theta_t$ using the GP-UCB acquisition function.
11:    Train an XGBoost model on $D_{train}$ with hyperparameters $\theta_t$.
12:    Compute the validation MSE $L(\theta_t)$.
13:    Update the GP posterior with $(\theta_t, L(\theta_t))$.
14:  end for
15:  Select the optimal $\theta^{*} = \arg\min_{t} L(\theta_t)$.
16:  Train the final XGBoost model with $\theta^{*}$ on the full $D_{train}$.
17:  Compute feature importance scores for all features.
18:  Select the top-$m$ features $F^{*}$, where $m$ is chosen to minimize the validation MSE.
19:  Retain only the features in $F^{*}$.
20: Step 3. Estimation:
21:  Construct the compressed representation using $k$ stacked ELM-AE layers.
22:  Train the KELM regressor on the compressed features with kernel $K$ and parameter $C$.
23:  Return the final model $f(\cdot)$.
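The Bayesian-optimization loop of Step 2 in Algorithm 1 can be illustrated on a toy one-dimensional objective. The NumPy sketch below is not the authors' implementation: it replaces the XGBoost validation-MSE objective with a simple quadratic stand-in and uses a lower-confidence-bound acquisition (the minimization form of GP-UCB); all names, kernel length scales, and constants are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel over scalar hyperparameter values
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian Process posterior mean and variance at query points Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.einsum('ij,ij->j', Ks, v), 1e-12, None)
    return mu, var

def bo_minimize(f, bounds=(0.0, 1.0), n_init=3, n_iter=12, kappa=2.0, seed=0):
    # GP-UCB style loop: evaluate where the lower confidence bound
    # mu - kappa*sigma is smallest, update the posterior, return the best point.
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[0], bounds[1], size=n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(bounds[0], bounds[1], 200)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        acq = mu - kappa * np.sqrt(var)        # lower confidence bound
        x_next = grid[np.argmin(acq)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
```

In the actual pipeline, each evaluation of `f` would correspond to lines 11–12 of Algorithm 1: training an XGBoost model with the sampled configuration and returning its validation MSE.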
5. Conclusions
The growing concern about aviation operational efficiency and passenger satisfaction makes accurate estimation of the duration of flight delay vital for sustainable aviation management. In this paper, we propose an integrated architecture that combines deep learning with complex network theory for accurately estimating the duration of flight delay. Drawing on complex network theory, the feature selection module adopts the XGBoost algorithm to select aviation network features at the node and edge levels. To extract explanatory features in a more objective and accurate manner, Bayesian optimization is utilized to fine-tune the XGBoost algorithm's hyperparameters. Then, the modified multi-layer extreme learning machine, DKELM, uses these extracted features as input indicators, incorporating extreme learning machine autoencoders into the deep network to form a stacked architecture for robust and reliable estimation. Case studies are conducted based on a high-dimensional flight database from the U.S. Bureau of Transportation Statistics. Upon comparison with commonly used benchmark methods, the integrated framework demonstrates clear advantages in estimating the duration of flight delays. The conclusions are drawn as follows.
We conduct quantitative analysis of explanatory factors that affect the duration of flight delay using the XGBoost algorithm with Bayesian optimization. Our findings indicate that flight arrival delay is closely related to features such as departure delay, delay by national aviation system, crowdedness degree of airport, edge betweenness centrality, cancelled code, and delay by carrier. Afterwards, DKELM can accurately estimate flight arrival delay by utilizing these explanatory factors as input indicators.
By employing the proposed integrated framework, we can estimate the duration of flight delay from a multi-airport, multi-airline perspective based on the five-year U.S. flight dataset. According to our case studies, the proposed method achieves an average delay error of 3.36 min. Compared with benchmark methods, the integrated method demonstrates the highest precision, with an average improvement of 7.8%.
As per the findings of this study, the aviation administration could consider several proactive measures, such as iteratively upgrading the aviation navigation system with advanced algorithms and data analytics, expanding key infrastructure at hub airports, implementing appropriate priority access measures for airlines with higher edge betweenness centrality, and enhancing real-time passenger information systems. These managerial insights are beneficial for enhancing operational efficiency and management capacity, thereby ensuring sustainable aviation development in the future.
Admittedly, this study comes with some limitations. Though the BTS dataset includes aircraft tail numbers, it lacks explicit labels identifying adjacent flights, which would directly map the sequence of flights operated by the same aircraft. As a result, the current model forecasts arrival delays using only flight-specific features and does not examine whether predicted delays render subsequent flight schedules operationally infeasible. In other words, the data-driven approach predicts delay duration from network features but does not explicitly enforce physical constraints on delay propagation to the subsequent flight (e.g., when the predicted arrival delay exceeds the turnaround buffer). When datasets with explicit labels identifying adjacent flights become available, future work could explore methods such as physics-informed machine learning to embed such upper limits and mitigate potential infeasibilities in turnaround scheduling between connected flights. In addition, future work could explore the applicability of this framework to other modes of transport, such as railway and road transport, which could yield further insights for the broader field of multi-modal transportation management.
On the other hand, a persistent challenge in deep learning is ensuring robustness against out-of-distribution (OOD) scenarios and zero-sample cases, where the model encounters patterns not present during training. In complex network systems, enhancing the model's ability to generalize to unseen data regimes is critical, particularly for black swan events (e.g., unprecedented extreme weather or new route openings). Recent methodological advancements have begun to address these limitations through novel architectures and data augmentation strategies, such as the studies in [59,60,61,62], which employ OOD methods or meta-learning to extract robust features even in the absence of target samples. Although our proposed framework relies on supervised learning from historical aviation data, acknowledging these advancements in OOD robustness and zero-sample learning offers a valuable perspective on handling the sporadic and extreme nature of rare delay events that fall outside standard statistical distributions. To this end, our future research could integrate OOD robustness and zero-sample learning capabilities into the methodology, enabling it to efficiently address unexpected events in aviation systems.