Review

The Evolution and Taxonomy of Deep Learning Models for Aircraft Trajectory Prediction: A Review of Performance and Future Directions

Department of Information Security at Paichai University, Daejeon 35345, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10739; https://doi.org/10.3390/app151910739
Submission received: 4 September 2025 / Revised: 26 September 2025 / Accepted: 2 October 2025 / Published: 5 October 2025
(This article belongs to the Section Aerospace Science and Engineering)

Abstract

Accurate aircraft trajectory prediction is fundamental to air traffic management, operational safety, and intelligent aerospace systems. With the growing availability of flight data, deep learning has emerged as a powerful tool for modeling the spatiotemporal complexity of 4D trajectories. This paper presents a comprehensive review of deep learning-based approaches for aircraft trajectory prediction, focusing on their evolution, taxonomy, performance, and future directions. We classify existing models into five groups—RNN-based, attention-based, generative, graph-based, and hybrid and integrated models—and evaluate them using standardized metrics such as the RMSE, MAE, ADE, and FDE. Common datasets, including ADS-B and OpenSky, are summarized, along with the prevailing evaluation metrics. Beyond model comparison, we discuss real-world applications in anomaly detection, decision support, and real-time air traffic management, and highlight ongoing challenges such as data standardization, multimodal integration, uncertainty quantification, and self-supervised learning. This review provides a structured taxonomy and forward-looking perspectives, offering valuable insights for researchers and practitioners working to advance next-generation trajectory prediction technologies.

1. Introduction

The aviation operation environment is becoming increasingly complex due to the continuous growth of air traffic, diversification of operation paradigms, and emergence of Urban Air Mobility (UAM) [1,2,3]. This complexity has increased the need for accurate trajectory prediction in areas such as air traffic management (ATM), Collision Avoidance Systems (CASs), autonomous UAV operations, and fuel efficiency [2,4]. Trajectory prediction plays a critical role by estimating future positions, speed, altitude, and heading, thereby enabling optimized air traffic flow and enhanced safety [5]. As a result, high-precision trajectory prediction is regarded as a key element of strategic decision-making in aviation operations [6].
Traditional trajectory prediction approaches can be divided into physics-based and statistical models. Physics-based models, grounded in aerodynamics and aircraft dynamics, can yield accurate predictions under specific conditions but are limited in flexibility when facing nonlinear, disturbance-driven operations [7]. Statistical methods, such as linear regression, ARIMA, Kalman filters, HMMs, and Bayesian estimators, are useful for temporal predictions based on past trajectories [8,9,10,11], but they struggle with long-term dependencies, nonlinear relationships, noisy data, and missing values. Their fixed mathematical structures also reduce their adaptability in scenarios involving inter-aircraft interactions, irregular patterns, or complex operational behaviors [4,5]. Consequently, their applicability is limited in high-complexity environments such as dense low-altitude operations, urban flights of non-standard platforms (UAVs, eVTOLs, etc.), and multi-aircraft group behavior [1,3]. In addition to conventional radar-based surveillance systems, crowdsourced data sources have increasingly been used to support large-scale monitoring and prediction. Networks such as OpenSky, which aggregate ADS-B broadcasts collected by volunteer-operated receivers, provide flight data that complement traditional datasets and have enabled significant progress in trajectory modeling [12,13].
To address these limitations, deep learning-based approaches have recently gained attention [14]. Deep learning models can learn directly from high-dimensional, nonlinear time-series data, inferring latent patterns without explicit modeling. RNNs are effective for sequential dependencies but suffer from vanishing gradients in long sequences [15]. LSTM and GRU mitigate this issue through gating mechanisms and are widely used for trajectory prediction [16,17,18]. Transformer architectures, leveraging attention mechanisms, provide greater scalability and stability for long-sequence learning [19,20,21], with variants such as Informer [22], Performer [23], and Longformer [24] further improving their efficiency. Graph Attention Networks (GATs) model multi-aircraft interactions for group trajectory prediction [25,26]. Generative models, including GANs, VAEs, and diffusion models, support trajectory data augmentation, scenario simulation, and training diversity, addressing data scarcity and imbalance [27,28,29].
Several surveys and reviews relating to air traffic management and trajectory prediction have been published in recent years. The authors of [30] systematically reviewed applications of AI in air operations, with a primary emphasis on military contexts, where trajectory prediction was addressed only as a secondary theme. The authors of [31] focused on artificial intelligence and explainable AI in air traffic management, but trajectory prediction was only mentioned as a subtopic within the broader scope. In [32], the authors discussed an AI-based Internet of Vehicles utilizing UAVs, but the scope was limited to drones and thus not directly relevant to civil aircraft trajectory prediction. Finally, ref. [33] provided a comprehensive overview of aircraft trajectory prediction in civil aviation, but it mainly concentrated on conventional physics-based and statistical methods (e.g., regression models, Kalman filters, and Bayesian estimators) and did not sufficiently incorporate the most recent deep learning-based studies. In addition, existing reviews are often limited to specific model types or provide only fragmented comparisons, lacking a systematic taxonomy, performance-based evaluation, or application-oriented perspectives. We carefully investigated the literature between 2020 and June 2025 and found that survey or review papers dedicated exclusively to deep learning-based aircraft trajectory prediction are extremely scarce. Most existing works focus on broader air traffic management, autonomous vehicles, or multimodal trajectory prediction, or only mention aircraft trajectory prediction as a minor topic. This scarcity further highlights the originality and necessity of the present review; it also indicates that, to date, there has been no reproducible and structured review dedicated solely to deep learning-based aircraft trajectory prediction while incorporating the latest research trends up to June 2025.
To address these gaps, this paper makes four main contributions:
  • To the best of our knowledge, this study provides the first structured and reproducible review devoted exclusively to deep learning-based aircraft trajectory prediction, comprising studies published up to June 2025.
  • This study establishes a comprehensive taxonomy that systematically categorizes five major model families: RNN-based, attention-based, generative, graph-based, and hybrid and integrated models.
  • We conduct quantitative comparisons of representative models using standardized metrics (RMSE, MAE, ADE, and FDE) and benchmark datasets, while explicitly discussing dataset-related challenges such as OpenSky’s coverage bias and preprocessing inconsistencies.
  • We examine practical applications in ATM, anomaly detection, optimization, and real-time system integration, and critically discuss key technical requirements including scalability, explainability, and certification.
The remainder of this paper is organized as follows: Section 2 presents the systematic literature review (SLR) methodology, including the research questions and study selection criteria. Section 3 categorizes trajectory prediction models by structure and processing methods. Section 4 compares their prediction performance based on datasets and evaluation metrics. Section 5 discusses applications, technical challenges, and research directions. Section 6 concludes the review.
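As a minimal illustration of the four metrics named in the contributions above, the following sketch computes the RMSE, MAE, ADE, and FDE for a toy 2D trajectory. The function names and sample coordinates are illustrative assumptions, and individual studies may compute these metrics per coordinate or in different units, so this should be read as one common formulation rather than the definition used by any particular reviewed paper.

```python
import math

def rmse(pred, true):
    """Root mean squared error over every coordinate of every time step."""
    sq = [(p - t) ** 2 for ps, ts in zip(pred, true) for p, t in zip(ps, ts)]
    return math.sqrt(sum(sq) / len(sq))

def mae(pred, true):
    """Mean absolute error over every coordinate of every time step."""
    ab = [abs(p - t) for ps, ts in zip(pred, true) for p, t in zip(ps, ts)]
    return sum(ab) / len(ab)

def ade(pred, true):
    """Average displacement error: mean Euclidean distance per time step."""
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(pred)

def fde(pred, true):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], true[-1])

# Toy example (assumed values): the prediction drifts sideways over time.
true_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred_track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
scores = {name: fn(pred_track, true_track)
          for name, fn in [("RMSE", rmse), ("MAE", mae), ("ADE", ade), ("FDE", fde)]}
```

In this toy case the FDE exceeds the ADE because the error grows along the horizon, which is the typical signature of error accumulation in multi-step prediction.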

2. Systematic Literature Review Methodology

To systematically summarize trends in the development of deep learning-based aircraft trajectory prediction techniques, this study designed a literature collection and analysis procedure based on a structured taxonomy review approach. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were partially applied to ensure objectivity and reproducibility in the research selection process.

2.1. Research Questions

For this review, we established research questions (RQs) to evaluate existing publications and systematically extract key information regarding aircraft trajectory prediction studies. These questions clarify the scope of this study, with the aim of analyzing the structural characteristics of deep learning-based models, as well as evaluation metrics, datasets, application domains, and research gaps. Well-defined research questions provide researchers in this field with practical, targeted insights and guide future research directions. Table 1 presents the research questions (RQ1–RQ5) and their corresponding objectives for the literature review.

2.2. Literature Search Strategy

To ensure reproducibility and transparency, a structured literature search was conducted. Boolean operators (“AND”, “OR”) were applied to construct the final search query as follows:
(“aircraft trajectory prediction” OR “flight trajectory prediction”)
AND (“deep learning” OR “LSTM” OR “GRU” OR “ConvLSTM” OR “CNN”
OR “Transformer” OR “GAN” OR “Diffusion” OR “GNN”)
This query was consistently applied across major academic databases, including IEEE Xplore, ScienceDirect, SpringerLink, MDPI, ACM Digital Library, arXiv, AIAA ARC, and Hindawi (Wiley), without alternative variations.
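For readers who wish to reproduce the search, the query above can be assembled programmatically. The sketch below is only an illustration: the helper name `or_group` is hypothetical, straight quotes replace the typographic quotes of the printed query, and each database may require its own field tags or syntax adjustments that are not shown here.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Term lists copied from the search query reported in this section.
TASK_TERMS = ["aircraft trajectory prediction", "flight trajectory prediction"]
METHOD_TERMS = ["deep learning", "LSTM", "GRU", "ConvLSTM", "CNN",
                "Transformer", "GAN", "Diffusion", "GNN"]

query = or_group(TASK_TERMS) + " AND " + or_group(METHOD_TERMS)
```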
The search scope was limited to publications from January 2020 to June 2025. Database searches were conducted between 13 April 2025 and 10 July 2025, with iterative updates to ensure that the most recent studies were included. The final cutoff date for inclusion was 10 July 2025.
A three-step selection process was applied: title/abstract/keyword screening, followed by a full-text review and a final eligibility check.
Inclusion criteria:
  • Search terms in title, abstract, or keywords
  • Explicit focus on aircraft trajectory prediction using deep learning
  • Reported quantitative performance metrics (e.g., RMSE, MAE)
Exclusion criteria:
  • Published outside the 2020–June 2025 timeframe
  • Studies only on trajectory classification or deviation detection
  • Unrelated topics (e.g., communication, navigation)
  • Deep learning used only as auxiliary or comparative tool
  • Duplicates (retained most comprehensive version)
  • Full text not accessible
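The criteria above can be expressed as a simple screening function. This is a hypothetical sketch: the record fields (`published`, `task`, `dl_is_primary`, `reports_metrics`, `full_text_available`) are assumed names for manually coded attributes, not part of any database export, and duplicate handling is left out because it requires comparing records against each other.

```python
from datetime import date

# Publication window stated in the search scope (January 2020 to June 2025).
WINDOW = (date(2020, 1, 1), date(2025, 6, 30))

def screen(record):
    """Return (keep, reason) for one manually coded candidate record."""
    if not (WINDOW[0] <= record["published"] <= WINDOW[1]):
        return False, "outside 2020-June 2025 window"
    if not record["full_text_available"]:
        return False, "full text not accessible"
    if record["task"] != "trajectory prediction":
        return False, "not a trajectory prediction study"
    if not record["dl_is_primary"]:
        return False, "deep learning only auxiliary or comparative"
    if not record["reports_metrics"]:
        return False, "no quantitative performance metrics"
    return True, "included"

# Assumed example record.
example = {"published": date(2023, 5, 1), "task": "trajectory prediction",
           "dl_is_primary": True, "reports_metrics": True,
           "full_text_available": True}
decision = screen(example)
```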
The PRISMA flow diagram of the research selection process is shown in Figure 1. A total of 350 initial results were reduced to 223 after duplicate removal and initial filtering. After title and abstract screening, 114 papers were included for full-text review. Finally, 46 studies were included in the qualitative synthesis of this review.
While the PRISMA flow diagram and general inclusion/exclusion criteria provide an overview of the selection process, they may not be sufficient to ensure full reproducibility. Therefore, this review also presents specific examples to clarify the operational definitions applied.
  • Examples of Inclusion Criteria
    - Studies proposing deep learning-based aircraft trajectory prediction models. For example, a study that applied LSTM to ADS-B data for four-dimensional trajectory prediction was included because it directly aligns with the focus of this review.
    - Studies validating prediction models with real-world aviation data. For instance, a Transformer-based model applied to multi-step trajectory prediction was included due to its empirical contribution.
  • Examples of Exclusion Criteria
    - Review or commentary papers. For example, articles that only provide an overview of air traffic management or general trends, without proposing new models or presenting experiments, were excluded.
    - Studies where deep learning was not the primary prediction model. For instance, papers that were mainly based on statistical or traditional machine learning approaches, with deep learning used only as a comparative baseline, were excluded.
    - UAV or drone trajectory prediction studies. These were excluded because their operating environments and data sources differ substantially from those of commercial aircraft, which are the focus of this review.
    - Studies lacking accessible full text or sufficient methodological details. For example, papers that were only available as abstracts or without reproducible experimental setups were excluded.
By presenting both the operational definitions and concrete examples of inclusion and exclusion, the selection process minimizes subjective bias and ensures that subsequent researchers can reproduce the same procedure with clarity.
The distribution of the collected papers across databases is shown in Figure 2. IEEE Xplore contributed the largest share (22 papers, ~48%), followed by MDPI (9 papers, ~20%). Three studies were collected from other academic outlets.
Additionally, three relevant studies that were not indexed in the primary academic databases targeted in this study were identified through Google Scholar and forward and backward citation tracking. Specifically, two of these papers were retrieved from the AAAI Conference on Artificial Intelligence, and one was published in the International Journal of Digital Earth. Because these studies represent core methodological contributions in the field of deep learning-based aircraft trajectory prediction and are closely aligned with recent research trends, they were included in the final analysis.
The annual publication trend is shown in Figure 3. Research on deep learning-based aircraft trajectory prediction has increased steadily since 2020, peaking in 2024 with 13 papers. By June 2025, nine papers had already been published, suggesting continued growth in this field.
To further strengthen reproducibility, the complete search strings, dataset identifiers, and supplementary comparison tables are provided in the Supplementary Materials (Tables S1–S4). These materials include the full bibliographic list of reviewed studies, detailed database coverage, dataset access links, and extended performance/cost/robustness comparisons.

2.3. Overview of Classification Criteria

Deep learning-based models for aircraft trajectory prediction exhibit diverse structures and characteristics, making systematic classification necessary for comprehensive understanding. In this review, a total of 46 major studies are examined, and classification is organized into two processes: a basic taxonomy that categorizes models into five core families, and a composite classification that reflects fusion-oriented designs such as multi-module integration and multi-step processing.
First, the basic taxonomy defines five categories: RNN-family models, attention-based models, generative models, graph-based models, and hybrid and integrated models. Auxiliary enhancements within the same family (e.g., LSTM + attention, GRU + self-attention) or simple input/data fusion are not considered hybrid but remain within their primary category. The quantitative distribution of the reviewed studies across these five categories is presented in Table 2.
Second, beyond this basic taxonomy, a composite classification is adopted to capture recent studies that pursue broader objectives such as uncertainty quantification, multi-object interaction, or situational awareness. These include the following:
  • Multi-module integration, where CNN, RNN, Transformer, and GCN are combined to learn spatiotemporal dependencies and complex interactions that a single network cannot represent.
  • Multi-step modeling, where the prediction process is divided into sequential stages, often including correction, ensemble, or uncertainty quantification modules, thereby reducing error accumulation and enhancing reliability.
This dual perspective—basic taxonomy and composite classification—allows for a systematic yet flexible categorization of aircraft trajectory prediction models, ensuring that both traditional and fusion-oriented designs are comprehensively covered.

3. Classification of Deep Learning Models for Aircraft Trajectory Prediction

Various deep learning-based models have been proposed in aircraft trajectory prediction research; however, since their structures and characteristics differ, a systematic classification is required for comprehensive understanding. This section addresses this need by summarizing major studies published to date and analyzing the features of each model type.
RQ1: Which types of deep learning-based models have been proposed for aircraft trajectory prediction?

3.1. RNN (Recurrent Neural Network)-Based Models

Recurrent Neural Networks (RNNs) are among the earliest deep learning approaches that were introduced for aircraft trajectory prediction. They specialize in modeling temporal dependencies of time-series data, with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) being widely adopted to mitigate the vanishing gradient problem. Over time, extensions such as Bi-LSTM, ConvLSTM, and Social-LSTM have been proposed to address limitations in multi-aircraft interaction modeling and spatiotemporal feature extraction.
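The gating idea behind LSTM can be illustrated with a single-unit cell. The sketch below follows the standard LSTM update equations with scalar weights; the weight dictionary layout and the numeric values are assumptions chosen only to show that a forget gate near one and an input gate near zero carry the cell state forward almost unchanged, which is the mechanism that eases gradient flow over long sequences.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM; w maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("f"))        # forget gate: how much old cell state to keep
    i = sigmoid(pre("i"))        # input gate: how much new candidate to write
    o = sigmoid(pre("o"))        # output gate: how much cell state to expose
    g = math.tanh(pre("g"))      # candidate cell update
    c = f * c_prev + i * g       # additive cell-state update
    h = o * math.tanh(c)         # hidden state passed to the next step
    return h, c

# Assumed weights: forget gate saturated open, input gate saturated closed.
weights = {"f": (0.0, 0.0, 10.0), "i": (0.0, 0.0, -10.0),
           "o": (0.0, 0.0, 0.0), "g": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=0.5, h_prev=0.0, c_prev=2.0, w=weights)
```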
The authors of [34] proposed a Seq2Seq LSTM model incorporating an encoder–decoder structure with a moving window strategy. This design updated the input information during prediction and reduced cumulative errors in long-term forecasting. Compared with GRU, the model achieved lower prediction error, demonstrating the potential of LSTM-based architectures in long-horizon prediction. Reference [35] introduced the Social-LSTM to explicitly capture multi-aircraft interactions. By mapping latent aircraft states onto a spatial grid and aggregating them using a social pooling mechanism, the model outperformed standard LSTMs in terms of accuracy in congested terminal airspace, highlighting the importance of relational modeling in such environments. Reference [36] compared the CNN-Stacked LSTM and ConvLSTM architectures. ConvLSTM integrated convolutional operations into LSTM to jointly learn spatiotemporal features, using fused radar, weather, and ADS-B data. The results showed that ConvLSTM outperformed Stacked LSTM in terms of accuracy, particularly for short-term predictions, demonstrating the benefits of combining spatial and temporal representations. The authors of [37] proposed a Constrained LSTM that embedded flight-phase constraints (climb, cruise, and descent) into the prediction process. This phase-aware design achieved lower prediction error and ensured that predicted trajectories were more realistic and consistent with actual operational patterns.
Other RNN-based models include a two-layer LSTM integrating ADS-B and BeiDou data that maintained robustness under data loss or signal errors [38]; a Bi-LSTM that captured bidirectional temporal dependencies but still suffered from error accumulation in long-term predictions [39]; and a comparative analysis of LSTM-FC for 150 s horizons, which achieved strong latitude/longitude prediction but limited accuracy in altitude forecasting [15].
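Many of the RNN-based studies above rely on some form of windowing over the observed track. The following sketch shows one generic reading of this idea: slicing a track into input/target pairs for training, and rolling a window forward at inference time. It is not the specific design of [34]; the function names are hypothetical, and a linear extrapolator stands in for a trained network.

```python
def make_windows(track, n_in, n_out):
    """Slice one track into (input window, target window) training pairs."""
    pairs = []
    for start in range(len(track) - n_in - n_out + 1):
        pairs.append((track[start:start + n_in],
                      track[start + n_in:start + n_in + n_out]))
    return pairs

def rollout(history, predict_next, n_in, steps):
    """Predict several steps ahead by sliding the window over its own outputs."""
    window = list(history[-n_in:])
    predictions = []
    for _ in range(steps):
        nxt = predict_next(window)
        predictions.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the newest
    return predictions

def linear_extrapolate(window):
    """Stand-in for a trained model: continue the last observed increment."""
    return 2 * window[-1] - window[-2]

altitudes = [100, 110, 120, 130, 140, 150]   # assumed toy altitude samples
pairs = make_windows(altitudes, n_in=3, n_out=2)
future = rollout(altitudes, linear_extrapolate, n_in=3, steps=3)
```

Because each predicted value re-enters the window, any error in an early step propagates to later ones, which is the cumulative-error effect that several reviewed models attempt to limit.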

3.2. Attention-Based Models

The attention mechanism assigns weights to each time step of the input sequence, allowing the model to focus on critical parts. In aircraft trajectory prediction, it has been actively applied to mitigate the long-term dependency problem of sequential data and effectively learn complex spatiotemporal patterns. The reviewed studies can be divided into attention-oriented models, which refine the attention architecture itself, and hybrid and context-integrated models, which combine attention with other networks or external contextual information. Table 3 summarizes these two categories and the corresponding studies.
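The weighting of time steps described above can be sketched with scaled dot-product attention over a short input sequence. The example uses plain Python lists and assumed toy vectors; real models apply learned projections to obtain queries, keys, and values, which are omitted here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over a sequence of time steps."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return weights, context

# Assumed toy sequence of three time steps; keys double as values.
steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(query=[1.0, 1.0], keys=steps, values=steps)
```

The third time step receives the largest weight because its key is most aligned with the query, so it contributes most to the context vector.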

3.2.1. Attention-Oriented Models

The primary goal of this family is to stabilize long-term predictions and improve the inherent performance of Transformer-based architectures. The authors of [21] proposed the Noise-Robust Autoregressive Transformer (NRAT), which integrates hybrid positional encoding (absolute + relative) and noise injection to enhance robustness under noisy conditions and in cases with missing data. The model showed better performance than that of the vanilla Transformer. Reference [40] introduced the Trajectory Embedding Transformer (TET), employing an encoder–decoder structure with multi-head attention and positional encoding. Using OpenSky ADS-B data, the TET outperformed LSTM and GRU in long-term prediction and showed robustness with different input sequence lengths. The authors of [41] developed the Flight Trajectory Transformer (FT-TF), which combines sliding windows and variable selection networks. The model achieved superior results in terms of the accuracy of altitude, latitude, and longitude, outperforming Informer and Autoformer. Other studies include [42], which proposed an Attention-LSTM with improved accuracy compared with standard LSTM, and [43], which introduced a Trajectory Stabilization Module and one-step inference to mitigate cumulative errors while quantifying the prediction uncertainty.

3.2.2. Hybrid and Context-Integrated Attention Models

This family enhances realism and applicability by integrating attention with other neural architectures or external contextual information. Reference [44] presented FlightBERT++, a non-autoregressive Transformer with a horizon-aware decoder. This model improves upon Binary Encoding (BE)-based representations to address the problem of high-order bit prediction errors (outliers). It also improves the inference speed by predicting multiple steps simultaneously with a non-autoregressive structure. Furthermore, it reduces error accumulation and outliers through differential-based prediction. The authors of [45] applied the Temporal Fusion Transformer (TFT), integrating static, known, and observed features. Using multi-route OpenSky data, it achieved highly accurate 2D position prediction, which was particularly effective in the approach and terminal phases, while providing explainability through its variable selection mechanism. The authors of [46] proposed the Inverted Transformer Framework, which tokenizes input variables and fuses multiple flight modes. This approach effectively learns global temporal patterns and inter-variable interactions from multivariate time series (ADS-B) data. As a result, the inverted input showed superior performance across all evaluation metrics compared with the original input in various Transformer models. Other notable works include reference [47], which introduced a Patched Spatial-Temporal Transformer (PSTT) that mitigated cumulative errors through patch-based division and single-step decoding, and reference [48], whose authors applied Attention-LSTM for short-term prediction using fighter engagement data.
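The high-order bit problem mentioned for binary-encoded representations can be illustrated with a generic fixed-width encoding. This sketch is not the encoding used by FlightBERT++; the bit width, the quantized altitude value, and the helper names are assumptions chosen to show why a single misclassified high-order bit yields an outlier, and why predicting a small difference bounds the damage.

```python
def encode(value, bits):
    """Fixed-width binary encoding, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(bits))]

def decode(bit_list):
    out = 0
    for b in bit_list:
        out = (out << 1) | b
    return out

def flip(bit_list, index):
    """Simulate one misclassified bit."""
    out = list(bit_list)
    out[index] ^= 1
    return out

altitude = 350                      # assumed quantized altitude value
code = encode(altitude, bits=10)
low_bit_error = abs(decode(flip(code, -1)) - altitude)   # least significant bit
high_bit_error = abs(decode(flip(code, 0)) - altitude)   # most significant bit

# Differential variant: encode only the change from the previous step.
previous, delta = 347, 3
delta_code = encode(delta, bits=4)
diff_error = abs((previous + decode(flip(delta_code, 0))) - altitude)
```

With the same kind of mistake (one flipped most significant bit), the absolute encoding is off by 512 units while the differential encoding is off by 8, since the worst-case error scales with the width of the encoded quantity.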
Attention-based models have evolved in two directions. Attention-oriented models focus on refining Transformer mechanisms to improve long-term stability, while hybrid and context-integrated models emphasize real-world applicability through contextual integration, multi-horizon prediction, and explainability.

3.3. Generative Models

Generative models are increasingly important in aircraft trajectory prediction because they can represent uncertainty and multiple possible outcomes as probability distributions. Unlike deterministic models that produce a single trajectory, generative approaches produce diverse outputs approximating real distributions, thereby improving realism and robustness. Two families are prominent: GAN-based and diffusion-based models.

3.3.1. GAN-Based Models

GANs [71] consist of a generator and a discriminator that learn adversarially to produce realistic data distributions. Reference [49] proposed a TPGAN based on a conditional GAN (CGAN) [72]. This model regards sequential trajectories as probabilistic distributions and performs multi-sequence prediction simultaneously, reducing cumulative errors. The generator used an Upsample-CNN structure, while the discriminator used a Downsample-CNN to capture local trajectory patterns. Among Conv1D-, Conv2D-, and LSTM-based models, the Conv1D-based TPGAN achieved the best balance of accuracy and computational speed, significantly alleviating cumulative errors in multi-step prediction. The authors of [28] developed a Conditional Tabular GAN (CTGAN) [73] that was tailored to trajectory data with non-Gaussian and categorical properties. By improving the sampling strategy and replacing one-hot with leave-one-out encoding, CTGAN achieved stable learning even on small datasets. It demonstrated strong robustness against data imbalance and was effective in approximating real distributions for long-term prediction.
In summary, GAN-based models excel in handling data imbalance and generating diverse trajectories, but they may still suffer from unstable training and discriminator dependence.

3.3.2. Diffusion-Based Models

Diffusion models [74] generate trajectories by gradually adding noise to data and then denoising it in reverse, thus learning to recover the original distribution. The authors of [50] were the first to introduce diffusion models into this field. Their framework combined a trajectory encoder and context encoder to incorporate the runway and arrival time, using a Transformer-based decoder for reverse diffusion. This overcame the limitations of deterministic approaches, yielding context-aware and realistic long-term predictions. Reference [51] proposed GooDFlight, a goal-conditioned diffusion framework. By integrating a goal encoder directly into the diffusion process, the generated trajectories consistently converged to the intended endpoint, ensuring both realism and reliability. GooDFlight demonstrated improved accuracy and diversity while maintaining physical consistency, making it more applicable to real-world operations.
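The forward (noising) half of the diffusion process has a closed form in the standard denoising diffusion formulation: a sample at step t is a scaled copy of the clean data plus Gaussian noise, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1 − β). The sketch below assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps, a common default that is not taken from [50] or [51], and it omits the learned reverse process entirely.

```python
import math
import random

def linear_betas(steps, start=1e-4, end=0.02):
    """Assumed linear noise schedule."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to and including step t."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def q_sample(x0, betas, t, rng):
    """Draw x_t from q(x_t | x_0) for a flat list of trajectory coordinates."""
    ab = alpha_bar(betas, t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in x0]

betas = linear_betas(1000)
rng = random.Random(0)
clean = [0.1, 0.2, 0.3, 0.4]            # assumed normalized coordinates
slightly_noisy = q_sample(clean, betas, t=0, rng=rng)
almost_noise = q_sample(clean, betas, t=999, rng=rng)
```

At the last step almost none of the original signal remains, so generation amounts to learning how to reverse this corruption step by step, optionally conditioned on context such as the runway or a goal point.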
Overall, diffusion models provide superior long-term stability and goal-oriented consistency, although they have higher computational costs than GAN-based methods.
GAN-based approaches focus on data imbalance mitigation and multi-sequence diversity, as exemplified by TPGAN [49] and CTGAN [28]. In contrast, diffusion models such as those by Yin et al. [50] and GooDFlight [51] emphasize contextual consistency and reliability, extending applicability to real-world operational environments.

3.4. Graph-Based Models

Graph neural networks (GNNs) [75] provide advantages for aircraft trajectory prediction by modeling irregular relational information, such as multi-aircraft interactions, state dependencies, and nonlinear spatiotemporal patterns. The Graph Convolutional Network (GCN) [76] applies convolution to graph data by aggregating neighbor information, effectively capturing local structures but limiting the representation of complex relationships due to uniform weighting. To address this, the Graph Attention Network (GAT) [77] includes attention mechanisms to assign dynamic weights to neighbors, enabling more precise interaction modeling at the cost of higher computational complexity.
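The contrast between uniform and attention-weighted neighbor aggregation can be sketched on a three-aircraft graph. This is a deliberately simplified illustration: the mean aggregation stands in for GCN's degree-normalized propagation, and the hand-written distance-based score stands in for GAT's learned attention function; neither learned weight matrices nor nonlinearities are included, and the positions are assumed values.

```python
import math

def mean_aggregate(features, neighbors, node):
    """Uniform weighting: average the node's own features with its neighbors'."""
    group = neighbors[node] + [node]
    dim = len(features[node])
    return [sum(features[n][j] for n in group) / len(group) for j in range(dim)]

def attention_aggregate(features, neighbors, node, score):
    """Dynamic weighting: softmax of a pairwise score decides each contribution."""
    group = neighbors[node] + [node]
    raw = [score(features[node], features[n]) for n in group]
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[node])
    return [sum(w * features[n][j] for w, n in zip(weights, group))
            for j in range(dim)]

def closeness(a, b):
    """Hypothetical score: nearer aircraft receive larger weights."""
    return -math.dist(a, b)

positions = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [10.0, 0.0]}   # assumed positions
neighbors = {0: [1, 2]}
uniform = mean_aggregate(positions, neighbors, 0)
weighted = attention_aggregate(positions, neighbors, 0, closeness)
```

Under uniform weighting, the distant aircraft pulls the aggregated feature as strongly as the nearby one; with attention-style weights, its influence is almost suppressed, at the cost of computing a score for every edge.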
The authors of [25] proposed an attention-based Graph Convolutional Network (AGCN) combined with GRU. Aircraft are modeled as nodes with adjacency matrices representing interactions, while the AGCN selectively extracts key relational features before the GRU captures temporal dependencies. This structure is particularly effective in capturing multi-aircraft combat dynamics, though it is limited to simulation data without real weather or intent variables. Reference [26] introduced a spatiotemporal Graph Attention Network (ST-GAT), which integrates Transformer modules for temporal sequence modeling with GAT modules for spatial interaction learning. The two outputs are fused in a fully connected decoder to generate final trajectory predictions. This architecture enables robust spatiotemporal integration in multi-agent settings, but the evaluation is confined to simulated scenarios without pilot or environmental data. The authors of [52] developed a Global–Local Interattribute Relationship GCN (GLR-GCN), employing dual graphs: a global graph for capturing broad correlations among state variables and a local graph for physically related variables. Each graph undergoes GCN operations, and outputs are refined with a temporal convolution layer. This dual structure explicitly encodes both global and local relationships, outperforming baselines, but it remains sensitive to noise and lacks testing in UAV or combat environments. Reference [53] proposed a Dual Attention Spatiotemporal GCN (DA-STGCN), which extends the conventional ST-GCN by incorporating both temporal attention to capture stepwise importance and node attention to model dynamic aircraft interactions. The addition of temporal and node-level attention strengthens terminal area predictions and enhances global/local relational learning, but the model’s high computational demand and absence of weather/ATC inputs remain challenges.
Overall, graph-based models highlight that explicitly modeling relational and interactional structures in trajectory prediction significantly enhances both accuracy and interpretability. Although issues of computational complexity and scalability persist, these models show strong potential in dense airspaces and highly interactive traffic environments.

3.5. Hybrid and Integrated Models

Recent studies in aircraft trajectory prediction emphasize that a single deep learning structure cannot adequately capture the complexity of spatiotemporal patterns and external factors. To address this, hybrid and integrated models combine heterogeneous neural architectures or extend learning strategies to leverage their complementary strengths. These models can be grouped into two subcategories: structural hybrid prediction models and representation learning and generalization-based models.

3.5.1. Structural Hybrid Prediction Models

Structural hybrid prediction models aim to enhance both accuracy and stability by integrating the strengths of different deep learning architectures such as RNN, CNN, attention, Transformer, or GAN. While these approaches improve performance in both short- and long-term prediction and effectively utilize diverse input variables (ADS-B, weather, flight plans, etc.), they also increase the model’s complexity and computational cost.
The authors of [54] proposed a CNN-LSTM model that extracts local spatial patterns from ADS-B trajectory data [78] using a 1D-CNN and learns temporal dependencies through a two-layer LSTM. This design demonstrated stable performance for both single- and multi-step prediction. Similarly, the authors of [55] developed a Phase Hybrid model that explicitly divides flight stages (climb, cruise, and descent) and applies specialized ST-LSTM with spatiotemporal graph (ST-Graph) and CNN modules for each phase. By integrating weather data and an attention mechanism, this approach reflects both stage-specific dynamics and external influences.
Beyond CNN–RNN combinations, some works explored generative-predictor integration. Reference [56] introduced the Deep Generative and Predictive Network Model (DGPNM), which combines a WGAN-based [79] generator with an LSTM predictor. The generator diversifies trajectory samples, while the predictor learns from these expanded distributions to improve robustness under data-scarce conditions. Similarly, the authors of [17] proposed CG3D, which couples CNN-GRU with a 3D CNN and employs Monte Carlo Dropout to both capture spatiotemporal dependencies and quantify prediction uncertainty, providing reliability in long-horizon forecasts.
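The Monte Carlo Dropout idea used for uncertainty quantification in [17] can be illustrated with a minimal sketch. The toy linear predictor, its weights, the dropout rate, and the number of passes below are illustrative assumptions and are not taken from CG3D; the sketch only shows that dropout stays active at inference time and that the spread over repeated stochastic passes is read as an uncertainty estimate.

```python
import random
import statistics

def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy linear predictor with dropout left ON."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        # Each input unit is dropped with probability drop_prob;
        # kept units are rescaled by 1/keep (inverted dropout).
        if rng.random() < keep:
            total += (x / keep) * w
    return total

def mc_dropout_predict(features, weights, drop_prob=0.2, passes=200, seed=0):
    """Return (mean, std) of the prediction over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng)
               for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical normalized input features and weights.
features = [0.5, 1.0, -0.3, 0.8]
weights = [0.4, 0.3, 0.2, 0.1]
mean, std = mc_dropout_predict(features, weights)
```

A larger standard deviation signals a less reliable prediction, which is the property that makes this procedure attractive for long-horizon forecasts.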
Advanced integration has also been explored through Transformer- and GAN-based designs. Reference [57] presented the SATF (Spatially Aware Time-Frequency Transformer), which fuses a CNN-based spatial encoder with a Time-Frequency Transformer to simultaneously capture spatial distributions and frequency-domain variability. The authors of [58] combined dilated TCN and GRU with attention, enabling efficient long-term temporal learning, short-term dependency modeling, and selective weighting of key features. Reference [18] introduced a CNN-LSTM + attention + social pooling structure to explicitly model multi-aircraft interactions by aggregating relative positions and inter-aircraft dynamics. The authors of [19] proposed the IMM-Informer hybrid, merging physics-based motion models (Constant Velocity, Constant Acceleration, and Constant Turn) with the Informer architecture to preserve physical consistency while improving long-term deep learning prediction. Finally, the authors of [59] developed the GCTrajectory model, which incorporates a CNN-Transformer-based generator with a Bi-LSTM discriminator in a GAN framework, thereby generating realistic trajectories while maintaining temporal coherence.
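The three physics-based motion models named for the IMM-Informer hybrid [19] can be written as simple kinematic propagators. The following is a minimal 2D sketch under assumed state layouts and function names; it is not the IMM filter itself (there are no mode probabilities or Kalman updates) and does not reproduce the implementation in [19].

```python
import math

def constant_velocity(x, y, vx, vy, dt):
    """CV: velocity stays fixed over the step."""
    return x + vx * dt, y + vy * dt, vx, vy

def constant_acceleration(x, y, vx, vy, ax, ay, dt):
    """CA: acceleration stays fixed over the step."""
    nx = x + vx * dt + 0.5 * ax * dt ** 2
    ny = y + vy * dt + 0.5 * ay * dt ** 2
    return nx, ny, vx + ax * dt, vy + ay * dt

def constant_turn(x, y, speed, heading, turn_rate, dt):
    """CT: speed and turn rate (rad/s) stay fixed; heading is measured from the x-axis."""
    if abs(turn_rate) < 1e-9:
        # Degenerates to straight-line motion.
        return (x + speed * math.cos(heading) * dt,
                y + speed * math.sin(heading) * dt,
                heading)
    new_heading = heading + turn_rate * dt
    radius = speed / turn_rate
    nx = x + radius * (math.sin(new_heading) - math.sin(heading))
    ny = y - radius * (math.cos(new_heading) - math.cos(heading))
    return nx, ny, new_heading
```

In a hybrid design of this kind, propagators like these supply a physically consistent estimate, and the learned component is left to model what the kinematics cannot explain.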
Taken together, these works can be grouped into three categories. First, CNN–RNN hybrids [54,55,60,61,62,63,64] focus on jointly learning spatial and temporal dependencies, with some studies [60,61] enhancing this design using attention for long-term dependencies. Second, phase- and situation-specific models [20,65,66,67] incorporate stage-based dynamics or additional intent/contextual information, showing strong adaptability in terminal and complex airspace scenarios. Third, advanced integrated and generative designs [18,19,58,59] push beyond conventional limits by combining state-of-the-art components—such as TCN, Transformer, social pooling, GAN, and physics-informed modules—demonstrating the potential to address multi-aircraft interactions, long-term stability, and data scarcity simultaneously.
Overall, structural hybrid models differ in architecture but share the common principle of combining heterogeneous modules to overcome the limitations of single structures. They demonstrate situationally optimized designs, whether for short- or long-term horizons, stage-specific predictions, or multi-aircraft interaction modeling.

3.5.2. Representation Learning and Generalization-Based Models

Representation learning and generalization-based models aim to learn universal representations that can be used in various downstream tasks, such as classification, clustering, and anomaly detection, beyond simple trajectory prediction. This approach uses self-supervised learning [80], contrastive learning [81], and Transformer-based encoders to generate robust feature embeddings across diverse flight scenarios and even has advantages in data-scarce environments. Reference [68] proposed a Hybrid–Recurrent framework that learns weather data and ADS-B data together. A CNN or self-attention module extracts spatial features of weather data, while LSTM, GRU, or IndRNN learns temporal dependencies. This structure showed high accuracy on specific routes but revealed limitations in generalization performance, as errors increased by 70–500% on unseen routes. Reference [69] proposed TSCC (Trajectory Segmentation-based Contrastive Coding), which segments trajectories into semantic units and applies contrastive learning. By using a Transformer-based encoder to learn both continuity and local features, performance improved in terms of trajectory classification and anomaly detection, and robust representations were generated across various downstream tasks. Reference [70] proposed the FLIGHT2VEC model, which learned trajectory representations using behavior-adaptive patching and motion trend learning, combined with Transformer-based self-supervised learning. This model showed excellent performance in various tasks such as prediction, recognition, and anomaly detection, and had significant advantages in efficiency compared with existing models.
These approaches are differentiated in that they focus not on improving trajectory prediction accuracy itself but on learning universal and reusable representations. This allows them to maintain a certain level of performance, even when datasets and tasks change, and provides an important foundation for future air traffic management and diverse areas of research.
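As an illustration of the contrastive objective behind such representation learning, the sketch below computes an InfoNCE-style loss for one anchor embedding. The choice of InfoNCE, cosine similarity, and the temperature value are assumptions made for illustration; the source does not specify the exact loss used in TSCC [69] or FLIGHT2VEC [70].

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when the anchor segment lies close to its positive (for example, another view of the same trajectory segment) and far from the negatives, which is what pushes the encoder toward reusable embeddings.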
To consolidate the discussion of individual categories, Table 4 provides a comparative summary of the deep learning model families, including their representative models, typical datasets, major strengths, and known limitations.
Building on this tabular comparison, Figure 4 presents a flow diagram that visually summarizes the taxonomy of deep learning models for aircraft trajectory prediction and highlights their representative performance characteristics across key evaluation metrics.

3.6. Classification of Composite Structure Models

In recent aircraft trajectory prediction research, rather than relying on a single neural network structure, there have been active attempts to improve performance by combining multiple modules or introducing step-by-step procedures. Such composite structure approaches are effective in simultaneously learning spatiotemporal characteristics and securing prediction stability. In this section, composite structure models are analyzed by dividing them into multi-module combined models and multi-step-based models.

3.6.1. Multi-Module Combined Models

Multi-module combined models integrate heterogeneous neural networks such as CNN, RNN, Transformer, and GCN to overcome the limitations of single structures. This design allows for simultaneous learning of spatiotemporal dependencies, aircraft interactions, and complex data distributions. They are broadly divided into structural combinations (e.g., Conv + RNN, Graph + DL, and Conv + Transformer) aimed at enhancing spatiotemporal learning, and purpose-driven extensions (e.g., generative/adversarial + predictor, representation learning/SSL, and specialized modules), with goals such as data diversity, generalization, or domain-specific adaptations. Together, these models extend prediction capabilities beyond accuracy improvement to robustness and broader applicability. Table 5 shows our classification of multi-module combined models.
(1)
Conv + RNN Fusion Models
Conv + RNN is the most widely used structure, where CNN/TCN extracts spatial patterns and LSTM/GRU learns temporal dependencies. Representative models such as CNN-LSTM [54,59] reduced the RMSE and MAE compared with single models, while TCN-GRU + Attention [66] enhanced long-term learning. These models are simple and stable but limited in long-horizon prediction compared with Transformer-based approaches.
(2)
Graph + Deep Learning Fusion Models
Graph-based models explicitly reflect aircraft interactions and airspace networks. For example, LSTM + GCN + Attention [25] improved accuracy by modeling multi-aircraft correlations, while Transformer + GAT [26] and GLR-GCN + TCN [52] performed well in congested airspaces. However, graph definition and edge weighting are complex, and large-scale computation hinders real-time use.
(3)
Conv + Transformer Fusion Models
This category combines CNN/TCN’s local feature extraction with Transformer’s long-term dependency learning. TCN-Informer [65] exhibited improved accuracy and efficiency, while CNN + Transformer-based generative models [57,59] captured both local and global patterns, generating more realistic trajectories. Despite strong scalability, the computational cost and memory demand remain challenges.
(4)
Generative/Adversarial + Predictor Models
These models integrate generative and predictive modules to address data scarcity and imbalance. WGAN-GP + LSTM [56] preserved trajectory distributions while generating new sequences, and CNN-Transformer generators with Bi-LSTM discriminators [59] improved realism and long-term accuracy. Training instability and discriminator dependence remain limitations.
(5)
Representation Learning/Self-Supervised Learning-Based Models
Representation learning enhances generalization via contrastive or self-supervised learning. Trajectory Contrastive Coding [69] produced robust embeddings, while FLIGHT2VEC [70] combined behavior-adaptive patching and motion trend learning to achieve strong results in prediction, recognition, and anomaly detection. These approaches secure universal representation spaces but often require large-scale pre-training and lack interpretability.
(6)
Other Specialized Modules
Several models add modules that are tailored to specific scenarios. Examples include clustering–CNN [62] for efficiency, Spatiotemporal Attention + RNN [20] for enhanced interaction modeling, and social-pooling [18] for neighboring aircraft interactions. IMM + Informer [19] and Bi-LSTM + AE + Voting [64] introduced correction modules for stability. While effective for unique challenges such as imbalance and multi-aircraft dynamics, these designs increase complexity and reduce efficiency.
Multi-module combined models are classified into six approaches, with structural combinations contributing to enhancing spatiotemporal learning ability, and purpose-driven extensions contributing to data diversity, generalization, and reflection of special situations. This shows that beyond simple performance improvement, they strengthen the scalability and practicality of trajectory prediction research.

3.6.2. Multi-Step-Based Models

Unlike single-pass architectures, multi-step-based models perform trajectory prediction sequentially through step-by-step procedures. This design improves stability, reliability, and efficiency by incorporating processes such as error correction, validation, ensembling, and uncertainty quantification. In this review, multi-step approaches are grouped into five categories.
(1)
Prediction–Correction Structures
The studies in [19,67,69] generated initial predictions and refined them using correction modules such as AutoEncoder, Informer, or IMM. For example, Bi-LSTM–AutoEncoder–Voting [64] reduced error variance via a multi-stage pipeline, while IMM–Informer [19] combined physics-based initial estimation with deep learning correction to improve performance across flight phases.
(2)
Generation–Validation Structures
References [17,59] introduce concepts from GAN-based approaches. Generators produce candidate trajectories, while discriminators validate realism, or uncertainty quantification is added after prediction. The CNN + Transformer generator with a Bi-LSTM discriminator [59] improved both diversity and reliability, whereas Monte Carlo Dropout [17] enhanced confidence in prediction results.
(3)
Ensemble–Voting Structures
The authors of [64] increased prediction stability by combining multiple prediction results. Although this overlaps with the prediction–correction type, it has separate significance in reducing prediction variance and securing robustness by integrating multi-module results.
(4)
Uncertainty Quantification Structures
Reference [17] describes procedures to quantify uncertainty after prediction, complementing result interpretation and reliability. This reflects that uncertainty management is an important issue in aircraft trajectory prediction and expands applicability in terms of safety and real-time decision-making.
(5)
Preprocessing–Prediction Structures
The structures described in [20,40,68] performed preprocessing such as clustering, spatial feature extraction, or attention insertion at the input stage, followed by the predictor. Such preprocessing contributes to improving data efficiency and stabilizing model performance in complex or imbalanced datasets.
Multi-step-based models cover five complementary categories: prediction–correction, generation–validation, ensemble, uncertainty quantification, and preprocessing–prediction. Their common feature is compensating for the limitations of single structures, thereby strengthening the practicality and reliability of aircraft trajectory prediction. However, challenges remain, including increased computational costs and optimization across modules. Some studies [19,59,64] also exhibit hybrid characteristics of both multi-step and multi-module approaches, indicating that future research is moving toward more complex and converged designs.
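The ensemble step described above can be sketched as a fusion of several predicted trajectories. The coordinate-wise median used here is an assumption chosen for illustration; the source does not state which voting rule [64] applies.

```python
import statistics

def ensemble_median(predictions):
    """Combine per-model trajectories by taking the coordinate-wise median.

    predictions: list of trajectories, each a list of (x, y) tuples of equal length.
    """
    steps = len(predictions[0])
    if any(len(p) != steps for p in predictions):
        raise ValueError("all trajectories must have the same number of steps")
    fused = []
    for t in range(steps):
        xs = [p[t][0] for p in predictions]
        ys = [p[t][1] for p in predictions]
        fused.append((statistics.median(xs), statistics.median(ys)))
    return fused
```

With three predictors, a single diverging model cannot move the fused trajectory, which is one simple way an ensemble reduces prediction variance.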

4. Performance Evaluation and Analysis

In Section 3, the structural characteristics of deep learning models for aircraft trajectory prediction were summarized. This section reviews commonly used evaluation metrics and datasets, compares model performance, and provides answers to RQ2 and RQ3 while highlighting strengths, limitations, and future research directions.
RQ2: Which evaluation metrics are commonly used to compare the performance of aircraft trajectory prediction models?

4.1. Evaluation Metrics

In aircraft trajectory prediction research, a wide range of metrics have been employed to assess models’ accuracy and stability. Initially, evaluation focused on traditional error-based measures such as the MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) [82]. The MAE, which is the average absolute difference between predicted and actual values, is intuitive and less sensitive to outliers than the RMSE, while the RMSE penalizes large errors more heavily, making it effective for evaluating sharp maneuvers or turning segments.
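The different sensitivity of the two metrics to large errors can be seen in a small worked example. The values below are made up for illustration and do not come from any reviewed study.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.0, 10.0, 10.0, 10.0]
steady = [11.0, 9.0, 11.0, 9.0]    # four errors of magnitude 1
spiky = [10.0, 10.0, 10.0, 14.0]   # a single error of magnitude 4

# Both predictions have MAE = 1.0, but the RMSE is 1.0 for `steady`
# and 2.0 for `spiky`, because squaring emphasizes the large error.
```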
The MAPE (Mean Absolute Percentage Error) has been adopted as an auxiliary index in some studies due to its intuitive interpretation, although it becomes unstable when actual values approach zero [83]. To account for temporal alignment, DTW (Dynamic Time Warping) is used to compare trajectories of different lengths or time intervals [84], while the MED (Mean Euclidean Distance) captures spatial accuracy by measuring the distance between predicted and actual positions [85].
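A minimal sketch of DTW shows how it aligns trajectories of different lengths. This is the classic quadratic-time dynamic program with Euclidean point distance; the reviewed studies may use windowed or otherwise optimized variants, so the details here are assumptions.

```python
import math

def dtw_distance(traj_a, traj_b):
    """Classic O(n*m) DTW between two sequences of coordinate tuples."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning the first i points
    # of traj_a with the first j points of traj_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two tracks that follow the same path but are sampled at different rates obtain a DTW distance of zero, whereas a pointwise metric cannot even be computed on sequences of unequal length. The nested loop also makes the quadratic cost explicit.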
With the growth of long-term prediction and generative approaches, the ADE (Average Displacement Error) and FDE (Final Displacement Error) have become increasingly important [86]. The ADE evaluates consistency across the entire predicted trajectory, while the FDE focuses on the positional error at the final point, which is particularly critical for landing point estimation and long-horizon forecasting.
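The two displacement metrics can be computed in a few lines. The sketch assumes that both trajectories are already expressed in the same Cartesian units and sampled at the same time steps; the function name is hypothetical.

```python
import math

def displacement_errors(predicted, actual):
    """Return (ADE, FDE) for two equal-length trajectories of (x, y) or (x, y, z) points."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    ade = sum(distances) / len(distances)  # mean error over all steps
    fde = distances[-1]                    # error at the final step only
    return ade, fde
```

For a prediction that drifts steadily away from the true track, the FDE exceeds the ADE, which is why the FDE is the more demanding measure at long horizons.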
In recent studies, probability-based measures such as the Negative Log Likelihood (NLL) [87] and Continuous Ranked Probability Score (CRPS) [88] have been introduced to quantify the uncertainty and reliability of predictions. Additionally, quality metrics such as the F1 score [89], AUC [90], and diversity [91] have been employed, particularly in generative modeling contexts.
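For a model that outputs a Gaussian predictive distribution, both probabilistic scores have closed forms. The Gaussian assumption and the function names below are illustrative choices; the reviewed studies may use other predictive distributions or sample-based estimators.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # standard normal CDF
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))
```

Both scores penalize an overconfident forecast: for the same mean error, a very small predicted sigma yields a much larger NLL than a sigma that matches the actual error. The CRPS is expressed in the units of the predicted variable and approaches the absolute error as sigma tends to zero.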
Table 6 summarizes the evaluation metrics used in the 46 reviewed studies. The category “other” includes less commonly used metrics such as the MMD, IS, FID, R2, F1, AUROC, and AUPR. The MAE and RMSE were used in 28 and 27 studies, respectively, showing the highest frequencies, while the adoption of ADE and FDE has increased rapidly since 2022. The MAPE, DTW, and MED were only used in a limited number of studies.
Figure 5 illustrates annual adoption trends. The MAE and RMSE have consistently served as standard metrics since 2020, while the ADE and FDE gained prominence after 2022, especially in generative and long-term prediction studies from 2023 to 2025. Notably, many recent works combined the ADE/FDE with the RMSE to jointly assess both short-term accuracy and long-term stability.
In addition, probability-based metrics (NLL and CRPS) and quality metrics (F1 score, AUC, and diversity) have been introduced in a limited way since 2023, mainly in Bayesian or generative models. This reflects the shift from focusing solely on accuracy toward incorporating uncertainty quantification and diversity assessment, which are critical for practical deployment.
In practice, short-term prediction scenarios benefit more from error-based metrics such as the MAE and RMSE, as they emphasize pointwise accuracy and are less sensitive to trajectory divergence over time. In contrast, the ADE and FDE are more suitable for long-term prediction and generative tasks, as they capture cumulative deviation and final position accuracy, which are critical for applications such as landing point estimation and conflict detection. Probabilistic measures (e.g., NLL and CRPS) further complement these by quantifying the prediction uncertainty, making them particularly relevant for safety-critical operations.
Each evaluation metric has its own characteristics. The MAE and RMSE remain the most widely used because they are simple, well-standardized, and allow for consistent comparison across datasets and studies. The ADE and FDE, on the other hand, have rapidly gained in popularity since 2022 with the rise in long-horizon and generative models, as they effectively capture cumulative trajectory deviation and endpoint accuracy, which are critical for operational applications such as landing point estimation. In contrast, metrics such as the MAPE, DTW, and MED are used less frequently due to inherent drawbacks: the MAPE becomes unstable when true values approach zero (e.g., altitude or ground speed), DTW is computationally expensive and less practical for large-scale or real-time evaluation, and the MED, while intuitive, largely overlaps with the ADE/FDE and therefore contributes little additional insight. Thus, the observed differences reflect not only research trends but also the practical suitability and limitations of each metric, with adoption rates varying according to specific application contexts.
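The instability of the MAPE near zero can be made concrete with a short example. The altitude values are invented for illustration only.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Identical absolute errors of 50 m at two different altitude levels.
cruise_actual, cruise_pred = [10000.0, 10000.0], [10050.0, 9950.0]
mixed_actual, mixed_pred = [5.0, 10000.0], [55.0, 9950.0]

cruise_score = mape(cruise_actual, cruise_pred)  # about 0.5 %
mixed_score = mape(mixed_actual, mixed_pred)     # about 500 %
```

A single sample taken close to the ground dominates the score even though the absolute error is unchanged, which illustrates why the MAPE is rarely used as a primary metric.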
RQ3: Which types of data and representative datasets are used as inputs for aircraft trajectory prediction models?

4.2. Datasets

The performance of trajectory prediction models is highly dependent on the datasets that are used for training and evaluation. Table 7 provides a concise summary of the major datasets, while the following paragraphs describe their structure, scope, strengths, and limitations in more detail.
  • ADS-B provides state vectors such as latitude/longitude, altitude, speed, heading, timestamp, and aircraft identifiers, which are broadcast by aircraft transponders. It is widely used due to its global availability, but raw data often contain noise, missing values, and inconsistencies that require careful preprocessing and filtering.
  • OpenSky Network [13] is a crowdsourced repository that aggregates ADS-B signals from a distributed sensor network. It offers open access and has become a standard research dataset. However, its coverage is uneven—dense across Europe and North America but sparse elsewhere—leading to potential biases and limited generalizability if used in isolation.
  • Institutional Radar and Flight Plan Data (FAA [12], EUROCONTROL [92], CAAC [93], and NATS [94]) provide high-fidelity radar tracks, flight plans, and sometimes weather information. These datasets are region-specific, and access is often restricted, but they offer strong reliability and realism when integrated with open sources.
  • Commercial Platforms (Flightradar24 [95], FlightAware [96], and ADS-B Exchange [97]) provide user-friendly interfaces and broad global coverage. They are frequently used in academic and applied studies. However, subscription-based access tiers may limit reproducibility and scalability for large-scale research.
  • Simulation Data (e.g., DCS World [98], TacView [99], and Air Combat [100]) allow for flexible scenario design, particularly for rare or extreme conditions such as combat maneuvers or emergency situations. While synthetic data are noise-free and customizable, they cannot fully replicate the complexity of real-world operational environments.
While Table 7 summarizes the core characteristics of these datasets, the above descriptions clarify their structural fields, coverage differences, and limitations. In particular, the uneven coverage of the OpenSky Network highlights the importance of integrating multiple complementary sources to ensure generalizability and reproducibility in trajectory prediction research.
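As one example of the preprocessing that raw ADS-B data typically require, the sketch below fills missing samples by linear interpolation over time. The function name, the use of None for missing values, and the choice of linear interpolation are assumptions for illustration; the reviewed studies apply a variety of cleaning and filtering procedures.

```python
def fill_missing(timestamps, values):
    """Linearly interpolate None entries between the nearest valid neighbours.

    Leading or trailing gaps are left as None because there is no
    second neighbour to interpolate from.
    """
    filled = list(values)
    valid = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(valid, valid[1:]):
        span = timestamps[right] - timestamps[left]
        for i in range(left + 1, right):
            frac = (timestamps[i] - timestamps[left]) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```

For instance, with timestamps `[0, 1, 2, 4]` and altitudes `[1000, None, None, 1400]`, the two gaps are filled with 1100 and 1200, respecting the uneven sampling interval.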
Table 7. Major dataset sources for aircraft trajectory prediction, including data types, strengths, limitations, and accessibility conditions.
| Source | Provided Data (Data Structure) | Strengths | Limitations | Accessibility |
|---|---|---|---|---|
| OpenSky Network [13] | Global ADS-B data (latitude, longitude, altitude, speed, track, timestamp, aircraft ID). | Free and open access, large-scale data availability, widely used in the research community. | Uneven regional coverage, missing values, and noise present. | Open (Free) |
| FAA (Federal Aviation Administration) [12] | U.S. ADS-B, radar tracks, flight plans, weather data. | High reliability for U.S. air traffic data, integration with auxiliary data possible. | Restricted access for some datasets, prior authorization may be required. | Partially Restricted |
| CAAC (Civil Aviation Administration of China) [93] | ADS-B-based positions, speeds, altitudes, and airport-specific data in China. | Large-scale trajectory data for Chinese air routes and airports. | Limited public availability, access procedures required. | Restricted |
| EUROCONTROL [92] | European air traffic data (flight plans, ADS-B/radar tracks, meteorological data). | Comprehensive coverage of European airspace, tailored for ATM research. | Access restrictions, often requires collaborative projects. | Restricted |
| NATS (UK National Air Traffic Services) [94] | U.K. and European flight trajectories, flight plans, ATC data. | High-quality data, suitable for ATC simulation research. | Limited public release, typically requires institutional collaboration. | Restricted |
| Commercial Platforms | Real-time and historical aircraft positions and trajectories. | User-friendly, high-quality data available in paid version. | Free version limited, scalability issues for large-scale research. | Freemium (Free + Paid Tiers) |
Table 8 summarizes the main dataset sources used in the reviewed studies and their frequency of use. The most frequently used dataset was the OpenSky Network (12 studies), which provides free global ADS-B data with high reproducibility and accessibility. Commercial platforms (12 studies) were also widely used, including ADS-B data collected from services such as Flightradar24 and ADS-B Exchange.
For region-specific datasets, CAAC (China, six studies), FAA (United States, two studies), EUROCONTROL (one study), CETC/HU7603·ATMB (China, three studies), and SCAT (Sweden, one study) were identified. These datasets are highly reliable as they target specific national or regional air routes and airspaces, but they have the limitation of restricted accessibility. Simulation datasets (six studies) were also used, which have the advantage of reflecting diverse maneuvers and scenarios that are difficult to capture in real operational data.
Figure 6 shows the overall distribution of dataset types used in the reviewed studies. About 64% of the studies were based on real ADS-B datasets, while synthetic data (14%), flight plan (10%), and weather data (8%) were also utilized. Radar and GNSS data were used in a limited number of cases.
Figure 7 shows the annual trends in dataset usage. ADS-B remained dominant throughout the period, but since 2022, studies using multi-data integration with flight plan and weather data have increased.
From this analysis, it can be seen that while aircraft trajectory prediction research still relies heavily on ADS-B-based data, the use of multi-data integration and synthetic datasets has been gradually expanding in recent years. This trend reflects both the strengths and limitations of ADS-B data. ADS-B remains the most widely used source due to its global availability, high temporal resolution, and open accessibility, making it indispensable for large-scale trajectory modeling. However, ADS-B data also suffer from noise, missing values, and uneven coverage across regions, which limit their reliability for operational deployment. To overcome these issues and enhance realism, recent studies have increasingly integrated complementary datasets such as radar tracks, flight plans, and weather information, which provide higher accuracy and contextual factors. In parallel, synthetic and simulation datasets are being adopted to compensate for data imbalance and to generate rare or extreme scenarios that are underrepresented in real-world data. Thus, the gradual expansion of multi-source integration and synthetic datasets does not imply that ADS-B is obsolete but rather that it is being complemented to address its inherent limitations and support broader research objectives.

4.3. Performance Comparison

This section compares and analyzes how the major deep learning model families perform in actual experimental environments, using the evaluation metrics and datasets discussed in Section 4.1 and Section 4.2. This analysis expands the discussion on the metrics and datasets summarized in RQ2 and RQ3 and presents concrete applications for understanding the structural characteristics of models, as introduced in RQ1.

4.3.1. RNN-Based Models

The RNN family represents the earliest deep learning approach to be applied to aircraft trajectory prediction and remains a common baseline. LSTM models [15] showed stable short- and mid-term performance with ADS-B datasets, but their accuracy declined significantly at longer horizons, particularly in terms of altitude prediction. Modified architectures, such as Social-LSTM [35], ConvLSTM [36], Constrained LSTM [37], and Bi-LSTM [39], were introduced to address these weaknesses.
Social-LSTM [35] integrated social pooling to capture multi-aircraft interactions, reducing errors by approximately 10–15% over standard LSTM in congested airspaces. ConvLSTM [36] combined a CNN for spatial feature extraction with LSTM for temporal learning, lowering the RMSE from about 0.12–0.015 km to about 0.011 km and effectively mitigating cumulative long-term errors. Constrained LSTM [37] incorporated phase-specific flight dynamics (climb, cruise, and descent), improving the prediction accuracy and better aligning outputs with real operational patterns. Bi-LSTM [39] enhanced robustness against missing or incomplete inputs by learning bi-directional temporal dependencies, achieving slightly improved RMSE and MAE values compared with the baseline LSTM.
Overall, RNN-based models continue to provide reliable baseline performance in short- and mid-term prediction. However, their limitations in long-horizon accuracy and generalization remain evident, and improvements through structural variants often come at the cost of increased complexity and computational demand. The detailed structural features, datasets, and performance results of RNN-based models are summarized in Table 9.

4.3.2. Attention-Based Models

Attention-based models were introduced in aircraft trajectory prediction to overcome the long-term dependency problem and the sequential processing limitations of RNNs. The self-attention mechanism enables learning of global patterns across the entire input sequence, reducing cumulative errors in long-term prediction and securing efficiency through parallel computation. Consequently, recent studies increasingly favor attention-based models over the RNN family, as they provide more reliable long-horizon predictions and improved computational scalability. These models can be divided into two categories: attention-oriented models, which directly adopt Transformer-based structures, and hybrid and context-integrated models, which combine attention with CNNs, RNNs, or external contextual information.
Attention-oriented models focus on enhancing long-term prediction through extensions of Transformer structures. For example, ref. [40] introduced the Trajectory Embedding Transformer (TET) with positional encoding and an encoder–decoder structure, achieving an ADE of 1.84, FDE of 2.37, and MDE of 3.92 on OpenSky datasets, with up to a 51.4% FDE improvement over RNNs. The authors of [41] developed the Flight Trajectory Transformer (FT-TF), which outperformed LSTM, BP, Autoformer, and Informer on the FLIGHT19 dataset, with an altitude RMSE of 124.35 and latitude/longitude RMSEs of 0.0081° and 0.0132°. Reference [42] proposed an Attention-LSTM using ADS-B data (Civil Aviation Administration of China, 2020), where attention improved the RMSE, MAE, DTW, and MRE compared with LSTM and other baselines. Reference [43] proposed stabilization and one-step inference modules that reduced the RMSE by 10–30% compared with GRU, LSTM, CNN-LSTM, and the vanilla Transformer, while [21] introduced the Noise-Robust Autoregressive Transformer (NRAT), which improved robustness with Gaussian noise injection, hybrid positional encoding, and scheduled sampling, reducing the RMSE by more than 27% at a prediction horizon of 100.
Hybrid and context-integrated models strengthen practical applicability by combining attention with other networks or external variables. Reference [44] presented FlightBERT++, combining Conv1D + Transformer encoders with a horizon-aware context generator, achieving an MAE of 0.0017–0.0124 and RMSE of 1.15–7.43 m across horizons with inference speeds that were suitable for real-time use. The authors of [45] applied the Temporal Fusion Transformer (TFT) on 7146 OpenSky flights, showing latitude/longitude MAEs of 0.0133° and 0.0170°, thus outperforming LSTM, although the altitude accuracies remained similar. Reference [46] proposed the Inverted Transformer, treating variables as tokens, which reduced the MAE to 0.0602 on the CETC-10 dataset and showed potential for free route airspace and collision avoidance. The authors of [47] developed the Patched Spatial-Temporal Transformer (PSTT), which segmented trajectories into patches and applied spatial-temporal attention, recording an MAE of 0.179 and MSE of 0.161, with spatial attention contributing the most to performance. Reference [48] proposed an Attention-LSTM encoder–decoder for military aerial combat data, achieving an ADE of 0.625, a 32% improvement over Bi-LSTM. The detailed structural features, datasets, and performance results of attention-based models are summarized in Table 10.
In summary, attention-based models consistently outperform RNN-based models in capturing long-term dependencies, handling large-scale data, and ensuring robustness in noisy conditions. Hybrid and context-integrated approaches further enhance applicability by incorporating external features, although high computational cost and dependence on large-scale training data remain major challenges.
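The self-attention mechanism that underlies the models in this subsection can be sketched in plain Python. To keep the example short, the query, key, and value projections are taken to be the identity, which is a simplifying assumption; real Transformer layers learn separate projection matrices and use multiple heads.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    x: list of token vectors, one per time step. Returns (outputs, weights).
    """
    d = len(x[0])
    # Every time step is scored against every other time step at once.
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    outputs = [[sum(w * v[c] for w, v in zip(row, x)) for c in range(d)] for row in weights]
    return outputs, weights
```

Because each row of the weight matrix covers the whole sequence, a distant time step can influence the output directly instead of through a chain of recurrent updates. The rows are independent of one another, which is what permits parallel computation.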

4.3.3. Generative Models

Generative models extend beyond simple trajectory prediction by generating new samples or expanding data distributions to secure generalization and diversity. This family includes GAN and diffusion approaches, which differ from RNN- and attention-based models by also employing the ADE, FDE, MED, diversity score, and NLL as evaluation metrics. GAN-based methods learn trajectory distributions to create realistic sequences. For example, ref. [49] proposed a Conv1D-based TPGAN on Beijing–Chengdu ADS-B data, achieving MAEs of 0.070 (Lat), 0.055 (Lon), and 0.041 (Alt), outperforming the Conv2D and LSTM variants. Reference [28] introduced a CTGAN model using 2,100 Hong Kong flights, reducing the MED by 81.55% and improving latitude, longitude, altitude, and speed predictions by 7.76%, 84.48%, 81.98%, and 36.51%, respectively. Importantly, it maintained stability even with limited data.
Diffusion-based methods enhance long-term stability by gradual data generation and improve diversity through goal-conditioning. The authors of [50] applied this to Singapore Changi arrivals, achieving an ADE of 0.508/FDE of 0.962 in 2D and an ADE of 0.528/FDE of 1.003 in 3D, with contextual inputs improving results by 6–8%. Reference [51] proposed a goal-conditioned diffusion model trained on UK (OpenSky) data, reaching an ADE of 0.365, FDE of 0.987, and goal hit rate of 66.2%, which was about 20% better than earlier models. Table 11 shows the detailed structural features, datasets, and performance of generative models.
Overall, generative models deliver RNN-level performance in short-term prediction but show more significant advantages in long-term accuracy and diversity. GAN-based models excel in data-scarce conditions, while diffusion models reduce uncertainty and improve goal alignment. Remaining challenges include training instability and limited generalization beyond operational datasets.
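Because the ADE and FDE recur throughout the generative and graph-based results, a minimal sketch of how they are commonly computed may be helpful. It assumes the usual definitions (ADE as the mean Euclidean error over all predicted steps, FDE as the error at the final step); the function name `ade_fde` and the toy coordinates are hypothetical and are not taken from any reviewed study.

```python
import math

def ade_fde(pred, truth):
    """Return (ADE, FDE) for two equally long sequences of coordinate tuples.

    ADE: mean Euclidean distance over all time steps.
    FDE: Euclidean distance at the last time step.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

# Toy 2D example (hypothetical values, arbitrary units).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(pred, truth)
print(ade, fde)  # per-step errors are 0, 1, 2, so ADE = 1.0 and FDE = 2.0
```

The same function works for 3D inputs, since `math.dist` accepts points of any matching dimension. Note that the result inherits the unit of the inputs, which is one reason values from different studies are hard to compare directly.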

4.3.4. Graph-Based Models

Graph-based models have gained attention in aircraft trajectory prediction because they can directly reflect interactions among aircraft. While RNN- and attention-based models are strong at learning the time-series patterns of individual aircraft, they are limited in expressing the spatial and relational constraints that arise in multi-aircraft situations. To overcome this, graph neural networks (GNNs) model aircraft as nodes in a graph and their spatial/operational interactions as edges. Such a structure is particularly effective in situations with high aircraft density, such as in terminal areas or under complex traffic patterns. The authors of [25] applied a GNN to fighter trajectory simulation data and, with a 10-step input and 30-step output, recorded an ADE of 0.713 km and FDE of 1.235 km, demonstrating robustness in combat scenarios. The authors of [26] performed multi-step prediction on CS World fighter simulation data (30 scenarios) and achieved high accuracy, with an ADE of 0.009 km and FDE of 0.009 km. The authors of [52] utilized OpenSky ADS-B data to simultaneously learn node attributes and trajectory attributes, achieving an MAE of 0.1863 and RMSE of 0.3644, which improved relational feature capture compared with RNN models. The authors of [53] modeled terminal areas using ADS-B trajectories, achieving an ADE of 0.0082 and FDE of 0.011, thereby demonstrating predictive accuracy in complex airspace scenarios. Overall, the GNN family shows a 10–20% performance improvement over baselines in situations where multi-aircraft interaction is important, with particular strength in terminal areas. On the other hand, graph construction is challenging because nodes and edges must be defined explicitly, and as scenarios become more complex, the computational cost of updating graphs increases significantly. Moreover, scalability issues may arise when training with large-scale real ADS-B data.
Table 12 shows the detailed structural features, datasets, and performance of graph-based models.
In summary, graph-based models demonstrate new possibilities for aircraft trajectory prediction through their unique advantage of relational information learning and interaction representation, but they face constraints in terms of computational cost and scalability.
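To make the node/edge formulation concrete, the following sketch builds a simple proximity graph in which each aircraft is a node and an edge connects any two aircraft within a given distance. This is only one possible edge definition, chosen here for illustration; the reviewed studies may define edges differently. The function name, the aircraft identifiers, the positions, and the 10 km radius are all hypothetical.

```python
import math

def build_proximity_graph(positions, radius_km):
    """Build an undirected adjacency map from aircraft positions.

    positions: dict mapping aircraft id -> (x_km, y_km, z_km) in a local
               Cartesian frame (an assumption made for simplicity).
    Returns a dict mapping each id to the set of neighbours within radius_km.
    """
    ids = sorted(positions)
    adjacency = {i: set() for i in ids}
    # Pairwise comparison: O(N^2) in the number of aircraft.
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if math.dist(positions[a], positions[b]) <= radius_km:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

# Hypothetical snapshot of three aircraft.
positions = {
    "AC1": (0.0, 0.0, 10.0),
    "AC2": (3.0, 4.0, 10.0),    # 5 km from AC1
    "AC3": (50.0, 50.0, 11.0),  # far from both
}
graph = build_proximity_graph(positions, radius_km=10.0)
print(graph)
```

The quadratic pairwise loop also illustrates the scalability concern raised above: the graph must be rebuilt at every time step, so the cost grows quickly with traffic density unless spatial indexing or a sparser edge rule is used.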

4.3.5. Hybrid and Integrated Models

Hybrid models integrate architectures such as CNN, RNN, attention, Transformer, and GAN to maximize complementary strengths. By combining different modules, they achieve the lowest RMSE and MAE values (≈0.011–0.012) of all reviewed model families, consistently outperforming single-architecture baselines. In particular, studies that integrated weather variables, flight intent, or GAN-based data augmentation demonstrated marked performance gains in both short- and long-term prediction. Despite these advantages, increased structural complexity and longer training times highlight the need for lightweight optimization to enable real-time application.
Structural hybrid models can be grouped into three categories according to their design methods and performance characteristics. The first type combines CNN/TCN with RNN, allowing for simultaneous learning of spatial patterns and temporal dependencies. For example, CNN-LSTM [54] reduced RMSE by about 20% compared with a single LSTM, verifying the improvement over baseline models. Extended designs such as those presented in [60,61], which integrated attention, alleviated long-horizon error accumulation, achieving reductions in RMSE and MAE values, particularly at turning points. These CNN/TCN + RNN hybrids also demonstrated robustness under noisy or incomplete data, making them effective for stable short- and mid-term prediction optimization, with overall gains of 15–20% relative to baselines.
The second category includes phase/terminal/intent-specific models that incorporate domain knowledge into the prediction pipeline. The authors of [55] applied phase-dependent modules, achieving large reductions in altitude MAEs during takeoff, climb, and descent. The authors of [66] maintained stable long-term RMSE values by encoding flight intent information, while in [20,65], the authors produced highly precise predictions in terminal areas, reaching a latitude MAE of about 0.017° and altitude MAE of about 6.1 m. These models greatly enhanced realism and accuracy in complex segments of flight but required precise phase recognition and additional data sources, limiting their general applicability.
The third category involves advanced integration and generative-based designs, which combine cutting-edge modules to enhance long-term prediction and uncertainty quantification. GAN-based hybrids such as those presented in [56,58] reduced the RMSE and MAE by as much as 70% under data-scarce conditions, showing clear advantages in distributional learning. CG3D [17] used CNN-GRU, 3D CNN, and Monte Carlo Dropout to simultaneously model spatiotemporal features and quantify predictive uncertainty, enhancing stability in long horizons. SATF [57] integrated frequency-domain learning with Transformer, achieving R² > 0.95 in long-term prediction, while IMM-Informer [19] merged physics-based IMM with a Transformer variant to reduce the RMSE by 10–30%. These advanced hybrids proved especially strong in multi-aircraft, multi-step, and uncertainty-sensitive scenarios, although their model complexity and training costs limited real-time deployment.
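The idea behind Monte Carlo Dropout can be sketched in a few lines: dropout is kept active at inference time, the same input is passed through the network many times, and the spread of the outputs is read as an estimate of predictive uncertainty. The toy network below (one hidden ReLU layer with fixed, hypothetical weights) is an assumption for illustration only and does not reproduce the CG3D architecture.

```python
import random
import statistics

def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic forward pass of a tiny one-hidden-layer ReLU network."""
    keep = 1.0 - drop_prob
    hidden = []
    for w in w_hidden:
        h = max(0.0, w * x)  # ReLU activation
        # Inverted dropout: zero the unit with probability drop_prob,
        # otherwise rescale so the expected activation is unchanged.
        h = h / keep if rng.random() < keep else 0.0
        hidden.append(h)
    return sum(h * w for h, w in zip(hidden, w_out))

def mc_dropout_predict(x, w_hidden, w_out, drop_prob=0.2, n_samples=200, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)

# Hypothetical weights; a real model would learn these from data.
w_hidden = [0.5, -0.3, 0.8, 0.1]
w_out = [1.0, 0.7, -0.4, 2.0]
mean, spread = mc_dropout_predict(1.0, w_hidden, w_out)
print(f"prediction = {mean:.3f}, uncertainty (std) = {spread:.3f}")
```

With `drop_prob=0.0` the passes become deterministic and the spread collapses to zero, which is a quick sanity check that the reported uncertainty comes from the dropout sampling alone.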
Representation learning and generalization-based hybrid models aim to learn universal representations rather than optimizing for a single prediction task. The Hybrid-Recurrent framework [68] integrated weather variables with ADS-B, showing high accuracy under specific conditions but large error increases (70–500%) on unseen routes, reflecting generalization challenges. TSCC [69], which segments trajectories and applies contrastive learning, improved the F1-score and AUC by >10% compared with Autoencoder baselines, generating robust embeddings for classification and anomaly detection. FLIGHT2VEC [70], the most recent contribution, combined behavior-adaptive patching with motion trend learning in a Transformer-based self-supervised framework. It reduced the RMSE by 12–18%, reached state-of-the-art results in recognition and anomaly detection, and was significantly more efficient than prior models. While these approaches effectively address limitations in data-scarce or new environments, they often come with high initial training costs and emphasize representation quality over direct trajectory accuracy. Table 13 summarizes the structural features, datasets, and performance of structural hybrid deep learning models for aircraft trajectory prediction, while Table 14 presents the corresponding details for representation learning and generalization-based deep learning models.
In summary, hybrid models have emerged as a central evolutionary pathway in aircraft trajectory prediction. Structural fusion designs excel in short-term stability, long-term accuracy, and uncertainty handling, while representation-based hybrids enhance scalability and adaptability across downstream tasks. Reported performance improvements range from 15% to over 70% above baseline models, especially in studies leveraging contextual integration or GAN-based augmentation. However, their strengths are offset by increased complexity, longer training times, and challenges in generalizing from simulation or route-specific data to actual operational environments. These findings suggest that future research should focus on lightweight architectures, efficient optimization strategies, and hybrid frameworks that balance accuracy with scalability.

4.4. Integrated Discussion of Deep Learning-Based Trajectory Prediction Research

This section synthesizes the performance evaluation results presented in previous sections to provide an integrated discussion of performance trends and limitations across the different model families. While absolute numerical comparisons are difficult due to heterogeneous evaluation metrics, general patterns emerge around the RMSE, MAE, and ADE/FDE.
RNN-based models have served as early baselines, providing stable short- and mid-term performance, with Bi-LSTM and ConvLSTM showing robustness to noisy data. However, cumulative errors and altitude instability remain major limitations in long-horizon prediction. Attention models clearly address these long-term dependency issues, achieving consistent improvements in RMSE, MAE, and ADE/FDE values, while hybrid attention structures integrating weather, intent, and multi-aircraft information have enhanced practical applicability. The main barriers are their reliance on large-scale datasets and high computational costs. Generative models (GAN, diffusion, etc.) deliver competitive or slightly weaker performance on conventional metrics but excel in diversity, ADE/FDE, and NLL, making them effective for small-data and long-horizon scenarios. Their challenges are training instability and limited generalization to real operational environments. Graph-based models explicitly captured multi-aircraft interactions, demonstrating 10–20% ADE/FDE improvement in dense or terminal airspace. However, graph construction and scalability remain significant obstacles for real-time large-scale deployment. Hybrid and integrated models generally achieved the best overall results, recording the lowest RMSE/MAE values (about 0.011–0.012 km). CNN-LSTM and Phase Hybrid models improved short- and mid-term predictions, while advanced integrations with Transformer, GAN, or physics-based modules enhanced long-term stability and uncertainty quantification. Representation learning-based hybrids further strengthened generalization across tasks. Their drawbacks are high structural complexity and computation cost.
Table 15 shows the strengths and limitations of the different categories of models for aircraft trajectory prediction. Attention and hybrid and integrated models demonstrate overall superiority, while generative and graph-based models provide specialized advantages in specific contexts. RNNs continue to serve as useful baselines but exhibit clear weaknesses in long-term scenarios. The absence of standardized datasets and evaluation frameworks remains a fundamental limitation across studies.
In addition to Table 15, which outlines the general strengths and limitations of each model family, we further provide a structured comparison in Table 16. This complementary table highlights three critical dimensions—performance, computational cost, and robustness—allowing for a clearer assessment of trade-offs across model categories. By presenting these dimensions side by side, this review enables a more systematic evaluation of a model’s suitability for practical deployment. For a more detailed comparison of model performance across datasets, computational cost, and robustness, an extended version of Table 16 is provided in the Supplementary Materials (Table S4).
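The difficulty of comparing heterogeneous metrics noted in this section can be illustrated with a unit conversion. Some studies report positional errors in degrees (for example, the latitude/longitude MAEs of 0.0133° and 0.0170° cited for TFT), while others report meters or kilometers. The sketch below converts degree errors to approximate ground distances using a spherical Earth; the mean radius of 6371 km and the reference latitude of 45° are assumptions, since the actual latitude of the flights is not stated here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

def lat_error_to_m(err_deg):
    """Approximate north-south distance for a latitude error in degrees."""
    return math.radians(err_deg) * EARTH_RADIUS_M

def lon_error_to_m(err_deg, at_lat_deg):
    """Approximate east-west distance for a longitude error in degrees.

    Meridians converge toward the poles, so the distance shrinks with
    the cosine of the latitude at which the error is measured.
    """
    return math.radians(err_deg) * EARTH_RADIUS_M * math.cos(math.radians(at_lat_deg))

lat_m = lat_error_to_m(0.0133)
lon_m = lon_error_to_m(0.0170, at_lat_deg=45.0)  # 45° is an illustrative assumption
print(f"latitude MAE  ~ {lat_m:.0f} m")
print(f"longitude MAE ~ {lon_m:.0f} m at 45° latitude")
```

Under these assumptions, a latitude error of 0.0133° corresponds to roughly 1.5 km on the ground, which shows that numerically small degree-based errors are not directly comparable with errors reported in meters or in normalized units.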

5. Applications and Future Research Directions

This section integrates the findings on the applicability of deep learning-based trajectory prediction models and the challenges for real-world deployment. While the structural and performance analyses presented in previous sections highlight their academic value, translating these methods into operational environments requires further consideration of application areas, technical requirements, and future research directions.
RQ4: In which application domains are aircraft trajectory prediction models applied, and which technical considerations must be addressed for real-time system implementation?

5.1. Application Domains

Deep learning-based trajectory prediction models have been applied to several areas of aviation operations. The most direct application is in air traffic management (ATM), where accurate short- and mid-term predictions support conflict avoidance, separation assurance, and runway sequencing. In particular, hybrid and integrated models and attention-based models, which show improved stability in long-horizon prediction, are considered promising for managing high-density terminal areas. However, most studies remain limited to offline validation, and challenges remain in achieving real-time integration and ensuring safety certification.
Generative models such as GANs and diffusion models are increasingly applied for data augmentation to overcome problems of imbalance and scarcity. By generating synthetic data for rare scenarios or extreme conditions, these models contribute to improved prediction performance and generalization, while also being useful for safety validation, pilot training, and policy evaluation in simulation environments. However, diffusion models also face critical limitations for real-time aviation systems. Their iterative sampling process typically requires hundreds of inference steps, resulting in high computational cost and latency. These constraints make them less suitable for time-sensitive applications such as air traffic control or collision avoidance, where rapid predictions are essential. Current research trends therefore emphasize reducing the number of sampling steps, developing accelerated inference algorithms, or designing lightweight diffusion variants to enhance their practicality in operational environments.
Another important application is anomaly detection and safety monitoring. Since trajectory prediction models learn the distribution of normal flight patterns, they can detect abnormal trajectories or potential risk events in real time. Graph- and attention-based models are particularly effective in multi-aircraft scenarios, where relational interactions must be captured to identify collective anomalies.
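One common way to turn a trajectory predictor into an anomaly detector is to threshold the prediction residual: if the observed position deviates from the predicted one by more than is typical for normal flights, the point is flagged. The sketch below assumes this residual-based scheme with a mean-plus-k-standard-deviations threshold; it is not the method of any specific reviewed study, and the function name, calibration errors, and coordinates are hypothetical.

```python
import math
import statistics

def flag_anomalies(predicted, observed, calibration_errors, k=3.0):
    """Flag time steps whose prediction error exceeds a calibrated threshold.

    calibration_errors: prediction errors measured on normal flights.
    The threshold is mean + k * standard deviation of those errors.
    Returns (list of booleans, threshold).
    """
    threshold = (
        statistics.fmean(calibration_errors)
        + k * statistics.pstdev(calibration_errors)
    )
    errors = [math.dist(p, o) for p, o in zip(predicted, observed)]
    return [e > threshold for e in errors], threshold

# Hypothetical errors (km) observed on normal traffic.
calibration = [0.10, 0.12, 0.09, 0.11, 0.08]
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.1), (1.0, 0.1), (2.0, 1.5)]  # last point deviates strongly
flags, threshold = flag_anomalies(predicted, observed, calibration)
print(flags, round(threshold, 4))
```

In this toy case only the last observation exceeds the threshold. In practice, the choice of `k` trades off false alarms against missed detections and would need to be tuned on operational data.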
Trajectory predictions are also utilized for optimization and decision support. By forecasting aircraft positions under varying conditions, models can assist in selecting fuel-efficient routes, reducing congestion, and dynamically reallocating airspace. When combined with reinforcement learning, they can further support real-time decisions such as rerouting, conflict resolution, or runway assignment, thereby contributing to both cost reduction and environmental sustainability.
The greatest potential benefit arises from real-time integration of prediction models with operational systems. When coupled with ATM and flight management systems, predictions can directly inform decisions such as dynamic rerouting in adverse weather, congestion management in terminal areas, and early detection of abnormal trajectories. Real-time applicability thus represents a crucial direction for transforming these models from academic prototypes into operational tools.

5.2. Technical Considerations for Applications

For trajectory prediction models to be deployed in operational environments, several technical requirements must be addressed. Real-time processing is the foremost challenge, as even short delays can compromise aviation safety. In dense, high-traffic airspace, scalability poses an additional challenge, as models must process large volumes of streaming data from multiple aircraft simultaneously. Meeting this requirement calls for algorithmic efficiency and lightweight model design, together with distributed architectures, parallel processing strategies, and high-performance hardware, so that predictions remain accurate and timely under heavy workloads.
Equally important are reliability and safety certification. Since aviation systems are subject to strict international regulations, prediction models must provide not only accuracy but also explainability and uncertainty quantification to meet certification requirements. This aspect is particularly critical in aviation, where regulatory bodies and operators must be able to interpret and validate the reasoning behind predictions. Without sufficient transparency, even highly accurate models may face challenges in gaining operational trust and certification. Incorporating explainable AI (XAI) methods and advanced uncertainty quantification is therefore essential for bridging the gap between technical performance and regulatory acceptance. Standardization and interoperability also represent essential issues, as prediction modules must integrate seamlessly with existing ATM, flight management, and safety monitoring systems. This necessitates standardized data formats, communication protocols, and operational interfaces.
Concerns about data security and privacy must also be addressed. While ADS-B data is publicly available, sensitive flight information may be restricted. In cloud-based learning and real-time data-sharing environments, encryption and access control mechanisms are indispensable. Finally, operational costs and maintainability cannot be overlooked. Highly complex models may achieve high accuracy but increase computational costs and hinder long-term system maintenance. Striking a balance between prediction accuracy and computational efficiency is therefore critical for practical deployment.
RQ5: What are the limitations of current aircraft trajectory prediction studies, and which research directions are being proposed for the future?

5.3. Future Research Directions

Deep learning-based aircraft trajectory prediction has made significant progress in recent years, yet several challenges still prevent its full operational adoption. Current limitations—such as reliance on narrow data sources, high computational costs, lack of scalability, and insufficient explainability—must be systematically addressed. Rather than presenting all possible directions equally, future research should establish clear priorities that can maximize both scientific advancement and practical adoption. In this section, we highlight three major priorities: (1) multimodal data fusion and standardized benchmarks, (2) model simplification and computational efficiency, and (3) emerging methods for robustness and generalization. These priorities are followed by dedicated discussions on explainability and the gap between academic prototypes and industry adoption.
(1) Multimodal data fusion and standardized benchmarks should be the foremost priority.
Most existing studies still rely heavily on ADS-B data, which, despite its wide availability, suffers from limitations in quality and coverage. Future research should therefore focus on constructing multimodal datasets that incorporate weather, ATC instructions, and flight plans, while also building large-scale standardized benchmarks to ensure reproducibility. Addressing institutional barriers to data sharing is equally essential to enable wider collaboration and more realistic modeling.
(2) Simplification of complex architectures and improvements in computational efficiency are critical.
Although hybrid models currently achieve the strongest performance, their structural complexity and high computational costs hinder real-time deployment. Future work should therefore prioritize strategies such as simplifying multi-module architectures, pruning, quantization, and distillation, while incorporating explainable AI and robust uncertainty quantification to strengthen trustworthiness in operational settings.
(3) Robustness and generalization must be actively pursued.
Research should move beyond accuracy alone and develop methods that enhance stability and transferability across scenarios. Self-supervised learning has significant potential for exploiting large volumes of unlabeled data, while multi-step and multi-module frameworks can further strengthen long-term stability. Integrated evaluation metrics that combine accuracy and uncertainty will allow for more holistic performance assessment. Future studies may also explore reinforcement learning combined with trajectory prediction to support real-time decision-making in dense and dynamic airspaces.
(4) Explainability remains a fundamental requirement in aviation applications.
Providing accurate predictions alone is not sufficient; models must also offer clear insights into why specific predictions are made. Future research should therefore prioritize the development of interpretable deep learning architectures, attention visualization methods, feature attribution techniques, and tools for interpreting temporal and spatial patterns. These approaches, when coupled with robust uncertainty quantification, will enhance transparency, support safety certification, and help bridge the gap between technical performance and operational acceptance.
(5) Bridging the gap between academic prototypes and industry adoption is essential.
Despite strong academic progress, most studies rely on open datasets such as OpenSky or ADS-B, while high-fidelity datasets maintained by agencies like FAA, EUROCONTROL, and CAAC remain restricted. Even when models demonstrate high accuracy in research environments, practical deployment requires meeting conditions such as certification, real-time integration, and data security. Future efforts should therefore emphasize industry collaboration, regulatory alignment, and pilot testing to accelerate real-world adoption.
From a forward-looking perspective, future research must pursue strategies that advance accuracy, uncertainty management, real-time applicability, and explainability simultaneously, thereby ensuring the sustainable development and practical adoption of deep learning-based aircraft trajectory prediction technologies.
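Of the compression strategies named under priority (2), quantization is the simplest to illustrate. The sketch below applies symmetric 8-bit quantization to a list of weights: each value is mapped to an integer in [-127, 127] using a single scale factor and can later be mapped back with a bounded rounding error. It is a minimal illustration under the stated assumptions; the weights are hypothetical, and production toolchains typically use per-channel scales and calibration data.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: returns (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

# Hypothetical weights from one layer of a trained model.
weights = [0.42, -1.27, 0.003, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

The reconstruction error per weight is bounded by half the scale factor, so storage drops from 32 bits to 8 bits per weight at the cost of a small, controlled loss of precision. Whether that loss is acceptable for a given trajectory model would have to be verified empirically.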

6. Conclusions

This review systematically analyzed 46 major studies on deep learning-based aircraft trajectory prediction published between 2020 and June 2025. By comprehensively examining model architectures, datasets, evaluation metrics, and performance comparisons, the current state and achievements of the field were summarized. By including research up to June 2025, this review ensures that the latest research trends are fully reflected.
The results show that RNN-based models remain widely used as baselines, offering stable performance in short- and mid-term prediction, but their limitations become evident in long-term forecasting. Attention-based models overcame these weaknesses and demonstrated clear superiority in long-horizon prediction. Hybrid models, by combining the strengths of diverse architectures, achieved the lowest RMSE and MAE values overall. Generative models revealed new possibilities in trajectory diversity and uncertainty quantification, while graph-based models provided unique advantages in modeling multi-aircraft interactions.
From the perspective of data and metrics, most studies still rely heavily on ADS-B sources such as OpenSky, with some integrating weather, flight plan, and ATC information to enhance realism. However, differences in dataset preprocessing and evaluation metrics across studies remain a major limitation, making direct performance comparison and reproducibility difficult. This highlights the need for standardized datasets and unified evaluation frameworks.
Trajectory prediction models hold strong potential for innovation in various application areas, including air traffic management (ATM), anomaly detection, and operational optimization. However, their practical deployment is still hindered by data quality and standardization gaps, model complexity, high computational costs, and limitations in real-time application.
To translate these application prospects into reality, future research should focus on overcoming current barriers—for example, by using self-supervised learning to address data scarcity, employing explainable AI to enhance model transparency, extending multi-module architectures for robustness, and developing new frameworks that integrate both accuracy and uncertainty.
The unique contributions of this review lie in offering the first structured and reproducible synthesis that is exclusively dedicated to deep learning-based aircraft trajectory prediction, presenting a comprehensive taxonomy of model families, providing quantitative comparisons across standardized metrics and datasets, and critically examining both practical applications and unresolved challenges.
In conclusion, research on deep learning-based aircraft trajectory prediction is evolving beyond mere accuracy improvement toward incorporating diversity, uncertainty, and real-time applicability. By presenting the trends and limitations of existing research, this review provides guidance for both academia and industry to advance trajectory prediction technologies in a practical and impactful way.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app151910739/s1. Table S1. Categorized list of reviewed studies on deep learning-based aircraft trajectory prediction (2020–June 2025) (subdivided into RNN-based, attention-based, generative, graph-based, and hybrid and integrated models). Table S2. Search strings and query parameters used for the systematic literature review (PRISMA protocol). Table S3. Dataset identifiers, sources, and access links (e.g., ADS-B, OpenSky, FAA, EUROCONTROL, CAAC). Table S4. Extended comparison tables of model performance across datasets, computational cost, and robustness.

Author Contributions

Conceptualization, N.K.; methodology, N.K.; formal analysis, N.K.; investigation, N.K. and B.L.; validation, B.L.; data curation, N.K.; writing—original draft preparation, N.K. and B.L.; writing—review and editing, N.K. and B.L.; supervision, B.L.; visualization, N.K.; project administration, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP)-Innovative Human Resource Development for Local Intellectualization program grant funded by the Korean government (MSIT) (IITP-2025-2022-00156334).

Data Availability Statement

No new data were created in this study. The datasets analyzed are publicly available and can be accessed from the OpenSky Network (https://opensky-network.org), the Federal Aviation Administration (FAA, https://www.faa.gov), and EUROCONTROL (https://www.eurocontrol.int).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Delahaye, D.; Farges, J.-L.; Alam, S. Complexity Optimal Air Traffic Assignment for Multi-Layer Transport Network in Urban Air Mobility Operations. Transp. Res. Part C Emerg. Technol. 2022, 142, 103776.
  2. Schuchardt, B.I.; Geister, D.; Lüken, T.; Knabe, F.; Metz, I.C.; Peinecke, N.; Schweiger, K. Air Traffic Management as a Vital Part of Urban Air Mobility—A Review of DLR’s Research Work from 1995 to 2022. Aerospace 2023, 10, 81.
  3. Rajendran, S.; Srinivas, S. Air taxi service for urban mobility: A critical review of recent developments, future challenges, and opportunities. Transp. Res. Part E Logist. Transp. Rev. 2020, 143, 102090.
  4. Zhang, Z.; Guo, D.; Zhou, S.; Zhang, J.; Lin, Y. Flight trajectory prediction enabled by time-frequency wavelet transform. Nat. Commun. 2023, 14, 5258.
  5. Corbetta, M.; Banerjee, P.; Okolo, W.; Gorospe, G.; Luchinsky, D.G. Real-Time UAV Trajectory Prediction for Safety Monitoring in Low-Altitude Airspace. In Proceedings of the AIAA Aviation 2019 Forum, Dallas, TX, USA, 17–21 June 2019.
  6. Weitz, P. Determination and visualization of uncertainties in 4D-trajectory prediction. In Proceedings of the 2013 Integrated Communications, Navigation and Surveillance Conference (ICNS), Herndon, VA, USA, 23–25 April 2013; pp. 1–9.
  7. Rohani, A.S.; Puranik, T.G.; Kalyanam, K.M. Machine Learning Approach for Aircraft Performance Model Parameter Estimation for Trajectory Prediction Applications. In Proceedings of the 2023 IEEE/AIAA 42nd Digital Avionics Systems Conference (DASC), Barcelona, Spain, 18–22 September 2023; pp. 1–9.
  8. Wild, G.; Baxter, G.; Srisaeng, P.; Richardson, S. Machine learning for air transport planning and management. In Proceedings of the AIAA Aviation 2022 Forum, Chicago, IL, USA, 27 June–1 July 2022; p. 3706.
  9. Xu, D.; Wang, Y.; Jia, L.; Qin, Y.; Dong, H. Real-time road traffic state prediction based on ARIMA and Kalman filter. Front. Inf. Technol. Electron. Eng. 2017, 18, 287–302.
  10. Prevost, C.G.; Desbiens, A.; Gagnon, E. Extended Kalman Filter for State Estimation and Trajectory Prediction of a Moving Object Detected by an Unmanned Aerial Vehicle. In Proceedings of the 2007 American Control Conference, New York, NY, USA, 9–13 July 2007; pp. 1805–1810.
  11. Wang, Y.; Pang, Y.; Liu, Y.; Dutta, P.; Yang, B.-J. Aircraft Trajectory Prediction and Risk Assessment Using Bayesian Updating. In Proceedings of the AIAA Aviation Forum 2019, Dallas, TX, USA, 17–21 June 2019; AIAA 2019-2936. pp. 1–13.
  12. Federal Aviation Administration (FAA). Air Traffic by the Numbers 2022; FAA: Washington, DC, USA, 2022. Available online: https://www.faa.gov (accessed on 29 August 2025).
  13. Schaefer, M.; Strohmeier, M.; Lenders, V.; Martinovic, I.; Wilhelm, M. Bringing up OpenSky: A large-scale ADS-B sensor network for research. In Proceedings of the 13th IEEE/ACM International Symposium on Information Processing in Sensor Networks (IPSN), Berlin, Germany, 15–17 April 2014; pp. 83–94.
  14. Hashemi, S.M.; Botez, R.M.; Ghazi, G. Bidirectional Long Short-Term Memory Development for Aircraft Trajectory Prediction Applications to the UAS-S4 Ehécatl. Aerospace 2024, 11, 625.
  15. Silvestre, J.; Mielgo, P.; Bregon, A.; Martínez-Prieto, M.A.; Álvarez-Esteban, P.C. Towards aircraft trajectory prediction using LSTM networks. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (SAC '24), New York, NY, USA, 8–12 April 2024; pp. 1059–1060.
  16. Zhang, Y.; Jia, Z.; Dong, C.; Liu, Y.; Zhang, L.; Wu, Q. Recurrent LSTM-based UAV Trajectory Prediction with ADS-B Information. In Proceedings of the GLOBECOM 2022-2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 1–6.
  17. Nia, H.S.; Regan, A.C. 4D Flight Trajectory Prediction Using a Hybrid Deep Learning Prediction Method Based on ADS-B Technology: A Case Study of Hartsfield Jackson Atlanta International Airport. Transp. Res. Part C Emerg. Technol. 2022, 144, 103878.
  18. Hao, Q.; Zhang, J.; Jing, T.; Wang, W. Flight Trajectory Prediction Using an Enhanced CNN-LSTM Network. arXiv 2024.
  19. Li, F.; Xu, X.; Wang, R.; Ma, M.; Dong, Z. Flight Trajectory Prediction Based on Automatic Dependent Surveillance-Broadcast Data Fusion with Interacting Multiple Model and Informer Framework. Sensors 2025, 25, 2531.
  20. Dong, X.; Tian, Y.; Dai, L.; Li, J.; Wan, L. A New Accurate Aircraft Trajectory Prediction in Terminal Airspace Based on Spatio Temporal Attention Mechanism. Aerospace 2024, 11, 718.
  21. Youyou, Y.; Fang, Y.; Long, T. Noise Robust Autoregressive Transformer for Aircraft Trajectory Prediction. Sci. Rep. 2025, 15, 11370.
  22. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv 2020.
  23. Du, D.; Su, B.; Wei, Z. Preformer: Predictive Transformer with Multi-Scale Segment-Wise Correlations for Long-Term Time Series Forecasting. arXiv 2022.
  24. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020.
  25. Sun, Y.; Zhou, X.; Yang, Z.; Wang, W.; Shi, Q. Flight Trajectory Prediction Method Based on Attentional Graph Convolutional Network. In Proceedings of the 2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS), Chengdu, China, 21–23 July 2023; pp. 127–132.
  26. Sun, Y.; Jing, T.; Wang, J.; Wang, W. Fighter flight trajectory prediction based on spatio-temporal graphical attention network. arXiv 2024.
  27. Pang, Y.; Liu, Y. Conditional Generative Adversarial Networks (CGAN) for Aircraft Trajectory Prediction considering weather effects. In Proceedings of the AIAA Aviation Forum 2020, Virtual Event, 15–19 June 2020. AIAA 2020-1853.
  28. Zhang, H.; Liu, Z. Four-Dimensional Aircraft Trajectory Prediction Based on Generative Deep Learning. J. Aerosp. Inf. Syst. 2024, 21, 554–567.
  29. Wu, X.; Yang, H.; Chen, H.; Hu, Q.; Hu, H. Long-term 4D trajectory prediction using generative adversarial networks. Transp. Res. Part C Emerg. Technol. 2022, 136, 103554. [Google Scholar] [CrossRef]
  30. Zohdy, M.A.; Hegazy, T.; Abdelrahman, A.; Said, H.; Elhoseny, M. Applications of Artificial Intelligence in Air Operations: A Systematic Review. Appl. Sci. 2025, 15, 2012. [Google Scholar] [CrossRef]
  31. Henttu, A.; Chatterjee, K.; Majumdar, A. A Survey on Artificial Intelligence (AI) and eXplainable AI in Air Traffic Management: Current Trends and Development with Future Research Trajectory. Appl. Sci. 2022, 12, 1295. [Google Scholar] [CrossRef]
  32. Abdelmaboud, A.; Khafaga, D.S.; Abuarqoub, A.; Barkaoui, K. A Survey on Artificial-Intelligence-Based Internet of Vehicles Utilizing Unmanned Aerial Vehicles. Drones 2024, 8, 353. [Google Scholar] [CrossRef]
  33. Tian, Y.; Huang, J.; Wang, B.; Ren, X. Aircraft 4D Trajectory Prediction in Civil Aviation: A Review. Aerospace 2022, 9, 91. [Google Scholar] [CrossRef]
  34. Zeng, W.; Quan, Z.; Zhao, Z.; Xie, C.; Lu, X. A Deep Learning Approach for Aircraft Trajectory Prediction in Terminal Airspace. IEEE Access 2020, 8, 151250–151266. [Google Scholar] [CrossRef]
  35. Xu, Z.; Zeng, W.; Chu, X.; Cao, P. Multi-Aircraft Trajectory Collaborative Prediction Based on Social Long Short-Term Memory Netw. Aerospace 2021, 8, 115. [Google Scholar] [CrossRef]
  36. Wu, J. Aircraft Trajectory Prediction Based on Long and Short-Term Memory Structural Models. In Proceedings of the 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), Bengaluru, India, 20–21 April 2024; pp. 1–5. [Google Scholar] [CrossRef]
  37. Shi, Z.; Xu, M.; Pan, Q. 4-D Flight Trajectory Prediction With Constrained LSTM Network. IEEE Trans. Intell. Transp. Syst. 2021, 22, 7242–7255. [Google Scholar] [CrossRef]
  38. Wang, B.; Zhai, Z.; Xiong, R.; Gao, B. Flight Trajectory Prediction of General Aviation Aircraft Based on LSTM Model. In Proceedings of the 2021 IEEE 4th International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 24–26 September 2021; pp. 176–180. [Google Scholar] [CrossRef]
  39. Yang, Z.; Kang, X.; Gong, Y.; Wang, J. Aircraft trajectory prediction and aviation safety in ADS-B failure conditions based on neural network. Sci. Rep. 2023, 13, 19677. [Google Scholar] [CrossRef]
  40. Tong, Q.; Hu, J.; Chen, Y.; Guo, D.; Liu, X. Long-Term Trajectory Prediction Model Based on Transformer. IEEE Access 2023, 11, 143695–143703. [Google Scholar] [CrossRef]
  41. Lu, G.; Wang, H.; Song, Z.; Cui, P. An improved transformer-based model for long-term 4D trajectory prediction in civil aviation. Int. J. Intell. Syst. 2024, 2024, 4323604. [Google Scholar] [CrossRef]
  42. Jia, P.; Chen, H.; Zhang, L.; Han, D. Attention-LSTM based prediction model for aircraft 4-D trajectory. Sci. Rep. 2022, 12, 15533. [Google Scholar] [CrossRef]
  43. Luo, S.; Zhao, M.; Zhao, Z.; Li, L.; Zhang, S.; Zhang, X. FT-TF: A 4D Long-Term Flight Trajectory Prediction Method Based on Transformer. In Proceedings of the 2023 42nd Chinese Control Conference (CCC), Tianjin, China, 24–26 July 2023; pp. 4616–4621. [Google Scholar] [CrossRef]
  44. Hao, H.; Si, X.; Li, M.; Liang, J. Flight Trajectory Change Prediction Via Patched Spatial-Temporal Transformer. In Proceedings of the IGARSS 2024-2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 14–19 July 2024; pp. 10394–10398. [Google Scholar] [CrossRef]
  45. Guo, D.; Zhang, Z.; Yan, Z.; Zhang, J.; Lin, Y. FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 20–22 February 2024; pp. 127–134. [Google Scholar] [CrossRef]
  46. Lu, G.; Long, L.; Zhang, Y.; Zheng, Y. An Inverted Transformer Framework for Aviation Trajectory Prediction with Multi-Flight Mode Fusion. Aerospace 2025, 12, 319. [Google Scholar] [CrossRef]
  47. Sun, Y.; Wang, D.; Wang, W.; Xiong, L.; Yang, X. Confrontational flight trajectory prediction based on attention mechanism. In Proceedings of the 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Bangkok, Thailand, 18–20 December 2020; pp. 211–214. [Google Scholar] [CrossRef]
  48. Silvestre, J.; Mielgo, P.; Bregon, A.; Martínez-Prieto, M.A.; Álvarez-Esteban, P.C. Multi-Route Aircraft Trajectory Prediction Using Temporal Fusion Transformers. IEEE Access 2024, 12, 174094–174106. [Google Scholar] [CrossRef]
  49. Hu, Q.; Huang, G.; Shi, H.; Lin, Y.; Guo, D. A Short-term Aircraft Trajectory Prediction Framework Using Conditional Generative Adversarial Network. In Proceedings of the 2022 IEEE 4th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Dali, China, 27–30 October 2022; pp. 433–439. [Google Scholar] [CrossRef]
  50. Yin, Y.; Zhang, S.; Zhang, Y.; Zhang, Y.; Xiang, S. Context-aware Aircraft Trajectory Prediction with Diffusion Models. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; pp. 5312–5317. [Google Scholar] [CrossRef]
  51. Yang, S.; Liu, L.; Chen, B.; Cheng, S.; Shi, Z.; Zou, Z. GooDFlight: Goal-Oriented Diffusion Model for Flight Trajectory Prediction. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 7447–7465. [Google Scholar] [CrossRef]
  52. Fan, Y.; Tan, Y.; Wu, L.; Ye, H.; Lyu, Z. Global and Local Interattribute Relationships-Based Graph Convolutional Network for Flight Trajectory Prediction. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 2642–2657. [Google Scholar] [CrossRef]
  53. Kuang, Y.; Wang, Z.; Zhang, J.; Shi, Z.; Zhang, Y. DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Graph Convolution. arXiv 2025. [Google Scholar] [CrossRef]
  54. Ma, L.; Tian, S. A Hybrid CNN-LSTM Model for Aircraft 4D Trajectory Prediction. IEEE Access 2020, 8, 134668–134680. [Google Scholar] [CrossRef]
  55. Zhang, K.; Chen, B. Phased flight trajectory prediction with deep learning. arXiv 2022. [Google Scholar] [CrossRef]
  56. Zhang, L.; Chen, H.; Jia, P.; Tian, Z.; Du, X. WGAN-GP and LSTM based Prediction Model for Aircraft 4-D Trajectory. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2022; pp. 937–942. [Google Scholar] [CrossRef]
  57. Wang, S.; Wang, Y.; Xu, L.; Shi, J.; Hu, Z. SATF: A flight trajectory prediction method incorporating spatial awareness and time frequency transformation. Aerosp. Sci. Technol. 2025, 147, 107714. [Google Scholar] [CrossRef]
  58. Ma, L.; Meng, X.; Wu, Z. Data-Driven 4D Trajectory Prediction Model Using Attention-TCN-GRU. Aerospace 2024, 11, 313. [Google Scholar] [CrossRef]
  59. Fang, J.; Liu, Y.; Xu, T.; Wang, Y.; Li, J. GCTrajectory: A CNN-Transformer Adversarial Learning Framework for Flight Trajectory Prediction. In Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 28–30 February 2025; pp. 650–654. [Google Scholar] [CrossRef]
  60. Ding, W.; Huang, J.; Shang, G.; Wang, X.; Li, B.; Li, Y.; Liu, H. Short-Term Trajectory Prediction Based on Hyperparametric Optimisation and a Dual Attention Mechanism. Aerospace 2022, 9, 464. [Google Scholar] [CrossRef]
  61. Ma, T.-Y.; Meng, X.; Wu, Z. A Novel Trajectory Prediction Method Based on CNN, BILSTM, and Attention Mechanism: Analysis of B1520 Flight Data. Aerospace 2024, 11, 822. [Google Scholar] [CrossRef]
  62. Wu, Y.; Yu, H.; Du, J.; Liu, B.; Yu, W. An Aircraft Trajectory Prediction Method Based on Trajectory Clustering and a Spatiotemporal Feature Network. Electronics 2022, 11, 3453. [Google Scholar] [CrossRef]
  63. Huang, J.; Ding, W. Aircraft Trajectory Prediction Based on Bayesian Optimised Temporal Convolutional Network–Bidirectional Gated Recurrent Unit Hybrid Neural Network. Int. J. Aerosp. Eng. 2022, 2022, 2086904. [Google Scholar] [CrossRef]
  64. Wu, H.; Liang, Y.; Zhou, B.; Sun, H. A Bi-LSTM and AutoEncoder Based Framework for Multi-step Flight Trajectory Prediction. In Proceedings of the 2023 IEEE 5th International Conference on Computer, Communications and Robotics Engineering (ICCRE), Chongqing, China, 1–3 December 2023; pp. 44–50. [Google Scholar] [CrossRef]
  65. Dong, Z.; Fan, B.; Li, F.; Xu, X.; Sun, H.; Cao, W. TCN-Informer-Based Flight Trajectory Prediction for Aircraft in the Approach Phase. Sustainability 2023, 15, 16344. [Google Scholar] [CrossRef]
  66. Tran, P.N.; Nguyen, H.Q.V.; Pham, D.-T.; Alam, S. Aircraft Trajectory Prediction With Enriched Intent Using Encoder-Decoder Architecture. IEEE Access 2022, 10, 17881–17896. [Google Scholar] [CrossRef]
  67. Liu, B.; Shi, Q.; Han, P. Short-Term 4D Trajectory Prediction Method Based on LSTM-IMM. In Proceedings of the 2022 IEEE/AIAA 41st Digital Avionics Systems Conference (DASC), Portsmouth, VA, USA, 18–22 September 2022; pp. 1–8. [Google Scholar] [CrossRef]
  68. Schimpf, N.; Wang, Z.; Li, S.; Knoblock, E.J.; Li, H.; Apaza, R.D. A Generalized Approach to Aircraft Trajectory Prediction via Supervised Deep Learning. IEEE Access 2023, 11, 116183–116195. [Google Scholar] [CrossRef]
  69. Phisannupawong, T.; Damanik, J.J.; Choi, H.-L. Aircraft Trajectory Segmentation-based Contrastive Coding: A Framework for Self-supervised Trajectory Representation. IEEE Open J. Intell. Transp. Syst. 2025, 6, 738–757. [Google Scholar] [CrossRef]
  70. Liu, S.; Sun, Y.; Jing, T.; Li, Z.; Wang, W. Effective and Efficient Representation Learning for Flight Trajectories. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 20–22 February 2025; pp. 1–8. [Google Scholar] [CrossRef]
  71. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 139–147. [Google Scholar] [CrossRef]
  72. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014. [Google Scholar] [CrossRef]
  73. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. arXiv 2019. [Google Scholar] [CrossRef]
  74. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar] [CrossRef]
  75. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
  76. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2017. [Google Scholar] [CrossRef]
  77. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2018. [Google Scholar] [CrossRef]
  78. Shao, Y.; Sun, C. Performance evaluation of China’s air routes based on network data envelopment analysis approach. J. Air Transp. Manag. 2016, 55, 67–75. [Google Scholar] [CrossRef]
  79. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777. [Google Scholar] [CrossRef]
  80. Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2015; pp. 1422–1430. [Google Scholar] [CrossRef]
  81. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning (ICML), Virtual Event, 13–18 July 2020; Volume 119, pp. 1597–1607. [Google Scholar] [CrossRef]
  82. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  83. Makridakis, S.; Wheelwright, S.C.; McGee, V.E. Forecasting: Methods and Applications, 2nd ed.; Wiley: New York, NY, USA, 1983. [Google Scholar]
  84. Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 43–49. [Google Scholar] [CrossRef]
  85. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 1966, 10, 707–710. [Google Scholar]
  86. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 961–971. [Google Scholar] [CrossRef]
  87. Mercat, J.; Zoghby, N.E.; Sandou, G.; Beauvois, D.; Pita Gil, G. Kinematic single vehicle trajectory prediction baselines and applications with the NGSIM dataset. arXiv 2019. [Google Scholar] [CrossRef]
  88. Gneiting, T.; Raftery, A.E. Strictly Proper Scoring Rules, Prediction, and Estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
  89. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
  90. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  91. Kuncheva, L.I.; Whitaker, C.J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 2003, 51, 181–207. [Google Scholar] [CrossRef]
  92. EUROCONTROL. Challenges of Growth 2018: European Air Traffic in 2040; EUROCONTROL: Brussels, Belgium, 2018; Available online: https://www.eurocontrol.int (accessed on 29 August 2025).
  93. Civil Aviation Administration of China (CAAC). Annual Report of China Civil Aviation Flight Statistics 2021; CAAC: Beijing, China, 2021. Available online: http://www.caac.gov.cn (accessed on 29 August 2025).
  94. NATS. Annual Report and Accounts 2022; NATS Holdings: Whiteley, UK, 2022; Available online: https://www.nats.aero (accessed on 29 August 2025).
  95. Flightradar24. Flightradar24: Live Flight Tracker. Available online: https://www.flightradar24.com (accessed on 29 August 2025).
  96. FlightAware. FlightAware Live Flight Tracking. Available online: https://www.flightaware.com (accessed on 29 August 2025).
  97. ADS-B Exchange. The World’s Largest Source of Unfiltered Flight Data. Available online: https://www.adsbexchange.com (accessed on 29 August 2025).
  98. Eagle Dynamics. Digital Combat Simulator (DCS) World, Version 2.9; Eagle Dynamics: Moscow, Russia, 2024. Available online: https://www.digitalcombatsimulator.com (accessed on 29 August 2025).
  99. Raia Software. TacView Flight Data Visualization Tool, Version 1.9; Raia Software: Paris, France, 2024. Available online: https://www.tacview.net (accessed on 29 August 2025).
  100. Air Combat Center. Air Combat Simulation Platform. Available online: https://aircombatcentre.com.au/ (accessed on 29 August 2025).
Figure 1. PRISMA flow diagram illustrating the identification, screening, eligibility, and inclusion process for selecting deep learning-based aircraft trajectory prediction studies.
Figure 2. Distribution of reviewed studies by search engine source.
Figure 3. Annual distribution of deep learning-based aircraft trajectory prediction studies published between 2020 and June 2025, showing the number of publications per year.
Figure 4. Flow diagram summarizing taxonomy of deep learning models for aircraft trajectory prediction and their representative performance characteristics across key evaluation metrics.
Figure 5. Annual adoption trends of evaluation metrics (MAE, RMSE, ADE, and FDE) in reviewed aircraft trajectory prediction studies (2020 to June 2025).
Figure 6. Distribution of dataset types used in reviewed aircraft trajectory prediction studies, showing proportions of ADS-B, radar, flight plans, weather, synthetic/simulation, and GNSS (BeiDou).
Figure 7. Temporal trends in dataset usage across reviewed studies (2020 to June 2025), showing annual adoption of ADS-B, flight plans, weather, simulation/synthetic data, radar, and GNSS (BeiDou).
Table 1. Research questions (RQ1–RQ5) and corresponding objectives formulated for conducting the systematic literature review (SLR) on deep learning-based aircraft trajectory prediction.
| No. | Question | Objective |
|---|---|---|
| RQ1 | Which types of deep learning-based models have been proposed for aircraft trajectory prediction? | To identify the categories and structural characteristics of deep learning models used in trajectory prediction. |
| RQ2 | Which evaluation metrics are commonly used to compare the performance of aircraft trajectory prediction models? | To systematically summarize the performance metrics that are generally applied for accuracy comparison. |
| RQ3 | Which types of data and representative datasets are used as inputs for aircraft trajectory prediction models? | To review input sources such as ADS-B, radar, weather, and flight plans, and to summarize publicly available and proprietary datasets. |
| RQ4 | In which application domains are aircraft trajectory prediction models applied, and which technical considerations must be addressed for real-time system implementation? | To examine application domains such as ATM, data augmentation, and anomaly detection, and identify technical considerations such as real-time capability and computational efficiency. |
| RQ5 | What are the limitations of current aircraft trajectory prediction studies, and which research directions are being proposed for the future? | To summarize existing limitations and highlight future directions, including multimodal learning, uncertainty quantification, and reinforcement learning integration. |
Table 2. Structural classification of deep learning models applied to aircraft trajectory prediction, organized by model category, criteria, frequency of use, and representative references.
| Category | Criteria | Number (Rate) | References |
|---|---|---|---|
| RNN-Based Models | Specialized in sequential dependency learning using LSTM, GRU, Bi-LSTM, and ConvLSTM | 7 (17%) | [15,34,35,36,37,38,39] |
| Attention-Based Models | Self-attention-based long-term dependency learning, including Transformer, Informer, and TFT | 10 (24%) | [21,40,41,42,43,44,45,46,47,48] |
| Generative Models | Generative approaches such as GAN and diffusion for data augmentation and diversity | 4 (10%) | [28,49,50,51] |
| Graph-Based Models | Modeling aircraft interactions and relational patterns via GCN, GAT, and ST-GCN | 4 (10%) | [25,26,52,53] |
| Hybrid and Integrated Models | Combining heterogeneous structures (CNN, RNN, attention, GAN, etc.) to overcome the limits of single models | 21 (50%) | [17,18,19,20,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70] |
Table 3. Summary of attention-oriented and hybrid/context-integrated attention models for aircraft trajectory prediction, including their descriptions, representative characteristics, and references.
| Category | Description | Representative Characteristics | References |
|---|---|---|---|
| Attention-Oriented Models | Studies that rely only on attention or Transformer architectures or focus on improving their internal mechanisms. | Validate Transformer performance, enhance positional encoding, and stabilize long-term predictions. | [21,40,41,42,43] |
| Hybrid and Context-Integrated Attention Models | Studies that integrate attention with other deep learning structures (CNN, RNN, AutoEncoder, etc.) or combine attention with contextual information such as weather, flight modes, or multi-aircraft interactions. | Capture spatiotemporal dependencies, leverage auxiliary networks, and improve robustness with multimodal and contextual data. | [44,45,46,47,48] |
Table 4. Comparative summary of categories of deep learning models for aircraft trajectory prediction.
| Model Category | Representative Models | Dataset(s) | Strengths | Weaknesses |
|---|---|---|---|---|
| RNN-based | LSTM, GRU, Bi-LSTM, ConvLSTM | ADS-B, ADS-B + Weather, ADS-B + ATC | Effective for sequential dependency modeling; widely validated and robust for short-/mid-term prediction. | Gradient vanishing in long sequences; limited scalability in dense traffic; weak under noisy/missing data. |
| Attention-based | Transformer, Informer, TFT, Dual Attention | ADS-B, Weather-integrated datasets, Flight plans | Captures long-term dependencies; strong scalability with parallel processing; stable in long-horizon prediction. | High computational costs; requires large-scale datasets; complex hyperparameter tuning. |
| Generative | GAN, Diffusion, VAE, TimeGAN | ADS-B, Simulation-based datasets | Data augmentation for rare/extreme scenarios; improves generalization and diversity; models uncertainty explicitly. | Training instability (GAN); high latency and computational demand (diffusion); limited real-time applicability. |
| Graph-based | GCN, GAT, ST-GCN, DA-STGCN | ADS-B, Simulation (e.g., DCS World), Flight plans | Captures multi-aircraft interactions; effective in dense traffic/terminal areas; useful for anomaly detection. | High graph construction costs; sensitive to noise; scalability issues in real-time systems. |
| Hybrid/Integrated | CNN-LSTM, Attention-LSTM, IMM + Informer, ST-LSTM + CNN | ADS-B, ADS-B + Weather, ADS-B + Radar | Achieves highest predictive accuracy; combines strengths of multiple structures; flexible design for diverse applications. | High structural complexity; greater computational overhead; risk of overfitting without standardized benchmarks. |
Table 5. Classification of multi-module models for aircraft trajectory prediction by category and approach, with examples of model structures and corresponding rationale.
| Category | Approach | Ref. | Model Structure | Rationale |
|---|---|---|---|---|
| Structural Combination | Conv + RNN | [17,18,54,55,58,60,61,62,63] | CNN/Conv1D/TCN combined with LSTM/GRU/BiGRU | Convolution/TCN extracts spatial features, while RNNs capture temporal dependencies. |
| Structural Combination | Graph + (RNN/Transformer/Conv) | [25,26,52,53] | LSTM + GCN + Attention, Transformer + GAT, GLR-GCN + TCN, Dual-Attention ST-GCN | Graphs represent spatial interactions, complemented by RNN/Conv/Transformer for temporal modeling. |
| Structural Combination | Conv + Transformer | [57,59,65] | TCN-Informer, CNN + Transformer Generator + Bi-LSTM Discriminator, Spatial/Time-Frequency Transformer | Conv/TCN preprocessing and feature extraction combined with Transformer-based modules. |
| Purpose-Driven Extensions | Generative/Adversarial + Predictor | [56,59] | WGAN + LSTM Predictor, CNN + Transformer Generator + Bi-LSTM Discriminator | Combines generative phase with predictive/discriminative modules. |
| Purpose-Driven Extensions | Representation Learning/SSL | [69,70] | Trajectory Contrastive Coding, FLIGHT2VEC | Self-supervised/contrastive learning enhances general-purpose embeddings and transferability. |
| Purpose-Driven Extensions | Other Specialized Modules | [18,19,20,62,64] | Clustering + CNN, Spatiotemporal Attention + RNN, Social-Pooling, Bi-LSTM + AE + Voting, IMM + Informer | Integrates auxiliary modules (e.g., clustering, social pooling, correction) to complement core predictors. |
Table 6. Evaluation metrics used in reviewed aircraft trajectory prediction studies, showing their frequency of usage and corresponding references.
| Metric | Usage | Ref. |
|---|---|---|
| MAE | 28 | [15,17,20,21,35,36,37,38,39,41,42,43,44,45,47,48,49,54,55,58,60,61,63,64,65,66,68,70] |
| RMSE | 27 | [15,17,20,21,35,36,37,38,40,42,43,44,45,48,54,55,58,59,60,61,62,63,64,65,66,67,68] |
| MAPE | 6 | [39,40,42,43,54,68] |
| DTW | 2 | [37,40] |
| MED | 1 | [28] |
| ADE | 9 | [18,25,26,28,41,47,50,51,53,55] |
| FDE | 8 | [18,25,26,28,41,50,51,53,55] |
| Other | 10 | [18,28,30,40,49,50,51,58,69,70] |
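The four most frequently reported metrics in Table 6 can be illustrated with a short NumPy sketch. It assumes the definitions commonly used in trajectory forecasting: MAE and RMSE are averaged over every coordinate of every time step, ADE is the mean Euclidean displacement over the prediction horizon, and FDE is the displacement at the final step. The function names are illustrative, and individual reviewed studies may normalize or scale their inputs differently, so values computed this way are not directly comparable with the figures reported in the tables.

```python
import numpy as np


def _as_arrays(pred, true):
    """Convert two trajectories of shape (T, D) to float arrays."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("pred and true must have the same shape")
    return pred, true


def mae(pred, true):
    """Mean absolute error over all time steps and coordinates."""
    pred, true = _as_arrays(pred, true)
    return float(np.mean(np.abs(pred - true)))


def rmse(pred, true):
    """Root mean square error over all time steps and coordinates."""
    pred, true = _as_arrays(pred, true)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def ade(pred, true):
    """Average displacement error: mean Euclidean distance per time step."""
    pred, true = _as_arrays(pred, true)
    return float(np.mean(np.linalg.norm(pred - true, axis=-1)))


def fde(pred, true):
    """Final displacement error: Euclidean distance at the last time step."""
    pred, true = _as_arrays(pred, true)
    return float(np.linalg.norm(pred[-1] - true[-1]))


if __name__ == "__main__":
    # Toy 2D trajectory with three time steps (units are arbitrary).
    true = [[0, 0], [1, 0], [2, 0]]
    pred = [[0, 0], [1, 1], [2, 2]]
    print(mae(pred, true))   # 0.5
    print(rmse(pred, true))  # sqrt(5/6), about 0.913
    print(ade(pred, true))   # 1.0
    print(fde(pred, true))   # 2.0
```

In this toy case the error grows with the horizon, so FDE (2.0) exceeds ADE (1.0); RMSE exceeds MAE because squaring weights the larger late-horizon deviations more heavily.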
Table 8. Frequency of dataset sources used across reviewed studies, with usage counts and corresponding references.
| Dataset Source | Usage | Ref. |
|---|---|---|
| OpenSky Network | 12 | [15,17,19,36,39,41,46,51,52,65,69] |
| CAAC (China) | 6 | [40,42,58,61,62] |
| FAA (U.S.) | 2 | [35,68] |
| EUROCONTROL | 1 | [15] |
| CETC/HU7603/ATMB (China) | 3 | [21,41,48] |
| NATS (UK), SCAT (Sweden), ATFMTraj | 1 | [70] |
| Simulation (Air Combat, DCS, TacView) | 6 | [18,25,26,43,47] |
| Commercial (ADS-B generic, BeiDou GNSS, JFK/Boston, etc.) | 12 | [20,28,34,37,38,44,45,46,49,53,54,55,56,60,63,67] |
Table 9. Structural and performance summary of RNN-based deep learning models for aircraft trajectory prediction, including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
|---|---|---|---|---|---|---|
| [15] | 2024 | LSTM + fully connected layer (with interpolation smoothing) | ADS-B | MAE: 0.0208° (Lat), 0.0364° (Lon) | Improves long-term 2D trajectory prediction accuracy. | Weak altitude prediction; no interaction modeling. |
| [34] | 2020 | Seq2Seq LSTM (encoder–decoder with sampling, noise filtering) | ADS-B | EE (Euclidean Error): 330.1 m, AE (Altitude Error): 45.3 m | Achieves superior accuracy in terminal phases. | Limited to short-term terminal data, reducing generalization. |
| [35] | 2021 | Social-LSTM with pooling grid for multi-aircraft interaction | ADS-B | MAPHE ≈ 660 m, MAPVE ≈ 13 m | Captures multi-aircraft interactions. | Error accumulation; limited pooling configuration. |
| [36] | 2024 | ConvLSTM (CNN for spatial + LSTM for temporal) | ADS-B + ATC Radar + Weather | MAE: Time 337 s, Horiz. 65.15 m, Vert. 4842.74 m (ConvLSTM) | Effective spatiotemporal fusion; robust short-term. | Limited generalization across weather/airspace. |
| [37] | 2021 | Constrained LSTM (flight-phase constraints) | ADS-B | RMSE ≈ 0.009 km (Alt) | Phase-specific constraints improve accuracy. | Limited long-term prediction. |
| [38] | 2021 | Dual-layer LSTM with multimodal input (ADS-B + BeiDou) | ADS-B + BeiDou | RMSE ≈ 0.39 km | Robust under signal loss; real-time capable. | No weather/terrain factors; limited long-term predictions. |
| [39] | 2023 | Bi-LSTM (bidirectional sequence learning) | ADS-B | MAE: 0.00226° (Lat), 0.00238° (Lon) | Stable prediction with ADS-B gaps; safety use-case. | Primarily short-term recovery. |
MAPHE: Mean Average Position Horizontal Error. MAPVE: Mean Average Position Vertical Error.
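Table 9 reports position errors in mixed units: some rows give latitude/longitude errors in degrees, others give horizontal and vertical errors in metres. The following sketch shows one way to obtain metre-valued horizontal and vertical errors from geodetic coordinates. It assumes a spherical Earth (haversine great-circle distance) and takes the vertical error as the absolute altitude difference, averaged over all points; the function names and the averaging scheme are illustrative assumptions and may differ from the exact MAPHE/MAPVE definitions used in the cited study.

```python
import math

# Mean Earth radius in metres (spherical approximation, an assumption of this sketch).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_horizontal_vertical_error(pred, true):
    """Mean horizontal and vertical errors in metres.

    pred, true: equal-length sequences of (lat_deg, lon_deg, alt_m) tuples.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    horiz = [haversine_m(p[0], p[1], t[0], t[1]) for p, t in zip(pred, true)]
    vert = [abs(p[2] - t[2]) for p, t in zip(pred, true)]
    return sum(horiz) / len(horiz), sum(vert) / len(vert)


if __name__ == "__main__":
    true = [(36.0, 127.0, 1000.0), (36.1, 127.1, 1100.0)]
    pred = [(36.0, 127.0, 1010.0), (36.1, 127.1, 1090.0)]
    h_err, v_err = mean_horizontal_vertical_error(pred, true)
    print(h_err, v_err)  # identical horizontal positions, 10 m mean altitude error
```

As a rough scale check under the same spherical assumption, one degree of latitude corresponds to about 111 km, so a latitude error of 0.02° is on the order of 2 km.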
Table 10. Structural and performance summary of attention-based deep learning models for aircraft trajectory prediction, including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
| --- | --- | --- | --- | --- | --- | --- |
| [21] | 2025 | NRAT (Transformer decoder + denoising, autoregressive) | ADS-B | RMSE: 0.015, MAE: 0.011 | Robust to noisy inputs; stable autoregressive Transformer. | Narrow dataset; autoregressive accumulation risk. |
| [40] | 2023 | TET (Transformer encoder–decoder + positional encoding) | ADS-B | ADE: 0.312, FDE: 0.641 | Captures long-term spatiotemporal patterns. | Requires larger/more diverse dataset. |
| [41] | 2023 | FT-TF (Frequency Transformer + CNN encoder) | FLIGHT19 (simulation) | ADE: 0.128, FDE: 0.243 | Specialized for combat scenarios. | Limited to simulation; very short horizon. |
| [42] | 2022 | Attention-LSTM | ADS-B | RMSE: 0.0026, MAE: 0.0019, DTW: 0.021 | Combining attention with LSTM improves accuracy; ablation verified. | Limited to Chinese ADS-B; no weather/ATC. |
| [43] | 2024 | Transformer + Trajectory Stabilization + one-step inference | ADS-B | RMSE: 0.0464 (Lat), RMSE: 3.9228 (Alt) | Stabilization and one-step inference improve long-horizon accuracy. | No weather/ATC; lacks multi-aircraft ability. |
| [44] | 2024 | FlightBERT++ (Conv1D encoder + differential decoder) | ADS-B | MAE: 0.0017 to 0.0124 | Non-autoregressive; accurate and efficient multi-horizon prediction. | Needs broader datasets; complex architecture. |
| [45] | 2024 | TFT (Temporal Fusion Transformer) | ADS-B | MAE: 0.0133°, Altitude: 318 ft | Integrates contextual features; explainability. | Altitude weaker due to route diversity. |
| [46] | 2025 | Inverted Transformer (variable tokens + multi-flight fusion) | ADS-B | MAE = 0.0602, MSE = 0.0171 | Learns variable relations; multi-flight fusion improves generalization. | No weather/ATC; heavy computing. |
| [47] | 2024 | PSTT (patched spatiotemporal Transformer + single-step decoder) | ADS-B | MSE: 0.161, MAE: 0.179 | Patch embedding reduces computing; single-step decoder avoids AR errors. | Small dataset (2 routes); limited generalization. |
| [48] | 2020 | Attention-LSTM (BiLSTM + ATT) | Fighter trajectory (real combat) | ADE: 0.625 | Captures two-aircraft interaction; validated on combat data. | Limited to one-step; not generalized to multi-aircraft. |
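The RMSE, MAE, ADE, and FDE figures in these tables all derive from point-wise differences between a predicted and a reference trajectory. As a minimal sketch of how such values are typically computed, the following assumes trajectories given as equal-length lists of coordinates already expressed in a common planar unit (for example, kilometres after projection). The function names are illustrative and are not taken from any cited paper, and individual studies may normalize or aggregate differently.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of coordinate tuples.

    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the final time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def rmse_mae(pred, truth):
    """Return (RMSE, MAE) for two equal-length sequences of scalars,
    e.g. one coordinate (latitude, longitude, or altitude) over time."""
    if not pred or len(pred) != len(truth):
        raise ValueError("sequences must be non-empty and of equal length")
    errs = [p - t for p, t in zip(pred, truth)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae


# Toy data (hypothetical): a straight reference track and a drifting prediction.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 3.0), (2.0, 4.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)
```

For the toy data the per-step distances are 0, 3, and 4, giving an ADE of 7/3 and an FDE of 4. The example also illustrates why FDE usually exceeds ADE for autoregressive models: errors accumulated along the horizon are largest at the final step.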
Table 11. Structural and performance summary of generative deep learning models (e.g., GAN, Diffusion, etc.) applied to aircraft trajectory prediction, including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
| --- | --- | --- | --- | --- | --- | --- |
| [28] | 2024 | CTGAN (Conditional Tabular GAN with frequency sampling fix + leave-one-out encoding) | ADS-B | JS = 0.0539, MMD = 0.0150, MED = 0.0085 | Robust to imbalanced/small datasets; preserves distribution. | No weather/ATC integration; limited generalization. |
| [49] | 2022 | TPGAN (Conv1D/2D + LSTM encoders, WGAN-GP loss) | ADS-B | MAE: 0.070 (Lat), 0.055 (Lon), 0.041 (Alt) | High accuracy and speed; reduces error accumulation. | Conv2D/LSTM less efficient; no external context. |
| [50] | 2023 | Context-aware diffusion (LSTM + context encoder + Transformer-based diffusion) | ADS-B | ADE = 0.528, FDE = 1.003 (3D) | Captures multimodality; first diffusion model applied to trajectory prediction. | Limited to arrivals; lacks weather/ATC data. |
| [51] | 2025 | GooDFlight (goal-oriented diffusion with trajectory + goal encoder) | ADS-B | ADE = 0.365, FDE = 0.987; Goal Hit Rate = 66.2% | Goal-conditioned; improves accuracy and target consistency. | High computing costs; external context not included. |
Table 12. Structural and performance summary of graph-based deep learning models for aircraft trajectory prediction, including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
| --- | --- | --- | --- | --- | --- | --- |
| [25] | 2023 | AGCN (GCN + Attention + LSTM) | Simulated fighter trajectories | ADE ≈ 0.71 km, FDE ≈ 1.24 km | Captures multi-aircraft interactions; better in combat dynamics. | Simulation only; lacks real fighter/weather data. |
| [26] | 2024 | ST-GAT (Transformer + GAT + FC) | DCS World fighter simulation | ADE = 0.098 km, FDE = 0.124 km | Integrates temporal and spatial features; robust in multi-agent scenarios. | Simulation only; no pilot intent/weather. |
| [52] | 2024 | GLR-GCN (Global Graph + Local Graph + Temporal CNN) | ADS-B | MAE = 0.1863, RMSE = 0.3644, MRE = 0.1089 | Encodes global/local relations; 10–20% better vs. baselines. | Sensitive to noise; UAV/combat not tested. |
| [53] | 2025 | DA-STGCN (Dual Attention + GAT + STGCN + TXP-CNN) | ADS-B | ADE = 0.0082 km, FDE = 0.011 km | Strong terminal performance; captures global and local patterns. | No weather/ATC; high computing demand. |
Table 13. Structural and performance summary of structural hybrid deep learning models for aircraft trajectory prediction, combining heterogeneous modules (e.g., CNN, RNN, Transformer, and GAN) including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
| --- | --- | --- | --- | --- | --- | --- |
| [17] | 2022 | CNN-GRU + 3D CNN + Monte Carlo Dropout (CG3D) | ADS-B | MAE = 0.1406, RMSE = 2232 | Quantifies uncertainty; strong for long horizons. | High computational costs; requires massive dataset. |
| [18] | 2024 | CNN-LSTM + attention + social pooling | ADS-B | ADE = 0.235 km, FDE = 0.388 km | Robust in short-term; spatiotemporal clustering. | Generalization unclear. |
| [19] | 2025 | IMM (CV/CA/CT) + Informer | ADS-B | MAE = 0.026° (Lat), MAE = 0.024° (Lon), RMSE = 0.033° | Combines physics-based IMM with deep Informer; robust to noise. | Complex and costly. |
| [20] | 2024 | CNN + GRU + Spatiotemporal Attention (STAM) | ADS-B | MADE = 1365.27, MAPE = 12.69% | Captures local dynamics in TMA. | Limited to terminal phase. |
| [54] | 2020 | CNN for spatial + LSTM for temporal | ADS-B | RMSE: 0.007–0.009° (Lat/Lon), MAE = 20–25 m (Alt) | Real-time potential; improved over LSTM. | Small dataset; no weather/ATC. |
| [55] | 2022 | Phase-specific ST-LSTM + CNN + attention | ADS-B + Weather | RMSE = 0.021, MAE = 0.014 | Phase-dependent accuracy improvement. | Needs accurate flight-phase segmentation. |
| [56] | 2022 | WGAN-GP for data augmentation + LSTM predictor | ADS-B | RMSE = 0.021 km (Alt), MAE = 0.002–0.004° (Lat/Lon) | Data augmentation for scarce routes. | Route-specific; small dataset. |
| [57] | 2025 | Spatial awareness encoder + Time-frequency Transformer (SATF) | ADS-B | RMSE: 0.0704 (Alt), 0.0444 (Lon), 0.0388 (Lat) (horizon = 20) | Enhances long-horizon prediction. | High computational costs; spectral preprocessing. |
| [58] | 2024 | Attention + TCN + GRU | ADS-B | RMSE = 0.016, MAE = 0.012 | Robust multi-step prediction. | Requires large-scale dataset. |
| [59] | 2025 | CNN-Transformer generator + Bi-LSTM Discriminator (GAN) | ADS-B | RMSE ≈ 0.012, MAE ≈ 0.010 | Realistic generation + improved prediction. | Unstable GAN training. |
| [60] | 2022 | CNN + BiLSTM + Dual Attention + GA (genetic algorithm) | ADS-B | RMSE: 0.029° (Lat), 0.018 (Lon), 50.68 | Improves performance in short-term aircraft trajectory prediction. | Uncertainty in real-time applicability due to model complexity. |
| [61] | 2024 | CNN + BiLSTM + multi-head attention | ADS-B | MAE = 0.00088, RMSE = 0.00112, R² = 0.995 | Multi-head attention improves turning point accuracy. | Limited to a single aircraft dataset, raising concerns about generalization. |
| [62] | 2023 | Clustering + CNN-LSTM | ADS-B | RMSE = 0.015, MAE = 0.011 | Handles clustered trajectories efficiently. | Limited generalization to new routes. |
| [63] | 2022 | TCN + BiGRU + Dual Attention + BO | ADS-B | RMSE: 20.14 m (Alt), 0.004° (Lat), 0.009° (Lon); MAE = 0.014 | Achieves superior accuracy and robustness, effectively capturing spatiotemporal dependencies. | Single route dataset with limited generalization. |
| [64] | 2023 | Bi-LSTM + AutoEncoder + Voting | ADS-B | MAE ≈ 0.018, RMSE ≈ 0.026 | Reduces error variance via ensemble. | Higher computational costs. |
| [65] | 2023 | TCN (dilated conv) + Informer | ADS-B | MAE = 0.017° (Lat), MAE = 37.5 m (Alt) | Superior accuracy in approach-phase prediction through hybrid TCN-Informer design. | Single-aircraft ADS-B dataset limits generalization and real-time validation. |
| [66] | 2022 | Encoder–decoder (Conv1D + GRU) + Intent | ADS-B | RMSE ≈ 0.020, MAE ≈ 0.013 | Stable in intent-integrated prediction. | Depends on intent data quality. |
| [67] | 2022 | IMM (CV/CA/CT) + LSTM correction | ADS-B + Radar | RMSE: 0.01319° (Lon), 0.01101° (Lat), 65.91 km | Robust initialization; effective for short-term prediction. | Weak for long horizons. |
Table 14. Structural and performance summary of representation learning and generalization-based deep learning models for aircraft trajectory prediction, including datasets, performance metrics, strengths, and drawbacks.
| Ref | Year | Model Structure | Dataset | Performance | Strengths | Drawbacks |
| --- | --- | --- | --- | --- | --- | --- |
| [68] | 2023 | Hybrid-Recurrent (CNN/SA + LSTM/GRU/IndRNN) | ADS-B + Weather | Horiz. Error ≈ 40 nmi, Vert. Error ≈ 1160 ft | Shows importance of weather integration; CNN-GRU most robust. | Poor generalization to unseen routes (error increases by 70–500%). |
| [69] | 2025 | ATSCC (Trajectory Segmentation + Contrastive Coding + Transformer) | ADS-B | ACC = 0.9946, ARI = 0.8195 (Clustering Quality) | Strong clustering; no labels needed. | Focused on clustering, not full trajectory prediction. |
| [70] | 2025 | FLIGHT2VEC (Behavior-Adaptive Patching + Motion Trend Learning + Transformer) | ADS-B | MAE = 0.0381, RMSE = 0.0836 (Lon, horizon = 60) | Universal trajectory representation learning; efficient and effective. | No explicit weather/ATC features. |
Table 15. Comparative strengths and limitations of different deep learning model categories (RNN, attention, generative, graph, and hybrid and integrated) when applied to aircraft trajectory prediction.
| Model Category | Strengths | Limitations |
| --- | --- | --- |
| RNN-based | Strong in temporal sequence modeling; stable baseline in short- and mid-term prediction; Bi-LSTM and ConvLSTM variants are robust to noisy data | Weak in long-horizon prediction due to error accumulation; instability in altitude prediction |
| Attention-based | Addresses long-term dependencies; efficient parallel computation; superior long-horizon performance; improved RMSE, MAE, and ADE/FDE over RNN | Requires large-scale datasets; high computational cost; limited by parameter tuning complexity |
| Generative | Captures trajectory diversity and distribution generalization; strong in long-horizon prediction and small-data scenarios; uses new metrics (ADE, FDE, diversity, NLL) | Training instability; limited generalization to real-world operational data |
| Graph-based | Explicitly models multi-aircraft interactions; strong performance in complex traffic (e.g., terminal areas); 10–20% ADE/FDE improvement over baseline | High cost of graph construction; scalability issues in large-scale or real-time applications |
| Hybrid and integrated | Achieves overall best RMSE/MAE (≈0.011–0.012); combines complementary structures; advanced models integrate Transformer, GAN, and physics-based modules for long-term accuracy and uncertainty quantification; representation learning enhances generalization | Structural complexity; long training time; computational cost hinders real-time deployment |
Table 16. Structured comparison of deep learning model categories for aircraft trajectory prediction, highlighting trade-offs across performance, computational cost, and robustness.
| Model Category | Performance | Computational Cost | Robustness |
| --- | --- | --- | --- |
| RNN-based | Stable in short-/mid-term prediction; moderate RMSE/MAE; weaker in long-horizon prediction. | Low-to-moderate (efficient training/inference). | Sensitive to noise/missing data; error accumulation over long sequences. |
| Attention-based | Best long-horizon performance; improved ADE/FDE and stability. | High GPU/memory demand; complex tuning. | Strong generalization with sufficient data; scalability in parallel training. |
| Generative | Competitive ADE/FDE; enhances diversity and rare-scenario prediction. | High training costs; GAN instability; diffusion latency. | Useful for data augmentation; robustness to imbalance but unstable training. |
| Graph-based | +10–20% ADE/FDE gain in dense/terminal airspaces; strong interaction modeling. | High graph construction and computational costs. | Sensitive to noisy graphs; limited scalability in real-time large-scale use. |
| Hybrid and integrated | Overall best RMSE/MAE (≈0.011–0.012 km); combines strengths of multiple families. | Highest structural complexity and cost. | Strongest robustness across tasks; effective generalization with representation learning. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kwak, N.; Lee, B. The Evolution and Taxonomy of Deep Learning Models for Aircraft Trajectory Prediction: A Review of Performance and Future Directions. Appl. Sci. 2025, 15, 10739. https://doi.org/10.3390/app151910739

