An Inverted Transformer Framework for Aviation Trajectory Prediction with Multi-Flight Mode Fusion

Lu, Gaoyong; Ou, Yang; Li, Wei; Zeng, Xinyu; Zhang, Ziyang; Huang, Dongcheng; Kotenko, Igor

doi:10.3390/aerospace12040319


by Gaoyong Lu ¹, Yang Ou ¹, Wei Li ²,*, Xinyu Zeng ²,*, Ziyang Zhang ², Dongcheng Huang ³ and Igor Kotenko ⁴

¹ The 10th Research Institute of China Electronics Technology Group, Chengdu 610036, China

² College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China

³ Heilongjiang Dasanyuan Dairy Machinery Co., Ltd., Harbin 150069, China

⁴ Laboratory of Computer Security Problems, St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), Saint-Petersburg 199178, Russia

* Authors to whom correspondence should be addressed.

Aerospace 2025, 12(4), 319; https://doi.org/10.3390/aerospace12040319

Submission received: 9 February 2025 / Revised: 30 March 2025 / Accepted: 31 March 2025 / Published: 8 April 2025

(This article belongs to the Section Air Traffic and Transportation)


Abstract

As globalization and rapid economic development drive a surge in air transportation demand, the need for enhanced efficiency and safety in flight operations has become increasingly critical. However, the exponential growth in flight numbers has exacerbated airspace congestion, in stark contrast with the limited availability of airspace resources. This imbalance poses significant challenges to flight punctuality and operational efficiency. Accurate trajectory prediction can help mitigate these issues, but existing models often rely solely on individual flight data, which restricts the breadth and depth of feature learning. In this study, we propose an Inverted Transformer framework for aviation trajectory prediction enhanced by multi-flight mode fusion. This framework leverages multi-flight inputs and inverted data processing to enrich feature representation and optimize the modeling of multi-variate time series. By treating the entire time series of each variable as an independent token, our model captures global temporal dependencies and enhances correlation analysis among multiple variables. Extensive experiments on real-world aviation trajectory datasets demonstrate the superiority of our proposed framework. The results show significant improvements in prediction accuracy. Moreover, the integration of multi-flight data enables the model to learn more comprehensive flight patterns, leading to robust performance across varying flight conditions. This research provides a novel perspective and methodology for aviation trajectory prediction, contributing to the efficient and safe development of air transportation systems.

Keywords: trajectory prediction; spatiotemporal interaction; transformer; time series; deep learning

1. Introduction

The rapid progress of globalization and economic development has led to a significant increase in the demand for air transportation. This surge has imposed higher requirements for efficiency and safety in flight operations. The expanding aviation industry has not only intensified airport surface traffic but also led to a proliferation of taxiways and runways. As a result, the complexity of interconnections among these components has escalated. However, the limited availability of airspace resources is increasingly at odds with the exponential growth in flight numbers, exacerbating airspace congestion and negatively impacting flight punctuality and passenger comfort. Efficient ground operations become particularly challenging during extensive flight delays or unexpected changes in taxiway conditions [1].

Research into aviation trajectory prediction has a long history, dating back to the mid-20th century, and has evolved in tandem with advancements in the aviation industry and computer technology. Initially, trajectory predictions were predominantly based on mathematical models, leveraging Newton’s laws of motion and fundamental aerodynamic principles to formulate straightforward predictive models for aircraft trajectories. However, with the maturation of big data and machine learning technologies, data-driven approaches have come to the forefront. These techniques harness historical flight data, employing statistical analyses and machine learning algorithms to forecast future flight paths. Contemporary planning methods integrate multi-objective optimization, considering parameters such as flight duration and fuel efficiency, to devise optimal flight routes. Meanwhile, deep learning, through neural networks, processes vast spatiotemporal datasets, extracting intricate features to achieve high-precision trajectory predictions.

The recent advent of Transformer models has offered innovative solutions to a myriad of problems [2]. However, incorporating raw data into these models can result in the loss of correlations among multiple variables during the prediction phase. Furthermore, a significant portion of contemporary research predominantly utilizes trajectory data from a single flight for both training and prediction. This reliance on limited data from a single flight implies that the knowledge features acquired during the model training process are correspondingly restricted.

To address these limitations, we propose an improved Transformer model with inverted input. The inverted operation restructures the input data so that the complete temporal evolution of each flight variable (e.g., speed, altitude, etc.) is treated as an independent token before encoding. This fundamental architectural innovation provides three key benefits:

Global Feature Aggregation: Each token encapsulates an entire variable’s history, enabling the self-attention mechanism to directly model long-term intra-variable dependencies and cross-variable interactions at the system level.
Multi-Flight Knowledge Fusion: Processing data from multiple flights simultaneously enables the model to generalize to unseen flight phases through shared feature learning.
Physical Consistency Preservation: The inverted structure naturally aligns with aviation domain constraints, and temporal causality is preserved within each token.

The remainder of this paper is organized as follows: Section 2 provides an overview of the related work in trajectory prediction. Section 3 outlines the architecture of the proposed inverted Transformer model. Section 4 details the methodology and experimentation undertaken in this research. Section 5 discusses the performance outcomes derived from the experimental results, offering a comprehensive analysis. Finally, Section 6 concludes the paper.

2. Related Work

Trajectory prediction is a critical area of research with applications spanning various domains, including aircraft flight paths, ship navigation, vehicle movement, and pedestrian dynamics. Optimization of trajectory prediction can significantly reduce carbon emissions [3]. Over the past decade, significant advancements have been made, particularly with the integration of machine learning and deep learning techniques. This section provides an overview of the key developments and methodologies in trajectory prediction, focusing on aviation applications.

2.1. Physical-Model-Based Trajectory Prediction

Early approaches to trajectory prediction relied heavily on physical models, leveraging fundamental principles of motion and aerodynamics. These include kinematic models such as the Constant Velocity Model, Constant Acceleration Model [4,5], Constant Turning Rate and Velocity Model, Constant Turning Rate and Acceleration Model [6,7], Constant Steering Angle and Velocity Model, and Constant Steering Angle and Acceleration Model [8]. The Kalman Filtering Method offers an advantage over kinematic models by accounting for the uncertainty of the predicted trajectory. This is achieved by modeling the uncertainty of the current target’s motion state using a Gaussian distribution [9]. However, a unimodal Gaussian distribution is inadequate to represent trajectory uncertainty. To address this, the literature proposes the Interactive Multiple Model (IMM) to generate multi-modal trajectories. Jin et al. [10] introduced the Switched Kalman Filter (SKF) to better describe the uncertainty of the target’s motion state. The Monte Carlo method has also been applied: Okamoto et al. [11] introduced a policy-based model that employs the Monte Carlo method to forecast the trajectories of traffic participants by estimating uncertain states. Similarly, Wang et al. [12] utilized the Monte Carlo method for trajectory prediction and further refined it using Model Predictive Control (MPC). While these models offer a foundational understanding, they often fall short in capturing the complexities of real-world flight dynamics.
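As a point of reference for the kinematic baselines mentioned above, the Constant Velocity Model can be sketched in a few lines. This is an illustrative sketch and is not taken from the cited works: the function name `predict_constant_velocity`, the three-dimensional state, and the sample numbers are all assumptions.

```python
import numpy as np

def predict_constant_velocity(position, velocity, dt, steps):
    """Extrapolate a position assuming the velocity never changes.

    position, velocity: arrays of shape (dims,)
    Returns an array of shape (steps, dims), one row per future time step.
    """
    k = np.arange(1, steps + 1).reshape(-1, 1)   # step index as a column: 1, 2, ..., steps
    return position + k * dt * velocity          # p_k = p_0 + k * dt * v

# Hypothetical state: x, y and altitude in metres; velocity in metres per second.
p0 = np.array([0.0, 0.0, 1000.0])
v = np.array([100.0, 0.0, 5.0])
future = predict_constant_velocity(p0, v, dt=2.0, steps=3)
print(future)
```

The prediction error of such a model grows as soon as the aircraft turns, climbs at a varying rate, or changes speed, which is the shortcoming the data-driven methods in the following subsections try to address.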

2.2. Trajectory Prediction Based on Machine Learning

Machine learning techniques have revolutionized trajectory prediction by enabling the extraction of temporal characteristics from Automatic Identification System (AIS) data. This is achieved through either supervised or unsupervised learning methods to align with the predicted trajectory. The current research can be categorized into the following areas.

2.2.1. Classification

The principle of trajectory prediction based on classification models involves extracting effective trajectory sequences from a vast array of historical trajectories. These sequences are then compared and analyzed against the trajectory to be predicted, with the goal of identifying the optimal matching route for the current trajectory sequence. For instance, Virjonen et al. [13] employed the K-Nearest Neighbors (KNNs) algorithm for trajectory prediction. They used the current target’s trajectory to identify the nearest K trajectory points and then calculated the feature similarity for the corresponding trajectory sequences of these points. The most similar sequence was used as the final prediction result.

While classification models are computationally less complex and easier to construct, they have several limitations. Specifically, they are unable to produce probabilistic outputs, which are often necessary for assessing prediction uncertainty. Moreover, these models struggle to set traffic context parameters or adapt to the complex interactive behaviors of traffic participants in dynamic environments. This lack of adaptability can significantly impact their performance in real-world applications.

2.2.2. Regression

The principle of trajectory prediction using regression models involves deriving a regression equation from trajectory data and leveraging the trend of this equation to extrapolate future trajectories. For example, Liu J. et al. [14] incorporated the variables of speed, heading, timestamp, and position from Automatic Identification System (AIS) data as feature samples. They employed an adaptive chaotic differential evolution algorithm to fine-tune the parameters of Support Vector Machines (SVMs) for enhanced trajectory prediction. This approach outperforms Recurrent Neural Networks (RNNs) and Back-Propagation Neural Networks (BPNNs) in terms of prediction accuracy, computational efficiency, simplicity, and feasibility.

Gaussian Process Regression (GPR) has emerged as an effective machine learning technique for addressing intricate regression challenges, including high dimensionality, limited sample sizes, and nonlinearity. For instance, Rong H. et al. [15] introduced a probabilistic trajectory regression prediction model based on the Gaussian process framework. This model employs a continuous Cholesky decomposition algorithm to mitigate computational complexity, ultimately characterizing the future uncertainty of the target trajectory by predicting the probability distribution of potential positions. GPR offers a more straightforward method than its counterparts and can forecast trajectories over extended periods, and it provides outputs with inherent probabilistic attributes, aiding in uncertainty identification. However, GPR faces challenges in accounting for the traffic environment surrounding moving targets.

2.3. Trajectory Prediction Models Based on Planning

Current research on medium- and long-term trajectory prediction is still in its infancy. Unlike short-term prediction, medium- and long-term trajectory prediction focuses more on route planning rather than specific motion state prediction. In maritime navigation, ships often consider the economic costs of movement when the destination is determined, aiming to optimize future trajectories. Silveira et al. [16] were pioneers in constructing a graph structure based on Automatic Identification System (AIS) data, where graph nodes represented grids within the sea area, and edge weights corresponded to the average speed of transitions between these nodes. They utilized the Dijkstra shortest path algorithm to determine safe ship routes. However, constructing a graph structure for targets and static obstacles over a large area increases the complexity and reduces the construction efficiency of the graph structure, severely impacting the efficiency of route prediction during dynamic interactions among traffic participants.

To address these challenges, Martinsen et al. [17] proposed a trajectory planning method that combines graph search and convex optimization. They constructed an objective function based on a ship’s 3-DOF (degrees of freedom) model and incorporated constraints such as sea area space, minimizing time, distance, and fuel costs to achieve optimal trajectory prediction. Despite its innovative approach, this method suffers from a lengthy runtime and requires further optimization to improve performance.

Another notable contribution is the beam search strategy-based ship trajectory planning and generation method proposed by Karbowska-Chilinska et al. [18]. This method generates a trajectory tree consisting of the ship’s risk avoidance strategies. Experimental results demonstrated that this approach can find shorter trajectory paths, achieving an approximate 3% improvement compared to other methods. Lazarowska et al. [19] developed a ship route planning algorithm utilizing a discrete artificial potential field for autonomous ships. This algorithm ensures real-time path generation and safe navigation in environments with both static and dynamic obstacles. The use of artificial potential fields provides a flexible and efficient way to handle complex maritime environments.

Reinforcement learning (RL) has also been employed to predict aviation target trajectories. RL algorithms learn optimal strategies through agent–environment interaction, making them well suited for dynamic and complex scenarios. For example, Maw et al. [20] introduced an RL-based unmanned aerial vehicle (UAV) trajectory planning method that transforms the trajectory planning problem into an RL problem using a Deep Q-Network (DQN) to achieve trajectory planning. This approach leverages the power of RL to adapt to changing environments and optimize trajectories in real-time. Similarly, Wang et al. [21] proposed an RL-based ship trajectory tracking method that learns the similarities and patterns among trajectory data to achieve intelligent tracking and situation analysis of ship trajectories. This method demonstrates the potential of RL in improving the accuracy and reliability of trajectory prediction.

2.4. Trajectory Prediction Based on Deep Learning

Deep learning has revolutionized the field of trajectory prediction, with models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) playing pivotal roles. More recently, Graph Neural Networks (GNNs) and Transformer models have emerged as powerful tools, offering new possibilities for handling complex spatiotemporal data.

CNNs are renowned for their unique network structure and robust feature extraction capabilities, making them highly effective for time series prediction tasks. Akter et al. [22] introduced a CNN-based system for the detection, monitoring, and type identification of unmanned aerial vehicles (UAVs). Their architecture employs deep convolutional layers to effectively learn the intrinsic feature maps of radio frequency signals from three distinct UAV types. This approach adeptly extracts salient features from UAV trajectory states, facilitating precise characterization and identification of UAV types. The ability of CNNs to capture spatial hierarchies makes them particularly suitable for tasks involving complex signal processing and trajectory analysis.

RNNs, with their inherent compatibility with sequence data, excel in capturing temporal and correlation information within trajectory data. Jung et al. [23] presented a novel sequence-to-sequence RNN model tailored for predicting object motion trajectories in space. Their model leverages the temporal dependencies in trajectory data to make accurate predictions, demonstrating the effectiveness of RNNs in handling sequential information. The ability of RNNs to maintain a memory of past inputs makes them well suited for tasks involving time series data, such as trajectory prediction.

Given that trajectory data are quintessentially non-Euclidean, GNNs have emerged as a viable tool for modeling and analyzing aviation target trajectories. Zhang et al. [24] devised a marine wireless sensor network construction method leveraging GNNs, transforming trajectory data into a graph structure for clustering purposes. Their approach effectively captures the spatial relationships between different trajectories, enhancing the accuracy of trajectory analysis. Similarly, Lin et al. [25] proposed a GNN-centric fleet trajectory recognition technique, utilizing GNNs for both representation and classification of fleet data. The ability of GNNs to process non-Euclidean spaces makes them particularly suited for tasks involving complex spatial relationships.

The recent advent of Transformer models [26] has ushered in novel advancements in the realm of time series prediction [27,28,29,30]. These models are capable of discerning long-term dependencies within sequences, courtesy of their self-attention mechanism. However, when multiple variables at an identical time step are embedded into a singular channel, the intervariable correlation is inadvertently obliterated during the prediction phase. This limitation can be particularly problematic in the context of flight prediction, where prevailing models predominantly harness trajectory data from isolated flights for both training and prediction. Such a reliance on data from a solitary flight inherently constrains the knowledge features that can be gleaned during model training.

Consequently, this study advocates for the utilization of multi-flight input in conjunction with a Transformer model predicated on inverted input. The employment of multi-flight input serves to apprehend more comprehensive flight patterns and features. Concurrently, the adoption of inverted input is instrumental in fully harnessing the modeling prowess of Transformers when addressing multi-variate time series challenges.

3. Network Architecture

The conventional approach in time series forecasting using Transformer-based models involves processing data in a manner where each token represents a single time step across multiple variables, as illustrated in Figure 1. This approach hinders the extraction of correlations among variables. Specifically, it faces the following challenges:

Heterogeneity of Simultaneous Measurements: Data points recorded at the same time step often represent distinct physical phenomena. Due to inconsistent recording practices across different variables, aggregating these points into a single token can obscure the inherent correlations between multiple variables. This aggregation can lead to a loss of critical information about the relationships between different physical processes, thereby hindering the model’s ability to capture multi-variate dependencies.
Complexity of Temporal Representation: The presence of a large number of local receptive fields, combined with the representation of temporally inconsistent events at the same time point, makes it challenging for tokens formed at a single time step to convey meaningful information. This complexity arises because the same time step may capture diverse and nonsynchronous events, which, when combined, can introduce noise and ambiguity into the token representation, thereby reducing the model’s effectiveness in processing temporal patterns.
Underutilization of Permutation-Invariant Attention: Although variations in sequences are significantly influenced by the order of the data, permutation-invariant attention mechanisms are not effectively utilized across the temporal dimension in traditional Transformer models. This limitation arises because the self-attention mechanism, while capable of capturing long-range dependencies, does not fully leverage the temporal structure of the data. As a result, the model may fail to effectively utilize the historical information and temporal context, leading to suboptimal performance in time series forecasting tasks.

Figure 1. The upper part depicts the flow of the original approach inputting into Transformer-based forecasters, while the lower part illustrates the flow of the inverted approach inputting into the Transformer-based forecasters.

However, this approach has inherent limitations when dealing with complex, multi-variate time series data, such as aviation trajectory data. In this study, we introduce an innovative inverted Transformer framework that significantly enhances the model’s ability to capture global temporal dependencies and multi-variate correlations. The network structure of the proposed model is illustrated in Figure 2.

3.1. Inverted Input Embedding

In traditional Transformer models, the input data are organized such that each token represents a single time step across all variables. This can obscure the correlations between different variables, especially when the variables represent distinct physical phenomena. To address this, we employ an inverted input operation, where the entire time series of each variable is treated as an independent token. This approach allows the model to capture the global dependencies and trends across the entire time series while preserving the independence between variables.

Given the original input matrix $X \in \mathbb{R}^{T \times N}$, where $T$ is the number of time steps and $N$ is the number of features, the inverted matrix is $X^{\top} \in \mathbb{R}^{N \times T}$. Each feature’s entire time series is treated as a single token, enabling the model to better capture the dependencies and trends across the entire time series. The embedding operation is implemented using a Multi-Layer Perceptron (MLP).

$$X^{\top} = \mathrm{transpose}(X), \quad X^{\top} \in \mathbb{R}^{N \times T}$$

(1)

$$h_{n}^{0} = \mathrm{Embedding}(X_{:, n})$$

(2)

where $H = \{h_{1}, h_{2}, \dots, h_{N}\} \in \mathbb{R}^{N \times D}$ represents the set of $N$ tokens with dimension $D$, and $X_{:, n}$ refers to the time series of the $n$-th feature, that is, the $n$-th column of the original $X$ (equivalently, the $n$-th row of $X^{\top}$).

Aviation trajectory variables exhibit well-defined physical couplings (such as nonlinear speed–altitude relationships during climb phases). Conventional Transformers bundle variables at the same timestep into a single token, forcing the model to learn these relationships within local temporal windows. In contrast, our inverted operation treats the entire time series of each variable as an independent token, enabling the self-attention mechanism to directly model the following:

Long-term intra-variable patterns;
Global inter-variable correlations.
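The inversion and embedding steps can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the paper describes the embedding as an MLP, whereas the sketch uses a single linear layer shared by all variables, and the sizes, the random input, and the names `invert`, `embed`, `W_embed`, and `b_embed` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, D = 96, 4, 16          # time steps, variables, token dimension (illustrative values)
X = rng.normal(size=(T, N))  # stand-in for one window of longitude, latitude, speed, altitude

def invert(x):
    """Equation (1): swap the axes so that each row holds one variable's full series."""
    return x.T                # shape (N, T)

def embed(x_inv, w, b):
    """Equation (2), simplified to one linear layer applied to every variable token."""
    return x_inv @ w + b      # (N, T) @ (T, D) -> (N, D)

W_embed = rng.normal(scale=T ** -0.5, size=(T, D))
b_embed = np.zeros(D)

H0 = embed(invert(X), W_embed, b_embed)
print(H0.shape)  # (4, 16)
```

After this step the sequence length seen by the encoder is the number of variables N rather than the number of time steps T, so attention is computed over an N x N matrix.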

3.2. Self-Attention Mechanism

The self-attention mechanism is a core component of the Transformer model (see Figure 3), enabling it to capture long-range dependencies in the data. In our inverted Transformer framework, since each token represents the entire time series of a single variable, the attention mechanism focuses on capturing the correlations between different variables.

For each token, the feature vector is multiplied by the parameter matrices $W^{Q}$, $W^{K}$, and $W^{V}$ to obtain the query vector $Q_{i}$, key vector $K_{i}$, and value vector $V_{i}$. The similarity between each variable and all other variables is computed using the dot product of the query and key vectors, followed by a softmax function to obtain the attention weights:

$$Q_{i} = h_{i} W^{Q}, \quad K_{i} = h_{i} W^{K}, \quad V_{i} = h_{i} W^{V}, \quad K_{j} = h_{j} W^{K}$$

(3)

$$A_{i} = \mathrm{softmax}\left(\frac{Q_{i} K_{j}^{T}}{\sqrt{d_{k}}}\right) V_{i}$$

(4)

$$\mathrm{Attention}(Q, K, V) = \begin{bmatrix} A_{1} \\ \vdots \\ A_{N} \end{bmatrix}$$

(5)

where $d_{k}$ is the embedding dimension of the attention head, which is the feature dimension of the query and key vectors. The scaling factor $\frac{1}{\sqrt{d_{k}}}$ keeps the magnitude of the dot products from growing with $d_{k}$, which keeps the softmax well conditioned.

In the inverted Transformer, where tokens represent variable-level sequences, the attention score $Q_{i} K_{j}^{T}$ in Equation (4) differs fundamentally from that of a conventional Transformer: it measures the correlation between the entire temporal evolutions of two different variables.
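A minimal single-head sketch of attention over variable tokens is given below. In this sketch each output row is the attention-weighted sum over the value vectors of all variable tokens, which is the standard scaled dot-product form; reading Equation (4) this way is an assumption of the sketch. The sizes, the random weights, and the names `softmax` and `variable_attention` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def variable_attention(h, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over N variable tokens."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v        # each of shape (N, d_k)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)            # (N, N): entry (i, j) compares variables i and j
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
N, D, d_k = 4, 16, 8
H = rng.normal(size=(N, D))
W_q, W_k, W_v = (rng.normal(scale=D ** -0.5, size=(D, d_k)) for _ in range(3))

A, attn = variable_attention(H, W_q, W_k, W_v)
print(A.shape, attn.shape)   # (4, 8) (4, 4)
```

Because the rows and columns of `attn` are indexed by variables, the matrix can be inspected directly, for example to see how strongly the altitude token attends to the speed token.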

3.3. Layer Normalization

To ensure stable training, we add a residual connection to the attention output $A$ obtained in Section 3.2 and then apply layer normalization. The aim is to avert gradient vanishing during training, a phenomenon that can complicate the process of fitting the loss function. The detailed implementation procedure is as follows:

$$A^{'} = A + H$$

(6)

where $H$ represents the original data block (the tokens fed into the attention layer), and $A^{'} = \{h_{n}^{'} \mid n = 1, 2, \dots, N\}$.

Layer normalization standardizes the data, mitigating the risk of gradient explosion during the model training phase:

$$\mathrm{LayerNorm}(A^{'}) = \left\{ \frac{h_{n}^{'} - \mathrm{Mean}(h_{n}^{'})}{\sqrt{\mathrm{Var}(h_{n}^{'})}} \;\middle|\; n = 1, 2, \dots, N \right\}$$

(7)

(7)

In traditional Transformer models, normalization is applied to the representations of multiple variables at an identical timestamp, which can introduce interaction noise in noncausal or delayed processes. In contrast, our inverted input Transformer model applies normalization to the time series representations of individual variables, effectively mitigating discrepancies caused by inconsistent measurements.
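Equations (6) and (7) can be sketched as follows. The small constant `eps` added under the square root is an assumption introduced for numerical stability and does not appear in Equation (7); learnable scale and shift parameters, common in library implementations, are omitted. The function name `add_and_norm` and the random inputs are hypothetical.

```python
import numpy as np

def add_and_norm(a, h, eps=1e-5):
    """Residual connection followed by per-token layer normalization."""
    a_res = a + h                                   # Equation (6): A' = A + H
    mean = a_res.mean(axis=-1, keepdims=True)       # statistics over each variable token
    var = a_res.var(axis=-1, keepdims=True)
    return (a_res - mean) / np.sqrt(var + eps)      # Equation (7), with eps for stability

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 16))   # 4 variable tokens of dimension 16
A = rng.normal(size=(4, 16))   # attention output with the same shape

out = add_and_norm(A, H)
print(out.mean(axis=-1))   # close to 0 for every token
print(out.std(axis=-1))    # close to 1 for every token
```

The statistics are taken along the last axis, so every variable token is normalized on its own and a large-valued variable such as altitude does not affect the normalization of the others.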

3.4. Feedforward Network

The feedforward network (FFN) applies more sophisticated nonlinear transformations to the output of the attention mechanism. The FFN architecture comprises two Conv1d layers with an activation function:

$$\mathrm{FFN}(A^{'}) = \max(0, w_{1} A^{'} + b_{1})\, w_{2} + b_{2}$$

(8)

where $w_{1}$ and $b_{1}$ represent the weight matrix and bias vector for the first linear transformation, respectively, while $w_{2}$ and $b_{2}$ represent the weight matrix and bias vector for the second linear transformation.
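A hedged sketch of Equation (8) follows. It assumes that the two Conv1d layers use a kernel size of 1, in which case each is equivalent to a linear layer applied to every token independently; the text does not state the kernel size. The hidden width `D_ff`, the random weights, and the function name `ffn` are hypothetical.

```python
import numpy as np

def ffn(a, w1, b1, w2, b2):
    """Position-wise feedforward network: linear, ReLU, linear."""
    hidden = np.maximum(0.0, a @ w1 + b1)   # max(0, .) is the ReLU activation
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
N, D, D_ff = 4, 16, 64
A_norm = rng.normal(size=(N, D))            # normalized tokens, one per variable
W1 = rng.normal(scale=D ** -0.5, size=(D, D_ff))
b1 = np.zeros(D_ff)
W2 = rng.normal(scale=D_ff ** -0.5, size=(D_ff, D))
b2 = np.zeros(D)

out = ffn(A_norm, W1, b1, W2, b2)
print(out.shape)  # (4, 16)
```

The same weights are applied to every variable token, so the network refines the representation of each series without mixing information between variables; mixing happens only in the attention step.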

Traditional Transformer models typically treat variables as independent entities, often neglecting their temporal dynamics within the broader sequence. This approach may result in insufficient detailed information for accurate prediction. In contrast, the inverted Transformer, by tokenizing the entire sequence of variables, excels at handling complex time series data. This method provides a more comprehensive representation, capturing the intricate temporal relationships and dependencies among variables, thereby enhancing the model’s predictive capabilities.

3.5. Projection

After the encoding block processing, which involves stacking self-attention, layer normalization, and feedforward networks, the resulting outputs are fed into a Multi-Layer Perceptron (MLP) for the projection operation. The MLP layers consist of an input layer, hidden layers, and an output layer. Through both linear and nonlinear combinations, the MLP promotes comprehensive interactions among various dimensions of the feature vectors, allowing the model to capture richer information on nonlinear and combined features. The projection can be succinctly expressed as

$$\hat{Y}_{:, n} = \mathrm{Projection}(h_{n}^{L})$$

(9)

where $h_{n}^{L}$ represents the embedding of the $n$-th variable token output by the $L$-th layer, and $\hat{Y}_{:, n}$ denotes the predicted trajectory series for the $n$-th variable.
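The projection in Equation (9) can be sketched as a small MLP that maps each variable token to its future values. The prediction horizon `S`, the hidden width, the single hidden layer, and the function name `project` are assumptions made for illustration; the text does not specify the number or width of the hidden layers.

```python
import numpy as np

def project(h_last, w1, b1, w2, b2):
    """Two-layer MLP applied to each variable token, then transposed back to time-major."""
    hidden = np.maximum(0.0, h_last @ w1 + b1)   # (N, D) -> (N, D_hidden), ReLU
    y_inv = hidden @ w2 + b2                     # (N, D_hidden) -> (N, S)
    return y_inv.T                               # (S, N): column n is the forecast for variable n

rng = np.random.default_rng(0)
N, D, D_hidden, S = 4, 16, 32, 24
H_L = rng.normal(size=(N, D))                    # encoder output, one token per variable
W1 = rng.normal(scale=D ** -0.5, size=(D, D_hidden))
b1 = np.zeros(D_hidden)
W2 = rng.normal(scale=D_hidden ** -0.5, size=(D_hidden, S))
b2 = np.zeros(S)

Y_hat = project(H_L, W1, b1, W2, b2)
print(Y_hat.shape)  # (24, 4)
```

The final transpose undoes the inversion applied at the input, so the forecast has the same time-by-variable layout as the original data.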

4. Experiments

4.1. Dataset

The ADS-B data used in this study were obtained through a collaborative research program between our laboratory and the Tenth Research Institute of China Electronics Technology Group Corporation (CETC-10). The dataset comprises actual operational records in May 2021. The ADS-B dataset encompasses a comprehensive range of attributes, including flight number, call sign, timestamp, longitude, latitude, speed, and altitude, as detailed in Table 1. For the purposes of this study, the dataset was organized by flight numbers to extract the trajectory data for each specific flight. To ensure the robustness of the training inputs, a length check was applied to each flight’s data, filtering out sequences that might be too short and thus potentially less informative.

As an automatic surveillance technology, ADS-B operates using a broadcast mechanism. Aircraft transmit state vectors such as position, velocity, and altitude via an Extended Squitter at 0.5–1 Hz intervals. While this provides richer data streams than traditional radar systems, the technology inherently suffers from several limitations that become particularly pronounced in multi-flight integration scenarios. Space weather disturbances, such as those documented during the 2003 Halloween solar storms, can induce ionospheric scintillation that manifests as intermittent signal loss. In addition, the dependence on GNSS synchronization makes the system vulnerable to clock drift in older transponders, while terrain occlusion in mountainous regions creates spatial inconsistencies in sampling rates. During the data collection process, challenges such as inconsistent sampling intervals, missing values, and outliers may also emerge. Compared to conventional single-flight studies, multi-flight integration faces the critical challenge of temporal misalignment caused by heterogeneous sampling rates. Our temporal alignment method employs adaptive interpolation to synchronize multi-flight data. It is therefore imperative to preprocess the gathered ADS-B data prior to their use in training. The preprocessing procedure consists of the following steps:

Elimination of Redundant Features: The flight number and call sign were identified as noncontributory to the prediction objective and were consequently excluded as superfluous features.
Temporal Feature Integration: We merged the date and time features into a single temporal feature to reduce the number of features, thereby streamlining operations.
Uniform Sampling: The method of uniform sampling was employed to diminish the quantity of data points while concurrently maintaining their representativeness.
Treatment of Missing Values: The missing data were imputed using the mean value method.
Dataset Partitioning: The dataset was divided into training, validation, and test sets following a 70%, 20%, and 10% ratio, respectively.
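The preprocessing steps above can be sketched as follows. This is an illustration under several assumptions rather than the pipeline used in the study: the record layout, the field names, the flight identifier `HYPO123`, the sampling step of 2, and the chronological split are all hypothetical, and the timestamp is assumed to be already merged into one numeric field.

```python
import numpy as np

FEATURES = ["timestamp", "longitude", "latitude", "speed", "altitude"]

def select_features(records, names):
    """Drop identifier fields (flight number, call sign) and keep numeric features."""
    return np.array([[row[name] for name in names] for row in records], dtype=float)

def uniform_sample(rows, step):
    """Keep every `step`-th point to reduce the number of data points."""
    return rows[::step]

def impute_mean(rows):
    """Replace each missing value with the mean of its column."""
    col_mean = np.nanmean(rows, axis=0)
    return np.where(np.isnan(rows), col_mean, rows)

def split_70_20_10(rows):
    """Split in order into training, validation, and test parts."""
    n = len(rows)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

# Synthetic track for a hypothetical flight, with one missing speed value.
records = [
    {"flight": "HYPO123", "callsign": "HYPO123", "timestamp": float(t),
     "longitude": 104.0 + 0.01 * t, "latitude": 30.0 + 0.01 * t,
     "speed": 200.0 + t, "altitude": 3000.0 + 10.0 * t}
    for t in range(100)
]
records[10]["speed"] = float("nan")

track = select_features(records, FEATURES)
sampled = uniform_sample(track, step=2)
clean = impute_mean(sampled)
train_set, val_set, test_set = split_70_20_10(clean)
print(len(train_set), len(val_set), len(test_set))  # 35 10 5
```

Mean imputation follows the step listed above; for a variable with a strong trend, such as altitude during climb, the filled value can lie far from its neighbours, which is worth checking when missing values are frequent.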

The primary objective of this experiment is to investigate the impact of inputting different scales of datasets during model training. To achieve this goal, the experiment simulates different scales of datasets by limiting the number of flight records. Specifically, as the limitation imposed increases, the number of data points decreases. Detailed experimental results are presented in Section 5.1.
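One way to read the length constraint is as a minimum number of points that a flight must contain to be kept. The sketch below illustrates this reading only; the threshold values, the flight identifiers, and the function names are hypothetical, and the exact constraint used in the experiments is not specified here.

```python
def filter_by_min_length(flights, min_length):
    """Keep only the flights whose track has at least `min_length` points."""
    return {fid: track for fid, track in flights.items() if len(track) >= min_length}

def count_points(flights):
    """Total number of trajectory points across all flights."""
    return sum(len(track) for track in flights.values())

# Hypothetical flights: identifier -> list of (longitude, latitude, speed, altitude) records.
flights = {
    "FLIGHT-A": [(0.0, 0.0, 0.0, 0.0)] * 50,
    "FLIGHT-B": [(0.0, 0.0, 0.0, 0.0)] * 200,
    "FLIGHT-C": [(0.0, 0.0, 0.0, 0.0)] * 800,
}

for min_length in (0, 100, 500):
    kept = filter_by_min_length(flights, min_length)
    print(min_length, len(kept), count_points(kept))
```

Raising the threshold removes the shorter flights first, so the training set shrinks as the constraint tightens.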

4.2. Evaluation Metrics

In assessing the model’s performance, this paper employs three crucial indicators to comprehensively measure prediction accuracy: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).

The Mean Squared Error (MSE) is the average of the squares of the prediction errors, which is calculated as follows:

\mathrm{MSE} = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}

(10)

The Root Mean Squared Error (RMSE) is the square root of MSE, which is formulated as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}

(11)

The Mean Absolute Error (MAE) represents the average of the absolute values of the prediction errors, which is formulated as follows:

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - \hat{y}_{i} |

(12)

where y_{i} denotes the true value, \hat{y}_{i} represents the predicted value, and N is the number of samples.
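
As a minimal sketch, the three metrics can be computed directly from their definitions in Equations (10)–(12). The function and argument names (`mse`, `rmse`, `mae`, `y_true`, `y_pred`) are illustrative choices rather than names taken from the paper. All three metrics are symmetric in their two arguments, so the argument order does not affect the result.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only (not taken from the dataset).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 6.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because RMSE is the square root of MSE, it is expressed in the same units as the predicted quantity, which makes it easier to interpret alongside MAE.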

5. Experiment Results

5.1. Experiment Performance

In this study, we aimed to examine the influence of varying data scales on model training. To achieve this, we utilized different data length constraints to create our training sets. We then proceeded to make predictions for a flight with the ID CBJ5270-1619861400-schedule-0516:0.

By varying the data length constraint, we fed the model data at different scales and examined how these scales affect its performance. We also varied the number of encoder and decoder layers in pursuit of optimal results. The findings are detailed in Table 2.

The results indicate that training on the largest-scale input yielded the best prediction metrics, which suggests that the scale of the input data significantly influences model performance.

5.2. Inverted Transformer Performance

In this study, we assessed the efficacy of the inverted Transformer by separately inputting inverted and original data into various Transformer-based models. Our aim was to determine if the invert operation could augment the capacity of Transformer models to process time series data more effectively. We compared the performance of the inverted data input with that of the unprocessed data input on the Reformer [31], Informer [32], Flowformer [28], and Flashattention [33]. The experimental results are presented in Table 3 and Figure 4. These results suggest that the invert operation enhances the ability of Transformer models to process and comprehend time series data. Specifically, it boosts the efficiency of Transformer predictors, bolsters their generalization capability to unseen variables, and promotes a more effective use of historical observation data.
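
To make the invert operation concrete, the following sketch shows only the reshaping step: a window of T time steps with N variables, which a conventional Transformer would read as T tokens, is transposed into N tokens that each hold the full series of one variable. The function name `invert` and the numeric values are illustrative assumptions. In the actual model each variate token would additionally pass through a learned embedding (Section 3.1), which is not reproduced here.

```python
def invert(window):
    """Transpose a (T x N) window of time-step rows into N variate tokens of length T."""
    return [list(column) for column in zip(*window)]


# Rows are time steps; columns are (JD, WD, GD, SD) as in Table 1.
# The numbers are made up for illustration only.
window = [
    [104.0, 30.5, 9800.0, 820.0],
    [104.1, 30.6, 9850.0, 822.0],
    [104.2, 30.7, 9900.0, 825.0],
]

temporal_tokens = window          # original input: 3 tokens, one per time step
variate_tokens = invert(window)   # inverted input: 4 tokens, one per variable

print(len(temporal_tokens), len(variate_tokens))
```

With the inverted layout, attention operates across variables (longitude, latitude, altitude, speed) rather than across individual time steps, which is the property the comparison in this section evaluates.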

From Figure 4, we can see that the inverted input outperformed the original input on all Transformer-based forecasters. Table 3 shows the improvement of the invert operation on each forecaster.

In conclusion, when data processed through the invert operation were fed into various Transformer-based forecasters, the experimental outcomes indicate a range of improvements relative to the direct input of raw data. This underscores the efficacy of the invert operation in enhancing time series data for Transformer-based forecasting models.

5.3. Result Visualization

In order to better evaluate the prediction effect, this paper visualizes the predicted results in the geographical coordinate system. Using the folium library, we mapped the flight trajectory of flight ID CBJ5270-1619861400-schedule-0516:0. To distinguish the predicted trajectory from the real trajectory, we used the color blue to represent the predicted trajectory and the color red to represent the real trajectory. The visualization is shown in Figure 5. It can be seen that the predicted trajectory of the model is consistent with the real trajectory.

6. Conclusions

This study presents a novel approach to aviation trajectory prediction by introducing an inverted Transformer framework enhanced with multi-flight mode fusion. The proposed method addresses the limitations of traditional Transformer models, which often treat variables as independent entities and neglect their temporal dynamics within the broader sequence. By treating the entire time series of each variable as a token, the inverted Transformer captures global temporal dependencies and strengthens the correlation analysis among multiple variables. The experimental results demonstrate the superior performance of the proposed framework. The use of multi-flight data enriches the feature representation, enabling the model to learn more comprehensive flight patterns and characteristics, which in turn leads to robust performance across varying flight conditions. The study also highlights the importance of data scale in model training, as larger input data scales yielded better prediction accuracy.

The findings of this research contribute to the field of aviation trajectory prediction by providing a new perspective and methodology. The inverted Transformer framework not only improves prediction accuracy but also enhances the model’s ability to generalize to unseen variables. This approach could be applied in various aviation-related applications, such as air traffic management, flight scheduling, and aviation safety systems. Specifically, the variable-level temporal modeling could extend beyond positional prediction to energy-aware trajectory optimization, a task that previously required a separate fuel model [34]. Future work may focus on optimizing the model architecture and exploring additional data sources to further improve prediction accuracy. Additionally, this framework could support Free Route Airspace operations through adaptive constraint learning while remaining applicable to the maritime and vehicle domains.

Author Contributions

Conceptualization, G.L. and W.L.; Data curation, Y.O.; Formal analysis, G.L.; Funding acquisition, W.L.; Investigation, W.L. and Z.Z.; Methodology, W.L.; Project administration, G.L. and Y.O.; Resources, Z.Z.; Software, Y.O., X.Z. and D.H.; Validation, X.Z., Z.Z., D.H. and I.K.; Visualization, X.Z. and D.H.; Writing—original draft, X.Z. and D.H.; Writing—review and editing, G.L., W.L. and I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was sponsored by the Fundamental Research Funds for the Central Universities, 3072024XX0604 and KYWZ120240606, and the Natural Science Foundation of Heilongjiang Province, LH2023F020.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We gratefully acknowledge the valuable comments and suggestions provided by the editors and reviewers.

Conflicts of Interest

Authors Gaoyong Lu and Yang Ou were employed by the company The 10th Research Institute of China Electronics Technology Group. Author Dongcheng Huang was employed by the company Heilongjiang Dasanyuan Dairy Machinery Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Liu, S.; Zhou, S.; Miao, J.; Shang, H.; Cui, Y.; Lu, Y. Autonomous Trajectory Planning Method for Stratospheric Airship Regional Station-Keeping Based on Deep Reinforcement Learning. Aerospace 2024, 11, 753.
2. Dong, X.; Tian, Y.; Dai, L.; Li, J.; Wan, L. A New Accurate Aircraft Trajectory Prediction in Terminal Airspace Based on Spatio-Temporal Attention Mechanism. Aerospace 2024, 11, 718.
3. Xue, D.; Du, S.; Wang, B.; Shang, W.L.; Avogadro, N.; Ochieng, W.Y. Low-carbon benefits of aircraft adopting continuous descent operations. Appl. Energy 2025, 383, 125390.
4. Ammoun, S.; Nashashibi, F. Real time trajectory prediction for collision risk estimation between vehicles. In Proceedings of the 2009 IEEE 5th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 27–29 August 2009; pp. 417–422.
5. Schubert, R.; Richter, E.; Wanielik, G. Comparison and evaluation of advanced motion models for vehicle tracking. In Proceedings of the 2008 11th International Conference on Information Fusion, Cologne, Germany, 30 June–3 July 2008; pp. 1–6.
6. Lytrivis, P.; Thomaidis, G.; Amditis, A. Cooperative path prediction in vehicular environments. In Proceedings of the 2008 11th International IEEE Conference on Intelligent Transportation Systems, Beijing, China, 12–15 October 2008; pp. 803–808.
7. Barth, A.; Franke, U. Where will the oncoming vehicle be the next second? In Proceedings of the 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008; pp. 1068–1073.
8. Batz, T.; Watson, K.; Beyerer, J. Recognition of dangerous situations within a cooperative group of vehicles. In Proceedings of the 2009 IEEE Intelligent Vehicles Symposium, Xi’an, China, 3–5 June 2009; pp. 907–912.
9. Kaempchen, N.; Weiss, K.; Schaefer, M.; Dietmayer, K.C. IMM object tracking for high dynamic driving maneuvers. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 825–830.
10. Jin, B.; Jiu, B.; Su, T.; Liu, H.; Liu, G. Switched Kalman filter-interacting multiple model algorithm based on optimal autoregressive model for manoeuvring target tracking. IET Radar Sonar Navig. 2015, 9, 199–209.
11. Okamoto, K.; Berntorp, K.; Di Cairano, S. Driver intention-based vehicle threat assessment using random forests and particle filtering. IFAC-PapersOnLine 2017, 50, 13860–13865.
12. Wang, Y.; Liu, Z.; Zuo, Z.; Li, Z.; Wang, L.; Luo, X. Trajectory planning and safety assessment of autonomous vehicles based on motion prediction and model predictive control. IEEE Trans. Veh. Technol. 2019, 68, 8546–8556.
13. Virjonen, P.; Nevalainen, P.; Pahikkala, T.; Heikkonen, J. Ship movement prediction using k-NN method. In Proceedings of the 2018 Baltic Geodetic Congress (BGC Geomatics), Olsztyn, Poland, 21–23 June 2018; pp. 304–309.
14. Liu, J.; Shi, G.; Zhu, K. Vessel trajectory prediction model based on AIS sensor data and adaptive chaos differential evolution support vector regression (ACDE-SVR). Appl. Sci. 2019, 9, 2983.
15. Rong, H.; Teixeira, A.; Soares, C.G. Ship trajectory uncertainty prediction based on a Gaussian Process model. Ocean Eng. 2019, 182, 499–511.
16. Silveira, P.; Teixeira, A.; Guedes-Soares, C. AIS based shipping routes using the Dijkstra algorithm. Transnav Int. J. Mar. Navig. Saf. Sea Transp. 2019, 13.
17. Martinsen, A.B.; Lekkas, A.M.; Gros, S. Optimal model-based trajectory planning with static polygonal constraints. IEEE Trans. Control Syst. Technol. 2021, 30, 1159–1170.
18. Karbowska-Chilinska, J.; Koszelew, J.; Ostrowski, K.; Kuczynski, P.; Kulbiej, E.; Wolejsza, P. Beam search Algorithm for ship anti-collision trajectory planning. Sensors 2019, 19, 5338.
19. Lazarowska, A. Comparison of discrete artificial potential field algorithm and wave-front algorithm for autonomous ship trajectory planning. IEEE Access 2020, 8, 221013–221026.
20. Maw, A.A.; Tyan, M.; Nguyen, T.A.; Lee, J.W. iADA*-RL: Anytime graph-based path planning with deep reinforcement learning for an autonomous UAV. Appl. Sci. 2021, 11, 3948.
21. Wang, S.; Yan, X.; Ma, F.; Wu, P.; Liu, Y. A novel path following approach for autonomous ships based on fast marching method and deep reinforcement learning. Ocean Eng. 2022, 257, 111495.
22. Akter, R.; Doan, V.S.; Lee, J.M.; Kim, D.S. CNN-SSDI: Convolution neural network inspired surveillance system for UAVs detection and identification. Comput. Netw. 2021, 201, 108519.
23. Jung, O.; Seong, J.; Jung, Y.; Bang, H. Recurrent neural network model to predict re-entry trajectories of uncontrolled space objects. Adv. Space Res. 2021, 68, 2515–2529.
24. Zhang, Y.; Zhang, Q.; Zhang, Y.; Zhu, Z. VGAE-AMF: A novel topology reconstruction algorithm for invulnerability of ocean wireless sensor networks based on graph neural network. J. Mar. Sci. Eng. 2023, 11, 843.
25. Lin, Z.; Zhang, X.; He, F. A GNN-LSTM-Based Fleet Formation Recognition Algorithm. In Proceedings of the International Conference on Guidance, Navigation and Control, Tianjin, China, 5–7 August 2022; pp. 7272–7281.
26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
27. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
28. Wu, H.; Wu, J.; Xu, J.; Wang, J.; Long, M. Flowformer: Linearizing transformers with conservation flows. arXiv 2022, arXiv:2202.06258.
29. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186.
30. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
31. Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
32. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115.
33. Dao, T.; Fu, D.; Ermon, S.; Rudra, A.; Ré, C. Flashattention: Fast and memory-efficient exact attention with io-awareness. Adv. Neural Inf. Process. Syst. 2022, 35, 16344–16359.
34. Khan, W.A.; Ma, H.L.; Ouyang, X.; Mo, D.Y. Prediction of aircraft trajectory and the associated fuel consumption using covariance bidirectional extreme learning machines. Transp. Res. Part E Logist. Transp. Rev. 2021, 145, 102189.

Figure 2. (a) The overview of the inverted Transformer. (b) The framework of the multi-variate attention model. (c) The framework of the feedforward network model.

Figure 3. The structure of self-attention.

Figure 4. Effectiveness of inverted input versus original input on various forecasters. Light blue denotes results with the original input, and dark blue denotes results with the inverted input.

Figure 5. Trajectory visualization. The predicted trajectory is shown in blue and the real trajectory in red.

Table 1. Description of dataset features.

Feature Name	Description
HBID	Flight number, the unique identification of the flight
WZSJ	The time of position, including date and time
JD	Longitude
WD	Latitude
GD	Altitude
SD	Airplane flying speed

Table 2. Detailed results for different dataset scales and parameters of the inverted Transformer. The best results are in bold.

Parameters	Sequence length	10			30			50
	Encoder layer	1	2	3	1	2	3	1	2	3
	Decoder layer	2	3	4	2	3	4	2	3	4
Results	MSE	0.01707	0.07209	0.09898	0.06055	0.10614	0.03966	0.05466	0.02167	0.19648
	RMSE	0.13064	0.26841	0.31462	0.14588	0.32579	0.19915	0.23380	0.14721	0.44326
	MAE	0.06019	0.13311	0.10631	0.12304	0.15811	0.13908	0.13624	0.06150	0.05371

Table 3. Results of various Transformer-based models with original and inverted inputs.

Model	Transformer		Reformer		Informer		Flowformer		Flashattention
Metric	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE
Original	0.0279	0.0835	0.0357	0.0803	0.0321	0.0935	0.0285	0.0719	0.0294	0.0873
Inverted	0.0171	0.0602	0.0218	0.0784	0.0222	0.0735	0.0235	0.0704	0.0196	0.0767
Improvement	63.3%	38.8%	63.7%	2.4%	45.0%	21.2%	21.4%	2.2%	49.9%	13.9%
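
The table does not state how the percentages in its last row are derived. As a hedged illustration, the sketch below computes two common conventions from the Original and Inverted rows: the error reduction relative to the original input, and the gain relative to the inverted input. Neither is confirmed here as the authors' formula, and the function names are illustrative.

```python
# (original, inverted) error pairs copied from Table 3.
TABLE3 = {
    "Transformer": {"MSE": (0.0279, 0.0171), "MAE": (0.0835, 0.0602)},
    "Reformer": {"MSE": (0.0357, 0.0218), "MAE": (0.0803, 0.0784)},
    "Informer": {"MSE": (0.0321, 0.0222), "MAE": (0.0935, 0.0735)},
    "Flowformer": {"MSE": (0.0285, 0.0235), "MAE": (0.0719, 0.0704)},
    "Flashattention": {"MSE": (0.0294, 0.0196), "MAE": (0.0873, 0.0767)},
}


def reduction_vs_original(original, inverted):
    """Percentage by which the error drops, relative to the original-input error."""
    return (original - inverted) / original * 100.0


def gain_vs_inverted(original, inverted):
    """Percentage difference expressed relative to the inverted-input error."""
    return (original - inverted) / inverted * 100.0


for model, metrics in TABLE3.items():
    for metric, (original, inverted) in metrics.items():
        print(f"{model:15s} {metric}: "
              f"{reduction_vs_original(original, inverted):5.1f}% vs original, "
              f"{gain_vs_inverted(original, inverted):5.1f}% vs inverted")
```

Whichever convention is used, a positive value means that the inverted input produced a lower error than the original input for that model and metric.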

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

