A Standardized Maneuver Pattern Library and Dual-View Framework for Multi-View Maneuver Classification

Yang, Zhenwei; Chen, Zhuang; Sun, Botian; Ai, Yibo; Zhang, Weidong

doi:10.3390/s26051526


by Zhenwei Yang, Zhuang Chen, Botian Sun, Yibo Ai and Weidong Zhang *

National Center for Materials Service Safety (NCMS), University of Science and Technology Beijing, Beijing 100083, China

* Author to whom correspondence should be addressed.

Sensors 2026, 26(5), 1526; https://doi.org/10.3390/s26051526

Submission received: 16 December 2025 / Revised: 16 February 2026 / Accepted: 20 February 2026 / Published: 28 February 2026

(This article belongs to the Section Intelligent Sensors)


Abstract

Maneuver pattern classification is fundamental for understanding and predicting the dynamic behaviors of aerial vehicles operating in increasingly complex airspace environments. However, existing rule-based and data-driven approaches are constrained by the scarcity, imbalance, and limited maneuver diversity of real-world flight data, leading to a restricted generalization capability and a reduced robustness to noise. To address these challenges, we construct a standardized Maneuver Pattern Library, a curated dataset of simulated flight trajectories encompassing five representative maneuver primitives: climb, descent, left turn, right turn, and loiter. Trajectories are generated using the X-Plane 12 flight simulator under controlled conditions to ensure maneuver diversity and label consistency, refined through noise reduction and cubic spline interpolation, and rendered from synchronized top and side views with time-encoded color gradients to preserve temporal continuity. Building upon this dataset, we propose DualView-LiteNet, a lightweight Siamese convolutional network designed to jointly learn complementary spatial and temporal cues from dual-view trajectory representations through parameter sharing and feature fusion. In addition to comprehensive comparisons with multiple baseline models on the simulated benchmark, we further evaluate the trained model via direct inference on a real-world ADS-B dataset collected from ADS-B Exchange, without any fine-tuning. The consistent performance observed in this sim-to-real setting demonstrates the practical feasibility and generalization capability of the proposed approach. The experimental results show that DualView-LiteNet achieves an accuracy of 97.64%, with its precision, recall, and F1-score all reaching 0.98 on the benchmark dataset, validating its effectiveness for aerial maneuver pattern classification and establishing a reliable reference for future research.

Keywords:

maneuver pattern classification; Siamese network; dual-view fusion; lightweight architecture; aerial trajectory analysis

1. Introduction

With the rapid development of urban air traffic and integrated space–air–ground networks, aircraft are becoming more intelligent and more collaborative, which in turn places higher demands on understanding how they behave in flight. Analyzing the motion characteristics of aircraft accurately requires studying their maneuver patterns, and recognizing these patterns is the core of such analysis. Common flight maneuvers include straight flight, climb, descent, turn, and loiter. Maneuver recognition can improve aircraft control, help detect air traffic conflicts, and assist path planning. It can also be used to train and evaluate pilots, helping them identify hidden hazards and incorrect maneuvers in a timely manner. As flight missions become more complex and performance standards more stringent, research on aircraft maneuver patterns is of great significance [1].

The initial phase of research in this area depended heavily on rule-based systems derived from expert knowledge [2,3,4,5]. These systems matched real-time flight parameters against a predefined rule set in an expert-curated knowledge base to identify action types. Although effective in constrained and simple operational contexts, this approach had a critical limitation: its performance was bottlenecked by the scope of manual knowledge engineering. This constraint renders traditional methods increasingly incapable of addressing the demands of modern, high-maneuverability aircraft and dynamically changing battlefields. The advent of AI has since shifted the field toward data-driven methods, which are now mainstream. For instance, Bayesian models and Support Vector Machines (SVMs) provide reliable dynamic classification of flight parameters, and deep neural networks further improve pattern identification accuracy by extracting intricate, nonlinear patterns from large volumes of flight data [6,7].

Despite notable progress in data-driven maneuver pattern recognition, current approaches exhibit an excessive reliance on flight data. Typically acquired through radar scanning, such data frequently suffers from quality inconsistencies, sensor noise, and compromised temporal integrity, especially under fast-changing maneuver states. These issues lead to trajectory discontinuities, misaligned sampling rates, and viewpoint-dependent distortions, all of which hinder stable feature learning. Consequently, recognition models trained on these datasets tend to demonstrate suboptimal accuracy and limited generalization capability. Furthermore, the inherent scarcity of flight data—particularly the prohibitive costs associated with acquiring high-value maneuver samples in real-world scenarios—poses additional constraints on algorithmic refinement [8,9,10]. Most existing datasets contain only single-view or sparsely sampled trajectories, lacking the structured multi-view representations needed to capture the full geometric and temporal characteristics of complex aerial maneuvers. While current data-driven methods, including both machine learning and deep learning, continue to face challenges such as insufficient accuracy, data inefficiency, and sensitivity to sampling irregularities, research leveraging neural networks for lightweight, robust maneuver recognition remains relatively underdeveloped. Nevertheless, ongoing advances in artificial intelligence, especially the growing maturity of neural network technologies, offer promising new pathways for advancing maneuver pattern recognition [11].

To address the long-standing challenges of insufficient model generalization and low recognition accuracy in maneuver pattern recognition, this study integrates data-driven paradigms with deep neural networks to fundamentally optimize the recognition framework. Guided by this core strategy, we first establish a systematic Maneuver Pattern Library as the foundation of our work. Building upon the Maneuver Pattern Library, a dual-view image-based classification approach is proposed, and its overall framework is illustrated in Figure 1. The Maneuver Pattern Library adopts dual-view projection techniques to convert continuous maneuver trajectories into structured images, effectively preserving key temporal features while reducing data dimensionality. This design enhances the representation of trajectory morphology and improves the model’s robustness to data irregularities. In addition, the methodology developed for constructing the Maneuver Pattern Library can be extended to the organization of domain knowledge in aerospace studies, providing useful support for subsequent analyses. To handle the characteristics of dual-view data, a Siamese network is employed, which independently extracts features from each view through convolutional operations and integrates them through a lightweight feature fusion mechanism. Based on this framework, we conduct systematic comparisons with conventional machine learning methods, CNN-based baselines, and attention-enhanced dual-view variants, and further assess the trained model via direct inference on real-world ADS-B trajectories. This study contributes a practical and extensible framework for maneuver pattern recognition and offers a reference for future work in related engineering applications.

The contributions are summarized as follows:

We construct and release the Maneuver Pattern Library, a structured dual-view, time-encoded image dataset annotated with five fundamental flight maneuver categories.
We define the dual-view maneuver pattern classification task and demonstrate how orthogonal trajectory projections enhance robustness and interpretability in maneuver pattern classification.
We establish comprehensive benchmarks for dual-view maneuver pattern classification by comparing conventional machine learning models (SVM), CNN-based approaches, attention-enhanced dual-view baselines, and the proposed lightweight Siamese network DualView-LiteNet. Extensive experiments on simulated data and direct inference on real-world ADS-B trajectories demonstrate the effectiveness and generalization capability of the proposed framework.

The remainder of this paper is organized as follows: Section 2 reviews the related work relevant to this study. Section 3 introduces the construction of the Maneuver Pattern Library. Section 4 presents the proposed DualView-LiteNet framework for the maneuver pattern classification task. Section 5 reports the experimental results and provides detailed performance analysis. Finally, Section 6 concludes the paper and discusses potential future research directions and applications.

2. Related Works

2.1. Knowledge Base Matching Methods

Early studies on maneuver pattern recognition primarily relied on rule-driven, knowledge-based matching techniques. In these approaches, domain experts predefined a library of canonical maneuver templates that encoded representative flight attributes such as attitude, the angle of attack, and the roll rate. Incoming trajectory segments were then aligned with these templates to infer maneuver categories. Austin [12], for example, formalized pilot knowledge into seven prototypical fighter maneuvers serving as modular components for pattern recognition. Barndt [13] expanded this paradigm by introducing over 1200 expert rules and associating maneuver intensity with structural stress, thereby detecting risky behaviors such as descent pulls and high-G turns. However, such systems often grew cumbersome, with redundant or overlapping logic. To address latency and uncertainty, Tian [14] incorporated fuzzy control theory, smoothing trajectories, labeling key nodes, and filtering perturbations to enhance real-time recognition. Despite their interpretability, knowledge-based frameworks remain limited by their dependence on handcrafted rules and poor scalability in highly dynamic flight scenarios.

2.2. Similarity-Based Matching Methods

Similarity-based approaches employ distance metrics or scoring functions to align and compare observed trajectories with standard templates. Dynamic Time Warping (DTW) is the most prevalent technique, automatically stretching or compressing time axes to align key features and compute a similarity score. In the maneuver pattern recognition context, DTW matches a test sequence to prototype maneuvers by minimizing temporal distance, thus handling variations in execution speed [15,16,17,18,19,20,21]. Complementary to DTW, Genetic Algorithms (GAs) have been applied to evolve optimal matching rules without the explicit modeling of maneuver dynamics [22,23,24]. By encoding candidate solutions as chromosomes and applying selection, crossover, and mutation, GAs search the high-dimensional rule space to maximize classification fitness. While similarity-based methods offer interpretability and control, their performance degrades on noisy or incomplete trajectories and they often incur high computational costs.
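The template-matching idea behind DTW can be illustrated with a minimal sketch. The code below is not taken from any of the cited works; the function names, the one-dimensional altitude profiles, and the two toy templates are assumptions chosen only to show how warping the time axis lets a slowly executed maneuver match a faster prototype.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance for 1D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_by_template(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Hypothetical altitude profiles (arbitrary units) used as prototypes
templates = {
    "climb": [0.0, 1.0, 2.0, 3.0, 4.0],
    "descent": [4.0, 3.0, 2.0, 1.0, 0.0],
}
# The same climb flown at half speed: every altitude value is sampled twice
slow_climb = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
label = classify_by_template(slow_climb, templates)
```

The quadratic cost table in this sketch also makes the computational burden noted above concrete: every query must be compared against every template at a cost proportional to the product of their lengths.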

2.3. Machine Learning-Based Methods

With advances in sensing and computing, machine learning techniques such as Bayesian classifiers and SVMs have been adopted to mitigate the limitations of rule-based systems. Bayesian approaches estimate the posterior probability of each maneuver class given features extracted from radar, inertial measurement units, or GPS data—such as velocity, acceleration, the angle of attack, and the roll rate—and assign the class with the highest posterior [25,26,27,28,29,30]. These methods require comprehensive feature sets, yet parameters like the roll rate are not always available in real time, limiting the accuracy and responsiveness. SVMs address small-sample, high-dimensional classification by constructing optimal hyperplanes in feature space via kernel functions [31,32,33,34]. While SVMs generalize well on limited data, they depend on careful feature engineering or time window segmentation to capture temporal dynamics, and their computational complexity can hinder real-time deployment.

2.4. Deep Learning-Based Methods

The emergence of deep neural networks has led to end-to-end maneuver pattern classification solutions that learn hierarchical representations directly from raw or minimally processed data. Early work employed autoencoders combined with spectral clustering to reduce dimensionality and segment trajectories [35]. Recurrent neural networks (RNNs) [36] have been applied to capture sequential dependencies in flight dynamics. For example, several recurrent architectures, including Bi-LSTM [37], LSTM [38], GRU [39], and the vanilla RNN [36], have demonstrated strong capability in identifying turning and maneuver types of airborne targets [40]. Such temporal models often incorporate sliding windows, dropout, or bidirectional structures to improve stability and robustness in long-range sequence modeling [41,42].

More recently, Transformer architectures—with self-attention mechanisms—have demonstrated their superior ability to model spatiotemporal dependencies, inspiring variants tailored to maneuver classification [43,44,45,46,47,48,49]. However, these architectures typically involve large parameter counts and substantial computational overhead, making them less suitable for lightweight or real-time deployment scenarios. Although a few recent Transformer variants attempt to reduce model complexity through sparse attention, windowed attention, or hierarchical designs, the existing literature still primarily focuses on performance rather than architectural efficiency. In addition to deep neural models, probabilistic graphical models have also been explored. For instance, Markov Random Field (MRF)-based segmentation methods have been proposed to recognize and partition flight actions, achieving a higher accuracy than traditional flight action recognition algorithms [50]. These models provide structured interpretability but generally lack the flexibility and representation power of modern neural networks.

While knowledge-based and similarity-based approaches offer interpretability and controllability, they lack robustness in complex or noisy environments. Traditional machine learning improves generalization but requires extensive feature engineering. Deep learning methods excel in feature extraction and end-to-end learning, yet they are often computationally heavy and data hungry. Existing studies rarely adopt explicitly lightweight architectures for maneuver classification, leaving a gap between high-capacity models and resource-constrained applications. This motivates the development of our lightweight dual-view framework, which aims to maintain competitive recognition performance while significantly reducing computational complexity and dependency on large-scale labeled datasets.

3. Maneuver Pattern Library

The Maneuver Pattern Library serves as the dataset for maneuver pattern recognition and constitutes the foundational focus of this research. While previous recognition models often directly analyze time series data, trajectory time series data is susceptible to influences from data sources, with actual trajectory data frequently exhibiting significant fluctuations. To address this, our study leverages projection-based dimensionality reduction to convert trajectory data into image features, thereby constructing a comprehensive Maneuver Pattern Library to support subsequent investigations.

Figure 2 provides a detailed visualization of the five maneuver categories included in the Maneuver Pattern Library—climb, descent, left turn, right turn, and loiter—each shown in both top-view and side-view projections. A color gradient from dark to light indicates the progression from the start to the end of the trajectory. These complementary views illustrate why single-view observation is often insufficient for reliable maneuver discrimination. For example, climb and descent trajectories appear almost linear in the top view, yet are distinguished in the side view by whether the end point lies above or below the start. Conversely, left turn, right turn, and loiter maneuvers require top-view information: turning direction is inferred from the horizontal curvature and relative positioning of start and end points, while loiter is identifiable by its closed or near-closed loop pattern. By presenting both projections, Figure 2 qualitatively demonstrates the necessity of dual-view fusion and provides intuitive evidence of how different maneuvers manifest in complementary geometry. These visualizations support the construction of a robust Maneuver Pattern Library and directly motivate the dual-view modeling strategy employed in this work.

3.1. Data Generation

Since real aircraft flight data are often confidential due to military and commercial restrictions, and may suffer from radar inaccuracies or data loss, this study employs a flight simulation platform to generate high-quality data for subsequent analysis. Mainstream simulation systems include Microsoft Flight Simulator (MFS) [51,52], FlightGear [53], Prepar3D [54], Digital Combat Simulator World (DCS World) [55], and X-Plane 12 [56,57,58,59,60,61,62,63]. MFS, developed by Microsoft since 1982, features ray tracing and photorealistic environments, offering diverse aircraft and weather settings. FlightGear, launched in 1997 as an open-source project, allows users to modify and extend its realistic flight models freely. Prepar3D, developed by Lockheed Martin in 2010, focuses on professional training and supports plugin extensions with real-time meteorological and navigation systems. DCS World, introduced by Eagle Dynamics in 2012, specializes in military flight and air combat simulation, featuring an advanced physics engine for realistic aerial maneuvers. X-Plane 12, released in 2022 by Laminar Research, employs Blade Element Theory to precisely model aerodynamic behavior and integrates modern graphics and dynamic weather systems.

Each platform has unique strengths and limitations. MFS offers superior graphics but demands high hardware performance and is less suitable for extreme maneuvers. Prepar3D provides realistic environments but is costly and complex to use. DCS World focuses mainly on combat scenarios, limiting its general applicability. FlightGear, though open source, lags behind in terms of its graphics and physics realism. X-Plane 12, in contrast, achieves a strong balance between realism, flexibility, and usability. Its physics engine accurately reproduces stall and adverse weather behavior, and its built-in aircraft, airports, and exportable flight data make it ideal for research purposes. Accordingly, X-Plane 12 is selected as the main platform in this study. It provides authentic, controllable, and diverse flight data that effectively support trajectory analysis and maneuver pattern recognition.

As seen in Figure 3, all simulations use a desktop hardware setup consisting of a Thrustmaster A-10C “Warthog” joystick and throttle system (Guillemot Corporation S.A., Carentoir, France). The joystick’s three-axis magnetic sensors and 19 programmable buttons enable the precise control of roll, pitch, and yaw inputs, while the throttle’s adjustable friction mechanism ensures stable and smooth power transitions during maneuver execution. Leveraging these controls, we pilot a Grumman F-14 “Tomcat” (Grumman Aerospace Corporation, Bethpage, NY, USA) through the prescribed maneuver set, as illustrated in Figure 4. It should be noted that the choice of the F-14 is not motivated by military applications, but rather by its role as a well-established and thoroughly documented aerodynamic platform within the simulation environment. The proposed framework does not depend on aircraft-specific parameters, and the maneuver patterns are defined purely based on trajectory geometry and kinematics. The F-14 is a twin-seat, twin-engine, variable-sweep wing aircraft whose aerodynamic characteristics and structural design have been extensively documented over decades of use. The detailed physical and performance parameters of the aircraft are summarized in Table 1. Its propulsion system originally featured the TF30-P-414A turbofan engines (Pratt & Whitney, East Hartford, CT, USA) and was later upgraded to F110-GE-400 engines (General Electric, Cincinnati, OH, USA), each capable of delivering up to 123 kN of maximum afterburning thrust. This combination of well-established aerodynamic data, stable performance characteristics, and strong maneuverability makes the F-14 a representative and reliable platform for generating consistent and realistic maneuver trajectory data in simulation.

The simulation environment is configured around Runway 11L of Beijing Daxing International Airport, a modern large-scale civil airport with a 3796 m long and 60 m wide runway. Selecting a major international airport provides a standardized and obstacle-free environment, reducing confounding factors from terrain or runway limitations and ensuring that takeoff, landing, and low-altitude procedures remain consistent across repeated runs. Weather conditions are intentionally set to clear skies with 64 km visibility, 58 °F temperature, an altimeter setting of 1013 hPa, and a wave height of 3 ft from direction 270°. These parameters establish a controlled baseline with minimal atmospheric disturbance, allowing maneuver differences to arise primarily from pilot inputs rather than stochastic environmental effects. This controlled setting enhances the reproducibility of trajectories and isolates the effect of maneuver type on trajectory variation. Under this standardized environment, we conducted 14 simulation runs covering the five target maneuver categories, each recorded at high temporal resolution to ensure the fidelity of the resulting 3D trajectory dataset. The detailed process of segmenting these continuous trajectories into discrete maneuver samples, along with the final distribution of samples across the five classes, is described in Section 3.3.

3.2. Coordinate Transformation

The raw trajectory data are recorded in WGS84 geographic coordinates (latitude, longitude, and altitude), which are unsuitable for direct Euclidean distance computation. To obtain a unified metric 3D representation, we convert all points into the geocentric Cartesian system CGCS2000, whose origin is at the Earth’s center of mass and whose axes align with the prime meridian (X), the 90° E meridian (Y), and the rotational axis (Z).

For each trajectory point with geodetic latitude $\phi$, longitude $\lambda$, and altitude $h$, we compute the radius of curvature in the prime vertical as $N = a / \sqrt{1 - e^{2} \sin^{2}\phi}$, where $a = 6{,}378{,}137.0$ m is the WGS84 equatorial radius and $e^{2} = 2f - f^{2}$ is the first eccentricity squared derived from the flattening $f = 1/298.257222101$. The Cartesian coordinates are then obtained by

$$X = (N + h)\cos\phi\cos\lambda, \quad Y = (N + h)\cos\phi\sin\lambda, \quad Z = \big((1 - e^{2})N + h\big)\sin\phi. \tag{1}$$

This conversion produces a consistent Euclidean 3D trajectory representation suitable for subsequent spatial analysis.
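Equation (1) can be transcribed almost directly into code. The following is a minimal sketch using the constants quoted in the text; the function name and the assumption that latitude and longitude arrive in degrees are ours, and the sample coordinates are illustrative rather than taken from the recorded flights.

```python
import math

A = 6378137.0            # equatorial radius a in metres, as given in the text
F = 1.0 / 298.257222101  # flattening f, as given in the text
E2 = 2.0 * F - F * F     # first eccentricity squared, e^2 = 2f - f^2

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert geodetic latitude/longitude (degrees) and height (m) to X, Y, Z (m)."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    # Radius of curvature in the prime vertical
    n = A / math.sqrt(1.0 - E2 * math.sin(phi) ** 2)
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = ((1.0 - E2) * n + h) * math.sin(phi)
    return x, y, z

# Illustrative point only (roughly the Beijing area, 1000 m above the ellipsoid)
x, y, z = geodetic_to_ecef(39.5, 116.4, 1000.0)
```

Two quick sanity checks follow from the formula itself: a point on the equator at the prime meridian maps to $(a, 0, 0)$, and a point at the pole has $Z = a(1 - f)$, the polar radius.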

3.3. Maneuver Pattern Annotation and Data Augmentation

Since no standardized scheme exists for maneuver pattern segmentation, we develop a semi-automated annotation tool in Python (version 3.10.8, Python Software Foundation, Wilmington, DE, USA) using Matplotlib (version 3.10.3, NumFOCUS, Austin, TX, USA) for trajectory visualization and manual labeling. Rather than relying on subjective judgment, the annotation process follows explicit kinematic and geometric rules derived from standard maneuver definitions. Specifically, climb and descent maneuvers are identified based on sustained monotonic changes in the vertical coordinate, with their horizontal projections remaining approximately linear over the corresponding time interval. Left and right turns are determined by the curvature direction of the trajectory in the horizontal plane, where clockwise and counterclockwise bending patterns correspond to right and left turns, respectively. Loiter maneuvers are identified by closed-loop or near-circular patterns in the horizontal trajectory projection, typically spanning multiple revolutions or a continuous looping segment.

Annotators use the visualization tool to inspect trajectories and record the start and end timestamps of maneuver segments according to these predefined rules. The role of the annotators is to apply the criteria consistently and ensure the temporal coherence of maneuver boundaries, rather than to perform subjective or experience-driven interpretation. This rule-based annotation strategy improves reproducibility and enables the systematic construction of a maneuver-centric dataset.
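The geometric rules above can be approximated programmatically, for example to pre-suggest a label that an annotator then confirms. The sketch below is not the annotation tool described in this section: the function names, the three thresholds, and the assumption that points are given in a local east-north-up frame (so that counterclockwise rotation seen from above is a left turn) are all hypothetical.

```python
import math

def total_turn_deg(xy):
    """Signed accumulated heading change in degrees (counterclockwise positive)."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(xy, xy[1:], xy[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        cross = ax * by - ay * bx   # sign gives the bending direction
        dot = ax * bx + ay * by
        total += math.degrees(math.atan2(cross, dot))
    return total

def suggest_label(points, turn_thresh_deg=30.0, loop_thresh_deg=330.0, climb_thresh_m=100.0):
    """points: list of (x, y, z) in metres; all thresholds are illustrative."""
    turn = total_turn_deg([(x, y) for x, y, _ in points])
    zs = [z for _, _, z in points]
    dz = zs[-1] - zs[0]
    rising = all(b >= a for a, b in zip(zs, zs[1:]))
    falling = all(b <= a for a, b in zip(zs, zs[1:]))
    if abs(turn) >= loop_thresh_deg:
        return "loiter"                      # closed or near-closed loop
    if abs(turn) >= turn_thresh_deg:
        return "left turn" if turn > 0 else "right turn"
    if rising and dz >= climb_thresh_m:
        return "climb"                       # monotonic rise, near-straight ground track
    if falling and dz <= -climb_thresh_m:
        return "descent"
    return "unlabeled"

def arc(deg_end, sign=1.0, radius=1000.0, step=10):
    """Circular arc at constant altitude; sign=-1 mirrors it into a clockwise arc."""
    return [(radius * math.cos(math.radians(d)), sign * radius * math.sin(math.radians(d)), 500.0)
            for d in range(0, deg_end + 1, step)]

climb = [(100.0 * i, 0.0, 50.0 * i) for i in range(10)]
labels = [suggest_label(climb), suggest_label(arc(90)), suggest_label(arc(90, -1.0)), suggest_label(arc(360))]
```

In practice such a heuristic would only assist the rule-based manual procedure, since real segments contain noise and mixed maneuvers that fixed thresholds handle poorly.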

To address class imbalance and enhance robustness, we apply two augmentation strategies:

1. Spatial Translation: Add small random offsets to each Cartesian coordinate, using horizontal displacement ranges of 10 m and 50 m to simulate realistic GPS drift while preserving the maneuver’s global geometric structure.
2. Random Temporal Sampling: Subsample points along the trajectory by adjusting the sampling density to 0.7 or 0.8, mimicking the variability in the onboard measurement frequency without altering the temporal ordering.

Combining these methods yields additional synthetic samples that maintain maneuver characteristics yet diversify spatial and temporal contexts. From the 14 raw trajectories, annotation yields the following segment counts: climb (133), descent (148), left turn (118), right turn (107), and loiter (126), totaling 632 original samples. After augmentation, the Maneuver Pattern Library comprises 1264 samples. Although the augmented samples originate from a limited set of raw trajectories, the spatial perturbations and temporal resampling simulate realistic measurement noise and natural intra-class variability, such as slightly steeper or smoother altitude transitions or modest differences in turn curvature, which strengthens the model’s robustness and generalization. However, we acknowledge that augmentation cannot create fundamentally new maneuver types or unseen combinations beyond those embodied in the original flights; thus, the dataset’s intrinsic diversity remains constrained by the scope of the initial raw trajectories.
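The two augmentation strategies can be sketched as follows. The text does not state whether the spatial offset is drawn once per segment or once per point; the sketch assumes a single horizontal offset per segment, which keeps the maneuver shape exactly intact. The function names, the uniform distribution, and the synthetic segment are likewise assumptions.

```python
import random

def translate(points, max_offset_m, rng):
    """Shift a whole segment by one random horizontal offset (assumed interpretation)."""
    dx = rng.uniform(-max_offset_m, max_offset_m)
    dy = rng.uniform(-max_offset_m, max_offset_m)
    return [(x + dx, y + dy, z) for x, y, z in points]

def subsample(points, keep_ratio, rng):
    """Keep a random fraction of the points while preserving their temporal order."""
    k = max(2, round(keep_ratio * len(points)))
    keep = sorted(rng.sample(range(len(points)), k))
    return [points[i] for i in keep]

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
segment = [(float(i), 2.0 * i, 100.0 + i) for i in range(100)]  # synthetic (x, y, z) points

shifted = translate(segment, 10.0, rng)   # 10 m displacement range
thinned = subsample(segment, 0.7, rng)    # sampling density 0.7
```

Because `subsample` sorts the retained indices, the time ordering that the later color encoding depends on is unchanged.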

3.4. Trajectory Denoising via Kalman Filter

Although X-Plane 12 provides high-fidelity dynamics, simulation noise remains. We adopt a Kalman filter for recursive predict-correct smoothing [64]. The state transition (prediction) step is denoted as

$$\bar{x}_{k} = A x_{k-1} + B u_{k} + w_{k}, \tag{2}$$

$$\bar{P}_{k} = A P_{k-1} A^{T} + Q, \tag{3}$$

where $\bar{x}_{k}$ is the predicted (a priori) system state at time step $k$, $A$ is the state transition matrix, $B$ is the control input matrix, $u_{k}$ is the control input, $w_{k}$ is the process noise, $\bar{P}_{k}$ is the predicted state covariance matrix, and $Q$ is the process noise covariance matrix.

The update step is formulated as

$$K_{k} = \bar{P}_{k} H^{T} \left(H \bar{P}_{k} H^{T} + R\right)^{-1}, \tag{4}$$

$$x_{k} = \bar{x}_{k} + K_{k}\left(z_{k} - H \bar{x}_{k}\right), \tag{5}$$

$$P_{k} = \left(I - K_{k} H\right) \bar{P}_{k}, \tag{6}$$

where $K_{k}$ is the Kalman gain, $x_{k}$ and $P_{k}$ are the updated state estimate and covariance, $z_{k}$ is the measurement at step $k$, $H$ is the observation matrix, $R$ is the measurement noise covariance, and $I$ is the identity matrix.

We determine the process noise covariance $Q$ and measurement noise covariance $R$ through a grid search over the range $[0.01, 0.05]$. For each parameter pair, three quantitative metrics are evaluated: (i) the mean-squared error (MSE) between the filtered and raw trajectories, reflecting fidelity; (ii) the trajectory length, representing the trade-off between smoothness and over-smoothing; and (iii) the acceleration variance (second-order difference variance), indicating motion stability. Sensitivity analysis shows that the MSE varies only slightly within $Q \pm 0.002$, while the acceleration variance is more sensitive to changes in $R$. The parameter combination $Q = 0.02$ and $R = 0.03$ lies in a Pareto-optimal region that balances smoothness and fidelity, and is therefore adopted as the final configuration.
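The filtering recursion and the three selection metrics can be sketched together. Since the state model is not specified here, the sketch assumes the simplest case: a scalar random-walk model ($A = H = 1$, no control input) applied to a single coordinate, so that $Q$ and $R$ are scalars. The function names, the grid spacing of 0.01, and the synthetic noisy ramp are illustrative assumptions, not the configuration used to build the library.

```python
import itertools
import random

def kalman_1d(z, q, r, p0=1.0):
    """Scalar Kalman filter with A = H = 1 and B = 0 (assumed random-walk model)."""
    x, p, out = z[0], p0, []
    for zk in z:
        p_pred = p + q                 # predict: Eq. (3); Eq. (2) leaves x unchanged
        k = p_pred / (p_pred + r)      # Kalman gain: Eq. (4)
        x = x + k * (zk - x)           # state update: Eq. (5)
        p = (1.0 - k) * p_pred         # covariance update: Eq. (6)
        out.append(x)
    return out

def mse(a, b):
    """Mean-squared error between two equally long sequences (fidelity)."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

def path_length(s):
    """Total variation of the sequence, a 1D stand-in for trajectory length."""
    return sum(abs(v - u) for u, v in zip(s, s[1:]))

def accel_variance(s):
    """Variance of the second-order differences (motion stability)."""
    acc = [s[i + 1] - 2.0 * s[i] + s[i - 1] for i in range(1, len(s) - 1)]
    mean = sum(acc) / len(acc)
    return sum((a - mean) ** 2 for a in acc) / len(acc)

rng = random.Random(0)
raw = [0.1 * i + rng.gauss(0.0, 0.5) for i in range(200)]  # synthetic noisy ramp

grid = [0.01, 0.02, 0.03, 0.04, 0.05]
results = {}
for q, r in itertools.product(grid, grid):
    filtered = kalman_1d(raw, q, r)
    results[(q, r)] = (mse(filtered, raw), path_length(filtered), accel_variance(filtered))
```

Each entry of `results` holds the (MSE, length, acceleration variance) triple for one $(Q, R)$ pair; a Pareto-style selection would then discard pairs that are worse than another pair on every metric.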

3.5. Dual-View Projection and Cubic Spline Interpolation

To reduce dimensionality while retaining maneuver-specific information, each 3D segment is projected onto two orthogonal planes:

Top view (X-Y) for horizontal turns and loiter loops;
Side view (X-Z) for climbs, descents, and pitch variations.

Projected points are often sparse, so we employ cubic spline interpolation to reconstruct continuous curves. For ordered points $x_{0}, x_{1}, \dots, x_{n}$, we define piecewise polynomials [65,66]:

$$S_{i}(x) = a_{i} + b_{i}(x - x_{i}) + c_{i}(x - x_{i})^{2} + d_{i}(x - x_{i})^{3}, \tag{7}$$

where $S_{i}(x)$ is the cubic spline function in the $i$-th interval, $x_{i}$ is the knot point, and $a_{i}$, $b_{i}$, $c_{i}$, and $d_{i}$ are the spline coefficients to be solved.

These splines satisfy continuity constraints on the function values and the first and second derivatives at interval boundaries. Boundary conditions are defined with zero second derivatives at end points to ensure natural spline behavior. Solving the resulting linear system yields smooth, non-oscillatory trajectories suitable for image rendering.
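The linear system behind Equation (7) is tridiagonal and can be solved in a single forward and backward sweep. The sketch below implements the standard natural-spline recursion in plain Python; the function names and test values are ours. It treats the dependent coordinate as a function of a strictly increasing variable, so for looping trajectories one would presumably fit each projected coordinate against a monotone parameter such as time, which is an assumption rather than something stated in the text.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return per-interval coefficients (a, b, c, d) of Eq. (7) with S'' = 0 at both ends."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the c coefficients
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 encodes the natural boundary conditions
    c, b, d = [0.0] * (n + 1), [0.0] * n, [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return list(ys[:n]), b, c[:n], d

def evaluate(xs, coeffs, x):
    """Evaluate the spline at x, clamping to the first or last interval."""
    a, b, c, d = coeffs
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(a) - 1)
    dx = x - xs[i]
    return a[i] + b[i] * dx + c[i] * dx ** 2 + d[i] * dx ** 3

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 0.0, 1.0, 0.0]
coeffs = natural_cubic_spline(xs, ys)
dense = [evaluate(xs, coeffs, 0.1 * k) for k in range(41)]  # resampled curve
```

The resampled values in `dense` pass through every original knot, which is the property that lets sparse projected points be rendered as a continuous line.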

3.6. Temporal Color Encoding

After the trajectory data undergoes projection processing, the resulting trajectory lines contain only spatial position information, as the temporal characteristics from the original data are no longer preserved. This makes it difficult for recognition models to extract temporal information. The temporal information of trajectories captures key features such as velocity and acceleration. To address this issue, this study assigns colors along the timeline to the trajectory lines, utilizing gradual variations in color intensity to visually represent temporal evolution. The core concept of color mapping involves assigning colors to trajectory points based on temporal information, creating a visual gradient from dark to light along the trajectory from start to end. We begin by normalizing the timestamp corresponding to each trajectory point:

$$t_{i}' = \frac{t_{i} - t_{\min}}{t_{\max} - t_{\min}}, \tag{8}$$

where $t_{i}$ is the timestamp of the $i$-th trajectory point, and $t_{\min}$ and $t_{\max}$ are the start and end times of the trajectory, respectively.

Next, the normalized time values are mapped to the hue channel of the HSV color model, causing the trajectory color to evolve over time. In the HSV model, the hue (H, measured in degrees from 0° to 360°) governs the trajectory color, while saturation (S, ranging from zero to one) and value (V, ranging from zero to one) remain fixed to maintain smooth color transitions. The mapping is defined as follows:

H_{i} = H_{\min} + (H_{\max} - H_{\min}) \cdot t_{i}',

(9)

where H_{\min} and H_{\max} define the color gradient range. To prevent abrupt color shifts, linear interpolation is applied to smooth the color transitions, ensuring a visually coherent color evolution. This approach effectively embeds temporal information directly into the 2D trajectory visualization, allowing for the intuitive perception of time-based progression from the start to the end point.
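Equations (8) and (9) translate directly into code. The sketch below uses Python's standard `colorsys` module; the hue range (240° to 0°, blue to red) and the fixed saturation and value of 1.0 are illustrative assumptions, since the text does not state the values used, and `time_to_rgb` is a hypothetical helper name.

```python
import colorsys


def time_to_rgb(timestamps, h_min=240.0, h_max=0.0, s=1.0, v=1.0):
    """Map timestamps to 8-bit RGB colors via a linear hue ramp in HSV.

    h_min and h_max are hue angles in degrees (assumed values, not from the text).
    """
    t_min, t_max = min(timestamps), max(timestamps)
    span = t_max - t_min
    colors = []
    for t in timestamps:
        # Eq. (8): normalize the timestamp to [0, 1].
        t_norm = (t - t_min) / span if span > 0 else 0.0
        # Eq. (9): linear interpolation of the hue.
        hue_deg = h_min + (h_max - h_min) * t_norm
        # colorsys expects hue in [0, 1] rather than degrees.
        r, g, b = colorsys.hsv_to_rgb((hue_deg % 360.0) / 360.0, s, v)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

Each returned color would then be used to draw the trajectory segment that starts at the corresponding point, so that the hue advances monotonically from the first to the last sample.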

The final library comprises 1264 paired top-down and side-view images, each exhibiting smooth trajectories and clear temporal gradients, as seen in Figure 2. This standardized dataset underpins the Siamese convolutional network recognition framework described in Section 4.

4. Method

This section introduces the DualView-LiteNet framework for maneuver pattern classification, building upon our constructed Maneuver Pattern Library that provides a structured dataset of annotated flight trajectories. As depicted in Figure 5, the framework strategically leverages synchronized top-view and side-view representations from the Maneuver Pattern Library to capture consistent spatiotemporal relationships across diverse maneuvers.

The architecture comprises three core stages: (a) maneuver pattern data preprocessing, (b) multi-view feature extraction and fusion, and (c) maneuver classification. A key design innovation lies in the adoption of a shared-weight Siamese convolutional structure, which enables the learning of unified feature representations from dual-view inputs while maintaining parameter efficiency. By simultaneously processing horizontal (top-view) and vertical (side-view) motion dynamics, the network demonstrates enhanced resilience to perspective variations and significantly improves inter-class discrimination. These deliberate architectural choices allow DualView-LiteNet to achieve an optimal balance between recognition accuracy and computational demands, addressing the critical need for deployable real-time systems in resource-constrained operational environments.

4.1. Task Description

The task of maneuver pattern classification based on dual-view trajectory imagery aims to identify the type of aerial maneuver performed by a target through its spatiotemporal motion signatures observed from two complementary perspectives. Each sample consists of a synchronized pair of time series trajectory images: the top view, which represents the planar motion across horizontal coordinates, and the side view, which depicts altitude changes over time. Together, these two projections characterize the aircraft’s motion in both lateral and vertical dimensions, offering a more complete depiction of flight behavior.

Formally, the input to the model is defined as

I_{top}, I_{side} \in R^{H \times W \times 3},

(10)

where I_{top} and I_{side} denote the top-view and side-view trajectory images, respectively, each with spatial resolution (H, W) and three RGB channels encoding both spatial contours and temporal gradients.

The corresponding label for each trajectory pair is defined as

y \in \{1, 2, \dots, C\},

(11)

where C = 5 is the number of maneuver categories: climb, descent, left turn, right turn, and loiter.

The goal is to learn a mapping function,

f : R^{2 \times H \times W \times 3} \to \{1, 2, \dots, C\},

(12)

that accurately maps the input image pair (I_{top}, I_{side}) to its corresponding maneuver pattern label y.

4.2. Architecture of DualView-LiteNet

DualView-LiteNet is a lightweight Siamese CNN designed for maneuver pattern classification from synchronized top-view and side-view trajectory images. As shown in Figure 5, the architecture comprises three key stages: (a) data preprocessing, (b) dual-branch feature extraction with shared weights and feature fusion, and (c) maneuver pattern classification.

4.2.1. Data Preprocessing

As illustrated in Figure 5a, all raw trajectory images undergo a uniform preprocessing pipeline to ensure consistent inputs and stable model training. First, synchronized top-view and side-view frames corresponding to the same maneuver instance are paired and temporally aligned. Each image is then resized to 224 \times 224 pixels while maintaining the original aspect ratio to fit the network’s expected input size. To mitigate illumination and sensor differences, pixel values are normalized by subtracting the channel-wise mean and dividing by the channel-wise standard deviation. Finally, the processed images are converted into tensors of shape [3, 224, 224], following the channel-first input convention of PyTorch (version 2.7.1, Linux Foundation, San Francisco, CA, USA).

4.2.2. Dual-Branch Feature Extraction

As shown in Figure 6, corresponding to the Backbone 1 and Backbone 2 components in Figure 5b, the feature extraction module is designed to capture both horizontal motion patterns from the top view and altitude dynamics from the side view. A Siamese convolutional architecture with two identical branches is employed, following a design widely used in metric learning to promote consistent feature representation. At the initialization stage, both branches share identical convolutional weights. During training, the gradients are synchronized across branches, ensuring that parameter updates remain consistent. This shared weight mechanism enables the model to retain view-specific characteristics while benefiting from the mutual reinforcement of the dual branches, effectively improving generalization.

Each branch consists of two consecutive convolutional stages. In each stage, a 3 \times 3 convolutional layer captures local spatial and texture information:

F_{i j}^{l} = \sum_{m = 0}^{M - 1} \sum_{n = 0}^{N - 1} W_{m n}^{l} X_{(i + m) (j + n)}^{l - 1} + b^{l},

(13)

where W_{m n}^{l} and b^{l} denote the weights and bias of layer l, M \times N is the kernel size (here 3 \times 3), and X^{l - 1} is the input feature map. A ReLU activation function \sigma(x) = \max(0, x) follows to introduce nonlinearity and mitigate vanishing gradients, and a 2 \times 2 max pooling operation is then applied to reduce spatial resolution and enhance translational robustness:

P_{i j}^{l} = \max_{m, n \in \{0, 1\}} F_{(2 i + m) (2 j + n)}^{l}.

(14)

By stacking two such convolution–activation–pooling blocks, each branch can progressively extract higher-level representations of the trajectory structure without incurring unnecessary model depth. This configuration achieves a balance between representational power and parameter efficiency, and subsequent experiments confirm its effectiveness in the maneuver pattern classification task.
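One convolution, activation, and pooling stage, as defined in Equations (13) and (14), can be written out for a single-channel feature map as follows. This is a didactic sketch in plain Python, without padding and without the multi-channel summation a real layer would include; the function names are hypothetical.

```python
def conv2d_valid(x, w, b=0.0):
    """Eq. (13): sliding-window weighted sum (no padding, stride 1)."""
    M, N = len(w), len(w[0])
    out_h, out_w = len(x) - M + 1, len(x[0]) - N + 1
    return [[sum(w[m][n] * x[i + m][j + n]
                 for m in range(M) for n in range(N)) + b
             for j in range(out_w)]
            for i in range(out_h)]


def relu(f):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in f]


def max_pool_2x2(f):
    """Eq. (14): maximum over non-overlapping 2 x 2 windows."""
    return [[max(f[2 * i + m][2 * j + n] for m in (0, 1) for n in (0, 1))
             for j in range(len(f[0]) // 2)]
            for i in range(len(f) // 2)]


# A 6 x 6 "image" containing a diagonal line, and a 3 x 3 diagonal detector.
image = [[1.0 if r == c else 0.0 for c in range(6)] for r in range(6)]
kernel = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]

stage_out = max_pool_2x2(relu(conv2d_valid(image, kernel, b=-1.0)))
```

The detector responds only where the window is aligned with the line, and pooling halves the 4 x 4 response map to 2 x 2 while keeping the strongest response in each window.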

4.2.3. Feature Fusion

The feature fusion module integrates the outputs of the Siamese network’s top-view and side-view branches by flattening and concatenating their respective feature maps, as depicted in Figure 7 (corresponding to the feature fusion block in Figure 5b). Let F_{top}, F_{side} \in R^{H \times W \times C} denote the high-dimensional feature maps produced by the two branches, where H, W, and C are here the height, width, and number of channels of the feature maps, respectively.

Each feature map is first transformed into a one-dimensional vector [67]:

F_{top}^{flat} = Flatten(F_{top}),

(15)

F_{side}^{flat} = Flatten(F_{side}),

(16)

where F_{top}^{flat} \in R^{H \cdot W \cdot C} and F_{side}^{flat} \in R^{H \cdot W \cdot C} are the one-dimensional vectors obtained by flattening the original three-dimensional tensors in R^{H \times W \times C}.

These flattened vectors are then concatenated end to end to form a unified embedding:

F_{fusion} = Concat(F_{top}^{flat}, F_{side}^{flat}) \in R^{2 \cdot H \cdot W \cdot C},

(17)

where F_{fusion} denotes the fused feature vector of dimension 2 \cdot H \cdot W \cdot C.

The resulting fused vector F_{fusion} combines complementary spatial features from the top-view branch with altitude dynamics from the side-view branch. This simple yet effective concatenation preserves multi-dimensional information relevant to both horizontal and vertical maneuver patterns, thereby enhancing classification accuracy and robustness.
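The flatten-and-concatenate step, and the length of the fused vector it produces, can be sketched as follows. The stride-1 convolutions, padding of 1, and channel width of 32 are assumptions chosen for illustration only; the text does not state them, so the resulting dimensions should not be read as those of the published model.

```python
def flatten(feature_map):
    """Flatten an H x W x C nested list into a vector of length H*W*C."""
    return [v for row in feature_map for pixel in row for v in pixel]


def fuse(f_top, f_side):
    """Eq. (17): concatenate the two flattened feature maps end to end."""
    return flatten(f_top) + flatten(f_side)


def branch_output_size(size, num_stages=2, kernel=3, padding=1, pool=2):
    """Spatial side length after stacked conv (stride 1) + max-pool stages."""
    for _ in range(num_stages):
        size = size + 2 * padding - kernel + 1   # convolution
        size = size // pool                      # non-overlapping pooling
    return size


def fused_dim(input_size=224, channels=32, **stage_kwargs):
    """Length 2*H*W*C of the fused vector for square inputs."""
    side = branch_output_size(input_size, **stage_kwargs)
    return 2 * side * side * channels
```

Under these assumed settings a 224 x 224 input shrinks to 56 x 56 after two stages, giving a fused vector of 2 · 56 · 56 · 32 = 200,704 elements; without padding the side length would be 54 instead.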

4.2.4. Maneuver Pattern Classification

The classification head converts the fused feature vector into maneuver pattern predictions through a compact fully connected network followed by a Softmax layer. As shown in Figure 5c, two linear transformations are sequentially applied, each followed by a ReLU activation to introduce nonlinearity and enhance feature discriminability.

Given the fused feature X \in R^{D} (that is, F_{fusion} with D = 2 \cdot H \cdot W \cdot C), the first transformation maps it into a 512-dimensional latent space [68]:

H_{1} = ReLU (W_{1} X + b_{1}),

(18)

where W_{1} \in R^{512 \times D} and b_{1} \in R^{512} represent the corresponding weights and biases. The intermediate representation is then further compressed to 128 dimensions through another linear transformation with ReLU activation [68]:

H_{2} = ReLU (W_{2} H_{1} + b_{2}),

(19)

with W_{2} \in R^{128 \times 512} and b_{2} \in R^{128}. Finally, the 128-dimensional vector is linearly projected to C = 5 logits and normalized via the Softmax function [69]:

\hat{y} = Softmax (W_{3} H_{2} + b_{3}),

(20)

where W_{3} \in R^{C \times 128} and b_{3} \in R^{C}. The output \hat{y} \in R^{C} provides the probability distribution across the five maneuver categories, and the class with the highest probability is selected as the prediction result.

This two-layer fully connected design efficiently compresses the high-dimensional fused features into a compact representation suitable for classification. By maintaining a simple yet expressive architecture, it achieves a favorable balance between computational efficiency and classification accuracy, making it well-suited for real-time inference scenarios.
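Equations (18) to (20) amount to a three-layer forward pass, sketched here in plain Python. The weights are random placeholders rather than trained parameters, the input dimension of 16 is a toy value, and `init_layer` and `classify` are hypothetical names; note also that the returned class index is 0-based, whereas the labels in the text run from 1 to C.

```python
import math
import random


def linear(x, w, b):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax (subtract the maximum before exp)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_layer(rng, out_dim, in_dim):
    """Random placeholder weights and zero biases (not trained values)."""
    bound = 1.0 / math.sqrt(in_dim)
    w = [[rng.uniform(-bound, bound) for _ in range(in_dim)]
         for _ in range(out_dim)]
    return w, [0.0] * out_dim


def classify(x, layers):
    """Eqs. (18)-(20): D -> 512 -> 128 -> C with ReLU, then Softmax."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(linear(x, w1, b1))
    h2 = relu(linear(h1, w2, b2))
    probs = softmax(linear(h2, w3, b3))
    pred = max(range(len(probs)), key=probs.__getitem__)
    return probs, pred


rng = random.Random(0)
D = 16  # toy input dimension; the real D is the fused feature length
layers = [init_layer(rng, 512, D), init_layer(rng, 128, 512),
          init_layer(rng, 5, 128)]
probs, pred = classify([rng.random() for _ in range(D)], layers)
```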

4.3. Evaluation Metrics and Analysis

To comprehensively assess the performance of DualView-LiteNet on the multi-class maneuver pattern classification task, four widely used evaluation metrics are adopted: overall accuracy, weighted precision, weighted recall, and the weighted F1-score. Together, these indicators capture both global correctness and class-specific behavior, which is particularly important when the data distribution is imbalanced.

The overall accuracy is the fraction of correctly classified samples:

Accuracy = \frac{\sum_{i = 1}^{C} {TP}_{i}}{\sum_{i = 1}^{C} ({TP}_{i} + {FN}_{i})},

(21)

where C is the number of classes and, for class i, {TP}_{i}, {FP}_{i}, and {FN}_{i} denote the true positives, false positives, and false negatives used here and in the definitions below. Because every sample belongs to exactly one class, the denominator equals the total number of samples.

The weighted precision is the average of per-class precision values, weighted by the number of true instances in each class:

{Precision}_{w} = \sum_{i = 1}^{C} \frac{N_{i}}{N} \times \frac{{TP}_{i}}{{TP}_{i} + {FP}_{i}},

(22)

where N_{i} is the number of ground truth samples in class i, and N = \sum_{i = 1}^{C} N_{i}.

The weighted recall is similarly defined as

{Recall}_{w} = \sum_{i = 1}^{C} \frac{N_{i}}{N} \times \frac{{TP}_{i}}{{TP}_{i} + {FN}_{i}},

(23)

where N_{i}, N, and {FN}_{i} are as defined above.

The weighted F1-score is defined as the harmonic mean of the weighted precision and weighted recall, and therefore serves as a summary indicator that jointly reflects the balance between these two complementary aspects of model performance:

{F1}_{w} = 2 \times \frac{{Precision}_{w} \times {Recall}_{w}}{{Precision}_{w} + {Recall}_{w}},

(24)

where {Precision}_{w} and {Recall}_{w} are defined as above.

These four metrics offer a holistic evaluation of the model’s performance. The weighted F1-score should be interpreted as a consolidated measure derived from precision and recall rather than as an independent outcome. Together, the metrics assess overall prediction correctness and, through the class-weighted averages, per-class behavior, which indicates whether DualView-LiteNet maintains consistent classification performance across all maneuver categories even under class imbalance. This formulation keeps the performance comparisons in Section 5 interpretable and internally consistent.
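The four metrics can be computed from a confusion matrix as in the sketch below, which takes accuracy as the fraction of correctly classified samples and the weighted F1-score as the harmonic mean of weighted precision and weighted recall, as defined in this section. The function name and the example matrix are illustrative only.

```python
def weighted_metrics(cm):
    """Compute accuracy and support-weighted precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    num_classes = len(cm)
    n_total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(num_classes)]
    fn = [sum(cm[i]) - tp[i] for i in range(num_classes)]
    fp = [sum(cm[r][i] for r in range(num_classes)) - tp[i]
          for i in range(num_classes)]
    support = [tp[i] + fn[i] for i in range(num_classes)]   # N_i

    accuracy = sum(tp) / n_total
    precision = sum(
        support[i] / n_total * (tp[i] / (tp[i] + fp[i]) if tp[i] + fp[i] else 0.0)
        for i in range(num_classes))
    recall = sum(
        support[i] / n_total * (tp[i] / support[i] if support[i] else 0.0)
        for i in range(num_classes))
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up 3-class confusion matrix, for illustration only.
example_cm = [[8, 2, 0],
              [1, 9, 0],
              [0, 0, 10]]
acc, prec, rec, f1 = weighted_metrics(example_cm)
```

With single-label predictions, the support-weighted recall reduces algebraically to the overall accuracy, since the class weights N_i / N cancel the per-class denominators; this gives a quick consistency check on any implementation of these formulas.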

5. Experiments

This section presents a comprehensive evaluation of the proposed method and several comparative models on the maneuver pattern classification task using the established Maneuver Pattern Library dataset.

5.1. Experiment Settings

5.1.1. Dataset

The Maneuver Pattern Library dataset contains a total of 1264 labeled samples, each representing a synchronized pair of top-view and side-view trajectory images generated from simulated flight data. To ensure a fair evaluation, the dataset is randomly divided into training, validation, and testing subsets in a 5:2:3 ratio, with the class distribution maintained across all splits to avoid imbalance. Each sample is annotated with one of five maneuver categories: climb, descent, left turn, right turn, or loiter, which collectively cover the fundamental flight behavior patterns considered in this study.
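A class-preserving 5:2:3 split of this kind can be sketched as follows. The per-class sample counts of the library are not given here, so the example uses made-up labels; the function name, the rounding rule, and the seed are likewise assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(5, 2, 3), seed=0):
    """Split sample indices into train/val/test while preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = sum(ratios)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0] / total)
        n_val = round(len(idxs) * ratios[1] / total)
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


# Illustrative labels: 20 samples for each of the five maneuver categories.
classes = ["climb", "descent", "left turn", "right turn", "loiter"]
labels = [c for c in classes for _ in range(20)]
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled dataset, is what keeps the class distribution the same across the three subsets.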

In addition to the simulated dataset, we incorporate a real-world flight dataset sourced from ADS-B Exchange, a globally recognized cooperative surveillance network that provides open access to real-time and historical Automatic Dependent Surveillance–Broadcast (ADS-B) messages. From this source, we manually curate and annotate five maneuver categories consistent with the simulated dataset, with 100 real trajectory samples per class. The ADS-B dataset is used exclusively for evaluation: models trained on the simulated Maneuver Pattern Library are directly applied to the real-world trajectories in an inference-only manner, without any fine-tuning or retraining. This cross-domain evaluation protocol is designed to assess the generalization capability of simulation-trained models when exposed to real-flight data with naturally occurring noise and operational variability.

5.1.2. DualView-LiteNet Implementation Details

The implementation of DualView-LiteNet follows the architecture introduced in Section 4.2. To ensure stable optimization, all models are trained using the AdamW optimizer with an initial learning rate of 2 \times 10^{-4}, a weight decay of 0.01, and a cosine annealing schedule for a total of 20 epochs. Batch normalization and dropout are applied to reduce overfitting, and early stopping is adopted based on validation performance. The shared-weight design substantially reduces parameter redundancy while maintaining representational alignment across both views, which helps the model to effectively integrate spatial and altitude-related cues. In terms of computational cost, DualView-LiteNet contains 34,417,349 parameters and requires 0.247 GFLOPs per forward pass. This lightweight configuration makes DualView-LiteNet suitable for real-time inference under limited computational resources.
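The learning-rate schedule implied by these settings can be sketched with the standard closed-form cosine annealing rule. The minimum learning rate of zero and the per-epoch (rather than per-iteration) update are assumptions, as the text does not specify them.

```python
import math


def cosine_lr(epoch, base_lr=2e-4, total_epochs=20, eta_min=0.0):
    """Cosine-annealed learning rate at the given epoch (0-based)."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * cos_term


# Learning rate at the start of each epoch, plus the final value.
schedule = [cosine_lr(epoch) for epoch in range(21)]
```

The rate starts at the initial value, passes through half of it at the midpoint of training, and decays smoothly to the assumed minimum at epoch 20.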

5.1.3. Baseline Settings

We compare our proposed method with several baseline models, including a traditional SVM [34] and three dual-view deep learning baselines: CNN, DualView-SelfAttn, and DualView-CrossAttn.

SVM [34] serves as a classical baseline representative of traditional machine learning approaches. Each input sample consists of two image views (top and side), from which handcrafted features are extracted independently. Specifically, we employ a combination of color histograms, Local Binary Pattern (LBP) texture descriptors, and Canny edge features to characterize appearance, texture, and contour information. The extracted features from both views are concatenated into a single representation with a dimensionality of approximately 1588, which is then fed into an SVM [34] classifier with a radial basis function (RBF) kernel. The penalty parameter (denoted C in SVM notation, not to be confused with the number of classes) is set to 1.4 based on validation performance, and the spread parameter of the RBF kernel is set to γ = 0.003. Other kernel parameters, such as degree and coef0, are not used in our configuration. The model is trained using a one-vs-rest strategy for multi-class classification. For completeness, if SIFT and HOG descriptors are used, SIFT produces a set of local descriptors of size (N, 128) for each image, and the mean vector is computed to obtain a 128-dimensional representation. HOG features are extracted as a one-dimensional vector and concatenated with the SIFT representation before being fed into the SVM [34] classifier.

As a deep learning baseline, we implement a CNN based on a conventional convolutional neural network [70] with two independent branches for processing the top-view and side-view images separately. Each branch extracts spatial features from its respective input, and the resulting feature maps are concatenated and passed through a fully connected classifier. The network architecture, including convolutional layers, activation functions, and hyperparameters, is kept consistent with our proposed model to ensure a fair comparison. Unlike the proposed Siamese-based design, CNN does not share weights between the two branches, which may limit its ability to enforce feature consistency across views.

To further investigate the effect of attention mechanisms in dual-view modeling, we introduce DualView-SelfAttn as an attention-enhanced baseline. In this model, each view is first processed by a shared CNN backbone, after which multi-head self-attention is applied independently within each view to capture long-range spatial dependencies. The attended features from the two views are then flattened, concatenated, and fed into the same classification head as in CNN. This baseline evaluates whether enhancing intra-view feature representation via self-attention alone is sufficient for improving maneuver recognition.

In addition, we construct DualView-CrossAttn to explicitly model inter-view interactions. Based on the same shared CNN backbone, this model introduces bidirectional cross-attention between the top-view and side-view feature sequences, allowing each view to selectively attend to informative regions of the other. The resulting cross-attended features from both views are concatenated and passed to the classifier. By directly modeling cross-view dependencies, DualView-CrossAttn serves as a strong attention-based baseline to assess the effectiveness of explicit inter-view information exchange compared with the proposed fusion strategy.

5.2. Experiment Results

The quantitative evaluation is conducted using four standard metrics: overall accuracy, weighted precision, weighted recall, and the weighted F1-score, providing a comprehensive assessment of classification performance across all maneuver categories. The detailed results are summarized in Table 2.

The SVM [34] baseline attains an overall accuracy of 70.87%, with precision, recall, and F1-score values of 0.74, 0.67, and 0.67, respectively. These results indicate that handcrafted features can partially describe maneuver patterns but struggle to capture complex spatiotemporal characteristics and multi-view correlations. DualView-CrossAttn introduces an explicit cross-attention mechanism to model inter-view feature interactions. While this design improves accuracy to 72.41% compared with the SVM [34] baseline, its precision (0.46) and F1-score (0.51) remain relatively low, suggesting that early-stage cross-view interaction without sufficiently discriminative view-specific representations may introduce noise and limit classification reliability. DualView-SelfAttn applies self-attention within each view branch to enhance intra-view feature modeling. This leads to further improvements, achieving 77.87% accuracy and balanced precision, recall, and F1-score values of 0.62, 0.60, and 0.62, respectively. The results indicate that strengthening view-specific feature representations is beneficial, although the lack of explicit cross-view alignment still constrains overall performance. In contrast, the CNN [70] achieves substantial gains across all metrics, reaching 93.84% accuracy, 0.94 precision, 0.92 recall, and a 0.93 F1-score, demonstrating the effectiveness of end-to-end feature learning for maneuver pattern recognition. However, the use of independent branches limits cross-view consistency, leaving room for further improvement.

The proposed DualView-LiteNet achieves the best overall performance, with 97.64% accuracy and uniformly high scores in precision (0.98), recall (0.98), and F1-score (0.98). This consistent improvement across all evaluation metrics highlights the model’s capability to jointly learn spatial and temporal-related features while maintaining strong cross-view coherence through shared-weight training. Overall, the results confirm that DualView-LiteNet significantly enhances both precision and recall while preserving high classification reliability across maneuver categories, demonstrating its superior robustness and generalization capability on the Maneuver Pattern Library dataset.

Although the proposed DualView-LiteNet achieves consistently high accuracy across maneuver categories, an examination of representative failure cases reveals meaningful insights into the model’s limitations. Misclassifications primarily occur in boundary maneuvers where geometric patterns partially overlap. For example, loiter segments—many of which are generated through data augmentation due to the scarcity of real samples—sometimes present incomplete circular shapes. In such cases, their horizontal-view signatures closely resemble those of left or right turns, both involving sustained heading changes and similar centripetal motion patterns, as illustrated in Figure 8. This overlap can lead the model to incorrectly classify an incomplete loiter maneuver as a turning action. These errors typically arise when the distinguishing features are weak or when the trajectory lies near the class decision boundary, causing ambiguity in the dual-view projections. Such observations suggest that the current feature representation, while effective, still lacks sensitivity to global motion continuity, such as the persistence and completeness of cyclic motion in loiter maneuvers. Future improvements may focus on incorporating duration- or cycle-aware descriptors as well as collecting additional hard samples near class boundaries to strengthen the model’s ability to discriminate between highly similar maneuver types.

5.3. Ablation Study

To investigate the functional contribution of each module within DualView-LiteNet, this section presents ablation experiments focusing on five key aspects: the convolutional kernel size, number of convolutional layers, feature fusion strategy, connection layer configuration, and activation function.

5.3.1. Effect of Convolutional Kernel Size on DualView-LiteNet Performance

The convolutional kernel size plays a critical role in determining the receptive field and feature extraction capability of the Siamese network. A larger receptive field allows the model to capture broader spatial context but may lead to the loss of fine-grained details, whereas a smaller kernel emphasizes local information.

In this experiment, kernel sizes of 3 \times 3, 5 \times 5, and 7 \times 7 were compared under identical settings: ReLU activation, batch size of four, two convolutional layers, feature fusion by concatenation, and a fully connected connection layer.

As shown in Table 3, the 3 \times 3 kernel performs far better than the larger kernels in this setting. This result indicates that the network is highly sensitive to receptive field size when processing dual-view inputs. In our trajectory library, maneuver patterns are dominated by line-shaped and fine-grained structures. Using a 5 \times 5 kernel causes the model to overfit, achieving low test accuracy despite converging on the training set. With a 7 \times 7 kernel, the receptive field becomes too large, leading to the poor extraction of local trajectory details; as a result, the loss fails to converge and overall accuracy remains near random chance. Consequently, the 3 \times 3 convolutional kernel offers the best balance among the tested sizes, preserving essential local features while maintaining stable and effective training.

5.3.2. Effect of the Number of Convolutional Layers on DualView-LiteNet Performance

The convolutional branches in the Siamese network are responsible for learning spatial and structural information from the input data. The number of convolutional layers directly influences the model’s representation capacity and training stability. Too many layers may cause overfitting or convergence difficulties, while too few layers may lead to insufficient feature extraction.

To identify the optimal configuration, we tested models with two, three, and four convolutional layers under fixed settings: ReLU activation, 3 \times 3 kernels, a batch size of four, concatenation fusion, and a fully connected connection layer.

As indicated in Table 4, the configuration with two convolutional layers performs best for this dataset. This result suggests that the maneuver patterns in the Maneuver Pattern Library contain distinctive yet relatively local geometric structures, which can be effectively captured by a shallow convolutional stack. In contrast, adding more layers introduces excessive nonlinearity and parameterization, which destabilizes training and leads to severe performance degradation. Although the four-layer configuration partially recovers accuracy, it still shows reduced generalization capability, indicating that deeper architectures may overfit or extract redundant transformations that are unnecessary for this domain. Overall, the two-layer design provides the best trade-off between representation capacity, optimization stability, and generalization, which aligns with the lightweight design philosophy of DualView-LiteNet.

5.3.3. Effect of Feature Fusion Strategies on DualView-LiteNet Performance

For dual-view maneuver pattern classification, an effective feature fusion strategy is essential to integrate complementary information from both perspectives. Three fusion strategies were compared: (1) concatenation, which connects feature vectors along the feature dimension; (2) addition, which performs element-wise summation with equal weighting (1:1); and (3) multiplication, which performs element-wise products to emphasize inter-view interactions. The settings were identical to previous experiments (ReLU, 3 \times 3 kernels, a batch size of four, two convolutional layers, and a fully connected layer).

According to Table 5, all three fusion strategies perform reasonably well. However, concatenation yields the highest accuracy and most balanced performance across all metrics. This advantage can be attributed to its ability to retain complete feature representations from both views, allowing the subsequent fully connected layers to autonomously learn optimal cross-view relationships. In contrast, addition imposes a rigid 1:1 weighting scheme that assumes equal contribution from the two views—an assumption that may not hold for trajectory data where top-view and side-view cues have different discriminative strengths. The element-wise summation also risks mutual cancellation when feature signs differ, leading to the loss of complementary information. Multiplication further amplifies this issue: while it highlights co-occurring feature activations, it suppresses mismatched or low-magnitude responses, causing overly sparse representations that may hinder downstream classification. These observations indicate that concatenation provides the most flexible and expressive fusion mechanism, enabling DualView-LiteNet to fully exploit cross-view complementarity without imposing restrictive structural assumptions.
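The three fusion rules compared above differ in a way that is easy to see on a toy example. The vectors below are made-up values, not features from the model, and `fuse_features` is a hypothetical helper.

```python
def fuse_features(top, side, mode="concat"):
    """Fuse two feature vectors by concatenation, addition, or multiplication."""
    if mode == "concat":
        return list(top) + list(side)
    if len(top) != len(side):
        raise ValueError("element-wise fusion needs vectors of equal length")
    if mode == "add":
        return [a + b for a, b in zip(top, side)]
    if mode == "mul":
        return [a * b for a, b in zip(top, side)]
    raise ValueError(f"unknown fusion mode: {mode}")


top_feat = [0.5, -0.5, 0.0]    # illustrative top-view features
side_feat = [0.25, 0.5, 2.0]   # illustrative side-view features

fused_concat = fuse_features(top_feat, side_feat, "concat")
fused_add = fuse_features(top_feat, side_feat, "add")
fused_mul = fuse_features(top_feat, side_feat, "mul")
```

In this example, addition cancels the second component (-0.5 + 0.5) and multiplication zeroes the third, although the side view responds strongly there; concatenation keeps all six values and leaves the weighting to the subsequent fully connected layers, at the cost of doubling the input dimension.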

5.3.4. Effect of Connection Schemes on DualView-LiteNet Performance

In classification tasks, the extracted features must be projected into the target label space. This projection can be implemented either via a fully connected layer (composed of multiple linear transformations and nonlinear activations) or a single linear transformation without hidden layers. This experiment aims to evaluate whether the use of a fully connected structure is necessary for mapping fused features to maneuver categories. The settings were identical to prior experiments (ReLU, 3 \times 3 kernels, a batch size of four, two convolutional layers, and concatenation fusion).

As shown in Table 6, the linear mapping scheme performs significantly worse than the fully connected configuration. A single linear transformation imposes a strict linear separability assumption on the fused feature space, which is insufficient for modeling the complex, nonlinear interactions between dual-view representations—particularly for maneuver patterns characterized by subtle geometric variations. In contrast, introducing a fully connected structure enriches the model’s expressive power by stacking multiple linear transformations with nonlinear activations, enabling the network to learn hierarchical decision boundaries rather than relying on a single global projection. Furthermore, the fully connected layers help to re-balance the contributions from the two views after concatenation, implicitly performing feature reweighting and cross-view coupling that a linear layer cannot achieve. These advantages collectively explain the substantial performance gap, demonstrating that nonlinear connection schemes are essential for achieving robust and high-fidelity maneuver pattern discrimination in DualView-LiteNet.

5.3.5. Effect of Activation Functions on DualView-LiteNet Performance

The activation function plays a crucial role in introducing nonlinearity to the network, directly influencing its representational power. To compare the effects of different activation functions, we tested Sigmoid, Tanh, and ReLU under consistent experimental settings.

As shown in Table 7, both Sigmoid and Tanh suffer from vanishing gradients during training, resulting in poor classification accuracy and limited generalization ability. Beyond the vanishing gradient problem, these bounded activation functions also compress feature values into narrow numeric ranges, which reduces feature discriminability. This is particularly detrimental for maneuver patterns, where subtle local variations must be preserved across dual views. Additionally, Sigmoid and Tanh incur higher computational costs and slower convergence due to their exponential operations. In contrast, ReLU not only avoids saturation but also promotes sparse feature activation, enabling the network to emphasize salient geometric cues while suppressing uninformative background patterns. This sparsity-driven selectivity is especially beneficial for the Maneuver Pattern Library, where fine-grained trajectory structures dominate. Therefore, ReLU provides a better trade-off between nonlinearity, optimization efficiency, and feature expressiveness, making it the most suitable activation function for the DualView-LiteNet architecture.

5.3.6. Effect of Data Augmentation on DualView-LiteNet Performance

To evaluate the effectiveness and physical plausibility of the proposed data augmentation strategies, we conduct an ablation experiment comparing DualView-LiteNet trained with and without augmentation. The augmentations include (i) spatial translation within small local offsets to simulate realistic GPS drift or sensor localization noise, and (ii) temporal subsampling to mimic variations in sampling frequency commonly observed in airborne sensing platforms.

Table 8 summarizes the results. Without augmentation, the model achieves an accuracy of 0.9251, a recall of 0.91, a precision of 0.92, and an F1-score of 0.92. After applying both spatial and temporal augmentations, performance improves substantially to an accuracy of 0.9764, a recall of 0.98, a precision of 0.98, and an F1-score of 0.98. Beyond the numerical improvement, a deeper inspection reveals that augmentation effectively mitigates overfitting by preventing the network from memorizing the limited geometric variations present in the Maneuver Pattern Library. The spatial perturbations introduce realistic deviations consistent with sensor noise, enabling the model to develop invariance to small trajectory shifts. Meanwhile, temporal subsampling exposes the network to diverse motion rhythms, improving robustness against varying sampling frequencies and trajectory pacing. These combined effects enrich the data distribution in a physically meaningful manner, strengthening the model’s ability to generalize to unseen trajectories while preserving the intrinsic structure of each maneuver pattern.
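The two augmentations can be sketched on raw trajectory points as follows. Whether they are applied to the point sequences before rendering or to the rendered images is not stated, so operating on (t, x, y, z) tuples is an assumption, as are the offset bound, the subsampling step, and the function names.

```python
import random


def translate(points, max_offset, rng):
    """Shift the whole trajectory by one random horizontal offset.

    points: list of (t, x, y, z) tuples; timestamps and altitude are unchanged.
    """
    dx = rng.uniform(-max_offset, max_offset)
    dy = rng.uniform(-max_offset, max_offset)
    return [(t, x + dx, y + dy, z) for t, x, y, z in points]


def subsample(points, step):
    """Keep every step-th point, always retaining the final point."""
    kept = points[::step]
    if kept[-1] != points[-1]:
        kept.append(points[-1])
    return kept


rng = random.Random(0)
# Made-up straight-line trajectory at constant altitude, for illustration.
trajectory = [(float(t), float(t), 2.0 * t, 1000.0) for t in range(10)]
augmented = subsample(translate(trajectory, max_offset=5.0, rng=rng), step=3)
```

A rigid shift leaves the shape of the maneuver intact, and keeping the first and last points during subsampling preserves the start and end of the time-encoded color gradient.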

5.4. Real-World Evaluation on ADS-B Dataset

To further assess the practical relevance of the proposed approach, we evaluate all models on a real-world flight dataset sourced from ADS-B Exchange, following an inference-only protocol. All models are trained exclusively on the simulated Maneuver Pattern Library and applied directly to the real-world trajectories without any fine-tuning or retraining. This setting constitutes a strict cross-domain evaluation and is designed to examine whether simulation-trained models can generalize to real flight data with naturally occurring noise and operational variability. As shown in Table 9, all methods exhibit a noticeable performance drop when transferred from simulated data to real-world ADS-B trajectories, which is expected given the differences in data distributions, sensing noise, and maneuver execution characteristics. Nevertheless, the relative accuracy ranking of the models remains the same on both datasets. The traditional SVM [34] fails to generalize to real-world data, achieving only 20.0% accuracy, which equals the level expected from uniform random guessing over five maneuver classes and indicates the limited transferability of handcrafted features. The attention-based baselines, DualView-CrossAttn and DualView-SelfAttn, are more robust than SVM [34], reaching 31.2% and 40.8% accuracy, respectively, but their performance is still constrained by the insufficient modeling of cross-view feature consistency. In contrast, the proposed DualView-LiteNet achieves the best performance on the real-world dataset, reaching 65.0% accuracy and an F1-score of 0.64, ahead of the CNN baseline [70] at 60.2% accuracy. Despite the absence of real-data supervision during training, the proposed model outperforms all baselines, suggesting that the shared-weight dual-view design captures maneuver-discriminative patterns that are less sensitive to domain shifts.
These results provide empirical evidence that simulation-based maneuver primitive learning can offer meaningful generalization to real-flight data, thereby partially addressing concerns regarding the practical validity of simulation-driven approaches. Overall, this real-world inference evaluation highlights the practical robustness of the proposed framework and underscores the effectiveness of dual-view shared representation learning in mitigating sim-to-real performance degradation.
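The size of the sim-to-real gap can be read off the reported accuracies. The short sketch below uses only the accuracy values listed in Table 2 and Table 9. The helper names are illustrative, and the two summary quantities (absolute drop and retained fraction) are our choice of summary rather than metrics defined in the paper.

```python
# Accuracy pairs (simulated, real-world) copied from Table 2 and Table 9.
ACCURACY = {
    "SVM": (0.7087, 0.2000),
    "DualView-CrossAttn": (0.7241, 0.3120),
    "DualView-SelfAttn": (0.7787, 0.4080),
    "CNN": (0.9384, 0.6020),
    "DualView-LiteNet": (0.9764, 0.6500),
}


def absolute_drop(sim, real):
    """Accuracy lost when moving from simulated to real-world data."""
    return sim - real


def retained_fraction(sim, real):
    """Share of the simulated accuracy that survives on real-world data."""
    return real / sim


def ranking(domain_index):
    """Model names sorted from lowest to highest accuracy in one domain
    (0 = simulated, 1 = real-world)."""
    return sorted(ACCURACY, key=lambda name: ACCURACY[name][domain_index])


if __name__ == "__main__":
    for name, (sim, real) in ACCURACY.items():
        print(name, round(absolute_drop(sim, real), 4),
              round(retained_fraction(sim, real), 3))
    print("same ranking in both domains:", ranking(0) == ranking(1))
```

For DualView-LiteNet this gives an absolute drop of 0.3264 and a retained fraction of about 0.666, and the accuracy ordering of the five models is identical in the two domains.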

6. Conclusions

This study presented DualView-LiteNet, a lightweight dual-view classification framework designed to extract and fuse complementary spatial–temporal cues from synchronized top-view and side-view trajectory representations. To provide a controlled environment for validating the core capability of the proposed architecture, we constructed a simplified maneuver dataset composed of five standard, single-type maneuvers. This dataset serves primarily as a proof-of-concept platform rather than a final application-oriented benchmark. The experimental results show that DualView-LiteNet captures both horizontal and vertical motion characteristics and outperforms the SVM baseline, the dual-view attention-based baselines, and the conventional CNN baseline on the simulated dataset, reaching an accuracy of 97.64%. The comparative analysis indicates that introducing explicit cross-view interaction mechanisms improves performance over handcrafted feature and simple fusion baselines, while shared-weight dual-view convolutional learning further strengthens feature consistency and discriminative capability. Although strong performance on idealized data alone does not guarantee direct applicability to real-world flight scenarios, the additional inference experiments on real-world ADS-B trajectories show that the model trained purely on simulated data generalizes to realistic flight patterns to a certain extent, reaching 65.0% accuracy without fine-tuning. This observation supports the validity of the proposed simulation-driven formulation and suggests that the learned dual-view representations capture maneuver-related structures that are not limited to the synthetic domain.

Accordingly, this work should be regarded as an initial step toward more comprehensive maneuver understanding rather than a complete end-to-end solution. The proposed dual-view framework is model-centric and general in its design; its ability to align multi-view temporal features is expected to remain beneficial when extended to more complex, noisy, and compound maneuver patterns. Nevertheless, substantial further work is required to systematically evaluate robustness under more diverse and less controlled conditions.

Future work will prioritize (i) a deeper investigation of sim-to-real generalization using larger and more diverse real-world flight datasets, (ii) expanding the maneuver library to include compound, non-ideal, and mixed maneuvers that better reflect practical flight behaviors, and (iii) enhancing DualView-LiteNet with more expressive temporal modeling mechanisms, additional sensing cues, and domain adaptation strategies. These directions are essential for further improving the robustness, interpretability, and practical value of multi-view maneuver pattern classification.

Author Contributions

Conceptualization, Z.Y., Y.A. and W.Z.; Methodology, Z.Y., Z.C. and Y.A.; Software, Z.Y.; Validation, Z.Y., Z.C. and B.S.; Formal analysis, Z.Y.; Investigation, Z.Y. and Z.C.; Resources, Y.A. and W.Z.; Data curation, Z.C. and B.S.; Writing—original draft, Z.Y. and Z.C.; Writing—review & editing, Z.Y., B.S. and Y.A.; Visualization, Z.Y. and Z.C.; Supervision, Y.A. and W.Z.; Project administration, Y.A. and W.Z.; Funding acquisition, Y.A. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2023YFA609204. The APC was funded by the same grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

During the preparation of this manuscript, the authors used DeepSeek (DeepSeek-R1, DeepSeek-AI) and ChatGPT (GPT-4, OpenAI) for the purposes of English translation and language polishing to improve the clarity of the text. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Zhu, L.; Sun, Q.; Duan, J.; Pang, M. Automatic evaluation method of pilot flight training quality based on maneuver action type recognition. Syst. Eng. Electron. 2023, 45, 3932–3940.
2. Zhu, K.; Dong, Y. Study on the design of air combat maneuver library. Aeronaut. Comput. Technol. 2001, 4, 50–52.
3. Wang, R.; Gao, Z. Research on decision system in air combat simulation using maneuver library. Flight Dyn. 2009, 27, 72–75.
4. Wang, Y.; Wang, C.; Zhang, Y. Acrobatic maneuver reorganization method compared with parameters relevance and feature of sequence change. Comput. Eng. Appl. 2016, 52, 246–249.
5. Wang, Y.; Gao, Y. Research on complex action recognition method based on basic flight movements. Ship Electron. Eng. 2018, 38, 74–76.
6. Wu, J.; Hu, C.; Sun, C.; Chen, X.; Yan, R. Aircraft flight regime recognition with deep temporal segmentation neural network. Eng. Appl. Artif. Intell. 2023, 120, 105840.
7. Jasra, S.K.; Valentino, G.; Muscat, A.; Camilleri, R. A Comparative Study of Unsupervised Deep Learning Methods for Anomaly Detection in Flight Data. Aerospace 2025, 12, 645.
8. Johnson, W.W.; Kaiser, M.K. Perspective imagery in synthetic scenes used to control and guide aircraft during landing and taxi: Some issues and concerns. In Proceedings of the Synthetic Vision for Vehicle Guidance and Control; SPIE: Bellingham, WA, USA, 1995; Volume 2463, pp. 194–204.
9. Tian, W.; Zhang, H.; Li, H.; Xiong, Y. Flight maneuver intelligent recognition based on deep variational autoencoder network. EURASIP J. Adv. Signal Process. 2022, 2022, 21.
10. Li, F.; Xu, X.; Wang, R.; Ma, M.; Dong, Z. Flight Trajectory Prediction Based on Automatic Dependent Surveillance-Broadcast Data Fusion with Interacting Multiple Model and Informer Framework. Sensors 2025, 25, 2531.
11. Fang, W.; Wang, Y.; Yan, W.; Lin, C. Symbolized flight action recognition based on neural network. Syst. Eng. Electron. 2022, 44, 737–745.
12. Austin, F.; Carbone, G.; Falco, M.; Hinz, H.; Lewis, M. Automated maneuvering decisions for air-to-air combat. In Proceedings of the Guidance, Navigation and Control Conference; AIAA: Reston, VA, USA, 1987; p. 2393.
13. Barndt, G.; Sarkar, S.; Miller, C. Maneuver regime recognition development and verification for H-60 structural monitoring. In Proceedings of the Annual Forum Proceedings-American Helicopter Society; American Helicopter Society, Inc.: Fairfax, VA, USA, 2007; Volume 63, p. 317.
14. Tian, F.; Zhang, T.; Meng, G.; Sun, F. Intelligent recognition of fighter’s maneuver based on fuzzy control algorithm. In Proceedings of the 2014 Fourth International Conference on Instrumentation and Measurement, Computer, Communication and Control; IEEE: Piscataway, NJ, USA, 2014; pp. 584–589.
15. Keogh, E.; Ratanamahatana, C.A. Exact indexing of dynamic time warping. Knowl. Inf. Syst. 2005, 7, 358–386.
16. Al-Naymat, G.; Chawla, S.; Taheri, J. Sparsedtw: A novel approach to speed up dynamic time warping. arXiv 2012, arXiv:1201.2969.
17. Li, H.; Shan, Z.; Guo, H. Flight action recognition algorithm based on MDTW. Comput. Eng. Appl. 2015, 51, 267–270.
18. Adwan, S.; Alsaleh, I.; Majed, R. A new approach for image stitching technique using Dynamic Time Warping (DTW) algorithm towards scoliosis X-ray diagnosis. Measurement 2016, 84, 32–46.
19. Zhao, J.; Itti, L. shapeDTW: Shape dynamic time warping. Pattern Recognit. 2018, 74, 171–184.
20. Zhang, H.; Dong, Y.; Xu, D. Multilevel dynamic time warping: A parameter-light method for fast time series classification. J. Intell. Fuzzy Syst. 2021, 40, 10197–10210.
21. Lu, J.; Chai, H.; Jia, R. A general framework for flight maneuvers automatic recognition. Mathematics 2022, 10, 1196.
22. Gao, Y.; Ni, S.; Wang, Y. A method for flight state rule acquisition based on improved quantum genetic algorithm. Electron. Opt. Control 2011, 18, 28–31.
23. Zhang, T.; Yu, L.; Zhou, Z.L.; Liu, H.Q. Decision-Making for Air Combat Maneuvering Based on Variable Weight Improved Pseudo-Parallel Genetic Algorithm. In Proceedings of the 2012 International Conference on Electronics, Communications and Control; IEEE Computer Society: Los Alamitos, CA, USA, 2012; pp. 3082–3085.
24. Xie, J.; Yang, Q.; Dai, S.; Wang, W.; Zhang, J. Air combat maneuver decision based on reinforcement genetic algorithm. Xibei Gongye Daxue Xuebao/J. Northwestern Polytech. Univ. 2020, 38, 1330–1338.
25. Tong, Q.; Li, L.; Tong, Z.; Guo, H.; Li, S.; Huang, H. Air combat intention threat modeling and simulation based on maneuver recognition. Mod. Def. Technol. 2014, 42, 174–184.
26. Shen, Y.; Ni, S.; Zhang, P. Flight action recognition method based on Bayesian network. Comput. Eng. Appl. 2017, 53, 161–167.
27. Liu, H.; Wang, H.; Meng, G.; Wu, H.; Zhou, M. Flight training evaluation based on dynamic Bayesian network and fuzzy gray theory. Acta Aeronaut. Astronaut. Sin. 2021, 42, 250–261.
28. Gohari, K.; Kazemnejad, A.; Mohammadi, M.; Eskandari, F.; Saberi, S.; Esmaieli, M.; Sheidaei, A. A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients. BMC Med. Res. Methodol. 2023, 23, 190.
29. Zhang, X.; Qi, W. Dynamic Bayesian networks model for causal-time series coupling in aircraft takeoff attitude and flight operations. J. Tsinghua Univ. (Sci. Technol.) 2024, 64, 1070–1081.
30. Pang, N.Y.; Guan, D.H.; Yuan, W.W. An interpretable real-time maneuver identification algorithm based on early time series classification. Comput. Eng. Sci. 2024, 46, 353.
31. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
32. Xie, C.; Ni, S.; Zhang, Z.; Wang, Y. Recognition method of acrobatic maneuver based on state matching and support vector machines. J. Project. Rocket. Missiles Guid. 2004, 3, 240–242.
33. Liu, Q.; Li, R.; Qiao, C. Air Combat Flight Action Recognition Based on Improved Support Vector Regression. Mod. Def. Technol. 2024, 52, 49.
34. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
35. Xi, Z.; Kou, Y.; Li, Z.; Lv, Y.; Li, Y. An air combat maneuver pattern extraction based on time series segmentation and clustering analysis. Def. Technol. 2024, 36, 149–162.
36. Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 2001, 5, 2.
37. Graves, A.; Fernández, S.; Schmidhuber, J. Bidirectional LSTM networks for improved phoneme classification and recognition. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 2005; pp. 799–804.
38. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232.
39. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS); IEEE: Piscataway, NJ, USA, 2017; pp. 1597–1600.
40. Wu, J.; Xiong, H.; Zong, R.; Zhao, Y.; Zhou, X. Target Turning Maneuver Type Recognition Based on Recurrent Neural Networks. J. Guangdong Univ. Technol. 2020, 37, 67–73.
41. Wang, Z.; Wang, Y.; Yang, N.; Mi, Y.; Qu, L. Construction method of flight data mining model based on LSTM. Acta Aeronaut. Astronaut. Sin. 2021, 42, 47–53.
42. Fan, H.; Fan, H.; Gao, R. Research on air target maneuver recognition based on LSTM network. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI); IEEE: Piscataway, NJ, USA, 2020; pp. 6–10.
43. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
44. Wang, Q.; Bu, S.; He, Z.; Dong, Z.Y. Toward the prediction level of situation awareness for electric power systems using CNN-LSTM network. IEEE Trans. Ind. Inform. 2020, 17, 6951–6961.
45. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 10012–10022.
46. Qin, S.; Luo, Y.; Tao, G. Memory-augmented U-transformer for multivariate time series anomaly detection. In Proceedings of the ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
47. Korban, M.; Youngs, P.; Acton, S.T. A semantic and motion-aware spatiotemporal transformer network for action detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6055–6069.
48. Sun, L.; Li, C.; Ren, Y.; Zhang, Y. A multitask dynamic graph attention autoencoder for imbalanced multilabel time series classification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11829–11842.
49. Liang, F.; Chen, X.; He, S.; Song, Z.; Lu, H. An Aerial Target Recognition Algorithm Based on Self-Attention and LSTM. Comput. Mater. Contin. 2024, 81, 1101–1121.
50. Yan, T.L.; Li, Y.; Wang, F.Q. Recognition and division of aircraft flight action based on MRF model. Comput. Eng. Sci. 2022, 44, 159.
51. Donick, M. Microsoft Flight Simulator (1982). Comput. 50 Zentrale Tit. 2025, 6, 158.
52. Louali, R.; Belloula, A.; Djouadi, M.S.; Bouaziz, S. Real-time characterization of Microsoft Flight Simulator 2004 for integration into Hardware In the Loop architecture. In Proceedings of the 2011 19th Mediterranean Conference on Control & Automation (MED); IEEE: Piscataway, NJ, USA, 2011; pp. 1241–1246.
53. Perry, A.R. The flightgear flight simulator. In Proceedings of the USENIX Annual Technical Conference; USENIX: Berkeley, CA, USA, 2004; Volume 686, pp. 1–12.
54. Daniec, K.; Iwaneczko, P.; Jędrasiak, K.; Nawrat, A. Prototyping the autonomous flight algorithms using the Prepar3D® simulator. In Vision Based Systems for UAV Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 219–232.
55. Deng, Q.; Liang, W.; Li, X. Record-driven Simulated Flight Training Mission Recommendation. In Proceedings of the 2025 4th International Symposium on Computer Applications and Information Technology (ISCAIT); IEEE: Piscataway, NJ, USA, 2025; pp. 1668–1671.
56. Garcia, R.; Barnes, L. Multi-uav simulator utilizing x-plane. In Proceedings of the Selected Papers from the 2nd International Symposium on UAVs, Reno, NV, USA, 8–10 June 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 393–406.
57. Ribeiro, L.R.; Oliveira, N.M.F. UAV autopilot controllers test platform using Matlab/Simulink and X-Plane. In Proceedings of the 2010 IEEE Frontiers in Education Conference (FIE); IEEE: Piscataway, NJ, USA, 2010; p. S2H-1.
58. Figueiredo, H.V.; Saotome, O. Simulation platform for quadricopter: Using matlab/simulink and x-plane. In Proceedings of the 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium; IEEE: Piscataway, NJ, USA, 2012; pp. 51–55.
59. Bittar, A.; Figuereido, H.V.; Guimaraes, P.A.; Mendes, A.C. Guidance software-in-the-loop simulation using x-plane and simulink for uavs. In Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS); IEEE: Piscataway, NJ, USA, 2014; pp. 993–1002.
60. Bittar, A.; De Oliveira, N.M.F.; De Figueiredo, H.V. Hardware-in-the-loop simulation with X-plane of attitude control of a SUAV exploring atmospheric conditions. J. Intell. Robot. Syst. 2014, 73, 271–287.
61. Ersoy, E.; Yalçın, M.K. Designing autopilot system for fixed-wing flight mode of a tilt-rotor UAV in a virtual environment: X-Plane. Int. Adv. Res. Eng. J. 2018, 2, 33–42.
62. Yu, L.; He, G.; Zhao, S.; Wang, X.; Shen, L. Design and Implementation of a Hardware-in-the-Loop Simulation System for a Tilt Trirotor UAV. J. Adv. Transp. 2020, 2020, 4305742.
63. Aláez, D.; Olaz, X.; Prieto, M.; Porcellinis, P.; Villadangos, J. Hil flight simulator for vtol-uav pilot training using x-plane. Information 2022, 13, 585.
64. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. 1960. Available online: https://www.unitedthc.com/DSP/Kalman1960.pdf (accessed on 19 February 2026).
65. Schoenberg, I.J. Contributions to the problem of approximation of equidistant data by analytic functions. Part A. On the problem of smoothing or graduation. A first class of analytic approximation formulae. Q. Appl. Math. 1946, 4, 45–99.
66. Wolberg, G. Cubic Spline Interpolation: A Review; Columbia University: New York, NY, USA, 1988.
67. Rosenfeld, A. Algorithms for image/vector conversion. In Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques; Association for Computing Machinery: New York, NY, USA, 1978; pp. 135–139.
68. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
69. Bridle, J. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In Proceedings of the 3rd International Conference on Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1989; Volume 2.
70. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.

Figure 1. Schematic diagram of maneuver pattern classification. The schematic illustrates the maneuver pattern classification based on top-view and side-view trajectory images. The black dashed curves represent the target’s trajectories, and the red highlighted sections indicate the specific trajectory segments identified for maneuver pattern classification.

Figure 2. Illustration of the Maneuver Pattern Library. This figure presents a schematic overview of the Maneuver Pattern Library. The top row shows the top-view projections of each maneuver, while the bottom row shows the corresponding side-view projections. Each column represents a specific maneuver type: (a) climb maneuver, (b) descent maneuver, (c) left-turn maneuver, (d) right-turn maneuver, and (e) loiter maneuver. A color gradient from dark to light indicates the progression from the start to the end of the trajectory.

Figure 3. Flight simulation equipment. (a) Control stick; (b) throttle quadrant.

Figure 4. Illustration of the flight page. (a) Flight configuration; (b) flight interface.

Figure 5. Maneuver pattern classification framework.

Figure 6. Backbone module. A Siamese dual-branch CNN with shared weights that extracts horizontal trajectory patterns from the top view, and vertical trajectory patterns from the side view.

Figure 7. Feature fusion module. Illustration of the flatten–concatenate strategy used to merge top-view and side-view feature maps into a joint representation.

Figure 8. Representative misclassification case. (a) Top view; (b) side view. A color gradient from dark to light indicates the progression from the start to the end of the trajectory.

Table 1. Aircraft parameters.

Length (m)	Wingspan (m)	Height (m)	Maximum Flight Speed (Mach)	Maximum Takeoff Weight (kg)	Service Ceiling (m)	Combat Radius (km)
19.1	19.54	4.88	2.34	33,724	15,200	926

Table 2. Maneuver pattern classification benchmark on Maneuver Pattern Library. The upward arrows (↑) indicate that higher values represent better performance.

Method	Simulated Dataset				Real-World Dataset
Method	Accuracy↑	Precision↑	Recall↑	F1-Score↑	Accuracy↑	Precision↑	Recall↑	F1-Score↑
SVM [34]	0.7087	0.74	0.67	0.67	0.2000	0.04	0.20	0.07
DualView-CrossAttn	0.7241	0.46	0.58	0.51	0.3120	0.30	0.31	0.23
DualView-SelfAttn	0.7787	0.62	0.60	0.62	0.4080	0.45	0.41	0.40
CNN [70]	0.9384	0.94	0.92	0.93	0.6020	0.58	0.60	0.58
DualView-LiteNet (Ours)	0.9764	0.98	0.98	0.98	0.6500	0.66	0.65	0.64

Table 3. Performance comparison across different convolutional kernel sizes. The upward arrows (↑) indicate that higher values represent better performance.

Convolutional Kernel Size	Accuracy↑	Precision↑	Recall↑	F1-Score↑
$3 \times 3$	0.9764	0.98	0.98	0.98
$5 \times 5$	0.2350	0.18	0.12	0.14
$7 \times 7$	0.2342	0.05	0.23	0.09

Table 4. Performance comparison across different numbers of convolutional layers. The upward arrows (↑) indicate that higher values represent better performance.

Number of Convolutional Layers	Accuracy↑	Precision↑	Recall↑	F1-Score↑
2	0.9764	0.98	0.98	0.98
3	0.4098	0.71	0.41	0.38
4	0.9426	0.94	0.93	0.93

Table 5. Performance comparison across different fusion methods. The upward arrows (↑) indicate that higher values represent better performance.

Fusion Method	Accuracy↑	Precision↑	Recall↑	F1-Score↑
Concat	0.9764	0.98	0.98	0.98
Add	0.9426	0.94	0.94	0.94
Mul	0.9099	0.92	0.91	0.91
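The three fusion methods compared in Table 5 can be sketched on flattened feature vectors. This is an illustration under assumptions: we read "Add" and "Mul" as elementwise operations on the two view vectors and "Concat" as the flatten–concatenate strategy of Figure 7; the function name `fuse` and the plain-list representation are hypothetical.

```python
def fuse(top_features, side_features, method):
    """Combine two flattened view feature vectors into one representation.

    "concat" keeps both vectors side by side (output length doubles);
    "add" and "mul" merge them elementwise (output length unchanged).
    """
    if method == "concat":
        return list(top_features) + list(side_features)
    if len(top_features) != len(side_features):
        raise ValueError("elementwise fusion needs vectors of equal length")
    if method == "add":
        return [a + b for a, b in zip(top_features, side_features)]
    if method == "mul":
        return [a * b for a, b in zip(top_features, side_features)]
    raise ValueError(f"unknown fusion method: {method}")


if __name__ == "__main__":
    top, side = [1.0, 2.0], [3.0, 4.0]
    for method in ("concat", "add", "mul"):
        print(method, fuse(top, side, method))
```

Concatenation leaves the top-view and side-view features separately addressable by the classifier, whereas the elementwise variants merge them into a single vector in which the contribution of each view can no longer be distinguished.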

Table 6. Performance comparison across different connection schemes. The upward arrows (↑) indicate that higher values represent better performance.

Connection Scheme	Accuracy↑	Precision↑	Recall↑	F1-Score↑
Fully Connected	0.9764	0.98	0.98	0.98
Linear Mapping	0.8023	0.81	0.79	0.80

Table 7. Performance comparison across different activation functions. The upward arrows (↑) indicate that higher values represent better performance.

Activation Function	Accuracy↑	Precision↑	Recall↑	F1-Score↑
ReLU	0.9764	0.98	0.98	0.98
Sigmoid	0.2442	0.05	0.23	0.09
Tanh	0.2123	0.04	0.21	0.07

Table 8. Performance comparison across different data augmentation settings. The upward arrows (↑) indicate that higher values represent better performance, and the checkmark (✓) denotes that the data augmentation strategy is applied.

Augmentation	Accuracy↑	Precision↑	Recall↑	F1-Score↑
	0.9251	0.92	0.91	0.92
✓	0.9764	0.98	0.98	0.98

Table 9. Inference-only maneuver classification results on the real-world ADS-B dataset.

Method	Accuracy↑	Precision↑	Recall↑	F1-Score↑
SVM [34]	0.2000	0.04	0.20	0.07
DualView-CrossAttn	0.3120	0.30	0.31	0.23
DualView-SelfAttn	0.4080	0.45	0.41	0.40
CNN [70]	0.6020	0.58	0.60	0.58
DualView-LiteNet (Ours)	0.6500	0.66	0.65	0.64

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
