1. Introduction
Manned–Unmanned Teaming (MUM-T), illustrated in Figure 1, is the cooperative interaction between manned and unmanned aircraft, aimed at improving the overall capabilities of a mission. This concept involves collaboration between human-operated aircraft and unmanned aerial vehicles (UAVs), or remote carriers. The objective is to exploit the advantages of both types of aircraft in order to accomplish operations that are more efficient and adaptable.
This cooperative strategy is widely used in military contexts, where MUM-T significantly improves the efficiency and effectiveness of surveillance, reconnaissance, and other mission types. The integration of piloted and unpiloted platforms maximizes the advantages of each, enhancing the overall effectiveness and flexibility of air warfare tactics.
MUM-T enables seamless communication and coordination between manned and unmanned platforms, allowing them to exchange information, carry out complementary activities, and improve situational awareness. Piloted aircraft benefit from the supplementary capabilities of unmanned carriers, including extended range, prolonged endurance, and the ability to enter hazardous areas without endangering human pilots, while unmanned vehicles can draw on the knowledge and decision-making skills of human pilots.
In practice, MUM-T often involves sustained flight in which manned and unmanned aircraft operate in close proximity. Close-formation flying is challenging: it requires an accurate estimate of the relative state between the fighter aircraft and the remote carriers, as well as a reliable formation guidance system that uses this estimated relative state. A straightforward approach to determining the relative position is to compare the GPS coordinates of the manned fighter aircraft with those of the remote carriers. This technique is adequate when there is a significant distance between aircraft, but its precision is on the order of meters, which may not be sufficient for flying in close formation.
Nor can it be used in environments where Global Navigation Satellite System (GNSS) signals are unavailable. Furthermore, an additional source of error is the temporal synchronization discrepancy that arises when comparing data between unmanned aerial vehicles (UAVs); this error is amplified by high horizontal vehicle velocities and by data losses.
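As a simple illustration of this baseline, the following sketch computes the relative North-East-Down position of a remote carrier with respect to the fighter directly from two GNSS fixes under a flat-Earth approximation; the coordinates and function name are illustrative only, and the result inherits the metre-level accuracy of the fixes themselves.

```python
import math

def gps_relative_ned(lat_ref, lon_ref, alt_ref, lat, lon, alt):
    """Relative position (north, east, down) of one aircraft with respect to
    another, from raw GNSS fixes, using a flat-Earth approximation.
    Accuracy is limited to the metre level by the GNSS fixes themselves."""
    R_EARTH = 6_378_137.0  # WGS-84 equatorial radius [m]
    d_lat = math.radians(lat - lat_ref)
    d_lon = math.radians(lon - lon_ref)
    north = d_lat * R_EARTH
    east = d_lon * R_EARTH * math.cos(math.radians(lat_ref))
    down = alt_ref - alt
    return north, east, down

# Example: two aircraft roughly 100-120 m apart
print(gps_relative_ned(40.0000, -3.0000, 500.0, 40.0010, -3.0002, 520.0))
```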
In comparison to previous research efforts, the present work distinguishes itself by addressing relative navigation and control within a modular and platform-agnostic MUM-T framework. While many studies have focused on hardware-specific implementations, operator-centric interfaces, or platform-dependent estimation algorithms, few have proposed a sensor fusion and control architecture that can generalize across vehicle types and mission profiles. This scalability is critical to enable reliable coordination between heterogeneous unmanned and manned aerial systems operating under GNSS-degraded conditions, latency, and environmental uncertainty.
To attain the accuracy required for close-formation flight—especially in contested airspaces or autonomous refueling scenarios—it is necessary to complement visual data with inertial and absolute references. Vision-only techniques such as active contours [
1], silhouette tracking [
2], and neural network-based processing [
3] have shown promise in controlled environments. Others, like camera-based geometric localization [
4,
5], attempt to reconstruct a position directly from passive vision. However, these methods are prone to occlusion, mismatching, clutter, and lighting variation, limiting their operational robustness.
For this reason, modern vision-based navigation systems in MUM-T increasingly rely on sensor fusion, combining visual markers with data from inertial measurement units (IMUs), magnetometers, barometric altimeters, and GNSS receivers. Systems such as those proposed for autonomous aerial refueling [
6,
7] or collaborative UAV tracking [
8,
9] have shown that multi-source fusion provides the reliability and fault-tolerance necessary for precise relative estimation in complex scenarios.
While the estimation framework forms the core of the proposed solution, it is equally important to position this contribution within the broader MUM-T systems literature.
A valuable system-level taxonomy of MUM-T configurations is presented in [
10], focusing on AI and interoperability protocols such as STANAG 4586. A detailed historical evolution of MUM-T in U.S. Army aviation is presented in [
10], documenting the progression of Levels of Interoperability (LOI) with platforms like Apache and Hunter. These contributions help contextualize current architectural needs, but do not tackle the estimation problem directly.
Kim and Kim introduce a helicopter–UAV simulation platform integrating PX4-based drones with a full-scale simulator in Unreal Engine [
10]. Their system excels in HMI experimentation, offering tactile and immersive feedback loops for operator-in-the-loop testing. However, it does not include estimation modules or GNSS-degraded capability evaluation.
A timeline-based task assignment planner that enables multi-level delegation of UAV tasks to the pilot is presented in [
11]. Their planner is tested with military pilots in the IFS Jet simulator and proves effective for mission-level decision flow. Later, Ref. [
12] extends this approach by incorporating mental resource models, specifically the capacity model and Wickens' multiple resource theory, into the task allocation logic. By anticipating pilot workload, their planning system schedules UAV tasks to avoid cognitive overload, thus preserving performance.
In more adversarial contexts, other works have tackled autonomous maneuvering and control. Ramírez López and Żbikowski [
13] simulate autonomous decision-making in UCAV dogfighting, modeling maneuvers as a game-theoretic problem space. Similarly, Huang et al. [
14] propose a Bayesian inference-based optimization framework using moving horizon control for combat trajectory selection, showing improved survivability under dynamic threats.
Autonomy balancing in MUM-T swarm engagements is presented in [
15]. Their RESCHU-SA simulation integrates physiological feedback (heart rate and posture) to adjust autonomy levels based on operator stress, showing how mixed-initiative control improves mission performance. This perspective of dynamic autonomy assignment complements our work’s focus on robust estimation under variable data availability.
Human–machine interface (HMI) studies have also shown relevant results; the authors of [
16] compare voice, touch, and multimodal inputs for UAV control in high-stress scenarios, concluding that touch-based input consistently outperforms others in terms of reliability and operator preference. The authors of [
17] present the Hyper-Enabled Operator (HEO) framework, emphasizing cognitive load reduction and seamless HMI design as critical enablers for effective MUM-T command.
This paper proposes a novel relative estimation and control framework for Manned–Unmanned Teaming (MUM-T), specifically designed to enable autonomous close-formation flights in dynamic and complex operational scenarios. The core objectives of this work are threefold: (1) to develop a quaternion-based relative state estimator that integrates data from GPS, Inertial Navigation Systems (INSs), and vision-based pose estimation to enhance accuracy and robustness; (2) to validate the proposed framework through comprehensive simulations and initial experimental trials, demonstrating its feasibility in real-world applications; and (3) to effectively address challenges such as temporal synchronization discrepancies, GNSS degradation, vision dropouts, and sensor noise by employing an Unscented Kalman Filter (UKF), which is known for its superior performance in nonlinear estimation problems without requiring Jacobian computations.
Unlike the Extended Kalman Filter (EKF), which requires the derivation and linearization of Jacobians, the UKF uses a deterministic sampling method (sigma points) to propagate uncertainty through nonlinear transformations. This not only improves estimation accuracy through a second-order approximation [
18,
19] but also increases numerical stability and resilience to poor initialization—key in scenarios where GPS may be intermittent and visual markers may briefly disappear.
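To make the sigma-point mechanism concrete, the following minimal sketch propagates a mean and covariance through a nonlinear function using the standard unscented transform; the scaling parameters (alpha, beta, kappa) and the example dynamics are illustrative defaults, not the values used in this work.

```python
import numpy as np

def unscented_transform(x, P, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate mean x and covariance P through a nonlinear function f
    using the standard unscented transform (sigma points), avoiding Jacobians."""
    n = x.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)          # matrix square root
    sigmas = np.vstack([x, x + S.T, x - S.T])      # 2n+1 sigma points
    Wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigmas])           # propagate each sigma point
    y_mean = Wm @ Y
    dY = Y - y_mean
    P_y = dY.T @ (Wc[:, None] * dY)                # propagated covariance
    return y_mean, P_y

# Example: propagate a 2D state through a mildly nonlinear map
x0 = np.array([1.0, 0.5])
P0 = np.diag([0.04, 0.01])
f = lambda s: np.array([s[0] + 0.1 * s[1], s[1] + 0.05 * np.sin(s[0])])
print(unscented_transform(x0, P0, f))
```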
In this work, the UKF is applied in two distinct layers:
First, an onboard AHRS (Attitude and Heading Reference System) fuses GPS, IMU, magnetometer, and barometer data to maintain stable state estimation in each UAV.
Second, a relative state estimator on the leader or tanker aircraft uses visual PnP-based measurements (e.g., using POSIT [
20]) combined with the state vectors of both vehicles to infer relative pose. The POSIT (Pose from Orthography and Scaling with Iterations) algorithm was employed to estimate the three-dimensional pose of known rigid objects from single monocular images. This method assumes that the intrinsic parameters of the camera are known and relies on a set of 2D-3D correspondences between image features and predefined object landmarks. POSIT initially approximates the object pose using a scaled orthographic projection, which simplifies the computation. It then iteratively refines the solution by updating the estimated depths of the 3D points, progressively converging toward a more accurate perspective pose. Due to its computational efficiency and robustness in real-time scenarios, POSIT was particularly suitable for the visual tracking and spatial estimation tasks addressed in this work.
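As an illustration of this 2D-3D correspondence step, the sketch below recovers a relative pose with OpenCV's solvePnP, used here only as a stand-in for the POSIT implementation described above; the marker layout, the detected pixel coordinates, and the camera intrinsics are hypothetical values.

```python
import numpy as np
import cv2  # OpenCV; solvePnP is used here as a stand-in for POSIT

# Hypothetical coplanar marker layout on the leader airframe, in its body frame [m]
object_points = np.array([
    [0.0, 0.0, 0.0],
    [1.2, 0.0, 0.0],
    [0.0, 0.8, 0.0],
    [1.2, 0.8, 0.0],
], dtype=np.float64)

# Detected 2D marker centroids in the image [px] (illustrative values)
image_points = np.array([
    [960.0, 540.0],
    [1120.0, 545.0],
    [955.0, 430.0],
    [1118.0, 428.0],
], dtype=np.float64)

# Intrinsics assumed known (pinhole model, no distortion)
K = np.array([[1371.0, 0.0, 960.0],
              [0.0, 1371.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)      # rotation: leader body frame -> camera frame
print(ok, tvec.ravel())         # tvec: leader origin expressed in the camera frame
```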
A well-documented limitation of the UKF in attitude representation is that quaternion-based filtering can yield non-unit quaternions when averaging sigma points. To address this, the present work implements Generalized Rodrigues Parameters (GRPs) for attitude error representation, following the approach of Crassidis & Markley [
19], where a three-component GRP vector is used within the UKF framework. This method preserves quaternion normalization, reduces covariance size, and has been shown to outperform conventional EKF in scenarios with large initialization errors.
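A minimal sketch of this quaternion-to-GRP mapping and its inverse is given below, following the form p = f q_v / (a + q4) with the scalar-last quaternion convention; the parameters a and f are tuning choices (a common selection is a = 1, f = 2(a + 1)) and are not necessarily those used in this work.

```python
import numpy as np

def quat_error_to_grp(dq, a=1.0, f=4.0):
    """Convert an error quaternion dq = [q1, q2, q3, q4] (scalar last) to
    Generalized Rodrigues Parameters: p = f * q_vec / (a + q4)."""
    return f * dq[:3] / (a + dq[3])

def grp_to_quat_error(p, a=1.0, f=4.0):
    """Inverse mapping, recovering a unit error quaternion from a GRP vector."""
    p2 = p @ p
    q4 = (-a * p2 + f * np.sqrt(f**2 + (1 - a**2) * p2)) / (f**2 + p2)
    qv = (a + q4) * p / f
    return np.append(qv, q4)

# Round trip on a small attitude error (unit quaternion, scalar last)
dq = np.array([0.02, -0.01, 0.03, 0.0])
dq[3] = np.sqrt(1.0 - dq[:3] @ dq[:3])
p = quat_error_to_grp(dq)
print(np.allclose(grp_to_quat_error(p), dq))   # True: the mapping is invertible
```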
The use of relative navigation techniques between aerial vehicles has been previously explored by Fosbury and Crassidis [
21], who investigated the integration of inertial and GPS data for estimating relative position and velocity in formation flight. Their work emphasized the importance of accurate time synchronization and sensor alignment, particularly in distributed systems where each vehicle independently estimates its own state. While their estimator was primarily based on Extended Kalman Filter (EKF) formulations, the foundational insights regarding the coupling of inertial and GNSS measurements for inter-vehicle navigation directly inform the architecture presented in this work. In contrast, the proposed system builds upon these principles by incorporating visual observations and switching to a quaternion-based Unscented Kalman Filter (UKF) with GRP-based attitude error representation, thereby improving estimation consistency and robustness under nonlinear conditions.
This approach builds upon past work in visual navigation and aerial refueling [
7,
22,
23,
24,
25], where both relative state and estimation frameworks were evaluated under dynamic, nonlinear flight conditions. However, unlike previous implementations that often relied on tightly coupled hardware platforms and custom estimation logic, our approach prioritizes modularity and abstraction, enabling deployment across a range of airframes and mission contexts.
The evolution of Manned–Unmanned Teaming (MUM-T) is intrinsically tied to advancements in distributed intelligence and cooperative control, particularly those inspired by swarm robotics and network-centric warfare. Foundational work by Beni introduced the concept of swarm intelligence as a form of self-organized behavior in distributed systems without centralized control, which later evolved into the framework of swarm robotics as applied to collective mobile agents like UAVs [
26,
27]. This paradigm has greatly influenced the development of cooperative robotic systems, where simple agents interact locally to produce complex, emergent behavior [
28,
29].
MUM-T systems leverage such decentralized architectures to increase flexibility, resilience, and mission efficiency. In this context, cooperative control strategies allow unmanned aerial systems (UASs) to dynamically share tasks, react to environmental changes, and support manned platforms without requiring explicit micromanagement [
30]. Architectures enabling these behaviors are often aligned with principles of network-centric warfare (NCW), where connectivity and information superiority become key enablers of operational effectiveness [
31,
32,
33], further emphasizing the need for scalable command and control mechanisms capable of orchestrating large UAV swarms in adversarial scenarios, as demonstrated in the 50 vs. 50 live-fly competition.
A critical aspect in the deployment of MUM-T systems lies in airborne communication networks, where small UAVs form dynamic links to ensure robust coordination with manned assets [
34]. These networks support varying degrees of autonomy and delegation, ranging from basic Level 1 ISR feed relay to Level 5 full mission autonomy [
35]. The implementation of such multi-level delegation schemes has been explored in recent operational planning tools that use timeline-based tasking and mixed-initiative interfaces [
36].
From a practical standpoint, modern MUM-T concepts aim to transition from isolated mission execution toward robotic autonomous systems that seamlessly collaborate with manned platforms in contested environments. Capstone studies and field demonstrations have validated the utility of this teaming, particularly in ISR, targeting, and refueling roles [
37]. For instance, autonomous aerial refueling tasks now benefit from multi-sensor fusion strategies, such as vision/GPS integration through Extended Kalman Filters (EKFs), enabling precise relative navigation during cooperative engagements [
38].
Ultimately, the convergence of swarm theory, cooperative autonomy, robust C2 architectures, and applied mission planning underpins the growing maturity of MUM-T systems. These developments not only expand the tactical envelope of unmanned platforms but also redefine the role of human operators in future joint operations.
Additionally, the proposed system is architected with modularity and platform agnosticism in mind. The relative navigation and control framework is intentionally designed not to be tightly coupled to any specific aircraft model or sensor configuration. Instead, it supports generalization across heterogeneous aerial platforms with different dynamic characteristics. This abstraction of vehicle-specific dynamics enables broader applicability of the system, facilitating its integration into collaborative MUM-T operations across diverse mission profiles. To further enhance its flexibility, the system has been implemented in a way that allows it to function as a high-level control layer on top of standard autopilot systems commonly used in UAVs, such as ArduPilot in this case, as can be seen in
Figure 2. This approach simplifies integration efforts and ensures compatibility with existing flight control architectures.
Finally, an extensive simulation environment has been developed to rigorously evaluate the performance of the framework under a range of realistic conditions, including GNSS outages, visual occlusion, and maneuvering flight. These simulation results pave the way for future ground-based and in-flight testing campaigns, with the ultimate goal of deploying the proposed MUM-T system in operational environments requiring precise, resilient, and scalable formation control capabilities.
In summary, the proposed system leverages advances in UKF estimation, GRP attitude error representation, and multi-sensor fusion to achieve sub-meter accurate relative navigation in formation flight and GNSS-degraded environments. It does so while maintaining modularity across airframe types and being robust to initialization errors and visual signal dropouts. When compared to previous works—whether they emphasize control, planning, or human–machine interaction—this paper uniquely integrates a resilient, platform-agnostic estimation framework directly linked to robust formation flight control for MUM-T missions.
The remainder of this paper is organized as follows:
Section 2 provides a detailed problem formulation, including the design of the relative estimation algorithm and its mathematical foundations.
Section 2.3 describes the implementation of the vision system.
Section 3 focuses on the guidance and control strategies for maintaining formation flight configurations.
Section 4 presents simulation results, evaluating the performance of the proposed framework in diverse scenarios.
Section 5 outlines experimental setups and findings, highlighting real-world applicability. Finally,
Section 6 concludes the paper, summarizing key achievements and identifying directions for future research.
3. Formation Flight
Formation guidance is required to bring the vehicles from their initial state to a desired configuration, known as rendezvous, and subsequently to sustain that formation. The technique we propose is deterministic and expressed explicitly in terms of the estimated relative state. The desired state of the remote carrier is determined from the commanded formation structure and the state of the fighter, in order to account for the leader's dynamics. The error state is then corrected and minimized through the vector guidance of Equation (51). The primary purpose of the setpoint state is to preserve the formation rather than to explicitly ensure that the fighter remains within the field of view (FOV). This is acceptable because vision serves only as an aiding source: the estimation framework withstands occasional interruptions in visual input and maintains its performance even during prolonged periods of visual unavailability.
The setpoint state is ideally the fighter state at some point in the past, with offsets determined by the formation configuration. This section approximates this by assuming the fighter state is not highly dynamic. We begin by commanding a formation configuration, expressed as a position offset, as shown in Figure 4. This offset is defined in the fighter's body frame but in a plane parallel to the navigation frame, i.e., corrected only by the fighter's heading. It is important that this configuration is defined in the body frame rather than the inertial frame to ensure that the leader remains within the remote carrier's camera field of view, in case it is used. It is also important to note that the inertial course often differs from the heading by a crab angle β as a result of wind. When traveling at the same velocity in equivalent wind, β will be the same for both aircraft. To account for this, the commanded offset is placed in the fighter's inertial frame to form the setpoint offset, as illustrated in Figure 4.
First, position and velocity setpoints are calculated from the relative position desired in the formation flight; Figure 5 and Figure 6 describe the geometric relationship between the leader and the setpoint state. Here, the leader is in a banked turn with a heading rate that must be accounted for; Equations (48)–(50) capture this geometric relationship. The setpoint position is then calculated in Equation (50), using rotation matrices about the z axis as defined in Equation (51). The setpoint velocity is then calculated in Equation (53), where r is the radius of gyration of the turn.
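Because Equations (48)–(53) are not reproduced here, the following sketch is only illustrative: it assumes the commanded offset is rotated into the navigation frame by the fighter heading, and that the leader's heading rate rotates the formation rigidly, which yields a setpoint velocity of the familiar v_L + ω × d form.

```python
import numpy as np

def Rz(psi):
    """Rotation matrix about the z axis (navigation frame, yaw angle psi)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def formation_setpoint(p_leader, v_leader, psi_leader, psi_dot_leader, offset_body):
    """Setpoint position/velocity for the remote carrier, assuming the commanded
    formation offset is fixed in a heading-aligned frame and that the leader's
    heading rate rotates the formation rigidly (illustrative, not Eqs. (48)-(53))."""
    offset_nav = Rz(psi_leader) @ offset_body          # offset rotated into the navigation frame
    p_sp = p_leader + offset_nav
    omega = np.array([0.0, 0.0, psi_dot_leader])       # leader heading rate as a rotation vector
    v_sp = v_leader + np.cross(omega, offset_nav)      # rigid-formation velocity setpoint
    return p_sp, v_sp

# Leader in a gentle turn, wingman commanded 30 m behind and 15 m to the right
p_sp, v_sp = formation_setpoint(
    p_leader=np.array([0.0, 0.0, -120.0]),
    v_leader=np.array([26.0, 0.0, 0.0]),
    psi_leader=0.0,
    psi_dot_leader=np.deg2rad(3.0),
    offset_body=np.array([-30.0, 15.0, 0.0]),
)
print(p_sp, v_sp)
```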
Finally, to converge to and track the setpoint, the error state is translated, in terms of the estimated relative state, into mid-level commands that our platform autopilot can accept; in our case, these are the commanded airspeed, bank angle, and altitude. To accomplish this, three control laws, for position, velocity, and lateral acceleration, have been developed. First, the position error is calculated following Equation (55), and the commanded velocity is obtained by scaling this error with horizontal and vertical gains applied element-wise.
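A minimal sketch of this position control law is shown below; the gain values and the saturation limit are illustrative, not the tuned values of the actual controller.

```python
import numpy as np

def velocity_command(p_setpoint, p_estimate, k_h=0.4, k_v=0.6, v_max=5.0):
    """Commanded correction velocity from the position error, using separate
    horizontal and vertical gains applied element-wise (gains are illustrative)."""
    e_pos = p_setpoint - p_estimate                    # position error in the navigation frame
    gains = np.array([k_h, k_h, k_v])                  # horizontal gains on x, y; vertical on z
    v_cmd = gains * e_pos                              # element-wise multiplication
    return np.clip(v_cmd, -v_max, v_max)               # saturate to keep commands realistic

print(velocity_command(np.array([10.0, -4.0, 2.0]), np.zeros(3)))
```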
In the case of the velocity error, the longitudinal component is expressed in the fighter wind frame, using the fighter airspeed.
For the lateral command, shown in Figure 7, which involves the most complex dynamics of close-formation flight, the lateral guidance aims to minimize the angle between the X-Y component of the desired air-relative velocity vector and the vehicle's current air-relative velocity vector. First, the rotation rate of the desired air-relative velocity vector is calculated in Equation (61). Then, the commanded heading rate is calculated using a lateral gain in Equation (62). The bank angle command is finally obtained from the commanded heading rate, the follower airspeed, and gravity g.
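This last step corresponds to the standard coordinated-turn relation phi = atan(V ψ̇ / g); a short sketch, with an illustrative bank limit, is given below.

```python
import math

def bank_angle_command(psi_dot_cmd, airspeed, g=9.81, phi_limit=math.radians(35.0)):
    """Bank angle for a coordinated turn at the commanded heading rate:
    phi = atan(V * psi_dot / g), saturated to a configurable bank limit."""
    phi = math.atan2(airspeed * psi_dot_cmd, g)
    return max(-phi_limit, min(phi_limit, phi))

# A 26 m/s follower commanded to turn at 5 deg/s needs roughly 13 deg of bank
print(math.degrees(bank_angle_command(math.radians(5.0), 26.0)))
```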
Position Setpoint Calculations
The MUM-T planner aerial module generates coordinated 3D formations of UAVs around a reference point, typically associated with a manned aircraft or a shared operational target. This planner supports a variety of tactical geometric arrangements, including Arrow, Line, Cross, Sphere, Plus, and Z formations, based on defined mission parameters, as shown in
Table 2 and
Figure 8.
This functionality is designed specifically for Manned–Unmanned Teaming (MUM-T) contexts, where UAVs must assume predictable, geometrically controlled positions around a common reference to ensure coordination, deconfliction, and rapid deployment upon tasking.
These position setpoints will be the input of the relative navigation system.
The MUM-T planner parameters that can be defined are in the following
Table 3.
The qualitative distance categories used in
Table 3 are defined as follows:
In multi-UAV mission scenarios involving Manned–Unmanned Teaming (MUM-T), the precise positioning of unmanned aerial vehicles relative to a central reference—typically a manned fighter aircraft—is critical to ensuring operational coherence, spatial deconfliction, and synchronized action. This section presents the mathematical formulation used to compute the relative coordinates of each UAV within a predefined formation geometry.
The UAVs are positioned with respect to a reference point (usually the first waypoint), which represents the center of the formation. The geometric layout is governed by the mission’s orientation heading, the selected formation type, and a lateral separation parameter defined by one of three possible standoff categories: FAR, MEDIUM, or NEAR. Altitude variation may also be introduced to establish a three-dimensional distribution depending on the formation configuration.
For each formation type, a specific rule is applied to compute the relative position of each UAV in a local UTM coordinate frame. The following subsections define the equations used for calculating these positions, beginning with the Arrow formation, which creates a symmetric V-shape centered on the heading axis of the reference aircraft.
For each position assignment corresponding to the formation types defined previously, the following sections detail how the target coordinates are computed for each UAV within the formation. These coordinates are defined relative to the leader's frame and account for the spacing and topology constraints of the specific formation pattern (an illustrative implementation sketch is provided after the position-assignment step below).
- (a)
Arrow
The calculation of the relative waypoint is performed in the following way:
For UAV i ∈ {0,…, N − 1}:
If i = 0: the UAV is placed at the center.
If i > 0: the UAVs are placed in alternating directions.
- (b)
Line
Let heading angle α be in radians:
For UAV i ∈ {0,…, N − 1}:
- (c)
Cross/Plus
UAVs placed in four cardinal directions (0°, 90°, 180°, 270°) or full 360°.
Uniform angular spacing computed from total UAV count.
For UAV i ∈ {0,…, N − 1}:
- (d)
Z
For UAV i ∈ {0,…, N − 1}:
- (e)
Sphere
One or two UAVs are placed vertically, above and/or below the reference point, and the rest are placed on a horizontal circle as in the Plus formation.
Let V be the number of vertical UAVs (1 or 2) and M = N − V the number of UAVs placed on the horizontal circle.
- 3.
Position Assignment
Each position is computed in UTM space and then converted back to geodetic coordinates.
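Since the individual formation equations are not reproduced above, the following sketch illustrates plausible offset rules for the Arrow, Line, and Plus patterns under the stated conventions (alternating lateral placement, uniform angular spacing, rotation by the mission heading); the exact spacing rules of the planner may differ.

```python
import math

def formation_offsets(formation, n_uavs, spacing, heading_rad):
    """Relative (east, north, up) offsets of each UAV around the reference point.
    Illustrative geometry only: Arrow alternates left/right behind the leader,
    Line spreads UAVs laterally, Plus distributes them at uniform angles."""
    offsets = []
    for i in range(n_uavs):
        if formation == "ARROW":
            if i == 0:
                x, y = 0.0, 0.0                      # tip of the arrow at the reference
            else:
                rank = (i + 1) // 2                  # 1, 1, 2, 2, ...
                side = 1.0 if i % 2 else -1.0        # alternate right/left
                x, y = -rank * spacing, side * rank * spacing
        elif formation == "LINE":
            side = (i + 1) // 2 * (1.0 if i % 2 else -1.0)
            x, y = 0.0, side * spacing               # abreast of the reference
        elif formation == "PLUS":
            ang = 2.0 * math.pi * i / n_uavs         # uniform angular spacing
            x, y = spacing * math.cos(ang), spacing * math.sin(ang)
        else:
            raise ValueError(f"unknown formation: {formation}")
        # rotate the along-track/cross-track offset by the mission heading
        east = x * math.sin(heading_rad) + y * math.cos(heading_rad)
        north = x * math.cos(heading_rad) - y * math.sin(heading_rad)
        offsets.append((east, north, 0.0))
    return offsets

print(formation_offsets("ARROW", 3, 50.0, math.radians(90.0)))
```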
4. Simulations
This section describes the high-fidelity simulation environment developed to evaluate the performance of the relative state estimator, the vision-based error management subsystem, and the mathematical consistency of the control and estimation architecture within a Manned–Unmanned Teaming (MUM-T) framework. The primary purpose of this simulator is not to emulate a complete formation flight mission, but rather to validate the correctness and robustness of the estimation pipeline under realistic sensor and communication conditions prior to hardware-in-the-loop or flight testing.
The simulation framework,
Figure 9 and
Figure 10, has been implemented using MATLAB Simulink 2019a, with the Aerospace Blockset and Aerosim Blockset providing a modular foundation for modeling the aircraft dynamics, actuator responses, and sensor suites. The UAVs and the manned fighter are modeled as nonlinear 6-degrees-of-freedom (6-DOF) systems, with actuators modeled via first-order lags, saturation limits, and rate constraints.
To create a representative environment, the simulation includes
A gravity model,
A geomagnetic field model,
Atmospheric pressure and wind disturbances,
Sensor models that incorporate bias, Gaussian white noise, and cross-axis coupling,
GPS with Gauss–Markov drift,
And crucially, communication latency and packet loss, especially affecting telemetry between the fighter and the remote carriers.
The vision subsystem is modeled as a standard RGB camera with the following:
Resolution: 1920 × 1080 pixels;
Field of view (FOV): 70° horizontal × 42° vertical;
Frame rate: 30 fps.
Fiducial markers are projected into image coordinates based on the relative position and orientation between the aircraft, using Equations (19) and (20). To reflect realistic limitations, the pixel stream is corrupted with white noise and randomized distortions, emulating occlusion, marker jitter, and partial dropout effects.
This controlled vision degradation is critical for validating the vision fault management strategy, which dynamically adjusts estimator confidence and fallback logic when visual information becomes unreliable or inconsistent.
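The sketch below illustrates this projection step with a simple pinhole model built from the stated resolution and field of view; it is not Equations (19) and (20) verbatim, and the marker positions and noise level are illustrative.

```python
import numpy as np

# Pinhole intrinsics from the simulated camera: 1920 x 1080 px, 70 deg x 42 deg FOV
W, H = 1920, 1080
HFOV, VFOV = np.deg2rad(70.0), np.deg2rad(42.0)
fx = (W / 2.0) / np.tan(HFOV / 2.0)     # ~1371 px
fy = (H / 2.0) / np.tan(VFOV / 2.0)     # ~1407 px
K = np.array([[fx, 0.0, W / 2.0],
              [0.0, fy, H / 2.0],
              [0.0, 0.0, 1.0]])

def project_markers(points_cam, pixel_noise_std=1.0, rng=np.random.default_rng(0)):
    """Project 3D marker positions (already expressed in the camera frame, z forward)
    into pixel coordinates and add white noise, as in the simulated vision stream."""
    uvw = (K @ points_cam.T).T
    px = uvw[:, :2] / uvw[:, 2:3]                       # perspective division
    px += rng.normal(0.0, pixel_noise_std, px.shape)    # measurement noise
    in_fov = (px[:, 0] >= 0) & (px[:, 0] < W) & (px[:, 1] >= 0) & (px[:, 1] < H)
    return px, in_fov                                   # dropout flag when markers leave the FOV

# Four markers on the leader, roughly 25 m ahead of the camera
markers_cam = np.array([[0.5, 0.3, 25.0], [-0.5, 0.3, 25.0],
                        [0.5, -0.3, 25.0], [-0.5, -0.3, 25.0]])
print(project_markers(markers_cam))
```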
- (a)
Simulated UAV Models
The simulation includes a fleet of three fixed-wing VTOL UAVs, each defined by the following:
Wingspan: 2500 mm
Length: 1260 mm
Material: Carbon fiber
Empty weight (without battery): 7.0 kg
Maximum Take-off Weight (MTOW): 15.0 kg
Cruise speed: 26 m/s (at 12.5 kg)
Stall speed: 15.5 m/s (at 12.5 kg)
Max flight altitude: 4800 m
Wind resistance:
Fixed-wing: 10.8–13.8 m/s
VTOL: 5.5–7.9 m/s
Voltage: 12S LiPo
Operating temperature: −20 °C to 45 °C
These characteristics ensure that the simulated agents reflect realistic operational constraints for medium-sized UAVs in tactical environments.
- (b)
Simulation environment
All simulations were executed on a Lenovo Legion 5 16IRX8 laptop.
This setup enables real-time multi-agent simulation with sufficient fidelity for estimator tuning and controller evaluation.
As mentioned, the vision sensor was simulated as a normal camera, with a resolution of 1920 × 1080 pixels, a field of view of 70° × 42°, and a frame rate of 30 fps. A frame rate of 30 FPS was chosen as a trade-off between computational load and sufficient temporal resolution for close-range visual tracking. For the short-range relative estimation task, this rate proved adequate to maintain accurate marker detection and smooth estimation. Nonetheless, higher FPS (e.g., 60 or 90 FPS) would reduce latency and potentially improve robustness during fast maneuvers and will be considered in future implementations.
The field of view (FOV) is a crucial factor when choosing a camera, since all the markers must remain visible during the close-formation flight procedure. In order to compute the relative pose, pixel coordinates must be simulated. These coordinates are determined using Equations (18)–(20), taking into account the positions and attitudes of both the fighter and the remote carriers, as seen in
Figure 7. Subsequently, white noise and randomized distortions were introduced into the pixel stream to generate the simulated observations.
Figure 11 and
Figure 12 depict the relative attitude and position of the fighter and remote carrier, respectively. The green line represents the ground-truth data from the simulator, the red line the unprocessed data from each UAV, the dark blue line the vision measurements, and the light blue line the estimated relative attitude and position between UAVs.
The simulations included instances of vision dropouts, which are evident when the POSIT measurement drops to zero; the vertical lines reaching zero denote the start and end of each dropout. These vision losses occur when the markers fall outside the field of view of the remote carrier's camera. Despite these significant vision dropouts, the estimator's accuracy remains largely unaffected.
Figure 13 shows the relative velocities between aircraft. The dark blue line is the ground-truth data from the simulator, the red line the raw data, and the green line the estimator output, showing adequate robustness for the intended mission.
Analysis
This section presents a quantitative evaluation of the accuracy of the relative state estimation subsystem used during formation flight experiments. The evaluation is based on a comparison between the estimated states and ground-truth references. Three key components are evaluated: relative position, relative velocity, and relative attitude. The comparison is expressed in terms of Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), which are widely used metrics in estimation theory and control validation.
Mean Absolute Error (MAE) quantifies the average magnitude of the estimation error. It provides a clear sense of the overall deviation between estimated and true values, regardless of the direction of the error. It is less sensitive to outliers and is often used to assess the average expected deviation in real applications:
$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{x}_i - x_i\right|$,
where $\hat{x}_i$ is the estimated value at time step i, $x_i$ is the corresponding ground-truth value, and N is the total number of samples.
Root Mean Square Error (RMSE) penalizes larger deviations more strongly due to the squaring of errors before averaging. RMSE is more sensitive to occasional large errors (outliers) and thus provides a better sense of estimation robustness and worst-case deviations. It is particularly relevant in systems where sporadic large errors may compromise performance or safety:
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{x}_i - x_i\right)^2}$,
where $\hat{x}_i$ is the estimated value at time step i, $x_i$ is the corresponding ground-truth value, and N is the total number of samples.
While both metrics provide a measure of estimation accuracy, RMSE is more sensitive to large errors due to the squaring operation and is therefore more appropriate for detecting outliers or transient performance degradation. MAE, being less affected by extreme values, offers a more general sense of the overall estimation quality.
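For reference, both metrics can be computed directly from the estimate and ground-truth time series, as in the short sketch below.

```python
import numpy as np

def mae(estimate, truth):
    """Mean Absolute Error over a time series of estimates."""
    return np.mean(np.abs(np.asarray(estimate) - np.asarray(truth)))

def rmse(estimate, truth):
    """Root Mean Square Error; penalizes large transient deviations more strongly."""
    err = np.asarray(estimate) - np.asarray(truth)
    return np.sqrt(np.mean(err**2))

# Illustrative check: a single outlier inflates RMSE far more than MAE
est = np.array([0.02, -0.03, 0.01, 0.50, 0.02])
ref = np.zeros(5)
print(mae(est, ref), rmse(est, ref))
```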
Figure 14 illustrates the errors between the estimated relative position and the ground-truth relative position over time. The figure provides a visual representation of the estimation accuracy across the three spatial axes and highlights both the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) values for each component. This visualization complements the quantitative analysis by showing not only the average performance but also the temporal dynamics and possible transient deviations of the estimation algorithm.
Table 4 summarizes the MAE and RMSE values between the estimated and true relative positions in the X, Y, and Z directions. In this configuration, the position estimation is primarily based on onboard inertial sensors and kinematic models, with vision-based measurements used as a secondary aiding source when available to improve drift correction and reduce long-term bias.
The lateral and vertical position estimates exhibit low error levels (MAE < 5 cm and RMSE < 11 cm). The longitudinal (X) component presents a higher RMSE of 28 cm, despite a moderate MAE of 10 cm. This discrepancy is explained by occasional discontinuities caused by a temporary loss or reacquisition of visual markers, which degrade the aiding signal provided by the vision system. These transient errors are better captured by the RMSE metric due to their sensitivity to outliers.
- 2.
Relative Velocity Estimation
Figure 15 illustrates the errors between the estimated relative velocity and the ground-truth relative velocity over time. The plot presents the temporal evolution of the estimation error along each axis and includes the corresponding Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values. This figure provides insight into the dynamic behavior and precision of the velocity estimation algorithm, which relies solely on inertial measurements and onboard models.
The relative velocity estimation is computed entirely from inertial measurements and onboard kinematic models and does not rely on any visual data.
Table 5 provides the error metrics for the estimation of relative velocity.
The velocity estimation system shows uniformly low errors across all axes, with MAEs below 6.3 cm/s. The small differences between MAE and RMSE values indicate a consistent estimation performance with limited influence from transient deviations. These results confirm the robustness of the inertial-based estimation approach, particularly useful in environments with poor visual feedback or during visual marker occlusions.
- 3.
Relative Attitude Estimation
Figure 16 illustrates the errors between the estimated relative attitude (roll, pitch, and yaw) and the corresponding ground-truth values over time. The figure includes the evolution of estimation errors for each rotational axis, along with the associated Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values. This visualization highlights the performance of the attitude estimation subsystem, which is primarily inertial-based and supported by visual corrections when available, and reveals the impact of transient discrepancies due to temporary visual marker loss or reacquisition.
Table 6 provides the error metrics for the estimation of relative attitude angles. The attitude estimation algorithm is primarily driven by gyroscopic and inertial data, with vision-based pose information used as a complementary correction input when available.
All three attitude components demonstrate errors well below 1°, indicating high-quality orientation estimation. The pitch and yaw estimates are slightly more accurate than roll, which is more susceptible to cross-axis coupling and vision-related noise. Similarly to the position estimation case, outliers observed in the roll axis are likely caused by temporary disruptions in visual marker tracking, which affect the vision-aided correction process.
The combined analysis demonstrates that the relative estimation framework provides accurate and reliable outputs across all six degrees of freedom and their derivatives. The core estimation algorithms are inertial-driven, with visual data acting as an aiding mechanism that enhances long-term consistency. This design provides robustness against the temporary loss of visual information, while still benefiting from visual corrections when available. The low MAE and RMSE values across position, velocity, and attitude confirm that the system is well-suited for autonomous formation flight and cooperative navigation tasks in uncertain or visually degraded environments.
5. Experiments
For the experiments conducted in this paper, we utilized a swarming framework developed in the Robot Operating System (ROS) and C++. This framework, developed independently of this study, enables the coordination of diverse robotic systems, including terrestrial, aerial, and aquatic platforms. Its modular architecture allows the seamless integration of new platforms and payloads, while C++ ensures high performance and real-time communication. The framework also interfaces with ArduPilot, providing a unified control interface for autonomous vehicles. This implementation facilitates efficient swarm coordination, making it a valuable tool for autonomous systems research.
The primary purpose of the Manned–Unmanned Teaming Air mission is to safeguard a fixed-wing vehicle, typically operated by a human pilot. A fleet of two remote carrier vehicles trails a UAV that simulates the fighter, maintaining a consistent formation around it. The level of protection is contingent upon the following parameters:
The selected vehicles will be evenly spaced around the leading vehicle, taking into account the number of vehicles selected.
Placement: Depending on the selected formation, the vehicles will be positioned in different locations.
- ○
In a PLUS (+) formation, the vehicles will be evenly dispersed around the leading vehicle, with one remote carrier positioned directly in front.
- ○
The CROSS configuration (X) involves the even distribution of remote carriers around the leading vehicle, with two vehicles positioned in front.
- ○
ARROW formation: they will be positioned so that the tip of the arrow is the vehicle to follow.
- ○
SPHERE formation: it is equivalent to CROSS formation, but one vehicle will be placed above and another below
- ○
LINE formation: they will be distributed forming a line to the sides of the leader.
- ○
Z formation: they will be distributed forming a vertical line above and below the leader.
The pilot has the authority to determine the distance between assets, with three separation options available: close, middle, or far, based on the desired proximity to the leader.
The MUM-T Human Machine Interface (HMI) will provide a live representation of the spatial configuration of each element in the MUM-T formation. The marked trajectories will enable operators to visually perceive the relative position between the manned aircraft and UAVs, both predicted and current. Furthermore, a dynamic depiction of restricted airspace will be included, computed according to the fighter’s dynamics and its anticipated future placements. The exclusion zones will undergo regular updates to guarantee both safety and efficiency in operations.
The determination of security levels between UAVs and manned aircraft will be based on their proximity and relative position, indicated with a color code (green, yellow, and red) according to the level of risk. The interface will offer both visual and aural alerts to notify operators of potential hazards, allowing them to make well-informed decisions to prevent dangerous situations. To ensure the safety of the manned aircraft and maintain the integrity of the MUM-T formation, we will employ collision detection and proximity detection algorithms, as shown in
Figure 17.
The system will enable the control of various formations, adjusting to the individual requirements of the task. Operators have the ability to arrange formations in various tactical configurations such as line, wedge, square, or other arrangements depending on the mission objectives and environmental conditions. The interface will additionally enable the synchronization between UAVs and manned aircraft, enabling flexible adjustments in the arrangement to promptly react to unforeseen circumstances, as can be seen in
Figure 17 and images below.
A prerequisite for initiating a MUM-T Air mission is the presence of a piloted vehicle within the swarm system. After being chosen, the vehicles can be directed to follow the selected target based on the specified characteristics. The MUM-T class constructor handles declaring the ROS objects (subscribers, publishers, and services), as well as initializing variables or retrieving their values from the database. Following these fundamental initializations, the system acquires the precise location of the leading vehicle in order to determine the allocation of the other vehicles relative to it. Once the distribution is computed, each vehicle commences moving towards its designated location.
After the mission is properly established, the commands sent to the autopilot are adjusted whenever the leader's position is updated. Furthermore, the system queries the leader's velocities, encompassing both its linear and angular motion.
Given that this mission lacks a predetermined sequence of waypoints and the movement of the vehicle to be tracked is not established, three distinct setpoints will be sent. These setpoints include the desired speed of the system, the desired bank angle, and the desired altitude above the ground. In order for the autopilot to process these commands, it is imperative that its operating mode is set to stabilize, which is used to control the UAV in attitude angles.
Once the system is in guided mode and receives the required data from the vehicle to be followed, it will calculate the setpoints to be delivered to the autopilot. The system will then wait for an output that aligns with the input parameters of the mission.
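The sketch below merely illustrates how the three mid-level setpoints could be packaged and forwarded; the publishing function is a placeholder and does not reflect the framework's actual ROS interface.

```python
from dataclasses import dataclass

@dataclass
class AutopilotSetpoints:
    """Mid-level commands forwarded to the ArduPilot-based autopilot."""
    ground_speed_mps: float   # desired speed of the system
    bank_angle_deg: float     # desired bank angle
    altitude_agl_m: float     # desired altitude above the ground

def publish_setpoints(sp: AutopilotSetpoints) -> None:
    # Placeholder for the framework's ROS publisher; the real interface is not shown here.
    print(f"speed={sp.ground_speed_mps:.1f} m/s, "
          f"bank={sp.bank_angle_deg:.1f} deg, alt={sp.altitude_agl_m:.1f} m")

# Recomputed whenever the leader's state is updated
publish_setpoints(AutopilotSetpoints(26.0, 12.5, 120.0))
```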
To prevent mid-air collisions, a collision detection module alters the flight altitude of each aircraft; whenever a potential collision is detected, the mission height setpoint ceases to change and the adjustment instead follows the altitude commanded by the anti-collision module. This flowchart is explained in
Figure 18.
This experiment involved the use of three unmanned aerial vehicles (UAVs) to test formation flying and mission re-tasking capabilities in a controlled environment. Two of the UAVs were configured to act as remote carriers (RCs), while the third simulated a fighter. The primary objectives were to analyze the ability of the group to fly in various geometric formations, perform a mission assigned to the remote carriers, and subsequently regroup with the fighter.
- ○
Fighter UAV: Responsible for leading the formation and coordinating operations. This UAV acted as the central node for mission management.
- ○
Remote Carriers: Equipped with limited autonomy capabilities, the two UAVs supported the fighter by executing specific tasks and following instructions.
Swarm Composition:
GCS:
Communications:
- ▪
XTEND 900MHz 1W for mesh communications
- ○
Omnidirectional antenna, 5 dBi, for the aircraft
Omnidirectional antenna, 11 dBi, for the ground station
The methodology followed during the experiments is divided into different steps.
Formation Flying: Several of the formation patterns mentioned above were commanded for the UAVs to adopt and maintain during flight (Figure 20). The experiment evaluated the stability and accuracy of each formation under varying flight conditions, including changes in speed, sharp turns, and turbulence (
Figure 21).
Mission Re-tasking: During the flight, the remote carriers were assigned a new mission that required them to temporarily leave the formation. The mission involved tasks such as simulating reconnaissance over a designated area (
Figure 22).
Regrouping: After completing their mission, the remote carriers returned to the main formation. The regrouping process followed a safe approach protocol to minimize interference and ensure a smooth return to their assigned positions within the formation (
Figure 23).
The following results and observations were obtained during the test (Figure 24).
Formation Stability: The formations were stable in simpler configurations, such as the Plus pattern. However, in more dynamic configurations like the Arrow formation, greater deviations were observed in the relative positions due to sensor navigation limitations.
Re-Tasking Efficiency: The remote carriers successfully completed their assigned mission within an acceptable timeframe, although synchronization during regrouping with the fighter experienced minor delays in scenarios with increased simulated traffic density.
Regrouping Success: Regrouping was successfully achieved in most trials, highlighting the importance of trajectory optimization algorithms in multi-UAV scenarios.
The following
Figure 25 and
Figure 26 show the different trajectories of the fighter and the remote carriers during various formation types, including Arrow, Line, and Z formations. They also illustrate the re-tasking of the two remote carriers as they temporarily leave the formation to execute a mission, followed by their subsequent regrouping with the fighter.
Analysis
To rigorously evaluate the safety and stability of the swarm’s decentralized collision avoidance mechanism, a detailed statistical analysis was conducted on inter-agent interactions throughout the flight. The methodology was applied pairwise to all UAVs in the swarm using synchronized trajectory data obtained from real flight experiments.
While the covariance matrices are not plotted for brevity, we observed consistent convergence behavior in the diagonal elements of the state covariance matrix across all estimation scenarios. The filter achieved steady-state values within the first 3–4 s of flight, indicating stable estimation performance.
First, inter-UAV distances were computed over time to classify proximity interactions into three severity levels:
Risk: separation between 10 and 40 m.
Danger: separation between 5 and 10 m.
Near-collision: separation below 5 m.
Each category was quantified in terms of total event count, duration in seconds, and percentage of the total mission time. In parallel, Time to Collision (TTC) was calculated dynamically for each UAV pair based on their relative position and velocity vectors. TTC values provided insight into imminent threats not necessarily captured by spatial proximity alone.
A critical converging event was defined as a situation in which the following occurred:
The UAVs had a relative heading <90°, indicating converging trajectories;
The TTC was less than a defined threshold (5 s);
The vertical separation was below 5 m.
These events were identified and visualized separately to highlight genuine risks of three-dimensional conflict, even in cases where horizontal separation appeared acceptable.
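An illustrative implementation of these criteria is sketched below, using the thresholds stated above; the closing-speed form of the TTC and the heading test are one possible realization, not necessarily the exact computation used in the analysis.

```python
import numpy as np

def time_to_collision(p_rel, v_rel):
    """Scalar TTC from relative position and velocity; valid only when the
    pair is actually closing (positive closing speed)."""
    rng = np.linalg.norm(p_rel)
    closing_speed = -float(p_rel @ v_rel) / rng      # positive when converging
    return rng / closing_speed if closing_speed > 1e-3 else np.inf

def classify_separation(distance_m):
    """Severity categories used in the pairwise analysis."""
    if distance_m < 5.0:
        return "near-collision"
    if distance_m < 10.0:
        return "danger"
    if distance_m <= 40.0:
        return "risk"
    return "nominal"

def is_critical_converging(p_rel, v_a, v_b, ttc_threshold_s=5.0, dz_threshold_m=5.0):
    """Critical converging event: relative heading below 90 deg, TTC below the
    threshold, and vertical separation below 5 m (illustrative implementation)."""
    heading_a = np.arctan2(v_a[1], v_a[0])
    heading_b = np.arctan2(v_b[1], v_b[0])
    d_heading = abs((heading_a - heading_b + np.pi) % (2 * np.pi) - np.pi)
    ttc = time_to_collision(p_rel, v_b - v_a)
    return d_heading < np.pi / 2 and ttc < ttc_threshold_s and abs(p_rel[2]) < dz_threshold_m

# A slower UAV roughly 30 m ahead on a nearly identical heading
p_rel = np.array([30.0, 2.0, 1.0])
v_a, v_b = np.array([26.0, 0.0, 0.0]), np.array([16.0, 1.0, 0.0])
print(classify_separation(np.linalg.norm(p_rel)), is_critical_converging(p_rel, v_a, v_b))
```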
Beyond discrete event counting, several additional metrics were computed to support claims of overall system stability:
Average separation distance between each UAV pair.
Percentage of time with valid TTC, indicating the predictability of motion geometry.
Rate of critical events per minute of flight, focusing exclusively on near-collisions and converging interactions.
All results were exported into a structured summary table (see
Table 7) and supplemented with visual timelines and trajectory-based plots (see
Figure 27 and
Figure 28). This combination of geometric, kinematic, and statistical indicators provides a robust foundation for validating the swarm’s ability to maintain separation autonomously under dynamically evolving mission conditions.
Table 7 presents the results of a detailed pairwise analysis of UAV trajectories, focusing on proximity and heading convergence as indicators of potential collision risk. The dataset evaluates key metrics for three UAV pairs: UAV 1–2, UAV 1–3, and UAV 2–3. In all experiments, UAV 1 is designated as the leader, and the relative estimation is performed from the perspective of UAV 2 (wingman). The notation “UAV 1–2” therefore indicates an estimation of UAV 2’s state relative to UAV 1.
The following indicators were computed: minimum distance reached, number and duration of risk-related events, and percentage of mission time spent in each risk category. The categories include general Risk Events, Danger Events, Near-Collision Events, and Critical Converging Events (i.e., proximity events with heading alignment suggesting imminent convergence).
Pair UAV 1–2 reached a minimum separation of 34.77 m, triggering 61 risk events. These events accumulated a total of 30.17 s, corresponding to 0.51% of the total mission time. Additionally, 22 critical converging events were detected, with a cumulative duration of 10.88 s (0.18% of the mission time). No danger or near-collision events were recorded, indicating that although frequent, these interactions were not severe in nature.
Pair UAV 1–3 experienced slightly closer interactions, with a minimum distance of 29.35 m. A total of 30 risk events were logged, lasting 14.84 s (0.25% of the mission). Despite the reduced event count, the number of critical converging events was higher than in UAV 1–2 (23 events, lasting 11.38 s, or 0.19% of the mission), suggesting more alignment in velocity vectors during close encounters.
Pair UAV 2–3 showed the most concerning behavior, with the closest minimum distance of 19.67 m. This pair experienced a significantly higher number of risk events (474), totaling 234.47 s, which represents 3.98% of the mission time—an order of magnitude higher than the other pairs. However, only 18 of these events were classified as critically converging, indicating that while close-range operations were frequent, their converging nature was less prominent (0.15% of the time). Still, the elevated number and duration of risk events suggest a persistent violation of conservative safety margins.
Overall, while none of the pairs registered any Danger or Near-Collision events (0% in all cases), the high number of Risk and Critical Converging events—especially in the case of UAV 2–3—highlights the need for tuning inter-agent spacing policies and potentially modifying the collision avoidance or path planning strategies for improved long-term safety and stability of the swarm during operation.