1. Introduction
Drone swarms are of great value in GPS-denied scenarios such as post-disaster search and rescue and regional surveillance. Cooperative localization is a key enabling technique for autonomous formation control, and its accuracy and stability directly determine the effectiveness of a swarm during missions. However, the high-speed maneuvering of nodes and signal masking in complex environments cause the network topology to evolve dynamically, which leads to strongly coupled ranging errors: nonlinear accumulation in the temporal dimension and multi-hop diffusion in the spatial dimension [1]. As a result, conventional methods face a dual challenge: localization divergence and the collapse of real-time performance.
This paper addresses three key problems in the cooperative localization of drone swarms under dynamic topology: a. dynamic topology representation: neighborhood relations vary over time because of node maneuvering and link masking, so the traditional static graph model fails [2]; b. error coupling mechanism: the joint effect of non-line-of-sight (NLOS) bias and abrupt motion changes causes the error to accumulate exponentially along the time axis and to diffuse through the network topology [3]; c. feasibility of decoupling: whether the spatio-temporally entangled error can be separated and its convergence controlled determines the intrinsic stability of the localization system.
Existing methods have fundamental limitations. Centralized algorithms (e.g., multidimensional scaling, MDS) rely on factorization of the global static distance matrix, and their $O(N^3)$ computational complexity is too high to cope with real-time topological changes. Moreover, they are extremely sensitive to even a single NLOS error [4]. Filter methods (e.g., the extended Kalman filter, EKF) lend themselves to distributed implementation but are prone to divergence in high-speed maneuvering scenarios because of their sensitivity to nonlinear and non-Gaussian noise [5]. Graph optimization frameworks (e.g., g2o) incorporate robust constraints but depend heavily on initial estimates and cannot guarantee convergence during abrupt topological changes [6]. Deep learning models that have become popular in recent years (e.g., Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT)) can capture the association between nodes but generally neglect spatio-temporal dynamics and lack rigorous theoretical interpretability [7]. None of these techniques establishes an explicit mathematical mapping between dynamic topology and localization error, let alone a proof of convergence.
To overcome these problems, this paper presents ST-DCL, a Spatio-Temporally Decoupled Cooperative Localization method. The core of the ST-DCL framework is the integration of a Dynamic Weighted Multidimensional Scaling (DW-MDS) optimizer with a Spatio-Temporal Graph Neural Network (ST-GNN) architecture. First, a spatio-temporal coupling equation for error propagation under dynamic topology is formulated, from which an upper bound on the error is rigorously derived. Based on this model, the DW-MDS optimizer attenuates the propagation of historical errors through an adaptive sliding window mechanism and blocks spatial diffusion paths through confidence weights. It is then proven that this optimizer converges at a linear rate to the global optimal solution within a compact convex set. Within the ST-DCL framework, the ST-GNN combines a dynamic topological attention mechanism with dilated causal convolution to capture the local nonlinear effects of abrupt topological changes. The dynamic topological attention mechanism adaptively adjusts the neighbor aggregation weights to suppress spatial diffusion, while the dilated causal convolution models long-term temporal dependence to curb error accumulation. A confidence feedback mechanism connects the DW-MDS with the ST-GNN and forms a closed-loop error suppression system. The analysis shows that the residual error correction generated by the ST-GNN converges in probability to a bounded neighborhood when the algebraic connectivity meets the derived threshold. The key innovations of this paper are reflected in four aspects:
1. We articulate the intrinsic dilemma between global consistency and local adaptability in dynamic swarm cooperative localization. Grounded in this perspective, a novel differential equation model is established to elucidate the joint impact of abrupt node maneuvering and network algebraic connectivity on spatio-temporal error propagation. This provides a theoretical foundation and a predictive error bound for the subsequent algorithm design.
2. We propose the ST-DCL framework, whose core innovation lies in a closed-loop spatio-temporal decoupling architecture. Via a confidence-aware feedback loop, it organically integrates a global Dynamic Weighted MDS (DW-MDS) optimizer with a local Spatio-Temporal Graph Neural Network (ST-GNN) corrector, achieving the targeted and synergistic suppression of the diverging spatial and temporal error components.
3. Theory-Guided Algorithmic Design with Component-Level Innovations:
DW-MDS Optimizer: Designed by integrating an adaptive temporal sliding window and dynamic spatial confidence weights, its update mechanism is directly linked to the derived error model. This design ensures convergence to a stable solution at a linear rate even under time-varying topologies.
ST-GNN Corrector: Its architecture features two newly designed dedicated components: (i) a Dynamic Topological Attention Module that actively regulates neighbor aggregation based on real-time link reliability to block spatial error diffusion, and (ii) a Dilated Causal Convolution Module that captures multi-scale temporal dependencies to mitigate error accumulation. This structure is a direct computational embodiment of the error suppression strategy inferred from our theoretical model.
4. We provide rigorous convergence analysis for both core modules. Extensive simulations under challenging conditions (e.g., high dynamics, NLOS) validate the framework’s effectiveness, demonstrating consistent performance gains over baseline methods. Beyond simulations, real-world flight experiments with a 10-UAV swarm confirm its practical deployability and superior accuracy in GPS-denied environments, achieving real-time performance on onboard embedded processors.
The core contribution of this work lies in a closed-loop spatio-temporal decoupling architecture based on confidence feedback, addressing the dilemma between global consistency and local adaptability under dynamic topology. Unlike existing hybrid methods that often treat learning modules as black-box enhancers, the proposed ST-DCL connects the global optimizer (DW-MDS) and the local corrector (ST-GNN) into a closed-loop system via confidence feedback derived from the error propagation model. This design aims to achieve the synergistic suppression of spatio-temporal error components and provides a theoretical analysis framework for convergence under dynamic topology, complementing existing methods in terms of interpretability and convergence guarantees.
The structure of this paper is as follows:
Section 2 outlines the relevant works.
Section 3 elaborates on a mathematical model for dynamic topology and error propagation.
Section 4 details the design principles of the DW-MDS optimizer and the ST-GNN.
Section 5 presents a systematic validation experiment and comparative analysis.
Section 6 gives a critical discussion on the limitations and prospects of the work and sums up the achievements and future focuses of the research.
2. Relevant Works
Cooperative localization, a key technique supporting drone swarms in GPS-denied environments, has evolved along three lines: optimization theory, state estimation, and data-driven learning. Nevertheless, three main challenges remain: dynamic topology, non-line-of-sight (NLOS) error, and the spatio-temporal coupling effect. This section systematically reviews the key methods in cooperative localization, with emphasis on their theoretical bases, applicability, and limitations, to provide the background for the proposed ST-DCL framework.
2.1. Localization Methods Based on Optimization
Optimization methods minimize the difference between observed and estimated distances through an objective function and are effective in networks whose topology is static or slowly changing. However, their performance often degrades sharply under the dynamic topological evolution and frequent NLOS interference caused by high-speed node maneuvering and complex environments, owing to high computational cost, sensitivity to outliers, and the lack of a spatio-temporal decoupling mechanism.
Centralized optimization methods are limited by globally coupled computation. Multidimensional scaling, a classical analysis method [8], relies on the spectral decomposition of the global distance matrix, and its $O(N^3)$ computational complexity makes it difficult to meet the real-time localization needs of large networks. Many variants have been developed to alleviate this difficulty. Among them, vMDS [9] introduces a temporal smoothness constraint to cope with abrupt changes in node trajectories, but it fails to inhibit the multi-hop diffusion of error in the spatial dimension. To account for link reliability, dwMDS [10] introduces spatial weights, but these weights are normally static or designed heuristically, which makes it difficult to cope with abrupt topological changes and unexpected NLOS errors. FC-MDS [11] assumes that the network is fully connected or fills in missing links with shortest-path distances. It resolves the connectivity problem but overlooks the differing reliabilities of links. Moreover, it cannot satisfactorily resist NLOS errors and noise, so errors are averaged across the network.
Distributed optimization methods attempt to reduce the cost of communication and computation through local computation. Ran et al. [12] constructed a cooperative navigation model based on a factor graph and adopted the sum-product algorithm to lower the communication cost. However, because the model assumes Gaussian NLOS error, it lacks robustness in NLOS scenarios. Yuhong et al. [13] proposed a method that integrates coordinate transforms and combines inertial navigation with relative observation, but they did not fully consider the time-varying topology, which results in a lack of spatial consistency. Han et al. [14] designed a full-coverage path planning method based on gradient descent, but its computational complexity rises sharply with the number of nodes, so it cannot be directly applied to very large swarms.
Hybrid architectures combine heuristic search with rolling optimization to balance computational efficiency and accuracy. Hu et al. [15] proposed a dynamic discrete pigeon-inspired optimization algorithm that integrates biomimetic intelligence with model predictive control, but they did not include a closed-loop mechanism for suppressing error propagation. Wu et al. [16] developed a clustering-based hybrid localization strategy: an optimization algorithm forms the clusters, an improved MDS performs centralized localization within each cluster, and coordinate fusion is then conducted between clusters. Their work provides a promising distributed solution for the cooperative localization of swarms in denied environments.
2.2. Dynamic System State Estimation
State estimation methods rely on system dynamics and observation models to localize targets through probabilistic estimation. They mainly take the form of filters and nonlinear observers. Nevertheless, these methods depend heavily on the degree of model linearity, the assumed noise distribution, and the initial state, so they are prone to performance degradation in environments with dynamic topology and complicated noise.
Kalman filter methods, e.g., the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), have been widely applied to multi-source sensor fusion. Liu et al. [17] adopted the IMU pre-integration technique to suppress linear error, but noticeable bias still appeared in attitude estimation during abrupt acceleration changes. An et al. [18] introduced a topology tracking control method based on an adaptive consensus protocol, which is extremely sensitive to non-Gaussian noise: in NLOS environments, its error variance increases by more than 200%. Farhat et al. [19] designed a nonlinear observer based on Lyapunov stability theory but did not model the diffusion of errors in multi-hop networks, which increases the localization bias during cooperative swarm maneuvers.
Particle filter (PF) methods approximate the posterior distribution of states by non-parametric sampling, so they are applicable to highly nonlinear systems. Li et al. [20] used a recurrent neural network (RNN) to predict the motion distribution and improve particle sampling efficiency. However, the resampling process is computationally expensive, so the method cannot meet the real-time needs of large swarms. Pan et al. [21] developed a visual–inertial simultaneous localization and mapping (VI-SLAM) system based on keyframe selection, which maintained real-time performance by reducing computation. Nevertheless, the probability of localization failure increased dramatically when visual features were missing.
2.3. Data-Driven Localization Model
With breakthroughs in deep learning, data-driven methods have provided a new paradigm for cooperative localization. These methods can automatically learn complicated associations between nodes, but they depend heavily on annotated data and offer poor interpretability. Moreover, they often ignore the physical constraints of the system, so their generalization is limited when training data are insufficient or the scenario is out of distribution.
Graph neural networks capture the spatial correlation between network nodes through message passing and neighborhood aggregation. Chen et al. [22] presented a graph convolutional network (GCN) that aggregates neighborhood information through Laplacian smoothing but cannot cope with dynamically changing topological relations. Ramezani et al. [23] developed a Graph Attention Network (GAT) with an adaptive neighborhood weight mechanism, but its computational complexity leads to significant latency when many nodes are involved. Dai et al. [24] applied the Transformer architecture to graph-structured data, which can model long-term dependence but fails to integrate the dynamic features of time and space effectively. As a result, the prediction error is large during high-speed swarm maneuvers.
In recent years, hybrid frameworks that combine learning and optimization have become a research frontier. Zhu et al. [25] proposed Graph VSLAM, which embeds a GNN into factor graph optimization to improve data association. Bo et al. [26] integrated neural radiance fields with state estimation in the NeuSEF framework to strengthen adaptability to complicated environments. Even so, these methods still face severe limitations: a. the system as a whole remains a “black box” because the spatio-temporal coupling mechanism of error is not explicitly modeled; b. their practical potential is limited by excessive dependence on massive annotated data and the extremely high cost of joint trajectory and range annotation for real dynamic swarms.
In summary, existing methods for cooperative localization cannot satisfactorily cope with the three challenges of dynamic topology, NLOS error, and spatio-temporal coupling. Optimization methods are mature but adapt poorly to change. State estimation methods are prone to divergence and rest on strong assumptions. Data-driven methods lack theoretical guarantees and generalization capability. The ST-DCL framework proposed in this paper integrates dynamic system theory, graph optimization, and deep learning. It is intended to construct a provably convergent cooperative localization system that is mathematically explainable, dynamically adaptable, and highly robust, addressing the limitations of the isolated approaches reviewed in this section.
3. System Model and Problem Description
In this section, a mathematical model is established for the cooperative localization of drone swarms under dynamic topology to elucidate the spatio-temporal coupling mechanism of the ranging error in GNSS-denied environments and to provide theoretical support for the subsequent algorithm design. By defining the dynamic network model, quantifying the regularity of error propagation, and formulating the optimization objective function, we systematically address the following key questions:
Dynamic topology characterization: How can the time-varying topology of networks, which arises from the high-speed maneuvering of nodes and communication masking, be modeled?
Error coupling mechanism: How does the ranging error accumulate nonlinearly in the spatial and temporal dimensions and cause localization divergence?
Decoupling feasibility: Is there a separable error propagation model that supports the stability control of cooperative localization?
3.1. Construction of a Network Model
First, the dynamic graph model is characterized. To capture the spatio-temporal evolution of highly maneuverable drone swarms, the network is modeled in this paper as a discrete-time dynamic graph sequence. In this sequence, the node set contains the drone nodes that survive at time t (allowing for the random failure of nodes); N is the swarm size; and the edge set consists of the two-way ranging links between nodes, whose connectivity varies with time.
Multidimensional scaling (MDS), a classical analysis method, estimates relative coordinates through the spectral decomposition of the distance matrix between nodes. For a network of N nodes, its localization process can be formalized as follows:
Step 1: Construct a distance matrix. The ranging matrix $D = [d_{ij}] \in \mathbb{R}^{N \times N}$ is constructed, where the element $d_{ij}$ is the observed Euclidean distance between nodes i and j.
Step 2: Double-center. The double-centered matrix is calculated as
$$B = -\tfrac{1}{2} J D^{(2)} J, \qquad J = I - \tfrac{1}{N} \mathbf{1}\mathbf{1}^{T},$$
where $J$ is the centering matrix, $D^{(2)}$ is the distance matrix with each element squared, and $\mathbf{1}$ is the all-ones vector.
Step 3: Perform eigenvalue decomposition. Eigenvalue decomposition is carried out for the matrix $B$ as follows:
$$B = V \Lambda V^{T},$$
where $V$ denotes the eigenvector matrix, and $\Lambda$ is the diagonal matrix containing the eigenvalues of $B$. The k largest eigenvalues (k = 2 or 3, representing 2D or 3D space, respectively) and their eigenvectors are taken to obtain the coordinate matrix of nodes:
$$X = V_k \Lambda_k^{1/2}.$$
The model performs well with static wireless sensor networks [1], but it has three inherent defects: a. the static assumption is invalid: it requires a constant distance matrix D, which does not match the time-varying ranging data produced by high-speed node motion; b. high sensitivity to anomalies: a single NLOS ranging error can be amplified by the matrix decomposition and distort the estimated coordinates of many nodes; c. lack of spatio-temporal correlation: the motion continuity constraint of nodes is not utilized, which leads to a lack of smoothness in trajectory estimation.
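The three classical MDS steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this paper: the function names, the random test layout, and the 30 m NLOS bias are assumptions chosen only for the example. The second half of the sketch corrupts a single link to illustrate defect b.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def classical_mds(D, k=2):
    """Classical MDS: relative coordinates (N x k) from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # B is symmetric
    idx = np.argsort(eigvals)[::-1][:k]       # k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)    # guard against negative values
    return eigvecs[:, idx] * np.sqrt(lam)

# Hypothetical static layout: 6 nodes in a 100 m x 100 m area.
rng = np.random.default_rng(0)
X_true = rng.uniform(0.0, 100.0, size=(6, 2))
D = pairwise_distances(X_true)

# Noise-free case: the relative geometry is recovered (up to rotation/reflection).
X_hat = classical_mds(D, k=2)

# One NLOS-corrupted link (assumed +30 m bias on link 0-1).
D_nlos = D.copy()
D_nlos[0, 1] = D_nlos[1, 0] = D[0, 1] + 30.0
err = np.abs(pairwise_distances(classical_mds(D_nlos, k=2)) - D)
err[0, 1] = err[1, 0] = 0.0                   # ignore the corrupted link itself
spread = err.max()                            # error induced on the other links
print(f"max distance error on uncorrupted links: {spread:.3f} m")
```

Because the eigendecomposition acts on the whole matrix, the bias on one link also changes the recovered distances of links that were measured correctly.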
Compared with traditional static graph models, the proposed dynamic graph model has the following key characteristics for dynamic adaptability:
Definition 1 (dynamic topology constraint). At any time t, whether a communication link exists between nodes i and j is determined by three quantities: the 3D position coordinates of the nodes at time t, the maximum communication radius under ideal line-of-sight (LOS) conditions, and the link-masking state, which marks the link as either LOS or NLOS. The model uses a time-varying NLOS probability to reflect the coupling effect of electromagnetic interference and terrain masking, which is closer to reality than the fixed NLOS probability adopted in existing studies.
In real-world networks, communication links can suffer from intermittent disruptions due to physical obstructions or interference. Our model treats the link state as a stochastic process that evolves over time. The sliding window mechanism in DW-MDS enables the system to leverage temporal redundancy when partial observations are missing. Meanwhile, the dynamic attention weight in the ST-GNN (see Equation (28) for details) adjusts in real time based on the historical quality of the link. When a link is interrupted, its corresponding attention weight is automatically attenuated, and the system then relies on multi-hop information propagation to maintain positioning.
Second, a model is built for ranging error coupling. The data link of drones provides the observed relative range between nodes, and its error can be decomposed into two components: the bias caused by NLOS, which follows a zero-mean Gaussian distribution, and the measurement noise of hardware, which also follows a zero-mean Gaussian distribution. The Bernoulli-based NLOS masking model and the zero-mean Gaussian bias assumption are tractable simplifications adopted to facilitate theoretical derivation, initial algorithm design, and preliminary simulation verification. In practical environments, NLOS errors typically exhibit non-Gaussian, heavy-tailed, or time-varying biased characteristics [27]. However, the core algorithmic components of the proposed framework, namely the Huber loss function within the Dynamic Weighted Multidimensional Scaling (DW-MDS) and the dynamic topological attention mechanism in the Spatio-Temporal Graph Neural Network (ST-GNN), are specifically designed to enhance robustness against non-ideal errors, including outliers and time-varying biases. Subsequent real-world flight tests (Section 6) will be conducted in highly realistic, complex electromagnetic and topographical conditions to directly evaluate the framework’s capability to mitigate actual NLOS errors.
Definition 2 (dynamic ranging constraint). The observation set of each node pair is defined over a sliding time window. The window length T adapts to the maximum acceleration of the nodes and is computed from the maximum acceleration vector of the drones and the discrete time step length. This design ensures that the temporal relevance of the ranging data reflects the maneuvering characteristics of the nodes and avoids the loss of motion information caused by a fixed window length. It is elaborated further in Section 4.1.1.
3.2. Analysis of Error Coupling Mechanism
In drone swarms under dynamic topology, the spatio-temporal coupling effect of the ranging error is characterized by nonlinear accumulation in the temporal dimension and multi-hop diffusion in the spatial dimension, as shown in Figure 1. In this section, stochastic process analysis and network flow theory are adopted to rigorously establish the following two key aspects:
(1) Accumulation effect on time
We define the localization error of node i as the deviation of its estimated position from its real position. The evolution of the localization error satisfies a recursive equation with three driving terms. The first is the velocity coupling coefficient, a function of the velocity change of the node that is parameterized by a threshold velocity change, which triggers significant error amplification, and a scale parameter, which controls the smoothness of the transition from normal to amplified error accumulation. This function represents the amplification of error accumulation by the abrupt change in velocity at node i. The second is the NLOS deviation term, which reflects the squared enhancement of the ranging error by the density of masking in the environment; it depends on the probability or proportion of communication paths between nodes subject to NLOS at time t, the maximum ranging deviation under NLOS, and the ranging error between node i and node j at time t. The third is a standard normal random variable representing the normalized measurement noise. Two coefficients scale these terms: the equivalent propagation coefficient for the NLOS deviation, a dimensionless constant that quantifies the average amplification of a single NLOS ranging error into the node localization error, and the standard deviation of the measurement noise (as defined in Equation (4)), which scales the normalized noise to its actual magnitude.
Theorem 1 (upper bound of error accumulation). In drone swarms under dynamic topology, the accumulation of node localization error in the temporal dimension is bounded from above. The proof is detailed in Appendix A: the recursive equation is expanded by mathematical induction and combined with Hölder's inequality to obtain the exponential upper bound on the accumulated error. Under high-speed maneuvering, the error tends to diverge. This explains the root cause of the failure of traditional localization algorithms under dynamic topology.
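The qualitative behavior stated in Theorem 1 can be illustrated with a deliberately simplified scalar recursion. The sketch below is not the paper's error equation: it assumes the form e[t+1] = gain * e[t] + bias + sigma * xi[t], where `gain` is a hypothetical stand-in for the amplification caused by velocity coupling, `bias` for the NLOS deviation term, and `sigma * xi` for the measurement noise.

```python
import numpy as np

def simulate_error(gain, bias, sigma, steps, e0=0.0, seed=0):
    """Iterate the assumed recursion e[t+1] = gain * e[t] + bias + sigma * xi[t]."""
    rng = np.random.default_rng(seed)
    e = np.empty(steps + 1)
    e[0] = e0
    for t in range(steps):
        e[t + 1] = gain * e[t] + bias + sigma * rng.standard_normal()
    return e

def noise_free_limit(gain, bias):
    """Steady-state error of the noise-free recursion; infinite when gain >= 1."""
    if gain >= 1.0:
        return float("inf")
    return bias / (1.0 - gain)

# Gentle maneuvering (assumed gain < 1): the error settles near bias / (1 - gain).
stable = simulate_error(gain=0.9, bias=1.0, sigma=0.0, steps=200)

# Aggressive maneuvering (assumed gain > 1): the error grows geometrically.
unstable = simulate_error(gain=1.1, bias=1.0, sigma=0.0, steps=50)

print(stable[-1], unstable[-1])
```

Under this assumed form, the accumulated error stays bounded only while the per-step gain is below one, which is the kind of threshold behavior the theorem attributes to high-speed maneuvering.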
(2) Spatial diffusion effect
For drone swarms under dynamic topology, the spatial diffusion effect refers to the phenomenon in which the localization error of a single node propagates hop by hop through the multi-hop communication links of the network. It is rooted in the dependence inherent in cooperative localization: each node relies on the estimated positions of neighboring nodes to determine its own position. If node i has an error arising from NLOS or high-speed maneuvering, this error propagates in the following ways:
① Direct propagation: The error of node i distorts its measured distance to the direct neighbor j, which results in a localization bias of j that is scaled by the normalized propagation coefficient and the confidence weight of the link.
② Multi-hop cascade: After the error diffuses through k hops, its upper bound at node m is determined by the hop count k, the average connectivity of the network, and the largest single-hop error.
Dynamic topology increases the uncertainty of diffusion. Node maneuvering makes the neighbor set of each node time-varying, which produces asymmetric propagation paths (e.g., errors aggregate towards the center of a formation during centripetal movement). In NLOS environments, diffusion is further accelerated. Therefore, the algorithm must simultaneously suppress the generation of errors (e.g., robust optimization in DW-MDS) and block the propagation paths (e.g., the topological attention mechanism in the ST-GNN).
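To make the multi-hop cascade concrete, the sketch below counts hops from an erroneous node with a breadth-first search and evaluates an assumed geometric bound. The per-hop factor `gain`, the function names, and the 5-node topology are hypothetical and are not taken from the paper's bound; they only show how a per-hop factor below one attenuates an error with distance, while a factor above one would amplify it.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def cascade_bound(e_max, gain, k):
    """Assumed geometric bound: each hop scales the worst-case error by `gain`."""
    return e_max * gain ** k

# Hypothetical 5-node topology; node 0 carries a 2 m single-hop error.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
hops = hop_distances(adjacency, source=0)
bounds = {m: cascade_bound(e_max=2.0, gain=0.5, k=k) for m, k in hops.items()}
print(hops)
print(bounds)
```

In a dynamic topology the adjacency lists change at every time step, so the hop counts, and with them the bounds, must be recomputed as links appear and disappear.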
3.3. Modeling of the Mathematical Problem
This section transforms the problem of cooperative localization under dynamic topology into a tractable optimization model and resolves two key tensions: a. the tension between measurement accuracy and dynamic adaptability, since a traditional static optimization model cannot adapt to abrupt changes in the observation matrix under time-varying topology; b. the tension between error suppression and computational efficiency, since spatio-temporal coupling errors must be suppressed while the real-time constraints of drone platforms are met. To this end, a spatio-temporal joint optimization framework is constructed as follows:
First, an objective function is developed. Integrating the characteristics of dynamic topology and the principles of error coupling, a multi-objective optimization function is formulated over the position set of all nodes in the time window [t − T, t]. It consists of three terms. The dynamic distance fitting term optimizes localization accuracy under dynamic topology by minimizing the deviation of the measured distances from the distances between estimated positions, with the Huber loss function adopted to suppress NLOS outliers. The trajectory total variation [28] term guarantees the continuity of node motion by regularizing the total variation, with a regularization coefficient that controls the weight of trajectory smoothness. The covariance propagation suppression term uses the spectral characteristics of the Laplacian matrix L to bound the propagation of the error covariance matrix, with a second regularization coefficient that controls the weight of covariance propagation suppression.
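The first two terms of the objective can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact objective: the covariance propagation suppression term is omitted, and the array layout, the parameter names `delta` and `lam_tv`, and the toy two-node example are hypothetical.

```python
import numpy as np

def huber(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def pair_dist(X):
    """X: (T, N, d) positions -> (T, N, N) inter-node distances per time step."""
    diff = X[:, :, None, :] - X[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)

def objective(X, ranges, weights, delta=1.0, lam_tv=0.1):
    """Weighted Huber distance fit plus total variation of the trajectories."""
    residual = ranges - pair_dist(X)
    fit = 0.5 * np.sum(weights * huber(residual, delta))      # each pair counted once
    tv = np.sum(np.linalg.norm(np.diff(X, axis=0), axis=-1))  # step-to-step motion
    return fit + lam_tv * tv

# Toy window: 3 time steps, 2 static nodes that are 3 m apart.
X = np.zeros((3, 2, 2))
X[:, 1, 0] = 3.0
W = np.tile(1.0 - np.eye(2), (3, 1, 1))      # unit weights, no self-links
ranges_clean = pair_dist(X)

# One NLOS outlier: +3 m on the link at the middle time step.
ranges_nlos = ranges_clean.copy()
ranges_nlos[1, 0, 1] += 3.0
ranges_nlos[1, 1, 0] += 3.0

J_clean = objective(X, ranges_clean, W)
J_nlos = objective(X, ranges_nlos, W)
print(J_clean, J_nlos)
```

With `delta = 1`, the 3 m outlier contributes 2.5 to the fit term, whereas a squared loss would contribute 4.5; the linear tail is what limits the pull of NLOS outliers on the solution.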
Secondly, a model is built under constraints to ensure that the solution space of the optimization problem is practically applicable to the systems. Compared with the traditional problems of localization, special attention should be paid to three key constraints under dynamic topology as follows:
① Communication topology constraint: A link between two nodes belongs to the edge set at time t, and the corresponding entry of the adjacency matrix equals 1, if and only if the distance between the two nodes does not exceed the communication radius. This constraint requires the optimizer to respect the time-varying communication topology when solving for the positions of nodes. When the distance between two nodes exceeds the communication radius, their ranging link immediately fails (the adjacency entry becomes 0), which prevents the algorithm from erroneously using outdated observations.
② Kinematic constraint: The speed of each node is bounded by the maximum cruising velocity, and its acceleration over the discrete time step length is bounded by the maximum acceleration. This constraint guarantees that the solved node trajectories match the dynamic features of drones, so that the algorithm does not generate estimated positions beyond the capability of the drones.
③ Error diffusion constraint: The trace of the error covariance matrix at time t is bounded in terms of the maximum RMS error allowed for a single node and the total number of nodes in the swarm. By restricting the trace of the global error covariance matrix, this constraint prevents the exponential diffusion of localization errors in the network. It is essentially an application of Lyapunov stability to the localization problem, ensuring that the system still converges to a bounded region regardless of disturbances.
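The three constraints can be checked numerically as sketched below. The exact inequalities are assumptions made for illustration: a link exists when the inter-node distance is at most `radius`, speed and acceleration are obtained by finite differences, and the covariance trace is compared with `n_nodes * eps_max**2`. All function and parameter names are hypothetical.

```python
import numpy as np

def adjacency_from_positions(P, radius):
    """a_ij = 1 iff distinct nodes i and j are within the communication radius."""
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    A = (dist <= radius).astype(int)
    np.fill_diagonal(A, 0)
    return A

def kinematically_feasible(traj, dt, v_max, a_max):
    """traj: (T, d) positions of one node; finite-difference speed/acceleration check."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    speed_ok = np.all(np.linalg.norm(vel, axis=-1) <= v_max)
    accel_ok = np.all(np.linalg.norm(acc, axis=-1) <= a_max)
    return bool(speed_ok and accel_ok)

def error_budget_ok(Sigma, n_nodes, eps_max):
    """Assumed trace bound: tr(Sigma) <= n_nodes * eps_max**2."""
    return bool(np.trace(Sigma) <= n_nodes * eps_max ** 2)

# Three nodes on a line; with a 100 m radius, nodes 0 and 2 are out of range.
P = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [120.0, 0.0, 0.0]])
A = adjacency_from_positions(P, radius=100.0)

# A steady 5 m/s track is feasible; a 35 m jump in one 1 s step is not.
traj_ok = np.array([[0.0, 0, 0], [5.0, 0, 0], [10.0, 0, 0], [15.0, 0, 0]])
traj_bad = np.array([[0.0, 0, 0], [5.0, 0, 0], [40.0, 0, 0]])
ok = kinematically_feasible(traj_ok, dt=1.0, v_max=10.0, a_max=3.0)
bad = kinematically_feasible(traj_bad, dt=1.0, v_max=10.0, a_max=3.0)

# Covariance of 3 nodes in 3D with 0.5 m^2 variance per coordinate (trace 4.5).
Sigma = 0.5 * np.eye(9)
within = error_budget_ok(Sigma, n_nodes=3, eps_max=1.5)
exceeded = error_budget_ok(Sigma, n_nodes=3, eps_max=1.0)
print(A, ok, bad, within, exceeded)
```

Checks of this kind can be used to reject candidate solutions that violate the physical limits of the platform before they are passed on to later stages.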
4. Spatio-Temporal Decoupling Cooperative Localization
To overcome the localization divergence caused by the spatio-temporal coupling error under dynamic topology, this paper proposes ST-DCL, a Spatio-Temporally Decoupled Cooperative Localization method with algebraic connectivity-guided closed-loop feedback. The core of the ST-DCL framework is the cooperative operation of a Dynamic Weighted Multidimensional Scaling (DW-MDS) optimizer and a Spatio-Temporal Graph Neural Network (ST-GNN) corrector. The DW-MDS optimizer is dedicated to efficient coarse localization and primary spatio-temporal decoupling at the global level, while the ST-GNN corrector focuses on nonlinear error compensation and spatio-temporal dependence modeling at the local level. These two components are interconnected via a confidence feedback mechanism, which drives a closed-loop system that reinforces their mutual performance and constitutes a theoretically grounded and provably convergent estimation framework. The overall architecture is illustrated in Figure 2.
4.1. Framework of Dynamic Weighted Multidimensional Scaling (DW-MDS)
Traditional multidimensional scaling (MDS) methods are based on the static decomposition of a global distance matrix, so they cannot adapt to the abrupt changes in the observation matrix arising from dynamic topology. For this reason, a Dynamic Weighted MDS (DW-MDS) framework is proposed as a core component of ST-DCL to deliver spatio-temporal decoupling through a sliding window mechanism and adaptive weight assignment.
The DW-MDS manages error propagation through dual temporal and spatial weighting mechanisms. The adaptive sliding window mechanism (Equations (15) and (16)) assigns higher weights to recent observations via exponential decay, aiming to reduce the influence of outdated erroneous estimates on the current state, thereby mitigating temporal error accumulation. The adaptive spatial confidence weights (Equations (17) and (18)) dynamically adjust the contribution of each ranging measurement based on link type (LOS/NLOS) and packet loss rate, aiming to weaken the spatial diffusion of errors caused by unreliable links. Together, they provide attenuation paths for error propagation in both temporal and spatial dimensions.
4.1.1. Sliding Window Mechanism
In this design, attention is paid to resolving the problem of error accumulation caused by outdated historical data under dynamic topology. The weight of exponential attenuation is assigned as follows:
where the attenuation factor
is negatively correlated with the error accumulation rate
, that is,
, so that the weight of historical data decays as errors propagate. The adaptive window length is set to
where the window length
is based on kinematic constraints, representing the theoretical minimum time required for a UAV to accelerate from a standstill to a displacement exceeding the communication radius [
29].
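As a rough illustration of the sliding window, the sketch below computes a window length from the kinematic argument above and the corresponding exponential weights. It assumes constant acceleration from rest, so that the minimum time to cover the communication radius R is sqrt(2R / a_max), and it assumes the weight form w(τ) = exp(−λτ). The rounding rule and the parameter values in the usage note are hypothetical.

```python
import math

def window_length(comm_radius, a_max, dt):
    """Steps needed to travel comm_radius from rest at constant a_max: t = sqrt(2R/a)."""
    t_min = math.sqrt(2.0 * comm_radius / a_max)
    return max(1, math.ceil(t_min / dt))  # rounding up is an assumption

def temporal_weights(window, decay):
    """Exponential-decay weights w(tau) = exp(-decay * tau) for ages tau = 0..window-1."""
    return [math.exp(-decay * tau) for tau in range(window)]
```

For example, with R = 5000 m, an assumed a_max of 20 m/s², and a 1 s time step, `window_length` returns 23 steps.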
4.1.2. Adaptive Spatial Weight Assignment
When NLOS exists under dynamic topology, the confidence weight of each link should be dynamically adjusted to control the destructive influence of NLOS and high packet loss links on localization accuracy. The adaptive spatial weight function is designed to quantify the quality of links by weight:
where
is the NLOS indicator (taking the value 0 or 1), detected by the link model in Equation (3);
is the packet loss rate; and
and
are coefficients set to 0.5 and 0.3, respectively. The spatial weight is multiplied by the sliding window weight w(τ) to obtain the final weight:
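The way the two weights combine can be sketched as follows. The functional form of the spatial weight is an assumption here: a linear penalty 1 − α·I_NLOS − β·p is used purely for illustration, with α = 0.5 and β = 0.3 taken from the text. Only the final multiplication by the temporal weight is stated in the source.

```python
def spatial_weight(is_nlos, loss_rate, alpha=0.5, beta=0.3):
    """Assumed linear penalty: 1 - alpha * NLOS flag - beta * packet loss rate."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must lie in [0, 1]")
    return 1.0 - alpha * (1.0 if is_nlos else 0.0) - beta * loss_rate

def final_weight(is_nlos, loss_rate, temporal_weight):
    """Final link weight: spatial confidence times the sliding-window weight w(tau)."""
    return spatial_weight(is_nlos, loss_rate) * temporal_weight
```

Under this assumed form, a clean LOS link keeps full weight, while an NLOS link with total packet loss is reduced to 0.2 rather than removed outright.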
4.1.3. Improved Stress Function
In traditional MDS, only geometrical constraints are taken into account, and the dynamic characteristics of nodes are ignored. In this section, the motion smoothing constraint is included in the MDS optimization to suppress the error propagation caused by the abrupt change in trajectory. The stress function is defined as
where
represents the estimated discrete velocity, and
is the regularization coefficient, which is determined by the maximum velocity error limit
.
The improved stress function is solved by iterative optimization to approach the global optimal solution under dynamic constraints in the following steps:
Step 1: Initialize and construct the initial coordinate matrix based on the node position and velocity in the output of the inertial navigation system (INS).
Step 2: Calculate the weighted distance matrix
with Guttman transform, and update it in the following rule:
where
is the estimated distance matrix of the current iteration, and
represents the Hadamard product. By virtue of the contraction mapping, the nonlinear optimization is transformed into a linear iteration.
Step 3: Perform the projection update followed by geometric centering. The geometric invariance of MDS is exploited to ensure the spatial consistency of solutions.
Step 4: Utilize velocity constraint projection to perform the projection gradient descent of the updated coordinates:
where
is the set of dynamic constraints. This operation performs a projection onto a convex set, which guarantees that the solution always remains in the feasible domain. In summary, Algorithm 1 outlines the iterative DW-MDS optimizer. Its core steps are as follows: computing a transformed distance matrix via the Guttman transform (lines 3–4), updating node positions (line 5), enforcing kinematic constraints through projection (lines 6–10), and checking for convergence (lines 12–14). The pseudo-code of the DW-MDS algorithm is as follows:
Algorithm 1. DW-MDS Algorithm
Input: Ranging observation set , adaptive spatial weight , temporal attenuation weight , initial position , maximum iteration times , convergence threshold , and maximum velocity constraint in the current time window
Output: Estimated optimized position
1. Initialize: , calculate the initial global weight matrix
2. for do
3.   Guttman transform: Based on , calculate the estimated distance ; calculate the transform matrix
4.   Position update:
5.   for each node i do
6.     Calculate the estimated velocity:
7.     if then
8.
9.       Rectify the position backward based on the corrected velocity
10.    end if
11.  end for
12.  if then
13.    Break
14.  end if
15. end for
16.
17. return
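The velocity constraint projection inside the node loop of Algorithm 1 can be sketched as below. This is a minimal illustration, assuming the feasible set is a ball of radius v_max·Δt around the node's previous position; the function name and argument layout are hypothetical.

```python
import math

def project_velocity(pos_new, pos_prev, v_max, dt):
    """Clip the implied velocity to v_max and rebuild the position from it.

    Equivalent to a Euclidean projection of pos_new onto the ball of radius
    v_max * dt centered at pos_prev (assumed shape of the feasible set).
    """
    vel = [(n - p) / dt for n, p in zip(pos_new, pos_prev)]
    speed = math.sqrt(sum(v * v for v in vel))
    if speed <= v_max:
        return list(pos_new)           # already feasible, keep the update
    scale = v_max / speed              # shrink the velocity onto the bound
    return [p + v * scale * dt for p, v in zip(pos_prev, vel)]
```

Because a ball is convex, this projection keeps every iterate inside the feasible domain without changing the direction of the update.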
4.1.4. Convergence Guarantee and Complexity Control of DW-MDS Algorithm
(1) Convergence of the algorithm
Convergence theorem: Provided the regularization parameter satisfies (with as the spectral radius and L as the Laplacian matrix), the improved stress function has a unique global minimum on the compact convex set . Moreover, the DW-MDS iteration converges to this solution at a linear rate.
Proof (of convergence theorem). Step 1: Construct a Hessian matrix. The Hessian matrix of the improved stress function
is as follows:
where
is the block diagonalized dynamic weight matrix;
is the expanded graph Laplacian matrix; and
is the 3 × 3 identity matrix.
Step 2: Prove the positive definiteness. For any non-zero vector
, we have
Since W is a diagonally dominant matrix and L is positive semi-definite, is positive definite when .
Step 3: Apply the Banach fixed-point theorem and define the mapping
. Its Lipschitz constant
satisfies
According to the Banach fixed-point theorem, the iteration converges to a unique fixed point.
Conclusion: The improved stress function is strongly convex on the compact convex set . The DW-MDS algorithm converges linearly at a rate of .
The convergence condition requires the regularization parameter to satisfy . In practice, is not known a priori and varies with the dynamic topology. Therefore, we employ a practical strategy by setting , where is a conservative estimate of the spectral radius based on the average node degree, and is a safety factor greater than 1 (set to in our simulations to ensure under varying conditions). □
(2) Complexity of the algorithm
The computational complexity of the DW-MDS framework has three dominant parts: a. complexity of dynamic weight update
, which needs to traverse the links in the window at all times (
operations in total); b. complexity of Guttman transform
, which is mainly intended to complete the computation of
floating points involved in the weighted distance matrix B; and c. complexity of projection gradient descent
, which mainly executes the mapping to the velocity of each node (computation of N vector norms). The total time complexity is as follows:
Compared with the complexity of traditional MDS , the computational efficiency is improved by one to two orders of magnitude when T≪N (typical value ).
4.2. Spatio-Temporal Graph Neural Network (ST-GNN)
The DW-MDS framework delivers global coarse localization but is limited by its sliding window mechanism and struggles to capture the nonlinear propagation of local errors arising from abrupt topological changes. Therefore, the Spatio-Temporal Graph Neural Network (ST-GNN) is proposed as the second core component of ST-DCL, serving as a local correction module that works in mutual compensation with the DW-MDS. Its key innovations include dynamic spatio-temporal attention (adaptively fusing heterogeneous spatio-temporal features to address the over-smoothing problem of traditional GNNs), dilated causal convolution (modeling long-term time series dependence to suppress the butterfly effect of error accumulation), and a confidence feedback mechanism (feeding local corrections back to the DW-MDS to achieve closed-loop optimization). The overall structure of the ST-GNN is presented in
Figure 3.
The ST-GNN designed in this paper differs from standard Spatio-Temporal Graph Neural Networks, primarily in its components tailored to dynamic swarm localization. The dynamic spatio-temporal attention module (
Section 4.2.1) explicitly incorporates relative kinematic features (relative velocity) and link confidence into the attention weight calculation, enabling it to respond to topology changes and link reliability variations caused by maneuvers beyond mere node feature similarity. The choice of dilated causal convolution (
Section 4.2.2) over recurrent neural networks (RNNs) or temporal self-attention is mainly based on two considerations: first, its deterministic receptive field can cover long time spans, facilitating alignment with the time scale of error accumulation; second, its feedforward computation structure is more amenable to parallel processing, which is beneficial for meeting the determinism and low-latency requirements of UAV onboard computing platforms.
4.2.1. Dynamic Spatio-Temporal Attention Layer
This layer addresses the problem of inaccurate feature aggregation caused by time-varying neighbor relationships under dynamic topology, enhancing robustness against sudden link disruptions.
Firstly, multimodal feature encoding must be performed as the fundamental information processing unit for the dynamic spatio-temporal attention mechanism. Its key objective is to integrate heterogeneous sensor data with state estimates and construct a feature space that comprehensively describes the dynamic behavior of nodes so as to provide sufficient information support for the subsequent calculation of spatio-temporal attention weights:
where
are inherited from the output of the DW-MDS in
Section 4.1 and provide the kinematic prior;
is the residual error that is the difference between the estimated position and the output of the DW-MDS, and it is used to quantify the need for local correction; and
is the confidence level, which reflects the reliability of the DW-MDS weight matrix and is used for gate control regulation.
Secondly, the dynamic attention weights are computed; this is the core decision-making unit of the dynamic spatio-temporal attention mechanism. Its primary goal is to adaptively adjust the intensity of information transfer among neighboring nodes based on real-time topological changes and node states, so that critical information is enhanced and abnormal interference is suppressed in the dynamic denied environment. The aggregation weights are adjusted on the basis of relative motion features:
where
indicates the relative velocity between two nodes;
is the trainable weight matrix to realize nonlinear mapping in the feature space. If the nodes i and j are separated quickly (
is too large) and
, potential abnormal links are automatically cut off.
Finally, the gated state is updated to dynamically balance the historical state and the neighbor information as follows:
where
is the sigmoid function, whose output lies in (0, 1), and
is the Hadamard product, which applies the weighting at the feature level.
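A gated update of this kind can be sketched as follows. The convex blend h = g ⊙ h_prev + (1 − g) ⊙ m is an assumed, GRU-like form chosen for illustration; the source only states that a sigmoid gate and a Hadamard product balance historical state against neighbor information. The names `h_prev`, `neighbor_msg`, and `gate_logits` are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function; output lies strictly between 0 and 1 for moderate x."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_update(h_prev, neighbor_msg, gate_logits):
    """Blend history and aggregated neighbor features element-wise (Hadamard)."""
    gates = [sigmoid(z) for z in gate_logits]
    # Assumed form: gate near 1 keeps history, gate near 0 adopts the neighbor message.
    return [g * h + (1.0 - g) * m for g, h, m in zip(gates, h_prev, neighbor_msg)]
```

In a trained network the gate logits would come from a learned linear map of the concatenated features; here they are passed in directly to keep the sketch self-contained.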
4.2.2. Dilated Causal Convolution Layer
This component is designed to model long-term temporal dependencies, thereby suppressing the exponential accumulation of errors identified in the temporal dimension analysis (
Section 3.2). Traditional time series models (e.g., RNN) often fail to capture such long-range dependencies effectively.
Firstly, multi-scale dilated convolutions are designed to capture multi-scale temporal patterns (instantaneous maneuvering, medium-term trend, and long-term cycle) in drone swarm motion through convolution kernels with different dilation factors. This design helps address the short-sightedness of traditional RNNs, which suffer from vanishing gradients. Assume L stacked dilated convolution layers, each with a dilation factor
. The dilation factor
covers a time span corresponding to the term
in the error recursive equation in
Section 3.2, meaning that the receptive field is large enough to suppress the effect of accumulation. The time series characteristics are extracted by the following formula:
where
represents the width of the convolution kernel;
is the dilation factor of the lth layer (
= 1 for the first layer,
= 2 for the second layer, and
= 4 for the third layer). The receptive field expands exponentially with depth (e.g., covering the time t − 7 to t when L = 3), so that the model covers error propagation paths of seven time steps. This overcomes the traditional RNN’s limitation of capturing dependence over only 3–5 steps. Moreover,
corresponds to the bias vector of the convolution layer.
Secondly, strict causal constraints are imposed. The temporal model is built with historical information only, satisfying the causality requirement of online drone swarm localization (future data cannot be used). A zero-padding strategy is adopted, with zero initialization at times when
:
The output series is shifted leftwards by the position to eliminate the risk of future information leakage and to satisfy strict causality, so that the output at time t relies only on the data [, t].
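A single-channel causal dilated convolution with zero padding can be sketched in a few lines. This is a simplified scalar illustration, not the multi-channel layer used in the network; the function names are hypothetical. The helper `receptive_field` uses the standard formula 1 + (K − 1)·Σd for stacked layers: with dilations 1, 2, 4, a kernel width of 2 gives 8 steps (t − 7 to t), while a width of 3, as quoted in the complexity analysis, would give 15.

```python
def causal_dilated_conv(series, kernel, dilation, bias=0.0):
    """1-D causal convolution: y[t] = bias + sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the series are treated as zeros (zero padding),
    so y[t] never depends on x[t'] for t' > t.
    """
    out = []
    for t in range(len(series)):
        acc = bias
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:               # indices before t = 0 contribute zero
                acc += w * series[idx]
        out.append(acc)
    return out

def receptive_field(kernel_width, dilations):
    """Number of time steps (current one included) seen by the stacked layers."""
    return 1 + (kernel_width - 1) * sum(dilations)
```

Changing a future sample leaves all earlier outputs untouched, which is the causality property required for online localization.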
Finally, multi-level feature fusion adaptively combines motion features at different time scales, enhancing the ability to characterize complex maneuvering patterns. Attention weights are calculated by
where
reflects the importance of dilated convolution features in the mth layer. In a scenario of unexpected maneuvering, the weight
of the lower layer (
) increases. In the scenario of cruising at a constant velocity, the weight
of the higher layer (
) dominates. The output of feature fusion is as follows:
The multi-scale time series features are dynamically integrated to make the model not only respond to unexpected states (depending on short-term features) but also suppress long-term error accumulation (depending on long-term features).
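The fusion step can be illustrated with a softmax over per-layer scores followed by a weighted sum. How the scores are produced is not recoverable from the text, so they are passed in as plain numbers here; `layer_scores` and `fuse_layers` are hypothetical names.

```python
import math

def softmax(scores):
    """Numerically stable softmax; the returned weights sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_layers(layer_features, layer_scores):
    """Weighted sum of per-layer feature vectors using softmax attention weights."""
    weights = softmax(layer_scores)
    dim = len(layer_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]
```

A high score for a low (short-range) layer shifts the fused feature toward recent motion, matching the behavior described for unexpected maneuvers.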
4.2.3. Residual Correction and Feedback Mechanism
This mechanism ensures kinematic feasibility while compensating for local errors and feeds the correction results back to the global DW-MDS optimizer, realizing the closed-loop error suppression that is central to the ST-DCL framework.
where
is the maximum allowed localization bias.
Secondly, residual errors are fused under the drive of the confidence level to dynamically adjust the global and local weights of localization results:
where
is the confidence level. The higher the confidence level
, the larger the
, which implies a stronger trust in local correction. Moreover,
is a bias scalar used to adjust the baseline value of the weight
.
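The clipping and confidence-driven fusion can be sketched as below. The form is assumed for illustration: the correction is first limited to the maximum allowed bias, and then scaled by a sigmoid gate of the confidence level plus a bias scalar. The parameter names `gain`, `bias`, and `delta_max` and their defaults are hypothetical.

```python
import math

def clip_correction(delta, delta_max):
    """Scale a correction vector so its norm does not exceed delta_max."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= delta_max or norm == 0.0:
        return list(delta)
    return [d * delta_max / norm for d in delta]

def fuse_position(pos_mds, delta, confidence, gain=1.0, bias=0.0, delta_max=10.0):
    """Add the clipped correction, scaled by a sigmoid gate on the confidence level."""
    gate = 1.0 / (1.0 + math.exp(-(gain * confidence + bias)))  # assumed gate form
    clipped = clip_correction(delta, delta_max)
    return [p + gate * d for p, d in zip(pos_mds, clipped)]
```

With a positive gain, a higher confidence level moves the fused position further toward the locally corrected estimate.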
Finally, closed-loop feedback regulation feeds the corrected position error back to the DW-MDS framework:
where
is the ranging reconstruction error to measure the geometrical consistency after correction;
is the feedback gain coefficient. If the corrected distance
is much different from the measured value
, the weight of the link is lowered to block error propagation. The pseudo-code of the ST-GNN is presented in Algorithm 2, which details the correction procedure. It operates in three phases: encoding node features and computing attention weights (lines 5–13), performing multi-scale temporal convolution (lines 15–22), and finally, generating and applying the confidence-weighted correction while updating the feedback (lines 24–34).
The feedback mechanism (Equation (36)) modulates the edge weights for the next DW-MDS iteration. Stability is ensured because the feedback acts as a damping factor rather than a driving force. The gain
controls the sensitivity of weight reduction. If
is too large, it may cause topology oscillation (frequently discarding and recovering links); if it is too small, error suppression is slow. In practice, we set
such that weight changes do not exceed 10% per iteration, ensuring that the global optimizer’s solution space changes smoothly, thus preventing system divergence.
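The damping behavior described above can be sketched as follows. The exponential factor exp(−η·e) is an assumed form for illustration, where e is the ranging reconstruction error; the 10% cap on the per-iteration change follows the text. The function and parameter names are hypothetical.

```python
import math

def feedback_weight(weight, d_corrected, d_measured, eta, max_change=0.10):
    """Damp a link weight by the ranging reconstruction error, capped per iteration."""
    error = abs(d_corrected - d_measured)
    factor = math.exp(-eta * error)          # assumed damping form, always in (0, 1]
    factor = max(factor, 1.0 - max_change)   # limit the reduction to max_change
    return weight * factor
```

Because the factor never exceeds 1, the feedback can only reduce a weight, which is what makes it act as damping rather than as a driving term.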
Algorithm 2. ST-GNN Algorithm
Input: DW-MDS output position , estimated range observation set , weight matrices , , , , , ; bias , ; maximum bias , feedback gain , feature dimension , convolution layer , width of convolution kernel , dilation factor ; maximum iteration times , convergence threshold , and maximum velocity constraint .
Output: corrected position , update weight (feed back to DW-MDS)
1. Initialize: Initialize node features
2. Modified for each node i and time do
3.
4.
5. end for
6. for to do
7.   for node do
8.     Calculate neighbor attention weight
9.     Gate control update: ,
10.  end for
11.  for node i do
12.    for l = 1 to L do
13.      for to :
14.        Time series convolution:
15.      end for
16.    end for
17.
18.
19.  end for
20.  for node i ∈ V
21.    Generate correction
22.    Confidence fusion ,
23.    Update features ,
24.  end for
25.  for link :
26.
27.
28.  end for
29.  if and
30.    Break
31. end for
32. return
4.2.4. Theoretical Performance Analysis
(1) Error correction convergence analysis
Proposition (error correction convergence). Under the following conditions, the residual error correction sequence generated by the ST-GNN converges in probability to a neighborhood of zero.
Stability of dynamic topology: The network connectivity satisfies (with as the algebraic connectivity of the graph).
Ranging bounded:
Decay in learning rate:
Proof (of the proposition). Step 1: Define the Lyapunov function, and construct the energy function to measure the state of the system error:
where
is the error covariance matrix of node i, and
is the weight coefficient.
Step 2: Establish the recurrent relationship, and expand
following the residual error update rule of the ST-GNN
:
where
is the historical information before the time t;
and
are the Lipschitz constants of the gradients; and
and
are constants related to the network parameters.
Step 3: Conclude the convergence. Based on the theory of stochastic approximation, if the learning rate satisfies Condition 3, the gradient descent term is driven by
In other words, the correction eventually enters the neighborhood of the radius and thus converges in probability. □
(2) Time complexity analysis of the algorithm
The time complexity of the ST-GNN is dominated by three parts: a. dynamic attention mechanism: the attention weights of nodes are calculated at each time step, with complexity , where T is the time window length, N is the number of nodes, and d is the feature dimension (64 by default); b. dilated spatio-temporal convolution: for L dilated convolution layers, a convolution kernel of width K is applied in each layer, with complexity , where L = 3 and K = 3; and c. residual error correction mapping: the position correction is generated by linear layers, with complexity . The overall time complexity is , which is linear in the number of nodes N. It is significantly lower than that of traditional methods (e.g., in the MDS-MAP) and supports real-time deployment with thousand-node swarms.
4.2.5. Training Strategy and Generalization Capability Design
(1) Training Data Construction
The training data is sourced from the simulation environment established in
Section 5 of this paper. Using the motion model (Equation (40)), dynamic topology constraints (Equation (3)), and error model (Equation (4)) within this environment, we generated over 500 dynamic trajectory sequences. The dataset covers a variety of formation patterns, including random, linear, circular, and helical formations. Node velocities range from 5 to 80 m/s, and random non-line-of-sight (NLOS) links are incorporated at ratios from 0% to 40%, which ensures the diversity and complexity of the training dataset.
(2) Data Augmentation Techniques
To enhance the model’s robustness against unknown disturbances and noise, multiple data augmentation techniques are applied online during training. First, random edge dropout is implemented, where at each training time step, edges in the adjacency matrix are randomly masked with a probability of 15%. This simulates transient link failures that may occur in actual communication. Second, Gaussian noise is added to the input state information (node positions and velocities), with a standard deviation of 0.1 m for position perturbations and 0.2 m/s for velocity perturbations, simulating uncertainties in perception and state estimation. Finally, a mixed-noise strategy is adopted for generating noise in ranging observations. Each training batch randomly combines Gaussian noise, uniformly distributed noise (simulating bounded interference), and impulse noise (simulating occasional large errors), thereby compelling the model to learn more generalizable error suppression features.
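Two of the augmentations described above, random edge dropout and Gaussian state perturbation, can be sketched with the standard library alone. The edge-list representation and the function names are assumptions made for illustration; the probabilities and standard deviations are the ones quoted in the text.

```python
import random

def drop_edges(edges, drop_prob=0.15, rng=random):
    """Randomly mask edges to imitate transient link failures."""
    return [e for e in edges if rng.random() >= drop_prob]

def perturb_state(position, velocity, pos_std=0.1, vel_std=0.2, rng=random):
    """Add zero-mean Gaussian noise to a node's position (m) and velocity (m/s)."""
    noisy_pos = [p + rng.gauss(0.0, pos_std) for p in position]
    noisy_vel = [v + rng.gauss(0.0, vel_std) for v in velocity]
    return noisy_pos, noisy_vel
```

Passing a seeded `random.Random` instance as `rng` makes an augmentation run reproducible, which helps when comparing training configurations.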
(3) Training Procedure and Methodology
A two-stage training strategy is employed. In the first stage, the model is pre-trained using the mean squared error between the predicted positions and the ground-truth positions provided by the simulation, enabling the model to learn the basic mapping for localization. In the second stage, a joint loss function is introduced for fine-tuning. This function includes both the mean squared error term and a smoothness regularization term based on the second-order difference in the node’s estimated trajectory, promoting physical plausibility. The entire training process uses the Adam optimizer with an initial learning rate of 0.001, and a cosine annealing scheduler gradually decays the learning rate to 1 × 10⁻⁵.
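The learning-rate schedule can be written out explicitly. The sketch below uses the standard cosine annealing formula without restarts, decaying from 1 × 10⁻³ to 1 × 10⁻⁵; the total number of steps is not given in the text and is left as a parameter.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine annealing from lr_max at step 0 down to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The schedule changes slowly near both ends and fastest around the midpoint, where the rate is the average of the two limits.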
(4) Generalization Capability Design
The model’s generalization capability stems from the synergy between its architectural design and training strategy. At the architectural level, the dynamic topological attention mechanism enables it to adaptively handle unseen network connectivity patterns, while the multi-scale temporal convolution module ensures the ability to capture temporal dependencies under different maneuvering modes. Regarding the training strategy, a strict “scene isolation” principle is adopted: during training set construction, certain complex specific formation patterns (e.g., high-speed dense interweaving maneuvers) are deliberately excluded. These reserved scenarios are used exclusively for final testing, thereby enabling rigorous evaluation of the model’s zero-shot generalization capability. This design establishes a solid foundation for transferring the algorithm from the simulation environment to subsequent real-world flight tests.
5. Simulation Analysis
In order to verify the effectiveness of the ST-DCL method proposed in this paper, tests were conducted in the MATLAB R2024a environment to examine its positioning performance, convergence, running time, and capability of processing abnormal data. In the tests, the network covered a volume of 10 km × 10 km × 5 km with 200 nodes. Node speed was drawn at random between 0 and 100 m/s, and heading varied randomly within 0–360°. The maximum communication range of nodes was R = 5 km. The ranging error was assumed to follow a Gaussian distribution, that is,
. A simple air motion model of patrol missiles was established, with
as the maximum velocity component of nodes. At time t, the coordinates of the nodes are defined as
where
is a uniformly distributed random number, and
.
All reported values are averages over 350 randomized simulation runs. The normalized root mean square error (NRMSE) was adopted as the index for evaluating the localization results. Its expression is as follows:
where
is the estimated position of the
ith node. In
Appendix B, the Cramér–Rao Lower Bound (CRLB) is also taken as a criterion for assessing the performance of the proposed algorithm. A total of 15% of nodes were randomly selected as anchor nodes and tested in randomly distributed networks, U-shape networks, and O-shape networks.
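For reference, an NRMSE of this kind can be computed as below. The normalizer is an assumption: the RMS position error is divided by a range passed in as `norm_range` (for instance the communication range R), since the exact normalization in the paper's expression is not recoverable here.

```python
import math

def nrmse(estimated, actual, norm_range):
    """RMS position error over all nodes, divided by a normalizing range."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum(
        sum((e - a) ** 2 for e, a in zip(est, act))
        for est, act in zip(estimated, actual)
    )
    return math.sqrt(sq / len(actual)) / norm_range
```

Under this normalization, a 5 m RMS error with a 5000 m range corresponds to an NRMSE of 0.001.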
5.1. Experimental Setup
All simulations were conducted in MATLAB R2024a on a desktop computer with an Intel i7-12700K CPU and 32 GB RAM. The swarm operates in a 3D volume of 10 km × 10 km × 5 km. Key parameters are the number of nodes
N = 200, maximum velocity
∈ {50, 100} m/s, maximum communication range
= 5 km, ranging noise
~N(0, 10²), NLOS bias
= 25 m, occurring on 15% of randomly selected links, and the discrete time step of
= 1 s. We compare our ST-DCL method against four state-of-the-art cooperative localization algorithms: classical MDS [
8], FC-MDS [
11], dwMDS [
10], and vMDS [
9]. For a fair comparison, all algorithms were implemented by the authors in MATLAB based on their original formulations. The normalized root mean square error (NRMSE, Equation (41)) is used as the primary evaluation metric.
5.2. Performance of the Proposed ST-DCL Method in Different Network Distributions
In order to verify the overall performance of the ST-DCL method in swarm localization, ranging errors were evenly distributed in
. After statistics, we obtained the localization results of nodes with the MDS-MAP and the ST-DCL method in randomly distributed networks, U-shape networks, and O-shape networks, as shown in
Figure 4.
Meanwhile, the nodes in randomly distributed networks were taken as an example to present the thermal map and distribution map of localization errors with the ST-DCL method and the classical MDS algorithm.
The results from
Figure 5 reveal that the traditional MDS algorithm has a much larger localization error than the proposed ST-DCL method in all of these networks, including poorly connected, randomly distributed networks; non-convex U-shape networks; and highly symmetric O-shape networks. This is attributable to the inherent limitations of the traditional MDS algorithm: static processing, global sensitivity, and the lack of an error resistance mechanism. The proposed ST-DCL method suppresses errors in the temporal and spatial dimensions through its DW-MDS module, while the ST-GNN module adds local nonlinear and dynamic topology compensation, so that it copes effectively with these complex networks. The ST-DCL method is therefore shown to be robust and to generalize well across the tested network types.
5.3. NRMSE Performance
In randomly distributed networks, the percentage of NLOS links was 15%. The maximum measurement bias was
= 25 m. The ranging error was
, with
N = 200. The proposed ST-DCL method was compared with classical MDS, FC-MDS, dwMDS, and vMDS, which are currently widely used. The NRMSE of these algorithms over time (t ∈ [1, 100]) is shown in
Figure 6. In
Figure 6a, we have
= 100. In
Figure 6b, we have
= 50. In
Figure 6a,b, the five algorithms show a similar trend. The average NRMSE of the ST-DCL method was 0.0077 and 0.0068, respectively, the lowest among the compared algorithms in both cases. The ST-DCL method responds quickly: its topology attention suppresses poor-quality links as soon as they appear, preventing the intense diffusion of errors, while its closed-loop feedback and nonlinear correction lower the error quickly. As a centralized algorithm, MDS uses the shortest-path distance instead of the actual distance to build the distance matrix, which is its main source of error. FC-MDS employs iterative refinement to generate a local map at each node, which results in excessive local mapping in the network. vMDS mainly focuses on temporal smoothing: its trajectory smoothing term prevents abrupt changes in the estimates over time, but it struggles with errors caused by multi-hop propagation in space. dwMDS emphasizes spatial weighting but lacks an explicit temporal handling mechanism.
5.4. Estimation Tracks of the Mobile Network
In randomly distributed networks, the percentage of NLOS links was 15%. The maximum measurement bias was
= 25 m. The ranging error was
, with R = 5000 m, N = 200, Vmax = 10 m/s, and t ∈ [1, 100]. The actual trajectories of nodes and the trajectories estimated by the ST-DCL method are illustrated in
Figure 7. Four nodes, 21, 22, 23, and 24, were selected. In
Figure 7a, the colored solid lines indicate the trajectories estimated by the ST-DCL method, while the black dashed lines represent the actual trajectories of nodes. The high degree of overlap between these lines implies that the ST-DCL method estimates node positions accurately. In
Figure 7b, the trajectories are projected onto the ZX plane, which shows the estimation performance of the proposed method more clearly.
5.5. NRMSE with Different Radii of Communication
In randomly distributed networks, the percentage of NLOS links was 15%. The maximum measurement bias was
= 25 m. The ranging error was
, with Vmax = 10 m/s, N = 200, and t ∈ [1, 100]. In
Figure 8, the NRMSE of the algorithms is presented for different communication radii R in randomly distributed networks. As the communication radius increases, the NRMSE of all algorithms decreases. When R = 3500 m, the NRMSE of the ST-DCL method is close to the CRLB. When R exceeds 2500 m, vMDS needs to cache two-hop neighbors for nodes, so it gradually degenerates into a centralized algorithm, and its NRMSE is not much different from that of the ST-DCL method. Overall, the ST-DCL method delivers the lowest NRMSE.
5.6. Computational Efficiency and Overhead Analysis
In randomly distributed networks, the percentage of NLOS links was 15%. The maximum measurement bias was
= 25 m. The ranging error was
, with R = 5000 m,
= 10 m/s, and t ∈ [1, 100]. To assess the suitability of ST-DCL for real-time deployment on resource-constrained platforms, we analyzed the average runtime per time step relative to the baselines, as shown in
Figure 9. (1) Overhead vs. Performance Trade-off: Compared to the lightweight dwMDS baseline, ST-DCL introduces marginal computational overhead due to the ST-GNN inference and sliding window maintenance. Specifically, for the swarm size, the average runtime increases from 28 ms (dwMDS) to 36 ms (ST-DCL), representing an overhead of approximately 28.6%. However, this slight increase in computational cost yields a significant 66% reduction in localization error (as detailed in
Section 5.3), demonstrating a highly favorable cost-performance trade-off. (2) Real-time Feasibility: Despite the added overhead, ST-DCL remains significantly more efficient than global optimization methods like MDS-MAP and SDP-based approaches, achieving a speedup of roughly 2.6x. Most critically, the absolute runtime of 42.1 ms stays well below the 100 ms threshold required for typical 10 Hz drone control loops. This confirms that the proposed framework is computationally efficient enough to run online on onboard embedded computers (e.g., NVIDIA Jetson series) without inducing control latency.
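The overhead and real-time figures quoted above can be rechecked with simple arithmetic. The sketch below computes the relative overhead of 36 ms over 28 ms and tests a runtime against the period of a 10 Hz control loop; the function names are hypothetical. Both runtimes that appear in the text (36 ms and 42.1 ms) fall within the 100 ms budget.

```python
def relative_overhead(baseline_ms, method_ms):
    """Fractional runtime increase of a method over a baseline."""
    return (method_ms - baseline_ms) / baseline_ms

def fits_control_loop(runtime_ms, loop_rate_hz):
    """True when one localization step finishes within one control period."""
    return runtime_ms < 1000.0 / loop_rate_hz
```

Here `relative_overhead(28, 36)` evaluates to about 0.286, consistent with the stated 28.6% overhead.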
5.7. Robustness and Stress Testing
This section validates the robustness of the ST-DCL framework under two extremely challenging stress scenarios that go beyond conventional simulation parameters: high-probability and non-Gaussian NLOS errors and extremely low connectivity with intermittent links. Results are based on over 100 independent Monte Carlo runs.
(1) Limit Test Against Severe NLOS Errors
To emulate harsh, realistic conditions beyond the 15% NLOS probability used in prior sections, we systematically increase it to 50%; for other parameter settings, see
Section 5.1. Furthermore, the NLOS bias is modeled with a heavy-tailed, two-component Gaussian Mixture Model (GMM). One component represents the nominal ranging noise (standard deviation of 10 m, consistent with LOS conditions), while the other simulates severe outliers with a standard deviation of up to 50 m—five times the nominal value. As shown in
Figure 10, when the NLOS probability exceeds 20% or the error distribution becomes heavy-tailed, the localization error of traditional optimization methods (e.g., dwMDS) and pure learning methods degrades sharply. In contrast, ST-DCL demonstrates remarkable stability. Its NRMSE increases gradually, maintaining usable accuracy (NRMSE < 0.08) even at 50% NLOS probability. This resilience stems from the synergistic defense of its hybrid architecture: the Huber loss in DW-MDS provides the first layer of robust estimation, while the dynamic attention mechanism in the ST-GNN adaptively reduces the aggregation weights for links highly suspected of NLOS to near-zero levels, achieving effective outlier isolation at the feature level and curbing spatial error diffusion.
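The heavy-tailed error model and the role of the Huber loss can be illustrated with a short sketch. It assumes that both mixture components are zero-mean and that the mixing weight of the outlier component equals the NLOS probability; neither assumption is stated explicitly in the text. The Huber threshold `delta` is a hypothetical value chosen for illustration only.

```python
import random
import statistics


def sample_ranging_error(p_nlos, sigma_los=10.0, sigma_nlos=50.0, rng=random):
    """Draw one ranging error (m) from a two-component Gaussian mixture.

    Assumption: zero-mean components, outlier weight equal to p_nlos.
    """
    sigma = sigma_nlos if rng.random() < p_nlos else sigma_los
    return rng.gauss(0.0, sigma)


def huber_loss(residual, delta):
    """Quadratic near zero, linear beyond delta, so outliers count less."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


rng = random.Random(0)
errors = [sample_ranging_error(0.5, rng=rng) for _ in range(20000)]
print("empirical std of mixture (m):", round(statistics.pstdev(errors), 2))

delta = 15.0  # hypothetical threshold, not taken from the paper
for r in (5.0, 100.0):
    print(f"residual {r:6.1f} m: squared {0.5 * r * r:8.1f}, Huber {huber_loss(r, delta):8.1f}")
```

For a 100 m outlier the squared loss is 5000, whereas the Huber loss with this threshold is 1387.5, which shows why a robust loss limits the pull of NLOS measurements on the position estimate.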
(2) Performance Under Extremely Sparse and Dynamic Connectivity
This test evaluates performance near the network’s connectivity limit, beyond the communication range analysis in
Section 5.4. We construct two severe scenarios: (1) Ultra-low connectivity: a short communication radius reduces the network’s average degree to 2.5 (near the connectivity threshold). (2) Highly dynamic intermittent links: random link outages are simulated with a 40% failure probability at each time step.
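A minimal sketch of how the two stress scenarios could be generated is given below. It assumes a disk communication model (two nodes are linked when their distance does not exceed the radius) and independent outages per link and per time step; the deployment area, node count, and radius in the example are hypothetical and are not the simulation settings of this paper.

```python
import math
import random


def neighbor_links(positions, radius):
    """Undirected links (i, j), i < j, between nodes within the radius."""
    links = set()
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= radius:
                links.add((i, j))
    return links


def average_degree(links, n_nodes):
    """Each undirected link contributes to the degree of two nodes."""
    return 2.0 * len(links) / n_nodes


def drop_links(links, p_fail, rng):
    """Keep each link independently with probability 1 - p_fail."""
    return {link for link in sorted(links) if rng.random() >= p_fail}


rng = random.Random(1)
# Hypothetical layout: 200 nodes in a 1000 m x 1000 m area, 65 m radius.
positions = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)]
links = neighbor_links(positions, radius=65.0)
print("average degree:", round(average_degree(links, len(positions)), 2))

surviving = drop_links(links, p_fail=0.4, rng=rng)
print("links before/after outage:", len(links), len(surviving))
```

Calling `drop_links` once per time step on the geometric link set yields a time-varying topology in which a link that fails at one step may reappear at the next.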
As shown in
Figure 11a, in the ultra-low connectivity scenario (average node degree of about 2.5, near the critical connectivity threshold), the average NRMSE of all algorithms increases as network connectivity approaches the critical level, reflecting a significant decline in localization performance. ST-DCL maintains a relatively low error even under extreme sparsity, demonstrating strong robustness, whereas traditional MDS shows the most pronounced performance degradation and the highest sensitivity to connectivity. All algorithms exhibit a performance turning point near the critical connectivity region, confirming the substantial impact of network connectivity on localization accuracy.
As shown in
Figure 11b, in the highly dynamic intermittent connectivity scenario, where each link has a 25% probability of failure at each time step, the localization performance of all algorithms exhibits significant fluctuations due to the random link interruptions. The traditional MDS and FC-MDS methods suffer from severe performance degradation and high error variance, as they lack the mechanisms to adapt to such dynamic link failures. The dwMDS and vMDS methods show moderate improvements by incorporating spatial weights and temporal smoothing, respectively, but still struggle with the rapidly changing topology. In contrast, ST-DCL demonstrates superior robustness: its confidence-based feedback mechanism quickly identifies and down-weights unstable links, thereby concentrating computational resources on reliable connections. Although the convergence speed of ST-DCL is slowed by link instability, it eventually achieves a significantly higher and more stable accuracy than all baseline methods, validating its effectiveness in highly dynamic and intermittent connectivity environments.
6. Real-World Flight Validation and Benchmarking
To validate the performance and engineering practicality of the ST-DCL framework under real-world conditions such as GPS-denied environments, this section reports field flight experiments with ten UAVs. The primary objectives of the experiments are as follows: (1) verifying the real-time capability and stability of the framework on real hardware platforms; (2) testing the algorithm’s robustness against dynamic topologies and non-ideal ranging errors in complex scenarios involving building occlusions and intermittent links; and (3) performing a fair, offline performance comparison among ST-DCL, traditional optimization methods, and advanced deep learning models using a unified sensor dataset synchronously recorded during the flights. This section sequentially introduces the experimental system, the designed scenarios, and the evaluation methodology, followed by an analysis and discussion of the results.
6.1. Experimental System and Data Acquisition Protocol
6.1.1. Robotic Platform and Sensor Suite
A homogeneous swarm consisting of ten custom-made quadrotor drones serves as the test platform, as illustrated in
Figure 12. Each platform is equipped with an identical set of sensors and computing hardware:
1. Relative Ranging: An LD150(-1) Ultra-Wideband (UWB) module (China Haoru Technology Company, Beijing, China) provided peer-to-peer distance measurements. The communication radius was configured to 400 m. We acknowledge and characterize the inherent clock offset and skew between modules, which introduce a systematic bias into raw range observations.
2. Onboard Computation and Logging: An NVIDIA Jetson Nano (4 GB) served as the primary computer. Its dual role was to (i) execute the real-time ST-DCL pipeline and (ii) record all raw sensor data with microsecond-precision timestamps to an onboard SD card. This device represents the class of resource-constrained embedded processors targeted by our design.
3. Proprioceptive and Reference Sensors: The stock Pixhawk flight controller provided inertial measurement unit (IMU) data (accelerometer and gyroscope) and single-point GPS fixes.
4. A Note on Ground-Truth Fidelity: The single-point GPS positions serve as our reference trajectory. Under open-sky conditions, their typical horizontal accuracy is 1.5–2.5 m (CEP). In proximity to structures, multipath effects can degrade this further. While this precludes a centimeter-accurate absolute error assessment, the relative motion and temporal trends provided by GPS are sufficiently reliable for a comparative analysis of algorithm behavior, robustness, and consistency. We supplement this with topology-relative metrics to mitigate absolute drift.
6.1.2. Software Stack and Data Pipeline
The software architecture was designed to decouple real-time operation from high-fidelity data collection for post hoc analysis:
1. Real-Time Onboard Localization: A streamlined C++ implementation of the ST-DCL algorithm ran at 10 Hz on each Jetson Nano, consuming live UWB and IMU streams to produce pose estimates.
2. High-Fidelity Sensor Logging: A separate, high-priority process sampled and packetized all sensor data at their native frequencies (UWB: ~100 Hz, IMU: 200 Hz, and GPS: 5 Hz) with synchronized hardware timestamps. This resulted in a dense, multimodal time series dataset for each vehicle.
3. Post-Flight Benchmarking Environment: All logs were synchronized and merged offline on a high-performance workstation to create a unified “FlightBench” dataset. This dataset, comprising {time, UWB_range_matrix, IMU_data, GPS_fix}, forms the immutable basis for all subsequent algorithmic comparisons, guaranteeing identical input conditions.
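One possible way to organize the merged logs is sketched below. The field names mirror the tuple {time, UWB_range_matrix, IMU_data, GPS_fix} listed above, but the record type, the zero-order-hold alignment rule, and the helper names are assumptions made for illustration; the text does not specify how streams with different rates were aligned.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class FlightBenchFrame:
    """One merged sample (hypothetical layout of a FlightBench record)."""
    time: float              # common timestamp in seconds
    uwb_range_matrix: list   # pairwise ranges in meters
    imu_data: list           # IMU sample(s) valid at this time
    gps_fix: list            # GPS reference position(s) valid at this time


def latest_at(timestamps, values, t):
    """Zero-order hold: most recent value sampled at or before t, else None."""
    idx = bisect_right(timestamps, t) - 1
    return values[idx] if idx >= 0 else None


def merge_streams(times, uwb, imu, gps):
    """Build frames on a common timeline; each stream is (timestamps, values)."""
    return [
        FlightBenchFrame(t, latest_at(*uwb, t), latest_at(*imu, t), latest_at(*gps, t))
        for t in times
    ]


# Toy streams: GPS at 5 Hz, IMU and UWB sampled faster.
gps = ([0.0, 0.2, 0.4], [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
imu = ([0.0, 0.1, 0.2, 0.3], [[0.0], [0.1], [0.2], [0.3]])
uwb = ([0.05, 0.15, 0.25], [[[0.0, 9.8]], [[0.0, 9.9]], [[0.0, 10.1]]])

frames = merge_streams([0.1, 0.3], uwb, imu, gps)
for frame in frames:
    print(frame)
```

Because every algorithm reads the same list of frames, differences in the results can be attributed to the algorithms rather than to the input data.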
6.1.3. Deployment Environment
Experiments were conducted in a university campus environment (
Figure 13), selected for its heterogeneous RF propagation characteristics:
1. Zone A (Open Basketball Court): A large, open area primarily exhibiting line-of-sight (LOS) conditions, serving as a baseline environment.
2. Zone B (Perimeter of a Multi-Story Academic Building): A challenging zone featuring dense obstacles. Concrete walls induce severe NLOS attenuation and complete signal occlusion, while also creating strong GPS multipath interference. This zone is instrumental for stress-testing NLOS resilience.
6.2. Designed Flight Scenarios
Three mission-style scenarios were executed to systematically probe different aspects of performance:
S1: Dynamic Formation in Open Space: The swarm performed coordinated figure-eight and expanding/contracting circle maneuvers in Zone A. This scenario tests basic tracking accuracy and smoothness under mild, predictable topology changes.
S2: Controlled NLOS Penetration and Network Partition: A high-stress scenario where the swarm was split into two sub-teams flying through opposite sides of the building in Zone B (
Figure 13b). This orchestrates a predictable yet severe network partition and NLOS event, challenging the algorithm’s error suppression and recovery convergence.
S3: Prolonged Mixed-Environment Surveillance: A 15 min free-flight mission where UAVs patrolled pseudo-randomly between Zones A and B. This tests long-term stability, adaptability to intermittent challenges, and overall system endurance.
6.3. Experimental Methodology
To ensure fairness and consistency in evaluation, this experiment adopts a strategy of “Online Verification, Offline Uniform Processing”:
1. Online Operation: All UAVs run the ST-DCL algorithm in real time, verifying its stability and feasibility on the embedded platforms (Jetson Nano).
2. Offline Uniform Processing: After the flights, the sensor data (UWB ranging, IMU, and GPS) recorded by all UAVs are time-synchronized and aligned on a unified high-performance computer, forming a standard dataset. All compared algorithms, including the baseline methods MDS [8], dwMDS [10], vMDS [9], our proposed ST-DCL, and a representative deep learning baseline, the Spatio-Temporal Graph Attention Network (ST-GAT) [30], are executed on this identical dataset. This method completely eliminates interference from hardware differences and real-time system scheduling.
The evaluation uses two metrics that balance absolute accuracy and internal consistency: (i) Absolute Trajectory Error (ATE): the root mean square error between the estimated positions and the GPS reference positions; (ii) Formation Geometry Error (FGE): for each timestamp, the relative distances between all UAV pairs are computed from both the estimated positions and the GPS references, and the RMSE of the differences between the two sets of distances is taken. Because an offset shared by all UAVs cancels in pairwise distances, this metric mitigates the common error caused by GPS absolute drift and more reliably reflects the algorithm’s ability to maintain the internal geometry of the swarm.
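The two metrics can be sketched as follows for a single timestamp. The function names are illustrative, and aggregation over time (for example, averaging the per-timestamp values) is left out because the text does not specify it.

```python
import math
from itertools import combinations


def ate(estimated, reference):
    """RMSE between estimated and reference positions (same UAV order)."""
    sq = [math.dist(e, r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(sq) / len(sq))


def fge(estimated, reference):
    """RMSE of pairwise-distance differences over all UAV pairs."""
    sq = []
    for i, j in combinations(range(len(estimated)), 2):
        d_est = math.dist(estimated[i], estimated[j])
        d_ref = math.dist(reference[i], reference[j])
        sq.append((d_est - d_ref) ** 2)
    return math.sqrt(sum(sq) / len(sq))


# Three UAVs; the estimate equals the reference shifted by a common (2, 1) m offset.
reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
estimated = [(x + 2.0, y + 1.0) for x, y in reference]

print("ATE:", ate(estimated, reference))
print("FGE:", fge(estimated, reference))
```

In this example the ATE equals the size of the common offset (the square root of 5, about 2.24 m) while the FGE is zero, which illustrates why the FGE is insensitive to a drift shared by all reference positions. For ten UAVs the inner loop runs over 45 pairs.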
6.4. Results and Analysis
6.4.1. System Feasibility: Computational Load and Real-Time Performance
To validate the deployment feasibility of ST-DCL on resource-constrained platforms, we measured the average processing time per frame of each algorithm on the Jetson Nano. The results are summarized in
Table 1. ST-DCL achieves an average runtime of approximately 45 ms, meeting the 10 Hz real-time requirement. In comparison, the traditional MDS is unsuitable for real-time operation due to its O(N^3) complexity. While vMDS and dwMDS are operable, their execution times are significantly longer than that of ST-DCL. Although the deep learning model ST-GAT has a short inference time, it requires extensive offline training on simulated data and struggles to generalize to unseen real-world impairments, highlighting a key limitation of purely data-driven approaches. These results confirm the efficiency and practical balance achieved by ST-DCL for lightweight embedded systems.
6.4.2. Comprehensive Localization Performance Comparison
The unified flight dataset for offline benchmarking comprises the time-synchronized UWB ranging matrices, IMU measurements, and GPS references from all ten UAVs across the three designed scenarios (S1, S2, and S3). The localization accuracy of all algorithms on the unified flight dataset, with the Formation Geometry Error (FGE) as the primary metric, is presented in
Table 2. The key findings are as follows:
1. Consistent Superiority: ST-DCL achieves the best accuracy across all three scenarios.
2. Robustness Verification: In Scenario 2, the most challenging scenario, which features structured NLOS, the error of ST-DCL (1.52 m) is significantly lower than that of traditional optimization methods (dwMDS: 3.01 m; vMDS: 2.45 m), demonstrating the effectiveness of its closed-loop error suppression mechanism.
3. Comparison with Deep Learning: The deep learning baseline ST-GAT performs reasonably within its training distribution (Scenario 3), but its performance severely degrades in the out-of-distribution (OOD) scenario, Scenario 2, even underperforming vMDS. This clearly reveals the generalization bottleneck of purely data-driven methods when confronting real, unseen physical-layer impairments. In contrast, ST-DCL, which integrates theory-guided optimization with data-driven compensation, exhibits stronger environmental adaptability.
6.4.3. Visualization of the Key Mechanism
To intuitively demonstrate the core adaptive confidence weight mechanism of ST-DCL, we performed an in-depth analysis using the temporal data of a representative link from the ten-UAV swarm that underwent a complete NLOS event cycle. The results are shown in
Figure 14. Although the experiment involved 45 potential links, the temporal data from this single link clearly reveals the general operating principle of the algorithm.
Figure 14a illustrates three key variables: the GPS-RTK reference distance reflecting true relative motion (black line), the raw UWB ranging measurements containing significant NLOS bias (red line), and the confidence weight c_ij(t) dynamically calculated by the algorithm (blue line). During the NLOS period at t ≈ 8–10 s, the ranging values jump abruptly, while the confidence weight plummets from ~1.0 to below 0.2 within approximately 150 ms, achieving an active “soft cut-off” of the unreliable link.
Figure 14b further demonstrates the effect of this mechanism: the positioning error of the node associated with this link (green line) increases during the NLOS period, but its peak is significantly suppressed and it converges much faster than in the reference case without suppression (red line). This visualization of a single link directly correlates the dynamic weight adjustment with the error suppression effect. In practice, this mechanism operates independently and in parallel on all links within the swarm, collectively forming the foundation of the ST-DCL framework for resisting dynamic topology and NLOS interference, thereby enhancing global positioning consistency.
7. Conclusions and Future Work
This paper has confronted the fundamental challenge of spatio-temporal error coupling in cooperative localization for dynamic drone swarms. The core outcome is the development of ST-DCL, a unified framework that mathematically formalizes, algorithmically decouples, and systematically suppresses the intertwined error propagation mechanisms under dynamic topology. The principal contributions are fourfold:
1. Theoretical Foundation: We established a novel differential equation model for error propagation under dynamic topology, which, for the first time, rigorously quantifies the joint impact of abrupt kinematic changes and network algebraic connectivity on localization error, providing a predictive upper bound.
2. Algorithmic Innovation: We introduced a spatio-temporal decoupling architecture via the synergistic cooperation of the DW-MDS optimizer and the ST-GNN corrector. This design cleanly separates the task of global coarse localization with inherent error suppression (via DW-MDS) from that of local, nonlinear error compensation (via ST-GNN).
3. Provable Convergence Guarantees: The framework is underpinned by rigorous theoretical analysis, proving the global linear convergence of the DW-MDS optimizer and the bounded convergence of the ST-GNN’s residual correction, thereby delivering a provably stable localization system.
4. Experimental Validation: The framework’s performance is validated through both extensive simulations and real-world flight tests. Under stringent simulated conditions (200 nodes, 50 m/s, and 15% NLOS), ST-DCL achieves an NRMSE of 0.0068, a 21% improvement over state-of-the-art methods. Furthermore, field experiments with a 10-UAV swarm in GPS-denied, NLOS-prone environments confirm its practical robustness, real-time capability, and consistent accuracy, supporting its readiness for deployment in autonomous missions.
Despite its strengths, this research has some limitations. The proposed algorithm is computationally demanding because of the DW-MDS sliding-window optimization and the ST-GNN dynamic attention mechanism, and it therefore faces real-time challenges in very large swarms. Future research will focus on a clustered architecture and algebraic connectivity-guided layered topology segmentation so as to distribute the computational load across local optimizations of sub-swarms. Moreover, lightweight ST-GNN reasoning engines will be used to quantitatively compress the complexity from to .
In addition, while the chosen hyperparameters (e.g., dilation factor, feedback gain) demonstrate robust performance in our validation scenarios, systematic sensitivity analysis and adaptive tuning strategies for varying deployment conditions remain valuable future work to further enhance practical applicability.